Since it processes 4 frames in sequence, wouldn't the easiest approach be to give each 10 ms frame its own core, then merge the results in a fifth and do the rest of the calculation there?
If that works, it could mean this runs in real time on a Raspberry Pi.
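
A rough sketch of what I mean, in Python. It only holds if the four frames can be processed independently, with no state carried from one frame to the next. The names `process_frame` and `merge_and_finish` are placeholders I made up, not functions from the actual code, and the 160-sample frame assumes 16 kHz audio.

```python
from concurrent.futures import ProcessPoolExecutor

FRAMES_PER_BLOCK = 4  # four 10 ms frames, one per core


def process_frame(frame):
    # Hypothetical per-frame work: here just the frame energy.
    return sum(sample * sample for sample in frame)


def merge_and_finish(partials):
    # Hypothetical final stage: combine the per-frame results
    # and do "the rest of the calculation" (here, an average).
    return sum(partials) / len(partials)


def process_block(frames, pool):
    # pool.map runs process_frame on each frame in parallel
    # and returns the results in the original frame order.
    partials = list(pool.map(process_frame, frames))
    return merge_and_finish(partials)


if __name__ == "__main__":
    # Dummy input: 4 frames of 160 samples (10 ms at an assumed 16 kHz).
    frames = [[0.1 * i] * 160 for i in range(FRAMES_PER_BLOCK)]
    with ProcessPoolExecutor(max_workers=FRAMES_PER_BLOCK) as pool:
        print(process_block(frames, pool))
```

Processes are used instead of threads because CPython threads would not spread pure-Python work across cores. Whether this actually reaches real time depends on how much the per-frame work outweighs the cost of handing frames to the workers, so it would need measuring on the Pi itself.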

