Fast 24-bit array -> 32-bit array conversion?

By : Clippy
Source: Stackoverflow.com
Question!

Quick Summary:

I have an array of 24-bit values. Any suggestion on how to quickly expand the individual 24-bit array elements into 32-bit elements?

Details:

I'm processing incoming video frames in realtime using Pixel Shaders in DirectX 10. A stumbling block is that my frames are coming in from the capture hardware with 24-bit pixels (either as YUV or RGB images), but DX10 takes 32-bit pixel textures. So, I have to expand the 24-bit values to 32-bits before I can load them into the GPU.

I really don't care what I set the remaining 8 bits to, or where the incoming 24-bits are in that 32-bit value - I can fix all that in a pixel shader. But I need to do the conversion from 24-bit to 32-bit really quickly.

I'm not terribly familiar with SIMD SSE operations, but from my cursory glance it doesn't look like I can do the expansion using them, given my reads and writes aren't the same size. Any suggestions? Or am I stuck sequentially massaging this data set?

This feels so very silly - I'm using the pixel shaders for parallelism, but I have to do a sequential per-pixel operation before that. I must be missing something obvious...

By : Clippy


Answers

SSE 4.1 .ASM:

PINSRD  XMM0,  DWORD PTR[ESI],   0
PINSRD  XMM0,  DWORD PTR[ESI+3], 1
PINSRD  XMM0,  DWORD PTR[ESI+6], 2
PINSRD  XMM0,  DWORD PTR[ESI+9], 3
PSLLD   XMM0,  8                    
PSRLD   XMM0,  8
MOVNTDQ [EDI], XMM1
add     ESI,   12
add     EDI,   16
By : Henry


The different input/output sizes are not a barrier to using simd, just a speed bump. You would need to chunk the data so that you read and write in full simd words (16 bytes).

In this case, you would read 3 SIMD words (48 bytes == 16 rgb pixels), do the expansion, then write 4 SIMD words.

I'm just saying you can use SIMD, I'm not saying you should. The middle bit, the expansion, is still tricky since you have non-uniform shift sizes in different parts of the word.



This video can help you solving your question :)
By: admin