I have two signed 16-bit values packed in a 32-bit word, and I need to shift them right (divide) by a constant (from 1 to 6) and saturate each result to a byte (0..0xFF). For example:
- 0x FFE1 00AA with shift=5 must become 0x 0000 0005;
- 0x 2345 1234 with the same shift must become 0x 00FF 0091.
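To pin down the expected behaviour, here is a minimal C reference model of the operation. It is only a sketch for checking results, not the ARM code being asked for, and the function names `shift_sat_half` and `shift_sat_pair` are hypothetical. It relies on the fact that a negative halfword stays negative after an arithmetic right shift, so it always saturates to 0.

```c
#include <assert.h>
#include <stdint.h>

/* Shift one signed 16-bit value (passed as its raw bit pattern) right by
   `shift` and saturate the result to 0..0xFF. */
uint32_t shift_sat_half(uint32_t half, unsigned shift)
{
    if (half & 0x8000u)          /* negative input: result saturates to 0 */
        return 0;
    uint32_t v = (half & 0xFFFFu) >> shift;
    return v > 0xFFu ? 0xFFu : v;
}

/* Apply the same operation to both halfwords of a packed 32-bit word. */
uint32_t shift_sat_pair(uint32_t packed, unsigned shift)
{
    assert(shift >= 1 && shift <= 6);   /* range stated in the question */
    uint32_t hi = shift_sat_half(packed >> 16, shift);
    uint32_t lo = shift_sat_half(packed & 0xFFFFu, shift);
    return (hi << 16) | lo;
}
```

Calling `shift_sat_pair(0xFFE100AA, 5)` and `shift_sat_pair(0x23451234, 5)` reproduces the two examples above.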
I'm trying to saturate the values simultaneously, something like this pseudo-code:
    AND RT, R0, 0x80008000   ; mask high bits to get negatives
    ORR RT, RT, LSR #1
    ORR RT, RT, LSR #2
    ORR RT, RT, LSR #4
    ORR RT, RT, LSR #8       ; now the signs are expanded across each halfword
    MVN RT, RT
    AND R0, RT               ; now negative values are zero
    ; here something to saturate high overflow and shift after
but the code I get is very ugly and slow. :) The best (fastest) version I have so far saturates each half separately, like this:
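    MOV   RT, R0, LSL #16        ; move the low halfword to the top
    MOVS  RT, RT, ASR #16+5      ; sign-extend and shift the low halfword
    MOVMI RT, #0                 ; negative -> 0
    CMP   RT, #256
    MOVCS RT, #255               ; overflow -> 0xFF
    MOVS  R0, R0, ASR #16+5      ; sign-extend and shift the high halfword
    MOVMI R0, #0                 ; negative -> 0
    CMP   R0, #256
    MOVCS R0, #255               ; overflow -> 0xFF
    ORR   R0, RT, R0, LSL #16    ; repack both results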
But it's 10 cycles. :( Can it be faster?
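For reference, the sign-expansion step of the first pseudo-code can be checked in isolation with a small C model. This is a sketch only, and the function name `zero_negative_halves` is hypothetical: each sign bit is smeared down across its own 16 bits, and the resulting mask clears every negative halfword.

```c
#include <stdint.h>

/* Zero each halfword of x whose sign bit (bit 15 or bit 31) is set. */
uint32_t zero_negative_halves(uint32_t x)
{
    uint32_t m = x & 0x80008000u;  /* keep only the two sign bits */
    m |= m >> 1;                   /* smear each sign bit downwards... */
    m |= m >> 2;
    m |= m >> 4;
    m |= m >> 8;                   /* ...until it fills exactly 16 bits */
    return x & ~m;                 /* clear the negative halfwords */
}
```

The shifts add up to 15 positions, so the high halfword's sign never leaks into the low halfword.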
P.S. Later I found the USAT16 instruction for this, but it is only available on ARMv6 and later, and I need the code to work on ARMv5TE and ARMv4.
Edit: I have now rewritten my first attempt:
    ANDS  RT, 0x10000, R0 << 1     ; 0x10000 is in a register. Sign (HI) moves to the C flag, Sign (LO) is masked
    SUBNE RT, RT, 1                ; mask LO with 0xFFFF if it's negative
    SUBCS RT, RT, 0x10000          ; mask HI with 0xFFFF if it's negative
    BIC   R0, R0, RT               ; negatives are 0 now. The mask can be used as XOR too
    TST   R0, 0xE0000000           ; check HI overflow
    ORRNE R0, R0, 0x1FE00000       ; set HI to 0xFF (shifted) if so
    TST   R0, 0x0000E000           ; check LO overflow
    ORRNE R0, R0, 0x00001FE0       ; set LO to 0xFF (shifted) if so
    AND   R0, 0x00FF00FF, R0 >> 5  ; 0x00FF00FF is in a register
    ; note: 0x1FE00000 and 0x00001FE0 are not encodable ARM immediates, so they must be in registers as well
but it isn't beautiful.
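The edited routine can be modelled step by step in C to check it against the examples. This is a sketch under assumptions: the function name `shift_sat_pair_packed` is hypothetical, and the `shift` parameter is a generalisation of mine, since the assembly is hard-coded for a shift of 5 (overflow mask 0xE000, fill value 0x1FE0).

```c
#include <assert.h>
#include <stdint.h>

/* C model of the packed routine: zero negatives, force overflowing
   halves to 0xFF (pre-shifted), then shift and mask both at once. */
uint32_t shift_sat_pair_packed(uint32_t x, unsigned shift)
{
    assert(shift >= 1 && shift <= 6);

    uint32_t neg = 0;
    if (x & 0x00008000u) neg |= 0x0000FFFFu;  /* ANDS + SUBNE: LO is negative */
    if (x & 0x80000000u) neg |= 0xFFFF0000u;  /* SUBCS: HI is negative */
    x &= ~neg;                                /* BIC: negatives become 0 */

    /* Bits that must be clear for (value >> shift) to fit in a byte. */
    uint32_t over = ((uint32_t)0xFFFFu << (shift + 8)) & 0xFFFFu;
    uint32_t fill = (uint32_t)0xFFu << shift; /* 0xFF before the final shift */

    if (x & (over << 16)) x |= fill << 16;    /* TST + ORRNE for HI */
    if (x & over)         x |= fill;          /* TST + ORRNE for LO */

    return (x >> shift) & 0x00FF00FFu;        /* final AND with shifted R0 */
}
```

With `shift` set to 5, `over` is 0xE000 and `fill` is 0x1FE0, matching the constants in the assembly, and the function returns 0x00000005 for 0xFFE100AA and 0x00FF0091 for 0x23451234.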