I have two signed 16-bit values in a 32-bit word, and I need to shift them right (divide) on constant value (it can be from 1 to 6) and saturate to byte (0..0xFF).

For example,

**0x FFE1 00AA**with**shift=5**must become**0x 0000 0005**;**0x 2345 1234**must become**0x 00FF 0091**

I'm trying to saturate the values simultaneously, something like this pseudo-code:

```
AND RT, R0, 0x80008000; - mask high bits to get negatives
ORR RT, RT, LSR #1
ORR RT, RT, LSR #2
ORR RT, RT, LSR #4
ORR RT, RT, LSR #8; - now its expanded signs in each halfword
MVN RT, RT
AND R0, RT; now negative values are zero
; here something to saturate high overflow and shift after
```

but code I get is very ugly and slow. :) The best (fastest) thing I have now is separate saturation of each half, like this:

```
MOV RT, R0, LSL #16
MOVS RT, RT, ASR #16+5
MOVMI RT, #0
CMP RT, RT, #256
MOVCS RT, #255
MOVS R0, R0, ASR #16+5
MOVMI R0, #0
CMP R0, R0, #256
MOVCS R0, #255
ORR R0, RT, R0, LSL #16
```

But it's 10 cycles. :( **Can it be faster?**

p.s.: Later I found USAT16 instruction for this, but it's only for ARMv6. And I need code to work on ARMv5TE and ARMv4.

**Edit:** now I rewrite my first code:

```
ANDS RT, 0x10000, R0 << 1; // 0x10000 is in register. Sign (HI) moves to C flag, Sign (LO) is masked
SUBNE RT, RT, 1; // Mask LO with 0xFFFF if it's negative
SUBCS RT, RT, 0x10000; // Mask HI with 0xFFFF if it's negative
BIC R0, R0, RT; // Negatives are 0 now. The mask can be used as XOR too
TST R0, 0xE0000000; // check HI overflow
ORRNE R0, R0, 0x1FE00000 // set HI to 0xFF (shifted) if so
TST R0, 0x0000E000 // check LO overflow
ORRNE R0, R0, 0x00001FE0 // set LO to 0xFF if so
AND R0, 0x00FF00FF, R0 >> 5; // 0x00FF00FF is in register
```

but it isn't beautiful.