That's not a simple task in ARM assembly because you can't easily use 32-bit constants. Every operation that masks out bytes has to be split so that each instruction uses an 8-bit constant (the constant can also be rotated right by an even number of bits).
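To make the constant rule concrete, here is a small C sketch that tests whether a 32-bit value fits the ARM-state data-processing immediate format, i.e. an 8-bit value rotated right by an even amount (0 to 30). The function names are hypothetical, not from any library, and the sketch assumes ARM state only; Thumb-2 uses a different immediate scheme.

```c
#include <stdint.h>
#include <stdbool.h>

/* Rotate a 32-bit value right by r bits (0 <= r < 32). */
uint32_t ror32(uint32_t x, unsigned r)
{
    return r == 0 ? x : (x >> r) | (x << (32 - r));
}

/* Hypothetical helper: true if x can be written as an 8-bit value
   rotated right by an even amount, the ARM-state immediate format. */
bool is_arm_immediate(uint32_t x)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* x == ror(imm8, rot) means imm8 == ror(x, 32 - rot). */
        if (ror32(x, (32 - rot) % 32) <= 0xFFu)
            return true;
    }
    return false;
}
```

The byte masks 0x00ff0000 and 0x0000ff00 pass this test, while the combined mask 0x00ffff00 does not, which is why clearing the two middle bytes takes two BIC instructions.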
You mask out bytes 2 and 1 (the two middle bytes) with AND and do the shift later. In ARM assembly most data-processing instructions can shift their second operand for free, so shifting a byte into position and merging it with the other bits ends up being a single ORR instruction.
Here is some untested code that swaps the two middle bytes (ARMv4, ARM instruction set, not Thumb):
AND R2, R0, #0x00ff0000 @ R2=0x00BB0000 get byte 2
AND R3, R0, #0x0000ff00 @ R3=0x0000CC00 get byte 1
BIC R0, R0, #0x00ff0000 @ R0=0xAA00CCDD clear byte 2
BIC R0, R0, #0x0000ff00 @ R0=0xAA0000DD clear byte 1
ORR R0, R0, R2, LSR #8 @ R0=0xAA00BBDD shift byte 2 down and merge
ORR R0, R0, R3, LSL #8 @ R0=0xAACCBBDD shift byte 1 up and merge
That translates line by line into the following C code:
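#include <stdint.h>

uint32_t swap(uint32_t R0)
{
    uint32_t R2, R3;
    R2 = R0 & 0x00ff0000;   /* AND: get byte 2 */
    R3 = R0 & 0x0000ff00;   /* AND: get byte 1 */
    R0 = R0 & 0xff00ffff;   /* BIC: clear byte 2 */
    R0 = R0 & 0xffff00ff;   /* BIC: clear byte 1 */
    R0 |= (R2 >> 8);        /* ORR with LSR #8 */
    R0 |= (R3 << 8);        /* ORR with LSL #8 */
    return R0;
}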
As you can see, that's a lot of lines for such a simple task. Not even the ARMv6 architecture helps much here.
EDIT: ARMv6 version (also untested, but two instructions shorter)
@ bytes in R0: aabbccdd
ROR R0, R0, #8 @ r0 = ddaabbcc
REV R1, R0 @ r1 = ccbbaadd
PKHTB R0, R0, R1, ASR #16 @ r0 = ddaaccbb
ROR R0, R0, #24 @ r0 = aaccbbdd
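Since this version is untested too, the rotate/reverse/pack sequence can be checked off-target with a C model. In this sketch each helper (hypothetical names) mimics one ARMv6 instruction: ROR as a 32-bit rotate, REV as a byte reversal, and PKHTB as "top halfword of Rn, bottom halfword of Rm shifted right". The PKHTB model uses a logical shift, which yields the same bottom halfword as ASR for shift amounts from 1 to 16.

```c
#include <stdint.h>

/* Model of ROR Rd, Rm, #r: rotate right by r bits. */
uint32_t model_ror(uint32_t x, unsigned r)
{
    r &= 31u;
    return r ? (x >> r) | (x << (32 - r)) : x;
}

/* Model of REV Rd, Rm: reverse the byte order of a word. */
uint32_t model_rev(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Model of PKHTB Rd, Rn, Rm, ASR #sh for 1 <= sh <= 16:
   top halfword from Rn, bottom halfword from the shifted Rm. */
uint32_t model_pkhtb(uint32_t rn, uint32_t rm, unsigned sh)
{
    return (rn & 0xffff0000u) | ((rm >> sh) & 0x0000ffffu);
}

/* The four-instruction ARMv6 sequence, one line per instruction. */
uint32_t swap_middle_v6(uint32_t r0)
{
    uint32_t r1;
    r0 = model_ror(r0, 8);         /* ROR   R0, R0, #8          -> ddaabbcc */
    r1 = model_rev(r0);            /* REV   R1, R0              -> ccbbaadd */
    r0 = model_pkhtb(r0, r1, 16);  /* PKHTB R0, R0, R1, ASR #16 -> ddaaccbb */
    r0 = model_ror(r0, 24);        /* ROR   R0, R0, #24         -> aaccbbdd */
    return r0;
}
```

For example, swap_middle_v6(0xAABBCCDD) returns 0xAACCBBDD, the same result as the ARMv4 mask-and-shift version.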