ARM assembly loop

for (int i = 0; i < 10000; i++)
  a[i] = b[i] + c[i];

What does the ARM assembly for this high-level code look like?

Edit: I'm also assuming the base address of A is in R8, the base address of B is in R9, and the base address of C is in R10, and that A, B and C are all int arrays.

Much appreciated

I tried:

MOV  R0, #0  ; Init r0 (i = 0)


        a[i] = b[i] + c[i]   //How to fix this? 

        ADD  R0, R0, #1 ;Increment it

        CMP  R0, #1000 ;Check the limit

        BLE  Loop  ;Loop if not finished
By : CyberShot

for (int i = 0; i 
By : old_timer

To build on @alpera's answer: you could also unroll the loop to do 4 operations at once, although whether you get a performance benefit depends on whether the memory access or the pipeline stall around the branch is the bigger effect.

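mov r11,#0x2700          ; r11 = 0x2700
orr r11,r11,#0x0010      ; r11 = 0x2710 = 10000 (loop counter)
top:
ldmia r9!, {r0-r3}       ; load b[i..i+3], advance r9
ldmia r10!, {r4-r7}      ; load c[i..i+3], advance r10
add r0,r0,r4
add r1,r1,r5
add r2,r2,r6
add r3,r3,r7
stmia r8!, {r0-r3}       ; store a[i..i+3], advance r8
subs r11,r11,#4          ; 4 elements done, sets flags
bne top                  ; repeat until the counter reaches 0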
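
For comparison, here is a rough C sketch of the same 4-way unrolling. The name add_unrolled4 and its parameters are made up for illustration. The tail loop is an addition for lengths that are not a multiple of 4, which the assembly does not need because 10000 divides evenly.

```c
#include <stddef.h>

/* Hypothetical helper: a[i] = b[i] + c[i], four elements per pass. */
void add_unrolled4(int *a, const int *b, const int *c, size_t n)
{
    size_t i = 0;

    /* Main body: compute four sums first, then store them,
       in the same order as the ldmia / add x4 / stmia sequence. */
    for (; i + 4 <= n; i += 4) {
        int t0 = b[i]     + c[i];
        int t1 = b[i + 1] + c[i + 1];
        int t2 = b[i + 2] + c[i + 2];
        int t3 = b[i + 3] + c[i + 3];
        a[i]     = t0;
        a[i + 1] = t1;
        a[i + 2] = t2;
        a[i + 3] = t3;
    }

    /* Tail: leftover elements when n is not a multiple of 4. */
    for (; i < n; i++)
        a[i] = b[i] + c[i];
}
```

Calling add_unrolled4(a, b, c, 10000) would cover the loop from the question.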

If you have a NEON unit handy, you could do it that way too, in which case it will parallelize the loads, stores and adds, in effect reducing the problem to 5 instructions that perform two iterations of the loop at once.

A C compiler will not generate code this tight by default (or parallelize for NEON), as it must assume that the buffers used for reading and writing (r8, r9 and r10) can potentially overlap; hence a write through r8 might immediately be read in the next iteration of the loop through r9 or r10. You can use the restrict (__restrict in C++) modifier to tell the compiler that this is not the case.
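
As a minimal sketch of what that looks like in C99: the function name add_arrays and its parameters are hypothetical. Whether the compiler then actually unrolls or vectorizes the loop still depends on the target and the optimization flags, so treat this as an illustration of the qualifier rather than a guarantee.

```c
#include <stddef.h>

/* Hypothetical helper. restrict is a promise from the caller that
   a, b and c never overlap, so a store to a[i] cannot change any
   b[] or c[] value read later. */
void add_arrays(int *restrict a,
                const int *restrict b,
                const int *restrict c,
                size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
```

Passing overlapping buffers to a function declared this way is undefined behaviour, so only use it when the arrays really are distinct.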

By : marko
