Looping Structures
PRESENTED BY
AHMED KOLA
IBRAHIM HAKIM
NADER AHMED
TABLE OF CONTENTS
• Fixed number of iterations
• Variable number of iterations
• Loop unrolling
FIXED NUMBER OF ITERATIONS
        MOV   R0, #0         ; initialize sum to 0
        MOV   R1, #0         ; initialize loop counter i to 0
        LDR   R2, =data      ; R2 points to the start of the data array
loop    LDR   R3, [R2], #4   ; load the word at R2 into R3, then increment R2 by 4 (size of int)
        ADD   R0, R0, R3     ; add R3 to sum (R0)
        ADD   R1, R1, #1     ; increment loop counter i
        CMP   R1, #64        ; compare i with 64
        BNE   loop           ; if i != 64, branch back to loop
                             ; after the loop, sum (R0) holds the result
        BX    LR             ; return from function; the result is already in R0
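The assembly above is essentially the compiled form of a simple count-up loop in C. The function below is a hedged sketch, not code from the slides: the name checksum_count_up and the fixed 64-word packet are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name. Sums a fixed 64-word packet with a counter that
   counts UP, so each iteration pays for an increment, a compare with 64
   and a branch (ADD, CMP, BNE in the listing above). */
int checksum_count_up(const int *data)
{
    assert(data != NULL);
    int sum = 0;
    for (unsigned int i = 0; i < 64; i++) {
        sum += *(data++);   /* LDR R3, [R2], #4 followed by ADD R0, R0, R3 */
    }
    return sum;
}
```

Because the loop limit is 64 rather than 0, the compiler cannot reuse the flags from the increment and needs an explicit compare on every iteration.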
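This loop is not efficient: besides the load and the add, every iteration spends three instructions (ADD, CMP, BNE) on loop control. On the ARM, loop control should take only two instructions:
■ A subtract to decrement the loop counter, which also sets the condition code flags on the result
■ A conditional branch instruction
To achieve this, the loop counter should count down to zero rather than up to an arbitrary limit; the comparison with zero then comes for free from the flags set by the subtract.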
checksum_v6
        MOV   r2, r0               ; r2 = data
        MOV   r0, #0               ; sum = 0
        MOV   r1, #0x40            ; i = 64
checksum_v6_loop
        LDR   r3, [r2], #4         ; r3 = *(data++): load a word from [r2], then r2 += 4
        SUBS  r1, r1, #1           ; i--, decrement r1 by 1 and set the condition flags
        ADD   r0, r3, r0           ; sum += r3
        BNE   checksum_v6_loop     ; if (i != 0) goto checksum_v6_loop
        MOV   pc, lr               ; return sum
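Only the compiled output of checksum_v6 appears above. A C routine that could plausibly compile to it is sketched below; it is a reconstruction that assumes a 64-word packet passed as a pointer, not the authors' source.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed reconstruction of a C routine whose compiled form matches
   checksum_v6: the unsigned counter starts at 64 and counts DOWN, so the
   test "i != 0" is taken from the flags set by SUBS. */
int checksum_v6(const int *data)
{
    assert(data != NULL);
    int sum = 0;
    for (unsigned int i = 64; i != 0; i--) {
        sum += *(data++);   /* LDR r3, [r2], #4 ; ADD r0, r3, r0 */
    }
    return sum;
}
```

The loop body is identical to the count-up version; only the direction of the counter changes, which is what removes the CMP.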
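For an unsigned loop counter i, we can use either of the loop continuation
conditions i!=0 or i>0. As i can’t be negative, they are the same condition. For
a signed loop counter, it is tempting to use the condition i>0 to continue the
loop. You might expect the compiler to generate the following two instructions
to implement the loop:
        SUBS  r1, r1, #1    ; compare i with 1, i = i - 1
        BGT   loop          ; if (i + 1 > 1) goto loop
However, a compiler may instead emit SUB r1,r1,#1 followed by CMP r1,#0 and BGT loop,
because the two versions disagree when i = -0x80000000: SUBS compares the old i with 1,
so the loop exits, whereas decrementing first wraps i to +0x7fffffff, which is greater
than zero, so the loop continues. Using the condition i!=0 for both signed and unsigned
loop counters avoids the extra compare.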
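The SUBS/BGT pair is not equivalent to "decrement i, then test i>0" when i = -0x80000000 (INT32_MIN). The C sketch below models, for a 32-bit register, whether each instruction sequence takes the branch back to the loop. The helper names are hypothetical, and the code models the flag behaviour rather than compiling to these exact instructions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models "SUBS r1,r1,#1 ; BGT loop".
   SUBS sets the flags as if comparing the OLD i with 1, and GT accounts
   for signed overflow, so the branch is taken exactly when i > 1. */
static bool branches_subs_bgt(int32_t i)
{
    return i > 1;
}

/* Models "SUB r1,r1,#1 ; CMP r1,#0 ; BGT loop".
   The decrement wraps modulo 2^32 first, then the NEW value is compared
   with 0 as a signed number. Unsigned arithmetic keeps the wrap well
   defined in C. */
static bool branches_sub_cmp_bgt(int32_t i)
{
    uint32_t next = (uint32_t)i - 1u;          /* wraps to 0x7fffffff for INT32_MIN */
    return next != 0u && next < 0x80000000u;   /* equivalent to signed next > 0 */
}
```

For every i except INT32_MIN the two helpers agree. For INT32_MIN the first returns false (the loop exits) and the second returns true (the loop keeps running).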
VARIABLE NUMBER OF ITERATIONS
• Now, suppose we want our checksum routine to handle packets of arbitrary size.
• We pass in a variable N giving the number of words in the data packet.
• The checksum_v7 example shows how the compiler handles a for loop with a variable number of iterations N.
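A hedged C sketch of a checksum_v7-style routine follows, assuming the packet is passed as a pointer plus an unsigned word count N. Because a for loop may run zero times, the compiler has to test N != 0 once before entering the loop body.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch with an assumed signature: sums N words. The for loop must
   handle N == 0, so an extra test guards the first iteration. */
int checksum_v7(const int *data, unsigned int N)
{
    assert(data != NULL || N == 0);
    int sum = 0;
    for (; N != 0; N--) {
        sum += *(data++);
    }
    return sum;
}
```

Counting N itself down to zero keeps the two-instruction loop structure used for the fixed-count case.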
A do-while loop removes the test for N being zero that occurs in a for loop.
This is safe only when the caller guarantees N > 0, because a do-while body always executes at least once.
Compare its compiled output with the output for checksum_v7 to see the two-cycle saving.
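A minimal C sketch of the do-while form, assuming a pointer to the packet and a word count N that the caller guarantees is nonzero. The name checksum_do_while is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name. Precondition: N > 0. The body runs before the first
   test, so no separate "N == 0" check is needed before the loop. */
int checksum_do_while(const int *data, unsigned int N)
{
    assert(data != NULL && N > 0);   /* a for loop would tolerate N == 0 */
    int sum = 0;
    do {
        sum += *(data++);
    } while (--N != 0);              /* decrement and test against zero */
    return sum;
}
```

The assert only documents the precondition; with NDEBUG defined it disappears and does not affect the generated loop.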
LOOP UNROLLING
We saw that each loop iteration costs two instructions in addition to the body of the
loop: a subtract to decrement the loop count and a conditional branch.
We call these instructions the loop overhead. On ARM7 or ARM9 processors the
subtract takes one cycle and the branch three cycles, giving an overhead of four
cycles per iteration. You can save some of these cycles by unrolling a loop: repeating
the loop body several times and reducing the number of loop iterations by the
same proportion. For example, let’s unroll our packet checksum example four
times.
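As a worked estimate, take a 64-word packet and count only the loop overhead, using the four cycles per iteration quoted above for ARM7 or ARM9 (one for the subtract, three for the branch). This ignores the loop body and treats every branch as taken, so it is an approximation rather than a measured figure.

```latex
\begin{aligned}
\text{rolled loop:} \quad & 64 \text{ iterations} \times 4 \text{ cycles} = 256 \text{ cycles} \\
\text{unrolled four times:} \quad & \tfrac{64}{4} = 16 \text{ iterations} \times 4 \text{ cycles} = 64 \text{ cycles}
\end{aligned}
```

Under these assumptions, unrolling four times removes about 192 cycles of overhead per packet, at the cost of a loop body four times larger.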
The following code unrolls our packet checksum loop four times. We
assume that the number of words in the packet N is a multiple of four.
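A C sketch of the checksum loop unrolled four times follows. It assumes, as the text does, that N is a multiple of four, and additionally that N is nonzero because it builds on the do-while form. The name checksum_unrolled4 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name. Precondition: N is a nonzero multiple of four.
   Each pass sums four words, so the loop overhead (one subtract and one
   branch) is paid once per four words instead of once per word. */
int checksum_unrolled4(const int *data, unsigned int N)
{
    assert(data != NULL && N > 0 && N % 4 == 0);
    int sum = 0;
    do {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        N -= 4;              /* four words consumed per iteration */
    } while (N != 0);
    return sum;
}
```

If N were not a multiple of four, N -= 4 would skip past zero and wrap around, so the precondition matters; handling a remainder would need extra code after the unrolled loop.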
SUMMARY