Looping Structures

Uploaded by

xixabir956
Copyright
© All Rights Reserved
LOOPING STRUCTURES

PRESENTED BY

AHMED KOLA
IBRAHIM HAKIM
NADER AHMED

TABLE OF CONTENTS
• Fixed number of iterations
• Variable number of iterations
• Loop unrolling
FIXED NUMBER OF ITERATIONS
        MOV   R0, #0          ; sum = 0
        MOV   R1, #0          ; loop counter i = 0
        LDR   R2, =data       ; R2 points to the start of the data array
loop    LDR   R3, [R2], #4    ; R3 = *R2, then R2 += 4 (post-indexed, size of int)
        ADD   R0, R0, R3      ; sum += R3
        ADD   R1, R1, #1      ; i++
        CMP   R1, #64         ; compare i with 64
        BNE   loop            ; if i != 64, branch back to loop
                              ; after the loop, sum is already in R0 (the return register)
        BX    LR              ; return from function
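
For reference, a C loop of the kind this listing implements might look like the sketch below. The function name checksum_up and the use of uint32_t are assumptions made for illustration, not taken from the slides; unsigned arithmetic is used so that the sum wraps predictably.

```c
#include <stdint.h>

/* Hypothetical C source for a count-up checksum over 64 words. */
uint32_t checksum_up(const uint32_t *data)
{
    uint32_t sum = 0;

    /* i counts up, so each pass needs an increment, a compare with 64,
       and a conditional branch. */
    for (uint32_t i = 0; i < 64; i++) {
        sum += *data++;   /* load one word and advance the pointer */
    }
    return sum;
}
```

Calling checksum_up on a buffer of 64 words returns their sum modulo 2^32.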

It takes three instructions to implement the for loop structure:

■ An ADD to increment i
■ A compare to check if i is less than 64
■ A conditional branch to continue the loop if i < 64

This is not efficient. On the ARM, a loop should only use two instructions:
■ A subtract to decrement the loop counter, which also sets the condition code flags on the result
■ A conditional branch instruction

The key point is that the loop counter should count down to zero rather than counting up to some arbitrary limit. Then the comparison with zero is free since the result is stored in the condition flags. Since we are no longer using i as an array index, there is no problem in counting down rather than up.
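
A count-down version of the loop could be written in C as in the sketch below. The name checksum_down and the uint32_t types are illustrative assumptions, and a given compiler may or may not emit the two-instruction loop control for it.

```c
#include <stdint.h>

/* Hypothetical count-down checksum over 64 words. */
uint32_t checksum_down(const uint32_t *data)
{
    uint32_t sum = 0;

    /* i starts at the iteration count and runs down to zero, so the
       decrement itself can provide the "is it zero yet?" test. */
    for (uint32_t i = 64; i != 0; i--) {
        sum += *data++;   /* the pointer, not i, walks the array */
    }
    return sum;
}
```

The result is the same as for a count-up loop; only the loop control changes.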

checksum_v6
        MOV   r2, r0            ; r2 = data
        MOV   r0, #0            ; sum = 0
        MOV   r1, #0x40         ; i = 64
checksum_v6_loop
        LDR   r3, [r2], #4      ; r3 = *(data++): load word at r2, then r2 += 4
        SUBS  r1, r1, #1        ; i--, and set the condition flags
        ADD   r0, r3, r0        ; sum += r3
        BNE   checksum_v6_loop  ; if i != 0, branch to checksum_v6_loop
        MOV   pc, lr            ; return sum

For an unsigned loop counter i we can use either of the loop continuation conditions i!=0 or i>0. As i can’t be negative, they are the same condition. For a signed loop counter, it is tempting to use the condition i>0 to continue the loop. You might expect the compiler to generate the following two instructions to implement the loop:

        SUBS  r1,r1,#1          ; compare i with 1, i=i-1
        BGT   loop              ; if (i+1>1) goto loop

In fact, the compiler will generate

        SUB   r1,r1,#1          ; i--
        CMP   r1,#0             ; compare i with 0
        BGT   loop              ; if (i>0) goto loop

The extra compare is needed because the two sequences disagree when i = -0x80000000: there the subtraction overflows, so BGT after SUBS does not branch, while comparing the wrapped result with zero does.
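
To see where the two sequences can disagree, the C sketch below models the N, Z and V flags of a 32-bit subtraction and the GT condition (Z clear and N equal to V). All names here (flags_t, flags_sub, cond_gt and the two continues_ functions) are hypothetical helpers written for this illustration; it is a simplified model, not an ARM emulator.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the N, Z and V flags after computing a - b. */
typedef struct { bool n, z, v; } flags_t;

static flags_t flags_sub(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    flags_t f;
    f.n = (r >> 31) != 0;                     /* result is negative     */
    f.z = (r == 0);                           /* result is zero         */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;   /* signed overflow on sub */
    return f;
}

/* GT condition: Z clear and N equal to V. */
static bool cond_gt(flags_t f)
{
    return !f.z && (f.n == f.v);
}

/* Models SUBS r1,r1,#1 followed by BGT: flags come from i - 1. */
bool continues_subs_bgt(uint32_t i)
{
    return cond_gt(flags_sub(i, 1));
}

/* Models SUB r1,r1,#1 ; CMP r1,#0 ; BGT: flags come from (i - 1) - 0. */
bool continues_sub_cmp_bgt(uint32_t i)
{
    return cond_gt(flags_sub(i - 1, 0));
}
```

For ordinary values such as 5 or 1 both functions agree. For the bit pattern 0x80000000 (that is, i = -0x80000000) the subtraction overflows: the first function reports that the loop stops, the second that it continues.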

VARIABLE NUMBER OF ITERATIONS
• Now, suppose we want our checksum routine to handle packets of arbitrary size.
• We pass in a variable N giving the number of words in the data packet.
• The checksum_v7 example shows how the compiler handles a for loop with a variable number of iterations N.
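
A for loop with a variable iteration count can be sketched in C as follows. The name checksum_n and the unsigned types are assumptions for illustration; this is written in the spirit of the routine described above, not copied from it.

```c
#include <stdint.h>

/* Hypothetical checksum over n words, where n may be zero. */
uint32_t checksum_n(const uint32_t *data, uint32_t n)
{
    uint32_t sum = 0;

    /* The loop condition is tested before the first pass, so the
       compiler has to handle the n == 0 case with an up-front check. */
    for (; n != 0; n--) {
        sum += *data++;
    }
    return sum;
}
```

With n equal to zero the body never runs and the function returns 0.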
This example shows how to use a do-while loop to remove the test for N being zero that occurs in a for loop.

Compare this with the output for checksum_v7 to see the two-cycle saving.
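
A do-while form of the loop might look like the sketch below. The name checksum_dowhile is hypothetical, and the code assumes the caller guarantees n is at least 1; with n equal to zero it would read past the data and loop for a very long time.

```c
#include <stdint.h>

/* Hypothetical do-while checksum. Assumes n >= 1 (caller's duty). */
uint32_t checksum_dowhile(const uint32_t *data, uint32_t n)
{
    uint32_t sum = 0;

    /* The body always runs once before the count is tested, so no
       initial check for n == 0 is required. */
    do {
        sum += *data++;
    } while (--n != 0);

    return sum;
}
```

The saving comes purely from dropping the initial zero test, so it only applies when the at-least-one-iteration guarantee really holds.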

LOOP UNROLLING

We saw that each loop iteration costs two instructions in addition to the body of the
loop: a subtract to decrement the loop count and a conditional branch.

We call these instructions the loop overhead. On ARM7 or ARM9 processors the
subtract takes one cycle and the branch three cycles, giving an overhead of four
cycles per loop. You can save some of these cycles by unrolling a loop—repeating
the loop body several times, and reducing the number of loop iterations by the
same proportion. For example, let’s unroll our packet checksum example four
times.

The following code unrolls our packet checksum loop four times. We assume that the number of words in the packet N is a multiple of four.
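
As a rough guide to the benefit: with a four-cycle overhead per iteration, N iterations spend 4N cycles on loop control, while a loop unrolled four times runs N/4 iterations and spends about N. A minimal C sketch of such an unrolled loop follows. The name checksum_unrolled4 is hypothetical, and the code assumes n is a nonzero multiple of four.

```c
#include <stdint.h>

/* Hypothetical checksum unrolled four times.
   Assumes n is a nonzero multiple of four. */
uint32_t checksum_unrolled4(const uint32_t *data, uint32_t n)
{
    uint32_t sum = 0;

    do {
        /* Four copies of the loop body per pass. */
        sum += *data++;
        sum += *data++;
        sum += *data++;
        sum += *data++;
        n -= 4;            /* one decrement and branch per four words */
    } while (n != 0);

    return sum;
}
```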

There are two questions you need to ask when unrolling a loop:
• How many times should I unroll the loop?
• What if the number of loop iterations is not a multiple of the unroll amount?

For example, what if N is not a multiple of four in checksum_v9?
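
One common way to handle a count that is not a multiple of four is to run the unrolled loop for the whole groups of four and then a short loop for the leftover words. The sketch below illustrates that approach under assumed names (checksum_any_n is hypothetical); it is one possible answer, not necessarily the one shown in the original slides.

```c
#include <stdint.h>

/* Hypothetical checksum for any n: unrolled main loop plus remainder. */
uint32_t checksum_any_n(const uint32_t *data, uint32_t n)
{
    uint32_t sum = 0;
    uint32_t i;

    /* Main loop: n / 4 passes, four words per pass. */
    for (i = n / 4; i != 0; i--) {
        sum += *data++;
        sum += *data++;
        sum += *data++;
        sum += *data++;
    }

    /* Remainder loop: the 0 to 3 words left over (n mod 4). */
    for (i = n & 3; i != 0; i--) {
        sum += *data++;
    }

    return sum;
}
```

The remainder loop runs at most three times, so its overhead stays small compared with the unrolled main loop on large packets.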

SUMMARY

Writing loops efficiently:

■ Use loops that count down to zero. Then the compiler does not need to allocate a register to hold the termination value, and the comparison with zero is free.
■ Use unsigned loop counters by default and the continuation condition i!=0 rather than i>0. This will ensure that the loop overhead is only two instructions.
■ Use do-while loops rather than for loops when you know the loop will iterate at least once. This saves the compiler checking to see if the loop count is zero.
■ Unroll important loops to reduce the loop overhead. Do not over-unroll. If the loop overhead is small as a proportion of the total, then unrolling will increase code size and hurt the performance of the cache.
■ Try to arrange that the number of elements in arrays is a multiple of four or eight. You can then unroll loops easily by two, four, or eight times without worrying about the leftover array elements.
THANK YOU
