ARM MC Module 03
The following short script shows you how to invoke armcc on a C file test.c.
The -Otime switch optimizes for execution efficiency rather than space and mainly affects the layout of for and while
loops.
If you are using the gcc compiler, then the following short script generates a similar
assembler output listing:
The -fomit-frame-pointer switch prevents the GNU compiler from maintaining a
frame pointer register.
• ARM processors have 32-bit registers and 32-bit data processing operations.
• The ARM architecture is a RISC load/store architecture. In other words, you must load
values from memory into registers before acting on them.
• There are no arithmetic or logical instructions that manipulate values in memory
directly.
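As a rough illustration of what load/store means for C code, the sketch below spells out the three steps a compiler has to generate for a statement such as (*counter)++. The function name bump_counter is hypothetical and only serves the example.

```c
/* Hypothetical helper: increments a counter that lives in memory.
   On a load/store architecture this cannot be done in one step. */
void bump_counter(int *counter)
{
    int tmp = *counter;  /* load the value from memory into a register */
    tmp = tmp + 1;       /* do the arithmetic on the register          */
    *counter = tmp;      /* store the result back to memory            */
}
```

Writing (*counter)++ directly is equivalent; the temporary only makes the load, operate, and store phases visible.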
LOCAL VARIABLE TYPES
• ARMv4-based processors can efficiently load and store 8-, 16-, and 32-bit data.
• However, most ARM data processing operations are 32-bit only. So, use a 32-bit
datatype, int or long, for local variables wherever possible.
• Avoid using char and short as local variable types, even if you are manipulating an
8- or 16-bit value.
• The one exception is when you want wrap-around to occur.
• If you require modulo arithmetic of the form 255 + 1 = 0, then use the char type.
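A minimal sketch of the wrap-around exception: an 8-bit sequence number that is meant to roll over from 255 to 0. The function name next_sequence is hypothetical. The type is written as unsigned char here, an assumption made so that the 0 to 255 range holds on any compiler, since the signedness of plain char is implementation-defined.

```c
/* Hypothetical example: an 8-bit sequence number that wraps from 255 to 0.
   The narrow type is used on purpose, because the wrap-around is wanted. */
unsigned char next_sequence(unsigned char seq)
{
    return (unsigned char)(seq + 1);  /* 255 + 1 is reduced modulo 256 */
}
```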
To see the effect of local variable types, let’s consider a simple example: a
checksum function that sums the values in a data packet. Most communication
protocols (such as TCP/IP) have a checksum or cyclic redundancy check (CRC)
routine to check for errors in a data packet.
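The original listing is not reproduced here, so the following is only a sketch of what such a checksum function might look like. It assumes a packet of 64 words of type int (the limit 64 and the name checksum_v1 appear later in the text) and declares the loop counter with an 8-bit type, written as unsigned char so that the 0 to 255 range applies on any compiler.

```c
/* Sketch of a checksum over a 64-word packet (packet size is an assumption).
   The loop counter i is deliberately an 8-bit type to show the cost
   of narrow local variables. */
int checksum_v1(int *data)
{
    unsigned char i;
    int sum = 0;

    for (i = 0; i < 64; i++) {
        sum += data[i];
    }
    return sum;
}
```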
All ARM registers are 32-bit and all stack entries are at least 32-bit.
Furthermore, to implement the i++ exactly, the compiler must account for the
case when i = 255. Any attempt to increment 255 should produce the answer 0.
Now compare this to the compiler output where i is instead declared as an
unsigned int.
In the first case, the compiler inserts an extra AND instruction to reduce i to the
range 0 to 255 before the comparison with 64. This instruction disappears in the
second case.
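For reference, the second case might be written as in the sketch below. The name checksum_v2 and the 64-word packet of int are assumptions. The only change from an 8-bit counter is the type of i: a 32-bit counter fills the whole register, so the compiler has nothing to mask.

```c
/* Sketch of the same checksum with a 32-bit loop counter.
   No masking of i to the range 0..255 is needed. */
int checksum_v2(int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 0; i < 64; i++) {
        sum += data[i];
    }
    return sum;
}
```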
FUNCTION ARGUMENT TYPES
• Converting local variables from types char or short to type int increases
performance and reduces code size.
• The same holds for function arguments.
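The same point can be sketched for function arguments and return values. Both function names below are hypothetical. With short arguments, either the caller or the callee has to keep the values within 16 bits (which side does it depends on the compiler and calling convention), whereas the int version is plain 32-bit arithmetic.

```c
/* Narrow types: the result must be brought back into the 16-bit range,
   which typically costs extra sign-extension instructions. */
short add_narrow(short a, short b)
{
    return (short)(a + (b >> 1));
}

/* Wide types: ordinary 32-bit arithmetic, nothing to narrow. */
int add_wide(int a, int b)
{
    return a + (b >> 1);
}
```

Both functions return the same value whenever the result fits in 16 bits; the difference lies only in the instructions the compiler has to emit.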
C LOOPING STRUCTURES
• Loops with a fixed number of iterations
• Loops with a variable number of iterations
• Loop unrolling
1) Loops with a fixed number of iterations
The key point is that the loop counter should count down to zero rather
than counting up to some arbitrary limit.
Then the comparison with zero is free since the result is stored in the
condition flags.
Since we are no longer using i as an array index, there is no problem in counting down
rather than up.
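A count-down version of the checksum could look like the sketch below. The function name checksum_countdown and the 64-word packet are assumptions. The data is walked with a pointer, so i is free to run from 64 down to 0, and the loop test is a comparison with zero.

```c
/* Sketch of a fixed-count loop that counts down to zero.
   The pointer walks the data; i only counts the remaining iterations. */
int checksum_countdown(int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 64; i != 0; i--) {
        sum += *(data++);
    }
    return sum;
}
```

The termination test i != 0 is used because the decrement itself can set the condition flags, so no separate compare instruction is required.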
• The SUBS and BNE instructions implement the loop.
• The checksum example now has the minimum of four instructions per loop.
• This is much better than six for checksum_v1.
2) Loops with a variable number of iterations
• Notice that the compiler checks that N is nonzero on entry to the function. Often this check is
unnecessary since the array won’t be empty.
• In this case a do-while loop gives better performance and code density than a for loop.
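The two loop forms can be compared with the sketch below. Both function names are hypothetical. The for version must test N before the first iteration, while the do-while version skips that test and therefore assumes the caller guarantees N is at least 1.

```c
/* for loop: safe for N == 0, but needs an entry check before the loop. */
int checksum_for(int *data, unsigned int N)
{
    int sum = 0;

    for (; N != 0; N--) {
        sum += *(data++);
    }
    return sum;
}

/* do-while loop: no entry check. Assumes N >= 1; calling it with
   N == 0 would read past the data and loop about 2^32 times. */
int checksum_dowhile(int *data, unsigned int N)
{
    int sum = 0;

    do {
        sum += *(data++);
    } while (--N != 0);
    return sum;
}
```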
LOOP UNROLLING
Each loop iteration costs two instructions in addition to the body of the loop: a
subtract to decrement the loop count and a conditional branch.
These instructions are called the loop overhead.
On ARM7 or ARM9 processors the subtract takes one cycle and the branch three
cycles, giving an overhead of four cycles per loop.
You can save some of these cycles by unrolling a loop: repeating the loop body several
times and reducing the number of loop iterations by the same proportion.
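A checksum loop unrolled four times might look like the sketch below. The function name checksum_unrolled is hypothetical, and the code assumes N is nonzero and a multiple of four; other values of N would need extra handling for the leftover elements.

```c
/* Sketch of a loop unrolled four times.
   Assumes N is nonzero and a multiple of 4. */
int checksum_unrolled(int *data, unsigned int N)
{
    int sum = 0;

    do {
        sum += *(data++);   /* four copies of the loop body ...        */
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        N -= 4;             /* ... share one decrement and one branch  */
    } while (N != 0);
    return sum;
}
```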
• The loop overhead has reduced from 4N cycles to (4N)/4 = N cycles.
• On the ARM7TDMI, this accelerates the loop from 8 cycles per accumulate to 20/4 = 5
cycles per accumulate, nearly doubling the speed.
• For the ARM9TDMI, which has a faster load instruction, the benefit is even higher.
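The ARM7TDMI figures can be checked with a small calculation. The sketch below assumes the loop body (load plus accumulate) costs 4 cycles, which is inferred from the 8 cycles per accumulate quoted above minus the 4 cycles of loop overhead. The function name cycles_per_accumulate is hypothetical.

```c
/* Cycles per accumulate for a loop unrolled `unroll` times.
   One pass through the unrolled loop runs `unroll` bodies plus one
   subtract-and-branch overhead. Uses integer division, so the result
   is exact only when the total divides evenly by `unroll`. */
unsigned int cycles_per_accumulate(unsigned int body_cycles,
                                   unsigned int overhead_cycles,
                                   unsigned int unroll)
{
    return (body_cycles * unroll + overhead_cycles) / unroll;
}
```

With body_cycles = 4 and overhead_cycles = 4, an unroll factor of 1 gives (4 + 4)/1 = 8 and an unroll factor of 4 gives (16 + 4)/4 = 5, matching the numbers above.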