Module 3 Book1 - Merged
Microcontroller and Embedded Systems
data. But most ARM data processing operations are 32-bit only. For this reason, you should use a 32-bit data type, int or long, for local variables wherever possible. Avoid using char and short as local variable types, even if you are manipulating an 8- or 16-bit value. The one exception is when you want wrap-around to occur.
Example 1: The following code checksums a data packet containing 64 words. It shows why you should avoid using char for local variables.
int checksum_v1(int *data) {
char i; int sum = 0;
for (i = 0; i < 64; i++) {
sum += data[i];
}
return sum; }
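Declaring i as a char may look efficient, but on the ARM a char uses no less register or stack space than an int: all ARM registers are 32-bit and all stack entries are at least 32-bit. Furthermore, to implement the i++ exactly, the compiler must account for the case when i = 255. Any attempt to increment 255 should produce the answer 0, so the compiler has to insert an extra instruction after each increment to reduce i to the range 0 to 255.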
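A minimal sketch, assuming the same 64-word packet as Example 1. The hypothetical checksum_int below is the same loop with an unsigned int counter, so the compiler has no reason to truncate i to 8 bits after each increment. The hypothetical helper next_u8 spells out what i++ means for an 8-bit unsigned counter, which is the masking work a char counter forces on a 32-bit register.

```c
/* Hypothetical helper: i++ for an 8-bit unsigned counter.
   The result must wrap from 255 back to 0, so when the counter lives
   in a 32-bit register the sum has to be masked down to 8 bits. */
static unsigned int next_u8(unsigned int i) {
    return (i + 1u) & 0xFFu;
}

/* Hypothetical variant of checksum_v1 with a 32-bit loop counter.
   unsigned int matches the register width, so no masking is needed. */
int checksum_int(const int *data) {
    unsigned int i;
    int sum = 0;
    for (i = 0; i < 64; i++) {
        sum += data[i];
    }
    return sum;
}
```

For a packet holding the values 0 to 63, checksum_int returns 2016, the same result as checksum_v1, without the per-iteration masking.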
Example 2: Suppose the data packet contains 16-bit values and we need a 16-bit checksum. A first attempt might use a short accumulator:
short checksum_v3(short *data) {
unsigned int i;
short sum = 0;
for (i = 0; i < 64; i++) {
sum = (short)(sum + data[i]);
}
return sum; }
With armcc, this code produces a warning if you enable implicit narrowing cast warnings using the compiler switch -W+n. The expression sum + data[i] is an integer and so can only be assigned to a short using an (implicit or explicit) narrowing cast, and the compiler must insert extra instructions to perform that cast on every loop iteration.
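To avoid unnecessary casts, checksum_v4 uses an int type local variable for sum. It also increments the pointer data instead of using an index offset data[i], because on ARMv4 LDRH does not support the scaled index offset that data[i] would need for a short array.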
short checksum_v4(short *data) {
    unsigned int i;
    int sum = 0;   /* int accumulator: no narrowing cast inside the loop */
    for (i = 0; i < 64; i++) {
        /* The *(data++) operation translates to a single ARM instruction
           that loads the data and increments the data pointer. */
        sum += *(data++);
    }
    return (short)sum;   /* a single narrowing cast, at the end */
}
Function argument types: For armcc in ADS, function arguments are passed narrow and
values returned narrow. The caller casts argument values and the callee casts return values. The
compiler uses the ANSI prototype of the function to determine the datatypes of the function
arguments.
If code uses addition, subtraction, and multiplication, then there is no performance difference
between signed and unsigned operations. However, there is a difference when it comes to
division. Consider the following short example that averages two integers:
int average_v1(int a, int b) {
return (a+b)/2; }
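The compiler adds one to the sum before shifting right if the sum is negative. In other words, it replaces x/2 by the statement (x<0) ? ((x+1) >> 1) : (x >> 1). It must do this because x is signed: C integer division rounds toward zero, whereas an arithmetic right shift rounds toward minus infinity. For example, -3 >> 1 = -2 but -3/2 = -1.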
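A minimal sketch of the difference, using hypothetical function names. div2_by_shift reproduces the replacement the compiler makes for a signed x/2, assuming that >> on a negative int is an arithmetic shift as on ARM compilers (in ISO C this is implementation-defined). The unsigned average needs no correction, so a power-of-two division becomes a plain right shift. Like average_v1, neither average handles overflow of a + b.

```c
/* Signed average, as in average_v1: (a+b)/2 rounds toward zero. */
int average_signed(int a, int b) {
    return (a + b) / 2;
}

/* The compiler's replacement for x/2 when x is signed: add one before
   shifting if x is negative. Assumes >> on a negative int is an
   arithmetic shift (implementation-defined in ISO C). */
int div2_by_shift(int x) {
    return (x < 0) ? ((x + 1) >> 1) : (x >> 1);
}

/* Unsigned average: the divide by 2 can become one logical shift right,
   with no test for a negative sum. */
unsigned int average_unsigned(unsigned int a, unsigned int b) {
    return (a + b) / 2u;
}
```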
For local variables held in registers, do not use a char or short type unless 8-bit or 16-bit modular arithmetic is necessary. Use the signed or unsigned int types instead. Unsigned types are faster for division operations.
For array entries and global variables held in main memory, use the type with the smallest size possible to hold the required data. This saves memory footprint. The ARMv4 architecture is efficient at loading and storing all data widths, provided you traverse arrays by incrementing the array pointer. Avoid using offsets from the base of the array with short type arrays, as LDRH does not support this.
Use explicit casts when reading array entries or global variables into local variables, or writing local variables out to array entries. The casts make it clear that, for fast operation, you are taking a narrow width type stored in memory and expanding it to a wider type in the registers. Switch on implicit narrowing cast warnings in the compiler to detect implicit casts.
Avoid implicit or explicit narrowing casts in expressions because they usually cost extra cycles.
Avoid char and short types for function arguments or return values. Instead use the int type even if the range of the parameter is smaller. This prevents the compiler from performing unnecessary casts.
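A minimal sketch that applies these guidelines together, with hypothetical names throughout (g_total, add_int, add_samples). The global and the sample array keep the narrow short type in memory; each value is widened once with an explicit cast on load, all arithmetic and the helper's arguments use int, and a single explicit narrowing cast happens on the final store.

```c
/* Hypothetical global held in memory: the smallest type that fits. */
short g_total;

/* int arguments and return value: no casts at the call boundary. */
int add_int(int a, int b) {
    return a + b;
}

/* Adds n 16-bit samples to g_total. */
void add_samples(const short *samples, unsigned int n) {
    int total = (int)g_total;           /* explicit widening cast on load */
    for (; n != 0; n--) {
        int s = (int)*(samples++);      /* traverse by incrementing the pointer */
        total = add_int(total, s);
    }
    g_total = (short)total;             /* one explicit narrowing cast on store */
}
```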
C Looping Structures
For a loop with an incrementing count i++ that runs while i < 64, the compiler uses three instructions to implement the for loop structure:
An ADD to increment i
A compare to check if i is less than 64
A conditional branch to continue the loop if i < 64
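A minimal sketch of the count-down alternative recommended under Points to remember, assuming the same 64-word packet and a hypothetical name, checksum_down. Because the loop tests i != 0, the compiler can typically fold the decrement and the comparison with zero into one flag-setting subtract followed by a conditional branch, so the loop overhead drops from three instructions to two.

```c
/* Hypothetical count-down version of the 64-word checksum.
   The loop counter runs from 64 to 0 and the continuation test is
   i != 0, so no register is needed for the termination value 64. */
int checksum_down(const int *data) {
    unsigned int i;
    int sum = 0;
    for (i = 64; i != 0; i--) {
        sum += *(data++);
    }
    return sum;
}
```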
When the loop is known to execute at least once, a do-while loop gives better performance than a for loop.
The do-while loop removes the test for the iteration count N being zero that occurs at the start of a for loop, and hence it gives better performance than a for loop.
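A minimal sketch for a variable number of iterations, using a hypothetical name, checksum_n, and assuming the caller guarantees N >= 1. The body runs before the counter is tested, so there is no up-front check for N == 0. Calling it with N == 0 would make the unsigned counter wrap and read far past the buffer, which is exactly the case the for loop's extra test guards against.

```c
/* Hypothetical checksum over N words; requires N >= 1.
   The do-while executes the body first and only then tests
   the decremented counter against zero. */
int checksum_n(const int *data, unsigned int N) {
    int sum = 0;
    do {
        sum += *(data++);
    } while (--N != 0);
    return sum;
}
```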
Loop unrolling: repeating the loop body several times and reducing the number of loop iterations by the same proportion. This reduces the loop overhead (counter update, compare, and branch) paid per element processed.
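A minimal sketch of unrolling by four, using a hypothetical name, checksum_unrolled, and assuming N is a nonzero multiple of 4. Each pass performs four loads and additions but pays the decrement-and-branch overhead only once.

```c
/* Hypothetical checksum unrolled four times.
   Requires N to be a nonzero multiple of 4. */
int checksum_unrolled(const int *data, unsigned int N) {
    int sum = 0;
    do {
        /* four copies of the loop body per iteration */
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        N -= 4;
    } while (N != 0);
    return sum;
}
```

If N is not a multiple of 4, a short second loop can handle the remaining N % 4 elements.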
Points to remember
Use loops that count down to zero. Then the compiler does not need to allocate a register
to hold the termination value, and the comparison with zero is free.
Use unsigned loop counters by default and the continuation condition i!=0 rather than i>0. This will ensure that the loop overhead is only two instructions.
Use do-while loops rather than for loops when you know the loop will iterate at least once. This saves the compiler checking to see if the loop count is zero.
Unroll important loops to reduce the loop overhead. Do not overunroll: if the loop overhead is small as a proportion of the total, then unrolling will increase code size and hurt the performance of the cache.
Register Allocation
The compiler attempts to allocate a processor register to each local variable you use in a C function.
It will try to use the same register for different local variables if the use of the variables does not
overlap. When there are more local variables than available registers, the compiler stores the
excess variables on the processor stack. These variables are called spilled or swapped out
variables since they are written out to memory (in a similar way to how virtual memory is swapped out
to disk). Spilled variables are slow to access compared to variables allocated to registers.
To implement a function efficiently, you need to minimize the number of spilled variables and ensure that the most important and frequently accessed variables are stored in registers.
C compiler register usage
The table shows the standard register names and usage when following the ARM-Thumb procedure call standard (ATPCS), which is used in code generated by C compilers.
Provided the compiler is not using software stack checking or a frame pointer, the C compiler can use registers r0 to r12 and r14 to hold variables. It must save the callee values of r4 to r11 and r14 on the stack if using these registers.
In theory, the C compiler can assign 14 variables to registers without spillage. In practice, some compilers use a fixed register such as r12 for intermediate scratch working and do not assign variables to this register. Also, complex expressions require intermediate working registers to evaluate. Therefore, to ensure good assignment to registers, try to limit the internal loop of functions to using at most 12 local variables.
If the compiler does need to swap out variables, then it chooses which variables to swap out
based on frequency of use. A variable used inside a loop counts multiple times.
The register keyword in C hints that a compiler should allocate the given variable to a register.
Different compilers treat this keyword in different ways, and different architectures have a different number of available registers (for example, Thumb and ARM).
Try to limit the number of local variables in the internal loop of functions to 12. The
compiler should be able to allocate these to ARM registers.
Guide
uide the compiler as to which variables are important by ensuring these variables are
used within the innermost loop.
Function Calls
The first four integer arguments are passed in the first four ARM registers: r0, r1, r2, and r3. Subsequent integer arguments are placed on the full descending stack, ascending in memory, as shown in the figure below. Function return integer values are passed in r0.
This description covers only integer or pointer arguments. Two-word arguments such as long long or double are passed in a pair of consecutive argument registers and returned in r0, r1. The compiler may pass structures in registers or by reference according to command line compiler options. Functions with four or fewer arguments are far more efficient to call than functions with five or more arguments. For functions with four or fewer arguments, the compiler can pass all the arguments in registers. For functions with more arguments, both the caller and callee must access the stack for some arguments.
Example: The following code creates a Queue structure and passes this to the function to reduce
the number of function arguments.
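A minimal sketch of this idea, with hypothetical names (the Queue fields start, end and ptr, and the function queue_bytes). Without the structure, the function would need the buffer start, end and write position as well as the data pointer and byte count, five arguments in all, so one would go on the stack. With the structure it takes three arguments, which all fit in r0 to r3.

```c
/* Hypothetical queue descriptor: one pointer to this structure
   replaces three separate arguments. */
typedef struct {
    char *start;   /* first byte of the circular buffer */
    char *end;     /* one past the last byte of the buffer */
    char *ptr;     /* next write position */
} Queue;

/* Copies n bytes into the circular buffer, wrapping ptr back to
   start when it reaches end. Three arguments: all passed in registers. */
void queue_bytes(Queue *queue, const char *data, unsigned int n) {
    char *p = queue->ptr;
    char *end = queue->end;
    while (n != 0) {
        *(p++) = *(data++);
        if (p == end) {
            p = queue->start;
        }
        n--;
    }
    queue->ptr = p;
}
```

Copying the structure fields into locals p and end also keeps the hot values in registers inside the loop.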
There are other ways of reducing function call overhead if the function is very small and corrupts few registers (uses few local variables). Put the C function in the same C file as the functions that will call it. The C compiler then knows the code generated for the callee function and can make optimizations in the caller function:
The caller function need not preserve registers that it can see the callee does not corrupt. Therefore the caller function need not save all the ATPCS corruptible registers.
If the callee function is very small, then the compiler can inline the code in the caller function. This removes the function call overhead completely.
Try to ensure that small functions are defined in the same source file as, and before, the functions that call them. The compiler can then optimize the function call or inline the small function.
Critical functions can be inlined using the inline keyword.
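A minimal sketch with hypothetical names (clamp_u8, sum_clamped): a small helper defined in the same source file as its caller and before it, marked static inline so the compiler can substitute its body at the call site. The keyword is a hint; the compiler may still decide not to inline.

```c
/* Hypothetical small helper, defined before its caller in the same
   file. static inline allows the compiler to inline it and remove
   the call overhead entirely. */
static inline int clamp_u8(int x) {
    if (x < 0) return 0;
    if (x > 255) return 255;
    return x;
}

/* Caller in the same source file: the compiler sees clamp_u8's body
   and can inline it inside the loop. */
int sum_clamped(const int *values, unsigned int n) {
    int sum = 0;
    for (; n != 0; n--) {
        sum += clamp_u8(*(values++));
    }
    return sum;
}
```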
Pointer Aliasing
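Two pointers are said to alias when they point to the same address. If you write to one pointer, it will affect the value you read from the other. In a function, the compiler often does not know which pointers can alias and which cannot. The compiler must therefore be pessimistic and assume that any write to a pointer may affect the value read from any other pointer, which can significantly reduce code efficiency.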
Example: The function below increments two timer values by a step amount:
void timers_v1(int *timer1, int *timer2, int *step) {
*timer1 += *step;
*timer2 += *step; }
The compiler loads from step twice. Usually, a compiler optimization called common subexpression elimination would kick in so that *step was only evaluated once, and the value reused for the second occurrence. However, the compiler cannot use this optimization here. The pointers timer1 and step might alias one another, i.e. the compiler cannot be sure that the write to timer1 does not affect the read from step.
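In this case, if timer1 and step do point to the same address, the second value of *step differs from the first and equals the updated *timer1. This forces the compiler to insert an extra load instruction.
The same problem occurs if you use structure accesses rather than direct pointer access.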
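A minimal sketch of the usual remedy, using a hypothetical name, timers_local: read *step once into a local variable. A write through timer1 cannot change a local whose address is never taken, so the compiler can keep step_local in a register and needs only one load from step.

```c
/* Hypothetical variant of timers_v1 that caches *step in a local.
   The value is loaded once and reused for both timers. */
void timers_local(int *timer1, int *timer2, int *step) {
    int step_local = *step;
    *timer1 += step_local;
    *timer2 += step_local;
}
```

Note that this changes the result when timer1 and step really do alias: timers_v1 adds the updated value of *timer1 to timer2, while this version adds the original step value to both timers. Use it only where the pointers are known not to overlap.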
15. Explain the ARM software interrupt instruction (SWI) with an example code.
16. Explain the ARM program status register instructions with an example code.
17. Describe the coprocessor instructions of the ARM processor and discuss the coprocessor-15 instructions.
18. Explain with an example the different ways of loading constants used in the ARM processor.
19. Explain with an example the different basic C data types used by the ARM compiler.
20. With a program example, explain the advantages of using int rather than char and short types for local variables and function arguments.
21. Describe with an example the different looping structures used by the ARM compiler.
22. Explain loop unrolling concept with suitable program example.
23. Explain loops using a variable number of iterations with a program example.
24. Explain loop unrolling concept with suitable program example.
25. Explain in detail about Register Allocation.
26. Explain the function call operation with suitable program example.
27. Explain the pointer aliasing concept with suitable program example.
Reference:
1. Andrew N. Sloss, Dominic Symes and Chris Wright, ARM System Developer's Guide, Elsevier, Morgan Kaufmann Publishers, 2008.
Notes By: Veena Bhat