
Module 3 Book1 - Merged

The document discusses microcontroller and embedded systems, focusing on ARM assembly language instructions, data types, and optimization techniques for C programming. It emphasizes the importance of using appropriate data types, efficient looping structures, and register allocation to enhance performance. Additionally, it covers function calls, pointer aliasing, and provides examples and best practices for programming in ARM architecture.

Uploaded by

Deepa Jerin
Copyright
© © All Rights Reserved

Microcontroller and Embedded Systems

be encoded using a pc-relative expression.


Example: This example shows an LDR instruction loading a 32-bit constant 0xff00ffff into register r0.
LDR r0, [pc, #constant_number-8-{PC}]
:
constant_number
DCD 0xff00ffff
This example involves a memory access to load the constant, which can be expensive for
time-critical routines. The following example shows an alternative method to load the same
constant into register r0 by using an MVN instruction.
Example: Loading the constant 0xff00ffff using an MVN.
PRE none...
MVN r0, #0x00ff0000
POST r0 = 0xff00ffff
There are alternatives to accessing memory, but they depend upon the constant. The LDR
pseudo-instruction either inserts an MOV or MVN instruction to generate a value (if
possible) or generates an LDR instruction with a pc-relative address to read the constant
from a literal pool, a data area embedded within the code. The table below shows two
pseudo-instruction conversions. The first conversion produces a simple MOV instruction; the
second conversion produces a pc-relative load.
Another useful pseudo-instruction is the ADR instruction, or address relative. This
instruction places the address of the given label into register Rd, using a pc-relative add or
subtract.
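The choice between MOV, MVN and a literal-pool load can be sketched in C. This is only an illustration of the idea, assuming the classic ARM data-processing immediate format (an 8-bit value rotated right by an even number of bits). The names rotl32, is_arm_immediate and choose_load are hypothetical and are not part of any assembler.

```c
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum load_kind { LOAD_MOV, LOAD_MVN, LOAD_LITERAL_POOL };

/* Rotate a 32-bit value left by n bits. */
uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31u;
    return n == 0 ? x : (x << n) | (x >> (32u - n));
}

/* Returns 1 if value is an 8-bit constant rotated right by an even amount
   (assumed immediate format of classic ARM data-processing instructions). */
int is_arm_immediate(uint32_t value) {
    unsigned rot;
    for (rot = 0; rot < 32; rot += 2) {
        /* Rotating left by rot undoes a rotate-right by rot. */
        if ((rotl32(value, rot) & ~(uint32_t)0xff) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Mirrors the decision described above for the LDR pseudo-instruction. */
enum load_kind choose_load(uint32_t value) {
    if (is_arm_immediate(value))  return LOAD_MOV;  /* value fits directly  */
    if (is_arm_immediate(~value)) return LOAD_MVN;  /* complement fits      */
    return LOAD_LITERAL_POOL;                       /* pc-relative LDR load */
}
```

For the constant 0xff00ffff used above, the complement 0x00ff0000 fits the immediate format, so the sketch selects MVN, which matches the MVN example.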

Basic C Data Types


ARM C compilers define char to be an unsigned 8-bit value, rather than a signed 8-bit value.
The compilers armcc and gcc use the datatype mappings shown in the table below.

Local variable types: ARMv4-based processors can efficiently load and store 8-, 16-, and 32-bit
data. But most ARM data processing operations are 32-bit only. For this reason, use a 32-bit data
type, int or long, for local variables wherever possible. Avoid using char and short as local
variable types, even if you are manipulating an 8- or 16-bit value. The one exception is when
you want wrap-around to occur.
Example 1: The following code checksums a data packet containing 64 words. It shows why you
should avoid using char for local variables.
int checksum_v1(int *data) {
    char i;
    int sum = 0;
    for (i = 0; i < 64; i++) {
        sum += data[i];
    }
    return sum;
}
All ARM registers are 32-bit and all stack entries are at least 32-bit, so declaring i as a char
saves neither register space nor stack space. To implement the i++ exactly, the compiler must
account for the case when i = 255. Any attempt to increment 255 should produce the answer 0.
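A minimal sketch of the fix is to declare the loop counter as unsigned int. The name checksum_v2 is assumed here to fill the gap between checksum_v1 and checksum_v3, and next_char_count is a hypothetical helper that only demonstrates the 8-bit wrap-around the compiler has to preserve for a char counter.

```c
/* Same checksum as checksum_v1, but the counter is a full 32-bit type,
   so no extra masking of i is needed after each increment. */
int checksum_v2(const int *data) {
    unsigned int i;
    int sum = 0;
    for (i = 0; i < 64; i++) {
        sum += data[i];
    }
    return sum;
}

/* Models i++ for an 8-bit counter: the result is reduced to 8 bits,
   so 255 + 1 gives 0. unsigned char is used to keep this portable. */
unsigned int next_char_count(unsigned int i) {
    return (unsigned char)(i + 1u);
}
```

The cast in next_char_count corresponds to the extra work the compiler must do on every iteration when the counter is declared as char.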
Example 2: The data packet contains 16-bit values for a 16-bit checksum.
short checksum_v3(short *data) {
    unsigned int i;
    short sum = 0;
    for (i = 0; i < 64; i++) {
        sum = (short)(sum + data[i]);
    }
    return sum;
}
If the loop body were written as sum += data[i], armcc would produce a warning when implicit
narrowing cast warnings are enabled using the compiler switch -W+n. The expression
sum + data[i] is an integer and so can only be assigned to a short using an (implicit or
explicit) narrowing cast.
The following checksum_v4 avoids unnecessary casts by using int type local variables. It also
increments the pointer data instead of using an index offset data[i].
short checksum_v4(short *data) {
    unsigned int i;
    int sum = 0;
    for (i = 0; i < 64; i++) {
        sum += *(data++);
    }
    return (short)sum;
}
The *(data++) operation translates to a single ARM instruction that loads the data and
increments the data pointer.
Function argument types: For armcc in ADS, function arguments are passed narrow and
values returned narrow. The caller casts argument values and the callee casts return values. The
compiler uses the ANSI prototype of the function to determine the datatypes of the function
arguments.
If code uses addition, subtraction, and multiplication, then there is no performance difference
between signed and unsigned operations. However, there is a difference when it comes to
division. Consider the following short example that averages two integers:
int average_v1(int a, int b) {
return (a+b)/2; }
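The compiler adds one to the sum before shifting right if the sum is negative. This is needed
because C division rounds toward zero, whereas a right shift of a negative value rounds toward
minus infinity. In other words, it replaces x/2 by the statement: (x<0) ? ((x+1) >> 1) : (x >> 1)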
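The replacement can be written out in C as a rough sketch. The function names half_signed and average_v2 are illustrative. The code assumes that >> on a negative int is an arithmetic shift, which is the case for ARM compilers and gcc but is implementation-defined in standard C.

```c
/* What the compiler effectively computes for a signed x/2.
   Assumption: >> on a negative int is an arithmetic (sign-extending) shift. */
int half_signed(int x) {
    return (x < 0) ? ((x + 1) >> 1) : (x >> 1);
}

/* With unsigned operands no correction is needed:
   the division by two is a single logical shift. */
unsigned int average_v2(unsigned int a, unsigned int b) {
    return (a + b) >> 1;
}
```

The unsigned version avoids the test and the extra add, which is why unsigned types are preferred when a routine divides.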
For local variables held in registers, do not use a char or short type unless 8-bit or 16-bit
modular arithmetic is necessary. Use the signed or unsigned int types instead. Unsigned types
are faster for division operations.
For array entries and global variables held in main memory, use the type with the smallest size
possible to hold the required data. This saves memory footprint. The ARMv4 architecture is
efficient at loading and storing all data widths provided you traverse arrays by incrementing the
array pointer. Avoid using offsets from the base of the array with short type arrays, as LDRH
does not support this.
Use explicit casts when reading array entries or global variables into local variables, or writing
local variables out to array entries. The casts make it clear that, for fast operation, you are
taking a narrow-width type stored in memory and expanding it to a wider type in the registers.
Switch on implicit narrowing cast warnings in the compiler to detect implicit casts.
Avoid implicit or explicit narrowing casts in expressions because they usually cost extra cycles.
Avoid char and short types for function arguments or return values. Instead use the int type even
if the range of the parameter is smaller. This prevents the compiler performing unnecessary
casts.
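The last point can be illustrated with a small sketch. The functions add_v1 and add_v2 are illustrative names: the first uses short arguments and a short return value, so a narrowing cast is needed on the result, while the second works entirely in int.

```c
/* Narrow types: the result must be cast back to 16 bits before returning,
   and the arguments must be guaranteed to be in short range. */
short add_v1(short a, short b) {
    return (short)(a + (b >> 1));
}

/* int types: no narrowing cast is needed anywhere. */
int add_v2(int a, int b) {
    return a + (b >> 1);
}
```

Both functions return the same value for small inputs; the difference is only in the casts the compiler has to generate around the call and the return.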
C Looping Structures
The example below shows how the compiler treats a loop with an incrementing count i++.
The compiled code uses three instructions to implement the loop:
An ADD to increment i
A compare to check if i is less than 64
A conditional branch to continue the loop if i < 64
When the loop instead counts down to zero, only the SUBS and BNE instructions implement the loop.
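The two loop styles can be sketched in C as follows. The function names checksum_up and checksum_down are illustrative; the comments describe the instruction pattern a compiler typically generates, not guaranteed output.

```c
/* Incrementing loop: typically an ADD, a CMP against 64
   and a conditional branch on every iteration. */
int checksum_up(const int *data) {
    unsigned int i;
    int sum = 0;
    for (i = 0; i < 64; i++) {
        sum += *(data++);
    }
    return sum;
}

/* Decrementing loop: the counter update sets the flags, so a SUBS
   followed by a BNE is typically enough and no CMP is required. */
int checksum_down(const int *data) {
    unsigned int i;
    int sum = 0;
    for (i = 64; i != 0; i--) {
        sum += *(data++);
    }
    return sum;
}
```

Both functions compute the same sum; only the loop overhead differs.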


Loops Using Variable Number of Iterations

In this case a do-while loop gives better performance than a for loop. The do-while loop
removes the test for N being zero that occurs in a for loop, so it is only suitable when the
loop is known to execute at least once.
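A minimal sketch of the two forms, assuming a checksum over N words. The names checksum_for and checksum_do are illustrative.

```c
/* for loop: the compiler must first test whether N is zero. */
int checksum_for(const int *data, unsigned int N) {
    int sum = 0;
    for (; N != 0; N--) {
        sum += *(data++);
    }
    return sum;
}

/* do-while loop: no initial test, so the caller must guarantee N >= 1. */
int checksum_do(const int *data, unsigned int N) {
    int sum = 0;
    do {
        sum += *(data++);
    } while (--N != 0);
    return sum;
}
```

Calling checksum_do with N equal to zero would read past the data, which is the price paid for dropping the initial test.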
Loop unrolling: Repeating the loop body several times, and reducing the number of loop
iterations by the same proportion.
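As a sketch, the checksum loop can be unrolled four times. The name checksum_unrolled is illustrative, and the code assumes N is a nonzero multiple of four.

```c
/* Loop body repeated four times; the loop overhead (counter update and
   branch) is paid once per four additions.
   Assumption: N is a nonzero multiple of 4. */
int checksum_unrolled(const int *data, unsigned int N) {
    int sum = 0;
    do {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        N -= 4;
    } while (N != 0);
    return sum;
}
```

If N may not be a multiple of four, a separate loop is needed to handle the remaining one to three elements.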

Points to remember
Use loops that count down to zero. Then the compiler does not need to allocate a register
to hold the termination value, and the comparison with zero is free.
Use unsigned loop counters by default and the continuation condition i!=0 rather than
i>0. This will ensure that the loop overhead is only two instructions.
Use do-while loops rather than for loops when you know the loop will iterate at least
once. This saves the compiler checking to see if the loop count is zero.
Unroll important loops to reduce the loop overhead. Do not overunroll. If the loop
overhead is small as a proportion of the total, then unrolling will increase code size and
hurt the performance of the cache.
Register Allocation
The compiler attempts to allocate a processor register to each local variable used in a C function.
It will try to use the same register for different local variables if the use of the variables does not
overlap. When there are more local variables than available registers, the compiler stores the
excess variables on the processor stack. These variables are called spilled or swapped out
variables since they are written out to memory (in a similar way virtual memory is swapped out
to disk). Spilled variables are slow to access compared to variables allocated to registers.
To implement a function efficiently, minimize the number of spilled variables and ensure that the
most important and frequently accessed variables are stored in registers.
C compiler register usage
The table shows the standard register names and usage when following the ARM-Thumb procedure
call standard (ATPCS), which is used in code generated by C compilers.
Provided the compiler is not using software stack checking or a frame pointer, then the C
compiler can use registers r0 to r12 and r14 to hold variables. It must save the callee values of r4
to r11 and r14 on the stack if using these registers.


In theory, the C compiler can assign 14 variables to registers without spillage. In practice, some
compilers use a fixed register such as r12 for intermediate scratch working and do not assign
variables to this register. Also, complex expressions require intermediate working registers to
evaluate. Therefore, to ensure good assignment to registers, try to limit the internal loop of
functions to using at most 12 local variables.
If the compiler does need to swap out variables, then it chooses which variables to swap out
based on frequency of use. A variable used inside a loop counts multiple times.
The register keyword in C hints that a compiler should allocate the given variable to a register.
Different compilers treat this keyword in different ways, and different architectures have a
different number of available registers (for example, Thumb and ARM).

Try to limit the number of local variables in the internal loop of functions to 12. The
compiler should be able to allocate these to ARM registers.
Guide the compiler as to which variables are important by ensuring these variables are
used within the innermost loop.
Function Calls

The ARM Procedure Call Standard (APCS) defines how to pass function arguments and return
values in ARM registers. The more recent ARM-Thumb Procedure Call Standard (ATPCS) covers
ARM and Thumb interworking as well.

The first four integer arguments are passed in the first four ARM registers: r0, r1, r2, and r3.
Subsequent integer arguments are placed on the full descending stack, ascending in memory as
shown in the figure below. Function return integer values are passed in r0.
This description covers only integer or pointer arguments. Two-word arguments such as long long
or double are passed in a pair of consecutive argument registers and returned in r0, r1. The
compiler may pass structures in registers or by reference according to command line compiler
options.
For functions with four or fewer arguments, the compiler can pass all the arguments in registers.
For functions with more arguments, both the caller and callee must access the stack for some
arguments.

Example: The following code creates a Queue structure and passes this to the function to reduce
the number of function arguments.
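A minimal sketch of such a routine is given here, since the listing itself is not reproduced in these notes. The structure layout, the field names and the function name queue_bytes_v2 are assumptions made for illustration: the routine copies N bytes into a circular queue and takes three arguments instead of five.

```c
/* Hypothetical queue descriptor grouping three related pointers. */
typedef struct {
    char *Q_start;  /* first byte of the queue buffer       */
    char *Q_end;    /* one past the last byte of the buffer */
    char *Q_ptr;    /* current insertion point              */
} Queue;

/* Three arguments fit in r0-r2, so nothing is passed on the stack. */
void queue_bytes_v2(Queue *queue, const char *data, unsigned int N) {
    char *Q_ptr = queue->Q_ptr;   /* local copies avoid repeated loads */
    char *Q_end = queue->Q_end;

    while (N != 0) {
        *(Q_ptr++) = *(data++);
        if (Q_ptr == Q_end) {
            Q_ptr = queue->Q_start;   /* wrap around to the start */
        }
        N--;
    }
    queue->Q_ptr = Q_ptr;   /* write the updated pointer back once */
}
```

Passing Q_start, Q_end and Q_ptr as separate arguments would bring the count to five, so at least one argument would have to be passed on the stack.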
There are other ways of reducing function call overhead if the function is very small and
corrupts few registers. Put the C function in the same C file as the
functions that will call it. The C compiler then knows the code generated for the callee function
and can make optimizations in the caller function:
The caller function need not preserve registers that it can see the callee does not corrupt.
Therefore the caller function need not save all the ATPCS corruptible registers.
If the callee function is very small, then the compiler can inline the code in the caller
function. This removes the function call overhead completely.

For efficient function calls:
Use structures to group related arguments and pass structure pointers instead of multiple
arguments.
Define small functions in the same source file and before the functions that call them.
The compiler can then optimize the function call or inline the small function.
Critical functions can be inlined using the inline keyword (__inline in armcc).
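A small sketch of the last two points, using the standard C99 inline keyword. The helper max_u and its caller largest_of_three are illustrative names.

```c
/* Small helper defined before its caller in the same source file,
   so the compiler can see its body and may inline it. */
static inline unsigned int max_u(unsigned int a, unsigned int b) {
    return a > b ? a : b;
}

/* Caller: if max_u is inlined, no function call overhead remains. */
unsigned int largest_of_three(unsigned int a, unsigned int b, unsigned int c) {
    return max_u(a, max_u(b, c));
}
```

Whether the call is actually inlined remains the compiler's decision; the keyword is a hint.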
Pointer Aliasing
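Two pointers are said to alias when they point to the same address. If you write to one pointer, it
will affect the value you read from the other pointer. In a function, the compiler often does not
know which pointers can alias and which cannot. The compiler must therefore
assume that any write to a pointer may affect the value read from any other pointer, which can
significantly reduce code efficiency.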
Example: The function below increments two timer values by a step amount:
void timers_v1(int *timer1, int *timer2, int *step) {
*timer1 += *step;
*timer2 += *step; }
The compiler loads from step twice. Usually, a compiler optimization called common
subexpression elimination would kick in so that *step was only evaluated once, and the value
reused for the second occurrence. But the compiler cannot use it here, because the pointers
timer1 and step might alias one another, i.e. the compiler cannot be sure that the write to timer1
does not affect the read from step.
In this case, the second value of *step could differ from the first, which
forces the compiler to insert an extra load instruction.
The same problem occurs if you use structure accesses rather than direct pointer access.
typedef struct {int step;} State;
typedef struct {int timer1, timer2;} Timers;
void timers_v2(State *state, Timers *timers) {
    timers->timer1 += state->step;
    timers->timer2 += state->step;
}
Avoiding Pointer Aliasing
Do not rely on the compiler to eliminate common subexpressions involving memory
accesses. Instead, create new local variables to hold the expression. This ensures the
expression is evaluated only once.
Avoid taking the address of local variables. The variable may be inefficient to access
from then on.
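The first point can be sketched by rewriting the timer function with a local variable. timers_v1 is repeated from the example above for comparison; the name timers_v3 is illustrative.

```c
/* Original: *step is read twice because timer1 might alias step. */
void timers_v1(int *timer1, int *timer2, int *step) {
    *timer1 += *step;
    *timer2 += *step;
}

/* Rewritten: *step is read once into a local variable,
   so only one load is needed whatever the pointers alias. */
void timers_v3(int *timer1, int *timer2, const int *step) {
    int local_step = *step;
    *timer1 += local_step;
    *timer2 += local_step;
}
```

The two versions give different results when timer1 and step really do alias, which is exactly why the compiler cannot make this change on its own. With a = 1 and b = 0, timers_v1(&a, &b, &a) leaves b equal to 2, whereas timers_v3(&a, &b, &a) leaves b equal to 1.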
Questions
1. Explain the MOV instruction set provided by ARM7 with the example for each.
2. With a neat diagram explain Barrel Shifter & its operation.
3. Discuss barrel shifter with arithmetic instructions.
4. Explain ARM arithmetic instructions with an example.
5. Explain ARM Logical instructions with an example.


6. Explain ARM comparison instructions with an example.
7. Explain ARM multiply instructions with an example.
8. Explain ARM branch instructions with an example.
9. Write a program for forward and backward branch by considering an example.
10. Discuss the categories of Load-Store instructions used with ARM.
11. Explain the ARM Single-Register and Multiple-Register load-store addressing modes
with example.
12. Discuss ARM stack operations with suitable example.
13. Explain the ARM swap instruction with an example code.
14. Write a program for simple data guard that can be used to protect data from being written

complete.
15. Explain the ARM software interrupt instruction (SWI) with an example code.
16. Explain the ARM program status register instructions with an example code.
17. Describe Co-Processor instructions of ARM Processor and discuss coprocessor-15
instruction.
18. Explain with an example the different loading constants used in ARM processor.
19. Explain with an example the different basic C data types used by arm compiler.
20. With program example explain the advantages of using int rather than char & short type
for local variables & function arguments.
21. Describe with an example different looping structure used by arm compiler.
22. Explain loop unrolling concept with suitable program example.
23. Explain loops using variable number of iterations with program example.
24. Explain loop unrolling concept with suitable program example.
25. Explain in detail about Register Allocation.
26. Explain the function call operation with suitable program example.
27. Explain the pointer aliasing concept with suitable program example.
Reference:
1. Andrew N. Sloss, Dominic Symes and Chris Wright, ARM System Developer's Guide,
Elsevier, Morgan Kaufmann Publishers, 2008.
Notes By: Veena Bhat