
ARM MC Module 03

Module 3 of Microcontrollers BCS402 covers C compilers and optimization, focusing on data types, looping structures, and performance enhancements. It emphasizes the importance of using appropriate data types and efficient looping techniques to improve execution speed and reduce code size. Key strategies include using 32-bit data types, counting down in loops, and unrolling loops for better performance.

Uploaded by

sd.manju.9901
Copyright
© All Rights Reserved

Microcontrollers BCS402

MODULE-3 C Compilers and Optimization: Basic C Data Types, C Looping Structures, Register Allocation, Function Calls, Pointer Aliasing, Portability Issues. No. of Hours: 8

Textbook 1: Chapter 5.1 to 5.7 and 5.13


RBT: L1, L2, L3
OVERVIEW OF C COMPILERS AND OPTIMIZATION
• C compilers have to translate the C function into assembler so that it works for all
possible inputs.
• The compiler must be conservative and assume all possible values for N and all
possible alignments for data.
• The examples refer to specific C compilers (armcc and gcc).
• The generated code is very dependent on the compiler vendor and compiler revision.

The following short script shows you how to invoke armcc on a C file test.c.

The -Otime switch optimizes for execution efficiency rather than space and mainly affects the layout of for and while
loops.
If you are using the gcc compiler, then the following short script generates a similar
assembler output listing:

The -fomit-frame-pointer switch prevents the GNU compiler from maintaining a
frame pointer register.

BASIC C DATA TYPES

• ARM processors have 32-bit registers and 32-bit data processing operations.
• The ARM architecture is a RISC load/store architecture. In other words, you must
load values from memory into registers before acting on them.
• There are no arithmetic or logical instructions that manipulate values in memory
directly.
LOCAL VARIABLE TYPES
• ARMv4-based processors can efficiently load and store 8-, 16-, and 32-bit data.
• However, most ARM data processing operations are 32-bit only. So, use a 32-bit
datatype, int or long, for local variables wherever possible.
• Avoid using char and short as local variable types, even if you are manipulating an
8- or 16-bit value.
• The one exception is when you want wrap-around to occur.
• If you require modulo arithmetic of the form 255 + 1 = 0, then use the char type
(unsigned char, to make the 0 to 255 range explicit).
• To see the effect of local variable types, consider a simple example: a
checksum function that sums the values in a data packet. Most communication
protocols (such as TCP/IP) have a checksum or cyclic redundancy check (CRC)
routine to check for errors in a data packet.
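The checksum example described above can be sketched roughly as follows. This is an illustrative reconstruction, not the original listing: the packet length of 64 words is taken from the comparison with 64 mentioned later in these notes, the name checksum_v2 is an assumption, and unsigned char is used in place of plain char so that the 0 to 255 range does not depend on the compiler.

```c
/* Sum a packet of 64 words using an 8-bit loop counter.
   The counter must stay in the range 0..255 after every i++. */
int checksum_v1(const int *data)
{
    unsigned char i;
    int sum = 0;

    for (i = 0; i < 64; i++) {
        sum += data[i];
    }
    return sum;
}

/* The same routine with a 32-bit loop counter (hypothetical name).
   The counter already fills a register, so no range reduction is needed. */
int checksum_v2(const int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 0; i < 64; i++) {
        sum += data[i];
    }
    return sum;
}
```

Both functions return the same sum; they differ only in the work the compiler may have to do to keep the counter within its declared range.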

Declaring i as a char does not save register or stack space: all ARM registers are 32-bit and all stack entries are at least 32-bit.
Furthermore, to implement i++ exactly, the compiler must account for the
case when i = 255. Any attempt to increment 255 should produce the answer 0.
Now compare this to the compiler output where i is instead declared as an
unsigned int.

In the first case, the compiler inserts an extra AND instruction to reduce i to the
range 0 to 255 before the comparison with 64. This instruction disappears in the
second case.
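To make the role of the extra AND concrete, the two increments can be written out in C. This is only a sketch of the arithmetic involved; the function names are hypothetical, and the code a particular compiler emits may differ.

```c
/* Increment an 8-bit counter that lives in a 32-bit register:
   add 1, then mask the result back into the range 0..255. */
unsigned int increment_8bit(unsigned int i)
{
    return (i + 1u) & 0xFFu;
}

/* Increment a 32-bit counter: a single add, with no mask. */
unsigned int increment_32bit(unsigned int i)
{
    return i + 1u;
}
```

For example, increment_8bit(255) wraps to 0, while increment_32bit(255) gives 256.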
FUNCTION ARGUMENT TYPES
• Converting local variables from types char or short to type int increases
performance and reduces code size.
• The same holds for function arguments.
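As an illustration of the point about function arguments, consider the pair of functions below. The names add_v1 and add_v2 are assumptions made for this sketch. With 16-bit arguments and return value, the caller or the callee (depending on the compiler and calling convention) has to narrow values to the short range; with int, no such step is needed.

```c
/* Narrow arguments and return value: the result must be reduced
   to the 16-bit range before it is returned. */
short add_v1(short a, short b)
{
    return (short)(a + (b >> 1));
}

/* Wide arguments and return value: plain 32-bit arithmetic. */
int add_v2(int a, int b)
{
    return a + (b >> 1);
}
```

For small inputs the two functions agree, for example add_v1(10, 4) and add_v2(10, 4) both give 12.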

SIGNED VERSUS UNSIGNED TYPES


• The previous sections demonstrate the advantages of using int rather than a
char or short type for local variables and function arguments. This section
compares the efficiencies of signed int and unsigned int.
• If your code uses addition, subtraction, and multiplication, then there is no
performance difference between signed and unsigned operations.
• However, there is a difference when it comes to division.
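The difference for division can be seen with a simple average. The sketch below uses hypothetical names (average_v1, average_v2). Signed division by 2 must round toward zero, so for a negative sum the compiler cannot use a bare shift and typically adds a correction first; unsigned division by 2 is a single logical shift right.

```c
/* Signed average: (a + b) / 2 rounds toward zero,
   so -3 / 2 is -1, not the -2 a plain shift would give. */
int average_v1(int a, int b)
{
    return (a + b) / 2;
}

/* Unsigned average: division by 2 can be a single shift. */
unsigned int average_v2(unsigned int a, unsigned int b)
{
    return (a + b) / 2;
}
```

If the values can never be negative, declaring them unsigned lets the compiler use the cheaper form.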

C LOOPING STRUCTURES
• Loops with a fixed number of iterations.
• Loops with a variable number of iterations.
• Loop unrolling.
1) Loops with a fixed number of iterations
• The key point is that the loop counter should count down to zero rather
than counting up to some arbitrary limit.
• Then the comparison with zero is free since the result is stored in the
condition flags.
• Since i is no longer used as an array index, there is no problem in counting down
rather than up.
• The SUBS and BNE instructions implement the loop.
• The checksum example now has the minimum of four instructions per loop iteration.
• This is much better than the six for checksum_v1.
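A count-down version of the checksum loop might look like the sketch below. The function name and the fixed length of 64 are assumptions carried over from the earlier checksum discussion. The data pointer is advanced directly, so i serves only as a counter and can run from 64 down to 0.

```c
/* Sum 64 words, counting the loop down to zero.
   The decrement sets the condition flags, so no separate
   compare against a limit is required. */
int checksum_countdown(const int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 64; i != 0; i--) {
        sum += *(data++);
    }
    return sum;
}
```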
2) Loops with a variable number of iterations

• Notice that the compiler checks that N is nonzero on entry to the function. Often this check is
unnecessary since the array won't be empty.
• In this case a do-while loop gives better performance and code density than a for loop.
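The two loop forms can be compared side by side. This is a sketch with hypothetical function names; the do-while version assumes the caller guarantees N is at least 1.

```c
/* for loop: safe for N == 0, so the compiler must test N on entry. */
int checksum_for(const int *data, unsigned int N)
{
    int sum = 0;

    for (; N != 0; N--) {
        sum += *(data++);
    }
    return sum;
}

/* do-while loop: the body always runs once, so no entry test is
   generated. The caller must guarantee N >= 1. */
int checksum_do_while(const int *data, unsigned int N)
{
    int sum = 0;

    do {
        sum += *(data++);
    } while (--N != 0);
    return sum;
}
```

Calling checksum_do_while with N equal to 0 would read past the array, which is why this form is only appropriate when the loop is known to iterate at least once.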
LOOP UNROLLING
• Each loop iteration costs two instructions in addition to the body of the loop: a
subtract to decrement the loop count and a conditional branch.
• These instructions are called the loop overhead.
• On ARM7 or ARM9 processors the subtract takes one cycle and the branch three
cycles, giving an overhead of four cycles per loop iteration.
• You can save some of these cycles by unrolling a loop: repeating the loop body several
times and reducing the number of loop iterations by the same proportion.

• The loop overhead is reduced from 4N cycles to (4N)/4 = N cycles.
• On the ARM7TDMI, this accelerates the loop from 8 cycles per accumulate to 20/4 = 5
cycles per accumulate, nearly doubling the speed.
• For the ARM9TDMI, which has a faster load instruction, the benefit is even higher.
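A loop unrolled four times can be sketched as follows. The function name is hypothetical, and the sketch assumes N is nonzero and a multiple of four; it is not safe for other values of N.

```c
/* Sum N words with the loop body repeated four times.
   Assumes N != 0 and N % 4 == 0. The subtract and branch now
   execute once per four accumulates instead of once per accumulate. */
int checksum_unrolled(const int *data, unsigned int N)
{
    int sum = 0;

    do {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        N -= 4;
    } while (N != 0);
    return sum;
}
```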

Writing Loops Efficiently


■ Use loops that count down to zero. Then the compiler does not need to allocate
a register to hold the termination value, and the comparison with zero is free.
■ Use unsigned loop counters by default and the continuation condition i!=0
rather than i>0. This will ensure that the loop overhead is only two instructions.
■ Use do-while loops rather than for loops when you know the loop will iterate at
least once. This saves the compiler checking to see if the loop count is zero.
■ Unroll important loops to reduce the loop overhead. Do not overunroll. If the
loop overhead is small as a proportion of the total, then unrolling will increase code
size and hurt the performance of the cache.
■ Try to arrange that the number of elements in an array is a multiple of four or
eight. You can then unroll loops easily by two, four, or eight times without worrying
about leftover array elements.
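When the array length cannot be arranged to be a multiple of four, an unrolled loop needs extra code for the remainder. The sketch below, with a hypothetical function name, shows one way to handle any N: a main loop that processes four elements per iteration, followed by a cleanup loop for the remaining zero to three elements.

```c
/* Sum N words for any N, unrolling the main loop four times. */
int checksum_any_length(const int *data, unsigned int N)
{
    int sum = 0;
    unsigned int i;

    /* Main loop: four elements per iteration. */
    for (i = N / 4; i != 0; i--) {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
    }

    /* Cleanup loop: the 0 to 3 elements left over. */
    for (i = N & 3; i != 0; i--) {
        sum += *(data++);
    }
    return sum;
}
```

The cleanup loop is the cost avoided by keeping array sizes at multiples of the unroll factor.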
