Mod 3

The document discusses the ARM architecture's handling of data types, emphasizing the use of 32-bit data types for local variables and function arguments to improve efficiency. It explains the importance of using appropriate loop structures and conditions for signed and unsigned counters, as well as techniques like loop unrolling to optimize performance. Additionally, it highlights the compiler's role in register allocation for local variables to enhance execution speed.

Uploaded by

Ann Elizabeth Thomas

Basic C Data Types

∙ ARM processors have 32-bit registers and 32-bit data processing operations.

∙ The ARM architecture is a RISC load/store architecture. In other words you
must load values from memory into registers before acting on them. There are
no arithmetic or logical instructions that manipulate values in memory directly.
∙ The ARMv4 architecture and above support signed 8-bit loads and signed and unsigned 16-bit loads and stores
directly, through new instructions (LDRSB, LDRSH, LDRH, STRH).

∙ Prior to ARMv4, ARM processors were not good at handling signed 8-bit or any 16-bit
values. Therefore ARM C compilers define char to be an unsigned 8-bit value, rather
than a signed 8-bit value as is typical in many other compilers.
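A minimal sketch of why this matters in practice: the signedness of plain char is implementation-defined, so code that relies on it behaves differently on ARM compilers (unsigned char) and on many x86 compilers (signed char). The function names are hypothetical. GCC's -fsigned-char and -funsigned-char options change the default if needed.

```c
#include <limits.h>

/* Returns 1 if plain char is unsigned on this compiler (CHAR_MIN is 0),
   as with ARM C compilers; returns 0 where plain char is signed. */
int plain_char_is_unsigned(void)
{
    return CHAR_MIN == 0;
}

/* Classic portability trap: where plain char is unsigned, -1 is stored as
   UCHAR_MAX (255 for an 8-bit char), so this comparison is always false. */
int char_holds_negative(void)
{
    char c = -1;
    return c < 0;
}
```

On an ARM compiler, char_holds_negative() returns 0, which can silently break loops or end-of-file checks written with a signed char in mind.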
LOCAL VARIABLE TYPES

∙ ARMv4-based processors can efficiently load and store 8-, 16-, and 32-bit data.
However, most ARM data processing operations are 32-bit only. For this reason,
you should use a 32-bit datatype, int or long, for local variables wherever
possible.

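∙ Avoid using char and short as local variable types, even if you are manipulating
an 8- or 16-bit value, because the compiler must then insert extra instructions to reduce
each result to the 8- or 16-bit range. If you require modulo arithmetic of the form
255 + 1 = 0, then use the char type.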
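The effect of a narrow local variable can be sketched with a loop counter. The functions below are hypothetical and assume a 64-word packet. They compute the same sum, but with the 8-bit counter an ARM compiler typically has to add an extra instruction (such as AND #0xff) after each increment to model 8-bit wrap-around, which the 32-bit counter avoids. unsigned char stands in for ARM's unsigned plain char so the sketch behaves the same on any host. The last function shows the one case where the narrow type earns its place: deliberate 255 + 1 = 0 wrap-around.

```c
/* 8-bit loop counter: each i++ must be reduced to the range 0..255. */
int checksum_char_counter(const int *data)
{
    unsigned char i;
    int sum = 0;
    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}

/* 32-bit loop counter: no range reduction is needed. */
int checksum_int_counter(const int *data)
{
    unsigned int i;
    int sum = 0;
    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}

/* Intended modulo-256 arithmetic: 255 + 1 wraps to 0. */
unsigned char wrap_increment(unsigned char x)
{
    return (unsigned char)(x + 1);
}
```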
Now suppose the data packet contains 16-bit values and we need a 16-bit checksum. It is tempting to write
the following C code:
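A hypothetical sketch of such code, assuming a 64-entry packet of short values: the tempting version accumulates in a short, so each addition is followed by a cast back to 16 bits (extra instructions on a 32-bit ARM). A common alternative keeps the running sum in an int and narrows once at the end. The two agree because truncation to 16 bits commutes with addition, assuming the modulo conversion to short that GCC and Clang implement.

```c
/* Tempting version: a short accumulator forces a narrowing cast on
   every iteration. */
short checksum_short_acc(const short *data)
{
    unsigned int i;
    short sum = 0;
    for (i = 0; i < 64; i++)
        sum = (short)(sum + data[i]);
    return sum;
}

/* Alternative: accumulate in a 32-bit int (64 * 32767 cannot overflow it)
   and narrow to 16 bits once, on return. */
short checksum_int_acc(const short *data)
{
    unsigned int i;
    int sum = 0;
    for (i = 0; i < 64; i++)
        sum += data[i];
    return (short)sum;
}
```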
FUNCTION ARGUMENT TYPES

• Consider the following simple function, which adds two 16-bit values, halving the
second, and returns a 16-bit sum:

• The input values a, b, and the return value will be passed in 32-bit ARM registers.
Should the compiler assume that these 32-bit values are in the range of a short
type, that is, -32,768 to 32,767?

• The compiler must make compatible decisions for the function caller and callee.

• Either the caller or callee must perform the cast to a short type.

• Function arguments are said to be passed wide if they are not reduced to the range of
their type, and narrow if they are.
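A sketch of the function described above, named add_v1 to match the text, together with a hypothetical add_int variant that uses int throughout. Both compute a plus half of b (b is halved by a right shift). With short types, a and b are promoted to int for the arithmetic and the result is narrowed back to short on return; it is exactly this narrowing that the caller or callee has to pay for.

```c
/* 16-bit arguments and result: someone must reduce values to the short range. */
short add_v1(short a, short b)
{
    return a + (b >> 1);   /* a and b are promoted to int; the result is converted to short */
}

/* int arguments and result: values already fill a 32-bit register, no casts needed. */
int add_int(int a, int b)
{
    return a + (b >> 1);
}
```

Right-shifting a negative b is implementation-defined in C; GCC and Clang perform an arithmetic shift.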
∙ We can tell which decision the compiler has made by looking at the assembly output for add_v1.
■ If the compiler passes arguments wide, then the callee must reduce function arguments
to the correct range.

■ If the compiler passes arguments narrow, then the caller must reduce the range.

■ If the compiler returns values wide, then the caller must reduce the return value to the
correct range.

■ If the compiler returns values narrow, then the callee must reduce the range before
returning the value.

∙ The gcc compiler we used is more cautious and makes no assumptions about the range of
argument values. This version of the compiler reduces the input arguments to the range of a
short in both the caller and the callee. It also casts the return value to a short type. Here is the
compiled code for add_v1:
• You can see that char or short function arguments and return values
introduce extra casts.

• These increase code size and decrease performance.

• It is more efficient to use the int type for function arguments and return values,
even if you are only passing an 8-bit value.
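To make the cost of these casts concrete, here is a hypothetical C model of what reducing a value to the range of a short means: sign-extending from bit 15. On ARM cores before ARMv6, which lack the SXTH instruction, this is typically a shift left by 16 followed by an arithmetic shift right by 16, so every narrowing costs extra instructions. The model uses unsigned arithmetic so that it is well-defined C; add_v1_explicit is a hypothetical name for add_v1 with the callee-side narrowing written out.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x to 32 bits (the effect of a cast to short). */
int32_t sext16(int32_t x)
{
    uint32_t low = (uint32_t)x & 0xFFFFu;       /* keep bits 0..15 */
    /* 0..0x7FFF stay unchanged; 0x8000..0xFFFF map to -32768..-1 */
    return (int32_t)(low ^ 0x8000u) - 0x8000;
}

/* add_v1 with the narrowing written out: arguments on entry, result on exit. */
int32_t add_v1_explicit(int32_t a, int32_t b)
{
    a = sext16(a);
    b = sext16(b);
    return sext16(a + (b >> 1));   /* b >> 1 assumes an arithmetic shift for negative b */
}
```

Each sext16 call corresponds to work the int version of the function never has to do.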
SIGNED VERSUS UNSIGNED TYPES

• For addition, subtraction, and multiplication there is no performance difference between
signed and unsigned operations. However, there is a difference when it comes to division.

• Consider the following short example that averages two integers:

Because C integer division rounds toward zero while an arithmetic shift right rounds toward minus infinity, a signed divide by two cannot be a single shift. For (a+b)/2 the compiler first adds the sign bit of the sum (1 if the sum is negative, 0 otherwise) and then shifts right arithmetically by one (ASR #1). Worked examples on 32-bit values:

A=4, B=-6:  A+B = -2 (0xFFFFFFFE); -2 + 1 = -1 (0xFFFFFFFF); -1 ASR 1 = -1
sum = -3:   -3 (0xFFFFFFFD) + 1 = -2 (0xFFFFFFFE); -2 ASR 1 = -1, matching -3/2 in C (a plain ASR would give -2)
y = 5-9:    y = -4 (0xFFFFFFFC); -4 + 1 = -3 (0xFFFFFFFD); -3 ASR 1 = -2 (0xFFFFFFFE)
4+8:        12 (0x0000000C) + 0 = 12; 12 ASR 1 = 6 (0x00000006)

For unsigned values no correction is needed: the division is a single logical shift right.
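A sketch of the averaging example in C, with hypothetical function names. average_signed and average_unsigned differ only in type; average_shift_emulated spells out the add-the-sign-bit-then-shift sequence, assuming (as GCC and Clang document) that >> on a negative int is an arithmetic shift. The sum is assumed not to overflow.

```c
#include <stdint.h>

/* Signed: C division truncates toward zero, so a correction is needed. */
int average_signed(int a, int b)
{
    return (a + b) / 2;
}

/* Unsigned: division by two is just a logical shift right. */
unsigned int average_unsigned(unsigned int a, unsigned int b)
{
    return (a + b) / 2;
}

/* Models: add, add the sign bit (sum LSR #31), arithmetic shift right by one. */
int32_t average_shift_emulated(int32_t a, int32_t b)
{
    int32_t sum = a + b;
    int32_t sign = (int32_t)((uint32_t)sum >> 31);   /* 1 if sum < 0, else 0 */
    return (sum + sign) >> 1;                         /* arithmetic shift right */
}
```

Where the values are known to be non-negative, declaring them unsigned lets the compiler drop the sign-bit correction.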
C Looping Structures - LOOPS WITH A FIXED NUMBER OF ITERATIONS

Consider how the compiler treats a loop with an incrementing count, for (i = 0; i < 64; i++).
It takes three instructions to implement the for loop structure:
■ An ADD to increment i
■ A compare to check if i is less than 64
■ A conditional branch to continue the loop if i < 64

This is not efficient. On the ARM, a loop should only use two instructions:

■ A subtract to decrement the loop counter, which also sets the condition
code flags on the result
■ A conditional branch instruction
The key point is that the loop counter should count down to zero rather than counting up to some arbitrary limit.
Then the comparison with zero is free since the result is stored in the condition flags.

Since we are no longer using i as an array index, there is no problem in counting down rather than up.

The SUBS and BNE instructions implement the loop. Our checksum example now has the minimum number of four
instructions per loop iteration. This is much better than six for checksum_v1 and eight for checksum_v3.
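A sketch of the count-down version of the 64-word checksum (the function name is hypothetical). Because i is no longer used as an index, the data pointer is advanced instead, and the test i != 0 lets the compiler reuse the flags set by the decrement, typically giving the SUBS and BNE pair described above.

```c
/* Counts down from 64 to 0; the pointer, not i, walks the array. */
int checksum_count_down(const int *data)
{
    unsigned int i;
    int sum = 0;
    for (i = 64; i != 0; i--)
        sum += *data++;
    return sum;
}
```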
Signed and Unsigned Loop Counter
• For an unsigned loop counter i we can use either of the loop continuation conditions i!=0 or i>0.
As i can’t be negative, they are the same condition.

• For a signed loop counter, it is tempting to use the condition i>0 to continue the loop. You might expect the
compiler to close the loop with two instructions: a SUBS that compares i with 1 and decrements it, followed by BGT.
Instead it decrements i, compares the result with 0, and then branches, which costs one extra instruction.

The compiler is not being inefficient. It must be careful about the case when i = -0x80000000, because the two sequences of
code generate different answers in this case. In the first sequence, the SUBS instruction compares i with 1 and then
decrements i. Since -0x80000000 < 1, the loop terminates. In the second sequence, we decrement i and then compare it
with 0. Modulo arithmetic means that i now has the value +0x7fffffff, which is greater than zero, so the loop continues for
many iterations. Of course, in practice, i rarely takes the value -0x80000000.

Therefore you should use the termination condition i!=0 for signed or unsigned loop counters. It saves one instruction over
the condition i>0 for signed i.
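A hypothetical sketch of the recommended form with a signed counter, plus a small model of the corner case. Signed overflow is undefined in C, so decrement_mod32 performs the decrement in uint32_t to mimic a 32-bit register. Converting a result above INT32_MAX back to int32_t is implementation-defined, but the corner case here lands exactly on INT32_MAX.

```c
#include <stdint.h>

/* Signed counter with the i != 0 test: the decrement's flags decide the branch. */
int checksum_signed_counter(const int *data)
{
    int i;
    int sum = 0;
    for (i = 64; i != 0; i--)
        sum += *data++;
    return sum;
}

/* Models a 32-bit register decrement with modulo-2^32 wrap-around. */
int32_t decrement_mod32(int32_t i)
{
    return (int32_t)((uint32_t)i - 1u);
}
```

decrement_mod32(INT32_MIN) yields INT32_MAX (0x7fffffff), the positive value that would keep an i>0 loop running.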
LOOPS USING A VARIABLE NUMBER OF ITERATIONS
Now suppose we want our checksum routine to handle packets of arbitrary size. We pass in a variable N giving
the number of words in the data packet.

The checksum_v7 example shows how the compiler handles a for loop with a variable number of iterations N.

Notice that the compiler checks that N is nonzero on entry to the function. Often this check is unnecessary since you
know that the array won't be empty. In this case a do-while loop gives better performance and code density than a
for loop, because it removes the test for N being zero that occurs in a for loop.
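A sketch of the two variable-length forms (function names hypothetical). The for version must handle N == 0, so the compiler emits a test before the first iteration; the do-while version executes the body first and assumes N >= 1. Calling it with N == 0 would wrap the unsigned counter and read far past the array, so it is only appropriate when the caller guarantees a non-empty packet.

```c
/* for loop: safe for N == 0, at the cost of an entry test. */
int checksum_for(const int *data, unsigned int N)
{
    int sum = 0;
    for (; N != 0; N--)
        sum += *data++;
    return sum;
}

/* do-while loop: no entry test; requires N >= 1. */
int checksum_do_while(const int *data, unsigned int N)
{
    int sum = 0;
    do {
        sum += *data++;
    } while (--N != 0);
    return sum;
}
```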
Loop Unrolling

In a count-down loop, each iteration costs two instructions in addition to the body of the loop:

• a subtract to decrement the loop count, and

• a conditional branch.
We call these instructions the loop overhead.

• On ARM7 or ARM9 processors the subtract takes one cycle and the branch takes three cycles,
giving an overhead of four cycles per iteration.

• You can save some of these cycles by unrolling a loop—repeating the loop body several times, and
reducing the number of loop iterations by the same proportion. For example, let’s unroll our packet
checksum example four times.
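A sketch of the checksum unrolled four times, in the spirit of checksum_v9 (a hypothetical reconstruction, not the original listing). It assumes N is a nonzero multiple of four; the loop overhead of one decrement and one branch is now paid once per four words instead of once per word.

```c
/* Four additions per iteration; N must be a nonzero multiple of 4. */
int checksum_unrolled4(const int *data, unsigned int N)
{
    int sum = 0;
    do {
        sum += *data++;
        sum += *data++;
        sum += *data++;
        sum += *data++;
        N -= 4;
    } while (N != 0);
    return sum;
}
```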
There are two questions you need to ask when unrolling a loop:

■ How many times should I unroll the loop?

■ What if the number of loop iterations is not a multiple of the unroll amount? For example, what if N is not a
multiple of four in checksum_v9?

To start with the first question, only unroll loops that are important for the overall performance of the application.
Otherwise unrolling will increase the code size with little performance benefit. Unrolling may even reduce
performance by evicting more important code from the cache.

For the second question, try to arrange it so that array sizes are multiples of your unroll amount. If this isn't
possible, then you must add extra code to take care of the leftover cases. This increases the code size a little but
keeps the performance high.
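One way to handle the leftover cases is sketched here, assuming any N (including zero) is allowed: an unrolled loop over whole groups of four, followed by a short loop for the remaining zero to three words. The function name is hypothetical.

```c
/* Unrolled main loop over N / 4 groups, then N % 4 leftover words. */
int checksum_unrolled_any(const int *data, unsigned int N)
{
    int sum = 0;
    unsigned int i;
    for (i = N / 4; i != 0; i--) {
        sum += data[0];
        sum += data[1];
        sum += data[2];
        sum += data[3];
        data += 4;
    }
    for (i = N & 3; i != 0; i--)   /* N & 3 equals N % 4 for unsigned N */
        sum += *data++;
    return sum;
}
```

The leftover loop runs at most three times, so its overhead stays small however large N is.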
REGISTER ALLOCATION

❑ The compiler attempts to allocate a processor register to each local variable you use in a
C function. It will try to use the same register for different local variables if their
lifetimes do not overlap.

❑ When there are more local variables than available registers, the compiler stores the
excess variables on the processor stack. These variables are called spilled or swapped-out
variables since they are written out to memory (in much the same way that virtual memory
pages are swapped out to disk).
❑ Spilled variables are slow to access compared to variables allocated to registers.
To implement a function efficiently, you need to:

■ minimize the number of spilled variables

■ ensure that the most important and frequently accessed variables are stored in
registers

■ Try to limit the number of local variables in the inner loop of a function to
12. The compiler should be able to allocate these to ARM registers.
FUNCTION CALLS
• The ARM Procedure Call Standard (APCS) defines how to pass function arguments
and return values in ARM registers. The more recent ARM-Thumb Procedure Call
Standard (ATPCS) covers ARM and Thumb interworking as well.

• The first four integer arguments are passed in the first four ARM registers: r0, r1, r2,
and r3. Subsequent integer arguments are placed on the full descending stack,
ascending in memory as in Figure 5.1. Integer return values are passed in r0.

• Two-word arguments such as long long or double are passed in a pair of
consecutive argument registers and returned in r0 and r1.

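Four-register rule: functions with four or fewer integer arguments are cheaper to call than functions with more,
because all of their arguments fit in r0 to r3. For a function with five or more arguments, both the caller and
the callee must access the stack for the extra ones.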
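A hypothetical sketch of the rule: blend6 takes six int arguments, so under the calling convention above only a to d travel in r0 to r3 while e and f go on the stack. Grouping related values into a structure and passing a single pointer is one common way to keep the argument list within four registers; the structure and function names are illustrative only.

```c
/* Six arguments: a..d in r0..r3, e and f passed on the stack. */
int blend6(int a, int b, int c, int d, int e, int f)
{
    return a * b + c * d + e * f;
}

struct blend_args {
    int a, b, c, d, e, f;
};

/* One pointer argument, passed in r0; the fields are loaded through it. */
int blend_struct(const struct blend_args *p)
{
    return p->a * p->b + p->c * p->d + p->e * p->f;
}
```

The structure version trades stacked arguments for loads through the pointer, which pays off mainly when the values already live together in memory.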
POINTER ALIASING

• Two pointers are said to alias when they point to the same address.

• If you write to one pointer, it will affect the value you read from the other pointer.

• In a function, the compiler often doesn’t know which pointers can alias and which
pointers can’t.

• The compiler must be very pessimistic and assume that any write through a pointer may affect
the value read through any other pointer, which can significantly reduce code efficiency.
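A small sketch of the problem (function names hypothetical). In add_step_reload the compiler must assume that writing *timer1 may change *step, so it reloads *step before the second addition. Reading *step once into a local removes that reload; C99's restrict qualifier is another way to promise the compiler that the pointers do not alias.

```c
/* *step must be reloaded after the write through timer1, which may alias it. */
void add_step_reload(int *timer1, int *timer2, const int *step)
{
    *timer1 += *step;
    *timer2 += *step;
}

/* Reads *step once; the value can stay in a register for both additions. */
void add_step_cached(int *timer1, int *timer2, const int *step)
{
    int s = *step;
    *timer1 += s;
    *timer2 += s;
}
```

The two versions really do differ when the pointers overlap: with x = 2, y = 10 and step pointing at x, the first leaves y at 14, the second at 12. That difference is exactly why the compiler cannot make the optimization on its own.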
