Module 2

The document discusses compiler optimization in embedded C. It explains that compiler optimization aims to reduce code size, memory access time, and power consumption while maintaining program correctness and reasonable compilation time. It also describes the basic data types used in ARM processors, noting that char is typically unsigned 8-bit while int and long are preferred over char and short for local variables. An example shows how declaring a loop counter as char adds unnecessary instructions compared to declaring it as int.


Chapter-3

INTRODUCTION TO THE ARM INSTRUCTION SET


The Instruction Set Architecture is fundamental for building fast, efficient computers that optimize
memory and processing resources. It specifies the following supported capabilities:

 instructions
 data types
 processor registers
 main memory hardware
 input/output model
 addressing modes

Programmers and system engineers rely on the ISA as the definitive description of what the processor can do and how to program it.

Instruction sets work with other important parts of a computer, such as compilers and interpreters.
Those components translate high-level programming code into machine code that the processor
can understand.

Think of the ISA as a programmer's gateway into the inner workings of a computer.

Opcode and Operand


Each assembly language statement is split into an opcode and operands. The opcode is the operation executed by the CPU, and the operands are the data, registers, or memory locations used to execute that instruction. For example, in ADD r0, r1, r2 the opcode is ADD and the operands are the registers r0, r1, and r2.
ARMv7 Instruction set Architecture.
 Different ARM architecture revisions support different instructions. However, new revisions are backward compatible.
 The ARMv7 architecture is a 32-bit processor architecture.
 It is also a load/store architecture, meaning that data-processing instructions operate only
on values in general purpose registers.
 Only load and store instructions access memory.
 General purpose registers are also 32 bits.
 By a word, we mean 32 bits. A double-word is therefore 64 bits and a half-word is 16 bits wide.
Classes of Instructions
1. Data Processing Instructions
2. Branch Instructions
3. Load-Store Instructions
4. Software Interrupt Instructions
5. Program Status Register Instructions

3.1 Data Processing Instructions


These are the fundamental arithmetic and logical operations of the processor and operate on values in
the general-purpose registers, or a register and an immediate value.
Multiply and divide instructions can be considered special cases of these instructions.
Data processing instructions mostly use one destination register and two source operands.
The general format is the instruction mnemonic, followed by the destination register and then the source operands.

The data processing operations include:


 Move Instructions
 Arithmetic Instructions
 Logical Instructions
 Comparison Instructions
 Multiply Instructions

3.1.1 Move Instructions


These are the simplest ARM instructions.
A move instruction copies N into a destination register Rd, where N is a register or an immediate value.
It is used to initialize a register or to transfer data between registers.
Example
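The original example is not reproduced here. As a hedged sketch, assuming a GCC toolchain targeting ARM state, the register and immediate forms of MOV can be shown with inline assembly (function names are illustrative):

#include <stdint.h>

/* Illustrative only: register form of MOV, Rd := N where N is a register. */
uint32_t move_register(uint32_t n)
{
    uint32_t rd;
    __asm__("MOV %0, %1" : "=r"(rd) : "r"(n));
    return rd;
}

/* Illustrative only: immediate form of MOV, Rd := 200. */
uint32_t move_immediate(void)
{
    uint32_t rd;
    __asm__("MOV %0, #200" : "=r"(rd));
    return rd;
}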

3.1.2 Barrel Shifter


The barrel shifter is a functional unit that can be used in a number of different circumstances. It sits in front of the ALU and pre-processes the second operand before it reaches the ALU.

It provides five types of shifts and rotates that can be applied to Operand2. (In ARM mode these are not separate instructions; they are applied as part of another instruction.)
Certain ARM instructions, such as MUL, CLZ and QADD, cannot use the barrel shifter.
The pre-processing shift occurs within the cycle time of the instruction, which is useful for multiplying or dividing a value by a power of 2.
Instructions that use the barrel shifter are illustrated below.
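The original examples are not reproduced here. As a hedged C sketch of the idea, an add of a shifted value typically maps onto a single instruction whose second operand passes through the barrel shifter:

/* y = a + 4*b: ARM compilers typically emit something like
       ADD r0, r0, r1, LSL #2
   using the barrel shifter on the second operand, with no separate shift instruction. */
int add_scaled(int a, int b)
{
    return a + (b << 2);
}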
3.1.3 Arithmetic Instruction
Used to carry out addition and subtraction of 32 bit signed and unsigned values.

Simple subtract instruction

Reverse Subtract Instruction
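The original instruction examples are not shown here. As a hedged C sketch, reverse subtract (RSB) is what a compiler typically uses when a variable is subtracted from a constant:

/* SUB computes x - y; RSB computes y - x (operands reversed).
   For 49 - x, ARM compilers typically emit a single
       RSB r0, r0, #49
   rather than loading 49 into a register and subtracting. */
int subtract_from_49(int x)
{
    return 49 - x;
}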


3.1.4 Using the Barrel shifter with Arithmetic Instructions

3.1.5 Logical Instruction


Bitwise logical operations can be performed using these instructions (AND, ORR, EOR, BIC).
3.1.6 Comparison Instructions
Comparison instructions are used to compare or test a register. They always update the cpsr flags
according to the result, so there is no separate S-suffix form of these instructions.
CMP is actually a subtract instruction but the results are simply discarded. However, the cpsr
flags are modified.
TST is a logical AND operation.
TEQ is logical Exclusive OR operation.
In all cases, only the cpsr register is modified; no other registers are changed, and the numeric
results are simply discarded.
3.1.7 Multiply Instructions
Multiply instructions multiply the contents of a pair of registers.
The long multiply instructions produce a 64-bit result. In that case, RdLo holds the lower 32 bits
of the 64-bit result and RdHi holds the upper 32 bits, so two destination registers must be
specified to hold the result.
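The original examples are not reproduced here. As a hedged C sketch, a long multiply typically appears when a 32x32 -> 64-bit product is computed:

#include <stdint.h>

/* On ARM this usually compiles to a single UMULL, e.g.
       UMULL r0, r1, r0, r1    ; RdLo = low 32 bits, RdHi = high 32 bits
   (exact register choice depends on the compiler). */
uint64_t mul_u32_to_u64(uint32_t a, uint32_t b)
{
    return (uint64_t)a * (uint64_t)b;
}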

3.2 Branch Instructions


Branch instructions change the flow of execution or call a subroutine.
3.3 Load Store Instructions
These instructions are used to transfer data between memory and processor registers.
There are three types:
 Single-register transfer,
 Multiple Register Transfer,
 Swap
3.3.1 Single-Register Transfer
These instructions are used to move a single data item into or out of a register.
With them, we can transfer signed or unsigned 32-bit or 16-bit data.
3.4 Software Interrupt Instructions (SWI)
It causes a software interrupt exception and provides a gateway for calling operating system
routines.
When the program executes an SWI instruction, the processor sets the program counter (pc) to the
vector offset 0x8.
It also forces the processor mode to SVC (supervisor mode).
Each SWI has an associated SWI number.
3.5 Program Status Register Instructions
The ARM instruction set provides two instructions namely MRS and MSR to directly control the
psr.
3.5.1 Co-Processor Instructions
These instructions are used to extend the instruction set.
A coprocessor can either provide additional computational capability or control the memory
management.
Coprocessor instructions fall into three groups: data processing, register transfer, and memory
transfer. These instructions are used only by cores that have a coprocessor.

3.6 Loading Constants


Since ARM instructions are themselves only 32 bits wide, they cannot encode an arbitrary 32-bit
constant directly.
So, to load a 32-bit constant into a register, two pseudo-instructions are employed.

3.7 Programs
Program 1: find the sum of the first 10 numbers.
Program 2: find the factorial of a number.
The result is stored in register R0.
Overview of C compilers and Optimization

5.1 Introduction

Comparison of C and Embedded C

1. C: a structured, general-purpose programming language used by developers to build
   desktop-based applications.
   Embedded C: generally used to develop microcontroller-based applications.
2. C: a high-level programming language.
   Embedded C: an extension variant of the C language.
3. C: hardware independent.
   Embedded C: truly hardware dependent.
4. C: the compilers are OS dependent.
   Embedded C: the compilers are OS independent.
5. C: traditional or standard compilers are used to build and run the program.
   Embedded C: a specific compiler is needed that can generate code for the target
   microcontroller.
6. C: well-known compilers include Intel C++ and Borland Turbo C.
   Embedded C: well-known compilers include the Keil compiler, BiPOM Electronics tools, and
   Green Hills Software.
What is compiler optimization in Embedded C?

Optimization is a series of actions taken by the compiler during code generation to reduce the
number of instructions (code-space optimization), memory access time (time optimization), and
power consumption.
The compiler optimization process should meet the following objectives:
 The optimization must be correct; it must not, in any way, change the meaning of the
program.
 Optimization should increase the speed and performance of the program.
 The compilation time must be kept reasonable.
 The optimization process should not delay the overall compiling process.

5.2 Basic Data types


ARM processors have 32-bit registers and 32-bit data-processing operations, and use a load/store
architecture.
(No arithmetic or logical operation can act directly on memory; values must first be loaded into
registers.)
Earlier ARM architectures (ARMv4 and below) were not efficient at handling signed 8-bit or
16-bit values, so ARM C compilers define char to be an unsigned 8-bit value rather than a signed
8-bit value.
(In memory, characters and numbers are all stored simply as numbers; what matters is how the
compiler treats a value that is declared as char.)

Data type mappings used by armcc and gcc
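The mapping table itself is not reproduced here. As a hedged illustration of one practical consequence (plain char being unsigned on ARM compilers), consider:

#include <stdio.h>

int main(void)
{
    char c = -1;    /* on ARM compilers plain char is unsigned, so c becomes 255 */

    if (c < 0) {
        printf("char is signed on this target\n");
    } else {
        printf("char is unsigned on this target\n");   /* typical result with armcc/gcc for ARM */
    }
    return 0;
}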

5.3 Local Variable Types


Although ARM processors can load and store 8-bit, 16-bit and 32-bit data efficiently, their
data-processing operations work only on 32-bit values. So, it is advisable to use the int or long
type for local variables and to avoid char and short, even when working with 8-bit or 16-bit
values.
The exception is when you rely on modulo arithmetic, for example when a wrap-around such as
255 + 1 = 0 is required. (Here one can use char.)
Reason to avoid char as local variable
Example
Consider a function written to find the checksum of a data packet containing 64 words, as below.
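The original listing is not reproduced here; the following is a hedged sketch of such a function, with illustrative names rather than the book's own, and the loop counter deliberately declared as char:

/* Sketch: sum a packet of 64 words using a char loop counter.
   Because i is a char, the compiler must keep i in the range 0..255,
   so the compiled loop typically contains an extra instruction such as
       AND r1, r1, #0xff
   after each increment. */
int checksum_v1(int *data)
{
    char i;
    int  sum = 0;

    for (i = 0; i < 64; i++) {
        sum += data[i];
    }
    return sum;
}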
Declaring the variable i as a char seems efficient, since a char occupies less space than an int in
a register or on the stack. However, this reasoning is wrong, because all ARM registers and stack
entries are 32 bits wide anyway.
For i++, the compiler must implement the behaviour required for i = 255: incrementing 255 must
wrap around to 0, so the compiler has to reduce i to the range 0-255 after each increment.
In the corresponding compiler output, this shows up as an extra AND instruction that masks i to 8 bits inside the loop.

Instead of declaring i as char, if we declare it as unsigned int, the AND instruction can be removed.
In the compiler output for the version where i is an unsigned int, the loop is correspondingly shorter.
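A hedged sketch of the corrected version (illustrative names):

/* With an unsigned int counter the wrap-to-255 bookkeeping disappears,
   so the compiler no longer needs the AND instruction in the loop. */
int checksum_v2(int *data)
{
    unsigned int i;
    int          sum = 0;

    for (i = 0; i < 64; i++) {
        sum += data[i];
    }
    return sum;
}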
Suppose the data packet contains 16-bit values and we need a 16-bit checksum. In that case, the function might be written with short data and a short accumulator, as sketched below.
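A hedged sketch of such a version (illustrative names):

/* 16-bit data and a 16-bit checksum: sum + data[i] is promoted to int,
   so the result must be cast back to short on every iteration. */
short checksum_v3(short *data)
{
    unsigned int i;
    short        sum = 0;

    for (i = 0; i < 64; i++) {
        sum = (short)(sum + data[i]);
    }
    return sum;
}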

The expression sum + data[i] is promoted to an int, so an explicit cast back to short is needed on
every iteration. In the corresponding compiler output, the loop is now three instructions longer
than the previous one.
The reasons are:
 The LDRH instruction does not allow a shifted address offset, so the address must be
calculated with a separate ADD, and the halfword at that address is then loaded and summed.
 The explicit cast to short requires two MOV instructions: the compiler shifts left by 16 and
then right by 16 to implement a 16-bit sign extend.
If the program is modified so that sum is an int inside the function and only the final result is
converted to short, the code is better optimized, as sketched below.
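A hedged sketch of the improved version (illustrative names):

/* Accumulate in a 32-bit int and cast to short only once, at the end.
   The *(data++) access typically becomes a single load with
   post-increment addressing, e.g. LDRH r3, [r0], #2. */
short checksum_v4(short *data)
{
    unsigned int i;
    int          sum = 0;

    for (i = 0; i < 64; i++) {
        sum += *(data++);
    }
    return (short)sum;
}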
The *(data++) operation translates to a single ARM instruction that loads the data and increments
the pointer (a load with post-indexed addressing). In the corresponding compiler output, the cast
inside the loop has disappeared.

The compiler still performs one cast to a 16-bit range on the return value, outside the loop.
If we make the function return an int, the two MOV instructions before the return can also be
removed.

We know that converting local variables from char or short to int improves performance and
reduces code size. The same holds true for function arguments and return values.
Consider a function that adds two 16-bit values, halving the second, and returns a 16-bit sum, as sketched below.
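A hedged sketch of such a function (the name add_v1 is illustrative):

/* Adds two 16-bit values, halving the second, and returns a 16-bit sum.
   The arguments and the return value all travel in 32-bit registers,
   which raises the question of who narrows them to the short range. */
short add_v1(short a, short b)
{
    return a + (b >> 1);
}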

This program is actually a useful test case to illustrate the problem faced by the compiler.
The input values a, b and the return value will be passed in 32 bit registers.
Should the compiler assume that these 32-bit values are already in the range of the short type
(i.e., -32768 to +32767)? Or should it force the values into this range by sign-extending the
lowest 16 bits to fill the 32-bit register?
The caller and the callee must therefore make compatible decisions about who performs the cast to
the short range.
If the compiler passes arguments wide, the callee must reduce the arguments to the correct range;
if it passes arguments narrow, the caller must do so.
In armcc, function arguments are passed narrow (the caller performs the cast) and return values
are returned narrow (the callee casts the return value). The compiler output for the function
shows this narrow passing of arguments and return value.

One version of the gcc compiler makes no assumptions about the range of the argument values: it
reduces the input arguments to the range of a short in both the caller and the callee, which costs
extra instructions in the compiled output.

For addition, subtraction and multiplication it makes no difference to performance whether the
operands are signed or unsigned.
However, division is different.
(A 32-bit int has a minimum value of -2,147,483,648 and a maximum value of 2,147,483,647, inclusive.)

For a signed value, the compiler adds one to the sum before shifting right if the sum is negative,
because dividing a negative number by 2 is not the same as an arithmetic shift right.
If the data type is unsigned int, this second ADD instruction is not needed.
To see this, try the code in Keil µVision 4.
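The original listing is not reproduced here; a hedged sketch of the point (illustrative names, and the assembly in the comments is only the typical shape of the output):

/* Signed divide by two: the compiler typically emits something like
       ADD r0, r0, r0, LSR #31   ; add 1 if the sum is negative
       MOV r0, r0, ASR #1
   because an arithmetic shift right rounds towards minus infinity,
   while C division rounds towards zero. */
int average_signed(int a, int b)
{
    return (a + b) / 2;
}

/* Unsigned divide by two: a plain logical shift right is enough,
   so the extra ADD disappears. */
unsigned int average_unsigned(unsigned int a, unsigned int b)
{
    return (a + b) / 2;
}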

Efficient use of C types


 For local variables held in registers, do not use char or short unless 8-bit or 16-bit modular
arithmetic is required; use the signed or unsigned int types. unsigned int is faster when you use
division.
 For array entries and global variables held in main memory, use the smallest type that can hold
the required data. This reduces the memory footprint.
 Use explicit casts when reading array entries or global variables into local variables, or when
passing arguments to a function.
 Use explicit casts when writing local variables out to array entries, or when returning data.
 Avoid implicit or explicit narrowing casts in expressions, because they usually cost extra cycles.
 Avoid char and short types for function arguments and return values.
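A minimal hedged sketch of the casting guidelines (illustrative names):

short gain;              /* global held in memory: smallest type that fits saves footprint */

int scale(int x)
{
    int g = (int)gain;   /* explicit cast when reading the short global into an int local */

    return x * g;        /* all arithmetic is then done at 32-bit width */
}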

Section 5.3

Loops with a fixed number of iterations


Let us see how the compiler treats a loop with an incrementing counter (i++), as sketched below.
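The original listing is not reproduced here; a hedged sketch that reuses the checksum example with an incrementing counter (illustrative names, and the assembly in the comment is only the typical shape of the output):

/* Incrementing loop. The loop control typically compiles to three
   instructions per iteration, roughly:
       ADD r1, r1, #1      ; i++
       CMP r1, #64         ; compare i with the limit
       BCC loop            ; branch back while i < 64
   in addition to the loop body. */
int checksum_inc(int *data)
{
    unsigned int i;
    int          sum = 0;

    for (i = 0; i < 64; i++) {
        sum += data[i];
    }
    return sum;
}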

In the compiled output it takes three instructions to implement the loop control. This is not
efficient for ARM; the loop control should need only two instructions:


 A subtract to decrement the loop variable. This also sets the condition flags on the result.
 Followed by a conditional branch instruction.
If we use a decrementing counter like this
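A hedged sketch (illustrative names; the assembly in the comment is the typical shape of the output):

/* Decrementing loop. The loop control typically compiles to just
       SUBS r2, r2, #1     ; decrement and set the condition flags
       BNE  loop           ; branch back while the result is nonzero
   so the whole loop here is around four instructions. */
int checksum_dec(int *data)
{
    unsigned int i;
    int          sum = 0;

    for (i = 64; i != 0; i--) {
        sum += *(data++);
    }
    return sum;
}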

In the compiled output the loop now contains only four instructions, compared with the six of the
incrementing version. SUBS and BNE implement the loop control.
This works well when the loop counter is unsigned: we can use either i != 0 or i > 0 as the
continuation condition.
However, for a signed loop counter with the condition i > 0, there are two candidate code
sequences: (1) a combined SUBS followed by a conditional branch, and (2) a separate decrement
followed by a CMP with 0 and a conditional branch.
When i = -0x80000000, the two sequences give different answers.
In case 1, the SUBS in effect compares i with 1 before decrementing; since -0x80000000 < 1, the
loop terminates.
In case 2, i is decremented first and then compared with 0; i now holds 0x7fffffff, which is
greater than 0, so the loop continues for many more iterations.
So, one must use i!=0 for signed or unsigned loop counters.
Suppose the packet size is unknown or arbitrary; we then use a variable N that gives the number of
data items in the packet, and the loop runs for a variable number of iterations N.
Notice that in the compiled output the compiler adds a check that N is nonzero at the entry to the
function. This check can be avoided if we use a do-while loop.
Example program with do-while
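A hedged sketch of such a program (illustrative names; the caller must ensure N is at least 1):

/* do-while version: the body always runs at least once, so the compiler
   does not need to test N before entering the loop. */
int checksum_dowhile(int *data, unsigned int N)
{
    int sum = 0;

    do {
        sum += *(data++);
    } while (--N != 0);

    return sum;
}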
In the corresponding compiler output, the initial test of N disappears.

Each loop iteration costs two instructions in addition to the body of the loop; this is called the
loop overhead.
The subtract takes one cycle and the branch takes three cycles, giving an overhead of four cycles
per loop iteration.
We can save some of these cycles by unrolling the loop: repeating the loop body several times and
reducing the number of loop iterations accordingly. A sketch of a four-way unrolled version is
given below.
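A hedged sketch, assuming N is a nonzero multiple of four (illustrative names):

/* Four-way unrolled loop: the loop overhead (SUBS + BNE, about four cycles)
   is now paid once per four data words instead of once per word. */
int checksum_unrolled(int *data, unsigned int N)
{
    int sum = 0;

    do {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        N -= 4;
    } while (N != 0);

    return sum;
}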

With this, we have reduced the loop overhead from 4N cycles to N cycles.
However, there are two questions to be answered
1. How many times one can unroll the loop?
2. What if the number of iteration is not a multiple of 4?

Only unroll loops that are important for the overall performance of the application. Otherwise,
unrolling increases the code size for little performance gain, and may sometimes even reduce
performance.
For example, suppose a loop accounts for 30% of the entire application; we could unroll it until
it is about 0.5 KB in code size. Then the loop overhead of roughly 4 cycles is small compared with
a loop body of around 128 cycles.
It is usually not worth unrolling when the gain is less than 1%.

For question 2, try to arrange that the number of iterations is a multiple of the unroll amount.
Otherwise, add extra code to handle the leftover iterations; this still improves performance
considerably.
Example
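A hedged sketch (illustrative names):

/* Unroll by four, with a second loop that handles the leftover iterations
   when N is not a multiple of four. */
int checksum_unrolled_any(int *data, unsigned int N)
{
    int sum = 0;

    while (N >= 4) {            /* main unrolled loop */
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        N -= 4;
    }
    while (N != 0) {            /* second loop: leftover 0..3 items */
        sum += *(data++);
        N--;
    }
    return sum;
}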

Here the second loop handles the leftover iterations.
Writing loops efficiently

 The compiler attempts to allocate a register to each local variable.


 It tries to use the same register for different local variables if the use of the variables does
not overlap.
 When the number of local variables exceeds the number of available registers, the excess
variables are stored on the stack.

Spilling
 Such stacked variables are called spilled since they are written out to memory.
 Spilled variables are slow to access compared to variables allocated to registers.
 To implement a function efficiently, you need to:
o Minimise the number of spilled variables.
o Ensure that critical variables are stored in registers.

AAPCS (ARM Architecture Procedure Call Standard) Registers


AAPCS is the ARM Architecture Procedure Call Standard. It is a convention that allows routines
written in high-level languages and assembly to interwork.
Rn       Name     Usage under AAPCS

R0-R3    A1-A4    Argument registers. These hold the first four function arguments on a
                  function call and the return value on a function return. A function may
                  corrupt these registers and use them as general scratch registers within
                  the function.

R4-R8    V1-V5    General variable registers. The function must preserve the callee values
                  of these registers.

R9       V6, SB   General variable register. The function must preserve the callee value of
                  this register except when compiling for read-write position independence
                  (RWPI); then R9 holds the static base address, which is the address of
                  the read-write data.

R10      V7, SL   General variable register. The function must preserve the callee value of
                  this register except when compiling with stack limit checking; then R10
                  holds the stack limit address.

R11      V8, FP   General variable register. The function must preserve the callee value of
                  this register except when compiling using a frame pointer. Only old
                  versions of armcc use a frame pointer.

R12      IP       A general scratch register that the function can corrupt. It is useful as
                  a scratch register for function veneers or other intra-procedure call
                  requirements.

R13      SP       The stack pointer, pointing to the full descending stack.

R14      LR       The link register. On a function call this holds the return address.

R15      PC       The program counter.

Available Registers

 R0..R12, R14 can all hold variables.


 Must save R4..R11, R14 on the stack if using these registers.
 Compiler can assign 14 variables to registers without spillage.
 But some compilers use a fixed register e.g. R12 as scratch and never keep values in it.
 Complex expressions need intermediate working registers.

Try to limit the inner loop of routines to at most 12 local variables.

 If the compiler does spill variables, it chooses which variables to spill based on
frequency of use.
 A variable used inside a loop counts multiple times.
 You can tell the compiler about important variables by using them within the innermost
loop.

The AAPCS defines how to pass function arguments and return values in ARM registers.

Four register rule

First four integer arguments to a function are passed in R0-R3.

The remainder of the arguments are passed on the stack.

 Therefore, functions taking four or fewer arguments avoid the stack, which allows for
greater efficiency.

Two-word arguments such as long long or double are passed in a pair of consecutive argument
registers, and a two-word result is returned in R0 and R1.
For functions with more than four arguments, both the caller and callee must access the stack
for some arguments.

For C++,the first argument to an object method is the this pointer. This argument is implicit
and additional to the explicit arguments.

In general, if the number of arguments is greater than four, it is more efficient to use
structures: group related arguments into a structure and pass a pointer to the structure.

Example
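The original listing is not reproduced here; the following is a hedged sketch of the idea, with a hypothetical Queue structure and illustrative names:

typedef struct {
    char *start;    /* first byte of the buffer  */
    char *end;      /* one past the last byte    */
    char *ptr;      /* current insertion point   */
} Queue;

/* Three arguments (queue pointer, data pointer, count), so everything
   fits in R0-R2 and nothing is passed on the stack. */
void queue_bytes(Queue *queue, const char *data, unsigned int N)
{
    char *ptr = queue->ptr;

    while (N != 0) {
        *(ptr++) = *(data++);
        if (ptr == queue->end) {    /* wrap around at the end of the buffer */
            ptr = queue->start;
        }
        N--;
    }
    queue->ptr = ptr;
}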

The resulting function has only three arguments and hence needs only three argument registers.

The callee function needs to assign a single register for the queue structure pointer.

The function-call overhead can be reduced further by putting the caller and the callee in the same
C file; the compiler then knows the code generated for the callee and can optimize the call site
in the caller.

Summary of Function calling


Pointer Aliasing

Two pointers are said to alias when they point to the same address.
If you write to one pointer, it will affect the value you read from the other pointer.
The compiler often doesn’t know which pointers alias.
The compiler must assume that any write through a pointer may affect the value read through any
other pointer. This can significantly reduce code efficiency.

The following function increments two timer values by a step amount.
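The original listing is not reproduced here; a hedged sketch of such a function (the name timers_v1 is illustrative):

/* Because timer1, timer2 and step could in principle point to the same
   address, the compiler must reload *step after the write to *timer1,
   giving three loads instead of the two you might expect. */
void timers_v1(int *timer1, int *timer2, int *step)
{
    *timer1 += *step;
    *timer2 += *step;
}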

In the compiled output you would expect *step to be read from memory once and used twice. That
does not happen: the compiler loads it twice.

Usually a compiler optimization called common subexpression elimination would kick in, so that
*step was evaluated only once and reused for the second occurrence.
However, the compiler cannot apply that optimization here: it cannot be sure that the write to
*timer1 does not affect the value read from *step. This forces the compiler to insert an extra
load instruction.
Avoiding pointer aliasing
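A hedged sketch of the usual remedy: take a local copy of the value read through the pointer, so the compiler no longer has to assume it can change (the name timers_v2 is illustrative):

/* Taking a local copy of *step removes the aliasing problem:
   step is read from memory exactly once. */
void timers_v2(int *timer1, int *timer2, int *step)
{
    int delta = *step;

    *timer1 += delta;
    *timer2 += delta;
}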
