ARM Founded in November 1990: Advanced RISC Machines
More information about ARM and our offices on our web site:
http://www.arm.com/aboutarm/
Embedded Processors
Development of the ARM Architecture
[Figure: timeline of architecture versions v4, v5, v6 and v7]
Architecture ARMv7 profiles
Application profile (ARMv7-A)
Memory management support (MMU)
Highest performance at low power
Influenced by multi-tasking OS requirements
TrustZone and Jazelle-RCT for a safe, extensible system
e.g. Cortex-A5, Cortex-A9
Data Sizes and Instruction Sets
The ARM is a 32-bit architecture.
Processor Modes
The ARM has seven basic operating modes: User, FIQ, IRQ, Supervisor, Abort, Undef and System.
The ARM Register Set
[Figure: the ARM register set, with a single cpsr and six banked spsr registers]
Exception Handling
Program Status Registers
Bit(s)    Field
31        N
30        Z
29        C
28        V
27        Q
24        J
23-8      Undefined
7         I
6         F
5         T
4-0       mode
Field masks: f = bits 31-24, s = bits 23-16, x = bits 15-8, c = bits 7-0
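The bit positions above can be exercised with a short Python sketch. It is only an illustration: decode_psr and MODE_NAMES are hypothetical names, and the mode encodings in the table come from the ARM architecture rather than from this slide.

```python
# Hypothetical helper: split a 32-bit PSR value into its flag, control and mode bits.
# The mode encodings are an addition; the slide itself does not list them.
MODE_NAMES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undef",
    0b11111: "System",
}


def decode_psr(value):
    """Return a dict with the individual PSR bits and the decoded mode name."""
    def bit(n):
        return (value >> n) & 1

    return {
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28),
        "Q": bit(27), "J": bit(24),
        "I": bit(7), "F": bit(6), "T": bit(5),
        "mode": MODE_NAMES.get(value & 0x1F, "unknown"),
    }


# Example value: Z and C set, IRQ and FIQ disabled, ARM state, Supervisor mode.
fields = decode_psr(0x600000D3)
print(fields)
```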
Program Counter (r15)
When the processor is executing in ARM state:
All instructions are 32 bits wide
All instructions must be word aligned
Therefore the pc value is stored in bits [31:2] with bits [1:0] undefined (as instructions
cannot be halfword or byte aligned)
Conditional Execution and Flags
ARM instructions can be made to execute conditionally by postfixing them with the
appropriate condition code field.
This improves code density and performance by reducing the number of
forward branch instructions.
With a branch:            With conditional execution:
    CMP r3,#0                 CMP r3,#0
    BEQ skip                  ADDNE r0,r1,r2
    ADD r0,r1,r2
skip
By default, data processing instructions do not affect the condition code flags but
the flags can be optionally set by using “S”. CMP does not need “S”.
loop
    …
    SUBS r1,r1,#1    ; decrement r1 and set flags
    BNE loop         ; if Z flag clear then branch
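A minimal Python model of the SUBS / BNE loop may help to see how the flags drive the branch. The function names subs and count_down are made up for this sketch, and the loop body is reduced to a counter.

```python
MASK32 = 0xFFFFFFFF


def subs(a, b):
    """Model SUBS on 32-bit registers: return (result, flags) for a - b."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    flags = {
        "N": result >> 31,                     # sign bit of the result
        "Z": int(result == 0),                 # result is zero
        "C": int(a >= b),                      # set when the subtraction does NOT borrow
        "V": ((a ^ b) & (a ^ result)) >> 31,   # signed overflow
    }
    return result, flags


def count_down(start):
    """Model the loop: run the body, SUBS r1,r1,#1, then BNE loop."""
    r1 = start
    iterations = 0
    while True:
        iterations += 1              # stands in for the loop body
        r1, flags = subs(r1, 1)      # SUBS r1,r1,#1
        if flags["Z"]:               # BNE falls through once Z is set
            return iterations


print(count_down(3))
```

Note that the body always runs at least once, because the test sits at the bottom of the loop.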
Condition Codes
Conditional execution examples
                With branches    With conditional execution
Instructions    5                3
Words           5                3
Cycles          5 or 6           3
Data Processing Instructions
Consist of:
Arithmetic: ADD ADC SUB SBC RSB RSC
Logical: AND ORR EOR BIC
Comparisons: CMP CMN TST TEQ
Data movement: MOV MVN
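Syntax:
<Operation>{<cond>}{S} Rd, Rn, Operand2
(Comparisons do not specify Rd; data movement does not specify Rn.)
Examples: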
ADD r0, r1, r2
SUBGT r3, r3, #1
RSBLES r4, r5, #5
Comparisons
The only effect of the comparisons is to update the
condition flags, so there is no need to set the S bit.
Operations are:
CMP    operand1 - operand2      ; Compare
CMN    operand1 + operand2      ; Compare negative
TST    operand1 AND operand2    ; Test
TEQ    operand1 EOR operand2    ; Test equivalence
Syntax:
<Operation>{<cond>} Rn, Operand2
Examples:
CMP r0, r1
TSTEQ r2, #5
Logical Operations
Operations are:
AND    operand1 AND operand2
EOR    operand1 EOR operand2
ORR    operand1 OR operand2
ORN    operand1 OR NOT operand2
BIC    operand1 AND NOT operand2 [i.e. bit clear]
Syntax:
<Operation>{<cond>}{S} Rd, Rn, Operand2
Examples:
AND r0, r1, r2
BICEQ r2, r3, #7
EORS r1,r3,r0
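The logical operations can be checked against a small Python model on 32-bit values. This is only a sketch with made-up function names; ORN is modelled as operand1 OR NOT operand2, which is how the architecture defines it.

```python
MASK32 = 0xFFFFFFFF


def and_(rn, op2):
    """AND: operand1 AND operand2."""
    return rn & op2 & MASK32


def eor(rn, op2):
    """EOR: operand1 EOR operand2 (exclusive OR)."""
    return (rn ^ op2) & MASK32


def orr(rn, op2):
    """ORR: operand1 OR operand2."""
    return (rn | op2) & MASK32


def orn(rn, op2):
    """ORN: operand1 OR NOT operand2."""
    return (rn | ~op2) & MASK32


def bic(rn, op2):
    """BIC: operand1 AND NOT operand2, i.e. clear the bits set in operand2."""
    return rn & ~op2 & MASK32


# Models the effect of BIC r2, r3, #7 with r3 = 0xFF: the low three bits are cleared.
print(hex(bic(0xFF, 7)))
```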
Data Movement
Operations are:
MOV operand2
MVN NOT operand2
Note that these make no use of operand1.
Syntax:
<Operation>{<cond>}{S} Rd, Operand2
Examples:
MOV r0, r1
MOVS r2, #10
MVNEQ r1,#0
The Barrel Shifter
Barrel Shifter - Left Shift
Shifts left by the specified amount (multiplies by powers of two).
Barrel Shifter - Rotations
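As a rough illustration of the two barrel shifter operations named on these slides, the following Python sketch models a logical shift left and a rotate right on 32-bit values. The names lsl and ror mirror the assembler mnemonics, but the functions themselves are only a model.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left: bits shifted out of bit 31 are lost."""
    return (value << amount) & MASK32


def ror(value, amount):
    """Rotate right: bits leaving bit 0 re-enter at bit 31."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


print(lsl(3, 5))            # same as 3 * 2**5
print(hex(ror(0xFF, 8)))    # low byte moves to the top byte
```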
Using a Barrel Shifter: The 2nd Operand
Register, optionally with shift operation
    Shift value can be either:
        5 bit unsigned integer
        Specified in bottom byte of another register.
    Used for multiplication by constant
Immediate value
    8 bit number, with a range of 0-255.
        Rotated right through even number of positions
    Allows increased range of 32-bit constants to be loaded directly into registers
[Figure: Operand 1 goes straight to the ALU; Operand 2 passes through the barrel shifter first; the ALU produces the Result]
Second Operand : Shifted Register
The amount by which the register is to be shifted is contained in either:
    the immediate 5-bit field in the instruction
        NO OVERHEAD
        Shift is done for free - executes in a single cycle.
    the bottom byte of a register (not PC)
        This takes an extra cycle to execute.
        ARM doesn't have enough read ports to read 3 registers at once.
        The cost is then the same as on other processors where the shift is a separate instruction.
If no shift is specified then a default shift is applied: LSL #0
    i.e. the barrel shifter has no effect on the value in the register.
Second Operand: Using a Shifted Register
Using a multiplication instruction to multiply by a constant means
first loading the constant into a register and then waiting a number of
internal cycles for the instruction to complete.
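The alternative to a multiply instruction is a single data processing instruction whose second operand is a shifted register. The Python sketch below models two such cases; the constants 5 and 7 and all function names are chosen for illustration only and are not taken from the slide.

```python
MASK32 = 0xFFFFFFFF


def add_shifted(rn, rm, shift):
    """Model ADD rd, rn, rm, LSL #shift: rd = rn + (rm << shift)."""
    return (rn + ((rm << shift) & MASK32)) & MASK32


def rsb_shifted(rn, rm, shift):
    """Model RSB rd, rn, rm, LSL #shift: rd = (rm << shift) - rn."""
    return (((rm << shift) & MASK32) - rn) & MASK32


def times5(x):
    """x * 5 as x + 4x, one ADD with a shifted second operand."""
    return add_shifted(x, x, 2)


def times7(x):
    """x * 7 as 8x - x, one RSB with a shifted second operand."""
    return rsb_shifted(x, x, 3)


print(times5(6), times7(6))
```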
Data Processing Exercise
1. How would you load the two’s complement
representation of -1 into Register 3 using one
instruction?
Data Processing Solutions
1. MVN r3, #0
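A quick way to convince yourself of this answer is to model MVN as a 32-bit bitwise NOT. The helper names below are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mvn(operand2):
    """Model MVN: bitwise NOT of operand2, truncated to 32 bits."""
    return ~operand2 & MASK32


def as_signed(value):
    """Interpret a 32-bit register value as a two's complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


r3 = mvn(0)                   # MVN r3, #0
print(hex(r3), as_signed(r3))
```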
Immediate constants
No ARM instruction can contain a 32 bit immediate constant
All ARM instructions are fixed as 32 bits long
The data processing instruction format has 12 bits available for operand2:
    bits [11:8]    rot
    bits [7:0]     immed_8
The shifter rotates immed_8 right (ROR) by rot x2.
Quick Quiz:
    0xe3a004ff
    MOV r0, #???
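One way to work through the quiz is to decode the operand2 field by hand or with a few lines of Python. The function name decode_immediate is made up for this sketch, which assumes the 4-bit rot field selects a right rotation by twice its value.

```python
def decode_immediate(instruction):
    """Return the 32-bit constant encoded in the 12-bit operand2 field."""
    rot = (instruction >> 8) & 0xF       # bits [11:8]
    imm8 = instruction & 0xFF            # bits [7:0]
    amount = 2 * rot                     # the rotation is always an even number
    return ((imm8 >> amount) | (imm8 << (32 - amount))) & 0xFFFFFFFF


# rot = 4 and immed_8 = 0xff, so 0xff is rotated right by 8 positions.
print(hex(decode_immediate(0xE3A004FF)))
```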
Second Operand: Immediate Value (1)
There is no single instruction which will load a 32 bit immediate
constant into a register without performing a data load from memory.
All ARM instructions are 32 bits long
Second Operand: Immediate Value (2)
This gives us:
0 - 255 [0 - 0xff]
256, 260, 264, .., 1020 [0x100 - 0x3fc, step 4, 0x40 - 0xff ror 30]
1024, 1040, 1056, .., 4080 [0x400 - 0xff0, step 16, 0x40 - 0xff ror 28]
4096, 4160, 4224, .., 16320 [0x1000 - 0x3fc0, step 64, 0x40 - 0xff ror 26]
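To test whether a given constant falls into one of these ranges, a brute-force search over the 16 possible rotations is enough. The sketch below is one possible way to do it; encode_immediate is a hypothetical name and it simply returns the first encoding it finds, which is not necessarily the one an assembler would pick.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, amount):
    """Rotate a 32-bit value left by amount positions."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def ror32(value, amount):
    """Rotate a 32-bit value right by amount positions."""
    return rol32(value, (32 - amount) % 32)


def encode_immediate(constant):
    """Return (rot, imm8) with imm8 ROR (2 * rot) == constant, or None."""
    constant &= MASK32
    for rot in range(16):
        # Undo a rotate right by 2 * rot with a rotate left by the same amount.
        imm8 = rol32(constant, 2 * rot)
        if imm8 <= 0xFF:
            return rot, imm8
    return None


for value in (0xFF, 0x3FC, 0x101, 0xFF000000):
    print(hex(value), encode_immediate(value))
```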
Loading 32 bit constants
To allow larger constants to be loaded, the assembler offers a pseudo-
instruction:
LDR rd, =const
This will either:
Produce a MOV or MVN instruction to generate the value (if possible).
or
Generate a LDR instruction with a PC-relative address to read the
constant from a literal pool (Constant data area embedded in the
code).
For example
LDR r0,=0xFF => MOV r0,#0xFF
LDR r0,=0x55555555 => LDR r0,[PC,#Imm12]
…
…
DCD 0x55555555
This is the recommended way of loading constants into a register
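The decision the assembler makes for LDR rd, =const can be sketched in Python as follows. This models only the choice between MOV, MVN and a literal pool load; the function names are invented for the example and no real assembler API is implied.

```python
MASK32 = 0xFFFFFFFF


def is_encodable(constant):
    """True if constant is an 8-bit value rotated right by an even amount."""
    constant &= MASK32
    for amount in range(0, 32, 2):
        rotated = ((constant << amount) | (constant >> (32 - amount))) & MASK32
        if rotated <= 0xFF:
            return True
    return False


def expand_ldr_pseudo(constant):
    """Return (mnemonic, value) for the instruction chosen for LDR rd, =constant."""
    constant &= MASK32
    if is_encodable(constant):
        return ("MOV", constant)              # MOV rd, #constant
    inverted = ~constant & MASK32
    if is_encodable(inverted):
        return ("MVN", inverted)              # MVN rd, #(NOT constant)
    return ("LDR", constant)                  # PC-relative load from a literal pool


for value in (0xFF, 0xFFFFFFFF, 0x55555555):
    print(hex(value), expand_ldr_pseudo(value))
```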
Single register data transfer
LDR STR Word
LDRB STRB Byte
LDRH STRH Halfword
LDRSB Signed byte load
LDRSH Signed halfword load
Syntax:
LDR{<cond>}{<size>} Rd, <address>
STR{<cond>}{<size>} Rd, <address>
e.g. LDREQB
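The difference between the plain and the signed loads is only in how the upper bits of the destination register are filled. A small Python model, assuming a little-endian memory held in a bytes object and using made-up function names, shows this:

```python
MASK32 = 0xFFFFFFFF


def ldrb(memory, address):
    """LDRB: load a byte and zero-extend it to 32 bits."""
    return memory[address]


def ldrsb(memory, address):
    """LDRSB: load a byte and sign-extend it to 32 bits."""
    byte = memory[address]
    return (byte - 0x100) & MASK32 if byte & 0x80 else byte


def ldrh(memory, address):
    """LDRH: load a halfword (little-endian assumed) and zero-extend it."""
    return int.from_bytes(memory[address:address + 2], "little")


def ldrsh(memory, address):
    """LDRSH: load a halfword (little-endian assumed) and sign-extend it."""
    half = ldrh(memory, address)
    return (half - 0x10000) & MASK32 if half & 0x8000 else half


memory = bytes([0x80, 0xFF, 0x7F])
print(hex(ldrb(memory, 0)), hex(ldrsb(memory, 0)))
print(hex(ldrh(memory, 0)), hex(ldrsh(memory, 0)))
```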
Address accessed
Load/Store Exercise
Assume an array of 25 words. A compiler associates
y with r1. Assume that the base address for the
array is located in r2. Translate this C
statement/assignment using just three instructions:
array[10] = array[5] + y;
Load/Store Exercise Solution
array[10] = array[5] + y;
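One possible three-instruction answer is a load, an add and a store, using byte offsets of 5 x 4 = 20 and 10 x 4 = 40 from the base in r2. The Python sketch below models that sequence; the use of r3 as a scratch register is an assumption, since the exercise does not name one.

```python
MASK32 = 0xFFFFFFFF
WORD_BYTES = 4


def run_exercise(array, y):
    """Model array[10] = array[5] + y on a 25-word array whose base is in r2."""
    memory = list(array)                  # r2 points at memory[0]
    r1 = y & MASK32                       # y lives in r1
    r3 = memory[20 // WORD_BYTES]         # LDR r3, [r2, #20]   ; array[5]
    r3 = (r3 + r1) & MASK32               # ADD r3, r3, r1
    memory[40 // WORD_BYTES] = r3         # STR r3, [r2, #40]   ; array[10]
    return memory


result = run_exercise(list(range(25)), 100)
print(result[10])
```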
Load and Store Multiples
Syntax:
<LDM|STM>{<cond>}<addressing_mode> Rb{!}, <register list>
4 addressing modes:
LDMIA / STMIA increment after
LDMIB / STMIB increment before
LDMDA / STMDA decrement after
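LDMDB / STMDB decrement before
Example: LDMxx r10, {r0,r1,r4} or STMxx r10, {r0,r1,r4}, with base register (Rb) r10.
Address of each register in each addressing mode (the lowest-numbered register always uses the lowest address):
Mode    r0        r1        r4
IA      r10       r10+4     r10+8
IB      r10+4     r10+8     r10+12
DA      r10-8     r10-4     r10
DB      r10-12    r10-8     r10-4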
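The address used for each register in the four addressing modes can be computed with a short Python sketch. transfer_addresses is a hypothetical helper; it also returns the value the base register would take if write-back (!) were requested.

```python
def transfer_addresses(mode, base, registers):
    """Return ({register: address}, written_back_base) for LDM/STM in the given mode."""
    regs = sorted(registers)        # lowest-numbered register gets the lowest address
    size = 4 * len(regs)
    if mode == "IA":                # increment after
        start, new_base = base, base + size
    elif mode == "IB":              # increment before
        start, new_base = base + 4, base + size
    elif mode == "DA":              # decrement after
        start, new_base = base - size + 4, base - size
    elif mode == "DB":              # decrement before
        start, new_base = base - size, base - size
    else:
        raise ValueError("mode must be IA, IB, DA or DB")
    addresses = {reg: start + 4 * i for i, reg in enumerate(regs)}
    return addresses, new_base


# r10 = 0x1000 and register list {r0, r1, r4}, as in the LDMxx / STMxx example.
for mode in ("IA", "IB", "DA", "DB"):
    print(mode, transfer_addresses(mode, 0x1000, [0, 1, 4]))
```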
Multiply and Divide
There are 2 classes of multiply - producing 32-bit and 64-bit results
32-bit versions on an ARM7TDMI will execute in 2 - 5 cycles
Branch instructions
Branch : B{<cond>} label
Branch with Link : BL{<cond>} subroutine_label
bits [31:28]    Cond
bits [27:25]    1 0 1
bit  [24]       L
bits [23:0]     Offset
The processor core shifts the offset field left by 2 positions, sign-extends it
and adds it to the PC
± 32 Mbyte range
How to perform longer branches?
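The offset arithmetic can be illustrated with a Python sketch that encodes and decodes a B or BL instruction. It assumes ARM state, where the PC reads as the address of the current instruction plus 8; that detail and the function names are additions, not part of the slide.

```python
def encode_branch(source, target, link=False, cond=0xE):
    """Encode B (or BL when link is True) placed at address source, jumping to target."""
    offset = target - (source + 8)            # PC reads as current instruction + 8
    if offset % 4:
        raise ValueError("target must be word aligned")
    if not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("target is outside the +/-32 Mbyte range")
    imm24 = (offset >> 2) & 0xFFFFFF          # word offset stored in bits [23:0]
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | imm24


def branch_target(source, instruction):
    """Recover the target: sign-extend the offset, shift left by 2, add to the PC."""
    imm24 = instruction & 0xFFFFFF
    if imm24 & 0x800000:
        imm24 -= 1 << 24                      # sign-extend the 24-bit field
    return source + 8 + (imm24 << 2)


word = encode_branch(0x8000, 0x8000)          # a branch to itself
print(hex(word), hex(branch_target(0x8000, word)))
```

The default cond value 0xE is the "always" condition.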
Register Usage
The compiler has a set of rules known as a Procedure Call Standard that determine how to pass parameters to a function (see AAPCS).
CPSR flags may be corrupted by function call.
Assembler code which links with compiled code must follow the AAPCS at external interfaces.
The AAPCS is part of the new ABI for the ARM Architecture.
Register    Usage
r0-r3       Arguments into function; result(s) from function; otherwise corruptible (additional parameters passed on stack)
r4-r11      Register variables; must be preserved
            r9/sb - Stack base
            r10/sl - Stack limit if software stack checking selected
ARM Branches and Subroutines
B <label>
PC relative. ±32 Mbyte range.
BL <subroutine>
Stores return address in LR
Returning implemented by restoring the PC from LR
For non-leaf functions, LR will have to be stacked
                func1                      func2
    :           STMFD sp!,{regs,lr}        :
    :           :                          :
    BL func1    BL func2                   :
    :           :                          :
    :           LDMFD sp!,{regs,pc}        MOV pc, lr
PSR access
[Figure: PSR bit layout, with field boundaries at bits 31, 28, 27, 24, 23, 19, 16, 15, 10, 9, 8, 7, 6, 5, 4 and 0]
In User Mode, all bits can be read but only the condition flags (_f) can be modified
Pipeline changes for ARM9TDMI
ARM7TDMI
    FETCH:   Instruction Fetch
    DECODE:  Thumb→ARM decompress, ARM decode, Reg Select
    EXECUTE: Reg Read, Shift, ALU, Reg Write
ARM9TDMI
    FETCH:   Instruction Fetch
    DECODE:  ARM or Thumb Inst Decode, Reg Decode, Reg Read
    EXECUTE: Shift + ALU
    MEMORY:  Memory Access
    WRITE:   Reg Write
ARM10 vs. ARM11 Pipelines
ARM10
    FETCH:   Branch Prediction, Instruction Fetch
    ISSUE:   ARM or Thumb Instruction Decode
    DECODE:  Reg Read
    EXECUTE: Shift + ALU, Multiply
    MEMORY:  Memory Access, Multiply Add
    WRITE:   Reg Write
ARM11
    [Figure truncated; the surviving stage labels are Address, Data Cache 1 and Data Cache 2]