ARM Founded in November 1990: Advanced RISC Machines

Uploaded by

aesyop

 ARM founded in November 1990

 Advanced RISC Machines

 Company headquarters in Cambridge, UK


 Processor design centers in Cambridge, Austin, and Sophia Antipolis
 Sales, support, and engineering offices all over the world

 Best known for its range of RISC processor core designs


 Other products – fabric IP, software tools, models, cell libraries - to help
partners develop and ship ARM-based SoCs

 ARM does not manufacture silicon

 More information about ARM and our offices on our web site:
 http://www.arm.com/aboutarm/

1
Embedded Processors

2
Development of the ARM Architecture
v4
 Halfword and signed halfword / byte support
 System mode
 Thumb instruction set (v4T)

v5
 Improved interworking
 CLZ
 Saturated arithmetic
 DSP MAC instructions
 Extensions: Jazelle (5TEJ)

v6
 SIMD Instructions
 Multi-processing
 v6 Memory architecture
 Unaligned data support
 Extensions: Thumb-2 (6T2), TrustZone® (6Z), Multicore (6K), Thumb only (6-M)

v7
 Thumb-2
 Architecture Profiles
   7-A - Applications
   7-R - Real-time
   7-M - Microcontroller

 Note that implementations of the same architecture can be different
   Cortex-A8 - architecture v7-A, with a 13-stage pipeline
   Cortex-A9 - architecture v7-A, with an 8-stage pipeline

3
Architecture ARMv7 profiles
 Application profile (ARMv7-A)
 Memory management support (MMU)
 Highest performance at low power
 Influenced by multi-tasking OS system requirements
 TrustZone and Jazelle-RCT for a safe, extensible system
 e.g. Cortex-A5, Cortex-A9

 Real-time profile (ARMv7-R)


 Protected memory (MPU)
 Low latency and predictability ‘real-time’ needs
 Evolutionary path for traditional embedded business
 e.g. Cortex-R4

 Microcontroller profile (ARMv7-M, ARMv7E-M, ARMv6-M)


 Lowest gate count entry point
 Deterministic and predictable behavior a key priority
 Deeply embedded use
 e.g. Cortex-M3

4
Data Sizes and Instruction Sets
 The ARM is a 32-bit architecture.

 When used in relation to the ARM:


 Byte means 8 bits
 Halfword means 16 bits (two bytes)
 Word means 32 bits (four bytes)

 Most ARM cores implement two instruction sets


 32-bit ARM Instruction Set
 16-bit Thumb Instruction Set

 Jazelle cores can also execute Java bytecode

5
Processor Modes
 The ARM has seven basic operating modes:

 User : unprivileged mode under which most tasks run

 FIQ : entered when a high priority (fast) interrupt is raised

 IRQ : entered when a low priority (normal) interrupt is raised

 Supervisor : entered on reset and when a Software Interrupt instruction is executed
 Abort : used to handle memory access violations

 Undef : used to handle undefined instructions

 System : privileged mode using the same registers as user mode

6
The ARM Register Set

Current Visible Registers (User mode shown)
r0 - r12, r13 (sp), r14 (lr), r15 (pc), cpsr

Banked out Registers
Each exception mode has its own copies of the registers listed here; they replace the User mode registers of the same name while that mode is active.

Mode    Banked registers
FIQ     r8 - r12, r13 (sp), r14 (lr), spsr
IRQ     r13 (sp), r14 (lr), spsr
SVC     r13 (sp), r14 (lr), spsr
Undef   r13 (sp), r14 (lr), spsr
Abort   r13 (sp), r14 (lr), spsr

User mode has no spsr. r0 - r7, r15 (pc) and cpsr are shared by all modes.

7
Exception Handling

 When an exception occurs, the ARM:
   Copies CPSR into SPSR_<mode>
   Sets appropriate CPSR bits
     Change to ARM state
     Change to exception mode
     Disable interrupts (if appropriate)
   Stores the return address in LR_<mode>
   Sets PC to vector address

 To return, exception handler needs to:
   Restore CPSR from SPSR_<mode>
   Restore PC from LR_<mode>
  This can only be done in ARM state.

Vector Table
Address   Exception
0x1C      FIQ
0x18      IRQ
0x14      (Reserved)
0x10      Data Abort
0x0C      Prefetch Abort
0x08      Software Interrupt
0x04      Undefined Instruction
0x00      Reset

Vector table can be at 0xFFFF0000 on ARM720T and on ARM9/10 family devices.
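The entry and return steps above can be sketched as a small Python model. This is only an illustration: the dictionary-based `cpu` state and the function names are hypothetical, the CPSR mode numbers are the standard ARM encodings (they are not given on the slide), and the per-exception adjustment of the return address (for example subtracting 4 from LR on IRQ return) is deliberately left out.

```python
# CPSR[4:0] encodings of the exception modes (standard ARM values, assumed here).
MODE_BITS = {"FIQ": 0b10001, "IRQ": 0b10010, "SVC": 0b10011,
             "Abort": 0b10111, "Undef": 0b11011}

# Exception name -> (mode entered, vector address from the vector table).
EXCEPTIONS = {
    "Undefined Instruction": ("Undef", 0x04),
    "Software Interrupt":    ("SVC",   0x08),
    "Prefetch Abort":        ("Abort", 0x0C),
    "Data Abort":            ("Abort", 0x10),
    "IRQ":                   ("IRQ",   0x18),
    "FIQ":                   ("FIQ",   0x1C),
}

I_BIT, F_BIT, T_BIT = 1 << 7, 1 << 6, 1 << 5


def take_exception(cpu, name, return_address):
    """Apply the exception entry steps to a dict-based CPU model."""
    mode, vector = EXCEPTIONS[name]
    cpu["spsr"][mode] = cpu["cpsr"]            # copy CPSR into SPSR_<mode>
    cpsr = cpu["cpsr"] & ~T_BIT                # change to ARM state
    cpsr = (cpsr & ~0x1F) | MODE_BITS[mode]    # change to exception mode
    cpsr |= I_BIT                              # IRQ is disabled on every entry
    if name == "FIQ":
        cpsr |= F_BIT                          # FIQ entry also disables FIQ
    cpu["cpsr"] = cpsr
    cpu["lr"][mode] = return_address           # store return address in LR_<mode>
    cpu["pc"] = vector                         # set PC to the vector address


def return_from_exception(cpu, mode):
    """Restore PC from LR_<mode> and CPSR from SPSR_<mode>."""
    cpu["pc"] = cpu["lr"][mode]
    cpu["cpsr"] = cpu["spsr"][mode]
```

Restoring the CPSR also restores the T bit, which is how a handler entered from Thumb code ends up back in Thumb state.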

8
Program Status Registers
Bit(s)   Field
31       N
30       Z
29       C
28       V
27       Q
24       J
23:8     Undefined
7        I
6        F
5        T
4:0      mode

Field names: f = bits [31:24], s = bits [23:16], x = bits [15:8], c = bits [7:0]

 Condition code flags
   N = Negative result from ALU
   Z = Zero result from ALU
   C = ALU operation Carried out
   V = ALU operation oVerflowed

 Sticky Overflow flag - Q flag
   Architecture 5TE/J only
   Indicates if saturation has occurred

 J bit
   Architecture 5TEJ only
   J = 1: Processor in Jazelle state

 Interrupt Disable bits.
   I = 1: Disables the IRQ.
   F = 1: Disables the FIQ.

 T Bit
   Architecture xT only
   T = 0: Processor in ARM state
   T = 1: Processor in Thumb state

 Mode bits
   Specify the processor mode
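As a rough illustration of the layout above, the following Python sketch pulls the individual fields out of a 32-bit status register value. The function name `decode_psr` is hypothetical, and the mode-number table uses the standard ARM encodings, which are an assumption here because the slide does not list them.

```python
# CPSR[4:0] -> mode name (standard ARM encodings, not listed on the slide).
MODES = {
    0b10000: "User", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "Supervisor",
    0b10111: "Abort", 0b11011: "Undef", 0b11111: "System",
}


def decode_psr(psr):
    """Split a 32-bit PSR value into the fields shown in the bit layout."""
    def bit(n):
        return (psr >> n) & 1

    return {
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28),  # condition flags
        "Q": bit(27),                                            # sticky overflow
        "J": bit(24),                                            # Jazelle state
        "I": bit(7), "F": bit(6),                                # interrupt disables
        "T": bit(5),                                             # Thumb state
        "mode": MODES.get(psr & 0x1F, "unknown"),
    }
```

For example, `decode_psr(0x600000D3)` reports Z and C set, both interrupts disabled, ARM state, and Supervisor mode.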
Program Counter (r15)
 When the processor is executing in ARM state:
 All instructions are 32 bits wide
 All instructions must be word aligned
 Therefore the pc value is stored in bits [31:2] with bits [1:0] undefined (as instruction
cannot be halfword or byte aligned)

 When the processor is executing in Thumb state:


 All instructions are 16 bits wide
 All instructions must be halfword aligned
 Therefore the pc value is stored in bits [31:1] with bit [0] undefined (as instruction
cannot be byte aligned)

 When the processor is executing in Jazelle state:


 All instructions are 8 bits wide
 Processor performs a word access to read 4 instructions at once

Conditional Execution and Flags
 ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field.
   This improves code density and performance by reducing the number of forward branch instructions.

    Using a branch:
        CMP   r3,#0
        BEQ   skip
        ADD   r0,r1,r2
    skip

    Using conditional execution:
        CMP   r3,#0
        ADDNE r0,r1,r2

 By default, data processing instructions do not affect the condition code flags but the flags can be optionally set by using “S”. CMP does not need “S”.

    loop
        SUBS  r1,r1,#1      ; decrement r1 and set flags
        BNE   loop          ; if Z flag clear then branch
11
Condition Codes

 The possible condition codes are listed below
   Note AL is the default and does not need to be specified

Suffix   Description               Flags tested
EQ       Equal                     Z=1
NE       Not equal                 Z=0
CS/HS    Unsigned higher or same   C=1
CC/LO    Unsigned lower            C=0
MI       Minus                     N=1
PL       Positive or Zero          N=0
VS       Overflow                  V=1
VC       No overflow               V=0
HI       Unsigned higher           C=1 & Z=0
LS       Unsigned lower or same    C=0 or Z=1
GE       Greater or equal          N=V
LT       Less than                 N!=V
GT       Greater than              Z=0 & N=V
LE       Less than or equal        Z=1 or N!=V
AL       Always
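The table can be checked with a short Python model. This is a sketch under stated assumptions: `flags_after_cmp` and `condition_passes` are hypothetical helper names, and the flag computation models only `CMP Rn, Operand2` (a 32-bit subtraction, where C = 1 means no borrow).

```python
MASK = 0xFFFFFFFF


def flags_after_cmp(rn, op2):
    """Return (N, Z, C, V) as set by CMP rn, op2, which computes rn - op2."""
    rn &= MASK
    op2 &= MASK
    result = (rn - op2) & MASK
    n = result >> 31                          # sign bit of the result
    z = int(result == 0)
    c = int(rn >= op2)                        # carry set means no borrow
    v = ((rn ^ op2) & (rn ^ result)) >> 31    # signed overflow on subtraction
    return n, z, c, v


# One predicate per suffix, taken directly from the "Flags tested" column.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,
    "NE": lambda n, z, c, v: z == 0,
    "CS": lambda n, z, c, v: c == 1,
    "CC": lambda n, z, c, v: c == 0,
    "MI": lambda n, z, c, v: n == 1,
    "PL": lambda n, z, c, v: n == 0,
    "VS": lambda n, z, c, v: v == 1,
    "VC": lambda n, z, c, v: v == 0,
    "HI": lambda n, z, c, v: c == 1 and z == 0,
    "LS": lambda n, z, c, v: c == 0 or z == 1,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: z == 0 and n == v,
    "LE": lambda n, z, c, v: z == 1 or n != v,
    "AL": lambda n, z, c, v: True,
}
CONDITIONS["HS"] = CONDITIONS["CS"]   # alternative spellings
CONDITIONS["LO"] = CONDITIONS["CC"]


def condition_passes(suffix, flags):
    """True if an instruction carrying this suffix would execute."""
    return CONDITIONS[suffix](*flags)
```

Comparing 0xFFFFFFFF with 1 shows why both signed and unsigned conditions exist: HI passes (a large unsigned value) and LT also passes (the same bits read as -1).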

12
Conditional execution examples

C source code

    if (r0 == 0)
    {
        r1 = r1 + 1;
    }
    else
    {
        r2 = r2 + 1;
    }

ARM instructions, unconditional

        CMP r0, #0
        BNE else
        ADD r1, r1, #1
        B end
    else
        ADD r2, r2, #1
    end
        ...

 5 instructions
 5 words
 5 or 6 cycles

ARM instructions, conditional

        CMP r0, #0
        ADDEQ r1, r1, #1
        ADDNE r2, r2, #1
        ...

 3 instructions
 3 words
 3 cycles

13
Data Processing Instructions
 Consist of :
 Arithmetic: ADD ADC SUB SBC RSB RSC
 Logical: AND ORR EOR BIC
 Comparisons: CMP CMN TST TEQ
 Data movement: MOV MVN

 These instructions only work on registers, NOT memory.

 Syntax:

<Operation>{<cond>}{S} Rd, Rn, Operand2

 Comparisons set flags only - they do not specify Rd


 Data movement does not specify Rn

 Second operand is sent to the ALU via barrel shifter.


14
Data processing Instructions
 Largest family of ARM instructions, all sharing the same
instruction format.
 Contains:
 Arithmetic operations
 Comparisons (no results ‐just set condition codes)
 Logical operations
 Data movement between registers

 Remember, this is a load / store architecture
   These instructions only work on registers, NOT memory.
 They each perform a specific operation on one or two operands.
   First operand always a register - Rn
   Second operand sent to the ALU via barrel shifter.
 We will examine the barrel shifter shortly.

15
Arithmetic Operations
 Operations are:
   ADD operand1 + operand2 ; Add
   ADC operand1 + operand2 + carry ; Add with carry
   SUB operand1 - operand2 ; Subtract
   SBC operand1 - operand2 + carry - 1 ; Subtract with carry
   RSB operand2 - operand1 ; Reverse subtract
   RSC operand2 - operand1 + carry - 1 ; Reverse subtract with carry

 Syntax:
   <Operation>{<cond>}{S} Rd, Rn, Operand2

 Examples
   ADD r0, r1, r2
   SUBGT r3, r3, #1
   RSBLES r4, r5, #5
16
Comparisons
 The only effect of the comparisons is to update the
condition flags. Thus no need to set S bit.
 Operations are:
 CMP operand1 ‐operand2 ; Compare
 CMN operand1 + operand2 ; Compare negative
 TST operand1 AND operand2 ; Test
 TEQ operand1 EOR operand2 ; Test equivalence

 Syntax:
 <Operation>{<cond>} Rn, Operand2
 Examples:
 CMP r0, r1
 TSTEQ r2, #5
17
Logical Operations
 Operations are:
 AND operand1 AND operand2
 EOR operand1 EOR operand2
 ORR operand1 OR operand2
 ORN operand1 OR NOT operand2 (Thumb-2 only)
 BIC operand1 AND NOT operand2 [ie bit clear]
 Syntax:
 <Operation>{<cond>}{S} Rd, Rn, Operand2

 Examples:
 AND r0, r1, r2
 BICEQ r2, r3, #7
 EORS r1, r3, r0
18
Data Movement
 Operations are:
 MOV operand2
 MVN NOT operand2
 Note that these make no use of operand1.
 Syntax:
 <Operation>{<cond>}{S} Rd, Operand2

 Examples:
 MOV r0, r1
 MOVS r2, #10
 MVNEQ r1,#0

19
The Barrel Shifter

 The ARM doesn’t have actual shift instructions.

 Instead it has a barrel shifter which provides a mechanism to carry out shifts as part of other instructions.

 So what operations does the barrel shifter support?

20
The Barrel Shifter
 Barrel Shifter ‐Left Shift
 Shifts left by the specified amount (multiplies by powers
of two) e.g.

 LSL #5 => multiply by 32

21
The Barrel Shifter
 Barrel Shifter ‐Rotations

22
Using a Barrel Shifter:The 2nd Operand
Data path: Operand 1 goes straight to the ALU; Operand 2 passes through the Barrel Shifter on its way to the ALU; the ALU produces the Result.

Register, optionally with shift operation
 Shift value can be either:
   5 bit unsigned integer
   Specified in bottom byte of another register.
 Used for multiplication by constant

Immediate value
 8 bit number, with a range of 0-255.
   Rotated right through even number of positions
 Allows increased range of 32-bit constants to be loaded directly into registers

23
Second Operand : Shifted Register
 The amount by which the register is to be shifted is
contained in either:
 the immediate 5‐bit field in the instruction
 NO OVERHEAD
 Shift is done for free ‐executes in single cycle.
 the bottom byte of a register (not PC)
 Then takes extra cycle to execute
 ARM doesn’t have enough read ports to read 3 registers at
once.
 Then same as on other processors where shift is separate
instruction.
 If no shift is specified then a default shift is applied: LSL #0
 i.e. barrel shifter has no effect on value in register.

24
Second Operand: Using a Shifted Register
 Using a multiplication instruction to multiply by a constant means
first loading the constant into a register and then waiting a number of
internal cycles for the instruction to complete.

 A better solution can often be found by using some combination of MOVs, ADDs, SUBs and RSBs with shifts.
 Multiplications by a constant equal to a ((power of 2) ± 1) can be done in one cycle.
 MOV R2, R0, LSL #2; Shift R0 left by 2, write to R2, (R2=R0x4)
 ADD R9, R5, R5, LSL #3 ; R9 = R5 + R5 x 8 or R9 = R5 x 9
 RSB R9, R5, R5, LSL #3 ; R9 = R5 x 8 ‐R5 or R9 = R5 x 7
 SUB R10, R9, R8, LSR #4 ; R10 = R9 ‐R8 / 16
 MOV R12, R4, ROR R3 ; R12 = R4 rotated right by value of R3

25
Data Processing Exercise
1. How would you load the two’s complement representation of -1 into Register 3 using one instruction?

2. Implement an ABS (absolute value) function for a register value using only two instructions.

3. Multiply a number by 35, guaranteeing that it executes in 2 core clock cycles.

26
Data Processing Solutions
1. MVN r3, #0

2. MOVS r7,r7 ; set the flags
   RSBMI r7,r7,#0 ; if neg, r7=0-r7

3. ADD r9,r8,r8,LSL #2 ; r9=r8*5
   RSB r10,r9,r9,LSL #3 ; r10=r9*7
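The three solutions can be sanity-checked with a Python model of the same register operations. The function names are hypothetical and registers are treated as 32-bit unsigned values, so a negative number appears as its two's complement bit pattern.

```python
MASK = 0xFFFFFFFF


def mvn(operand2):
    """MVN rd, #imm: write the bitwise NOT of operand2."""
    return ~operand2 & MASK


def abs32(r7):
    """MOVS r7,r7 then RSBMI r7,r7,#0."""
    n_flag = (r7 >> 31) & 1          # MOVS copies bit 31 into the N flag
    if n_flag:                       # MI condition: N = 1
        r7 = (0 - r7) & MASK         # RSB: r7 = 0 - r7
    return r7


def times_35(r8):
    """ADD r9,r8,r8,LSL #2 then RSB r10,r9,r9,LSL #3."""
    r9 = (r8 + (r8 << 2)) & MASK     # r9 = r8 * 5
    r10 = ((r9 << 3) - r9) & MASK    # r10 = r9 * 7
    return r10
```

One edge case worth noting: `abs32(0x80000000)` returns 0x80000000, because the most negative 32-bit value has no positive counterpart.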

27
Immediate constants
 No ARM instruction can contain a 32 bit immediate constant
   All ARM instructions are fixed as 32 bits long
 The data processing instruction format has 12 bits available for operand2

    Bits [11:8]   rot
    Bits [7:0]    immed_8

   immed_8 goes through the shifter and is rotated right (ROR) by rot x 2

 4 bit rotate value (0-15) is multiplied by two to give range 0-30 in steps of 2
 Rule to remember is “8-bits rotated right by an even number of bit positions”

Quick Quiz: 0xe3a004ff
    MOV r0, #???

28
Second Operand: Immediate Value (1)
 There is no single instruction which will load a 32 bit immediate
constant into a register without performing a data load from memory.
 All ARM instructions are 32 bits long

 The data processing instruction format has 12 bits available for operand2
   If used directly this would only give a range of 4096.
 Instead it is used to store 8 bit constants, giving a range of 0 - 255.
 These 8 bits can then be rotated right through an even number of positions (ie RORs by 0, 2, 4,..30).
 This gives a much larger range of constants that can be directly loaded, though
some constants will still need to be loaded from memory.

29
Second Operand: Immediate Value (2)
 This gives us:
   0 - 255 [0 - 0xff]
   256, 260, 264, .., 1020 [0x100 - 0x3fc, step 4, 0x40 - 0xff ror 30]
   1024, 1040, 1056, .., 4080 [0x400 - 0xff0, step 16, 0x40 - 0xff ror 28]
   4096, 4160, 4224, .., 16320 [0x1000 - 0x3fc0, step 64, 0x40 - 0xff ror 26]

 These can be loaded using, for example:
   MOV r0, #0x40, 26 ; => MOV r0, #0x1000 (ie 4096)
 To make this easier, the assembler will convert to this form for us if simply given the required constant:
   MOV r0, #4096 ; => MOV r0, #0x1000 (ie 0x40 ror 26)

 The bitwise complements can also be formed using MVN:
   MOV r0, #0xFFFFFFFF ; assembles to MVN r0, #0
 If the required constant cannot be generated, an error will be reported.
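The "8 bits rotated right by an even number of positions" rule lends itself to a short search. The Python sketch below is illustrative only: the function names are hypothetical, and the encoder returns the candidate with the smallest rotate field, which need not be the encoding a real assembler picks when several are valid (0x1000 can be written as 0x40 ror 26 or as 0x01 ror 20, for instance).

```python
MASK = 0xFFFFFFFF


def ror32(x, n):
    """Rotate a 32-bit value right by n bit positions."""
    n %= 32
    x &= MASK
    return ((x >> n) | (x << (32 - n))) & MASK


def decode_immediate(rot, imm8):
    """Expand the 12-bit operand2 field: imm8 rotated right by 2 * rot."""
    return ror32(imm8, 2 * rot)


def encode_immediate(value):
    """Return (rot, imm8) for value, or None if no encoding exists."""
    value &= MASK
    for rot in range(16):
        imm8 = ror32(value, 32 - 2 * rot)   # rotate left to undo the ROR
        if imm8 <= 0xFF:
            return rot, imm8
    return None


def mov_or_mvn(value):
    """Pick MOV if value is encodable, MVN if its complement is, else None."""
    enc = encode_immediate(value)
    if enc is not None:
        return "MOV", enc
    enc = encode_immediate(~value & MASK)
    if enc is not None:
        return "MVN", enc
    return None
```

A `None` result from `mov_or_mvn` corresponds to the case where the assembler reports an error, for example with 0x55555555.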

30
Loading 32 bit constants
 To allow larger constants to be loaded, the assembler offers a pseudo-
instruction:
 LDR rd, =const
 This will either:
 Produce a MOV or MVN instruction to generate the value (if possible).
or
 Generate a LDR instruction with a PC-relative address to read the
constant from a literal pool (Constant data area embedded in the
code).
 For example
   LDR r0,=0xFF => MOV r0,#0xFF
   LDR r0,=0x55555555 => LDR r0,[PC,#Imm12]
                          ...
                          DCD 0x55555555
 This is the recommended way of loading constants into a register

31
Single register data transfer
LDR STR Word
LDRB STRB Byte
LDRH STRH Halfword
LDRSB Signed byte load
LDRSH Signed halfword load

 Memory system must support all access sizes

 Syntax:
 LDR{<cond>}{<size>} Rd, <address>
 STR{<cond>}{<size>} Rd, <address>

e.g. LDREQB
32
Address accessed

 Address accessed by LDR/STR is specified by a base register with an offset


 For word and unsigned byte accesses, offset can be:
 An unsigned 12-bit immediate value (i.e. 0 - 4095 bytes)
LDR r0, [r1, #8]
 A register, optionally shifted by an immediate value
LDR r0, [r1, r2]
LDR r0, [r1, r2, LSL#2]
 This can be either added or subtracted from the base register:
LDR r0, [r1, #-8]
LDR r0, [r1, -r2, LSL#2]
 For halfword and signed halfword / byte, offset can be:
 An unsigned 8 bit immediate value (i.e. 0 - 255 bytes)
 A register (unshifted)
 Choice of pre-indexed or post-indexed addressing
 Choice of whether to update the base pointer (pre-indexed only)
LDR r0, [r1, #-8]!
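The difference between plain offset, pre-indexed and post-indexed addressing comes down to which address is accessed and whether the base register changes. A minimal Python sketch follows; the function name and the mode strings "offset", "pre" and "post" are hypothetical labels, not assembler syntax.

```python
MASK = 0xFFFFFFFF


def effective_address(base, offset, mode):
    """Return (address accessed, base register value afterwards).

    mode "offset": [r1, #off]   access base + offset, base unchanged
    mode "pre":    [r1, #off]!  access base + offset, base updated
    mode "post":   [r1], #off   access base, then base updated
    """
    target = (base + offset) & MASK
    if mode == "offset":
        return target, base
    if mode == "pre":
        return target, target
    if mode == "post":
        return base, target
    raise ValueError("unknown addressing mode: " + mode)


def scaled_register_offset(rm, shift):
    """Offset for [r1, r2, LSL #shift]: the index register scaled by 2**shift."""
    return (rm << shift) & MASK
```

With r1 = 0x1000, `LDR r0, [r1, #-8]!` corresponds to `effective_address(0x1000, -8, "pre")`, which accesses 0xFF8 and leaves r1 = 0xFF8. `LSL #2` turns a word index into a byte offset, which is why it is the usual scaling for word arrays.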

35
Load/Store Exercise
Assume an array of 25 words. A compiler associates
y with r1. Assume that the base address for the
array is located in r2. Translate this C
statement/assignment using just three instructions:

array[10] = array[5] + y;

36
Load/Store Exercise Solution

array[10] = array[5] + y;

LDR r3, [r2, #20] ; r3 = array[5] (byte offset = 5 x 4)
ADD r3, r3, r1 ; r3 = array[5] + y
STR r3, [r2, #40] ; array[10] = array[5] + y (byte offset = 10 x 4)

37
Load and Store Multiples
 Syntax:
 <LDM|STM>{<cond>}<addressing_mode> Rb{!}, <register list>
 4 addressing modes:
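 LDMIA / STMIA increment after
 LDMIB / STMIB increment before
 LDMDA / STMDA decrement after
 LDMDB / STMDB decrement before

Example: LDMxx r10, {r0,r1,r4} or STMxx r10, {r0,r1,r4}, with Base Register (Rb) = r10. The table shows which memory location each register uses; addresses increase towards the top.

Address     IA    IB    DA    DB
r10 + 12          r4
r10 + 8     r4    r1
r10 + 4     r1    r0
r10         r0          r4
r10 - 4                 r1    r4
r10 - 8                 r0    r1
r10 - 12                      r0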
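The four addressing modes differ only in where the block starts relative to the base register. The Python sketch below computes the address used for each register; the function name is hypothetical, registers are given as numbers, and the returned base value is what Rb would hold if the optional "!" writeback were used. The lowest-numbered register always goes to the lowest address.

```python
def block_transfer_addresses(base, registers, mode):
    """Return ({register number: address}, base value after writeback)."""
    regs = sorted(registers)          # lowest register -> lowest address
    size = 4 * len(regs)              # one word per register
    if mode == "IA":                  # increment after
        start, new_base = base, base + size
    elif mode == "IB":                # increment before
        start, new_base = base + 4, base + size
    elif mode == "DA":                # decrement after
        start, new_base = base - size + 4, base - size
    elif mode == "DB":                # decrement before
        start, new_base = base - size, base - size
    else:
        raise ValueError("unknown addressing mode: " + mode)
    addresses = {reg: start + 4 * i for i, reg in enumerate(regs)}
    return addresses, new_base
```

For `{r0,r1,r4}` with r10 = 0x1000, the IA mode places r0, r1 and r4 at 0x1000, 0x1004 and 0x1008, while DB places them at 0xFF4, 0xFF8 and 0xFFC.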

38
Multiply and Divide
 There are 2 classes of multiply - producing 32-bit and 64-bit results
 32-bit versions on an ARM7TDMI will execute in 2 - 5 cycles
   MUL r0, r1, r2 ; r0 = r1 * r2
   MLA r0, r1, r2, r3 ; r0 = (r1 * r2) + r3

 64-bit multiply instructions offer both signed and unsigned versions
   For these instructions there are 2 destination registers
   [U|S]MULL r4, r5, r2, r3 ; r5:r4 = r2 * r3
   [U|S]MLAL r4, r5, r2, r3 ; r5:r4 = (r2 * r3) + r5:r4

 Most ARM cores do not offer integer divide instructions
   Division operations will be performed by C library routines or inline shifts
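The r5:r4 notation means the 64-bit result is split across two registers, low word first in the operand list. A small Python model makes the split explicit; the function names mirror the mnemonics but are only a sketch, and registers are treated as 32-bit unsigned bit patterns.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(r1, r2):
    """MUL r0, r1, r2: only the low 32 bits of the product are kept."""
    return (r1 * r2) & MASK32


def mla(r1, r2, r3):
    """MLA r0, r1, r2, r3: multiply, then accumulate r3."""
    return (r1 * r2 + r3) & MASK32


def umull(r2, r3):
    """UMULL r4, r5, r2, r3: return (r4 = low word, r5 = high word)."""
    product = (r2 & MASK32) * (r3 & MASK32)
    return product & MASK32, product >> 32


def smull(r2, r3):
    """SMULL r4, r5, r2, r3: signed version of umull."""
    product = (to_signed(r2) * to_signed(r3)) & MASK64
    return product & MASK32, product >> 32


def umlal(r4, r5, r2, r3):
    """UMLAL r4, r5, r2, r3: add the product to the existing r5:r4."""
    total = (((r5 << 32) | r4) + (r2 & MASK32) * (r3 & MASK32)) & MASK64
    return total & MASK32, total >> 32
```

The same inputs give different high words depending on signedness: 0xFFFFFFFF times 0xFFFFFFFF is a huge number under `umull`, but (-1) x (-1) = 1 under `smull`.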

39
Branch instructions
 Branch : B{<cond>} label
 Branch with Link : BL{<cond>} subroutine_label

Bits      Field
[31:28]   Cond (condition field)
[27:25]   1 0 1
[24]      L (link bit): 0 = Branch, 1 = Branch with link
[23:0]    Offset

 The processor core shifts the offset field left by 2 positions, sign-extends it
and adds it to the PC
 ± 32 Mbyte range
 How to perform longer branches?
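The ± 32 Mbyte figure follows from the 24-bit word offset. The Python sketch below encodes and decodes a branch using the field layout above. Two things are assumptions not stated on the slide: in ARM state the PC reads as the address of the branch plus 8, so the offset is taken relative to that value, and the default condition value 0xE is the encoding of AL. The function names are hypothetical.

```python
MASK = 0xFFFFFFFF


def encode_branch(address, target, link=False, cond=0xE):
    """Encode B/BL located at `address` that jumps to `target`."""
    offset = target - (address + 8)        # PC reads as address + 8 (assumed)
    if offset % 4:
        raise ValueError("target must be word aligned")
    words = offset >> 2                    # stored offset is in words
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("target out of range (about +/- 32 Mbyte)")
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | (words & 0xFFFFFF)


def branch_target(address, instruction):
    """Recover the destination: sign-extend, shift left by 2, add to PC."""
    words = instruction & 0xFFFFFF
    if words & 0x800000:                   # sign-extend the 24-bit field
        words -= 1 << 24
    return (address + 8 + (words << 2)) & MASK
```

A branch to itself encodes as 0xEAFFFFFE: the offset is -8 bytes, or -2 words, to cancel out the PC being two instructions ahead.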

40
Register Usage
Register    Usage
r0 - r3     Arguments into function; result(s) from function; otherwise corruptible (additional parameters passed on stack)
r4 - r11    Register variables; must be preserved
              r9/sb - Stack base
              r10/sl - Stack limit if software stack checking selected
r12         Scratch register (corruptible)
r13/sp      Stack Pointer - SP should always be 8-byte (2 word) aligned
r14/lr      Link Register - R14 can be used as a temporary once value stacked
r15/pc      Program Counter

 The compiler has a set of rules known as a Procedure Call Standard that determine how to pass parameters to a function (see AAPCS)
 CPSR flags may be corrupted by function call.
 Assembler code which links with compiled code must follow the AAPCS at external interfaces
 The AAPCS is part of the new ABI for the ARM Architecture

41
ARM Branches and Subroutines
 B <label>
 PC relative. ±32 Mbyte range.
 BL <subroutine>
 Stores return address in LR
 Returning implemented by restoring the PC from LR
 For non-leaf functions, LR will have to be stacked

    (caller)            func1                       func2
       :                STMFD sp!,{regs,lr}            :
       :                   :                           :
    BL func1            BL func2                       :
       :                   :                           :
       :                LDMFD sp!,{regs,pc}         MOV pc, lr

PSR access
Bit(s)   Field
31       N
30       Z
29       C
28       V
27       Q
26:25    de (IT bits)
24       J
19:16    GE[3:0]
15:10    IT cond_abc
9        E
8        A
7        I
6        F
5        T
4:0      mode

Field names: f = bits [31:24], s = bits [23:16], x = bits [15:8], c = bits [7:0]

 MRS and MSR allow contents of CPSR / SPSR to be transferred to / from a general purpose register or take an immediate value
   MSR allows the whole status register, or just parts of it to be updated
 Interrupts can be enabled/disabled and modes changed, by writing to the CPSR
   Typically a read/modify/write strategy should be used:

    MRS r0,CPSR ; read CPSR into r0
    BIC r0,r0,#0x80 ; clear bit 7 to enable IRQ
    MSR CPSR_c,r0 ; write modified value to ‘c’ byte only

 In User Mode, all bits can be read but only the condition flags (_f) can be modified
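The field suffixes on MSR act as byte masks. The Python sketch below models that masking and the read/modify/write sequence; the function names and the `privileged` parameter are hypothetical, and the User mode restriction is simplified to "only the f byte is writable", following the statement above.

```python
MASK = 0xFFFFFFFF

# One byte of the status register per field letter.
FIELD_MASKS = {"c": 0x000000FF, "x": 0x0000FF00,
               "s": 0x00FF0000, "f": 0xFF000000}


def msr(psr, value, fields, privileged=True):
    """Model MSR CPSR_<fields>, value: update only the selected bytes."""
    mask = 0
    for field in fields:
        mask |= FIELD_MASKS[field]
    if not privileged:
        mask &= FIELD_MASKS["f"]       # User mode may only change the flags
    return (psr & ~mask & MASK) | (value & mask)


def enable_irq(cpsr):
    """MRS r0,CPSR / BIC r0,r0,#0x80 / MSR CPSR_c,r0."""
    r0 = cpsr                          # MRS: read CPSR into r0
    r0 &= ~0x80 & MASK                 # BIC: clear bit 7 (the I bit)
    return msr(cpsr, r0, "c")          # MSR: write back the control byte only
```

Writing only the `c` byte means the condition flags in the top byte are untouched even if they change between the MRS and the MSR.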

43
Pipeline changes for ARM9TDMI

ARM7TDMI
Stage     Operations
FETCH     Instruction Fetch
DECODE    Thumb→ARM decompress, ARM decode, Reg Select
EXECUTE   Reg Read, Shift, ALU, Reg Write

ARM9TDMI
Stage     Operations
FETCH     Instruction Fetch
DECODE    ARM or Thumb Inst Decode, Reg Decode, Reg Read
EXECUTE   Shift + ALU
MEMORY    Memory Access
WRITE     Reg Write

44
ARM10 vs. ARM11 Pipelines
ARM10
Stage     Operations
FETCH     Branch Prediction, Instruction Fetch
ISSUE     ARM or Thumb Instruction Decode
DECODE    Reg Read
EXECUTE   Shift + ALU, Multiply
MEMORY    Memory Access, Multiply Add
WRITE     Reg Write

ARM11
Front end:          Fetch 1, Fetch 2, Decode, Issue
Parallel pipelines: Shift, ALU, Saturate
                    MAC 1, MAC 2, MAC 3
                    Address, Data Cache 1, Data Cache 2
Final stage:        Write back

45
