Lecture 3
ARM is a RISC processor used in applications that need small size and high performance. Its simple architecture keeps power consumption low.
ARM
TIMELINE (1/2)
1985: Acorn Computer Group manufactures the first commercial RISC microprocessor.
1990: Collaboration between Acorn and Apple leads to the founding of Advanced RISC Machines (ARM).
1991: ARM6, the first embeddable RISC microprocessor.
1992-1994: Various companies use ARM (Sharp, Samsung); in 1993 ARM7, the first multimedia microprocessor, is introduced.
ARM System - On - Chip Architecture
TIMELINE (2/2)
1995: Introduction of Thumb and ARM8.
1996-2000: Alcatel, Hyundai, Philips and Sony use ARM; in 1999 ARM cooperates with Ericsson on the development of Bluetooth.
2000-2002: ARM's share of the 32-bit embedded RISC microprocessor market is 80%. ARM Developer Suite is introduced.
Small size
Registers
31 general-purpose 32-bit registers, 16 of them visible at any one time
7 modes of operation
Different set of visible registers and different CPSR control level in each mode.
[Figure: ARM register organization, by mode]
user mode: r0-r15, CPSR
fiq mode: r8_fiq-r14_fiq, SPSR_fiq
svc mode: r13_svc, r14_svc, SPSR_svc
abort mode: r13_abt, r14_abt, SPSR_abt
irq mode: r13_irq, r14_irq, SPSR_irq
undefined mode: r13_und, r14_und, SPSR_und
CPSR
ARM CPSR format
Bits 31-28: N Z C V (condition flags)
Bits 27-8: unused
Bits 7-6: I F (interrupt disable)
Bit 5: T (Thumb state)
Bits 4-0: mode
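The field layout above can be exercised with a small Python model. This is only a sketch: `decode_cpsr` and `MODE_NAMES` are hypothetical names, and the mode encodings are the standard ARM values, which the slide itself does not list.

```python
# Mode field encodings (standard ARM values; not listed on the slide).
MODE_NAMES = {
    0b10000: "user",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "svc",
    0b10111: "abort",
    0b11011: "undefined",
    0b11111: "system",
}

def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into the fields shown in the format above."""
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "I": (cpsr >> 7) & 1,
        "F": (cpsr >> 6) & 1,
        "T": (cpsr >> 5) & 1,
        "mode": MODE_NAMES.get(cpsr & 0x1F, "reserved"),
    }

# Example value chosen for illustration: Z and C set, IRQ and FIQ disabled,
# ARM state, supervisor mode.
fields = decode_cpsr(0x600000D3)
```

Here `fields["mode"]` is `"svc"`, with `Z`, `C`, `I` and `F` set and `T` clear.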
Memory Organization
[Figure: ARM memory organization. Byte addresses 0 to 23 are drawn four per 32-bit word, with bit 31 on the left and bit 0 on the right (byte 0 at the least significant end). Items shown: word16, half-word14, half-word12, word8, byte6, half-word4, byte3, byte2, byte1, byte0.]
Instruction Set
Supervisor mode
In user mode, operations outside the user's privileges are handled by the operating system. Through supervisor calls, a user program enters system level and can perform system functions.
I/O System
ARM handles peripherals as memory-mapped devices with interrupt support. Interrupts:
IRQ (normal interrupt)
FIQ (fast interrupt)
Exceptions
Exceptions:
Interrupts
Supervisor calls
Traps
When an exception occurs:
The value of the PC is copied to r14_exc.
The operating mode changes to the respective exception mode.
The PC takes the exception handler's vector address.
Arithmetic Operations
ADD r0, r1, r2 ; r0 := r1 + r2, flags not updated
ADDS r0, r1, r2 ; r0 := r1 + r2, flags updated
Logical Operations
AND r0, r1, r2 ; r0:= r1 AND r2
Register Movement
MOV r0, r2
Comparison
CMP r1, r2
Operands:
Immediate operands
ADD r3, r3, #1
Multiplication:
MUL r4, r3, r2
Offset: LDR r0, [r1, #4]
Post-indexed: LDR r0, [r1], #16
Auto-indexed: LDR r0, [r1, #16]!
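The difference between the three forms is where the access happens and whether the base register is updated. A minimal Python model may make this concrete; the function `ldr`, the register dictionary and the memory contents are all hypothetical and chosen only for illustration.

```python
def ldr(regs, mem, rd, rn, offset, mode):
    """Model LDR rd, [rn ...] for the three indexing modes.

    mode is "offset", "pre" (auto-indexed, written with '!') or "post".
    regs maps register names to values; mem maps word addresses to words.
    Returns the address that was accessed.
    """
    base = regs[rn]
    if mode == "post":
        address = base             # access at the unmodified base
    else:
        address = base + offset    # access at base + offset
    regs[rd] = mem[address]
    if mode in ("pre", "post"):
        regs[rn] = base + offset   # write the updated address back to rn
    return address

# Illustrative memory contents (assumed, not taken from the slides).
mem = {0x9000: 0x01010101, 0x9004: 0x02020202}

offset_regs = {"r0": 0, "r1": 0x9000}
ldr(offset_regs, mem, "r0", "r1", 4, "offset")   # LDR r0, [r1, #4]

pre_regs = {"r0": 0, "r1": 0x9000}
ldr(pre_regs, mem, "r0", "r1", 4, "pre")         # LDR r0, [r1, #4]!

post_regs = {"r0": 0, "r1": 0x9000}
ldr(post_regs, mem, "r0", "r1", 4, "post")       # LDR r0, [r1], #4
```

In this model the offset form leaves r1 at 0x9000, while the auto-indexed and post-indexed forms both leave r1 at 0x9004; only the post-indexed form loads from the original base address.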
Examples
PRE:
r0 = 0x02020202 r1 = 0x00009004
Examples
PRE:
r0 = 0x02020202 r1 = 0x00009000
Examples
PRE:
r0 = 0x01010101 r1 = 0x00009004
Examples
PRE:
mem32[0x80018] = 0x03
mem32[0x80014] = 0x02
mem32[0x80010] = 0x01
r0 = 0x00080010
LDMIA r0!, {r1-r3}
POST:
r0 = 0x0008001c
r1 = 0x00000001
r2 = 0x00000002
r3 = 0x00000003
Examples
PRE:
mem32[0x8001c] = 0x04
mem32[0x80018] = 0x03
mem32[0x80014] = 0x02
mem32[0x80010] = 0x01
r0 = 0x00080010
LDMIB r0!, {r1-r3}
POST:
r0 = 0x0008001c
r1 = 0x00000002
r2 = 0x00000003
r3 = 0x00000004
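The two LDM examples differ only in whether the address is incremented after (IA) or before (IB) each transfer. The following Python sketch reproduces both results; `ldm` is a hypothetical helper, not an ARM tool, and it models only the increment variants with a register list given in ascending order.

```python
def ldm(regs, mem, rn, reglist, before, writeback=True):
    """Model LDMIA (before=False) and LDMIB (before=True).

    reglist must be in ascending register order, as the hardware
    always transfers the lowest-numbered register first.
    """
    address = regs[rn]
    for reg in reglist:
        if before:
            address += 4          # increment before the transfer
        regs[reg] = mem[address]
        if not before:
            address += 4          # increment after the transfer
    if writeback:                 # the '!' suffix on the base register
        regs[rn] = address

# Memory contents from the examples above.
mem = {0x80010: 0x01, 0x80014: 0x02, 0x80018: 0x03, 0x8001C: 0x04}

ia = {"r0": 0x00080010, "r1": 0, "r2": 0, "r3": 0}
ldm(ia, mem, "r0", ["r1", "r2", "r3"], before=False)   # LDMIA r0!, {r1-r3}

ib = {"r0": 0x00080010, "r1": 0, "r2": 0, "r3": 0}
ldm(ib, mem, "r0", ["r1", "r2", "r3"], before=True)    # LDMIB r0!, {r1-r3}
```

Both runs end with r0 = 0x0008001c; LDMIA loads 1, 2, 3 and LDMIB loads 2, 3, 4, matching the POST values in the examples.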
Conditional execution
Instructions can be executed conditionally, without branches.
CMP r2, r3 ; subtract and set flags
ADDGE r4, r5, r6 ; if r2 >= r3
SUBLT r4, r5, r6 ; else
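How the GE and LT conditions follow from the flags can be sketched in Python. This is an illustrative model only: the function names are hypothetical, and it relies on the standard definitions GE = (N equals V) and LT = (N differs from V) for signed comparison.

```python
MASK = 0xFFFFFFFF

def cmp_flags(a, b):
    """Return the N, Z, C, V flags that CMP a, b would set (32-bit operands)."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                        # carry set means no borrow
    # Signed overflow: operands differ in sign and the result's sign differs from a's.
    v = ((a ^ b) & (a ^ result)) >> 31
    return n, z, c, v

def cond_ge(n, z, c, v):
    return n == v                          # signed greater than or equal

def cond_lt(n, z, c, v):
    return n != v                          # signed less than

def run(r2, r3, r5, r6):
    """Model CMP r2, r3 / ADDGE r4, r5, r6 / SUBLT r4, r5, r6 and return r4."""
    flags = cmp_flags(r2, r3)
    r4 = None
    if cond_ge(*flags):
        r4 = (r5 + r6) & MASK              # ADDGE executes
    if cond_lt(*flags):
        r4 = (r5 - r6) & MASK              # SUBLT executes
    return r4
```

For example, `run(5, 3, 10, 4)` and `run(3, 3, 10, 4)` both take the ADDGE path and give 14, while `run(3, 5, 10, 4)` takes the SUBLT path and gives 6. Exactly one of the two conditional instructions executes on every call.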
Branch instruction: B label
Conditional branch: BNE label
Branch and Link: BL label
BL Loop ; call the subroutine at label Loop
Example 1
        AREA    ARMex, CODE, READONLY   ; Name this block of code ARMex
        ENTRY                           ; Mark first instruction to execute
start   MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        ADD     r0, r0, r1              ; r0 = r0 + r1
stop    MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0x123456                ; ARM semihosting SWI
        END                             ; Mark end of file
Example 2
        AREA    subrout, CODE, READONLY ; Name this block of code
        ENTRY                           ; Mark first instruction to execute
start   MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        BL      doadd                   ; Call subroutine
stop    MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0x123456                ; ARM semihosting SWI
doadd   ADD     r0, r0, r1              ; Subroutine code
        MOV     pc, lr                  ; Return from subroutine
        END                             ; Mark end of file
[Figure: processor organization (labels: control, incrementer, ALU)]
Increase f_clk: logic simplification.
Reduce CPI: reduce the number of multicycle instructions.
ARM supports up to 16 coprocessors, which can be emulated in software.
Each coprocessor has up to 16 general-purpose registers.
ARM is a load-store architecture.
Coprocessors usually handle on-chip functions, such as cache and memory management.
(1/2)
For floating-point operations, ARM has the FPE software emulator and the FPA10 hardware floating-point accelerator. The FPA10 includes:
Coprocessor interface
Load/store unit
Register bank (8 registers of 80 bits)
ALU (adder, multiplier, divider)
(2/2)
[Figure: FPA10 organization: instruction issuer, load/store unit, arithmetic unit]
APCS (1/2)
APCS (ARM Procedure Call Standard) is a set of rules concerning the input and output of C procedures.
It assigns specific uses to the general-purpose registers (r0-r3: arguments, r4-r8: variables, r10: stack limit, etc.).
Procedure I/O:
BL Loop
APCS (2/2)
C code
void f1(int a) { f2(a); }
[Figure: stack layout, with offsets 0, 4, 8 and 16 from the stack pointer]
Assembly code
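f1      LDR     r0, [r13]           ; load argument a from the top of the stack
        STR     r14, [r13, #-4]!    ; push the return address (lr)
        STR     r0, [r13, #-4]!     ; push the argument for f2
        BL      f2                  ; call f2
        ADD     r13, r13, #4        ; discard the argument
        LDR     r15, [r13], #4      ; pop the return address into pc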
General information
Thumb objective: code density.
Thumb has a 16-bit instruction set.
A subset of the ARM instruction set is coded into a 16-bit space.
With appropriate use great benefits can be achieved in terms of
Instructions are assembled as 16-bit Thumb instructions when the appropriate directive is used.
A BX r0 instruction switches state: if r0[0] is 1, the T bit in the CPSR becomes 1 and the PC is set to the address obtained from the remaining bits of r0.
Using the BX instruction from Thumb state, we return to ARM state.
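The state change performed by BX can be sketched as a small Python function. The name `bx` is hypothetical, and the ARM-state branch assumes a word-aligned target.

```python
T_BIT = 1 << 5   # position of the T flag in the CPSR

def bx(target, cpsr):
    """Return (new_pc, new_cpsr) after BX with the given register value."""
    if target & 1:
        cpsr |= T_BIT          # bit 0 set: enter Thumb state
        pc = target & ~1       # remaining bits give the halfword-aligned address
    else:
        cpsr &= ~T_BIT         # bit 0 clear: enter (or stay in) ARM state
        pc = target            # assumed to be word-aligned
    return pc, cpsr

# ARM state (T clear), branching to a Thumb routine at 0x8000.
thumb_pc, thumb_cpsr = bx(0x8001, 0x00000010)

# Thumb state (T set), returning to ARM code at 0x9000.
arm_pc, arm_cpsr = bx(0x9000, 0x00000030)
```

The first call yields PC = 0x8000 with the T bit set; the second yields PC = 0x9000 with the T bit cleared.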
Thumb registers
[Figure: Thumb-accessible registers]
Lo registers: r0 r1 r2 r3 r4 r5 r6 r7
Hi registers: r8 r9 r10 r11 r12 SP (r13) LR (r14) PC (r15)
CPSR
Shaded registers have restricted access.
Thumb
Thumb code takes about 70% of the space of the equivalent ARM code.
It uses about 40% more instructions.
It runs 45% faster than ARM code with 16-bit memory.
It requires about 30% less external memory.
If performance is critical:
ARM
Thumb
A 32-bit ARM system can switch to Thumb state for specific routines, in order to meet power and memory constraints.
A 16-bit system can use on-chip 32-bit memory for ARM-state routines, and 16-bit off-chip memory with Thumb code for the rest of the application.
Example 3
        AREA    ThumbSub, CODE, READONLY ; Name this block of code
        ENTRY                           ; Mark first instruction to execute
        CODE32                          ; Subsequent instructions are ARM
header  ADR     r0, start + 1           ; Processor starts in ARM state,
        BX      r0                      ; so small ARM code header used
                                        ; to call Thumb main program
        CODE16                          ; Subsequent instructions are Thumb
start   MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        BL      doadd                   ; Call subroutine
stop    MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0xAB                    ; Thumb semihosting SWI
doadd   ADD     r0, r0, r1              ; Subroutine code
        MOV     pc, lr                  ; Return from subroutine
        END                             ; Mark end of file
Example 4
Implement the following pseudocode in ARM and Thumb assembly. Which is more efficient in terms of execution time, and which in terms of code size?
If r1 > r2 then
    r3 = r4 + r5
    r6 = r4 - r5
Else
    r3 = r4 - r5
    r6 = r4 + r5
Example 5
Write an ARM assembly program that loads data from memory location 0x40, sets bits 3 to 5, clears bits 0 to 2 and leaves the remaining bits unchanged. Test it using 0xAD as input data
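Before writing the assembly, the mask arithmetic for Example 5 can be checked with a short Python sketch. The names `SET_MASK`, `CLEAR_MASK` and `transform` are hypothetical; in ARM assembly the two steps would typically map to an ORR and a BIC instruction.

```python
SET_MASK = 0b00111000     # bits 3 to 5
CLEAR_MASK = 0b00000111   # bits 0 to 2

def transform(value):
    """Set bits 3-5, clear bits 0-2, leave all other bits unchanged."""
    value |= SET_MASK                      # corresponds to ORR with 0x38
    value &= ~CLEAR_MASK & 0xFFFFFFFF      # corresponds to BIC with 0x07
    return value

result = transform(0xAD)
```

With the test input 0xAD (binary 1010 1101), the OR step gives 0xBD (1011 1101) and the clear step gives 0xB8 (1011 1000), so a correct assembly program should produce 0xB8.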
AMBA (1/4)
Advanced High-performance Bus (AHB)
Advanced System Bus (ASB)
Advanced Peripheral Bus (APB)
AMBA (2/4)
AMBA (3/4)
AHB bus
Burst transactions
Split transactions
Data bus: 64 or 128 bit
[Figure: AHB interconnect. Masters 1 to 3 and slaves 1 to 3 are connected through an arbiter and a decoder, with separate address, write data and read data paths.]
AMBA (4/4)
(1/2)
(2/2)
Piccolo
[Figure: Piccolo organization: ALU, multiplier, register bank, I-cache, input buffer, AMBA interface]
MEMORY HIERARCHY
Memory hierarchy
Memory type      Size           Speed
Registers        32 bit         A few nsec
On-chip cache    8-32 kbytes    10 nsec
(Moving down the hierarchy: larger size, lower speed.)
On chip memory
Necessary for performance.
Some systems prefer on-chip RAM to an on-chip cache: it is simpler, cheaper and less power-hungry.
Cache types
Cache miss types:
Compulsory miss: the first time an address is accessed.
Capacity miss: the cache is full.
Conflict miss: two addresses compete for the same place in the cache.
Replacement policies: Least Recently Used (LRU), Least Frequently Used (LFU)
Data prediction
Cache organization: fully associative, direct-mapped, set-associative
(1/2)
(2/2)
In a direct-mapped cache, each memory location has a specific place in the cache. Tag and data can be accessed at the same time. The tag RAM is smaller than the data RAM and has a shorter access time, allowing the tag comparison to complete before the data RAM is accessed.
A set-associative cache is divided into a number of ways, n, yielding an n-way set-associative cache. Two addresses that would compete for the same location in a direct-mapped cache can be stored in different ways and accessed independently.
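The effect on conflict misses can be illustrated with a toy Python cache model. Everything here is an assumption made for the sketch: the `Cache` class, the 16-byte line size, the 4 KB capacity, LRU replacement, and the access pattern.

```python
class Cache:
    """Toy cache: num_sets sets, each holding up to `ways` line tags, LRU replacement."""

    def __init__(self, num_sets, ways, line_bytes=16):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        self.sets = [[] for _ in range(num_sets)]   # each list ordered LRU -> MRU

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then loaded)."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        tags = self.sets[index]
        if tag in tags:
            tags.remove(tag)
            tags.append(tag)        # mark as most recently used
            return True
        if len(tags) == self.ways:
            tags.pop(0)             # evict the least recently used line
        tags.append(tag)
        return False

def count_misses(cache, addresses):
    return sum(not cache.access(a) for a in addresses)

# Two addresses 4 KB apart, accessed alternately.
pattern = [0x0000, 0x1000] * 4

direct_mapped = Cache(num_sets=256, ways=1)   # 4 KB, direct-mapped
two_way = Cache(num_sets=128, ways=2)         # 4 KB, 2-way set-associative

direct_misses = count_misses(direct_mapped, pattern)
two_way_misses = count_misses(two_way, pattern)
```

In the direct-mapped cache both addresses map to the same line and evict each other, so all 8 accesses miss. In the 2-way cache they share a set but occupy different ways, so only the first 2 accesses (compulsory misses) miss.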
Set selection:
Write strategies
Write through
All write operations are passed to main memory
MMU (1/3)
MMU (2/3)
[Figure: base-and-limit memory management: base, limit, comparison (>?), physical address, access fault]
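One plausible reading of the figure labels (base, limit, the ">?" comparison, physical address, access fault) is a base-and-limit translation. The Python sketch below follows that reading; it is an assumption about what the figure showed, and `translate` and `AccessFault` are hypothetical names.

```python
class AccessFault(Exception):
    """Raised when a logical address lies outside its segment."""

def translate(offset, base, limit):
    """Check the offset against the limit, then relocate it by the base."""
    if offset > limit:                 # the ">?" comparison
        raise AccessFault(hex(offset))
    return base + offset               # physical address

physical = translate(0x100, base=0x8000, limit=0xFFF)

try:
    translate(0x1000, base=0x8000, limit=0xFFF)
    faulted = False
except AccessFault:
    faulted = True
```

The first call stays within the limit and yields physical address 0x8100; the second exceeds the limit and raises the access fault.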
MMU (3/3)
[Figure: paging memory management: logical address, page directory, page table, page frame, data]
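The labels in the figure (logical address, page directory, page table, page frame) suggest a two-level table walk, which can be sketched in Python. The 10/10/12-bit split of the address, the dictionary-based tables and the name `walk` are assumptions for illustration, not the ARM MMU's actual table format.

```python
PAGE_BITS = 12     # assumed 4 KB pages
TABLE_BITS = 10    # assumed 10-bit directory and table indices

def walk(logical, page_directory):
    """Translate a 32-bit logical address through two table levels."""
    dir_index = logical >> (PAGE_BITS + TABLE_BITS)
    table_index = (logical >> PAGE_BITS) & ((1 << TABLE_BITS) - 1)
    offset = logical & ((1 << PAGE_BITS) - 1)
    page_table = page_directory[dir_index]    # first level: find the page table
    frame_base = page_table[table_index]      # second level: find the page frame
    return frame_base + offset                # address of the data in the frame

# Hypothetical tables: directory entry 1 -> page table, entry 2 -> frame at 0x00ABC000.
directory = {1: {2: 0x00ABC000}}
physical = walk(0x00402345, directory)
```

For logical address 0x00402345 the directory index is 1, the table index is 2 and the offset is 0x345, giving physical address 0x00ABC345.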
[Figure: ARM1136JF-based SoC block diagram. ARM1136JF core with ETM; VIC (PL192), DMAC (PL080) and CLCD (PL110) driving the CLCD display; System Control; 64-bit Bus Matrix with 8 AHBs (1. ARM Periph AHB, 2. ARM D Write AHB, 3. ARM D Read AHB, 4. ARM I AHB, 5. ARM DMA AHB, 6. CLCD AHB, 7. DMA 2 AHB, 8. DMA 1 AHB); MPMC (PL176); SMC (PL093) for static memory; AHB/APB bridges to 2x UARTs (PL011), GPIO (PL061, 32 GPIO lines), SSP (PL022) and SCI (PL131).]
CP15
On-chip coprocessor that controls the MMU, cache and protection unit. Control takes place through its registers, using instructions executed in supervisor mode.
Protection Unit
Simpler alternative to the MMU. Requires simpler software and hardware. Does not use translation tables, but 8 protection regions instead.
ARMULATOR (1/2)
ARMulator: an emulator of various ARM processors. It allows project development in C, C++ or assembly. Together with a debugger, compilers and an assembler, it forms the ARM Developer Suite (ADS).
ARMULATOR (2/2)
ARM and Thumb interworking
Mixing C, C++ and assembly
Code for ROM
Exception handlers
ARMULATOR TUTORIAL
CODEWARRIOR ENVIRONMENT