Chapter 2
C. W. Jen 任建葳
[email protected]
Outline
• Processor programming model
Institute of Electronics, National Chiao Tung University
Processor Programming Model
ARM Ltd
• ARM was originally developed at Acorn Computers Limited
ARM Architecture vs. Berkeley RISC
• Features used
– load/store architecture
– fixed-length 32-bit instructions
– 3-address instruction formats
• Features unused
– register windows ⇒ costly
ARM uses shadow (banked) registers instead
– delayed branch ⇒ not well suited to superscalar implementations;
interacts badly with branch prediction
– single-cycle execution of all instructions
in ARM, most instructions are single-cycle
– memory access
multiple cycles when there is no separate data and instruction memory
ARM supports auto-indexing addressing modes
Data Size and Instruction set
• ARM processor is a 32-bit architecture
Data Types
Programming Model
• Each instruction can be viewed as performing a
Processor Modes
• ARM has seven basic operating modes
Privileged Modes
• Most programs operate in user mode. ARM has
Supervisor Mode
• Having some protective privileges
The Registers
Register Banking
Registers Organization Summary
Mode     Registers shared with User mode                 Banked (own) registers
User     r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr      none
FIQ      r0-r7, r15 (pc), cpsr                           r8-r12, r13 (sp), r14 (lr), spsr
IRQ      r0-r12, r15 (pc), cpsr                          r13 (sp), r14 (lr), spsr
SVC      r0-r12, r15 (pc), cpsr                          r13 (sp), r14 (lr), spsr
Undef    r0-r12, r15 (pc), cpsr                          r13 (sp), r14 (lr), spsr
Abort    r0-r12, r15 (pc), cpsr                          r13 (sp), r14 (lr), spsr
Thumb state Low registers: r0-r7
Thumb state High registers: r8-r15
Program Counter (r15)
• When the processor is executing in ARM state:
Program Status Registers (CPSR)
Bits:    31 30 29 28 27 | 26 ... 8  | 7 6 5 | 4 ... 0
Fields:  N  Z  C  V  Q  | undefined | I F T | mode
Byte fields: f = bits 31-24, s = bits 23-16, x = bits 15-8, c = bits 7-0
SPSRs
• Each privileged mode (except system mode) has
Exceptions
• Exceptions are usually used to handle
Exception Entry (2/2)
– force the PC to begin executing at the relevant vector address
Exception Return (2/3)
• When the return address has been kept in the banked r14
– to return from a SWI or undefined instruction trap
MOVS PC,r14
– to return from an IRQ, FIQ or prefetch abort
SUBS PC,r14,#4
– To return from a data abort to retry the data access
SUBS PC,r14,#8
– the ‘S’ modifier selects the special form that also restores the CPSR from the SPSR when the destination register is the PC
Exception Return (3/3)
• When the return address has been saved onto a stack
LDMFD r13!,{r0-r3,PC}^ ;restore and return
– ‘^’ indicates that this is a special form of the instruction:
the CPSR is restored at the same time that the PC is loaded
from memory. The PC is always the last item transferred,
since the registers are loaded in increasing order
Exception Priorities
• Priority order
Memory Organization
[Figure: little-endian and big-endian memory organization; byte addresses 0-23 laid out across 32-bit words (bit 31 to bit 0), with example placements of bytes, half-words and words]
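The difference between the two byte orders can be sketched in a few lines of Python. This is only an illustration using the standard `struct` module; the byte values are made up for the example.

```python
import struct

# The same four bytes, stored at byte addresses 0..3 (illustrative values).
memory = bytes([0x11, 0x22, 0x33, 0x44])

# Little-endian: the byte at the lowest address is the least significant.
little = struct.unpack("<I", memory)[0]

# Big-endian: the byte at the lowest address is the most significant.
big = struct.unpack(">I", memory)[0]

print(hex(little))  # 0x44332211
print(hex(big))     # 0x11223344
```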
Features of the ARM Instruction Set
• Load-store architecture
Coprocessors
[Figure: coprocessor interface; the ARM core and coprocessor pipelines (F D E) share the data bus and are coordinated by handshaking signals]
Thumb
• Thumb is a 16-bit instruction set
Average Thumb Code Sizes
ARM and Thumb Performance
I/O System
• ARM handles input/output peripherals as
ARM Exceptions
• Supports interrupts, traps, supervisor calls
ARM Exceptions
• Exception handler use r13_<mode> which will
ARM Processor Cores
• ARM Processor core + cache + MMU
ARM Processor Cores
• ARM 8 → ARM 9 → ARM 10
• ARM 9
– 5-stage pipeline (130 MHz or 200 MHz)
– using separate instruction and data memory ports
• ARM 10 (Oct. 1998)
– high performance, 300 MHz
– multimedia digital consumer applications
– optional vector floating-point unit
ARM Architecture Versions (1/5)
• Version 1
ARM Architecture Versions (2/5)
• Version 3
ARM Architecture Versions (3/5)
• Version 4
ARM Architecture Versions (4/5)
• Version 5T
ARM Architecture Versions (5/5)
Core                                    Architecture
ARM1                                    v1
ARM2                                    v2
ARM2as, ARM3                            v2a
ARM6, ARM600, ARM610                    v3
ARM7, ARM700, ARM710                    v3
ARM7TDMI, ARM710T, ARM720T, ARM740T     v4T
StrongARM, ARM8, ARM810                 v4
ARM9TDMI, ARM920T, ARM940T              v4T
ARM9E-S                                 v5TE
ARM10TDMI, ARM1020E                     v5TE
32-bit Instruction Set
Instructions
ARM Instruction Set Summary (1/4)
Mnemonic Instruction Action
ARM Instruction Set Summary (2/4)
Mnemonic Instruction Action
ARM Instruction Set Summary (3/4)
Mnemonic Instruction Action
ARM Instruction Set Summary (4/4)
ARM Instruction Set Format
Data Processing Instructions
• Consist of
Conditional Execution
• Most instruction sets only allow branches to be executed conditionally.
• However, by reusing the condition evaluation hardware,
ARM effectively increases the number of instructions.
– all instructions contain a condition field which determines whether
the CPU will execute them
– non-executed instructions still take up 1 cycle
• to allow other stages in the pipeline to complete
• This reduces the number of branches which would stall the
pipeline
– allows very dense in-line code
– the time penalty of not executing several conditional instructions is
frequently less than the overhead of the branch or subroutine call that
would otherwise be needed
Conditional Execution
Each of the 16 values causes the instruction to be executed or skipped
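As a rough sketch of how the 16 condition values map onto the N, Z, C and V flags, the standard ARM condition tests can be modelled in Python. The function name `condition_passes` is an assumption made for this example; NV is modelled as never executing, although later architecture versions treat that encoding as reserved.

```python
def condition_passes(cond, n, z, c, v):
    """Return True if an instruction with this condition mnemonic would execute.

    n, z, c, v are the current flag values (booleans). Hypothetical helper.
    """
    table = {
        "EQ": z,                    "NE": not z,
        "CS": c,                    "CC": not c,
        "MI": n,                    "PL": not n,
        "VS": v,                    "VC": not v,
        "HI": c and not z,          "LS": (not c) or z,
        "GE": n == v,               "LT": n != v,
        "GT": (not z) and n == v,   "LE": z or n != v,
        "AL": True,                 "NV": False,
    }
    return bool(table[cond])


# After CMP r0,#5 with r0 = 3, the result 3 - 5 is negative without overflow
# and without a borrow-free carry: N=1, Z=0, C=0, V=0.
print(condition_passes("LT", True, False, False, False))  # True
print(condition_passes("GE", True, False, False, False))  # False
```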
Using and Updating the Condition Field
Data Processing Instructions
• Simple register operands
• Immediate operands
• Shifted register operands
• Multiply
Simple Register Operands (1/2)
• Arithmetic Operations
Simple Register Operands (2/2)
• Register Movement Operations
• Comparison Operations
– do not produce a result; the destination is omitted from the format
– just set the condition code bits (N, Z, C and V) in CPSR
Immediate Operands
• Replace the second source operand with an immediate
• Immediate = (0 to 255) rotated right by 2n places,
where n is a 4-bit value (0 to 15); without wrap-around this is (0 to 255) × 2^(2m), m = 0 to 12
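A minimal Python sketch of this rule is shown below. It searches for an 8-bit value and a 4-bit rotate count that reproduce a given 32-bit constant. The helper names `ror32` and `encode_immediate` are assumptions for illustration and are not part of any ARM tool.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` places."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_immediate(value):
    """Return (imm8, n) with ror32(imm8, 2 * n) == value, or None if no
    such pair exists. Hypothetical helper; returns the first match found."""
    value &= 0xFFFFFFFF
    for n in range(16):
        # Undo a rotate right by 2n by rotating right by 32 - 2n.
        imm8 = ror32(value, (32 - 2 * n) % 32)
        if imm8 <= 0xFF:
            return imm8, n
    return None


print(encode_immediate(0xFF))        # (255, 0)
print(encode_immediate(0x3FC))       # (255, 15), i.e. 0xFF ror 30
print(encode_immediate(0x55555555))  # None: needs a literal pool load
```

Several (imm8, n) pairs can encode the same constant; the sketch simply returns the one with the smallest n.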
Shifted Register Operands
• ADD r3,r2,r1,LSL#3
;r3:=r2+8*r1
– a single instruction executed in a
single cycle
• LSL: Logical shift left by 0 to 31
places, 0 filled at the lsb end
• LSR, ASL(Arithmetic Shift Left),
ASR, ROR(Rotate Right),
RRX(Rotate Right eXtended by 1
place)
• ADD r5,r5,r3,LSL r2
;r5:=r5+r3*2^r2
• MOV r12,r4,ROR r3
;r12:=r4 rotated right by value of r3
Using the Barrel Shifter: the 2nd Operand
[Figure: Operand 1 feeds the ALU directly; Operand 2 passes through the barrel shifter before reaching the ALU, which produces the Result]
• Register, optionally with a shift operation applied.
– Shift value can be either:
• 5-bit unsigned integer
• Specified in bottom byte of another register
– Used for multiplication by constant
• Immediate value
– 8-bit number, with a range of 0-255
• Rotated right through even number of positions
– Allows increased range of 32-bit constants to be loaded directly into registers
Multiply
MUL r4,r3,r2 ;r4:=(r3*r2)[31:0]
• Multiply-Accumulate
Multiplication by a Constant
• Multiplication by a constant equal to ((power of 2) +/- 1)
Data Processing Instructions (1/3)
• <op>{<cond>}{S} Rd,Rn,#<32-bit immediate>
• <op>{<cond>}{S} Rd,Rn,Rm{,<shift>}
– omit Rn when the instruction is monadic (MOV, MVN)
– omit Rd when the instruction is a comparison, producing only
condition code outputs (CMP, CMN, TST, TEQ)
– <shift> specifies the shift type (LSL, LSR, ASL, ASR, ROR or RRX)
and, in all cases but RRX, the shift amount, which may be a 5-bit
immediate (#<#shift>) or a register Rs
• 3-address format
– 2 source operands and 1 destination register
– one source is always a register, the second may be a register, a
shifted register or an immediate value
Data Processing Instructions (2/3)
Data Processing Instructions (3/3)
• The S bit gives direct control over whether or not the condition codes are
affected (condition codes are unchanged when S=0)
Examples
• ADD r5,r1,r3
• ADD Rs,PC,#offset ;PC is ADD address+8
ADDS r2,r2,r0 ;add the low words, carry out goes to C
ADC r3,r3,r1 ;add the high words plus the carry
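The ADDS/ADC pair above forms a 64-bit addition from two 32-bit additions. A small Python model, with function names chosen only for this sketch, shows how the carry links the two halves.

```python
MASK = 0xFFFFFFFF


def adds(a, b):
    """Model of ADDS: return the 32-bit sum and the carry out."""
    total = (a & MASK) + (b & MASK)
    return total & MASK, total >> 32


def adc(a, b, carry):
    """Model of ADC: 32-bit sum of a, b and the incoming carry."""
    return ((a & MASK) + (b & MASK) + carry) & MASK


# 64-bit value in r3:r2 plus 64-bit value in r1:r0
r2, r3 = 0xFFFFFFFF, 0x00000001   # 0x00000001_FFFFFFFF
r0, r1 = 0x00000001, 0x00000000   # 0x00000000_00000001

r2, carry = adds(r2, r0)          # ADDS r2,r2,r0
r3 = adc(r3, r1, carry)           # ADC  r3,r3,r1
print(hex(r3), hex(r2))           # 0x2 0x0
```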
Multiply Instructions (1/2)
• 32-bit Product (Least Significant)
MUL{<cond>}{S} Rd,Rm,Rs
MLA{<cond>}{S} Rd,Rm,Rs,Rn
• 64-bit Product
<mul>{<cond>}{S} RdHi,RdLo,Rm,Rs
<mul> is (UMULL,UMLAL,SMULL,SMLAL)
Multiply Instructions (2/2)
• Accumulation is denoted by "+="
Count Leading Zeros (CLZ, v5T only)
• CLZ{<cond>} Rd,Rm
• Example
CLZ r1,r2
MOVS r2,r2,LSL r1
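The two-instruction example normalizes r2: CLZ counts the leading zeros and the shift moves the most significant set bit up to bit 31. A Python sketch of the same idea, with hypothetical helper names, might look like this.

```python
def clz(x):
    """Count leading zeros of a 32-bit value (32 when x is zero)."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


def normalize(x):
    """Model of CLZ r1,r2 followed by MOVS r2,r2,LSL r1.

    Returns the shifted value and the shift count.
    """
    shift = clz(x)
    return (x << shift) & 0xFFFFFFFF, shift


print(clz(1))                        # 31
value, shift = normalize(0x00012345)
print(hex(value), shift)             # 0x91a28000 15
```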
Data Transfer Instructions
Single Register Data Transfer
• Word transfer
LDR / STR
• Byte transfer
LDRB / STRB
• Halfword transfer
LDRH / STRH
• Load signed byte or halfword: load the value and sign-extend
it to 32 bits
LDRSB / LDRSH
• All of these can be conditionally executed by inserting the
appropriate condition code after STR/LDR
LDREQB
Addressing
• Register-indirect addressing
• Base-plus-offset addressing
– base register
r0-r15
– offset, add or subtract an unsigned number
immediate
register (not PC)
scaled register (only available for word and unsigned byte instructions)
• Stack addressing
• Block-copy addressing
Register-indirect Addressing
• Use a value in one register (base register) as a memory address
LDR r0,[r1] ;r0:=mem32[r1]
STR r0,[r1] ;mem32[r1]:=r0
• Other forms
– adding immediate or register offsets to the base
address
Initializing an Address Pointer
• A small offset to the program counter, r15
Single Register Load and Store
an immediate value
• Post-indexing
LDR r0,[r1],#4 ;r0:=mem32[r1], r1:=r1+4
– equivalent to a simple register-indirect load, but faster, less code
space
• Auto-indexing
LDR r0,[r1,#4]! ;r0:=mem32[r1+4], r1:=r1+4
– no extra time, auto-indexing performed while the data is being
fetched from memory
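The difference between a plain offset, auto-indexing (pre-indexed with write-back) and post-indexing can be sketched with a toy model. The `ldr` function, the `mode` strings and the dictionary-based memory are all assumptions made for this illustration.

```python
def ldr(mem, regs, rd, rn, offset=0, mode="offset"):
    """Toy model of LDR rd,[rn,...] for three addressing forms.

    mode = "offset": LDR rd,[rn,#offset]    base unchanged
    mode = "auto":   LDR rd,[rn,#offset]!   base := base + offset
    mode = "post":   LDR rd,[rn],#offset    load from base, then base += offset
    """
    base = regs[rn]
    if mode == "post":
        address, new_base = base, base + offset
    elif mode == "auto":
        address = new_base = base + offset
    else:
        address, new_base = base + offset, base
    regs[rd] = mem[address]
    regs[rn] = new_base


mem = {0x100: 10, 0x104: 20}

regs = {"r0": 0, "r1": 0x100}
ldr(mem, regs, "r0", "r1", 4, "post")   # LDR r0,[r1],#4
print(regs["r0"], hex(regs["r1"]))      # 10 0x104

regs = {"r0": 0, "r1": 0x100}
ldr(mem, regs, "r0", "r1", 4, "auto")   # LDR r0,[r1,#4]!
print(regs["r0"], hex(regs["r1"]))      # 20 0x104
```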
Base-plus-offset Addressing (2/3)
*Pre-indexed: STR r0, [r1, #12]
[Figure: addressing diagram; base register r1 = 0x200, offset 12, source register for STR r0 = 0x5]
Base-plus-offset Addressing (3/3)
• Copy
ADR r1,TABLE1
ADR r2,TABLE2
Loop LDR r0,[r1],#4
STR r0,[r2],#4
???
…
TABLE1
…
TABLE2
…
Loading Constants (1/2)
• No single ARM instruction can load a 32-bit immediate
constant directly into a register
Loading Constants (2/2)
• This gives us:
– 0~255 [0-0xff]
– 256,260,264,…,1020 [0x100-0x3fc,step4,0x40-0xff ror 30]
– 1024,1040,…,4080 [0x400-0xff0,step16,0x40-0xff ror 28]
– 4096,4160,…,16320 [0x1000-0x3fc0,step64,0x40-0xff ror 26]
• To load a constant, simply move the required value into a register - the
assembler will convert to the rotate form for us
– MOV r0,#4096 ;MOV r0,#0x1000 (0x40 ror 26)
Loading 32-bit Constants
• To allow larger constants to be loaded, the assembler
offers a pseudo-instruction:
LDR rd,=const
• This will either:
– produce a MOV or MVN instruction to generate the value (if
possible) or
– generate a LDR instruction with a PC-relative address to read the
constant from a literal pool (Constant data area embedded in the
code)
• For example
LDR r0,=0xFF ;MOV r0,#0xFF
LDR r0,=0x55555555 ;LDR r0,[PC,#Imm10]
• As this mechanism will always generate the best
instruction for a given case, it is the recommended way of
loading constants.
Multiple Register Data Transfer (1/2)
• The load and store multiple instructions (LDM/STM) allow between 1
and 16 registers to be transferred to or from memory
Multiple Register Data Transfer (2/2)
• Allow any subset (or all, r0 to r15) of the 16 registers to be
LDMIA r1,{r0,r2,r5}
;r0:=mem32[r1]
;r2:=mem32[r1+4]
;r5:=mem32[r1+8]
Stack Processing
• A stack is usually implemented as a linear data structure which grows
up (an ascending stack) or down (a descending stack) memory
• A stack pointer holds the address of the current top of the stack, either
by pointing to the last valid data item pushed onto the stack (a full stack),
or by pointing to the vacant slot where the next data item will be placed
(an empty stack)
• ARM multiple register transfer instructions support all four forms of
stacks
– full ascending: grows up; base register points to the highest address
containing a valid item
– empty ascending: grows up; base register points to the first empty location
above the stack
– full descending: grows down; base register points to the lowest address
containing a valid item
– empty descending: grows down, base register points to the first empty
location below the stack
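The four stack forms differ only in the direction of growth and in whether the pointer moves before or after the transfer. A toy Python model, in which the `Stack` class and its field names are assumptions for this sketch, makes the four combinations explicit.

```python
class Stack:
    """Toy word stack over a dict-based memory (4-byte words)."""

    def __init__(self, sp, full, ascending):
        self.mem = {}
        self.sp = sp
        self.full = full                      # pointer addresses the last valid item
        self.step = 4 if ascending else -4    # direction of growth

    def push(self, value):
        if self.full:
            self.sp += self.step              # move first, then store
            self.mem[self.sp] = value
        else:
            self.mem[self.sp] = value         # store first, then move
            self.sp += self.step

    def pop(self):
        if self.full:
            value = self.mem[self.sp]         # load first, then move back
            self.sp -= self.step
        else:
            self.sp -= self.step              # move back first, then load
            value = self.mem[self.sp]
        return value


# Full descending stack, as used by STMFD/LDMFD
fd = Stack(sp=0x1000, full=True, ascending=False)
fd.push(1)
print(hex(fd.sp))  # 0xffc
print(fd.pop())    # 1
```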
Block Copy Addressing (1/2)
• Addressing modes
[Figure: memory from 0x1000 to 0x1018 before and after STMIA, STMIB, STMDA and STMDB r9!,{r0,r1,r5}; r9 = 0x100c before the transfer and r9' is the written-back value]

                      Ascending            Descending
                      Full      Empty      Full      Empty
Increment  Before     STMIB                          LDMIB
                      STMFA                          LDMED
           After                STMIA      LDMIA
                                STMEA      LDMFD
Decrement  Before               LDMDB      STMDB
                                LDMEA      STMFD
           After      LDMDA                          STMDA
                      LDMFA                          STMED
Block Copy Addressing (2/2)
• Copy 8 words from the location r0 points to, to the location
r1 points to
LDMIA r0!,{r2-r9}
STMIA r1,{r2-r9}
– r0 increased by 32, r1 unchanged
• If r2 to r9 contained useful values, preserve them by
pushing them onto a stack
STMFD r13!,{r2-r9}
LDMIA r0!,{r2-r9}
STMIA r1,{r2-r9}
LDMFD r13!,{r2-r9}
– FD postfix: full descending stack addressing mode
Memory Block Copy
• The direction that the base pointer moves through memory
is given by the postfix to the STM/LDM instruction
– STMIA/LDMIA: Increment After
[Figure: copy from the source block to the destination block, towards increasing memory; r12 and r14 mark the source, r13 the destination]
;r12 points to start of source data
;r14 points to end of source data
;r13 points to start of destination data
Loop LDMIA r12!,{r0-r11} ;load 48 bytes
STMIA r13!,{r0-r11} ;and store them
CMP r12,r14 ;check for the end
BNE Loop ;and loop until done
– this loop transfers 48 bytes in 31 cycles
– over 50Mbytes/sec at 33MHz
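The throughput figure follows directly from the numbers on the slide. As a quick check in Python (variable names are illustrative only):

```python
registers_per_transfer = 12               # r0-r11
bytes_per_iteration = registers_per_transfer * 4
cycles_per_iteration = 31                 # from the slide
clock_hz = 33_000_000                     # 33 MHz

throughput = bytes_per_iteration / cycles_per_iteration * clock_hz
print(bytes_per_iteration)                # 48
print(round(throughput / 1e6, 1))         # 51.1 (Mbytes/sec)
```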
Single Word and Unsigned Byte Data Transfer
Instructions
• Pre-indexed form
Example
• Store a byte in r0 to a peripheral
LDR r1,UARTADD
STRB r0,[r1] ;store data to UART
…
UARTADD & &10000000
Half-word and Signed Byte Data Transfer Instructions
• Pre-indexed form
LDR|STR{<cond>}H|SH|SB Rd,[Rn,<offset>]{!}
• Post-indexed form
LDR|STR{<cond>}H|SH|SB Rd,[Rn],<offset>
Example
of words
ADR r1,ARRAY1 ;half-word array start
ADR r2,ARRAY2 ;word array start
ADR r3,ENDARR1 ;ARRAY1 end+2
Loop LDRSH r0,[r1],#2 ;get signed half-word
STR r0,[r2],#4 ;save word
CMP r1,r3 ;check for end of array
BLT Loop ;if not finished, loop
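The loop above reads signed half-words and writes them back as sign-extended words. The same transformation can be sketched in Python with the standard `struct` module; the function name and the little-endian byte order are assumptions for this example.

```python
import struct


def expand_halfwords(data):
    """Sign-extend an array of 16-bit values into an array of 32-bit words.

    Assumes little-endian data; mirrors LDRSH followed by STR for each item.
    """
    count = len(data) // 2
    halfwords = struct.unpack("<%dh" % count, data)   # signed 16-bit loads
    return struct.pack("<%di" % count, *halfwords)    # 32-bit stores


array1 = struct.pack("<3h", 1, -2, 300)
array2 = expand_halfwords(array1)
print(struct.unpack("<3i", array2))   # (1, -2, 300)
print(len(array1), len(array2))       # 6 12
```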
Multiple Register Transfer Instructions
• LDM|STM{<cond>}<add mode> Rn{!},<registers>
– <add mode> specifies one of the addressing modes; ‘!’:
auto-indexing; <registers> a list of registers, e.g. {r0,r3-r7,pc}
• In non-user mode, the CPSR may be restored by
LDM{<cond>}<add mode> Rn{!},<registers+PC>^
Example
Swap Memory and Register Instructions
• SWP{<cond>}{B} Rd,Rm,[Rn]
Status Register to General Register Transfer
Instructions
• MRS{<cond>} Rd,CPSR|SPSR
• Example
MRS r0,CPSR
MRS r3,SPSR
General Register to Status Register Transfer
Instructions
• MSR{<cond>} CPSR_<field>|SPSR_<field>,#<32-bit immediate>
MSR{<cond>} CPSR_<field>|SPSR_<field>,Rm
– <field> is one of
• c - the control field PSR[7:0]
• x - the extension field PSR[15:8]
• s - the status field PSR[23:16]
• f - the flag field PSR[31:24]
• Example
– set N, Z, C, V flags
MSR CPSR_f,#&f0000000
– set just C, preserving N, Z, V
MRS r0,CPSR
ORR r0,r0,#&20000000 ;set bit29 of r0
MSR CPSR_f,r0
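The field suffix acts as a byte-wide write mask on the status register. A short Python model, in which `FIELD_MASKS` and `msr` are names invented for this sketch and the initial CPSR value is illustrative, reproduces both examples.

```python
FIELD_MASKS = {
    "c": 0x000000FF,   # control field   PSR[7:0]
    "x": 0x0000FF00,   # extension field PSR[15:8]
    "s": 0x00FF0000,   # status field    PSR[23:16]
    "f": 0xFF000000,   # flag field      PSR[31:24]
}


def msr(psr, fields, value):
    """Model of MSR: only the bytes named in `fields` are written."""
    mask = 0
    for field in fields:
        mask |= FIELD_MASKS[field]
    return (psr & ~mask & 0xFFFFFFFF) | (value & mask)


cpsr = 0x80000010                        # N set, mode bits 10000 (illustrative)

# MSR CPSR_f,#&f0000000 sets N, Z, C and V
print(hex(msr(cpsr, "f", 0xF0000000)))   # 0xf0000010

# MRS r0,CPSR / ORR r0,r0,#&20000000 / MSR CPSR_f,r0 sets just C
r0 = cpsr | 0x20000000
print(hex(msr(cpsr, "f", r0)))           # 0xa0000010
```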
Control Flow Instructions
• Branch instructions
• Conditional branches
• Conditional execution
• Branch and link instructions
• Subroutine return instructions
• Supervisor calls
• Jump tables
Branch Instructions
B LABEL
…
LABEL …
Conditional Branches
Conditional Branch
Branch Interpretation Normal uses
B Unconditional Always take this branch
BAL Always Always take this branch
Conditional Execution
• An unusual feature of the ARM instruction set is that conditional execution
Branch and Link Instructions
• Perform a branch, save the address following the branch
Subroutine Return Instructions
SUB …
MOV PC,r14 ;copy r14 into r15 to return
Supervisor Calls
• The supervisor is a program which operates at a privileged
Jump Tables
Branch and Branch with Link (B,BL)
• B{L}{<cond>} <target address>
Bits:    31-28 | 27-25 | 24 | 23-0
Fields:  cond  | 101   | L  | 24-bit signed word offset
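The 24-bit offset is a word offset relative to the branch address plus 8, because the PC runs two instructions ahead. A Python sketch of the encoding is given below; `encode_branch` is a hypothetical helper, and the bit pattern 101 for bits 27-25 is the standard ARM B/BL encoding.

```python
def encode_branch(address, target, link=False, cond=0xE):
    """Encode B or BL placed at `address` and branching to `target`.

    cond defaults to 0xE (always). Raises ValueError if the target is
    not word aligned relative to the branch or is out of range.
    """
    offset = target - (address + 8)          # PC reads as address + 8
    if offset % 4:
        raise ValueError("target must be word aligned")
    word_offset = offset >> 2                # arithmetic shift in Python
    if not -(1 << 23) <= word_offset < (1 << 23):
        raise ValueError("target out of range")
    return ((cond << 28) | (0b101 << 25) | (int(link) << 24)
            | (word_offset & 0xFFFFFF))


print(hex(encode_branch(0x8000, 0x8FEC, link=True)))  # 0xeb0003f9
print(hex(encode_branch(0x8000, 0x8000)))             # 0xeafffffe
```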
Examples
• Unconditional jump
B LABEL
…
LABEL …
• Loop ten times
MOV r0,#10
Loop …
SUBS r0,#1
BNE Loop
…
• Call a subroutine
BL SUB
…
SUB …
MOV PC,r14
• Conditional subroutine call
CMP r0,#5
BLLT SUB1 ; if r0<5, call SUB1
BLGE SUB2 ; else call SUB2
Branch, Branch with Link and eXchange
• B{L}X{<cond>} Rm
Example
CODE32
…
BLX TSUB ;call Thumb subroutine
…
CODE16 ;start of Thumb code
TSUB …
BX r14 ;return to ARM code
Software Interrupt (SWI)
• SWI{<cond>} <24-bit immediate>
Examples
• Output the character ‘A’
MOV r0,#'A'
SWI SWI_WriteC
• Finish executing the user program and return to the monitor
SWI SWI_Exit
• A subroutine to output a text string
BL STROUT
= "Hello World",&0a,&0d,0
…
STROUT LDRB r0,[r14],#1 ;get character
CMP r0,#0 ;check for end marker
SWINE SWI_WriteC ;if not end, print
BNE STROUT ; … , loop
ADD r14,#3 ;align to next word
BIC r14,#3
MOV PC,r14 ;return
Coprocessor Instructions
• Extend the instruction set through the addition of coprocessors
– System Coprocessor: control on-chip function such as cache and
memory management unit
– Floating-point Coprocessor
– Application-Specific Coprocessor
• Coprocessors have their own private register sets and
their state is controlled by instructions that mirror the
instructions that control ARM registers
Coprocessor Data Operations
• CDP{<cond>} <CP#>,<Cop1>,CRd,CRn,CRm{,<Cop2>}
Bits:    31-28 | 27-24 | 23-20 | 19-16 | 15-12 | 11-8 | 7-5  | 4 | 3-0
Fields:  cond  | 1110  | Cop1  | CRn   | CRd   | CP#  | Cop2 | 0 | CRm
Coprocessor Data Transfers
• Pre-indexed form
LDC|STC{<cond>}{L}<CP#>,CRd,[Rn,<offset>]{!}
• Post-indexed form
LDC|STC{<cond>}{L}<CP#>,CRd,[Rn],<offset>
– L flag, if present, selects the long data type
– <offset> is # +/-<8-bit immediate>
Bits:    31-28 | 27-25 | 24 | 23 | 22 | 21 | 20 | 19-16 | 15-12 | 11-8 | 7-0
Fields:  cond  | 110   | P  | U  | N  | W  | L  | Rn    | CRd   | CP#  | 8-bit offset
P: pre-/post-index
U: up/down
N: data size (coprocessor dependent)
W: write-back (auto-index)
L: load/store
Rn: base register
CRd: source/destination register
– the address is calculated within the ARM; the number of words transferred is
controlled by the coprocessor
• Examples
LDC P6,c0,[r1]
STCEQL P5,c1,[r0],#4
Coprocessor Register Transfers
• Move to ARM register from coprocessor
MRC{<cond>} <CP#>,<Cop1>,Rd,CRn,CRm{,<Cop2>}
• Move to coprocessor from ARM register
MCR{<cond>} <CP#>,<Cop1>,Rd,CRn,CRm{,<Cop2>}
Bits:    31-28 | 27-24 | 23-21 | 20 | 19-16 | 15-12 | 11-8 | 7-5  | 4 | 3-0
Fields:  cond  | 1110  | Cop1  | L  | CRn   | Rd    | CP#  | Cop2 | 1 | CRm
• Examples
MCR P14,3,r0,c1,c2
MRCCS P2,4,r3,c3,c4,6
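The two examples can be assembled by hand from the field layout. The Python sketch below packs the fields into a 32-bit word; `encode_cp_reg_transfer` and the small `COND` table are names assumed for this illustration, and the L bit is taken to be 1 for MRC (coprocessor to ARM) and 0 for MCR.

```python
COND = {"EQ": 0x0, "CS": 0x2, "AL": 0xE}   # subset of condition codes


def encode_cp_reg_transfer(load, cp, cop1, rd, crn, crm, cop2=0, cond="AL"):
    """Encode MRC (load=True) or MCR (load=False)."""
    return ((COND[cond] << 28) | (0b1110 << 24) | (cop1 << 21)
            | (int(load) << 20) | (crn << 16) | (rd << 12)
            | (cp << 8) | (cop2 << 5) | (1 << 4) | crm)


# MCR P14,3,r0,c1,c2
print(hex(encode_cp_reg_transfer(False, 14, 3, 0, 1, 2)))              # 0xee610e12
# MRCCS P2,4,r3,c3,c4,6
print(hex(encode_cp_reg_transfer(True, 2, 4, 3, 3, 4, 6, cond="CS")))  # 0x2e9332d4
```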
Breakpoint Instructions (BKPT, v5T only)
Unused Instruction Space
• Unused Arithmetic Instructions
16-bit Instruction Set
Thumb Instruction Set (1/3)
Mnemonic   Instruction   Lo Register   Hi Register   Condition Code
Thumb Instruction Set (2/3)
Mnemonic   Instruction   Lo Register   Hi Register   Condition Code
Thumb Instruction Set (3/3)
Mnemonic   Instruction   Lo Register   Hi Register   Condition Code
Thumb Instruction Format
Thumb-ARM Difference
• Thumb instruction set is a subset of the ARM instruction
Register Access in Thumb
• Not all registers are directly accessible in Thumb
Thumb Accessible Registers
Branches
• Thumb defines three PC-relative branch instructions, each of which
has a different offset range
Data Processing Instructions
• Subset of the ARM data processing instructions
Block Data Transfers
• Memory copy, incrementing base pointer after transfer
Miscellaneous
• Thumb SWI instruction format
Bits:    15-8      | 7-0
Fields:  1101 1111 | SWI number
• Indirect access to CPSR and no access to SPSR, so no
MRS or MSR instructions
• No coprocessor instruction space
Thumb Instruction Entry and Exit
• T bit, bit 5 of CPSR
The Need for Interworking
Interworking Instructions
• Interworking is achieved using the Branch Exchange instructions
– in Thumb state
BX Rn
– in ARM state (on Thumb-aware cores only)
BX<condition> Rn
where Rn can be any register (r0 to r15)
• This performs a branch to an absolute address in 4GB
address space by copying Rn to the program counter
• Bit 0 of Rn specifies the state to change to
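A minimal Python model of the state switch is shown below. The function name `bx` is chosen for this sketch, and only bit 0 is cleared in the destination address.

```python
def bx(rn):
    """Model of BX Rn: return (new PC, new state).

    Bit 0 of Rn selects the state and is not part of the destination address.
    """
    state = "Thumb" if rn & 1 else "ARM"
    pc = rn & 0xFFFFFFFE
    return pc, state


print(bx(0x00008001))   # (32768, 'Thumb')
print(bx(0x00009000))   # (36864, 'ARM')
```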
Switching between States
[Figure: BX copies Rn (bits 31 to 1) to the destination address, whose bit 0 is 0; bit 0 of Rn is the ARM/Thumb selection: 0 = ARM state, 1 = Thumb state]
Example
;start off in ARM state
CODE32
ARM Processor Core
3-Stage Pipeline ARM Organization
• Register Bank
Data Processing Instructions
3-Stage Pipeline (2/2)
• At any time slice, 3 different instructions may occupy each
Data Transfer Instructions
• The third cycle, which is required to complete the pipeline refilling, is also used
to make a small correction to the value stored in the link register in order that it
points directly at the instruction which follows the branch
Branch Pipeline Example
Cycle                    1        2        3        4        5
address  operation
0x8000   BL              fetch    decode   execute  linkret  adjust
0x8004   X                        fetch    decode
0x8008   XX                                fetch
0x8FEC   ADD                                        fetch    decode   execute
0x8FF0   SUB                                                 fetch    decode   execute
0x8FF4   MOV                                                          fetch    decode
                                                                               fetch
Interrupt Pipeline Example
Cycle                  1           2           3           4           5           6           7           8
address  operation
0x8000   ADD           fetch       decode      execute
0x8004   SUB                       fetch       decode IRQ  execute IRQ linkret     adjust
0x8008   MOV                                   fetch
0x800C   X                                                 fetch
0x0018   B(to 0xAF00)                                                  fetch       decode      execute
0x001C   XX                                                                        fetch       decode
0x0020   XXX                                                                                   fetch
0xAF00   STMFD                                                                                             fetch       decode      execute
0xAF04   MOV                                                                                                           fetch       decode
0xAF08   LDR                                                                                                                       fetch
5-Stage Pipelined ARM Organization
• Tprog=Ninst*CPI*cycle_time
ARM9TDMI 5-stage Pipeline Organization
• Fetch
– The instruction is fetched from memory and
Data Forwarding
• Data dependency arises when an instruction needs to use the result of
one of its predecessors before the result has returned to the register
ARM7TDMI Processor Core
• Current low-end ARM core for applications like digital mobile phones
• TDMI
– T: Thumb, 16-bit compressed instruction set
– D: on-chip Debug support, enabling the processor to halt in
response to a debug request
– M: enhanced Multiplier, yields a full 64-bit result, high performance
– I: Embedded ICE hardware
• Von Neumann architecture
• 3-stage pipeline, CPI ~1.9
ARM7TDMI Block Diagram
ARM7TDMI Core Diagram
ARM7TDMI Interface Signals (1/4)
ARM7TDMI Interface Signals (2/4)
• Clock control
– all state changes within the processor are controlled by mclk, the memory clock
– internal clock = mclk AND \wait
– eclk clock output reflects the clock used by the core
• Memory interface
– 32-bit address A[31:0], bidirectional data bus D[31:0], separate data out
Dout[31:0], data in Din[31:0]
– \mreq indicates a processor cycle which requires a memory access
– seq indicates that the memory address will be sequential to that used in the
previous cycle
ARM7TDMI Interface Signals (3/4)
– lock indicates that the processor should keep the bus to ensure the
ARM7TDMI Interface Signals (4/4)
• Interrupt
• ARM7TDMI characteristics
External Address Generation
Memory Access
ARM Memory Interface
Instruction Execution Cycles (1/2)
Instruction Execution Cycles (2/2)
Effect of T bit
Cached ARM7TDMI Macrocells
ARM 8
• Higher performance than ARM7
– by increasing the clock rate
Integer Unit Organization
ARM9TDMI
• Harvard architecture
ARM9TDMI Organization
ARM9TDMI Pipeline Operations (1/2)
ARM9TDMI Datapath (1/2)
ARM9TDMI Datapath (2/2)
LDR Interlock
Optimal Pipelining
LDM Interlock (1/2)
LDM Interlock (2/2)
Example ARM9TDMI System
Cached ARM9TDMI Macrocell
ARM9TDMI Pipeline Operations (2/2)
• Coprocessor support
ARM9E-S Family Overview
• ARM9E is based on an ARM9TDMI with the following extensions
ARM1020T Overview
• Architecture v5T
– ARM1020E will be v5TE
• CPI ~ 1.3
• 6-stage pipeline
• Static branch prediction
• 32KB instruction and 32KB data caches
– ‘hit under miss’ support
• 64 bits per cycle LDM/STM operations
• EmbeddedICE Logic RT-II
• Support for new VFPv1 architecture
• ARM10200 test chip
– ARM1020T
– VFP10
– SDRAM memory interface
– PLL
ARM10TDMI (1/2)
• Current high-end ARM processor core
Software Development
• ARM software development - ADS
• ARM system development - ICE and trace
• ARM-based SoC development – modeling, tools, design flow
Figure: toolchain flow. The C compiler and the assembler produce .aof object files; the linker combines them with libraries into an .aif image, which is debugged on the ARMulator or on a development board.
ARM Development Suite (ADS),
ARM Software Development Toolkit (SDT) (1/3)
• Develop and debug C, C++ or assembly language programs
• armcc ARM C compiler
• armcpp ARM C++ compiler
• tcc Thumb C compiler
• tcpp Thumb C++ compiler
• armasm ARM and Thumb assembler
• armlink ARM linker
– combines the contents of one or more object files with selected parts of one or more object libraries to produce an executable program
– creates ELF executable images
• armsd ARM and Thumb symbolic debugger
– can single-step through C or assembly language sources, set breakpoints and watchpoints, and examine program variables or memory
ARM Development Suite (ADS),
ARM Software Development Toolkit (SDT) (2/3)
• .aof ARM object format file
• ARMsd can load, run and debug programs either on hardware such as
the ARM development board or using the software emulation of the
ARM (ARMulator)
• AxD (ADW, ADU)
– ARM debugger for Windows and Unix with a graphical user interface
– debugs C, C++, and assembly language source
• CodeWarrior IDE
– project management tool for Windows
ARM Development Suite (ADS),
ARM Software Development Toolkit (SDT) (3/3)
• Utilities
ARM C Compiler
• Compiler is compliant with the ANSI standard for C
Linker
• Take one or more object files and combine them
ARM Symbolic Debugger
• A front-end interface to debug program running either
ARM Emulator
• ARMulator is a suite of programs that models the behavior
ARM Development Board
• A circuit board including an ARM core (e.g. ARM7TDMI),
Writing Assembly Language Programs
AREA HelloW,CODE,READONLY ;declare code area
SWI_WriteC EQU &0 ;output character in r0
Program Design
• Start with understanding the requirements, translate the
System Architecture (1/2)
• ARM processor, memory system, buses, and the ARM reference peripheral
specification
System Architecture (2/2)
• Interrupt controller provides a way of enabling, disabling (by mask)
and examining the status of up to 32 level-sensitive IRQ sources and
Hardware System Prototype
• Verifying the functional correctness of hardware blocks,
ARM Integrator
• A motherboard with some Peripheral Input/
Output
extensions to support the
Compact PCI
Rapid Silicon Prototyping (VLSI Tech. Inc.)
ARMulator (1/2)
• ARMulator is a collection of programs that
JTAG Boundary Scan (1/2)
• IEEE 1149.1, Standard Test Access Port and Boundary-Scan Architecture,
also called JTAG boundary scan (after the Joint Test Action Group)
JTAG Boundary Scan (2/2)
• Test signals
TAP Controller (1/2)
Figure: TAP controller state diagram. States recovered from the slide: test logic reset, plus capture, shift, exit1, pause, exit2 and update for each of the DR and IR paths. Transitions are selected by TMS=0 or TMS=1.
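The state diagram can be written as a transition table indexed by the current state and the value of TMS sampled on each TCK. The sketch below follows the standard 16-state TAP controller of IEEE 1149.1, so it includes the run-test/idle and select-scan states that are not in the state list above. The type and function names are illustrative assumptions.

```c
/* Hypothetical names; the transitions follow the IEEE 1149.1 TAP controller. */
typedef enum {
    TEST_LOGIC_RESET, RUN_TEST_IDLE,
    SELECT_DR, CAPTURE_DR, SHIFT_DR, EXIT1_DR, PAUSE_DR, EXIT2_DR, UPDATE_DR,
    SELECT_IR, CAPTURE_IR, SHIFT_IR, EXIT1_IR, PAUSE_IR, EXIT2_IR, UPDATE_IR
} tap_state_t;

/* tap_next[state][tms]: column 0 is taken when TMS=0, column 1 when TMS=1. */
static const tap_state_t tap_next[16][2] = {
    [TEST_LOGIC_RESET] = { RUN_TEST_IDLE, TEST_LOGIC_RESET },
    [RUN_TEST_IDLE]    = { RUN_TEST_IDLE, SELECT_DR },
    [SELECT_DR]        = { CAPTURE_DR,    SELECT_IR },
    [CAPTURE_DR]       = { SHIFT_DR,      EXIT1_DR },
    [SHIFT_DR]         = { SHIFT_DR,      EXIT1_DR },
    [EXIT1_DR]         = { PAUSE_DR,      UPDATE_DR },
    [PAUSE_DR]         = { PAUSE_DR,      EXIT2_DR },
    [EXIT2_DR]         = { SHIFT_DR,      UPDATE_DR },
    [UPDATE_DR]        = { RUN_TEST_IDLE, SELECT_DR },
    [SELECT_IR]        = { CAPTURE_IR,    TEST_LOGIC_RESET },
    [CAPTURE_IR]       = { SHIFT_IR,      EXIT1_IR },
    [SHIFT_IR]         = { SHIFT_IR,      EXIT1_IR },
    [EXIT1_IR]         = { PAUSE_IR,      UPDATE_IR },
    [PAUSE_IR]         = { PAUSE_IR,      EXIT2_IR },
    [EXIT2_IR]         = { SHIFT_IR,      UPDATE_IR },
    [UPDATE_IR]        = { RUN_TEST_IDLE, SELECT_DR }
};

/* Advance the controller by one TCK with the given TMS value. */
tap_state_t tap_step(tap_state_t state, int tms)
{
    return tap_next[state][tms ? 1 : 0];
}

/* Apply a string of '0'/'1' TMS values, one per TCK, and return the final state. */
tap_state_t tap_run(tap_state_t state, const char *tms_bits)
{
    for (; *tms_bits != '\0'; ++tms_bits)
        state = tap_step(state, *tms_bits == '1');
    return state;
}
```

One property worth noting is that holding TMS=1 for five clocks returns the controller to test logic reset from any state, which is how a host can synchronise with a TAP whose state is unknown.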
TAP Controller (2/2)
• Test instruction selects various data registers
Macrocell Testing
• A system chip is composed of pre-designed macrocells together with
application-specific custom logic
ARM Debug Architecture (1/2)
• Two basic approaches to debug
ARM Debug Architecture (2/2)
• In debug state, the core’s internal state and the
Debugger (1/2)
• A debugger is software that enables you to make use of a debug agent
in order to examine and control the execution of software running on a
debug target
• Different forms of the debug target
– early stage of product development, software
– prototype, on a PCB including one or more processors
– final product
• The debugger issues instructions that can
– load software into memory on the target
– start and stop execution of that software
– display the contents of memory, registers, and variables
– allow you to change stored values
• A debug agent performs the actions requested by the debugger, such
as
– setting breakpoints
– reading from / writing to memory
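The split between debugger and debug agent can be illustrated with a toy agent that owns the target state and services the requests listed above. This is a sketch under simplifying assumptions: a 256-byte memory, a fixed 4-byte instruction size, and no real instruction execution. Every name in it is hypothetical and does not correspond to any ARM debug interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE        256u
#define MAX_BREAKPOINTS 4u

/* Toy target state owned by the (hypothetical) debug agent. */
typedef struct {
    uint8_t  mem[MEM_SIZE];
    uint32_t pc;
    uint32_t breakpoints[MAX_BREAKPOINTS];
    size_t   num_breakpoints;
} target_t;

/* Load software or change stored values: copy len bytes into target memory. */
int agent_write_mem(target_t *t, uint32_t addr, const uint8_t *src, size_t len)
{
    if (addr > MEM_SIZE || len > (size_t)(MEM_SIZE - addr))
        return -1;
    memcpy(&t->mem[addr], src, len);
    return 0;
}

/* Display memory contents: copy len bytes out of target memory. */
int agent_read_mem(const target_t *t, uint32_t addr, uint8_t *dst, size_t len)
{
    if (addr > MEM_SIZE || len > (size_t)(MEM_SIZE - addr))
        return -1;
    memcpy(dst, &t->mem[addr], len);
    return 0;
}

/* Set a breakpoint at an instruction address. */
int agent_set_breakpoint(target_t *t, uint32_t addr)
{
    if (t->num_breakpoints == MAX_BREAKPOINTS)
        return -1;
    t->breakpoints[t->num_breakpoints++] = addr;
    return 0;
}

/* Start execution: advance the PC one instruction at a time for at most
 * max_steps steps. Returns 1 if a breakpoint stopped the run, 0 otherwise. */
int agent_run(target_t *t, unsigned max_steps)
{
    for (unsigned i = 0; i < max_steps; ++i) {
        t->pc += 4;
        for (size_t b = 0; b < t->num_breakpoints; ++b)
            if (t->breakpoints[b] == t->pc)
                return 1;
    }
    return 0;
}
```

A debugger front end would call these functions through whatever link connects it to the agent. The debugger decides what to ask for, and the agent is the only part that touches the target.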
Debugger (2/2)
– Multi-ICE AxD
– Embedded ICE RDI
In Circuit Emulator (ICE)
• The processor in the target system is removed and
Multi-ICE and Embedded ICE
• Multi-ICE and Embedded ICE are JTAG-based
Basic Debug Requirements
• Control of program execution
Debugging with Multi-ICE
ICEBreaker (EmbeddedICE macrocell)
• ICEBreaker is programmed in a serial fashion using the TAP controller
• It consists of 2 real-time watchpoint units, together with a control and status register
• Either watchpoint unit can be configured as a watchpoint or a breakpoint
Figure: ICEBreaker block diagram, with ICEBreaker between the processor and the TAP. Signals shown: A[31:0], D[31:0], nOPC, nRW, TBIT, MAS[1:0], nTRANS, nMREQ, ECLK, IFEN, DBGEN, DBGRQI, DBGACK, BREAKPT, BREAKPTI, EXTERN0, EXTERN1, RANGEOUT0, RANGEOUT1, SDIN, SDOUT, TCK, TMS, nTRST, TDI, TDO.
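The comparison performed by a watchpoint unit can be sketched in C. This is a simplified model that assumes each unit holds an address value/mask pair and a data value/mask pair, with a mask bit of 1 meaning "ignore this bit". Control-signal matching (nOPC, nRW, MAS and so on) is left out, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of one watchpoint unit. */
typedef struct {
    uint32_t addr_value;
    uint32_t addr_mask;   /* 1 bits are "don't care" (assumed convention) */
    uint32_t data_value;
    uint32_t data_mask;   /* 1 bits are "don't care" (assumed convention) */
    bool     enabled;
} watch_unit_t;

/* True when observed equals value on every bit that is not masked out. */
static bool masked_equal(uint32_t observed, uint32_t value, uint32_t mask)
{
    return ((observed ^ value) & ~mask) == 0;
}

/* True when the bus cycle (addr, data) matches the unit's programmed pattern. */
bool watch_unit_hit(const watch_unit_t *w, uint32_t addr, uint32_t data)
{
    return w->enabled
        && masked_equal(addr, w->addr_value, w->addr_mask)
        && masked_equal(data, w->data_value, w->data_mask);
}
```

With the data mask set to all ones the unit matches on address alone, which is how it would be used for an instruction breakpoint. Masking the low address bits turns a single-address match into a match on an aligned range.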
Real-Time Trace (1/2)
• Debugging uses the breakpoint and single-step to run
Real-Time Trace (2/2)
Figure: real-time trace setup. ADW and TDT run on the host. The ASIC contains the ARM CPU macrocell, the Embedded Trace Macrocell, a JTAG unit with a 5-wire JTAG port, and a trace port connected to a Trace Port Analyzer.
– branch prediction
– non-blocking load and store execution
– 64-bit data memory => transfer 2 registers in each cycle
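The effect of the wider data path on block transfers can be estimated with simple arithmetic. The sketch below counts data-transfer cycles only and assumes 32-bit registers and ideal conditions (no alignment penalty, no cache miss, no pipeline start-up). The function name is illustrative.

```c
/* Hypothetical helper: data-transfer cycles for moving num_regs 32-bit
 * registers over a data path that carries bus_bits per cycle.
 * Returns 0 for an invalid bus width (narrower than one register). */
unsigned block_transfer_cycles(unsigned num_regs, unsigned bus_bits)
{
    unsigned regs_per_cycle = bus_bits / 32u;

    if (regs_per_cycle == 0u)
        return 0u;
    /* Round up: a final, partly filled transfer still costs a cycle. */
    return (num_regs + regs_per_cycle - 1u) / regs_per_cycle;
}
```

Under these assumptions an 8-register LDM needs 8 transfer cycles on a 32-bit data memory and 4 on a 64-bit one.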