Arm Instruction Program
V4. 2008-09-24
□ Features used
– Load/Store architecture
– Fixed-length 32-bit instructions
– 3-address instruction formats
[Figure: program status registers CPSR, SPSR_fiq, SPSR_svc, SPSR_abt, SPSR_irq and SPSR_und]
□ T: Thumb
□ D: On-chip debug support
□ M: Enhanced multiplier
□ I: EmbeddedICE hardware
□ T2: Thumb-2
□ S: Synthesizable code
□ E: Enhanced DSP instruction set
□ J: Java support, Jazelle
□ Z: TrustZone
□ F: Floating-point unit
□ H: Handshake, clockless design for synchronous or asynchronous design
□ ARM8 → ARM9 → ARM10
□ ARM9
– 5-stage pipeline (130 MHz or 200 MHz)
– Separate instruction and data memory ports
□ ARM10 (Oct. 1998)
– High performance, 300 MHz
– Multimedia digital consumer applications
– Optional vector floating-point unit
Core                                         Architecture
ARM1                                         v1
ARM2                                         v2
ARM2as, ARM3                                 v2a
ARM6, ARM600, ARM610                         v3
ARM7, ARM700, ARM710                         v3
ARM7TDMI, ARM710T, ARM720T, ARM740T          v4T
StrongARM, ARM8, ARM810                      v4
ARM9TDMI, ARM920T, ARM940T                   v4T
ARM9E-S, ARM10TDMI, ARM1020E                 v5TE
ARM11 MPCore, ARM1136J(F)-S, ARM1176JZ(F)-S  v6
Cortex-A/R/M                                 v7
□ Comparison operations
– Produce no result, so the destination is omitted from the format
– Only set the condition code bits (N, Z, C and V) in the CPSR
CMP r1,r2 ;set cc on r1 - r2, compare
CMN r1,r2 ;set cc on r1 + r2, compare negated
TST r1,r2 ;set cc on r1 AND r2, bit test
TEQ r1,r2 ;set cc on r1 XOR r2, test equal
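The flag behaviour of the four comparison instructions can be sketched as a small Python model. This is only an illustration, not an implementation of the core: the function names are hypothetical, and the sketch assumes the usual ARM convention that C after a subtraction means "no borrow". For TST and TEQ only N and Z are modelled, because C comes from the shifter and V is left unchanged.

```python
MASK32 = 0xFFFFFFFF

def flags_sub(a, b):
    """Model of CMP a,b: return (N, Z, C, V) for the 32-bit result a - b."""
    a &= MASK32
    b &= MASK32
    res = (a - b) & MASK32
    n = res >> 31
    z = int(res == 0)
    c = int(a >= b)                            # assumption: C = NOT borrow
    v = (((a ^ b) & (a ^ res)) >> 31) & 1      # operands differ in sign, result sign differs from a
    return n, z, c, v

def flags_add(a, b):
    """Model of CMN a,b: return (N, Z, C, V) for the 32-bit result a + b."""
    a &= MASK32
    b &= MASK32
    full = a + b
    res = full & MASK32
    n = res >> 31
    z = int(res == 0)
    c = int(full > MASK32)                     # carry out of bit 31
    v = ((~(a ^ b) & (a ^ res)) >> 31) & 1     # operands share a sign, result sign differs
    return n, z, c, v

def flags_logical(res):
    """Model of TST (res = a AND b) and TEQ (res = a XOR b): return (N, Z)."""
    res &= MASK32
    return res >> 31, int(res == 0)

cmp_equal = flags_sub(5, 5)            # Z set, so an EQ condition would pass
teq_equal = flags_logical(7 ^ 7)       # TEQ of equal values also sets Z
```

A conditional instruction that follows, such as one with the EQ suffix, would then test the Z value produced here.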
– ADD r5,r5,r3,LSL r2 ;r5:=r5+r3*2^r2
– MOV r12,r4,ROR r3 ;r12:=r4 rotated right by value of r3
[Figure: ROR #5 and RRX (rotate right eXtended by 1 place) shown on bits 31..0 with the carry flag C]
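As a rough illustration of the shifted second operand, the Python sketch below models LSL, ROR and RRX on 32-bit values. The helper names are hypothetical, and the sketch assumes shift amounts in the range 0 to 31; the special cases for larger register-specified amounts are not modelled.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, amount):
    """Logical shift left; bits shifted past bit 31 are discarded."""
    return (value << amount) & MASK32

def ror(value, amount):
    """Rotate right; bits leaving bit 0 re-enter at bit 31."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def rrx(value, carry_in):
    """Rotate right extended by 1 place: the old C enters bit 31, bit 0 becomes the new C."""
    value &= MASK32
    result = ((carry_in & 1) << 31) | (value >> 1)
    carry_out = value & 1
    return result, carry_out

def add_lsl(rn, rm, shift):
    """Model of ADD rd,rn,rm,LSL shift, i.e. rd := rn + rm * 2^shift (mod 2^32)."""
    return (rn + lsl(rm, shift)) & MASK32

r5 = add_lsl(10, 3, 2)   # 10 + 3 * 4
```

The same pattern applies to MOV r12,r4,ROR r3, which would be `ror(r4, r3)` in this model.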
SOC Consortium Course Material 56
Using the Barrel Shifter: the 2nd Operand
□ Multiply
MUL r4,r3,r2 ;r4:=(r3*r2)[31:0]
□ Multiply-Accumulate
MLA r4,r3,r2,r1 ;r4:=(r3*r2+r1)[31:0]
□ 64-bit Product
– <mul>{<cond>}{S} RdLo,RdHi,Rm,Rs
– <mul> is UMULL, UMLAL, SMULL or SMLAL
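The difference between the truncated 32-bit product and the 64-bit forms can be sketched in Python as follows. The function names are hypothetical, and each 64-bit helper returns the pair (high word, low word), which is simply the convention chosen for this sketch.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mul(rm, rs):
    """Model of MUL: low 32 bits of the product."""
    return (rm * rs) & MASK32

def mla(rm, rs, rn):
    """Model of MLA: low 32 bits of product plus accumulator."""
    return (rm * rs + rn) & MASK32

def umull(rm, rs):
    """Model of UMULL: unsigned 64-bit product as (high word, low word)."""
    product = (rm & MASK32) * (rs & MASK32)
    return (product >> 32) & MASK32, product & MASK32

def smull(rm, rs):
    """Model of SMULL: signed 64-bit product as (high word, low word)."""
    product = (to_signed32(rm) * to_signed32(rs)) & MASK64
    return product >> 32, product & MASK32

hi, lo = umull(0xFFFFFFFF, 0xFFFFFFFF)   # the full product needs both words
```

UMLAL and SMLAL would add the existing 64-bit accumulator value to `product` before it is split into two words.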
□ Word transfer
– LDR / STR
□ Byte transfer
– LDRB / STRB
□ Halfword transfer
– LDRH / STRH
□ Load signed byte or halfword: load the value and sign-extend it to 32 bits
– LDRSB / LDRSH
□ All of these can be conditionally executed by inserting the appropriate condition code after STR/LDR
– LDREQB
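To make the difference between the unsigned and signed loads concrete, here is a small Python sketch. It treats memory as a byte sequence and assumes little-endian byte order; the function names are hypothetical and alignment rules are not modelled.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a 32-bit pattern."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return ((value ^ sign_bit) - sign_bit) & MASK32

def ldrb(mem, addr):
    """Model of LDRB: zero-extended byte."""
    return mem[addr]

def ldrsb(mem, addr):
    """Model of LDRSB: sign-extended byte."""
    return sign_extend(mem[addr], 8)

def ldrh(mem, addr):
    """Model of LDRH: zero-extended halfword (little-endian assumed)."""
    return int.from_bytes(mem[addr:addr + 2], "little")

def ldrsh(mem, addr):
    """Model of LDRSH: sign-extended halfword."""
    return sign_extend(ldrh(mem, addr), 16)

memory = bytes([0x80, 0xFF, 0x34, 0x12])
unsigned_byte = ldrb(memory, 0)    # top 24 bits are zero
signed_byte = ldrsb(memory, 0)     # top 24 bits copy bit 7
```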
□ Pre-indexing
LDR r0,[r1,#4] ;r0:=mem32[r1+4]
– Offset up to 4 KB, added or subtracted (e.g. #-4)
□ Post-indexing
LDR r0,[r1],#4 ;r0:=mem32[r1], r1:=r1+4
– Equivalent to a simple register-indirect load followed by an ADD to update the base, but faster and with less code space
□ Auto-indexing
LDR r0,[r1,#4]! ;r0:=mem32[r1+4], r1:=r1+4
– No extra time; the auto-indexing is performed while the data is being fetched from memory
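The three addressing modes differ only in which address is used and whether the base register is updated. The Python sketch below models that; the register file is a plain dictionary, memory is a little-endian byte sequence, and all names are hypothetical. The case where the destination and base are the same register is not modelled.

```python
MASK32 = 0xFFFFFFFF

def load_word(mem, addr):
    """Read a 32-bit little-endian word (byte order is an assumption of this sketch)."""
    return int.from_bytes(mem[addr:addr + 4], "little")

def ldr_pre(mem, regs, rd, rn, offset, writeback=False):
    """LDR rd,[rn,#offset] and, with writeback=True, LDR rd,[rn,#offset]!"""
    addr = (regs[rn] + offset) & MASK32
    regs[rd] = load_word(mem, addr)
    if writeback:
        regs[rn] = addr              # auto-indexing: base takes the computed address

def ldr_post(mem, regs, rd, rn, offset):
    """LDR rd,[rn],#offset: use the base as the address, then update the base."""
    addr = regs[rn]
    regs[rd] = load_word(mem, addr)
    regs[rn] = (addr + offset) & MASK32

words = [0x11111111, 0x22222222, 0x33333333]
memory = b"".join(w.to_bytes(4, "little") for w in words)
registers = {"r0": 0, "r1": 0}
ldr_post(memory, registers, "r0", "r1", 4)   # r0 := mem32[0], r1 := 4
```

Calling `ldr_post` repeatedly walks through consecutive words, which is the typical use of post-indexing in a loop.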
□ Pre-indexed form
– LDR|STR{<cond>}{B} Rd, [Rn, <offset>]{!}
□ Post-indexed form
– LDR|STR{<cond>}{B}{T} Rd, [Rn], <offset>
□ PC-relative form
– LDR|STR{<cond>}{B} Rd, LABEL
– LDR: 'load register'; STR: 'store register'
– 'B' selects an unsigned byte transfer; the default is word
– <offset> may be #+/-<12-bit immediate> or +/-Rm{, shift}
– !: auto-indexing
– T flag selects the user view of the memory translation and protection system
□ Pre-indexed form
– LDR|STR{<cond>}H|SH|SB Rd,[Rn,<offset>]{!}
□ Post-indexed form
– LDR|STR{<cond>}H|SH|SB Rd,[Rn],<offset>
– <offset> is #+/-<8-bit immediate> or +/-Rm
– H|SH|SB selects the data type
• Unsigned half-word
• Signed half-word
• Signed byte
– Otherwise the assembly format is as for word and unsigned byte transfers
□ SWP{<cond>}{B} Rd,Rm,[Rn]
□ Rd <- [Rn], [Rn] <- Rm
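The swap can be pictured as a load and a store to the same address that nothing else can interleave with. The Python sketch below models only the data movement for the word form; atomicity is not modelled, memory is a little-endian bytearray, and the names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def swp(mem, regs, rd, rm, rn):
    """Model of SWP rd,rm,[rn]: rd := mem32[rn], mem32[rn] := rm."""
    addr = regs[rn]
    old = int.from_bytes(mem[addr:addr + 4], "little")            # load the old memory value
    mem[addr:addr + 4] = (regs[rm] & MASK32).to_bytes(4, "little")  # store rm
    regs[rd] = old                                                # written last, so rd may equal rm

memory = bytearray((0xAAAAAAAA).to_bytes(4, "little"))
registers = {"r0": 0, "r1": 0x55555555, "r2": 0}
swp(memory, registers, "r0", "r1", "r2")
```

Because the old value is read before the register is overwritten, using the same register for Rd and Rm exchanges the register with the memory word.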
□ Example
MRS r0,CPSR
MRS r3,SPSR
□ Branch Instructions
□ Conditional Branches
□ Conditional Execution
□ Branch and Link Instructions
□ Subroutine Return Instructions
□ Supervisor Calls
□ Jump Tables
□ Call a subroutine
BL SUB
…
SUB…
MOV PC,r14
Branch, Branch with Link and eXchange
□ B{L}X{<cond>} Rm
– The branch target is specified in a register, Rm
– Bit[0] of Rm is copied into the T bit in CPSR; bit[31:1] is moved into PC
– If Rm[0] is 1, the processor switches to execute Thumb instructions and begins executing at the address in Rm aligned to a half-word boundary by clearing the bottom bit
– If Rm[0] is 0, the processor continues executing ARM instructions and begins executing at the address in Rm aligned to a word boundary by clearing Rm[1]
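The rules above for deriving the new state and the target address from Rm can be written as a short Python sketch. The function name is hypothetical, and only the state bit and PC are modelled; the link-register update of BLX is left out.

```python
def bx(rm):
    """Return (new_pc, t_bit) for a branch-and-exchange to the address in rm."""
    t_bit = rm & 1
    if t_bit:
        new_pc = rm & 0xFFFFFFFE     # Thumb: half-word aligned, bottom bit cleared
    else:
        new_pc = rm & 0xFFFFFFFC     # ARM: word aligned, bits 1 and 0 cleared
    return new_pc, t_bit

thumb_entry = bx(0x8001)   # odd address selects Thumb state
arm_entry = bx(0x8002)     # even address stays in ARM state, bit 1 is cleared
```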
□ BLX <target address>
– Call Thumb subroutine from ARM
– The H bit (bit 24) is also added into bit 1 of the resulting address, allowing an odd half-word address to be selected for the target instruction, which will always be a Thumb instruction
CODE32
…
BLX TSUB ;call Thumb subroutine
…
CODE16 ;start of Thumb code
TSUB …
BX r14 ;return to ARM code
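How the H bit contributes to the target address of the immediate form of BLX can be sketched in Python. Two details here are assumptions not stated above: the offset is a signed 24-bit word offset, and it is applied relative to the PC value seen in ARM state (the instruction address plus 8). The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def sign_extend24(imm24):
    """Interpret a 24-bit field as a two's-complement signed integer."""
    imm24 &= 0xFFFFFF
    return imm24 - (1 << 24) if imm24 & 0x800000 else imm24

def blx_target(instr_addr, imm24, h):
    """Target of BLX <target address>: word offset plus H placed at bit 1."""
    pc = instr_addr + 8                              # assumption: PC reads as instruction address + 8
    offset = (sign_extend24(imm24) << 2) + ((h & 1) << 1)
    return (pc + offset) & MASK32

even_target = blx_target(0x8000, 0, 0)   # word-aligned Thumb target
odd_target = blx_target(0x8000, 0, 1)    # H selects the other half-word of that word
```

With H clear the target is word aligned; setting H reaches the Thumb instruction in the second half of the same word.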
□ Special forms:
– Load with PC as base with 1 KB immediate offset (word aligned)
• Used for loading a value from a literal pool
– Load and store with SP as base with 1 KB immediate offset (word aligned)
• Used for accessing local variables on the stack
□ ARM Architecture
– Load/Store Architecture
– 32-bit Instructions
– 3-address Instruction Formats
– 37 Registers
□ Instruction Set
– 32-bit ARM Instruction
– 16-bit Thumb Instruction
□ ARM/Thumb Interworking