Arm Instruction
[Figure: branch instruction format. Bits 31-28: Cond; bits 27-25: 1 0 1; bit 24: L (link); bits 23-0: Offset.]
Barrel Shifter - Right Shifts
[Figure: barrel shifter diagram showing the carry flag (CF), the destination register, a 0 fill bit and the result.]
Second Operand: Shifted Register
• The amount by which the register is to be shifted is contained in
either:
– the immediate 5-bit field in the instruction
• NO OVERHEAD
• Shift is done for free - executes in a single cycle.
– or the bottom byte of another register (a register-specified shift).
• If no shift is specified then a default shift is applied: LSL #0
– i.e. barrel shifter has no effect on value in register.
Second Operand: Immediate Value (1)
• There is no single instruction which will load a 32 bit immediate constant
into a register without performing a data load from memory.
– All ARM instructions are 32 bits long
– ARM instructions do not use the instruction stream as data.
• The data processing instruction format has 12 bits available for operand2
– If used directly this would only give a range of 0 - 4095.
• Instead, 8 of these bits hold a constant in the range 0 - 255, and the
remaining 4 bits hold a rotate amount.
• The 8-bit constant is rotated right through an even number of
positions (i.e. ROR by 0, 2, 4, ..., 30).
– This gives a much larger range of constants that can be directly
loaded, though some constants will still need to be loaded from
memory.
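The rotation rule above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the names `ror32` and `encode_immediate` are hypothetical. It searches for an 8-bit value and an even rotation that reproduce a given 32-bit constant.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_immediate(constant):
    """Return (imm8, rot4) such that imm8 ROR (2 * rot4) == constant, else None."""
    constant &= 0xFFFFFFFF
    for rot4 in range(16):
        # Rotating right by 32 - 2*rot4 is a rotate left by 2*rot4,
        # which undoes the ROR the hardware would apply to imm8.
        imm8 = ror32(constant, 32 - 2 * rot4)
        if imm8 <= 0xFF:
            return imm8, rot4
    return None


print(encode_immediate(0xFF000000))  # encodable: 0xFF rotated right by 8
print(encode_immediate(0x101))       # not encodable: set bits span 9 positions
```

A constant is encodable only if all of its set bits fit inside an 8-bit window that starts at an even bit position (allowing wrap-around).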
Loading full 32 bit constants
• Although the MOV/MVN mechanism will load a large range of constants into a
register, sometimes this mechanism will not generate the required constant.
• Therefore, the assembler also provides a method which will load ANY 32 bit
constant:
– LDR rd, =numeric constant
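As a rough illustration of how an assembler might expand this pseudo-instruction, the Python sketch below tries MOV first, then MVN with the inverted constant, and otherwise falls back to a PC-relative load. The function names and the literal-pool placeholder text are assumptions made for illustration, not the output of any particular assembler.

```python
def fits_rotated_imm8(value):
    """True if `value` is an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        rotated_left = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated_left <= 0xFF:
            return True
    return False


def expand_ldr_pseudo(rd, value):
    """Pick an instruction for 'LDR rd, =value' (hypothetical assembler policy)."""
    value &= 0xFFFFFFFF
    inverted = ~value & 0xFFFFFFFF
    if fits_rotated_imm8(value):
        return f"MOV {rd}, #0x{value:X}"
    if fits_rotated_imm8(inverted):
        return f"MVN {rd}, #0x{inverted:X}"
    # Otherwise the constant is placed in memory and loaded from there.
    return f"LDR {rd}, [pc, #<offset to literal pool>]"


print(expand_ldr_pseudo("r0", 0xFF))
print(expand_ldr_pseudo("r0", 0xFFFFFFFF))
print(expand_ldr_pseudo("r0", 0x55555555))
```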
Multiplication Instructions
• The Basic ARM provides two multiplication instructions.
• Multiply
– MUL{<cond>}{S} Rd, Rm, Rs ; Rd = Rm * Rs
• Multiply Accumulate - does addition for free
– MLA{<cond>}{S} Rd, Rm, Rs,Rn ; Rd = (Rm * Rs) + Rn
• Restrictions on use:
– Rd and Rm cannot be the same register
• Can be avoided by swapping Rm and Rs around. This works because
multiplication is commutative.
– Cannot use PC.
These will be picked up by the assembler if overlooked.
• Operands can be considered signed or unsigned
– Up to user to interpret correctly.
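The point about signed and unsigned operands can be seen in a small Python model of the two instructions. This is a sketch only, assuming 32-bit registers and that just the low 32 bits of the product are kept; `mul`, `mla` and `as_signed` are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF


def mul(rm, rs):
    """Model of MUL Rd, Rm, Rs: low 32 bits of the product."""
    return (rm * rs) & MASK32


def mla(rm, rs, rn):
    """Model of MLA Rd, Rm, Rs, Rn: low 32 bits of (Rm * Rs) + Rn."""
    return (rm * rs + rn) & MASK32


def as_signed(x):
    """Interpret a 32-bit pattern as a two's complement signed value."""
    return x - (1 << 32) if x & 0x80000000 else x


product = mul(0xFFFFFFFF, 3)  # 0xFFFFFFFF is 4294967295 unsigned, or -1 signed
print(hex(product), as_signed(product))
print(mla(6, 7, 8))
```

The same bit pattern comes out whichever way the inputs are read; only the interpretation of the result differs.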
Load / Store Instructions
• The ARM is a Load / Store Architecture:
– Does not support memory to memory data processing operations.
– Must move data values into registers before using them.
• This might sound inefficient, but in practice isn’t:
– Load data values from memory into registers.
– Process data in registers using a number of data processing
instructions which are not slowed down by memory access.
– Store results from registers out to memory.
• The ARM has three sets of instructions which interact with main
memory. These are:
– Single register data transfer (LDR / STR).
– Block data transfer (LDM/STM).
– Single Data Swap (SWP).
Single register data transfer
• The basic load and store instructions are:
– Load and Store Word or Byte
• LDR / STR / LDRB / STRB
• ARM Architecture Version 4 also adds support for halfwords and signed data.
– Load and Store Halfword
• LDRH / STRH
– Load Signed Byte or Halfword - load value and sign extend it to 32 bits.
• LDRSB / LDRSH
• All of these instructions can be conditionally executed by inserting the
appropriate condition code after STR / LDR.
– e.g. LDREQB
• Syntax:
– <LDR|STR>{<cond>}{<size>} Rd, <address>
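To illustrate the difference between the plain and signed loads, here is a small Python sketch of zero extension versus sign extension to 32 bits. The helper names are hypothetical, and the byte or halfword is passed in directly rather than read from memory.

```python
def zero_extend(value, bits):
    """What LDRB / LDRH do: keep the low `bits` bits, upper bits become 0."""
    return value & ((1 << bits) - 1)


def sign_extend(value, bits):
    """What LDRSB / LDRSH do: copy the top bit of the loaded value upwards."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return ((value ^ sign_bit) - sign_bit) & 0xFFFFFFFF


print(hex(zero_extend(0x80, 8)))     # byte 0x80 loaded with LDRB
print(hex(sign_extend(0x80, 8)))     # byte 0x80 loaded with LDRSB
print(hex(sign_extend(0x8001, 16)))  # halfword 0x8001 loaded with LDRSH
```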
Load and Store Word or Byte: Base register
• The memory location to be accessed is held in a base register
– STR r0, [r1] ; Store contents of r0 to location pointed to
; by contents of r1.
– LDR r2, [r1] ; Load r2 with contents of memory location
; pointed to by contents of r1.
[Figure: r1 (base register) holds 0x200. STR r0, [r1] stores r0 = 0x5 (source register for STR) to memory address 0x200; LDR r2, [r1] loads 0x5 from address 0x200 into r2 (destination register for LDR).]
Load and Store Word or Byte:
Offsets from the Base Register
• As well as accessing the actual location contained in the base register, these
instructions can access a location offset from the base register pointer.
• This offset can be
– An unsigned 12-bit immediate value (i.e. 0 - 4095 bytes).
– A register, optionally shifted by an immediate value
• This can be either added or subtracted from the base register:
– Prefix the offset value or register with ‘+’ (default) or ‘-’.
• This offset can be applied:
– before the transfer is made: Pre-indexed addressing
• optionally auto-incrementing the base register, by postfixing the instruction
with an ‘!’.
– after the transfer is made: Post-indexed addressing
• causing the base register to be auto-incremented.
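The three addressing variants described above differ only in which address is used for the transfer and whether the base register is written back. The Python sketch below models a word store under those rules; the function `str_word`, the dictionary-based register file and the memory model are all assumptions for illustration.

```python
def str_word(regs, mem, rd, rn, offset=0, pre=True, writeback=False):
    """Model STR rd, [rn, #offset], STR rd, [rn, #offset]! and STR rd, [rn], #offset."""
    base = regs[rn]
    if pre:
        address = base + offset        # offset applied before the transfer
        if writeback:                  # the '!' form updates the base register
            regs[rn] = address
    else:
        address = base                 # transfer uses the unmodified base
        regs[rn] = base + offset       # post-indexed always updates the base
    mem[address] = regs[rd]
    return address


regs, mem = {"r0": 0x5, "r1": 0x200}, {}
print(hex(str_word(regs, mem, "r0", "r1", 12)), hex(regs["r1"]))                  # pre-indexed
print(hex(str_word(regs, mem, "r0", "r1", 12, writeback=True)), hex(regs["r1"]))  # pre-indexed with '!'
```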
Load and Store Word or Byte:
Pre-indexed Addressing
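• Example: STR r0, [r1,#12]
[Figure: pre-indexed store. r1 (base register) = 0x200, offset = 12, r0 (source register for STR) = 0x5. The value 0x5 is stored at address 0x20c.]
[Figure: post-indexed store. r1 (original base register) = 0x200, offset = 12, r0 (source register for STR) = 0x5. The value 0x5 is stored at address 0x200 and r1 (updated base register) becomes 0x20c.]
[Figure: byte order. r0 = 0x11223344 is stored with STR r0, [r1]. Loading the byte at that address into r2 then gives r2 = 0x44 on a little-endian memory system and r2 = 0x11 on a big-endian one.]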
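The two results r2 = 0x44 and r2 = 0x11 correspond to the two possible byte orders of the stored word. A short Python sketch of that experiment follows, using `int.to_bytes` to lay the word out in memory; the function name is hypothetical.

```python
def store_word_then_load_byte(word, byteorder):
    """Store a 32-bit word, then read back the byte at the lowest address."""
    memory = word.to_bytes(4, byteorder)  # layout of the word in memory
    return memory[0]                      # byte at the word's own address


print(hex(store_word_then_load_byte(0x11223344, "little")))
print(hex(store_word_then_load_byte(0x11223344, "big")))
```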
Block Data Transfer (1)
• The Load and Store Multiple instructions (LDM / STM) allow between 1 and 16
registers to be transferred to or from memory.
• The transferred registers can be either:
– Any subset of the current bank of registers (default).
– Any subset of the user mode bank of registers when in a privileged mode
(postfix instruction with a ‘^’).
[Figure: LDM / STM instruction encoding, with field boundaries at bits 31-28, 27-24, 23, 22, 21, 20, 19-16 and 15-0.]
Stacks
• A stack is an area of memory which grows as new data is “pushed” onto the
“top” of it, and shrinks as data is “popped” off the top.
• Two pointers define the current limits of the stack.
– A base pointer
• used to point to the “bottom” of the stack (the first location).
– A stack pointer
• used to point to the current “top” of the stack.
[Figure: PUSH {1,2,3} places 1 at BASE, then 2 and 3 above it, leaving SP at 3. POP then returns 3 and leaves SP at 2.]
Stack Operation
• Traditionally, a stack grows down in memory, with the last “pushed” value at the
lowest address. The ARM also supports ascending stacks, where the stack
structure grows up through memory.
• The value of the stack pointer can either:
– Point to the last occupied address (Full stack)
• and so needs pre-decrementing (i.e. before the push) on a descending stack
– Point to the next address to be occupied (Empty stack)
• and so needs post-decrementing (i.e. after the push) on a descending stack
• The stack type to be used is given by the postfix to the instruction:
– STMFD / LDMFD : Full Descending stack
– STMFA / LDMFA : Full Ascending stack.
– STMED / LDMED : Empty Descending stack
– STMEA / LDMEA : Empty Ascending stack
• Note: ARM Compiler will always use a Full descending stack.
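The four stack types can be compared with a small Python model of a push. This is a sketch under stated assumptions: the function name `stm_stack` is hypothetical, registers are 4 bytes wide, and the model reproduces only the final memory layout (lowest-numbered register at the lowest address), not the order of bus transfers.

```python
def stm_stack(kind, sp, registers):
    """Return (new_sp, {address: register}) for STM<kind> sp!, {registers}.

    kind is "FD", "ED", "FA" or "EA"; registers are listed lowest-numbered first.
    """
    full = kind[0] == "F"
    step = 4 if kind[1] == "A" else -4
    # Pushing in this order keeps the lowest register at the lowest address.
    order = registers if step > 0 else list(reversed(registers))
    stored = {}
    for reg in order:
        if full:
            sp += step            # full stack: move SP first, then store
            stored[sp] = reg
        else:
            stored[sp] = reg      # empty stack: store first, then move SP
            sp += step
    return sp, stored


for kind in ("FD", "ED", "FA", "EA"):
    new_sp, layout = stm_stack(kind, 0x400, ["r0", "r1", "r3", "r4", "r5"])
    print(kind, hex(new_sp), {hex(a): r for a, r in sorted(layout.items())})
```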
Stack Examples
Each instruction pushes {r0,r1,r3-r5} starting from Old SP = 0x400:

Address   STMFD sp!,    STMED sp!,    STMFA sp!,    STMEA sp!,
0x414                                 r5  <- SP         <- SP
0x410                                 r4            r5
0x40c                                 r3            r4
0x408                                 r1            r3
0x404                                 r0            r1
0x400     (Old SP)      r5 (Old SP)   (Old SP)      r0 (Old SP)
0x3fc     r5            r4
0x3f8     r4            r3
0x3f4     r3            r1
0x3f0     r1            r0
0x3ec     r0  <- SP         <- SP
Difference between Empty and Full (in STMED and STMFD) of Stack
Push operations
Ref: http://computing.unn.ac.uk/staff/cgmb3/teaching/CM506/ARM_Assembler/AssemblerSummary/STACK.html
Empty: Store multiple empty descending, STMED r13!, {r0-r2, r14}
Full: Store multiple full descending, STMFD r13!, {r0-r2, r14}

Address   STMED (Empty)          STMFD (Full)
0x50      r14  <- old r13        (in use)  <- old r13
0x4c      r2                     r14
0x48      r1                     r2
0x44      r0                     r1
0x40           <- new r13'       r0  <- new r13'

In both cases the SP moves down, from 0x50 to 0x40.
Software Interrupt (SWI)
[Figure: SWI instruction encoding, with the condition field in bits 31-28 and further fields in bits 27-24 and bits 23-0.]
• The CPSR and SPSR are not memory-mapped or part of the central register
file. The only instructions which operate on them are the MSR and MRS
instructions.
• These instructions are restricted when the CPU is in USER mode: MSR can
change only the condition flags, not the mode or interrupt-disable bits. So it
is only possible to change the operating mode of the processor, or to enable or
disable interrupts, from a privileged mode. Once you have entered USER mode you
cannot leave it, except through an exception such as reset, FIQ, IRQ or an SWI
instruction.
• MRS – Move PSR into General-Purpose Register
Syntax: MRS{<cond>} <Rd >, CPSR
MRS{<cond>} <Rd >, SPSR
Operation: if (cond) Rd ← CPSR/SPSR
Flags updated: None
Usage and Examples: Moves the value of CPSR or the current SPSR into a general-purpose
register.
Ex: MRS R0, CPSR
• MSR – Move to Status Register from GPR Register
Syntax: MSR{<cond>} CPSR_<fields>, #<immediate>
MSR{<cond>} CPSR_<fields>, <Rm>
MSR{<cond>} SPSR_<fields>, #<immediate>
MSR{<cond>} SPSR_<fields>, <Rm>
Operation: if (cond) CPSR/SPSR ← immediate/register value
Flags updated: N/A
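The <fields> suffix selects which bytes of the status register are written. The Python sketch below models that masking, under the assumption that the field letters c, x, s and f select bits 7-0, 15-8, 23-16 and 31-24, and that a non-privileged write can only affect the flags field. The function `msr` and the `privileged` parameter are hypothetical names.

```python
FIELD_MASKS = {"c": 0x000000FF, "x": 0x0000FF00, "s": 0x00FF0000, "f": 0xFF000000}


def msr(psr, value, fields, privileged=True):
    """Model MSR CPSR_<fields>, value: only the selected bytes are replaced."""
    mask = 0
    for field in fields:
        mask |= FIELD_MASKS[field]
    if not privileged:
        mask &= FIELD_MASKS["f"]   # assumption: user mode may change flags only
    return ((psr & ~mask) | (value & mask)) & 0xFFFFFFFF


I_BIT = 0x80                        # IRQ disable bit (bit 7 of the CPSR)
cpsr = 0x13                         # supervisor mode, interrupts enabled
print(hex(msr(cpsr, cpsr | I_BIT, "c")))                    # privileged: IRQs masked
print(hex(msr(0x10, 0x10 | I_BIT, "c", privileged=False)))  # user mode: unchanged
```

This mirrors the usual read-modify-write sequence: MRS to read the register, ORR or BIC to change a bit, then MSR to write it back.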
ARM Instruction Set Summary
• Data Processing Instructions
• Data Transfer Instructions
• Control Flow Instructions
ARM Instruction Set Summary (1/4)
ARM Instruction Set Summary (2/4)
ARM Instruction Set Summary (3/4)
Thumb Instruction Set
• Thumb instructions are 16 bits long.
• The Thumb instruction set addresses the issue of code
density.
• It is a compressed form of a subset of the ARM
instruction set.
• Thumb instructions are dynamically decompressed in the ARM
instruction pipeline.
• Many Thumb data processing instructions use a 2-address format.
Thumb programmer’s Model
MOV (Thumb)
Move constant or register to register.
• Updates N and Z.
MVN Rd, Rm
• Rd := NOT Rm
• Updates N and Z.
ADC Rd, Rm
• Add Rd and Rm and C flag and store result to Rd.
• Only registers R0 - R7 allowed.
• Updates N, Z, C and V.
• ADD R2,R4 // R2 = R2 + R4, set flags
• ADC R3,R5 // R3 = R3 + R5 + carry from previous ADD
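The ADD / ADC pair above is the usual way to add values wider than one register. A minimal Python sketch of a 64-bit addition built from two 32-bit additions follows; `add32` and `add64` are hypothetical names, and each 64-bit value is passed as a (low word, high word) pair.

```python
def add32(a, b, carry_in=0):
    """32-bit add returning (result, carry_out), as ADD / ADC would."""
    total = a + b + carry_in
    return total & 0xFFFFFFFF, total >> 32


def add64(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values held as (low word, high word) pairs."""
    lo, carry = add32(a_lo, b_lo)          # ADD R2,R4 sets the carry flag
    hi, _ = add32(a_hi, b_hi, carry)       # ADC R3,R5 consumes it
    return lo, hi


lo, hi = add64(0xFFFFFFFF, 0x0, 0x1, 0x0)  # 0x00000000FFFFFFFF + 1
print(hex(hi), hex(lo))
```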
ADD (Thumb)
Add values and store result to register.
• ADD Rd, Rm
Add Rd and Rm and store result to Rd.
supports low and high registers (R0 - R15).
SUB (Thumb)
SUB Rd, Rn, #expr
Subtracts the value of expr from the value in Rn, and places the result in
Rd. Supports only low registers (R0 - R7).
SUB SP, #expr
Subtracts the value of expr from SP and places the result in SP.
N, Z, C and V flags are updated, except that SUB SP, #expr does not affect the flags.
SUB R6, R2,#6 //R6:=R2-6 and set condition codes
SBC (THUMB)
SUBTRACT WITH CARRY.
SBC Rd, Rm
subtracts the value in Rm from the value in Rd, taking account of the carry flag,
and places the result in Rd. Use this to synthesize multiword subtraction.
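As a sketch of the multiword subtraction mentioned above, the Python model below subtracts two 64-bit values using two 32-bit steps. It assumes the ARM convention that the carry flag after a subtraction means "no borrow", so SBC computes Rd - Rm - (1 - C). The names `sub32` and `sub64` are hypothetical.

```python
def sub32(a, b, carry_in=1):
    """32-bit subtract as a + NOT(b) + carry_in; returns (result, carry_out).

    carry_in = 1 models a plain SUB; carry_out = 1 means no borrow occurred.
    """
    total = a + (~b & 0xFFFFFFFF) + carry_in
    return total & 0xFFFFFFFF, total >> 32


def sub64(a_lo, a_hi, b_lo, b_hi):
    """Subtract two 64-bit values held as (low word, high word) pairs."""
    lo, carry = sub32(a_lo, b_lo)          # SUB on the low words
    hi, _ = sub32(a_hi, b_hi, carry)       # SBC on the high words
    return lo, hi


lo, hi = sub64(0x0, 0x1, 0x1, 0x0)         # 0x0000000100000000 - 1
print(hex(hi), hex(lo))
```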
MUL Rd, Rm
Multiplies the values in Rd and Rm and stores the result in Rd.
Updates N and Z.
BIC Rd, Rm
• The bits set in Rm are cleared in Rd. Rd := Rd AND NOT Rm.
• Supports only low registers (R0 - R7).
• Updates N and Z.
• MOV R1,#05H // load R1 with 0101B (bit 0 and bit 2 set)
• BIC R2,R1 // clear bit 0 and bit 2 in R2
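The bit-clear operation and its flag updates can be written out as a short Python sketch, assuming 32-bit registers. The function name `bic` and the returned (result, N, Z) tuple are illustrative choices.

```python
def bic(rd, rm):
    """Model BIC Rd, Rm: returns (result, N, Z) with result = Rd AND NOT Rm."""
    result = rd & ~rm & 0xFFFFFFFF
    n = result >> 31            # N is the top bit of the result
    z = int(result == 0)        # Z is set when the result is zero
    return result, n, z


result, n, z = bic(0xFF, 0x05)  # clear bit 0 and bit 2 of 0xFF
print(hex(result), n, z)
```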
BL (Thumb)
Branch with Link. Use to call subroutines.
• BL label
• Copy address of next instruction to R14 and jump
to label. The jump distance must be within
±4MByte of the current instruction. Note that this
mnemonic is generated as two 16-bit Thumb
instructions.
• Flags not modified.
• BL function // call function
B (THUMB)
BRANCH TO LABEL. USED TO JUMP TO A SPECIFIC PROGRAM
LOCATION.
B{cond} label
The jump distance must be within -252 to +258 bytes for
conditional and ±2 KBytes for unconditional branch.
Flags not modified.
CMP R1,#10 // compare R1 with 10
BEQ val_ok // jump to label val_ok
val_ok:
val_err: B val_err // jump to itself (loop forever)
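The branch distances quoted above can be derived from the size of the offset field. The Python sketch below assumes a signed offset of 8 bits for the conditional branch and 11 bits for the unconditional one, counted in halfwords and added to a PC that reads as the branch's own address plus 4. The function name is hypothetical.

```python
def branch_range(offset_bits):
    """Reachable targets, in bytes, relative to the branch instruction's address."""
    pc_ahead = 4                                   # PC reads as instruction address + 4
    lowest = -(1 << (offset_bits - 1)) * 2 + pc_ahead
    highest = ((1 << (offset_bits - 1)) - 1) * 2 + pc_ahead
    return lowest, highest


print(branch_range(8))    # conditional branch
print(branch_range(11))   # unconditional branch, roughly +/- 2 KBytes
```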
CMP (THUMB)
COMPARE. USED IN COMBINATION WITH CONDITIONAL BRANCH
INSTRUCTIONS.
Update N, Z, C and V.
CMP R2,#255 // compare R2 with 255
BNE lab1 // jump to lab1 when R2 is not 255
lab1: CMP R7,R12 // compare R7 with R12
BHS lab2 // jump to lab2 when the value in R7 is higher than or the same as the value in R12 (unsigned)
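To make the link between CMP and the following conditional branch concrete, here is a Python sketch that computes the four flags for CMP Rn, Rm as the subtraction Rn - Rm, assuming 32-bit registers. The function name `cmp_flags` is hypothetical. BNE is taken when Z = 0 and BHS when C = 1.

```python
def cmp_flags(rn, rm):
    """Return (N, Z, C, V) as set by CMP Rn, Rm, i.e. by computing Rn - Rm."""
    total = rn + (~rm & 0xFFFFFFFF) + 1      # subtraction as add of two's complement
    result = total & 0xFFFFFFFF
    n = result >> 31
    z = int(result == 0)
    c = total >> 32                          # 1 means no borrow: Rn >= Rm unsigned
    v = (((rn ^ rm) & (rn ^ result)) >> 31) & 1
    return n, z, c, v


n, z, c, v = cmp_flags(255, 255)
print("BNE taken:", z == 0)                  # equal values, so BNE falls through
n, z, c, v = cmp_flags(5, 7)
print("BHS taken:", c == 1)                  # 5 is lower than 7, so BHS falls through
```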
EOR (THUMB)
LOGICAL EXCLUSIVE OR OPERATION.
EOR Rd, Rm
Load Rd with logical Exclusive OR of Rd with Rm.
Rd := Rd EOR Rm
Update N and Z.
ORR Rd, Rm
Load Rd with logical OR of Rd with Rm. Rd := Rd OR Rm
Update N and Z.
ORR R3,R4 // R3 = R3 OR R4
TST (THUMB)
BITWISE AND OPERATION, RESULT DISCARDED. USED FOR
CONDITIONAL OPERATIONS AFTERWARDS.
TST Rn, Rm
Sets the condition flags in the CPSR according to the logical AND of Rn and
Rm. The result is discarded.
Update N and Z.
NEG Rd, Rm
Load Rd with negated value in Rm.
Update N, Z, C, and V.
LDMIA (THUMB)
LOAD MULTIPLE REGISTERS FROM MEMORY.
LDMIA R3!, {R0-R2,R4} // load R0, R1, R2 and R4 from address in R3. R3 is
incremented by 16.
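The LDMIA example can be traced with a small Python model. This is a sketch that assumes word-sized transfers, a dictionary standing in for memory, and registers loaded in ascending register-number order; `ldmia` is a hypothetical name.

```python
def ldmia(regs, mem, rn, reg_list):
    """Model LDMIA rn!, {reg_list}: load words from ascending addresses."""
    address = regs[rn]
    for reg in sorted(reg_list, key=lambda name: int(name[1:])):
        regs[reg] = mem[address]   # lowest register comes from the lowest address
        address += 4
    regs[rn] = address             # '!' writes the final address back to rn


regs = {"R3": 0x100}
mem = {0x100: 10, 0x104: 11, 0x108: 12, 0x10C: 13}
ldmia(regs, mem, "R3", ["R0", "R1", "R2", "R4"])
print(regs)
```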
N and Z are updated when LDR Rd, =value is generated as
MOV Rd, #value; otherwise the flags are not modified.
LDRB (THUMB)
LOAD REGISTER BYTE VALUE FROM MEMORY .
LDRB Rd, [Rn, #offset] (immediate offset)
loads a byte from memory.
Supports only low registers (R0 - R7).
LDRB R2,[R0,R7] // load into R2 the byte found at the address formed by adding R7 to R0
LDRH (THUMB)
LOAD REGISTER RD WITH A 16-BIT HALF-WORD FROM MEMORY. THE
ADDRESS MUST BE DIVISIBLE BY 2.
Flags not modified.
LDRSB (THUMB)
LOAD A BYTE FROM MEMORY. THE BYTE VALUE IS SIGN-EXTENDED
AND COPIED TO RD.
STRB (THUMB)
STORE LOW BYTE IN REGISTER RD TO MEMORY.
STRH (THUMB)
STORE 16-BIT HALFWORD IN REGISTER RD TO MEMORY. THE MEMORY
ADDRESS MUST BE DIVISIBLE BY 2.
SWI (software interrupt)
The #imm8 can be in the range 0-255 and may be used to invoke
different operating system functions.
Thumb break point
Used for software debugging purposes
Syntax: BKPT ; This instruction causes the processor
to take a prefetch abort when the debug
hardware unit is configured appropriately.
Thumb Implementation
The Thumb instruction decompressor organization.
Ex: the Thumb instruction ADD Rd, #imm8 is decompressed to the ARM instruction ADDS Rd, Rd, #imm8
Thumb applications
• The Thumb code requires 70% of the space of the ARM
code.
• The Thumb code uses 40% more instructions than the
ARM code.
• With 32-bit memory, the ARM code is 40% faster than the
Thumb code.
• With 16-bit memory, the Thumb code is 45% faster than
the ARM code.
• Thumb code uses 30% less external memory power than
ARM code.
So where performance is all-important, a system
should use 32-bit memory and run ARM code.
Where cost and power consumption are more
important, a 16-bit memory system and Thumb
code may be a better choice.