ARM Cortex-A9 ARM V7-A A Programmer's Perspective
ARM v7-A
A programmer’s perspective
Part 2
ARM Instructions
[Figure: 32-bit ARM instruction word, bits 31 to 0. Fields: cond, Opcode, S (or L), Rn (Op1), Rd (Dest.), Rs (Op. Acc.), Opcode2, Rm (Op2). Immediate forms use an Immd8, Immd12, or Immd24 field.]
General Format
Inst Rd, Rn, Rm, Rs
Inst Rd, Rn, #0ximm
Instruction Classes
Data Processing (largest class): ADD, AND, BIC, CMP, EOR, MOV, ORR, RSB, SUB, TEQ, TST
ASR, ASL, LSL, LSR, ROR, MLA, MLS, MUL, PKH, SDIV, SXT
Load/Store: LDR, LDRB, LDRH, LDRD, STR, STRB, STRH, STRD, LDM, LDMIA, LDMDA,
LDMDB, LDMIB, STM, STMIA, STMDA, STMDB, STMIB, POP, PUSH
[Figure: ARM CPU and memory map. Register file (R11: GPR, R12: GPR, R13: SP, R14: LR, R15: PC), instruction cache, immediate operand path (Immd_Op), and two AXI bus ports to the FPGA (Port 0 and Port 1). Address boundaries marked at 3FFF FFFF / 4000 0000 and 7FFF FFFF / 8000 0000.]
Main Memory
32-bit external addresses (4 Gbytes). ZYNQ supports 30-bit external addresses (1 Gbyte).
Loads and stores can operate on words (4 consecutive bytes), half words (2 bytes), or bytes.
Operands residing in memory must be loaded into a register before they can be used, and
new values stored in main memory must be stored from a register.
Three sets of instructions that interact with main memory
Basic load and store operations are: LDR/STR, LDRH/STRH, LDRB/STRB, LDRD/STRD
All load/store instructions require a base address pointer placed in a GPR (Rn). Rn is a
“pointer”, or a 32-bit memory address. Square brackets [] designate a pointer.
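The base address can be modified with an offset applied before access. Examples:
LDR Rt, [Rn] @ Rt <- [Rn]
LDR Rt, [Rn, #imm] @ Rt <- [Rn + imm]
STR Rt, [Rn, #-imm] @ [Rn - imm] <- Rt
LDR Rt, [Rn, Rm] @ Rt <- [Rn + Rm]
Imm is a 12-bit “immediate” value; if omitted, the default is 0. A minus sign subtracts the
immediate value, a plus sign (or no sign) adds the immediate value.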
Load and Store
LDR Rt, label @ Load Rt from the location at label (label must be within +/- 4095 bytes of the PC)
Indexing means modifying the base address, and writing the modified value back into the
base address register, as a part of the execution cycle.
Pre-indexing means computing the new address (Rn plus the offset) first, doing the transfer
at that address, and writing that address back to Rn. Pre-indexing is used when a ‘!’ is
added to the end of a load or store instruction.
LDR Rt, [Rn, #4]! @ Rt <- [Rn + 4], and Rn is updated with the address that was used
STR Rt, [Rn, Rm]! @ [Rn + Rm] <- Rt, and Rn is updated with the address that was used
Post-indexing means doing the transfer at the original address in Rn first, then updating
Rn with the offset. Post-indexing is used when only the base register is enclosed in square
brackets.
LDR Rt, [Rn], #4 @ Rt <- [Rn], then Rn is updated with (Rn + 4)
STR Rt, [Rn], Rm @ [Rn] <- Rt, then Rn is updated with (Rn + Rm)
Indexing provides a powerful tool for working with regular memory structures like arrays.
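The three addressing forms can be compared side by side with a small software model. This is only a sketch in Python, not ARM code: the `ldr` helper, the dictionary-based register file, and the memory contents are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def ldr(regs, mem, rt, rn, offset=0, mode="offset"):
    """Model LDR Rt with a base register Rn and an offset.

    mode="offset": address = Rn + offset, Rn unchanged      LDR Rt, [Rn, #offset]
    mode="pre":    address = Rn + offset, Rn <- address     LDR Rt, [Rn, #offset]!
    mode="post":   address = Rn, then Rn <- Rn + offset     LDR Rt, [Rn], #offset
    """
    base = regs[rn]
    updated = (base + offset) & MASK32
    address = base if mode == "post" else updated
    regs[rt] = mem[address]
    if mode in ("pre", "post"):
        regs[rn] = updated  # write-back to the base register

# Hypothetical memory: three words of an array starting at 0x100.
mem = {0x100: 11, 0x104: 22, 0x108: 33}
regs = {"r0": 0, "r1": 0x100}

ldr(regs, mem, "r0", "r1", 4)              # plain offset: r1 stays at 0x100
offset_result = (regs["r0"], regs["r1"])

ldr(regs, mem, "r0", "r1", 4, "pre")       # pre-index: load from 0x104, r1 = 0x104
pre_result = (regs["r0"], regs["r1"])

ldr(regs, mem, "r0", "r1", 4, "post")      # post-index: load from 0x104, r1 = 0x108
post_result = (regs["r0"], regs["r1"])
```

Repeating the post-indexed form walks through an array one word per instruction, which is why it suits loops over regular memory structures.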
A quick aside… the barrel shifter
[Figure: ARM CPU block diagram showing the register file (R0: GPR, R1: GPR, ...), a 32-bit data path, CPUClk, and the memory controller.]
Load and Store using Barrel Shifter
The shifter can also modify the address used for load and store operations by shifting the
offset register before it is combined with the base address. A 5-bit immediate field in the
opcode encodes the shift amount. No extra CPU time is needed.
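A common use is indexing an array of words, where the index register is shifted left by 2 (multiplied by 4) before it is added to the base, as in LDR r0, [r1, r2, LSL #2]. The Python sketch below models only the address calculation; the function name and the sample values are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF

def effective_address(rn, rm, shift=0):
    """Address used by LDR Rt, [Rn, Rm, LSL #shift] (shift is 0..31)."""
    return (rn + ((rm << shift) & MASK32)) & MASK32

# Word 3 of an array whose base address is 0x1000:
# LDR r0, [r1, r2, LSL #2] with r1 = 0x1000 and r2 = 3
word3_address = effective_address(0x1000, 3, 2)
```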
Additional load and store instructions operate on lists of registers. More information is in
the ARM Architecture Reference Manual, starting on page A8-396.
LDMIA/LDMFD
LDMDA/LDMFA
LDMDB/LDMEA
LDMIB/LDMED
STMIA/STMEA
STMDA/STMED
STMDB/STMFD
STMIB/STMFA
STMDB R13!, {r4, r5, r6, r7, r8, r9, r10, r11, r12, r14} @ push: store below SP, then update SP
LDMIA R13!, {r4, r5, r6, r7, r8, r9, r10, r11, r12, r14} @ pop: reload the registers, then restore SP
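STMDB and LDMIA (the same instructions as STMFD and LDMFD) behave as a push and pop pair when the base register is written back. The Python sketch below models that case under simplifying assumptions: `stmdb` and `ldmia` are hypothetical helper names, memory is a dictionary keyed by word address, and the base register is assumed not to be in the register list.

```python
def reg_number(name):
    return int(name[1:])

def stmdb(regs, mem, rn, reglist, writeback=True):
    """STMDB Rn{!}, {reglist}: store below Rn, lowest register at lowest address."""
    names = sorted(reglist, key=reg_number)
    start = regs[rn] - 4 * len(names)
    for i, name in enumerate(names):
        mem[start + 4 * i] = regs[name]
    if writeback:
        regs[rn] = start

def ldmia(regs, mem, rn, reglist, writeback=True):
    """LDMIA Rn{!}, {reglist}: load upward from Rn, lowest register first."""
    names = sorted(reglist, key=reg_number)
    start = regs[rn]
    for i, name in enumerate(names):
        regs[name] = mem[start + 4 * i]
    if writeback:
        regs[rn] = start + 4 * len(names)

regs = {"r4": 4, "r5": 5, "r13": 0x2000, "r14": 0x8000}
mem = {}

stmdb(regs, mem, "r13", ["r4", "r5", "r14"])   # push
sp_after_push = regs["r13"]
regs["r4"], regs["r5"] = 0, 0                  # the callee clobbers r4 and r5
ldmia(regs, mem, "r13", ["r4", "r5", "r14"])   # pop
```

Without write-back the stack pointer would not move, so a following LDMIA from the same base would read different addresses than the STMDB wrote.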
Load and Store
No single instruction can load an arbitrary 32-bit immediate constant into a register without
performing a data load from memory (a 32-bit instruction word has no room to embed a full
32-bit immediate).
The 12-bit immediate field holds an 8-bit value and a 4-bit rotation, so an 8-bit immediate
can be rotated right by an even number of bits to give a wider range of constants.
MOV r0, #0x40, 26 @ load #0x1000 into R0
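The rotation in this example can be checked with a short Python model of a 32-bit rotate right. The helper name `ror32` is an assumption for illustration.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

# MOV r0, #0x40, 26: rotating right by 26 is the same as rotating left by 6.
r0 = ror32(0x40, 26)
```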
The pseudo-instruction shown can move any declared constant into a register:
.set myconstant, 0xAAAA5555
LDR r0, =myconstant @ load #0xAAAA5555 into R0
The pseudo-instruction shown below can also be used. In this case, the assembler will use a
MOV or MVN instruction if possible; otherwise, it will create a constant and then load it:
LDR r0, =0xFFFF0C0C
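The choice the assembler makes can be sketched as follows. This is a simplified Python model that considers only the rotated 8-bit immediate form; a real assembler may have further options, and the function names are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def encode_immediate(value):
    """Return (imm8, rot) with ror32(imm8, rot) == value, or None if impossible."""
    for rot in range(0, 32, 2):               # only even rotations are encodable
        imm8 = ror32(value, (32 - rot) % 32)  # undo the rotation (rotate left)
        if imm8 <= 0xFF:
            return imm8, rot
    return None

def expand_ldr_equals(value):
    """Pick the instruction a simplified assembler would use for LDR Rd, =value."""
    if encode_immediate(value) is not None:
        return "MOV"
    if encode_immediate(~value & MASK32) is not None:
        return "MVN"
    return "LDR from literal pool"

choice_small = expand_ldr_equals(0x1000)        # fits a rotated 8-bit immediate
choice_inverted = expand_ldr_equals(0xFFFFFF00) # bitwise inverse is 0xFF
choice_pool = expand_ldr_equals(0xFFFF0C0C)     # neither form fits
```

A constant often has more than one valid (imm8, rot) pair; 0x1000 can be written as 0x40 rotated by 26 or as 0x01 rotated by 20.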
PC-relative Load and Store
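The assembler turns this instruction into the following (the assembler calculates the offset):
LDR r0, [pc, #offset] @ offset = address of the stored constant - (address of this instruction + 8)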
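The offset calculation can be sketched in Python. In ARM state the PC reads as the address of the current instruction plus 8, and the offset must fit the 12-bit immediate field. The function name and the addresses are assumptions for illustration.

```python
def pc_relative_offset(instr_addr, literal_addr):
    """Offset the assembler would place in LDR Rt, [pc, #offset] (ARM state)."""
    pc = instr_addr + 8            # the PC reads two instructions ahead
    offset = literal_addr - pc
    if not -4095 <= offset <= 4095:
        raise ValueError("constant is out of range for a PC-relative load")
    return offset

# Hypothetical layout: the instruction is at 0x8000, the constant at 0x8010.
offset = pc_relative_offset(0x8000, 0x8010)
```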
A MOV instruction can move data between registers, or from an immediate to a register. An
MVN instruction also moves information, but does a bitwise negation in the process.
There are other flavors of MOV instructions to move data into special registers and to
coprocessors. You can get more information about these move instructions from the
textbook or from the ARM Architecture Reference Manual, starting on page A8-484.
Branch
Examples:
Loop_point: @ label
LDR r0, [r1]
B Loop_point @ unconditional branch back to the label
MOV r0, #10
Loop_point1: SUBS r0, r0, #1
BNE Loop_point1 @ branch back until r0 reaches zero
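The count-down loop (MOV r0, #10, then SUBS and BNE) can be traced with a short Python model: SUBS sets the Z flag when the result is zero, and BNE branches back only while Z is clear. The function name is an assumption for illustration.

```python
MASK32 = 0xFFFFFFFF

def count_down_iterations(start):
    """Count how many times the SUBS/BNE loop body runs for a start value."""
    r0 = start                  # MOV r0, #start
    iterations = 0
    while True:
        r0 = (r0 - 1) & MASK32  # SUBS r0, r0, #1
        z_flag = (r0 == 0)      # Z is set when the result is zero
        iterations += 1
        if z_flag:              # BNE falls through once Z is set
            break
    return iterations

iterations = count_down_iterations(10)
```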
ADD
The ADD and ADC (add with carry) instructions add the contents in two 32-bit “source”
registers and place the result in a 32-bit “destination” register (the destination register can
be the same as one of the sources).
More information about add instructions is in the textbook and the ARM Architecture Reference
Manual, starting on page A8-300.
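ADC is typically paired with ADDS to add values wider than 32 bits: ADDS adds the low words and records the carry, and ADC adds the high words plus that carry. The Python sketch below models this; the function names and register assignments in the comments are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF

def add_with_carry(a, b, carry_in=0):
    """Return (32-bit result, carry out) of a + b + carry_in."""
    total = a + b + carry_in
    return total & MASK32, total >> 32

def add64(a_lo, a_hi, b_lo, b_hi):
    lo, carry = add_with_carry(a_lo, b_lo)     # ADDS r0, r0, r2
    hi, _ = add_with_carry(a_hi, b_hi, carry)  # ADC  r1, r1, r3
    return lo, hi

# 0x0000_0000_FFFF_FFFF + 1 carries into the high word.
total = add64(0xFFFFFFFF, 0, 1, 0)
```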
Subtract
The SUB and SBC (subtract with NOT of carry bit) instructions, and the reverse-subtract RSB
and RSC instructions (which swap the order of the operands), subtract the contents of one
32-bit register from another register and place the result in a 32-bit destination register
(the destination register can be the same as one of the sources).
More information about subtract instructions is in the textbook and the ARM Architecture
Reference Manual.
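ARM subtraction is an addition of the inverted second operand, so the carry flag after SUBS is 1 when no borrow occurred, and SBC subtracts an extra 1 when the carry is 0. The Python sketch below models this for a 64-bit subtraction; the function names are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF

def sub_with_carry(a, b, carry_in=1):
    """Return (32-bit result, carry out) of a - b - NOT(carry_in)."""
    total = a + (~b & MASK32) + carry_in
    return total & MASK32, total >> 32

def sub64(a_lo, a_hi, b_lo, b_hi):
    lo, carry = sub_with_carry(a_lo, b_lo)     # SUBS r0, r0, r2
    hi, _ = sub_with_carry(a_hi, b_hi, carry)  # SBC  r1, r1, r3
    return lo, hi

def rsb(rn, op2):
    """RSB Rd, Rn, Op2 computes Op2 - Rn."""
    return (op2 - rn) & MASK32

# 0x0000_0001_0000_0000 - 1 borrows from the high word.
difference = sub64(0, 1, 1, 0)
```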
Shift
The ARM can do arithmetic and logical shifts by up to 32 bits. Arithmetic shifts (ASR) are
right shifts with sign extension (i.e., the sign bit is placed in all vacated bits). Logical
shifts right or left (LSR or LSL) place a ‘0’ in all vacated bits; immediate shift amounts run
up to 32 for ASR and LSR and up to 31 for LSL. The number of bits to shift can be an
immediate or placed in a register.
ASR r0, r1, #0x04 @ r0 <= r1 contents shifted right four bits with sign extend
ASR r0, r1, r2 @ r0 <= r1 contents shifted right by the number of bits in the bottom
of r2, sign extends, and places result in r0
LSL r0, r1, #0x06 @ r0 <= r1 shifted left 6 bits with 0 fill
LSR r0, r1, r2 @ r0 <= r1 shifted right by number of bits in bottom of r2, 0 fill
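The difference between arithmetic and logical shifts shows up only when the sign bit is set. A small Python model, with function names assumed for illustration:

```python
MASK32 = 0xFFFFFFFF

def lsl(value, n):
    """Logical shift left: zeros fill from the right."""
    return (value << n) & MASK32 if n < 32 else 0

def lsr(value, n):
    """Logical shift right: zeros fill from the left."""
    return value >> n if n < 32 else 0

def asr(value, n):
    """Arithmetic shift right: copies of the sign bit fill from the left."""
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> min(n, 31)) & MASK32  # shifting by 32 or more leaves only sign bits

negative = 0x80000000
asr_result = asr(negative, 4)   # ASR r0, r1, #0x04
lsr_result = lsr(negative, 4)   # LSR r0, r1, #0x04
lsl_result = lsl(1, 6)          # LSL r0, r1, #0x06
```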
Comparisons
CMP compares by subtracting the operands, updating the status bits, and discarding the
results. No need to use “S” mnemonic extension. CMN adds operands, updates status, and
discards results. TST (test) ANDs operand 1 with operand 2, updates status bits, and discards
results. TEQ (test equivalence) XORs operand 1 with operand 2, updates status bits, and
discards results.
CMP r0, #0ximm @ Subtract imm (12 bit) from r0, discard results, update status bits
CMP r0, r1 @ Subtract r1 from r0, discard results, update status bits
CMP r0, r1, LSL #0x06 @ Subtract left-shifted r1 from r0, discard results, update status bits
CMP r0, r1, ROR r2 @ Subtract r1 (right-rotated by number in r2) from r0, discard, update
CMN r0, #0ximm @ Add imm to r0, discard results, update status bits
CMN r0, r1 @ Add r1 and r0, discard results, update status bits
TST r0, #1 @ ANDs r0 with 1, discard results, update status bits (why?)
TEQ r2, r3 @ Bitwise XOR of r2 and r3, discard results, updates status bits
TEQ r2, #5 @ Bitwise XOR of r2 and 5 (Z = 1 if equal)
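The status bits these instructions produce can be modeled in Python. The sketch below computes N, Z, C, and V for CMP, and N and Z for TST and TEQ (their carry flag depends on the shifter and is left out). The function names and the dictionary layout are assumptions for illustration. It also suggests one answer to the TST r0, #1 question: Z is set exactly when bit 0 of r0 is clear, so the instruction tests whether r0 is even.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(a, b):
    """Flags after CMP a, b (a - b, result discarded)."""
    total = a + (~b & MASK32) + 1
    result = total & MASK32
    return {
        "N": result >> 31,                        # result is negative
        "Z": int(result == 0),                    # operands are equal
        "C": total >> 32,                         # 1 means no borrow
        "V": ((a ^ b) & (a ^ result)) >> 31,      # signed overflow
    }

def tst_flags(a, b):
    """N and Z after TST a, b (a AND b, result discarded)."""
    result = a & b
    return {"N": result >> 31, "Z": int(result == 0)}

def teq_flags(a, b):
    """N and Z after TEQ a, b (a XOR b, result discarded)."""
    result = a ^ b
    return {"N": result >> 31, "Z": int(result == 0)}

equal = cmp_flags(5, 5)
less = cmp_flags(3, 5)
even_test = tst_flags(6, 1)
```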
Logical Operations
The AND, BIC, EOR, and ORR data processing operators all use the same general addressing
modes as the other data processing instructions.
AND r0, r1, #0ximm @ Bitwise AND r1 with 12-bit immediate and write result to r0
ANDS r0, r1, r2 @ Bitwise AND r1 with r2, write result to r0, update status bits
AND r0, r1, r2, LSR r3 @ Bitwise AND r1 with r2 shifted right by r3, write result to r0
BIC r0, r1, #0ximm @ Bitwise AND r1 with inverse of imm, write to r0
BIC r0, r1, r2, LSL #0x2 @ Bitwise AND r1 with inverse of left-shifted r2, write to r0
EORS r0, r1, #0ximm @ Bitwise XOR r1 with 12-b imm, write result to r0, update status bits
EOR r0, r1, r2, LSR r3 @ Bitwise XOR r1 with r2 shifted right by r3, write to r0
ORR r2, r3, #0xAA @ Bitwise OR of r3 and AA, write to r2
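The four operators differ only in the bitwise function applied. A Python sketch on 32-bit values, with function names assumed for illustration:

```python
MASK32 = 0xFFFFFFFF

def and_op(a, b):
    return a & b                 # AND: keep bits set in both operands

def bic(a, b):
    return a & (~b & MASK32)     # BIC: clear the bits of a that are set in b

def eor(a, b):
    return a ^ b                 # EOR: toggle the bits of a that are set in b

def orr(a, b):
    return a | b                 # ORR: set the bits of a that are set in b

cleared = bic(0xFF, 0x0F)
toggled = eor(0xFF, 0xAA)
combined = orr(0x55, 0xAA)
```

BIC is handy for clearing selected bits with a mask, and ORR for setting them, without first inverting the mask by hand.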
Multiplication
ARM has three basic 32-bit multiplication instructions: MUL to multiply two 32-bit registers;
MLA (multiply and accumulate) to multiply two 32-bit registers and add a third register; and
MLS (multiply and subtract) to multiply two registers and subtract the result from a third
register. For each instruction, the destination register holds the least-significant 32 bits of
the result.
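The truncation to the low 32 bits can be seen in a small Python model. The function names and operand order are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF

def mul(rn, rm):
    """MUL Rd, Rn, Rm: low 32 bits of Rn * Rm."""
    return (rn * rm) & MASK32

def mla(rn, rm, ra):
    """MLA Rd, Rn, Rm, Ra: low 32 bits of Rn * Rm + Ra."""
    return (rn * rm + ra) & MASK32

def mls(rn, rm, ra):
    """MLS Rd, Rn, Rm, Ra: low 32 bits of Ra - Rn * Rm."""
    return (ra - rn * rm) & MASK32

truncated = mul(0x10000, 0x10000)   # the full product 2**32 does not fit in 32 bits
accumulated = mla(3, 4, 5)
subtracted = mls(3, 4, 20)
```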