3 - ARMv8-A Architecture
Readings and Exercises
• P & H: Sections 2.1 – 2.3, 2.7
• ARMv8 Instruction Set Overview, Sections:
▪ 2
▪ 3.1 – 3.2
▪ 4.1 – 4.4
▪ 5.1 – 5.2, 5.4.1, 5.5.1, 5.6
Objectives
At the end of the section, you will
1. Understand the ARMv8 register file
2. Be familiar with basic ARMv8 assembly instructions
3. Be able to write ARMv8 programs with
branching and looping
4. Be able to work with gdb
ARMV8 OVERVIEW
Introduction
• This course uses the Applied Micro X-Gene
X-C1 servers
▪ Its CPU is the APM883208-X1 X-Gene Multi-Core
64-bit processor
• Is an implementation of the ARMv8-A specification,
licensed from ARM Holdings, PLC
▪ ARM: Advanced RISC Machine (originally Acorn RISC Machine)
▪ The installed operating system (OS) is Linux
Fedora 26
• Includes up-to-date versions of gcc, as, gdb, and m4
Introduction (cont’d)
▪ The 3 servers can be accessed remotely using ssh
(secure shell) with these addresses:
• csa1.cpsc.ucalgary.ca
• csa2.cpsc.ucalgary.ca
• csa3.cpsc.ucalgary.ca
• Or through the load balancer: arm.cpsc.ucalgary.ca
• Use your CPSC credentials to log in
Introduction (cont’d)
• The ARMv8-A architecture:
▪ Is a RISC
▪ Is a Load/Store machine
• Register file contains 31 64-bit-wide registers
• Most instructions manipulate 64-bit or 32-bit data stored in
these registers
▪ Uses a von Neumann architecture: instructions and data share the same RAM
Introduction (cont’d)
• The ARMv8-A architecture has two execution
states:
▪ AArch64
• Uses the A64 instruction set and 64-bit registers
• Used exclusively in this course
▪ AArch32
• Uses the A32 or T32 instruction sets
▪ Provided for compatibility with the older ARM and THUMB
instruction sets using 32-bit registers
• Not used in this course
ARMv8 Exception Levels
[Figure: diagram of the ARMv8 exception levels; the only label that survives extraction is "EL2 Hypervisor"]
Introduction (cont’d)
• The ARMv8-A architecture has 4 exception
levels:
▪ EL0: for normal user applications with limited
privileges
• Restricted access to a limited set of instructions and
registers, and to certain parts of memory
• Most programs work at this level
Introduction (cont’d)
▪ EL1: for the OS kernel
• Privileged access to instructions, registers, and memory
• Accessed indirectly by user programs using system calls
▪ EL2: for a Hypervisor
• Supports virtualization, where the computer hosts multiple
guest operating systems, each on its own virtual machine
▪ EL3: Low-level firmware
• Includes the Secure Monitor
ARMv8 Registers
• AArch64 has 31 64-bit-wide general-purpose
registers
▪ Numbered from 0 to 30
▪ When using all 64 bits, use “x” or “X” before the
number (stands for extended)
• Eg: x0, x1, x30
▪ When using only the low-order 32 bits of the register,
use “w” or “W” (word)
• Eg: w2, w29
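The relationship between the x and w names can be modelled in C. This is only a sketch: the type `reg64` and the helpers `read_w`, `write_w`, and `demo_w_view` are hypothetical names, not part of any ARM tool. It also relies on one architectural detail not stated above, namely that writing a w register clears the upper 32 bits of the corresponding x register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one general-purpose register (e.g. x2 / w2). */
typedef struct { uint64_t bits; } reg64;

/* Reading wN yields the low-order 32 bits of xN. */
uint32_t read_w(const reg64 *r) {
    return (uint32_t)(r->bits & 0xFFFFFFFFu);
}

/* Writing wN stores the value in the low 32 bits and zeroes the upper 32. */
void write_w(reg64 *r, uint32_t value) {
    r->bits = (uint64_t)value;
}

/* Usage example: x2 holds a 64-bit pattern; w2 is then read and rewritten. */
void demo_w_view(void) {
    reg64 x2 = { 0x1122334455667788ULL };
    assert(read_w(&x2) == 0x55667788u);       /* w2 sees only the low half */
    write_w(&x2, 0xFFFFFFFFu);
    assert(x2.bits == 0x00000000FFFFFFFFULL); /* upper half was cleared */
}
```

One consequence is that a 32-bit result left in w0 can be read safely as a non-negative 64-bit value in x0, but a negative 32-bit value must be sign-extended explicitly before it is used as a 64-bit quantity.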
ARMv8 Registers (cont’d)
• Many of these registers have special uses:
▪ x0 – x7: used to pass arguments into a procedure, and
return results
▪ x8: indirect result location register
▪ x9 – x15: temporary registers
▪ x16, x17: intra-procedure-call temporary registers
(IP0, IP1)
▪ x18: platform register
ARMv8 Registers (cont’d)
▪ x29: frame pointer (FP) register
▪ x30: procedure link register (LR)
• For now, use registers x19 – x28 for most of your
work
▪ Are callee-saved registers
• Value is preserved by any function you call
ARMv8 Registers (cont’d)
• There are several special-purpose registers:
▪ Stack Pointer:
• SP: 64-bit register used in A64 code
• WSP: the low-order 32 bits of SP, also accessible in A64 code
• Used to point to the top of the run-time stack
▪ Zero Register:
• XZR: 64 bits wide
• WZR: 32 bits wide
• Gives 0 value when read from
• Discards value when written to
ARMv8 Registers (cont’d)
▪ Program Counter
• PC: 64 bits wide
• Holds the address of the currently executing instruction
• Cannot be accessed directly as a named register
▪ Is changed indirectly by branch and other instructions
▪ Is used implicitly by PC-relative loads and address calculations
• Can be accessed in gdb with $pc
ARMv8 Registers (cont’d)
• There are 32 128-bit-wide floating-point registers
▪ Discussed in detail later
• The architecture also has numerous system registers
▪ Most are accessible only at EL1 or higher
• Used in OS kernel code
BASIC ARMV8 ASSEMBLY
A64 Assembly Language
• Consists of statements with one opcode and 0 to 4
operands
▪ The allowed operands depend on the particular
instruction
▪ In general:
• The first operand is a destination register
• The others are source registers
• Eg: add x19, x20, x21
A64 Assembly Language (cont’d)
• An immediate value (a constant) may be used as
the final source operand for some instructions
▪ Eg: add x19, x20, 42 // 42 is the immediate
A64 Assembly Language (cont’d)
• Some instructions are aliases for other
instructions
▪ Eg: mov x29, sp
is an alias for add x29, sp, 0
A64 Assembly Language (cont’d)
• Some commonly-used instructions are:
▪ Move immediate (32-bit)
• Form: mov Wd, #imm32
▪ Wd: destination register
▪ #imm32: integer in range $-2^{31}$ to $+2^{32}-1$
• Eg: mov w20, -237
A64 Assembly Language (cont’d)
▪ Move immediate (64-bit)
• Form: mov Xd, #imm64
▪ Xd: destination register
▪ #imm64: integer in range $-2^{63}$ to $+2^{64}-1$
• Eg: mov x21, 0xFFFE
A64 Assembly Language (cont’d)
▪ Move register (32-bit)
• Form: mov Wd, Wm
▪ Wd: destination register
▪ Wm: source register
▪ Alias for: orr Wd, wzr, Wm
• Eg: mov w21, w28
▪ Move register (64-bit)
• Similar form to above
• Eg: mov x22, x20
A64 Assembly Language (cont’d)
• A function can be called using the Branch and
Link instruction (bl)
▪ Can be a library function or your own function
▪ Form: bl label
▪ Eg: bl printf
▪ Arguments are put into x0 – x7 before the call
▪ Return value is in x0
Basic Program Structure
• The main routine of a program can be structured
as follows:
        .global main
main:   stp     x29, x30, [sp, -16]!    // saves state
        mov     x29, sp
        // ... your custom code goes here ...
        ldp     x29, x30, [sp], 16      // restores state
        ret
Basic Program Structure (cont’d)
• .global main
▪ Makes the label “main” visible to the linker
▪ The main routine is where execution always starts
• …[sp, -16]!
▪ Allocates 16 bytes in stack memory (in RAM)
▪ Does so by pre-incrementing the SP register by -16
Basic Program Structure (cont’d)
• stp x29, x30, …
▪ Stores the contents of the pair of registers to the stack
• x29: frame pointer (FP)
• x30: link register (LR)
• SP points to the location in RAM where we write to
▪ Saves the state of the registers used by calling code
• mov x29, sp
▪ Updates FP to the current SP
▪ FP may be used as a base address in the routine
Basic Program Structure (cont’d)
• ldp x29, x30, …
▪ Loads the pair of registers from RAM
• SP points to the location in RAM where we read from
▪ Restores the state of the FP and LR registers
• …[sp], 16
▪ Deallocates 16 bytes of stack memory
▪ Does so by post-incrementing SP by +16
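The pre-indexed store and post-indexed load described above can be modelled in C to make the order of operations explicit. This is a sketch under stated assumptions: the `machine` struct, the 64-byte stack array, and the function names are hypothetical, and SP is represented as a byte offset into the array rather than a real address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

/* Hypothetical machine state: a tiny stack plus SP, FP (x29) and LR (x30). */
typedef struct {
    uint8_t  stack[STACK_BYTES];
    uint64_t sp;                 /* byte offset into stack[]; grows downward */
    uint64_t x29, x30;
} machine;

/* Models: stp x29, x30, [sp, -16]!   (pre-index: adjust SP, then store) */
void stp_pre(machine *m) {
    m->sp -= 16;
    memcpy(&m->stack[m->sp],     &m->x29, 8);
    memcpy(&m->stack[m->sp + 8], &m->x30, 8);
}

/* Models: ldp x29, x30, [sp], 16     (post-index: load, then adjust SP) */
void ldp_post(machine *m) {
    memcpy(&m->x29, &m->stack[m->sp],     8);
    memcpy(&m->x30, &m->stack[m->sp + 8], 8);
    m->sp += 16;
}

/* Usage example: entry and exit sequence of a routine. */
void demo_frame(void) {
    machine m = { .sp = STACK_BYTES, .x29 = 0x1111, .x30 = 0x2222 };
    stp_pre(&m);                 /* save FP and LR */
    assert(m.sp == STACK_BYTES - 16);
    m.x29 = m.sp;                /* mov x29, sp */
    m.x30 = 0xDEAD;              /* LR gets overwritten, e.g. by a bl */
    ldp_post(&m);                /* restore FP and LR */
    assert(m.x29 == 0x1111 && m.x30 == 0x2222 && m.sp == STACK_BYTES);
}
```

The model shows why the pair must be saved: any bl inside the routine overwrites x30, so without the restore the final ret would not return to the original caller.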
Basic Program Structure (cont’d)
• ret
▪ Returns control to calling code (in OS)
▪ Uses the address in LR
Basic Arithmetic Instructions
• Addition
▪ Uses 1 destination and 2 source operands
▪ Register (64-bit and 32-bit):
• Eg: add x19, x20, x21 // x19 = x20 + x21
• Eg: add w19, w19, w20 // w19 = w19 + w20
▪ Immediate (64-bit and 32-bit):
• Eg: add x20, x20, 1 // x20 = x20 + 1
• Eg: add w27, w19, 4 // w27 = w19 + 4
Basic Arithmetic Instructions (cont’d)
• Subtraction
▪ Uses 1 destination and 2 source operands
▪ Register (64-bit and 32-bit):
• Eg: sub x0, x1, x2 // x0 = x1 - x2
• Eg: sub w3, w6, w7 // w3 = w6 - w7
▪ Immediate (64-bit and 32-bit):
• Eg: sub x20, x20, 1 // x20 = x20 - 1
• Eg: sub w27, w19, 4 // w27 = w19 - 4
Basic Arithmetic Instructions (cont’d)
• Multiplication
▪ Uses 1 destination and 2 or 3 source registers
▪ No immediates allowed
▪ Form (32-bit): mul Wd, Wn, Wm
• Calculates: Wd = Wn × Wm
• Alias for: madd Wd, Wn, Wm, wzr
• Eg: mul w0, w1, w2
▪ The 64-bit form is similar
• Eg: mul x19, x20, x20 // square number
Basic Arithmetic Instructions (cont’d)
▪ Multiply-Add
• Form (32-bit): madd Wd, Wn, Wm, Wa
▪ Calculates: Wd = Wa + (Wn × Wm)
• Eg: madd w20, w21, w22, w23
• 64-bit form is similar
▪ Eg: madd x20, x0, x1, x20
▪ Multiply-Subtract
• Form (32-bit): msub Wd, Wn, Wm, Wa
▪ Calculates: Wd = Wa - (Wn × Wm)
Basic Arithmetic Instructions (cont’d)
▪ Multiply-Negate
• Form (32-bit): mneg Wd, Wn, Wm
▪ Calculates: Wd = -(Wn × Wm)
▪ Other variants are possible
• See ARM documentation
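The four multiply forms above can be written as one-line C functions, which makes the alias relationships easy to see. This is a sketch: the function names are hypothetical, the operands are modelled as `uint32_t` so that results wrap modulo 2^32, and treating mneg as msub with the zero register as the addend is an architectural detail not stated in the text.

```c
#include <assert.h>
#include <stdint.h>

/* madd Wd, Wn, Wm, Wa : Wd = Wa + (Wn * Wm) */
uint32_t madd32(uint32_t wn, uint32_t wm, uint32_t wa) { return wa + wn * wm; }

/* msub Wd, Wn, Wm, Wa : Wd = Wa - (Wn * Wm) */
uint32_t msub32(uint32_t wn, uint32_t wm, uint32_t wa) { return wa - wn * wm; }

/* mul is madd with wzr (always 0) as the addend. */
uint32_t mul32(uint32_t wn, uint32_t wm) { return madd32(wn, wm, 0); }

/* mneg is modelled as msub with wzr as the addend: Wd = -(Wn * Wm). */
uint32_t mneg32(uint32_t wn, uint32_t wm) { return msub32(wn, wm, 0); }

/* Usage example with small operands. */
void demo_multiply(void) {
    assert(mul32(6, 7) == 42);
    assert(madd32(3, 4, 5) == 17);          /* 5 + 3*4 */
    assert(msub32(3, 4, 20) == 8);          /* 20 - 3*4 */
    assert(mneg32(2, 3) == (uint32_t)-6);   /* two's complement of 6 */
}
```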
Basic Arithmetic Instructions (cont’d)
• Division
▪ Uses 1 destination and 2 source registers
▪ No immediates allowed
▪ Signed form (32-bit): sdiv Wd, Wn, Wm
• Operands are signed integers
• Calculates: Wd = Wn ÷ Wm
• Eg: sdiv w0, w1, w2
▪ 64-bit form is similar
• Eg: sdiv x21, x22, x23
Basic Arithmetic Instructions (cont’d)
▪ The udiv variants use unsigned integer operands
• Eg: udiv w0, w1, w2
• Eg: udiv x21, x22, x23
▪ These instructions do integer division
• The calculated quotient is an integer, and any remainder is
discarded
▪ Eg: 14 / 3 is 4, with a remainder of 2
• The remainder (or modulus) can be calculated using
numerator – (quotient × denominator)
▪ The msub instruction is useful here
Basic Arithmetic Instructions (cont’d)
▪ Dividing by 0 does not generate an exception (a trap)
• Writes 0 to the destination register
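The division rules above (truncated quotient, remainder via numerator - (quotient × denominator), and a result of 0 when dividing by 0) can be sketched in C. The function names are hypothetical. The special case for the most negative value divided by -1 is an assumption added here so that the C code avoids undefined behaviour; it is not covered in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Models sdiv Wd, Wn, Wm: signed division, quotient truncated toward zero. */
int32_t sdiv32(int32_t wn, int32_t wm) {
    if (wm == 0) return 0;                             /* no trap: result is 0 */
    if (wn == INT32_MIN && wm == -1) return INT32_MIN; /* assumed wrap-around case */
    return wn / wm;                                    /* C also truncates toward zero */
}

/* Models udiv Wd, Wn, Wm: unsigned division. */
uint32_t udiv32(uint32_t wn, uint32_t wm) {
    return wm == 0 ? 0 : wn / wm;
}

/* Remainder as sdiv followed by msub: n - (q * d), computed with wrap-around. */
int32_t srem32(int32_t n, int32_t d) {
    int32_t q = sdiv32(n, d);
    return (int32_t)((uint32_t)n - (uint32_t)q * (uint32_t)d);
}

/* Usage example: 14 / 3 is 4, with a remainder of 2. */
void demo_divide(void) {
    assert(sdiv32(14, 3) == 4);
    assert(srem32(14, 3) == 2);
    assert(sdiv32(7, 0) == 0);     /* division by zero yields 0 */
    assert(srem32(-14, 3) == -2);  /* remainder takes the sign of the numerator */
}
```

Because no exception is raised, a program that must reject a zero divisor has to compare the divisor with 0 itself before dividing.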
Printing to Standard Output
• Is done by calling printf()
▪ Is a standard function in the C library
▪ Invoked with 1 or more arguments
• The first is the format string (usually a literal)
• The rest correspond to placeholders in the string
▪ Example C code:
. . .
int x = 42;
printf("Meaning of life = %d\n", x);   // %d is the int placeholder
. . .
Printing to Standard Output (cont’d)
▪ Equivalent assembly code:
fmt: .string "Meaning of life = %d\n"   // creates the format string
Alternative Way to call printf()
output0: .string "Enter N:"
….
ldr x0, =output0
bl printf
Calling scanf("%d", &n)
input0: .string "%d"
….
ldr x0, =input0
ldr x1, =n
bl scanf
ldr x14, =n          // load the address of n into a register
                     // then load the value of n from that address (e.g. ldr w15, [x14])
▪ n must be declared:
.bss
n: .skip 4
▪ Or:
.data
n: .word 0
▪ More on these sections later
Branch Instructions and Condition
Codes
• A branch instruction transfers control to another
part of a program
▪ Like a goto in the C language
▪ PC register is not incremented as usual, but is set to
the computed address of an instruction
• Corresponds to the value of its label
• An unconditional branch is always taken
▪ Form: b label
▪ Eg: b top
Branch Instructions and Condition
Codes (cont’d)
• Condition flags may be used to store information
about the result of an instruction
▪ Are single-bit units in the CPU
• Record process state (PSTATE) information
• 0 means false, 1 means true
▪ There are 4 flags:
• Z: true if result is zero
• N: true if result is negative
• V: true if result overflows
• C: true if result generates a carry out
Branch Instructions and Condition
Codes (cont’d)
• Condition flags are set by instructions that end in
“s” (short for set flags)
▪ Eg: subs, adds
▪ subs may be used to compare two registers
• Eg: subs x0, x1, x2
▪ But cmp is more intuitive:
• Form (64-bit): cmp Xn, Xm
• Is an alias for: subs xzr, Xn, Xm
• Eg: cmp x1, x2
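How subs and cmp derive the four flags can be sketched in C. The `flags` struct and the function names are hypothetical. The sketch assumes the usual ARM convention for subtraction, which the text does not spell out: C is set when no borrow occurs (the first operand is greater than or equal to the second as unsigned values), and V is set on signed overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } flags;

/* Models subs Xd, Xn, Xm: returns the difference and sets *f. */
uint64_t subs64(uint64_t xn, uint64_t xm, flags *f) {
    uint64_t result = xn - xm;              /* wraps modulo 2^64 */
    f->n = (result >> 63) != 0;             /* sign bit of the result */
    f->z = (result == 0);
    f->c = (xn >= xm);                      /* carry set means no borrow */
    /* Signed overflow: operand signs differ and the result's sign differs from xn's. */
    f->v = (((xn ^ xm) & (xn ^ result)) >> 63) != 0;
    return result;
}

/* Models cmp Xn, Xm (subs xzr, Xn, Xm): the difference is discarded. */
flags cmp64(uint64_t xn, uint64_t xm) {
    flags f;
    (void)subs64(xn, xm, &f);
    return f;
}

/* Usage example: equal operands set Z; a smaller first operand sets N. */
void demo_flags(void) {
    flags eq = cmp64(5, 5);
    assert(eq.z && !eq.n && eq.c && !eq.v);
    flags lt = cmp64(3, 7);
    assert(!lt.z && lt.n && !lt.c && !lt.v);
}
```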
Branch Instructions and Condition
Codes (cont’d)
• Conditional branch instructions use the condition
flags to make a decision
▪ If particular flags test true, then the branch is taken
• i.e. one “jumps” to the instruction at the specified label
▪ Otherwise, control “drops through” to the following
instruction
▪ Eg: b.eq top
• Branches if Z is true
Branch Instructions and Condition
Codes (cont’d)
▪ Form: b.cc label
• Where cc is a condition code
▪ The condition codes for signed integers are:
Name  Meaning                C equivalent  Flags                  cmp a, b
eq    equal                  ==            Z == 1                 a == b
ne    not equal              !=            Z == 0                 a != b
gt    greater than           >             Z == 0 && N == V       a > b
ge    greater than or equal  >=            N == V                 a >= b
lt    less than              <             N != V                 a < b
le    less than or equal     <=            !(Z == 0 && N == V)    a <= b
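The flag tests in the table translate directly into C. The sketch below uses a hypothetical `flags` struct and a hypothetical `condition_holds` function that takes the condition name as a string; it covers only the six signed conditions listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct { bool n, z, c, v; } flags;

/* Returns whether b.<cc> would be taken for the given flag values. */
bool condition_holds(const char *cc, flags f) {
    if (strcmp(cc, "eq") == 0) return f.z;
    if (strcmp(cc, "ne") == 0) return !f.z;
    if (strcmp(cc, "gt") == 0) return !f.z && f.n == f.v;
    if (strcmp(cc, "ge") == 0) return f.n == f.v;
    if (strcmp(cc, "lt") == 0) return f.n != f.v;
    if (strcmp(cc, "le") == 0) return !(!f.z && f.n == f.v);
    return false;  /* other condition codes are not modelled here */
}

/* Usage example: flags as left by comparing 3 with 7 (N set, others clear). */
void demo_conditions(void) {
    flags f = { .n = true, .z = false, .c = false, .v = false };
    assert(condition_holds("lt", f));
    assert(condition_holds("le", f));
    assert(condition_holds("ne", f));
    assert(!condition_holds("ge", f));
    assert(!condition_holds("gt", f));
    assert(!condition_holds("eq", f));
}
```

Each condition is the logical complement of another (eq/ne, gt/le, ge/lt), which is what makes it possible to branch over a block of code by testing the opposite of the C condition.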
Loops
• Are formed by branching from the bottom of the
loop to the top
• The do loop is a post-test loop
▪ The loop body will be executed at least once
▪ Eg: C code
long int x;
x = 1;
do {
    // loop body
    x++;
} while (x <= 10);
Loops (cont’d)
Equivalent assembly code:
define(x_r, x19)
        mov     x_r, 1
top:    // statements forming loop body
        add     x_r, x_r, 1     // x++
        cmp     x_r, 10
        b.le    top             // repeat while x <= 10
Loops (cont’d)
• The while loop is a pre-test loop
▪ The loop body may not be executed at all
▪ Eg: C code
long int x;
x = 0;
while (x < 10) {
// loop body
x++;
}
Loops (cont’d)
Assembly:
define(x_r, x19)
        mov     x_r, 0
test:   cmp     x_r, 10
        b.ge    done            // skip the loop when x >= 10
        // statements forming body of loop
        add     x_r, x_r, 1     // x++
        b       test
done:
Loops (cont’d)
▪ Note: we branch over the loop body if x >= 10
• The logic operation is complemented
▪ Alternatively, the test can be moved to the end of the
loop
• Branch does not use complemented logic
▪ i.e. matches original C code
• Must branch to the test the first time through
▪ But is still a pre-test loop!
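Both layouts of the while loop can be mimicked in C with goto, which behaves like an assembly branch. This is a sketch: the function names and the `runs` counter (which stands in for the loop body) are hypothetical, and the comments show the assumed corresponding A64 instructions.

```c
#include <assert.h>

/* Layout 1: test at the top; branch over the body using the complemented condition. */
long body_runs_test_at_top(long x) {
    long runs = 0;
test:
    if (x >= 10) goto done;      /* cmp x_r, 10 ; b.ge done */
    runs++;                      /* loop body */
    x++;                         /* add x_r, x_r, 1 */
    goto test;                   /* b test */
done:
    return runs;
}

/* Layout 2: test at the bottom; entered by branching to the test first. */
long body_runs_test_at_bottom(long x) {
    long runs = 0;
    goto test;                   /* b test */
top:
    runs++;                      /* loop body */
    x++;                         /* add x_r, x_r, 1 */
test:
    if (x < 10) goto top;        /* cmp x_r, 10 ; b.lt top */
    return runs;
}

/* Usage example: both layouts run the body 10 times from x = 0, and never from x = 10. */
void demo_while_layouts(void) {
    assert(body_runs_test_at_top(0) == 10);
    assert(body_runs_test_at_bottom(0) == 10);
    assert(body_runs_test_at_top(10) == 0);
    assert(body_runs_test_at_bottom(10) == 0);
}
```

The second layout executes one branch per iteration instead of two, at the cost of one extra unconditional branch on entry.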
Loops (cont’d)
Assembly:
define(x_r, x19)
        mov     x_r, 0
        b       test            // branch to the test the first time through
top:    // statements forming body of loop
        add     x_r, x_r, 1     // x++
test:   cmp     x_r, 10
        b.lt    top             // repeat while x < 10
Loops (cont’d)
• A for loop can be formed by first converting it to
the equivalent while loop
▪ Eg: C code
for (i = 10; i < 20; i++)
    x += i;
▪ Equivalent while loop:
i = 10;
while (i < 20) {
    x += i;
    i++;
}
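The equivalence of the two forms can be checked with a small C sketch. The function names are hypothetical, and x is assumed to start at 0, which the fragment above does not state.

```c
#include <assert.h>

/* The for loop from the example, with x assumed to start at 0. */
long sum_with_for(void) {
    long x = 0;
    for (long i = 10; i < 20; i++)
        x += i;
    return x;
}

/* The same computation after conversion to a while loop. */
long sum_with_while(void) {
    long x = 0;
    long i = 10;                 /* initialization moves before the loop */
    while (i < 20) {             /* the test stays at the top */
        x += i;
        i++;                     /* the update moves to the end of the body */
    }
    return x;
}

/* Usage example: both forms add 10 + 11 + ... + 19 = 145. */
void demo_for_while(void) {
    assert(sum_with_for() == 145);
    assert(sum_with_while() == 145);
}
```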
Loops (cont’d)
Assembly:
define(i_r, x19)
define(x_r, x20)
        mov     i_r, 10
        b       test
top:    add     x_r, x_r, i_r   // x += i
        add     i_r, i_r, 1     // i++
test:   cmp     i_r, 20
        b.lt    top             // repeat while i < 20
The if Construct
• Is formed by branching over the statement body if
the condition is not true
▪ Must use the logical complement:
b.lt <---> b.ge
b.le <---> b.gt
b.eq <---> b.ne
▪ Eg: C code
if (a > b) {
c = a + b;
d = c + 5;
}
The if Construct (cont’d)
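Assembly:
define(a_r, x19)
define(b_r, x20)
define(c_r, x21)
define(d_r, x22)
...
        cmp     a_r, b_r
        b.le    next            // skip the body when a <= b
        add     c_r, a_r, b_r   // c = a + b
        add     d_r, c_r, 5     // d = c + 5
next:   // statement after the if construct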
The if-else Construct
• Is formed by branching to the else part if the
condition is not true
▪ Must use the logical complement
▪ If true, the code falls through to the if part
• The if part ends with an unconditional branch to the statement
after the construct
The if-else Construct (cont’d)
C code:
if (a > b) {
c = a + b;
d = c + 5;
} else {
c = a - b;
d = c - 5;
}
Assembly:
define(a_r, x19)
define(b_r, x20)
define(c_r, x21)
define(d_r, x22)
The if-else Construct (cont’d)
        cmp     a_r, b_r
        b.le    else            // go to the else part when a <= b
        add     c_r, a_r, b_r   // c = a + b
        add     d_r, c_r, 5     // d = c + 5
        b       next
else:   sub     c_r, a_r, b_r   // c = a - b
        sub     d_r, c_r, 5     // d = c - 5
next:   // statement after the if-else construct
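The branching pattern of the if-else construct can be traced in C by replacing the structured statement with goto. This is a sketch: the `result` struct and the function name are hypothetical, and the comments show the assumed matching A64 instructions.

```c
#include <assert.h>

typedef struct { long c, d; } result;

/* if (a > b) {...} else {...} rewritten with explicit branches. */
result if_else_with_branches(long a, long b) {
    result r;
    if (a <= b) goto else_part;  /* cmp a_r, b_r ; b.le else  (complement of a > b) */
    r.c = a + b;                 /* add c_r, a_r, b_r */
    r.d = r.c + 5;               /* add d_r, c_r, 5 */
    goto next;                   /* b next : skip the else part */
else_part:
    r.c = a - b;                 /* sub c_r, a_r, b_r */
    r.d = r.c - 5;               /* sub d_r, c_r, 5 */
next:
    return r;
}

/* Usage example: one call takes the if part, the other the else part. */
void demo_if_else(void) {
    result t = if_else_with_branches(7, 3);   /* a > b */
    assert(t.c == 10 && t.d == 15);
    result e = if_else_with_branches(2, 9);   /* a <= b */
    assert(e.c == -7 && e.d == -12);
}
```

Omitting the unconditional branch at the end of the if part would let execution fall into the else part and overwrite c and d.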
GDB
Introduction to the gdb Debugger
• To start a program under debugger control, use:
gdb myprogram
• To set a breakpoint, type: b label
▪ Eg: b main
• Use r to run your program
▪ Will stop at the first breakpoint
• Use c to continue to the next breakpoint
▪ Or to the end of the program, if no other breakpoints
Introduction to the gdb Debugger
(cont’d)
• To single step through your program, use:
▪ si
• Executes the next instruction
▪ ni
• Also executes the next instruction
▪ But if a function call, proceeds until the function returns
▪ Use display/i $pc to automatically show the
current instruction when single stepping
• Do before running the program
Introduction to the gdb Debugger
(cont’d)
• Use p $reg to print the contents of a register
▪ Eg: p $x19
▪ Can append a format character:
• Signed decimal: p/d
• Hexadecimal: p/x
• Binary: p/t
• Use q to quit gdb