3 - ARMv8-A Architecture

ARMv8-A Architecture

1
Readings and Exercises
• P & H: Sections 2.1 – 2.3, 2.7
• ARMv8 Instruction Set Overview: Sections:
▪ 2
▪ 3.1 – 3.2
▪ 4.1- 4.4
▪ 5.1 - 5.2, 5.4.1, 5.5.1, 5.6

2
Objectives
At the end of the section, you will
1. Understand the ARMv8 register file
2. Understand basic ARMv8 assembly instructions
3. Be able to write ARMv8 programs with
branching and looping
4. Be able to work with gdb

3
ARMV8 OVERVIEW

4
Introduction
• This course uses the Applied Micro X-Gene
X-C1 servers
▪ Its CPU is the APM883208-X1 X-Gene Multi-Core
64-bit processor
• Is an implementation of the ARMv8-A specification,
licensed from ARM Holdings, PLC
▪ ARM: Advanced RISC Machine (originally Acorn RISC Machine)
▪ The installed operating system (OS) is Linux
Fedora 26
• Includes up-to-date versions of gcc, as, gdb, and m4
5
Introduction (cont’d)
▪ The 3 servers can be accessed remotely using ssh
(secure shell) with these addresses:
• csa1.cpsc.ucalgary.ca
• csa2.cpsc.ucalgary.ca
• csa3.cpsc.ucalgary.ca
• Or through the load balancer: arm.cpsc.ucalgary.ca
• Use your CPSC credentials to log in

6
Introduction (cont’d)
• The ARMv8-A architecture:
▪ Is a RISC
▪ Is a Load/Store machine
• Register file contains 31 64-bit-wide registers
• Most instructions manipulate 64-bit or 32-bit data stored in
these registers
▪ Uses a von Neumann architecture for RAM

7
Introduction (cont’d)
• The ARMv8-A architecture has two execution
states:
▪ AArch64
• Uses the A64 instruction set and 64-bit registers
• Used exclusively in this course
▪ AArch32
• Uses the A32 or T32 instruction sets
▪ Provided for compatibility with the older ARM and THUMB
instruction sets using 32-bit registers
• Not used in this course
8
ARMv8 Exception Levels

EL0 Application Application Application Application

EL1 OS Kernel OS Kernel

EL2 Hypervisor

EL3 Secure Monitor

9
Introduction (cont’d)
• The ARMv8-A architecture has 4 exception
levels:
▪ EL0: for normal user applications with limited
privileges
• Restricted access to a limited set of instructions and
registers, and to certain parts of memory
• Most programs work at this level

10
Introduction (cont’d)
▪ EL1: for the OS kernel
• Privileged access to instructions, registers, and memory
• Accessed indirectly by user programs using system calls
▪ EL2: for a Hypervisor
• Supports virtualization, where the computer hosts multiple
guest operating systems, each on its own virtual machine
▪ EL3: Low-level firmware
• Includes the Secure Monitor

11
ARMv8 Registers

Screenshot from: https://developer.arm.com/documentation/den0024/a/The-ABI-for-ARM-64-bit-Architecture/Register-use-in-the-AArch64-Procedure-Call-Standard/Parameters-in-general-purpose-registers

12
ARMv8 Registers
• AArch64 has 31 64-bit-wide general-purpose
registers
▪ Numbered from 0 to 30
▪ When using all 64 bits, use “x” or “X” before the
number (stands for extended)
• Eg: x0, x1, x30
▪ When using only the low-order 32 bits of the register,
use “w” or “W” (word)
• Eg: w2, w29
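
The relationship between the x and w names can be modelled in a few lines of C. This is only a sketch: the gp_reg type and the read_w/write_w helpers are hypothetical names, not part of any ARM tool. It relies on the A64 rule that writing a w register clears the upper 32 bits of the corresponding x register.

```c
#include <stdint.h>

/* Hypothetical model of one general-purpose register (xN). */
typedef struct { uint64_t x; } gp_reg;

/* Reading wN yields the low-order 32 bits of xN. */
uint32_t read_w(const gp_reg *r)
{
    return (uint32_t)r->x;
}

/* Writing wN stores the 32-bit value and clears the upper 32 bits of xN. */
void write_w(gp_reg *r, uint32_t value)
{
    r->x = (uint64_t)value;
}
```

For example, if x2 holds 0x1122334455667788, then w2 reads as 0x55667788.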

13
ARMv8 Registers (cont’d)
• Many of these registers have special uses:
▪ x0 – x7: used to pass arguments into a procedure, and
return results
▪ x8: indirect result location register
▪ x9 – x15: temporary registers
▪ x16, x17: intra-procedure-call temporary registers
(IP0, IP1)
▪ x18: platform register

14
ARMv8 Registers (cont’d)
▪ x29: frame pointer (FP) register
▪ x30: procedure link register (LR)
• For now, use registers x19 – x28 for most of your
work
▪ Are callee-saved registers
• Value is preserved by any function you call

15
ARMv8 Registers (cont’d)
• There are several special-purpose registers:
▪ Stack Pointer:
• SP: 64-bit register used in A64 code
• WSP: the low-order 32 bits of SP (32-bit view, also in A64 code)
• Used to point to the top of the run-time stack
▪ Zero Register:
• XZR: 64 bits wide
• WZR: 32 bits wide
• Gives 0 value when read from
• Discards value when written to
16
ARMv8 Registers (cont’d)
▪ Program Counter
• PC: 64 bits wide
• Holds the address of the currently executing instruction
• Cannot be accessed directly as a named register
▪ Is changed indirectly by branch and other instructions
▪ Is used implicitly by PC-relative loads and address generation (eg: adr, adrp)
• Can be accessed in gdb with $pc

17
ARMv8 Registers (cont’d)
• There are 32 128-bit-wide floating-point registers
▪ Discussed in detail later
• There are numerous system registers
▪ Most are accessible only at EL1 or higher
• Used in OS kernel code

18
BASIC ARMV8 ASSEMBLY

19
A64 Assembly Language
• Consists of statements with one opcode and 0 to 4
operands
▪ The allowed operands depend on the particular
instruction
▪ In general:
• The first operand is a destination register
• The others are source registers
• Eg: add x19, x20, x21      // x19: destination, x20: source1, x21: source2

20
A64 Assembly Language (cont’d)
• An immediate value (a constant) may be used as
the final source operand for some instructions
▪ Eg: add x19, x20, 42       // 42 is the immediate
▪ A # symbol can prefix the immediate, but is optional
when using gcc
• Eg: add x19, x20, #42
▪ The allowable range of constants depends on the
particular instruction
• Depends on the number of available bits within the
machine instruction
21
A64 Assembly Language (cont’d)
• Immediates are assumed to be decimal numbers
unless prefixed as follows:
▪ Hexadecimal: 0x
• Eg: 0x6f
▪ Octal: 0
• Eg: 0777
▪ Binary: 0b
• Eg: 0b101
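
To check what value a prefixed immediate stands for, the same rules can be sketched in C. The helper name parse_imm is hypothetical, and this is not how the assembler itself is implemented. It uses strtol with base 0, which already understands the 0x and leading-0 prefixes, and handles 0b separately.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse an immediate written with the prefixes above. */
long parse_imm(const char *text)
{
    int negative = (text[0] == '-');
    if (negative || text[0] == '+')
        text++;                              /* skip the sign */

    long value;
    if (strncmp(text, "0b", 2) == 0 || strncmp(text, "0B", 2) == 0)
        value = strtol(text + 2, NULL, 2);   /* binary digits after 0b */
    else
        value = strtol(text, NULL, 0);       /* base 0: 0x is hex, leading 0 is octal */

    return negative ? -value : value;
}
```

With this sketch, "0x6f", "0777" and "0b101" parse to 111, 511 and 5 respectively.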

22
A64 Assembly Language (cont’d)
• Some instructions are aliases for other
instructions
▪ Eg: mov x29, sp
is an alias for
      add x29, sp, 0
▪ Are provided for readability and programmer
convenience

23
A64 Assembly Language (cont’d)
• Some commonly-used instructions are:
▪ Move immediate (32-bit)
• Form: mov Wd, #imm32
▪ Wd: destination register
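▪ #imm32: integer in range -2^31 to +2^32 - 1
▪ Only values that can be encoded in a single movz, movn, or orr instruction are accepted by the assembler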
• Eg: mov w20, -237

24
A64 Assembly Language (cont’d)
▪ Move immediate (64-bit)
• Form: mov Xd, #imm64
▪ Xd: destination register
▪ #imm64: integer in range -2^63 to +2^64 - 1
• Eg: mov x21, 0xFFFE

25
A64 Assembly Language (cont’d)
▪ Move register (32-bit)
• Form: mov Wd, Wm
▪ Wd: destination register
▪ Wm: source register
▪ Alias for: orr Wd, wzr, Wm
• Eg: mov w21, w28
▪ Move register (64-bit)
• Similar form to above
• Eg: mov x22, x20

26
A64 Assembly Language (cont’d)
• A function can be called using the Branch and
Link instruction (bl)
▪ Can be a library function or your own function
▪ Form: bl label
▪ Eg: bl printf
▪ Arguments are put into x0 – x7 before the call
▪ Return value is in x0

27
Basic Program Structure
• The main routine of a program can be structured
as follows:
        .global main
main:   stp     x29, x30, [sp, -16]!    // saves state
        mov     x29, sp

        .
        .                               // your custom code goes here
        .

        ldp     x29, x30, [sp], 16      // restores state
        ret

28
Basic Program Structure (cont’d)
• .global main
▪ Makes the label “main” visible to the linker
▪ The main routine is where execution always starts
• …[sp, -16]!
▪ Allocates 16 bytes in stack memory (in RAM)
▪ Does so by pre-incrementing the SP register by -16

29
Basic Program Structure (cont’d)
• stp x29, x30, …
▪ Stores the contents of the pair of registers to the stack
• x29: frame pointer (FP)
• x30: link register (LR)
• SP points to the location in RAM where we write to
▪ Saves the state of the registers used by calling code
• mov x29, sp
▪ Updates FP to the current SP
▪ FP may be used as a base address in the routine
30
Basic Program Structure (cont’d)
• ldp x29, x30, …
▪ Loads the pair of registers from RAM
• SP points to the location in RAM where we read from
▪ Restores the state of the FP and LR registers
• …[sp], 16
▪ Deallocates 16 bytes of stack memory
▪ Does so by post-incrementing SP by +16

31
Basic Program Structure (cont’d)
• ret
▪ Returns control to calling code (in OS)
▪ Uses the address in LR
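
The effect of the stp/ldp pair on SP, FP and LR can be traced with a small C model. This is a sketch under simplifying assumptions: the machine struct, STACK_WORDS, push_frame and pop_frame are hypothetical names, and the stack pointer is an index into an array of 64-bit words rather than a byte address.

```c
#include <stdint.h>

#define STACK_WORDS 64            /* hypothetical stack size, in 64-bit words */

typedef struct {
    uint64_t mem[STACK_WORDS];    /* simulated stack memory */
    uint64_t sp;                  /* index into mem, standing in for SP */
    uint64_t fp, lr;              /* stand-ins for x29 and x30 */
} machine;

/* stp x29, x30, [sp, -16]!  (pre-index: SP is adjusted before the store) */
void push_frame(machine *m)
{
    m->sp -= 2;                   /* 16 bytes = two 64-bit words */
    m->mem[m->sp]     = m->fp;
    m->mem[m->sp + 1] = m->lr;
    m->fp = m->sp;                /* mov x29, sp */
}

/* ldp x29, x30, [sp], 16  (post-index: SP is adjusted after the load) */
void pop_frame(machine *m)
{
    m->fp = m->mem[m->sp];
    m->lr = m->mem[m->sp + 1];
    m->sp += 2;
}
```

Calling push_frame and then pop_frame leaves sp, fp and lr at their starting values, even if lr was overwritten in between (as bl does on a nested call).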

32
Basic Arithmetic Instructions
• Addition
▪ Uses 1 destination and 2 source operands
▪ Register (64-bit and 32-bit):
• Eg: add x19, x20, x21 // x19 = x20 + x21
• Eg: add w19, w19, w20 // w19 = w19 + w20
▪ Immediate (64-bit and 32-bit):
• Eg: add x20, x20, 1 // x20 = x20 + 1
• Eg: add w27, w19, 4 // w27 = w19 + 4

33
Basic Arithmetic Instructions (cont’d)
• Subtraction
▪ Uses 1 destination and 2 source operands
▪ Register (64-bit and 32-bit):
• Eg: sub x0, x1, x2 // x0 = x1 - x2
• Eg: sub w3, w6, w7 // w3 = w6 - w7
▪ Immediate (64-bit and 32-bit):
• Eg: sub x20, x20, 1 // x20 = x20 - 1
• Eg: sub w27, w19, 4 // w27 = w19 - 4

34
Basic Arithmetic Instructions (cont’d)
• Multiplication
▪ Uses 1 destination and 2 or 3 source registers
▪ No immediates allowed
▪ Form (32-bit): mul Wd, Wn, Wm
• Calculates: Wd = Wn × Wm
• Alias for: madd Wd, Wn, Wm, wzr
• Eg: mul w0, w1, w2
▪ The 64-bit form is similar
• Eg: mul x19, x20, x20 // square number
35
Basic Arithmetic Instructions (cont’d)
▪ Multiply-Add
• Form (32-bit): madd Wd, Wn, Wm, Wa
▪ Calculates: Wd = Wa + (Wn × Wm)
• Eg: madd w20, w21, w22, w23
• 64-bit form is similar
▪ Eg: madd x20, x0, x1, x20
▪ Multiply-Subtract
• Form (32-bit): msub Wd, Wn, Wm, Wa
▪ Calculates: Wd = Wa - (Wn × Wm)

36
Basic Arithmetic Instructions (cont’d)
▪ Multiply-Negate
• Form (32-bit): mneg Wd, Wn, Wm
▪ Calculates: Wd = -(Wn × Wm)
▪ Other variants are possible
• See ARM documentation
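
The 32-bit multiply family can be summarised as C functions. This is a sketch only, and the names madd32, msub32, mul32 and mneg32 are hypothetical. Unsigned types are used so that results wrap modulo 2^32, as a 32-bit destination register does.

```c
#include <stdint.h>

/* madd Wd, Wn, Wm, Wa : Wd = Wa + (Wn * Wm) */
uint32_t madd32(uint32_t n, uint32_t m, uint32_t a) { return a + n * m; }

/* msub Wd, Wn, Wm, Wa : Wd = Wa - (Wn * Wm) */
uint32_t msub32(uint32_t n, uint32_t m, uint32_t a) { return a - n * m; }

/* mul is madd with wzr (zero) as the addend. */
uint32_t mul32(uint32_t n, uint32_t m) { return madd32(n, m, 0); }

/* mneg is msub with wzr (zero) as the minuend. */
uint32_t mneg32(uint32_t n, uint32_t m) { return msub32(n, m, 0); }
```

Casting the result to int32_t gives the signed interpretation of the same bit pattern.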

37
Basic Arithmetic Instructions (cont’d)
• Division
▪ Uses 1 destination and 2 source registers
▪ No immediates allowed
▪ Signed form (32-bit): sdiv Wd, Wn, Wm
• Operands are signed integers
• Calculates: Wd = Wn ÷ Wm
• Eg: sdiv w0, w1, w2
▪ 64-bit form is similar
• Eg: sdiv x21, x22, x23
38
Basic Arithmetic Instructions (cont’d)
▪ The udiv variants use unsigned integer operands
• Eg: udiv w0, w1, w2
• Eg: udiv x21, x22, x23
▪ These instructions do integer division
• The calculated quotient is an integer, and any remainder is
discarded
▪ Eg: 14 / 3 is 4, with a remainder of 2
• The remainder (or modulus) can be calculated using
numerator – (quotient × denominator)
▪ The msub instruction is useful here

39
Basic Arithmetic Instructions (cont’d)
▪ Dividing by 0 does not generate an exception (a trap)
• Writes 0 to the destination register
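
The division rules above can be sketched in C. The function names sdiv32, udiv32 and srem32 are hypothetical. Plain C division by zero is undefined, so the sketch checks for it explicitly. It also special-cases INT32_MIN / -1, which is undefined in C; on A64 the result is INT32_MIN.

```c
#include <stdint.h>

/* sdiv Wd, Wn, Wm */
int32_t sdiv32(int32_t n, int32_t m)
{
    if (m == 0)
        return 0;                      /* no trap: the result is 0 */
    if (n == INT32_MIN && m == -1)
        return INT32_MIN;              /* overflow case; undefined in plain C */
    return n / m;                      /* truncates toward zero */
}

/* udiv Wd, Wn, Wm */
uint32_t udiv32(uint32_t n, uint32_t m)
{
    return m == 0 ? 0 : n / m;
}

/* Remainder as numerator - (quotient * denominator): sdiv followed by msub. */
int32_t srem32(int32_t n, int32_t m)
{
    int32_t q = sdiv32(n, m);
    return (int32_t)((uint32_t)n - (uint32_t)q * (uint32_t)m);
}
```

For 14 and 3 this gives a quotient of 4 and a remainder of 2, matching the example above.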

40
Printing to Standard Output
• Is done by calling printf()
▪ Is a standard function in the C library
▪ Invoked with 1 or more arguments
• The first is the format string (usually a literal)
• The rest correspond to placeholders in the string
▪ Example C code:
    . . .
    int x = 42;
    printf("Meaning of life = %d\n", x);    // %d is the int placeholder
    . . .

41
Printing to Standard Output (cont’d)
▪ Equivalent assembly code:
fmt:    .string "Meaning of life = %d\n"    // creates the format string

        .balign 4                           // ensures instructions are properly aligned
        .global main
main:   . . .
        adrp    x0, fmt                     // Arg 1: address of the string
        add     x0, x0, :lo12:fmt
        mov     w1, 42                      // Arg 2: int value
        bl      printf                      // function call
        . . .

42
Alternative Way to call printf()
output0: .string "Enter N:"
….
ldr x0, =output0
bl printf

43
Calling scanf("%d", &n)
input0: .string "%d"
        ….
        ldr     x0, =input0
        ldr     x1, =n
        bl      scanf

Load the address of n into a register:
        ldr     x14, =n
Load the value of n (a 4-byte int, so use a w register):
        ldr     w19, [x14]

n must be declared:
        .bss
n:      .skip   4
Or
        .data
n:      .word   0
More on these sections later


44
BRANCHING AND LOOPS

45
Branch Instructions and Condition
Codes
• A branch instruction transfers control to another
part of a program
▪ Like a goto in the C language
▪ PC register is not incremented as usual, but is set to
the computed address of an instruction
• Corresponds to the value of its label
• An unconditional branch is always taken
▪ Form: b label
▪ Eg: b top
46
Branch Instructions and Condition
Codes (cont’d)
• Condition flags may be used to store information
about the result of an instruction
▪ Are single-bit units in the CPU
• Record process state (PSTATE) information
• 0 means false, 1 means true
▪ There are 4 flags:
• Z: true if result is zero
• N: true if result is negative
• V: true if result overflows
• C: true if result generates a carry out
47
Branch Instructions and Condition
Codes (cont’d)
• Condition flags are set by instructions that end in
“s” (short for set flags)
▪ Eg: subs, adds
▪ subs may be used to compare two registers
• Eg: subs x0, x1, x2
▪ But cmp is more intuitive:
• Form (64-bit): cmp Xn, Xm
• Is an alias for: subs xzr, Xn, Xm
• Eg: cmp x1, x2

48
Branch Instructions and Condition
Codes (cont’d)
• Conditional branch instructions use the condition
flags to make a decision
▪ If particular flags test true, then the branch is taken
• i.e. one “jumps” to the instruction at the specified label
▪ Otherwise, control “drops through” to the following
instruction
▪ Eg: b.eq top
• Branches if Z is true

49
Branch Instructions and Condition
Codes (cont’d)
▪ Form: b.cc label
• Where cc is a condition code
▪ The condition codes for signed integers are:

Name  Meaning                 C equivalent  Flags                   cmp a, b
eq    equal                   ==            Z == 1                  a == b
ne    not equal               !=            Z == 0                  a != b
gt    greater than            >             Z == 0 && N == V        a > b
ge    greater than or equal   >=            N == V                  a >= b
lt    less than               <             N != V                  a < b
le    less than or equal      <=            !(Z == 0 && N == V)     a <= b
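
The flag tests in the table above can be checked with a small C model of cmp. This is a sketch, not a description of the hardware: the flags struct, cmp64 and the cond_* functions are hypothetical names. The subtraction is done on unsigned values so that it wraps the way a 64-bit register does.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } flags;

/* Flags produced by cmp a, b (that is, subs xzr, a, b). */
flags cmp64(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t diff = ua - ub;                 /* wraps modulo 2^64 */
    flags f;
    f.n = (diff >> 63) != 0;                 /* sign bit of the result */
    f.z = (diff == 0);
    f.c = (ua >= ub);                        /* carry set means no borrow */
    /* Signed overflow: operands differ in sign and the result's sign differs from a's. */
    f.v = (((ua ^ ub) & (ua ^ diff)) >> 63) != 0;
    return f;
}

bool cond_eq(flags f) { return f.z; }
bool cond_ne(flags f) { return !f.z; }
bool cond_gt(flags f) { return !f.z && f.n == f.v; }
bool cond_ge(flags f) { return f.n == f.v; }
bool cond_lt(flags f) { return f.n != f.v; }
bool cond_le(flags f) { return !(!f.z && f.n == f.v); }
```

Comparing N with V rather than testing N alone is what keeps the signed conditions correct when the subtraction overflows, for example when a is the most negative 64-bit value and b is 1.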

50
Loops
• Are formed by branching from the bottom of the
loop to the top
• The do loop is a post-test loop
▪ The loop body will be executed at least once
▪ Eg: C code
    long int x;

    x = 1;
    do {
        // loop body

        x++;
    } while (x <= 10);

51
Loops (cont’d)
Equivalent assembly code:
        define(x_r, x19)

        mov     x_r, 1
top:    statements forming loop body

        add     x_r, x_r, 1
        cmp     x_r, 10
        b.le    top

52
Loops (cont’d)
• The while loop is a pre-test loop
▪ The loop body may not be executed at all
▪ Eg: C code
long int x;

x = 0;
while (x < 10) {
// loop body

x++;
}

53
Loops (cont’d)
C code:
    long int x;

    x = 0;
    while (x < 10) {
        // loop body

        x++;
    }

Assembly:
        define(x_r, x19)

        mov     x_r, 0
test:   cmp     x_r, 10
        b.ge    done

        statements forming body of loop

        add     x_r, x_r, 1
        b       test

done:   statement following loop

54
Loops (cont’d)
▪ Note: we branch over the loop body if x >= 10
• The logic operation is complemented
▪ Alternatively, the test can be moved to the end of the
loop
• Branch does not use complemented logic
▪ i.e. matches original C code
• Must branch to the test the first time through
▪ But is still a pre-test loop!

55
Loops (cont’d)
C code:
    long int x;

    x = 0;
    while (x < 10) {
        // loop body

        x++;
    }

Assembly:
        define(x_r, x19)

        mov     x_r, 0
        b       test
top:    statements forming body of loop

        add     x_r, x_r, 1

test:   cmp     x_r, 10
        b.lt    top

        statement following loop
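
The test-at-the-bottom form can be mirrored in C with goto, which makes the control flow of the assembly explicit. This is a sketch: the function name count_iterations and the body_runs counter are hypothetical, and the counter stands in for the loop body.

```c
/* Returns how many times the body runs for: while (x < limit) { body; x++; } */
long count_iterations(long start, long limit)
{
    long x = start;          /* mov  x_r, start */
    long body_runs = 0;

    goto test;               /* b    test */
top:
    body_runs++;             /* statements forming body of loop */
    x++;                     /* add  x_r, x_r, 1 */
test:
    if (x < limit)           /* cmp  x_r, limit */
        goto top;            /* b.lt top */

    return body_runs;        /* statement following loop */
}
```

Starting at 0 with a limit of 10 runs the body 10 times; starting at 10 runs it zero times, which shows that the loop is still pre-test.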

56
Loops (cont’d)
• A for loop can be formed by first converting it to
the equivalent while loop
▪ Eg: C code
for (i = 10; i < 20; i++)
x += i;

is the same as:

i = 10;
while (i < 20) {
x += i;
i++;
}
57
Loops (cont’d)
C code:
    i = 10;
    while (i < 20) {
        x += i;
        i++;
    }

Assembly:
        define(i_r, x19)
        define(x_r, x20)

        mov     i_r, 10             // initialization
        b       test

top:    add     x_r, x_r, i_r       // loop body
        add     i_r, i_r, 1         // increment

test:   cmp     i_r, 20             // test
        b.lt    top

58
The if Construct
• Is formed by branching over the statement body if
the condition is not true
▪ Must use the logical complement:
b.lt <---> b.ge
b.le <---> b.gt
b.eq <---> b.ne

▪ Eg: C code
if (a > b) {
c = a + b;
d = c + 5;
}
59
The if Construct (cont’d)
C code:
    if (a > b) {
        c = a + b;
        d = c + 5;
    }

Assembly:
        define(a_r, x19)
        define(b_r, x20)
        define(c_r, x21)
        define(d_r, x22)

        ...

        cmp     a_r, b_r            // test
        b.le    next                // logical complement

        add     c_r, a_r, b_r       // body
        add     d_r, c_r, 5

next:   statement after if-construct

60
The if-else Construct
• Is formed by branching to the else part if the
condition is not true
▪ Must use the logical complement
▪ If true, the code falls through to the if part
• Has an unconditional branch to the statement after the
construct

61
The if-else Construct (cont’d)
C code:
if (a > b) {
c = a + b;
d = c + 5;
} else {
c = a - b;
d = c - 5;
}

Assembly:
define(a_r, x19)
define(b_r, x20)
define(c_r, x21)
define(d_r, x22)

62
The if-else Construct (cont’d)
C code:
    if (a > b) {
        c = a + b;
        d = c + 5;
    } else {
        c = a - b;
        d = c - 5;
    }

Assembly:
        cmp     a_r, b_r
        b.le    else

        add     c_r, a_r, b_r
        add     d_r, c_r, 5
        b       next

else:   sub     c_r, a_r, b_r
        sub     d_r, c_r, 5

next:   statement after if-else construct
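
The same branch structure can be written in C with goto, one statement per assembly instruction. This is a sketch: the result struct and the if_else function are hypothetical names, and else_part replaces the label else because else is a C keyword.

```c
typedef struct { long c, d; } result;

result if_else(long a, long b)
{
    result r;

    if (a <= b)              /* cmp a_r, b_r ; b.le else (logical complement of a > b) */
        goto else_part;

    r.c = a + b;             /* add c_r, a_r, b_r */
    r.d = r.c + 5;           /* add d_r, c_r, 5 */
    goto next;               /* b next: skip the else part */

else_part:
    r.c = a - b;             /* sub c_r, a_r, b_r */
    r.d = r.c - 5;           /* sub d_r, c_r, 5 */

next:
    return r;                /* statement after if-else construct */
}
```

Without the unconditional goto next, execution would fall through from the if part into the else part.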

63
GDB

64
Introduction to the gdb Debugger
• To start a program under debugger control, use:
gdb myprogram
• To set a breakpoint, type: b label
▪ Eg: b main
• Use r to run your program
▪ Will stop at the first breakpoint
• Use c to continue to the next breakpoint
▪ Or to the end of the program, if no other breakpoints
65
Introduction to the gdb Debugger
(cont’d)
• To single step through your program, use:
▪ si
• Executes the next instruction
▪ ni
• Also executes the next instruction
▪ But if a function call, proceeds until the function returns
▪ Use display/i $pc to automatically show the
current instruction when single stepping
• Do before running the program

66
Introduction to the gdb Debugger
(cont’d)
• Use p $reg to print the contents of a register
▪ Eg: p $x19
▪ Can append a format character:
• Signed decimal: p/d
• Hexadecimal: p/x
• Binary: p/t
• Use q to quit gdb
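
The p/t format shows the same register value in base 2. As a rough illustration of that conversion, here is a C sketch. The helper name format_binary is hypothetical and this is not gdb's own code. It writes the binary digits without leading zeros.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write value in binary, without leading zeros. */
void format_binary(uint64_t value, char *out, size_t out_size)
{
    char digits[65];
    int count = 0;

    /* Collect digits from least significant to most significant. */
    do {
        digits[count++] = (char)('0' + (value & 1));
        value >>= 1;
    } while (value != 0);

    /* Copy them out in reverse order, leaving room for the terminator. */
    size_t i = 0;
    while (count > 0 && i + 1 < out_size)
        out[i++] = digits[--count];
    if (out_size > 0)
        out[i] = '\0';
}
```

For the value 42, the decimal, hexadecimal and binary forms are 42, 0x2a and 101010.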

67
