Assembly Programming

The document provides an introduction to Armv8-M assembly programming, covering key topics such as data processing instructions, load/store instructions, and flow control. It emphasizes the importance of understanding assembly language for specific programming scenarios, debugging, and optimizing code. Additionally, it outlines the structure of assembly instructions, the Unified Assembler Language (UAL), and various instruction types and their functionalities.

Uploaded by

jatinsaini415
Armv8-M Mainline

Assembly Programming

© 2022 Arm Course 2 Armv8-M Architecture Fundamentals


Agenda
Introduction
Data Processing Instructions
Load/Store Instructions
Flow Control
Miscellaneous
Arm Custom Instructions

2 1191 rev 35008


Learning objectives - Introduction
After completing this section, you will be able to:
• Discuss the situations in which assembly language programming is required

• Describe the syntax of assembly language instructions in the T32 instruction set



Before we start...
This module provides an introduction to Armv8-M assembly language
• This module is NOT a complete summary of available instructions

For further information on the Armv8-M instruction set, see


• The Armv8-M Architecture Reference Manual
• The Arm Compiler toolchain – armasm User Guide

Arm Compiler 6 supports Armv8-M and provides two assemblers:


• armasm – legacy assembler used in Arm Compiler 5
• armclang – assembler and C/C++ compiler

Terminology – “Arm” can mean a number of different things, for example


• The company
• The processor core
• The general architecture



Why do you need to know assembler?
Armv8-M processors can be programmed using C
• It is not usually necessary to use assembly code
• Exception model automatically saves the thread context

However, assembler is sometimes needed


• Validation work, or testing corner-case behavior
• Reset handler, operating systems, device drivers or other hand-optimized critical code
• Very helpful to understand the instruction set while debugging

Some features of the Arm architecture may not be available through the compiler
• CMSIS provides intrinsics to access certain instructions
• Example: __WFI(); (implements WFI instruction)

Some compiled code can be optimized by coding in assembly



Instruction set basics
The Arm Architecture is a Load/Store architecture
• Data must be loaded from memory into the CPU, modified, then written back out
• No direct manipulation of memory contents

Instructions consist of
• Opcode, destination register, first source operand, optional second source operand
OPCODE{<qualifier>}{<cond>} Rd, Rn, {Rm}

T32 Instruction Set


• Mix of 16-bit and 32-bit instructions
• Superset of the traditional 16-bit Thumb instruction set
• Optimized for code density from C code
• Baseline architecture’s instruction set is derived from the Armv6-M Thumb instruction set
– Limited use of registers, constants and conditional execution
• Main Extension is derived from the Armv7-M architecture and adds more instructions



Unified Assembler Language (UAL)
UAL gives the ability to write assembler code for all Arm processors
• Previously code had to be written exclusively for Arm state (not available in Armv8-M) or Thumb state
• UAL allows the execution state to be decided at assembly time
• Legacy assembler code will still assemble successfully

UAL also defines ‘pseudo’ instructions that are resolved by the Assembler
• The assembler will generate the machine code dependent upon the inline directives (e.g. .thumb) or the assembler switches (e.g. -mcpu)

Some general rules for UAL


• Use of POP, PUSH
• Relaxation of register definitions for Rd and Rn
• Requirement of S to enable flag setting

See complete definition in the Armv8-M Architecture Reference Manual



Condition codes and flags (1)
Condition codes can be used for conditional execution of assembly instructions

The codes evaluate as TRUE or FALSE based on the values of the condition flags

The condition flags are part of the Application Program Status Register (APSR)

Certain assembly instructions set the condition flags


• For some instructions that do not set the flags by default, the S qualifier can be added

APSR bit Condition flag Meaning


31 N Negative
30 Z Zero
29 C Carry
28 V Overflow



Condition codes and flags (2)
Condition code   Meaning                                      Flags
EQ               Equal                                        Z set
NE               Not equal                                    Z clear
CS or HS         Carry set, or higher or same (unsigned >=)   C set
CC or LO         Carry clear, or lower (unsigned <)           C clear
MI               Negative                                     N set
PL               Positive or zero                             N clear
VS               Overflow                                     V set
VC               No overflow                                  V clear
HI               Higher (unsigned >)                          C set AND Z clear
LS               Lower or same (unsigned <=)                  C clear OR Z set
GE               Signed >=                                    N and V the same
LT               Signed <                                     N and V differ
GT               Signed >                                     Z clear AND N and V the same
LE               Signed <=                                    Z set OR N and V differ
AL               Always (the default, normally omitted)       Any
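
The table can be read as a set of boolean tests on the N, Z, C and V flags. The following C sketch models that reading; it is an illustration only, and the names flags_t, cond_t, cmp_flags and cond_passed are assumptions made for this example rather than anything defined by the architecture or by CMSIS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four APSR condition flags. */
typedef struct { bool n, z, c, v; } flags_t;

/* Illustrative names for the condition codes in the table. */
typedef enum {
    COND_EQ, COND_NE, COND_CS, COND_CC, COND_MI, COND_PL, COND_VS, COND_VC,
    COND_HI, COND_LS, COND_GE, COND_LT, COND_GT, COND_LE, COND_AL
} cond_t;

/* Flags produced by CMP a, b: a - b is computed and the result discarded. */
flags_t cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    flags_t f;
    f.n = (r >> 31) != 0;
    f.z = (r == 0);
    f.c = (a >= b);                            /* C set means no borrow occurred */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;    /* signed overflow of a - b */
    return f;
}

/* Evaluate a condition code against the flags, row by row from the table. */
bool cond_passed(cond_t cond, flags_t f)
{
    switch (cond) {
    case COND_EQ: return f.z;
    case COND_NE: return !f.z;
    case COND_CS: return f.c;
    case COND_CC: return !f.c;
    case COND_MI: return f.n;
    case COND_PL: return !f.n;
    case COND_VS: return f.v;
    case COND_VC: return !f.v;
    case COND_HI: return f.c && !f.z;
    case COND_LS: return !f.c || f.z;
    case COND_GE: return f.n == f.v;
    case COND_LT: return f.n != f.v;
    case COND_GT: return !f.z && (f.n == f.v);
    case COND_LE: return f.z || (f.n != f.v);
    default:
        assert(cond == COND_AL);
        return true;
    }
}
```

For example, cmp_flags(0xFFFFFFFF, 1) passes HI (unsigned comparison) and also passes LT (signed comparison, because the first operand is -1 when read as signed).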



Thumb instruction encoding choice
When assembling for an Armv8-M Mainline processor there is often a choice of 16-bit and 32-bit instruction encodings
• The assembler will normally generate 16-bit instructions

Instruction width specifiers


• Allow you to determine which instruction width the assembler will use
• Can be placed immediately after instruction mnemonics:
– .W
▪ Forces a 32-bit instruction encoding
– .N
▪ Forces a 16-bit instruction encoding
• Errors raised by assembler if not possible

Disassembly rules
• One-to-one mapping is defined to ensure correct re-assembly
• .W or .N suffix used for cases when a bit pattern which doesn’t follow the above rules is disassembled



Section quiz - Introduction
What are the different components of this instruction?

ADDSEQ.W r0, r1

Answer:
• ADD: opcode
• S: set bit (update the condition flags)
• EQ: condition code
• .W: width specifier (32-bit encoding)
• r0: source operand #1 (also the destination in this two-operand form)
• r1: source operand #2




Learning objectives - Data processing instructions
After completing this section, you will be able to:
• Discuss why being able to understand assembly language instruction sequences is useful

• Compare common C language source code sequences to their corresponding T32 instruction sequences



Data processing instructions (1)
These instructions operate on the contents of registers
• They DO NOT affect memory
• Comparison instructions only set the condition code flags
• Data processing instructions set the condition code flags only if suffix ‘S’ is added, for example:
ADDS r0, r1, r2 // r0 = r1 + r2
• Conditional execution – uses the IT Instruction (discussed later)

                              arithmetic          logical             move

manipulation                  ADC  ADD  RSB       AND  BIC  EOR       MOV  MVN
(has destination register)    SBC  SUB            ORN  ORR

comparison                    CMN     CMP         TST     TEQ
(set flags only)              (ADDS)  (SUBS)      (ANDS)  (EORS)



Data processing instructions (2)
Arithmetic instruction   Operation                           Flags set?   Result saved?
ADDS r0, r1, r2          r0 = r1 + r2                        Yes, all     Yes
ADCS r0, r1              r0 = r0 + r1 + <carry_flag>         Yes, all     Yes
SUB r3, r1, r7           r3 = r1 - r7                        No           Yes
RSB r3, r3, r7           r3 = r7 - r3                        No           Yes
CMP r1, r7               r1 - r7                             Yes, all     No

Logical instruction      Operation                           Flags set?   Result saved?
ANDS r0, r1, #0xA0       r0 = r1 & 0xA0                      Only N, Z    Yes
BIC r0, r1, #0xA0        r0 = r1 with bits 5 and 7 cleared   No           Yes
ORRS r0, r1, #0xA0       r0 = r1 | 0xA0                      Only N, Z    Yes
TST r1, #0xAB            r1 & 0xAB                           Only N, Z    No


Generating data processing instructions

C source:

dest1 = op1 + op2;
dest2 = op3 - op4;
if(dest1 > dest2)
{
    dest1 = op5 & 0xAA;
}
else
{
    dest2 = op6 | 0x55;
}

Generated assembly:

    adds r0, r1, r0     // r0 = r1 + r0
    subs r2, r2, r3     // r2 = r2 - r3
    cmp r0, r2          // same as SUBS, but the result is discarded and only the APSR is affected
    ble .LBB0_1
    movs r4, #170       // r4 = 0xAA
    ands r0, r4         // r0 = r0 & r4
    b .LBB0_2
.LBB0_1:
    movs r5, #85        // r5 = 0x55
    orrs r2, r5         // r2 = r2 | r5
.LBB0_2:


Shift operations
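ASR: Arithmetic Shift Right
• Division by a power of 2, preserving the sign bit
• With flag-setting forms, the last bit shifted out of the register goes to the carry flag (CF)

int dest1;
dest1 = dest1 >> 4;
ASR r8,r8,#4

LSL: Logical Shift Left
• Multiplication by a power of 2
• Zeros are shifted in at the LSB; with flag-setting forms, the last bit shifted out goes to CF

int dest2;
dest2 = dest2 << 8;
LSL r9,r9,#8

LSR: Logical Shift Right
• Division by a power of 2
• Zeros are shifted in at the MSB; with flag-setting forms, the last bit shifted out goes to CF

unsigned int dest3;
dest3 = dest3 >> 4;
LSR r10,r10,#4

These are also available as part of the flexible second operand
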
Rotate operations
ROR: Rotate Right
• Bit rotate with wrap around from LSB to MSB

dest1 = __ROR(op1,4);
ROR r8,r4,#4

RRX: Rotate Right Extended
• Single bit rotate with wrap around from CF to MSB

dest2 = __RRX(op1);
RRX r0,r0

These are also available as part of the flexible second operand


Flexible second operand - Registers
For many instructions the second operand is flexible
• Either a register with optional shift
• Or an immediate constant

Register, with optional shift
• Shift distance can be a 5-bit unsigned integer
• Operand 1 goes directly to the ALU; operand 2 passes through the barrel shifter first

Can be used for multiplication by constant, for example:

int op1, op2, dest1, dest2;

dest1 = op1 * 3;
ADD r8, r4, r4,LSL #1     // dest1 = op1 + 2 * op1

dest2 = dest1 - op2 * 8;
SUB r9, r8, r5,LSL #3     // dest2 = dest1 - 8 * op2
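
The two multiply-by-constant examples can be checked with a small C model. This is a sketch only: times3 and sub_times8 are hypothetical helper names, and unsigned arithmetic is used so that wrap-around modulo 2^32 is well defined, as it is in the registers.

```c
#include <assert.h>
#include <stdint.h>

/* Models a register operand shifted left by n, as the barrel shifter does. */
uint32_t lsl(uint32_t x, unsigned n)
{
    assert(n < 32);
    return x << n;
}

/* Models ADD Rd, Rn, Rn, LSL #1: x + 2 * x == 3 * x (modulo 2^32). */
uint32_t times3(uint32_t x)
{
    return x + lsl(x, 1);
}

/* Models SUB Rd, Rn, Rm, LSL #3: a - 8 * b (modulo 2^32). */
uint32_t sub_times8(uint32_t a, uint32_t b)
{
    return a - lsl(b, 3);
}
```

For instance, times3(7) gives 21 and sub_times8(100, 5) gives 60, each using a single shift and a single add or subtract.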



Flexible second operand - Constants (1)
There is limited space for constants because of fixed-length instructions

Constants can be
• 8-bit number shifted left by any number of places
• In the form 0x00XY00XY, 0xXY00XY00 or 0xXYXYXYXY

Some instructions have a special immediate format


• MOVW loads a 16-bit immediate value into the lower half of the register, and clears the other half, for example:
– MOVW Rd, #imm16
• MOVT loads a 16-bit immediate value into the upper half of the register
• ADDW and SUBW can use any 12-bit unsigned constant (0 to 4095), for example:
– ADDW Rd, Rn, #imm12

Attempts to use constants which are not in the correct range will generate an assembly error
• Use the LDR pseudo-op – LDR Rn, =<constant>
• Assembler will use optimal sequence to generate constant into specified register
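
As a rough illustration of the rules above, the following C sketch tests whether a 32-bit value fits one of the listed forms: an 8-bit number shifted left, or one of the repeated-byte patterns. The function name is_flexible_constant is an assumption made for this example, and the check follows the slide's description rather than the exact encoding tables in the Architecture Reference Manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if v matches one of the constant forms listed above. */
bool is_flexible_constant(uint32_t v)
{
    uint32_t lo = v & 0xFFu;            /* candidate XY taken from byte 0 */
    uint32_t hi = (v >> 8) & 0xFFu;     /* candidate XY taken from byte 1 */

    if (v <= 0xFFu) return true;                        /* plain 8-bit value */
    if (v == (lo | (lo << 16))) return true;            /* 0x00XY00XY */
    if (v == ((hi << 8) | (hi << 24))) return true;     /* 0xXY00XY00 */
    if (v == lo * 0x01010101u) return true;             /* 0xXYXYXYXY */

    /* 8-bit number shifted left: strip trailing zero bits, then check the width. */
    while ((v & 1u) == 0u) {
        v >>= 1;
    }
    return v <= 0xFFu;
}
```

Under this model 0xFF000000 and 0x00AB00AB are accepted, while 0x101 and 0xFFFFFFF0 are rejected and would need a substitution, a MOVW/MOVT pair, or a literal pool load.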



Flexible second operand - Constants (2)
Can still encode other constants through instruction substitutions
• The assembler performs substitutions automatically

ADCS r0, r0, #0xFFFFFFF0      r0 = r0 + (-16)
• Can #0xFFFFFFF0 be encoded?   8-bit: ✘   12-bit: ✘   Pattern-match: ✘   ROR: ✘

Logical inversion of the constant gives the substituted instruction

SBCS r0, r0, #0xF             r0 = r0 - 16
• Can #0xF be encoded?          8-bit: ✔   12-bit: ✔   Pattern-match: ✘   ROR: ✔
• SBCS uses the same architectural pseudocode function as ADCS

(result, carry, overflow) = AddWithCarry(R[n], NOT(imm32), APSR.C);
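
The substitution works because both instructions reduce to the same addition. The C sketch below models AddWithCarry for 32-bit operands and shows ADCS and SBCS as thin wrappers around it. The names awc_t, add_with_carry, adcs and sbcs are illustrative assumptions, and the casts to int32_t assume the usual two's complement conversion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result triple mirroring (result, carry, overflow). */
typedef struct { uint32_t result; bool carry; bool overflow; } awc_t;

/* 32-bit model of the AddWithCarry pseudocode function. */
awc_t add_with_carry(uint32_t x, uint32_t y, bool carry_in)
{
    uint64_t usum = (uint64_t)x + (uint64_t)y + (uint64_t)carry_in;
    int64_t  ssum = (int64_t)(int32_t)x + (int64_t)(int32_t)y + (int64_t)carry_in;
    awc_t r;
    r.result   = (uint32_t)usum;
    r.carry    = (usum >> 32) != 0;                      /* unsigned carry out */
    r.overflow = ssum != (int64_t)(int32_t)r.result;     /* signed overflow */
    return r;
}

/* ADCS Rd, Rn, #imm: Rn + imm + C */
awc_t adcs(uint32_t rn, uint32_t imm, bool c)
{
    return add_with_carry(rn, imm, c);
}

/* SBCS Rd, Rn, #imm: Rn + NOT(imm) + C */
awc_t sbcs(uint32_t rn, uint32_t imm, bool c)
{
    return add_with_carry(rn, ~imm, c);
}
```

Since NOT(0xF) is 0xFFFFFFF0, adcs(r, 0xFFFFFFF0, c) and sbcs(r, 0xF, c) return identical results and flags for every r and c.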



Loading constants into registers
Compilers use an optimal sequence to load a constant into the specified register
• MOV, MVN, MOVW/MOVT, or LDR from a literal pool
• Constant determined at compile or link time

Examples:
int op1, op2, op3, op4;
...
op1 = 0x2543;
MOV r0,#0x2543
op2 = 0xFFFF43FF;
MVN r1,#0x0000bc00
op3 = 0x2F008000;
MOVW r7,#0x8000
MOVT r7,#0x2F00
op4 = 0xFFFFF5;
LDR r3,.LCPI0_0
...
.LCPI0_0
.long 16777205 @ 0xfffff5

Execute-only support: (-mexecute-only)
• Data constants loaded into a register through a pair of instructions
• Data side access to instruction memory not required

Literal pools:
• Data constants, accessed through a PC relative load
• Located near the code that uses them
• Data side access to instruction memory required



Multiply
32-bit multiplication: Rd = Rn × Rm, with optional accumulation (+/- Ra)
• MUL
• MLA
• MLS

64-bit multiplication: RdHi:RdLo = Rn × Rm, with optional accumulation (+)
• UMULL
• SMULL
• UMLAL
• SMLAL

Examples:
int op1, op2, op3, dest1, dest2;
dest1 = op1 * op2;
MUL r8,r4,r5
dest2 = dest2 + op1 * op2;
MLA r9,r4,r5,r9
dest1 = dest1 - op2 * op3;
MLS r8,r5,r6,r8

Divide
Armv8-M cores include division hardware
• Signed and unsigned divide are implemented using 32-bit instructions

int sdiv(int a, int b)
{
    return (a / b);
}

sdiv
    SDIV r0,r0,r1
    BX lr

unsigned int udiv(unsigned int c, unsigned int d)
{
    return (c / d);
}

udiv
    UDIV r0,r0,r1
    BX lr

Prior to Armv7, Arm cores contained no division hardware
• Division was typically implemented by a run-time library function


Bit manipulation instructions (1)
Allow insertion, clearing, extraction and reversal of bits within a register
Typically generated when compiling C code that manipulates bitfields

struct REG1_t
{
unsigned int bit_05 : 6;
unsigned int bit_68 : 3;
unsigned int bit_9F : 7;
};

struct REG1_t RegTemp1;

void test(void)
{
RegTemp1.bit_68 = RegTemp1.bit_9F; // UBFX r2,r1,#9,#7
// BFI r0,r2,#6,#3
}
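
The two instructions in the comments can be modelled in C as a mask-and-shift pair. This is a sketch for illustration: ubfx and bfi here are plain C functions named after the instructions, not compiler intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the bottom 'width' bits set (width in 1..32). */
uint32_t low_mask(unsigned width)
{
    assert(width >= 1 && width <= 32);
    return (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* Models UBFX Rd, Rn, #lsb, #width: extract a field and zero-extend it. */
uint32_t ubfx(uint32_t src, unsigned lsb, unsigned width)
{
    assert(lsb + width <= 32);
    return (src >> lsb) & low_mask(width);
}

/* Models BFI Rd, Rn, #lsb, #width: insert the bottom bits of src into dest. */
uint32_t bfi(uint32_t dest, uint32_t src, unsigned lsb, unsigned width)
{
    assert(lsb + width <= 32);
    uint32_t mask = low_mask(width);
    return (dest & ~(mask << lsb)) | ((src & mask) << lsb);
}
```

With the register image 0xA00 (bit_9F holding 5), ubfx(0xA00, 9, 7) yields 5, and bfi(0xA00, 5, 6, 3) yields 0xB40, which is the same word with bit_68 now also holding 5.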
Bit manipulation instructions (2)
UBFX r2, r1, #9, #7 (operands: dest, src, lsb, width)
• Extracts 7 bits from r1, starting at bit 9 (bits 15 to 9)
• Writes them to the bottom of r2 and zero-extends the result

BFI r0, r2, #6, #3 (operands: dest, src, lsb, width)
• Takes the bottom 3 bits of r2
• Inserts them into r0 starting at bit 6 (bits 8 to 6); the other bits of r0 are unchanged


Section quiz - Data processing instructions

unsigned int divideByTen(int x)
{
    return x / 10;
}

Which of the following is valid disassembly for the C function divideByTen()?

Option 1:
divideByTen:
    movw r1, #52429
    movt r1, #52428
    umull r0, r1, r0, r1
    lsrs r0, r1, #3
    bx lr

Option 2:
divideByTen:
    udiv r0, r0, #10
    bx lr
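
The multiply-based listing uses a common technique for dividing by a constant: multiply by a scaled reciprocal and keep the upper bits of the 64-bit product. The C sketch below mirrors that sequence step by step, treating the argument as unsigned to match UMULL; the function name divide_by_ten is an assumption for this example. UDIV takes its divisor in a register, so the listing with #10 would not assemble as written.

```c
#include <stdint.h>

/* Mirrors the movw/movt/umull/lsrs sequence for an unsigned argument. */
uint32_t divide_by_ten(uint32_t x)
{
    uint32_t magic = (52428u << 16) | 52429u;       /* movw/movt: 0xCCCCCCCD */
    uint64_t product = (uint64_t)x * magic;         /* umull: 64-bit product */
    uint32_t high = (uint32_t)(product >> 32);      /* upper word (RdHi) */
    return high >> 3;                               /* lsrs #3 */
}
```

Calling divide_by_ten(99) returns 9, the same value as 99 / 10 in C.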





Learning objectives - Load/store instructions
After completing this section, you will be able to:
• Discuss the memory access sizes and addressing modes of load and store instructions in the T32 instruction set



Single/double register data transfer
Use to move data between one or two registers and memory

LDR / STR       Word
LDRD / STRD     Doubleword
LDRB / STRB     Byte
LDRH / STRH     Halfword
LDRSB           Signed byte load
LDRSH           Signed halfword load

When a byte or halfword is loaded into the 32-bit register Rd, any remaining space is zero filled or sign extended

Syntax
LDR{<size>}{<cond>} Rd, <address>
STR{<size>}{<cond>} Rd, <address>

Example
short *sptr = dest_addr;    // r1
char *cptr = source_addr;   // r0
*sptr = *cptr;
LDRB r0,[r0,#0]
STRH r0,[r1,#0]


Addressing memory - Offsets
The address accessed by LDR / STR is specified by a base register with an optional offset

The offset can take the following forms
• Base register only (no offset)
LDR r0, [r1]
• Base register plus constant
LDR r0, [r1, #8]
• Base register, plus register (optionally shifted by an immediate value)
LDR r0, [r1, r2]
LDR r0, [r1, r2, LSL #2]
• A constant offset can be either added to or subtracted from the base register
LDR r0, [r1, #-8]
• In T32 the register-offset form always adds the register, optionally shifted left by 0 to 3 bits; subtracted register offsets such as [r1, -r2] are an A32 feature

Why is the offset here #4?
short *sptr = (short *)0x1002;
int op1;
op1 = *(sptr+2);
LDRSH r1,[r0,#4]
• Pointer arithmetic is scaled by the size of the pointed-to type, so sptr+2 is 2 * sizeof(short) = 4 bytes past sptr


Addressing memory - Addressing modes
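Simple
LDR R0, [R1]
• R0 is loaded from the address in R1

Offset
LDR R0, [R1, #12]
• R0 is loaded from address R1 + 12; R1 is unchanged

Pre-indexed
LDR R0, [R1, #12]!
• R1 is updated to R1 + 12, then R0 is loaded from the new address in R1

Post-indexed
LDR R0, [R1], #12
• R0 is loaded from the address in R1, then R1 is updated to R1 + 12
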
Multiple register data transfer
These instructions move data between multiple registers and memory
• <LDM|STM>{<addressing_mode>}{<cond>} Rb{!}, <register list>

Two addressing modes supported in T32 state
• Increment after (IA)
• Decrement before (DB)
• When used as LDMIA and STMDB, these modes have the common name Full Descending (FD)

Example
• LDM r10, {r0,r1,r4} ; load registers using r10 base

Memory layout relative to the base register (Rb = r10)
• IA: r0 is at the base address, with r1 and r4 at the next two higher word addresses
• DB: r4 is at the word just below the base address, with r1 and r0 at the next two lower word addresses

Stacks
Stack operations are implemented as multiple register transfers
• PUSH Store Multiple - Full Descending stack (STMFD, STMDB)
• POP Load Multiple - Full Descending stack (LDMFD, LDMIA)

Registers are stacked in order from lowest register to lowest memory location
• The order in which registers are specified has no effect

Stack should normally be kept 8-byte aligned on function entry for AAPCS compliance

PUSH {r4-r7, lr}
POP {r4-r7, pc}

[Figure: stack contents and SP before and after PUSH {r4-r7, lr} and POP {r4-r7, pc}. After the PUSH, r4 is at the lowest address (the new SP) and lr is at the highest, just below the old SP.]


Section quiz - Load/store instructions
What is the error in the following instruction sequence that performs a string copy operation?

    ldr r1, =source         // pointer to source
    ldr r2, =dest           // pointer to destination
copy:
    ldrb r0, [r1, #1]       // load character from source
    strb r0, [r2, #1]       // store character to destination
    cmp r0, #0              // the string is null-terminated
    bne copy                // copy characters until string ends
    bx lr

• Offset addressing never updates r1 or r2, so the loop keeps copying the same character: infinite loop!

    ldrb r0, [r1, #1]!      // increment pointer and load character from source
    strb r0, [r2, #1]!      // increment pointer and store character to destination

• Pre-indexed addressing updates the pointers before each access: skips first character!

    ldrb r0, [r1], #1       // load character from source, and then increment pointer
    strb r0, [r2], #1       // store character to destination, and then increment pointer

• Post-indexed addressing accesses each character and then advances the pointers, so the whole string is copied
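
The post-indexed version corresponds closely to the familiar C idiom of dereferencing and then incrementing a pointer. The sketch below shows one way to write it; copy_string is a hypothetical name, and the comments map each statement to the matching instruction.

```c
/* Copies a null-terminated string, including the terminator, and returns dest. */
char *copy_string(char *dest, const char *source)
{
    char *d = dest;
    char c;
    do {
        c = *source++;      /* ldrb r0, [r1], #1 */
        *d++ = c;           /* strb r0, [r2], #1 */
    } while (c != 0);       /* cmp r0, #0 ; bne copy */
    return dest;
}
```

Writing the loop with pre-increment (*++source) would reproduce the pre-indexed bug and skip the first character.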





Learning objectives - Flow control
After completing this section, you will be able to:
• Describe the program flow of a known T32 instruction sequence.



Flow control summary
Branch instructions vary in size and range

Instruction   Branch range (16-bit)   Branch range (32-bit)
B             +/- 2KB                 +/- 16MB
B<cond>       -256 to +254 bytes      +/- 1MB
BL                                    +/- 16MB
BX            Any (1)
BXNS          Any (2)
BLX           Any (1)
BLXNS         Any (2)
CBZ           +4 to 130 bytes
CBNZ          +4 to 130 bytes
TBB                                   +512 bytes
TBH                                   +128KB

(1) Register-based address anywhere in 4GB address space
(2) Available if Security Extension is present (must be executed from Secure state)


Branch instructions
Branch instructions have the following format
• B{<cond>} label
• Branch range depends on instruction set and width
• Assembler works out the offset of the label from the PC for the branch instruction

        B start         ; perform PC-relative branch to label “start”
        MOVS r0, r1
lab1
        ADDS r0, #1
        ..
        ..
start                   ; continue execution from here
        CMP r0, r1
        ..


Subroutines: Branch with Link
Implementing a conventional subroutine call requires two steps
• Store the return address
• Branch to the address of the required subroutine

These steps are carried out in one instruction, BL


• The return address is stored in the link register (lr/r14)
• Branch to an address (range dependent on instruction set and width)

Returning is performed by restoring the program counter (pc) from lr

void func1(void)
{
    :
    func2();
    :
}

func1
    :
    BL func2
    :

func2
    :
    :
    BX lr


Compare and Branch on Zero
Replaces a CMP followed by a Branch
• BUT does not affect condition code flags
• Can only branch forward between 4 and 130 bytes

Syntax
• CB{N}Z <Rn>, <label>
– CBZ: If Rn is equal to zero, branch to label
– CBNZ: If Rn is not equal to zero, branch to label

Using CMP followed by a branch:
    .
    CMP r0, #0
    BEQ exit
    .
exit

Using CBZ:
    .
    CBZ r0, exit
    .
    .
exit

If-Then block
Not enough bits in 16-bit or 32-bit Thumb encoding for conditional execution

So IT instruction added, along with IT bits in xPSR

Makes the next 1-4 instructions conditional

Syntax
• IT{T|E}{T|E}{T|E} <condition_code>
• Any condition code may be used
• Condition flags can change inside the block

Current “if-then status” stored in EPSR
• Conditional block may be safely interrupted
• Not recommended to branch into or out of ‘if-then’ block

Example
; if (r0 == 0)
;     r0 = *r1 + 2;
; else
;     r0 = *r2 + 4;

; if
CMP r0, #0
ITTEE EQ
; then
LDREQ r0, [r1]
ADDEQ r0, #2
; else
LDRNE r0, [r2]
ADDNE r0, #4


Section quiz - Flow control
At the end of this instruction sequence, what is the value in register r0?

array dcd 0xF0000000, 0xF0000001, 0xF0000002

lookup:
    mov r0, #2              // array index
    ldr r1, =array          // pointer to array
    cmp r0, #1
    blt L0
    beq L1
    bgt L2
L0:
    ldr r0, [r1]
    b exit
L1:
    ldr r0, [r1, #4]
    b exit
L2:
    ldr r0, [r1, #8]
    b exit
exit:
    bx lr

What C syntax could this code correspond to?

Can you think of a more optimal way of performing this operation?

array dcd 0xF0000000, 0xF0000001, 0xF0000002

lookup:
    mov r0, #2              // array index
    ldr r1, =array          // pointer to array
    ldr r0, [r1, r0, lsl #2]
    bx lr





Learning objectives - Miscellaneous
After completing this section, you will be able to:
• Identify whether an assembly language source file has been written using legacy armasm syntax or using GNU assembler syntax

• Write a T32 instruction sequence to modify an Armv8-M special-purpose register value



Example assembly file
Arm assemblers like armclang and armasm comply with UAL
• The names and use of T32 instructions are unchanged

However expressions and directives vary across different assemblers

armasm:

        AREA |.text|, CODE, READONLY
        ENTRY
abc     EQU 54
foo
        MOVS r0, #10
        MOVS r1, #abc
        ADDS r2, r0, r1     ; this is a comment
        ...
        DCD 0xAB00321A
        END

armclang:

        .section .text, "ax"
        // armclang provides no equivalent to ENTRY
        .equ abc, 54
foo:
        MOVS r0, #10
        MOVS r1, #abc
        ADDS r2, r0, r1     // this is a comment
        ...
        .word 0xAB00321A
        .end


Saturation
Saturate a value to a specified bit position (a power of 2)
• Unsigned saturation of 32-bit value: USAT{<cond>} Rd, #sat, Rm {,shift}
– #sat is specified as an immediate value in the range 0 to 31
– {shift} is optional and is limited to 5-bit LSL or ASR
– Q flag is set if saturation occurs (sticky bit)

unsigned int dac_val, dac_out;
...
dac_out = __usat(dac_val,10);
USAT r1,#10,r0

[Figure: DAC output over time, clipped to the range 0x000 to 0x3FF]

• Signed saturation of 32-bit value: SSAT Rd, #sat, Rm {,shift}
– Use C intrinsic: int __SSAT(int val, unsigned int sat)
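
To make the clamping behaviour concrete, here is a C model of both saturating instructions. It is a sketch under stated assumptions: usat and ssat are ordinary functions written for this example, not the compiler intrinsics, the optional shift is omitted, and the sticky Q flag is modelled as a bool that the functions only ever set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models USAT Rd, #sat, Rm: clamp a signed value to 0 .. 2^sat - 1. */
uint32_t usat(int32_t val, unsigned sat, bool *q)
{
    assert(sat <= 31);
    int64_t max = ((int64_t)1 << sat) - 1;
    if (val < 0)   { *q = true; return 0; }
    if (val > max) { *q = true; return (uint32_t)max; }
    return (uint32_t)val;
}

/* Models SSAT Rd, #sat, Rm: clamp to -2^(sat-1) .. 2^(sat-1) - 1. */
int32_t ssat(int32_t val, unsigned sat, bool *q)
{
    assert(sat >= 1 && sat <= 32);
    int64_t max = ((int64_t)1 << (sat - 1)) - 1;
    int64_t min = -max - 1;
    if (val > max) { *q = true; return (int32_t)max; }
    if (val < min) { *q = true; return (int32_t)min; }
    return val;
}
```

With sat set to 10, as in the DAC example, any input above 0x3FF is clamped to 0x3FF and the Q flag is set.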
Byte reversal and CLZ
Byte Reversal Instructions
• REV{cond} Rd, Rm Reverses the byte order in a word: bytes 3 2 1 0 become 0 1 2 3

dest3 = __REV(dest3);
REV r4,r4

• REV16{cond} Rd, Rm Reverses the bytes in each halfword
• REVSH{cond} Rd, Rm Reverses the bottom two bytes, and sign extends the result to 32 bits

Count Leading Zeros
• Returns number of unset bits before the most significant set bit

dest2 = __CLZ(dest3);
CLZ r1,r4

CLZ returns 10 in this case

Bit 31                                                        Bit 0
0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 1 1 0 0 1 1 1 0 1 0 0
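
Both operations are easy to express in portable C, which helps when checking what the instructions should return. The functions rev and clz below are illustrative models written for this example rather than the intrinsics themselves; the bit pattern shown above corresponds to the value 0x002C4E74.

```c
#include <stdint.h>

/* Models REV: reverse the order of the four bytes in a word. */
uint32_t rev(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Models CLZ: count zero bits above the most significant set bit (32 for zero). */
unsigned clz(uint32_t x)
{
    unsigned count = 0;
    if (x == 0) {
        return 32;
    }
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        count++;
    }
    return count;
}
```

For the value in the diagram, clz(0x002C4E74) returns 10, matching the slide.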



Accessing special-purpose registers
The MRS/MSR instructions can be used to access special-purpose registers
• MRS Rd, <reg> ; Moves from special-purpose registers to Rd
• MSR <reg>, Rm ; Moves from Rm to special-purpose registers

Able to access all internal registers


• Stack pointers (MSP, PSP)
• Status registers (IPSR, EPSR, APSR, IEPSR, IAPSR, EAPSR, PSR)
• Interrupt registers (PRIMASK, BASEPRI, BASEPRI_MAX, FAULTMASK)
• CONTROL register

Rules and restrictions


• Thread mode cannot read alternate stack pointer or IPSR – zeros will be returned
• All EPSR bits read as zero during normal execution but can be read when in halting debug mode
• The APSR can be written to by software in Handler or Thread mode
• Software is not permitted to write to the IPSR or EPSR
• BASEPRI_MAX option updates the BASEPRI mask register only when the new value increases the priority level



Power management instructions
Note that these instructions are “hint” instructions
• They may not be implemented on an actual core

Wait For Interrupt – WFI
• Puts the core into standby mode
• Woken by an interrupt or debug event
__WFI();
WFI

Wait For Event – WFE
• Same as WFI, but will also wake the core on a signalled event
__WFE();
WFE

Send Event – SEV
• Signals an event to other cores in a multi-processor system
__SEV();
SEV


No operation
No Operation – NOP
• Can be used as padding to align following instructions
• May or may not take time to execute
__NOP();
NOP


Other miscellaneous instructions
There are many more instructions available in Armv8-M, some of which are covered in other modules:
• Access permissions and security state information instructions – TT, TTA, TTT, TTAT
• Atomics, e.g., LDA and STL
• Breakpoint instruction – BKPT
• Barrier instructions – DSB, DMB and ISB
• DSP and Floating-Point instructions, e.g. UASX and VADD
• Load/store exclusive instructions, e.g., LDREX and STREX
• Security extension instructions – BXNS, BLXNS, SG
• Supervisor Call – SVC



Section quiz - Miscellaneous
What instruction sequence can be used to switch from the Main Stack Pointer (MSP) to the Process Stack Pointer (PSP) for Thread mode?

    .section .text, "ax"

    .global switch_to_psp
    .type switch_to_psp, %function

switch_to_psp:
    mrs r0, CONTROL     // read the CONTROL register into r0
    orr r0, r0, #0x2    // set CONTROL.SPSEL to select the Process Stack Pointer
    msr CONTROL, r0     // write r0 out to the CONTROL register
    isb                 // ensure the CONTROL write takes effect before the next instruction
    bx lr





Learning objectives – Arm Custom Instructions (ACIs)
After completing this section, you will be able to:
• Interpret the instruction patterns of ACIs



Custom instruction classes
Three classes of instruction
<operation code> <destination reg>, <imm>
<operation code> <dest reg>, <source reg>, <imm>
<operation code> <dest reg>, <src reg 1>, <src reg 2>, <imm>

Source and destination registers can be either:


• General purpose registers (and APSR_nzcv condition codes)
• FP and SIMD registers (no condition codes)

Immediate values can also be encoded

Note: Cortex-M33 does not support using the SIMD variant



Coprocessor and accumulation
Custom instructions use the same encoding space as the coprocessor instructions
• Each instruction targets a specific coprocessor

Instructions can optionally accumulate into the destination register


• Only accumulating versions of integer class of instructions can be inside an IT Block

The integer class instructions have a “dual” variant, whose result is 64-bit



General-purpose register file – Custom instruction class 1

CX1 p0, APSR_nzcv, #3
• Non-accumulator variant
• Operate using condition flags

CX1A p1, r2, #4
• Accumulator variant
• Accumulate into r2

[Figure: for each variant, one 32-bit GPR / flags operand and a 13-bit immediate are shown with the CDE opcode]


General-purpose register file – Custom instruction class 2

CX2D p1, r0, r1, r5, #6
• Non-accumulator variant
• Two destination registers

CX2DA p3, r3, r4, r8, #0xab
• Accumulator variant
• Two destination registers and accumulation

[Figure: for each variant, three 32-bit GPR / flags operands and a 9-bit immediate are shown with the CDE opcode]


Thank You
Danke
Gracias
Grazie
谢谢
ありがとう
Asante
Merci
감사합니다
धन्यवाद
Kiitos
شكراً
ধন্যবাদ
תודה
© 2022 Arm
