Proc Emb Ch3
Chapter Syllabus
▪ ARM Cortex-M4 Assembler
▪ ARM Cortex-M4 Instruction Set
Assembler Programming Language Definition
▪ Machine-level language: the set of binary instructions that a given microprocessor can decode and execute
▪ Assembly language: expresses instructions as mnemonics (operation codes), each representing a function the processor performs (examples: ADD, MOV, LDR, BL…)
▪ Assembly structure: each code line has two parts, the name of the instruction to execute followed by its operands (example: MOV r0, #15)
▪ Assembling: using a software tool (the assembler) to convert symbolic assembly instructions into executable machine code
▪ Allows a close match between program algorithms and the microprocessor's hardware and software resources
▪ Once assembled, a hand-written assembly program can be shorter and more tightly optimized than code generated from a high-level language
▪ Fewer instructions mean a smaller memory footprint
▪ Essential for operating system routines that need fast execution
ARM CORTEX-M4 ASSEMBLER
Cortex-M4 Program Image
▪ The program image in Cortex-M4 contains:
▪ Vector table: the starting addresses of the exception handlers (vectors) and the initial value of the Main Stack Pointer (MSP);
▪ C start-up routine;
▪ Program code: application code and data;
▪ C library code: program code for C library functions.
[Figure: program image layout in the code region. Starting at address 0x00000000 and going upward, the vector table holds: Initial MSP value, Reset vector, NMI vector, Hard fault vector, MemManage fault, Bus fault, Usage fault, Reserved, SVCall, Debug monitor, Reserved, PendSV, SysTick, External Interrupts. The rest of the program image holds the start-up routine, program code, and C library code.]
Assembly Program Structure
DIRECTIVES: commands addressed to the assembler (not to the processor) that prepare the execution environment for the processor
Examples: AREA, GLOBAL, END
▪ ALIGN: used before inserting word-size data; takes a number that determines the alignment size
Cortex-M4 Endianness
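▪ Endianness refers to the order in which the bytes of a multi-byte value are stored in memory
▪ Little endian: the byte at the lowest address holds bit 0 to bit 7 of a word-size data item (the least significant byte)
▪ Big endian: the byte at the lowest address holds bit 24 to bit 31 of a word-size data item (the most significant byte)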
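The two byte orders can be checked with a small host-side C sketch. It is only an illustration: the function names le_byte_at and be_byte_at are hypothetical and not part of any ARM tool, and the code models the memory layout rather than running on the target.

```c
#include <stdint.h>

/* Byte found at address (base + offset) when `word` is stored at `base`
 * in little-endian order: offset 0 holds bits 0 to 7.
 * `offset` must be in the range 0..3. */
uint8_t le_byte_at(uint32_t word, unsigned offset)
{
    return (uint8_t)(word >> (8u * offset));
}

/* Same question for big-endian order: offset 0 holds bits 24 to 31. */
uint8_t be_byte_at(uint32_t word, unsigned offset)
{
    return (uint8_t)(word >> (8u * (3u - offset)));
}
```

For the word 0x11223344, the little-endian layout places 0x44 at the lowest address, while the big-endian layout places 0x11 there.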
Data Definition Code Example
[Figure: data definition code example. Annotations: address (code memory); little-endian organization; use of the PC register as a pointer to access data; program and data in code memory.]
ARM CORTEX-M4 PROCESSOR
INSTRUCTION SET
ARM and Thumb® Instruction Set
▪ Early ARM instruction set
▪ 32-bit instruction set, called the ARM instructions
▪ Powerful and good performance
▪ Requires more program memory than 8-bit and 16-bit processors
▪ Higher power consumption
ARM and Thumb Instruction Set
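[Figure: incoming instructions reach the ARM instruction decoder either directly or through a Thumb-to-ARM remap stage; a multiplexer controlled by the T bit selects the path (0: select ARM, 1: select Thumb) before the instructions are executed.]
▪ Thumb-2 instruction set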
▪ Consists of both 32-bit Thumb instructions and the original 16-bit Thumb-1 instructions
▪ Compared to the 32-bit ARM instruction set, code size is reduced by ~26% while keeping similar performance
▪ Capable of handling all processing requirements in one operation state
Cortex-M4 Instruction Format
▪ ARM assembly syntax:
label
mnemonic operand1, operand2, … ; Comments
▪ Label is used as a reference to an address location;
▪ Mnemonic is the name of the instruction;
▪ Operand1 is normally the destination of the operation;
▪ Operand2 is normally the source of the operation;
▪ Comments are written after “;” and do not affect the program;
▪ For example
MOVS R3, #0x11 ; Set register R3 to 0x11
▪ Note that the assembly code can be assembled by either ARM assembler (armasm) or assembly tools from a
variety of vendors (e.g. GNU tool chain). When using GNU tool chain, the syntax for labels and comments is
slightly different
Functional groups of Cortex-M4 instructions
▪ Memory access instructions
▪ Saturating instructions
▪ Bitfield instructions
▪ Miscellaneous instructions
▪ Floating-point instructions
Memory Access Instructions
Single Access Data Transfer
▪ Used to move data between one or two registers and memory
Load    Store   Size
LDRD    STRD    Doubleword
LDR     STR     Word
LDRB    STRB    Byte
LDRH    STRH    Halfword
LDRSB           Signed byte load
LDRSH           Signed halfword load
[Figure: register Rd (bits 31 to 0) and memory; on a load, the upper bits of Rd are zero filled or sign extended.]
▪ Syntax:
▪ LDR{<size>}{<cond>} Rd, <address>
▪ STR{<size>}{<cond>} Rd, <address>
▪ Example:
▪ LDRB r0, [r1] ; load bottom byte of r0 from the byte of memory at address in r1
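The difference between the zero-filling loads (LDRB, LDRH) and the sign-extending loads (LDRSB, LDRSH) can be sketched in C. This models only the value that ends up in Rd, not the memory access itself; the function names are hypothetical.

```c
#include <stdint.h>

/* Value placed in Rd by LDRB: upper 24 bits are zero filled. */
uint32_t ldrb_result(uint8_t byte)
{
    return (uint32_t)byte;
}

/* Value placed in Rd by LDRSB: bit 7 of the byte is copied into bits 8..31. */
uint32_t ldrsb_result(uint8_t byte)
{
    uint32_t value = byte;
    return (value & 0x80u) ? (value | 0xFFFFFF00u) : value;
}

/* Value placed in Rd by LDRH: upper 16 bits are zero filled. */
uint32_t ldrh_result(uint16_t halfword)
{
    return (uint32_t)halfword;
}

/* Value placed in Rd by LDRSH: bit 15 of the halfword is copied into bits 16..31. */
uint32_t ldrsh_result(uint16_t halfword)
{
    uint32_t value = halfword;
    return (value & 0x8000u) ? (value | 0xFFFF0000u) : value;
}
```

Loading the byte 0x80 therefore gives 0x00000080 with LDRB but 0xFFFFFF80 with LDRSB.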
Multiple Register Data Transfer
▪ These instructions move data between multiple registers and memory
▪ Syntax
<LDM|STM>{<addressing_mode>}{<cond>} Rb{!}, <register list>
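As an illustration of the syntax above, the following C sketch models the increment-after forms (STMIA and LDMIA) with base register writeback (the "!" in Rb!). The assumptions are labeled: memory is modeled as an array of words indexed by byte address divided by 4, the register list is a 16-bit mask with bit n standing for Rn, and the function names stmia and ldmia are hypothetical helpers. The lowest-numbered register is transferred to or from the lowest address.

```c
#include <stdint.h>

/* Model of STMIA Rb!, {reglist}: store each listed register to ascending
 * word addresses starting at `base`. Returns the updated base address,
 * i.e. the value written back to Rb when "!" is present. */
uint32_t stmia(uint32_t *mem_words, uint32_t base, uint16_t reglist,
               const uint32_t regs[16])
{
    for (unsigned r = 0; r < 16u; r++) {
        if (reglist & (1u << r)) {
            mem_words[base / 4u] = regs[r];  /* lowest register, lowest address */
            base += 4u;
        }
    }
    return base;
}

/* Model of LDMIA Rb!, {reglist}: load each listed register from ascending
 * word addresses starting at `base`. Returns the updated base address. */
uint32_t ldmia(const uint32_t *mem_words, uint32_t base, uint16_t reglist,
               uint32_t regs[16])
{
    for (unsigned r = 0; r < 16u; r++) {
        if (reglist & (1u << r)) {
            regs[r] = mem_words[base / 4u];
            base += 4u;
        }
    }
    return base;
}
```

Storing three registers from base address 8 leaves the written-back base at 20, because each register occupies one 4-byte word.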
General Data Processing Instructions (continued)
Data Processing Instructions - Examples
▪ These instructions operate on the contents of registers
▪ They DO NOT affect memory
▪ Examples:
▪ ADC r0, r1, r2 ; r0 = r1 + r2 + C
▪ TEQ r0, r1 ; if r0 = r1, Z flag will be set
▪ MOV r0, r1 ; copy r1 to r0
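A common use of ADC is multi-word addition: the low words are added with a flag-setting add, and the high words are added with ADC so that the carry propagates. The C sketch below models this on a host machine; the type AddResult and the functions adds32, adc32 and add64_via_adc are hypothetical names used only for illustration.

```c
#include <stdint.h>

typedef struct {
    uint32_t value;   /* 32-bit result written to Rd */
    unsigned carry;   /* carry-out, as it would appear in the C flag */
} AddResult;

/* Model of ADDS Rd, Rn, Rm: add and report the carry-out. */
AddResult adds32(uint32_t rn, uint32_t rm)
{
    uint64_t wide = (uint64_t)rn + rm;
    AddResult r = { (uint32_t)wide, (unsigned)(wide >> 32) };
    return r;
}

/* Model of ADCS Rd, Rn, Rm: Rd = Rn + Rm + C. */
AddResult adc32(uint32_t rn, uint32_t rm, unsigned carry_in)
{
    uint64_t wide = (uint64_t)rn + rm + (carry_in & 1u);
    AddResult r = { (uint32_t)wide, (unsigned)(wide >> 32) };
    return r;
}

/* 64-bit addition built from a low-word ADDS followed by a high-word ADC. */
uint64_t add64_via_adc(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi)
{
    AddResult lo = adds32(a_lo, b_lo);
    AddResult hi = adc32(a_hi, b_hi, lo.carry);
    return ((uint64_t)hi.value << 32) | lo.value;
}
```

Adding 1 to 0x00000000FFFFFFFF this way produces 0x0000000100000000: the low-word add overflows and sets the carry, which ADC then adds into the high word.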
Arithmetic & Logic Instructions Examples
Multiply and Divide Instructions
Multiply Instructions Examples
▪ MUL and MLA are multiply and multiply-and-accumulate instructions that produce 32-bit results.
▪ MUL multiplies the values in two registers, truncates the result to 32 bits, and stores the product in a
third register.
▪ MLA multiplies two registers, adds the value of a third register to the product, truncates the result
to 32 bits, and stores the result in a fourth register.
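The truncation to 32 bits can be made visible with a short C model. It is a sketch of the arithmetic only; mul32 and mla32 are hypothetical function names.

```c
#include <stdint.h>

/* Model of MUL Rd, Rn, Rm: only the low 32 bits of the product are kept. */
uint32_t mul32(uint32_t rn, uint32_t rm)
{
    return (uint32_t)((uint64_t)rn * rm);
}

/* Model of MLA Rd, Rn, Rm, Ra: Rd = low 32 bits of (Rn * Rm + Ra). */
uint32_t mla32(uint32_t rn, uint32_t rm, uint32_t ra)
{
    return (uint32_t)((uint64_t)rn * rm + ra);
}
```

For example, 0x10000 times 0x10000 is 2^32, which does not fit in 32 bits, so the truncated MUL result is 0.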
Multiply Long Instructions Examples
▪ Multiply long instructions produce 64-bit results. They multiply the values of two registers and store the
64-bit result in a third and fourth register. SMULL and UMULL are signed and unsigned multiply long
instructions
▪ SMLAL and UMLAL are signed and unsigned multiply-long-and-accumulate instructions. They multiply
the values of two registers, add the 64-bit value from a third and fourth register, and store the 64-bit
result in the third and fourth registers
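The long multiplies can be modeled in C with 64-bit arithmetic. In this sketch the 64-bit return value stands for the register pair, with lo32 giving the word that would go to RdLo and hi32 the word that would go to RdHi. All function names are hypothetical.

```c
#include <stdint.h>

/* Split a 64-bit result into the two destination registers. */
uint32_t lo32(uint64_t x) { return (uint32_t)x; }          /* RdLo */
uint32_t hi32(uint64_t x) { return (uint32_t)(x >> 32); }  /* RdHi */

/* Model of UMULL RdLo, RdHi, Rn, Rm: unsigned 32 x 32 -> 64. */
uint64_t umull(uint32_t rn, uint32_t rm)
{
    return (uint64_t)rn * rm;
}

/* Model of SMULL RdLo, RdHi, Rn, Rm: signed 32 x 32 -> 64. */
uint64_t smull(int32_t rn, int32_t rm)
{
    return (uint64_t)((int64_t)rn * (int64_t)rm);
}

/* Model of UMLAL RdLo, RdHi, Rn, Rm: the 64-bit value already held in
 * RdHi:RdLo (passed here as `acc`) is added to the unsigned product. */
uint64_t umlal(uint64_t acc, uint32_t rn, uint32_t rm)
{
    return acc + (uint64_t)rn * rm;
}
```

The same operands give different high words for signed and unsigned forms: 0xFFFFFFFF times 2 is 0x00000001FFFFFFFE for UMULL, but SMULL treats 0xFFFFFFFF as -1 and produces 0xFFFFFFFFFFFFFFFE.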
Logic and Arithmetic Shifts Examples
[Figure: shift examples; in an arithmetic shift right the sign is maintained.]
Shifter Rotate Operations Examples
33-bit representation (the 32 register bits plus the Carry flag)
MOV R0, R1, RRX: R0 receives the value of R1 rotated right through the Carry flag by one bit.
The MSB of the result takes the value of the current Carry flag. When the flag-setting form is
used (S suffix), the Carry flag then takes the value of the LSB of R1. The value of R1 is not changed.
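The 33-bit rotation can be sketched in C as follows. The struct ShiftResult and the function rrx are hypothetical names; the carry field is the carry-out of the shifter, which reaches the C flag only when the instruction updates the flags.

```c
#include <stdint.h>

typedef struct {
    uint32_t value;   /* result written to the destination register */
    unsigned carry;   /* shifter carry-out (old bit 0 of the source) */
} ShiftResult;

/* Model of RRX: rotate right by one bit through the Carry flag. */
ShiftResult rrx(uint32_t rm, unsigned carry_in)
{
    ShiftResult r;
    r.value = ((uint32_t)(carry_in & 1u) << 31) | (rm >> 1);  /* C enters at bit 31 */
    r.carry = (unsigned)(rm & 1u);                             /* bit 0 leaves to C */
    return r;
}
```

With R1 = 0x00000002 and the Carry flag set, the result is 0x80000001 and the carry-out is 0.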
Bit Field Instructions
Branch and Control Instructions
Cortex-M4 Suffix
▪ Some instructions can be followed by suffixes to update processor flags or execute the instruction on a
certain condition
Suffix                        Description                    Example          Example explanation
S                             Update APSR (flags)            ADDS R1, #0x21   Add 0x21 to R1 and update APSR
EQ, NE, CS, CC, MI, PL, VS,   Condition execution,           BNE label        Branch to the label if not equal
VC, HI, LS, GE, LT, GT, LE    e.g. EQ = equal,
                              NE = not equal,
                              LT = less than
Condition Code Flags Update
▪ For a data processing instruction to update the condition code flags, the instruction must be postfixed
with an S.
▪ The exceptions to this are CMP, CMN, TST, and TEQ, which always update the flags, because updating
flags is their only real function.
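What a flag-setting addition such as ADDS writes into the N, Z, C and V flags can be sketched in C. This is a host-side model under the usual definitions of the flags (negative, zero, unsigned carry-out, signed overflow); the struct Flags and the function adds_flags are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    unsigned n;   /* Negative: bit 31 of the result */
    unsigned z;   /* Zero: result is 0 */
    unsigned c;   /* Carry: unsigned overflow out of bit 31 */
    unsigned v;   /* oVerflow: signed overflow */
} Flags;

/* Flags produced by ADDS Rd, Rn, Rm (Rd = Rn + Rm). */
Flags adds_flags(uint32_t rn, uint32_t rm)
{
    uint64_t wide = (uint64_t)rn + rm;
    uint32_t result = (uint32_t)wide;
    Flags f;
    f.n = (unsigned)(result >> 31);
    f.z = (result == 0u) ? 1u : 0u;
    f.c = (unsigned)(wide >> 32);
    /* Signed overflow: operands share a sign that differs from the result's sign. */
    f.v = (unsigned)((~(rn ^ rm) & (rn ^ result)) >> 31);
    return f;
}
```

Adding 1 to 0x7FFFFFFF sets N and V (the signed result wrapped to a negative value), while adding 1 to 0xFFFFFFFF sets Z and C instead.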
Conditional Execution & Flags
▪ ARM instructions can be made to execute conditionally by postfixing them with the appropriate
condition code field. In Thumb code, as executed by the Cortex-M4, conditional instructions other than
branches must be placed inside an IT block.
▪ This improves code density and performance by reducing the number of forward branch instructions.
▪ By default, data processing instructions do not affect the condition code flags, but the flags can
optionally be set by adding the “S” suffix. CMP does not need “S”.
Conditional Execution Examples
Loop Structures
▪ Three basic types of loops
▪ for loops
▪ while loops
▪ do {…..} while loops
▪ for Loop example
While and Do Loops
While Loop:
▪ Because the number of iterations of a while loop is not a constant, these structures tend to be
somewhat simpler.
▪ There is only one branch in the loop itself. The initial branch, placed before the loop, jumps into the
loop at the point where the condition is tested.
Do … while loops
▪ The loop body is executed before the expression is evaluated. The structure is the same as the while loop
but without the initial branch.
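The branch layout of the two loop forms can be sketched in C with goto statements standing in for the branch instructions. This is an illustration only, not the original slide example; the function names and the summing loop body are hypothetical.

```c
#include <stdint.h>

/* while (n != 0) { sum += n; n--; }
 * One initial branch to the test, then a single conditional branch per pass. */
uint32_t sum_while(uint32_t n)
{
    uint32_t sum = 0;
    goto test;               /* like: B test */
loop:
    sum += n;                /* loop body */
    n--;
test:
    if (n != 0u) goto loop;  /* like: CMP n, #0 followed by BNE loop */
    return sum;
}

/* do { sum += n; n--; } while (n != 0);
 * Same structure without the initial branch, so the body runs at least once.
 * The caller is assumed to pass n >= 1. */
uint32_t sum_do_while(uint32_t n)
{
    uint32_t sum = 0;
loop:
    sum += n;                /* loop body */
    n--;
    if (n != 0u) goto loop;  /* like: CMP n, #0 followed by BNE loop */
    return sum;
}
```

With n = 0 the while form skips the body entirely, which is exactly what the initial branch to the test provides.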
Branch & Subroutine
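▪ B <label>
▪ PC-relative. ±16 Mbyte range with the Thumb-2 encoding used on Cortex-M4 (the ±32 Mbyte range applies to the 32-bit ARM instruction set).
▪ BL <subroutine>
▪ Stores the return address in LR
▪ Returning is implemented by restoring the PC from LR
▪ For non-leaf functions, LR has to be stacked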
Binary Upwards Compatibility
[Figure: binary upward compatibility between the ARMv6-M architecture and the ARMv7-M architecture.]
APPENDIX:
ARM CORTEX-M4 ASSEMBLER
INSTRUCTIONS LIST
Cortex-M4 Instruction Set
Mnemonic            Operands                    Brief description                           Flags
ADC, ADCS           {Rd,} Rn, Op2               Add with Carry                              N,Z,C,V
ADD, ADDS           {Rd,} Rn, Op2               Add                                         N,Z,C,V
ADD, ADDW           {Rd,} Rn, #imm12            Add                                         N,Z,C,V
ADR                 Rd, label                   Load PC-relative Address
AND, ANDS           {Rd,} Rn, Op2               Logical AND                                 N,Z,C
ASR, ASRS           Rd, Rm, <Rs|#n>             Arithmetic Shift Right                      N,Z,C
B                   label                       Branch
BFC                 Rd, #lsb, #width            Bit Field Clear
BFI                 Rd, Rn, #lsb, #width        Bit Field Insert
BIC, BICS           {Rd,} Rn, Op2               Bit Clear                                   N,Z,C
BKPT                #imm                        Breakpoint
BL                  label                       Branch with Link
BLX                 Rm                          Branch indirect with Link
BX                  Rm                          Branch indirect
LDRD                Rt, Rt2, [Rn, #offset]      Load Register with two words
LDRSB, LDRSBT       Rt, [Rn, #offset]           Load Register with Signed Byte
LDRSH, LDRSHT       Rt, [Rn, #offset]           Load Register with Signed Halfword
MUL, MULS           {Rd,} Rn, Rm                Multiply, 32-bit result                     N,Z
NOP                                             No Operation
QASX                {Rd,} Rn, Rm                Saturating Add and Subtract with Exchange
REVSH               Rd, Rn                      Reverse byte order in bottom halfword and sign extend
RRX, RRXS           Rd, Rm                      Rotate Right with Extend                    N,Z,C
SHASX               {Rd,} Rn, Rm                Signed Halving Add and Subtract with Exchange
SHSAX               {Rd,} Rn, Rm                Signed Halving Subtract and Add with Exchange
SMLAWB, SMLAWT      Rd, Rn, Rm, Ra              Signed Multiply Accumulate, word by halfword  Q
SMMLA               Rd, Rn, Rm, Ra              Signed Most significant word Multiply Accumulate
SMMLS, SMMLSR       Rd, Rn, Rm, Ra              Signed Most significant word Multiply Subtract
STM                 Rn{!}, reglist              Store Multiple registers, increment after
SUB, SUBW           {Rd,} Rn, #imm12            Subtract                                    N,Z,C,V
UHASX               {Rd,} Rn, Rm                Unsigned Halving Add and Subtract with Exchange
UHSAX               {Rd,} Rn, Rm                Unsigned Halving Subtract and Add with Exchange
UMLAL               RdLo, RdHi, Rn, Rm          Unsigned Multiply with Accumulate (32 x 32 + 64), 64-bit result
UMULL               RdLo, RdHi, Rn, Rm          Unsigned Multiply (32 x 32), 64-bit result
UQASX               {Rd,} Rn, Rm                Unsigned Saturating Add and Subtract with Exchange
UQSAX               {Rd,} Rn, Rm                Unsigned Saturating Subtract and Add with Exchange
USADA8              {Rd,} Rn, Rm, Ra            Unsigned Sum of Absolute Differences and Accumulate
USAT                Rd, #n, Rm {,shift #s}      Unsigned Saturate                           Q
UXTAB16             {Rd,} Rn, Rm {,ROR #n}      Rotate, dual extend 8 bits to 16 and Add
UXTAH               {Rd,} Rn, Rm {,ROR #n}      Rotate, unsigned extend and Add Halfword
UXTB                {Rd,} Rm {,ROR #n}          Zero extend a Byte
VCMPE.F32           Sd, <Sm | #0.0>             Compare two floating-point registers, or one floating-point register and zero with Invalid Operation check  FPSCR
VCVT.S32.F32        Sd, Sm                      Convert between floating-point and integer
VCVT.S16.F32        Sd, Sd, #fbits              Convert between floating-point and fixed point
VCVTR.S32.F32       Sd, Sm                      Convert between floating-point and integer with rounding
VCVT<B|H>.F32.F16   Sd, Sm                      Converts half-precision value to single-precision
VMOV                Sm, Sm1, Rt, Rt2            Copy 2 ARM core registers to 2 single-precision registers
VMRS                Rt, FPSCR                   Move FPSCR to ARM core register or APSR     N,Z,C,V
Note: a full explanation of each instruction can be found in the Cortex-M4 Devices Generic User Guide (Ref-4)