Proc Emb Ch3

Uploaded by

Jihene Zgolli
Copyright
© All Rights Reserved
Chapter III

ARM Cortex-M4 Processor


Assembler Programming

Chapter Syllabus
▪ ARM Cortex-M4 Assembler
▪ ARM Cortex-M4 Instruction Set

Assembler Programming Language Definition
▪ Machine-level language: the set of binary instructions that a given microprocessor can decode and execute

▪ Programming model: depends on the registers, flags and interrupt definitions of the target microprocessor

▪ Assembler language: expresses instructions as mnemonics (operation codes), each representing a function the processor will perform (examples: ADD, MOV, LDR, BL…)

▪ Assembler structure: a code line has two parts; the first is the name of the instruction to be executed, the second holds its parameters (example: MOV r0, #15)

▪ Assembling: using a dedicated software tool (the assembler) to convert symbolic assembler instructions into executable machine code

• A label must start in the first column.
• An instruction never starts in the first column, even if there is no label.
Assembler Advantages
▪ Complete control over microprocessor operations and memory contents

▪ Allows a close match between program algorithms and the microprocessor's hardware and software resources
▪ After translation to binary, a carefully written assembler program can be shorter and better optimized than the code generated from high-level languages
▪ Using fewer instructions reduces memory occupation

▪ Short, optimized binary code runs fast

▪ More flexibility in controlling I/O interface operations

▪ Often required for operating system routines that need fast execution timing
ARM CORTEX-M4 ASSEMBLER

Cortex-M4 Program Image
▪ The program image in Cortex-M4 contains:
▪ Vector table: holds the starting addresses of the exception handlers (vectors) and the initial value of the Main Stack Pointer (MSP);
▪ C start-up routine;
▪ Program code: application code and data;
▪ C library code: program code for the C library functions.

Program image layout in the code region, from the lowest address upwards:
  Vector table (starts at 0x00000000)
  Start-up routine, program code and C library code

Vector table entries, from address 0x00000000 upwards:
  Initial MSP value (at 0x00000000)
  Reset vector
  NMI vector
  Hard fault vector
  MemManage fault
  Bus fault
  Usage fault
  Reserved
  SVCall
  Debug monitor
  Reserved
  PendSV
  SysTick
  External interrupts
Cortex-M4 Program Image

▪ After reset, the processor:

▪ First reads the initial MSP value (reads address 0x00000000);

▪ Then reads the reset vector (reads address 0x00000004);

▪ Branches to the start of the program execution address, the reset handler (fetches the 1st instruction from the address given by the reset vector);

▪ Subsequently executes the program instructions (fetches the 2nd and subsequent instructions).

Reset flow: Reset → Fetch initial value for MSP → Fetch reset vector → Fetch 1st instruction → Fetch 2nd instruction
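The reset sequence can be sketched as a small C model. This is an illustration only, not the processor's actual implementation: `reset_state_t` and `model_reset` are hypothetical names, and the program image is assumed to be an array of 32-bit words whose first element sits at address 0x00000000. Bit 0 of the reset vector is the Thumb state bit, so the model clears it to obtain the address of the first instruction.

```c
#include <stdint.h>

/* Hypothetical model of the state loaded at reset (names are illustrative). */
typedef struct {
    uint32_t msp; /* Main Stack Pointer */
    uint32_t pc;  /* address of the first instruction to fetch */
} reset_state_t;

/* image[0] is assumed to be the word at address 0x00000000. */
static reset_state_t model_reset(const uint32_t *image)
{
    reset_state_t s;
    s.msp = image[0];                 /* word at 0x00000000: initial MSP value */
    s.pc  = image[1] & ~UINT32_C(1);  /* word at 0x00000004: reset vector, Thumb bit cleared */
    return s;
}
```

With an image whose first two words are 0x20001000 and 0x00000101, the model yields MSP = 0x20001000 and a first fetch at 0x00000100.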
Assembly Program Structure

DIRECTIVES: addressed to the assembler; they prepare the execution environment for the processor
AREA, GLOBAL, END

INSTRUCTIONS: addressed to the processor; they are executed by the processor
MOV, ADD, SVC
Data Definition Directives
▪ DCB allocates one or more bytes of memory, and defines the initial runtime contents of the memory
▪ DCW directive allocates one or more halfwords of memory, aligned on two-byte boundaries, and defines
the initial runtime contents of the memory
▪ DCD directive allocates one or more words of memory, aligned on four-byte boundaries, and defines the
initial runtime contents of the memory
▪ DCQ directive allocates one or more eight-byte blocks of memory, aligned on four-byte boundaries, and
defines the initial runtime contents of the memory
▪ Examples :
value1 DCB 0xC5, 13, -57

value2 DCW 4918, 0x3A54, 0x51E, 398

value3 DCD 0x12345678, 15, 2975

value4 DCQ 0x1234567812345678

▪ ALIGN: aligns the current location to a given boundary (a word boundary by default); used before inserting word-size data. An optional number sets the alignment size
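The alignment rules of the data definition directives can be checked with a short C sketch. The helper names `align_up` and `dcb_byte` are hypothetical; the sketch assumes that the assembler pads up to the required boundary before DCW, DCD and DCQ data, and that a negative DCB value is stored in two's complement.

```c
#include <stdint.h>

/* Round offset up to the next multiple of boundary (boundary must be a power of two). */
static uint32_t align_up(uint32_t offset, uint32_t boundary)
{
    return (offset + boundary - 1u) & ~(boundary - 1u);
}

/* Byte stored by DCB for a value in the range -128..255. */
static uint8_t dcb_byte(int value)
{
    return (uint8_t)value; /* conversion to uint8_t wraps modulo 256 */
}
```

Under these assumptions, if value1 starts at offset 0 its three bytes end at offset 3, so value2 (DCW) starts at `align_up(3, 2)`, which is 4, and the byte stored for -57 is 0xC7.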
Cortex-M4 Endianness
▪ Endianness refers to the order in which the bytes of a multi-byte value are stored in memory
▪ Little endian: the lowest-addressed byte (Byte0) of a word-size data item is stored in bit 0 to bit 7
▪ Big endian: the lowest-addressed byte (Byte0) of a word-size data item is stored in bit 24 to bit 31

▪ Cortex-M4 supports both little endian and big endian

▪ However, endianness only exists at the hardware level

             Little endian 32-bit memory                  Big endian 32-bit memory
Address      [31:24]  [23:16]  [15:8]  [7:0]              [31:24]  [23:16]  [15:8]  [7:0]
0x00000008   Byte3    Byte2    Byte1   Byte0   (Word 3)   Byte0    Byte1    Byte2   Byte3   (Word 3)
0x00000004   Byte3    Byte2    Byte1   Byte0   (Word 2)   Byte0    Byte1    Byte2   Byte3   (Word 2)
0x00000000   Byte3    Byte2    Byte1   Byte0   (Word 1)   Byte0    Byte1    Byte2   Byte3   (Word 1)
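The two byte orders can be made concrete with a minimal C sketch that stores one 32-bit word into four consecutive byte addresses. The function names `store_word_le` and `store_word_be` are hypothetical, and `mem[0]` stands for the lowest address of the word.

```c
#include <stdint.h>

/* Little endian: least significant byte at the lowest address. */
static void store_word_le(uint8_t mem[4], uint32_t w)
{
    mem[0] = (uint8_t)(w);
    mem[1] = (uint8_t)(w >> 8);
    mem[2] = (uint8_t)(w >> 16);
    mem[3] = (uint8_t)(w >> 24);
}

/* Big endian: most significant byte at the lowest address. */
static void store_word_be(uint8_t mem[4], uint32_t w)
{
    mem[0] = (uint8_t)(w >> 24);
    mem[1] = (uint8_t)(w >> 16);
    mem[2] = (uint8_t)(w >> 8);
    mem[3] = (uint8_t)(w);
}
```

For the word 0x12345678 used in the DCD example, the little-endian layout is 78 56 34 12 and the big-endian layout is 12 34 56 78, reading from the lowest address upwards.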
Data Definition Code Example

[Figure: code memory listing with addresses, showing the little-endian organization of the data, program and data placed together in code memory, and the use of the PC register as a pointer to access the data]
ARM CORTEX-M4 PROCESSOR
INSTRUCTION SET

ARM and Thumb® Instruction Set
▪ Early ARM instruction set
▪ 32-bit instruction set, called the ARM instructions
▪ Powerful, with good performance
▪ Requires more program memory than 8-bit and 16-bit processors
▪ Higher power consumption

▪ Thumb-1 instruction set
▪ 16-bit instruction set, first used in the ARM7TDMI processor in 1995
▪ Provides a subset of the ARM instructions, giving better code density than a 32-bit RISC architecture
▪ Code size is reduced by ~30%, but performance is also reduced by ~20%
ARM and Thumb Instruction Set

▪ Mix of ARM and Thumb-1 instruction sets
▪ Benefits from both 32-bit ARM (high performance) and 16-bit Thumb-1 (high code density)
▪ A multiplexer is used to switch between two states: ARM state (32-bit) and Thumb state (16-bit), which incurs a switching overhead

[Figure: incoming instructions reach a multiplexer either directly (ARM, input 0) or through a Thumb-to-ARM remap stage (input 1); the T bit selects the input (0: select ARM, 1: select Thumb) and the result feeds the ARM instruction decoder and then instruction execution]

▪ Thumb-2 instruction set
▪ Consists of both 32-bit Thumb instructions and the original 16-bit Thumb-1 instructions
▪ Compared to the 32-bit ARM instruction set, code size is reduced by ~26% while keeping similar performance
▪ Capable of handling all processing requirements in one operation state
Cortex-M4 Instruction Format
▪ ARM assembly syntax:
label
mnemonic operand1, operand2, … ; Comments
▪ Label is used as a reference to an address location;
▪ Mnemonic is the name of the instruction;
▪ Operand1 is the destination of the operation;
▪ Operand2 is normally the source of the operation;
▪ Comments are written after “;” and do not affect the program;
▪ For example
MOVS R3, #0x11 ; Set register R3 to 0x11
▪ Note that the assembly code can be assembled by either the ARM assembler (armasm) or assembly tools from a variety of vendors (e.g. the GNU tool chain). With the GNU tool chain, the syntax for labels and comments is slightly different
Functional groups of Cortex-M4 instructions
▪ Memory access instructions

▪ General data processing instructions

▪ Multiply and divide instructions

▪ Saturating instructions

▪ Packing and unpacking instructions

▪ Bitfield instructions

▪ Branch and control instructions

▪ Miscellaneous instructions

▪ Floating-point instructions

Memory Access Instructions

Single Access Data Transfer
▪ Used to move data between one or two registers and memory

Load     Store    Size
LDRD     STRD     Doubleword
LDR      STR      Word
LDRB     STRB     Byte
LDRH     STRH     Halfword
LDRSB             Signed byte load
LDRSH             Signed halfword load

On a byte or halfword load, the upper bits of Rd (bits 31 to 0) are zero filled or sign extended

▪ Syntax:
▪ LDR{<size>}{<cond>} Rd, <address>
▪ STR{<size>}{<cond>} Rd, <address>

▪ Example:
▪ LDRB r0, [r1] ; load bottom byte of r0 from the byte of memory at address in r1
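The difference between zero filling and sign extension on a load can be sketched in C. The function names below are hypothetical; each takes the raw byte or halfword read from memory and returns the 32-bit value that would end up in Rd.

```c
#include <stdint.h>

/* LDRB: upper 24 bits zero filled. */
static uint32_t zero_extend8(uint8_t b)   { return b; }

/* LDRSB: bit 7 is copied into bits 31..8. */
static uint32_t sign_extend8(uint8_t b)   { return (b & 0x80u) ? (b | 0xFFFFFF00u) : b; }

/* LDRH: upper 16 bits zero filled. */
static uint32_t zero_extend16(uint16_t h) { return h; }

/* LDRSH: bit 15 is copied into bits 31..16. */
static uint32_t sign_extend16(uint16_t h) { return (h & 0x8000u) ? (h | 0xFFFF0000u) : h; }
```

For the byte 0xC5, the zero-extended result is 0x000000C5 while the sign-extended result is 0xFFFFFFC5.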
Multiple Register Data Transfer
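▪ These instructions move data between multiple registers and memory
▪ Syntax
<LDM|STM>{<addressing_mode>}{<cond>} Rb{!}, <register list>

▪ Addressing modes: the ARM instruction set defines four (IA, IB, DA, DB; IA is the default), while the Thumb-2 instruction set of the Cortex-M4 provides only IA and DB
▪ Increment after/before
▪ Decrement after/before

[Figure: for LDM r10, {r0,r1,r4}, the base register (Rb) r10 points at the memory block; the registers occupy increasing addresses in the order r0, r1, r4, and the position of the block relative to r10 depends on the addressing mode]

▪ Also
PUSH/POP, equivalent to STMDB/LDMIA with SP! as base register
▪ Example
▪ LDM r10, {r0,r1,r4} ; load registers, using r10 base
▪ PUSH {r4-r6,lr} ; store registers, using SP base (on Cortex-M4, PUSH may include LR but not PC)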
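The equivalence between PUSH/POP and STMDB/LDMIA on SP can be sketched as a C model of a full-descending stack. The names `push_regs` and `pop_regs` are hypothetical; `regs` is assumed to list the register values in ascending register order, so the lowest-numbered register lands at the lowest address.

```c
#include <stddef.h>
#include <stdint.h>

/* STMDB sp!, {list}: decrement SP by the whole block first, then store upwards. */
static uint32_t *push_regs(uint32_t *sp, const uint32_t *regs, size_t n)
{
    sp -= n;                          /* decrement before */
    for (size_t i = 0; i < n; i++)
        sp[i] = regs[i];              /* lowest register at lowest address */
    return sp;                        /* write-back (the "!" in SP!) */
}

/* LDMIA sp!, {list}: load upwards from SP, then increment SP past the block. */
static uint32_t *pop_regs(uint32_t *sp, uint32_t *regs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        regs[i] = sp[i];
    return sp + n;                    /* increment after, with write-back */
}
```

Pushing three values and popping three values returns SP to its starting address and restores the values in the same register order.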
General Data Processing Instructions

Data Processing Instructions - Examples
▪ These instructions operate on the contents of registers
▪ They DO NOT affect memory

Manipulation (has destination register):
  arithmetic: ADC, ADD, SBC, SUB, RSB, RSC (RSC exists only in the ARM instruction set, not on Cortex-M4)
  logical: BIC, ORR, AND, EOR, ORN
  move: MVN, MOV

Comparison (set flags only; the equivalent flag-setting operation is given in parentheses):
  arithmetic: CMN (ADDS), CMP (SUBS)
  logical: TST (ANDS), TEQ (EORS)

▪ Examples:
▪ ADC r0, r1, r2 ; r0 = r1 + r2 + C
▪ TEQ r0, r1 ; if r0 = r1, Z flag will be set
▪ MOV r0, r1 ; copy r1 to r0
Arithmetic & Logic Instructions Examples

Arithmetic instructions in the ARM Instruction Set Architecture (ISA) perform addition, subtraction, and reverse subtraction, all with and without carry.

ARM supports Boolean logic operations using two register operands.
Multiply and Divide Instructions

Multiply Instructions Examples

▪ MUL and MLA are multiply and multiply-and-accumulate instructions that produce 32-bit results.

▪ MUL multiplies the values in two registers, truncates the result to 32 bits, and stores the product in a
third register.

▪ MLA multiplies two registers, adds the value of a third register to the product, truncates the results
to 32 bits, and stores the result in a fourth register

Multiply Long Instructions Examples
▪ Multiply long instructions produce 64-bit results. They multiply the values of two registers and store the
64-bit result in a third and fourth register. SMULL and UMULL are signed and unsigned multiply long
instructions

▪ SMLAL and UMLAL are signed and unsigned multiply-long-and-accumulate instructions. They multiply
the values of two registers, add the 64-bit value from a third and fourth register, and store the 64-bit
result in the third and fourth registers
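The 32-bit and 64-bit multiply behaviour can be sketched in C. This is a model of the arithmetic only, with hypothetical function names; the 64-bit result is split into a low word and a high word, standing for the two destination registers.

```c
#include <stdint.h>

/* MUL: product truncated to 32 bits. */
static uint32_t mul32(uint32_t rn, uint32_t rm) { return rn * rm; }

/* MLA: product plus accumulator, truncated to 32 bits. */
static uint32_t mla32(uint32_t rn, uint32_t rm, uint32_t ra) { return rn * rm + ra; }

/* UMULL: unsigned 32 x 32 -> 64-bit result in lo/hi. */
static void umull(uint32_t rn, uint32_t rm, uint32_t *lo, uint32_t *hi)
{
    uint64_t p = (uint64_t)rn * rm;
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* SMULL: signed 32 x 32 -> 64-bit result in lo/hi. */
static void smull(int32_t rn, int32_t rm, uint32_t *lo, uint32_t *hi)
{
    uint64_t p = (uint64_t)((int64_t)rn * (int64_t)rm);
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* UMLAL: unsigned product added to the 64-bit value already held in lo/hi. */
static void umlal(uint32_t rn, uint32_t rm, uint32_t *lo, uint32_t *hi)
{
    uint64_t acc = ((uint64_t)*hi << 32) | *lo;
    acc += (uint64_t)rn * rm;
    *lo = (uint32_t)acc;
    *hi = (uint32_t)(acc >> 32);
}
```

The sketch shows why the long forms matter: `mul32(0x10000, 0x10000)` truncates to 0, whereas the same operands passed to `umull` give a high word of 1 and a low word of 0.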

Logic and Arithmetic Shifts Examples

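MOV R0, R1, LSL #2 ; logical shift left by 2: vacated low bits are filled with zeros

MOV R0, R1, LSR #2 ; logical shift right by 2: vacated high bits are filled with zeros

MOV R0, R1, ASR #2 ; arithmetic shift right by 2: vacated high bits are filled with the sign bit (sign maintained)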
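The three shifts can be modelled in C as follows. The function names are hypothetical and the sketch assumes a shift amount `n` between 1 and 31; the arithmetic shift is written out explicitly so that it does not rely on how a given C compiler shifts negative values.

```c
#include <stdint.h>

/* LSL #n: zeros enter at the bottom. */
static uint32_t lsl32(uint32_t x, unsigned n) { return x << n; }

/* LSR #n: zeros enter at the top. */
static uint32_t lsr32(uint32_t x, unsigned n) { return x >> n; }

/* ASR #n: copies of the sign bit (bit 31) enter at the top. */
static uint32_t asr32(uint32_t x, unsigned n)
{
    uint32_t fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (x >> n) | fill;
}
```

For 0x80000000 shifted right by 2, LSR gives 0x20000000 while ASR gives 0xE0000000, which keeps the value negative when read as a signed number.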
Shifter Rotate Operations Examples

MOV R0, R1, ROR #2 ; rotate right by 2: bits shifted out at the LSB re-enter at the MSB

MOV R0, R1, RRX ; rotate right with extend: R1 and the Carry flag form a 33-bit representation

R0 becomes the value of R1 rotated right by one bit through the Carry flag. The MSB of the result takes the value of the current Carry flag. When the flag-setting form is used (MOVS or RRXS), the Carry flag then takes the value of the LSB of R1. The value of R1 is not changed.
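A minimal C model of the two rotate operations, with hypothetical function names. `ror32` assumes a rotate amount between 1 and 31; `rrx32` returns the rotated value and reports through `carry_out` the bit that a flag-setting form would place in the Carry flag.

```c
#include <stdint.h>

/* ROR #n: bits leaving at the LSB re-enter at the MSB. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* RRX: one-bit rotate through the carry (33-bit rotation). */
static uint32_t rrx32(uint32_t x, unsigned carry_in, unsigned *carry_out)
{
    *carry_out = x & 1u;                           /* old LSB leaves towards the carry */
    return ((uint32_t)(carry_in & 1u) << 31) | (x >> 1); /* old carry enters at the MSB */
}
```

Rotating 0x00000003 right by 2 gives 0xC0000000; applying RRX to 0x00000001 with the carry set gives 0x80000000 and a carry out of 1.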
Bit Field Instructions

Branch and Control Instructions
Cortex-M4 Suffix
▪ Some instructions can be followed by suffixes to update the processor flags or to execute the instruction only under a certain condition

Suffix: S
  Description: Update APSR (flags)
  Example: ADDS R1, #0x21
  Example explanation: Add 0x21 to R1 and update APSR

Suffix: EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE
  Description: Conditional execution, e.g. EQ = equal, NE = not equal, LT = less than
  Example: BNE label
  Example explanation: Branch to the label if not equal
Condition Code Flags Update
▪ For a data processing instruction to update the condition code flags, the instruction must be postfixed
with an S.

▪ The exceptions to this are CMP, CMN, TST, and TEQ, which always update the flags, because updating
flags is their only real function.
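How a flag-setting addition or subtraction derives the N, Z, C and V flags can be sketched in C. The type `flags_t` and the functions `adds32` and `subs32` are hypothetical names for this illustration; on ARM, the carry flag after a subtraction is set when no borrow occurs.

```c
#include <stdint.h>

typedef struct { unsigned n, z, c, v; } flags_t;

/* Model of ADDS: result = a + b, flags updated. */
static uint32_t adds32(uint32_t a, uint32_t b, flags_t *f)
{
    uint32_t r = a + b;
    f->n = r >> 31;                            /* negative: bit 31 of the result */
    f->z = (r == 0u);                          /* zero */
    f->c = (r < a);                            /* carry out of bit 31 */
    f->v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;  /* signed overflow */
    return r;
}

/* Model of SUBS (and of CMP, which discards the result). */
static uint32_t subs32(uint32_t a, uint32_t b, flags_t *f)
{
    uint32_t r = a - b;
    f->n = r >> 31;
    f->z = (r == 0u);
    f->c = (a >= b);                           /* set when there is no borrow */
    f->v = (((a ^ b) & (a ^ r)) >> 31) & 1u;
    return r;
}
```

For example, adding 1 to 0x7FFFFFFF sets N and V (signed overflow), while adding 1 to 0xFFFFFFFF sets Z and C (unsigned wrap-around).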

Conditional Execution & Flags
▪ ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field. In the Thumb-2 instruction set of the Cortex-M4, conditional instructions other than conditional branches must be placed inside an IT (If-Then) block.
▪ This improves code density and performance by reducing the number of forward branch instructions.

▪ By default, data processing instructions do not affect the condition code flags, but the flags can optionally be set by using the “S” suffix. CMP does not need “S”.
Conditional Execution Examples
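As an illustration, the condition code suffixes (EQ, NE, CS, ...) can be modelled in C as tests on the N, Z, C and V flags. The enum `cond_t` and the function `cond_passed` are hypothetical names; the flag arguments are assumed to be 0 or 1.

```c
typedef enum {
    COND_EQ, COND_NE, COND_CS, COND_CC, COND_MI, COND_PL, COND_VS,
    COND_VC, COND_HI, COND_LS, COND_GE, COND_LT, COND_GT, COND_LE
} cond_t;

/* Returns 1 when an instruction carrying this condition suffix would execute. */
static int cond_passed(cond_t cond, int n, int z, int c, int v)
{
    switch (cond) {
    case COND_EQ: return z;             /* equal */
    case COND_NE: return !z;            /* not equal */
    case COND_CS: return c;             /* carry set (unsigned higher or same) */
    case COND_CC: return !c;            /* carry clear (unsigned lower) */
    case COND_MI: return n;             /* negative */
    case COND_PL: return !n;            /* positive or zero */
    case COND_VS: return v;             /* overflow */
    case COND_VC: return !v;            /* no overflow */
    case COND_HI: return c && !z;       /* unsigned higher */
    case COND_LS: return !c || z;       /* unsigned lower or same */
    case COND_GE: return n == v;        /* signed greater or equal */
    case COND_LT: return n != v;        /* signed less than */
    case COND_GT: return !z && n == v;  /* signed greater than */
    case COND_LE: return z || n != v;   /* signed less or equal */
    }
    return 0;
}
```

After comparing 3 with 5 (CMP computes 3 - 5), the flags are N = 1, Z = 0, C = 0, V = 0, so LT, NE and LS pass while GE and HI fail.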

Loop Structures
▪ Three basic types of loops
▪ for loops
▪ while loops
▪ do {…..} while loops
▪ for Loop example
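A for loop can be sketched in C with explicit labels and jumps, so that each `goto` stands for one branch instruction. This is an illustrative sketch, not the example from the original slide: the function name `sum_for`, the labels and the mnemonics quoted in the comments are assumptions.

```c
/* Sums the integers 0 .. n-1 using the branch structure of a counted loop. */
static unsigned sum_for(unsigned n)
{
    unsigned r0 = 0;                 /* accumulator, playing the role of a register */
    unsigned r1 = 0;                 /* loop counter: initialisation i = 0 */
loop_test:
    if (r1 >= n) goto loop_done;     /* compare, then conditional branch out (e.g. CMP + BHS) */
    r0 += r1;                        /* loop body */
    r1 += 1;                         /* increment i */
    goto loop_test;                  /* unconditional branch back (B) */
loop_done:
    return r0;
}
```

Written this way, the loop needs one conditional branch for the exit test and one unconditional branch back to the test.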

While and Do Loops
While Loop:
▪ Because the number of iterations of a while loop is not a constant, these structures tend to be somewhat simpler.
▪ There is only one branch in the loop itself. An initial branch, placed before the loop, jumps straight to the test at the end of the loop code.

Do … while loops
▪ The loop body is executed before the expression is evaluated. The structure is the same as the while loop but without the initial branch.
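The two structures can be sketched in C with labels and jumps, each `goto` standing for a branch instruction. The function names and the task (counting the significant bits of a value) are assumptions chosen for illustration.

```c
#include <stdint.h>

/* while (n != 0) { count++; n >>= 1; } */
static unsigned bits_while(uint32_t n)
{
    unsigned count = 0;
    goto test;                 /* initial branch into the loop test */
body:
    count++;
    n >>= 1;
test:
    if (n != 0) goto body;     /* the only branch inside the loop */
    return count;
}

/* do { count++; n >>= 1; } while (n != 0); same structure, no initial branch */
static unsigned bits_do_while(uint32_t n)
{
    unsigned count = 0;
body:
    count++;
    n >>= 1;
    if (n != 0) goto body;
    return count;
}
```

The only difference shows for an input of 0: the while version skips the body and returns 0, whereas the do … while version executes the body once and returns 1.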
Branch & Subroutine
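▪ B <label>
▪ PC relative. ±16 Mbyte range in the Thumb-2 instruction set of the Cortex-M4 (±32 Mbyte in the ARM instruction set).

▪ BL <subroutine>
▪ Stores the return address in LR
▪ Returning is implemented by restoring the PC from LR (for example with BX LR)
▪ For non-leaf functions, LR will have to be stacked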
Binary Upwards Compatibility

[Figure: instruction sets of the ARMv6-M architecture and the ARMv7-M architecture]
APPENDIX:
ARM CORTEX-M4 ASSEMBLER
INSTRUCTIONS LIST
Cortex-M4 Instruction Set
Mnemonic Operands Brief description Flags
ADC, ADCS {Rd,} Rn, Op2 Add with Carry N,Z,C,V
ADD, ADDS {Rd,} Rn, Op2 Add N,Z,C,V
ADD, ADDW {Rd,} Rn, #imm12 Add N,Z,C,V
ADR Rd, label Load PC-relative Address
AND, ANDS {Rd,} Rn, Op2 Logical AND N,Z,C
ASR, ASRS Rd, Rm, <Rs|#n> Arithmetic Shift Right N,Z,C
B label Branch
BFC Rd, #lsb, #width Bit Field Clear
BFI Rd, Rn, #lsb, #width Bit Field Insert
BIC, BICS {Rd,} Rn, Op2 Bit Clear N,Z,C
BKPT #imm Breakpoint
BL label Branch with Link
BLX Rm Branch indirect with Link
BX Rm Branch indirect


CBNZ Rn, label Compare and Branch if Non Zero

CBZ Rn, label Compare and Branch if Zero

CLREX Clear Exclusive

CLZ Rd, Rm Count Leading Zeros

CMN Rn, Op2 Compare Negative N,Z,C,V

CMP Rn, Op2 Compare N,Z,C,V

CPSID i Change Processor State, Disable Interrupts

CPSIE i Change Processor State, Enable Interrupts

DMB Data Memory Barrier

DSB Data Synchronization Barrier

EOR, EORS {Rd,} Rn, Op2 Exclusive OR N,Z,C

ISB Instruction Synchronization Barrier


IT If-Then condition block

LDM Rn{!}, reglist Load Multiple registers, increment after

LDMDB, LDMEA Rn{!}, reglist Load Multiple registers, decrement before

LDMFD, LDMIA Rn{!}, reglist Load Multiple registers, increment after

LDR Rt, [Rn, #offset] Load Register with word

LDRB, LDRBT Rt, [Rn, #offset] Load Register with byte

LDRD Rt, Rt2, [Rn, #offset] Load Register with two words

LDREX Rt, [Rn, #offset] Load Register Exclusive

LDREXB Rt, [Rn] Load Register Exclusive with Byte

LDREXH Rt, [Rn] Load Register Exclusive with Halfword

LDRH, LDRHT Rt, [Rn, #offset] Load Register with Halfword

LDRSB, LDRSBT Rt, [Rn, #offset] Load Register with Signed Byte

LDRSH, LDRSHT Rt, [Rn, #offset] Load Register with Signed Halfword

LDRT Rt, [Rn, #offset] Load Register with word

LSL, LSLS Rd, Rm, <Rs|#n> Logical Shift Left N,Z,C

LSR, LSRS Rd, Rm, <Rs|#n> Logical Shift Right N,Z,C

MLA Rd, Rn, Rm, Ra Multiply with Accumulate, 32-bit result

MLS Rd, Rn, Rm, Ra Multiply and Subtract, 32-bit result

MOV, MOVS Rd, Op2 Move N,Z,C

MOVT Rd, #imm16 Move Top

MOVW, MOV Rd, #imm16 Move 16-bit constant N,Z,C

MRS Rd, spec_reg Move from Special Register to general register

MSR spec_reg, Rm Move from general register to Special Register N,Z,C,V

MUL, MULS {Rd,} Rn, Rm Multiply, 32-bit result N,Z

MVN, MVNS Rd, Op2 Move NOT N,Z,C

NOP No Operation

ORN, ORNS {Rd,} Rn, Op2 Logical OR NOT N,Z,C

ORR, ORRS {Rd,} Rn, Op2 Logical OR N,Z,C

PKHTB, PKHBT {Rd, } Rn, Rm, Op2 Pack Halfword

POP reglist Pop registers from stack

PUSH reglist Push registers onto stack

QADD {Rd, } Rn, Rm Saturating Add Q

QADD16 {Rd, } Rn, Rm Saturating Add 16

QADD8 {Rd, } Rn, Rm Saturating Add 8

QASX {Rd, } Rn, Rm Saturating Add and Subtract with Exchange

QDADD {Rd, } Rn, Rm Saturating double and Add Q

QDSUB {Rd, } Rn, Rm Saturating double and Subtract Q

QSAX {Rd, } Rn, Rm Saturating Subtract and Add with Exchange

QSUB {Rd, } Rn, Rm Saturating Subtract Q

QSUB16 {Rd, } Rn, Rm Saturating Subtract 16

QSUB8 {Rd, } Rn, Rm Saturating Subtract 8

RBIT Rd, Rn Reverse Bits

REV Rd, Rn Reverse byte order in a word

REV16 Rd, Rn Reverse byte order in each halfword

REVSH Rd, Rn Reverse byte order in bottom halfword and sign extend

ROR, RORS Rd, Rm, <Rs|#n> Rotate Right N,Z,C

RRX, RRXS Rd, Rm Rotate Right with Extend N,Z,C

RSB, RSBS {Rd,} Rn, Op2 Reverse Subtract N,Z,C,V

SADD16 {Rd, } Rn, Rm Signed Add 16 GE

SADD8 {Rd, } Rn, Rm Signed Add 8 GE

SASX {Rd, } Rn, Rm Signed Add and Subtract with Exchange GE

SBC, SBCS {Rd,} Rn, Op2 Subtract with Carry N,Z,C,V

SBFX Rd, Rn, #lsb, #width Signed Bit Field Extract

SDIV {Rd,} Rn, Rm Signed Divide

SEV Send Event

SHADD16 {Rd,} Rn, Rm Signed Halving Add 16

SHADD8 {Rd,} Rn, Rm Signed Halving Add 8

SHASX {Rd,} Rn, Rm Signed Halving Add and Subtract with Exchange


SHSAX {Rd,} Rn, Rm Signed Halving Subtract and Add with Exchange

SHSUB16 {Rd,} Rn, Rm Signed Halving Subtract 16

SHSUB8 {Rd,} Rn, Rm Signed Halving Subtract 8


SMLABB, SMLABT, SMLATB, SMLATT Rd, Rn, Rm, Ra Signed Multiply Accumulate (halfwords) Q

SMLAD, SMLADX Rd, Rn, Rm, Ra Signed Multiply Accumulate Dual Q

SMLAL RdLo, RdHi, Rn, Rm Signed Multiply with Accumulate (32 x 32 + 64), 64-bit result

SMLALBB, SMLALBT, SMLALTB, SMLALTT RdLo, RdHi, Rn, Rm Signed Multiply Accumulate Long, halfwords

SMLALD, SMLALDX RdLo, RdHi, Rn, Rm Signed Multiply Accumulate Long Dual

SMLAWB, SMLAWT Rd, Rn, Rm, Ra Signed Multiply Accumulate, word by halfword Q

SMLSD Rd, Rn, Rm, Ra Signed Multiply Subtract Dual Q

SMLSLD RdLo, RdHi, Rn, Rm Signed Multiply Subtract Long Dual

SMMLA Rd, Rn, Rm, Ra Signed Most significant word Multiply Accumulate


SMMLS, SMMLSR Rd, Rn, Rm, Ra Signed Most significant word Multiply Subtract

SMMUL, SMMULR {Rd,} Rn, Rm Signed Most significant word Multiply

SMUAD {Rd,} Rn, Rm Signed dual Multiply Add Q


SMULBB, SMULBT, SMULTB, SMULTT {Rd,} Rn, Rm Signed Multiply (halfwords)

SMULL RdLo, RdHi, Rn, Rm Signed Multiply (32 x 32), 64-bit result

SMULWB, SMULWT {Rd,} Rn, Rm Signed Multiply word by halfword

SMUSD, SMUSDX {Rd,} Rn, Rm Signed dual Multiply Subtract


SSAT Rd, #n, Rm {,shift #s} Signed Saturate Q

SSAT16 Rd, #n, Rm Signed Saturate 16 Q

SSAX {Rd,} Rn, Rm Signed Subtract and Add with Exchange GE

SSUB16 {Rd,} Rn, Rm Signed Subtract 16

SSUB8 {Rd,} Rn, Rm Signed Subtract 8

STM Rn{!}, reglist Store Multiple registers, increment after

STMDB, STMFD Rn{!}, reglist Store Multiple registers, decrement before

STMEA, STMIA Rn{!}, reglist Store Multiple registers, increment after

STR Rt, [Rn, #offset] Store Register word

STRB, STRBT Rt, [Rn, #offset] Store Register byte

STRD Rt, Rt2, [Rn, #offset] Store Register two words

STREX Rd, Rt, [Rn, #offset] Store Register Exclusive

STREXB Rd, Rt, [Rn] Store Register Exclusive Byte

STREXH Rd, Rt, [Rn] Store Register Exclusive Halfword

STRH, STRHT Rt, [Rn, #offset] Store Register Halfword

STRT Rt, [Rn, #offset] Store Register word

SUB, SUBS {Rd,} Rn, Op2 Subtract N,Z,C,V

SUB, SUBW {Rd,} Rn, #imm12 Subtract N,Z,C,V

SVC #imm Supervisor Call

SXTAB {Rd,} Rn, Rm {,ROR #n} Extend 8 bits to 32 and add

SXTAB16 {Rd,} Rn, Rm {,ROR #n} Dual extend 8 bits to 16 and add

SXTAH {Rd,} Rn, Rm {,ROR #n} Extend 16 bits to 32 and add

SXTB16 {Rd,} Rm {,ROR #n} Signed Extend Byte 16


SXTB {Rd,} Rm {,ROR #n} Sign extend a byte

SXTH {Rd,} Rm {,ROR #n} Sign extend a halfword

TBB [Rn, Rm] Table Branch Byte

TBH [Rn, Rm, LSL #1] Table Branch Halfword

TEQ Rn, Op2 Test Equivalence N,Z,C

TST Rn, Op2 Test N,Z,C


UADD16 {Rd,} Rn, Rm Unsigned Add 16 GE

UADD8 {Rd,} Rn, Rm Unsigned Add 8 GE

USAX {Rd,} Rn, Rm Unsigned Subtract and Add with Exchange GE

UHADD16 {Rd,} Rn, Rm Unsigned Halving Add 16

UHADD8 {Rd,} Rn, Rm Unsigned Halving Add 8

UHASX {Rd,} Rn, Rm Unsigned Halving Add and Subtract with Exchange

UHSAX {Rd,} Rn, Rm Unsigned Halving Subtract and Add with Exchange

UHSUB16 {Rd,} Rn, Rm Unsigned Halving Subtract 16

UHSUB8 {Rd,} Rn, Rm Unsigned Halving Subtract 8


UBFX Rd, Rn, #lsb, #width Unsigned Bit Field Extract

UDIV {Rd,} Rn, Rm Unsigned Divide

UMAAL RdLo, RdHi, Rn, Rm Unsigned Multiply Accumulate Accumulate Long (32 x 32 + 32 + 32), 64-bit result

UMLAL RdLo, RdHi, Rn, Rm Unsigned Multiply with Accumulate (32 x 32 + 64), 64-bit result

UMULL RdLo, RdHi, Rn, Rm Unsigned Multiply (32 x 32), 64-bit result

UQADD16 {Rd,} Rn, Rm Unsigned Saturating Add 16

UQADD8 {Rd,} Rn, Rm Unsigned Saturating Add 8

UQASX {Rd,} Rn, Rm Unsigned Saturating Add and Subtract with Exchange

UQSAX {Rd,} Rn, Rm Unsigned Saturating Subtract and Add with Exchange

UQSUB16 {Rd,} Rn, Rm Unsigned Saturating Subtract 16

UQSUB8 {Rd,} Rn, Rm Unsigned Saturating Subtract 8

USAD8 {Rd,} Rn, Rm Unsigned Sum of Absolute Differences

USADA8 {Rd,} Rn, Rm, Ra Unsigned Sum of Absolute Differences and Accumulate
USAT Rd, #n, Rm {,shift #s} Unsigned Saturate Q

USAT16 Rd, #n, Rm Unsigned Saturate 16 Q


UASX {Rd,} Rn, Rm Unsigned Add and Subtract with Exchange GE

USUB16 {Rd,} Rn, Rm Unsigned Subtract 16 GE

USUB8 {Rd,} Rn, Rm Unsigned Subtract 8 GE

UXTAB {Rd,} Rn, Rm {,ROR #n} Rotate, extend 8 bits to 32 and Add

UXTAB16 {Rd,} Rn, Rm {,ROR #n} Rotate, dual extend 8 bits to 16 and Add

UXTAH {Rd,} Rn, Rm {,ROR #n} Rotate, unsigned extend and Add Halfword

UXTB {Rd,} Rm {,ROR #n} Zero extend a Byte

UXTB16 {Rd,} Rm {,ROR #n} Unsigned Extend Byte 16


UXTH {Rd,} Rm {,ROR #n} Zero extend a Halfword

VABS.F32 Sd, Sm Floating-point Absolute

VADD.F32 {Sd,} Sn, Sm Floating-point Add


VCMP.F32 Sd, <Sm | #0.0> Compare two floating-point registers, or one floating-point register and zero FPSCR

VCMPE.F32 Sd, <Sm | #0.0> Compare two floating-point registers, or one floating-point register and zero with Invalid Operation check FPSCR

VCVT.S32.F32 Sd, Sm Convert between floating-point and integer

VCVT.S16.F32 Sd, Sd, #fbits Convert between floating-point and fixed point

VCVTR.S32.F32 Sd, Sm Convert between floating-point and integer with rounding

VCVT<B|T>.F32.F16 Sd, Sm Converts half-precision value to single-precision

VCVT<B|T>.F16.F32 Sd, Sm Converts single-precision register to half-precision

VDIV.F32 {Sd,} Sn, Sm Floating-point Divide

VFMA.F32 {Sd,} Sn, Sm Floating-point Fused Multiply Accumulate

VFNMA.F32 {Sd,} Sn, Sm Floating-point Fused Negate Multiply Accumulate

VFMS.F32 {Sd,} Sn, Sm Floating-point Fused Multiply Subtract

VFNMS.F32 {Sd,} Sn, Sm Floating-point Fused Negate Multiply Subtract

VLDM.F<32|64> Rn{!}, list Load Multiple extension registers


VLDR.F<32|64> <Dd|Sd>, [Rn] Load an extension register from memory

VMLA.F32 {Sd,} Sn, Sm Floating-point Multiply Accumulate

VMLS.F32 {Sd,} Sn, Sm Floating-point Multiply Subtract

VMOV.F32 Sd, #imm Floating-point Move immediate

VMOV Sd, Sm Floating-point Move register

VMOV Sn, Rt Copy ARM core register to single precision

VMOV Sm, Sm1, Rt, Rt2 Copy 2 ARM core registers to 2 single precision

VMOV Dd[x], Rt Copy ARM core register to scalar

VMOV Rt, Dn[x] Copy scalar to ARM core register

VMRS Rt, FPSCR Move FPSCR to ARM core register or APSR N,Z,C,V

VMSR FPSCR, Rt Move to FPSCR from ARM Core register FPSCR

VMUL.F32 {Sd,} Sn, Sm Floating-point Multiply


VNEG.F32 Sd, Sm Floating-point Negate

VNMLA.F32 Sd, Sn, Sm Floating-point Multiply and Add

VNMLS.F32 Sd, Sn, Sm Floating-point Multiply and Subtract

VNMUL {Sd,} Sn, Sm Floating-point Multiply

VPOP list Pop extension registers

VPUSH list Push extension registers

VSQRT.F32 Sd, Sm Calculates floating-point Square Root

VSTM Rn{!}, list Floating-point register Store Multiple

VSTR.F<32|64> Sd, [Rn] Stores an extension register to memory

VSUB.F<32|64> {Sd,} Sn, Sm Floating-point Subtract


WFE Wait For Event

WFI Wait For Interrupt

Note: full explanation of each instruction can be found in Cortex-M4 Devices’ Generic User Guide (Ref-4)
