
AES Module 4 Notes

The document provides an overview of ARM Cortex M3 assembly language syntax, including instruction formats, operand types, and examples of moving data. It explains the Unified Assembler Language (UAL) and its advantages for using both 16-bit and 32-bit instructions, along with various instruction categories such as data movement, pseudo-instructions, and data processing operations. Additionally, it details memory access instructions, including loading and storing data, as well as stack operations and special register access.

Uploaded by

tubashazmeen02

17EC62

Module – 4
ARM Cortex M3 Instruction Sets and Programming

Assembly language syntax in ARM Cortex M-3


An ARM instruction will have this format.

Label: opcode operand1, operand2, ... ; Comments

 The label is optional. Some of the instructions might have a label in front of them so that the
address of the instructions can be determined using the label.
 The opcode (the instruction) is followed by a number of operands. Suffixes can sometimes be added to the opcode to indicate, for example, multiple transfers (M), size of data (B, H, W, D), updating of register contents (!), increment after (IA), or decrement before (DB).
 The first operand is the destination of the operation.
 The number of operands in an instruction depends on the type of instruction, and the syntax
format of the operand can also be different.
Examples:
i) Moving immediate data:
MOV R0, #0x12 ; Set R0 = 0x12 (hexadecimal)
MOV R1, #'A' ; Set R1 = ASCII character A

ii) Constants using EQU:

NVIC_IRQ_SETEN0 EQU 0xE000E100;

iii)Use of DCI, DCB, DCD:


 DCI (Define Constant Instruction) can be used to code an instruction if the assembler cannot generate the exact instruction that is required, provided the programmer knows the binary code for the instruction.
DCI 0xBE00 ; Opcode for Breakpoint (BKPT 0)
 DCB (Define Constant Byte) is for defining byte-size constant values, such as characters.
HELLO_TXT DCB "Hello\n",0 ; HELLO_TXT = null-terminated string "Hello\n"
 DCD (Define Constant Data) is for defining word-size constant values, used to define binary data in your code.
MY_NUMBER DCD 0x12345678 ; MY_NUMBER = 0x12345678
Various other instructions will be explained later on

Unified Assembler Language


 In order to get the best out of the Thumb®-2 instruction set, the Unified Assembler Language
(UAL) was developed to allow selection of 16-bit and 32-bit instructions and to make it easier to
port applications between ARM code and Thumb code by using the same syntax for both.
 With UAL, the syntax of Thumb instructions is now the same as ARM instructions. Example:
ADD R0, R1 ; R0 = R0 + R1, using traditional Thumb syntax.
ADD R0, R0, R1 ; Equivalent instruction using UAL syntax.
 The traditional Thumb syntax can still be used. The choice between whether the instructions are
interpreted as traditional Thumb code or the new UAL syntax is normally defined by the
directive in the assembly file.
 For example, with ARM assembler tool, a program code header
“CODE16” directive implies the code is in the traditional Thumb
syntax, “THUMB” directive implies the code is in the new UAL
syntax.
 In traditional Thumb syntax, some instructions change the flags in the APSR even if the S suffix is not used.
 In UAL syntax, an instruction changes the flags only if the S suffix is used. Example:
AND R0, R1 ; Traditional Thumb syntax: the flags are updated after the AND operation; no suffix is needed.
ANDS R0, R0, R1 ; Equivalent UAL syntax: the S suffix is added to update the flags.
 With the new instructions in Thumb-2 technology, some of the operations can be handled by
either a Thumb instruction or a Thumb-2 instruction.
 For example, R0 = R0 + 1 can be implemented as a 16-bit Thumb instruction or a 32-bit Thumb-
2 instruction. With UAL, programmer can specify which instruction is to be used by adding
suffixes:
ADDS R0, #1 ; Use 16-bit Thumb instruction by default for smaller size
ADDS.N R0, #1 ; Use 16-bit Thumb instruction (N=Narrow)
ADDS.W R0, #1 ; Use 32-bit Thumb-2 instruction (W=wide)

 If no suffix is given, the assembler tool can choose either instruction but usually defaults to 16-
bit Thumb code to get a smaller size. Depending on tool support, programmer can use the .N
(narrow) suffix to specify a 16-bit Thumb instruction.

Instruction List
The Cortex-M3 instructions can be classified into the following:
1. Moving Data
2. Pseudo-Instructions
3. Data processing instructions
4. Call & Unconditional Branch Instructions
5. Decision & Conditional Branch Instructions
6. Combined Compare & Conditional branch Instructions
7. Instruction Barrier & Memory Barrier Instructions
8. Saturation operation Instructions
9. Useful Thumb-2 instruction


Move instructions
These can be grouped into the following categories:
 Moving Data Between Register And Register
 Moving An Immediate Data Value Into A Register
 Moving Data Between Memory And Register
 Moving Data Between Special Register And Register

(a) Moving Data Between Register And Register


MOV R8, R3; data from R3 goes into R8.

(b) Moving An Immediate Data Value Into A Register

MOV R0, #0x12 ; 0x12 goes into R0.
MOV R1, #'A' ; ASCII value of A goes into R1.
MOVS R0, #0x78 ; 0x78 goes into R0. The 16-bit Thumb MOVS takes an immediate of 8 bits or fewer; the S suffix means the flags are updated.
MOVW.W R0, #0x789A ; Thumb-2 instruction. Moves the 16-bit immediate into R0, giving R0 = 0x0000789A.
The 32-bit value 0x3456789A can be loaded into R0 with two instructions:
MOVW.W R0, #0x789A ; Moves 0x789A into the lower 16 bits of R0 (upper 16 bits cleared).
MOVT.W R0, #0x3456 ; Moves 0x3456 into the upper 16 bits of R0 (lower 16 bits unchanged).
MOVT sets the upper 16 bits; MOVW sets the lower 16 bits.

(c) Moving Data Between Memory And Register

 The basic instructions for accessing memory are Load and Store.
 Load (LDR) transfers data from memory to registers, and Store (STR) transfers data from registers to memory.
 The transfers can be in different data sizes (byte, half word, word, and double word), as outlined in the table below.
LDRB Rd, [Rn, #offset] ; Read byte from memory location Rn + offset
LDRH Rd, [Rn, #offset] ; Read half word from memory location Rn + offset
LDR Rd, [Rn, #offset] ; Read word from memory location Rn + offset
LDRD Rd1, Rd2, [Rn, #offset] ; Read double word from memory location Rn + offset
STRB Rd, [Rn, #offset] ; Store byte to memory location Rn + offset
STRH Rd, [Rn, #offset] ; Store half word to memory location Rn + offset
STR Rd, [Rn, #offset] ; Store word to memory location Rn + offset
STRD Rd1, Rd2, [Rn, #offset] ; Store double word to memory location Rn + offset

 Multiple Load and Store operations can be combined into single instructions called LDM (Load Multiple) and STM (Store Multiple), as outlined below.
 The exclamation mark (!) in the instruction specifies whether the base register Rd should be updated after the instruction is completed. For example, if R8 holds 0x8000:
STMIA.W R8!, {R0-R3} ; R8 changed to 0x8010 after store (incremented by 4 words)
STMIA.W R8, {R0-R3} ; R8 unchanged after store
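
The effect of the write-back mark can be sketched as a small Python model. It assumes R8 starts at 0x8000 and uses a dictionary as a stand-in for memory; the function name stmia and the stored values are hypothetical.

```python
def stmia(memory: dict, base: int, values: list, writeback: bool) -> int:
    """Store each value at increasing word addresses; return the new base register value."""
    addr = base
    for value in values:
        memory[addr] = value
        addr += 4  # one word is 4 bytes
    return addr if writeback else base


memory = {}
r8 = stmia(memory, 0x8000, [10, 11, 12, 13], writeback=True)   # like STMIA.W R8!, {R0-R3}
print(hex(r8))  # 0x8010
```

Four words are 16 bytes, which is where the 0x10 increment in the example comes from.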
 ARM processors also support memory accesses with preindexing and postindexing.
(a) Pre-Indexing
 Pre-indexing load instructions for various sizes (word, byte, half word, and double word)
LDR.W Rd, [Rn, #offset]!
LDRB.W Rd, [Rn, #offset]!
LDRH.W Rd, [Rn, #offset]!
LDRD.W Rd1, Rd2,[Rn, #offset]!
 Pre-indexing load instructions for various sizes with sign extend (byte, half word)
LDRSB.W Rd, [Rn, #offset]!
LDRSH.W Rd, [Rn, #offset]!
 Pre-indexing store instructions for various sizes (word, byte, half word, and double word)
STR.W Rd, [Rn, #offset]!
STRB.W Rd, [Rn, #offset]!
STRH.W Rd, [Rn, #offset]!
STRD.W Rd1, Rd2, [Rn, #offset]!
 For pre-indexing, the register holding the memory address is adjusted and the memory transfer then takes place with the updated address.
For example,
LDR.W R0, [R1, #offset]! ; Read memory[R1+offset], with R1 updated to R1+offset
 The Cortex-M3 also supports multiple memory load and store operations with increment-after and decrement-before addressing, as listed below.
LDMIA Rd!, <reg list> ; Read multiple words from the memory location specified by Rd; address incremented after (IA) each transfer (16-bit Thumb instruction)
STMIA Rd!, <reg list> ; Store multiple words to the memory location specified by Rd; address incremented after (IA) each transfer (16-bit Thumb instruction)
LDMIA.W Rd(!), <reg list> ; Read multiple words from the memory location specified by Rd; address incremented after each read (.W specifies a 32-bit Thumb-2 instruction)
LDMDB.W Rd(!), <reg list> ; Read multiple words from the memory location specified by Rd; address decremented before (DB) each read (.W specifies a 32-bit Thumb-2 instruction)
STMIA.W Rd(!), <reg list> ; Write multiple words to the memory location specified by Rd; address incremented after each write (.W specifies a 32-bit Thumb-2 instruction)
STMDB.W Rd(!), <reg list> ; Write multiple words to the memory location specified by Rd; address decremented before (DB) each write (.W specifies a 32-bit Thumb-2 instruction)

 Postindexing memory access instructions are also available.
 Postindexing memory access instructions carry out the memory transfer using the base address specified by the register and then update the address register afterward.
For example,
LDR.W R0, [R1], #offset ; Read memory[R1], with R1 updated to R1+offset
 Postindexing load instructions for various sizes (word, byte, half word, and double word)
LDR.W Rd, [Rn], #offset
LDRB.W Rd, [Rn], #offset
LDRH.W Rd, [Rn], #offset
LDRD.W Rd1, Rd2, [Rn], #offset
 Postindexing load instructions for various sizes with sign extend (byte, half word)
LDRSB.W Rd, [Rn], #offset
LDRSH.W Rd, [Rn], #offset
 Postindexing store instructions for various sizes (word, byte, half word, and double word)
STR.W Rd, [Rn], #offset
STRB.W Rd, [Rn], #offset
STRH.W Rd, [Rn], #offset
STRD.W Rd1, Rd2, [Rn], #offset
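
The difference between pre-indexing and postindexing is only which address is used for the transfer; the base register ends up the same. A minimal Python sketch follows, using a dictionary as a hypothetical stand-in for memory; the helper names ldr_pre and ldr_post and the sample addresses are assumptions for the illustration.

```python
def ldr_pre(memory: dict, rn: int, offset: int):
    """Model LDR Rd, [Rn, #offset]! : update Rn first, then load from the new address."""
    rn = rn + offset
    return memory[rn], rn      # (value loaded into Rd, updated Rn)


def ldr_post(memory: dict, rn: int, offset: int):
    """Model LDR Rd, [Rn], #offset : load from the old address, then update Rn."""
    value = memory[rn]
    return value, rn + offset  # (value loaded into Rd, updated Rn)


memory = {0x1000: 0xAAAA, 0x1004: 0xBBBB}
print(ldr_pre(memory, 0x1000, 4))   # loads the word at 0x1004
print(ldr_post(memory, 0x1000, 4))  # loads the word at 0x1000
```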

 Accessing Stack memory Locations through PUSH & POP


PUSH {R0, R4-R7, R9} ; Push R0, R4, R5, R6, R7, R9 into stack memory
POP {R2,R3} ; Pop R2 and R3 from stack
 Usually a PUSH instruction will have a corresponding POP with the same register list, but
this is not always necessary.
 For example, a common exception is when POP is used as a function return:
PUSH {R0-R3, LR} ; Save register contents at beginning of subroutine
... ; Processing
POP {R0-R3, PC} ; Restore registers and return
 In this case, instead of popping the LR register back and then branching to the address in
LR, one can POP the address value directly in the program counter.
(d) Moving Data Between Special Register And Register
To access the special registers, one can use the instructions MRS and MSR. For example,
MRS R0, PSR ; Read processor status word into R0
MSR CONTROL, R1 ; Write value of R1 into control register
 Apart from the APSR, the special registers can be accessed with MSR and MRS only in privileged mode.

Pseudo-Instructions
Both LDR and ADR pseudo-instructions can be used to set registers to a program address value.
They have different syntaxes and behaviors.
 LDR obtains the immediate data by putting the data in the program code and uses a PC relative
load to get the data into the register.
 ADR tries to generate the immediate value by adding to or subtracting from a register value (for example, the current PC value).
 As a result, it is not possible to create all immediate values using ADR, and the target address label must be within a close range. However, using ADR can generate smaller code than LDR.
 For LDR, if the address is a program address value, the assembler will automatically set the LSB to 1.
LDR R0, =address1 ; R0 set to 0x4001
...
address1 ; address here is 0x4000
MOV R0, R1 ; address1 contains program code
 The LDR instruction will put 0x4001 into R0; the LSB is set to 1 to indicate that it is Thumb code.
 If address1 is a data address, the LSB will not be changed.
LDR R0, =address1 ; R0 set to 0x4000
...
address1 ; address here is 0x4000
DCD 0x0 ; address1 contains data
 For ADR, one can load the address value of a program code into a register without setting the
LSB automatically and no equal sign (=) in the ADR statement is required.
ADR R0, address1
...
address1 ; (address here is 0x4000)
MOV R0, R1 ; address1 contains program code.

Data Processing Instructions


The Cortex-M3 provides many different instructions for data processing.
(a) Arithmetic instructions include ADD, SUB, MUL, and unsigned and signed divide (UDIV/SDIV).
(b) Logical instructions include AND, OR (ORR), NOT (MVN), exclusive OR (EOR), shift, and rotate.

 The ADD instruction can operate between two registers or between one register and an immediate data value:
ADD R0, R0, R1 ; R0 = R0 + R1
ADDS R0, R0, #0x12 ; R0 = R0 + 0x12
ADD.W R0, R1, R2 ; R0 = R1 + R2
 When 16-bit Thumb code is used, an ADD instruction can change the flags in the PSR.
 However, 32-bit Thumb-2 code can either change the flags or keep them unchanged.
 To separate the two different operations, the S suffix should be used if the following operation depends on the flags:
ADD.W R0, R1, R2 ; Flags unchanged
ADDS.W R0, R1, R2 ; Flags changed

 All the Cortex-M3 arithmetic instructions are listed below:

ADD operation
ADD Rd, Rn, Rm ; Rd = Rn + Rm
ADD Rd, Rd, Rm ; Rd = Rd + Rm
ADD Rd, #immed ; Rd = Rd + #immed
ADD Rd, Rn, #immed ; Rd = Rn + #immed

ADD with carry
ADC Rd, Rn, Rm ; Rd = Rn + Rm + carry
ADC Rd, Rd, Rm ; Rd = Rd + Rm + carry
ADC Rd, #immed ; Rd = Rd + #immed + carry

ADD register with 12-bit immediate value
ADDW Rd, Rn, #immed ; Rd = Rn + #immed

SUBTRACT
SUB Rd, Rn, Rm ; Rd = Rn − Rm
SUB Rd, #immed ; Rd = Rd − #immed
SUB Rd, Rn, #immed ; Rd = Rn − #immed

SUBTRACT with borrow (not carry)
SBC Rd, Rm ; Rd = Rd − Rm − borrow
SBC.W Rd, Rn, #immed ; Rd = Rn − #immed − borrow
SBC.W Rd, Rn, Rm ; Rd = Rn − Rm − borrow

Reverse subtract
RSB.W Rd, Rn, #immed ; Rd = #immed − Rn
RSB.W Rd, Rn, Rm ; Rd = Rm − Rn

Multiply
MUL Rd, Rm ; Rd = Rd * Rm
MUL.W Rd, Rn, Rm ; Rd = Rn * Rm

Unsigned division
UDIV Rd, Rn, Rm ; Rd = Rn / Rm

Signed division
SDIV Rd, Rn, Rm ; Rd = Rn / Rm

32-bit multiply instructions for signed values
SMULL RdLo, RdHi, Rn, Rm ; {RdHi,RdLo} = Rn * Rm
SMLAL RdLo, RdHi, Rn, Rm ; {RdHi,RdLo} += Rn * Rm

32-bit multiply instructions for unsigned values
UMULL RdLo, RdHi, Rn, Rm ; {RdHi,RdLo} = Rn * Rm
UMLAL RdLo, RdHi, Rn, Rm ; {RdHi,RdLo} += Rn * Rm

The available logic operation instructions are listed below. The .W suffix indicates a 32-bit operation; without .W the instruction is a 16-bit operation.

Bitwise AND
AND Rd, Rn ; Rd = Rd & Rn
AND.W Rd, Rn, #imm ; Rd = Rn & #imm
AND.W Rd, Rn, Rm ; Rd = Rn & Rm

Bitwise OR
ORR Rd, Rn ; Rd = Rd | Rn
ORR.W Rd, Rn, #imm ; Rd = Rn | #imm
ORR.W Rd, Rn, Rm ; Rd = Rn | Rm

Bit clear
BIC Rd, Rn ; Rd = Rd & (~Rn)
BIC.W Rd, Rn, #imm ; Rd = Rn & (~#imm)
BIC.W Rd, Rn, Rm ; Rd = Rn & (~Rm)

Bitwise OR NOT
ORN.W Rd, Rn, #imm ; Rd = Rn | (~#imm)
ORN.W Rd, Rn, Rm ; Rd = Rn | (~Rm)

Bitwise Exclusive OR
EOR Rd, Rn ; Rd = Rd ^ Rn
EOR.W Rd, Rn, #imm ; Rd = Rn ^ #imm
EOR.W Rd, Rn, Rm ; Rd = Rn ^ Rm

Arithmetic shift right
ASR Rd, Rn, #immed ; Rd = Rn >> immed
ASR Rd, Rn ; Rd = Rd >> Rn
ASR.W Rd, Rn, Rm ; Rd = Rn >> Rm

Logical shift left
LSL Rd, Rn, #immed ; Rd = Rn << immed
LSL Rd, Rn ; Rd = Rd << Rn
LSL.W Rd, Rn, Rm ; Rd = Rn << Rm

Logical shift right
LSR Rd, Rn, #immed ; Rd = Rn >> immed
LSR Rd, Rn ; Rd = Rd >> Rn
LSR.W Rd, Rn, Rm ; Rd = Rn >> Rm

Rotate right
ROR Rd, Rn ; Rd rot by Rn
ROR.W Rd, Rn, #imm ; Rd = Rn rot by imm
ROR.W Rd, Rn, Rm ; Rd = Rn rot by Rm

Rotate right extended
RRX.W Rd, Rn ; {C, Rd} = {Rn, C}

Why Is There Rotate Right But No Rotate Left?
 The rotate left operation can be replaced by a rotate right operation with a different rotate offset.
 For example, a rotate left by 4 bits can be written as a rotate right by 28 bits, which gives the same result and takes the same amount of time to execute.
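
The rotate-left equivalence can be checked with a short Python model of a 32-bit rotate. This is a sketch of the arithmetic only; the function names ror32 and rol32 are made up for the illustration.

```python
MASK32 = 0xFFFFFFFF


def ror32(value: int, n: int) -> int:
    """Rotate a 32-bit value right by n bits."""
    n %= 32
    value &= MASK32
    return ((value >> n) | (value << (32 - n))) & MASK32


def rol32(value: int, n: int) -> int:
    """Rotate left by n bits, expressed as a rotate right by 32 - n bits."""
    return ror32(value, (32 - n) % 32)


print(hex(rol32(0x12345678, 4)))   # 0x23456781
print(hex(ror32(0x12345678, 28)))  # 0x23456781
```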

Illustration of Shift and Rotate instructions

Call and Unconditional Branch


 The main unconditional branch instructions are:
B label ; Branch to a labeled address
BX reg ; Branch to an address specified by a register
BL label ; Branch to a labeled address and save the return address in LR (BL means Branch and Link)
BLX reg ; Branch to an address specified by a register and save the return address in LR
 In BX instructions, the LSB of the register determines the processor state. Because the Cortex-M3 is always in Thumb state, this bit should be set to 1.
 If it is zero, the program will cause a usage fault exception because it is trying to switch the processor into ARM state.
 After BL or BLX, the return address is stored in the link register (LR), and the function can be terminated using BX LR, which returns program control to the calling process.
 When using BLX, make sure that the LSB of the register is 1. Otherwise, the processor will produce a fault exception because it is an attempt to switch to the ARM state.
Example: Save the LR if You Need to Call a Subroutine
 The BL instruction will destroy the current content of the LR. So, if the program code needs the LR later, one should save the LR before using BL.
 The common method is to push the LR to the stack at the beginning of your subroutine. For example,
main ()
...
BL functionA
...
functionA
PUSH {LR} ; Save LR content to stack
...
BL functionB
...
POP {PC} ; Use stacked LR content to return to main
functionB
PUSH {LR}
...
POP {PC} ; Use stacked LR content to return to functionA

 In addition, if the subroutine you call is a C function, you might also need to save the contents in
R0–R3 and R12 if these values will be needed at a later stage.

Decisions and Conditional Branches


 Most conditional branches in ARM processors use flags in the APSR to determine whether a
branch should be carried out.
 In the APSR, there are five flag bits; four of them are used for branch decisions:
N: Negative flag (last operation result is a negative value)
Z: Zero flag (last operation result is zero)
C: Carry flag (last operation returned a carry out or borrow)
V: Overflow flag (last operation resulted in an overflow)
 The fifth flag, at bit[27], is the Q flag for saturation math operations. It cannot be used as a conditional branching flag.

 The branch instructions are given below.


BEQ label ; Branch to address 'label' if Z flag is set.
 Thumb-2 version
BEQ.W label ; Branch to address 'label' if Z flag is set.


 The defined branch conditions can also be used in IF-THEN-ELSE structures.


 For example,
CMP R0, R1 ; Compare R0 and R1
ITTEE GT ; If R0 > R1, the first two statements execute; otherwise the other two execute
MOVGT R2, R0 ; R2 = R0
MOVGT R3, R1 ; R3 = R1
MOVLE R2, R1 ; Else R2 = R1
MOVLE R3, R0 ; R3 = R0

Combined Compare and Conditional Branch


 Two new instructions are provided on the Cortex-M3 to supply a simple compare with zero and
conditional branch operations.
 These are CBZ (compare and branch if zero) and CBNZ (compare and branch if nonzero).
 These compare and branch instructions only support forward branches.


Conditional Execution Using IT Instructions (IF-THEN)


 The IT (IF-THEN) block is very useful for handling small conditional code.
 It avoids branch penalties as there is no change to program flow.
 It can provide a maximum of four conditionally executed instructions.
 In IT instruction blocks, the first line must be the IT instruction, detailing the choice of
execution, followed by the condition it checks.
 The first statement after the IT instruction must be TRUE-THEN-EXECUTE. The IT instruction is therefore always written as ITxyz, where each of x, y, and z is T (THEN) or E (ELSE).
 The second through fourth statements can be either THEN (true) or ELSE (false):

IT<x><y><z> <cond> ; IT instruction (<x>, <y>, <z> can be T or E)
instr1<cond> <operands> ; 1st instruction (<cond> must be same as IT)
instr2<cond or not cond> <operands> ; 2nd instruction (can be <cond> or <!cond>)
instr3<cond or not cond> <operands> ; 3rd instruction (can be <cond> or <!cond>)
instr4<cond or not cond> <operands> ; 4th instruction (can be <cond> or <!cond>)

 If a statement is to be executed when <cond> is false, the suffix for the instruction must be the
opposite of the condition. For example, the opposite of EQ is NE, the opposite of GT is LE, and
so on.
 The following code shows an example of a simple conditional execution:
if (R1<R2) then
R2=R2−R1
R2=R2/2
else
R1=R1−R2
R1=R1/2
 In assembly, the same thing can be written as
CMP R1, R2 ; If R1 < R2 (less than)
ITTEE LT ; Then execute instructions 1 and 2 (indicated by T), else execute instructions 3 and 4 (indicated by E)
SUBLT.W R2, R1 ; 1st instruction
LSRLT.W R2, #1 ; 2nd instruction
SUBGE.W R1, R2 ; 3rd instruction (notice that GE is the opposite of LT)
LSRGE.W R1, #1 ; 4th instruction

Instruction Barrier and Memory Barrier Instructions


The following are the three barrier instructions in the Cortex-M3:
 DMB (Data memory barrier): ensures that all memory accesses are completed before new
memory access is committed.
 DSB (Data synchronization barrier): ensures that all memory accesses are completed
before next instruction is executed.
 ISB (Instruction synchronization barrier): flushes the pipeline and ensures that all previous
instructions are completed before executing new instructions.

 DMB is very useful for multi-processor systems. For example, tasks running on separate
processors might use shared memory to communicate with each other. In these environments,
the order of memory accesses to the shared memory can be very important.

 DMB instructions can be inserted between accesses to the shared memory to ensure that the
memory access sequence is exactly the same as expected.

 The DSB and ISB instructions can be important for self-modifying code. For example, if a
program changes its own program code, the next executed instruction should be based on the
updated program.
 However, since the processor is pipelined, the modified instruction location might have already
been fetched. Using DSB and then ISB can ensure that the modified program code is fetched
again.
 Architecturally, the ISB instruction should be used after updating the value of the CONTROL
register.
 The memory barrier instructions can be accessed in C using Cortex Microcontroller Software Interface Standard (CMSIS) compliant device driver library functions as follows:
void __DMB(void); // Data Memory Barrier
void __DSB(void); // Data Synchronization Barrier
void __ISB(void); // Instruction Synchronization Barrier

Useful Instructions In the Cortex-M3

The Cortex-M3 has additional instructions, namely:


 Special Purpose Register Accessing Instructions(MSR,MRS),
 Reverse Instructions (REV, REVH, REVSH and RBIT),
 Data Extension Instructions(SXTB, SXTH, UXTB, and UXTH),
 Bit Field Extraction Instruction(UBFX and SBFX)
 Bit field clear and bit field insert instruction (BFC, BFI)

The MRS and MSR instructions provide access to the special registers in the Cortex-M3. Here is the syntax of
these instructions:
MRS <Rn>, <SReg> ; Move from Special Register
MSR <SReg>, <Rn> ; Write to Special Register
 They can be used to read and write the following registers:
IPSR Interrupt status register
EPSR Execution status register (read as zero)
APSR Flags from previous operation
IEPSR A composite of IPSR and EPSR
IAPSR A composite of IPSR and APSR
EAPSR A composite of EPSR and APSR
PSR A composite of APSR, EPSR, and IPSR
MSP Main stack pointer
PSP Process stack pointer
PRIMASK Normal exception mask register
BASEPRI Normal exception priority mask register
BASEPRI_MAX Same as normal exception priority mask register, with conditional write (new
priority level must be higher than the old level)
FAULTMASK Fault exception mask register (also disables normal interrupts)
CONTROL Control register

REV, REVH, and REVSH Instructions


 REV reverses the byte order in a data word, and REVH (written REV16 in UAL syntax) reverses the byte order inside each half word. For example, if R0 is 0x12345678, executing the following:
REV R1, R0
REVH R2, R0
 R1 will become 0x78563412, and R2 will be 0x34127856. REV and REVH are particularly useful for converting data between big endian and little endian.
 REVSH is similar to REVH except that it only processes the lower half word, and then it sign extends the result.
 For example, if R0 is 0x33448899, on running:
REVSH R1, R0
R1 will become 0xFFFF9988.
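
The three byte-reverse operations can be modeled in Python to confirm the example values. This is a sketch of the data manipulation only; the function names rev, revh, and revsh simply mirror the mnemonics used in these notes.

```python
def rev(x: int) -> int:
    """Reverse the byte order of a 32-bit word."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def revh(x: int) -> int:
    """Swap the two bytes inside each half word."""
    x &= 0xFFFFFFFF
    return ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)


def revsh(x: int) -> int:
    """Swap the bytes of the lower half word, then sign extend to 32 bits."""
    low = x & 0xFFFF
    swapped = ((low & 0xFF) << 8) | (low >> 8)
    if swapped & 0x8000:
        swapped |= 0xFFFF0000
    return swapped


print(hex(rev(0x12345678)), hex(revh(0x12345678)), hex(revsh(0x33448899)))
```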

Reverse Bit Instructions


 The RBIT instruction reverses the bit order in a data word. The syntax is as follows:
RBIT.W <Rd>, <Rn>
 This instruction is very useful for processing serial bit streams in data communications.

For example, if R1 is 0xB4E10C23 (binary value 1011_0100_1110_0001_0000_1100_0010_0011), on executing:
RBIT.W R0, R1
 R0 will become 0xC430872D (binary value 1100_0100_0011_0000_1000_0111_0010_1101).
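
The bit reversal can be reproduced in one line of Python by reversing the 32-character binary string. This is only a model of the result; the function name rbit is chosen to match the mnemonic.

```python
def rbit(x: int) -> int:
    """Reverse the bit order of a 32-bit word."""
    return int(format(x & 0xFFFFFFFF, "032b")[::-1], 2)


print(hex(rbit(0xB4E10C23)))  # 0xc430872d
```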

Data extension Instruction


SXTB, SXTH, UXTB, and UXTH
 The four instructions SXTB, SXTH, UXTB, and UXTH are used to extend a byte or half word
data into a word.
 The syntax of the instructions is as follows:
SXTB <Rd>, <Rn>
SXTH <Rd>, <Rn>
UXTB <Rd>, <Rn>
UXTH <Rd>, <Rn>
 For SXTB/SXTH, the data are sign extended using bit[7]/bit[15] of Rn. With UXTB and UXTH, the value is zero extended to 32-bit.
For example, if R0 is 0x55AA8765:
SXTB R1, R0 ; R1 = 0x00000065
SXTH R1, R0 ; R1 = 0xFFFF8765
UXTB R1, R0 ; R1 = 0x00000065
UXTH R1, R0 ; R1 = 0x00008765
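
Sign extension and zero extension can be sketched in Python as follows. The helper sign_extend is hypothetical; the other function names mirror the mnemonics.

```python
def sign_extend(value: int, bits: int) -> int:
    """Sign extend the lowest `bits` bits of value to a 32-bit result."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & 0xFFFFFFFF


def sxtb(rn: int) -> int:
    return sign_extend(rn, 8)    # uses bit[7] as the sign


def sxth(rn: int) -> int:
    return sign_extend(rn, 16)   # uses bit[15] as the sign


def uxtb(rn: int) -> int:
    return rn & 0xFF             # upper 24 bits become zero


def uxth(rn: int) -> int:
    return rn & 0xFFFF           # upper 16 bits become zero


r0 = 0x55AA8765
print(hex(sxtb(r0)), hex(sxth(r0)), hex(uxtb(r0)), hex(uxth(r0)))
```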

Bit Field Clear and Bit Field Insert


 Bit Field Clear (BFC) clears 1–31 adjacent bits in any position of a register. The syntax of
instruction is as follows:

BFC.W <Rd>, <#lsb>, <#width>


For example,

LDR R0,=0x1234FFFF
BFC.W R0, #4, #8

This will give R0 = 0x1234F00F.


 Bit Field Insert (BFI) copies 1–31 bits (#width) from one register to any location (#lsb) in
another register. The syntax is as follows:

BFI.W <Rd>, <Rn>, <#lsb>, <#width>


For example,
LDR R0,=0x12345678
LDR R1,=0x3355AACC
BFI.W R1, R0, #8, #16 ; Insert R0[15:0] to R1[23:8] so, R1= 0x335678CC.
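
Both operations come down to building a mask of #width ones shifted up to #lsb. A minimal Python sketch of that mask arithmetic is shown here; the function names bfc and bfi mirror the mnemonics, and the model ignores everything except the register values.

```python
def bfc(rd: int, lsb: int, width: int) -> int:
    """Clear `width` bits of rd starting at bit position `lsb`."""
    mask = ((1 << width) - 1) << lsb
    return rd & ~mask & 0xFFFFFFFF


def bfi(rd: int, rn: int, lsb: int, width: int) -> int:
    """Copy the lowest `width` bits of rn into rd at bit position `lsb`."""
    mask = ((1 << width) - 1) << lsb
    return ((rd & ~mask) | ((rn << lsb) & mask)) & 0xFFFFFFFF


print(hex(bfc(0x1234FFFF, 4, 8)))               # 0x1234f00f
print(hex(bfi(0x3355AACC, 0x12345678, 8, 16)))  # 0x335678cc
```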
Bit Field Extract Instructions (UBFX and SBFX)


 UBFX and SBFX are the unsigned and signed bit field extract instructions. The syntax of the
instructions is as follows:
UBFX.W <Rd>, <Rn>, <#lsb>, <#width>
SBFX.W <Rd>, <Rn>, <#lsb>, <#width>
 UBFX extracts a bit field from a register starting from any location (specified by #lsb) with
any width (specified by #width), zero extends it, and puts it in the destination register.

For example,

LDR R0,=0x5678ABCD
UBFX.W R1, R0, #4, #8
This will give R1 = 0x000000BC.
 Similarly, SBFX extracts a bit field, but it sign extends the field before putting it in the destination register.

For example,
LDR R0,=0x5678ABCD
SBFX.W R1, R0, #4, #8
This will give R1 = 0xFFFFFFBC.
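
The two extract instructions differ only in how the upper bits are filled. A short Python model of the examples follows; the function names ubfx and sbfx mirror the mnemonics and the model covers register values only.

```python
def ubfx(rn: int, lsb: int, width: int) -> int:
    """Extract `width` bits starting at `lsb` and zero extend."""
    return (rn >> lsb) & ((1 << width) - 1)


def sbfx(rn: int, lsb: int, width: int) -> int:
    """Extract `width` bits starting at `lsb` and sign extend to 32 bits."""
    field = ubfx(rn, lsb, width)
    sign = 1 << (width - 1)
    return ((field ^ sign) - sign) & 0xFFFFFFFF


print(hex(ubfx(0x5678ABCD, 4, 8)))  # 0xbc
print(hex(sbfx(0x5678ABCD, 4, 8)))  # 0xffffffbc
```

The extracted field 0xBC has its top bit set, so the signed version fills the upper 24 bits with ones.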

Saturation Operation instructions


 The Cortex-M3 supports two instructions that provide signed and unsigned saturation
operations: SSAT and USAT (for signed data type and unsigned data type,
respectively).
 Saturation is commonly used in signal processing—for example, in signal amplification.
 When an input signal is amplified, there is a chance that the output will be larger than the
allowed output range.
 If the value is adjusted simply by removing the unused MSBs, an overflowed result will cause the signal waveform to be completely deformed, as shown below. With a saturation instruction, the amount of distortion in the signal waveform is greatly reduced.
 When saturation occurs, the Q bit of the APSR is set.
 Format of the saturation instructions:
SSAT.W <Rd>, #<immed>, <Rn>{, <shift>} ; Saturation for a signed value
USAT.W <Rd>, #<immed>, <Rn>{, <shift>} ; Saturation of a signed value into an unsigned value
 If a 32-bit signed value is to be saturated into a 16-bit signed value, the following
instruction can be used:
SSAT.W R1, #16, R0
 If a 32-bit signed value is to be saturated into a 16-bit unsigned value, the following
instruction can be used:
USAT.W R1, #16, R0
 They can be used to convert a 32-bit integer value to 16-bit integer value.
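
Saturation is a clamp to the range that the target width can hold. The sketch below models SSAT and USAT without the optional shift, treating the input as a signed 32-bit value in both cases; each function returns the result together with a flag that says whether saturation occurred (the case in which the Q bit would be set). The function names and the sample inputs are chosen for the illustration.

```python
def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a signed value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def ssat(rn: int, bits: int):
    """Clamp a signed value to the signed range of `bits` bits."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    value = to_signed32(rn)
    clamped = max(low, min(high, value))
    return clamped & 0xFFFFFFFF, clamped != value


def usat(rn: int, bits: int):
    """Clamp a signed value to the unsigned range 0 .. 2**bits - 1."""
    value = to_signed32(rn)
    clamped = max(0, min((1 << bits) - 1, value))
    return clamped, clamped != value


print(ssat(0x00020000, 16))  # too large: clamps to 0x7FFF
print(usat(0xFFFFFFFF, 16))  # negative input: clamps to 0
```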


 The table shows impact of saturation instruction in signed data conversion.

 The table shows impact of saturation instruction in unsigned data conversion.


Memory Mapping

 The Cortex-M3 processor has a fixed memory map, which makes it easier to port software from one Cortex-M3 product to another.
 The Nested Vectored Interrupt Controller (NVIC) and Memory Protection Unit (MPU), have the
same memory locations in all Cortex-M3 products. Some of the memory locations are allocated for
private peripherals such as debugging components.
 They are located in the private peripheral memory region. These debugging components include the
following:
1. Flash Patch and Breakpoint Unit (FPB)
2. Data Watchpoint and Trace Unit (DWT)
3. Instrumentation Trace Macrocell (ITM)
4. Embedded Trace Macrocell (ETM)
5. Trace Port Interface Unit (TPIU)
6. ROM table

Predefined Memory Map of Cortex-M3


 The Cortex-M3 processor has a total of 4 GB of address space. Program code can be located in
the code region, the Static Random Access Memory (SRAM) region, or the external RAM region.


 It is good to put the program code in the code region because, the instruction fetches and data
accesses are carried out simultaneously on two separate bus interfaces.
 The SRAM memory range is for connecting internal SRAM. Access to this region is carried out via
the system interface bus. In this region, a 32-MB range is defined as a bit-band alias.
 Within the 32-MB bit-band alias memory range, each word address represents a single bit in the 1-MB bit-band region.
 A data write access to this bit-band alias memory range will be converted to an atomic READ-
MODIFY-WRITE operation to the bit-band region so as to allow a program to set or clear
individual data bits in the memory.
 The bit-band operation applies only to data accesses not instruction fetches.
 By putting Boolean information (single bits) in the bit-band region, we can pack multiple Boolean
data in a single word while still allowing them to be accessible individually via bit-band alias, thus
saving memory space without the need for handling READ-MODIFY-WRITE in software.
 A 0.5-GB block of address range is allocated to on-chip peripherals. Similar to the SRAM
region, this region supports bit-band alias and is accessed via the system bus interface but the
instruction execution in this region is not allowed.
 The bit-band support in the peripheral region makes it easy to access or change control and status
bits of peripherals, making it easier to program peripheral control.
 Two slots of 1-GB memory space are allocated for external RAM and external devices.
 It is to be noted that the program execution in the external device region is not allowed, and there
are some differences with the caching behaviors.
 The last 0.5-GB memory is for the system-level components, internal peripheral buses, external
peripheral bus, and vendor-specific system peripherals.
 There are two segments of the private peripheral bus (PPB):
1. Advanced High-Performance Bus (AHB) PPB, for Cortex-M3 internal AHB peripherals, i.e., NVIC, FPB, DWT, and ITM.
2. Advanced Peripheral Bus (APB) PPB, for Cortex-M3 internal APB devices as well as external peripherals (external to the Cortex-M3 processor).
 The Cortex-M3 allows chip vendors to add additional on-chip APB peripherals on this private
peripheral bus via an APB interface.


System Control Space

 The NVIC is located in a memory region called the system control space (SCS). Besides
providing interrupt control features, this region also provides the control registers for
SYSTICK, MPU, and code debugging control.
 The remaining unused vendor-specific memory range can be accessed via the system bus
interface, but instruction execution in this region is not allowed.

Bit-band operation
 Bit-band operation support allows a single load/store operation to access (read/write) a single
data bit.
 In the Cortex-M3, this is supported in two predefined memory regions called bit-band regions.
(a) One is located in the first 1 MB of the SRAM region.
(b) One more is located in the first 1 MB of the peripheral region.

Bit Accesses to Bit-Band Region via the Bit-Band Alias.

Write to Bit-Band Alias.

 These two memory regions can be accessed like normal memory, but they can also be accessed
via a separate memory region called the bit-band alias.
 When the bit-band alias address is used, each individual bit can be accessed separately in the
least significant bit (LSB) of each word-aligned address.


 For example, to set bit 2 in word data in address 0x20000000, instead of using three
instructions to read the data, set the bit, and then write back the result, this task can be carried
out by a single store instruction. The assembler sequences for the two cases could look like this:
Without bit-band:
LDR R0, =0x20000000 ; Setup address
LDR R1, [R0] ; Read
ORR.W R1, R1, #0x4 ; Modify bit
STR R1, [R0] ; Write back result
With bit-band:
LDR R0, =0x22000008 ; Setup address
MOV R1, #1 ; Setup data
STR R1, [R0] ; Write
 The bit-band support can simplify application code if we need to read a bit in a memory
location. For example, if we need to determine bit 2 of address 0x20000000, we use the steps
outlined here.

 The assembler sequences for the two cases could look like this:
Without bit-band:
LDR R0, =0x20000000 ; Setup address
LDR R1, [R0] ; Read
UBFX.W R1, R1, #2, #1 ; Extract bit[2]
With bit-band:
LDR R0, =0x22000008 ; Setup address
LDR R1, [R0] ; Read
 The Cortex-M3 uses the following terms for the bit-band memory addresses:
1. Bit-band region: This is a memory address region that supports bit-band operation.
2. Bit-band alias: Access to the bit-band alias will cause an access (a bit-band operation)
to the bit-band region.

 Within the bit-band region, each word is represented by the LSBs of 32 words in the bit-band
alias address range (one alias word per bit).
 When the bit-band alias address is accessed, the address is remapped into a bit-band address.
 For read operations, the word is read and the chosen bit location is shifted to the LSB of the read
return data.
 For write operations, the written bit data are shifted to the required bit position, and a READ-
MODIFY-WRITE is performed.

 There are two regions of memory for bit-band operations:


1. 0x20000000–0x200FFFFF (SRAM, 1 MB)
2. 0x40000000–0x400FFFFF (peripherals, 1 MB)
 For the SRAM memory region, the remapping of the bit-band alias is shown.

 For the peripheral memory region, the bit-band aliased addresses are remapped as shown.
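The remapping for both regions follows one rule: alias address = alias base + (byte offset into the bit-band region × 32) + (bit number × 4). The sketch below computes that address on a host machine. The function name `bitband_alias_addr` and the macro names are illustrative assumptions, not part of any vendor library; only the base addresses come from the memory map described in these notes.

```c
#include <assert.h>
#include <stdint.h>

/* Base addresses taken from the Cortex-M3 memory map in these notes. */
#define SRAM_BB_BASE    0x20000000u  /* SRAM bit-band region (1 MB)        */
#define SRAM_BB_ALIAS   0x22000000u  /* SRAM bit-band alias (32 MB)        */
#define PERIPH_BB_BASE  0x40000000u  /* peripheral bit-band region (1 MB)  */
#define PERIPH_BB_ALIAS 0x42000000u  /* peripheral bit-band alias (32 MB)  */
#define BB_REGION_SIZE  0x00100000u  /* 1 MB                               */

/*
 * Hypothetical helper: returns the alias word address for bit `bit` of the
 * data at `addr`, or 0 if `addr` is not inside a bit-band region.
 * `addr` is expected to be word aligned when `bit` is above 7.
 */
uint32_t bitband_alias_addr(uint32_t addr, unsigned bit)
{
    assert(bit < 32u);

    if (addr >= SRAM_BB_BASE && addr - SRAM_BB_BASE < BB_REGION_SIZE) {
        return SRAM_BB_ALIAS + (addr - SRAM_BB_BASE) * 32u + bit * 4u;
    }
    if (addr >= PERIPH_BB_BASE && addr - PERIPH_BB_BASE < BB_REGION_SIZE) {
        return PERIPH_BB_ALIAS + (addr - PERIPH_BB_BASE) * 32u + bit * 4u;
    }
    return 0u; /* address has no bit-band alias */
}
```

For bit 2 of address 0x20000000 this gives 0x22000008, the alias address used in the assembler sequences earlier in this section.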

 Procedure to access the bit band region:


1. Set address 0x20000000 to a value of 0x3355AACC.
2. Read address 0x22000008. This read access is remapped into read access to 0x20000000. The
return value is 1 (bit[2] of 0x3355AACC).
3. Write 0x0 to 0x22000008. This write access is remapped into a READ-MODIFY-WRITE
to 0x20000000. The value 0x3355AACC is read from memory, bit 2 is cleared, and a result
of 0x3355AAC8 is written back to address 0x20000000.
4. Now, read 0x20000000. That gives you a return value of 0x3355AAC8 (bit [2] cleared).
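The four steps above can be replayed with a small software model. This is only a host-side sketch of what the bus hardware does to the word in the bit-band region; the function names `bb_model_read` and `bb_model_write` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Alias read: the chosen bit is shifted down to the LSB of the return data. */
uint32_t bb_model_read(uint32_t word, unsigned bit)
{
    assert(bit < 32u);
    return (word >> bit) & 1u;
}

/*
 * Alias write: only bit 0 of the written value is used. The model returns
 * the word as it would look after the hardware READ-MODIFY-WRITE.
 */
uint32_t bb_model_write(uint32_t word, unsigned bit, uint32_t value)
{
    assert(bit < 32u);
    uint32_t mask = (uint32_t)1u << bit;
    return (value & 1u) ? (word | mask) : (word & ~mask);
}
```

With the word 0x3355AACC, reading bit 2 returns 1, and writing 0 to bit 2 leaves 0x3355AAC8, matching steps 2 to 4.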

Advantages of Bit-Band Operations


 To implement serial data transfers in general-purpose input/output (GPIO) ports to serial
devices, the application code can be implemented easily because access to serial data and clock
signals can be separated.
 Bit-band operation can also be used to simplify branch decisions. For example, if a branch
should be carried out based on a single bit in a status register in a peripheral, instead of
1. Reading the whole register
2. Masking the unwanted bits
3. Comparing and branching
the operations are simplified to:
a. Reading the status bit via the bit-band alias (get 0 or 1)
b. Comparing and branching.


 Along with faster bit operations with fewer instructions, the bit-band feature in the Cortex-M3
is also essential for situations in which resources are being shared by more than one process.
 One of the most important advantages of a bit-band operation is that it is atomic: its
READ-MODIFY-WRITE sequence cannot be interrupted by other bus activities.
 Without this behavior, for example when using a software READ-MODIFY-WRITE sequence, the
following problem can occur: consider a simple output port with bit 0 used by a main program
and bit 1 used by an interrupt handler. A software-based READ-MODIFY-WRITE operation can
cause data conflicts, as shown below.

Data is lost when shared memory is modified by an exception handler.

 With the Cortex-M3 bit-band feature, this kind of race condition can be avoided because the
READ-MODIFY-WRITE is carried out at the hardware level and is atomic (the two transfers
cannot be pulled apart), so interrupts cannot take place between them, as shown below.

Data loss prevented with Locked Transfer through Bit-band Feature.
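The two cases can be imitated on a host machine with a deterministic simulation. This is a sketch under stated assumptions: the variable `port` stands in for the output port, the "interrupt" is an ordinary function call placed at the worst possible moment, and all names are hypothetical. It does not touch real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated output port: bit 0 belongs to main, bit 1 to the handler. */
static uint32_t port;

/* Stand-in for the interrupt handler: sets bit 1 of the port. */
void handler_sets_bit1(void)
{
    port |= (uint32_t)1u << 1;
}

/* Software READ-MODIFY-WRITE, interrupted between the read and the write. */
uint32_t race_with_software_rmw(void)
{
    port = 0u;
    uint32_t copy = port;        /* main: READ                           */
    handler_sets_bit1();         /* interrupt arrives here               */
    copy |= (uint32_t)1u << 0;   /* main: MODIFY (works on a stale copy) */
    port = copy;                 /* main: WRITE, bit 1 is overwritten    */
    return port;
}

/* Model of a bit-band write: one indivisible step per bit. */
void atomic_set_bit(unsigned bit)
{
    assert(bit < 32u);
    port |= (uint32_t)1u << bit;
}

/* The interrupt can only fall before or after the whole bit update. */
uint32_t no_race_with_atomic_bit_write(void)
{
    port = 0u;
    handler_sets_bit1();         /* interrupt before the update */
    atomic_set_bit(0);           /* main sets bit 0 in one step */
    return port;
}
```

In the first function the handler's bit is lost (the port ends as 0x1); in the second both bits survive (0x3), whichever side runs first.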


 Similar issues can be found in multitasking systems. For example, if bit 0 of the output port is
used by Process A and bit 1 is used by Process B, a data conflict can occur in software-based
READ-MODIFY-WRITE, as shown below.

 But the bit-band feature can ensure that bit accesses from each task are separated so that no data
conflicts occur as shown below.

 The bit-band feature can be used for storing and handling Boolean data in the SRAM region.
For example, multiple Boolean variables can be packed into one single memory location to save
memory space, whereas the access to each bit is still completely separated when the access is
carried out via the bit-band alias address range.


CMSIS
 The Cortex Microcontroller Software Interface Standard (CMSIS) was developed by ARM to allow
users of the Cortex-M3 microcontrollers to gain the most benefit from the available software
solutions and to develop their embedded applications quickly and reliably.

CMSIS Provides a Standardized Access Interface for Embedded Software Products.


 The CMSIS was started in 2008 to improve software usability and inter-operability of ARM
microcontroller software.
 It is integrated into the driver libraries provided by silicon vendors, providing a standardized
software interface for the Cortex-M3 processor features, as well as a number of common system
and I/O functions.
 The library is also supported by software companies including embedded OS vendors and
compiler vendors.
 The aims of CMSIS are to:
1. Improve software portability and reusability
2. Enable software solution suppliers to develop products that can
work seamlessly with device libraries from various silicon vendors
3. Allow embedded developers to develop software quicker with an easy-to-
use and standardized software interface.
4. Allow embedded software to be used on multiple compiler products
5. Avoid device driver compatibility issues when using software solutions
from multiple sources
 The scope of CMSIS involves standardization in the following areas:
1. Hardware Abstraction Layer (HAL) for Cortex-M processor registers: This includes
standardized register definitions for NVIC, System Control Block registers, SYSTICK register,
MPU registers, and a number of NVIC and core feature access functions.
2. Standardized system exception names: This allows OS and middleware to use system
exceptions easily without compatibility issues.
3. Standardized method of header file organization: This makes it easier for users to learn
new Cortex microcontroller products and improve software portability.
4. Common method for system initialization: Each Microcontroller Unit (MCU) vendor
provides a SystemInit() function in their device driver library for essential setup and configuration,
such as initialization of clocks.


5. Standardized intrinsic functions: Intrinsic functions are normally used to produce instructions that
cannot be generated by ISO/IEC C. By having standardized intrinsic functions, software reusability
and portability are considerably improved.
6. Common access functions for communication: This provides a set of software interface
functions for common communication interfaces including universal asynchronous receiver/transmitter
(UART), Ethernet, and Serial Peripheral Interface (SPI). By having these common access functions in
the device driver library, reusability and portability of embedded software are improved.
7. Standardized way for embedded software to determine system clock frequency: A software
variable called SystemFrequency is defined in device driver code. This allows embedded OS to set up
the SYSTICK unit based on the system clock frequency.
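As an illustration of item 7, the sketch below derives a SYSTICK reload value from a clock frequency such as the one held in SystemFrequency. The helper name `systick_reload_for` and its parameters are assumptions made for this example and are not CMSIS functions. The sketch relies on two facts: the SYSTICK reload register is 24 bits wide, and a reload value of N gives a period of N + 1 clock cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* reload register is 24 bits wide */

/*
 * Hypothetical helper: computes the reload value that makes SYSTICK fire
 * `tick_hz` times per second at a core clock of `system_frequency_hz`.
 * Returns 0 on success, 1 if the rate cannot be represented.
 */
int systick_reload_for(uint32_t system_frequency_hz, uint32_t tick_hz,
                       uint32_t *reload)
{
    assert(reload != NULL);

    if (tick_hz == 0u || system_frequency_hz < tick_hz) {
        return 1;                       /* less than one clock per tick */
    }
    uint32_t ticks = system_frequency_hz / tick_hz;
    if (ticks - 1u > SYSTICK_MAX_RELOAD) {
        return 1;                       /* does not fit in 24 bits */
    }
    *reload = ticks - 1u;               /* N + 1 cycles per period */
    return 0;
}
```

For a 72 MHz clock and a 1 kHz tick the reload value is 71999. A 1 Hz tick at the same clock is rejected because 72,000,000 cycles do not fit in 24 bits.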

 The CMSIS is divided into multiple layers as follows:


Core Peripheral Access Layer
• Name definitions, address definitions, and helper functions to access core registers and core
peripherals
Middleware Access Layer
• Common method to access peripherals for the software industry (work in progress)
• Targeted communication interfaces include Ethernet, UART, and SPI.
• Allows portable software to perform communication tasks on any Cortex microcontrollers that
support the required communication interface.
Device Peripheral Access Layer (MCU specific)
• Name definitions, address definitions, and driver code to access peripherals
Access Functions for Peripherals (MCU specific)
• Optional additional helper functions for peripherals

CMSIS Structure

 The main advantages of CMSIS are:
1. Much better software portability and reusability.
Besides easy migration between different Cortex-M3 microcontrollers, it also allows software to
be quickly ported between Cortex-M3 and other Cortex-M processors, reducing time to market.
2. For embedded OS vendors and middleware providers, the advantages of the
CMSIS are significant.
By using the CMSIS, their software products can become compatible with device drivers from
multiple microcontroller vendors, including future microcontroller products that are yet to be
released. Without the CMSIS, the software vendors either have to include a small library for
Cortex-M3 core functions or develop multiple configurations of their product so that it can
work with device libraries from different microcontroller vendors.

CMSIS showing avoidance of Driver code overlapping

3. The CMSIS has a small memory footprint (less than 1 KB for all core access
functions and a few bytes of RAM).
It avoids overlapping of core peripheral driver code when reusing software code from other
projects. Since CMSIS is supported by multiple compiler vendors, embedded software can
compile and run with different compilers. As a result, embedded OS and middleware can be
MCU vendor independent and compiler tool vendor independent.
4. Before the availability of CMSIS, intrinsic functions were generally compiler specific and
could cause problems when retargeting the software to a different compiler.
Since all CMSIS compliant device driver libraries have a similar structure, learning to use
different Cortex-M3 microcontrollers is even easier as the software interface has similar look
and feel (no need to relearn a new application programming interface).
5. CMSIS is tested by multiple parties and is Motor Industry Software Reliability
Association (MISRA) compliant, thus reducing the validation effort required for developing
your own NVIC or core feature access functions.

A Typical Development Flow

 Various software programs are available for developing Cortex-M3 applications. The concepts of
code generation flow in terms of these tools are similar.
 For the most basic uses, you will need an assembler, a C compiler, a linker, and binary file
generation utilities.
 For ARM solutions, the RealView Development Suite (RVDS) or RealView Compiler Tools
(RVCT) provide a file generation flow, as shown above.
 The scatter-loading script is optional but often required when the memory map becomes more
complex.
 Besides these basic tools, RVDS also contains a large number of utilities, including an
Integrated Development Environment (IDE) and debuggers.

Assembly Codes for Cortex-M3

1. Multiplication of two 32-bit numbers

        THUMB
        AREA MULTI, CODE, READONLY
        ENTRY
        LDR R0, =0x706F ; load first number (a 16-bit value) into R0
        LDR R1, =0x7161 ; load second number (a 16-bit value) into R1
        UMULL R2, R3, R1, R0 ; R3:R2 = R1 * R0 (R2 = low word, R3 = high word)
        END
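To check the register contents by hand, the unsigned long multiply (UMULL on the Cortex-M3) can be modelled in C: the 64-bit product is split into a low word and a high word. The function name `umull_model` is an assumption for this sketch, and the code runs on a host machine, not on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Host-side model of an unsigned 32 x 32 -> 64-bit multiply.
 * rd_lo receives bits [31:0] of the product, rd_hi bits [63:32].
 */
void umull_model(uint32_t rn, uint32_t rm, uint32_t *rd_lo, uint32_t *rd_hi)
{
    assert(rd_lo != NULL && rd_hi != NULL);

    uint64_t product = (uint64_t)rn * (uint64_t)rm;
    *rd_lo = (uint32_t)product;          /* low word  */
    *rd_hi = (uint32_t)(product >> 32);  /* high word */
}
```

For the operands in the program above (0x706F and 0x7161) the product is 835,426,575, which fits in the low word, so the high word is 0. The high word only becomes non-zero when the product exceeds 32 bits.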

2. ALP to find the sum of first 10 integer numbers

        THUMB
        AREA sum, CODE, READONLY
        ENTRY
        MOV R1, #10 ; load 10 to register
        MOV R2, #0 ; empty R2 register to store result
loop
        ADD R2, R2, R1 ; add the content of R1 with result at R2
        SUBS R1, #0x01 ; decrement R1 by 1
        BNE loop ; repeat till R1 goes 0
        LDR R0, =result ; load the addr of var into R0
        STR R2, [R0] ; store the sum into result addrs location
        AREA sumdata, DATA, READWRITE
result  DCD 0 ; storage for the sum (assumed; not defined in the original listing)
        END
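The same countdown loop can be written in C to confirm the expected result. This is a host-side sketch; the function name `sum_first_n` is an assumption, and the local variables are named after the registers they mirror.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirrors the assembly loop: r1 counts down from n, r2 accumulates.
 * Like the assembly, the loop body runs once before the zero test,
 * so n must be at least 1.
 */
uint32_t sum_first_n(uint32_t n)
{
    assert(n > 0u);

    uint32_t r1 = n;   /* loop counter (R1) */
    uint32_t r2 = 0u;  /* running sum (R2)  */
    do {
        r2 += r1;      /* ADD  R2, R2, R1 */
        r1 -= 1u;      /* SUBS R1, #1     */
    } while (r1 != 0u); /* BNE loop       */
    return r2;
}
```

For n = 10 the loop adds 10 + 9 + ... + 1, giving 55, the value the program stores at result.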
3.

4.

For C Codes, refer Lab Manual 17ECL67, Dept of ECE, MIT Mysore.
