Arm&Es Module2

The document provides information about the ARM Cortex M3 instruction set and programming. It begins with an overview of assembly basics, instruction types, addressing modes, and directives. It then describes the different operating modes of ARM and details of the register bank. The document explains the assembler language syntax and various addressing modes supported by the ARM architecture like register, immediate, indexed, pre-indexed, post-indexed, double register indirect, and program counter relative addressing. It also lists various instruction suffixes that specify condition codes or update flags.

Uploaded by

manjunathanaikv

RN SHETTY TRUST

RNS INSTITUTE OF TECHNOLOGY

(AICTE Approved, VTU Affiliated and NAAC ‘A’ Accredited)
(UG programs-CSE, ECE, ISE, EIE and EEE have been accredited by NBA
For the Academic Years 2018-19, 2019-20 and 2020-21)
Channasandra, Dr.Vishnuvardhan Road, Bangalore 560098

MODULE 2
ARM Cortex M3 Instruction Sets and Programming: Assembly basics,
Instruction list and description, Useful instructions, Memory mapping, Bus
interfaces and CMSIS, Assembly and C language Programming (Text 1: Ch-4,
Ch-5, Ch-10 (10.1, 10.2, 10.3, 10.5 only))

Text Book to be referred

Joseph Yiu, “The Definitive Guide to the ARM Cortex-M3”, 2nd Edition, Newnes,
(Elsevier), 2010.
CHAPTER 4
ARM Cortex M3 Instruction Sets and Programming
Differences between ARM state and Thumb state

Basics of computer programming

• Program: The set of instructions that the user wants to convey to the CPU / computing engine.
• High level language: A high-level language is a programming language that abstracts the
details of the underlying hardware, may be designed for a specific job, and is easier to
understand. A line of code in a high-level language expands into multiple machine code
instructions. Ex: BASIC, FORTRAN, Java, C++ and Pascal.
• Low level language: A low-level programming language provides little or no abstraction
from a computer's instruction set architecture; commands or functions in the language map
closely to processor instructions. Generally this refers to either machine code or assembly
language.
• Machine level language: A programming language that can be directly understood and obeyed
by a machine (computer) without conversion (translation).
• Assembler: A program that converts instructions written in low-level symbolic code into
machine code.
• Interpreter: A program that analyzes and executes a program line by line.
• Compiler: A program that converts instructions into machine code or a lower-level form so
that they can be read and executed by a computer.
• Mnemonic: A term, symbol or name used to define or specify a computing function.
• Opcode: The portion of a machine language instruction that specifies the operation to be
performed.
• Operand: The part of a computer instruction that specifies the data to be operated on or
manipulated and, by extension, the data itself. Basically, a computer instruction describes an
operation (add, subtract, and so forth) and the operand or operands on which the operation is
to be performed.
• Instruction: An order given to a computer processor by a computer program.

Operating modes
Classic ARM processors (for example, the ARM7TDMI) have 7 basic operating modes. The
Cortex-M3 itself has only two modes, Thread and Handler.
1. User mode: Unprivileged mode under which most tasks run.
2. FIQ (Fast Interrupt Request): Entered when a high priority interrupt is raised.
3. IRQ (Interrupt Request): Entered when a low priority interrupt is raised.
4. Supervisor: Entered on reset and when a software interrupt instruction is executed.
5. Abort: Entered upon memory access violations.
6. Undef: Used to handle undefined instructions.
7. System: A privileged mode which uses the same registers as user mode.

Bank Registers
• There are 37 registers in the register file, each 32 bits long.
• There are up to 18 active registers: 16 data registers (R0 to R15), CPSR (Current Program
Status Register) and SPSR (Saved Program Status Register).
• Out of the 37 registers, 20 are hidden from a program at different times. These registers are
called banked registers.
• They are available only when the processor is in a particular mode.
• Ex: Abort mode has registers R13_abt, R14_abt and SPSR_abt.
• Every processor mode except user mode can change mode by writing directly to the mode
bits of the CPSR.
• All processor modes except system mode have a set of associated banked registers that are a
subset of the main 16 registers.
• If the processor mode is changed, a banked register from the new mode replaces the
corresponding existing register.
• Ex: When the processor is in interrupt request mode, the instructions that are executed still
access registers R13 and R14. However, these registers are the banked registers R13_irq and
R14_irq. The user mode registers R13 and R14 are not affected.

Assembler Language
• Basic syntax: in assembler code, the following instruction formatting is commonly used:
    Label   opcode operand1, operand2, ..   ; Comments
• The label is optional. Some instructions might have a label in front of them so that the
address of the instruction can be determined using the label.
• Then comes the opcode (the instruction), followed by a number of operands. Normally, the
first operand is the destination of the operation.
• The text after each semicolon (;) is a comment. These comments do not affect the program
operation, but they can make programs easier for humans to understand.
• The differences between the Thumb instruction format and the Unified Assembler Language
(UAL) are as follows:

Sl No   Thumb instruction format            Unified Assembler Language (UAL)
1       ADD R0, R1                          ADD R0, R0, R1
        (2 operands are specified in        (3 operands are specified in
        an instruction).                    an instruction).
        The operation is R0 = R0 + R1.      The operation is R0 = R0 + R1.
2       By default, it updates the          Flag bits are updated only by specifying
        flag bits.                          'S' in the mnemonic. Ex: ADDS.

• By default, the instruction is narrow (16 bits).
  Ex: MOV R1, #18 and MOV.N R1, #18 are the same.
• For 32 bit instructions, MOV.W R1, #18 is used (.W means wide).

Assembler Directives
• Assembler directives are instructions that direct the assembler to do something.
• 1). EQU (Equate) - Constants can be defined using EQU.
  Ex: NVIC_IRQ_SET     EQU 0xE000E100
      NVIC_IRQ_ENABLE  EQU 0x1
• 2). DCI (Define Constant Instruction) - A number of data definition directives are
available for insertion of constants inside assembly code. DCI is used to code an instruction
if the assembler cannot generate the exact instruction that the user wants and the user knows
the binary code for the instruction.
  Ex: DCI 0xBE00 ; Breakpoint (BKPT 0), a 16 bit instruction
• 3). DCB (Define Constant Byte) - Byte size constant values, such as characters, can be
defined using DCB.
  Ex:          LDR R0, =hellotxt
               BL  printtext
      hellotxt DCB "hello\n", 0
• 4). DCD (Define Constant Data) - Word size constant values that define binary data in
assembler code are declared with DCD.
  Ex:          LDR R3, =my_num
               LDR R4, [R3]
      my_num   DCD 0x12345678

Suffixes in instructions

Suffix              Description
S                   Update APSR flags. Ex: ADDS
EQ, NE, LT, GT      Conditional execution. Ex: BEQ <label>

ARM addressing Modes

There are different ways to specify the address of the operands for any given operations such as
load, add or branch. The different ways of determining the address of the operands are called
addressing modes. The different addressing modes of ARM processor are explained below.
Name                                   Alternative Name                          ARM Examples

Register to register                   Register direct                           MOV R0, R1
Absolute                               Direct                                    LDR R0, MEM
Literal                                Immediate                                 MOV R0, #15
                                                                                 ADD R1, R2, #12
Indexed, base                          Register indirect                         LDR R0, [R1]
Pre-indexed, base with displacement    Register indirect with offset             LDR R0, [R1, #4]
Pre-indexed, autoindexing              Register indirect pre-incrementing        LDR R0, [R1, #4]!
Post-indexing, autoindexed             Register indirect post-increment          LDR R0, [R1], #4
Double Register indirect               Register indirect, Register indexed       LDR R0, [R1, R2]
Double Reg indirect with scaling       Register indirect indexed with scaling    LDR R0, [R1, R2, LSL #2]
Program counter relative                                                         LDR R0, [PC, #offset]

1. Register Addressing
Operand is given in the CPU register of ARM (R0-R15). It can be only source, only destination
or both.
Examples Meaning

MOV R0,R1 ;Copy R1 content to R0.

ADD R1, R2, R3 ; Add R3 and R2 register content and store result in R1

2. Immediate Addressing
Operand is directly given in the instruction, prefixed with the # symbol. It can be written in
decimal or hexadecimal; internally it is always stored in binary.

Examples                     Meaning

CMP R0, #22                  ; Compare R0 content with imm value 22
ADD R1, R2, #18              ; Add imm value 18 to R2 and store result in R1
MOV R1, #0xFF                ; Copy imm 0xFF to R1
AND R0, R1, #0xFF000000      ; Logically AND imm 0xFF000000 with R1 and store
                             ; result in R0
CMN R0, #6400                ; Compare R0 with -6400 (flags are set on R0 + 6400) and
                             ; update the N, Z, C and V flags
CMPGT SP, R7, LSL #2         ; If the GT condition is met, compare SP with R7 shifted left
                             ; by 2 bits (R7 x 4) and update the N, Z, C and V flags

3. Register Indirect Addressing

Register indirect addressing means that the location of an operand is held in a register. It is also
called indexed addressing or base addressing.
Register indirect addressing mode requires three read operations to access an operand: read the
instruction, read the pointer register, and read the memory location it points to. It is very
important because the content of the register containing the pointer to the operand can be
modified at runtime. Therefore, the address is a variable, which allows access to data
structures like arrays.

Some examples of using register indirect addressing mode:

LDR R2, [R0] ; Load R2 with the word pointed by R0.
[If R0 is pointing to location 0x10000000 then content (32-bit) of location 0x10000000 is
copied to R2.]
STR R2, [R3] ; Store the word in R2 in the location pointed by R3

4. Register Indirect Addressing with an Offset

ARM supports a memory-addressing mode where the effective address of an operand is
computed by adding the content of a register and an immediate offset coded into load/store
instruction. For example,

Instruction Effective Address

LDR R0, [R1, #20] R1 + 20 ; loads R0 with the word pointed at by R1+20

5. ARM's Autoindexing Pre-indexed Addressing Mode

This is used to facilitate the reading of sequential data in structures such as arrays, tables, and
vectors. A pointer register is used to hold the base address. An offset can be added to achieve the
effective address. For example,
Instruction Effective Address

LDR R0, [R1, #4]! R1 + 4 ; loads R0 with the word pointed at by
; R1+4, then updates the pointer by adding 4 to R1

6. ARM's Autoindexing Post-indexing Addressing Mode

This is similar to the above, but it first accesses the operand at the location pointed by the
base register, then increments the base register. For example,

Instruction Effective Address

LDR R0, [R1], #4 R1 ; loads R0 with the word pointed at by R1

; then update the pointer by adding 4 to R1

7. Program Counter Relative (PC Relative) Addressing Mode

Register R15 is the program counter. If you use R15 as a pointer register to access operand, the
resulting addressing mode is called PC relative addressing. The operand is specified with respect
to the current code location. Please look at this example,
Instruction Effective Address

LDR R0,[R15,#24] R15 + 24 ; loads R0 with the word pointed at by R15+24

Summary of ARM's Indexed Addressing Modes

Addressing Mode                  Assembly Mnemonic      Effective address    Final Value in R1

Pre-indexed, base unchanged      LDR R0, [R1, #d]       R1 + d               R1
Pre-indexed, base updated        LDR R0, [R1, #d]!      R1 + d               R1 + d
Post-indexed, base updated       LDR R0, [R1], #d       R1                   R1 + d
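
The three indexed forms summarized above can be checked with a small Python model. This is only a sketch: the register file and memory are plain dictionaries, memory is word-addressed, and the function names and addresses are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def ldr_offset(regs, mem, rd, rn, d):
    """LDR Rd, [Rn, #d]: access Rn + d, Rn stays unchanged."""
    regs[rd] = mem[(regs[rn] + d) & MASK32]

def ldr_pre(regs, mem, rd, rn, d):
    """LDR Rd, [Rn, #d]!: access Rn + d, then write that address back to Rn."""
    addr = (regs[rn] + d) & MASK32
    regs[rd] = mem[addr]
    regs[rn] = addr

def ldr_post(regs, mem, rd, rn, d):
    """LDR Rd, [Rn], #d: access Rn, then add d to Rn."""
    addr = regs[rn]
    regs[rd] = mem[addr]
    regs[rn] = (addr + d) & MASK32

# Hypothetical memory: two words at 0x1000 and 0x1004.
mem = {0x1000: 0x11111111, 0x1004: 0x22222222}
regs = {"R0": 0, "R1": 0x1000}
ldr_post(regs, mem, "R0", "R1", 4)   # R0 = word at 0x1000, R1 becomes 0x1004
```

Only the last two forms change R1, which is why they suit loops that walk through an array.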

Instruction set classification

The instructions of the ARM Cortex M3 are classified as follows:
1. Moving data within the processor
2. Memory access instructions
3. Arithmetic operations
4. Logic Operations
5. Shift and rotate instructions
6. Sign extend instructions
7. Data reverse instructions
8. Bit field processing instructions
9. Program flow control instructions
10. Memory Barrier instructions
11. IT instruction block
12. Saturation/Conversion instructions
13. Table branch byte and Table branch half word instructions
14. Miscellaneous instructions

A. Moving data within processor

• One of the most basic functions in a processor is transfer of data. In the Cortex-M3,
data transfers can be of one of the following types:
    - Moving data between register and register.
    - Moving data between memory and register.
    - Moving data between special register and register.
    - Moving an immediate data value into a register.
• The command to move data between registers is MOV (move). For example, moving
data from register R3 to register R8 looks like this: MOV R8, R3.
• Moving immediate data into a register is a common thing to do. For example, you
might want to access a peripheral register, so you need to put the address value into
a register beforehand. For small values (8 bits or less), MOVS (move with status
update) can be used.
  Ex: MOV R0, #0xFF        ; Set R0 = 0xFF (hexadecimal)
  Ex: MOV R1, #'S'         ; Set R1 = ASCII character S
  Ex: MOVS R0, #0x12       ; Set R0 to 0x12
  For a larger value (over 8 bits), you might need to use a Thumb-2 move instruction.
  Ex: MOVW.W R0, #0x789A   ; Set R0 to 0x0000789A
• Another instruction, MVN (move NOT), writes the bitwise complement of the source
operand into the destination register.

• MOV/MVN instruction without barrel shifter operations

               Example 1            Example 2            Example 3
Before         R1 = 0x00000000      R1 = 0x00000000      R1 = 0x00000000
               R0 = 0x00000004                           R0 = 0x00000004
Instruction    MOV R1, R0           MOV R1, #0x04        MVN R1, R0
After          R1 = 0x00000004      R1 = 0x00000004      R1 = 0xFFFFFFFB
               R0 = 0x00000004                           R0 = 0x00000004

• MOV/MVN instruction with barrel shifter operations

[Figure: Barrel shifter (pipeline stages: Fetch, Decode, Execute)]

Ex: ADD Rd, Rn, Rm, LSL #2

i.e., Rd = Rn + Rm * 4

                   Example 1                             Example 2
Before Execution   R1 = 0x00000000, R0 = 0x00000004      R1 = 0x00000000, R0 = 0x00000004
Instruction        MOV R1, R0, LSR #1                    MVN R1, R0, LSR #1
After Execution    R1 = 0x00000002, R0 = 0x00000004      R1 = 0xFFFFFFFD, R0 = 0x00000004
MOVT (Move Top)
• Syntax: MOVT{cond} Rd, #imm16
Where cond is an optional condition code, Rd is the destination register and imm16 is a
16-bit immediate constant.
• Operation: MOVT writes a 16-bit immediate value imm16 to the top halfword,
Rd[31:16], of its destination register. The write does not affect Rd[15:0].
• The MOV, MOVT instruction pair enables the user to generate any 32-bit constant.
• Restrictions: Rd must not be SP and must not be PC.
• Condition Flags: This instruction does not change the flags.

Example
Before Execution      Instruction           After Execution
R3 = 0x12345678       MOVT R3, #0xF8CD      R3 = 0xF8CD5678
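
How a 16-bit move followed by MOVT builds a full 32-bit constant can be sketched in Python. The helper names `movw` and `movt` are illustrative only; they model the register update, not the instruction encoding.

```python
def movw(imm16):
    """MOVW Rd, #imm16: Rd becomes the zero-extended 16-bit immediate."""
    return imm16 & 0xFFFF

def movt(rd, imm16):
    """MOVT Rd, #imm16: replace Rd[31:16] and keep Rd[15:0]."""
    return ((imm16 & 0xFFFF) << 16) | (rd & 0xFFFF)

# The worked example above: only the top halfword of R3 changes.
r3 = movt(0x12345678, 0xF8CD)

# Building an arbitrary constant in two steps: low half first, then top half.
r0 = movt(movw(0xBEEF), 0xDEAD)
```

The order matters: a 16-bit move after MOVT would clear the top halfword again, so the low half is always written first.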
B. Memory Access Instructions
i. LDR and STR, LDM and STM, LDRD and STRD
ii. PUSH and POP
iii. MRS and MSR

i). LDR and STR Instructions

Example Description
LDRB Rd, [Rn, #offset] Read byte from memory location Rn+offset
LDRH Rd, [Rn, #offset] Read half word from memory location Rn+offset
LDR Rd, [Rn, #offset] Read word from memory location Rn+offset
LDRD Rd1,Rd2, [Rn, #offset] Read double word from memory location Rn+offset
STRB Rd, [Rn, #offset] Store byte to memory location Rn+offset
STRH Rd, [Rn, #offset] Store half word to memory location Rn+offset
STR Rd, [Rn, #offset] Store word to memory location Rn+offset
STRD Rd1,Rd2, [Rn, #offset] Store double word to memory location Rn+offset

Examples on LDR instruction

Before Execution (R1 = 0x1000002E)

Address     Data
1000002A    0xABCDEF54
1000002E    0x12345678   (R1 points here)
10000032    0x24567893
10000036    0x88564478

                  Example 1            Example 2            Example 3            Example 4
Instruction       LDRB R2, [R1, #4]    LDRH R2, [R1, #4]    LDR R2, [R1, #4]     LDRD R2, R3, [R1, #4]
Operation         R2 = mem8[R1+4]      R2 = mem16[R1+4]     R2 = mem32[R1+4]     R2 = mem32[R1+4],
                                                                                 R3 = mem32[R1+8]
After Execution   R2 = 0x93            R2 = 0x7893          R2 = 0x24567893      R2 = 0x24567893,
                                                                                 R3 = 0x88564478
Examples on STR instruction

Before Execution (R1 = 0x1000002E; every example starts from this state)

Address     Data
1000002A    0xABCDEF54
1000002E    0x12345678   (R1 points here)
10000032    0x24567893
10000036    0x88564478

Example 1: STRB R2, [R1, #4]
Before Execution: R2 = 0x324565CD
Operation: mem8[R1+4] = R2[7:0]
After Execution: [10000032] = 0x245678CD; all other locations are unchanged.

Example 2: STRH R2, [R1, #4]
Before Execution: R2 = 0x324565CD
Operation: mem16[R1+4] = R2[15:0]
After Execution: [10000032] = 0x245665CD; all other locations are unchanged.

Example 3: STR R2, [R1, #4]
Before Execution: R2 = 0x324565CD
Operation: mem32[R1+4] = R2
After Execution: [10000032] = 0x324565CD; all other locations are unchanged.

Example 4: STRD R2, R3, [R1, #4]
Before Execution: R2 = 0x324565CD, R3 = 0x88564478
Operation: mem32[R1+4] = R2, mem32[R1+8] = R3
After Execution: [10000032] = 0x324565CD, [10000036] = 0x88564478; locations 1000002A and
1000002E are unchanged.
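
The byte, halfword and word results above follow from little-endian byte ordering, which the following Python sketch makes explicit. The `Memory` class is a made-up helper for illustration (a dictionary of bytes), not a model of the real bus.

```python
class Memory:
    """Byte-addressable little-endian memory backed by a dict."""

    def __init__(self):
        self.cells = {}

    def store(self, addr, value, size):
        # size is 1 (STRB), 2 (STRH) or 4 (STR); the low byte goes to the lowest address
        for i in range(size):
            self.cells[addr + i] = (value >> (8 * i)) & 0xFF

    def load(self, addr, size):
        # size is 1 (LDRB), 2 (LDRH) or 4 (LDR); the result is zero-extended
        return sum(self.cells.get(addr + i, 0) << (8 * i) for i in range(size))

mem = Memory()
R1 = 0x1000002E
mem.store(R1 + 4, 0x24567893, 4)     # initial word at 0x10000032

byte_value = mem.load(R1 + 4, 1)     # LDRB R2, [R1, #4]
half_value = mem.load(R1 + 4, 2)     # LDRH R2, [R1, #4]

mem.store(R1 + 4, 0x324565CD, 1)     # STRB R2, [R1, #4] changes one byte only
word_after_strb = mem.load(R1 + 4, 4)
```

A byte store replaces only the lowest-addressed byte of the word, so the upper three bytes of the old value survive.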
1. Write a Cortex M3 program to move a 32 bit data from a location 2000H (data =
0xFF) into 3000H and 4000H.
        area ex1, code, readonly
        entry
start
        mov r5, #0xFF       ; r5 = 0xFF (overwritten by the load below)
        ldr r0, =0x2000     ; r0 points to the address 2000H
        ldr r1, =0x3000     ; r1 points to the address 3000H
        ldr r2, =0x4000     ; r2 points to the address 4000H
        ldr r5, [r0]        ; r5 = mem[2000H] (0xFF in this example)
        str r5, [r1]        ; mem[3000H] = r5
        str r5, [r2]        ; mem[4000H] = r5
        end

Table 1: Multiple Load and Store Instructions

Example                        Description
LDMIA Rd!, <Reg list>          Read multiple words from memory location specified by Rd. Address
                               increments after (IA) each transfer (16 bit Thumb instruction).
STMIA Rd!, <Reg list>          Store multiple words to memory location specified by Rd. Address
                               increments after (IA) each transfer (16 bit Thumb instruction).
LDMIA.W Rd(!), <Reg list>      Read multiple words from memory location specified by Rd. Address
                               increments after (IA) each read (.W specifies that it is a 32 bit
                               Thumb2 instruction).
LDMDB.W Rd(!), <Reg list>      Read multiple words from memory location specified by Rd. Address
                               decrements before (DB) each read (.W specifies that it is a 32 bit
                               Thumb2 instruction).
STMIA.W Rd(!), <Reg list>      Write multiple words to memory location specified by Rd. Address
                               increments after each write (.W specifies that it is a 32 bit
                               Thumb2 instruction).
STMDB.W Rd(!), <Reg list>      Write multiple words to memory location specified by Rd. Address
                               decrements before each write (.W specifies that it is a 32 bit
                               Thumb2 instruction).
Example 1: Write an ALP to move four 32 bit data words from a source location to a destination
location using LDMIA and STMIA instructions.

Source Address          Data           Destination Address
X     (R0 points here)  0x55555555     Y     (R1 points here)
X+4                     0x11111111     Y+4
X+8                     0x33333333     Y+8
X+C                     0x44444444     Y+C

        area program, code, readonly
        export main
main
        LDR R0, =X                     ; R0 points to address X
        LDR R1, =Y                     ; R1 points to address Y
        LDMIA R0!, {R2, R3, R4, R5}    ; R2 = 0x55555555, R3 = 0x11111111,
                                       ; R4 = 0x33333333, R5 = 0x44444444
                                       ; R0 is auto incremented after each load.
        STMIA R1!, {R2, R3, R4, R5}    ; mem[Y] = 0x55555555, mem[Y+4] = 0x11111111,
                                       ; mem[Y+8] = 0x33333333, mem[Y+C] = 0x44444444
                                       ; R1 is auto incremented after each store.
stop    B stop                         ; stay here so execution does not run into the data
X       DCD 0x55555555, 0x11111111, 0x33333333, 0x44444444 ; address X
        area temp, data, readwrite
Y       DCD 0, 0, 0, 0                 ; address Y (four words reserved)
        end
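
The block copy performed by LDMIA followed by STMIA can be traced with a short Python sketch. It assumes a word-addressed dictionary as memory and made-up addresses for X and Y; the register list is given in ascending register order, which is the order in which the hardware transfers the registers.

```python
def ldmia(regs, mem, rn, reg_list):
    """LDMIA Rn!, {reg_list}: load words from ascending addresses, then update Rn."""
    addr = regs[rn]
    for r in reg_list:
        regs[r] = mem[addr]
        addr += 4
    regs[rn] = addr

def stmia(regs, mem, rn, reg_list):
    """STMIA Rn!, {reg_list}: store words to ascending addresses, then update Rn."""
    addr = regs[rn]
    for r in reg_list:
        mem[addr] = regs[r]
        addr += 4
    regs[rn] = addr

X, Y = 0x20000000, 0x20000100          # hypothetical source and destination
mem = {X: 0x55555555, X + 4: 0x11111111, X + 8: 0x33333333, X + 12: 0x44444444}
regs = {"R0": X, "R1": Y}

ldmia(regs, mem, "R0", ["R2", "R3", "R4", "R5"])
stmia(regs, mem, "R1", ["R2", "R3", "R4", "R5"])
```

After both calls R0 and R1 each point 16 bytes past their starting address, so the same pair of instructions can be repeated to copy the next four words.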

Block diagram illustrating multiple store operation

Table 2: Examples of Pre indexed Memory Access Instructions

Example                               Description
LDR.W Rd, [Rn, #offset]!              Preindexing load instructions for various
LDRB.W Rd, [Rn, #offset]!             sizes (word, byte, half word and double word)
LDRH.W Rd, [Rn, #offset]!
LDRD.W Rd1, Rd2, [Rn, #offset]!
LDRSB.W Rd, [Rn, #offset]!            Preindexing load instructions for various
LDRSH.W Rd, [Rn, #offset]!            sizes with sign extend (byte, half word)
STR.W Rd, [Rn, #offset]!              Preindexing store instructions for various
STRB.W Rd, [Rn, #offset]!             sizes (word, byte, half word and double word)
STRH.W Rd, [Rn, #offset]!
STRD.W Rd1, Rd2, [Rn, #offset]!

Examples on LDR instruction – Pre indexed memory access

Before Execution (R1 = 0x10000026)

Address     Data
10000032    0xABCDEF54
1000002E    0x12345678
1000002A    0x24567893
10000026    0x88564478   (R1 points here)

The instructions below are written without the ! suffix, so R1 itself is not changed; with !
the effective address would also be written back to R1.

Sl      Instruction                 Before Execution     Operation                     After Execution
Ex 1    LDR.W R2, [R1, #8]          R2 = 0x324565CD      R2 = Mem32[R1+8]              R2 = 0x12345678
Ex 2    LDRB.W R3, [R1, #4]         R3 = 0x324565CD      R3 = Mem8[R1+4]               R3 = 0x00000093
Ex 3    LDRH.W R4, [R1, #4]         R4 = 0x83502167      R4 = Mem16[R1+4]              R4 = 0x00007893
Ex 4    LDRD.W R2, R3, [R1, #8]     R2 = 0x324565CD,     R2 = Mem32[R1+8],             R2 = 0x12345678,
                                    R3 = 0x83502167      R3 = Mem32[R1+C]              R3 = 0xABCDEF54
Ex 5    LDRSB.W R2, [R1, #4]        R2 = 0x324565CD      R2 = signext{Mem8[R1+4]}      R2 = 0xFFFFFF93
Ex 6    LDRSH.W R2, [R1, #4]        R2 = 0x324565CD      R2 = signext{Mem16[R1+4]}     R2 = 0x00007893

Examples on STR instruction – Pre indexed memory access

Sl      Instruction                 Before Execution     After Execution
Ex 7    STR.W R2, [R1, #8]          R2 = 0x324565CD      [R1+8] = [1000002E] = 0x324565CD
Ex 8    STRB.W R3, [R1, #4]         R3 = 0x83502167      [R1+4] = [1000002A] = 0x24567867
Ex 9    STRH.W R3, [R1, #4]         R3 = 0x83502167      [R1+4] = [1000002A] = 0x24562167
Ex 10   STRD.W R3, R4, [R1, #8]     R3 = 0x83502167,     [R1+8] = [1000002E] = 0x83502167,
                                    R4 = 0x324565CD      [R1+C] = [10000032] = 0x324565CD

Table 3: Examples of Post indexed Memory Access Instructions

Example                               Description
LDR.W Rd, [Rn], #offset               Post indexing load instructions for various
LDRB.W Rd, [Rn], #offset              sizes (word, byte, half word and double word)
LDRH.W Rd, [Rn], #offset
LDRD.W Rd1, Rd2, [Rn], #offset
LDRSB.W Rd, [Rn], #offset             Post indexing load instructions for various
LDRSH.W Rd, [Rn], #offset             sizes with sign extend (byte, half word)
STR.W Rd, [Rn], #offset               Post indexing store instructions for various
STRB.W Rd, [Rn], #offset              sizes (word, byte, half word and double word)
STRH.W Rd, [Rn], #offset
STRD.W Rd1, Rd2, [Rn], #offset

Examples on LDR and STR instruction – Post indexed memory access

Before Execution (R1 = 0x1000008C; every example starts from this state)

Address     Data
10000098    0xABCDEF54
10000094    0x12345678
10000090    0x24567893
1000008C    0x88564478   (R1 points here)

Sl No   Instruction                 Before Execution     Operation                After Execution
Ex 1    LDR.W R2, [R1], #8          R2 = 0x324565CD      R2 = Mem32[R1]           R2 = 0x88564478
                                                                                  R1 points to 0x10000094
Ex 2    LDRD.W R4, R3, [R1], #4     R4 = 0x22334455,     R4 = Mem32[R1],          R4 = 0x88564478,
                                    R3 = 0x46789120      R3 = Mem32[R1+4]         R3 = 0x24567893
                                                                                  R1 points to 0x10000090
Ex 3    STRH.W R4, [R1], #4         R4 = 0x83502167      Mem16[R1] = R4[15:0]     [1000008C] = 0x88562167
                                                                                  R1 points to 0x10000090
Ex 4    STRD.W R3, R2, [R1], #8     R3 = 0x324565CD,     Mem32[R1] = R3,          [1000008C] = 0x324565CD,
                                    R2 = 0x83502167      Mem32[R1+4] = R2         [10000090] = 0x83502167
                                                                                  R1 points to 0x10000094

ii). PUSH and POP Instructions

Syntax
PUSH <reg list>          ; Decrement SP and store
POP <reg list>           ; Load and increment SP

Multiple register PUSH and POP operation

PUSH {R0, R4-R7, R9}     ; Push R0, R4, R5, R6, R7, R9 into stack memory
POP {R2, R3}             ; Pop R2 and R3 from stack

Usually a PUSH instruction will have a corresponding POP with the same register list, but this
is not always necessary. For example, a common exception is when POP is used as a function
return:

PUSH {R0-R3, LR}         ; Save register contents at beginning of subroutine
                         ; (subroutine processing)
POP {R0-R3, PC}          ; Restore registers and return

In this case, instead of popping the LR register back and then branching to the address in LR,
we POP the address value directly into the program counter.
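
The function-return idiom can be followed step by step with a small Python model of a full descending stack. It is a sketch under simple assumptions: registers are keyed by number (13 = SP, 14 = LR, 15 = PC), memory is a word-addressed dictionary, all values are made up, and the Thumb-bit handling of a real load into PC is ignored.

```python
SP, LR, PC = 13, 14, 15

def push(regs, mem, reg_nums):
    """PUSH {list}: SP drops by 4 per register; the lowest register lands at the lowest address."""
    for n in sorted(reg_nums, reverse=True):
        regs[SP] -= 4
        mem[regs[SP]] = regs[n]

def pop(regs, mem, reg_nums):
    """POP {list}: registers are loaded from ascending addresses; SP rises by 4 per register."""
    for n in sorted(reg_nums):
        regs[n] = mem[regs[SP]]
        regs[SP] += 4

mem = {}
regs = {0: 0xAA, 1: 0xBB, SP: 0x20001000, LR: 0x08000101, PC: 0}

push(regs, mem, [0, 1, LR])      # PUSH {R0, R1, LR} on subroutine entry
regs[0] = regs[1] = 0            # the subroutine body overwrites R0 and R1
pop(regs, mem, [0, 1, PC])       # POP {R0, R1, PC} restores and returns
```

The saved LR value is reloaded straight into PC, so no separate branch instruction is needed for the return.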
iii). MSR and MRS Instructions
MSR and MRS instructions are used to access the special registers in Cortex M3. The special
registers are listed as shown in Table 2.

Table 2: Special Register names for MRS and MSR Instructions

Symbol Description
IPSR Interrupt status register
EPSR Execution status register (read as zero)
APSR Flags from previous operation
IEPSR A composite of IPSR and EPSR
IAPSR A composite of IPSR and APSR
EAPSR A composite of EPSR and APSR
PSR A composite of APSR, EPSR and IPSR
MSP Main Stack Pointer
PSP Process Stack Pointer
PRIMASK Normal exception mask register
BASEPRI Normal exception priority mask register
BASEPRI_MAX Same as normal exception priority mask register, with conditional
write (new priority level must be higher than the old level).
FAULTMASK Fault exception mask register (also disables normal interrupt)
CONTROL Control register

Syntax: MRS <general purpose reg>,<special register>

MSR < special register >,< general purpose reg >
Example:
MRS R0, PSR ; Read Processor status word into R0.
MSR CONTROL, R1 ; Write value of R1 into control register

3. Arithmetic instructions
Instruction              Operation                    Description
ADD Rd, Rn, Rm           Rd = Rn + Rm                 ADD operation
ADD Rd, Rd, Rm           Rd = Rd + Rm
ADD Rd, #immd            Rd = Rd + #immd
ADD Rd, Rn, #immd        Rd = Rn + #immd
ADC Rd, Rn, Rm           Rd = Rn + Rm + carry         ADD with carry
ADC Rd, Rd, Rm           Rd = Rd + Rm + carry
ADC Rd, #immd            Rd = Rd + #immd + carry
ADDW Rd, Rn, #immd       Rd = Rn + #immd              Add register with 12 bit immediate value
SUB Rd, Rn, Rm           Rd = Rn - Rm                 SUBTRACT
SUB Rd, #immd            Rd = Rd - #immd
SUB Rd, Rn, #immd        Rd = Rn - #immd
SBC Rd, Rm               Rd = Rd - Rm - borrow        SUBTRACT with borrow (not carry)
SBC.W Rd, Rn, #immd      Rd = Rn - #immd - borrow
SBC.W Rd, Rn, Rm         Rd = Rn - Rm - borrow
RSB.W Rd, Rn, #immd      Rd = #immd - Rn              Reverse subtract
RSB.W Rd, Rn, Rm         Rd = Rm - Rn
MUL Rd, Rm               Rd = Rd * Rm                 Multiply
MUL.W Rd, Rn, Rm         Rd = Rn * Rm
UDIV Rd, Rn, Rm          Rd = Rn / Rm                 Unsigned and Signed divide
SDIV Rd, Rn, Rm          Rd = Rn / Rm

i). ADDITION INSTRUCTIONS

The immediate values below are illustrative: a real ADD or ADC can encode only a restricted
set of immediate constants, so an arbitrary 32 bit value is normally loaded into a register
first.

1. ADD R3, R1, R0
Before Execution: R1 = 0x12345678, R0 = 0xABCDEF45, R3 = 0x23657894
Operation: R3 = R1 + R0
After Execution: R3 = 0xBE0245BD

2. ADD R3, R3, R1
Before Execution: R1 = 0x12345678, R3 = 0x23657894
Operation: R3 = R3 + R1
After Execution: R3 = 0x3599CF0C

3. ADD R3, #0x34624856
Before Execution: R3 = 0x12345678
Operation: R3 = R3 + 0x34624856
After Execution: R3 = 0x46969ECE

4. ADD R3, R1, #0xABCDEF45
Before Execution: R1 = 0x12345678, R3 = 0x23657894
Operation: R3 = R1 + 0xABCDEF45
After Execution: R3 = 0xBE0245BD

5. ADC R3, R1, R0
Before Execution: R1 = 0x12345678, R0 = 0xABCDEF45, R3 = 0x23657894, carry = 1
Operation: R3 = R1 + R0 + carry
After Execution: R3 = 0xBE0245BE

6. ADC R3, R3, R0
Before Execution: R0 = 0xABCDEF45, R3 = 0x23657894, carry = 1
Operation: R3 = R3 + R0 + carry
After Execution: R3 = 0xCF3367DA

7. ADC R3, #0xABCDEF45
Before Execution: R3 = 0x23657894, carry = 1
Operation: R3 = R3 + 0xABCDEF45 + carry
After Execution: R3 = 0xCF3367DA
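
The addition results above can be reproduced with a few lines of Python. This is a minimal sketch of 32-bit addition with carry in and carry out; the function name is made up, and only the carry flag is modelled (N, Z and V are left out).

```python
MASK32 = 0xFFFFFFFF

def add_with_carry(a, b, carry_in=0):
    """Return (32-bit result, carry out) of a + b + carry_in."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32

# ADD R3, R1, R0 from example 1
result, carry = add_with_carry(0x12345678, 0xABCDEF45)

# ADC R3, R1, R0 from example 5: same operands, carry flag set beforehand
result_adc, carry_adc = add_with_carry(0x12345678, 0xABCDEF45, 1)
```

The carry out is what an ADDS would leave in the C flag, and it is the value a following ADC consumes when adding numbers wider than 32 bits.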

ii). SUBTRACTION INSTRUCTIONS

Example 1: SUB R2, R1, R0
Before Execution: R2 = 0x00000000, R1 = 0x00000004, R0 = 0x00000003
Operation: R2 = R1 - R0
After Execution: R2 = 0x00000001, R1 = 0x00000004, R0 = 0x00000003

Example 2: SUB R2, #0x00000003
Before Execution: R2 = 0x00000007
Operation: R2 = R2 - 0x00000003
After Execution: R2 = 0x00000004

Example 3: SUB R2, R1, #0x00000003
Before Execution: R1 = 0x0000000A
Operation: R2 = R1 - 0x00000003
After Execution: R2 = 0x00000007

Example 4: SBC R1, R0
Before Execution: B = 1, C = 0, R0 = 0x0000009A, R1 = 0x0000002B
Operation: R1 = R1 - R0 - borrow
After Execution: R1 = 0xFFFFFF90

Example 5: SBC.W R2, R1, R0
Before Execution: B = 1, C = 0, R0 = 0x0000009A, R1 = 0x0000002B
Operation: R2 = R1 - R0 - borrow
After Execution: R2 = 0xFFFFFF90

Example 6: RSB.W R2, R1, #0x45
Before Execution: R1 = 0x00000002
Operation: R2 = 0x45 - R1
After Execution: R2 = 0x00000043

Example 7: RSB.W R2, R1, R0
Before Execution: R1 = 0x00000022, R0 = 0x00000049
Operation: R2 = R0 - R1
After Execution: R2 = 0x00000027
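
Subtract with borrow and reverse subtract are easy to mix up, so here is a small Python sketch of both. It assumes, as in the examples above, that borrow is the inverse of the carry flag; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def sbc(rn, op2, carry):
    """SBC: rn - op2 - borrow, where borrow = NOT carry."""
    borrow = 1 - carry
    return (rn - op2 - borrow) & MASK32

def rsb(rn, op2):
    """RSB Rd, Rn, op2: Rd = op2 - Rn (the operands are swapped compared with SUB)."""
    return (op2 - rn) & MASK32

sbc_result = sbc(0x2B, 0x9A, carry=0)   # 0x2B - 0x9A - 1, wrapped to 32 bits
rsb_result = rsb(0x22, 0x49)            # 0x49 - 0x22
```

Masking with 0xFFFFFFFF turns a negative Python integer into its 32-bit two's complement form, which is how the register holds it.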
iii). MULTIPLICATION INSTRUCTIONS
Table: 32 bit multiply instructions
Instruction                    Operation                      Description
SMULL RdLo, RdHi, Rn, Rm       {RdHi, RdLo} = Rn * Rm         32 bit multiply instructions for
SMLAL RdLo, RdHi, Rn, Rm       {RdHi, RdLo} += Rn * Rm        signed values
UMULL RdLo, RdHi, Rn, Rm       {RdHi, RdLo} = Rn * Rm         32 bit multiply instructions for
UMLAL RdLo, RdHi, Rn, Rm       {RdHi, RdLo} += Rn * Rm        unsigned values
MLA R10, R1, R2, R5            R10 = (R1 * R2) + R5           Multiply with Accumulate
MLS R4, R5, R6, R7             R4 = R7 - (R5 * R6)            Multiply with Subtract

Example 1: Write an ALP to find the factorial of a given number.

        area code1, code, readonly
value   dcd 0x00001001      ; input number (the result fits in 32 bits only for inputs up to 12)
        export main
main
        ldr r0, =value      ; r0 points to the address
        ldr r1, [r0]        ; r1 gets the data from the address pointed by r0
        mov r2, #1          ; r2 value is loaded with 1
up      mul r2, r1          ; multiply to find factorial
        subs r1, #1         ; subtract and update flags (r1 acts as count variable also)
        bne up              ; repeat the loop till r1 becomes 0
        end

Example 2: Write an ALP to multiply two 64 bit numbers

        area pgm1, code, readonly
value1  dcd 0x12345678, 0xABCDEF34   ; lower, upper
value2  dcd 0x11223344, 0x55667788   ; lower, upper
        export main
main
        ldr r0, =value1         ; r0 points to the address of value 1
        ldrd r1, r2, [r0]       ; r1 = 0x12345678, r2 = 0xABCDEF34
        ldr r0, =value2         ; r0 points to the address of value 2
        ldrd r3, r4, [r0]       ; r3 = 0x11223344, r4 = 0x55667788
        mov r7, #0              ; clear the registers that umlal accumulates into
        mov r8, #0
        umull r5, r6, r1, r3    ; {r6, r5} = B x D
        umlal r6, r7, r1, r4    ; {r7, r6} += B x C
        umlal r6, r7, r2, r3    ; {r7, r6} += A x D (this sum fits in 64 bits for these operands)
        umlal r7, r8, r2, r4    ; {r8, r7} += A x C
        end

Partial products, with A (R2) B (R1) as the first number and C (R4) D (R3) as the second:

B x D  ->  R6:R5
B x C  ->  R7:R6 (accumulated)
A x D  ->  R7:R6 (accumulated)
A x C  ->  R8:R7 (accumulated)
Result:  P4 P3 P2 P1  =  R8 R7 R6 R5
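
The partial-product scheme can be cross-checked in Python, where integers never overflow. The sketch below splits each 64-bit operand into 32-bit halves, forms the same four 32 x 32 products, and returns the 128-bit result as four words P1 to P4. The function name is an assumption made for illustration.

```python
MASK32 = 0xFFFFFFFF

def mul64x64(a, b):
    """Return the 128-bit product of two 64-bit values as [P1, P2, P3, P4], P1 lowest."""
    a_lo, a_hi = a & MASK32, a >> 32      # B and A in the diagram
    b_lo, b_hi = b & MASK32, b >> 32      # D and C in the diagram
    acc = a_lo * b_lo                     # B x D
    acc += (a_lo * b_hi) << 32            # B x C, shifted up by one word
    acc += (a_hi * b_lo) << 32            # A x D, shifted up by one word
    acc += (a_hi * b_hi) << 64            # A x C, shifted up by two words
    return [(acc >> (32 * i)) & MASK32 for i in range(4)]

words = mul64x64(0xABCDEF3412345678, 0x5566778811223344)
```

Unlike a register-based version, this model cannot lose a carry between the middle words, so it is a convenient reference when checking the values left in R5 to R8.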

Example 3: Write an ALP to perform Multiply with accumulate and multiply with subtract
operations.
        area pgm2, code, readonly
data1   dcd 0x00009678
data2   dcd 0x000016EF
data3   dcd 0x00000003
        export main
main
        ldr r0, =data1      ; r0 points to the address of data 1
        ldr r1, =data2      ; r1 points to the address of data 2
        ldr r4, =data3      ; r4 points to the address of data 3
        ldr r2, [r0]        ; r2 = 0x00009678
        ldr r3, [r1]        ; r3 = 0x000016EF
        ldr r6, [r4]        ; r6 = 0x00000003
        mla r5, r2, r3, r6  ; r5 = (r2 * r3) + r6 = 0x0D7ACA0B
        mls r7, r2, r3, r6  ; r7 = r6 - (r2 * r3) = 0xF28535FB
        end

Example 4: Illustrate the difference between smull and umull instructions

                   Ex: -3 x 2 (smull)                  Ex: 3 x 2 (umull)
Instructions       MOV R2, #-0x0003                    MOV R2, #0x0003
                   MOV R3, #0x0002                     MOV R3, #0x0002
                   SMULL R4, R5, R2, R3                UMULL R4, R5, R2, R3
Before Execution   R2 = 0xFFFFFFFD (-3),               R2 = 0x00000003 (3),
                   R3 = 0x00000002 (2)                 R3 = 0x00000002 (2)
After Execution    R4 = 0xFFFFFFFA (low word of -6),   R4 = 0x00000006 (6),
                   R5 = 0xFFFFFFFF (sign extension)    R5 = 0x00000000 (0)
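
The difference only shows up when an operand has its top bit set. The Python sketch below models both instructions as functions returning (RdLo, RdHi); the names and the helper `to_signed32` are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def umull(rn, rm):
    """Return (RdLo, RdHi) of the unsigned 64-bit product."""
    product = (rn & MASK32) * (rm & MASK32)
    return product & MASK32, product >> 32

def smull(rn, rm):
    """Return (RdLo, RdHi) of the signed 64-bit product."""
    product = (to_signed32(rn) * to_signed32(rm)) & MASK64
    return product & MASK32, product >> 32

signed_result = smull(0xFFFFFFFD, 2)     # -3 x 2
unsigned_result = umull(0xFFFFFFFD, 2)   # 4294967293 x 2, same bit pattern
```

Both calls give the same low word; only the high word differs, because SMULL sign-extends the product while UMULL treats 0xFFFFFFFD as a large positive number.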

iv). DIVISION INSTRUCTIONS

The mnemonics for the signed and unsigned divide instructions are SDIV.W and UDIV.W.
The result is Rd = Rn/Rm.

Example 1: Program to divide two 32-bit data

        AREA MULTIPLICATION, CODE, READONLY
        EXPORT main
NUM1    DCD &BBBBBBBB
NUM2    DCD &22222222
main
        LDR R0, NUM1        ; Read the first data
        LDR R1, NUM2        ; Read the second data
        UDIV R3, R0, R1     ; Divide R0 by R1, R3 = 5 (quotient)
                            ; SDIV R3, R0, R1 would treat R0 as a negative number
                            ; and give R3 = 0xFFFFFFFE (-2)
        END
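
For the operands used above, UDIV and SDIV do not give the same quotient, because 0xBBBBBBBB has its top bit set. The following Python sketch models both; it assumes that the divide-by-zero trap is disabled, in which case the quotient is 0, and the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def udiv(rn, rm):
    """UDIV: unsigned quotient; returns 0 for division by zero (trap assumed disabled)."""
    rn &= MASK32
    rm &= MASK32
    return 0 if rm == 0 else rn // rm

def sdiv(rn, rm):
    """SDIV: signed quotient rounded toward zero, returned as a 32-bit pattern."""
    a, b = to_signed32(rn), to_signed32(rm)
    if b == 0:
        return 0
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    return quotient & MASK32

unsigned_q = udiv(0xBBBBBBBB, 0x22222222)
signed_q = sdiv(0xBBBBBBBB, 0x22222222)
```

Python's `//` rounds toward negative infinity, so the model divides the magnitudes and fixes the sign afterwards to get rounding toward zero.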

4. Logical instructions

Instruction Operation Description

AND Rd, Rn Rd = Rd & Rn Bitwise AND
AND.W Rd, Rn, #immd Rd = Rn & #immd
AND.W Rd, Rn, Rm Rd = Rn & Rm
ORR Rd, Rn Rd = Rd | Rn Bitwise OR
ORR.W Rd, Rn, #immd Rd = Rn | #immd
ORR.W Rd, Rn, Rm Rd = Rn | Rm
BIC Rd, Rn Rd = Rd & (~Rn) Bit clear
BIC.W Rd, Rn, #immd Rd = Rn & (~#immd)
BIC.W Rd, Rn, Rm Rd = Rn & (~Rm)
ORN.W Rd, Rn,#immd Rd = Rn | (~#immd) Bitwise OR NOT
ORN.W Rd, Rn, Rm Rd = Rn | (~Rm)
EOR Rd, Rn Rd = Rd ^ Rn Bitwise XOR
EOR.W Rd, Rn, #immd Rd = Rn ^ #immd
EOR.W Rd, Rn, Rm Rd = Rn ^ Rm
Examples
Sl No.   Instruction            Operation                 Before Execution     After Execution
1.       AND R2, R3             R2 = R2 & R3              R2 = 0xE2352356      R2 = 0x60250340
                                                          R3 = 0x79254361
2.       ORR R1, R4, R3         R1 = R4 | R3              R4 = 0xE2352356      R1 = 0xFB356377
                                                          R3 = 0x79254361
3.       BIC.W R6, R0, #2       R6 = R0 & (~#2)           R0 = 0x79254361      R6 = 0x79254361
4.       ORN.W R4, R1, R0       R4 = R1 | ~R0             R1 = 0xE2352356      R4 = 0xE6FFBFDE
                                                          R0 = 0x79254361
5.       EOR.W R2, R3, R4       R2 = R3 ^ R4              R3 = 0xE2352356      R2 = 0x9B106037
                                                          R4 = 0x79254361

5. Shift and Rotate Instructions

Instruction Operation Description

ASR Rd, Rn, #immd Rd = Rn >> immd Arithmetic shift right
ASR Rd, Rn Rd = Rd >> Rn
ASR.W Rd, Rn, Rm Rd = Rn >> Rm
LSL Rd, Rn, #immd Rd = Rn << immd Logical shift left
LSL Rd, Rn Rd = Rd << Rn
LSL.W Rd, Rn, Rm Rd = Rn << Rm
LSR Rd, Rn, #immd Rd = Rn >> immd Logical shift right
LSR Rd, Rn Rd = Rd >> Rn
LSR.W Rd, Rn, Rm Rd = Rn >> Rm
ROR Rd, Rn Rd rot by Rn Rotate right
ROR.W Rd, Rn, #immd Rd = Rn rot by immd
ROR.W Rd, Rn, Rm Rd = Rn rot by Rm
RRX.W Rd, Rn {C, Rd} = {Rn, C} Rotate right extended
Sl Examples
No. Instruction Before Execution After Execution
1. ASR R2, R3, #2 1. R3=0xE2352356H R2 = 0xF88d48D5H
Operation: R2 = R3 >> 2 2. R3 = 0x79254361H R2 = 0x1E4950D8H
2. LSL R1,#3 R1 = 0xABCD1234H R1 = 0x5E6891A0H
Operation: R1 = R1 <<3
3. LSR R6, #1 R6 = 0xA2BC3567H R6 = 0x515E1AB3H
Operation: R6 = R6 >>1
4. ROR R4, #5 R4 = 0x689243ACH R4 = 0x6344921DH
Operation: R4 = R4 rotate
right by 5
5. RRX R4, R4 (executed R4 = 0x689243ACH R4 = 0x0D124875H
three times) Carry = 0 Carry = 1
Operation: each RRX rotates R4 right by one bit through the carry flag; RRX does not take a shift count

6. Sign Extension instructions

Instruction Operation Description

SXTB <Rd>, <Rm> Rd = signext(Rm[7:0]) Sign extend byte data into word
SXTH <Rd>, <Rm> Rd = signext(Rm[15:0]) Sign extend half word data into word
UXTB <Rd>, <Rn> Rd = zeroext(Rn[7:0]) Zero extend byte data into word
UXTH <Rd>, <Rn> Rd = zeroext(Rn[15:0]) Zero extend half word data into word
Example: Before execution: R0 = 0x55AA8765H, R1 = 0x9A2378ABH
(In each case R1 is the destination and R0 is the source.)
1. Instruction: SXTB R1, R0
After execution: R0 = 0x55AA8765H, R1 = 0x00000065H
2. Instruction: SXTH R1, R0
After execution: R0 = 0x55AA8765H, R1 = 0xFFFF8765H
3. Instruction: UXTB R1, R0
After execution: R0 = 0x55AA8765H, R1 = 0x00000065H
4. Instruction: UXTH R1, R0
After execution: R0 = 0x55AA8765H, R1 = 0x00008765H

7. Data Reverse order Instructions

Instructions Operation Description
REV Rd, Rn Rd = rev(Rn) Reverse bytes in word
REV16 Rd, Rn Rd = rev16(Rn) Reverse bytes in each half word
REVSH Rd, Rn Rd = revsh(Rn) Reverse bytes in bottom half word and sign extend the result

Example: Before Execution: R0 = 0x1234789AH, R1 = 0x9A23C8ABH

Instruction: REV R0, R1
After Execution: R1 = 0x9A23C8ABH, R0 = 0xABC8239AH
Instruction: REV16 R0, R1
After Execution: R1 = 0x9A23C8ABH, R0 = 0x239AABC8H
Instruction: REVSH R0, R1
After Execution: R1 = 0x9A23C8ABH, R0 = 0xFFFFABC8H
8. Bit field manipulation and extraction instructions

Sl No. Instruction Operation Description

1 BFC Rd, #<LSB>, #width Bit Field Clear
Clears <width> bits starting from bit <LSB> towards the MSB; the rest of the bits are unaffected.
Example: BFC R0, #4, #10
Before Execution: R0 = 0x12345678H
R0 = 0001 0010 0011 0100 0101 0110 0111 1000 (clear bit 4 to bit 13)
After Execution: R0 = 0x12344008H
R0 = 0001 0010 0011 0100 0100 0000 0000 1000

2 BFI Rd, Rn, #<LSB>, #width Bit Field Insert
Copies the lowest <width> bits of the source register (starting from bit 0) into the destination register, starting at bit <LSB> towards the MSB. The rest of the destination bits are unaffected.
Example: BFI R1, R2, #8, #12
Before Execution: R1 = 0x34221DBDH, R2 = 0xDCBC4533H
R2 = 1101 1100 1011 1100 0100 0101 0011 0011 (bits 0 to 11 are copied)
R1 = 0011 0100 0010 0010 0001 1101 1011 1101 (bits 8 to 19 are replaced)
After Execution: R1 = 0011 0100 0010 0101 0011 0011 1011 1101
R1 = 0x342533BDH, R2 = 0xDCBC4533H

3 RBIT Rd, Rn Reverse Bit
Reverses the bit order of the source register and stores the result in the destination register.
Example: RBIT R1, R2
Before Execution: R1 = 0x12345678H, R2 = 0x10018008H
After Execution: R1 = 0x10018008H, R2 = 0x10018008H
(This bit pattern reads the same in both directions, so the result equals the source.)

4 CLZ.W Rd, Rn Count leading zeros
Counts the leading zeros in the source register and places the count value in the destination register.
Example: CLZ.W R3, R4
Before Execution: R3 = 0x12345678H, R4 = 0x22556688H
After Execution: R3 = 0x00000002H

5 UBFX.W <Rd>, <Rn>, <#LSB>, <#width> Unsigned Bit Field Extraction
Extracts a bit field from a register starting from any location (specified by LSB) with any width (specified by #width), zero extends it, and puts it in the destination register.
Example: LDR R0, =0x5678ABCD
UBFX.W R1, R0, #4, #8
R1 = 0x000000BCH

6 SBFX.W <Rd>, <Rn>, <#LSB>, <#width> Signed Bit Field Extraction
SBFX extracts a bit field in the same way, but sign extends it before putting it in the destination register.
Example: LDR R0, =0x5678ABCD
SBFX.W R1, R0, #4, #8
R1 = 0xFFFFFFBCH

9. Conditions for Branches or Other Conditional Operations

Current Program Status Register (CPSR) Format

Flag PSR Bit Description

N 31 Negative flag(last operation result is a negative value)
Z 30 Zero (last operation result is a zero value)
C 29 Carry flag (last operation returns a carry out or borrow)
V 28 Overflow (last operation results in an overflow)

 Interrupt disable bits

I = 1; Disable IRQ (Interrupt Request)
F = 1; Disable FIQ (Fast Interrupt Request)

 T bit - T = 0; Processor in ARM state

T = 1; Processor in Thumb state (the Cortex-M3 supports only the Thumb state)

 Mode bits: Specify the processor mode.

Examples
B loopA ; Branch to loopA
BLE down ; Conditionally branch to label down
B.W target ; Branch to target within 16MB range
BEQ target ; Conditionally branch to target
BL func ; Branch with link (Call) to function func, return address stored in LR
BX LR ; Return from function call
BXNE R0 ; Conditionally branch to address stored in R0
BLX R0 ; Branch with link and exchange (Call) to an address stored in R0.

Compare (CMP) instruction

 Subtracts two values and updates the flags (just like SUBS), but the result is not stored in
any registers.
 CMP can have the following formats:
CMP R0, R1 ; Calculate R0 – R1 and update flag
CMP R0, #0x12 ; Calculate R0 – 0x12 and update flag
CMN (compare negative) instruction
 It compares one value to the negative (two’s complement) of a second value; the flags are
updated, but the result is not stored in any registers.
 CMN R0, R1 ; Calculate R0 – (–R1), that is R0 + R1, and update flag
CMN R0, #0x12 ; Calculate R0 – (–0x12), that is R0 + 0x12, and update flag

TST (test) instruction

 It is more like the AND operation. It ANDs two values and updates the flags. However,
the result is not stored in any register.
 Similarly to CMP, it has two input formats:
TST R0, R1 ; Calculate R0 AND R1 and update flag
TST R0, #0x12 ; Calculate R0 AND 0x12 and update flag

TEQ (test equal) instruction

 It is similar to XOR operation. It XORs two values and updates the flags. The result is
not stored in any register.
TEQ R1, R2 ; Calculate R1 XOR R2 and update flag
TEQ R1, #0x34 ; Calculate R1 XOR 0x34 and update flag

Combined Compare and Conditional Branch

 With ARM architecture v7-M, two new instructions are provided on the Cortex-M3 to
supply a simple compare with zero and conditional branch operations.
 These are CBZ (compare and branch if zero) and CBNZ (compare and branch if
nonzero). The compare and branch instructions only support forward branches.
 Syntax
CBZ Rn, label
CBNZ Rn, label
where: Rn Specifies the register holding the operand.
label Specifies the branch destination.

 Operation
Use the CBZ or CBNZ instructions to avoid changing the condition code flags and to reduce
number of instructions.
i). CBZ Rn, label
---Does not change condition flags but is otherwise equivalent to:
CMP Rn, #0
BEQ label
ii). CBNZ Rn, label
--- Does not change condition flags but is otherwise equivalent to:
CMP Rn, #0
BNE label
 Condition flags
These instructions do not change the flags.
 Example
CBZ R5, target ; Forward branch if R5 is zero
CBNZ R0, target ; Forward branch if R0 is not zero.
 Example : i = 5;
while (i != 0)
{
func1(); /* call a function */
i--;
}
 This can be compiled into the following:
MOV R0, #5 ; Set loop counter
loop1 CBZ R0, loop1exit ; if loop counter = 0 then exit the loop
BL func1 ; call a function
SUB R0, #1 ; loop counter decrement
B loop1 ; next loop
loop1exit

 The usage of CBNZ is similar to CBZ, apart from the fact that the branch is taken if the
register value is not zero. Neither instruction reads or updates the Z flag.

Example: status = strchr(email_address, '@');

if (status == 0)
{ // status is 0 if @ is not in email_address
show_error_message();
exit(1);
}
This can be compiled into the following:
...
BL strchr
CBNZ R0, email_looks_okay ; Branch if result is not zero
BL show_error_message
BL exit
email_looks_okay
...

The APSR value is not affected by the CBZ and CBNZ instructions.
10. Memory Barrier Instructions
 The Cortex-M3 supports a number of barrier instructions.
 These instructions are needed as memory systems get more and more complex.
 In some cases, if memory barrier instructions are not used, race conditions could
occur.
 The following are the three barrier instructions in the Cortex-M3
• DMB • DSB • ISB.
 These instructions are described below.
 The memory barrier instructions can be accessed in C using Cortex Microcontroller
Software Interface Standard (CMSIS) compliant device driver library as follows:
void __DMB(void); // Data Memory Barrier.
void __DSB(void); // Data Synchronization Barrier.
void __ISB(void); // Instruction Synchronization Barrier.

DMB
 It is very useful for multi-processor systems.
 For example, tasks running on separate processors might use shared memory
to communicate with each other.
 In these environments, the order of memory accesses to the shared memory can be
very important.
 DMB instructions can be inserted between accesses to the shared memory to ensure
that the memory access sequence is exactly the same as expected.

DSB
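 DSB ensures that all memory accesses issued before it are completed before the next
instruction is executed.
 For example, if the memory map can be switched by a hardware register, after
writing to the memory switching register you should use the DSB instruction.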
 Otherwise, if the write to the memory switching register is buffered and takes a
few cycles to complete, and the next instruction accesses the switched memory
region immediately, the access could be using the old memory map.
 In some cases, this might result in an invalid access if the memory switching
and memory access happen at the same time.
 Using DSB in this case will make sure that the write to the memory map
switching register is completed before a new instruction is executed.
 The DSB and ISB instructions can be important for self-modifying code.
 For example, if a program changes its own program code, the next
executed instruction should be based on the updated program.
 However, since the processor is pipelined, the modified instruction location
might have already been fetched.
 Using DSB and then ISB can ensure that the modified program code is fetched
again.
ISB
 Architecturally, the ISB instruction should be used after updating the value of the
CONTROL register.
 In the Cortex-M3 processor, this is not strictly required. But if you want to make
sure your application is portable, you should ensure an ISB instruction is used after
updating to CONTROL register.

11. If then instruction block

 The IT (IF-THEN) block is very useful for handling small conditional code.
 It avoids branch penalties because there is no change to program flow.
 It can provide a maximum of four conditionally executed instructions.
 In IT instruction blocks, the first line must be the IT instruction, detailing the
choice of execution, followed by the condition it checks.
 The first statement after the IT command must be TRUE-THEN- EXECUTE, which
is always written as ITxyz, where T means THEN and E means ELSE.
 The second through fourth statements can be either THEN (true) or ELSE (false).

IT<x><y><z> <cond> ; IT instruction (<x>, <y>, <z> can be T or E)
instr1<cond> <operands> ; 1st instruction (<cond> must be same as IT)
instr2<cond or not cond> <operands> ; 2nd instruction (can be <cond> or <!cond>)
instr3<cond or not cond> <operands> ; 3rd instruction (can be <cond> or <!cond>)
instr4<cond or not cond> <operands> ; 4th instruction (can be <cond> or <!cond>)

 If a statement is to be executed when <cond> is false, the suffix for the instruction
must be the opposite of the condition.
 For example, the opposite of EQ is NE, the opposite of GT is LE, and so on.
Ex 1: if (R1<R2) then
R2=R2−R1
R2=R2/2
else
R1=R1−R2
R1=R1/2

In assembly,
CMP R1, R2 ; If R1 < R2 (less than)
ITTEE LT ; then execute instruction 1 and 2 (indicated by T),
; else execute instruction 3 and 4 (indicated by E)
SUBLT.W R2, R1 ; 1st instruction
LSRLT.W R2, #1 ; 2nd instruction
SUBGE.W R1, R2 ; 3rd instruction (notice the GE is opposite of LT)
LSRGE.W R1, #1 ; 4th instruction

 If an exception occurs during the IT instruction block, the execution status of the
block will be stored in the stacked PSR (in the IT/Interrupt-Continuable Instruction
[ICI] bit field).
 So, when the exception handler completes and the IT block resumes, the rest of
the instructions in the block can continue the execution correctly.
 In the case of using multicycle instructions (for example, multiple load and
store) inside an IT block, if an exception takes place during the execution, the
whole instruction is abandoned and restarted after the interrupt process is
completed.

 The table above gives how the disassembler generates the code from object file.
12. Saturation instructions
 The Cortex-M3 supports two instructions that provide signed and unsigned
saturation operations: SSAT and USAT (for signed data type and unsigned data
type, respectively).
 Saturation is commonly used in signal processing—for example, in signal
amplification. When an input signal is amplified, there is a chance that the
output will be larger than the allowed output range.
 If the value is adjusted simply by removing the unused MSB, an overflowed result
will cause the signal waveform to be completely deformed as shown in figure below.
 The saturation operation does not prevent the distortion of the signal, but at least
the amount of distortion is greatly reduced in the signal waveform.

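Syntax
SSAT.W Rd, #immed, Rn {, shift} ; Saturation for a signed value
USAT.W Rd, #immed, Rn {, shift} ; Saturation of a signed value into an unsigned value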

 Besides the destination register, the Q-bit in the APSR can also be affected by the
result.
 The Q flag is set if saturation takes place in the operation, and it can be cleared
by writing to the APSR.
 For example, if a 32-bit signed value is to be saturated into a 16-bit signed value,
the following instruction can be used: SSAT.W R1, #16, R0.
 This will provide a saturation feature that has the properties shown in above Figure.
 For the preceding 16-bit saturation example instruction, the output values shown in the
table below can be observed.
 The range of the signal with signed saturation is from 0xFFFF8000H to 0x00007FFFH.

Unsigned Saturation
 If a 32-bit signed value is to be saturated into a 16-bit unsigned value, the
following instruction can be used: USAT.W R1, #16, R0.
 The figure below indicates the resultant signal with unsigned saturation.
 The range of the signal with unsigned saturation is from 0x00000000H to 0x0000FFFFH.
 The table below shows the status of Q bit for different values of input and output.

Applications of saturation instruction

 Saturation instructions are commonly used in signal processing applications.
 They can also be used for data type conversions.
Ex: They can be used to convert a 32-bit integer value to 16-bit integer value.
However, C compilers might not be able to directly use these instructions, so
intrinsic function or assembler functions (or embedded/inline assembler code) for the
data conversion could be required.

13. Table branch byte/Table branch half word

 Table Branch Byte (TBB) and Table Branch Halfword (TBH) are for
implementing branch tables.
 The TBB instruction uses a branch table of byte size offset, and TBH uses a branch
table of half word offset.
 Since the bit 0 of a program counter is always zero, the value in the branch table is
multiplied by two before it’s added to PC.
 Furthermore, because the PC value is the current instruction address plus four, the
branch range for TBB is (2 × 255) + 4 = 514.
 Branch range for TBH is (2 × 65535) + 4 = 131074.
 Both TBB and TBH support forward branch only.
 TBB has this general syntax: TBB.W [Rn, Rm]
where Rn is the base memory offset and Rm is the branch table index.
 The branch table item for TBB is located at Rn + Rm.
 Assuming we used PC for Rn, we can see the operation as shown in the below figure.
 For TBH instruction, the process is similar except the memory location of the branch
table item is located at Rn + 2 x Rm and the maximum branch offset is higher.
 Again, we assume that Rn is set to PC, as shown in Figure w.r.t TBH.
 If Rn in the table branch instruction is set to R15, the value used for Rn will be
PC + 4 because of the pipeline in the processor.
 These two instructions are more likely to be used by a C compiler to generate code
for switch (case) statements.
 Because the values in the branch table are relative to the current program counter, it
is not easy to code the branch table content manually in assembler as the address
offset value might not be able to be determined during assembly/compile stage,
especially if the branch target is in a separate program code file.
 The coding syntax for calculating TBB/TBH branch table content could be dependent
on the development tool.
 In ARM assembler (armasm), the TBB branch table can be created in the following
way:
TBB.W [pc, r0] ; when executing this instruction, PC is the base address
branchtable ; label
DCB ((dest0 - branchtable)/2) ; Note that DCB is used because the value is 8-bit
DCB ((dest1 - branchtable)/2)
DCB ((dest2 - branchtable)/2)
DCB ((dest3 - branchtable)/2)
dest0 ... ; Execute if r0 = 0
dest1 ... ; Execute if r0 = 1
dest2 ... ; Execute if r0 = 2
dest3 ... ; Execute if r0 = 3

 When the TBB instruction is executed, the current PC value is at the address labeled
as branchtable (because of the pipeline in the processor).
 Similarly, for TBH instructions, it can be used as follows:

TBH.W [pc, r0, LSL #1]

branchtable
DCI ((dest0 - branchtable)/2) ; Note that DCI is used because the value is 16-bit
DCI ((dest1 - branchtable)/2)
DCI ((dest2 - branchtable)/2)
DCI ((dest3 - branchtable)/2)
dest0 ... ; Execute if r0 = 0
dest1 ... ; Execute if r0 = 1
dest2 ... ; Execute if r0 = 2
dest3 ... ; Execute if r0 = 3
14. Miscellaneous Instructions
Sl Instructions Descriptions
No.
1 BKPT Breakpoint
2 CPSID Change processor state, disable interrupt
3 CPSIE Change processor state, enable interrupt
4 DMB Data Memory Barrier
5 DSB Data Synchronization Barrier
6 ISB Instruction Synchronization Barrier
7 MRS Move from special register to register
8 MSR Move from register to special register
9 NOP No operation
10 SEV Send event
11 SVC Supervisor call
12 WFE Wait for event
13 WFI Wait for interrupt
LIST OF INSTRUCTIONS
Table A: 16-Bit Data Processing Instructions
Table B: 16-Bit Branch Instructions

Table C: 16-Bit Load and Store Instructions

Table D: Other 16-Bit Instructions

Table E: 32-Bit Data Processing Instructions
Table F: 32-Bit Branch Instructions

Table G: Other 32 bit instructions

Table H: 32-Bit Load and Store Instructions

Table I: Unsupported Thumb Instructions for Traditional ARM Processors

Table J: Unsupported Change Process State Instructions

Table K: Unsupported Coprocessor Instructions

Table L: Unsupported Hint Instructions

ARM ALP PROGRAMS
1.Write an ALP to find the sum of 5 numbers in an array
AREA DATA
SUMP DCD SUM
N DCD 5
NUM1 DCD 3, -7, 2, -2, 10
POINTER DCD NUM1

AREA MYRAM, DATA, READWRITE

SUM DCD 0

AREA MYCODE, CODE, READONLY
ENTRY
EXPORT main

main
;;;;;;;;;;User Code Start from the next line;;;;;;;;;;;;

LDR R1, N ; load size of array -
; a counter for how many elements are left to process
LDR R2, POINTER ; load base pointer of array
MOV R0, #0 ; initialize accumulator
LOOP LDR R3, [R2], #4 ; load value from array,
; increment array pointer to next word
ADD R0, R0, R3 ; add value from array to accumulator
SUBS R1, R1, #1 ; decrement work counter
BGT LOOP ; keep looping until counter is zero
LDR R4, SUMP ; get memory address to store sum
STR R0, [R4] ; store answer (3 - 7 + 2 - 2 + 10 = 6)
LDR R6, [R4] ; Check the value in the SUM
STOP B STOP
END
Differences between microprocessor and Microcontroller

Sl No. Microprocessor / Microcontroller
1 Microprocessor: [Block diagram: the microprocessor is connected through the system bus to external read only memory (ROM), read write memory, timer, I/O port and serial interface]
Microcontroller: [Block diagram: the microcontroller contains read only memory, read write memory, timer, I/O port and serial interface]
2 Microprocessor: It is the heart of a computer system.
Microcontroller: It is the heart of an embedded system.
3 Microprocessor: It is just a processor. Memory and I/O components have to be connected externally.
Microcontroller: It has an internal processor along with internal memory and I/O components.
4 Microprocessor: Since memory and I/O have to be connected externally, the circuit becomes large.
Microcontroller: Since memory and I/O are present internally, the circuit is small.
5 Microprocessor: Cannot be used in compact systems and hence is inefficient there.
Microcontroller: Can be used in compact systems and hence is an efficient technique.
6 Microprocessor: Cost of the entire system increases.
Microcontroller: Cost of the entire system is low.
7 Microprocessor: Due to external components, the total power consumption is high. Hence it is not suitable for devices running on stored power like batteries.
Microcontroller: Since external components are few, total power consumption is lower and it can be used with devices running on stored power like batteries.
8 Microprocessor: Most microprocessors do not have power saving features.
Microcontroller: Most microcontrollers have power saving modes like idle mode and power saving mode. This helps to reduce power consumption even further.
Differences between RISC and CISC

Sl No. RISC / CISC
1 RISC: RISC stands for Reduced Instruction Set Computer.
CISC: CISC stands for Complex Instruction Set Computer.
2 RISC: RISC processors have simple instructions taking about one clock cycle. The average clock cycle per instruction (CPI) is 1.5.
CISC: CISC processors have complex instructions that take up multiple clocks for execution. The average clock cycle per instruction (CPI) is in the range of 2 to 15.
3 RISC: Performance is optimized with more focus on software.
CISC: Performance is optimized with more focus on hardware.
4 RISC: It has no memory unit and uses separate hardware to implement instructions.
CISC: It has a memory unit to implement complex instructions.
5 RISC: It has a hard-wired unit of programming.
CISC: It has a microprogramming unit.
6 RISC: The instruction set is reduced, i.e. it has only a few instructions in the instruction set. Many of these instructions are very primitive.
CISC: The instruction set has a variety of different instructions that can be used for complex operations.
7 RISC: RISC has only a few simple addressing modes.
CISC: CISC has many different addressing modes and can thus be used to represent higher-level programming language statements more efficiently.
8 RISC: Complex addressing modes are synthesized using software.
CISC: CISC already supports complex addressing modes.
9 RISC: Multiple register sets are present.
CISC: Only has a single register set.
10 RISC: RISC processors are highly pipelined.
CISC: They are normally not pipelined or less pipelined.
11 RISC: The complexity of RISC lies with the compiler that generates the program code.
CISC: The complexity lies in the micro program.
12 RISC: Execution time is very low.
CISC: Execution time is very high.
13 RISC: Code expansion can be a problem.
CISC: Code expansion is not a problem.
Differences between Harvard architecture and Von Neumann architecture

Sl No. Harvard architecture / Von Neumann architecture
2 Harvard: It requires two memories, one for instructions and one for data.
Von Neumann: It requires only one memory for both instructions and data.
3 Harvard: Design of the Harvard architecture is complicated.
Von Neumann: Design of the Von Neumann architecture is simple.
4 Harvard: It requires separate buses for instructions and data.
Von Neumann: It requires only one bus for instructions and data.
5 Harvard: Processor can complete an instruction in one cycle.
Von Neumann: Processor needs two clock cycles to complete an instruction.
6 Harvard: Easier to pipeline, so high performance can be achieved.
Von Neumann: Low performance as compared to Harvard architecture.
7 Harvard: Comparatively high cost.
Von Neumann: It is cheaper.

ARM MICROCONTROLLER & EMBEDDED SYSTEMS – 17EC62

MODULE 2
CHAPTER 5

Text Books: Joseph Yiu, “The Definitive Guide to the ARM Cortex-M3”, 2nd Edition, Newnes, (Elsevier), 2010.
Memory map of Cortex M3
8.4.1 Memory system features overview

1. A predefined memory map that specifies which bus interface is to be used when a memory location is accessed.
2. Bit band: This provides atomic operations to bit data in memory or peripherals, only supported in special memory regions.
3. Supports unaligned transfers and exclusive accesses as well.
4. The Cortex M3 supports both little endian and big endian memory configuration.
8.4.2 Memory maps

• The Cortex M3 processor has a fixed memory map.

• Some of the memory locations are allocated for private peripherals such as debugging components.
1. Flash Patch and Breakpoint Unit (FPB)
2. Data Watchpoint and Trace Unit (DWT)
3. Instrumentation Trace Macrocell (ITM)
4. Embedded Trace Macrocell (ETM)
5. Trace Port Interface Unit (TPIU)
6. ROM Table

• The Cortex M3 processor has a total of 4GB of address space

Code
0.5GB. Mainly used for program code; data memory can also be placed in this region.

SRAM
0.5GB. The SRAM memory range is for connecting internal SRAM.

On-chip peripherals
0.5GB. Supports bit band alias and is accessed via the system bus interface.

External RAM
1 GB. Program execution is allowed.

External devices
1 GB. Program execution is not allowed.

System level components + internal private peripheral buses + external private peripheral bus + vendor-specific system peripherals
0.5 GB
8.4.3 Memory access attributes

The memory map also defines the memory attributes of accessing each memory block or device:

• Bufferable: Write to memory can be carried out by a write buffer while the processor continues next instruction execution.
• Cacheable: Data obtained from memory read can be copied to a memory cache so that next time it is accessed the value can be obtained from the cache to speed up the program execution.
• Executable: The processor can fetch and execute program code from this memory region.
• Shareable: Data in this memory region could be shared by multiple bus masters. Memory system needs to ensure coherency of data between different bus masters in shareable memory region.

The default memory attribute settings can be overridden if MPU is present and the region is programmed differently from the default.
• Code memory region (0x00000000–0x1FFFFFFF): This region is executable, and the cache attribute is write through (WT). You can put data memory in this region as well. If data operations are carried out for this region, they will take place via the data bus interface. Write transfers to this region are bufferable.

• SRAM memory region (0x20000000–0x3FFFFFFF): This region is intended for on-chip RAM. Write transfers to this region are bufferable, and the cache attribute is write back, write allocated (WB-WA). This region is executable, so you can copy program code here and execute it.

• Peripheral region (0x40000000–0x5FFFFFFF): This region is intended for peripherals. The accesses are non cacheable. You cannot execute instruction code in this region (Execute Never, or XN in ARM documentation, such as the Cortex-M3 TRM).

• External RAM region (0x60000000–0x7FFFFFFF): This region is intended for either on-chip or off-chip memory. The accesses are cacheable (WB-WA), and you can execute code in this region.

• External RAM region (0x80000000–0x9FFFFFFF): This region is intended for either on-chip or off-chip memory. The accesses are cacheable (WT), and you can execute code in this region.

• External devices (0xA0000000–0xBFFFFFFF): This region is intended for external devices and/or shared memory that needs ordering/non buffered accesses. It is also a non executable region.

• External devices (0xC0000000–0xDFFFFFFF): This region is intended for external devices and/or shared memory that needs ordering/non buffered accesses. It is also a non executable region.

• System region (0xE0000000–0xFFFFFFFF): This region is for private peripherals and vendor-specific devices. It is non executable. For the PPB memory range, the accesses are strongly ordered (non cacheable, non bufferable). For the vendor-specific memory region, the accesses are bufferable and non cacheable.
8.4.4 Default Memory access permissions

• The Cortex M3 memory map has a default configuration for memory access permissions
 Prevents user program from accessing system control memory spaces

• The default memory access permission is used when either:
1. No MPU is present
2. MPU is present but disabled
 Otherwise, the MPU will determine whether user accesses are allowed

• When a user access is blocked, the fault exception takes place immediately.
8.4.5 Bit – Band Operations

• Bit band operation support allows a single load/store (read/write) operation to access a single data bit.

• Bit band regions:
1. The first 1 MB of the SRAM region
2. The first 1 MB of the peripheral region

• They can be accessed via a separate memory region called the bit-band alias.

• There are two regions of memory for bit-band operations:
0x20000000–0x200FFFFF (SRAM, 1 MB)
0x40000000–0x400FFFFF (peripherals, 1 MB)

[Slides not recovered: bit-band alias figures, Program 1, and Program 2 (read from the bit band alias)]
Bit-Band Operations in C programs
• There is no native support of bit-band operation in most C compilers.
• Ex: C compilers do not understand that the same memory can be accessed using two
different addresses
• When the bit-band feature is used, the variables being accessed might need to be
declared as volatile.
• Volatile property is used to ensure that each time a variable is accessed, the memory
location is accessed instead of a local copy of the data inside the processor.
• In ARM RealView Development Suite version 4.0 and Keil MDK-ARM 3.80, bit band support is provided by the __attribute__((bitband)) language extension and the --bitband command line option.
#define DEVICE_REG0 *((volatile unsigned long *) (0x40000000))
#define DEVICE_REG0_BIT0 *((volatile unsigned long *) (0x42000000))
#define DEVICE_REG0_BIT1 *((volatile unsigned long *) (0x42000004))
...
DEVICE_REG0 = 0xAB; // Accessing the hardware register by normal address
...
DEVICE_REG0 = DEVICE_REG0 | 0x2; // Setting bit 1 without using bitband feature
...
DEVICE_REG0_BIT1 = 0x1; // Setting bit 1 using bitband feature
// via the bit band alias address
C macros to make accessing the bit-band alias easier
1. set up one macro to convert the bit-band address and the bit number into the bit-
band alias address
2. set up another macro to access the memory location by taking the address value
as a pointer

// Convert bit band address and bit number into
// bit band alias address
#define BITBAND(addr,bitnum) (((addr) & 0xF0000000)+0x02000000+(((addr) & 0xFFFFF)<<5)+((bitnum)<<2))

// Convert the address as a pointer
#define MEM_ADDR(addr) *((volatile unsigned long *) (addr))

Based on the previous example, we rewrite the code as follows:

#define DEVICE_REG0 0x40000000

#define BITBAND(addr,bitnum) (((addr) & 0xF0000000)+0x02000000+(((addr) & 0xFFFFF)<<5)+((bitnum)<<2))

#define MEM_ADDR(addr) *((volatile unsigned long *) (addr))
...

MEM_ADDR(DEVICE_REG0) = 0xAB; // Accessing the hardware register by normal address
...

// Setting bit 1 without using bitband feature
MEM_ADDR(DEVICE_REG0) = MEM_ADDR(DEVICE_REG0) | 0x2;
...

// Setting bit 1 with using bitband feature
MEM_ADDR(BITBAND(DEVICE_REG0,1)) = 0x1;
8.4.6 Unaligned Transfers
• The Cortex M3 supports unaligned transfers on single accesses. Data memory accesses can be defined as aligned or unaligned. A transfer is unaligned when:

1. The transfer is word size and the address is not a multiple of 4
[Figures not recovered: unaligned transfer examples 1 to 3]

2. The transfer is half word size and the address is not a multiple of 2
[Figures not recovered: unaligned transfer examples 4 and 5]
UNALIGNED TRANSFER CNTD…
 Semaphores are commonly used for allocating shared resources to applications.
 When a shared resource can only service one client or application processor, we
also call it Mutual Exclusion (MUTEX)
EXCLUSIVE ACCESSES CNTD…

Used in Cortex M3

Syntax
LDREX <Rxf>, [Rn, #offset]
STREX <Rd>, <Rxf>,[Rn, #offset]
Where Rd is the return status of the exclusive write (0 = success and 1 =
failure).
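A MUTEX built on exclusive accesses can be sketched in portable C with the C11 atomics library. The assumption here is that a compiler targeting the Cortex-M3 typically turns the compare-and-exchange into an LDREX/STREX retry sequence; the type and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: 0 = free, 1 = taken. */
typedef struct {
    atomic_int locked;
} mutex_t;

/* Try to take the lock once. Succeeds only if the value was still 0 at
 * the moment of the write, which is the guarantee an exclusive store gives. */
static bool mutex_try_lock(mutex_t *m)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&m->locked, &expected, 1);
}

/* Release the lock so another client can take it. */
static void mutex_unlock(mutex_t *m)
{
    atomic_store(&m->locked, 0);
}
```

A second call to mutex_try_lock on a lock that is already taken returns false, so the caller can retry later or do other work instead of corrupting the shared resource.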
ENDIAN MODE CNTD…

• The endian mode is set when the processor exits
reset and it cannot be changed afterwards.
• Instruction fetches are always in little endian, as are
data accesses in the configuration control memory
space and the external PPB memory range.
• The data can be easily converted between little
endian and big endian using the instructions REV/REVH.
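The effect of the byte-reversal instructions can be sketched in C. The sketch assumes that REV reverses all four bytes of a word and that REVH swaps the two bytes inside each half word (written REV16 in current ARM syntax); the function names are hypothetical.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word (the effect assumed for REV). */
static uint32_t rev_word(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Swap the two bytes inside each 16-bit half word (the effect assumed
 * for REVH). */
static uint32_t rev_halfwords(uint32_t x)
{
    return ((x >> 8) & 0x00FF00FFu) | ((x << 8) & 0xFF00FF00u);
}
```

With the input 0x12345678, rev_word gives 0x78563412 and rev_halfwords gives 0x34127856.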
ARM MICROCONTROLLER & EMBEDDED SYSTEMS – 17EC62

MODULE 2
CHAPTER 10

Text Books: Joseph Yiu, “The Definitive Guide to the ARM Cortex-
M3”, 2nd Edition, Newnes, (Elsevier), 2010.
Cortex-M3 programming
 The Cortex-M3 can be programmed using
assembly language, C language, or other high-level
languages such as National Instruments LabVIEW.
 For most embedded applications using the Cortex-
M3 processor, the software can be written entirely in
C language.
 There are of course some people who prefer to use
assembly language or a combination of C and
assembly language in their projects.
 The procedure of building and downloading the
resultant image files to the target device is largely
dependent on the tool chain used.
A typical Development flow
The most basic flow uses an assembler, a C compiler, a
linker, and binary file generation utilities.
 RealView Development Suite (RVDS) or RealView
Compiler Tools (RVCT) provide a file generation flow as
shown.
 The scatter-loading script is optional but often required
when the memory map becomes more complex.
 RVDS also contains a large number of utilities, including
an Integrated Development Environment (IDE) and
Example of a Simple C Program Using RealView Development Suite
• A normal program for the Cortex-M3 contains at least the “main” program and a
vector table.
• Let’s start with the most basic main program that toggles a Light Emitting Diode
(LED):
#define LED *((volatile unsigned int *)(0xDFFF000C))
int main (void)
{
    int i;          /* loop counter for delay function */
    volatile int j; /* dummy volatile variable to prevent the C
                       compiler from optimizing the delay away */
    while (1)
    {
        LED = 0x00; /* toggle LED */
        for (i=0;i<10;i++)
        { j=0; } /* delay */
        LED = 0x01; /* toggle LED */
        for (i=0;i<10;i++)
        { j=0; } /* delay */
    }
    return 0;
}
LED example cntd…
 The program is saved as blinky.c.
 A separate C file called “vectors.c” is also needed.
 The file “vectors.c” contains the vector table, as well as a
number of dummy exception handlers.
LED Example cntd…
 using RVDS
1. Compile the program using the command line:
$> armcc -c -g -W blinky.c -o blinky.o
$> armcc -c -g -W vectors.c -o vectors.o

2. The linker can then be used to generate the program image.

 A scatter loading file “led.scat” is used to tell the linker the memory layout and to put the
vector table at the start of the program image.
The command line for the linker is
$> armlink --scatter led.scat "--keep=vectors.o(exceptions_area)" blinky.o vectors.o -o blinky.elf

The executable image blinky.elf is now generated.

3. It can be converted to a binary file and a disassembly file using fromelf.

/* create binary file */
$> fromelf --bin blinky.elf --output blinky.bin
/* Create disassembly output */
$> fromelf -c blinky.elf > list.txt
 Using Keil MDK tool

Created in DOS batch file

Accessing Memory-Mapped Registers
in C
• There are various ways to access memory-
mapped peripheral registers in C language.
• For illustration, we will use the System Tick
(SYSTICK) Timer in the Cortex-M3 as an
example peripheral to demonstrate different
access methods in C language.
• The SYSTICK is a 24-bit timer which
contains only four registers.
Accessing Peripheral Registers as
Pointers
• Method 1, the easiest, is to define
each register as a pointer.
Alternative Way of Accessing
Peripheral Registers as Pointers
• We can define a macro to convert
address values to C pointer.
• The C-code looks a bit different, but the
generated code is the same as previous
implementation.
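A minimal sketch of the macro-based approach for the four SYSTICK registers is shown here. The register addresses are the standard Cortex-M3 ones; the macro name HW_REG and the function systick_start are assumptions made for illustration. The function only makes sense on the target, since it writes to fixed hardware addresses.

```c
#include <stdint.h>

/* Convert an address value into a dereferenced pointer to a 32-bit register. */
#define HW_REG(addr) (*((volatile uint32_t *)(uintptr_t)(addr)))

/* The four SYSTICK registers of the Cortex-M3. */
#define SYSTICK_CTRL  HW_REG(0xE000E010u) /* control and status */
#define SYSTICK_LOAD  HW_REG(0xE000E014u) /* reload value       */
#define SYSTICK_VAL   HW_REG(0xE000E018u) /* current value      */
#define SYSTICK_CALIB HW_REG(0xE000E01Cu) /* calibration value  */

/* Hypothetical helper: start the timer from the core clock without
 * interrupt generation. Run this on the target hardware only. */
static inline void systick_start(uint32_t reload)
{
    SYSTICK_CTRL = 0u;                   /* disable while configuring    */
    SYSTICK_LOAD = reload & 0x00FFFFFFu; /* the counter is 24 bits wide  */
    SYSTICK_VAL  = 0u;                   /* a write clears the counter   */
    SYSTICK_CTRL = 0x5u;                 /* ENABLE and CLKSOURCE bits    */
}
```

Because HW_REG expands to an ordinary dereference, a register can be read, written, or modified with the usual C operators, for example SYSTICK_CTRL |= 0x2u to set the interrupt enable bit.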
Accessing Peripheral Registers as
Pointers to Elements in a Data
Structure
• Method 2 is to define the registers as a data structure, and
then define a pointer of the defined structure.
• This is the method used in CMSIS compliant device driver
libraries.
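The data structure method can be sketched as follows. The type name systick_regs_t and the function systick_setup are hypothetical; CMSIS libraries define their own names. Passing the structure pointer as a parameter is a choice made for this sketch so that the same code can be exercised against an ordinary variable on a PC.

```c
#include <stddef.h>
#include <stdint.h>

/* Register layout of the SYSTICK timer: four consecutive 32-bit registers. */
typedef struct {
    volatile uint32_t CTRL;  /* offset 0x0: control and status */
    volatile uint32_t LOAD;  /* offset 0x4: reload value       */
    volatile uint32_t VAL;   /* offset 0x8: current value      */
    volatile uint32_t CALIB; /* offset 0xC: calibration value  */
} systick_regs_t;

/* On the target, the structure is overlaid on the peripheral base address. */
#define SYSTICK ((systick_regs_t *)(uintptr_t)0xE000E010u)

/* Hypothetical helper: configure and start the timer through the structure. */
static void systick_setup(systick_regs_t *st, uint32_t reload)
{
    st->CTRL = 0u;                   /* disable while configuring   */
    st->LOAD = reload & 0x00FFFFFFu; /* the counter is 24 bits wide */
    st->VAL  = 0u;                   /* a write clears the counter  */
    st->CTRL = 0x5u;                 /* ENABLE and CLKSOURCE bits   */
}
```

On the target the call would be systick_setup(SYSTICK, reload). Only one base address is needed for the whole peripheral, which is one reason this style suits device driver libraries.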
Defining Peripheral-Based Address
Using Scatter Loading File
• Method 3 also uses data structure, but the base address of
the peripheral is defined using a scatter loading file (or
linker script) during linking stage.
Intrinsic Functions
• Use of the C language can often speed up
application development, but in some cases,
we need to use some instructions that
cannot be generated using normal C-code.
• Some C compilers provide intrinsic
functions for accessing these special
instructions.
• Intrinsic functions are used just like normal
C functions.
Intrinsic Functions Provided in
ARM Compilers
Assembly code can be included in
C code like this
CMSIS
Cortex Microcontroller Software Interface Standard
The aims of CMSIS
• improve software portability and reusability
• enable software solution suppliers to develop
products that can work seamlessly with device
libraries from various silicon vendors
• allow embedded developers to develop software quicker
with an easy-to-use and standardized software
interface
• allow embedded software to be used on multiple
compiler products
• avoid device driver compatibility issues when
using software solutions from multiple sources
Areas of Standardization
• Hardware Abstraction Layer (HAL) for
Cortex-M processor registers: This includes
standardized register definitions for NVIC,
System Control Block registers, SYSTICK
register, MPU registers, and a number of
NVIC and core feature access functions.

• Standardized system exception names: This
allows OS and middleware to use system
exceptions easily without compatibility issues.
Areas of Standardization cntd…

• Standardized method of header file organization: This
makes it easier for users to learn new Cortex
microcontroller products and improves software
portability.
• Common method for system initialization: Each
Microcontroller Unit (MCU) vendor provides a
SystemInit() function in their device driver library for
essential setup and configuration, such as initialization of
clocks.
• Standardized intrinsic functions: By having standardized
intrinsic functions, software reusability and portability are
considerably improved.
Areas of Standardization cntd…
• Common access functions for communication: This
provides a set of software interface functions for
common communication interfaces including
universal asynchronous receiver/transmitter (UART),
Ethernet, and Serial Peripheral Interface (SPI).
• Standardized way for embedded software to determine
system clock frequency: A software variable called
SystemFrequency is defined in device driver code.
This allows an embedded OS to set up the SYSTICK unit
based on the system clock frequency.
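How an embedded OS might turn the system clock frequency into a SYSTICK reload value can be sketched as follows. The function name systick_reload and the choice of 0 as an error value are assumptions for illustration; the only facts used are that the counter is 24 bits wide and counts down from the reload value to zero.

```c
#include <stdint.h>

/* Hypothetical helper: reload value that makes the SYSTICK timer wrap
 * tick_hz times per second when clocked at sysclk_hz.
 * Returns 0 when no usable value fits in the 24-bit counter. */
static uint32_t systick_reload(uint32_t sysclk_hz, uint32_t tick_hz)
{
    uint32_t cycles;

    if (tick_hz == 0u) {
        return 0u;                  /* avoid division by zero          */
    }
    cycles = sysclk_hz / tick_hz;   /* clock cycles per tick           */
    if (cycles < 2u || (cycles - 1u) > 0x00FFFFFFu) {
        return 0u;                  /* too short or too long to count  */
    }
    return cycles - 1u;             /* counter runs from N-1 down to 0 */
}
```

With a 72 MHz system clock and a 1 kHz tick, the result is 72000000 / 1000 - 1 = 71999.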
Organization of CMSIS
The CMSIS is divided into multiple layers as follows:
• Core Peripheral Access Layer: Name definitions, address
definitions and helper functions to access core registers
and core peripherals.
• Middleware Access Layer: Common method to access
peripherals for the software industry. Targeted
communication interfaces include Ethernet, UART and
SPI. This allows portable software to perform communication tasks
on any Cortex microcontroller that supports the required
communication interface.
Organization of CMSIS cntd…
• Device Peripheral Access Layer (MCU specific)
- Name definitions, address definitions and driver code to access
peripherals
• Access Functions for Peripherals (MCU specific)
- Optional additional helper functions for peripherals
Organization of CMSIS cntd…
Using CMSIS
• Since the CMSIS is incorporated inside the
device driver library, there is no special setup
requirement for using CMSIS in projects.
• For each MCU device, the MCU vendor
provides a header file, which pulls in
additional header files required by the device
driver library, including the Core Peripheral
CMSIS Files
Using CMSIS cntd…
• File core_cm3.h
- contains the peripheral register definitions
- access functions for the Cortex-M3 processor
peripherals like NVIC, System Control Block
registers, and SYSTICK registers.
- declaration of CMSIS intrinsic functions to allow C
applications to access instructions that cannot be
generated using IEC/ISO C language.
- contains a function for outputting a debug message
via the Instrumentation Trace Module (ITM).
Using CMSIS cntd…
• File core_cm3.c
- contains implementations of CMSIS intrinsic functions
that cannot be implemented in core_cm3.h using simple
definitions.

• File <device>.h
- contains microcontroller specific interrupt
number definitions
- peripheral register definitions.

• File system_<device>.c
- contains a microcontroller specific function called
SystemInit for system initialization.
Using CMSIS cntd…
• In addition, CMSIS compliant device
drivers also contain start-up code (which
contains the vector table)
Benefits of CMSIS
• Better software portability and reusability.
• Allows software to be quickly ported between
Cortex-M3 and other Cortex-M processors,
reducing time to market.
Benefits of CMSIS cntd…
• For embedded OS vendors and middleware
providers, the advantages of the CMSIS are
significant.
• By using the CMSIS, their software products can
become compatible with device drivers from
multiple microcontroller vendors.
Without the CMSIS, the software vendors either
have to include a small library for Cortex-M3
core functions or develop multiple configurations
of their product so that it can work with device
libraries from different microcontroller vendors.
CMSIS Avoids Overlapping Driver
Code.
Using Assembly
• For small projects, it is possible to develop the
whole application in assembly language.
• Assembly language allows the best optimization,
but at the cost of increased development
time.
• Handling complex data structures or function
library management can be extremely
difficult.
Using Assembly cntd…
• Yet even when the C language is used in a project, in
some situations part of the program is implemented
in assembly language as follows:
 Functions that cannot be implemented in C, such as
direct manipulation of stack data or special instructions
that cannot be generated by the C compiler in normal
C-code.
 Timing-critical routines.
 Tight memory requirements, causing part of the
program to be written in assembly to get the smallest
memory size.
The Interface between Assembly and C

• In various situations, assembly code and the C
program interact.
• For example:
 When embedded assembly is used in C program
code.
 When C program code calls a function or
subroutine implemented in assembler in a
separate file.
 When an assembly program calls a C function or
subroutine.
The Interface between Assembly and C
cntd…
• From the above cases, it is important to understand
how parameters and return results are passed between
the calling program and the function being called.
• For simple cases, when a calling program needs to
pass parameters to a subroutine or function, it will use
registers R0–R3.
• Similarly, R0 is used for returning a value at
the end of a function, together with R1 when the result is 64 bits wide.
The Interface between Assembly and
C cntd…
• R0–R3 and R12 can be changed by a
function or subroutine, whereas the contents
of R4–R11 must be restored, before the
function returns, to the values they held on
entry. This is usually handled by stack PUSH
and stack POP.
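The register usage described above can be related to ordinary C functions. The sketch assumes the standard ARM procedure call convention on a little-endian build; the function names are hypothetical, and the register comments describe what a compiler for the Cortex-M3 is expected to do rather than anything visible in the C source.

```c
#include <stdint.h>

/* a, b, c and d arrive in R0, R1, R2 and R3.
 * The 32-bit result is returned in R0. */
uint32_t sum4(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return a + b + c + d;
}

/* x arrives in R0 and y in R1.
 * The 64-bit result is returned in the pair R0 (low word), R1 (high word). */
uint64_t mul_wide(uint32_t x, uint32_t y)
{
    return (uint64_t)x * (uint64_t)y;
}
```

An assembly routine that calls sum4 would load the four arguments into R0 to R3, execute BL sum4, and read the result from R0.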
The First Step in Assembly Programming

STACK_TOP EQU 0x20002000 ; constant for SP starting value
          AREA |Header Code|, CODE
          DCD STACK_TOP ; Stack top
          DCD Start ; Reset vector
          ENTRY ; Indicate program execution start here
Start ; Start of main program
          ; initialize registers
          MOV r0, #10 ; Starting loop counter value
          MOV r1, #0 ; starting result
          ; Calculated 10+9+8+...+1
loop
          ADD r1, r0 ; R1 = R1 + R0
          SUBS r0, #1 ; Decrement R0, update flag ("S" suffix)
          BNE loop ; If result not zero jump to loop
          ; Result is now in R1
deadloop
          B deadloop ; Infinite loop
          END ; End of file
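For comparison, the loop in the assembly program can be written as a C function. This is only an equivalent sketch; the function name sum_countdown is an assumption, and the comments show which instruction each statement corresponds to.

```c
#include <stdint.h>

/* Adds start + (start - 1) + ... + 1, mirroring the assembly loop.
 * As in the assembly version, the body runs before the test, so
 * start is expected to be at least 1. */
static uint32_t sum_countdown(uint32_t start)
{
    uint32_t counter = start; /* plays the role of R0 */
    uint32_t result = 0u;     /* plays the role of R1 */

    do {
        result += counter;    /* ADD  r1, r0 */
        counter -= 1u;        /* SUBS r0, #1 */
    } while (counter != 0u);  /* BNE  loop   */

    return result;
}
```

Calling sum_countdown(10) returns 55, the value left in R1 by the assembly program.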
The First Step in Assembly
Programming cntd…
• The examples here are based on ARM assembler
tools (armasm) in RVDS.
• For users of Keil MDK-ARM, the command line
options are slightly different.
• For other assembler tools, the file format and
instruction syntax will also need to be modified.
• In addition, some development tools will actually
do the startup code for you, so you might not need
to worry about creating your assembly startup
code.
Assembling the Code
• Program can be assembled using.

- The -o option specifies the output file name.

- The test1.o is an object file.
• Use a linker to create an executable image (ELF).

- Here, --ro-base 0x0 specifies that the read-only
region (program ROM) starts at address 0x0, and
--rw-base specifies that the read/write region (data
memory) starts at address 0x20000000.
Assembling the Code cntd…
• The --map option creates an image map, which is
useful for understanding the memory layout of the
compiled image.
• to create the binary image

• To check that the image looks like what we
wanted, we can also generate a disassembled code list
file by

• If everything works fine, you can then load your ELF
image or binary image into your hardware or
instruction set simulator for testing.
Producing Outputs
• It is always more fun when you can connect your
microcontroller to the outside world.
• The simplest way to do that is to turn on/off the
LEDs.
• However, this practice is quite limiting because it can
only represent very limited information.
• One of the most common output methods is to send
text messages to a console.
Low-Cost Test Environment
for Outputting Text
Messages
UART interface
• Cortex-M3 processor does not contain a UART
interface, but most Cortex-M3 microcontrollers
come with UART provided by the chip
manufacturers.
• Our next example assumes that a UART is
available and has a status flag to indicate
whether the transmit buffer is ready for sending
out new data.
• A level shifter is needed in the connection
because RS-232 has a different voltage level
than the microcontroller I/O pins.
The “Hello World” Example
• Before we try to write a “Hello world”
program, we should figure out how to send
one character through the UART.
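Sending one character amounts to waiting for the transmit-ready status flag and then writing the data register. The sketch below uses an entirely hypothetical register layout and flag value, since the real ones depend on the chip manufacturer; only the wait-then-write pattern is taken from the description above.

```c
#include <stdint.h>

/* Hypothetical UART register layout; real devices differ. */
typedef struct {
    volatile uint32_t DATA;   /* write a character here to transmit it */
    volatile uint32_t STATUS; /* status flags                          */
} uart_regs_t;

/* Assumed flag: set when the transmit buffer can accept new data. */
#define UART_TX_READY 0x1u

/* Send one character: poll the status flag, then write the data. */
static void uart_putc(uart_regs_t *uart, char c)
{
    while ((uart->STATUS & UART_TX_READY) == 0u) {
        /* busy-wait until the transmit buffer is ready */
    }
    uart->DATA = (uint32_t)(unsigned char)c;
}

/* Send a null-terminated string one character at a time. */
static void uart_puts(uart_regs_t *uart, const char *s)
{
    while (*s != '\0') {
        uart_putc(uart, *s++);
    }
}
```

Once a single character can be sent, a “Hello world” program is just a call such as uart_puts(uart, "Hello world\n") with the pointer set to the UART base address of the chosen device.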
Alternate methods of display
• Semihosting: Depending on the debugger and code library support,
semihosting (outputting printf messages via a debug probe device)
can be done via debug register in the NVIC.
• Instrumentation trace: If the Cortex-M3 microcontroller provides a
trace port and an external Trace Port Analyzer (TPA) is available, messages
can be output through the trace port and captured by the TPA for display.
• Instrumentation trace via Serial-Wire Viewer (SWV):
Alternatively, the Cortex-M3 processor (revision 1 and later) also
provides an SWV operation mode on the Trace Port Interface Unit
(TPIU). This interface allows outputs from ITM to be captured
using low-cost hardware instead of a TPA. However, the bandwidth
provided with the SWV mode is limited, so it is not ideal for large
amounts of data (e.g., instruction trace operation).
Using Data Memory
• Back to our first example: When we were doing
the linking stage, we specified the read/write
memory region.
• How do we put data there? The method is to
define a data region in your assembly file.
• Using the same example from the beginning, we
can store the data in the data memory at
0x20000000 (the SRAM region).
• The location of the data section is controlled by
a command-line option when you run the linker:
