AES Module 4 Notes
Module – 4
ARM Cortex M3 Instruction Sets and Programming
The label is optional. Some instructions have a label in front of them so that the address of the instruction can be determined using the label.
The opcode (the instruction) is followed by a number of operands. Sometimes suffixes are added to the opcode to indicate additional information, such as multiple transfers (M), the size of data (B, H, W, D), updating of register contents (!), increment after (IA), decrement before (DB), etc.
Normally, the first operand is the destination of the operation.
The number of operands in an instruction depends on the type of instruction, and the syntax format of the operands can also differ.
Examples:
i) Moving immediate data:
MOV R0, #0x12 ; Set R0 = 0x12 (hexadecimal)
MOV R1, #'A' ; Set R1 = ASCII character A
With UAL, the syntax of Thumb instructions is now the same as that of ARM instructions. For example:
ADD R0, R1      ; R0 = R0 + R1, using traditional Thumb syntax
ADD R0, R0, R1  ; Equivalent instruction using UAL syntax
The traditional Thumb syntax can still be used. Whether the instructions are interpreted as traditional Thumb code or as the new UAL syntax is normally defined by a directive in the assembly file.
For example, with the ARM assembler tool, a “CODE16” directive in the program code header implies that the code is in the traditional Thumb syntax, and a “THUMB” directive implies that the code is in the new UAL syntax.
In traditional Thumb syntax, some instructions change the flags in the APSR even if the S suffix is not used. In UAL syntax, however, an instruction changes the flags only if the S suffix is used. Example:
AND R0, R1       ; Traditional Thumb syntax: flags are updated after the AND operation, no suffix needed
ANDS R0, R0, R1  ; Equivalent UAL syntax: S suffix added to update the flags
With the new instructions in Thumb-2 technology, some of the operations can be handled by
either a Thumb instruction or a Thumb-2 instruction.
For example, R0 = R0 + 1 can be implemented as a 16-bit Thumb instruction or a 32-bit Thumb-2 instruction. With UAL, the programmer can specify which instruction is to be used by adding suffixes:
ADDS R0, #1 ; Use 16-bit Thumb instruction by default for smaller size
ADDS.N R0, #1 ; Use 16-bit Thumb instruction (N=Narrow)
ADDS.W R0, #1 ; Use 32-bit Thumb-2 instruction (W=wide)
If no suffix is given, the assembler tool can choose either instruction but usually defaults to 16-bit Thumb code to get a smaller size. Depending on tool support, the programmer can use the .N (narrow) suffix to specify a 16-bit Thumb instruction, or the .W (wide) suffix to specify a 32-bit Thumb-2 instruction.
Instruction List
ARM Cortex-M3 instructions can be classified into the following groups:
1. Moving Data
2. Pseudo-Instructions
3. Data processing instructions
4. Call & Unconditional Branch Instructions
5. Decision & Conditional Branch Instructions
6. Combined Compare & Conditional branch Instructions
7. Instruction Barrier & Memory Barrier Instructions
8. Saturation operation Instructions
9. Useful Thumb-2 Instructions
17EC62
Move instructions
These can be grouped into the following categories:
Moving Data Between Register And Register
Moving An Immediate Data Value Into A Register
Moving Data Between Memory And Register
Moving Data Between Special Register And Register
The basic instructions for accessing memory are Load and Store.
Load (LDR) transfers data from memory to registers, and Store (STR) transfers data from registers to memory.
The transfers can be in different data sizes (byte, half word, word, and double word), as outlined in the table below.
LDRB Rd, [Rn, #offset]          Read byte from memory location Rn + offset
LDRH Rd, [Rn, #offset]          Read half word from memory location Rn + offset
LDR  Rd, [Rn, #offset]          Read word from memory location Rn + offset
LDRD Rd1, Rd2, [Rn, #offset]    Read double word from memory location Rn + offset
STRB Rd, [Rn, #offset]          Store byte to memory location Rn + offset
STRH Rd, [Rn, #offset]          Store half word to memory location Rn + offset
STR  Rd, [Rn, #offset]          Store word to memory location Rn + offset
STRD Rd1, Rd2, [Rn, #offset]    Store double word to memory location Rn + offset
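To make the size variants concrete, here is a small C sketch that mimics what LDRB, LDRH, LDR and LDRD return from a byte-addressed memory image. It assumes a little-endian memory system (the usual Cortex-M3 configuration), and the helper names are hypothetical, not a library API.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helpers modelling the loads on a little-endian memory image.
 * 'mem' stands in for memory and 'addr' for the effective address Rn + offset. */
static uint32_t ldrb(const uint8_t *mem, size_t addr) {
    return mem[addr];                                   /* byte, zero-extended */
}

static uint32_t ldrh(const uint8_t *mem, size_t addr) {
    /* half word: lower address holds the least significant byte */
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);
}

static uint32_t ldr(const uint8_t *mem, size_t addr) {
    return ldrh(mem, addr) | (ldrh(mem, addr + 2) << 16);   /* word */
}

/* LDRD Rd1, Rd2, [Rn, #offset]: Rd1 gets the word at addr, Rd2 the word at addr + 4. */
static void ldrd(const uint8_t *mem, size_t addr, uint32_t *rd1, uint32_t *rd2) {
    *rd1 = ldr(mem, addr);
    *rd2 = ldr(mem, addr + 4);
}
```

With the bytes 0x78, 0x56, 0x34, 0x12 stored at increasing addresses, the word load returns 0x12345678 while the half-word load at the same address returns 0x5678.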
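Multiple load and store operations can be combined into single instructions called LDM (Load Multiple) and STM (Store Multiple).
The exclamation mark (!) in the instruction specifies whether the base register Rd should be updated after the instruction is completed. For example, LDMIA R8!, {R0-R3} reads four words starting at the address in R8 into R0 to R3 and then updates R8 to the address just after the last word read.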
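The effect of the '!' suffix can be modelled in C as below. This is a sketch only: the functions stmia and ldmia are hypothetical, memory is an array of words, and the base "register" holds a word index, so it advances by 1 per word where the real base register advances by 4 bytes.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of STMIA Rn{!}, {regs} and LDMIA Rn{!}, {regs}.
 * regs[] lists the registers in ascending register-number order, so the
 * lowest-numbered register is transferred to/from the lowest address. */
static void stmia(uint32_t *mem, uint32_t *rn, const uint32_t *regs, int count, bool writeback) {
    uint32_t addr = *rn;
    for (int i = 0; i < count; i++) {
        mem[addr + (uint32_t)i] = regs[i];   /* increment after each transfer */
    }
    if (writeback) {
        *rn = addr + (uint32_t)count;        /* '!' updates the base register */
    }
}

static void ldmia(const uint32_t *mem, uint32_t *rn, uint32_t *regs, int count, bool writeback) {
    uint32_t addr = *rn;
    for (int i = 0; i < count; i++) {
        regs[i] = mem[addr + (uint32_t)i];
    }
    if (writeback) {
        *rn = addr + (uint32_t)count;
    }
}
```

Without writeback the base stays where it was, which is why the '!' form is the natural choice for walking through a buffer or implementing a stack.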
Pseudo-Instructions
Both LDR and ADR pseudo-instructions can be used to set registers to a program address value.
They have different syntaxes and behaviors.
LDR obtains the immediate data by placing the data in the program code (a literal pool) and using a PC-relative load to get the data into the register.
ADR tries to generate the immediate value by adding an offset to, or subtracting it from, the current PC value.
As a result, it is not possible to create all immediate values using ADR, and the target address label must be within a close range. However, using ADR can generate smaller code than LDR.
For LDR, if the address is a program address value, the assembler automatically sets the LSB to 1. For example, if address1 is a program label at address 0x4000:
LDR R0, =address1 ; R0 set to 0x4001
.............................
The ADD instruction can operate between two registers or between one register and an immediate data value:
ADD   R0, R0, R1    ; R0 = R0 + R1
ADDS  R0, R0, #0x12 ; R0 = R0 + 0x12
ADD.W R0, R1, R2    ; R0 = R1 + R2
When 16-bit Thumb code is used, an ADD instruction can change the flags in the PSR. However, 32-bit Thumb-2 code can either change the flags or keep them unchanged. To separate the two different operations, the S suffix should be used if a following operation depends on the flags:
ADD.W  R0, R1, R2 ; Flags unchanged
ADDS.W R0, R1, R2 ; Flags changed
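As an illustration of what the S suffix makes visible, the sketch below computes the N, Z, C and V flags that a flag-setting 32-bit add such as ADDS.W would leave in the APSR. The struct and function names are hypothetical; only the flag definitions follow the usual ARM meanings.

```c
#include <stdint.h>
#include <stdbool.h>

/* Result and flags of a flag-setting add (ADDS). Names are illustrative. */
typedef struct {
    uint32_t result;
    bool n, z, c, v;
} adds_result;

static adds_result adds(uint32_t a, uint32_t b) {
    uint64_t wide = (uint64_t)a + (uint64_t)b;
    adds_result r;
    r.result = (uint32_t)wide;
    r.n = (r.result >> 31) & 1u;          /* Negative: bit 31 of the result */
    r.z = (r.result == 0);                /* Zero */
    r.c = (wide >> 32) != 0;              /* Carry out of bit 31 (unsigned overflow) */
    /* oVerflow: operands share a sign but the result's sign differs */
    r.v = ((~(a ^ b) & (a ^ r.result)) >> 31) & 1u;
    return r;
}
```

For example, 0xFFFFFFFF + 1 sets Z and C (unsigned wrap-around), while 0x7FFFFFFF + 1 sets N and V (signed overflow) but not C.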
MUL   Rd, Rm        ; Rd = Rd * Rm
MUL.W Rd, Rn, Rm    ; Rd = Rn * Rm
UDIV  Rd, Rn, Rm    ; Rd = Rn / Rm (unsigned)
SDIV  Rd, Rn, Rm    ; Rd = Rn / Rm (signed)
32-bit multiply instructions for signed values:
SMULL RdLo, RdHi, Rn, Rm ; {RdHi,RdLo} = Rn * Rm
SMLAL RdLo, RdHi, Rn, Rm ; {RdHi,RdLo} += Rn * Rm
32-bit multiply instructions for unsigned values:
UMULL RdLo, RdHi, Rn, Rm ; {RdHi,RdLo} = Rn * Rm
UMLAL RdLo, RdHi, Rn, Rm ; {RdHi,RdLo} += Rn * Rm
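The 64-bit results of the long multiplies and the behaviour of the divides can be checked with C's wider integer types. In this sketch the function names are hypothetical, and sdiv returns 0 for a zero divisor on the assumption that the Cortex-M3 divide-by-zero trap (DIV_0_TRP in the Configuration Control Register) is left disabled.

```c
#include <stdint.h>

/* UMULL RdLo, RdHi, Rn, Rm: unsigned 32 x 32 -> 64-bit product */
static void umull(uint32_t rn, uint32_t rm, uint32_t *rdlo, uint32_t *rdhi) {
    uint64_t p = (uint64_t)rn * rm;
    *rdlo = (uint32_t)p;
    *rdhi = (uint32_t)(p >> 32);
}

/* SMULL RdLo, RdHi, Rn, Rm: signed 32 x 32 -> 64-bit product */
static void smull(int32_t rn, int32_t rm, uint32_t *rdlo, uint32_t *rdhi) {
    uint64_t p = (uint64_t)((int64_t)rn * rm);   /* keep the two's-complement bits */
    *rdlo = (uint32_t)p;
    *rdhi = (uint32_t)(p >> 32);
}

/* UMLAL RdLo, RdHi, Rn, Rm: {RdHi,RdLo} += Rn * Rm (unsigned) */
static void umlal(uint32_t rn, uint32_t rm, uint32_t *rdlo, uint32_t *rdhi) {
    uint64_t acc = ((uint64_t)*rdhi << 32) | *rdlo;
    acc += (uint64_t)rn * rm;
    *rdlo = (uint32_t)acc;
    *rdhi = (uint32_t)(acc >> 32);
}

/* SDIV: like C's '/', the quotient is truncated toward zero.
 * Assumption: divide-by-zero trapping is disabled, so x / 0 gives 0. */
static int32_t sdiv(int32_t rn, int32_t rm) {
    if (rm == 0) return 0;
    if (rn == INT32_MIN && rm == -1) return INT32_MIN;  /* avoid overflow in C */
    return rn / rm;
}
```

Note that RdHi receives the upper 32 bits, so a product that fits in 32 bits leaves RdHi = 0 for UMULL, while a small negative SMULL product fills RdHi with ones.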
Rotate right:
ROR   Rd, Rn        ; Rd = Rd rotated right by Rn
ROR.W Rd, Rn, #imm  ; Rd = Rn rotated right by imm
ROR.W Rd, Rn, Rm    ; Rd = Rn rotated right by Rm
Rotate right extended:
RRX.W Rd, Rn        ; Rd = {C, Rn[31:1]}, C = Rn[0]
Why Is There Rotate Right But No Rotate Left?
The rotate left operation can be replaced by a rotate right operation with a different rotate offset. For example, a rotate left by 4 bits can be written as a rotate right by 28 bits, which gives the same result and takes the same amount of time to execute.
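The rotate-left-by-rotate-right trick can be checked in C. The helpers below are hypothetical; ror32 masks the amount to 0-31 and handles 0 separately so the code never shifts by 32 (undefined in C), and rrx32 models RRX as shifting the carry into bit 31 and bit 0 into the carry.

```c
#include <stdint.h>

/* Rotate right by n (taken modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n) {
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Rotate left by n, implemented as a rotate right by (32 - n). */
static uint32_t rol32(uint32_t x, unsigned n) {
    return ror32(x, (32u - (n & 31u)) & 31u);
}

/* RRX: 33-bit rotate right by one through the carry flag.
 * Result = {C, Rn[31:1]}, new carry = Rn[0]. */
static uint32_t rrx32(uint32_t x, unsigned *carry) {
    uint32_t result = (x >> 1) | ((uint32_t)(*carry & 1u) << 31);
    *carry = x & 1u;
    return result;
}
```

For instance, rotating 0x12345678 left by 4 and right by 28 both give 0x23456781.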
When BL or BLX is used, the return address is stored in the link register (LR), and the function can be terminated using BX LR, which returns program control to the calling process.
However, when using BLX with a register, make sure that the LSB of the register is 1. Otherwise, the processor will produce a fault exception, because it is an attempt to switch to the ARM state.
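The LSB rule behind both LDR Rd, =label (which sets bit 0 for program addresses) and BLX can be sketched as plain bit operations. The helper names are hypothetical; they only express that a valid Thumb branch target has bit 0 set and that the instruction itself is fetched from the address with bit 0 cleared.

```c
#include <stdint.h>
#include <stdbool.h>

/* What the assembler does for a program label loaded with LDR Rd, =label. */
static uint32_t thumb_target(uint32_t code_address) {
    return code_address | 1u;
}

/* BX/BLX target check: LSB = 0 would request ARM state, which faults on the Cortex-M3. */
static bool blx_target_ok(uint32_t reg_value) {
    return (reg_value & 1u) != 0;
}

/* The instruction is fetched from the halfword-aligned address (bit 0 cleared). */
static uint32_t fetch_address(uint32_t reg_value) {
    return reg_value & ~1u;
}
```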
Example:
Save the LR if You Need to Call a Subroutine
The BL instruction will overwrite the current content of the LR. So, if the program code needs the LR later, you should save the LR before using BL.
The common method is to push the LR to the stack at the beginning of your subroutine. For example:
main                ; main program
    ...
    BL   functionA
    ...
functionA
    PUSH {LR}       ; Save LR content to stack
    ...
    BL   functionB
    ...
    POP  {PC}       ; Use stacked LR content to return to main
functionB
    PUSH {LR}
    ...
    POP  {PC}       ; Use stacked LR content to return to functionA
In addition, if the subroutine you call is a C function, you might also need to save the contents in
R0–R3 and R12 if these values will be needed at a later stage.
Within an IT block, if a statement is to be executed when <cond> is false, the suffix for the instruction must be the opposite of the condition. For example, the opposite of EQ is NE, the opposite of GT is LE, and so on.
The following code shows an example of a simple conditional execution:
if (R1 < R2) then
    R2 = R2 - R1
    R2 = R2 / 2
else
    R1 = R1 - R2
    R1 = R1 / 2
In assembly, the same logic can be written with an IT (IF-THEN) block:
CMP     R1, R2  ; If R1 < R2 (less than)
ITTEE   LT      ; then execute instructions 1 and 2 (indicated by T), else execute instructions 3 and 4 (indicated by E)
SUBLT.W R2, R1  ; 1st instruction
LSRLT.W R2, #1  ; 2nd instruction
SUBGE.W R1, R2  ; 3rd instruction (notice that GE is the opposite of LT)
LSRGE.W R1, #1  ; 4th instruction
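For comparison, the CMP/ITTEE sequence above behaves like the following C function. It is a sketch: the hypothetical pointer parameters stand in for R1 and R2, the comparison is signed because LT is a signed condition, and the halving uses an unsigned shift to match LSR.

```c
#include <stdint.h>

static void it_example(int32_t *r1, int32_t *r2) {
    if (*r1 < *r2) {                                 /* LT: the two T instructions */
        uint32_t d = (uint32_t)*r2 - (uint32_t)*r1;  /* SUBLT.W R2, R1 */
        *r2 = (int32_t)(d >> 1);                     /* LSRLT.W R2, #1 */
    } else {                                         /* GE: the two E instructions */
        uint32_t d = (uint32_t)*r1 - (uint32_t)*r2;  /* SUBGE.W R1, R2 */
        *r1 = (int32_t)(d >> 1);                     /* LSRGE.W R1, #1 */
    }
}
```

The IT version avoids the branches a compiler would otherwise need for this if/else, which is the main benefit for short conditional sequences.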
DMB is very useful for multi-processor systems. For example, tasks running on separate
processors might use shared memory to communicate with each other. In these environments,
the order of memory accesses to the shared memory can be very important.
DMB instructions can be inserted between accesses to the shared memory to ensure that the
memory access sequence is exactly the same as expected.
The DSB and ISB instructions can be important for self-modifying code. For example, if a
program changes its own program code, the next executed instruction should be based on the
updated program.
However, since the processor is pipelined, the modified instruction location might have already
been fetched. Using DSB and then ISB can ensure that the modified program code is fetched
again.
Architecturally, the ISB instruction should be used after updating the value of the CONTROL
register.
The memory barrier instructions can be accessed in C using a Cortex Microcontroller Software Interface Standard (CMSIS) compliant device driver library as follows:
void __DMB(void); // Data Memory Barrier
void __DSB(void); // Data Synchronization Barrier
void __ISB(void); // Instruction Synchronization Barrier
Useful Instructions In the Cortex-M3
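RBIT reverses the bit order in a data word. For example, if R1 is 0xB4E10C23 (binary value 1011_0100_1110_0001_0000_1100_0010_0011), on executing:
RBIT.W R0, R1
R0 will become 0xC430872D (binary value 1100_0100_0011_0000_1000_0111_0010_1101).
BFC (bit field clear) clears a bit field of any width (#width) starting at any bit position (#lsb) in a register. For example:
LDR R0,=0x1234FFFF
BFC.W R0, #4, #8
This clears bits 4 to 11 of R0, giving R0 = 0x1234F00F.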
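Both operations can be written as short C functions, which also confirms the example values: reversing 0xB4E10C23 gives 0xC430872D, and clearing bits 4 to 11 of 0x1234FFFF gives 0x1234F00F. The function names are hypothetical, and bfc32 assumes the lsb and width are in the range the instruction accepts.

```c
#include <stdint.h>

/* RBIT Rd, Rn: reverse the bit order of a 32-bit word. */
static uint32_t rbit32(uint32_t x) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);   /* move the current LSB of x into r */
        x >>= 1;
    }
    return r;
}

/* BFC Rd, #lsb, #width: clear 'width' bits starting at bit 'lsb'.
 * Assumes width >= 1 and lsb + width <= 32. */
static uint32_t bfc32(uint32_t x, unsigned lsb, unsigned width) {
    uint32_t mask = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return x & ~(mask << lsb);
}
```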
UBFX and SBFX are the unsigned and signed bit field extract instructions. The syntax of the
instructions is as follows:
UBFX.W <Rd>, <Rn>, <#lsb>, <#width>
SBFX.W <Rd>, <Rn>, <#lsb>, <#width>
UBFX extracts a bit field from a register starting from any location (specified by #lsb) with
any width (specified by #width), zero extends it, and puts it in the destination register.
For example,
LDR R0,=0x5678ABCD
UBFX.W R1, R0, #4, #8
This will give R1 = 0x000000BC.
Similarly, SBFX extracts a bit field, but sign extends it before putting it in the destination register.
For example,
LDR R0,=0x5678ABCD
SBFX.W R1, R0, #4, #8
This will give R1 = 0xFFFFFFBC.
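The same extraction can be expressed in C, reproducing the two results above. This is a hedged sketch with hypothetical function names; sbfx32 sign-extends by copying the top bit of the extracted field into all higher bits.

```c
#include <stdint.h>

/* Mask of 'width' ones (width is 1..32). */
static uint32_t field_mask(unsigned width) {
    return (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* UBFX Rd, Rn, #lsb, #width: extract the field and zero-extend it. */
static uint32_t ubfx32(uint32_t rn, unsigned lsb, unsigned width) {
    return (rn >> lsb) & field_mask(width);
}

/* SBFX Rd, Rn, #lsb, #width: extract the field and sign-extend it. */
static uint32_t sbfx32(uint32_t rn, unsigned lsb, unsigned width) {
    uint32_t mask = field_mask(width);
    uint32_t v = (rn >> lsb) & mask;
    if (v & (1u << (width - 1u))) {   /* top bit of the field is the sign */
        v |= ~mask;                   /* fill all higher bits with ones */
    }
    return v;
}
```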
Memory Mapping
The Cortex-M3 processor has a fixed memory map, which makes it easier to port software from one Cortex-M3 product to another.
The Nested Vectored Interrupt Controller (NVIC) and Memory Protection Unit (MPU) have the same memory locations in all Cortex-M3 products. Some of the memory locations are allocated for private peripherals such as debugging components.
They are located in the private peripheral memory region. These debugging components include the following:
1. Flash Patch and Breakpoint Unit (FPB)
2. Data Watchpoint and Trace Unit (DWT)
3. Instrumentation Trace Macrocell (ITM)
4. Embedded Trace Macrocell (ETM)
5. Trace Port Interface Unit (TPIU)
6. ROM table
It is good to put the program code in the code region because instruction fetches and data accesses are then carried out simultaneously on two separate bus interfaces.
The SRAM memory range is for connecting internal SRAM. Access to this region is carried out via the system interface bus. In this region, a 32-MB range is defined as a bit-band alias.
Within the 32-MB bit-band alias memory range, each word address represents a single bit in the 1-MB bit-band region.
A data write access to this bit-band alias memory range is converted to an atomic READ-MODIFY-WRITE operation on the bit-band region, allowing a program to set or clear individual data bits in the memory.
The bit-band operation applies only to data accesses, not to instruction fetches.
By putting Boolean information (single bits) in the bit-band region, we can pack multiple Boolean data in a single word while still allowing them to be accessed individually via the bit-band alias, thus saving memory space without the need for handling READ-MODIFY-WRITE in software.
A 0.5-GB block of address range is allocated to on-chip peripherals. Similar to the SRAM region, this region supports bit-band alias and is accessed via the system bus interface, but instruction execution in this region is not allowed.
The bit-band support in the peripheral region makes it easy to access or change control and status
bits of peripherals, making it easier to program peripheral control.
Two slots of 1-GB memory space are allocated for external RAM and external devices.
It is to be noted that the program execution in the external device region is not allowed, and there
are some differences with the caching behaviors.
The last 0.5-GB memory is for the system-level components, internal peripheral buses, external
peripheral bus, and vendor-specific system peripherals.
There are two segments of the private peripheral bus (PPB):
1. Advanced High-Performance Bus (AHB) PPB, for Cortex-M3 internal AHB peripherals, i.e., NVIC, FPB, DWT, and ITM.
2. Advanced Peripheral Bus (APB) PPB, for Cortex-M3 internal APB devices as well as external peripherals (external to the Cortex-M3 processor).
The Cortex-M3 allows chip vendors to add additional on-chip APB peripherals on this private peripheral bus via an APB interface.
The NVIC is located in a memory region called the system control space (SCS). Besides providing interrupt control features, this region also provides the control registers for SYSTICK, MPU, and code debugging control.
The remaining unused vendor-specific memory range can be accessed via the system bus
interface, but instruction execution in this region is not allowed.
Bit-band operation
Bit-band operation support allows a single load/store operation to access (read/write) a single data bit.
In the Cortex-M3, this is supported in two predefined memory regions called bit-band regions.
(a) One is located in the first 1 MB of the SRAM region.
(b) One more is located in the first 1 MB of the peripheral region.
These two memory regions can be accessed like normal memory, but they can also be accessed
via a separate memory region called the bit-band alias.
When the bit-band alias address is used, each individual bit can be accessed separately in the
least significant bit (LSB) of each word-aligned address.
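The alias address for a given bit follows a simple formula: alias address = alias base + (byte offset within the bit-band region × 32) + (bit number × 4). The C sketch below applies it to the two regions, assuming the standard Cortex-M3 addresses (SRAM bit-band region at 0x20000000 with alias base 0x22000000, peripheral bit-band region at 0x40000000 with alias base 0x42000000); the function name is hypothetical.

```c
#include <stdint.h>

/* Alias address of bit 'bit' at byte address 'addr'. Because the formula is
 * linear, 'bit' may be 0..7 (bit of that byte) or 0..31 when 'addr' is
 * word-aligned (bit of that word). Returns 0 outside both bit-band regions. */
static uint32_t bitband_alias(uint32_t addr, unsigned bit) {
    uint32_t region_base, alias_base;
    if (addr >= 0x20000000u && addr <= 0x200FFFFFu) {        /* SRAM, first 1 MB */
        region_base = 0x20000000u;
        alias_base  = 0x22000000u;
    } else if (addr >= 0x40000000u && addr <= 0x400FFFFFu) { /* peripherals, first 1 MB */
        region_base = 0x40000000u;
        alias_base  = 0x42000000u;
    } else {
        return 0;
    }
    return alias_base + (addr - region_base) * 32u + bit * 4u;
}
```

For bit 2 of address 0x20000000 this gives 0x22000008, the alias address used in the examples of this section.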
For example, to set bit 2 in the word at address 0x20000000, instead of using three instructions to read the data, set the bit, and then write back the result, this task can be carried out by a single instruction. The assembler sequences for these two cases could be as follows.
Without bit-band:
LDR   R0, =0x20000000 ; Setup address
LDR   R1, [R0]        ; Read
ORR.W R1, #0x4        ; Modify bit
STR   R1, [R0]        ; Write back result
With bit-band:
LDR   R0, =0x22000008 ; Setup address
MOV   R1, #1          ; Setup data
STR   R1, [R0]        ; Write
The bit-band support can also simplify application code when we need to read a bit in a memory location. For example, to determine bit 2 of address 0x20000000, the assembler sequences for the two cases could be as follows.
Without bit-band:
LDR    R0, =0x20000000 ; Setup address
LDR    R1, [R0]        ; Read
UBFX.W R1, R1, #2, #1  ; Extract bit[2]
With bit-band:
LDR    R0, =0x22000008 ; Setup address
LDR    R1, [R0]        ; Read
The Cortex-M3 uses the following terms for the bit-band memory addresses:
1. Bit-band region: This is a memory address region that supports bit-band operation.
2. Bit-band alias: Access to the bit-band alias will cause an access (a bit-band operation)
to the bit-band region.
Within the bit-band region, each word is represented by the LSBs of 32 words in the bit-band alias address range (one alias word for each bit).
When the bit-band alias address is accessed, the address is remapped into a bit-band address.
For read operations, the word is read and the chosen bit location is shifted to the LSB of the read
return data.
For write operations, the written bit data are shifted to the required bit position, and a READ-
MODIFY-WRITE is performed.
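The read and write behaviour just described can be emulated in C to see which region word and bit an alias access touches. This is a model only: region is a hypothetical array standing in for the bit-band region, alias_offset is the byte offset into the alias range, and unlike the hardware the write here is not atomic.

```c
#include <stdint.h>

/* Each region word corresponds to 32 alias words, i.e. 128 alias bytes. */
static uint32_t alias_read(const uint32_t *region, uint32_t alias_offset) {
    uint32_t word = alias_offset / 128u;          /* which region word */
    uint32_t bit  = (alias_offset % 128u) / 4u;   /* which bit of it */
    return (region[word] >> bit) & 1u;            /* chosen bit returned in the LSB */
}

static void alias_write(uint32_t *region, uint32_t alias_offset, uint32_t data) {
    uint32_t word = alias_offset / 128u;
    uint32_t bit  = (alias_offset % 128u) / 4u;
    uint32_t v = region[word];                    /* READ */
    if (data & 1u) {                              /* MODIFY: only bit 0 of the data is used */
        v |= (1u << bit);
    } else {
        v &= ~(1u << bit);
    }
    region[word] = v;                             /* WRITE (atomic in hardware, not here) */
}
```

An alias offset of 0x08 corresponds to alias address 0x22000008 in the SRAM case, so writing 1 there sets bit 2 of the first region word.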
With the Cortex-M3 bit-band feature, this kind of race condition can be avoided because the READ-MODIFY-WRITE is carried out at the hardware level and is atomic (the two transfers cannot be pulled apart), so interrupts cannot take place between them.
Similarly, if bit 0 of a shared memory word is used by Process A and bit 1 is used by Process B, a data conflict can occur with a software-based READ-MODIFY-WRITE.
The bit-band feature ensures that bit accesses from each task are separated, so that no data conflicts occur.
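The lost update can be replayed deterministically in C by fixing the preemption point. The sketch below is illustrative only (hypothetical function names, no real tasks): Process A sets bit 0, Process B sets bit 1, and B runs between A's read and A's write. The software version loses B's bit, while the bit-band version, where each process does a single atomic store to its own alias word, keeps both.

```c
#include <stdint.h>

/* Software READ-MODIFY-WRITE with B preempting A between A's read and write. */
static uint32_t software_rmw_interleaved(uint32_t shared) {
    uint32_t a_copy = shared;   /* A: read                          */
    uint32_t b_copy = shared;   /* B preempts: read                 */
    b_copy |= 1u << 1;          /* B: modify (set bit 1)            */
    shared = b_copy;            /* B: write                         */
    a_copy |= 1u << 0;          /* A resumes: modify its stale copy */
    shared = a_copy;            /* A: write, overwriting B's bit 1  */
    return shared;
}

/* Bit-band: each update is one store whose RMW the hardware makes atomic,
 * so there is no gap for B to preempt. Modelled as indivisible bit updates. */
static uint32_t bitband_interleaved(uint32_t shared) {
    shared |= 1u << 1;          /* B: STR 1 to the alias word of bit 1 */
    shared |= 1u << 0;          /* A: STR 1 to the alias word of bit 0 */
    return shared;
}
```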
The bit-band feature can be used for storing and handling Boolean data in the SRAM region. For example, multiple Boolean variables can be packed into one single memory location to save memory space, while access to each bit is still completely separated when the access is carried out via the bit-band alias address range.
CMSIS
The CMSIS was developed by ARM to allow users of Cortex-M3 microcontrollers to gain the most benefit from the available software solutions and to develop their embedded applications quickly and reliably.
CMSIS Structure
3. The CMSIS has a small memory footprint (less than 1 KB for all core access
functions and a few bytes of RAM).
It avoids overlapping of core peripheral driver code when reusing software code from other
projects. Since CMSIS is supported by multiple compiler vendors, embedded software can
compile and run with different compilers. As a result, embedded OS and middleware can be
MCU vendor independent and compiler tool vendor independent.
4. Before the availability of CMSIS, intrinsic functions were generally compiler-specific and could cause problems in retargeting the software to a different compiler.
Since all CMSIS compliant device driver libraries have a similar structure, learning to use
different Cortex-M3 microcontrollers is even easier as the software interface has similar look
and feel (no need to relearn a new application programming interface).
5. CMSIS is tested by multiple parties and is Motor Industry Software Reliability
Association (MISRA) compliant, thus reducing the validation effort required for developing
your own NVIC or core feature access functions.
Various software programs are available for developing Cortex-M3 applications. The concepts of the code generation flow are similar across these tools.
For the most basic uses, you will need an assembler, a C compiler, a linker, and binary file generation utilities.
For ARM solutions, the RealView Development Suite (RVDS) or RealView Compiler Tools (RVCT) provide this file generation flow.
The scatter-loading script is optional but often required when the memory map becomes more
complex.
Besides these basic tools, RVDS also contains a large number of utilities, including an
Integrated Development Environment (IDE) and debuggers.
; Multiply two 16-bit numbers
        THUMB
        AREA MULTI, CODE, READONLY
        ENTRY
        LDR   R0, =0x706F     ; load 16-bit number to R0
        LDR   R1, =0x7161     ; load 16-bit number to R1
        UMULL R2, R3, R1, R0  ; R1 * R0: low word of result in R2, high word in R3
STOP    B     STOP            ; loop here after the result is computed
        END
; Sum of numbers
        THUMB
        AREA sum, CODE, READONLY
        ENTRY
        MOV R1, #10   ; load 10 to register
        MOV R2, #0    ; empty R2 register to store result
Loop
4.
For C Codes, refer Lab Manual 17ECL67, Dept of ECE, MIT Mysore.