
Unit 5

Uploaded by

molugu nishritha

UNIT-5

ARM Cortex-M3 Instruction Set and Memory System
Syllabus
• Instruction Sets: ARM Cortex-M3 16-bit and 32-bit Instruction Set,
Unified Assembler Language, Data Processing Instructions, Branch
Instructions, Load and Store Instructions
• Memory System: Memory Maps, Memory Access Attributes, Default
Memory Access Permissions, Bit-Band Operations, Unaligned
Transfers, Exclusive accesses, Pipeline
Assembly Basics
• Assembler Language:
• The basic syntax in assembler code is:
label opcode operand1, operand2, ... ; Comments
The number of operands in an instruction depends on the type of
instruction, and the syntax format of the operands can also differ.
For example, immediate data are usually in the form #number.
• EX:
• MOV R0, #0x12 ; Set R0 = 0x12 (hexadecimal)
• MOV R1, #'A' ; Set R1 = ASCII character A
• Constants can be defined using EQU and then used inside your program
code:
• NVIC_IRQ_SETEN0 EQU 0xE000E100
• NVIC_IRQ0_ENABLE EQU 0x1
• ...
• LDR R0,=NVIC_IRQ_SETEN0 ; LDR here is a pseudo-instruction that
; is converted to a PC-relative load by
; the assembler.
• MOV R1,#NVIC_IRQ0_ENABLE ; Move immediate data to register
• STR R1,[R0] ; Enable IRQ 0 by writing R1 to address
; in R0
• A number of data definition directives are available for insertion of
constants inside assembly code.
• DCI (Define Constant Instruction) can be used to code an instruction if
your assembler cannot generate the exact instruction that you want
and if you know the binary code for the instruction.
EX:
DCI 0xBE00 ; Breakpoint (BKPT 0), a 16-bit instruction
• We can use DCB (Define Constant Byte) for byte-size constant values,
such as characters, and DCD (Define Constant Data) for word-size
constant values to define binary data in your code.
EX:
LDR R3,=MY_NUMBER ; Get the memory address value of
; MY_NUMBER
LDR R4,[R3] ; Get the value 0x12345678 in R4
...
LDR R0,=HELLO_TXT ; Get the starting memory address of
; HELLO_TXT
BL PrintText ; Call a function called PrintText to
; display string
...

MY_NUMBER
DCD 0x12345678
HELLO_TXT
DCB "Hello\n",0 ; null terminated string
Continuation of Load and Store Instructions
Sign Extend Instructions
• Assembler Language: Moving Data
• In the Cortex-M3, data transfers can be of one of the following types:
• Moving data between register and register
• Moving data between memory and register
• Moving data between special register and register
• Moving an immediate data value into a register
Assembly Language: Saturation Operations
• The Cortex-M3 supports two instructions that provide signed and unsigned saturation operations: SSAT
and USAT (for signed and unsigned data types, respectively).
• Saturation is commonly used in signal processing, for example, in signal amplification. When an input
signal is amplified, there is a chance that the output will be larger than the allowed output range. If
the value is adjusted simply by removing the unused MSBs, an overflowed result will cause the signal
waveform to be completely deformed.
• The saturation operation does not prevent the distortion of the signal, but the amount of
distortion in the signal waveform is greatly reduced.

Fig: Signed Saturation Operation
Ex: SSAT.W R1, #16, R0
Fig: Unsigned Saturation Operation
Ex: USAT.W R1, #16, R0
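The clamping performed by the two examples can be sketched in C as a software model. This is only an illustration and not how the hardware is implemented: the function names ssat_model and usat_model are hypothetical, and the model ignores the Q flag that the real instructions set when saturation occurs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of SSAT Rd, #bits, Rn:
   clamp value to the signed range -2^(bits-1) .. 2^(bits-1)-1. */
int32_t ssat_model(int32_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    int64_t max = ((int64_t)1 << (bits - 1)) - 1;
    int64_t min = -((int64_t)1 << (bits - 1));
    if (value > max) return (int32_t)max;
    if (value < min) return (int32_t)min;
    return value;
}

/* Hypothetical software model of USAT Rd, #bits, Rn:
   clamp value to the unsigned range 0 .. 2^bits - 1. */
uint32_t usat_model(int32_t value, unsigned bits)
{
    assert(bits <= 31);
    int64_t max = ((int64_t)1 << bits) - 1;
    if (value < 0) return 0;
    if (value > max) return (uint32_t)max;
    return (uint32_t)value;
}
```

With a 16-bit saturation point, an input of 0x00020000 becomes 0x7FFF under the signed model and 0xFFFF under the unsigned model, while values already inside the range pass through unchanged.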
Instruction Barrier and Memory Barrier Instructions
• The Cortex-M3 supports a number of barrier instructions. These
instructions are needed as memory systems get more and more
complex. In some cases, if memory barrier instructions are not used,
race conditions could occur.
• The memory barrier instructions can be accessed in C using the following
Cortex Microcontroller Software Interface Standard (CMSIS) functions:
• void __DMB(void); // Data Memory Barrier
• void __DSB(void); // Data Synchronization Barrier
• void __ISB(void); // Instruction Synchronization Barrier
Table Branch Byte and Table Branch Halfword
• Table Branch Byte (TBB) and Table Branch Halfword (TBH) are for
implementing branch tables.
• The TBB instruction uses a branch table of byte-size offsets, and TBH
uses a branch table of halfword offsets.
• TBB has this general syntax:
• TBB.W [Rn, Rm]
• where Rn is the base address of the branch table and Rm is the branch
table index. The branch table item for TBB is located at Rn + Rm. The
description that follows assumes that PC is used for Rn.
• Since bit 0 of the program counter is always zero, the value in the
branch table is multiplied by two before it is added to the PC.
Furthermore, because the PC value is the current instruction address
plus four, the branch range for TBB is (2 × 255) + 4 = 514, and the
branch range for TBH is (2 × 65535) + 4 = 131074.
• Both TBB and TBH support forward branches only.
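The target address arithmetic described above can be checked with a small C model. It is a sketch under the stated assumption that PC is used for Rn; the function names tbb_target and tbh_target are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of TBB [PC, Rm]: the branch table holds byte offsets.
   insn_addr is the address of the TBB instruction itself. */
uint32_t tbb_target(uint32_t insn_addr, const uint8_t *table, uint32_t index)
{
    uint32_t pc = insn_addr + 4;        /* PC reads as instruction address + 4 */
    return pc + 2u * table[index];      /* table entry is multiplied by two    */
}

/* Hypothetical model of TBH with PC as base: the table holds halfword offsets. */
uint32_t tbh_target(uint32_t insn_addr, const uint16_t *table, uint32_t index)
{
    uint32_t pc = insn_addr + 4;
    return pc + 2u * table[index];
}
```

The largest table entries (255 for TBB, 65535 for TBH) give targets 514 and 131074 bytes past the instruction address, which matches the branch ranges quoted above. Because the entries are unsigned, the target can never be behind the PC.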
MEMORY MAP
Memory System Features Overview
• The Cortex-M3 has a predefined memory map that specifies which bus
interface is to be used when a memory location is accessed. This feature
also allows the processor design to optimize the access behavior when
different devices are accessed.
• Another feature of the Cortex-M3 memory system is bit-band support.
• Bit-band operations are supported only in special memory
regions.
• The memory system also supports unaligned transfers and exclusive accesses.
Memory Maps
• The Cortex-M3 processor has a fixed memory map.
• This makes it easier to port software from one Cortex-M3 product to
another.
• Some of the memory locations are allocated for private peripherals
such as debugging components.
Memory Access Attributes
• The memory map shows what is included in each memory region.
• The memory map also defines the memory attributes of each access. The
memory attributes in the Cortex-M3 processor include the following:
• Bufferable: A write to memory can be carried out by a write buffer while
the processor continues on to the next instruction.
• Cacheable: Data obtained from a memory read can be copied to a memory
cache so that the next time it is accessed, the value can be obtained from
the cache to speed up program execution.
• Executable: The processor can fetch and execute program code from
this memory region.
• Sharable: Data in this memory region could be shared by multiple bus
masters. The memory system needs to ensure coherency of data between
different bus masters in a sharable memory region.
The memory access attributes for each memory region are as follows:
• Code memory region (0x00000000–0x1FFFFFFF):
• This region is executable, and the cache attribute is write through
(WT).
• You can also put data in this region.
• If data operations are carried out for this region, they will take place
via the data bus interface. Write transfers to this region are bufferable.
• SRAM memory region (0x20000000–0x3FFFFFFF):
• This region is intended for on-chip RAM. Write transfers to this region
are bufferable, and the cache attribute is write back, write allocated
(WB-WA).
• This region is executable, so you can copy program code here and
execute it.
• Peripheral region (0x40000000–0x5FFFFFFF):
• This region is intended for peripherals.
• The accesses are noncacheable.
• You cannot execute instruction code in this region.
• External RAM region (0x60000000–0x7FFFFFFF):
• This region is intended for either on-chip or off-chip memory. The
accesses are cacheable (WB-WA), and you can execute code in this
region.
• External RAM region (0x80000000–0x9FFFFFFF):
• This region is intended for either on-chip or off-chip memory. The
accesses are cacheable (WT), and you can execute code in this region.
• External devices (0xA0000000–0xBFFFFFFF):
• This region is intended for external devices and/or shared memory that needs
ordering/nonbuffered accesses. It is also a nonexecutable region.
• External devices (0xC0000000–0xDFFFFFFF):
• This region is intended for external devices and/or shared memory that needs
ordering/nonbuffered accesses. It is also a nonexecutable region.
• System region (0xE0000000–0xFFFFFFFF):
• This region is for private peripherals and vendor-specific devices.
• It is nonexecutable.
• For the PPB memory range, the accesses are strongly ordered (noncacheable,
nonbufferable).
• For the vendor-specific memory region, the accesses are bufferable and noncacheable.
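Because each of the eight regions listed above spans 0.5 GB, the region of an address is selected by its top three bits. The following C sketch illustrates that lookup together with the executable attribute from the list. The type and function names are hypothetical, chosen only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the eight 0.5 GB regions, in address order. */
typedef enum {
    REGION_CODE,          /* 0x00000000-0x1FFFFFFF */
    REGION_SRAM,          /* 0x20000000-0x3FFFFFFF */
    REGION_PERIPHERAL,    /* 0x40000000-0x5FFFFFFF */
    REGION_EXT_RAM_WBWA,  /* 0x60000000-0x7FFFFFFF */
    REGION_EXT_RAM_WT,    /* 0x80000000-0x9FFFFFFF */
    REGION_EXT_DEVICE_A,  /* 0xA0000000-0xBFFFFFFF */
    REGION_EXT_DEVICE_C,  /* 0xC0000000-0xDFFFFFFF */
    REGION_SYSTEM         /* 0xE0000000-0xFFFFFFFF */
} cm3_region;

/* Address bits [31:29] select the region. */
cm3_region cm3_region_of(uint32_t addr)
{
    return (cm3_region)(addr >> 29);
}

/* Executable attribute of each region, as listed above. */
bool cm3_region_executable(cm3_region region)
{
    switch (region) {
    case REGION_CODE:
    case REGION_SRAM:
    case REGION_EXT_RAM_WBWA:
    case REGION_EXT_RAM_WT:
        return true;
    default:
        return false;   /* peripheral, external device, and system regions */
    }
}
```

For instance, the NVIC register address 0xE000E100 used earlier falls in the system region, so it is reported as nonexecutable.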
Default Memory Access Permissions
• The Cortex-M3 memory map has a default configuration for memory
access permissions. This prevents user (nonprivileged) programs from
accessing system control memory spaces such as the NVIC.
• The default memory access permissions are used when either no MPU
is present or the MPU is present but disabled.
• If the MPU is present and enabled, the access permissions in the MPU
setup determine whether user accesses are allowed.
Bit-Band Operations
• Bit-band operation support allows a single load/store operation to
access (read or write) a single data bit.
• In the Cortex-M3, this is supported in two predefined memory
regions called bit-band regions.
• One of them is located in the first 1 MB of the SRAM region, and the
other is located in the first 1 MB of the peripheral region.
• These two memory regions can be accessed like normal memory, but
they can also be accessed via a separate memory region called the
bit-band alias.
• When the bit-band alias address is used, each individual bit can be
accessed separately in the least significant bit (LSB) of each
word-aligned address.
• For example, to set bit 2 in word data at address 0x20000000, instead of
using three instructions to read the data, set the bit, and then write back
the result, this task can be carried out by a single instruction.
• Similarly, bit-band support can simplify application code if we need to
read a bit in a memory location.
• For example, if we need to determine bit 2 of address 0x20000000, the bit
can be read directly through the bit-band alias instead of reading the
whole word and extracting the bit.
• The Cortex-M3 does not have special instructions for bit operations;
instead, special memory regions are defined so that data accesses to these
regions are automatically converted into bit-band operations.
• The Cortex-M3 uses the following terms for the bit-band memory
addresses:
• Bit-band region: This is a memory address region that supports bit-
band operation.
• Bit-band alias: Access to the bit-band alias will cause an access (a bit-
band operation) to the bit-band region. (Note: A memory remapping is
performed.)
• Within the bit-band region, each word is represented by an LSB of 32
words in the bit-band alias address range. What actually happens is
that when the bit-band alias address is accessed, the address is
remapped into a bit-band address.
• For read operations, the word is read and the chosen bit location is
shifted to the LSB of the read return data.
• For write operations, the written bit data are shifted to the required
bit position, and a READ-MODIFY-WRITE is performed.
• Similarly, the bit-band region of the peripheral memory region can be
accessed via bit-band aliased addresses.
• When you access bit-band alias addresses, only the LSB (bit[0]) in the
data is used.
• In addition, accesses to the bit-band alias region should not be
unaligned. If an unaligned access is carried out to the bit-band alias
address range, the result is unpredictable.
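The remapping between a bit in the bit-band region and its alias word can be sketched in C as follows. This is a host-side model rather than target code: the alias base addresses 0x22000000 and 0x42000000 come from the Cortex-M3 memory map and are not stated in the text above, and all macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BB_BASE    0x20000000u  /* first 1 MB of the SRAM region       */
#define SRAM_BB_ALIAS   0x22000000u  /* its bit-band alias                  */
#define PERI_BB_BASE    0x40000000u  /* first 1 MB of the peripheral region */
#define PERI_BB_ALIAS   0x42000000u  /* its bit-band alias                  */
#define BB_REGION_SIZE  0x00100000u  /* 1 MB                                */

/* Alias address for one bit: every bit of the region gets its own
   word-aligned address, so the byte offset is scaled by 32 and the
   bit number by 4. */
uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    uint32_t base  = (addr >= PERI_BB_BASE) ? PERI_BB_BASE  : SRAM_BB_BASE;
    uint32_t alias = (addr >= PERI_BB_BASE) ? PERI_BB_ALIAS : SRAM_BB_ALIAS;
    assert(addr - base < BB_REGION_SIZE);  /* must be inside a bit-band region */
    assert(bit < 32);
    return alias + (addr - base) * 32u + bit * 4u;
}

/* Model of an alias read: the chosen bit is returned in the LSB. */
uint32_t bitband_read(uint32_t word, unsigned bit)
{
    return (word >> bit) & 1u;
}

/* Model of an alias write: only bit[0] of data is used, and the word
   is updated by a read-modify-write. Returns the new word value. */
uint32_t bitband_write(uint32_t word, unsigned bit, uint32_t data)
{
    return (word & ~(1u << bit)) | ((data & 1u) << bit);
}
```

Under these assumptions, bit 2 of the word at 0x20000000 maps to alias address 0x22000008, so a single store of 1 to that address sets the bit.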
• Bit-Band versus Bit-Bang
• In the Cortex-M3, we use the term bit-band to indicate that the
feature is a special memory band (region) that provides bit accesses.
• Bit-bang commonly refers to driving I/O pins under software control
to provide serial communication functions.
• The bit-band feature in the Cortex-M3 can be used for bit-banging
implementations, but the definitions of these two terms are different.
Advantages of Bit-Band Operations
• Bit-band operation can also be used to simplify branch decisions. For
example, if a branch should be carried out based on a single bit in a status
register in a peripheral, instead of
• Reading the whole register
• Masking the unwanted bits
• Comparing and branching
you can simplify the operations to
• Reading the status bit via the bit-band alias (get 0 or 1)
• Comparing and branching
Unaligned Transfers
• The Cortex-M3 supports unaligned transfers on single accesses.
• Data memory accesses can be defined as aligned or unaligned.
• Traditionally, ARM processors (such as the ARM7/ARM9/ARM10)
allow only aligned transfers. That means in accessing memory, a word
transfer must have address bit[1] and bit[0] equal to 0, and a half
word transfer must have address bit[0] equal to 0.
• For example, word data can be located at 0x1000 or 0x1004, but it
cannot be located in 0x1001, 0x1002, or 0x1003.
• For half word data, the address can be 0x1000 or 0x1002, but it
cannot be 0x1001.
• So, what does an unaligned transfer look like?
• An unaligned transfer is any word-size read/write whose address is not
a multiple of 4, or any halfword-size transfer whose address is not a
multiple of 2.
• All byte-size transfers are aligned on the Cortex-M3 because the
minimum address step is 1 byte.
• In the Cortex-M3, unaligned transfers are supported in normal
memory accesses (such as LDR, LDRH, STR, and STRH instructions).
There are a number of limitations:
• Unaligned transfers are not supported in Load/Store multiple
instructions.
• Stack operations (PUSH/POP) must be aligned.
• Exclusive accesses (such as LDREX or STREX) must be aligned;
otherwise, a fault exception (usage fault) will be triggered.
• Unaligned transfers are not supported in bit-band operations.
Results will be unpredictable if you attempt to do so.
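The alignment rules and limitations above can be summarized in a short C sketch. The enum and function names are hypothetical, and the model simply reports an unaligned bit-band access as unsupported even though the text describes its result as unpredictable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of the access types discussed above. */
typedef enum {
    ACCESS_SINGLE,     /* LDR, LDRH, STR, STRH      */
    ACCESS_MULTIPLE,   /* load/store multiple       */
    ACCESS_STACK,      /* PUSH/POP                  */
    ACCESS_EXCLUSIVE,  /* LDREX/STREX               */
    ACCESS_BITBAND     /* bit-band alias access     */
} access_kind;

/* A transfer is aligned when its address is a multiple of its size
   (1 = byte, 2 = halfword, 4 = word). */
bool is_aligned(uint32_t addr, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    return (addr & (size - 1u)) == 0;
}

/* Only normal single accesses may be unaligned. */
bool transfer_supported(access_kind kind, uint32_t addr, unsigned size)
{
    return is_aligned(addr, size) || kind == ACCESS_SINGLE;
}
```

A word access at 0x1002 is therefore accepted for an LDR but not for an LDREX, and byte accesses are always aligned.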
Exclusive Accesses
• The Cortex-M3 has no SWP (swap) instruction, which was used for
semaphore operations in traditional ARM processors.
• It is replaced by exclusive access operations.
• Exclusive accesses were first supported in architecture v6.
Semaphore Operations
• Semaphores are commonly used for allocating shared resources to
applications.
• When a shared resource can only service one client or application
processor, we also call it Mutual Exclusion (MUTEX).
• In such cases, when a resource is being used by one process, it is
locked to that process and cannot serve another process until the
lock is released.
• To set up a MUTEX semaphore, a memory location is defined as the
lock flag to indicate whether a shared resource is locked by a process.
• When a process or application wants to use the resource, it needs to
check whether the resource has been locked first.
• If it is not being used, it can set the lock flag to indicate that the
resource is now locked.
• In traditional ARM processors, the access to the lock flag is carried
out by the SWP instruction.
• It allows the lock flag read and write to be atomic, preventing the
resource from being locked by two processes at the same time.
• In newer ARM processors, the read/write access can be carried out
on separated buses.
• In such situations, the SWP instructions can no longer be used to
make the memory access atomic because the read and write in a
locked transfer sequence must be on the same bus. Therefore, the
locked transfers are replaced by exclusive accesses.
• The concept of exclusive access operation is quite simple but different
from SWP.
Exclusive Access Operation
• To allow exclusive access to work properly in a multiple processor
environment, an additional hardware called “exclusive access
monitor” is required.
• This monitor checks the transfers toward shared address locations
and replies to the processor if an exclusive access is successful.
• The processor bus interface also provides additional control signals
to this monitor to indicate if the transfer is an exclusive access.
• If the memory device has been accessed by another bus master
between the exclusive read and the exclusive write, the exclusive
access monitor will flag an exclusive failed through the bus system
when the processor attempts the exclusive write.
• This will cause the return status of the exclusive write to be 1.
• In the case of a failed exclusive write, the exclusive access monitor also
blocks the write transfer from getting to the exclusive access address.
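The interaction between the exclusive read, the exclusive write, and the monitor can be illustrated with a small C model. This is a simplified software sketch, not the hardware mechanism: the monitor is reduced to a single tag per word, and all type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a memory word plus the monitor's tag for it. */
typedef struct {
    uint32_t value;
    bool     exclusive_tag;   /* set by an exclusive read */
} monitored_word;

/* Model of LDREX: read the word and mark it for exclusive access. */
uint32_t ldrex_model(monitored_word *m)
{
    m->exclusive_tag = true;
    return m->value;
}

/* Model of a write by another bus master: it clears the tag. */
void other_master_write(monitored_word *m, uint32_t value)
{
    m->value = value;
    m->exclusive_tag = false;
}

/* Model of STREX: returns 0 on success; returns 1 and blocks the
   write if the location was accessed since the exclusive read. */
uint32_t strex_model(monitored_word *m, uint32_t value)
{
    if (!m->exclusive_tag) {
        return 1;             /* exclusive failed, memory unchanged */
    }
    m->value = value;
    m->exclusive_tag = false;
    return 0;
}

/* MUTEX attempt: take the lock only if it is free and the exclusive
   write succeeds. Returns true when the lock was acquired. */
bool try_lock(monitored_word *lock)
{
    if (ldrex_model(lock) != 0) {
        return false;         /* already locked by another process */
    }
    return strex_model(lock, 1) == 0;
}
```

In this model a caller that receives a failed status is expected to retry the whole read and write sequence, which is the usual pattern with exclusive accesses.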
Pipeline
• The Cortex-M3 processor has a three-stage pipeline: instruction fetch,
instruction decode, and instruction execute.

• When running programs with mostly 16-bit instructions, you will find
that the processor might not fetch instructions in every cycle.
• This is because the processor fetches up to two instructions (32-bit) in
one go, so after one instruction is fetched, the next one is already
inside the processor.
• In this case, the processor bus interface may try to fetch the
instruction after the next or, if the buffer is full, the bus interface
could be idle.
• Some of the instructions take multiple cycles to execute; in this case,
the pipeline will be stalled.
• Inside the instruction prefetch unit of the processor core, there is also
an instruction buffer.
• This buffer allows additional instructions to be queued before they
are needed. This buffer prevents the pipeline being stalled when the
instruction sequence contains 32-bit Thumb-2 instructions that are
not word aligned.
• In executing a branch instruction, the pipeline will be flushed.
• The processor will have to fetch instructions from the branch
destination to fill up the pipeline again.
