ARM Architecture
ARM Processors (or Microcontrollers) are a family of powerful CPUs that are based on the Reduced
Instruction Set Computer (RISC) architecture. ARM processors range from small microcontrollers
like the ARM7 series to powerful processors like the Cortex-A series that are used in today's smart
phones.
ARM Processors follow a Load and Store type architecture, where data processing is performed
only on the contents of the registers rather than directly on the memory. The instructions that process data
in registers are different from those that access the memory.
The instruction set of ARM is uniform and fixed in length. 32-bit ARM Processors have two
instruction sets: general 32-bit ARM Instruction Set and 16-bit Thumb Instruction Set.
ARM supports multiple stages of pipeline to speed up the flow of instructions. In a simple three stage
pipeline, the instructions follow three stages: fetch, decode and execute.
ARM has several processors that are grouped into a number of families based on the processor core
they are implemented with. The architecture of ARM processors has continued to evolve with every family.
Some of the well-known ARM Processor families are ARM7, ARM9, ARM10 and ARM11. The following table
shows some of the commonly found ARM families along with their architectures.
Family              Architecture
ARM7TDMI            ARMv4T
ARM9E               ARMv5TE
ARM11               ARMv6
Cortex-M            ARMv7-M
Cortex-R            ARMv7-R
Cortex-A (32-bit)   ARMv7-A
Cortex-A (64-bit)   ARMv8-A
ARM follows the nomenclature shown in the figure below to describe the processor implementations. The
letters or words after “ARM” are used to indicate the features of a processor.
Fig 4.1 ARM nomenclature
Where
x – Family or series
y – Memory Management/Protection Unit
z – Cache
T – 16 bit Thumb decoder
D – JTAG Debugger
M – Fast Multiplier
I – Embedded In-circuit Emulator (ICE) Macrocell
E – Enhanced Instructions for DSP (assumes TDMI)
J – Jazelle (for accelerated JAVA execution)
F – Vector Floating-point Unit
S – Synthesizable Version
D – JTAG Debug
JTAG is a serial protocol used by ARM to transfer the debug information between the processor and
the test equipment.
M – Fast Multiplier
Older ARM Processors used a small and simple multiplier unit. This multiplier unit required more
clock cycles to complete a single multiplication. With the introduction of Fast Multiplier unit, the clock
cycles required for multiplication are significantly reduced and modern ARM Processors are capable of
calculating a 32-bit product in a single cycle.
E – Enhanced Instructions
ARM Processors with this extension support the extended DSP Instruction Set for high performance
DSP applications. With these extended DSP instructions, the DSP performance of the ARM Processors can
be increased without raising the clock frequency.
J – Jazelle
ARM Processors with Jazelle Technology can be used in accelerated execution of Java bytecodes.
Jazelle DBX or Direct Bytecode eXecution is used in mobile phones and other consumer devices for high
performance Java execution without affecting memory or battery.
F – Vector Floating-point Unit
The Floating Point Architecture in ARM Processors provides execution of floating point arithmetic
operations. The dynamic range and precision offered by the Floating Point Architecture in ARM
Processors are used in many real time applications in the industrial and automotive areas.
S – Synthesizable
The ARM Processor Core is available as source code. This software core can be compiled into a
format that can be easily understood by the EDA Tools. Using the processor source code, it is possible to
modify the architecture of the ARM Processor.
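The letter key above can be turned into a small lookup. The following Python sketch is only an illustration: the function name decode_suffix and the parsing rule (digits after "ARM", then feature letters, then an optional "-S") are assumptions made here, and the digits (x, y, z) are not interpreted.

```python
import re

# Feature letters as listed in the nomenclature key above.
FEATURES = {
    "T": "16-bit Thumb decoder",
    "D": "JTAG debug",
    "M": "Fast multiplier",
    "I": "Embedded ICE macrocell",
    "E": "Enhanced DSP instructions",
    "J": "Jazelle",
    "F": "Vector floating-point unit",
    "S": "Synthesizable version",
}


def decode_suffix(name):
    """Return the features encoded by the letters that follow 'ARM<digits>'.

    Hypothetical helper: letters not in FEATURES are silently ignored.
    """
    match = re.fullmatch(r"ARM(\d+)([A-Z]*)(?:-([A-Z]))?", name)
    if match is None:
        raise ValueError(f"not of the form ARM<digits><letters>: {name!r}")
    letters = match.group(2) + (match.group(3) or "")
    if "E" in letters:
        # Per the key, E assumes that T, D, M and I are present.
        letters = "TDMI" + letters
    seen = []
    for letter in letters:
        if letter in FEATURES and letter not in seen:
            seen.append(letter)
    return [FEATURES[letter] for letter in seen]
```

For example, decode_suffix("ARM7TDMI") lists the Thumb decoder, JTAG debug, fast multiplier and Embedded ICE features, in that order.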
ARM Architecture
Features
32-bit RISC architecture
Two instruction sets
o ARM High performance 32-bit instruction set
o Thumb high code density 16-bit instruction set
Very low power consumption
32-bit RALU
Large register set with sixteen 32-bit registers visible at a time
4 GB linear address space
Von Neumann load/store architecture
o Single 32-bit Data Bus for instructions and data
3-stage pipeline architecture
o Fetch, Decode, and Execute Stage
Instructions process data with 8, 16, 32-bit data types
7 different modes of operations
o User, FIQ, IRQ, Supervisor, Abort, System and Undefined modes.
Two types of requests for interrupts
o FIQ – Fast interrupt Request
o IRQ – Interrupt Request
Single cycle 32x8 hardware multiplier
On chip JTAG Debug and In circuit emulator
Block diagram
The ARM is a Reduced Instruction Set Computer (RISC). In addition to typical RISC features such as a
load/store architecture, a uniform register set and fixed-length instructions, it provides:
Control over both the Arithmetic Logic Unit (ALU) and shifter in most data-processing instructions,
to maximize the use of an ALU and a shifter
Auto-increment and auto-decrement addressing modes to optimize program loops
Load and Store Multiple instructions to maximize data throughput
Conditional execution of almost all instructions to maximize execution throughput
Multiplier
The multiplier used is a Booth multiplier, a hardware multiplier that performs
multiplication of two signed binary numbers. The multiplier has three 32-bit inputs, all of which come
from the register file. The multiplier output is only the 32 least significant bits of the product.
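To make the idea concrete, here is a minimal Python sketch of the textbook radix-2 Booth algorithm. It is not a description of the actual ARM multiplier hardware, whose internal arrangement the text does not give; the names booth_multiply and low_word are made up for this example.

```python
def booth_multiply(multiplicand, multiplier, bits=32):
    """Radix-2 Booth multiplication of two signed `bits`-wide integers.

    Returns the full signed product (2 * bits wide).
    Assumption: the multiplicand is not the most negative value
    (-2**(bits - 1)), which would need one extra accumulator bit.
    """
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    m = multiplicand & mask
    a = 0                    # accumulator (upper half of the product)
    q = multiplier & mask    # multiplier (becomes lower half of the product)
    q_prev = 0               # the bit shifted out of q in the previous step
    for _ in range(bits):
        low = q & 1
        if low == 1 and q_prev == 0:
            a = (a - m) & mask       # start of a run of ones: subtract
        elif low == 0 and q_prev == 1:
            a = (a + m) & mask       # end of a run of ones: add
        # Arithmetic shift right of the combined a:q:q_prev value.
        q_prev = low
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (a & sign_bit)
    product = (a << bits) | q
    if product & (1 << (2 * bits - 1)):
        product -= 1 << (2 * bits)   # reinterpret as a signed value
    return product


def low_word(value, bits=32):
    """Keep only the least significant `bits` bits, as the multiplier output does."""
    return value & ((1 << bits) - 1)
```

Calling low_word(booth_multiply(x, y)) models the 32-bit result described above: the upper half of the product is discarded.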
Core Data path
The architecture is characterized by its data path and control path.
The data path is organized in such a way that operands are not fetched directly from memory locations.
Data items are placed in the register file. No data processing takes place in memory locations.
Instructions typically use three registers: two source registers and one destination register.
Increment or decrement logic can update register content for sequential access.
Barrel Shifter
A barrel shifter is a digital circuit that can shift a data word by a specified number of bits in one
clock cycle. The ARM does not have dedicated shift and rotate instructions. Instead it has a 32-bit barrel
shifter that performs shift and rotate operations as part of other instructions.
The input comes from the register file, or it can be immediate data. The shifter has other control
inputs coming from the instruction register. The shift field in the instruction controls the operation of the
barrel shifter. The amount by which the register should be shifted is contained in an immediate field in the
instruction, or it can be taken from the least significant byte of a register in the register file.
It provides five types of shifts and rotates which can be applied to Operand2. They are LSL-Logical
Shift Left, LSR-Logical Shift Right, ASR-Arithmetic Shift Right, ROR-Rotate Right, RRX-Rotate Right
Extended.
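The five operations can be modelled in a few lines of Python. This is only a sketch on 32-bit values: the function names mirror the mnemonics, shift amounts are assumed to be in the range 0 to 31, and the carry-out that the real shifter also produces is only modelled for RRX.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical Shift Left: zeros are shifted in from the right."""
    return (value << amount) & MASK32


def lsr(value, amount):
    """Logical Shift Right: zeros are shifted in from the left."""
    return (value & MASK32) >> amount


def asr(value, amount):
    """Arithmetic Shift Right: the sign bit (bit 31) is replicated."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32          # treat as negative so >> sign-extends
    return (value >> amount) & MASK32


def ror(value, amount):
    """Rotate Right: bits leaving on the right re-enter on the left."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def rrx(value, carry_in):
    """Rotate Right Extended: a one-bit rotate through the carry flag.

    Returns (result, carry_out).
    """
    value &= MASK32
    return ((carry_in & 1) << 31) | (value >> 1), value & 1
```

Because the shifter sits in front of the ALU, an expression such as (r1 + lsl(r2, 2)) & MASK32 corresponds to a single data-processing instruction whose Operand2 is r2 shifted left by 2.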
Control Unit
For any microprocessor, control unit is the heart of the system. It is responsible for the system
operation. Control unit is usually a pure combinational circuit. Signals from the control unit are connected to
every component in the processor to supervise its operation.
The ARM architecture supports seven processor modes: User, FIQ (Fast Interrupt Request),
IRQ (Interrupt Request), Supervisor, Abort, Undefined and System.
User mode is the main execution mode. By running application software in user mode, the operating
system can achieve protection and isolation.
Fast interrupt processing mode is entered whenever the processor receives an interrupt signal from
the designated fast interrupt source.
Normal interrupt processing mode is entered whenever the processor receives an interrupt signal
from any other interrupt source.
Supervisor mode is entered when the processor encounters a software interrupt instruction.
Software interrupts are a standard way to invoke operating system services on ARM.
Undefined instruction mode is entered when the processor attempts to execute an instruction that is
supported neither by the main integer core nor by one of the coprocessors. This mode can be used to
implement coprocessor emulation.
System mode is used for running privileged operating system tasks.
Abort mode is entered in response to memory faults.
Mode changes can be made under software control, or can be caused by external interrupts or
exception processing.
Most application programs execute in User mode. When the processor is in User mode, the program
being executed is unable to access some protected system resources or to change mode, other than by
causing an exception to occur (see Exceptions). This allows a suitably-written operating system to control
the use of system resources.
The modes other than User mode are known as privileged modes. They have full access to system
resources and can change mode freely. Five of them are known as exception modes:
FIQ
IRQ
Supervisor
Abort
Undefined.
These are entered when specific exceptions occur. Each of them has some additional registers to
avoid corrupting User mode state when the exception occurs.
The remaining mode is System mode, which is not entered by any exception and has exactly the
same registers available as User mode. However, it is a privileged mode and is therefore not subject to the
User mode restrictions. It is intended for use by operating system tasks that need access to system resources.
Registers
ARM has 37 registers in total, all of which are 32 bits wide:
31 general-purpose registers, including a program counter.
6 status registers. These registers are also 32 bits wide, but only some of the 32 bits are allocated or
need to be implemented. The subset depends on the architecture variant supported.
Of the 31 general-purpose registers, one (r15) is used as the PC (Program Counter). Of the
remaining 30, fifteen (r0 to r14) are visible at any one time, and which physical registers they map to
depends on the mode of operation. The 6 status registers comprise one CPSR (Current Program Status
Register) and five SPSRs (Saved Program Status Registers), one for each exception mode.
The general-purpose registers r0 to r15 can be split into three groups:
1. The unbanked registers, r0 to r7
2. The banked registers, r8 to r14
3. The Program Counter, r15
Fig 4.3 ARM Registers
At any one time 16 general registers (r0 to r15) and one or two status registers are visible to the
programmer. The visible registers depend on the processor mode, and the other registers (the banked
registers) are switched in to support IRQ, FIQ, Supervisor, Abort and Undefined mode processing. The
register bank organization is shown in Fig 4.3, where the banked registers are shaded.
In all modes 16 registers, r0 to r15, are directly accessible. All registers except r15 are general
purpose and may be used to hold data or address values. Register r15 holds the Program Counter (PC). The
register r14 is used as the subroutine link register and receives a copy of r15 when a Branch and Link
instruction is executed. It may be treated as a general-purpose register at all other times. R14_svc, R14_irq,
R14_fiq, R14_abt and R14_und are used similarly to hold the return values of R15 when interrupts and
exceptions arise, or when Branch and Link instructions are executed within interrupt or exception routines.
A seventeenth register CPSR (Current Program Status Register) is also accessible. It contains condition code
flags and the current mode bits and may be thought of as an extension to the PC.
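The banking scheme can be checked with a short Python sketch that maps a (mode, register number) pair to a physical register and then counts the distinct registers. The mode abbreviations and the r14_svc style of naming follow the text above; the function name physical_register and the table layout are assumptions for illustration. FIQ is assumed to bank r8 to r14 and the other exception modes r13 and r14, which is what makes the totals come out to 31 and 6.

```python
# Registers that each exception mode replaces with its own private copy.
BANKED = {
    "fiq": range(8, 15),    # r8 to r14
    "irq": range(13, 15),   # r13 and r14
    "svc": range(13, 15),
    "abt": range(13, 15),
    "und": range(13, 15),
}

MODES = ["usr", "sys", "fiq", "irq", "svc", "abt", "und"]


def physical_register(mode, n):
    """Name the physical register seen as r<n> while in `mode`."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if not 0 <= n <= 15:
        raise ValueError("register number must be 0..15")
    if n in BANKED.get(mode, ()):
        return f"r{n}_{mode}"
    return f"r{n}"          # shared with User mode (System mode shares all)


# Count the distinct physical registers across all modes.
general = {physical_register(m, n) for m in MODES for n in range(16)}
status = {"cpsr"} | {f"spsr_{m}" for m in BANKED}
total_registers = len(general) + len(status)
```

Under these assumptions the sets contain 31 general-purpose and 6 status registers, matching the total of 37 stated earlier.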
CPSR
The Current Program Status Register (CPSR) is accessible in all processor modes. It contains
condition code flags, interrupt mask bits, the current processor mode, and other status and control
information. Each exception mode also has a Saved Program Status Register (SPSR) that is used to preserve
the value of the CPSR when the associated exception occurs. The format of CPSR is given below
Bit:    31  30  29  28  27 ... 8    7   6   5   4   3   2   1   0
Field:  N   Z   C   V   Reserved    I   F   T   M4  M3  M2  M1  M0
Condition code flags: Condition flags are updated by comparisons and by ALU operations
that specify the S instruction suffix. For example, if a SUBS subtract instruction results in a register value of
zero, then the Z flag in the CPSR is set.
N (Negative): set to bit 31 of the result
Z (Zero): set when the result is zero
C (Carry): set when the operation produces a carry out (or, for a subtraction, no borrow)
V (Overflow): set when the operation produces a signed overflow
Interrupt mask bits: Interrupt masks are used to stop specific interrupt requests from interrupting the
processor. There are two interrupt request levels available on the ARM processor core: interrupt
request (IRQ) and fast interrupt request (FIQ). Setting the I bit masks IRQ and setting the F bit masks FIQ.
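Thumb bit: The T bit reflects the processor state. It is set when the core is in Thumb state and clear when
it is in ARM state.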
Mode bits: The below table lists the various modes and the associated binary patterns. The last
column of the table gives the bit patterns that represent each of the processor modes in the CPSR.
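The CPSR layout can be exercised with a small decoder. The sketch below is in Python; the bit positions follow the format shown above, while the five-bit mode encodings are the standard ARM values supplied here rather than copied from the table this section refers to. The function name decode_cpsr is made up for this example.

```python
# Standard ARM mode-bit encodings (M4..M0); supplied here as an assumption,
# since the mode table itself is not reproduced in the text.
MODE_NAMES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its flag, mask, state and mode fields."""
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "I": (cpsr >> 7) & 1,    # 1 = IRQ masked
        "F": (cpsr >> 6) & 1,    # 1 = FIQ masked
        "T": (cpsr >> 5) & 1,    # 1 = Thumb state
        "mode": MODE_NAMES.get(cpsr & 0x1F, "reserved"),
    }
```

For instance, decode_cpsr(0x600000D3) reports Z and C set, both interrupts masked, ARM state and Supervisor mode.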
Pipelining
In ARM7, a 3-stage pipeline is used. A 3-stage pipeline is the simplest form of pipeline. Its stages are:
Fetch, Decode, Execute
In a pipeline, while one instruction is being executed, the second instruction is decoded and the third
instruction is fetched.
Each stage completes in a single cycle, so in the ideal case one instruction finishes every cycle once the
pipeline is full.
          Stage 1    Stage 2     Stage 3
Time 0    Fetch 1    --          --
Time 1    Fetch 2    Decode 1    --
Time 2    Fetch 3    Decode 2    Execute 1
Time 3    Fetch 4    Decode 3    Execute 2
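The timing table can be generated programmatically. The Python sketch below assumes an ideal pipeline with no stalls or branches, which is a simplification; the function name pipeline_schedule is chosen here for illustration.

```python
def pipeline_schedule(n_instructions, stages=("Fetch", "Decode", "Execute")):
    """Return one row per clock cycle for an ideal, stall-free pipeline.

    Each row lists what every stage is doing in that cycle, or "--" if idle.
    """
    total_cycles = n_instructions + len(stages) - 1
    rows = []
    for cycle in range(total_cycles):
        row = []
        for depth, stage in enumerate(stages):
            instr = cycle - depth + 1     # instruction number in this stage
            if 1 <= instr <= n_instructions:
                row.append(f"{stage} {instr}")
            else:
                row.append("--")
        rows.append(row)
    return rows
```

With four instructions the first four rows reproduce the table above, and the schedule drains after six cycles in total: n instructions take n + 2 cycles instead of 3n.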
Exceptions
The ARM architecture defines the following types of exceptions (listed in order of decreasing
priority):
Reset starts the processor from a known state and renders all other pending exceptions irrelevant.
Data abort exception is raised by memory management hardware when a load or store instruction
violates memory access permissions.
Fast interrupt exception is raised whenever the processor receives an interrupt signal from the
designated fast interrupt source.
Normal interrupt exception is raised whenever the processor receives an interrupt signal from any
non-fast interrupt source.
Pre-fetch abort exception is raised by memory management hardware when memory access
permissions are violated during instruction fetch.
Undefined instruction exception is generated when trying to decode an instruction that is supported
neither by the main integer core nor by one of the coprocessors.
When an exception occurs, execution is forced from a fixed memory address corresponding to the
type of exception. These fixed addresses are called the exception vectors.
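As an illustration, the vectors and the priority order can be captured in a lookup table. The Python sketch below uses the standard ARM low-vector addresses and entry modes, which are supplied here as an assumption (the text itself only states the software interrupt offset, 0x8, later on); the names VECTORS, PRIORITY and highest_priority are made up for this example.

```python
# Exception name -> (vector offset, mode entered).
# Offsets are the standard ARM low-vector addresses, assumed here.
VECTORS = {
    "reset": (0x00, "Supervisor"),
    "undefined instruction": (0x04, "Undefined"),
    "software interrupt": (0x08, "Supervisor"),
    "prefetch abort": (0x0C, "Abort"),
    "data abort": (0x10, "Abort"),
    "irq": (0x18, "IRQ"),
    "fiq": (0x1C, "FIQ"),
}

# Decreasing priority, in the order listed above. The software interrupt
# is placed last alongside the undefined instruction exception.
PRIORITY = [
    "reset",
    "data abort",
    "fiq",
    "irq",
    "prefetch abort",
    "undefined instruction",
    "software interrupt",
]


def highest_priority(pending):
    """Pick the exception to service first from a collection of pending ones."""
    for name in PRIORITY:
        if name in pending:
            return name
    return None


def vector_address(name, base=0x00000000):
    """Address that execution is forced to when `name` is taken."""
    return base + VECTORS[name][0]
```

If a data abort and an IRQ are pending together, highest_priority returns the data abort, and execution is forced to offset 0x10.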
Instruction Set
The data processing instructions manipulate data within registers. They are move instructions, arithmetic
instructions, logical instructions, comparison instructions, and multiply instructions. Most data processing
instructions can process one of their operands using the barrel shifter. If you use the S suffix on a data
processing instruction, then it updates the flags in the CPSR.
i. Move Instructions: Move is the simplest ARM instruction. It copies N into a destination register Rd,
where N is a register or immediate value. This instruction is useful for setting initial values and
transferring data between registers.
ii. Arithmetic Instructions: The arithmetic instructions implement addition and subtraction of 32-bit
signed and unsigned values.
iii. Logical Instructions: Logical instructions perform bitwise logical operations on the two source
registers.
Examples:
r1 = 0x02040608, r2 = 0x10305070
ORR r0, r1, r2        ; r0 = r1 OR r2
r0 = 0x12345678
r1 = 0b1111, r2 = 0b0101
BIC r0, r1, r2        ; r0 = r1 AND NOT r2 (clears the bits set in r2)
r0 = 0b1010
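The two examples can be verified with plain bitwise arithmetic. In the Python sketch below each function models one logical instruction on 32-bit values; the function names are hypothetical stand-ins for the mnemonics.

```python
MASK32 = 0xFFFFFFFF


def and_(rn, rm):
    """AND: bitwise AND of the two source registers."""
    return rn & rm & MASK32


def orr(rn, rm):
    """ORR: bitwise OR of the two source registers."""
    return (rn | rm) & MASK32


def eor(rn, rm):
    """EOR: bitwise exclusive OR of the two source registers."""
    return (rn ^ rm) & MASK32


def bic(rn, rm):
    """BIC: bit clear, rn AND NOT rm."""
    return rn & ~rm & MASK32


r0_orr = orr(0x02040608, 0x10305070)   # first example above
r0_bic = bic(0b1111, 0b0101)           # second example above
```

Here r0_orr evaluates to 0x12345678 and r0_bic to 0b1010, in agreement with the results shown in the examples.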
iv. Comparison Instructions: The comparison instructions are used to compare or test a register with a
32-bit value. They update the CPSR flag bits according to the result, but do not affect other registers.
After the bits have been set, the information can then be used to change program flow by using
conditional execution.
CMP r1, r0
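CMP behaves like a subtraction whose result is discarded and whose flags are kept. The Python sketch below computes the four flags for CMP Rn, Operand2 and evaluates a few condition codes from them; the helper names cmp_flags and condition_passed are assumptions made for this example.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(rn, op2):
    """Flags produced by CMP rn, op2 (that is, by rn - op2)."""
    rn &= MASK32
    op2 &= MASK32
    result = (rn - op2) & MASK32
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(rn >= op2),     # C is set when no borrow occurs
        # Signed overflow: operands differ in sign and the result's sign
        # differs from rn's sign.
        "V": ((rn ^ op2) & (rn ^ result)) >> 31,
    }


def condition_passed(cond, flags):
    """Evaluate a few of the condition codes against the flags."""
    n, z, c, v = flags["N"], flags["Z"], flags["C"], flags["V"]
    table = {
        "EQ": z == 1,
        "NE": z == 0,
        "HI": c == 1 and z == 0,   # unsigned higher
        "GE": n == v,              # signed greater than or equal
        "LT": n != v,              # signed less than
    }
    return table[cond]
```

After comparing 1 with 2, for example, condition_passed("LT", ...) is true, so an instruction carrying the LT condition would execute.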
Branch Instructions:
A branch instruction changes the flow of execution or is used to call a routine. This type of
instruction allows programs to have subroutines, if-then-else structures, and loops.
The change of execution flow forces the program counter pc to point to a new address.
Multiply Instructions:
The multiply instructions multiply the contents of a pair of registers and, depending upon the
instruction, accumulate the result with the contents of another register. The long multiplies accumulate
onto a pair of registers representing a 64-bit value. The final result is placed in a destination register or a
pair of registers.
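The difference between the 32-bit and the long (64-bit) forms is easiest to see in code. The Python sketch below models four multiply operations; the function names follow the usual ARM mnemonics (MUL, MLA, UMULL, SMLAL), which are assumed here since the text does not list them.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Reinterpret a 32-bit pattern as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mul(rm, rs):
    """MUL: Rd = low 32 bits of Rm * Rs."""
    return (rm * rs) & MASK32


def mla(rm, rs, rn):
    """MLA: Rd = low 32 bits of Rm * Rs + Rn (multiply and accumulate)."""
    return (rm * rs + rn) & MASK32


def umull(rm, rs):
    """UMULL: unsigned 64-bit product, returned as (RdLo, RdHi)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32


def smlal(rdlo, rdhi, rm, rs):
    """SMLAL: signed multiply, accumulated onto the 64-bit pair RdHi:RdLo."""
    accumulator = ((rdhi & MASK32) << 32) | (rdlo & MASK32)
    total = (accumulator + to_signed32(rm) * to_signed32(rs)) & MASK64
    return total & MASK32, total >> 32
```

Note how umull(0xFFFFFFFF, 2) needs the second register to hold the upper word, whereas mul keeps only the lower 32 bits of the same product.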
Load-Store Instructions:
Load-store instructions transfer data between memory and processor registers. There are three types
of load-store instructions: single-register transfer, multiple-register transfer, and swap.
i. Single-Register Transfer: These instructions are used for moving a single data item in and out of a
register. The data types supported are signed and unsigned words (32-bit), half words (16-bit), and
bytes.
Software Interrupt Instruction:
A software interrupt instruction (SWI) causes a software interrupt exception, which provides a
mechanism for applications to call operating system routines.
When the processor executes an SWI instruction, it sets the program counter pc to the offset 0x8 in
the vector table. The instruction also forces the processor mode to SVC, which allows an operating system
routine to be called in a privileged mode.
Program Status Register Instructions:
The ARM instruction set provides two instructions to directly control a program status register
(PSR). The MRS instruction transfers the contents of either the CPSR or SPSR into a register; in the reverse
direction, the MSR instruction transfers the contents of a register into the CPSR or SPSR. Together these
instructions are used to read and write the CPSR and SPSR. The MSR syntax includes a label called fields.
This can be any combination of control (c), extension (x), status (s), and flags (f). These fields relate to
particular byte regions in a PSR.
Syntax: MRS{<cond>} Rd, <cpsr|spsr>
        MSR{<cond>} <cpsr|spsr>_<fields>, Rm
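The effect of the fields label can be modelled as a byte mask. In the Python sketch below each field letter selects one byte of the PSR (c is bits 7:0, x is bits 15:8, s is bits 23:16, f is bits 31:24), and msr writes only the selected bytes. The function names are hypothetical, and privilege checks are not modelled.

```python
MASK32 = 0xFFFFFFFF

# Byte region of the PSR selected by each field letter.
FIELD_MASKS = {
    "c": 0x000000FF,   # control: mode bits, T, F, I
    "x": 0x0000FF00,   # extension
    "s": 0x00FF0000,   # status
    "f": 0xFF000000,   # flags: N, Z, C, V
}


def mrs(psr):
    """MRS Rd, psr: copy the whole PSR into a register."""
    return psr & MASK32


def msr(psr, fields, rm):
    """MSR psr_<fields>, Rm: write only the selected byte regions of the PSR."""
    mask = 0
    for letter in fields:
        mask |= FIELD_MASKS[letter]
    return (psr & ~mask & MASK32) | (rm & mask)


def enable_irq(cpsr):
    """Read-modify-write sequence that clears the I bit (bit 7)."""
    r1 = mrs(cpsr)
    r1 &= ~0x80 & MASK32
    return msr(cpsr, "c", r1)
```

The enable_irq helper mirrors the usual three-step sequence: read the CPSR with MRS, clear the I bit in a register, and write back the control field with MSR.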
Loading Constants:
There is no ARM instruction that moves an arbitrary 32-bit constant into a register.
Since ARM instructions are themselves 32 bits in size, they cannot hold a general 32-bit constant alongside
the opcode and register fields. To aid programming there are two pseudo instructions to move a 32-bit value
into a register.
The first pseudo instruction writes a 32-bit constant to a register using whatever instructions are
available. It defaults to a memory read if the constant cannot be encoded using other instructions.
The second pseudo instruction writes a relative address into a register, which will be encoded using a
pc-relative expression.
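Whether a constant "can be encoded using other instructions" comes down to the ARM immediate format: an 8-bit value rotated right by an even number of bits. The Python sketch below checks that rule and picks a strategy. The function names are made up for this example, and the strategy labels assume the assembler tries a move, then a move-NOT of the inverted value, before falling back to a memory read. In common ARM assemblers the two pseudo instructions are usually written LDR Rd, =constant and ADR Rd, label; the text does not name them, so treat that as an assumption.

```python
MASK32 = 0xFFFFFFFF


def rotate_left(value, amount):
    """Rotate a 32-bit value left by `amount` bits (0..31)."""
    value &= MASK32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def immediate_encoding(value):
    """Return (imm8, rotate_right) if value fits the ARM immediate format.

    The format is an 8-bit value rotated right by an even amount (0..30).
    Returns None if no such encoding exists.
    """
    value &= MASK32
    for rotation in range(0, 32, 2):
        imm8 = rotate_left(value, rotation)   # undo a right rotation
        if imm8 <= 0xFF:
            return imm8, rotation
    return None


def load_strategy(value):
    """Choose how a 32-bit constant could be placed in a register."""
    if immediate_encoding(value) is not None:
        return "MOV"
    if immediate_encoding(~value) is not None:
        return "MVN"
    return "literal pool load"
```

A value such as 0xFF000000 fits (0xFF rotated right by 8), whereas 0x12345678 does not and would be read from memory.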
Thumb Model
Thumb encodes a subset of the 32-bit ARM instructions into a 16-bit instruction set space.
Thumb has higher performance than ARM on a processor with a 16-bit data bus, but lower
performance than ARM on a 32-bit data bus.
Thumb has higher code density than ARM, that is, an executable program takes up less space in
memory. This makes Thumb the usual choice for memory-constrained embedded systems, where
cost pressures also limit memory size, width, and speed.
Notice from the Thumb instruction set list and from the Thumb register usage table that there is
no direct access to the CPSR or SPSR. In other words, there are no Thumb equivalents of the MSR and MRS
instructions.
To alter the CPSR or SPSR, you must switch into ARM state to use MSR and MRS. Similarly, there
are no coprocessor instructions in Thumb state. You need to be in ARM state to access the coprocessor for
configuring cache and memory management.
ARM-Thumb Interworking:
ARM-Thumb interworking is the name given to the method of linking ARM and Thumb code
together. It handles the transition between the two states.
To call a Thumb routine from an ARM routine, the core has to change state. This state change is
shown in the T bit of the CPSR. The BX and BLX branch instructions cause a switch between ARM and
Thumb state while branching to a routine. The BX lr instruction returns from a routine, also with a state
switch if necessary.
Syntax: BX Rm
        BLX Rm | label
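The state switch is driven by bit 0 of the branch target held in Rm: if it is 1 the core enters Thumb state, otherwise ARM state. The Python sketch below models that rule for BX; the function name bx and the tuple it returns are assumptions for illustration, and alignment corner cases are not modelled.

```python
MASK32 = 0xFFFFFFFF
T_BIT = 1 << 5          # Thumb bit in the CPSR


def bx(cpsr, rm):
    """Model BX Rm: returns (new_cpsr, new_pc).

    Bit 0 of Rm selects the state after the branch and is not part of
    the address itself.
    """
    if rm & 1:
        new_cpsr = cpsr | T_BIT              # enter Thumb state
    else:
        new_cpsr = cpsr & ~T_BIT & MASK32    # enter ARM state
    new_pc = rm & 0xFFFFFFFE
    return new_cpsr, new_pc
```

A caller in ARM state that branches with Rm = 0x8001 therefore lands at address 0x8000 with the T bit set, and a later BX to an even return address clears it again.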
Thumb Instruction Set: