ARM Notes For Students
ARM started life as part of Acorn Computers, and its processor designs are now used in the
chips inside products such as Apple's iPad.
1978 - Acorn Computers is established in Cambridge, and produces computers which are
particularly successful in the UK. Acorn's BBC Microcomputer was the most widely used
computer in schools in the 1980s.
1985 - Acorn Computer Group develops the world's first commercial RISC processor. A RISC
(reduced instruction set computer) design uses simpler instructions in order to operate
faster, an advance on earlier designs which tried to pack as many actions as possible into
each instruction.
1987 - Acorn's ARM processor is the first RISC processor available in a low-cost PC.
1990 - ARM is founded as a spin-off from Acorn and Apple, after the two companies started
collaborating on the ARM processor as part of the development of Apple's new Newton
computer system.
2007 - About 98% of the more than 1 billion mobile phones sold each year use at least one
ARM processor.
ARM
ARM stands for Advanced RISC Machines.
An ARM processor is any 16/32-bit microprocessor designed and licensed by ARM Ltd, a
microprocessor design company headquartered in England, founded in 1990 by
Hermann Hauser.
A characteristic feature of ARM processors is their low electric power consumption, which
makes them particularly suitable for use in portable devices.
APPLICATIONS
iPod from Apple
D-Link DSL-604+ Wireless ADSL Router.
Many automobiles embed ARM7 cores.
Sirius Satellite Radio receivers
Most of Nokia's mobile phone range.
Architectural Difference
Bus width         8-bit for standard core    8/16/32-bit                   32-bit mostly, also available in 64-bit
Communication     UART, USART, SPI, I2C      PIC, UART, USART, LIN, CAN,   UART, USART, LIN, I2C, SPI, CAN, USB,
protocols                                    Ethernet, SPI, I2S            Ethernet, I2S, DSP, SAI (serial audio
                                                                           interface), IrDA
Speed             12 clocks/instruction      4 clocks/instruction          1 clock/instruction cycle
                  cycle                      cycle
It was known as the Advanced RISC Machine, and before that as the Acorn RISC Machine.
This has made them dominant in the mobile and embedded electronics market as relatively
FEATURES OF LPC2148
PACKAGE:
MEMORY:
SPEED:
Single flash sector or full chip erase in 400 ms and programming of 256 bytes in 1ms.
USB 2.0 Full Speed compliant Device Controller with 2 kB of endpoint RAM.
In addition, the LPC2146/8 provides 8 kB of on-chip RAM accessible to USB by DMA.
ADC:
Two 10-bit A/D converters (AD0 and AD1) provide a total of 14 analog inputs.
DAC:
TIMERS:
Watchdog timer
RTC:
Low power real-time clock with independent power and dedicated 32 kHz clock
input.
SERIAL INTERFACES:
I2C-bus:
Serial communication:
SPI (Serial Peripheral Interface) and SSP (Synchronous Serial Port) with buffering and
variable data length capabilities.
FAST GPIO: Up to 45 of 5 V tolerant fast general purpose I/O pins in a tiny LQFP64 package.
INTERRUPTS:
60 MHz maximum CPU clock available from programmable on-chip PLL with settling time of
100 μs.
OSCILLATOR:
On-chip integrated oscillator operates with an external crystal in the range from 1 MHz
to 30 MHz and with an external oscillator up to 50 MHz.
Idle mode
Power-down mode
CPU operating voltage range of 3.0 V to 3.6 V (3.3 V ± 10 %) with 5 V tolerant I/O pads.
D - Debug: 2 breakpoints to stop the CPU (both hardware and software).
I - Interface: EmbeddedICE macrocell. JTAG: Joint Test Action Group.
The three stage pipelined architecture of the ARM7 processor is shown in the above figure.
FEATURES OF ARM PROCESSORS
The ARM processors are based on RISC architectures and this architecture has provided small
implementations, and very low power consumption. Implementation size, performance, and
very low power consumption remain the key features in the development of the ARM devices.
ARM Registers: ARM has a total of 37 registers: 31 general-purpose registers of 32 bits
each, and six status registers. These registers are not all visible at once. The processor
state and operating mode decide which registers are available to the programmer. At any
time, only 16 of the 31 general-purpose registers are available to the user. The
remaining 15 registers are used to speed up exception processing. There are two program status
registers: CPSR and SPSR (the current and saved program status registers, respectively).
In ARM state the registers r0 to r13 are orthogonal—any instruction that you can apply to r0
you can equally well apply to any of the other registers.
The main bank of 16 registers is used by all unprivileged code. These are the User mode
registers. User mode is different from all other modes as it is unprivileged. In addition to this
register bank, there is also one 32-bit Current Program Status Register (CPSR).
Of these 16 registers, r13 acts as the stack pointer, r14 acts as the link register, and r15
acts as the program counter.
Register r13 is the stack pointer (sp), and it is used to store the address of the top of the
stack. R13 is used by the PUSH and POP instructions in T variants, and by the SRS and RFE
instructions from ARMv6.
Register 14 is the Link Register (LR). This register holds the address of the next instruction
after a Branch and Link (BL or BLX) instruction, which is the instruction used to make a
subroutine call. It is also used for return address information on entry to exception modes. At
all other times, R14 can be used as a general-purpose register.
Register 15 is the Program Counter (PC). It can be used in most instructions as a pointer to the
instruction which is two instructions after the instruction being executed.
The remaining 13 registers have no special hardware purpose.
CPSR: The ARM core uses the CPSR to monitor and control internal operations.
The CPSR is a dedicated 32-bit register and resides in the register file. The CPSR is divided
into four fields, each 8 bits wide: flags, status, extension, and control. The extension and
status fields are reserved for future use. The control field contains the processor mode, state,
and interrupt mask bits. The flags field contains the condition flags. The 32-bit CPSR register
is shown below.
Processor Modes: There are seven processor modes: six privileged modes (abort, fast interrupt
request, interrupt request, supervisor, system, and undefined) and one non-privileged mode
called user mode.
The processor enters abort mode when there is a failed attempt to access memory. Fast interrupt
request and interrupt request modes correspond to the two interrupt levels available on the
ARM processor. Supervisor mode is the mode that the processor is in after reset and is generally
the mode that an operating system kernel operates in. System mode is a special version of user
mode that allows full read-write access to the CPSR. Undefined mode is used when the
processor encounters an instruction that is undefined or not supported by the implementation.
User mode is used for programs and applications.
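The CPSR fields and mode bits described above can be sketched as a small decoder. This is a minimal model, assuming the classic ARM7TDMI bit layout (N, Z, C, V in bits 31 to 28; I, F, T in bits 7, 6, 5; mode in bits 4 to 0). The function name decode_cpsr and the dictionary layout are illustrative, not part of any ARM tool.

```python
# Assumed bit positions: classic ARM7TDMI CPSR layout.
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28}

# Mode field encodings (bits 4..0 of the control field).
MODES = {
    0b10000: "user",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "supervisor",
    0b10111: "abort",
    0b11011: "undefined",
    0b11111: "system",
}


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into condition flags and control bits."""
    flags = {name: (cpsr >> bit) & 1 for name, bit in FLAG_BITS.items()}
    return {
        "flags": flags,
        "irq_masked": bool(cpsr & (1 << 7)),  # I bit
        "fiq_masked": bool(cpsr & (1 << 6)),  # F bit
        "thumb": bool(cpsr & (1 << 5)),       # T bit
        "mode": MODES.get(cpsr & 0x1F, "reserved"),
    }


# Example value: Z and C set, IRQ and FIQ masked, ARM state, supervisor mode.
state = decode_cpsr(0x600000D3)
print(state)
```

The example value 0x600000D3 is only an illustration of a plausible CPSR shortly after reset; real reset flag values are not guaranteed.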
Banked Registers: Out of the 37 registers, 20 registers are hidden from a program at different
times. These registers are called banked registers and are identified by the shading in the
diagram. They are available only when the processor is in a particular mode; for example,
abort mode has banked registers r13_abt, r14_abt and spsr_abt. Banked registers of a
particular mode are denoted by an underscore followed by the mode mnemonic, that is,
_mode.
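The register counts above can be checked with a short sketch. It assumes the usual ARM7 banking scheme (FIQ banks r8 to r14, the other exception modes bank only r13 and r14, and each exception mode has its own SPSR). The names USER_BANK, BANKED and visible_registers are hypothetical helpers for illustration.

```python
# 16 user-mode registers plus the CPSR are shared by every mode.
USER_BANK = [f"r{i}" for i in range(16)] + ["cpsr"]

# Assumed banking scheme for the five exception modes.
BANKED = {
    "fiq": [f"r{i}_fiq" for i in range(8, 15)] + ["spsr_fiq"],
    "irq": ["r13_irq", "r14_irq", "spsr_irq"],
    "svc": ["r13_svc", "r14_svc", "spsr_svc"],
    "abt": ["r13_abt", "r14_abt", "spsr_abt"],
    "und": ["r13_und", "r14_und", "spsr_und"],
}


def visible_registers(mode):
    """Return the register names a program sees in the given mode."""
    names = list(USER_BANK)
    for banked in BANKED.get(mode, []):   # user/system modes have no banked registers
        base = banked.split("_")[0]
        if base == "spsr":
            names.append(banked)          # the SPSR is an extra register
        else:
            names[names.index(base)] = banked  # banked copy replaces the user register
    return names


banked_count = sum(len(regs) for regs in BANKED.values())
total = len(USER_BANK) + banked_count
print(banked_count, total)
print(visible_registers("abt"))
```

Under these assumptions the model reproduces the figures in the text: 20 banked registers and 37 registers in total.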
When the T bit is 1, the processor is in Thumb state; when T = 0, the processor is in ARM
state and executes ARM instructions. To change state, the core executes a specialized branch
instruction. There are two interrupt request levels available on the ARM processor
core: interrupt request (IRQ) and fast interrupt request (FIQ).
PIPELINE: A pipeline is the mechanism used by a RISC processor to execute instructions
at an increased speed. The pipeline speeds up execution by fetching the next instruction while
other instructions are being decoded and executed. To execute an instruction, the
processor first fetches it (loads the instruction from memory), then decodes it
(identifies the instruction to be executed), and finally executes it
and writes the result back to a register.
The ARM7 processor has a three-stage pipeline: Fetch, Decode and
Execute. The ARM9 has a five-stage pipeline. The three-stage pipeline is
explained below.
To explain pipelining, consider three instructions: Compare (CMP), Subtract (SUB)
and Add (ADD). The ARM7 processor fetches the first instruction, CMP, in the first cycle. During
the second cycle it decodes the CMP instruction and at the same time fetches the SUB
instruction. During the third cycle it executes the CMP instruction while decoding the SUB
instruction and fetching the third instruction, ADD. Overlapping the stages in this way
improves the speed of operation and is a form of parallel processing. This pipeline example
is shown in the following diagram.
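The cycle-by-cycle behaviour described above can be traced with a short sketch. It is an idealised model that assumes every stage takes exactly one cycle and nothing stalls or flushes; pipeline_trace is a hypothetical helper name.

```python
def pipeline_trace(program, stages=("fetch", "decode", "execute")):
    """Return one dict per clock cycle mapping stage name -> instruction (or None)."""
    depth = len(stages)
    trace = []
    for cycle in range(len(program) + depth - 1):
        row = {}
        for position, stage in enumerate(stages):
            index = cycle - position  # instruction currently occupying this stage
            row[stage] = program[index] if 0 <= index < len(program) else None
        trace.append(row)
    return trace


trace = pipeline_trace(["CMP", "SUB", "ADD"])
for number, row in enumerate(trace, start=1):
    print(f"cycle {number}: {row}")
```

In this model the three instructions finish in five cycles, and in cycle 3 all three stages are busy at once (ADD fetched, SUB decoded, CMP executed).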
As the pipeline length increases, the amount of work done at each stage is reduced, which
allows the processor to attain a higher operating frequency. This in turn increases the
performance. One important feature of this pipeline is that executing a branch instruction, or
branching by directly modifying the PC, causes the ARM core to flush its pipeline.
Exceptions, Interrupts, and the Vector Table:
Exceptions are generated by internal and external sources to cause the ARM processor to
handle an event, such as an externally generated interrupt or an attempt to execute an undefined
instruction. The processor state just before handling the exception is normally preserved so that
the original program can be resumed after the completion of the exception routine. More than
one exception can arise at the same time. ARM exceptions may be considered in three groups:
1. Exceptions generated as the direct effect of executing an instruction. Software interrupts,
undefined instructions (including coprocessor instructions where the requested coprocessor is
absent) and prefetch aborts (instructions that are invalid due to a memory fault occurring during
fetch) come under this group.
2. Exceptions generated as a side effect of an instruction. Data aborts (a memory fault during a
load or store data access) are in this group.
3. Exceptions generated externally, unrelated to the instruction flow. Reset, IRQ and FIQ are in
this group.
Undefined instruction vector is used when the processor cannot decode an instruction.
Software interrupt vector is called when you execute a SWI instruction. The SWI instruction
is frequently used as the mechanism to invoke an operating system routine.
Prefetch abort vector is used when the processor attempts to fetch an instruction from an
address without the correct access permissions. The actual abort occurs in the decode stage.
Data abort vector is similar to a prefetch abort but is raised when an instruction attempts to
access data memory without the correct access permissions.
Interrupt request vector is used by external hardware to interrupt the normal execution flow of
the processor. It can only be raised if IRQs are not masked in the CPSR.
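The vectors described above can be summarised as a lookup table. The addresses and entry modes below are the standard low-vector layout for classic ARM cores and are stated here as an assumption, since the notes do not list them; VECTOR_TABLE, vector_address and irq_can_be_taken are illustrative names.

```python
# (address, exception, mode entered) for the assumed low-vector layout.
VECTOR_TABLE = [
    (0x00, "reset", "svc"),
    (0x04, "undefined instruction", "und"),
    (0x08, "software interrupt", "svc"),
    (0x0C, "prefetch abort", "abt"),
    (0x10, "data abort", "abt"),
    (0x14, "reserved", None),
    (0x18, "interrupt request", "irq"),
    (0x1C, "fast interrupt request", "fiq"),
]


def vector_address(exception):
    """Look up the vector address for a named exception."""
    for address, name, _mode in VECTOR_TABLE:
        if name == exception:
            return address
    raise KeyError(exception)


def irq_can_be_taken(cpsr):
    """An IRQ is only taken when the I bit (bit 7) of the CPSR is clear."""
    return not (cpsr & (1 << 7))


print(hex(vector_address("interrupt request")))
print(irq_can_be_taken(0x00000010), irq_can_be_taken(0x00000090))
```

Each vector is one word apart, so a vector normally holds a single branch (or load to the PC) that jumps to the real handler.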
The ARM9 family was released in 1997. It has a five-stage pipeline architecture. Hence, the
ARM9 processor can run at higher clock frequencies than the ARM7 family. The extra stages
improve the overall performance of the processor. The memory system has been redesigned to
follow the Harvard architecture, with separate data and instruction buses. The first processor
in the ARM9 family was the ARM920T, which includes separate D + I caches and an MMU.
This processor can be used by operating systems requiring virtual memory support. ARM922T
is a variation on the ARM920T but with half the D + I cache size.
The latest core in the ARM9 product line is the ARM926EJ-S synthesizable processor core,
announced in 2000. It is designed for use in small portable Java-enabled devices such as 3G
phones and personal digital assistants (PDAs).
The ARM10 was released in 1999. It extends the ARM9 pipeline to six stages. It also supports
an optional vector floating-point (VFP) unit, which adds a seventh stage to the ARM10
pipeline. The VFP significantly increases floating-point performance and is compliant with the
IEEE 754-1985 floating-point standard.
The ARM1136J-S is the ARM11 processor released in 2003, and it is designed for
high-performance and power-efficient applications. ARM1136J-S was the first processor
implementation to execute architecture ARMv6 instructions. It incorporates an eight-stage
pipeline with separate load-store and arithmetic pipelines.
A brief comparison of different ARM families is presented below.
ARM instructions are classified into data processing instructions, branch instructions, load-
store instructions, software interrupt instruction, and program status register instructions.
Data processing instructions are processed within the arithmetic logic unit (ALU). A unique
and powerful feature of the ARM processor is the ability to shift the 32-bit binary pattern in
one of the source registers left or right by a specific number of positions before it enters the
ALU. This shift, performed by the barrel shifter, increases the power and flexibility of many
data processing operations.
There are data processing instructions that do not use the barrel shift, for example, the MUL
(multiply), CLZ (count leading zeros), and QADD (signed saturated 32-bit add) instructions.
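The barrel shift operations can be modelled on 32-bit values as follows. This is a sketch that assumes shift amounts between 0 and 31 and ignores the carry-out that the real shifter also produces; the function names simply mirror the ARM mnemonics.

```python
MASK = 0xFFFFFFFF  # keep every result within 32 bits


def lsl(value, amount):
    """Logical shift left: zeros enter from the right."""
    return (value << amount) & MASK


def lsr(value, amount):
    """Logical shift right: zeros enter from the left."""
    return (value & MASK) >> amount


def asr(value, amount):
    """Arithmetic shift right: the sign bit (bit 31) is copied in from the left."""
    value &= MASK
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as a negative Python integer
    return (value >> amount) & MASK


def ror(value, amount):
    """Rotate right: bits leaving on the right re-enter on the left."""
    amount %= 32
    value &= MASK
    return ((value >> amount) | (value << (32 - amount))) & MASK


# Equivalent of MOV r7, r5, LSL #2 with r5 = 5: the operand is shifted before the move.
r5 = 5
r7 = lsl(r5, 2)
print(r7, hex(asr(0x80000000, 4)), hex(ror(0xF1, 4)))
```

Shifting left by 2 multiplies by 4, which is why a shifted operand is often used to scale an array index in a single instruction.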
i. Move Instructions: The move instruction copies R into a destination register Rd, where R is a
register or immediate value. This instruction is useful for setting initial values and transferring
data between registers.
Example 1:
PRE   r5 = 5
      r7 = 8
      MOV r7, r5
POST  r5 = 5
      r7 = 5
The MOV instruction takes the contents of register r5 and copies them into register r7.
Example 2: MOVS r0, r1, LSL #1
The MOVS instruction shifts the value in register r1 left by one bit, writes the result to r0,
and, because of the S suffix, updates the condition flags.
SUB r0, r1, r2 ; This subtract instruction subtracts the value stored in register r2 from the
value stored in register r1. The result is stored in register r0.
RSB r0, r1, #0 ; This reverse subtract instruction (RSB) subtracts r1 from the constant value
#0, writing the result to r0. You can use this instruction to negate numbers.
SUBS r1, r1, #1 ; The SUBS instruction is useful for decrementing loop counters. In this
example we subtract the immediate value one from the value one stored in
register r1. The result value zero is written to register r1, and the Z flag is set.
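The effect of SUBS on the condition flags, and the negation trick with RSB, can be sketched as below. The model assumes 32-bit two's complement arithmetic and the ARM convention that the carry flag after a subtraction means "no borrow"; subs and rsb are illustrative helpers named after the mnemonics.

```python
MASK = 0xFFFFFFFF


def subs(a, b):
    """Model SUBS Rd, Rn, Op2: return (result, flags) for a - b."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    flags = {
        "N": result >> 31,                          # result is negative
        "Z": int(result == 0),                      # result is zero
        "C": int(a >= b),                           # carry set means no borrow occurred
        "V": ((a ^ b) & (a ^ result)) >> 31,        # signed overflow
    }
    return result, flags


def rsb(rn, operand2):
    """Model RSB Rd, Rn, Op2: operand2 - rn."""
    return (operand2 - rn) & MASK


# SUBS r1, r1, #1 with r1 = 1: the counter reaches zero and Z is set.
r1, flags = subs(1, 1)
print(r1, flags)

# RSB r0, r1, #0 with r1 = 5 gives -5 in two's complement.
print(hex(rsb(5, 0)))
```

A loop that ends with SUBS followed by a conditional branch on the Z flag therefore needs no separate compare instruction.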
Logical Instructions: Logical instructions perform bitwise logical operations on the
two source registers.
BIC r0, r1, r2 ; BIC carries out a logical bit clear. Register r2 contains a binary pattern where
every binary 1 in r2 clears the corresponding bit location in register r1; the result is written
to r0. This instruction is particularly useful when clearing status bits and is frequently used
to change interrupt masks in the cpsr.
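The logical operations can be written directly with Python's bitwise operators. This is a minimal sketch on 32-bit values; the function names follow the ARM mnemonics (and_ has a trailing underscore only because "and" is a Python keyword).

```python
MASK = 0xFFFFFFFF


def and_(rn, rm):
    """AND: keep bits set in both operands."""
    return rn & rm & MASK


def orr(rn, rm):
    """ORR: set bits that are set in either operand."""
    return (rn | rm) & MASK


def eor(rn, rm):
    """EOR: set bits that differ between the operands."""
    return (rn ^ rm) & MASK


def bic(rn, rm):
    """BIC: clear in rn every bit that is 1 in rm (rn AND NOT rm)."""
    return rn & ~rm & MASK


# BIC r0, r1, r2 with r1 = 0b1111 and r2 = 0b0101 leaves 0b1010 in r0.
r0 = bic(0b1111, 0b0101)
print(bin(r0))
```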
Branch Instructions: A branch instruction changes the normal flow of execution of a main
program or is used to call a subroutine. This type of instruction allows programs to
have subroutines, if-then-else structures, and loops. The change of execution flow forces the
program counter pc to point to a new address.
Example 1: B forward ; (unconditional branch to forward)
The branch with link, or BL, instruction is similar to the B instruction but overwrites the
link register lr with a return address. It is used to call a subroutine.
The details of the branch instructions are given in the table above.
Load-Store Instructions : Load-store instructions transfer data between memory and processor
registers. There are three types of load-store instructions:
Single-register transfer
Multiple-register transfer, and
Swap.
Single-Register Transfer: These instructions are used for moving a single data item in and out of a
register. The data types supported are signed and unsigned words (32-bit), half-words (16-bit), and
bytes.
Ex 1: STR r0, [r1] ; = STR r0, [r1, #0] ; store the contents of register r0 to the
memory address pointed to by register r1.
Ex 2: LDR r0, [r1] ; = LDR r0, [r1, #0] ; load register r0 with the contents of the
memory address pointed to by register r1.
Load-store multiple instructions can increase interrupt latency. ARM implementations do not
usually interrupt instructions while they are executing. For example, on an ARM7 a load
multiple instruction takes 2 + Nt cycles, where N is the number of registers to load and t is the
number of cycles required for each sequential access to memory. If an interrupt has been raised,
then it has no effect until the load-store multiple instruction is complete.
Example 1: LDMIA r0!, {r1-r3} ; In this example, register r0 is the base register Rn and is
followed by !, indicating that the register is updated after the instruction is executed. In this
case the register list ranges from r1 to r3.
Example 2: LDMIB: load multiple and increment before
Example 3: LDMIB r0!, {r1-r3}
Example 4: LDMDA r0!, {r1-r3}
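The difference between the load-multiple addressing modes can be seen in a small model. It assumes word-aligned addresses, base register writeback (the ! suffix), and the architectural rule that the lowest-numbered register always uses the lowest address; memory is represented as a Python dictionary and ldm is a hypothetical helper.

```python
def ldm(mode, memory, base, reg_list):
    """Model LDM<mode> Rn!, {reg_list}; return (loaded values, updated base)."""
    n = len(reg_list)
    start = {
        "IA": base,               # increment after
        "IB": base + 4,           # increment before
        "DA": base - 4 * n + 4,   # decrement after
        "DB": base - 4 * n,       # decrement before
    }[mode]
    new_base = base + 4 * n if mode in ("IA", "IB") else base - 4 * n
    registers = sorted(reg_list)  # lowest register number <-> lowest address
    loaded = {reg: memory[start + 4 * i] for i, reg in enumerate(registers)}
    return loaded, new_base


# Four consecutive words of example data (values chosen only for illustration).
memory = {0x80010: 1, 0x80014: 2, 0x80018: 3, 0x8001C: 4}

ia_regs, ia_base = ldm("IA", memory, 0x80010, [1, 2, 3])  # LDMIA r0!, {r1-r3}
ib_regs, ib_base = ldm("IB", memory, 0x80010, [1, 2, 3])  # LDMIB r0!, {r1-r3}
da_regs, da_base = ldm("DA", memory, 0x80018, [1, 2, 3])  # LDMDA r0!, {r1-r3}
print(ia_regs, hex(ia_base))
print(ib_regs, hex(ib_base))
print(da_regs, hex(da_base))
```

With the same starting base, LDMIA reads the word at the base address first, while LDMIB skips it and starts one word higher; both leave the base advanced by three words.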
Stack Operations : The ARM architecture uses the load-store multiple instructions to carry
out stack operations. The pop operation (removing data from a stack) uses a load multiple
instruction; similarly, the push operation (placing data onto the stack) uses a store multiple
instruction.
A stack is either ascending (A) or descending (D). Ascending stacks grow towards higher
memory addresses; in contrast, descending stacks grow towards lower memory
addresses. When a full stack (F) is used, the stack pointer sp points to an address that is the
last used or full location (i.e., sp points to the last item on the stack). In contrast, if an empty
stack (E) is used, the sp points to an address that is the first unused or empty location (i.e., it
points after the last item on the stack).
Example 1: The STMFD instruction pushes registers onto the stack, updating the sp.
STMFD sp!, {r1,r4} ; Store Multiple, Full Descending stack
PRE   r1 = 0x00000002
      r4 = 0x00000003
      sp = 0x00080014
POST  r1 = 0x00000002
      r4 = 0x00000003
      sp = 0x0008000c
The stack operation is shown by the following diagram.
Example 2: The STMED instruction pushes the registers onto the stack but updates register sp
to point to the next empty location, as shown in the diagram below.
PRE   r1 = 0x00000002
      r4 = 0x00000003
      sp = 0x00080010
POST  r1 = 0x00000002
      r4 = 0x00000003
      sp = 0x00080008
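Both push examples can be reproduced with a small model of a descending stack. It assumes that STMFD behaves like STMDB (decrement before) and STMED like STMDA (decrement after), with the lowest-numbered register stored at the lowest address; stm_push is a hypothetical helper and memory is a plain dictionary.

```python
def stm_push(kind, memory, sp, registers):
    """Model STMFD/STMED sp!, {registers}; return the updated sp.

    registers maps register number -> value to be pushed.
    """
    n = len(registers)
    new_sp = sp - 4 * n
    # Full stack: sp ends on the last item written.
    # Empty stack: sp ends one word below the last item written.
    start = new_sp if kind == "FD" else new_sp + 4
    for i, (_number, value) in enumerate(sorted(registers.items())):
        memory[start + 4 * i] = value
    return new_sp


regs = {1: 0x00000002, 4: 0x00000003}  # r1 and r4 from the examples above

fd_memory = {}
fd_sp = stm_push("FD", fd_memory, 0x00080014, regs)
print(hex(fd_sp), {hex(a): v for a, v in fd_memory.items()})

ed_memory = {}
ed_sp = stm_push("ED", ed_memory, 0x00080010, regs)
print(hex(ed_sp), {hex(a): v for a, v in ed_memory.items()})
```

In the full descending case sp finishes on the word holding r1; in the empty descending case sp finishes on the free word just below it.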
Swap Instruction:
The swap instruction is a special case of a load-store instruction. It swaps (exchanges)
the contents of memory with the contents of a register. This instruction is an atomic operation:
it reads and writes a location in the same bus operation, preventing any other instruction from
reading or writing to that location until it completes. Swap cannot be interrupted by any other
instruction or any other bus access, so the system “holds the bus” until the transaction is
complete.
Ex 1: SWP: swap a word between memory and a register
      tmp = mem32[Rn]
      mem32[Rn] = Rm
      Rd = tmp
Ex 2: SWPB: swap a byte between memory and a register
      tmp = mem8[Rn]
      mem8[Rn] = Rm
      Rd = tmp
Ex 3: SWP r0, r1, [r2] ; The swap instruction loads a word from memory into register
r0 and overwrites the memory with register r1.
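The data movement performed by SWP can be sketched as below. The model only shows what is read and written; it cannot reproduce the bus-level atomicity described above. The helper names swp and try_lock, the address 0x9000 and the data values are assumptions chosen for illustration.

```python
def swp(memory, rm_value, address):
    """Model SWP Rd, Rm, [Rn]: return the old memory word and store rm_value."""
    tmp = memory[address]
    memory[address] = rm_value
    return tmp


def try_lock(memory, address):
    """Use swap as a simple semaphore: write 1 and succeed if the old value was 0."""
    return swp(memory, 1, address) == 0


# SWP r0, r1, [r2] with r1 = 0x11112222 and r2 = 0x9000.
memory = {0x9000: 0x12345678}
r0 = swp(memory, 0x11112222, 0x9000)
print(hex(r0), hex(memory[0x9000]))

# The first attempt takes the lock, the second finds it already held.
lock = {0x9100: 0}
print(try_lock(lock, 0x9100), try_lock(lock, 0x9100))
```

Because the read and the write cannot be separated on real hardware, swap is a natural building block for semaphores and simple locks.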
Here 0x123456 is the SWI number used by ARM toolkits as a debugging SWI. Typically
the SWI instruction is executed in user mode.
Program Status Register Instructions: There are two instructions available to directly
control a program status register (PSR). The MRS instruction transfers the contents of either
the CPSR or SPSR into a register. Similarly, the MSR instruction transfers the contents of a
register into the CPSR or SPSR. Together, these instructions are used to read and write the
CPSR and SPSR.
MRS: copy program status register to a general-purpose register, Rd = PSR
MSR: move a general-purpose register to a program status register, PSR[field] = Rm
MSR: move an immediate value to a program status register, PSR[field] = immediate
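The PSR[field] notation above means that MSR only writes the selected 8-bit fields. A minimal sketch follows, assuming the usual field letters c (control, bits 7 to 0), x (extension), s (status) and f (flags, bits 31 to 24); the functions mrs and msr are models named after the instructions, and the starting CPSR value is an example only.

```python
# Assumed mapping of MSR field letters to bit masks.
FIELD_MASKS = {"c": 0x000000FF, "x": 0x0000FF00, "s": 0x00FF0000, "f": 0xFF000000}


def mrs(psr):
    """Model MRS Rd, PSR: the whole status register is copied out."""
    return psr & 0xFFFFFFFF


def msr(psr, value, fields):
    """Model MSR PSR_<fields>, Rm: only the named fields take the new value."""
    mask = 0
    for letter in fields:
        mask |= FIELD_MASKS[letter]
    return ((psr & ~mask) | (value & mask)) & 0xFFFFFFFF


# Enable IRQs with a read-modify-write sequence:
#   MRS r1, cpsr ; BIC r1, r1, #0x80 ; MSR cpsr_c, r1
cpsr = 0x600000D3
r1 = mrs(cpsr)
r1 &= ~0x80 & 0xFFFFFFFF        # clear the I bit (bit 7)
cpsr = msr(cpsr, r1, "c")
print(hex(cpsr))
```

Writing only the control field leaves the condition flags untouched, which is why the sequence names cpsr_c rather than the whole register.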
Here the LDR instruction loads a 32-bit constant 0xff00ffff into register r0.
Example 3: The same constant can be loaded into the register r0 using the MVN instruction
also.
MVN r0, #0x00ff0000
After execution r0 = 0xff00ffff.
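The reason MVN helps here is that an ARM data-processing immediate is an 8-bit value rotated right by an even number of bits, so not every 32-bit constant fits in one instruction. The check below sketches that rule; rol32, is_arm_immediate and mvn are illustrative helpers, not toolchain functions.

```python
MASK = 0xFFFFFFFF


def rol32(value, amount):
    """Rotate a 32-bit value left (this undoes a rotate right by the same amount)."""
    amount %= 32
    value &= MASK
    return ((value << amount) | (value >> (32 - amount))) & MASK


def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    return any(rol32(value, rotation) < 256 for rotation in range(0, 32, 2))


def mvn(operand):
    """Model MVN Rd, Op2: bitwise NOT of the operand."""
    return ~operand & MASK


print(is_arm_immediate(0xFF00FFFF))   # cannot be used directly with MOV
print(is_arm_immediate(0x00FF0000))   # 0xFF rotated, so MVN can take it
print(hex(mvn(0x00FF0000)))
```

When neither the constant nor its complement fits this form, the constant is typically loaded from memory instead.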
Introduction to Thumb instruction set : Thumb encodes a subset of the 32-bit ARM
instructions into a 16-bit instruction set space. Since Thumb has higher performance than ARM
on a processor with a 16-bit data bus, but lower performance than ARM on a 32-bit data bus,
use Thumb for memory-constrained systems. Thumb has higher code density—the space taken
up in memory by an executable program—than ARM. For memory-constrained embedded
systems, for example, mobile phones and PDAs, code density is very important. Cost pressures
also limit memory size, width, and speed.
Thumb execution is flagged by the T bit (bit [5]) in the CPSR. A Thumb implementation of
the same code takes up around 30% less memory than the equivalent ARM implementation.
Even though the Thumb implementation uses more instructions, the overall memory footprint
is reduced. Code density was the main driving force for the Thumb instruction set, which
was also designed as a compiler target rather than for hand-written assembly code. The
example below shows the difference between ARM and Thumb code.
From the above example it is clear that the Thumb code is denser than the ARM code.
Exceptions generated during Thumb execution switch to ARM execution before executing the
exception handler. The state of the T bit is preserved in the SPSR, and the LR of the exception
mode is set so that the normal return instruction performs correctly, regardless of whether the
exception occurred during ARM or Thumb execution.
In Thumb state, not all of the registers can be accessed directly. Only the low registers r0 to
r7 are fully accessible. The higher registers r8 to r12 are only accessible with MOV, ADD, or CMP
instructions. CMP and all the data processing instructions that operate on low registers update
the condition flags in the CPSR.
The list of registers and their accessibility in Thumb mode are shown in the following table.
From the above discussion, it is clear that there are no MSR and MRS equivalent Thumb
instructions. To alter the CPSR or SPSR, one must switch into ARM state to use MSR and
MRS. Similarly, there are no coprocessor instructions in Thumb state. You need to be in ARM
state to access the coprocessor for configuring cache and memory management.
ARM-Thumb interworking is the method of linking ARM and Thumb code together for both
assembly and C/C++. It handles the transition between the two states. To call a Thumb routine
from an ARM routine, the core has to change state. This is done with the T bit of the CPSR. The
BX and BLX branch instructions cause a switch between ARM and Thumb state while
branching to a routine. The BX lr instruction returns from a routine, also with a state switch if
necessary.
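The state switch performed by BX can be sketched as follows. The model assumes the usual interworking convention that bit 0 of the branch target selects the state (1 for Thumb, 0 for ARM) and is then cleared to form the new PC; bx is a model function named after the instruction, and the addresses are examples.

```python
T_BIT = 1 << 5  # Thumb state bit in the CPSR


def bx(target, cpsr):
    """Model BX Rm: return (new pc, new cpsr) after the branch."""
    thumb = target & 1                       # bit 0 of the target selects the state
    if thumb:
        new_cpsr = cpsr | T_BIT
    else:
        new_cpsr = cpsr & ~T_BIT
    new_pc = target & ~1 & 0xFFFFFFFF        # bit 0 is not part of the address
    return new_pc, new_cpsr & 0xFFFFFFFF


# Call a Thumb routine from ARM state (user mode, T = 0).
pc, cpsr = bx(0x00008001, 0x00000010)
print(hex(pc), hex(cpsr))

# Return to an ARM address: the T bit is cleared again.
pc, cpsr = bx(0x00004000, cpsr)
print(hex(pc), hex(cpsr))
```

Because the state travels with the address, the same BX lr return sequence works whether the caller was running ARM or Thumb code.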
The data processing instructions manipulate data within registers. They include move
instructions, arithmetic instructions, shifts, logical instructions, comparison instructions, and
multiply instructions. The Thumb data processing instructions are a subset of the ARM data
processing instructions.
Note : Thumb deviates from the ARM style in that the barrel shift operations (ASR, LSL, LSR,
and ROR) are separate instructions.