CHAPTER 2
ARM PROCESSOR FUNDAMENTALS
Chapter 1 covered embedded systems with an ARM processor. In this chapter we will focus
on the actual processor itself. First, we will provide an overview of the processor core and
describe how data moves between its different parts. We will describe the programmer's
model from a software developer's view of the ARM processor, which will show you the
functions of the processor core and how different parts interact. We will also take a look at
the core extensions that form an ARM processor. Core extensions speed up and organize
main memory as well as extend the instruction set. We will then cover the revisions to the
ARM core architecture by describing the ARM core naming conventions used to identify
them and the chronological changes to the ARM instruction set architecture. The final
section introduces the architecture implementations by subdividing them into specific
ARM processor core families.
A programmer can think of an ARM core as functional units connected by data buses,
as shown in Figure 2.1, where the arrows represent the flow of data, the lines represent the
buses, and the boxes represent either an operation unit or a storage area. The figure shows
not only the flow of data but also the abstract components that make up an ARM core.
Data enters the processor core through the Data bus. The data may be an instruction to
execute or a data item. Figure 2.1 shows a Von Neumann implementation of the ARM:
data items and instructions share the same bus. In contrast, Harvard implementations of
the ARM use two different buses.
The instruction decoder translates instructions before they are executed. Each
instruction executed belongs to a particular instruction set.
The ARM processor, like all RISC processors, uses a load-store architecture. This
means it has two instruction types for transferring data in and out of the processor: load
instructions copy data from memory to registers in the core, and conversely the store
instructions copy data from registers to memory. There are no data processing instructions
that directly manipulate data in memory. Thus, data processing is carried out solely in
registers.
Figure 2.1 ARM core dataflow model.
Data items are placed in the register file, a storage bank made up of 32-bit registers.
Since the ARM core is a 32-bit processor, most instructions treat the registers as holding
signed or unsigned 32-bit values. The sign extend hardware converts signed 8-bit and 16-bit
numbers to 32-bit values as they are read from memory and placed in a register.
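To make the sign extension step concrete, the following Python sketch models it in software. It is only an illustration of the arithmetic, not a description of how the hardware is built, and the function names sign_extend and as_signed32 are our own.

```python
def sign_extend(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `value` to a 32-bit register pattern."""
    mask = (1 << bits) - 1
    value &= mask                      # keep only the 8-bit or 16-bit quantity
    sign_bit = 1 << (bits - 1)
    if value & sign_bit:
        # Negative number: copy the sign bit into all the upper register bits.
        value |= 0xFFFFFFFF & ~mask
    return value


def as_signed32(pattern: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    return pattern - (1 << 32) if pattern & 0x80000000 else pattern


print(hex(sign_extend(0xFF, 8)))           # 0xffffffff
print(as_signed32(sign_extend(0x80, 8)))   # -128
print(hex(sign_extend(0x7FFF, 16)))        # 0x7fff
```

A positive value such as 0x7FFF is unchanged, while a negative byte such as 0x80 fills the upper 24 bits with ones so that the register still reads as -128.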
ARM instructions typically have two source registers, Rn and Rm, and a single
destination register, Rd. Source operands are read from the register file using the internal
buses A and B, respectively.
The ALU (arithmetic logic unit) or MAC (multiply-accumulate unit) takes the register
values Rn and Rm from the A and B buses and computes a result. Data processing
instructions write the result in Rd directly to the register file. Load and store instructions
use the ALU to generate an address to be held in the address register and broadcast on
the Address bus.
One important feature of the ARM is that register Rm alternatively can be preprocessed
in the barrel shifter before it enters the ALU. Together the barrel shifter and ALU can
calculate a wide range of expressions and addresses.
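As a rough illustration of how a shifted Rm combines with Rn, the sketch below models a few shift operations and a single add in Python. It assumes shift amounts from 0 to 31 only and ignores the carry flag and the special shift encodings of the real instruction set; the helper names (lsl, lsr, ror, add_shifted) are ours.

```python
MASK32 = 0xFFFFFFFF  # registers hold 32-bit values


def lsl(value: int, amount: int) -> int:
    """Logical shift left, truncated to 32 bits."""
    return (value << amount) & MASK32


def lsr(value: int, amount: int) -> int:
    """Logical shift right of a 32-bit value."""
    return (value & MASK32) >> amount


def ror(value: int, amount: int) -> int:
    """Rotate right within 32 bits."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def add_shifted(rn: int, rm: int, shift, amount: int) -> int:
    """Model Rd = Rn + shift(Rm, amount): Rm passes through the shifter first."""
    return (rn + shift(rm, amount)) & MASK32


# Address of element 3 in an array of 4-byte words based at 0x1000.
print(hex(add_shifted(0x1000, 3, lsl, 2)))  # 0x100c
```

The example shows why the arrangement is useful for addresses: scaling an index by the element size and adding it to a base happens in one step.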
After passing through the functional units, the result in Rd is written back to the register
file using the Result bus. For load and store instructions the incrementer updates the address
register before the core reads or writes the next register value from or to the next sequential
memory location. The processor continues executing instructions until an exception or
interrupt changes the normal execution flow.
Now that you have an overview of the processor core we'll take a more detailed look
at some of the key components of the processor: the registers, the current program status
register (cpsr), and the pipeline.
2.1 REGISTERS
General-purpose registers hold either data or an address. They are identified with the
letter r prefixed to the register number. For example, register 4 is given the label r4.
Figure 2.2 shows the active registers available in user mode, a protected mode normally
used when executing applications. The processor can be in seven different modes,
which we will introduce shortly. All the registers shown are 32 bits in size.
Figure 2.2 Registers available in user mode.
There are up to 18 active registers: 16 data registers and 2 processor status registers. The
data registers are visible to the programmer as r0 to r15.
The ARM processor has three registers assigned to a particular task or special function:
r13, r14, and r15. They are frequently given different labels to differentiate them from the
other registers.
In Figure 2.2, the shaded registers identify the assigned special-purpose registers:
■ Register r13 is traditionally used as the stack pointer (sp) and stores the head of the stack
in the current processor mode.
■ Register r14 is called the link register (lr) and is where the core puts the return address
whenever it calls a subroutine.
■ Register r15 is the program counter (pc) and contains the address of the next instruction
to be fetched by the processor.
Depending upon the context, registers r13 and r14 can also be used as general-purpose
registers, which can be particularly useful since these registers are banked during a processor
mode change. However, it is dangerous to use r13 as a general register when the processor
is running any form of operating system because operating systems often assume that r13
always points to a valid stack frame.
In ARM state the registers r0 to r13 are orthogonal: any instruction that you can apply
to r0 you can equally well apply to any of the other registers. However, there are instructions
that treat r14 and r15 in a special way.
In addition to the 16 data registers, there are two program status registers: cpsr and spsr
(the current and saved program status registers, respectively).
The register file contains all the registers available to a programmer. Which registers are
visible to the programmer depends upon the current mode of the processor.
2.2 CURRENT PROGRAM STATUS REGISTER
The ARM core uses the cpsr to monitor and control internal operations. The cpsr is a
dedicated 32-bit register and resides in the register file. Figure 2.3 shows the basic layout
of a generic program status register. Note that the shaded parts are reserved for future
expansion.
Figure 2.3 A generic program status register (psr). Fields: flags, status, extension, and
control. Functions shown: condition flags, interrupt masks (I, F), Thumb state (T), and
processor mode.
The cpsr is divided into four fields, each 8 bits wide: flags, status, extension, and control.
In current designs the extension and status fields are reserved for future use. The control
field contains the processor mode, state, and interrupt mask bits. The flags field contains
the condition flags.
Some ARM processor cores have extra bits allocated. For example, the J bit, which can
be found in the flags field, is only available on Jazelle-enabled processors, which execute
8-bit instructions. We will discuss Jazelle more in Section 2.2.3. It is highly probable that
future designs will assign extra bits for the monitoring and control of new features.
For a full description of the cpsr, refer to Appendix B.
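The field layout can be explored with a short Python sketch that splits a 32-bit cpsr value into its four 8-bit fields and a few individual bits. The individual bit positions used here (N, Z, C, V in bits 31 to 28; I, F, T in bits 7, 6, 5; mode in bits 4 to 0) are taken from the ARM architecture rather than from the text above, and the names bit and decode_cpsr and the example value are our own.

```python
def bit(value: int, position: int) -> bool:
    """Return True if the given bit of value is set."""
    return bool((value >> position) & 1)


def decode_cpsr(cpsr: int) -> dict:
    """Split a 32-bit cpsr value into its fields and commonly used bits."""
    return {
        # The four 8-bit fields, most significant first.
        "flags":     (cpsr >> 24) & 0xFF,
        "status":    (cpsr >> 16) & 0xFF,
        "extension": (cpsr >> 8) & 0xFF,
        "control":   cpsr & 0xFF,
        # Condition flags live at the top of the flags field.
        "N": bit(cpsr, 31), "Z": bit(cpsr, 30),
        "C": bit(cpsr, 29), "V": bit(cpsr, 28),
        # Interrupt masks, Thumb state bit, and mode live in the control field.
        "I": bit(cpsr, 7), "F": bit(cpsr, 6), "T": bit(cpsr, 5),
        "mode": cpsr & 0x1F,
    }


example = decode_cpsr(0x600000D3)          # an arbitrary illustrative value
print(hex(example["flags"]))               # 0x60
print(format(example["mode"], "05b"))      # 10011
```

For the example value, the Z and C flags are set, both interrupt masks are set, and the T bit is clear.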
2.2.1 PROCESSOR MODES
The processor mode determines which registers are active and the access rights to the cpsr
register itself. Each processor mode is either privileged or nonprivileged: A privileged mode
allows full read-write access to the cpsr. Conversely, a nonprivileged mode only allows read
access to the control field in the cpsr but still allows read-write access to the condition flags.
There are seven processor modes in total: six privileged modes (abort, fast interrupt
request, interrupt request, supervisor, system, and undefined) and one nonprivileged mode
(user).
The processor enters abort mode when there is a failed attempt to access memory. Fast
interrupt request and interrupt request modes correspond to the two interrupt levels available
on the ARM processor. Supervisor mode is the mode that the processor is in after reset and
is generally the mode that an operating system kernel operates in. System mode is a special
version of user mode that allows full read-write access to the cpsr. Undefined mode is used
when the processor encounters an instruction that is undefined or not supported by the
implementation. User mode is used for programs and applications.
Table 2.1 Processor mode.
Mode                     Abbreviation   Privileged   Mode[4:0]
Abort                    abt            yes          10111
Fast interrupt request   fiq            yes          10001
Interrupt request        irq            yes          10010
Supervisor               svc            yes          10011
System                   sys            yes          11111
Undefined                und            yes          11011
User                     usr            no           10000
Note that the cpsr is not copied into the spsr when a mode change is forced due to a
program writing directly to the cpsr. The saving of the cpsr only occurs when an exception
or interrupt is raised.
The current active processor mode occupies the five least significant
bits of the cpsr. When power is applied to the core, it starts in supervisor mode, which is
privileged. Starting in a privileged mode is useful since initialization code can use full access
to the cpsr to set up the stacks for each of the other modes.
Table 2.1 lists the various modes and the associated binary patterns. The last column of
the table gives the bit patterns that represent each of the processor modes in the cpsr.
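The mode encoding can be captured as a small lookup table. The Python sketch below reads the mode from the five least significant bits of a cpsr value and builds a new value with a different mode; the bit patterns are the standard ARM mode encodings, while the names MODES, mode_of, and set_mode are hypothetical helpers of ours.

```python
# Mode[4:0] bit pattern -> (mode name, abbreviation, privileged?)
MODES = {
    0b10111: ("abort", "abt", True),
    0b10001: ("fast interrupt request", "fiq", True),
    0b10010: ("interrupt request", "irq", True),
    0b10011: ("supervisor", "svc", True),
    0b11111: ("system", "sys", True),
    0b11011: ("undefined", "und", True),
    0b10000: ("user", "usr", False),
}


def mode_of(cpsr: int) -> tuple:
    """Look up the processor mode held in the five least significant bits."""
    return MODES[cpsr & 0x1F]


def set_mode(cpsr: int, abbreviation: str) -> int:
    """Return a copy of cpsr with the mode bits replaced; other bits are kept."""
    for bits, (_name, abbrev, _privileged) in MODES.items():
        if abbrev == abbreviation:
            return (cpsr & ~0x1F & 0xFFFFFFFF) | bits
    raise ValueError(f"unknown mode abbreviation: {abbreviation}")


print(mode_of(0xD3))                  # ('supervisor', 'svc', True)
print(hex(set_mode(0xD3, "usr")))     # 0xd0
```

This models only the encoding. On a real core, whether a write to the mode bits takes effect depends on the current mode being privileged.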
2.2.3 STATE AND INSTRUCTION SETS
The state of the core determines which instruction set is being executed. There are three
instruction sets: ARM, Thumb, and Jazelle. The ARM instruction set is only active when
the processor is in ARM state. Similarly the Thumb instruction set is only active when
the processor is in Thumb state. Once in Thumb state the processor is executing purely
Thumb 16-bit instructions. You cannot intermingle sequential ARM, Thumb, and Jazelle
instructions.
The Jazelle J and Thumb T bits in the cpsr reflect the state of the processor. When both
J and T bits are 0, the processor is in ARM state and executes ARM instructions. This is the
case when power is applied to the processor. When the T bit is 1, then the processor is in
Thumb state. To change states the core executes a specialized branch instruction. Table 2.2
compares the ARM and Thumb instruction set features.
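The relationship between the J and T bits and the core state can be written as a small decision function. This Python sketch assumes the J bit is bit 24 and the T bit is bit 5 of the cpsr, as in the ARM architecture; the function name core_state is ours, and the combination with both bits set is not described in this chapter, so the sketch simply reports it as such.

```python
J_BIT = 24  # Jazelle bit, in the flags field
T_BIT = 5   # Thumb bit, in the control field


def core_state(cpsr: int) -> str:
    """Name the instruction set state implied by the J and T bits."""
    j = (cpsr >> J_BIT) & 1
    t = (cpsr >> T_BIT) & 1
    if j == 0 and t == 0:
        return "ARM"        # the state after power is applied
    if j == 0 and t == 1:
        return "Thumb"
    if j == 1 and t == 0:
        return "Jazelle"
    return "not described here"  # J = 1 and T = 1


print(core_state(0x000000D3))              # ARM
print(core_state(0x000000D3 | (1 << 5)))   # Thumb
```

Software does not normally switch state by writing these bits directly; as noted above, the core changes state by executing a specialized branch instruction.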
The ARM designers introduced a third instruction set called Jazelle. Jazelle executes
8-bit instructions and is a hybrid mix of software and hardware designed to speed up the
execution of Java bytecodes.
To execute Java bytecodes, you require the Jazelle technology plus a specially modified
version of the Java virtual machine. It is important to note that the hardware portion of
Jazelle only supports a subset of the Java bytecodes; the rest are emulated in software.
Table 2.2 ARM and Thumb instruction set features.
ARM (cpsr T = 0)    Thumb (cpsr T = 1)
■ Decode identifies the instruction to be executed.
■ Execute processes the instruction and writes the result back to a register.
Figure 2.8 illustrates the pipeline using a simple example. It shows a sequence of three
instructions being fetched, decoded, and executed by the processor. Each instruction takes
a single cycle to complete after the pipeline is filled.
The three instructions are placed into the pipeline sequentially. In the first cycle the
core fetches the ADD instruction from memory. In the second cycle the core fetches the
SUB instruction and decodes the ADD instruction. In the third cycle, both the SUB and
ADD instructions are moved along the pipeline. The ADD instruction is executed, the SUB
instruction is decoded, and the CMP instruction is fetched. This procedure is called filling
the pipeline. The pipeline allows the core to execute an instruction every cycle.
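The cycle-by-cycle behavior described above can be reproduced with a short simulation. The Python sketch below assumes an ideal three-stage pipeline with no stalls or branches, and the names STAGES and pipeline_trace are our own.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_trace(instructions: list) -> list:
    """Return one dict per cycle mapping each stage to the instruction in it.

    A stage holds None while the pipeline is still filling or already draining.
    """
    depth = len(STAGES)
    total_cycles = len(instructions) + depth - 1
    trace = []
    for cycle in range(total_cycles):
        row = {}
        for stage_index, stage in enumerate(STAGES):
            # The instruction in this stage entered the pipeline
            # stage_index cycles ago.
            i = cycle - stage_index
            row[stage] = instructions[i] if 0 <= i < len(instructions) else None
        trace.append(row)
    return trace


for number, row in enumerate(pipeline_trace(["ADD", "SUB", "CMP"]), start=1):
    print(number, row)
```

In the third cycle the trace shows CMP in fetch, SUB in decode, and ADD in execute, matching the description. Under these ideal assumptions, n instructions take n + 2 cycles in total, so once the pipeline is full one instruction completes every cycle.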
As the pipeline length increases, the amount of work done at each stage is reduced,
which allows the processor to attain a higher operating frequency. This in turn increases
the performance. The system latency also increases because it takes more cycles to fill the
pipeline before the core can execute an instruction. The increased pipeline length also means
there can be data dependency between certain stages. You can write code to reduce this
dependency by using instruction scheduling (for more information on instruction scheduling
take a look at Chapter 6).
Figure 2.8 Pipelined instruction sequence.
2.4 EXCEPTIONS, INTERRUPTS, AND THE VECTOR TABLE
When an exception or interrupt occurs, the processor sets the pc to a specific memory
address. The address is within a special address range called the vector table. The entries
in the vector table are instructions that branch to specific routines designed to handle a
particular exception or interrupt.
The memory map address 0x00000000 is reserved for the vector table, a set of 32-bit
words. On some processors the vector table can be optionally located at a higher address
in memory (starting at the offset 0xffff0000). Operating systems such as Linux and
Microsoft's embedded products can take advantage of this feature.
When an exception or interrupt occurs, the processor suspends normal execution and
starts loading instructions from the exception vector table (see Table 2.6). Each vector table
entry contains a form of branch instruction pointing to the start of a specific routine:
■ Reset vector is the location of the first instruction executed by the processor when power
is applied. This instruction branches to the initialization code.
■ Undefined instruction vector is used when the processor cannot decode an instruction.
■ Software interrupt vector is called when you execute a SWI instruction. The SWI
instruction is frequently used as the mechanism to invoke an operating system routine.
■ Prefetch abort vector occurs when the processor attempts to fetch an instruction from an
address without the correct access permissions. The actual abort occurs in the decode
stage.
■ Data abort vector is similar to a prefetch abort but is raised when an instruction attempts
to access data memory without the correct access permissions.
■ Interrupt request vector is used by external hardware to interrupt the normal execution
flow of the processor. It can only be raised if IRQs are not masked in the cpsr.
Table 2.6 The vector table.
Exception/interrupt      Shorthand   Address      High address
Reset                    RESET       0x00000000   0xffff0000
Undefined instruction    UNDEF       0x00000004   0xffff0004
Software interrupt       SWI         0x00000008   0xffff0008
Prefetch abort           PABT        0x0000000c   0xffff000c
Data abort               DABT        0x00000010   0xffff0010
Reserved                 -           0x00000014   0xffff0014
Interrupt request        IRQ         0x00000018   0xffff0018
Fast interrupt request   FIQ         0x0000001c   0xffff001c
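Since every entry is one 32-bit word, each vector address is simply a base address plus a fixed offset. The Python sketch below computes the entries of Table 2.6 for both the normal and the high vector table locations; the names VECTOR_OFFSETS and vector_address are ours.

```python
LOW_BASE = 0x00000000    # normal vector table location
HIGH_BASE = 0xFFFF0000   # optional high location on some processors

# Offset of each vector from the start of the table (one 32-bit word each;
# offset 0x14 is reserved).
VECTOR_OFFSETS = {
    "RESET": 0x00,
    "UNDEF": 0x04,
    "SWI":   0x08,
    "PABT":  0x0C,
    "DABT":  0x10,
    "IRQ":   0x18,
    "FIQ":   0x1C,
}


def vector_address(shorthand: str, high: bool = False) -> int:
    """Return the address the pc is set to for the given exception."""
    base = HIGH_BASE if high else LOW_BASE
    return base + VECTOR_OFFSETS[shorthand]


print(hex(vector_address("IRQ")))             # 0x18
print(hex(vector_address("FIQ", high=True)))  # 0xffff001c
```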
2.5 CORE EXTENSIONS
The hardware extensions covered in this section are standard components placed next to the
ARM core. They improve performance, manage resources, and provide extra functionality
and are designed to provide flexibility in handling particular applications. Each ARM family
has different extensions available.
There are three hardware extensions ARM wraps around the core: cache and tightly
coupled memory, memory management, and the coprocessor interface.
2.5.1 CACHE AND TIGHTLY COUPLED MEMORY
The cache is a block of fast memory placed between main memory and the core. It allows for
more efficient fetches from some memory types. With a cache the processor core can run
for the majority of the time without having to wait for data from slow external memory.
Most ARM-based embedded systems use a single-level cache internal to the processor.
Of course, many small embedded systems do not require the performance gains that a
cache brings.
ARM has two forms of cache. The first is found attached to the Von Neumann-style
cores. It combines both data and instruction into a single unified cache, as shown in
Figure 2.13. For simplicity, we have called the glue logic that connects the memory system
to the AMBA bus logic and control.
Figure 2.13 A simplified Von Neumann architecture with cache.
Figure 2.14 A simplified Harvard architecture with TCMs.
By contrast, the second form, attached to the Harvard-style cores, has separate caches
for data and instruction.
A cache provides an overall increase in performance but at the expense of predictable
execution. But for real-time systems it is paramount that code execution is deterministic:
the time taken for loading and storing instructions or data must be predictable. This is
achieved using a form of memory called tightly coupled memory (TCM). TCM is fast SRAM
located close to the core and guarantees the clock cycles required to fetch instructions or
data, which is critical for real-time algorithms requiring deterministic behavior. TCMs appear as
memory in the address map and can be accessed as fast memory. An example of a processor
with TCMs is shown in Figure 2.14.
By combining both technologies, ARM processors can have both improved performance
and predictable real-time response. Figure 2.15 shows an example core with a combination
of caches and TCMs.
2.5.2 MEMORY MANAGEMENT
Embedded systems often use multiple memory devices. It is usually necessary to have a
method to help organize these devices and protect the system from applications trying to
make inappropriate accesses to hardware. This is achieved with the assistance of memory
management hardware.
ARM cores have three different types of memory management hardware: no extensions
providing no protection, a memory protection unit (MPU) providing limited protection,
and a memory management unit (MMU) providing full protection:
■ Nonprotected memory is fixed and provides very little flexibility. It is normally used for
small, simple embedded systems that require no protection from rogue applications.
Figure 2.15 A simplified Harvard architecture with caches and TCMs.
■ MPUs employ a simple system that uses a limited number of memory regions. These
regions are controlled with a set of special coprocessor registers, and each region is
defined with specific access permissions. This type of memory management is used
for systems that require memory protection but don't have a complex memory map.
The MPU is explained in Chapter 13.
■ MMUs are the most comprehensive memory management hardware available on the
ARM. The MMU uses a set of translation tables to provide fine-grained control over
memory. These tables are stored in main memory and provide a virtual-to-physical
address map as well as access permissions. MMUs are designed for more sophisticated
platform operating systems that support multitasking. The MMU is explained in
Chapter 14.
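The region-based idea behind an MPU can be sketched in a few lines of Python. This is a deliberately simplified model under our own assumptions: the regions do not overlap, permissions are just read and write, and an access outside every region is treated as a fault. The Region class, the check_access function, and the example address ranges are all hypothetical and do not describe any particular ARM MPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A hypothetical memory region with simple access permissions."""
    base: int
    size: int
    readable: bool
    writable: bool


def check_access(regions: list, address: int, write: bool) -> bool:
    """Return True if the access is permitted by the region containing address."""
    for region in regions:
        if region.base <= address < region.base + region.size:
            return region.writable if write else region.readable
    return False  # assumption of this sketch: no matching region means a fault


# Illustrative memory map: read-only code followed by read-write data.
regions = [
    Region(base=0x00000000, size=0x8000, readable=True, writable=False),
    Region(base=0x00008000, size=0x8000, readable=True, writable=True),
]

print(check_access(regions, 0x00000100, write=False))  # True
print(check_access(regions, 0x00000100, write=True))   # False
print(check_access(regions, 0x00008004, write=True))   # True
```

An MMU differs in that it also translates the address through tables held in main memory; this sketch models protection only.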
2.5.3 COPROCESSORS
Coprocessors can be attached to the ARM processor. A coprocessor extends the processing
features of a core by extending the instruction set or by providing configuration
registers. More than one coprocessor can be added to the ARM core via the coprocessor
interface.
The coprocessor can be accessed through a group of dedicated ARM instructions
that provide a load-store type interface. Consider, for example, coprocessor 15: The
ARM processor uses coprocessor 15 registers to control the cache, TCMs, and memory
management.
The coprocessor can also extend the instruction set by providing a specialized group
of new instructions. For example, there are a set of specialized instructions that