11 ARM Processor
Prof.O.V.Gnana Swathika
• ARM stands for Advanced RISC Machine; it originated as a product line of Acorn Computers
• The ARM microprocessor architecture incorporated a number of features from the Berkeley RISC design:
a) load-store architecture
b) fixed-length 32-bit instructions
c) 3-address instruction formats
• Simple hardware
• RISC ideas combined with a few CISC features
• Better code density
• Small core size
• Good power efficiency
ARM Programmer's Model
• The processor's instruction set defines the operations that the programmer can use to change the state of the system incorporating the processor
• The state comprises the values of the data items in the processor's visible registers and the system's memory
• Each instruction can be viewed as
performing a defined transformation from
the state before the instruction is executed
to the state after it has completed
• The processor may have many invisible registers involved in executing an instruction. The values of these registers before and after the instruction is executed are not significant; only the values in the visible registers have significance
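The view of an instruction as a transformation of visible state can be sketched in Python. This is a minimal model under stated assumptions: the names `ArchState` and `add` are hypothetical, memory is omitted, and the PC simply advances by 4 (the pipeline offset discussed later is ignored).

```python
from dataclasses import dataclass, replace

MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


@dataclass(frozen=True)
class ArchState:
    """Hypothetical model of the visible state: r0-r15 only (memory omitted)."""
    regs: tuple = (0,) * 16


def add(state, rd, rn, rm):
    """ADD rd, rn, rm: returns the state after, leaving the state before unchanged."""
    regs = list(state.regs)
    regs[rd] = (regs[rn] + regs[rm]) & MASK32  # result wraps to 32 bits
    regs[15] = (regs[15] + 4) & MASK32         # simplified: PC moves to the next word
    return replace(state, regs=tuple(regs))


before = ArchState(regs=(0, 5, 7) + (0,) * 13)
after = add(before, rd=0, rn=1, rm=2)
print(after.regs[0], before.regs[0])  # 12 0
```

Whatever internal latches real hardware uses while forming the sum have no counterpart in `ArchState`, which mirrors the point that invisible registers are not significant.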
• When writing user-level programs, only the 15 general-purpose 32-bit registers (r0-r14), the PC (r15) and the Current Program Status Register (CPSR) need to be considered
• The remaining registers are used only for system-level programming and for handling exceptions
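CPSR Format:
Bits 31-28: N, Z, C, V (condition code flags)
Bits 27-8: unused
Bit 7: I (IRQ disable)
Bit 6: F (FIQ disable)
Bit 5: T (Thumb state)
Bits 4-0: mode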
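A short sketch of reading the CPSR fields out of a 32-bit value. The function name `decode_cpsr` is hypothetical, and the mode encodings in `MODES` are the standard ARM values, taken from general ARM documentation rather than from the layout given here.

```python
# Standard ARM mode-field encodings (bits 4-0)
MODES = {
    0b10000: "usr",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "svc",
    0b10111: "abt",
    0b11011: "und",
    0b11111: "sys",
}


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its flag, control and mode fields."""
    return {
        "N": (cpsr >> 31) & 1,  # negative
        "Z": (cpsr >> 30) & 1,  # zero
        "C": (cpsr >> 29) & 1,  # carry
        "V": (cpsr >> 28) & 1,  # overflow
        "I": (cpsr >> 7) & 1,   # 1 = IRQ disabled
        "F": (cpsr >> 6) & 1,   # 1 = FIQ disabled
        "T": (cpsr >> 5) & 1,   # 1 = Thumb state
        "mode": MODES.get(cpsr & 0x1F, "unknown"),
    }


fields = decode_cpsr(0x600000D3)
print(fields)
```

The example value has Z and C set, both interrupt inputs masked and the mode field set to supervisor.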
• In addition to the processor register state, an ARM system has memory state
• Memory is a linear array of bytes numbered from zero up to 2^32 - 1
• Data items may be 8-bit bytes, 16-bit half-words or 32-bit words
• Words are always aligned on 4-byte boundaries
• Half-words are aligned on even byte boundaries
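The alignment rules reduce to a divisibility check on the address. A minimal sketch; `is_aligned` is a hypothetical helper, not part of any ARM toolchain.

```python
ADDRESS_LIMIT = 2**32  # byte addresses run from 0 to 2**32 - 1


def is_aligned(address, size):
    """True if an item of `size` bytes (1, 2 or 4) may start at `address`."""
    if not 0 <= address < ADDRESS_LIMIT:
        raise ValueError("address outside the 32-bit address space")
    if size not in (1, 2, 4):
        raise ValueError("size must be 1 (byte), 2 (half-word) or 4 (word)")
    return address % size == 0


for address, size in [(0x1000, 4), (0x1002, 4), (0x1002, 2), (0x1003, 1)]:
    print(hex(address), size, is_aligned(address, size))
```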
• The figure shows a small portion of memory
• Each byte location has a unique number
• A byte may occupy any of these locations
• Word-sized data must occupy a group of four byte locations starting at a byte address which is a multiple of four
• A half-word occupies two byte locations starting at an even byte address
• The organization shown follows the ‘little-endian’ memory organization
Big Endian
• Suppose we have a 32-bit quantity, 90AB12CD (hexadecimal), to be written in memory
• In big endian, the most significant byte is stored at the smallest address:
Address  Value
1000     90
1001     AB
1002     12
1003     CD
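Little Endian
• In little endian, the least significant byte is stored at the smallest address, so the same 32-bit quantity 90AB12CD (hexadecimal) is laid out as:
Address  Value
1000     CD
1001     12
1002     AB
1003     90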
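Both byte orders for the example value 90AB12CD can be reproduced with Python's built-in `int.to_bytes`, which takes "big" or "little" as the byte order. The helper `to_memory` is hypothetical; it simply maps each address to the byte stored there.

```python
def to_memory(value, base, byteorder):
    """Lay out a 32-bit value starting at `base`, returning {address: byte}."""
    data = value.to_bytes(4, byteorder)  # byteorder is "big" or "little"
    return {base + i: b for i, b in enumerate(data)}


big = to_memory(0x90AB12CD, 1000, "big")
little = to_memory(0x90AB12CD, 1000, "little")
for addr in range(1000, 1004):
    print(addr, f"{big[addr]:02X}", f"{little[addr]:02X}")
```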
Load-Store Architecture:
• The instruction set will only process (add, subtract and so on) values which are in registers (or specified directly within the instruction itself), and will always place the results of such processing into a register
• The only operations which apply to memory state are those which copy memory values into registers (LOAD instructions) or copy register values into memory (STORE instructions)
• ARM does not support memory-to-memory operations
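To see what this means for code, here is a toy sketch (not real ARM code) of adding two memory words and storing the result. The helpers `ldr`, `str_` and `add`, and the addresses, are hypothetical; the point is that the addition itself only touches registers, so four steps are needed.

```python
MASK32 = 0xFFFFFFFF

memory = {0x100: 20, 0x104: 22, 0x108: 0}  # hypothetical word addresses and contents
regs = [0] * 16


def ldr(rd, address):
    """LOAD: copy a memory word into a register."""
    regs[rd] = memory[address]


def str_(rd, address):
    """STORE: copy a register into memory ('str' is a Python built-in name)."""
    memory[address] = regs[rd]


def add(rd, rn, rm):
    """Data processing: reads and writes registers only."""
    regs[rd] = (regs[rn] + regs[rm]) & MASK32


# memory[0x108] = memory[0x100] + memory[0x104] takes four steps
ldr(0, 0x100)
ldr(1, 0x104)
add(2, 0, 1)
str_(2, 0x108)
print(memory[0x108])  # 42
```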
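• ARM instructions fall into three categories:
a) Data processing instructions: These use and change only register values.
Example: Add the contents of two registers and place the result in a register
b) Data transfer instructions: These copy memory values into registers (LOAD instructions) or copy register values into memory (STORE instructions)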
c) Control flow instructions:
Normal instruction execution uses instructions stored at consecutive memory addresses. Control flow instructions change this sequence in one of three ways:
i) Switch to a different address permanently (BRANCH)
ii) Save a return address so that the original sequence can be resumed (BRANCH and LINK)
iii) Trap into system code (SUPERVISOR CALLS)
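The first two kinds of control flow change differ in what happens to the PC and the link register. A simplified sketch, assuming 4-byte instructions and ignoring the pipeline offset discussed later; `next_pc` is a hypothetical helper, returning by copying the link register back into the PC is shown as one common pattern, and supervisor calls are left out.

```python
def next_pc(pc, kind, target=None, lr=None):
    """Return (new_pc, new_lr) after one instruction at address pc.

    kind: "seq" (ordinary instruction), "b" (branch), "bl" (branch and link),
    "ret" (return by copying the link register into the PC).
    """
    if kind == "seq":
        return pc + 4, lr        # next consecutive instruction
    if kind == "b":
        return target, lr        # permanent switch to a new address
    if kind == "bl":
        return target, pc + 4   # remember where to resume
    if kind == "ret":
        return lr, lr            # resume the original sequence
    raise ValueError(f"unknown kind {kind}")


pc, lr = next_pc(0x8000, "bl", target=0x9000)  # call: pc=0x9000, lr=0x8004
pc, lr = next_pc(pc, "seq", lr=lr)              # body: pc=0x9004
pc, lr = next_pc(pc, "ret", lr=lr)              # return: pc=0x8004
print(hex(pc))
```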
Supervisor Mode:
• ARM supports a protected supervisor mode
• This mechanism ensures that user code
cannot gain supervisor privileges
without appropriate checks being carried
out to ensure that the code is not
attempting illegal operations
• System-level functions (accessing hardware peripheral registers, character I/O, etc.) can be accessed using specified supervisor calls
ARM Instruction Set
• All instructions are 32 bits wide
• The most notable features of the instruction set are:
a) load-store architecture
b) 3-address data processing
instructions (i.e. two source operand
registers and result register are all
independently specified)
c) Conditional execution of every
instruction
d) Load and store multiple register
instructions
e) Ability to perform a general shift operation and a general ALU operation in a single instruction that executes in a single clock cycle
f) Open instruction set extension through the coprocessor instruction set, including adding new registers and data types to the programmer's model
g) A very dense 16-bit compressed representation of the instruction set in the Thumb architecture
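Conditional execution means every instruction carries a condition that is checked against the N, Z, C and V flags in the CPSR; if the check fails, the instruction has no effect. A sketch of that check using the standard ARM condition mnemonics (taken from the ARM architecture, not from this text); `condition_passed` is a hypothetical name.

```python
def condition_passed(cond, n, z, c, v):
    """Evaluate an ARM condition mnemonic against the N, Z, C, V flags (0 or 1)."""
    table = {
        "EQ": z == 1,            "NE": z == 0,             # equal / not equal
        "CS": c == 1,            "CC": c == 0,             # carry set / clear
        "MI": n == 1,            "PL": n == 0,             # negative / positive or zero
        "VS": v == 1,            "VC": v == 0,             # overflow set / clear
        "HI": c == 1 and z == 0, "LS": c == 0 or z == 1,   # unsigned higher / lower or same
        "GE": n == v,            "LT": n != v,             # signed >= / <
        "GT": z == 0 and n == v, "LE": z == 1 or n != v,   # signed > / <=
        "AL": True,                                        # always
    }
    return table[cond]


# After a compare that found two registers equal, Z = 1
print(condition_passed("EQ", n=0, z=1, c=1, v=0), condition_passed("NE", n=0, z=1, c=1, v=0))
```

For example, after such a compare an `ADDEQ` would execute and an `ADDNE` would be skipped, with no branch needed.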
I/O system:
• ARM handles I/O peripherals as memory-mapped devices with interrupt support
• Internal registers in these devices appear as addressable locations within the ARM memory map. They may be read and written using the same (LOAD-STORE) instructions as any other memory location
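A sketch of what "memory-mapped" means in practice: the same store operation either writes RAM or drives a peripheral, depending only on the address. The class `Bus` and the address `UART_DATA` are hypothetical; real peripheral addresses are system-specific.

```python
class Bus:
    """Routes loads and stores either to RAM or to a peripheral register."""

    UART_DATA = 0x1000_0000  # hypothetical peripheral data register address

    def __init__(self):
        self.ram = {}
        self.uart_output = []  # characters "sent" by the peripheral

    def store(self, address, value):
        if address == self.UART_DATA:
            self.uart_output.append(chr(value & 0xFF))  # device side effect, not storage
        else:
            self.ram[address] = value

    def load(self, address):
        return self.ram.get(address, 0)


bus = Bus()
bus.store(0x2000, 65)                        # ordinary memory write
bus.store(Bus.UART_DATA, bus.load(0x2000))   # same store operation, aimed at the device
print(bus.uart_output)
```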
• Peripherals may get the processor's attention by making an interrupt request using either the normal interrupt (IRQ) input or the fast interrupt (FIQ) input
• Both interrupt inputs are level-sensitive and maskable
• Some systems may include DMA hardware external to the processor
ARM Exceptions:
• Exceptions include interrupts, traps and supervisor calls
• Steps to handle an exception:
i) Copy [PC] into r14_exc and [CPSR] into SPSR_exc, where exc stands for the exception type
ii) Change the processor operating mode to the appropriate exception mode
iii) Force the PC to a value between 00h and 1Ch, depending on the type of exception
• The instruction at the location the PC is forced to (the vector address) will contain a branch to the exception handler
• The exception handler will use r13_exc, which is normally set up to point to a dedicated stack in memory, to save some user registers for use as work registers
• The return to the user program is done by restoring the user registers and then using an instruction that restores the PC and CPSR atomically, in a single operation
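The entry and return steps can be sketched as follows. The vector addresses and modes in `VECTORS` follow the standard ARM exception vector table, which is more detail than the text gives; `Cpu`, `enter_exception` and `return_from_exception` are hypothetical names. As in the text, the PC is copied as-is: on real hardware the value saved in r14_exc includes an offset that depends on the exception type, and the interrupt-disable bits are also updated; neither is modelled here.

```python
from dataclasses import dataclass, field

# Exception type -> (vector address, exception mode); 0x14 is left unused here
VECTORS = {
    "reset": (0x00, "svc"),
    "undefined": (0x04, "und"),
    "swi": (0x08, "svc"),  # supervisor call
    "prefetch_abort": (0x0C, "abt"),
    "data_abort": (0x10, "abt"),
    "irq": (0x18, "irq"),
    "fiq": (0x1C, "fiq"),
}


@dataclass
class Cpu:
    pc: int = 0
    cpsr: dict = field(default_factory=lambda: {"mode": "usr"})
    banked: dict = field(default_factory=dict)  # holds r14_<mode> and spsr_<mode>


def enter_exception(cpu, kind):
    vector, mode = VECTORS[kind]
    cpu.banked["r14_" + mode] = cpu.pc           # i) PC -> r14_exc
    cpu.banked["spsr_" + mode] = dict(cpu.cpsr)  #    CPSR -> SPSR_exc
    cpu.cpsr["mode"] = mode                      # ii) switch to the exception mode
    cpu.pc = vector                              # iii) PC forced to the vector address


def return_from_exception(cpu, mode):
    # Restore PC and CPSR from the saved copies in one step
    cpu.pc = cpu.banked["r14_" + mode]
    cpu.cpsr = dict(cpu.banked["spsr_" + mode])


cpu = Cpu(pc=0x8040)
enter_exception(cpu, "irq")
print(hex(cpu.pc), cpu.cpsr["mode"])  # 0x18 irq
return_from_exception(cpu, "irq")
print(hex(cpu.pc), cpu.cpsr["mode"])  # 0x8040 usr
```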
3-stage pipeline ARM organization
• The register bank, which stores the processor state. It has two read ports and one write port, each of which can be used to access any register, plus an additional read port and an additional write port which give special access to r15, the PC
• The barrel shifter, which can shift or rotate one operand by any number of bits
• The ALU, which performs the arithmetic and logic functions required by the instruction set
• The address register and incrementer, which select and hold all memory addresses and generate sequential addresses when required
• The data registers, which hold data passing to and from memory
• The instruction decoder and associated control logic
• In a single-cycle data processing instruction, two register operands are accessed, the value on the B bus is shifted and combined with the value on the A bus in the ALU, and the result is written back into the register bank
• The PC value is in the address register, from where it is fed into the incrementer; the incremented value is then copied back into r15 in the register bank and also into the address register, to be used as the address for the next instruction fetch
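A sketch of one execute cycle of a data processing instruction as described above: two registers are read onto the A and B buses, B passes through the barrel shifter, the ALU combines them and the result is written back, while the incrementer advances the PC. LSL, LSR, ASR and ROR are the ARM shift types; the function names, the small set of ALU operations and the restriction to shift amounts of 0-31 (with 0 meaning "no shift") are assumptions of this sketch, and r15 as a destination is not handled.

```python
MASK32 = 0xFFFFFFFF


def barrel_shift(value, kind, amount):
    """Shift or rotate a 32-bit value by 0-31 bits (0 means no shift here)."""
    if not 0 <= amount <= 31:
        raise ValueError("amount must be 0-31 in this sketch")
    value &= MASK32
    if kind == "LSL":  # logical shift left
        return (value << amount) & MASK32
    if kind == "LSR":  # logical shift right
        return value >> amount
    if kind == "ASR":  # arithmetic shift right keeps the sign bit
        if value & 0x8000_0000:
            value -= 1 << 32
        return (value >> amount) & MASK32
    if kind == "ROR":  # rotate right
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError(f"unknown shift {kind}")


ALU = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "AND": lambda a, b: a & b,
    "ORR": lambda a, b: a | b,
    "EOR": lambda a, b: a ^ b,
}


def data_processing(regs, op, rd, rn, rm, shift="LSL", amount=0):
    """e.g. ADD rd, rn, rm, LSL #amount in one execute cycle (rd must not be r15)."""
    a_bus = regs[rn]                               # first register read port
    b_bus = barrel_shift(regs[rm], shift, amount)  # second read port, via the shifter
    regs[rd] = ALU[op](a_bus, b_bus) & MASK32      # ALU result written back
    regs[15] = (regs[15] + 4) & MASK32             # incrementer: next fetch address


regs = [0] * 16
regs[1], regs[2] = 10, 3
data_processing(regs, "ADD", rd=0, rn=1, rm=2, shift="LSL", amount=2)
print(regs[0])  # 22
```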
3-stage pipeline:
• Up to the ARM7, ARM processors employ a simple 3-stage pipeline with the following stages:
a) Fetch: The instruction is fetched from memory and placed in the instruction pipeline
b) Decode: The instruction is decoded and the datapath control signals are prepared for the next cycle. In this stage the instruction ‘owns’ the decode logic but not the datapath
c) Execute: The instruction ‘owns’ the datapath; the register bank is read, an operand is shifted, and the ALU result is generated and written back into a destination register
• When the processor is executing simple data processing instructions, the pipeline enables one instruction to be completed every clock cycle; thus the throughput is one instruction per cycle
• An individual instruction takes three clock cycles to complete, so it has a three-cycle latency
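The throughput and latency figures can be checked with a small schedule for a run of single-cycle instructions. A sketch assuming an ideal pipeline with no stalls; `schedule` is a hypothetical helper.

```python
STAGES = ("fetch", "decode", "execute")


def schedule(n):
    """Cycle (from 1) in which each of n single-cycle instructions occupies each stage."""
    return [{stage: i + s + 1 for s, stage in enumerate(STAGES)} for i in range(n)]


rows = schedule(4)
for i, row in enumerate(rows):
    print(f"instr {i + 1}: " + ", ".join(f"{k}@{v}" for k, v in row.items()))

total_cycles = rows[-1]["execute"]                     # pipeline fill + one per instruction
latency = rows[0]["execute"] - rows[0]["fetch"] + 1    # cycles for a single instruction
print(total_cycles, latency)
```

Once the pipeline is full, one instruction completes in each cycle (cycles 3, 4, 5, 6), while each individual instruction still spans three cycles.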
ARM Multi-cycle instruction 3-stage pipeline operation:
• When a multi-cycle instruction is executed, the flow is less regular
• The figure shows a sequence of single-cycle ADD instructions with a data store instruction, STR, occurring after the first ADD
• Cycles that access main memory are shown with light shading; memory is used in every cycle
• Similarly, the datapath is used in every cycle (for execute cycles, address calculation and data transfer)
• The decode logic is always generating control signals for the datapath to use in the next cycle
• Thus in this instruction sequence, all parts of the processor are active in every cycle and the memory is the limiting factor, defining the number of cycles the sequence must take
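The effect of the two-cycle STR on the instructions behind it can be sketched by tracking datapath and decoder occupancy only: each instruction uses the datapath for one or more adjacent cycles and the decoder in the cycle before each of those. This is a simplified model: it assumes the first instruction executes in cycle 3, that fetches always keep up, and it does not model memory contention. `datapath_schedule` is a hypothetical name.

```python
def datapath_schedule(program):
    """program: list of (name, datapath_cycles). Returns (name, decode, execute) cycles."""
    rows = []
    next_free = 3  # first execute cycle once the pipeline has filled
    for name, cycles in program:
        execute = list(range(next_free, next_free + cycles))
        decode = [c - 1 for c in execute]  # decoder works one cycle ahead of the datapath
        rows.append((name, decode, execute))
        next_free += cycles
    return rows


program = [("ADD", 1), ("STR", 2), ("ADD", 1), ("ADD", 1)]
for name, decode, execute in datapath_schedule(program):
    print(f"{name}: decode {decode}, execute {execute}")
```

The STR occupies the datapath for cycles 4 and 5, so the two ADDs behind it finish one cycle later than they would in an all-single-cycle sequence.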
Highlights:
• All instructions occupy the datapath for one or more adjacent cycles
• For each cycle that an instruction occupies the datapath, it occupies the decode logic in the immediately preceding cycle
• Branch instructions flush and refill the instruction pipeline
• Due to the pipeline behavior, the PC points eight bytes (two instructions) ahead of the current instruction
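Why the PC reads eight bytes ahead can be seen by listing which instruction address sits in each pipeline stage. A sketch assuming straight-line code of 4-byte instructions; `pipeline_view` is a hypothetical helper.

```python
INSTRUCTION_BYTES = 4


def pipeline_view(execute_address):
    """Address of the instruction in each stage while one instruction executes."""
    return {
        "execute": execute_address,
        "decode": execute_address + INSTRUCTION_BYTES,
        "fetch": execute_address + 2 * INSTRUCTION_BYTES,  # the PC holds this address
    }


view = pipeline_view(0x8000)
pc_read = view["fetch"]  # value an instruction at 0x8000 sees in r15
print(hex(pc_read), hex(pc_read - view["execute"]))
```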
Data path activity: DATA PROCESSING INSTRUCTION
• The instruction takes a single clock cycle
• As per the figure (slide 42), note how the PC value is incremented and copied both into the address register and into r15 in the register bank, and the next instruction but one is loaded into the bottom of the instruction pipeline (i.pipe)
• As per the figure (slide 43), the immediate value is extracted from the current instruction at the top of the instruction pipeline
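How an immediate value can be recovered from the instruction word is sketched below, assuming the standard ARM data-processing encoding in which the operand field holds an 8-bit value and a 4-bit rotate count; that encoding detail comes from the ARM architecture rather than from this slide, and `decode_immediate` is a hypothetical name.

```python
def decode_immediate(imm8, rotate):
    """Expand an ARM data-processing immediate: imm8 rotated right by 2 * rotate."""
    if not (0 <= imm8 <= 0xFF and 0 <= rotate <= 0xF):
        raise ValueError("imm8 is 8 bits and rotate is 4 bits")
    amount = 2 * rotate
    return ((imm8 >> amount) | (imm8 << (32 - amount))) & 0xFFFFFFFF


for imm8, rotate in [(0xFF, 0), (0xFF, 4), (0x01, 15)]:
    print(hex(imm8), rotate, "->", hex(decode_immediate(imm8, rotate)))
```

The rotation lets a 12-bit field describe many useful 32-bit constants, such as 0xFF000000, without needing a full 32-bit immediate.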
Data path activity: STR (Store Register)
• A register is used as a base address, to which an offset is added; the offset may be another register or an immediate value
• This address is sent to the address register
• The data transfer occurs in the second cycle
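The address calculation and the two-cycle split of a store can be sketched as follows. A simplified model with an immediate offset; `str_two_cycles` and the register and address values are hypothetical, and the PC handling around the store is not modelled.

```python
MASK32 = 0xFFFFFFFF


def str_two_cycles(regs, memory, rd, rn, offset):
    """STR rd, [rn, #offset] split into its two datapath cycles.

    A register offset would read another register here instead of using `offset`.
    """
    # Cycle 1: base + offset is formed in the ALU and sent to the address register
    address_register = (regs[rn] + offset) & MASK32
    # Cycle 2: the data is driven out and written at that address
    memory[address_register] = regs[rd]
    return address_register


regs = [0] * 16
regs[1], regs[2] = 0x2000, 0xCAFE
memory = {}
addr = str_two_cycles(regs, memory, rd=2, rn=1, offset=8)
print(hex(addr), hex(memory[addr]))
```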
• The figure below shows the datapath operation for the two cycles of a data store instruction (STR) with an immediate offset
• The incremented PC value is stored in the register bank at the end of the first cycle, so that the address register is free to accept the data transfer address for the second cycle
• At the end of the second cycle, the PC is fed back to the address register to allow instruction prefetching to continue
• The value sent to the address register in a cycle is the value used for the memory access in the following cycle. Thus the address register is a pipeline register between the processor datapath and the external memory
• For a STORE of a byte data type, the ‘data out’ block extracts the bottom byte from the register and replicates it four times across the 32-bit data bus. External memory logic can use the bottom two bits of the address bus to activate the appropriate byte within the memory system
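A sketch of the byte-store path: the ‘data out’ block copies the bottom byte onto all four byte lanes, and the memory uses address bits 1:0 to choose which lane to write. The function names are hypothetical, and the lane numbering assumes the little-endian organization used earlier in this section.

```python
def data_out_byte(reg_value):
    """'data out' block for a byte store: replicate the bottom byte on all four lanes."""
    b = reg_value & 0xFF
    return b * 0x01010101


def memory_byte_write(word, address, bus_value):
    """External memory: update only the byte lane selected by address bits 1:0."""
    lane = address & 0b11
    mask = 0xFF << (8 * lane)  # little-endian: lane 0 is the least significant byte
    return (word & ~mask & 0xFFFFFFFF) | (bus_value & mask)


bus = data_out_byte(0x12345678)
new_word = memory_byte_write(0xAABBCCDD, 0x1002, bus)
print(hex(bus), hex(new_word))
```

Because the byte is present on every lane, the memory does not need to know which register byte was meant; it only selects a lane from the address.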
• In the case of LOAD instructions, data from memory only gets as far as the ‘data in’ register on the second cycle, and a third cycle is needed to transfer the data from there to the destination register
Data path activity: BRANCH INSTRUCTION
• A 24-bit immediate field is extracted from the instruction and then shifted left two bit positions to give a word-aligned offset, which is added to the PC
• The result is issued as an instruction fetch address, and while the instruction pipeline refills, the return address is copied into the link register (r14) if the instruction is a ‘branch with link’
• A second cycle is required to complete the pipeline refilling
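The branch target calculation can be sketched as below. The 24-bit field is treated as a signed (two's complement) word offset, and the PC value it is added to is the branch's own address plus 8 because of the 3-stage pipeline; both details come from the ARM architecture rather than from this slide. `branch_target` and `return_address` are hypothetical names, and the value BL saves is taken as the address of the instruction after the branch.

```python
MASK32 = 0xFFFFFFFF


def branch_target(instr_address, imm24):
    """Target of B/BL at instr_address with a 24-bit offset field."""
    if imm24 & 0x80_0000:   # sign-extend the 24-bit field
        imm24 -= 1 << 24
    offset = imm24 << 2     # shift left two bits: word-aligned byte offset
    pc = instr_address + 8  # PC is two instructions ahead when read
    return (pc + offset) & MASK32


def return_address(instr_address):
    """Address BL copies into r14: the instruction following the branch."""
    return instr_address + 4


print(hex(branch_target(0x8000, 0x000001)))  # forward branch
print(hex(branch_target(0x8000, 0xFFFFFE)))  # backward branch
```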