ARM Processor

The document discusses the ARM processor architecture. It originated from Acorn Computers' RISC Machine design. ARM designs processor cores and licenses them to partners who manufacture and sell chips containing ARM cores. ARM is a leading provider of 32-bit embedded RISC microprocessors, commonly used in applications like mobile phones, automotive systems, and more. The document describes the evolution of ARM architectures over time, from ARMv1 to newer versions, and provides details on the ARM programming model, instruction set, pipeline design, and more.
ARM Processor

Dr. B. Thiyaneswaran
Associate Professor,
Department of ECE,
Sona College of Technology
ARM Basics
• The ARM processor core originated at Acorn Computers.
• ARM originally stood for Acorn RISC Machine.
• ARM designs the ARM range of RISC processor
cores.
• ARM licenses its core designs to semiconductor
partners, who fabricate chips and sell them to their
customers.
(ARM does not fabricate silicon itself.)
Leading provider of 32-bit embedded RISC
microprocessors:
• 75% of the market.
• High performance
• Low power consumption
• Low system cost
Solutions for:
• Embedded real-time systems for mass storage, automotive,
industrial and networking applications
• Secure applications: smartcards and SIMs
• Open platforms running complex operating systems
• ARMv1
First version of the ARM processor, 26-bit addressing,
no multiply or coprocessor support

• ARMv2
ARM2, first commercial chip, added 32-bit result
multiply instructions and coprocessor support

• ARMv2a
ARM3 chip with on-chip cache, added atomic load and store
(SWP) and cache management

• ARMv3
ARM6, 32-bit addressing, virtual memory support
ARM Processor Core
Current low-end ARM core for applications like digital
mobile phones
TDMI (as in ARM7TDMI) stands for:
T: Thumb, 16-bit instruction set
D: on-chip Debug support, enabling the processor to halt in
response to a debug request
M: enhanced Multiplier, yielding a full 64-bit result with high
performance
I: EmbeddedICE hardware
Von Neumann architecture
3-stage pipeline
ARM Core Diagram
ARM Inheritance
RISC features that were used:
• A load and store architecture.
• Fixed-length 32-bit instructions.
• 3-address instruction formats.
RISC features that were not used:
• Register windows.
• Delayed branches.
• Single-cycle execution of all instructions.
ARM Programmer Model
• Visible registers.
• Invisible registers -> not significant to the programmer.

R0-R12: General-purpose registers, which may be regarded as 'scratch pad' registers. These are the registers into which data and addresses are loaded.

R13: Stack pointer (SP). It points to the stack.

R14: Link register (LR). It is used when there is a procedure call or an interrupt, that is, a branch to another location. The return address is saved in the link register, and the PC takes on the new branch address.

R15: Program counter (PC).
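
The role of R14 and R15 during a procedure call can be sketched in a few lines of Python. This is only an illustrative model, not ARM code: the dictionary register file, the function names and the addresses are assumptions, and the model ignores the pipeline offset that makes a real ARM PC read ahead of the executing instruction.

```python
# Toy model of a branch-with-link and the matching return.
# Assumption: regs["R15"] holds the address of the branch instruction itself,
# and every instruction is 4 bytes long.

def branch_with_link(regs, target):
    """Save the return address in R14 (LR), then load the target into R15 (PC)."""
    regs["R14"] = regs["R15"] + 4   # address of the instruction after the call
    regs["R15"] = target
    return regs

def return_from_call(regs):
    """Return to the caller by copying LR back into PC."""
    regs["R15"] = regs["R14"]
    return regs

regs = {"R13": 0x8000, "R14": 0, "R15": 0x1000}   # hypothetical starting state
branch_with_link(regs, 0x2000)
after_call = dict(regs)      # snapshot: PC = 0x2000, LR = 0x1004
return_from_call(regs)       # PC is back at 0x1004
print(hex(after_call["R14"]), hex(after_call["R15"]), hex(regs["R15"]))
```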
Current Program Status Register (CPSR)
Operating Modes

Modes:
1. User -> Unprivileged mode.
2. FIQ (Fast Interrupt Request) -> Entered on a high-priority interrupt.
3. IRQ (Interrupt Request) -> Entered on a low-priority interrupt.
4. Supervisor -> Entered on reset and when a SWI (software interrupt) instruction is executed.
5. Abort -> Used to handle memory access violations.
6. Undef -> Used to handle undefined instructions.
7. System -> Privileged mode with the same register access as
User mode.
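
The current mode is held in the low five bits of the CPSR. The sketch below decodes those bits in Python. The bit patterns come from the ARM architecture definition rather than from these slides, and the function names are hypothetical.

```python
# Mode field M[4:0] of the CPSR (values from the ARM architecture definition,
# not from the slides).
MODE_BITS = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}

def decode_mode(cpsr):
    """Return the mode name encoded in the low five bits of a CPSR value."""
    return MODE_BITS.get(cpsr & 0x1F, "Reserved")

def is_privileged(cpsr):
    """Every mode except User is privileged."""
    return (cpsr & 0x1F) != 0b10000

print(decode_mode(0x000000D3))   # 0xD3: mode bits are 10011
print(is_privileged(0x00000010))
```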
Saved Program Status Register (SPSR)

• There are 5 SPSRs.
• Each one corresponds to an exception
mode of operation.
• When an exception such as an
interrupt occurs, the current CPSR value
is saved into the SPSR of the
corresponding mode.
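
A minimal Python sketch of what happens to the status registers on exception entry and return. It is a simplification under stated assumptions: the state dictionary and function names are hypothetical, the mode numbers and vector addresses are taken from the ARM architecture definition rather than the slides, and interrupt masking is left out.

```python
# Mode numbers and vector addresses per the ARM architecture definition.
MODE = {"FIQ": 0b10001, "IRQ": 0b10010, "Supervisor": 0b10011,
        "Abort": 0b10111, "Undefined": 0b11011}
VECTOR = {"IRQ": 0x18, "FIQ": 0x1C}   # only the two interrupt vectors are modelled

def enter_exception(state, mode, return_address):
    """Save CPSR into the mode's SPSR, bank the return address, switch mode."""
    state["SPSR"][mode] = state["CPSR"]
    state["LR"][mode] = return_address
    state["CPSR"] = (state["CPSR"] & ~0x1F) | MODE[mode]
    state["PC"] = VECTOR[mode]

def return_from_exception(state, mode):
    """Restore CPSR from the mode's SPSR and resume at the saved address."""
    state["CPSR"] = state["SPSR"][mode]
    state["PC"] = state["LR"][mode]

state = {"CPSR": 0x10, "PC": 0x1000, "SPSR": {}, "LR": {}}   # starts in User mode
enter_exception(state, "IRQ", return_address=0x1004)
in_handler = {"CPSR": state["CPSR"], "PC": state["PC"]}
return_from_exception(state, "IRQ")
```

Because each exception mode has its own SPSR, a handler can always recover the interrupted program's status, whichever exception occurred.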
ARM Memory system
• 32-bit memory.
• 8-bit / 16-bit / 32-bit access flexibility.
• Little-endian and big-endian formats.
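
The difference between the two byte orders, and the effect of byte, halfword and word accesses, can be illustrated with a small Python sketch. The helper names and the sample value are assumptions chosen for illustration.

```python
def store_word(memory, address, value, byteorder):
    """Write a 32-bit value into a bytearray using the given byte order."""
    memory[address:address + 4] = value.to_bytes(4, byteorder)

def load(memory, address, size, byteorder):
    """Read a 1-, 2- or 4-byte value (8-, 16- or 32-bit access)."""
    return int.from_bytes(memory[address:address + size], byteorder)

little_mem = bytearray(8)
big_mem = bytearray(8)
store_word(little_mem, 0, 0x11223344, "little")
store_word(big_mem, 0, 0x11223344, "big")

# Little-endian puts the least significant byte at the lowest address,
# big-endian puts the most significant byte there.
print(hex(little_mem[0]), hex(big_mem[0]))
print(hex(load(little_mem, 0, 2, "little")), hex(load(big_mem, 0, 2, "big")))
```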
Load and Store architecture
• Direct memory-to-memory operations are not
allowed, unlike in CISC.
• For addition, subtraction or any other operation,
the operands and the result must be in registers.
• Data in memory has to be loaded into
registers first (LOAD) -> registers to ALU -> ALU to
registers.
• Register data is then stored back to memory (STORE).
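
The load, operate, store sequence can be sketched with a toy Python model. The memory addresses, register numbers and helper names are assumptions for illustration only.

```python
# Toy load/store machine: the ALU only ever sees registers, never memory.
memory = {0x100: 7, 0x104: 5, 0x108: 0}   # hypothetical word-addressed data
regs = [0] * 16

def ldr(rd, address):
    """LOAD: memory -> register."""
    regs[rd] = memory[address]

def add(rd, rn, rm):
    """ALU operation: register, register -> register (32-bit wrap-around)."""
    regs[rd] = (regs[rn] + regs[rm]) & 0xFFFFFFFF

def str_(rs, address):
    """STORE: register -> memory."""
    memory[address] = regs[rs]

# memory[0x108] = memory[0x100] + memory[0x104] takes four steps:
ldr(0, 0x100)
ldr(1, 0x104)
add(2, 0, 1)
str_(2, 0x108)
print(memory[0x108])
```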
Load and Store architecture
Data Processing instructions:
Operands are taken from registers and results
are also stored in registers (never directly in
memory).
Data Transfer instructions:
Even a memory-to-memory copy goes through registers:
location 1 -> register, register -> location 2.
Control Flow instructions:
Control flow instructions cause execution to jump to a
different address.
3 Stage ARM Organization
• The register bank, which stores the processor
state.
• It has two read ports and one write port,
• plus an additional read port and an additional
write port that give special access to r15, the
program counter.
(The additional write port on r15 allows it to be
updated as the instruction fetch address is incremented,
and the read port allows instruction fetch to resume
after a data address has been issued.)
3 Stage ARM Organization

• The barrel shifter -> which can shift or rotate
one operand by any number of bits.
• The ALU -> which performs the arithmetic and logic functions.
• The address register and incrementer -> which
select and hold all memory addresses.
• The data registers, which hold data passing to
and from memory.
• The instruction decoder and associated
control logic.
3 Stage Pipeline Organization
• 3-stage pipeline: Fetch – Decode - Execute
• Three-cycle latency, one instruction per cycle
throughput

Cycle:        t       t+1     t+2     t+3     t+4
Instruction
i             Fetch   Decode  Execute
i+1                   Fetch   Decode  Execute
i+2                           Fetch   Decode  Execute
FETCH:
Instruction is fetched from memory and placed in the
instruction pipeline.
DECODE:
The instruction is decoded and the data path control
signals prepared for the next cycle. In this stage the
instruction 'owns' the decode logic but not the data path.
EXECUTE:
The instruction 'owns' the data path. The register bank is
read, an operand shifted, the ALU result generated and
written back into a destination register.
Note: at any one time, three different instructions may occupy
these stages, one in each, so the hardware in each stage has to
be capable of independent operation.
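
The overlap of Fetch, Decode and Execute can be reproduced with a short Python sketch of an ideal pipeline. It assumes one cycle per stage and no stalls or multi-cycle instructions, and the function name is hypothetical.

```python
def pipeline_schedule(n_instructions, stages=("Fetch", "Decode", "Execute")):
    """Return one dict per cycle mapping stage name -> instruction index.

    Ideal pipeline: instruction k enters stage s in cycle k + s.
    """
    total_cycles = n_instructions + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for s, name in enumerate(stages):
            instr = cycle - s
            if 0 <= instr < n_instructions:
                row[name] = instr
        schedule.append(row)
    return schedule

for cycle, row in enumerate(pipeline_schedule(3)):
    print(cycle, row)
```

In the third cycle all three stages are busy with different instructions, which is the steady state that gives one instruction per cycle.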
Multiple cycle instruction in 3 stage
5 stage pipeline

Performance of the system depends on:

CPI - average clock cycles per instruction.
Fclk - clock frequency.

To increase Fclk:
• The logic in each pipeline stage has to be simplified.
• The number of pipeline stages has to be increased.

To reduce CPI:
• Instructions that take several cycles in the 3-stage pipeline may be re-implemented to take fewer.
• Pipeline stalls caused by data dependencies may be
reduced.
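
How CPI and Fclk combine can be shown with the standard relation: execution time = instruction count x CPI / Fclk. The relation itself is not spelled out in the slide text, and the instruction count, CPI and clock values below are made-up numbers for illustration.

```python
def execution_time(n_instructions, cpi, f_clk_hz):
    """Program execution time in seconds: N x CPI / Fclk."""
    return n_instructions * cpi / f_clk_hz

# Hypothetical program of one million instructions.
baseline = execution_time(1_000_000, cpi=2.0, f_clk_hz=50e6)
faster_clock = execution_time(1_000_000, cpi=2.0, f_clk_hz=100e6)
lower_cpi = execution_time(1_000_000, cpi=1.5, f_clk_hz=50e6)
print(baseline, faster_clock, lower_cpi)
```

Either raising Fclk or lowering CPI shortens the run time, which is why both are targeted when moving beyond the 3-stage pipeline.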
Bottleneck of the ARM 3 stage
• The 3-stage ARM is a von Neumann (stored-program)
machine with a single memory for instructions and
data, so instruction fetches and data accesses compete
for the same memory. This is critical when memory
bandwidth is limited.

• To improve on this, more than one 32-bit value has to be
read from memory in each cycle.
Need for 5 stage Pipeline
• Packing all of the complex processing of an instruction into a
single stage means the stage either cannot complete in time or
requires a longer clock period.

Remember:
(This is the major issue on the 3-stage pipeline that stops us
increasing fclk.)
Solution:
• Break instruction processing into 5 stages by splitting the
execution stage into 3 stages.
• Increase the clock speed fclk so that each stage
completes within one clock cycle.
5 stage Pipeline

• Fetch.
• Decode.
• Execute.
• Buffer / Data.
• Write Back.
5 stage Pipeline
Fetch:
The instruction is fetched from memory and placed in the instruction
pipeline.
Decode:
The instruction is decoded and register operands read from the register file.
Register bank has 3 Read ports -> ARM instructions can source all their
operands in one cycle.
Execute:
An operand is shifted and the ALU result is generated. If the instruction is
a load or store, the memory address is computed in the ALU.
Buffer/data:
Data memory is accessed if required. Otherwise the ALU result is simply
buffered for one clock cycle to give the same pipeline flow for all instructions.
Write-back:
The results generated by the instruction are written back to the register file,
including any data loaded from memory.
Comparative Clock analysis of 3 & 5 stage Pipeline
[Figure: timing diagrams of the 3 stage and 5 stage pipelines]

Analysis for 3 instructions:

3 Stage (low clock speed):
5 cycles x 500 ms = 2500 ms (2.5 s).

5 Stage (high clock speed):
7 cycles x 250 ms = 1750 ms (1.75 s).
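
The two totals follow from the fact that an ideal pipeline with S stages needs S + N - 1 cycles to finish N instructions. A small Python check of the arithmetic, using the clock periods quoted in the analysis (the function name is an assumption, and stalls are ignored):

```python
def pipeline_time(n_instructions, n_stages, clock_period):
    """Return (cycles, total time) for an ideal pipeline with no stalls."""
    cycles = n_stages + n_instructions - 1
    return cycles, cycles * clock_period

three_stage = pipeline_time(3, n_stages=3, clock_period=500)
five_stage = pipeline_time(3, n_stages=5, clock_period=250)
print(three_stage, five_stage)
```

The 5-stage pipeline needs more cycles, but each cycle is shorter, so the total time is lower.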
ARM 5 Stage (ARM9TDMI)
Operations
STR (Store – Register Data Path activity)
Branching Instruction
ARM Instruction Set

• Data Processing instructions.

• Data Transfer instructions.

• Branching instructions.
Data Processing Instruction
• All operands are 32 bits wide and come from
registers or are specified as literals in the instruction
itself.
• The result, if there is one, is 32 bits wide and is
placed in a register. (There is an exception here: long
multiply instructions produce a 64-bit result).
• Each of the operand registers and the result register
are independently specified in the instruction. That
is, the ARM uses a '3-address' format for these
instructions.
Ex: ADD R0, R1, R2
Arithmetic Instructions
Arithmetic

Bit wise operation


Register movement instructions

Comparison instructions

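Immediate operands

ADD r3, r3, #1 ; r3 := r3 + 1
AND r8, r7, #&ff ; r8 := r7[7:0] (the & prefix marks a hexadecimal value)

Shifted register operands

ADD r3, r2, r1, LSL #3 ; r3 := r2 + (r1 shifted left by 3 bits) = r2 + 8 x r1

LSL – Logical shift left.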
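
The effect of a shifted register operand can be modelled in Python as follows. This is a sketch of the arithmetic only: the function names are hypothetical and the condition flags are not modelled.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, amount):
    """Logical shift left on a 32-bit value; bits shifted out are lost."""
    return (value << amount) & MASK32

def add_shifted(rn, rm, shift):
    """Model of ADD rd, rn, rm, LSL #shift: returns rn + (rm << shift)."""
    return (rn + lsl(rm, shift)) & MASK32

# ADD r3, r2, r1, LSL #3 with r2 = 10 and r1 = 3
print(add_shifted(10, 3, 3))
```

Shifting left by 3 multiplies by 8, so one instruction computes r2 + 8 x r1.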


Arithmetic Instructions
Multiplication

SMULL R0, R1, R2, R3 ; signed: R0 <- lower 32 bits, R1 <- higher 32 bits of R2 x R3
UMULL R0, R1, R2, R3 ; unsigned: R0 <- lower 32 bits, R1 <- higher 32 bits of R2 x R3

The ARM instruction set described here has no division instruction.
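
In the long multiply instructions the operand order is RdLo, RdHi, Rm, Rs, so the first destination register receives the lower 32 bits of the 64-bit product. A Python sketch of the result split (the helper names are assumptions, and the flags are not modelled):

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def umull(rm, rs):
    """Unsigned 32 x 32 -> 64 multiply. Returns (RdLo, RdHi)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32

def smull(rm, rs):
    """Signed 32 x 32 -> 64 multiply. Returns (RdLo, RdHi) as bit patterns."""
    product = (to_signed32(rm) * to_signed32(rs)) & MASK64
    return product & MASK32, product >> 32

print([hex(x) for x in umull(0xFFFFFFFF, 2)])
print([hex(x) for x in smull(0xFFFFFFFF, 2)])   # 0xFFFFFFFF is -1 when signed
```

The same bit patterns give different high words because UMULL treats 0xFFFFFFFF as 4294967295 while SMULL treats it as -1.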


Data Transfer instructions
Register indirect, Single register Load & Store

Base plus offset addressing (pre indexed)

Base plus offset addressing (Post indexed)
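
The addressing forms differ in when the offset is applied and whether the base register is updated. A Python sketch of the three load variants, with the corresponding ARM syntax in comments. The register dictionary, memory contents and function names are assumptions for illustration.

```python
def ldr_offset(regs, mem, rd, rn, offset):
    """LDR rd, [rn, #offset]  : address = base + offset, base unchanged."""
    regs[rd] = mem[regs[rn] + offset]

def ldr_pre_indexed(regs, mem, rd, rn, offset):
    """LDR rd, [rn, #offset]! : base is updated first, then used as the address."""
    regs[rn] += offset
    regs[rd] = mem[regs[rn]]

def ldr_post_indexed(regs, mem, rd, rn, offset):
    """LDR rd, [rn], #offset  : base is used as the address, then updated."""
    regs[rd] = mem[regs[rn]]
    regs[rn] += offset

mem = {0x100: 10, 0x104: 20, 0x108: 30}   # hypothetical word data

a = {"r0": 0, "r1": 0x100}
ldr_offset(a, mem, "r0", "r1", 4)         # loads 20, r1 stays 0x100

b = {"r0": 0, "r1": 0x100}
ldr_pre_indexed(b, mem, "r0", "r1", 4)    # loads 20, r1 becomes 0x104

c = {"r0": 0, "r1": 0x100}
ldr_post_indexed(c, mem, "r0", "r1", 4)   # loads 10, r1 becomes 0x104
```

Post-indexing is convenient for stepping through an array, since each load leaves the base register pointing at the next element.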


Data Transfer instructions
8 bit operation:
Control Transfer instructions
Examples
Thumb Decompressor
Thumb Properties
• The Thumb code requires 70% of the space of
the ARM code.
• The Thumb code uses 40% more instructions
than the ARM code.
• With 32-bit memory, the ARM code is 40%
faster than the Thumb code.
• With 16-bit memory, the Thumb code is 45%
faster than the ARM code.
• Thumb code uses 30% less external memory
power than ARM code.
Thumb Instructions