Unit III Part 1

Uploaded by Rama Krishna
Copyright © All Rights Reserved

CISC vs RISC

RISC design rules:
• Instructions
• Pipelines
• Registers
• Load-store architecture

ARM design philosophy:
• The ARM processor has been specifically designed to be small, to reduce power consumption and extend battery operation
• High code density
• Small area of the die taken up by the embedded processor
• Hardware debug technology incorporated within the processor
Instruction Set for Embedded Systems
The ARM core is not a pure RISC architecture. Its instruction set departs from the pure RISC definition in several ways that suit embedded applications:
• Variable cycle execution for certain instructions
• Inline barrel shifter, leading to more complex instructions
• Thumb 16-bit instruction set
• Conditional execution
• Enhanced instructions
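Conditional execution means an ARM-state instruction carries a condition field that is checked against the N, Z, C and V flags before it runs; if the check fails, the instruction is skipped. The Python sketch below models only that check. The condition mnemonics and their flag tests follow the ARM architecture, while the table and function names are illustrative assumptions.

```python
# Sketch: decide whether an instruction with a given condition suffix executes.
# Names (CONDITIONS, condition_passes) are hypothetical, not an ARM library API.

CONDITIONS = {
    "EQ": lambda n, z, c, v: z,                   # equal
    "NE": lambda n, z, c, v: not z,               # not equal
    "CS": lambda n, z, c, v: c,                   # carry set / unsigned >=
    "CC": lambda n, z, c, v: not c,               # carry clear / unsigned <
    "MI": lambda n, z, c, v: n,                   # negative
    "PL": lambda n, z, c, v: not n,               # positive or zero
    "VS": lambda n, z, c, v: v,                   # overflow
    "VC": lambda n, z, c, v: not v,               # no overflow
    "HI": lambda n, z, c, v: c and not z,         # unsigned >
    "LS": lambda n, z, c, v: (not c) or z,        # unsigned <=
    "GE": lambda n, z, c, v: n == v,              # signed >=
    "LT": lambda n, z, c, v: n != v,              # signed <
    "GT": lambda n, z, c, v: (not z) and n == v,  # signed >
    "LE": lambda n, z, c, v: z or n != v,         # signed <=
    "AL": lambda n, z, c, v: True,                # always
}


def condition_passes(cond: str, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Return True if an instruction with suffix `cond` would execute."""
    return bool(CONDITIONS[cond](n, z, c, v))


# After CMP r0, r1 with r0 == r1 the flags are Z = 1 and C = 1 (no borrow).
flags = dict(n=False, z=True, c=True, v=False)
print(condition_passes("EQ", **flags))  # ADDEQ would execute
print(condition_passes("NE", **flags))  # ADDNE would be skipped
```

Because every instruction can be predicated this way, short if/else sequences can often be written without a branch, which avoids a pipeline flush.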
Figure: An example of an ARM-based embedded device, a microcontroller
Figure: Software abstraction layers executing on hardware
Example ARM-based System (figure). Components shown: ARM core; AMBA system bus (AHB or ASB) with arbiter, decoder, reset, TIC and on-chip RAM; external bus interface to external ROM and RAM (8-bit ROM, 16-bit RAM, 32-bit RAM); bridge to the APB peripheral bus carrying the interrupt controller (nIRQ, nFIQ), timer, remap/pause and peripheral I/O.

ARM core dataflow model
• Data items are placed in the register file, a storage bank made up of 32-bit registers. Since the ARM core is a 32-bit processor, most instructions treat the registers as holding signed or unsigned 32-bit values. The sign extend hardware converts signed 8-bit and 16-bit numbers to 32-bit values as they are read from memory and placed in a register.
• ARM instructions typically have two source registers, Rn and Rm, and a single result or destination register, Rd. Source operands are read from the register file using the internal buses A and B, respectively.
• The ALU (arithmetic logic unit) or MAC (multiply-accumulate unit) takes the register values Rn and Rm from the A and B buses and computes a result. Data processing instructions write the result in Rd directly to the register file. Load and store instructions use the ALU to generate an address to be held in the address register and broadcast on the Address bus.
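The sign extend step can be made concrete with a few lines of Python. This is a minimal sketch, assuming the register is viewed as an unsigned 32-bit pattern; it mirrors what happens when a signed byte (LDRSB) or signed halfword (LDRSH) is loaded. The helper names are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide value to a 32-bit register pattern.

    bits=8 models a signed byte load, bits=16 a signed halfword load.
    """
    value &= (1 << bits) - 1           # keep only the bits read from memory
    if value & (1 << (bits - 1)):      # top bit set: the value is negative
        value |= ~((1 << bits) - 1)    # copy the sign into all upper bits
    return value & 0xFFFFFFFF          # register contents as unsigned 32 bits


def as_signed32(reg: int) -> int:
    """Interpret a 32-bit register pattern as a two's complement integer."""
    return reg - (1 << 32) if reg & 0x80000000 else reg


print(hex(sign_extend(0x80, 8)))       # byte -128 becomes 0xffffff80
print(hex(sign_extend(0x7FFF, 16)))    # positive halfword is unchanged
```

An unsigned load (LDRB, LDRH) would instead zero-extend, which in this model is just `value & ((1 << bits) - 1)`.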
History of ARM
• ARM (Acorn RISC Machine) started as a new, powerful,
CPU design for the replacement of the 8-bit 6502 in Acorn
Computers (Cambridge, UK, 1985)
• First models had only a 26-bit program counter, limiting the memory space to 64 MB (not much by today's standards, but a lot at that time).
• 1990 spin-off: ARM renamed Advanced RISC Machines
• ARM now focuses on Embedded CPU cores
• IP licensing: almost every silicon manufacturer sells at least one microcontroller with an ARM core, and some even compete with their own designs.
• Processing power with low current consumption
• Good MIPS/Watt figure
• Ideal for portable devices
• Compact memories: 16-bit opcodes (Thumb)
• New cores with added features
• Harvard architecture (ARM9, ARM11, Cortex)
• Floating point arithmetic
• Vector computing (VFP, NEON)
• Java language (Jazelle)
Facts
• 32-bit CPU
• 3-operand instructions (typical): ADD Rd,Rn,Operand2
• RISC design…
• Few, simple, instructions
• Load/store architecture (instructions operate on registers, not memory)
• Large register set
• Pipelined execution
• … Although with some CISC touches…
• Multiplication and Load/Store Multiple are complex instructions (many
cycles longer than regular, RISC, instructions)
• … And some very specific details
• No hardware call stack: BL saves the return address in the link register instead
• PC as a regular register
• Conditional execution of all instructions
• Flags altered or not by data processing instructions (selectable)
• Concurrent shifts/rotations (at the same time as other processing)
• …
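Several of these details meet in one instruction such as ADDS r0, r1, r2, LSL #2: three operands, a shift applied to the second operand within the same instruction, and a flag update selected by the S suffix. The Python sketch below models that single instruction. It assumes register values are already in the range 0 to 0xFFFFFFFF and models only LSL; the real barrel shifter also offers LSR, ASR, ROR and RRX. Function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def lsl(value: int, amount: int) -> int:
    """Logical shift left by 0-31 bits, as done by the inline barrel shifter."""
    return (value << amount) & MASK32


def add(rn: int, rm: int, shift: int = 0, set_flags: bool = False, flags=None):
    """Model ADD{S} Rd, Rn, Rm, LSL #shift and return (rd, flags).

    The shift is applied to the second operand inside the same instruction.
    Flags change only when set_flags is True (the S suffix); otherwise the
    incoming flags are returned unchanged.
    """
    if flags is None:
        flags = {"N": False, "Z": False, "C": False, "V": False}
    operand2 = lsl(rm, shift)
    full = rn + operand2                  # unbounded sum, to detect carry out
    rd = full & MASK32
    if set_flags:
        flags = {
            "N": bool(rd & 0x80000000),   # bit 31 of the result
            "Z": rd == 0,
            "C": full > MASK32,           # unsigned carry out of bit 31
            # signed overflow: operands agree in sign but the result does not
            "V": bool(~(rn ^ operand2) & (rn ^ rd) & 0x80000000),
        }
    return rd, flags


# ADD r0, r1, r2, LSL #2 with r1 = 1, r2 = 3: r0 = 1 + (3 << 2) = 13, flags kept
print(add(1, 3, shift=2))
# ADDS r0, r1, r2 with r1 = 0xFFFFFFFF, r2 = 1: result 0, Z and C set
print(add(0xFFFFFFFF, 1, set_flags=True))
```

For arithmetic instructions the C flag comes from the adder, as modelled here; for logical instructions with a shift, ARM takes C from the shifter instead, which this sketch does not cover.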
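Topologies
• Von Neumann (ARM7s and older): instructions and data share one path through the bus interface to memory and I/O over the AHB bus.
• Harvard (ARM9s and newer): separate instruction and data paths feed an I-cache and a D-cache, which reach memory and I/O through the bus interface and the AHB bus.
Memory-mapped I/O:
• No specific instructions for I/O (use Load/Store instructions instead)
• Peripheral registers are located at memory addresses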
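Memory-mapped I/O means the same load and store operations reach either RAM or a peripheral, depending only on the address. The toy Python model below illustrates that idea. The memory map, the `LED_REG` address and the little-endian byte order are all hypothetical assumptions; a real microcontroller defines its own map in its datasheet.

```python
class Bus:
    """Toy memory map: a small RAM plus one memory-mapped peripheral register.

    All addresses are hypothetical. Words are stored little-endian (an
    assumption; ARM cores can be configured for either endianness).
    """

    RAM_BASE = 0x0000_0000
    RAM_SIZE = 0x1000
    LED_REG = 0x4000_0000      # hypothetical peripheral output register

    def __init__(self):
        self.ram = bytearray(self.RAM_SIZE)
        self.led_state = 0     # value latched by the peripheral

    def _ram_offset(self, addr: int):
        offset = addr - self.RAM_BASE
        if 0 <= offset <= self.RAM_SIZE - 4 and offset % 4 == 0:
            return offset
        return None

    def store_word(self, addr: int, value: int) -> None:
        """What STR does: the same operation reaches RAM or a peripheral."""
        value &= 0xFFFFFFFF
        if addr == self.LED_REG:
            self.led_state = value
            return
        offset = self._ram_offset(addr)
        if offset is None:
            raise ValueError(f"no word-aligned device at {addr:#010x}")
        self.ram[offset:offset + 4] = value.to_bytes(4, "little")

    def load_word(self, addr: int) -> int:
        """What LDR does: reads RAM or a peripheral register by address alone."""
        if addr == self.LED_REG:
            return self.led_state
        offset = self._ram_offset(addr)
        if offset is None:
            raise ValueError(f"no word-aligned device at {addr:#010x}")
        return int.from_bytes(self.ram[offset:offset + 4], "little")


bus = Bus()
bus.store_word(0x100, 42)          # ordinary memory write
bus.store_word(Bus.LED_REG, 1)     # same operation, but it drives a peripheral
print(bus.load_word(0x100), bus.load_word(Bus.LED_REG))
```

The design choice being illustrated is that no special I/O opcode exists: the address decoder, not the instruction, decides which device responds.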
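ARM7TDMI Block Diagram (figure). Blocks shown: address register and address incrementer driving the address bus A[31:0]; register bank (including the PC) connected to the ALU bus and PC bus; A and B buses feeding the multiplier, barrel shifter (SHIFT) and A.L.U.; instruction register, Thumb-to-ARM translator and instruction decoder generating the control lines; write and read data registers on the data bus D[31:0].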
ARM Pipelining examples
ARM7TDMI pipeline (3 stages, 1 clock cycle per stage):
• FETCH
• DECODE
• EXECUTE: register read, shift, ALU and register write

ARM9TDMI pipeline (5 stages, 1 clock cycle per stage):
• FETCH
• DECODE: register read
• EXECUTE: shift and ALU
• MEMORY: data memory access
• WRITE: register write

• Fetch: read the opcode from memory into the internal Instruction Register
• Decode: activate the appropriate control lines depending on the opcode
• Execute: do the actual processing


ARM7TDMI Pipelining (I)

Cycle            1        2        3        4        5
Instruction 1    FETCH    DECODE   EXECUTE
Instruction 2             FETCH    DECODE   EXECUTE
Instruction 3                      FETCH    DECODE   EXECUTE

• Simple instructions (like ADD) complete at a rate of one per cycle
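The staggered diagram above can be generated mechanically. This Python sketch assumes an ideal 3-stage pipeline in which every instruction spends exactly one cycle per stage and nothing stalls; the function name is hypothetical.

```python
STAGES = ("FETCH", "DECODE", "EXECUTE")


def pipeline_table(n_instructions: int) -> list:
    """Stage occupied by each instruction in each cycle of an ideal 3-stage pipe.

    Assumes one cycle per stage and no stalls: instruction i (0-based) is
    fetched in cycle i, so it executes in cycle i + 2.
    """
    n_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage
        rows.append(row)
    return rows


for number, row in enumerate(pipeline_table(3), start=1):
    print(number, " ".join(f"{cell:8}" for cell in row))
```

Under these assumptions n instructions need n + 2 cycles: two cycles to fill the pipeline, then one completion per cycle, which is why the throughput approaches one instruction per cycle.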


ARM7TDMI Pipelining (II)
• More complex instructions (stages of each instruction, in time order):

1 ADD: FETCH, DECODE, EXECUTE
2 STR: FETCH, DECODE, Calc. ADDR, Data Xfer.
3 ADD: FETCH, stall, DECODE, EXECUTE
4 ADD: FETCH, stall, DECODE, EXECUTE
5 ADD: FETCH, DECODE, EXECUTE

• STR takes 2 effective clock cycles (+1 cycle compared with ADD)
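The "+1 cycle" for STR can be checked with a simplified timing model. This sketch assumes an in-order pipeline whose only hazard is an instruction occupying the execute stage for more than one cycle (STR: address calculation, then data transfer); it ignores memory wait states and other ARM7TDMI timing details. Names are hypothetical.

```python
def total_cycles(execute_cycles, pipeline_depth: int = 3) -> int:
    """Cycles until the last instruction leaves the execute stage.

    Simplified in-order model: depth - 1 cycles to fill the pipeline, then
    each instruction holds the execute stage for its own number of cycles.
    """
    return (pipeline_depth - 1) + sum(execute_cycles)


execute_cost = {"ADD": 1, "STR": 2}   # execute-stage cycles per instruction
sequence = ["ADD", "STR", "ADD", "ADD", "ADD"]

with_str = total_cycles(execute_cost[op] for op in sequence)
all_add = total_cycles(1 for _ in sequence)
print(with_str, all_add, with_str - all_add)
```

For the five-instruction sequence this gives 8 cycles instead of 7, matching the single extra cycle attributed to STR.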


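Arithmetic and Carry Flag
ALU equivalent for arithmetic instructions: a 32-bit adder with operands A and B, carry in Ci and carry out Co; Co is written to the C flag. For subtraction, operand B is inverted before it enters the adder.
• Ci = 0 for ADD
• Ci = 1 for SUB
• Ci = C_flag for ADC and SBC

Carry flag behavior for subtraction, SBC R, #0 (4-bit examples, R = 1010; #0 inverted is 1111):
• C_flag = 0: 1010 + 1111 + 0 = 1 1001, so Co = 1 and R = 1001 (R - 1)
• C_flag = 1: 1010 + 1111 + 1 = 1 1010, so Co = 1 and R = 1010 (R unchanged)

Carry acts as an inverted borrow
• Same as 6502, PowerPC (Borrow = not Carry)
• In contrast with Z80, Intel x86, m68k, many others (Borrow = Carry)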
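The adder model above can be written out directly: subtraction is A + NOT(B) + Ci, and the carry out is the C flag. This Python sketch uses a configurable width so it can reproduce the 4-bit examples as well as 32-bit behaviour; the function names are hypothetical.

```python
def alu_add(a: int, b: int, carry_in: int, width: int = 32):
    """Adder with carry in/out on `width`-bit operands: (result, carry_out)."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + carry_in
    return total & mask, total >> width


def sub(a: int, b: int, width: int = 32):
    """SUB: A + NOT(B) + 1, so carry_in = 1."""
    mask = (1 << width) - 1
    return alu_add(a, ~b & mask, 1, width)


def sbc(a: int, b: int, c_flag: int, width: int = 32):
    """SBC: A + NOT(B) + C_flag, i.e. A - B - NOT(C_flag)."""
    mask = (1 << width) - 1
    return alu_add(a, ~b & mask, c_flag, width)


# 4-bit examples: SBC R, #0 with R = 1010
print(sbc(0b1010, 0, c_flag=0, width=4))  # R - 1 = 1001, carry out 1 (no borrow)
print(sbc(0b1010, 0, c_flag=1, width=4))  # R     = 1010, carry out 1 (no borrow)
# 3 - 5 needs a borrow, so the carry flag comes out as 0
print(sub(0b0011, 0b0101, width=4))
```

The last line shows the inverted-borrow convention: a subtraction that borrows leaves C = 0, whereas on x86 or Z80 the same operation would set the carry.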
Data Sizes and Instruction Sets
• The ARM is a 32-bit architecture.
• When used in relation to the ARM:
  – Byte means 8 bits
  – Halfword means 16 bits (two bytes)
  – Word means 32 bits (four bytes)
• Most ARM processors implement two instruction sets:
  – 32-bit ARM Instruction Set
  – 16-bit Thumb Instruction Set
Processor Modes
• The ARM has seven operating modes:
  – User: unprivileged mode under which most tasks run
  – FIQ: entered when a high priority (fast) interrupt is raised
  – IRQ: entered when a low priority (normal) interrupt is raised
  – SVC (Supervisor): entered on reset and when a Software Interrupt instruction is executed
  – Abort: used to handle memory access violations
  – Undef: used to handle undefined instructions
  – System: privileged mode using the same registers as User mode

The Registers
• ARM has 37 registers, all of which are 32 bits long:
  – 1 dedicated program counter
  – 1 dedicated current program status register
  – 5 dedicated saved program status registers
  – 30 general purpose registers
• The current processor mode governs which of several register banks is accessible. Each mode can access:
  – a particular set of r0-r12 registers
  – a particular r13 (the stack pointer, sp) and r14 (the link register, lr)
  – the program counter, r15 (pc)
  – the current program status register, cpsr
• Privileged modes (except System) can also access:
  – a particular spsr (saved program status register)
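Register banking can be modelled as a lookup from (mode, register name) to a physical register. The Python sketch below encodes which registers each mode banks, using the conventional `r13_irq`-style suffixes for banked copies, and checks that the total comes to 37. The dictionary layout and function names are illustrative assumptions.

```python
USER_REGS = [f"r{i}" for i in range(16)]   # r0-r15 (r13 = sp, r14 = lr, r15 = pc)

# Registers that each mode replaces with its own private (banked) copy.
BANKED = {
    "usr": [], "sys": [],
    "fiq": ["r8", "r9", "r10", "r11", "r12", "r13", "r14"],
    "irq": ["r13", "r14"],
    "svc": ["r13", "r14"],
    "abt": ["r13", "r14"],
    "und": ["r13", "r14"],
}


def physical_register(mode: str, reg: str) -> str:
    """Name of the physical register that `reg` refers to in `mode`."""
    return f"{reg}_{mode}" if reg in BANKED[mode] else reg


def count_physical_registers() -> int:
    """30 general purpose + pc + cpsr + one spsr per exception mode."""
    general = set(USER_REGS) - {"r15"}       # r0-r14 shared by User/System
    for mode, regs in BANKED.items():
        general |= {physical_register(mode, r) for r in regs}
    exception_modes = [m for m in BANKED if m not in ("usr", "sys")]
    return len(general) + 1 + 1 + len(exception_modes)


print(physical_register("irq", "r13"), physical_register("irq", "r8"))
print(physical_register("fiq", "r8"), count_physical_registers())
```

FIQ banks r8-r12 as well as sp and lr so that a fast interrupt handler can use those registers without saving them first.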
The ARM Register Set
Current visible registers (User mode shown): r0-r12, r13 (sp), r14 (lr), r15 (pc) and cpsr.
Banked-out registers, by mode:
• User, SYS: r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr
• FIQ: r8-r12, r13 (sp), r14 (lr), spsr
• IRQ: r13 (sp), r14 (lr), spsr
• SVC: r13 (sp), r14 (lr), spsr
• Undef: r13 (sp), r14 (lr), spsr
• Abort: r13 (sp), r14 (lr), spsr
Special Registers
• Special function registers:
  – PC (R15): Program Counter. Any instruction with PC as its destination register is a program branch.
  – LR (R14): Link Register. Saves the return address (taken from the PC) when executing the BL instruction (subroutine call) or when jumping to an exception or interrupt routine. It is copied back to the PC on the return from those routines.
  – SP (R13): Stack Pointer. There is no hardware-managed stack in the ARM architecture. Even so, R13 is usually reserved as a pointer for the program-managed stack.
  – CPSR: Current Program Status Register. Holds the visible status register.
  – SPSR: Saved Program Status Register. Holds a copy of the previous status register while executing exception or interrupt routines. It is copied back to the CPSR on the return from the exception or interrupt. No SPSR is available in User or System modes.
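Because BL writes the return address into a single LR, a routine that calls another routine must save LR on the program-managed stack first. The toy Python model below traces that pattern. It is a hypothetical helper, not an emulator: `pc` holds the address of the instruction being executed (not the pipelined PC value), instructions are assumed to be 4 bytes (ARM state), and memory is a plain dict.

```python
class Cpu:
    """Toy model of subroutine calls using LR plus a program-managed stack."""

    def __init__(self, sp: int = 0x1000):
        self.pc = 0
        self.lr = 0
        self.sp = sp                 # r13, by convention the stack pointer
        self.memory = {}

    def bl(self, target: int) -> None:
        """BL: save the return address in LR, then branch."""
        self.lr = self.pc + 4        # address of the instruction after BL
        self.pc = target

    def ret(self) -> None:
        """Return, e.g. MOV pc, lr or BX lr."""
        self.pc = self.lr

    def push_lr(self) -> None:
        """STMFD sp!, {lr}: full descending stack, decrement then store."""
        self.sp -= 4
        self.memory[self.sp] = self.lr

    def pop_lr(self) -> None:
        """LDMFD sp!, {lr}: load, then increment."""
        self.lr = self.memory.pop(self.sp)
        self.sp += 4


cpu = Cpu()
cpu.pc = 0x100
cpu.bl(0x200)        # main calls f: LR = 0x104
cpu.push_lr()        # f is not a leaf, so it saves LR on the stack first
cpu.pc = 0x204
cpu.bl(0x300)        # f calls g: LR is overwritten with 0x208
cpu.ret()            # g returns to 0x208
cpu.pop_lr()         # f restores LR = 0x104
cpu.ret()            # f returns to main
print(hex(cpu.pc), hex(cpu.sp))
```

Leaf routines (those that call nothing) can skip the push and pop entirely, which is one reason a link register is cheaper than a hardware stack for short calls.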
Register Organization
• User and SYS modes: r0-r15 and cpsr
• FIQ mode: User mode r0-r7, r15 and cpsr, plus its own r8-r12, r13 (sp), r14 (lr) and spsr
• IRQ, SVC, Undef and Abort modes: User mode r0-r12, r15 and cpsr, plus their own r13 (sp), r14 (lr) and spsr

Note: System mode uses the User mode register set


Program Status Registers
Bit layout: N (31), Z (30), C (29), V (28), bits 27-8 undefined, I (7), F (6), T (5), mode (4-0). Field masks: f = bits 31-24, s = bits 23-16, x = bits 15-8, c = bits 7-0.
• Condition code flags:
  – N = Negative result from ALU
  – Z = Zero result from ALU
  – C = ALU operation Carried out
  – V = ALU operation oVerflowed
• Interrupt Disable bits:
  – I = 1: Disables the IRQ
  – F = 1: Disables the FIQ
• T Bit (architectures with Thumb mode only):
  – T = 0: Processor in ARM state
  – T = 1: Processor in Thumb state
  – Never change T directly (use BX instead); changing T in the CPSR will lead to unexpected behavior due to pipelining
• Mode bits:
  – 10000 User
  – 10001 FIQ
  – 10010 IRQ
  – 10011 Supervisor
  – 10111 Abort
  – 11011 Undefined
  – 11111 System
• Tip: Don't change undefined bits. This allows for code compatibility with newer ARM processors.
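The tip about undefined bits implies a read-modify-write pattern: read the CPSR, change only the intended bits, and write it back (in assembly, a sequence such as MRS r0, cpsr; BIC r0, r0, #0x80; MSR cpsr_c, r0 to enable IRQs). The Python sketch below shows that pattern on a plain integer; the helper names are hypothetical, and the T bit is deliberately left alone.

```python
I_BIT = 1 << 7
F_BIT = 1 << 6
MODE_MASK = 0x1F

MODES = {
    0b10000: "User", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "Supervisor",
    0b10111: "Abort", 0b11011: "Undefined", 0b11111: "System",
}


def enable_irq(cpsr: int) -> int:
    """Clear I only; every other bit, including undefined ones, is preserved."""
    return cpsr & ~I_BIT & 0xFFFFFFFF


def change_mode(cpsr: int, mode: int) -> int:
    """Replace only the 5 mode bits, keeping flags, I, F, T and undefined bits."""
    if mode not in MODES:
        raise ValueError(f"invalid mode {mode:#07b}")
    return (cpsr & ~MODE_MASK & 0xFFFFFFFF) | mode


cpsr = 0x600000D3            # Z and C set, IRQ and FIQ disabled, Supervisor
print(hex(enable_irq(cpsr)))
print(MODES[change_mode(cpsr, 0b10010) & MODE_MASK])
```

Writing a constant such as 0x00000013 instead would silently clear any bits a newer core defines, which is exactly what the tip warns against.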
Program Counter (R15)
• When the processor is executing in ARM state:
  – All instructions are 32 bits wide
  – All instructions must be word aligned
  – Therefore the PC value is stored in bits [31:2] and bits [1:0] are zero
  – Due to pipelining, the PC points 8 bytes ahead of the current instruction, or 12 bytes ahead if the current instruction includes a register-specified shift
• When the processor is executing in Thumb state:
  – All instructions are 16 bits wide
  – All instructions must be halfword aligned
  – Therefore the PC value is stored in bits [31:1] and bit [0] is zero
Program Status Registers
Bit layout: N (31), Z (30), C (29), V (28), Q (27), J (24), I (7), F (6), T (5), mode (4-0); all other bits undefined. Field masks: f, s, x, c.
• Condition code flags
  – N = Negative result from ALU
  – Z = Zero result from ALU
  – C = ALU operation Carried out
  – V = ALU operation oVerflowed
• Sticky Overflow flag: Q flag
  – Architecture 5TE/J only
  – Indicates if saturation has occurred
• J bit
  – Architecture 5TEJ only
  – J = 1: Processor in Jazelle state
• Interrupt Disable bits
  – I = 1: Disables the IRQ
  – F = 1: Disables the FIQ
• T Bit
  – Architecture xT only
  – T = 0: Processor in ARM state
  – T = 1: Processor in Thumb state
• Mode bits
  – Specify the processor mode
Program Counter (r15)
• When the processor is executing in ARM state:
  – All instructions are 32 bits wide
  – All instructions must be word aligned
  – Therefore the pc value is stored in bits [31:2] with bits [1:0] undefined (as instructions cannot be halfword or byte aligned)
• When the processor is executing in Thumb state:
  – All instructions are 16 bits wide
  – All instructions must be halfword aligned
  – Therefore the pc value is stored in bits [31:1] with bit [0] undefined (as instructions cannot be byte aligned)
• When the processor is executing in Jazelle state:
  – All instructions are 8 bits wide
  – The processor performs a word access to read 4 instructions at once
Tables (not reproduced): Processor mode; ARM and Thumb instruction set features; Jazelle instruction set features; Condition flags; Condition mnemonics.
Example: cpsr = nzCvqjiFt_SVC.
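In this notation each flag letter is uppercase when its bit is set and lowercase when clear, followed by the mode. The Python sketch below produces that string from a raw CPSR value using the 5TEJ bit positions listed earlier (N 31, Z 30, C 29, V 28, Q 27, J 24, I 7, F 6, T 5). Only the `SVC` label is confirmed by the example; the other mode labels are assumed for illustration.

```python
MODE_NAMES = {   # labels other than SVC are assumed, not taken from the source
    0b10000: "USER", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "SVC",
    0b10111: "ABORT", 0b11011: "UNDEF", 0b11111: "SYSTEM",
}

# (letter, bit position) in the order used by the cpsr notation
FLAG_BITS = [("n", 31), ("z", 30), ("c", 29), ("v", 28), ("q", 27),
             ("j", 24), ("i", 7), ("f", 6), ("t", 5)]


def describe_cpsr(cpsr: int) -> str:
    """Uppercase letter = bit set, lowercase = bit clear, then the mode name."""
    letters = "".join(
        ch.upper() if (cpsr >> bit) & 1 else ch for ch, bit in FLAG_BITS
    )
    mode = MODE_NAMES.get(cpsr & 0x1F, "???")
    return f"{letters}_{mode}"


# C set (bit 29), F set (bit 6), mode 10011 = SVC
print(describe_cpsr(0x20000053))
```

Reading the example back: only C and F are set, so the carry flag is set, FIQ interrupts are disabled, IRQs are enabled, and the core is in ARM state in Supervisor mode.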
PIPELINED INSTRUCTION SEQUENCE (figure)

Pipeline Executing Characteristics
• The ARM pipeline has not processed an instruction until it passes completely through the execute stage. For example, an ARM7 pipeline (with three stages) has executed an instruction only when the fourth instruction is fetched.
• In the pipelined sequence, the MSR instruction enables IRQ interrupts by clearing the I bit in the cpsr. The change takes effect only once the MSR instruction completes the execute stage of the pipeline.
• Once the ADD instruction enters the execute stage of the pipeline, IRQ interrupts are enabled.
• In the execute stage, the pc always points to the address of the instruction plus 8 bytes. In other words, the pc always points two instructions ahead of the instruction being executed.
• This is important when the pc is used for calculating a relative offset, and it is an architectural characteristic across all the pipelines. Note that when the processor is in Thumb state, the pc is the instruction address plus 4.
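The pc + 8 rule shows up directly in ARM-state B and BL encodings: the instruction holds a signed 24-bit word offset that is added to the pc value seen during execute. The Python sketch below computes branch targets and the inverse offset an assembler would encode; function names are hypothetical, and Thumb-state branches (pc + 4, different encodings) are not covered.

```python
def branch_target(instr_addr: int, imm24: int) -> int:
    """Destination of an ARM-state B/BL at instr_addr with 24-bit offset field imm24."""
    if imm24 & 0x800000:              # sign-extend the 24-bit word offset
        imm24 -= 1 << 24
    pc = instr_addr + 8               # pc seen in execute: two instructions ahead
    return (pc + (imm24 << 2)) & 0xFFFFFFFF


def branch_offset(instr_addr: int, target: int) -> int:
    """Inverse: the imm24 field an assembler would compute for B target."""
    delta = target - (instr_addr + 8)
    if delta % 4:
        raise ValueError("ARM-state branch targets must be word aligned")
    words = delta >> 2
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("target out of range for a 24-bit offset")
    return words & 0xFFFFFF


# "B ." (branch to itself) at 0x8000 encodes an offset of -2 words
print(hex(branch_offset(0x8000, 0x8000)))
print(hex(branch_target(0x8000, 0xFFFFFE)))
```

The -2 word offset for a branch to itself is exactly the two instructions that the pc runs ahead because of the pipeline.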
THREE CHARACTERISTICS OF
PIPELINING
• First, the execution of a branch instruction or branching by
the direct modification of the pc causes the ARM core to flush
its pipeline.
• Second, ARM10 uses branch prediction, which reduces the
effect of a pipeline flush by predicting possible branches and
loading the new branch address prior to the execution of the
instruction.
• Third, an instruction in the execute stage will complete even
though an interrupt has been raised. Other instructions in the
pipeline will be abandoned, and the processor will start filling
the pipeline from the appropriate entry in the vector table.
Exception Handling
• When an exception occurs, the ARM:
  – Copies CPSR into SPSR_<mode>
  – Sets the appropriate CPSR bits: changes to ARM state, changes to the related mode, disables IRQ, and disables FIQ (only on fast interrupts)
  – Stores the return address in LR_<mode>
  – Sets PC to the vector address
• To return, the exception handler needs to:
  – Restore CPSR from SPSR_<mode>
  – Restore PC from LR_<mode>
  – This can only be done in ARM state.

Vector Table
0x00 Reset
0x04 Undefined Instruction
0x08 Software Interrupt
0x0C Prefetch Abort
0x10 Data Abort
0x14 (Reserved)
0x18 IRQ
0x1C FIQ
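The entry and return steps can be traced on a small state dictionary. This Python sketch is a simplified model under stated assumptions: it keeps a single `lr`/`spsr` slot instead of one per mode, takes the return address as a parameter (real handlers often adjust LR, for example IRQ handlers return with SUBS pc, lr, #4), and masks FIQ on reset as well as on FIQ entry. All names are hypothetical.

```python
I_BIT, F_BIT, T_BIT, MODE_MASK = 1 << 7, 1 << 6, 1 << 5, 0x1F

# exception -> (vector address, mode bits, also disable FIQ?)
VECTORS = {
    "reset":          (0x00, 0b10011, True),   # Supervisor mode
    "undefined":      (0x04, 0b11011, False),  # Undefined mode
    "swi":            (0x08, 0b10011, False),  # Supervisor mode
    "prefetch_abort": (0x0C, 0b10111, False),  # Abort mode
    "data_abort":     (0x10, 0b10111, False),  # Abort mode
    "irq":            (0x18, 0b10010, False),  # IRQ mode
    "fiq":            (0x1C, 0b10001, True),   # FIQ mode
}


def enter_exception(state: dict, exc: str, return_address: int) -> dict:
    """Apply the exception entry steps to a copy of `state`."""
    vector, mode, mask_fiq = VECTORS[exc]
    new = dict(state)
    new["spsr"] = state["cpsr"]                   # copy CPSR into SPSR_<mode>
    cpsr = state["cpsr"] & ~(T_BIT | MODE_MASK)   # ARM state, old mode cleared
    cpsr |= mode | I_BIT                          # related mode, IRQ disabled
    if mask_fiq:
        cpsr |= F_BIT                             # FIQ disabled as well
    new["cpsr"] = cpsr
    new["lr"] = return_address                    # LR_<mode>
    new["pc"] = vector                            # jump to the vector
    return new


def return_from_exception(state: dict) -> dict:
    """Restore CPSR from SPSR_<mode> and PC from LR_<mode>."""
    new = dict(state)
    new["cpsr"] = state["spsr"]
    new["pc"] = state["lr"]
    return new


user = {"cpsr": 0b10000, "pc": 0x8000, "lr": 0, "spsr": 0}   # User mode, IRQ on
in_irq = enter_exception(user, "irq", return_address=0x8004)
print(hex(in_irq["pc"]), hex(in_irq["cpsr"]))
back = return_from_exception(in_irq)
print(hex(back["pc"]), hex(back["cpsr"]))
```

Restoring CPSR and PC together is what lets the interrupted code resume in its original mode, state and interrupt-mask setting.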
EXCEPTIONS, INTERRUPTS &
VECTOR TABLE
• When an exception or interrupt occurs, the processor
sets the PC to a specific memory address. The address is
within a special address range called the vector table.
• The entries in the vector table are instructions that
branch to specific routines designed to handle a
particular exception or interrupt.
• When an exception or interrupt occurs, the processor
suspends normal execution and starts loading
instructions from the exception vector table. Each vector
table entry contains a form of branch instruction
pointing to the start of a specific routine.
THE VECTOR TABLE
• Reset vector is the location of the first instruction executed by the
processor when power is applied. This instruction branches to the
initialization code.
• Undefined instruction vector is used when the processor cannot
decode an instruction.
• Software interrupt vector is called when you execute a SWI
instruction. The SWI instruction is frequently used as the mechanism
to invoke an operating system routine.
• Prefetch abort vector occurs when the processor attempts to fetch
an instruction from an address without the correct access
permissions. The actual abort occurs in the decode stage.
• Data abort vector is similar to a prefetch abort but is raised when an
instruction attempts to access data memory without the correct
access permissions.
• Interrupt request vector is used by external hardware to interrupt
the normal execution flow of the processor. It can only be raised if
IRQs are not masked in the cpsr.
Architecture Revisions
Nomenclature
ARM{x}{y}{z}{T}{D}{M}{I}{E}{J}{F}{S}
x: Family
y: Memory management/protection unit
z: Cache
T: Thumb 16-bit decoder
D: JTAG debug
M: Fast multiplier
I: Embedded ICE
E: Enhanced instructions (assumes TDMI)
J: Jazelle state
F: Vector floating-point unit
S: Synthesizable version
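The naming scheme can be decoded mechanically for simple names such as ARM7TDMI or ARM926EJ-S. This Python sketch is a hypothetical parser that assumes a single-digit family (ARM7, ARM9); names with two-digit families such as ARM1136 are ambiguous under this simple scheme. It also does not expand implied letters (E assumes TDMI).

```python
import re

FEATURES = {
    "T": "Thumb 16-bit decoder",
    "D": "JTAG debug",
    "M": "Fast multiplier",
    "I": "Embedded ICE",
    "E": "Enhanced instructions",
    "J": "Jazelle state",
    "F": "Vector floating-point unit",
    "S": "Synthesizable version",
}

# Assumption: one-digit family, optional MMU/MPU digit, optional cache digit.
NAME_RE = re.compile(r"ARM(\d)(\d)?(\d)?([TDMIEJFS]*)$")


def decode_core_name(name: str) -> dict:
    """Split a core name into family, MMU/MPU digit, cache digit and features."""
    m = NAME_RE.match(name.upper().replace("-", ""))
    if m is None:
        raise ValueError(f"unrecognised core name: {name}")
    family, mmu, cache, letters = m.groups()
    return {
        "family": int(family),
        "memory_management": mmu,   # digit as a string, or None if absent
        "cache": cache,
        "features": [FEATURES[ch] for ch in letters],
    }


print(decode_core_name("ARM7TDMI"))
print(decode_core_name("ARM926EJ-S"))
```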
