5-Stage Pipeline CPU Hardware

Here are the key points about conditional execution in ARM:
- ARM instructions can optionally be made conditional by postfixing a condition code.
- The condition code checks the status of flags like N, Z, C, and V set by previous instructions.
- If the condition is true based on the flag statuses, the instruction executes normally.
- If the condition is false, the instruction does not execute and the pipeline progresses to the next instruction.
- Conditional execution allows greater pipeline performance by avoiding stalls when conditions are false.
- It also improves code density since conditional instructions don't need separate branch instructions.
- Overall, conditional execution enables higher instruction throughput in the ARM pipeline.

Uploaded by

Moksha Patel

5-stage Pipeline CPU hardware

Pipeline CPU hardware

Distribution of control signals in pipeline CPU


Data hazards
Control hazards
 A control hazard occurs whenever the normal sequential flow of the
program changes (caused by a branch/jump, subroutine call, interrupt,
return from interrupt, etc.).
Structural hazards
 [1] A multiply instruction holds the Ex stage for two or more clock cycles.
 [2] Two or more instructions in the pipeline try to read/write the register
file at the same time. Since there is only one read/write port, only one
instruction is allowed to read/write the register file at a time.
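
The cost of structural hazard [1] can be estimated with a small sketch. It assumes an idealized in-order 5-stage pipeline with no other hazards, in which every extra cycle spent in Ex delays all later instructions by one cycle. The function name and the list encoding are illustrative, not taken from the slides.

```python
def total_cycles(extra_ex_cycles, stages=5):
    """Cycles needed to finish a straight-line instruction sequence.

    extra_ex_cycles[i] is the number of additional cycles instruction i
    holds the Ex stage (0 for a single-cycle instruction).
    Assumption: idealized in-order pipeline, no data or control hazards.
    """
    n = len(extra_ex_cycles)
    if n == 0:
        return 0
    # The first instruction needs `stages` cycles, each later one completes
    # one cycle after its predecessor, plus one cycle per stall.
    return stages + (n - 1) + sum(extra_ex_cycles)


# Four single-cycle instructions, then the same sequence with a multiply
# that holds Ex for one additional cycle.
print(total_cycles([0, 0, 0, 0]))
print(total_cycles([0, 1, 0, 0]))
```

Under these assumptions, each additional Ex cycle of the multiply adds exactly one cycle to the total.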
ARM Architecture
 ARM core:
 Pipelined RISC CPU with a reduced number of fixed-size instructions
 Offers high code density, small size and low power
 Applications include cell phones, handheld PDAs and cameras
 But differs from pure RISC (to gain some advantages):
 Variable-cycle execution for certain instructions, to support multiple
load and store
 Inline barrel shifter, leading to a few complex instructions;
preprocessing one operand enhances computational power
 Thumb state (16-bit instruction set) to improve code density
 Conditional execution of instructions for smooth pipeline operation
 DSP instructions to support signal processing
 Performance: speed => MIPS @ clock frequency, DMIPS @ clock frequency;
power => mW @ (voltage, clock frequency, technology)

DMIPS
 Dhrystone is a synthetic benchmark program for system programming. DMIPS
therefore measures not just instructions per second but gives an idea of how
long one processor takes overall to perform a task compared with another,
taking into account the different number and kinds of instructions.
 The industry has adopted the VAX 11/780 as the reference 1 MIPS
machine. The VAX 11/780 achieves 1757 Dhrystones per second.
 The Dhrystone figure of a given computing system is calculated by measuring
the number of Dhrystones executed per second and dividing that by 1757.
So if a computing system is able to execute 140560 Dhrystones per second,
its rating is 140560/1757 = 80 DMIPS.
 To compare two computing systems that run at different clock frequencies,
DMIPS is normalized to clock frequency,
e.g. 60 DMIPS @ 40 MHz = 1.5 DMIPS/MHz.
 Newer benchmark => CoreMark
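
The DMIPS arithmetic above can be written as a short sketch. The function names are illustrative; the constant 1757 is the VAX 11/780 figure quoted in the text.

```python
VAX_DHRYSTONES_PER_SECOND = 1757  # reference 1 MIPS machine (VAX 11/780)


def dmips(dhrystones_per_second):
    """DMIPS rating: Dhrystones per second relative to the VAX 11/780."""
    return dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND


def dmips_per_mhz(dmips_rating, clock_mhz):
    """Normalize a DMIPS rating to the clock frequency in MHz."""
    return dmips_rating / clock_mhz


print(dmips(140560))          # the 80 DMIPS example from the text
print(dmips_per_mhz(60, 40))  # the 1.5 DMIPS/MHz example from the text
```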

 Sign extend -> converts a signed 8/16-bit value to a 32-bit value and places it in a register
 Two source registers (Rn and Rm) and one result register (Rd)
 Barrel shifter => preprocesses Rm before it enters the ALU
 MAC unit => for multiply-and-accumulate operations
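
The sign-extend step can be illustrated with a minimal sketch. The helper name is hypothetical, and the result is shown as the 32-bit pattern a register would hold.

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a 32-bit register pattern."""
    value &= (1 << bits) - 1          # keep only the 8- or 16-bit field
    sign_bit = 1 << (bits - 1)
    # XOR then subtract propagates the sign bit into the upper bits.
    return ((value ^ sign_bit) - sign_bit) & 0xFFFFFFFF


print(hex(sign_extend(0x80, 8)))     # signed byte -128
print(hex(sign_extend(0x7F, 8)))     # signed byte +127
print(hex(sign_extend(0xFFFE, 16)))  # signed halfword -2
```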

On Chip Debug Hardware

ARM Architecture
 The ARM core under study is the ARM7TDMI
 ARM state => instructions are 32 bits wide and addresses are word aligned
 Thumb state => instructions are 16 bits wide and addresses are halfword aligned
ARM Modes:
 Different modes of the ARM processor are defined for specific purposes
 User mode => most application software runs in this mode

ARM Architecture
 Exception modes => Supervisor, IRQ, FIQ, Abort, Undefined
 Non-exception modes => User, System
 Supervisor mode => runs embedded operating system routines
 User mode => runs application programs
 IRQ & FIQ modes => handle hardware interrupts
 Abort mode => handles memory access violations
 Undefined mode => handles undefined instructions
ARM Architecture
CPSR:
 32-bit register with condition flags, control bits, and status and extension fields
 Only privileged modes have full write access to the CPSR
 Every processor mode except User mode can change mode by writing
directly to the mode bits of the CPSR.
 N = 1 if the MSB of the ALU result is 1
 Z = 1 if the ALU result is zero
 C = 1 if the ALU operation produces a carry (for a subtraction, C is reset if the result is negative, i.e. a borrow occurred)
 V = 1 if the ALU operation overflowed (useful for signed numbers only)
 Flags are updated only if the suffix ‘S’ is added to the instruction
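
The flag definitions above can be sketched for 32-bit addition and subtraction. This is a minimal model, not the core's actual ALU; the function names are illustrative, and the carry convention for subtraction (C = 1 when no borrow occurs) follows the rule stated above.

```python
MASK32 = 0xFFFFFFFF


def adds_flags(a, b):
    """Return (result, N, Z, C, V) for a 32-bit addition that sets flags."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    n = result >> 31                      # MSB of the result
    z = int(result == 0)
    c = int(full > MASK32)                # carry out of bit 31
    # Signed overflow: both operands differ in sign from the result.
    v = (((a ^ result) & (b ^ result)) >> 31) & 1
    return result, n, z, c, v


def subs_flags(a, b):
    """Return (result, N, Z, C, V) for a 32-bit subtraction a - b."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                       # C is reset when a borrow occurs
    # Signed overflow: operands differ in sign and result sign differs from a.
    v = (((a ^ b) & (a ^ result)) >> 31) & 1
    return result, n, z, c, v


print(adds_flags(0x7FFFFFFF, 1))  # signed overflow: N and V set
print(subs_flags(3, 5))           # negative result: N set, C reset
```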
ARM Architecture
 When the processor is executing in ARM state:
 All instructions are 32 bits wide
 All instructions must be word aligned
 Therefore the pc value is stored in bits [31:2] with bits [1:0]
undefined (as instruction cannot be halfword or byte aligned).

 When the processor is executing in Thumb state:


 All instructions are 16 bits wide
 All instructions must be halfword aligned
 Therefore the pc value is stored in bits [31:1] with bit [0] undefined
(as instruction cannot be byte aligned).

 When the processor is executing in Jazelle state:
 All instructions are 8 bits wide
 The processor executes Java bytecodes
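
The alignment rules for ARM and Thumb state can be sketched as a mask on the pc. This assumes the undefined low bits are simply ignored when forming the instruction address; the function name and the state strings are illustrative.

```python
def instruction_address(pc, state):
    """Address actually used to fetch an instruction in the given state.

    Assumption: the low bits described as undefined are ignored.
    """
    low_bits = {"ARM": 0x3, "Thumb": 0x1}[state]  # bits [1:0] or bit [0]
    return pc & ~low_bits & 0xFFFFFFFF


print(hex(instruction_address(0x8003, "ARM")))    # word aligned
print(hex(instruction_address(0x8003, "Thumb")))  # halfword aligned
```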

Banked Registers:

ARM Architecture

 Total 37 registers = 30 general purpose + 6 status + 1 PC
 Different sets of registers are used in different modes of operation
 User and System modes use the same set of registers
 Shaded registers (banked registers) are hidden from User/System mode and
available only in exception modes.
 R13 = stack pointer (SP)
 R14 = link register (LR) -> holds the return address of a subroutine when it is
called with the BL instruction.
 Each exception mode has its own SP and LR
BL{<cc>} subroutine_label (LR automatically stores the return address)
 The return can be done in two ways:
 MOV PC, LR or
 BX LR
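
The register count can be checked with a small model of banking. It assumes the standard ARM7TDMI arrangement (FIQ banks r8 to r14; Supervisor, Abort, Undefined and IRQ bank r13 and r14), which comes from the architecture rather than from the text above. The mode abbreviations and the function name are illustrative.

```python
MODES = ["usr", "sys", "svc", "abt", "und", "irq", "fiq"]


def physical_register(mode, n):
    """Name of the physical register seen as r<n> in the given mode."""
    if n == 15:
        return "r15"                       # the PC is shared by all modes
    bank = "usr" if mode == "sys" else mode  # System shares the User set
    if n in (13, 14) and bank != "usr":
        return f"r{n}_{bank}"              # every exception mode: own SP, LR
    if 8 <= n <= 12 and bank == "fiq":
        return f"r{n}_fiq"                 # assumption: FIQ also banks r8-r12
    return f"r{n}"


general = {physical_register(m, n) for m in MODES for n in range(16)}
status = {"cpsr"} | {f"spsr_{m}" for m in MODES if m not in ("usr", "sys")}

print(len(general) - 1, "general purpose + 1 PC +", len(status), "status")
print("total:", len(general) + len(status))
```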

ARM Family and Cores

ARM family | ARM core     | Features                                                                 | ARM ISA version | Thumb version
ARM7       | ARM7TDMI     | 3-stage pipeline, Thumb state                                            | ARMv4T          | v1
ARM7       | ARM 720T     | As ARM7TDMI, cache                                                       | ARMv4T          | v1
ARM7       | ARM 740T     | As ARM7TDMI, cache                                                       | ARMv4T          | v1
ARM9       | ARM 920T     | 5-stage pipeline, Thumb, data and instruction cache, MMU                 | ARMv4T          | v1
ARM9       | ARM 922T     | 5-stage pipeline, Thumb, data and instruction cache, MMU                 | ARMv4T          | v1
ARM9       | ARM946E      | 5-stage pipeline, Thumb, enhanced DSP instructions, caches, MPU          | ARMv5TE         | v1
ARM9       | ARM926EJ     | 5-stage pipeline, Thumb, Jazelle DBX, enhanced DSP instructions, caches, MMU | ARMv5TEJ    | v1
ARM11      | ARM1156T2(F) | 8-stage pipeline, SIMD, Thumb-2, VFP, enhanced DSP instructions          | ARMv6T2         | v2

ARM Cortex Series: Profile A, Profile R, Profile M


ARM Data Processing
 Syntax: <opcode>{<cc>}{S} Rd, Rn, op2
 ‘op2’ normally comes from the barrel shifter. It can be an immediate value, a register (Rm), or a register shifted or rotated by an immediate amount or by another register (Rs).
 Rm and Rs must not be the PC (r15) in the shift/rotate-by-register form of ‘op2’
 Shift and rotate operations affect the N, Z and C flags
 The immediate (#) value for a shift or rotate is a 5-bit unsigned integer

ARM - The Barrel Shifter
LSL: Logical Shift Left
    CF <- Destination <- 0
    Multiplication by a power of 2
ASR: Arithmetic Shift Right
    Destination -> CF
    Division by a power of 2, preserving the sign bit
LSR: Logical Shift Right
    0 -> Destination -> CF
    Division by a power of 2
ROR: Rotate Right
    Destination -> CF
    Bit rotate with wrap around from LSB to MSB
RRX: Rotate Right Extended
    Destination -> CF
    Single-bit rotate with wrap around from CF to MSB
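
The five barrel shifter operations can be sketched on 32-bit values as follows. This is a minimal model restricted to shift amounts from 1 to 31; the function names are illustrative, and each function returns the shifted value together with the carry flag (CF) it would produce.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, n):
    """Logical shift left by n (1..31): zeros enter at the LSB."""
    value &= MASK32
    carry = (value >> (32 - n)) & 1        # last bit shifted out of the MSB
    return (value << n) & MASK32, carry


def lsr(value, n):
    """Logical shift right by n (1..31): zeros enter at the MSB."""
    value &= MASK32
    carry = (value >> (n - 1)) & 1         # last bit shifted out of the LSB
    return value >> n, carry


def asr(value, n):
    """Arithmetic shift right by n (1..31): the sign bit is replicated."""
    value &= MASK32
    carry = (value >> (n - 1)) & 1
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> n) & MASK32, carry


def ror(value, n):
    """Rotate right by n (1..31): bits leaving the LSB re-enter at the MSB."""
    value &= MASK32
    result = ((value >> n) | (value << (32 - n))) & MASK32
    return result, result >> 31            # carry is the new MSB


def rrx(value, carry_in):
    """Rotate right by one bit through the carry flag."""
    value &= MASK32
    return (carry_in << 31) | (value >> 1), value & 1


print(lsl(5, 3))                  # multiply by 8
print(hex(asr(0xFFFFFFF0, 2)[0])) # -16 / 4, sign preserved
print(hex(ror(0x1, 1)[0]))        # LSB wraps to the MSB
```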

ARM Data Processing Instructions

 CMP, CMN, TST & TEQ always update the flags (even if the ‘S’ suffix is not
used) and do not alter any register. They use only Rn and ‘op2’.
 MOV & MVN use only two operands, i.e. Rd and ‘op2’

Data processing:
 ADD R9, R5, R5, LSL #3 ; R9 = R5 + (R5*8) = 9*R5
 RSB R9, R5, R5, LSR #3 ; R9 = (R5/8) - R5
 MOV R12, R4, ROR R3    ; R12 = R4 rotated right by the value in R3
 CMP R7, R5             ; update flags from (R7 - R5)
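
The first two examples can be checked numerically with a short sketch that treats registers as 32-bit unsigned values. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add_lsl(rn, rm, shift):
    """ADD Rd, Rn, Rm, LSL #shift: Rd = Rn + (Rm << shift), 32-bit wrap."""
    return (rn + (rm << shift)) & MASK32


def rsb_lsr(rn, rm, shift):
    """RSB Rd, Rn, Rm, LSR #shift: Rd = (Rm >> shift) - Rn, 32-bit wrap."""
    return (((rm & MASK32) >> shift) - rn) & MASK32


r5 = 16
print(add_lsl(r5, r5, 3))        # 9 * R5
print(hex(rsb_lsr(r5, r5, 3)))   # (R5 / 8) - R5 as a two's complement pattern
```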

Conditional Execution:
 ARM instructions can be made to execute conditionally by postfixing
them with the appropriate condition code (e.g. MOVEQ R0, R1)
 The condition checks the status of the appropriate flags
 If the condition is true, the instruction executes normally; otherwise it does not execute.
 Advantage => greater pipeline performance and higher code density, leading to
higher instruction throughput
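
How a condition code is checked against the flags can be sketched as a lookup. The mapping below is the standard ARM condition code table, which is not spelled out in the text above; the function name is illustrative.

```python
def condition_passed(cc, n, z, c, v):
    """Return True if condition code `cc` holds for flags N, Z, C, V."""
    table = {
        "EQ": z == 1,               # equal
        "NE": z == 0,               # not equal
        "CS": c == 1,               # carry set / unsigned higher or same
        "CC": c == 0,               # carry clear / unsigned lower
        "MI": n == 1,               # negative
        "PL": n == 0,               # positive or zero
        "VS": v == 1,               # overflow
        "VC": v == 0,               # no overflow
        "HI": c == 1 and z == 0,    # unsigned higher
        "LS": c == 0 or z == 1,     # unsigned lower or same
        "GE": n == v,               # signed greater than or equal
        "LT": n != v,               # signed less than
        "GT": z == 0 and n == v,    # signed greater than
        "LE": z == 1 or n != v,     # signed less than or equal
        "AL": True,                 # always
    }
    return table[cc]


# Flags after comparing two equal values: Z = 1, C = 1, N = V = 0.
print(condition_passed("EQ", n=0, z=1, c=1, v=0))  # MOVEQ would execute
print(condition_passed("GT", n=0, z=1, c=1, v=0))  # MOVGT would not
```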

ARM Conditional Execution

ARM Conditional Execution
 Set the flags, and then use various condition codes
    C code (here r0 = a, r1 = x):
        if (a == 0) x = 0;
        if (a > 0)  x = 1;
    ARM code:
        CMP   r0, #0
        MOVEQ r1, #0
        MOVGT r1, #1
 Set of conditional compare instructions
    C code:
        if (a == 4 || a == 10) x = 0;
    ARM code:
        CMP   r0, #4
        CMPNE r0, #10
        MOVEQ r1, #0
 Reduces the number of instructions
    C code (here r1 = a, r2 = b):
        while (a != b) {
            if (a > b) a = a - b; else b = b - a;
        }
    With branches:
        loop:     CMP r1, r2
                  BEQ finish
                  BLT lessthan
                  SUB r1, r1, r2
                  B   loop
        lessthan: SUB r2, r2, r1
                  B   loop
        finish:
    With conditional execution:
        loop1:    CMP   r1, r2
                  SUBGT r1, r1, r2
                  SUBLT r2, r2, r1
                  BNE   loop1
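
The two versions of the loop can be compared with a small simulation that counts the instructions passing through the pipeline and the branches actually taken. It is a sketch assuming positive inputs and that a skipped conditional instruction still occupies one slot; the function names are illustrative.

```python
def gcd_with_branches(a, b):
    """Model the branch-based loop; return (gcd, instructions, taken branches)."""
    instr = taken = 0
    while True:
        instr += 2                 # CMP r1, r2 ; BEQ finish
        if a == b:
            taken += 1             # BEQ taken
            return a, instr, taken
        instr += 1                 # BLT lessthan
        if a < b:
            taken += 1             # BLT taken
            b -= a                 # SUB r2, r2, r1
        else:
            a -= b                 # SUB r1, r1, r2
        instr += 2                 # the SUB and the B loop
        taken += 1                 # B loop is always taken


def gcd_conditional(a, b):
    """Model the conditional-execution loop with the same return values."""
    instr = taken = 0
    while True:
        instr += 4                 # CMP, SUBGT, SUBLT, BNE
        gt, lt = a > b, a < b      # flags come from the CMP only
        if gt:
            a -= b                 # SUBGT r1, r1, r2
        if lt:
            b -= a                 # SUBLT r2, r2, r1
        if gt or lt:
            taken += 1             # BNE loop1 taken
        else:
            return a, instr, taken


print(gcd_with_branches(12, 8))
print(gcd_conditional(12, 8))
```

For this input both versions execute the same number of instructions, but the conditional version takes half as many branches and occupies four instructions of code instead of seven.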
