
Pipelined Processor Design

CSE 333
Computer Architecture and Assembly Language

[Adapted from slides of Dr. M. Mudawar, ICS 233, KFUPM]


Presentation Outline
 Pipelining versus Serial Execution

 Pipeline Hazards, Structural Hazards

 Data Hazards and Forwarding

 Control Hazards

 Summary
Pipelining Example
 Laundry Example: Three Stages

1. Wash dirty load of clothes

2. Dry wet clothes

3. Fold and put clothes into drawers

 Each stage takes 30 minutes to complete


 Four loads of clothes (A, B, C, D) to wash, dry, and fold
Sequential Laundry
[Timeline: 6 PM to 12 AM in twelve 30-minute slots; each load occupies three consecutive slots, one load at a time]

 Sequential laundry takes 6 hours for 4 loads


 Intuitively, we can use pipelining to speed up laundry
Pipelined Laundry: Start Load ASAP
[Timeline: 6 PM to 9 PM; loads A, B, C, D overlap, with a new load starting every 30 minutes]

 Pipelined laundry takes 3 hours for 4 loads

 Speedup factor is 2 for 4 loads

 Time to wash, dry, and fold one load is still the same (90 minutes)
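
A minimal Python sketch (not part of the original slides) that reproduces the arithmetic of the laundry example; the stage time, stage count, and load count are the example's values:

```python
# Laundry pipelining arithmetic (illustrative sketch).
STAGE_MINUTES = 30   # wash = dry = fold = 30 minutes each
STAGES = 3           # wash, dry, fold
LOADS = 4            # loads A, B, C, D

# Sequential: every load waits for the previous one to finish all three stages.
sequential = LOADS * STAGES * STAGE_MINUTES          # 4 * 3 * 30 = 360 min = 6 hours

# Pipelined: a new load starts every 30 minutes, as soon as the washer is free.
pipelined = (STAGES + LOADS - 1) * STAGE_MINUTES     # (3 + 4 - 1) * 30 = 180 min = 3 hours

print(f"sequential = {sequential} min, pipelined = {pipelined} min")
print(f"speedup for {LOADS} loads = {sequential / pipelined}")            # 2.0
print(f"latency of one load = {STAGES * STAGE_MINUTES} min (unchanged)")  # 90
```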
Serial Execution versus Pipelining
 Consider a task that can be divided into k subtasks
 The k subtasks are executed on k different stages
 Each subtask requires one time unit
 The total execution time of the task is k time units
 Pipelining is to overlap the execution
 The k stages work in parallel on k different tasks
 Tasks enter/leave pipeline at the rate of one task per time unit
[Diagram: without pipelining, each task passes through stages 1, 2, …, k before the next task begins, giving one completion every k time units. With pipelining, tasks overlap across the k stages, giving one completion every time unit once the pipeline is full.]
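
The completion pattern can be checked with a small sketch; k and n below are illustrative values (not from the slides), assuming one time unit per stage as above:

```python
# Completion times of n tasks on a k-stage pipeline, one time unit per stage.
k, n = 3, 5   # illustrative values

without_pipelining = [i * k for i in range(1, n + 1)]    # one completion every k units
with_pipelining = [k + i - 1 for i in range(1, n + 1)]   # one completion per unit after fill

print("without pipelining:", without_pipelining)   # [3, 6, 9, 12, 15]
print("with pipelining:   ", with_pipelining)      # [3, 4, 5, 6, 7]
```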
Synchronous Pipeline
 Uses clocked registers between stages
 Upon arrival of a clock edge …
 All registers hold the results of previous stages simultaneously

 The pipeline stages are combinational logic circuits


 It is desirable to have balanced stages
 Approximately equal delay in all stages

 Clock period is determined by the maximum stage delay


[Diagram: Input → S1 → Register → S2 → Register → … → Sk → Register → Output; a common clock drives all pipeline registers]
 Let i = time delay in stage Si
 Clock cycle  = max(i) is the maximum stage delay
 Clock frequency f = 1/ = 1/max(i)
 A pipeline can process n tasks in k + n – 1 cycles
 k cycles are needed to complete the first task
 n – 1 cycles are needed to complete the remaining n – 1 tasks

 Ideal speedup of a k-stage pipeline over serial execution

Serial execution in cycles nk


Sk = = Sk → k for large n
Pipelined execution in cycles k+n–1
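
A short Python sketch of these formulas; the stage delays are hypothetical (they anticipate the 200 ps / 150 ps values used later), and n is an arbitrary task count:

```python
# Pipeline clock and ideal speedup, following the formulas above.
stage_delays_ps = [200, 150, 200, 200, 150]   # hypothetical tau_i for a 5-stage pipeline

tau = max(stage_delays_ps)        # clock cycle = maximum stage delay = 200 ps
f_ghz = 1000 / tau                # a 1000 ps cycle corresponds to 1 GHz -> 5 GHz here

k = len(stage_delays_ps)
n = 1000                          # arbitrary number of tasks

speedup = (n * k) / (k + n - 1)   # Sk = nk / (k + n - 1)
print(f"clock cycle = {tau} ps, frequency = {f_ghz} GHz")
print(f"speedup for n = {n}: {speedup:.4f} (approaches k = {k} for large n)")
```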
MIPS Processor Pipeline
 Five stages, one cycle per stage
1. IF: Instruction Fetch from instruction memory
2. ID: Instruction Decode, register read
3. EX: Execute operation, calculate load/store address or J/Br address
4. MEM: Memory access for load and store
5. WB: Write Back result to register
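
The following sketch (an illustration, not the processor hardware) prints which stage each instruction of a hypothetical instruction stream occupies in each cycle, assuming no hazards:

```python
# Stage occupancy in the 5-stage MIPS pipeline (ideal case: no hazards, CPI = 1).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
instructions = ["lw", "add", "sub", "or", "sw"]   # hypothetical instruction stream

for cycle in range(1, len(instructions) + len(STAGES)):
    active = []
    for i, instr in enumerate(instructions):
        stage = cycle - 1 - i          # instruction i enters IF in cycle i + 1
        if 0 <= stage < len(STAGES):
            active.append(f"{instr}:{STAGES[stage]}")
    print(f"CC{cycle}: " + "  ".join(active))
```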
Single-Cycle vs Pipelined Performance
 Consider a 5-stage instruction execution in which …
 Instruction fetch = ALU operation = Data memory access = 200 ps
 Register read = register write = 150 ps
 What is the clock cycle of the single-cycle processor?
 What is the clock cycle of the pipelined processor?
 What is the speedup factor of pipelined execution?
 Solution
Single-Cycle Clock = 200 + 150 + 200 + 200 + 150 = 900 ps

[Diagram: each instruction (IF, Reg, ALU, MEM, Reg) occupies one 900 ps clock cycle; the next instruction starts 900 ps later]
Single-Cycle versus Pipelined – cont’d
 Pipelined clock cycle = max(200, 150) = 200 ps
[Diagram: instructions overlap in the pipeline; each stage takes 200 ps and a new instruction enters every 200 ps]

 CPI for pipelined execution = 1


 One instruction completes each cycle (ignoring pipeline fill)
 Speedup of pipelined execution = 900 ps / 200 ps = 4.5
 Instruction count and CPI are equal in both cases
 Speedup factor is less than 5 (the number of pipeline stages)
 Because the pipeline stages are not balanced
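
The same numbers in a small Python sketch, using the stage delays from the example above:

```python
# Clock cycles and speedup for the example above.
IF, REG_READ, ALU, MEM, REG_WRITE = 200, 150, 200, 200, 150   # delays in ps

single_cycle_clock = IF + REG_READ + ALU + MEM + REG_WRITE    # 900 ps
pipelined_clock = max(IF, REG_READ, ALU, MEM, REG_WRITE)      # 200 ps (longest stage)

# Instruction count and CPI are the same in both designs, so the speedup
# is simply the ratio of the clock cycles.
print(f"single-cycle clock = {single_cycle_clock} ps")
print(f"pipelined clock    = {pipelined_clock} ps")
print(f"speedup            = {single_cycle_clock / pipelined_clock}")   # 4.5 < 5 stages
```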
Pipeline Performance Summary
 Pipelining doesn’t improve latency of a single instruction
 However, it improves throughput of entire workload
 Instructions are initiated and completed at a higher rate

 In a k-stage pipeline, k instructions operate in parallel


 Overlapped execution using multiple hardware resources
 Potential speedup = number of pipeline stages k
 Unbalanced lengths of pipeline stages reduce speedup

 Pipeline rate is limited by the slowest pipeline stage
 Also, time to fill and drain pipeline reduces speedup
Pipeline Hazards
 Hazards: situations that would cause incorrect execution
 If next instruction were launched during its designated clock cycle
1. Structural hazards
 Caused by resource contention
 Using same resource by two instructions during the same cycle
2. Data hazards
 An instruction may compute a result needed by next instruction
 Hardware can detect dependencies between instructions
3. Control hazards
 Caused by instructions that change control flow (branches/jumps)
 Delays in changing the flow of control
 Hazards complicate pipeline control and limit performance
Structural Hazards
 Problem
 Attempt to use the same hardware resource by two different
instructions during the same cycle
 Example
 Writing back ALU result in stage 4
 Conflict with writing load data in stage 5

[Pipeline diagram, cycles CC1–CC9:
 lw  $t6, 8($s5)     IF  ID  EX  MEM  WB
 ori $t4, $s3, 7         IF  ID  EX   WB
 sub $t5, $s2, $s3           IF  ID   EX   WB
 sw  $s2, 10($s3)                IF   ID   EX  MEM
Structural hazard: in CC5, ori (write-back in its 4th stage) and lw (write-back in its 5th stage) are both attempting to write the register file during the same cycle.]
Resolving Structural Hazards
 Serious Hazard:
 Hazard cannot be ignored

 Solution 1: Delay Access to Resource


 Must have mechanism to delay instruction access to resource
 Delay all write backs to the register file to stage 5
 ALU instructions bypass stage 4 (memory) without doing anything

 Solution 2: Add more hardware resources (more costly)


 Add more hardware to eliminate the structural hazard
 Redesign the register file to have two write ports
 First write port can be used to write back ALU results in stage 4
 Second write port can be used to write back load data in stage 5
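
A small sketch of the conflict and of Solution 1; the issue cycles and write-back stages encode the example above, and the tuple representation is purely illustrative:

```python
# Register-file write-port conflicts (illustrative sketch, not the slides' hardware).
# Each tuple: (instruction, issue cycle, writes a register?, write-back stage).
def write_cycles(instrs):
    cycles = {}
    for name, issue, writes_reg, wb_stage in instrs:
        if writes_reg:
            cycles.setdefault(issue + wb_stage - 1, []).append(name)
    return cycles

program = [
    ("lw  $t6, 8($s5)",   1, True, 5),    # load data written back in its 5th stage
    ("ori $t4, $s3, 7",   2, True, 4),    # ALU result written back in stage 4 (skips MEM)
    ("sub $t5, $s2, $s3", 3, True, 4),
    ("sw  $s2, 10($s3)",  4, False, 0),   # store writes no register
]

for cycle, writers in sorted(write_cycles(program).items()):
    if len(writers) > 1:
        print(f"CC{cycle}: structural hazard with one write port, writers = {writers}")

# Solution 1: delay every write-back to stage 5 -> no cycle has two writers.
delayed = [(n, i, w, 5 if w else 0) for n, i, w, _ in program]
assert all(len(w) == 1 for w in write_cycles(delayed).values())
```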
Data Hazards
 Dependency between instructions causes a data hazard
 The dependent instructions are close to each other
 Pipelined execution might change the order of operand access

 Read After Write – RAW Hazard


 Given two instructions I and J, where I comes before J
 Instruction J should read an operand after it is written by I
 Called a data dependence in compiler terminology
I: add $s1, $s2, $s3 # $s1 is written
J: sub $s4, $s1, $s3 # $s1 is read
 Hazard occurs when J reads the operand before I writes it
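
A minimal sketch of the RAW check; the dictionary representation of an instruction is an assumption made only for this illustration:

```python
# RAW dependence check between two instructions (illustrative sketch).
def raw_dependence(writer, reader):
    """True if `reader` reads a register that `writer` writes."""
    return writer["dest"] is not None and writer["dest"] in reader["sources"]

I = {"text": "add $s1, $s2, $s3", "dest": "$s1", "sources": ["$s2", "$s3"]}
J = {"text": "sub $s4, $s1, $s3", "dest": "$s4", "sources": ["$s1", "$s3"]}

print(raw_dependence(I, J))   # True: J reads $s1 after I writes it
```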
Example of a RAW Data Hazard
[Pipeline diagram, cycles CC1–CC8; the value of $s2 is 10 during CC1–CC5 and 20 from CC6 on.
Program execution order:
 sub $s2, $t1, $t3   IM  Reg ALU DM  Reg
 add $s4, $s2, $t5       IM  Reg ALU DM  Reg
 or  $s6, $t3, $s2           IM  Reg ALU DM  Reg
 and $s7, $t4, $s2               IM  Reg ALU DM  Reg
 sw  $t8, 10($s2)                    IM  Reg ALU DM]
 Result of sub is needed by add, or, and, & sw instructions


 Instructions add & or will read the old value of $s2 from the register file
 During CC5, $s2 is written at the end of the cycle, so the and instruction also reads the old value
Solution 1: Stalling the Pipeline
[Pipeline diagram, cycles CC1–CC9; the value of $s2 is 10 during CC1–CC5 and 20 from CC6 on.
Instruction order:
 sub $s2, $t1, $t3   IM  Reg ALU DM  Reg
 add $s4, $s2, $t5       IM  Reg Reg Reg Reg ALU DM  Reg   (stalls in CC3, CC4, CC5)
 or  $s6, $t3, $s2                       IM  Reg ALU DM]
 Three stall cycles during CC3 thru CC5 (wasting 3 cycles)


 Stall cycles delay execution of add & fetching of or instruction
 The add instruction cannot read $s2 until beginning of CC6
 The add instruction remains in the Instruction register until CC6
 The PC register is not modified until beginning of CC6
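
Under the timing used in this example, the writer's result becomes readable only in the cycle after its WB stage, so the number of stall cycles depends on the distance between writer and reader. A small sketch, with the "readable from CC6" constant taken from the diagram above:

```python
# Stall cycles without forwarding, under the timing model used in this example.
def stalls_needed(distance):
    """distance = how many instructions after the writer the reader appears (>= 1)."""
    natural_id_cycle = 2 + distance   # the reader's ID stage if there were no stalls
    earliest_read_cycle = 6           # first cycle in which the new value can be read
    return max(0, earliest_read_cycle - natural_id_cycle)

for d in range(1, 6):
    print(f"dependent instruction {d} after the writer: {stalls_needed(d)} stall(s)")
# distance 1 -> 3 stalls (the add above); distance 4 or more -> no stalls
```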
Solution 2: Forwarding ALU Result
 The ALU result is forwarded (fed back) to the ALU input
 No bubbles are inserted into the pipeline and no cycles are wasted
 ALU result is forwarded from ALU, MEM, and WB stages
[Pipeline diagram, cycles CC1–CC8; the value of $s2 is 10 during CC1–CC5 and 20 from CC6 on.
Program execution order:
 sub $s2, $t1, $t3   IM  Reg ALU DM  Reg
 add $s4, $s2, $t5       IM  Reg ALU DM  Reg
 or  $s6, $t3, $s2           IM  Reg ALU DM  Reg
 and $s7, $s6, $s2               IM  Reg ALU DM  Reg
 sw  $t8, 10($s2)                    IM  Reg ALU DM
The sub result is forwarded to the dependent instructions, so no cycles are wasted.]

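A sketch of typical forwarding-unit conditions in the style of the standard textbook design; the pipeline-register field names and the encoding of the select signals are assumptions, not the exact datapath of these slides:

```python
# Typical forwarding-unit conditions (illustrative sketch).
def forward_select(ex_rs, ex_rt, exmem_rd, exmem_regwrite, memwb_rd, memwb_regwrite):
    """Return (forward_a, forward_b): 0 = register file, 2 = EX/MEM result, 1 = MEM/WB result."""
    forward_a = forward_b = 0
    # EX hazard: the needed result is still in the EX/MEM pipeline register.
    if exmem_regwrite and exmem_rd != 0 and exmem_rd == ex_rs:
        forward_a = 2
    if exmem_regwrite and exmem_rd != 0 and exmem_rd == ex_rt:
        forward_b = 2
    # MEM hazard: the result is in MEM/WB and was not already forwarded from EX/MEM.
    if memwb_regwrite and memwb_rd != 0 and forward_a == 0 and memwb_rd == ex_rs:
        forward_a = 1
    if memwb_regwrite and memwb_rd != 0 and forward_b == 0 and memwb_rd == ex_rt:
        forward_b = 1
    return forward_a, forward_b

# add $s4, $s2, $t5 in EX while sub (destination $s2 = register 18) is in MEM:
print(forward_select(ex_rs=18, ex_rt=13, exmem_rd=18, exmem_regwrite=True,
                     memwb_rd=0, memwb_regwrite=False))   # (2, 0)
```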

Control Hazards
 Jump and Branch can cause great performance loss
 Jump instruction needs only the jump target address
 Branch instruction needs two things:
 Branch Result Taken or Not Taken
 Branch Target Address
 PC + 4 If Branch is NOT taken
 PC + 4 + 4 × immediate If Branch is Taken

 Jump and Branch targets are computed in EX stage


 At which point two instructions have already been fetched
 For Jump, the two instructions need to be flushed
 For Branch, the two instructions are flushed if Branch is Taken
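
A minimal sketch of the next-PC computation for a branch, following the two formulas above; the PC value and immediate are hypothetical:

```python
# Next-PC selection for a MIPS branch (illustrative sketch).
def next_pc(pc, immediate, taken):
    """pc = address of the branch instruction; immediate = signed word offset."""
    if taken:
        return pc + 4 + 4 * immediate   # branch target address
    return pc + 4                       # fall through

print(hex(next_pc(pc=0x00400000, immediate=10, taken=True)))    # 0x40002c
print(hex(next_pc(pc=0x00400000, immediate=10, taken=False)))   # 0x400004
```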
2-Cycle Branch Delay
 Control logic detects a Branch instruction in the 2nd Stage
 ALU computes the Branch outcome in the 3rd Stage
 Next1 and Next2 instructions will be fetched anyway
 Convert Next1 and Next2 into bubbles if branch is taken
[Pipeline diagram, cycles cc1–cc7:
 beq $t1, $t2, L1          IF  Reg ALU                       (outcome and target known at end of cc3)
 Next1                         IF  Reg Bubble Bubble Bubble
 Next2                             IF  Bubble Bubble Bubble Bubble
 L1: target instruction                IF  Reg ALU DM        (fetched in cc4 using the branch target address)]
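
A tiny sketch of the flush decision: the two instructions fetched after the branch take effect only if the branch is not taken; otherwise they become bubbles. The names below are illustrative:

```python
# Flushing the two instructions fetched after a branch (illustrative sketch).
def after_branch(taken, next1, next2, target):
    """Instructions that actually take effect after the branch is resolved in EX."""
    if taken:
        return ["bubble", "bubble", target]   # Next1 and Next2 are squashed
    return [next1, next2]                     # not taken: nothing is flushed

print(after_branch(True,  "Next1", "Next2", "L1: target instruction"))
print(after_branch(False, "Next1", "Next2", "L1: target instruction"))
```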
In Summary
 Three types of pipeline hazards
 Structural hazards: conflict using a resource during same cycle
 Data hazards: due to data dependencies between instructions
 Control hazards: due to branch and jump instructions

 Overcoming the hazards:


 Structural hazards: eliminated by careful design or more hardware
 Data hazards can be eliminated by forwarding
 Control hazards can be eliminated by branch prediction
