
Pipelined Processor Design

CSE 333
Computer Architecture and Assembly Language

[Adapted from slides of Dr. M. Mudawar, ICS 233, KFUPM]


Presentation Outline
 Pipelining versus Serial Execution

 Pipeline Hazards, Structural Hazards

 Data Hazards and Forwarding

 Control Hazards

 Summary
Pipelining Example
 Laundry Example: Three Stages

1. Wash dirty load of clothes

2. Dry wet clothes

3. Fold and put clothes into drawers

 Each stage takes 30 minutes to complete


 Four loads of clothes (A, B, C, D) to wash, dry, and fold
Sequential Laundry
[Timeline: 6 PM to 12 AM in twelve 30-minute slots; each load occupies three consecutive slots, one load at a time]

 Sequential laundry takes 6 hours for 4 loads


 Intuitively, we can use pipelining to speed up laundry
Pipelined Laundry: Start Load ASAP
[Timeline: 6 PM to 9 PM; loads A, B, C, D overlap, with a new load starting every 30 minutes]

 Pipelined laundry takes 3 hours for 4 loads

 Speedup factor is 2 for 4 loads

 Time to wash, dry, and fold one load is still the same (90 minutes)
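
A minimal Python sketch (not part of the original slides) that reproduces the arithmetic of the laundry example; the stage time, stage count, and load count are the example's values:

```python
# Laundry pipelining arithmetic (illustrative sketch).
STAGE_MINUTES = 30   # wash = dry = fold = 30 minutes each
STAGES = 3           # wash, dry, fold
LOADS = 4            # loads A, B, C, D

# Sequential: every load waits for the previous one to finish all three stages.
sequential = LOADS * STAGES * STAGE_MINUTES          # 4 * 3 * 30 = 360 min = 6 hours

# Pipelined: a new load starts every 30 minutes, as soon as the washer is free.
pipelined = (STAGES + LOADS - 1) * STAGE_MINUTES     # (3 + 4 - 1) * 30 = 180 min = 3 hours

print(f"sequential = {sequential} min, pipelined = {pipelined} min")
print(f"speedup for {LOADS} loads = {sequential / pipelined}")            # 2.0
print(f"latency of one load = {STAGES * STAGE_MINUTES} min (unchanged)")  # 90
```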
Serial Execution versus Pipelining
 Consider a task that can be divided into k subtasks
 The k subtasks are executed on k different stages
 Each subtask requires one time unit
 The total execution time of the task is k time units
 Pipelining is to overlap the execution
 The k stages work in parallel on k different tasks
 Tasks enter/leave pipeline at the rate of one task per time unit
[Diagram: without pipelining, each task passes through stages 1, 2, …, k before the next task begins, giving one completion every k time units. With pipelining, tasks overlap across the k stages, giving one completion every time unit once the pipeline is full.]
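
The completion pattern can be checked with a small sketch; k and n below are illustrative values (not from the slides), assuming one time unit per stage as above:

```python
# Completion times of n tasks on a k-stage pipeline, one time unit per stage.
k, n = 3, 5   # illustrative values

without_pipelining = [i * k for i in range(1, n + 1)]    # one completion every k units
with_pipelining = [k + i - 1 for i in range(1, n + 1)]   # one completion per unit after fill

print("without pipelining:", without_pipelining)   # [3, 6, 9, 12, 15]
print("with pipelining:   ", with_pipelining)      # [3, 4, 5, 6, 7]
```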
Synchronous Pipeline
 Uses clocked registers between stages
 Upon arrival of a clock edge …
 All registers hold the results of previous stages simultaneously

 The pipeline stages are combinational logic circuits


 It is desirable to have balanced stages
 Approximately equal delay in all stages

 Clock period is determined by the maximum stage delay


[Diagram: Input → S1 → Register → S2 → Register → … → Sk → Register → Output; a common clock drives all pipeline registers]
 Let i = time delay in stage Si
 Clock cycle  = max(i) is the maximum stage delay
 Clock frequency f = 1/ = 1/max(i)
 A pipeline can process n tasks in k + n – 1 cycles
 k cycles are needed to complete the first task
 n – 1 cycles are needed to complete the remaining n – 1 tasks

 Ideal speedup of a k-stage pipeline over serial execution

Serial execution in cycles nk


Sk = = Sk → k for large n
Pipelined execution in cycles k+n–1
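
A short Python sketch of these formulas; the stage delays are hypothetical (they anticipate the 200 ps / 150 ps values used later), and n is an arbitrary task count:

```python
# Pipeline clock and ideal speedup, following the formulas above.
stage_delays_ps = [200, 150, 200, 200, 150]   # hypothetical tau_i for a 5-stage pipeline

tau = max(stage_delays_ps)        # clock cycle = maximum stage delay = 200 ps
f_ghz = 1000 / tau                # a 1000 ps cycle corresponds to 1 GHz -> 5 GHz here

k = len(stage_delays_ps)
n = 1000                          # arbitrary number of tasks

speedup = (n * k) / (k + n - 1)   # Sk = nk / (k + n - 1)
print(f"clock cycle = {tau} ps, frequency = {f_ghz} GHz")
print(f"speedup for n = {n}: {speedup:.4f} (approaches k = {k} for large n)")
```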
MIPS Processor Pipeline
 Five stages, one cycle per stage
1. IF: Instruction Fetch from instruction memory
2. ID: Instruction Decode, register read
3. EX: Execute operation, calculate load/store address or J/Br address
4. MEM: Memory access for load and store
5. WB: Write Back result to register
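
The following sketch (an illustration, not the processor hardware) prints which stage each instruction of a hypothetical instruction stream occupies in each cycle, assuming no hazards:

```python
# Stage occupancy in the 5-stage MIPS pipeline (ideal case: no hazards, CPI = 1).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
instructions = ["lw", "add", "sub", "or", "sw"]   # hypothetical instruction stream

for cycle in range(1, len(instructions) + len(STAGES)):
    active = []
    for i, instr in enumerate(instructions):
        stage = cycle - 1 - i          # instruction i enters IF in cycle i + 1
        if 0 <= stage < len(STAGES):
            active.append(f"{instr}:{STAGES[stage]}")
    print(f"CC{cycle}: " + "  ".join(active))
```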
Single-Cycle vs Pipelined Performance
 Consider a 5-stage instruction execution in which …
 Instruction fetch = ALU operation = Data memory access = 200 ps
 Register read = register write = 150 ps
 What is the clock cycle of the single-cycle processor?
 What is the clock cycle of the pipelined processor?
 What is the speedup factor of pipelined execution?
 Solution
Single-Cycle Clock = 200 + 150 + 200 + 200 + 150 = 900 ps

[Diagram: each instruction (IF, Reg, ALU, MEM, Reg) occupies one 900 ps clock cycle; the next instruction starts 900 ps later]
Single-Cycle versus Pipelined – cont’d
 Pipelined clock cycle = max(200, 150) = 200 ps
[Diagram: instructions overlap in the pipeline; each stage takes 200 ps and a new instruction enters every 200 ps]

 CPI for pipelined execution = 1


 One instruction completes each cycle (ignoring pipeline fill)
 Speedup of pipelined execution = 900 ps / 200 ps = 4.5
 Instruction count and CPI are equal in both cases
 Speedup factor is less than 5 (the number of pipeline stages)
 Because the pipeline stages are not balanced
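
The same numbers in a small Python sketch, using the stage delays from the example above:

```python
# Clock cycles and speedup for the example above.
IF, REG_READ, ALU, MEM, REG_WRITE = 200, 150, 200, 200, 150   # delays in ps

single_cycle_clock = IF + REG_READ + ALU + MEM + REG_WRITE    # 900 ps
pipelined_clock = max(IF, REG_READ, ALU, MEM, REG_WRITE)      # 200 ps (longest stage)

# Instruction count and CPI are the same in both designs, so the speedup
# is simply the ratio of the clock cycles.
print(f"single-cycle clock = {single_cycle_clock} ps")
print(f"pipelined clock    = {pipelined_clock} ps")
print(f"speedup            = {single_cycle_clock / pipelined_clock}")   # 4.5 < 5 stages
```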
Pipeline Performance Summary
 Pipelining doesn’t improve latency of a single instruction
 However, it improves throughput of entire workload
 Instructions are initiated and completed at a higher rate

 In a k-stage pipeline, k instructions operate in parallel


 Overlapped execution using multiple hardware resources
 Potential speedup = number of pipeline stages k
 Unbalanced lengths of pipeline stages reduce speedup

 Pipeline rate is limited by the slowest pipeline stage
 Also, time to fill and drain pipeline reduces speedup
Pipeline Hazards
 Hazards: situations that would cause incorrect execution
 If next instruction were launched during its designated clock cycle
1. Structural hazards
 Caused by resource contention
 Using same resource by two instructions during the same cycle
2. Data hazards
 An instruction may compute a result needed by next instruction
 Hardware can detect dependencies between instructions
3. Control hazards
 Caused by instructions that change control flow (branches/jumps)
 Delays in changing the flow of control
 Hazards complicate pipeline control and limit performance
Structural Hazards
 Problem
 Attempt to use the same hardware resource by two different
instructions during the same cycle
 Example
 Writing back ALU result in stage 4
 Conflict with writing load data in stage 5

[Pipeline diagram, cycles CC1–CC9:
 lw  $t6, 8($s5)     IF  ID  EX  MEM  WB
 ori $t4, $s3, 7         IF  ID  EX   WB
 sub $t5, $s2, $s3           IF  ID   EX   WB
 sw  $s2, 10($s3)                IF   ID   EX  MEM
Structural hazard: in CC5, ori (write-back in its 4th stage) and lw (write-back in its 5th stage) are both attempting to write the register file during the same cycle.]
Resolving Structural Hazards
 Serious Hazard:
 Hazard cannot be ignored

 Solution 1: Delay Access to Resource


 Must have mechanism to delay instruction access to resource
 Delay all write backs to the register file to stage 5
 ALU instructions bypass stage 4 (memory) without doing anything

 Solution 2: Add more hardware resources (more costly)


 Add more hardware to eliminate the structural hazard
 Redesign the register file to have two write ports
 First write port can be used to write back ALU results in stage 4
 Second write port can be used to write back load data in stage 5
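
A small sketch of the conflict and of Solution 1; the issue cycles and write-back stages encode the example above, and the tuple representation is purely illustrative:

```python
# Register-file write-port conflicts (illustrative sketch, not the slides' hardware).
# Each tuple: (instruction, issue cycle, writes a register?, write-back stage).
def write_cycles(instrs):
    cycles = {}
    for name, issue, writes_reg, wb_stage in instrs:
        if writes_reg:
            cycles.setdefault(issue + wb_stage - 1, []).append(name)
    return cycles

program = [
    ("lw  $t6, 8($s5)",   1, True, 5),    # load data written back in its 5th stage
    ("ori $t4, $s3, 7",   2, True, 4),    # ALU result written back in stage 4 (skips MEM)
    ("sub $t5, $s2, $s3", 3, True, 4),
    ("sw  $s2, 10($s3)",  4, False, 0),   # store writes no register
]

for cycle, writers in sorted(write_cycles(program).items()):
    if len(writers) > 1:
        print(f"CC{cycle}: structural hazard with one write port, writers = {writers}")

# Solution 1: delay every write-back to stage 5 -> no cycle has two writers.
delayed = [(n, i, w, 5 if w else 0) for n, i, w, _ in program]
assert all(len(w) == 1 for w in write_cycles(delayed).values())
```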
Data Hazards
 Dependency between instructions causes a data hazard
 The dependent instructions are close to each other
 Pipelined execution might change the order of operand access

 Read After Write – RAW Hazard


 Given two instructions I and J, where I comes before J
 Instruction J should read an operand after it is written by I
 Called a data dependence in compiler terminology
I: add $s1, $s2, $s3 # $s1 is written
J: sub $s4, $s1, $s3 # $s1 is read
 Hazard occurs when J reads the operand before I writes it
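
A minimal sketch of the RAW check; the dictionary representation of an instruction is an assumption made only for this illustration:

```python
# RAW dependence check between two instructions (illustrative sketch).
def raw_dependence(writer, reader):
    """True if `reader` reads a register that `writer` writes."""
    return writer["dest"] is not None and writer["dest"] in reader["sources"]

I = {"text": "add $s1, $s2, $s3", "dest": "$s1", "sources": ["$s2", "$s3"]}
J = {"text": "sub $s4, $s1, $s3", "dest": "$s4", "sources": ["$s1", "$s3"]}

print(raw_dependence(I, J))   # True: J reads $s1 after I writes it
```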
Example of a RAW Data Hazard
[Pipeline diagram, cycles CC1–CC8; the value of $s2 is 10 during CC1–CC5 and 20 from CC6 on.
Program execution order:
 sub $s2, $t1, $t3   IM  Reg ALU DM  Reg
 add $s4, $s2, $t5       IM  Reg ALU DM  Reg
 or  $s6, $t3, $s2           IM  Reg ALU DM  Reg
 and $s7, $t4, $s2               IM  Reg ALU DM  Reg
 sw  $t8, 10($s2)                    IM  Reg ALU DM]
 Result of sub is needed by add, or, and, & sw instructions


 Instructions add & or will read the old value of $s2 from the register file
 During CC5, $s2 is written at the end of the cycle, so the and instruction also reads the old value
Solution 1: Stalling the Pipeline
[Pipeline diagram, cycles CC1–CC9; the value of $s2 is 10 during CC1–CC5 and 20 from CC6 on.
Instruction order:
 sub $s2, $t1, $t3   IM  Reg ALU DM  Reg
 add $s4, $s2, $t5       IM  Reg Reg Reg Reg ALU DM  Reg   (stalls in CC3, CC4, CC5)
 or  $s6, $t3, $s2                       IM  Reg ALU DM]
 Three stall cycles during CC3 thru CC5 (wasting 3 cycles)


 Stall cycles delay execution of add & fetching of or instruction
 The add instruction cannot read $s2 until beginning of CC6
 The add instruction remains in the Instruction register until CC6
 The PC register is not modified until beginning of CC6
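
Under the timing used in this example, the writer's result becomes readable only in the cycle after its WB stage, so the number of stall cycles depends on the distance between writer and reader. A small sketch, with the "readable from CC6" constant taken from the diagram above:

```python
# Stall cycles without forwarding, under the timing model used in this example.
def stalls_needed(distance):
    """distance = how many instructions after the writer the reader appears (>= 1)."""
    natural_id_cycle = 2 + distance   # the reader's ID stage if there were no stalls
    earliest_read_cycle = 6           # first cycle in which the new value can be read
    return max(0, earliest_read_cycle - natural_id_cycle)

for d in range(1, 6):
    print(f"dependent instruction {d} after the writer: {stalls_needed(d)} stall(s)")
# distance 1 -> 3 stalls (the add above); distance 4 or more -> no stalls
```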
Solution 2: Forwarding ALU Result
 The ALU result is forwarded (fed back) to the ALU input
 No bubbles are inserted into the pipeline and no cycles are wasted
 ALU result is forwarded from ALU, MEM, and WB stages
[Pipeline diagram, cycles CC1–CC8; the value of $s2 is 10 during CC1–CC5 and 20 from CC6 on.
Program execution order:
 sub $s2, $t1, $t3   IM  Reg ALU DM  Reg
 add $s4, $s2, $t5       IM  Reg ALU DM  Reg
 or  $s6, $t3, $s2           IM  Reg ALU DM  Reg
 and $s7, $s6, $s2               IM  Reg ALU DM  Reg
 sw  $t8, 10($s2)                    IM  Reg ALU DM
The sub result is forwarded to the dependent instructions, so no cycles are wasted.]

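A sketch of typical forwarding-unit conditions in the style of the standard textbook design; the pipeline-register field names and the encoding of the select signals are assumptions, not the exact datapath of these slides:

```python
# Typical forwarding-unit conditions (illustrative sketch).
def forward_select(ex_rs, ex_rt, exmem_rd, exmem_regwrite, memwb_rd, memwb_regwrite):
    """Return (forward_a, forward_b): 0 = register file, 2 = EX/MEM result, 1 = MEM/WB result."""
    forward_a = forward_b = 0
    # EX hazard: the needed result is still in the EX/MEM pipeline register.
    if exmem_regwrite and exmem_rd != 0 and exmem_rd == ex_rs:
        forward_a = 2
    if exmem_regwrite and exmem_rd != 0 and exmem_rd == ex_rt:
        forward_b = 2
    # MEM hazard: the result is in MEM/WB and was not already forwarded from EX/MEM.
    if memwb_regwrite and memwb_rd != 0 and forward_a == 0 and memwb_rd == ex_rs:
        forward_a = 1
    if memwb_regwrite and memwb_rd != 0 and forward_b == 0 and memwb_rd == ex_rt:
        forward_b = 1
    return forward_a, forward_b

# add $s4, $s2, $t5 in EX while sub (destination $s2 = register 18) is in MEM:
print(forward_select(ex_rs=18, ex_rt=13, exmem_rd=18, exmem_regwrite=True,
                     memwb_rd=0, memwb_regwrite=False))   # (2, 0)
```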

Control Hazards
 Jump and Branch can cause great performance loss
 Jump instruction needs only the jump target address
 Branch instruction needs two things:
 Branch Result Taken or Not Taken
 Branch Target Address
 PC + 4 If Branch is NOT taken
 PC + 4 + 4 × immediate If Branch is Taken

 Jump and Branch targets are computed in EX stage


 At which point two instructions have already been fetched
 For Jump, the two instructions need to be flushed
 For Branch, the two instructions are flushed if Branch is Taken
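
A minimal sketch of the next-PC computation for a branch, following the two formulas above; the PC value and immediate are hypothetical:

```python
# Next-PC selection for a MIPS branch (illustrative sketch).
def next_pc(pc, immediate, taken):
    """pc = address of the branch instruction; immediate = signed word offset."""
    if taken:
        return pc + 4 + 4 * immediate   # branch target address
    return pc + 4                       # fall through

print(hex(next_pc(pc=0x00400000, immediate=10, taken=True)))    # 0x40002c
print(hex(next_pc(pc=0x00400000, immediate=10, taken=False)))   # 0x400004
```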
2-Cycle Branch Delay
 Control logic detects a Branch instruction in the 2nd Stage
 ALU computes the Branch outcome in the 3rd Stage
 Next1 and Next2 instructions will be fetched anyway
 Convert Next1 and Next2 into bubbles if branch is taken
[Pipeline diagram, cycles cc1–cc7:
 beq $t1, $t2, L1          IF  Reg ALU                       (outcome and target known at end of cc3)
 Next1                         IF  Reg Bubble Bubble Bubble
 Next2                             IF  Bubble Bubble Bubble Bubble
 L1: target instruction                IF  Reg ALU DM        (fetched in cc4 using the branch target address)]
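
A tiny sketch of the flush decision: the two instructions fetched after the branch take effect only if the branch is not taken; otherwise they become bubbles. The names below are illustrative:

```python
# Flushing the two instructions fetched after a branch (illustrative sketch).
def after_branch(taken, next1, next2, target):
    """Instructions that actually take effect after the branch is resolved in EX."""
    if taken:
        return ["bubble", "bubble", target]   # Next1 and Next2 are squashed
    return [next1, next2]                     # not taken: nothing is flushed

print(after_branch(True,  "Next1", "Next2", "L1: target instruction"))
print(after_branch(False, "Next1", "Next2", "L1: target instruction"))
```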
In Summary
 Three types of pipeline hazards
 Structural hazards: conflict using a resource during same cycle
 Data hazards: due to data dependencies between instructions
 Control hazards: due to branch and jump instructions

 Overcoming the hazards:


 Structural hazards: eliminated by careful design or more hardware
 Data hazards can be eliminated by forwarding
 Control hazards can be eliminated by branch prediction
