
Chapter4 2

The document discusses the architecture and implementation of a pipelined processor design, detailing the roles of various units such as the ALU, data memory, and registers. It covers the execution of different instruction types (R-type, load/store, branch) and the impact of pipelining on performance, including potential hazards like structural, data, and control hazards. Additionally, it explores the benefits of bypassing to reduce stalls and improve throughput in a multi-stage pipeline system.

Uploaded by

Namit Jain

View from 30,000 Feet

Note: we haven’t bothered showing multiplexors. Source: H&P textbook

• What is the role of the Add units?

• Explain the inputs to the data memory unit

• Explain the inputs to the ALU

• Explain the inputs to the register unit
Clocking Methodology

Source: H&P textbook

• Which of the above units need a clock?
• What is being saved (latched) on the rising edge of the clock?
Keep in mind that the latched value remains there for an entire cycle
Implementing R-type Instructions

• Instructions of the form add $t1, $t2, $t3

• Explain the role of each signal

Source: H&P textbook

Implementing Loads/Stores

• Instructions of the form lw $t1, 8($t2) and sw $t1, 8($t2)

Where does this input come from?

Source: H&P textbook
Implementing Branch Instructions

• Instructions of the form beq $t1, $t2, offset

Source: H&P textbook

View from 10,000 Feet

Source: H&P textbook
View from 5,000 Feet

Source: H&P textbook
Latches and Clocks in a Single-Cycle Design

[Figure: PC → Instr Mem → Reg File → ALU → Addr → Data Memory]

• The entire instruction executes in a single cycle

• Green blocks are latches
• At the rising edge, a new PC is recorded
• At the rising edge, the result of the previous cycle is recorded
• At the falling edge, the address of LW/SW is recorded so we can access the data memory in the 2nd half of the cycle
Multi-Stage Circuit

Instead of executing the entire instruction in a single cycle (a single stage), let’s break up the execution into multiple stages, each separated by a latch

[Figure: PC → Instr Mem → L2 → Reg File → L3 → ALU → L4 → Data Memory → L5 → Reg File]
The Assembly Line
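Unpipelined: start and finish a job before moving to the next

Pipelined: break the job into smaller stages (A, B, C) so that successive jobs overlap in time

[Figure: jobs vs. time; each pipelined job passes through stages A, B, C, offset by one stage from the previous job]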
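The assembly-line picture can be turned into a quick cycle count. This is only a sketch under idealized assumptions that are not stated on the slide: every stage takes exactly one time unit, there are no stalls, and the function names are made up for illustration.

```python
def unpipelined_cycles(num_jobs: int, num_stages: int) -> int:
    """Each job runs all of its stages to completion before the next job starts."""
    return num_jobs * num_stages


def pipelined_cycles(num_jobs: int, num_stages: int) -> int:
    """The first job needs num_stages cycles; every later job finishes
    one cycle after the job in front of it."""
    if num_jobs == 0:
        return 0
    return num_stages + (num_jobs - 1)


if __name__ == "__main__":
    # Three stages (A, B, C), as in the assembly-line figure.
    for jobs in (1, 4, 1000):
        slow = unpipelined_cycles(jobs, 3)
        fast = pipelined_cycles(jobs, 3)
        print(f"{jobs} jobs: unpipelined={slow}, pipelined={fast}, "
              f"speedup={slow / fast:.2f}")
```

For a single job the two counts are identical; the benefit only shows up for a series of jobs, where the speedup approaches the number of stages.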
Performance Improvements?

• Does it take longer to finish each individual job?

• Does it take shorter to finish a series of jobs?

• What assumptions were made while answering these questions?

• Is a 10-stage pipeline better than a 5-stage pipeline?
A 5-Stage Pipeline

Register write happens in the first half of the clock cycle (dotted part); register read happens in the second half (solid part). As a result, in CC5, I4 can read the register that I1 writes in that same cycle (otherwise, I5 would be the first instruction able to read it).

Source: H&P textbook
A 5-Stage Pipeline

Use the PC to access the I-cache and increment PC by 4

All instructions go through all stages. For example, an add instruction does not need the data memory (DM), but it still takes 5 clock cycles (it simply passes through the DM stage in its cycle).
A 5-Stage Pipeline

Pipelining improves throughput: ideally, throughput becomes 5 times higher.

Read registers, compare registers, compute branch target; for now, assume branches take 2 cycles (there is enough work that branches can easily take more)

Branches don’t work well with pipelines.
A 5-Stage Pipeline

ALU computation, effective address computation for load/store

A 5-Stage Pipeline

Memory access to/from data cache, stores finish in 4 cycles

A 5-Stage Pipeline

Write result of ALU computation or load into register file

Because of the solid and dotted halves (write in the first half of the cycle, read in the second half), I4 can read what I1 writes in the same cycle. Otherwise, we would have to wait until I5 to read what I1 wrote.
Pipeline Summary
Note: no stage is skipped, so a faster instruction never overtakes a slower one. Every instruction still takes 5 cycles (even if it does nothing in DM). IM is not shown; only the latter 4 stages are shown.

                    RR                           ALU      DM         RW
ADD R1, R2, → R3    Rd R1,R2                     R1+R2    --         Wr R3
BEQ R1, R2, 100     Rd R1,R2; Compare, Set PC    --       --         --
LD 8[R3] → R6       Rd R3                        R3+8     Get data   Wr R6
ST 8[R3] ← R6       Rd R3,R6                     R3+8     Wr data    --

In the LD and ST rows, R3 holds the address.
Performance Improvements?

• Does it take longer to finish each individual job?
  Yes, possibly due to additional latch delays.

• Does it take shorter to finish a series of jobs?

• What assumptions were made while answering these questions?
  – No dependences between instructions
  – Easy to partition circuits into uniform pipeline stages
  – No latch overhead

• Is a 10-stage pipeline better than a 5-stage pipeline?
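One way to explore the 5-stage versus 10-stage question is to drop the "no latch overhead" assumption and put numbers on it. The sketch below uses purely hypothetical values (10 ns of total logic delay, 0.5 ns per latch) and assumes the logic splits evenly across stages; none of these figures come from the slides.

```python
from typing import Tuple


def pipeline_timing(logic_ns: float, stages: int, latch_ns: float) -> Tuple[float, float]:
    """Return (cycle time, latency of one instruction), both in ns.

    Assumes the logic divides evenly across the stages and that every
    stage pays one latch delay (illustrative assumptions only).
    """
    cycle = logic_ns / stages + latch_ns
    return cycle, cycle * stages


LOGIC_NS = 10.0   # hypothetical total logic delay
LATCH_NS = 0.5    # hypothetical latch overhead per stage

base_cycle, _ = pipeline_timing(LOGIC_NS, 1, LATCH_NS)
for stages in (1, 5, 10):
    cycle, latency = pipeline_timing(LOGIC_NS, stages, LATCH_NS)
    print(f"{stages:2d} stages: cycle={cycle:.2f} ns, "
          f"latency={latency:.2f} ns, throughput gain={base_cycle / cycle:.2f}x")
```

With these made-up numbers, the deeper pipeline has the higher throughput but also the longer per-instruction latency, and its throughput gain falls short of 10x because the latch delay does not shrink with the stage count. Dependences between instructions, which this sketch ignores, would reduce the gain further.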
Quantitative Effects

• As a result of pipelining:
  – Time in ns per instruction goes up
  – Each instruction takes more cycles to execute
  – But… average CPI remains roughly the same
  – Clock speed goes up (ideally 5 times for a 5-stage pipeline)
  – Total execution time goes down, resulting in lower average time per instruction
  – Under ideal conditions, speedup
    = ratio of elapsed times between successive instruction completions
    = number of pipeline stages = increase in clock speed
Conflicts/Problems

• I-cache and D-cache are accessed in the same cycle – it helps to implement them separately (since IM and DM accesses can happen in the same clock cycle, these operations need separate hardware)

• Registers are read and written in the same cycle – easy to deal with if register read/write time equals cycle time/2

• Branch target changes only at the end of the second stage -- what do you do in the meantime?
Hazards

• Structural hazards: different instructions in different stages (or the same stage) conflicting for the same resource
  – e.g., reading and writing a register in the same clock cycle; solution: split the cycle into two halves
  – e.g., IM and DM operations in the same clock cycle; solution: keep separate IM and DM

• Data hazards: an instruction cannot continue because it needs a value that has not yet been generated by an earlier instruction
  – i.e., dependences: an instruction might need the output of an instruction that has not yet completed its 5 stages

• Control hazard: fetch cannot continue because it does not know the outcome of an earlier branch – special case of a data hazard – separate category because they are treated in different ways
Structural Hazards

• Example: a unified instruction and data cache → stage 4 (MEM) and stage 1 (IF) can never coincide

• The later instruction and all its successors are delayed until a cycle is found when the resource is free → these are pipeline bubbles

• Structural hazards are easy to eliminate – increase the number of resources (for example, implement a separate instruction and data cache, add more register ports)
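The unified-cache example can be sketched as a small model. The assumptions here are illustrative and not from the slides: the cache has a single port, a data access in stage 4 has priority over a fetch, an instruction fetched in cycle c reaches stage 4 in cycle c+3, and no other hazards occur. The function name is hypothetical.

```python
def fetch_cycles_with_unified_cache(uses_data_memory):
    """Return the cycle in which each instruction is fetched.

    uses_data_memory[i] is True when instruction i is a load or store.
    A fetch that would collide with an earlier instruction's data access
    waits for a free cycle, and all later fetches wait behind it.
    """
    fetch_cycle = []
    busy = set()          # cycles in which the single cache port serves stage 4
    cycle = 1
    for is_mem in uses_data_memory:
        while cycle in busy:
            cycle += 1    # pipeline bubble: the fetch is delayed
        fetch_cycle.append(cycle)
        if is_mem:
            busy.add(cycle + 3)   # this instruction's data access
        cycle += 1
    return fetch_cycle


if __name__ == "__main__":
    # A load followed by four instructions that do not touch data memory.
    print(fetch_cycles_with_unified_cache([True, False, False, False, False]))
```

In this model, the fourth instruction cannot be fetched in cycle 4 because the load occupies the cache then, so it and its successor slip by one cycle. With separate instruction and data caches the `busy` set would never block a fetch.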
Data Hazards

• An instruction produces a value in a given pipeline stage

• A subsequent instruction consumes that value in a pipeline stage

• The consumer may have to be delayed so that the time of consumption is later than the time of production
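The production/consumption rule above can be written as a short calculation for the 5-stage pipeline. This is a sketch with assumed conventions: stages are numbered 1 to 5, the consumer is fetched `distance` cycles after the producer, and `same_cycle_ok` models a register that is written in the first half of a cycle and read in the second half. The names are illustrative.

```python
STAGE = {"IF": 1, "D/R": 2, "ALU": 3, "DM": 4, "RW": 5}


def stalls_needed(produced_in, consumed_in, distance=1, same_cycle_ok=False):
    """Stall cycles for a consumer fetched `distance` cycles after its producer.

    produced_in:   stage in which the producer makes the value available
    consumed_in:   stage in which the consumer needs the value
    same_cycle_ok: True if the value may be used in the very cycle in which
                   it is produced (write in first half, read in second half)
    """
    gap = STAGE[produced_in] - STAGE[consumed_in] - distance
    if not same_cycle_ok:
        gap += 1          # value is usable only from the following cycle
    return max(0, gap)


if __name__ == "__main__":
    # No bypassing: value written in RW, read in D/R of the next instruction.
    print(stalls_needed("RW", "D/R", distance=1, same_cycle_ok=True))
    # Bypassing: ALU result forwarded to the next instruction's ALU stage.
    print(stalls_needed("ALU", "ALU", distance=1))
```

The same function covers a load feeding an address computation (`"DM"` to `"ALU"`) or a load feeding store data (`"DM"` to `"DM"`) by changing the two stage arguments.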
Example 1 – No Bypassing

• Show the instruction occupying each stage in each cycle (no bypassing)
if I1 is R1+R2→R3 and I2 is R3+R4→R5 and I3 is R7+R8→R9

I1 and I2 have a data hazard.

        CYC-1  CYC-2  CYC-3  CYC-4  CYC-5  CYC-6  CYC-7  CYC-8
IF      I1     I2     I3     I3     I3     I4     I5
D/R            I1     I2     I2     I2     I3     I4
ALU                   I1     --     --     I2     I3
DM                           I1     --     --     I2     I3
RW                                  I1     --     --     I2

(-- marks a bubble.)

• I3 stays in IF during cycles 3 to 5, waiting for I2 to proceed.
• I2 stays in D/R during cycles 3 to 5; its register read finally completes in the second half of cycle 5.
• Latches: L2 follows IF, L3 follows D/R, L4 follows ALU, L5 follows DM.
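The occupancy table for Example 1 can be reproduced with a small scheduling sketch. It assumes an in-order 5-stage pipeline with no bypassing, a register file written in the first half of RW and read in the second half of D/R, and stalls that hold an instruction in D/R (and the next one in IF). The tuple format and function names are made up for illustration.

```python
NAMES = ["IF", "D/R", "ALU", "DM", "RW"]


def schedule_no_bypass(instrs):
    """instrs: list of (dest, src1, src2) register names.
    Returns, per instruction, the cycle in which it enters each stage."""
    last_write = {}                 # register -> cycle of the RW that writes it
    starts = []
    for i, (dest, *srcs) in enumerate(instrs):
        if i == 0:
            if_c, dr_c = 1, 2
        else:
            prev = starts[i - 1]
            if_c = prev[1]                   # IF frees up when the previous instruction enters D/R
            dr_c = max(if_c + 1, prev[2])    # D/R frees up when it enters ALU
        # The last D/R cycle may coincide with the producer's RW cycle.
        ready = max((last_write.get(r, 0) for r in srcs), default=0)
        alu_c = max(dr_c + 1, ready + 1)
        starts.append((if_c, dr_c, alu_c, alu_c + 1, alu_c + 2))
        last_write[dest] = alu_c + 2
    return starts


def occupancy(starts, num_cycles):
    """Build stage -> list of occupants per cycle ('--' when empty)."""
    table = {name: ["--"] * num_cycles for name in NAMES}
    for i, s in enumerate(starts):
        ends = list(s[1:]) + [s[4] + 1]      # a stage is held until the next is entered
        for name, first, end in zip(NAMES, s, ends):
            for c in range(first, min(end, num_cycles + 1)):
                table[name][c - 1] = f"I{i + 1}"
    return table


if __name__ == "__main__":
    program = [("R3", "R1", "R2"), ("R5", "R3", "R4"), ("R9", "R7", "R8")]
    for name, row in occupancy(schedule_no_bypass(program), 8).items():
        print(f"{name:4s}", " ".join(f"{cell:3s}" for cell in row))
```

Only I1 to I3 are simulated, so the IF and D/R rows go empty once I3 has moved on; the unspecified instructions I4 and I5 would fill those slots.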
Example 2 – Bypassing

• Show the instruction occupying each stage in each cycle (with bypassing)
if I1 is R1+R2→R3 and I2 is R3+R4→R5 and I3 is R3+R8→R9.
Identify the input latch for each input operand.

        CYC-1  CYC-2  CYC-3  CYC-4  CYC-5  CYC-6  CYC-7  CYC-8
IF      I1     I2     I3     I4     I5
D/R            I1     I2     I3     I4
ALU                   I1     I2     I3
  inputs              L3 L3  L4 L3  L5 L3
DM                           I1     I2     I3
RW                                  I1     I2     I3

• Each pair under the ALU row names the input latches of that instruction's two source operands.
• I1's result is available in L4 at the end of cycle 3, so I2 can use it directly (no need to wait for I1 to finish all 5 stages).
• I3 takes R3 from L5 rather than L4 because, by the end of cycle 4, L4 has been overwritten by I2's ALU result.
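The latch choice in Example 2 follows a simple distance rule, sketched below. It assumes that every producer is an ALU instruction issued back to back with no stalls, so its result sits in L4 one cycle after its ALU stage and in L5 one cycle later; beyond that, the value is read from the register file into L3 as usual. The function name and tuple format are hypothetical.

```python
def alu_input_latches(instrs):
    """instrs: list of (dest, src1, src2) for back-to-back ALU instructions.
    Returns, per instruction, the latch that feeds each source operand."""
    result = []
    for i, (_, *srcs) in enumerate(instrs):
        latches = []
        for reg in srcs:
            latch = "L3"                     # default: register file value read in D/R
            for dist in (1, 2):              # nearest earlier writer wins
                j = i - dist
                if j >= 0 and instrs[j][0] == reg:
                    latch = "L4" if dist == 1 else "L5"
                    break
            latches.append(latch)
        result.append(tuple(latches))
    return result


if __name__ == "__main__":
    program = [("R3", "R1", "R2"), ("R5", "R3", "R4"), ("R9", "R3", "R8")]
    for n, pair in enumerate(alu_input_latches(program), start=1):
        print(f"I{n}:", *pair)
```

A producer three or more instructions back needs no bypass in this model, because its register write (first half of the cycle) lines up with or precedes the consumer's register read (second half).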
Problem 1

                  CYC-1  CYC-2  CYC-3  CYC-4  CYC-5  CYC-6
add $1, $2, $3    IF     D/R    ALU    DM     RW
lw $4, 8($1)             IF     D/R    ALU    DM     RW

With bypassing, the lw (i2) takes $1 for its ALU stage from L4 instead of L3.
Problem 2

                  CYC-1  CYC-2  CYC-3  CYC-4  CYC-5  CYC-6  CYC-7
lw $1, 8($2)      IF     D/R    ALU    DM     RW
lw $4, 8($1)             IF     D/R    D/R    ALU    DM     RW

The first lw gets its data from DM in cycle 4, so the second lw is delayed by one cycle (still faster than without bypassing).
Problem 3

                  CYC-1  CYC-2  CYC-3  CYC-4  CYC-5  CYC-6
lw $1, 8($2)      IF     D/R    ALU    DM     RW
sw $1, 8($3)             IF     D/R    ALU    DM     RW

1) The sw reads the value of $1 from L5.
2) The write happens in the first half of the cycle, so DM can access the written value in the second half.
Problem 4

A 7 or 9 stage pipeline, RR and RW take an entire stage

Instruction fetch and decode take two stages each:
  7 stages:  IF IF Dec Dec RR ALU RW
  9 stages:  IF IF Dec Dec RR ALU DM DM RW

lw $1, 8($2)
add $4, $1, $3

Without bypassing: 4 stalls

lw:   IF:IF:DE:DE:RR:AL:DM:DM:RW
add:     IF:IF:DE:DE:DE:DE:DE:DE:RR:AL:RW

With bypassing: 2 stalls

lw:   IF:IF:DE:DE:RR:AL:DM:DM:RW
add:     IF:IF:DE:DE:DE:DE:RR:AL:RW
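The stall counts in Problem 4 can be checked with a short sketch. It assumes the add is fetched one cycle after the lw, that a value becomes usable in the cycle after the stage that produces it, and that stall cycles are spent repeating the last decode stage, as in the traces above. The helper names are illustrative.

```python
LW_STAGES = ["IF", "IF", "DE", "DE", "RR", "AL", "DM", "DM", "RW"]
ADD_STAGES = ["IF", "IF", "DE", "DE", "RR", "AL", "RW"]


def last_index(stages, name):
    """Index of the last occurrence of a stage name."""
    return len(stages) - 1 - stages[::-1].index(name)


def stall_trace(consumer, produced_after, needed_at):
    """Return (stalls, trace) for a consumer fetched one cycle after its producer.

    produced_after: index of the producer stage after which the value is usable
    needed_at:      index of the consumer stage that needs the value
    """
    ready = produced_after + 2      # producer stage k runs in cycle k + 1
    wanted = needed_at + 2          # consumer stage k runs in cycle k + 2
    stalls = max(0, ready - wanted)
    hold = last_index(consumer, "DE")
    trace = consumer[: hold + 1] + ["DE"] * stalls + consumer[hold + 1:]
    return stalls, ":".join(trace)


if __name__ == "__main__":
    # No bypassing: add's RR must follow lw's RW.
    print(stall_trace(ADD_STAGES, last_index(LW_STAGES, "RW"), ADD_STAGES.index("RR")))
    # Bypassing: add's ALU stage must follow lw's second DM stage.
    print(stall_trace(ADD_STAGES, last_index(LW_STAGES, "DM"), ADD_STAGES.index("AL")))
```

The deeper pipeline pays more for the same load-use dependence than the 5-stage design does, since the distance between the consuming stage and the producing stage grows with the extra DM and RR/RW stages.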
