SRM Pipelining 05

Chapter 5 discusses pipelining in CPU architecture, highlighting its impact on performance through parallel execution of instructions. It covers various hazards that can occur during pipelining, including structure, data, and control hazards, and presents solutions such as interlocking and forwarding to mitigate these issues. The chapter also emphasizes the importance of instruction set architecture (ISA) design in facilitating efficient pipelining.

Chapter 5: Pipelining

Introduction
• CPU performance factors
– Instruction count
• Determined by ISA and compiler
– CPI and Cycle time
• Determined by CPU hardware
• We will examine two CPU implementations
– A simplified version
– A more realistic and pipelined version
• A simple subset that shows the most important aspects
– Memory reference: ld/lw, sd/sw
– Arithmetic-logical: add, sub, and, and or
– Conditional branch: beq (branch if equal)

§4.5 An Overview of Pipelining
Pipelining Analogy
• Laundry example: Ann, Brian, Cathy, and Dave
  each have one load of clothes
  to wash, dry, fold, and put away
  – Washer takes 30 minutes
  – Dryer takes 30 minutes
  – "Folder" takes 30 minutes
  – "Putter" takes 30 minutes
• One load: 120 minutes
Pipelining: It's Natural!
• Pipelined laundry: overlapping execution
  – Parallelism improves performance
■ Four loads: speedup = 8/3.5 = 2.3
■ Non-stop: speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages

Important to note:
Each load still takes 120 minutes.
The improvement is in throughput for the 4 loads.
It is more complicated if stages take different
amounts of time.
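The two speedup figures above can be checked with a short calculation. This Python sketch is illustrative only: the function name and the choice to express the 30-minute stages in hours are my own.

```python
# Laundry pipelining: 4 stages (wash, dry, fold, put away),
# each 0.5 h, so one load takes 2 h sequentially.
def speedup(n_loads, n_stages=4, stage_hours=0.5):
    sequential = n_loads * n_stages * stage_hours             # 2n hours
    # Pipelined: first load takes all stages, then one more load
    # finishes every stage time -> 0.5n + 1.5 hours for 4 stages.
    pipelined = n_stages * stage_hours + (n_loads - 1) * stage_hours
    return sequential / pipelined

print(round(speedup(4), 1))       # -> 2.3  (= 8 / 3.5)
print(round(speedup(10**6), 3))   # -> 4.0  (approaches the stage count)
```

For large n the ratio 2n / (0.5n + 1.5) tends to 4, matching the "speedup ≈ number of stages" claim.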
RISC-V Pipeline
Five stages, one step per stage
1. IF: Instruction Fetch from memory
2. ID: Instruction Decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result Back to register

Graphical Representation of Instruction Pipeline

• IF: Instruction Fetch from memory


– Box representing instruction memory
– Right-half shade representing usage of IM at the second half of the cycle
• ID: Instruction Decode & register read
– Box representing register
– Right-half shade representing usage (read) of Register at the second half of the
cycle
• EX: Execute operation or calculate address
– Shade representing usage
• MEM: Access memory operand (only for load/store)
– White background representing NOT used by add instruction in this example
• WB: Write result Back to register (only for load and AL instructions)
– Box representing register
– Left-half shade representing write to register at the first half of the cycle
Classic 5-Stage Pipeline for a RISC
• In each cycle, hardware
initiates a new instruction
and executes some part of
five different instructions:
– Simple

Clock number
Instruction number 1 2 3 4 5 6 7 8 9
Instruction i IF ID EX MEM WB
Instruction i+1 IF ID EX MEM WB
Instruction i+2 IF ID EX MEM WB
Instruction i+3 IF ID EX MEM WB
Instruction i+4 IF ID EX MEM WB
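The staircase shape of the table above can be generated mechanically. The Python sketch below (names are my own) prints one row per instruction, assuming a classic 5-stage pipeline with no stalls.

```python
# Prints a staircase pipeline diagram: instruction i occupies
# stage s in cycle i + s (1-indexed cycles in the header).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def diagram(n_instructions):
    n_cycles = n_instructions + len(STAGES) - 1
    header = "instr " + " ".join(f"{c:>4}" for c in range(1, n_cycles + 1))
    rows = [header]
    for i in range(n_instructions):
        cells = [""] * n_cycles
        for s, name in enumerate(STAGES):
            cells[i + s] = name          # instruction i, stage s, cycle i+s
        rows.append(f"i+{i:<4}" + " ".join(f"{c:>4}" for c in cells))
    return "\n".join(rows)

print(diagram(5))
```

Five instructions occupy 5 + 5 - 1 = 9 cycles, matching the clock numbers 1–9 in the table.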
Pipeline Performance

• Assume time for stages is


– 100ps for register read or write
– 200ps for other stages
• Compare pipelined datapath with single-cycle datapath

Pipeline Performance
• Single-cycle (Tc = 800ps): 2400ps for three instructions
vs
• Pipelined (Tc = 200ps): 1400ps for three instructions
• For a large number of instructions, say 1M, the speedup is
  ~= 800ps/200ps = 4
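A quick sanity check of these numbers, assuming the 800 ps / 200 ps cycle times above and five pipeline stages (function names are my own):

```python
# Total execution time for n instructions on each datapath.
def single_cycle(n, tc=800):
    return n * tc                      # one 800 ps cycle per instruction

def pipelined(n, tc=200, stages=5):
    # First instruction takes `stages` cycles; each later one adds 1.
    return (stages + n - 1) * tc

print(single_cycle(3))                       # -> 2400 (ps)
print(pipelined(3))                          # -> 1400 (ps)
n = 10**6
print(single_cycle(n) / pipelined(n))        # close to 4 (= 800/200)
```

The pipeline fill time (four extra cycles) is why three instructions see only 2400/1400 ≈ 1.7x, while a million instructions approach the full 4x.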
Pipeline Speedup
• We execute billions of instructions, so throughput is what matters
• Pipelining doesn't help the latency of a single instruction
• Potential speedup = number of pipeline stages
• Unbalanced lengths of pipeline stages reduce the speedup


Pipelining and ISA Design
• RISC ISA designed for pipelining
– All instructions are 32-bits
• Easier to fetch and decode in one cycle
• c.f. x86: 1- to 17-byte instructions

– Few and regular instruction formats


• Can decode and read registers in one step

– Load/store addressing
• Can calculate address in 3rd stage, access memory in 4th stage

– Alignment of memory operands


• Memory access takes only one cycle

Hazards

• Situations that prevent starting the next instruction in the
  next cycle

• Structural hazards
  – A required resource is busy
• Data hazards
  – Need to wait for a previous instruction to complete its data
    read/write
• Control hazards (caused by a branch or jump)
  – Deciding on the control action depends on a previous instruction
Structural Hazards
• Conflict for use of a resource
– Find a situation in laundry example?
• In the RISC-V pipeline with a single memory,
  IF and MEM conflict
  – A load/store requires a memory access
  – An instruction fetch would have to stall for
    that cycle
• The stall causes a pipeline "bubble"
• Hence, pipelined datapaths require
  separate instruction/data memories
  – Or separate instruction/data caches
One Memory Port → Structural Hazard
(Figure: pipeline diagram over cycles 1–7. A load followed by
Instr 1–4, each flowing through Ifetch, Reg, ALU, DMem, Reg.
With a single memory port, the load's DMem access in cycle 4
conflicts with Instr 3's Ifetch in the same cycle.)
One Memory Port / Structural Hazard
(Figure: the same pipeline diagram with only one memory.
Instr 3's Ifetch stalls for one cycle, inserting a bubble that
propagates through every stage of the pipeline.)

How do you "bubble" the pipe? → Insert a no-op
Summary of Structural Hazards
• To address a structural hazard, provide separate memories for
  instructions and data
• However, this increases cost
  – E.g., pipelining functional units or duplicating resources is
    expensive
• If the structural hazard is rare, it may not be
  worth the cost to avoid it
Data Hazards
• An instruction needs data produced
by a previous instruction
– Read-After-Write (RAW) data dependency
add x1, x2, x3
sub x4, x1, x5

– sub would read the old value of x1 at cycle 3, before add
  writes the new value back in cycle 5
Data Hazards and Solution #1: Interlocking
• An instruction needs data produced by a previous instruction
– Read-After-Write (RAW) data dependency
add x1, x2, x3
sub x4, x1, x5
• Interlock: hardware detects the dependency and
  – Inserts no-op instructions (e.g., "add x0, x0, x0") as bubbles
  – Wastes 400ps: two bubbles in between, since sub must wait
    two stages for add to write its result x1 to the register file

add x1, x2, x3
(two-cycle delay on x1)
sub x4, x1, x5


Solution #2: Forwarding (aka Bypassing)
• Use the result right after it is computed, instead of waiting for it to be
  stored in a register
– add produces the result at the end of its EXE stage
– sub uses the result at the beginning of its EXE stage, which is right after the cycle
for add’s EXE
– Requires extra connections in the datapath

Load-Use Data Hazard
• A load produces its result only after the MEM stage
  – sub would use the result at the beginning of its EXE stage, which is
    in the same cycle as the load's MEM, so forwarding is not possible
• Can't avoid the stall by forwarding for load-use
  – The value is not yet computed when it is needed
  – Can't forward backward in time!

One cycle delay!

Code Scheduling to Avoid Stalls (Software
Solution)
• Reorder code to avoid use of load result in the next
instruction
• C code for a = b + e; c = b + f;
Before (13 cycles):      After scheduling (11 cycles):
ld  x1, 0(x31)           ld  x1, 0(x31)
ld  x2, 8(x31)           ld  x2, 8(x31)
(stall)                  ld  x4, 16(x31)
add x3, x1, x2           add x3, x1, x2
sd  x3, 24(x31)          sd  x3, 24(x31)
ld  x4, 16(x31)          add x5, x1, x4
add x5, x1, x4           sd  x5, 32(x31)
(stall)
sd  x5, 32(x31)
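The 13-vs-11 cycle counts can be reproduced by counting load-use stalls. The Python sketch below assumes full forwarding, so only an instruction immediately following a load that produces one of its sources costs a stall; the (op, dest, sources) tuple encoding is a hypothetical representation, not a real assembler.

```python
def cycles(prog, stages=5):
    """Cycle count for a 5-stage pipeline with full forwarding,
    where only a load-use pair costs one stall cycle.
    prog: list of (op, dest, sources) tuples."""
    stalls = 0
    for prev, cur in zip(prog, prog[1:]):
        if prev[0] == "ld" and prev[1] in cur[2]:   # load feeding next instr
            stalls += 1
    # n instructions + (stages - 1) fill cycles + stall cycles
    return len(prog) + (stages - 1) + stalls

before = [
    ("ld", "x1", []), ("ld", "x2", []),
    ("add", "x3", ["x1", "x2"]), ("sd", None, ["x3"]),
    ("ld", "x4", []),
    ("add", "x5", ["x1", "x4"]), ("sd", None, ["x5"]),
]
# Rescheduled: hoist the third load above the first add.
after = [before[0], before[1], before[4],
         before[2], before[3], before[5], before[6]]
print(cycles(before), cycles(after))   # -> 13 11
```

Moving ld x4 up separates each load from its consumer by one instruction, which forwarding can cover, removing both stalls.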

To Check Cycles Delayed and How Forward Works
in Different Cases
• In the 5-stage pipeline, check whether the results can be
generated before it is being used
– If so, forwarding
– If not, stall
• Load-Use
• Produce-Store
– sw rs2, offset(rs1)
• sw needs rs1 to be ready at the EXE stage
• sw needs rs2 to be ready at the MEM stage
Case 1: EXE-to-EXE forwarding     Case 2: EXE-to-MEM forwarding
add x9, x7, x8                    add x9, x7, x8
sw  x10, 32(x9)                   sw  x9, 32(x31)
(x9 needed as address in EXE)     (x9 needed as store data in MEM)
2-cycle delay if no forwarding    2-cycle delay if no forwarding
No delay with forwarding          No delay with forwarding
Control Hazards
• Branch determines flow of control
– Fetching next instruction depends on branch outcome
– Pipeline might fetch an incorrect instruction in the cycle after a
  beq instruction is fetched
• The pipeline is still working on the ID stage of the branch

• In RISC-V pipeline
– Need to compare registers and compute target early in the
pipeline
– Add hardware to do it in ID stage

Stall on Branch
• Wait until branch outcome determined before fetching next
instruction
– One cycle stall (bubble) if branch condition is determined at ID
stage
– Two cycles stall if branch condition is determined at EXE stage

One cycle delay!

Branch Prediction
• Longer pipelines can’t readily determine branch outcome
early
– Stall penalty becomes unacceptable
• Predict outcome of branch
– Only stall if prediction is wrong
• In pipeline
– Can predict branches not taken
– Fetch instruction after branch, with no delay

RISC-V with Predict Not Taken
(Figure: two pipeline diagrams. When the prediction is correct,
the instruction after the branch completes with no penalty; when
it is incorrect, the wrongly fetched instruction is flushed and
the target instruction is fetched instead.)
More-Realistic Branch Prediction
• Static branch prediction
– Based on typical branch behavior
– Example: loop and if-statement branches
• Predict backward branches taken
• Predict forward branches not taken

• Dynamic branch prediction


– Hardware measures actual branch behavior
• e.g., record recent history of each branch
– Assume future behavior will continue the trend
• When wrong, stall while re-fetching, and update history

Pipeline Summary

The BIG
Picture
• Pipelining improves performance by increasing instruction
throughput
– Executes multiple instructions in parallel
– Each instruction has the same latency
• Subject to hazards
– Structure, data, control
• Instruction set design affects complexity of pipeline
implementation

Pipeline Execution Diagram: Steps
1. Identify RAW dependencies between two instructions that are adjacent or
   have one instruction in between
   – AL-use: 2-cycle delay without forwarding, no delay with forwarding
   – Load-use: 2-cycle delay without forwarding, 1-cycle delay with forwarding
     • With forwarding, rescheduling the load can eliminate even that 1-cycle
       delay
   – No need to look for RAW dependencies between instructions that are far
     apart (>= 2 instructions in between)
     • Thus only check pairs of instructions that execute one after another or
       have one other instruction in between
2. Identify branch instructions
   – 1-cycle delay (or 2-cycle delay) depending on the implementation (question)
3. Pipeline diagrams (4 situations)
   – a) No pipeline at all: one cycle per stage, no overlap
   – b) Pipeline with no forwarding: 2-cycle delay for AL-use, load-use, and
     beq (EXE outcome)
   – c) Pipeline with forwarding: 1-cycle delay for load-use, 2-cycle delay
     for beq
   – d) Pipeline with forwarding and load-use rescheduling: reschedule
     instructions to eliminate the 1-cycle load-use delay
• No two instructions can be in the same stage in the same cycle
   – That would be a structural hazard

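Step 1 above can be mechanized: scan each pair of instructions that are adjacent or one apart and report the register they share. A Python sketch follows; the (op, dest, sources) tuple encoding is hypothetical, and intervening redefinitions of a register are ignored for simplicity.

```python
def raw_hazards(prog):
    """Find RAW dependencies that can delay a 5-stage pipeline:
    only pairs that are adjacent or have one instruction between
    them matter (step 1 above). prog: list of (op, dest, sources).
    Returns (writer_idx, reader_idx, register, is_load_use)."""
    hazards = []
    for i, (op, dest, _) in enumerate(prog):
        if dest is None:                       # e.g. stores write no register
            continue
        for j in range(i + 1, min(i + 3, len(prog))):   # distance 1 or 2 only
            if dest in prog[j][2]:
                hazards.append((i, j, dest, op == "lw"))
    return hazards

# Fragment of the loop body from the example above:
prog = [
    ("add", "x7", ["x22", "x6"]),
    ("lw",  "x9", ["x7"]),
    ("lw",  "x10", ["x7"]),
    ("add", "x9", ["x10", "x9"]),
]
for i, j, reg, load_use in raw_hazards(prog):
    print(i, j, reg, "load-use" if load_use else "AL-use")
```

On this fragment the checker reports the same dependencies as the table on the next slide: x7 feeding both loads, and two load-use hazards into the add.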
for (i=1; i<M-1; i++) B2[i] = B[i-1] + B[i] + B[i+1];
• Base address B and B2 are in register x22 and x23. i is stored in
register x5, M is stored in x4. Using beq (==) for (<)
add x5, x0, 1 // i = 1
add x22, x4, -1 // loop bound x22 has M-1
LOOP: beq x5, x22, Exit
slliw x6, x5, 2 // x6 now stores i*4; slliw is i << 2 (shift left logical)
add x7, x22, x6 // x7 now stores address of B[i].
lw x9, 0(x7) // load B[i] from memory location (x7+0) to x9
lw x10, -4(x7) // load B[i-1] to x10
add x9, x10, x9 // x9 = B[i] + B[i-1]
lw x10, 4(x7) //load B[i+1] to x10
add x9, x10, x9 // x9 = B[i-1] + B[i] + B[i+1]
add x8, x23, x6 // x8 now stores the address of B2[i]
sw x9, 0(x8) // store value for B2[i] from register x9 to memory (x8+0)
addi x5, x5, 1 // i++
beq x0, x0, LOOP
Exit:
for (i=1; i<M-1; i++) B2[i] = B[i-1] + B[i] + B[i+1];
• Base address B and B2 are in register x22 and x23. i is stored in
register x5, M is stored in x4. Using beq (==) for (<)
1. add x5, x0, 1
2. add x22, x4, -1
3. LOOP: beq x5, x22, Exit
4. slliw x6, x5, 2
5. add x7, x22, x6
6. lw x9, 0(x7)
7. lw x10, -4(x7)
8. add x9, x10, x9
9. lw x10, 4(x7)
10. add x9, x10, x9
11. add x8, x23, x6
12. sw x9, 0(x8)
13. addi x5, x5, 1
14. beq x0, x0, LOOP
15. Exit:

RAW Dependencies
Writer               Reader              Register  # instr. between  Load-use
add x5, x0, 1        beq x5, x22, Exit   x5        1
add x22, x4, -1      beq x5, x22, Exit   x22       0
slliw x6, x5, 2      add x7, x22, x6     x6        0
add x7, x22, x6      lw x9, 0(x7)        x7        0
add x7, x22, x6      lw x10, -4(x7)      x7        1
lw x9, 0(x7)         add x9, x10, x9     x9        1                 Y
lw x10, -4(x7)       add x9, x10, x9     x10       0                 Y
lw x10, 4(x7)        add x9, x10, x9     x10       0                 Y
add x9, x10, x9      sw x9, 0(x8)        x9        1
add x8, x23, x6      sw x9, 0(x8)        x8        0
addi x5, x5, 1       beq x5, x22, Exit   x5        1
Examples
Writer               Reader              Register  # instr. between  Load-use
add x5, x0, 1        beq x5, x22, Exit   x5        1
add x22, x4, -1      beq x5, x22, Exit   x22       0
slliw x6, x5, 2      add x7, x22, x6     x6        0
add x7, x22, x6      lw x9, 0(x7)        x7        0
add x7, x22, x6      lw x10, -4(x7)      x7        1
lw x9, 0(x7)         add x9, x10, x9     x9        1                 Y
lw x10, -4(x7)       add x9, x10, x9     x10       0                 Y
lw x10, 4(x7)        add x9, x10, x9     x10       0                 Y
add x9, x10, x9      sw x9, 0(x8)        x9        1
add x8, x23, x6      sw x9, 0(x8)        x8        0
addi x5, x5, 1       beq x5, x22, Exit   x5        1
§4.9 Exceptions
Exceptions and Interrupts

• “Unexpected” events requiring change


in flow of control
– Different ISAs use the terms differently
• Exception
– Arises within the CPU
• e.g., undefined opcode, overflow, syscall, …

• Interrupt
– From an external I/O controller
• Dealing with them without sacrificing performance is
hard

Handling Exceptions
• In MIPS, exceptions managed by a System Control
Coprocessor (CP0)
• Save PC of offending (or interrupted) instruction
– In MIPS: Exception Program Counter (EPC)
• Save indication of the problem
– In MIPS: Cause register
– We’ll assume 1-bit
• 0 for undefined opcode, 1 for overflow
• Jump to handler at 8000 0180

An Alternate Mechanism
• Vectored Interrupts
– Handler address determined by the cause
• Example:
– Undefined opcode: C000 0000
– Overflow: C000 0020
– …: C000 0040
• Instructions either
– Deal with the interrupt, or
– Jump to real handler

Handler Actions
• Read cause, and transfer to relevant handler
• Determine action required
• If restartable
– Take corrective action
– use EPC to return to program
• Otherwise
– Terminate program
– Report error using EPC, cause, …

Exceptions in a Pipeline
• Another form of control hazard
• Consider overflow on add in EX stage
add $1, $2, $1
– Prevent $1 from being clobbered
– Complete previous instructions
– Flush add and subsequent instructions
– Set Cause and EPC register values
– Transfer control to handler
• Similar to mispredicted branch
– Use much of the same hardware

1-Bit Predictor: Shortcoming
• Inner loop branches mispredicted twice!
outer: …

inner: …

beq …, …, inner

beq …, …, outer

■ Mispredict as taken on the last iteration of the inner loop
■ Then mispredict as not taken on the first iteration of the
  inner loop the next time around
2-Bit Predictor
• Only change prediction on two successive mispredictions
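A minimal model of the 2-bit saturating counter, assuming states 0–1 predict not-taken and 2–3 predict taken (the class name and state encoding are my own):

```python
class TwoBitPredictor:
    """2-bit saturating counter: two successive mispredictions
    are needed before the prediction flips."""
    def __init__(self, state=0):
        self.state = state            # 0 = strongly NT ... 3 = strongly T

    def predict(self):
        return self.state >= 2        # True = predict taken

    def update(self, taken):
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)

# Inner loop: branch taken 3 times then falls through, run twice.
p = TwoBitPredictor(state=3)
mispredicts = 0
for _ in range(2):
    for taken in [True, True, True, False]:
        if p.predict() != taken:
            mispredicts += 1
        p.update(taken)
print(mispredicts)    # -> 2
```

Only the two loop-exit branches mispredict; a 1-bit predictor on the same pattern would mispredict four times (exit and re-entry of each pass), which is the shortcoming the previous slide describes.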

Calculating the Branch Target
• Even with predictor, still need to calculate the target address
– 1-cycle penalty for a taken branch
• Branch target buffer
– Cache of target addresses
– Indexed by PC when instruction fetched
• If hit and instruction is branch predicted taken, can fetch target
immediately
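The branch target buffer can be pictured as a small direct-mapped cache indexed by low bits of the PC. This Python sketch is illustrative only: the entry count, indexing scheme, and class name are assumptions, not a real design.

```python
class BranchTargetBuffer:
    """Direct-mapped cache of branch target addresses, indexed by
    the PC when the instruction is fetched."""
    def __init__(self, entries=16):
        self.entries = entries
        self.table = {}               # index -> (tag = full PC, target)

    def lookup(self, pc):
        """Return the cached target on a hit, else None (miss)."""
        index = (pc >> 2) % self.entries   # drop byte offset, take low bits
        entry = self.table.get(index)
        if entry and entry[0] == pc:       # tag check against full PC
            return entry[1]
        return None

    def insert(self, pc, target):
        self.table[(pc >> 2) % self.entries] = (pc, target)

btb = BranchTargetBuffer()
btb.insert(0x1000, 0x2000)
print(hex(btb.lookup(0x1000) or 0))   # -> 0x2000 (hit: fetch target now)
print(btb.lookup(0x1004))             # -> None (miss: fetch fall-through)
```

On a hit for a branch predicted taken, the fetch stage can redirect to the cached target in the very next cycle, avoiding the 1-cycle taken-branch penalty.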

Dynamic Branch Prediction

• In deeper and superscalar pipelines, branch penalty is


more significant
• Use dynamic prediction
– Branch prediction buffer (aka branch history table)
– Indexed by recent branch instruction addresses
– Stores outcome (taken/not taken)
– To execute a branch
• Check table, expect the same outcome
• Start fetching from fall-through or target
• If wrong, flush pipeline and flip prediction

