Onur Digitaldesign - Comparch 2021 Lecture14 Pipelined Processor Design Afterlecture

Digital Design &

Computer Arch.
Lecture 14: Pipelined
Processor Design
Prof. Onur Mutlu

ETH Zürich
Spring 2021
22 April 2021
Required Readings
 Last week & This week
 Pipelining
 H&H, Chapter 7.5
 Pipelining Issues
 H&H, Chapter 7.8.1-7.8.3

 This week & Next week

 Out-of-order execution
 H&H, Chapter 7.8-7.9
 Smith and Sohi, “The Microarchitecture of Superscalar
Processors,” Proceedings of the IEEE, 1995
 More advanced pipelining
 Interrupt and exception handling
 Out-of-order and superscalar execution concepts

2
Agenda for Today & Next Few
Lectures
Earlier
 Single-cycle Microarchitectures
 Multi-cycle Microarchitectures

 Last week & today

 Pipelining
 Issues in Pipelining: Control & Data Dependence
Handling, State Maintenance and Recovery, …

 Tomorrow & Next week

 Out-of-Order Execution
 Issues in OoO Execution: Load-Store Handling, …

3
Review: Single-Cycle MIPS
Processor
[Figure: single-cycle MIPS datapath (PC, instruction memory, register file, sign extend, ALU, data memory) and control unit generating Jump, MemtoReg, MemWrite, Branch, ALUControl2:0, ALUSrc, RegDst, RegWrite, and PCSrc]
Review: Single-Cycle MIPS FSM
 Single-cycle machine

[Figure: combinational logic computes the next state AS' from the current state AS; sequential logic (state) holds AS]
AS: Architectural State

Can We Do Better?

6
Review: Multi-Cycle MIPS
Processor
[Figure: multi-cycle MIPS datapath (shared instruction/data memory, register file, sign extend, single ALU with ALUOut register) and control unit generating PCWrite, Branch, PCEn, IorD, MemWrite, IRWrite, PCSrc, ALUControl2:0, ALUSrcB1:0, ALUSrcA, RegWrite, MemtoReg, and RegDst]
Review: Multi-Cycle MIPS
FSM
[Figure: multi-cycle MIPS control FSM with states S0: Fetch, S1: Decode, S2: MemAdr, S3: MemRead, S4: Mem Writeback, S5: MemWrite, S6: Execute, S7: ALU Writeback, S8: Branch, S9: ADDI Execute, S10: ADDI Writeback, S11: Jump; transitions out of Decode are selected by Op (LW, SW, R-type, BEQ, ADDI, J)]
What is the shortcoming of this design?
What does this design assume about memory?
Can We Do Better?

9
Review: Pipelining Basic Idea
[Figure: the multi-cycle MIPS datapath and control unit, used to illustrate the basic idea of pipelining]
Of course, we need to be more careful than this!

Carnegie Mellon

Review: Pipelined Datapath & Control

[Figure: pipelined MIPS datapath and control; control signals are carried through the pipeline registers with stage suffixes D, E, M, W (RegWrite, MemtoReg, MemWrite, Branch, ALUControl, ALUSrc, RegDst), and PCSrcM is generated in the Memory stage]
 Same control unit as single-cycle processor
 Control delayed to proper pipeline stage
Review: Execution of Four
Independent ADDs
 Multi-cycle: 4 cycles per instruction

F D E W
        F D E W
                F D E W
                        F D E W
----> Time
 Pipelined: 4 cycles per 4 instructions (steady state); 1 instruction completed per cycle
F D E W
  F D E W
    F D E W
      F D E W
----> Time
Is life always this beautiful?

12
Review: Issues in Pipeline
Design
Balancing work in pipeline stages
 How many stages and what is done in each stage

 Keeping the pipeline correct, moving, and full in the

presence of events that disrupt pipeline flow
 Handling dependences
 Data
 Control
 Handling resource contention
 Handling long-latency (multi-cycle) operations

 Handling exceptions, interrupts

 Advanced: Improving pipeline throughput
 Minimizing stalls
13
Data Dependence
Handling: Concepts and
Implementation

14
Review: Data Dependence
Types
Flow dependence: Read-after-Write (RAW)
r3 ← r1 op r2
r5 ← r3 op r4

Anti dependence: Write-after-Read (WAR)
r3 ← r1 op r2
r1 ← r4 op r5

Output dependence: Write-after-Write (WAW)
r3 ← r1 op r2
r5 ← r3 op r4
r3 ← r6 op r7
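The three dependence types above can be checked mechanically by comparing register names. The following is a minimal sketch, assuming each instruction is reduced to one destination register and a tuple of source registers; the `Instr` type and the `dependences` function are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: one destination, several sources."""
    dst: str
    srcs: Tuple[str, ...]


def dependences(earlier: Instr, later: Instr) -> Set[str]:
    """Return the dependence types that `later` has on `earlier`."""
    found = set()
    if earlier.dst in later.srcs:
        found.add("RAW")  # flow: later reads what earlier writes
    if later.dst in earlier.srcs:
        found.add("WAR")  # anti: later overwrites what earlier reads
    if later.dst == earlier.dst:
        found.add("WAW")  # output: both write the same register
    return found


first = Instr("r3", ("r1", "r2"))
print(dependences(first, Instr("r5", ("r3", "r4"))))  # flow example
print(dependences(first, Instr("r1", ("r4", "r5"))))  # anti example
print(dependences(first, Instr("r3", ("r6", "r7"))))  # output example
```

Only the RAW case carries a real value from producer to consumer; WAR and WAW arise from register name reuse, which is why writing in the last stage and in program order is enough to handle them.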
Review: How to Handle Data
Dependences
Anti and output dependences are easier to handle
 write to the destination only in last stage and in
program order

 Flow dependences are more interesting

 Six fundamental ways of handling flow dependences

 Detect and wait until value is available in register file
 Detect and forward/bypass data to dependent
instruction
 Detect and eliminate the dependence at the software
level
 No need for the hardware to detect dependence
 Detect and move it out of the way for independent
instructions
16

Review: Pipeline Stall: Resolving Data
Dependence
[Figure: pipeline timing diagram (cycles t0 to t5, stages IF, ID, ALU, MEM, WB) for instructions Inst h through Inst l. Instruction i writes rx (i: rx ← _) and a dependent instruction j reads it (j: _ ← rx), shown at dist(i,j) = 1, 2, 3, and 4 with bubbles filling the intervening slots]
Stall = make the dependent instruction wait until its source data value is available
1. stop all up-stream stages
2. drain all down-stream stages
How to Implement Stalling
[Figure: five-stage pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers and control signals PCSrc, RegWrite, Branch, MemWrite, MemRead, ALUSrc, ALUOp, RegDst, and MemtoReg]
 Stall
 disable PC and IF/ID latching; ensure stalled instruction stays in its stage
 Insert “invalid” instructions/nops into the stage following the stalled one (called “bubbles”)
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
RAW Data Dependence Example
 One instruction writes a register ($s0) and next
instructions read this register => read after write
(RAW) dependence.

 add writes into $s0 in the first half of cycle 5
 and reads $s0 on cycle 3, obtaining the wrong value
 or reads $s0 on cycle 4, again obtaining the wrong value
 sub reads $s0 in 2nd half of cycle 5, getting the correct value
 subsequent instructions read the correct value of $s0
Wrong values are obtained only if the pipeline handles data dependences incorrectly!
[Figure: pipeline diagram over cycles 1-8 for add $s0, $s2, $s3; and $t0, $s0, $s1; or $t1, $s4, $s0; sub $t2, $s0, $s5]
Compile-Time Detection and
Elimination
[Figure: pipeline diagram over cycles 1-10: add $s0, $s2, $s3 is followed by nops before and $t0, $s0, $s1; or $t1, $s4, $s0; sub $t2, $s0, $s5]

 Insert enough NOPs for the required result to be

ready
 Or (if you can) move independent useful
instructions up
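NOP insertion at compile time can be sketched as a single pass over the instruction list. This is a minimal sketch under stated assumptions: instructions are hypothetical `(dst, srcs)` tuples, and `min_distance=3` is taken from the RAW example earlier (the third instruction after the producer reads the register in the second half of the cycle in which it is written). The function name `insert_nops` is illustrative, and the special role of register $0 is ignored.

```python
NOP = "nop"


def insert_nops(program, min_distance=3):
    """Insert NOPs so each consumer is at least `min_distance`
    instruction slots after the producer of any of its sources."""
    out = []
    for dst, srcs in program:
        needed = 0
        # Look back over the slots that are still too close.
        for back in range(1, min_distance):
            if back > len(out):
                break
            prev = out[-back]
            if prev != NOP and prev[0] in srcs:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append((dst, srcs))
    return out


program = [
    ("$s0", ("$s2", "$s3")),  # add $s0, $s2, $s3
    ("$t0", ("$s0", "$s1")),  # and $t0, $s0, $s1
    ("$t1", ("$s4", "$s0")),  # or  $t1, $s4, $s0
    ("$t2", ("$s0", "$s5")),  # sub $t2, $s0, $s5
]
for slot in insert_nops(program):
    print(slot)
```

A scheduler that moves independent useful instructions up would fill the same slots with real work instead of NOPs; this sketch only pads.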
Data Forwarding
 Also called Data Bypassing

 We have already seen the basic idea before

 Forward the result value to the dependent instruction
as soon as the value is available

 Remember dataflow?
 Data value supplied to dependent instruction as soon
as it is available
 Instruction executes when all its operands are
available

 Data forwarding brings a pipeline closer to data flow

execution principles
Data Forwarding

[Figure: pipeline diagram over cycles 1-8 for add $s0, $s2, $s3; and $t0, $s0, $s1; or $t1, $s4, $s0; sub $t2, $s0, $s5, with the add result forwarded to the dependent instructions]
Data Forwarding
[Figure: pipelined MIPS datapath with forwarding multiplexers on SrcAE and SrcBE (inputs 00, 01, 10), selected by ForwardAE and ForwardBE from the Hazard Unit, which receives RsE, RtE, WriteRegM, WriteRegW, RegWriteM, and RegWriteW]
Data Forwarding
 Forward to Execute stage from either:
 Memory stage or
 Writeback stage

 When should we forward from either Memory or

Writeback stage?
 If that stage will write to a destination register and the
destination register matches the source register.
 If both the Memory and Writeback stages contain
matching destination registers, the Memory stage
should have priority, because it contains the more
recently executed instruction.
Data Forwarding (in
Pseudocode)
Forward to Execute stage from either:
 Memory stage or
 Writeback stage

 Forwarding logic for ForwardAE (pseudo code):

if ((rsE != 0) AND (rsE == WriteRegM) AND RegWriteM) then
ForwardAE = 10 # forward from Memory stage
else if ((rsE != 0) AND (rsE == WriteRegW) AND RegWriteW) then
ForwardAE = 01 # forward from Writeback stage
else
ForwardAE = 00 # no forwarding

 Forwarding logic for ForwardBE same, but replace rsE

with rtE
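The pseudocode above translates directly into a small function. The sketch below is a Python model of that selection logic, not hardware; the function name `forward_select` and the use of integer register numbers are assumptions made for illustration.

```python
def forward_select(src_e, write_reg_m, reg_write_m, write_reg_w, reg_write_w):
    """Return the 2-bit forwarding mux select for one Execute-stage source.

    src_e is rsE for ForwardAE or rtE for ForwardBE.
    """
    if src_e != 0 and src_e == write_reg_m and reg_write_m:
        return 0b10  # forward from Memory stage (most recent result)
    if src_e != 0 and src_e == write_reg_w and reg_write_w:
        return 0b01  # forward from Writeback stage
    return 0b00      # no forwarding: use the register file value


# Register 16 is written by the instructions in both Memory and Writeback:
# the Memory stage wins because it holds the more recent instruction.
print(forward_select(16, 16, True, 16, True))
```

The check against register 0 reflects that $0 is hardwired to zero in MIPS, so a pending "write" to it must never be forwarded.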
Forwarding Is Not Always
Possible
[Figure: pipeline diagram over cycles 1-8 for lw $s0, 40($0); and $t0, $s0, $s1; or $t1, $s4, $s0; sub $t2, $s0, $s5. Trouble: the and instruction needs $s0 in its Execute stage before lw has read it from memory]

 Forwarding is sufficient to resolve most RAW data dependences

 Unfortunately, there are cases when forwarding is not possible
 due to pipeline design and instruction latencies
 The lw instruction does not finish reading data until the end of the Memory stage
 its result cannot be forwarded to the Execute stage of the next instruction
Stalling

[Figure: pipeline diagram over cycles 1-9 for lw $s0, 40($0); and $t0, $s0, $s1; or $t1, $s4, $s0; sub $t2, $s0, $s5. The and instruction repeats its Decode stage and the or instruction repeats its Fetch stage for one cycle (stall), after which the lw result can be forwarded]
Hardware Needed for Stalling
 Stalls are supported by
 adding enable inputs (EN) to the Fetch and Decode
pipeline registers
 and a synchronous reset/clear (CLR) input to the
Execute pipeline register
 or an INV bit associated with each pipeline register,
indicating that contents are INValid

 When a lw stall occurs

 StallD and StallF are asserted to force the Decode and
Fetch stage pipeline registers to hold their old values.
 FlushE is also asserted to clear the contents of the
Execute stage pipeline register, introducing a bubble
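The effect of StallF, StallD, and FlushE on one clock edge can be sketched with a toy register model. This is a simplified sketch, assuming each pipeline register holds just the name of the instruction in that stage; `PipelineRegister` and `lw_stall_cycle` are hypothetical names, and real registers carry data and control signals.

```python
class PipelineRegister:
    """Toy pipeline register with enable (EN) and synchronous clear (CLR)."""

    BUBBLE = "nop"

    def __init__(self, value=BUBBLE):
        self.value = value

    def clock(self, next_value, enable=True, clear=False):
        if clear:
            self.value = self.BUBBLE   # CLR: contents become a bubble
        elif enable:
            self.value = next_value    # normal latching
        # enable=False: hold the old value
        return self.value


def lw_stall_cycle(fetch, decode, execute, next_fetch):
    """One clock edge with StallF, StallD, and FlushE asserted."""
    # Update downstream registers first so old upstream values are seen.
    execute.clock(decode.value, clear=True)    # FlushE
    decode.clock(fetch.value, enable=False)    # StallD
    fetch.clock(next_fetch, enable=False)      # StallF


fetch = PipelineRegister("or")
decode = PipelineRegister("and")
execute = PipelineRegister("lw")
lw_stall_cycle(fetch, decode, execute, next_fetch="sub")
print(fetch.value, decode.value, execute.value)
```

After the stall edge the or and the and instructions stay where they were, and a bubble follows lw down the pipeline.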
Stalling and Dependence
Detection Hardware
[Figure: pipelined MIPS datapath with forwarding and stall support: EN inputs on the Fetch and Decode pipeline registers (driven by StallF and StallD), CLR on the Execute pipeline register (driven by FlushE), and MemtoRegE added as a Hazard Unit input alongside RegWriteM, RegWriteW, ForwardAE, and ForwardBE]
A Special Case of Data
Dependence
Control dependence
 Data dependence on the Instruction Pointer / Program
Counter

30
Control Dependence
 Question: What should the fetch PC be in the next
cycle?
 Answer: The address of the next instruction
 All instructions are control dependent on previous ones.
Why?

 If the fetched instruction is a non-control-flow

instruction:
 Next Fetch PC is the address of the next-sequential
instruction
 Easy to determine if we know the size of the fetched
instruction

 If the instruction that is fetched is a control-flow

instruction: 31

Control Dependences
 Special case of data dependence: dependence on PC
 beq:
 branch is not resolved until the fourth stage of the pipeline
 Instructions after the branch are fetched before branch is resolved
Always predict that the next sequential instruction is fetched
 Called “Always not taken” prediction
 These instructions must be flushed if the branch is taken

 Branch misprediction penalty

 number of instructions flushed when branch is taken
 May be reduced by resolving the branch earlier


Control Dependence: Original Pipeline

[Figure: pipelined MIPS datapath with Hazard Unit; the branch is resolved in the Memory stage, where BranchM and ZeroM produce PCSrcM, which selects PCBranchM]

Control Dependence
[Figure: pipeline diagram over cycles 1-9 for the instructions at addresses 20 (beq $t1, $t2, 40), 24 (and $t0, $s0, $s1), 28 (or $t1, $s4, $s0), 2C (sub $t2, $s0, $s5), and the branch target 64 (slt $t3, $s2, $s3). When the branch is taken, the and, or, and sub instructions fetched after it are flushed]

Early Branch Resolution

[Figure: pipelined MIPS datapath with early branch resolution: an equality comparator (EqualD) on the register file outputs and the branch target adder (PCBranchD) are placed in the Decode stage, so PCSrcD is generated in Decode]
Introduces another data dependence in Decode stage.

Early Branch Resolution

[Figure: pipeline diagram over cycles 1-9 for the instructions at addresses 20 (beq $t1, $t2, 40), 24 (and $t0, $s0, $s1), 28 (or $t1, $s4, $s0), 2C (sub $t2, $s0, $s5), and the branch target 64 (slt $t3, $s2, $s3). With early branch resolution only the and instruction is fetched after the taken branch and flushed]

Early Branch Resolution: Good Idea?

 Advantages
 Reduced branch misprediction penalty
 Reduced CPI (cycles per instruction)

 Disadvantages
 Potential increase in clock cycle time?
 Higher clock period and lower frequency?
 Additional hardware cost
 Specialized and likely not used by other instructions


Data Forwarding for Early Branch Resolution

[Figure: pipelined MIPS datapath with early branch resolution and forwarding multiplexers in the Decode stage, selected by ForwardAD and ForwardBD from the Hazard Unit (additional Hazard Unit inputs: BranchD, RegWriteE, MemtoRegE)]
Data forwarding for early branch resolution.

Forwarding and Stalling Hardware Control

// Forwarding logic (Decode-stage operands for early branch resolution):
assign ForwardAD = (rsD != 0) & (rsD == WriteRegM) & RegWriteM;
assign ForwardBD = (rtD != 0) & (rtD == WriteRegM) & RegWriteM;

// Stalling logic:
assign lwstall = ((rsD == rtE) | (rtD == rtE)) & MemtoRegE;

assign branchstall = (BranchD & RegWriteE &
                      (WriteRegE == rsD | WriteRegE == rtD))
                     |
                     (BranchD & MemtoRegM &
                      (WriteRegM == rsD | WriteRegM == rtD));

// Stall signals:
assign StallF = lwstall | branchstall;
assign StallD = lwstall | branchstall;
assign FlushE = lwstall | branchstall;
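To experiment with the stall conditions without a Verilog simulator, the same expressions can be mirrored in software. The sketch below is a Python model under stated assumptions: register fields are plain integers, control signals are booleans, and the function name `hazard_unit` and its dictionary result are illustrative choices, not part of the H&H design.

```python
def hazard_unit(rs_d, rt_d, rt_e, write_reg_e, write_reg_m,
                branch_d, reg_write_e, mem_to_reg_e, mem_to_reg_m):
    """Model of the lwstall/branchstall logic; returns the stall signals."""
    # A load in Execute whose destination (rtE) is a source of the
    # instruction in Decode.
    lwstall = (rs_d == rt_e or rt_d == rt_e) and mem_to_reg_e

    # A branch in Decode that needs a result still being computed in
    # Execute, or still being loaded in Memory.
    branchstall = (
        (branch_d and reg_write_e and write_reg_e in (rs_d, rt_d))
        or (branch_d and mem_to_reg_m and write_reg_m in (rs_d, rt_d))
    )

    stall = bool(lwstall or branchstall)
    return {"StallF": stall, "StallD": stall, "FlushE": stall}


# lw $s0 (register 16) is in Execute; and $t0, $s0, $s1 is in Decode.
print(hazard_unit(rs_d=16, rt_d=17, rt_e=16, write_reg_e=16, write_reg_m=0,
                  branch_d=False, reg_write_e=True,
                  mem_to_reg_e=True, mem_to_reg_m=False))
```

All three outputs share one value because a stall holds Fetch and Decode and, in the same cycle, inserts a bubble into Execute.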


Final Pipelined MIPS Processor (H&H)

Includes data dependence detection, early branch resolution, forwarding, stall logic

Doing Better: Smarter Branch Prediction

 Guess whether branch will be taken
 Backward branches are usually taken (loops)
 Consider history of whether branch was previously taken to
improve the guess

 Good prediction reduces the fraction of branches

requiring a flush
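The benefit of using history can be illustrated by counting mispredictions on a single loop branch. This is a minimal sketch, assuming a hypothetical branch that is taken nine times and then falls through; the "last outcome" scheme below is one simple history-based guess chosen for illustration, not a predictor described in these slides.

```python
def mispredictions_always_not_taken(outcomes):
    """Always predict not taken: every taken branch is a misprediction."""
    return sum(1 for taken in outcomes if taken)


def mispredictions_last_outcome(outcomes, initial=False):
    """Predict that the branch repeats its previous outcome."""
    prediction = initial
    wrong = 0
    for taken in outcomes:
        if prediction != taken:
            wrong += 1
        prediction = taken  # remember the most recent outcome
    return wrong


# Hypothetical loop branch: taken 9 times, then not taken on loop exit.
loop_branch = [True] * 9 + [False]
print(mispredictions_always_not_taken(loop_branch))
print(mispredictions_last_outcome(loop_branch))
```

Each avoided misprediction saves the flush penalty, which is how a better guess lowers CPI.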

41
More on Branch Prediction (I)
More on Branch Prediction (II)
More on Branch Prediction (III)

https://www.youtube.com/onurmutlulectures
Lectures on Branch Prediction
 Digital Design & Computer Architecture, Spring 2020, Lecture 16b
 Branch Prediction I (ETH Zurich, Spring 2020)
 https://www.youtube.com/watch?v=h6l9yYSyZHM&list=PL5Q2soXY2Zi_FRrloMa2fUYWPGiZUBQo2&index=22

 Digital Design & Computer Architecture, Spring 2020, Lecture 17
 Branch Prediction II (ETH Zurich, Spring 2020)
 https://www.youtube.com/watch?v=z77VpggShvg&list=PL5Q2soXY2Zi_FRrloMa2fUYWPGiZUBQo2&index=23

 Computer Architecture, Spring 2015, Lecture 5
 Advanced Branch Prediction (CMU, Spring 2015)
 https://www.youtube.com/watch?v=yDjsr-jTOtk&list=PL5PHm2jkkXmgVhh8CHAu9N76TShJqfYDt&index=4
Pipelined Performance
Example


Pipelined Performance Example

 SPECINT2017 benchmark:
 25% loads
 10% stores
 11% branches
 2% jumps
 52% R-type

 Suppose:
 40% of loads used by next instruction
 25% of branches mispredicted

 All jumps flush next instruction

 What is the average CPI?

Pipelined Performance Example Solution

 Load/Branch CPI = 1 when no stall/flush, 2 when stall/flush.
Thus:
 CPIlw = 1(0.6) + 2(0.4) = 1.4 (average CPI for load)
 CPIbeq = 1(0.75) + 2(0.25) = 1.25 (average CPI for branch)

 And
 Average CPI = (0.25)(1.4)     load
             + (0.10)(1)       store
             + (0.11)(1.25)    beq
             + (0.02)(2)       jump
             + (0.52)(1)       R-type
             = 1.1475 ≈ 1.15
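The weighted sum above is easy to recompute when the instruction mix or the stall fractions change. The following sketch repeats the calculation with the numbers from this example; the dictionary and variable names are illustrative.

```python
# Instruction mix from the example (fractions of all instructions).
mix = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "rtype": 0.52}

# Fractions that cost one extra cycle.
load_use_fraction = 0.40      # loads whose result is used by the next instruction
mispredict_fraction = 0.25    # branches that are mispredicted

cpi = {
    "load": 1 * (1 - load_use_fraction) + 2 * load_use_fraction,
    "store": 1,
    "branch": 1 * (1 - mispredict_fraction) + 2 * mispredict_fraction,
    "jump": 2,                # every jump flushes the next instruction
    "rtype": 1,
}

average_cpi = sum(mix[kind] * cpi[kind] for kind in mix)
print(f"CPI_lw = {cpi['load']:.2f}, CPI_beq = {cpi['branch']:.2f}")
print(f"Average CPI = {average_cpi:.4f}")
```

Changing `mispredict_fraction` shows directly how much a better branch predictor would lower the average CPI for this mix.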


Pipelined Performance
 There are 5 stages, and 5 different timing paths:

Tc = max {
    tpcq + tmem + tsetup                                  (fetch)
    2(tRFread + tmux + teq + tAND + tmux + tsetup)        (decode)
    tpcq + tmux + tmux + tALU + tsetup                    (execute)
    tpcq + tmemwrite + tsetup                             (memory)
    2(tpcq + tmux + tRFwrite)                             (writeback)
}
 The operation speed depends on the slowest operation
 Decode and Writeback use register file and have only half a clock cycle to complete, hence the factor of 2 in their terms

Pipelined Performance Example

Element                 Parameter    Delay (ps)
Register clock-to-Q     tpcq_PC      30
Register setup          tsetup       20
Multiplexer             tmux         25
ALU                     tALU         200
Memory read             tmem         250
Register file read      tRFread      150
Register file setup     tRFsetup     20
Equality comparator     teq          40
AND gate                tAND         15
Memory write            tmemwrite    220
Register file write     tRFwrite     100

Tc = 2(tRFread + tmux + teq + tAND + tmux + tsetup)
   = 2[150 + 25 + 40 + 15 + 25 + 20] ps
   = 550 ps

Pipelined Performance Example

 For a program with 100 billion instructions executing on a
pipelined MIPS processor:
 CPI = 1.15
 Tc = 550 ps

 Execution Time = (# instructions) × CPI × Tc

= (100 × 10^9)(1.15)(550 × 10^-12)
= 63.25 ≈ 63 seconds
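The cycle time and execution time can be checked by evaluating all five timing paths with the tabulated delays. This is a sketch using the element delays from the table; the dictionary keys and variable names are illustrative.

```python
# Element delays in picoseconds, from the table in this example.
t = {
    "pcq": 30, "setup": 20, "mux": 25, "alu": 200, "mem": 250,
    "rf_read": 150, "eq": 40, "and": 15, "memwrite": 220, "rf_write": 100,
}

# One timing path per stage; Decode and Writeback get half a cycle.
paths_ps = {
    "fetch": t["pcq"] + t["mem"] + t["setup"],
    "decode": 2 * (t["rf_read"] + t["mux"] + t["eq"] + t["and"]
                   + t["mux"] + t["setup"]),
    "execute": t["pcq"] + t["mux"] + t["mux"] + t["alu"] + t["setup"],
    "memory": t["pcq"] + t["memwrite"] + t["setup"],
    "writeback": 2 * (t["pcq"] + t["mux"] + t["rf_write"]),
}

critical_stage = max(paths_ps, key=paths_ps.get)
tc_ps = paths_ps[critical_stage]

instructions = 100e9
cpi = 1.15
execution_time_s = instructions * cpi * tc_ps * 1e-12

print(paths_ps)
print(f"Tc = {tc_ps} ps (limited by {critical_stage})")
print(f"Execution time = {execution_time_s:.2f} s")
```

With these delays the Decode path sets the clock period, which is the cost of early branch resolution noted among its disadvantages.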


Performance Summary for MIPS arch.

Processor       Execution Time (seconds)    Speedup (single-cycle is baseline)
Single-cycle    95                          1
Multicycle      133                         0.71
Pipelined       63                          1.51

 Fastest of the three MIPS architectures is Pipelined.

 However, even though we have 5 fold pipelining, it is not
5 times faster than single cycle.

53
Recall: How to Handle Data
Dependences
Anti and output dependences are easier to handle
 write to the destination only in last stage and in program order

 Flow dependences are more interesting

 Six fundamental ways of handling flow dependences
 Detect and wait until value is available in register file
 Detect and forward/bypass data to dependent instruction
 Detect and eliminate the dependence at the software level
 No need for the hardware to detect dependence
 Detect and move it out of the way for independent instructions

Questions to Ponder
 What is the role of the hardware vs. the software in
data dependence handling?
 Software based interlocking
 Hardware based interlocking
 Who inserts/manages the pipeline bubbles?
 Who finds the independent instructions to fill “empty”
pipeline slots?
 What are the advantages/disadvantages of each?
 Think of the performance equation as well

56
Questions to Ponder
 What is the role of the hardware vs. the software in
the order in which instructions are executed in the pipeline?
 Software based instruction scheduling → static scheduling
 Hardware based instruction scheduling → dynamic scheduling

 How does each impact different metrics?

 Performance (and parts of the performance equation)
 Complexity
 Power consumption
 Reliability
 …
57
More on Software vs. Hardware
 Software based scheduling of instructions → static scheduling
 Compiler orders the instructions, hardware executes
them in that order
 Contrast this with dynamic scheduling (in which
hardware can execute instructions out of the compiler-
specified order)
 How does the compiler know the latency of each
instruction?

 What information does the compiler not know that

makes static scheduling difficult?
 Answer: Anything that is determined at run time
 Variable-length operation latency, memory addr, branch
direction
More on Static Instruction
Scheduling

https://www.youtube.com/onurmutlulectures
Lectures on Static Instruction
Scheduling
 Computer Architecture, Spring 2015, Lecture 16
 Static Instruction Scheduling (CMU, Spring 2015)
 https://www.youtube.com/watch?v=isBEVkIjgGA&list=PL5PHm2jkkXmi5CxxI7b3JCL1TWybTDtKq&index=18

 Computer Architecture, Spring 2013, Lecture 21
 Static Instruction Scheduling (CMU, Spring 2013)
 https://www.youtube.com/watch?v=XdDUn2WtkRg&list=PL5PHm2jkkXmidJOd59REog9jDnPDTG6IJ&index=21
Fine-Grained Multithreading
 Idea: Hardware has multiple thread contexts
(PC+registers). Each cycle, fetch engine fetches from
a different thread.
 By the time the fetched branch/instruction resolves, no
instruction is fetched from the same thread
 Branch/instruction resolution latency overlapped with
execution of other threads’ instructions

+ No logic needed for handling control and

data dependences within a thread
-- Single thread performance suffers
-- Extra logic for keeping thread contexts
-- Does not overlap latency if not enough
threads to cover the whole pipeline
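The last disadvantage can be made concrete by counting how many instructions of one thread share the pipeline under round-robin fetch. This is a minimal sketch, assuming strict round-robin thread selection and a pipeline in which every instruction occupies one stage per cycle; `fetch_order` and `max_in_flight` are hypothetical helper names.

```python
def fetch_order(num_threads, cycles):
    """Thread id fetched in each cycle under strict round-robin."""
    return [cycle % num_threads for cycle in range(cycles)]


def max_in_flight(order, pipeline_depth):
    """Largest number of instructions from one thread that are in the
    pipeline at the same time (any window of `pipeline_depth` cycles)."""
    worst = 0
    for start in range(len(order)):
        window = order[start:start + pipeline_depth]
        for thread in set(window):
            worst = max(worst, window.count(thread))
    return worst


depth = 5
print(max_in_flight(fetch_order(5, 20), depth))  # enough threads
print(max_in_flight(fetch_order(3, 20), depth))  # too few threads
```

When the number of threads is at least the pipeline depth, each thread has at most one instruction in flight, so no intra-thread dependence checking is needed; with fewer threads, two instructions of one thread overlap and the latency is no longer hidden.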
63
Fine-Grained Multithreading (II)
 Idea: Switch to another thread every cycle such that
no two instructions from a thread are in the pipeline
concurrently

 Tolerates the control and data dependence latencies

by overlapping the latency with useful work from
other threads
 Improves pipeline utilization by taking advantage of
multiple threads

 Thornton, “Parallel Operation in the Control Data 6600,”

AFIPS 1964.
 Smith, “A pipelined, shared resource MIMD computer,”
ICPP 1978.
64
Fine-Grained Multithreading:
History
CDC 6600’s peripheral processing unit is fine-grained
multithreaded
 Thornton, “Parallel Operation in the Control Data 6600,” AFIPS
1964.
 Processor executes a different I/O thread every cycle
 An operation from the same thread is executed every 10
cycles

 Denelcor HEP (Heterogeneous Element Processor)

 Smith, “A pipelined, shared resource MIMD computer,” ICPP 1978.
 120 threads/processor
 available queue vs. unavailable (waiting) queue for threads
 each thread can have only 1 instruction in the processor pipeline;
each thread independent
 to each thread, processor looks like a non-pipelined machine
 system throughput vs. single thread performance tradeoff
65
Fine-Grained Multithreading in
HEP
Cycle time: 100 ns

 8 stages → 800 ns to complete an instruction
 assuming no memory access

 No control and data dependence checking

Burton Smith (1941-2018)

66
Multithreaded Pipeline Example

Slide credit: Joel Emer 67

Sun Niagara Multithreaded
Pipeline

Kongetira et al., “Niagara: A 32-Way Multithreaded Sparc Processor,” IEEE Micro 2005.
68
Fine-Grained Multithreading
 Advantages
+ No need for dependence checking between instructions
(only one instruction in pipeline from a single thread)
+ No need for branch prediction logic
+ Otherwise-bubble cycles used for executing useful instructions
from different threads
+ Improved system throughput, latency tolerance, utilization

 Disadvantages
- Extra hardware complexity: multiple hardware contexts (PCs,
register files, …), thread selection logic
- Reduced single thread performance (one instruction fetched
every N cycles from the same thread)
- Resource contention between threads in caches and memory
- Some dependence checking logic between threads remains
(load/store) 69
Modern GPUs are
FGMT Machines

70
NVIDIA GeForce GTX 285
“core”

64 KB of storage
… for thread
contexts
(registers)

= data-parallel (SIMD) func. unit, = instruction stream decode

control shared across 8 units
= multiply-add = execution context storage
= multiply

71
Slide credit: Kayvon Fatahalian
NVIDIA GeForce GTX 285
“core”

64 KB of storage for thread contexts (registers)
 Groups of 32 threads share instruction stream (each group is a Warp): they execute the same instruction on different data
 Up to 32 warps are interleaved in an FGMT manner
 Up to 1024 thread contexts can be stored
Slide credit: Kayvon Fatahalian
NVIDIA GeForce GTX 285

Tex Tex
… … … … … …

30 cores on the GTX 285: 30,720 threads

73
Slide credit: Kayvon Fatahalian
Further Reading for the
Interested (I)

Burton Smith
(1941-2018)

74
Further Reading for the
Interested (II)

75
We did not cover the
following slides. They are for
your benefit.
We will cover them in future
lectures.

77
Pipelining and Precise
Exceptions: Preserving
Sequential Semantics
Multi-Cycle Execution
 Not all instructions take the same amount of time
for “execution”
 Idea: Have multiple different functional units that
take different number of cycles
 Can be pipelined or not pipelined
 Can let independent instructions start execution on a
different functional unit before a previous long-latency
instruction finishes execution
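[Figure: Fetch (F) and Decode (D) stages feed parallel functional units with different execute (E) latencies: integer add (one E stage), integer mul (four E stages), FP mul (eight E stages), and load/store (eight or more E stages)]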
Issues in Pipelining: Multi-Cycle
Execute
 Instructions can take different number of cycles in

EXECUTE stage
 Integer ADD versus FP MULtiply

FMUL R4 ← R1, R2    F D E E E E E E E E W
ADD R3 ← R1, R2       F D E W
                        F D E W
                          F D E W

FMUL R2 ← R5, R6            F D E E E E E E E E W
ADD R7 ← R5, R6               F D E W
                                F D E W

 What is wrong with this picture in a Von Neumann

architecture?
 Sequential semantics of the ISA NOT preserved!
 What if FMUL incurs an exception?
80
Exceptions vs. Interrupts
 Cause
 Exceptions: internal to the running thread
 Interrupts: external to the running thread

 When to Handle
 Exceptions: when detected (and known to be non-
speculative)
 Interrupts: when convenient
 Except for very high priority ones
 Power failure
 Machine check (error)

 Priority: process (exception), depends (interrupt)

 Handling Context: process (exception), system 81

Precise Exceptions/Interrupts
 The architectural state should be consistent
(precise) when the exception/interrupt is ready to
be handled

1. All previous instructions should be completely

retired.

2. No later instruction should be retired.

Retire = commit = finish execution and update arch.

state

82
Checking for and Handling Exceptions
in Pipelining
 When the oldest instruction ready-to-be-retired is
detected to have caused an exception, the control
logic

 Ensures architectural state is precise (register file, PC,

memory)

 Flushes all younger instructions in the pipeline

 Saves PC and registers (as specified by the ISA)

 Redirects the fetch engine to the appropriate exception

handling routine
83
Why Do We Want Precise
Exceptions?
Semantics of the von Neumann model ISA specifies
it
 Remember von Neumann vs. Dataflow

 Aids software debugging

 Enables (easy) recovery from exceptions

 Enables (easily) restartable processes

 Enables traps into software (e.g., software

implemented opcodes)

84
Ensuring Precise Exceptions in
Pipelining
Idea: Make each operation take the same amount of
time
FMUL R3 ← R1, R2    F D E E E E E E E E W
ADD R4 ← R1, R2       F D E E E E E E E E W
                        F D E E E E E E E E W
                          F D E E E E E E E E W
                            F D E E E E E E E E W
                              F D E E E E E E E E W
                                F D E E E E E E E E W

 Downside
 Worst-case instruction latency determines all
instructions’ latency
 What about memory operations?
 Each functional unit takes worst-case number of cycles?
85
Solutions
 Reorder buffer

 History buffer
 Future register file
 Checkpointing
(We will not cover history buffer, future register file, or checkpointing)

 Suggested reading
 Smith and Pleszkun, “Implementing Precise Interrupts in Pipelined Processors,” IEEE Trans. on Computers 1988 and ISCA 1985.
Recall: Solution I: Reorder Buffer (ROB)
 Idea: Complete instructions out-of-order, but reorder them before making results visible to architectural state
 When an instruction is decoded, it reserves the next-sequential entry in the ROB
 When an instruction completes, it writes its result into its ROB entry
 When an instruction is the oldest in the ROB and has completed without exceptions, its result is moved to the register file or memory

[Figure: Instruction Cache, Register File, three Func Units, Reorder Buffer]
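
The three rules above (allocate at decode, write the result on completion, move the result to the register file only from the oldest entry) can be sketched in Python. This is a simplified sketch under stated assumptions: entries are plain dictionaries, only register results are modeled (no stores), and the class and method names are hypothetical.

```python
from collections import deque


class ReorderBuffer:
    """Entries are allocated in program order and retired from the head."""

    def __init__(self, size):
        self.size = size
        self.entries = deque()   # oldest entry on the left
        self.next_tag = 0
        self.regfile = {}        # architectural register file

    def allocate(self, dest_reg):
        """Decode: reserve the next-sequential entry and return its tag."""
        if len(self.entries) == self.size:
            raise RuntimeError("ROB full: decode must stall")
        entry = {"tag": self.next_tag, "dest": dest_reg, "value": None,
                 "done": False, "exception": False}
        self.entries.append(entry)
        self.next_tag += 1
        return entry["tag"]

    def complete(self, tag, value, exception=False):
        """Execution finished (possibly out of order): write result into the entry."""
        for e in self.entries:
            if e["tag"] == tag:
                e.update(value=value, done=True, exception=exception)
                return
        raise KeyError(tag)

    def retire(self):
        """Retire completed entries from the head, in program order."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            head = self.entries[0]
            if head["exception"]:
                self.entries.clear()   # flush; the handler would run next
                break
            self.entries.popleft()
            self.regfile[head["dest"]] = head["value"]
            retired.append(head["tag"])
        return retired


rob = ReorderBuffer(4)
fmul, add = rob.allocate("R3"), rob.allocate("R4")
rob.complete(add, 7)      # the younger ADD finishes first
print(rob.retire())       # nothing retires: FMUL is still the oldest entry
rob.complete(fmul, 5)
print(rob.retire(), rob.regfile)
```

The younger instruction's result sits in the buffer until the older one completes, so the register file is only ever updated in program order.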

Reorder Buffer
 Buffers information about all instructions that are
decoded but not yet retired/committed

What’s in a ROB Entry?
ROB entry fields: V | DestRegID | DestRegVal | StoreAddr | StoreData | PC | Exception? (plus valid bits for reg/data and control bits)

 Everything required to:
 correctly reorder instructions back into the program order
 update the architectural state with the instruction’s result(s), if the instruction can retire without any issues
 handle an exception/interrupt precisely, if an exception/interrupt needs to be handled before retiring the instruction

 Need valid bits to keep track of readiness of the result(s) and find out if the instruction has completed execution
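
One possible way to write the entry layout down is as a Python dataclass. The field names mirror the slide; the separate `dest_reg_val_valid` and `store_valid` bits and the `is_store` control bit are assumptions about how the "valid bits for reg/data + control bits" might be split up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROBEntry:
    valid: bool = False                  # V: entry is allocated
    dest_reg_id: Optional[int] = None    # DestRegID (None if no register result)
    dest_reg_val: Optional[int] = None   # DestRegVal
    dest_reg_val_valid: bool = False     # valid bit for the register result (assumed)
    is_store: bool = False               # control bit (assumed)
    store_addr: Optional[int] = None     # StoreAddr
    store_data: Optional[int] = None     # StoreData
    store_valid: bool = False            # valid bit for store address and data (assumed)
    pc: int = 0                          # PC
    exception: bool = False              # Exception?

    def completed(self) -> bool:
        """True once every result this instruction produces is ready."""
        if not self.valid:
            return False
        reg_ok = self.dest_reg_id is None or self.dest_reg_val_valid
        store_ok = (not self.is_store) or self.store_valid
        return reg_ok and store_ok


entry = ROBEntry(valid=True, dest_reg_id=3, pc=0x100)
print(entry.completed())                       # result not yet produced
entry.dest_reg_val, entry.dest_reg_val_valid = 42, True
print(entry.completed())                       # result ready
```

The PC and exception flag are carried along so that, if the entry reaches the head with `exception` set, the handler can be invoked with a precise state.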
Reorder Buffer: Independent
Operations
 Result first written to ROB on instruction completion
 Result written to register file at commit time

F D E E E E E E E E R W
F D E R W
F D E R W
F D E R W
F D E E E E E E E E R W
F D E R W
F D E R W

 What if a later instruction needs a value in the reorder

buffer?
 One option: stall the operation → stall the pipeline
 Better: Read the value from the reorder buffer. How?
Reorder Buffer: How to Access?
 A register value can be in the register file, reorder
buffer, (or bypass/forwarding paths)

[Figure: Instruction Cache, Register File, three Func Units, Reorder Buffer, bypass paths]
 Register File: Random Access Memory (indexed with register ID, which is the address of an entry)
 Reorder Buffer: Content Addressable Memory (searched with register ID, which is part of the content of an entry)
Simplifying Reorder Buffer
Access
 Idea: Use indirection

 Access register file first (check if the register is valid)
 If register not valid, register file stores the ID of the reorder buffer entry that contains (or will contain) the value of the register
 Mapping of the register to a ROB entry: Register file maps the register to a reorder buffer entry if there is an in-flight instruction writing to the register

 Access reorder buffer next

 Now, reorder buffer does not need to be content addressable
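
The indirection described above can be sketched as a register file that holds either a valid value or the ID of the producing ROB entry. This is a minimal sketch with hypothetical names (`RegisterFile`, `rename_dest`, `read_operand`); the ROB is modeled as a dictionary indexed directly by entry ID.

```python
class RegisterFile:
    def __init__(self, num_regs):
        self.value = [0] * num_regs
        self.valid = [True] * num_regs
        self.rob_tag = [None] * num_regs   # ROB entry ID when the register is not valid

    def rename_dest(self, reg, tag):
        """An in-flight instruction will write `reg`: point it at ROB entry `tag`."""
        self.valid[reg] = False
        self.rob_tag[reg] = tag

    def commit(self, reg, tag, value):
        """Retirement writes the value; re-validate only if `tag` is still the latest writer."""
        self.value[reg] = value
        if self.rob_tag[reg] == tag:
            self.valid[reg] = True
            self.rob_tag[reg] = None


def read_operand(regfile, rob, reg):
    """Return ("value", v) if available, else ("wait", tag) of the producing ROB entry."""
    if regfile.valid[reg]:
        return ("value", regfile.value[reg])
    tag = regfile.rob_tag[reg]
    entry = rob[tag]                       # direct index: no associative search
    if entry["done"]:
        return ("value", entry["value"])
    return ("wait", tag)


rf = RegisterFile(8)
rob = {0: {"done": False, "value": None}}
rf.rename_dest(3, 0)
print(read_operand(rf, rob, 3))            # producer has not completed yet
rob[0].update(done=True, value=99)
print(read_operand(rf, rob, 3))            # value read out of the ROB entry
```

Because the register file supplies the entry ID, the reorder buffer is looked up by index rather than searched by register ID.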
Reorder Buffer in Intel Pentium
III

Hinton et al., “The Microarchitecture of the Pentium 4 Processor,” Intel Technology Journal, 2001.

Important: Register Renaming with a
Reorder Buffer
 Output and anti dependencies are not true
dependencies
 WHY? The same register refers to values that have
nothing to do with each other
 They exist due to lack of register ID’s (i.e.
names) in the ISA

 The register ID is renamed to the reorder buffer

entry that will hold the register’s value
 Register ID → ROB entry ID
 Architectural register ID → Physical register ID
 After renaming, ROB entry ID used to refer to the register

 This eliminates anti and output dependencies

Recall: Data Dependence Types
True (flow) dependence -- Read-after-Write (RAW)
r3 ← r1 op r2
r5 ← r3 op r4

Anti dependence -- Write-after-Read (WAR)
r3 ← r1 op r2
r1 ← r4 op r5

Output dependence -- Write-after-Write (WAW)
r3 ← r1 op r2
r5 ← r3 op r4
r3 ← r6 op r7
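
The three dependence types, and the effect of renaming on them, can be checked with a short Python sketch. It is illustrative only: instructions are modeled as `(dest, src1, src2)` tuples, the pairwise check ignores intervening writes to the same register, and the fresh names `rob0`, `rob1`, ... merely stand in for ROB entry IDs.

```python
def dependences(program):
    """program: list of (dest, src1, src2). Returns a set of (kind, earlier, later)."""
    found = set()
    for j, (dj, *srcs_j) in enumerate(program):
        for i in range(j):
            di, *srcs_i = program[i]
            if di in srcs_j:
                found.add(("RAW", i, j))   # later reads what earlier wrote
            if dj in srcs_i:
                found.add(("WAR", i, j))   # later overwrites what earlier read
            if dj == di:
                found.add(("WAW", i, j))   # both write the same register
    return found


def rename(program):
    """Give every destination a fresh name (standing in for a ROB entry ID)."""
    latest = {}                            # architectural register -> current name
    renamed = []
    for tag, (dest, *srcs) in enumerate(program):
        new_srcs = [latest.get(s, s) for s in srcs]
        new_dest = f"rob{tag}"
        latest[dest] = new_dest
        renamed.append((new_dest, *new_srcs))
    return renamed


prog = [("r3", "r1", "r2"), ("r5", "r3", "r4"), ("r3", "r6", "r7")]
print(sorted(dependences(prog)))
print(sorted(dependences(rename(prog))))
```

On the output-dependence example from the slide, the original code shows RAW, WAR, and WAW pairs, while the renamed code keeps only the RAW pair between the first two instructions.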
In-Order Pipeline with Reorder
Buffer
 Decode (D): Access regfile/ROB, allocate entry in ROB, check if instruction can execute, if so dispatch instruction
 Execute (E): Instructions can complete out-of-order
 Completion (R): Write result to reorder buffer
 Retirement/Commit (W): Check for exceptions; if none, write result to architectural register file or memory; else, flush pipeline and start from exception handler
 In-order dispatch/execution, out-of-order completion, in-order retirement

[Figure: F, D, then parallel execution units (Integer add: E; Integer mul: E E E E; FP mul: E E E E E E E E; Load/store: E E E E E E E E ...), then R, W]
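
The timing behavior of this pipeline (out-of-order completion in R, in-order retirement in W) can be sketched with a simple cycle calculation. The model is deliberately idealized and its assumptions are not from the slides: one instruction fetched per cycle, no data dependences or structural hazards, and at most one retirement per cycle.

```python
def schedule(exec_latencies):
    """Return a list of (R_cycle, W_cycle) per instruction; cycles are 1-based."""
    timeline = []
    prev_w = 0
    for i, lat in enumerate(exec_latencies):
        fetch = i + 1
        decode = fetch + 1
        last_e = decode + lat           # final execute cycle
        r = last_e + 1                  # completion: write result into the ROB
        w = max(r + 1, prev_w + 1)      # retirement: strictly in program order
        timeline.append((r, w))
        prev_w = w
    return timeline


# An 8-cycle operation followed by two 1-cycle operations.
for r, w in schedule([8, 1, 1]):
    print(f"R at cycle {r}, W at cycle {w}")
```

In this run the two short instructions reach R before the long one does, but their W cycles are pushed back behind it, so retirement order matches program order.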

Reorder Buffer Tradeoffs
 Advantages
 Conceptually simple for supporting precise exceptions
 Can eliminate false dependences

 Disadvantages
 Reorder buffer needs to be accessed to get the results that are yet to be written to the register file
 CAM or indirection → increased latency and complexity

 Other solutions aim to eliminate the disadvantages
 History buffer
 Future file
 Checkpointing
(We will not cover these)