Week 11

Computer Architecture

EE-371/CS-330
Spring 2019

Hasan Baig
[email protected]

Habib University

The contents of these lecture slides are prepared with the help of the official lecture slides of the book "Computer Organization and Design – RISC-V Edition" by Patterson and Hennessy.
2
Recap
3
Performance Issues

• Longest delay determines clock period
  – Critical path: load instruction
  – Instruction memory → register file → ALU → data memory → register file
• Not feasible to vary period for different instructions
• Violates design principle
  – Making the common case fast
• We will improve performance by pipelining
4
Pipelining Restaurant Analogy

A buffet/delivery restaurant in which processes execute one at a time.

Tasks:
1. Customer – grab food and dine in
2. Delivery guy – deliver food
3. Worker – purchase groceries (check groceries)

[Figure: restaurant floor plan — token counter (take/give order), cash counter, kitchen (grab food), dining hall, goodies collection, delivery address]
5
Pipelining Laundry Analogy

An implementation technique in which multiple instructions are overlapped in execution.

• Four loads:
  – Speedup = 8/3.5 = 2.3
• Non-stop:
  – Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
6
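The laundry arithmetic on this slide can be sketched directly: each load takes 2 hours unpipelined, while the pipelined line finishes one load every half hour after the pipeline fills. A minimal check of both speedup figures (the function names and the 2-hour/0.5-hour constants are taken from the slide's analogy, not from any standard library):

```python
# Laundry-analogy speedup from the slide: 2 hours per load unpipelined;
# pipelined, one load completes every 0.5 h once the pipeline is full.

def nonpipelined_time(n):
    return 2.0 * n            # n loads, strictly sequential

def pipelined_time(n):
    return 0.5 * n + 1.5      # first load takes 2 h, each later load adds 0.5 h

for n in (4, 1000):
    print(n, round(nonpipelined_time(n) / pipelined_time(n), 2))
```

For n = 4 this reproduces the slide's 8/3.5 ≈ 2.3; as n grows the ratio approaches 4, the number of stages.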
Pipelining
Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
7
Pipelining Example

• Assume time for stages is
  – 100 ps for register read or write
  – 200 ps for other stages
• Compare pipelined datapath with single-cycle datapath

Instr    | Instr fetch | Register read | ALU op | Memory access | Register write | Total time
ld       | 200 ps      | 100 ps        | 200 ps | 200 ps        | 100 ps         | 800 ps
sd       | 200 ps      | 100 ps        | 200 ps | 200 ps        |                | 700 ps
R-format | 200 ps      | 100 ps        | 200 ps |               | 100 ps         | 600 ps
beq      | 200 ps      | 100 ps        | 200 ps |               |                | 500 ps
8
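The per-instruction totals in the table above follow from summing only the stages each instruction actually uses. A small sketch of that bookkeeping (the stage names are labels chosen here for readability):

```python
# Per-stage delays from the slide: 100 ps for register read/write,
# 200 ps for the other stages.
STAGE_PS = {"IF": 200, "reg_read": 100, "ALU": 200, "MEM": 200, "reg_write": 100}

# Stages each instruction class actually uses.
STAGES_USED = {
    "ld":       ["IF", "reg_read", "ALU", "MEM", "reg_write"],
    "sd":       ["IF", "reg_read", "ALU", "MEM"],
    "R-format": ["IF", "reg_read", "ALU", "reg_write"],
    "beq":      ["IF", "reg_read", "ALU"],
}

totals = {i: sum(STAGE_PS[s] for s in used) for i, used in STAGES_USED.items()}
print(totals)  # {'ld': 800, 'sd': 700, 'R-format': 600, 'beq': 500}
```

The single-cycle clock must accommodate the slowest case (ld at 800 ps), which is exactly the critical-path argument from the recap slide.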
Pipelining Example

Instr    | Instr fetch | Register read | ALU op | Memory access | Register write | Total time
ld       | 200 ps      | 100 ps        | 200 ps | 200 ps        | 100 ps         | 800 ps
sd       | 200 ps      | 100 ps        | 200 ps | 200 ps        |                | 700 ps
R-format | 200 ps      | 100 ps        | 200 ps |               | 100 ps         | 600 ps
beq      | 200 ps      | 100 ps        | 200 ps |               |                | 500 ps

Single-cycle (Tc = 800 ps)
9
Pipelining Example

Instr    | Instr fetch | Register read | ALU op | Memory access | Register write | Total time
ld       | 200 ps      | 100 ps        | 200 ps | 200 ps        | 100 ps         | 800 ps
sd       | 200 ps      | 100 ps        | 200 ps | 200 ps        |                | 700 ps
R-format | 200 ps      | 100 ps        | 200 ps |               | 100 ps         | 600 ps
beq      | 200 ps      | 100 ps        | 200 ps |               |                | 500 ps

Pipelined (Tc = 200 ps)
10
Pipelining Speedup

• If all stages are balanced
  – i.e., all take the same time
  – Time between instructions (pipelined)
    = Time between instructions (nonpipelined) / Number of stages
• If not balanced, speedup is less
• Speedup due to increased throughput
  – Latency (time for each instruction) does not decrease
11
Pipelining Speedup

[Figure: timing diagrams for three instructions — nonpipelined total time T = 2400 ps, pipelined total time T = 1400 ps]
12
Pipelining Speedup

Instructions = 1,000,000

Pipelined: each instruction will add 200 ps
  Total time = 1,000,000 × 200 + 1,400 = 200,001,400 ps

Non-pipelined: each instruction will add 800 ps
  Total time = 1,000,000 × 800 + 2,400 = 800,002,400 ps
13
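The totals above extend the earlier 3-instruction example (1400 ps pipelined, 2400 ps nonpipelined) by a million instructions, each adding one more clock period. A quick sketch of that arithmetic and the resulting speedup:

```python
# Extending the 3-instruction example from the previous slide by
# 1,000,000 instructions: each extra instruction adds one clock period.
n = 1_000_000
pipelined = n * 200 + 1_400       # 200 ps per added instruction
nonpipelined = n * 800 + 2_400    # 800 ps per added instruction

print(pipelined, nonpipelined)    # 200001400 800002400
print(nonpipelined / pipelined)   # just under 4, the ideal 800/200 ratio
```

This shows why speedup approaches, but never quite reaches, the stage count: the pipeline fill time (the constant 1,400 ps) never amortizes away completely.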
Latency Exercise
14
Latency Solution

1. R-type: 30 + 250 + 150 + 25 + 200 + 25 + 20 = 700 ps
2. ld: 30 + 250 + 150 + 25 + 200 + 250 + 25 + 20 = 950 ps
3. sd: 30 + 250 + 150 + 200 + 25 + 250 = 905 ps
4. beq: 30 + 250 + 150 + 25 + 200 + 5 + 25 + 20 = 705 ps
5. I-type: 30 + 250 + 150 + 25 + 200 + 25 + 20 = 700 ps


15
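The sums in the solution above can be double-checked mechanically. The per-component delays below are copied from the solution lines themselves (the exercise's original component table is not reproduced on these slides, so the grouping into lists is only a transcription, not an independent datapath model):

```python
# Per-instruction component latencies, transcribed from the solution slide.
latencies = {
    "R-type": [30, 250, 150, 25, 200, 25, 20],
    "ld":     [30, 250, 150, 25, 200, 250, 25, 20],
    "sd":     [30, 250, 150, 200, 25, 250],
    "beq":    [30, 250, 150, 25, 200, 5, 25, 20],
    "I-type": [30, 250, 150, 25, 200, 25, 20],
}

totals = {instr: sum(parts) for instr, parts in latencies.items()}
print(totals)  # {'R-type': 700, 'ld': 950, 'sd': 905, 'beq': 705, 'I-type': 700}
```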
Recap Problems in single-cycle processor

• Longest delay determines clock period
  – Critical path: load instruction
  – Instruction memory → register file → ALU → data memory → register file
• Not feasible to vary period for different instructions
• Violates design principle
  – Making the common case fast
• We will improve performance by pipelining
16
Recap
17
Quick Review Stages in Processor
Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
18
Recap

• Data read in the second half of a clock cycle
• Data write in the first half of a clock cycle
19
Pipelining and ISA Design

• RISC-V ISA designed for pipelining
  – All instructions are 32 bits
    • Easier to fetch and decode in one cycle
    • c.f. x86: 1- to 17-byte instructions
  – Few and regular instruction formats
    • Can decode and read registers in one step
20
Pipelining Hazards

• Situations that prevent starting the next instruction in the next cycle → Hazards
• Structural hazard
  – A required resource is busy
• Data hazard
  – Need to wait for previous instruction to complete its data read/write
• Control hazard
  – Deciding on control action depends on previous instruction
21
Pipelining Structural Hazards

When a planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that are set to execute.

• Conflict for use of a resource
• In RISC-V pipeline with a single memory
  – Load/store requires data access
  – Instruction fetch would have to stall for that cycle
• Would cause a pipeline "bubble"
• Hence, pipelined datapaths require separate instruction/data memories
22
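The single-memory conflict can be made concrete with a toy model. This is only an illustrative sketch under an assumed timing convention (stage k of instruction i occupies cycle i + k, no stalls): a load's MEM access in some cycle collides with whichever instruction is in IF that same cycle.

```python
# Toy model of the single-memory structural hazard: in a 5-stage
# pipeline (IF=0, ID=1, EX=2, MEM=3, WB=4), instruction i is in
# stage k during cycle i + k. With one shared memory, a ld/sd in
# MEM collides with the instruction doing IF in the same cycle.

def mem_if_conflicts(instrs):
    """Return the cycles in which a load/store's MEM access would
    conflict with another instruction's fetch."""
    conflicts = []
    for i, op in enumerate(instrs):
        if op in ("ld", "sd"):
            mem_cycle = i + 3
            if mem_cycle < len(instrs):   # some instruction fetches that cycle
                conflicts.append(mem_cycle)
    return conflicts

print(mem_if_conflicts(["ld", "add", "add", "add", "sub"]))  # [3]
```

With separate instruction and data memories, as the slide concludes, the conflict list is empty by construction.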
Pipelining Data Hazards

Data hazards occur when the pipeline must be stalled because one step must wait for another to complete.

add x19, x0, x1
sub x2, x19, x3
23
Pipelining Data Hazards

• Use result when it is computed
  – Don't wait for it to be stored in a register
  – Requires extra connections in the datapath

Forwarding – Also called bypassing. A method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer-visible registers or memory.
25
Pipelining Data Hazards

• Forwarding paths are valid only if the destination stage is later in time than the source stage
  – Source: output of the MEM stage in the first instruction
  – Destination: input to the EX stage
26
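The decision to forward can be written as a simple predicate. This sketch follows the standard Patterson & Hennessy pipeline-register naming (EX/MEM, ID/EX); the function name and its boolean-flag signature are conventions chosen here, not hardware described on the slide:

```python
# Classic EX-hazard forwarding test: forward the ALU result sitting in
# the EX/MEM pipeline register to the EX stage input when the previous
# instruction writes the register the current instruction is reading.

def forward_from_ex_mem(ex_mem_regwrite, ex_mem_rd, id_ex_rs):
    return (ex_mem_regwrite          # previous instruction writes a register
            and ex_mem_rd != 0       # x0 is hardwired to zero, never forwarded
            and ex_mem_rd == id_ex_rs)  # and it is the register we need

# add x19, x0, x1 followed by sub x2, x19, x3: x19 must be forwarded.
print(forward_from_ex_mem(True, 19, 19))  # True
```

Note the x0 guard: since x0 always reads as zero, forwarding a "result" destined for it would be wrong.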
Pipelining Data Hazards

• Load-use data hazards
• Can't always avoid stalls by forwarding
  – If value not computed when needed
  – Can't use forwarding backward in time!
27
Pipelining Data Hazards

Code scheduling to avoid stalls
• Reorder code to avoid use of a load result in the next instruction
• C code for a = b + e; c = b + f;
  (assume all variables are in memory, addressed at offsets from x31)

Original order (13 cycles):     Reordered (11 cycles):
ld x1, 0(x31)                   ld x1, 0(x31)
ld x2, 8(x31)                   ld x2, 8(x31)
(stall)                         ld x4, 16(x31)
add x3, x1, x2                  add x3, x1, x2
sd x3, 24(x31)                  sd x3, 24(x31)
ld x4, 16(x31)                  add x5, x1, x4
add x5, x1, x4                  sd x5, 32(x31)
(stall)
sd x5, 32(x31)
28
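The cycle counts above can be reproduced with a small stall counter. This sketch assumes the usual model for this 5-stage pipeline: n instructions take n + 4 cycles, plus one bubble whenever an instruction reads a register loaded by the immediately preceding ld (forwarding resolves every other dependence); the tuple encoding of the programs is an ad hoc choice for illustration:

```python
# Count load-use stalls: one bubble when an instruction reads a register
# that the immediately preceding ld is still fetching from memory.

def cycles(prog):
    """prog: list of (opcode, dest_reg, source_regs) tuples."""
    stalls = sum(
        1
        for prev, cur in zip(prog, prog[1:])
        if prev[0] == "ld" and prev[1] in cur[2]
    )
    return len(prog) + 4 + stalls   # 5-stage pipeline: n + 4 cycles, plus stalls

original = [
    ("ld", "x1", ()), ("ld", "x2", ()),
    ("add", "x3", ("x1", "x2")), ("sd", None, ("x3",)),
    ("ld", "x4", ()),
    ("add", "x5", ("x1", "x4")), ("sd", None, ("x5",)),
]
reordered = [
    ("ld", "x1", ()), ("ld", "x2", ()), ("ld", "x4", ()),
    ("add", "x3", ("x1", "x2")), ("sd", None, ("x3",)),
    ("add", "x5", ("x1", "x4")), ("sd", None, ("x5",)),
]
print(cycles(original), cycles(reordered))  # 13 11
```

Moving the third ld up eliminates both load-use pairs, which is exactly the two-cycle saving the slide reports.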
Pipelining Control Hazards

• Also called branch hazard
• Branch determines flow of control
  – Fetching next instruction depends on branch outcome
  – Pipeline can't always fetch correct instruction
    • Still working on ID stage of branch
• In RISC-V pipeline
  – Need to compare registers and compute target early in the pipeline
  – Add hardware to do it in ID stage
29
Pipelining Control Hazards

Stall on branch
• Wait until branch outcome is determined before fetching the next instruction
30
Pipelining Control Hazards

Branch prediction
• Longer pipelines can't readily determine branch outcome early
  – Stall penalty becomes unacceptable
• Predict outcome of branch
  – Only stall if prediction is wrong
• In RISC-V pipeline
  – Can predict branches not taken
  – Fetch instruction after branch, with no delay
31
Pipelining Control Hazards

Branch prediction
32
Pipelining Control Hazards

More-realistic branch prediction
• Static branch prediction
  – Based on typical branch behavior
  – Example: loop and if-statement branches
    • Predict backward branches taken
    • Predict forward branches not taken
• Dynamic branch prediction
  – Hardware measures actual branch behavior
    • e.g., record recent history of each branch
  – Assume future behavior will continue the trend
    • When wrong, stall while re-fetching, and update history
33
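The dynamic scheme above can be illustrated with the simplest history-based predictor, a 1-bit scheme that predicts each branch will repeat its last outcome. This is a minimal sketch, not the hardware on any particular slide:

```python
# 1-bit dynamic branch predictor: remember the branch's last outcome
# and predict it repeats; update the history after each actual outcome.

def run_1bit_predictor(outcomes, initial=False):
    """Return how many of the given branch outcomes were predicted correctly."""
    prediction, correct = initial, 0
    for taken in outcomes:
        correct += (prediction == taken)   # was the guess right?
        prediction = taken                 # record actual behavior as history
    return correct

# A loop branch taken 9 times, then falling through once:
outcomes = [True] * 9 + [False]
print(run_1bit_predictor(outcomes))  # 8 of 10 correct
```

The two misses (the first iteration and the loop exit) motivate the 2-bit saturating counters used in real predictors, which tolerate a single anomalous outcome without flipping the prediction.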
Pipelining Summary of Overview

• Pipelining improves performance by increasing instruction throughput
  – Executes multiple instructions in parallel
  – Each instruction has the same latency
• Subject to hazards
  – Structural, data, control
• Instruction set design affects complexity of pipeline implementation
34
Pipelining Activity

For each code sequence below, state whether it must stall, can avoid stalls using only forwarding, or can execute without stalling or forwarding.
