
ECE/CS 752: Advanced Computer Architecture I 1

This document discusses superscalar pipelining and techniques to improve processor performance beyond single instruction pipelining. It introduces the concept of superscalar machines that can dispatch and execute multiple instructions per cycle by exploiting instruction level parallelism. Several challenges of implementing superscalar designs are discussed, including limiting factors on instruction level parallelism and managing dependencies between instructions. Different classifications of instruction level parallelism machines such as superscalar, superpipelined, VLIW and hybrid approaches are also covered.

Pipelining to Superscalar


Prof. Mikko H. Lipasti
University of Wisconsin-Madison
Lecture notes based on notes by John P. Shen
Updated by Mikko Lipasti
Forecast
Limits of pipelining
The case for superscalar
Instruction-level parallel machines
Superscalar pipeline organization
Superscalar pipeline design
Limits of Pipelining
IBM RISC Experience
Control and data dependences add 15%
Best-case CPI of 1.15, IPC of 0.87
Deeper pipelines (higher frequency) magnify dependence penalties
This analysis assumes 100% cache hit rates
Hit rates approach 100% for some programs
Many important programs have much worse hit rates
Later!
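The CPI and IPC figures on this slide are reciprocals of each other. A minimal sketch of the arithmetic, assuming an ideal pipelined CPI of 1.0 and treating the 15% dependence penalty as a multiplicative overhead on it (function and variable names are illustrative, not from the lecture):

```python
def cpi_with_penalty(ideal_cpi: float, penalty_fraction: float) -> float:
    """CPI after adding stall overhead expressed as a fraction of the ideal CPI."""
    return ideal_cpi * (1.0 + penalty_fraction)


def ipc_from_cpi(cpi: float) -> float:
    """IPC is the reciprocal of CPI."""
    return 1.0 / cpi


# Assumption: ideal scalar pipeline completes one instruction per cycle.
cpi = cpi_with_penalty(ideal_cpi=1.0, penalty_fraction=0.15)
ipc = ipc_from_cpi(cpi)
print(f"CPI = {cpi:.2f}, IPC = {ipc:.2f}")  # CPI = 1.15, IPC = 0.87
```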
Processor Performance
$$\text{Processor Performance} = \frac{\text{Time}}{\text{Program}} = \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{code size}} \times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Time}}{\text{Cycle}}}_{\text{cycle time}}$$
In the 1980s (decade of pipelining):
CPI: 5.0 => 1.15
In the 1990s (decade of superscalar):
CPI: 1.15 => 0.5 (best case)
In the 2000s (decade of multicore):
Marginal CPI improvement
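To see what the CPI improvements mean for execution time, the three-factor product can be evaluated directly. The sketch below holds instruction count and cycle time fixed, which is a simplifying assumption (in practice pipelining also shortens the cycle time); the instruction count and cycle time values are hypothetical placeholders.

```python
def time_per_program(instructions: float, cpi: float, cycle_time_s: float) -> float:
    """Time/Program = Instructions/Program x Cycles/Instruction x Time/Cycle."""
    return instructions * cpi * cycle_time_s


# Hypothetical workload and clock, chosen only for illustration.
INSTRUCTIONS = 1_000_000
CYCLE_TIME_S = 10e-9

t_unpipelined = time_per_program(INSTRUCTIONS, 5.0, CYCLE_TIME_S)   # CPI before the 1980s
t_pipelined = time_per_program(INSTRUCTIONS, 1.15, CYCLE_TIME_S)    # CPI after pipelining
t_superscalar = time_per_program(INSTRUCTIONS, 0.5, CYCLE_TIME_S)   # best-case 1990s CPI

# With the other two factors fixed, the speedup is just the CPI ratio.
print(f"pipelining:  {t_unpipelined / t_pipelined:.2f}x")
print(f"superscalar: {t_pipelined / t_superscalar:.2f}x")
```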
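Amdahl's Law
[Figure: number of processors (1 to N) vs. time; the time axis is split into h and 1 - h, and into 1 - f and f]
h = fraction of time in serial code
f = fraction that is vectorizable
v = speedup for f
Overall speedup:
$$\text{Speedup} = \frac{1}{1 - f + \frac{f}{v}}$$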
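The overall speedup, 1 / (1 - f + f/v), is easy to evaluate for a few values of v. This is a minimal sketch; the function name and the sample values of f and v are illustrative choices, not taken from the slides.

```python
def amdahl_speedup(f: float, v: float) -> float:
    """Overall speedup when a fraction f of the time is sped up by a factor v."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if v <= 0.0:
        raise ValueError("v must be positive")
    return 1.0 / ((1.0 - f) + f / v)


# Illustrative sweep: f = 0.8 vectorizable, increasing vector speedup v.
for v in (2, 6, 100):
    print(f"v = {v:>3}: speedup = {amdahl_speedup(0.8, v):.2f}")
```

Even a large v yields only a modest overall speedup when 20% of the time remains serial.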
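Revisit Amdahl's Law
Sequential bottleneck
Even if v is infinite
Performance limited by nonvectorizable portion (1 - f)
$$\lim_{v \to \infty} \frac{1}{1 - f + \frac{f}{v}} = \frac{1}{1 - f}$$
[Figure: number of processors (1 to N) vs. time; the time axis is split into h and 1 - h, and into 1 - f and f]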
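As a worked instance of the sequential bottleneck, take f = 0.8 (an illustrative value). Even with unbounded vector speedup, the overall speedup cannot exceed 5:

```latex
\lim_{v \to \infty} \frac{1}{(1 - 0.8) + \frac{0.8}{v}} = \frac{1}{1 - 0.8} = \frac{1}{0.2} = 5
```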
Pipelined Performance Model
[Figure: pipeline depth (1 to N) vs. time; the time axis is split into 1 - g and g]
g = fraction of time pipeline is filled
1 - g = fraction of time pipeline is not filled (stalled)
Pipelined Performance Model
[Figure: pipeline depth (1 to N) vs. time; the time axis is split into 1 - g and g]
Tyranny of Amdahl's Law [Bob Colwell]
When g is even slightly below 100%, a big performance hit will result
Stalled cycles are the key adversary and must be minimized as much as possible
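The slides do not write out a formula for this model. One plausible reading, by analogy with Amdahl's Law, is that the pipeline delivers a speedup of N (its depth) while filled and a speedup of 1 while stalled, giving 1 / ((1 - g) + g/N). The sketch below rests on that assumption; the function name and the depth of 10 are illustrative.

```python
def pipelined_speedup(g: float, depth: int) -> float:
    """Speedup of a depth-N pipeline that is filled a fraction g of the time.

    Assumption: Amdahl-style model, speedup N while filled and 1 while stalled.
    """
    if not 0.0 <= g <= 1.0:
        raise ValueError("g must be in [0, 1]")
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 1.0 / ((1.0 - g) + g / depth)


# Illustrative sweep for a hypothetical 10-stage pipeline.
for g in (1.0, 0.99, 0.95, 0.90):
    print(f"g = {g:.2f}: speedup = {pipelined_speedup(g, 10):.2f}")
```

Under this model, dropping g from 100% to 90% roughly halves the speedup of a 10-stage pipeline, which is the "tyranny" the slide describes.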
Motivation for Superscalar [Agerwala and Cocke]
[Figure: speedup (0 to 8) vs. vectorizability f (0 to 1), with curves for n = 4, n = 6, n = 12, n = 100, and n = 6 with s = 2; a "Typical Range" of f is marked]
Speedup jumps from 3 to 4.3 for N = 6, f = 0.8, but s = 2 instead of s = 1 (scalar)
Superscalar Proposal
Moderate tyranny of Amdahl's Law
Ease sequential bottleneck
More generally applicable
Robust (less sensitive to f)
Revised Amdahl's Law:
$$\text{Speedup} = \frac{1}{\frac{1 - f}{s} + \frac{f}{v}}$$
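The revised law can be checked against the Agerwala and Cocke figure quoted earlier (speedup rising from 3 to 4.3 at f = 0.8). The sketch below assumes the vector speedup v equals the N = 6 of that plot; the function name is illustrative.

```python
def revised_amdahl_speedup(f: float, v: float, s: float) -> float:
    """Speedup when the vectorizable fraction f runs v times faster and the
    remaining (1 - f) runs s times faster (s = 1 is the scalar case)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if v <= 0.0 or s <= 0.0:
        raise ValueError("v and s must be positive")
    return 1.0 / ((1.0 - f) / s + f / v)


# Assumption: v = N = 6, f = 0.8 as in the quoted example.
scalar = revised_amdahl_speedup(f=0.8, v=6, s=1)
superscalar = revised_amdahl_speedup(f=0.8, v=6, s=2)
print(f"{scalar:.1f} -> {superscalar:.1f}")  # 3.0 -> 4.3
```

Speeding up the non-vectorizable portion by only 2x lifts the overall speedup by more than 40%, which is why the proposal is less sensitive to f.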
Limits on Instruction Level Parallelism (ILP)
Weiss and Smith [1984]        1.58
Sohi and Vajapeyam [1987]     1.81
Tjaden and Flynn [1970]       1.86 (Flynn's bottleneck)
Tjaden and Flynn [1973]       1.96
Uht [1986]                    2.00
Smith et al. [1989]           2.00
Jouppi and Wall [1988]        2.40
Johnson [1991]                2.50
Acosta et al. [1986]          2.79
Wedig [1982]                  3.00
Butler et al. [1991]          5.8
Melvin and Patt [1991]        6
Wall [1991]                   7 (Jouppi disagreed)
Kuck et al. [1972]            8
Riseman and Foster [1972]     51 (no control dependences)
Nicolau and Fisher [1984]     90 (Fisher's optimism)
Superscalar Proposal
Go beyond single instruction pipeline, achieve IPC > 1
Dispatch multiple instructions per cycle
Provide more generally applicable form of concurrency (not just vectors)
Geared for sequential code that is hard to parallelize otherwise
Exploit fine-grained or instruction-level parallelism (ILP)
Classifying ILP Machines
[Jouppi, DECWRL 1991]
Baseline scalar RISC
Issue parallelism = IP = 1
Operation latency = OP = 1
Peak IPC = 1
[Figure: successive instructions 1 to 6 passing through stages IF, DE, EX, WB vs. time in cycles (of baseline machine)]
Classifying ILP Machines
[Jouppi, DECWRL 1991]
Superpipelined: cycle time = 1/m of baseline
Issue parallelism = IP = 1 inst / minor cycle
Operation latency = OP = m minor cycles
Peak IPC = m instr / major cycle (m x speedup?)
[Figure: successive instructions 1 to 6 passing through stages IF, DE, EX, WB, one issued per minor cycle]
Classifying ILP Machines
[Jouppi, DECWRL 1991]
Superscalar:
Issue parallelism = IP = n inst / cycle
Operation latency = OP = 1 cycle
Peak IPC = n instr / cycle (n x speedup?)
[Figure: successive instructions 1 to 9 passing through stages IF, DE, EX, WB, several issued per cycle]
Classifying ILP Machines
[Jouppi, DECWRL 1991]
VLIW: Very Long Instruction Word
Issue parallelism = IP = n inst / cycle
Operation latency = OP = 1 cycle
Peak IPC = n instr / cycle = 1 VLIW / cycle
[Figure: VLIW instructions passing through stages IF, DE, EX, WB]
Classifying ILP Machines
[Jouppi, DECWRL 1991]
Superpipelined-Superscalar
Issue parallelism = IP = n inst / minor cycle
Operation latency = OP = m minor cycles
Peak IPC = n x m instr / major cycle
[Figure: successive instructions 1 to 9 passing through stages IF, DE, EX, WB, several issued per minor cycle]
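The five machine classes differ only in two parameters: n, the instructions issued per (minor) cycle, and m, the number of minor cycles per baseline cycle. A minimal sketch that tabulates peak IPC per baseline (major) cycle follows; the class name, method names, and the choice n = m = 3 are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IlpMachine:
    name: str
    n: int = 1  # instructions issued per (minor) cycle
    m: int = 1  # minor cycles per baseline (major) cycle

    def peak_ipc_per_major_cycle(self) -> int:
        """Peak instructions completed per baseline cycle: n x m."""
        return self.n * self.m

    def operation_latency_minor_cycles(self) -> int:
        """A simple operation takes one baseline cycle, i.e. m minor cycles."""
        return self.m


machines = [
    IlpMachine("Baseline scalar RISC"),
    IlpMachine("Superpipelined", m=3),
    IlpMachine("Superscalar", n=3),
    IlpMachine("VLIW", n=3),  # n operations packed into 1 VLIW per cycle
    IlpMachine("Superpipelined-superscalar", n=3, m=3),
]

for mach in machines:
    print(f"{mach.name:<28} peak IPC = {mach.peak_ipc_per_major_cycle()}")
```

These are peak figures only; the "(speedup?)" question marks on the slides are a reminder that dependences and stalls keep real machines below them.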
Superscalar vs. Superpipelined
Roughly equivalent performance
If n = m then both have about the same IPC
Parallelism exposed in space vs. time
[Figure: SUPERSCALAR and SUPERPIPELINED pipeline diagrams vs. time in cycles (of base machine), 0 to 13; key: IFetch, Dcode, Execute, Writeback]
Superpipelining: Result Latency
Superpipelining - Jouppi, 1989
essentially describes a pipelined execution stage
[Figure: Jouppi's base machine, underpipelined machine, superpipelined machine]
Underpipelined machines cannot issue instructions as fast as they are executed
Note - key characteristic of superpipelined machines is that results are not available to M-1 successive instructions
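The M-1 figure follows from simple timing. The sketch below assumes one instruction issues per minor cycle, every operation takes M minor cycles, and results are forwarded as soon as they are produced; none of these details are spelled out on the slide, and the function names are hypothetical.

```python
def stall_minor_cycles(m: int, distance: int) -> int:
    """Minor cycles a consumer must wait when it issues `distance` slots
    after its producer on a degree-m superpipelined machine.

    Assumption: the producer's result is ready m minor cycles after it issues.
    """
    if m < 1 or distance < 1:
        raise ValueError("m and distance must be at least 1")
    return max(0, m - distance)


def blocked_successors(m: int) -> int:
    """Count the following instructions that cannot yet use the result."""
    return sum(1 for d in range(1, m + 1) if stall_minor_cycles(m, d) > 0)


for m in (1, 2, 3, 4):
    print(f"M = {m}: result unavailable to {blocked_successors(m)} successive instructions")
```

With M = 1 (the base machine) no successor has to wait, while each extra level of superpipelining adds one more instruction that cannot consume the result without stalling.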
Superscalar Challenges
[Figure: superscalar pipeline block diagram with stages FETCH, DECODE, EXECUTE, COMMIT; components I-cache, branch predictor, instruction buffer, integer, floating-point, media and memory units, reorder buffer (ROB), store queue, D-cache; labeled instruction flow, register data flow, memory data flow]
