ACA - Chapter 6
Pipelining and
Superscalar Techniques.
Dr. Manjunath Kotari
Professor & Head-CSE
Linear Pipeline
[Figure: a linear pipeline of stages S1 → S2 → S3 connected through interstage latches L1 and L2, followed by the reservation table of a single task flowing diagonally through four stages S1–S4.]
5 tasks on 4 stages (space-time diagram: task i occupies stage Sj in cycle i + j − 1)

Time →   1   2   3   4   5   6   7   8
S1       X1  X2  X3  X4  X5
S2           X1  X2  X3  X4  X5
S3               X1  X2  X3  X4  X5
S4                   X1  X2  X3  X4  X5
Clocking and Timing Control
Speedup, Efficiency & Throughput
Efficiency (Ek) & Throughput (Hk)
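Using the standard timing model for a k-stage linear pipeline (n tasks finish in k + n − 1 cycles once one task enters per cycle after the pipeline fills), all three measures can be computed directly; a minimal sketch:

```python
def pipeline_metrics(k, n, tau=1.0):
    """Speedup, efficiency and throughput of a k-stage linear pipeline
    processing n tasks with clock period tau (the (k + n - 1)-cycle model)."""
    cycles = k + n - 1               # total cycles to drain all n tasks
    speedup = (n * k) / cycles       # Sk = n*k / (k + n - 1)
    efficiency = n / cycles          # Ek = Sk / k
    throughput = n / (cycles * tau)  # Hk = n / ((k + n - 1) * tau)
    return speedup, efficiency, throughput

# the example above: 5 tasks on 4 stages finish in 8 cycles
s, e, h = pipeline_metrics(k=4, n=5)
print(s, e, h)  # 2.5 0.625 0.625
```

As n grows, Sk approaches k and Ek approaches 1, which is why long task streams use a pipeline most efficiently.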
Non Linear Pipelines
• Variable functions
• Feed-Forward
• Feedback
3 stages & 2 functions
[Figure: a three-stage pipeline S1 → S2 → S3 with feed-forward and feedback connections, evaluating two functions X and Y; the corresponding reservation tables appear on the "Reservation Tables for X & Y" slide below.]
Reservation Tables
• Latency
• The number of time units between two initiations of a pipeline is called the latency.
• Latencies must be non-negative integers.
• A latency of k means that two initiations are separated by k clock cycles.
• Collision
• Any attempt by two or more initiations to use the
same pipeline stage at the same time.
• Collision implies resource conflicts between
two initiations in the pipeline.
• Therefore all collisions must be avoided in
scheduling a sequence of pipeline initiations.
• Some latencies will cause collisions, and some
will not.
• Latencies that cause collisions are called
forbidden latencies.
• Latency sequence
• It is a sequence of permissible (non-forbidden) latencies between successive task initiations.
• Latency cycle
• It is a latency sequence which repeats the same
subsequence indefinitely.
• Average latency
• It is obtained by dividing the sum of all latencies by the number of latencies along the cycle.
• Constant cycle
• It is a latency cycle which contains only one latency value.
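A quick check of the two definitions above (the cycles used here are the ones analyzed later for function X):

```python
def average_latency(cycle):
    """Average latency of a latency cycle: sum of latencies / their count."""
    return sum(cycle) / len(cycle)

print(average_latency((1, 8)))  # 4.5
print(average_latency((3,)))    # 3.0  -- a constant cycle: one latency value
```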
Collision free scheduling
• Collision vectors
• The combined set of permissible and forbidden latencies can easily be displayed by a collision vector.
• C = (Cm Cm-1 … C2 C1)
• If Ci = 1, latency i causes a collision.
• If Ci = 0, latency i is permissible.
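The definition can be applied mechanically: collect every distance between two uses of the same stage (these are the forbidden latencies), then set bit Ci for each forbidden latency i. A sketch, using a reconstruction of the function-X reservation table discussed later in this chapter:

```python
def forbidden_latencies(table):
    """table: dict mapping stage -> set of cycles (1-based) it is used in.
    Latency L is forbidden if some stage is used at both t and t + L."""
    forbidden = set()
    for times in table.values():
        for t1 in times:
            for t2 in times:
                if t2 > t1:
                    forbidden.add(t2 - t1)
    return forbidden

def collision_vector(table):
    n = max(max(t) for t in table.values())  # number of columns
    f = forbidden_latencies(table)
    m = n - 1
    # C = (Cm ... C1): Ci = 1 iff latency i causes a collision
    return ''.join('1' if i in f else '0' for i in range(m, 0, -1))

# reconstructed layout for function X (consistent with forbidden
# latencies 2, 4, 5, 7 given on the later slides)
X = {'S1': {1, 6, 8}, 'S2': {2, 4}, 'S3': {3, 5, 7}}
print(sorted(forbidden_latencies(X)))  # [2, 4, 5, 7]
print(collision_vector(X))             # 1011010
```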
State diagrams
• Simple cycle
• It is a latency cycle in which each state appears
only once.
• Ex: (3),(6),(1,8),(3,8) and (6,8)
• Greedy cycles
• Greedy cycle is one whose edges are all made
with minimum latencies from their respective
starting states.
• Ex: (1,8),(3)
• MAL (minimal average latency): the smallest average latency achievable by any latency cycle.
Nonlinear Pipeline Design
• Latency
The number of clock cycles between two initiations of a
pipeline
• Collision
Resource Conflict
• Forbidden Latencies
Latencies that cause collisions
Nonlinear Pipeline Design (contd.)
• Latency Sequence
A sequence of permissible latencies between successive
task initiations
• Latency Cycle
A sequence that repeats the same subsequence
• Collision vector
C = (Cm, Cm-1, …, C2, C1), m <= n-1
n = number of columns in the reservation table
Ci = 1 if latency i causes collision, 0 otherwise
Collision Vector for Multiply
after Multiply
Forbidden Latencies: 1, 2
Collision vector = 000011 (C6 … C1; only the bits for latencies 1 and 2 are set)
Reservation Tables for X & Y

Function X (8 cycles; layout reconstructed to be consistent with the forbidden latencies 2, 4, 5, 7 given below):

      1  2  3  4  5  6  7  8
S1    X              X     X
S2       X     X
S3          X     X     X

Function Y: [table layout lost in extraction; S1 is used in two cycles, S2 in one, and S3 in three, with forbidden latencies 2 and 4]
Collision Vector
• Forbidden Latencies: 2, 4, 5, 7
• Collision Vector = 1011010
Y after Y

[Figure: reservation tables of two successive Y initiations overlaid at different latencies; at latencies 2 and 4 the second Y collides with the first in the same stage.]
Collision Vector
• Forbidden Latencies: 2, 4
• Collision Vector = 1010
Exercise – Find the collision vector

[Reservation table with stages A–D over 7 cycles; the mark positions were lost in extraction: A is used in three cycles, B in two, C in two, and D in one.]
State Diagram for X

Initial state: 1011010, with permissible latencies 1, 3, 6, and 8+.
• Latency 1 leads to state 1111111, from which only latencies 8+ (returning to the initial state) are permissible.
• Latencies 3 or 6 lead to state 1011011, which loops on itself under latencies 3 and 6.
• From any state, a latency of 8 or more (8+) returns to the initial state 1011010.
(The starred latencies 1* and 3* in the original diagram mark the minimum permissible latency out of each state, i.e. the edges followed by greedy cycles.)
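The transitions in the state diagram follow one rule: shift the current state right by the latency taken and OR in the initial collision vector; latencies above m return to the initial state. Simulating this rule reproduces the greedy cycle; a sketch for the X collision vector 1011010:

```python
def permissible(state, m=7):
    """Latencies 1..m whose collision-vector bit is 0 (bit i-1 holds Ci)."""
    return [i for i in range(1, m + 1) if not (state >> (i - 1)) & 1]

def next_state(state, latency, initial, m=7):
    """Shift right by the latency taken, then OR in the initial vector."""
    return initial if latency > m else (state >> latency) | initial

C0 = int('1011010', 2)  # initial collision vector for function X

# greedy strategy: always take the smallest permissible latency
state, cycle, seen = C0, [], set()
while state not in seen:
    seen.add(state)
    lats = permissible(state)
    lat = lats[0] if lats else 8  # no permissible latency <= m: take 8+
    cycle.append(lat)
    state = next_state(state, lat, C0)
print(cycle)  # [1, 8] -- the greedy cycle (1, 8), average latency 4.5

# the constant cycle (3) self-loops on state 1011011 and gives the MAL of 3
s3 = next_state(C0, 3, C0)
assert next_state(s3, 3, C0) == s3
```

Note that the greedy cycle (1, 8) is not always optimal: here the constant cycle (3) achieves the lower average latency.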
Cycles
• Prefetch Buffers
• Multiple Functional Units
• Internal Data Forwarding
• Hazard Avoidance
Prefetch Buffers
Store-store forwarding
• Two stores to the same location are executed immediately one after the other (e.g. STO M, R1 followed by STO M, R2).
• The second store overwrites the first.
• The first store is therefore redundant and can be eliminated without affecting the outcome.
Implementing the dot-product operation with internal data
forwarding
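One way to see the benefit is to count memory accesses: without forwarding every product makes a store/load round-trip through memory before the add, while internal data forwarding passes it between functional units directly. A sketch with an assumed, simplified cost model (two loads per element pair, plus a store and a reload per product when forwarding is absent):

```python
def dot_product(a, b, forwarding=True):
    """Compute s = sum(a[i] * b[i]) and count memory accesses under a
    simplified model: loading a[i] and b[i] always costs 2 accesses;
    without internal forwarding, each product is stored and reloaded."""
    mem_ops = 2 * len(a)          # load a[i] and b[i] for each element
    if not forwarding:
        mem_ops += 2 * len(a)     # store each product, reload it for the add
    s = sum(x * y for x, y in zip(a, b))
    return s, mem_ops

print(dot_product([1, 2, 3], [4, 5, 6], forwarding=True))   # (32, 6)
print(dot_product([1, 2, 3], [4, 5, 6], forwarding=False))  # (32, 12)
```

The result is identical either way; forwarding halves the memory traffic in this model.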
Hazard Avoidance
• Static scheduling
• Data dependencies in a sequence of instructions create
interlocked relationships among them.
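Such interlocked relationships are commonly classified from the read (domain) and write (range) register sets of the two instructions: RAW (flow), WAW (output), and WAR (anti) hazards. A small sketch, with illustrative register names:

```python
def hazards(reads_i, writes_i, reads_j, writes_j):
    """Classify data hazards between instruction I and a later J,
    given each instruction's read and write register sets."""
    h = []
    if writes_i & reads_j:
        h.append('RAW')   # J reads what I writes (flow dependence)
    if writes_i & writes_j:
        h.append('WAW')   # both write the same register (output dependence)
    if reads_i & writes_j:
        h.append('WAR')   # J overwrites what I reads (antidependence)
    return h

# I: ADD R1, R2, R3  (reads R2, R3; writes R1)
# J: SUB R4, R1, R5  (reads R1, R5; writes R4)
print(hazards({'R2', 'R3'}, {'R1'}, {'R1', 'R5'}, {'R4'}))  # ['RAW']
```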
Branch handling techniques
• Three basic terms for the analysis of branching
effect.
• Branch Taken
• The action of fetching a nonsequential or remote instruction
after a branch instruction
• Branch Target
• The instruction to be executed after a branch is taken
• Delay Slot
• The number of pipeline cycles wasted between a taken branch and its branch target
• Denoted by d, where 0 ≤ d ≤ k-1 and k = number of pipeline stages
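Under the usual penalty model, each taken branch wastes d cycles, so with a fraction p of branch instructions of which a fraction q are taken, the average extra cycles per instruction is p·q·d. A sketch (the example numbers are illustrative):

```python
def branch_penalty_cpi(p, q, d):
    """Average extra cycles per instruction caused by taken branches:
    p = fraction of instructions that are branches,
    q = fraction of branches taken,
    d = delay slot (cycles wasted per taken branch)."""
    return p * q * d

# e.g. 20% branches, 60% taken, d = k - 1 = 3 wasted cycles per taken branch
print(round(branch_penalty_cpi(0.2, 0.6, 3), 2))  # 0.36
```

This is why reducing d (or predicting branches so the penalty is rarely paid) matters more as pipelines get deeper.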
Branch Prediction
• Branch can be predicted either based on branch code
types statically or based on branch history during
program execution.
• The static prediction direction is usually wired into the
processor.
• According to past experience, the best performance is
given by predicting taken.
• A dynamic branch strategy uses recent branch history to
predict whether or not the branch will be taken next time
when it occurs.
• To be accurate, one may need to use the entire history of the branch to predict the future choice.
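One common concrete form of such a dynamic strategy (a standard scheme, not specific to these slides) is a 2-bit saturating counter kept per branch: two consecutive mispredictions are needed before the prediction flips. A sketch:

```python
def predict_two_bit(outcomes):
    """Run a 2-bit saturating-counter predictor over a branch's outcome
    history; states 0-1 predict not-taken, 2-3 predict taken.
    Returns the number of correct predictions."""
    state, correct = 2, 0  # starting weakly-taken is an assumed choice
    for taken in outcomes:
        predict = state >= 2
        correct += (predict == taken)
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct

# a loop branch: taken nine times, then not taken once at loop exit
history = [True] * 9 + [False]
print(predict_two_bit(history), 'of', len(history))  # 9 of 10
```

A 1-bit scheme would mispredict twice per loop execution (at exit and at re-entry); the 2-bit counter mispredicts only once.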
Classification of dynamic branch
strategies
• One class predicts the branch direction based
upon information found at the decode stage.
• The second class uses a cache to store target
addresses at the stage the effective address of
the branch target is computed.
• The third scheme uses a cache to store target
instructions at the fetch stage.
BTB (branch target buffer)
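A BTB is a small cache indexed by the branch instruction's address that returns the predicted target address, letting the fetch stage redirect immediately instead of waiting for the target to be computed. A minimal dictionary-based sketch (class and field names are illustrative):

```python
class BranchTargetBuffer:
    """Minimal BTB sketch: maps a branch instruction's address to its
    last-seen target address so fetch can redirect without delay."""

    def __init__(self):
        self.table = {}

    def lookup(self, pc):
        # None means no entry: fetch predicts fall-through (pc + 1)
        return self.table.get(pc)

    def update(self, pc, target):
        # record the resolved target when the branch completes
        self.table[pc] = target

btb = BranchTargetBuffer()
btb.update(0x400, 0x480)
print(hex(btb.lookup(0x400)))  # 0x480
```

A real BTB is a fixed-size associative structure with replacement, and may also store the target instruction itself, as the third scheme above describes.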