An Introductory Analysis of Pipelines

This document discusses pipelining and its effects on instruction throughput, latency, and speedup. It defines a 5-stage instruction pipeline and uses time-space diagrams to show that pipelining improves instruction throughput from 0.2 to 0.6 instructions per cycle (IPC) but does not reduce instruction latency, which remains 5 cycles. Speedup from pipelining is calculated as the ratio of execution times with and without pipelining. The ideal speedup is limited by the pipeline depth, while ideal instruction throughput approaches 1/cycle time as the number of instructions increases.

Uploaded by

Werda Farooq

Parallel Processing

Discussion - 03
An Introductory Analysis of Pipelines
Consider a 5-stage instruction pipeline as shown below:

IF ID EX M WB

A time-space diagram is used to describe the progress of instructions through the pipeline.

Stage
WB                      I1  I2  I3  I4  I5  I6
M                   I1  I2  I3  I4  I5  I6  I7
EX              I1  I2  I3  I4  I5  I6  I7  I8
ID          I1  I2  I3  I4  I5  I6  I7  I8  I9
IF      I1  I2  I3  I4  I5  I6  I7  I8  I9  I10
        1   2   3   4   5   6   7   8   9   10
        Clock Cycles
(Pipelined Execution)

We’ve assumed that every stage takes one clock cycle and there are no hazards in the instruction stream.
Instruction Latency (the time it takes to complete an instruction) = 5 cycles
Instruction Throughput = 6 instructions completed in 10 cycles = 6/10 IPC = 0.6 IPC
To better appreciate pipelined execution, we draw the time-space diagram for non-pipelined execution, as shown below:
Stage
WB                      I1                  I2
M                   I1                  I2
EX              I1                  I2
ID          I1                  I2
IF      I1                  I2
        1   2   3   4   5   6   7   8   9   10
        Clock Cycles
(Non-Pipelined Execution)

Instruction Latency = 5 cycles


Instruction Throughput = 2/10 IPC = 0.2 IPC (instructions per cycle)
Thus pipelined execution improves instruction throughput. However, it does not improve instruction latency. In practice, pipelining increases instruction latency because of the delay added by the pipeline registers.
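The two throughput figures can be reproduced with a small counting sketch. It assumes, as the diagrams do, one cycle per stage and no hazards; the function names are hypothetical and introduced only for illustration.

```python
def completed_pipelined(cycles: int, k: int = 5) -> int:
    """Instructions that have left the last stage after `cycles` cycles.

    In an ideal k-stage pipeline, instruction i (1-based) enters the first
    stage in cycle i and leaves the last stage in cycle i + k - 1.
    """
    return max(0, cycles - (k - 1))


def completed_non_pipelined(cycles: int, k: int = 5) -> int:
    """Without pipelining, each instruction occupies the datapath for k cycles."""
    return cycles // k


cycles, k = 10, 5
ipc_pipelined = completed_pipelined(cycles, k) / cycles          # 6 / 10
ipc_non_pipelined = completed_non_pipelined(cycles, k) / cycles  # 2 / 10
print(ipc_pipelined, ipc_non_pipelined)
```

Both functions count only instructions that have finished WB within the observed window, which is how the diagrams above are read.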
Speedup
Suppose that a k-stage instruction pipeline executes a program containing n instructions. Let τ be the cycle time.
Execution time on a non-pipelined computer is given as
$$t_{np} = nk\tau \tag{1}$$
Execution time on a pipelined computer is given as
$$t_p = (k - 1 + n)\tau \tag{2}$$
where (k − 1) cycles are required to fill up the pipeline (also called the pipeline setup time). By definition, the speedup S of pipelined execution over non-pipelined execution is given as

$$S = \frac{\text{time before enhancement}}{\text{time after enhancement}} = \frac{t_{np}}{t_p} = \frac{nk\tau}{(k - 1 + n)\tau} = \frac{nk}{k - 1 + n} \tag{3}$$
Clearly, for a given pipeline, the speedup grows as more and more instructions are executed. We can compute the upper bound on speedup as follows:
$$S_{ideal} = \lim_{n \to \infty} \frac{k}{\frac{k - 1}{n} + 1} = k$$
We regard it as ideal speedup because its derivation is based on the assumption of no pipeline hazards. As can be
seen, even ideal speedup cannot go beyond pipeline depth (i.e. number of pipeline stages).
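A short numerical sketch can make the bound concrete. It evaluates equation (3) for a 5-stage pipeline and growing n; the function name `speedup` and the chosen values of n are assumptions made for illustration.

```python
def speedup(n: int, k: int) -> float:
    """Speedup S = nk / (k - 1 + n) from equation (3); the cycle time cancels."""
    return n * k / (k - 1 + n)


k = 5  # pipeline depth used in the diagrams above
for n in (1, 10, 100, 10_000):
    # S starts at 1 for a single instruction and approaches k from below.
    print(n, round(speedup(n, k), 3))
```

For n = 1 the pipeline gives no benefit (S = 1), and for every finite n the value stays strictly below k.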
Instruction Throughput
Instruction throughput ω is defined as the number of instructions executed per unit time. This is calculated as:

$$\omega = \frac{n}{(k - 1 + n)\tau} \tag{4}$$

Multiplying numerator and denominator of (4) by k, we can express ω in terms of speedup S as:

$$\omega = \frac{S}{k\tau}$$
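The relation between ω and S can be checked numerically. The sketch below assumes a hypothetical cycle time of 1 ns; the function names are illustrative, not part of the original notes.

```python
import math


def speedup(n: int, k: int) -> float:
    """S = nk / (k - 1 + n), equation (3)."""
    return n * k / (k - 1 + n)


def throughput(n: int, k: int, tau: float) -> float:
    """omega = n / ((k - 1 + n) * tau), equation (4), in instructions per second."""
    return n / ((k - 1 + n) * tau)


k = 5
tau = 1e-9  # assumed cycle time of 1 ns, chosen only for illustration
for n in (6, 100, 10_000):
    omega = throughput(n, k, tau)
    # Equation (4) and S / (k * tau) should agree up to rounding error.
    assert math.isclose(omega, speedup(n, k) / (k * tau))
    print(n, f"{omega:.3e}")
```

As n grows, the printed values approach 1/τ, which is 1e9 instructions per second under the assumed cycle time.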
The upper bound on ω is similarly found:
$$\omega_{ideal} = \lim_{n \to \infty} \frac{1}{\left(\frac{k - 1}{n} + 1\right)\tau} = \frac{1}{\tau}$$
CPI
Cycles per instruction (CPI) of pipelined execution can be found as:
$$CPI = \frac{k - 1 + n}{n} = \frac{k - 1}{n} + 1$$
The lower bound on CPI is
$$CPI_{ideal} = \lim_{n \to \infty} \left(\frac{k - 1}{n} + 1\right) = 1$$