Pipeline Processing
PARALLEL PROCESSING
• A parallel processing system performs concurrent
data processing to achieve a shorter execution
time
– Flynn's classification
• Based on the multiplicity of Instruction Streams and
Data Streams
• Instruction Stream
– Sequence of Instructions read from memory
• Data Stream
– Operations performed on the data in the processor
• Representative parallel architectures (from the classification diagram)
– MISD: no practical machines exist
– Systolic arrays, dataflow machines, associative processors
– Message-passing multicomputers with hypercube, mesh, and reconfigurable topologies
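As a loose software illustration of the stream counts above (Flynn's classes describe hardware organisations, so this comparison is only an analogy, and the array values are made up):

import numpy as np

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# SISD-style: a single instruction stream handles one data item per step.
sisd_result = [a[i] + b[i] for i in range(len(a))]

# SIMD-style: one vector "add" is applied to many data elements at once.
simd_result = np.array(a) + np.array(b)

print(sisd_result)        # [11, 22, 33, 44]
print(simd_result)        # [11 22 33 44]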
PIPELINE PROCESSING
• Pipeline is a technique of overlapping the
execution of several instructions to reduce
the execution time of a set of instructions.
• Disadvantages
– Variable throughput
– Different stages may introduce different amounts of delay
Synchronous pipeline
• Clocked latches are used to interface between
stages; the latches isolate the inputs of a stage
from its outputs.
• Upon arrival of a clock pulse, all latches transfer
data to the next stage simultaneously (see the
sketch below).

Input → S1 → R1 → S2 → R2 → S3 → R3 → S4 → R4   (a common clock drives every latch R1-R4)
• Advantage
– Equal delay in all stages
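The clocked-latch behaviour can be sketched in software. In this minimal Python sketch (the stage functions s1..s4 and the input values are made-up assumptions), every latch value is computed from the current state and then all latches are updated together, mimicking the simultaneous transfer on a clock pulse:

def s1(x): return x + 1          # illustrative stage operations
def s2(x): return x * 2
def s3(x): return x - 3
def s4(x): return x * x

def clock_tick(latches, new_input):
    r1, r2, r3, r4 = latches
    # Evaluate every stage from the current latch contents, then update all
    # latches at once -- this models the simultaneous transfer on the clock edge.
    return (s1(new_input),
            s2(r1) if r1 is not None else None,
            s3(r2) if r2 is not None else None,
            s4(r3) if r3 is not None else None)

latches = (None, None, None, None)
for cycle, x in enumerate([10, 20, 30, 40], start=1):
    latches = clock_tick(latches, x)
    print(f"clock {cycle}: R1-R4 = {latches}")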
Instruction Execution Steps
• Instruction Fetch (IF) from main memory
• Instruction Decode (ID)
• Operand Fetch (OF), if any
• Execution of the decoded instruction (EX)
Non-pipelined Computer
• 6 stages: Instruction Fetch, Instruction Decode, Operand Address Calculation, Operand Fetch, Execute, Write Result
Space-Time Diagram (4 segments, 6 tasks)

Clock cycle   1    2    3    4    5    6    7    8    9
Segment 1     T1   T2   T3   T4   T5   T6
Segment 2          T1   T2   T3   T4   T5   T6
Segment 3               T1   T2   T3   T4   T5   T6
Segment 4                    T1   T2   T3   T4   T5   T6
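The diagram follows a simple rule: task Ti enters segment 1 in clock cycle i and occupies segment s in cycle i + s - 1, so the last of n tasks leaves a k-segment pipeline in cycle k + n - 1. A small Python sketch (the function name and the formatting are assumptions) that reproduces the diagram above:

def space_time(k, n):
    """Print the space-time diagram for n tasks in a k-segment pipeline."""
    total = k + n - 1
    print(f"{'cycle':<10}" + "".join(f"{c:>4}" for c in range(1, total + 1)))
    for s in range(1, k + 1):
        row = [""] * total
        for t in range(1, n + 1):
            row[t + s - 2] = f"T{t}"       # task t is in segment s at cycle t+s-1
        print(f"{'segment ' + str(s):<10}" + "".join(f"{cell:>4}" for cell in row))

space_time(k=4, n=6)   # reproduces the 4-segment, 6-task diagram above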
Pipelined Computer

Stage / Time   1    2    3    4    5    6    ...
IF             I1   I2   I3   I4
ID                  I1   I2   I3
OF                       I1   I2   I3
EX                            I1   I2   I3
In the first cycle instruction I1 is fetched from memory. In the second
cycle another instruction I2 is fetched from memory and simultaneously
I1 is decoded by the instruction decoding unit.
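The same overlap can be tabulated per clock cycle. A minimal Python sketch (an assumed ideal pipeline with no stalls) that lists which instruction each unit holds in every cycle:

STAGES = ["IF", "ID", "OF", "EX"]

def schedule(num_instructions):
    """Return {cycle: {stage: instruction}} for an ideal, stall-free pipeline."""
    table = {}
    for i in range(1, num_instructions + 1):        # instruction Ii enters IF at cycle i
        for offset, stage in enumerate(STAGES):     # and reaches each later stage one cycle apart
            table.setdefault(i + offset, {})[stage] = f"I{i}"
    return table

table = schedule(4)
for cycle in sorted(table):
    row = "  ".join(f"{st}:{table[cycle].get(st, '--')}" for st in STAGES)
    print(f"cycle {cycle}: {row}")
# cycle 1: IF:I1  ID:--  OF:--  EX:--
# cycle 2: IF:I2  ID:I1  OF:--  EX:--   (matches the description above)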
INSTRUCTION PIPELINE

Sequential execution (no overlap):
i      FI  DA  FO  EX
i+1                    FI  DA  FO  EX
i+2                                    FI  DA  FO  EX

Pipelined execution (overlapped):
i      FI  DA  FO  EX
i+1        FI  DA  FO  EX
i+2            FI  DA  FO  EX
PIPELINING
A technique of decomposing a sequential process
into suboperations, with each subprocess being
executed in a special dedicated segment that
operates concurrently with all other segments.
Example: compute Ai * Bi + Ci for i = 1, 2, 3, ..., 7

Segment 1:  R1 ← Ai,  R2 ← Bi            (load Ai and Bi from memory)
Segment 2:  R3 ← R1 * R2,  R4 ← Ci       (multiply; load Ci from memory)
Segment 3:  R5 ← R3 + R4                 (add)
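A minimal Python sketch of this three-segment pipeline (the operand values are made up; computing every new_* value before assigning models all registers being clocked together):

A = [1, 2, 3, 4, 5, 6, 7]
B = [7, 6, 5, 4, 3, 2, 1]
C = [9, 9, 9, 9, 9, 9, 9]

r1 = r2 = r3 = r4 = r5 = None
results = []
for cycle in range(len(A) + 3):                  # extra cycles drain the pipe
    # Evaluate each segment from the current register contents,
    # then update every register at once (one clock pulse).
    new_r5 = r3 + r4 if r3 is not None else None             # segment 3: adder
    new_r3 = r1 * r2 if r1 is not None else None             # segment 2: multiplier
    new_r4 = C[cycle - 1] if 1 <= cycle <= len(C) else None  # segment 2: load Ci
    new_r1 = A[cycle] if cycle < len(A) else None            # segment 1: load Ai
    new_r2 = B[cycle] if cycle < len(B) else None            # segment 1: load Bi
    r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
    if r5 is not None:
        results.append(r5)

print(results)    # equals [A[i]*B[i] + C[i] for i in range(7)]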
Four-Segment Instruction Pipeline (flowchart)
Segment 1: Fetch instruction from memory
Segment 2: Decode instruction and calculate effective address, then test for a branch
Segment 3: Fetch operand from memory
Segment 4: Execute instruction, then test for an interrupt (interrupt handling if one is pending)
On a branch or an interrupt, the PC is updated and the pipe is emptied before instruction fetch resumes.
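A behavioural (not cycle-accurate) Python sketch of this flow; the tiny memory layout and the LOAD/BRANCH instruction format are assumptions made only for illustration:

memory = {0: ("LOAD", 100), 1: ("BRANCH", 5), 5: ("LOAD", 101),
          100: 42, 101: 7}
pc, acc = 0, 0

for _ in range(3):                      # run three instructions
    opcode, eff_addr = memory[pc]       # segments 1-2: fetch, decode, effective address
    if opcode == "BRANCH":
        pc = eff_addr                   # branch taken: update the PC
        continue                        # (a real pipeline would also empty the pipe here)
    operand = memory[eff_addr]          # segment 3: fetch operand from memory
    acc = operand                       # segment 4: execute (a plain load here)
    pc += 1                             # sequential PC update

print(acc)    # 7 -> the LOAD at the branch target (address 5) executed last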
Timing of the instruction pipeline when instruction 3 is a (taken) branch:

Step             1    2    3    4    5    6    7    8    9   10   11   12   13
Instruction 1    FI   DA   FO   EX
            2         FI   DA   FO   EX
 (Branch)   3              FI   DA   FO   EX
            4                   FI   --   --   FI   DA   FO   EX
            5                                  FI   DA   FO   EX
            6                                       FI   DA   FO   EX
            7                                            FI   DA   FO   EX

Instruction 4 is fetched in step 4, discarded when the branch is detected, and fetched again in step 7 once the branch completes execution in step 6.
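A quick check of the cycle counts implied by this table (assuming, as above, that the branch is resolved in its EX step and fetching restarts in the following step):

stages, n = 4, 7

ideal = stages + n - 1                 # k + n - 1 = 10 cycles with no branch
refetch_step = 6 + 1                   # branch (instr 3) executes in step 6; instr 4 refetched in step 7
# instructions 4..7 then flow through without further breaks:
with_branch = refetch_step + (n - 4) + (stages - 1)                  # 7 + 3 + 3 = 13
print(ideal, with_branch, "branch penalty =", with_branch - ideal)   # 10 13 ... 3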
Example: 6 tasks divided into 4 segments

Clock cycle   1    2    3    4    5    6    7    8    9
Segment 1     T1   T2   T3   T4   T5   T6
Segment 2          T1   T2   T3   T4   T5   T6
Segment 3               T1   T2   T3   T4   T5   T6
Segment 4                    T1   T2   T3   T4   T5   T6

With k = 4 segments and n = 6 tasks, all tasks complete in k + n - 1 = 9 clock cycles.
Pipeline Performance
• Latency
– The amount of time a single operation takes to execute
• Throughput
– The rate at which operations are completed (operations per second or operations per cycle)
• For a non-pipelined processor:  Throughput = 1 / Latency
• For a pipelined processor:  Throughput > 1 / Latency, because several operations overlap in the pipeline, so completions occur more often than once per operation latency
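A numeric sketch of these two relations, reusing the 25 ns / 5-stage / 1 ns-latch figures from the worked example further below:

unpipelined_latency = 25e-9                              # seconds per operation
stages, latch = 5, 1e-9

nonpipelined_throughput = 1 / unpipelined_latency        # exactly 1 / latency
pipelined_cycle = unpipelined_latency / stages + latch   # 6 ns
pipelined_latency = pipelined_cycle * stages             # 30 ns
pipelined_throughput = 1 / pipelined_cycle               # one completion per cycle

print(f"non-pipelined: {nonpipelined_throughput / 1e6:.0f} Mops/s")      # 40
print(f"pipelined:     {pipelined_throughput / 1e6:.0f} Mops/s "
      f"> 1/latency = {1 / pipelined_latency / 1e6:.0f} Mops/s")         # 167 > 33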
• Cycle time of a pipelined processor
– Depends on four factors:
• The cycle time of the unpipelined processor
• The number of pipeline stages
• How evenly the datapath logic is divided among the stages
• The latency of the pipeline latches
• If the logic is evenly divided among the stages, the clock period of the pipelined processor is

Cycle time (pipelined) = Cycle time (unpipelined) / Number of pipeline stages + pipeline latch latency

For an unpipelined cycle time of 25 ns, 5 stages, and a 1 ns latch latency:
Cycle time of the 5-stage pipeline = (25 ns / 5) + 1 ns = 6 ns
Latency of the pipeline = cycle time of the pipeline x number of pipeline stages
                        = 6 ns x 5 = 30 ns
For a 50-stage pipeline: cycle time = (25 ns / 50) + 1 ns = 1.5 ns
Latency of the pipeline = 1.5 ns x 50 = 75 ns
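The same arithmetic as a small Python helper (the function name is an arbitrary choice):

def pipelined_cycle_time(unpipelined_ns, stages, latch_ns):
    """Clock period when the unpipelined logic is divided evenly among the stages."""
    return unpipelined_ns / stages + latch_ns

for stages in (5, 50):
    cycle = pipelined_cycle_time(25, stages, 1)
    print(f"{stages:>2} stages: cycle time = {cycle} ns, latency = {cycle * stages} ns")
#  5 stages: cycle time = 6.0 ns, latency = 30.0 ns
# 50 stages: cycle time = 1.5 ns, latency = 75.0 ns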
Questions
• Suppose an unpipelined processor with a 25 ns
cycle time is divided into 5 pipeline stages with
latencies of 5, 7, 3, 6, and 4 ns. If the pipeline
latch latency is 1 ns, what is the cycle time of
the pipelined processor? What is the latency of
the resulting pipeline?
Solution
• The stage latencies are unequal, so the clock period must accommodate the slowest stage.
• The longest pipeline stage latency is 7 ns.
• Pipeline latch latency = 1 ns
• Cycle time = longest stage latency + pipeline latch latency
             = 7 + 1 = 8 ns
Therefore, the cycle time of the pipelined processor = 8 ns
There are 5 pipeline stages.
Total latency = cycle time of the pipeline x number of pipeline stages
              = 8 ns x 5 = 40 ns
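The same answer, computed directly (the stage latencies are taken from the question above):

stage_latencies = [5, 7, 3, 6, 4]        # ns, from the question
latch = 1                                # ns

cycle_time = max(stage_latencies) + latch          # slowest stage sets the clock
total_latency = cycle_time * len(stage_latencies)
print(cycle_time, total_latency)                   # 8 40  (ns)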
Question
• Suppose that an unpipelined processor has a
cycle time of 25 ns and that its datapath is made
up of modules with latencies of 2, 3, 4, 7, 3, 2,
and 4 ns (in that order). In pipelining this
processor, it is not possible to rearrange the
order of the modules (for example, putting the
register-read stage before the instruction-decode
stage) or to divide a module into multiple pipeline
stages (for complexity reasons). Given pipeline
latches with a 1 ns latency, what is the minimum
cycle time that can be achieved by pipelining this
processor?
Solution
• There is no limit on the number of pipeline stages, but a module cannot be split across stages.
• The 7 ns module must therefore occupy a stage by itself, and that stage determines the clock period.
• Minimum cycle time = longest module latency + pipeline latch latency
                     = 7 + 1 = 8 ns
Branch Instructions