LECTURE 3 Pipelining
Basic Instruction Pipelining
Simplest definition: execution of the next instruction
begins before execution of the previous instruction is
completed
A technique used in the design of modern processors,
microcontrollers and CPUs to increase instruction
throughput (the number of instructions that can be executed
per unit of time)
The main idea of instruction pipelining
Divide (split) the processing of a
CPU instruction into a series of independent
steps/operations as defined by the opcode
This allows the CPU's control logic to handle
instructions at the processing rate of the
slowest step.
The slowest step is much faster than the time
needed to process the whole instruction as a single
step
Example of a car assembly plant
As in car assembly, each step carries out a single
micro-operation (microinstruction), and each step is
linked to the next step.
Each step is called a pipe stage; the stages are connected one
to the next to form a pipe
Instructions enter at one end, progress through the
stages and exit at the other end.
The time required to move an instruction one step down
the pipeline is called a processor cycle
Pipeline stages
Example: Consider a RISC pipeline broken into five stages
with a set of flip flops between each stage as follows:
1 Instruction fetch (IF)
2 Instruction decode (ID)
3 Execute (EX)
4 Memory access (MEM)
5 Write-back (WB)
Block diagram showing pipeline stages
Pipelined processors consist of:
Internal modules which can work semi-independently on
separate microinstructions
Stages linked by flip-flops
Pipelining reduces the overall processing time of a sequence of
instructions, but it does not reduce the number of stages each
instruction passes through
Serial processing
A non-pipelined processor is not as efficient because some
CPU modules are idle while another is active
during the instruction cycle
Note that pipelining does not completely
remove idle time in a pipelined CPU.
Making CPU modules work in parallel
increases instruction throughput
An instruction pipeline is called fully pipelined if it
can accept a new instruction every clock cycle
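The overlap between instructions can be made concrete with a space-time diagram. The sketch below is an illustration only: it assumes the five RISC stages listed earlier, an ideal pipeline with no stalls, and one new instruction accepted per cycle. The function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def space_time_diagram(n_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied in cycle c + 1."""
    k = len(stages)
    total_cycles = k + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = ["."] * total_cycles  # "." marks a cycle in which the instruction is not in the pipe
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i enters the first stage in cycle i + 1
        rows.append(row)
    return rows


def format_diagram(rows):
    """Format the rows as a text table with a cycle-number header."""
    header = "      " + " ".join(f"{c:>3}" for c in range(1, len(rows[0]) + 1))
    lines = [header]
    for i, row in enumerate(rows, start=1):
        lines.append(f"I{i:<4} " + " ".join(f"{cell:>3}" for cell in row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_diagram(space_time_diagram(4)))
```

With four instructions the diagram spans 5 + 4 - 1 = 8 cycles, and from cycle 5 onward one instruction completes in every cycle.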
Non pipelined processor
Quantitative effects of pipelining
The time in nanoseconds taken by each individual instruction (its latency) goes up
Each instruction takes more cycles to execute, but the
average CPI remains roughly the same
Clock speed goes up
Total execution time goes down, resulting in a lower
average time per instruction
Example
Consider a simple example to understand pipelining
Consider 4 students, Ann, Ben, Candle, and Donald, who
share a washing machine, a drying machine and an
iron.
Washing takes 30 min
Drying takes 40 min
Ironing takes 20 min
Calculating execution time in serial load
If the students do their workload in sequence, how
much time will it take to finish the four wash loads?
Use diagram to explain your calculations.
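Calculating the time in pipelined load
If the students have been taught pipelining and decide
to apply it, the entire wash load would take 3.5 hours
The first load finishes after 30 + 40 + 20 = 90 min; each later
load finishes 40 min after the previous one, because drying is
the slowest stage: 90 + (3 x 40) = 210 min = 3.5 hours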
Calculating speedup
It is possible to calculate speedup, i.e. how much faster
pipelining is than serial processing:
speedup = serial time / pipelined time
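The laundry example can be checked with a short simulation. This is a minimal sketch, assuming each machine handles one load at a time and that a load may wait between machines until the next machine is free; the function names are hypothetical.

```python
def serial_time(stage_times, n_loads):
    """Each load runs start to finish before the next load begins."""
    return n_loads * sum(stage_times)


def pipelined_time(stage_times, n_loads):
    """Finish time of the last load when the machines work in parallel."""
    free_at = [0] * len(stage_times)  # time at which each machine is next free
    finish = 0
    for _ in range(n_loads):
        t = 0  # time at which this load is ready for the next machine
        for s, duration in enumerate(stage_times):
            start = max(t, free_at[s])  # wait for the load and for the machine
            t = start + duration
            free_at[s] = t
        finish = t
    return finish


if __name__ == "__main__":
    stages = [30, 40, 20]  # washing, drying, ironing (minutes)
    serial = serial_time(stages, 4)        # 4 x 90 = 360 min = 6 hours
    pipelined = pipelined_time(stages, 4)  # 210 min = 3.5 hours
    print(serial, pipelined, round(serial / pipelined, 2))
```

Under these assumptions the serial schedule takes 360 minutes and the pipelined one 210 minutes, a speedup of about 1.71. The speedup is below 3 (the number of stages) because the stages are unequal and only four loads pass through the pipeline.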
Calculating total time in pipelining
1) Let k be the number of stages in the pipeline and tp
the time taken per stage
2) Let each instruction be a task T in the pipeline
and let n be the number of tasks (instructions)
3) The first task requires k x tp time to complete in a k-stage
pipeline
4) The remaining n - 1 tasks emerge from the pipeline one
per cycle, so the total time to complete the remaining
tasks is (n - 1) tp
So completing the n tasks using a k-stage pipeline
requires:
(k x tp) + (n - 1) tp = (k + n - 1) tp where:
(k x tp) is the time for the first instruction
(n - 1) tp is the time for the remaining n - 1 instructions
This formula applies only if all stages take exactly the same
time and there are no wait cycles.
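The formula above translates directly into code. The sketch below uses hypothetical function names; the speedup function additionally assumes that the non-pipelined machine needs k x tp per task, which the lecture does not state explicitly.

```python
def pipeline_total_time(k, n, tp):
    """Time for n tasks on a k-stage pipeline with equal stage time tp and no stalls."""
    return (k + n - 1) * tp


def ideal_speedup(k, n):
    """Serial time n * k * tp divided by pipelined time (k + n - 1) * tp; tp cancels."""
    return (n * k) / (k + n - 1)


if __name__ == "__main__":
    print(pipeline_total_time(5, 100, 1))  # (5 + 100 - 1) x 1 = 104 cycles
    print(round(ideal_speedup(5, 100), 2))
    print(round(ideal_speedup(5, 1_000_000), 2))
```

As n grows, the speedup approaches k, the number of stages; for a single task (n = 1) there is no speedup at all.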
Clock skew
Pipelining introduces overhead such as clock skew, i.e. the clock
signal arriving at different times at different flip-flops,
especially adjacent ones
In such a case the length of each pipe stage (the clock cycle) is:
Length of pipe stage = MAX(lengths of unpipelined stages) +
overhead (clock skew)
The instruction latency of a k-stage pipeline is then k x length of pipe stage
Example
Consider a non-pipelined machine with six execution
stages of lengths 50 ns, 50 ns, 60 ns, 60 ns, 50 ns and
50 ns. Find the instruction latency and the
time it would take to execute 100 instructions.
Suppose pipelining is introduced to the above
machine and assume a clock skew of 5 ns is
added as overhead to each execution stage.
(i) What would be the instruction latency?
(ii) How much time would it take to execute 100
instructions?
Calculating instruction latency
Length of pipe stage = MAX(lengths of unpipelined
stages) + overhead
= MAX(50, 50, 60, 60, 50, 50) + 5 ns
= 60 + 5 = 65 ns
Instruction latency = 6 x 65 = 390 ns
To execute 100 instructions:
= (1 x 6 x 65) + (99 x 1 x 65)
= 390 + 6435
= 6825 ns
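The whole example, including the non-pipelined part of the question, can be reproduced in a few lines. This is a sketch with hypothetical variable names; it assumes the non-pipelined machine runs each instruction through all six stages back to back with no overhead.

```python
stage_lengths_ns = [50, 50, 60, 60, 50, 50]
skew_ns = 5
n = 100
k = len(stage_lengths_ns)

# Non-pipelined: one instruction at a time, all stages in sequence.
unpipelined_latency = sum(stage_lengths_ns)     # 320 ns
unpipelined_total = n * unpipelined_latency     # 32000 ns

# Pipelined: every stage is stretched to the slowest stage plus the skew.
stage_ns = max(stage_lengths_ns) + skew_ns      # 65 ns
pipelined_latency = k * stage_ns                # 390 ns
pipelined_total = (k + n - 1) * stage_ns        # 105 x 65 = 6825 ns

speedup = unpipelined_total / pipelined_total

if __name__ == "__main__":
    print(unpipelined_latency, unpipelined_total)
    print(stage_ns, pipelined_latency, pipelined_total)
    print(round(speedup, 2))
```

The latency of a single instruction rises from 320 ns to 390 ns, yet 100 instructions finish in 6825 ns instead of 32000 ns, a speedup of roughly 4.69.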
Superscalar processor
• A superscalar processor can execute more than one instruction
in a clock cycle
• It dispatches multiple instructions to several execution
units
• Each execution unit is not a separate processor but a
functional unit within the same processor, such as an ALU
• It implements a form of parallelism called
instruction-level parallelism within a single processor
Space-time diagram of a superscalar processor
Superscalar processor
This can be represented by the instruction stages as
shown below:
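A space-time diagram of this kind can be generated with a short script. The following is a minimal illustration, not the lecture's own figure: it assumes a hypothetical 2-way superscalar processor using the five RISC stages (IF, ID, EX, MEM, WB), with no stalls or dependencies between instructions. The issue width and function name are assumptions.

```python
import math

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def superscalar_diagram(n_instructions, width=2, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied in cycle c + 1."""
    k = len(stages)
    total_cycles = k + math.ceil(n_instructions / width) - 1
    rows = []
    for i in range(n_instructions):
        start = i // width  # `width` instructions enter the pipeline together
        row = ["."] * total_cycles
        for s, name in enumerate(stages):
            row[start + s] = name
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(superscalar_diagram(4), start=1):
        print(f"I{i:<3}" + " ".join(f"{cell:>3}" for cell in row))
```

Under these assumptions four instructions finish in 5 + 2 - 1 = 6 cycles, compared with 8 cycles on a single-issue five-stage pipeline, because two instructions occupy each stage in the same cycle.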