Lecture13 Pipeline1
Pipeline Architecture
- Performance measurement
Example: ADD R1 R2 R3 | R1 ← R2 + R3
We will design a processor that executes the above ADD instruction!
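As a minimal sketch of what this instruction does, the register transfer R1 ← R2 + R3 can be modeled in Python. The register-file dictionary, the `execute_add` helper, and the register values are hypothetical and serve only as an illustration; they are not part of the lecture's hardware design.

```python
# Hypothetical software model of the register transfer R1 <- R2 + R3.
def execute_add(regs, dest, src1, src2):
    """Return a new register file after executing ADD dest src1 src2."""
    updated = dict(regs)                      # leave the caller's register file untouched
    updated[dest] = regs[src1] + regs[src2]   # the adder (+) in the datapath
    return updated

regs = {"R1": 0, "R2": 7, "R3": 5}            # illustrative values only
after = execute_add(regs, "R1", "R2", "R3")
print(after["R1"])
```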
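[Figure: single-cycle datapath for ADD R1 R2 R3. The PC points to the instruction, registers R2 and R3 feed the adder (+), the result is written to R1, and a Control block drives the datapath. The figure also shows an LW instruction and a 4 ns delay annotation.]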
CPU Design
Single cycle CPU design:
Important points to note with respect to single cycle design:
The first question to ask: what is the problem with the single-cycle design?
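[Figure: datapath with registers R0 … R15, the PC pointing to ADD R1 R2 R3 followed by LW R13, 0(R11), a temporary register (Temp Reg) at the adder (+), and a Control block; annotated delay: 4 ns.]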
CPU Design
Multi-cycle CPU design: Performance analysis
Speedup = 28 ns / 16 ns = 1.75 times
1 – The idea is to partition the data path in such a way that the cycle time
is as small as possible (the smallest execution time of any instruction).
2 – The design requires more hardware resources than the corresponding single-cycle
design.
4 – The multi-cycle design supports a more diverse set of instructions efficiently,
whereas in a single-cycle design a diverse set of instructions leads to performance loss.
5 – If the instruction set is uniform in terms of execution time, it seems wise to implement
it as a single-cycle design. However, we will see next that this argument does not always hold.
7 – Partitioning the design is one of the challenging problems, since it requires each partition
of the data path to be uniform in terms of path length (or propagation delay).
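The trade-off in the points above can be sketched numerically. The instruction classes, step counts, instruction counts, and the 1 ns step delay below are assumed values chosen for illustration; they are not the lecture's figures, and the sketch ignores any extra register overhead in the multi-cycle design.

```python
# Illustrative comparison of single-cycle and multi-cycle execution time.
# All numbers are assumptions, not measurements from the lecture.
STEP_NS = 1.0  # assumed delay of one datapath partition (one multi-cycle clock)

# instruction class -> (number of steps it needs, how many are executed)
program = {
    "ALU":  (4, 60),
    "LOAD": (5, 40),
}

def single_cycle_time(program, step_ns):
    # The single clock period must fit the slowest instruction.
    cycle_ns = max(steps for steps, _ in program.values()) * step_ns
    return cycle_ns * sum(count for _, count in program.values())

def multi_cycle_time(program, step_ns):
    # Each instruction uses only as many short cycles as it needs.
    return sum(steps * count for steps, count in program.values()) * step_ns

single = single_cycle_time(program, STEP_NS)
multi = multi_cycle_time(program, STEP_NS)
print(single, multi, single / multi)
```

When every instruction class needs the same number of steps, both functions return the same time, which is the uniform case described in point 5.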
CPU Design
Single cycle CPU design
Parallelism
This is one of the architectural principles.
When you do a Google search
        IF      ID      EX      MEM     WB
CC 1    INS 1   -       -       -       -
CC 2    INS 2   INS 1   -       -       -
CC 3    INS 3   INS 2   INS 1   -       -
CC 4    INS 4   INS 3   INS 2   INS 1   -
CC 5    INS 5   INS 4   INS 3   INS 2   INS 1
CC 6    -       INS 5   INS 4   INS 3   INS 2
CC 7    -       -       INS 5   INS 4   INS 3
CC 8    -       -       -       INS 5   INS 4
CC 9    -       -       -       -       INS 5
CC – clock cycle, INS 1 – instruction 1;
IF – instruction fetch, ID – instruction decode, EX – instruction execution,
MEM – Memory operation (load/store), WB – write back to register
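The stage occupancy for five instructions in a five-stage pipeline can be generated with a short sketch. The function name `occupancy` and the use of `None` for an empty stage are choices made for this illustration; the sketch assumes an ideal pipeline with no stalls.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def occupancy(num_instructions, stages=STAGES):
    """Return one row per clock cycle; each row holds the instruction
    number in every stage, or None when the stage is empty."""
    k = len(stages)
    rows = []
    for cc in range(1, num_instructions + k):   # k + n - 1 cycles in total
        row = []
        for s in range(k):
            ins = cc - s                        # instruction i is in stage s at cycle i + s
            row.append(ins if 1 <= ins <= num_instructions else None)
        rows.append(row)
    return rows

table = occupancy(5)
for cc, row in enumerate(table, start=1):
    cells = [f"INS {i}" if i is not None else "-" for i in row]
    print(f"CC {cc}", *cells)
```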
Pipeline Architecture
Performance measurement:
[Figure: five pipeline stages, each taking 0.5 ns.]
Ideal pipeline:
- No stage stalls at any point in time.
- All instructions flow through the pipeline without interruption.
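With k stages and n instructions, an ideal pipeline needs k + (n – 1) clock cycles.
The k term is due to the fact that the first instruction requires k cycles to complete;
each of the remaining n – 1 instructions then completes in one of the following n – 1 cycles.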
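The cycle count of an ideal pipeline can be checked with a small sketch. The function names are hypothetical, and the comparison assumes that an unpipelined machine spends k cycles of the same length on every instruction.

```python
def pipelined_cycles(n, k):
    """Cycles for n instructions on an ideal k-stage pipeline."""
    return k + (n - 1)      # k cycles for the first instruction, 1 for each of the rest

def unpipelined_cycles(n, k):
    """Assumed baseline: every instruction takes k cycles, one after another."""
    return n * k

def ideal_speedup(n, k):
    return unpipelined_cycles(n, k) / pipelined_cycles(n, k)

print(pipelined_cycles(5, 5))
print(ideal_speedup(5, 5))
print(ideal_speedup(1_000_000, 5))
```

For a large number of instructions the speedup approaches k, the number of stages.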
Pipeline Architecture
Example Exercise:
Consider the unpipelined processor that has been discussed. Assume that it has
a 2 GHz clock (a 0.5 ns clock cycle) and that it uses 4 cycles for ALU
operations and branches and 5 cycles for memory operations. Assume that
the relative frequencies of these operations are 40%, 20% and 40%, respectively.
Suppose that due to clock skew and setup, pipelining the processor adds 0.1 ns
of overhead to the clock. Ignoring all other latencies, how much speedup
in the instruction execution rate will be gained from pipelining?
The clock cycle of the pipelined processor would be 0.6 ns: the 0.5 ns base cycle plus 0.1 ns of skew and setup overhead.
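The rest of the calculation can be sketched as follows. The variable names are illustrative, and the sketch assumes an ideal pipeline that completes one instruction per 0.6 ns cycle.

```python
CLOCK_NS = 0.5       # unpipelined clock cycle (2 GHz)
OVERHEAD_NS = 0.1    # skew and setup overhead added by pipelining

# operation class -> (relative frequency, cycles on the unpipelined processor)
mix = {"ALU": (0.40, 4), "branch": (0.20, 4), "memory": (0.40, 5)}

avg_cycles = sum(freq * cycles for freq, cycles in mix.values())
unpipelined_ns = CLOCK_NS * avg_cycles     # average time per instruction
pipelined_ns = CLOCK_NS + OVERHEAD_NS      # one instruction per cycle (ideal pipeline)
speedup = unpipelined_ns / pipelined_ns

print(round(avg_cycles, 2), round(unpipelined_ns, 2), round(speedup, 2))
```

Under these assumptions the average unpipelined instruction time is 0.5 ns × 4.4 = 2.2 ns, so the speedup is 2.2 ns / 0.6 ns, roughly 3.7 times.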
Pipeline Architecture
Exercise on Performance Comparison:
Exercise 1:
3) Imagine an ISA where all the instructions take exactly the same time, say t ns.
How would the performance of the single-cycle, multi-cycle and pipelined
processors compare?
Pipeline Architecture
Pipeline Processor Design:
In this design, if the control path is also partitioned, you get a pipelined design.
IF – instruction fetch:
ID – instruction decode
PC ← NPC, or
PC ← ALUout if the branch condition is satisfied (branch instruction)
LMD ← Mem[ALUout] (load instruction), or
Mem[ALUout] ← B (store instruction)
Note: The figure above does not show the control path. The control signals go to the
select lines of the muxes and to the ALU control. These signals are also pipelined.
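The register transfers PC ← NPC / ALUout, LMD ← Mem[ALUout] and Mem[ALUout] ← B can be sketched in software. This is a simplified model, assuming that all of these transfers happen in the memory stage; the function `mem_stage`, the `op` and `cond` parameters, and the dictionary used as memory are hypothetical names for illustration.

```python
def mem_stage(npc, alu_out, b, cond, memory, op):
    """Return (next_pc, lmd) for one instruction in the memory stage.
    op is "load", "store", "branch", or any other string for ALU operations."""
    lmd = None
    if op == "load":
        lmd = memory[alu_out]      # LMD <- Mem[ALUout]
    elif op == "store":
        memory[alu_out] = b        # Mem[ALUout] <- B
    # PC <- ALUout for a taken branch, otherwise PC <- NPC
    next_pc = alu_out if (op == "branch" and cond) else npc
    return next_pc, lmd

memory = {100: 42}                 # illustrative memory contents
print(mem_stage(npc=8, alu_out=100, b=0, cond=False, memory=memory, op="load"))
print(mem_stage(npc=8, alu_out=104, b=7, cond=False, memory=memory, op="store"))
print(mem_stage(npc=8, alu_out=40, b=0, cond=True, memory=memory, op="branch"))
```

In hardware, the `if`/`else` choices correspond to mux select lines driven by the pipelined control signals.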
Pipeline Architecture
Exercise:
1 – Complete the pipeline design of the MIPS architecture with control signals
3 – Find the hardware overhead of the pipelined design in comparison to the single-cycle and
multi-cycle designs for the MIPS architecture and the RISC-V architecture.
Next Lecture
Pipeline to continue: Hazards
Long Latency pipeline