Investigating Instruction Pipelining
Introduction
Objectives
At the end of this lab you should be able to:
Demonstrate the difference between pipelined and sequential processing of CPU instructions
Explain pipeline data dependencies and data hazards
Describe a pipeline technique to eliminate data hazards
Demonstrate the benefits of the compiler's "loop unrolling" optimization for instruction pipelining
Describe how the compiler re-arranges instructions to minimize data dependencies
Show the use of a jump-predict table for pipeline optimization
Basic Theory
Modern CPUs incorporate instruction pipelines, which can work on different
stages of several instructions in parallel, thus improving the overall
performance of the CPU. However, most programs include instructions that do
not readily lend themselves to smooth pipelining, causing pipeline hazards
and effectively reducing the CPU's performance. As a result, CPU pipelines
are designed with some tricks up their sleeves for dealing with these
hazards.
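As a rough back-of-the-envelope illustration (not a figure taken from the
simulator), consider a five-stage pipeline: executed one at a time, N
instructions need about 5 x N clock cycles, whereas a full pipeline needs
only about 5 + (N - 1) cycles once it is flowing. The short Python sketch
below works through the arithmetic for 100 instructions, assuming ideal
conditions with no hazards:

# Idealised cycle counts for a 5-stage pipeline with no hazards (illustration only).
stages = 5
instructions = 100

sequential_cycles = stages * instructions        # each instruction runs start to finish alone
pipelined_cycles = stages + (instructions - 1)   # fill the pipeline, then one result per cycle

print(sequential_cycles, pipelined_cycles)                        # 500 vs 104
print("speed-up factor:", sequential_cycles / pipelined_cycles)   # roughly 4.8

In practice the measured speed-up factor is lower than this ideal figure,
because hazards interrupt the flow of instructions; the exercises below
explore exactly why.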
Exercise 1 – Difference between the sequential and the pipelined execution
of CPU instructions
Enter the following source code, compile it and load it into the simulator's memory:
program Ex1
for n = 1 to 20
p = p + 1
next
end
Open the CPU pipeline window by clicking on the SHOW PIPELINE… button in
the CPU simulator’s window. You should now see the Instruction Pipeline
window. This window simulates the behaviour of a CPU pipeline. Here we can
observe the different stages of the pipeline as program instructions are
processed. This pipeline has five stages. The stages are colour-coded as
shown in the key for the “Pipeline Stages”.
List the names of the stages here:
1. Fetch
2. Decode
3. Read Operands
4. Execute
5. Write Result
The instructions that are being pipelined are listed on the left side (in white
text boxes). The newest instruction in the pipeline is at the bottom and the
oldest at the top. You’ll see this when you run the instructions. The horizontal
yellowish boxes display the stages of an instruction as it progresses through
the pipeline. At the bottom left corner pipeline statistics are displayed as the
instructions are executed.
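To picture what the window is showing, the short Python sketch below (purely
illustrative, not part of the simulator) prints how three instructions
overlap in the five stages, each instruction entering the pipeline one clock
cycle after the previous one:

# Staggered view of three instructions moving through the five pipeline stages.
# F = Fetch, D = Decode, R = Read Operands, E = Execute, W = Write Result.
stages = "F D R E W"
for i in range(3):
    print("  " * i + stages)   # each row is one instruction, shifted right by one cycle

Reading the printout column by column, each column is one clock cycle, and in
the middle cycles the CPU is working on three different instructions at once.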
Check the box titled Stay on top and make sure the No instruction pipeline
check box is selected. In the CPU simulator window, bring the speed slider
down to around a reading of 30. Run the program and observe the pipeline.
Wait for the program to complete. Now make a note of the following values:
Next, uncheck the No instruction pipeline checkbox, reset and run the above
program again and wait for it to complete.
Note down your observations on how the pipeline visually behaved differently:
Now, once again, make a note of the following values:
Briefly explain why you think there is a difference in the two sets of values:
- No instruction pipeline: with this box checked the pipeline is disabled, so
the CPU must finish each instruction completely before starting the next one;
with the pipeline enabled, several instructions are processed in overlapping
stages, so the same program completes in fewer clock cycles.
Exercise 2 – Pipeline data dependency and data hazards
CPU pipelines often have to deal with various hazards, i.e. those aspects of
the CPU architecture which prevent the pipeline from running smoothly and
uninterrupted. These are often called "pipeline hazards". One such hazard is
called the "data hazard". A data hazard is caused by the unavailability of an
operand value when it is needed: an instruction wants to read a value that an
earlier instruction, still in the pipeline, has not yet written. In order to
demonstrate this, create a program (call it Ex2) and enter the following set
of instructions. Note that the ADD R01, R03 instruction reads R01 immediately
after the MOV #3, R01 that writes it, so the ADD cannot proceed until that
value becomes available:
MOV #1, R01
MOV #5, R03
MOV #3, R01
ADD R01, R03
HLT
Make a note of the following values:
CPI (Clocks Per Instruction) 13
SF (Speed-up Factor) 1.92
Data Hazards 1
Data hazards are situations where an instruction needs an operand value that
an earlier instruction, still in the pipeline, has not yet produced; the
dependent instruction has to wait, which appears as a bubble in the pipeline.
Exercise 3 – A pipeline technique to eliminate data hazards
One way of dealing with a "data hazard" is to get the CPU to "speed up" the
availability of operands to pipelined instructions. One such method is called
"operand forwarding", a kind of short-cut taken by the hardware in which a
result is routed straight from the stage that produces it to the instruction
waiting for it. To demonstrate this, check the box titled Enable operand
forwarding and run the above code again.
- Operand forwarding passes a result from one instruction directly to the
instruction that needs it as soon as the result is produced, rather than
making the dependent instruction wait until the value has been written back.
Because the operand becomes available earlier, the dependent instruction no
longer has to stall, so the bubble caused by the data hazard is removed while
the instructions still execute in the correct order and produce the same
results.
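As a toy model of why forwarding removes the bubble (a sketch only, using
stage numbers that match the five stages listed in Exercise 1, not the
simulator's internals): a result is normally available only after Write
Result (stage 5), but a dependent instruction wants it at Read Operands
(stage 3); forwarding hands it over straight after Execute (stage 4).

# Toy stall counter for two dependent instructions 'distance' apart in program order.
def stall_cycles(result_ready_stage, operand_needed_stage, distance):
    gap = result_ready_stage - (operand_needed_stage + distance)
    return max(gap, 0)   # a negative gap means no waiting is needed

# Back-to-back instructions, as in Ex2 (MOV writes R01, ADD reads it immediately):
print("without forwarding:", stall_cycles(5, 3, 1))   # 1 bubble cycle
print("with forwarding   :", stall_cycles(4, 3, 1))   # 0 bubble cycles

The exact number of bubble cycles in the simulator may differ, but the
principle is the same: the earlier the result can be handed over, the fewer
cycles the dependent instruction has to wait.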
Has the bubble seen in Exercise 2 disappeared (or burst!)?
Make a note of the following values:
CPI (Clocks Per Instruction)
SF (Speed-up Factor)
Data Hazards
Exercise 4 – Compiler "loop unrolling" optimization
Make a note of the size of the code generated for Ex4_1 here:
Now, load this code into the CPU simulator's memory.
Make sure the pipeline window stays on top. Also make sure the Enable
operand forwarding and Enable jump prediction boxes are both unchecked.
First, select program Ex4_1 from the PROGRAM LIST frame in the CPU
simulator window then click the RESET button. Make sure the speed of
simulation is set at maximum. Now click the RUN button to run program
Ex4_1. Observe the pipeline and when the program is finished make a note of
the following values:
Do the same with program Ex4_2 and make note of the following values:
Briefly comment on your observations, making reference to the code sizes and
the number of instructions executed:
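For reference, here is a rough Python analogue of what "loop unrolling"
means (an assumption about how Ex4_1 and Ex4_2 differ, since their listings
are not reproduced here): the unrolled version does the same work with far
fewer compare-and-jump instructions, at the price of a larger program.

# Rolled loop: 20 iterations, so the compare-and-jump at the end of the loop runs 20 times.
p = 0
for n in range(20):
    p = p + 1

# Unrolled by a factor of 4: the same final value, but only 5 iterations' worth of jumps;
# the repeated body makes the generated code larger.
q = 0
for n in range(5):
    q = q + 1
    q = q + 1
    q = q + 1
    q = q + 1

print(p, q)   # both print 20

Fewer jumps generally means fewer interruptions to the pipeline, which is why
an unrolled program can execute more instructions overall and still come out
ahead.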
Exercise 5 – Compiler re-arranging instruction sequence to help minimize
data dependencies
The optimization in Exercise 4 is one example of how a modern compiler can
provide support for the CPU pipeline. Another example is when the compiler
re-arranges the code without changing the logic of the code. This is done to
minimize pipeline hazards such as the “data hazard” we studied in Exercise
3. Here we demonstrate this technique.
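As a purely illustrative sketch (not the simulator's actual optimizer
output), the Python snippet below shows the idea using the three-line program
of this exercise: "b = a" reads the value written by "a = 1", so the compiler
can slide the independent "c = 2" in between them to space out the
dependency.

# Each statement is recorded as (destination, value it reads); None means no register read.
original  = [("a", None), ("b", "a"), ("c", None)]   # dependent pair back to back
reordered = [("a", None), ("c", None), ("b", "a")]   # same results, dependency spaced out

def back_to_back_dependencies(seq):
    # Count places where a statement reads the result of the statement right before it.
    return sum(1 for prev, cur in zip(seq, seq[1:]) if cur[1] == prev[0])

print(back_to_back_dependencies(original))    # 1 -> likely pipeline bubble
print(back_to_back_dependencies(reordered))   # 0 -> bubble avoided

The logic of the program is unchanged: both orderings leave a, b and c with
the same final values.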
Make sure the Show dependencies check box is checked and ONLY the Redundant
Code optimization is selected. Enter the following source code, compile it
and load it into memory:
program Ex5_1
a = 1
b = a
c = 2
end
Copy the CPU instruction sequence generated below (do not include the
instruction addresses):
Next, select the optimization option Code Dependencies. Change the program
name to Ex5_2, compile it and load it into memory.
Copy the CPU instruction sequence generated below:
How do the two sequences differ? Does the change affect the logic of the
program? Briefly explain the rationale for the change:
Let’s see if we can measure any improvement introduced by this “out of
sequence execution” method.
First reset and run program Ex5_1 and make note of the values below:
CPI (Clocks Per Instruction)
SF (Speed-up Factor)
Next, reset and run program Ex5_2 and make note of the values below:
CPI (Clocks Per Instruction)
SF (Speed-up Factor)
Do you see any improvement in program Ex5_2 over program Ex5_1 (express this
as a percentage)?
No improvement
Exercise 6 – Jump predict table
The CPU pipeline uses a table to keep a record of predicted jump addresses.
Whenever a conditional jump instruction is being executed, this table is
consulted to see what the jump address is predicted to be. If the prediction
turns out to be wrong, the calculated address is used instead. The predicted
address will often be correct, with only occasional mispredictions, so the
overall effect is an improvement in the CPU's performance.
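A minimal sketch of the idea in Python, assuming a simple "remember the last
target" scheme (the simulator's own table layout and field names may differ):

# Toy jump-predict table: keyed by the address of a conditional jump instruction.
jump_table = {}   # jump instruction address -> target address it actually took last time

def predict(jump_addr, fall_through_addr):
    # If this jump has been seen before, guess it goes where it went last time;
    # otherwise guess it is not taken and simply falls through.
    return jump_table.get(jump_addr, fall_through_addr)

def update(jump_addr, actual_target):
    # Record where the jump really went so the next prediction can use it.
    jump_table[jump_addr] = actual_target

A correct prediction lets the pipeline keep fetching without waiting for the
jump to be resolved; a misprediction costs the cycles needed to discard the
wrongly fetched instructions.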
Enter the following program and compile it with ONLY the Enable optimizer
and Remove redundant code check boxes selected. Load the compiled
program in the CPU.
program Ex6
i = 0
for p = 1 to 40
i = i + 1
if i = 10 then
i = 0
r = i
end if
next
end
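Before running it, note why this program suits jump prediction: the test
i = 10 is true only once in every ten iterations, so whichever way the
compiler arranges the conditional jump, its outcome is heavily biased in one
direction. The tiny Python check below (an illustration, not the simulator)
counts the split over the 40 iterations:

# The condition "i = 10" holds on iterations 10, 20, 30 and 40 only.
biased = sum(1 for i in range(1, 41) if i % 10 == 0)
print(biased, "iterations take the rare path,", 40 - biased, "take the common path")

A predictor that simply remembers the usual outcome will therefore be right
most of the time.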
Run the program and make a note of the following pipeline stats:
Now, in the pipeline window select the Enable jump prediction check box.
Reset the program and run it again. Make a note of the following pipeline
stats:
Click on the SHOW JUMP TABLE… button. You should see the Jump Predict
Table window showing. This table keeps an entry for each conditional jump
instruction. Each entry contains the following fields. Can you suggest what
each field stands for? Enter your suggestions in the table below:
JInstAddr
JTarget
PStat
Count