CompEng 361 - Homework 3 Solutions
1. In this exercise, we examine how pipelining affects the clock cycle time of the processor.
Problems in this exercise assume that individual stages of the datapath have the following
latencies:
IF ID EX MEM WB
Also, assume that instructions executed by the processor are broken down as follows:
R-Type beq lw sw
c. If we can split one stage of the pipelined datapath into two new stages, each with half the
latency of the original stage, which stage would you split and what is the new clock cycle
time of the processor?
d. Assuming there are no stalls or hazards, what is the utilization of the data memory?
e. Assuming there are no stalls or hazards, what is the utilization of the write-register port of
the “Registers” unit?
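The reasoning behind parts c through e can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this sketch, and the latencies and instruction mix in the usage lines are placeholders, not the values from the assignment's tables. It assumes the pipelined clock period equals the slowest stage latency, that only lw and sw access data memory, and that only R-type and lw instructions write a register.

```python
def cycle_time_after_split(latencies_ps):
    """Split the slowest stage into two equal halves and return
    (name of the stage that was split, new clock period)."""
    slowest = max(latencies_ps, key=latencies_ps.get)
    remaining = [t for stage, t in latencies_ps.items() if stage != slowest]
    return slowest, max(remaining + [latencies_ps[slowest] / 2])


def data_memory_utilization(mix):
    """Fraction of cycles that use data memory: loads plus stores."""
    return mix["lw"] + mix["sw"]


def write_port_utilization(mix):
    """Fraction of cycles that write a register: R-type plus loads."""
    return mix["r_type"] + mix["lw"]


# Placeholder inputs for illustration only; NOT the assignment's values.
example_latencies = {"IF": 200, "ID": 150, "EX": 300, "MEM": 250, "WB": 100}
example_mix = {"r_type": 0.5, "beq": 0.2, "lw": 0.2, "sw": 0.1}

stage, period = cycle_time_after_split(example_latencies)
print(stage, period)
print(data_memory_utilization(example_mix))
print(write_port_utilization(example_mix))
```

Splitting any stage other than the slowest one leaves the clock period unchanged, which is why the sketch always picks the maximum.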
CPI 1 4 1
2. In this exercise, we examine how data dependencies affect execution in the basic 5-stage
pipeline described in Section 4.5. Problems in this exercise refer to the following sequence of
instructions:
Also, assume the following cycle times for each of the options related to forwarding:
or r1, r2, r3
nop
nop
// Data hazard on r1
or r2, r1, r4
nop
nop
// Data hazard on r1, r2
or r1, r1, r2
c. Assume there is full forwarding. Indicate hazards and add nop instructions to eliminate
them.
No hazards…nothing to do!
d. What is the total execution time of this instruction sequence without forwarding and with
full forwarding? What is the speedup achieved by adding full forwarding to a pipeline that
had no forwarding?
e. Add nop instructions to this code to eliminate hazards if there is ALU-ALU forwarding only
(no forwarding from the MEM to the EX stage).
or r1, r2, r3
or r2, r1, r4
nop
nop
or r1, r1, r2
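The nop placement for each forwarding option can be reproduced mechanically. The following Python sketch is one possible way to do it, not the grader's method: `insert_nops` and the three policy names are hypothetical, and the code assumes every instruction is a three-register ALU operation whose first register is the destination, and that the register file is written in the first half of a cycle and read in the second half.

```python
import re


def insert_nops(program, hazard_free):
    """Return a copy of `program` padded with nops.

    `hazard_free(distance)` tells whether a consumer placed `distance`
    instruction slots after its producer reads the correct value.
    """
    out = []            # instructions emitted so far, including nops
    last_writer = {}    # register name -> index in `out` of its latest producer
    for instr in program:
        dest, *sources = re.findall(r"r\d+", instr)
        # Keep padding until every source operand is at a safe distance.
        while any(
            src in last_writer and not hazard_free(len(out) - last_writer[src])
            for src in sources
        ):
            out.append("nop")
        last_writer[dest] = len(out)
        out.append(instr)
    return out


def no_forwarding(distance):
    # The consumer's ID stage must not come before the producer's WB stage.
    return distance >= 3


def alu_alu_only(distance):
    # Adjacent instructions forward EX to EX; a gap of two would need MEM to EX.
    return distance == 1 or distance >= 3


def full_forwarding(distance):
    # ALU-only code has no load-use case, so every distance is safe.
    return True


PROGRAM = ["or r1, r2, r3", "or r2, r1, r4", "or r1, r1, r2"]

for policy in (no_forwarding, alu_alu_only, full_forwarding):
    print(policy.__name__, insert_nops(PROGRAM, policy))
```

With ALU-ALU forwarding only, one nop before the last instruction is not enough: it would move r1 to a safe distance but push r2 to a distance of two, which needs the missing MEM-to-EX path.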
f. What is the total execution time of this instruction sequence with only ALU-ALU
forwarding?
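For the execution-time questions in parts d and f, the arithmetic can be sketched as follows. The helper names are hypothetical, and the cycle time is left as a parameter because the cycle-time table is not reproduced in this copy. The sketch assumes a 5-stage pipeline in which a stall-free sequence of N instructions (nops included) takes N + 4 cycles.

```python
PIPELINE_STAGES = 5


def total_cycles(instruction_count, stages=PIPELINE_STAGES):
    """Cycles for a stall-free sequence: fill the pipeline, then one per instruction."""
    return instruction_count + stages - 1


def execution_time(instruction_count, cycle_time):
    """Total time; `cycle_time` comes from the table for the chosen forwarding option."""
    return total_cycles(instruction_count) * cycle_time


def speedup(old_time, new_time):
    """Values above 1 mean the new design is faster."""
    return old_time / new_time


cycles_no_forwarding = total_cycles(7)    # 3 instructions + 4 nops -> 11
cycles_full_forwarding = total_cycles(3)  # 3 instructions, no nops -> 7
cycles_alu_alu_only = total_cycles(5)     # 3 instructions + 2 nops -> 9
```

Multiplying each cycle count by the matching cycle time gives the execution times, and the ratio of the no-forwarding time to the full-forwarding time gives the speedup.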
3. The importance of having a good branch predictor depends on how often conditional branches
are executed. Together with branch predictor accuracy, this will determine how much time is
spent stalling due to mispredicted branches. In this exercise, assume that the breakdown of
dynamic instructions into various instruction categories is as follows:
a. Stall cycles due to mispredicted branches increase the CPI. What is the extra CPI due to
mispredicted branches with the always-taken predictor? Assume that branch outcomes
are determined in the EX stage, that there are no data hazards, and that no delay slots are
used.
b. Repeat 3a. for the “always-not-taken” predictor.
c. Repeat 3a. for the 2-bit predictor.
Predictor Miss Rate Occurrence Stall Cycles Extra CPI
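Each row of the table above follows the same formula, sketched here in Python. The function name is hypothetical, and all three inputs are parameters because the table's values are not reproduced in this copy; "occurrence" is taken to mean the fraction of dynamic instructions that are branches.

```python
def extra_cpi(miss_rate, occurrence, stall_cycles):
    """Extra CPI = (fraction of branches mispredicted)
                 x (fraction of instructions that are branches)
                 x (stall cycles paid per misprediction)."""
    return miss_rate * occurrence * stall_cycles


# Placeholder inputs for illustration only; NOT the assignment's values.
print(extra_cpi(miss_rate=0.4, occurrence=0.25, stall_cycles=2))
```

The same function covers parts a, b, and c: only the miss rate changes between predictors, while the occurrence and the stall cycles per misprediction stay fixed.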
d. With the 2-bit predictor, what speedup would be achieved if we could convert half of the
branch instructions in a way that replaces a branch instruction with an ALU instruction?
Assume that correctly and incorrectly predicted instructions have the same chance of
being replaced.
e. With the 2-bit predictor, what speedup would be achieved if we could convert half of the
branch instructions in a way that replaced each branch instruction with two ALU
instructions? Assume that correctly and incorrectly predicted instructions have the same
chance of being replaced.
f. Some branch instructions are much more predictable than others. If we know that 80% of
all executed branch instructions are easy-to-predict loop-back branches that are always
predicted correctly, what is the accuracy of the 2-bit predictor on the remaining 20% of the
branch instructions?
4. This exercise examines the accuracy of various branch predictors for the following repeating
pattern (e.g., in a loop) of branch outcomes:
a. What is the accuracy of always-taken and always-not-taken predictors for this sequence of
branch outcomes?
Always T: 0.4
Always NT: 0.6
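These two numbers can be checked by counting, as in the short Python sketch below. It assumes the repeating pattern is NT, T, NT, NT, T, which is the sequence shown in the part c table; the function name is made up for this example.

```python
PATTERN = ["NT", "T", "NT", "NT", "T"]  # assumed from the part c table


def static_accuracy(pattern, always_predict):
    """Fraction of outcomes that match a fixed prediction."""
    hits = sum(outcome == always_predict for outcome in pattern)
    return hits / len(pattern)


print(static_accuracy(PATTERN, "T"))   # 0.4
print(static_accuracy(PATTERN, "NT"))  # 0.6
```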
b. What is the accuracy of the two-bit predictor for the first 4 branches in this pattern,
assuming that the predictor starts off in the bottom left state from the lecture slides
(strongly predict not taken)?
Branches NT T NT NT
Pred NT NT NT NT
Only the second branch (T) is mispredicted, so the accuracy is 3/4 = 0.75.
c. What is the accuracy of the two-bit predictor if this pattern is repeated forever?
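Branches NT T NT NT T
Pred NT NT NT NT NT
The pattern never has two taken branches in a row, so the predictor never leaves the two not-taken states. It mispredicts both taken branches in every repetition, so the accuracy is 3/5 = 0.6.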
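The prediction rows in parts b and c can be reproduced with a small simulation. This Python sketch assumes the two-bit predictor behaves as a saturating counter (states 0 and 1 predict not taken, states 2 and 3 predict taken); the lecture-slide state diagram is not reproduced here, so that encoding is an assumption, and the function names are hypothetical.

```python
from itertools import cycle, islice


def two_bit_predictions(outcomes, state=0):
    """Yield (prediction, outcome) pairs; state 0 is strongly not taken."""
    for taken in outcomes:
        yield state >= 2, taken
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)


def accuracy(pairs):
    pairs = list(pairs)
    return sum(pred == actual for pred, actual in pairs) / len(pairs)


PATTERN = [False, True, False, False, True]  # NT T NT NT T

first_four = accuracy(two_bit_predictions(PATTERN[:4]))
long_run = accuracy(two_bit_predictions(islice(cycle(PATTERN), 5000)))
print(first_four)  # 0.75
print(long_run)    # 0.6
```

A single taken branch only moves the counter from strongly to weakly not taken, and the following not-taken branch moves it straight back, so every prediction is NT.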
d. Design a predictor that would achieve a perfect accuracy if this pattern is repeated forever.
Your predictor should be a sequential circuit with one output that provides a prediction (1
for taken, 0 for not taken) and no inputs other than the clock and the control signal that
indicates that the instruction is a conditional branch.
There are several ways to show this, including Verilog code or a gate diagram with flip-flops. The
most straightforward way is a state diagram:
Slight variations on this are fine. Note that this predictor must be initialized in the correct state to
predict the pattern perfectly.
e. What is the accuracy of your predictor from part d if it is given a repeating pattern that is
the exact opposite of this one?
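One way to model the part d predictor in software is a five-state counter that steps once per branch and outputs the stored pattern. The Python sketch below is an illustration, not the official state diagram: the class and function names are hypothetical, and it assumes the pattern is NT, T, NT, NT, T with state 0 aligned to the first branch.

```python
class FixedPatternPredictor:
    """Five-state cyclic counter; the output depends only on the state."""

    PATTERN = (0, 1, 0, 0, 1)  # NT T NT NT T, with 1 = taken

    def __init__(self, start_state=0):
        self.state = start_state

    def predict(self):
        return self.PATTERN[self.state]

    def clock(self, is_branch):
        # Advance only on conditional branches, as the problem requires.
        if is_branch:
            self.state = (self.state + 1) % len(self.PATTERN)


def run(predictor, outcomes):
    """Return the fraction of outcomes predicted correctly."""
    hits = 0
    for outcome in outcomes:
        hits += predictor.predict() == outcome
        predictor.clock(is_branch=True)
    return hits / len(outcomes)


pattern = [0, 1, 0, 0, 1] * 4
opposite = [1 - bit for bit in pattern]

print(run(FixedPatternPredictor(), pattern))   # 1.0
print(run(FixedPatternPredictor(), opposite))  # 0.0
```

Because the predictor ignores actual outcomes, feeding it the opposite pattern makes every prediction wrong, and starting it in the wrong state also breaks the perfect accuracy.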
f. Repeat 4d., but now your predictor should be able to eventually (after a warm-up period
during which it can make wrong predictions) start perfectly predicting both this pattern and
its opposite. Your predictor should have an input that tells it what the real outcome was.
Hint: this input lets your predictor determine which of the two repeating patterns it is given.
Again, there are many ways to show this. The simplest approach is to distinguish between the
two patterns early in the sequence (warm-up). As before, we assume that the predictor is initialized
into one of the two starting states. Here is a state diagram:
Note that the two inputs identify if the current instruction is a branch and what the real outcome
was. Slight variations on this are fine.
Given that we have an input to tell us what the correct prediction was, we can actually devise a
more complex predictor that will eventually correctly predict either pattern but won’t need to be
initialized into the correct state.
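One possible design of this kind, offered only as a sketch and not as the intended solution, is a five-entry shift register of past outcomes that predicts whatever happened five branches earlier. The Python model below uses hypothetical names and assumes only that the pattern repeats with period five; it needs no particular initial state.

```python
from collections import deque
from itertools import cycle, islice


class PeriodFivePredictor:
    """Predict the outcome observed five branches ago."""

    PERIOD = 5

    def __init__(self):
        # Initial contents are arbitrary; they are flushed after five branches.
        self.history = deque([0] * self.PERIOD, maxlen=self.PERIOD)

    def predict(self):
        return self.history[0]  # oldest stored outcome

    def update(self, is_branch, taken):
        if is_branch:
            self.history.append(taken)  # maxlen drops the oldest entry


def misses_after_warmup(predictor, outcomes, warmup=5):
    """Count mispredictions once the first `warmup` branches have passed."""
    misses = 0
    for i, taken in enumerate(outcomes):
        if i >= warmup and predictor.predict() != taken:
            misses += 1
        predictor.update(is_branch=True, taken=taken)
    return misses


pattern = [0, 1, 0, 0, 1]           # NT T NT NT T, with 1 = taken
opposite = [1 - b for b in pattern]

print(misses_after_warmup(PeriodFivePredictor(), list(islice(cycle(pattern), 50))))
print(misses_after_warmup(PeriodFivePredictor(), list(islice(cycle(opposite), 50))))
```

The cost relative to the state-diagram solutions is five bits of state instead of a handful of states, in exchange for handling both patterns from any starting point.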