2004 Spring Exam2
Name:
Section:
Question 1: Single-cycle CPU implementation (30 points)
In discussion sections, you have seen how to implement the jal instruction. That instruction is useful only if we can also return
from function calls. To perform the return, implement the jr rs instruction (jump to register rs) in the single-cycle
datapath. The format of the jr instruction is shown below.
Field   op = 0   rs      0      func = 8
Bits    31-26    25-21   20-6   5-0
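The field table can be checked with a small Python sketch that packs and unpacks a jr word. The helper names encode_jr and decode_jr are hypothetical and not part of the exam; the sketch only restates the bit layout above, and the familiar jr $ra word (rs = 31) falls out of it.

```python
from typing import Optional

JR_FUNC = 8  # func field value for jr, from the table above

def encode_jr(rs: int) -> int:
    """Pack jr rs: op (31-26) = 0, rs (25-21), bits 20-6 = 0, func (5-0) = 8."""
    if not 0 <= rs < 32:
        raise ValueError("register number must be in 0..31")
    return (0 << 26) | (rs << 21) | JR_FUNC

def decode_jr(word: int) -> Optional[int]:
    """Return rs if word is a jr instruction, otherwise None."""
    op = (word >> 26) & 0x3F
    middle = (word >> 6) & 0x7FFF   # bits 20-6 (15 bits) must all be zero
    func = word & 0x3F
    if op == 0 and middle == 0 and func == JR_FUNC:
        return (word >> 21) & 0x1F
    return None

print(hex(encode_jr(31)))  # 0x3e00008, i.e. jr $ra
```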
Before implementing the jr instruction, do parts (a) and (b) as a small warm-up exercise.
Part (a)
The single-cycle datapath from lecture appears below. Clearly mark all wires that are active during the
execution of the lw instruction. (10 points)
Part (b)
On the diagram below, write (next to the signal’s name) values of all non-0 control signals required for the lw
instruction. (5 points)
[Figure: single-cycle datapath from lecture: PC, instruction memory, register file (Read register 1 = I[25-21], Read register 2 = I[20-16], Write register selected by a RegDst mux from I[20-16] and I[15-11]), sign extend of I[15-0], shift left 2, PC + 4 and branch adders, ALU, data memory, and muxes; control signals PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg.]
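As a behavioral cross-check for parts (a) and (b), the Python sketch below walks one lw through the single-cycle datapath, touching the same units in the same order as the figure. The Machine class and step_lw function are hypothetical names; the control values in the comments assume the standard lecture settings for lw (RegDst = 0, ALUSrc = 1, MemRead = 1, MemToReg = 1, RegWrite = 1, ALUOp = ADD, PCSrc = 0).

```python
from dataclasses import dataclass, field

MASK32 = 0xFFFFFFFF

def sign_extend16(imm: int) -> int:
    """Sign-extend the 16-bit immediate I[15-0]."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

@dataclass
class Machine:
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)
    imem: dict = field(default_factory=dict)   # byte address -> instruction word
    dmem: dict = field(default_factory=dict)   # byte address -> data word

def step_lw(m: Machine) -> None:
    """Execute one lw in a single cycle: R[rt] = Mem[R[rs] + sign_extend(imm)]."""
    instr = m.imem[m.pc]                   # instruction memory read
    rs = (instr >> 21) & 0x1F              # I[25-21] -> Read register 1
    rt = (instr >> 16) & 0x1F              # I[20-16] -> Write register (RegDst = 0)
    imm = sign_extend16(instr & 0xFFFF)    # sign extend -> ALU input B (ALUSrc = 1)
    address = (m.regs[rs] + imm) & MASK32  # ALU adds base and offset (ALUOp = ADD)
    data = m.dmem.get(address, 0)          # data memory read (MemRead = 1)
    if rt != 0:                            # register 0 stays zero
        m.regs[rt] = data                  # MemToReg = 1, RegWrite = 1
    m.pc = (m.pc + 4) & MASK32             # PCSrc = 0: PC = PC + 4

m = Machine()
m.regs[4] = 100                                        # $a0 holds a base address
m.dmem[104] = 42
m.imem[0] = (35 << 26) | (4 << 21) | (8 << 16) | 4     # lw $t0, 4($a0)
step_lw(m)
print(m.regs[8], m.pc)  # 42 4
```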
Part (c)
The single-cycle datapath from lecture (the same as on the previous page) appears below. Show what changes are
needed to support the jr instruction. You should only add wires and muxes to the datapath; do not modify the main
functional units themselves (the memory, register file and ALU). Try to keep your diagram neat! (10 points)
Note: While we're primarily concerned with correctness, full points will be awarded only to solutions that do
not lengthen the clock cycle. Assume that the ALU, Memory and Register file each take 2ns, and everything else is
instantaneous.
Part (d)
On the diagram below, write (next to the signal’s name) values of all non-0 control signals required for the jr
instruction. (5 points)
[Figure: single-cycle datapath from lecture (same as for parts (a) and (b)), with control signals PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg.]
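One possible design for parts (c) and (d), sketched behaviorally in Python: add a mux after the existing PCSrc mux that can forward the register file's Read data 1 output (R[rs]) to the PC. The control signal jr_sel and the function names are hypothetical. With the 2ns delays from the note, the new path (instruction memory, then register read, then the new mux) settles in about 4ns, shorter than the lw path, so under these assumptions it should not lengthen the clock cycle.

```python
from typing import List

MASK32 = 0xFFFFFFFF

def next_pc(pc: int, branch_target: int, read_data_1: int,
            pc_src: int, jr_sel: int) -> int:
    """Next-PC selection with one extra mux for jr.

    pc_src picks PC + 4 (0) or the branch target (1), as in the lecture
    datapath. jr_sel is a hypothetical new control signal: when it is 1 the
    new mux forwards Read data 1 (R[rs]) to the PC instead.
    """
    pc_plus_4_or_branch = branch_target if pc_src else (pc + 4) & MASK32
    return read_data_1 if jr_sel else pc_plus_4_or_branch

def step_jr(pc: int, regs: List[int], instr: int) -> int:
    """One single-cycle jr: no register or memory write, PC = R[rs]."""
    rs = (instr >> 21) & 0x1F   # I[25-21] already drives Read register 1
    # RegWrite = 0, MemWrite = 0, MemRead = 0; only jr_sel = 1.
    # branch_target is a don't-care here because jr_sel overrides it.
    return next_pc(pc, branch_target=0, read_data_1=regs[rs], pc_src=0, jr_sel=1)

regs = [0] * 32
regs[31] = 0x00400100                              # $ra holds the return address
print(hex(step_jr(0x00400000, regs, 0x03E00008)))  # 0x400100
```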
Question 2: Multi-cycle CPU implementation and its performance (50 points)
Assume that the ALU can perform a max2 operation (i.e., return the greater of its two inputs):
alu_result = (A_input > B_input) ? A_input : B_input;
The ALUOp value for this operation is MAX2.
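The MAX2 operation can be modeled in a few lines of Python. The exam does not say whether the comparison is signed; this sketch assumes the ALU compares the inputs as signed 32-bit values, and the helper names are hypothetical.

```python
def to_signed32(word: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    word &= 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word

def alu_max2(a_input: int, b_input: int) -> int:
    """MAX2: alu_result = (A_input > B_input) ? A_input : B_input (signed compare assumed)."""
    return a_input if to_signed32(a_input) > to_signed32(b_input) else b_input

print(alu_max2(3, 7))           # 7
print(alu_max2(0xFFFFFFFF, 1))  # 1, because 0xFFFFFFFF is -1 when signed
```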
Given this improved ALU, implement the max4 instruction, which writes the largest of four register values into
register rd:
max4 rs, rt, rd, rm # rd = max(rs, rt, rd, rm)
Note that register rd is both an input and an output. Instruction max4 has the following format:
Field   op      rs      rt      rd      rm     func
Bits    31-26   25-21   20-16   15-11   10-6   5-0
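As a reference for what any datapath change must compute, here is a hypothetical Python model of max4 built from three MAX2 operations. The pairwise order (max of rs and rt, max of rd and rm, then max of the two) is one choice among several; the op and func values of max4 are not given in the exam, so the example leaves them 0. Note that rd must be read before it is overwritten.

```python
from typing import Dict, List

def decode_max4(word: int) -> Dict[str, int]:
    """Split a max4 word into the fields from the format table."""
    return {
        "op":   (word >> 26) & 0x3F,
        "rs":   (word >> 21) & 0x1F,
        "rt":   (word >> 16) & 0x1F,
        "rd":   (word >> 11) & 0x1F,
        "rm":   (word >> 6) & 0x1F,
        "func": word & 0x3F,
    }

def max2(a: int, b: int) -> int:
    return a if a > b else b

def execute_max4(regs: List[int], word: int) -> None:
    """rd = max(rs, rt, rd, rm) using three MAX2 operations."""
    f = decode_max4(word)
    left = max2(regs[f["rs"]], regs[f["rt"]])
    right = max2(regs[f["rd"]], regs[f["rm"]])   # old rd value is an input
    if f["rd"] != 0:                             # register 0 stays zero
        regs[f["rd"]] = max2(left, right)

regs = [0] * 32
regs[8], regs[9], regs[10], regs[11] = 5, -3, 2, 9
word = (8 << 21) | (9 << 16) | (10 << 11) | (11 << 6)   # max4 with rs=8, rt=9, rd=10, rm=11
execute_max4(regs, word)
print(regs[10])  # 9
```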
Part (a)
The multicycle datapath from lecture appears below. Show what changes are needed to support max4. You
should only add wires and muxes to the datapath; do not modify the main functional units themselves (the
memory, register file, and ALU). Try to keep your diagram neat! (10 points)
Note: While we're primarily concerned with correctness, full points will be awarded only to solutions that use
a minimal number of cycles and do not lengthen the clock cycle. Assume that the ALU, Memory and Register
file each take 2ns, and everything else is instantaneous.
[Figure: multicycle datapath from lecture: PC, a single memory (address selected by the IorD mux), instruction register (fields [31-26], [25-21], [20-16], [15-11], [15-0]), memory data register, register file (write register selected by RegDst, write data selected by MemToReg), A and B registers, ALU with ALUSrcA and ALUSrcB muxes (inputs include 4, the sign-extended immediate, and the sign-extended immediate shifted left 2), ALUOut register, and PC source mux; control signals PCWrite, IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemToReg, ALUSrcA, ALUSrcB, ALUOp, PCSrc.]
Part (b)
Complete this finite state machine diagram for the max4 instruction. Control values not shown in each stage
are assumed to be 0. Remember to account for any control signals that you added or modified in the previous
part of the question! (15 points)
Instruction fetch and PC increment: IorD = 0, MemRead = 1, IRWrite = 1, ALUSrcA = 0, ALUSrcB = 01, ALUOp = ADD, PCSource = 0, PCWrite = 1
  -> Register fetch and branch computation: ALUSrcA = 0, ALUSrcB = 11, ALUOp = ADD
     Op = BEQ -> Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = SUB, PCSource = 1, PCWrite = Zero
     Op = R-type -> R-type execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = func
                 -> Write-back: RegDst = 1, MemToReg = 0, RegWrite = 1
     Op = MAX4 -> (to be completed)
     Op = LW/SW -> Effective address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = ADD
                   Op = SW -> Memory write: IorD = 1, MemWrite = 1
                   Op = LW -> Memory read: IorD = 1, MemRead = 1
                              -> lw register write: RegDst = 0, MemToReg = 1, RegWrite = 1
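The cycle counts used later in this question can be read off the FSM by walking it. The Python sketch below transcribes the states above (the max4 states are omitted, since part (b) asks for them) and assumes, as in the lecture FSM, that every final state returns to instruction fetch. State names are hypothetical labels for the boxes in the diagram.

```python
from typing import Callable, Dict

# Next-state function for each FSM state, given the instruction class.
NEXT_STATE: Dict[str, Callable[[str], str]] = {
    "fetch": lambda op: "decode",
    "decode": lambda op: {"BEQ": "branch_completion",
                          "R-type": "rtype_execution",
                          "LW": "effective_address",
                          "SW": "effective_address"}[op],   # MAX4 not modeled
    "branch_completion": lambda op: "fetch",
    "rtype_execution": lambda op: "rtype_writeback",
    "rtype_writeback": lambda op: "fetch",
    "effective_address": lambda op: "memory_read" if op == "LW" else "memory_write",
    "memory_write": lambda op: "fetch",
    "memory_read": lambda op: "lw_register_write",
    "lw_register_write": lambda op: "fetch",
}

def cycles_for(op: str) -> int:
    """Count states (one cycle each) from fetch until control returns to fetch."""
    state, cycles = "fetch", 0
    while True:
        state = NEXT_STATE[state](op)
        cycles += 1
        if state == "fetch":
            return cycles

for op in ("LW", "SW", "R-type", "BEQ"):
    print(op, cycles_for(op))   # LW 5, SW 4, R-type 4, BEQ 3
```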
Part (c)
The FSM diagram on the previous page is incomplete. The control signals for the fourth cycle of the sw
instruction are missing. Fill in the missing control signals in the FSM diagram. (5 points)
Part (d)
The max4 instruction can replace three branch instructions, reducing the number of instructions that must be
executed. Below are two functionally equivalent programs, the second of which uses the max4
instruction:
Program 1
        lw   v0, 0(a0)
        add  a1, a0, 12
label:  add  a0, a0, 4
        lw   t1, 0(a0)
        slt  t2, t1, v0
        bne  t2, $zero, skip
        move v0, t1
skip:   bne  a0, a1, label
        jr   $ra

Program 2
        lw   v0, 0(a0)
        lw   t0, 4(a0)
        lw   t1, 8(a0)
        lw   t2, 12(a0)
        max4 v0, t0, t1, t2
        jr   $ra
Assume the datapath and control that you implemented in parts (a) and (b); assume slt and move can be
considered as R-type instructions and jr and bne take the same amount of time as beq. Also assume that the
first element in the input array is the largest (so that the first branch of program 1 is always taken). How
much faster (or slower) is program 2 than program 1? You may leave your answer as a fraction. (15 points)
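The counting for this part can be laid out as a short Python sketch. Per-class cycle counts come from the FSM (lw 5, R-type 4, branch 3, with bne and jr timed like beq). Two things are assumptions: add a1, a0, 12 is charged R-type time, and the cycle count of max4 depends on your answer to parts (a) and (b), so it is a parameter (6 below is only an illustrative value). Because the first element is the largest, bne t2 is always taken and move never executes; a0 steps by 4 until it equals a0 + 12, giving 3 iterations.

```python
from fractions import Fraction
from typing import Tuple

CYCLES = {"lw": 5, "rtype": 4, "branch": 3}   # from the multicycle FSM

def program1_cost() -> Tuple[int, int]:
    """(cycles, instructions executed) for Program 1."""
    prologue = ["lw", "rtype"]                                 # lw v0; add a1 (assumed R-type time)
    loop_body = ["rtype", "lw", "rtype", "branch", "branch"]   # add, lw, slt, bne taken, bne
    epilogue = ["branch"]                                      # jr $ra
    trace = prologue + loop_body * 3 + epilogue
    return sum(CYCLES[c] for c in trace), len(trace)

def program2_cost(max4_cycles: int) -> Tuple[int, int]:
    """(cycles, instructions executed) for Program 2: 4 lw, max4, jr."""
    return 4 * CYCLES["lw"] + max4_cycles + CYCLES["branch"], 6

def speedup(max4_cycles: int) -> Fraction:
    """Time(Program 1) / Time(Program 2); both run at the same clock period."""
    return Fraction(program1_cost()[0], program2_cost(max4_cycles)[0])

print(program1_cost())   # (69, 18)
print(speedup(6))        # 69/29 if max4 takes 6 cycles
```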
Part (e)
What is the average CPI of program 1? You may leave your answer as an expression. (5 points)
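Average CPI is total cycles divided by instructions executed, which is the same as weighting each class's CPI by its share of the dynamic instruction mix. The Python sketch below uses that weighted form under the same assumptions as part (d) (three loop iterations, bne t2 always taken, add a1, a0, 12 charged R-type time); the names MIX and average_cpi are hypothetical.

```python
from fractions import Fraction
from typing import Dict

# Dynamic instruction mix of Program 1:
#   lw:     1 before the loop + 1 per iteration
#   R-type: add a1 + (add a0, slt) per iteration
#   branch: 2 bne per iteration + jr $ra
MIX = {"lw": 1 + 3, "rtype": 1 + 2 * 3, "branch": 2 * 3 + 1}
CYCLES = {"lw": 5, "rtype": 4, "branch": 3}   # from the multicycle FSM

def average_cpi(mix: Dict[str, int], cycles: Dict[str, int]) -> Fraction:
    total_cycles = sum(mix[c] * cycles[c] for c in mix)
    return Fraction(total_cycles, sum(mix.values()))

print(average_cpi(MIX, CYCLES))  # 23/6, i.e. 69 cycles / 18 instructions
```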
Question 3: More performance (20 points)
Part (a)
Assume the following delays for the main functional units:
Functional Unit   Time delay
Memory            5 ns
ALU               4 ns
Register File     3 ns
Given the following instructions: lw, sw, add, beq, calculate:
- the minimum time to perform each instruction
- the time required on a single-cycle datapath (from q. 1)
- the time required on a multi-cycle datapath (from q. 2).
Write your answers in the table below. State any assumptions. (15 points)
Instruction   Minimum time   Single-cycle datapath   Multi-cycle datapath
lw
sw
add
beq
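One way to fill in the table is sketched in Python below, under stated assumptions: each instruction's minimum time is the sum of the units on its path in the lecture datapaths, the register write is charged a full register-file delay, muxes, adders, sign extend and shifts are free, the single-cycle clock is sized for the slowest instruction, and the multicycle clock is sized for the slowest single unit with cycle counts from the Question 2 FSM.

```python
DELAY_NS = {"memory": 5, "alu": 4, "regfile": 3}

# Units on each instruction's path, in order (assumed lecture datapaths).
PATH = {
    "lw":  ["memory", "regfile", "alu", "memory", "regfile"],
    "sw":  ["memory", "regfile", "alu", "memory"],
    "add": ["memory", "regfile", "alu", "regfile"],
    "beq": ["memory", "regfile", "alu"],
}

# Cycles per instruction from the multicycle FSM in Question 2.
MULTICYCLE_CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}

def minimum_time(instr: str) -> int:
    """Sum of the delays of the units the instruction actually uses."""
    return sum(DELAY_NS[unit] for unit in PATH[instr])

def single_cycle_period() -> int:
    """Every instruction takes one cycle, sized for the slowest instruction."""
    return max(minimum_time(instr) for instr in PATH)

def multi_cycle_time(instr: str) -> int:
    """FSM cycle count times a clock sized for the slowest single unit."""
    return MULTICYCLE_CYCLES[instr] * max(DELAY_NS.values())

for instr in PATH:
    print(instr, minimum_time(instr), single_cycle_period(), multi_cycle_time(instr))
```

Under these assumptions the rows come out as lw 20/20/25 ns, sw 17/20/20 ns, add 15/20/20 ns and beq 12/20/15 ns (minimum, single-cycle, multi-cycle).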
Part (b)
How much faster than a 1 GHz single-cycle MIPS processor would a 3.0 GHz Pentium 4 x86 processor be
if it achieved a CPI of 0.5 on a workload? (5 points)
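The comparison reduces to time per instruction = CPI / clock rate, with a single-cycle machine having CPI = 1. The Python sketch below assumes both processors execute the same number of instructions for the workload, which x86 and MIPS code generally do not, so the result is a per-instruction comparison only. The function name is hypothetical.

```python
from fractions import Fraction

def time_per_instruction_ns(cpi, clock_ghz) -> Fraction:
    """CPI divided by clock rate; a 1 GHz clock has a 1 ns cycle."""
    return Fraction(cpi) / Fraction(clock_ghz)

mips = time_per_instruction_ns(1, 1)               # single-cycle MIPS: CPI = 1 at 1 GHz
p4 = time_per_instruction_ns(Fraction(1, 2), 3)    # Pentium 4: CPI = 0.5 at 3.0 GHz
print(mips / p4)  # 6
```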