Coa Applied
Computer Organization
SOLUTION
Q1 / 17 Q2 / 8
Q3 / 14 Q4 / 19
Q5 / 10 Q6 / 12
Total / 80
b) (2 points) What causes control hazards in a pipelined datapath, and how can control
hazards be eliminated?
Control hazards are caused by jump and branch instructions, whose outcome and target
are resolved late in a pipelined datapath, after the following instructions have
already been fetched. They can be eliminated by converting the next (one or two)
instructions that appear after a jump or a taken branch into NOPs.
c) (2 points) Explain the difference between static RAM and dynamic RAM.
Static RAM: Cell is made out of 6 transistors and does not require refreshing.
d) (2 points) Is it possible to use only one memory for both instructions and data in the
single-cycle datapath? Explain why or why not. Is it possible to use only one memory
for both instructions and data in a multi-cycle datapath? Explain.
In a single-cycle datapath, a load instruction must be fetched and must read its data
during the same cycle. With only one memory, it is NOT possible to fetch the
instruction and load the data during the same cycle.
In a multi-cycle datapath, using only one memory IS possible because fetching the
instruction and loading the data can occur in two different cycles.
e) (2 points) Why do we need cache memory, and why do we have two separate cache
memories (I-cache and D-cache) in a pipelined processor?
Two separate caches (I-cache and D-cache) are needed so that two different
instructions can access them during the same cycle: one instruction is being
fetched while another reads or writes data.
f) (2 points) Explain the concepts of temporal locality and spatial locality of reference in
cache memory.
g) (2 points) What needs to be stored inside a cache for block identification? How does a
cache know whether there is a cache hit or miss?
The tag stored in the cache is compared against the tag in the memory address to
determine whether there is a cache hit or miss.
Write-back cache: the write is done in the cache only. A modified bit is needed to
indicate whether a block has been modified. Modified blocks are written back to
memory when replaced.
Consider adding the following two new instructions to the above datapath: JLR and LWI.
The JLR instruction is I-type and has a unique opcode. The LWI instruction is R-type and
has a unique function code. The least-significant 2 bits of register PC are hardwired to 00,
and not stored in PC. Therefore, it is sufficient to increment PC by 1 to point to the next
instruction in memory.
a) (4 points) Redraw the necessary changes to the above datapath to implement the above
two instructions. Draw only the modified parts and explain why they are needed.
b) (4 points) Identify any new control signal needed to implement the above two
instructions. Draw a table showing the values of all control signals to implement the
above two instructions.
Q2 Solution
Add a 4th input to the mux at the input of the PC register and add a bus connecting the
ALU result (Jump Register Address) back to the PC.
Add a 3rd input to the WB mux and add a bus connecting the Return Address (PC + 1)
back to the register file.
b) The same control signals are used, except that the main control logic now depends
on both the opcode and the function code for LWI.
b) (2 points) Compute the clock cycle and the average CPI for the multi-cycle processor.
d) (2 points) Assume that the processor is pipelined. Furthermore, assume that a program
has the following instruction mix: 40% ALU, 5% load, 5% store, 30% branch, and 20%
jump. Moreover, assume that 90% of the branches will be taken. The CPU stalls 1 cycle
for each jump and 2 cycles for each taken branch. Compute the average CPI for the
pipelined processor due to control hazards only.
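The CPI asked for in part d) can be checked with a short calculation. The sketch below assumes an ideal base CPI of 1 (one instruction completes per cycle when there are no stalls) and adds the stall cycles given in the question. The function name and its parameters are illustrative and not part of the exam.

```python
def control_hazard_cpi(jump_frac, branch_frac, taken_frac,
                       jump_stall=1, taken_branch_stall=2, base_cpi=1.0):
    """Average CPI when only control hazards cause stalls.

    base_cpi=1.0 is an assumption: an ideal pipeline with no other hazards.
    """
    jump_cycles = jump_frac * jump_stall
    # Only taken branches stall; untaken branches cost nothing extra.
    branch_cycles = branch_frac * taken_frac * taken_branch_stall
    return base_cpi + jump_cycles + branch_cycles


cpi = control_hazard_cpi(jump_frac=0.20, branch_frac=0.30, taken_frac=0.90)
print(f"CPI = {cpi:.2f}")  # 1 + 0.20*1 + 0.30*0.90*2 = 1.74
```

ALU, load, and store instructions do not appear in the formula because they add no control-hazard stalls.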
e) (2 points) Assume that the processor is pipelined and that load instructions are 5% of the
instruction count and store instructions are also 5% as given above. However, the program
spends 30% of its execution time executing load instructions and 15% of its execution
time executing store instructions. The designers discovered that the Data cache is
producing many cache misses causing the CPU to stall. They decided to improve the
design of the data cache and improve the execution time of the load instructions by a factor
of 3x (3 times faster) and the store instructions by a factor of 2x. Determine the overall
speedup of the program due to the improvements done to the data cache.
$$\text{Speedup} = \frac{1}{\frac{0.3}{3} + \frac{0.15}{2} + (1 - 0.3 - 0.15)} = \frac{1}{0.725} \approx 1.379$$
The program will run faster by a factor of 1.379x due to data cache improvement.
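The 1.379x figure follows from Amdahl's law applied to two improved fractions of the execution time (30% sped up 3x, 15% sped up 2x), with the remaining 55% unchanged. A minimal sketch of that check follows. The helper name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(improvements):
    """Overall speedup from Amdahl's law.

    improvements: list of (fraction_of_execution_time, speedup_factor) pairs.
    Time not covered by any pair is assumed to be unaffected.
    """
    improved = sum(fraction / factor for fraction, factor in improvements)
    unaffected = 1.0 - sum(fraction for fraction, _ in improvements)
    return 1.0 / (improved + unaffected)


speedup = amdahl_speedup([(0.30, 3), (0.15, 2)])
print(f"{speedup:.3f}")  # 1 / (0.10 + 0.075 + 0.55) = 1.379
```

Note that the fractions are of execution time (30% and 15%), not of instruction count (5% each), which is why the instruction mix does not enter the calculation.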
a) (5 points) Show the design changes needed for handling data hazards using
forwarding including a block diagram for data hazard detection and forwarding unit.
b) (4 points) Show the control signals that will be used for stalling the pipeline for data
hazards due to load instructions along with their conditions. Show the necessary
changes that need to be done to the design.
OR:
if ((EX.MemRd == 1) and (Rd2 != 0) and ((Rs == Rd2) or (Rt == Rd2))) Stall
Stall disables the PC and IR registers (i.e. the signals PCWrite=0 and IRWrite=0), which
freezes their contents, and introduces a bubble in the stage 2 control register by
setting its control signals to 0.
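The stall condition and its effect on the control signals can be modelled in software to test it against a few cases. This is a sketch only: the signal names MemRd, Rd2, Rs, Rt, PCWrite, and IRWrite come from the text above, while the function names and the `Bubble` key are hypothetical.

```python
def load_use_stall(ex_mem_rd, rd2, rs, rt):
    """Return True when the instruction in decode must stall for one cycle.

    ex_mem_rd: 1 if the instruction in EX is a load (EX.MemRd)
    rd2:       destination register number of the instruction in EX
    rs, rt:    source register numbers of the instruction in decode
    """
    return ex_mem_rd == 1 and rd2 != 0 and (rs == rd2 or rt == rd2)


def stall_signals(stall):
    """Control outputs described in the text for a stall cycle."""
    return {
        "PCWrite": 0 if stall else 1,  # 0 freezes the PC
        "IRWrite": 0 if stall else 1,  # 0 freezes the IR
        "Bubble": 1 if stall else 0,   # hypothetical name: 1 zeroes the stage 2 control signals
    }


# A load into register 5 followed by an instruction that reads register 5 stalls.
print(load_use_stall(ex_mem_rd=1, rd2=5, rs=5, rt=3))  # True
print(stall_signals(True))
```

The `rd2 != 0` test reflects the condition in the text: a load whose destination is register 0 never forces a stall.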
c) (2 points) Show the control signals that will be used for handling control hazards.
Show the necessary changes that need to be done to the design.
d) (3 points) Show the design of the PC control logic that includes the handling of control
hazards assuming that only BEQ, BNE, and J instructions are implemented.
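One way to reason about the PC control logic in part d) is as a small selection function. The sketch below is an illustration under stated assumptions, not the exam's solution: it assumes all inputs describe a single instruction, that `zero` is the ALU zero flag of the comparison, and that the next-PC mux uses the encoding 0 = sequential, 1 = branch target, 2 = jump target. The function name, the encoding, and the `kill` output are hypothetical.

```python
def pc_control(is_j, is_beq, is_bne, zero):
    """Return (pc_src, kill) for one instruction.

    pc_src: 0 = next sequential PC, 1 = branch target, 2 = jump target
            (this encoding is an assumption)
    kill:   1 when instructions fetched after this one must become NOPs
    """
    if is_j:
        return 2, 1
    branch_taken = (is_beq and zero) or (is_bne and not zero)
    if branch_taken:
        return 1, 1
    return 0, 0


print(pc_control(is_j=False, is_beq=True, is_bne=False, zero=True))   # (1, 1)
print(pc_control(is_j=False, is_beq=False, is_bne=True, zero=True))   # (0, 0)
```

A taken BEQ or BNE and a J both redirect the PC and kill the wrongly fetched instructions; an untaken branch leaves the sequential flow untouched.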
Complete the following table showing the timing of the above code on the 5-stage
pipeline given in part (i) (IF, ID, EX, MEM, WB) supporting forwarding and
pipeline stall. Draw an arrow showing forwarding between the stage that provides the
data and the stage that receives the data. Show all stall cycles (draw an X in the box to
represent a stall cycle). Determine the number of clock cycles to execute this code.
Cycle      1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
I1: ORI    IF   ID   EX   -    WB
I2: ADDI        IF   ID   EX   -    WB
I3: ADD              IF   ID   EX   -    WB
I4: LW                    IF   ID   EX   M    WB
I5: ADD                        IF   X    ID   EX   -    WB
I6: SW                                   IF   ID   EX   M    -
a) (3 points) Given that the memory address consists of 64 bits, consider a 64 KiB fully
associative cache (1 KiB = 1024 bytes) with 64-byte cache blocks and a write-back
policy. Compute the total number of bits required to store the valid, modified,
and tag bits in the cache.
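The bit count in part a) can be worked out mechanically. The sketch below assumes one valid bit and one modified bit per block and byte addressing. A fully associative cache has no index field, so every address bit above the block offset belongs to the tag. The function name is illustrative.

```python
def fully_associative_overhead_bits(address_bits, cache_bytes, block_bytes):
    """Return (num_blocks, tag_bits, total_overhead_bits) for a write-back cache."""
    num_blocks = cache_bytes // block_bytes
    offset_bits = block_bytes.bit_length() - 1  # log2, valid for powers of two
    tag_bits = address_bits - offset_bits       # no index bits when fully associative
    bits_per_block = tag_bits + 1 + 1           # tag + valid + modified
    return num_blocks, tag_bits, bits_per_block * num_blocks


blocks, tag, total = fully_associative_overhead_bits(64, 64 * 1024, 64)
print(blocks, tag, total)  # 1024 58 61440
```

With 1024 blocks and 58 + 1 + 1 = 60 overhead bits per block, the total comes to 61,440 bits.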
b) (3 points) Assume that the memory address consists of 64 bits, and a 64 KiB 4-way set
associative cache with 64-byte cache blocks is used. Find the number of tag bits, index
bits, and offset bits needed.
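For part b), the address splits into tag, index, and offset fields. A minimal sketch, assuming byte addressing and power-of-two sizes (the function name is illustrative):

```python
def address_fields(address_bits, cache_bytes, block_bytes, ways):
    """Return (tag_bits, index_bits, offset_bits) for a set-associative cache."""
    num_sets = cache_bytes // (block_bytes * ways)
    offset_bits = block_bytes.bit_length() - 1  # log2 of the block size
    index_bits = num_sets.bit_length() - 1      # log2 of the number of sets
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


print(address_fields(64, 64 * 1024, 64, 4))  # (50, 8, 6)
```

The cache holds 1024 blocks in 256 sets of 4, so 8 index bits select the set, 6 offset bits select the byte, and the remaining 50 bits form the tag.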
c) (4 points) Consider a 2-way set-associative cache that uses 32-bit memory addresses
divided into 4 bits of offset, 12 bits of index, and 16 bits of tag. Starting with an empty
cache, show the tag, index, and way (block 0 or 1) for each of the following sequentially
referenced addresses and indicate whether the reference resulted in a hit or a miss. The
replacement policy used is FIFO.
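A small simulator makes the hit/miss bookkeeping of part c) easy to check. The sketch below follows the stated 16/12/4 address split and FIFO replacement. The address list is hypothetical and chosen only to exercise a hit, a miss, and a replacement; it is not the list from the exam.

```python
from collections import deque

OFFSET_BITS, INDEX_BITS = 4, 12


def simulate_fifo(addresses, ways=2):
    """Return a list of (tag, index, way, 'hit' or 'miss') per reference."""
    tags_by_set = {}   # index -> list of tags, one per way (None = empty)
    fill_order = {}    # index -> deque of ways, oldest fill first
    results = []
    for addr in addresses:
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        tags = tags_by_set.setdefault(index, [None] * ways)
        fifo = fill_order.setdefault(index, deque())
        if tag in tags:
            way, outcome = tags.index(tag), "hit"  # a hit does not change FIFO order
        else:
            outcome = "miss"
            # Fill an empty way first; otherwise evict the oldest filled way.
            way = tags.index(None) if None in tags else fifo.popleft()
            tags[way] = tag
            fifo.append(way)
        results.append((tag, index, way, outcome))
    return results


# Hypothetical addresses, all mapping to set index 1.
trace = [0x00010010, 0x00020010, 0x00010014, 0x00030010, 0x00020018]
for tag, index, way, outcome in simulate_fifo(trace):
    print(f"tag={tag:#06x} index={index:#05x} way={way} {outcome}")
```

In this trace the fourth reference evicts way 0 even though it was hit most recently, because FIFO replaces the block that was loaded first, not the least recently used one.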
A processor runs at 2.5 GHz and has a CPI of 1.7 for a perfect cache (i.e. without including the
stall cycles due to cache misses). Assume that load and store instructions are 15% of the
instructions. The processor has an I-cache with a 4% miss rate and a D-cache with a 6% miss
rate. The hit time is 1 clock cycle for both caches. Assume that the time required to transfer a
block of data from the main memory to the cache, i.e. the miss penalty, is 40 ns.
a) (4 Points) Compute the number of stall cycles per instruction and the overall CPI.
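The stall cycles follow directly from the given numbers. The sketch below assumes that every instruction makes one I-cache access and that only loads and stores access the D-cache. The function name is illustrative.

```python
def cpi_with_misses(clock_ghz, base_cpi, load_store_frac,
                    i_miss_rate, d_miss_rate, miss_penalty_ns):
    """Return (miss_penalty_cycles, stall_cycles_per_instr, overall_cpi)."""
    penalty_cycles = miss_penalty_ns * clock_ghz  # ns * (cycles per ns)
    i_stalls = i_miss_rate * penalty_cycles       # one fetch per instruction
    d_stalls = load_store_frac * d_miss_rate * penalty_cycles
    stalls = i_stalls + d_stalls
    return penalty_cycles, stalls, base_cpi + stalls


penalty, stalls, cpi = cpi_with_misses(2.5, 1.7, 0.15, 0.04, 0.06, 40)
print(f"{penalty:.0f} cycles, {stalls:.1f} stall cycles, CPI = {cpi:.1f}")
# 100 cycles, 4.9 stall cycles, CPI = 6.6
```

The 40 ns penalty is 100 cycles at 2.5 GHz, giving 0.04 x 100 = 4 stall cycles per instruction from the I-cache and 0.15 x 0.06 x 100 = 0.9 from the D-cache.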
Alternative Solution:
c) (4 Points) Discuss how the average memory access time (AMAT) can be reduced by
mentioning all the factors that could reduce it and for each factor explaining how it can
be done.
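The three factors in question can be read straight off the standard formula AMAT = hit time + miss rate x miss penalty. The sketch below uses the D-cache figures given earlier (1-cycle hit time, 6% miss rate, 40 ns = 100 cycles at 2.5 GHz) as a baseline. The halved values are hypothetical what-if inputs, not results from the exam.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time and miss_penalty."""
    return hit_time + miss_rate * miss_penalty


baseline = amat(hit_time=1, miss_rate=0.06, miss_penalty=100)
half_miss_rate = amat(hit_time=1, miss_rate=0.03, miss_penalty=100)  # hypothetical
half_penalty = amat(hit_time=1, miss_rate=0.06, miss_penalty=50)     # hypothetical

print(f"baseline       {baseline:.1f} cycles")
print(f"half miss rate {half_miss_rate:.1f} cycles")
print(f"half penalty   {half_penalty:.1f} cycles")
```

Because the miss rate and the miss penalty are multiplied together, halving either one gives the same AMAT here, while the hit time adds to every access regardless of misses.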