Cover page

EE457 Final Exam (~33.5%)

Closed-book Closed-notes Exam; Verilog Guides are not needed and are not allowed.
This is a traditional paper pencil exam. Smart phones, laptops, iPads, tablets, and all kinds of computing/Internet devices are not allowed.
This is a Crowdmark exam. Please do not write in the margins or on the back side. Use a dark HB or H1 pencil.
Fall 2023
Instructor: Gandhi Puvvada
Final Exam (~33.5%): Saturday, Dec. 9, 2023, 01:15 PM - 04:15 PM PST in THH 101

I have previously read the Viterbi Code of Integrity and other related material at the site https://viterbischool.usc.edu/academic-integrity/ and I will abide by these rules of conduct. I will neither seek help from others nor offer help to others in my exams.

_____________________________ <== Student’s signature

Ques#      Topic                                   Page#    Points
1          Mutual Exclusion, MSI and MOESI         2-7      83
2          Miscellaneous advanced topics           8-9      50
3          Virtual Memory and Cache                10-12    73
4          FIFO                                    13       28
5          Lab 7 Part 3 SP 3 Verilog RTL coding    13-14    20
6          Lab 7 P3 SP2 modification               15-20    181
Just FYI   Early Branch Block diagram              21
Total      Cover + 2-to-20 + 2                              435

Perfect Score                                               420
EE457 Final - Fall2023    1 / 22    © Copyright 2023 Gandhi Puvvada
Viterbi School of Engineering, University of Southern California
Q1P2 Page total 31 pts
1 (___ points) ___ min. Mutual Exclusion, MSI, and MOESI

1.1 (52 pts, on the next 4 pages) Reproduced on the next four pages is the 14-step sequence from our class notes showing how three competing threads in three single-threaded cores can obtain the lock in a mutually exclusive manner. Revise the 14-step sequence for MOESI in place of MSI, showing the O state when needed. Change the contents of the L1 and L2 caches as needed. Answer the 3 questions on the last page of these 4 pages related to MOESI and FMM. Hint: the E state is never used here.

(9 pts) If the three threads are all in one single three-threaded core, you expect that

1. mutual exclusion is still possible even though there are no SCUs involved. _____ (T/F).

2. most of the polling (checking for the lock to be released) is done in the ______ (M/S/I) state by the threads that are waiting to lock the lock.

3. here we can just use LW and SW and do not need LL and SC. _____ (T/F).

1.2 (6 pts) Legend: A = desirable; B = undesirable; C = wrong; D = none of the above

It is ______________ (A/B/C/D) to map two independent locks, such as a Student Database Lock (SDBL) and a Faculty Database Lock (FDBL), to locations in the same cache block.
It is ______________ (A/B/C/D) to map two independent locks, such as a Student Database Lock (SDBL) and a Faculty Database Lock (FDBL), to locations in the same virtual page.

1.3 (6 pts) After locking the SDBL and entering the Student Database, can the L1 cache of that core choose to voluntarily replace the block containing the lock (replace it to bring in some other block)? _______ (Yes/No). Does it pose any problem? _______ (Yes/No).

1.4 (10 pts) An application to admit students is written to admit one student at a time, release the SDBL lock, and not seek it for the next millisecond. During this millisecond, there may or may not be others looking for the lock! If no one else is looking for the lock, do you expect that the block containing SDBL in this core would be in the M, S, or I state at the end of the millisecond?
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________

Blank rectangle (for rough work)


Q1P3 Page total ___ pts
Q1P4 Page total ___ pts
Q1P5 Page total ___ pts
Q1P6 Page total ___ pts
[Pages 3-6: figures of the 14-step lock-acquisition sequence (MSI, with L1 and L2 cache contents) for question 1.1.]
Q1P7 Page total 14 pts

1.5 (6 pts) Complete the MOESI state diagram: fill in the state transition conditions in the two boxes for the two state transition arrows, one from the I state to the S state and the other from the I state to the E state.

(8 pts) If you used any open-collector (open-drain) signal, identify it and state why it should be an open-collector (open-drain) signal.
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
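The transitions asked for in 1.5 can be modeled in a few lines. This sketch assumes a snooping bus with a single SHARED line and standard MOESI snoop reactions. The names `shared_line` and `read_miss` are hypothetical. The model shows why the line must be wired-OR: several snoopers may assert it at the same time.

```python
# Sketch of a processor read miss (leaving the I state) in a MOESI snooping
# protocol. Assumption: one bus-wide SHARED line, asserted by every other
# cache that holds a valid copy of the block.

# How each snooping cache reacts when it observes another core's bus read.
SNOOP_ON_BUS_READ = {
    "M": "O",  # dirty copy: keep ownership, supply the data, now shared
    "O": "O",  # already the owner; stays the owner
    "E": "S",  # clean exclusive copy becomes shared
    "S": "S",
    "I": "I",
}


def shared_line(other_states):
    """Wired-OR SHARED signal: active if ANY other cache has a valid copy.
    Several caches can drive it at once, so an open-collector (open-drain)
    line avoids contention between simultaneous drivers."""
    return any(state != "I" for state in other_states)


def read_miss(other_states):
    """Return (requester's new state, other caches' new states)."""
    requester = "S" if shared_line(other_states) else "E"  # I -> S or I -> E
    others = [SNOOP_ON_BUS_READ[state] for state in other_states]
    return requester, others


print(read_miss(["I", "I", "I"]))  # nobody else has it: requester goes to E
print(read_miss(["M", "I", "I"]))  # a dirty owner exists: requester goes to S
```

The only difference between the I-to-S and I-to-E arrows in this model is the value sampled on SHARED during the bus read.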

Blank rectangle (for rough work)


Q2P8 Page total 32 pts

2 (___ points) ___ min. Miscellaneous advanced topics

2.1 (8 pts) MPI (Miss rate Per Instruction) in the case of a hierarchy of caches

Hierarchy: P (processor) → Cache L1 → Cache L2 → Cache L3 → Main Memory

Calculate the effective CPI, assuming that there are no other losses of clocks due to stalling, flushes, etc. (i.e., CPI would be 1 if there were no cache misses in the L1 cache).

MPI for L1 cache: MPI_1 = 6%      L1 miss penalty: L1_M_P = 25 clocks
MPI for L2 cache: MPI_2 = 1%      L2 miss penalty: L2_M_P = 50 clocks
MPI for L3 cache: MPI_3 = 0.5%    L3 miss penalty: L3_M_P = 200 clocks

(2 pts) L3 MPI is always less than L2 MPI, which is always less than L1 MPI. True / False
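The CPI arithmetic in 2.1 can be written as a small function. The sketch assumes two things:

- MPI_k counts misses at level k per instruction, not a local miss ratio.
- Each miss penalty is the extra clocks spent fetching from the next level.

Under those assumptions the per-level stall contributions simply add. With the given numbers this yields 1 + 1.5 + 0.5 + 1.0 = 4.0.

```python
# Effective CPI for a multi-level cache hierarchy using per-instruction
# miss rates (MPI). Assumption: every MPI is "misses at this level per
# instruction", and every penalty is the added cost of going one level down.


def effective_cpi(base_cpi, levels):
    """levels: list of (mpi, miss_penalty_clocks) tuples, L1 first."""
    cpi = base_cpi
    for mpi, penalty in levels:
        cpi += mpi * penalty  # stall clocks per instruction due to this level
    return cpi


hierarchy = [
    (0.06, 25),    # L1: MPI_1 = 6%,   L1_M_P = 25 clocks
    (0.01, 50),    # L2: MPI_2 = 1%,   L2_M_P = 50 clocks
    (0.005, 200),  # L3: MPI_3 = 0.5%, L3_M_P = 200 clocks
]

print(effective_cpi(1.0, hierarchy))
```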

2.2 (8 pts) Branch prediction: A 2Kx2 2-bit BPB in the ID stage is indexed by Mr. Trojan using ______________ (PC[12:2] / PC[31:21]), whereas Mr. Bruin used ______________ (PC[12:2] / PC[31:21]) for indexing. Why is Mr. Bruin wrong? _________ _______________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
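The two BPB indexing choices in 2.2 can be compared directly. This sketch assumes 32-bit, word-aligned instruction addresses, so PC[1:0] are always 00. The two branch PCs are hypothetical.

```python
# Comparing low-order and high-order indexing of a 2K-entry BPB.
INDEX_BITS = 11  # 2K entries = 2^11


def index_low(pc):
    """PC[12:2]: skips PC[1:0], which are always 00 for word-aligned code."""
    return (pc >> 2) & ((1 << INDEX_BITS) - 1)


def index_high(pc):
    """PC[31:21]: the top 11 bits of the PC."""
    return (pc >> 21) & ((1 << INDEX_BITS) - 1)


# Two branches a few instructions apart in the same loop (hypothetical PCs)
br_a, br_b = 0x00400010, 0x00400024
print(index_low(br_a), index_low(br_b))    # different BPB entries
print(index_high(br_a), index_high(br_b))  # same BPB entry
```

With the high-order bits, nearby branches all share one 2-bit counter and disturb each other's predictions. The low-order bits spread them across the table.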

2.3 (6 pts) A direct-mapped cache in a 32-bit address, 32-bit data system uses 4-word blocks. The cache size is 256 KB (= 2^18 bytes = 2^16 words = 2^14 blocks). Identify which of the following two address divisions was made by Mr. Trojan and which was made by Mr. Bruin.

Address division by Mr. ____________ (Trojan/Bruin)
Tag (14) | Index (14) | Word | Byte

A31 A30 A29 A28 A27 A26 A25 A24 A23 A22 A21 A20 A19 A18 A17 A16 A15 A14 A13 A12 A11 A10 A9 A8 A7 A6 A5 A4 A3 A2 A1 A0

Address division by Mr. ____________ (Trojan/Bruin)
Index (14) | Tag (14) | Word | Byte

A31 A30 A29 A28 A27 A26 A25 A24 A23 A22 A21 A20 A19 A18 A17 A16 A15 A14 A13 A12 A11 A10 A9 A8 A7 A6 A5 A4 A3 A2 A1 A0

(8 pts) Mr. Bruin further says that, since both the index and the tag fields are equal in size (14 bits each), the sizes of the TAG RAM, the DATA RAM, and the tag comparison unit are all equal either way. ___ (T / F)
Why is Mr. Bruin wrong? ________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
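The two address divisions in 2.3 can be contrasted by walking through eight consecutive 16-byte blocks, such as a short array scan. The sketch assumes the field widths stated in the question; the block addresses are hypothetical. The conventional split (index taken just above the word and byte fields) sends the blocks to eight different indices. The swapped split sends all of them to one index, where they would keep evicting one another in a direct-mapped cache.

```python
# Two ways of dividing a 32-bit byte address for a 256 KB direct-mapped
# cache with 4-word (16-byte) blocks: 14-bit tag, 14-bit index, 2-bit word,
# 2-bit byte.


def split_conventional(addr):
    """Tag (A31-A18) | Index (A17-A4) | Word (A3-A2) | Byte (A1-A0)."""
    byte = addr & 0x3
    word = (addr >> 2) & 0x3
    index = (addr >> 4) & 0x3FFF
    tag = (addr >> 18) & 0x3FFF
    return tag, index, word, byte


def split_swapped(addr):
    """Index (A31-A18) | Tag (A17-A4) | Word (A3-A2) | Byte (A1-A0)."""
    byte = addr & 0x3
    word = (addr >> 2) & 0x3
    tag = (addr >> 4) & 0x3FFF
    index = (addr >> 18) & 0x3FFF
    return tag, index, word, byte


# Eight consecutive 16-byte blocks (hypothetical starting address)
blocks = [0x00400000 + 16 * k for k in range(8)]
print([split_conventional(a)[1] for a in blocks])  # indices used
print([split_swapped(a)[1] for a in blocks])       # indices used
```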
Q2P9 Page total 18 pts
2.4 (6 pts) In the following extract from our class notes, we show that the BPB is accessed in the IF stage and processed in the ID stage. We cited the timing advantage (register re-balancing). Even if there were no need for a timing advantage, show that there is a cost advantage in doing this. Assume (and state) a reasonably sized 2-bit prediction BPB, and explain the cost advantage quantitatively.

BPB size assumed: __________________________________________________________________________

Cost Advantage: __________________________________________________________________________

__________________________________________________________________________________________

[Figure: class-notes extract showing BPB access in the IF stage and processing in the ID stage.]

2.5 (2 pts) CMP: Intel's HTT (Hyper-Threading Technology) is essentially the same as ___________________________ (fine-grain / coarse-grain / simultaneous) multi-threading.

2.6 (10 pts) Tomasulo 3:

PRF stands for _____________________________________

FRL stands for _____________________________________

RAT stands for _____________________________________

Legend: A = Dispatch unit, B = Instruction Retirement logic

FRAT is updated by the _________________ (A / B).

RRAT is updated by the _________________ (A / B).
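The FRAT/RRAT questions in 2.6 concern register renaming. Below is a minimal sketch of renaming with a physical register file, a free register list, and front-end and retirement register alias tables. It assumes in-order retirement and leaves out misprediction recovery. The class and method names are hypothetical. Dispatch allocates a free physical register and updates only the front-end table. Retirement commits the mapping in the retirement table and frees the physical register that held the previously committed value.

```python
from collections import deque


class Renamer:
    """Toy rename stage: FRAT (front-end RAT), RRAT (retirement RAT),
    and FRL (free register list) over a physical register file."""

    def __init__(self, num_arch, num_phys):
        # Initially architectural register r lives in physical register r.
        self.frat = list(range(num_arch))
        self.rrat = list(range(num_arch))
        self.frl = deque(range(num_arch, num_phys))  # unused physical regs

    def dispatch(self, dest_arch):
        """Dispatch unit: give the destination a fresh physical register
        and record the speculative mapping in the FRAT only."""
        new_phys = self.frl.popleft()
        self.frat[dest_arch] = new_phys
        return new_phys

    def retire(self, dest_arch, phys):
        """Retirement logic (in program order): commit the mapping in the
        RRAT and free the physical register it replaces."""
        old_phys = self.rrat[dest_arch]
        self.rrat[dest_arch] = phys
        self.frl.append(old_phys)


r = Renamer(num_arch=4, num_phys=8)
p = r.dispatch(1)       # e.g. an instruction writing architectural reg 1
print(p, r.frat, r.rrat)
r.retire(1, p)
print(r.rrat, list(r.frl))
```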


Q3P10 Page total 15 pts

3 (___ points) ___ min. Virtual Memory and Cache

Specs of our Trojan computer (a 32-bit address, 32-bit data, byte-addressable machine) with a physically addressed cache (more specifically, a PIPT cache):

Virtual address space = 4 GB, Virtual address = 32 bits (VA31-VA0) (2^32 = 4G)
Physical address space = 4 GB, Physical address = 32 bits (PA31-PA0) (2^32 = 4G)

Page size = 16 KB (2^14 = 16K)

TLB size = 64 entries (fully associative) (2^6 = 64)
Page table organization: 2-level table with a 128-entry (2^7 = 128) page directory (top-level table)

Cache size = 224 KB (7 × 2^15 = 7 × 32K = 224K)

Cache block (cache line size) = two 32-bit words (8 bytes total) (2^3 = 8)
(4 pts) Cache mapping: Set-associative with _____ blocks per set (choose the minimum number of blocks per set suitable for the 224 KB cache).
State another possible choice _____ (but do not use this choice).

Main memory organization: Lower-order interleaved. Degree of interleaving to suit the most efficient access of the main-memory block for transferring it to the cache.

3.1 (8 pts) Divide the virtual address into VPN (Virtual Page Number) and Page offset fields. Since the TLB is a fully associative TLB, we ____________ (further divide / do not divide) the VPN into TAG and SET fields.
How many comparators of what size are needed in the TLB? _______________

Virtual address VA31-VA0; Bank Enables BE3-BE0 (Byte enables)

VA31 VA30 VA29 VA28 VA27 VA26 VA25 VA24 VA23 VA22 VA21 VA20 VA19 VA18 VA17 VA16 VA15 VA14 VA13 VA12 VA11 VA10 VA9 VA8 VA7 VA6 VA5 VA4 VA3 VA2 VA1 VA0

Byte

Is any portion of the virtual address used for "indexing" the TLB? ______________ (Yes / No).

3.2 (3 pts) Divide the virtual address into VPN and Page offset fields again, and further divide the VPN (based on the page table organization information) into the page directory index and the 2nd-level page table index.

Virtual address VA31-VA0; Bank Enables BE3-BE0 (Byte enables)

VA31 VA30 VA29 VA28 VA27 VA26 VA25 VA24 VA23 VA22 VA21 VA20 VA19 VA18 VA17 VA16 VA15 VA14 VA13 VA12 VA11 VA10 VA9 VA8 VA7 VA6 VA5 VA4 VA3 VA2 VA1 VA0

Byte
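The field widths for 3.1 and 3.2 follow directly from the stated specs:

- Offset: 16 KB pages give a 14-bit page offset.
- VPN: the remaining 18 bits.
- Page directory index: a 128-entry page directory takes 7 of the VPN bits.
- 2nd-level page table index: the other 11 VPN bits.

The sketch below only encodes that arithmetic. Under the assumption that TLB entries hold no extra address-space field, a fully associative 64-entry TLB compares the full VPN in every entry, so it would need 64 comparators, each VPN_BITS (18) bits wide.

```python
# Virtual address fields for 16 KB pages and a 2-level page table with a
# 128-entry page directory (32-bit virtual addresses).
PAGE_OFFSET_BITS = 14                      # 16 KB = 2^14 bytes
VPN_BITS = 32 - PAGE_OFFSET_BITS           # 18
PD_INDEX_BITS = 7                          # 128 = 2^7 directory entries
PT_INDEX_BITS = VPN_BITS - PD_INDEX_BITS   # 11


def split_va(va):
    """Return (vpn, page-directory index, 2nd-level index, page offset)."""
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)   # VA13-VA0
    vpn = va >> PAGE_OFFSET_BITS                  # VA31-VA14
    pt_index = vpn & ((1 << PT_INDEX_BITS) - 1)   # VA24-VA14
    pd_index = vpn >> PT_INDEX_BITS               # VA31-VA25
    return vpn, pd_index, pt_index, offset


print(VPN_BITS, PD_INDEX_BITS, PT_INDEX_BITS)
print(split_va(0x02004000))  # (2049, 1, 1, 0)
```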


Q3P11 Page total 24 pts

3.3 (2 pts) Divide the physical address into PPFN (Physical Page Frame Number) and Page offset fields.

Physical address PA31-PA0; Bank Enables BE3-BE0 (Byte enables)

PA31 PA30 PA29 PA28 PA27 PA26 PA25 PA24 PA23 PA22 PA21 PA20 PA19 PA18 PA17 PA16 PA15 PA14 PA13 PA12 PA11 PA10 PA9 PA8 PA7 PA6 PA5 PA4 PA3 PA2 PA1 PA0

Byte

3.4 (3 pts) Divide the physical address (based on the cache specifications) into TAG, SET, WORD, and BYTE fields.

Physical address PA31-PA0; Bank Enables BE3-BE0 (Byte enables)

PA31 PA30 PA29 PA28 PA27 PA26 PA25 PA24 PA23 PA22 PA21 PA20 PA19 PA18 PA17 PA16 PA15 PA14 PA13 PA12 PA11 PA10 PA9 PA8 PA7 PA6 PA5 PA4 PA3 PA2 PA1 PA0

Byte

3.5 (13 pts) If the 32-bit physical byte address (produced by address translation through the TLB or Page Table) is 70586124 H (0111_0000_0101_1000_0110_0001_0010_0100 B), which set in the cache will you be approaching? _____________________________ (set # in binary)
Does this set number form an index (an address) into _____________________________ (the multiple TAG RAMs / the single TAG RAM / neither of these)?
[Side panel: a TAG RAM with Address, Data_in, and Data_out, followed by a Comparator producing HIT (+ valid). Size = ______. ______ more (besides the above) are needed in this cache.]
Complete the TAG RAM details in the side panel.
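A short script can check the physical-address fields for 3.4 and 3.5. It assumes the minimum associativity of 7 blocks per set, since 224 KB = 7 × 32 KB. With 8-byte blocks that gives 4096 sets, so the fields are:

- SET: 12 bits (PA14-PA3)
- WORD: 1 bit (PA2)
- BYTE: 2 bits (PA1-PA0)
- TAG: the remaining 17 bits

Choosing a different associativity (e.g. 14 ways) would change these widths.

```python
# PA fields for a 224 KB set-associative cache with 8-byte blocks.
# Assumption: 7 blocks per set (the smallest choice that gives a
# power-of-two number of sets for 224 KB).
BLOCK_BYTES = 8
WAYS = 7
CACHE_BYTES = 224 * 1024
NUM_SETS = CACHE_BYTES // (BLOCK_BYTES * WAYS)   # 4096
SET_BITS = NUM_SETS.bit_length() - 1             # 12


def split_pa(pa):
    """Return (tag, set, word, byte) for a 32-bit physical byte address."""
    byte = pa & 0x3                          # PA1-PA0
    word = (pa >> 2) & 0x1                   # PA2
    set_index = (pa >> 3) & (NUM_SETS - 1)   # PA14-PA3
    tag = pa >> (3 + SET_BITS)               # PA31-PA15 (17 bits)
    return tag, set_index, word, byte


tag, set_index, word, byte = split_pa(0x70586124)
print(format(set_index, "012b"), hex(tag), word, byte)
```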

3.6 (6 pts) Complete the Cache DATA RAM details below.

[Figure: a DATA RAM made of 4 byte-wide RAM banks on D31-D24, D23-D16, D15-D8, and D7-D0, sharing one Address input and connected to the Trojan Processor data bus D31-D0.]

Size: Each of the 4 such byte-wide banks is a ______ x 8.
______ more DATA RAM units (besides the one on the side).

Blank rectangle (for rough work)


Q3P12 Page total 34 pts
3.7 (6 pts) Complete the Interleaved Main Memory details below.

[Figure: one main-memory unit with four memory devices on D31-D24, D23-D16, D15-D8, and D7-D0, address inputs PA___ - PA___, and four 32-bit XCVRs (transceivers) onto the D31-D0 bus.]

Each of these 4 is __________ MB in size.
______ more such units (besides the one on the left) exist in Main Memory.
3.8 (4 pts) A TLB miss leads to a _________________ (cache lookup / PT lookup).
During a TLB lookup, a Read/Write/Execute violation (a memory protection violation) causes a TRAP. T / F
3.9 (6 pts) In a set-associative cache with 2 blocks per set and 4 words per block, the degree of lower-order interleaving recommended for the main memory is __________ (1-way/2-way/4-way/8-way/other, namely ...) and the number of TAG RAMs is __________ (8/16/32/other, namely ...).
The depth of a TAG RAM is determined by ________________________________________.
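The interleaving questions in 3.7 and 3.9 rest on one idea. In lower-order (word) interleaving, consecutive 32-bit words live in consecutive banks, so the bank number comes from the address bits just above the byte offset. The sketch assumes 4-byte words and a power-of-two degree of interleaving. When the degree equals the number of words per block, every word of a block sits in a different bank and can be read in parallel.

```python
# Lower-order (word) interleaving: which bank holds each word of a block.


def bank_of(pa, degree):
    """Bank number = word address modulo the degree of interleaving."""
    return (pa >> 2) % degree   # drop the 2-bit byte offset first


def banks_for_block(block_pa, words_per_block, degree):
    """Banks touched when fetching one cache block starting at block_pa."""
    return [bank_of(block_pa + 4 * i, degree) for i in range(words_per_block)]


print(banks_for_block(0x1000, 4, 4))  # 4-word block, 4-way: all banks distinct
print(banks_for_block(0x1000, 4, 2))  # 4-word block, 2-way: banks reused
```

With only 2-way interleaving, a 4-word block needs two rounds of bank accesses instead of one.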
3.10 (6 pts) In a 4-core processor with each core running 8 threads,
in each core, there is/are ______________ (1 / 4 / 8) PTBR(s),
in each core, there is/are ______________ (1 / 4 / 8) L1 Data cache(s),
in each core, there is/are ______________ (1 / 4 / 8) PC(s),
in each core, there is/are ______________ (1 / 4 / 8) Register Files.

(6 pts) Assume that our TLB entries have a field containing the Address Space number. Write a few lines stating the number of TLBs in a core and whether flushing of the TLB occurs on thread switching by hardware, on context switching by the operating system, under both, or under neither.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

3.11 (6 pts) The Oracle T1 processor has the TS (Thread Select) stage before the ID stage, whereas in our EE560 CMP, we have the TS stage after the ID stage. Both are right in their own context because _______
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________


Q4P13 Page total 28 pts
4 (___ points) ___ min. FIFO
(28 pts) An 8x4 single-clock FIFO (8 locations, each of 4 bits) can use one of the two methods below.
(i) n-bit (3-bit) WP and RP pointers with an AF/AE FF (Almost Full/Almost Empty flip-flop) to disambiguate the WP - RP = 0 situation
(ii) (n+1)-bit (4-bit) WP and RP pointers.
The number of pins on the FIFO chip is _________ (the same / different) in the two choices.
In method (i), for WP = 110 (6), if the FIFO is full, find the values: RP = ______, AF/AE FF = ______
In method (i), for WP = 110 (6), if the FIFO is empty, find the values: RP = ______, AF/AE FF = ______
In method (i), for depth calculation, you perform ______ (3-bit/4-bit) subtraction ___________________ (WP-RP/RP-WP) with modulo _____.
In method (ii), for WP = 1100 (12), if the FIFO is full, find the values: RP = ______
In method (ii), for WP = 1100 (12), if the FIFO is empty, find the values: RP = ______
In method (ii), for depth calculation, you perform ______ (3-bit/4-bit) subtraction ___________________ (WP-RP/RP-WP) with modulo _____.
For the following four figures, if possible, calculate the depth and show the calculation (mod subtraction). If not possible, state the reason.

[Figures: four circular-buffer diagrams labeled Method (i) and Method (ii), two with locations 0-7 and two with pointer values 0-15, each marking the positions of WP and RP.]

Depth = ___     Depth = ___     Depth = ___     Depth = ___
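Method (ii) above can be modeled compactly. The sketch covers only the (n+1)-bit-pointer scheme; method (i) with the AF/AE flip-flop is not modeled. The class name is hypothetical. The lower n pointer bits address the 8-entry RAM. Depth is the 4-bit difference WP - RP taken modulo 16, so depth 8 (full) and depth 0 (empty) are distinguishable even when the lower 3 bits of WP and RP match.

```python
class TwoPointerFifo:
    """8x4 single-clock FIFO with (n+1)-bit (4-bit) WP and RP pointers."""

    def __init__(self, n=3):
        self.size = 1 << n            # 8 locations
        self.modulus = 1 << (n + 1)   # 4-bit pointers wrap modulo 16
        self.ram = [0] * self.size
        self.wp = 0
        self.rp = 0

    def depth(self):
        # 4-bit subtraction WP - RP, modulo 16
        return (self.wp - self.rp) % self.modulus

    def full(self):
        return self.depth() == self.size

    def empty(self):
        return self.depth() == 0

    def write(self, value):
        if self.full():
            raise OverflowError("write to a full FIFO")
        self.ram[self.wp % self.size] = value & 0xF   # lower n bits address RAM
        self.wp = (self.wp + 1) % self.modulus

    def read(self):
        if self.empty():
            raise IndexError("read from an empty FIFO")
        value = self.ram[self.rp % self.size]
        self.rp = (self.rp + 1) % self.modulus
        return value


fifo = TwoPointerFifo()
for v in range(8):
    fifo.write(v)
print(fifo.wp, fifo.rp, fifo.depth(), fifo.full())
```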

5 (___ points) ___ min.
Lab 7 Part 3 Subpart 3 Verilog RTL coding: A couple of figures to refresh your memory.

[Figures: ORIGINAL and MODIFIED versions of the Lab 7 Part 3 Subpart 3 EX2 and WB stages, showing the register file (Reg. File with RA, XA, XD, RD, R-Write, CLK) and the IFRF Circuit/IFRF_Mux (reg_file[XA], WD), FU2 with its qualifying signals and FORW2, the ADD4 unit (ADDER_IN, ADDER_OUT, Cout, A+4), X2_Mux, R2_Mux, SKIP2/WB_SKIP2, the P=Q comparator, and the signals EX2_XMEX1, EX2_ADDER_IN, EX2_ADDER_OUT, WB_EX2_ADDER_IN, WB_EX2_ADDER_OUT, EX2_MOV, EX2_SUB3, EX2_ADD4, EX2_ADD1, EX2_RA, WB_RA, WB_RD, WB_Write, and RESET_B.]


Q4P14 Page total 20 pts
Suppose we had only one change from subpart 1, namely the change of the negative-edge-triggered register file to a positive-edge-triggered register file with an internal forwarding mechanism, but no R2_Mux movement to the WB stage.
Consider each of the following four choices for coding the write port (only the write port) of the register file and state whether you agree or disagree.

Student #1 chooses to write a separate clocked block, as shown below:

    always @(posedge CLK)
    begin : RegFile_Block
      if (WB_WRITE)
        begin
          reg_file[WB_RA] <= WB_RD;
        end
    end

Students #2, #3, and #4 choose to write the following 4 lines in the main clocked block, after the line "else // else if posedge CLK":

    if (WB_WRITE)
      begin
        reg_file[WB_RA] <= WB_RD;
      end

However, students #2, #3, and #4 differed in where they placed the above 4 lines.

Student #2 placed the 4 lines at the beginning of the else block (i.e., before producing WB_WRITE, WB_RA, and WB_RD).

Student #3 placed the 4 lines at the end of the else block (i.e., after producing WB_WRITE, WB_RA, and WB_RD).

Student #4 placed the 4 lines in the middle of the else block (i.e., after producing WB_WRITE and WB_RA, but before producing WB_RD).

(6 pts)
You agree with Student #1. Yes / No
You agree with Student #2. Yes / No
You agree with Student #3. Yes / No
You agree with Student #4. Yes / No

(14 pts) Your short explanation: _________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
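The placement question above turns on Verilog's nonblocking (<=) assignment semantics. The Python sketch below is a behavioral model, not Verilog. At each clock edge, every statement reads the values that existed before the edge, and all updates are committed together afterwards.

The EX2_* stage-register names are hypothetical stand-ins for whatever produces WB_WRITE, WB_RA, and WB_RD. The model runs all 24 orderings of the write-port statement and the three stage-register updates and checks that they end in the same state. This holds only because every assignment here is nonblocking and each target is assigned once.

```python
from itertools import permutations


def clock_edge(state, statements):
    """One posedge CLK of a block using only nonblocking (<=) assignments:
    every statement reads the OLD state; updates are committed at the end."""
    pending = []
    for stmt in statements:
        pending.extend(stmt(state))
    new_state = dict(state)
    new_state["reg_file"] = list(state["reg_file"])
    for target, value in pending:
        if isinstance(target, tuple):            # ("reg_file", index)
            new_state["reg_file"][target[1]] = value
        else:
            new_state[target] = value
    return new_state


# Hypothetical EX2 -> WB stage-register updates (names assumed)
def wb_write(s):
    return [("WB_WRITE", s["EX2_WRITE"])]


def wb_ra(s):
    return [("WB_RA", s["EX2_RA"])]


def wb_rd(s):
    return [("WB_RD", s["EX2_RD"])]


def write_port(s):
    # if (WB_WRITE) reg_file[WB_RA] <= WB_RD;
    return [(("reg_file", s["WB_RA"]), s["WB_RD"])] if s["WB_WRITE"] else []


start = {
    "reg_file": [0, 0, 0, 0],
    "WB_WRITE": True, "WB_RA": 1, "WB_RD": 11,      # instruction now in WB
    "EX2_WRITE": True, "EX2_RA": 2, "EX2_RD": 22,   # instruction now in EX2
}

results = [clock_edge(start, list(order))
           for order in permutations([wb_write, wb_ra, wb_rd, write_port])]
print(all(r == results[0] for r in results))  # True: order does not matter
print(results[0]["reg_file"])                 # [0, 11, 0, 0]
```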


Q5P15 Page total 10 pts
6 (___ points) ___ min. Lab 7 P3 SP2 modification

An ADD8 instruction (besides an ADD4 instruction) can be supported in Lab 7 Part 3 by replacing the SUB3 unit in EX1 with another ADD4 unit. Instead of having an ADD4 unit in each of the two EX stages, EX1 and EX2, here we have merged those two stages into EX12. So ADD8 needs an extra clock in EX12, as it has to go through the second ADD4 also.

Instruction      Operation                     Opcode bits            MSD   32-bit instruction in hex
                                               MOV  BZ  ADD4  ADD8          (D = Destination, S = Source)
NOP                                            0    0   0     0     0     000000DS
MOV $R, $X;      ($R) <= ($X)                  1    0   0     0     8     800000DS
SUB3 $R, $X;     ($R) <= ($X) - 3              0    1   0     0     4     400000DS
BZ $X, JJJJ;     (PC) <= JJJJ if ($X) = 0      0    1   0     0     4     4JJJJ0DS
ADD4 $R, $X;     ($R) <= ($X) + 4              0    0   1     0     2     200000DS
ADD8 $R, $X;     ($R) <= ($X) + 8              0    0   0     1     1     100000DS

We have a BZ (Branch if Zero) instruction. It uses the opcode previously allocated to the SUB3 instruction. The instructions are 32 bits in size, but the addresses are only 16 bits. The PC is 16 bits wide and is incremented by 1. The JJJJ in BZ $X, JJJJ stands for a 16-bit (4-digit hex) absolute branch address. If the source register $X contains zero, we branch to JJJJ [(PC) <= JJJJ if ($X) = 0]. The "D" in "4JJJJ0DS" is a random hex digit and should not be treated as a valid destination, similar to the "DS" in "000000DS" for a NOP instruction. BZ executes from the ID stage.

You need to complete the early branch mechanism: dependency stalls, branch execution by causing the PC to be changed to JJJJ and flushing the junior instruction in the IF stage, avoiding spurious branch execution during stalling of the ID stage (stalling BZ due to its dependency on an ADD4 or ADD8 in the EX12 stage), etc. A copy of our Lab 6 Early Branch design is given at the end of the exam, just FYI (for your information).

6.1 Complete the design on the page after next (page 17).

6.2 (10 pts) In your Lab 7 Part 3 Subpart 2 (EX1 and EX2 merged case), you used the left-side circuit below to stall ADD1 for 1 clock. Complete the design by labeling the STALL signal. Suppose you are given a flip-flop with an asynchronous set, as shown on the right side below (instead of the FF with an asynchronous clear shown on the left). Redesign your stall circuit with this FF and show the STALL signal.

[Figures: left, a D flip-flop with asynchronous clear (CLR driven by RESET_B), clocked by CLK, with EX12_ADD1 and an output to be labeled STALL??; right, a D flip-flop with asynchronous set (SET driven by RESET_B), clocked by CLK, with an output to be labeled STALL??.]
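One plausible "stall exactly once" formulation for 6.2 is modeled below as a cycle-by-cycle sketch. It is not necessarily the lab's exact circuit, and it ignores all other stall sources.

- Asynchronous-clear version: reset forces Q = 0, meaning "not yet stalled". STALL = EX12_ADD1 & ~Q, and the flip-flop's D input is STALL.
- Asynchronous-set version: reset forces Q = 1, so Q takes the inverted meaning. STALL = EX12_ADD1 & Q, and D = ~STALL.

Both versions should produce the same STALL waveform.

```python
# Cycle-level model of a one-clock stall for an instruction in EX12.
# Input: per clock, 1 if the instruction needing two clocks (ADD1) is in EX12.


def stall_with_clear_ff(ex12_add1):
    """Async-clear FF: Q = 0 after reset. STALL = EX12_ADD1 & ~Q; D = STALL."""
    q = 0
    out = []
    for add1 in ex12_add1:
        stall = int(bool(add1) and not q)
        out.append(stall)
        q = stall              # posedge CLK: Q <= D
    return out


def stall_with_set_ff(ex12_add1):
    """Async-set FF: Q = 1 after reset. STALL = EX12_ADD1 & Q; D = ~STALL."""
    q = 1
    out = []
    for add1 in ex12_add1:
        stall = int(bool(add1) and q)
        out.append(stall)
        q = 1 - stall          # posedge CLK: Q <= D = ~STALL
    return out


single = [1, 1, 0]             # ADD1 enters EX12, is held one clock, leaves
back_to_back = [1, 1, 1, 1]    # two ADD1s in a row
print(stall_with_clear_ff(single), stall_with_set_ff(single))
print(stall_with_clear_ff(back_to_back), stall_with_set_ff(back_to_back))
```

In both versions the first clock of each ADD1 in EX12 is stalled and the second is not, so back-to-back ADD1s each get exactly one stall.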


Q5P16 Page total 43 pts
6.3 (6 pts) When STALL_ADD8 is active, you stall the entire pipeline. True / False
When STALL_BR is active, you stall the entire pipeline. True / False
The IF_Flush mechanism here is ___________________ (the same as / different from) the wrist-band mechanism used in our pipelined CPU design.

6.4 (6 pts) In this design, we have implemented an early branch. Would a medium branch from EX12 be better? Yes / No / It depends. Explain. ____________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
(6 pts) Is it possible to postpone executing the BZ instruction all the way into the WB stage (WB!, not EX12)? Not possible / Possible but undesirable / Possible and desirable. Explain __________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________

6.5 (10 pts) Combining EX1 and EX2 into one EX12 stage (as done here) is ____________________________ (always better / always worse / depends on the instruction sequence in the program). Explain. ___
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________

6.6 (5 pts) How come we carried (PC + 4) to the ID stage in our 5-stage early branch CPU design (copy at the end of the exam), but we do not carry (PC + 1) to the ID stage here? __________________
_______________________________________________________________________________
_______________________________________________________________________________

6.7 (5 pts) We had HDU_BR, FU_BR, HDU, and FU in our 5-stage early branch CPU design (copy at the end of the exam). How come we do not have an HDU here? We have the other three pieces here. ________________
_______________________________________________________________________________
_______________________________________________________________________________

6.8 (5 pts) Produce STALL_BR below.

[Figure: the Comp Station in ID Stage producing ID_XMEX12, feeding HDU_BR, whose output is STALL_BR.]


Q5P17 Page total 15 pts

[Figure: the 4-stage pipeline (IF, ID, EX12, WB) of this design. It shows the PC with the PCSource mux (PC + 1 or the 16-bit branch address JJJJ), I-MEM, and the IF/ID register with STALL_IF_ID on its enable. It also shows the register file (IFRF) with X0_Mux/X1_Mux controlled by FORW0/FORW1, FU, FU_BR, HDU_BR, and the Comp Station in ID Stage (ID_XA matched with EX12_RA through a P=Q comparator, producing ID_XMEX12). Further elements are XD_ZERO, two ADD4 units in EX12, and R1_Mux/R2_Mux controlled by SKIP1/SKIP2. The signals shown are ID_MOV, ID_BZ, ID_ADD4, ID_ADD8, EX12_MOV, EX12_ADD4, EX12_ADD8, EX12_A4_A8, EX12_Write, EX12_XMEX12, EX12_RA, WB_Write, WB_RA, WB_RD, IF_Flush, STALL_BR, STALL_ADD8, RESET_B, and CLK.]

1. Complete the 6 connections to/from
2. Complete the STALL_ADD8 logic in EX12 (generate it).
3. On a separate page, draw logic to produce STALL_BR, PCSource, FORW0, and FORW1.
4. Draw the needed logic to produce IF_Flush, SKIP1, and SKIP2 on this page itself.
Q5P18 Page total 36 pts
6.9 In our Lab 6 early branch design (copy at the end of the exam), we produced a BR1 signal, which _______ (A/B).
A = may go active based on obsolete values, but no harm is done because of the guardian angel HDU_BR.
B = is generated carefully and does not require any guardian angel's help!
(8 pts) For the current design, produce a BR1 with as little logic as possible, and state whether there is a guardian angel to help the BR1. If appropriate, state how the simple-minded BR1 does not cause any harm.
In the same box below, produce PCSource (the mux select line that selects the next value for the PC) and IF_Flush (to flush Junior1 after a taken branch).

[Figure: box for the BR1, PCSource, and IF_Flush logic, beside the 16-bit PCSource mux (inputs 0 and 1) that chooses between PC + 1 and the branch address.]

(8 pts) Your above BR1 _______ (A/B).
A = may go active based on obsolete values, but no harm is done because of the guardian angel HDU_BR.
B = is generated carefully and does not require any guardian angel's help!
Explain: __________________________________________________________________________
___________________________________________________________________________________
___________________________________________________________________________________

6.10 (10 pts) Guardian angel: In our 5-stage early branch CPU design (copy at the end of the exam), we said that HDU_BR acts like a guardian angel to FU_BR, and FU_BR could use ___________________
____________________________________________________________________________
(use words like register-writing, memory-reading, register-writing-but-not-memory-reading, if they fit).
Here, while FU_BR could use ____________________ (EX2_Write/EX2_MOV) in place of the more precise ________________ (EX2_Write/EX2_MOV), using the more precise signal creates a __________ (slower/faster) timing path! Produce FORW0 and FORW1 below.
Can the FU_Br be generous and help other instructions besides the ID_Bz? _____ (Y/N)

[Figure: FU_BR and FU with inputs ID_XMEX12 and EX12_XMEX12, producing FORW0 (for X0_Mux) and FORW1 (for X1_Mux); each mux has inputs 0 and 1, one of them XD.]

6.11 (10 pts) The ID stage gets stalled for _______ (0/1/2) clock(s) if the BZ in ID is dependent on a MOV in EX12.

The ID stage gets stalled for _______ (0/1/2) clock(s) if the BZ in ID is dependent on an ADD4 in EX12.

The ID stage gets stalled for _______ (0/1/2) clock(s) if the BZ in ID is dependent on an ADD8 in EX12.

The EX12 stage gets stalled for _______ (0/1/2) clock(s) if an ADD8 is present in EX12.

Q5P19 Page total 24 pts
6.12 (16 pts) Complete the following "Single Cycle CPU" version of the 4-stage pipeline design. Complete the control unit and the six marked points.

[Figure: Single Cycle CPU. It shows the PC (with PC_EN and RESET_B) and the PCSource mux (PC + 1 or the 16-bit branch address JJJJ), I-MEM, and the register file with R-Write driven by RegWrite. It also shows XD_ZERO, two ADD4 units (Cout, A+4), and R1_Mux/R2_Mux controlled by SKIP1/SKIP2. The control unit (CU, marked "complete this") has the outputs ADD4, MOV, ADD8, and BZ/Branch.]

(8 pts) For this single-cycle CPU, do you expect a clock period equal to that of the 4-stage pipeline, half of it, or double it? ____________________ (Short answer first). Brief explanation: _________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________

Blank rectangle (for rough work)

Q5P20 Page total 53 pts
6.13 (18 pts) Mr. Trojan thought of an improvement to our above 4-stage pipeline that occasionally saves one clock. He wants you to implement the improvement below. He has given you enough clues below in the form of observations, questions, and suggestions.

1. Do you agree that a BZ instruction does not do anything in the last two stages (EX12 and WB)? _______ (Y / N).
2. Unless BZ itself wants to stall because of its dependency on its senior#1, can we let BZ execute and vanish while an ADD8 is stalled in EX12? ________ (Yes / No).
3. The word "execute" in the preceding sentence may include both taken and untaken branches. ________ (Yes / No).
4. You may want to avoid stalling the PC and the IF/ID stage register to save a clock on that special occasion.
4.1. However, if a non-branch instruction (other than a NOP) is in the ID stage, the ADD8-related stall shall stall the PC and the IF/ID stage register to avoid loss of the ID stage instruction. ______ (T/F)
5. WB stage and IFRF: The senior in the WB stage may be helping the BZ in the ID stage through the register file, which is an IFRF. What does IFRF mean? ______________________________
6. The senior in the WB stage may be helping the ADD8 in the EX12 stage also. ______ (T / F)
7. (15 pts) WB stage instruction's behavior when it is stalled:
A register-writing WB stage instruction should write to the Register File
Choice #1 (C#1): in every clock, even if it is stalled by STALL_ADD8
Choice #2 (C#2): only in clocks when it is not stalled
7.1 In our original design without Trojan's improvement, ______________________________ (C#1 only/C#2 only/both/neither) is/are acceptable.
7.2 In the new design with Trojan's improvement, ______________________________ (C#1 only/C#2 only/both/neither) is/are acceptable.
Discuss/Explain: _____________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________

(20 pts)
[Figures: left, "Current design without Trojan's improvement", showing STALL_ADD8, STALL_BR, and STALL_IF_ID with the EN inputs of the PC and the IF_ID stage register; right, "Please implement Trojan's improvement", showing the same PC and IF_ID register EN inputs, to be completed.]
P21
Lab 6 Early Branch Design (Just FYI)
Non-Grading page. DEN students: No need to submit this page

[Figure: Lab 6 5-stage pipeline with the early branch resolved in the ID stage. It shows the PC, instruction memory, register file, and Control. It includes the Hazard detection unit (STALL_LW, using RegWrite_EX, MemRead_EX, MemRead_MEM), HDU_Br (STALL_BEQ, using WriteRegister_EX and WriteRegister_MEM), and FU_Br with FW_RS_MEM/FW_RS_WB and FW_RT_MEM/FW_RT_WB. Also shown are the branch comparator producing BR1, sign extension and shift-left-2 feeding the branch-target adder, and IF.Flush. The forwarding unit (FW_RS, FW_RT, fowarding_mux_control), ALU and ALU control, and data memory (MEM_data, Store_data, REG_data) complete the datapath. The pipeline control signals are RegWrite, MemRead, MemWrite, MemtoReg, RegDst, ALUSrc, and ALUOp.]
P22
Non-Grading page. DEN students: No need to submit this page
Blank page: Please write your name and email. Tear it off and use for rough work. Do not submit.
Student’s First & Last Name:______________________ email: __________________

We enjoyed teaching this course. Hope you liked it too!

Best Wishes!
Gandhi; TA: Rakshith Jayanth; Mentor-cum-Graders: Shubham Rana, Ziyu Liu, Wenkai Zhang, Haochen Wu, Junjie Chen, Godha Lakshmi Garudaiahgari
