Computer Architecture Midterm1 Cmu

This document appears to be for an exam for an intro to computer architecture course. It provides information about the exam such as the date, problems, points allocated to each problem, and instructions for taking the exam. It also provides some tips for students taking the exam regarding time management, conciseness, and showing work. The exam seems focused on topics like pipelining, branch prediction, exceptions, and out-of-order execution.

CMU 18-447 Introduction to Computer Architecture, Spring 2013

Midterm Exam 1
Date: Wed., 3/6

Instructor: Onur Mutlu
TAs: Justin Meza, Yoongu Kim, Jason Lin

Name:

Legibility & Name (5 Points):
Problem 1 (65 Points):
Problem 2 (25 Points):
Problem 3 (20 Points):
Problem 4 (40 Points):
Problem 5 (40 Points):
Problem 6 (40 Points):
Problem 7 (35 Points):
Bonus (45 Points):
Total (270 + 45 Points):

Instructions:

1. This is a closed book exam. You are allowed to have one letter-sized cheat sheet.

2. No electronic devices may be used.

3. This exam lasts 1 hour and 50 minutes.

4. Clearly indicate your final answer for each problem.
5. Please show your work when needed.
6. Please write your initials at the top of every page.
7. Please make sure that your answers to all questions (and all supporting work that is required)
are contained in the space required.

Tips:

Be cognizant of time. Do not spend too much time on one question.

Be concise. You will be penalized for verbosity.

Show work when needed. You will receive partial credit at the instructor's discretion.

Write legibly. Show your final answer.

Initials:

1. Potpourri [65 points]

(a) Full Pipeline [6 points]
Keeping a processor pipeline full with useful instructions is critical for achieving high performance.
What are the three fundamental reasons why a processor pipeline cannot always be kept full?

Reason 1.

Data dependences

Reason 2.

Control flow dependences

Reason 3.

Resource contention

(b) Exceptions vs. Interrupts [9 points]

In class, we distinguished exceptions from interrupts. Exceptions need to be handled when detected by the processor (and known to be non-speculative) whereas interrupts can be handled
when convenient.
Why does an exception need to be handled when it is detected? In no more than 20 words, please.

The running program cannot continue if the exception is not immediately handled.

What does it mean to handle an interrupt when it is convenient?

The processor can handle the interrupt at any time.

Why can many interrupts be handled when it is convenient?

They are not needed for the running program's progress and not critical for system
progress.


(c) Branch Target Buffer [5 points]

What is the purpose of a branch target buffer (in no more than 10 words, please)?
A BTB caches the target of a branch.

What is the downside of a design that does not use a branch target buffer? Please be concrete
(and use less than 20 words).

Determining the branch target would cause bubbles in the pipeline.

(d) Return Address Prediction [4 points]

In lecture, we discussed that a return address stack is used to predict the target address of a return
instruction instead of the branch target buffer. We also discussed that empirically a reasonably-sized return address stack provides highly accurate predictions.
What key characteristic of programs does a return address stack exploit?
Usually, a return matches a function call.

Assume you have a machine with a 4-entry return address stack, yet the code that is executing
has six levels of nested function calls each of which end with an appropriate return instruction.
What is the return address prediction accuracy of this code?
4/6
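
The 4/6 figure can be checked with a small simulation. This is a minimal sketch, assuming the return address stack silently drops its oldest entry on overflow and makes no valid prediction when it is empty; the function name `ras_accuracy` and the use of integers as return addresses are illustrative choices, not part of the exam.

```python
from collections import deque
from fractions import Fraction


def ras_accuracy(entries: int, depth: int) -> Fraction:
    """Fraction of returns predicted correctly for `depth` nested calls."""
    # Assumption: appending to a full stack discards the oldest entry.
    ras = deque(maxlen=entries)
    for return_addr in range(depth):           # calls, outermost first
        ras.append(return_addr)
    correct = 0
    for expected in reversed(range(depth)):    # returns, innermost first
        # Assumption: an empty stack yields no valid prediction.
        predicted = ras.pop() if ras else None
        correct += predicted == expected
    return Fraction(correct, depth)


print(ras_accuracy(4, 6))
```

With four entries and six nested calls, the four innermost returns find their addresses on the stack and the two outermost returns do not.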

(e) Restartable vs. Precise Interrupts [6 points]

As we discussed in one of the lectures, an exception (or interrupt) is restartable if a (pipelined)
machine is able to resume execution exactly from the state when the interrupt happened and
after the exception or interrupt is handled. By now you also should know what it means for an
interrupt to be precise versus imprecise.
Can a pipelined machine have restartable but imprecise exceptions or interrupts?

Yes.


What is the disadvantage of such a machine over one that has restartable and precise exceptions
or interrupts? Explain briefly.

It would be hard to debug code running on such a machine. Restartable exceptions do
not ease debugging.

(f) Segmentation and Paging [4 points]

In segmentation, translation information is cached as part of the ________. In paging, translation information is cached in the ________.

First blank: Segment selector
Second blank: TLB

(g) Out-of-Order vs. Dataflow [8 points]

When does the fetch of an instruction happen in a dataflow processor?

When all inputs to the instruction are ready.

When does the fetch of an instruction happen in an out-of-order execution processor?

When the program counter points to that instruction.

In class, we covered several dataflow machines that implemented dataflow execution at the ISA
level. These machines included a structure/unit called the matching store. What is the function
of the matching store (in less than 10 words)?

Determines whether a dataflow node is ready to fire.

What structure accomplishes a similar function in an out-of-order processor?

Reservation stations.


(h) Tomasulo's Algorithm [5 points]

Here is the state of the reservation stations in a processor during a particular cycle (- denotes an
unknown value):

ADD Reservation Station
Tag   V  Tag  Data     V  Tag  Data
A     0  D    -        1  -    27
B     1  -    3        0  E    -
C     0  B    -        0  A    -

MUL Reservation Station
Tag   V  Tag  Data     V  Tag  Data
D     0  B    -        0  C    -
E     1  -    16       0  B    -

What is wrong with this picture?

Cyclical dependences between instructions, which leads to deadlock. (Between tags B
and E, and also between tags A, D, and C.)
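
The deadlock can be confirmed mechanically by treating each not-yet-valid source tag as a "waits on" edge and looking for entries that transitively wait on themselves. The sketch below is illustrative only: the `WAITS_ON` map is read off the source tags in the tables above (the assignment of tags to rows is reconstructed from the flattened tables), and `on_cycle` is a hypothetical helper name.

```python
# Which tags each reservation-station entry is waiting on (sources with V = 0).
# Assumption: row assignment reconstructed from the tables in the question.
WAITS_ON = {
    "A": ["D"],
    "B": ["E"],
    "C": ["B", "A"],
    "D": ["B", "C"],
    "E": ["B"],
}


def on_cycle(tag, waits_on):
    """Return True if `tag` transitively waits on its own result."""
    stack, seen = list(waits_on[tag]), set()
    while stack:
        current = stack.pop()
        if current == tag:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(waits_on[current])
    return False


deadlocked = sorted(t for t in WAITS_ON if on_cycle(t, WAITS_ON))
print(deadlocked)
```

Every entry ends up on a cycle, so no instruction can ever receive its operands and begin execution.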

(i) Minimizing Stalls [10 points]

In multiple lectures, we discussed how the compiler can reorder instructions to minimize stalls
in a pipelined processor. The goal of the compiler in these optimizations is to find independent
instructions to place in between two dependent instructions such that by the time the consumer
instruction enters the pipeline the producer would have produced its result.
We discussed that control dependences get in the way of the compiler's ability to reorder instructions. Why so?

At compile time, the compiler does not know whether control dependences are taken or
not during execution. Hence, it does not know if an instruction can be moved above or
below a branch.

What can the compiler do to alleviate this problem? Describe two solutions we discussed in class.

Solution 1.

Predication. Eliminating branches solves the problem.

Solution 2.

Superblock or trace scheduling. The compiler profiles the code and determines likely branch directions and optimizes the instruction scheduling on
the frequently executed path.


What is the major disadvantage or limitation of each solution?

Solution 1.

Wasted instructions, ISA changes

Solution 2.

Profile may not be accurate

(j) Tomasulo's Algorithm Strikes Back [8 points]

You have a friend who is an architect at UltraFastProcessors, Inc. Your friend explains to you
how their newest out-of-order execution processor that implements Tomasulo's algorithm and that
uses full data forwarding works:
After an instruction finishes execution in the functional unit, the result of the instruction is
latched. In the next cycle, the tag and result value are broadcast to the reservation stations.
Comparators in the reservation stations check if the source tags of waiting instructions match the
broadcast tag and capture the broadcast result value if the broadcast tag is the same as a source
tag.
Based on this description, is there an opportunity to improve the performance of your friend's
design? Circle one:
YES

If YES, explain what type of code leads to inefficient (i.e., lower performance than it could be)
execution and why. (Leave blank if you answered NO above.)
Tag and result broadcast is delayed by a cycle, thus delaying the execution of dependent
instructions.
If YES, explain what you would recommend to your friend to eliminate the inefficiency. (Leave
blank if you answered NO above.)
Broadcast the tag and result as soon as an instruction finishes execution.

If NO, justify how the design is as efficient as Tomasulo's algorithm with full data forwarding can
be. (Leave blank if you answered YES above.)
BLANK

If NO, explain how the design can be simplified. (Leave blank if you answered YES above.)
BLANK


2. Branch Prediction and Dual Path Execution [25 points]

Assume a machine with a 7-stage pipeline. Assume that branches are resolved in the sixth stage.
Assume that 20% of instructions are branches.
(a) How many instructions of wasted work are there per branch misprediction on this machine?
5 instructions.

(b) Assume N instructions are on the correct path of a program and assume a branch predictor
accuracy of A. Write the equation for the number of instructions that are fetched on this machine
in terms of N and A. (Please show your work for full credit.)

Note that if you assumed the wrong number of instructions in Part (a), you will only be
marked wrong for this in Part (a). You can still get full credit on this and the following
parts.
Correct path instructions = N
Incorrect path instructions = N * 0.2 * (1 - A) * 5 = N(1 - A)
Fetched instructions = Correct path instructions + Incorrect path instructions
= N + N(1 - A)
= N(2 - A)

(c) Let's say we modified the machine so that it used dual path execution like we discussed in class
(where an equal number of instructions are fetched from each of the two branch paths). Assume
branches are resolved before new branches are fetched. Write how many instructions would be
fetched in this case, as a function of N . (Please show your work for full credit.)

Correct path instructions = N

Incorrect path instructions = N * 0.2 * 5 = N
Fetched instructions = Correct path instructions + Incorrect path instructions
= N + N * (1 - 0.8) * 5
= 2N
This solution assumes you have enough hardware in the frontend of the machine to
fetch concurrently from both paths. If you assumed that both paths are fetched from on
alternate cycles, that high-level approach is also OK, although note that you would need
additional branch taken and not taken information to solve it completely.


(d) Now let's say that the machine combines branch prediction and dual path execution in the following way:
A branch confidence estimator, like we discussed in class, is used to gauge how confident the
machine is of the prediction made for a branch. When confidence in a prediction is high, the
branch predictor's prediction is used to fetch the next instruction; when confidence in a prediction
is low, dual path execution is used instead.
Assume that the confidence estimator estimates a fraction C of the branch predictions have high
confidence, and that the probability that the confidence estimator is wrong in its high confidence
estimation is M .
Write how many instructions would be fetched in this case, as a function of N , A, C, and M .
(Please show your work for full credit.)

Correct path instructions = N

Incorrect path instructions due to lack of confidence
= N * 0.2 * (1 - C) * 5 = N(1 - C)
Incorrect path instructions due to incorrect high confidence estimate
= N * 0.2 * C * M * 5 = NCM
Fetched instructions = Correct path instructions
+ Incorrect path instructions due to lack of confidence
+ Incorrect path instructions due to incorrect high confidence estimate
= N + N(1 - C) + NCM
= N[2 + C(M - 1)]
Like above, if you assumed a different execution model for Part (c), you will not be
penalized for using it in this part.
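
The three fetch-count expressions from Parts (b), (c), and (d) can be evaluated side by side. This is a minimal sketch that simply transcribes the formulas above; the function names and the sample values of N, A, C, and M are made up for illustration.

```python
BRANCH_FRACTION = 0.2    # given: 20% of instructions are branches
WASTED_PER_EVENT = 5     # Part (a): wrong-path instructions per unresolved branch


def fetched_with_prediction(n, a):
    """Part (b): wrong-path work only on mispredicted branches."""
    return n + n * BRANCH_FRACTION * (1 - a) * WASTED_PER_EVENT


def fetched_with_dual_path(n):
    """Part (c): every branch fetches five wrong-path instructions."""
    return n + n * BRANCH_FRACTION * WASTED_PER_EVENT


def fetched_with_confidence(n, c, m):
    """Part (d): dual path on low confidence, prediction on high confidence."""
    low_confidence = n * BRANCH_FRACTION * (1 - c) * WASTED_PER_EVENT
    wrong_high_confidence = n * BRANCH_FRACTION * c * m * WASTED_PER_EVENT
    return n + low_confidence + wrong_high_confidence


# Hypothetical sample values, chosen only to exercise the formulas.
print(fetched_with_prediction(1000, 0.9))
print(fetched_with_dual_path(1000))
print(fetched_with_confidence(1000, 0.8, 0.05))
```

Because 0.2 * 5 = 1, each expression collapses to the closed forms N(2 - A), 2N, and N[2 + C(M - 1)] derived above.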


3. Dataflow [20 points]

Here is a dataflow graph representing a dataflow program:

[Figure: dataflow graph, not recoverable from the extracted text. Surviving labels: an initial "false" token, ">0?", "copy" (twice), "NOT", "AND", "T" (twice), and "output".]

The following is a description of the nodes used in the dataflow graph:

-      subtracts right input from left input
AND    bit-wise AND of two inputs
NOT    the boolean negation of the input (input and output are both boolean)
BR     passes the input to the appropriate output corresponding to the boolean condition
copy   passes the value from the input to the two outputs
>0?    true if input greater than 0

Note that the input X is a non-negative integer.

What does the dataflow program do? Specify clearly in less than 15 words.

Calculates the parity of X. (True if the number of set bits in X is odd and false otherwise.)
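
One way to see how the listed node types can produce a parity bit is the loop below. It is only a sketch of a plausible reading: the graph itself is not available here, so the wiring (in particular, that the subtract node computes X - 1) is an assumption, and `dataflow_parity` is a hypothetical name.

```python
def dataflow_parity(x: int) -> bool:
    """True if the non-negative integer x has an odd number of set bits."""
    parity = False             # the initial 'false' token
    while x > 0:               # '>0?' node steering the BR nodes
        x = x & (x - 1)        # assumed '-' and 'AND' wiring: clear the lowest set bit
        parity = not parity    # 'NOT' node toggles the boolean once per cleared bit
    return parity              # value delivered to 'output'


print(dataflow_parity(7), dataflow_parity(6))
```

Each trip around the loop removes exactly one set bit, so the boolean is toggled once per set bit in X.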


4. Mystery Instruction [40 points]

That pesky engineer implemented yet another mystery instruction on the LC-3b. It is your job to
determine what the instruction does. The mystery instruction is encoded as:
Bits:    15-12   11-9   8-6   5   4   3   2   1   0
Value:   1010    DR     SR1   0   0   0   0   0   0

The modifications we make to the LC-3b datapath and the microsequencer are highlighted in the
attached figures (see the next two pages). We also provide the original LC-3b state diagram, in case
you need it. (As a reminder, the selection logic for SR2MUX is determined internally based on the
instruction.)
The additional control signals are
GateTEMP1/1: NO, YES
GateTEMP2/1: NO, YES
LD.TEMP1/1: NO, LOAD
LD.TEMP2/1: NO, LOAD
ALUK/3: OR1 (A|0x1), LSHF1 (A<<1), PASSA, PASS0 (Pass value 0), PASS16 (Pass value 16)
COND/4:
COND0000 ;Unconditional
COND0001 ;Memory Ready
COND0010 ;Branch
COND0011 ;Addressing mode
COND0100 ;Mystery 1
COND1000 ;Mystery 2
The microcode for the instruction is given in the table below.
State         Cond       J        Asserted Signals
001010 (10)   COND0000   001011   ALUK = PASS0, GateALU, LD.REG, DRMUX = DR (IR[11:9])
001011 (11)   COND0000   101000   ALUK = PASSA, GateALU, LD.TEMP1, SR1MUX = SR1 (IR[8:6])
101000 (40)   COND0000   110010   ALUK = PASS16, GateALU, LD.TEMP2
110010 (50)   COND1000   101101   ALUK = LSHF1, GateALU, LD.REG, SR1MUX = DR, DRMUX = DR (IR[11:9])
111101 (61)   COND0000   101101   ALUK = OR1, GateALU, LD.REG, SR1MUX = DR, DRMUX = DR (IR[11:9])
101101 (45)   COND0000   111111   GateTEMP1, LD.TEMP1
111111 (63)   COND0100   010010   GateTEMP2, LD.TEMP2

Describe what this instruction does.

Bit-reverses the value in SR1 and puts it in DR.


Code:
State 10: DR <- 0
State 11: TEMP1 <- value(SR1)
State 40: TEMP2 <- 16
State 50: DR = DR << 1
          if (TEMP1[0] == 0)
              goto State 45
          else
              goto State 61
State 61: DR = DR | 0x1
State 45: TEMP1 = TEMP1 >> 1
State 63: DEC TEMP2
          if (TEMP2 == 0)
              goto State 18
          else
              goto State 50
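
The state sequence above can be traced in software to confirm that it reverses the bits of SR1. This is a minimal sketch that follows the pseudocode state by state on a 16-bit value; the function name `mystery` is illustrative, and the model ignores everything else the datapath does (condition codes, bus timing).

```python
def mystery(sr1: int) -> int:
    """Follow states 10, 11, 40, 50, 61, 45, 63 for a 16-bit register value."""
    dr = 0                          # state 10: DR <- 0
    temp1 = sr1 & 0xFFFF            # state 11: TEMP1 <- SR1
    temp2 = 16                      # state 40: TEMP2 <- 16
    while True:
        dr = (dr << 1) & 0xFFFF     # state 50: DR <- DR << 1
        if temp1 & 1:               # branch on TEMP1[0]
            dr |= 0x1               # state 61: DR <- DR | 0x1
        temp1 >>= 1                 # state 45: TEMP1 <- TEMP1 >> 1
        temp2 -= 1                  # state 63: decrement TEMP2
        if temp2 == 0:              # done after 16 bits; return to fetch (state 18)
            return dr


print(hex(mystery(0x0001)), hex(mystery(0x1234)))
```

Each pass moves the current low bit of TEMP1 into the low end of DR, so after 16 passes the original bit 0 sits in bit 15 and so on.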


C.4. THE CONTROL STRUCTURE

[Figure C.6: Additional logic required to provide control signals. Panels: (a) DRMUX, (b) SR1MUX, (c) logic producing BEN from IR[11:9] and N, Z, P.]

LC-3b to operate correctly with a memory that takes multiple clock cycles to read or
store a value.
Suppose it takes memory five cycles to read a value. That is, once MAR contains
the address to be read and the microinstruction asserts READ, it will take five cycles
before the contents of the specified location in memory are available to be loaded into
MDR. (Note that the microinstruction asserts READ by means of three control signals:
MIO.EN/YES, R.W/RD, and DATA.SIZE/WORD; see Figure C.3.)
Recall our discussion in Section C.2 of the function of state 33, which accesses
an instruction from memory during the fetch phase of each instruction cycle. For the
LC-3b to operate correctly, state 33 must execute five times before moving on to state
35. That is, until MDR contains valid data from the memory location specified by the
contents of MAR, we want state 33 to continue to re-execute. After five clock cycles,
[remainder of the excerpt is cut off in the source]


C.2. THE STATE MACHINE

[Figure C.2: A state machine for the LC-3b. The state diagram is not recoverable from the extracted text.]

5. Virtual Memory [40 points]

Suppose a 32 K x 8 K matrix A with 1-byte elements is stored in row major order in virtual memory.
Assume only the program in question occupies space in physical memory. Show your work for full
credit.
Program 1

for (i = 0; i < 32768; i++)
    for (j = 0; j < 8192; j++)
        A[i][j] = A[i][j] * A[i][j];

Program 2

for (j = 0; j < 8192; j++)
    for (i = 0; i < 32768; i++)
        A[i][j] = A[i][j] * A[i][j];

(a) If Program 1 yields 8 K page faults, what is the size of a page in this architecture?

A_SIZE = 32K * 8K * 1B = 256MB

A_SIZE / PAGE_SIZE = PAGE_FAULTS
PAGE_SIZE = A_SIZE / PAGE_FAULTS = 256MB / 8K = 32KB

Assume the page size you calculated for the rest of this question.
(b) Consider Program 2. How many pages should the physical memory be able to store to ensure
that Program 2 experiences the same number of page faults as Program 1 does?

8K page faults is possible only if each page is brought into physical memory exactly
once, i.e., there shouldn't be any swapping.
Therefore, the physical memory must be large enough to retain all the pages.
PAGE_COUNT = A_SIZE / PAGE_SIZE = 256MB / 32KB = 8K

32K * 8K / 4 = 64M. The inner loop touches a page four times before moving on to a
different page.


What about if the physical memory can store 4 K pages?

32K * 8K / 4 = 64M. After touching a page four times, the inner loop touches all other
pages (256MB) before coming back to the same page.

(d) Now suppose the same matrix is stored in column-major order. And, the physical memory size is
32 MB.
How many page faults would Program 1 experience?

32K * 8K = 256M. After touching a page just once, the inner loop touches all other
pages (256MB) before coming back to the same page.

How many page faults would Program 2 experience?

8K. The inner loop touches all of a page and never comes back to the same page.

(e) Suppose still that the same matrix is stored in column-major order. However, this time the
physical memory size is 8 MB.
How many page faults would Program 1 experience?

32K * 8K = 256M. After touching a page just once, the inner loop touches all other
pages (256MB) before coming back to the same page.

How many page faults would Program 2 experience?

8K. The inner loop touches all of a page and never comes back to the same page.
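
The fault counts in this problem can be sanity-checked on a scaled-down matrix. The sketch below is illustrative only: it assumes LRU page replacement (the problem does not name a policy), and it shrinks the matrix to 64 x 16 bytes with 64-byte pages so that, as in the exam, a page holds four rows in row-major order and exactly one column in column-major order. The function name and parameters are hypothetical.

```python
from collections import OrderedDict


def page_faults(rows, cols, page_size, frames, layout, outer):
    """Count page faults for a rows x cols byte matrix under assumed LRU replacement.

    layout: "row" or "col" major storage.
    outer:  "i" for Program 1 (row by row), "j" for Program 2 (column by column).
    """
    resident = OrderedDict()   # page number -> None, least recently used first
    faults = 0
    if outer == "i":
        accesses = ((i, j) for i in range(rows) for j in range(cols))
    else:
        accesses = ((i, j) for j in range(cols) for i in range(rows))
    for i, j in accesses:
        address = i * cols + j if layout == "row" else j * rows + i
        page = address // page_size
        if page in resident:
            resident.move_to_end(page)          # mark as most recently used
        else:
            faults += 1
            if len(resident) == frames:
                resident.popitem(last=False)    # evict the least recently used page
            resident[page] = None
    return faults


# Scaled-down stand-ins for the exam's cases (16 pages in total).
print(page_faults(64, 16, 64, 1, "row", "i"))    # Program 1, row major
print(page_faults(64, 16, 64, 8, "row", "j"))    # Program 2, memory holds half the pages
print(page_faults(64, 16, 64, 2, "col", "i"))    # Program 1, column major
print(page_faults(64, 16, 64, 2, "col", "j"))    # Program 2, column major
```

In the scaled model the counts follow the same pattern as the answers above: one fault per page when traversal matches the storage order, rows * cols / 4 faults for Program 2 on the row-major matrix when memory cannot hold every page, and rows * cols faults for Program 1 on the column-major matrix.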


6. Future File [40 points]

For this question, assume a machine with the following characteristics:
Scalar, out-of-order dispatch with a 4-entry reorder buffer, future file, and full data forwarding.
A 4-stage pipeline consisting of fetch, decode, execute, and writeback.
Fetch and decode take 1 cycle each.
Writeback takes 2 cycles and updates the future file and the reorder buffer.
When the reorder buffer is filled up, fetch is halted.
A program that consists of three instructions: ADD, DIV, LD that have the following semantics:
ADD Rd <- Rs, Rt: Adds the contents of Rs and Rt and stores the result in Rd.
DIV Rd <- Rs, Rt: Divides the contents of Rs by the contents of Rt and stores the result in
Rd. Raises an exception if Rt is zero.
LD Rd <- Rs, Rt: Loads the contents of the base memory address Rs at the offset Rt and stores
the result in Rd. Assume that calculated memory addresses are guaranteed to be 4-byte-aligned
and the memory is bit-addressable.
An ADD instruction takes 1 cycle to execute. A DIV instruction takes 3 cycles to execute, and a
divide-by-zero exception, if present, is detected during the second cycle. A LD instruction takes
5 cycles to execute.
Here is the state of the future file in the machine at the end of the cycle when a divide-by-zero
exception is detected:
Future File
       V   Value
R1     0   21
R2     1   13
R3     1   0
R4     1   3
R5     1   25
R6     1   1
R7     1   17
R8     1   8
R9     1   9
R10    0   23
R11    1   7
R12    1   19
Using what you know about the reorder buffer and the future file, fill in the missing contents of the
reorder buffer in the machine. Assume reorder buffer entries are allocated from top to bottom in the
diagram.

Reorder Buffer
            V   Exception?   Opcode   Rd    Rs     Rt      Dest. Value   Dest. Value Ready
Oldest      1   0            LD       R1    R12    R2      ?             0
            1   0            ADD      R4    R3/?   R4/R7   3             1
            1   0            ADD      R7    R8     R9      17            1
Youngest    1   1            DIV      R10   ?      R3      ?             0

Note that R12 + R2 = 32, which is a valid 4-byte-aligned, bit addressable address.


7. Branch Prediction [35 points]

Assume the following piece of code that iterates through a large array populated with completely
(i.e., truly) random positive integers. The code has four branches (labeled B1, B2, B3, and B4).
When we say that a branch is taken, we mean that the code inside the curly brackets is executed.
for (int i=0; i<N; i++) {   /* B1 */
    val = array[i];         /* TAKEN PATH for B1 */
    if (val % 2 == 0) {     /* B2 */
        sum += val;         /* TAKEN PATH for B2 */
    }
    if (val % 3 == 0) {     /* B3 */
        sum += val;         /* TAKEN PATH for B3 */
    }
    if (val % 6 == 0) {     /* B4 */
        sum += val;         /* TAKEN PATH for B4 */
    }
}

(a) Of the four branches, list all those that exhibit local correlation, if any.
Only B1.
B2, B3, B4 are not locally correlated. Just like consecutive outcomes of a die, an
element being a multiple of N (N is 2, 3, and 6, respectively for B2, B3, and B4) has
no bearing on whether the next element is also a multiple of N .
(b) Which of the four branches are globally correlated, if any? Explain in less than 20 words.

B4 is correlated with B2 and B3. 6 is a common multiple of 2 and 3.

Now assume that the above piece of code is running on a processor that has a global branch predictor.
The global branch predictor has the following characteristics.
Global history register (GHR): 2 bits.
Pattern history table (PHT): 4 entries.
Pattern history table entry (PHTE): 11-bit signed saturating counter (possible values: -1024 to
1023)
Before the code is run, all PHTEs are initially set to 0.
As the code is being run, a PHTE is incremented (by one) whenever a branch that corresponds
to that PHTE is taken, whereas a PHTE is decremented (by one) whenever a branch that
corresponds to that PHTE is not taken.


(d) After 120 iterations of the loop, calculate the expected value for only the first PHTE and fill it
in the shaded box below. (Please write it as a base-10 value, rounded to the nearest ones digit.)
Hint. For a given iteration of the loop, first consider, what is the probability that both B1 and B2
are taken? Given that they are, what is the probability that B3 will increment or decrement the
PHTE? Then consider...
Show your work.

Without loss of generality, let's take a look at the numbers from 1 through 6. Given
that a number is a multiple of two (i.e., 2, 4, 6), the probability that the number is
also a multiple of three (i.e., 6) is equal to 1/3; let's call this value Q. Given that a
number is a multiple of two and three (i.e., 6), the probability that the number is also
a multiple of six (i.e., 6) is equal to 1; let's call this value R.
For a single iteration of the loop, the PHTE has four chances of being incremented/decremented, once at each branch.
B3's contribution to PHTE. The probability that both B1 and B2 are taken is denoted
as P(B1 T && B2 T), which is equal to P(B1 T)*P(B2 T) = 1*1/2 = 1/2. Given that
they are, the probability that B3 is taken is equal to Q = 1/3. Therefore, the PHTE
will be incremented with probability 1/2*1/3 = 1/6 and decremented with probability
1/2*(1-1/3) = 1/3. The net contribution of B3 to PHTE is 1/6-1/3 = -1/6.
B4's contribution to PHTE. P(B2 T && B3 T) = 1/6. P(B4 T | B2 T && B3 T) =
R = 1. B4's net contribution is 1/6*1 = 1/6.
B1's contribution to PHTE. P(B3 T && B4 T) = 1/6. P(B1 T | B3 T && B4 T) =
1. B1's net contribution is 1/6*1 = 1/6.
B2's contribution to PHTE. P(B4 T && B1 T) = 1/6*1 = 1/6. P(B2 T | B4 T &&
B1 T) = 1/2. B2's net contribution is 1/6*1/2 - 1/6*1/2 = 0.
For a single iteration, the net contribution to the PHTE, summed across all the four
branches, is equal to 1/6. Since there are 120 iterations, the expected PHTE value is
equal to 1/6*120=20.

GHR (2 bits): bit 1 holds the older outcome, bit 0 holds the younger outcome.

PHT
GHR value   Entry
TT          1st PHTE
TN          2nd PHTE
NT          3rd PHTE
NN          4th PHTE

8. Bonus (Question 7 Continued) [45 points]

(a) Assume the same question in Part (d) of Question 7. Your job in this question is to fill in the
rest of the PHTEs. In other words, after 120 iterations of the loop in Question 7, calculate the
expected value for the rest of the PHTEs (i.e., PHTEs 2, 3, 4) and fill in the PHT below. (Please
write them as base-10 values, rounded to the nearest ones digit.)
Show your work.

PHTE2: TN
P(B1 T && B2 N)=1/2. P(B3 T | B1 T && B2 N)=1/3. PHTE=1/2*(1/3-2/3)=-1/6.
P(B2 T && B3 N)=1/3. P(B4 T | B2 T && B3 N)=0. PHTE=1/3*-1=-1/3.
P(B3 T && B4 N)=1/6. P(B1 T | B3 T && B4 N)=1. PHTE=1/6*1=1/6.
P(B4 T && B1 N)=0. P(B2 T | B4 T && B1 N)=X. PHTE = 0.
Answer: 120*(-1/6-1/3+1/6+0)=-40

PHTE3: NT
P(B1 N && B2 T)=0. P(B3 T | B1 N && B2 T)=X. PHTE=0.
P(B2 N && B3 T)=1/6. P(B4 T | B2 N && B3 T)=0. PHTE=1/6*-1=-1/6.
P(B3 N && B4 T)=0. P(B1 T | B3 N && B4 T)=X. PHTE=0.
P(B4 N && B1 T)=5/6. P(B2 T | B4 N && B1 T)=1/2. PHTE=5/6*(1/2-1/2)=0.
Answer: 120*(0-1/6+0+0)=-20

PHTE4: NN
P(B1 N && B2 N)=0. P(B3 T | B1 N && B2 N)=X. PHTE DELTA=0.
P(B2 N && B3 N)=1/3. P(B4 T | B2 N && B3 N)=0. PHTE DELTA=1/3*-1=-1/3.
P(B3 N && B4 N)=2/3. P(B1 T | B3 N && B4 N)=1. PHTE DELTA=2/3*1=2/3.
P(B4 N && B1 N)=0. P(B2 T | B4 N && B1 N)=X. PHTE DELTA = 0.
Answer: 120*(0-1/3+2/3+0) = 40.
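
The four expected values can be cross-checked by enumerating residues modulo 6 instead of working through each conditional probability by hand. This is a sketch under stated assumptions: array values are uniform modulo 6 and independent across iterations, B1 is always taken, and start-up and loop-exit effects are ignored. The function name and dictionary layout are illustrative.

```python
from fractions import Fraction
from itertools import product


def expected_phte(iterations=120):
    """Expected PHTE values keyed by the 2-bit history (older outcome first)."""
    delta = {h: Fraction(0) for h in ("TT", "TN", "NT", "NN")}
    weight = Fraction(1, 36)   # each (previous, current) residue pair is equally likely
    for prev, cur in product(range(6), repeat=2):
        # Outcomes in program order: previous iteration's B3 and B4,
        # then this iteration's B1 (always taken), B2, B3, B4.
        outcomes = [prev % 3 == 0, prev % 6 == 0, True,
                    cur % 2 == 0, cur % 3 == 0, cur % 6 == 0]
        for k in range(2, 6):
            history = "".join("T" if outcomes[j] else "N" for j in (k - 2, k - 1))
            delta[history] += weight if outcomes[k] else -weight
    return {h: d * iterations for h, d in delta.items()}


print(expected_phte())
```

The enumeration reproduces 20 for the TT entry and -40, -20, and 40 for the TN, NT, and NN entries.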

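GHR (2 bits): bit 1 holds the older outcome, bit 0 holds the younger outcome.

PHT
GHR value   Entry      Expected value after 120 iterations
TT          1st PHTE   20
TN          2nd PHTE   -40
NT          3rd PHTE   -20
NN          4th PHTE   40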

(b) After the first 120 iterations, let us assume that the loop continues to execute for another 1 billion
iterations. What is the accuracy of this global branch predictor during the 1 billion iterations?
(Please write it as a percentage, rounded to the nearest single-digit.)
Show your work.

Given a history of TT, the number of correct predictions per iteration =
P(B1 T && B2 T) * P(B3 T | B1 T && B2 T) +
P(B2 T && B3 T) * P(B4 T | B2 T && B3 T) +
P(B3 T && B4 T) * P(B1 T | B3 T && B4 T) +
P(B4 T && B1 T) * P(B2 T | B4 T && B1 T)
= 7/12

Given a history of TN, the number of correct predictions per iteration =
P(B1 T && B2 N) * P(B3 N | B1 T && B2 N) +
P(B2 T && B3 N) * P(B4 N | B2 T && B3 N) +
P(B3 T && B4 N) * P(B1 N | B3 T && B4 N) +
P(B4 T && B1 N) * P(B2 N | B4 T && B1 N)
= 2/3

Given a history of NT, the number of correct predictions per iteration = 7/12
Given a history of NN, the number of correct predictions per iteration = 2/3
Correct predictions per iteration = 7/12 + 2/3 + 7/12 + 2/3 = 30/12
Branches per iteration = 4
Accuracy = (30/12)/4 = 30/48 = 5/8 = 62.5%, which rounds to 63%
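
The 5/8 figure can also be reproduced by enumeration. The sketch below assumes that after many iterations each PHTE predicts according to the sign of its drift (taken for TT and NN, not taken for TN and NT), that array values are uniform modulo 6 and independent across iterations, and that B1 is always taken. The names `PREDICT_TAKEN` and `steady_state_accuracy` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed steady-state predictions: TT and NN drift positive, TN and NT drift negative.
PREDICT_TAKEN = {"TT": True, "TN": False, "NT": False, "NN": True}


def steady_state_accuracy():
    """Fraction of branches predicted correctly per iteration in steady state."""
    correct = 0
    total = 0
    for prev, cur in product(range(6), repeat=2):
        # Previous iteration's B3 and B4, then this iteration's B1, B2, B3, B4.
        outcomes = [prev % 3 == 0, prev % 6 == 0, True,
                    cur % 2 == 0, cur % 3 == 0, cur % 6 == 0]
        for k in range(2, 6):
            history = "".join("T" if outcomes[j] else "N" for j in (k - 2, k - 1))
            correct += PREDICT_TAKEN[history] == outcomes[k]
            total += 1
    return Fraction(correct, total)


print(steady_state_accuracy())
```

The result matches the hand calculation: 2.5 correct predictions out of 4 branches per iteration.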

(c) Without prior knowledge of the contents of the array, what is the highest accuracy that any type
of branch predictor can achieve during the same 1 billion iterations as above? (Please write it as
a percentage, rounded to the nearest single-digit.)
Show your work.

Per-branch accuracy:
B1: 100%
B2: 50% (half the numbers are even)
B3: 33% (a third of the numbers are a multiple of three)
B4: 100% (global correlation)
Average accuracy: 70.8%, which rounds to 71%


Overview of SOC Architecture Concepts
69 pages
SystemVerilog Assertion Examples
50% (4)
SystemVerilog Assertion Examples
9 pages
Uvm
100% (7)
Uvm
46 pages
SystemVerilog Randomization Insights
No ratings yet
SystemVerilog Randomization Insights
16 pages
Digital Logic RTL & Verilog Interview Questions Preview
33% (6)
Digital Logic RTL & Verilog Interview Questions Preview
34 pages
UVM Ramakrishna
0% (2)
UVM Ramakrishna
54 pages
UVM Quick Reference Guide: Author: Putta Satish
50% (2)
UVM Quick Reference Guide: Author: Putta Satish
47 pages
RISC-V SystemC-TLM Simulator
No ratings yet
RISC-V SystemC-TLM Simulator
4 pages
UVM Slides
100% (2)
UVM Slides
158 pages
Verilog Text Book
100% (1)
Verilog Text Book
431 pages
SystemVerilog Constraints for Patterns
100% (1)
SystemVerilog Constraints for Patterns
33 pages
Interrupts
No ratings yet
Interrupts
59 pages
The RISC-V Instruction Set Manual: UCB/EECS-2014-54
No ratings yet
The RISC-V Instruction Set Manual: UCB/EECS-2014-54
100 pages
Functional Coverage Development Tips Dos and Donts VH v10 I2
No ratings yet
Functional Coverage Development Tips Dos and Donts VH v10 I2
6 pages
SV Interview Book
100% (1)
SV Interview Book
11 pages
VLSI Synthesis Overview and Techniques
100% (1)
VLSI Synthesis Overview and Techniques
41 pages
AHB Interview Questions
100% (4)
AHB Interview Questions
11 pages
UVM Interview Questions
100% (2)
UVM Interview Questions
8 pages
1st Quarter Performance Task in Introduction To The Philosophy of The Human Person
No ratings yet
1st Quarter Performance Task in Introduction To The Philosophy of The Human Person
4 pages
Frankie and Johnny: A Tragic Ballad
No ratings yet
Frankie and Johnny: A Tragic Ballad
6 pages
AbuRoashRudist PDF
No ratings yet
AbuRoashRudist PDF
15 pages
CNAS RiskandRivalry Kahl 0
No ratings yet
CNAS RiskandRivalry Kahl 0
56 pages
Class Lecture 1
No ratings yet
Class Lecture 1
29 pages
Reynolds Number Effects on Screen Drag
No ratings yet
Reynolds Number Effects on Screen Drag
24 pages
The Stone of Scone: Scotland's Symbol
No ratings yet
The Stone of Scone: Scotland's Symbol
6 pages
Acoustic & Lighting Ultimate Reviewer
No ratings yet
Acoustic & Lighting Ultimate Reviewer
9 pages
Ageism in The Workplace: The Role of Psychosocial Factors in Predicting Job Satisfaction, Commitment, and Engagement
No ratings yet
Ageism in The Workplace: The Role of Psychosocial Factors in Predicting Job Satisfaction, Commitment, and Engagement
22 pages
Psychological Theories of Crime Causation
No ratings yet
Psychological Theories of Crime Causation
7 pages
Chapter 07.assignment - Sol
No ratings yet
Chapter 07.assignment - Sol
6 pages
Spinal Reflex
No ratings yet
Spinal Reflex
10 pages
Physical Education - 8: Long Examination 2-Q1
No ratings yet
Physical Education - 8: Long Examination 2-Q1
4 pages
African Music and Arts Identification Guide
No ratings yet
African Music and Arts Identification Guide
2 pages
Spinal Cord Anatomy Guide
No ratings yet
Spinal Cord Anatomy Guide
39 pages
Business Finance Formulas PDF
0% (1)
Business Finance Formulas PDF
3 pages
ÔN TẬP SPEAKING B1
No ratings yet
ÔN TẬP SPEAKING B1
3 pages
The Wicker Husban1
No ratings yet
The Wicker Husban1
28 pages
Jose Rizal's Early Life and Education
No ratings yet
Jose Rizal's Early Life and Education
2 pages
Session 1 - Amazon Case HW
No ratings yet
Session 1 - Amazon Case HW
4 pages
Impact of Audio-Visual Resources on Student Performance in Nairobi Schools
No ratings yet
Impact of Audio-Visual Resources on Student Performance in Nairobi Schools
18 pages
Eva Syristova: Phenomenology & Psychosis
100% (1)
Eva Syristova: Phenomenology & Psychosis
8 pages
Programming Languages Build Prove and Compare Norman Ramsey PDF Download
No ratings yet
Programming Languages Build Prove and Compare Norman Ramsey PDF Download
155 pages
Bacterias
No ratings yet
Bacterias
1 page
African Literature and Language Philosophy
No ratings yet
African Literature and Language Philosophy
29 pages
Introduction To Myanmar A Captivating Journey 1
No ratings yet
Introduction To Myanmar A Captivating Journey 1
26 pages
1 1 550 Signalling Principles Designer v6
No ratings yet
1 1 550 Signalling Principles Designer v6
31 pages
Grade 10 English Romeo and Juliet Revision - Content
No ratings yet
Grade 10 English Romeo and Juliet Revision - Content
2 pages
Electricity and Electronics Ebook - January 2012 PDF
100% (1)
Electricity and Electronics Ebook - January 2012 PDF
113 pages
Neuropharmacology of Antiepileptic Drugs: P-Slide 1
No ratings yet
Neuropharmacology of Antiepileptic Drugs: P-Slide 1
64 pages

Be cognizant of time. Do not spend too much time on one question.

Write legibly. Show your final answer.

1. Potpourri [65 points]

Control flow dependences

(b) Exceptions vs. Interrupts [9 points]

What does it mean to handle an interrupt when it is convenient?

The processor can handle the interrupt at any time.

Why can many interrupts be handled when it is convenient?

(c) Branch Target Buffer [5 points]

Determining the branch target would cause bubbles in the pipeline.

(d) Return Address Prediction [4 points]

(e) Restartable vs. Precise Interrupts [6 points]

It would be hard to debug code running on such a machine. Restartable exceptions do

(f) Segmentation and Paging [4 points]

(g) Out-of-Order vs. Dataflow [8 points]

When all inputs to the instruction are ready.

When does the fetch of an instruction happen in an out-of-order execution processor?

When the program counter points to that instruction.

Determines whether a dataflow node is ready to fire.

What structure accomplishes a similar function in an out-of-order processor?

(h) Tomasulo's Algorithm [5 points]

ADD Reservation Station

MUL Reservation Station

What is wrong with this picture?

Cyclical dependences between instructions, which leads to deadlock. (Between tags B

(i) Minimizing Stalls [10 points]

Predication. Eliminating branches solves the problem.

What is the major disadvantage or limitation of each solution?

Wasted instructions, ISA changes

Profile may not be accurate

(j) Tomasulo's Algorithm Strikes Back [8 points]

2. Branch Prediction and Dual Path Execution [25 points]

Correct path instructions = N

Correct path instructions = N

3. Dataflow [20 points]

The following is a description of the nodes used in the dataflow graph:

subtracts right input from left input

Note that the input X is a non-negative integer.

4. Mystery Instruction [40 points]

Describe what this instruction does.

Bit-reverses the value in SR1 and puts it in DR.
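As a rough illustration of the answer above, the following Python sketch models a 16-bit bit reversal, matching the LC-3b register width. The function name `bit_reverse16` is hypothetical, and the loop is a software model of the effect, not the microcode sequence the exam asks about.

```python
def bit_reverse16(value: int) -> int:
    """Return the 16-bit value with its bit order reversed (bit 0 <-> bit 15, etc.)."""
    value &= 0xFFFF  # keep only the 16 bits a register holds
    result = 0
    for _ in range(16):
        # Shift the result left and append the current lowest bit of value.
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


if __name__ == "__main__":
    # SR1 holds 0x0001; DR receives the bit-reversed value.
    print(hex(bit_reverse16(0x0001)))
```

Reversing twice returns the original value, which is a quick sanity check for any implementation of this operation.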

C.4. THE CONTROL STRUCTURE

Figure C.6: Additional logic required to provide control signals

C.2. THE STATE MACHINE

BEN <- IR[11] & N + IR[10] & Z + IR[9] & P
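The branch-enable expression above reads as an OR of three AND terms: each of IR[11], IR[10], and IR[9] selects one condition code (N, Z, P). A minimal Python sketch of that logic follows; the function and variable names are hypothetical and chosen only for illustration.

```python
def branch_enable(ir: int, n: bool, z: bool, p: bool) -> bool:
    """Model BEN = IR[11]&N + IR[10]&Z + IR[9]&P for a 16-bit instruction word."""
    want_n = bool((ir >> 11) & 1)  # instruction tests the N condition code
    want_z = bool((ir >> 10) & 1)  # instruction tests the Z condition code
    want_p = bool((ir >> 9) & 1)   # instruction tests the P condition code
    return (want_n and n) or (want_z and z) or (want_p and p)


if __name__ == "__main__":
    # Instruction with only bit 10 set, evaluated while Z is set.
    print(branch_enable(0x0400, n=False, z=True, p=False))
```

With none of bits 11, 10, and 9 set, the expression is false regardless of the condition codes, so such a branch is never taken.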

DR <- SR1 XOR OP2*

MAR <- B+LSHF(off6,1)

MAR <- B+LSHF(off6,1)

B+off6 : Base + SEXT[offset6]
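To make the address arithmetic concrete, here is a small Python sketch of the two forms that appear in the state machine: `B+off6` (base plus the sign-extended 6-bit offset) and `B+LSHF(off6,1)` (the word-access form). It assumes that `LSHF(off6,1)` sign-extends the offset before shifting left by one and that addresses wrap at 16 bits. The helper names are hypothetical.

```python
def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def byte_address(base: int, off6: int) -> int:
    """B+off6: base plus sign-extended offset6, wrapped to 16 bits."""
    return (base + sext(off6, 6)) & 0xFFFF


def word_address(base: int, off6: int) -> int:
    """B+LSHF(off6,1): offset6 sign-extended, shifted left by 1, added to base."""
    return (base + (sext(off6, 6) << 1)) & 0xFFFF


if __name__ == "__main__":
    # offset6 = 0b111111 encodes -1.
    print(hex(byte_address(0x3000, 0b111111)), hex(word_address(0x3000, 0b111111)))
```

The shift by one in the word form scales the offset by two bytes, so a 6-bit offset can reach twice as far in word accesses as in byte accesses.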

Figure C.2: A state machine for the LC-3b

5. Virtual Memory [40 points]

for (i = 0; i < 32768; i++)

for (j = 0; j < 8192; j++)

A SIZE = 32K × 8K × 1B = 256MB
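The size calculation above can be checked with a few lines of Python. The row and column counts come from the loop bounds (32768 and 8192) and the 1-byte element size from the line above. The `pages_spanned` helper and its `page_bytes` parameter are hypothetical, since the page size is not stated in this excerpt.

```python
ROWS = 32 * 1024        # outer loop bound: 32768
COLS = 8 * 1024         # inner loop bound: 8192
ELEMENT_BYTES = 1       # each array element is 1 byte

a_size = ROWS * COLS * ELEMENT_BYTES  # total bytes occupied by the array


def pages_spanned(total_bytes: int, page_bytes: int) -> int:
    """Number of pages needed to hold total_bytes (rounded up).

    page_bytes is a hypothetical parameter; substitute the page size
    given in the problem statement.
    """
    return (total_bytes + page_bytes - 1) // page_bytes


if __name__ == "__main__":
    print(a_size // (1024 * 1024), "MB")
```

Once the page size is known, comparing `pages_spanned(a_size, page_bytes)` with the number of pages physical memory can hold indicates whether the whole array fits in memory at once.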

What about if the physical memory can store 4 K pages?

How many page faults would Program 2 experience?

How many page faults would Program 2 experience?

6. Future File [40 points]

Dest. Value Ready

7. Branch Prediction [35 points]

B4 is correlated with B2 and B3. 6 is a common multiple of 2 and 3.

8. Bonus (Question 7 Continued) [45 points]

of TT, the number of correct predictions per

of TN, the number of correct predictions per

Given a history of NT, the number of correct predictions per
