cs146 Fall2017 Midterm1xx

CS146 Computer Architecture

Fall 2017

Midterm Exam
This exam is worth a total of 100 points.
Note the point breakdown below and budget your time wisely.

To maximize partial credit, show your work and state any assumptions explicitly.

1 /36
2 /20
3 /10
4 /12
5 /12
6 /10
Total /100

Name:

1. Multiple Choice and Short Answer (36 Points)
1.1 (10 points) Amdahl’s Law

a. (5 points) Consider a workload where 50% of the execution time consists of multimedia
processing for which the MMX instruction set extensions might be helpful. According to
Amdahl’s law, what is the maximum speedup that can be achieved by implementing
them?

b. (5 points) Now, say that you work at Intel and the MMX designers claim that multimedia
code sequences will see a 3.5 times (3.5X) speedup by using the MMX extensions. What
is the fraction of the execution time that must be multimedia code in order to achieve an
overall speedup of 1.8X?
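
As a reminder of the relation both parts rely on, Amdahl's law can be written as S = 1 / ((1 - f) + f / k), where f is the fraction of execution time that benefits and k is the speedup of that fraction. The sketch below is only an illustration: the function names are hypothetical, and the numbers in the usage lines are made up and are not the values from this question.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def max_speedup(fraction):
    """Limit of amdahl_speedup as `factor` grows without bound."""
    return 1.0 / (1.0 - fraction)


def required_fraction(overall, factor):
    """Fraction that must be enhanced to reach `overall`, given `factor`.

    Obtained by solving S = 1 / ((1 - f) + f / k) for f.
    """
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / factor)


# Illustrative values only (not the exam's numbers).
print(amdahl_speedup(0.4, 2.0))      # about 1.25
print(max_speedup(0.4))              # about 1.67
print(required_fraction(1.25, 2.0))  # about 0.4
```

Note that `required_fraction` is just the inverse of `amdahl_speedup` for a fixed `factor`, so feeding one into the other should return the starting value.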
1.2 (9 points) CISC vs. RISC

When pipelined microprocessors were first becoming common (early to mid-1980s),
designers believed that RISC instruction sets were easier to pipeline because…?
(Please give 3 reasons.)

1.3 (5 points) In spite of this, high-performance pipelined implementations of CISC
instruction sets have been successfully built; various VAX and x86 implementations are
examples, including Intel’s P6 microarchitecture discussed in class. The most
effective/common implementation strategy used by these machines has been to:

a. Pipeline the CISC instructions, despite their wide variability in instruction execution
times, and use elaborate memory disambiguation techniques to avoid stalls due to
memory address calculations.

b. Pipeline the CISC instructions and handle the extra structural hazards using aggressive
scoreboarding techniques.

c. Use link-time techniques to break CISC instructions into small RISC-like operations that
are more easily pipelined.

d. Use run-time techniques to break CISC instructions into easily pipelined RISC-like
operations.

1.4 (6 points) Pipelining Limits

We have seen how pipelining improves instruction throughput, increasing effective
performance. Machines with deeper pipelines perform less work per pipestage but have
more “in-flight” instructions processing at the same time, allowing instructions to complete at
a higher rate. In class we discussed several reasons why the effectiveness of deeply pipelined
machines can be limited: too much pipelining can be detrimental. Describe two reasons
here.

1.5 (6 points) Limits of Loop Unrolling

We have seen how loop unrolling can significantly improve performance by removing loop
overhead and providing a better opportunity for the compiler to generate an efficient static
schedule. In class we discussed several reasons why loop unrolling cannot be performed
indefinitely – too much loop unrolling can limit performance. Describe two reasons here.
2. Pipelining (20 Points)
For this question, consider the code segment below. Assume that full bypassing/forwarding has
been implemented. Assume that the initial value of register R23 is much bigger than the initial
value of register R20. Assume that all memory references hit in the caches and TLBs. Assume
that both load-use hazards and branch hazards are hidden using delay slots. You may
not reorder instructions to fill such slots, but if a subsequent instruction is independent and is
properly positioned, you may assume that it fills the slot. Otherwise, fill slots with additional
no-ops as needed.

LOOP: lw    R10, X(R20)
      lw    R11, Y(R20)
      subu  R10, R10, R11
      sw    Z(R20), R10
      addiu R20, R20, 4
      subu  R5, R23, R20
      bnez  R5, LOOP
      nop               ; 1 delay slot

a. (5 points) On the grid page at the end of the exam, draw a pipeline diagram of 2 iterations
of its execution on a standard 5-stage MIPS pipeline. (You may want to turn it
horizontally). Assume that the branch is resolved using an ID control point. In the box
below, write the total number of cycles required to complete 2 iterations of the loop.

Cycles =

b. (15 points) On the second grid page that follows, draw a pipeline diagram of 2 iterations
of the loop on the pipeline below. Note that the loop has a single branch delay slot nop
included; you may need to add more. You cannot assume anything about the program’s
register usage before or after this code segment. Fill in the boxes below.

Pipeline Branch Delay =

Pipeline Load Delay =

Cycles =

Pipeline:

IF1 IF2 ID  RF  EX1 EX2 M1  M2  WB
    IF1 IF2 ID  RF  EX1 EX2 M1  M2  WB
        IF1 IF2 ID  RF  EX1 EX2 M1  M2  WB
            IF1 IF2 ID  RF  EX1 EX2 M1  M2  WB
                IF1 IF2 ID  RF  EX1 EX2 M1  M2  WB
                    IF1 IF2 ID  RF  EX1 EX2 M1  M2  WB

IF1: Begin Instruction Fetch
IF2: Complete Instruction Fetch
ID: Instruction Decode
RF: Register Fetch
EX1: ALU operation execution begins. Branch target calculation finishes. Memory address
calculation. Branch condition resolution calculation begins.
EX2: Branch condition resolution finishes. Finish ALU ops. (But branch and memory
address calculations finish in a single cycle).
M1: First part of memory access, TLB access.
M2: Second part of memory access, Data sent to memory for stores OR returned from
memory for loads.
WB: Write back results to register file
3. Multimedia ISAs and Conditional MOVs (10 Points)
Absolute value is expressed as A = abs(B). In high-level code:

if (B < 0) { A = -B; } else { A = B; }

In MIPS-style code this would look something like the following (R2 = B, R1 = A):

      BLTZ R2, THEN     ; Check if R2 < 0, jump to THEN
      ADDI R1, R2, 0    ; R1 = R2 + 0; else clause
      JUMP END          ; Skip over then clause
THEN: SUB  R1, R0, R2   ; R1 = 0 - R2; then clause (R0 is the hardwired zero register)
END:

In class, we have seen that conditional branches are detrimental to performance and we have
seen two methods to remove conditional branches.

a. (5 points) Using the saturating arithmetic features of a multimedia ISA, code the absolute
value function without using any branch or jump instructions. You can perform the
computation with the following six instructions (you may not need all of them). You can
perform the absolute value operation on subwords (i.e., don’t worry about shifting or
extracting).

HADD, HADD,us, HADD,ss, HSUB, HSUB,us, HSUB,ss

Here HADD,us uses unsigned saturating arithmetic and HADD,ss uses signed saturating
arithmetic.
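
For reference, the difference between the three arithmetic modes can be sketched as follows. This is a simplified model under stated assumptions, not the definition of any particular ISA: it assumes 16-bit halfword subwords, treats the plain forms as wraparound (modular) arithmetic, and models saturation as clamping the exact result to the representable range. The Python function names are hypothetical stand-ins for the mnemonics.

```python
S_MIN, S_MAX = -32768, 32767  # assumed signed 16-bit range
U_MIN, U_MAX = 0, 65535       # assumed unsigned 16-bit range


def clamp(value, lo, hi):
    """Saturate: pin out-of-range results to the nearest bound."""
    return max(lo, min(hi, value))


def wrap(value):
    """Modular arithmetic: keep the low 16 bits, read back as signed."""
    return ((value + 32768) % 65536) - 32768


def hadd(a, b):     # plain add, wraps on overflow
    return wrap(a + b)


def hsub(a, b):     # plain subtract, wraps on overflow
    return wrap(a - b)


def hadd_ss(a, b):  # signed saturating add
    return clamp(a + b, S_MIN, S_MAX)


def hsub_ss(a, b):  # signed saturating subtract
    return clamp(a - b, S_MIN, S_MAX)


def hadd_us(a, b):  # unsigned saturating add
    return clamp(a + b, U_MIN, U_MAX)


def hsub_us(a, b):  # unsigned saturating subtract
    return clamp(a - b, U_MIN, U_MAX)


print(hadd(32767, 1))         # wraps around to -32768
print(hadd_ss(30000, 10000))  # pinned at 32767
print(hsub_us(5, 9))          # pinned at 0
```

The property worth noticing is that a saturating operation behaves like a built-in min or max against a range bound, which is what makes branch-free selection possible.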
b. (5 points) Using Conditional Move operations, code the absolute value function without
using any branch or jump instructions. In this problem, just perform the absolute value
operation on a 64-bit integer value. You may use CMOVs of the form:

CMOVGTZ R1, R2, R3 // if (R1 > 0) R2 = R3
CMOVLTZ R1, R2, R3 // if (R1 < 0) R2 = R3
CMOVEQZ R1, R2, R3 // if (R1 == 0) R2 = R3
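
The semantics of the three conditional moves can be modeled in a few lines. This is a minimal sketch assuming a register file represented as a Python dict; the helper names and the register contents are hypothetical. The usage example deliberately shows a different operation (clamping a negative value to zero) to illustrate how a conditional move replaces a branch.

```python
def cmovgtz(regs, r1, r2, r3):
    """if (R1 > 0) R2 = R3"""
    if regs[r1] > 0:
        regs[r2] = regs[r3]


def cmovltz(regs, r1, r2, r3):
    """if (R1 < 0) R2 = R3"""
    if regs[r1] < 0:
        regs[r2] = regs[r3]


def cmoveqz(regs, r1, r2, r3):
    """if (R1 == 0) R2 = R3"""
    if regs[r1] == 0:
        regs[r2] = regs[r3]


def clamp_to_zero(x):
    """Branch-free 'if (x < 0) x = 0' expressed as one conditional move."""
    regs = {"R1": x, "R2": x, "R3": 0}  # hypothetical register contents
    cmovltz(regs, "R1", "R2", "R3")     # R2 = 0 only when R1 < 0
    return regs["R2"]


print(clamp_to_zero(-7))  # 0
print(clamp_to_zero(5))   # 5
```

In hardware the move is always fetched and executed; only the register write is conditional, so there is no control hazard to predict.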
4. Branch Prediction (12 Points)
The following series of branch outcomes occurs for a single branch in a program. (T means the
branch is taken, N means the branch is not taken).

TTTNTNTTTNTNT

a. (4 points) Assume that we are trying to predict this sequence with a BHT using a 1-bit
counter. The counters of the BHT are initialized to the N state. Which of the branches
would be mispredicted? Use the following table. You may assume that this is the only
branch in the program.

Predictor State Before Prediction    Branch Outcome    Mis-Prediction?
N                                    T

b. (8 points) Draw the state-transition diagram for a BHT scheme using 2-bit saturating
counters. Then repeat the exercise from part (a) with a 2-bit saturating counter initialized to
Weakly-Not-Taken (W-N).

Predictor State Before Prediction    Branch Outcome    Mis-Prediction?
W-N                                  T
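
The table-filling procedure for both predictors can be sketched as a small simulation. This is an illustrative model, not a prescribed solution method: the function names are hypothetical, the 2-bit states are encoded as 0 to 3 (strongly not-taken, weakly not-taken, weakly taken, strongly taken), and the short input sequence in the usage lines is made up rather than taken from this question.

```python
def simulate_1bit(outcomes, initial="N"):
    """1-bit predictor: predict whatever the branch did last time."""
    state = initial
    mispredicted = []
    for outcome in outcomes:
        mispredicted.append(state != outcome)
        state = outcome  # remember the most recent outcome
    return mispredicted


def simulate_2bit(outcomes, initial=1):
    """2-bit saturating counter; states 0-1 predict N, states 2-3 predict T.

    The default initial state 1 corresponds to Weakly-Not-Taken.
    """
    state = initial
    mispredicted = []
    for outcome in outcomes:
        predicted = "T" if state >= 2 else "N"
        mispredicted.append(predicted != outcome)
        if outcome == "T":
            state = min(state + 1, 3)  # saturate at strongly taken
        else:
            state = max(state - 1, 0)  # saturate at strongly not-taken
    return mispredicted


# Made-up sequence for illustration (not the exam's sequence).
print(simulate_1bit("NTNT"))  # [False, True, True, True]
print(simulate_2bit("NTNT"))  # [False, True, False, True]
```

Each list entry corresponds to one row of the table: `True` marks a misprediction for that branch outcome.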

5. Tomasulo’s Algorithm (12 Points)
The drawing below depicts the basic structure of Tomasulo’s algorithm as described in the
textbook and during class. This is the version without a reorder buffer; all renaming occurs
in the reservation stations.

a. Where indicated by the letters A, B, C, and D, determine the width of the fields in the
reservation stations, the width of the buses, etc. Please write your answers in the box below.

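[Figure: block diagram of Tomasulo’s algorithm. Components shown: Floating Point Operation
Queue, FP Registers, Load Buffers (6), FP Adder Reservation Stations (3), FP Mul Reservation
Stations (3), each with Op / Tag / Tag fields, FP Adder, FP Multiplier, and the Common Data
Bus (D). The letters A, B, and C mark further points in the diagram.]

A =                C =
B =                D =
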
b. Now consider the three reservation stations associated with the FP adder. How many
(and what size) comparators are needed in order to determine when relevant values are
being broadcast on the Common Data Bus? Why?

6. SuperScalar Microarchitectures (10 Points)

                            Register  Execute  Write
Fetch  Transit  Map  Queue  Register  Address  Cache1  Cache2  Write
                            Register  FP1      FP2     FP3     FP4    Write

The Alpha 21264 processor is designed to issue 4 integer (2 of which may be load/store) and 2
FP instructions per cycle, and its pipeline diagram is shown above. (The shaded region indicates
where instruction queueing/reordering may occur.) All instructions execute in the first four
stages, and then the pipeline is different for integer ops (top row), memory ops (second row), or
floating-point ops (third row).

a. (5 points) Since the Alpha instruction set is similar to MIPS (RISC, load/store, etc), how
many register read and write ports would one expect to need, to avoid structural hazards,
in a straightforward implementation of the integer register file?

b. (5 points) Since that number of ports is too difficult to implement, the chip designers
used a trick instead. They divided the physical registers of the machine into two clusters.
A group of functional units is associated with each cluster. Values written to registers
in one cluster will eventually propagate to the other cluster, but there will be extra delay.
In words, explain how this implementation choice affects both the machine’s instruction
dispatch/issue unit and compiler strategies for this chip.
