CompEng 361 - Homework 3 Solutions

Uploaded by

Aaron Sun

Northwestern University

CompEng 361: Computer Architecture

Fall 2023
Homework 3

1. In this exercise, we examine how pipelining affects the clock cycle time of the processor.
Problems in this exercise assume that individual stages of the datapath have the following
latencies:

Stage     IF      ID      EX      MEM     WB
Latency   300 ps  450 ps  200 ps  400 ps  200 ps

Also, assume that instructions executed by the processor are broken down as follows:

Instruction  R-Type  beq  lw   sw
Fraction     60%     15%  15%  10%

a. What is the clock cycle time in a pipelined and non-pipelined processor?

Non-Pipelined: 300 + 450 + 200 + 400 + 200 = 1550 ps

Pipelined: max(300, 450, 200, 400, 200) = 450 ps

b. What is the total latency of an lw instruction in a pipelined and non-pipelined processor?

Non-Pipelined: 300 + 450 + 200 + 400 + 200 = 1550 ps

Pipelined: 450 * 5 = 2250 ps
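
The arithmetic in parts a and b can be checked with a short Python sketch. The dictionary and function names are illustrative and not part of the assignment; the sketch assumes an instruction occupies each pipeline stage for one full clock cycle.

```python
# Stage latencies in picoseconds, taken from the table in problem 1.
STAGE_LATENCY_PS = {"IF": 300, "ID": 450, "EX": 200, "MEM": 400, "WB": 200}


def non_pipelined_cycle_ps(latencies):
    """Single-cycle clock: one cycle must cover every stage in sequence."""
    return sum(latencies.values())


def pipelined_cycle_ps(latencies):
    """Pipelined clock: the slowest stage sets the cycle time."""
    return max(latencies.values())


def pipelined_latency_ps(latencies):
    """One instruction spends a full clock cycle in each pipeline stage."""
    return pipelined_cycle_ps(latencies) * len(latencies)


print(non_pipelined_cycle_ps(STAGE_LATENCY_PS))  # 1550
print(pipelined_cycle_ps(STAGE_LATENCY_PS))      # 450
print(pipelined_latency_ps(STAGE_LATENCY_PS))    # 2250
```

Pipelining raises the latency of a single lw (2250 ps versus 1550 ps) because every stage is padded to the slowest one; the benefit is in throughput, not latency.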

c. If we can split one stage of the pipelined datapath into two new stages, each with half the
latency of the original stage, which stage would you split and what is the new clock cycle
time of the processor?

Split the ID stage because it is the longest.

New Cycle Time = max(300, 225, 225, 200, 400, 200) = 400 ps
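
A minimal sketch of the stage-splitting step, assuming the split produces two equal halves as the question states. The helper name `split_longest_stage` is made up for illustration.

```python
# Stage latencies in picoseconds, taken from the table in problem 1.
STAGE_LATENCY_PS = {"IF": 300, "ID": 450, "EX": 200, "MEM": 400, "WB": 200}


def split_longest_stage(latencies_ps):
    """Split the slowest stage in half; return (stage name, new latencies, new cycle)."""
    longest = max(latencies_ps, key=latencies_ps.get)
    new_stages = []
    for name, latency in latencies_ps.items():
        if name == longest:
            # The split stage becomes two stages of half the latency each.
            new_stages += [latency / 2, latency / 2]
        else:
            new_stages.append(latency)
    return longest, new_stages, max(new_stages)


stage, stages, cycle_ps = split_longest_stage(STAGE_LATENCY_PS)
print(stage, cycle_ps)  # ID 400
```

After the split, MEM (400 ps) becomes the new bottleneck, which is why the cycle time drops only to 400 ps rather than to 225 ps.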

d. Assuming there are no stalls or hazards, what is the utilization of the data memory?

Pct of Stores + Loads = 10% + 15% = 25%

e. Assuming there are no stalls or hazards, what is the utilization of the write-register port of
the “Registers” unit?

Pct of R-Type + Loads = 60% + 15% = 75%
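
Parts d and e both reduce to summing the instruction-mix fractions of the instructions that use a given resource. The sketch below assumes, as the answers above do, that only lw and sw access data memory and only R-type and lw write a register; the names are illustrative.

```python
# Instruction mix in whole percent, from the table in problem 1.
MIX_PCT = {"R-Type": 60, "beq": 15, "lw": 15, "sw": 10}

# Assumed resource usage per instruction class.
USES_DATA_MEMORY = {"lw", "sw"}
WRITES_REGISTER = {"R-Type", "lw"}


def utilization_pct(mix_pct, users):
    """Percent of cycles a resource is busy when one instruction completes per cycle."""
    return sum(pct for op, pct in mix_pct.items() if op in users)


print(utilization_pct(MIX_PCT, USES_DATA_MEMORY))  # 25
print(utilization_pct(MIX_PCT, WRITES_REGISTER))   # 75
```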

f. Instead of a single-cycle organization, we can use a multi-cycle organization where each
instruction takes multiple cycles but one instruction finishes before another is fetched. In
this organization, an instruction only goes through stages it actually needs (e.g., sw only
takes 4 cycles because it does not need the WB stage). Compare clock cycle times and
execution times with single-cycle, multi-cycle, and pipelined organization.

Multi-cycle CPI (R-type 4, beq 3, lw 5, sw 4 cycles): 0.6 * 4 + 0.15 * 3 + 0.15 * 5 + 0.1 * 4 = 4

                              Single-cycle  Multi-cycle  Pipelined
Cycle time                    1550 ps       450 ps       450 ps
CPI                           1             4            1
Execution time / instruction  1550 ps       1800 ps      450 ps
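
The comparison can be reproduced with a small Python sketch. The per-instruction cycle counts are the ones implied by the CPI formula above (R-type 4, beq 3, lw 5, sw 4); the variable names are illustrative.

```python
# Instruction mix in whole percent and cycles each class needs in the
# multi-cycle organization (values implied by the CPI formula above).
MIX_PCT = {"R-Type": 60, "beq": 15, "lw": 15, "sw": 10}
CYCLES_NEEDED = {"R-Type": 4, "beq": 3, "lw": 5, "sw": 4}


def multi_cycle_cpi(mix_pct, cycles_needed):
    """Average cycles per instruction, weighted by the instruction mix."""
    return sum(mix_pct[op] * cycles_needed[op] for op in mix_pct) / 100


# (cycle time in ps, CPI) for each organization.
ORGANIZATIONS = {
    "single-cycle": (1550, 1),
    "multi-cycle": (450, multi_cycle_cpi(MIX_PCT, CYCLES_NEEDED)),
    "pipelined": (450, 1),
}

# Average execution time per instruction = cycle time * CPI.
TIME_PER_INSTRUCTION_PS = {
    name: cycle_ps * cpi for name, (cycle_ps, cpi) in ORGANIZATIONS.items()
}

for name, time_ps in TIME_PER_INSTRUCTION_PS.items():
    print(f"{name}: {time_ps:.0f} ps per instruction")
```

With this mix the multi-cycle design is slower per instruction than the single-cycle one, because its 450 ps clock is set by the slowest stage while most instructions still need four or five cycles.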

2. In this exercise, we examine how data dependencies affect execution in the basic 5-stage
pipeline described in Section 4.5. Problems in this exercise refer to the following sequence of
instructions:

or r1, r2, r3 // (i)
or r2, r1, r4 // (ii)
or r1, r1, r2 // (iii)

Also, assume the following cycle times for each of the options related to forwarding:

Without Forwarding  With Full Forwarding  With ALU-ALU Forwarding Only
300 ps              350 ps                340 ps

a. Indicate dependences and their type.

1. RAW dependency for r1 between i and ii
2. RAW dependency for r1 between i and iii
3. RAW dependency for r2 between ii and iii
4. WAR dependency for r2 between i and ii
5. WAR dependency for r1 between ii and iii
6. WAW dependency for r1 between i and iii
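
The six dependences listed above can be found mechanically by comparing each pair of instructions. The sketch below is a simple pairwise check over a hypothetical `(label, destination, sources)` encoding; it does not filter out dependences that an intervening write would cancel, which does not matter for this three-instruction sequence.

```python
from itertools import combinations

# (label, destination register, source registers) for the sequence in problem 2.
INSTRUCTIONS = [
    ("i",   "r1", ("r2", "r3")),
    ("ii",  "r2", ("r1", "r4")),
    ("iii", "r1", ("r1", "r2")),
]


def find_dependences(instructions):
    """Return (type, register, earlier label, later label) for every pair in program order."""
    deps = []
    for earlier, later in combinations(instructions, 2):
        e_label, e_dst, e_srcs = earlier
        l_label, l_dst, l_srcs = later
        if e_dst in l_srcs:   # later instruction reads what the earlier one writes
            deps.append(("RAW", e_dst, e_label, l_label))
        if l_dst in e_srcs:   # later instruction overwrites what the earlier one reads
            deps.append(("WAR", l_dst, e_label, l_label))
        if e_dst == l_dst:    # both instructions write the same register
            deps.append(("WAW", e_dst, e_label, l_label))
    return deps


for dep in find_dependences(INSTRUCTIONS):
    print(dep)
```

Only the RAW dependences become hazards in the basic in-order 5-stage pipeline; WAR and WAW dependences cannot cause incorrect results there because registers are read and written in program order.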
b. Assume there is no forwarding in this pipelined processor. Indicate hazards and add nop
instructions to eliminate them.

or r1, r2, r3
nop
nop
// Data hazard on r1
or r2, r1, r4
nop
nop
// Data hazard on r1, r2
or r1, r1, r2
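
The nop placement above can be derived with a small scheduler. This is a sketch under one assumption: the register file is written in the first half of a cycle and read in the second half, so a consumer must issue at least 3 slots after its producer (two nops in between). The function name and instruction encoding are hypothetical.

```python
# (label, destination register, source registers) for the sequence in problem 2.
INSTRUCTIONS = [
    ("i",   "r1", ("r2", "r3")),
    ("ii",  "r2", ("r1", "r4")),
    ("iii", "r1", ("r1", "r2")),
]

# Assumed minimum issue distance between producer and consumer without forwarding.
MIN_DISTANCE_NO_FORWARDING = 3


def insert_nops(instructions, min_distance):
    """Pad the sequence with nops so every RAW pair is at least min_distance slots apart."""
    scheduled = []          # issue order: instruction labels and "nop"
    last_writer_slot = {}   # register -> issue slot of its most recent writer
    for label, dst, srcs in instructions:
        slot = len(scheduled)
        for src in srcs:
            if src in last_writer_slot:
                slot = max(slot, last_writer_slot[src] + min_distance)
        scheduled += ["nop"] * (slot - len(scheduled))
        scheduled.append(label)
        last_writer_slot[dst] = slot
    return scheduled


print(insert_nops(INSTRUCTIONS, MIN_DISTANCE_NO_FORWARDING))
# ['i', 'nop', 'nop', 'ii', 'nop', 'nop', 'iii']
```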
c. Assume there is full forwarding. Indicate hazards and add nop instructions to eliminate
them.

No nops are needed: full forwarding resolves all three RAW hazards without stalling.
d. What is the total execution time of this instruction sequence without forwarding and with
full forwarding? What is the speedup achieved by adding full forwarding to a pipeline that
had no forwarding?

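With full forwarding: 3 instructions + 4 cycles to drain the pipeline = 7 cycles; time: 7 * 350 ps = 2450 ps

Without forwarding: 7 instructions (including 4 nops) + 4 cycles = 11 cycles; time: 11 * 300 ps = 3300 ps
Speedup: 3300 / 2450 = ~1.35 times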
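
The cycle counts follow from the usual pipeline fill formula: cycles = instructions + nops + (stages - 1). A short sketch of part d, with illustrative names:

```python
PIPELINE_STAGES = 5
N_INSTRUCTIONS = 3  # the three or instructions in problem 2


def execution_time_ps(n_nops, cycle_ps):
    """Total time: one issue slot per instruction or nop, plus pipeline drain cycles."""
    cycles = N_INSTRUCTIONS + n_nops + (PIPELINE_STAGES - 1)
    return cycles * cycle_ps


no_forwarding_ps = execution_time_ps(n_nops=4, cycle_ps=300)
full_forwarding_ps = execution_time_ps(n_nops=0, cycle_ps=350)
speedup = no_forwarding_ps / full_forwarding_ps

print(no_forwarding_ps, full_forwarding_ps, round(speedup, 2))  # 3300 2450 1.35
```

Forwarding lengthens the clock from 300 ps to 350 ps, so the speedup is smaller than the 11/7 ratio of cycle counts alone would suggest.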

e. Add nop instructions to this code to eliminate hazards if there is ALU-ALU forwarding only
(no forwarding from the MEM to the EX stage).

or r1, r2, r3
or r2, r1, r4
nop
nop
or r1, r1, r2
f. What is the total execution time of this instruction sequence with only ALU-ALU
forwarding?

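With ALU-ALU forwarding only: 5 instructions (including 2 nops) + 4 cycles = 9 cycles; time: 9 * 340 ps = 3060 ps

Without forwarding: 7 instructions (including 4 nops) + 4 cycles = 11 cycles; time: 11 * 300 ps = 3300 ps
Speedup: 3300 / 3060 = ~1.08 times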
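
The nop placement in part e is less obvious than in part b, so here is a sketch that derives it. It assumes that with ALU-ALU forwarding only, a consumer can use a result either from the instruction immediately before it (distance 1, EX-to-EX forwarding) or from the register file (distance 3 or more); a distance of 2 would need MEM-to-EX forwarding, which is unavailable. All names are illustrative.

```python
# (label, destination register, source registers) for the sequence in problem 2.
INSTRUCTIONS = [
    ("i",   "r1", ("r2", "r3")),
    ("ii",  "r2", ("r1", "r4")),
    ("iii", "r1", ("r1", "r2")),
]
PIPELINE_STAGES = 5
CYCLE_PS = 340  # cycle time with ALU-ALU forwarding only


def distance_ok(distance):
    """Distance 1 is covered by EX-to-EX forwarding, 3 or more by the register file."""
    return distance == 1 or distance >= 3


def schedule(instructions):
    """Delay each instruction until every source register is at a legal distance."""
    scheduled = []
    last_writer_slot = {}
    for label, dst, srcs in instructions:
        slot = len(scheduled)
        while not all(
            distance_ok(slot - last_writer_slot[src])
            for src in srcs
            if src in last_writer_slot
        ):
            slot += 1
        scheduled += ["nop"] * (slot - len(scheduled))
        scheduled.append(label)
        last_writer_slot[dst] = slot
    return scheduled


order = schedule(INSTRUCTIONS)
total_ps = (len(order) + PIPELINE_STAGES - 1) * CYCLE_PS
print(order, total_ps)  # ['i', 'ii', 'nop', 'nop', 'iii'] 3060
```

Instruction iii is the one that stalls: its r1 operand comes from i at distance 2, and after one nop its r2 operand from ii would be at distance 2 instead, so two nops are needed.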

3. The importance of having a good branch predictor depends on how often conditional branches
are executed. Together with branch predictor accuracy, this will determine how much time is
spent stalling due to mispredicted branches. In this exercise, assume that the breakdown of
dynamic instructions into various instruction categories is as follows:

R-Type  beq  jmp  lw   sw
50%     20%  5%   20%  5%

Also, assume the following branch predictor accuracies:

Always-Taken  Always-Not-Taken  2-Bit
40%           60%               85%

a. Stall cycles due to mispredicted branches increase the CPI. What is the extra CPI due to
mispredicted branches with the always-taken predictor? Assume that branch outcomes
are determined in the EX stage, that there are no data hazards, and that no delay slots are
used.
b. Repeat 3a. for the “always-not-taken” predictor.
c. Repeat 3a. for the 2-bit predictor.
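
Extra CPI = miss rate * fraction of instructions that are branches * stall cycles per misprediction

Predictor             Miss Rate  Branch Fraction  Stall Cycles  Extra CPI
Always-Taken (a)      0.6        0.2              3             0.36
Always-Not-Taken (b)  0.4        0.2              3             0.24
2-Bit (c)             0.15       0.2              3             0.09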
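
The table can be regenerated with a few lines of Python. The sketch takes the 3-cycle misprediction penalty from the table above as given; the constant and function names are illustrative.

```python
BRANCH_FRACTION = 0.20  # beq share of dynamic instructions
STALL_CYCLES = 3        # penalty per mispredicted branch, as used in the table

# Predictor accuracies from the problem statement.
ACCURACY = {"always-taken": 0.40, "always-not-taken": 0.60, "2-bit": 0.85}


def extra_cpi(accuracy, branch_fraction=BRANCH_FRACTION, stall_cycles=STALL_CYCLES):
    """Average stall cycles added per instruction by mispredicted branches."""
    return (1 - accuracy) * branch_fraction * stall_cycles


for name, accuracy in ACCURACY.items():
    print(f"{name}: {extra_cpi(accuracy):.2f}")
```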

d. With the 2-bit predictor, what speedup would be achieved if we could convert half of the
branch instructions in a way that replaces a branch instruction with an ALU instruction?
Assume that correctly and incorrectly predicted instructions have the same chance of
being replaced.

Original CPI with the 2-bit predictor: 1 + 0.09 = 1.09

New extra CPI (branches are now 10% of instructions): 0.15 * 0.1 * 3 = 0.045

Speedup: 1.09 / 1.045 = ~1.043

e. With the 2-bit predictor, what speedup would be achieved if we could convert half of the
branch instructions in a way that replaced each branch instruction with two ALU
instructions? Assume that correctly and incorrectly predicted instructions have the same
chance of being replaced.

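New extra CPI, per original instruction: 0.15 * 0.1 * 3 + 0.1 * 1 = 0.145
(the 0.1 * 1 term is the one additional ALU instruction for each replaced branch)

Speedup: 1.09 / 1.145 = ~0.952, i.e., a slowdown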
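
Parts d and e can be compared in one sketch by measuring cycles per original instruction, so that the extra ALU instructions in part e show up as added work. It assumes the clock cycle time is unchanged and a base CPI of 1; the names are illustrative.

```python
ACCURACY_2BIT = 0.85
STALL_CYCLES = 3


def cycles_per_original_instruction(branch_fraction, extra_alu_fraction=0.0):
    """Base CPI of 1, plus added ALU instructions, plus misprediction stalls."""
    stall = (1 - ACCURACY_2BIT) * branch_fraction * STALL_CYCLES
    return 1.0 + extra_alu_fraction + stall


baseline = cycles_per_original_instruction(0.20)
# Part d: half the branches become one ALU instruction each (no extra instructions).
part_d = cycles_per_original_instruction(0.10)
# Part e: half the branches become two ALU instructions each (one extra per branch).
part_e = cycles_per_original_instruction(0.10, extra_alu_fraction=0.10)

print(round(baseline / part_d, 3))  # 1.043
print(round(baseline / part_e, 3))  # 0.952
```

A speedup below 1 in part e means the extra instructions cost more than the mispredictions they avoid at 85% accuracy.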

f. Some branch instructions are much more predictable than others. If we know that 80% of
all executed branch instructions are easy-to predict loop-back branches that are always
predicted correctly, what is the accuracy of the 2-bit predictor on the remaining 20% of the
branch instructions?

0.8 * 1.0 + 0.2 * x = 0.85 => x = 0.25

Accuracy on the remaining branches: 25%
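
The same weighted-average equation, solved for the unknown accuracy in Python. The function name and parameters are illustrative.

```python
def remaining_accuracy(overall, easy_fraction, easy_accuracy=1.0):
    """Solve overall = easy_fraction * easy_accuracy + (1 - easy_fraction) * x for x."""
    return (overall - easy_fraction * easy_accuracy) / (1 - easy_fraction)


hard_branch_accuracy = remaining_accuracy(overall=0.85, easy_fraction=0.80)
print(f"{hard_branch_accuracy:.0%}")  # 25%
```

An accuracy of 25% on the hard branches is worse than a coin flip, which shows how much the 85% headline figure depends on the easy loop-back branches.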

4. This exercise examines the accuracy of various branch predictors for the following repeating
pattern (e.g., in a loop) of branch outcomes:

NT, T, NT, NT, T

a. What is the accuracy of always-taken and always-not-taken predictors for this sequence of
branch outcomes?

Always T: 0.4
Always NT: 0.6
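
For a static predictor the accuracy is just the fraction of outcomes that match the fixed prediction. A minimal sketch with illustrative names:

```python
# One period of the repeating branch pattern from problem 4.
PATTERN = ["NT", "T", "NT", "NT", "T"]


def static_accuracy(pattern, prediction):
    """Fraction of branch outcomes equal to a fixed prediction."""
    return sum(outcome == prediction for outcome in pattern) / len(pattern)


print(static_accuracy(PATTERN, "T"))   # 0.4
print(static_accuracy(PATTERN, "NT"))  # 0.6
```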

b. What is the accuracy of the two-bit predictor for the first 4 branches in this pattern,
assuming that the predictor starts off in the bottom left state from the lecture slides
(strongly predict not taken)?

Branch       NT         T        NT         NT
Prediction   NT         NT       NT         NT
Outcome      Correct    Wrong    Correct    Correct
State After  Strong NT  Weak NT  Strong NT  Strong NT

Accuracy is 75%

c. What is the accuracy of the two-bit predictor if this pattern is repeated forever?

Branch       NT         T        NT         NT         T
Prediction   NT         NT       NT         NT         NT
Outcome      Correct    Wrong    Correct    Correct    Wrong
State After  Strong NT  Weak NT  Strong NT  Strong NT  Weak NT

Accuracy is 60%
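
Both tables can be reproduced by simulating the predictor. The sketch below assumes a standard 2-bit saturating counter (0 = strong NT, 1 = weak NT, 2 = weak T, 3 = strong T); the state machine on the lecture slides may differ in transitions that this pattern never exercises.

```python
# One period of the pattern: 0 = not taken, 1 = taken.
PATTERN = [0, 1, 0, 0, 1]


def two_bit_accuracy(outcomes, state=0):
    """Simulate a 2-bit saturating counter that starts in the given state."""
    correct = 0
    for taken in outcomes:
        prediction = 1 if state >= 2 else 0
        correct += prediction == taken
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct / len(outcomes)


print(two_bit_accuracy(PATTERN[:4]))     # 0.75
print(two_bit_accuracy(PATTERN * 1000))  # 0.6
```

The counter never leaves the two not-taken states because no two taken branches occur in a row, so every taken branch is mispredicted.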

d. Design a predictor that would achieve a perfect accuracy if this pattern is repeated forever.
Your predictor should be a sequential circuit with one output that provides a prediction (1
for taken, 0 for not taken) and no inputs other than the clock and the control signal that
indicates that the instruction is a conditional branch.

There are several ways to show this (including Verilog code or a gate diagram with flip-flops). Here is the
most straightforward way to do this with a state diagram:

Slight variations on this are fine. Note that this predictor must be initialized in the correct state to
predict the pattern perfectly.
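
One possible design, sketched in Python as a behavioral model rather than as the diagram from the solution: a five-state ring counter that advances on every conditional branch and outputs the pattern value for its current state. The class name, method names, and the choice of state 0 as the initial state are assumptions made for illustration.

```python
class FixedPatternPredictor:
    """Ring counter with five states; the output depends only on the current state."""

    OUTPUTS = (0, 1, 0, 0, 1)  # NT, T, NT, NT, T (1 = taken)

    def __init__(self, state=0):
        # The predictor must start in the state aligned with the pattern.
        self.state = state

    def predict(self):
        return self.OUTPUTS[self.state]

    def clock(self, is_branch):
        # Advance only when the current instruction is a conditional branch.
        if is_branch:
            self.state = (self.state + 1) % len(self.OUTPUTS)


def accuracy(predictor, outcomes):
    """Fraction of correct predictions over a stream of branch outcomes."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict() == taken
        predictor.clock(is_branch=True)
    return correct / len(outcomes)


print(accuracy(FixedPatternPredictor(), [0, 1, 0, 0, 1] * 10))  # 1.0
```

In hardware this corresponds to a three-bit state register with next-state logic for a modulo-5 count, enabled by the branch control signal.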

e. What is the accuracy of your predictor from part d if it is given a repeating pattern that is
the exact opposite of this one?

This predictor is always wrong => 0% accuracy.

f. Repeat 4d., but now your predictor should be able to eventually (after a warm-up period
during which it can make wrong predictions) start perfectly predicting both this pattern and
its opposite. Your predictor should have an input that tells it what the real outcome was.
Hint: this input lets your predictor determine which of the two repeating patterns it is given.

Again, there are many ways to show this. The simplest thing to do is to distinguish between the
two patterns early in the sequence (warm up). Again, we will assume that you are initialized into
one of the two starting states. Here is a state diagram:

Note that the two inputs identify if the current instruction is a branch and what the real outcome
was. Slight variations on this are fine.

Given that we have an input to tell us what the correct prediction was, we can actually devise a
more complex predictor that will eventually correctly predict either pattern but won’t need to be
initialized into the correct state.
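
One way such an initialization-free predictor could work, offered as a sketch and not as the design from the solution: keep the last five actual outcomes in a shift register and predict that the next branch repeats the outcome from five branches ago. Because both patterns have period 5, this predicts either one perfectly after a five-branch warm-up. The class and method names are hypothetical.

```python
from collections import deque

PATTERN = [0, 1, 0, 0, 1]              # NT, T, NT, NT, T (1 = taken)
OPPOSITE = [1 - bit for bit in PATTERN]


class PeriodFivePredictor:
    """Five-bit shift register: predict the outcome seen five branches earlier."""

    def __init__(self):
        # Initial contents are arbitrary; they only affect the warm-up period.
        self.history = deque([0] * 5, maxlen=5)

    def predict(self):
        return self.history[0]  # oldest recorded outcome

    def update(self, is_branch, taken):
        # Shift in the real outcome; the oldest bit falls out automatically.
        if is_branch:
            self.history.append(taken)


def mispredictions_after_warmup(outcomes, warmup=5):
    """Count wrong predictions made after the first `warmup` branches."""
    predictor = PeriodFivePredictor()
    wrong = 0
    for index, taken in enumerate(outcomes):
        if index >= warmup and predictor.predict() != taken:
            wrong += 1
        predictor.update(is_branch=True, taken=taken)
    return wrong


print(mispredictions_after_warmup(PATTERN * 20))   # 0
print(mispredictions_after_warmup(OPPOSITE * 20))  # 0
```

This uses more state than a hand-built state machine that only distinguishes the two patterns, but it needs no particular starting state and the warm-up is bounded at five branches.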
