Coa Applied

COE 301 / ICS 233

Computer Organization

Final Exam – Term 172

Tuesday, May 15, 2018
8:00 am – 10:30 am

Computer Engineering Department

College of Computer Sciences & Engineering
King Fahd University of Petroleum & Minerals

SOLUTION

Dr. Aiman El-Maleh COE 301 ICS 233

Dr. Marwan Abu-Amara COE 301 ICS 233
Dr. Muhamed Mudawar

Q1 / 17 Q2 /8
Q3 / 14 Q4 / 19
Q5 / 10 Q6 / 12
Total / 80

Important Reminder on Academic Honesty

Using unauthorized information or notes on an exam, peeking at others' work, or
altering graded exams to claim more credit are severe violations of academic
honesty. Detected cases will receive a failing grade in the course.

Q1. [17 points] General Understanding of Topics

a) (1 point) Does pipelining improve the latency of individual instructions? Explain.

Pipelining does NOT improve the latency of individual instructions. However, it
improves the throughput.

b) (2 points) What causes control hazards in a pipelined datapath, and how can control
hazards be eliminated?

Control hazards are caused by jump and branch instructions, which are resolved late in a
pipelined datapath, after the next instructions have already been fetched. They can be
eliminated by converting the next (one or two) instructions that appear after a jump or a
taken branch into NOPs.

c) (2 points) Explain the difference between static RAM and dynamic RAM.

Static RAM: Cell is made out of 6 transistors and does not require refreshing.

Dynamic RAM: Cell is made out of 1 transistor and 1 capacitor, requires
refreshing, but is denser (cheaper) than SRAM.

d) (2 points) Is it possible to use only one memory for both instructions and data in the
single-cycle datapath? Explain why or why not. Is it possible to use only one memory
for both instructions and data in a multi-cycle datapath? Explain.

In a single-cycle datapath, a load instruction must be fetched and must read its data
during the same cycle. A single memory cannot serve both the instruction fetch and the
data load in the same cycle, so using only one memory is NOT possible.

In a multi-cycle datapath, using only one memory IS possible because fetching the
instruction and loading the data can occur in two different cycles.

e) (2 points) Why do we need cache memory, and why do we have two separate cache
memories (I-cache and D-cache) in a pipelined processor?

We need cache memory to reduce memory latency.

Two separate caches (I-cache and D-cache) are needed to access both of them
during the same cycle by two different instructions.

f) (2 points) Explain the concepts of temporal locality and spatial locality of reference in
cache memory.

Temporal Locality: if a program references an instruction (or data) at a given
address, then it is likely to reference the same address again in the near future.

Spatial Locality: if a program references an instruction (or data) at a given
address, then it is likely to reference nearby addresses (such as the next address
in memory) soon.

g) (2 points) What needs to be stored inside a cache for block identification? How does a
cache know whether there is a cache hit or miss?

A cache stores tags for block identification.

The tag stored in the cache is compared against the tag in the memory address to
determine whether there is a cache hit or miss.

h) (2 points) Suppose a 4-way set-associative cache has a capacity of 32 KiB (1 KiB =
1024 bytes) and each block consists of 64 bytes. What is the total number of blocks in
the cache? What is the number of sets?

Total number of blocks = 32 x 1024 / 64 = 512 blocks

Number of sets = 512 / 4 = 128 sets
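The same arithmetic can be written as a small helper. This is only a sketch: the function name `cache_geometry` is an illustrative choice, not something from the exam, and it assumes the capacity is an exact multiple of the block size and the associativity.

```python
def cache_geometry(capacity_bytes: int, block_bytes: int, ways: int) -> tuple[int, int]:
    """Return (number of blocks, number of sets) for a set-associative cache."""
    blocks = capacity_bytes // block_bytes   # total blocks = capacity / block size
    sets = blocks // ways                    # each set holds `ways` blocks
    return blocks, sets


blocks, sets = cache_geometry(32 * 1024, 64, 4)
print(blocks, sets)  # 512 128
```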

i) (2 points) Explain the difference between a write-through and a write-back cache.

Write-through cache: every write to the cache is also written to the lower-level
memory.

Write-back cache: the write is done in the cache only. A modified bit is needed to
indicate whether a block has been modified. Modified blocks are written back to
memory when replaced.

Q2. [8 points] Single-Cycle Processor

The single-cycle datapath and control of a MIPS-like processor are shown below. However,
this datapath and control lack the implementation of many important instructions.

Consider adding the following two new instructions to the above datapath: JLR and LWI.
The JLR instruction is I-type and has a unique opcode. The LWI instruction is R-type and
has a unique function code. The least-significant 2 bits of register PC are hardwired to 00,
and not stored in PC. Therefore, it is sufficient to increment PC by 1 to point to the next
instruction in memory.

Instruction                    Format                   Meaning

Jump and Link Register         Op, Rs, Rt, Imm16        Reg[Rt] = PC + 1
Op = JLR                                                PC = Reg[Rs] + Imm16

Load Word Indexed              Op, Rs, Rt, Rd, Func     Reg[Rd] = Mem[Reg[Rs] + Reg[Rt]]
Op = R-type, Func = LWI
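To make the intended behavior of the two instructions concrete, here is a minimal behavioral sketch. The function names `jlr` and `lwi`, the register list, and the dictionary used as memory are illustrative assumptions, not part of the exam. PC is treated as a word address (incremented by 1), and `imm16` is assumed to be already sign-extended.

```python
def jlr(regs: list[int], pc: int, rs: int, rt: int, imm16: int) -> int:
    """Jump and Link Register: Reg[Rt] = PC + 1, PC = Reg[Rs] + Imm16."""
    target = regs[rs] + imm16   # read Rs first, as the register file is read before it is written
    regs[rt] = pc + 1           # return address
    return target               # new PC


def lwi(regs: list[int], mem: dict[int, int], pc: int, rs: int, rt: int, rd: int) -> int:
    """Load Word Indexed: Reg[Rd] = Mem[Reg[Rs] + Reg[Rt]]."""
    regs[rd] = mem[regs[rs] + regs[rt]]
    return pc + 1               # new PC (next instruction)


regs = [0] * 32
regs[5] = 100
new_pc = jlr(regs, pc=10, rs=5, rt=31, imm16=-4)
print(new_pc, regs[31])  # 96 11
```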

a) (4 points) Redraw the necessary changes to the above datapath to implement the above
two instructions. Draw only the modified parts and explain why they are needed.

b) (4 points) Identify any new control signal needed to implement the above two
instructions. Draw a table showing the values of all control signals to implement the
above two instructions.

Q2 Solution

a) Changes needed to implement JLR and LWI instructions:

Add a 4th input to the mux at the input of the PC register and add a bus connecting the
ALU result (Jump Register Address) back to the PC.

Add a 3rd input to the WB mux and add a bus connecting the Return Address (PC + 1)
back to the register file.

b) The same control signals are used, except that the main control logic now depends on
both the opcode and the function code for LWI.

       PCSrc    RegDst  RegWr  ExtOp   ALUSrc   ALUOp  MemRd  MemWr  WBdata

JLR    3=JRA    0=Rt    1      1=Sign  1=Imm    ADD    0      0      2=RA
LWI    0=PC+1   1=Rd    1      X       0=BusB   ADD    1      0      1=Mem

Q3. [14 points] Performance of Single-Cycle, Multi-Cycle, and Pipelined CPU

Compare the performance of a single-cycle processor and a multi-cycle processor. The
delay times are as follows:
Instruction memory access time = 500 ps
Data memory access time = 500 ps
Instruction Decode and Register read = 200 ps
Register write = 200 ps
ALU delay = 100 ps
Ignore the other delays in the multiplexers, wires, etc. Assume a program has the following
instruction mix: 40% ALU, 5% load, 5% store, 30% branch, and 20% jump.
a) (6 points) Compute the delay for each instruction class and the clock cycle for the single-
cycle processor.

(All delays in ps)

Instruction   Instruction   Decode and      ALU   Data     Write   Total
Class         Memory        Register Read         Memory   Back    Delay

ALU           500           200             100   -        200     1000
Load          500           200             100   500      200     1500
Store         500           200             100   500      -       1300
Branch        500           200             100   -        -       800
Jump          500           200             -     -        -       700

Clock cycle for the single-cycle processor = 1500 ps = 1.5 ns

b) (2 points) Compute the clock cycle and the average CPI for the multi-cycle processor.

Clock cycle for the multi-cycle processor = max(500,200,100) = 500 ps

Average CPI for the multi-cycle processor =

0.4 × 4 + 0.05 × 5 + 0.05 × 4 + 0.3 × 3 + 0.2 × 2 = 3.35

c) (2 points) Determine quantitatively if there is a speedup when using the multi-cycle
processor with respect to the single-cycle.

Speedup = (1500 × 1) / (500 × 3.35) = 0.8955 ⇒ No speedup
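Parts a) through c) can be checked with a short script. This is a sketch under the exam's stated delays and instruction mix. The dictionary names and the step labels (`IM`, `ID`, `ALU`, `DM`, `WB`) are illustrative, and the multi-cycle processor is assumed to spend one cycle per step.

```python
# Component delays in picoseconds, as given in the question.
DELAYS_PS = {"IM": 500, "ID": 200, "ALU": 100, "DM": 500, "WB": 200}

# Steps used by each instruction class (one multi-cycle clock per step).
STEPS = {
    "ALU":    ["IM", "ID", "ALU", "WB"],
    "Load":   ["IM", "ID", "ALU", "DM", "WB"],
    "Store":  ["IM", "ID", "ALU", "DM"],
    "Branch": ["IM", "ID", "ALU"],
    "Jump":   ["IM", "ID"],
}

# Instruction mix from the question.
MIX = {"ALU": 0.40, "Load": 0.05, "Store": 0.05, "Branch": 0.30, "Jump": 0.20}

total_delay = {c: sum(DELAYS_PS[s] for s in steps) for c, steps in STEPS.items()}
single_cycle_clock = max(total_delay.values())       # slowest instruction sets the clock
multi_cycle_clock = max(DELAYS_PS.values())          # slowest step sets the clock
multi_cycle_cpi = sum(MIX[c] * len(STEPS[c]) for c in STEPS)
speedup = (single_cycle_clock * 1) / (multi_cycle_clock * multi_cycle_cpi)

print(total_delay)
print(single_cycle_clock, multi_cycle_clock)          # 1500 500
print(round(multi_cycle_cpi, 2), round(speedup, 4))   # 3.35 0.8955
```

A speedup below 1 means the multi-cycle design is slower for this mix: the 500 ps memory access dominates the clock while short steps such as the 100 ps ALU still occupy a full cycle.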


d) (2 points) Assume that the processor is pipelined. Furthermore, assume that a program
has the following instruction mix: 40% ALU, 5% load, 5% store, 30% branch, and 20%
jump. Moreover, assume that 90% of the branches will be taken. The CPU stalls 1 cycle
for each jump and 2 cycles for each taken branch. Compute the average CPI for the
pipelined processor due to control hazards only.

Average CPI for the pipelined processor for control hazards =

CPIbase + CPIstalls = 1 + (0.3 × 0.9 × 2) + (0.2 × 1) = 1.74
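The control-hazard CPI is the base CPI plus the expected stall cycles per instruction. A minimal sketch follows; the function and parameter names are illustrative assumptions, not from the exam.

```python
def pipelined_cpi(branch_frac: float, taken_frac: float, branch_penalty: int,
                  jump_frac: float, jump_penalty: int, base_cpi: float = 1.0) -> float:
    """Average CPI when only taken branches and jumps cause stalls."""
    branch_stalls = branch_frac * taken_frac * branch_penalty
    jump_stalls = jump_frac * jump_penalty
    return base_cpi + branch_stalls + jump_stalls


print(round(pipelined_cpi(0.30, 0.90, 2, 0.20, 1), 2))  # 1.74
```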

e) (2 points) Assume that the processor is pipelined and that load instructions are 5% of the
instruction count and store instructions are also 5% as given above. However, the program
spends 30% of its execution time executing load instructions and 15% of its execution
time executing store instructions. The designers discovered that the Data cache is
producing many cache misses causing the CPU to stall. They decided to improve the
design of the data cache and improve the execution time of the load instructions by a factor
of 3x (3 times faster) and the store instructions by a factor of 2x. Determine the overall
speedup of the program due to the improvements done to the data cache.

Speedup due to data cache improvement =

1 / (0.3/3 + 0.15/2 + (1 − 0.3 − 0.15)) = 1 / 0.725 = 1.379

The program will run faster by a factor of 1.379x due to data cache improvement.
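This is Amdahl's law applied to two improved fractions of the execution time. A small sketch, with an illustrative function name and input format (a list of `(fraction_of_time, speedup_factor)` pairs):

```python
def amdahl_speedup(improvements: list[tuple[float, float]]) -> float:
    """Overall speedup when each (fraction, factor) part of the time is sped up by factor."""
    unaffected = 1.0 - sum(fraction for fraction, _ in improvements)
    new_time = unaffected + sum(fraction / factor for fraction, factor in improvements)
    return 1.0 / new_time


# Loads: 30% of the time, 3x faster. Stores: 15% of the time, 2x faster.
print(round(amdahl_speedup([(0.30, 3), (0.15, 2)]), 3))  # 1.379
```

Note that the 5% instruction-count figures do not enter the calculation; only the fractions of execution time matter.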

Q4. [19 points] Pipelined CPU Design

I. Consider the 5-stage pipelined CPU design given below.

a) (5 points) Show the design changes needed for handling data hazards using
forwarding including a block diagram for data hazard detection and forwarding unit.

b) (4 points) Show the control signals that will be used for stalling the pipeline for data
hazards due to load instructions along with their conditions. Show the necessary
changes that need to be done to the design.

Condition for Stalling the pipeline due to Load Instruction:

if ((EX.MemRd == 1)                               // Detect Load in EX stage
    and (ForwardA == 1 or ForwardB == 1)) Stall   // RAW Hazard

OR:

if ((EX.MemRd == 1)
    and (Rd2 != 0) and ((Rs == Rd2) or (Rt == Rd2))) Stall

Stall disables the PC and IR registers (i.e., PCWrite=0 and IRWrite=0), which freezes
their contents, and introduces a bubble into the stage 2 control register by setting
its control signals to 0.
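The second form of the stall condition can be modeled in software as follows. This is a behavioral sketch only, not the hardware design: the function names and the `Bubble` key are hypothetical, while `PCWrite`, `IRWrite`, `Rd2`, `Rs`, and `Rt` follow the names used above.

```python
def load_use_stall(ex_mem_rd: bool, rd2: int, rs: int, rt: int) -> bool:
    """True when a load in the EX stage writes a register that the instruction in ID reads."""
    return ex_mem_rd and rd2 != 0 and (rs == rd2 or rt == rd2)


def stall_controls(stall: bool) -> dict[str, int]:
    """Control values on a stall: freeze PC and IR, and insert a bubble (hypothetical name)."""
    return {
        "PCWrite": 0 if stall else 1,
        "IRWrite": 0 if stall else 1,
        "Bubble": 1 if stall else 0,
    }


# LW $s0, -4($s1) is in EX (Rd2 = 16) while ADD $s0, $s0, $s0 is in ID (Rs = Rt = 16).
stall = load_use_stall(ex_mem_rd=True, rd2=16, rs=16, rt=16)
print(stall, stall_controls(stall))
```

The `rd2 != 0` check prevents a stall on register $0, which is hardwired to zero and never carries a real dependence.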

c) (2 points) Show the control signals that will be used for handling control hazards.
Show the necessary changes that need to be done to the design.

When a jump instruction is at stage 2, Kill1=1 will replace that instruction by a
NOP, and when a taken branch is at stage 3, Kill1=1 and Kill2=1 will replace both
instructions by NOPs.

d) (3 points) Show the design of the PC control logic that includes the handling of control
hazards assuming that only BEQ, BNE, and J instructions are implemented.

II. (5 Points) Consider the following MIPS assembly language code:

I1: ORI $s0, $0, 5

I2: ADDI $s1, $0, 10
I3: ADD $s1, $s0, $s1
I4: LW $s0, -4($s1)
I5: ADD $s0, $s0, $s0
I6: SW $s0, -4($s1)

Complete the following table showing the timing of the above code on the 5-stage
pipeline given in part (i) (IF, ID, EX, MEM, WB) supporting forwarding and
pipeline stall. Draw an arrow showing forwarding between the stage that provides the
data and the stage that receives the data. Show all stall cycles (draw an X in the box to
represent a stall cycle). Determine the number of clock cycles to execute this code.

          1   2   3   4   5   6   7   8   9   10  11  12  13  14  15

I1: ORI   IF  ID  EX  -   WB

I2: ADDI      IF  ID  EX  -   WB

I3: ADD           IF  ID  EX  -   WB

I4: LW                IF  ID  EX  M   WB

I5: ADD                   IF  X   ID  EX  -   WB

I6: SW                            IF  ID  EX  M   -

Total number of clock cycles to execute the above code = 10
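The cycle count can be reproduced with a very small timing model. This sketch rests on assumptions that are not spelled out in the exam: the only stall is a single cycle when an instruction reads the destination of the load immediately before it, every other dependence is resolved by forwarding, the next instruction's IF is drawn in the cycle just before its ID (as in the table), and SW finishes in MEM because it does not use WB. The tuple layout and the name `schedule` are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# (label, destination register, source registers, is_load, last stage actually used)
PROGRAM = [
    ("I1: ORI",  "s0", ["0"],        False, "WB"),
    ("I2: ADDI", "s1", ["0"],        False, "WB"),
    ("I3: ADD",  "s1", ["s0", "s1"], False, "WB"),
    ("I4: LW",   "s0", ["s1"],       True,  "WB"),
    ("I5: ADD",  "s0", ["s0", "s0"], False, "WB"),
    ("I6: SW",   None, ["s0", "s1"], False, "MEM"),
]


def schedule(program):
    """Return a list of (label, fetch cycle, stall cycles, finish cycle)."""
    rows = []
    fetch = 1
    prev_load_dest = None   # destination of the previous instruction if it was a load
    for label, dest, srcs, is_load, last_stage in program:
        stall = 1 if prev_load_dest is not None and prev_load_dest in srcs else 0
        decode = fetch + 1 + stall
        finish = decode + STAGES.index(last_stage) - STAGES.index("ID")
        rows.append((label, fetch, stall, finish))
        fetch = decode      # the next instruction is fetched as this one enters ID
        prev_load_dest = dest if is_load else None
    return rows


rows = schedule(PROGRAM)
for row in rows:
    print(row)
total_cycles = max(finish for _, _, _, finish in rows)
print(total_cycles)  # 10
```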


Q5. [10 points] Cache Memory

a) (3 points) Given that the memory address consists of 64 bits, consider a 64 KiB fully
associative cache (1 KiB = 1024 bytes) with 64-byte cache blocks and a write back
policy is used. Compute the total number of bits required to store the valid, modified,
and tag bits in the cache.

Total Valid bits = # of blocks = 1024 bits

Total Modified bits = # of blocks = 1024 bits
Total Tag bits = # of blocks × Tag bits = 1024 × (64 – 6) bits = 59,392 bits
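The bookkeeping bits can be computed directly from the cache parameters. A sketch, assuming power-of-two sizes; the function name is illustrative. In a fully associative cache there is no index field, so the tag is the address minus the block offset.

```python
def fully_associative_overhead_bits(addr_bits: int, capacity_bytes: int,
                                    block_bytes: int) -> dict[str, int]:
    """Total valid, modified, and tag bits for a fully associative write-back cache."""
    blocks = capacity_bytes // block_bytes
    offset_bits = (block_bytes - 1).bit_length()   # log2(block size) for a power of two
    tag_bits = addr_bits - offset_bits             # no index bits when fully associative
    return {"valid": blocks, "modified": blocks, "tag": blocks * tag_bits}


print(fully_associative_overhead_bits(64, 64 * 1024, 64))
# {'valid': 1024, 'modified': 1024, 'tag': 59392}
```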

b) (3 points) Assume that the memory address consists of 64 bits, and a 64 KiB 4-way set
associative cache with 64-byte cache blocks is used. Find the number of tag bits, index
bits, and offset bits needed.

Offset bits = 6 bits

Index bits = log2[64K/(4 × 64)] = 8 bits

Tag bits = 64 – 6 – 8 = 50 bits
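The split of an address into tag, index, and offset follows from the number of sets and the block size. A sketch with an illustrative function name, assuming all sizes are powers of two:

```python
def address_fields(addr_bits: int, capacity_bytes: int, block_bytes: int,
                   ways: int) -> tuple[int, int, int]:
    """Return (tag bits, index bits, offset bits) for a set-associative cache."""
    sets = capacity_bytes // (block_bytes * ways)
    offset_bits = (block_bytes - 1).bit_length()   # log2(block size)
    index_bits = (sets - 1).bit_length()           # log2(number of sets); 0 for a single set
    tag_bits = addr_bits - offset_bits - index_bits
    return tag_bits, index_bits, offset_bits


print(address_fields(64, 64 * 1024, 64, 4))  # (50, 8, 6)
```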

c) (4 points) Given a 2-way set-associative cache that uses 32-bit memory addresses
divided into 4 bits of offset, 12 bits of index, and 16 bits of tag. Starting with an empty
cache, show the tag, index, and way (block 0 or 1) for each of the following sequentially
referenced addresses and indicate whether the reference resulted in a hit or a miss. The
replacement policy used is FIFO.

Address Tag Index Way Hit / Miss

0x00553F0F 0x0055 0x3F0 0 Miss

0x00773F01 0x0077 0x3F0 1 Miss

0x00553F02 0x0055 0x3F0 0 Hit

0x005530AC 0x0055 0x30A 0 Miss

0x00773F07 0x0077 0x3F0 1 Hit

0x005530AA 0x0055 0x30A 0 Hit

0x009930AB 0x0099 0x30A 1 Miss

0x00993F05 0x0099 0x3F0 0 Miss
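The table can be reproduced with a short simulation of a 2-way set-associative cache with FIFO replacement. This is a sketch: the function name `simulate` and the data structures are illustrative, and an empty way is filled in way order (way 0 first) before any replacement happens.

```python
from collections import deque

OFFSET_BITS, INDEX_BITS, WAYS = 4, 12, 2


def simulate(addresses):
    """Return (tag, index, way, 'Hit' or 'Miss') for each address, starting from an empty cache."""
    tags_by_set = {}   # index -> list of tags, position in the list = way number
    fifo_by_set = {}   # index -> ways in the order they were filled (oldest first)
    results = []
    for addr in addresses:
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        tags = tags_by_set.setdefault(index, [None] * WAYS)
        order = fifo_by_set.setdefault(index, deque())
        if tag in tags:
            way, outcome = tags.index(tag), "Hit"   # a hit does not change FIFO order
        else:
            # Use an empty way if one exists, otherwise evict the oldest block.
            way = tags.index(None) if None in tags else order.popleft()
            tags[way] = tag
            order.append(way)
            outcome = "Miss"
        results.append((tag, index, way, outcome))
    return results


ADDRESSES = [0x00553F0F, 0x00773F01, 0x00553F02, 0x005530AC,
             0x00773F07, 0x005530AA, 0x009930AB, 0x00993F05]
for tag, index, way, outcome in simulate(ADDRESSES):
    print(f"0x{tag:04X} 0x{index:03X} {way} {outcome}")
```

The last access shows the FIFO policy at work: set 0x3F0 is full, so the block filled first (tag 0x0055 in way 0) is evicted even though it was hit more recently than it was loaded.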


Q6. [12 points] Cache Performance

A processor runs at 2.5 GHz and has a CPI=1.7 for a perfect cache (i.e. without including the
stall cycles due to cache misses). Assume that load and store instructions are 15% of the
instructions. The processor has an I-cache with a 4% miss rate and a D-cache with 6% miss
rate. The hit time is 1 clock cycle for both caches. Assume that the time required to transfer a
block of data from the main memory to the cache, i.e. miss penalty, is 40 ns.

a) (4 Points) Compute the number of stall cycles per instruction and the overall CPI.

Combined misses per instruction = 4% + 15% × 6% = 0.049

Miss Penalty = 40 ns × 2.5 GHz = 100 cycles
Memory stall cycles per instruction = 0.049 × 100 = 4.9 cycles
Overall CPI = 1.7 + 4.9 = 6.6 cycles
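The same steps can be expressed as a function. This is a sketch with illustrative names; it relies on the fact that a time in ns multiplied by a frequency in GHz gives a cycle count.

```python
def cpi_with_stalls(base_cpi: float, ls_frac: float, i_miss: float, d_miss: float,
                    penalty_ns: float, clock_ghz: float) -> tuple[float, float]:
    """Return (memory stall cycles per instruction, overall CPI)."""
    misses_per_instr = i_miss + ls_frac * d_miss   # one fetch plus ls_frac data accesses
    penalty_cycles = penalty_ns * clock_ghz        # ns x GHz = cycles
    stalls = misses_per_instr * penalty_cycles
    return stalls, base_cpi + stalls


stalls, cpi = cpi_with_stalls(1.7, 0.15, 0.04, 0.06, 40, 2.5)
print(round(stalls, 2), round(cpi, 2))  # 4.9 6.6
```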

b) (4 Points) Compute the average memory access time (AMAT) in ns.

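AMAT(I-Cache) = Hit time + Miss rate(I-Cache) × Miss penalty
= (1 cycle / 2.5 GHz) + 0.04 × 40 ns = 0.4 ns + 1.6 ns = 2 ns

AMAT(D-Cache) = Hit time + Miss rate(D-Cache) × Miss penalty
= (1 cycle / 2.5 GHz) + 0.06 × 40 ns = 0.4 ns + 2.4 ns = 2.8 ns

Each instruction makes one instruction fetch and %LS data accesses (%LS is the fraction
of load and store instructions), so the two AMAT values are weighted accordingly:

AMAT = 1/(1 + %LS) × AMAT(I-Cache) + %LS/(1 + %LS) × AMAT(D-Cache)
= 1/(1 + 0.15) × 2 ns + 0.15/(1 + 0.15) × 2.8 ns = 2.104 ns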

Alternative Solution:

Combined Miss Rate = Combined Misses per Instruction / (1 + %LS)

Combined Miss Rate = 0.049 / (1 + 0.15) = 0.0426

AMAT = 0.4 ns + 0.0426 × 40 ns = 2.104 ns
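Both solution paths can be checked numerically. This is a sketch with illustrative variable names; it computes the weighted AMAT and the combined-miss-rate AMAT and confirms they agree.

```python
CLOCK_GHZ = 2.5
PENALTY_NS = 40.0
LS_FRAC = 0.15                    # loads and stores per instruction
I_MISS, D_MISS = 0.04, 0.06

hit_ns = 1 / CLOCK_GHZ            # one cycle at 2.5 GHz = 0.4 ns


def amat_ns(miss_rate: float) -> float:
    """AMAT = hit time + miss rate x miss penalty, in ns."""
    return hit_ns + miss_rate * PENALTY_NS


# First solution: weight the two caches by their share of memory accesses.
accesses_per_instr = 1 + LS_FRAC
amat_weighted = (1 * amat_ns(I_MISS) + LS_FRAC * amat_ns(D_MISS)) / accesses_per_instr

# Alternative solution: combined miss rate over all memory accesses.
combined_miss_rate = (I_MISS + LS_FRAC * D_MISS) / accesses_per_instr
amat_combined = amat_ns(combined_miss_rate)

print(round(amat_weighted, 3), round(amat_combined, 3))  # 2.104 2.104
```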

c) (4 Points) Discuss how the average memory access time (AMAT) can be reduced by
mentioning all the factors that could reduce it and for each factor explaining how it can
be done.

The average memory access time can be reduced by:

1. Reducing the hit time by using small and simple caches

2. Reducing the miss rate by using a larger cache size, higher associativity, and
larger block size

3. Reducing the miss penalty by using multilevel caches
