hw4 Sol
Ques. 1:
• (a) A 64KB, direct mapped cache has 16 byte blocks. If addresses are 32 bits, how many
bits are used for the tag, index, and offset in this cache?
o Tag: 16 bits. Index: 12 bits. Offset: 4 bits.
o In a 64KB direct mapped cache with 16 byte blocks there are 4096 cache lines
(blocks), therefore requiring 12 bits for the index (i.e., to select the block). Since
the blocks are 16 bytes each, 4 bits are required for the offset. The remaining 16
bits are used for the tag. Note that the tag bits (for direct mapped) can also be
derived as the total address bits (32) minus the bits needed to address the cache size
(i.e., to address 64KB we need 16 bits).
• (b) How would the address be divided if the cache were 4-way set associative instead?
o Tag: 18 bits. Index: 10 bits. Offset: 4 bits. If the cache were 4-way set associative
there would be 4 blocks in each set and 1024 sets. Therefore we require 10 bits
for the index (to select the set to which a block maps). The offset would remain
at 4 bits, and the tag would now use 18 bits.
• (c) How many bits are in the index for a fully associative cache? Explain your answer.
o A fully associative cache has no bits in the index, since any address can be stored
anywhere in the cache.
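As a quick sanity check, here is a minimal Python sketch (the function name addr_split is my own, not from the assignment) that derives the tag/index/offset split for all three organizations:

    import math

    def addr_split(cache_bytes, block_bytes, ways, addr_bits=32):
        """Return (tag, index, offset) bit counts for a set associative cache."""
        offset = int(math.log2(block_bytes))           # selects a byte in the block
        num_sets = cache_bytes // (block_bytes * ways)
        index = int(math.log2(num_sets))               # selects the set (0 if fully assoc.)
        tag = addr_bits - index - offset               # whatever bits remain
        return tag, index, offset

    print(addr_split(64 * 1024, 16, 1))                   # direct mapped -> (16, 12, 4)
    print(addr_split(64 * 1024, 16, 4))                   # 4-way         -> (18, 10, 4)
    print(addr_split(64 * 1024, 16, (64 * 1024) // 16))   # fully assoc.  -> (28, 0, 4)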
Ques.2: An 8 byte, 2-way set associative cache (using LRU replacement) with 2 byte blocks
receives requests for the following addresses (represented in binary):
0110, 0000, 0010, 0001, 0011, 0100, 1001, 0000, 1010, 1111, 0111
For each access, determine the contents of the cache (after the access), whether the access hits or
misses, and the categorization of each miss under the “3 C” model. Fill in the worksheet in the
format shown below with your answer to this question (note that the first access is done for you).
You should fill in the cache lines with the tags that reside there, with the most recently used tag
first.
With 8 bytes of total capacity, 2-byte blocks, and 2 ways there are 4 blocks in 2 sets, so a 4-bit
address splits into a 2-bit tag, a 1-bit index, and a 1-bit offset.

Access  Set 0 (MRU first)  Set 1 (MRU first)  Hit/Miss  Miss type
0110    -                  01                 Miss      Compulsory
0000    00                 01                 Miss      Compulsory
0010    00                 00, 01             Miss      Compulsory
0001    00                 00, 01             Hit       -
0011    00                 00, 01             Hit       -
0100    01, 00             00, 01             Miss      Compulsory
1001    10, 01             00, 01             Miss      Compulsory
0000    00, 10             00, 01             Miss      Conflict
1010    00, 10             10, 00             Miss      Compulsory
1111    00, 10             11, 10             Miss      Compulsory
0111    00, 10             01, 11             Miss      Capacity

Note: In some cases there may be multiple types of miss; I indicated the primary cause of the
miss. (The repeated 0000 access is a conflict miss because an equal-size fully associative cache
would have hit; the final 0111 access is a capacity miss because even a fully associative cache
would have evicted that block by then.)
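The worksheet can be reproduced with a short simulation. Below is a minimal Python sketch (all names are my own); it keeps a shadow fully associative LRU cache of the same total size to separate capacity misses from conflict misses:

    def classify(addrs, num_sets=2, ways=2, block=2):
        seen, fa, sets = set(), [], [[] for _ in range(num_sets)]
        fa_capacity = num_sets * ways
        for a in addrs:
            blk = a // block                     # block address = tag + index bits
            s = blk % num_sets                   # low block-address bit picks the set
            tag = f"{blk // num_sets:02b}"       # remaining bits form the tag
            fa_hit = blk in fa                   # shadow fully associative LRU cache
            if fa_hit:
                fa.remove(blk)
            fa.insert(0, blk)                    # MRU first ...
            del fa[fa_capacity:]                 # ... evict LRU beyond capacity
            if tag in sets[s]:                   # the real 2-way set associative cache
                sets[s].remove(tag)
                sets[s].insert(0, tag)
                result = "hit"
            else:
                sets[s].insert(0, tag)
                del sets[s][ways:]               # evict this set's LRU tag
                kind = ("compulsory" if blk not in seen
                        else "capacity" if not fa_hit else "conflict")
                result = f"miss ({kind})"
            seen.add(blk)
            print(f"{a:04b}: {result:18} sets: {sets}")

    classify([0b0110, 0b0000, 0b0010, 0b0001, 0b0011,
              0b0100, 0b1001, 0b0000, 0b1010, 0b1111, 0b0111])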
Ques.3: On the surface of the three C’s cache miss model, a fully associative cache should have
fewer non-compulsory misses (capacity plus conflict) than an equal-size direct mapped cache,
because conflict misses are in addition to capacity misses and occur only in set associative or
direct mapped caches.
Capacity misses are defined as those misses in a fully associative cache that occur when a block
is retrieved any time after its initial compulsory miss. The opportunity for a small direct mapped
cache to outperform an equally sized fully associative cache lies in a question hiding within this
definition of capacity misses: if fully associative cache capacity misses are caused by blocks
being discarded before their final use, why are these blocks discarded, and must otherwise
equivalent set associative or direct mapped caches discard the same blocks at the same times
during program execution?
Blocks are discarded, or replaced, based on the decision of the replacement policy. This
replacement decision is very important to cache performance. If the block chosen for
replacement is not referenced in the future by the program, then no capacity miss or conflict miss
can occur for it in the future. This is the ideal case. If the block chosen for replacement will be
used again in the very near future, or is used frequently compared to the other candidate blocks
for replacement, then the replacement choice is a poor one and cache performance will generally
be worse than ideal. Because fully associative, set associative, and direct mapped caches have
different block placement constraints, the block replacement policy for one cache type cannot
consider the same blocks for replacement as are considered by the same policy on another
organizational type. To see this more clearly, consider an example.
Let a program loop access three distinct addresses, A, B, and C, and then repeat the sequence
from A. The reference stream for this program at this point would look like this:
ABCABCABCA... . To simplify the discussion we assume that the direct mapped and fully
associative caches each can hold two blocks and that addresses A and C are from different cache
block frames in memory but map to the same location in the direct mapped cache, while address
B maps to the other location in the cache. If the replacement policy for the fully associative cache
is LRU, then every reference generated by the loop is a miss. If the replacement policy for the
direct mapped cache is LRU (a degenerate form, to be sure, because with only one block in each
set, whatever blocks are in a direct mapped cache are always the “least recently used”), accesses
to A or C will always miss, but we will always hit on B (ignoring its compulsory miss).
The replacement policy of a fully associative cache can cast its eye on all the blocks in the cache,
and in our example, makes the worst possible choice for replacement from all the blocks every
time. For the direct mapped cache this choice is also always the worst possible, but is limited to
two of the three blocks by cache structure. The result is that the direct mapped cache performs
better.
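This can be checked with a few lines of Python. The sketch below follows the stated assumptions (two-block caches; A and C share one direct mapped line, B gets the other); the function names are my own:

    def fully_assoc_lru(stream, size=2):
        cache, misses = [], 0
        for blk in stream:
            if blk in cache:
                cache.remove(blk)                # hit: move to MRU position
            else:
                misses += 1
                if len(cache) == size:
                    cache.pop()                  # evict LRU (last element)
            cache.insert(0, blk)                 # MRU first
        return misses

    def direct_mapped(stream, line_of={'A': 0, 'B': 1, 'C': 0}):
        cache, misses = {}, 0
        for blk in stream:
            if cache.get(line_of[blk]) != blk:
                misses += 1
                cache[line_of[blk]] = blk        # replacement is forced: one candidate
        return misses

    stream = list("ABC" * 4)                     # A B C A B C ...
    print(fully_assoc_lru(stream))               # 12: every reference misses
    print(direct_mapped(stream))                 # 9: B hits after its compulsory miss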
Ques.4: Appendix C discussed a number of cache performance equations, and you will find that
there are a number of ways to derive cache performance metrics. Assume you have a processor with
an ideal CPI without memory stalls for each instruction type as follows: ALU=1, Load/Store=1.5,
Branch=1.5, Jump=1. Consider an application which has an instruction mix of 40% ALU and
logical operations, 30% load and store, 20% branch and 10% jump instructions.
(a) Assume a 4-way set associative 1-level cache with separate data and instruction caches, with a
miss rate of 20% (0.20) for data accesses and a miss rate of 10% (0.10) for instructions, and a
miss penalty of 50 cycles for both instruction and data caches (and assume a cache hit takes 1
cycle). What is the effective CPU time (or effective CPI with memory stalls) and the average
memory access time for this application with this cache organization?
First compute ideal CPI without memory stalls:
CPIideal=(0.4*1)+(0.3*1.5)+(0.2*1.5)+(0.1*1) = 1.25.
The effective CPI, and therefore execution time, includes the memory stalls. This is given
by CPIactual = CPIideal + Memory Stalls per instruction.
Stalls/inst = (stalls due to data) + (stalls due to instructions)
Stalls-data (stalls due to data) = (data miss rate * miss penalty) * data accesses/inst
Stalls-inst (stalls due to inst) = (inst. miss rate * miss penalty) * inst. accesses/inst
The data accesses take place during the 30% load/store operations, so there are 0.3 data
accesses per instruction. The instruction accesses take place once for each instruction,
i.e., 100% of the program, or 1 access per instruction. Therefore:
Stalls-data = (0.20 * 50) * 0.3 = 3 cycles per instruction
Stalls-inst = (0.10 * 50) * 1 = 5 cycles per instruction
Stalls/inst = 3 + 5 = 8, and CPIactual = 1.25 + 8 = 9.25.
Now compute the average memory access time (AMAT). This is given by:
AMAT = fraction of data accesses * (hit time + data miss rate * miss penalty) +
fraction of inst. accesses * (hit time + inst. miss rate * miss penalty)
The fraction of data accesses is 0.3/1.3 (i.e., there are 1.3 memory accesses per instruction
in total, of which 0.3 are for data), and for instructions it is 1/1.3. Therefore:
AMAT = (0.3/1.3)*(1 + 0.20*50) + (1/1.3)*(1 + 0.10*50) = (3.3 + 6)/1.3 ≈ 7.15 cycles.
Note that another way to derive stalls per instruction is to multiply the average number of
memory accesses per instruction by (AMAT minus hit time). In this example, the
average number of memory accesses per instruction is 1.3 (i.e., 1 for the instruction fetch
and 0.3 for data due to loads/stores). Therefore the average stall cycles per instruction can
be derived as 1.3 * (7.15 - 1) ≈ 8 cycles per instruction, which matches the result above.
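Collecting the arithmetic, here is a minimal Python sketch of part (a) (variable names are my own):

    cpi_ideal = 0.4 * 1 + 0.3 * 1.5 + 0.2 * 1.5 + 0.1 * 1   # = 1.25
    stalls_data = (0.20 * 50) * 0.3                          # = 3.0 cycles/inst
    stalls_inst = (0.10 * 50) * 1.0                          # = 5.0 cycles/inst
    cpi_actual = cpi_ideal + stalls_data + stalls_inst       # = 9.25

    accesses = 1.3                                           # 1 inst + 0.3 data per inst
    amat = (0.3 / 1.3) * (1 + 0.20 * 50) + (1.0 / 1.3) * (1 + 0.10 * 50)
    print(cpi_actual, round(amat, 2))                        # 9.25 7.15
    print(round(accesses * (amat - 1), 2))                   # 8.0 stall cycles/inst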
(b) Now consider a 2-level 4-way unified cache with a level 1 (L1) miss rate of 20% (0.20) and a
level 2 (L2) local miss rate of 30% (0.30). Assume hit time in L1 is 1 cycle, assume miss penalty
is 10 cycles if you miss in L1 and hit in L2 (i.e., hit time in L2 is 10 cycles), and assume miss
penalty is 50 cycles if you miss in L2 (i.e., miss penalty in L2 is 50 cycles). Derive the equation
for the effective CPU time (or effective CPI) and the average memory access time for the same
instruction mix as part (a) for this cache organization.
First note that the miss rate for L2 is given as the local miss rate. Average memory accesses per
instruction = 1.3 as noted earlier (0.3 for data and 1 for inst).
AMAT = (hit time L1) + (miss rate L1)*(hit time L2 + (local miss rate L2 * miss penalty L2))
AMAT = 1 + 0.20 * (10 + 0.30 * 50) = 1 + 0.20 * 25 = 6 cycles.
The global miss rate of L2 is not the same as that of L1, but you can derive it from the L1 miss
rate and the L2 local miss rate: global miss rate L2 = 0.20 * 0.30 = 0.06. (Note that the local and
global miss rates of L1 are the same.)
Average memory stalls per instruction = misses per instruction in L1 * hit time L2 + misses per
instruction in L2 * miss penalty L2
Note that the L2 miss rate in the above equation refers to the global miss rate of L2. With 1.3
memory accesses per instruction:
Stalls/inst = 1.3*(0.20*10) + 1.3*(0.06*50) = 2.6 + 3.9 = 6.5 cycles per instruction.
Alternately, we can use our earlier observation that average memory stalls per instruction can be
derived as memory accesses per instruction * (AMAT - 1). This gives us 1.3 * (6 - 1) = 6.5 stall
cycles per instruction, so CPIactual = 1.25 + 6.5 = 7.75.
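And a matching sketch of part (b) under the same assumptions (again, variable names are my own):

    l1_miss, l2_local_miss = 0.20, 0.30
    hit_l1, hit_l2, pen_l2 = 1, 10, 50

    amat = hit_l1 + l1_miss * (hit_l2 + l2_local_miss * pen_l2)        # = 6.0 cycles
    l2_global_miss = l1_miss * l2_local_miss                           # = 0.06
    stalls = 1.3 * (l1_miss * hit_l2 + l2_global_miss * pen_l2)        # = 6.5 cycles/inst
    print(round(amat, 2), round(stalls, 2), round(1.25 + stalls, 2))   # 6.0 6.5 7.75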
(c) Which of the two designs (between part (a) and part (b)) gives better performance? Explain
your answer.
The second design, the multi-level cache, gives a lower AMAT (6 vs. about 7.15 cycles) and
fewer average memory stalls per instruction (6.5 vs. 8), and therefore a lower effective CPI
(7.75 vs. 9.25); it is the better design.