
MEMORY ORGANIZATION

Caches:

• Tavg

Q1-Q2. There is an instruction QUEUE in the CPU that can hold pre-fetched
instructions. With the INSTRUCTION QUEUE and the GENERAL PURPOSE REGISTERS, the
CPU now makes 25% fewer memory accesses; the internal bus in the CPU circuitry is
self-sufficient for these accesses, so no memory access signal reaches the cache or
main memory. Earlier, the cache hit ratio was 0.8, the cache access time = 100 ns,
and the main memory access time = 500 ns.

Q1. What is the speedup achieved in the average memory access time of the memory
system due to having the INSTRUCTION QUEUE: ___________________________ 1 (no
change in the average memory access time of the memory system; only the number of
accesses changes)

Q2. If earlier a program took 1000 ns to execute (counting only memory access time;
CPU time is negligible), what is the new execution time: _____________ ns 750

Solution:

Let the earlier execution involve x memory accesses, taking 1000 ns in total.

The per-access average time is unchanged, but the number of accesses drops to 0.75x.

New execution time = 0.75 × 1000 ns = 750 ns

Q3-Q5. The empirically analysed probability of accessing main memory block number x
while program A is under execution, for a particular instance of the "VIRTUAL
MEMORY to MAIN MEMORY" mapping, is given below (the per-block probability table did
not survive in this copy):
Q3. Assuming the same "VIRTUAL MEMORY to MAIN MEMORY" mapping and the same
execution conditions for program A, find the hit ratio of the CACHE at this
instance, when the cache (which holds 8 blocks) contains the blocks [1, 2, 3, 8, 5, 6, 7, 9].

What is the hit rate of the cache at the given instance for the given data?

A. 1
B. 0.4
C. 0.8
D. 0.6

Hit rate = (0.1 + 0.2 + 0.3) = 0.6, i.e., option D


Q4.

Suppose that, as per the block replacement policy, only 3 blocks can be allocated in
the cache to a process. After this snapshot of the cache, what is the minimum number
of block misses required to make the cache hit rate greater than 0.6?

A. 4
B. 1
C. 2
D. 3

Q5. Consider a cache block size of 4 words and a memory word size of 16 bytes in a
byte-addressable main memory. Then the number of words in program A is:
A. 4
B. 256
C. 64
D. 16

Q6.
Consider a simultaneous access cache memory system with
Cache access time = 100 ns
Main memory access time = 1000 ns
Hit ratio = h

Which of the following is a correct graph for the variation of Tavg vs h?
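(The graph options are not reproduced in the source. For a simultaneous access
system, Tavg = h × 100 + (1 − h) × 1000 = (1000 − 900h) ns: a straight line falling
from 1000 ns at h = 0 to 100 ns at h = 1.)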


Q7.
Consider a cache with access time = x ns
Main memory with access time = y ns

If the variation of Tavg with the cache hit ratio h is the same in the simultaneous
access arrangement and the hierarchical access arrangement, then which of the
following conditions is necessary:

A. y<x
B.
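Working (the remaining options are not reproduced in the source): with simultaneous
access, Tavg = h·x + (1 − h)·y = y − (y − x)·h; with hierarchical access,
Tavg = h·x + (1 − h)·(x + y) = (x + y) − y·h. Their difference is x·(h − 1), which
vanishes for every h only when x = 0, so the necessary condition is that the cache
access time x be negligible (zero).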

• Multilevel

• Mapping:

Q1. If the probability of accessing any block in the memory system is the same and
independent of the current cache contents, with P(access a block) = 1/1024, then
what is the maximum possible number of tag bits under any kind of cache mapping:

A. 1024
B. 12
C. 10
D. 20
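Working (no solution is given in the source): equal access probability 1/1024
implies 1024 memory blocks, i.e., a 10-bit block number. The tag is largest under
fully associative mapping, where the entire block number serves as the tag: 10 bits,
option C.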
Q2. If the probability of accessing any address in the memory system is the same and
independent of the current cache contents, with P(access an address) = 1/1024, the
cache block size is 4 words, and the memory word size is 4 bytes, then what is the
maximum possible number of tag bits under any kind of cache mapping, where the cache
can hold 4 blocks:

E. 6
F. 64
G. 10
H. 4
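Working (no solution is given in the source): 1024 equiprobable addresses imply a
10-bit address. A block is 4 words × 4 bytes = 16 bytes, giving a 4-bit offset. The
tag is largest under fully associative mapping, where there is no index field:
10 − 4 = 6 bits, option E.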

Q3. Assume all addresses in the memory system are equally probable to be accessed at
any time during program execution, and each access request is independent of the
others. In direct mapping, only 1 tag bit is saved in the tag directory for each
cache block. Then the hit ratio …

1. [NIELIT]

In a particular system it is observed that cache performance improves as a result of
increasing the block size of the cache. The primary reason behind this is:

A. Programs exhibit temporal locality
B. Programs have a small working set
C. Read operations are required more frequently than write operations
D. Programs exhibit spatial locality
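Answer (not given in the source): D. Enlarging the block brings in neighbouring
words on every miss, which pays off precisely because programs exhibit spatial
locality.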

Q1:[nat]
In a two-level memory hierarchy, the access time of cache memory is 12 ns and the
access time of main memory is 1500 ns. The hit ratio is 0.98. The average access
time of the two-level memory system is ______ ns.

Solution:

If nothing is mentioned, assume simultaneous access in the memory hierarchy.

Avg access time = (cache hit ratio × cache access time) + (cache miss rate × hit
rate in 2nd level × 2nd-level access time)

= 0.98 × 12 + 0.02 × 1 × 1500

= 41.76 ns
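A minimal C sketch (illustrative, not from the source) of the two two-level formulas
this document keeps reusing, checked against the question above:

#include <stdio.h>

/* Simultaneous access: on a miss, only the next level's time is paid. */
double amat_simultaneous(double h, double t_cache, double t_mem) {
    return h * t_cache + (1.0 - h) * t_mem;
}

/* Hierarchical access: the cache is always probed first, so a miss pays both. */
double amat_hierarchical(double h, double t_cache, double t_mem) {
    return t_cache + (1.0 - h) * t_mem;
}

int main(void) {
    /* h = 0.98, cache = 12 ns, main memory = 1500 ns, as above */
    printf("simultaneous: %.2f ns\n", amat_simultaneous(0.98, 12, 1500)); /* 41.76 */
    printf("hierarchical: %.2f ns\n", amat_hierarchical(0.98, 12, 1500)); /* 42.00 */
    return 0;
}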

Q2: Which of the following are generally true for an instruction cache and a data cache? [msq]

A. Spatial locality of instruction cache references > spatial locality of data cache references.
B. Temporal locality of instruction cache references < spatial locality of instruction cache references.
C. Data cache hit rate < instruction cache hit rate.
D. Data cache miss rate > data cache hit rate.

Answer: ABCD

• Spatial locality of instruction cache references > spatial locality of data cache
references. In this workload the data cache has essentially no spatial locality,
since the data are accessed randomly, while instruction accesses are highly
sequential: there are very few branches, and even the branches contribute to spatial
locality since they branch no more than 5 instructions back.
• Temporal locality of instruction cache references < spatial locality of
instruction cache references. I-cache temporal locality occurs only when there is a
branch; since branches are rare, it is lower than the spatial locality.
• Data cache hit rate < instruction cache hit rate.
• Data cache miss rate > data cache hit rate. The data cache has a terrible hit
rate: loads are almost always misses because they are random. Stores might always be
hits, but if the loads evict data from other loads, stores might miss too.

Q3:
Consider a memory system with a single L1 cache, where the average access time is
100 ns without L1 and 30 ns with L1. L1 has an access time of 10 ns. What is the hit
ratio of L1 required to have an average access time of 30 ns?
A. 90%
B. 70%
C. 80%
D. 81.8%

Solution:
C. 80%. With hierarchical access, 10 + (1 − h) × 100 = 30, so 1 − h = 0.2 and h = 0.8.

Q:
Consider a memory system with I-cache, D-cache, L2 cache and main memory access
times of 5 ns, 5 ns, 10 ns and 100 ns respectively, and respective hit ratios of
0.85, 0.80, 0.9 and 1. On executing a program, 40% of accesses go to the I-cache
and 60% to the D-cache. What is the average memory access time?

a. 6.6 ns
b. 14.7 ns
c. 8.6 ns
d. 12.2 ns

Solution:
c. 8.6 ns. I-side: 5 + 0.15 × (10 + 0.1 × 100) = 8 ns; D-side: 5 + 0.20 × (10 + 0.1 × 100) = 9 ns;
AMAT = 0.4 × 8 + 0.6 × 9 = 8.6 ns.

Q:
Suppose a processor supports software configuration of its strategy for handling
cache write hits/misses.

1. What strategy should be taken if:

(a) The processor mainly works on applications that require massive memory write
operations and data access.
(b) The processor mainly works on applications that require massive memory write
operations and data access but do not allow data inconsistency.

Answer 1. (a) Write back, to reduce main memory accesses.
(b) Write through, to keep data consistent.

Q:
Which of the following is true? [msq]
I: Write back updating of the cache leads to RAW hazards
II: The write through mechanism reduces the average memory access time
III: Write back and write through mechanisms reduce conflict misses
IV: Write back and write through mechanisms maintain cache coherence

a. I, II, and IV
b. I and IV
c. I, II, and III only
d. Only IV is true

Solution:
b. I and IV

Q:

The following program fragment executes on a single-core machine:

for (int i = 0; i < n; i++)
    sum = sum + A[i];

Which of the following cache updating techniques will result in the least access
time for the variables i and sum respectively?
a. write back for i and write through for sum
b. write through for i and write back for sum
c. write through for both i and sum
d. write back for both i and sum

Solution:
d. write back for both i and sum. Both variables are written on every loop
iteration, and write back keeps those repeated writes in the cache instead of
sending each one to main memory.
Q:
Which of the following statements are true?
a. The main objective for using cache memory is to increase the effective speed of the
memory system.
b. The main objective for using virtual memory is to increase the effective capacity of the
memory system.
c. The size of main memory is larger as compared to cache memory.
d. Main memory is faster as compared to cache memory.
Correct answers are (a), (b) and (c).
(a) is true since the average memory access time of a memory system using cache is
close to the access time of the cache, and the cache is faster than main memory.
(b) is true since in virtual memory the size of a program can be as large as the size of
the secondary memory.
(c) is true since cache memory is more expensive and hence its capacity is smaller
than main memory.
(d) is clearly false.

Q:

Which of the following is true for a memory hierarchy?

a. It tries to bridge the processor-memory speed gap.
b. The memory level closest to the processor has the highest speed.
c. The memory level farthest from the processor has the largest capacity.
d. It is based on the principle of locality of reference.
All the answers are true.
This follows from the basic definition of memory hierarchy design. The fastest and
smallest memory module is closest to the processor. The overall access time of the
memory system is close to the access time of the fastest memory level. This is
achieved by exploiting locality of reference.
Q. Which of the following statements are false?
a. Temporal locality arises because of loops in a program.
b. Spatial locality arises because of loops in a program.
c. Temporal locality arises because of sequential instruction execution.
d. Spatial locality arises because of sequential instruction execution.
Correct answers are (b) and (c).
Temporal locality says that an accessed word will be accessed again in the near
future, and this is due to the presence of loops.
Spatial locality says that if a word is accessed, then words in its neighborhood
will also be accessed in the near future. This happens due to sequential program
execution.

Q:

Hardware cache memories exploit spatial locality of reference

A. by remembering which pieces of data have been accessed recently


B. when data items are re-accessed frequently
C. by remembering which cache blocks (lines) have been written to
D. only if cache block (line) size is greater than 1 byte

Solution:
Answer is D.

Spatial locality refers to data near a recently accessed item being accessed in the
near future. To exploit it, when a datum is accessed, a block of data that includes
the accessed part (called a cache line) is fetched and placed in the cache. If the
cache line size were 1 byte, that would virtually rule out any chance of exploiting
spatial locality in the cache.

A is not correct. There is no "remembering" mechanism in a cache, though something
like it might be used for cache line replacement, as in LRU.

B is for temporal locality.

C relates to write-back caches, which track which lines have been written.

Q:

Consider a cache that works at 5× the speed of main memory, with a hit rate of 75%.
What is the speedup of memory performance if such a cache is used?

Answer: Let the cache access time be t; then the main memory access time is 5t. The
system's average access time is
Ta = 0.75t + 0.25 × 5t = 2.0t
so the performance speedup is 5t / 2.0t = 2.5.

There is a computer that has 64 MB of byte-addressable main memory. Instructions and
data are stored in separate caches, each of which has eight 64 B cache lines. The
data cache uses direct mapping. Now consider the two programs below.
Program A:
int a[64][64];
int sum_array1()
{
int i,j,sum=0;
for(i=0;i<64;i++)
for(j=0;j<64;j++)
sum += a[i][j];
return sum;
}
Program B:
int a[64][64];
int sum_array1()
{
int i,j,sum=0;
for(j=0;j<64;j++)
for(i=0;i<64;i++)
sum += a[i][j];
return sum;
}
Suppose int data is represented in 32-bit 2's complement and i, j, sum are stored
in specific registers. The array is stored in row-major order with start address
320 (decimal) in main memory. Answer the following questions.

1. What are the cache line numbers of the main memory blocks that contain a[0][31]
and a[2][2] respectively? (Cache line numbers start from 0.)
2. What are the data cache hit rates of programs A and B?

Answer 1. a. Address of a[0][31] = 320 + 31 × 4 = 444; memory block 444 / 64 = 6;
cache line = 6 mod 8 = 6.
b. Address of a[2][2] = 320 + (64 × 2 + 2) × 4 = 2376; memory block 2376 / 64 = 37;
cache line = 37 mod 8 = 5.
2. The size of array a is 64 × 64 × 4 = 2^14 B. Since a block is 64 B, it takes 2^8
main memory blocks to store the array. Under row-major traversal (program A) each
block is fetched once and its remaining 15 ints then hit, so 2^8 misses appear in
2^12 accesses: the hit rate is (2^12 − 2^8) / 2^12 = 93.75%. For program B, the
column-wise traversal evicts every block before it is reused, so the hit rate is 0.
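A small C sketch (an illustrative check, not part of the original question) that
simulates this direct-mapped data cache (8 lines × 64 B) over the access patterns of
programs A and B and reports the hit rates:

#include <stdio.h>

#define LINES 8
#define BLOCK 64
#define BASE  320   /* start address of a[64][64], as given */

static long tag[LINES];
static long hits, accesses;

static void access_int(long addr) {
    long block = addr / BLOCK;
    int  line  = (int)(block % LINES);
    accesses++;
    if (tag[line] == block) hits++;   /* block already resident */
    else tag[line] = block;           /* miss: fetch block into its line */
}

static double run(int row_major) {
    hits = accesses = 0;
    for (int i = 0; i < LINES; i++) tag[i] = -1;
    for (int u = 0; u < 64; u++)
        for (int v = 0; v < 64; v++) {
            int r = row_major ? u : v, c = row_major ? v : u;
            access_int(BASE + (long)(r * 64 + c) * 4);   /* a[r][c], 4-byte int */
        }
    return (double)hits / accesses;
}

int main(void) {
    printf("program A hit rate: %.2f%%\n", 100 * run(1));  /* 93.75% */
    printf("program B hit rate: %.2f%%\n", 100 * run(0));  /* 0.00%  */
    return 0;
}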

Q. Assume that a read request takes 50 nsec on a cache miss and 5 nsec on
a cache hit. While running a program, it is observed that 80% of the processor’s
read requests result in a cache hit. The average read access time is …………..
nsec.
Correct answer is 14.
Average read time = 0.80 x 5 + (1 – 0.80) x 50 = 14 nsec
Q. The memory access time is 1 nsec for a read operation with a hit in cache, 5
nsec for a read operation with a miss in cache, 2 nsec for a write operation with
a hit in cache, and 10 nsec for a write operation with a miss in cache. The
execution of a sequence of instructions involves 100 instruction fetch operations,
60 memory operand read operations, and 40 memory operand write operations.
The cache hit ratio is 0.9. The average memory access time (in nanoseconds) in
executing the sequence of instructions is:
a. 1.26
b. 1.68
c. 2.46
d. 4.52
Correct answer is (b).
Total number of read = 100 + 60 = 160
Total number of write = 40.
So, fraction of reads = 160 / (160 + 40) = 0.8
And, fraction of writes = 40 / (160 + 40) = 0.2
Average access time = 0.8 (0.9 x 1 + 0.1 x 5) + 0.2 (0.9 x 2 + 0.1 x 10) = 1.68

Q. Consider a two-level memory hierarchy with separate instruction and data caches
in level 1, and main memory in level 2. The clock cycle time is 1 ns. The miss
penalty is 20 clock cycles for both read and write. 2% of instructions are not found
in the I-cache, and 10% of data references are not found in the D-cache. 25% of the
total memory accesses are for data, and cache access time (including hit detection)
is 1 clock cycle. The average access time of the memory hierarchy will be ………….
nanoseconds.
Average access time = 0.75 (0.98 x 1 + 0.02 x 20) + 0.25 (0.90 x 1 + 0.10 x 20) = 1.76
ns

Direct Mapping

Q:

Consider a direct-mapped cache with 64 blocks and a block size of 16 bytes. Byte
address 1200 will map to block number ………… of the cache.

Correct answer is 11.
We first find the memory block number that byte address 1200 belongs to. Since the
size of a block is 16 bytes:
Byte addresses 0 to 15: block 0
Byte addresses 16 to 31: block 1
Byte addresses 32 to 47: block 2, and so on.
Byte address 1200 will belong to block number: floor(1200/16) = 75. For direct
mapped cache,
Cache block no. = (Memory block no.) MOD (No. of cache blocks) = 75 MOD 64 =
11.

Q. A cache memory system with a capacity of N words and a block size of B words is
to be designed. If it is designed as a direct-mapped cache, the length of the TAG
field is 10 bits. If it is designed as a 16-way set associative cache, the length of
the TAG field will be ………… bits.

Correct answer is 14.
For a 16-way set associative cache, 4 more bits are required in the TAG compared to
direct mapping, since 2^4 = 16: the index shrinks by 4 bits and those bits move into
the tag.

Q. Which of the following statements is true:


a. The implementation of direct mapping technique for cache requires expensive
hardware to carry out division.
b. The set associative mapping requires associative memory
for implementation.
c. A main memory block can be placed in any of the sets in set associative
mapping.
d. None of the above.
Correct answer is (b).
Direct mapping is the easiest to implement. Both fully associative and set
associative mappings require an associative memory. (c) is false: in set associative
mapping, a main memory block maps to exactly one set, though it can be placed in any
of the blocks within that set.

Q. For two direct-mapped cache designs, (a) with a 32-bit address and (b) with a
16-bit address, the following bits of the address are used to access the cache:

       Tag      Index    Offset
  a    31-10    9-5      4-0
  b    15-10    9-4      3-0

1. What is the cache line size (in words)?
2. How many entries does the cache have?
3. What is the ratio of the total bits required for such a cache implementation to
the data storage bits?

Answer: 1. a. The offset is bits 4-0 (5 bits), so the line size is 2^5 bytes = 32 bytes = 8 words.
b. The offset is bits 3-0 (4 bits), so the line size is 2^4 bytes = 16 bytes = 4 words.
2. a. The index is bits 9-5 (5 bits); for a direct-mapped cache this gives 2^5 = 32 entries.
b. The index is bits 9-4 (6 bits); for a direct-mapped cache this gives 2^6 = 64 entries.
3. a. Total bits = 32 entries × (1 valid bit + 22 tag bits + 32 × 8 data bits) = 32 × 279;
data bits = 32 entries × 32 × 8 = 32 × 256; ratio = 279/256 = 1.09.
b. Total bits = 64 entries × (1 valid bit + 6 tag bits + 16 × 8 data bits) = 64 × 135;
data bits = 64 entries × 16 × 8 = 64 × 128; ratio = 135/128 = 1.05.

Q:

Suppose there is a byte-addressable computer with a main memory of size 256 MB and a
cache with 8 lines. Each cache line is 64 B. Assume direct mapping in the memory
hierarchy. Calculate the following items.
1. How many bits do we need at least for the main memory address? The total cache
size (in bits)?
2. The line number corresponding to main memory address 2333 (decimal).

Answer 1. All of the following answers are correct (analysing one scenario is enough).

(a) Assuming the main memory address space is exactly 256 MB, we have a 28-bit main
memory address; the tag is 28 − 3 − 6 = 19 bits, so the total cache size is
8 × (1 + 19 + 512) = 4256 bits.
(b) Assuming a 32-bit main memory address, the total cache size is
8 × (1 + 23 + 512) = 4288 bits.
(c) Assuming a 16-bit main memory address, the total cache size is
8 × (1 + 7 + 512) = 4160 bits.
2. Both of the following answers are correct:
(a) Counting blocks from 1: ⌈2333 / 64⌉ = 37, line number = 37 mod 8 = 5.
(b) Counting blocks from 0: ⌊2333 / 64⌋ = 36, line number = 36 mod 8 = 4.

Q2: A cache splits a 32-bit address as follows:

bits 0 - 3 = offset
bits 4 - 14 = index
bits 15 - 31 = tag

What is the size of the cache? How much space is required to store the tags for
the cache?

Solution:

Size of a cache line: 2^(offset bits) = 2^4 = 16 bytes
Number of cache lines: 2^(index bits) = 2^11 = 2048
Total cache size: 16 × 2048 = 32 KB
Total tag size: 17 × 2048 = 34 Kbits
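A small C sketch (illustrative, not from the source) that splits a 32-bit address
into these tag/index/offset fields:

#include <stdio.h>
#include <stdint.h>

/* Field widths from the question: 4 offset bits, 11 index bits, 17 tag bits. */
#define OFFSET_BITS 4
#define INDEX_BITS  11

int main(void) {
    uint32_t addr   = 0xDEADBEEF;                     /* example address */
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    printf("tag=0x%X index=0x%X offset=0x%X\n", tag, index, offset);
    return 0;
}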


Set Associative

Q:
A computer uses a 2-way set associative cache of size 128 KB with a block (line)
size of 32 B. The cache accepts 32-bit addresses of the form b31 b30 … b2 b1 b0,
where b31 is the most significant address bit and b0 the least significant. Which
bits are used by the cache controller for indexing into the cache directory?

Solution:
block size = 32 B → 5 offset bits

cache size = 128 KB

number of cache lines = cache size / block size = 128 KB / 32 B = 4096

number of cache sets = number of cache lines / associativity = 4096 / 2 = 2048 → 11 set bits

So the 32-bit address divides as:
tag = b31-b16 (16 bits), set = b15-b5 (11 bits), offset = b4-b0 (5 bits)

A method of addressing a cache directory, comprising:

computing a parity of an address tag field within a presented address;

computing an index including:

the computed parity within a bit of the index; and

an index field within the presented address within remaining bits of the index.

The index field of an address maps to low order cache directory address lines.
The remaining cache directory address line, the highest order line, is indexed
by the parity of the address tag for the cache entry to be stored to or
retrieved from the corresponding cache directory entry. Thus, even parity
address tags are stored in cache directory locations with zero in the most
significant index/address bit, while odd parity address tags are stored in cache
directory locations with one in the most significant index/address bit. The
opposite arrangement (msb 1=even parity; msb 0=odd parity) may also be
employed, as may configurations in which parity supplies the least significant bit
rather than the most significant bit.
Now, the MSB of the set (s) field is used as a parity bit, and the remaining set
field bits are used for indexing along with the parity bit.

So clearly b15 b14 b13 b12 b11 b10 b9 b8 b7 b6 b5 are used for indexing.

Q:
Which of the following statements is/are FALSE?
I: In set associative mapping, if the set size is reduced to 1, it reduces to fully
associative mapping.
II: In set associative mapping, if only one set is present, it reduces to direct
mapping.

a. I only
b. II only
c. Both I and II
d. None

Solution:
c. Both I and II. It is the other way around: a set size of 1 block gives direct
mapping, and a single set gives fully associative mapping.

Q:

A byte-addressable computer uses 32-bit addresses to access main memory. Suppose the
data cache has a size of 4 KiB and is 8-way set-associative, with a block size of 16 B.
1. How many bits are in the Tag, Index and Offset fields?
2. Calculate the total cache size (in bits) if it uses write back and LRU
replacement.

Answer 1. The number of blocks is 4 KiB / 16 B = 2^8. Therefore there are
2^8 / 8 = 2^5 sets, needing 5 index bits. Since the computer is byte-addressable,
the offset is log2(16 B) = 4 bits. The tag is 32 − 5 − 4 = 23 bits.
2. (23 tag bits + 16 × 8 data bits + 1 valid bit + 1 dirty bit + 3 LRU bits) × 2^8
= 39936 bits.

Q:

Several parameters impact the overall size of the page table. Listed below are the
key page table parameters.

Table 2 (Q6): Virtual Address Size: 32 bits | Page Size: 16 KiB | Page Table Entry Size: 4 bytes

1. Given the parameters shown above, calculate the total page table size for a
system running 5 applications that each utilize half of the available memory (half
of the 32-bit virtual address space for each running application).
2. A cache designer wants to increase the size of a virtually indexed, physically
tagged cache. Given the page size shown above, is it possible to make a 64 KiB
direct-mapped cache, assuming 2 words per block?
3. How could the designer increase the data size of the cache?

Answer 1. The virtual page number is 32 − log2(16384) = 32 − 14 = 18 bits, so a full
page table has 2^18 entries. All five page tables would require
5 × (2^18 × 4)/2 bytes = 2621440 B.
2. The page offset consists of address bits 13 down to 0, so the LSB of the physical
tag is address bit 14. A 64 KiB direct-mapped cache with 2 words per block has
8-byte blocks and thus 64 KiB / 8 B = 8192 blocks; its index field would span
address bits 16 down to 3 (13 index bits, 1 word-offset bit, 2 byte-offset bits).
The index would therefore reach past bit 13 into the physical tag, so such a cache
is not possible with this page size.
3. The designer would instead need to make the cache 4-way set associative to
increase its size to 64 KiB.

Q:
A computer system has an 8K-word cache organized in a block-set-associative manner,
with 8 blocks per set and 32 words per block. The number of bits in the SET and WORD
fields of the main memory address format is:
a. 6,4
b. 4,5
c. 5,5
d. 8,5
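Solution (not given in the source): number of cache blocks = 8192 words / 32 words
per block = 256; number of sets = 256 / 8 = 32 = 2^5, so SET = 5 bits; 32 words per
block gives WORD = 5 bits. Answer: c. 5, 5.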

Q:
The main memory of a computer has N blocks while the cache has N/m blocks. If the
cache uses a set associative mapping scheme with 4 blocks per set, then block k of
main memory maps to the set:

a. (k mod (N/m) ) of the cache


b. (k mod (N/4m) ) of the cache
c. (4k mod 4m) of the cache
d. (k mod 4m) of the cache
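Solution (not given in the source): the cache holds N/m blocks, and with 4 blocks
per set it has N/(4m) sets; block k therefore maps to set k mod (N/4m). Answer: b.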
Q:
Consider a 256 KB 4-way set associative cache with a block size of 64 bytes. Main
memory is 2 GB. The number of bits used for tag, set and word respectively will be:

(tag, set, word) in that order

a) 15, 10, 6
b) 9, 16, 6
c) 10, 15, 6
d) 8, 17, 6

Answer: a) 15, 10, 6. (2 GB memory → 31-bit address; lines = 256 KB / 64 B = 4096;
sets = 4096 / 4 = 1024 → 10 set bits; 6 offset bits; tag = 31 − 10 − 6 = 15.)

Q:
Consider a system with the following sequence of memory block accesses: 16, 0, 16,
0, 4, 0, and 4. The cache has 16 blocks and main memory has 64 blocks. In this case,
which of the mapping techniques below results in the least number of misses together
with the least hardware complexity (i.e., the fewest tag comparators)?

a. Direct mapping
b. 2-way set associative
c. 4-way set associative
d. Fully associative

Solution:
2-way set associative
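Reasoning (not worked in the source): with direct mapping, blocks 16 and 0 both map
to line 0 (16 mod 16 = 0) and keep evicting each other, giving 5 misses in the 7
accesses. With 2-way set associative mapping there are 8 sets (2 comparators): 16
and 0 both map to set 0 but fit in its two ways, and 4 maps to set 4, so only the 3
compulsory misses occur. 4-way and fully associative mapping also give 3 misses but
need 4 and 16 comparators respectively, so 2-way is the least complex option that
achieves the minimum misses.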

Q: Consider a system with a block size of p words. What is the range of words
present in the kth block of main memory?

a. k*p to (k+1)*p - 1
b. k*p^2 to (k+1)*p^2
c. k*2^p to (k+1)*2^p - 1
d. k*2^p to (k)*2^(p+1) - 1
Solution:
k*p to (k+1)*p - 1

Q:
A computer has a 512 KB, 8-way set associative data cache with a block size of 32 B.
The processor sends 32-bit addresses to the cache controller. Each cache tag
directory entry contains, in addition to the address tag, 2 valid bits, 1 modified
bit and 1 replacement bit.

A. What will be the number of tag bits?

a. 11
b. 16
c. 21
d. 12

Solution:
b. 16 (lines = 512 KB / 32 B = 2^14; sets = 2^14 / 8 = 2^11 → 11 set bits; offset =
5 bits; tag = 32 − 11 − 5 = 16)

B. What will be the size of the cache tag directory?
a. 40 KB
b. 320 KB
c. 192 KB
d. 32 KB

Solution:
a. 40 KB (each entry is 16 + 2 + 1 + 1 = 20 bits; 2^14 entries × 20 bits = 320 Kbits = 40 KB)
Q:
Consider a 2-way set associative cache with 4 blocks. The memory blocks are
requested in the order:

4, 6, 3, 8, 5, 6, 0, 15, 6, 17, 20, 15, 0, 8

If LRU is used for block replacement, then memory block 17 will be in cache block
____ and set number ____ (assuming the first block starts from 0).

Answer: cache block number 3 and set number 1. (There are two sets, 0 and 1:
even-numbered memory blocks map to set 0, odd-numbered to set 1. In set 1 the
accesses are 3, 5, 15, 17, 15, so 15 evicts 3 from cache block 2 and 17 evicts 5
from cache block 3.)

Q. A computer system uses 32-bit memory addresses and it has a main memory consisting
of 1G bytes. It has a 4K-byte cache organized in the block-set-associative manner, with 4
blocks per set and 64 bytes per block.
(a) Calculate the number of bits in each of the Tag, Set, and Word fields of the
memory address.
(b) Assume that the cache is initially empty. Suppose that the processor fetches 1088
words of four bytes each from successive word locations starting at location 0. It then
repeats this fetch sequence nine more times. If the cache is 10 times faster than the
memory, estimate the improvement factor resulting from the use of the cache.
Assume that the LRU algorithm is used for block replacement.

Solution:

Here number of words that are in 1 block = Block size / Word size

= 64 B / 4 B = 16
Number of cache lines = Cache size / Block size

= 4 KB / 64 B

= 64

Number of sets hence = 64 / 4 = 16 sets

Also, the number of words that can be accommodated in the cache until it is full
= 4 KB / 4 B = 1024

Now we trace the access sequence to determine whether each word access results in a
hit or a miss:

a) For word number 0 - 15 , block number = 0 ..

Thus set number = block number % number of sets = 0 mod 16 = 0

b) For word number 16 - 31 , block number = 1

Thus set number = 1

Thus continuing in the same manner till word no 1023 (at this point cache is full) , let us
mention the block number contained by each set :

Set number 0 : 0 16 32 48

Set number 1 : 1 17 33 49

Set number 2 : 2 18 34 50

and so on till .....

Set number 15 : 15 31 47 63

Now when word numbers 1024-1039 come, i.e. block number 64 comes, the cache is full,
so a capacity miss occurs and LRU replacement comes into the picture. Block number 0
in set 0 is least recently used, so it is replaced. Set 0 blocks now appear as: 64 16 32 48

Similarly, set 1 will be updated when block number 65 is accessed: 65 17 33 49
Set 2 will be updated when block number 66 is accessed: 66 18 34 50
Set 3 will be updated when block number 67 is accessed: 67 19 35 51

In this way all words (0 - 1087) are covered, touching 68 blocks.

Thus the number of cache misses in the 1st iteration = 64 (compulsory) + 4 (capacity) = 68

In the subsequent nine iterations, where words (0 - 1087) are accessed again, sets
4 - 15 remain unaffected: the words belonging to those sets are already in the cache
and those sets never face replacement. So from now on we focus on the first 4 sets,
where replacement occurs on every reuse.

When word numbers 0-15 come, i.e. block number 0 comes, the cache is full, so a miss
occurs and LRU replaces block number 16 in set 0 (the least recently used). Set 0
now appears as: 64 0 32 48

Similarly set 1 is updated to: 65 1 33 49
set 2 is updated to: 66 2 34 50
set 3 is updated to: 67 3 35 51

Then when block number 16 comes, set 0 is updated to: 64 0 16 48
block number 17 comes, set 1 is updated to: 65 1 17 49
block number 18 comes, set 2 is updated to: 66 2 18 50
block number 19 comes, set 3 is updated to: 67 3 19 51

In this way further misses keep occurring in these sets: each of the 4 sets sees 5
misses per iteration (blocks 0, 16, 32, 48, 64 for set 0, and so on).

Hence the number of misses per iteration = 4 × 5 = 20

The number of misses is the same for each subsequent iteration.

Hence the total number of misses in 10 iterations = 68 (first iteration) + 9 × 20
(next 9 iterations) = 68 + 180 = 248 misses

Thus speedup obtained = Time without cache / Time with cache.

Counting block accesses (68 per iteration, 10 iterations) and taking main memory
access time = 10 × cache access time = 10t (as given):

Time without cache = 68 × 10 × 10t = 6800t

Time with cache = [68 × 11t]                      (first iteration: every access
                                                   misses, costing cache + memory time)
                + 9 × [20 × 11t + 48 × 1t]        (each later iteration: 20 misses
                                                   and 48 hits)
                = 748t + 9 × 268t = 3160t

Speedup = 6800t / 3160t = 2.15

Thus the performance improvement (speedup) = 2.15
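As a check, here is a short C sketch (not part of the original solution) that
simulates the 16-set, 4-way LRU cache over the 68 blocks touched per pass and
reproduces the miss count and speedup; the block-level timing model (11t per miss,
1t per hit, 10t per access without a cache) follows the solution above:

#include <stdio.h>

#define SETS 16
#define WAYS 4

int main(void) {
    int  tag[SETS][WAYS], age[SETS][WAYS];
    long clock = 0, misses = 0, accesses = 0;

    for (int s = 0; s < SETS; s++)
        for (int w = 0; w < WAYS; w++) { tag[s][w] = -1; age[s][w] = 0; }

    for (int pass = 0; pass < 10; pass++)
        for (int b = 0; b < 68; b++) {          /* words 0..1087 span blocks 0..67 */
            int s = b % SETS, hit = -1, victim = 0;
            accesses++;
            for (int w = 0; w < WAYS; w++)
                if (tag[s][w] == b) hit = w;
            if (hit < 0) {                      /* miss: fill LRU (or empty) way */
                misses++;
                for (int w = 1; w < WAYS; w++)
                    if (age[s][w] < age[s][victim]) victim = w;
                tag[s][victim] = b;
                hit = victim;
            }
            age[s][hit] = ++clock;              /* mark as most recently used */
        }

    long hits = accesses - misses;
    printf("misses = %ld\n", misses);           /* 248 */
    printf("speedup = %.2f\n",
           (10.0 * accesses) / (11.0 * misses + 1.0 * hits)); /* 2.15 */
    return 0;
}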

Types of Misses

Q:
Which one of the following is correct?
A. Compulsory misses can be reduced by increasing the total cache
size.
B. Capacity misses can be reduced by increasing the block size.
C. Conflict misses may be increased by increasing the value of
associativity.
D. Compulsory misses can be reduced by increasing the cache block
size.
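Answer (not given in the source): D. Compulsory (cold-start) misses occur on the
first access to a block; a larger block brings in more neighbouring words per miss,
so fewer first accesses miss. A larger total cache reduces capacity misses, not
compulsory ones, and higher associativity reduces (not increases) conflict misses.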
Q:
Which of the following statements is false for cache misses?
a. Compulsory misses can be reduced by decreasing the cache block size.
b. Capacity misses can be reduced by decreasing the total size of the cache.
c. Conflict misses can be reduced by decreasing the value of cache associativity.
d. Compulsory misses can be reduced by prefetching cache blocks.

Correct answers are (a), (b) and (c).
This follows from the definitions of compulsory, capacity and conflict misses: each
of (a)-(c) states the opposite of the actual remedy.

Q:
Match the following:
A. Capacity Misses      I. Fully Associative cache
B. Compulsory Misses    II. Set Associative cache
C. Conflict Misses      III. Direct Mapped cache
                        IV. Cold Start cache

a. A-III, B-IV, C-II
b. A-I, B-I, C-II and III
c. A-I, B-IV, C-II and III
d. A-I, B-II, C-III

Solution:
c. A-I, B-IV, C-II and III

Q. How can the cache miss rate be reduced?


a. By using larger block size
b. By using larger cache size
c. By reducing the cache associativity
d. None of the above
Correct answers are (a) and (b).
If the cache block size is increased, the number of cache misses is generally
reduced. The same holds for larger caches, which can store more blocks. However, if
the associativity is reduced, there is less choice in cache block placement, which
can increase the number of misses.

Q: Consider a physical memory of 1MB size and a direct mapped cache of 8KB size with
block size 32 bytes.Let there be a sequence of block accesses given by
2,216,100,256,2048,728,256,216
Find the number of compulsory, capacity, conflict miss.

Solution:
In one transfer, 32 bytes of data move between main memory and the cache. There are
1 MB / 8 KB = 128 main memory blocks that map to each cache block, so we need
log2(128) = 7 tag bits.

• The main memory address splits into three fields: tag : index : offset

For this configuration we have a 20-bit physical address: 7 tag bits, 8 index bits
(2^8 = 256 entries in the cache, each of size 32 bytes, totalling 8 KB) and 5 offset
bits.

Number of blocks in the cache = 8 KB / 32 B = 256

To find which cache entry a memory block maps to, we take the block number mod 256:

2    -> 2    Compulsory miss
216  -> 216  Compulsory miss
100  -> 100  Compulsory miss
256  -> 0    Compulsory miss
2048 -> 0    Compulsory miss
728  -> 216  Compulsory miss
256  -> 0    Conflict miss
216  -> 216  Conflict miss

Total: 6 compulsory misses, 2 conflict misses, 0 capacity misses.

Multilevel-cache:

Q. Suppose that in 1000 memory references there are 40 misses in the L1 cache and 10
misses in the L2 cache. If the miss penalty of L2 is 200 clock cycles, the hit time
of L1 is 1 clock cycle, and the hit time of L2 is 15 clock cycles, the average
memory access time will be ……………….. clock cycles.

L1 hit ratio = (1000 − 40) / 1000 = 0.96
Local L2 hit ratio = (40 − 10) / 40 = 0.75 (of the 40 references that miss in L1, 30 hit in L2)
Average access time = 0.96 × 1 + 0.04 × [0.75 × 15 + 0.25 × 200] = 0.96 + 0.04 × 61.25 = 3.41

Q:
In a two-level cache system, the access times of the L1 and L2 caches are 1 and 8
clock cycles respectively. The miss penalty from the L2 cache to main memory is 18
clock cycles. The miss rate of the L1 cache is twice that of L2. The average memory
access time of the cache system is 2 cycles. The miss rates of the L1 and L2 caches
respectively are:
a. 0.130 and 0.065
b. 0.056 and 0.111
c. 0.0892 and 0.1784
d. 0.1784 and 0.0892
Correct answer is (a).
Let the miss rate of the L2 cache be x, so the miss rate of the L1 cache is 2x. Then
AMAT = (1 − 2x) · 1 + 2x · [(1 − x) · 8 + x · 18] = 2 (given)
which simplifies to 20x² + 14x − 1 = 0. Solving, x ≈ 0.065, so the miss rates are
0.130 and 0.065.
Q:
Suppose that in 250 memory references there are 30 misses in the first-level cache
and 10 misses in the second-level cache. Assume the miss penalty from the L2 cache
to memory is 50 cycles, the hit time of L2 is 10 cycles, and the hit time of L1 is 5
cycles. If there are 1.25 memory references per instruction, the average stall
cycles per instruction is __________

Answer:

Memory stall cycles arise when the CPU has to wait for memory. Here we assume no
stall cycles on an L1 hit, so the CPU normally accesses L1 only. When something is
not found there, we charge the L1 miss penalty (not a separate L2 miss penalty,
because that is folded into the L1 miss penalty):

Miss penalty of L1 = (hit time in L2) + (local miss rate in L2) × (miss penalty of L2)
= 10 cycles + (10/30) × 50 cycles
= 80/3 cycles

Memory stall cycles per instruction
= (memory references per instruction) × (miss rate in L1) × (miss penalty of L1)
= 1.25 × (30/250) × (80/3)
= 4
Q:
Consider n unique memory block accesses on a fully associative cache with k
blocks..What is the necessary relation required to ensure maximum number of
misses using FIFOreplacement policy ?

a. n=k+1
b. n<k
c. n=k
d. n<=k

Solution:
a. n = k + 1. If the n blocks are accessed repeatedly in cyclic order, FIFO evicts
each block just before it is needed again, so every access misses.

Q: A basic computer system has a single-core processor where memory operations take
40% of execution time. An enhancement called L1 cache speeds up 60% of memory
operations by a factor of 4. Another enhancement called L2 cache speeds up half of
the remaining 40% of memory operations by a factor of 2. What is the overall speedup
of the system? (Round off to 3 decimal digits)

Solution:
1.282
New time = 0.60 (non-memory) + 0.24/4 (L1-accelerated) + 0.08/2 (L2-accelerated)
+ 0.08 (unaffected memory) = 0.60 + 0.06 + 0.04 + 0.08 = 0.78; speedup = 1/0.78 = 1.282.
