Memory Organization
Caches:
Q1-Q2. There is an instruction QUEUE in the CPU that can hold pre-fetched
instructions. With the "INSTRUCTION QUEUE" and the "GENERAL PURPOSE
REGISTERS", the CPU now makes 25% fewer memory accesses; the internal bus in the
CPU circuitry serves these accesses by itself, so no memory access signal
reaches the cache or main memory. Earlier, the cache hit ratio was 0.8, cache access
time = 100 ns, and main memory access time = 500 ns.
Q1. What is the speedup achieved in the average memory access time of the memory
system due to having the "INSTRUCTION QUEUE": ___________________________ 1 (no
change in the memory access time of the memory system)
Q2. If earlier a program took 1000 ns to execute (assuming only memory access time
counts in program execution; CPU time is negligible), what is the new execution time:
_____________ ns 750 (25% fewer accesses, so 0.75 × 1000 ns)
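Both answers can be reproduced with a minimal sketch, assuming the instruction queue simply removes 25% of the accesses while the surviving accesses behave exactly as before (hierarchical cache access assumed for illustration):

```python
# Sketch checking Q1-Q2. Assumption: the instruction queue eliminates 25% of
# the CPU's memory accesses; the remaining accesses see the same per-access
# average time, so the memory system's AMAT itself is unchanged.

def amat(hit_ratio, t_cache, t_mem):
    # Hierarchical access: probe the cache first, go to memory on a miss.
    return hit_ratio * t_cache + (1 - hit_ratio) * (t_cache + t_mem)

t_before = amat(0.8, 100, 500)     # per-access average before the queue
t_after = t_before                 # Q1: unchanged, so speedup = 1
new_exec_time = 1000 * (1 - 0.25)  # Q2: 25% fewer accesses
print(t_after / t_before, new_exec_time)  # 1.0 and 750.0
```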
Q3-Q5. The empirically analysed probability of accessing main memory block number x
while program A is under execution, for a particular instance of the "VIRTUAL
MEMORY to MAIN MEMORY" mapping, is:
Q3. Assuming the same "VIRTUAL MEMORY to MAIN MEMORY" mapping and the same
execution conditions for program A, find the hit ratio of the cache at this
instance, when the cache of size 8 blocks holds the blocks [1, 2, 3, 8, 5, 6, 7, 9].
What is the hit rate of the cache at the given instance for the given data?
A. 1
B. 0.4
C. 0.8
D. 0.6
Q4. Consider that, as per the block replacement policy, only 3 blocks can be allocated
in the cache to a process: then at minimum how many block misses are required after this
snapshot of the cache, to make the cache hit rate greater than 0.6?
A. 4
B. 1
C. 2
D. 3
Q5. Consider a cache block size of 4 words and a memory word size of 16 bytes in a
byte-addressable main memory. Then the number of words in program A is:
A. 4
B. 256
C. 64
D. 16
Q6.
Consider a simultaneous-access cache memory system with
Cache access time = 100 ns
Main memory access time = 1000 ns
Hit ratio = h
If the variation of Tavg with the cache hit ratio h is the same in the simultaneous-access
arrangement and the hierarchical-access arrangement, then which of the following
conditions is necessary:
A. y<x
B.
Mapping:
Q1. If the probability of accessing any block in the memory system is the same and
independent of the current cache contents, with P(access a block) = 1/1024, then what
is the maximum possible number of tag bits in any kind of cache mapping:
A. 1024
B. 12
C. 10
D. 20
Q2. If the probability of accessing any address in the memory system is the same and
independent of the current cache contents, with P(access an address) = 1/1024,
the cache block size is 4 words, and the memory word size is 4 bytes, then what is
the maximum possible number of tag bits in any kind of cache mapping, where the cache
can hold 4 blocks:
A. 6
B. 64
C. 10
D. 4
Q3. Assume all addresses in the memory system are equally likely to be accessed at
any time during program execution, and each access request is independent of the others.
In direct mapping, only 1 tag bit is saved in the tag directory for each cache block. Then
the hit ratio
1. [NIELIT]
In a particular system it is observed that cache performance improves as a result of
increasing the block size of the cache. The primary reason behind this is:
Q1:[nat]
In a two-level memory hierarchy, the access time of cache memory is 12 ns and the access
time of main memory is 1500 ns. The hit ratio is 0.98, the average access time of the two
level memory system is ______ ns?
Solution:
Tavg = 0.98 × 12 + 0.02 × 1500 = 11.76 + 30 = 41.76 ns
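A quick check of the arithmetic (the stated 41.76 ns corresponds to the simultaneous-access formula; the hierarchical form 12 + 0.02 × 1500 would give 42 ns instead):

```python
# Simultaneous access assumed: cache and main memory are probed in parallel,
# so a miss costs only the memory access time.
t_avg = 0.98 * 12 + 0.02 * 1500
print(t_avg)  # 41.76
```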
Q2: Which of the following is true for instruction cache and data cache generally: [msq]
Q3:
Consider a memory system with a single L1 cache that has an average access
time of 100 ns without L1, and 30 ns with L1. L1 has an access time of 10 ns. What
is the hit ratio of L1 required to have an average access time of 30 ns?
A. 90%
B. 70%
C. 80%
D. 81.8%
Solution:
C. 80%. With hierarchical access, 10 + (1 − h) × 100 = 30, so 1 − h = 0.2 and h = 0.8.
Q:
Consider a memory system with I-cache, D-cache, L2 cache and main memory
access times of 5 ns, 5 ns, 10 ns and 100 ns respectively, and their respective hit
ratios are 0.85, 0.80, 0.9 and 1. On executing a program, 40% of instructions
access the I-cache and 60% of instructions access the D-cache. What is the average
memory access time?
a. 6.6ns
b. 14.7ns
c. 8.6ns
d. 12.2ns
Solution:
c. 8.6 ns. Through L2: 10 + 0.1 × 100 = 20 ns; I-side: 5 + 0.15 × 20 = 8 ns;
D-side: 5 + 0.20 × 20 = 9 ns; average = 0.4 × 8 + 0.6 × 9 = 8.6 ns.
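The same numbers fall out of a minimal sketch, assuming hierarchical access at every level:

```python
def amat(t_hit, miss_rate, t_below):
    # Hierarchical access: pay the hit time, plus the level below on a miss.
    return t_hit + miss_rate * t_below

t_l2 = amat(10, 1 - 0.9, 100)       # time seen below L1: 20 ns
t_instr = amat(5, 1 - 0.85, t_l2)   # I-cache path: 8 ns
t_data = amat(5, 1 - 0.80, t_l2)    # D-cache path: 9 ns
t_avg = 0.4 * t_instr + 0.6 * t_data
print(t_avg)  # 8.6
```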
Q:
Suppose a processor supports software configuration of the strategy for handling
cache write hits/misses.
Q:
Which of the following is true? [msq]
I: Write back updating of the cache leads to RAW hazards
II: Write through mechanism reduces the average memory access time
III: Write back and write through mechanisms reduce conflict misses
IV: Write back and write through mechanisms maintain cache coherence
a. I, II, and IV
b. I and IV
c. I, II, and III only
d. Only IV is true
Solution:
b. I and IV
Q:
Solution:
write back for both i and sum
Q:
Which of the following statements are true?
a. The main objective for using cache memory is to increase the effective speed of the
memory system.
b. The main objective for using virtual memory is to increase the effective capacity of the
memory system.
c. The size of main memory is larger as compared to cache memory.
d. Main memory is faster as compared to cache memory.
Correct answers are (a), (b) and (c).
(a) is true since the average memory access time of a system using a cache is close
to the access time of the cache, and the cache is faster than main memory.
(b) is true since in virtual memory the size of a program can be as large as the size of
the secondary memory.
(c) is true since cache memory is more expensive and hence its capacity is smaller
than main memory.
(d) is clearly false.
Q:
Solution:
Answer is D.
Spatial locality refers to data near a recently accessed item being accessed
in the near future. To make use of it, when a datum is accessed, a block of data
that includes the accessed part (called a cache line) is fetched and placed in the cache.
So, if this cache line size is 1 byte, that virtually rules out any chance of exploiting
spatial locality in the cache.
Q:
Consider a cache that works at 5× the speed of main memory, with a hit rate of 75%.
What is the speedup of memory performance if such a cache is used?
Answer: Let the cache access time be t; then the main memory access time is 5t. The
system average access time is
Ta = 0.75t + 0.25 × 5t = 2.0t
Then the performance speedup is 5t/2.0t = 2.5.
There is a computer that has 64 MB of byte-addressable main memory. Instructions and data
are stored in separate caches, each of which has eight 64 B cache lines. The data
cache uses direct mapping. Now there are two programs in the following form.
Program A:
int a[64][64];
int sum_array1()
{
    int i, j, sum = 0;
    for (i = 0; i < 64; i++)
        for (j = 0; j < 64; j++)
            sum += a[i][j];
    return sum;
}
Program B:
int a[64][64];
int sum_array1()
{
    int i, j, sum = 0;
    for (j = 0; j < 64; j++)
        for (i = 0; i < 64; i++)
            sum += a[i][j];
    return sum;
}
Suppose int data is represented in 32-bit 2's complement and i, j, sum are stored
in specific registers. Arrays are stored in row-major order with start address 320
(decimal) in the main memory. Answer the following questions.
1. What are the line numbers of the main memory blocks that contain a[0][31]
and a[2][2] respectively? (Cache line numbers start from 0)
2. What are the data cache hit rates of programs A and B?
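Both parts can be checked with a small hedged simulation. It assumes the base address 320 is decimal (and hence 64 B-aligned, since 320 = 5 × 64), and that only the array accesses touch the data cache, with i, j, sum kept in registers as stated:

```python
# Direct-mapped data cache: 8 lines of 64 B, 4-byte ints, array base 320.

def line_of(i, j):
    addr = 320 + (i * 64 + j) * 4   # byte address of a[i][j], row-major
    return (addr // 64) % 8         # block number mod number of lines

def hit_rate(order):
    cache = [None] * 8              # one block tag per line
    hits = 0
    for i, j in order:
        block = (320 + (i * 64 + j) * 4) // 64
        if cache[block % 8] == block:
            hits += 1
        else:
            cache[block % 8] = block  # miss: load the block into its line
    return hits / len(order)

print(line_of(0, 31), line_of(2, 2))  # part 1: lines 6 and 5
prog_a = hit_rate([(i, j) for i in range(64) for j in range(64)])
prog_b = hit_rate([(i, j) for j in range(64) for i in range(64)])
print(prog_a, prog_b)  # part 2: 0.9375 (= 15/16) and 0.0
```

Program B misses every time because, for a fixed j, the 64 accessed blocks all map to only two cache lines and evict one another before any reuse.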
Q. Assume that a read request takes 50 nsec on a cache miss and 5 nsec on
a cache hit. While running a program, it is observed that 80% of the processor’s
read requests result in a cache hit. The average read access time is …………..
nsec.
Correct answer is 14.
Average read time = 0.80 x 5 + (1 – 0.80) x 50 = 14 nsec
Q. The memory access time is 1 nsec for a read operation with a hit in cache, 5
nsec for a read operation with a miss in cache, 2 nsec for a write operation with
a hit in cache, and 10 nsec for a write operation with a miss in cache. The
execution of a sequence of instructions involves 100 instruction fetch operations,
60 memory operand read operations, and 40 memory operand write operations.
The cache hit ratio is 0.9. The average memory access time (in nanoseconds) in
executing the sequence of instructions is:
a. 1.26
b. 1.68
c. 2.46
d. 4.52
Correct answer is (b).
Total number of read = 100 + 60 = 160
Total number of write = 40.
So, fraction of reads = 160 / (160 + 40) = 0.8
And, fraction of writes = 40 / (160 + 40) = 0.2
Average access time = 0.8 (0.9 x 1 + 0.1 x 5) + 0.2 (0.9 x 2 + 0.1 x 10) = 1.68
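The weighted average above can be reproduced with a short sketch:

```python
reads = 100 + 60                # instruction fetches + operand reads
writes = 40
t_read = 0.9 * 1 + 0.1 * 5      # 1.4 ns per read
t_write = 0.9 * 2 + 0.1 * 10    # 2.8 ns per write
t_avg = (reads * t_read + writes * t_write) / (reads + writes)
print(t_avg)  # 1.68
```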
Direct Mapping
Q. A cache memory system with a capacity of N words and a block size of B words
is to be designed. If it is designed as a direct-mapped cache, the
length of the TAG field is 10 bits. If it is designed as a 16-way set-associative
cache, the length of the TAG field will be ………… bits.
Correct answer is 14.
For a 16-way set-associative cache, 4 more bits are needed in the TAG compared
to direct mapping, since 2^4 = 16 (16× fewer sets means 4 fewer index bits).
Q. For two direct-mapped cache designs, one with a 32-bit address and one with a
16-bit address, the following bits of the address are used to access the cache
(as below).
Q:
Suppose there is a byte-addressable computer with a main memory size of 256 MB and a
cache with 8 lines. Each cache line size is 64 B. Assume direct mapping in the
memory hierarchy. Calculate the following items.
1. How many bits do we need at least for the main memory address? The total
cache size (in bits).
2. The line number corresponding to main memory address 2333 (decimal).
Answer 1. All of the following answers are correct (note: analysing one scenario
is enough).
(a) Assume the main memory address space is exactly 256 MB; then we have a
28-bit main memory address, thus the total cache size in bits is 8 × (1 + 19
+ 512) = 4256 bits.
(b) Assume a 32-bit main memory address; the total cache size in bits is 8 × (1
+ 23 + 512) = 4288 bits.
(c) Assume a 16-bit main memory address; the total cache size in bits is 8 × (1
+ 7 + 512) = 4160 bits.
2. All of the following answers are correct:
(a) ⌈2333/64⌉ = 37, line number = 37 mod 8 = 5.
(b) ⌊2333/64⌋ = 36, line number = 36 mod 8 = 4.
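The 28-bit case can be checked mechanically; the floor interpretation is the conventional one, since the block containing an address is found by integer division:

```python
from math import log2

addr_bits = int(log2(256 * 2 ** 20))  # 28 bits address 256 MB exactly
offset_bits = int(log2(64))           # 6 bits for 64 B lines
line_bits = int(log2(8))              # 3 bits for 8 lines
tag_bits = addr_bits - line_bits - offset_bits   # 19
total_bits = 8 * (1 + tag_bits + 64 * 8)         # valid + tag + data, per line
block = 2333 // 64                    # block containing byte address 2333
print(addr_bits, total_bits, block, block % 8)   # 28 4256 36 4
```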
Set Associative
Q:
A computer uses a 2-way set-associative cache of size 128 KB with a block (line) size of
32 bytes. The cache accepts 32-bit addresses of the form b31 b30 … b2 b1 b0, where b31
is the most significant address bit and b0 is the least significant address bit. Which
bits are used by the cache controller for indexing into the cache directory?
Solution:
Block size = 32 B, so 5 offset bits (b4 - b0).
Number of lines = 128 KB / 32 B = 4096; number of sets s = 4096/2 = 2048, so 11
set-index bits (b15 - b5).
Tag = 32 − 11 − 5 = 16 bits (b31 - b16).
So bits b15 - b5 are used for indexing into the cache directory.
In a variant scheme, an index field within the presented address occupies the
low-order bits of the index. The index field of an address maps to the low-order
cache directory address lines. The remaining cache directory address line, the
highest-order line, is indexed by the parity of the address tag for the cache entry
to be stored to or retrieved from the corresponding cache directory entry. Thus,
even-parity address tags are stored in cache directory locations with zero in the
most significant index/address bit, while odd-parity address tags are stored in
cache directory locations with one in the most significant index/address bit. The
opposite arrangement (MSB 1 = even parity; MSB 0 = odd parity) may also be
employed, as may configurations in which parity supplies the least significant bit
rather than the most significant bit. In such a scheme, the MSB of the set field
is used as a parity bit and the remaining set-field bits are used for indexing,
along with the parity bit.
Q:
Which of the following statement(s) is/are FALSE?
I: In set-associative mapping, if the set size is reduced to 1, it reduces to fully
associative mapping.
II: In set-associative mapping, if only one set is present, it reduces to direct
mapping.
a. I Only
b. II Only
c. Both I and II
d. None
Solution:
c. Both I and II: a set size of 1 gives direct mapping (not fully associative), and a
single set gives fully associative mapping (not direct).
Q:
A byte-addressable computer uses 32-bit addresses to access main memory. Suppose the
data cache has a size of 4 KiB and is 8-way set-associative. Each block size
is 16 B.
1. How many bits in Tag, Index and Offset?
2. Calculate the total cache size (in bits) if it uses write back and LRU
replacement.
Answer 1. The number of blocks is 4 KiB / 16 B = 2^8. Therefore, 2^8/8 = 2^5 sets
need 5 bits to be indexed. Since the computer is byte-addressable, the number of
offset bits is log2(2^4 B) = 4. And the tag bits are calculated as: 32 − 5 − 4 = 23 bits.
2. (23 tag bits + 16 × 8 data bits + 1 valid bit + 1 dirty bit + 3 LRU bits) × 2^8 = 39936
bits.
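The same sizing arithmetic, as a sketch (the 3 LRU bits per block follow the accounting used above):

```python
from math import log2

cache_bytes, block_bytes, ways = 4 * 1024, 16, 8
blocks = cache_bytes // block_bytes        # 256 blocks
sets = blocks // ways                      # 32 sets
index_bits = int(log2(sets))               # 5
offset_bits = int(log2(block_bytes))       # 4
tag_bits = 32 - index_bits - offset_bits   # 23
# tag + data + valid + dirty + LRU, per block
bits_per_block = tag_bits + block_bytes * 8 + 1 + 1 + 3
print(tag_bits, bits_per_block * blocks)   # 23 39936
```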
Q:
There are several parameters that impact the overall size of the page table. Listed below
are key page table parameters.
Table 2: Q6
Virtual Address Size | Page Size | Page Table Entry Size
32 bits              | 16 KiB    | 4 bytes
1. Given the parameters shown above, calculate the total page table size for a
system running 5 applications that each utilize half of the memory available (half of
the 32-bit virtual address space for each running application).
2. A cache designer wants to increase the size of a virtually indexed, physically
tagged cache. Given the page size shown above, is it possible to make a
64KiB direct mapped cache, assuming 2 words per block?
3. How would the designer increase the data size of the cache?
Answer 1. The number of virtual page number bits is 32 − log2(16384) = 32 − 14 = 18 bits.
All five page tables would require 5 × (2^18 × 4)/2 bytes = 2,621,440 B.
2. The page index consists of address bits 13 down to 0, so the LSB of the tag
is address bit 14. A 64 KiB direct-mapped cache with 2 words per block
would have 8-byte blocks and thus 64 KiB / 8 B = 8192 blocks, and its
index field would span address bits 15 down to 3 (13 bits to index, 1 bit word
offset, 2 bits byte offset). As such, the LSB of the cache tag is address bit
16, which lies above the 14-bit page offset, so the cache cannot be both virtually
indexed and physically tagged with these parameters.
3. The designer would instead need to make the cache 4-way set-associative to
increase its size to 64 KiB.
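The page-table and cache arithmetic can be checked with a short sketch (5 applications, half of the 32-bit virtual space used per application, 16 KiB pages, 4 B entries):

```python
from math import log2

page_offset_bits = int(log2(16 * 1024))     # 14
vpn_bits = 32 - page_offset_bits            # 18
table_bytes = 5 * (2 ** vpn_bits * 4) // 2  # half the space per application
# 64 KiB direct-mapped cache with 8 B blocks: 13 index bits + 3 offset bits,
# so the tag LSB is address bit 16, above the 14-bit page offset.
blocks = 64 * 1024 // 8                     # 8192 blocks
index_bits = int(log2(blocks))              # 13
tag_lsb = index_bits + 3                    # bit 16
print(table_bytes, tag_lsb)  # 2621440 16
```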
Q:
A computer system has a 8K word cache organized in a block-set associative manner with
8 blocks per set and 32 words per block. The number of bits in the SET and WORD fields
of the main memory address format is:
a. 6,4
b. 4,5
c. 5,5
d. 8,5
Q:
The main memory of a computer has N blocks while the cache has N/m blocks. If the cache
uses set associative mapping scheme with 4 blocks per set, then block “k” of the main
memory maps to the set:
Q:
The sizes of the (tag, set, word) fields, in that order, are:
a) 15,10,6
b) 9, 16, 6
c) 10,15,6
d) 8, 17, 6
Answer: a) 15,10,6
Q:
Consider a system with the following sequence of memory block accesses: 16, 0,
16, 0, 4, 0, and 4.
The cache memory has 16 blocks and the main memory has 64 blocks. In this case,
which of the mapping techniques below results in the least number of misses and
the least hardware complexity (i.e., the fewest tag comparators)?
a. Direct mapping
b. 2-way set associative
c. 4-way set associative
d. Fully associative
Solution:
b. 2-way set associative: it achieves the minimum of 3 misses (also reached by 4-way
and fully associative) while needing only 2 tag comparators.
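A hedged simulation of the four mappings for this sequence, treating direct mapping as 1-way and fully associative as 16-way, with LRU replacement inside each set:

```python
def misses(seq, ways, lines=16):
    sets = [[] for _ in range(lines // ways)]  # each set is an LRU list
    count = 0
    for b in seq:
        s = sets[b % len(sets)]
        if b in s:
            s.remove(b)          # hit: refresh its LRU position
        else:
            count += 1
            if len(s) == ways:   # set full: evict the least recently used
                s.pop(0)
        s.append(b)
    return count

seq = [16, 0, 16, 0, 4, 0, 4]
print([misses(seq, w) for w in (1, 2, 4, 16)])  # [5, 3, 3, 3]
```

Direct mapping thrashes between blocks 16 and 0 (both map to line 0), while any associativity of 2 or more keeps them both resident.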
Q: Consider a system with a block size of p words. What is the range of words
present in the kth block of main memory?
a. k*p to (k+1)*p - 1
b. k*p^2 to (k+1)*p^2
c. k*2^p to (k+1)*2^p - 1
d. k*2^p to (k)*2^(p+1) - 1
Solution:
k*p to (k+1)*p - 1
Q:
A computer has a 512 KB, 8-way set-associative data cache with a block size of
32 B. The processor sends 32-bit addresses to the cache controller. Each cache
tag directory entry contains, in addition to the address tag, 2 valid bits, 1 modified
bit and 1 replacement bit.
A. What will be the number of tag bits?
a. 11
b. 16
c. 21
d. 12
Solution:
b. 16 (lines = 512 KB / 32 B = 16384; sets = 16384/8 = 2048, giving 11 set bits;
offset = 5 bits; tag = 32 − 11 − 5 = 16)
B. What will be the size of the cache tag directory?
a. 40KB
b. 320KB
c. 192KB
d. 32KB
Solution:
a. 40 KB (16384 entries × (16 + 2 + 1 + 1) bits = 327680 bits = 40 KB)
Q
Consider a 2-way set-associative cache with 4 blocks. The memory blocks are requested
in the order:
4, 6, 3, 8, 5, 6, 0, 15, 6, 17, 20, 15, 0, 8
If LRU is used for block replacement, then memory block 17 will be in cache block ____
and set no. ____. (assuming the first block starts from 0)
Answer: cache block number 3 and set number 1 (as there are two sets, 0 and 1)
Q. A computer system uses 32-bit memory addresses and it has a main memory consisting
of 1G bytes. It has a 4K-byte cache organized in the block-set-associative manner, with 4
blocks per set and 64 bytes per block.
(a) Calculate the number of bits in each of the Tag, Set, and Word fields of the
memory address.
(b) Assume that the cache is initially empty. Suppose that the processor fetches 1088
words of four bytes each from successive word locations starting at location 0. It then
repeats this fetch sequence nine more times. If the cache is 10 times faster than the
memory, estimate the improvement factor resulting from the use of the cache.
Assume that the LRU algorithm is used for block replacement.
Solution:
Number of words in one block = block size / word size = 64 B / 4 B = 16
Number of cache lines = cache size / block size = 4 KB / 64 B = 64, organized as
64/4 = 16 sets of 4 blocks.
Also, the number of words that can be accommodated in the cache until it is full = 4 K
/ 4 = 1024.
Now we trace the access sequence of words, i.e. whether each word access results in a
hit or a miss: the first word of each block misses and the remaining 15 hit.
Continuing in this manner up to word number 1023 (at which point the cache is full),
the block numbers contained in each set are:
Set number 0 : 0 16 32 48
Set number 1 : 1 17 33 49
Set number 2 : 2 18 34 50
...
Set number 15 : 15 31 47 63
Now when words 1024-1039 come, i.e. block number 64 comes, the cache is full,
hence a capacity miss occurs and LRU replacement comes into the picture. Block number
0 in set 0 is least recently used, hence it is replaced. Set 0 now appears
as: 64 16 32 48. Similarly, blocks 65, 66 and 67 replace the LRU blocks of sets
1, 2 and 3.
Total misses in the first iteration = 64 + 4 = 68
In the subsequent nine iterations, where words 0 - 1087 are accessed again, sets
4 - 15 remain unaffected, as the words belonging to these sets are already in the cache
and these sets face no block replacement. So from now on we focus only on the first
4 sets, where replacement occurs due to cache misses.
When words 0-15 come again, i.e. block number 0 comes, the cache is full, hence a
miss occurs and LRU replacement applies: block number 16 in set 0 is least recently
used, hence it is replaced. Set 0 now appears as: 64 0 32 48. In the same way, each
of sets 0 - 3 suffers 5 misses per iteration (blocks k, k+16, k+32, k+48, k+64 keep
evicting one another), i.e. 20 misses per iteration in total.
Total number of misses in 10 iterations = 68 (first iteration) + 9 × 20 (next
9 iterations)
= 68 + 180
= 248 misses
With cache access time τ and memory access time 10τ, and a miss fetching a 16-word
block, the improvement factor = time without cache / time with cache
= (10 × 1088 × 10τ) / (10 × 1088 × τ + 248 × 16 × 10τ) = 108800/50560
= 2.15
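A hedged simulation of part (b): 16 sets, 4-way LRU, and the 68-block (1088-word) sequence repeated 10 times. Counting one event per block touched is enough, since the 16 word accesses within a block are consecutive and all but the first always hit:

```python
def run():
    sets = [[] for _ in range(16)]   # LRU lists, most recently used last
    misses = 0
    for _ in range(10):
        for block in range(68):      # 1088 words * 4 B / 64 B = 68 blocks
            s = sets[block % 16]
            if block in s:
                s.remove(block)      # hit: refresh LRU position
            else:
                misses += 1
                if len(s) == 4:      # set full: evict least recently used
                    s.pop(0)
            s.append(block)
    return misses

m = run()
# Assuming cache time t, memory time 10t, 16-word block fills on a miss.
speedup = (10 * 1088 * 10) / (10 * 1088 * 1 + m * 16 * 10)
print(m, round(speedup, 2))  # 248 2.15
```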
Types of Misses
Q:
Which one of the following is correct?
A. Compulsory misses can be reduced by increasing the total cache size.
B. Capacity misses can be reduced by increasing the block size.
C. Conflict misses may be increased by increasing the value of associativity.
D. Compulsory misses can be reduced by increasing the cache block size.
Q:
Which of the following statements are false for cache misses?
a. Compulsory misses can be reduced by decreasing the cache block size.
b. Capacity misses can be reduced by decreasing the total size of the cache.
c. Conflict misses can be reduced by decreasing the value of cache associativity.
d. Compulsory misses can be reduced by prefetching cache blocks.
Correct answers are (a), (b) and (c).
This follows from the definitions: larger blocks reduce compulsory misses, a larger
cache reduces capacity misses, and higher associativity reduces conflict misses, so
each of (a), (b) and (c) has it backwards.
Q:
Match the following:
A. Capacity Misses      I. Fully Associative cache
B. Compulsory Misses    II. Set Associative cache
C. Conflict Misses      III. Direct Mapped cache
                        IV. Cold Start cache
a. A-III, B-IV, C-II
b. A-I, B-I, C-II and III
c. A-I, B-IV, C-II and III
d. A-I, B-II, C-III
Solution:
c. A-I, B-IV, C-II and III
Q: Consider a physical memory of 1 MB size and a direct-mapped cache of 8 KB size with
a block size of 32 bytes. Let there be a sequence of block accesses given by
2, 216, 100, 256, 2048, 728, 256, 216
Find the number of compulsory, capacity and conflict misses.
Solution:
In one go, 32 bytes of data are transferred between main memory and the cache. There
are 1 MB / 8 KB = 128 memory blocks that can map to a single cache entry, so we need
log2 128 = 7 tag bits.
The main address bits get split into three fields: tag : index : offset.
For the above configuration we have a 20-bit physical address, 7 bits of tag, 8 bits
of index (2^8 = 256 entries in the cache, each of size 32 bytes, making 8 KB) and
5 bits of offset.
Now, to find which cache line a memory block goes to, we just compute block mod 256.

Block   Line (mod 256)   Result
2       2                Compulsory miss
216     216              Compulsory miss
100     100              Compulsory miss
256     0                Compulsory miss
2048    0                Compulsory miss
728     216              Compulsory miss
256     0                Conflict miss (evicted by 2048)
216     216              Conflict miss (evicted by 728)

So there are 6 compulsory misses, 0 capacity misses and 2 conflict misses.
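The classification can be checked with a minimal sketch: a miss on a never-seen block is compulsory, and a miss on a previously seen block in this 256-line cache is a conflict (the working set of 6 blocks is far below capacity):

```python
cache = [None] * 256   # direct-mapped: one block tag per line
seen = set()
kinds = []
for block in [2, 216, 100, 256, 2048, 728, 256, 216]:
    line = block % 256
    if cache[line] == block:
        kinds.append("hit")
    elif block not in seen:
        kinds.append("compulsory")   # first-ever reference to this block
    else:
        kinds.append("conflict")     # seen before, evicted by a line conflict
    seen.add(block)
    cache[line] = block
print(kinds)
```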
Multilevel-cache:
Q:
In a two-level cache system, the access times of L1 and L2 caches are 1 and 8
clock cycles respectively. The miss penalty from L2 cache to main memory is 18
clock cycles. The miss rate of L1 cache is twice that of L2. The average memory
access time of the cache system is 2 cycles. The miss rates of L1 and L2 caches
respectively are:
a. 0.130 and 0.065
b. 0.056 and 0.111
c. 0.0892 and 0.1784
d. 0.1784 and 0.0892
Correct answer is (a).
Let the miss rate of the L2 cache be x.
So, the miss rate of the L1 cache is 2x.
Thus, the average memory access time is
AMAT = (1 − 2x)·1 + 2x·[(1 − x)·8 + x·18] = 2 (given)
which expands to 20x² + 14x − 1 = 0. Solving, we get
x ≈ 0.065, so the L1 miss rate is 2x ≈ 0.130.
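Solving the quadratic numerically (the exact root is about 0.0653, which the option list rounds to 0.065 and 0.130):

```python
from math import sqrt

# AMAT equation: (1 - 2x)*1 + 2x*((1 - x)*8 + x*18) = 2
# expands to 20x^2 + 14x - 1 = 0; take the positive root.
x = (-14 + sqrt(14 ** 2 + 4 * 20)) / (2 * 20)
print(x, 2 * x)  # L2 miss rate ~0.0653, L1 miss rate ~0.1307
```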
Q:
Suppose that in 250 memory references there are 30 misses in the first-level cache
and 10 misses in the second-level cache. Assume that the miss penalty from the L2
cache to memory is 50 cycles. The hit time of the L2 cache is 10 cycles. The hit
time of the L1 cache is 5 cycles. If there are 1.25 memory references per
instruction, then the average stall cycles per instruction is __________
Answer:
Memory stall cycles arise when the CPU has to wait for memory. Here, we can assume no
stall cycles for an L1 hit, so the CPU accesses L1 only. If something is not found
there, we need to consider the L1 miss penalty (not the L2 miss penalty separately,
because it is included in the L1 miss penalty):
Miss penalty of L1 = (hit time in L2) + (miss rate in L2) × (miss penalty of L2)
= 10 cycles + (10/30) × 50 cycles
= 80/3 cycles
Average stall cycles per instruction = 1.25 × (30/250) × 80/3 = 4 cycles.
Q:
a. n=k+1
b. n<k
c. n=k
d. n<=k
Solution:
a. n=k+1
Q: A basic computer system has a single-core processor where memory
operations take 40% of execution time. An enhancement called L1 cache speeds
up 60% of memory operations by a factor of 4. Another enhancement called L2
cache speeds up half of the remaining 40% of memory operations by a factor of 2.
What is the overall speedup of the system? (Round off to 3 decimal digits)
Solution:
Fraction of time sped up by L1 = 0.4 × 0.6 = 0.24 (by 4×); fraction sped up by L2
= 0.4 × 0.4 × 0.5 = 0.08 (by 2×).
New time = (1 − 0.24 − 0.08) + 0.24/4 + 0.08/2 = 0.68 + 0.06 + 0.04 = 0.78
Speedup = 1/0.78 = 1.282
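The Amdahl-style computation as a sketch:

```python
mem = 0.40                    # fraction of execution time in memory operations
l1_part = mem * 0.60          # 24% of total time, made 4x faster by L1
l2_part = mem * 0.40 * 0.5    # 8% of total time, made 2x faster by L2
untouched = 1 - l1_part - l2_part
new_time = untouched + l1_part / 4 + l2_part / 2
speedup = 1 / new_time
print(round(speedup, 3))  # 1.282
```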