
CHAPTER-5 Large and Fast: Exploiting Memory Hierarchy

 Performance of a computer depends on large and fast storage systems.
 We can take advantage of the Principle of Locality by implementing the memory of a computer as a memory hierarchy.
 There are multiple levels of memory with different speeds and sizes.

Principle of Locality: -
The principle of locality states that programs access a relatively small portion of their address space at any instant of time.

Types of locality: -
1. Temporal locality (locality in time): If a data location is referenced, then it will tend to be referenced again soon.
2. Spatial locality (locality in space): If a data location is referenced, data locations with nearby addresses will tend to be referenced soon.
 Most programs contain loops, so instructions and data are likely to be accessed repeatedly, showing high amounts of temporal locality.
 Since instructions are normally accessed sequentially, programs also show high spatial locality.
 Accesses to data also exhibit natural spatial locality. For example, sequential accesses to the elements of an array or a record will naturally have high degrees of spatial locality.

Memory hierarchy: -
 A structure that uses multiple levels of memories; as the distance from the processor increases, the size of the memories and the access time both increase.
 The faster memories are more expensive per bit than the slower memories and are thus smaller.
 Main memory is implemented from DRAM (dynamic random access memory), while the levels closer to the processor (caches) use SRAM (static random access memory).
 DRAM is less costly per bit than SRAM, although it is substantially slower. The price difference arises because DRAM uses significantly less area per bit of memory, and DRAMs thus have larger capacity for the same amount of silicon.
 The third technology, used to implement the largest and slowest level in the hierarchy, is usually magnetic disk.
 Figure 1 shows the memory hierarchy, with the faster memory closer to the processor and the slower, less expensive memory toward the lower levels.
 The data is similarly hierarchical: a level closer to the processor is generally a subset of any level further away, and all the data is stored at the lowest level.
 A memory hierarchy can consist of multiple levels, but data is copied between only two adjacent levels at a time.
 The goal is to present the user with as much memory as is available in the cheapest technology, while providing access at the speed offered by the fastest memory.

NOTE: -
 Memory hierarchies take advantage of temporal locality by keeping more recently accessed data items closer to the processor.
 Memory hierarchies take advantage of spatial locality by moving blocks consisting of multiple contiguous words in memory to the upper levels of the hierarchy.

[Figure 1: Structure of the Memory Hierarchy — as the distance from the processor increases, so does the size.]
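The two kinds of locality can be seen in the memory trace of a simple loop. The sketch below is illustrative only; the word addresses 100–103 are hypothetical:

```python
# Hypothetical word-address trace of a loop that scans a 4-element array
# stored at addresses 100..103 and runs over it 3 times.
trace = []
for _ in range(3):                      # the same code re-executes: temporal locality
    for addr in (100, 101, 102, 103):   # sequential element accesses: spatial locality
        trace.append(addr)

# Only 4 distinct locations account for all 12 accesses, so a small,
# fast memory holding just those words would serve most references.
assert len(trace) == 12
assert len(set(trace)) == 4
```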
Memory Hierarchy Terminologies: -
Block or line: -
The minimum unit of information that can be either present or not present in a cache is called a block or a line.
Hit & Miss: -
 If the data requested by the processor appears in some block in the upper level, this is called a hit. If the data is not found in the upper level, the request is called a miss.
 The lower level in the hierarchy is then accessed to retrieve the block containing the requested data.

Hit rate: -
The fraction of memory accesses found in a level of the memory hierarchy is called the hit rate.

Miss rate: -
The fraction of memory accesses not found in a level of the memory hierarchy.
Miss rate = 1 - hit rate

Hit time: -
The time required to access a level of the memory hierarchy, including the time needed to determine whether the access is a hit or a miss.

Miss penalty: -
The time required to fetch a block into a level of the memory hierarchy from the lower level, including the time to access the block, transmit it from one level to the other, insert it in the level that experienced the miss, and then pass the block to the requestor.

Design of a Cache Memory: -
 Cache memory is a small, fast and expensive memory placed between the processor and main memory to cope with the data transfer between a high-speed processor and a low-speed main memory.
 For executing any instruction, the processor searches for the data at a memory address in the cache memory. If the data is available (i.e. a cache hit), the processor continues its execution by fetching the data. If the data is not available in the cache memory (i.e. a cache miss), the processor is stalled, the data is fetched from main memory and placed in the appropriate location of the cache memory, and then the processor continues with the execution of the instruction.
 A cache memory is considered as a collection of fixed-size blocks.
 The number of blocks in a cache memory is usually a power of 2.

Important issues for the design of a Cache Memory: -
 When we copy a block of data from main memory to the cache, where exactly should it be placed? [Solution: Use a mapping technique such as direct mapping, set-associative mapping, or fully associative mapping]
 How can we tell if a word is already in the cache, or whether it has to be fetched from main memory first? [Solution: Use a tag field and a valid bit]
 If the cache is full and we need a new memory block, which existing cache block will be replaced? [Solution: Use a replacement policy such as LRU, LFU, FIFO or Random]
 How will write operations be handled by the memory system?

Direct Mapped Cache Design: -
 A cache structure in which each memory location is mapped to exactly one location in the cache is called a direct mapped cache.
 For example, almost all direct-mapped caches use this mapping to find a block:
(Block address) modulo (Number of blocks in the cache)
 If the number of entries in the cache is a power of 2, then the modulo can be computed simply by using the low-order log2(cache size in blocks) bits of the address.
 Thus, an 8-block cache uses the three lowest bits (8 = 2^3) of the block address.
 Notice that the index = the least significant bits (LSBs) of the address.
 If the cache holds 2^n blocks, the index = the n LSBs of the address.

Note: -
 A main memory is a collection of words.
 The word size may be 1 byte, 2 bytes, 4 bytes, 8 bytes and so on.
 Memory may be word addressable or byte addressable.
 In the byte addressing scheme (with 4-byte words), the first word starts at address 0 and the second word starts at address 4, as in the MIPS architecture.
 In the word addressing scheme, all bytes of the first word are located at address 0, and all bytes of the second word are located at address 1.

Word addressable memory of size 32 bytes, i.e. 8 words:

Address | Content
0       | Word 0
1       | Word 1
2       | Word 2
3       | Word 3
4       | Word 4
5       | Word 5
6       | Word 6
7       | Word 7

Byte addressable memory of size 32 bytes, i.e. 8 words:

Address | Content
0       | Byte 3  Byte 2  Byte 1  Byte 0
4       | Byte 7  Byte 6  Byte 5  Byte 4
8       | Byte 11 Byte 10 Byte 9  Byte 8
12      | Byte 15 Byte 14 Byte 13 Byte 12
16      | Byte 19 Byte 18 Byte 17 Byte 16
20      | Byte 23 Byte 22 Byte 21 Byte 20
24      | Byte 27 Byte 26 Byte 25 Byte 24
28      | Byte 31 Byte 30 Byte 29 Byte 28
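A minimal sketch of the modulo mapping (the function name is mine; the 8-block cache matches the example above):

```python
def cache_index(block_address: int, num_blocks: int) -> int:
    """(Block address) modulo (Number of blocks in the cache)."""
    return block_address % num_blocks

# With 8 = 2**3 blocks, the index is just the three lowest address bits.
assert cache_index(0b10110, 8) == 0b110
# The modulo equals masking off the low-order log2(num_blocks) bits:
assert cache_index(0b10110, 8) == 0b10110 & 0b111
```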
Case 1:
Example: Consider a word addressable main memory of size 16 words and a cache memory of size 4 blocks, each block containing one word.
Main memory size = 16 words = 2^4 words, so 4 bits are required to represent a memory address.
Cache memory size in number of blocks = 4 = 2^2, so 2 bits are required to represent a cache block index.
For a memory reference to address 14, i.e. (1110)2, the cache index is 14 % 4 = 2, i.e. (10)2.
Similarly, for a memory reference to address 7, i.e. (0111)2, the cache index is 7 % 4 = 3, i.e. (11)2.
For a memory reference to address 3, i.e. (0011)2, the cache index is 3 % 4 = 3, i.e. (11)2.

How to find data in the cache: -
 Because each cache location can contain the contents of a number of different memory locations, how do we know whether the data in the cache corresponds to a requested word?
 This question can be answered by using a field called the tag.

Tag: -
The tags contain the address information required to identify whether a word in the cache corresponds to the requested word. The tag needs only to contain the upper portion of the address, corresponding to the bits that are not used as an index into the cache.
 A valid bit is also added to indicate whether an entry contains a valid address or not. If the bit is not set, there cannot be a match for this block.
 When the system is initialized, all the valid bits are set to 0.
 When data is loaded into a particular cache block, the corresponding valid bit is set to 1.

Steps taken for handling a memory access by the processor: -
 The lowest n bits of the address index a block in the cache.
 If the block is valid and the tag matches the upper (k - n) bits of the k-bit address, then the data is sent to the CPU (cache hit).
 Otherwise (cache miss), the data is read from main memory and stored in the cache block specified by the lowest n bits of the address, the upper (k - n) address bits are stored in the block's tag field, and the valid bit is set to 1.
 On a cache miss, the simplest thing to do is to stall the pipeline until the data from main memory can be fetched (and also copied into the cache).

k-bit memory address:

(k - n) bits tag | n bits block index

Example:
Consider a word addressable main memory of size 16 words and a cache memory of size 4 blocks, each block containing one word. Draw the diagram representing the steps taken for a cache hit.

Ans:
Main memory size = 16 words = 2^4 words, so k = 4 bits are required to represent a memory address.
Cache memory size in number of blocks = 4 = 2^2, so n = 2 bits are required to represent a cache block index.
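The tag/valid-bit lookup steps above can be sketched as a small simulator (a minimal illustration for one-word blocks; the class and method names are mine):

```python
class DirectMappedCache:
    """Sketch of a direct-mapped cache with 2**n one-word blocks,
    for a word-addressable memory (per the text's Case 1)."""

    def __init__(self, n_bits: int):
        self.n = n_bits
        size = 1 << n_bits
        self.valid = [False] * size   # all valid bits are 0 at initialization
        self.tag = [0] * size

    def access(self, address: int) -> bool:
        index = address & ((1 << self.n) - 1)   # lowest n bits index a block
        tag = address >> self.n                 # upper (k - n) bits form the tag
        if self.valid[index] and self.tag[index] == tag:
            return True                          # cache hit: data goes to the CPU
        # cache miss: fetch the block, store the tag, set the valid bit
        self.valid[index] = True
        self.tag[index] = tag
        return False

cache = DirectMappedCache(n_bits=2)   # 4 blocks, as in the example
assert cache.access(14) is False      # first reference to address 14: miss
assert cache.access(14) is True       # repeat reference: hit
```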

Example: -
Find the number of cache hits, cache misses, the hit ratio and the miss ratio for the following memory reference string: 10110two, 11010two, 10110two, 11010two, 10000two, 00011two, 10000two, 10010two and 10000two.
Assume a word addressable main memory of size 32 words, a cache memory of size 8 blocks, and each block containing one word.

ANS: -
Main memory size = 32 words = 2^5 words, so 5 bits are required to represent a memory address.
Cache memory size in number of blocks = 8 = 2^3, so 3 bits are required to represent a cache block index.

Example:
Find the number of cache hits, cache misses, the hit ratio and the miss ratio for the following memory reference string: 0, 3, 1, 2, 1, 0, 5, 2, 4, 15.
Assume a word addressable main memory of size 16 words, a cache memory of size 4 blocks, and each block containing one word.

ANS: -
A memory reference string specifies a sequence of memory addresses requested by the processor.

Address     | 0     | 3     | 1     | 2     | 1     | 0     | 5     | 2     | 4     | 15
Binary      | 0000  | 0011  | 0001  | 0010  | 0001  | 0000  | 0101  | 0010  | 0100  | 1111
Cache index | 0%4=0 | 3%4=3 | 1%4=1 | 2%4=2 | 1%4=1 | 0%4=0 | 5%4=1 | 2%4=2 | 4%4=0 | 15%4=3
Hit or Miss | Miss  | Miss  | Miss  | Miss  | Hit   | Hit   | Miss  | Hit   | Miss  | Miss

Total no. of references = 10
No. of hits = 3
No. of misses = 7
Hit ratio in percentage = 3/10 × 100 = 30%
Miss ratio in percentage = 7/10 × 100 = 70%

Case 2: Multi Word Direct Mapped Cache Design: -
Consider a word addressable main memory of size 2^k words and a cache memory of size 2^n blocks, each block containing 2^m words. Find the number of bits to represent the cache index and the tag field.

k-bit memory address:

(k - n - m) bits tag | n bits cache index | m bits word offset

Here the lower m bits represent the word number in the block (i.e. the word offset).
The next n bits represent the cache block index.
The remaining k - (n + m) bits represent the tag value.

Example:
Find the cache block index of the memory reference 7, assuming a word addressable main memory of size 32 words, a cache memory of size 4 blocks, and each block containing 4 words.

ANS: -
Main memory size = 32 words = 2^5 words, so k = 5 bits are required to represent a memory address.
Cache memory size in number of blocks = 4 = 2^2, so n = 2 bits are required to represent a cache block index.
1 block = 4 words = 2^2 words, so m = 2 bits are required to represent a word within a cache block.

Address = 7 = 00111
Tag = 0, cache block index = 01, word offset = 11

Formula to find the cache block index from a word addressable memory address for a multi word direct mapped cache:

Cache index = (block address) modulo (no. of cache blocks)
where block address = floor(memory address / words per block)

Address = 7
Cache index = floor(7/4) % 4 = 1 % 4 = 1 = (01)2
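The hit/miss bookkeeping for a reference string can be reproduced with a short simulation (a sketch; the function name is mine):

```python
def simulate(refs, num_blocks):
    """Count hits and misses for a direct-mapped cache of one-word blocks."""
    valid = [False] * num_blocks
    tag = [None] * num_blocks
    hits = 0
    for addr in refs:
        idx = addr % num_blocks        # cache index = address mod blocks
        t = addr // num_blocks         # remaining upper bits form the tag
        if valid[idx] and tag[idx] == t:
            hits += 1                  # hit: block present with matching tag
        else:
            valid[idx], tag[idx] = True, t   # miss: load the block
    return hits, len(refs) - hits

hits, misses = simulate([0, 3, 1, 2, 1, 0, 5, 2, 4, 15], 4)
assert (hits, misses) == (3, 7)   # hit ratio 30%, miss ratio 70%
```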
[Figure: Structure of a cache memory with 4 blocks, each block containing 4 words. Each entry has a valid bit, a tag field and a data field (Word 3, Word 2, Word 1, Word 0); the block indices are 00, 01, 10 and 11, and Mem(7) is stored in the block with index 01.]

 The advantage of a multi word cache design over a single word cache design is a reduced miss rate.
 The disadvantage of a multi word cache design over a single word cache design is a higher cache size.

Case 3: Single word cache block with multiple bytes Direct Mapped Cache Design: -
Consider a byte addressable main memory of size 2^k bytes and a cache memory of size 2^n blocks, each block containing 2^j bytes. Find the number of bits to represent the cache index and the tag field.

k-bit memory address:

(k - n - j) bits tag | n bits block index | j bits byte offset

Here the lower j bits represent the byte number in the block (i.e. the byte offset).
The next n bits represent the cache block index.
The remaining k - (n + j) bits represent the tag value.

Example:
Find the cache block index of the memory reference 17, assuming a byte addressable main memory of size 64 bytes, a cache memory of size 4 blocks, and each block containing 8 bytes.

ANS: -
Main memory size = 64 bytes = 2^6 bytes, so k = 6 bits are required to represent a memory address.
Cache memory size in number of blocks = 4 = 2^2, so n = 2 bits are required to represent a cache block index.
1 block = 8 bytes = 2^3 bytes, so j = 3 bits are required to represent a byte within a cache block.

Address = 17 = 010001
Tag = 0, cache block index = 10, byte offset = 001

Formula to find the cache block index from a byte addressable memory address for a multi byte direct mapped cache:

Cache index = (block address) modulo (no. of cache blocks)
where block address = floor(byte address / bytes per block)

Address = 17
Cache index = floor(17/8) % 4 = 2 % 4 = 2 = (10)2

[Figure: Structure of a cache memory with 4 blocks, each block containing 8 bytes (Byte 7 ... Byte 0); the block indices are 00, 01, 10 and 11, and Mem(17) is stored in the block with index 10.]

Example:
Find the cache block index of the byte address 1200, with a cache memory of size 64 blocks and each block containing 16 bytes.

ANS: -
Cache index = (block address) modulo (no. of cache blocks)
where block address = floor(byte address / bytes per block)
Cache index = floor(1200/16) % 64 = 75 % 64 = 11
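The byte-address formula can be checked directly (a sketch; the function name is mine). Note in particular that for byte address 1200 the block address is floor(1200/16) = 75, and 75 mod 64 = 11:

```python
def byte_cache_index(byte_address, bytes_per_block, num_blocks):
    """Cache index = (block address) mod (no. of cache blocks),
    where block address = floor(byte address / bytes per block)."""
    block_address = byte_address // bytes_per_block
    return block_address % num_blocks

assert byte_cache_index(17, 8, 4) == 2        # floor(17/8) % 4 = 2
assert byte_cache_index(1200, 16, 64) == 11   # block address 75, 75 % 64 = 11
```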
Case 4: Multi Word cache block with 4-byte words Direct Mapped Cache Design: -
Consider a byte addressable main memory of size 2^k bytes (i.e. 2^(k-2) words), a cache memory of size 2^n blocks, each block containing 2^m words, and 1 word containing 4 bytes. Find the number of bits to represent the cache index and the tag field.

k-bit memory address:

(k - n - m - 2) bits tag | n bits block index | m bits word offset | 2 bits byte offset

Here the lower 2 bits represent the byte number in the word (i.e. the byte offset).
The next m bits represent the word number in the block (i.e. the word offset).
The next n bits represent the cache block index.
The remaining k - (n + m + 2) bits represent the tag value.

Example:
Find the cache block index of the memory reference 17, assuming a byte addressable main memory of size 256 bytes (i.e. 64 words), a cache memory of size 4 blocks, and each block containing 4 words.

ANS: -
Main memory size = 256 bytes = 2^8 bytes, so k = 8 bits are required to represent a memory address.
Cache memory size in number of blocks = 4 = 2^2, so n = 2 bits are required to represent a cache block index.
1 block = 4 words = 2^2 words, so m = 2 bits are required to represent a word within a cache block.
1 word = 4 bytes = 2^2 bytes, so the lower 2 bits represent the byte offset.

Address = 17 = 00 01 00 01
Tag = 00, cache block index = 01, word offset = 00, byte offset = 01

Formula to find the cache block index from a byte addressable memory address for a multi byte direct mapped cache:

Cache index = (block address) modulo (no. of cache blocks)
where block address = floor(memory address / bytes per block)

Address = 17
Cache index = floor(17/16) % 4 = 1 % 4 = 1 = (01)2

[Figure: Structure of a cache memory with 4 blocks, each block containing 4 words and each word containing 4 bytes (B3 B2 B1 B0 per word); the block indices are 00, 01, 10 and 11.]

Note: In the MIPS architecture the value of k is 32, as each memory address is 32 bits and memory is byte addressable.

Question:
Draw the diagram representing a cache hit in a MIPS direct mapped cache design with a 32-bit byte address and 1024 single-word cache blocks.

ANS: -
Here the number of bits to represent a memory address is k = 32.
Cache memory size in number of blocks = 1024 = 2^10, so n = 10 bits are required to represent a cache block index.
1 block = 1 word = 2^0 words, so no bits are used for the word offset.
1 word = 4 bytes = 2^2 bytes, so 2 bits are required to represent a byte within a cache block (i.e. the byte offset).
Number of bits to represent the tag field = 32 - (10 + 0 + 2) = 20.
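The address-field breakdown for any of the cases above follows one pattern, which can be sketched as a small helper (the function name is mine; word-addressable memories correspond to bytes_per_word = 1):

```python
import math

def field_widths(address_bits, num_blocks, words_per_block, bytes_per_word=4):
    """Return (tag, index, word offset, byte offset) widths in bits,
    assuming all counts are powers of 2 as in the text."""
    index = int(math.log2(num_blocks))
    word_off = int(math.log2(words_per_block))
    byte_off = int(math.log2(bytes_per_word))
    tag = address_bits - index - word_off - byte_off
    return tag, index, word_off, byte_off

# MIPS-style: 32-bit byte addresses, single-word blocks, 1024 blocks
assert field_widths(32, 1024, 1) == (20, 10, 0, 2)
# Case 2 example: 5-bit word addresses, 4 blocks of 4 words each
assert field_widths(5, 4, 4, bytes_per_word=1) == (1, 2, 2, 0)
```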
Size of Cache Memory: -
 The total number of bits in a direct-mapped cache is 2^n × (block size in bits + tag size in bits + valid field size in bits).
 Since the block size is 2^m words (2^(m+2) bytes, or 2^(m+5) bits), and we need 1 bit for the valid field, the number of bits in such a cache is
2^n × (2^m × 32 + (32 - n - m - 2) + 1) = 2^n × (2^(m+5) + 31 - n - m).

Question: -
How many total bits are required for a direct-mapped cache with 16 KB of data and 4-word blocks, assuming a 32-bit address?
What is the ratio between the total bits required for such a cache implementation and the data storage bits?

ANS: -
Each block has 4 words, so the block size = 4 × 32 = 2^7 bits.
Data size = 16 KB = 2^4 × 2^10 × 2^3 = 2^17 bits.
Number of cache blocks = 2^17 / 2^7 = 2^10.
Here k = 32, n = 10, m = 2.
Thus, the total number of bits required for the direct-mapped cache is 2^n × (block size + tag size + valid field size)
= 2^10 × (2^7 + (32 - 10 - 2 - 2) + 1) = 2^10 × (128 + 19) = 147 Kbit = 147/8 KByte ≈ 18.4 KByte.
Here 18.4 KB is required for implementing a cache that actually stores 16 KB of data.
So the ratio between the total bits required for such a cache implementation and the data storage bits = 18.4/16 = 1.15.

Measuring and improving cache performance: -
 CPU time can be divided into the clock cycles that the CPU spends executing the program and the clock cycles that the CPU spends waiting for the memory system.
CPU time = (CPU execution clock cycles + Memory-stall clock cycles) × Clock cycle time
 Memory-stall clock cycles can be defined as the sum of the stall cycles coming from reads plus those coming from writes:
Memory-stall clock cycles = Read-stall cycles + Write-stall cycles
 The read-stall cycles can be defined in terms of the number of read accesses per program, the miss penalty in clock cycles for a read, and the read miss rate:
Read-stall cycles = (Reads / Program) × Read miss rate × Read miss penalty
 Similarly for writes, with an additional term for write buffer stalls:
Write-stall cycles = (Writes / Program) × Write miss rate × Write miss penalty + Write buffer stalls
 Combining reads and writes:
Memory-stall clock cycles = (Memory accesses / Program) × Miss rate × Miss penalty
= (Instructions / Program) × (Misses / Instruction) × Miss penalty

Question: -
Assume the miss rate of an instruction cache is 2% and the miss rate of the data cache is 4%. If a processor has a CPI of 2 without any memory stalls and the miss penalty is 100 cycles for all misses, determine how much faster the processor would run with a perfect cache that never missed. Assume the frequency of all loads and stores is 36%.

ANS: -
Let the instruction count be I.
Instruction miss cycles = I × 2% × 100 = 2 I
The frequency of loads and stores is 36%.
Data miss cycles = I × 36% × 4% × 100 = 1.44 I
The total number of memory-stall cycles is 2 I + 1.44 I = 3.44 I.
Thus the total CPI including memory stalls is 2 + 3.44 = 5.44.
CPU time with stalls / CPU time with perfect cache
= (I × CPI_stall × clock cycle) / (I × CPI_perfect × clock cycle)
= CPI_stall / CPI_perfect = 5.44 / 2 = 2.72.

Average memory access time: -
The average memory access time (AMAT) is the average time to access memory considering both hits and misses and the frequency of the different accesses.
AMAT = Time for a hit + Miss rate × Miss penalty

Question: -
Find the AMAT for a processor with a 1 ns clock cycle time, a miss penalty of 20 clock cycles, a miss rate of 0.05 misses per instruction, and a cache access time (including hit detection) of 1 clock cycle. Assume that the read and write miss penalties are the same and ignore other write stalls.

ANS: -
AMAT = Time for a hit + Miss rate × Miss penalty
= 1 + 0.05 × 20
= 2 clock cycles.

Question: -
For a direct-mapped cache design with a 32-bit address, the following bits of the address are used to access the cache.

Tag: bits 31-10 | Index: bits 9-4 | Byte offset: bits 3-0

a) What is the cache line size (in words)?
b) How many entries does the cache have?
c) What is the ratio between the total bits required for such a cache implementation and the data storage bits?

ANS: -
a) The byte offset is 4 bits, so 2^4 = 16 bytes. Block size = 16 bytes = 16/4 = 4 words.
b) Number of cache blocks = 2^n = 2^(index bits) = 2^6 = 64.
c) Total no. of bits for the cache implementation = 2^n × (block size + tag size + valid field size)
= 64 × (4 × 32 + 22 + 1)
= 64 × (128 + 23) = 64 × 151 = 9664 bits
Total no. of bits used for data storage = 2^n × block size = 64 × 128 = 8192 bits
The ratio between the total bits required for such a cache implementation and the data storage bits = 9664/8192 ≈ 1.179.

Question: - With respect to the above cache design, find the number of cache hits and cache misses for the following memory reference string:
0, 4, 16, 132, 232, 160, 1024, 30, 140, 3100, 180, 2180
a) How many blocks are replaced?
b) What is the hit ratio?

ANS: -
Here Cache index = floor(addr/16) % 64.
On a cache miss, the entire block containing 16 bytes is transferred from main memory to cache memory.

Memory address | 0 | 4 | 16 | 132 | 232 | 160 | 1024         | 30 | 140 | 3100         | 180 | 2180
Cache index    | 0 | 0 | 1  | 8   | 14  | 10  | 0            | 1  | 8   | 1            | 11  | 8
Hit/Miss       | M | H | M  | M   | M   | M   | M (replaced) | H  | H   | M (replaced) | M   | M (replaced)

No. of blocks replaced = 3
Hit ratio = 3/12 = 0.25
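The bit-count and performance formulas of this section can be checked numerically. The sketch below (function and variable names are mine) recomputes the cache sizes, the stalled CPI, and the AMAT from the worked examples:

```python
def total_cache_bits(num_blocks: int, block_bits: int, tag_bits: int) -> int:
    """Total bits = blocks x (data bits + tag bits + 1 valid bit)."""
    return num_blocks * (block_bits + tag_bits + 1)

# 16 KB of data, 4-word blocks: 2**10 blocks, 128-bit blocks, 18-bit... no:
# tag = 32 - 10 - 2 - 2 = 18 + 1 valid -> 128 + 19 per block, 147 Kbit total.
assert total_cache_bits(1024, 128, 18) == 1024 * (128 + 19)   # 147 Kbit

# c) 64-entry cache, 4-word (128-bit) blocks, 22-bit tags -> 9664 bits
assert total_cache_bits(64, 128, 22) == 9664
assert 9664 / 8192 == 1.1796875   # implementation bits vs. data bits, ~1.18

# Stalled CPI: 2% I-miss, 4% D-miss, 36% loads/stores, 100-cycle penalty
cpi_stall = 2 + 0.02 * 100 + 0.36 * 0.04 * 100
assert round(cpi_stall, 2) == 5.44
assert round(cpi_stall / 2, 2) == 2.72   # speedup of a perfect cache

# AMAT = hit time + miss rate x miss penalty
assert round(1 + 0.05 * 20, 2) == 2.0    # 2 clock cycles
```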
