Final Chapter-5
Performance of a computer depends on large and fast storage systems. We can take advantage of the Principle of Locality by implementing the memory of a computer as a memory hierarchy: multiple levels of memory with different speeds and sizes.

Principle of Locality: -
The principle of locality states that programs access a relatively small portion of their address space at any instant of time.

Types of locality: -
1. Temporal locality (locality in time): - If a data location is referenced, then it will tend to be referenced again soon.
2. Spatial locality (locality in space): - If a data location is referenced, data locations with nearby addresses will tend to be referenced soon.
Most programs contain loops, so instructions and data are likely to be accessed repeatedly, showing high amounts of temporal locality.
Since instructions are normally accessed sequentially, programs also show high spatial locality.
Accesses to data also exhibit natural spatial locality. For example, sequential accesses to the elements of an array or a record will naturally have high degrees of spatial locality.

Memory hierarchy: -
A structure that uses multiple levels of memories; as the distance from the processor increases, the size of the memories and the access time both increase.
The faster memories are more expensive per bit than the slower memories and thus are smaller.
Main memory is implemented from DRAM (dynamic random access memory), while levels closer to the processor (caches) use SRAM (static random access memory).
DRAM is less costly per bit than SRAM, although it is substantially slower. The price difference arises because DRAM uses significantly less area per bit of memory, and DRAMs thus have larger capacity for the same amount of silicon.
The third technology, used to implement the largest and slowest level in the hierarchy, is usually magnetic disk.
Figure 1 shows the memory hierarchy, with the faster memory closer to the processor and the slower, less expensive memory towards the lower level.
The data is similarly hierarchical: a level closer to the processor is generally a subset of any level further away, and all the data is stored at the lowest level.
A memory hierarchy can consist of multiple levels, but data is copied between only two adjacent levels at a time.
The goal is to present the user with as much memory as is available in the cheapest technology, while providing access at the speed offered by the fastest memory.

NOTE: -
Memory hierarchies take advantage of temporal locality by keeping more recently accessed data items closer to the processor. They take advantage of spatial locality by moving blocks consisting of multiple contiguous words in memory to upper levels of the hierarchy.

This figure indicates the structure of the memory hierarchy: as the distance from the processor increases, so does the size.
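Both kinds of locality show up in even the simplest loop. The sketch below is purely illustrative (the array `data` and its size are made up for the example); the comments point out where temporal and spatial locality arise:

```python
# Illustrative sketch of temporal and spatial locality in a simple loop.
data = list(range(1024))  # a hypothetical array stored in contiguous memory

total = 0
for i in range(len(data)):
    # Temporal locality: the loop-body instructions and the variables
    # `total` and `i` are accessed again on every iteration.
    # Spatial locality: data[0], data[1], data[2], ... live at sequential
    # addresses, so a cache that fetches a multi-word block brings the
    # next few elements in along with the one requested.
    total += data[i]

print(total)  # sum of 0..1023 = 523776
```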
Memory Hierarchy Terminologies: -
Block or line: -
The minimum unit of information that can be either present or not present in a cache is called a block or line.
CHAPTER-5 Large and Fast: Exploiting Memory Hierarchy
Hit & Miss: -
If the data requested by the processor appears in some block in the upper level, this is called a hit. If the data is not found in the upper level, the request is called a miss. The lower level in the hierarchy is then accessed to retrieve the block containing the requested data.
Hit rate: -
The fraction of memory accesses found in a level of the memory hierarchy is called the hit rate.
Miss rate: -
The fraction of memory accesses not found in a level of the memory hierarchy.
Miss rate = 1 - hit rate
Hit time: -
The time required to access a level of the memory hierarchy, including the time needed to determine whether the access is a hit or a miss.
Miss penalty: -
The time required to fetch a block into a level of the memory hierarchy from the lower level, including the time to access the block, transmit it from one level to the other, insert it in the level that experienced the miss, and then pass the block to the requestor.

Design of a Cache Memory: -
Cache memory is a small, fast and expensive memory placed between the processor and main memory to cope with the data-transfer mismatch between a high-speed processor and a low-speed main memory.
To execute an instruction, the processor looks for the data at a given memory address in the cache memory. If the data is available (i.e. a cache hit), the processor continues its execution by fetching the data. If the data is not available in the cache memory (i.e. a cache miss), the processor is stalled while the data is fetched from main memory and placed in the appropriate location of the cache memory, and then the processor continues with the execution of the instruction.
A cache memory is considered as a collection of equal-sized blocks. The number of blocks in a cache memory is usually a power of 2.

Important issues for the design of a Cache Memory: -
When we copy a block of data from main memory to the cache, where exactly should it be placed? [Solution: use a mapping technique such as direct mapping, set-associative mapping, or fully associative mapping]
How can we tell if a word is already in the cache, or whether it has to be fetched from main memory first? [Solution: use a tag field and a valid bit]
If the cache is full and we need to bring in a new memory block, which existing cache block should be replaced? [Solution: use a replacement policy such as LRU, LFU, FIFO, or Random]
How will write operations be handled by the memory system?

Direct Mapped Cache Design: -
A cache structure in which each memory location is mapped to exactly one location in the cache is called a direct mapped cache.
For example, almost all direct-mapped caches use this mapping to find a block:
(Block address) modulo (Number of blocks in the cache)
If the number of entries in the cache is a power of 2, then the modulo can be computed simply by using the low-order log2(cache size in blocks) bits of the address. Thus, an 8-block cache uses the three lowest bits (8 = 2^3) of the block address.
Notice that the index is the least significant bits (LSBs) of the address: if the cache holds 2^n blocks, the index is the n LSBs of the address.

Note:
A main memory is a collection of words. The word size may be 1 byte, 2 bytes, 4 bytes, 8 bytes and so on. Memory may be word addressable or byte addressable.
In the byte addressing scheme, the first word starts at address 0 and the second word starts at address 4, as in the MIPS architecture. In the word addressing scheme, all bytes of the first word are located at address 0, and all bytes of the second word are located at address 1.

Word addressable memory of size 32 bytes, i.e. 8 words:

Address  Contents
0        Word 0
1        Word 1
2        Word 2
3        Word 3
4        Word 4
5        Word 5
6        Word 6
7        Word 7

Byte addressable memory of size 32 bytes, i.e. 8 words:

Address  Contents
0        Byte 3   Byte 2   Byte 1   Byte 0
4        Byte 7   Byte 6   Byte 5   Byte 4
8        Byte 11  Byte 10  Byte 9   Byte 8
12       Byte 15  Byte 14  Byte 13  Byte 12
16       Byte 19  Byte 18  Byte 17  Byte 16
20       Byte 23  Byte 22  Byte 21  Byte 20
24       Byte 27  Byte 26  Byte 25  Byte 24
28       Byte 31  Byte 30  Byte 29  Byte 28
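The claim that modulo by a power of two equals keeping the low-order bits can be checked with a short sketch (the block addresses used here are arbitrary illustrations):

```python
# Direct-mapped placement: (block address) modulo (number of blocks).
# When the number of blocks is a power of 2, the modulo equals the
# low-order log2(number_of_blocks) bits of the block address.

NUM_BLOCKS = 8                             # 8-block cache
INDEX_BITS = NUM_BLOCKS.bit_length() - 1   # log2(8) = 3 index bits

for block_addr in [22, 26, 16, 3, 18]:
    by_modulo = block_addr % NUM_BLOCKS
    by_mask = block_addr & (NUM_BLOCKS - 1)   # keep the 3 lowest bits
    assert by_modulo == by_mask
    print(f"block {block_addr:2d} -> cache index {by_modulo}")
```

The bit-mask form is what hardware actually implements: no division is needed, just wiring the low-order address bits to the cache index.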
Case 1:
Example: Consider a word addressable main memory of size 16 words and a cache memory of size 4 blocks, each block containing one word.
Main memory size = 16 words = 2^4 words, so 4 bits are required to represent a memory address.
Cache memory size in number of blocks = 4 = 2^2, so 2 bits are required to represent a cache block index.
For a memory reference to address 14, i.e. (1110)₂, the cache index is 14 % 4 = 2 [(10)₂].
Similarly, for a memory reference to address 7, i.e. (0111)₂, the cache index is 7 % 4 = 3 [(11)₂].
For a memory reference to address 3, i.e. (0011)₂, the cache index is 3 % 4 = 3 [(11)₂].

How to find data in the cache: -
Because each cache location can contain the contents of a number of different memory locations, how do we know whether the data in the cache corresponds to a requested word? This question can be answered by using a field called the tag.

Tag: -
The tags contain the address information required to identify whether a word in the cache corresponds to the requested word. The tag needs only to contain the upper portion of the address, corresponding to the bits that are not used as an index into the cache.
A valid bit is also added to indicate whether an entry contains a valid address or not. If the bit is not set, there cannot be a match for this block.
When the system is initialized, all the valid bits are set to 0.
When data is loaded into a particular cache block, the corresponding valid bit is set to 1.

Steps taken for handling a memory access by the processor: -
For a k-bit memory address:
The lowest n bits of the address index a block in the cache.
If the block is valid and the tag matches the upper (k - n) bits of the k-bit address, then that data is sent to the CPU (cache hit).
Otherwise (cache miss), data is read from main memory and stored in the cache block specified by the lowest n bits of the address; the upper (k - n) address bits are stored in the block's tag field; and the valid bit is set to 1.
On a cache miss, the simplest thing to do is to stall the pipeline until the data from main memory can be fetched (and also copied into the cache).

Example:
Consider a word addressable main memory of size 16 words and a cache memory of size 4 blocks, each block containing one word. Draw the diagram representing the steps taken for a cache hit.
Ans:
Main memory size = 16 words = 2^4 words, so k = 4 bits are required to represent a memory address.
Cache memory size in number of blocks = 4 = 2^2, so n = 2 bits are required to represent a cache block (i.e. the cache index).
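The steps above (index with the low n bits, compare the tag against the upper k - n bits, check the valid bit) can be sketched as a minimal lookup routine. The class name `Cache` and the 16-word/4-block sizes simply follow this example; this is a sketch, not a hardware description:

```python
# Minimal direct-mapped cache lookup following the steps above:
# index = lowest n bits, tag = upper (k - n) bits, one valid bit per block.

K_BITS = 4   # 16-word main memory -> 4-bit addresses
N_BITS = 2   # 4 cache blocks -> 2 index bits

class Cache:
    def __init__(self):
        # one (valid, tag) pair per block; all valid bits start at 0
        self.blocks = [(0, None)] * (1 << N_BITS)

    def access(self, addr):
        index = addr & ((1 << N_BITS) - 1)   # lowest n bits
        tag = addr >> N_BITS                 # upper (k - n) bits
        valid, stored_tag = self.blocks[index]
        if valid and stored_tag == tag:
            return "hit"
        # miss: fetch from main memory, store the tag, set the valid bit
        self.blocks[index] = (1, tag)
        return "miss"

cache = Cache()
print(cache.access(14))  # 14 = 1110: index 10, tag 11 -> miss (cold cache)
print(cache.access(14))  # same block, same tag, valid bit set -> hit
```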
Example: -
Find the number of cache hits, cache misses, hit ratio and miss ratio for the following memory reference string: 10110₂, 11010₂, 10110₂, 11010₂, 10000₂, 00011₂, 10000₂, 10010₂, and 10000₂. Assume a word addressable main memory of size 32 words and a cache memory of size 8 blocks, each block containing one word.

ANS: -
Main memory size = 32 words = 2^5 words, so 5 bits are required to represent a memory address.
Cache memory size in number of blocks = 8 = 2^3, so 3 bits are required to represent a cache block index; the remaining 2 bits form the tag.
Tracing the references (index = lowest 3 bits, tag = upper 2 bits):

Address   10110 11010 10110 11010 10000 00011 10000 10010 10000
Index     110   010   110   010   000   011   000   010   000
Hit/Miss  Miss  Miss  Hit   Hit   Miss  Miss  Hit   Miss  Hit

So there are 4 hits and 5 misses; hit ratio = 4/9 and miss ratio = 5/9.

Example:
Find the number of cache hits, cache misses, hit ratio and miss ratio for the following memory reference string: 0, 3, 1, 2, 1, 0, 5, 2, 4, 15. Assume a word addressable main memory of size 16 words and a cache memory of size 4 blocks, each block containing one word.

ANS: -
A memory reference string specifies a sequence of memory addresses requested by the processor.
Main memory size = 16 words = 2^4 words, so 4 bits are required to represent a memory address. The cache has 4 = 2^2 blocks, so the cache index = address % 4 (the lowest 2 bits).

Address         0     3     1     2     1     0     5     2     4     15
Binary address  0000  0011  0001  0010  0001  0000  0101  0010  0100  1111
Cache index     0     3     1     2     1     0     1     2     0     3
Hit or Miss     Miss  Miss  Miss  Miss  Hit   Hit   Miss  Hit   Miss  Miss

So there are 3 hits and 7 misses; hit ratio = 3/10 = 0.3 and miss ratio = 7/10 = 0.7.

Case 2: Multi Word Direct Mapped Cache Design: -
Consider a word addressable main memory of size 2^k words and a cache memory of size 2^n blocks, each block containing 2^m words. Find the number of bits needed to represent the cache index and the tag field.

A k-bit memory address is split as:

(k - n - m) bits tag | n bits cache index | m bits word offset

Here the lower m bits represent the word number within the block (i.e. the word offset). The next n bits represent the cache block index. The remaining k - (n + m) bits represent the tag value.
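The two examples above can be checked with a short simulation of a direct-mapped cache with one word per block (the function name `simulate` is illustrative):

```python
# Simulate a direct-mapped cache (one word per block) for the two
# reference strings worked out above.

def simulate(addresses, num_blocks):
    """Return (hits, misses) for a direct-mapped cache of num_blocks blocks."""
    blocks = [None] * num_blocks   # stored tag per block; None = invalid
    hits = misses = 0
    for addr in addresses:
        index = addr % num_blocks  # low-order bits when num_blocks is 2^n
        tag = addr // num_blocks   # remaining upper bits
        if blocks[index] == tag:
            hits += 1
        else:
            misses += 1
            blocks[index] = tag    # fetch block, record its tag
    return hits, misses

# First example: 8-block cache, 5-bit addresses
ref1 = [0b10110, 0b11010, 0b10110, 0b11010, 0b10000,
        0b00011, 0b10000, 0b10010, 0b10000]
print(simulate(ref1, 8))   # -> (4, 5): 4 hits, 5 misses

# Second example: 4-block cache, 4-bit addresses
ref2 = [0, 3, 1, 2, 1, 0, 5, 2, 4, 15]
print(simulate(ref2, 4))   # -> (3, 7): 3 hits, 7 misses
```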
Example:
Find the cache block index of the memory reference 7, assuming a word addressable main memory of size 32 words and a cache memory of size 4 blocks, each block containing 4 words.

ANS: -
Main memory size = 32 words = 2^5 words, so k = 5 bits are required to represent a memory address.
Cache memory size in number of blocks = 4 = 2^2, so n = 2 bits are required to represent a cache block index.
1 block = 4 words = 2^2 words, so m = 2 bits are required to represent a word within a cache block.
Address = 7 = 00111₂, which splits as:

tag = 0 | cache block index = 01 | word offset = 11

So the cache block index is (01)₂ = 1.
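The tag/index/offset split used in this answer can be sketched in code; the values follow the example (k = 5, n = 2, m = 2), and the function name `split_address` is illustrative:

```python
# Split a word address into (tag, cache block index, word offset)
# for a multi-word direct-mapped cache with k = 5, n = 2, m = 2.

K, N, M = 5, 2, 2   # address bits, index bits, word-offset bits

def split_address(addr):
    offset = addr & ((1 << M) - 1)         # lowest m bits: word offset
    index = (addr >> M) & ((1 << N) - 1)   # next n bits: cache block index
    tag = addr >> (M + N)                  # remaining k-(n+m) bits: tag
    return tag, index, offset

print(split_address(7))   # 7 = 00111 -> (0, 1, 3): tag 0, index 01, offset 11
```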
Formula to find the cache block index from a word addressable memory address for a multi-word direct mapped cache: