Associative Mapping - Cache Memory
POINTS OF INTEREST:
- Address length is s + w bits.
- The cache is divided into a number of sets, v = 2^d.
- k blocks/lines can be contained within each set.
- k lines per set is called a k-way set associative mapping.
- Number of lines in the cache = v*k = k*2^d.
- Size of the tag = (s - d) bits.
- Each block of main memory maps to only one cache set, but k lines can occupy that set at the same time.
- Two lines per set is the most common organization.
  o This is called 2-way set associative mapping.
  o A given block can be in one of 2 lines in only one specific set.
  o Significant improvement over direct mapping.
  o The replacement algorithm simply uses LRU with a USE bit: when one block is referenced, its USE bit is set while its partner's in the set is cleared.

[Figure: a set associative cache of 512 lines. Each line holds a tag (Tag0 ... Tag511) and the block of data stored for that tag; lines are grouped into sets (Set 0 holds Tag0-Tag3, Set 1 holds Tag4-Tag7, ...), and main memory blocks 0, 128, 256, 384 map to Set 0 while blocks 1, 129, 257, 385 map to Set 1, and so on.]

Effect of Cache Set Size on Address Partitioning

    Tag bits   Set ID bits   Word ID bits   Organization
    18 bits    9 bits        3 bits         Direct mapping (1 line/set)
    19 bits    8 bits        3 bits         2-way set associative (2^1 lines/set)
    20 bits    7 bits        3 bits         4-way set associative (2^2 lines/set)
    21 bits    6 bits        3 bits         8-way set associative (2^3 lines/set)
    ...
    25 bits    2 bits        3 bits         128-way set associative (2^7 lines/set)
    26 bits    1 bit         3 bits         256-way set associative (2^8 lines/set)
    27 bits    0 bits        3 bits         Fully associative (1 big set)
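To make the partitioning concrete, the following C sketch (not from the original notes) splits an address into its three fields using the 2-way row of the table above; the 19/8/3 field widths and the sample address are assumptions chosen to match that row.

    #include <stdio.h>

    /* Assumed field widths from the 2-way row above:
       a 30-bit address = 19 tag bits + 8 set ID bits + 3 word ID bits. */
    #define WORD_BITS 3
    #define SET_BITS  8

    int main(void)
    {
        unsigned long addr = 0x2A5F4BC;                        /* arbitrary 30-bit address */
        unsigned long word = addr & ((1ul << WORD_BITS) - 1);
        unsigned long set  = (addr >> WORD_BITS) & ((1ul << SET_BITS) - 1);
        unsigned long tag  = addr >> (WORD_BITS + SET_BITS);
        printf("tag = 0x%lX, set = 0x%lX, word = 0x%lX\n", tag, set, word);
        return 0;
    }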
Memory Blocks
DEFINITION: A block is a group of neighboring words in memory identified by the bits of its address excluding the w word ID bits.
EXAMPLE: Assume a block uses three word ID bits, i.e., w = 3. Memory addresses for this system are therefore broken up as follows: bits a29 a28 a27 ... a4 a3 identify the block, and bits a2 a1 a0 identify the word within the block.
EXERCISE: Which addresses below are contained in the same block as the address 0x546A5 for a block size of 8 words?
a.) 0x536A5   b.) 0x546B5   c.) 0x546AF   d.) 0x546A0   e.) 0x546C7   f.) 0x546A2
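One way to check candidates like these (an illustrative sketch, not part of the original notes): two addresses lie in the same 8-word block exactly when they agree on every bit above the 3 word ID bits.

    #include <stdio.h>

    /* Same block of 8 words (w = 3) means the same block ID, i.e., the same
       address once the low 3 word ID bits are discarded. */
    static int same_block(unsigned long a, unsigned long b)
    {
        return (a >> 3) == (b >> 3);
    }

    int main(void)
    {
        unsigned long base = 0x546A5;
        unsigned long candidates[] = {0x536A5, 0x546B5, 0x546AF, 0x546A0, 0x546C7, 0x546A2};
        for (int i = 0; i < 6; i++)
            printf("0x%05lX: %s\n", candidates[i],
                   same_block(base, candidates[i]) ? "same block" : "different block");
        return 0;
    }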
    /* Successive array elements occupy neighboring words in memory, so this
       loop's accesses stay within one block or a small number of blocks. */
    int values[6] = {9, 34, 67, 23, 7, 3};
    int count;
    int sum = 0;
    for (count = 0; count < 6; count++)
        sum += values[count];
Writing to a Cache
- Must not overwrite a cache block unless main memory is up to date.
- Two main problems:
  o If the cache is written to, main memory becomes invalid; if main memory is written to, the cache becomes invalid.
  o This can occur if I/O can address main memory directly.
  o Multiple CPUs may have individual caches; once one cache is written to, the copies in the other caches become invalid.
- Write Through: all writes go to main memory as well as to the cache.
  o Multiple CPUs can monitor main memory traffic to keep their local (to the CPU) caches up to date.
  o Generates lots of memory traffic and slows writes.
- Write Back: updates are initially made in the cache only.
  o An update bit for the cache slot is set when an update occurs.
  o If a block is to be replaced, it is written to main memory only if its update bit is set.
  o Other caches can get out of sync, and I/O must access main memory through the cache.
  o Research shows that roughly 15% of memory references are writes.
- Multiple processors with multiple caches: even if a write-through policy is used, other processors may have invalid data in their caches.
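The write-back behavior described above can be sketched in C roughly as follows; the struct layout, the function names, and the use of the tag as a full block number are illustrative assumptions, not an implementation from these notes.

    #include <stdint.h>
    #include <string.h>

    #define BLOCK_WORDS 8

    /* One cache line under a write-back policy. */
    struct cache_line {
        uint32_t tag;                 /* assumed here to be the full block number */
        uint32_t data[BLOCK_WORDS];
        int      valid;
        int      dirty;               /* the update bit: cache copy is newer than memory */
    };

    /* A write goes to the cache only; main memory is not touched yet. */
    static void cache_write(struct cache_line *line, unsigned word, uint32_t value)
    {
        line->data[word] = value;
        line->dirty = 1;
    }

    /* On replacement, the old block is written back only if it was modified. */
    static void cache_evict(struct cache_line *line, uint32_t memory[])
    {
        if (line->valid && line->dirty)
            memcpy(&memory[line->tag * BLOCK_WORDS], line->data, sizeof line->data);
        line->valid = 0;
        line->dirty = 0;
    }

    int main(void)
    {
        static uint32_t memory[1 << 12];        /* toy main memory */
        struct cache_line line = {0};
        line.valid = 1;
        line.tag   = 5;                         /* caching memory block 5 */
        cache_write(&line, 3, 0xDEADBEEF);      /* updates the cache only */
        cache_evict(&line, memory);             /* dirty, so block 5 is written back now */
        return 0;
    }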
Fully Associative Mapping

[Figure: a cache of 512 lines (Line 0 ... Line 511), each holding a tag (Tag0 ... Tag511, generically TAGn-1 for the last of n lines) and a block of data, shown alongside main memory blocks (Block 0, Block 1, Block 512, Block 513, Block 1024, Block 1025, ...).]
POINTS OF INTEREST:
- A main memory block can load into any line of the cache.
- The memory address is interpreted as:
  o Least significant w bits = word position within the block
  o Most significant s bits = tag used to identify which block is stored in a particular line of the cache
- Every line's tag must be examined for a match.
- The algorithm for storing is independent of the size of the cache.
- Cache searching gets expensive and slower.
Fully Associative Mapping - Partitioning of Memory Address: t bits identifying the tag, followed by w bits identifying the word offset into the block.
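Because every line's tag must be examined, a software model of a fully associative lookup is a linear scan over all tags (hardware performs these comparisons in parallel). A minimal sketch, assuming 512 lines and 8-word blocks:

    #define LINES     512
    #define WORD_BITS 3                          /* 8 words per block */

    struct cache_entry { unsigned long tag; int valid; };

    /* Returns the index of the line holding the addressed block, or -1 on a miss.
       The loop over every line is why fully associative searching gets expensive
       as the cache grows. */
    static int fully_associative_lookup(const struct cache_entry cache[LINES], unsigned long addr)
    {
        unsigned long tag = addr >> WORD_BITS;   /* everything above the word ID is the tag */
        for (int i = 0; i < LINES; i++)
            if (cache[i].valid && cache[i].tag == tag)
                return i;
        return -1;
    }

    int main(void)
    {
        static struct cache_entry cache[LINES];  /* zero-initialized: all lines invalid */
        cache[42].valid = 1;
        cache[42].tag   = 0x546A5 >> WORD_BITS;  /* pretend this block is cached in line 42 */
        return fully_associative_lookup(cache, 0x546A5) == 42 ? 0 : 1;
    }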
Direct Mapping - Partitioning of Memory Address: t bits identifying the tag, l bits identifying the row (line) in the cache, and w bits identifying the word offset into the block.
EXAMPLE: Assume that a portion of the tags in the cache in our example looks like the table below. Which of the following addresses are contained in the cache?
a.) 0x438EE8 b.) 0xF18EFF c.) 0x6B8EF3 d.) 0xAD8EF3
    Tag (binary)
    0101 0011 1000 1110 1110 10
    1110 1101 1100 1001 1011 01
    1010 1101 1000 1110 1111 00
    0110 1011 1000 1110 1111 11
    1011 0101 0101 1001 0010 00
    1111 0001 1000 1110 1111 11
EXAMPLE: What cache line number will each of the following addresses be stored to, and what will the minimum and maximum addresses of its block be, if we have a cache with 2^12 = 4K lines of 2^4 = 16 words per block in a 2^28 = 256 Meg memory space?
a.) 0x9ABCDEF b.) 0x1234567 c.) 0xD43F6C2
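One way to carry out the arithmetic (a sketch using the stated cache geometry, not the notes' worked answer): shift away the 4 word ID bits, mask off 12 line bits, and round the address down and up to its block boundaries.

    #include <stdio.h>

    #define WORD_BITS 4        /* 2^4  = 16 words per block */
    #define LINE_BITS 12       /* 2^12 = 4K lines           */

    int main(void)
    {
        unsigned long addrs[] = {0x9ABCDEF, 0x1234567, 0xD43F6C2};
        for (int i = 0; i < 3; i++) {
            unsigned long addr = addrs[i];
            unsigned long line = (addr >> WORD_BITS) & ((1ul << LINE_BITS) - 1);
            unsigned long lo   = addr & ~((1ul << WORD_BITS) - 1);   /* first word of the block */
            unsigned long hi   = lo | ((1ul << WORD_BITS) - 1);      /* last word of the block  */
            printf("0x%07lX -> line 0x%03lX, block 0x%07lX..0x%07lX\n", addr, line, lo, hi);
        }
        return 0;
    }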
EXAMPLE: Assume that a portion of the tags in the cache in our example looks like the table below. Which of the following addresses are contained in the cache?
a.) 0x438EE8 b.) 0xF18EFF c.) 0x6B8EF3 d.) 0xAD8EF3
    Line number (binary)    Tag (binary)
    1000 1110 1110 10       0101 0011
    1000 1110 1110 11       1110 1101
    1000 1110 1111 00       1010 1101
    1000 1110 1111 01       0110 1011
    1000 1110 1111 10       1011 0101
    1000 1110 1111 11       1111 0001

Replacement Algorithms

There must be a method for selecting which line in the cache is going to be replaced when there is no room for a new line.
POINTS OF INTEREST:
- A hardware-implemented algorithm is used for speed.
- There is no need for a replacement algorithm with direct mapping, since each block maps to only one line: just replace the line that is in the way.
- Types of replacement algorithms (a minimal LRU sketch follows this list):
  o Least recently used (LRU): replace the block that hasn't been touched in the longest period of time
  o First in, first out (FIFO): replace the block that has been in the cache longest
  o Least frequently used (LFU): replace the block that has had the fewest hits
  o Random: just pick one; only slightly lower performance than the use-based algorithms LRU, FIFO, and LFU
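For the two-way case mentioned earlier, LRU reduces to the single USE bit per line that these notes describe: referencing a line sets its bit and clears its partner's, and the victim is whichever line's bit is clear. A minimal sketch with assumed structure and function names:

    /* One two-line set; use[i] == 1 means line i was referenced more recently. */
    struct set2 {
        unsigned tag[2];
        int      valid[2];
        int      use[2];
    };

    /* Mark line i as most recently used; its partner becomes the LRU candidate. */
    static void touch(struct set2 *s, int i)
    {
        s->use[i]     = 1;
        s->use[1 - i] = 0;
    }

    /* Choose the line to replace: an invalid line first, otherwise the one whose USE bit is clear. */
    static int victim(const struct set2 *s)
    {
        if (!s->valid[0]) return 0;
        if (!s->valid[1]) return 1;
        return s->use[0] ? 1 : 0;
    }

    int main(void)
    {
        struct set2 s = { {0, 0}, {1, 1}, {0, 1} };  /* both lines valid, line 1 used last */
        touch(&s, 0);                                /* reference line 0 ...                */
        return victim(&s);                           /* ... so line 1 is now the victim (1) */
    }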
EXAMPLE: For the previous example, how many lines does the cache contain? How many blocks can be mapped to a single line of the cache?

Direct mapping PROS: simple and inexpensive. CONS: if a program repeatedly accesses two blocks that map to the same line, the cache miss rate is very high (thrashing).
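The thrashing scenario can be made concrete with a toy direct-mapped model; the geometry below reuses the 4K-line, 16-words-per-block cache from the earlier example, and the two conflicting addresses are arbitrary picks that share a line but differ in tag.

    #include <stdio.h>

    #define WORD_BITS 4
    #define LINE_BITS 12
    #define LINES     (1ul << LINE_BITS)

    static unsigned long tags[LINES];
    static int           valid[LINES];

    /* Returns 1 on a hit, 0 on a miss (installing the new block's tag on a miss). */
    static int access_cache(unsigned long addr)
    {
        unsigned long line = (addr >> WORD_BITS) & (LINES - 1);
        unsigned long tag  = addr >> (WORD_BITS + LINE_BITS);
        if (valid[line] && tags[line] == tag)
            return 1;
        valid[line] = 1;
        tags[line]  = tag;
        return 0;
    }

    int main(void)
    {
        /* 0x1234560 and 0x5674560 share line 0x456 but carry different tags,
           so alternating between them evicts the other block every time. */
        int hits = 0;
        for (int i = 0; i < 1000; i++)
            hits += access_cache((i % 2) ? 0x1234560 : 0x5674560);
        printf("hits: %d of 1000\n", hits);          /* prints 0 */
        return 0;
    }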