Cache: CIT 595, Spring 2007

Cache
• The data stored in cache is data that the processor is likely to use in the very near future
  - SRAM is fast but has less capacity, so the cache stores only a subset of the data held in main memory
Address Conversion to Cache Location
• Address conversion is done by giving special significance to the bits of the main memory address
• The address is split into distinct groups called fields
  - Just as instruction decoding is done based on certain bit fields

Mapping Scheme 1: Direct Mapped Cache
• In a direct mapped cache consisting of N blocks (i.e. N cache locations), block X of main memory maps to cache block Y = X mod N
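
A minimal C sketch of this mapping (the function and names are illustrative, not from the slides):

    #include <stdio.h>

    /* Direct mapping: memory block X goes to cache block X mod N. */
    unsigned direct_map(unsigned block_x, unsigned n_blocks) {
        return block_x % n_blocks;
    }

    int main(void) {
        /* With N = 16 cache blocks, memory blocks 5 and 21 collide: */
        printf("%u\n", direct_map(5, 16));   /* prints 5 */
        printf("%u\n", direct_map(21, 16));  /* also prints 5 */
        return 0;
    }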
Direct Mapped Scheme: Address Conversion
[Figure: a cache with 4 blocks and 8 words per block; the address fields select the block and the word within it]
Example of Direct Mapped Scheme
[Figure: a direct mapped cache with 16 blocks, numbered 0 through 15]
Cache Indexing and Data Retrieval
[Figure: the block field of the address indexes a cache row (Block No., Tag, Data); an n-bit comparator checks the stored 7-bit tag against the address tag, and a MUX driven by the 3 word bits selects the desired word if the tags match]

Example of Direct Mapped Cache (contd..)
• Suppose a program generates the address 1AA
• In 14-bit binary, this number is: 0000011 0101 010
• The first 7 bits of this address go in the tag field (0000011), the next 4 bits go in the block field (0101, i.e. block 5), and the final 3 bits (010) indicate the word within the block
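
The field split can be computed with shifts and masks; a small C sketch under the slide's parameters (14-bit address, 7 tag / 4 block / 3 word bits):

    #include <stdio.h>

    int main(void) {
        unsigned addr  = 0x1AA;
        unsigned word  = addr & 0x7;          /* low 3 bits   */
        unsigned block = (addr >> 3) & 0xF;   /* next 4 bits  */
        unsigned tag   = (addr >> 7) & 0x7F;  /* top 7 bits   */
        printf("tag=%02X block=%X word=%X\n", tag, block, word);
        /* prints tag=03 block=5 word=2 */
        return 0;
    }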
Direct Mapped Cache Example
[Figure: the 16-block cache table (Block No., Tag, Data); after the reference to address 1AA, block 5 holds tag 0000011]

Direct Mapped Cache Example (contd..)
• However, suppose the program then generates the address 3AB
• 3AB also maps to block 0101, but we will not find data for 3AB in the cache
  - The tags will not match, i.e. 0000111 (of addr 3AB) is not equal to 0000011 (of addr 1AA)
• Hence we get it from main memory
• The block loaded for address 1AA is evicted (removed) from the cache and replaced by the block associated with the 3AB reference
Direct Mapped Cache with address 3AB
[Figure: the same cache table; block 5 now holds tag 0000111]

Disadvantage of Direct Mapped Cache
• Suppose a program generates a series of memory references such as: 1AB, 3AB, 1AB, 3AB, ...
  - The cache will continually evict and replace blocks, as the sketch below illustrates
• The theoretical advantage offered by the cache is lost in this extreme case
• Other cache mapping schemes are designed to prevent this kind of thrashing
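
A toy C simulation of this thrashing pattern, assuming the single-tag-per-block model above (all names are mine, not from the slides):

    #include <stdio.h>

    int main(void) {
        int stored_tag = -1;                  /* tag in block 5 (-1 = empty) */
        unsigned refs[] = { 0x1AB, 0x3AB, 0x1AB, 0x3AB };
        int misses = 0;

        for (int i = 0; i < 4; i++) {
            int tag = refs[i] >> 7;           /* 7-bit tag field; both      */
                                              /* addresses index block 5    */
            if (tag != stored_tag) {          /* miss: evict and reload     */
                misses++;
                stored_tag = tag;
            }
        }
        printf("misses = %d of 4\n", misses); /* every reference misses */
        return 0;
    }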
Address Breakup
• Why is the address broken up in this particular manner?
• If the higher-order bits (i.e. the bits used for the tag) were used to determine the cache location, then consecutive addresses would map to the same location in cache
  - The middle bits are preferred for the block location because they cause less thrashing

Valid Cache Block
• How do we know whether the block in cache is valid or not?
• Another bit, called the valid bit, is added to each cache block to indicate whether the block contains valid information
  - 0 = not valid, 1 = valid
  - At start-up, all blocks are not valid
  - When data from main memory is brought into the cache for a particular block, the valid bit for that block is set
  - The valid bit contributes to the cache size
Direct Mapped Cache with Valid (V) Field
[Figure: the cache table with a V column; address 3AB is referenced for the first time, so the entire block is brought into cache block 5 (tag 0000111, V = 1); all other blocks have V = 0]

Calculating Cache Size
• Suppose our memory consists of 2^14 locations (or words), the cache has 16 = 2^4 blocks, and each block holds 8 words
• There are 16 rows (blocks) in the cache
• Each row has 7 bits for the tag + 8 words + 1 valid bit
• Assuming 1 word is 8 bits, the total bits per row = (8 x 8) + 7 + 1 = 72
• 72 bits = 9 bytes
• Cache size = 16 x 9 bytes = 144 bytes
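
The same arithmetic, written as a small C sketch (variable names are illustrative):

    #include <stdio.h>

    int main(void) {
        unsigned blocks        = 16;   /* 2^4 blocks          */
        unsigned words_per_blk = 8;    /* 2^3 words per block */
        unsigned bits_per_word = 8;
        unsigned tag_bits      = 7;    /* 14 - 4 - 3          */
        unsigned row_bits = words_per_blk * bits_per_word + tag_bits + 1;
        printf("row = %u bits, total = %u bytes\n",
               row_bits, blocks * row_bits / 8);  /* 72 bits, 144 bytes */
        return 0;
    }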
Hit or Miss in the Cache
• A hit means that we actually found the data in the cache
• A hit occurs when the valid bit = 1 AND the tag in the cache matches the tag field of the address (sketched in code below)
• If both conditions don't hold, then we did not find the data in the cache
  - This is known as a miss in the cache
• On a miss, the data is brought from main memory into the cache, and the valid bit is set

Some more Terminology
• The hit rate is the percentage of time data is found at a given memory level
• The miss rate is the percentage of time it is not
  - Miss rate = 1 - hit rate
• The hit time is the time required to access data at a given memory level
• The miss penalty is the time required to process a miss
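
Returning to the hit condition above, a minimal C sketch of the lookup for the running 14-bit example (the struct and function names are assumptions, not from the slides):

    #include <stdint.h>
    #include <stdbool.h>

    #define NBLOCKS 16
    #define WORDS_PER_BLOCK 8

    /* One row of the direct mapped cache from the example. */
    struct cache_line {
        bool    valid;
        uint8_t tag;                    /* 7-bit tag        */
        uint8_t data[WORDS_PER_BLOCK];  /* 8 one-byte words */
    };

    static struct cache_line cache[NBLOCKS];

    /* Hit: the indexed line must be valid AND its stored tag must
       match the tag field of the address. */
    bool lookup(unsigned addr, uint8_t *out) {
        unsigned word  = addr & 0x7;
        unsigned block = (addr >> 3) & 0xF;
        unsigned tag   = (addr >> 7) & 0x7F;
        struct cache_line *line = &cache[block];

        if (line->valid && line->tag == tag) {
            *out = line->data[word];    /* hit */
            return true;
        }
        return false;                   /* miss: fetch the block from
                                           memory, install it, set valid */
    }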
Scheme 2: Fully Associative Cache
• In a fully associative cache, a block from main memory can be placed anywhere in the cache; this is how a fully associative cache works
• A memory address is partitioned into only two fields: the tag and the word
• When the cache is searched, all tags are searched in parallel to retrieve the data quickly
  - More hardware cost than direct mapped
  - Basically we need "n" comparators, where n = number of blocks in the cache
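
A software stand-in for the parallel tag search; real hardware compares all n tags at once with n comparators, while this illustrative loop checks them one by one:

    #include <stdint.h>
    #include <stdbool.h>

    #define NBLOCKS 16

    struct fa_line {
        bool     valid;
        uint16_t tag;        /* 11 bits: everything above the 3 word bits */
        uint8_t  data[8];
    };

    static struct fa_line cache[NBLOCKS];

    bool fa_lookup(unsigned addr, uint8_t *out) {
        unsigned word = addr & 0x7;
        unsigned tag  = addr >> 3;   /* only two fields: tag and word */

        for (int i = 0; i < NBLOCKS; i++) {
            if (cache[i].valid && cache[i].tag == tag) {
                *out = cache[i].data[word];
                return true;
            }
        }
        return false;
    }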
Fully Associative: Which block to replace if cache is full?
• Recall that direct mapped cache evicts a block whenever another memory reference maps to that block
• With fully associative cache, we have no such fixed mapping, so we must devise an algorithm to determine which block to evict from the cache
• The block that is evicted is called the victim block
• There are a number of ways to pick a victim; we will discuss them shortly...

Scheme 3: Set Associative
• Set associative cache combines the ideas of direct mapped cache and fully associative cache
• A set associative cache mapping is like direct mapped cache in that a memory reference maps to a particular location in the cache
• But that cache location can hold more than one main memory block; the cache location is then called a set
  - Instead of mapping anywhere in the entire cache (fully associative), a memory reference can map only to a subset of the cache
K-Way Set Associative Cache Example
• Suppose we have a main memory of 2^14 locations
• This memory is mapped to a 2-way set associative cache having 16 blocks, where each block contains 8 words
• Number of sets = number of blocks in cache / K
  - Since this is a 2-way cache, each set consists of 2 blocks, and there are 16/2 = 8 sets
• Thus we need 3 bits for the set and 3 bits for the word, leaving 8 bits for the tag

Advantage & Disadvantage of Set Associative
• Advantage
  - Unlike direct mapped cache, if an address maps to a set, there is a choice for placing the new block
  - If both slots are filled, then we need an algorithm that will decide which old block to evict (like fully associative)
• Disadvantage
  - The tags of each block in a set need to be matched (in parallel) to figure out whether the data is present in the cache; we need K comparators (see the sketch below)
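
A minimal C sketch of the 2-way lookup under the example's parameters (8 sets, 3 set bits, 8-bit tag; names are mine):

    #include <stdint.h>
    #include <stdbool.h>

    #define NSETS 8          /* 16 blocks / 2-way = 8 sets */
    #define WAYS  2

    struct sa_line {
        bool    valid;
        uint8_t tag;         /* 8 bits: 14 - 3 set - 3 word */
        uint8_t data[8];
    };

    static struct sa_line cache[NSETS][WAYS];

    /* Index the set directly (like direct mapped), then compare the
       K tags within the set (like fully associative). */
    bool sa_lookup(unsigned addr, uint8_t *out) {
        unsigned word = addr & 0x7;
        unsigned set  = (addr >> 3) & 0x7;   /* 3 set bits */
        unsigned tag  = (addr >> 6) & 0xFF;  /* top 8 bits */

        for (int way = 0; way < WAYS; way++) {
            struct sa_line *line = &cache[set][way];
            if (line->valid && line->tag == tag) {
                *out = line->data[word];
                return true;
            }
        }
        return false;
    }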
Replacement Algorithm/Policy
• FIFO (first-in, first-out)
  - In FIFO, the block that has been in the cache the longest is evicted, regardless of when it was last used
• Random replacement
  - Does what its name implies: it picks a block at random and replaces it with the new block
  - Random replacement can certainly evict a block that will be needed often or needed soon, but it never thrashes (as direct mapped cache can)

What about blocks that have been written to?
• While your program is running, it will modify some memory locations
• We need to keep main memory and cache consistent if we are modifying data
• We have two options, known as the cache write policies:
  - Update the cache and main memory at the same time (write-through), OR
  - Update the cache now and main memory at a later time (write-back)

• Disadvantage of write-through (updating both at the same time)
  - All writes require a main memory access (bus transaction)
  - Slows down the system: if there is another read request to main memory due to a miss in the cache, the read request has to wait until the earlier write is serviced

• Disadvantage of write-back (updating main memory later)
  - Needs an extra bit in the cache to indicate which blocks have been modified
  - Like the valid bit, another bit called the dirty bit is introduced to mark a modified cache block: 0 = not dirty, 1 = dirty (modified)
  - Adds to the size of the cache (see the sketch below)
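
A minimal C sketch of the write-back policy with a dirty bit (all names, including the write_block_to_memory stub, are hypothetical):

    #include <stdint.h>
    #include <stdbool.h>

    struct wb_line {
        bool    valid;
        bool    dirty;       /* 0 = not dirty, 1 = dirty (modified) */
        uint8_t tag;
        uint8_t data[8];
    };

    /* Stub for the bus transaction that copies a block back to memory. */
    static void write_block_to_memory(const struct wb_line *line) {
        (void)line;
    }

    /* Write-back: a write hit updates only the cache and sets the dirty
       bit; main memory is brought up to date later, at eviction time. */
    void write_word(struct wb_line *line, unsigned word, uint8_t value) {
        line->data[word] = value;
        line->dirty = true;
    }

    void evict(struct wb_line *line) {
        if (line->valid && line->dirty)
            write_block_to_memory(line);   /* only dirty blocks go back */
        line->valid = false;
        line->dirty = false;
    }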
Direct Mapped Cache with Valid and Dirty Bit
[Figure: the cache table with V and D columns; e.g. block 1 is valid and clean (V = 1, D = 0), while block 3 is valid with dirty words within one block (V = 1, D = 1)]

What affects Performance of Cache?
• Programs that exhibit bad locality
• E.g. spatial locality with matrix operations
  - Suppose the matrix data is kept in memory by rows (known as row-major), i.e. offset = row*NUMCOLS + column
• Poor code (a row-wise fix is sketched below):
  for (j = 0; j < numcols; j++)
      for (i = 0; i < numrows; i++)
  - i.e. x[i][j] is followed by x[i + 1][j]
  - The array is being accessed by column, so we are going to miss in the cache every time
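
A short C example contrasting the two loop orders (the array size is arbitrary; only the traversal order matters):

    #include <stdio.h>

    #define NUMROWS 512
    #define NUMCOLS 512

    static double x[NUMROWS][NUMCOLS];

    int main(void) {
        double sum = 0.0;

        /* Poor: column-wise traversal; consecutive accesses are
           NUMCOLS elements apart, so each one touches a new block. */
        for (int j = 0; j < NUMCOLS; j++)
            for (int i = 0; i < NUMROWS; i++)
                sum += x[i][j];

        /* Better: row-wise traversal; consecutive accesses fall in
           the same block, so most of them hit in the cache. */
        for (int i = 0; i < NUMROWS; i++)
            for (int j = 0; j < NUMCOLS; j++)
                sum += x[i][j];

        printf("%f\n", sum);
        return 0;
    }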
• Level 2 cache (64KB to 2MB) is located external to the processor
  - Access time is usually around 15-20 ns
• The tradeoffs in choosing one over the other involve weighing the variables of access time, memory size, and circuit complexity
Instruction and Data Caches
• A unified or integrated cache is one where both instructions and data are cached

Review of Cache Organization
• Q1: Where can a block be placed in the cache level? The mapping scheme

Looking Forward
• We studied the interaction between cache and main memory
  - This memory is managed by hardware