Lecture 12: Cache
Locality
Locality is the principle that makes a memory hierarchy a good idea.
If an item is referenced, then because of:
temporal locality: the same item will tend to be referenced again soon
spatial locality: nearby items will tend to be referenced soon
Why does code exhibit locality? Consider both instructions and data: loops re-execute the same instructions (temporal locality), while instructions are fetched sequentially and data structures such as arrays are traversed element by element (spatial locality).
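As a concrete illustration, here is a minimal C sketch of both kinds of locality in ordinary code (the array size and variable names are arbitrary, not from the lecture):

```c
#include <stdio.h>

#define N 1024

int main(void) {
    int a[N];
    int sum = 0;

    /* Spatial locality: consecutive iterations touch adjacent array
     * elements, which typically fall in the same cache block. */
    for (int i = 0; i < N; i++)
        a[i] = i;

    /* Temporal locality: sum and i are referenced on every iteration,
     * and the loop's instructions are re-fetched each time around. */
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("%d\n", sum);
    return 0;
}
```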
Hit and Miss
Focus on any two adjacent levels of the memory hierarchy – called upper (closer to the CPU) and lower (farther from the CPU) – because each block copy always moves between two adjacent levels.
Terminology:
block: the minimum unit of data moved between levels
hit: the requested item is found in the upper level
miss: the requested item is absent from the upper level and must be fetched from the lower level
[Figure: the upper level holds blocks X1, X2, X3, X4, Xn–1, and Xn–2; a reference to Xn causes a miss, so Xn is fetched from the lower-level memory into the upper level.]
Issues:
how do we know if a data item is in the cache?
if it is, how do we find it?
[Figure: a direct-mapped cache with 1024 one-word entries, indexed 0 through 1023; each entry holds a valid bit, a 20-bit tag, and 32 bits of data.]
[Figure: a direct-mapped cache with 16K one-word entries. The 32-bit address splits into a 2-bit byte offset, a 14-bit index selecting one of the 16K entries, and a 16-bit tag; each entry holds a valid bit, a 16-bit tag, and 32 bits of data. Hit is asserted when the selected entry is valid and its stored tag matches the address tag.]
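To make the address breakdown concrete, here is a minimal C sketch of a lookup in the 16K-entry direct-mapped cache above (the structure and function names are illustrative, not from the lecture):

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_ENTRIES 16384            /* 16K one-word entries, 14-bit index */

struct line {
    bool     valid;
    uint32_t tag;                    /* upper 16 bits of the address */
    uint32_t data;                   /* one 32-bit word */
};

static struct line cache[NUM_ENTRIES];

/* Returns true on a hit and stores the requested word in *out. */
bool lookup(uint32_t addr, uint32_t *out)
{
    uint32_t index = (addr >> 2) & (NUM_ENTRIES - 1); /* skip 2-bit byte offset */
    uint32_t tag   = addr >> 16;                      /* remaining upper bits   */

    if (cache[index].valid && cache[index].tag == tag) {
        *out = cache[index].data;    /* hit */
        return true;
    }
    return false;                    /* miss: block must be fetched from below */
}
```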
Write-back scheme
write the data only into the cache block, and write the block back to main memory only when it is replaced in the cache (a dirty bit per block records whether the cached copy was modified)
more efficient than write-through, but more complex to implement
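A minimal sketch of the write-back policy, reusing the hypothetical direct-mapped cache above with a dirty bit added; write_block_to_memory is an assumed helper, not something from the lecture:

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_ENTRIES 16384

struct wb_line {
    bool     valid;
    bool     dirty;                  /* cached copy differs from main memory */
    uint32_t tag;
    uint32_t data;
};

static struct wb_line wb_cache[NUM_ENTRIES];

/* Assumed helper: writes an evicted block back to main memory. */
void write_block_to_memory(uint32_t tag, uint32_t index, uint32_t data);

void write_word(uint32_t addr, uint32_t value)
{
    uint32_t index = (addr >> 2) & (NUM_ENTRIES - 1);
    uint32_t tag   = addr >> 16;
    struct wb_line *l = &wb_cache[index];

    if (!(l->valid && l->tag == tag)) {
        /* The entry holds a different block: write it back to main
         * memory first if it was modified, then claim the entry. */
        if (l->valid && l->dirty)
            write_block_to_memory(l->tag, index, l->data);
        l->valid = true;
        l->tag   = tag;
    }
    l->data  = value;                /* write only into the cache...      */
    l->dirty = true;                 /* ...main memory is now out of date */
}
```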
Direct Mapped Cache: Taking Advantage of Spatial Locality
Taking advantage of spatial locality with larger blocks:
[Figure: a direct-mapped cache with 4K four-word blocks. The byte offset (least significant 2 bits) is ignored, the next 2 bits are the block offset, and the next 12 bits index into the cache; the remaining 16 bits form the tag. Each entry holds a valid bit, a 16-bit tag, and 128 bits of data, and a multiplexor uses the block offset to select the requested 32-bit word.]
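To spell out the field extraction, a small sketch using the widths from the figure (the variable names and example address are mine):

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t addr = 0x12345678;                  /* arbitrary example address */

    uint32_t byte_offset  = addr        & 0x3;   /* bits 1..0, ignored        */
    uint32_t block_offset = (addr >> 2) & 0x3;   /* bits 3..2: word in block  */
    uint32_t index        = (addr >> 4) & 0xFFF; /* bits 15..4: 12-bit index  */
    uint32_t tag          = addr >> 16;          /* bits 31..16: 16-bit tag   */

    printf("tag=%x index=%x block=%u byte=%u\n",
           tag, index, block_offset, byte_offset);
    return 0;
}
```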
Cache replacement with large (multiword) blocks:
word read miss: read the entire block from main memory
word write miss: cannot simply write the word and the tag! Why? Because the other words in the entry may still belong to a different memory block; overwriting the tag would falsely claim them as part of the new block, so the block must be fetched from memory before (or as) the word is written.
[Figure: miss rate versus block size (4 to 256 bytes) for cache sizes of 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB.]
Example
memory stall cycles = number of memory accesses × miss rate × miss penalty
these stall cycles add directly to program execution time
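For instance, with assumed numbers (not from the lecture): a program making 1,000,000 memory accesses with a 5% miss rate and a 100-cycle miss penalty incurs 1,000,000 × 0.05 × 100 = 5,000,000 memory stall cycles.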
Block placement schemes:
Direct mapped: one unique cache location for each memory block
cache block index = memory block address mod number of blocks in the cache
Fully associative: each memory block can be placed anywhere in the cache
all cache entries are searched (in parallel) to locate the block
Set associative: each memory block maps to one unique set of cache locations – if each set holds n blocks, the cache is n-way set-associative
cache set index = memory block address mod number of sets in the cache
all entries in the corresponding set are searched (in parallel) to locate the block
Increasing the degree of associativity
reduces the miss rate
increases the hit time, because of the parallel tag search followed by the data selection
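A minimal C sketch of the three placement computations, assuming an eight-block cache (the constants match the figure below; everything else is illustrative):

```c
#include <stdio.h>
#include <stdint.h>

#define NUM_BLOCKS 8                       /* assumed cache size in blocks */

int main(void) {
    uint32_t block_addr = 12;              /* memory block address */

    /* Direct mapped: exactly one candidate location. */
    uint32_t dm_index = block_addr % NUM_BLOCKS;           /* 12 mod 8 = 4 */

    /* Two-way set-associative: 8 blocks / 2 ways = 4 sets;
     * the block may go in either way of its set. */
    uint32_t num_sets  = NUM_BLOCKS / 2;
    uint32_t set_index = block_addr % num_sets;            /* 12 mod 4 = 0 */

    /* Fully associative: any of the 8 locations may hold the block,
     * so there is no index and all tags are compared in parallel. */

    printf("direct-mapped slot %u, two-way set %u\n", dm_index, set_index);
    return 0;
}
```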
Decreasing Miss Rates with Associative Block Placement
[Figure: where memory block 12 may be placed in an eight-block cache: direct mapped – only location 12 mod 8 = 4; two-way set-associative – either way of set 12 mod 4 = 0; fully associative – any location.]

Block address translation in a two-way set-associative cache with two sets:

Block address   Cache set
0               0 (= 0 mod 2)
6               0 (= 6 mod 2)
8               0 (= 8 mod 2)
Implementation of a Set-Associative Cache
[Figure: a four-way set-associative cache with 256 sets. The address splits into a 22-bit tag (bits 31–10), an 8-bit index (bits 9–2), and a 2-bit byte offset; the four tags of the indexed set are compared in parallel, and a 4-to-1 multiplexor selects the data from the matching way to produce the Hit and Data outputs.]
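A minimal C sketch of that lookup, assuming 256 sets of four ways with one-word blocks (the names are illustrative; real hardware compares the four tags in parallel rather than in a loop):

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_SETS 256                 /* 8-bit index           */
#define NUM_WAYS 4                   /* 4-way set-associative */

struct way {
    bool     valid;
    uint32_t tag;                    /* 22-bit tag stored per way */
    uint32_t data;
};

static struct way sets[NUM_SETS][NUM_WAYS];

/* Returns true on a hit and stores the requested word in *out. */
bool sa_lookup(uint32_t addr, uint32_t *out)
{
    uint32_t index = (addr >> 2) & (NUM_SETS - 1); /* bits 9..2   */
    uint32_t tag   = addr >> 10;                   /* bits 31..10 */

    for (int w = 0; w < NUM_WAYS; w++) {           /* models the 4-to-1 mux */
        if (sets[index][w].valid && sets[index][w].tag == tag) {
            *out = sets[index][w].data;
            return true;
        }
    }
    return false;                                  /* miss */
}
```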
[Figure: miss rate versus associativity (one-way, two-way, four-way, eight-way) for cache sizes of 1 KB, 2 KB, 4 KB, 8 KB, 16 KB, 32 KB, 64 KB, and 128 KB.]