9 - Cache
CACHE
- Small in size
- High cost
Fig. 13.55 A CMOS SRAM memory cell.
MAIN MEMORY
- Low cost
• 24 bit address
• 2 bit word identifier (4 byte block)
• 22 bit block identifier
– 8 bit tag (=22-14)
– 14 bit slot or line
• No two blocks in the same line have the same Tag field
• Check contents of cache by finding line and checking Tag
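The address split above can be sketched with shift-and-mask arithmetic; a minimal sketch in Python, using the 8-bit tag / 14-bit line / 2-bit word split from the slide:

```python
# Field widths for the direct-mapped split described above:
# 24-bit address = 8-bit tag + 14-bit line + 2-bit word identifier.
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2

def split_address(addr: int):
    """Split a 24-bit address into (tag, line, word) fields."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = (addr >> (WORD_BITS + LINE_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, line, word

# An all-ones address yields all-ones fields:
print(split_address(0xFFFFFF))  # (255, 16383, 3)
```

To check a reference, the cache uses the line field to pick a slot, then compares the stored tag against the tag field.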
Direct Mapping Cache Organization
Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for given block
– If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high
Associative Mapping
• A main memory block can load into any line of cache
• Memory address is interpreted as tag and word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Cache searching gets expensive
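The tag comparison can be sketched as a loop (hardware examines every line's tag in parallel; the line contents below are made-up values for illustration):

```python
# Fully associative lookup sketch: compare the address tag against every
# line's tag. cache_lines holds (valid, tag, data) tuples.

def fa_lookup(cache_lines, tag):
    """Return the cached data on a hit, or None on a miss."""
    for valid, line_tag, data in cache_lines:
        if valid and line_tag == tag:
            return data  # hit: tags match and the line is valid
    return None          # miss: the block must be fetched from memory

lines = [(True, 0x3A, "blockA"), (True, 0x07, "blockB"), (False, 0x00, None)]
print(fa_lookup(lines, 0x07))  # blockB
print(fa_lookup(lines, 0x10))  # None -> miss
```

The loop makes the cost visible: a software search is linear in the number of lines, which is why hardware needs one comparator per line.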
Fully Associative Cache Organization
Associative Mapping Address Structure
Tag: 22 bits | Word: 2 bits
(Figure: fully associative cache contents before and after a reference to Xn — the reference to Xn causes a miss, so it is fetched from memory and placed in the cache.)
Issues:
how do we know if a data item is in the cache?
(Figure: a direct-mapped cache with 1024 entries, indexed 0–1023 — each entry holds a valid bit, a 20-bit tag, and 32 bits of data; the index selects one entry, and a hit is signaled when the entry is valid and its tag matches the address tag.)
(Figure: a direct-mapped cache with 16K entries — the address is split into a 16-bit tag, a 14-bit index, and a 2-bit byte offset; each entry holds a valid bit, a 16-bit tag, and 32 bits of data.)
Write-back scheme
write the data block only into the cache, and write the block back to main memory only when it is replaced in the cache
more efficient than write-through, but more complex to implement
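A minimal single-line sketch of the write-back policy, using a dirty bit to defer the memory update until eviction (all names and values below are illustrative):

```python
# One-line write-back cache sketch: writes go only to the cache and set a
# dirty bit; memory is updated only when a modified block is evicted.

memory = {0: "old0", 1: "old1"}
cache = {"valid": False, "block": None, "data": None, "dirty": False}

def evict():
    if cache["dirty"]:                    # write back only if modified
        memory[cache["block"]] = cache["data"]
    cache.update(valid=False, dirty=False)

def write(block, data):
    if cache["valid"] and cache["block"] != block:
        evict()                           # replacing a different block
    cache.update(valid=True, block=block, data=data, dirty=True)

write(0, "new0")   # hits only the cache; memory[0] is still "old0"
write(1, "new1")   # evicts dirty block 0, so memory[0] becomes "new0"
print(memory[0])   # new0
```

The first write never touches memory; the write-back happens only when block 0 is displaced, which is exactly the efficiency/complexity trade-off noted above.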
Direct Mapped Cache: Taking Advantage of Spatial Locality
Taking advantage of spatial locality with larger blocks:
(Figure: a direct-mapped cache with 4K four-word blocks — the address is split into a 16-bit tag, a 12-bit index, a 2-bit block offset, and a 2-bit byte offset; each entry holds a valid bit, a 16-bit tag, and four 32-bit words, with a 4-to-1 multiplexor selecting the addressed word.)
Cache with 4K 4-word blocks: the byte offset (2 least-significant bits) is ignored, the next 2 bits are the block offset, and the next 12 bits are used to index into the cache
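The split described in the caption can be sketched as follows (Python used for illustration; the field widths are the 16/12/2/2 split from the slide):

```python
# Address split for a direct-mapped cache with 4K four-word blocks:
# bits [1:0] byte offset (ignored), [3:2] block offset, [15:4] index, [31:16] tag.

def split(addr: int):
    block_offset = (addr >> 2) & 0x3   # selects one of the 4 words in the block
    index = (addr >> 4) & 0xFFF        # selects one of the 4K cache blocks
    tag = (addr >> 16) & 0xFFFF
    return tag, index, block_offset

# Two addresses inside the same 16-byte block share tag and index,
# differing only in the block offset:
print(split(0x12340))  # (1, 0x234, 0)
print(split(0x1234C))  # (1, 0x234, 3)
```

This is why larger blocks exploit spatial locality: one miss on `0x12340` brings in all four words, and the later reference to `0x1234C` hits.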
Direct Mapped Cache: Taking Advantage of Spatial Locality
Cache replacement in large (multiword) blocks:
word read miss: read the entire block from main memory
word write miss: cannot simply write the word and tag! Why?! (the other words in the block may belong to a different memory block, and changing the tag would wrongly claim them for the new block)
(Figure: miss rate (0%–35%) versus block size (4–256 bytes), plotted for cache sizes of 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB.)
Example
memory stall cycles = number of memory accesses × miss rate × miss penalty
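As a worked instance of the stall-cycle formula (memory stall cycles = number of memory accesses × miss rate × miss penalty), with assumed numbers that are not from the slide:

```python
# Hypothetical workload: 1,000,000 accesses, 5% miss rate, 100-cycle penalty.
accesses = 1_000_000
miss_rate = 0.05
miss_penalty = 100  # cycles per miss

stall_cycles = accesses * miss_rate * miss_penalty
print(int(stall_cycles))  # 5000000
```

With these numbers, 50,000 misses at 100 cycles each cost five million stall cycles, which shows why even small miss-rate reductions matter.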
Direct mapped: one unique cache location for each memory block
cache block address = memory block address mod number of blocks in cache
Fully associative: each memory block can be placed anywhere in the cache
all cache entries are searched (in parallel) to locate the block
Set associative: each memory block maps to a unique set of cache locations
– if each set holds n blocks, the cache is n-way set-associative
cache set address = memory block address mod number of sets in the cache
all cache entries in the corresponding set are searched (in parallel) to locate the block
Increasing degree of associativity
reduces miss rate
increases hit time because of the parallel search and then fetch
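The three placement rules above can be sketched side by side (an eight-block cache is assumed for illustration):

```python
# Placement rules for an 8-block cache under the three schemes.
NUM_BLOCKS = 8

def direct_mapped(block):
    """One unique location: block address mod number of blocks."""
    return block % NUM_BLOCKS

def set_associative(block, num_sets):
    """One unique set: block address mod number of sets."""
    return block % num_sets

# Fully associative: the block may occupy any of the NUM_BLOCKS locations,
# so no address computation is needed (all tags are searched instead).

print(direct_mapped(12))       # 4 (= 12 mod 8)
print(set_associative(12, 4))  # 0 (= 12 mod 4; two-way: 8 blocks / 4 sets)
```

Direct mapped is the special case of one block per set; fully associative is the special case of a single set holding all blocks.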
Decreasing Miss Rates with Associative Block Placement
(Figure: placement of memory block 12 in an eight-block cache — direct-mapped: location 4 (= 12 mod 8); two-way set-associative: set 0 (= 12 mod 4); fully associative: any location.)
Block address   Cache set
0               0 (= 0 mod 2)
6               0 (= 6 mod 2)
8               0 (= 8 mod 2)
Block address translation in a two-way set-associative cache
3 misses
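A two-way set-associative cache with LRU replacement can be simulated as below; the reference sequence is a hypothetical example (the slide's actual sequence is not shown), chosen so that blocks 0, 6, and 8 all contend for set 0 as in the table:

```python
# Two-way set-associative cache with LRU replacement, 2 sets
# (set = block address mod 2, matching the table above).
from collections import deque

NUM_SETS = 2

def count_misses(refs):
    # Each set is a deque of at most 2 blocks; LRU sits at the left,
    # so a full deque automatically evicts the LRU block on append.
    sets = [deque(maxlen=2) for _ in range(NUM_SETS)]
    misses = 0
    for block in refs:
        s = sets[block % NUM_SETS]
        if block in s:
            s.remove(block)   # hit: refresh its recency
        else:
            misses += 1       # miss: block is fetched; LRU may be evicted
        s.append(block)       # most recently used at the right
    return misses

print(count_misses([0, 8, 0, 6, 8]))  # 4 misses for this assumed sequence
```

Running the same sequence through a direct-mapped or fully associative simulator would show how the miss count shifts with placement freedom.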
Implementation of a Set-Associative Cache
(Figure: a four-way set-associative cache — the 32-bit address (bits 31–0) supplies a 22-bit tag and an 8-bit index; four tag comparators and a 4-to-1 multiplexor select the hit data.)
(Figure: miss rate (0%–12%) versus associativity (one-way, two-way, four-way, eight-way) for cache sizes of 1 KB to 128 KB.)
miss rate 5%