10 Cacheperf
Hsin-Chou Chi
[Figure: processor vs. memory performance over time, log scale from 0.01 to 1000, with curves for the processor core, main memory, and secondary memory, spanning VAX/1980 through PPro/1996 to 2010+.]
Impacts of Cache Performance
The relative cache miss penalty increases as processor
performance improves (faster clock rate and/or lower CPI)
Memory speed is unlikely to improve as fast as processor
cycle time. When calculating CPI_stall, the cache miss penalty is
measured in processor clock cycles needed to handle a miss
The lower the CPI_ideal, the more pronounced the impact of stalls
Example: a processor with a CPI_ideal of 2, a 100-cycle miss penalty,
36% load/store instructions, and 2% I$ and 4% D$ miss rates
Memory-stall cycles = 2% × 100 + 36% × 4% × 100 = 3.44
So CPI_stall = 2 + 3.44 = 5.44
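The worked example above can be reproduced directly (a sketch using only the slide's numbers):

```python
# Reproducing the slide's worked example.
cpi_ideal    = 2.0
miss_penalty = 100     # clocks
ls_fraction  = 0.36    # fraction of instructions that are loads/stores
i_miss_rate  = 0.02    # I$ miss rate (every instruction is fetched)
d_miss_rate  = 0.04    # D$ miss rate (only loads/stores access the D$)

stall_cycles = (i_miss_rate * miss_penalty
                + ls_fraction * d_miss_rate * miss_penalty)
cpi_stall = cpi_ideal + stall_cycles
print(stall_cycles, cpi_stall)   # roughly 3.44 and 5.44
```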
8 requests, 2 misses

[Figure: a 4-way set-associative cache. The address splits into a 22-bit tag, an 8-bit index selecting one of 256 sets, and a block/byte offset. Each way holds a valid bit, tag, and 32-bit data word; four tag comparators and a 4-to-1 select produce the Hit signal and the Data output.]
Range of Set Associative Caches
For a fixed-size cache, each increase by a factor of two
in associativity doubles the number of blocks per set (i.e.,
the number of ways) and halves the number of sets –
this decreases the size of the index by 1 bit and increases
the size of the tag by 1 bit
[Figure: address breakdown and the associativity spectrum. The tag is used for the tag compare, the index selects the set, and the block/byte offset selects the word in the block. Increasing associativity runs from direct mapped (only one way, smaller tags) to fully associative (only one set, where the tag is all the bits except the block and byte offset).]
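The index/tag trade-off above can be sketched for a hypothetical fixed-size cache of 256 KiB with 64-byte blocks and 32-bit addresses (sizes chosen for illustration, not from the slide):

```python
import math

cache_bytes, block_bytes, addr_bits = 256 * 1024, 64, 32
num_blocks = cache_bytes // block_bytes        # 4096 blocks total
offset_bits = int(math.log2(block_bytes))      # block + byte offset bits

rows = []
for ways in (1, 2, 4, 8, num_blocks):          # direct mapped ... fully associative
    num_sets = num_blocks // ways
    index_bits = int(math.log2(num_sets))      # 0 bits when fully associative
    tag_bits = addr_bits - index_bits - offset_bits
    rows.append((ways, num_sets, index_bits, tag_bits))
    print(f"{ways:5}-way: {num_sets:5} sets, index {index_bits:2} bits, "
          f"tag {tag_bits:2} bits")
```

Each doubling of the ways halves the sets, shrinking the index by 1 bit and growing the tag by 1 bit, exactly as stated above.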
Costs of Set Associative Caches
When a miss occurs, which way’s block do we pick for
replacement?
Least Recently Used (LRU): the block replaced is the one that
has been unused for the longest time
- Must have hardware to keep track of when each way’s block was
used relative to the other blocks in the set
- For 2-way set associative, takes one bit per set → set the bit when a
block is referenced (and reset the other way’s bit)
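The one-bit-per-set scheme for a 2-way cache can be sketched as follows (the set count of 4 is invented for illustration):

```python
# One LRU bit per set: on a reference, the *other* way becomes the
# least recently used, i.e. the replacement victim on the next miss.
NUM_SETS = 4
lru_victim = [0] * NUM_SETS   # lru_victim[s] = way to replace next in set s

def touch(s, way):
    """Record a reference to `way` in set `s`."""
    lru_victim[s] = 1 - way   # the other way is now LRU

def victim(s):
    """Which way to replace on a miss in set `s`."""
    return lru_victim[s]

touch(0, 1)        # way 1 referenced in set 0
print(victim(0))   # way 0 is now the replacement candidate
```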
N-way set associative cache costs
N comparators (delay and area)
MUX delay (set selection) before data is available
Data available only after set selection (and Hit/Miss decision). In a
direct mapped cache, the cache block is available before the
Hit/Miss decision
- So it's not possible to just assume a hit, continue, and recover later
if it was a miss
Benefits of Set Associative Caches
The choice of direct mapped or set associative depends
on the cost of a miss versus the cost of implementation
[Figure: miss rate (0% to 12%) vs. associativity (1-way, 2-way, 4-way, 8-way) for cache sizes from 4KB to 512KB. Data from Hennessy & Patterson, Computer Architecture, 2003.]
Largest gains are in going from direct mapped to 2-way
(20%+ reduction in miss rate)
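The 20%+ figure comes from real benchmark data; the toy simulator below just illustrates the mechanism, showing how a second way removes conflict misses for a pathological trace (block size, cache size, and trace all invented for illustration):

```python
def simulate(trace, num_blocks, ways, block_size=16):
    """Tiny LRU set-associative cache simulator (an illustrative sketch,
    not a model of any real machine). Returns the miss count."""
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]   # each set holds tags, LRU first
    misses = 0
    for addr in trace:
        block = addr // block_size
        idx, tag = block % num_sets, block // num_sets
        s = sets[idx]
        if tag in s:
            s.remove(tag)                  # hit: move tag to MRU position
        else:
            misses += 1
            if len(s) == ways:
                s.pop(0)                   # evict the LRU block
        s.append(tag)
    return misses

# Two addresses that conflict in a direct-mapped cache of 8 16-byte blocks:
trace = [0, 128] * 5
print(simulate(trace, num_blocks=8, ways=1))  # 10 misses (all conflict)
print(simulate(trace, num_blocks=8, ways=2))  # 2 misses (compulsory only)
```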
Reducing Cache Miss Rates #2
2. Use multiple levels of caches
                        L1 typical       L2 typical
Total size (blocks)     250 to 2000      4000 to 250,000
Total size (KB)         16 to 64         500 to 8000
Block size (B)          32 to 64         32 to 128
Miss penalty (clocks)   10 to 25         100 to 1000
Miss rates              2% to 5%         0.1% to 2% (global for L2)
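The benefit of the second level can be sketched with values picked from the middle of the typical ranges in the table above (these are assumptions, not measurements of a real machine):

```python
l1_miss_rate = 0.03    # per access (2% to 5% typical)
l2_miss_rate = 0.005   # global miss rate (0.1% to 2% typical)
l2_penalty   = 20      # L1 miss served by L2, in clocks (10 to 25 typical)
mem_penalty  = 300     # miss all the way to memory, in clocks

# average stall cycles per memory access
without_l2 = l1_miss_rate * mem_penalty
with_l2    = l1_miss_rate * l2_penalty + l2_miss_rate * mem_penalty
print(without_l2, with_l2)   # roughly 9.0 vs 2.1 stall cycles
```

Most L1 misses hit in the much larger L2, so the expensive trip to main memory is paid only for the small global L2 miss rate.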
Two Machines’ Cache Parameters
Simplicity often wins