Cache Writing & Performance
If the evicted block is dirty, its contents must first be written back to memory.
Only then can the cache block be replaced with data from the newly requested address.
[Figure: direct-mapped cache entry (Index 110, V = 1, Dirty, Tag = 10001, Data = 1225) alongside main-memory contents (address 1000 1110 → 1225, address 1101 0110 → 21763)]
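A minimal sketch of this eviction path in C. The line layout and the memory-interface hooks (write_block_to_memory, read_block_from_memory) are illustrative assumptions, not details from the slides:
#include <stdint.h>

#define BLOCK_BYTES 64

typedef struct {
    int      valid;
    int      dirty;
    uint32_t tag;
    uint8_t  data[BLOCK_BYTES];
} cache_line_t;

/* Hypothetical memory-interface hooks (not from the slides). */
void write_block_to_memory(uint32_t block_addr, const uint8_t *data);
void read_block_from_memory(uint32_t block_addr, uint8_t *data);

/* Refill one cache line on a miss that evicts it. */
void replace_block(cache_line_t *line, uint32_t old_block_addr,
                   uint32_t new_block_addr, uint32_t new_tag) {
    if (line->valid && line->dirty) {
        /* The cached copy is newer than memory: write it back first. */
        write_block_to_memory(old_block_addr, line->data);
    }
    /* Only now is it safe to overwrite the line with the new block. */
    read_block_from_memory(new_block_addr, line->data);
    line->tag   = new_tag;
    line->valid = 1;
    line->dirty = 0;
}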
Cache Hierarchies:
— Trade-off between access time & hit rate
• L1 cache can focus on fast access time (with okay hit rate)
• L2 cache can focus on good hit rate (with okay access time)
— Such hierarchical design is another “big idea”
— We saw this in section.
CPU → L1 cache → L2 cache → Main Memory
L2 Cache:
— 1 MB
— 64 byte blocks
— 4-way set associative
— 16 cycle access time (total, not just miss penalty)
Memory:
— 200+ cycle access time
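Plugging these numbers into the average memory access time (AMAT) formula gives a feel for why the hierarchy pays off. Only the 16-cycle L2 access and the 200-cycle memory access come from above; the 1-cycle L1 hit time, 5% L1 miss rate, and 20% L2 local miss rate are illustrative assumptions:
AMAT = L1 hit time + L1 miss rate × (L2 access time + L2 local miss rate × memory access time)
     ≈ 1 + 0.05 × (16 + 0.20 × 200)
     = 1 + 0.05 × 56
     = 3.8 cycles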
[Figure: miss rate (0–12%) vs. associativity (one-way, two-way, four-way, eight-way)]
[Figure: miss rate (0–15%) vs. associativity (one-way through eight-way) for 1 KB, 2 KB, 4 KB, and 8 KB caches]
[Figure: miss rate (0–40%) vs. block size (4, 16, 64, 256 bytes) for 1 KB, 8 KB, 16 KB, and 64 KB caches]
Can learn strides as well (X, X+16, X+32, …) and (X, X-1, X-2, …)
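A minimal sketch of that stride-learning idea in C (the entry layout and the issue_prefetch() hook are illustrative assumptions, not a description of real hardware). Once the same stride is seen twice in a row, the detector prefetches the next address in the pattern, which covers both (X, X+16, X+32, …) and (X, X-1, X-2, …):
#include <stdint.h>

/* One detector entry: last address seen and the last observed stride. */
typedef struct {
    uint64_t last_addr;
    int64_t  stride;
} stride_entry_t;

void issue_prefetch(uint64_t addr);   /* assumed hardware hook */

void observe_access(stride_entry_t *e, uint64_t addr) {
    int64_t new_stride = (int64_t)(addr - e->last_addr);
    if (new_stride != 0 && new_stride == e->stride) {
        /* Same stride seen twice in a row: guess the next access. */
        issue_prefetch(addr + new_stride);
    }
    e->stride    = new_stride;   /* remember (or relearn) the stride */
    e->last_addr = addr;
}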
PC-based HW Prefetching
What about a loop with 3 separate access streams (sketched below)?
— Might confuse a naïve prefetcher.
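One plausible shape for such a loop (the arrays a, b, c and the bound N are illustrative assumptions, not from the slides). Each array is walked with a perfectly regular stride, but the combined address stream alternates among three regions, so a single global stride detector keeps relearning. Indexing the stride table by the PC of each load/store gives every memory instruction its own entry, separating the streams:
double a[N], b[N], c[N];          /* three separate streams in memory */
for (int i = 0; i < N; i++) {
    c[i] = a[i] + b[i];           /* loads from a[] and b[], store to c[] */
}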
Non-stride accesses!
Like linked data structures:
— lists, arrays of pointers, etc.
Consider:
element_t *A[SIZE];                  // array of pointers
for (int i = 0; i < SIZE; i++) {
    process(A[i]);                   // A[i] itself is a strided access, but the
                                     // element it points to can be anywhere in memory
}