ACA Lecture 27 Cache Optimizations
Impact of Memory System
on Processor Performance
CPU Performance with Memory Stall = CPI without stall + Memory Stall CPI
Memory Stall CPI
= Misses per instruction × Miss penalty
= % Memory accesses/instruction × Miss rate × Miss penalty
Example: Assume 20% memory accesses/instruction, 2% miss rate, and a 400-cycle miss penalty. How much is the memory stall CPI?
Memory Stall CPI = 0.2 × 0.02 × 400 = 1.6 cycles
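The arithmetic above can be checked with a short sketch; the helper name is mine, and the figures are the ones from the example (20% memory accesses per instruction, 2% miss rate, 400-cycle penalty):

```python
def memory_stall_cpi(mem_accesses_per_instr, miss_rate, miss_penalty):
    """Memory stall CPI = %memory accesses/instr x miss rate x miss penalty."""
    return mem_accesses_per_instr * miss_rate * miss_penalty

# Example figures from the slide.
stall = memory_stall_cpi(0.20, 0.02, 400)
print(round(stall, 4))  # 1.6 extra cycles per instruction
```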
CPU Performance with
Memory Stall
CPU Performance with Memory Stall = CPI without stall + Memory Stall CPI
CPU time = IC × (CPI_execution + CPI_mem_stall) × Cycle time
CPI_mem_stall = Misses per instruction × Miss penalty
CPI_mem_stall = Memory instruction frequency × Miss rate × Miss penalty
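The CPU-time equation can be evaluated directly. A minimal sketch; the instruction count, base CPI, and cycle time below are hypothetical (only the 1.6-cycle stall CPI comes from the earlier example):

```python
def cpu_time(instr_count, cpi_execution, cpi_mem_stall, cycle_time_s):
    """CPU time = IC x (CPI_execution + CPI_mem_stall) x cycle time."""
    return instr_count * (cpi_execution + cpi_mem_stall) * cycle_time_s

# Hypothetical workload: 10^9 instructions, base CPI of 1.0, 1 ns cycle,
# plus the 1.6-cycle memory stall CPI from the earlier example.
t = cpu_time(1_000_000_000, 1.0, 1.6, 1e-9)
print(round(t, 3))  # 2.6 s: memory stalls more than double the run time
```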
Modern Computer
Architectures
Unified vs Split Caches
● Separate instruction and data caches:
– Avoid the structural hazard between instruction fetch and data access.
– Each cache can be tailored to its own access pattern.
[Diagram: split organization – processor with separate I-Cache-1 and D-Cache-1, both backed by a Unified Cache-2; unified organization – processor with a single Unified Cache-1 backed by a Unified Cache-2.]
Example 4
– Assume 16KB instruction and 16KB data caches:
– Instruction miss rate = 0.64%, data miss rate = 6.47%
– 32KB unified cache: aggregate miss rate = 1.99%
– Assume 33% of instructions are data ops, so 75% of accesses are instruction fetches (1.0/1.33) and 25% are data accesses (0.33/1.33)
– Hit time = 1 cycle, miss penalty = 50 cycles
– A data hit incurs 1 additional stall cycle in the unified cache (why? the single-ported unified cache creates a structural hazard between the data access and the instruction fetch)
– Which is better (ignoring the L2 cache)?
AMAT_Split = 75% × (1 + 0.64% × 50) + 25% × (1 + 6.47% × 50) = 2.05
AMAT_Unified = 75% × (1 + 1.99% × 50) + 25% × (1 + 1 + 1.99% × 50) = 2.24
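The two AMAT figures can be reproduced with a short sketch (the percentages and the 50-cycle penalty are those of the example; the extra data-hit cycle in the unified case models the structural hazard):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty

f_instr, f_data = 0.75, 0.25   # 1.0/1.33 instruction vs 0.33/1.33 data accesses
penalty = 50

amat_split = (f_instr * amat(1, 0.0064, penalty)
              + f_data * amat(1, 0.0647, penalty))
# Unified cache: a data hit pays 1 extra stall cycle (single-ported cache).
amat_unified = (f_instr * amat(1, 0.0199, penalty)
                + f_data * amat(1 + 1, 0.0199, penalty))

print(amat_split, amat_unified)  # split ~2.05 beats unified ~2.24 (2.245 exactly)
```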
How to Improve Cache
Performance?
AMAT = Hit time + Miss rate × Miss penalty
1. Reduce miss rate.
2. Reduce miss penalty.
3. Reduce miss penalty or miss rates
via parallelism.
4. Reduce hit time.
Reducing Miss Rates
● Techniques:
– Larger block size
– Larger cache size
– Higher associativity
– Way prediction
– Pseudo-associativity
– Compiler optimization
Reducing Miss Penalty
● Techniques:
– Multilevel caches
– Victim caches
– Read miss first
– Critical word first
Reducing Miss Penalty or
Miss Rates via Parallelism
● Techniques:
– Non-blocking caches
– Hardware prefetching
– Compiler prefetching
Reducing Cache Hit
Time
● Techniques:
– Small and simple caches
– Avoiding address translation
– Pipelined cache access
– Trace caches
Reducing Miss Penalty
● Techniques:
– Multilevel caches
– Victim caches
– Read miss first
– Critical word first
– Non-blocking caches
Multilevel Cache
Reducing Miss Penalty(1):
Multi-Level Cache
● Add a second-level cache.
● L2 Equations:
AMAT = Hit time_L1 + Miss rate_L1 × Miss penalty_L1
Miss penalty_L1 = Hit time_L2 + Miss rate_L2 × Miss penalty_L2
AMAT = Hit time_L1 + Miss rate_L1 × (Hit time_L2 + Miss rate_L2 × Miss penalty_L2)
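A minimal sketch of the two-level AMAT equation; the L1/L2 parameters below are illustrative, not from the slides:

```python
def amat_two_level(hit_l1, miss_rate_l1, hit_l2, miss_rate_l2, penalty_l2):
    """AMAT = HitTime_L1 + MissRate_L1 x (HitTime_L2 + MissRate_L2 x MissPenalty_L2)."""
    miss_penalty_l1 = hit_l2 + miss_rate_l2 * penalty_l2
    return hit_l1 + miss_rate_l1 * miss_penalty_l1

# Illustrative numbers: 1-cycle L1 hit, 4% L1 miss rate,
# 10-cycle L2 hit, 50% L2 local miss rate, 200-cycle memory penalty.
print(round(amat_two_level(1, 0.04, 10, 0.50, 200), 2))  # 1 + 0.04 x 110 = 5.4
```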
Global vs. Local Miss
Rates
● At the lower-level caches (L2 or L3), the
global miss rate provides the more
useful information:
– It indicates how effective the cache is at
reducing AMAT.
– Who cares if the miss rate of L3 is
50%, as long as only 1% of the processor's
memory accesses ever reach it?
Q. Suppose that 1000 memory references
are generated by the CPU, with 40 misses in
the first-level cache and 20 misses in the
second-level cache. What are the local and
global miss rates of each cache?
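A worked answer as a sketch (the 1000/40/20 figures are from the question; the variable names are mine):

```python
refs = 1000        # memory references issued by the CPU
l1_misses = 40     # misses in the first-level cache
l2_misses = 20     # misses in the second-level cache

# L1 sees every reference, so its local and global rates coincide.
l1_local = l1_global = l1_misses / refs   # 0.04 -> 4%
# L2 only sees the references that missed in L1.
l2_local = l2_misses / l1_misses          # 0.50 -> 50% local miss rate
l2_global = l2_misses / refs              # 0.02 -> 2% global miss rate

print(l1_local, l2_local, l2_global)  # 0.04 0.5 0.02
```

Note how the 50% local miss rate of L2 looks alarming, while its 2% global rate shows it only has to handle the 4% of references that escape L1.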