Lecture16
Announcements
• Midterm grades have been released. The solution is on the website.
• Talk to a TA if you have questions. If you can’t make any TA office hours, make an appointment.
• Come to my office hours if the TAs cannot resolve your problem.
Carnegie Mellon
Cache Illustrations
Memory (big but slow), 16 locations:
 0  1  2  3
 4  5  6  7
 8  9 10 11
12 13 14 15
Cache (small but fast), between the CPU and memory, holds copies of addresses 8, 9, 14, 3.
• The CPU requests data at address 14 (data in address b is needed). Address b is in the cache: Hit!
• The CPU requests data at address 12. Address 12 is not in the cache, so the data is fetched from memory, and address b is then stored in the cache.
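The two requests above can be mimicked in a few lines of Python. This is a minimal sketch under a few assumptions. The cache is modeled as a dictionary from address to data. The cache's limited size is ignored, because the illustration does not show which entry (if any) is replaced when address 12 is brought in. The names `access`, `cache`, and `memory` are hypothetical.

```python
def access(cache, memory, addr):
    """Return ("hit" or "miss", data) for a read of addr."""
    if addr in cache:
        return "hit", cache[addr]      # fast path: served by the cache
    data = memory[addr]                # slow path: go to memory
    cache[addr] = data                 # keep a copy for next time (capacity ignored)
    return "miss", data


# Memory holds 16 locations; the cache starts with copies of 8, 9, 14 and 3.
memory = {addr: f"data{addr}" for addr in range(16)}
cache = {addr: memory[addr] for addr in (8, 9, 14, 3)}

print(access(cache, memory, 14))  # ('hit', 'data14')
print(access(cache, memory, 12))  # ('miss', 'data12')
print(access(cache, memory, 12))  # ('hit', 'data12')
```

The second request for address 12 hits, because the first request left a copy in the cache.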
A Simple Cache
Cache: 4 locations, indexed 00, 01, 10, 11; each has a Content field and a Valid? bit.
Memory: 16 locations, addresses 0000 to 1111; address 0011 holds the value a.
• 16 memory locations
• 4 cache locations
  • Also called cache lines
• Every cache location has a valid bit indicating whether that location contains valid data; it is 0 initially.
• For now, assume cache location size == memory location size == 1 B
• The cache is smaller than memory (obviously)
  • Thus, not all memory locations can be cached at the same time
Cache Placement
Cache: 4 locations (00 to 11). Memory: 16 locations (0000 to 1111); address 0011 holds a.
• Given a memory address, say 0x0001, we want to put the data stored there into the cache; where does the data go?
Fully-Associative Cache
Cache: 4 lines, each with Content, Valid?, and Tag fields; the tag stores the full address, addr[3:0].
  Line 00: 0xBB   Line 01: 0xAA   Line 10: 0xCC   Line 11: 0xDD
Memory: address 0011 holds a; 0110 holds 0xAA; 1000 holds 0xBB; 1011 holds 0xCC; 1100 holds 0xDD.
• Every memory location can be mapped to any cache line in the cache.
  • E.g., 0xAA and 0xBB may occupy lines 00 and 01 in either order.
• Given a request to address A from the CPU, detecting a cache hit/miss requires:
  • Comparing address A with all four tags in the cache (a.k.a. associative search)
• Can we reduce the overhead:
  • of storing tags?
  • of comparison?
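Here is a minimal sketch of the associative search described above, assuming the cache state shown (all four lines are taken to be valid, since the slide does not show the valid bits). Hardware compares all four tags in parallel; the loop below does the same comparisons one at a time. `Line` and `fa_lookup` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    valid: bool
    tag: int        # fully associative: the tag is the whole address, addr[3:0]
    content: str


def fa_lookup(lines: List[Line], addr: int) -> Optional[int]:
    """Compare addr against the tag of every valid line.

    Returns the index of the matching line (a hit) or None (a miss).
    """
    for i, line in enumerate(lines):
        if line.valid and line.tag == addr:
            return i
    return None


# Cache state from the slide (valid bits assumed to be 1).
lines = [
    Line(True, 0b1000, "0xBB"),  # line 00
    Line(True, 0b0110, "0xAA"),  # line 01
    Line(True, 0b1011, "0xCC"),  # line 10
    Line(True, 0b1100, "0xDD"),  # line 11
]

print(fa_lookup(lines, 0b0110))  # 1    (hit in line 01)
print(fa_lookup(lines, 0b0011))  # None (miss)
```

Each line stores a full 4-bit tag and needs its own comparator. That is the overhead the last two bullets ask about.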
A Few Terminologies
Direct-Mapped Cache
Cache: 4 lines (00 to 11), each with a Tag field holding addr[3:2]. Memory: 16 locations (0000 to 1111). The low-order address bits, addr[1:0], select the cache line.
• Direct-mapped cache
  • One address can only be mapped to one cache line
  • Cache address CA = ADDR[1], ADDR[0]
  • Always use the lower-order address bits
  • Multiple addresses can be mapped to the same location
    • E.g., 0010 and 1010
  • The tag stores the remaining upper bits, addr[3:2], so that such addresses can be told apart
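The index/tag split for this 4-line direct-mapped cache takes only two bit operations. A minimal sketch follows; `dm_index_tag` is a hypothetical helper name.

```python
def dm_index_tag(addr: int) -> tuple:
    """Split a 4-bit address for a 4-line direct-mapped cache.

    index (CA) = addr[1:0]  -> which cache line
    tag        = addr[3:2]  -> stored in that line to tell addresses apart
    """
    index = addr & 0b11          # low-order two bits
    tag = (addr >> 2) & 0b11     # high-order two bits
    return index, tag


for addr in (0b0010, 0b1010):
    index, tag = dm_index_tag(addr)
    print(f"{addr:04b}: line {index:02b}, tag {tag:02b}")
# 0010: line 10, tag 00
# 1010: line 10, tag 10
```

Both addresses land in line 10. Only the stored tag tells which one is currently cached.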
Direct-Mapped Cache
Cache: 4 lines (00 to 11), each storing a tag, addr[3:2]. Memory: 16 locations (0000 to 1111). On a CPU request, addr[1:0] selects the cache line, and the stored tag is compared (=) with addr[3:2] to determine Hit?
• Limitation: each memory location can be mapped to only one cache location.
  • This leads to many conflicts: addresses that share addr[1:0] (e.g., 0010 and 1010) evict each other, even when other lines are free.
• How do we improve this?
  • Can each memory location have the flexibility to be mapped to different cache locations?
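To make the conflict problem concrete, the sketch below runs the same access stream through two caches of equal capacity (4 lines). One is direct-mapped; the other is fully associative. As an assumption for illustration, the fully associative version evicts in FIFO order, although no eviction happens for this stream. Function names are hypothetical.

```python
def direct_mapped(stream, num_lines=4):
    """Hit/miss outcomes for a direct-mapped cache: index = addr % num_lines."""
    tags = [None] * num_lines          # None means the valid bit is 0
    outcomes = []
    for addr in stream:
        index, tag = addr % num_lines, addr // num_lines
        if tags[index] == tag:
            outcomes.append("hit")
        else:
            outcomes.append("miss")
            tags[index] = tag          # the only line addr may use is overwritten
    return outcomes


def fully_associative(stream, num_lines=4):
    """Same capacity, but any address may use any line (FIFO eviction assumed)."""
    resident = []
    outcomes = []
    for addr in stream:
        if addr in resident:
            outcomes.append("hit")
        else:
            outcomes.append("miss")
            if len(resident) == num_lines:
                resident.pop(0)        # evict the oldest entry
            resident.append(addr)
    return outcomes


stream = [0b0010, 0b1010] * 3          # two addresses that share addr[1:0]
print(direct_mapped(stream))           # every access misses
print(fully_associative(stream))       # only the first two accesses miss
```

The direct-mapped cache thrashes on line 10 while three other lines sit unused.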
Set-Associative Cache
Cache: the same 4 lines, each with Content, Valid?, and Tag fields. Memory: 16 locations (0000 to 1111); address 0011 holds a.
The 4 lines are regrouped into 2 sets (Set 0, Set 1), each with 2 ways (Way 0, Way 1): a 2-Way Set Associative Cache.
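The sketch below models the 2-set, 2-way organization. The slide does not spell out the address split, so two assumptions are made, both labeled here. First, following the "lower-order bits" rule, the set index is addr[0] and the tag is addr[3:1]. Second, each set uses LRU replacement. `set_associative` is a hypothetical name.

```python
def set_associative(stream, num_sets=2, ways=2):
    """Hit/miss outcomes for a set-associative cache with LRU inside each set.

    set index = addr % num_sets (low-order bits), tag = addr // num_sets.
    """
    sets = [[] for _ in range(num_sets)]   # each list: LRU tag first, MRU tag last
    outcomes = []
    for addr in stream:
        index, tag = addr % num_sets, addr // num_sets
        resident = sets[index]
        if tag in resident:
            outcomes.append("hit")
            resident.remove(tag)           # re-appended below as MRU
        else:
            outcomes.append("miss")
            if len(resident) == ways:
                resident.pop(0)            # evict the LRU way of this set
        resident.append(tag)
    return outcomes


stream = [0b0010, 0b1010] * 3
print(set_associative(stream))
# ['miss', 'miss', 'hit', 'hit', 'hit', 'hit']
```

0010 and 1010 both map to Set 0 but occupy different ways, so the conflict seen in the direct-mapped cache disappears.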
Example direct-mapped cache state (all four lines valid, tag 10):
Index  Content  Valid  Tag
00     a        1      10
01     b        1      10
10     c        1      10
11     d        1      10
Lookup: addr[1:0] selects the line; the stored tag is compared (=) with addr[3:2] to determine Hit?

[Figure: Miss rate versus cache size on the Integer portion of SPEC CPU2000]
Cache Organization
• Finding a name in a roster
  • If the roster is completely unorganized
    • Need to compare the name with all the names in the roster
    • Same as a fully-associative cache
  • If the roster is ordered by last name, and within the same last name different first names are unordered
    • First find the last name group
    • Then compare the first name with all the first names in the same group
    • Same as a set-associative cache
Memory address: tag = addr[b-1:s], index = addr[s-1:0]
Locality again
• So far: temporal locality
• What about spatial locality?
• Idea: each cache location (cache line) stores multiple bytes
Cache-Line Size of 2
Cache: 4 lines (00 to 11), 2 bytes per line. Memory: 16 locations (0000 to 1111); 0011 holds a, 1000 holds a, 1001 holds b, 1010 holds c, 1011 holds d.
• Read 1000 (miss: line 00 is filled with a, b)
• Read 1001 (Hit!)
• Read 1010 (miss: line 01 is filled with c, d)
• Read 1011 (Hit!)
• How to access the cache now?
  • addr[2:1] selects the cache line
  • The stored tag is compared (=) with addr[3] to determine Hit?
  • addr[0] drives a MUX that selects which byte of the line goes to the CPU
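The four reads above can be replayed with a short simulation. It is a minimal sketch that assumes a 4-line direct-mapped cache with 2-byte lines and the address split from the slide (offset = addr[0], index = addr[2:1], tag = addr[3]). `simulate_line_size_2` is a hypothetical name.

```python
def simulate_line_size_2(stream, memory):
    """Reads on a 4-line direct-mapped cache with 2-byte lines.

    offset = addr[0], index = addr[2:1], tag = addr[3].
    Returns a list of (outcome, byte) pairs.
    """
    valid = [False] * 4
    tags = [0] * 4
    data = [None] * 4
    results = []
    for addr in stream:
        offset = addr & 0b1
        index = (addr >> 1) & 0b11
        tag = addr >> 3
        if valid[index] and tags[index] == tag:
            outcome = "hit"
        else:
            outcome = "miss"
            base = addr & ~0b1                              # first byte of the 2-byte block
            data[index] = [memory[base], memory[base + 1]]  # fetch the whole line
            valid[index], tags[index] = True, tag
        results.append((outcome, data[index][offset]))      # the MUX picks the byte
    return results


memory = {0b1000: "a", 0b1001: "b", 0b1010: "c", 0b1011: "d"}
print(simulate_line_size_2([0b1000, 0b1001, 0b1010, 0b1011], memory))
# [('miss', 'a'), ('hit', 'b'), ('miss', 'c'), ('hit', 'd')]
```

Each miss brings in the neighboring byte too, so the next read hits. This is spatial locality at work.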
Memory address: tag = addr[b-1:l+s], index = addr[l+s-1:l], offset = addr[l-1:0]
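The three address fields can be extracted with shifts and masks. The sketch below assumes 2^s lines (or sets) and 2^l bytes per line, matching the labels b, s, and l used above. `split_address` is a hypothetical helper name.

```python
def split_address(addr: int, s: int, l: int) -> tuple:
    """Split addr into (tag, index, offset).

    offset = addr[l-1:0]     selects the byte within a cache line (2**l bytes)
    index  = addr[l+s-1:l]   selects one of 2**s lines (or sets)
    tag    = addr[b-1:l+s]   the remaining high-order bits
    """
    offset = addr & ((1 << l) - 1)
    index = (addr >> l) & ((1 << s) - 1)
    tag = addr >> (l + s)
    return tag, index, offset


# The 2-byte-line example: 4 lines (s = 2), 2 bytes per line (l = 1).
print(split_address(0b1010, s=2, l=1))  # (1, 1, 0)
# With 1-byte lines (l = 0) this reduces to the tag/index split.
print(split_address(0b1010, s=2, l=0))  # (2, 2, 0)
```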
Handling Reads
• Read miss: put the data into the cache
  • Any reason not to put it into the cache?
  • What to replace? Depends on the replacement policy. More on this later.
• Read hit: nothing special. Enjoy the hit!
• Write-back
  • + Can consolidate multiple writes to the same block before eviction. Potentially saves bandwidth between cache and memory and saves energy
  • - Needs a bit in the tag store indicating the block is “dirty/modified”
• Write-through
  • + Simpler
  • + Memory is up to date
  • - More bandwidth intensive; no coalescing of writes
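The bandwidth difference can be seen by counting memory writes. This is a minimal sketch, assuming every store hits one block that is already resident and that the block is evicted afterward. `memory_writes` is a hypothetical name.

```python
def memory_writes(policy: str, num_writes: int) -> int:
    """Memory writes caused by num_writes stores to one cached block, plus its eviction.

    Assumes every store hits the same block, which is already in the cache.
    """
    if policy not in ("write-back", "write-through"):
        raise ValueError(policy)
    writes_to_memory = 0
    dirty = False                       # the "dirty/modified" bit in the tag store
    for _ in range(num_writes):
        if policy == "write-through":
            writes_to_memory += 1       # every store is also sent to memory
        else:
            dirty = True                # write-back: only mark the block modified
    # Eviction: a write-back cache writes the block back only if it is dirty.
    if policy == "write-back" and dirty:
        writes_to_memory += 1
    return writes_to_memory


print(memory_writes("write-through", 4))  # 4
print(memory_writes("write-back", 4))     # 1
```

Write-back coalesces the four stores into one write at eviction, at the cost of tracking the dirty bit.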
• Non-allocate
  • + Conserves cache space if the locality of writes is low (potentially a better cache hit rate)
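The benefit of non-allocate can be illustrated with a tiny example. It assumes a 2-line fully associative cache with LRU replacement and compares allocating on a write miss (write-allocate) against not allocating. The stream reads a "hot" address, writes two addresses that are never reused, and reads the hot address again. `simulate` and the stream are hypothetical.

```python
def simulate(stream, allocate_on_write: bool, capacity: int = 2):
    """Hit/miss outcome of each (op, addr) in stream; op is "R" or "W".

    Models a small fully associative cache with LRU replacement.
    """
    cache = []                          # resident addresses, LRU first
    outcomes = []
    for op, addr in stream:
        if addr in cache:
            outcomes.append("hit")
            cache.remove(addr)
            cache.append(addr)          # move to the MRU position
            continue
        outcomes.append("miss")
        if op == "W" and not allocate_on_write:
            continue                    # non-allocate: the write goes only to memory
        if len(cache) == capacity:
            cache.pop(0)                # evict the LRU block
        cache.append(addr)
    return outcomes


stream = [("R", 0), ("W", 5), ("W", 6), ("R", 0)]
print(simulate(stream, allocate_on_write=True))   # ['miss', 'miss', 'miss', 'miss']
print(simulate(stream, allocate_on_write=False))  # ['miss', 'miss', 'miss', 'hit']
```

With allocation, the two one-off writes push address 0 out of the cache. Without it, address 0 survives and the second read hits.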
[Figure: CPU, cache ($), and memory]
Eviction/Replacement Policy
Implementing LRU
• Idea: Evict the least recently accessed block
• Challenge: Need to keep track of the access ordering of blocks
• Question: 2-way set associative cache:
  • What do you need to implement LRU perfectly? One bit?
Cache lines: 0, 1; LRU index (1 bit).
Address stream:
• Hit on 0 (LRU index becomes 1)
• Hit on 1 (LRU index becomes 0)
• Miss, evict 0, the line the LRU index points to (LRU index becomes 1)
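For two ways, one bit per set is enough: an access to one way makes the other way the LRU. A minimal sketch follows, replaying the address stream above. `TwoWayLRU` is a hypothetical name.

```python
class TwoWayLRU:
    """One LRU bit per set of a 2-way set-associative cache."""

    def __init__(self):
        self.lru = 0                  # way to evict on the next miss

    def touch(self, way: int) -> None:
        """Record an access (hit or fill) to `way`: the other way becomes LRU."""
        self.lru = 1 - way

    def victim(self) -> int:
        return self.lru


lru_bit = TwoWayLRU()
lru_bit.touch(0)              # hit on 0
print(lru_bit.lru)            # 1
lru_bit.touch(1)              # hit on 1
print(lru_bit.lru)            # 0
v = lru_bit.victim()          # miss: evict way 0
lru_bit.touch(v)              # the new block is filled into way 0
print(v, lru_bit.lru)         # 0 1
```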
Implementing LRU
• Question: 4-way set associative cache:
  • What do you need to implement LRU perfectly? Will the same mechanism work?
  • Essentially, you have to track the ordering of all cache lines
  • What are the hardware structures needed?
  • In practice, true LRU is rarely implemented: it is too complex.
  • “Pseudo-LRU” is usually used in real processors.
Cache lines: 0, 1, 2, 3; LRU index (2 bits) = 1.
Address stream:
• Hit on 0
• Hit on 2
• Hit on 3
• Miss, evict 1
How to update the LRU index now???
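The question above can be answered by keeping the full recency order of the four ways. This is a minimal sketch of perfect LRU for one set, not a hardware design. The initial order 0, 1, 2, 3 is an assumption, chosen so that the hits on 0, 2, and 3 leave way 1 as the LRU. `TrueLRU` is a hypothetical name.

```python
class TrueLRU:
    """Perfect LRU for one set: keep the full recency order of all ways."""

    def __init__(self, ways: int):
        self.order = list(range(ways))    # order[0] is LRU, order[-1] is MRU

    def touch(self, way: int) -> None:
        self.order.remove(way)
        self.order.append(way)

    def victim(self) -> int:
        return self.order[0]


lru = TrueLRU(4)               # assumed initial order: 0, 1, 2, 3 (way 0 is LRU)
for way in (0, 2, 3):          # hit on 0, hit on 2, hit on 3
    lru.touch(way)
print(lru.victim())            # 1  (matches "Miss, evict 1")
lru.touch(1)                   # the new block goes into way 1
print(lru.victim())            # 0  (the next LRU way)
print(lru.order)               # [0, 2, 3, 1]
```

After evicting way 1, the next victim is way 0. A 2-bit index naming only the current LRU way cannot work this out, because the rest of the ordering is lost. With 4 ways there are 4! = 24 possible orderings, so perfect LRU needs at least 5 bits of state per set.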