
CSC 252/452: Computer Organization

Fall 2024: Lecture 16

Instructor: Yanan Guo

Department of Computer Science


University of Rochester

Announcements
• Mid-term grades have been released. Solutions are on the website.
• Talk to a TA if you have doubts. Make an appointment if you can’t make any TA office hours.
• Come to my office hours if the TAs cannot resolve your questions.

Cache Illustrations

[Figure: CPU at the top, a small-but-fast cache holding the data for addresses 8, 9, 14, and 3, and a big-but-slow memory with locations 0–15.]

• Hit: the CPU requests the data at address 14 (generically, address b). Address 14 is in the cache: Hit! The data is returned directly from the cache.
• Miss: the CPU requests the data at address 12. Address 12 is not in the cache: Miss! The data at address 12 is fetched from memory, stored in the cache (replacing one of the existing entries), and returned to the CPU.

A Simple Cache

[Figure: a 4-line cache (indices 00–11, with Content and Valid? columns) next to a 16-location memory (addresses 0000–1111).]

• 16 memory locations
• 4 cache locations
  • Also called cache lines
• Every cache location has a valid bit, indicating whether that location contains valid data; 0 initially.
• For now, assume cache location size == memory location size == 1 B
• Cache is smaller than memory (obviously)
  • Thus, not all memory locations can be cached at the same time

Cache Placement

[Figure: the 4-line cache next to the 16-location memory.]

• Given a memory address, say 0x0001, we want to put the data there into the cache; where does the data go?

Fully-Associative Cache

[Figure: a 4-line cache where each line stores Content, a Valid? bit, and a Tag holding the full 4-bit address (addr[3:0]); memory values 0xAA, 0xBB, 0xCC, and 0xDD can be cached in any of the four lines.]

• Every memory location can be mapped to any cache line in the cache.
• Given a request to address A from the CPU, detecting a cache hit/miss requires:
  • Comparing address A with all four tags in the cache (a.k.a. associative search)
• Can we reduce the overhead:
  • of storing tags?
  • of comparison?
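
A minimal C sketch of the associative search above (a software model, not the lecture’s hardware; the struct, field names, and 4-bit address width are assumptions for illustration): a lookup must compare the requested address against the tag of every valid line.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_LINES 4

    struct fa_line {
        bool    valid;   /* does this line hold valid data?         */
        uint8_t tag;     /* full 4-bit address addr[3:0] as the tag */
        uint8_t data;    /* 1-byte content                          */
    };

    /* Fully-associative lookup: the requested address is compared
     * with the tag of every valid line (associative search). */
    bool fa_lookup(struct fa_line cache[NUM_LINES], uint8_t addr, uint8_t *out)
    {
        for (int i = 0; i < NUM_LINES; i++) {
            if (cache[i].valid && cache[i].tag == addr) {
                *out = cache[i].data;   /* hit */
                return true;
            }
        }
        return false;                   /* miss: fetch from memory */
    }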

A Few Terminologies

• A cache line: content + valid bit + tag bits
  • Valid bit + tag bits are “overhead”
  • Content is what you really want to store
  • But we need the valid and tag bits to correctly access the cache
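
The same breakdown can be written as a small struct; a sketch in C (the field widths and names are illustrative assumptions, not an exact hardware layout):

    #include <stdbool.h>
    #include <stdint.h>

    /* One cache line: the content is what you actually want to store;
     * the valid bit and tag bits are the "overhead" needed to tell
     * whether the line holds useful data and which address it caches. */
    struct cache_line {
        bool    valid;  /* 0 initially: the line holds no useful data yet   */
        uint8_t tag;    /* identifies which memory address is cached here   */
        uint8_t data;   /* the cached content (1 B in this running example) */
    };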

Direct-Mapped Cache

[Figure: a 4-line cache indexed by addr[1:0]; each line stores a Tag holding addr[3:2]; a comparator checks the stored tag against addr[3:2] of the requested memory address to produce Hit?, which is reported back to the CPU.]

• Direct-Mapped Cache
  • One address can only be mapped to one cache line
  • Cache index CA = ADDR[1], ADDR[0]
    • Always use the lower-order address bits
• Multiple addresses can be mapped to the same location
  • E.g., 0010 and 1010
• How do we differentiate between different memory locations that are mapped to the same cache location?
  • Add a tag field for that purpose
• What should the tag field be?
  • ADDR[3] and ADDR[2] in this particular example
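
Putting the index and tag fields together, a direct-mapped lookup for this 4-line, 4-bit-address example might look like the following C sketch (the bit positions mirror the example above; the names are assumptions):

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_LINES 4

    struct dm_line {
        bool    valid;
        uint8_t tag;    /* addr[3:2] */
        uint8_t data;
    };

    /* Direct-mapped lookup: addr[1:0] selects the one possible line,
     * and the stored tag is compared against addr[3:2]. */
    bool dm_lookup(struct dm_line cache[NUM_LINES], uint8_t addr, uint8_t *out)
    {
        uint8_t index = addr & 0x3;         /* addr[1:0] */
        uint8_t tag   = (addr >> 2) & 0x3;  /* addr[3:2] */

        if (cache[index].valid && cache[index].tag == tag) {
            *out = cache[index].data;       /* hit */
            return true;
        }
        return false;                       /* miss */
    }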

Direct-Mapped Cache

• Limitation: each memory location can be mapped to only one cache location.
  • This leads to a lot of conflicts.
• How do we improve this?
  • Can each memory location have the flexibility to be mapped to different cache locations?

Set-Associative Cache

[Figure: the 4 cache lines are organized into 2 sets (Set 0 and Set 1), each with 2 ways (Way 0 and Way 1); each line has Content, Valid?, and Tag fields.]

• 2-Way Set-Associative Cache
  • 4 cache lines are organized into two sets; each set has 2 cache lines (i.e., 2 ways)
  • The lowest address bit is used as the cache index
    • Even addresses go to the first set and odd addresses go to the second set
  • Each address can be mapped to either cache line in the same set
  • The tag now stores the higher 3 bits instead of the entire address
• Given a request to an address, say 1011, from the CPU, detecting a cache hit/miss requires:
  • Using the LSB to index into the cache and find the corresponding set, in this case set 1
  • Then doing an associative search in that set, i.e., comparing the highest 3 bits 101 with both tags in set 1
  • Only two comparisons are required
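
A C sketch of the 2-way lookup just described (the bit split matches the example; the struct and function names are assumptions): the LSB picks the set, and only the two ways in that set are searched.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_SETS 2
    #define NUM_WAYS 2

    struct sa_line {
        bool    valid;
        uint8_t tag;    /* addr[3:1], the upper 3 bits */
        uint8_t data;
    };

    /* 2-way set-associative lookup: addr[0] indexes the set, then the
     * tag addr[3:1] is compared against both ways in that set. */
    bool sa_lookup(struct sa_line cache[NUM_SETS][NUM_WAYS], uint8_t addr, uint8_t *out)
    {
        uint8_t set = addr & 0x1;          /* addr[0]   */
        uint8_t tag = (addr >> 1) & 0x7;   /* addr[3:1] */

        for (int way = 0; way < NUM_WAYS; way++) {
            if (cache[set][way].valid && cache[set][way].tag == tag) {
                *out = cache[set][way].data;   /* hit: only two comparisons needed */
                return true;
            }
        }
        return false;                          /* miss */
    }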

Direct-Mapped (1-way Associative) Cache

[Figure: the same cache viewed as four 1-way sets; lines 00, 01, and 10 hold 0xEF, 0xAC, and 0x06 with tag 10, mirroring memory locations 1000, 1001, and 1010.]

• 4 cache lines are organized into four sets
• Each memory location can only be mapped to one set
  • Using the 2 LSBs to find the set
• The tag now stores the higher 2 bits

Associative versus Direct-Mapped Trade-offs

• Direct-Mapped cache
  • Generally lower hit rate
  • Simpler, faster
• Set-Associative cache
  • Generally higher hit rate; better utilization of cache resources
  • Slower and higher power consumption. Why? (The associative search needs one tag comparator per way, plus logic to combine the per-way hit signals.)

[Figure: a direct-mapped cache indexed by addr[1:0] with a single comparator checking the stored tag against addr[3:2], versus a 2-way set-associative cache indexed by addr[0] with two comparators checking both tags against addr[3:1], whose outputs are ORed to produce Hit?]

[Figure: miss rate versus cache size on the Integer portion of SPEC CPU2000.]

Cache Organization

• Finding a name in a roster
  • If the roster is completely unorganized
    • Need to compare the name with all the names in the roster
    • Same as a fully-associative cache
  • If the roster is ordered by last name, and within the same last name different first names are unordered
    • First find the last-name group
    • Then compare the first name with all the first names in the same group
    • Same as a set-associative cache

Cache Access Summary (So far…)

• Assuming b bits in a memory address
• The b bits are split into two halves:
  • The lower s bits are used as the index to find a set. Total sets S = 2^s
  • The higher (b - s) bits are used for the tag
• Associativity n (i.e., the number of ways in a cache set) is independent of the split between index and tag

Memory address layout (bit b-1 down to bit 0):
  [      tag      |    index    ]
   b-1 .......... s  s-1 ...... 0

Locality again

• So far: temporal locality
• What about spatial locality?
• Idea: each cache location (cache line) stores multiple bytes

Cache-Line Size of 2

[Figure: a 4-line cache where each line holds 2 bytes, next to the 16-location memory; locations 1000–1011 hold the values a, b, c, and d.]

• Read 1000 (miss: the line holding a and b is brought into the cache)
• Read 1001 (Hit!)
• Read 1010 (miss: the line holding c and d is brought into the cache)
• Read 1011 (Hit!)
• How do we access the cache now?
  • Index with addr[2:1], compare the stored tag against addr[3], and use addr[0] to select which byte of the line is returned to the CPU through a MUX

Cache Access Summary

• Assuming b bits in a memory address
• The b bits are split into three fields:
  • The lower l bits are used as the byte offset within a cache line. Cache line size L = 2^l
  • The next s bits are used as the index to find a set. Total sets S = 2^s
  • The higher (b - l - s) bits are used for the tag
• Associativity n is independent of the split between index and tag

Memory address layout (bit b-1 down to bit 0):
  [      tag      |    index    |   offset   ]
   b-1 ........ l+s  l+s-1 .... l  l-1 ..... 0
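
The three-field split can be written directly as shifts and masks; a minimal C sketch (the function and parameter names are made up for illustration, with l and s as on this slide):

    #include <stdint.h>

    /* Split a b-bit address into offset (low l bits), set index (next s bits),
     * and tag (the remaining high bits), following the layout above. */
    void split_address(uint64_t addr, unsigned l, unsigned s,
                       uint64_t *offset, uint64_t *index, uint64_t *tag)
    {
        *offset = addr & ((1ULL << l) - 1);          /* addr[l-1 : 0]   */
        *index  = (addr >> l) & ((1ULL << s) - 1);   /* addr[l+s-1 : l] */
        *tag    = addr >> (l + s);                   /* addr[b-1 : l+s] */
    }

    /* Example: with a 64 B line (l = 6) and 64 sets (s = 6),
     * address 0x12345 splits into offset 0x05, index 0x0D, tag 0x12. */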

Handling Reads

• Read miss: put into cache
  • Any reason not to put it into the cache?
  • What to replace? Depends on the replacement policy. More on this later.
• Read hit: nothing special. Enjoy the hit!

Handling Writes (Hit)

• Intricacy: the data value is modified!
  • Implication: the value in the cache will be different from that in memory!
• When do we write the modified data in a cache to the next level?
  • Write-through: at the time the write happens
  • Write-back: when the cache line is evicted
• Write-back
  • + Can consolidate multiple writes to the same block before eviction. Potentially saves bandwidth between cache and memory, and saves energy.
  • - Needs a bit in the tag store indicating the block is “dirty/modified”
• Write-through
  • + Simpler
  • + Memory is up to date
  • - More bandwidth intensive; no coalescing of writes
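
A hedged C sketch of the two write-hit policies (the dirty bit, the write_to_memory helper, and the struct layout are illustrative assumptions, not a real memory interface):

    #include <stdbool.h>
    #include <stdint.h>

    struct line {
        bool     valid;
        bool     dirty;   /* write-back only: has the cached copy been modified? */
        uint64_t tag;
        uint8_t  data;
    };

    void write_to_memory(uint64_t addr, uint8_t value);   /* assumed to exist elsewhere */

    /* Write-through: update the cache and immediately write to the next level. */
    void write_hit_through(struct line *l, uint64_t addr, uint8_t value)
    {
        l->data = value;
        write_to_memory(addr, value);        /* memory stays up to date */
    }

    /* Write-back: only update the cache and mark the line dirty;
     * memory is updated later, when the line is evicted. */
    void write_hit_back(struct line *l, uint8_t value)
    {
        l->data  = value;
        l->dirty = true;                     /* consolidate writes until eviction */
    }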

Handling Writes (Miss)

• Do we allocate a cache line on a write miss?
  • Write-allocate: allocate on a write miss
  • Non-write-allocate: do not allocate on a write miss
• Allocate on write miss
  • + Can consolidate writes instead of writing each of them individually to memory
  • + Simpler because write misses can be treated the same way as read misses
• Non-allocate
  • + Conserves cache space if locality of writes is low (potentially better cache hit rate)
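
Continuing the sketch, the write-miss choice could be expressed as follows in C (handle_write_miss, fetch_line_from_memory, and write_hit_in_cache are hypothetical helpers used only to show the control flow):

    #include <stdbool.h>
    #include <stdint.h>

    void write_to_memory(uint64_t addr, uint8_t value);     /* assumed helpers */
    void fetch_line_from_memory(uint64_t addr);
    void write_hit_in_cache(uint64_t addr, uint8_t value);

    /* On a write miss: write-allocate brings the line into the cache first and
     * then treats the access like a write hit; non-write-allocate writes
     * straight to memory and leaves the cache alone. */
    void handle_write_miss(uint64_t addr, uint8_t value, bool write_allocate)
    {
        if (write_allocate) {
            fetch_line_from_memory(addr);    /* allocate the line (may evict another) */
            write_hit_in_cache(addr, value); /* now handled the same way as a hit     */
        } else {
            write_to_memory(addr, value);    /* conserve cache space for other data   */
        }
    }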

Instruction vs. Data Caches

• Separate or unified?
• Unified:
  • + Dynamic sharing of cache space: no overprovisioning that might happen with static partitioning (i.e., split Inst and Data caches)
  • - Instructions and data can thrash each other (i.e., no guaranteed space for either)
  • - Inst and Data are accessed in different places in the pipeline. Where do we place the unified cache for fast access?
• First-level caches are almost always split
  • Mainly for the last reason above
• Second and higher levels are almost always unified

General Rule: Bigger == Slower

[Figure: CPU connected to a cache ($), which is connected to memory.]

• How big should the cache be?
  • Too small, and there is too much memory traffic
  • Too large, and the cache slows down execution (high latency)
• Make multiple levels of cache
  • A small L1 backed up by a larger L2
  • Today’s processors typically have 3 cache levels

A Real Intel Processor

Eviction/Replacement Policy

• Which cache line should be replaced?
  • Direct mapped? Only one place!
  • Associative caches? Multiple places!
• For an associative cache:
  • Any invalid cache line first
  • If all are valid, consult the replacement policy
    • Randomly pick one???
    • Ideally: replace the cache line that’s least likely to be used again
    • Approximation: least recently used (LRU)
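
In code form, victim selection within a set might look like the following C sketch (replacement_policy_pick stands in for whatever policy is used, e.g., LRU; all names are assumptions):

    #include <stdbool.h>

    #define NUM_WAYS 4

    struct way { bool valid; /* tag, data, ... */ };

    int replacement_policy_pick(int set);    /* e.g., LRU; assumed to exist */

    /* Pick the victim way within a set: use any invalid line first,
     * and only consult the replacement policy if all ways are valid. */
    int pick_victim(struct way set_lines[NUM_WAYS], int set)
    {
        for (int w = 0; w < NUM_WAYS; w++)
            if (!set_lines[w].valid)
                return w;                    /* free slot: no eviction needed */
        return replacement_policy_pick(set); /* all valid: the policy decides */
    }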

Implementing LRU

• Idea: evict the least recently accessed block
• Challenge: need to keep track of the access ordering of blocks
• Question: for a 2-way set associative cache, what do you need to implement LRU perfectly? One bit?
  • One LRU-index bit per set is enough: it always points at the least recently used way
• Example address stream (ways 0 and 1):
  • Hit on way 0 → LRU index = 1
  • Hit on way 1 → LRU index = 0
  • Miss → evict way 0
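
A minimal C sketch of the 1-bit scheme traced above (a software model of the per-set LRU bit; the names are assumptions):

    #include <stdint.h>

    /* One LRU bit per set: it always points at the least recently used way. */
    static uint8_t lru_index;   /* 0 or 1 */

    /* On a hit to way `way`, the other way becomes the least recently used. */
    void lru2_on_access(int way)
    {
        lru_index = (uint8_t)(1 - way);
    }

    /* On a miss, evict the way the LRU bit points to; after the fill,
     * that way is the most recently used, so point the bit the other way. */
    int lru2_pick_victim(void)
    {
        int victim = lru_index;
        lru_index = (uint8_t)(1 - victim);
        return victim;
    }

    /* Tracing the stream on this slide: hit on 0 -> LRU bit = 1;
     * hit on 1 -> LRU bit = 0; miss -> evict way 0. */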

Implementing LRU

• Question: for a 4-way set associative cache, what do you need to implement LRU perfectly? Will the same mechanism work?
  • Example address stream with a 2-bit LRU index, initially pointing at way 1: hit on way 0, hit on way 2, hit on way 3, then a miss that evicts way 1. How do we update the LRU index now?
  • Essentially, we have to track the access ordering of all cache lines in the set
  • What are the hardware structures needed?
• In reality, true LRU is never implemented. Too complex.
  • “Pseudo-LRU” is usually used in real processors.
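
One common pseudo-LRU variant (not specified in the lecture, so this particular scheme is an illustrative assumption) is tree-PLRU: for 4 ways it keeps just 3 bits per set arranged as a small binary tree, approximating LRU far more cheaply than tracking the full ordering. A C sketch:

    #include <stdint.h>

    /* Tree pseudo-LRU for a 4-way set: 3 bits arranged as a binary tree.
     * Each bit points toward the side that should be victimized next.
     * b0 chooses between {way 0, way 1} and {way 2, way 3}; b1 and b2
     * choose within each pair. (The bit convention here is one common choice.) */
    struct plru4 { uint8_t b0, b1, b2; };

    /* On an access to way w, flip the bits on w's path to point away from it. */
    void plru4_on_access(struct plru4 *t, int w)
    {
        if (w < 2) {            /* ways 0 and 1 live under the left child     */
            t->b0 = 1;          /* the next victim should come from the right */
            t->b1 = (w == 0) ? 1 : 0;
        } else {                /* ways 2 and 3 live under the right child    */
            t->b0 = 0;
            t->b2 = (w == 2) ? 1 : 0;
        }
    }

    /* Follow the bits from the root to find the pseudo-LRU victim. */
    int plru4_pick_victim(const struct plru4 *t)
    {
        if (t->b0 == 0)
            return (t->b1 == 0) ? 0 : 1;
        else
            return (t->b2 == 0) ? 2 : 3;
    }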
