Lecture 16: Cache Memories

Last Time
• AMAT: average memory access time
• Basic cache organization

Today
• Take QUIZ 12 over P&H 5.7-10 before 11:59pm today
• Read 5.4, 5.6 for 3/25
• Homework 6 due Thursday March 25, 2010
• Hardware cache organization
• Reads versus writes
• Cache optimization

Cache Memory Theory


• Small fast memory + big slow memory
• Looks like one big fast memory

[Figure: a small fast memory (cache) in front of a big slow main memory looks to the processor like a single big fast memory]

The Memory Hierarchy


Level                 Managed by   Latency           Bandwidth           Capacity
Registers (CPU chip)  compiler     1 cycle           3-10 words/cycle    < 1KB
Level 1 Cache         hardware     1-3 cycles        1-2 words/cycle     32KB - 1MB
Level 2 Cache         hardware     5-10 cycles       1 word/cycle        1MB - 4MB
DRAM                  OS           30-100 cycles     0.5 words/cycle     64MB - 4GB
Disk (mechanical)     OS           10^6-10^7 cycles  0.01 words/cycle    4GB+
Tape (mechanical)     OS           -                 -                   -

Direct Mapped
• Each block mapped to exactly 1 cache location
• Cache location = (block address) MOD (# blocks in cache)

[Figure: 8-block cache (slots 0-7); each of 32 memory blocks maps to exactly one slot, block N to slot N MOD 8]

Fully Associative
• Each block mapped to any cache location
• Cache location = any

[Figure: 8-block cache (slots 0-7); each of 32 memory blocks may be placed in any slot]

Set Associative
• Each block mapped to a subset of cache locations
• Set selection = (block address) MOD (# sets in cache)
• 2-way set associative = 2 blocks per set
• This example: 4 sets

[Figure: 8-block cache organized as 4 sets of 2 blocks; memory block N maps to set N MOD 4]
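The three placement rules in a minimal C sketch (mine, not the slides'), assuming the 8-block cache of the figures:

    #include <stdio.h>

    #define NUM_BLOCKS 8

    int main(void) {
        unsigned block_addr = 12;            /* example memory block address */

        /* Direct mapped: 8 "sets" of 1 block each */
        unsigned dm_slot = block_addr % NUM_BLOCKS;

        /* 2-way set associative: 4 sets of 2 blocks each */
        unsigned sa_set = block_addr % (NUM_BLOCKS / 2);

        printf("direct mapped: slot %u\n", dm_slot);                 /* slot 4 */
        printf("2-way assoc  : set %u, either of 2 ways\n", sa_set); /* set 0 */
        printf("fully assoc  : any of %d slots\n", NUM_BLOCKS);
        return 0;
    }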

How do we use the memory address to find a block in the cache?


How Do We Find a Block in The Cache?


• Our example:
  • Main memory address space = 32 bits (= 4 GBytes)
  • Block size = 4 words = 16 bytes
  • Cache capacity = 8 blocks = 128 bytes
• 32-bit address = block address (28 bits) + block offset (4 bits); the block address splits into tag and index
  • index: selects which set
  • tag: identifies which memory block (data/instructions) is stored there
  • block offset: selects which word in the block
• The number of tag/index bits determines the associativity
• Tag/index bits can come from anywhere in the block address
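A minimal C sketch (mine, not the slides') splitting a 32-bit address for this example in its direct-mapped form: 4-bit offset (16-byte blocks), 3-bit index (8 blocks), 25-bit tag:

    #include <stdio.h>
    #include <stdint.h>

    #define OFFSET_BITS 4
    #define INDEX_BITS  3

    int main(void) {
        uint32_t addr = 0x12345678;          /* arbitrary example address */

        uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);                /* byte in block */
        uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); /* which set */
        uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);              /* 25 remaining bits */

        printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
        return 0;
    }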

Finding a Block: Direct-Mapped

[Figure: direct-mapped lookup with cache capacity = 8 blocks. The index selects one of the 8 entries, a comparator (=) checks the stored 25-bit tag against the address tag, and a match asserts Hit and returns the Data]

Finding A Block: 2-Way Set-Associative


• S = number of sets, A = elements in each set ("A-way associative")
• This example: S = 4, A = 2, i.e. a 2-way associative, 8-entry cache

[Figure: 2-way set-associative lookup. The 2-bit index selects one of 4 sets, two comparators (=) check both stored 26-bit tags against the address tag in parallel, and a match asserts Hit and returns the Data]

Finding A Block: Fully Associative

[Figure: fully associative lookup. There is no index; comparators (=) check the 28-bit address tag against the tags of all 8 entries in parallel, and a match asserts Hit and returns the Data]

Set Associative Cache (cont'd)

• All of main memory is divided into S sets
  • All block addresses with (address MOD S) = N map to set N of the cache
  • A locations are available within each set
• Low address bits select the set (2 bits in the example)
  • Shares costly comparators across sets
• High address bits are the tag, used to associatively search the selected set
• Extreme cases:
  • A = 1: direct mapped cache
  • S = 1: fully associative
• A need not be a power of 2

Cache Organization
[Figure: cache organization. An address decoder selects a row; each entry has a Valid bit, a Tag, and Data]

• Where does a block get placed? - DONE
• How do we find it? - DONE
• Which one do we replace when a new one is brought in?
• What happens on a write?

Which Block Should Be Replaced on Miss?


• Direct mapped
  • Choice is easy - only one option
• Associative
  • Randomly select a block in the set to replace, or
  • Least Recently Used (LRU)
• Implementing LRU
  • 2-way set-associative: one bit per set suffices
  • >2-way set-associative: more state per set is needed (see the sketch below)
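A minimal C sketch (mine, not the slides') of LRU for a 2-way set: a single bit per set records the most recently used way, and the other way is the eviction victim:

    #include <stdint.h>

    typedef struct {
        uint32_t tag[2];
        int      valid[2];
        int      mru;             /* which way was used most recently */
    } set_t;

    /* Call on every hit in `way`. */
    static void touch(set_t *s, int way) {
        s->mru = way;
    }

    /* Call on a miss: pick the way to evict. */
    static int victim(const set_t *s) {
        if (!s->valid[0]) return 0;   /* prefer an empty way */
        if (!s->valid[1]) return 1;
        return 1 - s->mru;            /* otherwise the least recently used way */
    }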

What Happens on a Store?


• Need to keep the cache consistent with main memory
  • Reads are easy - no modifications
  • Writes are harder - when do we update main memory?
• Write-through
  • On a cache write, always update main memory as well
  • Use a write buffer to stockpile writes to main memory for speed
• Write-back
  • On a cache write, remember that the block is modified (dirty bit)
  • Update main memory when the dirty block is replaced
  • Sometimes need to flush the cache (I/O, multiprocessing)

BUT: What if Store Causes Miss!


• Write-allocate
  • Bring the written block into the cache
  • Update the word in the block
  • Anticipates further use of the block
• No-write-allocate
  • Main memory is updated
  • Cache contents unmodified
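A minimal, self-contained C sketch (mine, not the slides') combining write-back with write-allocate in a direct-mapped cache; the toy dram[] array and block size are assumptions:

    #include <stdint.h>
    #include <string.h>

    #define BLOCK_BYTES 16
    #define NUM_BLOCKS  8

    typedef struct {
        int      valid, dirty;
        uint32_t tag;                 /* block address / NUM_BLOCKS */
        uint8_t  data[BLOCK_BYTES];
    } line_t;

    static line_t  cache[NUM_BLOCKS];
    static uint8_t dram[1 << 20];     /* toy 1 MB main memory */

    static void store_byte(uint32_t addr, uint8_t value) {
        uint32_t offset = addr % BLOCK_BYTES;
        uint32_t block  = addr / BLOCK_BYTES;
        line_t  *l      = &cache[block % NUM_BLOCKS];

        if (!l->valid || l->tag != block / NUM_BLOCKS) {
            /* Store miss. Write-back: flush the victim line if dirty. */
            if (l->valid && l->dirty) {
                uint32_t victim = l->tag * NUM_BLOCKS + block % NUM_BLOCKS;
                memcpy(&dram[victim * BLOCK_BYTES], l->data, BLOCK_BYTES);
            }
            /* Write-allocate: bring the written block into the cache. */
            memcpy(l->data, &dram[block * BLOCK_BYTES], BLOCK_BYTES);
            l->valid = 1;
            l->dirty = 0;
            l->tag   = block / NUM_BLOCKS;
        }
        /* Update the byte in the cached block and mark it dirty;
           main memory is updated only when this line is later evicted. */
        l->data[offset] = value;
        l->dirty = 1;
    }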

Improving cache performance



How Do We Improve Cache Performance?


• AMAT = t_hit + p_miss × penalty_miss
• Three levers:
  • Reduce hit time
  • Reduce miss rate
  • Reduce miss penalty
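A quick worked example with assumed numbers (not from the slides): with t_hit = 1 cycle, p_miss = 0.05, and penalty_miss = 100 cycles, AMAT = 1 + 0.05 × 100 = 6 cycles; halving the miss rate to 0.025 brings AMAT down to 3.5 cycles.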

Questions to think about


• As the block size goes up:
  • what happens to the miss rate?
  • what happens to the miss penalty?
  • what happens to the hit time?
• As the associativity goes up:
  • what happens to the miss rate?
  • what happens to the hit time?

Reducing Miss Rate: Increase Associativity


• Reduces conflict misses
• Rules of thumb:
  • 8-way ≈ fully associative
  • Direct mapped of size N ≈ 2-way set associative of size N/2
• But!
  • A size-N associative cache is larger (more area) than a size-N direct-mapped one
  • Associative is typically slower than direct mapped (t_hit larger)

Reducing Hit Time


• Make caches small and simple
  • Hit time = 1 cycle is good (3.3 ns!)
  • L1: low associativity, relatively small
• Even L2 caches can be broken into sub-banks
  • Can exploit this for a faster hit time in L2

Reducing Miss Rate: Increase Block Size


• Fetch more data with each cache miss
  • 16 bytes → 64, 128, 256 bytes!
  • Works because of (spatial) locality

[Figure: miss rate versus block size (16-256 bytes) for cache sizes 1K, 4K, 16K, 64K, 256K]
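Why this works, as a minimal C sketch (mine, not the slides'): a sequential walk misses at most once per block, so bigger blocks amortize each miss over more accesses. The 64-byte block size below is an assumption:

    #include <stddef.h>

    /* With 64-byte blocks and 4-byte ints, a sequential sum misses at most
       once per 16 elements; the other 15 accesses hit on data the miss
       already brought in (spatial locality). */
    long sum(const int *a, size_t n) {
        long s = 0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }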

Reduce Miss Penalty: Transfer Time


• Should we transfer the whole block at once?
• Wider path to memory:
  • Transfer more bytes/cycle
  • Reduces total time to transfer a block
  • Limited by wires
• Two ways to do this:
  • Wider path to each memory
  • Separate paths to multiple memories (multiple memory banks)
• Block size and transfer unit are not necessarily equal!

Reduce Miss Penalty: Deliver Critical Word First


• Only one word from the block is needed immediately, e.g.:
    LW R3,8(R5)
• Don't wait for the entire block to be written into the cache first
  • Fetch word 2 first (deliver it to the CPU)
  • Fetch order: 2 3 0 1

Reduce Miss Penalty: More Cache Levels


• Average access time = HitTime_L1 + MissRate_L1 × MissPenalty_L1
• MissPenalty_L1 = HitTime_L2 + MissRate_L2 × MissPenalty_L2, etc.
• Size/associativity of the higher-level caches?

[Figure: CPU → L1 → L2 → L3 → main memory]
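A worked example with assumed numbers (not from the slides): with HitTime_L1 = 1 cycle, MissRate_L1 = 5%, HitTime_L2 = 10 cycles, MissRate_L2 = 20%, and MissPenalty_L2 = 100 cycles, MissPenalty_L1 = 10 + 0.2 × 100 = 30 cycles, so AMAT = 1 + 0.05 × 30 = 2.5 cycles, versus 1 + 0.05 × 100 = 6 cycles with no L2.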

Reduce Miss Penalty: Read Misses First


• Let reads pass writes in the write buffer:
    SW 512(R0),R3
    LW R1,1024(R0)
    LW R2,512(R0)
• The load from 1024 can safely bypass the buffered store, but the load from 512 must get its value from the write buffer (or wait for it to drain)

[Figure: CPU with tag/data arrays and a write buffer between the cache and main memory]

Reduce Miss Penalty: Lockup-Free (Nonblocking) Cache


• Let the cache continue to function while a miss is being serviced:
    LW R1,1024(R0)   ; MISS - being serviced by main memory
    LW R2,512(R0)    ; can still access the cache meanwhile

[Figure: CPU with tag/data arrays and a write buffer; LW R1,1024(R0) is outstanding at main memory while LW R2,512(R0) proceeds in the cache]

Reducing Miss Rate: Prefetching


• Fetch data that you will probably need
• Instructions (Alpha 21064, on a cache miss):
  • Fetches the requested block into the instruction stream buffer
  • Fetches the next sequential block into the cache
• Data:
  • Automatically fetch data into the cache (spatial locality)
  • Issues?
• Compiler-controlled prefetching (see the sketch below):
  • Inserts prefetch instructions to fetch data for later use
  • Into registers or the cache
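A minimal sketch of compiler-/software-directed prefetching using GCC/Clang's __builtin_prefetch; the prefetch distance of 16 elements is an assumed tuning parameter:

    long sum_with_prefetch(const int *a, long n) {
        long s = 0;
        for (long i = 0; i < n; i++) {
            if (i + 16 < n)
                __builtin_prefetch(&a[i + 16]);  /* hint: pull a future block into the cache */
            s += a[i];
        }
        return s;
    }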

Reducing Miss Rate: Use a Victim Cache


• Small cache (< 8 fully associative entries); Jouppi 1990
  • Put evicted lines in the victim cache FIRST
  • Search both the L1 and the victim cache, accessed in parallel
  • Captures conflict misses

[Figure: CPU probing the L1 tags and the victim cache tags (=?) in parallel]

VC: Victim Cache Example


• Given a direct-mapped L1 of 4 entries and a fully associative 1-entry VC
• Address access sequence: 8, 9, 10, 11, 8, 12, 9, 10, 11, 12, 8
• The first access to 12 misses: 8 is evicted into the VC and 12 is placed in L1; the third access to 8 then hits in the VC (see the simulation below)

[Figure: L1 = {8, 9, 10, 11} with an empty VC; after the access to 12, L1 = {12, 9, 10, 11} and VC = {8}]
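A self-contained C simulation (mine, not the slides') of the example above; it reproduces the miss on 12 and the later VC hit on 8:

    #include <stdio.h>

    #define L1_ENTRIES 4

    static int l1[L1_ENTRIES], l1_valid[L1_ENTRIES];
    static int vc, vc_valid;          /* 1-entry fully associative VC */

    static void access_block(int blk) {
        int idx = blk % L1_ENTRIES;

        if (l1_valid[idx] && l1[idx] == blk) {
            printf("%2d: L1 hit\n", blk);
        } else if (vc_valid && vc == blk) {
            printf("%2d: victim cache hit\n", blk);
            /* Swap: the VC line moves into L1, the displaced L1 line into the VC. */
            vc = l1[idx];
            l1[idx] = blk;
        } else {
            printf("%2d: miss\n", blk);
            /* Evict the old L1 line into the VC, then refill L1 from memory. */
            if (l1_valid[idx]) { vc = l1[idx]; vc_valid = 1; }
            l1[idx] = blk;
            l1_valid[idx] = 1;
        }
    }

    int main(void) {
        int seq[] = {8, 9, 10, 11, 8, 12, 9, 10, 11, 12, 8};
        for (unsigned i = 0; i < sizeof seq / sizeof seq[0]; i++)
            access_block(seq[i]);
        return 0;
    }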

Summary
Recap
• Using a memory address to find a location in the cache
• Deciding what to evict from the cache
• Improving cache performance
• Homework 6 is due March 25, 2010
• Reading: P&H 5.4, 5.6

Next Time
• Virtual memory
• TLBs
