Lecture 16: Cache Memories

Last Time
• AMAT: average memory access time
• Basic cache organization

Today
• Take QUIZ 12 over P&H 5.7-10 before 11:59pm today
• Read 5.4, 5.6 for 3/25
• Homework 6 due Thursday March 25, 2010
• Hardware cache organization
• Reads versus writes
• Cache optimization

Cache Memory Theory


• Small fast memory + big slow memory
• Looks like one big fast memory

[Figure: a small fast memory (cache) in front of a big slow main memory looks to the processor like a single big fast memory]

The Memory Hierarchy


Level                 Managed by   Latency           Bandwidth           Capacity
Registers (CPU chip)  compiler     1 cycle           3-10 words/cycle    < 1KB
Level 1 Cache         hardware     1-3 cycles        1-2 words/cycle     32KB - 1MB
Level 2 Cache         hardware     5-10 cycles       1 word/cycle        1MB - 4MB
DRAM                  OS           30-100 cycles     0.5 words/cycle     64MB - 4GB
Disk (mechanical)     OS           10^6-10^7 cycles  0.01 words/cycle    4GB+
Tape (mechanical)     OS           -                 -                   -

Direct Mapped
• Each block mapped to exactly 1 cache location
• Cache location = (block address) MOD (# blocks in cache)

[Figure: 8-block cache (slots 0-7); each of 32 memory blocks maps to exactly one slot, block N to slot N MOD 8]

Fully Associative
• Each block mapped to any cache location
• Cache location = any

[Figure: 8-block cache (slots 0-7); each of 32 memory blocks may be placed in any slot]

Set Associative
• Each block mapped to a subset of cache locations
• Set selection = (block address) MOD (# sets in cache)
• 2-way set associative = 2 blocks per set
• This example: 4 sets

[Figure: 8-block cache organized as 4 sets of 2 blocks; memory block N maps to set N MOD 4]
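The three placement rules in a minimal C sketch (mine, not the slides'), assuming the 8-block cache of the figures:

    #include <stdio.h>

    #define NUM_BLOCKS 8

    int main(void) {
        unsigned block_addr = 12;            /* example memory block address */

        /* Direct mapped: 8 "sets" of 1 block each */
        unsigned dm_slot = block_addr % NUM_BLOCKS;

        /* 2-way set associative: 4 sets of 2 blocks each */
        unsigned sa_set = block_addr % (NUM_BLOCKS / 2);

        printf("direct mapped: slot %u\n", dm_slot);                 /* slot 4 */
        printf("2-way assoc  : set %u, either of 2 ways\n", sa_set); /* set 0 */
        printf("fully assoc  : any of %d slots\n", NUM_BLOCKS);
        return 0;
    }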

How do we use the memory address to find a block in the cache?


How Do We Find a Block in The Cache?


• Our example:
  • Main memory address space = 32 bits (= 4 GBytes)
  • Block size = 4 words = 16 bytes
  • Cache capacity = 8 blocks = 128 bytes
• 32-bit address = block address (28 bits) + block offset (4 bits); the block address splits into tag and index
  • index: selects which set
  • tag: identifies which memory block (data/instructions) is stored there
  • block offset: selects which word in the block
• The number of tag/index bits determines the associativity
• Tag/index bits can come from anywhere in the block address
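A minimal C sketch (mine, not the slides') splitting a 32-bit address for this example in its direct-mapped form: 4-bit offset (16-byte blocks), 3-bit index (8 blocks), 25-bit tag:

    #include <stdio.h>
    #include <stdint.h>

    #define OFFSET_BITS 4
    #define INDEX_BITS  3

    int main(void) {
        uint32_t addr = 0x12345678;          /* arbitrary example address */

        uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);                /* byte in block */
        uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); /* which set */
        uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);              /* 25 remaining bits */

        printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
        return 0;
    }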

Finding a Block: Direct-Mapped

[Figure: direct-mapped lookup with cache capacity = 8 blocks. The index selects one of the 8 entries, a comparator (=) checks the stored 25-bit tag against the address tag, and a match asserts Hit and returns the Data]

Finding A Block: 2-Way Set-Associative


• S = number of sets, A = elements in each set ("A-way associative")
• This example: S = 4, A = 2, i.e. a 2-way associative, 8-entry cache

[Figure: 2-way set-associative lookup. The 2-bit index selects one of 4 sets, two comparators (=) check both stored 26-bit tags against the address tag in parallel, and a match asserts Hit and returns the Data]

Finding A Block: Fully Associative

[Figure: fully associative lookup. There is no index; comparators (=) check the 28-bit address tag against the tags of all 8 entries in parallel, and a match asserts Hit and returns the Data]

Set Associative Cache (cont'd)

• All of main memory is divided into S sets
  • All block addresses with (address MOD S) = N map to set N of the cache
  • A locations are available within each set
• Low address bits select the set (2 bits in the example)
  • Shares costly comparators across sets
• High address bits are the tag, used to associatively search the selected set
• Extreme cases:
  • A = 1: direct mapped cache
  • S = 1: fully associative
• A need not be a power of 2

Cache Organization
[Figure: cache organization. An address decoder selects a row; each entry has a Valid bit, a Tag, and Data]

• Where does a block get placed? - DONE
• How do we find it? - DONE
• Which one do we replace when a new one is brought in?
• What happens on a write?

Which Block Should Be Replaced on Miss?


• Direct mapped
  • Choice is easy - only one option
• Associative
  • Randomly select a block in the set to replace, or
  • Least Recently Used (LRU)
• Implementing LRU
  • 2-way set-associative: one bit per set suffices
  • >2-way set-associative: more state per set is needed (see the sketch below)
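A minimal C sketch (mine, not the slides') of LRU for a 2-way set: a single bit per set records the most recently used way, and the other way is the eviction victim:

    #include <stdint.h>

    typedef struct {
        uint32_t tag[2];
        int      valid[2];
        int      mru;             /* which way was used most recently */
    } set_t;

    /* Call on every hit in `way`. */
    static void touch(set_t *s, int way) {
        s->mru = way;
    }

    /* Call on a miss: pick the way to evict. */
    static int victim(const set_t *s) {
        if (!s->valid[0]) return 0;   /* prefer an empty way */
        if (!s->valid[1]) return 1;
        return 1 - s->mru;            /* otherwise the least recently used way */
    }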

What Happens on a Store?


• Need to keep the cache consistent with main memory
  • Reads are easy - no modifications
  • Writes are harder - when do we update main memory?
• Write-through
  • On a cache write, always update main memory as well
  • Use a write buffer to stockpile writes to main memory for speed
• Write-back
  • On a cache write, remember that the block is modified (dirty bit)
  • Update main memory when the dirty block is replaced
  • Sometimes need to flush the cache (I/O, multiprocessing)

BUT: What if Store Causes Miss!


• Write-allocate
  • Bring the written block into the cache
  • Update the word in the block
  • Anticipates further use of the block
• No-write-allocate
  • Main memory is updated
  • Cache contents unmodified
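A minimal, self-contained C sketch (mine, not the slides') combining write-back with write-allocate in a direct-mapped cache; the toy dram[] array and block size are assumptions:

    #include <stdint.h>
    #include <string.h>

    #define BLOCK_BYTES 16
    #define NUM_BLOCKS  8

    typedef struct {
        int      valid, dirty;
        uint32_t tag;                 /* block address / NUM_BLOCKS */
        uint8_t  data[BLOCK_BYTES];
    } line_t;

    static line_t  cache[NUM_BLOCKS];
    static uint8_t dram[1 << 20];     /* toy 1 MB main memory */

    static void store_byte(uint32_t addr, uint8_t value) {
        uint32_t offset = addr % BLOCK_BYTES;
        uint32_t block  = addr / BLOCK_BYTES;
        line_t  *l      = &cache[block % NUM_BLOCKS];

        if (!l->valid || l->tag != block / NUM_BLOCKS) {
            /* Store miss. Write-back: flush the victim line if dirty. */
            if (l->valid && l->dirty) {
                uint32_t victim = l->tag * NUM_BLOCKS + block % NUM_BLOCKS;
                memcpy(&dram[victim * BLOCK_BYTES], l->data, BLOCK_BYTES);
            }
            /* Write-allocate: bring the written block into the cache. */
            memcpy(l->data, &dram[block * BLOCK_BYTES], BLOCK_BYTES);
            l->valid = 1;
            l->dirty = 0;
            l->tag   = block / NUM_BLOCKS;
        }
        /* Update the byte in the cached block and mark it dirty;
           main memory is updated only when this line is later evicted. */
        l->data[offset] = value;
        l->dirty = 1;
    }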

Improving cache performance



How Do We Improve Cache Performance?


• AMAT = t_hit + p_miss × penalty_miss
• Three levers:
  • Reduce hit time
  • Reduce miss rate
  • Reduce miss penalty
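A quick worked example with assumed numbers (not from the slides): with t_hit = 1 cycle, p_miss = 0.05, and penalty_miss = 100 cycles, AMAT = 1 + 0.05 × 100 = 6 cycles; halving the miss rate to 0.025 brings AMAT down to 3.5 cycles.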

Questions to think about


• As the block size goes up:
  • what happens to the miss rate?
  • what happens to the miss penalty?
  • what happens to the hit time?
• As the associativity goes up:
  • what happens to the miss rate?
  • what happens to the hit time?

Reducing Miss Rate: Increase Associativity


• Reduces conflict misses
• Rules of thumb:
  • 8-way ≈ fully associative
  • Direct mapped of size N ≈ 2-way set associative of size N/2
• But!
  • A size-N associative cache is larger (more area) than a size-N direct-mapped one
  • Associative is typically slower than direct mapped (t_hit larger)

Reducing Hit Time


• Make caches small and simple
  • Hit time = 1 cycle is good (3.3 ns!)
  • L1: low associativity, relatively small
• Even L2 caches can be broken into sub-banks
  • Can exploit this for a faster hit time in L2

Reducing Miss Rate: Increase Block Size


• Fetch more data with each cache miss
  • 16 bytes → 64, 128, 256 bytes!
  • Works because of (spatial) locality

[Figure: miss rate versus block size (16-256 bytes) for cache sizes 1K, 4K, 16K, 64K, 256K]
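Why this works, as a minimal C sketch (mine, not the slides'): a sequential walk misses at most once per block, so bigger blocks amortize each miss over more accesses. The 64-byte block size below is an assumption:

    #include <stddef.h>

    /* With 64-byte blocks and 4-byte ints, a sequential sum misses at most
       once per 16 elements; the other 15 accesses hit on data the miss
       already brought in (spatial locality). */
    long sum(const int *a, size_t n) {
        long s = 0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }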

Reduce Miss Penalty: Transfer Time


• Should we transfer the whole block at once?
• Wider path to memory:
  • Transfer more bytes/cycle
  • Reduces total time to transfer a block
  • Limited by wires
• Two ways to do this:
  • Wider path to each memory
  • Separate paths to multiple memories (multiple memory banks)
• Block size and transfer unit are not necessarily equal!

Reduce Miss Penalty: Deliver Critical Word First


• Only one word from the block is needed immediately, e.g.:
    LW R3,8(R5)
• Don't wait for the entire block to be written into the cache first
  • Fetch word 2 first (deliver it to the CPU)
  • Fetch order: 2 3 0 1

Reduce Miss Penalty: More Cache Levels


• Average access time = HitTime_L1 + MissRate_L1 × MissPenalty_L1
• MissPenalty_L1 = HitTime_L2 + MissRate_L2 × MissPenalty_L2, etc.
• Size/associativity of the higher-level caches?

[Figure: CPU → L1 → L2 → L3 → main memory]
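A worked example with assumed numbers (not from the slides): with HitTime_L1 = 1 cycle, MissRate_L1 = 5%, HitTime_L2 = 10 cycles, MissRate_L2 = 20%, and MissPenalty_L2 = 100 cycles, MissPenalty_L1 = 10 + 0.2 × 100 = 30 cycles, so AMAT = 1 + 0.05 × 30 = 2.5 cycles, versus 1 + 0.05 × 100 = 6 cycles with no L2.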

Reduce Miss Penalty: Read Misses First


• Let reads pass writes in the write buffer:
    SW 512(R0),R3
    LW R1,1024(R0)
    LW R2,512(R0)
• The load from 1024 can safely bypass the buffered store, but the load from 512 must get its value from the write buffer (or wait for it to drain)

[Figure: CPU with tag/data arrays and a write buffer between the cache and main memory]

Reduce Miss Penalty: Lockup-Free (Nonblocking) Cache


• Let the cache continue to function while a miss is being serviced:
    LW R1,1024(R0)   ; MISS - being serviced by main memory
    LW R2,512(R0)    ; can still access the cache meanwhile

[Figure: CPU with tag/data arrays and a write buffer; LW R1,1024(R0) is outstanding at main memory while LW R2,512(R0) proceeds in the cache]

Reducing Miss Rate: Prefetching


• Fetch data that you will probably need
• Instructions (Alpha 21064, on a cache miss):
  • Fetches the requested block into the instruction stream buffer
  • Fetches the next sequential block into the cache
• Data:
  • Automatically fetch data into the cache (spatial locality)
  • Issues?
• Compiler-controlled prefetching (see the sketch below):
  • Inserts prefetch instructions to fetch data for later use
  • Into registers or the cache
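A minimal sketch of compiler-/software-directed prefetching using GCC/Clang's __builtin_prefetch; the prefetch distance of 16 elements is an assumed tuning parameter:

    long sum_with_prefetch(const int *a, long n) {
        long s = 0;
        for (long i = 0; i < n; i++) {
            if (i + 16 < n)
                __builtin_prefetch(&a[i + 16]);  /* hint: pull a future block into the cache */
            s += a[i];
        }
        return s;
    }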

Reducing Miss Rate: Use a Victim Cache


• Small cache (< 8 fully associative entries); Jouppi 1990
  • Put evicted lines in the victim cache FIRST
  • Search both the L1 and the victim cache, accessed in parallel
  • Captures conflict misses

[Figure: CPU probing the L1 tags and the victim cache tags (=?) in parallel]

VC: Victim Cache Example


• Given a direct-mapped L1 of 4 entries and a fully associative 1-entry VC
• Address access sequence: 8, 9, 10, 11, 8, 12, 9, 10, 11, 12, 8
• The first access to 12 misses: 8 is evicted into the VC and 12 is placed in L1; the third access to 8 then hits in the VC (see the simulation below)

[Figure: L1 = {8, 9, 10, 11} with an empty VC; after the access to 12, L1 = {12, 9, 10, 11} and VC = {8}]
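A self-contained C simulation (mine, not the slides') of the example above; it reproduces the miss on 12 and the later VC hit on 8:

    #include <stdio.h>

    #define L1_ENTRIES 4

    static int l1[L1_ENTRIES], l1_valid[L1_ENTRIES];
    static int vc, vc_valid;          /* 1-entry fully associative VC */

    static void access_block(int blk) {
        int idx = blk % L1_ENTRIES;

        if (l1_valid[idx] && l1[idx] == blk) {
            printf("%2d: L1 hit\n", blk);
        } else if (vc_valid && vc == blk) {
            printf("%2d: victim cache hit\n", blk);
            /* Swap: the VC line moves into L1, the displaced L1 line into the VC. */
            vc = l1[idx];
            l1[idx] = blk;
        } else {
            printf("%2d: miss\n", blk);
            /* Evict the old L1 line into the VC, then refill L1 from memory. */
            if (l1_valid[idx]) { vc = l1[idx]; vc_valid = 1; }
            l1[idx] = blk;
            l1_valid[idx] = 1;
        }
    }

    int main(void) {
        int seq[] = {8, 9, 10, 11, 8, 12, 9, 10, 11, 12, 8};
        for (unsigned i = 0; i < sizeof seq / sizeof seq[0]; i++)
            access_block(seq[i]);
        return 0;
    }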

Summary
Recap
• Using a memory address to find a location in the cache
• Deciding what to evict from the cache
• Improving cache performance
• Homework 6 is due March 25, 2010
• Reading: P&H 5.4, 5.6

Next Time
• Virtual memory
• TLBs
