This document summarizes a lecture on memory hierarchies and caches. It discusses set-associative caches, block placement and identification policies, block replacement policies, and write policies like write-back and write-through. It also describes techniques for reducing cache miss penalties like victim caches, critical word first loading, and multilevel caches. The goal is to improve overall cache performance by reducing miss rates, miss penalties, and hit times.


Computer Science 246
Computer Architecture
Spring 2009
Harvard University

Instructor: Prof. David Brooks
[email protected]

Memory Hierarchy and Caches (Part 2)

Computer Science 246


David Brooks
Caches
• Monday lecture
– Review of cache basics, direct-mapped, set-associative caches
• Today
– More on cache performance, write strategies

Computer Science 246


David Brooks
Summary of Set Associativity
• Direct Mapped
– One place in cache, One Comparator, No Muxes
• Set Associative Caches
– Restricted set of places
– N-way set associativity
– Number of comparators = number of blocks per set
– N:1 mux
• Fully Associative
– Anywhere in cache
– Number of comparators = number of blocks in cache
– N:1 mux needed (an address-breakdown sketch follows below)
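As a rough illustration of how an address maps onto these organizations, here is a minimal C sketch; the 32 KB / 4-way / 64-byte-block parameters are illustrative assumptions, not values from the lecture.

#include <stdint.h>
#include <stdio.h>

/* Illustrative parameters (assumed, not from the slides): 32 KB, 4-way, 64 B blocks */
#define CACHE_BYTES (32 * 1024)
#define WAYS        4
#define BLOCK_BYTES 64
#define NUM_SETS    (CACHE_BYTES / (WAYS * BLOCK_BYTES))   /* 128 sets */

int main(void) {
    uint32_t addr   = 0x12345678u;
    uint32_t offset = addr % BLOCK_BYTES;               /* byte within the block */
    uint32_t index  = (addr / BLOCK_BYTES) % NUM_SETS;   /* selects one set */
    uint32_t tag    = addr / (BLOCK_BYTES * NUM_SETS);   /* compared against each of the WAYS tags */
    printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
    return 0;
}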
Computer Science 246
David Brooks
More Detailed Questions
• Block placement policy?
– Where does a block go when it is fetched?
• Block identification policy?
– How do we find a block in the cache?
• Block replacement policy?
– When fetching a block into a full cache, how do we
decide what other block gets kicked out?
• Write strategy?
– Does any of this differ for reads vs. writes?

Computer Science 246


David Brooks
Block Placement + ID
• Placement
– Invariant: block always goes in exactly one set
– Fully-Associative: Cache is one set, block goes anywhere
– Direct-Mapped: Block goes in exactly one frame
– Set-Associative: Block goes in one of a few frames
• Identification
– Find Set
– Search ways in parallel (compare tags, check valid bits); a lookup sketch follows below
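A minimal C sketch of the identification step described above, assuming a 4-way set; the struct and function names are hypothetical, and real hardware compares all ways in parallel rather than in a loop.

#include <stdbool.h>
#include <stdint.h>

#define WAYS 4

struct line { bool valid; uint32_t tag; };   /* data array omitted */

/* Returns the matching way within the selected set, or -1 on a miss */
int lookup(const struct line set[WAYS], uint32_t tag) {
    for (int w = 0; w < WAYS; w++)
        if (set[w].valid && set[w].tag == tag)   /* check valid bit, compare tag */
            return w;
    return -1;
}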

Computer Science 246


David Brooks
Block Replacement
• Cache miss requires a replacement
• No decision needed in direct mapped cache
• More than one possible place for a memory block in a set-associative cache
• Replacement Strategies
– Optimal
• Replace Block used furthest ahead in time (oracle)
– Least Recently Used (LRU)
• Optimized for temporal locality (a victim-selection sketch follows below)
– (Pseudo) Random
• Nearly as good as LRU, simpler
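Below is a minimal C sketch of LRU victim selection within a set, assuming per-way last-use counters; the function name and 4-way parameter are illustrative, and real hardware typically uses cheaper pseudo-LRU approximations.

#include <stdint.h>

#define WAYS 4

/* Choose the victim way under LRU: the way touched least recently.
 * last_used[w] is assumed to be updated from a global counter on every hit. */
int lru_victim(const uint64_t last_used[WAYS]) {
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (last_used[w] < last_used[victim])
            victim = w;
    return victim;
}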
Computer Science 246
David Brooks
Write Policies
• Writes are only about 21% of data cache traffic
• Optimize cache for reads, do writes “on the side”
– Reads can do tag check/data read in parallel
– Writes must be sure we are updating the correct data
and the correct amount of data (1-8 byte writes)
– Serial process => slow
• What to do on a write hit?
• What to do on a write miss?

Computer Science 246


David Brooks
Write Hit Policies
• Q1: When to propagate new values to memory?
• Write back – Information is only written to the cache.
– Next lower level only updated when the block is evicted (dirty bits say when data has been modified)
– Can write at speed of cache
– Caches become temporarily inconsistent with lower levels of the hierarchy
– Uses less memory bandwidth/power (multiple consecutive writes may require only 1 final write)
– Multiple writes within a block can be merged into one write
– Evictions are longer latency now (must write back); a write-back sketch follows below
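A minimal write-back sketch in C; the line structure, the 64-byte block size, and the writeback_to_next_level helper are illustrative assumptions, not the lecture's implementation.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct line { bool valid, dirty; uint32_t tag; uint8_t data[64]; };

/* Stand-in for the transfer to the next level (hypothetical helper) */
static void writeback_to_next_level(const struct line *l) {
    printf("writing back block with tag 0x%x\n", l->tag);
}

/* Write hit: update only the cached copy and set the dirty bit;
 * the lower level is updated later, at eviction time. */
static void write_hit(struct line *l, int offset, uint8_t value) {
    l->data[offset] = value;
    l->dirty = true;                      /* lower level is now stale */
}

/* Eviction: dirty blocks must be written back first (longer-latency eviction) */
static void evict(struct line *l) {
    if (l->valid && l->dirty)
        writeback_to_next_level(l);
    l->valid = false;
    l->dirty = false;
}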

Computer Science 246


David Brooks
Write Hit Policies
• Q1: When to propagate new values to memory?
• Write through – Information is written to cache
and to the lower-level memory
– Main memory is always “consistent/coherent”
– Easier to implement – no dirty bits
– Reads never result in writes to lower levels (cheaper)
– Higher bandwidth needed
– Write buffers used to avoid write stalls

Computer Science 246


David Brooks
Write buffers
[Diagram: CPU -> Cache -> Write Buffer -> Lower Levels of Memory]
• Small chunks of memory to buffer outgoing writes
• Processor can continue when data is written to the buffer
• Allows overlap of processor execution with memory update

• Write buffers are essential for write-through caches


Computer Science 246
David Brooks
Write buffers
• Writes can now be pipelined (rather than serial)
• Check tag + write store data into Write Buffer
• Write data from Write Buffer to L2 cache (tags ok)
• Loads must check the Write Buffer for pending stores to the same address (a sketch of this check follows below)
[Diagram: store op -> Write Buffer entries (Address | Data) -> Data Cache (Tag | Data) -> subsequent levels of memory]
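A minimal C sketch of the load check against the write buffer; the 4-entry size and single-word granularity are assumptions, and a real buffer would also handle multiple matches and coalescing.

#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES 4

struct wb_entry { bool valid; uint32_t addr; uint32_t data; };
static struct wb_entry write_buffer[WB_ENTRIES];

/* A load must scan the write buffer for a pending store to the same address
 * before going to the lower levels; if one is found, the load can be
 * serviced from the buffer (or stall until the store drains). */
bool load_hits_write_buffer(uint32_t addr, uint32_t *out) {
    for (int i = 0; i < WB_ENTRIES; i++)
        if (write_buffer[i].valid && write_buffer[i].addr == addr) {
            *out = write_buffer[i].data;
            return true;
        }
    return false;   /* fall through to the cache / lower levels */
}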
Computer Science 246
David Brooks
Write buffer policies:
Performance/Complexity Tradeoffs
[Diagram: stores and loads flowing between the write buffer and the L2 cache]

• Allow merging of multiple stores? (“coalescing”)
• “Flush Policy” – How to do flushing of entries?
• “Load Servicing Policy” – What happens when a load occurs to data currently in the write buffer?
Computer Science 246
David Brooks
Write misses?
• Write Allocate
– Block is allocated on a write miss
– Standard write hit actions follow the block allocation
– Write misses = Read Misses
– Goes well with write-back
• No-write Allocate
– Write misses do not allocate a block
– Only update lower-level memory
– Blocks are only allocated on Read misses!
– Goes well with write-through (a write-miss sketch follows below)
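A minimal C sketch contrasting the two write-miss policies; the helper functions are hypothetical stubs standing in for the rest of the cache model.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stubs for the rest of the cache model */
static void fetch_block_into_cache(uint32_t addr)              { printf("fetch block for 0x%x\n", addr); }
static void write_into_cache(uint32_t addr, uint32_t data)     { printf("cache write 0x%x\n", addr); (void)data; }
static void write_to_lower_level(uint32_t addr, uint32_t data) { printf("memory write 0x%x\n", addr); (void)data; }

static void handle_write_miss(uint32_t addr, uint32_t data, bool write_allocate) {
    if (write_allocate) {
        fetch_block_into_cache(addr);      /* the miss is handled like a read miss */
        write_into_cache(addr, data);      /* then the normal write-hit path runs */
    } else {
        write_to_lower_level(addr, data);  /* no-write allocate: cache contents are unchanged */
    }
}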
Computer Science 246
David Brooks
Summary of Write Policies

Write Policy Hit/Miss Writes to


WriteBack/Allocate Both L1 Cache
WriteBack/NoAllocate Hit L1 Cache
WriteBack/NoAllocate Miss L2 Cache
WriteThrough/Allocate Both Both
WriteThrough/NoAllocate Hit Both
WriteThrough/NoAllocate Miss L2 Cache

Computer Science 246


David Brooks
Cache Performance
CPU time = (CPU execution cycles + Memory Stall Cycles) * Clock Cycle Time

AMAT = Hit Time + Miss Rate * Miss Penalty

• Reducing these three parameters can have a big impact on performance (a worked example follows below)
• Out-of-order processors can hide some of the miss penalty
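As a worked example with illustrative numbers (not taken from the lecture): with a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty, AMAT = 1 + 0.05 * 100 = 6 cycles. Halving the miss penalty to 50 cycles gives AMAT = 1 + 0.05 * 50 = 3.5 cycles, which is why the techniques that follow target all three terms.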
Computer Science 246
David Brooks
Reducing Miss Penalty
• Have already seen two examples of techniques to
reduce miss penalty
– Write buffers give priority to read misses over writes
– Merging write buffers
• Multiword writes are faster than many single word writes
• Now we consider several more
– Victim Caches
– Critical Word First/Early Restart
– Multilevel caches

Computer Science 246


David Brooks
Reducing Miss Penalty:
Victim Caches
• Direct mapped caches => many conflict misses
• Solution 1: More associativity (expensive)
• Solution 2: Victim Cache
• Victim Cache
– Small (4- to 8-entry), fully-associative cache between the L1 cache and the refill path
– Holds blocks discarded from cache because of evictions
– Checked on a miss before going to L2 cache
– Hit in victim cache => swap victim block with cache block (a lookup sketch follows below)
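A minimal C sketch of the victim-cache check performed on an L1 miss; the 4-entry size is within the range above, the tag here stands for the full block address, and the swap with the L1 block is omitted.

#include <stdbool.h>
#include <stdint.h>

#define VC_ENTRIES 4   /* victim caches are tiny (4-8 entries), fully associative */

struct vline { bool valid; uint32_t tag; };   /* tag covers the full block address; data omitted */
static struct vline victim_cache[VC_ENTRIES];

/* Checked only after an L1 miss, before going to L2.  On a hit, the victim
 * block and the conflicting L1 block would be swapped. */
int victim_lookup(uint32_t tag) {
    for (int i = 0; i < VC_ENTRIES; i++)
        if (victim_cache[i].valid && victim_cache[i].tag == tag)
            return i;      /* victim-cache hit: the L2 access is avoided */
    return -1;             /* miss: continue to L2 */
}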

Computer Science 246


David Brooks
Reducing Miss Penalty:
Victim Caches

• Even one entry helps some benchmarks!
• Helps more for smaller caches, larger block sizes
Computer Science 246
David Brooks
Reducing Miss Penalty:
Critical Word First/Early Restart
• CPU normally just needs one word at a time
• Large cache blocks have long transfer times
• Don’t wait for the full block to be loaded before sending the requested data word to the CPU
• Critical Word First
– Request the missed word first from memory and send it
to the CPU and continue execution
• Early Restart
– Fetch words in order, but as soon as the requested word arrives, send it to the CPU and continue execution

Computer Science 246


David Brooks
Review: Improving Cache
Performance
• How to improve cache performance?
– Reducing Cache Miss Penalty
– Reducing Miss Rate
– Reducing Miss Penalty/Rate via parallelism
– Reducing Hit Time

Computer Science 246


David Brooks
Non-blocking Caches to reduce stalls
on misses
• Non-blocking cache or lockup-free cache allows the data cache to continue to supply cache hits during a miss
– requires out-of-order execution
– requires multi-bank memories
• “hit under miss” reduces the effective miss penalty by working during the miss vs. ignoring CPU requests
• “hit under multiple miss” or “miss under miss” may further lower the effective miss penalty by overlapping multiple misses
– Significantly increases the complexity of the cache controller as there
can be multiple outstanding memory accesses
– Requires multiple memory banks (otherwise cannot support)
– Pentium Pro allows 4 outstanding memory misses (a miss-tracking sketch follows below)
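A minimal C sketch of tracking outstanding misses so hits can continue to be serviced; the 4-entry limit mirrors the Pentium Pro example above, while the structure and function names are illustrative assumptions.

#include <stdbool.h>
#include <stdint.h>

#define MAX_OUTSTANDING 4   /* e.g. the Pentium Pro allowed 4 outstanding misses */

struct miss_entry { bool busy; uint32_t block_addr; };
static struct miss_entry outstanding[MAX_OUTSTANDING];

/* Returns true if a new miss can be tracked (hit-under-miss / miss-under-miss
 * continues); false means all entries are in use and the cache must stall. */
bool start_miss(uint32_t block_addr) {
    for (int i = 0; i < MAX_OUTSTANDING; i++)
        if (!outstanding[i].busy) {
            outstanding[i].busy = true;
            outstanding[i].block_addr = block_addr;
            return true;
        }
    return false;
}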
Computer Science 246
David Brooks
Value of Hit Under Miss for SPEC
Percentage Memory Stall Time of a Blocking Cache

• FP programs on average: AMAT = 0.68 -> 0.52 -> 0.34 -> 0.26
• Int programs on average: AMAT= 0.24 -> 0.20 -> 0.19 -> 0.19
• 8 KB Data Cache, Direct Mapped, 32B block, 16 cycle miss
Reducing Misses by Hardware
Prefetching of Instructions & Data
• Instruction Prefetching
– Alpha 21064 fetches 2 blocks on a miss
– Extra block placed in “stream buffer” not the cache
– On Access: check both cache and stream buffer
– On SB Hit: move line into cache
– On SB Miss: Clear and refill SB with successive lines
• Works with data blocks too:
– Jouppi [1990]: 1 data stream buffer got 25% of misses from a 4KB cache; 4 streams got 43%
– Palacharla & Kessler [1994]: for scientific programs, 8 streams got 50% to 70% of misses from two 64KB, 4-way set-associative caches
• Prefetching relies on having extra memory bandwidth that can be used without penalty (a stream-buffer sketch follows below)
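A minimal C sketch of the stream-buffer refill behavior described above; the 4-entry depth and block-address arithmetic are illustrative assumptions.

#include <stdbool.h>
#include <stdint.h>

#define SB_DEPTH 4

struct sb_entry { bool valid; uint32_t block_addr; };   /* block data omitted */
static struct sb_entry stream_buf[SB_DEPTH];

/* On a miss that also misses the stream buffer: clear it and refill it with
 * the blocks that follow the missed one, so sequential accesses hit in it. */
void refill_stream_buffer(uint32_t missed_block_addr) {
    for (int i = 0; i < SB_DEPTH; i++) {
        stream_buf[i].valid = true;
        stream_buf[i].block_addr = missed_block_addr + 1u + (uint32_t)i;
    }
}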
Computer Science 246
David Brooks
Hardware Prefetching
• What to prefetch?
– One block ahead (spatially)
• What will this work well for?
– Address prediction for non-sequential data
• Correlated predictors (store miss, next_miss pairs in table)
• Jump-pointers (augment data structures with prefetch pointers)
• When to prefetch?
– On every reference
– On a miss (basically doubles block size!)
– When resident data becomes “dead” -- how do we know?
• No one will use it anymore, so it can be kicked out
Computer Science 246
David Brooks
Reducing Misses by
Software Prefetching Data
• Data Prefetch
– Load data into register (HP PA-RISC loads)
– Cache Prefetch: load into cache (MIPS IV, PowerPC, SPARC v. 9)
– Special prefetching instructions cannot cause faults; a form of speculative
execution
• Prefetching comes in two flavors:
– Binding prefetch: Requests load directly into register.
• Must be correct address and register!
– Non-Binding prefetch: Load into cache.
• Can be incorrect. Faults?
• Issuing Prefetch Instructions takes time
– Is cost of prefetch issues < savings in reduced misses?
– Higher superscalar width reduces the difficulty of issue bandwidth (a software-prefetch sketch follows below)
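A minimal non-binding software-prefetch sketch in C, assuming a GCC/Clang-style compiler that provides __builtin_prefetch; the prefetch distance of 16 iterations is a guess, not a tuned value.

/* Non-binding prefetch: brings the line into the cache, cannot fault, and a
 * wrong address only wastes bandwidth.  __builtin_prefetch(addr, rw, locality)
 * is the GCC/Clang intrinsic. */
#define PREFETCH_DISTANCE 16

double sum_array(const double *a, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 1);  /* read, low temporal locality */
        sum += a[i];
    }
    return sum;
}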
Computer Science 246
David Brooks
Reducing Hit Times
• Some common techniques/trends
q
– Small and simple caches
• Pentium III – 16KB L1
• Pentium 4 – 8KB L1
– Pipelined Caches (actually bandwidth increase)
• Pentium – 1 clock cycle I-Cache
• Pentium III – 2 clock cycle I-Cache
• Pentium 4 – 4 clock cycle I-Cache
– Trace Caches
• Beyond spatial locality
• Dynamic sequences of instructions (including taken branches)

Computer Science 246


David Brooks
Cache Bandwidth
• Superscalars need multiple memory accesses per cycle
• Parallel cache access: more difficult than parallel ALUs
– Caches have state, so multiple accesses will affect each other
• “True Multiporting”
– Multiple decoders, read/write wordlines per SRAM cell
– Pipeline a single port by “double pumping” (Alpha 21264)
– Multiple cache copies (like clustered register file) (POWER4)
• Interleaved Multiporting
– Cache divides into banks – two accesses to the same bank => conflict (a bank-selection sketch follows below)
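A minimal C sketch of interleaved bank selection; the 4-bank and 64-byte-block parameters are illustrative assumptions.

#include <stdint.h>

#define BLOCK_BYTES 64
#define NUM_BANKS   4

/* Two accesses in the same cycle conflict only if they select the same bank */
uint32_t bank_of(uint32_t addr) {
    return (addr / BLOCK_BYTES) % NUM_BANKS;
}

int accesses_conflict(uint32_t addr_a, uint32_t addr_b) {
    return bank_of(addr_a) == bank_of(addr_b);
}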

Computer Science 246


David Brooks
