
Cache Writing & Performance

 Today we’ll finish up with caches; we’ll cover:

— Writing to caches: keeping memory consistent & write-allocation.
— Quantifying the benefits of different cache designs, and seeing how caches affect overall performance.
— Getting data into the cache before we need it, with prefetching.



Four important questions

1. When we copy a block of data from main memory to the cache, where exactly should we put it?

2. How can we tell if a word is already in the cache, or if it has to be fetched from main memory first?

3. Eventually, the small cache memory might fill up. To load a new block from main RAM, we’d have to replace one of the existing blocks in the cache... which one?

4. How can write operations be handled by the memory system?

 Previous lectures answered the first three. Today, we consider the fourth.



Writing to a cache
 Writing to a cache raises several additional issues
 First, let’s assume that the address we want to write to is already
loaded in the cache. We’ll assume a simple direct-mapped cache:
Cache:
  Index  V  Tag    Data
  ...
  110    1  11010  42803
  ...

Memory:
  Address    Data
  ...
  1101 0110  42803
  ...

 If we write a new value to that address, we can store the new data in the cache and avoid an expensive main memory access, but the cache and main memory are now inconsistent:

Mem[1101 0110] = 21763

Cache:
  Index  V  Tag    Data
  ...
  110    1  11010  21763
  ...

Memory:
  Address    Data
  ...
  1101 0110  42803
  ...



Write-through caches

 A write-through cache solves the inconsistency problem by forcing all writes to update both the cache and the main memory:

Mem[1101 0110] = 21763

Cache:
  Index  V  Tag    Data
  ...
  110    1  11010  21763
  ...

Memory:
  Address    Data
  ...
  1101 0110  21763
  ...

 This is simple to implement and keeps the cache and memory consistent.
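To make the policy concrete, here is a minimal C sketch of a write-through store for a direct-mapped cache. The whole model (line count, block size, the store_byte_writethrough function) is a hypothetical illustration, not any real design:

#include <stdint.h>

#define NUM_LINES  256
#define BLOCK_SIZE 16

typedef struct {
    int      valid;
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
} line_t;

static line_t  cache[NUM_LINES];
static uint8_t memory[1 << 20];   /* modeled main memory */

/* Write-through: update the cache on a hit, and ALWAYS update memory. */
void store_byte_writethrough(uint32_t addr, uint8_t value)
{
    uint32_t index  = (addr / BLOCK_SIZE) % NUM_LINES;
    uint32_t tag    = (addr / BLOCK_SIZE) / NUM_LINES;
    uint32_t offset = addr % BLOCK_SIZE;
    line_t *l = &cache[index];

    if (l->valid && l->tag == tag)
        l->data[offset] = value;  /* keep the cached copy consistent */
    memory[addr] = value;         /* every store pays the memory cost */
}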

 Why is this not so good?



Write-back caches
 In a write-back cache, the memory is not updated until the cache block needs to be replaced (e.g., when loading data into a full cache set).
 For example, we might write some data to the cache at first, leaving it inconsistent with the main memory, as shown before.
— The cache block is marked “dirty” to indicate this inconsistency.

Mem[1101 0110] = 21763

Cache:
  Index  V  Dirty  Tag    Data
  ...
  110    1  1      11010  21763
  ...

Memory:
  Address    Data
  ...
  1000 1110  1225
  1101 0110  42803
  ...

 Subsequent reads to the same memory address will be serviced by the cache, which contains the correct, updated data.
Finishing the write back
 We don’t need to store the new value back to main memory unless the cache block gets replaced.
 e.g., on a read from Mem[1000 1110], which maps to the same cache block, the modified cache contents will first be written to main memory:

Cache:
  Index  V  Dirty  Tag    Data
  ...
  110    1  1      11010  21763
  ...

Memory:
  Address    Data
  ...
  1000 1110  1225
  1101 0110  21763
  ...

 Only then can the cache block be replaced with the data from address 1000 1110 (142):

Cache:
  Index  V  Dirty  Tag    Data
  ...
  110    1  0      10001  1225
  ...

Memory:
  Address    Data
  ...
  1000 1110  1225
  1101 0110  21763
  ...
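A minimal C sketch of this two-step replacement, reusing the hypothetical direct-mapped model from the write-through example (types repeated so the sketch stands alone; none of this is a real design):

#include <stdint.h>
#include <string.h>

#define NUM_LINES  256
#define BLOCK_SIZE 16

typedef struct {
    int      valid, dirty;
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
} line_t;

static line_t  cache[NUM_LINES];
static uint8_t memory[1 << 20];

/* Bring the block containing addr into the cache, evicting (and, if
 * dirty, writing back) whatever block currently occupies its slot. */
line_t *fill_block(uint32_t addr)
{
    uint32_t index = (addr / BLOCK_SIZE) % NUM_LINES;
    uint32_t tag   = (addr / BLOCK_SIZE) / NUM_LINES;
    line_t *l = &cache[index];

    if (l->valid && l->tag != tag) {
        if (l->dirty) {
            /* Step 1: write the modified block back to main memory. */
            uint32_t old_base = (l->tag * NUM_LINES + index) * BLOCK_SIZE;
            memcpy(&memory[old_base], l->data, BLOCK_SIZE);
        }
        l->valid = 0;
    }
    if (!l->valid) {
        /* Step 2: only now load the requested block from memory. */
        memcpy(l->data, &memory[addr - addr % BLOCK_SIZE], BLOCK_SIZE);
        l->valid = 1;
        l->dirty = 0;
        l->tag   = tag;
    }
    return l;
}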



Write misses

 A second scenario is if we try to write to an address that is not already contained in the cache; this is called a write miss.
 Let’s say we want to store 21763 into Mem[1101 0110], but we find that address is not currently in the cache:

Cache:
  Index  V  Tag    Data
  ...
  110    1  00010  123456
  ...

Memory:
  Address    Data
  ...
  1101 0110  6378
  ...

 When we update Mem[1101 0110], should we also load it into the cache?



Allocate on write

 An allocate-on-write strategy loads the newly written data into the cache:

Mem[1101 0110] = 21763

Cache:
  Index  V  Tag    Data
  ...
  110    1  11010  21763
  ...

Memory:
  Address    Data
  ...
  1101 0110  21763
  ...

 If that data is needed again soon, it will be available in the cache.
 This is generally the baseline behavior of processors.
 What about the following?

for (int i = 0; i < LARGE; i++)
    a[i] = i;
Non-temporal stores (write-around/write-no-allocate)
 For code where the stored values won’t get used in the near future, like:

for (int i = 0; i < LARGE; i++)
    a[i] = i;

 There is no point in putting these values in the cache.
 With a write-around policy, the write operation goes directly to main memory without affecting the cache:

Mem[1101 0110] = 21763

Cache:
  Index  V  Tag    Data
  ...
  110    1  00010  123456
  ...

Memory:
  Address    Data
  ...
  1101 0110  21763
  ...

 Some modern processors with write-allocate caches provide special store instructions, called non-temporal stores, that do this.
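For example, on x86 the SSE2 intrinsic _mm_stream_si32 issues such a store. A minimal sketch, assuming an x86 target compiled with SSE2 enabled (the fill_array name is ours):

#include <emmintrin.h>

void fill_array(int *a, int n)
{
    for (int i = 0; i < n; i++)
        _mm_stream_si32(&a[i], i);  /* store that bypasses the cache */
    _mm_sfence();  /* fence: make the streamed stores globally visible */
}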
Real Designs



First Observations
 Split Instruction/Data caches:
— Pro: No structural hazard between the IF & MEM stages
  • A single-ported unified cache stalls fetch during a load or store
— Con: Static partitioning of the cache between instructions & data
  • Bad if the working sets are unequal: e.g., code/DATA or CODE/data

 Cache Hierarchies:
— Trade-off between access time & hit rate
  • L1 cache can focus on fast access time (with an okay hit rate)
  • L2 cache can focus on a good hit rate (with an okay access time)
— Such hierarchical design is another “big idea”
— We saw this in section.

CPU → L1 cache → L2 cache → Main Memory



Opteron Vital Statistics

CPU → L1 cache → L2 cache → Main Memory
 L1 Caches: Instruction & Data
— 64 kB
— 64 byte blocks
— 2-way set associative
— 2 cycle access time

 L2 Cache:
— 1 MB
— 64 byte blocks
— 4-way set associative
— 16 cycle access time (total, not just miss penalty)

 Memory
— 200+ cycle access time
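To see how these numbers combine, we can compute the average memory access time (AMAT). The miss rates below are made up for illustration; only the latencies come from the table above:

AMAT = L1 time + L1 miss rate × (L2 time + L2 miss rate × memory time)
     = 2 + 0.05 × (16 + 0.20 × 200)
     = 2 + 0.05 × 56
     = 4.8 cycles

Even small miss rates matter: with these numbers, the 5% of accesses that miss in L1 more than double the average access time.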



Comparing cache organizations
 Like many architectural features, caches are evaluated
experimentally.
— As always, performance depends on the actual instruction mix,
since different programs will have different memory access
patterns.
— Simulating or executing real applications is the most accurate
way to measure performance characteristics.
 The graphs on the next few slides illustrate the simulated miss
rates for several different cache designs.
— Again, lower miss rates are generally better, but remember that the miss rate is just one component of average memory access time and execution time.
— We will do some cache simulations on the MPs.



Associativity tradeoffs and miss rates
 As we saw last time, higher associativity means more complex
hardware.
 But a highly-associative cache will also exhibit a lower miss rate.
— Each set has more blocks, so there’s less chance of a conflict
between two addresses which both belong in the same set.
 This graph shows the miss rates decreasing as the associativity
increases.

[Graph: miss rate (0% to 12%) vs. associativity (one-way, two-way, four-way, eight-way); the miss rate decreases as associativity increases]



Cache size and miss rates
 The cache size also has a significant impact on performance.
— The larger a cache is, the less chance there will be of a
conflict.
 This graph depicts the miss rate as a function of both the cache
size and its associativity.

[Graph: miss rate (0% to 15%) vs. associativity (one-way to eight-way), with one curve per cache size: 1 KB, 2 KB, 4 KB, 8 KB; larger caches and higher associativity both lower the miss rate]



Block size and miss rates
 Finally, Figure 7.12 on p. 559 shows miss rates relative to the
block size and overall cache size.
— Smaller blocks do not take maximum advantage of spatial
locality.

[Graph: miss rate (0% to 40%) vs. block size (4, 16, 64, 256 bytes), with one curve per cache size: 1 KB, 8 KB, 16 KB, 64 KB]



What happens on a cache miss?

 Can’t do the write back (into the register file) until the data is fetched.
— Easiest thing to do is stall immediately.
  • Sub-optimal if the data isn’t used right away.

 Optimization: non-blocking cache.
— Remember the miss and which register it should write into, & continue.
— Stalls when:
  • the data is needed, or
  • too many misses are outstanding.

 Exploit this by “hoist”ing loads up from their uses:

lw  $t0, 64($a0)
…
add $v0, $t0, $t1

 But…
— it uses up a register,
— for potentially many cycles (~100 to memory), and
— we might be guessing what will be accessed.
Software Prefetching

 Most modern architectures provide special software prefetch instructions.
— They look like loads without destination registers:
  • e.g., on SPIM, lw $0, 64($a0)  # “write” to the zero register
— These are hints to the processor:
  • “I think I might use the cache block containing this address.”
  • Hardware will try to move the block into the cache.
  • But hardware can ignore the hint (if busy).
— Useful for fetching data ahead of its use:

for (int i = 0; i < LARGE; i++) {
    prefetch A[i+16];   // prefetch 16 iterations ahead
    computation A[i];
}
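In C, this pseudocode can be written with GCC/Clang’s __builtin_prefetch builtin; this sketch assumes one of those compilers, and the distance of 16 and the loop body are illustrative stand-ins:

#define PREFETCH_DISTANCE 16

void compute(int *A, int n)
{
    for (int i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&A[i + PREFETCH_DISTANCE]);  /* hint only */
        A[i] = 2 * A[i] + 1;  /* stand-in for "computation A[i]" */
    }
}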
Prefetching, cont.
 Remember this graph?

int array[SIZE];
int A = 0;
for (int i = 0; i < 200000; ++i) {
    for (int j = 0; j < SIZE; ++j) {
        A += array[j];
    }
}

[Graph: “Actual Data from remsun2.ews.uiuc.edu”: running time vs. array size (0 to 10000)]



Hardware Stream Prefetching
int array[SIZE];
int A = 0;
for (int i = 0; i < 200000; ++i) {
    for (int j = 0; j < SIZE; ++j) {
        A += array[j];
    }
}

 The inner loop has a very simple access pattern:
— A, A+4, A+8, A+12, …
— This is what’s called a stream.

 We can easily build hardware to recognize streams.
 If we get a pair of sequential misses (blocks X, X+1), predict a stream.
— Fetch the next two blocks (X+2, X+3).

 Continue fetching the stream if the prefetched blocks are accessed.
— If X+2 is read/written, prefetch X+4 …

 As confidence in the stream increases, increase the number of outstanding prefetches.
— If we get to X+8, have prefetches outstanding for X+9, X+10, X+11, X+12, X+13.

 Can learn strides as well (X, X+16, X+32, …) and descending streams (X, X-1, X-2, …).
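A toy software model of the stream detector just described; every name and threshold here is hypothetical, and a real prefetcher is a small hardware state machine, not C code:

#include <stdint.h>

static uint64_t last_miss = 0;  /* block number of the previous miss */
static uint64_t next_pref = 0;  /* next block to prefetch (0 = no stream) */
static int      depth     = 0;  /* how far ahead we may run */

static void prefetch_block(uint64_t blk) { (void)blk; /* would enqueue a fetch */ }

void on_block_access(uint64_t blk, int was_miss)
{
    if (next_pref != 0 && blk + depth >= next_pref) {
        /* Access landed in prefetched territory: confidence grows,
         * so raise the number of outstanding prefetches. */
        if (depth < 8)
            depth++;
        while (next_pref <= blk + depth)
            prefetch_block(next_pref++);
    } else if (was_miss) {
        if (blk == last_miss + 1) {
            /* Two sequential misses (X, X+1): predict a stream and
             * fetch the next two blocks (X+2, X+3). */
            depth = 2;
            next_pref = blk + 1;
            prefetch_block(next_pref++);
            prefetch_block(next_pref++);
        }
        last_miss = blk;
    }
}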
PC-based HW Prefetching
 What about the following?

for (int i = 0; i < LARGE; i++) {
    C[i] = A[i] + B[i];
}

 3 separate streams
— Might confuse a naïve prefetcher.

 Observation: A, B, and C are accessed by different instructions.
 Learn a stream for each instruction.
 Modern x86 chips do both stream and PC-based stride prefetching in HW.
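A toy model of a PC-indexed stride table; the table size, confidence thresholds, and names are all made up for illustration:

#include <stdint.h>

#define TABLE_SIZE 64

typedef struct {
    uint64_t pc;         /* which load/store instruction this entry tracks */
    uint64_t last_addr;  /* address of that instruction's previous access */
    int64_t  stride;     /* last observed address delta */
    int      confidence; /* consecutive accesses with a matching stride */
} entry_t;

static entry_t table[TABLE_SIZE];

static void issue_prefetch(uint64_t addr) { (void)addr; /* would enqueue it */ }

/* Called once per memory access with the instruction's PC and address.
 * Because each instruction gets its own entry, the A, B, and C streams
 * above are tracked independently and do not confuse each other. */
void on_access(uint64_t pc, uint64_t addr)
{
    entry_t *e = &table[pc % TABLE_SIZE];

    if (e->pc != pc) {                   /* new instruction: reset the entry */
        e->pc = pc;
        e->last_addr = addr;
        e->stride = 0;
        e->confidence = 0;
        return;
    }

    int64_t stride = (int64_t)(addr - e->last_addr);
    if (stride != 0 && stride == e->stride) {
        if (e->confidence < 4)
            e->confidence++;
        if (e->confidence >= 2)          /* stride looks stable: run ahead */
            issue_prefetch(addr + (uint64_t)(e->confidence * stride));
    } else {
        e->confidence = 0;               /* pattern broke: start over */
    }
    e->stride = stride;
    e->last_addr = addr;
}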



So what do we need SW prefetching for?

 Non-stride accesses!
 Like linked data structures:
— lists, arrays of pointers, etc.

 Consider:

element_t *A[SIZE];
for (int i = 0; i < SIZE; i++) {
    process(A[i]);
}
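Hardware cannot guess the pointer values, but software can read them early. A minimal sketch using GCC/Clang's __builtin_prefetch; the look-ahead distance of 8 is an arbitrary illustrative choice:

void process_all(element_t *A[], int n)
{
    for (int i = 0; i < n; i++) {
        if (i + 8 < n)
            __builtin_prefetch(A[i + 8]);  /* the pointer value IS the future address */
        process(A[i]);
    }
}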



Summary

 Writing to a cache poses a couple of interesting issues.
— Write-through and write-back policies keep the cache consistent with main memory in different ways for write hits.
— Write-around and allocate-on-write are two strategies to handle write misses, differing in whether updated data is loaded into the cache.

 Hardware prefetching handles most streams and strides.
— We’ll talk later about one limitation.

 Software prefetching is useful for linked data structures.
— Must be added by the programmer (or a very smart compiler).

 Next time, we’ll look at cache-conscious programming.

