Lecture Slides 07 076-Caches-Opt
[Figure: c = a * b, n x n matrices; i indexes rows of a, j indexes columns of b]
First iteration:
n/8 + n = 9n/8 misses (omitting matrix c)
Afterwards in cache (schematic): one row of a, plus 8-wide pieces of each column of b
Caches and Program Optimizations
University of Washington
Other iterations:
Again, n/8 + n = 9n/8 misses (omitting matrix c)
Total misses:
9n/8 * n^2 = (9/8) * n^3
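The counts above come from the standard triple-nested loop: the inner loop walks a row of a with stride 1 (about n/8 misses, assuming 8 doubles per cache line) and a column of b with stride n (about n misses). A minimal sketch, assuming row-major double arrays (the function name is mine, not from the slides):

```c
#include <stddef.h>

/* Naive matrix multiply: c = a * b, all n x n, row-major.
   Per (i, j): a[i*n + k] is stride-1 (~n/8 misses),
   b[k*n + j] is stride-n (~n misses). */
void matmul(double *c, const double *a, const double *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i*n + k] * b[k*n + j];
            c[i*n + j] = sum;
        }
}
```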
[Figure: blocked c = a * b; i1 indexes block rows of a, j1 indexes block columns of b]
Block size B x B
n/B blocks per row and column
First (block) iteration:
B^2/8 misses for each block
2n/B * B^2/8 = nB/4 (omitting matrix c)
Afterwards in cache (schematic): one block row of a, one block column of b
Other (block) iterations:
Same as first iteration: 2n/B * B^2/8 = nB/4
Total misses:
nB/4 * (n/B)^2 = n^3/(4B)
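The blocked version keeps a pair of B x B blocks resident in cache while they are reused, so each block costs only ~B^2/8 misses. A sketch under the same row-major assumption (the helper name and MIN macro are mine; c must be zero-initialized by the caller, since the block loops accumulate into it):

```c
#include <stddef.h>

#define MIN(x, y) ((x) < (y) ? (x) : (y))

/* Blocked matrix multiply: c += a * b with B x B blocks.
   Each block pair is multiplied while resident in cache,
   so misses drop to ~B*B/8 per block touched. */
void matmul_blocked(double *c, const double *a, const double *b,
                    size_t n, size_t B) {
    for (size_t i1 = 0; i1 < n; i1 += B)
        for (size_t j1 = 0; j1 < n; j1 += B)
            for (size_t k1 = 0; k1 < n; k1 += B)
                /* multiply one pair of B x B blocks */
                for (size_t i = i1; i < MIN(i1 + B, n); i++)
                    for (size_t j = j1; j < MIN(j1 + B, n); j++) {
                        double sum = c[i*n + j];
                        for (size_t k = k1; k < MIN(k1 + B, n); k++)
                            sum += a[i*n + k] * b[k*n + j];
                        c[i*n + j] = sum;
                    }
}
```

For the analysis to hold, B is chosen so three B x B blocks fit in cache at once.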
Summary
No blocking: (9/8) * n^3 misses
Blocking: 1/(4B) * n^3 misses
Ratio: ((9/8) * n^3) / (n^3/(4B)) = (9/2) * B
If B = 8, the difference is 4 * 8 * 9 / 8 = 36x
If B = 16, the difference is 4 * 16 * 9 / 8 = 72x
Cache-Friendly Code
Programmer can optimize for cache performance
How data structures are organized
How data are accessed
Nested loop structure
Blocking is a general technique
All systems favor “cache-friendly code”
Getting absolute optimum performance is very platform specific
Cache sizes, line sizes, associativities, etc.
Can get most of the advantage with generic code
Keep working set reasonably small (temporal locality)
Use small strides (spatial locality)
Focus on inner loop code
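The "small strides" point can be seen in something as simple as summing a 2-D array: traversing row-major data row by row gives stride-1 accesses, while traversing column by column gives stride-n accesses that miss on nearly every element for large n. A sketch (function names are mine, not from the slides):

```c
#include <stddef.h>

/* Two ways to sum an n x n row-major array; both return the same
   value, but their cache behavior differs sharply for large n. */

/* stride-1: good spatial locality, ~1 miss per 8 doubles */
double sum_rows(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            s += a[i*n + j];
    return s;
}

/* stride-n: ~1 miss per element once rows exceed the cache */
double sum_cols(const double *a, size_t n) {
    double s = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            s += a[i*n + j];
    return s;
}
```

Swapping the loop order is often all "generic" cache-friendly code requires.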
The Memory Mountain
Intel Core i7:
32 KB L1 i-cache, 32 KB L1 d-cache
256 KB unified L2 cache
8 MB unified L3 cache
All caches on-chip
[Figure: read throughput (MB/s, 0-7000) vs. working-set size (4K-64M) and stride (s1-s15); ridges correspond to L1, L2, L3, and main memory]
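The mountain is generated by timing a read loop over a working set of a given size at a given stride: small sizes fit in L1 and hit peak throughput, larger sizes fall off the L2, L3, and memory ridges, and larger strides waste more of each cache line. A rough sketch of the measurement, not the actual benchmark code (function names and the timing method are my assumptions; a real harness would use a higher-resolution timer and repeated runs):

```c
#include <stddef.h>
#include <time.h>

/* Stream through `elems` longs at the given stride; returning the
   sum keeps the compiler from eliminating the loop. */
long run_reads(const long *data, size_t elems, size_t stride) {
    long acc = 0;
    for (size_t i = 0; i < elems; i += stride)
        acc += data[i];
    return acc;
}

/* Rough MB/s estimate for one (size, stride) point on the mountain:
   one warm-up pass, then one timed pass over bytes/stride bytes. */
double read_throughput_mbps(const long *data, size_t bytes, size_t stride) {
    size_t elems = bytes / sizeof(long);
    run_reads(data, elems, stride);              /* warm the cache */
    clock_t t0 = clock();
    run_reads(data, elems, stride);              /* timed run */
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    return secs > 0 ? ((double)bytes / stride) / (secs * 1e6) : 0.0;
}
```

Sweeping `bytes` from 4 KB to 64 MB and `stride` from 1 to 15 traces out the surface in the figure.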