
Memory Hierarchy

Computer Systems Organization (Spring 2016)
CSCI-UA 201, Section 2
Instructor: Joanna Klukowska

Slides adapted from
Randal E. Bryant and David R. O'Hallaron (CMU)
Mohamed Zahran (NYU)

Cache Memory Organization and Access

Example Memory Hierarchy

[Figure: the memory hierarchy pyramid. Levels toward the top are smaller, faster, and costlier (per byte); levels toward the bottom are larger, slower, and cheaper (per byte).]

L0: Registers. CPU registers hold words retrieved from the L1 cache.
L1: L1 cache (SRAM). Holds cache lines retrieved from the L2 cache.
L2: L2 cache (SRAM). Holds cache lines retrieved from the L3 cache.
L3: L3 cache (SRAM). Holds cache lines retrieved from main memory.
L4: Main memory (DRAM). Holds disk blocks retrieved from local disks.
L5: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote servers.
L6: Remote secondary storage (e.g., Web servers).

General Cache Concept

[Figure: a small cache holding a few numbered blocks (e.g., 4, 10, 14) sits above a larger memory partitioned into blocks 0 through 15; block 10 is shown being copied up in a block-sized transfer unit.]

Cache: smaller, faster, more expensive memory; caches a subset of the blocks.
Memory: larger, slower, cheaper memory, viewed as partitioned into blocks.
Data is copied between the levels in block-sized transfer units.

Cache Memories

Cache memories are small, fast SRAM-based memories managed automatically in hardware.
They hold frequently accessed blocks of main memory.
The CPU looks first for data in the cache.

Typical system structure:

[Figure: the CPU chip contains the register file, the ALU, and the cache memory; the bus interface connects via the system bus to the I/O bridge, which connects via the memory bus to main memory.]

General Cache Organization (S, E, B)

S = 2^s sets
E = 2^e lines per set
B = 2^b bytes per cache block (the data)

[Figure: the cache as a grid of S sets by E lines; each line holds a valid bit, a tag, and data bytes 0 through B-1.]

Cache size: C = S x E x B data bytes
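To make the (S, E, B) parameters concrete, here is a minimal C sketch of this organization (my addition, not from the slides; the type names and the particular geometry are made up for illustration):

#include <stdint.h>

#define B 64   /* bytes per block (b = 6) */
#define E 8    /* lines per set  (e = 3) */
#define S 64   /* sets           (s = 6) */

/* One cache line: valid bit, tag, and a B-byte block of data. */
typedef struct {
    int      valid;
    uint64_t tag;
    uint8_t  data[B];
} cache_line_t;

/* A set is E lines; the cache is S sets.
   Total capacity: C = S * E * B data bytes (here 64 * 8 * 64 = 32 KB). */
typedef struct {
    cache_line_t lines[E];
} cache_set_t;

typedef struct {
    cache_set_t sets[S];
} cache_t;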

Cache Read

Address of word:  | t bits (tag) | s bits (set index) | b bits (block offset) |

1. Locate the set (using the s set index bits).
2. Check whether any line in the set has a matching tag.
3. If yes, and the line is valid: hit.
4. Locate the data, which begins at the given offset within the line's B = 2^b byte block.

[Figure: the S = 2^s sets x E = 2^e lines organization; the set index selects a set, the tag is compared against each line in that set, and the block offset selects bytes within the matching block.]
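The three address fields can be extracted with plain bit arithmetic. A minimal sketch (my addition, not from the slides), assuming 64-bit addresses and the s and b parameters defined above:

#include <stdint.h>

/* Split an address into (tag, set index, block offset),
   given s set-index bits and b block-offset bits. */
typedef struct { uint64_t tag, set, offset; } addr_fields_t;

static addr_fields_t split_address(uint64_t addr, int s, int b) {
    addr_fields_t f;
    f.offset = addr & ((1ULL << b) - 1);         /* low b bits       */
    f.set    = (addr >> b) & ((1ULL << s) - 1);  /* next s bits      */
    f.tag    = addr >> (s + b);                  /* remaining t bits */
    return f;
}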

Example: Direct Mapped Cache (E = 1)

Direct mapped: one line per set.
Assume: cache block size of 8 bytes.

Address of int:  | t bits | 001 | 100 |

[Figure: S = 2^s sets, each containing a single line (tag plus data bytes 0-7); the set index bits (001) select the set.]

Step 1: use the set index to find the set.

Example: Direct Mapped Cache (E = 1)

Direct mapped: one line per set.
Assume: cache block size of 8 bytes.

Address of int:  | t bits | 001 | 100 |

Step 2: check the valid bit and compare the tag (assume both match): hit.

[Figure: the selected line (valid bit, tag, data bytes 0-7); the block offset bits (100) point into the block.]

Example: Direct Mapped Cache (E = 1)

Direct mapped: one line per set.
Assume: cache block size of 8 bytes.

Address of int:  | t bits | 001 | 100 |

Step 2: check the valid bit and compare the tag (assume both match): hit.
Step 3: the block offset (100₂ = 4) locates the int (4 bytes) within the block, at bytes 4-7.

If the tag doesn't match: the old line is evicted and replaced.

Example: Direct-Mapped Cache Simulation

Address bits: t=1 (tag), s=2 (set index), b=1 (block offset).
M = 16 bytes (4-bit addresses), B = 2 bytes/block, S = 4 sets, E = 1 line/set.

Address trace (reads, one byte per read):
  0  [0000₂]  miss
  1  [0001₂]  hit
  7  [0111₂]  miss
  8  [1000₂]  miss
  0  [0000₂]  miss  (8 evicted the line holding M[0-1]; 0 in turn evicts M[8-9])

Final cache state:

  Set    v   Tag   Block
  Set 0  1   0     M[0-1]
  Set 1  0   ?     ?
  Set 2  0   ?     ?
  Set 3  1   0     M[6-7]
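This trace is easy to check mechanically. Below is a minimal direct-mapped simulator that reproduces it (my own sketch, not part of the slides), hard-coding the t=1, s=2, b=1 geometry above:

#include <stdio.h>

int main(void) {
    /* Direct-mapped cache: S=4 sets, E=1 line/set, B=2 bytes/block.
       Addresses are 4 bits: 1 tag bit, 2 set-index bits, 1 offset bit. */
    int valid[4] = {0};
    int tag[4]   = {0};
    unsigned trace[] = {0, 1, 7, 8, 0};

    for (int i = 0; i < 5; i++) {
        unsigned addr = trace[i];
        unsigned set  = (addr >> 1) & 0x3;   /* bits 1-2 */
        unsigned t    = (addr >> 3) & 0x1;   /* bit 3    */
        int hit = valid[set] && tag[set] == (int)t;
        printf("addr %2u: %s\n", addr, hit ? "hit" : "miss");
        if (!hit) {   /* load the block, evicting any previous line */
            valid[set] = 1;
            tag[set]   = (int)t;
        }
    }
    return 0;
}

Running it prints exactly the miss, hit, miss, miss, miss pattern from the trace above.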

E-way Set Associative Cache (Here: E = 2)

E = 2: two lines per set.
Assume: cache block size of 8 bytes.

Address of short int:  | t bits | 001 | 100 |

[Figure: sets of two lines each, every line holding a tag and data bytes 0-7; the set index bits (001) select the set.]

Step 1: use the set index to find the set.

E-way Set Associative Cache (Here: E = 2)

E = 2: two lines per set.
Assume: cache block size of 8 bytes.

Address of short int:  | t bits | 001 | 100 |

Step 2: compare both lines in the set. Valid bit set and tag matches: hit.

[Figure: the two lines of the selected set; the block offset bits (100) point into the matching line's block.]

E-way Set Associative Cache (Here: E = 2)

E = 2: two lines per set.
Assume: cache block size of 8 bytes.

Address of short int:  | t bits | 001 | 100 |

Step 2: compare both lines in the set. Valid bit set and tag matches: hit.
Step 3: the block offset (100₂ = 4) locates the short int (2 bytes) within the block, at bytes 4-5.

No match:
One line in the set is selected for eviction and replacement.
Replacement policies: random, least recently used (LRU), etc.

Example: 2-Way Set Associative Cache Simulation

Address bits: t=2 (tag), s=1 (set index), b=1 (block offset).
M = 16 bytes (4-bit addresses), B = 2 bytes/block, S = 2 sets, E = 2 lines/set.

Address trace (reads, one byte per read):
  0  [0000₂]  miss
  1  [0001₂]  hit
  7  [0111₂]  miss
  8  [1000₂]  miss
  0  [0000₂]  hit

Final cache state:

  Set    Line  v   Tag   Block
  Set 0  0     1   00    M[0-1]
         1     1   10    M[8-9]
  Set 1  0     1   01    M[6-7]
         1     0   ?     ?
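Extending the earlier simulator sketch to two lines per set only needs a per-set LRU indicator. Again a hypothetical illustration, not from the slides:

#include <stdio.h>

int main(void) {
    /* 2-way set-associative: S=2 sets, E=2 lines/set, B=2 bytes/block.
       Addresses are 4 bits: 2 tag bits, 1 set-index bit, 1 offset bit. */
    int valid[2][2] = {{0}}, tag[2][2] = {{0}};
    int lru[2] = {0};   /* index of the least recently used line per set */
    unsigned trace[] = {0, 1, 7, 8, 0};

    for (int i = 0; i < 5; i++) {
        unsigned addr = trace[i];
        unsigned set  = (addr >> 1) & 0x1;   /* bit 1    */
        unsigned t    = (addr >> 2) & 0x3;   /* bits 2-3 */
        int hit = -1;
        for (int line = 0; line < 2; line++)
            if (valid[set][line] && tag[set][line] == (int)t)
                hit = line;
        printf("addr %2u: %s\n", addr, hit >= 0 ? "hit" : "miss");
        if (hit >= 0) {
            lru[set] = 1 - hit;              /* other line is now LRU */
        } else {
            int victim = lru[set];           /* evict the LRU line */
            valid[set][victim] = 1;
            tag[set][victim]   = (int)t;
            lru[set] = 1 - victim;
        }
    }
    return 0;
}

This prints miss, hit, miss, miss, hit: with two lines per set, loading block 8-9 no longer evicts block 0-1, so the final read of address 0 hits.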

What about writes?

Multiple copies of the data exist:
L1, L2, L3, main memory, disk

What to do on a write-hit?
Write-through: write immediately to memory.
Write-back: defer the write to memory until the line is replaced.
Needs a dirty bit (does the line differ from memory or not?).

What to do on a write-miss?
Write-allocate: load the block into the cache, then update the line in the cache.
Good if more writes to the location follow.
No-write-allocate: write straight to memory; do not load into the cache.

Typical combinations:
Write-through + No-write-allocate
Write-back + Write-allocate
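To make the write-back + write-allocate combination concrete, here is a toy sketch (my addition; the one-line "cache" and all names are invented to keep the policy logic visible, not a real design):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define B 8   /* bytes per block (illustrative) */

typedef struct {
    bool     valid;
    bool     dirty;   /* does the line differ from memory? */
    uint64_t tag;
    uint8_t  data[B];
} line_t;

static uint8_t memory[256];   /* toy backing store    */
static line_t  line;          /* a one-line "cache"   */

/* Write-back + write-allocate policy for a one-byte store. */
static void cache_write(uint64_t addr, uint8_t byte) {
    uint64_t tag = addr / B, offset = addr % B;
    if (!(line.valid && line.tag == tag)) {       /* write-miss */
        if (line.valid && line.dirty)             /* write back the victim */
            memcpy(&memory[line.tag * B], line.data, B);
        memcpy(line.data, &memory[tag * B], B);   /* write-allocate: load block */
        line.tag   = tag;
        line.valid = true;
        line.dirty = false;
    }
    line.data[offset] = byte;   /* update the cache only...            */
    line.dirty = true;          /* ...memory is updated on eviction    */
}

int main(void) {
    cache_write(3, 42);   /* miss: allocates block 0, marks it dirty   */
    cache_write(4, 7);    /* hit: same block, cache-only update        */
    cache_write(9, 1);    /* miss: flushes block 0, loads block 1      */
    printf("memory[3] = %d\n", memory[3]);   /* 42, written on eviction */
    return 0;
}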

Intel Core i7 Cache Hierarchy

[Figure: the processor package contains cores 0 through 3; each core has its own registers, L1 d-cache, L1 i-cache, and L2 unified cache; all cores share the L3 unified cache, which connects to main memory.]

L1 i-cache and d-cache: 32 KB, 8-way, access: 4 cycles
L2 unified cache: 256 KB, 8-way, access: 10 cycles
L3 unified cache: 8 MB, 16-way, access: 40-75 cycles
Block size: 64 bytes for all caches.

Cache Performance Metrics

Miss Rate
Fraction of memory references not found in the cache (misses / accesses)
= 1 - hit rate
Typical numbers (in percentages):
3-10% for L1
can be quite small (e.g., < 1%) for L2, depending on size, etc.

Hit Time
Time to deliver a line in the cache to the processor
includes the time to determine whether the line is in the cache
Typical numbers:
4 clock cycles for L1
10 clock cycles for L2

Miss Penalty
Additional time required because of a miss
typically 50-200 cycles for main memory (trend: increasing!)

Let's think about those numbers

Huge difference between a hit and a miss
Could be 100x, if just L1 and main memory

Would you believe 99% hits is twice as good as 97%?
Consider:
cache hit time of 1 cycle
miss penalty of 100 cycles

Average access time = hit time + miss rate x miss penalty:
97% hits: 1 cycle + 0.03 * 100 cycles = 4 cycles
99% hits: 1 cycle + 0.01 * 100 cycles = 2 cycles

This is why miss rate is used instead of hit rate
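A quick check of those two cases (a throwaway sketch of my own, not from the slides):

#include <stdio.h>

/* Average memory access time: hit_time + miss_rate * miss_penalty. */
static double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    printf("97%% hits: %.0f cycles\n", amat(1.0, 0.03, 100.0));  /* 4 */
    printf("99%% hits: %.0f cycles\n", amat(1.0, 0.01, 100.0));  /* 2 */
    return 0;
}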

Writing Cache-Friendly Code

Make the common case go fast
Focus on the inner loops of the core functions

Minimize the misses in the inner loops
Repeated references to variables are good (temporal locality) because there is a good chance that they are stored in registers.
Stride-1 reference patterns are good (spatial locality) because subsequent references to elements in the same block will hit in the cache (one cache miss followed by many cache hits). See the sketch below.
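For instance, summing a matrix row-by-row follows the stride-1 pattern, while column-by-column does not (an illustrative sketch of my own, assuming a row-major N x N array as in the slides that follow):

#define N 1024
double a[N][N];

/* Stride-1: walks memory sequentially; roughly one miss per block. */
double sum_rowwise(void) {
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Stride-N: jumps N*sizeof(double) bytes per access; no spatial locality. */
double sum_colwise(void) {
    double sum = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}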

Today

Cache organization and operation
Performance impact of caches
  The memory mountain
  Rearranging loops to improve spatial locality
  Using blocking to improve temporal locality

The Memory Mountain

Read throughput (read bandwidth):
Number of bytes read from memory per second (MB/s)

Memory mountain: measured read throughput as a function of spatial and temporal locality.
A compact way to characterize memory system performance.

Rearranging Loops to Improve Spatial Locality

Matrix Multiplication Example

/* ijk */
for (i=0; i<n; i++) {
    for (j=0; j<n; j++) {
        sum = 0.0;               /* variable sum held in register */
        for (k=0; k<n; k++)
            sum += a[i][k] * b[k][j];
        c[i][j] = sum;
    }
}

Description:
Multiply N x N matrices
Matrix elements are doubles (8 bytes)
O(N^3) total operations
N reads per source element
N values summed per destination, but may be able to hold in register

Miss Rate Analysis for Matrix Multiply

Assume:
Block size = 32 B (big enough for four doubles)
Matrix dimension (N) is very large: approximate 1/N as 0.0
Cache is not even big enough to hold multiple rows

Analysis method:
Look at the access pattern of the inner loop.

[Figure: in each inner loop iteration, matrix A is scanned along a row, matrix B along a column, and one element of C is updated.]

Layout of C Arrays in Memory (review)

C arrays are allocated in row-major order:
each row in contiguous memory locations

Stepping through columns in one row:
for (i = 0; i < N; i++)
    sum += a[0][i];
accesses successive elements
if block size B > sizeof(a_ij) bytes, exploit spatial locality
miss rate = sizeof(a_ij) / B

Stepping through rows in one column:
for (i = 0; i < N; i++)
    sum += a[i][0];
accesses distant elements
no spatial locality!
miss rate = 1 (i.e., 100%)

Matrix Multiplication (ijk)

/* ijk */
for (i=0; i<n; i++) {
    for (j=0; j<n; j++) {
        sum = 0.0;
        for (k=0; k<n; k++)
            sum += a[i][k] * b[k][j];
        c[i][j] = sum;
    }
}

Inner loop accesses: A row-wise (i,*); B column-wise (*,j); C fixed (i,j).

Misses per inner loop iteration:
  A     B     C
  0.25  1.0   0.0

Matrix Multiplication (jik)

/* jik */
for (j=0; j<n; j++) {
    for (i=0; i<n; i++) {
        sum = 0.0;
        for (k=0; k<n; k++)
            sum += a[i][k] * b[k][j];
        c[i][j] = sum;
    }
}

Inner loop accesses: A row-wise (i,*); B column-wise (*,j); C fixed (i,j).

Misses per inner loop iteration:
  A     B     C
  0.25  1.0   0.0

Matrix Multiplication (kij)

/* kij */
for (k=0; k<n; k++) {
    for (i=0; i<n; i++) {
        r = a[i][k];
        for (j=0; j<n; j++)
            c[i][j] += r * b[k][j];
    }
}

Inner loop accesses: A fixed (i,k); B row-wise (k,*); C row-wise (i,*).

Misses per inner loop iteration:
  A     B     C
  0.0   0.25  0.25

Matrix Multiplication (ikj)

/* ikj */
for (i=0; i<n; i++) {
    for (k=0; k<n; k++) {
        r = a[i][k];
        for (j=0; j<n; j++)
            c[i][j] += r * b[k][j];
    }
}

Inner loop accesses: A fixed (i,k); B row-wise (k,*); C row-wise (i,*).

Misses per inner loop iteration:
  A     B     C
  0.0   0.25  0.25

Matrix Multiplication (jki)

/* jki */
for (j=0; j<n; j++) {
    for (k=0; k<n; k++) {
        r = b[k][j];
        for (i=0; i<n; i++)
            c[i][j] += a[i][k] * r;
    }
}

Inner loop accesses: A column-wise (*,k); B fixed (k,j); C column-wise (*,j).

Misses per inner loop iteration:
  A     B     C
  1.0   0.0   1.0

Matrix Multiplication (kji)

/* kji */
for (k=0; k<n; k++) {
    for (j=0; j<n; j++) {
        r = b[k][j];
        for (i=0; i<n; i++)
            c[i][j] += a[i][k] * r;
    }
}

Inner loop accesses: A column-wise (*,k); B fixed (k,j); C column-wise (*,j).

Misses per inner loop iteration:
  A     B     C
  1.0   0.0   1.0

Summary of Matrix Multiplication

ijk (& jik): 2 loads, 0 stores; misses/iter = 1.25

for (i=0; i<n; i++) {
    for (j=0; j<n; j++) {
        sum = 0.0;
        for (k=0; k<n; k++)
            sum += a[i][k] * b[k][j];
        c[i][j] = sum;
    }
}

kij (& ikj): 2 loads, 1 store; misses/iter = 0.5

for (k=0; k<n; k++) {
    for (i=0; i<n; i++) {
        r = a[i][k];
        for (j=0; j<n; j++)
            c[i][j] += r * b[k][j];
    }
}

jki (& kji): 2 loads, 1 store; misses/iter = 2.0

for (j=0; j<n; j++) {
    for (k=0; k<n; k++) {
        r = b[k][j];
        for (i=0; i<n; i++)
            c[i][j] += a[i][k] * r;
    }
}
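As a self-contained version of the best-performing loop order (kij), hedged as my own illustration rather than the slides' code:

#include <stddef.h>

/* kij loop order: 0.5 misses/iter in the analysis above.
   Assumes row-major n x n matrices a, b, c, with c zero-initialized. */
void matmul_kij(size_t n, const double *a, const double *b, double *c) {
    for (size_t k = 0; k < n; k++)
        for (size_t i = 0; i < n; i++) {
            double r = a[i*n + k];            /* fixed for the inner loop */
            for (size_t j = 0; j < n; j++)
                c[i*n + j] += r * b[k*n + j]; /* stride-1 over b and c    */
        }
}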

Core i7 Matrix Multiply Performance

[Figure: cycles per inner loop iteration vs. array size, one curve per loop-order pair; jki/kji is slowest, ijk/jik is in the middle, and kij/ikj is fastest, matching the misses/iter analysis above.]

Cache Summary

Cache memories can have a significant performance impact.

You can write your programs to exploit this!
Focus on the inner loops, where the bulk of computations and memory accesses occur.
Try to maximize spatial locality by reading data objects sequentially with stride 1.
Try to maximize temporal locality by using a data object as often as possible once it's read from memory.
