
Foundations of

Computer Science
Chapter 6 Memory Hierarchy

Prof. Dr.-Ing. Richard Membarth

January 8th 2025


Contents

6. Memory Hierarchy

6.1 Motivation

6.2 Caches

Foundations of
Computer Science
Chapter 6.1 Memory Hierarchy
Motivation
Prof. Dr.-Ing. Richard Membarth

January 8th 2025


Memory Hierarchy
Motivation

assumption so far
instructions take a single cycle to compute
each instruction takes the same time to finish
memory requests are served within a single cycle
reality
instructions take multiple cycles to compute
memory requests are served within multiple cycles

Memory Hierarchy
Motivation

how can we wait for data?

stall pipeline
when data is ready, continue processing
applies to both data and instruction memory
out-of-order processors can continue processing independent instructions

Memory Hierarchy
Motivation

processor-memory performance gap

memory performance grows slower than processor performance
memory performance: DRAM latency
CPU performance: processor memory requests
conclusion
communication cost becomes more and more of a performance bottleneck
we need ways to circumvent that problem

[Figure: CPU performance vs. memory performance, 1980–2020, log scale; the widening distance between the curves is the processor-memory performance gap]
Memory Hierarchy
Motivation

programs access a small portion of their address space at any time


temporal locality
items accessed recently are likely to be accessed again soon
examples
→ instructions in a loop
→ induction variables
spatial locality
items near those accessed recently are likely to be accessed soon
examples
→ sequential instruction access
→ array data

Memory Hierarchy
Motivation

temporal or spatial locality?

loop example:

1 loop: lw   x31, 0(x20)
2       add  x31, x31, x21
3       sw   x31, 0(x20)
4       addi x20, x20, -4
5       blt  x22, x20, loop

spatial locality in programs
sequence of instructions stored consecutively in memory
temporal locality in programs
loop instructions 1–5 will be fetched once on every loop iteration

stack example:

1 addi sp, sp, -12
2 sw   ra, 0(sp)
3 sw   a0, 4(sp)
4 sw   a1, 8(sp)
5 ...
6 lw   a1, 8(sp)
7 lw   a0, 4(sp)
8 lw   ra, 0(sp)
9 addi sp, sp, +12

spatial locality in data
registers are stored to contiguous memory via sp
temporal locality in data
registers are restored from memory via sp
Memory Hierarchy
Motivation

temporal or spatial locality?

1 int sum = 0;
2 for (int i = 0; i < n; i++)
3 sum = sum + arr[i];

temporal locality
variables sum and i
typically kept in registers, but not always possible
spatial locality
array arr stored in memory contiguously

Memory Hierarchy
Motivation

cache as memory hierarchy

set of layers
memory transfers only between adjacent layers
important for developers: the higher the layer, the faster and smaller the memory

[Figure: hierarchy from the processor core (registers) through L1, L2, and L3 caches and DDR RAM down to HDD, annotated with the trends in bandwidth, space, and price per bit on one side and latency and capacity on the other]

Memory Hierarchy
Motivation

typical sizes and latencies (mobile):

registers   1000 bytes   300 ps
L1 cache    64 KB        1 ns
L2 cache    256 KB       5–10 ns
memory      1–2 GB       50–100 ns
flash       4–64 GB      25–50 µs

Memory Hierarchy
Motivation

typical sizes and latencies (laptop, desktop, server):

laptop:
registers   1000 bytes    300 ps
L1 cache    64 KB         1 ns
L2 cache    256 KB        3–10 ns
L3 cache    4–8 MB        10–20 ns
memory      4–16 GB       50–100 ns
flash       256 GB–1 TB   50–100 µs

desktop:
registers   2000 bytes    300 ps
L1 cache    64 KB         1 ns
L2 cache    256 KB        3–10 ns
L3 cache    8–32 MB       10–20 ns
memory      8–64 GB       50–100 ns
flash       256 GB–2 TB   50–100 µs

server:
registers   4000 bytes    200 ps
L1 cache    64 KB         1 ns
L2 cache    256 KB        3–10 ns
L3 cache    16–64 MB      10–20 ns
memory      32–256 GB     50–100 ns
disk        16–64 TB      5–10 ms
flash       1–16 TB       100–200 µs
Foundations of
Computer Science
Chapter 6.2 Memory Hierarchy
Caches
Prof. Dr.-Ing. Richard Membarth

January 8th 2025


Caches
Basics

caches are faster, but smaller than memory
divided into blocks
number of blocks is usually a power of 2
assumption: 1 byte per block

four important questions
Q1 where can a block be placed in the upper level? (block placement)
Q2 how is a block found if it is in the upper level? (block identification)
Q3 which block should be replaced on a miss? (block replacement)
Q4 what happens on a write? (write strategy)

[Figure: memory with 16 byte addresses (0–15) next to a cache with 4 blocks (index 0–3)]
Caches
Direct Mapping

direct-mapped cache
each memory address maps to exactly one block
addresses 0, 4, 8, and 12 map to cache block 0
addresses 1, 5, 9, and 13 map to cache block 1
...
cache block index computation
cache with 2^k blocks
memory address i
cache block index: i mod 2^k
corresponds to the least significant k bits of the address

[Figure: 16 memory addresses (0000–1111) mapped onto a 4-block cache (index 00–11)]
Caches
Direct Mapping

direct-mapped cache
for a memory address i, the corresponding cache block can be computed; however, other addresses map to the same cache block as well
cache block 2 can contain data from memory addresses 2, 6, 10, or 14
add tags to the cache
a tag stores the rest of the address bits
used to differentiate between different addresses
tag field: i / 2^k
corresponds to the most significant m − k bits of the address
add a valid bit for each cache block
cache data is not valid when the system is initialized

[Figure: the 4-block cache extended with a valid bit (V) and a tag field per block, alongside the 16 memory addresses]
Caches
Direct Mapping

direct-mapped cache
in reality, the cache block size is larger than one byte
example: load two-byte blocks at a time
when reading from address 12, the data at addresses 12 and 13 will both be copied to cache block 2
byte offset
block size of 2^n bytes (cache line size)
byte offset within a cache line: i mod 2^n
corresponds to the n least significant bits
block addresses
block address of a byte address: i / 2^n
corresponds to the m − n most significant bits (contains the bits for the index and tag)

[Figure: cache with two-byte blocks; each cache line holds two consecutive memory bytes]
Caches
Direct Mapping

locating data
cache with 2^k lines, each containing 2^n bytes
m-bit memory address
n bits of the address correspond to the byte offset within a cache line
k bits of the address correspond to the index selecting one of the 2^k cache lines
m − k − n bits of the address correspond to the tag to match the memory address

m-bit address: | tag (m−k−n bits) | index (k bits) | offset (n bits) |

example
cache with 2^2 cache lines, 2^1 bytes per cache line
memory address 13 (1101) will be stored in byte 1 of cache line 2

4-bit address 1101: | tag 1 (1 bit) | index 10 (2 bits) | offset 1 (1 bit) |

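The decomposition uses only the bit operations described above. A minimal C sketch (added here, not part of the original slides); k = 2 and n = 1 reproduce the example:

#include <stdio.h>

int main(void) {
    unsigned k = 2, n = 1;  /* 2^2 = 4 cache lines, 2^1 = 2 bytes per line */
    unsigned addr = 13;     /* 1101 in binary */

    unsigned offset = addr & ((1u << n) - 1);        /* n least significant bits */
    unsigned index  = (addr >> n) & ((1u << k) - 1); /* next k bits */
    unsigned tag    = addr >> (n + k);               /* remaining m - k - n bits */

    /* prints: tag=1 index=2 offset=1, i.e., byte 1 of cache line 2 */
    printf("tag=%u index=%u offset=%u\n", tag, index, offset);
    return 0;
}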
Caches
Direct Mapping

advantages
index and offset computation with bit operations or simple arithmetic
easy to realize in hardware
disadvantages
low cache hit rate
the memory access pattern 0, 4, 0, 4, 0, ... replaces the first cache line on every memory access

Caches
Direct Mapping

cache example
8 blocks, 1 word / block, direct mapped
consider memory references at addresses 0x16, 0x1A, 0x16, 0x1A, 0x10, 0x03, 0x10, 0x12, 0x10

address  binary     hit / miss  cache block
0x16     00010110   miss        110
0x1A     00011010   miss        010
0x16     00010110   hit         110
0x1A     00011010   hit         010
0x10     00010000   miss        000
0x03     00000011   miss        011
0x10     00010000   hit         000
0x12     00010010   miss        010
0x10     00010000   hit         000

content of the cache after the memory references:

index  v  tag    data
000    y  00010  mem[0x10]
001    n
010    y  00010  mem[0x12]
011    y  00000  mem[0x03]
100    n
101    n
110    y  00010  mem[0x16]
111    n
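
The trace can be reproduced with a small simulation. A minimal C sketch (added, not from the slides) with one valid bit and one tag per block; index = address mod 8, tag = address / 8:

#include <stdio.h>
#include <stdbool.h>

#define NUM_BLOCKS 8  /* direct mapped, 1 word per block */

int main(void) {
    bool valid[NUM_BLOCKS] = { false };
    unsigned tag[NUM_BLOCKS] = { 0 };
    unsigned refs[] = { 0x16, 0x1A, 0x16, 0x1A, 0x10, 0x03, 0x10, 0x12, 0x10 };
    int n = sizeof refs / sizeof refs[0];

    for (int i = 0; i < n; i++) {
        unsigned index = refs[i] % NUM_BLOCKS;  /* 3 least significant bits */
        unsigned t     = refs[i] / NUM_BLOCKS;  /* remaining address bits   */
        bool hit = valid[index] && tag[index] == t;
        printf("0x%02X -> block %u: %s\n", refs[i], index, hit ? "hit" : "miss");
        if (!hit) {  /* on a miss, the block is fetched and the tag updated */
            valid[index] = true;
            tag[index] = t;
        }
    }
    return 0;
}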

Caches
Fully Associative Mapping

fully associative cache
each memory address can map to any block
address 0 can map to cache block 0, 1, 2, or 3
address 1 can map to cache block 0, 1, 2, or 3
...

m-bit address: | tag (m−n bits) | offset (n bits) |

advantages
high cache hit rate
flexible replacement strategy
disadvantages
expensive to implement

[Figure: any of the 16 memory addresses can be placed in any of the 4 cache blocks]
Caches
Set-Associative Mapping

set-associative cache
cache is divided into groups of blocks, called sets
each memory address maps to exactly one set
data may be placed in any block within the set

m-bit address: | tag (m−s−n bits) | set (s bits) | offset (n bits) |

advantages
trade-off between direct-mapped and fully associative cache
flexible replacement strategy
disadvantages
may not use all available cache lines effectively

[Figure: the 4-block cache divided into set 0 (blocks 00, 01) and set 1 (blocks 10, 11); each memory address maps to one set]
Caches
Set-Associative Mapping

k-way associative cache

a cache is k-way associative ↔ each set contains k = 2^x blocks

1-way: 8 sets, 1 block each (direct mapped)
2-way: 4 sets, 2 blocks each
4-way: 2 sets, 4 blocks each
8-way: 1 set, 8 blocks each (fully associative)

[Figure: the same 8-block cache organized with 1-, 2-, 4-, and 8-way associativity]
Caches
Replacement Policies

on a cache miss, a cache block might need to be replaced


replacement policies
random replacement (RR)
first in, first out (FIFO)
last in, first out (LIFO)
least recently used (LRU)
most recently used (MRU)
...
sources of cache misses
compulsory (cold start, first reference)
conflict (collision)
capacity (size)
coherency (multi-core)
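
To illustrate LRU from the list above, a minimal C sketch (added, not from the slides) of the bookkeeping for one 4-way set: each block ages on every access, and the oldest block becomes the victim.

#define WAYS 4

static unsigned age[WAYS];  /* accesses since block i was last used */

void touch(int way) {       /* called on every access to this set */
    for (int i = 0; i < WAYS; i++)
        age[i]++;
    age[way] = 0;           /* the accessed block is most recently used */
}

int lru_victim(void) {      /* block to replace on a cache miss */
    int victim = 0;
    for (int i = 1; i < WAYS; i++)
        if (age[i] > age[victim])
            victim = i;     /* least recently used = largest age */
    return victim;
}

Real hardware uses cheaper approximations of this bookkeeping, such as the Tree-PLRU and QLRU variants mentioned later in this chapter.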

Caches
Writing Policies: Write-Hit Policies

on a write hit, data changes in the cache → two strategies

write-back
data is written only to the block in the cache
the modified cache block is written to main memory only when it is replaced
discussion
→ use a dirty bit to indicate a modified cache block
− more overhead with multi-core
+ fast

write-through
data is written to both the block in the cache and to the block in the lower-level memory
discussion
− slow
→ use buffered write-through
+ simple
+ always in sync

Caches
Writing Policies: Write-Miss Policies

on a write miss, data is not present in the cache → two strategies

write-allocate
a cache block is allocated on a write miss
followed by the write-hit actions described above
→ write misses act like read misses
typically paired with write-back

no-write-allocate
write misses do not affect the cache
the block is modified only in the lower-level memory
often paired with write-through
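
Combining the common pairing, a minimal C sketch (added, not from the slides) of a store with write-back and write-allocate, reduced to a single one-word cache line and a small array as lower-level memory:

#include <stdbool.h>

unsigned memory[16];                        /* lower-level memory            */
struct {
    bool valid, dirty;
    unsigned addr, data;
} line;                                     /* a single one-word cache line  */

void store(unsigned addr, unsigned value) {
    if (!line.valid || line.addr != addr) { /* write miss                     */
        if (line.valid && line.dirty)       /* write-back the evicted line    */
            memory[line.addr] = line.data;
        line.valid = true;                  /* write-allocate: load the block */
        line.dirty = false;
        line.addr = addr;
        line.data = memory[addr];
    }
    line.data = value;                      /* write hit: update cache only  */
    line.dirty = true;                      /* memory is updated on eviction */
}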

Caches
Inclusion Policies

the inclusion policy in multi-level caches decides which data blocks each level contains
inclusive: all blocks present in the higher/upper level cache have to be present in the lower level cache as well
example: L2 cache is inclusive of L1
example: L3 cache is inclusive of L1 and L2
exclusive: blocks present in the higher/upper level cache are never present in the lower level cache
example: L2 cache is exclusive of L1
non-inclusive non-exclusive (NINE): blocks in the higher/upper level cache may or may not be present in the lower level cache
example: L3 cache is non-inclusive

[Figure: inclusive (L1 contained in L2), non-inclusive non-exclusive (L1 and L2 overlap), exclusive (L1 and L2 disjoint)]


Caches
Inclusion Policies (cf. Solihin, 2015)

[Figure (cf. Solihin, 2015): L1/L2 contents under an inclusive, an exclusive, and a NINE L2 cache after each of four events:
1. L1 and L2 cache miss on blocks X and Y
2. block X is evicted from the L1 cache
3. block Y is evicted from the L2 cache
4. L1 cache miss on block Z]
Caches
Average Memory Access Time

average memory access time (AMAT)

metric to analyze the memory subsystem performance
hit time, miss penalty, miss rate
AMAT = hit_time + miss_rate · miss_penalty
allows us to analyze memory hierarchies
the miss penalty can be computed from the values of the next cache level
AMAT_i = hit_time_i + miss_rate_i · miss_penalty_i, with miss_penalty_i = AMAT_(i+1)
example
CPU with 1 ns clock
hit time = 1 cycle
miss penalty = 20 cycles
cache miss rate = 5%
→ AMAT = 1 cycle + 0.05 · 20 cycles = 2 cycles
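
The recursion is evaluated from the last level upward. A minimal C sketch (added, not from the slides) that reproduces the 2-cycle result of the example:

#include <stdio.h>

/* hit_time in cycles, miss_rate in [0,1]; memory_latency is the cost of
 * missing in the last cache level */
double amat(const double hit_time[], const double miss_rate[],
            int levels, double memory_latency) {
    double penalty = memory_latency;
    for (int i = levels - 1; i >= 0; i--)   /* AMAT_i = ht_i + mr_i * AMAT_(i+1) */
        penalty = hit_time[i] + miss_rate[i] * penalty;
    return penalty;
}

int main(void) {
    double hit_time[]  = { 1.0 };   /* 1 cycle */
    double miss_rate[] = { 0.05 };  /* 5%      */
    printf("AMAT = %.2f cycles\n", amat(hit_time, miss_rate, 1, 20.0)); /* 2.00 */
    return 0;
}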

Caches
Cache Thrashing

cache thrashing
memory access pattern where multiple addresses map to the same cache line
leads to excessive cache misses
example: direct-mapped cache, access pattern
memory address 0 → cache line 0
memory address 4 → cache line 0
memory address 0 → cache line 0
memory address 4 → cache line 0
...
problematic for caches with low associativity

[Figure: 16 memory addresses mapped onto a 4-block direct-mapped cache; addresses 0 and 4 collide in cache line 0]
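
A sketch of how thrashing can arise in C (added, not from the slides; the 32 KB cache size and the array layout are assumptions): if b happens to start a multiple of the cache size after a, then a[i] and b[i] map to the same line of a direct-mapped cache and evict each other on every iteration.

#define CACHE_SIZE (32 * 1024)                /* assumed direct-mapped cache */
#define N (CACHE_SIZE / (int)sizeof(float))

float a[N];
float b[N];  /* if b lies CACHE_SIZE bytes after a, a[i] and b[i] collide */

float dot(void) {
    float sum = 0.0f;
    for (int i = 0; i < N; i++)
        sum += a[i] * b[i];  /* worst case: two cache misses per iteration */
    return sum;
}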
Caches
False Sharing

false sharing
memory access pattern where threads on different cores access data within the same memory block
cache coherence forces both cores to continuously update their caches from main memory
example: variables A and B are in the same memory block
thread 0 on core 0 modifies A → update of the main memory block and of the cache line in core 1
thread 1 on core 1 modifies B → update of the main memory block and of the cache line in core 0
...

[Figure: two cores, each running one thread, accessing a shared cache line that holds both A and B]
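
A minimal pthreads sketch of the effect (added, not from the slides; the 64-byte line size matches current processors): without the padding, the two counters share one cache line and the increments ping-pong that line between the cores.

#include <pthread.h>

struct {
    long value;
    char pad[64 - sizeof(long)];  /* pad each counter to its own cache line */
} counter[2];

void *worker(void *arg) {
    long id = (long)arg;
    for (long i = 0; i < 100000000; i++)
        counter[id].value++;      /* with padding: no false sharing */
    return 0;
}

int main(void) {
    pthread_t t0, t1;
    pthread_create(&t0, 0, worker, (void *)0);
    pthread_create(&t1, 0, worker, (void *)1);
    pthread_join(t0, 0);
    pthread_join(t1, 0);
    return 0;
}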

Caches
Hierarchy

functionality: load data from memory into a register

lw t0, 0(s0)
address in virtual memory: addr at 0(s0)
target register: t0

test whether addr is in L1$
if yes: L1$ hit, return value
otherwise: L1$ miss

on an L1$ miss, we go to the next level: L2
test whether addr is in L2$
if yes: L2$ hit
return cache line to L1$
return value from L1$ to registers
otherwise: L2$ miss
cache line: contiguous block of memory of a fixed size, 64 bytes on current processors

on an L2$ miss, we go to the next level: L3
test whether addr is in L3$
if yes: L3$ hit
return cache line to L2$, then to L1$
return value from L1$ to registers
otherwise: L3$ miss

on an L3$ miss, we go to physical main memory
the MMU checks the page table whether the page is in physical memory
if yes: page table hit
return cache line to L3$, L2$, and L1$
return value from L1$ to registers
otherwise: page fault

on a page fault, the OS loads the page from disk first
if the virtual address is invalid, raise an exception
if the address is valid:
load the page (4 KB) from HDD to DDR RAM
return cache line to L3$, L2$, and L1$
return value from L1$ to registers

[Figure: memory hierarchy with the lookup path: L1$ test → L2$ on miss → L3$ on miss → page table → page fault; registers, L1–L3 caches, DDR RAM, HDD]

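The walk is a cascade of checks. A minimal C sketch (added, not from the slides) models it with toy hit predicates; the latencies are loosely based on the Skylake numbers on the next slide, and real lookups compare tags in hardware instead:

#include <stdio.h>

/* toy stand-ins for the hardware tag checks at each level */
int in_l1(unsigned addr) { return addr % 2 == 0; }
int in_l2(unsigned addr) { return addr % 3 == 0; }
int in_l3(unsigned addr) { return addr % 5 == 0; }

int load_latency(unsigned addr) {  /* cycles for a load of addr          */
    if (in_l1(addr)) return 4;     /* L1$ hit                            */
    if (in_l2(addr)) return 12;    /* L2$ hit, line copied into L1$      */
    if (in_l3(addr)) return 42;    /* L3$ hit, line copied into L2$, L1$ */
    return 200;                    /* main memory, line fills all levels */
}

int main(void) {
    printf("%d cycles\n", load_latency(7)); /* misses in L1$, L2$, and L3$ */
    return 0;
}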
Caches
Hierarchy

Intel Core i7 6700 (Skylake)
core frequency: 3.4 GHz
processor (ring/uncore) frequency: 2.9 GHz
memory clock: 266.67 MHz

memory      size          technology  latency                       bandwidth
registers   16 × 8 B      registers   0–1 cycles (0.3 ns)           435.2 GB/s
L1 cache    4 × 32 KiB    SRAM        4–5 cycles (1.2 ns)           217.6 GB/s
L2 cache    4 × 256 KiB   SRAM        > 12 cycles (3.5 ns)          217.6 GB/s
L3 cache    8 MiB         SRAM        > 42 cycles (14.5 ns)         4 × 92.8 GB/s
DDR4 RAM    ≤ 64 GiB      DRAM        42 cycles + 51 ns = 55.5 ns   34.1 GB/s
Caches
Hierarchy

Intel Core i7 6700 (Skylake)

replacement strategy: LRU-based
L1: Tree-PLRU
L2: QLRU
L3: QLRU
write-back with write-allocate policy
cache line size: 64 bytes
L2: inclusive of L1
L3: non-inclusive of L2 / L1
two L1 caches
L1 data cache
L1 instruction cache

References I

Solihin, Y. (2015). Fundamentals of parallel multicore architecture. Chapman & Hall/CRC.

