06 Memory Hierarchy
Foundations of Computer Science
Chapter 6 Memory Hierarchy
6. Memory Hierarchy
6.1 Motivation
6.2 Caches
2 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Memory Hierarchy | Winter Term 2024/2025
Chapter 6.1 Memory Hierarchy: Motivation
assumptions so far
  instructions take a single cycle to compute
  each instruction takes the same time to finish
  memory requests are served within a single cycle
reality
  instructions take multiple cycles to compute
  memory requests are served within multiple cycles
int sum = 0;
for (int i = 0; i < n; i++)
    sum = sum + arr[i];
temporal locality
  variables sum and i are accessed again in every iteration
  typically kept in registers, but not always possible
spatial locality
  array arr is stored contiguously in memory, so consecutive iterations access neighboring addresses
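The effect of spatial locality can be sketched with two traversals of the same two-dimensional data. This is a minimal illustration, not taken from the lecture: the function names and the flat row-major layout are assumptions. Both functions compute the same sum, but the row-major loop visits neighboring addresses, while the column-major loop jumps by `cols` elements on every access.

```c
#include <stddef.h>

/* Row-major traversal: consecutive accesses touch neighboring addresses
 * (good spatial locality). The matrix is stored as a flat array, row by row. */
long sum_row_major(const int *m, size_t rows, size_t cols) {
    long sum = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += m[i * cols + j];
    return sum;
}

/* Column-major traversal of the same data: consecutive accesses are
 * cols elements apart (poor spatial locality for large cols). */
long sum_col_major(const int *m, size_t rows, size_t cols) {
    long sum = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            sum += m[i * cols + j];
    return sum;
}
```

The results are identical; only the order of the memory accesses differs, which is what the cache sees.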
cache as memory hierarchy
important for developers: the higher the layer, the faster and smaller the memory
[Figure: memory hierarchy from top to bottom: registers (in the processor core), L2 cache, L3 cache, DDR RAM, HDD; latency and capacity grow from top to bottom]
[Figure: example hierarchies]
  CPU registers, L1 cache, L2 cache, memory, flash storage
  CPU registers, L1 cache, L2 cache, L3 cache, memory, flash storage
  CPU registers, L1 cache, L2 cache, L3 cache, memory, flash storage, disk storage
locating data
  cache with 2^k lines, each containing 2^n bytes
  m-bit memory address
  n bits of the address correspond to the byte offset within a cache line
  k bits of the address correspond to the index of one of the 2^k cache lines
  m − k − n bits of the address correspond to the tag to match the memory address
  m-bit address:  | tag (m − k − n bits) | index (k bits) | offset (n bits) |
example
  cache with 2^2 cache lines, 2^1 bytes per cache line
  memory address 13 (1101) will be stored in byte 1 of cache line 2
  4-bit address:  | tag 1 (1 bit) | index 10 (2 bits) | offset 1 (1 bit) |
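The address split can be sketched with shifts and masks. The struct and function names below are hypothetical, and the sketch assumes 32-bit addresses with k + n < 32.

```c
#include <stdint.h>

/* Parts of a memory address as seen by a cache with
 * 2^k lines of 2^n bytes each (names are illustrative). */
typedef struct {
    uint32_t tag;
    uint32_t index;
    uint32_t offset;
} CacheAddress;

/* Splits addr into tag, index, and offset. Assumes k + n < 32. */
CacheAddress split_address(uint32_t addr, unsigned k, unsigned n) {
    CacheAddress parts;
    parts.offset = addr & ((1u << n) - 1u);        /* lowest n bits */
    parts.index  = (addr >> n) & ((1u << k) - 1u); /* next k bits */
    parts.tag    = addr >> (n + k);                /* remaining upper bits */
    return parts;
}
```

For the example above, `split_address(13, 2, 1)` yields tag 1, index 2, and offset 1.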
Caches
Direct Mapping
advantages
  index and offset computation with bit operations or simple arithmetic
  easy to realize in hardware
disadvantages
  low cache hit rate when several addresses in use map to the same cache line
  example: the memory address access pattern 0, 4, 0, 4, 0, ... replaces the first cache line on every memory access
access sequence: 0x16, 0x1A, 0x16, 0x1A, 0x10, 0x03, 0x10, 0x12, 0x10

address  binary    hit / miss  cache block
0x16     00010110  miss        110
0x1A     00011010  miss        010
0x16     00010110  hit         110
0x1A     00011010  hit         010
0x10     00010000  miss        000
0x03     00000011  miss        011
0x10     00010000  hit         000
0x12     00010010  miss        010
0x10     00010000  hit         000

cache contents after the last access
index  v  tag    data
000    y  00010  mem[0x10]
001    n
010    y  00010  mem[0x12]
011    y  00000  mem[0x03]
100    n
101    n
110    y  00010  mem[0x16]
111    n
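The hit/miss column can be reproduced with a small simulation. This is a minimal sketch under the assumptions visible in the example: 8 cache lines (3 index bits), one-byte blocks (no offset bits), and 8-bit addresses. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { DM_INDEX_BITS = 3, DM_NUM_LINES = 1 << DM_INDEX_BITS };

typedef struct {
    bool valid;   /* the "v" column */
    uint8_t tag;  /* upper 5 bits of the address */
} DmLine;

/* Simulates a direct-mapped cache that starts empty and
 * returns the number of hits for the given access sequence. */
size_t dm_count_hits(const uint8_t *addresses, size_t count) {
    DmLine cache[DM_NUM_LINES] = {{false, 0}};
    size_t hits = 0;
    for (size_t i = 0; i < count; i++) {
        uint8_t index = addresses[i] & (DM_NUM_LINES - 1);
        uint8_t tag = addresses[i] >> DM_INDEX_BITS;
        if (cache[index].valid && cache[index].tag == tag) {
            hits++;
        } else {
            /* miss: the block replaces whatever the line held before */
            cache[index].valid = true;
            cache[index].tag = tag;
        }
    }
    return hits;
}
```

For the sequence 0x16, 0x1A, 0x16, 0x1A, 0x10, 0x03, 0x10, 0x12, 0x10 the function counts 4 hits and therefore 5 misses. Two addresses with the same index bits, such as 0x00 and 0x08 in this configuration, evict each other on every access.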
Fully Associative Mapping
Writing Policies: Write-Hit Policies
write-back
  data is written only to the block in the cache
  modified cache block is written to main memory only when it is replaced
  discussion
    → use dirty bit to indicate modified cache block
    − more overhead with multi-core
    + fast
write-through
  data is written to both the block in the cache and to the block in the lower-level memory
  discussion
    − slow
    → use buffered write-through
    + simple
    + always in sync
Writing Policies: Write-Miss Policies
write-allocate
  cache block is allocated on a write miss, followed by the write-hit actions described before
  → write misses act like read misses
  typically combined with write-back: write-back with write-allocate policy
no-write-allocate
  write misses do not affect the cache
  cache block is modified only in the lower-level memory
  often combined with write-through: write-through with no-write-allocate policy
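The difference between the two common combinations can be sketched by counting writes that reach the lower-level memory. The model below is an assumption made for illustration: a direct-mapped cache with 4 lines, writes addressed by block number, and a final flush of all dirty blocks. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum { WP_NUM_LINES = 4 };

typedef struct {
    bool valid;
    bool dirty;   /* set when the cached block was modified */
    unsigned tag;
} WpLine;

/* Write-back with write-allocate: a block is written to the lower level
 * only when a dirty block is replaced (or flushed at the end). */
unsigned mem_writes_write_back(const unsigned *blocks, size_t count) {
    WpLine cache[WP_NUM_LINES] = {{false, false, 0}};
    unsigned mem_writes = 0;
    for (size_t i = 0; i < count; i++) {
        WpLine *line = &cache[blocks[i] % WP_NUM_LINES];
        unsigned tag = blocks[i] / WP_NUM_LINES;
        if (!line->valid || line->tag != tag) {
            /* write miss: write back the old block if dirty, then allocate */
            if (line->valid && line->dirty)
                mem_writes++;
            line->valid = true;
            line->tag = tag;
        }
        line->dirty = true; /* write hit action: modify only the cache */
    }
    for (size_t i = 0; i < WP_NUM_LINES; i++) /* final flush */
        if (cache[i].valid && cache[i].dirty)
            mem_writes++;
    return mem_writes;
}

/* Write-through with no-write-allocate: every write reaches the lower
 * level, whether or not the block is cached. */
unsigned mem_writes_write_through(const unsigned *blocks, size_t count) {
    (void)blocks;
    return (unsigned)count;
}
```

Four writes to the same block cost one lower-level write with write-back and four with write-through, which matches the "fast" versus "always in sync" trade-off listed above.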
Inclusion Policies
the inclusion policy in multi-level caches decides which data blocks each level contains
inclusive: all blocks present in the higher/upper level cache must be present in the lower level cache as well
  example: L2 cache is inclusive of L1
  example: L3 cache is inclusive of L1 and L2
exclusive: a block present in the higher/upper level cache is not present in any lower level cache
  example: L2 cache is exclusive of L1
non-inclusive non-exclusive (NINE): a block present in the higher/upper level cache may or may not be present in the lower level cache
  example: L3 cache is non-inclusive
[Figure: contents of the L1 and L2 caches (blocks X, Y, Z) under the three inclusion policies; steps shown: block X is evicted from the L1 cache, L1 cache miss on block Z]
Average Memory Access Time
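Assuming the common textbook definition, average memory access time = hit time + miss rate × miss penalty, the computation can be sketched as follows. The function name and the example numbers in the usage note are illustrative assumptions, not measurements.

```c
/* Average memory access time for one cache level.
 * hit_time and miss_penalty must use the same unit (cycles or ns);
 * miss_rate is a fraction between 0.0 and 1.0. */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}
```

For multiple levels, the miss penalty of one level can be taken as the average access time of the next level: `amat(1.0, 0.5, amat(4.0, 0.25, 40.0))` evaluates the inner level to 14.0 and the outer level to 8.0.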
Cache Thrashing
Hierarchy
[Figure: memory hierarchy with L1$, L2$, L3$, DDR RAM, and HDD; hit and miss paths between the cache levels, a page table lookup (hit / miss) before DDR RAM, and a page fault leading to the HDD; latency and capacity grow toward the lower levels]
memory access, step by step
  check L1$
    if yes: L1$ hit, return value from L1$ to registers
    otherwise: L1$ miss
  check L2$
    if yes: L2$ hit, return cache line to L1$, return value from L1$ to registers
    otherwise: L2$ miss
  check L3$
    if yes: L3$ hit, return cache line to L2$, return cache line to L1$, return value from L1$ to registers
    otherwise: L3$ miss
  access DDR RAM
    return cache line to L3$, return cache line to L2$, return cache line to L1$, return value from L1$ to registers
cache line: contiguous block of memory of a fixed size
  64 bytes on current processors
level     capacity     technology  latency                      bandwidth
L2 cache  4 × 256 KiB  SRAM        > 12 cycles (3.5 ns)         217.6 GB/s
L3 cache  8 MiB        SRAM        > 42 cycles (14.5 ns)        4 × 92.8 GB/s
DDR4 RAM  ≤ 64 GiB     DRAM        42 cycles + 51 ns = 55.5 ns  34.1 GB/s
write-back with write-allocate policy
cache line size: 64 bytes
L2: inclusive of L1
L3: non-inclusive of L2 / L1
two L1 caches
  L1 data cache
  L1 instruction cache
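With a cache line size of 64 bytes, the number of cache lines an access range touches follows from integer division. The helper below is a hypothetical sketch; it treats addresses as plain integers and assumes the range does not wrap around the address space.

```c
#include <stddef.h>
#include <stdint.h>

enum { CACHE_LINE_SIZE = 64 };

/* Number of distinct cache lines touched by the byte range
 * [addr, addr + size). Returns 0 for an empty range. */
size_t cache_lines_touched(uintptr_t addr, size_t size) {
    if (size == 0)
        return 0;
    uintptr_t first = addr / CACHE_LINE_SIZE;
    uintptr_t last = (addr + size - 1) / CACHE_LINE_SIZE;
    return (size_t)(last - first + 1);
}
```

An array of 1024 four-byte integers that starts at a cache line boundary spans 4096 / 64 = 64 cache lines, so a linear sum over it loads 64 lines rather than 1024 individual values. An 8-byte value that starts at offset 60 straddles two lines.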