
Foundations of

Computer Science
Chapter 6 Memory Hierarchy

Prof. Dr.-Ing. Richard Membarth

January 8th 2025


Contents

6. Memory Hierarchy

6.1 Motivation

6.2 Caches

Foundations of
Computer Science
Chapter 6.1 Memory Hierarchy
Motivation
Prof. Dr.-Ing. Richard Membarth

January 8th 2025


Memory Hierarchy
Motivation

assumption so far
instructions take a single cycle to compute
each instruction takes the same time to finish
memory requests are served within a single cycle
reality
instructions take multiple cycles to compute
memory requests are served within multiple cycles

Memory Hierarchy
Motivation

how can we wait for data?

stall pipeline
when data is ready, continue processing
applies to both data and instruction memory
out-of-order processors can continue processing independent instructions

Memory Hierarchy
Motivation

processor-memory performance gap

memory performance grows slower than processor performance
memory performance: DRAM latency
CPU performance: processor memory requests
conclusion
communication cost becomes more and more of a performance bottleneck
we need ways to circumvent that problem

[Figure: CPU performance vs. memory performance, 1980–2020, log scale; the widening distance between the curves is the processor-memory performance gap]
Memory Hierarchy
Motivation

programs access a small portion of their address space at any time


temporal locality
items accessed recently are likely to be accessed again soon
examples
→ instructions in a loop
→ induction variables
spatial locality
items near those accessed recently are likely to be accessed soon
examples
→ sequential instruction access
→ array data

Memory Hierarchy
Motivation

temporal or spatial locality?

loop example:

1 loop: lw   x31, 0(x20)
2       add  x31, x31, x21
3       sw   x31, 0(x20)
4       addi x20, x20, -4
5       blt  x22, x20, loop

spatial locality in programs
sequence of instructions stored consecutively in memory
temporal locality in programs
loop instructions 1–5 will be fetched once on every loop iteration

stack example:

1 addi sp, sp, -12
2 sw   ra, 0(sp)
3 sw   a0, 4(sp)
4 sw   a1, 8(sp)
5 ...
6 lw   a1, 8(sp)
7 lw   a0, 4(sp)
8 lw   ra, 0(sp)
9 addi sp, sp, +12

spatial locality in data
registers are stored to contiguous memory via sp
temporal locality in data
registers are restored from memory via sp
Memory Hierarchy
Motivation

temporal or spatial locality?

1 int sum = 0;
2 for (int i = 0; i < n; i++)
3 sum = sum + arr[i];

temporal locality
variables sum and i
typically kept in registers, but not always possible
spatial locality
array arr stored in memory contiguously

Memory Hierarchy
Motivation

cache as memory hierarchy

set of layers
memory transfers only between adjacent layers
important for developers: the higher the layer, the faster and smaller the memory

[Figure: hierarchy from the processor core (registers) through L1, L2, and L3 caches and DDR RAM down to HDD, annotated with the trends in bandwidth, space, and price per bit on one side and latency and capacity on the other]

Memory Hierarchy
Motivation

typical sizes and latencies (mobile):

registers   1000 bytes   300 ps
L1 cache    64 KB        1 ns
L2 cache    256 KB       5–10 ns
memory      1–2 GB       50–100 ns
flash       4–64 GB      25–50 µs

Memory Hierarchy
Motivation

typical sizes and latencies (laptop, desktop, server):

laptop:
registers   1000 bytes    300 ps
L1 cache    64 KB         1 ns
L2 cache    256 KB        3–10 ns
L3 cache    4–8 MB        10–20 ns
memory      4–16 GB       50–100 ns
flash       256 GB–1 TB   50–100 µs

desktop:
registers   2000 bytes    300 ps
L1 cache    64 KB         1 ns
L2 cache    256 KB        3–10 ns
L3 cache    8–32 MB       10–20 ns
memory      8–64 GB       50–100 ns
flash       256 GB–2 TB   50–100 µs

server:
registers   4000 bytes    200 ps
L1 cache    64 KB         1 ns
L2 cache    256 KB        3–10 ns
L3 cache    16–64 MB      10–20 ns
memory      32–256 GB     50–100 ns
disk        16–64 TB      5–10 ms
flash       1–16 TB       100–200 µs
Foundations of
Computer Science
Chapter 6.2 Memory Hierarchy
Caches
Prof. Dr.-Ing. Richard Membarth

January 8th 2025


Caches
Basics

caches are faster, but smaller than memory
divided into blocks
number of blocks is usually a power of 2
assumption: 1 byte per block

four important questions
Q1 where can a block be placed in the upper level? (block placement)
Q2 how is a block found if it is in the upper level? (block identification)
Q3 which block should be replaced on a miss? (block replacement)
Q4 what happens on a write? (write strategy)

[Figure: memory with 16 byte addresses (0–15) next to a cache with 4 blocks (index 0–3)]
Caches
Direct Mapping

direct-mapped cache
each memory address maps to exactly one block
addresses 0, 4, 8, and 12 map to cache block 0
addresses 1, 5, 9, and 13 map to cache block 1
...
cache block index computation
cache with 2^k blocks
memory address i
cache block index: i mod 2^k
corresponds to the least significant k bits of the address

[Figure: 16 memory addresses (0000–1111) mapped onto a 4-block cache (index 00–11)]
Caches
Direct Mapping

direct-mapped cache
for a memory address i, the corresponding cache block can be computed; however, other addresses map to the same cache block as well
cache block 2 can contain data from memory addresses 2, 6, 10, or 14
add tags to the cache
a tag stores the rest of the address bits
used to differentiate between different addresses
tag field: i / 2^k
corresponds to the most significant m − k bits of the address
add a valid bit for each cache block
cache data is not valid when the system is initialized

[Figure: the 4-block cache extended with a valid bit (V) and a tag field per block, alongside the 16 memory addresses]
Caches
Direct Mapping

direct-mapped cache
in reality, the cache block size is larger than one byte
example: load two-byte blocks at a time
when reading from address 12, the data at addresses 12 and 13 will both be copied to cache block 2
byte offset
block size of 2^n bytes (cache line size)
byte offset within a cache line: i mod 2^n
corresponds to the n least significant bits
block addresses
block address of a byte address: i / 2^n
corresponds to the m − n most significant bits (contains the bits for the index and tag)

[Figure: cache with two-byte blocks; each cache line holds two consecutive memory bytes]
Caches
Direct Mapping

locating data
cache with 2^k lines, each containing 2^n bytes
m-bit memory address
n bits of the address correspond to the byte offset within a cache line
k bits of the address correspond to the index selecting one of the 2^k cache lines
m − k − n bits of the address correspond to the tag to match the memory address

m-bit address: | tag (m−k−n bits) | index (k bits) | offset (n bits) |

example
cache with 2^2 cache lines, 2^1 bytes per cache line
memory address 13 (1101) will be stored in byte 1 of cache line 2

4-bit address 1101: | tag 1 (1 bit) | index 10 (2 bits) | offset 1 (1 bit) |

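The decomposition uses only the bit operations described above. A minimal C sketch (added here, not part of the original slides); k = 2 and n = 1 reproduce the example:

#include <stdio.h>

int main(void) {
    unsigned k = 2, n = 1;  /* 2^2 = 4 cache lines, 2^1 = 2 bytes per line */
    unsigned addr = 13;     /* 1101 in binary */

    unsigned offset = addr & ((1u << n) - 1);        /* n least significant bits */
    unsigned index  = (addr >> n) & ((1u << k) - 1); /* next k bits */
    unsigned tag    = addr >> (n + k);               /* remaining m - k - n bits */

    /* prints: tag=1 index=2 offset=1, i.e., byte 1 of cache line 2 */
    printf("tag=%u index=%u offset=%u\n", tag, index, offset);
    return 0;
}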
Caches
Direct Mapping

advantages
index and offset computation with bit operations or simple arithmetic
easy to realize in hardware
disadvantages
low cache hit rate
the memory access pattern 0, 4, 0, 4, 0, ... replaces the first cache line on every memory access

Caches
Direct Mapping

cache example
8 blocks, 1 word / block, direct mapped
consider memory references at addresses 0x16, 0x1A, 0x16, 0x1A, 0x10, 0x03, 0x10, 0x12, 0x10

address  binary     hit / miss  cache block
0x16     00010110   miss        110
0x1A     00011010   miss        010
0x16     00010110   hit         110
0x1A     00011010   hit         010
0x10     00010000   miss        000
0x03     00000011   miss        011
0x10     00010000   hit         000
0x12     00010010   miss        010
0x10     00010000   hit         000

content of the cache after the memory references:

index  v  tag    data
000    y  00010  mem[0x10]
001    n
010    y  00010  mem[0x12]
011    y  00000  mem[0x03]
100    n
101    n
110    y  00010  mem[0x16]
111    n
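
The trace can be reproduced with a small simulation. A minimal C sketch (added, not from the slides) with one valid bit and one tag per block; index = address mod 8, tag = address / 8:

#include <stdio.h>
#include <stdbool.h>

#define NUM_BLOCKS 8  /* direct mapped, 1 word per block */

int main(void) {
    bool valid[NUM_BLOCKS] = { false };
    unsigned tag[NUM_BLOCKS] = { 0 };
    unsigned refs[] = { 0x16, 0x1A, 0x16, 0x1A, 0x10, 0x03, 0x10, 0x12, 0x10 };
    int n = sizeof refs / sizeof refs[0];

    for (int i = 0; i < n; i++) {
        unsigned index = refs[i] % NUM_BLOCKS;  /* 3 least significant bits */
        unsigned t     = refs[i] / NUM_BLOCKS;  /* remaining address bits   */
        bool hit = valid[index] && tag[index] == t;
        printf("0x%02X -> block %u: %s\n", refs[i], index, hit ? "hit" : "miss");
        if (!hit) {  /* on a miss, the block is fetched and the tag updated */
            valid[index] = true;
            tag[index] = t;
        }
    }
    return 0;
}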

Caches
Fully Associative Mapping

fully associative cache
each memory address can map to any block
address 0 can map to cache block 0, 1, 2, or 3
address 1 can map to cache block 0, 1, 2, or 3
...

m-bit address: | tag (m−n bits) | offset (n bits) |

advantages
high cache hit rate
flexible replacement strategy
disadvantages
expensive to implement

[Figure: any of the 16 memory addresses can be placed in any of the 4 cache blocks]
Caches
Set-Associative Mapping

set-associative cache
cache is divided into groups of blocks, called sets
each memory address maps to exactly one set
data may be placed in any block within the set

m-bit address: | tag (m−s−n bits) | set (s bits) | offset (n bits) |

advantages
trade-off between direct-mapped and fully associative cache
flexible replacement strategy
disadvantages
may not use all available cache lines effectively

[Figure: the 4-block cache divided into set 0 (blocks 00, 01) and set 1 (blocks 10, 11); each memory address maps to one set]
Caches
Set-Associative Mapping

k-way associative cache

a cache is k-way associative ↔ each set contains k = 2^x blocks

1-way: 8 sets, 1 block each (direct mapped)
2-way: 4 sets, 2 blocks each
4-way: 2 sets, 4 blocks each
8-way: 1 set, 8 blocks each (fully associative)

[Figure: the same 8-block cache organized with 1-, 2-, 4-, and 8-way associativity]
Caches
Replacement Policies

on a cache miss, a cache block might need to be replaced


replacement policies
random replacement (RR)
first in, first out (FIFO)
last in, first out (LIFO)
least recently used (LRU)
most recently used (MRU)
...
sources of cache misses
compulsory (cold start, first reference)
conflict (collision)
capacity (size)
coherency (multi-core)
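
To illustrate LRU from the list above, a minimal C sketch (added, not from the slides) of the bookkeeping for one 4-way set: each block ages on every access, and the oldest block becomes the victim.

#define WAYS 4

static unsigned age[WAYS];  /* accesses since block i was last used */

void touch(int way) {       /* called on every access to this set */
    for (int i = 0; i < WAYS; i++)
        age[i]++;
    age[way] = 0;           /* the accessed block is most recently used */
}

int lru_victim(void) {      /* block to replace on a cache miss */
    int victim = 0;
    for (int i = 1; i < WAYS; i++)
        if (age[i] > age[victim])
            victim = i;     /* least recently used = largest age */
    return victim;
}

Real hardware uses cheaper approximations of this bookkeeping, such as the Tree-PLRU and QLRU variants mentioned later in this chapter.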

Caches
Writing Policies: Write-Hit Policies

on a write hit, data changes in the cache → two strategies

write-back
data is written only to the block in the cache
the modified cache block is written to main memory only when it is replaced
discussion
→ use a dirty bit to indicate a modified cache block
− more overhead with multi-core
+ fast

write-through
data is written to both the block in the cache and to the block in the lower-level memory
discussion
− slow
→ use buffered write-through
+ simple
+ always in sync

Caches
Writing Policies: Write-Miss Policies

on a write miss, data is not present in the cache → two strategies

write-allocate
a cache block is allocated on a write miss
followed by the write-hit actions described above
→ write misses act like read misses
typically paired with write-back

no-write-allocate
write misses do not affect the cache
the block is modified only in the lower-level memory
often paired with write-through
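
Combining the common pairing, a minimal C sketch (added, not from the slides) of a store with write-back and write-allocate, reduced to a single one-word cache line and a small array as lower-level memory:

#include <stdbool.h>

unsigned memory[16];                        /* lower-level memory            */
struct {
    bool valid, dirty;
    unsigned addr, data;
} line;                                     /* a single one-word cache line  */

void store(unsigned addr, unsigned value) {
    if (!line.valid || line.addr != addr) { /* write miss                     */
        if (line.valid && line.dirty)       /* write-back the evicted line    */
            memory[line.addr] = line.data;
        line.valid = true;                  /* write-allocate: load the block */
        line.dirty = false;
        line.addr = addr;
        line.data = memory[addr];
    }
    line.data = value;                      /* write hit: update cache only  */
    line.dirty = true;                      /* memory is updated on eviction */
}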

Caches
Inclusion Policies

the inclusion policy in multi-level caches decides which data blocks each level contains
inclusive: all blocks present in the higher/upper level cache have to be present in the lower level cache as well
example: L2 cache is inclusive of L1
example: L3 cache is inclusive of L1 and L2
exclusive: blocks present in the higher/upper level cache are never present in the lower level cache
example: L2 cache is exclusive of L1
non-inclusive non-exclusive (NINE): blocks in the higher/upper level cache may or may not be present in the lower level cache
example: L3 cache is non-inclusive

[Figure: inclusive (L1 contained in L2), non-inclusive non-exclusive (L1 and L2 overlap), exclusive (L1 and L2 disjoint)]


Caches
Inclusion Policies (cf. Solihin, 2015)

[Figure (cf. Solihin, 2015): L1/L2 contents under an inclusive, an exclusive, and a NINE L2 cache after each of four events:
1. L1 and L2 cache miss on blocks X and Y
2. block X is evicted from the L1 cache
3. block Y is evicted from the L2 cache
4. L1 cache miss on block Z]
Caches
Average Memory Access Time

average memory access time (AMAT)

metric to analyze the memory subsystem performance
hit time, miss penalty, miss rate
AMAT = hit_time + miss_rate · miss_penalty
allows us to analyze memory hierarchies
the miss penalty can be computed from the values of the next cache level
AMAT_i = hit_time_i + miss_rate_i · miss_penalty_i, with miss_penalty_i = AMAT_(i+1)
example
CPU with 1 ns clock
hit time = 1 cycle
miss penalty = 20 cycles
cache miss rate = 5%
→ AMAT = 1 cycle + 0.05 · 20 cycles = 2 cycles
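
The recursion is evaluated from the last level upward. A minimal C sketch (added, not from the slides) that reproduces the 2-cycle result of the example:

#include <stdio.h>

/* hit_time in cycles, miss_rate in [0,1]; memory_latency is the cost of
 * missing in the last cache level */
double amat(const double hit_time[], const double miss_rate[],
            int levels, double memory_latency) {
    double penalty = memory_latency;
    for (int i = levels - 1; i >= 0; i--)   /* AMAT_i = ht_i + mr_i * AMAT_(i+1) */
        penalty = hit_time[i] + miss_rate[i] * penalty;
    return penalty;
}

int main(void) {
    double hit_time[]  = { 1.0 };   /* 1 cycle */
    double miss_rate[] = { 0.05 };  /* 5%      */
    printf("AMAT = %.2f cycles\n", amat(hit_time, miss_rate, 1, 20.0)); /* 2.00 */
    return 0;
}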

Caches
Cache Thrashing

cache thrashing
memory access pattern where multiple addresses map to the same cache line
leads to excessive cache misses
example: direct-mapped cache, access pattern
memory address 0 → cache line 0
memory address 4 → cache line 0
memory address 0 → cache line 0
memory address 4 → cache line 0
...
problematic for caches with low associativity

[Figure: 16 memory addresses mapped onto a 4-block direct-mapped cache; addresses 0 and 4 collide in cache line 0]
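
A sketch of how thrashing can arise in C (added, not from the slides; the 32 KB cache size and the array layout are assumptions): if b happens to start a multiple of the cache size after a, then a[i] and b[i] map to the same line of a direct-mapped cache and evict each other on every iteration.

#define CACHE_SIZE (32 * 1024)                /* assumed direct-mapped cache */
#define N (CACHE_SIZE / (int)sizeof(float))

float a[N];
float b[N];  /* if b lies CACHE_SIZE bytes after a, a[i] and b[i] collide */

float dot(void) {
    float sum = 0.0f;
    for (int i = 0; i < N; i++)
        sum += a[i] * b[i];  /* worst case: two cache misses per iteration */
    return sum;
}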
Caches
False Sharing

false sharing
memory access pattern where threads on different cores access data within the same memory block
cache coherence forces both cores to continuously update their caches from main memory
example: variables A and B are in the same memory block
thread 0 on core 0 modifies A → update of the main memory block and of the cache line in core 1
thread 1 on core 1 modifies B → update of the main memory block and of the cache line in core 0
...

[Figure: two cores, each running one thread, accessing a shared cache line that holds both A and B]
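
A minimal pthreads sketch of the effect (added, not from the slides; the 64-byte line size matches current processors): without the padding, the two counters share one cache line and the increments ping-pong that line between the cores.

#include <pthread.h>

struct {
    long value;
    char pad[64 - sizeof(long)];  /* pad each counter to its own cache line */
} counter[2];

void *worker(void *arg) {
    long id = (long)arg;
    for (long i = 0; i < 100000000; i++)
        counter[id].value++;      /* with padding: no false sharing */
    return 0;
}

int main(void) {
    pthread_t t0, t1;
    pthread_create(&t0, 0, worker, (void *)0);
    pthread_create(&t1, 0, worker, (void *)1);
    pthread_join(t0, 0);
    pthread_join(t1, 0);
    return 0;
}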

Caches
Hierarchy

functionality: load data from memory into a register

lw t0, 0(s0)
address in virtual memory: addr at 0(s0)
target register: t0

test whether addr is in L1$
if yes: L1$ hit, return value
otherwise: L1$ miss

on an L1$ miss, we go to the next level: L2
test whether addr is in L2$
if yes: L2$ hit
return cache line to L1$
return value from L1$ to registers
otherwise: L2$ miss
cache line: contiguous block of memory of a fixed size, 64 bytes on current processors

on an L2$ miss, we go to the next level: L3
test whether addr is in L3$
if yes: L3$ hit
return cache line to L2$, then to L1$
return value from L1$ to registers
otherwise: L3$ miss

on an L3$ miss, we go to physical main memory
the MMU checks the page table whether the page is in physical memory
if yes: page table hit
return cache line to L3$, L2$, and L1$
return value from L1$ to registers
otherwise: page fault

on a page fault, the OS loads the page from disk first
if the virtual address is invalid, raise an exception
if the address is valid:
load the page (4 KB) from HDD to DDR RAM
return cache line to L3$, L2$, and L1$
return value from L1$ to registers

[Figure: memory hierarchy with the lookup path: L1$ test → L2$ on miss → L3$ on miss → page table → page fault; registers, L1–L3 caches, DDR RAM, HDD]

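The walk is a cascade of checks. A minimal C sketch (added, not from the slides) models it with toy hit predicates; the latencies are loosely based on the Skylake numbers on the next slide, and real lookups compare tags in hardware instead:

#include <stdio.h>

/* toy stand-ins for the hardware tag checks at each level */
int in_l1(unsigned addr) { return addr % 2 == 0; }
int in_l2(unsigned addr) { return addr % 3 == 0; }
int in_l3(unsigned addr) { return addr % 5 == 0; }

int load_latency(unsigned addr) {  /* cycles for a load of addr          */
    if (in_l1(addr)) return 4;     /* L1$ hit                            */
    if (in_l2(addr)) return 12;    /* L2$ hit, line copied into L1$      */
    if (in_l3(addr)) return 42;    /* L3$ hit, line copied into L2$, L1$ */
    return 200;                    /* main memory, line fills all levels */
}

int main(void) {
    printf("%d cycles\n", load_latency(7)); /* misses in L1$, L2$, and L3$ */
    return 0;
}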
Caches
Hierarchy

Intel Core i7 6700 (Skylake)
core frequency: 3.4 GHz
processor (ring/uncore) frequency: 2.9 GHz
memory clock: 266.67 MHz

memory      size          technology  latency                       bandwidth
registers   16 × 8 B      registers   0–1 cycles (0.3 ns)           435.2 GB/s
L1 cache    4 × 32 KiB    SRAM        4–5 cycles (1.2 ns)           217.6 GB/s
L2 cache    4 × 256 KiB   SRAM        > 12 cycles (3.5 ns)          217.6 GB/s
L3 cache    8 MiB         SRAM        > 42 cycles (14.5 ns)         4 × 92.8 GB/s
DDR4 RAM    ≤ 64 GiB      DRAM        42 cycles + 51 ns = 55.5 ns   34.1 GB/s
Caches
Hierarchy

Intel Core i7 6700 (Skylake)

replacement strategy: LRU-based
L1: Tree-PLRU
L2: QLRU
L3: QLRU
write-back with write-allocate policy
cache line size: 64 bytes
L2: inclusive of L1
L3: non-inclusive of L2 / L1
two L1 caches
L1 data cache
L1 instruction cache

References I

Solihin, Y. (2015). Fundamentals of parallel multicore architecture. Chapman & Hall/CRC.

