Chapter 05, Computer Organization and Design, Fifth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design)
Chapter 5: Large and Fast: Exploiting Memory Hierarchy
5.1 Introduction
Principle of Locality
Temporal locality
Spatial locality
Memory hierarchy
Store everything on magnetic disk
Copy recently accessed (and nearby) items from disk to smaller DRAM memory (main memory)
Memory Technology
Ideal memory
DRAM Technology
DRAM Generations

Year  | Capacity | $/GB
1980  | 64 Kbit  | $1,500,000
1983  | 256 Kbit | $500,000
1985  | 1 Mbit   | $200,000
1989  | 4 Mbit   | $50,000
1992  | 16 Mbit  | $15,000
1996  | 64 Mbit  | $10,000
1998  | 128 Mbit | $4,000
2000  | 256 Mbit | $1,000
2004  | 512 Mbit | $250
2007  | 1 Gbit   | $50
Row buffer
Synchronous DRAM
DRAM banking
Flash Storage
Flash Types
Disk Storage
Sector ID
Data (512 bytes, 4096 bytes proposed)
Error correcting code (ECC)
Disk Access Example
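A minimal sketch of how an average disk read time is typically estimated: seek time, plus rotational latency (half a rotation on average), plus transfer time, plus controller overhead. All drive parameters below are assumed example values, not taken from the text.

#include <stdio.h>

/* Sketch: estimate the average time to read one sector from an idle disk.
   All parameters are assumed example values. */
int main(void) {
    double sector_bytes  = 512.0;     /* sector size                   */
    double rpm           = 15000.0;   /* spindle speed                 */
    double avg_seek_ms   = 4.0;       /* advertised average seek time  */
    double transfer_MBps = 100.0;     /* sustained transfer rate       */
    double controller_ms = 0.2;       /* controller overhead           */

    double rotation_ms   = 60.0 * 1000.0 / rpm;              /* one full rotation          */
    double rotational_ms = rotation_ms / 2.0;                 /* wait half a rotation, avg. */
    double transfer_ms   = sector_bytes / (transfer_MBps * 1e6) * 1000.0;

    double total_ms = avg_seek_ms + rotational_ms + transfer_ms + controller_ms;
    printf("average read time = %.3f ms\n", total_ms);       /* about 6.2 ms here          */
    return 0;
}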
Cache Memory
How do we know if the data is present? Where do we look?
Direct mapped: cache location determined by (block address) modulo (#blocks in cache)
#Blocks is a power of 2, so use the low-order address bits as the index
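Because the number of blocks is a power of two, the modulo that picks a cache line reduces to keeping the low-order bits of the block address. A small sketch (the 8-block size matches the cache example below; the code itself is illustrative):

#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS 8u   /* must be a power of 2; 8 blocks as in the example below */

/* Direct-mapped placement: index = block address mod #blocks,
   which for a power-of-two block count is just the low-order bits. */
static uint32_t cache_index(uint32_t block_addr) {
    return block_addr & (NUM_BLOCKS - 1u);   /* same as block_addr % NUM_BLOCKS */
}

int main(void) {
    uint32_t addrs[] = {22, 26, 16, 3, 18};  /* word addresses used in the example below */
    for (int i = 0; i < 5; i++)
        printf("block %2u -> index %u\n", addrs[i], cache_index(addrs[i]));
    return 0;
}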
Cache Example
8 one-word blocks, direct mapped. Initial state: all eight entries (indices 000 through 111) are empty, with no valid Tag or Data.
Cache Example

Word addr | Binary addr | Hit/miss | Cache block
22        | 10 110      | Miss     | 110

Index | Tag | Data
000   |     |
001   |     |
010   |     |
011   |     |
100   |     |
101   |     |
110   | 10  | Mem[10110]
111   |     |
Cache Example

Word addr | Binary addr | Hit/miss | Cache block
26        | 11 010      | Miss     | 010

Index | Tag | Data
000   |     |
001   |     |
010   | 11  | Mem[11010]
011   |     |
100   |     |
101   |     |
110   | 10  | Mem[10110]
111   |     |
Cache Example

Word addr | Binary addr | Hit/miss | Cache block
22        | 10 110      | Hit      | 110
26        | 11 010      | Hit      | 010

Index | Tag | Data
000   |     |
001   |     |
010   | 11  | Mem[11010]
011   |     |
100   |     |
101   |     |
110   | 10  | Mem[10110]
111   |     |
Cache Example

Word addr | Binary addr | Hit/miss | Cache block
16        | 10 000      | Miss     | 000
3         | 00 011      | Miss     | 011
16        | 10 000      | Hit      | 000

Index | Tag | Data
000   | 10  | Mem[10000]
001   |     |
010   | 11  | Mem[11010]
011   | 00  | Mem[00011]
100   |     |
101   |     |
110   | 10  | Mem[10110]
111   |     |
Cache Example

Word addr | Binary addr | Hit/miss | Cache block
18        | 10 010      | Miss     | 010

Index | Tag | Data
000   | 10  | Mem[10000]
001   |     |
010   | 10  | Mem[10010]
011   | 00  | Mem[00011]
100   |     |
101   |     |
110   | 10  | Mem[10110]
111   |     |
Address Subdivision
64 blocks, 16 bytes/block
Address layout: Tag = bits 31-10 (22 bits), Index = bits 9-4 (6 bits), Offset = bits 3-0 (4 bits)
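A sketch of how a 32-bit byte address splits into the fields above for this 64-block, 16-byte-block cache (the function names are illustrative):

#include <stdint.h>

/* Field widths for the 64-block, 16-byte-block cache above. */
#define OFFSET_BITS 4   /* 16 bytes per block */
#define INDEX_BITS  6   /* 64 blocks          */

uint32_t addr_offset(uint32_t addr) { return addr & ((1u << OFFSET_BITS) - 1); }
uint32_t addr_index(uint32_t addr)  { return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); }
uint32_t addr_tag(uint32_t addr)    { return addr >> (OFFSET_BITS + INDEX_BITS); }

/* For instance, addr_index(1200) == 11, since 1200/16 = 75 and 75 mod 64 = 11. */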
Cache Misses
Write-Through
Write-Back
Write Allocation
What should happen on a write miss? For write-back caches, usually fetch the block.
Example: Intrinsity FastMATH embedded MIPS processor
12-stage pipeline
Instruction and data access on each cycle
Split I-cache and D-cache, each 16KB: 256 blocks × 16 words/block
D-cache: write-through or write-back
Measured miss rates: I-cache 0.4%, D-cache 11.4%, weighted average 3.2%
Measuring Cache Performance
Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                    = (Instructions / Program) × (Misses / Instruction) × Miss penalty
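As an illustration of the formula, a hedged calculation of effective CPI. The miss rates, penalty, base CPI, and load/store fraction below are assumed example values for the sketch, not figures stated above.

#include <stdio.h>

/* Sketch: effective CPI including memory-stall cycles.
   All rates and penalties are assumed example values. */
int main(void) {
    double base_cpi      = 2.0;
    double icache_miss   = 0.02;   /* misses per instruction fetch */
    double dcache_miss   = 0.04;   /* misses per data access       */
    double ldst_fraction = 0.36;   /* loads/stores per instruction */
    double miss_penalty  = 100.0;  /* cycles                       */

    /* Memory stall cycles per instruction =
       (accesses/instruction) x miss rate x miss penalty, summed over I and D. */
    double i_stalls = 1.0 * icache_miss * miss_penalty;           /* 2.00 */
    double d_stalls = ldst_fraction * dcache_miss * miss_penalty; /* 1.44 */

    printf("effective CPI = %.2f\n", base_cpi + i_stalls + d_stalls); /* 5.44 */
    return 0;
}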
Given
Example
Performance Summary
Associative Caches
Fully associative
Spectrum of Associativity
Associativity Example
Compare caches each holding 4 one-word blocks; block address access sequence: 0, 8, 0, 6, 8

Direct mapped:

Block addr | Cache index | Hit/miss | Cache contents after access
0          | 0           | miss     | Mem[0]
8          | 0           | miss     | Mem[8]
0          | 0           | miss     | Mem[0]
6          | 2           | miss     | Mem[0], Mem[6]
8          | 0           | miss     | Mem[8], Mem[6]
2-way set associative:

Block addr | Cache index | Hit/miss | Cache contents after access
0          | 0           | miss     | Mem[0]
8          | 0           | miss     | Mem[0], Mem[8]
0          | 0           | hit      | Mem[0], Mem[8]
6          | 0           | miss     | Mem[0], Mem[6]
8          | 0           | miss     | Mem[8], Mem[6]
Fully associative:

Block addr | Hit/miss | Cache contents after access
0          | miss     | Mem[0]
8          | miss     | Mem[0], Mem[8]
0          | hit      | Mem[0], Mem[8]
6          | miss     | Mem[0], Mem[8], Mem[6]
8          | hit      | Mem[0], Mem[8], Mem[6]
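The behavior in the three tables above can be reproduced with a small simulation. The sketch below models a cache of four one-word blocks with configurable associativity and LRU replacement, and replays the block-address sequence 0, 8, 0, 6, 8; the code itself is illustrative, not from the text.

#include <stdio.h>

#define NUM_BLOCKS 4            /* total cache blocks, as in the example */
#define TRACE_LEN  5

/* Simulate a NUM_BLOCKS-block cache with the given associativity
   (1 = direct mapped, NUM_BLOCKS = fully associative), LRU replacement. */
static void simulate(int assoc) {
    int sets = NUM_BLOCKS / assoc;
    long block[NUM_BLOCKS];     /* block address stored in each way, -1 = invalid */
    long age[NUM_BLOCKS];       /* larger = more recently used                    */
    for (int i = 0; i < NUM_BLOCKS; i++) { block[i] = -1; age[i] = 0; }

    long trace[TRACE_LEN] = {0, 8, 0, 6, 8};
    printf("%d-way:", assoc);
    for (int t = 0; t < TRACE_LEN; t++) {
        long addr = trace[t];
        int set = (int)(addr % sets);
        int base = set * assoc, hit = -1, victim = base;
        for (int w = base; w < base + assoc; w++) {
            if (block[w] == addr) hit = w;        /* tag match       */
            if (age[w] < age[victim]) victim = w; /* LRU way in set  */
        }
        int slot = (hit >= 0) ? hit : victim;
        block[slot] = addr;
        age[slot] = t + 1;                        /* mark most recently used */
        printf(" %s", hit >= 0 ? "hit" : "miss");
    }
    printf("\n");
}

int main(void) {
    simulate(1);            /* direct mapped:     miss miss miss miss miss */
    simulate(2);            /* 2-way associative: miss miss hit  miss miss */
    simulate(NUM_BLOCKS);   /* fully associative: miss miss hit  miss hit  */
    return 0;
}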
How much associativity? Increased associativity decreases the miss rate, with diminishing returns. Simulated miss rates for a 64KB data cache with 16-word blocks:
1-way: 10.3%
2-way: 8.6%
4-way: 8.3%
8-way: 8.1%
Replacement Policy
Direct mapped: no choice
Set associative: prefer a non-valid entry; otherwise replace the least recently used (LRU) entry, or choose at random
Multilevel Caches
Given
Example (cont.)
Primary cache
L-2 cache
Results
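A hedged illustration of how adding an L2 cache changes effective CPI. All parameters below (base CPI, clock rate, miss rates, and access times) are assumed example values for the sketch, not figures taken from the text.

#include <stdio.h>

/* Sketch: effective CPI with and without an L2 cache.
   All parameters are assumed example values. */
int main(void) {
    double clock_ghz        = 4.0;    /* cycle time = 0.25 ns                 */
    double base_cpi         = 1.0;
    double l1_miss_per_ins  = 0.02;   /* primary-cache misses per instruction */
    double mem_ns           = 100.0;  /* main-memory access time              */
    double l2_ns            = 5.0;    /* L2 access time                       */
    double mem_miss_per_ins = 0.005;  /* misses that also miss in L2          */

    double cycle_ns    = 1.0 / clock_ghz;
    double mem_penalty = mem_ns / cycle_ns;   /* 400 cycles */
    double l2_penalty  = l2_ns / cycle_ns;    /*  20 cycles */

    double cpi_l1_only = base_cpi + l1_miss_per_ins * mem_penalty;   /* 1 + 0.02*400 = 9.0 */
    double cpi_with_l2 = base_cpi + l1_miss_per_ins * l2_penalty
                                  + mem_miss_per_ins * mem_penalty;  /* 1 + 0.4 + 2  = 3.4 */

    printf("CPI without L2 = %.1f, with L2 = %.1f (speedup %.1fx)\n",
           cpi_l1_only, cpi_with_l2, cpi_l1_only / cpi_with_l2);
    return 0;
}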
Misses depend on memory access patterns
Algorithm behavior
Compiler optimization for memory access
Figure: access patterns to the C, A, and B arrays (older vs. new accesses), unoptimized vs. blocked
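A minimal sketch of the blocked (cache-tiled) matrix multiply idea referred to above, in the spirit of the blocked DGEMM discussed in the text: operate on BLOCKSIZE x BLOCKSIZE submatrices so that the pieces of C, A, and B being touched fit in cache and are reused before eviction. The block size and the row-major n x n layout are assumptions of the sketch.

#define BLOCKSIZE 32   /* tile edge; chosen so three tiles fit in cache (assumed) */

/* One tile update: C[si.., sj..] += A[si.., sk..] * B[sk.., sj..].
   Matrices are n x n doubles, row-major. */
static void do_block(int n, int si, int sj, int sk,
                     double *A, double *B, double *C) {
    for (int i = si; i < si + BLOCKSIZE; ++i)
        for (int j = sj; j < sj + BLOCKSIZE; ++j) {
            double cij = C[i * n + j];
            for (int k = sk; k < sk + BLOCKSIZE; ++k)
                cij += A[i * n + k] * B[k * n + j];
            C[i * n + j] = cij;
        }
}

/* Blocked DGEMM: iterate over tiles; assumes n is a multiple of BLOCKSIZE. */
void dgemm_blocked(int n, double *A, double *B, double *C) {
    for (int sj = 0; sj < n; sj += BLOCKSIZE)
        for (int si = 0; si < n; si += BLOCKSIZE)
            for (int sk = 0; sk < n; sk += BLOCKSIZE)
                do_block(n, si, sj, sk, A, B, C);
}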
Dependability
Service accomplishment: service delivered as specified
Service interruption: deviation from specified service
Failure moves the system from accomplishment to interruption; restoration moves it back
Fault: failure of a component
Dependability Measures
Hamming distance
Encoding SEC
Decoding SEC
SEC/DED Code
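A sketch of Hamming single-error correction for 8 data bits: a 12-bit codeword with parity bits at positions 1, 2, 4, and 8, following the usual Hamming construction. The helper names and the test in main are illustrative.

#include <stdint.h>
#include <stdio.h>

/* Hamming SEC for 8 data bits: codeword bit positions 1..12,
   parity bits at positions 1, 2, 4, 8 (even parity). Illustrative sketch. */

static uint16_t hamming_encode(uint8_t data) {
    uint16_t code = 0;
    int d = 0;
    /* Place data bits in the non-power-of-two positions 3,5,6,7,9,10,11,12. */
    for (int pos = 1; pos <= 12; pos++) {
        if (pos & (pos - 1)) {                 /* not a power of two */
            if (data & (1 << d)) code |= 1 << pos;
            d++;
        }
    }
    /* Parity bit p covers every position whose index includes bit p. */
    for (int p = 1; p <= 8; p <<= 1) {
        int parity = 0;
        for (int pos = 1; pos <= 12; pos++)
            if ((pos & p) && (code & (1 << pos))) parity ^= 1;
        if (parity) code |= 1 << p;
    }
    return code;
}

/* Syndrome: 0 = no error, otherwise the position of the single flipped bit. */
static int hamming_syndrome(uint16_t code) {
    int syndrome = 0;
    for (int pos = 1; pos <= 12; pos++)
        if (code & (1 << pos)) syndrome ^= pos;
    return syndrome;
}

int main(void) {
    uint16_t c = hamming_encode(0x5A);
    c ^= 1 << 7;                                      /* inject a single-bit error at position 7 */
    printf("syndrome = %d\n", hamming_syndrome(c));   /* prints 7: correct by flipping that bit  */
    return 0;
}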
Virtual Machines
Examples
Virtual Memory
Address Translation
Page Tables
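A sketch of the address-translation step a page table performs: split the virtual address into virtual page number and page offset, look up the physical page number, and recombine. The 4KB page size and the flat one-level PTE layout below are assumptions of the sketch.

#include <stdint.h>
#include <stdbool.h>

#define PAGE_BITS 12u                  /* 4 KB pages (assumed)             */
#define NUM_PAGES (1u << 20)           /* 32-bit virtual address space     */

/* Minimal page-table entry: valid bit plus physical page number (assumed layout). */
typedef struct { bool valid; uint32_t ppn; } pte_t;

static pte_t page_table[NUM_PAGES];    /* flat one-level table, for illustration */

/* Translate a virtual address; returns false on a page fault (invalid PTE). */
bool translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn    = vaddr >> PAGE_BITS;               /* virtual page number     */
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);  /* page offset, unchanged  */
    if (!page_table[vpn].valid)
        return false;                                   /* page fault: OS must fetch the page */
    *paddr = (page_table[vpn].ppn << PAGE_BITS) | offset;
    return true;
}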
TLB Misses
If the page is in memory: load the PTE from memory and retry; this can be handled in hardware or in software (raise a special exception with an optimized handler)
If the page is not in memory (page fault): the OS fetches the page, updates the page table, and restarts the faulting instruction
TLB and cache interaction: if the cache tag uses the physical address, we need to translate before the cache lookup
Complications due to aliasing: different virtual addresses for a shared physical address
Memory Protection
Block placement
Finding a block
Replacement on a miss
Write policy
Block Placement
Determined by associativity:
Direct mapped (1-way associative): one choice for placement
n-way set associative: n choices within a set
Fully associative: any location
Finding a Block

Associativity          | Location method                            | Tag comparisons
Direct mapped          | Index                                      | 1
n-way set associative  | Set index, then search entries in the set  | n
Fully associative      | Search all entries                         | #entries
Fully associative      | Full lookup table                          | 0

Hardware caches: reduce comparisons to reduce cost
Virtual memory: a full lookup table (the page table) makes full associativity feasible
Replacement
Choice of entry to replace on a miss: least recently used (LRU) or Random
Virtual memory: LRU approximation with hardware support
Write Policy
Write-through: update both the upper and lower levels
Write-back: update the upper level only; update the lower level when the block is replaced
Virtual memory: only write-back is feasible, given the latency of disk writes
Sources of Misses
Compulsory (cold-start) misses, capacity misses (finite cache size), and conflict misses
Cache design trade-offs (effect on miss rate vs. negative performance effect):
Increase cache size: decreases capacity misses, but may increase access time
Increase associativity: decreases conflict misses, but may increase access time
Cache Control
Example cache: 32-bit byte addresses, 16-byte blocks, 1024 blocks
Address layout: Tag = bits 31-14 (18 bits), Index = bits 13-4 (10 bits), Offset = bits 3-0 (4 bits)
Interface Signals
CPU to cache: Read/Write, Valid, Address (32 bits), Write Data (32 bits), Read Data (32 bits), Ready
Cache to memory: Read/Write, Valid, Address (32 bits), Write Data (128 bits), Read Data (128 bits), Ready
Multiple cycles per access
Use an FSM to sequence the control steps: a set of states, with a transition on each clock edge
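A sketch of the finite-state-machine idea for a simple blocking, write-back cache controller. The four states (Idle, Compare Tag, Write-Back, Allocate) follow the controller described in the text; the input signal names and struct are illustrative.

#include <stdbool.h>

/* States of a simple blocking write-back cache controller. */
typedef enum { IDLE, COMPARE_TAG, WRITE_BACK, ALLOCATE } cache_state_t;

typedef struct {
    bool cpu_valid;      /* CPU is presenting a request            */
    bool hit;            /* tag match and valid bit set            */
    bool dirty;          /* victim block has been modified         */
    bool mem_ready;      /* memory has finished the current access */
} cache_inputs_t;

/* One state transition per clock edge. */
cache_state_t next_state(cache_state_t s, cache_inputs_t in) {
    switch (s) {
    case IDLE:                                  /* wait for a valid CPU request   */
        return in.cpu_valid ? COMPARE_TAG : IDLE;
    case COMPARE_TAG:                           /* hit: done; miss: handle victim */
        if (in.hit) return IDLE;
        return in.dirty ? WRITE_BACK : ALLOCATE;
    case WRITE_BACK:                            /* write dirty victim to memory   */
        return in.mem_ready ? ALLOCATE : WRITE_BACK;
    case ALLOCATE:                              /* fetch new block from memory    */
        return in.mem_ready ? COMPARE_TAG : ALLOCATE;
    }
    return IDLE;
}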
Cache Coherence Problem
Suppose two CPU cores share a physical address space, with write-through caches:

Time step | Event               | CPU A's cache | CPU B's cache | Memory
0         |                     |               |               | 0
1         | CPU A reads X       | 0             |               | 0
2         | CPU B reads X       | 0             | 0             | 0
3         | CPU A writes 1 to X | 1             | 0             | 1
Coherence Defined
If P1 writes X and P2 writes X, all processors see the writes in the same order
Cache coherence protocols: snooping protocols and directory-based protocols
Example: invalidating snooping protocol (write-back caches)

CPU activity        | Bus activity     | CPU A's cache | CPU B's cache | Memory
                    |                  |               |               | 0
CPU A reads X       | Cache miss for X | 0             |               | 0
CPU B reads X       | Cache miss for X | 0             | 0             | 0
CPU A writes 1 to X | Invalidate for X | 1             |               | 0
CPU B reads X       | Cache miss for X | 1             | 1             | 1
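A toy sketch of the invalidate-on-write idea in the table above: two private caches snoop a shared "bus"; a write by one CPU invalidates the other's copy, so the other CPU's next read misses and picks up the new value. The data structures and names are illustrative, and memory is updated on every write for simplicity; this is not a full MSI protocol.

#include <stdbool.h>
#include <stdio.h>

/* Toy model: one shared location X, two private caches, invalidate-on-write snooping. */
typedef struct { bool valid; int value; } line_t;

static int    memory_x = 0;
static line_t cache[2] = {{false, 0}, {false, 0}};

static int cpu_read(int cpu) {
    if (!cache[cpu].valid) {                   /* miss: fetch from memory          */
        cache[cpu].valid = true;
        cache[cpu].value = memory_x;
    }
    return cache[cpu].value;
}

static void cpu_write(int cpu, int v) {
    cache[1 - cpu].valid = false;              /* snoop: invalidate the other copy */
    cache[cpu].valid = true;
    cache[cpu].value = v;
    memory_x = v;                              /* written through for simplicity   */
}

int main(void) {
    printf("A reads %d\n", cpu_read(0));       /* 0                                 */
    printf("B reads %d\n", cpu_read(1));       /* 0                                 */
    cpu_write(0, 1);                           /* "invalidate for X" on the bus     */
    printf("B reads %d\n", cpu_read(1));       /* 1: B missed and re-fetched        */
    return 0;
}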
Memory Consistency
Assumptions
Consequence
Data prefetching
DGEMM
Pitfalls
Concluding Remarks
Principle of locality
Memory hierarchy