Memory Hierarchies: Forecast - Memory (B5) - Motivation for Memory Hierarchy - Cache - ECC - Virtual Memory
Forecast
• Memory (B5)
• Motivation for memory hierarchy
• Cache
• ECC
• Virtual memory
Background
Mem Element   Size     Speed       Price/MB
Register      small    1-5 ns      high ??
SRAM          medium   5-25 ns     $??
DRAM          large    60-120 ns   $1
Disk          large    10-20 ms    $0.20
Register File
• 32 FFs in parallel => one register
• 16 registers
SRAM interface
• today - 2M x 8 in 5-15 ns
DRAM
Dense memory
• 1-T cell (one transistor + capacitor per bit)
• forgets data on read (destructive read) and after a while (leakage) => must be refreshed
• e.g., 16M x 1 in 4k x 4k array
• 24 address bits - 12 for row and 12 for column
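A minimal C sketch of that row/column split for the 16M x 1 part; which half of the address drives the row strobe is an illustrative assumption, not a datasheet fact:

    #include <stdint.h>
    #include <stdio.h>

    /* Split a 24-bit DRAM address into the 12-bit row and 12-bit column that
       are strobed in turn over the multiplexed address pins (RAS, then CAS). */
    void dram_split(uint32_t addr24, uint16_t *row, uint16_t *col)
    {
        *row = (addr24 >> 12) & 0xFFF;  /* upper 12 bits: one of 4K rows    */
        *col = addr24 & 0xFFF;          /* lower 12 bits: one of 4K columns */
    }

    int main(void)
    {
        uint16_t row, col;
        dram_split(0x123456, &row, &col);                   /* 24-bit address   */
        printf("row = 0x%03x, col = 0x%03x\n", row, col);   /* row=123, col=456 */
        return 0;
    }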
Implementation
Incompatible requirements (large, fast, and cheap at once) => build a hierarchy of levels
[Figure: levels L1, L2, L3, ..., Ln stacked; capacity grows toward Ln (the largest, slowest, cheapest memory) and speed grows toward L1]
Memory Hierarchy
Type          Size       Speed (ns)
Register      < 1 KB     0.5
L1 Cache      < 128 KB   1
L2 Cache      < 16 MB    20
Main memory   < 4 GB     100
Disk          > 10 GB    10 x 10^6
Memory Hierarchy
Main memory <-> Disk: managed by
• program - explicit I/O
• operating system - virtual memory
  • illusion of larger memory
  • protection
  • transparent to user
Main Memory
Cache
put a memory block in a cache “block frame”, which holds
• state (e.g., valid)
• address tag
• data
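As a sketch, the block frame can be pictured as a small C struct; the field widths follow the 16-bit-tag, one-word-block organization discussed below, and the names are illustrative:

    #include <stdbool.h>
    #include <stdint.h>

    /* One block frame: state + address tag + data. */
    struct block_frame {
        bool     valid;  /* state: does this frame hold a live block?      */
        uint16_t tag;    /* address tag: which memory block is cached here */
        uint32_t data;   /* the cached data (one 32-bit word per block)    */
    };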
Cache Example
Memory words:
0x11c 0xe0e0e0e0
0x120 0xffffffff
0x124 0x00000001
0x128 0x00000007
0x12c 0x00000003
0x130 0xabababab
lw $4, 0x128
Cache Example
Return 0x7 to CPU to put in $4
lw $5, 0x124
tavg = tcache + miss rate × tmemory = 1 + 0.01 × 20 = 1.2 cycles
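A tiny C helper for that calculation, using the numbers read off the example (1-cycle hit time, 1% miss rate, 20-cycle miss penalty):

    #include <stdio.h>

    /* Average access time: tavg = tcache + miss rate * tmemory. */
    double tavg(double t_cache, double miss_rate, double t_memory)
    {
        return t_cache + miss_rate * t_memory;
    }

    int main(void)
    {
        /* Numbers from the example above: 1-cycle hit, 1% misses, 20-cycle penalty. */
        printf("tavg = %.2f cycles\n", tavg(1.0, 0.01, 20.0));  /* prints 1.20 */
        return 0;
    }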
Cache
4 questions
• Where is block placed?
• How is block found?
• Which block is replaced?
• What happens on a write?
• index - 14 bits => 16K entries
• tag - 16 bits
[Figure: 16K-entry direct-mapped cache; each entry holds a 16-bit tag and a 32-bit data word]
Consider
• hit & miss
• place & replace
[Figure: direct-mapped cache with four-word (128-bit) blocks; the address splits into a 16-bit tag, 12-bit index, 2-bit block offset, and byte offset; each of the 4K entries holds a valid bit, a 16-bit tag, and four 32-bit words; a 4-to-1 multiplexor selects the requested word, and the tag comparison produces Hit and Data]
[Figure: 256-entry direct-mapped cache with one-word blocks; 22-bit tag, 8-bit index; each entry holds a valid bit, a 22-bit tag, and a 32-bit data word]
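A minimal C sketch of the lookup in the four-word-block organization above; the structure and function names are illustrative, and miss handling is left as a stub:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_ENTRIES     4096  /* 12-bit index                      */
    #define WORDS_PER_BLOCK 4     /* 2-bit block offset, 128-bit block */

    /* One entry of the 4K-entry, four-word-block cache in the figure. */
    struct cache_entry {
        bool     valid;
        uint16_t tag;                    /* 16-bit tag        */
        uint32_t data[WORDS_PER_BLOCK];  /* four 32-bit words */
    };

    static struct cache_entry cache[NUM_ENTRIES];

    /* Look up a 32-bit byte address; returns true on a hit and sets *word. */
    bool cache_lookup(uint32_t addr, uint32_t *word)
    {
        uint32_t block_off = (addr >> 2) & 0x3;       /* bits 3..2:  word within block */
        uint32_t index     = (addr >> 4) & 0xFFF;     /* bits 15..4: which entry       */
        uint16_t tag       = (uint16_t)(addr >> 16);  /* bits 31..16: tag              */

        struct cache_entry *e = &cache[index];
        if (e->valid && e->tag == tag) {
            *word = e->data[block_off];   /* the 4-to-1 multiplexor in the figure */
            return true;                  /* hit */
        }
        return false;                     /* miss: fetch the block from memory */
    }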
Cache Design
Q4. What happens on a write?
• write hit must be slower (the tag must be checked before the data can be written)
• propagate to memory?
• immediately - write-through
• on replacement - write-back
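A minimal C sketch contrasting the two policies above for single-word blocks; memory_write is an illustrative stub, not a real interface:

    #include <stdbool.h>
    #include <stdint.h>

    struct frame {
        bool     valid;
        bool     dirty;   /* used only by write-back */
        uint32_t tag;
        uint32_t data;    /* single-word block */
    };

    /* Illustrative stub standing in for a real memory interface. */
    void memory_write(uint32_t addr, uint32_t value) { (void)addr; (void)value; }

    /* Write-through: on a write hit, update the cache and memory immediately. */
    void write_through_hit(struct frame *f, uint32_t addr, uint32_t value)
    {
        f->data = value;
        memory_write(addr, value);   /* propagate right away */
    }

    /* Write-back: on a write hit, update only the cache and mark it dirty;
       memory is updated later, when the block is replaced. */
    void write_back_hit(struct frame *f, uint32_t value)
    {
        f->data  = value;
        f->dirty = true;             /* propagate on replacement */
    }

    /* On replacement under write-back, a dirty block must go to memory first. */
    void write_back_evict(struct frame *f, uint32_t block_addr)
    {
        if (f->valid && f->dirty)
            memory_write(block_addr, f->data);
        f->valid = false;
        f->dirty = false;
    }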
Reduce conflicts
• more associativity
• may increase cache hit time
Cache Design
Unified vs. split instruction and data cache
Example
• consider building 16K I and D cache
• or a 32K unified cache
• let tcache be 1 cycle and tmemory be 10 cycles
Unified cache
• tavg = 1 + 0.04 × 10 = 1.4 WRONG!
• ignores the structural hazard: a pipelined processor fetches an instruction and accesses data in the same cycle, and a single unified cache can serve only one of them
• tavg = 1.4 + cycles-lost-to-interference
• will cycles-lost-to-interference be < 0.1?
• NOT for modern pipelined processors!
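A small C sketch of the comparison: the unified numbers (4% miss rate, 10-cycle tmemory) come from the example above, while the split-cache miss rate and the interference term are placeholders to be measured, not values from these slides:

    #include <stdio.h>

    /* Same formula as the earlier example: tavg = tcache + miss rate * tmemory. */
    double tavg(double t_cache, double miss_rate, double t_memory)
    {
        return t_cache + miss_rate * t_memory;
    }

    int main(void)
    {
        double t_cache = 1.0, t_memory = 10.0;

        /* Unified 32K cache, 4% miss rate as on the slide -- this ignores the
           structural hazard between instruction fetch and data access.        */
        double unified_naive = tavg(t_cache, 0.04, t_memory);

        /* Placeholders, NOT from the slides: measured interference for the
           unified cache and a hypothetical miss rate for the split caches.    */
        double cycles_lost_to_interference = 0.0;   /* to be measured */
        double split_miss_rate             = 0.05;  /* hypothetical   */

        double unified = unified_naive + cycles_lost_to_interference;
        double split   = tavg(t_cache, split_miss_rate, t_memory);

        printf("unified (naive) = %.2f  unified = %.2f  split = %.2f\n",
               unified_naive, unified, split);
        return 0;
    }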
Cache Design
Multi-level caches
Many systems today have a cache hierarchy
E.g.,
• 16K I-cache
• 16K D-cache
• 1M L2-cache
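With a second level, an L1 miss pays the L2 access time and only L2 misses go all the way to memory. A sketch of the two-level tavg calculation; all rates and latencies here are illustrative placeholders, not figures from these slides:

    #include <stdio.h>

    /* Two-level average access time:
       tavg = tL1 + missL1 * (tL2 + missL2 * tmemory)
       An L1 miss pays the L2 access; only L2 misses go all the way to memory. */
    double tavg_two_level(double t_l1, double miss_l1,
                          double t_l2, double miss_l2, double t_memory)
    {
        return t_l1 + miss_l1 * (t_l2 + miss_l2 * t_memory);
    }

    int main(void)
    {
        /* Illustrative parameters for a 16K L1 / 1M L2 style hierarchy;
           the slides do not give the actual rates and latencies.        */
        printf("tavg = %.2f cycles\n",
               tavg_two_level(1.0, 0.05, 20.0, 0.10, 100.0));  /* 2.50 */
        return 0;
    }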