The Memory Hierarchy
Today
Storage technologies and trends
Let it wash over you
Locality of reference
Caching in the memory hierarchy
Main Memory = DRAM
Random-Access Memory (RAM)
Key features
RAM is traditionally packaged as a chip.
Basic storage unit is normally a cell (one bit per cell).
Multiple RAM chips form a memory.
Static RAM (SRAM)
Each cell stores a bit with a four or six-transistor circuit.
Retains value indefinitely, as long as it is kept powered.
Relatively insensitive to electrical noise (EMI), radiation, etc.
Faster and more expensive than DRAM.
Dynamic RAM (DRAM)
Each cell stores a bit with a capacitor; one transistor is used for access.
Value must be refreshed every 10-100 ms.
More sensitive to disturbances (EMI, radiation,…) than SRAM.
Slower and cheaper than SRAM.
SRAM vs DRAM Summary
The Memory Bottleneck
Typical CPU clock rate: 1 GHz (1ns cycle time)
Typical DRAM access time: 30ns (about 30 cycles)
Typical main memory access: 100ns (100 cycles)
DRAM (30), precharge (10), chip crossings (30), overhead (30).
Our pipeline designs assume 1 cycle access (1ns)
Average instruction references: 1 instruction word, 0.3 data words
This problem gets worse
CPUs get faster
Memories get bigger
Memory delay is mostly communication time
Reading/writing a bit is fast
It takes time to select the right bit and route the data to/from the bit
Big memories are slow
Small memories can be made fast
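The figures on this slide can be combined into a rough estimate of how far an uncached memory system falls from the one-cycle assumption. The sketch below is only illustrative: the function name is hypothetical, and it assumes every reference pays the full main-memory access time with no cache and no overlap.

```c
/* Memory cycles spent per instruction when every reference goes
 * straight to main memory (assumed: no cache, no overlapping accesses). */
double mem_cycles_per_instr(double instr_refs, double data_refs,
                            double access_cycles)
{
    return (instr_refs + data_refs) * access_cycles;
}
```

With 1 instruction word and 0.3 data words per instruction and a 100-cycle access, `mem_cycles_per_instr(1.0, 0.3, 100.0)` comes to about 130 cycles per instruction, against 1.3 cycles if each access really took one cycle.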
Enhanced DRAMs
Basic DRAM cell has not changed since its invention in 1966.
Commercialized by Intel in 1970.
DRAM cores with better interface logic and faster I/O:
Synchronous DRAM (SDRAM)
Uses a conventional clock signal instead of asynchronous control
Allows reuse of the row addresses (e.g., RAS, CAS, CAS, CAS)
[Figure: CPU chip (register file, ALU, bus interface), I/O bridge, main memory]
Memory Read Transaction (1)
CPU places address A on the memory bus.
Load operation: movl A, %eax
[Figure: address A travels from the bus interface through the I/O bridge to main memory, which holds word x at address A]
Memory Read Transaction (2)
Main memory reads A from the memory bus, retrieves
word x, and places it on the bus.
Load operation: movl A, %eax
[Figure: word x travels from main memory (address A) through the I/O bridge to the bus interface]
Memory Read Transaction (3)
CPU reads word x from the bus and copies it into register
%eax.
Load operation: movl A, %eax
[Figure: register %eax now holds x; main memory still holds word x at address A]
Memory Write Transaction (1)
CPU places address A on bus. Main memory reads it and
waits for the corresponding data word to arrive.
Store operation: movl %eax, A
[Figure: register %eax holds y; address A travels from the bus interface through the I/O bridge to main memory]
Memory Write Transaction (2)
CPU places data word y on the bus.
Store operation: movl %eax, A
[Figure: data word y travels from register %eax through the bus interface and I/O bridge toward main memory address A]
Memory Write Transaction (3)
Main memory reads data word y from the bus and stores
it at address A.
Store operation: movl %eax, A
[Figure: register %eax holds y; main memory now holds word y at address A]
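The three steps of each transaction can be mimicked in a few lines of C. This is a toy model rather than a description of real bus hardware: the `struct bus`, the `MEM_WORDS` size, the function names, and the use of word indices as addresses are all assumptions made for illustration, and callers must keep `a` below `MEM_WORDS`.

```c
#include <stdint.h>

#define MEM_WORDS 16    /* hypothetical tiny main memory, word-addressed */

/* Shared address and data lines (illustrative only). */
struct bus { uint32_t addr; uint32_t data; };

static uint32_t main_memory[MEM_WORDS];

/* Load: movl A, %eax */
uint32_t bus_load(struct bus *b, uint32_t a)
{
    b->addr = a;                      /* (1) CPU places address A on the bus    */
    b->data = main_memory[b->addr];   /* (2) memory places word x on the bus    */
    return b->data;                   /* (3) CPU copies x from the bus to %eax  */
}

/* Store: movl %eax, A */
void bus_store(struct bus *b, uint32_t a, uint32_t eax)
{
    b->addr = a;                      /* (1) CPU places address A on the bus    */
    b->data = eax;                    /* (2) CPU places data word y on the bus  */
    main_memory[b->addr] = b->data;   /* (3) memory stores y at address A       */
}
```

Storing a value with `bus_store` and then calling `bus_load` on the same address returns that value, which is the round trip the six slides walk through.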
What’s Inside A Disk Drive?
[Figure: disk drive with labeled arm, spindle, platters, actuator, SCSI connector, and electronics (including a processor and memory!)]
Image courtesy of Seagate Technology
Disk Geometry
Disks consist of platters, each with two surfaces.
Each surface consists of concentric rings called tracks.
Each track consists of sectors separated by gaps.
[Figure: one surface around the spindle, showing tracks, track k, sectors, and gaps]
I/O Bus
[Figure: CPU chip (register file, ALU, bus interface), system bus, I/O bridge, memory bus, main memory, and I/O bus]
Sequential read tput    250 MB/s    Sequential write tput    170 MB/s
Random read tput        140 MB/s    Random write tput        14 MB/s
Random read access      30 us       Random write access      300 us
SSD Tradeoffs vs Rotating Disks
Advantages
No moving parts: faster, less power, more rugged
Disadvantages
Have the potential to wear out
Mitigated by “wear leveling logic” in flash translation layer
E.g. Intel X25 guarantees 1 petabyte (10^15 bytes) of random
writes before it wears out
In 2010, about 100 times more expensive per byte
Applications
MP3 players, smart phones, laptops
Beginning to appear in desktops and servers
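To get a feel for what a 1 petabyte write guarantee means, one can ask how long it would take to write that much data. The sketch below is a back-of-the-envelope calculation only. The function name is hypothetical, and pairing the guarantee with the 14 MB/s random write throughput quoted earlier assumes both figures describe the same drive, which the slides do not state.

```c
/* Seconds of continuous writing needed to use up a write budget
 * (assumes a constant write rate and nothing else limiting lifetime). */
double seconds_to_wear_out(double budget_bytes, double write_bytes_per_sec)
{
    return budget_bytes / write_bytes_per_sec;
}
```

`seconds_to_wear_out(1e15, 14e6)` is about 7.1e7 seconds, roughly 827 days of nonstop random writes.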
Storage Trends
SRAM
Metric               1980     1985    1990   1995   2000   2005     2010     2010:1980
$/MB                 19,200   2,900   320    256    100    75       60       320
access (ns)          300      150     35     15     3      2        1.5      200

DRAM
Metric               1980     1985    1990   1995   2000   2005     2010     2010:1980
$/MB                 8,000    880     100    30     1      0.1      0.06     130,000
access (ns)          375      200     100    70     60     50       40       9
typical size (MB)    0.064    0.256   4      16     64     2,000    8,000    125,000

Disk
Metric               1980     1985    1990   1995   2000   2005     2010     2010:1980
$/MB                 500      100     8      0.30   0.01   0.005    0.0003   1,600,000
access (ms)          87       75      28     10     8      4        3        29
CPU Clock Rates
Inflection point in computer history when designers hit the “Power Wall”
Clock rate (MHz)             1      20   150   600   3300   2000   2500   2500
Cycle time (ns)              1000   50   6     1.6   0.3    0.50   0.4    2500
Cores                        1      1    1     1     1      2      4      4
Effective cycle time (ns)    1000   50   6     1.6   0.3    0.25   0.1    10,000
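The effective cycle time figures are consistent with dividing the cycle time by the number of cores. A minimal C sketch of that arithmetic follows; the function name is hypothetical, and the model assumes every core is kept busy.

```c
/* Effective cycle time in ns: the cycle time shared across cores
 * (assumes all cores do useful work in parallel). */
double effective_cycle_ns(double clock_mhz, int cores)
{
    double cycle_ns = 1000.0 / clock_mhz;   /* 1 MHz corresponds to 1000 ns */
    return cycle_ns / cores;
}
```

For the 2000 MHz, 2-core entry this gives 0.25 ns, and for the 2500 MHz, 4-core entry it gives 0.1 ns, matching the table.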
The CPU-Memory Gap
The gap widens between DRAM, disk, and CPU speeds.
[Figure: log-scale plot against year (1980 to 2010) with curves for disk, SSD, DRAM, CPU cycle time, and effective CPU cycle time]
The Memory Hierarchy
                          Latency    Bandwidth
CPU Chip
Registers    < 1KB        1 cyc      3-10 words/cycle    compiler managed
L1 Cache     16KB-1MB     1-3 cy     1-2 words/cycle     hardware managed
Tape
Locality to the Rescue!
Today
Storage technologies and trends
Locality of reference
Caching in the memory hierarchy
Locality
Principle of Locality: Programs tend to use data and
instructions with addresses near or equal to those they
have used recently
Temporal locality:
Recently referenced items are likely
to be referenced again in the near future
Spatial locality:
Items with nearby addresses tend
to be referenced close together in time
Locality Example
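int sum_array(const int *a, int n)
{
    int i, sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];    /* a[i] is read with stride 1; sum is reused each time */
    return sum;
}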
Data references
Reference array elements in succession (stride-1 reference pattern): spatial locality
Reference variable sum each iteration: temporal locality
Instruction references
Reference instructions in sequence: spatial locality
Cycle through loop repeatedly: temporal locality
Qualitative Estimates of Locality
Claim: Being able to look at code and get a qualitative
sense of its locality is a key skill for a professional
programmer.
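One way to practice this skill is to compare two loops that compute the same result with different access patterns. The pair below is a sketch, not taken from the slides: the function names and the dimensions `M` and `N` are illustrative assumptions. It relies on the fact that C stores two-dimensional arrays in row-major order.

```c
#define M 4   /* hypothetical dimensions */
#define N 8

/* Row-wise scan: the inner loop varies j, so consecutive accesses are
 * adjacent in memory (stride-1). Good spatial locality. */
int sum_array_rows(int a[M][N])
{
    int sum = 0;
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Column-wise scan: the inner loop varies i, so consecutive accesses are
 * N elements apart (stride-N). Poor spatial locality for large N. */
int sum_array_cols(int a[M][N])
{
    int sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}
```

Both functions return the same sum; only the order in which memory is touched differs.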
Locality Example
Question: Can you permute the loops so that the function
scans the 3-d array a with a stride-1 reference pattern
(and thus has good spatial locality)?
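The function this question refers to did not survive in these notes, so the sketch below assumes one plausible shape for it: a triple loop whose innermost loop varies the first array index. The names, the dimension `D`, and the original loop order are all hypothetical. The permuted version shows the general rule: make the rightmost index vary fastest.

```c
#define D 4   /* hypothetical dimension */

/* Assumed starting point: the innermost loop varies the first index k,
 * so consecutive accesses are D*D elements apart. */
int sum_array_3d_strided(int a[D][D][D])
{
    int sum = 0;
    for (int i = 0; i < D; i++)
        for (int j = 0; j < D; j++)
            for (int k = 0; k < D; k++)
                sum += a[k][i][j];
    return sum;
}

/* Permuted loops: k outermost, j innermost, so a[k][i][j] is scanned
 * in the order it is laid out in memory (stride-1). */
int sum_array_3d_stride1(int a[D][D][D])
{
    int sum = 0;
    for (int k = 0; k < D; k++)
        for (int i = 0; i < D; i++)
            for (int j = 0; j < D; j++)
                sum += a[k][i][j];
    return sum;
}
```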
Memory Hierarchies
Some fundamental and enduring properties of hardware
and software:
Fast storage technologies cost more per byte, have less capacity,
and require more power (heat!).
The gap between CPU and main memory speed is widening.
Well-written programs tend to exhibit good locality.
Today
Storage technologies and trends
Locality of reference
Caching in the memory hierarchy
An Example Memory Hierarchy
Smaller, faster, costlier per byte toward the top; larger, slower, cheaper per byte toward the bottom.
L0: Registers. CPU registers hold words retrieved from L1 cache.
L1: L1 cache (SRAM). L1 cache holds cache lines retrieved from L2 cache.
L2: L2 cache (SRAM). L2 cache holds cache lines retrieved from main memory.
L3: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L4: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
Caches
Cache: A smaller, faster storage device that acts as a staging
area for a subset of the data in a larger, slower device.
Fundamental idea of a memory hierarchy:
For each k, the faster, smaller device at level k serves as a cache for the
larger, slower device at level k+1.
Why do memory hierarchies work?
Because of locality, programs tend to access the data at level k more
often than they access the data at level k+1.
Thus, the storage at level k+1 can be slower, and thus larger and cheaper
per bit.
Big Idea: The memory hierarchy creates a large pool of
storage that costs as much as the cheap storage near the
bottom, but that serves data to programs at the rate of the
fast storage near the top.
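The Big Idea can be made concrete with the usual average access time estimate for one cache level sitting in front of a slower level. This is a sketch under stated assumptions: the function name is hypothetical, and the example numbers (1-cycle hit, 100-cycle miss penalty, 3% miss rate) are illustrative, not measurements from the lecture.

```c
/* Average access time in cycles: every access pays the hit time, and the
 * fraction that misses additionally pays the penalty of going one level down. */
double avg_access_cycles(double hit_cycles, double miss_rate,
                         double miss_penalty_cycles)
{
    return hit_cycles + miss_rate * miss_penalty_cycles;
}
```

With those example numbers, `avg_access_cycles(1.0, 0.03, 100.0)` is about 4 cycles: close to the speed of the fast level even though most of the data lives in the slow one.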
General Cache Concepts
[Figure: memory viewed as numbered blocks (blocks 4 through 15 visible)]
General Cache Concepts: Hit
Request: 14
Data in block b is needed
Block b is in cache: Hit!
Cache:   8  9 14  3
Memory:  0  1  2  3
         4  5  6  7
         8  9 10 11
        12 13 14 15
General Cache Concepts: Miss
Request: 12
Data in block b is needed
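The hit and miss cases can be traced with a small model of the four-block cache from the hit example (blocks 8, 9, 14, 3). This is a sketch, not the lecture's own code: the function name is hypothetical, and the placement rule (block number mod 4) is an assumption, chosen because it is consistent with where those four blocks sit.

```c
#include <stdbool.h>

#define NSLOTS 4

/* Cache contents from the hit example: blocks 8, 9, 14, 3. */
static int cache[NSLOTS] = { 8, 9, 14, 3 };

/* Returns true on a hit. On a miss, the block is fetched from memory and
 * overwrites whatever occupied its slot (assumed placement: block % NSLOTS). */
bool access_block(int block)
{
    int slot = block % NSLOTS;
    if (cache[slot] == block)
        return true;            /* hit: block already in cache */
    cache[slot] = block;        /* miss: fetch block, evict the old one */
    return false;
}
```

Under this model a request for block 14 hits, while a request for block 12 misses and replaces block 8; a second request for 12 then hits.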
General Caching Concepts: Types of Cache Misses
Cold (compulsory) miss
Cold misses occur because the line has never been touched.
A cache whose size equals memory takes only cold misses.
Conflict miss
Conflict misses occur when the level k cache is large enough, but multiple
data objects all map to the same level k block.
They arise from limited associativity and non-optimal replacement.
Misses that would not occur in a fully associative cache with an optimal
replacement policy are conflict misses.
Capacity miss
Occurs when the set of active cache blocks (working set) is larger than
the cache.
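The difference between cold and conflict misses shows up clearly when counting misses over a reference trace. The sketch below assumes a direct-mapped cache with four slots in which block b may live only in slot b mod 4. That placement rule, the function name, and the traces are illustrative assumptions, and block numbers are taken to be non-negative.

```c
#define NSLOTS 4

/* Counts misses for a trace of block numbers in a direct-mapped cache
 * (assumed placement: block % NSLOTS). The cache starts empty. */
int count_misses(const int *trace, int n)
{
    int cache[NSLOTS] = { -1, -1, -1, -1 };   /* -1 marks an empty slot */
    int misses = 0;
    for (int i = 0; i < n; i++) {
        int slot = trace[i] % NSLOTS;
        if (cache[slot] != trace[i]) {
            cache[slot] = trace[i];           /* fetch block, evict old one */
            misses++;
        }
    }
    return misses;
}
```

The trace 0, 1, 0, 1, 0, 1 takes only 2 misses, both cold. The trace 0, 8, 0, 8, 0, 8 takes 6: two cold misses followed by four conflict misses, because blocks 0 and 8 compete for slot 0 even though three slots stay empty.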
Examples of Caching in the Hierarchy
Cache Type    What is Cached?    Where is it Cached?    Latency (cycles)    Managed By
Web cache     Web pages          Remote server disks    1,000,000,000       Web proxy server
Summary
The speed gap between CPU, memory and mass storage
continues to widen.