LEC07 Memory Hierarchy

Uploaded by 鄔浚偉

Lecture 7

Memory Hierarchy

Acknowledgement: These slides are based on the textbook
Computer Systems: A Programmer’s Perspective and its slides.
Outline
• Storage technologies and trends
• Caching in the memory hierarchy
• Locality of reference

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition


Random-Access Memory (RAM)
• Key features
  • RAM is usually packaged as a chip
  • Basic storage unit: a cell (one bit per cell)
  • Multiple RAM chips form a memory
• RAM comes in two varieties:
  • SRAM (Static RAM)
  • DRAM (Dynamic RAM)

        Transistors   Access   Needs      Cost   Applications
        per bit       time     refresh?
SRAM    4 or 6        1x       No         100x   Cache memories
DRAM    1             10x      Yes        1x     Main memories

Nonvolatile Memories
• DRAM and SRAM are volatile memories
  • They lose information if powered off
• Nonvolatile memories retain their values even if powered off
  • Read-only memory (ROM): programmed during production
  • Programmable ROM (PROM): can be programmed once
  • Erasable PROM (EPROM): can be erased and reprogrammed on the order of 1,000 times
  • Electrically erasable PROM (EEPROM): can be erased on the order of 100,000 times
  • Flash memory: EEPROMs with partial (block-level) erase capability
• Uses for nonvolatile memories
  • Firmware programs stored in a ROM (BIOS, controllers for disks, network cards, graphics accelerators, security subsystems, ...)
  • Solid state disks (thumb drives, smart phones, MP3 players, tablets, laptops, ...)
  • Disk caches

What Is a Bus?
• A bus is a collection of parallel wires that carry address, data, and control signals
• Buses are typically shared by multiple devices (CPU, memory, I/O devices)

[Figure: the CPU chip (register file, ALU, bus interface) connects over the system bus to the I/O bridge, which connects over the memory bus to main memory]



Memory Read Transaction (1)
• CPU places address A on the memory bus.
Load operation: movq A, %rax
[Figure: address A travels from the CPU's bus interface through the I/O bridge to main memory, which holds word x at address A]



Memory Read Transaction (2)
• Main memory reads A from the memory bus, retrieves word x, and places it on the bus.
Load operation: movq A, %rax
[Figure: word x travels from main memory through the I/O bridge to the CPU's bus interface]



Memory Read Transaction (3)
• CPU reads word x from the bus and copies it into register %rax.
Load operation: movq A, %rax
[Figure: register %rax now holds x]



Memory Write Transaction (1)
• CPU places address A on the bus.
• Main memory reads it and waits for the corresponding data word to arrive.
Store operation: movq %rax, A
[Figure: register %rax holds y; address A travels from the CPU's bus interface through the I/O bridge to main memory]



Memory Write Transaction (2)
• CPU places data word y on the bus.
Store operation: movq %rax, A
[Figure: data word y travels from the CPU's bus interface through the I/O bridge to main memory]



Memory Write Transaction (3)
• Main memory reads data word y from the bus and stores it at address A.
Store operation: movq %rax, A
[Figure: main memory now holds y at address A]



A Disk Drive
[Figure: an opened disk drive showing the arm, the platters, the SCSI connector, and the electronics (including a processor and memory!). Image courtesy of Seagate Technology]

Disk Geometry
• Disks consist of platters, each with two surfaces
• Each surface consists of concentric rings called tracks
• Each track consists of sectors
• Aligned tracks form a cylinder: track k on every surface together makes up cylinder k
[Figure: three platters (0, 1, 2) stacked on a common spindle, providing surfaces 0-5]



Disk Structure - top view of single platter

Surface organized into tracks

Tracks divided into sectors



Disk Operation
• The disk surface spins at a fixed rotational rate around the spindle
• The read/write head is attached to the end of the arm
• By moving radially, the arm can position the read/write head over any track
• In a disk with several platters, all read/write heads move in the same way
[Figure: single-platter and multi-platter views of the spindle, arm, and read/write heads]

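Disk Access
[Figure: four snapshots of a spinning platter, with the target sector shown in blue]
1. The head is in position above a track
2. Wait for the disk to rotate
3. About to read the blue sector
4. After reading the blue sector
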


Logical Disk Blocks
• Modern disks present a simpler abstract view of the complex sector geometry:
  • The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...)
• Mapping between logical blocks and actual (physical) sectors
  • Maintained by a device called the disk controller
  • Converts requests for logical blocks into (surface, track, sector) triples



I/O Bus
[Figure: the CPU chip (register file, ALU, bus interface) connects over the system bus to the I/O bridge; the I/O bridge connects over the memory bus to main memory and over the I/O bus to a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters]

Reading a Disk Sector (1)
• The CPU initiates a disk read by writing a command, a logical block number, and a destination memory address to a port (address) associated with the disk controller.
[Figure: the request travels from the CPU over the I/O bus to the disk controller]

Reading a Disk Sector (2)
• The disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.
[Figure: the data flows from the disk through the disk controller and the I/O bridge into main memory without passing through the CPU]

Reading a Disk Sector (3)
• When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., it asserts a special “interrupt” pin on the CPU).
[Figure: the interrupt signal travels from the disk controller to the CPU]

Solid State Disks (SSDs)
[Figure: requests to read and write logical disk blocks arrive over the I/O bus at the SSD, where a flash translation layer maps them onto flash memory organized as blocks 0 to B-1, each holding pages 0 to P-1]
• Pages: 512 B to 4 KB; blocks: 32 to 128 pages
• Data is read and written in units of pages
• A page can be written only after its block has been erased
• A block wears out after about 100,000 repeated writes

SSD Performance Characteristics
Sequential read throughput   550 MB/s    Sequential write throughput   470 MB/s
Random read throughput       365 MB/s    Random write throughput       303 MB/s
Avg sequential read time     50 µs       Avg sequential write time     60 µs
Source: Intel SSD 730 product specification

• Sequential access is faster than random access
  • A common theme in the memory hierarchy
• Random writes are somewhat slower
  • Erasing a block takes a long time (~1 ms)
  • Modifying a page requires all other pages in its block to be copied to a new block
  • In earlier SSDs, the read/write gap was much larger

SSD Tradeoffs vs Rotating Disks
• Advantages
  • No moving parts → faster, less power, more rugged
• Disadvantages
  • Have the potential to wear out
    • Mitigated by “wear leveling logic” in the flash translation layer
  • In 2015, about 30 times more expensive per byte
• Applications
  • MP3 players, smart phones, laptops
  • Recent desktops and servers



Storage Trends

SRAM
Metric              1980     1985    1990   1995   2000   2005    2010     2010:1980
$/MB                19,200   2,900   320    256    100    75      60       320
access (ns)         300      150     35     15     3      2       1.5      200

DRAM
Metric              1980     1985    1990   1995   2000   2005    2010     2010:1980
$/MB                8,000    880     100    30     1      0.1     0.06     130,000
access (ns)         375      200     100    70     60     50      40       9
typical size (MB)   0.064    0.256   4      16     64     2,000   8,000    125,000

Disk
Metric              1980     1985    1990   1995   2000   2005    2010     2010:1980
$/MB                500      100     8      0.30   0.01   0.005   0.0003   1,600,000

CPU Clock Rates
Inflection point in computer history when designers hit the “Power Wall”

                            1980   1990   1995      2000    2003   2005     2010      2010:1980
CPU                         8080   386    Pentium   P-III   P-4    Core 2   Core i7   ---
Clock rate (MHz)            1      20     150       600     3300   2000     2500      2500
Cycle time (ns)             1000   50     6         1.6     0.3    0.50     0.4       2500
Cores                       1      1      1         1       1      2        4         4
Effective cycle time (ns)   1000   50     6         1.6     0.3    0.25     0.1       10,000



The CPU-Memory Gap
The gap widens between DRAM, disk, and CPU speeds
[Figure: log-scale plot of time (ns, from 0.0 to 100,000,000.0) versus year (1985-2015) for disk seek time, SSD access time, DRAM access time, SRAM access time, CPU cycle time, and effective CPU cycle time]

Outline
• Storage technologies and trends
• Caching in the memory hierarchy
• Locality of reference



Memory Hierarchy
[Figure: a pyramid of storage levels. Toward the top: smaller, faster, and costlier (per byte) storage devices. Toward the bottom: larger, slower, and cheaper (per byte) storage devices.]
• L0: Registers. CPU registers hold words retrieved from the L1 cache.
• L1: L1 cache (SRAM). The L1 cache holds cache lines retrieved from the L2 cache.
• L2: L2 cache (SRAM). The L2 cache holds cache lines retrieved from the L3 cache.
• L3: L3 cache (SRAM). The L3 cache holds cache lines retrieved from main memory.
• L4: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
• L5: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote servers.
• L6: Remote secondary storage (e.g., Web servers).

Examples of Caching in the Memory Hierarchy
Cache type             What is cached?        Where is it cached?    Latency (cycles)   Managed by
Registers              4-8 byte words         CPU core               0                  Compiler
TLB                    Address translations   On-chip TLB            0                  Hardware MMU
L1 cache               64-byte blocks         On-chip L1             4                  Hardware
L2 cache               64-byte blocks         On-chip L2             10                 Hardware
Virtual memory         4-KB pages             Main memory            100                Hardware + OS
Buffer cache           Parts of files         Main memory            100                OS
Disk cache             Disk sectors           Disk controller        100,000            Disk firmware
Network buffer cache   Parts of files         Local disk             10,000,000         NFS client
Browser cache          Web pages              Local disk             10,000,000         Web browser
Web cache              Web pages              Remote server disks    1,000,000,000      Web proxy server

Caches
• Cache: a smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.
• Idea of a memory hierarchy:
  • For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
• Why do memory hierarchies work?
  • Because of locality, programs tend to access the data at level k more often than they access the data at level k+1.
  • Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
• Big Idea: The memory hierarchy creates a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.

General Cache Concepts
• Cache: smaller, faster, more expensive memory that caches a subset of the blocks
• Memory: larger, slower, cheaper memory, viewed as partitioned into “blocks”
• Data is copied between them in block-sized transfer units
[Figure: a small cache holding four of memory's sixteen blocks (0-15); blocks 4 and 10 are shown being copied into the cache]



General Cache Concepts: Hit
Request: 14 (data in block b is needed)
Block b is in the cache: Hit!
[Figure: the cache holds blocks 8, 9, 14, and 3; memory holds blocks 0-15]



General Cache Concepts: Miss
Request: 12 (data in block b is needed)
Block b is not in the cache: Miss!
Block b is fetched from memory
Block b is stored in the cache
• Placement policy: determines where b goes
• Replacement policy: determines which block gets evicted (the victim)
[Figure: the cache initially holds blocks 8, 9, 14, and 3; block 12 is fetched from memory and replaces block 9]



Caching Concepts: Types of Cache Misses
• Cold (compulsory) miss
  • Cold misses occur because the cache is empty: the first reference to a block always misses.
• Conflict miss
  • Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.
    • E.g., block i at level k+1 must be placed in block (i mod 4) at level k.
  • Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.
    • E.g., referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.
• Capacity miss
  • Occurs when the set of active cache blocks (the working set) is larger than the cache.



Outline
• Storage technologies and trends
• Caching in the memory hierarchy
• Locality of reference



Locality
• Principle of Locality: Programs tend to use data and instructions with addresses near or equal to those they have used recently
• Temporal locality:
  • Recently referenced items are likely to be referenced again in the near future
• Spatial locality:
  • Items with nearby addresses tend to be referenced close together in time



Locality Example
int sumvec(int a[], int n)
{
    int i, sum = 0;

    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

• Data references
  • Reference array elements in succession (stride-1 reference pattern): spatial locality
  • Reference variable sum each iteration: temporal locality
• Instruction references
  • Reference instructions in sequence: spatial locality
  • Cycle through loop repeatedly: temporal locality



Qualitative Estimates of Locality
• Key skill for a professional programmer: look at code and get a qualitative sense of its locality
• Question: Does each function below have good locality with respect to array a?

#define M 3   /* example dimensions, matching the 3 x 4 array on the next slides */
#define N 4

int sum_array_rows(int a[M][N])
{
    int i, j, sum = 0;

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

int sum_array_cols(int a[M][N])
{
    int i, j, sum = 0;

    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}



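Examples of Locality
• To compute the sum of all elements in a 2-D array
• GOOD LOCALITY: traverse row by row (a[0][0], a[0][1], a[0][2], ...), following the order of the elements in memory

$a = \begin{bmatrix} a_{00} & a_{01} & a_{02} & a_{03} \\ a_{10} & a_{11} & a_{12} & a_{13} \\ a_{20} & a_{21} & a_{22} & a_{23} \end{bmatrix}$

Element    Memory address
a[0][0]    0x00
a[0][1]    0x04
a[0][2]    0x08
a[0][3]    0x0C
a[1][0]    0x10
a[1][1]    0x14
a[1][2]    0x18
a[1][3]    0x1C
a[2][0]    0x20
a[2][1]    0x24
a[2][2]    0x28
a[2][3]    0x2C
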


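Examples of Locality
• To compute the sum of all elements in a 2-D array
• BAD LOCALITY: traverse column by column (a[0][0], a[1][0], a[2][0], a[0][1], ...), jumping 0x10 bytes between consecutive accesses within a column

$a = \begin{bmatrix} a_{00} & a_{01} & a_{02} & a_{03} \\ a_{10} & a_{11} & a_{12} & a_{13} \\ a_{20} & a_{21} & a_{22} & a_{23} \end{bmatrix}$

Element    Memory address
a[0][0]    0x00
a[0][1]    0x04
a[0][2]    0x08
a[0][3]    0x0C
a[1][0]    0x10
a[1][1]    0x14
a[1][2]    0x18
a[1][3]    0x1C
a[2][0]    0x20
a[2][1]    0x24
a[2][2]    0x28
a[2][3]    0x2C
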


To Measure Locality
• Stride: the distance, in data elements, between the memory locations of two consecutive data accesses
  • Stride-1 reference pattern: accessing the data one element after another in address order, as in the good locality example
  • Stride-k reference pattern: for example, the bad locality example generally has a stride-4 reference pattern (each access moves down one row of 4 elements)
• The smaller the stride, the better the spatial locality



Memory Hierarchies
• Fundamental properties of hardware and software:
  • Fast storage technologies cost more per byte, have less capacity, and require more power (heat!)
  • The gap between CPU and main memory speed is widening
  • Well-written programs tend to exhibit good locality
• These fundamental properties complement each other
• They suggest an approach for organizing memory and storage systems known as a memory hierarchy



Summary
• The speed gap between CPU, memory, and mass storage continues to widen
• Well-written programs exhibit a property called locality
• Memory hierarchies based on caching close the gap by exploiting locality
