Week 10
Memory Technology
Readings
• Digital Design and Computer Architecture, David Harris & Sarah Harris, Chapter 8
Memory Technology: DRAM and SRAM
Memory Technology: DRAM
[Figure: 1T-1C DRAM cell, with an access transistor connecting the storage capacitor to the bitline]
• DRAM cell stores a bit as charge on a capacitor
• DRAM cell loses charge over time
• DRAM cell needs to be refreshed periodically
Memory Technology: SRAM
• Static random access memory
• Two cross-coupled inverters store a single bit
• Feedback path enables the stored value to persist in the “cell”
• 4 transistors for storage
• 2 transistors for access
[Figure: 6T SRAM cell, two cross-coupled inverters accessed through row select transistors onto bitline and _bitline]
DRAM vs. SRAM
• DRAM
• Slower access (capacitor)
• Higher density (1T 1C cell)
• Lower cost
• Requires refresh (power, performance, circuitry)
• Manufacturing requires putting capacitor and logic together
• SRAM
• Faster access (no capacitor)
• Lower density (6T cell)
• Higher cost
• No need for refresh
• Manufacturing compatible with logic process (no capacitor)
Memory Hierarchy and Caches
The Memory Hierarchy
Memory in a Modern System
[Figure: a modern multicore chip, with CORE 0 to CORE 3, private L2 CACHE 0 to L2 CACHE 3, a SHARED L3 CACHE, and a DRAM MEMORY CONTROLLER plus DRAM INTERFACE connecting to off-chip DRAM BANKS]
Ideal Memory
• Zero access time (latency)
• Infinite capacity
• Zero cost
• Infinite bandwidth (to support multiple accesses in parallel)
The Problem
• Ideal memory’s requirements oppose each other
• Bigger is slower
• Bigger → Takes longer to determine the location
Memory Hierarchy
• Fundamental tradeoff
• Fast memory: small
• Large memory: slow
• Idea: Memory hierarchy
[Figure: the memory hierarchy, from the CPU register file (RF) through the Level 1 and Level 2 caches to main memory (DRAM) and the hard disk; everything is backed up in the big but slow lower levels]
A Note on Manual vs. Automatic Management
[Table excerpt: L2 cache, L3 cache, ... (512 KB ~ 1 MB, many nsec): automatic HW cache management]
• Keep mᵢ (the miss rate of level i) low
• Increasing capacity Cᵢ lowers mᵢ, but beware of increasing the access time tᵢ
• Lower mᵢ by smarter cache management (replacement: anticipate what you don't need; prefetching: anticipate what you will need)
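To make the tradeoff concrete, a standard recursive formulation (my addition, not spelled out on the slide): if Tᵢ is the average access time seen at level i, then

    Tᵢ = tᵢ + mᵢ · Tᵢ₊₁

so a high miss rate mᵢ exposes the full latency of the next, slower level.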
Caching Basics
◼Block (line): Unit of storage in the cache
❑Memory is logically divided into cache blocks that map to locations
in the cache
◼On a reference:
❑HIT: If in cache, use cached data instead of accessing memory
❑MISS: If not in cache, bring block into cache
◼ Maybe have to kick something else out to do it
[Figure: cache lookup, where an 8-bit Address feeds the Tag Store (producing Hit/miss?) and the Data Store (producing Data)]
◼Cache access (see the C sketch below the figures):
❑1) index into the tag and data stores with the index bits of the address
❑2) check the valid bit in the tag store
❑3) compare the tag bits of the address with the stored tag in the tag store
[Figure: set associative cache lookup: tag store entries (V, tag) are compared in parallel (=?), hit logic produces Hit?, and MUXes select the block and the byte in block from the data store]
[Figure: fully associative cache lookup: eight tag comparators (=?) operate in parallel over the tag store, with hit logic and a data store MUX selecting the byte in block]
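A minimal C sketch of this three-step lookup for a direct mapped cache; the sizes and names are illustrative choices (8 sets, 4-byte blocks), not fixed by the book:

#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS   8
#define BLOCK_SIZE 4                                  /* bytes per block */

typedef struct {
    bool     valid;
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
} CacheLine;

static CacheLine cache[NUM_SETS];

/* Returns true on a HIT and writes the requested byte to *byte_out. */
bool lookup(uint32_t addr, uint8_t *byte_out) {
    uint32_t offset = addr % BLOCK_SIZE;              /* byte in block           */
    uint32_t index  = (addr / BLOCK_SIZE) % NUM_SETS; /* 1) index the stores     */
    uint32_t tag    = addr / (BLOCK_SIZE * NUM_SETS); /* remaining address bits  */

    CacheLine *line = &cache[index];
    if (line->valid && line->tag == tag) {            /* 2) valid? 3) tag match? */
        *byte_out = line->data[offset];               /* HIT: use cached data    */
        return true;
    }
    return false;                                     /* MISS                    */
}

On a miss, a real cache would bring the block in from memory (possibly kicking another block out, as above) and retry.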
Associativity (and Tradeoffs)
• Degree of associativity: How many blocks can map to
the same index (or set)?
• Higher associativity
++ Higher hit rate
-- Slower cache access time (hit latency and data access latency)
-- More expensive hardware (more comparators)
[Plot: hit rate as a function of associativity]
Cache Examples
Cache Terminology
• Capacity (C):
• the number of data bytes a cache stores
• Block size (b):
• bytes of data brought into cache at once
• Number of blocks (B = C/b):
• the total number of blocks in the cache
• Degree of associativity (N):
• number of blocks in a set
• Number of sets (S = B/N):
• each memory address maps to exactly one cache set
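These parameters determine how an address splits into tag, set, and offset fields. A small C sketch (the concrete values are my choice, picked to match the 8-set, one-word-block example that follows):

#include <stdio.h>

/* integer log2 for power-of-two values */
static unsigned log2u(unsigned x) { unsigned n = 0; while (x >>= 1) n++; return n; }

int main(void) {
    unsigned C = 32;                         /* capacity: 32 bytes           */
    unsigned b = 4;                          /* block size: 4 bytes (1 word) */
    unsigned N = 1;                          /* direct mapped: 1 block/set   */
    unsigned B = C / b;                      /* number of blocks: 8          */
    unsigned S = B / N;                      /* number of sets: 8            */

    unsigned offset_bits = log2u(b);                     /* 2 byte offset bits */
    unsigned set_bits    = log2u(S);                     /* 3 set bits         */
    unsigned tag_bits    = 32 - set_bits - offset_bits;  /* 27 tag bits        */

    printf("B=%u S=%u | tag=%u set=%u offset=%u\n",
           B, S, tag_bits, set_bits, offset_bits);
    return 0;
}

With these values a 32-bit address splits into 27 tag bits, 3 set bits, and 2 byte offset bits, matching the cache below.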
How is data found?
• Cache organized into S sets
Memory address                  Set Number
00...00100100 mem[0x00...24] →  1 (001)  (wraps around)
00...00100000 mem[0x00...20] →  0 (000)  (wraps around)
00...00011100 mem[0x00...1C] →  7 (111)
00...00011000 mem[0x00...18] →  6 (110)
00...00010100 mem[0x00...14] →  5 (101)
00...00010000 mem[0x00...10] →  4 (100)
00...00001100 mem[0x00...0C] →  3 (011)
00...00001000 mem[0x00...08] →  2 (010)
00...00000100 mem[0x00...04] →  1 (001)
00...00000000 mem[0x00...00] →  0 (000)
[Figure: the 8-set direct mapped cache implemented as an 8-entry x (1+27+32)-bit SRAM; the 27-bit tag compare produces Hit and the 32-bit word is the Data output]
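The set number is simply the block address modulo the number of sets. A one-line C check (set_of is my own hypothetical helper) reproduces the table above, including the wrap-around conflict used in the examples below:

#include <stdint.h>

/* set index for an 8-set cache with one-word (4-byte) blocks */
static uint32_t set_of(uint32_t addr) { return (addr >> 2) & 0x7; }

/* set_of(0x00)==0, set_of(0x04)==1, ..., set_of(0x1C)==7, then
 * set_of(0x20)==0 and set_of(0x24)==1 wrap around: 0x04 and 0x24
 * contend for the same set in a direct mapped cache.            */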
Direct Mapped Cache Performance

# MIPS assembly code
      addi $t0, $0, 5
loop: beq  $t0, $0, done
      lw   $t1, 0x4($0)
      lw   $t2, 0xC($0)
      lw   $t3, 0x8($0)
      addi $t0, $t0, -1
      j    loop
done:

Miss Rate = 3/15 = 20%
(the three loads map to distinct sets 1, 3, and 2, so only the first iteration misses: 3 compulsory misses in 15 accesses)

Memory Address fields (e.g. 0x00000004): Tag = 00...00, Set = 001, Byte Offset = 00 (3 set bits)

V  Tag     Data
0                          Set 7 (111)
0                          Set 6 (110)
0                          Set 5 (101)
0                          Set 4 (100)
1  00...00 mem[0x00...0C]  Set 3 (011)
1  00...00 mem[0x00...08]  Set 2 (010)
1  00...00 mem[0x00...04]  Set 1 (001)
0                          Set 0 (000)

[Figure: 2-way set associative cache hardware: two (V, Tag) entries per set are compared in parallel against the 28-bit address tag, and Hit1/Hit0 drive a MUX over the two 32-bit data entries]
N-way Set Associative Performance

# MIPS assembly code
      addi $t0, $0, 5
loop: beq  $t0, $0, done
      lw   $t1, 0x4($0)
      lw   $t2, 0x24($0)
      addi $t0, $t0, -1
      j    loop
done:

Miss Rate = 2/10 = 20%

Associativity reduces conflict misses: 0x04 and 0x24 map to the same set, but can now reside in different ways.

        Way 1                        Way 0
V  Tag     Data               V  Tag     Data
0                             0                          Set 3
0                             0                          Set 2
1  00...10 mem[0x00...24]     1  00...00 mem[0x00...04]  Set 1
0                             0                          Set 0
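Where does 2/10 come from? A small simulation sketch in C (my own illustration, not from the book): a 2-way, 4-set cache with one-word blocks replays the loop's ten loads, assuming LRU replacement:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SETS 4
#define WAYS 2

struct line { bool valid; uint32_t tag; unsigned last_used; };
static struct line cache[SETS][WAYS];
static unsigned now;

/* Returns true on a hit; on a miss, fills an invalid or LRU way. */
static bool access_cache(uint32_t addr) {
    uint32_t set = (addr >> 2) % SETS;        /* one-word blocks, 4 sets */
    uint32_t tag = addr >> 4;                 /* remaining 28 bits       */
    for (int w = 0; w < WAYS; w++)
        if (cache[set][w].valid && cache[set][w].tag == tag) {
            cache[set][w].last_used = ++now;
            return true;                      /* hit */
        }
    int victim = -1;                          /* prefer an invalid way   */
    for (int w = 0; w < WAYS; w++)
        if (!cache[set][w].valid) { victim = w; break; }
    if (victim < 0) {                         /* else evict the LRU way  */
        victim = 0;
        for (int w = 1; w < WAYS; w++)
            if (cache[set][w].last_used < cache[set][victim].last_used)
                victim = w;
    }
    cache[set][victim] = (struct line){ true, tag, ++now };
    return false;                             /* miss */
}

int main(void) {
    const uint32_t loads[] = { 0x4, 0x24 };   /* the two lw targets   */
    unsigned misses = 0, accesses = 0;
    for (int i = 0; i < 5; i++)               /* five loop iterations */
        for (int j = 0; j < 2; j++, accesses++)
            if (!access_cache(loads[j]))
                misses++;
    printf("Miss rate = %u/%u\n", misses, accesses);  /* prints 2/10 */
    return 0;
}

Both loads miss only on the first iteration (compulsory misses); afterwards the two blocks coexist in set 1.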
Fully Associative Cache
• No conflict misses
• Expensive to build
[Figure: fully associative cache, a single set of eight (V, Tag, Data) entries]
Spatial Locality?
• Increase block size:
• Block size, b = 4 words
• C = 8 words
• Direct mapped (1 block per set)
• Number of blocks, B = C/b = 8/4 = 2
[Figure: direct mapped cache with 4-word blocks: the address splits into a 27-bit Tag, 1 Set bit, a 2-bit Block Offset, and a 2-bit Byte Offset; the block offset selects one of the four 32-bit words in the set's data entry through a MUX, and the tag compare (=) produces Hit]
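Field extraction for this cache in C (a sketch; the names are mine). For example, address 0xC decomposes into tag 00...00, set 0, block offset 11, byte offset 00, matching the example below:

#include <stdint.h>

/* address fields for the 2-set, 4-word-block direct mapped cache */
typedef struct { uint32_t tag, set, block_off, byte_off; } Fields;

static Fields split(uint32_t addr) {
    Fields f;
    f.byte_off  =  addr       & 0x3;  /* 2 bits: byte within word  */
    f.block_off = (addr >> 2) & 0x3;  /* 2 bits: word within block */
    f.set       = (addr >> 4) & 0x1;  /* 1 bit:  which set         */
    f.tag       =  addr >> 5;         /* remaining 27 bits         */
    return f;
}
/* split(0xC) = { tag 0, set 0, block_off 3 (11), byte_off 0 (00) } */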
Direct Mapped Cache Performance

# MIPS assembly code
      addi $t0, $0, 5
loop: beq  $t0, $0, done
      lw   $t1, 0x4($0)
      lw   $t2, 0xC($0)
      lw   $t3, 0x8($0)
      addi $t0, $t0, -1
      j    loop
done:

Miss Rate = 1/15 = 6.67%
(the first lw misses and brings the whole 4-word block mem[0x00...00] through mem[0x00...0C] into set 0; the remaining 14 loads hit)

Larger blocks reduce compulsory misses through spatial locality

Memory Address fields (e.g. 0x0000000C): Tag = 00...00, Set = 0, Block Offset = 11, Byte Offset = 00

V  Tag     Data
0                                                                       Set 1
1  00...00 mem[0x00...0C] mem[0x00...08] mem[0x00...04] mem[0x00...00]  Set 0
Types of Misses
• Compulsory: first time data is accessed
• Conflict: blocks that map to the same set evict one another (reduced by associativity, as seen above)