Section 4 - Chapter 5 - Revised
Chapter 5 – Memories
Introduction
[Figure: a memory drawn as m words, each n bits wide]
– m words
– n bits per word
– e.g., a memory holding 32,768 bits
– ROM
• read only, bits stored without power
– RAM
• read and write, lose stored bits without power
• Traditional distinctions blurred
– Advanced ROMs can be written to
• e.g., EEPROM
– Advanced RAMs can hold bits without power
• e.g., NVRAM
• Write ability
– Manner and speed a memory can be written
• Storage permanence
– ability of memory to hold stored bits after they are written

Figure: Write ability and storage permanence of memories, showing relative degrees along each axis (not to scale).

Memory                Storage permanence        Write ability
Mask-programmed ROM   Life of product           During fabrication only
OTP ROM               Life of product           External programmer, one time only
EPROM                 Tens of years             External programmer, 1,000s of cycles
EEPROM                Tens of years             External programmer OR in-system, 1,000s of cycles
FLASH                 Tens of years             External programmer OR in-system, block-oriented writes, 1,000s of cycles
NVRAM                 Battery life (10 years)   In-system, fast writes, unlimited cycles
SRAM/DRAM             Near zero                 In-system, fast writes, unlimited cycles

The nonvolatile memories are those from NVRAM upward on the storage permanence axis; the in-system programmable memories are those from EEPROM onward on the write ability axis. The ideal memory sits at the far end of both axes.
ROM: “Read-only” memory
• Nonvolatile memory
• Can be read from, but not written to, by a processor in an embedded system
[Figure: external view of the ROM]
…
• Uses Ak-1
…
lines Q2 and Q0
• Output is 1010
Implementing combinational function
Truth table and the matching 8×2 ROM contents
Inputs (address)   Outputs   8×2 ROM
a b c              y z
0 0 0              0 0       word 0
0 0 1              0 1       word 1
0 1 0              0 1
0 1 1              1 0
1 0 0              1 0
1 0 1              1 1
1 1 0              1 1
1 1 1              1 1       word 7
[Figure: 8×2 ROM with an enable input, address inputs a, b, c and data outputs y, z]
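The idea of using a ROM as a combinational function can be sketched in a few lines of Python. This is only an illustration: the names `ROM_8X2` and `rom_read` are invented here, and the stored words are copied row by row from the truth table on this slide.

```python
# Contents of the 8x2 ROM: one 2-bit word (y, z) per address (a, b, c).
ROM_8X2 = [
    (0, 0),  # word 0: abc = 000
    (0, 1),  # word 1: abc = 001
    (0, 1),  # word 2: abc = 010
    (1, 0),  # word 3: abc = 011
    (1, 0),  # word 4: abc = 100
    (1, 1),  # word 5: abc = 101
    (1, 1),  # word 6: abc = 110
    (1, 1),  # word 7: abc = 111
]


def rom_read(a, b, c, enable=1):
    """Return the (y, z) word stored at address abc, or None when disabled."""
    if not enable:
        return None
    address = (a << 2) | (b << 1) | c  # a is the most significant address bit
    return ROM_8X2[address]


if __name__ == "__main__":
    for address in range(8):
        a, b, c = (address >> 2) & 1, (address >> 1) & 1, address & 1
        print(a, b, c, "->", rom_read(a, b, c))
```

Reading the table this way makes the point of the slide concrete: the inputs are simply an address, and no logic gates need to be designed for the function itself.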
Mask-programmed ROM
EEPROM: Electrically erasable programmable ROM
• Programmed and erased electronically
– typically by using higher than normal voltage
– can program and erase individual words
• Better write ability
– can be in-system programmable with built-in circuit to provide higher
than normal voltage
– writes very slow due to erasing and programming
• “busy” pin indicates to processor EEPROM still writing
– can be erased and programmed tens of thousands of times
• Similar storage permanence to EPROM (about 10 years)
• Far more convenient than EPROMs, but more expensive
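The role of the “busy” pin can be illustrated with a toy software model. Everything below is hypothetical: `ToyEeprom`, its three-poll write latency, and `write_word` are invented for this sketch and do not describe any real device or driver.

```python
class ToyEeprom:
    """Hypothetical EEPROM model: a write keeps `busy` high for a few polls."""

    def __init__(self, size, write_latency_polls=3):
        self.cells = [0xFF] * size          # assumed erased value
        self._latency = write_latency_polls
        self._busy_left = 0
        self._pending = None

    @property
    def busy(self):
        return self._busy_left > 0

    def start_write(self, address, value):
        if self.busy:
            raise RuntimeError("EEPROM is still writing")
        self._pending = (address, value)
        self._busy_left = self._latency

    def tick(self):
        """Advance time by one poll interval; commit the word when done."""
        if self._busy_left:
            self._busy_left -= 1
            if self._busy_left == 0:
                address, value = self._pending
                self.cells[address] = value
                self._pending = None


def write_word(device, address, value):
    """Start a write, then poll the busy flag as a processor would."""
    device.start_write(address, value)
    polls = 0
    while device.busy:
        device.tick()
        polls += 1
    return polls


if __name__ == "__main__":
    eeprom = ToyEeprom(16)
    print("polls needed:", write_word(eeprom, 3, 0x5A))
```

The processor cannot issue the next write until the flag drops, which is why EEPROM writes are slow compared with reads.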
Flash Memory
• Extension of EEPROM
– Same floating gate principle
– Same write ability and storage permanence
• Fast erase
– Large blocks of memory erased at once, rather than one word at a time
– Blocks typically several thousand bytes large
• Writes to single words may be slower
– Entire block must be read, word updated, then entire block written back
• Used with embedded systems storing large data items in
nonvolatile memory
– e.g., digital cameras, TV set-top boxes, cell phones
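The read, update, write-back sequence for a single-word write can be sketched as follows. This is a simplified model under stated assumptions: `ToyFlash`, the 4-word block size, and the explicit erase before reprogramming are illustrative choices, not a description of a particular flash part.

```python
BLOCK_SIZE = 4   # words per block; real blocks are far larger (kept small here)
ERASED = 0xFF    # assumed value of an erased word


class ToyFlash:
    """Hypothetical flash model that can only erase and program whole blocks."""

    def __init__(self, num_blocks):
        self.blocks = [[ERASED] * BLOCK_SIZE for _ in range(num_blocks)]
        self.erase_count = [0] * num_blocks

    def read_block(self, index):
        return list(self.blocks[index])

    def erase_block(self, index):
        self.blocks[index] = [ERASED] * BLOCK_SIZE
        self.erase_count[index] += 1

    def program_block(self, index, words):
        if len(words) != BLOCK_SIZE:
            raise ValueError("a whole block must be programmed at once")
        self.blocks[index] = list(words)


def write_word(flash, address, value):
    """Write one word by rewriting the entire block that contains it."""
    block, offset = divmod(address, BLOCK_SIZE)
    buffer = flash.read_block(block)      # 1. read the entire block
    buffer[offset] = value                # 2. update the one word
    flash.erase_block(block)              # assumed: erase before reprogramming
    flash.program_block(block, buffer)    # 3. write the entire block back


if __name__ == "__main__":
    flash = ToyFlash(2)
    write_word(flash, 5, 0x12)
    write_word(flash, 6, 0x34)
    print(flash.blocks, flash.erase_count)
```

Two single-word writes to the same block cost two full block erasures in this model, which is the overhead the slide warns about.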
RAM: “Random-access” memory
• Typically volatile memory
– bits are not held without power supply
[Figure: external view of a 2^k × n read and write memory, with r/w and enable inputs and data lines Q3, Q2, Q1, Q0]
Basic types of RAM
Memory Interface
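[Figure: processor connected to memory. Both receive CLK; the processor drives MemWrite (the memory's WE input), Address and WriteData, and the memory returns ReadData.]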
The Memory Access Problem
• Up until now, assumed memory could be accessed in 1 clock cycle
• But that hasn’t been true since the 1980s
Memory System Challenge
• Make memory system appear as fast as processor
• Use a hierarchy of memories
• Ideal memory:
– Fast
– Cheap (inexpensive)
– Large (capacity)
[Figure: memory hierarchy drawn against speed and size axes; the cache level is SRAM, ~ $10,000, ~ 1 ns]
Memory hierarchy
• Main memory
– Large, inexpensive,
• Cache
• Offset
– used to find particular word/byte in cache line
Direct Mapped Cache
Mapping of memory words to cache sets (set number = address bits 4:2):
Address          Data              Set Number
11...11111100    mem[0xFF...FC]    7 (111)
11...11111000    mem[0xFF...F8]    6 (110)
11...11110100    mem[0xFF...F4]    5 (101)
11...11110000    mem[0xFF...F0]    4 (100)
11...11101100    mem[0xFF...EC]    3 (011)
11...11101000    mem[0xFF...E8]    2 (010)
11...11100100    mem[0xFF...E4]    1 (001)
11...11100000    mem[0xFF...E0]    0 (000)
00...00100100    mem[0x00...24]    1 (001)
00...00100000    mem[0x00...20]    0 (000)
00...00011100    mem[0x00...1C]    7 (111)
00...00011000    mem[0x00...18]    6 (110)
00...00010100    mem[0x00...14]    5 (101)
00...00010000    mem[0x00...10]    4 (100)
00...00001100    mem[0x00...0C]    3 (011)
00...00001000    mem[0x00...08]    2 (010)
00...00000100    mem[0x00...04]    1 (001)
00...00000000    mem[0x00...00]    0 (000)
[Figure: direct mapped cache hardware. An 8-entry × (1+27+32)-bit SRAM holds a valid bit, a 27-bit tag and 32 bits of data per entry; a tag comparison produces Hit, and the selected entry supplies Data.]
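As a small illustration of how the set number is derived, the following sketch splits a 32-bit byte address into the three fields used by this 8-set cache with one-word blocks. The function name `split_address` is invented here; the field widths (27-bit tag, 3-bit set, 2-bit byte offset) are the ones shown on the slide.

```python
def split_address(address):
    """Split a 32-bit byte address into (tag, set index, byte offset)."""
    byte_offset = address & 0x3          # bits 1:0
    set_index = (address >> 2) & 0x7     # bits 4:2
    tag = (address >> 5) & 0x7FFFFFF     # bits 31:5 (27 bits)
    return tag, set_index, byte_offset


if __name__ == "__main__":
    for address in (0x00000004, 0x00000024, 0xFFFFFFFC):
        tag, set_index, offset = split_address(address)
        print(f"0x{address:08X}: tag=0x{tag:X} set={set_index} offset={offset}")
```

Addresses 0x04 and 0x24 come out with the same set index but different tags, which is exactly the situation that produces conflicts in a direct mapped cache.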
Direct Mapped Cache Performance
Memory address fields: Tag | Set (3 bits) | Byte Offset
e.g., 00...00 | 001 | 00

# MIPS assembly code
        addi $t0, $0, 5
loop:   beq  $t0, $0, done
        lw   $t1, 0x4($0)
        lw   $t2, 0xC($0)
        lw   $t3, 0x8($0)
        addi $t0, $t0, -1
        j    loop
done:

Cache: Set 7 (111) down to Set 0 (000), each with V, Tag and Data fields, initially empty
Miss Rate = ?
Direct Mapped Cache Performance
Memory address fields: Tag | Set (3 bits) | Byte Offset
e.g., 00...00 | 001 | 00

# MIPS assembly code
        addi $t0, $0, 5
loop:   beq  $t0, $0, done
        lw   $t1, 0x4($0)
        lw   $t2, 0xC($0)
        lw   $t3, 0x8($0)
        addi $t0, $t0, -1
        j    loop
done:

Cache contents after the loop:
V   Tag       Data              Set
0                               Set 7 (111)
0                               Set 6 (110)
0                               Set 5 (101)
0                               Set 4 (100)
1   00...00   mem[0x00...0C]    Set 3 (011)
1   00...00   mem[0x00...08]    Set 2 (010)
1   00...00   mem[0x00...04]    Set 1 (001)
0                               Set 0 (000)

Miss Rate = 3/15 = 20%
• Temporal locality: every access after the first iteration hits
• Compulsory misses: the three misses are the first accesses to each word
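The 3/15 figure can be reproduced with a minimal direct mapped cache model. This is a sketch under assumptions: only the data loads are counted (instruction fetches are ignored, which matches the 15 accesses on the slide), and `direct_mapped_misses` is a name invented for this example.

```python
def direct_mapped_misses(addresses, num_sets=8, block_bytes=4):
    """Return (misses, accesses) for a direct mapped cache with 1-word blocks."""
    tags = [None] * num_sets          # None models a cleared valid bit
    misses = 0
    for address in addresses:
        block = address // block_bytes
        index = block % num_sets      # set field of the address
        tag = block // num_sets       # remaining upper bits
        if tags[index] != tag:        # invalid entry or tag mismatch
            misses += 1
            tags[index] = tag         # load the block into its only slot
    return misses, len(addresses)


if __name__ == "__main__":
    loop_loads = [0x4, 0xC, 0x8] * 5  # the three lw instructions, 5 iterations
    misses, accesses = direct_mapped_misses(loop_loads)
    print(f"Miss rate = {misses}/{accesses} = {misses / accesses:.0%}")
```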
Direct Mapped Cache: Conflict
Memory address fields: Tag | Set (3 bits) | Byte Offset
e.g., 00...01 | 001 | 00

# MIPS assembly code
        addi $t0, $0, 5
loop:   beq  $t0, $0, done
        lw   $t1, 0x4($0)
        lw   $t2, 0x24($0)
        addi $t0, $t0, -1
        j    loop
done:

Cache: Set 7 (111) down to Set 0 (000), each with V, Tag and Data fields, initially empty
Direct Mapped Cache: Conflict
Memory address fields: Tag | Set (3 bits) | Byte Offset
e.g., 00...01 | 001 | 00

# MIPS assembly code
        addi $t0, $0, 5
loop:   beq  $t0, $0, done
        lw   $t1, 0x4($0)
        lw   $t2, 0x24($0)
        addi $t0, $t0, -1
        j    loop
done:

Cache contents during the loop:
V   Tag       Data                                             Set
0                                                              Set 7 (111)
0                                                              Set 6 (110)
0                                                              Set 5 (101)
0                                                              Set 4 (100)
0                                                              Set 3 (011)
0                                                              Set 2 (010)
1   00...00   mem[0x00...04], alternating with mem[0x00...24]  Set 1 (001)
0                                                              Set 0 (000)

Miss Rate = 10/10 = 100%
• Conflict misses: both words map to Set 1 and evict each other on every access
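To make the difference between compulsory and conflict misses concrete, the sketch below labels each access of the loop. It uses a deliberately simplified classification, stated here as an assumption: a miss on a block never seen before is compulsory, and any other miss is called a conflict miss (capacity misses are not distinguished). The name `classify_accesses` is invented for this example.

```python
def classify_accesses(addresses, num_sets=8, block_bytes=4):
    """Count hits, compulsory misses and conflict misses (simplified model)."""
    tags = [None] * num_sets
    seen_blocks = set()
    counts = {"hit": 0, "compulsory": 0, "conflict": 0}
    for address in addresses:
        block = address // block_bytes
        index, tag = block % num_sets, block // num_sets
        if tags[index] == tag:
            counts["hit"] += 1
        elif block in seen_blocks:
            counts["conflict"] += 1    # was cached before, then evicted
        else:
            counts["compulsory"] += 1  # first ever reference to this block
        tags[index] = tag
        seen_blocks.add(block)
    return counts


if __name__ == "__main__":
    print(classify_accesses([0x4, 0x24] * 5))
    print(classify_accesses([0x4, 0xC, 0x8] * 5))
```

Under this model the conflict loop has no hits at all, while the earlier loop over 0x4, 0xC and 0x8 has only its three compulsory misses.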
Fully Associative Cache
• Complete main memory address stored in each cache address
• All addresses stored in cache simultaneously compared with
desired address
• Valid bit and offset same as direct mapping; no set index since
# of sets = 1.
• No conflict misses
• Expensive to build
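[Figure: fully associative cache. The address is split into Tag and Offset; each entry holds V (valid), T (tag) and D (data), and every stored tag is compared (=) with the address tag in parallel to produce Valid and Data.]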
Fully Associative Cache
• The cache for our example would appear as follows. One set;
eight blocks per set.
V Tag Data V Tag Data V Tag Data V Tag Data V Tag Data V Tag Data V Tag Data V Tag Data
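A fully associative lookup can be sketched as a search over every entry. This is a software stand-in only: hardware performs all tag comparisons simultaneously, whereas the loop below checks them one by one. The class name `FullyAssociativeCache` and the decision to raise an error when the cache is full (rather than pick a victim) are assumptions of this sketch.

```python
class FullyAssociativeCache:
    """One set of `num_blocks` entries; any block may go in any entry."""

    def __init__(self, num_blocks=8, block_bytes=4):
        self.block_bytes = block_bytes
        self.entries = [{"valid": False, "tag": None} for _ in range(num_blocks)]

    def access(self, address):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        tag = address // self.block_bytes  # every bit above the offset
        for entry in self.entries:         # hardware compares all tags at once
            if entry["valid"] and entry["tag"] == tag:
                return True
        for entry in self.entries:         # miss: use the first invalid entry
            if not entry["valid"]:
                entry["valid"], entry["tag"] = True, tag
                return False
        raise RuntimeError("cache full: a replacement policy is needed")


if __name__ == "__main__":
    cache = FullyAssociativeCache()
    results = [cache.access(a) for a in [0x4, 0x24] * 5]
    print("misses:", results.count(False), "of", len(results))
```

With no set index, 0x4 and 0x24 no longer compete for one location, so only the two first accesses miss.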
Set Associative Cache
• Compromise between direct
mapping and fully associative
mapping
• Index same as in direct mapping
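• Each cache address (set) holds the content and tags of two or more memory locations; the tags of the selected set are compared simultaneously, as in fully associative mapping
[Figure: address fields Tag | Set Index | Offset]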
• Cache with set size N called N-way
set-associative
– 2-way, 4-way, 8-way are common
N-Way Set Associative Cache
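[Figure: 2-way set associative cache. The memory address is split into a 28-bit Tag, a 2-bit Set field and a Byte Offset (00). Way 1 and Way 0 each hold V, a 28-bit Tag and 32 bits of Data per set. Two comparators (=) produce Hit1 and Hit0; Hit1 selects between the two 32-bit data words, and the two hit signals combine into Hit.]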
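The behavior of the 2-way organization in the figure can be approximated in software as follows. This is a sketch, not the slide's own example: it assumes 4 sets of 2 ways (matching the 2-bit set field and 28-bit tag shown), one-word blocks, and least-recently-used eviction when a set is full. The class name `SetAssociativeCache` is invented here.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """N-way set associative cache model with LRU eviction (assumed policy)."""

    def __init__(self, num_sets=4, ways=2, block_bytes=4):
        self.num_sets = num_sets
        self.ways = ways
        self.block_bytes = block_bytes
        # One ordered collection of tags per set, least recently used first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss."""
        block = address // self.block_bytes
        index, tag = block % self.num_sets, block // self.num_sets
        ways = self.sets[index]
        if tag in ways:                 # compare against every way of the set
            ways.move_to_end(tag)       # mark as most recently used
            return True
        if len(ways) == self.ways:
            ways.popitem(last=False)    # evict the least recently used way
        ways[tag] = True
        return False


if __name__ == "__main__":
    cache = SetAssociativeCache()
    results = [cache.access(a) for a in [0x4, 0x24] * 5]
    print("misses:", results.count(False), "of", len(results))
```

Running the earlier conflict loop (loads from 0x4 and 0x24) through this model leaves both words resident in the same set, so only the first two accesses miss.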
N-Way Set Associative Performance
# MIPS assembly code
Spatial Locality?
• Increase block size:
– Block size, b = 4 words
– C = 8 words
– Direct mapped (1 block per set)
– Number of blocks, B = C/b = 8/4 = 2
[Figure: the resulting cache, with two sets (Set 1, Set 0) of one 4-word block each]
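The arithmetic behind these parameters can be written out as a short helper. This is an illustrative sketch: the function name `cache_geometry` is invented, and it assumes 32-bit byte addresses, 4-byte words and power-of-two sizes, which is consistent with the field widths shown on the slides.

```python
def cache_geometry(capacity_words, block_words, ways=1,
                   address_bits=32, word_bytes=4):
    """Derive block count, set count and address field widths."""
    blocks = capacity_words // block_words        # B = C / b
    sets = blocks // ways                         # S = B / N
    byte_offset_bits = word_bytes.bit_length() - 1    # log2 for powers of two
    block_offset_bits = block_words.bit_length() - 1
    set_bits = sets.bit_length() - 1
    tag_bits = address_bits - set_bits - block_offset_bits - byte_offset_bits
    return {
        "blocks": blocks,
        "sets": sets,
        "tag_bits": tag_bits,
        "set_bits": set_bits,
        "block_offset_bits": block_offset_bits,
        "byte_offset_bits": byte_offset_bits,
    }


if __name__ == "__main__":
    print(cache_geometry(capacity_words=8, block_words=4))          # this slide
    print(cache_geometry(capacity_words=8, block_words=1))          # direct mapped
    print(cache_geometry(capacity_words=8, block_words=1, ways=2))  # 2-way
```

For C = 8 words and b = 4 words this gives B = 2 blocks, 2 sets, a 1-bit set field, a 2-bit block offset and a 27-bit tag.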
Cache with Larger Block Size
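Memory address fields: Tag (27 bits) | Set | Block Offset (2 bits) | Byte Offset (00)
[Figure: direct mapped cache with 4-word blocks. Set 1 and Set 0 each hold V, a 27-bit Tag and four 32-bit data words; the block offset (11, 10, 01, 00) selects one of the four words as the 32-bit Data output, and a tag comparison (=) produces Hit.]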
Direct Mapped Cache Performance
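        addi $t0, $0, 5
loop:   beq  $t0, $0, done
        lw   $t1, 0x4($0)
        lw   $t2, 0xC($0)
        lw   $t3, 0x8($0)
        addi $t0, $t0, -1
        j    loop
done:

Memory address fields: Tag (27 bits) | Set | Block Offset (2 bits) | Byte Offset (00)
[Figure: the 4-word-block cache, initially empty; each set holds V, a 27-bit Tag and four 32-bit data words, with the block offset (11, 10, 01, 00) selecting the 32-bit Data output and a tag comparison (=) producing Hit]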
Direct Mapped Cache Performance
        addi $t0, $0, 5
loop:   beq  $t0, $0, done
        lw   $t1, 0x4($0)
        lw   $t2, 0xC($0)
        lw   $t3, 0x8($0)
        addi $t0, $t0, -1
        j    loop
done:

Memory address fields: Tag (27 bits) | Set | Block Offset (2 bits) | Byte Offset
e.g., 00...00 | 0 | 11 | 00

Cache contents after the loop:
V   Tag       Data                                                            Set
0                                                                             Set 1
1   00...00   mem[0x00...0C]  mem[0x00...08]  mem[0x00...04]  mem[0x00...00]  Set 0

Miss Rate = 1/15 = 6.67%
• Larger blocks reduce compulsory misses through spatial locality
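The effect of the larger block can be checked by tracing the same loads through two direct mapped configurations of equal capacity. As before, this is a sketch that counts only the data loads, and `hit_miss_trace` is a name invented for the example.

```python
def hit_miss_trace(addresses, num_sets, block_words, word_bytes=4):
    """Return a 'hit'/'miss' label for each access to a direct mapped cache."""
    block_bytes = block_words * word_bytes
    tags = [None] * num_sets
    trace = []
    for address in addresses:
        block = address // block_bytes        # drops block and byte offsets
        index, tag = block % num_sets, block // num_sets
        hit = tags[index] == tag
        if not hit:
            tags[index] = tag                 # the whole block is fetched
        trace.append("hit" if hit else "miss")
    return trace


if __name__ == "__main__":
    loads = [0x4, 0xC, 0x8] * 5
    one_word = hit_miss_trace(loads, num_sets=8, block_words=1)
    four_word = hit_miss_trace(loads, num_sets=2, block_words=4)
    print("1-word blocks:", one_word.count("miss"), "misses of", len(loads))
    print("4-word blocks:", four_word.count("miss"), "misses of", len(loads))
```

With 4-word blocks the first load brings in all of 0x0 through 0xC, so the loads from 0xC and 0x8 hit even in the first iteration.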
Cache-replacement policy
• Cache is too small to hold all data of interest at one time
• Technique for choosing which block to replace
– when fully associative cache is full
– when set-associative cache’s set is full
• Direct mapped cache has no choice
• Random
– replace block chosen at random
• LRU: least-recently used
– replace block not accessed for longest time
• FIFO: first-in-first-out
– push block onto queue when it is loaded into the cache
– choose block to replace by popping queue
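The three policies can be compared with a small model of a single full set. This is a sketch under assumptions: `simulate_policy` is an invented name, blocks are identified by arbitrary labels, FIFO is taken to order blocks by when they were loaded, and the random policy uses a seeded generator only so that runs are repeatable.

```python
import random


def simulate_policy(policy, accesses, capacity, seed=0):
    """Return the list of blocks evicted under 'random', 'lru' or 'fifo'."""
    rng = random.Random(seed)
    resident = []   # LRU: least recently used first; FIFO: oldest load first
    evictions = []
    for block in accesses:
        if block in resident:
            if policy == "lru":           # a hit refreshes recency under LRU
                resident.remove(block)
                resident.append(block)
            continue                      # FIFO and random ignore hits
        if len(resident) == capacity:
            if policy == "random":
                victim = rng.choice(resident)
                resident.remove(victim)
            else:
                victim = resident.pop(0)  # front of the list is the victim
            evictions.append(victim)
        resident.append(block)
    return evictions


if __name__ == "__main__":
    accesses = ["A", "B", "A", "C"]
    for policy in ("random", "lru", "fifo"):
        print(policy, simulate_policy(policy, accesses, capacity=2))
```

On the sequence A, B, A, C with room for two blocks, LRU evicts B (A was touched more recently) while FIFO evicts A (it was loaded first), which shows how the two policies can disagree.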
LRU Replacement
# MIPS assembly
lw $t0, 0x04($0)
lw $t1, 0x24($0)
lw $t2, 0x54($0)
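These three loads can be traced through an LRU model to see which block is replaced. The cache organization is an assumption of this sketch (2 ways, 4 sets, one-word blocks, matching the 2-way figure earlier in the chapter), and `trace_lru` is an invented name.

```python
def trace_lru(addresses, num_sets=4, ways=2, block_bytes=4):
    """Return (address, set, hit, evicted address or None) for each access."""
    sets = [[] for _ in range(num_sets)]   # per set: LRU block first
    trace = []
    for address in addresses:
        block = address // block_bytes
        index = block % num_sets
        resident = sets[index]
        hit = block in resident
        evicted = None
        if hit:
            resident.remove(block)         # re-appended below as most recent
        elif len(resident) == ways:
            evicted = resident.pop(0) * block_bytes  # least recently used
        resident.append(block)
        trace.append((address, index, hit, evicted))
    return trace


if __name__ == "__main__":
    for address, index, hit, evicted in trace_lru([0x04, 0x24, 0x54]):
        victim = "none" if evicted is None else f"0x{evicted:02X}"
        print(f"0x{address:02X} -> set {index}, hit={hit}, evicted={victim}")
```

Under these assumptions all three addresses map to set 1, so the third load finds the set full and replaces the block at 0x04, the least recently used of the two.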
Cache impact on system performance
[Figure: % cache miss (axis from 0 to 0.14) versus cache size (1 Kb, 2 Kb, 4 Kb, 8 Kb, 16 Kb, 32 Kb, 64 Kb, 128 Kb), with one curve each for 1 way, 2 way, 4 way and 8 way associativity]
Multilevel Caches