
Memory Hierarchies

Forecast
• Memory (B5)
• Motivation for memory hierarchy
• Cache
• ECC
• Virtual memory

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 7 1

Background
Mem Element   Size     Speed       Price/MB
Register      small    1-5 ns      high ??
SRAM          medium   5-25 ns     $??
DRAM          large    60-120 ns   $1
Disk          large    10-20 ms    $0.20



Background
Need basic element to store a bit - latch, flip-flop, capacitor

Memory is logically a 2D array of #locations x data-width


• e.g., 16 registers 32 bits each is a 16 x 32 memory
• (4 address bits; 32 bits of data)
• today’s main memory chips are 8M x 8
• (23 address bits; 8 bits of data)
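The arithmetic above (locations to address bits) can be checked with a small Python sketch; `memory_dimensions` is a name of my own:

```python
import math

def memory_dimensions(locations, data_bits):
    """Return (address bits, data bits) for a locations x data_bits memory."""
    return math.ceil(math.log2(locations)), data_bits

# 16 registers of 32 bits each: a 16 x 32 memory
print(memory_dimensions(16, 32))        # (4, 32)
# an 8M x 8 main memory chip
print(memory_dimensions(8 * 2**20, 8))  # (23, 8)
```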


Register File
32 FF in parallel => one register
16 registers

one 16-way mux per read port


decode write enable

can use tri-state drivers and a shared bus for each port



SRAM
Static RAM
• retains data without refresh (unlike DRAM)
• 6T CMOS cell
• pass transistors as switch
• bit lines, word lines

SRAM interface

Today - 2M x 8 in 5-15ns

Typical large implementations (512 x 64) x 8


DRAM
Dense memory
• 1T cell (one transistor plus a capacitor)
• forgets data on read (destructive read) and after a while (charge leakage)
• e.g., 16M x 1 in a 4k x 4k array
• 24 address bits - 12 for row and 12 for column

Implementation

writeback row to restore destroyed value

Refresh - in background, march through reading all rows

Interface reflects internal organization - addr/2, RAS, CAS, data


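The row/column addressing above, as a quick Python sketch (`split_dram_address` is a name of my own):

```python
def split_dram_address(addr, row_bits=12, col_bits=12):
    """Split a flat DRAM address into (row, column) for a 4k x 4k array."""
    row = (addr >> col_bits) & ((1 << row_bits) - 1)
    col = addr & ((1 << col_bits) - 1)
    return row, col

# a 16M x 1 part: 24 address bits, sent as 12 row bits (RAS) then 12 column bits (CAS)
print(split_dram_address(0x0ABCDE))  # (0x0AB, 0xCDE)
```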
Optimizations
Give faster access to some bits of row
• static column - change column address
• page mode - change column address & CAS hit (EDO)
• nibble mode - fast access to 4 bits

Bigger changes in future


• bandwidth inside >> external bandwidth
• 8 kb/50 ns per chip >> 8 b/50 ns per chip
• 164 Gb/s >> 160 Mb/s
• RAMBUS, IRAM, etc


Motivation for Hierarchy


CPU wants
• (references/insn) * (bytes/reference) * IPC / cycle-time
• 1.2 * 4 * 1 / 2 ns = 2.4 GB/s

CPU can go only as fast as memory can supply
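The demand calculation above, spelled out in Python (numbers taken from the slide; variable names my own):

```python
refs_per_insn = 1.2    # memory references per instruction
bytes_per_ref = 4      # one word per reference
ipc = 1.0              # instructions per cycle
cycle_time_ns = 2.0    # 500 MHz clock

# GB/s, since bytes per ns == GB per s
bandwidth_gb_s = refs_per_insn * bytes_per_ref * ipc / cycle_time_ns
print(bandwidth_gb_s)  # 2.4
```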



Motivation for Hierarchy
Want memory with
• fast access (e.g., one 500 ps CPU cycle)
• large capacity (10 GB)
• inexpensive ($1/MB)

Incompatible requirements

Fortunately memory references are not random!


Motivation for Hierarchy


Locality in time (temporal locality)

if a datum is recently referenced,
it is likely to be referenced again soon

Locality in space (spatial locality)

if a datum is recently referenced,
neighbouring data is likely to be referenced soon



Motivation for Hierarchy
E.g.,
• researching a term paper - don’t look at all books at random
• if you look at a chapter in one book
• temporal - may re-read the chapter again
• spatial - may read neighbouring chapters
• Solution - leave the book on desk for a while
• hit - book on desk
• miss - book not on desk
• miss ratio - fraction not on desk


Motivation for Hierarchy

Memory access time = access-desk + miss-ratio * access-shelf

• 1 + 0.05 * 100 = 6
• 6 << 100

Extend this to several levels of hierarchy
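The desk/shelf arithmetic above, as a quick Python check (`avg_access` is a name of my own):

```python
def avg_access(hit_time, miss_ratio, miss_time):
    """Average access time for a two-level hierarchy."""
    return hit_time + miss_ratio * miss_time

# desk access = 1, 5% of lookups go to the shelf at cost 100
print(avg_access(1, 0.05, 100))  # 6.0, much less than 100
```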



Memory Hierarchy
(figure: hierarchy pyramid, CPU at top, Ln at bottom)
• CPU at the top, backed by L1, L2, L3, ..., Ln
• top level: small, fast, expensive memory
• each level down is larger, slower, and cheaper
• bottom level: largest, slowest, cheapest memory


Memory Hierarchy
Type          Size       Speed (ns)
Register      < 1 KB     0.5
L1 Cache      < 128 KB   1
L2 Cache      < 16 MB    20
Main memory   < 4 GB     100
Disk          > 10 GB    10 x 10^6


Memory Hierarchy
Registers <-> Main memory: managed by compiler/programmer
• holds expression temporaries
• holds variables - more aggressive
• register allocation
• spill when needed
• hard!


Memory Hierarchy
Main memory <-> Disk: managed by
• program - explicit I/O
• operating system - virtual memory
• illusion of larger memory
• protection
• transparent to user



Cache
cache managed by hardware

(figure: CPU <-> $ <-> Main Memory)

keep recently accessed block
• temporal locality

break memory into blocks (several bytes)
• spatial locality

transfer data to/from cache in blocks


Cache
put block in “block frame”
• state (e.g., valid)
• address tag
• data



Cache
on memory access
• if incoming tag == stored tag then HIT
• else MISS
• << replace old block >>
• get block from memory
• put block in cache
• return appropriate word within block


Cache Example
Memory words:
0x11c  0xe0e0e0e0
0x120  0xffffffff
0x124  0x00000001
0x128  0x00000007
0x12c  0x00000003
0x130  0xabababab



Cache Example
a 16-byte cache block frame:
• state    tag    data
• invalid  0x??   ???

lw $4, 0x128

Is tag 0x120 in cache? (block address: 0x128 & 0xfffffff0 = 0x120)

No, get block

• state    tag    data
• valid    0x120  0xffffffff, 0x1, 0x7, 0x3


Cache Example
Return 0x7 to CPU to put in $4
lw $5, 0x124

Is tag 0x120 in cache?


Yes, return 0x1 to CPU
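The two accesses above can be replayed with a toy one-frame cache model in Python (class and function names my own):

```python
class BlockFrame:
    """One cache block frame: state (valid), address tag, and data."""
    def __init__(self):
        self.valid, self.tag, self.data = False, None, None

def access(frame, addr, memory, block_size=16):
    """Return (HIT/MISS, word) for a load, filling the frame on a miss."""
    tag = addr & ~(block_size - 1)          # block-aligned address serves as the tag
    if frame.valid and frame.tag == tag:
        outcome = "HIT"
    else:
        outcome = "MISS"                    # replace old block, fetch from memory
        frame.valid, frame.tag = True, tag
        frame.data = [memory[tag + i] for i in range(0, block_size, 4)]
    return outcome, frame.data[(addr - tag) // 4]

memory = {0x120: 0xffffffff, 0x124: 0x1, 0x128: 0x7, 0x12c: 0x3}
frame = BlockFrame()
print(access(frame, 0x128, memory))  # ('MISS', 7)  -- fetches block 0x120
print(access(frame, 0x124, memory))  # ('HIT', 1)   -- tag 0x120 already present
```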



Cache Example
Often
• cache 1 cycle
• main memory 20 cycles

Performance for data accesses with miss ratio 0.01

mean access = cache access + miss ratio * main memory access

= 1 + 0.01 * 20 = 1.2

Typically caches 64K, main memory 64M


• 20 times faster
• 1/1000 capacity but contains 98% of references


Cache
4 questions
• Where is block placed?
• How is block found?
• Which block is replaced?
• What happens on a write?



Cache Design
Simple cache first (figure: direct-mapped cache)
• block size = 1 word
• “direct-mapped”
• 16K words (64KB) => 16K entries
• address split: tag 16 bits | index 14 bits | byte offset 2 bits
• each entry holds a valid bit, 16-bit tag, 32-bit data
• indexed entry’s tag is compared with the address tag to produce Hit

Consider
• hit & miss
• place & replace
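The field widths above can be checked with a small Python sketch (`decode_direct_mapped` is a name of my own):

```python
def decode_direct_mapped(addr, index_bits=14, offset_bits=2):
    """Split a 32-bit address for a 16K-word direct-mapped cache of 1-word blocks."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# tag = upper 16 bits, index = middle 14 bits, byte offset = low 2 bits
print(tuple(hex(x) for x in decode_direct_mapped(0x12345678)))  # ('0x1234', '0x159e', '0x0')
```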

Cache Design w/ 16-byte blocks (7.10)

(figure 7.10: direct-mapped cache with 16-byte blocks)
• address split: tag 16 bits | index 12 bits | block offset 2 bits | byte offset 2 bits
• 4K entries, each with a valid bit, 16-bit tag, and 128-bit (4-word) data
• tag compare produces Hit; a mux uses the block offset to select the requested 32-bit word


Cache Design
What if blocks conflict?
• Fully associative cache
• CAM cells hold D and D’; incoming bits B and B’
• match = AND over all bits i of (B_i*D_i + B_i’*D_i’)
• compromise - set associative cache
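The match equation above is a per-bit XNOR ANDed across the word; a minimal Python sketch (`cam_match` is a name of my own):

```python
def cam_match(stored, incoming):
    """match = AND over i of (B_i*D_i + B_i'*D_i'): XNOR each bit pair, AND the results."""
    return all((b & d) | ((1 - b) & (1 - d)) for b, d in zip(incoming, stored))

print(cam_match([1, 0, 1], [1, 0, 1]))  # True  -- every bit matches
print(cam_match([1, 0, 1], [1, 1, 1]))  # False -- middle bit differs
```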


Cache Design w/ 4-way set-assoc. (7.19)


(figure 7.19: 4-way set-associative cache)
• address split: tag 22 bits | index 8 bits | byte offset 2 bits
• 256 sets (index 0..255), each with 4 ways of {valid, tag, data}
• 4 tag comparators operate in parallel; a 4-to-1 multiplexor selects the hitting way’s data to produce Hit and Data

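A toy Python model of the 4-way lookup above (names my own; hardware compares the 4 tags in parallel, the loop here stands in for that):

```python
def lookup_4way(sets, addr, index_bits=8, offset_bits=2):
    """Look up addr in a 4-way set-associative cache: 256 sets, 22-bit tags."""
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    for way in sets[index]:                 # 4 tag comparisons, parallel in hardware
        if way["valid"] and way["tag"] == tag:
            return "HIT", way["data"]
    return "MISS", None

sets = [[{"valid": False, "tag": 0, "data": None} for _ in range(4)] for _ in range(256)]
sets[0x42][1] = {"valid": True, "tag": 0x1F, "data": 0xDEADBEEF}
addr = (0x1F << 10) | (0x42 << 2)           # tag 0x1F, index 0x42, offset 0
print(lookup_4way(sets, addr)[0])  # HIT
```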


Cache Design
3C model
• Conflict
• Capacity
• Compulsory

Q3. Which block is replaced


• LRU
• random


Cache Design
Q4. What happens on a write?
• write hit must be slower (the tag must be checked before the data is written)
• propagate to memory?
• immediately - write-through
• on replacement - write-back

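The two propagation policies above, as a toy Python model (dictionary-based; names my own):

```python
def write(frame, addr, value, memory, policy="write-back"):
    """Handle a write hit under the two propagation policies."""
    frame["data"][addr] = value
    if policy == "write-through":
        memory[addr] = value          # propagate to memory immediately
    else:
        frame["dirty"] = True         # propagate later, on replacement

def evict(frame, memory):
    """On replacement, a write-back cache flushes a dirty block to memory."""
    if frame.get("dirty"):
        memory.update(frame["data"])
    frame["data"], frame["dirty"] = {}, False

memory = {0x100: 0}
frame = {"data": {0x100: 0}, "dirty": False}
write(frame, 0x100, 7, memory)        # write-back: memory still stale
print(memory[0x100])                  # 0
evict(frame, memory)
print(memory[0x100])                  # 7
```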


Cache Design
Exploit spatial locality
• bigger block size
• may increase miss penalty

Reduce conflicts
• more associativity
• may increase cache hit time


Cache Design
Unified vs. split instruction and data cache
Example
• consider building 16K I and D cache
• or a 32K unified cache
• let tcache be 1 cycle and tmemory be 10 cycles



Cache Design
I and D split cache
• Imiss is 5% and Dmiss is 6%
• 75% references are instruction fetches
• tavg = (1 + 0.05*10)*0.75 + (1 + 0.06*10) * 0.25 = 1.5

Unified cache
• tavg = 1 + 0.04*10 = 1.4 WRONG!
• tavg = 1.4 + cycles-lost-to-interference
• will cycles-lost-to-interference be < 0.1?
• NOT for modern pipelined processors!
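The split-cache tavg above, as a Python check (`tavg_split` is a name of my own; the slide rounds 1.525 to 1.5):

```python
def tavg_split(t_cache, t_mem, i_miss, d_miss, i_frac):
    """Average access time with split I/D caches, weighted by reference mix."""
    return (t_cache + i_miss * t_mem) * i_frac + (t_cache + d_miss * t_mem) * (1 - i_frac)

print(tavg_split(1, 10, 0.05, 0.06, 0.75))  # 1.525
print(1 + 0.04 * 10)  # 1.4 for the unified cache -- before adding interference cycles
```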


Cache Design
Multi-level caches
Many systems today have a cache hierarchy

E.g.,
• 16K I-cache
• 16K D-cache
• 1M L2-cache



Cache Design
Why?
• Processors getting faster w.r.t. main memory
• want larger caches to reduce frequency of costly misses
• but larger caches are slower!

Solution: Reduce cost of misses with a second level cache

Beginning to appear: 3 cache levels


Split L1 instruction & data on chip
Unified L2 on chip
Unified L3 on board

CPU and Cache Performance


Cache only
• miss ratio
• average access time

Integrate - assume cache hits are part of the pipeline

Time/prog = insn/prog * cycles/insn * sec/cycle

CPI = (execution cycles + stall cycles)/insn

CPI = execution cycles/insn + stall cycles/insn



CPU and Cache Performance
Stall cycles/insn =
• read stall cycles/insn + write stall cycles/insn

read stall cycles/insn =


• read/insn * miss ratio * read miss penalty

write stall cycles/insn =


• more complex - write through, write back, write buffer?


CPU and Cache Performance


Example
• CPI with ideal memory is 1.5
• Assume IF and write never stall
• How is CPI degraded if loads are 25% of all insns
• loads miss 10% and miss cost is 20 cycles

CPI = 1.5 + 0.25*0.10*20 = 2


• 2/1.5 = 1.33, i.e., 33% slower

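The CPI degradation above, as a Python check (`cpi_with_misses` is a name of my own):

```python
def cpi_with_misses(base_cpi, loads_per_insn, miss_ratio, miss_penalty):
    """CPI = execution cycles/insn + stall cycles/insn."""
    return base_cpi + loads_per_insn * miss_ratio * miss_penalty

cpi = cpi_with_misses(1.5, 0.25, 0.10, 20)
print(cpi)                # 2.0
print(cpi / 1.5 - 1)      # ~0.33, i.e., 33% slower than the ideal-memory CPI
```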
