
Memory Hierarchy

Hitting the Memory Wall

Memory density and capacity have grown along with the CPU power and
complexity, but memory speed has not kept pace.
Why Care About Memory Hierarchy?
Processor-DRAM performance gap grows 50% / year
[Figure: performance vs. time, 1980-2000. Processor performance grows ~60%/year (2x every 1.5 years, "Moore's Law"); DRAM performance grows ~9%/year (2x every 10 years).]
The Need for a Memory Hierarchy
The widening speed gap between CPU and main memory
 Processor operations take on the order of 1 ns
 Memory access requires 10s or even 100s of ns
Memory bandwidth limits the instruction execution rate
 Each instruction executed involves at least one memory access
 Hence, a few to 100s of MIPS is the best that can be achieved
A fast buffer memory can help bridge the CPU-memory gap
The fastest memories are expensive and thus not very large
A second (or even third) intermediate cache level is thus often used
Typical Levels in a Hierarchical Memory
[Figure: names and key characteristics of levels in a memory hierarchy.]
Levels of the Memory Hierarchy

Level         Capacity    Access Time            Cost                      Managed by        Transfer Unit
Registers     100s bytes  <10 ns                 --                        Compiler          Instr. operands (1-8 bytes)
Cache         K bytes     10-100 ns              1-0.1 cents/bit           Cache controller  Cache lines (8-128 bytes)
Main memory   M bytes     200-500 ns             0.0001-0.00001 cents/bit  Operating system  Pages (512 bytes-4 KB)
Disk          G bytes     10 ms (10,000,000 ns)  10^-5 - 10^-6 cents/bit   User              Files (Mbytes)
Tape          infinite    sec-min                10^-8 cents/bit           --                --

Moving toward the upper level (registers): faster, smaller, costlier per bit; toward the lower level (tape): larger and slower. The cache level is the focus of this lecture.
Memory Technologies
 RAM (Random Access Memory): access time
is the same for all locations (in nanoseconds).
 DRAM: Dynamic RAM
 High density, low power, cheap, slow (access
time: 60-120 ns).
 Dynamic: needs to be “refreshed” regularly.
 SRAM: Static RAM
 Low density, high power, expensive, fast (access
time: 5-25 ns).
 Static: no refresh needed – contents persist as long as
power is applied (still lost when power is removed).
Memory Hierarchy
 Cache memory: provides the illusion of very high speed.
 Main memory: reasonable cost, but slow and small.
 Virtual memory: provides the illusion of very large size.

[Figure: data movement in a memory hierarchy.]
Analogy: Term Paper in Library
 Working on a paper at a desk in library.
 Option 1: Every time a book is needed…
 Leave desk to go to shelves (or stacks)
 Find the book
 Bring one book back to desk
 Read section interested in
 When done, leave desk and go to shelves carrying
book
 Put book back on shelf
 Return to desk to work
Analogy: Term Paper in Library (2)
 Option 2: Every time a book is needed…
 Leave some books on desk after fetching them
 Only go to shelves when a book not on desk is
needed
 At the shelves, fetch related books in case you
need them; sometimes you will need to return
books not used recently to make space for new
books on desk (replacement algorithm)
 Return to desk to work
 Illusion: whole library on your desk
Desktop, Drawer, and File Cabinet Analogy
Once the “working set” is in the drawer, very few trips to the file cabinet are needed.

Items on a desktop (register) or in a drawer (cache) are more readily accessible than those in a file cabinet (main memory).
Memory Hierarchy Analogy: Library
• You’re writing a term paper (Processor) at a table in the
library
• Library is equivalent to disk
– essentially limitless capacity
– very slow to retrieve a book
• Table is main memory
– smaller capacity: you must return books when the table fills up
– easier and faster to find a book there once you’ve already retrieved it
Memory Hierarchy Analogy
• Open books on table are cache
– smaller capacity: can have very few open books fit on table; again,
when table fills up, you must close a book
– much, much faster to retrieve data
• Illusion created: whole library open on the tabletop
– Keep as many recently used books open on table as possible since
likely to use again
– Also keep as many books on table as possible, since faster than going
to library
Principle of Locality
 Temporal locality (locality in time):
 Keep most recently accessed data items closer to the processor.
 Library analogy: recently read books are kept on the desk.
 Spatial locality (locality in space):
 Move blocks consisting of neighboring words to ‘upper’ levels.
 A block is the unit of transfer.
 Library analogy: bring back nearby books on the shelves when fetching a book, hoping that you might need them soon.
Principle of Locality
• Programs access a relatively small portion of address
space at any instant of time.
• Two Types of Locality:
– Temporal Locality (Locality in Time): If an address is
referenced, it tends to be referenced again
• e.g., loops, reuse
– Spatial Locality (Locality in Space): If an address is
referenced, neighboring addresses tend to be referenced
• e.g., straight-line code, array access
• Traditionally, HW has relied on locality for speed

Locality is a program property that is exploited in machine design.


Principle of Locality
 What programming constructs lead to the principle of locality?

       Spatial          Temporal
Data   arrays           loop counters
Code   no branch/jump   loops
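The table above can be illustrated with a short Python sketch (the function and data are hypothetical, for illustration only): the loop body reuses the accumulator on every iteration (temporal locality) and touches adjacent array elements in order (spatial locality).

```python
def sum_array(data):
    """Sum the elements of a list.

    Temporal locality: `total` and the loop variable are reused on
    every iteration, so they stay in registers / the nearest cache.
    Spatial locality: data[0], data[1], ... are adjacent in memory,
    so fetching one cache line brings in several upcoming elements.
    """
    total = 0
    for x in data:   # straight-line body, sequential accesses
        total += x
    return total

print(sum_array(list(range(10))))  # 45
```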
Memory Hierarchy: Idea
 Temporal locality: keep recently accessed data
items closer to processor.
 Spatial locality: move contiguous words in
memory to upper levels of hierarchy.
 Use smaller and faster memory technologies
closer to the processor
 If hit rate is high enough, hierarchy has access
time close to the highest (fastest) level and size
equal to the lowest (largest) level.
Memory Hierarchy: Terminology
 Hit: data appears in some block in the upper level.
 Hit rate: the fraction of memory accesses found in the upper level.
 Analogy: fraction of time you find the book on desk.
 Miss: data is not at upper level and needs to be
retrieved from a block in the lower level.
 Miss rate: 1 – Hit rate
 Analogy: fraction of time you need to go to shelves for
the book.
Memory Hierarchy: Terminology (2)
 (Average) Hit time: time to access the upper level
which consists of
 Time to determine hit/miss + memory access time.
 Analogy: time to find and pick up book from desk.
 (Average) Miss penalty: time to replace a block in
the upper level + time to deliver the block to the
processor.
 Analogy: time to go to shelves, find needed book, and
return to your desk.
 Hit time << Miss penalty.
Current Memory Hierarchy
[Figure: hierarchy pyramid — processor registers at the top, then the primary (L1) cache, secondary (L2) cache, main memory, and magnetic disk (secondary memory). Size increases toward the bottom; speed and cost per bit increase toward the top.]
Current Memory Hierarchy
[Figure: processor (control, datapath, registers) connected to the L1 cache, L2 cache, main memory, and secondary memory.]

Level:        Regs     L1 $    L2     Main Memory  Secondary (Disk)
Speed (ns):   0.5      2       6      100          10,000,000
Size (MB):    0.0005   0.05    1-4    100-1000     100,000
Cost ($/MB):  --       $100    $30    $1           $0.05
Technology:   Regs     SRAM    SRAM   DRAM         Disk
Management of the Hierarchy
 Registers  Memory
 By compiler (or assembly programmer).
 Cache  Main memory
 By the hardware.
 Main memory  Disks
 By the hardware and operating system.
 By the programmer (through files).
Cache Terminology
• Hit: data appears in some block
– Hit Rate: the fraction of memory accesses found in the level
– Hit Time: Time to access the level (consists of RAM access time +
Time to determine hit)
• Miss: data needs to be retrieved from a block in the lower level (e.g., Block
Y)
– Miss Rate = 1 - (Hit Rate)
– Miss Penalty: Time to replace a block in the upper level +
Time to deliver the block to the processor
• Hit Time << Miss Penalty
[Figure: the upper-level memory holds block X, the lower-level memory holds block Y; blocks move between the processor and the two levels on hits and misses.]
Average Memory Access Time
• Average memory-access time
= Hit time + Miss rate x Miss penalty
• Miss penalty: time to fetch a block from lower
memory level
– access time: function of latency
– transfer time: function of bandwidth b/w levels
• Transfer one “cache line/block” at a time
• Transfer at the size of the memory-bus width
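The decomposition above can be sketched in Python: the miss penalty is the access latency plus the time to transfer one cache block over the bus. The helper name and the numbers in the example are assumptions for illustration, not values from the slides.

```python
def miss_penalty_ns(access_latency_ns, block_bytes, bus_bytes_per_ns):
    """Miss penalty = access time (a function of latency)
    + transfer time (a function of bandwidth between levels).

    One whole cache line/block is transferred per miss, at the
    width/speed of the memory bus.
    """
    transfer_time_ns = block_bytes / bus_bytes_per_ns
    return access_latency_ns + transfer_time_ns

# Assumed example: 100 ns latency, 64-byte block, 8 bytes/ns bus.
print(miss_penalty_ns(100, 64, 8))  # 108.0 ns
```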
Memory Hierarchy Performance
[Figure: first-level cache (1-clk hit time) backed by main memory (DRAM, 300-clk access); average time = hit time + miss% × miss penalty.]
• Average Memory Access Time (AMAT)
= Hit Time + Miss rate * Miss Penalty
= Thit(L1) + Miss%(L1) * T(memory)
• Example:
– Cache Hit = 1 cycle
– Miss rate = 10% = 0.1
– Miss penalty = 300 cycles
– AMAT = 1 + 0.1 * 300 = 31 cycles
• Can we improve it?
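The worked example above can be checked with a one-line helper (all values are taken from the example itself):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = Hit time + Miss rate * Miss penalty."""
    return hit_time + miss_rate * miss_penalty

# From the example: 1-cycle hit, 10% miss rate, 300-cycle penalty.
print(amat(1, 0.1, 300))  # 31.0 cycles
```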
Reducing Penalty: Multi-Level Cache
[Figure: first-level cache L1 (1 clk), second-level cache L2 (10 clks), and third-level cache L3 (20 clks) on-die, backed by main memory (DRAM, 300 clks).]
Average Memory Access Time (AMAT)
= Thit(L1) + Miss%(L1) × (Thit(L2) + Miss%(L2) × (Thit(L3) + Miss%(L3) × T(memory)))
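The nested formula can be evaluated from the innermost level outward. In the sketch below, the hit times match the figure (1, 10, 20, 300 clocks); the L2 and L3 miss rates are assumed values for illustration, since the slide does not give them.

```python
def multilevel_amat(hit_times, miss_rates, memory_time):
    """AMAT for a cache hierarchy, evaluated innermost-out:
    AMAT = Thit(L1) + m(L1)*(Thit(L2) + m(L2)*(Thit(L3) + m(L3)*Tmem)).
    `hit_times` and `miss_rates` are ordered L1 -> Ln.
    """
    penalty = memory_time
    for t_hit, m in zip(reversed(hit_times), reversed(miss_rates)):
        penalty = t_hit + m * penalty  # fold one level into the penalty
    return penalty

# Hit times from the figure; miss rates (10%, 20%, 50%) are assumptions.
print(multilevel_amat([1, 10, 20], [0.1, 0.2, 0.5], 300))  # 5.4 cycles
```

Compare this with the 31-cycle single-level result: even with pessimistic L2/L3 miss rates, the extra levels cut the average access time dramatically.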
Relationship of Caches and Pipeline
[Figure: classic five-stage pipeline (IF/ID, ID/EX, EX/MEM, MEM/WB registers) with a separate instruction cache (I-$) in the fetch stage and data cache (D-$) in the memory stage, both backed by memory.]
Hardware cache memories
Cache memories are small, fast SRAM-based memories
managed automatically in hardware
– Hold frequently accessed blocks of main memory
CPU looks first for data in L1, then in main memory
Typical system structure:
[Figure: CPU chip containing the register file, ALU, and L1 cache; a bus interface connects over the system bus to main memory.]