Memory Hierarchy

This presentation covers the memory hierarchy topic of Computer Organization, intended for engineering students.

Chapter 4

Memory Hierarchy Design


Computer Components
There are three basic hardware modules
(Bell, Newell: Computer Structures, 1971):

 Processors
 Memory
 Communication
Memory / Storage Evaluation
 Costs
 Capacity
 Speed
 Reliability
 Volatility
Memory/storage hierarchies
 Balancing performance with cost
 Small memories are fast but expensive
 Large memories are slow but cheap
 Exploit locality to get the best of both worlds
 locality = re-use/nearness of accesses
 allows most accesses to use small, fast memory
An Example Memory Hierarchy

Smaller, faster, and costlier (per byte) storage devices sit toward the top; larger, slower, and cheaper (per byte) storage devices toward the bottom:

 L0: CPU registers hold words retrieved from the L1 cache.
 L1: on-chip L1 cache (SRAM) holds cache lines retrieved from the L2 cache.
 L2: off-chip L2 cache (SRAM) holds cache lines retrieved from main memory.
 L3: main memory (DRAM) holds disk blocks retrieved from local disks.
 L4: local secondary storage (local disks) holds files retrieved from disks on remote network servers.
 L5: remote secondary storage (tapes, distributed file systems, Web servers).
From lecture-9.ppt
Main Memory
 Most of the main memory in a general-purpose computer is made up of RAM integrated circuit chips, but a portion of the memory may be constructed with ROM chips
 RAM - Random Access Memory
 Integrated RAM chips are available in two possible operating modes, static and dynamic
 ROM - Read Only Memory
Random-Access Memory (RAM)
 Static RAM (SRAM)
 Each cell stores a bit with a six-transistor circuit.
 Retains its value indefinitely, as long as it is kept powered.
 Relatively insensitive to disturbances such as electrical noise.
 Faster (8-16 times faster) and more expensive (8-16 times more expensive as well) than DRAM.
 Dynamic RAM (DRAM)
 Each cell stores a bit with a capacitor and a transistor.
 Value must be refreshed every 10-100 ms.
 Sensitive to disturbances.
 Slower and cheaper than SRAM.
SRAM vs DRAM Summary

        Tran.    Access
        per bit  time    Persist?  Sensitive?  Cost   Applications

SRAM    6        1X      Yes       No          100X   Cache memories

DRAM    1        10X     No        Yes         1X     Main memories,
                                                      frame buffers

 Virtually all desktop or server computers since 1975 have used DRAMs for main memory and SRAMs for cache
ROM
 ROM is used for storing programs that are PERMANENTLY resident in the computer and for tables of constants that do not change in value once the production of the computer is completed
 The ROM portion of main memory is needed for storing an initial program called the bootstrap loader, which starts the computer software operating when power is turned on
Introduction
 Programmers want unlimited amounts of memory with
low latency
 Fast memory technology is more expensive per bit than
slower memory
 Solution: organize memory system into a hierarchy
 Entire addressable memory space available in largest, slowest
memory
 Incrementally smaller and faster memories, each containing a
subset of the memory below it, proceed in steps up toward the
processor
 Temporal and spatial locality ensure that nearly all
references can be found in smaller memories
 Gives the illusion of a large, fast memory being presented to the
processor
Since 1980, CPU has outpaced DRAM ...

Q. How do architects address this gap?
A. Put smaller, faster "cache" memories between CPU and DRAM; create a "memory hierarchy".

(Chart: performance (1/latency) vs. year)
 CPU: 60% per year (2X in 1.5 years)
 DRAM: 9% per year (2X in 10 years)
 The gap grew 50% per year
Memory Hierarchy
Exploiting the Memory
Hierarchy
 Not all stored data is equally important.
 Put important data in the upper ranges
of the memory / storage hierarchy.
 Put unimportant data in the lower
ranges.
The Principle of Locality

 The Principle of Locality:
 Programs access a relatively small portion of the address space at any instant of time. (This is a bit like real life: we all have a lot of friends, but at any given time most of us can only keep in touch with a small group of them.)
 Two Different Types of Locality:
 Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
 Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
 For the last 15 years, hardware has relied on locality for speed
 Locality is a property of programs which is exploited in machine design.
Exploiting the Memory
Hierarchy
 Locality
 Spatial Locality:
 Data is more likely to be accessed if
neighboring data is accessed.
 Temporal Locality:
 Data is more likely to be accessed if it has been
recently accessed.
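The two flavors of locality above can be made concrete with a small sketch. The helper below is hypothetical (not from the slides): it counts how many distinct cache lines a stream of byte addresses touches, assuming 64-byte lines. A sequential scan enjoys spatial locality and stays on few lines, while a stride of one full line lands on a new line every access.

```python
def lines_touched(addresses, line_size=64):
    """Count the distinct cache lines a stream of byte addresses touches."""
    return len({addr // line_size for addr in addresses})

# 1024 sequential byte accesses: spatial locality keeps them on 1024/64 = 16 lines.
sequential = range(0, 1024)

# 1024 accesses striding one full line each time: every access hits a new line.
strided = range(0, 1024 * 64, 64)

print(lines_touched(sequential))  # 16
print(lines_touched(strided))     # 1024
```

Both streams make the same number of accesses; only the reuse of nearby addresses differs, which is exactly what a small, fast memory can exploit.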
Exploiting the Memory
Hierarchy
 Executables
 Program executions tend to spend a great portion of time
in loops.
 Spatial locality: if a statement in the loop is executed,
then so are the statements surrounding it.
 Temporal locality: if a statement is executed, it is likely to
be executed again.
Exploiting the Memory
Hierarchy
 Relational Databases
 Store data in relations
 Relation consists of fields
 Often with Record ID.
 Stored in a B+ tree or in a (linear) hash table.
 Spatial Locality
 When records are accessed in order and stored in a B+ tree, it makes sense to move records in bunches from disk / tape to main memory.
 A typical transaction, however, has no spatial locality: it accesses a record here and there, all over the place.
Exploiting the Memory
Hierarchy
 Relational Databases
 Temporal Locality
 Some records are hot, most are cold.
 Records of current students vs. records of graduates.
 Active accounts in a bank database.
 Current patients versus other patients.
 Some transactions look at the same record
several times (due to inefficiencies).
Exploiting the Memory
Hierarchy
 File System
 Temporal Locality:
 Few files are frequently accessed (OS kernel,
killer apps, data in current projects).
 Most are written and never read again.
 Spatial Locality:
 Not only individual files, but also directories can
become hot.
Exploiting the Memory
Hierarchy
 Caching strategy:
 Keep popular items in expensive, small,
and fast memory.
 Keep less popular items in cheap, big, and
slow memory.
 Use spatial & temporal locality to guess
what items are popular.
Cache Analysis
 Assume two levels of memory:
 Cache: fast, small, expensive.
 Main: slow, large, cheap.
Performance and Power
 High-end microprocessors have >10 MB on-
chip cache
 Consumes large amount of area and power
budget
Memory Hierarchy Basics
 When a word is not found in the cache, a miss
occurs:
 Fetch word from lower level in hierarchy, requiring a
higher latency reference
 Lower level may be another cache or the main
memory
 Also fetch the other words contained within the block
 Takes advantage of spatial locality
 Place block into cache in any location within its set,
determined by address
 block address MOD number of sets
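The placement rule on this slide (block address MOD number of sets) can be sketched in a few lines; the function names and example addresses below are mine, not the slides':

```python
def block_address(byte_address, block_size):
    """Which block a byte address falls in."""
    return byte_address // block_size

def set_index(block_addr, num_sets):
    """Placement rule from the slide: block address MOD number of sets."""
    return block_addr % num_sets

# With 64-byte blocks and 8 sets, byte addresses 0x0C0 and 0x2C0
# fall in blocks 3 and 11, which map to the same set (3) and can conflict.
a = block_address(0x0C0, 64)
b = block_address(0x2C0, 64)
print(set_index(a, 8), set_index(b, 8))  # 3 3
```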
Cache Hits and Misses
 Hit: data appears in some block in the upper level (example: Block X)
 Hit Rate: the fraction of memory accesses found in the upper level
 Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
 Miss: data needs to be retrieved from a block in the lower level (Block Y)
 Miss Rate = 1 - (Hit Rate)
 Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor

(Figure: the processor exchanges Blk X with the upper-level memory, which in turn exchanges Blk Y with the lower-level memory.)
Memory Hierarchy Terms
 The goal of the memory hierarchy is to keep the
contents that are needed now at or near the top of
the hierarchy
 We discuss the performance of the memory hierarchy
using the following terms:
 Hit – when the datum being accessed is found at the current
level
 Miss – when the datum being accessed is not found and the next
level of the hierarchy must be examined
 Hit rate – how many hits out of all memory accesses
 Miss rate – how many misses out of all memory accesses
 NOTE: hit rate = 1 – miss rate, miss rate = 1 – hit rate
 Hit time – time to access this level of the hierarchy
 Miss penalty – time to access the next level
Hit Rate and Miss Penalty
 Hit rate: fraction of accesses found in that level
 Usually so high that we talk about the miss rate instead
 Miss rate fallacy: miss rate is as misleading a summary of average memory access time as MIPS is of CPU performance
 Average memory-access time
= Hit time + Miss rate x Miss penalty
(ns or clocks)
 Miss penalty: time to replace a block from the lower level, including time to deliver it to the CPU
 access time: time to reach the lower level = f(latency to lower level)
 transfer time: time to transfer the block = f(BW between upper & lower levels, block size)
 Single Cache
 The average read access time = Hit Ratio * Time taken in case of hit + (1 - Hit Ratio) * Time taken in case of miss
 Average access time = H1*T1 + (1-H1)*T2
 Two-level Cache
 Average access time = [H1*T1] + [(1-H1)*H2*T2] + [(1-H1)(1-H2)*Hm*Tm]
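The two formulas above translate directly into code. This is a minimal sketch (the function names are mine); the worked examples further down can be checked against it.

```python
def amat_single(h1, t1, t_miss):
    """One cache level: H1*T1 + (1 - H1)*T2."""
    return h1 * t1 + (1 - h1) * t_miss

def amat_two_level(h1, t1, h2, t2, tm, hm=1.0):
    """Two cache levels: H1*T1 + (1-H1)*H2*T2 + (1-H1)(1-H2)*Hm*Tm."""
    return h1 * t1 + (1 - h1) * h2 * t2 + (1 - h1) * (1 - h2) * hm * tm

amat1 = amat_single(0.8, 5, 50)               # ~14 ns
amat2 = amat_two_level(0.8, 1, 0.9, 10, 500)  # ~12.6 ns
```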
Example 1
 Assume that for a certain processor, a read request takes
50 nanoseconds on a cache miss and 5 nanoseconds on
a cache hit. Suppose while running a program, it was
observed that 80% of the processor’s read requests
result in a cache hit. The average read access time in
nanoseconds is____________.
 (A) 10
(B) 12
(C) 13
(D) 14
Solution 1
 Hit Ratio=0.8
 Time taken in case of hit=5ns
 Time taken in case of miss=50ns
 The average read access time= Hit Ratio*Time taken in
case of hit +(1-Hit Ratio)*Time taken in case of miss
 The average read access time in nanoseconds
= 0.8 * 5 + (1-0.8)*50
= 0.8 * 5 + 0.2*50
= 4 + 10 = 14 ns, so the answer is (D).
Example 2
Consider a system with 2 level caches. Access times of
Level 1 cache, Level 2 cache and main memory are 1
ns, 10ns, and 500 ns, respectively. The hit rates of
Level 1 and Level 2 caches are 0.8 and 0.9,
respectively. What is the average access time of the
system ignoring the search time within the cache?
(A) 13.0 ns
(B) 12.8 ns
(C) 12.6 ns
(D) 12.4 ns
Solution 2
Average access time = [H1*T1] + [(1-H1)*H2*T2] + [(1-H1)(1-H2)*Hm*Tm]
where,
H1 = Hit rate of level 1 cache = 0.8
T1 = Access time for level 1 cache = 1 ns
H2 = Hit rate of level 2 cache = 0.9
T2 = Access time for level 2 cache = 10 ns
Hm = Hit rate of Main Memory = 1
Tm = Access time for Main Memory = 500 ns
So, Average Access Time = ( 0.8 * 1 ) + ( 0.2 * 0.9 * 10 ) + ( 0.2
* 0.1 * 1 * 500)
= 0.8 + 1.8 + 10
= 12.6 ns
Example 3
A computer system has an L1 cache, an L2 cache, and a main memory unit connected as shown below (the connecting diagram is not reproduced here). The block size in L1 cache is 4 words. The block size in L2 cache is 16 words. The memory access times are 2 nanoseconds, 20 nanoseconds, and 200 nanoseconds for L1 cache, L2 cache and main memory unit, respectively.

 When there is a miss in L1 cache and a hit in L2 cache, a block is transferred from L2 cache to L1 cache. What is the time taken for this transfer?
Example 4
 The memory access time is 1 nanosecond for a read operation
with a hit in cache, 5 nanoseconds for a read operation with a
miss in cache, 2 nanoseconds for a write operation with a hit in
cache and 10 nanoseconds for a write operation with a miss in
cache. Execution of a sequence of instructions involves 100
instruction fetch operations, 60 memory operand read
operations and 40 memory operand write operations. The cache
hit-ratio is 0.9. The average memory access time (in
nanoseconds) in executing the sequence of instructions is
__________.
(A) 1.26
(B) 1.68
(C) 2.46
(D) 4.52
The question asks for the time taken for "100 fetch operations, 60 operand read operations and 40 memory operand write operations" divided by the total number of operations.

Total number of operations = 100 + 60 + 40 = 200

Time taken for 100 fetch operations (fetch = read)
= 100*((0.9*1)+(0.1*5))   // 1 ns is the read time on a cache hit;
= 140 ns                  // 0.9 is the cache hit ratio

Time taken for 60 read operations = 60*((0.9*1)+(0.1*5)) = 84 ns

Time taken for 40 write operations = 40*((0.9*2)+(0.1*10)) = 112 ns
// 2 ns and 10 ns are the write times on a cache hit and a cache miss respectively

So, the total time taken for 200 operations = 140 + 84 + 112 = 336 ns

Average time taken = time taken per operation = 336/200 = 1.68 ns, so the answer is (B).
Data access using cache
Writing to Cache
 Writing to cache: two strategies
 Write-through
 Immediately update lower levels of hierarchy
 Write-back
 Only update lower levels of hierarchy when an updated block
is replaced
 Both strategies use write buffer to make writes
asynchronous
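The difference between the two strategies is easiest to see by counting how many writes actually reach the lower level. Below is a deliberately tiny, hypothetical one-block cache (not from the slides) that does exactly that:

```python
class TinyCache:
    """One-block cache, just to count writes reaching lower-level memory."""
    def __init__(self, policy):
        self.policy = policy          # "through" or "back"
        self.block = None             # tag of the currently cached block
        self.dirty = False
        self.lower_writes = 0         # writes that reached the lower level

    def write(self, block):
        if self.block != block:       # miss: evict the current block first
            if self.policy == "back" and self.dirty:
                self.lower_writes += 1   # write-back: dirty block goes down on eviction
            self.block, self.dirty = block, False
        if self.policy == "through":
            self.lower_writes += 1    # write-through: every write goes straight down
        else:
            self.dirty = True         # write-back: mark dirty, defer the write

wt, wb = TinyCache("through"), TinyCache("back")
for _ in range(10):                   # ten writes to the same block
    wt.write(0); wb.write(0)
wt.write(1); wb.write(1)              # a write to a new block evicts block 0
print(wt.lower_writes, wb.lower_writes)  # 11 1
```

Repeated writes to a hot block are exactly where write-back wins: eleven writes reach memory under write-through, but only the single eviction under write-back.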
Q4: What happens on a write?

                           Write-Through               Write-Back

Policy                     Data written to the         Write data only to the
                           cache block is also         cache; update the lower
                           written to lower-level      level when a block falls
                           memory                      out of the cache

Debug                      Easy                        Hard

Do read misses             No                          Yes
produce writes?

Do repeated writes         Yes                         No
make it to lower level?

 Additional option (on miss): let writes to an un-cached address allocate a new cache line ("write-allocate").
Write Buffers for Write-Through Caches

(Figure: Processor and Cache, with a Write Buffer between them and Lower Level Memory.)

 The write buffer holds data awaiting write-through to lower-level memory

Q. Why a write buffer?
A. So the CPU doesn't stall.

Q. Why a buffer, why not just one register?
A. Bursts of writes are common.

Q. Are Read After Write (RAW) hazards an issue for the write buffer?
A. Yes! Drain the buffer before the next read, or send the read first after checking the write buffers.
Performance - Cache Memory System

 Te: Effective memory access time in a cache memory system
 Tc: Cache access time
 Tm: Main memory access time
 Te = Tc + (1 - h) Tm
 Example: Tc = 0.4 ns, Tm = 1.2 ns, h = 85%
 Te = 0.4 + (1 - 0.85) × 1.2 = 0.58 ns
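The slide's formula and example, replayed as a quick check. Note the model it implies: every access pays the cache time Tc, and misses additionally pay the main-memory time Tm.

```python
def effective_access_time(tc, tm, h):
    """Te = Tc + (1 - h) * Tm: every access pays Tc; misses also pay Tm."""
    return tc + (1 - h) * tm

te = effective_access_time(0.4, 1.2, 0.85)
print(round(te, 2))  # 0.58
```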
Types and Causes of Misses

 Causes of misses
 Compulsory
 First reference to a block
 Capacity
 Blocks discarded and later retrieved
 Conflict
 Program makes repeated references to multiple
addresses from different blocks that map to the same
location in the cache
Memory Access Time

 Note that speculative and multithreaded processors may execute other instructions during a miss
 Reduces the performance impact of misses
Improve Cache Performance
To improve cache and memory access times, reduce the terms of:

Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty

Can we reduce each of these? Simultaneously?

CPU time = IC * (CPI_Execution + Memory accesses per instruction * Miss Rate * Miss Penalty) * Clock Cycle Time

• Improve performance by:
1. Reducing the miss rate,
2. Reducing the miss penalty, or
3. Reducing the time to hit in the cache.
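The CPU-time equation above can be sketched as a function. The numbers in the example are hypothetical, chosen only to show how memory stalls inflate the effective CPI:

```python
def cpu_time(ic, cpi_exec, accesses_per_instr, miss_rate, miss_penalty, cycle_time):
    """CPU time = IC * (CPI_exec + accesses/instr * miss rate * miss penalty) * cycle time."""
    stall_cpi = accesses_per_instr * miss_rate * miss_penalty
    return ic * (cpi_exec + stall_cpi) * cycle_time

# Hypothetical: 1M instructions, base CPI 1.0, 1.5 accesses/instr,
# 2% miss rate, 50-cycle miss penalty, 1 ns clock.
# Memory stalls add 1.5*0.02*50 = 1.5 to the CPI, so CPI = 2.5.
t = cpu_time(1_000_000, 1.0, 1.5, 0.02, 50, 1e-9)   # ~2.5e-3 seconds
```

Shrinking any of the three factors in the stall term (miss rate, miss penalty, or accesses that miss) attacks the same product, which is why the three improvement strategies above are usually pursued together.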