
Memory Hierarchy

Role of Memory
• Programmers want an unlimited amount of fast memory.
• Create the illusion of a very large and fast memory.
• Implement the memory of a computer as a hierarchy.
• Use multiple levels of memory with different speeds and sizes.
• The entire addressable memory space is available in the largest, slowest memory.
• Keep the smaller, faster memories close to the processor and the slower, larger memory below that.
Cache
• A key design decision is where blocks (or lines) can be placed in a cache.

• The most popular scheme is set associative, where a set is a group of blocks in the cache. A
block is first mapped onto a set, and then the block can be placed anywhere within that
set. Finding a block consists of first mapping the block address to the set and then
searching the set—usually in parallel—to find the block. The set is chosen by the address
of the data:

(Block address) MOD (Number of sets in cache)


• If there are n blocks in a set, the cache placement is called n-way set associative. The end
points of set associativity have their own names. A direct-mapped cache has just one block
per set (so a block is always placed in the same location), and a fully associative cache has
just one set (so a block can be placed anywhere).
Q1: Where Can a Block Be Placed in a Cache?
The restrictions on where a block is placed create three categories of cache organization:
• If each block has only one place it can appear in the cache, the cache is said to be direct mapped.
The mapping is usually
(Block address) MOD (Number of blocks in cache)
• If a block can be placed anywhere in the cache, the cache is said to be fully associative.
• If a block can be placed in a restricted set of places in the cache, the cache is set associative. A set
is a group of blocks in the cache. A block is first mapped onto a set, and then the block can be
placed anywhere within that set. The set is usually chosen by bit selection; that is,
(Block address) MOD (Number of sets in cache)
• If there are n blocks in a set, the cache placement is called n-way set associative.

Direct mapped is simply one-way set associative, and a fully associative cache with m blocks could
be called “m-way set associative.”
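To make the mapping concrete, here is a minimal Python sketch; the 256-block cache size and the example block address are assumed values for illustration only:

# Assumed illustrative cache: 256 blocks total.
NUM_BLOCKS = 256

def direct_mapped_slot(block_address):
    # Direct mapped = 1-way: exactly one legal location per block.
    return block_address % NUM_BLOCKS

def set_index(block_address, ways):
    # n-way set associative: pick the set, then any of its n ways is legal.
    num_sets = NUM_BLOCKS // ways
    return block_address % num_sets

# Fully associative = one set of NUM_BLOCKS ways: any location is legal.
print(direct_mapped_slot(1027))   # 1027 MOD 256 = 3
print(set_index(1027, 4))         # 4-way: 64 sets, 1027 MOD 64 = 3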
Q. An 8-way set-associative cache is used in a computer in which the
real memory size is 2^32 bytes. The line size is 16 bytes and there are
2^10 sets. Calculate the cache size and tag length.

Cache size = no. of sets × lines per set × line size = 2^10 × 8 × 16 = 2^17 bytes
= 128 KB

The address splits into 4 offset bits (16-byte lines) and 10 set-index bits, so
No. of bits in the tag field = 32 − 10 − 4 = 18
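A quick Python check of this arithmetic (self-contained sketch):

import math

addr_bits = 32                     # 2^32-byte addressable memory
line_size = 16                     # bytes per line
ways = 8                           # 8-way set associative
num_sets = 2 ** 10

cache_size = num_sets * ways * line_size           # 131072 B = 128 KB
offset_bits = int(math.log2(line_size))            # 4
index_bits = int(math.log2(num_sets))              # 10
tag_bits = addr_bits - index_bits - offset_bits    # 32 - 10 - 4 = 18

print(cache_size // 1024, "KB; tag =", tag_bits, "bits")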


One measure of the benefits of different cache organizations is miss rate. Miss rate is
simply the fraction of cache accesses that result in a miss—that is, the number of
accesses that miss divided by the number of accesses.
To gain insights into the causes of high miss rates, which can inspire better cache designs,
the three Cs model sorts all misses into three simple categories:
• Compulsory—The very first access to a block cannot be in the cache, so the block must
be brought into the cache. Compulsory misses are those that would occur even with an
infinite-sized cache.
• Capacity—If the cache cannot contain all the blocks needed during execution of a
program, capacity misses (in addition to compulsory misses) will occur because of
blocks being discarded and later retrieved.
• Conflict—If the block placement strategy is not fully associative, conflict misses (in
addition to compulsory and capacity misses) will occur because a block may be
discarded and later retrieved if multiple blocks map to its set and accesses to the
different blocks are intermingled.
Miss rate can be a misleading measure for several reasons. Hence, some designers prefer measuring misses per
instruction rather than misses per memory reference (miss rate). These two are related:

Misses/Instruction = Miss rate × (Memory accesses/Instruction)

The standard summary measure of hierarchy performance is average memory access time:

Average memory access time = Hit time + Miss rate × Miss penalty

where hit time is the time to hit in the cache and miss penalty is the time to replace the block from memory (that
is, the cost of a miss).
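As an illustration (numbers assumed): with 1.5 memory accesses per instruction and a 2% miss
rate, Misses/Instruction = 0.02 × 1.5 = 0.03, i.e., 30 misses per 1000 instructions.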
• Because of locality and the higher speed of smaller memories, a memory hierarchy can substantially
improve performance. One method to evaluate cache performance is to expand our processor execution
time equation.
1.Larger block size to reduce miss rate—The simplest way to reduce the miss rate is
to take advantage of spatial locality and increase the block size. Larger blocks
reduce compulsory misses, but they also increase the miss penalty. Because larger
blocks lower the number of tags, they can slightly reduce static power. Larger
block sizes can also increase capacity or conflict misses, especially in smaller
caches. Choosing the right block size is a complex trade-off that depends on the
size of cache and the miss penalty.
2.Bigger caches to reduce miss rate—The obvious way to reduce capacity misses is
to increase cache capacity. Drawbacks include potentially longer hit time of the
larger cache memory and higher cost and power. Larger caches increase both static
and dynamic power.
3.Higher associativity to reduce miss rate—Obviously, increasing associativity
reduces conflict misses. Greater associativity can come at the cost of increased hit
time. As we will see shortly, associativity also increases power consumption.
4. Multilevel caches to reduce miss penalty—A difficult decision is whether to make the cache hit
time fast, to keep pace with the high clock rate of processors, or to make the cache large to reduce the
gap between the processor accesses and main memory accesses. Adding another level of cache
between the original cache and memory simplifies the decision. The first-level cache can be small
enough to match a fast clock cycle time, yet the second-level (or third-level) cache can be large
enough to capture many accesses that would go to main memory. The focus on misses in second-
level caches leads to larger blocks, bigger capacity, and higher associativity. Multilevel caches are
more power-efficient than a single aggregate cache. If L1 and L2 refer, respectively, to first- and
second-level caches, we can redefine the average memory access time:

AMAT = Hit time_L1 + Miss rate_L1 × (Hit time_L2 + Miss rate_L2 × Miss penalty_L2)

5. Giving priority to read misses over writes to reduce miss penalty—A write buffer is a good place
to implement this optimization. Write buffers create hazards because they hold the updated value of a
location needed on a read miss—that is, a read-after-write hazard through memory. One solution is to
check the contents of the write buffer on a read miss. If there are no conflicts, and if the memory
system is available, sending the read before the writes reduces the miss penalty. Most processors give
reads priority over writes. This choice has little effect on power consumption.
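To make the check concrete, here is a minimal Python sketch; it is a software model only, and the WriteBuffer class and its fields are hypothetical stand-ins for the hardware structure:

class WriteBuffer:
    """Software model of a write buffer holding pending (address, value) writes."""
    def __init__(self):
        self.entries = []   # writes queued for memory, oldest first

    def lookup(self, address):
        # On a read miss, scan for a pending write to the same address.
        for addr, value in reversed(self.entries):   # newest write wins
            if addr == address:
                return value    # forward buffered data: resolves the RAW hazard
        return None             # no conflict: the read may go ahead of the writes

wb = WriteBuffer()
wb.entries.append((0x1000, 42))
print(wb.lookup(0x1000))   # 42: serviced from the buffer
print(wb.lookup(0x2000))   # None: safe to send this read before the writes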
Multi-Level Cache: Some Definitions
● Local miss rate—misses in this cache divided by the total number of memory accesses made to this cache (e.g., Miss rate_L2).
● Global miss rate—misses in this cache divided by the total number of memory accesses generated by the CPU.
– For L2: Global miss rate_L2 = Local miss rate_L1 × Local miss rate_L2
● For L1, the global miss rate equals the local miss rate.
Global vs. Local Miss Rates
● At lower-level caches (L2 or L3), global miss rates provide more useful information:
– They indicate how effective the cache is in reducing AMAT.
– Who cares if the local miss rate of L3 is 50%, as long as only 1% of processor memory accesses ever reach it?
Example
Suppose that in 1000 memory references there are 40 misses in the first-level cache
and 20 misses in the second-level cache. What are the various miss rates? Assume
the miss penalty from the L2 cache to memory is 200 clock cycles, the hit time of
the L2 cache is 10 clock cycles, the hit time of L1 is 1 clock cycle, and there are 1.5
memory references per instruction. What is the average memory access time and
average stall cycles per instruction? Ignore the impact of writes.
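A worked solution using the equations above:

L1 miss rate = 40/1000 = 4% (local and global, since L1 sees all CPU accesses)
L2 global miss rate = 20/1000 = 2%
L2 local miss rate = 20/40 = 50%

AMAT = Hit time_L1 + Miss rate_L1 × (Hit time_L2 + Local miss rate_L2 × Miss penalty_L2)
= 1 + 0.04 × (10 + 0.50 × 200) = 1 + 0.04 × 110 = 5.4 clock cycles

With 1.5 memory references per instruction, misses per instruction are 0.04 × 1.5 = 0.06 (L1)
and 0.02 × 1.5 = 0.03 (L2), so

Average stall cycles per instruction = 0.06 × 10 + 0.03 × 200 = 0.6 + 6.0 = 6.6 clock cycles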
Example:
Consider the design of a three-level memory hierarchy with the following specifications for memory
characteristics:

Design a memory hierarchy to achieve an effective memory access time t = 10.04 μs with a cache
hit ratio h1 = 0.98 and a main memory hit ratio h2 = 0.9. The total cost of the memory hierarchy
is limited to $15,000.
Solution:
How to Improve Cache Performance?

AMAT = Hit Time + Miss Rate × Miss Penalty

1. Reduce miss rate.
2. Reduce miss penalty.
3. Reduce miss penalty or miss rate via parallelism.
4. Reduce hit time.
Reducing Miss Rates
● Techniques:
– Larger block size
– Larger cache size
– Higher associativity
– Way prediction
– Pseudo-associativity
– Compiler optimization
Reducing Miss Penalty
● Techniques:
– Multilevel caches
– Victim caches
– Read miss first
– Critical word first

Reducing Miss Penalty or
Miss Rates via Parallelism
● Techniques:
– Non-blocking caches
– Hardware prefetching
– Compiler prefetching

Reducing Cache Hit Time
● Techniques:
– Small and simple caches
– Avoiding address translation
– Pipelined cache access
– Trace caches

Modern Computer Architectures
Lecture 23: Cache Optimizations
Reducing Miss Penalty
● Techniques:
– Multilevel caches
– Victim caches
– Read miss first
– Critical word first
– Non-blocking caches

Multilevel Cache

Reducing Miss Penalty (1): Multi-Level Cache
● Add a second-level cache.
● L2 Equations:

AMAT = Hit Time_L1 + Miss Rate_L1 × Miss Penalty_L1

Miss Penalty_L1 = Hit Time_L2 + Miss Rate_L2 × Miss Penalty_L2

AMAT = Hit Time_L1 + Miss Rate_L1 × (Hit Time_L2 + Miss Rate_L2 × Miss Penalty_L2)
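The same equations as a small Python helper; the parameter values below come from the worked example earlier in this section:

def amat_two_level(hit_l1, miss_rate_l1, hit_l2, miss_rate_l2, miss_penalty_l2):
    # Miss Penalty_L1 is itself an L2 access: hit in L2, or pay the L2 miss penalty.
    miss_penalty_l1 = hit_l2 + miss_rate_l2 * miss_penalty_l2
    return hit_l1 + miss_rate_l1 * miss_penalty_l1

print(amat_two_level(1, 0.04, 10, 0.50, 200))   # 5.4 cycles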
