Memory Hierarchy 4.0
Role of Memory
● Programmers want an unlimited amount of fast memory.
● Create the illusion of a very large and fast memory.
● Implement the memory of a computer as a hierarchy.
● Use multiple levels of memory with different speeds and sizes.
● The entire addressable memory space is available in the largest,
slowest memory.
● Keep the smaller, faster memories close to the processor and the
slower, larger memory below that.
Cache Block Placement
• A key design decision is where blocks (or lines) can be placed in a cache.
• The most popular scheme is set associative, where a set is a group of blocks in the cache. A
block is first mapped onto a set, and then the block can be placed anywhere within that
set. Finding a block consists of first mapping the block address to the set and then
searching the set—usually in parallel—to find the block. The set is chosen by the address
of the data:
(Block address) MOD (Number of sets in cache)
Direct mapped is simply one-way set associative, and a fully associative cache with m blocks could
be called “m-way set associative.”
Q. An 8-way set-associative cache is used in a computer in which the
real memory size is 2^32 bytes. The line size is 16 bytes and there are
2^10 sets. Calculate the cache size and tag length.
Cache size = no. of sets × lines per set × line size = 2^10 × 8 × 16 bytes
= 128 KB
Tag length = address bits − set index bits − byte offset bits = 32 − 10 − 4 = 18 bits
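As a quick check, here is a minimal Python sketch (not part of the original notes) of the address split this example implies; the constants assume the 16-byte lines and 2^10 sets given in the question:

    OFFSET_BITS = 4    # log2(16-byte line)
    INDEX_BITS = 10    # log2(2^10 sets)

    def decode(addr):
        # Split a 32-bit address into tag, set index, and byte offset.
        offset = addr & ((1 << OFFSET_BITS) - 1)
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)   # remaining 18 bits
        return tag, index, offset

    tag, index, offset = decode(0x12345678)
    print(hex(tag), hex(index), hex(offset))   # 0x48d1 0x167 0x8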
Average memory access time = Hit time + Miss rate × Miss penalty
where hit time is the time to hit in the cache and miss penalty is the time to replace the block from memory (that
is, the cost of a miss).
• Because of locality and the higher speed of smaller memories, a memory hierarchy can substantially
improve performance. One method to evaluate cache performance is to expand our processor execution
time equation.
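One standard form of that expansion (stated here for completeness, in the textbook's notation) is:
CPU time = IC × (CPI_execution + Memory stall cycles per instruction) × Clock cycle time
Memory stall cycles per instruction = Misses per instruction × Miss penalty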
1. Larger block size to reduce miss rate—The simplest way to reduce the miss rate is
to take advantage of spatial locality and increase the block size. Larger blocks
reduce compulsory misses, but they also increase the miss penalty. Because larger
blocks lower the number of tags, they can slightly reduce static power. Larger
block sizes can also increase capacity or conflict misses, especially in smaller
caches. Choosing the right block size is a complex trade-off that depends on the
size of cache and the miss penalty.
2. Bigger caches to reduce miss rate—The obvious way to reduce capacity misses is
to increase cache capacity. Drawbacks include potentially longer hit time of the
larger cache memory and higher cost and power. Larger caches increase both static
and dynamic power.
3. Higher associativity to reduce miss rate—Obviously, increasing associativity
reduces conflict misses. Greater associativity can come at the cost of increased hit
time. As we will see shortly, associativity also increases power consumption.
4. Multilevel caches to reduce miss penalty—A difficult decision is whether to make the cache hit
time fast, to keep pace with the high clock rate of processors, or to make the cache large to reduce the
gap between the processor accesses and main memory accesses. Adding another level of cache
between the original cache and memory simplifies the decision. The first-level cache can be small
enough to match a fast clock cycle time, yet the second-level (or third-level) cache can be large
enough to capture many accesses that would go to main memory. The focus on misses in
second-level caches leads to larger blocks, bigger capacity, and higher associativity. Multilevel
caches are more power efficient than a single aggregate cache. If L1 and L2 refer, respectively,
to first- and second-level caches, we can redefine the average memory access time (a computational
sketch follows this list):
Average memory access time = Hit time_L1 + Miss rate_L1 × (Hit time_L2 + Miss rate_L2 × Miss penalty_L2)
5. Giving priority to read misses over writes to reduce miss penalty—A write buffer is a good place
to implement this optimization. Write buffers create hazards because they hold the updated value of a
location needed on a read miss—that is, a read-after-write hazard through memory. One solution is to
check the contents of the write buffer on a read miss. If there are no conflicts, and if the memory
system is available, sending the read before the writes reduces the miss penalty. Most processors give
reads priority over writes. This choice has little effect on power consumption.
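The following minimal Python sketch (added here; the parameter values are the ones used in the worked example later in these notes) evaluates the two-level AMAT formula from item 4:

    def amat_two_level(hit_time_l1, miss_rate_l1,
                       hit_time_l2, miss_rate_l2, miss_penalty_l2):
        # L1 misses pay the L2 lookup; L2 misses additionally pay main memory.
        return hit_time_l1 + miss_rate_l1 * (
            hit_time_l2 + miss_rate_l2 * miss_penalty_l2)

    # 1-cycle L1 hit, 4% L1 miss rate, 10-cycle L2 hit,
    # 50% L2 local miss rate, 200-cycle penalty to main memory.
    print(amat_two_level(1, 0.04, 10, 0.50, 200))   # -> 5.4 cycles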
Multi-Level Cache: Some Definitions
● Local miss rate—misses in this cache divided by the total number of memory accesses made to this cache (Miss rate_L2 for the second level).
● Global miss rate—misses in this cache divided by the total number of memory accesses generated by the CPU.
– L2 Global miss rate = Local miss rate_L1 × Local miss rate_L2
– L1 Global miss rate = L1 Local miss rate
Global vs. Local Miss Rates
● At lower-level caches (L2 or L3), global miss rates provide more useful information:
– They indicate how effective the cache is in reducing AMAT.
– Who cares if the miss rate of L3 is 50%, as long as only 1% of processor memory accesses ever reach it?
Example
Suppose that in 1000 memory references there are 40 misses in the first-level cache
and 20 misses in the second-level cache. What are the various miss rates? Assume
the miss penalty from the L2 cache to memory is 200 clock cycles, the hit time of
the L2 cache is 10 clock cycles, the hit time of L1 is 1 clock cycle, and there are 1.5
memory references per instruction. What is the average memory access time and
average stall cycles per instruction? Ignore the impact of writes.
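A worked solution, using the definitions above (the arithmetic follows directly from the given numbers):
L1 miss rate (local = global) = 40 / 1000 = 4%
L2 local miss rate = 20 / 40 = 50%
L2 global miss rate = 20 / 1000 = 2%
AMAT = Hit time_L1 + Miss rate_L1 × (Hit time_L2 + Miss rate_L2 × Miss penalty_L2)
     = 1 + 4% × (10 + 50% × 200) = 1 + 4% × 110 = 5.4 clock cycles
Average stall cycles per instruction = (AMAT − Hit time_L1) × memory references per instruction
     = (5.4 − 1) × 1.5 = 6.6 clock cycles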
Example:
Consider the design of a three-level memory hierarchy with the following specifications for memory
characteristics:
Design a memory hierarchy to achieve an effective memory access time (t=10.04 μs) with a cache
hit ratio (h1 = 0.98) and a main memory hit ratio (h2 = 0.9). The total cost of the memory hierarchy is limited
to $15,000.
Solution:
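A solution would start from the standard effective access time formula for a three-level hierarchy (stated here as a starting point; it assumes the lowest level always hits):
t_eff = h1 × t1 + (1 − h1) × h2 × t2 + (1 − h1) × (1 − h2) × t3
where t1, t2, and t3 are the access times of the cache, main memory, and the third level; the chosen capacities must also satisfy the $15,000 cost limit.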
How to Improve Cache Performance?
AMAT = Hit Time + Miss Rate × Miss Penalty
1. Reduce miss rate.
2. Reduce miss penalty.
3. Reduce miss penalty or miss rates
via parallelism.
4. Reduce hit time.
Reducing Miss Rates
● Techniques:
– Larger block size
– Larger cache size
– Higher associativity
– Way prediction
– Pseudo-associativity
– Compiler optimization
Reducing Miss Penalty
● Techniques:
– Multilevel caches
– Victim caches (see the sketch below)
– Read miss first
– Critical word first
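To illustrate one technique above, here is a minimal victim cache sketch in Python (a teaching model under assumed parameters, not any particular hardware design): a small fully associative buffer that holds recently evicted blocks and is checked on a miss before paying the full memory penalty.

    from collections import OrderedDict

    class VictimCache:
        # Tiny fully associative FIFO buffer of recently evicted blocks.
        def __init__(self, capacity=4):
            self.capacity = capacity
            self.blocks = OrderedDict()   # block address -> block data

        def insert(self, block_addr, data):
            # Called when the main cache evicts a block.
            self.blocks[block_addr] = data
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)   # drop the oldest entry

        def lookup(self, block_addr):
            # Called on a main-cache miss; a hit here avoids the miss penalty.
            return self.blocks.pop(block_addr, None)

    vc = VictimCache(capacity=4)
    vc.insert(0x1000, b"evicted block")
    print(vc.lookup(0x1000))   # b'evicted block' (memory access avoided)
    print(vc.lookup(0x2000))   # None (must go to memory)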
Reducing Miss Penalty or Miss Rates via Parallelism
● Techniques:
– Non-blocking caches
– Hardware prefetching
– Compiler prefetching
Reducing Cache Hit Time
● Techniques:
– Small and simple caches
– Avoiding address translation
– Pipelined cache access
– Trace caches
Modern Computer Architectures
Reducing Miss Penalty
● Techniques:
– Multilevel caches
– Victim caches
– Read miss first
– Critical word first
– Non-blocking caches
Multilevel Cache
Reducing Miss Penalty (1): Multi-Level Cache
● Add a second-level cache.
● L2 Equations:
AMAT = Hit time_L1 + Miss rate_L1 × Miss penalty_L1
Miss penalty_L1 = Hit time_L2 + Miss rate_L2 × Miss penalty_L2
AMAT = Hit time_L1 + Miss rate_L1 × (Hit time_L2 + Miss rate_L2 × Miss penalty_L2)