
Chapter # 5: Large and Fast: Exploiting Memory Hierarchy

Course Instructor: Dr. Afshan Jamil


Lecture # 15

Contents
• Measuring and improving cache performance
• Average memory access time
• Associative cache
• Set associative cache
• Spectrum of associativity
• Set associative cache organization
• Replacement policies
• Decreasing miss rates with associative block placement
• Multilevel caches
Improving Cache Performance

• Reducing the miss rate by reducing the probability that two different memory blocks will contend for the same cache location.
• Reducing the miss penalty by adding an additional level to the hierarchy. This technique is called multilevel caching.
Measuring Cache Performance
• Components of CPU time
– Program execution cycles
• Includes cache hit time
– Memory stall cycles
• Mainly from cache misses
• CPU time = (CPU execution clock cycles + Memory-stall clock cycles) × Clock cycle time

• The memory-stall clock cycles come primarily from cache misses.

• Memory-stall clock cycles = (Read-stall cycles + Write-stall cycles)


CONTD…

• Read-stall cycles = (Reads / Program) × Read miss rate × Read miss penalty

• Write-stall cycles = (Writes / Program) × Write miss rate × Write miss penalty + Write buffer stalls

• If we assume that the write buffer stalls are negligible, we can combine the reads and writes by using a single miss rate and miss penalty.
CONTD…

Memory-stall cycles
  = (Memory accesses / Program) × Miss rate × Miss penalty
  = (Instructions / Program) × (Misses / Instruction) × Miss penalty
Example

Assume an instruction cache miss rate of 2%, a data cache miss rate of 4%, a miss penalty of 100 cycles, a base CPI of 2 (with a perfect cache), and loads/stores making up 36% of instructions. For an instruction count I:

Instruction miss cycles = I × 2% × 100 = 2.00 × I

Data miss cycles = I × 36% × 4% × 100 = 1.44 × I

The total number of memory-stall cycles is 2.00 I + 1.44 I = 3.44 I.
CONTD…

• The total CPI including memory stalls is 2 + 3.44 = 5.44.
• The ratio of the CPU execution times is

  CPU time with stalls / CPU time with perfect cache
    = (I × CPI_stall × Clock cycle) / (I × CPI_perfect × Clock cycle)
    = CPI_stall / CPI_perfect
    = 5.44 / 2 = 2.72

• The fraction of execution time spent on memory stalls is 3.44 / 5.44 = 63%.
CONTD…

• What happens if the processor is made faster, but the memory system is not?
• Suppose we speed up the computer in the previous example by reducing its CPI from 2 to 1 without changing the clock rate, which might be done with an improved pipeline. The system with cache misses would then have a CPI of 1 + 3.44 = 4.44.

  CPI_stall / CPI_perfect = 4.44 / 1 = 4.44

• The fraction of execution time spent on memory stalls rises to 3.44 / 4.44 = 77%.
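
The arithmetic in both examples is easy to script. The following minimal Python sketch (illustrative; the function name and variables are mine, not from the slides) reproduces the numbers for the base CPI of 2 and for the faster processor with CPI 1:

    # Illustrative sketch: CPI and memory-stall math for the example above.
    def cpi_with_stalls(base_cpi, imiss_rate, dmiss_rate, ls_frac, penalty):
        """Return (total CPI, memory-stall cycles per instruction)."""
        inst_stalls = imiss_rate * penalty            # I-cache stalls per instruction
        data_stalls = ls_frac * dmiss_rate * penalty  # D-cache stalls per instruction
        stalls = inst_stalls + data_stalls
        return base_cpi + stalls, stalls

    for base in (2, 1):  # base CPI 2, then the "faster processor" case
        total, stalls = cpi_with_stalls(base, 0.02, 0.04, 0.36, 100)
        print(f"base CPI {base}: total CPI {total:.2f}, "
              f"slowdown {total / base:.2f}x, stall fraction {stalls / total:.0%}")
    # base CPI 2: total CPI 5.44, slowdown 2.72x, stall fraction 63%
    # base CPI 1: total CPI 4.44, slowdown 4.44x, stall fraction 77%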
Average Memory Access Time

• AMAT = Time for a hit + Miss rate × Miss penalty
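
For instance (numbers assumed for illustration, not from the slide): with a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty, AMAT = 1 + 0.05 × 100 = 6 cycles per access.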


Associative Cache

• Instead of placing memory blocks in specific cache locations based on the memory address, we could allow a block to go anywhere in the cache.
• In this way, the cache would have to fill up before any blocks are evicted.
• This is how a fully associative cache works.
• A memory address is partitioned into only three fields: the tag, the word, and the byte.
Set Associative Cache

• Set associative cache combines the ideas of direct mapped cache and fully associative cache.
• An N-way set associative cache mapping is like direct mapped cache in that a memory reference maps to a particular location in the cache.
• Unlike direct mapped cache, a memory reference maps to a set of several cache blocks, similar to the way in which fully associative cache works.
• Instead of mapping anywhere in the entire cache, a memory reference can map only to a subset of the cache slots.
CONTD…

• The number of cache blocks per set in a set associative cache varies according to the overall system design.
• For example, a 2-way set associative cache can be conceptualized as shown in the schematic below.
• Each set contains two different memory blocks.
Set     Block   Tag        Contents             Valid
Set 0   Blk 0   00000000   A block of 8 words   1
        Blk 1   ---        (empty)              0
Set 1   Blk 0   11110101   A block of 8 words   1
        Blk 1   ---        (empty)              0
Set 2   Blk 0   ---        (empty)              0
        Blk 1   10111011   A block of 8 words   1
Associative Cache Example
Associative Caches
• Fully associative
–Allow a given block to go in any cache entry
–Requires all entries to be searched at once
–Comparator per entry (expensive)
• n-way set associative
–Each set contains n entries
–Block number determines which set
• (Block number) modulo (#Sets in cache)
–Search all entries in a given set at once
–n comparators (less expensive)
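
The set lookup described above can be sketched in a few lines of Python (illustrative; the names are mine, not from the slides). Hardware would compare all n tags in the set at once with n comparators; software simply scans the chosen set:

    # Illustrative sketch: n-way set-associative lookup.
    def find_block(cache_sets, block_number):
        """Return cached data for block_number, or None on a miss."""
        set_index = block_number % len(cache_sets)  # (Block number) modulo (#Sets)
        for tag, data in cache_sets[set_index]:     # search all entries in the set
            if tag == block_number:
                return data
        return None

    # A 2-way set associative cache with 2 sets (4 blocks total):
    sets = [[(0, "Mem[0]"), (8, "Mem[8]")], [(5, "Mem[5]")]]
    print(find_block(sets, 8))  # hit  -> Mem[8] (8 mod 2 = set 0)
    print(find_block(sets, 6))  # miss -> None   (6 mod 2 = set 0, not present)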
Spectrum of Associativity
• For a cache with 8 entries
Associativity Example

• Compare 4-block caches
  – Direct mapped, 2-way set associative, fully associative
  – Block access sequence: 0, 8, 0, 6, 8

• Direct mapped

Block     Cache   Hit/    Cache content after access
address   index   miss    0         1     2         3
0         0       miss    Mem[0]
8         0       miss    Mem[8]
0         0       miss    Mem[0]
6         2       miss    Mem[0]          Mem[6]
8         0       miss    Mem[8]          Mem[6]
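
The direct-mapped trace above can be verified with a short script (illustrative, not from the slides): replay the sequence through a 4-block direct-mapped cache, where each block address maps to index (address mod 4):

    # Illustrative sketch: replay 0, 8, 0, 6, 8 through a direct-mapped cache.
    cache = [None] * 4                  # 4 blocks, initially invalid
    for block in [0, 8, 0, 6, 8]:
        index = block % len(cache)      # block address modulo number of blocks
        outcome = "hit" if cache[index] == block else "miss"
        cache[index] = block            # on a miss, replace whatever was there
        print(f"block {block}: index {index}, {outcome}")
    # All five accesses miss, matching the table above.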
Associativity Example

• 2-way set associative (assuming LRU replacement within each set)

Block     Cache   Hit/    Cache content after access
address   index   miss    Set 0             Set 1
0         0       miss    Mem[0]
8         0       miss    Mem[0]  Mem[8]
0         0       hit     Mem[0]  Mem[8]
6         0       miss    Mem[0]  Mem[6]
8         0       miss    Mem[8]  Mem[6]

• Fully associative

Block     Hit/    Cache content after access
address   miss
0         miss    Mem[0]
8         miss    Mem[0]  Mem[8]
0         hit     Mem[0]  Mem[8]
6         miss    Mem[0]  Mem[8]  Mem[6]
8         hit     Mem[0]  Mem[8]  Mem[6]
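
For this access sequence, the three organizations give 5 misses (direct mapped), 4 misses (2-way set associative), and 3 misses (fully associative), illustrating how added associativity reduces conflict misses.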
Set Associative Cache Organization
Replacement Policies

• With fully associative and set associative cache, a


replacement policy is invoked when it becomes necessary
to evict a block from cache.
• An optimal replacement policy would be able to look into
the future to see which blocks won’t be needed for the
longest period of time.
• Although it is impossible to implement an optimal
replacement algorithm, it is instructive to use it as a
benchmark for assessing the efficiency of any other
scheme we come up with.
CONTD…

• The replacement policy that we choose depends upon the locality that we are trying to optimize; usually, we are interested in temporal locality.
• Least Recently Used (LRU): replace the block which has not been used for the longest time. The disadvantage of this approach is its complexity: LRU has to maintain an access history for each block, which ultimately slows down the cache.
CONTD…

• First-in, first-out (FIFO) is a popular cache replacement policy. In FIFO, the block that has been in the cache the longest is replaced, regardless of when it was last used.
• A random replacement policy does what its name implies: it picks a block at random and replaces it with a new block.
• Random replacement can certainly evict a block that will be needed often or needed soon, but it never thrashes.
Replacement Policy

• Direct mapped: no choice


• Set associative
– Prefer non-valid entry, if there is one
– Otherwise, choose among entries in the set
• Least-recently used (LRU)
– Choose the one unused for the longest time
• Simple for 2-way, manageable for 4-way, too hard beyond that
• Random
– Gives approximately the same performance as LRU for high
associativity
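
A minimal sketch of the victim choice just described (illustrative code, not from the slides): prefer a non-valid entry; otherwise evict the least recently used one, using a per-entry counter as the access history:

    # Illustrative sketch: choosing which way to replace within one set.
    def choose_victim(entries):
        """entries: list of dicts with 'valid', 'tag', 'last_used' fields."""
        for i, e in enumerate(entries):   # prefer a non-valid entry, if any
            if not e["valid"]:
                return i
        # otherwise evict the least recently used entry
        return min(range(len(entries)), key=lambda i: entries[i]["last_used"])

    ways = [{"valid": True, "tag": 0, "last_used": 7},
            {"valid": True, "tag": 8, "last_used": 3}]
    print(choose_victim(ways))  # 1: tag 8 was touched least recently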
Decreasing Miss Rates with Associative Block Placement

• Direct mapped: one unique cache location for each memory block
  – cache block address = memory block address mod number of blocks in cache
• Fully associative: each memory block can be located anywhere in the cache
  – all cache entries are searched (in parallel) to locate a block
• Set associative: each memory block can be placed in a unique set of cache locations; if the set is of size n, it is n-way set associative
  – cache set address = memory block address mod number of sets in cache
  – all cache entries in the corresponding set are searched (in parallel) to locate a block
• Increasing the degree of associativity
  – reduces the miss rate
  – increases the hit time because of the parallel search and then fetch
Multilevel Caches

• Primary cache attached to CPU
  – Small, but fast
• Level-2 cache services misses from primary cache
  – Larger, slower, but still faster than main memory
• Main memory services L-2 cache misses
• Some high-end systems include L-3 cache
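
The AMAT formula from earlier extends naturally to two levels (the form is standard; the numbers below are assumed for illustration, not from the slides):

AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × Main memory penalty)
     = 1 + 0.05 × (10 + 0.25 × 100)
     = 1 + 0.05 × 35 = 2.75 cycles per access

assuming a 1-cycle L1 hit, a 5% L1 miss rate, a 10-cycle L2 hit, a 25% L2 local miss rate, and a 100-cycle main memory penalty.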
Multilevel Cache Considerations

• Primary cache
– Focus on minimal hit time
• L-2 cache
– Focus on low miss rate to avoid main memory access
– Hit time has less overall impact
• Results
– L-1 cache usually smaller than a single-level cache
– L-1 block size smaller than L-2 block size
Class Task
