
Chapter # 5: Large and Fast: Exploiting Memory Hierarchy

Course Instructor: Dr. Afshan Jamil


Lecture # 15

Contents
• Measuring and improving cache performance
• Average memory access time
• Associative cache
• Set associative cache
• Spectrum of associativity
• Set associative cache organization
• Replacement policies
• Decreasing miss rates with associative block placement
• Multilevel caches
Improving Cache Performance

• Reducing the miss rate by reducing the probability that two different memory blocks will contend for the same cache location.
• Reducing the miss penalty by adding an additional level to the hierarchy. This technique is called multilevel caching.
Measuring Cache Performance
• Components of CPU time
– Program execution cycles
• Includes cache hit time
– Memory stall cycles
• Mainly from cache misses
• CPU time = (CPU execution clock cycles + Memory-stall clock cycles) × Clock cycle time

• The memory-stall clock cycles come primarily from cache misses.

• Memory-stall clock cycles = (Read-stall cycles + Write-stall cycles)


CONTD…

• Read-stall cycles = (Reads / Program) × Read miss rate × Read miss penalty

• Write-stall cycles = (Writes / Program) × Write miss rate × Write miss penalty + Write buffer stalls

• If we assume that the write buffer stalls are negligible, we can combine the reads and writes by using a single miss rate and miss penalty.
CONTD…

Memory-stall cycles
  = (Memory accesses / Program) × Miss rate × Miss penalty
  = (Instructions / Program) × (Misses / Instruction) × Miss penalty
Example

Assume an instruction cache miss rate of 2%, a data cache miss rate of 4%, a miss penalty of 100 cycles, a base CPI of 2 (with a perfect cache), and loads/stores making up 36% of instructions. For an instruction count I:

Instruction miss cycles = I × 2% × 100 = 2.00 × I

Data miss cycles = I × 36% × 4% × 100 = 1.44 × I

The total number of memory-stall cycles is 2.00 I + 1.44 I = 3.44 I.
CONTD…

• The total CPI including memory stalls is 2 + 3.44 = 5.44.
• The ratio of the CPU execution times is

  CPU time with stalls / CPU time with perfect cache
    = (I × CPI_stall × Clock cycle) / (I × CPI_perfect × Clock cycle)
    = CPI_stall / CPI_perfect
    = 5.44 / 2 = 2.72

• The fraction of execution time spent on memory stalls is 3.44 / 5.44 = 63%.
CONTD…

• What happens if the processor is made faster, but the memory system is not?
• Suppose we speed up the computer in the previous example by reducing its CPI from 2 to 1 without changing the clock rate, which might be done with an improved pipeline. The system with cache misses would then have a CPI of 1 + 3.44 = 4.44.

  CPI_stall / CPI_perfect = 4.44 / 1 = 4.44

• The fraction of execution time spent on memory stalls rises to 3.44 / 4.44 = 77%.
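
The arithmetic in both examples is easy to script. The following minimal Python sketch (illustrative; the function name and variables are mine, not from the slides) reproduces the numbers for the base CPI of 2 and for the faster processor with CPI 1:

    # Illustrative sketch: CPI and memory-stall math for the example above.
    def cpi_with_stalls(base_cpi, imiss_rate, dmiss_rate, ls_frac, penalty):
        """Return (total CPI, memory-stall cycles per instruction)."""
        inst_stalls = imiss_rate * penalty            # I-cache stalls per instruction
        data_stalls = ls_frac * dmiss_rate * penalty  # D-cache stalls per instruction
        stalls = inst_stalls + data_stalls
        return base_cpi + stalls, stalls

    for base in (2, 1):  # base CPI 2, then the "faster processor" case
        total, stalls = cpi_with_stalls(base, 0.02, 0.04, 0.36, 100)
        print(f"base CPI {base}: total CPI {total:.2f}, "
              f"slowdown {total / base:.2f}x, stall fraction {stalls / total:.0%}")
    # base CPI 2: total CPI 5.44, slowdown 2.72x, stall fraction 63%
    # base CPI 1: total CPI 4.44, slowdown 4.44x, stall fraction 77%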
Average Memory Access Time

• AMAT = Time for a hit + Miss rate × Miss penalty
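
For instance (numbers assumed for illustration, not from the slide): with a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty, AMAT = 1 + 0.05 × 100 = 6 cycles per access.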


Associative Cache

• Instead of placing memory blocks in specific cache locations based on the memory address, we could allow a block to go anywhere in the cache.
• In this way, the cache would have to fill up before any blocks are evicted.
• This is how a fully associative cache works.
• A memory address is partitioned into only three fields: the tag, the word, and the byte.
Set Associative Cache

• Set associative cache combines the ideas of direct mapped cache and fully associative cache.
• An N-way set associative cache mapping is like direct mapped cache in that a memory reference maps to a particular location in the cache.
• Unlike direct mapped cache, a memory reference maps to a set of several cache blocks, similar to the way in which fully associative cache works.
• Instead of mapping anywhere in the entire cache, a memory reference can map only to a subset of the cache slots.
CONTD…

• The number of cache blocks per set in a set associative cache varies according to the overall system design.
• For example, a 2-way set associative cache can be conceptualized as shown in the schematic below.
• Each set contains two different memory blocks.
Set     Block   Tag        Contents             Valid
Set 0   Blk 0   00000000   A block of 8 words   1
        Blk 1   ---        (empty)              0
Set 1   Blk 0   11110101   A block of 8 words   1
        Blk 1   ---        (empty)              0
Set 2   Blk 0   ---        (empty)              0
        Blk 1   10111011   A block of 8 words   1
Associative Cache Example
Associative Caches
• Fully associative
–Allow a given block to go in any cache entry
–Requires all entries to be searched at once
–Comparator per entry (expensive)
• n-way set associative
–Each set contains n entries
–Block number determines which set
• (Block number) modulo (#Sets in cache)
–Search all entries in a given set at once
–n comparators (less expensive)
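
The set lookup described above can be sketched in a few lines of Python (illustrative; the names are mine, not from the slides). Hardware would compare all n tags in the set at once with n comparators; software simply scans the chosen set:

    # Illustrative sketch: n-way set-associative lookup.
    def find_block(cache_sets, block_number):
        """Return cached data for block_number, or None on a miss."""
        set_index = block_number % len(cache_sets)  # (Block number) modulo (#Sets)
        for tag, data in cache_sets[set_index]:     # search all entries in the set
            if tag == block_number:
                return data
        return None

    # A 2-way set associative cache with 2 sets (4 blocks total):
    sets = [[(0, "Mem[0]"), (8, "Mem[8]")], [(5, "Mem[5]")]]
    print(find_block(sets, 8))  # hit  -> Mem[8] (8 mod 2 = set 0)
    print(find_block(sets, 6))  # miss -> None   (6 mod 2 = set 0, not present)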
Spectrum of Associativity
• For a cache with 8 entries
Associativity Example

• Compare 4-block caches
  – Direct mapped, 2-way set associative, fully associative
  – Block access sequence: 0, 8, 0, 6, 8

• Direct mapped

Block     Cache   Hit/    Cache content after access
address   index   miss    0         1     2         3
0         0       miss    Mem[0]
8         0       miss    Mem[8]
0         0       miss    Mem[0]
6         2       miss    Mem[0]          Mem[6]
8         0       miss    Mem[8]          Mem[6]
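
The direct-mapped trace above can be verified with a short script (illustrative, not from the slides): replay the sequence through a 4-block direct-mapped cache, where each block address maps to index (address mod 4):

    # Illustrative sketch: replay 0, 8, 0, 6, 8 through a direct-mapped cache.
    cache = [None] * 4                  # 4 blocks, initially invalid
    for block in [0, 8, 0, 6, 8]:
        index = block % len(cache)      # block address modulo number of blocks
        outcome = "hit" if cache[index] == block else "miss"
        cache[index] = block            # on a miss, replace whatever was there
        print(f"block {block}: index {index}, {outcome}")
    # All five accesses miss, matching the table above.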
Associativity Example

• 2-way set associative (assuming LRU replacement within each set)

Block     Cache   Hit/    Cache content after access
address   index   miss    Set 0             Set 1
0         0       miss    Mem[0]
8         0       miss    Mem[0]  Mem[8]
0         0       hit     Mem[0]  Mem[8]
6         0       miss    Mem[0]  Mem[6]
8         0       miss    Mem[8]  Mem[6]

• Fully associative

Block     Hit/    Cache content after access
address   miss
0         miss    Mem[0]
8         miss    Mem[0]  Mem[8]
0         hit     Mem[0]  Mem[8]
6         miss    Mem[0]  Mem[8]  Mem[6]
8         hit     Mem[0]  Mem[8]  Mem[6]
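
For this access sequence, the three organizations give 5 misses (direct mapped), 4 misses (2-way set associative), and 3 misses (fully associative), illustrating how added associativity reduces conflict misses.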
Set Associative Cache Organization
Replacement Policies

• With fully associative and set associative cache, a


replacement policy is invoked when it becomes necessary
to evict a block from cache.
• An optimal replacement policy would be able to look into
the future to see which blocks won’t be needed for the
longest period of time.
• Although it is impossible to implement an optimal
replacement algorithm, it is instructive to use it as a
benchmark for assessing the efficiency of any other
scheme we come up with.
CONTD…

• The replacement policy that we choose depends upon the locality that we are trying to optimize; usually, we are interested in temporal locality.
• Least Recently Used (LRU): replace the block which has not been used for the longest time. The disadvantage of this approach is its complexity: LRU has to maintain an access history for each block, which ultimately slows down the cache.
CONTD…

• First-in, first-out (FIFO) is a popular cache replacement policy. In FIFO, the block that has been in the cache the longest is replaced, regardless of when it was last used.
• A random replacement policy does what its name implies: it picks a block at random and replaces it with a new block.
• Random replacement can certainly evict a block that will be needed often or needed soon, but it never thrashes.
Replacement Policy

• Direct mapped: no choice


• Set associative
– Prefer non-valid entry, if there is one
– Otherwise, choose among entries in the set
• Least-recently used (LRU)
– Choose the one unused for the longest time
• Simple for 2-way, manageable for 4-way, too hard beyond that
• Random
– Gives approximately the same performance as LRU for high
associativity
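
A minimal sketch of the victim choice just described (illustrative code, not from the slides): prefer a non-valid entry; otherwise evict the least recently used one, using a per-entry counter as the access history:

    # Illustrative sketch: choosing which way to replace within one set.
    def choose_victim(entries):
        """entries: list of dicts with 'valid', 'tag', 'last_used' fields."""
        for i, e in enumerate(entries):   # prefer a non-valid entry, if any
            if not e["valid"]:
                return i
        # otherwise evict the least recently used entry
        return min(range(len(entries)), key=lambda i: entries[i]["last_used"])

    ways = [{"valid": True, "tag": 0, "last_used": 7},
            {"valid": True, "tag": 8, "last_used": 3}]
    print(choose_victim(ways))  # 1: tag 8 was touched least recently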
Decreasing Miss Rates with Associative Block Placement

• Direct mapped: one unique cache location for each memory block
  – cache block address = memory block address mod number of blocks in cache
• Fully associative: each memory block can be located anywhere in the cache
  – all cache entries are searched (in parallel) to locate a block
• Set associative: each memory block can be placed in a unique set of cache locations; if the set is of size n, it is n-way set associative
  – cache set address = memory block address mod number of sets in cache
  – all cache entries in the corresponding set are searched (in parallel) to locate a block
• Increasing the degree of associativity
  – reduces the miss rate
  – increases the hit time because of the parallel search and then fetch
Multilevel Caches

• Primary cache attached to CPU
  – Small, but fast
• Level-2 cache services misses from primary cache
  – Larger, slower, but still faster than main memory
• Main memory services L-2 cache misses
• Some high-end systems include L-3 cache
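
The AMAT formula from earlier extends naturally to two levels (the form is standard; the numbers below are assumed for illustration, not from the slides):

AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × Main memory penalty)
     = 1 + 0.05 × (10 + 0.25 × 100)
     = 1 + 0.05 × 35 = 2.75 cycles per access

assuming a 1-cycle L1 hit, a 5% L1 miss rate, a 10-cycle L2 hit, a 25% L2 local miss rate, and a 100-cycle main memory penalty.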
Multilevel Cache Considerations

• Primary cache
– Focus on minimal hit time
• L-2 cache
– Focus on low miss rate to avoid main memory access
– Hit time has less overall impact
• Results
– L-1 cache usually smaller than a single-level cache
– L-1 block size smaller than L-2 block size
Class Task
