
CS 322M Digital Logic & Computer Architecture

Lecture 25 [28.10.2019]
Cache Optimization Techniques-II

John Jose
Assistant Professor
Department of Computer Science & Engineering
Indian Institute of Technology Guwahati, Assam.
Accessing Cache Memory
[Figure: CPU ↔ Cache ↔ Memory, with hit time at the cache and miss penalty for accesses that go to memory]

Average memory access time (AMAT) = Hit time + (Miss rate × Miss penalty)

 Hit Time: Time to find the block in the cache and return it to the processor [indexing, tag comparison, transfer].
 Miss Rate: Fraction of cache accesses that result in a miss.
 Miss Penalty: Number of cycles required to fetch the block from the next level of the memory hierarchy. This is the extra (not total) time for a miss, in addition to the hit time incurred by all accesses (a worked example follows).
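
As a quick sanity check on the formula, a minimal worked example in C, with assumed numbers (1-cycle hit, 5% miss rate, 100-cycle miss penalty):

```c
/* AMAT = hit time + miss rate * miss penalty, with assumed example
   values: 1-cycle hit, 5% miss rate, 100-cycle miss penalty. */
#include <stdio.h>

int main(void) {
    double hit_time     = 1.0;    /* cycles paid by every access    */
    double miss_rate    = 0.05;   /* fraction of accesses that miss */
    double miss_penalty = 100.0;  /* extra cycles on a miss         */

    double amat = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.1f cycles\n", amat);  /* prints: AMAT = 6.0 cycles */
    return 0;
}
```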
How to optimize the cache?
 Reduce the average memory access time
 AMAT = Hit Time + (Miss Rate × Miss Penalty)
 Approaches:
Reducing the miss rate
Reducing the miss penalty
Reducing the hit time
Multi-banked Caches
 Multi-banked caches increase cache bandwidth
 Rather than a single monolithic unit, divide the cache into many banks that can support simultaneous accesses.
The ARM Cortex-A8 supports 1-4 banks for L2
The Intel i7 uses 4 banks for L1 and 8 banks for L2
 Interleave banks according to the block address
Sequential interleaving: consecutive block addresses map to consecutive banks (see the sketch below)
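
A minimal sketch of sequential interleaving; NUM_BANKS and the function name are assumptions for illustration:

```c
/* Sequential interleaving: block address modulo the number of banks
   selects the bank, so blocks 0,4,8,... go to bank 0, blocks
   1,5,9,... to bank 1, and so on for an assumed 4-bank cache. */
#include <stdint.h>

#define NUM_BANKS 4

static inline unsigned bank_of(uint64_t block_addr) {
    return (unsigned)(block_addr % NUM_BANKS);
}
```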
Non-blocking Caches
 Non-blocking caches increase cache bandwidth
 The cache can serve hits while one or more misses are in progress: (a) hit under miss, (b) hit under multiple misses
 Essential for out-of-order superscalar processors to raise IPC
 L1 tracks outstanding misses in MSHRs (Miss Status Holding Registers); L2 must support multiple outstanding requests
 On an L1 miss, allocate an MSHR entry; clear it when L2 responds with the cache block (see the sketch after the next slide)
 The L1 miss penalty can be hidden to some extent
Non-blocking Caches
 Non-blocking caches increase cache bandwidth
 The processor can hide the L1 miss penalty, but not the L2 miss penalty
 Reduces the effective miss penalty by overlapping miss latencies
 Significantly increases the complexity of the cache controller, since there can be multiple outstanding memory accesses
 Requires a pipelined or banked memory system
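
A simplified MSHR table sketch; the structure, sizes, and names are assumptions, not a real controller design:

```c
/* On an L1 miss, allocate an MSHR entry; when L2 replies with the
   block, clear it. A secondary miss to a block already in flight
   merges into the existing entry instead of re-requesting from L2. */
#include <stdbool.h>
#include <stdint.h>

#define NUM_MSHR 8                /* assumed max outstanding misses */

typedef struct {
    bool     valid;               /* entry tracks an in-flight miss */
    uint64_t block_addr;          /* block being fetched from L2    */
} mshr_entry_t;

static mshr_entry_t mshr[NUM_MSHR];

/* Returns true if the miss was merged or newly allocated;
   false means all MSHRs are busy and the pipeline must stall. */
bool mshr_handle_miss(uint64_t block_addr) {
    int free_slot = -1;
    for (int i = 0; i < NUM_MSHR; i++) {
        if (mshr[i].valid && mshr[i].block_addr == block_addr)
            return true;          /* merge: request already in flight */
        if (!mshr[i].valid && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return false;             /* MSHRs full: stall */
    mshr[free_slot].valid = true; /* allocate on a primary miss */
    mshr[free_slot].block_addr = block_addr;
    /* ...issue the fetch request to L2 here... */
    return true;
}

/* Called when L2 responds with the requested cache block. */
void mshr_clear(uint64_t block_addr) {
    for (int i = 0; i < NUM_MSHR; i++)
        if (mshr[i].valid && mshr[i].block_addr == block_addr)
            mshr[i].valid = false;
}
```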
Early Restart
 Early restart reduces the miss penalty
 The CPU does not wait for the entire block to be loaded
 Early restart:
Request the words in normal order
Send the missed word to the processor as soon as it arrives
Generally useful with large blocks
The L2 controller is not involved in this technique
Critical Word First
 Critical word first reduces the miss penalty
 Critical word first:
Request the missed word from memory first
Send it to the processor as soon as it arrives
The processor resumes while the rest of the block fills the cache
The L2 cache controller sends words out of order
The L1 cache controller must re-arrange the words within the block (see the sketch below)
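
A minimal sketch of the wrap-around fill order under critical word first; WORDS_PER_BLOCK and the function name are assumptions:

```c
/* For a miss on word `critical` of an 8-word block, request words
   starting at the critical one and wrapping around, so the processor
   can restart after the very first word arrives. */
#define WORDS_PER_BLOCK 8

void fill_block_critical_first(unsigned critical) {
    for (unsigned i = 0; i < WORDS_PER_BLOCK; i++) {
        unsigned w = (critical + i) % WORDS_PER_BLOCK;
        /* fetch word w: on iteration 0 this is the critical word;
           the L1 controller places each word at its proper offset. */
        (void)w;
    }
}
```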
Merging Write Buffer
 Write buffer merging reduces the miss penalty
 A write buffer allows the processor to continue without waiting for writes to complete
 When performing a store to a block that is already pending in the write buffer, update the existing buffer entry (see the sketch after the figure)
 Reduces stalls due to a full write buffer and improves buffer efficiency
 If the buffer is full, writes stall the processor

[Figure: write buffer entries shown without and with write buffering]


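A write-buffer merge sketch; the structure, depth, and names are assumptions for illustration:

```c
/* A store whose block address already sits in the buffer updates that
   entry in place (merging) instead of taking a new slot. */
#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES      4         /* assumed buffer depth           */
#define WORDS_PER_ENTRY 4         /* assumed words per buffer entry */

typedef struct {
    bool     valid;
    uint64_t block_addr;
    uint64_t data[WORDS_PER_ENTRY];
    bool     word_valid[WORDS_PER_ENTRY];
} wb_entry_t;

static wb_entry_t wb[WB_ENTRIES];

/* Returns false if the buffer is full and no merge is possible:
   the processor must stall until an entry drains to memory. */
bool wb_store(uint64_t block_addr, unsigned word, uint64_t value) {
    int free_slot = -1;
    for (int i = 0; i < WB_ENTRIES; i++) {
        if (wb[i].valid && wb[i].block_addr == block_addr) {
            wb[i].data[word] = value;        /* merge into pending entry */
            wb[i].word_valid[word] = true;
            return true;
        }
        if (!wb[i].valid && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return false;                        /* buffer full: stall */
    wb[free_slot].valid = true;              /* allocate a fresh entry */
    wb[free_slot].block_addr = block_addr;
    for (int w = 0; w < WORDS_PER_ENTRY; w++)
        wb[free_slot].word_valid[w] = false;
    wb[free_slot].data[word] = value;
    wb[free_slot].word_valid[word] = true;
    return true;
}
```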
Hardware Prefetching
 Prefetching reduces the miss rate and the miss penalty
 Prefetch items before the processor requests them
 Fetch more blocks on a miss, including the next sequential block
 The requested block goes into the I-cache; the next block goes into a stream buffer
 If a missed block is found in the stream buffer, the cache miss is cancelled (see the sketch below)
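
A next-block prefetch sketch with a one-entry stream buffer; this simplification and all names are assumptions:

```c
/* On a miss, fetch the missed block into the cache and prefetch
   block+1 into the stream buffer. A later miss that hits in the
   stream buffer is serviced from there, cancelling the cache miss. */
#include <stdbool.h>
#include <stdint.h>

static uint64_t stream_buf_addr;
static bool     stream_buf_valid;

void on_cache_miss(uint64_t block_addr) {
    if (stream_buf_valid && stream_buf_addr == block_addr) {
        /* miss cancelled: move the block from the stream buffer into
           the cache without touching the next memory level */
    } else {
        /* fetch block_addr from the next level into the cache */
    }
    stream_buf_addr  = block_addr + 1;  /* prefetch next sequential block */
    stream_buf_valid = true;
}
```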
Compiler Optimizations
 Compiler optimizations reduce the miss rate
 Loop Interchange:
Swap nested loops to access memory in sequential order
Maximize the use of data in the cache before it is discarded (see the sketch below)
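
A minimal loop-interchange sketch; N and the array name are assumed example values. C stores arrays in row-major order, so putting j in the inner loop walks memory with unit stride:

```c
#define N 1024
static double x[N][N];

/* Before interchange: the inner loop strides N doubles per access,
   touching a new cache block on almost every iteration. */
void scale_column_major(void) {
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            x[i][j] = 2.0 * x[i][j];
}

/* After interchange: unit-stride, sequential access; every word of a
   fetched block is used before the block is discarded. */
void scale_row_major(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[i][j] = 2.0 * x[i][j];
}
```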
Compiler Optimizations
 Blocking:
Instead of accessing entire rows or columns, subdivide the matrices into blocks
Requires more accesses but improves the locality of those accesses (see the sketch below)
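
A blocked matrix-multiply sketch; N, the blocking factor B, and the zero-initialized result matrix are illustrative assumptions, with B picked so that three B×B tiles fit in the cache:

```c
#define N 512
#define B 32   /* assumed blocking factor; N must be a multiple of B */

/* x += y * z, computed tile by tile: each B×B tile of z is reused for
   many rows of y before being evicted, instead of streaming whole
   rows/columns through the cache. x is assumed zero-initialized. */
void matmul_blocked(double x[N][N], double y[N][N], double z[N][N]) {
    for (int jj = 0; jj < N; jj += B)
        for (int kk = 0; kk < N; kk += B)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < jj + B; j++) {
                    double r = 0.0;
                    for (int k = kk; k < kk + B; k++)
                        r += y[i][k] * z[k][j];
                    x[i][j] += r;   /* accumulate the tile's partial sum */
                }
}
```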
Compiler Controlled Pre-fetching
 Prefetching reduces the miss rate and the miss penalty
 Insert prefetch instructions before the data is needed
 Prefetching pays off only if the processor can read from the cache and keep executing while the prefetch is in progress
 Register prefetch:
Loads data into a register
 Cache prefetch:
Loads data into the cache
 Use loop unrolling and scheduling to prefetch data for adjacent iterations (see the sketch below)
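
A cache-prefetch sketch using GCC/Clang's __builtin_prefetch; the unrolling factor, the prefetch distance DIST, and the function itself are assumed for illustration:

```c
/* The loop is unrolled by 4 so a single prefetch covers data for
   several iterations ahead; DIST tunes how far ahead to fetch.
   Prefetching past the end of the array is harmless: the hint
   instruction never faults. n is assumed to be a multiple of 4. */
#define DIST 16   /* elements ahead to prefetch (assumed) */

double sum(const double *a, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i += 4) {
        __builtin_prefetch(&a[i + DIST], 0, 1);  /* read, low reuse */
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    }
    return s;
}
```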
[email protected]
http://www.iitg.ac.in/johnjose/
