Improving Cache Performance: Average Memory Access Time (AMAT) = Hit Time + Miss Rate x Miss Penalty
Optimizations based on:
• Reducing Miss Rate:
• Structural: Cache size, Associativity, Block size, Compiler support
Miss Categories:
• Compulsory: Cold-start (first-reference) misses
• Equal to the miss rate of an infinite cache (no capacity or conflict misses)
• Characteristic of the workload: e.g., streaming access patterns (majority of misses are compulsory)
Replacement Algorithms:
Optimal off-line algorithm:
Belady Rule: Evict the cache block whose next reference is furthest in the future
Provides a lower bound on the number of capacity misses for a given cache size
Example (4-block cache, access sequence A B C D E C E A D B C D E A B): the optimal policy evicts B, then later A, taking only 2 capacity misses beyond the 5 compulsory ones.
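The Belady rule lends itself to direct simulation. The sketch below (function names are illustrative) models a 4-block fully associative cache and, on a miss with a full cache, evicts the block whose next reference lies furthest ahead:

```c
#include <assert.h>
#include <string.h>

#define WAYS 4

/* Position of the next reference of block c at or after pos,
 * or n (furthest possible) if c is never referenced again. */
static int next_use(const char *seq, int n, int pos, char c) {
    for (int i = pos; i < n; i++)
        if (seq[i] == c) return i;
    return n;
}

/* Count misses under Belady's optimal (off-line) replacement. */
int belady_misses(const char *seq) {
    int n = (int)strlen(seq);
    char cache[WAYS];
    int filled = 0, misses = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int w = 0; w < filled; w++)
            if (cache[w] == seq[i]) { hit = 1; break; }
        if (hit) continue;
        misses++;
        if (filled < WAYS) { cache[filled++] = seq[i]; continue; }
        /* Evict the block whose next reference is furthest in the future. */
        int victim = 0, far = -1;
        for (int w = 0; w < WAYS; w++) {
            int d = next_use(seq, n, i + 1, cache[w]);
            if (d > far) { far = d; victim = w; }
        }
        cache[victim] = seq[i];
    }
    return misses;
}
```

On the sequence from the example it reports 7 misses: 5 compulsory plus 2 capacity.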
Cache Replacement
Replacement Algorithms:
Least Recently Used (LRU): Evict the cache block that was last referenced furthest in the past
Cache size: 4 Blocks
Block Access Sequence: A B C D E C E A D B C D E A B
LRU
5 Compulsory Misses (A, B, C, D, E) plus 6 capacity misses, 11 in total: 4 additional misses compared with the optimal policy's 7, due to non-optimal replacement.
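True LRU on the same sequence can be checked with a small simulation (a sketch; a 4-block fully associative cache, names illustrative). Simulating it yields 11 misses, versus 7 for the optimal policy:

```c
#include <assert.h>
#include <string.h>

#define WAYS 4

/* Count misses under true LRU for a fully associative 4-block cache.
 * cache[0] holds the LRU block, cache[filled-1] the MRU block. */
int lru_misses(const char *seq) {
    int n = (int)strlen(seq);
    char cache[WAYS];
    int filled = 0, misses = 0;
    for (int i = 0; i < n; i++) {
        int pos = -1;
        for (int w = 0; w < filled; w++)
            if (cache[w] == seq[i]) { pos = w; break; }
        if (pos < 0) {                 /* miss */
            misses++;
            if (filled == WAYS) {      /* evict LRU: shift everything down */
                memmove(cache, cache + 1, WAYS - 1);
                filled--;
            }
            cache[filled++] = seq[i];
        } else {                       /* hit: move block to MRU position */
            char c = cache[pos];
            memmove(cache + pos, cache + pos + 1, (size_t)(filled - pos - 1));
            cache[filled - 1] = c;
        }
    }
    return misses;
}
```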
On every hit, LRU must read and update the ordering information; this overhead makes true LRU impractical for hardware-maintained caches.
LRU
• Approximate LRU (Some Intel processors)
Left/Right accessed last? One bit per internal node of a binary tree over the 8 ways (A, B, C, D, E, F, G, H) records whether the left or right (R) subtree was accessed more recently; the victim is found by walking from the root toward the less recently accessed side at each level.
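A minimal sketch of the tree scheme, assuming an 8-way set and one bit per internal node (the bit records which half was touched last; the victim search walks the opposite way):

```c
#include <assert.h>

/* Tree-based pseudo-LRU for an 8-way set: 7 bits in a binary tree.
 * bit[i] == 1 means the RIGHT subtree was accessed more recently.
 * Node 0 is the root; children of node i are 2i+1 and 2i+2. */
typedef struct { unsigned char bit[7]; } plru8;

/* On an access to `way` (0..7), record the path from root to it. */
void plru_touch(plru8 *t, int way) {
    int node = 0;
    for (int level = 2; level >= 0; level--) {
        int right = (way >> level) & 1;
        t->bit[node] = (unsigned char)right;
        node = 2 * node + 1 + right;
    }
}

/* Pick a victim by walking toward the less recently used half. */
int plru_victim(const plru8 *t) {
    int node = 0, way = 0;
    for (int level = 0; level < 3; level++) {
        int go_right = !t->bit[node];  /* opposite of last-accessed side */
        way = (way << 1) | go_right;
        node = 2 * node + 1 + go_right;
    }
    return way;
}
```

Only 7 bits per set are read and updated, instead of the full ordering true LRU needs.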
• Random Selection
Reducing Miss Rate
1. Larger cache size:
+ Reduces capacity misses
- Hit time may increase
- Cost increases
2. Increased Associativity:
+ Miss rate decreases (fewer conflict misses)
- Hit time increases; may increase clock cycle time
- Hardware cost increases
Miss rate with 8-way associative comparable to fully associative (empirical finding)
Example
Direct mapped cache: Hit time 1 cycle, Miss Penalty 25 cycles (low!), Miss rate = 0.08
8-way set associative: Clock cycle 1.5x, Miss rate = 0.07
Let T be clock cycle of direct mapped cache
AMAT (direct mapped) = (1 + 0.08 x 25) x T = 3.0T
AMAT (set associative): new clock period = 1.5 x T, so AMAT = 1.5T + 0.07 x Miss Penalty
Miss Penalty = ceiling(25T / 1.5T) x 1.5T = ceiling(25/1.5) x 1.5T = 17 x 1.5T = 25.5T (the 25-cycle penalty is rounded up to a whole number of the longer cycles)
AMAT = 1.5T + 0.07 x 25.5T = T(1.5 + 1.785) = 3.285T
(Increasing associativity hurts in this example!!!)
Reducing Miss Rate
Column (or pseudo) associative: built from a direct-mapped cache; on a miss in the primary location, a second location whose top index bit is flipped (0xxxx <-> 1xxxx) is probed. A hit in the second location is slower than a primary hit.
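One possible reading of the column-associative lookup, as a sketch: the rehash location flips the most significant index bit (the 0xxxx/1xxxx pair). Set count, simplified tag handling, and all names here are illustrative assumptions:

```c
#include <assert.h>
#include <stdint.h>

#define SETS 32
#define MSB  (SETS >> 1)   /* binary 10000: the index bit that is flipped */

typedef struct { int valid; uint32_t tag; } line_t;

/* Returns 1 on a hit (first or second probe), 0 on a miss.
 * Simplification: the tag is everything above the index bits. */
int col_assoc_lookup(line_t cache[SETS], uint32_t addr) {
    uint32_t idx = addr % SETS, tag = addr / SETS;
    if (cache[idx].valid && cache[idx].tag == tag) return 1;  /* fast hit */
    uint32_t alt = idx ^ MSB;                                 /* rehash   */
    if (cache[alt].valid && cache[alt].tag == tag) return 1;  /* slow hit */
    return 0;
}
```

A real design also swaps the two lines on a second-probe hit so the next access is fast; that step is omitted here.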
Way Prediction: predict which way of the set will hit and probe that way first; the remaining ways are checked only on a misprediction.
Reducing Miss Rate
5. Compiler Optimizations
• Instruction access
• Rearrange code (procedure, code block placements) to reduce conflict misses
• Align entry point of basic block with start of a cache block
a) Merging arrays: Replace parallel arrays with an array of structs (spatial locality)
b) Loop interchange: traverse arrays in the order they are laid out in memory (row-major: inner loop over the second index)
for (k = 0; k < m; k++)
    for (j = 0; j < n; j++)
        a[k][j] = 0;
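The array-merging idea in (a) can be sketched as follows (array, field, and function names are illustrative):

```c
#include <assert.h>

#define N 1000

/* Before: two parallel arrays -- key[i] and val[i] may live in
 * different cache blocks, far apart in memory. */
int key[N];
int val[N];

/* After: one array of structs -- key and val of element i are
 * adjacent, so one cache block serves both fields. */
struct merged { int key; int val; } rec[N];

int sum_merged(void) {
    int s = 0;
    for (int i = 0; i < N; i++)
        s += rec[i].key + rec[i].val;  /* single stream, better spatial locality */
    return s;
}
```

The transformation pays off when the fields are usually accessed together; fields accessed in separate passes are better left in separate arrays.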
Cache-insensitive matrix multiplication: O(n^3) cache misses for accessing matrix b elements
for (i=0; i < n; i++)
for (j=0; j < n; j++)
for (k=0; k < n; k++)
c[i][j] += a[i][k] * b[k][j];
Reducing Miss Rate
Compiler/Programmer Optimizations (contd …)
d) Blocking: Use block-oriented access to maximize both temporal and spatial locality
Reduces cache misses for accessing matrix b elements from O(n^3) to O(n^3/s), for s x s blocks chosen to fit in cache
for (i=0; i < n/s; i++)
for (j=0; j < n/s; j++)
for (k=0; k < n/s; k++)
C[i][j] = C[i][j] + A[i][k] * B[k][j];
Here A[i][k], B[k][j], and C[i][j] denote s x s blocks: each inner iteration performs a block matrix multiplication of A[i][k] with B[k][j] and adds the result into block C[i][j].
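A runnable version of the blocked loop nest, as a sketch with illustrative sizes (n = 4, block size s = 2): the three inner loops expand the block operations written above into element operations.

```c
#include <assert.h>
#include <string.h>

#define N 4   /* matrix dimension (illustrative) */
#define S 2   /* block size, assumed to divide N */

void matmul_blocked(double a[N][N], double b[N][N], double c[N][N]) {
    memset(c, 0, sizeof(double) * N * N);
    for (int ib = 0; ib < N; ib += S)
        for (int jb = 0; jb < N; jb += S)
            for (int kb = 0; kb < N; kb += S)
                /* multiply block A[ib][kb] by block B[kb][jb],
                 * accumulate into block C[ib][jb] */
                for (int i = ib; i < ib + S; i++)
                    for (int j = jb; j < jb + S; j++)
                        for (int k = kb; k < kb + S; k++)
                            c[i][j] += a[i][k] * b[k][j];
}
```

While one s x s block of b is being multiplied it is reused s times, so it is likely still resident in the cache, which is where the miss reduction comes from.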