Chapter 5.1-5.6 Memory

Memory Hierarchy and Caches

Cache Memories in the Datapath
[Figure: pipelined datapath with an I-Cache in the fetch stage (addressed by the PC and delivering the instruction) and a D-Cache in the memory stage (Address, Data_in, Data_out), alongside the register file, immediate extension, and ALU. On an I-Cache or D-Cache miss, the block address is sent over the interface to the L2 cache or main memory and the missing instruction or data block is transferred in; an I-Cache miss or D-Cache miss causes the pipeline to stall.]
Four Basic Questions on Caches
• Q1: Where can a block be placed in a cache?
– Block placement
– Direct Mapped, Set Associative, Fully Associative
• Q2: How is a block found in a cache?
– Block identification
– Block address, tag, index
• Q3: Which block should be replaced on a cache miss?
– Block replacement
– FIFO, Random, LRU
• Q4: What happens on a write?
– Write strategy
– Write Back or Write Through cache (with Write Buffer)
Inside a Cache Memory
[Figure: the processor exchanges addresses and data with the cache, which in turn exchanges addresses and data with main memory. The cache holds N cache blocks, each paired with an address tag (Address Tag 0 ... Address Tag N – 1); the tags identify which blocks are currently in the cache.]
• Cache Block (or Cache Line)
– Unit of data transfer between main memory and a cache
– Large block size → less tag overhead + burst transfer from DRAM
– Typically, cache block size = 64 bytes in recent caches
Block Placement: Direct Mapped
• Block: unit of data transfer between cache and memory
• Direct Mapped Cache:
– A block can be placed in exactly one location in the cache

[Figure: a direct-mapped cache with 8 blocks (indices 000–111) and a main memory of 32 blocks (addresses 00000–11111). In this example, the cache index is the least significant 3 bits of the memory address, so each memory block maps to exactly one cache block.]
Direct-Mapped Cache
• A memory address is divided into
– Block address: identifies the block in memory
– Block offset: used to access bytes within a block
• The block address is further divided into
– Index: used for direct cache access
– Tag: the most-significant bits of the block address
• Index = Block Address mod Number of Cache Blocks
• The tag must also be stored inside the cache
– Needed for block identification
• A valid bit is also required to indicate
– Whether a cache block is valid or not

[Figure: the address is split into Tag | Index | offset; the index selects a (V, Tag, Block Data) entry, the stored tag is compared against the address tag, and a match with a set valid bit signals a Hit and delivers the Data.]


Direct-Mapped Cache – cont’d
• Cache hit: the block is stored inside the cache
– The index is used to access the cache block
– The address tag is compared against the stored tag
– If they are equal and the cache block is valid, then it is a hit
– Otherwise: cache miss
• If the number of cache blocks is 2^n
– n bits are used for the cache index
• If the number of bytes in a block is 2^b
– b bits are used for the block offset
• If 32 bits are used for an address
– 32 – n – b bits are used for the tag
• Cache data size = 2^(n+b) bytes


Mapping an Address to a Cache Block
• Example
– Consider a direct-mapped cache with 256 blocks
– Block size = 16 bytes
– Compute tag, index, and byte offset of address: 0x01FFF8AC
• Solution
– The 32-bit address is divided into fields: Tag (20 bits) | Index (8 bits) | offset (4 bits)
• 4-bit byte offset field, because block size = 2^4 = 16 bytes
• 8-bit cache index, because there are 2^8 = 256 blocks in the cache
• 20-bit tag field
– Byte offset = 0xC = 12 (least significant 4 bits of the address)
– Cache index = 0x8A = 138 (next lower 8 bits of the address)
– Tag = 0x01FFF (upper 20 bits of the address)
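To make the field arithmetic concrete, here is a minimal Python sketch (the function name split_address is ours, not from the slides) that decomposes an address under the parameters above:

```python
def split_address(addr, block_size=16, num_blocks=256):
    # Field widths follow the slide: b offset bits, n index bits
    b = block_size.bit_length() - 1       # 16 bytes   -> b = 4
    n = num_blocks.bit_length() - 1       # 256 blocks -> n = 8
    offset = addr & ((1 << b) - 1)        # least significant b bits
    index = (addr >> b) & ((1 << n) - 1)  # next n bits
    tag = addr >> (b + n)                 # remaining upper bits
    return tag, index, offset

print([hex(f) for f in split_address(0x01FFF8AC)])
# ['0x1fff', '0x8a', '0xc'] -- the tag, index, and offset computed above
```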

Example on Cache Placement & Misses
• Consider a small direct-mapped cache with 32 blocks
– Cache is initially empty, Block size = 16 bytes
– The following memory addresses (in decimal) are referenced:
1000, 1004, 1008, 2548, 2552, 2556.
– Map addresses to cache blocks and indicate whether hit or miss
• Solution: the address is divided into Tag (23 bits) | Index (5 bits) | offset (4 bits)
– 1000 = 0x3E8 → cache index = 0x1E → Miss (first access)
– 1004 = 0x3EC → cache index = 0x1E → Hit
– 1008 = 0x3F0 → cache index = 0x1F → Miss (first access)
– 2548 = 0x9F4 → cache index = 0x1F → Miss (different tag)
– 2552 = 0x9F8 → cache index = 0x1F → Hit
– 2556 = 0x9FC → cache index = 0x1F → Hit
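The same index/tag arithmetic can be checked mechanically. The following sketch (simulate_direct_mapped is an illustrative name) replays the reference sequence above and reproduces the hit/miss outcomes:

```python
def simulate_direct_mapped(addresses, num_blocks=32, block_size=16):
    b = block_size.bit_length() - 1   # 4 offset bits
    n = num_blocks.bit_length() - 1   # 5 index bits
    cache = {}                        # index -> stored tag (valid entries only)
    for addr in addresses:
        index = (addr >> b) & ((1 << n) - 1)
        tag = addr >> (b + n)
        outcome = "Hit" if cache.get(index) == tag else "Miss"
        cache[index] = tag            # allocate/replace the block on a miss
        print(f"{addr}: index = 0x{index:02X} -> {outcome}")

simulate_direct_mapped([1000, 1004, 1008, 2548, 2552, 2556])
```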
Fully Associative Cache
• A block can be placed anywhere in the cache → no indexing
• If m blocks exist, then
– m comparators are needed to match the tag
– Cache data size = m × 2^b bytes

[Figure: the address is split into Tag | offset only; all m (V, Tag, Block Data) entries are compared in parallel, and a multiplexer selects the Data from the matching way; any match signals a Hit.]
Set-Associative Cache
• A set is a group of blocks that can be indexed
• A block is first mapped onto a set
– Set index = Block Address mod Number of Sets in the cache
• If there are m blocks in a set (m-way set associative), then
– m tags are checked in parallel using m comparators
• If 2^n sets exist, then the set index consists of n bits
• Cache data size = m × 2^(n+b) bytes (with 2^b bytes per block)
– Without counting tags and valid bits
• A direct-mapped cache has one block per set (m = 1)
• A fully-associative cache has one set (2^n = 1, i.e. n = 0)
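A small sketch of the geometry arithmetic on this slide, assuming a 32-bit address (the function name and the example parameters are ours, chosen for illustration):

```python
def cache_geometry(data_capacity, block_size, m, addr_bits=32):
    num_blocks = data_capacity // block_size
    num_sets = num_blocks // m           # 2^n sets
    n = num_sets.bit_length() - 1        # set-index bits
    b = block_size.bit_length() - 1      # block-offset bits
    return {"sets": num_sets, "index_bits": n,
            "offset_bits": b, "tag_bits": addr_bits - n - b}

# Example: 32 KB of data, 64-byte blocks, 4-way set-associative
print(cache_geometry(32 * 1024, 64, 4))
# {'sets': 128, 'index_bits': 7, 'offset_bits': 6, 'tag_bits': 19}
```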
Set-Associative Cache Diagram
[Figure: the address is split into Tag | Index | offset; the index selects one set, the m (V, Tag, Block Data) entries of that set are compared in parallel, and a multiplexer selects the Data from the matching way; a match signals a Hit.]
Write Policy
• Write Through:
– Writes update cache and lower-level memory
– Cache control bit: only a Valid bit is needed
– Memory always has latest data, which simplifies data coherency
– Can always discard cached data when a block is replaced
• Write Back:
– Writes update cache only
– Cache control bits: Valid and Modified bits are required
– Modified cached data is written back to memory when replaced
– Multiple writes to a cache block require only one write to memory
– Uses less memory bandwidth than write-through and less power
– However, more complex to implement than write through
Write Miss Policy
• What happens on a write miss?
• Write Allocate:
– Allocate new block in cache
– Write miss acts like a read miss, block is fetched and updated
• No Write Allocate:
– Send data to lower-level memory
– Cache is not modified
• Typically, write-back caches use write-allocate
– Hoping that subsequent writes will be captured in the cache
• Write-through caches often use no-write-allocate
– Reasoning: writes must still go to lower-level memory
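A minimal sketch contrasting the two write-miss policies (the class, its fields, and the whole-block write are illustrative simplifications, not from the slides):

```python
class WriteCache:
    def __init__(self, write_allocate):
        self.write_allocate = write_allocate
        self.blocks = {}                    # block address -> cached data

    def write(self, block_addr, data, memory):
        if block_addr in self.blocks:
            self.blocks[block_addr] = data  # write hit: update the cache
        elif self.write_allocate:
            # Write allocate: the miss behaves like a read miss --
            # fetch the block into the cache, then update it
            self.blocks[block_addr] = memory.get(block_addr)
            self.blocks[block_addr] = data
        else:
            memory[block_addr] = data       # no-write-allocate: bypass cache
```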

Write Buffer
• Decouples the CPU write from the memory bus writing
– Permits writes to occur without stall cycles until buffer is full

• Write-through: all stores are sent to lower level memory


– Write buffer eliminates processor stalls on consecutive writes

• Write-back: modified blocks are written when replaced


– Write buffer is used for evicted blocks that must be written back

• The address and modified data are written in the buffer


– The write is finished from the CPU perspective
– CPU continues while the write buffer prepares to write memory

• If buffer is full, CPU stalls until buffer has an empty entry


What Happens on a Cache Miss?
• Cache sends a miss signal to stall the processor
• Decide which cache block to allocate/replace
– Only one choice when the cache is direct-mapped
– Multiple choices for set-associative or fully-associative cache
• If block to be replaced is modified then write it back
– Modified block is moved into a Write Buffer
– Otherwise, block to be replaced can be simply discarded
• Transfer the block from lower level memory to this cache
– Set the valid bit and the tag field from the upper address bits
• Restart the instruction that caused the cache miss
• Miss Penalty: clock cycles to process a cache miss
Replacement Policy
• Which block should be replaced on a cache miss?
• No selection alternatives for direct-mapped caches
• m blocks per set to choose from for associative caches
• Random replacement
– Candidate blocks are randomly selected
– One counter for all sets (0 to m – 1): incremented on every cycle
– On a cache miss replace block specified by counter
• First In First Out (FIFO) replacement
– Replace oldest block in set
– One counter per set (0 to m – 1): specifies oldest block to replace
– Counter is incremented on a cache miss

Replacement Policy – cont’d
• Least Recently Used (LRU)
– Replace block that has been unused for the longest time
– Order blocks within a set from least to most recently used
– Update ordering of blocks on each cache hit
– With m blocks per set, there are m! possible permutations

• Pure LRU is too costly to implement when m > 2


– m = 2, there are 2 permutations only (a single bit is needed)
– m = 4, there are 4! = 24 possible permutations
– LRU approximation is used in practice

• For large m (> 4), random replacement can be as effective as LRU
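As a reference point for the cost discussion above, here is a sketch of exact LRU bookkeeping for a single set (LRUSet is an illustrative name; real hardware uses the cheaper approximations mentioned above):

```python
from collections import OrderedDict

class LRUSet:
    def __init__(self, m):
        self.m = m                           # associativity: blocks per set
        self.blocks = OrderedDict()          # tags ordered least -> most recent

    def access(self, tag):
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # hit: mark most recently used
            return "hit"
        if len(self.blocks) >= self.m:
            self.blocks.popitem(last=False)  # set full: evict the LRU block
        self.blocks[tag] = None              # allocate the new block
        return "miss"

s = LRUSet(m=2)
print([s.access(t) for t in ["A", "B", "A", "C", "B"]])
# ['miss', 'miss', 'hit', 'miss', 'miss'] -- C evicts B (the LRU block),
# so the final access to B misses again
```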
Hit Rate and Miss Rate
• Hit Rate = Hits / (Hits + Misses)
• Miss Rate = Misses / (Hits + Misses)
• I-Cache Miss Rate = Miss rate in the Instruction Cache
• D-Cache Miss Rate = Miss rate in the Data Cache
• Example:
– Out of 1000 instructions fetched, 150 missed in the I-Cache
– 25% are load-store instructions, 50 missed in the D-Cache
– What are the I-cache and D-cache miss rates?

• I-Cache Miss Rate = 150 / 1000 = 15%


• D-Cache Miss Rate = 50 / (25% × 1000) = 50 / 250 = 20%
Memory Stall Cycles
• The processor stalls on a Cache miss
– When fetching instructions from the Instruction Cache (I-cache)
– When loading or storing data into the Data Cache (D-cache)

Memory stall cycles = Combined Misses × Miss Penalty


• Miss Penalty: clock cycles to process a cache miss
Combined Misses = I-Cache Misses + D-Cache Misses
I-Cache Misses = I-Count × I-Cache Miss Rate
D-Cache Misses = LS-Count × D-Cache Miss Rate
LS-Count (Load & Store) = I-Count × LS Frequency
• Cache misses are often reported per thousand instructions
Memory Stall Cycles Per Instruction
• Memory Stall Cycles Per Instruction =

Combined Misses Per Instruction × Miss Penalty


• Miss Penalty is assumed equal for I-cache & D-cache
• Miss Penalty is assumed equal for Load and Store
• Combined Misses Per Instruction =

I-Cache Miss Rate + LS Frequency × D-Cache Miss Rate


• Therefore, Memory Stall Cycles Per Instruction =

I-Cache Miss Rate × Miss Penalty +


LS Frequency × D-Cache Miss Rate × Miss Penalty
Example on Memory Stall Cycles
• Consider a program with the given characteristics
– Instruction count (I-Count) = 10^6 instructions
– 30% of instructions are loads and stores
– D-cache miss rate is 5% and I-cache miss rate is 1%
– Miss penalty is 100 clock cycles for instruction and data caches
– Compute combined misses per instruction and memory stall cycles
• Combined misses per instruction in I-Cache and D-Cache
– 1% + 30% × 5% = 0.025 combined misses per instruction
– Equal to 25 misses per 1000 instructions
• Memory stall cycles
– 0.025 × 100 (miss penalty) = 2.5 stall cycles per instruction
– Total memory stall cycles = 10^6 × 2.5 = 2,500,000
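The arithmetic above fits in a few lines; this is a sketch using the slide's values (the function name memory_stalls is ours):

```python
def memory_stalls(i_count, ls_freq, i_miss_rate, d_miss_rate, penalty):
    misses_per_instr = i_miss_rate + ls_freq * d_miss_rate  # combined misses
    stalls_per_instr = misses_per_instr * penalty
    return misses_per_instr, stalls_per_instr, i_count * stalls_per_instr

print(memory_stalls(10**6, 0.30, 0.01, 0.05, 100))
# ~ (0.025, 2.5, 2_500_000) -- up to floating-point rounding
```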
CPU Time with Memory Stall Cycles

CPU Time = I-Count × CPI_MemoryStalls × Clock Cycle

CPI_MemoryStalls = CPI_PerfectCache + Memory Stalls per Instruction

• CPI_PerfectCache = CPI for an ideal cache (no cache misses)
• CPI_MemoryStalls = CPI in the presence of memory stalls
• Memory stall cycles increase the CPI
Example on CPI with Memory Stalls
• A processor has CPI of 1.5 without any memory stalls
– Cache miss rate is 2% for instruction and 5% for data
– 20% of instructions are loads and stores
– Cache miss penalty is 100 clock cycles for I-cache and D-cache

• What is the impact on the CPI?


• Answer:
Memory Stalls per Instruction = 0.02 × 100 (instruction) + 0.2 × 0.05 × 100 (data) = 3
CPI_MemoryStalls = 1.5 + 3 = 4.5 cycles per instruction
CPI_MemoryStalls / CPI_PerfectCache = 4.5 / 1.5 = 3
The processor is 3 times slower due to memory stall cycles
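The same computation as a sketch (cpi_with_stalls is an illustrative name), reproducing the numbers above:

```python
def cpi_with_stalls(cpi_perfect, i_mr, ls_freq, d_mr, penalty):
    # Stall cycles per instruction: I-cache misses plus D-cache misses
    stalls = i_mr * penalty + ls_freq * d_mr * penalty
    return cpi_perfect + stalls

cpi = cpi_with_stalls(1.5, 0.02, 0.20, 0.05, 100)
print(cpi, cpi / 1.5)   # ~4.5 cycles per instruction, ~3x slowdown
```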
Average Memory Access Time
• Average Memory Access Time (AMAT)

AMAT = Hit time + Miss rate × Miss penalty


• Time to access a cache for both hits and misses
• Example: Find the AMAT for a cache with
– Cache access time (Hit time) of 1 cycle = 2 ns
– Miss penalty of 20 clock cycles
– Miss rate of 0.05 per access

• Solution:
AMAT = 1 + 0.05 × 20 = 2 cycles = 4 ns
Without the cache, the AMAT would equal the miss penalty = 20 cycles
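A one-line check of the AMAT arithmetic, assuming the 2 ns cycle time stated above:

```python
hit_time, miss_rate, miss_penalty = 1, 0.05, 20        # in clock cycles
amat_cycles = hit_time + miss_rate * miss_penalty      # 1 + 0.05 * 20
print(amat_cycles, "cycles =", amat_cycles * 2, "ns")  # 2.0 cycles = 4.0 ns
```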
Improving Cache Performance
• Average Memory Access Time (AMAT)

AMAT = Hit time + Miss rate × Miss penalty

• Used as a framework for optimizations


• Reduce the Hit time
– Small and simple caches

• Reduce the Miss Rate


– Larger cache size, higher associativity, and larger block size

• Reduce the Miss Penalty


– Multilevel caches