3 - 2 Memory Performance Overview Notes

The document discusses memory hierarchy design and cache performance in computer architecture, focusing on strategies to improve cache performance by reducing miss penalties and miss rates. It provides examples and calculations related to cache performance metrics such as CPI and average memory access time. Additionally, it covers techniques like multilevel caches, critical word first, and non-blocking caches to enhance overall system efficiency.

CpE 440 Computer Architecture
Dr. Haithem Al-Mefleh
Computer Engineering Department, Yarmouk University
8/7/2022

Memory Hierarchy Design

Cache Performance

CPU time = IC × (CPI_execution + Memory stall cycles per instruction) × Clock cycle time

Memory stall cycles per instruction = Misses per instruction × Miss penalty
  = Memory accesses per instruction × Miss rate × Miss penalty

Example

Assume CPI_hit = 1, a 50-cycle miss penalty, a 1% miss rate for both instruction and data accesses, and that 50% of instructions are loads/stores.

CPI_hit = 1
CPI_stalls = 1 + 50×1% + 50×1%×50% = 1.75

Using misses per instruction:
Misses per instruction = 1%×(1 + 0.5) = 0.015
  = 15 misses per 1000 instructions

CPI = 1 + 0.015×50 = 1.75
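A quick check of the two equivalent calculations in Python (all values are the example's own):

```python
miss_penalty = 50   # cycles per miss (from the example)
miss_rate = 0.01    # 1% for instruction and data accesses
data_frac = 0.5     # fraction of instructions that are loads/stores
cpi_hit = 1.0       # base CPI assuming every access hits

# Way 1: add instruction-fetch and data-access stall cycles separately.
cpi_stalls = cpi_hit + miss_penalty * miss_rate + miss_penalty * miss_rate * data_frac

# Way 2: fold both access streams into misses per instruction.
misses_per_instr = miss_rate * (1 + data_frac)   # 0.015, i.e. 15 per 1000 instructions
cpi_alt = cpi_hit + misses_per_instr * miss_penalty

print(cpi_stalls, cpi_alt)  # 1.75 1.75
```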

Average Memory Access Time

AMAT = Hit time + Miss rate × Miss penalty
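A one-line helper makes the formula concrete. The sample values below are illustrative: the 1-cycle hit and 50-cycle penalty echo the previous example, and the 4% miss rate is the L1 rate from the multilevel example further down.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same units as the inputs."""
    return hit_time + miss_rate * miss_penalty

# Illustrative values: 1-cycle hit, 4% miss rate, 50-cycle miss penalty.
print(amat(1, 0.04, 50))  # 3.0
```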

Example

[Slide example given in misses per 1000 instructions; the figures are not recoverable from this extraction.]

Improve Cache Performance

• Reduce Miss Penalty
  – Multilevel caches, reads priority over writes, critical word first, write buffers, victim caches
• Reduce Miss Rate
  – Larger blocks, larger cache, higher associativity, way prediction, compiler optimization
• Reduce Miss Penalty or Miss Rate with Parallelism
  – Non-blocking caches, hardware prefetching, compiler prefetching
• Reduce Hit Time
  – Small/simple caches, avoid address translation, pipelined cache access, trace cache

Miss Penalty Reduction: Multilevel Caches

• First level – small enough to match the CPU speed
• Second level – large enough to capture many accesses that would otherwise go to main memory

• Local miss rate
  – number of misses in a cache / total number of memory accesses to this cache
• Global miss rate
  – number of misses in a cache / total number of memory accesses generated by the CPU

• First level – Local = Global = Miss rate_L1
• Second level – Local = Miss rate_L2; Global = Miss rate_L1 × Miss rate_L2

Example

For 1000 memory references (combined reads and writes, write-back L1), suppose 40 miss in L1 and 20 miss in L2:
• L1: 40/1000 = 4% (local = global)
• L2: local 20/40 = 50%, global 20/1000 = 2%
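The same arithmetic as a short script:

```python
refs = 1000       # memory references issued by the CPU (from the example)
l1_misses = 40    # misses in L1, i.e. accesses that reach L2
l2_misses = 20    # misses in L2

l1_miss_rate = l1_misses / refs     # 0.04 -- local = global for L1
l2_local = l2_misses / l1_misses    # 0.50 -- relative to accesses reaching L2
l2_global = l2_misses / refs        # 0.02 -- relative to all CPU references
print(l1_miss_rate, l2_local, l2_global)
```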


With 1.5 memory references per instruction, misses per 1000 instructions:
• L1: 40 × 1.5 = 60
• L2: 20 × 1.5 = 30

Memory stall cycles per instruction
  = (AMAT – Hit time_L1) × (average number of memory references per instruction)
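A sketch of the stall-cycle calculation. Only the miss counts and the 1.5 references per instruction come from the notes; the L1/L2 hit times and the memory miss penalty below are assumed values for illustration.

```python
hit_l1 = 1          # assumed L1 hit time (cycles)
hit_l2 = 10         # assumed L2 hit time (cycles)
mem_penalty = 200   # assumed L2 miss penalty to main memory (cycles)
refs_per_instr = 1.5                                  # from the notes

amat = hit_l1 + 0.04 * (hit_l2 + 0.50 * mem_penalty)  # two-level AMAT = 5.4
stalls = (amat - hit_l1) * refs_per_instr             # 6.6 cycles per instruction

# Equivalent view using misses per instruction (60 and 30 per 1000):
stalls_alt = 0.060 * hit_l2 + 0.030 * mem_penalty     # 0.6 + 6.0 = 6.6
print(amat, stalls, stalls_alt)
```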

Example – Associativity for L2

[Slide example; the worked numbers are not recoverable from this extraction.]
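The usual comparison looks like this; every value below is an assumption for illustration (a 2-way L2 cuts the local miss rate at the cost of a slightly longer hit time, and the effective L1 miss penalty decides which wins):

```python
mem_penalty = 200   # assumed miss penalty from L2 to main memory (cycles)

# Assumed direct-mapped L2: 10-cycle hit, 25% local miss rate.
penalty_dm = 10 + 0.25 * mem_penalty      # effective L1 miss penalty = 60.0 cycles
# Assumed 2-way L2: slightly slower hit (10.1 cycles), 20% local miss rate.
penalty_2way = 10.1 + 0.20 * mem_penalty  # effective L1 miss penalty = 50.1 cycles

print(penalty_dm, penalty_2way)  # higher associativity wins under these assumptions
```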

Miss Penalty Reduction: Critical Word First & Early Restart

• Normally, the CPU needs just one word of the block
• Critical Word First (also called wrapped fetch or requested word first)
  – Request the missed word first from memory
  – Send it to the CPU as soon as it arrives
  – The CPU continues execution while the rest of the block is filled
• Early Restart
  – Fetch the words in normal order
  – Send the requested word to the CPU as soon as it arrives
  – Let the CPU continue execution
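A minimal sketch of the wrapped-fetch word ordering (with early restart, memory would instead return words 0, 1, 2, ... and the CPU would resume when the requested word arrives):

```python
def wrapped_fetch_order(block_words, requested):
    """Order in which words arrive under critical-word-first:
    the missed word first, then wrap around the rest of the block."""
    return [(requested + i) % block_words for i in range(block_words)]

# 8-word block, miss on word 5: word 5 reaches the CPU immediately,
# and the cache fills 6, 7, 0, 1, ... while execution continues.
print(wrapped_fetch_order(8, 5))  # [5, 6, 7, 0, 1, 2, 3, 4]
```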

Miss Rate Reduction: Larger Block Size

• Exploits spatial locality
• Reduces compulsory misses
• Increases the miss penalty
• Can increase capacity or conflict misses, especially in small caches
• A large number of small blocks may reduce conflict misses
• A complex trade-off between the cache's size and the miss penalty


Miss Rate Reduction: Larger Block Size

• Lower-level memory with high latency & high bandwidth
  – Large block size: the cache gets more bytes per miss for a small increase in miss penalty
• Lower-level memory with low latency & low bandwidth
  – Small block size: little time is saved by a larger block

Example

Assume hit time = 1. [Slide table of AMAT versus block size; the numbers are not recoverable from this extraction.]
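The shape of the trade-off can still be sketched. The memory model (80-cycle overhead plus 2 cycles per 16 bytes) and the miss rates below are assumptions for illustration, not the slide's data:

```python
def amat_for_block(miss_rate, block_bytes, hit_time=1):
    # Assumed memory model: fixed 80-cycle overhead, then 16 bytes every 2 cycles.
    miss_penalty = 80 + 2 * (block_bytes // 16)
    return hit_time + miss_rate * miss_penalty

# Assumed miss rates: larger blocks exploit spatial locality until
# the higher penalty (and extra conflict misses) win out.
for block, rate in [(16, 0.035), (32, 0.026), (64, 0.021), (128, 0.019), (256, 0.020)]:
    print(f"{block:3d}B  AMAT = {amat_for_block(rate, block):.3f} cycles")
```

Under these assumptions AMAT bottoms out at an intermediate block size (128 B here), which is the complex trade-off the previous slide describes.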

Miss Rate Reduction: Larger Caches

• Reduce capacity misses
• Longer hit time
• Higher cost / power

Miss Rate Reduction: Way Prediction

• Reduce conflict misses while keeping the hit speed of a direct-mapped cache
• Extra bits predict the way (the block within the set) of the next cache access
  – The mux is set early to select the predicted block, and in that clock cycle a single tag comparison is done in parallel with reading the cache data
• On a misprediction:
  – check the other blocks in the next clock cycle
  – change the prediction bits
  – higher latency
• Can make pipelining hard
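A toy model of the idea; the 1-cycle/2-cycle latencies and the trivial fill policy are assumptions, and the point is only the predicted-way-first lookup:

```python
class PredictedSet:
    """One set of a 2-way set-associative cache with a per-set way predictor."""
    def __init__(self, ways=2):
        self.tags = [None] * ways
        self.predicted = 0                  # the extra prediction bits

    def access(self, tag, fast=1, slow=2):
        if self.tags[self.predicted] == tag:
            return fast                     # predicted way hits: direct-mapped speed
        for way, t in enumerate(self.tags):
            if t == tag:                    # hit in another way, checked next cycle
                self.predicted = way        # update the prediction bits
                return slow
        victim = self.tags.index(None) if None in self.tags else self.predicted
        self.tags[victim] = tag             # miss: simple assumed fill policy
        self.predicted = victim             # (a real miss also pays the miss penalty)
        return slow

s = PredictedSet()
print(s.access("A"), s.access("A"), s.access("B"), s.access("A"))
# 2 (miss/fill), 1 (predicted hit), 2 (miss/fill), 2 (hit in the other way)
```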


Miss Penalty/Rate Reduction via Parallelism: Nonblocking Caches to Reduce Stalls on Cache Misses

• With a pipelined architecture and out-of-order completion, the CPU doesn't have to stall on a cache miss
  – The CPU may continue fetching instructions from the instruction cache while waiting for the data cache
• A nonblocking, or lockup-free, cache allows a "hit under miss"
  – Significant increase in complexity
• "Hit under multiple miss", or "miss under miss", overlaps several outstanding misses

Hit Time Reduction: Small and Simple Caches

• L1
  – Smaller hardware is faster
  – Fits on the same chip as the processor
  – Simple cache (direct-mapped)
    – Overlap the tag check with the transmission of the data
    – Lower power
  – Associativity and the number of blocks determine the number of rows that are accessed
  – Keeping the cache small by increasing the block size could increase the miss rate
• L2
  – Keep the tags on chip for a fast check, with the data off chip for the greater capacity of memory chips

Main Memory Organization for Improving Performance

• It is difficult to reduce the latency of fetching the first word from main memory
• The miss penalty can be decreased by increasing the bandwidth from main memory to the cache
  – Larger blocks can then have a miss penalty close to that of a smaller block
• The processor is connected to main memory over a bus
  – The clock rate of the bus is usually much slower than the processor
  – The speed of the bus affects the miss penalty

[Slide figure: 4-way interleaved memory.]

Main Memory Organization for Improving Performance

Example – Block: 4 words, Word: 8 bytes. Assume 4 cycles to send the address, 56 cycles of access time per word, and 4 cycles to send a word of data.

a) One-word-wide memory: 4×(4+56+4) = 256 cycles, bandwidth of 1/8 bytes/cycle
b) Two-word-wide memory: 2×(4+56+4) = 128 cycles, bandwidth of 1/4 bytes/cycle
c) Four-way interleaved memory: 4+56+(4×4) = 76 cycles, bandwidth of ≈0.4 bytes/cycle
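The three miss penalties and bandwidths, checked in Python (the timing parameters are the example's own):

```python
addr, access, xfer = 4, 56, 4     # cycles: send address, access one word, send one word
words, word_bytes = 4, 8          # 4-word block of 8-byte words
block_bytes = words * word_bytes  # 32 bytes per block

one_word_wide = words * (addr + access + xfer)          # 4 x 64 = 256 cycles
two_word_wide = (words // 2) * (addr + access + xfer)   # 2 x 64 = 128 cycles
interleaved   = addr + access + words * xfer            # 4 + 56 + 16 = 76 cycles

for name, cycles in [("one-word-wide", one_word_wide),
                     ("two-word-wide", two_word_wide),
                     ("4-way interleaved", interleaved)]:
    print(f"{name}: {cycles} cycles, {block_bytes / cycles:.3f} bytes/cycle")
```

Interleaving overlaps the four word accesses in separate banks, so only the word transfers serialize, which is why it beats even the wider (and more expensive) bus.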

Thank you