CS530 Fall 2015 Lecture 6
Memory: Chapter 2 & Appendix B, Part 3
Gregory D. Peterson
[email protected]
Introduction
Memory Hierarchy Basics
• Q1: Where can a block be placed in the upper level? (Block placement)
• Q2: How is a block found if it is in the upper level? (Block identification)
• Q3: Which block should be replaced on a miss? (Block replacement)
• Q4: What happens on a write?
• Six basic cache optimizations:
  – Larger block size
    • Reduces compulsory misses
    • Increases capacity and conflict misses, increases miss penalty
  – Larger total cache capacity to reduce miss rate
    • Increases hit time, increases power consumption
  – Higher associativity
    • Reduces conflict misses
    • Increases hit time, increases power consumption
  – Higher number of cache levels
    • Reduces overall memory access time
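The trade-offs in the basic optimizations above are commonly weighed with the average memory access time, AMAT = hit time + miss rate × miss penalty. The following is a minimal sketch of that arithmetic for one versus two cache levels. All latencies and miss rates are made-up illustrative values, not figures from the lecture, and the helper name `amat` is hypothetical.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Illustrative numbers only (assumptions, in clock cycles).
L1_HIT, L1_MISS_RATE = 1, 0.05
L2_HIT, L2_MISS_RATE = 10, 0.20   # local miss rate of the second level
MEMORY_LATENCY = 200

# One level: every L1 miss pays the full main-memory latency.
one_level = amat(L1_HIT, L1_MISS_RATE, MEMORY_LATENCY)

# Two levels: the L1 miss penalty becomes the AMAT of the L2 cache.
l2_amat = amat(L2_HIT, L2_MISS_RATE, MEMORY_LATENCY)
two_level = amat(L1_HIT, L1_MISS_RATE, l2_amat)

print(f"one level:  {one_level:.2f} cycles")
print(f"two levels: {two_level:.2f} cycles")
```

Under these assumed numbers the extra level lowers the average access time because most L1 misses are served by L2 instead of main memory.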
Copyright © 2012, Elsevier Inc. All rights reserved.
Advanced Optimizations
9/14/15
L1 Size and Associativity
Cache Example
Way Prediction
• To improve hit time, predict the way to pre-set mux
  – Mis-prediction gives longer hit time
  – Prediction accuracy
    • > 90% for two-way
    • > 80% for four-way
    • I-cache has better accuracy than D-cache
  – First used on MIPS R10000 in mid-90s
  – Used on ARM Cortex-A8
• Extend to predict block as well
  – “Way selection”
  – Increases mis-prediction penalty
Pipelining Cache
• Pipeline cache access to improve bandwidth
  – Examples:
    • Pentium: 1 cycle
    • Pentium Pro – Pentium III: 2 cycles
    • Pentium 4 – Core i7: 4 cycles
• Increases branch mis-prediction penalty
• Makes it easier to increase associativity
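The way-prediction idea can be illustrated with a toy model: each set remembers which way to try first, a correct guess counts as a fast hit, and a hit in the other way counts as a slower hit. This is a sketch under simplifying assumptions (two ways, prediction equals the most recently used way, no timing). The class name `WayPredictedCache` and the address trace are hypothetical and do not describe how the R10000 or Cortex-A8 implement it.

```python
class WayPredictedCache:
    """Toy 2-way set-associative cache with one predicted way per set."""

    def __init__(self, num_sets, block_size):
        self.num_sets = num_sets
        self.block_size = block_size
        self.tags = [[None, None] for _ in range(num_sets)]
        self.predicted = [0] * num_sets   # way to check first
        self.lru = [0] * num_sets         # way to evict on the next miss
        self.fast_hits = self.slow_hits = self.misses = 0

    def access(self, address):
        block = address // self.block_size
        index, tag = block % self.num_sets, block // self.num_sets
        ways = self.tags[index]
        guess = self.predicted[index]
        if ways[guess] == tag:            # prediction correct: fast hit
            self.fast_hits += 1
            way = guess
        elif ways[1 - guess] == tag:      # mis-prediction: longer hit time
            self.slow_hits += 1
            way = 1 - guess
        else:                             # miss: fill the LRU way
            self.misses += 1
            way = self.lru[index]
            ways[way] = tag
        # Predict the way just used; the other way becomes the LRU victim.
        self.predicted[index] = way
        self.lru[index] = 1 - way


cache = WayPredictedCache(num_sets=4, block_size=16)
for addr in [0, 64, 0, 0, 64, 64]:        # both addresses map to set 0
    cache.access(addr)
print(cache.fast_hits, cache.slow_hits, cache.misses)
```

Alternating between two blocks in the same set defeats the predictor, while repeated accesses to the same block hit in the predicted way.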
Critical Word First, Early Restart
Merging Write Buffer
Compiler Optimizations
• Blocking
  – Instead of accessing entire rows or columns, subdivide matrices into blocks
  – Requires more memory accesses but improves locality of accesses
Hardware Prefetching
Pentium 4 Pre-fetching
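To make the blocking bullet under Compiler Optimizations concrete, here is a sketch of a blocked matrix multiply next to the plain triple loop. The function names, the block size, and the test matrices are assumptions for illustration. Python lists will not show the cache benefit, so only the loop structure matters here: each pass works on a `block`-wide strip of `b` that would fit in cache on real hardware.

```python
def matmul_naive(a, b, n):
    """Plain i-j-k multiply: walks whole rows of a and whole columns of b."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, n, block):
    """Same product, computed one block-by-block tile of b at a time."""
    c = [[0] * n for _ in range(n)]
    for jj in range(0, n, block):
        for kk in range(0, n, block):
            for i in range(n):
                for j in range(jj, min(jj + block, n)):
                    partial = 0
                    for k in range(kk, min(kk + block, n)):
                        partial += a[i][k] * b[k][j]
                    # c[i][j] is revisited once per kk tile: more memory
                    # accesses in total, but better locality on b.
                    c[i][j] += partial
    return c


n = 5
a = [[i + j for j in range(n)] for i in range(n)]
b = [[i * j + 1 for j in range(n)] for i in range(n)]
print(matmul_blocked(a, b, n, block=2) == matmul_naive(a, b, n))
```

The `min(...)` bounds handle matrix sizes that are not a multiple of the block size, as in the 5×5 example with `block=2`.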
• Register prefetch
– Loads data into register
• Cache prefetch
– Loads data into cache