CS530 Fall 2015, Lecture 6

The document discusses various cache optimizations in memory hierarchy, including block placement, identification, replacement strategies, and advanced techniques such as way prediction and nonblocking caches. It emphasizes the importance of cache size, associativity, and energy efficiency in improving hit time and reducing miss penalties. Additionally, it covers compiler and hardware prefetching strategies to enhance memory access performance.


9/14/15

Memory: Chapter 2 & Appendix B, Part 3
Gregory D. Peterson
[email protected]

Associative Cache Example
[Figure: associative cache example]

Introduction

Memory Hierarchy Basics
• Q1: Where can a block be placed in the upper level? (Block placement)
• Q2: How is a block found if it is in the upper level? (Block identification)
• Q3: Which block should be replaced on a miss? (Block replacement)
• Q4: What happens on a write? (Write strategy)

Memory Hierarchy Basics
• Six basic cache optimizations:
– Larger block size
• Reduces compulsory misses
• Increases capacity and conflict misses, increases miss penalty
– Larger total cache capacity to reduce miss rate
• Increases hit time, increases power consumption
– Higher associativity
• Reduces conflict misses
• Increases hit time, increases power consumption
– Higher number of cache levels
• Reduces overall memory access time
– Giving priority to read misses over writes
• Reduces miss penalty
– Avoiding address translation in cache indexing
• Reduces hit time

Copyright © 2012, Elsevier Inc. All rights reserved.
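Each of the basic optimizations above trades off the terms of the average memory access time, AMAT = hit time + miss rate × miss penalty, with the penalty term applied recursively when there are multiple cache levels. A minimal sketch (the cycle counts and miss rates are illustrative assumptions, not figures from the lecture):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical single-level cache: 1-cycle hit, 5% miss rate, 100-cycle memory.
single = amat(1, 0.05, 100)      # 6.0 cycles

# Adding an L2 (10-cycle hit, 40% local miss rate) replaces the flat
# 100-cycle penalty with the L2's own AMAT -- the "higher number of
# cache levels" optimization above.
l2_amat = amat(10, 0.40, 100)    # 50.0 cycles
two_level = amat(1, 0.05, l2_amat)  # 3.5 cycles

print(single, two_level)
```

The 2.5-cycle improvement comes entirely from shrinking the effective miss penalty seen by L1, which is why extra levels help even when the L2's own local miss rate is high.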
Advanced Optimizations

Ten Advanced Optimizations

L1 Size and Associativity
• Small and simple first level caches
– Critical timing path:
• addressing tag memory, then
• comparing tags, then
• selecting correct set
– Direct-mapped caches can overlap tag compare and transmission of data
– Lower associativity reduces power because fewer cache lines are accessed

[Figure: Access time vs. size and associativity]


L1 Size and Associativity
[Figure: Energy per read vs. size and associativity]

Cache Example
• Assume a direct-mapped cache with 16 words and a block size of 2 words.
• Which of these accesses hit or miss, and what are the final contents after the following sequence of word addresses?
• 4, 36, 4, 13, 7, 12, 15, 11, 8, 56, 27, 21, 12
• What if the cache is 2-way set associative?
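The worked example above can be checked with a small simulator (a sketch written for this handout, not code from the lecture; addresses are word addresses and replacement is LRU):

```python
def simulate(addresses, num_words=16, block_words=2, ways=1):
    """Simulate a small word-addressed cache with LRU replacement.

    num_words   total cache capacity in words
    block_words words per block
    ways        associativity (1 = direct-mapped)
    Returns a list of booleans: True for hit, False for miss.
    """
    num_sets = num_words // block_words // ways
    sets = [[] for _ in range(num_sets)]  # each set: list of tags, MRU last
    results = []
    for addr in addresses:
        block = addr // block_words
        index = block % num_sets
        tag = block // num_sets
        s = sets[index]
        if tag in s:
            s.remove(tag)
            s.append(tag)        # move to MRU position
            results.append(True)
        else:
            if len(s) == ways:
                s.pop(0)         # evict LRU tag
            s.append(tag)
            results.append(False)
    return results

trace = [4, 36, 4, 13, 7, 12, 15, 11, 8, 56, 27, 21, 12]
dm = simulate(trace, ways=1)
sa = simulate(trace, ways=2)
print(sum(dm), "hits direct-mapped")  # 2 hits
print(sum(sa), "hits 2-way")          # 3 hits
```

With this geometry the direct-mapped cache hits only on the two accesses to word 12; the 2-way version also hits the second access to word 4, because words 4 and 36 no longer conflict for a single frame.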

Way Prediction
• To improve hit time, predict the way to pre-set the mux
– Mis-prediction gives longer hit time
– Prediction accuracy
• > 90% for two-way
• > 80% for four-way
• I-cache has better accuracy than D-cache
– First used on MIPS R10000 in mid-90s
– Used on ARM Cortex-A8
• Extend to predict block as well
– "Way selection"
– Increases mis-prediction penalty

Pipelining Cache
• Pipeline cache access to improve bandwidth
– Examples:
• Pentium: 1 cycle
• Pentium Pro – Pentium III: 2 cycles
• Pentium 4 – Core i7: 4 cycles
• Increases branch mis-prediction penalty
• Makes it easier to increase associativity
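The accuracy figures above translate directly into an expected hit time. A back-of-the-envelope sketch (the 1-cycle fast path and 3-cycle mispredict path are assumed numbers for illustration):

```python
def avg_hit_time(accuracy, fast=1, slow=3):
    """Expected hit time under way prediction: a correct prediction uses
    the fast path; a misprediction pays extra cycles to probe the other
    way(s) before the data can be forwarded."""
    return accuracy * fast + (1 - accuracy) * slow

# Accuracy figures from the slide; cycle counts are assumptions.
print(avg_hit_time(0.90))  # two-way,  ~1.2 cycles on average
print(avg_hit_time(0.80))  # four-way, ~1.4 cycles on average
```

Even a 10% mispredict rate keeps the average close to the fast path, which is why way prediction is attractive despite the variable latency.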
Nonblocking Caches
• Allow hits before previous misses complete
– "Hit under miss"
– "Hit under multiple miss"
• L2 must support this
• In general, processors can hide L1 miss penalty but not L2 miss penalty

Multibanked Caches
• Organize cache as independent banks to support simultaneous access
– ARM Cortex-A8 supports 1-4 banks for L2
– Intel i7 supports 4 banks for L1 and 8 banks for L2
• Interleave banks according to block address
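Sequential interleaving of blocks across banks can be sketched as follows (the 64-byte block size and 4-bank count are assumptions chosen to match the i7 L1 figure above):

```python
def bank_of(addr, block_bytes=64, num_banks=4):
    """Sequentially interleave cache blocks across banks: consecutive
    block addresses map to consecutive banks."""
    return (addr // block_bytes) % num_banks

# Four consecutive 64-byte blocks land in four different banks, so a
# burst of sequential accesses can be serviced in parallel.
print([bank_of(a) for a in range(0, 256, 64)])  # [0, 1, 2, 3]
```

Two accesses stall each other only when they fall in the same bank, so sequential streams (the common case) see the full aggregate bandwidth.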


Critical Word First, Early Restart
• Critical word first
– Request missed word from memory first
– Send it to the processor as soon as it arrives
• Early restart
– Request words in normal order
– Send missed word to the processor as soon as it arrives
• Effectiveness of these strategies depends on block size and likelihood of another access to the portion of the block that has not yet been fetched

Merging Write Buffer
• When storing to a block that is already pending in the write buffer, update the write buffer
• Reduces stalls due to a full write buffer
• Do not apply to I/O addresses

[Figure: write buffer contents, no write buffering vs. write buffering]
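The merging policy can be sketched as follows (the entry count and block size are arbitrary assumptions; a real buffer also drains entries to memory in the background, which is omitted here):

```python
class MergingWriteBuffer:
    """Sketch of a merging write buffer: a store to a block that is
    already queued updates the existing entry instead of allocating a
    new one, so bursts of stores to one block consume one entry."""

    def __init__(self, num_entries=4, block_words=4):
        self.num_entries = num_entries
        self.block_words = block_words
        self.entries = {}   # block address -> {word offset: value}

    def store(self, addr, value):
        block, offset = divmod(addr, self.block_words)
        if block not in self.entries:
            if len(self.entries) == self.num_entries:
                return False          # buffer full: the processor stalls
            self.entries[block] = {}  # allocate a fresh entry
        self.entries[block][offset] = value  # merge into existing entry
        return True

buf = MergingWriteBuffer()
for addr in [100, 101, 102, 103]:  # four stores to the same block
    buf.store(addr, addr * 10)
print(len(buf.entries))            # 1 entry used instead of 4
```

Without merging, those four stores would fill the buffer by themselves; with merging they occupy one entry, which is exactly the stall reduction the slide describes. The I/O-address exception exists because merging reorders and coalesces writes, which memory-mapped devices may not tolerate.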

Compiler Optimizations
• Loop Interchange
– Swap nested loops to access memory in sequential order
• Blocking
– Instead of accessing entire rows or columns, subdivide matrices into blocks
– Requires more memory accesses but improves locality of accesses

Hardware Prefetching
• Fetch two blocks on miss (include next sequential block)

[Figure: Pentium 4 Pre-fetching]
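Blocking can be illustrated with a tiled transpose. The sketch below is Python for readability; the locality payoff actually materializes in languages with contiguous arrays such as C, where each b×b tile of both source and destination stays cache-resident while it is worked on:

```python
def blocked_transpose(a, n, b):
    """Transpose an n x n matrix (list of lists) one b x b tile at a
    time, so each tile of the source and destination is reused while
    it is still in the cache."""
    out = [[0] * n for _ in range(n)]
    for ii in range(0, n, b):
        for jj in range(0, n, b):
            # Inner loops touch only one b x b tile: spatial and
            # temporal locality, at the cost of extra loop overhead.
            for i in range(ii, min(ii + b, n)):
                for j in range(jj, min(jj + b, n)):
                    out[j][i] = a[i][j]
    return out

n = 6
a = [[i * n + j for j in range(n)] for i in range(n)]
t = blocked_transpose(a, n, b=2)
assert all(t[j][i] == a[i][j] for i in range(n) for j in range(n))
```

Loop interchange is the simpler cousin of the same idea: swapping the i and j loops of an untiled transpose changes which operand is accessed sequentially, and the compiler picks the order that matches the row-major layout.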
Compiler Prefetching
• Insert prefetch instructions before data is needed
• Non-faulting: prefetch doesn't cause exceptions
• Register prefetch
– Loads data into register
• Cache prefetch
– Loads data into cache
• Combine with loop unrolling and software pipelining

Summary
