Unit 3 - LM11 - Memory Prefetching
Prefetching
Even programs with good data locality will now and then have to access a cache line that is not
in the cache, and will then stall until the data has been fetched from main memory. It would of
course be better if there were a way to load the data into the cache before it is needed, so the stall
could be avoided. This is called prefetching, and there are two ways to achieve it: software
prefetching and hardware prefetching.
Software Prefetching
With software prefetching the programmer or compiler inserts prefetch instructions into the
program. These are instructions that initiate a load of a cache line into the cache, but do not stall
waiting for the data to arrive.
A critical property of a prefetch instruction is its prefetch distance: the time from when the
prefetch is executed to when the data is used. If the prefetch is too close to the instruction using
the prefetched data, the cache line will not have had time to arrive from main memory or the next
cache level, and the instruction will stall anyway. This reduces the effectiveness of the prefetch.
If the prefetch is too far ahead of the instruction using the prefetched data, the prefetched cache
line will instead already have been evicted before the data is actually used. The instruction
using the data will then cause another fetch of the cache line and have to stall. This not only
eliminates the benefit of the prefetch instruction, but introduces additional costs, since the cache
line is now fetched twice from main memory or the next cache level. This increases the memory
bandwidth requirement of the program.
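
To make this concrete, here is a minimal sketch in C of a loop with an explicit prefetch
distance, using the __builtin_prefetch intrinsic provided by GCC and Clang. The function name
and the distance of 16 elements are illustrative assumptions, not recommendations: a good
distance depends on the memory latency and the work per iteration, and is usually found by
measurement.

    #include <stddef.h>

    /* Illustrative prefetch distance: 16 doubles = 128 bytes, i.e. two
       64-byte cache lines ahead. Too small and the data arrives late;
       too large and it may be evicted again before use. */
    #define PREFETCH_DISTANCE 16

    double sum_array(const double *a, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++) {
            /* Start loading a line needed in a future iteration; the
               prefetch itself does not stall waiting for the data.
               Arguments: 0 = prefetch for reading, 3 = high locality. */
            if (i + PREFETCH_DISTANCE < n)
                __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
            sum += a[i];
        }
        return sum;
    }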
Processors that have multiple levels of caches often have different prefetch instructions for
prefetching data into different cache levels. This can be used, for example, to prefetch data from
main memory to the L2 cache far ahead of the use with an L2 prefetch instruction, and then
prefetch data from the L2 cache to the L1 cache just before the use with an L1 prefetch
instruction.
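
On x86 this two-level scheme can be sketched with the _mm_prefetch intrinsic, whose
_MM_HINT_T1 hint targets the L2 cache and outward while _MM_HINT_T0 targets all cache
levels including L1. The distances below are illustrative assumptions, and exactly which level
each hint fills is microarchitecture-dependent.

    #include <xmmintrin.h>  /* _mm_prefetch on x86 */
    #include <stddef.h>

    #define L2_DISTANCE 64  /* far ahead: hide main memory latency */
    #define L1_DISTANCE 8   /* just ahead: hide L2-to-L1 latency   */

    double sum_two_level(const double *a, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++) {
            /* Far-ahead prefetch toward the L2 cache. */
            _mm_prefetch((const char *)&a[i + L2_DISTANCE], _MM_HINT_T1);
            /* Near prefetch into the L1 cache. Bounds checks are
               omitted for brevity; the final iterations prefetch
               harmlessly past the end of the array. */
            _mm_prefetch((const char *)&a[i + L1_DISTANCE], _MM_HINT_T0);
            sum += a[i];
        }
        return sum;
    }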
There is a cost for executing a prefetch instruction. The instruction has to be decoded and it uses
some execution resources. A prefetch instruction that always prefetches cache lines that are
already in the cache will consume execution resources without providing any benefit. It is
therefore important to verify that prefetch instructions really prefetch data that is not already in
the cache.
The cache miss ratio needed by a prefetch instruction to be useful depends on its purpose. A
prefetch instruction that fetches data from main memory only needs a very low miss ratio to be
useful because of the high main memory access latency. A prefetch instruction that fetches cache
lines from a cache further from the processor to a cache closer to the processor may need a miss
ratio of a few percent to do any good.
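As a hypothetical back-of-the-envelope calculation: assume a prefetch instruction costs about
one cycle of execution resources each time it runs, and saves the full access latency whenever the
line would otherwise miss. The prefetch then breaks even when the miss ratio times the latency
saved exceeds one cycle. With a 200-cycle main memory latency this gives a break-even miss
ratio of 1/200 = 0.5%, while with a 20-cycle latency difference between two cache levels it gives
1/20 = 5%, in line with the few-percent figure above.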
It is common for software prefetching to fetch slightly more data than is actually used. For
example, when iterating over a large array it is common to prefetch some distance ahead of the
loop, say 1 kilobyte. When the loop approaches the end of the array, the software prefetching
should ideally stop. However, it is often cheaper to continue to prefetch data beyond the end of
the array than to insert additional code that checks whether the end of the array has been reached.
This means that the 1 kilobyte of data beyond the end of the array is fetched even though it is
never needed.
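
Continuing the sum_array sketch from above, dropping the bounds check gives exactly this
behaviour. GCC documents that __builtin_prefetch does not fault on invalid addresses, so
prefetching past the end is safe; the 1-kilobyte distance below is again an illustrative
assumption.

    #include <stddef.h>

    #define PF_DIST 128  /* doubles: 1 KB of data at 8 bytes each */

    double sum_unguarded(const double *a, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++) {
            /* No end-of-array check: the final iterations prefetch up
               to 1 KB past the end. Since the prefetch cannot fault,
               the only cost is the useless fetch traffic. */
            __builtin_prefetch(&a[i + PF_DIST], 0, 3);
            sum += a[i];
        }
        return sum;
    }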
Hardware Prefetching
Many modern processors implement hardware prefetching. This means that the processor
monitors the memory access pattern of the running program and tries to predict what data the
program will access next and prefetches that data. There are a few different variants of how this
can be done.
A stream prefetcher looks for streams where a sequence of consecutive cache lines is accessed
by the program. When such a stream is found the processor starts prefetching the cache lines
ahead of the program's accesses.
A stride prefetcher looks for instructions that make accesses with a regular stride, which does
not necessarily have to touch consecutive cache lines. When such an instruction is detected the
processor tries to prefetch the cache lines it will access ahead of it.
An adjacent cache line prefetcher automatically fetches adjacent cache lines to ones being
accessed by the program. This can be used to mimic the behaviour of a larger cache line size in a
cache level without actually having to increase the line size.
Hardware prefetchers can generally only handle very regular access patterns. The cost of
prefetching data that isn't used can be high, so processor designers have to be conservative.
3. Software Prefetching
3.1. Manual Prefetching
Manual prefetching involves the programmer explicitly inserting prefetch instructions into the
code. These instructions are hints to the processor to load specific data into the cache ahead of
time. This approach can be highly effective when the programmer has deep knowledge of the
application's access patterns.
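
A pattern where manual prefetching pays off is an indirect (gather) access, which hardware
prefetchers typically cannot predict because the data addresses depend on loaded index values.
The programmer, however, knows that the index array is scanned sequentially and can look
ahead in it. The names and the distance in this sketch are illustrative assumptions.

    #include <stddef.h>

    #define DIST 32  /* illustrative look-ahead distance in elements */

    double gather_sum(const double *data, const size_t *index, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++) {
            /* The index array is sequential, so we can read ahead in
               it and prefetch the scattered data element it points to
               well before that element is needed. */
            if (i + DIST < n)
                __builtin_prefetch(&data[index[i + DIST]], 0, 1);
            sum += data[index[i]];
        }
        return sum;
    }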
3.2. Compiler-based Prefetching
Compilers can automatically insert prefetch instructions based on their analysis of the code. The
compiler looks for loops and other predictable patterns to determine where prefetching could be
beneficial. This method reduces the burden on the programmer and can optimize for specific
hardware characteristics.
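
With this approach the source code stays plain. For example, GCC's -fprefetch-loop-arrays
option requests prefetch generation for loops like the one below; it is effective only on targets
that have prefetch instructions, and the example function name is an illustrative assumption.

    /* An ordinary loop with no explicit prefetching; the compiler
       inserts the prefetch instructions itself when invoked as, for
       example:

           gcc -O2 -fprefetch-loop-arrays program.c
    */
    #include <stddef.h>

    double plain_sum(const double *a, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }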
4. Hardware Prefetching
4.1. Next-line Prefetching
Next-line prefetching is one of the simplest forms of hardware prefetching. When a cache miss
occurs, the cache controller not only fetches the requested cache line but also the next sequential
line. This technique is effective for workloads with sequential memory access patterns.
4.2. Stride Prefetching
Stride prefetching involves detecting regular access patterns with a fixed stride. For example, if a
program repeatedly accesses memory addresses in the sequence A, A+S, A+2S, and so on, the
prefetcher detects the stride S and prefetches A+3S and later addresses before the program
reaches them.
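
A simplified software model of such a stride detector, often called a reference prediction table, is
sketched below. It is indexed by the load instruction's address (its PC) and tracks the last data
address, the last stride, and a confidence counter. The table size and thresholds are illustrative
assumptions; real hardware uses small associative structures rather than this direct-mapped array.

    #include <stdint.h>
    #include <stdbool.h>

    #define RPT_SIZE 256

    struct rpt_entry {
        uint64_t pc;         /* load instruction address (tag) */
        uint64_t last_addr;  /* last data address accessed     */
        int64_t  stride;     /* last observed stride           */
        int      confidence; /* saturating counter             */
    };

    static struct rpt_entry rpt[RPT_SIZE];

    /* Called on every load; returns true and sets *prefetch_addr when
       the detected stride is confident enough to prefetch ahead. */
    bool rpt_access(uint64_t pc, uint64_t addr, uint64_t *prefetch_addr)
    {
        struct rpt_entry *e = &rpt[pc % RPT_SIZE];

        if (e->pc != pc) {          /* new load: (re)allocate entry */
            e->pc = pc;
            e->last_addr = addr;
            e->stride = 0;
            e->confidence = 0;
            return false;
        }

        int64_t stride = (int64_t)(addr - e->last_addr);
        if (stride == e->stride && stride != 0) {
            if (e->confidence < 3)  /* same stride again: more confident */
                e->confidence++;
        } else {
            e->confidence = 0;      /* stride changed: start over */
            e->stride = stride;
        }
        e->last_addr = addr;

        if (e->confidence >= 2) {   /* prefetch one stride ahead */
            *prefetch_addr = addr + (uint64_t)e->stride;
            return true;
        }
        return false;
    }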
4.3. Adaptive Prefetching
Adaptive prefetchers adjust their strategies based on the observed behavior of the running
program. They can switch between different prefetching techniques or modify their
aggressiveness depending on the workload characteristics. This adaptability helps in optimizing
performance across a variety of applications.
4.4. Correlation-based Prefetching
This sophisticated method involves maintaining a history of cache misses and identifying
patterns or correlations in these misses. When a cache miss occurs, the prefetcher uses this
historical data to predict and prefetch future addresses. While complex, correlation-based
prefetching can significantly boost performance for applications with recurring access patterns.
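
A simplified software model of this idea, sometimes called a Markov prefetcher, is sketched
below: a history table remembers which miss followed which, and on a repeated miss the
recorded successor is prefetched. The table size and the single-successor design are illustrative
assumptions; real designs typically track several candidate successors with confidence bits.

    #include <stdint.h>
    #include <stdbool.h>

    #define HIST_SIZE 1024

    struct hist_entry {
        uint64_t miss_addr;  /* tag: a previously seen miss address */
        uint64_t next_addr;  /* the miss that followed it last time */
        bool     valid;
    };

    static struct hist_entry hist[HIST_SIZE];
    static uint64_t prev_miss;
    static bool have_prev;

    /* Called on every cache miss; returns true and sets *prefetch_addr
       when a correlated successor for this miss is known. */
    bool markov_miss(uint64_t addr, uint64_t *prefetch_addr)
    {
        /* Record that 'addr' followed the previous miss. */
        if (have_prev) {
            struct hist_entry *e = &hist[prev_miss % HIST_SIZE];
            e->miss_addr = prev_miss;
            e->next_addr = addr;
            e->valid = true;
        }
        prev_miss = addr;
        have_prev = true;

        /* Predict: what followed this address last time? */
        struct hist_entry *e = &hist[addr % HIST_SIZE];
        if (e->valid && e->miss_addr == addr) {
            *prefetch_addr = e->next_addr;
            return true;
        }
        return false;
    }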
5. Prefetching Performance Metrics
To evaluate the effectiveness of prefetching techniques, several performance metrics are used:
Prefetch Accuracy: The ratio of useful prefetches (those that are actually used by the processor)
to the total number of prefetches issued.
Prefetch Coverage: The proportion of cache misses that are eliminated due to prefetching.
Prefetch Timeliness: The degree to which prefetched data is available in the cache when needed.
Prefetch Overhead: The additional memory traffic and cache pollution caused by prefetching.
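As a hypothetical worked example tying these metrics together: suppose a program suffers 2,000
cache misses without prefetching, and a prefetcher issues 1,000 prefetches of which 800 are
referenced before being evicted, eliminating 700 of the original misses. Its accuracy is then
800 / 1,000 = 80%, and its coverage is 700 / 2,000 = 35%. The remaining 200 unused prefetches
are pure overhead: extra memory traffic and possible cache pollution.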
6. Challenges in Prefetching
6.1. Prefetching Overhead
While prefetching can improve performance, it also introduces overhead. Unnecessary prefetches
can lead to increased memory traffic and cache pollution, where useful data is evicted from the
cache prematurely.
6.2. Prefetching Timeliness
For prefetching to be effective, the prefetched data must be in the cache when the processor
needs it. If data is prefetched too early, it might be evicted before use. If prefetched too late, it
does not help in reducing latency.
6.3. Adaptability
Different applications have different memory access patterns. A prefetching strategy that works
well for one application might not be effective for another. Prefetchers need to be adaptable to
various workloads.
6.4. Hardware Complexity
Implementing advanced prefetching techniques adds complexity to the hardware. This can lead
to increased power consumption and design challenges.
7. Case Studies and Examples
7.1. Sequential Prefetching in Multimedia Applications
Multimedia applications, such as video processing and image rendering, often access memory in
a sequential manner. Sequential prefetching can be highly effective in such scenarios. By
prefetching the next blocks of data, the application can maintain a smooth and uninterrupted data
flow.
7.2. Stride Prefetching in Scientific Computing
Scientific applications frequently involve array processing with regular strides. Stride
prefetching can significantly enhance performance by prefetching future elements of the array
based on the detected stride pattern. This reduces the wait time for data fetch operations and
boosts computational efficiency.
7.3. Adaptive Prefetching in General-purpose Computing
General-purpose applications exhibit a wide range of memory access patterns. Adaptive
prefetchers, which can switch between different strategies, are particularly useful in such
environments. By continuously monitoring access patterns and adjusting prefetching strategies,
adaptive prefetchers optimize performance across diverse workloads.
8. Prefetching in Modern Processors
Modern processors, including those from Intel and AMD, incorporate sophisticated hardware
prefetching mechanisms. These processors use a combination of next-line prefetching, stride
prefetching, and adaptive techniques to optimize memory access latency. Additionally, software
developers can leverage compiler-based prefetching to further enhance application performance.
9. Future Directions in Prefetching
As technology advances, new prefetching techniques and enhancements continue to emerge.
Some potential future directions include:
9.1. Machine Learning-based Prefetching
Machine learning algorithms can analyze memory access patterns and predict future accesses
with high accuracy. By training models on historical data, processors can implement highly
effective and adaptive prefetching strategies.
9.2. Prefetching in Non-volatile Memory Systems
With the advent of non-volatile memory (NVM) technologies, new challenges and opportunities
arise for prefetching. Prefetching strategies need to be adapted to the unique characteristics of
NVM, such as higher write latencies and endurance limitations.