MC&CC

The document discusses memory consistency and cache coherence in multi-core systems, emphasizing the importance of correct behavior in parallel programming. It explains various memory consistency models, including Sequential Consistency and Total Store Order, as well as cache coherence protocols like MSI, MESI, and MOESI, highlighting their states and features. The document also addresses the challenges and optimizations related to memory operations and cache management in modern processors.

Memory Consistency and Cache Coherence
Table of contents

01 Memory Consistency
02 Cache Coherence
01
Memory Consistency
Memory Consistency
In modern multi-core systems, memory consistency defines the expected "correct" behavior of shared memory in terms of loads and stores (memory reads and writes) across different processors. It determines the order in which memory operations may be observed, ensuring correctness in parallel programs.
Why is Memory Consistency Important?

• Ensures correctness in multi-threaded execution.
• Defines how different cores observe memory updates.
• Impacts system performance and predictability.
Memory Consistency
Sequential Consistency

Sequential Consistency (SC) is the strictest memory consistency model:
"A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in the order specified by the program"
Do we use it?
• Performance Bottlenecks: enforcing a global order limits optimizations.
• Write Delays: cores must wait for writes to be visible before continuing execution.
• Not Practical in Modern CPUs: modern processors rely on optimizations like out-of-order execution and caching.
Write Buffer Issue:
• A write buffer temporarily holds store (write) operations before they are committed to memory. While this improves performance, it can cause cores to observe memory updates in different orders, violating SC.
TSO
• TSO (Total Store Order) is a relaxed consistency model used in x86 processors. It allows store buffering but maintains predictable ordering by enforcing:
• Stores become globally visible in program order (the store buffer drains FIFO).
• A core sees its own writes (forwarded from its store buffer) before other cores do.
• Loads cannot bypass earlier loads; the one reordering TSO permits is a later load completing before an earlier store to a different address.

✅ Faster execution due to relaxed write visibility.
✅ Predictable behavior compared to more relaxed models.
✅ Efficient for multi-core synchronization.
Memory Fences (Barriers)
• Since relaxed models allow reordering, memory fences (or barriers) are used to enforce ordering when needed.
• Types of Fences (x86):
• MFENCE (Memory Fence): orders all earlier loads and stores before all later loads and stores.
• LFENCE (Load Fence): ensures all earlier loads complete before later instructions execute.
• SFENCE (Store Fence): ensures all earlier stores are globally visible before any later stores.
Relaxed Memory Models
• To further improve performance, modern architectures like ARM and RISC-V allow even more reordering and provide explicit barriers, for example:
• DMB SY (Data Memory Barrier, full system: orders all earlier memory accesses before all later ones)
• DMB LD (Load Barrier: orders earlier loads before later loads and stores)
02
Cache Coherence
Cache Coherence
Cache coherence protocols ensure that multiple caches maintain a consistent view of shared data.

• Baseline System Model: a single multicore processor chip and off-chip main memory, with each core having a private data cache and all cores sharing a last-level cache (LLC).

• Coherence invariants:
1. Single-Writer, Multiple-Reader (SWMR) invariant
2. Data-Value invariant
Cache Coherence
• Single-Writer–Multiple-Reader (SWMR) Invariant: at any given time, a memory location is either cached for writing (and reading) at exactly one cache, or cached only for reading at zero or more caches.
Cache Coherence
1. Snooping-based Cache Coherence:
Each cache monitors (snoops on) a shared bus for memory operations. When a cache miss occurs, the core's cache controller arbitrates for the shared bus and broadcasts its request; other caches respond with data if they hold a copy. Snooping protocols are conceptually simple but scale poorly as the number of cores increases, because of their reliance on broadcasting over a shared bus.
Cache Coherence
2. Directory-based Cache Coherence:
Directory protocols use a directory to track which caches hold copies of which memory locations. On a cache miss, the requesting core queries the directory for the data's location, and the directory forwards the request to the appropriate cache or memory controller. Directory protocols scale better than snooping protocols because they avoid broadcasting, but the directory structure adds complexity and storage overhead.
MSI
States:
• Modified (M): The cache block has been updated and differs from memory. It must be written back before another processor can read it.
• Shared (S): The block is clean and can be shared among multiple caches.
• Invalid (I): The block is not valid in the cache.
Key Features:
• Uses an invalidate-based approach.
• Write operations invalidate copies in other caches.
• Drawback: there is no exclusive-clean state, so even a core that is the sole reader of a block holds it in S, and a later write still requires a bus transaction to upgrade from S to M.
MESI
States:
• Modified (M): The cache block is updated and different from memory. It must be written back before another processor can read it.
• Exclusive (E): The cache has the only clean copy (same as memory; no other cache has it).
• Shared (S): The block is clean and can be shared among multiple caches.
• Invalid (I): The block is not valid in the cache.
Key Features:
• The E state reduces bus and memory traffic: a block in E can be written (E→M) without a bus transaction, and needs no writeback since it is clean.
• Optimizes performance over MSI by eliminating unnecessary traffic for data used by only one core.
MOSI
States:
• Modified (M): The cache block is updated and different from memory. It must be written back before another processor can read it.
• Owned (O): The cache holds the most recent copy and serves requests from other cores without updating main memory.
• Shared (S): The block is clean and can be shared among multiple caches.
• Invalid (I): The block is not valid in the cache.
Key Features:
• Reduces writebacks to memory by allowing a cache to supply data directly.
• Useful in systems with high inter-processor communication.
MOESI
• MOESI, combining all five states (M, O, E, S, I), is used in modern AMD processors.
• It balances memory traffic and performance by allowing cache-to-cache transfers while minimizing unnecessary writebacks.
• It is preferred in high-performance multiprocessor architectures.
Advantages of MOESI Over Other Protocols
1. Cache-to-Cache Transfers:
⚬ The Owned (O) state allows direct cache-to-cache sharing, reducing memory accesses.
2. Lower Memory Traffic:
⚬ Unlike MSI, where modified data must be written back to memory before sharing, MOESI lets caches share the latest data without immediate writebacks.
3. Efficiency in Multi-Core Systems:
⚬ Used in AMD Opteron and Ryzen processors to optimize cache performance.
Thank you
Any questions?
