MC&CC

The document discusses memory consistency and cache coherence in multi-core systems, emphasizing the importance of correct behavior in parallel programming. It explains various memory consistency models, including Sequential Consistency and Total Store Order, as well as cache coherence protocols like MSI, MESI, and MOESI, highlighting their states and features. The document also addresses the challenges and optimizations related to memory operations and cache management in modern processors.

Memory Consistency and Cache Coherence
Table of contents

01 Memory Consistency
02 Cache Coherence
01
Memory Consistency
Memory Consistency
In modern multi-core systems, memory consistency defines the expected "correct" behavior of shared memory in terms of loads and stores (memory reads and writes) across different processors. It determines the order in which memory operations may be observed, ensuring correctness in parallel programs.
Why is Memory Consistency Important?

• Ensures correctness in multi-threaded execution.
• Defines how different cores observe memory updates.
• Impacts system performance and predictability.
Memory Consistency
Sequential Consistency

Sequential Consistency (SC) is the strictest memory consistency model:
"A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in the order specified by the program"
Do we use it?
• Performance Bottlenecks: enforcing a global order limits optimizations.
• Write Delays: cores must wait for writes to be visible before continuing execution.
• Not Practical in Modern CPUs: modern processors rely on optimizations like out-of-order execution and caching.
Write Buffer Issue:
• A write buffer temporarily holds store (write) operations before they are committed to memory. While this improves performance, it can cause cores to observe memory updates in different orders, violating SC.
TSO
• TSO (Total Store Order) is a relaxed consistency model used in x86 processors. It allows store buffering but maintains predictable ordering by enforcing:
• Stores become globally visible in program order (the store buffer drains FIFO).
• A core sees its own writes (forwarded from its store buffer) before other cores do.
• Loads cannot bypass earlier loads; the one reordering TSO permits is a later load completing before an earlier store to a different address.

✅ Faster execution due to relaxed write visibility.
✅ Predictable behavior compared to more relaxed models.
✅ Efficient for multi-core synchronization.
Memory Fences (Barriers)
• Since relaxed models allow reordering, memory fences (or barriers) are used to enforce ordering when needed.
• Types of Fences (x86):
• MFENCE (Memory Fence): orders all earlier loads and stores before all later loads and stores.
• LFENCE (Load Fence): ensures all earlier loads complete before later instructions execute.
• SFENCE (Store Fence): ensures all earlier stores are globally visible before any later stores.
Relaxed Memory Models
• To further improve performance, modern architectures like ARM and RISC-V allow even more reordering and provide explicit barriers, for example:
• DMB SY (Data Memory Barrier, full system: orders all earlier memory accesses before all later ones)
• DMB LD (Load Barrier: orders earlier loads before later loads and stores)
02
Cache Coherence
Cache Coherence
Cache coherence protocols ensure that multiple caches maintain a consistent view of shared data.

• Baseline System Model: a single multicore processor chip and off-chip main memory, with each core having a private data cache and all cores sharing a last-level cache (LLC).

• Coherence invariants:
1. Single-Writer, Multiple-Reader (SWMR) invariant
2. Data-Value invariant
Cache Coherence
• Single-Writer–Multiple-Reader (SWMR) Invariant: at any given time, a memory location is either cached for writing (and reading) at exactly one cache, or cached only for reading at zero or more caches.
Cache Coherence
1. Snooping-based Cache Coherence:
Each cache monitors (snoops on) a shared bus for memory operations. When a cache miss occurs, the core's cache controller arbitrates for the shared bus and broadcasts its request; other caches respond with data if they hold a copy. Snooping protocols are conceptually simple but scale poorly as the number of cores increases, because of their reliance on broadcasting over a shared bus.
Cache Coherence
2. Directory-based Cache Coherence:
Directory protocols use a directory to track which caches hold copies of which memory locations. On a cache miss, the requesting core queries the directory for the data's location, and the directory forwards the request to the appropriate cache or memory controller. Directory protocols scale better than snooping protocols because they avoid broadcasting, but the directory structure adds complexity and storage overhead.
MSI
States:
• Modified (M): The cache block has been updated and differs from memory. It must be written back before another processor can read it.
• Shared (S): The block is clean and can be shared among multiple caches.
• Invalid (I): The block is not valid in the cache.
Key Features:
• Uses an invalidate-based approach.
• Write operations invalidate copies in other caches.
• Drawback: there is no exclusive-clean state, so even a core that is the sole reader of a block holds it in S, and a later write still requires a bus transaction to upgrade from S to M.
MESI
States:
• Modified (M): The cache block is updated and different from memory. It must be written back before another processor can read it.
• Exclusive (E): The cache has the only clean copy (same as memory; no other cache has it).
• Shared (S): The block is clean and can be shared among multiple caches.
• Invalid (I): The block is not valid in the cache.
Key Features:
• The E state reduces bus and memory traffic: a block in E can be written (E→M) without a bus transaction, and needs no writeback since it is clean.
• Optimizes performance over MSI by eliminating unnecessary traffic for data used by only one core.
MOSI
States:
• Modified (M): The cache block is updated and different from memory. It must be written back before another processor can read it.
• Owned (O): The cache holds the most recent copy and serves requests from other cores without updating main memory.
• Shared (S): The block is clean and can be shared among multiple caches.
• Invalid (I): The block is not valid in the cache.
Key Features:
• Reduces writebacks to memory by allowing a cache to supply data directly.
• Useful in systems with high inter-processor communication.
MOESI
• MOESI, combining all five states (M, O, E, S, I), is used in modern AMD processors.
• It balances memory traffic and performance by allowing cache-to-cache transfers while minimizing unnecessary writebacks.
• It is preferred in high-performance multiprocessor architectures.
Advantages of MOESI Over Other Protocols
1. Cache-to-Cache Transfers:
⚬ The Owned (O) state allows direct cache-to-cache sharing, reducing memory accesses.
2. Lower Memory Traffic:
⚬ Unlike MSI, where modified data must be written back to memory before sharing, MOESI lets caches share the latest data without immediate writebacks.
3. Efficiency in Multi-Core Systems:
⚬ Used in AMD Opteron and Ryzen processors to optimize cache performance.
Thank you
Any questions?
