The memory hierarchy in computer systems is a structured organization of memory types that balances speed, size, and cost to optimize
overall system performance. The idea behind the hierarchy is to ensure that the processor (CPU) can quickly access the most frequently
used data while storing less frequently accessed data in larger, slower, and more cost-effective memory units.
• Speed Optimization: The closer the data is to the CPU, the faster the access time. The memory hierarchy ensures that the most
critical data is stored in the fastest possible memory.
• Cost Efficiency: High-speed memory (like SRAM) is very expensive, while large storage (like hard drives) is cheap. The hierarchy
makes use of both, combining fast but small memory with large but slower memory.
• Capacity Management: The hierarchy ensures there is enough storage for all data by using larger but slower memory for data
that is less frequently accessed.
The memory hierarchy typically consists of several levels, each offering different speeds and capacities, ranging from very fast, small, and
expensive memory to large, slow, and inexpensive storage.
1. Registers:
o Purpose: Registers store the data that is actively being processed by the CPU. They are the fastest accessible memory elements, used to hold operands for computations.
o Speed: Registers have access times measured in nanoseconds (ns) or even fractions of a CPU cycle (typically around
0.5–1 ns).
o Size: Very small (e.g., a few to tens of bytes), as they hold only the most immediate data.
o Cost: Extremely high per bit, as they are integrated into the CPU itself.
2. Cache Memory (L1, L2, L3):
o Purpose: Cache memory stores copies of frequently accessed data and instructions from the main memory (RAM). It reduces the time it takes for the CPU to access data by keeping a subset of the most-used data closer to the processor.
o L1 Cache:
▪ Purpose: The smallest and fastest cache level, used to store a small amount of data and instructions.
▪ Speed: Very fast, with access times of around 1–2 CPU cycles.
o L2 Cache:
▪ Location: Often on the CPU chip but may also be separate from the processor.
▪ Purpose: A larger and slower cache than L1, used to store more data than L1 but still much faster than
accessing the main memory.
▪ Speed: Slower than L1 but faster than main memory, with access times in the 3–10 CPU cycle range.
o L3 Cache:
▪ Location: Shared among multiple processor cores; on modern processors it is usually on the CPU die, though in older designs it was sometimes located off-chip.
▪ Purpose: The largest cache level, used to hold data that might not fit into L1 or L2. L3 is shared across
multiple cores in modern processors.
▪ Speed: Slower than L2 but still significantly faster than main memory.
o Access Speed: L1 is the fastest, followed by L2 and L3. The trade-off is that, as the cache size increases (L2, L3), it
becomes slower.
3. Main Memory (RAM):
o Purpose: Main memory (usually dynamic RAM, or DRAM) stores the operating system, applications, and the data currently being processed by the CPU.
o Speed: Slower than cache memory. It can take 50–200 nanoseconds to access data from RAM.
o Size: Much larger than cache memory. Typically 4 GB to 128 GB or more, depending on the computer system.
o Cost: More affordable than cache memory; the DRAM used for main memory is much cheaper per bit than the SRAM used for caches.
o Usage: Main memory is used to store larger volumes of active data that need to be directly accessed by the CPU.
However, it’s slower than cache memory.
4. Secondary Storage (HDDs and SSDs):
o Location: External to the processor and RAM, used for long-term storage.
o Purpose: Secondary storage devices are used to store data and applications that are not actively being processed. This includes large files like videos, documents, and software programs.
o Hard Disk Drives (HDDs):
▪ Mechanically based storage devices that use spinning magnetic disks to read and write data.
▪ Speed: Slower than RAM or SSDs, with access times in the millisecond (ms) range.
▪ Size: Can store hundreds of gigabytes (GB) to several terabytes (TB) of data.
o Solid-State Drives (SSDs):
▪ Flash-based storage devices with no moving parts.
▪ Speed: Access times are much faster than HDDs, typically around 0.1–0.2 milliseconds.
▪ Cost: More expensive than HDDs but still cheaper than RAM and cache memory.
o Access Speed: Both HDDs and SSDs are much slower than RAM and cache, with HDDs being the slowest. However,
SSDs are significantly faster than traditional HDDs.
1. Speed Optimization: The memory hierarchy allows frequent data to be accessed quickly by placing it in faster memory, like the
cache. Less frequently accessed data resides in slower, larger memory, like RAM or secondary storage. By doing so, it reduces
the time the CPU spends waiting for data.
2. Cost-Effective Design: SRAM (used for cache memory) is much more expensive than DRAM (used for main memory), and both
are much more expensive than magnetic disks or SSDs. The hierarchy ensures that expensive, fast memory is only used for the
most critical data, while the slower, cheaper storage holds larger amounts of data.
3. Capacity Management: The memory hierarchy provides large storage capacity (through HDDs or SSDs) for data that is not
needed immediately, while maintaining a smaller but faster storage layer (using cache and RAM) for the data that needs to be
accessed quickly.
• When the CPU needs data, it first checks the registers. If the data isn't there, it checks the L1 cache. If it's not in L1, it looks in
the L2 and L3 caches.
• If the data is not found in any cache levels, it moves to the main memory (RAM). If RAM doesn't have the data, the system
fetches it from secondary storage (HDD or SSD).
• This layered approach makes sure that the most frequently accessed data is always as close to the CPU as possible, and less
frequent data is stored in slower, larger storage layers.
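As a rough illustration of this lookup order, the sketch below walks a request down the hierarchy level by level. The level names, latencies, and the data placed in each level are made-up assumptions for demonstration, not measurements of any real machine.

```python
# Rough simulation of the top-down lookup order described above. Level names,
# latencies, and contents are made-up assumptions, not real measurements.
MEMORY_LEVELS = [
    # (name, approximate access time in ns, contents)
    ("registers",          0.5, {"r1": 42}),
    ("L1 cache",           1.0, {}),
    ("L2 cache",           4.0, {}),
    ("L3 cache",          15.0, {}),
    ("main memory (RAM)", 100.0, {"x": 7}),
    ("secondary storage", 1e5,  {"x": 7, "y": 9}),
]

def lookup(key):
    """Search each level in order, accumulating the time spent on the way down."""
    elapsed = 0.0
    for name, latency_ns, contents in MEMORY_LEVELS:
        elapsed += latency_ns                 # every level checked adds its latency
        if key in contents:
            return contents[key], name, elapsed
    raise KeyError(key)

value, level, time_ns = lookup("x")
print(f"found {value} in {level} after ~{time_ns:.1f} ns")
```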
Conclusion
The memory hierarchy ensures that the system performs efficiently by providing quick access to the data the CPU needs most often, while
still maintaining large storage capacities for less frequently accessed data. By organizing memory into a hierarchy of registers, caches,
main memory, and secondary storage, systems can manage the trade-off between speed, size, and cost, resulting in better performance
and more cost-effective computer systems.
Cache memory is a high-speed storage medium that sits between the CPU and main memory. Its purpose is to bridge the significant speed
gap between the processor and main memory. The CPU operates at a much faster rate than the main memory, and this disparity can
significantly slow down overall performance. Cache memory helps mitigate this performance bottleneck by making frequently accessed
data and instructions readily available to the processor.
Cache Memory
Cache memory is like a super-fast and small-sized storage area that sits between the main memory (RAM) and the central processing unit
(CPU) in a computer. Its primary job is to store frequently used or recently accessed information so that the CPU can quickly
retrieve it when needed.
1. Bridging the CPU–Memory Speed Gap:
o The CPU is designed to process data at very high speeds, but the main memory (RAM) is much slower in comparison. Without cache memory, the processor would need to access main memory every time it requires data, which would drastically slow down operations.
o Cache memory, being faster than main memory, helps to reduce the average access time for data, enabling the CPU
to process information more quickly.
2. Locality of Reference:
o The efficiency of cache memory is built on the principle of locality of reference, a characteristic behavior of
programs. This principle refers to the tendency of programs to access a relatively small set of memory locations
repeatedly within a short period.
▪ Temporal locality: If a program accesses a particular data or instruction, it is likely to access the same
data or instruction again soon.
▪ Spatial locality: If a program accesses a specific memory address, it is likely to access nearby addresses as
well.
o Cache takes advantage of these behaviors by storing recently or frequently accessed data (temporal locality) and
adjacent data (spatial locality).
3. Improved Performance:
o Cache memory improves CPU performance by reducing the time it takes to access frequently used data or
instructions. This allows the CPU to spend more time processing data and less time waiting for data to be fetched
from slower main memory.
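The locality principle described above shows up in very ordinary code. The short sketch below is only an illustration; the array size is arbitrary, and the comments point out which accesses exhibit temporal versus spatial locality.

```python
# A very ordinary loop that exhibits both kinds of locality (array size is arbitrary).
data = list(range(1024))

total = 0
for i in range(len(data)):
    # Spatial locality: data[i] and data[i + 1] sit next to each other in memory,
    # so one fetched cache block serves several consecutive iterations.
    total += data[i]
    # Temporal locality: `total` and `i` are touched on every iteration,
    # so they stay in registers or the L1 cache.
print(total)
```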
The operation of cache memory is designed to capitalize on locality of reference by storing recently or frequently accessed data. The
general operation involves the following steps:
1. Processor Request:
o When the processor issues a read or write request, the memory address is passed to the cache.
2. Cache Lookup:
o The cache checks if the requested data is present. If it is, a cache hit occurs.
o If the data is not in the cache, a cache miss occurs, and the data must be fetched from the main memory.
3. Data Transfer:
o If the data is in the cache (cache hit), the requested data is returned to the CPU immediately.
o If the data is not in the cache (cache miss), a block of data containing the requested memory address is transferred
from main memory to the cache. The requested data is then returned to the CPU.
4. Cache Block:
o Data in the cache is grouped into blocks (or lines). A cache block (or cache line) is a set of contiguous memory
locations. Rather than loading only the requested memory location, the cache loads the entire block to take
advantage of spatial locality.
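The hit/miss flow above can be summarized in a small sketch. The block size, the fake main-memory contents, and the plain dictionary used as the cache are assumptions chosen for illustration; a real cache also tracks tags and has a fixed number of blocks.

```python
# Minimal sketch of the hit/miss flow described above. The block size, the fake
# main-memory contents, and the dictionary standing in for the cache are assumptions.
BLOCK_SIZE = 16                                             # words per cache block (line)
main_memory = {addr: addr * 10 for addr in range(4096)}     # made-up contents
cache = {}                                                  # block_number -> list of words

def read(address):
    block_number = address // BLOCK_SIZE
    offset = address % BLOCK_SIZE
    if block_number in cache:
        print(f"hit  block {block_number}")                 # cache hit: returned immediately
    else:
        print(f"miss block {block_number}: loading whole block from main memory")
        start = block_number * BLOCK_SIZE                   # load the entire block, not one
        cache[block_number] = [main_memory[a]               # word, to exploit spatial locality
                               for a in range(start, start + BLOCK_SIZE)]
    return cache[block_number][offset]

read(100)   # miss: block 6 is loaded
read(101)   # hit: same block (spatial locality)
read(100)   # hit: repeated access (temporal locality)
```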
Cache Hit:
• A cache hit occurs when the CPU requests data that is already stored in the cache. In this case, the processor can directly access
the cache to retrieve the data without needing to access the slower main memory. This results in faster performance.
Cache Miss:
• A cache miss occurs when the requested data is not found in the cache. When this happens:
1. The cache controller fetches the data from the main memory.
2. The block containing the requested data is loaded into the cache.
Types of cache misses:
o Cold Miss (or Compulsory Miss): The first time data is accessed and isn't in the cache.
o Capacity Miss: Occurs when the cache is too small to hold all the data that the program needs, causing older data to
be evicted to make space.
o Conflict Miss: Occurs when multiple blocks of data compete for the same cache location, even though the cache has
room for them.
Cache Write Policies: Write-Through vs. Write-Back
There are two main strategies for handling writes to the cache and main memory:
1. Write-Through Cache:
• In the write-through protocol, when data is written to the cache, it is also immediately written to the main memory.
o Advantages: Simple to implement; the cache and main memory remain synchronized.
o Disadvantages: Can result in unnecessary write operations to the main memory, especially if the cache data is
modified frequently.
2. Write-Back Cache:
• In the write-back protocol, data is written only to the cache initially. The main memory is updated later when the block is
evicted from the cache (i.e., when it is replaced by another block).
o Advantages: Fewer write operations to the main memory, which is beneficial for performance, especially when data is modified multiple times before being evicted from the cache.
o Disadvantages: More complex to implement (each block needs a "dirty" bit), and main memory holds stale data until the modified block is written back.
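A minimal sketch contrasting the two policies is shown below. The class names and the single shared main_memory dictionary are illustrative assumptions; a real cache tracks a per-block dirty bit in hardware.

```python
# Sketch contrasting write-through and write-back (the class names and the
# shared main_memory dict are illustrative; real caches use a per-block dirty bit).
main_memory = {0: 0}

class WriteThroughCache:
    def __init__(self):
        self.data = {}

    def write(self, addr, value):
        self.data[addr] = value
        main_memory[addr] = value      # every write is propagated immediately

class WriteBackCache:
    def __init__(self):
        self.data = {}
        self.dirty = set()             # blocks modified since they were loaded

    def write(self, addr, value):
        self.data[addr] = value
        self.dirty.add(addr)           # main memory is NOT updated yet

    def evict(self, addr):
        if addr in self.dirty:         # write back only when the block is evicted
            main_memory[addr] = self.data[addr]
            self.dirty.discard(addr)
        self.data.pop(addr, None)

wt = WriteThroughCache()
wt.write(0, 5)                         # main_memory[0] becomes 5 right away

wb = WriteBackCache()
wb.write(0, 99)                        # main_memory[0] is still 5 (stale until eviction)
wb.evict(0)                            # now main_memory[0] == 99
print(main_memory)
```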
Cache Replacement Policies
When the cache is full and a new memory block needs to be loaded but there is no space available, the system must decide which block to remove (evict) to make room for the new block. This decision is made using a replacement policy.
• Least Recently Used (LRU): The block that has not been used for the longest period is replaced.
• First In, First Out (FIFO): The oldest block in the cache is replaced.
• Optimal Replacement: Replaces the block that will not be used for the longest period of time in the future (ideal but impractical
because it requires future knowledge).
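Of these, LRU is the most commonly used in practice, and a minimal sketch of it follows. The three-block capacity and the access sequence are arbitrary choices for illustration.

```python
from collections import OrderedDict

# Minimal LRU replacement sketch; the three-block capacity is an arbitrary choice.
class LRUCache:
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.blocks = OrderedDict()     # ordered from least to most recently used

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)                # hit: now most recently used
            return "hit"
        if len(self.blocks) >= self.capacity:
            evicted, _ = self.blocks.popitem(last=False)  # evict least recently used
            print(f"evict block {evicted}")
        self.blocks[block] = True
        return "miss"

cache = LRUCache()
for block in [1, 2, 3, 1, 4]:   # accessing 4 evicts block 2 (least recently used)
    print(block, cache.access(block))
```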
Key Characteristics of Cache Memory
1. L1 Cache:
o This is the smallest and fastest cache, usually located directly on the CPU chip.
o It stores the most frequently used data and instructions for immediate access by the processor.
o Speed: Closest to the processor and provides the fastest data access.
2. L2 Cache:
o L2 cache is larger than L1 cache and is usually located on the same chip as the processor or just outside the CPU.
o It holds data and instructions that are less frequently used than those in L1 cache but still need faster access than the
main memory.
3. L3 Cache:
o L3 cache is often shared between multiple processor cores in multi-core processors. It is the largest but slowest of the caches.
o Size: Typically several MBs (e.g., 2 MB to 30 MB).
o Speed: Slower than L1 and L2 but still much faster than accessing data from main memory.
4. Write Policies:
o Write-Through Cache: Updates both the cache and the main memory simultaneously.
o Write-Back Cache: Updates only the cache initially and writes to main memory later when the data is evicted from
the cache.
5. Cache Mapping:
o Direct-Mapped: Each block in main memory maps to exactly one cache line.
o Fully-Associative: Any block in memory can go into any cache line (flexible but complex).
o Set-Associative: Combines aspects of direct-mapped and fully-associative, allowing each block of memory to be
mapped to any cache line within a set of cache lines.
Conclusion
Cache memory is a critical component in modern computer systems for improving performance. By exploiting the locality of reference
principle, cache stores frequently or recently accessed data and instructions, reducing the time the CPU spends waiting for data from
slower main memory. Different types of cache (L1, L2, L3) and cache policies (write-through, write-back) are used to manage data
efficiently and balance between speed, complexity, and memory usage. The effectiveness of cache memory directly contributes to a
system's overall speed and responsiveness, especially for tasks that involve repetitive access to the same data or instructions.
Cache memory plays a crucial role in speeding up data access for a processor by temporarily holding frequently used data. However,
determining where in the cache memory a particular block of main memory should be placed is not trivial.
Mapping functions determine how memory blocks are placed in the cache.
Mapping functions help decide where data from the main memory will be stored in the cache. This is important because the cache is much
smaller than the main memory, so it needs a strategy to manage the placement of data efficiently.
Direct Mapping
• Block Organization:
o Main memory is divided into 4,096 blocks (4K), with each block holding 16 words.
o The cache holds 128 blocks, each also holding 16 words.
• Address Breakdown (e.g., address 0x1234):
o With 4,096 memory blocks of 16 words, a word address has 16 bits, split into a 5-bit tag, a 7-bit cache-block field, and a 4-bit word field.
• Formula: Cache Block = (Main Memory Block Number) mod 128
• How it Works:
o Main memory blocks are assigned to specific cache blocks based on the above formula.
o If multiple memory blocks map to the same cache block, they replace each other.
o Result: Block 256 from main memory is placed in cache block 0 (256 mod 128 = 0), replacing the data from block 0.
In direct mapping, each main memory block is assigned to one of the 128 cache blocks using the formula: Cache Block = Main Memory
Block Number mod 128. This ensures that the block number wraps around to fit within the available cache. For example, main memory
blocks 0, 128, and 256 all map to cache block 0. If multiple memory blocks map to the same cache block, the new block replaces the old
one. This approach is simple but can lead to conflicts when several blocks compete for the same cache location.
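A short worked sketch of this mapping is given below. The 16-bit address split (5-bit tag, 7-bit block field, 4-bit word field) is derived from the sizes stated above (4,096 memory blocks of 16 words, 128 cache blocks); the example address 0x1234 is the one mentioned in the address breakdown, and the code treats addresses as word addresses.

```python
# Worked direct-mapping example using the sizes stated above: 4,096 memory blocks
# of 16 words and a 128-block cache, giving a 16-bit word address split into a
# 5-bit tag, a 7-bit cache-block field, and a 4-bit word offset (derived values).
CACHE_BLOCKS = 128
WORDS_PER_BLOCK = 16

def direct_map(address):
    word = address % WORDS_PER_BLOCK
    memory_block = address // WORDS_PER_BLOCK
    cache_block = memory_block % CACHE_BLOCKS      # the "mod 128" formula from the text
    tag = memory_block // CACHE_BLOCKS
    return memory_block, cache_block, tag, word

for addr in (0x1234, 0 * WORDS_PER_BLOCK, 256 * WORDS_PER_BLOCK):
    mb, cb, tag, word = direct_map(addr)
    print(f"address {addr:#06x}: memory block {mb} -> cache block {cb}, tag {tag}, word {word}")
# Memory blocks 0 and 256 both map to cache block 0, so loading block 256
# replaces whatever was cached from block 0.
```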
In associative mapping, unlike direct mapping, any block from main memory can be placed in any cache block. The key difference is that there is no fixed position for memory blocks in the cache. The cache uses a tag to identify which memory block is stored in each cache location, and memory blocks compete for the limited number of cache blocks rather than for one specific block.
Example: Unlike direct mapping, where blocks are mapped to specific cache blocks using the modulus operation, in associative mapping any main memory block can be placed in any cache block. The cache searches for the correct block using the tag stored with each cache block.
In associative mapping, any block from main memory can be placed in any cache block, providing complete flexibility. There is no fixed
mapping between memory blocks and cache blocks, unlike in direct mapping. Instead, the cache uses a tag to identify which memory
block is stored in a given cache block. This eliminates conflicts caused by multiple memory blocks mapping to the same cache location, as
seen in direct mapping. However, associative mapping requires more complex hardware to search the cache for the correct block, making
it slower and more expensive.
In associative mapping, memory blocks can be stored in any cache block. The cache checks if the requested data is present by comparing
the tag of the memory address with the tags of the blocks in the cache.
How It Works:
• The tag of the requested memory address is compared against the tags of all blocks in the cache; a match is a hit, otherwise the block is fetched from main memory and placed in any free (or replaced) cache block.
• This makes associative mapping flexible, since any memory block can go into any cache block, but it requires the cache to check all of its blocks on every access.
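A simplified sketch of this tag search is shown below. Real hardware compares all tags in parallel with comparators; the sequential loop and the unbounded list standing in for the cache are simplifications (no replacement is modeled here).

```python
# Simplified fully associative lookup: the requested tag is compared against every
# cached block. Hardware does this in parallel; the sequential loop and the
# unbounded list used as the cache are simplifications (no replacement modeled).
WORDS_PER_BLOCK = 16
cache = []                                  # list of (tag, data); a block can sit anywhere

def lookup(address):
    tag = address // WORDS_PER_BLOCK        # with no index field, the whole block
                                            # number acts as the tag
    for stored_tag, data in cache:
        if stored_tag == tag:
            return "hit", data
    data = f"block {tag} fetched from main memory"   # miss: bring the block in
    cache.append((tag, data))
    return "miss", data

print(lookup(0x1234))   # miss
print(lookup(0x1230))   # hit: same 16-word block as 0x1234
```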
Cache Replacement Policy: When the cache is full and needs space for new data, it uses a replacement policy to decide which old data to remove:
1. LRU (Least Recently Used): Removes the data that hasn't been used for the longest time.
2. FIFO (First-In-First-Out): Removes the oldest data in the cache.
3. Random Replacement: Removes random data from the cache.
These policies help make space for new data in the cache.
In set-associative mapping, each memory block can be stored in any cache block within a specific set. The replacement policy is used to
decide which block within the set to replace when the cache is full.
Replacement Policy:
• When a new block needs to be loaded and the set it maps to is full, the cache replaces one of the blocks within that set using a replacement policy.
Summary:
In set-associative mapping, blocks from main memory can go into any cache block within a set. When the set is full and a new block needs
to be loaded, a replacement policy (LRU, FIFO, or Random) determines which block to remove from the set.
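The sketch below models a small set-associative cache with LRU replacement inside each set. The 64-set, 2-way organization (128 blocks total, matching the earlier 128-block example) is an assumption chosen only for illustration.

```python
from collections import OrderedDict

# Sketch of a 2-way set-associative cache with LRU replacement inside each set.
# The 64 sets x 2 ways organization (128 blocks total) is an illustrative assumption.
WORDS_PER_BLOCK = 16
NUM_SETS = 64
WAYS = 2
sets = [OrderedDict() for _ in range(NUM_SETS)]    # per-set tag store, LRU-ordered

def access(address):
    memory_block = address // WORDS_PER_BLOCK
    set_index = memory_block % NUM_SETS            # the block may use any way of this set
    tag = memory_block // NUM_SETS
    ways = sets[set_index]
    if tag in ways:
        ways.move_to_end(tag)                      # hit: refresh LRU order
        return "hit"
    if len(ways) >= WAYS:
        ways.popitem(last=False)                   # set is full: evict its LRU block
    ways[tag] = True
    return "miss"

# Memory blocks 0, 64, and 128 all map to set 0; the third access evicts the LRU one.
for block in (0, 64, 128):
    print(f"block {block}: {access(block * WORDS_PER_BLOCK)}")
```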
In a Centralized Shared-Memory Architecture, all processors in the system share a single, central memory pool. This means that the
processors can directly access the memory, which is shared across all of them. Communication between processors happens via a shared
bus or memory module. This architecture is typically used in Symmetric Multiprocessing (SMP) systems.
How It Works:
• Uniform Memory Access (UMA) means that all processors have the same access time to the shared memory. This is a key characteristic of centralized shared-memory architectures. Each processor perceives the same memory access latency, making the system easier to program because all processors work with memory in the same way.
• Single Memory Space means all processors access the memory as if it's a large, shared pool, simplifying memory management.
Advantages:
1. Simplicity in Programming:
o Since all processors share the same memory, programming becomes easier. You can use simple programming
models like threads and shared variables, where processors can communicate by reading and writing to shared
variables directly.
2. Memory Consistency:
o With centralized memory, it’s easier to maintain coherence or consistency of data because all memory is stored in
one place. There’s no need to worry about which processor has the most recent copy of the data, unlike in
distributed memory systems.
3. Efficient Memory Use:
o Multilevel caches reduce the demand on the main memory and increase processor speed. By caching data locally in
each processor’s cache, the system minimizes memory traffic and improves overall performance.
Disadvantages:
1. Scalability Limitations:
o As more processors are added to the system, the shared bus or interconnect becomes a bottleneck. Only one
processor can access the memory at a time, so as the number of processors increases, the memory access time
becomes slower, limiting how well the system can scale.
2. Communication Overhead:
o Since all processors share the same bus to communicate with memory, there is contention when multiple
processors try to access memory at the same time. This leads to delays, reducing the system's overall performance,
especially when the number of processors increases.
3. Cache Coherence Overhead:
o Maintaining cache coherence becomes more complex in systems with multiple processors. Cache coherence
protocols (such as MESI) are required to ensure that all processors have the most up-to-date version of the shared
data. This adds complexity and overhead to the system.
Summary:
A Centralized Shared-Memory Architecture provides a unified memory space accessible by all processors. This simplifies programming
and memory management by offering a Uniform Memory Access (UMA) system, where each processor experiences the same access time
to memory. However, it faces challenges like scalability issues due to the shared bus, communication delays, and the cache coherence
problem when multiple processors try to access or update the same data. As the number of processors increases, the system’s
performance may decrease due to these factors.
Cache Coherence in Multiprocessor Systems
In a system with multiple processors, each processor has its own cache memory. The cache speeds up access to data by storing frequently
used information. But when processors share data, it’s important to make sure that all caches have the same up-to-date version of that
data. This is called cache coherence.
1. Snooping Protocol
o How it works: Each processor watches (or "snoops on") the shared bus to see what other processors are doing with
the data.
o Example: If Processor 1 changes a value in its cache, other processors will notice and update or delete their copies
of that data.
o Best for: Smaller systems where processors are connected through a shared bus.
2. Directory-Based Protocol
o How it works: Instead of snooping, there is a central directory that keeps track of which caches have copies of each
data block.
o Example: If Processor 1 wants to access data, the directory will tell it who has copies of that data, and it makes sure
everything stays up-to-date.
o Best for: Larger systems, where keeping track of data using a shared bus would be too slow or messy.
Key Points:
• Snooping is simple, works well for small systems, and uses a shared bus to track data changes.
• Directory-based is better for larger systems because it avoids too much traffic on the bus and uses a central directory to
manage data.
Both methods are used to make sure all processors have the same data in their caches, just in different ways.
In multiprocessor systems, where multiple processors share data, it's important to make sure everyone has the latest data. This is called
cache coherence. There are two main ways to do this:
1. Directory-Based Protocols
• How it works: There's a central directory that keeps track of which processors have copies of each piece of data.
• In SMP systems: The directory is part of the main memory or a central point that keeps everything in check.
• In DSM systems: Many processors are involved, and having a single directory can slow things down because too many
processors might need to access it at once.
• Downside: Since all requests go through the directory, it can be slower compared to other methods.
2. Snooping Protocols
• How it works: Instead of a directory, each processor "snoops" (listens) on a shared bus that connects all the processors'
caches.
• When a processor updates data: It sends a signal on the bus to tell the other processors to update or delete their copies of
that data.
• In SMP systems: All processors are connected via the shared bus, so they can easily see what others are doing with data.
• Why it's good: It's simple and works well for smaller systems because processors can quickly check the bus for updates.
Key Takeaways:
• Directory-Based: A central directory keeps track of data, but it can slow down in larger systems.
• Snooping: Processors listen to a shared bus for updates, and it's simple and fast for smaller systems.
The Snooping Coherence Protocol is a method used in multiprocessor systems to ensure that all processors have a consistent view of
shared memory, even when each processor has its own private cache. This protocol requires each cache to "snoop" or monitor
transactions on a shared bus to detect any updates or changes to memory data made by other processors.
Key Characteristics
1. Shared Bus:
o All processors and caches are connected via a shared communication bus.
o When a processor changes a memory block, this change can be broadcast on the bus, allowing other caches to see
and respond to it.
2. Cache Coherence:
o Each cache listens to the bus and monitors memory transactions.
o If a processor writes to a memory block, other caches can detect this and make necessary updates to avoid holding
outdated or inconsistent data.
3. Snooping on the Bus:
o When any processor performs a read or write operation, all other caches "snoop" on the action.
o Based on what they detect, caches may update, invalidate, or keep their data to ensure consistency.
Types of Snooping Protocols
1. Write-Invalidate Protocol:
o This protocol ensures that only one cache has a valid (exclusive) copy of data when it is written to.
o How it works: When a processor writes to a memory block, it sends an "invalidate" signal on the bus. Other caches
receiving this signal invalidate their copies of that memory block, ensuring that the writing processor has exclusive
access.
o Example:
▪ If Processor A writes to block X, it invalidates copies of X in all other caches (e.g., Processor B). If
Processor B later needs X, it must fetch the updated version from memory or Processor A.
2. Write-Update Protocol (also called Write-Broadcast):
o In this protocol, when a processor writes to a cache block, it broadcasts the new data to all other caches.
o How it works: The updated data is shared immediately across all caches, ensuring that each cache holds the latest
version of the memory block.
o Example:
▪ If Processor A writes to block X, it broadcasts the updated value of X to all other processors (e.g.,
Processor B). Processor B then updates its cache with the new value of X.
Cache States: To manage these protocols, each cache line can be in one of the following states:
• Modified (M): The data is modified and only exists in one cache.
• Shared (S): The data is unmodified and can be shared by multiple caches.
• Invalid (I): The data is not valid and needs to be fetched from memory if accessed.
Advantages of Snooping Protocols
1. Simplicity: They are relatively straightforward to implement, especially in small systems with a shared bus.
2. Efficiency: They work well in small setups where all processors can monitor the bus in real-time.
3. Low Latency: Changes propagate quickly across caches because all caches are monitoring the bus.
Disadvantages of Snooping Protocols
1. Bus Contention: With many processors, bus congestion can occur since each processor snoops on every transaction.
2. Scalability Issues: Snooping protocols don’t scale well in larger systems, as a shared bus becomes a bottleneck.
3. Limited Bandwidth: The shared bus has limited bandwidth, reducing performance as more processors are added.
The Directory-Based Coherence Protocol is used in distributed-memory multiprocessor systems to maintain cache coherence. In such
systems, each processor has its own cache, and the directory keeps track of which caches have copies of each memory block.
Key Concepts
1. Directory:
o The directory is responsible for keeping track of the state of each memory block and which caches have copies of the
block.
o It uses a bit vector to indicate which processors hold copies of a memory block.
2. Cache States:
o Uncached: No cache currently holds a copy of the block; the only valid copy is in memory.
o Shared: The cache block is present in multiple caches, and all copies are up-to-date.
o Modified: A single cache holds the exclusive copy of the block, and the memory copy is outdated.
3. Nodes:
o Home Node: The node that contains the memory block and the directory entry for the block.
o Local Node: The node where a request originates.
o Remote Node: A node whose cache holds a copy of the block.
How It Works:
1. Read Miss:
o When a processor misses in its cache, the directory checks the status of the memory block.
o If the block is not in any cache (uncached), the directory fetches it from memory and sends it to the requesting
processor.
o If the block is in another processor’s cache (exclusive), the directory requests the cache to update its copy to shared
and sends the block to the requesting processor.
2. Write Miss:
o If a processor writes to a block that is in multiple caches (shared), the directory sends an invalidate signal to all
remote caches to ensure the writing processor has exclusive access.
▪ The directory checks if the block is uncached. If so, it fetches it from memory.
▪ If the block is exclusive in another cache, the directory requests the cache to send the block and updates
the state.
o The directory sends invalidation signals to remote caches if the block is shared. The block is then marked as modified
in the writing cache.
In summary, the home directory handles misses as follows:
1. Read Miss:
o If the block is uncached, the home directory sends the block to the local cache.
o If the block is exclusive in another cache, the directory sends a request to that cache to share or invalidate the block
and sends it to the requesting processor.
2. Write Miss:
o The home directory sends an invalidate message to remote caches to remove their copies of the block.
o The writing cache updates the directory to mark the block as modified.
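A compact sketch of how a directory entry might track these cases is shown below. The four-processor system, the DirectoryEntry class, and the printed "messages" are illustrative assumptions rather than any specific protocol implementation.

```python
# Sketch of a directory entry handling the read-miss / write-miss cases above.
# The 4-processor system, the DirectoryEntry class, and the printed "messages"
# are illustrative assumptions, not a specific protocol implementation.
NUM_PROCESSORS = 4

class DirectoryEntry:
    def __init__(self):
        self.state = "uncached"                   # "uncached", "shared", or "modified"
        self.sharers = [False] * NUM_PROCESSORS   # bit vector: which caches hold a copy

    def read_miss(self, requester):
        if self.state == "modified":
            owner = self.sharers.index(True)
            print(f"ask P{owner} to write the block back and downgrade to shared")
        self.state = "shared"
        self.sharers[requester] = True            # the requester now holds a copy

    def write_miss(self, requester):
        for p, has_copy in enumerate(self.sharers):
            if has_copy and p != requester:
                print(f"invalidate the copy held by P{p}")   # remove stale copies
        self.sharers = [False] * NUM_PROCESSORS
        self.sharers[requester] = True
        self.state = "modified"                   # the requester has the exclusive copy

entry = DirectoryEntry()
entry.read_miss(0)    # P0 reads: block becomes shared
entry.read_miss(1)    # P1 reads: shared by P0 and P1
entry.write_miss(2)   # P2 writes: P0 and P1 are invalidated, block modified in P2
```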
Advantages
• Scalable: Directory-based protocols are more scalable for large systems since they avoid the bus contention seen in snooping
protocols.
Disadvantages
• Latency: There can be delays as the home node manages the coherence state and updates multiple caches.
In summary, the Directory-Based Coherence Protocol provides a way to maintain consistency in large multiprocessor systems by using a
centralized directory to track which caches have which memory blocks, ensuring data coherence and minimizing unnecessary data
transfers.
The Write-Invalidate Snooping Protocol is a common technique used to ensure cache coherence in systems where multiple processors
have their own caches. In this protocol, when one processor writes to a block of memory, all other caches that hold a copy of that
memory block are notified to invalidate their copies.
Cache States: Each cache block can be in one of three states:
1. Modified (M): The cache has the only valid copy of the data, and it is different from the main memory.
2. Shared (S): The cache has a copy of the data, but it is consistent with the main memory. Multiple caches may have a copy.
3. Invalid (I): The cache does not have a valid copy of the block.
Basic Operations
• Write Miss: When a processor writes to a memory block, it sends a message over the shared bus. If other caches have the
block, they invalidate their copies.
• Read Miss: When a processor reads a block that is not in its cache, it requests the data. If another processor has the block, it
sends the data.
State Transitions
1. Modified (M):
o The processor has the only valid copy, and it is different from the main memory.
o Another processor's Read Miss: Supply the block → Transition to Shared (S); another processor's Write Miss → Transition to Invalid (I).
2. Shared (S):
o The processor has a copy of the block that is consistent with the main memory.
o Local Write: Invalidate the block in other caches → Transition to Modified (M); another processor's Write → Transition to Invalid (I).
3. Invalid (I):
o Write Miss: The processor fetches the block and writes to it → Transition to Modified (M).
o Read Miss: The processor fetches the block and reads it → Transition to Shared (S).
Description of Transitions:
1. From Modified (M):
o Read Miss (by another processor): The processor writes the block back, sends it to the requesting processor, and changes its state to Shared (S).
o Write Miss (by another processor): The processor writes the block back and invalidates its copy, transitioning to Invalid (I).
2. From Shared (S):
o Write (by this processor): An invalidation message is sent to other caches, which transition to Invalid (I); this cache transitions to Modified (M).
o Write Miss (by another processor): This cache's copy is invalidated, and the state transitions to Invalid (I).
o Read Miss (by another processor): The block remains in Shared (S), as it is still valid for other processors to read.
3. From Invalid (I):
o Write Miss: The processor fetches the block and writes to it, transitioning to Modified (M).
o Read Miss: The processor fetches the block and transitions to Shared (S).
Example Scenario
• Processor A reads a block from memory; the block is loaded into its cache in the Shared (S) state.
• Processor B reads the same block, transitioning its state to Shared (S).
• Processor C writes to the same block, causing both Processor A and Processor B to invalidate their copies, transitioning to Invalid (I).
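This scenario can be traced with a toy MSI state machine. The Cache class, the broadcast loop standing in for the shared bus, and the omission of actual data transfer are simplifying assumptions for illustration.

```python
# Toy MSI (Modified / Shared / Invalid) trace of the scenario above. The bus is
# modeled as a broadcast loop and data transfer is omitted (simplifying assumptions).
class Cache:
    def __init__(self, name):
        self.name = name
        self.state = "I"                  # every block starts out Invalid

    def snoop(self, op):
        if op == "write" and self.state in ("M", "S"):
            self.state = "I"              # another cache wrote: invalidate our copy
        elif op == "read" and self.state == "M":
            self.state = "S"              # another cache read: downgrade to Shared

caches = {name: Cache(name) for name in "ABC"}

def bus(op, who):
    for name, cache in caches.items():
        if name != who:
            cache.snoop(op)               # every other cache snoops the transaction
    caches[who].state = "M" if op == "write" else "S"
    print(f"{who} {op}s:", {name: c.state for name, c in caches.items()})

bus("read", "A")     # A: S
bus("read", "B")     # A: S, B: S
bus("write", "C")    # C: M, A and B invalidated (I)
```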
Advantages
• Simplicity: Easy to implement because when a processor writes, it simply invalidates the blocks in other caches.
• Data Integrity: Prevents caches from using stale data by ensuring that only one processor has the valid copy of a modified
block.
Disadvantages
• Bus Traffic: High bus traffic due to the need to send invalidation messages every time a write occurs.
• Latency: There can be delays as caches are invalidated and data is fetched again.
In summary, the Write-Invalidate Snooping Protocol helps maintain cache coherence by ensuring that no processor works with outdated
data by invalidating copies in other caches whenever a write operation occurs.