
Computer Memory Hierarchy: Explanation

The memory hierarchy in computer systems is a structured organization of memory types that balances speed, size, and cost to optimize
overall system performance. The idea behind the hierarchy is to ensure that the processor (CPU) can quickly access the most frequently
used data while storing less frequently accessed data in larger, slower, and more cost-effective memory units.

Computer memory can be organized as a hierarchy of levels, described below.

Purpose of the Memory Hierarchy:

• Speed Optimization: The closer the data is to the CPU, the faster the access time. The memory hierarchy ensures that the most
critical data is stored in the fastest possible memory.

• Cost Efficiency: High-speed memory (like SRAM) is very expensive, while large storage (like hard drives) is cheap. The hierarchy
makes use of both, combining fast but small memory with large but slower memory.

• Capacity Management: The hierarchy ensures there is enough storage for all data by using larger but slower memory for data
that is less frequently accessed.

Levels in the Memory Hierarchy

The memory hierarchy typically consists of several levels, each offering different speeds and capacities, ranging from very fast, small, and
expensive memory to large, slow, and inexpensive storage.

1. Registers (Fastest Memory)

o Location: Inside the processor (CPU).

o Purpose: Stores the data that is actively being processed by the CPU. These are the fastest accessible memory
elements, used to store operands for computations.

o Speed: Registers have access times measured in nanoseconds (ns) or even fractions of a CPU cycle (typically around
0.5–1 ns).

o Size: Very small; each register holds a single word (e.g., 4 or 8 bytes), and the entire register set amounts to only a few hundred bytes, as registers hold only the most immediate data.

o Cost: Extremely high per bit, as they are integrated into the CPU itself.

2. Cache Memory (L1, L2, L3)

o Location: Between the CPU and main memory.

o Purpose: Cache memory stores copies of frequently accessed data and instructions from the main memory (RAM). It
reduces the time it takes for the CPU to access data by keeping a subset of the most-used data closer to the
processor.

o L1 Cache:

▪ Location: Directly on the CPU chip.

▪ Purpose: The smallest and fastest cache level, used to store a small amount of data and instructions.

▪ Size: Typically ranges from 16 KB to 128 KB.

▪ Speed: Very fast, with access times of around 1–2 CPU cycles.

o L2 Cache:

▪ Location: Often on the CPU chip but may also be separate from the processor.

▪ Purpose: A larger and slower cache than L1, used to store more data than L1 but still much faster than
accessing the main memory.

▪ Size: Typically ranges from 128 KB to 8 MB.

▪ Speed: Slower than L1 but faster than main memory, with access times in the 3–10 CPU cycle range.
o L3 Cache:

▪ Location: Shared among multiple processor cores; on modern processors it is usually on the CPU die, though older designs sometimes placed it off-chip.

▪ Purpose: The largest cache level, used to hold data that might not fit into L1 or L2. L3 is shared across
multiple cores in modern processors.

▪ Size: Typically ranges from 2 MB to 64 MB.

▪ Speed: Slower than L2 but still significantly faster than main memory.

o Access Speed: L1 is the fastest, followed by L2 and L3. The trade-off is that, as the cache size increases (L2, L3), it
becomes slower.

3. Main Memory (RAM - Random Access Memory)

o Location: External to the processor but connected to it via a memory bus.

o Purpose: Main memory (often dynamic RAM, or DRAM) stores the operating system, applications, and currently
used data that are actively being processed by the CPU.

o Speed: Slower than cache memory. It can take 50–200 nanoseconds to access data from RAM.

o Size: Much larger than cache memory. Typically 4 GB to 128 GB or more, depending on the computer system.

o Cost: More affordable than cache memory. It’s cheaper per bit because DRAM is used in main memory.

o Usage: Main memory is used to store larger volumes of active data that need to be directly accessed by the CPU.
However, it’s slower than cache memory.

4. Secondary Storage (Hard Disk Drives (HDDs) / Solid-State Drives (SSDs))

o Location: External to the processor and RAM, used for long-term storage.

o Purpose: Secondary storage devices are used to store data and applications that are not actively being processed.
This includes large files like videos, documents, and software programs.

o Hard Disk Drives (HDDs):

▪ Mechanically based storage devices that use spinning magnetic disks to read and write data.

▪ Speed: Slower than RAM or SSDs, with access times in the millisecond (ms) range.

▪ Size: Can store hundreds of gigabytes (GB) to several terabytes (TB) of data.

▪ Cost: The cheapest per bit of all memory types.

o Solid-State Drives (SSDs):

▪ Flash-based storage with no moving parts, much faster than HDDs.

▪ Speed: Access times are much faster than HDDs, typically around 0.1–0.2 milliseconds.

▪ Size: Smaller than HDDs, typically 120 GB to 4 TB.

▪ Cost: More expensive than HDDs but still cheaper than RAM and cache memory.

o Access Speed: Both HDDs and SSDs are much slower than RAM and cache, with HDDs being the slowest. However,
SSDs are significantly faster than traditional HDDs.

Why is the Memory Hierarchy Important?

1. Speed Optimization: The memory hierarchy allows frequent data to be accessed quickly by placing it in faster memory, like the
cache. Less frequently accessed data resides in slower, larger memory, like RAM or secondary storage. By doing so, it reduces
the time the CPU spends waiting for data.

2. Cost-Effective Design: SRAM (used for cache memory) is much more expensive than DRAM (used for main memory), and both
are much more expensive than magnetic disks or SSDs. The hierarchy ensures that expensive, fast memory is only used for the
most critical data, while the slower, cheaper storage holds larger amounts of data.
3. Capacity Management: The memory hierarchy provides large storage capacity (through HDDs or SSDs) for data that is not
needed immediately, while maintaining a smaller but faster storage layer (using cache and RAM) for the data that needs to be
accessed quickly.

How the Memory Hierarchy Works:

• When the CPU needs data, it first checks the registers. If the data isn't there, it checks the L1 cache. If it's not in L1, it looks in
the L2 and L3 caches.

• If the data is not found in any cache levels, it moves to the main memory (RAM). If RAM doesn't have the data, the system
fetches it from secondary storage (HDD or SSD).

• This layered approach makes sure that the most frequently accessed data is always as close to the CPU as possible, and less
frequent data is stored in slower, larger storage layers.
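
The lookup order described above can be summarized in a short sketch. This is illustrative only: the level names, their contents, and the use of a simple dictionary per level are assumptions for demonstration, not a model of real hardware.

```python
# Minimal sketch of the lookup order described above.
# The levels, their contents, and the values are illustrative only.

MEMORY_LEVELS = [
    ("registers",       {"r1": 42}),
    ("L1 cache",        {"x": 7}),
    ("L2 cache",        {"y": 99}),
    ("L3 cache",        {}),
    ("main memory",     {"z": 1234}),
    ("secondary store", {"big_file": b"..."}),
]

def read(name):
    """Search each level in order, from fastest to slowest."""
    for level, contents in MEMORY_LEVELS:
        if name in contents:
            print(f"'{name}' found in {level}")
            return contents[name]
    raise KeyError(f"'{name}' not found at any level")

read("x")   # found in L1 cache
read("z")   # falls through to main memory
```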

Conclusion

The memory hierarchy ensures that the system performs efficiently by providing quick access to the data the CPU needs most often, while
still maintaining large storage capacities for less frequently accessed data. By organizing memory into a hierarchy of registers, caches,
main memory, and secondary storage, systems can manage the trade-off between speed, size, and cost, resulting in better performance
and more cost-effective computer systems.

Need for Cache Memory and Its Working Principle

Cache memory is a high-speed storage medium that sits between the CPU and main memory. Its purpose is to bridge the significant speed
gap between the processor and main memory. The CPU operates at a much faster rate than the main memory, and this disparity can
significantly slow down overall performance. Cache memory helps mitigate this performance bottleneck by making frequently accessed
data and instructions readily available to the processor.

Cache Memory

Cache memory is like a super-fast and small-sized storage area that sits between the main memory (RAM) and the central processing unit
(CPU) in a computer. Its primary job is to store frequently used or recently accessed information so that the CPU can quickly
retrieve it when needed.

Why Cache Memory is Needed

1. Speed Difference Between CPU and Main Memory:

o The CPU is designed to process data at very high speeds, but the main memory (RAM) is much slower in comparison.
Without cache memory, the processor would need to access main memory every time it requires data, which would
drastically slow down operations.

o Cache memory, being faster than main memory, helps to reduce the average access time for data, enabling the CPU
to process information more quickly.

2. Locality of Reference:

o The efficiency of cache memory is built on the principle of locality of reference, a characteristic behavior of
programs. This principle refers to the tendency of programs to access a relatively small set of memory locations
repeatedly within a short period.

▪ Temporal locality: If a program accesses a particular data or instruction, it is likely to access the same
data or instruction again soon.

▪ Spatial locality: If a program accesses a specific memory address, it is likely to access nearby addresses as
well.

o Cache takes advantage of these behaviors by storing recently or frequently accessed data (temporal locality) and
adjacent data (spatial locality); the code sketch after this list illustrates both.
3. Improved Performance:

o Cache memory improves CPU performance by reducing the time it takes to access frequently used data or
instructions. This allows the CPU to spend more time processing data and less time waiting for data to be fetched
from slower main memory.
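
As a quick illustration of both kinds of locality, consider the following sketch; the array size and loop counts are arbitrary values chosen for demonstration.

```python
# Illustrative only: summing an array exhibits both kinds of locality.
data = list(range(1024))

total = 0
for i in range(len(data)):      # consecutive indexes -> spatial locality:
    total += data[i]            # data[i] and data[i+1] live in the same cache block

for _ in range(1000):           # reusing the same element -> temporal locality:
    total += data[0]            # data[0] stays in the cache across iterations

print(total)
```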

Working Principle of Cache Memory

The operation of cache memory is designed to capitalize on locality of reference by storing recently or frequently accessed data. The
general operation involves the following steps:

1. Processor Request:

o When the processor issues a read or write request, the memory address is passed to the cache.

2. Cache Lookup:

o The cache checks if the requested data is present. If it is, a cache hit occurs.

o If the data is not in the cache, a cache miss occurs, and the data must be fetched from the main memory.

3. Data Transfer:

o If the data is in the cache (cache hit), the requested data is returned to the CPU immediately.

o If the data is not in the cache (cache miss), a block of data containing the requested memory address is transferred
from main memory to the cache. The requested data is then returned to the CPU.

4. Cache Block:

o Data in the cache is grouped into blocks (or lines). A cache block (or cache line) is a set of contiguous memory
locations. Rather than loading only the requested memory location, the cache loads the entire block to take
advantage of spatial locality.
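
A minimal sketch of this read path is shown below. The block size of 4 words and the contents of the simulated main memory are made-up values for illustration.

```python
# Sketch of the read path described in steps 1-4. Block size of 4 words and
# the backing "main memory" are made up for illustration.

BLOCK_SIZE = 4
main_memory = {addr: addr * 10 for addr in range(64)}   # address -> value
cache = {}                                              # block number -> list of words

def cache_read(address):
    block_no = address // BLOCK_SIZE
    offset = address % BLOCK_SIZE
    if block_no in cache:                               # cache hit
        print(f"hit  : block {block_no}")
    else:                                               # cache miss: load the whole block
        print(f"miss : loading block {block_no}")
        base = block_no * BLOCK_SIZE
        cache[block_no] = [main_memory[base + i] for i in range(BLOCK_SIZE)]
    return cache[block_no][offset]

cache_read(9)    # miss: block 2 (addresses 8-11) is loaded
cache_read(10)   # hit : spatial locality, same block
cache_read(9)    # hit : temporal locality
```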

Cache Hits and Cache Misses

Cache Hit:

• A cache hit occurs when the CPU requests data that is already stored in the cache. In this case, the processor can directly access
the cache to retrieve the data without needing to access the slower main memory. This results in faster performance.

Cache Miss:

• A cache miss occurs when the requested data is not found in the cache. When this happens:

1. The cache controller fetches the data from the main memory.

2. The block containing the requested data is loaded into the cache.

3. The data is then passed to the processor.

There are different types of cache misses:

o Cold Miss (or Compulsory Miss): The first time data is accessed and isn't in the cache.

o Capacity Miss: Occurs when the cache is too small to hold all the data that the program needs, causing older data to
be evicted to make space.

o Conflict Miss: Occurs when multiple blocks of data compete for the same cache location, even though the cache has
room for them.
Cache Write Policies: Write-Through vs. Write-Back

There are two main strategies for handling writes to the cache and main memory:

1. Write-Through Cache:

• In the write-through protocol, when data is written to the cache, it is also immediately written to the main memory.

o Advantages: Simple to implement; the cache and main memory remain synchronized.

o Disadvantages: Can result in unnecessary write operations to the main memory, especially if the cache data is
modified frequently.

2. Write-Back Cache:

• In the write-back protocol, data is written only to the cache initially. The main memory is updated later when the block is
evicted from the cache (i.e., when it is replaced by another block).

o Advantages: Fewer write operations to the main memory, which is beneficial for performance, especially when data
is modified multiple times before being evicted from the cache.

o Disadvantages: Requires additional management, such as tracking modified (dirty) blocks.
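
The following sketch contrasts the two policies. It ignores block granularity and models memory as a simple dictionary; the class names and addresses are hypothetical.

```python
# Sketch of the two write policies, with memory as a plain dictionary.
main_memory = {0: 0, 1: 0}

class WriteThroughCache:
    def __init__(self):
        self.data = {}
    def write(self, addr, value):
        self.data[addr] = value
        main_memory[addr] = value          # memory updated on every write

class WriteBackCache:
    def __init__(self):
        self.data, self.dirty = {}, set()
    def write(self, addr, value):
        self.data[addr] = value
        self.dirty.add(addr)               # memory updated only on eviction
    def evict(self, addr):
        if addr in self.dirty:
            main_memory[addr] = self.data[addr]   # write back the dirty block
            self.dirty.discard(addr)
        self.data.pop(addr, None)

wt = WriteThroughCache()
wt.write(0, 5)
print(main_memory[0])   # 5: memory updated immediately

wb = WriteBackCache()
wb.write(1, 99)
print(main_memory[1])   # still 0: memory is stale until eviction
wb.evict(1)
print(main_memory[1])   # 99 after the dirty block is written back
```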

Cache Replacement Policy

When the cache is full and a new memory block needs to be loaded but there is no space available, the system must decide which block to
remove (evict) to make room for the new block. This decision is made using a replacement policy.

Some common replacement policies are:

• Least Recently Used (LRU): The block that has not been used for the longest period is replaced.

• First In, First Out (FIFO): The oldest block in the cache is replaced.

• Random Replacement: A random block is chosen for replacement.

• Optimal Replacement: Replaces the block that will not be used for the longest period of time in the future (ideal but impractical
because it requires future knowledge).
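
Below is a minimal sketch of LRU replacement, assuming a tiny cache capacity of two blocks; FIFO or random replacement would differ only in how the victim block is chosen.

```python
# Sketch of an LRU replacement policy using Python's OrderedDict.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()        # block number -> data

    def access(self, block, data=None):
        if block in self.blocks:           # hit: mark as most recently used
            self.blocks.move_to_end(block)
        else:                              # miss: evict the LRU block if full
            if len(self.blocks) >= self.capacity:
                victim, _ = self.blocks.popitem(last=False)
                print(f"evicting block {victim}")
            self.blocks[block] = data

c = LRUCache(capacity=2)
c.access(0); c.access(128); c.access(0)   # block 128 is now least recently used
c.access(256)                             # prints: evicting block 128
```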

Cache Memory Types

1. Primary Cache (L1 Cache):

o This is the smallest and fastest cache, usually located directly on the CPU chip.

o It stores the most frequently used data and instructions for immediate access by the processor.

o Size: Typically between 16 KB and 128 KB.

o Speed: Closest to the processor and provides the fastest data access.

2. Secondary Cache (L2 Cache):

o L2 cache is larger than L1 cache and is usually located on the same chip as the processor or just outside the CPU.

o It holds data and instructions that are less frequently used than those in L1 cache but still need faster access than the
main memory.

o Size: Typically between 256 KB and several MBs.

o Speed: Slower than L1 but faster than main memory.

3. Tertiary Cache (L3 Cache):

o L3 cache is often shared between multiple processor cores in multi-core processors. It is the largest but slowest of
the caches.
o Size: Typically several MBs (e.g., 2 MB to 30 MB).

o Speed: Slower than L1 and L2 but still much faster than accessing data from main memory.

4. Write-Through vs. Write-Back Caches:

o Write-Through Cache: Updates both the cache and the main memory simultaneously.

o Write-Back Cache: Updates only the cache initially and writes to main memory later when the data is evicted from
the cache.

5. Associative Mapping:

o Direct-Mapped: Each block in main memory maps to exactly one cache line.

o Fully-Associative: Any block in memory can go into any cache line (flexible but complex).

o Set-Associative: Combines aspects of direct-mapped and fully-associative, allowing each block of memory to be
mapped to any cache line within a set of cache lines.

Conclusion

Cache memory is a critical component in modern computer systems for improving performance. By exploiting the locality of reference
principle, cache stores frequently or recently accessed data and instructions, reducing the time the CPU spends waiting for data from
slower main memory. Different types of cache (L1, L2, L3) and cache policies (write-through, write-back) are used to manage data
efficiently and balance between speed, complexity, and memory usage. The effectiveness of cache memory directly contributes to a
system's overall speed and responsiveness, especially for tasks that involve repetitive access to the same data or instructions.

Mapping Functions for Cache Memory

Cache memory plays a crucial role in speeding up data access for a processor by temporarily holding frequently used data. However,
determining where in the cache memory a particular block of main memory should be placed is not trivial.

Mapping functions determine how memory blocks are placed in the cache.

Mapping functions help decide where data from the main memory will be stored in the cache. This is important because the cache is much
smaller than the main memory, so it needs a strategy to manage the placement of data efficiently.

Main Memory Overview

• Address Size: 16 bits → 2^16 = 65,536 unique addresses.

• Memory Capacity: 65,536 words = 64K words (1K = 1024).

• Block Organization:

o Main memory is divided into 4,096 blocks (4K), with each block holding 16 words.

o Total memory words = 4,096 × 16 = 65,536.

Address Breakdown

A 16-bit address is divided as:

• Upper 12 bits → Block number (4096 blocks).

• Lower 4 bits → Word offset within the block (16 words).

Example

Address 0x1234:

• Block number: 0x123 (Upper 12 bits → Block 291).

• Word offset: 0x4 (Lower 4 bits → Word 4 in Block 291).
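
This split can be expressed directly with shift and mask operations, as in the sketch below (the function name is arbitrary):

```python
# Splitting a 16-bit address into a 12-bit block number and a 4-bit word
# offset, matching the breakdown above.
def split_address(address):
    block = (address >> 4) & 0xFFF   # upper 12 bits
    offset = address & 0xF           # lower 4 bits
    return block, offset

block, offset = split_address(0x1234)
print(hex(block), block)    # 0x123 291
print(hex(offset), offset)  # 0x4 4
```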


Direct Mapping Simplified

• Formula:

Cache Block = Main Memory Block Number mod Total Cache Blocks

(In the examples below, the cache is assumed to contain 128 blocks.)

How it Works:

o Each main memory block is assigned to a specific cache block using the formula above.

o If multiple memory blocks map to the same cache block, the newly loaded block replaces the one already there.

Example with Direct Mapping Formula

1. Main Memory Block 0:

o Formula: 0 mod 128 = 0

o Result: Block 0 from main memory is placed in cache block 0.

2. Main Memory Block 129:

o Formula: 129 mod 128 = 1

o Result: Block 129 from main memory is placed in cache block 1.

3. Main Memory Block 256:

o Formula: 256 mod 128 = 0

o Result: Block 256 from main memory is placed in cache block 0, replacing the data from block 0.

4. Main Memory Block 130:

o Formula: 130 mod 128 = 2

o Result: Block 130 from main memory is placed in cache block 2.

In direct mapping, each main memory block is assigned to one of the 128 cache blocks using the formula: Cache Block = Main Memory
Block Number mod 128. This ensures that the block number wraps around to fit within the available cache. For example, main memory
blocks 0, 128, and 256 all map to cache block 0. If multiple memory blocks map to the same cache block, the new block replaces the old
one. This approach is simple but can lead to conflicts when several blocks compete for the same cache location.
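
A small sketch of this behaviour, assuming the 128-block cache used in the examples, is shown below. Each cache line simply records which main-memory block it currently holds.

```python
# Direct mapping with a 128-block cache, as in the examples above.
NUM_CACHE_BLOCKS = 128
cache_lines = [None] * NUM_CACHE_BLOCKS    # line index -> main-memory block number

def load_block(mem_block):
    line = mem_block % NUM_CACHE_BLOCKS
    if cache_lines[line] == mem_block:
        print(f"block {mem_block}: hit in cache line {line}")
    elif cache_lines[line] is not None:
        print(f"block {mem_block}: replaces block {cache_lines[line]} in line {line}")
        cache_lines[line] = mem_block
    else:
        print(f"block {mem_block}: loaded into empty line {line}")
        cache_lines[line] = mem_block

for b in (0, 129, 256, 130):
    load_block(b)   # lines 0, 1, 0 (conflict with block 0), 2
```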

Associative Mapping Example (Similar to Direct Mapping)

In associative mapping, unlike direct mapping, any block from main memory can be placed in any cache block. The key difference is that
there is no specific, fixed position for memory blocks in the cache. The cache uses a tag to identify which block is stored where. However,
the principle of mapping remains similar, where multiple memory blocks can go to the same cache block.

Example:

• Main Memory Blocks: 0, 128, 256


• Cache Blocks: 0, 1, 2 (Total 3 blocks for simplicity)

Steps:

1. Main Memory Block 0:


o Block 0 can go into any cache block. Let's say it goes into Cache Block 0.
2. Main Memory Block 128:
o Block 128 can also go into any cache block. Let’s say it goes to Cache Block 1.
3. Main Memory Block 256:
o Block 256 can go into any cache block. Let’s say it goes to Cache Block 2.
Key Point:

• Block 0 from main memory is placed in Cache Block 0.


• Block 128 from main memory is placed in Cache Block 1.
• Block 256 from main memory is placed in Cache Block 2.

Unlike direct mapping, where blocks are mapped to specific cache blocks using the modulus operation, in associative mapping, any main
memory block can be placed in any cache block. The cache will search for the correct block using the tag to determine which block is in
each cache location.

In associative mapping, any block from main memory can be placed in any cache block, providing complete flexibility. There is no fixed
mapping between memory blocks and cache blocks, unlike in direct mapping. Instead, the cache uses a tag to identify which memory
block is stored in a given cache block. This eliminates conflicts caused by multiple memory blocks mapping to the same cache location, as
seen in direct mapping. However, associative mapping requires more complex hardware to search the cache for the correct block, making
it slower and more expensive.

Associative Mapping Simplified

In associative mapping, memory blocks can be stored in any cache block. The cache checks if the requested data is present by comparing
the tag of the memory address with the tags of the blocks in the cache.

Memory Address Structure:

1. Word Field (Low-order 4 bits):


o Identifies which specific word (out of 16 words) in a block is needed.
2. Tag Field (High-order 12 bits):
o Identifies the memory block to which the word belongs.

How It Works:

1. The processor asks for data.


2. The cache checks all its blocks to see if the tag matches the requested block's tag.
3. If there's a match (cache hit), the cache sends the data.
4. If there's no match (cache miss), the data is fetched from main memory and stored in the cache.

This method allows any memory block to be placed in any cache block, making it more flexible but requiring the cache to check all its
blocks.
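
The sketch below models this tag comparison for a toy fully associative cache of three lines; the cache size is arbitrary and matches the earlier three-block example.

```python
# Fully associative lookup: the tag (block number) is compared against every
# occupied cache line.
cache = [None, None, None]     # each entry: (tag, data) or None

def lookup(block_number):
    for i, entry in enumerate(cache):
        if entry is not None and entry[0] == block_number:
            return f"hit: block {block_number} in cache line {i}"
    # miss: place the block in any free line (a replacement policy runs if none is free)
    for i, entry in enumerate(cache):
        if entry is None:
            cache[i] = (block_number, f"data of block {block_number}")
            return f"miss: block {block_number} loaded into line {i}"
    return "miss: cache full, a replacement policy must choose a victim"

print(lookup(0))     # miss, placed in line 0
print(lookup(128))   # miss, placed in line 1
print(lookup(0))     # hit, line 0
```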

Cache Replacement Policy (Super Simple): When the cache is full and needs space for new data, it uses a replacement policy to decide
which old data to remove:

1. LRU (Least Recently Used): Removes the data that hasn't been used for the longest time.
2. FIFO (First-In-First-Out): Removes the oldest data in the cache.
3. Random Replacement: Removes random data from the cache.

These policies help make space for new data in the cache.

Set-Associative Mapping with Replacement Policy (Like Before)

In set-associative mapping, each memory block can be stored in any cache block within a specific set. The replacement policy is used to
decide which block within the set to replace when the cache is full.

Example Setup:

• Cache Blocks (2-way set-associative): 2 sets, each with 2 blocks.


• Main Memory Blocks: 0, 128, 256, 384, etc.

How It Works (Mapping):

Each block maps to a set using: Set = Main Memory Block Number mod Number of Sets. With 2 sets, blocks 0, 128, 256, and 384 (all even) map to Set 0, while odd-numbered blocks would map to Set 1.

1. Main Memory Block 0:
o Block 0 maps to Set 0 (0 mod 2 = 0).
o Block 0 is placed in Cache Block 0 of Set 0.
2. Main Memory Block 128:
o Block 128 also maps to Set 0 (128 mod 2 = 0).
o Block 128 is placed in Cache Block 1 of Set 0.
3. Main Memory Block 256:
o Block 256 also maps to Set 0, but both blocks of the set are now occupied.
o A replacement policy (described below) chooses which block in Set 0 to evict.
4. Main Memory Block 384:
o Block 384 likewise maps to Set 0 and again triggers a replacement within the set.

Replacement Policy:

When a new block needs to be loaded and the set is full, the cache will replace one of the blocks within the set using a replacement
policy.

1. LRU (Least Recently Used):


o If Set 0 is full (both Block 0 and Block 128 are in the set), and Block 256 needs to be loaded into Set 0, the cache will
replace the least recently used block. For example, if Block 0 has not been used for the longest time, Block 0 will be
replaced.
2. FIFO (First-In-First-Out):
o If Set 0 is full and Block 256 needs to be loaded, Block 0 (the first block loaded) will be replaced because it was the
first block in the set.
3. Random Replacement:
o If Set 0 is full and Block 256 needs to be loaded, the cache will randomly select either Block 0 or Block 128 to be
replaced.

Summary:

In set-associative mapping, blocks from main memory can go into any cache block within a set. When the set is full and a new block needs
to be loaded, a replacement policy (LRU, FIFO, or Random) determines which block to remove from the set.
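
A compact sketch of 2-way set-associative placement with LRU replacement inside each set, matching the example setup above, might look like this:

```python
# 2-way set-associative sketch with 2 sets and LRU replacement inside each set
# (set index = block number mod 2, as in the example above).
from collections import OrderedDict

NUM_SETS, WAYS = 2, 2
sets = [OrderedDict() for _ in range(NUM_SETS)]   # per set: block -> data

def access(block):
    s = sets[block % NUM_SETS]
    if block in s:
        s.move_to_end(block)                       # hit: refresh LRU order
        print(f"block {block}: hit in set {block % NUM_SETS}")
    else:
        if len(s) >= WAYS:
            victim, _ = s.popitem(last=False)      # evict the least recently used block
            print(f"block {block}: evicts block {victim} from set {block % NUM_SETS}")
        else:
            print(f"block {block}: placed in set {block % NUM_SETS}")
        s[block] = f"data of block {block}"

for b in (0, 128, 256, 384):
    access(b)    # all even blocks map to set 0; LRU evictions follow
```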

Centralized Shared-Memory Architectures (Detailed)

In a Centralized Shared-Memory Architecture, all processors in the system share a single, central memory pool. This means that the
processors can directly access the memory, which is shared across all of them. Communication between processors happens via a shared
bus or memory module. This architecture is typically used in Symmetric Multiprocessing (SMP) systems.

How It Works:

1. Single Memory Pool:


o All processors are connected to a single, unified memory space, meaning they can all access the same memory. Each
processor can read or write data to the memory, just like accessing a large shared pool of data.
2. Shared Bus or Memory Module:
o The processors communicate with the memory using a shared bus or interconnect. The bus handles all memory
requests from the processors, but it can become slow if many processors are trying to access memory at once.
3. Uniform vs. Asymmetric Memory Access:
o In a centralized design, no processor "owns" a portion of memory; every processor reaches the shared memory over the same interconnect with essentially the same latency. In distributed shared-memory designs, by contrast, a request for memory held by another processor must go through that processor, creating asymmetric access where memory access times differ depending on which processor holds the memory.
4. Private vs. Shared Data:
o Private Data: This type of data is used by only one processor. It can be stored in the cache of that processor, making
access quicker.
o Shared Data: This data is used by multiple processors, so copies of it may be stored in the caches of those
processors. This helps reduce memory access time but introduces the challenge of cache coherence (ensuring all
copies of the shared data remain consistent).
5. Cache Coherence Problem:
o Since multiple processors may store copies of the same data in their own caches, there can be a situation where one
processor updates a piece of data, but other processors still have the old version of it. This is called the cache
coherence problem and requires mechanisms to ensure that all caches stay in sync.

UMA (Uniform Memory Access):

• UMA means that all processors have the same access time to the shared memory. This is a key characteristic of centralized
shared-memory architectures. Each processor perceives the same memory access latency, making the system easier to
program because all processors work with memory in the same way.
• Single Memory Space means all processors access the memory as if it's a large, shared pool, simplifying memory management.

Advantages of Centralized Shared-Memory Systems:

1. Simplicity in Programming:
o Since all processors share the same memory, programming becomes easier. You can use simple programming
models like threads and shared variables, where processors can communicate by reading and writing to shared
variables directly.
2. Memory Consistency:
o With centralized memory, it’s easier to maintain coherence or consistency of data because all memory is stored in
one place. There’s no need to worry about which processor has the most recent copy of the data, unlike in
distributed memory systems.
3. Efficient Memory Use:
o Multilevel caches reduce the demand on the main memory and increase processor speed. By caching data locally in
each processor’s cache, the system minimizes memory traffic and improves overall performance.

Disadvantages of Centralized Shared-Memory Systems:

1. Scalability Limitations:
o As more processors are added to the system, the shared bus or interconnect becomes a bottleneck. Only one
processor can access the memory at a time, so as the number of processors increases, the memory access time
becomes slower, limiting how well the system can scale.
2. Communication Overhead:
o Since all processors share the same bus to communicate with memory, there is contention when multiple
processors try to access memory at the same time. This leads to delays, reducing the system's overall performance,
especially when the number of processors increases.
3. Cache Coherence Overhead:
o Maintaining cache coherence becomes more complex in systems with multiple processors. Cache coherence
protocols (such as MESI) are required to ensure that all processors have the most up-to-date version of the shared
data. This adds complexity and overhead to the system.

Summary:

A Centralized Shared-Memory Architecture provides a unified memory space accessible by all processors. This simplifies programming
and memory management by offering a Uniform Memory Access (UMA) system, where each processor experiences the same access time
to memory. However, it faces challenges like scalability issues due to the shared bus, communication delays, and the cache coherence
problem when multiple processors try to access or update the same data. As the number of processors increases, the system’s
performance may decrease due to these factors.
Cache Coherence in Multiprocessor Systems

In a system with multiple processors, each processor has its own cache memory. The cache speeds up access to data by storing frequently
used information. But when processors share data, it’s important to make sure that all caches have the same up-to-date version of that
data. This is called cache coherence.

Two Main Ways to Ensure Cache Coherence:

1. Snooping Protocol
o How it works: Each processor watches (or "snoops on") the shared bus to see what other processors are doing with
the data.
o Example: If Processor 1 changes a value in its cache, other processors will notice and update or delete their copies
of that data.
o Best for: Smaller systems where processors are connected through a shared bus.
2. Directory-Based Protocol
o How it works: Instead of snooping, there is a central directory that keeps track of which caches have copies of each
data block.
o Example: If Processor 1 wants to access data, the directory will tell it who has copies of that data, and it makes sure
everything stays up-to-date.
o Best for: Larger systems, where keeping track of data using a shared bus would be too slow or messy.

Key Points:

• Snooping is simple, works well for small systems, and uses a shared bus to track data changes.
• Directory-based is better for larger systems because it avoids too much traffic on the bus and uses a central directory to
manage data.

Both methods are used to make sure all processors have the same data in their caches, just in different ways.

In multiprocessor systems, where multiple processors share data, it's important to make sure everyone has the latest data. This is called
cache coherence. There are two main ways to do this:

1. Directory-Based Protocols

• How it works: There's a central directory that keeps track of which processors have copies of each piece of data.
• In SMP systems: The directory is part of the main memory or a central point that keeps everything in check.
• In DSM systems: Many processors are involved, and having a single directory can slow things down because too many
processors might need to access it at once.
• Downside: Since all requests go through the directory, it can be slower compared to other methods.

2. Snooping Protocols

• How it works: Instead of a directory, each processor "snoops" (listens) on a shared bus that connects all the processors'
caches.
• When a processor updates data: It sends a signal on the bus to tell the other processors to update or delete their copies of
that data.
• In SMP systems: All processors are connected via the shared bus, so they can easily see what others are doing with data.
• Why it's good: It's simple and works well for smaller systems because processors can quickly check the bus for updates.

Key Takeaways:

• Directory-Based: A central directory keeps track of data, but it can slow down in larger systems.
• Snooping: Processors listen to a shared bus for updates, and it's simple and fast for smaller systems.

The Snooping Coherence Protocol is a method used in multiprocessor systems to ensure that all processors have a consistent view of
shared memory, even when each processor has its own private cache. This protocol requires each cache to "snoop" or monitor
transactions on a shared bus to detect any updates or changes to memory data made by other processors.
Key Characteristics

1. Shared Bus:
o All processors and caches are connected via a shared communication bus.
o When a processor changes a memory block, this change can be broadcast on the bus, allowing other caches to see
and respond to it.
2. Cache Coherence:
o Each cache listens to the bus and monitors memory transactions.
o If a processor writes to a memory block, other caches can detect this and make necessary updates to avoid holding
outdated or inconsistent data.
3. Snooping on the Bus:
o When any processor performs a read or write operation, all other caches "snoop" on the action.
o Based on what they detect, caches may update, invalidate, or keep their data to ensure consistency.

Types of Snooping Protocols

1. Write-Invalidate Protocol:
o This protocol ensures that only one cache has a valid (exclusive) copy of data when it is written to.
o How it works: When a processor writes to a memory block, it sends an "invalidate" signal on the bus. Other caches
receiving this signal invalidate their copies of that memory block, ensuring that the writing processor has exclusive
access.
o Example:
▪ If Processor A writes to block X, it invalidates copies of X in all other caches (e.g., Processor B). If
Processor B later needs X, it must fetch the updated version from memory or Processor A.
2. Write-Update Protocol (also called Write-Broadcast):
o In this protocol, when a processor writes to a cache block, it broadcasts the new data to all other caches.
o How it works: The updated data is shared immediately across all caches, ensuring that each cache holds the latest
version of the memory block.
o Example:
▪ If Processor A writes to block X, it broadcasts the updated value of X to all other processors (e.g.,
Processor B). Processor B then updates its cache with the new value of X.

Cache States:To manage these protocols, each cache line can be in one of the following states:

• Modified (M): The data is modified and only exists in one cache.
• Shared (S): The data is unmodified and can be shared by multiple caches.
• Invalid (I): The data is not valid and needs to be fetched from memory if accessed.

Advantages of Snooping Protocols

1. Simplicity: They are relatively straightforward to implement, especially in small systems with a shared bus.

2. Efficiency: They work well in small setups where all processors can monitor the bus in real-time.

3. Low Latency: Changes propagate quickly across caches because all caches are monitoring the bus.
Disadvantages of Snooping Protocols

1. Bus Contention: With many processors, bus congestion can occur since each processor snoops on every transaction.

2. Scalability Issues: Snooping protocols don’t scale well in larger systems, as a shared bus becomes a bottleneck.

3. Limited Bandwidth: The shared bus has limited bandwidth, reducing performance as more processors are added.

The Directory-Based Coherence Protocol is used in distributed-memory multiprocessor systems to maintain cache coherence. In such
systems, each processor has its own cache, and the directory keeps track of which caches have copies of each memory block.

Key Concepts

1. Directory:

o The directory is responsible for keeping track of the state of each memory block and which caches have copies of the
block.

o It uses a bit vector to indicate which processors hold copies of a memory block.

2. Cache States:

o Shared: The cache block is present in multiple caches, and all copies are up-to-date.

o Uncached: No cache holds the block.

o Modified: A single cache holds the exclusive copy of the block, and the memory copy is outdated.

3. Nodes:

o Local Node: The processor that initiates a cache request.

o Home Node: The node that contains the memory block and the directory for the block.

o Remote Node: A processor that holds a copy of the memory block.

Operations in Directory-Based Protocol

1. Read Miss:

o When a processor misses in its cache, the directory checks the status of the memory block.

o If the block is not in any cache (uncached), the directory fetches it from memory and sends it to the requesting
processor.

o If the block is in another processor’s cache (exclusive), the directory requests the cache to update its copy to shared
and sends the block to the requesting processor.

2. Write Miss:

o If a processor writes to a block that is in multiple caches (shared), the directory sends an invalidate signal to all
remote caches to ensure the writing processor has exclusive access.

Directory-Based Protocol Example

1. Cache Read Operation:

o Cache Hit: If the block is in the cache, it is directly used.

o Cache Miss: If the block is not in the cache:

▪ The directory checks if the block is uncached. If so, it fetches it from memory.

▪ If the block is exclusive in another cache, the directory requests the cache to send the block and updates
the state.

2. Cache Write Operation:

o The directory sends invalidation signals to remote caches if the block is shared. The block is then marked as modified
in the writing cache.

Example Message Flow


1. Read Miss:

o Local cache sends a request to the home directory.

o If the block is uncached, the home directory sends the block to the local cache.

o If the block is exclusive in another cache, the directory sends a request to that cache to share or invalidate the block
and sends it to the requesting processor.

2. Write Miss:

o The home directory sends an invalidate message to remote caches to remove their copies of the block.

o The writing cache updates the directory to mark the block as modified.

(Simplified diagram omitted.)
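
As a rough illustration of the read-miss handling described above, the sketch below models a directory entry and its sharer list. The state names, field names, and simplifications (no real message passing, no data write-back) are assumptions for demonstration only, not the protocol's actual implementation.

```python
# Toy sketch of a directory entry and read-miss handling.
class DirectoryEntry:
    def __init__(self):
        self.state = "uncached"     # "uncached", "shared", or "modified"
        self.sharers = set()        # processor ids holding a copy
        self.owner = None           # holder of the exclusive copy, if any

def read_miss(directory, block, requester):
    entry = directory.setdefault(block, DirectoryEntry())
    if entry.state == "uncached":
        entry.state = "shared"                 # supply the block from memory
    elif entry.state == "modified":
        entry.sharers.add(entry.owner)         # ask the owner to downgrade to shared
        entry.owner = None
        entry.state = "shared"
    entry.sharers.add(requester)
    return f"block {block} sent to P{requester}, sharers = {sorted(entry.sharers)}"

directory = {}
print(read_miss(directory, 7, requester=0))   # uncached -> shared, fetched from memory
print(read_miss(directory, 7, requester=1))   # already shared, P1 added to the sharer list
```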

Advantages

• Scalable: Directory-based protocols are more scalable for large systems since they avoid the bus contention seen in snooping
protocols.

• Efficient: Only involved caches communicate, reducing unnecessary traffic.

Disadvantages

• Directory Overhead: Maintaining the directory adds complexity.

• Latency: There can be delays as the home node manages the coherence state and updates multiple caches.

In summary, the Directory-Based Coherence Protocol provides a way to maintain consistency in large multiprocessor systems by using a
centralized directory to track which caches have which memory blocks, ensuring data coherence and minimizing unnecessary data
transfers.

Write-Invalidate Snooping Cache Coherence Protocol

The Write-Invalidate Snooping Protocol is a common technique used to ensure cache coherence in systems where multiple processors
have their own caches. In this protocol, when one processor writes to a block of memory, all other caches that hold a copy of that
memory block are notified to invalidate their copies.

Key Cache States

1. Modified (M): The cache has the only valid copy of the data, and it is different from the main memory.

2. Shared (S): The cache has a copy of the data, but it is consistent with the main memory. Multiple caches may have a copy.

3. Invalid (I): The cache does not have a valid copy of the block.

Basic Operations

• Read Miss: A processor reads a block that is not in its cache.

• Write Miss: A processor writes to a block that is not in its cache.

Write-Invalidate Protocol: How It Works

• Write Miss: When a processor writes to a memory block, it sends a message over the shared bus. If other caches have the
block, they invalidate their copies.

• Read Miss: When a processor reads a block that is not in its cache, it requests the data. If another processor has the block, it
sends the data.

State Transitions

1. Modified (M):

o The processor has the only valid copy, and it is different from the main memory.

o Write Miss: Invalidate other caches → Transition to Invalid (I).


o Read Miss: Send the block to the requesting processor → Transition to Shared (S).

2. Shared (S):

o The processor has a copy of the block that is consistent with the main memory.

o Write Miss: Invalidate the block in other caches → Transition to Invalid (I).

o Read Miss: No change → Remains in Shared (S).

3. Invalid (I):

o The cache does not contain a valid copy of the block.

o Write Miss: The processor fetches the block and writes to it → Transition to Modified (M).

o Read Miss: The processor fetches the block and reads it → Transition to Shared (S).

State Transition Diagram

Description of Transitions:

1. From Modified (M):

o Write Miss: The processor sends an invalidation message to other caches, which transition to Invalid (I).

o Read Miss: The processor sends the block to the requesting processor and changes the state to Shared (S).

2. From Shared (S):

o Write Miss: Other caches are invalidated, and the state transitions to Invalid (I).

o Read Miss: The block remains in Shared (S) as it is still valid for other processors to read.

3. From Invalid (I):

o Write Miss: The processor fetches the block and writes to it, transitioning to Modified (M).

o Read Miss: The processor fetches the block and transitions to Shared (S).

Example Scenario

• Processor A writes to a block, changing its state to Modified (M).

• Processor B reads the same block, transitioning its state to Shared (S).

• Processor C writes to the same block, causing both Processor A and Processor B to invalidate their copies, transitioning to
Invalid (I).
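
The scenario above can be traced with a small sketch of the three states (M/S/I), where the bus is modelled as a simple broadcast to every other cache; the processor names and the single tracked block are illustrative only.

```python
# Sketch of the write-invalidate behaviour for one memory block.
caches = {"A": "I", "B": "I", "C": "I"}      # per-processor state for the block

def write(proc):
    for other in caches:
        if other != proc:
            caches[other] = "I"              # invalidate all other copies
    caches[proc] = "M"                       # the writer holds the only valid copy

def read(proc):
    if caches[proc] == "I":                  # read miss
        for other, state in caches.items():
            if state == "M":
                caches[other] = "S"          # owner supplies the block and downgrades
        caches[proc] = "S"

write("A"); print(caches)   # {'A': 'M', 'B': 'I', 'C': 'I'}
read("B");  print(caches)   # {'A': 'S', 'B': 'S', 'C': 'I'}
write("C"); print(caches)   # {'A': 'I', 'B': 'I', 'C': 'M'}
```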

Advantages of Write-Invalidate Protocol

• Simplicity: Easy to implement because when a processor writes, it simply invalidates the blocks in other caches.

• Data Integrity: Prevents caches from using stale data by ensuring that only one processor has the valid copy of a modified
block.

Disadvantages

• Bus Traffic: High bus traffic due to the need to send invalidation messages every time a write occurs.

• Latency: There can be delays as caches are invalidated and data is fetched again.

In summary, the Write-Invalidate Snooping Protocol helps maintain cache coherence by ensuring that no processor works with outdated
data by invalidating copies in other caches whenever a write operation occurs.
