Unit 3 Notes
Temporal Locality: This suggests that if a particular memory location is accessed, it is likely to be
accessed again soon. For example, loops and frequently called functions exhibit temporal locality.
Spatial Locality: This indicates that if a memory location is accessed, nearby memory locations are likely
to be accessed soon afterward. This is often seen in array accesses and sequential data processing.
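As a small sketch in C (the array name and size are arbitrary choices for illustration), the loop below shows both kinds of locality: the accumulator and the loop counter are reused on every iteration (temporal locality), while the array is traversed element by element in the order it is laid out in memory (spatial locality).

    #include <stdio.h>

    #define N 1024

    int main(void) {
        int a[N];
        long sum = 0;                  /* reused on every iteration: temporal locality */

        for (int i = 0; i < N; i++)    /* 'i' and 'sum' are touched repeatedly: temporal locality */
            a[i] = i;

        for (int i = 0; i < N; i++)
            sum += a[i];               /* consecutive elements of 'a': spatial locality */

        printf("sum = %ld\n", sum);
        return 0;
    }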
To take advantage of the principle of locality, modern computer systems use a hierarchical memory
structure that balances speed, cost, and size. The typical memory hierarchy includes:
Registers: The fastest type of memory, located within the CPU. They store a small amount of data that is
immediately needed by the processor for calculations and operations.
Cache Memory: This is a small, high-speed memory located close to the CPU. It is divided into levels (L1,
L2, and sometimes L3). Caches store frequently accessed data and instructions to reduce the time the
CPU takes to access data from main memory. L1 is the smallest and fastest, while L3 is larger but slower.
Main Memory (RAM): This is the primary storage used for currently running programs and data. It is
larger than cache but slower. RAM is volatile, meaning it loses its contents when the power is turned off.
Secondary Storage: This includes hard drives (HDDs) and solid-state drives (SSDs). It provides long-term
storage for data and programs but is much slower than RAM. SSDs are faster than HDDs but generally
more expensive.
Tertiary Storage: This is used for backup and archival purposes, such as magnetic tape or optical discs. It
has the slowest access times and is often used for data that is not frequently accessed.
Set associative mapping combines direct mapping with fully associative mapping by arranging the lines of a cache into sets. The set that a memory block belongs to is determined using a direct mapping scheme. However, the lines within each set are treated as a small fully associative cache: any block that maps to a set can be stored in any line within that set.
The diagram shows this arrangement using a sample cache with four lines per set.
A set-associative cache with k lines per set is known as a k-way set-associative cache. Because the set is selected using bits of the memory address, just as in direct mapping, the number of lines in a set is normally an integer power of two, for example two, four, eight, or sixteen.
Example − Consider a cache with 2^9 = 512 lines, memory blocks of 2^3 = 8 words, and a full memory space of 2^30 = 1G words. In a direct mapping scheme, this leaves 30 − 9 − 3 = 18 bits for the tag.
By moving from direct mapping to set-associative mapping with two lines per set, the number of sets is half the number of lines. For the cache with 512 lines, this gives 256 sets of two lines each, which requires eight bits of the memory address to identify the set. This leaves 30 − 8 − 3 = 19 bits for the tag. Moving to four lines per set reduces the number of sets to 128, needing 7 bits to identify the set and leaving 20 bits for the tag.
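The arithmetic above can be checked with a small C sketch that splits a 30-bit word address into tag, set-index, and word-offset fields for the three configurations discussed; the constant names, the sample address, and the printout are illustrative assumptions, not part of the original example.

    #include <stdio.h>
    #include <stdint.h>

    /* Cache with 512 lines, 8-word blocks, 2^30-word address space (as in the example). */
    #define ADDR_BITS   30
    #define TOTAL_LINES 512
    #define BLOCK_WORDS 8   /* 3 word-offset bits */

    static void split(uint32_t addr, int ways) {
        int offset_bits = 3;                        /* log2(BLOCK_WORDS) */
        int sets        = TOTAL_LINES / ways;       /* 512, 256, or 128 sets */
        int set_bits    = 0;
        while ((1 << set_bits) < sets) set_bits++;  /* log2(sets): 9, 8, or 7 */
        int tag_bits    = ADDR_BITS - set_bits - offset_bits;

        uint32_t offset = addr & (BLOCK_WORDS - 1);
        uint32_t set    = (addr >> offset_bits) & (sets - 1);
        uint32_t tag    = addr >> (offset_bits + set_bits);

        printf("%d-way: tag=%u (%d bits), set=%u (%d bits), offset=%u\n",
               ways, tag, tag_bits, set, set_bits, offset);
    }

    int main(void) {
        uint32_t addr = 0x12345678 & ((1u << ADDR_BITS) - 1);  /* arbitrary 30-bit word address */
        split(addr, 1);   /* direct mapped: 18 tag bits */
        split(addr, 2);   /* 2-way:         19 tag bits */
        split(addr, 4);   /* 4-way:         20 tag bits */
        return 0;
    }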
3. What is miss rate? Explain the three categories of cache misses in the three Cs model.
Miss Rate
The miss rate is a performance metric used in cache memory systems to indicate the fraction (often expressed as a percentage) of memory accesses that result in a cache miss. It is calculated as:

    Miss Rate = Number of Cache Misses / Total Number of Memory Accesses
A lower miss rate indicates better cache performance, as it means that more memory accesses are being
served by the cache rather than having to go to slower main memory.
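For instance, with purely illustrative numbers: if a program issues 10,000 memory accesses and 400 of them miss in the cache, the miss rate is 400 / 10,000 = 4%, and the hit rate is 96%.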
The Three Cs Model categorizes cache misses into three distinct types: Compulsory Misses, Capacity
Misses, and Conflict Misses. Each type of miss arises from different causes:
Compulsory Misses:
Definition: These misses occur the first time a block is accessed. When data is loaded into the cache for the first time, that access is counted as a compulsory miss (also called a cold-start miss).
Example: If a program is accessing an array for the first time, the initial accesses to that array will result
in compulsory misses until the relevant blocks are loaded into the cache.
Impact: Compulsory misses are inevitable and can’t be eliminated entirely, but their frequency can be
reduced through techniques like prefetching.
Capacity Misses:
Definition: These occur when the cache cannot hold all the blocks that are actively being used by the
program. As a result, previously loaded blocks are evicted before they are reused.
Example: If a cache has a limited size and a program accesses more data than can fit into that cache,
some blocks will be evicted, leading to misses when those blocks are accessed again.
Impact: Capacity misses can be mitigated by increasing cache size or optimizing data access patterns to
fit within the available cache.
Conflict Misses:
Definition: These arise in set-associative or direct-mapped caches when multiple blocks compete for the
same cache line or set. Even if there is space in other cache lines, the particular block being accessed is
not available because it maps to a specific set or line that is already occupied by a different block.
Example: In a direct-mapped cache, if two different memory blocks map to the same cache line,
accessing one block will evict the other, leading to conflict misses.
Impact: Conflict misses can be reduced by using higher associativity (i.e., moving to a more flexible cache
structure) or optimizing the memory access pattern to reduce conflicts.
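The conflict-miss case can be made concrete with a C sketch. Assuming, purely for illustration, a 32 KB direct-mapped cache, the two arrays below are placed exactly one cache-size apart, so a[i] and b[i] always map to the same cache line and repeatedly evict each other even though the rest of the cache is unused.

    #include <stdio.h>
    #include <stdlib.h>

    /* Assumed (for illustration only): a 32 KB direct-mapped cache.
     * Two addresses that differ by a multiple of the cache size share the same line index. */
    #define CACHE_SIZE (32 * 1024)
    #define N          (CACHE_SIZE / sizeof(double))

    int main(void) {
        /* Allocate two arrays exactly one cache-size apart, so a[i] and b[i]
         * contend for the same line in a direct-mapped cache. */
        double *block = malloc(2 * CACHE_SIZE);
        if (!block) return 1;
        double *a = block;
        double *b = (double *)((char *)block + CACHE_SIZE);

        double sum = 0.0;
        for (size_t i = 0; i < N; i++) {
            a[i] = 1.0;
            b[i] = 2.0;            /* evicts the line holding a[i] */
            sum += a[i] + b[i];    /* alternating accesses keep evicting each other */
        }
        printf("%f\n", sum);
        free(block);
        return 0;
    }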
4. Explain the techniques used to reduce the cache miss rate.
1. Larger Cache Size:
Explanation: Increasing the cache size allows it to store more data blocks, reducing capacity misses. A larger cache can hold more of the program's working data, which is particularly beneficial for applications with high data locality.
2. Higher Associativity:
Explanation: Using a higher level of associativity (e.g., going from direct-mapped to 4-way or 8-way set associative) reduces conflict misses by allowing several blocks that map to the same set to reside in the cache at the same time. This flexibility decreases the chances of evicting useful data.
3. Optimized Block Size:
Explanation: Choosing an optimal cache block (line) size is crucial. Larger blocks can exploit spatial
locality by fetching adjacent data along with the requested block. However, if blocks are too large, it can
lead to higher miss penalties and waste space for infrequently used data. A balance must be found based
on the access patterns of applications.
4. Prefetching:
Explanation: Prefetching involves predicting future memory accesses and loading data into the cache
before it is explicitly requested by the processor. This can significantly reduce latency for sequential
accesses and loops, though care must be taken to avoid evicting useful data.
5. Write Policies:
Explanation: Adjusting write policies, such as using write-back instead of write-through caching, can improve performance. Write-back caching writes modified data to main memory only when the block is evicted from the cache, reducing memory traffic and improving overall speed.
6. Replacement Policies:
Explanation: Implementing effective cache replacement policies (e.g., Least Recently Used (LRU),
First-In-First-Out (FIFO), or Random) can influence which cache lines to evict when new data needs to be
loaded. Better policies can reduce misses by keeping frequently accessed data in the cache longer.
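To make the replacement-policy idea concrete, here is a minimal LRU sketch in C for a single 4-way set; the structure names, the timestamp scheme, and the 4-way choice are assumptions for illustration, not a description of real hardware.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define WAYS 4   /* 4-way set-associative: an illustrative choice */

    struct line {
        bool     valid;
        uint32_t tag;
        uint64_t last_used;   /* time of the most recent access to this line */
    };

    struct set {
        struct line lines[WAYS];
    };

    /* Look up 'tag' in the set at time 'now'. On a hit, refresh its timestamp.
     * On a miss, fill an invalid line if one exists, otherwise evict the
     * least recently used line. Returns the way that was used. */
    static int access_set(struct set *s, uint32_t tag, uint64_t now, bool *hit) {
        int victim = 0;

        for (int w = 0; w < WAYS; w++) {
            if (s->lines[w].valid && s->lines[w].tag == tag) {
                s->lines[w].last_used = now;   /* LRU bookkeeping on a hit */
                *hit = true;
                return w;
            }
            if (!s->lines[w].valid)
                victim = w;                    /* prefer an empty line */
            else if (s->lines[victim].valid &&
                     s->lines[w].last_used < s->lines[victim].last_used)
                victim = w;                    /* otherwise track the oldest line */
        }

        *hit = false;
        s->lines[victim].valid     = true;     /* install the new block in the victim line */
        s->lines[victim].tag       = tag;
        s->lines[victim].last_used = now;
        return victim;
    }

    int main(void) {
        struct set s = {0};
        bool hit;
        uint32_t tags[] = {1, 2, 1, 3, 4, 5, 2};   /* five distinct tags overflow a 4-way set */

        for (uint64_t t = 0; t < 7; t++) {
            access_set(&s, tags[t], t, &hit);
            printf("access tag %u -> %s\n", tags[t], hit ? "hit" : "miss");
        }
        return 0;
    }

In this run, the second access to tag 1 hits, tag 5 evicts the least recently used block (tag 2), and the final access to tag 2 therefore misses, which is exactly the behaviour an LRU policy is meant to produce.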
5. What is way prediction? How is it used to reduce the cache hit time?
Way Prediction
Way prediction is a technique used in set-associative caches to improve access times by predicting which
way (or line) within a set will contain the required data. In a set-associative cache, each set contains
multiple cache lines, and way prediction aims to minimize the time taken to access these lines when
looking for data.
Instead of checking all the lines in a set sequentially to find the required data, the system predicts which
line is likely to contain the desired block based on past access patterns. This prediction can be based on
historical usage data or specific algorithms.
Upon receiving a memory request, the cache controller uses the prediction to first check the predicted
line in the set. If the data is found (a hit), access time is reduced since only one line was checked.
If the predicted line does not contain the data (a miss), the system then checks the remaining lines in the
set. This process involves some overhead, but the initial access is often significantly faster than checking
all lines.
Benefits
Reduced Hit Time: By reducing the number of cache lines that need to be checked on average, way prediction decreases the cache access time, leading to faster data retrieval.
Prediction Accuracy: In workloads with predictable access patterns (such as loops or frequently accessed data), the prediction is correct more often, so most hits are served at the faster, single-way access time.
Efficiency:
This technique minimizes the performance impact of the additional complexity introduced by using a
set-associative cache, making the access pattern more efficient.
Implementation Considerations
Prediction Mechanism: The effectiveness of way prediction heavily depends on the accuracy of the
prediction mechanism. Common strategies include:
Simple History: Keeping a record of recent accesses to predict which way to check first.
State Machines: Using finite state machines that track access patterns over time to make predictions.
Trade-offs: While way prediction can reduce hit times, it also introduces additional complexity and
potential overhead in terms of hardware resources and prediction accuracy.
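A minimal C sketch of way prediction, under the assumption of a simple "last way that hit" predictor per set (the data-structure names and the 4-way geometry are illustrative): the lookup probes only the predicted way on the fast path, and falls back to searching the other ways and retraining the predictor when that probe fails.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define WAYS 4   /* illustrative 4-way set */

    struct pset {
        bool     valid[WAYS];
        uint32_t tag[WAYS];
        int      predicted_way;   /* the way that hit most recently */
    };

    /* Returns true on a cache hit. *fast is set when the predicted way was
     * correct, i.e. only one tag comparison was needed (the reduced-hit-time case). */
    static bool lookup(struct pset *s, uint32_t tag, bool *fast) {
        int p = s->predicted_way;

        /* Fast path: probe only the predicted way. */
        if (s->valid[p] && s->tag[p] == tag) {
            *fast = true;
            return true;
        }

        /* Slow path: the prediction was wrong, so search the remaining ways
         * and retrain the predictor if the block is found. */
        *fast = false;
        for (int w = 0; w < WAYS; w++) {
            if (w != p && s->valid[w] && s->tag[w] == tag) {
                s->predicted_way = w;   /* remember this way for the next access */
                return true;
            }
        }
        return false;   /* a genuine cache miss: the block is in no way of this set */
    }

    int main(void) {
        struct pset s = { .valid = {true, true, false, false},
                          .tag   = {10, 20, 0, 0},
                          .predicted_way = 0 };
        bool fast, hit;

        hit = lookup(&s, 10, &fast);   /* predicted way is correct: fast hit */
        printf("tag 10: hit=%d fast=%d\n", hit, fast);
        hit = lookup(&s, 20, &fast);   /* mispredicted: slow hit, predictor retrained */
        printf("tag 20: hit=%d fast=%d\n", hit, fast);
        hit = lookup(&s, 20, &fast);   /* now the fast path hits again */
        printf("tag 20: hit=%d fast=%d\n", hit, fast);
        return 0;
    }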
6. Explain the use of loop interchange to reduce the miss rate, with an example.
Loop interchange is a code optimization technique used to improve data locality and reduce cache miss
rates in nested loops. By changing the order of loop iterations, you can enhance spatial locality, which
helps to maximize cache hits when accessing array elements.
When accessing multidimensional arrays, the order in which loops iterate over the array can greatly
affect cache performance. If the innermost loop accesses data that is not contiguous in memory (leading
to scattered memory accesses), it can result in more cache misses. Loop interchange reorders these
loops to access data in a more cache-friendly manner.
Example
Consider the following nested loop over two M-by-N arrays, written with the column index j as the outer loop and the row index i as the inner loop:

    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            A[i][j] += B[i][j];

In this example, consecutive iterations of the inner loop access A[0][j], A[1][j], A[2][j], and so on, elements that are a full row (N elements) apart in memory.
Cache Behavior
In row-major order (which is how C/C++ stores arrays), this access pattern is cache-unfriendly: each access to A[i][j] loads a cache line that also contains A[i][j+1] and its other neighbours in row i, but the next iteration of the inner loop accesses A[i+1][j], which lies in a different row N elements away. The neighbouring elements brought in with each line are usually evicted before they are ever used, so a large share of the accesses miss.
After loop interchange, the loops are reordered so that j (the column index) becomes the inner loop:

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            A[i][j] += B[i][j];

By iterating over j (columns) in the inner loop for each i (row), you access contiguous elements in memory, leading to better cache utilization.
With better data locality, the likelihood of cache hits increases, reducing the overall miss rate. Once a cache line is loaded for A[i][j], the next access to A[i][j+1] is very likely to hit in the cache.