
Module 5: Caches

Introduction

A cache is a small, fast memory between the processor and main memory. It holds recently used
data to speed up the processor, reducing the need to access slower main memory. A write buffer,
also used with caches, queues data for efficient writing to main memory. Caches and write
buffers are transparent to software, improving performance without needing code changes.
However, caches make program execution time harder to predict: cache eviction, in which old
data is removed to make room for new data, can affect performance in ways that are difficult to anticipate.

12.1 The Memory Hierarchy and Cache Memory


Memory Hierarchy Overview:

• Processor Core: Innermost level tightly coupled with a register file for fastest memory
access.
❖ Registers provide immediate storage for data being actively processed.
• Primary Level:
❖ Tightly Coupled Memory (TCM): On-chip memory directly connected to the
processor core via dedicated interfaces.
❖ Level 1 (L1) Cache: High-speed on-chip memory holding frequently accessed
data.
❖ Main Memory: Includes SRAM, DRAM, and flash memory; stores programs
during execution.
• Secondary Storage: Larger, slower devices like disk drives used for storing large
programs and data not currently in use.
❖ Characterized by longer access times compared to main memory.

Architecture and Technology:

• TCM and SRAM use similar technologies but differ in placement (on-chip vs. board-
mounted).

Cache Functionality:

• Enhances system performance by reducing the time required to access instructions and
data stored in slower memory levels.
• Moves frequently accessed data from lower levels of the hierarchy to higher levels
temporarily.

L1 Cache and Write Buffer:

• L1 Cache: On-chip memory speeding up access to frequently used data.


• Write Buffer: Small FIFO buffer facilitating efficient writes from cache to main
memory.

L2 Cache:

• Located between L1 cache and slower memory, further optimizing data access.

Cache Interaction with System:

• Figure 12.1: Illustrates L1 cache and write buffer, essential for optimizing data flow.
• Figure 12.2: Demonstrates how caches speed up data retrieval compared to direct access
to slower main memory.
o Shows data movement in cache lines between main memory and faster cache
memory.
o Write buffer temporarily holds data before efficiently writing it to main memory.
12.1.1 Caches and Memory Management Units

Figure 12.3 shows the difference between the two cache placements (logical versus physical).


• Cache and Virtual Memory:

• In a cached core that supports virtual memory, the cache can be placed either between the core
and the Memory Management Unit (MMU), or between the MMU and physical memory.
• Placement determines whether the cache operates in the virtual or physical addressing
realm.

• Logical Cache (Virtual Cache):

• Stores data using virtual addresses, located between the processor and MMU.
• Allows direct access to data without MMU translation.

• Physical Cache:

• Stores data using physical addresses, located between the MMU and main memory.
• Requires MMU translation of virtual addresses to physical addresses before accessing
memory.

• ARM Processor Cache Types:

• ARM7 through ARM10, Intel StrongARM, and Intel XScale processors use logical
caches.
• ARM11 processors utilize physical caches.

• Performance Improvement:

• Caches improve system performance by exploiting the principle of locality of reference.


• Programs often access data and instructions in predictable patterns (locality of reference).
• Caches store frequently used data in faster memory, reducing access time for subsequent
requests.

• Locality of Reference:

• Programs frequently access small loops or nearby data.


• Caches capitalize on this behavior by keeping recently accessed code and data in faster cache
memory.
• Subsequent accesses to cached data are then faster, enhancing overall system performance; a
small illustrative example follows.
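
As a hedged illustration (not taken from the text), the short C program below shows both kinds of locality: the loop body is a handful of instructions executed repeatedly (temporal locality), and the array elements it reads are adjacent in memory, so one cache line fill serves several iterations (spatial locality).

#include <stdio.h>

#define N 1024

int main(void)
{
    static int data[N];   /* contiguous block of memory */
    long sum = 0;

    /* Temporal locality: the same few loop instructions run over and over. */
    for (int i = 0; i < N; i++) {
        /* Spatial locality: data[i] and data[i + 1] sit in the same or
           neighbouring cache lines, so most accesses hit in the cache. */
        sum += data[i];
    }

    printf("sum = %ld\n", sum);
    return 0;
}
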
12.2 Cache Architecture
• ARM cached cores use one of two bus architectures: Von Neumann or Harvard.
• Von Neumann Architecture:

• Single cache used for both instructions and data (unified cache).
• Instructions and data share the same memory space.

• Harvard Architecture:

• Separate buses for instructions (I-cache) and data (D-cache).


• Requires two caches: I-cache for instructions and D-cache for data (split cache).

• Cache Basics:

• Cache memory consists of cache lines, accessed by the cache controller.


• Cache controller uses parts of the processor's address to access cache memory.

• Unified Cache (Von Neumann):

• Contains both instructions and data.

• Split Cache (Harvard):

• Separate caches for instructions and data to improve system performance.

• Cache Components:

• Cache memory: Dedicated memory array accessed in cache lines.


• Cache controller: Manages access between processor and cache memory.
12.2.1 Basic Architecture of a Cache Memory

• Parts of Cache Memory:

• Directory store: Stores cache-tags, identifying where cache lines came from in main
memory.
• Data section: Holds actual data read from main memory.
• Status information: Includes status bits like valid and dirty bits.

• Cache-Tags:

• Directory entries that specify the origin address in main memory for each cache line.

• Data Storage:

• Actual data from main memory stored in the data section of the cache.

• Cache Size Definition:

• Refers to the amount of actual code or data stored in the cache, excluding space for
cache-tags and status bits.

• Status Bits:

• Valid bit: Marks if a cache line contains valid data from main memory.
• Dirty bit: Indicates if data in a cache line differs from the corresponding data in main
memory.
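
As a minimal sketch of these parts (my own illustration; the 16-byte line size is an assumption), one cache line could be modeled in C as follows:

#include <stdint.h>
#include <stdbool.h>

#define LINE_SIZE 16   /* assumed: 16 bytes of data per cache line */

struct cache_line {
    uint32_t tag;              /* directory store: cache-tag recording the origin address in main memory */
    uint8_t  data[LINE_SIZE];  /* data section: copy of the data read from main memory */
    bool     valid;            /* status bit: line holds valid data */
    bool     dirty;            /* status bit: line differs from the copy in main memory */
};

Note that only the data array counts toward the cache size as defined above; the cache-tag and status bits are overhead.
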

12.2.2 Basic Operation of a Cache Controller

❖ The cache controller automatically copies data between main memory and cache to
optimize performance without software intervention.
❖ It intercepts memory read and write requests, dividing addresses into tag, set index, and
data index fields.
❖ Using the set index, it locates potential cache lines in cache memory and checks tags and
status bits.
❖ A cache hit occurs if the requested data is active in cache and matches the tag; otherwise,
it's a cache miss.
❖ On a cache miss, it performs a cache line fill by copying the entire cache line from main
memory to cache.
❖ For cache hits, it directly provides the requested data from cache memory to the
processor using the data index field.
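
A rough sketch of this address split, assuming the 4 KB direct-mapped cache used later in the chapter (16-byte lines, 256 lines), so a 32-bit address divides into a 4-bit data index, an 8-bit set index, and a 20-bit tag:

#include <stdio.h>
#include <stdint.h>

#define DATA_INDEX_BITS 4   /* 16-byte cache line */
#define SET_INDEX_BITS  8   /* 256 cache lines    */

static unsigned data_index(uint32_t addr) { return addr & 0xF; }
static unsigned set_index(uint32_t addr)  { return (addr >> DATA_INDEX_BITS) & 0xFF; }
static unsigned tag_field(uint32_t addr)  { return addr >> (DATA_INDEX_BITS + SET_INDEX_BITS); }

int main(void)
{
    uint32_t addr = 0x00008824;   /* an address ending in 0x824, as in the text */
    printf("tag = 0x%05X, set index = 0x%02X, data index = 0x%X\n",
           tag_field(addr), set_index(addr), data_index(addr));
    return 0;
}

The controller compares tag_field(addr) with the cache-tag stored at set_index(addr); a match on a valid line is a hit, otherwise a cache line fill is started.
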
12.2.3 The Relationship between Cache and Main Memory

• Figure 12.5 illustrates a direct-mapped cache system.


• Main memory addresses are temporarily stored in cache memory.
• Each address in main memory maps to a single location in cache memory.
• Many main memory addresses can map to the same cache location due to the size difference.
• Specifically shows mapping for addresses ending in 0x824.
• Three bit fields from Figure 12.4 are depicted:

• Set index selects the cache location for addresses ending in 0x824.
• Data index selects the specific data within the cache line.
• Tag field is compared to cache-tag to determine data presence in cache.

• Main memory has one million possible locations for every one cache location.
• Only one value from main memory's million can be in cache at a time.
• Tag comparison determines if requested data is in cache or needs fetching.

• Direct-mapped cache uses a single cache location for each main memory address.
• Design cost includes potential high levels of thrashing.
• Thrashing happens when multiple program elements fight for the same cache location.
• This leads to frequent loading and eviction of cache lines.
• Figure 12.6 overlays a software example to demonstrate thrashing.
• Shows two routines called repeatedly in a loop with the same set index address.
• Routines are placed in main memory addresses that map to the same cache location.
• First execution loads and executes routine A in cache.
• Calling routine B evicts routine A from cache.

• Cycle repeats, causing routines to swap places in cache during each loop iteration.
Example in Figure 12.6:

o Illustrates thrashing using a contrived software scenario.


o Two routines (A and B) repeatedly called in a loop with the same set index address.
o Each routine throws out the other from cache as it loads and executes.
o Results in inefficient cache usage and performance degradation.
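
A hedged sketch of this kind of conflict in C (the 4 KB cache size and the assumption that the linker places the two arrays exactly one cache-size apart are mine): when a[i] and b[i] share a set index in a direct-mapped cache, alternating accesses keep evicting each other's lines.

#include <stdint.h>

#define CACHE_SIZE 4096   /* assumed 4 KB direct-mapped cache */

/* If b is placed immediately after a, then &b[i] == &a[i] + 4096,
   so corresponding elements map to the same cache location. */
static uint8_t a[CACHE_SIZE];
static uint8_t b[CACHE_SIZE];

long alternate_sum(void)
{
    long sum = 0;
    for (int i = 0; i < CACHE_SIZE; i++) {
        /* Loading b[i] can evict the line just filled for a[i], and the
           next access to a evicts b's line in turn: thrashing. */
        sum += a[i] + b[i];
    }
    return sum;
}
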

12.2.4 Set Associativity

• Some caches incorporate a design enhancement to mitigate thrashing, as shown in Figure 12.7.
• This feature divides the cache memory into smaller units called "ways."
• Figure 12.7 depicts a 4 KB cache where the set index now addresses multiple cache lines
across several ways.
• Where the direct-mapped cache had a single block of 256 lines, there are now four ways of 64 lines each.
• Cache lines sharing the same set index are grouped into a "set," hence the term "set index."
❖ Set associative caches enhance performance by grouping cache lines into sets, as
illustrated in Figure 12.8.
❖ Each set includes multiple cache lines (e.g., four ways in a set).
❖ Data or code blocks from main memory can be allocated to any of these cache lines
within a set without impacting program execution.
❖ This flexibility allows two sequential blocks from main memory to be stored in the same
set or different sets.
❖ Unlike direct-mapped caches, where a specific main memory location maps to only one
cache location, a four-way set associative cache allows a single main memory location to
map to four different cache locations.
❖ Figure 12.8 shows how this mapping changes compared to Figure 12.5, despite both
being 4 KB caches.
❖ Key differences include:
o Tag field is larger by two bits, while set index field is smaller by two bits.
o This results in four million main memory addresses mapping to one set of four
cache lines, instead of one million addresses mapping to one location.
❖ The set index now spans a 1 KB region of main memory instead of 4 KB.
❖ This increases the chance that data blocks map to the same set, but because a block can occupy
any of the four ways in that set, the likelihood of eviction is reduced.
❖ In a practical scenario like the example in Figure 12.6, using a four-way set associative
cache would reduce thrashing as routines and data establish unique places in the available
cache locations within a set, assuming their sizes fit within the 1 KB mapping area.
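
A minimal sketch (my own, using the 4 KB, 16-byte-line, four-way figures from the text) of how the lookup changes: the set index now selects one of 64 sets, and the controller compares the tag against all four ways in that set.

#include <stdint.h>
#include <stdbool.h>

#define WAYS 4
#define SETS 64   /* 4 KB cache / 16-byte lines / 4 ways */

struct line { uint32_t tag; bool valid; };
static struct line cache[SETS][WAYS];

/* Returns true on a hit: any valid way in the selected set whose
   cache-tag matches the tag field of the address. */
bool lookup(uint32_t addr)
{
    unsigned set = (addr >> 4) & (SETS - 1);   /* 4-bit data index, 6-bit set index */
    uint32_t tag = addr >> 10;                 /* tag is two bits larger than before */
    for (int way = 0; way < WAYS; way++) {
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return true;
    }
    return false;
}

On a miss, the controller may place the incoming line in any of the four ways of the set, which is what reduces thrashing.
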
12.2.5 Write Buffers
❖ A write buffer is a small, fast FIFO (First-In-First-Out) memory buffer.
❖ It temporarily holds data that the processor intends to write to main memory.
❖ In systems without a write buffer, the processor writes directly to main memory.
❖ With a write buffer, data is initially written quickly to the FIFO and then transferred at a
slower pace to main memory.
❖ The purpose of the write buffer is to reduce the time the processor spends writing small
blocks of sequential data to main memory.
❖ The FIFO memory of the write buffer is positioned at the same level in the memory hierarchy
as the L1 cache, as depicted in Figure 12.1.
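
As a rough software analogy (the depth of eight entries and the names are my own), a write buffer behaves like a small ring-buffer FIFO: the processor side enqueues address/data pairs quickly, and the memory side drains the oldest entry to main memory when the bus is free.

#include <stdint.h>
#include <stdbool.h>

#define WB_ENTRIES 8                      /* assumed write buffer depth */

struct wb_entry { uint32_t addr; uint32_t data; };

static struct wb_entry fifo[WB_ENTRIES];
static int head, tail, count;

static uint32_t main_memory[1 << 16];     /* toy stand-in for main memory */

/* Processor side: fast enqueue. Returns false when the buffer is full
   (a real core would stall until an entry drains). */
bool wb_push(uint32_t addr, uint32_t data)
{
    if (count == WB_ENTRIES)
        return false;
    fifo[tail] = (struct wb_entry){ addr, data };
    tail = (tail + 1) % WB_ENTRIES;
    count++;
    return true;
}

/* Memory side: drain the oldest entry to (modeled) main memory. */
bool wb_drain(void)
{
    if (count == 0)
        return false;
    main_memory[fifo[head].addr & 0xFFFF] = fifo[head].data;
    head = (head + 1) % WB_ENTRIES;
    count--;
    return true;
}
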

12.2.6 Measuring Cache Efficiency

Cache Hit Rate: This measures how often the processor finds requested data in the cache rather
than having to retrieve it from slower main memory.

The hit rate is the number of cache hits divided by the total number of memory requests over a
given time interval, expressed as a percentage:

hit rate = (cache hits / memory requests) × 100

A higher hit rate indicates better cache efficiency and faster data access.

Cache Miss Rate: The miss rate is similar in form: the total cache misses divided by the total
number of memory requests expressed as a percentage over a time interval. Note that the miss
rate also equals 100 minus the hit rate.

Types of Hit and Miss Rates: These terms can specify performance for reads, writes, or both,
offering insights into how effectively the cache handles different types of memory operations.

• Hit Time: This refers to the time taken to access data from the cache when a hit occurs. It's
typically much faster than accessing data from main memory due to the cache's proximity to the
processor.

• Miss Penalty: This is the time delay when data isn't found in the cache (a miss) and must be
fetched from main memory. It represents the performance cost of cache misses compared to hits.
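
A small worked example of the two formulas (the counter values are invented purely for illustration):

#include <stdio.h>

int main(void)
{
    /* Hypothetical counts gathered over some measurement interval. */
    unsigned long hits = 950, requests = 1000;
    unsigned long misses = requests - hits;

    double hit_rate  = 100.0 * hits / requests;     /* 95.0 % */
    double miss_rate = 100.0 * misses / requests;   /*  5.0 %, i.e. 100 minus the hit rate */

    printf("hit rate = %.1f%%, miss rate = %.1f%%\n", hit_rate, miss_rate);
    return 0;
}
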
12.3 Cache Policy
There are three policies that determine the operation of a cache:

1. The write policy.
2. The replacement policy.
3. The allocation policy.

• The cache write policy determines where data is stored during processor write operations.
• The replacement policy selects the cache line in a set that is used for the next line fill during
a cache miss.
• The allocation policy determines when the cache controller allocates a cache line.

12.3.1 Write Policy—Writeback or Writethrough

• When the processor core writes to memory, the cache controller has two alternatives for its
write policy. The controller can write to both the cache and main memory, updating the
values in both locations; this approach is known as writethrough. Alternatively, the cache
controller can write to cache memory and not update main memory; this is known as
writeback or copyback.

Writethrough: When the cache controller uses a writethrough policy, it writes to both cache
and main memory when there is a cache hit on write, ensuring that the cache and main memory
stay coherent at all times. Under this policy, the cache controller performs a write to main
memory for each write to cache memory. Because of the write to main memory, a writethrough
policy is slower than a writeback policy.

Writeback: When a cache controller uses a writeback policy, it writes to valid cache data
memory and not to main memory. Consequently, valid cache lines and main memory may
contain different data. The cache line holds the most recent data, and main memory contains
older data, which has not been updated. Caches configured as writeback caches must use one or
more of the dirty bits in the cache line status information block. When a cache controller in
writeback writes a value to cache memory, it sets the dirty bit true. If the core accesses the cache
line at a later time, it knows by the state of the dirty bit that the cache line contains data not in
main memory. If the cache controller evicts a dirty cache line, it is automatically written out to
main memory. The controller does this to prevent the loss of vital information held in cache
memory and not in main memory. One performance advantage a writeback cache has over a
writethrough cache is in the frequent use of temporary local variables by a subroutine.
Writeback versus Writethrough

Comparison of the two cache write policies:

• Operation: Writeback writes data to the cache first and updates main memory only when the
cache line is evicted. Writethrough writes data to both the cache and main memory on every
write operation.
• Data coherence: With writeback, the cache may hold newer data than main memory. With
writethrough, the cache and main memory always hold synchronized data.
• Speed: Writeback is faster for write operations because the main memory update is delayed
until cache eviction. Writethrough is slower because main memory is updated immediately on
every write.
• Dirty bit usage: Writeback uses dirty bits to track which cache lines have been modified.
Writethrough does not typically use dirty bits, since every write updates main memory
immediately.
• Suitability: Writeback is efficient for programs with frequent writes and temporary data.
Writethrough ensures data consistency between the cache and main memory, suiting
applications where data integrity is critical.
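
A hedged sketch of the difference on a write hit and on eviction (structure and function names are my own, not the book's): writethrough updates both copies immediately, while writeback only marks the line dirty and defers the main memory update until the line is evicted.

#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define LINE_SIZE 16

struct line {
    uint32_t tag;
    uint8_t  data[LINE_SIZE];
    bool     valid, dirty;
};

/* Write hit, writethrough policy: cache and main memory stay coherent. */
void write_hit_writethrough(struct line *l, int offset, uint8_t value,
                            uint8_t *main_mem_byte)
{
    l->data[offset] = value;
    *main_mem_byte  = value;        /* main memory updated on every write */
}

/* Write hit, writeback policy: only the cache is updated for now. */
void write_hit_writeback(struct line *l, int offset, uint8_t value)
{
    l->data[offset] = value;
    l->dirty = true;                /* line now differs from main memory */
}

/* On eviction, a valid, dirty victim must be written back first. */
void evict(struct line *l, uint8_t *main_mem_line)
{
    if (l->valid && l->dirty)
        memcpy(main_mem_line, l->data, LINE_SIZE);
    l->valid = false;
    l->dirty = false;
}
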

12.3.2 Cache Line Replacement Policies

On a cache miss, the cache controller must select a cache line from the available set in cache
memory to store the new information from main memory.

The cache line selected for replacement is known as a victim.

If the victim contains valid, dirty data, the controller must write the dirty data from the cache
memory to main memory before it copies new data into the victim cache line.

The process of selecting and replacing a victim cache line is known as eviction.

The strategy implemented in a cache controller to select the next victim is called its replacement
policy.

The replacement policy selects a cache line from the available associative member set; that is, it
selects the way to use in the next cache line replacement.

ARM cached cores support two replacement policies, either pseudorandom or round-robin.

■ Round-robin or cyclic replacement simply selects the next cache line in a set to replace. The
selection algorithm uses a sequential, incrementing victim counter that increments each time the
cache controller allocates a cache line. When the victim counter reaches a maximum value, it is
reset to a defined base value.
■ Pseudorandom replacement randomly selects the next cache line in a set to replace. The
selection algorithm uses a nonsequential incrementing victim counter. In a pseudorandom
replacement algorithm the controller increments the victim counter by randomly selecting an
increment value and adding this value to the victim counter. When the victim counter reaches a
maximum value, it is reset to a defined base value.

Most ARM cores support both policies (see Table 12.1 for a comprehensive list of ARM cores
and the policies they support).

The round-robin replacement policy has greater predictability, which is desirable in an embedded
system.

However, a round-robin replacement policy is subject to large changes in performance given
small changes in memory access.
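
A minimal sketch of the two victim counters (the use of the C library rand() as the random increment source is purely illustrative; real hardware has its own pseudorandom source):

#include <stdlib.h>

#define WAYS 4   /* assumed associativity */

/* Round-robin: a sequential victim counter that wraps back to its base value. */
static unsigned rr_counter;

unsigned select_victim_round_robin(void)
{
    unsigned victim = rr_counter;
    rr_counter = (rr_counter + 1) % WAYS;   /* reset to base at the maximum value */
    return victim;
}

/* Pseudorandom: the counter advances by a randomly chosen increment and
   wraps into the range of available ways. */
static unsigned pr_counter;

unsigned select_victim_pseudorandom(void)
{
    unsigned victim = pr_counter;
    pr_counter = (pr_counter + 1 + (unsigned)rand()) % WAYS;
    return victim;
}

Because the round-robin counter is strictly sequential, its behaviour is easier to predict, which matches the predictability point made above.
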
12.3.3 Allocation Policy on a Cache Miss

• Read-Allocate on Cache Miss:

• Cache line is allocated only during a read from main memory.


• If the cache line being replaced (the victim cache line) holds valid, dirty data, it is written back to
main memory before the new data is fetched into the cache.
• Writes to memory do not update the cache unless the cache line was allocated during a
previous read.
• If data is in cache, writes update the cache and possibly main memory if it's a
writethrough cache.
• If data is not in cache, writes go directly to main memory.

• Read-Write-Allocate on Cache Miss:

• Cache line is allocated for both reads and writes to memory.


• Any load (read) or store (write) operation to main memory that is not in the cache
allocates a cache line.
• On read from memory, uses a read-allocate policy (similar to read-allocate on cache
miss).
• On write to memory, allocates a cache line if not already in cache.
• If the victim cache line contains valid, dirty data, it is first written back to main
memory before the new data is fetched into the cache.
• After fetching data from main memory, the cache controller updates the corresponding
cache line.
• Updates main memory if it's a writethrough cache.

• Core-Specific Implementations:

• ARM7, ARM9, ARM10 Cores: Use read-allocate on miss policy.


• Intel XScale: Supports both read-allocate and write-allocate on miss policies.
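
As a rough sketch of how the two allocation policies treat a store instruction (all helper functions here are hypothetical placeholders for the controller's internal steps, not a real API):

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helpers standing in for the cache controller's internals. */
bool cache_lookup(uint32_t addr);              /* is the address currently cached? */
void cache_line_fill(uint32_t addr);           /* evict a victim (writing back a dirty line), then fill */
void cache_write(uint32_t addr, uint32_t v);   /* update the cached copy */
void memory_write(uint32_t addr, uint32_t v);  /* write directly to main memory */

/* Read-allocate: only reads trigger a line fill; a write miss bypasses the cache. */
void store_read_allocate(uint32_t addr, uint32_t value, bool writethrough)
{
    if (cache_lookup(addr)) {
        cache_write(addr, value);
        if (writethrough)
            memory_write(addr, value);
    } else {
        memory_write(addr, value);             /* miss: go straight to main memory */
    }
}

/* Read-write-allocate: a write miss also allocates a cache line first. */
void store_read_write_allocate(uint32_t addr, uint32_t value, bool writethrough)
{
    if (!cache_lookup(addr))
        cache_line_fill(addr);                 /* fetch the line before updating it */
    cache_write(addr, value);
    if (writethrough)
        memory_write(addr, value);
}
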
