Notes M5

MODULE 5

Chapter 12 CACHES

• Cache
– A cache is a small, fast array of memory placed between the processor core and main memory that stores portions of recently referenced main memory.
– The word cache is a French word meaning “a concealed place for storage”.
• Write Buffer
– Often used with a cache is a write buffer, a very small first-in-first-out (FIFO) memory placed between the processor core and main memory.
– The purpose of a write buffer is to free the processor core and cache memory from the slow write time associated with writing to main memory.

12.1 THE MEMORY HIERARCHY AND CACHE MEMORY

• The innermost level of the hierarchy is at the processor core. This memory is so tightly coupled to the processor that in many ways it is difficult to think of it as separate from the processor. This memory is known as a register file.

• At the primary level, memory components are connected to the processor core through dedicated on-chip interfaces. It is at this level we find tightly coupled memory (TCM) and level 1 cache.


• Also at the primary level is main memory. It includes volatile components like SRAM and DRAM, and non-volatile components like flash memory.

• The purpose of main memory is to hold programs while they are running on a system.


• Secondary memory is used to store unused portions of very large programs that do not fit in main memory and programs that are not currently executing.

• A cache may be incorporated between any level in the hierarchy where there is a significant access time difference between memory components.

• A cache can improve system performance whenever such a difference exists. A cache memory system takes information stored in a lower level of the hierarchy and temporarily moves it to a higher level.

• Figure 12.1 includes a level 1 (L1) cache and write buffer. The L1 cache is an array of high-speed, on-chip memory that temporarily holds code and data from a slower level.

• A cache holds this information to decrease the time required to access both instructions and data.

• The write buffer is a very small FIFO buffer that supports writes to main memory from the cache.

• Not shown in the figure is a level 2 (L2) cache. An L2 cache is located between the L1 cache and slower memory. The L1 and L2 caches are also known as the primary and secondary caches.


Figure 12.2 shows the relationship that a cache has with the main memory system and the processor core.

• The upper half of the figure shows a block diagram of a system without a cache. Main memory is accessed directly by the processor core using the datatypes supported by the processor core.

• The lower half of the diagram shows a system with a cache. The cache memory is much faster than main memory and thus responds quickly to data requests by the core.

• The cache's relationship with main memory involves the transfer of small blocks of data between the slower main memory and the faster cache memory. These blocks of data are known as cache lines.

12.1.1 CACHES AND MEMORY MANAGEMENT UNITS

If a cached core supports virtual memory, the cache can be located between the core and the memory management unit (MMU), or between the MMU and physical memory. Figure 12.3 shows the difference between the two cache placements.

• A logical cache stores data in a virtual address space. A logical cache is located between the processor and the MMU.

• The processor can access data from a logical cache directly without going through the MMU. A logical cache is also known as a virtual cache.

• A physical cache stores memory using physical addresses.

• A physical cache is located between the MMU and main memory.

• For the processor to access memory, the MMU must first translate the virtual address to a physical address before the cache memory can provide data to the core.

• ARM cached cores with an MMU use logical caches for processor families ARM7 through ARM10, including the Intel StrongARM and Intel XScale processors. The ARM11 processor family uses a physical cache.

• The improvement a cache provides is possible because computer programs execute in a non-random way.

• The principle of locality of reference explains the performance improvement provided by the addition of a cache memory to a system.

• The repeated use of the same code or data in memory, or of code or data very near it, is the reason a cache improves performance.

• By loading the referenced code or data into faster memory when first accessed, each subsequent access will be much faster. It is the repeated access to the faster memory that improves performance.


12.2 CACHE ARCHITECTURE

ARM uses two bus architectures in its cached cores:
– Von Neumann
– Harvard


• In processor cores using the Von Neumann architecture, there is a single cache used for instructions and data.

• This type of cache is known as a unified cache. A unified cache memory contains both instruction and data values.

• The Harvard architecture has separate instruction and data buses to improve overall system performance, but supporting the two buses requires two caches.

• In processor cores using the Harvard architecture, there are two caches: an instruction cache (I-cache) and a data cache (D-cache). This type of cache is known as a split cache.

• In a split cache, instructions are stored in the instruction cache and data values are stored in the data cache.

• The size of a cache is defined as the actual code or data the cache can store from main memory. Not included in the cache size is the cache memory required to support cache-tags or status bits.

• Two common status bits are the valid bit and the dirty bit.

• A valid bit marks a cache line as active, meaning it contains live data originally taken from main memory and is currently available to the processor core on demand.

• A dirty bit defines whether or not a cache line contains data that is different from the value it represents in main memory.

12.2.2 BASIC OPERATION OF A CACHE CONTROLLER

• The cache controller is hardware that copies code or data from main memory to cache memory automatically.

• It performs this task automatically to conceal cache operation from the software it supports.

• The cache controller intercepts read and write memory requests before passing them on to the memory controller.

• It processes a request by dividing the address of the request into three fields: the tag field, the set index field, and the data index field. The three bit fields are shown in Figure 12.4.

• First, the controller uses the set index portion of the address to locate the cache line within the cache memory that might hold the requested code or data. This cache line contains the cache-tag and status bits, which the controller uses to determine the actual data stored there.

• The controller then checks the valid bit to determine if the cache line is active, and compares the cache-tag to the tag field of the requested address. If both the status check and comparison succeed, it is a cache hit. If either the status check or comparison fails, it is a cache miss.

• On a cache miss, the controller copies an entire cache line from main memory to cache memory and provides the requested code or data to the processor. The copying of a cache line from main memory to cache memory is known as a cache line fill.


• On a cache hit, the controller supplies the code or data directly from cache memory to the processor. To do this it moves to the next step, which is to use the data index field of the address request to select the actual code or data in the cache line and provide it to the processor.
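
The decode-and-compare sequence above can be expressed as a short C sketch. This is a simplified direct-mapped software model for illustration only, not ARM hardware; the geometry (a 4 KB cache with 16-byte lines) and all names are assumptions.

#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 16                          /* 4 words per cache line    */
#define NUM_LINES  256                         /* 4 KB / 16-byte lines      */

struct cache_line {
    bool     valid;                            /* status bit: line active   */
    bool     dirty;                            /* status bit: line modified */
    uint32_t tag;                              /* cache-tag in the directory */
    uint8_t  data[LINE_BYTES];
};

static struct cache_line cache[NUM_LINES];

/* Split the address into tag, set index, and data index fields, then
 * perform the valid-bit check and the cache-tag comparison. */
bool cache_lookup(uint32_t addr, uint8_t *out)
{
    uint32_t data_index = addr & (LINE_BYTES - 1);       /* bits [3:0]   */
    uint32_t set_index  = (addr >> 4) & (NUM_LINES - 1); /* bits [11:4]  */
    uint32_t tag        = addr >> 12;                    /* bits [31:12] */

    struct cache_line *line = &cache[set_index];
    if (line->valid && line->tag == tag) {     /* cache hit                */
        *out = line->data[data_index];
        return true;
    }
    return false;                              /* cache miss: line fill needed */
}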

12.2.3 THE RELATIONSHIP BETWEEN CACHE AND MAIN MEMORY

Figure 12.5 shows where portions of main memory are temporarily stored in cache memory. The figure represents the simplest form of cache, known as a direct-mapped cache.

• In a direct-mapped cache, each addressed location in main memory maps to a single location in cache memory.

• Because main memory is much larger than cache memory, there are many addresses in main memory that map to the same single location in cache memory.

• The figure shows this relationship for the class of addresses ending in 0x824.

• The set index selects the one location in cache where all values in memory with an ending address of 0x824 are stored.

• The data index selects the word/halfword/byte in the cache line, in this case the second word in the cache line. The tag field is the portion of the address that is compared to the cache-tag value found in the directory store.

• The comparison of the tag with the cache-tag determines whether the requested data is in cache or represents another of the million locations in main memory with an ending address of 0x824.
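
As a worked example, assume the cache of Figure 12.5 is 4 KB with 16-byte (four-word) cache lines, giving 256 lines; these numbers are assumptions for illustration. The address 0x00008824 then splits into a data index of bits [3:0] = 0x4 (the second word in the line), a set index of bits [11:4] = 0x82 (one of the 256 cache lines), and a tag of bits [31:12] = 0x00008. Every address whose low 12 bits are 0x824 shares the same set index and data index, so it is the roughly one million (2^20) possible tag values that distinguish which of those addresses currently occupies the line.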
• During a cache line fill the cache controller may forward the loading data to the core at the same time it is copying it to cache; this is known as data streaming.


• Streaming allows a processor to continue execution while the cache controller fills the remaining words in the cache line.

• If valid data exists in this cache line but represents another address block in main memory, the entire cache line is evicted and replaced by the cache line containing the requested address.

• This process of removing an existing cache line as part of servicing a cache miss is known as eviction: returning the contents of a cache line from the cache to main memory to make room for the new data that needs to be loaded in cache.

• A direct-mapped cache is a simple solution, but there is a design cost inherent in having a single location available to store a value from main memory.

• Direct-mapped caches are subject to high levels of thrashing, a software battle for the same location in cache memory. The result of thrashing is the repeated loading and eviction of a cache line.

• The loading and eviction result from program elements being placed in main memory at addresses that map to the same cache line in cache memory.

• Figure 12.6 takes Figure 12.5 and overlays a simple, contrived software procedure to demonstrate thrashing. The procedure calls two routines repeatedly in a do-while loop.

• Each routine has the same set index address; that is, the routines are found at addresses in physical memory that map to the same location in cache memory.

• The first time through the loop, routine A is placed in the cache as it executes. When the procedure calls routine B, it evicts routine A a cache line at a time as it is loaded into cache and executed.

• On the second time through the loop, routine A replaces routine B, and then routine B replaces routine A.

• Repeated cache misses result in continuous eviction of the routine that is not running. This is cache thrashing.
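
The effect can be reproduced with a minimal C simulation of a direct-mapped cache; the 4 KB/16-byte-line geometry and the two addresses are illustrative assumptions. Alternating between two addresses that share a set index misses on every access.

#include <stdint.h>
#include <stdio.h>

#define NUM_LINES 256                      /* 4 KB cache, 16-byte lines */

static uint32_t line_tag[NUM_LINES];
static int      line_valid[NUM_LINES];

static int cache_access(uint32_t addr)     /* returns 1 on hit, 0 on miss */
{
    uint32_t set = (addr >> 4) & (NUM_LINES - 1);
    uint32_t tag = addr >> 12;
    if (line_valid[set] && line_tag[set] == tag)
        return 1;
    line_valid[set] = 1;                   /* evict and refill the line  */
    line_tag[set]   = tag;
    return 0;
}

int main(void)
{
    int misses = 0;
    /* 0x8824 and 0x9824 differ only in the tag field: same set index 0x82 */
    for (int i = 0; i < 100; i++) {
        misses += !cache_access(0x8824);   /* "routine A"                */
        misses += !cache_access(0x9824);   /* "routine B" evicts A       */
    }
    printf("misses: %d of 200 accesses\n", misses);  /* all 200 miss     */
    return 0;
}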


12.2.4 SET ASSOCIATIVITY

Some caches include an additional design feature to reduce the frequency of thrashing.

• This structural design feature is a change that divides the cache memory into smaller equal units, called ways. The cache in Figure 12.7 is still a 4 KB cache.

• The set index now addresses more than one cache line; it points to one cache line in each way.

• Instead of one way of 256 lines, the cache has four ways of 64 lines. The four cache lines with the same set index are said to be in the same set, which is the origin of the name "set index."

• The set of cache lines pointed to by the set index are set associative.

• The storing of data in cache lines within a set does not affect program execution.

• Two sequential blocks from main memory can be stored as cache lines in the same way or in two different ways.

• The placement of values within a set is exclusive, to prevent the same code or data block from simultaneously occupying two cache lines in a set.

• The mapping of main memory to a cache changes in a four-way set associative cache. Figure 12.8 shows the differences.

• Any single location in main memory now maps to four different locations in the cache.

• The bit field for the tag is now two bits larger, and the set index bit field is two bits smaller.

• The size of the area of main memory that maps to cache is now 1 KB instead of 4 KB. This means that the likelihood of mapping cache line data blocks to the same set is now four times higher. This is offset by the fact that a cache line is one fourth as likely to be evicted.
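
Concretely, and still assuming 16-byte lines: as a direct-mapped cache, the 4 KB array has 256 lines and an 8-bit set index in bits [11:4]; as a four-way set associative cache it has 64 sets of four lines, so the set index shrinks to six bits, [9:4], and the tag grows to bits [31:10]. One pass through the sets now covers 64 × 16 bytes = 1 KB of main memory rather than 4 KB.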


12.2.4.1 Increasing Set Associativity

The ideal goal would be to maximize the set associativity of a cache by designing it so any main memory location maps to any cache line. A cache that does this is known as a fully associative cache. However, as the associativity increases, so does the complexity of the hardware that supports it.

• One method used by hardware designers to increase the set associativity of a cache includes a content addressable memory (CAM).

• A CAM uses a set of comparators to compare the input tag address with a cache-tag stored in each valid cache line.

• A CAM works in the opposite way a RAM works. Where a RAM produces data when given an address value, a CAM produces an address if a given data value exists in the memory.

• The cache controller uses the address tag as the input to the CAM and the output selects the way containing the valid cache line.

• The tag portion of the requested address is used as an input to the four CAMs that simultaneously compare the input tag with all cache-tags stored in the 64 ways.

• The controller enables one of the four CAMs using the set index bits. The indexed CAM then selects a cache line in cache memory, and the data index portion of the core address selects the requested word, halfword, or byte within the cache line.
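
In software terms, a CAM lookup is a search keyed on content rather than on address. A minimal C sketch of the idea follows; the names and the 64-entry size are illustrative, and a real CAM performs all of its comparisons simultaneously in hardware.

#include <stdbool.h>
#include <stdint.h>

#define ENTRIES 64                         /* cache lines covered by one CAM */

struct cam {
    bool     valid[ENTRIES];
    uint32_t cache_tag[ENTRIES];
};

/* RAM: address in, data out.  CAM: data (a tag) in, address (a line) out.
 * Returns the matching entry, or -1 if the tag is not present. */
int cam_lookup(const struct cam *c, uint32_t tag)
{
    for (int i = 0; i < ENTRIES; i++)      /* hardware compares in parallel */
        if (c->valid[i] && c->cache_tag[i] == tag)
            return i;
    return -1;
}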


12.2.5 WRITE BUFFERS

A write buffer is a very small, fast FIFO memory buffer that temporarily holds data that the processor would normally write to main memory. In a system without a write buffer, the processor writes directly to main memory.

• The write buffer reduces the processor time taken to write small blocks of sequential data to main memory.

• The FIFO memory of the write buffer is at the same level in the memory hierarchy as the L1 cache and is shown in Figure 12.1.

• The efficiency of the write buffer depends on the ratio of main memory writes to the number of instructions executed.

• If the write buffer does not fill, the running program continues to execute out of cache memory, using registers for processing, cache memory for reads and writes, and the write buffer for holding evicted cache lines while they drain to main memory.

• A write buffer also improves cache performance; the improvement occurs during cache line evictions.

• If the cache controller evicts a dirty cache line, it writes the cache line to the write buffer instead of main memory.

• The new cache line data will be available sooner, and the processor can continue operating from cache memory.

• Data written to the write buffer is not available for reading until it has exited the write buffer to main memory.

• The ARM10 family, for example, supports coalescing, the merging of write operations into a single cache line.

• The write buffer will merge the new value into an existing cache line in the write buffer if they represent the same data block in main memory. Coalescing is also known as write merging, write collapsing, or write combining.
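
A minimal C sketch of a coalescing write buffer follows, assuming a four-entry FIFO and 16-byte data blocks; all names and sizes are illustrative, and per-byte valid tracking is omitted for brevity.

#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES 4                       /* a very small FIFO          */
#define LINE_BYTES 16

struct wb_entry {
    bool     used;
    uint32_t block_addr;                   /* address of the data block  */
    uint8_t  data[LINE_BYTES];
};

static struct wb_entry wb[WB_ENTRIES];

/* Post a write to the buffer.  If an entry for the same data block is
 * already queued, merge (coalesce) the new value into it instead of
 * taking a new slot.  Returns false when the FIFO is full and the
 * processor would have to stall. */
bool wb_write(uint32_t addr, uint8_t byte)
{
    uint32_t block = addr & ~(uint32_t)(LINE_BYTES - 1);
    for (int i = 0; i < WB_ENTRIES; i++) {
        if (wb[i].used && wb[i].block_addr == block) {  /* coalesce      */
            wb[i].data[addr & (LINE_BYTES - 1)] = byte;
            return true;
        }
    }
    for (int i = 0; i < WB_ENTRIES; i++) {
        if (!wb[i].used) {                 /* allocate a new FIFO slot   */
            wb[i].used = true;
            wb[i].block_addr = block;
            wb[i].data[addr & (LINE_BYTES - 1)] = byte;
            return true;
        }
    }
    return false;                          /* buffer full: write stalls  */
}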


12.2.6 MEASURING CACHE EFFICIENCY

There are two terms used to characterize the cache efficiency of a program: the cache hit rate and the cache miss rate. The hit rate is the number of cache hits divided by the total number of memory requests over a given time interval. The value is expressed as a percentage:

hit rate = (cache hits / total memory requests) × 100

• The miss rate is similar in form: the total cache misses divided by the total number of memory requests, expressed as a percentage over a time interval.

• Note that the miss rate also equals 100 minus the hit rate.

• Two other terms used in cache performance measurement are the hit time, the time it takes to access a memory location in the cache, and the miss penalty, the time it takes to load a cache line from main memory into cache.
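
Both measurements follow directly from event counts; a minimal sketch in C, with illustrative names:

/* Hit rate and miss rate from raw event counts, as percentages. */
double hit_rate(unsigned long hits, unsigned long requests)
{
    return requests ? 100.0 * hits / requests : 0.0;
}

double miss_rate(unsigned long hits, unsigned long requests)
{
    return 100.0 - hit_rate(hits, requests);   /* miss rate = 100 - hit rate */
}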

12.3 CACHE POLICY

There are three policies that determine the operation of a cache: the write policy, the replacement policy, and the allocation policy.

The cache write policy determines where data is stored during processor write operations.

The replacement policy selects the cache line in a set that is used for the next line fill during a cache miss.

The allocation policy determines when the cache controller allocates a cache line.

12.3.1 WRITE POLICY—WRITEBACK OR WRITETHROUGH

• Writethrough
– The cache controller writes to both cache and main memory when there is a cache hit on a write, ensuring that the cache and main memory stay coherent at all times, but this is slower than writeback.
• Writeback
– The cache controller writes to valid cache data memory and not to main memory. Consequently, valid cache lines and main memory may contain different data.
– The line data will be written back to main memory when the line is evicted.
– Writeback must use one or more of the dirty bits.

One performance advantage a writeback cache has over a writethrough cache is in the frequent use of temporary local variables by a subroutine. These variables are transient in nature and never really need to be written to main memory. An example of one of these transient variables is a local variable that overflows onto a cached stack because there are not enough registers in the register file to hold the variable.
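
The difference between the two policies on a write hit can be sketched in C. This is a simplified software model, not ARM's implementation; the names, the 16-byte line, and the stand-in main memory are assumptions.

#include <stdbool.h>
#include <stdint.h>

struct line { bool valid, dirty; uint32_t tag; uint8_t data[16]; };

static uint8_t main_mem[1 << 16];              /* stand-in for main memory */

static void main_memory_write(uint32_t addr, uint8_t byte)
{
    main_mem[addr & 0xFFFF] = byte;            /* the slow path            */
}

/* Behavior on a cache hit during a processor write. */
void write_hit(struct line *l, uint32_t addr, uint8_t byte, bool writethrough)
{
    l->data[addr & 15] = byte;                 /* update the cache line    */
    if (writethrough)
        main_memory_write(addr, byte);         /* keep memory coherent     */
    else
        l->dirty = true;                       /* writeback: mark dirty and
                                                  defer until eviction     */
}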


12.3.2 CACHE LINE REPLACEMENT POLICIES

On a cache miss, the cache controller must select a cache line from the available set in cache memory to store the new information from main memory.

• The cache line selected for replacement is known as a victim.

• If the victim contains valid, dirty data, the controller must write the dirty data from the cache memory to main memory before it copies new data into the victim cache line.

• The process of selecting and replacing a victim cache line is known as eviction.

• The strategy implemented in a cache controller to select the next victim is called its replacement policy.

• The replacement policy selects a cache line from the available associative member set; that is, it selects the way to use in the next cache line replacement.

• To summarize the overall process, the set index selects the set of cache lines available in the ways, and the replacement policy selects the specific cache line from the set to replace.

• ARM cached cores support two replacement policies, either pseudorandom or round-robin.

Round-robin or cyclic replacement simply selects the next cache line in a set to replace. The selection algorithm uses a sequential, incrementing victim counter that increments each time the cache controller allocates a cache line. When the victim counter reaches a maximum value, it is reset to a defined base value.

Pseudorandom replacement randomly selects the next cache line in a set to replace. The selection algorithm uses a nonsequential incrementing victim counter. In a pseudorandom replacement algorithm the controller increments the victim counter by randomly selecting an increment value and adding this value to the victim counter. When the victim counter reaches a maximum value, it is reset to a defined base value.
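
Both counters can be sketched in C. This is a simplified illustration with an assumed four-way cache; a hardware pseudorandom counter would typically use something like a linear feedback shift register rather than the C library's rand().

#include <stdlib.h>

#define WAYS 4                             /* associativity of the cache */

/* Round-robin: a sequential victim counter, reset after the maximum. */
unsigned next_victim_round_robin(void)
{
    static unsigned victim = 0;            /* defined base value of 0    */
    unsigned v = victim;
    victim = (victim + 1) % WAYS;          /* wrap back to the base      */
    return v;
}

/* Pseudorandom: increment the counter by a randomly selected value. */
unsigned next_victim_pseudorandom(void)
{
    static unsigned victim = 0;
    unsigned increment = 1 + (unsigned)rand() % WAYS;
    victim = (victim + increment) % WAYS;  /* wrap back to the base      */
    return victim;
}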


12.3.3 ALLOCATION POLICY ON A CACHE MISS

There are two strategies ARM caches may use to allocate a cache line after the occurrence of a cache miss. The first strategy is known as read-allocate, and the second strategy is known as read-write-allocate.

A read-allocate on cache miss policy allocates a cache line only during a read from main memory. If the victim cache line contains valid data, then it is written to main memory before the cache line is filled with new data.

Under this policy, a write of new data to memory does not update the contents of the cache memory unless a cache line was allocated on a previous read from main memory. If the cache line contains valid data, then a write updates the cache and may update main memory if the cache write policy is writethrough. If the data is not in cache, the controller writes to main memory only.

A read-write-allocate on cache miss policy allocates a cache line for either a read or a write to memory. Any load or store operation made to main memory that is not in cache memory allocates a cache line. On memory reads the controller uses a read-allocate policy. On a write, the controller also allocates a cache line.

If the victim cache line contains valid data, then it is first written back to main memory before the cache controller fills the victim cache line with new data from main memory. If the cache line is not valid, the controller simply does a cache line fill. After the cache line is filled from main memory, the controller writes the data to the corresponding data location within the cache line. The cached core also updates main memory if it is a writethrough cache.
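
The allocation decision itself is small enough to sketch in C (a simplified model with illustrative names):

#include <stdbool.h>

/* Returns true if the miss should allocate (fill) a cache line. */
bool allocate_on_miss(bool is_read, bool read_write_allocate)
{
    if (is_read)
        return true;                   /* both policies allocate on a read */
    /* On a write miss, only read-write-allocate fills a cache line;
     * under read-allocate the write goes straight to main memory. */
    return read_write_allocate;
}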

12.4 COPROCESSOR 15 AND CACHES

There are several coprocessor 15 registers used to specifically configure and control ARM cached cores. Table 12.2 lists the coprocessor 15 registers that control cache configuration. Primary CP15 registers c7 and c9 control the setup and operation of the cache. The secondary CP15:c7 registers are write-only and are used to clean and flush the cache. The CP15:c9 register defines the victim pointer base address, which determines the number of lines of code or data that are locked in cache.
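
As an illustration of reaching these registers from C, the following GCC inline-assembly sketch assumes an ARMv4/v5-style core (for example, ARM926EJ-S); the exact CP15:c7 encodings vary by core, so the specific core's technical reference manual is the authority here.

/* Invalidate both the instruction and data caches via CP15:c7.
 * The (c7, c7, 0) encoding is the classic ARMv4/v5 "invalidate
 * ICache and DCache" operation; the source register should be zero. */
static inline void invalidate_caches(void)
{
    unsigned zero = 0;
    __asm__ volatile("mcr p15, 0, %0, c7, c7, 0" :: "r"(zero) : "memory");
}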
