Cache Memory
Chapter 4
Presented by Group 4
MEMORY SYSTEMS
Internal memory: found within the CPU or very close to it, such as cache and registers.
External memory: located further from the CPU, including hard drives, SSDs, and tapes.
Capacity
DEFINITION
The total amount of data the memory system can hold at one time.

HIERARCHY
Registers and Cache: smallest but fastest memory (in KB or MB).
Main Memory (RAM): larger but slightly slower (in GB).
External Storage (HDD, SSD): largest, typically in the range of terabytes, but slowest.
Unit of Transfer
The number of bits read out of or written into memory at a time. For internal memory this is often equal to the data bus width; for external memory, data is transferred in larger units called blocks.
ACCESS TIME (LATENCY)
Time required to read or write data.
For random-access memory, this is the time between the request and the availability of the data.

MEMORY CYCLE TIME
Includes access time and the time required before a second access can start.
Important for system bus performance.
Performance
TRANSFER RATE
The rate at which data can be transferred.
For random-access memory: Transfer Rate = 1 / Cycle Time.
For non-random-access memory: Tn = Ta + n/R, where Tn is the average time to read/write n bits, Ta is the average access time, and R is the transfer rate (in bits per second).
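As a quick illustration of the non-random-access formula, here is a minimal Python sketch; the access time, block size, and transfer rate values are illustrative assumptions, not figures from the text.

```python
# Sketch: average time to read n bits from a non-random-access memory,
# using Tn = Ta + n/R. All numeric values are illustrative assumptions.

def transfer_time(ta_seconds: float, n_bits: int, rate_bps: float) -> float:
    """Tn = Ta + n/R: average access time plus transmission time."""
    return ta_seconds + n_bits / rate_bps

# Example: 5 ms average access time, reading 4096 bits at 1 Mbit/s.
tn = transfer_time(ta_seconds=0.005, n_bits=4096, rate_bps=1_000_000)
print(f"Tn = {tn * 1000:.3f} ms")  # 5 ms access + ~4.1 ms transfer = 9.096 ms
```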
Types of Memory
VOLATILE MEMORY
Loses its contents when power is removed (e.g., RAM).

NON-VOLATILE MEMORY
Retains its contents without power (e.g., ROM, flash memory, magnetic disk).
Suppose that the processor has access to two levels of memory. Level 1 contains 1000 words
and has an access time of 0.01 μs; level 2 contains 100,000 words and has an access time of
0.1 μs. Assume that if a word to be accessed is in level 1, then the processor accesses it
directly. If it is in level 2, then the word is first transferred to level 1 and then accessed by the
processor. For simplicity, we ignore the time required for the processor to determine whether
the word is in level 1 or level 2. Figure 4.2 shows the general shape of the curve that covers
this situation. The figure shows the average access time to a two-level memory as a function
of the hit ratio H, where H is defined as the fraction of all memory accesses that are found in
the faster memory (e.g., the cache), T1 is the access time to level 1, and T2 is the access time
to level 2. As can be seen, for high percentages of level 1 access, the average total access
time is much closer to that of level 1 than that of level 2.
In our example, suppose 95% of the memory accesses are found in level 1. Then the average
time to access a word can be expressed as

(0.95)(0.01 μs) + (0.05)(0.01 μs + 0.1 μs) = 0.0095 + 0.0055 = 0.015 μs

The average access time is much closer to 0.01 μs than to 0.1 μs, as desired.
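As a quick check on this arithmetic, the Python sketch below evaluates the average access time Ts = H*T1 + (1 - H)*(T1 + T2) for several hit ratios, using the T1 and T2 values from the example.

```python
# Two-level average access time: Ts = H*T1 + (1 - H)*(T1 + T2).
# T1 and T2 are taken from the example above (in microseconds).

def avg_access_time(h: float, t1: float, t2: float) -> float:
    """Average time per access for a two-level memory with hit ratio h."""
    return h * t1 + (1.0 - h) * (t1 + t2)

T1, T2 = 0.01, 0.1  # level 1 and level 2 access times, in microseconds
for h in (0.5, 0.9, 0.95, 0.99):
    print(f"H = {h:.2f}: Ts = {avg_access_time(h, T1, T2):.4f} us")
# H = 0.95 reproduces the 0.015 us result computed above.
```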
4.2 CACHE MEMORY PRINCIPLES

DATA ORGANIZATION
[Figure: cache and main memory organization]
READ OPERATION
Cache Hit: when the cache has the data, it communicates directly with the processor, bypassing the system bus and buffers.
4.3 ELEMENTS OF CACHE DESIGN
TOPICS
HIGH-PERFORMANCE COMPUTING
CACHE ADDRESSES

LOGICAL (VIRTUAL) CACHE
- Stores data using virtual addresses
- Faster, since the processor can access the cache without waiting for address translation by the MMU

PHYSICAL CACHE
- Stores data using main memory physical addresses
CACHE SIZE
The cache should be small enough that the overall average cost per bit is close to that of main memory alone, and large enough that the overall average access time is close to that of the cache alone.
Direct Mapping

i = j modulo m

where
i = cache line number
j = main memory block number
m = number of lines in the cache
Thrashing: if a program repeatedly references words from two blocks that map to the same line, the blocks will be continually swapped in and out of the cache, and the hit ratio will be low.
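A minimal Python sketch of the rule i = j modulo m follows; the cache size and the block numbers are illustrative assumptions. It also shows why thrashing can occur: distinct blocks with the same residue compete for the same line.

```python
# Direct mapping: block j always maps to cache line j mod m.
# m = 4 lines and the block numbers below are illustrative assumptions.

M_LINES = 4  # m: number of lines in the cache

def cache_line(block_number: int) -> int:
    """Return the only line that block j can occupy under direct mapping."""
    return block_number % M_LINES

for j in (0, 1, 4, 5, 8):
    print(f"block {j} -> line {cache_line(j)}")
# Blocks 0, 4, and 8 all map to line 0, so a program alternating between
# two of them would swap the line's contents on every access (thrashing).
```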
Associative Mapping
A main memory block can be loaded into any line of the cache; the tag field uniquely identifies which block occupies a line.
Set-Associative Mapping

m = v × k
i = j modulo v

where
i = cache set number
j = main memory block number
m = number of lines in the cache
v = number of sets
k = number of lines in each set
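Here is a companion Python sketch for set-associative mapping, with v and k chosen for illustration: block j is confined to set j modulo v, but may occupy any of the k lines within that set.

```python
# Set-associative mapping: m = v * k lines arranged as v sets of k lines;
# block j maps to set j mod v. v = 4 and k = 2 are illustrative assumptions.

V_SETS, K_WAYS = 4, 2          # v sets, k lines per set (two-way)
M_LINES = V_SETS * K_WAYS      # m = v * k total lines

def cache_set(block_number: int) -> int:
    """Return the set that block j maps to; the line within it is free."""
    return block_number % V_SETS

for j in (0, 3, 4, 7, 8):
    print(f"block {j} -> set {cache_set(j)} (any of {K_WAYS} lines)")
```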
Replacement Algorithms
When the cache is full and a new block is brought in, an existing block must be replaced to make room for the new block.

For direct mapping:
There is only one possible line for any specific block, so no choice is available in selecting which block to replace.

For associative and set-associative mapping:
A replacement algorithm is needed to decide which block to replace.
This algorithm must be implemented in hardware to ensure high speed.
Four common replacement algorithms are:

LEAST RECENTLY USED (LRU):
Concept: Replace the block that has been in the cache the longest without being referenced.
For two-way set-associative caches: when a line is referenced, its USE bit is set to 1, and the other line's USE bit is set to 0. The block whose USE bit is 0 is replaced.
For fully associative caches: a list of indexes to all lines is maintained. The most recently used line moves to the front of the list; the least recently used line (at the back of the list) is replaced.

FIRST-IN-FIRST-OUT (FIFO):
Concept: Replace the block that has been in the cache the longest, regardless of how often it's been used.
Easily implemented using a round-robin or circular buffer technique.

LEAST FREQUENTLY USED (LFU):
Concept: Replace the block that has experienced the fewest references.
A counter is associated with each line, and the line with the lowest count is replaced.

RANDOM REPLACEMENT:
Concept: Pick a line at random for replacement, without considering its usage.
Simulation studies suggest random replacement provides only slightly worse performance compared to usage-based algorithms.
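To make LRU concrete, here is a minimal Python sketch of the fully associative "list of indexes" scheme described above; the capacity and reference string are illustrative assumptions, and an OrderedDict stands in for the hardware list.

```python
# LRU for a fully associative cache: the most recently used block moves
# to the "front" and the least recently used block is evicted first.

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # ordered from least to most recent

    def access(self, block: int) -> bool:
        """Reference a block; return True on a hit, False on a miss."""
        if block in self.lines:
            self.lines.move_to_end(block)   # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict least recently used
        self.lines[block] = None            # bring the new block in
        return False

cache = LRUCache(capacity=3)
for block in (1, 2, 3, 1, 4, 2):
    print(f"access block {block}: {'hit' if cache.access(block) else 'miss'}")
```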
Write Policy
When a block in the cache is replaced:
If the block hasn't been altered, it can be overwritten directly with the new block.
If the block has been altered (at least one write operation), the modified block must be written back to main memory before the new block is loaded into the cache.
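A minimal sketch of this write-back behavior, assuming a "dirty" flag per cached block; the dictionaries and function names are hypothetical, chosen only to illustrate the rule above.

```python
# Write-back policy: a block is copied to main memory on replacement
# only if it was altered (its dirty flag is set) while in the cache.

cache = {}   # block number -> (data, dirty_flag)
memory = {}  # block number -> data

def write(block: int, data: str) -> None:
    cache[block] = (data, True)   # every write sets the dirty flag

def evict(block: int) -> None:
    data, dirty = cache.pop(block)
    if dirty:
        memory[block] = data      # altered: write back before replacement
    # unaltered blocks are simply overwritten, with no memory traffic

write(7, "new value")
evict(7)
print(memory)  # {7: 'new value'}: the modified block reached main memory
```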
Cache Coherency
A system that ensures all caches and main memory are synchronized is said to maintain cache coherency.
Possible approaches to maintain cache coherency: