Replacement Algorithms

For direct mapping, there is only one possible line for any particular block, and no choice is possible.

Once the cache has been filled, when a new block is brought into the cache, one of the existing blocks must be replaced.
Replacement algorithms are used by cache memory to evict older blocks, or blocks that have not been used for a long time, in order to make room for incoming blocks.
Cache memory matches the speed of main memory to that of the processor, and replacement algorithms help make this possible.
In modern architectures, cache misses cost power, latency, and other penalties. Replacement algorithms are designed to minimize these losses.
By applying simple replacement algorithms that are sensitive to latency, execution time can be improved by about 18%.
Working of Replacement Algorithms
• Least recently used (LRU): replace the block in the set that has been in the cache longest with no reference to it.

• First in first out (FIFO): replace the block that was loaded first among the blocks currently present in the cache.

• Least frequently used (LFU): replace the block in the set that has experienced the fewest references.

• Random: an existing block is chosen for replacement at random.
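The Python sketch below simulates these four policies on a small, fully associative cache. It is only an illustration of the eviction choices, not a hardware model; the access sequence, the capacity of four lines, and the helper name simulate are assumptions made for the example.

```python
from collections import OrderedDict, Counter
import random

def simulate(policy, accesses, capacity=4):
    """Count hits for one replacement policy over a sequence of block accesses.

    Illustrative sketch only: 'capacity' plays the role of the number of
    lines in a (fully associative) cache set.
    """
    cache = OrderedDict()          # insertion/recency order of resident blocks
    freq = Counter()               # reference counts, used only by LFU
    hits = 0
    for block in accesses:
        freq[block] += 1
        if block in cache:
            hits += 1
            if policy == "LRU":
                cache.move_to_end(block)   # refresh recency on a hit
            continue
        if len(cache) >= capacity:         # miss with a full cache: evict one block
            if policy in ("LRU", "FIFO"):
                cache.popitem(last=False)  # oldest (FIFO) / least recently used (LRU)
            elif policy == "LFU":
                victim = min(cache, key=lambda b: freq[b])
                del cache[victim]
            else:                          # Random
                del cache[random.choice(list(cache))]
        cache[block] = True
    return hits

accesses = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
for p in ("LRU", "FIFO", "LFU", "Random"):
    print(p, simulate(p, accesses))
```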


Replacement algorithms operate within the CPU cache. A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations.

When the processor needs to read from or write to a location in main memory, it first checks whether a copy of that data is in the cache. If so, the processor immediately reads from or writes to the cache, which is much faster than reading from or writing to main memory.
• We know that there are different levels in cache memory, so we cannot rely on any single replacement algorithm.

• Whenever there is a miss in the cache and there is no room to store a new block, a replacement algorithm comes into the picture.
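The sketch below illustrates the read path just described for a direct-mapped cache, which is also why no replacement choice exists in that organization: the line index fully determines where a block can live. The 64-byte line size, the count of 128 lines, and the read helper are assumptions chosen only for the example.

```python
# A minimal sketch of the read path, assuming a direct-mapped cache with
# 64-byte lines and 128 lines (both figures are illustrative).
LINE_SIZE = 64
NUM_LINES = 128

lines = [{"valid": False, "tag": None, "data": None} for _ in range(NUM_LINES)]

def split(address):
    """Decompose a byte address into (tag, line index, byte offset)."""
    offset = address % LINE_SIZE
    block = address // LINE_SIZE
    index = block % NUM_LINES
    tag = block // NUM_LINES
    return tag, index, offset

def read(address, main_memory):
    tag, index, offset = split(address)
    line = lines[index]
    if line["valid"] and line["tag"] == tag:
        return line["data"][offset]                 # hit: serve from the cache
    # Miss: fetch the whole block from main memory, replacing whatever was there.
    base = (address // LINE_SIZE) * LINE_SIZE
    line.update(valid=True, tag=tag,
                data=main_memory[base:base + LINE_SIZE])
    return line["data"][offset]

memory = bytes(range(256)) * 64                     # a stand-in for main memory
print(read(300, memory), read(300, memory))         # miss, then hit on the same byte
```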
Why Do We Need Caching?

 To help computers run more efficiently on a daily basis, the operating system temporarily saves information in memory or to a storage device (i.e., disk) so that it can be used later. This is a well-known technique called caching, and it happens automatically in most cases. The speed of access to information is a big selling point for the multiple caching techniques on our devices.
Central Caching Policies
Write-Through Policy
• Data is stored and written into the cache and to the primary storage device at the same time.
• If the computer crashes or the power goes out, data can still be recovered without issue.
• To keep data safe, this policy has to perform every write operation twice.
• The program or application being used must wait until the data has been written to both the cache and the storage device before it can proceed.
• This comes at the cost of system performance, but it is highly recommended for sensitive data that cannot be lost.
ADVANTAGE

• It ensures information will be stored safely, without risk of data loss.
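A minimal sketch of this policy follows, assuming a simple dictionary-backed model in which one dict stands in for the primary storage device; the class name WriteThroughCache and its methods are illustrative, not a real API.

```python
class WriteThroughCache:
    """Sketch of a write-through policy over a dict-backed cache and storage."""

    def __init__(self, storage):
        self.cache = {}
        self.storage = storage          # stands in for the primary storage device

    def write(self, key, value):
        # Every write goes to both the cache and the storage device before the
        # caller proceeds, so nothing is lost if the system crashes afterwards.
        self.cache[key] = value
        self.storage[key] = value

    def read(self, key):
        if key in self.cache:
            return self.cache[key]      # hit
        value = self.storage[key]       # miss: fetch from storage and cache it
        self.cache[key] = value
        return value
```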
Write-Back Policy
• Data is saved only to the cache during processing.
• Only at certain times or under certain conditions is the information also written to the primary storage device.
• Since there is no guaranteed way to keep the data safe, this policy has a much higher probability of data loss if something goes wrong.
• At the same time, since it no longer has to write information to both the cache and the storage device, system performance gains are noticeable compared to the write-through policy.
• Data recoverability is exchanged for system performance, making this ideal for applications or programs that require low latency and high throughput.
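Under the same assumed dictionary-backed model, a write-back sketch might look like the following: a dirty flag records which cached entries have not yet reached storage, and only eviction or an explicit flush writes them back, which is where the risk of data loss comes from.

```python
class WriteBackCache:
    """Sketch of a write-back policy; entries carry a dirty flag."""

    def __init__(self, storage, capacity=4):
        self.cache = {}                 # key -> (value, dirty flag)
        self.storage = storage
        self.capacity = capacity

    def write(self, key, value):
        self._make_room(key)
        self.cache[key] = (value, True)     # dirty: storage is now stale

    def read(self, key):
        if key in self.cache:
            return self.cache[key][0]
        self._make_room(key)
        value = self.storage[key]
        self.cache[key] = (value, False)
        return value

    def _make_room(self, key):
        if key in self.cache or len(self.cache) < self.capacity:
            return
        victim, (value, dirty) = next(iter(self.cache.items()))  # evict oldest entry
        if dirty:
            self.storage[victim] = value    # written back only when dirty
        del self.cache[victim]

    def flush(self):
        # Explicitly push all dirty entries to storage.
        for key, (value, dirty) in self.cache.items():
            if dirty:
                self.storage[key] = value
                self.cache[key] = (value, False)
```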
Line Size
• Retrieve not only the desired word but a number of adjacent words as well.
• An increased block size will increase the hit ratio at first.
• The hit ratio decreases as the block becomes even bigger: the probability of using the newly fetched information becomes less than the probability of reusing the information that was replaced.
• Larger blocks reduce the number of blocks that fit in the cache, so data may be overwritten shortly after being fetched; each additional word is also less local, so it is less likely to be needed.
• No definitive optimum value has been found.
• 8 to 64 bytes seems reasonable.
• For HPC systems, 64- and 128-byte lines are most common.
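The snippet below only works through the arithmetic behind one side of this trade-off, using an assumed 32 KB cache capacity: for a fixed capacity, doubling the line size halves the number of lines that fit.

```python
# Illustrative arithmetic only: larger lines mean fewer lines in a fixed-size cache.
CACHE_BYTES = 32 * 1024                      # assumed 32 KB capacity, for illustration
for line_size in (8, 16, 32, 64, 128, 256):
    print(f"{line_size:4d}-byte lines -> {CACHE_BYTES // line_size:5d} lines fit")
```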
Number of Caches
Multilevel Caches
• High logic density enables caches on chip.
 Faster than bus access
 Frees the bus for other transfers

• Common to use both on-chip and off-chip cache.
 L1 on chip, L2 off chip in static RAM
 L2 access much faster than DRAM or ROM
 L2 often uses a separate data path
 L2 may now be on chip
 Resulting in an L3 cache
 Bus access, or now on chip
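A minimal sketch of how a two-level lookup proceeds is shown below; plain dicts stand in for L1, L2, and main memory, and the promotion-on-hit behavior is an assumption made for illustration rather than a description of any particular processor.

```python
def multilevel_read(key, l1, l2, main_memory):
    """Try L1 first, then L2, then main memory, promoting data on the way back."""
    if key in l1:
        return l1[key]                # L1 hit: fastest path
    if key in l2:
        l1[key] = l2[key]             # L2 hit: promote the data into L1
        return l1[key]
    value = main_memory[key]          # miss in both cache levels
    l2[key] = value
    l1[key] = value
    return value
```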
UNIFIED VERSUS SPLIT CACHES

• One cache for data and instructions, or two caches: one for data and one for instructions
• Advantages of a unified cache
 Higher hit rate
 Balances the load of instruction and data fetch
 Only one cache to design and implement
• Advantages of a split cache
 Eliminates cache contention between the instruction fetch/decode unit and the execution unit
 Important in pipelining
PENTIUM 4 CACHE
• 80386 – no on-chip cache
• 80486 – 8 KB, using 16-byte lines and a four-way set-associative organization
• Pentium (all versions) – two on-chip L1 caches
 Data & instructions
• Pentium III – L3 cache added off chip
• Pentium 4
 L1 caches
 8 KB
 64-byte lines
 Four-way set associative
PENTIUM 4 CACHE
• Pentium 4
 L2 cache
 Feeding both L1 caches
 256 KB
 128-byte lines
 Eight-way set associative

 L3 cache on chip
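As a worked example of these figures, the number of sets in a set-associative cache is capacity divided by (line size × associativity); the small sketch below checks this for the L1 and L2 parameters quoted on these slides.

```python
def num_sets(capacity_bytes, line_bytes, ways):
    """Sets in a set-associative cache = capacity / (line size * associativity)."""
    return capacity_bytes // (line_bytes * ways)

print(num_sets(8 * 1024, 64, 4))      # Pentium 4 L1 cache  -> 32 sets
print(num_sets(256 * 1024, 128, 8))   # Pentium 4 L2 cache  -> 256 sets
```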
PENTIUM 4 CORE PROCESSOR
• Fetch/Decode Unit
 Fetches instructions from L2 cache
 Decode into micro-ops
 Store micro-ops in L1 cache
• Out of order execution logic
 Schedules micro-ops
 Based on data dependence and resources
 May speculatively execute
• Execution units
 Execute micro-ops
 Data from L1 cache
 Results in registers
• Memory subsystem
 L2 cache and system bus
PENTIUM 4 DESIGN REASONING
• Decodes instructions into RISC-like micro-ops before the L1 cache
• Micro-ops are fixed length
 Enables superscalar pipelining and scheduling
• Pentium instructions are long and complex
• Performance is improved by separating decoding from scheduling and pipelining
• Data cache is write back
 Can be configured to write through
• L1 cache controlled by 2 bits in a register
 CD = cache disable
 NW = not write-through
 Two instructions are provided to invalidate (flush) the cache and to write back, then invalidate
• L2 and L3 8-way set associative
 Line size 128 bytes
ARM CACHE ORGANIZATION

• Small FIFO write buffer
 Enhances memory write performance
 Sits between the cache and main memory
 Small compared with the cache
 Data is put into the write buffer at processor clock speed
 The processor continues execution
 External writes proceed in parallel until the buffer is empty
 If the buffer is full, the processor stalls
 Data in the write buffer is not available until it has been written
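A minimal sketch of such a FIFO write buffer is given below; the depth of four entries and the method names are assumptions for illustration, since the real buffer depth is implementation specific.

```python
from collections import deque

class WriteBuffer:
    """Sketch of a small FIFO write buffer between the cache and main memory."""

    def __init__(self, depth=4):
        self.entries = deque()
        self.depth = depth                     # assumed depth, for illustration

    def put(self, address, data):
        """Called by the processor side; returns False when it must stall."""
        if len(self.entries) >= self.depth:
            return False                       # buffer full: the processor stalls
        self.entries.append((address, data))   # accepted at processor clock speed
        return True

    def drain_one(self, main_memory):
        """Called by the memory side, in parallel, until the buffer is empty."""
        if self.entries:
            address, data = self.entries.popleft()
            main_memory[address] = data
```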
ARM CACHE & WRITE BUFFER
ORGANIZATION
