Cache
Temporal Locality
References tend to repeat: if an item is referenced, it will tend to be referenced again soon.
If a sequence of references X1, X2, X3, X4 has recently been made, then the next reference is likely to be one of X1, X2, X3, X4.
Spatial Locality
If an item is referenced, there is a high probability that items whose addresses are nearby will be referenced soon.
Sequentiality
Sequentiality is a restricted type of spatial locality, a subset of it in which the next reference is to the address immediately following the current one.
Basic Notions
Hit: processor references that are found in the cache are called ‘cache hits’.
Miss: processor references not found in the cache are called ‘cache misses’.
On a cache miss, the cache control mechanism must fetch the missing data from main memory and place it in the cache.
Usually the cache fetches a spatial locality, i.e. a set of contiguous words called a ‘line’, from memory.
Basic Notions
Hit Rate: fraction of memory references found in the cache = references found in cache / total memory references.
Miss Rate: (1 - Hit Rate), the fraction of memory references not found in the cache.
Hit Time: time to service a memory reference found in the cache (including the time to determine hit or miss).
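A quick numeric illustration in Python (the reference counts are invented for the example):

total_refs = 1_000_000  # total memory references issued by the processor
hits = 950_000          # references found in the cache (illustrative count)

hit_rate = hits / total_refs   # 0.95
miss_rate = 1 - hit_rate       # 0.05
print(f"hit rate = {hit_rate:.2f}, miss rate = {miss_rate:.2f}")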
Basic Notions
Miss Penalty: time required to fetch a block into a level of the memory hierarchy from a lower level.
This includes the time to access the block, transmit it to the higher level, and insert it in the level that experienced the miss.
The primary measure of cache performance is the miss rate.
In most processor designs the CPU stalls, i.e. ceases activity, on a cache miss.
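One common way to combine these parameters (a standard formula, not stated on the slide) is the average memory access time; a minimal sketch with illustrative cycle counts:

hit_time = 1       # cycles to service a hit (illustrative value)
miss_penalty = 50  # cycles to fetch the line from the lower level (illustrative)
miss_rate = 0.05

# average memory access time = hit_time + miss_rate * miss_penalty
amat = hit_time + miss_rate * miss_penalty   # 1 + 0.05 * 50 = 3.5 cycles
print(f"AMAT = {amat} cycles")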
Processor-Cache Interface
The interface can be characterized by a number of parameters:
• Access time for a reference found in the cache (a hit): depends on cache size and organisation.
• Access time for a reference not found in the cache (a miss): depends on memory organisation.
Processor-Cache Interface
• Time to compute a real address from a virtual address (the not-in-TLB time): depends on the address translation facility.
From the cache’s point of view, the processor behaviour that affects the design is:
1. The number of requests or references per cycle.
2. The physical word size, i.e. the transfer unit between CPU and cache.
Cache Organization
• The cache is organized as a directory, to locate a data item, and a data array, to hold the data items.
• A cache can be organized to fetch on demand or to prefetch data.
• Fetch on demand (the most common) brings a new line into the cache only when a processor reference is not found in the current cache contents (a cache miss).
Cache Organization
There are three basic types of cache organisation:
1. Direct mapped
2. Set associative
3. Fully associative
Direct Mapped Cache
• In this organization each memory location
is mapped to exactly one location in the
cache.
• The cache directory consists of a number of lines (entries), with each line containing a number of contiguous words.
• The directory is indexed by the low-order address bits, and the high-order bits are stored as tag bits.
Direct Mapped Cache
The 24-bit real address is partitioned as follows for cache usage:
[Figure: direct mapped cache. The 24-bit address splits into 10 tag bits, 8 index bits, 3 word-in-line (W/L) bits, and 3 byte-in-word (B/W) bits for 8 B words. The index selects a directory entry holding 10 tag bits plus valid, dirty, and reference bits; a comparator (COMP) matches the stored tag against the address tag, and on a hit the selected word of the 2K x 8 B data array is gated to the processor.]
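A minimal Python sketch of this address split (field widths taken from the figure; the function name is illustrative):

TAG_BITS, INDEX_BITS, WORD_BITS, BYTE_BITS = 10, 8, 3, 3

def split_address(addr):
    # Decompose a 24-bit real address into (tag, index, word, byte) fields.
    byte = addr & ((1 << BYTE_BITS) - 1)
    word = (addr >> BYTE_BITS) & ((1 << WORD_BITS) - 1)
    index = (addr >> (BYTE_BITS + WORD_BITS)) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BYTE_BITS + WORD_BITS + INDEX_BITS)
    return tag, index, word, byte

# A hit occurs when the directory entry at `index` is valid and its stored
# 10-bit tag equals `tag` (the COMP box in the figure).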
Set Associative Cache
• The set associative cache operates in a fashion similar to the direct mapped cache.
• Here there is more than one choice of location for a line.
• If there are n such locations, the cache is said to be n-way set associative.
• Each line in memory maps to a unique set in the cache and can be placed in any element of that set.
Set Associative Cache
• This improves the hit rate, since a line may now lie in more than one location. Going from one-way to two-way decreases the miss rate by 15%.
• The reference address bits are compared with all the entries in the set to find a match.
• If there is a hit, that particular sub-cache array is selected and outgated to the processor.
Set Associative Cache
Disadvantages:
1. Requires more comparators, and stores more tag bits per block.
2. The additional compares and multiplexing increase the cache access time.
Set Associative Cache
[Figure: two-way set associative cache. The address splits into 11 tag bits, 7 set-index bits, 3 W/L bits, and 3 B/W bits. Each of the two ways has a 1K x 8 B data array, with each directory entry holding 11 tag bits; the two tag comparisons select the matching way through a MUX to the processor.]
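A minimal sketch of the set lookup (a software model with illustrative names, not the hardware itself):

WAYS, SETS = 2, 128   # two-way, 2^7 sets, as in the figure
sets = [[{"valid": False, "tag": 0} for _ in range(WAYS)] for _ in range(SETS)]

def lookup(set_index, tag):
    # Compare the tag against every entry of the selected set.
    for way, entry in enumerate(sets[set_index]):
        if entry["valid"] and entry["tag"] == tag:
            return way    # hit: this sub-array is outgated to the processor
    return None           # miss: the line must be fetched from memory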
Fully Associative Cache
• It is the extreme case of set associative mapping.
• In this mapping a line can be stored in any of the directory entries.
• The referenced address is compared with all the entries in the directory (high hardware cost).
• If a match is found, the corresponding location is fetched and returned to the processor.
• Suitable for small caches only.
Fully Associative Mapped Cache
[Figure: fully associative cache. After TLB translation, the 24-bit address splits into 18 tag bits, 3 W/L bits, and 3 B/W bits. Every directory entry (18 tag bits plus valid, dirty, and reference bits) is compared with the address tag, and on a match the corresponding word of the 2K x 8 B data array is gated to the processor.]
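In software a fully associative directory behaves like a map from tag to line, since a line may sit in any entry; a minimal sketch (the hardware instead performs all the comparisons in parallel, which is the high cost noted above):

directory = {}   # tag -> cached line (illustrative model)

def fa_lookup(tag):
    return directory.get(tag)   # None signals a miss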
Write Policies
• There are two strategies for updating memory on a write:
1. The write-through cache stores into both the cache and main memory on each write.
2. In a copy-back cache the write is done in the cache only, and the dirty bit is set. The entire line is stored to main memory on replacement (if the dirty bit is set).
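A minimal sketch of the two policies, modelling the cache and memory as Python dicts (names and structure are illustrative):

cache = {}    # line address -> {"data": ..., "dirty": bool}
memory = {}   # line address -> data

def write_through(line_addr, data):
    cache[line_addr] = {"data": data, "dirty": False}
    memory[line_addr] = data    # memory is updated on every write

def copy_back(line_addr, data):
    cache[line_addr] = {"data": data, "dirty": True}   # memory not touched

def replace(line_addr):
    # On replacement a copy-back line goes to memory only if dirty.
    line = cache.pop(line_addr)
    if line["dirty"]:
        memory[line_addr] = line["data"]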
Write Through
• A write is directed at both the cache and
main memory for every CPU store.
• Advantage: maintains a consistent image in main memory.
• Disadvantage: increased memory traffic in the case of large caches.
Copy Back
• A write is directed only at the cache on a CPU store, and the dirty bit is set.
• The entire line is written back to main memory only when the line is replaced by another line.
• When a read miss occurs in the cache, the old line is simply discarded if its dirty bit is not set; otherwise the old line is first written out, and then the new line is fetched and written into the cache.
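The read-miss path can be sketched as follows (same dict-based model; illustrative, not the actual control logic):

def read_miss(old_line, new_addr, memory):
    # Write the old line out only if its dirty bit is set...
    if old_line is not None and old_line["dirty"]:
        memory[old_line["addr"]] = old_line["data"]
    # ...otherwise it is simply discarded; then fetch the new line.
    return {"addr": new_addr, "data": memory.get(new_addr), "dirty": False}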
Write Allocate
• If a cache miss occurs on a store (write), the new line can be allocated in the cache and the store can then be performed in the cache.
• This “write allocate” policy is generally used with copy-back caches.
• Copy-back caches result in lower memory traffic with large caches.
No Write Allocate
• If a cache miss occurs on a store, the cache may be bypassed and the write performed in main memory only.
• This “no write allocate” policy is generally used with write-through caches.
• So we have two common combinations:
• CBWA – copy back, write allocate.
• WTNWA – write through, no write allocate.
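A minimal sketch of a store miss under the two pairings (dict-based model as before; names are illustrative):

cache = {}    # line address -> {"data": ..., "dirty": bool}
memory = {}   # line address -> data

def store_miss_cbwa(line_addr, data):
    # Write allocate: bring the line into the cache, then write there.
    cache[line_addr] = {"data": memory.get(line_addr), "dirty": False}
    cache[line_addr]["data"] = data
    cache[line_addr]["dirty"] = True    # copied back on replacement

def store_miss_wtnwa(line_addr, data):
    memory[line_addr] = data            # bypass the cache entirely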
Common Types of Cache
• Integrated or unified cache
• Split I- and D-caches
• Sectored cache
• Two-level cache
• Write assembly cache
Split I & D Caches
• Separate instruction and data caches offer the possibility of significantly increased cache bandwidth (almost twice as much).
• This comes at the cost of a somewhat increased miss rate compared with a unified cache of the same size.
• The caches are not split equally; I-caches are not required to manage processor stores.
• Spatial locality is much higher in I-caches, so larger lines are more effective in I-caches than in D-caches.
Split I & D Caches
[Figure: the processor issues instruction reads to the I-cache and data reads and writes to the D-cache; a data write also probes the I-cache and invalidates the line if found there.]
On Chip Caches
• On-chip caches have two notable considerations:
– Due to pin limitations, the transfer path to and from memory is usually limited.
– The cache organisation must be optimized to make the best use of chip area.
So the area of the directory should be small, allowing maximum area for the data array.
This implies a large block size (fewer entries) and a simply organised cache with fewer bits per directory entry.
Sectored Cache
• The use of large blocks, especially in small caches, increases the miss rate and, in particular, the miss-time penalty (due to the long access time for large blocks).
• The solution is a sectored cache.
• In a sectored cache each line is broken into transfer units (the unit of one access between cache and memory).
• The directory is organised around the line size as usual.
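A sketch of a sectored line (per-sector valid bits are the usual sectored-cache structure; the sector count here is illustrative):

from dataclasses import dataclass, field

@dataclass
class SectoredLine:
    tag: int = 0
    # One valid bit per transfer unit: a miss fetches only one sector,
    # not the whole line, while the directory still holds one tag per line.
    sector_valid: list = field(default_factory=lambda: [False] * 4)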
Two Level Caches
• A first-level on-chip cache is supported by a larger (off- or on-chip) second-level cache.
• The two-level cache improves performance by effectively lowering the first-level cache’s access time and miss penalty.
• A two-level cache system is termed inclusive if all the contents of the lower-level cache (L1) are also contained in the higher-level cache (L2).
Two Level Caches
• Second-level cache analysis is done using the principle of inclusion:
– a large second-level cache includes everything in the first-level cache.
Thus, for the purpose of evaluating performance, the first-level cache can be presumed not to exist, and the processor can be assumed to make all its requests to the second-level cache.
The line size, and in fact the overall size, of the second-level cache must be significantly larger than that of the first-level cache.
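A numeric illustration of the inclusion-based analysis (the rates are invented for the example):

l1_miss_rate = 0.05         # fraction of references missing in L1
l2_local_miss_rate = 0.20   # fraction of L1 misses that also miss in L2

# Under inclusion the L2 can be analysed as if it saw every reference, so
# the fraction of all references that reach main memory is the product.
global_miss_rate = l1_miss_rate * l2_local_miss_rate   # 0.01
print(f"references reaching main memory: {global_miss_rate:.2%}")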
Write Assembly Cache
• Write assembly caches centralize pending memory writes in a single buffer, reducing the resulting bus traffic.
• The goal of a write assembly cache (WAC) is to assemble writes so that they can be transmitted to memory in an orderly way.
• If a synchronizing event occurs, as in the case of multiple shared-memory processors, the entire WAC should be transferred to memory to ensure consistency.
• Temporal locality seems to play a more important role than spatial locality in write traffic; thus it is advantageous to have more, smaller lines.
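A minimal sketch of write assembly, coalescing pending stores per line in a small buffer (names and structure are illustrative):

pending = {}   # line address -> {offset: data}

def assemble_write(line_addr, offset, data):
    pending.setdefault(line_addr, {})[offset] = data   # merge into the WAC

def flush(memory):
    # E.g. on a synchronizing event: drain the entire WAC to memory.
    for line_addr, words in pending.items():
        memory.setdefault(line_addr, {}).update(words)
    pending.clear()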