Cache Memory

Caches work on the principle of locality of program behaviour. This principle states that programs access a relatively small portion of their address space at any instant of time.

There are three different types of locality.

Temporal Locality
References tend to repeat: if an item is referenced, it will tend to be referenced again soon.
If a sequence of references X1, X2, X3, X4 has recently been made, it is likely that the next reference will be one of X1, X2, X3, X4.
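A minimal C illustration (the function is an assumption for illustration, not from the slides):

int temporal_demo(void)
{
    /* 'sum' and 'i' are referenced on every iteration, so each
     * reference repeats within a very short window of time. */
    int sum = 0;
    for (int i = 0; i < 1000; i++)
        sum += i;
    return sum;
}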

Spatial Locality
If an item is referenced, there is a high probability that items whose addresses are close by will be referenced soon.

References tend to cluster into distinct regions (working sets).

Sequentiality
It is a restricted type of spatial locality and can be regarded as a subset of it.

It states that, given a reference to a particular location 'S', there is a high probability that a reference to location 'S+1' will be made within the next several references.
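A small C sketch of the 'S then S+1' pattern (illustrative, not from the slides):

long sequential_demo(const int a[], int n)
{
    /* After a[s] is touched, a[s+1] is touched within the next few
     * references: spatial locality in its sequential (S, S+1) form. */
    long total = 0;
    for (int s = 0; s < n; s++)
        total += a[s];
    return total;
}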
Locality in Programs
• Locality in programs arises from simple and natural program structures.
• For example, in loops, instructions and data are normally accessed sequentially.
• Instructions are normally accessed sequentially.
• Some data accesses, such as the elements of an array, show a high degree of spatial locality.
Memory Hierarchy
• Taking advantage of this principle of locality of program behaviour, the memory of a computer is implemented as a memory hierarchy.
• It consists of multiple levels of memory with different access speeds and sizes.
• The faster memory levels have a higher cost per bit of storage, so they tend to be smaller in size.
Memory Hierarchy
• This implementation creates the illusion that the user can access as much memory as the cheapest technology provides, while getting the access times of the faster memory.

Basic Notions
Cache Hit: Processor references that are found in the cache are called 'cache hits'.
Cache Miss: Processor references not found in the cache are called 'cache misses'.
On a cache miss, the cache control mechanism must fetch the missing data from main memory and place it in the cache.
Usually the cache fetches a spatial locality (a set of contiguous words), called a 'line', from memory.

Basic Notions
Hit Rate: Fraction of memory references found in the cache = references found in cache / total memory references.
Miss Rate: (1 - Hit Rate) Fraction of memory references not found in the cache.
Hit Time: Time to service a memory reference found in the cache (including the time to determine hit or miss).
Basic Notions
Miss Penalty: Time required to fetch a block into a level of the memory hierarchy from a lower level.
This includes the time to access the block, transmit it to the higher level, and insert it in the level that experienced the miss.
The primary measure of cache performance is the miss rate.
In most processor designs the CPU stalls, or ceases activity, on a cache miss.
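These notions combine in the standard average memory access time formula (a common textbook measure, not stated explicitly on the slide):

AMAT = Hit Time + Miss Rate × Miss Penalty

For example, with a 1-cycle hit time, a 5% miss rate and a 100-cycle miss penalty: AMAT = 1 + 0.05 × 100 = 6 cycles.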
Processor cache Interface
It can be characterized by a number of parameters.
• Access time for a reference found in the cache (a hit): depends on cache size and organisation.
• Access time for a reference not found in the cache (a miss): depends on memory organisation.
Processor cache Interface
• Time to compute a real address from a virtual address (not-in-TLB time): depends on the address translation facility.
From the cache's point of view, the aspects of processor behaviour that affect its design are:
1. The number of requests or references per cycle.
2. The physical word size, i.e. the transfer unit between CPU and cache.
Cache Organization
• The cache is organized as a directory, to locate a data item, and a data array, to hold the data.
• A cache can be organized to fetch on demand or to prefetch data.
• Fetch on demand (the most common choice) brings a new line into the cache only when a processor reference is not found in the current cache contents (a cache miss).
Cache Organization
There are three basic types of cache organisation:
1. Direct mapped
2. Set associative mapped
3. Fully associative mapped

Direct Mapped Cache
• In this organization each memory location is mapped to exactly one location in the cache.
• The cache directory consists of a number of lines (entries), with each line containing a number of contiguous words.
• The cache directory is indexed by the lower-order address bits, and the higher-order bits are stored as tag bits.
Direct Mapped Cache
The 24-bit real address is partitioned as follows for cache usage:

Tag (10 bits) | Index (8 bits) | W/L (3 bits) | B/W (3 bits)

• A 16 KB cache with a line of 8 words of 8 bytes each (64-byte lines).
• A total of 256 (16 KB / 64 B) lines, or entries, in the cache directory.
• A total of 10 tag bits (the higher-order bits) to differentiate the various addresses that map to the same line number.
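A minimal C sketch of this field extraction (the shift and mask constants follow directly from the 10/8/3/3 split above; the function name is illustrative):

#include <stdint.h>

/* Partition a 24-bit real address: 10-bit tag, 8-bit index,
 * 3-bit word-in-line (W/L), 3-bit byte-in-word (B/W). */
static void split_address(uint32_t addr, uint32_t *tag,
                          uint32_t *index, uint32_t *word,
                          uint32_t *byte)
{
    *byte  =  addr        & 0x7;    /* bits 0-2 */
    *word  = (addr >> 3)  & 0x7;    /* bits 3-5 */
    *index = (addr >> 6)  & 0xFF;   /* bits 6-13: one of 256 lines */
    *tag   = (addr >> 14) & 0x3FF;  /* bits 14-23: kept in the directory */
}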
Direct Mapped Cache

[Figure: direct-mapped lookup. The real address from the TLB is split 10/8/3/3 into tag, index, W/L and B/W bits. The index selects one directory entry (valid, dirty and reference bits plus 10 tag bits) and the corresponding 8B word in the 2K x 8B data array; a comparator (COMP) checks the stored tag against the reference tag before the data is gated to the processor.]
Set Associative Cache
• The set associative cache operates in a fashion similar to the direct mapped cache.
• Here there is more than one choice of location for a line.
• If there are 'n' such locations, the cache is said to be n-way set associative.
• Each line in memory maps to a unique set in the cache and can be placed in any element of that set.
Set Associative Cache
• This improves the hit rate, since a line may now lie in more than one location. Going from one-way to two-way decreases the miss rate by about 15%.
• The reference address bits are compared with all the entries in the set to find a match.
• If there is a hit, the matching sub-array of the cache is selected and outgated to the processor.
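A hedged C sketch of an n-way lookup (the structure layout and names are assumptions, not from the slides; the sizes match the two-way figure further down):

#include <stdint.h>
#include <stdbool.h>

#define NWAYS 2
#define NSETS 128   /* 7 index bits */

struct line_entry { uint32_t tag; bool valid; };
static struct line_entry dir[NSETS][NWAYS];

/* Return the matching way within the set, or -1 on a miss. */
static int sa_lookup(uint32_t index, uint32_t tag)
{
    for (int w = 0; w < NWAYS; w++)
        if (dir[index][w].valid && dir[index][w].tag == tag)
            return w;   /* hit: this sub-array is outgated */
    return -1;          /* miss: the line must be fetched from memory */
}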
Set Associative Cache
Disadvantages:
1. Requires more comparators and stores more tag bits per block.
2. The additional compares and multiplexing increase cache access time.

Set Associative Cache

[Figure: two-way set associative lookup. The real address is split 11/7/3/3 into tag, index, W/L and B/W bits. The index selects one set across two 1K x 8B data arrays (Set 1 and Set 2), each directory entry holding valid, dirty and reference bits plus 11 tag bits; both stored tags are compared with the reference tag, and a multiplexer (MUX) gates the matching way to the processor.]
Fully Associative Cache
• It is the extreme case of set associative mapping.
• In this mapping a line can be stored in any of the directory entries.
• The referenced address is compared with all the entries in the directory (high hardware cost).
• If a match is found, the corresponding location is fetched and returned to the processor.
• Suitable for small caches only.
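A minimal sketch of the fully associative search (names and sizes are assumptions). The sequential loop stands in for what hardware does with one comparator per entry, in parallel:

#include <stdint.h>
#include <stdbool.h>

#define NENTRIES 256
struct fa_entry { uint32_t tag; bool valid; };
static struct fa_entry fa_dir[NENTRIES];

/* Compare the referenced tag with every directory entry. */
static int fa_lookup(uint32_t tag)
{
    for (int i = 0; i < NENTRIES; i++)
        if (fa_dir[i].valid && fa_dir[i].tag == tag)
            return i;   /* hit */
    return -1;          /* miss */
}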
Fully Associative Mapped Cache

[Figure: fully associative lookup. The real address (via the TLB) has no index field: an 18-bit tag plus W/L and B/W bits. The 18 tag bits are compared with every directory entry (each holding valid, dirty and reference bits) and the matching 8B word in the 2K x 8B data array is returned to the processor.]
Write Policies
• There are two strategies for updating main memory on a write.
1. A write-through cache stores into both the cache and main memory on each write.
2. In a copy-back cache the write is done in the cache only, and the dirty bit is set. The entire line is stored to main memory on replacement (if the dirty bit is set).
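A hedged C sketch of the two write-hit paths (all names and the stub bodies are assumptions for illustration):

#include <stdint.h>
#include <stdbool.h>

struct cache_line { uint64_t data[8]; bool dirty; };  /* 8 words per line */

static void store_into_cache(struct cache_line *ln, int word, uint64_t v)
{
    ln->data[word] = v;
}

static void store_into_memory(uint32_t addr, uint64_t v)
{
    (void)addr; (void)v;   /* stand-in for a bus write to main memory */
}

/* Write HIT under the two update policies. */
static void write_hit(struct cache_line *ln, uint32_t addr, int word,
                      uint64_t v, bool write_through)
{
    store_into_cache(ln, word, v);
    if (write_through)
        store_into_memory(addr, v);  /* memory updated on every store */
    else
        ln->dirty = true;            /* copy back: memory is updated only
                                        when the dirty line is replaced */
}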
Write Through
• A write is directed at both the cache and main memory on every CPU store.
• Advantage: a consistent image is maintained in main memory.
• Disadvantage: increased memory traffic, especially in the case of large caches.

Copy Back
• A write is directed only at the cache on a CPU store, and the dirty bit is set.
• The entire line is written back to main memory only when the line is replaced by another line.
• When a read miss occurs in the cache, the old line is simply discarded if its dirty bit is not set; otherwise the old line is first written out, and then the new line is fetched and written into the cache.
Write Allocate
• If a cache miss occurs on a store (write), the new line can be allocated in the cache and the store can then be performed in the cache.
• This 'write allocate' policy is generally used with copy-back caches.
• Copy-back caches result in lower memory traffic with large caches.
No Write Allocate
• If a cache miss occurs on a store (write), the cache may be bypassed and the write performed in main memory only.
• This 'no write allocate' policy is generally used with write-through caches.
• This gives two common cache types:
• CBWA - copy back, write allocate
• WTNWA - write through, no write allocate
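Continuing the write-hit sketch above, a minimal write-miss path for these two combinations (allocate_line and word_of are assumed helpers that fetch a line into the cache and extract the word-in-line field):

/* Write MISS under the two common policy combinations. */
static void write_miss(uint32_t addr, uint64_t v, bool cbwa)
{
    if (cbwa) {
        /* CBWA: allocate (fetch) the line, then store into the cache. */
        struct cache_line *ln = allocate_line(addr);
        write_hit(ln, addr, word_of(addr), v, /*write_through=*/false);
    } else {
        /* WTNWA: bypass the cache and write main memory only. */
        store_into_memory(addr, v);
    }
}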
Common Types of Cache
• Integrated or unified cache
• Split I and D caches
• Sectored cache
• Two-level cache
• Write assembly cache

Split I & D Caches
• Separate instruction and data caches offer the possibility of significantly increased cache bandwidth (almost twice as much).
• This comes at the sacrifice of a somewhat increased miss rate compared with a unified cache of the same size.
• The caches are not split equally: I-caches are not required to manage a processor store.
• Spatial locality is much higher in I-caches, so larger lines are more effective in I-caches than in D-caches.
Split I And D caches

[Figure: split cache arrangement. The processor sends instruction reads to the I-cache and data reads and writes to the D-cache; each data write is also checked against the I-cache, and the entry is invalidated if found, to keep the two caches consistent.]
On Chip Caches
• On-chip caches have two notable considerations:
– Due to pin limitations, the transfer path to and from memory is usually limited.
– The cache organisation must be optimized to make the best use of area.
So the area of the directory should be small, allowing maximum area for the data array.
This implies a large block size (fewer entries) and a simply organised cache with fewer bits per directory entry.
Sectored Cache
• The use of large blocks, especially in small caches, causes an increase in the miss rate and especially increases the miss time penalty (due to the large access time for large blocks).
• The solution is a sectored cache.
• In a sectored cache each line is broken into transfer units (the unit of one access between cache and memory).
• The directory is organised around the line size, as usual.
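One way to picture a sectored line in C (the layout and sizes are assumptions): the directory keeps one tag per line, but validity is tracked per transfer unit, so a miss need only fetch a single sector rather than the whole large line:

#include <stdint.h>
#include <stdbool.h>

#define SECTORS_PER_LINE 4   /* transfer units per line (assumed) */

struct sectored_line {
    uint32_t tag;                            /* one tag per line */
    bool     sector_valid[SECTORS_PER_LINE]; /* one valid bit per sector */
    uint64_t sector_data[SECTORS_PER_LINE];
};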
Two Level Caches
• A first-level on-chip cache is supported by a larger second-level cache (off or on chip).
• The two-level cache improves performance by effectively lowering the first-level cache's access time and miss penalty.
• A two-level cache system is termed inclusive if all the contents of the lower-level cache (L1) are also contained in the higher-level cache (L2).
Two Level Caches
• Second-level cache analysis is done using the principle of inclusion:
– A large second-level cache includes everything in the first-level cache.
• Thus, for the purpose of evaluating performance, the first-level cache can be presumed not to exist, with the processor assumed to make all of its requests to the second-level cache.
• The line size, and in fact the overall size, of the second-level cache must be significantly larger than that of the first-level cache.
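Under inclusion, the single-level access-time formula extends in the standard textbook way (not stated explicitly on the slide):

AMAT = HitTime_L1 + MissRate_L1 × (HitTime_L2 + MissRate_L2 × MissPenalty_L2)

For example, with 1-cycle L1 hits, a 5% L1 miss rate, 10-cycle L2 hits, a 20% local L2 miss rate and a 100-cycle L2 miss penalty: AMAT = 1 + 0.05 × (10 + 0.20 × 100) = 2.5 cycles.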
Write Assembly Cache
• A write assembly cache (WAC) centralizes pending memory writes in a single buffer, reducing the resulting bus traffic.
• The goal of the write assembly cache is to assemble writes so that they can be transmitted to memory in an orderly way.
• If a synchronizing event occurs, as in the case of multiple shared-memory processors, the entire WAC should be transferred to memory to ensure consistency.
• Temporal locality seems to play a more important role than spatial locality in write traffic; thus it is advantageous to have a larger number of smaller lines.
