
Chapter No. 04 – Cache Memory (Part 02)
Lecture – 09
17-04-2018

1
Topics to Cover
4.3 Elements of Cache Design
• Cache Addresses
• Cache Size
• Mapping Function
• Types of Mapping
• Replacement Algorithms
• Write Policy
• Line Size
• Number of Caches
2
4.3 Elements of Cache Design (Table 4.2)
• Although there are a large number of cache implementations, there
are a few basic design elements that serve to classify and differentiate
cache architectures. Table 4.2 below lists key elements.

3
Cache Size
• We would like the size of the cache to be small enough so that the
overall average cost per bit is close to that of main memory alone.
• We would like the cache to be large enough (up to a point) so that the
overall average access time is close to that of the cache alone.
• It is observed that the larger the cache, the larger the number of gates (bits) involved in addressing the cache.
• The result is that large caches tend to be slower than smaller ones.
• The available chip and board area also limits cache size.
• The performance of a cache is very sensitive to the nature of the workload, so it is impossible to arrive at a single ‘optimum’ cache size.
4
Mapping Function
• Why do we need mapping?
• Because there are fewer cache lines than main memory blocks, an
algorithm is needed for mapping main memory blocks into cache
lines. HOW TO PLACE BLOCKS ONTO CACHE?
• Further, a means is needed for determining which main memory
block currently occupies a cache line. HOW TO IDENTIFY AND SEARCH
• Three mapping techniques can be used:
1. Direct 2. Associative 3. Set-Associative
• The choice of mapping function dictates how the cache is organized.

5
Example 4.2
• We will examine each of these three mapping techniques. In each
case, we look at the general structure and then a specific example.

[Figure: capacities used in Example 4.2: cache 64 Kbytes, block/line 4 bytes, main memory 16 Mbytes.]

6
Example Explained (Cache)
• Cache capacity = 64-K bytes (64 * 1024 bytes).
• One cache line = one block = 4 bytes.
• So there are: 64-K bytes / 4 bytes per block = 16-K blocks (lines) of 4 bytes each.
• To address the cache we need to index 2^14 lines of 4 bytes each:
  16-K blocks = 2^4 * 2^10 = 2^(4+10) = 2^14
• Thus the cache needs a 14-bit address to select a line (of 4 bytes).

7
Example Explained (Memory)
• Main-memory capacity = 16-Mbytes (16 * 1024 * 1024 bytes)
• Main-memory addresses = 16 * 1024 * 1024 bytes
  = 2^4 * 2^10 * 2^10
  = 2^(4+10+10)
  = 2^24 (=> 24-bit address, 2^24 = 16M)
• For mapping purposes, we have: 16M bytes / 4 bytes per block = 4 * 1024 * 1024 = 4M
• Thus main-memory contains 4M blocks (of 4 bytes each)

8
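To make the arithmetic of the last two slides concrete, here is a minimal sketch (my own, not part of the slides) that recomputes the field widths for this example (64-Kbyte cache, 4-byte lines, 16-Mbyte main memory) under direct mapping:

```python
import math

cache_bytes = 64 * 1024           # 64-Kbyte cache
block_bytes = 4                   # one line/block = 4 bytes
memory_bytes = 16 * 1024 * 1024   # 16-Mbyte main memory

word_bits = int(math.log2(block_bytes))          # w = 2 (selects a byte in the block)
num_lines = cache_bytes // block_bytes           # 16K lines
line_bits = int(math.log2(num_lines))            # r = 14
address_bits = int(math.log2(memory_bytes))      # s + w = 24
tag_bits = address_bits - line_bits - word_bits  # s - r = 8

print(word_bits, line_bits, tag_bits)            # -> 2 14 8
```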
1. Direct Mapping
• Direct mapping maps each block of main memory into only one possible cache line. (Slicing)
• That is, if a block is in the cache, it must be in one specific place.
• The ‘direct mapping’ is expressed as:
  i = j modulo m
  where i = cache line number, j = main-memory block number, and m = number of lines in the cache.
  The modulus operator (%) returns the remainder after integer division.

9
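As a small illustration of the i = j modulo m rule (my own sketch, assuming the 16K-line cache of Example 4.2), the helper below computes which cache line a given main-memory block lands in:

```python
def direct_map_line(block_number: int, num_lines: int) -> int:
    """Direct mapping: cache line i = main-memory block number j modulo m lines."""
    return block_number % num_lines

m = 16 * 1024                        # 16K lines, as in Example 4.2
print(direct_map_line(0, m))         # block 0   -> line 0
print(direct_map_line(m, m))         # block m   -> line 0 again (shares that line)
print(direct_map_line(m + 1, m))     # block m+1 -> line 1
```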
Direct Mapping
• In ‘Direct Mapping’, the blocks of main-memory are assigned to lines of the cache as follows:
  line 0 ← blocks 0, m, 2m, …, 2^s − m
  line 1 ← blocks 1, m + 1, 2m + 1, …, 2^s − m + 1
  …
  line m − 1 ← blocks m − 1, 2m − 1, 3m − 1, …, 2^s − 1
  (The Tag field distinguishes among the blocks that share a line.)
• The use of a portion of the address (r bits) as a line number provides a unique mapping of each block of main-memory into the cache.
10
• The figure shows the mapping for the first m blocks of main memory.
• Each block of main memory maps into one unique line of the cache.
• The next m-blocks of main memory map into the cache in the same
fashion; that is, block Bm of main memory maps into line L0 of cache,
block Bm+1 maps into line L1, and so on.

11
Direct Mapping (Figure Next Slide)


• For purposes of cache access, each main-memory address can be viewed as consisting of three fields.
• The least significant w bits identify a unique word or byte within a block of main-memory. (The address is at the byte level.)
• The remaining s bits specify one of the 2^s blocks of main-memory.
• The cache logic interprets these s bits as a tag of s − r bits (most significant portion) and a line field of r bits.
• The ‘line field’ identifies one of the m = 2^r lines of the cache.

12
Cache Access Logic
[Figure: direct-mapped cache organization. The address is split into Tag (which block), Line (which line), and Word (which word) fields; each cache line holds one 4-word block. The boxed fields answer Preparatory Question 3.]
13
Summary of ‘Direct Mapping’
Here the main-memory address has s + w bits: the s bits identify one of the 2^s blocks of main-memory and the w bits identify one word in a block (e.g. w = 2 bits specifies 4 words in that block).
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = m = 2^r (the r bits specify the m lines of the cache)
• Cache capacity = lines * words per line = 2^(r+w) words (addressed by r + w bits)
• Size of tag = (s − r) bits
• Note: the Tag distinguishes a block from the other blocks that can fit into that line.
• Address format: Tag (s − r bits) | Line (r bits) | Word (w bits)

14
Advantages and Disadvantages (Direct Mapping)
• Advantages
• The direct mapping technique is simple and inexpensive to implement
• Disadvantage
• Its main ‘disadvantage’ is that there is a fixed cache location for any
given block.
• Thus, if a program happens to reference words repeatedly from two different blocks that map into the same line, ‘cache misses’ are HIGH.
• The blocks will then be continuously swapped in the cache and the hit ratio will be low; this phenomenon is known as thrashing.

15
Victim Cache
• One approach to lower the miss penalty is to remember what was
discarded in case it is needed again.
• Since the discarded data has already been fetched, it can be used
again at a small cost.
• Such recycling is possible using a ‘Victim cache’.
• ‘Victim cache’ was originally proposed as an approach to reduce the
conflict misses of direct mapped caches without affecting its fast
access time.
• Victim cache is a fully associative cache, whose size is typically 4 to 16
cache lines, residing between a direct mapped L1 cache and memory.
16
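A rough sketch of the idea (the class and method names are hypothetical, not a real controller): on an L1 miss the victim cache is searched before going to main memory, and blocks evicted from L1 are recycled into it.

```python
from collections import deque

class VictimCache:
    """Tiny fully associative victim cache holding recently evicted (tag, block) pairs."""
    def __init__(self, capacity=8):              # typically 4 to 16 lines
        self.entries = deque(maxlen=capacity)    # the oldest victim falls out automatically

    def lookup(self, tag):
        for stored_tag, block in self.entries:   # search all entries (fully associative)
            if stored_tag == tag:
                return block                     # hit: recycled at small cost
        return None                              # miss: must go to main memory

    def insert(self, tag, block):                # called with the block just evicted from L1
        self.entries.append((tag, block))
```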
Example 4.2(a)
[Figure: direct-mapping example. The 24-bit address is split into a Tag (8 bits), a Line number (14 bits, the r bits that specify the m lines), and a Word field (2 bits); each block/line holds 4 bytes, and the Tag specifies which memory block is placed in a given cache line.]

17
For Example
Tag (s - r): 8 bits | Line or Slot (r): 14 bits | Word (w): 2 bits

• 24 bit address
• 2 bit word identifier (4 byte block)
• 22 bit block identifier (s – r + r) See ‘Slide-6’
• 8 bit tag (=22-14)
• 14 bit slot or line
• No two blocks in the same line have the same Tag field
• Check contents of cache by finding line and checking Tag

18
Tag (s - r): 8 bits | Line or Slot (r): 14 bits | Word (w): 2 bits
Cache Read Operation (Example)
1. The cache system is presented with an (s + w)-bit address.
2. The r bits (line number) are used as an index into the cache to access a particular line (containing 4 bytes/words).
3. If the tag bits of the address match the tag currently stored in that line (a hit), the w bits (word number) are used to select one of the words in that line.
4. Otherwise (a miss), the s bits are used to fetch a block from main memory.
5. The actual address used for the fetch is the tag plus line number concatenated with two 0 bits, so that 4 bytes (2^2 = 4) are fetched starting on a block boundary.
(Break) 19
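The five read steps above can be sketched as a toy model (the class and its structure are my own, assuming the 8/14/2 tag/line/word split of this example):

```python
class DirectMappedCache:
    """Toy direct-mapped cache of 4-byte lines, following the read steps above."""
    def __init__(self, line_bits=14, word_bits=2):
        self.line_bits, self.word_bits = line_bits, word_bits
        self.lines = {}   # line index -> (tag, 4-byte block)

    def read(self, address, memory):
        word = address & ((1 << self.word_bits) - 1)                    # step 1: split the address
        line = (address >> self.word_bits) & ((1 << self.line_bits) - 1)
        tag = address >> (self.word_bits + self.line_bits)
        entry = self.lines.get(line)                                    # step 2: index into the cache
        if entry is not None and entry[0] == tag:                       # step 3: tag match -> hit
            return entry[1][word]
        base = address & ~((1 << self.word_bits) - 1)                   # step 5: start on a block boundary
        block = [memory[base + i] for i in range(1 << self.word_bits)]  # step 4: fetch the block
        self.lines[line] = (tag, block)
        return block[word]

memory = bytes(1 << 24)                              # 16 MB of zeroed 'main memory'
print(DirectMappedCache().read(0x16339C, memory))    # miss, block fetched and cached
```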
2. Associative Mapping
• Associative mapping permits each main-memory block to be loaded
into any line of the cache.

20
Fully Associative Cache Organization (Fig. Next)
• The ‘cache control logic’ interprets a memory address simply as a Tag
and a Word field.
• The Tag field uniquely identifies a block of main-memory. (Number of main-memory blocks M = 2^n / K = 2^s, where n is the number of address bits and K is the block size.)

• To determine whether a block is in the cache, the ‘cache control logic’ must simultaneously examine every line’s tag for a match.
• Note that no field in the address corresponds to the line number ‘r’,
so that the number of lines in the cache is not determined by the
address format. Hence all lines are searched.

Address format: Tag (s bits, identifies the block) | Word (w bits)

21
Cache Access Logic
[Figure: fully associative cache organization. The address is split into Tag (which block) and Word (which word) fields only; the tag is compared with every cache line in parallel, and each line holds one 4-word block. The boxed fields answer Preparatory Question 4.]

22
Summary of ‘Associative Mapping’
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes (e.g. 2^2 = 4 bytes)
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = undetermined by the address format (that is why there are NO r bits in the address)
• Size of tag = s bits (the Tag distinguishes a block of main-memory)
• Address format: Tag (s bits) | Word (w bits)

23
Advantages and Disadvantages (Associative Mapping)
• Advantages
• With ‘Associative mapping’, there is flexibility as to which block to
replace when a new block is read into the cache.
• Replacement algorithms help to decide which line to replace and are designed to maximize the HIT-ratio.
• Disadvantage
• The principal disadvantage of ‘associative mapping’ is the complex
circuitry required to examine the tags of all cache lines in parallel.
• This means that ‘cache searching gets expensive’.

24
Example 4.2 (b)

[Figure: associative-mapping example. From the 24-bit address, the 2 word bits are removed, leaving a 22-bit Tag (the s bits) and a 2-bit Word field.]

25
We have 4M blocks in main-memory => 4 * 1024 * 1024 = 2^(2+10+10) = 2^22 => 22 bits
For Example
Tag: 22 bits | Word: 2 bits

• 22 bit tag stored with each 32 bit block of data


• Compare tag field with tag entry in cache to check for hit
• Least significant 2 bits of address identify which 8 bit word is required
from 32 bit data block
• e.g.
• Address Tag Data Cache line
• 16339C 058CE7 24682468 3FFF

26
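The example row can be checked in a couple of lines (a quick sanity check of my own, not from the slides): dropping the 2 word bits of address 16339C leaves the 22-bit tag 058CE7.

```python
address = 0x16339C        # 24-bit byte address from the example
word = address & 0b11     # least significant 2 bits select the byte within the 4-byte block
tag = address >> 2        # remaining 22 bits form the tag
print(hex(tag), word)     # -> 0x58ce7 0  (i.e. tag 058CE7, word 0)
```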
3. Set-Associative Mapping
• In Set-Associative mapping, each block maps into any line of one specific set, so that main-memory block B0 maps into set 0, and so on.
• The cache consists of a number of sets (v), each of which consists of a number of lines (k). The relationships are:
  m = v × k
  i = j modulo v
  where i = cache set number, j = main-memory block number, m = number of lines in the cache, v = number of sets, and k = number of lines in each set.

27
K-way Set-Associative Mapping (Fig. Next)
• With k-lines per set, this is referred to as k-way set-associative
mapping.
• With set-associative mapping, block Bj can be mapped into any of the
lines of set-j.
• Thus a set-associative cache can be physically implemented as v associative caches.
• E.g. in a ‘2-way set-associative mapping’ (having 2 lines per set), a
given block can be in one of the two-lines in any one set.
• Figure (next slide) illustrates this mapping for the first v blocks of
main-memory.

28
‘v’ Associative-Mapped Caches
[Figure: the cache viewed as v sets with k lines per set; each set is fully associative, so block B0 fits into any line of set 0.]

29
Set-Associative ‘Cache Control Logic’
• For ‘Set-Associative mapping’, the cache control logic interprets a memory address as three fields: Tag, Set, and Word.
• The d bits specify one of v = 2^d sets.
• The s bits of the Tag and Set fields specify one of the 2^s blocks of main-memory.
• Advantage: With ‘k-way set-associative mapping’, the tag in a
memory address is much smaller and is only compared to the k tags
within a single set. (whereas associative-mapping tag is large, and compared with all lines).
• Figure (next slide) illustrates the ‘cache control logic’.

30
Cache Access Logic
[Figure: k-way set-associative cache organization. The address is split into Tag (which block), Set (which set), and Word (which word) fields; block Bj goes to set j, where its tag is compared with the k lines of that set. The boxed fields answer Preparatory Question 5.]

31
Summary of ‘Set-Associative Mapping’
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes (e.g. 2^2 = 4 bytes)
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of sets v = 2^d (the d bits identify the v sets)
• Number of lines in cache = k lines per set × v sets
• Size of cache = No. of lines per set (k) × No. of sets (v) × No. of words per line (2^w) words
• Size of tag = (s − d) bits
• Address format: Tag (s − d bits) | Set (d bits) | Word (w bits)

32
Example 4.2 (c)

• In a ‘2-way set-associative mapping’ (having 2 lines per set), a given block can be in one of the two lines in any one set.
• 13-bit set number (identifies a unique set of two lines within the cache). The block number in main memory is taken modulo 2^13.
33
For Example (Cache Read)
Tag: 9 bits | Set: 13 bits | Word: 2 bits

• Use set field to determine cache set to look in


• Compare tag field to see if we have a hit
• e.g
• Address Tag Data Set number
• 1FF 7FFC 1FF 12345678 1FFF
• 001 7FFC 001 11223344 1FFF

34
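As a self-made check of the table above (assuming the 9/13/2 tag/set/word split), the two example addresses are reassembled and decomposed below; both fall into set 1FFF but carry different tags, so they can coexist in the same two-line set:

```python
def split(address):
    """Decompose a 24-bit address into (tag, set, word) using the 9/13/2 split."""
    word = address & 0b11
    set_index = (address >> 2) & 0x1FFF   # 13 set bits
    tag = address >> 15                   # 9 tag bits
    return tag, set_index, word

addr1 = (0x1FF << 15) | 0x7FFC            # the '1FF 7FFC' address from the table
addr2 = (0x001 << 15) | 0x7FFC            # the '001 7FFC' address
print([hex(v) for v in split(addr1)])     # -> ['0x1ff', '0x1fff', '0x0']
print([hex(v) for v in split(addr2)])     # -> ['0x1', '0x1fff', '0x0']
```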
Set-Associative Mapping (Extreme Cases)
• In the extreme case of v = m, k =1, the set-associative technique
reduces to direct-mapping.
• For v =1, k = m, it reduces to associative-mapping.

• The use of two-lines per set (v = m/2 for k = 2) is the most common
‘set-associative’ organization.
• It significantly improves the HIT-ratio over ‘direct-mapping’.
• ‘Four-way set-associative’ (v = m/4, k = 4) makes a modest additional
improvement. Further increase in No. of lines per set has little effect.

35
See Last Questions
Problem 4-1

(Answer)

36
Problem 4-2

(Answer)

37
Summary of Three Mapping Techniques

Mapping Type | Address Length (bits) | Addressable Units (words/bytes) | Block/Line Size (words/bytes) | Blocks in Main-memory | Lines in Cache (m) | Cache Size (words) | Tag Size (bits)
Direct | s + w | 2^(s+w) | 2^w | 2^s | 2^r | 2^(r+w) | s − r
Associative | s + w | 2^(s+w) | 2^w | 2^s | undetermined | undetermined | s
Set-Associative | s + w | 2^(s+w) | 2^w | 2^s | k lines per set, v = 2^d sets | k × v × 2^w | s − d

The last three columns are where the three techniques differ.

Note: 2^w means 2 raised to the power w.

(Break)

38
Cache ‘Replacement Algorithms’ (Associative)
• Once the cache has been filled, when a new block is brought into the
cache, one of the existing blocks must be replaced.
• For direct-mapping, there is only one possible line for a particular
block, and no choice is possible. We have to replace that line (thrash).
• For the associative and set-associative techniques, a replacement
algorithm is needed.
• To achieve high speed, such an algorithm must be implemented in
hardware.
• A number of algorithms have been tried. We mention only four.

39
Types of ‘Cache Replacement Algorithms’
• Four of the most common cache replacement algorithms are:
1) Least Recently Used (LRU)
2) First-In-First-Out (FIFO)
3) Least Frequently Used (LFU)
4) Random Replacement (RR)

• ‘Least Recently Used (LRU)’ is the most effective and popular replacement algorithm.

40
Types of ‘Cache Replacement Algorithms’
1. Least Recently Used (LRU): Replace that block in the set that has
been in the cache longest with no reference to it. (relates to time)
• We are assuming that more recently used memory locations are likely
to be referenced again. This is called ‘Temporal locality’ and it should
give the best HIT-ratio.
• The cache mechanism maintains a separate list of indexes to all the
lines in the associative cache.
• When a line is referenced, it moves to the front of the list.
• For replacement, the line at the back of the list is used (discarded).

41
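A minimal LRU sketch for a single set, using Python's OrderedDict as the 'list of indexes' described above (my own illustration, not a real cache controller):

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with LRU replacement: most recently used lines sit at the front."""
    def __init__(self, num_lines=2):
        self.num_lines = num_lines
        self.lines = OrderedDict()             # tag -> block, ordered by recency

    def access(self, tag, fetch_block):
        if tag in self.lines:                  # hit: move the line to the front of the list
            self.lines.move_to_end(tag, last=False)
            return self.lines[tag]
        if len(self.lines) >= self.num_lines:  # miss with a full set: discard the back of the list
            self.lines.popitem(last=True)
        block = fetch_block(tag)               # fetch the block from the next memory level
        self.lines[tag] = block
        self.lines.move_to_end(tag, last=False)
        return block
```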
Types of ‘Cache Replacement Algorithms’
2. First-In-First-Out (FIFO): Replace that block in the set that has been
in the cache longest. (first-come first-serve pattern)
• FIFO is easily implemented as a round-robin or circular buffer
technique.
3. Least Frequently Used (LFU): Replace that block in the set that has
experienced the fewest references. (relates to the use of data item)
• LFU could be implemented by associating a counter with each line.
4. Random Replacement (RR): Pick and replace a line at random from
among the candidate lines.

42
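For comparison, FIFO and LFU can each be sketched in a few lines (again my own illustrations): FIFO keeps a circular victim pointer per set, while LFU keeps a reference counter per line.

```python
class FIFOSet:
    """FIFO replacement: victims are chosen round-robin, ignoring how recently lines were used."""
    def __init__(self, num_lines=2):
        self.tags = [None] * num_lines        # one slot per line in the set
        self.next_victim = 0                  # circular-buffer pointer

    def replace(self, new_tag):
        self.tags[self.next_victim] = new_tag
        self.next_victim = (self.next_victim + 1) % len(self.tags)

class LFUSet:
    """LFU replacement: the line with the smallest reference counter is the victim."""
    def __init__(self):
        self.counts = {}                      # tag -> number of references

    def reference(self, tag):
        self.counts[tag] = self.counts.get(tag, 0) + 1

    def victim(self):
        return min(self.counts, key=self.counts.get)
```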
Write Policy
• Coherency property: States that if a Word or a cache line (block) is
modified in the lower level memory M1 (cache), then copies of that
word must be updated immediately at all higher memory levels M2,
M3 and so on, so as to maintain ‘data coherency’ among different
memory levels.
• When a block that is resident in the cache is to be replaced, if it has
not been altered (updated/overwritten), then it may directly be
replaced without first writing out this block to the main memory.
• If at least one write operation has been performed on a word in that line of the cache, then main memory must be updated by writing that block back to main memory before bringing the new block into the cache.
43
Types of ‘Write Policies’
• If a Word has been altered only in the cache, then the corresponding
memory Word is invalid.
• If the I/O device has altered the main-memory, then the cache Word
is invalid.
• To maintain ‘data coherency’, two write policies have been adopted:
1) Write through 2) Write back

44
Types of ‘Write Policies’
1. Write through: Using this technique, all write operations are made
to main memory as well as to the cache simultaneously, ensuring
that main-memory is always valid. (Advantage)
• ‘Write through’ does an ‘immediate update’ in RAM whenever a word
is modified in cache.
• Its main disadvantage is that it generates substantial memory traffic and can create a bottleneck.
2. Write back: Minimizes memory writes; it ‘delays the update’ of main-memory until the updated block is replaced by a new block in that same cache line. (An ‘updated’ bit is set for the line.)
• Until that write back occurs, the copy of the block in main-memory is invalid (Disadvantage).
45
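The difference between the two policies can be sketched with a toy single-line cache (my own model; the dirty flag plays the role of the 'updated' bit mentioned above):

```python
class WriteDemoCache:
    """Toy one-line cache illustrating write-through versus write-back."""
    def __init__(self, policy="write-back"):
        self.policy = policy
        self.tag, self.data, self.dirty = None, None, False

    def write(self, tag, value, memory):
        self.tag, self.data = tag, value
        if self.policy == "write-through":
            memory[tag] = value               # update main memory immediately (more bus traffic)
        else:
            self.dirty = True                 # write-back: just mark the line as updated

    def evict(self, memory):
        if self.policy == "write-back" and self.dirty:
            memory[self.tag] = self.data      # delayed update, done only when the line is replaced
        self.tag, self.data, self.dirty = None, None, False
```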
Example 4.3

Answer:

46
Line Size
• When a block of data is retrieved and placed in the cache, not only
the desired word but also some number of adjacent words are
retrieved. (a block is a combination of words)
• As the block size increases, the hit ratio will at first increase, due to the principle of locality, which states that data in the vicinity of a referenced word are likely to be referenced in the near future.
• As the block size increases further, the hit ratio will begin to decrease.
• The relationship between block size and hit ratio is complex, and no
definitive optimum value has been found. 8-to-64 bytes is reasonable.

47
Number of Caches
• Multilevel Caches / Cache Hierarchy
1. On-chip cache (L1): is a cache on the same chip as the processor.
• The ‘on-chip cache (Level-1)’ reduces the processor’s external bus
activity and therefore speeds up execution times and increases overall
system performance. It is often accessed in one cycle by CPU.
2. An off-chip cache (L2) is external to the processor and is designated
as Level-2. It is made of static-RAM (SRAM).
• Because the slow bus speed and slow memory access time result in poor performance, an L2 cache is desirable; it can give ‘zero-wait-state’ access.

48
• L2 cache uses a separate data path (Local bus), so as to reduce the
burden on the ‘system bus’.
• With the continued shrinking of the processor components (due to
Moore’s law), a number of processors now incorporate the L2 cache
on the processors chip, improving performance.
• A Hit is counted if the desired data appears in either the L1 or the L2
cache.
• L2 has little effect on the total number of cache hits until it is at least
double the L1 cache size.
• With the increasing availability of on-chip area available for cache,
most microprocessors have moved the L2 cache on the processor chip
and added an L3 cache (Off chip).
• The L3 cache is accessible over the external bus, and adding the third level gives a performance advantage (again, the cache at each level should be at least double the size of the level below it to have a significant effect).
49
‘Unified’ Versus ‘Split Caches’
• The on-chip cache first consisted of a single cache used to store
references to both data and instructions.
• Now it has become common to split the cache into two: one
dedicated to instruction and one dedicated to data, called split cache.
• These two caches both exist at same level, typically as two L1 caches.
• When the processor attempts to fetch an ‘instruction/data’ from
main-memory, it first consults the ‘instruction/data’ L1 cache.
• Unified cache has only one cache that is used for holding both data
and instructions.

50
Advantages of a ‘Unified Cache’
• There are two potential advantages of a unified cache over split cache:
1. For a given cache size, a unified cache has a higher hit rate than split
caches because it balances the load between instruction and data
fetches automatically.
• That is, if an execution pattern involves many more instruction fetches than data fetches, the cache will tend to fill up with instructions; if the pattern involves relatively more data fetches, the opposite will occur.
2. Only one cache needs to be designed and implemented.

51
Advantages of a ‘Split Cache’
1. The key advantage of the split cache design is that it eliminates
contention for the cache between the instruction fetch/decode unit
and the execution unit. (useful for ‘pipelining’ of instructions)
2. This enables the processor to fetch instructions ahead of time and
fill a buffer, or pipeline, with instructions to be executed.
The trend is toward split caches at the L1 and unified caches for
higher levels.

52
‘Spatial’ & ‘Temporal’ LOCALITY (Q 4.8 = Q12)
• Principle of locality: data in the vicinity of a referenced word are likely to be referenced in the near future.
• The ‘principle of locality’ has two types:
1) Spatial locality 2) Temporal locality
1. Spatial locality: refers to the tendency of execution to involve a number of memory locations that are clustered, i.e. to ‘access neighbouring info items whose addresses are near one another’. (Locality in space.)
2. Temporal locality: refers to the tendency for a processor to access memory locations that have been used recently, so an item that has a ‘probability of being used in the future is kept’ (don’t discard it). (Locality in time.)
53
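A small, self-contained illustration of the two kinds of locality (my own example, not from the slides): the row-major loop touches neighbouring addresses (spatial locality), while the reuse of total in every iteration is temporal locality.

```python
import array

N = 256
matrix = array.array("i", range(N * N))    # stored row by row (row-major order)

total = 0                                  # 'total' is reused every iteration: temporal locality
for row in range(N):
    for col in range(N):
        total += matrix[row * N + col]     # consecutive addresses: spatial locality

# Swapping the two loops (column-major traversal) would touch addresses N*4 bytes apart,
# wasting most of each fetched cache block and lowering the hit ratio.
```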
Quiz # 2
Preparatory Questions
Chapter #4
Q1. Why do we need ‘cache mapping’ techniques? What are the types of
mapping? (Slide – 05)
Q2. What are the differences among ‘direct mapping, associative mapping and
set-associative mapping’? (Slide 9, 20 and 27)
Q3. For a ‘direct-mapped cache’, a main-memory address is viewed as consisting
of three fields. List and define the three fields. (Slide – 13.Box, Attached Q5.
Answer below)
Q4. For an ‘associative cache’, a main-memory address is viewed as consisting of
two fields. List and define the two fields. (Slide – 22.Box, Attached Q6. Answer
below)
Q5. For a ‘set-associative cache’, a main-memory address is viewed as consisting
of three fields. List and define the three fields. (Slide – 31.Box, Attached Q7.
Answer below)
54
Preparatory Questions
Q6. Write down the advantages and disadvantages of ‘direct mapping’. (Slide –
15)
Q7. Write down the advantages and disadvantages of ‘associative mapping’.
(Slide – 24)
Q8. Write down the two ‘Extreme cases’ of ‘set-associative mapping’. (Slide – 35)
Q9. List and explain the various ‘replacement algorithms’ used in cache memory.
(Slide – 39, 40, 41 and 42)
Q10. What is ‘data coherency’? To maintain data-coherency which two ‘write
policies’ are adopted in caches? Explain. (Slide 43, 44 and 45)
Q11. State the advantages of ‘unified cache’ over ‘split cache’, and vice versa.
(Slide – 51)
Q12. Why do we use ‘Victim cache’ in direct mapping? (Slide – 53)
55
Answers to Guess Questions: Q2. (4.4), Q3. (4.5), Q4. (4.6), Q5. (4.7), Q12. (4.8) below:

56
Answers to Problems 4.1 and 4.2 below.

57
For Information

• Problems 4.1 & 4.2 are part of your course.

• Example 4.1 is part of your course.

• Preparatory Questions are provided for this chapter.

58
Cache Addresses
Out of Course
• Can be of two types
1) Logical addressing 2) Physical addressing
• Almost all processors support ‘Virtual Memory’.
• Virtual memory is a facility that allows programs to address memory from a logical point of view, without regard to the amount of main memory physically available. (This concept will be discussed in Chapter 8.)
• Memory Management Unit (MMU) is a hardware unit that translates
each virtual address into a physical address in main memory, to allow
reads and writes to main memory.
59
Logical and Physical Caches
1. Logical cache, also known as virtual cache, stores
data using virtual addresses.
• One obvious advantage of the logical cache is that ‘the processor
accesses the cache directly, without going through the MMU’. So the
cache access speed is faster than for a physical cache. Because the
cache can respond before the MMU performs an ‘address translation’.
• The disadvantage is that most virtual memory systems supply each application with the same virtual address space, starting at address 0. Thus the same virtual address in two different applications refers to two different physical addresses, so the cache memory must be completely flushed when switching between the two applications.
2. Physical cache stores data using main memory physical addresses.
60
