CH05-COA11e - Modified
Chapter 5
Cache Memory
Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Figure 5.1
Cache and Main Memory
Cache Memory Principles
• Block
– The minimum unit of transfer between cache and main memory
• Frame
– To distinguish between the data transferred and the chunk of physical
memory, the term frame, or block frame, is sometimes used with reference
to caches
• Line
– A portion of cache memory capable of holding one block, so-called
because it is usually drawn as a horizontal object
• Tag
– A portion of a cache line that is used for addressing purposes
• Line size
– The number of data bytes, or block size, contained in a line
Figure 5.2
Cache/Main Memory Structure
Figure 5.3
Cache Read Operation
Figure 5.4
Typical Cache Organization
Table 5.1
Elements of Cache Design
Cache Addresses Write Policy
Logical Write through
Physical Write back
Cache Size Line Size
Mapping Function Number of Caches
Direct Single or two level
Associative Unified or split
Set associative
Replacement Algorithm
Least recently used (LRU)
First in first out (FIFO)
Least frequently used (LFU)
Random
Cache Addresses
• Virtual memory
– Facility that allows programs to address memory from a
logical point of view, without regard to the amount of main
memory physically available
– When used, the address fields of machine instructions
contain virtual addresses
– For reads from and writes to main memory, a hardware
memory management unit (MMU) translates each virtual
address into a physical address in main memory
Figure 5.5
Logical and Physical Caches
Cache Size
• Preferable for the size of the cache to be:
– Small enough so that the overall average cost per bit is close to that of
main memory alone
– Large enough so that the overall average access time is close to that of
the cache alone
Table 5.3
Cache Access Methods
• Set Associative
– Organization: sequence of m lines organized as v sets of k lines each (m = v × k)
– Mapping: each block of main memory maps to one unique cache set
– Access: line portion of address used to access cache set; tag portion used to check every line in that set for a hit on that line
Figure 5.6
Mapping from Main Memory to Cache: Direct and
Associative
Direct Mapping: M mod N
• Example parameters:
– 32 words in main memory = 2^5 → 5-bit physical address
– 16 words in cache memory
– Block size = 4 words = 2^2 → 2-bit word offset
– Cache blocks/lines (N) = 4 → 2-bit line number
– Main memory blocks (M) = 8
• Physical address fields: Tag (1 bit) | Cache block/line # (2 bits) | Word offset (2 bits)
• Each main memory block M maps to cache line M mod N:
– MM blocks 0, 4 → CM line 0 (0 mod 4 = 4 mod 4 = 0)
– MM blocks 1, 5 → CM line 1
– MM blocks 2, 6 → CM line 2
– MM blocks 3, 7 → CM line 3
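The address split above can be sketched in a few lines of Python. This is an illustrative model of the 5-bit example only; the names `decompose`, `BLOCK_SIZE`, and `NUM_LINES` are mine, not from the text.

```python
BLOCK_SIZE = 4   # words per block  -> 2 offset bits
NUM_LINES = 4    # cache lines (N)  -> 2 line bits

def decompose(addr):
    """Split a 5-bit word address into (tag, line, offset)."""
    offset = addr % BLOCK_SIZE
    block = addr // BLOCK_SIZE       # main memory block number M
    line = block % NUM_LINES         # direct mapping: M mod N
    tag = block // NUM_LINES         # remaining high-order bit
    return tag, line, offset

# MM blocks 0 and 4 both map to cache line 0,
# distinguished only by their tags.
print(decompose(0))    # word 0 of block 0 -> (0, 0, 0)
print(decompose(16))   # word 0 of block 4 -> (1, 0, 0)
```

Note that the line number comes "for free" from the address bits, which is why direct mapping needs no search: only one tag comparison decides hit or miss.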
Figure 5.7
Direct-Mapping Cache Organization
Figure 5.8
Direct Mapping Example
Content-Addressable Memory (CAM)
• Also known as associative storage
• Content-addressable memory is constructed of Static RAM (SRAM)
cells but is considerably more expensive and holds much less data
than regular SRAM chips
• A CAM with the same data capacity as a regular SRAM is about
60% larger
• A CAM is designed such that when a bit string is supplied, the CAM
searches its entire memory in parallel for a match
– If the content is found, the CAM returns the address where the match
is found and, in some architectures, also returns the associated data
word
– This process takes only one clock cycle
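The address-from-content behavior of a CAM can be modeled in software. The sketch below is purely illustrative (class and method names are mine): a real CAM compares all cells in parallel in a single clock cycle, whereas this loop is sequential.

```python
class CAM:
    """Software model of a content-addressable memory: look up by value,
    get back the address where the value is stored."""

    def __init__(self, size):
        self.cells = [None] * size

    def write(self, addr, word):
        self.cells[addr] = word

    def search(self, word):
        # A hardware CAM performs all of these comparisons at once.
        for addr, stored in enumerate(self.cells):
            if stored == word:
                return addr          # match: return its address
        return None                  # no match anywhere

cam = CAM(8)
cam.write(3, 0b10110)
print(cam.search(0b10110))   # -> 3
print(cam.search(0b00001))   # -> None
```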
Figure 5.9
Content-Addressable Memory
Fully Associative Cache
• Example parameters:
– 32 words in main memory = 2^5 → 5-bit physical address
– 16 words in cache memory
– Block size = 4 words → 2-bit word offset
– Cache blocks (N) = 4; main memory blocks (M) = 8
• Physical address fields: Tag (3 bits) | Word offset (2 bits)
• Any main memory block may be loaded into any cache line; the tag stores the complete block number (e.g., tags 000, 010, and 111 identify MM blocks 0, 2, and 7)
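Because a block can sit in any line, a fully associative lookup must compare the tag against every line. A minimal sketch of the 4-line example above (the table contents and the name `lookup` are illustrative):

```python
# (tag, data) per line; tag None marks an empty line.
lines = [
    (0b000, "block 0 data"),
    (0b010, "block 2 data"),
    (None, None),
    (None, None),
]

def lookup(tag):
    # Every line's tag must be checked; hardware does this in
    # parallel using CAM-style comparators.
    for line_no, (stored_tag, data) in enumerate(lines):
        if stored_tag == tag:
            return line_no, data     # hit
    return None                      # miss -> fetch from main memory

print(lookup(0b010))   # -> (1, 'block 2 data')
print(lookup(0b111))   # -> None
```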
Figure 5.10
Fully Associative Cache Organization
Figure 5.11
Associative Mapping Example
Set Associative Mapping
• Compromise that exhibits the strengths of both the direct
and associative approaches while reducing their
disadvantages
• Cache consists of a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given set
• e.g. 2 lines per set
– 2 way associative mapping
– A given block can be in one of 2 lines in only one set
Set Associative Cache
• Example parameters:
– 32 words in main memory = 2^5 → 5-bit physical address
– 16 words in cache memory; block size = 4 words
– Cache blocks (N) = 4, organized as 2 sets of 2 lines each (2-way set associative)
– Main memory blocks (M) = 8
• Physical address fields: Tag (2 bits) | Set field (1 bit) | Word offset (2 bits)
• Each main memory block M maps to set M mod 2:
– Even MM blocks (0, 2, 4, 6) → set 0
– Odd MM blocks (1, 3, 5, 7) → set 1
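The 2-way lookup above combines both ideas: the set field selects one set directly, then the tag is compared against only the lines in that set. A sketch of the example configuration (set contents and names are illustrative):

```python
NUM_SETS = 2
sets = {
    0: [(0b00, "block 0"), (0b01, "block 2")],  # set 0 holds MM blocks 0, 2
    1: [(0b11, "block 7"), (0b01, "block 3")],  # set 1 holds MM blocks 7, 3
}

def lookup(block):
    set_no = block % NUM_SETS    # set field: direct-mapped selection
    tag = block // NUM_SETS      # remaining high-order bits
    # Associative search, but only over the k lines of one set.
    for stored_tag, data in sets[set_no]:
        if stored_tag == tag:
            return data          # hit
    return None                  # miss

print(lookup(7))   # set 1, tag 0b11 -> 'block 7'
print(lookup(4))   # set 0, tag 0b10 -> None (miss)
```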
Figure 5.13 k-Way
Set Associative Cache Organization
Figure 5.14
Two-Way Set-Associative Mapping Example
Figure 5.15
Varying Associativity over Cache Size
Replacement Algorithms
• Once the cache has been filled, when a new block is
brought into the cache, one of the existing blocks must
be replaced
• For direct mapping there is only one possible line for any
particular block and no choice is possible
• For the associative and set-associative techniques a
replacement algorithm is needed
• To achieve high speed, an algorithm must be
implemented in hardware
The most common replacement algorithms
are:
• Least recently used (LRU)
– Most effective
– Replace that block in the set that has been in the cache longest with no
reference to it
– Because of its simplicity of implementation, LRU is the most popular
replacement algorithm
• First-in-first-out (FIFO)
– Replace that block in the set that has been in the cache longest
– Easily implemented as a round-robin or circular buffer technique
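LRU bookkeeping for one set can be sketched with Python's `OrderedDict`, which keeps keys in access order so the first key is always the least recently used block. This is an illustrative model (class and method names are mine), not a hardware design.

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with LRU replacement among a fixed number of ways."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # block number -> data, LRU first

    def access(self, block, data):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # hit: now most recently used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)  # full set: evict LRU block
        self.blocks[block] = data
        return False

s = LRUSet(ways=2)
s.access(1, "a")
s.access(2, "b")
s.access(1, "a")         # touch block 1 -> block 2 becomes LRU
s.access(3, "c")         # miss in a full set: evicts block 2
print(list(s.blocks))    # -> [1, 3]
```

Replacing `popitem(last=False)` with a rotating index over the ways would turn the same skeleton into the FIFO (round-robin) policy.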
Write Policy
• When a block that is resident in the cache is to be replaced, there are two cases to consider:
– If the old block has not been altered, it may simply be overwritten
– If at least one write has been performed on a word in the block, main memory must be updated by writing the block back before the new block is brought in
• There are two problems to contend with:
– More than one device may have access to main memory (e.g., an I/O module may read from or write to memory directly)
– In a multiprocessor organization, each processor may have its own local cache, so a word altered in one cache could invalidate copies of that word in other caches
Write Through
and Write Back
• Write through
– Simplest technique
– All write operations are made to main memory as well as to the cache
– The main disadvantage of this technique is that it generates substantial
memory traffic and may create a bottleneck
• Write back
– Minimizes memory writes
– Updates are made only in the cache
– Portions of main memory are invalid and hence accesses by I/O modules
can be allowed only through the cache
– This makes for complex circuitry and a potential bottleneck
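The memory-traffic difference between the two policies comes down to a dirty bit per line. The sketch below (names and structure are illustrative) counts main-memory writes for a write-back line: repeated stores touch only the cache, and memory is written once, at eviction.

```python
class WriteBackLine:
    """One cache line under the write-back policy."""

    def __init__(self):
        self.data = None
        self.dirty = False
        self.mem_writes = 0      # main-memory write counter

    def store(self, data):
        self.data = data         # write hit: update the cache only
        self.dirty = True        # remember the block is modified

    def evict(self):
        if self.dirty:           # write memory only if modified
            self.mem_writes += 1
            self.dirty = False

line = WriteBackLine()
line.store("x")
line.store("y")
line.store("z")
line.evict()
print(line.mem_writes)   # -> 1: three stores, one memory write
```

Under write-through the same three stores would cost three memory writes, which is exactly the traffic the slide warns can become a bottleneck.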
Write Miss Alternatives
• There are two alternatives in the event of a write miss at a cache level:
– Write allocate: the block containing the word to be written is fetched from main memory (or the next-level cache) into the cache, and the processor proceeds with the write cycle
– No write allocate: the block containing the word to be written is modified in main memory and not loaded into the cache
Multilevel Caches
• As logic density has increased, it has become possible to have a cache on the same chip as
the processor
• The on-chip cache reduces the processor’s external bus activity, shortens execution
time, and increases overall system performance
– When the requested instruction or data is found in the on-chip cache, the bus access is
eliminated
– On-chip cache accesses will complete appreciably faster than would even zero-wait state bus
cycles
– During this period the bus is free to support other transfers
• Two-level cache:
– Internal cache designated as level 1 (L1)
– External cache designated as level 2 (L2)
• The potential savings due to the use of an L2 cache depend on the hit rates in both the L1 and
L2 caches
• The use of multilevel caches complicates all of the design issues related to caches, including
size, replacement algorithm, and write policy
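The dependence on both hit rates can be made concrete with a simple average-access-time calculation. The latencies and hit rates below are assumed for illustration (they are not figures from the text): an L1 miss probes L2, and a miss in both goes on to main memory.

```python
t_l1, t_l2, t_mem = 1, 10, 100   # access times in cycles (assumed)
h_l1, h_l2 = 0.95, 0.80          # hit rates (assumed); h_l2 is L2's
                                 # local hit rate, given an L1 miss

# Weighted sum over the three outcomes: L1 hit, L1 miss/L2 hit,
# miss in both levels.
avg = (h_l1 * t_l1
       + (1 - h_l1) * h_l2 * (t_l1 + t_l2)
       + (1 - h_l1) * (1 - h_l2) * (t_l1 + t_l2 + t_mem))
print(round(avg, 2))   # -> 2.5 cycles
```

Raising either hit rate shrinks the heavily weighted main-memory term, which is why Figure 5.16 plots the total (L1 and L2) hit ratio.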
Figure 5.16
Total Hit Ratio (L1 and L2) for 8-kB and
16-kB L1
Unified Versus Split Caches
• It has become common to split the cache:
– One dedicated to instructions
– One dedicated to data
– Both exist at the same level, typically as two L1 caches
Inclusion Policy
• Inclusive policy
– Dictates that a piece of data in one cache is guaranteed to be also found in all lower levels of caches
– Advantage is that it simplifies searching for data when there are multiple processors in the computing
system
– This property is useful in enforcing cache coherence
• Exclusive policy
– Dictates that a piece of data in one cache is guaranteed not to be found in any lower level of
– The advantage is that it does not waste cache capacity since it does not store multiple copies of the
same data in all of the caches
– The disadvantage is the need to search multiple cache levels when invalidating or updating a block
– To minimize the search time, the highest-level tag sets are typically duplicated at the lowest cache
level to centralize searching
• Noninclusive policy
– With the noninclusive policy a piece of data in one cache may or may not be found in lower levels of
caches
– As with the exclusive policy, this policy will generally maintain all higher-level cache sets at the lowest
cache level
Figure 1. Inclusive Policy
Figure 2. Exclusive Policy
Table 5.4
Intel Cache Evolution
• Problem: External memory slower than the system bus.
– Solution: Add external cache using faster memory technology.
– Feature first appears on: 386
• Problem: Internal cache is rather small, due to limited space on chip.
– Solution: Add external L2 cache using faster technology than main memory.
– Feature first appears on: 486
• Problem: Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit’s data access takes place.
– Solution: Create separate data and instruction caches.
– Feature first appears on: Pentium
Figure 5.17
Pentium 4 Block Diagram
Copyright
This work is protected by United States copyright laws and is provided solely
for the use of instructors in teaching their courses and assessing student
learning. Dissemination or sale of any part of this work (including on the
World Wide Web) will destroy the integrity of the work and is not permitted.
The work and materials from it should never be made available to students
except by instructors using the accompanying text in their classes. All
recipients of this work are expected to abide by these restrictions and to
honor the intended pedagogical purposes and the needs of other instructors
who rely on these materials.