CH05-COA11e - Modified

Chapter 5 of 'Computer Organization and Architecture' covers cache memory principles, including definitions of key terms such as block, line, and tag, as well as cache design elements like mapping functions and replacement algorithms. It discusses various cache access methods, including direct-mapped, fully associative, and set associative mapping, along with their advantages and disadvantages. The chapter also addresses write policies and strategies for managing cache memory effectively.


Computer Organization and Architecture

Designing for Performance


11th Edition

Chapter 5
Cache Memory

Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Figure 5.1
Cache and Main Memory

Cache Memory Principles
• Block
– The minimum unit of transfer between cache and main memory

• Frame
– To distinguish between the data transferred and the chunk of physical
memory, the term frame, or block frame, is sometimes used with reference
to caches

• Line
– A portion of cache memory capable of holding one block, so-called
because it is usually drawn as a horizontal object

• Tag
– A portion of a cache line that is used for addressing purposes

• Line size
– The number of data bytes, or block size, contained in a line
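To make these terms concrete, here is a minimal Python sketch (not from the textbook; the class, constants, and sizes are illustrative) of a cache built from line frames, each holding a valid bit, a tag, and one block of data:

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 16   # line size in bytes (an arbitrary illustrative value)

@dataclass
class CacheLine:
    valid: bool = False      # does this frame currently hold a block at all?
    tag: int = 0             # identifies which main-memory block is cached here
    data: bytearray = field(default_factory=lambda: bytearray(BLOCK_SIZE))

# A cache is then an array of line frames:
cache = [CacheLine() for _ in range(256)]   # 256 lines, also an arbitrary size
```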
Figure 5.2
Cache/Main Memory Structure

Figure 5.3
Cache Read Operation

Figure 5.4
Typical Cache Organization

Table 5.1
Elements of Cache Design
Cache Addresses              Write Policy
  Logical                      Write through
  Physical                     Write back
Cache Size                   Line Size
Mapping Function             Number of Caches
  Direct                       Single or two level
  Associative                  Unified or split
  Set associative
Replacement Algorithm
  Least recently used (LRU)
  First in first out (FIFO)
  Least frequently used (LFU)
  Random

Cache Addresses
• Virtual memory
– Facility that allows programs to address memory from a
logical point of view, without regard to the amount of main
memory physically available
– When used, the address fields of machine instructions
contain virtual addresses
– For reads from and writes to main memory, a hardware
memory management unit (MMU) translates each virtual
address into a physical address in main memory

Figure 5.5
Logical and Physical Caches

Cache Size
• Preferable for the size of the cache to be:
– Small enough so that the overall average cost per bit is close to that of
main memory alone
– Large enough so that the overall average access time is close to that of
the cache alone

• Motivations for minimizing cache size:
– The larger the cache, the larger the number of gates involved in addressing the cache, resulting in large caches being slightly slower than small ones
– The available chip and board area also limits cache size

• Because the performance of the cache is very sensitive to the nature of the workload, it is impossible to arrive at a single “optimum” cache size
Table 5.2
Cache Sizes of Some Processors
Processor | Type | Year of Introduction | L1 Cache (a) | L2 Cache | L3 Cache
IBM 360/85 | Mainframe | 1968 | 16 to 32 kB | – | –
PDP-11/70 | Minicomputer | 1975 | 1 kB | – | –
IBM 3033 | Mainframe | 1978 | 64 kB | – | –
IBM 3090 | Mainframe | 1985 | 128 to 256 kB | – | –
Intel 80486 | PC | 1989 | 8 kB | – | –
Pentium | PC | 1993 | 8 kB/8 kB | 256 to 512 kB | –
PowerPC 620 | PC | 1996 | 32 kB/32 kB | – | –
IBM S/390 G6 | Mainframe | 1999 | 256 kB | 8 MB | –
Pentium 4 | PC/server | 2000 | 8 kB/8 kB | 256 kB | –
Itanium | PC/server | 2001 | 16 kB/16 kB | 96 kB | 4 MB
Itanium 2 | PC/server | 2002 | 32 kB | 256 kB | 6 MB
IBM POWER5 | High-end server | 2004 | 64 kB | 1.9 MB | 36 MB
CRAY XD-1 | Supercomputer | 2004 | 64 kB/64 kB | 1 MB | –
IBM POWER6 | PC/server | 2007 | 64 kB/64 kB | 4 MB | 32 MB
IBM z10 | Mainframe | 2008 | 64 kB/128 kB | 3 MB | 24–48 MB
Intel Core i7 EE 990 | Workstation/server | 2011 | 6 × 32 kB/32 kB | 6 × 1.5 MB | 12 MB
IBM zEnterprise 196 | Mainframe/server | 2010 | 24 × 64 kB/128 kB | 24 × 1.5 MB | 24 MB L3, 192 MB L4
IBM z13 | Mainframe/server | 2015 | 24 × 96 kB/128 kB | 24 × 2 MB/2 MB | 64 MB L3, 480 MB L4
Intel Core i9-7900X | Workstation/server | 2017 | 8 × 32 kB/32 kB | 8 × 1 MB | 14 MB

(a) Two values separated by a slash refer to instruction and data caches.
(Table can be found on page 145 in the textbook.)
Table 5.3
Cache Access Methods

Method | Organization | Mapping of Main Memory Blocks to Cache | Access Using Main Memory Address
Direct Mapped | Sequence of m lines | Each block of main memory maps to one unique line of cache. | Line portion of address used to access cache line; Tag portion used to check for hit on that line.
Fully Associative | Sequence of m lines | Each block of main memory can map to any line of cache. | Tag portion of address used to check every line for hit on that line.
Set Associative | Sequence of m lines organized as v sets of k lines each (m = v × k) | Each block of main memory maps to one unique cache set. | Line portion of address used to access cache set; Tag portion used to check every line in that set for hit on that line.
Figure 5.6
Mapping from Main Memory to Cache: Direct and
Associative

Direct Mapping: cache line = (main-memory block number M) mod (number of cache lines N)

Example: 32 words (2^5) in main memory, 16 words in cache memory, block size = 4 words (2^2).
Main memory therefore has M = 8 blocks and the cache has N = 4 lines.

The 5-bit physical address is divided as:
Tag (1 bit) | Cache block/line # (2 bits) | Word offset (2 bits)

Each main-memory block maps to exactly one cache line (block number mod 4), e.g. 0 mod 4 = 0, 2 mod 4 = 2, 4 mod 4 = 0:
– MM blocks that map to CM line 0 are: 0, 4
– MM blocks that map to CM line 1 are: 1, 5
– MM blocks that map to CM line 2 are: 2, 6
– MM blocks that map to CM line 3 are: 3, 7
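As a rough illustration of this example, the address split can be computed in a few lines of Python (the function name and the sample address are illustrative, not from the slides):

```python
def split_direct(addr):
    """Address split for the example above: 4 words per block, 4 cache lines,
    8 main-memory blocks."""
    offset = addr & 0b11        # low 2 bits: word within the block
    block = addr >> 2           # main-memory block number (0..7)
    line = block % 4            # cache line = block number mod N (N = 4)
    tag = block // 4            # remaining high bit distinguishes blocks 0-3 from 4-7
    return tag, line, offset

# Main-memory word 18 lies in block 4, which maps to cache line 0 with tag 1.
print(split_direct(18))        # -> (1, 0, 2)
```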
Figure 5.7
Direct-Mapping Cache Organization

Figure 5.8
Direct Mapping Example

Content-Addressable Memory (CAM)
• Also known as associative storage
• Content-addressable memory is constructed of Static RAM (SRAM)
cells but is considerably more expensive and holds much less data
than regular SRAM chips
• A CAM with the same data capacity as a regular SRAM is about
60% larger
• A CAM is designed such that when a bit string is supplied, the CAM
searches its entire memory in parallel for a match
– If the content is found, the CAM returns the address where the match
is found and, in some architectures, also returns the associated data
word
– This process takes only one clock cycle
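Functionally, a CAM behaves like a keyed search over stored contents. The fragment below is only a software analogy of that behaviour (the data values are made up); real CAM hardware compares all entries in parallel in a single cycle, whereas this Python loop is sequential:

```python
# Software analogy of a CAM lookup: given a bit pattern, return every address
# whose stored contents match it.
memory = {0: 0b1010, 1: 0b0111, 2: 0b1100, 3: 0b0111}

def cam_search(pattern):
    return [addr for addr, word in memory.items() if word == pattern]

print(cam_search(0b0111))   # -> [1, 3]
```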

Figure 5.9
Content-Addressable Memory

Fully Associative Cache
Example: 32 words (2^5) in main memory, so the physical address is 5 bits; 16 words in cache memory; block size = 4 words.
Main memory has M = 8 blocks and the cache has N = 4 lines.

The 5-bit physical address is divided as:
Tag (3 bits) | Word offset (2 bits)

Any main-memory block can be loaded into any cache line, so the entire 3-bit block number is stored as the line’s tag (e.g. a line holding block 2 has tag 010, a line holding block 7 has tag 111).
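A hedged sketch of the resulting lookup for this example (the helper name and the sample cache contents are illustrative): every line's tag is compared against the tag field of the address.

```python
def fully_associative_lookup(cache, addr):
    """Hit check for the example above: 5-bit address = 3-bit tag + 2-bit word
    offset. `cache` is a list of (valid, tag, data) tuples."""
    offset = addr & 0b11
    tag = addr >> 2                       # the whole block number is the tag
    for line_no, (valid, line_tag, data) in enumerate(cache):
        if valid and line_tag == tag:
            return line_no, data[offset]  # hit: which line, and the word itself
    return None                           # miss: no line holds this block

cache = [(True, 0b000, [10, 11, 12, 13]),
         (True, 0b010, [20, 21, 22, 23]),
         (True, 0b111, [30, 31, 32, 33]),
         (False, 0b000, [0, 0, 0, 0])]
print(fully_associative_lookup(cache, 0b01001))   # tag 010, offset 01 -> (1, 21)
```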

Figure 5.10
Fully Associative Cache Organization

Figure 5.11
Associative Mapping Example

Set Associative Mapping
• Compromise that exhibits the strengths of both the direct
and associative approaches while reducing their
disadvantages
• Cache consists of a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given set
• e.g., 2 lines per set
– 2-way set-associative mapping
– A given block can be in one of 2 lines in only one set

Set Associative Cache
Example: 32 words (2^5) in main memory, so the physical address is 5 bits; 16 words in cache memory; block size = 4 words.
Main memory has M = 8 blocks; the cache has N = 4 lines organized as 2 sets of 2 lines each (2-way set associative).

The 5-bit physical address is divided as:
Tag (2 bits) | Set field (1 bit) | Word offset (2 bits)

Each main-memory block maps to one set (block number mod 2) and may occupy either line in that set:
– Blocks 0, 2, 4, 6 map to set 0 (e.g. 0 mod 2 = 0, 4 mod 2 = 0)
– Blocks 1, 3, 5, 7 map to set 1 (e.g. 3 mod 2 = 1, 7 mod 2 = 1)
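A minimal sketch of the corresponding address split (the function name is illustrative, not from the slides):

```python
def split_set_associative(addr, num_sets=2, block_size=4):
    """Address split for the 2-way set-associative example above:
    5-bit address = 2-bit tag + 1-bit set number + 2-bit word offset."""
    offset = addr % block_size
    block = addr // block_size        # main-memory block number (0..7)
    set_no = block % num_sets         # set = block number mod v (v = 2 sets)
    tag = block // num_sets
    return tag, set_no, offset

# Word 14 lies in block 3, which maps to set 1 (3 mod 2 = 1).
print(split_set_associative(14))      # -> (1, 1, 2)
```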

Figure 5.13 k-Way
Set Associative Cache Organization

Figure 5.14
Two-Way Set-Associative Mapping Example

Figure 5.15
Varying Associativity over Cache Size

Replacement Algorithms
• Once the cache has been filled, when a new block is
brought into the cache, one of the existing blocks must
be replaced
• For direct mapping there is only one possible line for any
particular block and no choice is possible
• For the associative and set-associative techniques a
replacement algorithm is needed
• To achieve high speed, an algorithm must be
implemented in hardware

The most common replacement algorithms
are:
• Least recently used (LRU)
– Most effective
– Replace that block in the set that has been in the cache longest with no
reference to it
– Because of its simplicity of implementation, LRU is the most popular
replacement algorithm

• First-in-first-out (FIFO)
– Replace that block in the set that has been in the cache longest
– Easily implemented as a round-robin or circular buffer technique

• Least frequently used (LFU)
– Replace that block in the set that has experienced the fewest references
– Could be implemented by associating a counter with each line
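As an illustration of LRU in software, the following sketch manages one k-way set with Python's OrderedDict. This is only an analogy (class and method names are made up); hardware LRU is implemented with per-line usage bits rather than a linked structure.

```python
from collections import OrderedDict

class LRUSet:
    """One k-way set with least-recently-used replacement."""
    def __init__(self, k):
        self.k = k
        self.lines = OrderedDict()            # tag -> data, oldest entry first

    def access(self, tag, data=None):
        if tag in self.lines:                 # hit: mark line most recently used
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.k:         # miss with the set full:
            self.lines.popitem(last=False)    # evict the least recently used line
        self.lines[tag] = data                # bring in the new block
        return False

s = LRUSet(k=2)
for tag in [1, 2, 1, 3]:                      # tag 2 is evicted when 3 arrives
    s.access(tag)
print(list(s.lines))                          # -> [1, 3]
```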

Write Policy
When a block that is resident in the cache is to be replaced, there are two cases to consider:
– If the old block in the cache has not been altered, then it may be overwritten with a new block without first writing out the old block
– If at least one write operation has been performed on a word in that line of the cache, then main memory must be updated by writing the line of cache out to the block of memory before bringing in the new block

There are two problems to contend with:
– More than one device may have access to main memory
– A more complex problem occurs when multiple processors are attached to the same bus and each processor has its own local cache - if a word is altered in one cache it could conceivably invalidate a word in other caches

Write Through
and Write Back
• Write through
– Simplest technique
– All write operations are made to main memory as well as to the cache
– The main disadvantage of this technique is that it generates substantial
memory traffic and may create a bottleneck

• Write back
– Minimizes memory writes
– Updates are made only in the cache
– Portions of main memory are invalid and hence accesses by I/O modules
can be allowed only through the cache
– This makes for complex circuitry and a potential bottleneck
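A minimal sketch contrasting the two policies on a write hit (class and field names are illustrative; miss handling is omitted here and covered on the next slide):

```python
class WriteHitCache:
    """Illustrative sketch of the two write policies, write hits only."""
    def __init__(self, policy):
        self.policy = policy              # "write-through" or "write-back"
        self.lines = {}                   # tag -> {"data": ..., "dirty": bool}
        self.main_memory = {}             # tag -> data

    def write_hit(self, tag, data):
        self.lines[tag]["data"] = data
        if self.policy == "write-through":
            self.main_memory[tag] = data  # every write also goes to memory
        else:                             # write-back: memory updated on eviction
            self.lines[tag]["dirty"] = True

    def evict(self, tag):
        line = self.lines.pop(tag)
        if self.policy == "write-back" and line["dirty"]:
            self.main_memory[tag] = line["data"]   # write the line back now

c = WriteHitCache("write-back")
c.lines[7] = {"data": 0, "dirty": False}
c.write_hit(7, 42)          # main memory is still stale at this point
c.evict(7)                  # dirty line written back on replacement
print(c.main_memory)        # -> {7: 42}
```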

Write Miss Alternatives
• There are two alternatives in the event of a write miss at a cache
level:
– Write allocate
– The block containing the word to be written is fetched from main memory (or
next level cache) into the cache and the processor proceeds with the write
cycle
– No write allocate
– The block containing the word to be written is modified in the main memory
and not loaded into the cache

• Either of these policies can be used with either write through or write back
• No write allocate is most commonly used with write through
• Write allocate is most commonly used with write back
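A correspondingly small sketch of the write-miss path under the two allocation policies (a word-granularity simplification with illustrative names):

```python
def handle_write_miss(cache, main_memory, addr, value, write_allocate):
    """Illustrative sketch of the two write-miss policies."""
    if write_allocate:
        cache[addr] = main_memory.get(addr, 0)  # fetch the block into the cache,
        cache[addr] = value                     # then complete the write in the cache
    else:
        main_memory[addr] = value               # no write allocate: update memory only
```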
Cache Coherency
• A new problem is introduced in a bus organization in which more than one device has
a cache and main memory is shared
• If data in one cache are altered, this invalidates not only the corresponding word in
main memory, but also that same word in other caches
• Even if a write-through policy is used, the other caches may contain invalid data
• Possible approaches to cache coherency include:
– Bus watching with write through
• Each cache controller monitors the address lines to detect write operations to memory by other bus
masters
• If another master writes to a location in shared memory that also resides in the cache memory, the cache
controller invalidates that cache entry
• This strategy depends on the use of a write-through policy by all cache controllers
– Hardware transparency
• Additional hardware is used to ensure that all updates to main memory via cache are reflected in all caches
• If one processor modifies a word in its cache, this update is written to main memory
– Noncacheable memory
• Only a portion of main memory is shared by more than one processor, and this is designated as noncacheable
• All accesses to shared memory are cache misses, because the shared memory is never copied into the cache
• The noncacheable memory can be identified using chip-select logic or high-address bits
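A minimal sketch of the first approach, bus watching with write through (illustrative names; the writer's own cache update is assumed to happen elsewhere):

```python
def snoop_write(caches, writer_id, addr, value, main_memory):
    """Bus watching with write through: the write goes through to main memory,
    and every other cache controller that holds the address invalidates its copy."""
    main_memory[addr] = value
    for cache_id, cache in enumerate(caches):
        if cache_id != writer_id and addr in cache:
            del cache[addr]                     # invalidate the stale entry

caches = [{0x10: 5}, {0x10: 5}, {}]
memory = {}
snoop_write(caches, writer_id=0, addr=0x10, value=9, main_memory=memory)
print(caches, memory)        # -> [{16: 5}, {}, {}] {16: 9}
```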
Line Size
When a block of data is retrieved and placed in the cache, not only the desired word but also some number of adjacent words are retrieved.

As the block size increases, more useful data are brought into the cache, so the hit ratio will at first increase because of the principle of locality. The hit ratio will begin to decrease as the block becomes bigger and the probability of using the newly fetched information becomes less than the probability of reusing the information that has to be replaced.

Two specific effects come into play:
• Larger blocks reduce the number of blocks that fit into a cache
• As a block becomes larger, each additional word is farther from the requested word

Multilevel Caches
• As logic density has increased it has become possible to have a cache on the same chip as
the processor

• The on-chip cache reduces the processor’s external bus activity, speeding up execution time and increasing overall system performance
– When the requested instruction or data is found in the on-chip cache, the bus access is
eliminated
– On-chip cache accesses will complete appreciably faster than would even zero-wait state bus
cycles
– During this period the bus is free to support other transfers

• Two-level cache:
– Internal cache designated as level 1 (L1)
– External cache designated as level 2 (L2)

• Potential savings due to the use of an L2 cache depends on the hit rates in both the L1 and
L2 caches

• The use of multilevel caches complicates all of the design issues related to caches, including
size, replacement algorithm, and write policy
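For example, under a simple model in which an L1 miss costs an L2 access and an L2 miss additionally costs a main-memory access, the average access time can be estimated as follows (the hit rates and cycle counts are assumed values, not taken from the textbook):

```python
# Illustrative two-level access-time estimate.
H1, H2 = 0.90, 0.80        # L1 hit rate; L2 hit rate for accesses that miss L1
T1, T2, Tmem = 1, 4, 60    # access times in processor cycles

avg_access_time = T1 + (1 - H1) * (T2 + (1 - H2) * Tmem)
print(avg_access_time)     # -> 2.6 cycles on average
```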

Figure 5.16
Total Hit Ratio (L1 and L2) for 8-kB and
16-kB L1

Unified Versus Split Caches
• Has become common to split cache:
– One dedicated to instructions
– One dedicated to data
– Both exist at the same level, typically as two L1 caches

• Advantages of unified cache:
– Higher hit rate
▪ Balances load of instruction and data fetches automatically
– Only one cache needs to be designed and implemented

• Trend is toward split caches at the L1 level and unified caches for higher levels
• Advantages of split cache:
– Eliminates cache contention between instruction fetch/decode unit and execution unit
▪ Important in pipelining

Inclusion Policy
• Inclusive policy
– Dictates that a piece of data in one cache is guaranteed to be also found in all lower levels of caches
– Advantage is that it simplifies searching for data when there are multiple processors in the computing
system
– This property is useful in enforcing cache coherence

• Exclusive policy
– Dictates that a piece of data in one cache is guaranteed not to be found in all lower levels of caches
– The advantage is that it does not waste cache capacity since it does not store multiple copies of the
same data in all of the caches
– The disadvantage is the need to search multiple cache levels when invalidating or updating a block
– To minimize the search time, the highest-level tag sets are typically duplicated at the lowest cache
level to centralize searching

• Noninclusive policy
– With the noninclusive policy a piece of data in one cache may or may not be found in lower levels of
caches
– As with the exclusive policy, this policy will generally maintain all higher-level cache sets at the lowest
cache level

Figure 1. Inclusive Policy
Figure 2. Exclusive Policy
Table 5.4
Intel Cache Evolution
Problem | Solution | Processor on Which Feature First Appears
External memory slower than the system bus. | Add external cache using faster memory technology. | 386
Increased processor speed results in external bus becoming a bottleneck for cache access. | Move external cache on-chip, operating at the same speed as the processor. | 486
Internal cache is rather small, due to limited space on chip. | Add external L2 cache using faster technology than main memory. | 486
Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit’s data access takes place. | Create separate data and instruction caches. | Pentium
Increased processor speed results in external bus becoming a bottleneck for L2 cache access. | Create separate back-side bus that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache. | Pentium Pro
(same problem) | Move L2 cache on to the processor chip. | Pentium II
Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small. | Add external L3 cache. | Pentium III
(same problem) | Move L3 cache on-chip. | Pentium 4
Figure 5.17
Pentium 4 Block Diagram

Copyright

This work is protected by United States copyright laws and is provided solely
for the use of instructors in teaching their courses and assessing student
learning. Dissemination or sale of any part of this work (including on the
World Wide Web) will destroy the integrity of the work and is not permitted.
The work and materials from it should never be made available to students
except by instructors using the accompanying text in their classes. All
recipients of this work are expected to abide by these restrictions and to
honor the intended pedagogical purposes and the needs of other instructors
who rely on these materials.

