Cache Memory: S. Dandamudi

The document summarizes cache memory, including how it works to bridge the speed gap between the processor and main memory. It discusses cache design basics like block placement, mapping functions, and replacement policies. Direct mapping, set-associative mapping, and associative mapping are described. Write policies like write-through and write-back are also covered. The goal of cache is to behave like fast memory by exploiting data locality.


Cache Memory

Chapter 17 S. Dandamudi

Outline

Introduction
How cache memory works
Why cache memory works
Cache design basics
Mapping function
Direct mapping
Associative mapping
Set-associative mapping
Types of cache misses
Types of caches
Example implementations
Pentium
PowerPC
MIPS
Cache operation summary
Design issues
Cache capacity
Cache line size
Degree of associativity
Replacement policies
Write policies
Space overhead


© 2003 S. Dandamudi

Chapter 17: Page 2

To be used with S. Dandamudi, Fundamentals of Computer Organization and Design, Springer, 2003.

Introduction
Memory hierarchy
Registers
Memory
Disk

Cache memory is a small amount of fast memory


Placed between two levels of memory hierarchy
To bridge the gap in access times
Between processor and main memory (our focus)
Between main memory and disk (disk cache)

Expected to behave like a large amount of fast memory



Introduction (contd)


How Cache Memory Works


Prefetch data into cache before the processor needs it
Need to predict processor future access requirements
Not difficult owing to locality of reference

Important terms
Miss penalty
Hit ratio
Miss ratio = (1 − hit ratio)
Hit time
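These terms combine into the standard effective access time relation. A minimal sketch (the 10 ns hit time, 100 ns miss penalty, and 95% hit ratio are illustrative assumptions, not figures from the chapter):

```python
def average_access_time(hit_time, miss_penalty, hit_ratio):
    """Average access time = hit time + miss ratio * miss penalty."""
    miss_ratio = 1.0 - hit_ratio          # miss ratio = (1 - hit ratio)
    return hit_time + miss_ratio * miss_penalty

# Illustrative numbers: 10 ns hit time, 100 ns miss penalty, 95% hit ratio
print(average_access_time(10, 100, 0.95))   # about 15 ns
```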


How Cache Memory Works (contd)

Cache read operation


How Cache Memory Works (contd)

Cache write operation


Why Cache Memory Works


Example
for (i = 0; i < M; i++)
    for (j = 0; j < N; j++)
        X[i][j] = X[i][j] + K;

Each element of X is a double (eight bytes)
The loop body is executed (M*N) times

Placing the code in cache avoids accesses to main memory
Repetitive use (one of the factors): temporal locality
Prefetching data: spatial locality
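The locality effect in this example can be sketched with a toy model: a cache that holds a single 32-byte line, traversing a 64×64 matrix of doubles in row order versus column order. All sizes here are assumptions chosen for illustration, not the chapter's setup:

```python
# Toy model: the "cache" holds just one 32-byte line, so spatial
# locality is the only thing that can produce hits.
def count_misses(addresses, line_size=32):
    misses, cached_line = 0, None
    for addr in addresses:
        line = addr // line_size      # which cache line the address falls in
        if line != cached_line:
            misses += 1
            cached_line = line
    return misses

M, N, ELEM = 64, 64, 8                # 64x64 matrix of 8-byte doubles
row_order = [(i * N + j) * ELEM for i in range(M) for j in range(N)]
col_order = [(i * N + j) * ELEM for j in range(N) for i in range(M)]

print(count_misses(row_order))        # 1024: one miss per 32-byte line
print(count_misses(col_order))        # 4096: every access misses
```

Row order touches each 32-byte line four times in a row (four doubles per line); column order jumps 512 bytes between consecutive accesses, so every access lands on a different line. This is the behavior behind the row-order/column-order timing gap on the next slide.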

Why Cache Memory Works (contd)


[Figure: execution time (ms) vs. matrix size (500 to 1000) for row-order and column-order traversal]


Cache Design Basics


On every read miss
A fixed number of bytes is transferred
More than what the processor needs
Effective due to spatial locality

Cache is divided into blocks of B bytes
b bits are needed as offset into the block: b = log2 B
Blocks are called cache lines

Main memory is also divided into blocks of the same size

Address is divided into two parts

Cache Design Basics (contd)


B = 4 bytes, so b = 2 bits


Cache Design Basics (contd)

Transfer between main memory and cache
In units of blocks
Implements spatial locality

Transfer between cache and processor
In units of words

Need policies for
Block placement
Mapping function
Block replacement
Write policies

Cache Design Basics (contd)


Read cycle operations


Mapping Function
Determines how memory blocks are mapped to cache lines
Three types
Direct mapping
Specifies a single cache line for each memory block

Set-associative mapping
Specifies a set of cache lines for each memory block

Associative mapping
No restrictions Any cache line can be used for any memory block


Mapping Function (contd)


Direct mapping example


Mapping Function (contd)


Implementing direct mapping
Easier than the other two
Maintains three pieces of information
Cache data: the actual data
Cache tag
Problem: more memory blocks than cache lines
Several memory blocks are mapped to each cache line
The tag stores the address of the memory block in the cache line
Valid bit: indicates whether the cache line contains a valid block
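The address split that makes this work can be sketched as follows, using the earlier slide's 4-byte blocks (b = 2) together with a hypothetical 4-line direct-mapped cache (the 4-line figure is an assumption for illustration):

```python
# Assumed geometry: 4-byte blocks (b = 2 offset bits) and a 4-line
# cache (2 index bits); the remaining address bits form the tag.
B, LINES = 4, 4
b = B.bit_length() - 1                # b = log2(B) = 2
idx_bits = LINES.bit_length() - 1     # log2(#lines) = 2

def split_address(addr):
    offset = addr & (B - 1)                    # low b bits: byte within block
    line = (addr >> b) & (LINES - 1)           # next bits: cache line index
    tag = addr >> (b + idx_bits)               # remaining high bits: tag
    return tag, line, offset

print(split_address(0b110110))   # address 54 -> tag 3, line 1, offset 2
```

On a lookup, the line index selects one cache line; the stored tag is compared against the address tag, and the valid bit must be set for a hit.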

Mapping Function (contd)


Mapping Function (contd)

Direct mapping
Reference pattern: 0, 4, 0, 8, 0, 8, 0, 4, 0, 4, 0, 4
Hit ratio = 0%
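The 0% figure can be reproduced with a small simulation, assuming the references are block numbers and the cache has 4 direct-mapped lines (a geometry consistent with the slide's numbers):

```python
# Direct-mapped cache with 4 lines; references are block numbers.
def direct_mapped_hits(refs, lines=4):
    cache = [None] * lines            # tag (block number) stored per line
    hits = 0
    for block in refs:
        line = block % lines          # direct mapping: block mod #lines
        if cache[line] == block:
            hits += 1
        else:
            cache[line] = block       # replace the mapped line
    return hits

refs = [0, 4, 0, 8, 0, 8, 0, 4, 0, 4, 0, 4]
print(direct_mapped_hits(refs))       # 0 -> hit ratio 0%
```

Blocks 0, 4, and 8 all map to line 0 (block mod 4 = 0), so each reference evicts the previous one. The next slide's pattern 0, 7, 9, 10 maps to four different lines, leaving only the four compulsory misses: 8 hits out of 12, the 67% shown there.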


Mapping Function (contd)

Direct mapping
Reference pattern: 0, 7, 9, 10, 0, 7, 9, 10, 0, 7, 9, 10
Hit ratio = 67%


Mapping Function (contd)


Associative mapping


Mapping Function (contd)

Associative mapping
Reference pattern: 0, 4, 0, 8, 0, 8, 0, 4, 0, 4, 0, 4

Hit ratio = 75%
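The 75% figure can be checked with a fully associative, 4-line cache under LRU replacement (the 4-line capacity is an assumption consistent with the slide's result):

```python
from collections import OrderedDict

# Fully associative cache of 4 lines with LRU replacement;
# any block may occupy any line.
def associative_lru_hits(refs, lines=4):
    cache = OrderedDict()             # order tracks LRU -> MRU
    hits = 0
    for block in refs:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            if len(cache) == lines:
                cache.popitem(last=False)  # evict least recently used
            cache[block] = True
    return hits

refs = [0, 4, 0, 8, 0, 8, 0, 4, 0, 4, 0, 4]
print(associative_lru_hits(refs))     # 9 -> hit ratio 9/12 = 75%
```

Only three distinct blocks are referenced and the cache holds four, so the only misses are the three compulsory ones.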


Mapping Function (contd)

Address match logic for associative mapping


Mapping Function (contd)


Associative cache with address match logic


Mapping Function (contd)


Set-associative mapping


Mapping Function (contd)


Address partition in set-associative mapping


Mapping Function (contd)

Set-associative mapping
Reference pattern: 0, 4, 0, 8, 0, 8, 0, 4, 0, 4, 0, 4
Hit ratio = 67%
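The 67% figure can be checked with a two-way set-associative simulation using 2 sets (4 lines total, LRU within each set; the geometry is an assumption consistent with the slide's result):

```python
# Two-way set-associative cache: 2 sets of 2 ways each.
# A block maps to set (block mod #sets); LRU within the set.
def set_associative_hits(refs, sets=2, ways=2):
    cache = [[] for _ in range(sets)]     # each set: list ordered LRU -> MRU
    hits = 0
    for block in refs:
        s = cache[block % sets]
        if block in s:
            hits += 1
            s.remove(block)
            s.append(block)               # move to MRU position
        else:
            if len(s) == ways:
                s.pop(0)                  # evict the LRU way
            s.append(block)
    return hits

refs = [0, 4, 0, 8, 0, 8, 0, 4, 0, 4, 0, 4]
print(set_associative_hits(refs))         # 8 -> hit ratio 8/12 = 67%
```

Blocks 0, 4, and 8 are all even, so they compete for the same two-way set; the result falls between direct mapping (0%) and full associativity (75%).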

Replacement Policies
We invoke the replacement policy
When there is no place in cache to load the memory block

Depends on the actual placement policy in effect


Direct mapping does not need a special replacement policy
Replace the mapped cache line

Several policies for the other two mapping functions


Popular: LRU (least recently used)
Random replacement
Of less interest: FIFO, LFU

Replacement Policies (contd)


LRU
Expensive to implement
Particularly for set sizes more than four

Implementations resort to approximation


Pseudo-LRU
Partitions each set into two groups
Maintains which group has been accessed most recently
Each partition decision requires only one bit
Requires only (W − 1) bits in total (W = degree of associativity)
PowerPC is an example
Details later
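The tree-structured pseudo-LRU scheme can be sketched for a 4-way set, which needs W − 1 = 3 bits (the PowerPC's eight-way L1 sets use seven bits per set in the same way; the bit conventions below are one possible choice, not a specific processor's encoding):

```python
class TreePLRU4:
    """Tree pseudo-LRU for one 4-way set using W - 1 = 3 bits.
    Each bit says which subtree to descend into when picking a victim."""
    def __init__(self):
        self.bits = [0, 0, 0]   # bits[0]=root, bits[1]=ways 0/1, bits[2]=ways 2/3

    def access(self, way):
        # Point the bits on the path away from the accessed way.
        self.bits[0] = 1 if way < 2 else 0      # victim search goes to other half
        if way < 2:
            self.bits[1] = 1 - way              # point at the sibling way
        else:
            self.bits[2] = 3 - way

    def victim(self):
        # Follow the bits down the tree to the pseudo-least-recent way.
        if self.bits[0] == 0:
            return self.bits[1]                 # way 0 or 1
        return 2 + self.bits[2]                 # way 2 or 3

plru = TreePLRU4()
plru.access(0)
print(plru.victim())    # 2: the untouched half is chosen for replacement
```

This is only an approximation of true LRU, but updating and walking the bit tree is far cheaper than maintaining a full recency ordering.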

Replacement Policies (contd)


Pseudo-LRU implementation


Write Policies
Memory write requires special attention
We have two copies
A memory copy A cached copy

Write policy determines how a memory write operation is handled


Two policies
Write-through
Update both copies
Write-back
Update only the cached copy
The memory copy must be updated later

Write Policies (contd)


Cache hit in a write-through cache

Figure 17.3a

Write Policies (contd)


Cache hit in a write-back cache


Write Policies (contd)


Write-back policy
Updates the memory copy when the cache copy is being replaced
We first write the cache copy to update the memory copy

Number of write-backs can be reduced by writing back only when the cache copy differs from the memory copy
Done by associating a dirty bit (or update bit) with each line
Write back only when the dirty bit is 1
Write-back caches thus require two bits per line:
A valid bit
A dirty (or update) bit
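The difference in memory traffic between the two policies can be sketched for repeated stores to a single cached block (a deliberately simple model):

```python
# Count memory writes for n stores to one cached block:
# write-through writes memory on every store; write-back (with a
# dirty bit) writes memory once, when the block is finally evicted.
def memory_writes(n_stores, policy):
    mem_writes, dirty = 0, False
    for _ in range(n_stores):
        if policy == "write-through":
            mem_writes += 1           # update both copies on every store
        else:                         # write-back: update cached copy only
            dirty = True              # mark the line as modified
    if policy == "write-back" and dirty:
        mem_writes += 1               # single write-back on eviction
    return mem_writes

print(memory_writes(100, "write-through"))  # 100 memory writes
print(memory_writes(100, "write-back"))     # 1 memory write
```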

Write Policies (contd)


The dirty bit is needed only in write-back caches


Write Policies (contd)


Other ways to reduce write traffic
Buffered writes
Especially useful for write-through policies
Writes to memory are buffered and performed at a later time
Allows write combining
Catches multiple writes in the buffer itself

Example: Pentium
Uses a 32-byte write buffer
The buffer is written out at several trigger points
An example trigger point: buffer-full condition

Write Policies (contd)


Write-through versus write-back
Write-through
Advantage
Both cache and memory copies are consistent
Important in multiprocessor systems
Disadvantage
Tends to waste bus and memory bandwidth

Write-back
Advantage
Reduces write traffic to memory
Disadvantages
Takes longer to load new cache lines
Requires an additional dirty bit

Space Overhead
The three mapping functions introduce different space overheads
Overhead increases with increasing degree of associativity
Several examples in the text assume:
4 GB address space
32 KB cache
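The overhead can be estimated for these parameters; the 32-byte line size below is an assumption for illustration (the text's examples may use other line sizes):

```python
import math

# Tag-overhead sketch: 32 KB cache in a 4 GB (32-bit) address space,
# assumed 32-byte lines.
ADDR_BITS, CACHE_BYTES, LINE_BYTES = 32, 32 * 1024, 32
lines = CACHE_BYTES // LINE_BYTES                 # 1024 lines
offset_bits = int(math.log2(LINE_BYTES))          # 5 offset bits

def overhead_bits(ways):
    """Total tag + valid bits for a given degree of associativity."""
    sets = lines // ways
    index_bits = int(math.log2(sets))             # fewer sets -> longer tags
    tag_bits = ADDR_BITS - index_bits - offset_bits
    return (tag_bits + 1) * lines                 # +1 for the valid bit

print(overhead_bits(1))       # direct mapped: (17 + 1) * 1024 = 18432 bits
print(overhead_bits(lines))   # fully associative: (27 + 1) * 1024 = 28672 bits
```

Relative to the 262,144 data bits in a 32 KB cache, this is roughly 7% for direct mapping versus about 11% for fully associative mapping, which is why the tag overhead grows with associativity.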


Types of Cache Misses


Three types
Compulsory misses
Due to first-time access to a block
Also called cold-start misses or compulsory line fills

Capacity misses
Induced by the cache capacity limitation
Can be avoided by increasing cache size

Conflict misses
Due to conflicts caused by direct and set-associative mappings
Can be completely eliminated by fully associative mapping
Also called collision misses

Types of Cache Misses (contd)


Compulsory misses
Reduced by increasing block size
We prefetch more data
Block size cannot be increased beyond a limit
Otherwise cache misses increase again

Capacity misses
Reduced by increasing cache size
Law of diminishing returns

Conflict misses
Reduced by increasing degree of associativity
Fully associative mapping: no conflict misses

Types of Caches
Separate instruction and data caches
Initial cache designs used unified caches
Current trend is to use separate caches (for level 1)


Types of Caches (contd)


Several reasons for preferring separate caches
Locality tends to be stronger
Can use different designs for data and instruction caches
Instruction caches
Read-only, dominantly sequential access
No need for write policies
Can use a simple direct-mapped implementation
Data caches
Can use a set-associative cache
An appropriate write policy can be implemented

Disadvantage
Rigid boundaries between data and instruction caches

Types of Caches (contd)


Number of cache levels
Most use two levels
Primary (level 1 or L1): on-chip
Secondary (level 2 or L2): off-chip

Examples
Pentium
L1: 32 KB
L2: up to 2 MB
PowerPC
L1: 64 KB
L2: up to 1 MB

Types of Caches (contd)


Two-level caches work as follows:
The processor first attempts to get data from the L1 cache
If present, the data come from L1 (L1 cache hit)
If not, the data must come from the L2 cache or main memory (L1 cache miss)

In case of an L1 cache miss, the processor tries the L2 cache
If the data are in L2, they come from the L2 cache (L2 cache hit)
The data block is also written to the L1 cache
If not, the data come from main memory (L2 cache miss)
The memory block is written into both the L1 and L2 caches

Variations on this basic scheme are possible
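The lookup sequence above yields a familiar average access time formula: each level's miss ratio gates the cost of the next level. A sketch with purely illustrative timing and hit-ratio assumptions:

```python
# Average access time for a two-level cache, one common formulation:
#   t = t_L1 + m_L1 * (t_L2 + m_L2 * t_mem)
# where m_L1, m_L2 are the miss ratios at each level.
def two_level_access_time(t_l1, t_l2, t_mem, h1, h2):
    m1, m2 = 1 - h1, 1 - h2           # per-level miss ratios
    return t_l1 + m1 * (t_l2 + m2 * t_mem)

# Assumed: 2 ns L1, 10 ns L2, 100 ns memory, 90% L1 and 80% L2 hit ratios
print(two_level_access_time(2, 10, 100, 0.90, 0.80))   # about 5 ns
```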



Types of Caches (contd)

Virtual and physical caches


Example Implementations
We look at three processors
Pentium PowerPC MIPS

Pentium implementation
Two levels
L1 cache
Split cache design
Separate data and instruction caches
L2 cache
Unified cache design

Example Implementations (contd)


Pentium allows each page/memory region to have its own caching attributes
Uncacheable
All reads and writes go directly to the main memory
Useful for
Memory-mapped I/O devices
Large data structures that are read once
Write-only data structures

Write combining
Not cached
Writes are buffered to reduce accesses to main memory
Useful for video frame buffers

Example Implementations (contd)


Write-through
Uses the write-through policy
Writes are delayed as they go through a write buffer, as in write-combining mode

Write-back
Uses the write-back policy
Writes are delayed as in the write-through mode

Write protected
Inhibits cache writes
Writes are done directly to memory

Example Implementations (contd)


Two bits in control register CR0 determine the mode
Cache disable (CD) bit
Not write-through (NW) bit
CD = 0, NW = 0: normal operation with write-back caching

Example Implementations (contd)


PowerPC cache implementation
Two levels
L1 cache
Split cache
Each: 32 KB, eight-way set associative
Uses pseudo-LRU replacement
Instruction cache: read-only
Data cache: read/write
Choice of write-through or write-back
L2 cache
Unified cache as in Pentium
Two-way set associative

Example Implementations (contd)


Write policy type and caching attributes can be set by the OS at the block or page level
L2 cache requires only a single bit to implement LRU
Because it is 2-way associative

L1 cache implements a pseudo-LRU


Each set maintains seven PLRU bits (B0–B6)


Example Implementations (contd)


PowerPC placement policy (incl. PLRU)


Example Implementations (contd)


MIPS implementation
Two-level cache
L1 cache
Split organization
Instruction cache
Virtual cache
Line size: 16 or 32 bytes
Direct mapped
Read-only
Data cache
Virtual cache
Direct mapped
Uses write-back policy

Example Implementations (contd)


L2 cache
Physical cache
Either unified or split
Configured at boot time
Direct mapped
Uses write-back policy
Cache block size
16, 32, 64, or 128 bytes
Set at boot time
The L1 cache line size and L2 cache size are also configured at boot time

Direct mapping simplifies replacement


No need for an LRU-style complex implementation

Cache Operation Summary


Various policies used by cache
Placement of a block
Direct mapping
Fully associative mapping
Set-associative mapping

Location of a block
Depends on the placement policy

Replacement policy
LRU is the most popular
Pseudo-LRU is often implemented

Write policy
Write-through
Write-back

Design Issues
Several design issues
Cache capacity
Law of diminishing returns

Cache line size/block size
Degree of associativity
Unified/split
Single/two-level
Write-through/write-back
Logical/physical


