
CSC 203

Computer System Architecture

Budditha Hettige
Department of Statistics and Computer Science, University of Sri Jayewardenepura

Cache Memory


Cache Memory
A high-speed, small memory
The most frequently used memory words are kept in the cache
When the CPU needs a word, it first checks the cache; if the word is not found there, it checks main memory


Cache and Main Memory


Cache Memory vs. Main Memory


Cache Hit and Miss


Cache Hit: a request to read from memory that can be satisfied from the cache, without using main memory.
Cache Miss: a request to read from memory that cannot be satisfied from the cache, for which main memory has to be consulted.

Locality Principle
PRINCIPLE OF LOCALITY: the tendency to reference data items that are near other recently referenced data items, or that were recently referenced themselves.
TEMPORAL LOCALITY: a memory location that is referenced once is likely to be referenced again multiple times in the near future.
SPATIAL LOCALITY: once a memory location is referenced, the program is likely to reference a nearby memory location in the near future.
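As an illustrative sketch (not from the slides), the two loops below compute the same sum, but the row-major loop touches consecutive elements and so exhibits much better spatial locality; the running total is an example of temporal locality.

    N = 1024
    matrix = [[1] * N for _ in range(N)]

    def sum_row_major(m):
        total = 0                    # reused every iteration: temporal locality
        for i in range(N):
            for j in range(N):
                total += m[i][j]     # consecutive elements of a row: spatial locality
        return total

    def sum_column_major(m):
        total = 0
        for j in range(N):
            for i in range(N):
                total += m[i][j]     # strides across rows: poor spatial locality
        return total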

Locality Principle

Let
c = cache access time
m = main memory access time
h = hit ratio (the fraction of all references that can be satisfied from the cache)

miss ratio = 1 - h

Average memory access time = c + (1 - h) * m

h = 1: every reference is satisfied from the cache (no main memory accesses)
h = 0: every reference goes to main memory
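A minimal sketch of this formula in Python (the sample numbers are assumptions chosen for illustration, not values from the slides):

    def mean_access_time(c, m, h):
        """Average access time: every access pays c; a miss also pays m."""
        return c + (1 - h) * m

    # Assumed numbers: 1 ns cache, 10 ns main memory, 95% hit ratio.
    print(mean_access_time(c=1.0, m=10.0, h=0.95))  # -> 1.5 (ns)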

Example:
Suppose that a word is read k times in a short interval.

First reference: main memory; the other k - 1 references: cache, so

h = (k - 1) / k

Average memory access time = c + (1 - h) * m = c + m / k
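Continuing the sketch above with the same assumed numbers: for k = 5 reads, h = 4/5, so the average access time is 1 + 10/5 = 3 ns.

    k = 5
    print(mean_access_time(c=1.0, m=10.0, h=(k - 1) / k))  # -> 3.0 (ns)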



Cache Memory
Main memory and the cache are divided into fixed-size blocks
Cache lines: the blocks inside the cache
On a cache miss, an entire cache line is loaded into the cache from memory
Example (see the sketch after this slide):
A 64 KB cache can be divided into 1K lines of 64 bytes, 2K lines of 32 bytes, etc.

Unified cache
Instructions and data use the same cache

Split cache
Instructions in one cache and data in another
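A minimal sketch of the line-count arithmetic from the example above (sizes from the slide):

    CACHE_SIZE = 64 * 1024            # 64 KB cache, as in the slide

    for line_size in (64, 32):        # bytes per cache line
        lines = CACHE_SIZE // line_size
        print(f"{line_size}-byte lines -> {lines} lines")  # 1024, then 2048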

A system with three levels of cache


Pentium 4 Block Diagram


Replacement Algorithm
Optimal replacement: replace the block that is no longer needed in the future. If all blocks currently in the Cache Memory will be used again, replace the one that will not be used for the longest time.
Random selection: replace a randomly selected block among all blocks currently in the Cache Memory.

Replacement Algorithm
FIFO (first-in first-out): replace the block that has been in the Cache Memory for the longest time.
LRU (least recently used): replace the block in the Cache Memory that has not been used for the longest time.
LFU (least frequently used): replace the block in the Cache Memory that has been used the least number of times.
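A minimal LRU sketch (illustrative, not from the slides), using Python's OrderedDict to keep blocks ordered from least to most recently used:

    from collections import OrderedDict

    class LRUCache:
        """Tiny LRU cache: on overflow, evict the least recently used block."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.blocks = OrderedDict()          # least recently used first

        def access(self, block, data=None):
            if block in self.blocks:
                self.blocks.move_to_end(block)   # hit: mark most recently used
                return self.blocks[block]
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # miss: evict the LRU block
            self.blocks[block] = data            # load the new block
            return data

    cache = LRUCache(capacity=2)
    cache.access(1, "A"); cache.access(2, "B")
    cache.access(1)                              # touching 1 makes 2 the LRU block
    cache.access(3, "C")                         # evicts block 2
    print(list(cache.blocks))                    # -> [1, 3]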

Cache Memory Placement Policy


Three commonly used methods translate main memory addresses to cache memory addresses:
Associative-mapped cache
Direct-mapped cache
Set-associative mapped cache

The choice of cache mapping scheme affects cost and performance; there is no single best method that is appropriate for all situations.

Associative Mapping


Associative Mapping
A block in the Main Memory can be mapped to any available (not already occupied) block in the Cache Memory.
Advantage: flexibility. A Main Memory block can be mapped anywhere in the Cache Memory.
Disadvantage: slow or expensive. A search through all the Cache Memory blocks is needed to check whether the address matches any of the tags.
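A minimal sketch of that search (illustrative; the line size and cache contents are assumptions): a fully associative lookup has to compare the address tag against every line in the cache.

    LINE_SIZE = 64                          # assumed bytes per line

    def associative_lookup(cache_tags, address):
        """Fully associative lookup: compare against every stored tag."""
        tag = address // LINE_SIZE          # everything above the byte offset
        for i, stored_tag in enumerate(cache_tags):
            if stored_tag == tag:           # hit: any line may hold any block
                return i
        return None                         # miss: the whole cache was searched

    cache_tags = [5, 12, 7, 42]             # assumed contents of a 4-line cache
    print(associative_lookup(cache_tags, 42 * 64 + 8))  # -> 3 (hit)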

Direct Mapping


Direct Mapping
To avoid the search through all CM blocks needed by associative mapping, this method allows only
(# blocks in Main Memory) / (# blocks in Cache Memory)
blocks of Main Memory to be mapped to each Cache Memory block.
Each entry (row) in the cache can hold exactly one cache line from main memory; for example, with a 32-byte cache line size, a 2K-entry cache can hold 64 KB.
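A minimal sketch of the address decomposition used by a direct-mapped cache (the geometry matches the 64 KB / 32-byte example above):

    LINE_SIZE = 32                  # bytes per line, from the example
    NUM_LINES = 2048                # 2K lines * 32 bytes = 64 KB

    def decompose(address):
        """Split an address into (tag, line index, byte offset)."""
        offset = address % LINE_SIZE
        line = (address // LINE_SIZE) % NUM_LINES  # each block maps to one fixed line
        tag = address // (LINE_SIZE * NUM_LINES)   # stored to identify the block
        return tag, line, offset

    print(decompose(0x12345))       # -> (1, 282, 5)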

Direct Mapping
Advantage: direct mapping is faster than associative mapping, as it avoids searching through all the CM tags for a match.
Disadvantage: it lacks mapping flexibility. For example, if two MM blocks mapped to the same CM block are needed repeatedly (e.g., in a loop), they will keep replacing each other, even though all other CM blocks may be available.

Set-Associative Mapping


Set-Associative Mapping
This is a trade-off between associative and direct mapping, where each address is mapped to a certain set of cache locations. The cache is broken into sets, where each set contains N cache lines, say 4. Each memory address is assigned a set and can be cached in any one of those 4 locations within the set it is assigned to. In other words, within each set the cache is associative, and thus the name.

Set Associative cache


The LRU (least recently used) algorithm is typically used:
Keep an ordering of the set of locations that could be accessed from a given memory location.
Whenever any of the present lines is accessed, update the list, making that entry the most recently accessed.
When an entry has to be replaced, the one at the end of the list is discarded.
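A minimal sketch of a 4-way set-associative lookup with per-set LRU (illustrative; the geometry numbers are assumptions: 512 sets * 4 ways * 32 bytes = 64 KB):

    LINE_SIZE = 32
    NUM_SETS = 512
    WAYS = 4                        # "N" cache lines per set, as on the slide

    # Each set is a list of tags ordered from least to most recently used.
    sets = [[] for _ in range(NUM_SETS)]

    def access(address):
        """Return True on a hit; maintain per-set LRU order either way."""
        block = address // LINE_SIZE
        index, tag = block % NUM_SETS, block // NUM_SETS
        lines = sets[index]
        if tag in lines:
            lines.remove(tag)
            lines.append(tag)       # hit: move to most-recently-used position
            return True
        if len(lines) == WAYS:
            lines.pop(0)            # miss: evict the LRU line of this set only
        lines.append(tag)
        return False

    print(access(0x1000), access(0x1000))   # -> False True (miss, then hit)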

Load-Through and Store-Through


Load-through: when the CPU needs to read a word from memory, the block containing the word is brought from MM to CM, while at the same time the word is forwarded to the CPU.
Store-through: a word to be stored from the CPU to memory is written to both the CM (if the word is in there) and the MM. By doing so, a CM block to be replaced can be overwritten by an incoming block without first being saved to MM.

Cache Write Methods


So far, words in the cache have been viewed simply as copies of words from main memory, read from the cache to provide faster access. This viewpoint changes once writes are considered. There are three possible write actions:
Write the result into main memory
Write the result into the cache
Write the result into both main memory and cache memory


Cache Write Methods


Write-through: a cache architecture in which data is written to main memory at the same time as it is cached.
Write-back / copy-back: the CPU writes only to the cache on a cache hit; on a cache miss, the CPU writes to main memory. On a write miss there are two options (see the sketch below):
Write allocate: load the memory block into the cache and update the cache block.
No-write allocate: bypass the cache and write the word directly into memory.
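A minimal sketch contrasting the two write policies (illustrative; plain dictionaries stand in for the real cache and memory arrays, and the miss path follows no-write allocate):

    cache, memory, dirty = {}, {}, set()   # stand-ins for hardware structures

    def write_through(addr, value):
        """Write to main memory always; update the cache only on a hit."""
        if addr in cache:
            cache[addr] = value
        memory[addr] = value               # memory is always kept up to date

    def write_back(addr, value):
        """On a hit, write only to the cache and mark the line dirty;
        on a miss, bypass the cache (no-write allocate)."""
        if addr in cache:
            cache[addr] = value
            dirty.add(addr)                # flushed to memory only on eviction
        else:
            memory[addr] = value           # the cache is not touched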

Cache Evaluation
Problem: External memory is slower than the system bus.
Solution: Add an external cache using faster memory technology.
Processor on which the feature first appears: 386

Problem: Increased processor speed results in the external bus becoming a bottleneck for cache access.
Solution: Move the external cache on-chip, operating at the same speed as the processor.
Processor: 486

Problem: The internal cache is rather small, due to limited space on the chip.
Solution: Add an external L2 cache using faster technology than main memory.
Processor: 486


Cache Evaluation
Problem: Increased processor speed results in the external bus becoming a bottleneck for L2 cache access.
Solution: Create a separate back-side bus (BSB) that runs at a higher speed than the main (front-side) external bus; the BSB is dedicated to the L2 cache.
Processor: Pentium Pro
Solution: Move the L2 cache onto the processor chip.
Processor: Pentium II

Problem: Some applications deal with massive databases and must have rapid access to large amounts of data; the on-chip caches are too small.
Solution: Add an external L3 cache.
Processor: Pentium III
Solution: Move the L3 cache on-chip.
Processor: Pentium 4


Comparison of Cache Sizes


Processor        Type              Year  L1 cache        L2 cache        L3 cache
IBM 360/85       Mainframe         1968  16 to 32 KB     -               -
PDP-11/70        Minicomputer      1975  1 KB            -               -
VAX 11/780       Minicomputer      1978  16 KB           -               -
IBM 3033         Mainframe         1978  64 KB           -               -
IBM 3090         Mainframe         1985  128 to 256 KB   -               -
Intel 80486      PC                1989  8 KB            -               -
Pentium          PC                1993  8 KB/8 KB       256 to 512 KB   -
PowerPC 601      PC                1993  32 KB           -               -
PowerPC 620      PC                1996  32 KB/32 KB     -               -
PowerPC G4       PC/server         1999  32 KB/32 KB     256 KB to 1 MB  2 MB
IBM S/390 G4     Mainframe         1997  32 KB           256 KB          2 MB
IBM S/390 G6     Mainframe         1999  256 KB          8 MB            -
Pentium 4        PC/server         2000  8 KB/8 KB       256 KB          -
IBM SP           High-end server   2000  64 KB/32 KB     8 MB            -
CRAY MTAb        Supercomputer     2000  8 KB            2 MB            -
Itanium          PC/server         2001  16 KB/16 KB     96 KB           4 MB
SGI Origin 2001  High-end server   2001  32 KB/32 KB     4 MB            -
Itanium 2        PC/server         2002  32 KB           256 KB          6 MB
IBM POWER5       High-end server   2003  64 KB           1.9 MB          36 MB
CRAY XD-1        Supercomputer     2004  64 KB/64 KB     1 MB            -
