Cache Memory
Budditha Hettige
Department of Statistics and Computer Science University of Sri Jayewardenepura
Cache Memory
A high-speed, small memory. The most frequently used memory words are kept in the cache. When the CPU needs a word, it first checks the cache; if the word is not found there, it checks main memory.
Locality Principle
PRINCIPLE OF LOCALITY: the tendency to reference data items that are near other recently referenced items, or that were recently referenced themselves.
TEMPORAL LOCALITY: a memory location that is referenced once is likely to be referenced multiple times in the near future.
SPATIAL LOCALITY: if a memory location is referenced once, the program is likely to reference a nearby memory location in the near future.
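A short C sketch of both kinds of locality in an ordinary loop (the array name and sizes are arbitrary, chosen only for illustration):

```c
#include <stdio.h>

int main(void) {
    int a[1024];
    int sum = 0;            /* 'sum' is reused every iteration: temporal locality */

    for (int i = 0; i < 1024; i++)
        a[i] = i;

    for (int i = 0; i < 1024; i++)
        sum += a[i];        /* a[0], a[1], ... are adjacent in memory: spatial locality */

    printf("%d\n", sum);
    return 0;
}
```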
Locality Principle
Let
c = cache access time
m = main memory access time
h = hit ratio (fraction of all references that can be satisfied out of the cache)
miss ratio = 1 − h
mean access time = c + (1 − h)m

Example: suppose that a word is read k times in a short interval. The first reference goes to main memory and the remaining k − 1 references hit the cache, so h = (k − 1)/k.
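A quick numeric check of this formula; the access times (1 ns cache, 10 ns memory) are assumed values for illustration only:

```c
#include <stdio.h>

int main(void) {
    double c = 1.0;     /* cache access time, ns (assumed)       */
    double m = 10.0;    /* main memory access time, ns (assumed) */
    int    k = 100;     /* a word read k times in a short interval */

    double h    = (double)(k - 1) / k;   /* hit ratio: 1 miss, k-1 hits */
    double mean = c + (1.0 - h) * m;     /* mean access time            */

    printf("h = %.2f, mean access time = %.2f ns\n", h, mean);
    return 0;
}
```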
Cache Memory
Main memory and the cache are divided into fixed-size blocks. The blocks inside the cache are called cache lines. On a cache miss, an entire cache line is loaded into the cache from main memory. Example:
A 64 KB cache can be divided into 1K lines of 64 bytes, 2K lines of 32 bytes, etc.
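A minimal check of the line counts quoted above (cache and line sizes taken from the example):

```c
#include <stdio.h>

int main(void) {
    unsigned cache_size   = 64 * 1024;    /* 64 KB cache */
    unsigned line_sizes[] = { 64, 32 };   /* two candidate line sizes */

    /* number of lines = cache size / line size */
    for (int i = 0; i < 2; i++)
        printf("%u-byte lines -> %u lines\n",
               line_sizes[i], cache_size / line_sizes[i]);
    return 0;
}
```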
Unified cache
Instructions and data use the same cache
Split cache
Instructions in one cache and data in another
Replacement Algorithm
Optimal replacement: replace the block that will no longer be needed in the future. If all blocks currently in the cache will be used again, replace the one whose next use is furthest in the future.
Random selection: replace a randomly selected block among all blocks currently in the cache.
Replacement Algorithm
FIFO (first-in, first-out): replace the block that has been in the cache for the longest time.
LRU (least recently used): replace the block in the cache that has not been used for the longest time (a minimal sketch follows this list).
LFU (least frequently used): replace the block in the cache that has been used the fewest times.
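A minimal LRU sketch, assuming a software model with a timestamp per block; real caches use hardware approximations such as pseudo-LRU rather than full timestamps:

```c
#include <stdio.h>

#define NBLOCKS 4

/* One cache block: a tag plus the time of its last use. */
struct block {
    int           tag;
    unsigned long last_used;
};

/* Return the index of the least recently used block. */
int lru_victim(const struct block cache[], int n) {
    int victim = 0;
    for (int i = 1; i < n; i++)
        if (cache[i].last_used < cache[victim].last_used)
            victim = i;
    return victim;
}

int main(void) {
    struct block cache[NBLOCKS] = {
        { 10, 4 }, { 11, 1 }, { 12, 9 }, { 13, 3 }
    };
    int v = lru_victim(cache, NBLOCKS);
    /* The block with tag 11 was used longest ago, so it is evicted. */
    printf("evict block %d (tag %d)\n", v, cache[v].tag);
    return 0;
}
```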
The choice of cache mapping scheme affects cost and performance, and there is no single best method that is appropriate for all situations.
Associative Mapping
A block in main memory can be mapped to any available (not already occupied) block in the cache.
Advantage: flexibility. A main memory block can be placed anywhere in the cache.
Disadvantage: slow or expensive. A search through all the cache blocks is needed to check whether an address matches any of the tags.
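A sketch of that search in C, modeling a fully associative lookup; hardware compares all tags in parallel with one comparator per line, so the sequential loop here is only a model (line count and addresses are illustrative):

```c
#include <stdio.h>

#define NLINES     8
#define LINE_BYTES 64

struct line {
    int      valid;
    unsigned tag;      /* the full block address serves as the tag */
};

/* Fully associative lookup: any block may sit in any line, so every
   line's tag must be checked. Returns the line index on a hit, -1 on a miss. */
int lookup(const struct line cache[], unsigned addr) {
    unsigned tag = addr / LINE_BYTES;
    for (int i = 0; i < NLINES; i++)
        if (cache[i].valid && cache[i].tag == tag)
            return i;
    return -1;
}

int main(void) {
    struct line cache[NLINES] = { { 1, 5 }, { 1, 42 } };  /* two valid lines */
    printf("addr 0x0A80 -> line %d\n", lookup(cache, 0x0A80)); /* tag 42: hit  */
    printf("addr 0x1000 -> line %d\n", lookup(cache, 0x1000)); /* miss -> -1   */
    return 0;
}
```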
Direct Mapping
To avoid the search through all cache blocks needed by associative mapping, this method maps each main memory block to exactly one cache block, so (# blocks in main memory) / (# blocks in cache memory) memory blocks compete for each cache block. Each entry (row) in the cache holds exactly one cache line from main memory. Example: with a 32-byte cache line size, a 64 KB cache holds 2K entries.
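A sketch of the address split under direct mapping, using the 64 KB cache with 32-byte lines from the example (2K entries); the address value itself is arbitrary:

```c
#include <stdio.h>

#define LINE_BYTES 32        /* 32-byte cache line        */
#define NLINES     2048      /* 64 KB / 32 B = 2K entries */

int main(void) {
    unsigned addr   = 0x12345678;
    unsigned offset = addr % LINE_BYTES;            /* low 5 bits: byte within line  */
    unsigned index  = (addr / LINE_BYTES) % NLINES; /* next 11 bits: which entry     */
    unsigned tag    = addr / (LINE_BYTES * NLINES); /* remaining high bits: the tag  */

    printf("offset=%u index=%u tag=0x%X\n", offset, index, tag);
    return 0;
}
```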
Direct Mapping
Advantage: direct mapping is faster than associative mapping, as it avoids searching through all the cache tags for a match.
Disadvantage: it lacks mapping flexibility. For example, if two main memory blocks mapped to the same cache block are needed repeatedly (e.g., in a loop), they will keep replacing each other, even though all other cache blocks may be available.
Set-Associative Mapping
This is a trade-off between associative and direct mapping, where each address maps to a certain set of cache locations. The cache is broken into sets, each containing N cache lines, say 4. Each memory address is assigned to one set and can be cached in any of the 4 locations within that set. In other words, within each set the cache is associative, hence the name.
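A sketch of a 4-way set-associative lookup combining both ideas: the index bits select a set (as in direct mapping), then all four ways in that set are searched associatively. All sizes are illustrative:

```c
#include <stdio.h>

#define LINE_BYTES 32
#define NSETS      512    /* illustrative: 64 KB / (32 B * 4 ways) */
#define WAYS       4

struct line { int valid; unsigned tag; };

/* The address picks one set (direct-mapped part); all WAYS tags in
   that set are then compared (associative part). Returns the way on
   a hit, -1 on a miss. */
int lookup(struct line cache[NSETS][WAYS], unsigned addr) {
    unsigned set = (addr / LINE_BYTES) % NSETS;
    unsigned tag = addr / (LINE_BYTES * NSETS);
    for (int w = 0; w < WAYS; w++)
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return w;
    return -1;
}

int main(void) {
    static struct line cache[NSETS][WAYS];    /* zero-initialized: all invalid */
    unsigned addr = 0x4A80;
    unsigned set  = (addr / LINE_BYTES) % NSETS;

    cache[set][2].valid = 1;                         /* preload way 2 of that set */
    cache[set][2].tag   = addr / (LINE_BYTES * NSETS);

    printf("addr 0x%X -> way %d of set %u\n", addr, lookup(cache, addr), set);
    return 0;
}
```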
Cache Evolution

Problem: External memory slower than the system bus.
Solution: Add external cache using faster memory technology.
First appeared: 386

Problem: Increased processor speed results in the external bus becoming a bottleneck for cache access.
Solution: Move the external cache on-chip, operating at the same speed as the processor.
First appeared: 486

Problem: The internal cache is rather small, due to limited space on the chip.
Solution: Add external L2 cache using faster technology than main memory.
First appeared: 486
Cache Evolution

Problem: Increased processor speed results in the external bus becoming a bottleneck for L2 cache access.
Solution: Create a separate back-side bus (BSB) that runs at a higher speed than the main (front-side) external bus; the BSB is dedicated to the L2 cache.
First appeared: Pentium Pro
Solution: Move the L2 cache onto the processor chip.
First appeared: Pentium II

Problem: Some applications deal with massive databases and must have rapid access to large amounts of data; the on-chip caches are too small.
Solution: Add external L3 cache.
First appeared: Pentium III
Solution: Move the L3 cache on-chip.
First appeared: Pentium 4
Cache Sizes of Some Processors

Processor       | Type            | Year of Introduction | L1 cache      | L2 cache       | L3 cache
IBM 360/85      | Mainframe       | 1968                 | 16 to 32 KB   | —              | —
PDP-11/70       | Minicomputer    | 1975                 | 1 KB          | —              | —
VAX 11/780      | Minicomputer    | 1978                 | 16 KB         | —              | —
IBM 3033        | Mainframe       | 1978                 | 64 KB         | —              | —
IBM 3090        | Mainframe       | 1985                 | 128 to 256 KB | —              | —
Intel 80486     | PC              | 1989                 | 8 KB          | —              | —
Pentium         | PC              | 1993                 | 8 KB/8 KB     | 256 to 512 KB  | —
PowerPC 601     | PC              | 1993                 | 32 KB         | —              | —
PowerPC 620     | PC              | 1996                 | 32 KB/32 KB   | —              | —
PowerPC G4      | PC/server       | 1999                 | 32 KB/32 KB   | 256 KB to 1 MB | 2 MB
IBM S/390 G4    | Mainframe       | 1997                 | 32 KB         | 256 KB         | 2 MB
IBM S/390 G6    | Mainframe       | 1999                 | 256 KB        | 8 MB           | —
Pentium 4       | PC/server       | 2000                 | 8 KB/8 KB     | 256 KB         | —
IBM SP          | High-end server | 2000                 | 64 KB/32 KB   | 8 MB           | —
CRAY MTA        | Supercomputer   | 2000                 | 8 KB          | 2 MB           | —
Itanium         | PC/server       | 2001                 | 16 KB/16 KB   | 96 KB          | 4 MB
SGI Origin 2001 | High-end server | 2001                 | 32 KB/32 KB   | 4 MB           | —
Itanium 2       | PC/server       | 2002                 | 32 KB         | 256 KB         | 6 MB
IBM POWER5      | High-end server | 2003                 | 64 KB         | 1.9 MB         | 36 MB
CRAY XD-1       | Supercomputer   | 2004                 | 64 KB/64 KB   | 1 MB           | —