Fundamentals of Computer Systems: Caches
Spring 2012
Illustrations Copyright 2007 Elsevier
Computer Systems
Performance depends on which is slower: the processor or the memory system
[Figure: block diagram of the Processor and the Memory, with the ReadData signal between them]
[Figure: Performance (log scale, 1 to 1000) versus Year (1980 to 2005), with one curve for CPU and one for Memory]
On-Chip SRAM
Commodity DRAM
Supercomputer
Memory Hierarchy
Fundamental trick for making a big memory appear fast
Technology: SRAM, DRAM, Flash, Hard Disk
Read
My desktop machine:

Level            Size
L1 Instruction   64 K
L1 Data          64 K
L2               512 K
L3               2 MB
Memory           4 GB
Disk             500 GB
per
core
Temporal Locality
What path do your eyes take when you read this? Did you look at the drawings more than once?
Euclid's Elements
Spatial Locality
Memory Performance
Hit: Data is found in this level of the memory hierarchy
Miss: Data not found; will look in the next level

$\text{Hit Rate} = \dfrac{\text{Number of hits}}{\text{Number of accesses}}$

$\text{Miss Rate} = \dfrac{\text{Number of misses}}{\text{Number of accesses}}$

Hit Rate + Miss Rate = 1

The expected access time $E_L$ for a memory level $L$ with latency $t_L$ and miss rate $M_L$:

$E_L = t_L + M_L \cdot E_{L+1}$
Hit Rate = 75%
Miss Rate = $1 - 0.75$ = 25%

If the cache takes 1 cycle and the main memory 100, what's the expected access time?

Expected access time of main memory: $E_1 = 100$ cycles
Access time for the cache: $t_0 = 1$ cycle
Cache miss rate: $M_0 = 0.25$

$E_0 = t_0 + M_0 \cdot E_1 = 1 + 0.25 \cdot 100 = 26$ cycles
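The same recurrence extends to any number of levels. A minimal Python sketch, assuming the last level always hits; the function name `expected_access_time` and the list-based interface are illustrative choices, not part of the slides:

```python
def expected_access_time(latencies, miss_rates):
    """Expected access time E_0 of a memory hierarchy, in cycles.

    latencies[i] is t_i, the latency of level i.
    miss_rates[i] is M_i, the miss rate of level i. The last level is
    assumed to always hit, so miss_rates has one fewer entry.
    """
    if len(miss_rates) != len(latencies) - 1:
        raise ValueError("need one miss rate per level except the last")
    # The last level's expected time is just its latency; then apply
    # E_L = t_L + M_L * E_{L+1} while moving up toward level 0.
    expected = latencies[-1]
    for t, m in zip(reversed(latencies[:-1]), reversed(miss_rates)):
        expected = t + m * expected
    return expected


# The example from the slide: 1-cycle cache, 100-cycle memory, 25% misses.
print(expected_access_time([1, 100], [0.25]))  # 26.0
```

Adding a hypothetical middle level, for instance `expected_access_time([1, 10, 100], [0.25, 0.5])`, evaluates the recurrence twice, once per cache level.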
Cache
Highest levels of memory hierarchy
Fast: level 1 typically 1 cycle access time
With luck, supplies most data

Cache design questions:
What data does it hold? Recently accessed
How is data found? Simple address hash
What data is replaced? Often the oldest
A Direct-Mapped Cache
This simple cache has
8 sets
1 block per set
4 bytes per block

To simplify answering "is this memory in the cache?", each byte is mapped to exactly one set.

[Figure: memory words at addresses 0x00000000 through 0x00000024 and 0xFFFFFFE0 through 0xFFFFFFFC (Address and Data columns), each mapped to one of Set 0 (000) through Set 7 (111)]
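2^3-Word Cache

Address bits:
0-1: byte within block
2-4: set number
5-31: block tag

Cache hit if, in the set of the address,
the block is valid (V=1)
the tag (address bits 5-31) matches

[Figure: direct-mapped cache hardware: 27-bit tag, 3-bit set number, byte offset 00; each set holds V, Tag, and 32-bit Data; outputs Hit and Data]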
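The field extraction and the hit test can be sketched with shifts and masks. This is an illustrative Python sketch for the 8-set, 4-bytes-per-block cache only; the names `split_address` and `is_hit`, and the representation of the cache as a list of `(valid, tag)` pairs, are assumptions made for the example:

```python
def split_address(addr):
    """Split a 32-bit byte address into (tag, set number, byte offset)."""
    byte_offset = addr & 0x3        # address bits 0-1
    set_number = (addr >> 2) & 0x7  # address bits 2-4
    tag = addr >> 5                 # address bits 5-31
    return tag, set_number, byte_offset


def is_hit(cache, addr):
    """cache is a list of 8 (valid, tag) pairs, one per set (assumed layout)."""
    tag, set_number, _ = split_address(addr)
    valid, stored_tag = cache[set_number]
    # Hit only if the block is valid and its tag matches the address tag.
    return valid and stored_tag == tag


print(split_address(0x04))  # (0, 1, 0)
print(split_address(0x24))  # (1, 1, 0)
```

Addresses 0x04 and 0x24 land in the same set and differ only in their tags, so at most one of them can be in this cache at a time.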
A dumb loop: repeat 5 times: load from 0x4; load from 0xC; load from 0x8.

      li    $t0, 5
l1:   beq   $t0, $0, done
      lw    $t1, 0x4($0)
      lw    $t2, 0xC($0)
      lw    $t3, 0x8($0)
      addiu $t0, $t0, -1
      j     l1
done:

[Figure: cache contents (V, Tag, Data) for Set 0 (000) through Set 7 (111). Caption: Cache when reading 0x4 last time]

When two recently accessed addresses map to the same cache

Assuming the cache starts empty, what's the miss rate?

Access:  4 C 8 4 C 8 4 C 8 4 C 8 4 C 8
Result:  M M M H H H H H H H H H H H H

Miss rate = 3/15 = 0.2 = 20%
[Figure: address 00...01 001 00 looked up in a cache with V, Tag, and Data fields for Set 0 (000) through Set 7 (111)]
A dumber loop: repeat 5 times: load from 0x4; load from 0x24

      li    $t0, 5
l1:   beq   $t0, $0, done
      lw    $t1, 0x4($0)
      lw    $t2, 0x24($0)
      addiu $t0, $t0, -1
      j     l1
done:

Cache State

Assuming the cache starts empty, what's the miss rate?

Access:  4 24 4 24 4 24 4 24 4 24
Result:  M M  M M  M M  M M  M M

Miss rate = 10/10 = 1 = 100%. Oops
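Both miss rates can be checked with a small simulation. The following Python sketch models only the tags of a direct-mapped cache with 8 sets and 4-byte blocks; the function name `direct_mapped_miss_rate` and its parameters are illustrative, not from the slides:

```python
def direct_mapped_miss_rate(addresses, num_sets=8, block_bytes=4):
    """Miss rate of a direct-mapped cache that starts empty."""
    tags = [None] * num_sets  # None means the entry is invalid (V = 0)
    misses = 0
    for addr in addresses:
        block = addr // block_bytes    # drop the byte-within-block bits
        set_number = block % num_sets  # low bits of the block number
        tag = block // num_sets        # remaining high bits
        if tags[set_number] != tag:
            misses += 1
            tags[set_number] = tag     # replace whatever was there
    return misses / len(addresses)


dumb = [0x4, 0xC, 0x8] * 5
dumber = [0x4, 0x24] * 5
print(direct_mapped_miss_rate(dumb))    # 0.2
print(direct_mapped_miss_rate(dumber))  # 1.0
```

In the dumber loop, 0x4 and 0x24 map to the same set with different tags, so each load evicts the block the next load needs.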
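[Figure: two-way set-associative cache hardware: 28-bit tag, 2-bit set number, byte offset 00; each way holds V, Tag, and 32-bit Data; the per-way signals Hit0 and Hit1 combine into Hit and select the Data output]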
[Figure: cache state with Way 1 and Way 0 (V, Tag, Data each). Only Set 1 is valid in each way; between them the two ways hold mem[0x00...04] (tag 00...00) and mem[0x00...24] (tag 00...10)]
[Figure 8.11: fully associative cache with eight ways, Way 7 through Way 0, each holding V, Tag, and Data]
No conflict misses: only compulsory or capacity misses
Either very expensive or slow because of all the associativity
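The effect of associativity on the dumber loop can be sketched in Python. The replacement policy here is least recently used, which is an assumption for the example (the slides say only that the oldest block is often replaced), and `set_associative_miss_rate` is an illustrative name:

```python
from collections import OrderedDict


def set_associative_miss_rate(addresses, num_sets, ways, block_bytes=4):
    """Miss rate of a set-associative cache that starts empty (LRU assumed)."""
    # One OrderedDict per set, keyed by tag, ordered from least to most
    # recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        entries = sets[block % num_sets]
        tag = block // num_sets
        if tag in entries:
            entries.move_to_end(tag)         # hit: now most recently used
        else:
            misses += 1
            if len(entries) == ways:
                entries.popitem(last=False)  # evict the least recently used
            entries[tag] = True
    return misses / len(addresses)


dumber = [0x4, 0x24] * 5
print(set_associative_miss_rate(dumber, num_sets=8, ways=1))  # 1.0
print(set_associative_miss_rate(dumber, num_sets=4, ways=2))  # 0.2
print(set_associative_miss_rate(dumber, num_sets=1, ways=8))  # 0.2
```

With one way per set this reduces to the direct-mapped case; with a single set it behaves as a fully associative cache, where only the two compulsory misses remain.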
[Figure: memory address 0x8000009C split into fields: Tag (100...100), Set, Block Offset (11), Byte Offset (00)]
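[Figure: cache with four-word blocks: the memory address supplies a 27-bit tag, a set number, a 2-bit block offset, and byte offset 00; each set holds V, Tag, and four 32-bit data words; the block offset (11, 10, 01, 00) selects one word; outputs Hit and Data]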
The dumb loop: repeat 5 times: load from 0x4; load from 0xC; load from 0x8.

      li    $t0, 5
l1:   beq   $t0, $0, done
      lw    $t1, 0x4($0)
      lw    $t2, 0xC($0)
      lw    $t3, 0x8($0)
      addiu $t0, $t0, -1
      j     l1
done:

[Figure 8.14: cache state: Set 1 is invalid (V = 0); Set 0 is valid (V = 1) with tag 00...00 and holds mem[0x00...0C], mem[0x00...08], mem[0x00...04], mem[0x00...00]]
Assuming the cache starts empty, what's the miss rate?

Access:  4 C 8 4 C 8 4 C 8 4 C 8 4 C 8
Result:  M H H H H H H H H H H H H H H

Miss rate = 1/15 = 0.0667 = 6.7%
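The hit and miss pattern for larger blocks can be reproduced with a short Python sketch. It assumes a direct-mapped cache with 2 sets and 16-byte (four-word) blocks; the function name `trace` and its parameters are illustrative:

```python
def trace(addresses, num_sets, block_bytes):
    """Return a string of 'M' and 'H', one per access, for a direct-mapped
    cache that starts empty."""
    tags = [None] * num_sets  # None means the entry is invalid (V = 0)
    result = []
    for addr in addresses:
        block = addr // block_bytes    # all bytes of a block share this number
        set_number = block % num_sets
        tag = block // num_sets
        if tags[set_number] == tag:
            result.append("H")
        else:
            result.append("M")
            tags[set_number] = tag     # the whole block is brought in
    return "".join(result)


dumb = [0x4, 0xC, 0x8] * 5
four_word_blocks = trace(dumb, num_sets=2, block_bytes=16)
print(four_word_blocks)  # MHHHHHHHHHHHHHH
print(four_word_blocks.count("M") / len(four_word_blocks))
```

The first load of 0x4 brings in the whole block covering 0x0 through 0xC, so the loads from 0xC and 0x8 hit even on the first iteration. Calling `trace(dumb, num_sets=8, block_bytes=4)` gives the one-word-block pattern for comparison.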
core
[Figure: timeline of processors: Pentium Pro, Pentium II, Pentium III, Pentium 4, Pentium M; year axis: 1995, 1997, 1999, 2001, 2003]