Lecture22 PDF
More Caches
Measuring Performance
Lecture 22: 1
Announcements
Another LRU Replacement Example
• 2-way set associative; (*) marks the LRU block in a set
• Tracking LRU takes 1 bit per set in this case

Block    Cache  Hit/   Cache contents after access
address  index  miss   Set 0                        Set 1
0        0      miss   Mem[0]
4        0      miss   Mem[0] (*)   Mem[4]
2        0      miss   Mem[2]       Mem[4] (*)
6        0      miss   Mem[2] (*)   Mem[6]
8        0      miss   Mem[8]       Mem[6] (*)
0        0      miss   Mem[8] (*)   Mem[0]
4        0      miss   Mem[4]       Mem[0] (*)
2        0      miss   Mem[4] (*)   Mem[2]
6        0      miss   Mem[6]       Mem[2] (*)
8        0      miss   Mem[6] (*)   Mem[8]
2        0      miss   Mem[2]       Mem[8] (*)
6        0      miss   Mem[2] (*)   Mem[6]
2        0      hit    Mem[2]       Mem[6] (*)
0        0      miss   Mem[2] (*)   Mem[0]
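The LRU trace above can be replayed with a short simulation. This is a minimal sketch, not the lecture's own code: the function name `simulate_lru` is hypothetical, and the choice of 2 sets (index = block address mod 2) is an assumption inferred from every even block address mapping to index 0.

```python
from collections import OrderedDict

def simulate_lru(block_addresses, num_sets=2, ways=2):
    """Return (block, set_index, hit) tuples for an LRU set-associative cache."""
    # Each set is an OrderedDict ordered from least to most recently used.
    # num_sets=2 is an assumption inferred from the table, not stated on the slide.
    sets = [OrderedDict() for _ in range(num_sets)]
    trace = []
    for block in block_addresses:
        index = block % num_sets
        lines = sets[index]
        hit = block in lines
        if hit:
            lines.move_to_end(block)        # block becomes most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the LRU block
            lines[block] = True
        trace.append((block, index, hit))
    return trace

accesses = [0, 4, 2, 6, 8, 0, 4, 2, 6, 8, 2, 6, 2, 0]
trace = simulate_lru(accesses)
hits = sum(hit for _, _, hit in trace)
print(f"{hits} hit(s), {len(trace) - hits} miss(es)")   # 1 hit(s), 13 miss(es)
```

Cycling through five blocks that all compete for one 2-way set defeats LRU almost completely; the only hit is the second access to block 2 near the end of the sequence.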
Write Through
• Direct-mapped cache with 2 entries (index 0 and 1); each entry holds V, tag, and a two-word data block; both entries start with V = 0
• Registers: R0 = 333, R1 = 444, R2 = 555, R3 = 666
• Stores issued by the processor, in order
– M[000000] <= R0
– M[000100] <= R1
– M[010000] <= R2
– M[011100] <= R3
• Initial memory contents (3-bit block number, two words per block)

Block   Word 0   Word 1
000     100      110
001     120      130
010     140      150
011     160      170
100     180      190
101     200      210
110     220      230
111     240      250

• Cache data below is written as in the slides: word 1 first, then word 0
• M[000000] <= R0: miss
– Block 000 is brought into entry 0: V = 1, tag = 00, data = 110 100
– 333 is written to the cache (data = 110 333) and to memory (block 000, word 0: 100 becomes 333)
• M[000100] <= R1: hit
– 444 is written to the cache (data = 444 333) and to memory (block 000, word 1: 110 becomes 444)
• M[010000] <= R2: miss
– Entry 0 holds tag 00; block 010 replaces it: V = 1, tag = 01, data = 150 140
– 555 is written to the cache (data = 150 555) and to memory (block 010, word 0: 140 becomes 555)
• M[011100] <= R3: miss
– Entry 1 is invalid; block 011 is brought in: V = 1, tag = 01, data = 170 160
– 666 is written to the cache (data = 666 160) and to memory (block 011, word 1: 170 becomes 666)
• Final state
– Cache entry 0: V = 1, tag = 01, data = 150 555
– Cache entry 1: V = 1, tag = 01, data = 666 160
– Memory block 000 = 333 444, block 010 = 555 150, block 011 = 160 666 (word 0, word 1); all other blocks unchanged
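The write-through sequence can be reproduced with a small model of the cache and memory. This is a sketch under stated assumptions rather than the lecture's implementation: the names `split` and `store_write_through` are hypothetical, the 4-byte word size is inferred from each 8-byte block holding two values, and the block is fetched on a write miss because that is what the slides show.

```python
BLOCK_BYTES, WORD_BYTES, NUM_LINES = 8, 4, 2   # WORD_BYTES = 4 is an assumption

def split(addr):
    """Split a byte address into (tag, index, word offset within the block)."""
    block = addr // BLOCK_BYTES
    return block // NUM_LINES, block % NUM_LINES, (addr % BLOCK_BYTES) // WORD_BYTES

# Memory as 8 blocks of two words: block b holds 100 + 20*b and 110 + 20*b.
memory = [[100 + 20 * b, 110 + 20 * b] for b in range(8)]
cache = [{"valid": False, "tag": None, "data": None} for _ in range(NUM_LINES)]

def store_write_through(addr, value):
    """Perform one store; return True on a hit, False on a miss."""
    tag, index, word = split(addr)
    block = tag * NUM_LINES + index
    line = cache[index]
    hit = line["valid"] and line["tag"] == tag
    if not hit:
        # Fetch the block on a miss. Nothing is written back on eviction,
        # because memory is always up to date under write through.
        line.update(valid=True, tag=tag, data=list(memory[block]))
    line["data"][word] = value      # update the cached copy
    memory[block][word] = value     # and memory, on every store
    return hit

stores = [(0b000000, 333), (0b000100, 444), (0b010000, 555), (0b011100, 666)]
results = [store_write_through(addr, value) for addr, value in stores]
print(results)                            # [False, True, False, False]
print(memory[0], memory[2], memory[3])    # [333, 444] [555, 150] [160, 666]
```

Every store reaches memory here, so the number of memory writes equals the number of stores regardless of the hit rate.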
Write Back Example
• Assume write allocate
• Size of each block is 8 bytes
• Cache holds 2 blocks
• Memory holds 8 blocks
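• Memory address (6 bits): tag (2 bits) | index (1 bit) | block offset (3 bits)
• Each cache entry holds V (valid bit), D (dirty bit), tag, and data; the two entries have index 0 and 1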
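The address field widths follow from the three sizes listed for this example. The sketch below derives them and decodes one of the store addresses used in the example; the helper names `address_fields` and `decode` are hypothetical, and a direct-mapped organization is assumed because the cache is drawn with one entry per index.

```python
def address_fields(block_bytes, cache_blocks, memory_blocks):
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache.

    All three sizes are assumed to be powers of two.
    """
    offset_bits = block_bytes.bit_length() - 1
    index_bits = cache_blocks.bit_length() - 1
    address_bits = (block_bytes * memory_blocks).bit_length() - 1
    return address_bits - index_bits - offset_bits, index_bits, offset_bits

tag_bits, index_bits, offset_bits = address_fields(8, 2, 8)

def decode(addr):
    """Split a byte address into (tag, index, block offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

print(tag_bits, index_bits, offset_bits)   # 2 1 3
print(decode(0b011100))                    # (1, 1, 4)
```

Address 011100 therefore reads as tag 01, index 1, byte offset 4 within the block.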
Write Back
• Direct-mapped cache with 2 entries (index 0 and 1); each entry holds V, D, tag, and a two-word data block; both entries start with V = 0 and D = 0
• Registers: R0 = 333, R1 = 444, R2 = 555, R3 = 666
• Stores issued by the processor, in order
– M[000000] <= R0
– M[000100] <= R1
– M[010000] <= R2
– M[011100] <= R3
• Initial memory contents (3-bit block number, two words per block)

Block   Word 0   Word 1
000     100      110
001     120      130
010     140      150
011     160      170
100     180      190
101     200      210
110     220      230
111     240      250

• Cache data below is written as in the slides: word 1 first, then word 0
• M[000000] <= R0: miss
– Block 000 is brought into entry 0: V = 1, D = 0, tag = 00, data = 110 100
– 333 is written to the cache only: D = 1, data = 110 333; memory block 000 still holds 100 110
• M[000100] <= R1: hit
– 444 is written to the cache only: D = 1, data = 444 333; memory is unchanged
• M[010000] <= R2: miss
– Entry 0 is dirty, so block 000 is first written back: memory block 000 becomes 333 444 (word 0, word 1)
– Block 010 is brought into entry 0: V = 1, D = 0, tag = 01, data = 150 140
– 555 is written to the cache only: D = 1, data = 150 555; memory block 010 still holds 140 150
• M[011100] <= R3: miss
– Entry 1 is invalid, so nothing is written back; block 011 is brought in: V = 1, D = 0, tag = 01, data = 170 160
– 666 is written to the cache only: D = 1, data = 666 160; memory block 011 still holds 160 170
• Final state
– Cache entry 0: V = 1, D = 1, tag = 01, data = 150 555
– Cache entry 1: V = 1, D = 1, tag = 01, data = 666 160
– Memory block 000 = 333 444 (word 0, word 1); all other blocks unchanged, so blocks 010 and 011 in memory are stale until they are evicted
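The write-back sequence differs from write through only in when memory is updated. The sketch below models that with a dirty bit per entry; it is an illustration under assumptions, not the lecture's code: the name `store_write_back` and the `memory_writes` log are hypothetical, and a 4-byte word is assumed because each 8-byte block holds two values.

```python
BLOCK_BYTES, WORD_BYTES, NUM_LINES = 8, 4, 2   # WORD_BYTES = 4 is an assumption

# Memory as 8 blocks of two words: block b holds 100 + 20*b and 110 + 20*b.
memory = [[100 + 20 * b, 110 + 20 * b] for b in range(8)]
cache = [{"valid": False, "dirty": False, "tag": None, "data": None}
         for _ in range(NUM_LINES)]
memory_writes = []   # block numbers written back to memory, in order

def store_write_back(addr, value):
    """Perform one store with write allocate; return True on a hit."""
    block = addr // BLOCK_BYTES
    tag, index = block // NUM_LINES, block % NUM_LINES
    word = (addr % BLOCK_BYTES) // WORD_BYTES
    line = cache[index]
    hit = line["valid"] and line["tag"] == tag
    if not hit:
        if line["valid"] and line["dirty"]:
            # Evicting a dirty block: memory is stale, so write it back first.
            victim = line["tag"] * NUM_LINES + index
            memory[victim] = list(line["data"])
            memory_writes.append(victim)
        # Write allocate: fetch the requested block into the cache.
        line.update(valid=True, dirty=False, tag=tag, data=list(memory[block]))
    line["data"][word] = value   # update only the cached copy
    line["dirty"] = True         # memory now differs from the cache
    return hit

stores = [(0b000000, 333), (0b000100, 444), (0b010000, 555), (0b011100, 666)]
results = [store_write_back(addr, value) for addr, value in stores]
print(results)          # [False, True, False, False]
print(memory_writes)    # [0]
print(memory[0], memory[2], memory[3])   # [333, 444] [140, 150] [160, 170]
```

Four stores cause a single memory write here, compared with four under write through; the cost is that memory blocks 010 and 011 stay out of date until their cache entries are evicted.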
Cache Hierarchy
• Time to get a block from memory is so long that
performance suffers even with a low miss rate
Pipeline with a Cache Hierarchy
[Figure: pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) with an L1 instruction cache (KB) and an L1 data cache (KB), both backed by an L2 cache (MB)]
Cache Hierarchy
• Level 1 (L1) instruction and data caches
– Small, but very fast
• Level 2 (L2) cache handles L1 misses
– Larger and slower than L1, but much faster than main memory
– L1 data are also present in L2
• Main memory handles L2 cache misses
How Do We Measure Performance?
• Execution time: The time between the start and
completion of a program (or task)
CPU Execution Time
• Amount of time the CPU takes to run a program
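• Derivation
– CPU time = (instructions / program) × (cycles / instruction) × (seconds / cycle)
– CPU time = I × CPI × CT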
Instruction Count (I)
• Total number of instructions in the given
program
• Factors
– Instruction set
– Mix of instructions chosen by the compiler
Cycle Time (CT)
• Clock period (1/frequency)
• Factors
– Instruction set
– Structure of the processor and memory hierarchy
Cycles Per Instruction (CPI)
• Average number of cycles required to execute
each instruction
• Factors
– Instruction set
– Mix of instructions chosen by the compiler
– Ordering of the instructions by the compiler
– Structure of the processor and memory hierarchy
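The three factors defined above (I, CT, CPI) multiply to give CPU execution time. The sketch below shows how they trade off; the function name `cpu_time` and all numeric values are hypothetical and chosen only for illustration.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU execution time in seconds: I x CPI x CT."""
    return instruction_count * cpi * cycle_time_s

# Hypothetical design A: 1 ns cycle time, CPI of 1.5.
design_a = cpu_time(1_000_000, 1.5, 1.0e-9)

# Hypothetical design B: faster clock (0.8 ns) but a higher CPI of 2.0,
# for example because the shorter cycle adds stall cycles.
design_b = cpu_time(1_000_000, 2.0, 0.8e-9)

print(f"A: {design_a * 1e3:.2f} ms, B: {design_b * 1e3:.2f} ms")   # A: 1.50 ms, B: 1.60 ms
```

With these made-up numbers the design with the faster clock is slower overall, which is why clock frequency alone does not determine performance.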
Processor Organization
Impact on CPI (Example 1)
                 CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
ADD  R1,R2,R3    IM   Reg  ALU  DM   Reg
OR   R4,R1,R3         IM   Reg  ALU  DM   Reg
SUB  R5,R2,R1              IM   Reg  ALU  DM   Reg
AND  R6,R1,R2                   IM   Reg  ALU  DM   Reg
ADDI R7,R7,3                         IM   Reg  ALU  DM   Reg
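The cycle count in this example can be checked with a short calculation. The sketch assumes a 5-stage pipeline (IM, Reg, ALU, DM, Reg) that issues one instruction per cycle with no stalls, which is what a CC1 to CC9 span for five instructions suggests; the function name `pipeline_cycles` is hypothetical.

```python
def pipeline_cycles(num_instructions, num_stages=5, stall_cycles=0):
    """Cycles to finish a sequence on an in-order pipeline.

    The first instruction needs num_stages cycles; each later instruction
    completes one cycle after its predecessor, plus any stall cycles.
    """
    return num_stages + (num_instructions - 1) + stall_cycles

cycles = pipeline_cycles(5)
print(cycles, cycles / 5)   # 9 1.8

# The pipeline fill cost is amortized over long sequences, so CPI approaches 1.
long_run_cpi = pipeline_cycles(1_000_000) / 1_000_000
print(f"{long_run_cpi:.6f}")   # 1.000004
```

Any stall cycles added by hazards or cache misses push the CPI above this ideal value.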
Processor Organization
Impact on CPI (Example 2)
[Figure: pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) with a control unit (CU) and a branch comparison (=?)]
[Pipeline diagram: each instruction below passes through IM, Reg, ALU, DM, Reg]
    BEQ  R2,R3,X
    NOP               (branch delay slot; ADDI R7,R7,3 takes its place)
    OR   R4,R1,R3
    SUB  R5,R2,R1
X:  AND  R6,R1,R2
Filling the branch delay slot with a useful instruction (ADDI R7,R7,3)
A Rough Breakdown of CPI
• CPI_base is the base CPI in an ideal scenario where
instruction fetches and data memory accesses incur no
extra delay
Impact of L1 Caches
• With L1 caches
– L1 instruction cache miss rate = 2%
– L1 data cache miss rate = 4%
– Miss penalty = 100 cycles (access main memory)
– 20% of all instructions are loads, 10% are stores
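The parameters above can be turned into extra cycles per instruction. This is a sketch under assumptions, not the lecture's worked answer: it takes CPI as a base CPI plus memory stall cycles per instruction, counts both loads and stores as data cache accesses, and uses a base CPI of 1.0 purely as a placeholder. The function name `memory_stall_cpi` is hypothetical.

```python
def memory_stall_cpi(i_miss_rate, d_miss_rate, data_access_fraction, miss_penalty):
    """Average memory stall cycles per instruction with L1 caches only."""
    # Every instruction is fetched, so the instruction cache is accessed once each.
    instr_stalls = i_miss_rate * miss_penalty
    # Only loads and stores access the data cache.
    data_stalls = data_access_fraction * d_miss_rate * miss_penalty
    return instr_stalls + data_stalls

CPI_BASE = 1.0   # assumption: the base CPI is not given in the text above

stalls = memory_stall_cpi(
    i_miss_rate=0.02,
    d_miss_rate=0.04,
    data_access_fraction=0.20 + 0.10,   # loads + stores
    miss_penalty=100,
)
print(f"stall cycles per instruction: {stalls:.2f}")        # 3.20
print(f"CPI with CPI_BASE = 1.0: {CPI_BASE + stalls:.2f}")  # 4.20
```

Under these assumptions, miss rates of only a few percent add more than three cycles to every instruction, because each miss costs 100 cycles.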
Impact of L1+L2 Caches
• With L1 and L2 caches
– L1 instruction cache miss rate = 2%
– L1 data cache miss rate = 4%
– L2 access time = 15 cycles
– L2 miss rate = 25%
– L2 miss penalty = 100 cycles (access main memory)
– 20% of all instructions are loads, 10% are stores
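The same calculation can be extended to two cache levels. The sketch assumes that every L1 miss pays the L2 access time, that the 25% L2 miss rate is local (a fraction of the accesses that reach L2), and that an L2 miss adds the 100-cycle penalty on top of the L2 access time. These are common conventions but they are assumptions here, and the function name `two_level_stall_cpi` is hypothetical.

```python
def two_level_stall_cpi(i_miss_rate, d_miss_rate, data_access_fraction,
                        l2_access_time, l2_miss_rate, l2_miss_penalty):
    """Average memory stall cycles per instruction with L1 and L2 caches."""
    # L1 misses per instruction, from instruction fetches and data accesses.
    l1_misses = i_miss_rate + data_access_fraction * d_miss_rate
    # Average cost of one L1 miss: always the L2 access, sometimes main memory.
    cost_per_l1_miss = l2_access_time + l2_miss_rate * l2_miss_penalty
    return l1_misses * cost_per_l1_miss

with_l2 = two_level_stall_cpi(0.02, 0.04, 0.20 + 0.10, 15, 0.25, 100)

# Same L1 miss rates with no L2: every L1 miss pays the 100-cycle penalty.
without_l2 = (0.02 + (0.20 + 0.10) * 0.04) * 100

print(f"with L2: {with_l2:.2f}, without L2: {without_l2:.2f}")   # with L2: 1.28, without L2: 3.20
```

Under these assumptions the L2 cache cuts the average cost of an L1 miss from 100 cycles to 40, which reduces the memory stall component from 3.20 to 1.28 cycles per instruction.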
Before Next Class
• H&H 8.4
Next Time
Virtual Memory