Recitation05 Cachelab
Cache Concepts
Cache Organization
E = 2^e lines per set
“line” or “block”
S = 2^s sets
Activity 1: Traces
Carnegie Mellon
Tracing a Cache
Example Cache: -s 1 -E 2 -b 2 (S=2 B=4)
[Figure: the cache drawn as 2 sets with 2 lines each (columns labeled E=0 and E=1), initially empty]
Example Trace
Trace operations: L - Load, S - Store
Trace line format: operation address,size (address = memory location)
Memory
Address  0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
Value    15   21   13   18   51   30   ac   b3
Jack.trace, one access per step (M = miss, H = hit)
L 0,4  M   The block 15 21 13 18 (addresses 0x00 to 0x03) is loaded into the cache.
           Why that line? Where are those values from?
S 0,4  H   What happens if values change?
L 0,1  H   Why is this still a hit? What would happen if we had not previously loaded all four bytes?
L 6,1  M   Just one byte (ac)? NO! The whole block 51 30 ac b3 (addresses 0x04 to 0x07) is loaded, in the row below the first block.
           Why below and not above? Why load all four bytes?
L 5,1  H
L 6,1  H
L 7,1  H
Cache after Jack.trace (one row per set):
15 21 13 18
51 30 ac b3
Example Trace
Jack2.trace
L 8,4  M   What would happen if we loaded from memory address 0x08?
Cache after Jack2.trace (one row per set):
15 21 13 18   de ad be ef
51 30 ac b3
Activity 2: Blocking
Example: Matrix Multiplication
/* multiply 4x4 matrices: c += a * b (the caller initializes c) */
void mm(int a[4][4], int b[4][4], int c[4][4]) {
    int i, j, k;
    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            for (k = 0; k < 4; k++)
                c[i][j] += a[i][k] * b[k][j];
}
c = a x b
Key: Grey = accessed, Dark grey = currently accessing, Red border = in cache
iter  i j k  operation                        a  b
0     0 0 0  c[0][0] += a[0][0] * b[0][0]     M  M
1     0 0 1  c[0][0] += a[0][1] * b[1][0]     H  M
2     0 0 2  c[0][0] += a[0][2] * b[2][0]     M  M
3     0 0 3  c[0][0] += a[0][3] * b[3][0]     H  M
4     0 1 0  c[0][1] += a[0][0] * b[0][1]     M  M
What is the miss rate of a?
What is the miss rate of b?
Example: Matrix Multiplication (blocking)
/* multiply 4x4 matrices using blocks of size 2 */
void mm_blocking(int a[4][4], int b[4][4], int c[4][4]) {
    int i, j, k;
    int i_c, j_c, k_c;
    int B = 2;
    // control loops: pick one 2x2 block at a time
    for (i_c = 0; i_c < 4; i_c += B)
        for (j_c = 0; j_c < 4; j_c += B)
            for (k_c = 0; k_c < 4; k_c += B)
                // block multiplications
                for (i = i_c; i < i_c + B; i++)
                    for (j = j_c; j < j_c + B; j++)
                        for (k = k_c; k < k_c + B; k++)
                            c[i][j] += a[i][k] * b[k][j];
}
c = a x b
Key: Grey = accessed, Dark grey = currently accessing, Red border = in cache
iter  i j k  operation                        a  b
0     0 0 0  c[0][0] += a[0][0] * b[0][0]     M  M
1     0 0 1  c[0][0] += a[0][1] * b[1][0]     H  M
2     0 1 0  c[0][1] += a[0][0] * b[0][1]     H  H
3     0 1 1  c[0][1] += a[0][1] * b[1][1]     H  H
4     1 0 0  c[1][0] += a[1][0] * b[0][0]     M  H
...
8     0 0 2  c[0][0] += a[0][2] * b[2][0]     M  M
9     0 0 3  c[0][0] += a[0][3] * b[3][0]     H  M
10    0 1 2  c[0][1] += a[0][2] * b[2][1]     H  H
11    0 1 3  c[0][1] += a[0][3] * b[3][1]     H  H
12    1 0 2  c[1][0] += a[1][2] * b[2][0]     M  H
13    1 0 3  c[1][0] += a[1][3] * b[3][0]     H  H
14    1 1 2  c[1][1] += a[1][2] * b[2][1]     H  H
What is the miss rate of a?
What is the miss rate of b?
Practice Problems
Class Question / Discussions
■ We’ll work through a series of questions
■ Write down your answer for each question
■ You can discuss with your classmates
What Type of Locality?
• The following function exhibits which type of locality? Consider only array accesses.
void who(int *arr, int size) {
    for (int i = 0; i < size-1; ++i)
        arr[i] = arr[i+1];
}
A. Spatial
B. Temporal
C. Both A and B
D. Neither A nor B
What Type of Locality?
• The following function exhibits which type of locality? Consider only array accesses.
void coo(int *arr, int size) {
    for (int i = size-2; i >= 0; --i)
        arr[i] = arr[i+1];
}
A. Spatial
B. Temporal
C. Both A and B
D. Neither A nor B
Calculating Cache Parameters
• Given the following address partition, how many int values will fit in a single data block?
Address (bits 31 to 0): Tag (18 bits) | Set index (10 bits) | Block offset (4 bits)
# of int in block:
A. 0
B. 1
C. 2
D. 4
E. Unknown: We need more info
Direct-Mapped Cache Example
• Assuming a 32-bit address (i.e. m=32), how many bits are used for tag (t), set index (s), and block offset (b)?
[Figure: direct-mapped cache with 8 bytes per data block; the address (bits 31 to 0) is split into Tag | Set index | Block offset, with the bit counts left blank]
Which Set Is it?
• Which set is the address 0xFA1C located in?
[Figure: direct-mapped cache with 8 bytes per data block; the address (bits 31 to 0) is split into Tag | Set index | Block offset]
D. 3
E. More than one of the above
Cache Block Range
• What range of addresses will be in the same block as address 0xFA1C?
[Figure: direct-mapped cache with four sets (Set 0 to Set 3), each with Valid, Tag, and Cache block fields; 8 bytes per data block]
Addr. Range
A. 0xFA1C
B. 0xFA1C – 0xFA23
Cache Misses
If N = 16, how many bytes of a does the loop access?
int foo(int* a, int N)
{
    int i;
    int sum = 0;
    for(i = 0; i < N; i++)
    {
        sum += a[i];
    }
    return sum;
}
Accessed Bytes
A. 4
B. 16
C. 64
D. 256
Cache Misses
Consider a 32 KB cache in a 32-bit address space. The cache is 8-way associative
and has 64 bytes per block. An LRU (Least Recently Used) replacement policy is used.
What is the miss rate on ‘pass 1’?
}
}
Detailed explanation in Appendix!
Appendix: C Programming Style
• Properly document your code
• Function + File header comments, overall operation of large blocks, any tricky bits
• Write robust code – check error and failure conditions
• Write modular code
• Use interfaces for data structures, e.g. create/insert/remove/free functions for a
linked list
• No magic numbers – use #define or static const
• Formatting
• 80 characters per line (use Autolab’s highlight feature to double-check)
• Consistent braces and whitespace
• No memory or file descriptor leaks
Appendix: Git Usage
• Commit early and often!
• At minimum at every major milestone
• Commits don’t cost anything!
Appendix: fscanf()
• Arguments
1. A stream pointer, e.g. from fopen()
2. Format string for parsing, e.g. "%c %d,%d"
3+. Pointers to variables for parsed data
• Can be pointers to stack variables
• Return Value
• Success: # of parsed vars (fewer than requested if the input stops matching the format)
• Failure: EOF
• man fscanf
Appendix: fscanf() Example
FILE *pFile;
pFile = fopen("trace.txt", "r"); // Open file for reading
char access_type;
unsigned long address;
int size;
Appendix: Cache Misses
If there is a 48KB cache with 8 bytes per block and 3 cache lines
per set, how many misses if foo is called twice? N still equals 16.
NOTE: This is a contrived example since the number of cache lines must be a power of 2.
However, it still demonstrates an important point.
int foo(int* a, int N)
{
    int i;
    int sum = 0;
    for(i = 0; i < N; i++)
    {
        sum += a[i];
    }
    return sum;
}
Misses
A. 0
B. 8
C. 12
D. 14
E. 16
Appendix: Very Hard Cache Problem
• We will use a direct-mapped cache with 2 sets, which each can
hold up to 4 int’s.
• How can we copy A into B, shifted over by 1 position?
• The most efficient way? (Use temp!)
A    0 1 2 3 4 5 6 7
B    0 1 2 3 4 5 6 7
temp 0 1 2 3 4 5 6 7
Number of misses:
Appendix: 48KB Cache Explained (1)
The "m" denotes a miss, and the "h" denotes a hit. This pattern will repeat for the entirety of the array.
We can be sure that the second access is always a hit, because the first access loads the entire 64-byte block into the cache (the entire block is always loaded when any of its elements is accessed).
So, the big question is why the first access is always a miss. To answer this, we need a few facts about the cache.
First, the cache has 32KB / (8 lines per set * 64 bytes per block) = 64 sets, so s, the number of set index bits, is 6. Since each set maps to 64 bytes (as there are b = 6 block offset bits), every 64 * 64 bytes = 4 kilobytes we run out of sets.
Clearly, this pattern will repeat for the entirety of the array.
Appendix: 48KB Cache Explained (2)
However, note that we have E = 8 lines per set. That means that even though the next 4KB map to the same sets (0-63) as the first
4KB, they will just be put in another line in the cache, until we run out of lines (i.e., after we've gone through 8 * 4KB = 32KB
of memory). Splitting up the bigArr into 16KB chunks (sections A, B, and C):
We see that section A will take up 16KB = 4 * 4KB; like we said, each of those 4KB chunks will take up 1 line each, so section A
uses 4 lines per set (and uses all 64 sets).
Similarly, section B also takes up 16KB = 4 * 4KB; again, each of those 4KB chunks will take up 1 line each, so section B also
uses 4 lines per set (and uses all 64 sets).
Note that as all of this data is being loaded in, our cache is still cold (does not contain any data from those sections), so the
previous assumption about the first of every other access missing (the "m" above) is still true.
line  0 1 2 3 4 5 6 7
     +-------+-------+
   0 |       |       |
   1 |       |       |
s  . .       .       .
e  . .   C   .   B   .
t  . .       .       .
  62 |       |       |
  63 |       |       |
     +-------+-------+
Thus, we know now that the miss rate for the first pass is 50%.
Appendix: 48KB Cache Explained (4)
If we now consider the second pass, we're starting over at the beginning of bigArr (i.e., now we're reading section A). However,
there's a problem - section A isn't in the cache anymore! So we get a bunch of evictions (the "m" assumption is still true, of
course, since these evictions must also be misses). What are we evicting? The least-recently used lines, which are now lines 4-7
(holding data from B). Thus, the cache after reading section A looks like:
line  0 1 2 3 4 5 6 7
     +-------+-------+
   0 |       |       |
   1 |       |       |
s  . .       .       .
e  . .   C   .   A   .
t  . .       .       .
  62 |       |       |
  63 |       |       |
     +-------+-------+
Then, we access B. But it isn't in the cache either! So we evict the least-recently-used lines (in this case, the lines that were
holding section C, 0-3) (the "m" assumption still holds); afterwards, the cache looks like:
line  0 1 2 3 4 5 6 7
     +-------+-------+
   0 |       |       |
   1 |       |       |
s  . .       .       .
e  . .   B   .   A   .
t  . .       .       .
  62 |       |       |
  63 |       |       |
     +-------+-------+
Appendix: 48KB Cache Explained (5)
And finally, we access section C. But of course, its data isn't in the cache at all, so we again evict the least-recently used
lines (in this case, section A's lines, 4-7) (again, "m" assumption holds):
line  0 1 2 3 4 5 6 7
     +-------+-------+
   0 |       |       |
   1 |       |       |
s  . .       .       .
e  . .   B   .   C   .
t  . .       .       .
  62 |       |       |
  63 |       |       |
     +-------+-------+
And so the miss rate is 50% for the second pass as well.
Thank you to Stan Zhang for coming up with such a detailed explanation!