Tutorial08 Solution
S. Wallentowitz
a) sequence 1
1: (P1) read 410
2: (P2) read 410
3: (P0) read 430
[Figure: cache states (M/S/I) of caches 0 to 3 and main memory contents for addresses 400 to 438]
Actions:
1: read miss, write back
2: write back and load
3: replace
Cost of steps 1 to 3: 200 cycles
a) sequence 2
1: (P0) write 420, 42
2: (P2) read 424
3: (P2) write 424, 23
[Figure: cache states (M/S/I) of caches 0 to 3 and main memory contents for addresses 400 to 438]
Actions:
1: write miss, snoop WM
2: read miss, write back
3: invalidate
Cost of steps 1 to 3: 124 cycles
a) sequence 3
1: (P0) write 420, 42
2: (P2) read 424
3: (P2) write 424, 23
Self Study
[Figure: cache states (M/S/I) of caches 0 to 3 and main memory contents for addresses 400 to 438]
8.1 b)
To reduce external memory accesses, an owner state (O) is added to the cache coherency protocol. On a write, all other cache entries are invalidated (write-invalidate). On a read access by another cache, the current owner supplies the data instead of the memory. Sketch the modified state diagram of the MOSI protocol.
[Figure: MSI state diagram. States: Invalid, Shared, Modified. Transition labels: CPU Write Miss (Place write miss on bus); Read Miss; Hit. Legend: events, cache actions, bus-triggered transitions.]
MOSI
[Figure: MOSI state diagram. States: Invalid, Shared, Modified, Owner. Transition labels: Invalidate; Write Miss; Read Hit; Write Miss (Write Back); Read Miss; Write (Invalidate); Read Miss (Provide Data); Read/Write Hit.]
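The MOSI behaviour asked for in b) can also be written down as a transition table. The following is a minimal sketch for a single cache line, assuming a common textbook variant of write-invalidate MOSI; the table contents, the names `CPU`, `SNOOP` and `cpu_access`, and the choice that an owner writes back on a snooped write miss are assumptions for illustration and are not read off the original diagram.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    O = "Owner"
    S = "Shared"
    I = "Invalid"


# Assumed CPU-side transitions: (state, cpu_op) -> (next_state, bus_message)
CPU = {
    (State.I, "read"): (State.S, "read miss"),
    (State.I, "write"): (State.M, "write miss"),
    (State.S, "read"): (State.S, None),           # read hit
    (State.S, "write"): (State.M, "invalidate"),  # upgrade, others invalidate
    (State.O, "read"): (State.O, None),           # read hit
    (State.O, "write"): (State.M, "invalidate"),
    (State.M, "read"): (State.M, None),           # read/write hit
    (State.M, "write"): (State.M, None),
}

# Assumed bus-side transitions: (state, bus_message) -> (next_state, action)
SNOOP = {
    (State.M, "read miss"): (State.O, "provide data"),
    (State.O, "read miss"): (State.O, "provide data"),
    (State.M, "write miss"): (State.I, "write back"),
    (State.O, "write miss"): (State.I, "write back"),  # assumption
    (State.S, "write miss"): (State.I, None),
    (State.S, "invalidate"): (State.I, None),
    (State.O, "invalidate"): (State.I, None),
}


def cpu_access(states, cpu, op):
    """Apply one CPU access to the line; return [(cache, action), ...]."""
    next_state, message = CPU[(states[cpu], op)]
    log = []
    if message is not None:
        for other in range(len(states)):
            if other == cpu:
                continue
            # Combinations missing from SNOOP leave the snooping cache unchanged.
            new_state, action = SNOOP.get((states[other], message),
                                          (states[other], None))
            states[other] = new_state
            if action is not None:
                log.append((other, action))
    states[cpu] = next_state
    return log


states = [State.I] * 4                # one line, four caches
cpu_access(states, 0, "write")        # P0 write: I -> M
log = cpu_access(states, 2, "read")   # P2 read: P0 goes M -> O and provides data
cpu_access(states, 2, "write")        # P2 write: P0 goes O -> I, P2 goes S -> M
```

Under these assumptions, the three accesses follow the pattern "write miss", "provide data", "invalidate", with cache 0 passing through M, O and I.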
c) sequence 1
1: (P1) read 410
2: (P2) read 410
3: (P0) read 430
[Figure: cache states (M/O/S/I) of caches 0 to 3 and main memory contents for addresses 400 to 438]
Actions:
1: read miss
2: write back and load
3: replace
Cost of steps 1 to 3: 152 cycles
c) sequence 2
1: (P0) write 420, 42
2: (P2) read 424
3: (P2) write 424, 23
[Figure: cache states (M/O/S/I) of caches 0 to 3 and main memory contents for addresses 400 to 438]
Actions:
1: write miss; snoop WM, don't provide
2: read miss, provide data
3: invalidate
Cost of steps 1 to 3: 60 cycles
a) sequence 3
1: (P0) write 420, 42
2: (P2) read 424
3: (P2) write 424, 23
[Figure: cache states (M/S/I) of caches 0 to 3 and main memory contents for addresses 400 to 438]
8.2
Read the article "Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System" by Daniel Molka et al., PACT 2009. Briefly describe the investigated architecture. What does the term ccNUMA describe? How does the information in the L3 cache relate to the other cache levels, and how precise is it? Briefly describe the executed benchmarks and the central findings of the article.
ccNUMA: cache-coherent non-uniform memory access
L3: inclusive last-level cache; its core valid bits are imprecise
0: the core definitely does not hold a copy
1: the core may hold a copy
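The two bit values can be read as a conservative snoop filter: a cleared bit rules a core out, while a set bit only means the core has to be checked. A minimal sketch, assuming one valid bit per core stored in an integer bitmask and four cores per processor; the function name `cores_to_snoop` and the bitmask layout are hypothetical and not taken from the article.

```python
def cores_to_snoop(core_valid_bits, requester, num_cores=4):
    """Return the cores that must be snooped for a line found in the L3.

    core_valid_bits: assumed bitmask, bit c set means core c may hold a copy.
    A cleared bit guarantees that the core holds no copy, so it is skipped.
    """
    return [
        core
        for core in range(num_cores)
        if core != requester and (core_valid_bits >> core) & 1
    ]


# Bits set for cores 0 and 2; core 0 is the requester, so only core 2 is
# snooped. Core 2 may turn out not to hold the line (the bit is imprecise).
to_check = cores_to_snoop(0b0101, requester=0)
```

Because a set bit only says "may hold a copy", the filter can cause an unnecessary snoop but never skips a core that actually holds the line.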