TECHNISCHE UNIVERSITÄT MÜNCHEN
Lehrstuhl für Integrierte Systeme
Chip Multicore Processors Tutorial 8
June 19, 2013
Task 8.1: Performance of Snooping-based Cache Coherency
In this task, the performance of snooping-based cache coherency is evaluated. The starting state of a system with three processor cores and their caches is depicted. Each cache entry consists of the coherency protocol state, the tag and two data words. All addresses are hexadecimal, and the tag is shown, simplified, as the base address of the cache line. The data is simplified as well. The starting state of the memory is also given.
As discussed in the previous tutorial, the timing behavior and the performance depend on the coherency implementation. The given implementation targets mobile systems and is optimized for power efficiency, so accesses to the external memory should be minimized. The system has the following properties: In case of a hit, no extra stall cycles are required. In case of a miss, N_mem = 40 cycles are required when the block is loaded from memory. If another cache currently holds the cache block, it can provide the data to the requesting cache within N_cache = 16 cycles. An invalidation delays execution by N_inv = 4 cycles. A write-back delays execution by N_wb = 40 cycles.
The following operation sequences are given:
sequence 1: (P1) read 410; (P2) read 410; (P0) read 430
sequence 2: (P0) write 420, 42; (P2) read 424; (P2) write 424, 23
sequence 3: (P0) write 408, 7; (P2) read 408; (P0) write 408, 9
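The delay model above is plain addition of per-event stall cycles. The following is a minimal sketch in Python, assuming each operation has already been broken down by hand into hypothetical event labels (`hit`, `mem`, `cache`, `inv`, `wb`, not part of the task). Deciding which events an operation triggers, and whether several invalidations in one operation count once or per cache, is left to the caller; the code only does the arithmetic.

```python
# Stall cycles per event, taken from the task description.
N_MEM, N_CACHE, N_INV, N_WB = 40, 16, 4, 40

# Hypothetical event labels (not from the task) mapped to their cost in cycles.
COST = {"hit": 0, "mem": N_MEM, "cache": N_CACHE, "inv": N_INV, "wb": N_WB}


def sequence_delay(events):
    """Return the total stall cycles for a list of event labels."""
    return sum(COST[e] for e in events)


# Illustrative mix, not a solution: one miss served from memory,
# one miss served by another cache and one invalidation.
print(sequence_delay(["mem", "cache", "inv"]))
```

Keeping the event breakdown separate from the arithmetic makes it easy to compare the MSI and MOSI variants by recounting events only.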
Nomenclature used in the sequences: (CPU) read address and (CPU) write address, value.
a) For each sequence (considered separately), give the changes of the cache entries according to the MSI protocol. Use the following tables to record the changes after each operation. Also give the total delay caused by executing each sequence.
sequence 1
Op | CPU | Index | State | Tag | Data
sequence 2
Op | CPU | Index | State | Tag | Data
sequence 3
Op | CPU | Index | State | Tag | Data
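The MSI transitions needed for part a) can be sketched for a single cache line. This is a minimal sketch of the common textbook write-invalidate MSI snooping protocol, assuming no evictions; the function name `msi_access` and the action labels `BusRd`, `BusRdX`, `BusUpgr`, `Flush` and `Inv` are illustrative, not taken from the task. It tracks states only, not data values or timing, and the example starts from a hypothetical all-Invalid state rather than the start state of the task.

```python
from typing import Dict, List, Tuple

States = Dict[str, str]  # cache name -> "M", "S" or "I" for ONE cache line


def msi_access(states: States, cpu: str, op: str) -> Tuple[States, List[str]]:
    """Apply a read ("R") or write ("W") by `cpu`; return (new states, bus actions)."""
    new = dict(states)
    mine = states.get(cpu, "I")
    others = [c for c in states if c != cpu]
    actions: List[str] = []
    if op == "R":
        if mine in ("S", "M"):
            return new, ["hit"]
        actions.append("BusRd")
        for c in others:
            if states[c] == "M":  # dirty copy is flushed (written back), keeps a shared copy
                actions.append(f"Flush({c})")
                new[c] = "S"
        new[cpu] = "S"
    elif op == "W":
        if mine == "M":
            return new, ["hit"]
        # S already holds the data and only needs invalidations; I must fetch the block.
        actions.append("BusUpgr" if mine == "S" else "BusRdX")
        for c in others:
            if states[c] == "M":  # dirty copy is flushed before being invalidated
                actions.append(f"Flush({c})")
            if states[c] != "I":
                new[c] = "I"
                actions.append(f"Inv({c})")
        new[cpu] = "M"
    else:
        raise ValueError(f"unknown op {op!r}")
    return new, actions


# Hypothetical start state: no cache holds the line.
s = {"P0": "I", "P1": "I", "P2": "I"}
for cpu, op in [("P0", "W"), ("P2", "R"), ("P0", "W")]:
    s, actions = msi_access(s, cpu, op)
    print(cpu, op, s, actions)
```

In this sketch, a read of a line that another cache holds in M always triggers a flush to memory; avoiding such external write-backs is what the owner state in part b) is meant for.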
b) To reduce external memory accesses, an owner state (O) is added to the cache coherency protocol. On a write, all other copies of the cache block are invalidated (write-invalidate). On a read access by another cache, the current owner supplies the data instead of the memory. Sketch the modified state diagram of the resulting MOSI protocol.
[State diagram template with the states Invalid, Shared, Modified and Owner]
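One common textbook variant of the MOSI transitions described in b) can be sketched for a single cache line. This is a minimal sketch, assuming no evictions (a line in O or M would need a write-back when replaced, which is not modeled); the function name `mosi_access` and the action labels are illustrative, and the diagram expected in the tutorial may differ in details. Besides the new states, it reports which party supplies the data, so the owner's role is visible.

```python
from typing import Dict, List, Optional, Tuple

States = Dict[str, str]  # cache name -> "M", "O", "S" or "I" for ONE cache line


def mosi_access(
    states: States, cpu: str, op: str
) -> Tuple[States, List[str], Optional[str]]:
    """Apply a read ("R") or write ("W") by `cpu`.

    Returns (new states, bus actions, data source), where the data source is
    the owning cache, "memory", or None if no data transfer is needed.
    """
    new = dict(states)
    mine = states.get(cpu, "I")
    others = [c for c in states if c != cpu]
    # At most one other cache holds the line in M or O: the current owner.
    owner = next((c for c in others if states[c] in ("M", "O")), None)
    if op == "R":
        if mine != "I":
            return new, ["hit"], None
        new[cpu] = "S"
        if owner is not None:
            new[owner] = "O"  # M -> O (O stays O); no write-back to memory
        return new, ["BusRd"], owner or "memory"
    if op == "W":
        if mine == "M":
            return new, ["hit"], None
        # S and O already hold valid data; I must fetch the block.
        actions = ["BusUpgr" if mine in ("S", "O") else "BusRdX"]
        for c in others:  # write-invalidate: all other copies become I
            if states[c] != "I":
                new[c] = "I"
                actions.append(f"Inv({c})")
        new[cpu] = "M"
        source = None if mine in ("S", "O") else (owner or "memory")
        return new, actions, source
    raise ValueError(f"unknown op {op!r}")


# Hypothetical start state: P0 holds a dirty copy.
s = {"P0": "M", "P1": "I", "P2": "I"}
for cpu, op in [("P2", "R"), ("P1", "R"), ("P2", "W")]:
    s, actions, source = mosi_access(s, cpu, op)
    print(cpu, op, s, actions, source)
```

Compared with the MSI sketch, a read of a dirty line moves the holder from M to O and serves the data cache-to-cache, so memory is not written until the owner eventually evicts the line.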
c) Repeat the procedure of part a) for the MOSI protocol, using the following tables.
sequence 1
Op | CPU | Index | State | Tag | Data
sequence 2
Op | CPU | Index | State | Tag | Data
sequence 3
Op | CPU | Index | State | Tag | Data
Task 8.2: Cache Coherency Example: Intel Nehalem
Read the article "Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System" by Daniel Molka et al., PACT 2009. Briefly describe the investigated architecture. What is meant by the term ccNUMA? How does the information in the level 3 cache relate to the other cache levels, and how precise is it? Briefly describe the executed benchmarks and the central findings of the article.