4-Module #4-Shared-Memory-Students-Version-Final-October-24-2024
Introduction
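o A shared memory system consists of processors (P) and memory modules (M) connected by an interconnection network, as shown in the Figure.
o In a shared memory system, all inter-processor coordination and synchronization are accomplished via the global memory.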
[Figure: memory modules (M) connected to processors (P) through an interconnection network]
o An arbitration unit within the memory module passes requests through to a memory controller.
o If the memory module is not busy and a single request arrives, then the arbitration unit passes that request to the memory and the request is
satisfied.
o If the arbitration unit receives two requests, it selects one of them and passes it to the memory controller.
o The module is placed in the busy state while a request is being serviced.
o If a new request arrives while the memory is busy servicing a previous request, the memory module sends a wait signal, through the memory
controller, to the processor making the new request.
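The arbitration behaviour described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class name MemoryModule, the fixed service_time, and the tie-breaking rule (lowest processor id wins) are hypothetical and not part of the slides, and the request that loses arbitration is assumed to receive a wait signal as well.

```python
class MemoryModule:
    """Toy model of one memory module with its arbitration unit (hypothetical)."""

    def __init__(self, service_time=2):
        self.service_time = service_time  # cycles one request keeps the module busy (assumed)
        self.busy_until = 0               # first cycle at which the module is idle again

    def cycle(self, now, requests):
        """Present the processor ids requesting at cycle `now`.

        Returns (granted, waiting): the id passed on to the memory controller
        (or None) and the ids that are sent a wait signal.
        """
        if not requests:
            return None, []
        if now < self.busy_until:
            # Module is busy servicing an earlier request: all newcomers wait.
            return None, list(requests)
        # Module is idle: the arbiter selects one request (assumed rule: lowest id).
        granted = min(requests)
        self.busy_until = now + self.service_time
        waiting = [p for p in requests if p != granted]
        return granted, waiting


module = MemoryModule(service_time=2)
print(module.cycle(0, [1]))     # single request, idle module
print(module.cycle(1, [2]))     # request arrives while the module is busy
print(module.cycle(2, [3, 2]))  # two requests arrive at an idle module
```

The three calls correspond to the three cases in the bullets: a lone request is satisfied, a request to a busy module is told to wait, and one of two simultaneous requests is selected.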
o Based on the interconnection network used, shared memory systems can be classified into the following categories:
✓ Uniform Memory Access (UMA)
o In a UMA shared memory system, all processors have equal access time to any memory location.
o The interconnection network used in the UMA can be a single bus, multiple buses, or a crossbar switch.
o Because access to shared memory is balanced, these systems are also called SMP (Symmetric Multiprocessor) systems.
o Each processor has equal opportunity to read/write to memory, including equal access speed.
o Commercial examples of SMPs are Sun Microsystems multiprocessor servers and Silicon Graphics Inc. multiprocessor servers.
o A typical bus-structured SMP computer is shown in the Figure.
[Figure: memory modules (M) on a shared bus, with processors (P) attached to the bus through caches (C)]
Classification of Shared Memory Systems
✓ Non-uniform memory access (NUMA)
o In this architecture, each processor has part of the shared memory attached.
o The memory has a single address space.
o Any processor can access any memory location directly using its real address.
o The access time to a module depends on its distance from the processor, i.e., the memory access time is non-uniform.
o A number of interconnection architectures, such as tree networks, are used in NUMA systems.
o Examples of NUMA architectures are the BBN TC-2000, SGI Origin 3000, and Cray T3E. Figure 4.4 shows the NUMA system organization.
o Example 1
▪ Suppose a shared memory system is constructed from processors that can execute V = 110 instructions/second and the
processor duty cycle I = 1.
▪ The caches are designed to support a hit rate of 97%, and the bus supports a peak bandwidth of B = 106 cycles/second.
▪ Then, (1 – h) = 0.03, and the maximum number of processors N satisfies N ≤ 106/(0.03 × 110) ≈ 32.1.
▪ Thus, the system we have in mind can support only 32 processors!
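The arithmetic of Example 1 can be checked with a few lines of code. This is a minimal sketch that assumes the bound N × (1 – h) × V ≤ B × I, which is the relation implied by the numbers in the example; the function name max_processors is hypothetical.

```python
import math


def max_processors(B, V, h, I=1.0):
    """Largest integer N with N * (1 - h) * V <= B * I (assumed bus-saturation bound).

    B: peak bus bandwidth, V: processor speed, h: cache hit rate,
    I: processor duty cycle.
    """
    miss_traffic_per_processor = (1.0 - h) * V
    return math.floor(B * I / miss_traffic_per_processor)


# Values taken from Example 1.
print(max_processors(B=106, V=110, h=0.97))  # 32
```

Raising the hit rate h shrinks the per-processor bus traffic, so the same bus can serve more processors.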
Cache Incoherence Problem
Single-processor case: caching
Hit: the requested data is in the cache
2. Transaction Serialization
Reads/Writes to a single memory location must be seen by all processors in the same order.
Thus, if location X received two different values A and B, in this order, from any two processors, the processors can never read
location X as B and then read it as A. The location X must be seen with values A and B in that order.
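A small check makes the serialization rule concrete. This is a sketch under the assumption that the values written to X are distinct, so each observed value can be mapped back to its position in the write order; the function name respects_write_order is hypothetical.

```python
def respects_write_order(write_order, observed):
    """Return True if a processor's successive reads of X never go back in time.

    write_order: values written to X, in the order the writes were serialized.
    observed:    values one processor read from X, in program order.
    """
    position = {value: i for i, value in enumerate(write_order)}
    latest = -1
    for value in observed:
        if position[value] < latest:
            return False  # an older value was seen after a newer one
        latest = position[value]
    return True


print(respects_write_order(["A", "B"], ["A", "A", "B"]))  # True
print(respects_write_order(["A", "B"], ["B", "A"]))       # False
```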
Example: Consider that more than one processor has cached a copy of the memory location X.
1. If processor P1 reads location X after writing to X, and no other processor writes to X between P1's write and read, the read must always return the value written by P1.
2. If processor P1 reads location X after another processor P2 writes to X, with no other writes to X by any processor between the two accesses and with the read and write sufficiently separated, the read must always return the value written by P2.
[Figure: processors P1, P2, P3, ..., Pn, several of which hold a cached copy of x]
- Multiple copies of x
- What if P1 updates x?
Cache Coherence Policies
✓ Four cases for handling writes to the cache in the n-processor case
• Write Update - Write Through
• Write Update - Write Back
• Write Invalidate - Write Through
• Write Invalidate - Write Back
✓ Cache-Memory Coherence
Illustration of Write-Through and Write-Back in a single processor
                                 Write Through       Write Back
Serial  Event                    Memory   Cache      Memory   Cache
1                                X                   X
2       P reads X                X        X          X        X
3       P updates (writes) X     X’       X’         X        X’
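The difference between the two policies in the single-processor table can be reproduced with a short trace. This is a minimal sketch; the function name trace_single_processor and the string labels are hypothetical, and write-back is modelled only up to the update (the later write to memory on block replacement is not shown).

```python
def trace_single_processor(policy):
    """Trace (event, memory, cache) for X under "write-through" or "write-back"."""
    if policy not in ("write-through", "write-back"):
        raise ValueError("unknown policy: " + policy)

    memory, cache = "X", None
    rows = [("initial", memory, cache)]

    # P reads X: the block is copied from memory into the cache.
    cache = memory
    rows.append(("P reads X", memory, cache))

    # P updates X: the cache always receives the new value ...
    cache = "X'"
    # ... but memory is updated immediately only under write-through.
    if policy == "write-through":
        memory = cache
    rows.append(("P updates X", memory, cache))
    return rows


for policy in ("write-through", "write-back"):
    print(policy, trace_single_processor(policy))
```

After the update, memory and cache agree under write-through, while under write-back memory still holds the old value until the block is written back.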
Cache Coherence Problem
✓ Cache – Cache Coherence
o When a task running on processor P requests the data in global memory location X, the contents of X are copied to
processor P’s local cache.
o Suppose processor Q also accesses X and wants to write a new value to X.
o There are two fundamental cache coherence policies:
(1) write-invalidate, and
(2) write-update.
o Write-invalidate maintains consistency by reading from local caches until a write occurs.
o When any processor updates the value of X through a write, posting a dirty bit for X invalidates all other copies.
o Processor Q invalidates all other copies of X when it writes a new value into its cache. This sets the dirty bit for X.
o Q can continue to change X without further notifications to other caches because Q has the only valid copy of X.
o When processor P wants to read X, it must wait until X is updated and the dirty bit is cleared.
[Figure: write-invalidate with processors P1, P2, P3; after x is updated to x’, the other cached copy is marked invalid (I)]
Write-Update
[Figure: write-update with processors P1, P2, P3; after x is updated to x’, the cached copies hold x’]
✓ Write Invalidate Write Through
o Multiple processors can read block copies from main memory safely until one processor
updates its copy.
o At this time, all cache copies are invalidated and the memory is updated to remain consistent.
State Description
Write Through- Write Invalidate (cont.)
Event        Actions
Read Miss    Fetch a copy from global memory. Set the state of this copy to Valid.
Write Hit    Perform the write locally. Broadcast an Invalid command to all caches. Update the global memory.
Write Miss   Get a copy from global memory. Broadcast an Invalid command to all caches. Update the global memory. Update the local copy and set its state to Valid.
Replace      Since memory is always consistent, no write back is needed when a block is replaced.
Example 1
Initially X = 5 in memory (M); processors P and Q each have a cache (C).
1. P reads X
2. Q reads X
3. Q updates X, X=10
4. Q reads X
5. Q updates X, X=15
6. P updates X, X=20
7. Q reads X
• The Table shows the contents of memory and the two caches after the execution of each operation when Write Invalidate Write
Through was used for cache coherence.
• The table also shows the state of the block containing X in P’s cache and Q’s cache.
                                    Memory    P’s Cache         Q’s Cache
Serial  Event                       X         X      State      X      State
0       Original value              5
1       P reads X (Read Miss)       5         5      VALID
2       Q reads X (Read Miss)       5         5      VALID      5      VALID
3       Q updates X (Write Hit)     10        5      INV        10     VALID
4       Q reads X (Read Hit)        10        5      INV        10     VALID
5       Q updates X (Write Hit)     15        5      INV        15     VALID
6       P updates X (Write Miss)    20        20     VALID      15     INV
7       Q reads X (Read Miss)       20        20     VALID      20     VALID
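The table above can be reproduced by coding the Write Invalidate Write Through event rules for a single location X. This is a sketch, not a reference implementation: the class WriteThroughInvalidate and the function wtwi_trace are hypothetical names, a read hit is assumed to simply use the local copy, and block replacement is not modelled.

```python
class WriteThroughInvalidate:
    """One memory location X shared by several caches (hypothetical model)."""

    def __init__(self, initial, names):
        self.memory = initial
        self.cache = {n: None for n in names}  # cached value of X, None if never cached
        self.state = {n: None for n in names}  # "VALID", "INV", or None if never cached

    def read(self, p):
        if self.state[p] == "VALID":
            return "Read Hit"                  # assumed: use the local copy
        self.cache[p] = self.memory            # fetch a copy from global memory
        self.state[p] = "VALID"
        return "Read Miss"

    def write(self, p, value):
        event = "Write Hit" if self.state[p] == "VALID" else "Write Miss"
        for q in self.cache:                   # broadcast Invalid to the other caches
            if q != p and self.state[q] is not None:
                self.state[q] = "INV"
        self.memory = value                    # write through to global memory
        self.cache[p] = value                  # update the local copy
        self.state[p] = "VALID"
        return event


def wtwi_trace():
    """Run the seven operations of the example; one row per operation."""
    system = WriteThroughInvalidate(5, ["P", "Q"])
    ops = [("P", None), ("Q", None), ("Q", 10), ("Q", None),
           ("Q", 15), ("P", 20), ("Q", None)]  # None means "read X"
    rows = []
    for proc, value in ops:
        event = system.read(proc) if value is None else system.write(proc, value)
        rows.append((event, system.memory,
                     system.cache["P"], system.state["P"],
                     system.cache["Q"], system.state["Q"]))
    return rows


for serial, row in enumerate(wtwi_trace(), start=1):
    print(serial, row)
```

Each printed row lists the event type, the memory value, and the value and state of X in P's and Q's caches, in the same column order as the table.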
✓ Write Back- Write Invalidate (ownership)
o A valid block can be owned by memory and shared in multiple caches that can
contain only the shared copies of the block.
o Multiple processors can safely read these blocks from their caches until one
processor updates its copy.
o At this time, the writer becomes the only owner of the valid block and all other
copies are invalidated.
State                              Description
Shared (Read-Only) [RO]            Data is valid and can be read safely. Multiple copies can be in this state.
Exclusive (Read-Write) [RW]        Only one valid cache copy exists and can be read from and written to safely. Copies in other caches are invalid.
✓Example:
o Consider the shared memory system previously shown in the Figure and the following operations: 1) P reads X, 2) Q reads
X, 3) Q updates X, 4) Q reads X, 5) Q updates X, 6) P updates X, 7) Q reads X.
o The Table shows the contents of memory and the two caches after the execution of each operation when Write Invalidate
Write Back was used for cache coherence. The table also shows the state of the block containing X in P’s cache and Q’s
cache.
                                    Memory    P’s Cache         Q’s Cache
Serial  Event                       X         X      State      X      State
0       Original value              5
1       P reads X (Read Miss)       5         5      RO
2       Q reads X (Read Miss)       5         5      RO         5      RO
3       Q updates X (Write Hit)     5         5      INV        10     RW
4       Q reads X (Read Hit)        5         5      INV        10     RW
5       Q updates X (Write Hit)     5         5      INV        15     RW
6       P updates X (Write Miss)    5         20     RW         15     INV
7       Q reads X (Read Miss)       20        20     RO         20     RO
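The same seven operations can be traced under Write Invalidate Write Back. The event rules for this protocol are not listed in this section, so the rules coded below are assumptions chosen to be consistent with the table: a write only changes the writer's cache and invalidates other copies, and a read miss that finds an Exclusive (RW) copy elsewhere causes the owner to update memory, after which both copies become Shared (RO). The names WriteBackInvalidate and wbwi_trace are hypothetical.

```python
class WriteBackInvalidate:
    """One memory location X under an ownership protocol (hypothetical model)."""

    def __init__(self, initial, names):
        self.memory = initial
        self.cache = {n: None for n in names}  # cached value of X, None if never cached
        self.state = {n: None for n in names}  # "RO", "RW", "INV", or None

    def _owner_other_than(self, p):
        for q, st in self.state.items():
            if q != p and st == "RW":
                return q
        return None

    def read(self, p):
        if self.state[p] in ("RO", "RW"):
            return "Read Hit"
        owner = self._owner_other_than(p)
        if owner is not None:
            # Assumed rule: the owner's value is written to memory and
            # the owner's copy is downgraded to Shared (Read-Only).
            self.memory = self.cache[owner]
            self.state[owner] = "RO"
        self.cache[p] = self.memory
        self.state[p] = "RO"
        return "Read Miss"

    def write(self, p, value):
        event = "Write Hit" if self.state[p] in ("RO", "RW") else "Write Miss"
        for q in self.cache:                   # invalidate all other copies
            if q != p and self.state[q] is not None:
                self.state[q] = "INV"
        # X is modelled as a single word, so the new value simply replaces the
        # local copy; global memory is NOT updated (write back is deferred).
        self.cache[p] = value
        self.state[p] = "RW"
        return event


def wbwi_trace():
    """Run the seven operations of the example; one row per operation."""
    system = WriteBackInvalidate(5, ["P", "Q"])
    ops = [("P", None), ("Q", None), ("Q", 10), ("Q", None),
           ("Q", 15), ("P", 20), ("Q", None)]  # None means "read X"
    rows = []
    for proc, value in ops:
        event = system.read(proc) if value is None else system.write(proc, value)
        rows.append((event, system.memory,
                     system.cache["P"], system.state["P"],
                     system.cache["Q"], system.state["Q"]))
    return rows


for serial, row in enumerate(wbwi_trace(), start=1):
    print(serial, row)
```

Compared with the write-through trace, memory keeps the value 5 through steps 3 to 6 and only changes at step 7, when the read miss forces the Exclusive copy to be written back.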
References
• Textbook Chapter 4