Chapter 7: Distributed Shared Memory: Why DSM?
SHARED MEMORY
DSM simulates a logical shared memory address space over a set of physically
distributed local memory systems.
Why DSM?
Chapter outline
Nonuniform Memory Access (NUMA) architectures
Multiprocessor Cache and DSM architectures
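[Figure: (a) multiprocessor cache architecture: processors with local caches connected by a common bus to global memory; (b) DSM architecture: processors with local memory connected by a communication network.]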
Common issues
Data consistency and coherence problems caused by data placement, migration and replication
• Tradeoffs:
– Transfer time
– Administrative overhead
– Hit rate
– Replacement rate
– False Sharing
• Block Replication
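The false-sharing tradeoff above can be made concrete with a small sketch. Everything named here (`block_of`, `false_sharing_blocks`, the byte addresses and block sizes) is hypothetical and chosen only for illustration: the function reports blocks that several processors write without ever writing the same address, so the sharing comes from block granularity alone.

```python
from collections import defaultdict

def block_of(addr, block_size):
    """Index of the block that contains byte address addr."""
    return addr // block_size

def false_sharing_blocks(writes, block_size):
    """writes: iterable of (processor, address) pairs.

    Returns the sorted indices of blocks written by two or more processors
    that never write a common address inside that block.
    """
    addrs = defaultdict(lambda: defaultdict(set))  # block -> proc -> addresses
    for proc, addr in writes:
        addrs[block_of(addr, block_size)][proc].add(addr)

    result = []
    for block, per_proc in addrs.items():
        if len(per_proc) < 2:
            continue  # only one writer: no sharing at all
        seen, truly_shared = set(), False
        for written in per_proc.values():
            if seen & written:
                truly_shared = True  # two processors write the same address
            seen |= written
        if not truly_shared:
            result.append(block)
    return sorted(result)

# Illustrative trace: P1 and P2 write different words, then the same word.
writes = [("P1", 0), ("P2", 8), ("P1", 64), ("P2", 64)]
small = false_sharing_blocks(writes, block_size=8)
large = false_sharing_blocks(writes, block_size=64)
```

With 8-byte blocks the two private words fall in different blocks, so `small` is empty; with 64-byte blocks they share block 0, so `large` is `[0]`. Larger blocks improve transfer efficiency but make this effect more likely.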
Memory consistency models
These models apply consistency constraints to all memory accesses
Accesses may require multiple messages and take significant time
Atomic consistency
All processors see same (global) order
Respects real-time order
Sequential consistency
All processors see same (global) order, and
that order respects all internal (per-processor) orders (not necessarily real time)
P1: W(X)1
P2: W(Y)2
P3: R(Y)2 R(X)0 R(X)1
[Figure: accesses by P1 (Access 1, Access 3) and P2 (Access 2) placed on a "Global Time" line in the order A2 A1 A3; the writes P1: W(X)1 and P2: W(Y)2 placed in a "Global Order" A2 A3 A4 A1 A5.]
Sequential Consistency: global total order (not necessarily respecting access intervals)
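To see why the three-processor example above is sequentially consistent, one can search for a single global order that keeps every processor's internal order and makes each read return the most recent write. The brute-force checker below is a minimal sketch: the function name, the tuple encoding of accesses, and the initial value 0 are assumptions made for illustration, and the search is exponential, so it only suits tiny histories.

```python
def sequentially_consistent(histories, initial=0):
    """histories: one list per processor of (op, var, value) tuples,
    with op equal to "W" or "R". Returns one legal global order as a
    list of (processor index, op, var, value), or None if none exists."""
    positions = [0] * len(histories)  # next unscheduled access per processor
    order = []

    def search(memory):
        if all(pos == len(h) for pos, h in zip(positions, histories)):
            return True
        for p, hist in enumerate(histories):
            if positions[p] == len(hist):
                continue
            op, var, value = hist[positions[p]]
            if op == "R" and memory.get(var, initial) != value:
                continue  # this read cannot come next in the global order
            new_memory = dict(memory)
            if op == "W":
                new_memory[var] = value
            positions[p] += 1
            order.append((p, op, var, value))
            if search(new_memory):
                return True
            order.pop()          # backtrack
            positions[p] -= 1
        return False

    return order if search({}) else None

# The example from the slide: P1, P2, P3 are processors 0, 1, 2.
example = [
    [("W", "X", 1)],
    [("W", "Y", 2)],
    [("R", "Y", 2), ("R", "X", 0), ("R", "X", 1)],
]
order = sequentially_consistent(example)
```

For this example the only legal order is W(Y)2, R(Y)2, R(X)0, W(X)1, R(X)1: P3 must read X as 0 before P1's write takes effect, regardless of when the writes were issued in real time.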
Causal consistency
Processors may see different orders
All orders respect causal order (internal and read-write links)
P1: W(X)1 W(X)3
P2: R(X)1 W(X)2
P3: R(X)1 R(X)3 R(X)2
P4: R(X)1 R(X)2 R(X)3
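The example above can be checked mechanically. The sketch below builds the causal order from internal order plus read-from links, then asks, for each processor, whether all writes and that processor's own reads can be arranged in some order that respects it. The function names and tuple encoding are hypothetical, and the sketch assumes every value written to a variable is unique (so a read identifies its write) and that the initial value is 0.

```python
def causal_order(histories):
    """Return (ops, before): ops are (processor, position) names and
    before is the set of (a, b) pairs where a causally precedes b."""
    ops = [(p, i) for p, h in enumerate(histories) for i in range(len(h))]
    writer = {}  # (var, value) -> the write that stored it
    for p, i in ops:
        kind, var, value = histories[p][i]
        if kind == "W":
            writer[(var, value)] = (p, i)
    before = set()
    for p, i in ops:
        if i > 0:
            before.add(((p, i - 1), (p, i)))            # internal link
        kind, var, value = histories[p][i]
        if kind == "R" and (var, value) in writer:
            before.add((writer[(var, value)], (p, i)))  # read-from link
    for k in ops:                                       # transitive closure
        for a in ops:
            for b in ops:
                if (a, k) in before and (k, b) in before:
                    before.add((a, b))
    return ops, before

def has_causal_view(histories, p, initial=0):
    """Can processor p order all writes plus its own reads so that causal
    order holds and every read returns the latest write in that order?"""
    ops, before = causal_order(histories)
    visible = [o for o in ops if o[0] == p or histories[o[0]][o[1]][0] == "W"]

    def search(done, memory):
        if len(done) == len(visible):
            return True
        for o in visible:
            if o in done:
                continue
            if any((a, o) in before and a not in done for a in visible):
                continue  # a causal predecessor is still unscheduled
            kind, var, value = histories[o[0]][o[1]]
            if kind == "R" and memory.get(var, initial) != value:
                continue
            new_memory = {**memory, var: value} if kind == "W" else memory
            if search(done | {o}, new_memory):
                return True
        return False

    return search(frozenset(), {})

def causally_consistent(histories):
    return all(has_causal_view(histories, p) for p in range(len(histories)))

slide_example = [
    [("W", "X", 1), ("W", "X", 3)],
    [("R", "X", 1), ("W", "X", 2)],
    [("R", "X", 1), ("R", "X", 3), ("R", "X", 2)],
    [("R", "X", 1), ("R", "X", 2), ("R", "X", 3)],
]
result = causally_consistent(slide_example)
```

Here `result` is `True`: W(X)2 and W(X)3 are causally unrelated, so P3 and P4 may disagree on their order, while both must place W(X)1 first.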
[Figure: per-processor views of the accesses. P2's view: A1 A2 A6. P3's view: A1 A3 A5 A7 A6 A8. P4's view: A1 A4 A6 A9 A5 A10.]
Processor consistency
Writes from same processor are in order
Writes from different processors not constrained
P1: W(X)1
P2: R(X)1 W(X)2
P3: R(X)1 R(X)2
P4: R(X)2 R(X)1
[Figure: per-processor views, starting from P1's access A1: W(X)1. Legend: causal link that need not be respected; internal link that is respected. P1's view: A1. P2's view: A1 A2 A3. P3's view: A3 A4 A1 A5. P4's view: A1 A6 A3 A7.]
Processor Consistency: no global total order; partial order on writes by same processor
Each processor's order respects internal order and order of writes by same processor
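One way to obtain the ordering rule above is to number each processor's writes and have every replica apply them in per-sender order, while imposing no order between different senders. The class below is a minimal sketch of that idea, not a description of any particular system: `Replica`, `deliver`, and the message fields are hypothetical names.

```python
class Replica:
    """Local copy of shared memory at one processor (illustrative only)."""

    def __init__(self):
        self.memory = {}
        self.next_seq = {}  # sender -> sequence number expected next
        self.held = {}      # (sender, seq) -> (var, value) that arrived early

    def deliver(self, sender, seq, var, value):
        """Accept a write message; apply it only once all earlier writes
        from the same sender have been applied."""
        self.held[(sender, seq)] = (var, value)
        expected = self.next_seq.get(sender, 0)
        while (sender, expected) in self.held:
            held_var, held_value = self.held.pop((sender, expected))
            self.memory[held_var] = held_value
            expected += 1
        self.next_seq[sender] = expected

replica = Replica()
replica.deliver("P1", 1, "X", 2)  # P1's second write arrives first: held back
replica.deliver("P2", 0, "X", 9)  # different sender: applied immediately
replica.deliver("P1", 0, "X", 1)  # releases P1's writes in order: X=1, then X=2
```

After the three deliveries `replica.memory` is `{"X": 2}`. Another replica could receive P2's write last and end with X equal to 9, which is allowed because writes from different processors are not constrained.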
Slow memory consistency
Writes from same processor to same location are in order
Writes from different processors or locations not constrained
P1: W(X)1 W(X)2 W(Y)3
P2: R(Y)3 R(X)1 R(X)2
[Figure: views for the example. Legend: causal link that need not be respected; causal link that must be respected. P1's view: A1 A2 A3. P2's view: A3 A4 A1 A5 A2 A6.]
Slow Memory Consistency: no global total order, no constraints across memory locations
Each processor's order respects its internal order and order of writes to same memory location by same processor
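The slow memory rule can be tested on a reader's trace by checking that, for each writer and each location, the reader never observes an older write after a newer one. The sketch below is deliberately narrow: it assumes a single reader that performs no writes of its own, unique written values per variable, and an initial value of 0. The function name and data layout are hypothetical.

```python
def slow_memory_ok(writer_histories, reads, initial=0):
    """writer_histories: {processor: [(var, value), ...]} in issue order.
    reads: [(var, value), ...] returned to one reading processor."""
    rank = {}  # (var, value) -> (writer, index among its writes to var)
    for proc, writes in writer_histories.items():
        count = {}
        for var, value in writes:
            rank[(var, value)] = (proc, count.get(var, 0))
            count[var] = count.get(var, 0) + 1

    latest = {}     # (writer, var) -> newest rank observed so far
    last_read = {}  # var -> value returned by the previous read of var
    for var, value in reads:
        if (var, value) not in rank:
            # Only the initial value has no write, and it can be read
            # only before any write to var has been observed.
            if value != initial or any(v == var for _, v in latest):
                return False
            continue
        proc, r = rank[(var, value)]
        seen = latest.get((proc, var), -1)
        if r < seen:
            return False  # older write observed after a newer one
        if r == seen and last_read.get(var) != value:
            return False  # same write observed again after being overwritten
        latest[(proc, var)] = r
        last_read[var] = value
    return True

writers = {"P1": [("X", 1), ("X", 2), ("Y", 3)]}
slide_reads = [("Y", 3), ("X", 1), ("X", 2)]
allowed = slow_memory_ok(writers, slide_reads)
```

For the slide's example `allowed` is `True`: P2 sees W(Y)3 before W(X)1 even though P1 issued it last, because the two writes target different locations. Reading X as 2 and then as 1 would be rejected.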
Synchronization Access Consistency Models
Accesses to synchronization variables are distinguished from accesses to ordinary
shared variables
Weak consistency
• Accesses to synchronization variables are sequentially consistent
Release consistency
The synchronization access (synch(S)) in the weak consistency model can
be refined as a pair of acquire(S) and release(S) accesses. Shared variables
in the critical section are made consistent when the release operation is performed
(i.e., S "locks" access to the shared variables it protects, and release is not
completed until all accesses to them are also completed).
Entry consistency
acquire and release are applied to general variables.
Each variable has an implicit synchronization variable that may be acquired
to prevent concurrent access to it.
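The acquire/release pairing can be sketched as a node that buffers its writes while it holds S and propagates them only at release. This is a simplified illustration under stated assumptions: a single synchronization variable S, a `SharedHome` object standing in for the rest of the DSM system, and the class and method names are all hypothetical.

```python
class SharedHome:
    """Stands in for the rest of the DSM system (an assumption of this sketch)."""

    def __init__(self):
        self.memory = {}    # values made consistent by completed releases
        self.holder = None  # name of the node currently holding S

class Node:
    def __init__(self, name, home):
        self.name, self.home = name, home
        self.local = {}    # this node's view of the shared variables
        self.pending = {}  # writes made since acquire, not yet propagated

    def acquire(self):
        if self.home.holder is not None:
            raise RuntimeError("S is held by " + self.home.holder)
        self.home.holder = self.name
        self.local = dict(self.home.memory)  # pick up released updates

    def write(self, var, value):
        self.local[var] = value
        self.pending[var] = value

    def read(self, var):
        return self.local.get(var, 0)

    def release(self):
        if self.home.holder != self.name:
            raise RuntimeError(self.name + " does not hold S")
        # Release completes only after all buffered writes are performed.
        self.home.memory.update(self.pending)
        self.pending.clear()
        self.home.holder = None

home = SharedHome()
p1, p2 = Node("P1", home), Node("P2", home)
p1.acquire()
p1.write("X", 1)
visible_before_release = dict(home.memory)  # still empty
p1.release()
p2.acquire()
value_seen_by_p2 = p2.read("X")
p2.release()
```

P1's write is invisible outside the critical section until release(S) completes, and P2 is guaranteed to see it only after its own acquire(S). Entry consistency would keep one such synchronization variable per shared variable rather than one S for all of them.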
[Figure: timelines of synchronization accesses, where delay = time at which shared variables become consistent.
(a) Weak consistency: synch(S) is delayed until all previous accesses are done, and future accesses are delayed until synch(S) is performed. Like a barrier sync, but local to the process.
(b) Release consistency: between acquire(S) and release(S) all accesses are exclusive; release(S) is delayed until previous accesses are done. Processor consistency w.r.t. S.
(c) Entry consistency: between acquire(X) and release(X) only accesses to X are exclusive; release(X) is delayed until previous accesses are done. Consistency w.r.t. memory object X.
Further annotations: "only sync when necessary"; "All vars in DSM system across all procs".]
[Figure: example access sequences over X, Y and Z (R(X), W(Y), R(Y), W(Z)) compared under no synchronization, weak consistency, and release consistency.]
Taxonomy
[Figure: taxonomy of consistency models. Atomic consistency and sequential consistency are at the top; weaker models are reached by processor-relative weakening, access-type weakening, and location-relative weakening, ending at slow memory and entry consistency.]
Multiprocessor Cache Systems
Cache directory
[Figure: the master copy carries an E bit and P bits; each replicated block carries V and E bits. V: valid or invalid. E: exclusive or shared-read-only.]
V bit for validity (in replicas), E bit for exclusive access (in all)
May also include private (= not shared) bit and/or dirty (= modified) bit.
Write-invalidate
• Read hit
Hardware mechanisms
• Directory-based
• Snooping cache
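The V and E bits and the write-invalidate policy can be tied together in a small directory model for a single block. This is a sketch under simplifying assumptions (one block, messages counted rather than sent, no private or dirty bits); `Directory` and its fields are hypothetical names.

```python
class Directory:
    """Directory entry for one block (illustrative model only)."""

    def __init__(self, nodes):
        self.valid = {n: False for n in nodes}  # V bit of each replica
        self.exclusive = False                  # E bit: True means one writable copy
        self.value = 0                          # contents of the master copy
        self.invalidations = 0                  # invalidate messages counted

    def read(self, node):
        if not self.valid[node]:
            # Read miss: fetch a copy; the block becomes shared-read-only.
            self.exclusive = False
            self.valid[node] = True
        # Read hit: served from the local replica, no messages needed.
        return self.value

    def write(self, node, value):
        for other in list(self.valid):
            if other != node and self.valid[other]:
                self.valid[other] = False       # invalidate every other replica
                self.invalidations += 1
        self.valid[node] = True
        self.exclusive = True
        self.value = value

d = Directory(["P1", "P2", "P3"])
d.read("P1")
d.read("P2")
d.write("P3", 7)             # invalidates the copies at P1 and P2
after_write = dict(d.valid)
reread = d.read("P1")        # read miss: P1 fetches the new value
```

After the write only P3 holds a valid, exclusive copy and two invalidations have been counted; P1's next read misses, returns 7, and makes the block shared again. A snooping cache would reach the same states by observing bus traffic instead of consulting a directory.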
DSM implementation
[Figure: nodes 1 to 4, one of which holds the exclusive copy.]
• Read-replicate-write-migrate: write-invalidate
Considerations:
• Block granularity
• Read/write ratio
• Locality of reference
Distributed implementation of Directory
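[Figure: locating a block through probable owners. A request passes through nodes 1, 2, 3, 4 ... 2n−1 along the chain of probable block owners, changing the probable owner along the way.]
[Figure: distributed list of copies with nodes labeled Head, Master Node, and End Node. Invalidate or update messages travel along the list and an acknowledgement is returned; a new requestor is appended.]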
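The probable-owner scheme in the first figure can be sketched as a table of hints that a request follows until it reaches the true owner, rewriting each hint on the way. The function name and the dictionary representation are hypothetical, and the sketch assumes the hints always form a chain that ends at a node pointing to itself.

```python
def request_block(prob_owner, requester):
    """prob_owner: {node: node it believes owns the block}; the true owner
    points to itself. Forwards a request from requester along the chain,
    updates the hints in place, and returns the nodes the request visited."""
    if prob_owner[requester] == requester:
        return []                      # requester already owns the block
    path = []
    node = prob_owner[requester]
    while True:
        path.append(node)
        next_node = prob_owner[node]
        prob_owner[node] = requester   # change probable owner along the way
        if next_node == node:          # this node was the true owner
            break
        node = next_node
    prob_owner[requester] = requester  # ownership moves to the requester
    return path

# Illustrative chain: 1 -> 2 -> 3 -> 4, where node 4 is the true owner.
hints = {1: 2, 2: 3, 3: 4, 4: 4}
first_path = request_block(hints, 1)   # visits 2, 3, 4
second_path = request_block(hints, 3)  # hint now points straight at node 1
```

The first request takes three hops, after which every node on the path points at node 1; the second request from node 3 then reaches the owner in one hop. Updating hints along the way is what keeps later searches short.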