
CHAPTER 7: DISTRIBUTED SHARED MEMORY
DSM simulates a logical shared memory address space over a set of physically
distributed local memory systems.

Why DSM?

• direct information sharing programming paradigm (transparency)

• multilevel memory access (locality)

• wealth of existing programs (portability)

• large physical memory

• scalable multiprocessor system

Chapter outline

• NUMA architectures: similarity between multiprocessor cache and DSM systems

• Memory consistency models: why is memory consistency a more critical problem in multiprocessor and DSM systems? how is memory consistency defined?

• Cache coherency protocols: implementation of the consistency models

• DSM implementation: applying the consistency models and coherency protocols to a DSM system

Nonuniform Memory Access (NUMA) architectures

Generic NUMA architecture

[Figure: processor-memory pairs, each with a memory coherence controller, connected by buses, an interconnection network, or a communication network.]
Multiprocessor Cache and DSM architectures

[Figure (a): Multiprocessor cache architecture: processors with local caches share a global memory over a common bus.]

[Figure (b): Distributed shared memory architecture: processors with local memories, connected by a communication network, present a single virtual memory space.]

Common issues
Data consistency and coherency problems arise from data placement, migration, and replication.

• Data Sharing Granularity

• Cache Miss Granularity

• Tradeoffs:

– Transfer time
– Administrative overhead
– Hit rate
– Replacement rate
– False Sharing

What to do on cache miss?

• Locating block - owner/directory

• Block Migration - block bouncing

• Block Replication

• Push vs. Pull

Memory consistency models
These models apply consistency constraints to all memory accesses.
A single access may require multiple messages and take significant time to perform.

Atomic consistency
All processors see the same (global) order
The order respects real-time order (access intervals)

Sequential consistency
All processors see the same (global) order, and
the order respects all internal (program) orders (not necessarily real time)

P1 : W(X)1
P2 : W(Y)2
P3 : R(Y)2 R(X)0 R(X)1

[Figure: the history above drawn as two timelines.
Atomic consistency: a global total order respecting access intervals ("global time").
Sequential consistency: a global total order that need not respect access intervals ("global order").]

Causal consistency
Processors may see different orders, but
every order respects the causal order (internal program order and write-read causality)

P1 : W(X)1 W(X)3
P2 : R(X)1 W(X)2
P3 : R(X)1 R(X)3 R(X)2
P4 : R(X)1 R(X)2 R(X)3

[Figure: the history above with accesses labeled A1-A10 and each processor's view shown; e.g. P3's view is A1 A3 A5 A7 A6 A8 while P4's view is A1 A4 A6 A9 A5 A10, and the causal links must be respected in any order.
Causal consistency: no global total order; only the causal partial order.
Each processor's order respects internal order and write-read causality.]

Processor consistency
Writes from the same processor are seen in order
Writes from different processors are not constrained

P1 : W(X)1
P2 : R(X)1 W(X)2
P3 : R(X)1 R(X)2
P4 : R(X)2 R(X)1

[Figure: the history above with accesses labeled A1-A7 and each processor's view shown; P3 and P4 see the two writes to X in opposite orders, since the cross-processor causal link need not be respected while internal links are.
Processor consistency: no global total order; only a partial order on writes by the same processor.
Each processor's order respects internal order and the order of writes issued by the same processor.]

Slow memory consistency
Writes from the same processor to the same location are seen in order
Writes from different processors, or to different locations, are not constrained

P1 : W(X)1 W(X)2 W(Y)3
P2 : R(Y)3 R(X)1 R(X)2

[Figure: the history above with accesses labeled A1-A6 and P2's view shown; P2 may see W(Y)3 before either write to X, but must see W(X)1 before W(X)2.
Slow memory consistency: no global total order and no constraints across memory locations.
Each processor's order respects its internal order and the order of writes to the same location by the same processor.]

Synchronization Access Consistency Models
Accesses to synchronization variables are distinguished from accesses to ordinary shared variables.

Weak consistency
• Accesses to synchronization variables are sequentially consistent

• No access to a synchronization variable is issued by a processor before all previous read/write data accesses have been performed
(i.e., a synch waits until all ongoing accesses complete)

• No read/write data access is issued by a processor before a previous access to a synchronization variable has been performed
(i.e., all new accesses must wait until the synch is performed)

• In effect, the system "settles" at each synch.

Release consistency
The synchronization access (synch(S)) of the weak consistency model is refined into a pair of acquire(S) and release(S) accesses. Shared variables in the critical section are made consistent when the release operation is performed
(i.e., S "locks" access to the shared variables it protects, and the release does not complete until all accesses to them have also completed).

Entry consistency
Acquire and release are applied to general shared variables: each variable has an implicit synchronization variable that may be acquired to prevent concurrent access to it.

[Figure: delay structure of the three models (delay = the time at which shared variables become consistent).
(a) Weak consistency: a synch(S) is issued only after all previous accesses are done, and future accesses are delayed until the synch is performed; like a barrier sync, but local to the process, so it synchronizes only when necessary, over all variables in the DSM system.
(b) Release consistency: acquire(S) is issued only after previous accesses are performed and is exclusive; release(S) is delayed until all previous accesses are done; consistency is w.r.t. S.
(c) Entry consistency: acquire(X)/release(X) delay only the accesses to X; consistency is w.r.t. memory object X, across all processors.]

[Figure: the same three-processor access sequence (R(X) W(Y) / R(Y) W(Y) / W(Z) W(Z)) shown three ways: with no synchronization, with a Synch(S) inserted under weak consistency, and with each processor's accesses bracketed by Acq(S)/Rel(S) or Acq(R)/Rel(R) pairs under release consistency.]

Taxonomy

atomic consistency
↓ real-time order weakening
sequential consistency

From sequential consistency, the models weaken along two branches:

• processor-relative weakening: causal consistency, then processor consistency, then slow memory (location-relative weakening)

• access-type weakening: weak consistency, then release consistency, then entry consistency (location-relative weakening)

Below all of these: no system coherence support

Multiprocessor Cache Systems

Cache directory

[Figure: a directory entry holds the master copy with an E bit and P presence bits, plus a V bit and an E bit for each replicated block.]

P : number of processors (presence bits)
V : valid or invalid
E : exclusive or shared-read-only

V bit for validity (in replicas), E bit for exclusive access (in all).
May also include a private (= not shared) bit and/or a dirty (= modified) bit.

Cache coherency protocols

write-invalidate and write-update

Write-invalidate
• Read hit: use the local copy

• Read miss: transfer the block; set the P-, V-, and E-bits

• Write hit: invalidate other cache copies, write, and set the E-bit

• Write miss: handled like a read miss followed by a write hit

Hardware mechanisms

• Directory-based

• Snooping cache

DSM implementation

Memory management algorithms

Starting from an exclusive copy, a READ or WRITE may be served remotely, by migration, or by replication:

    Algorithm                                READ        WRITE
    1 : Central-server algorithm (SRSW)      remote      remote
    2 : Migration algorithm (SRSW)           migrate     migrate
    3 : Read-replication algorithm (MRSW)    replicate   migrate
    4 : Full-replication algorithm (MRMW)    replicate   replicate

• Read-remote-write-remote: long network delay, trivial consistency (a sketch follows after this list)

• Read-migrate-write-migrate: thrashing and false sharing

• Read-replicate-write-migrate: write-invalidate

• Read-replicate-write-replicate: full concurrency, atomic update

Considerations:

• Block granularity

• Block transfer communication overhead

• Read/write ratio

• Locality of reference

• Number of nodes and type of interaction

Distributed implementation of Directory

Locating Block Owner:

[Figure: a request chases "probable block owner" pointers through previous owners until it reaches the current owner; the request may be forwarded several times, and it changes each probable-owner pointer along the way.]

Maintaining Copy List:

[Figure (a): spanning-tree representation of the copy set; each node records where its copy came from ("From") and the nodes it supplied ("To's"), with Nil at the leaves.]

[Figure (b): linked-list representation of the copy set; the master node heads a chain of (master, next) records ending in Nil. A request is forwarded along the list, the requestor is appended and acknowledged, and an invalidate or update traverses the list from head to end.]
