Global State
• Global state of a distributed system consists of
– Local state of each process: messages sent and messages
received
– State of each channel: messages sent but not received
• Problem: how to record the global state of a distributed
system?
– Cannot require that all local observations must happen at the
same time due to lack of global time
– Using the state of the individual processes at arbitrary points in
time may not result in a consistent overall picture
• Problem: record a consistent global state, also called a
distributed snapshot
– A global state is consistent if for any received message in the
state the corresponding send is also in the state
– A global state is consistent if it corresponds to a consistent cut
[Figure: (a) a consistent cut; (b) an inconsistent cut]
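The consistency condition can be checked mechanically. The sketch below is illustrative only: the event-index encoding of a cut, the `Message` tuple, and the name `is_consistent_cut` are assumptions introduced here, not part of the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: (sender, send_index, receiver, recv_index), where the
# indices are positions of the send and receive events in each process's
# local event history.
Message = Tuple[str, int, str, int]

def is_consistent_cut(cut: Dict[str, int], messages: List[Message]) -> bool:
    """cut[p] is the number of events of process p that lie inside the cut."""
    for sender, send_idx, receiver, recv_idx in messages:
        received_in_cut = recv_idx < cut[receiver]
        sent_in_cut = send_idx < cut[sender]
        # A receive inside the cut whose send is outside breaks consistency.
        if received_in_cut and not sent_in_cut:
            return False
    return True

# P1 sends to P2 (event 0 on both); P2 later sends to P3.
messages = [("P1", 0, "P2", 0), ("P2", 1, "P3", 0)]
consistent = is_consistent_cut({"P1": 1, "P2": 1, "P3": 0}, messages)
inconsistent = is_consistent_cut({"P1": 0, "P2": 1, "P3": 0}, messages)
```

In the second cut, P2's receive is included but P1's send is not, so the cut is rejected. A message that is sent inside the cut but not yet received is allowed: it belongs to the channel state.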
2/14/2012
Distributed Snapshot Algorithm
• Assumptions: communication channels are reliable, and each channel
delivers messages in the order they were sent (FIFO)
• Any process can initiate the algorithm by
– Recording its local state, and
– Sending a marker over each outgoing channel
• On receiving a marker over incoming channel C
– If the local state has not been saved yet, save it and send a marker
over each outgoing channel
– Otherwise, record as the state of C all messages received via C after
the local state was saved and before this marker arrived
• A process finishes when it receives a marker on each incoming
channel and processes them all. At this time, it has accumulated
– A local state snapshot
– For each incoming channel, a set of messages received after
performing the local snapshot and before the marker came down that
channel
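The marker rules above can be sketched as a small single-threaded simulation. This is a minimal illustration under stated assumptions, not a reference implementation: the `Process` class, the integer "balance" used as local state, and the `send`/`deliver` helpers are all hypothetical names introduced here. Channels are modelled as FIFO queues, matching the slide's assumption.

```python
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, name, peers, balance):
        self.name = name
        self.peers = peers          # names of all other processes
        self.balance = balance      # application state (hypothetical)
        self.saved_state = None     # recorded local state
        self.recording = {}         # incoming channel -> messages seen so far
        self.channel_state = {}     # incoming channel -> final recorded state

    def _save_and_send_markers(self, net):
        self.saved_state = self.balance
        self.recording = {p: [] for p in self.peers}
        for p in self.peers:
            net[(self.name, p)].append(MARKER)

    def initiate(self, net):
        self._save_and_send_markers(net)

    def receive(self, src, msg, net):
        if msg == MARKER:
            if self.saved_state is None:
                self._save_and_send_markers(net)
            # The marker closes this channel: what was recorded is its state.
            self.channel_state[src] = self.recording.pop(src)
        else:
            self.balance += msg
            if src in self.recording:   # after local save, before the marker
                self.recording[src].append(msg)

    def done(self):
        return (self.saved_state is not None
                and len(self.channel_state) == len(self.peers))

def send(net, procs, src, dst, amount):
    procs[src].balance -= amount
    net[(src, dst)].append(amount)

def deliver(net, procs, src, dst):
    procs[dst].receive(src, net[(src, dst)].popleft(), net)

names = ["P1", "P2", "P3"]
procs = {n: Process(n, [m for m in names if m != n], 100) for n in names}
net = {(a, b): deque() for a in names for b in names if a != b}

send(net, procs, "P2", "P1", 10)   # in flight when the snapshot starts
procs["P1"].initiate(net)          # P1 saves 100 and sends markers
deliver(net, procs, "P1", "P2")    # P2 sees its first marker and saves 90
deliver(net, procs, "P2", "P1")    # P1 records the 10 on channel P2 -> P1
while any(net.values()):           # drain the remaining markers
    for (src, dst), queue in net.items():
        if queue:
            deliver(net, procs, src, dst)

recorded_total = sum(p.saved_state for p in procs.values()) + sum(
    sum(msgs) for p in procs.values() for msgs in p.channel_state.values())
```

In this run the transfer of 10 is captured in the state of the channel from P2 to P1 rather than in either local state, so `recorded_total` equals the 300 units the system started with.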
An example of the distributed snapshot algorithm
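[Figure: timelines of processes P1, P2 and P3 exchanging messages m1, m2 and m3; each * marks the point where a process records its local state]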
Entity   Recorded state
P1       No messages have been sent or received
P2       m1 and m2 have been sent. No messages have been received
P3       m2 has been received. No messages have been sent
CH12     Empty
CH21     Contains m1
CH13     Empty
CH31     Empty
CH23     Empty
CH32     Empty
Recorded State versus Global State
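A state recorded by the distributed snapshot algorithm may not match any global
state that the system actually passed through! It is nevertheless consistent: it is
reachable from the state in which the algorithm started, and the state in which the
algorithm finished is reachable from it.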
Distributed Termination Detection
• When a process terminates, the OS frees its
resources
– This per-process approach is not adequate for distributed systems
• Processes of a distributed computation should be
terminated when all of them have completed
their tasks
– A process is active when it is performing work, and
passive when it has no work
– Work is assigned to a process through a message
• A passive process becomes active on receiving a message
• Distributed termination condition (DTC):
– All processes of a distributed computation are passive
– No messages are in transit
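As a small illustration, the DTC can be written as a predicate over a global view of the system. The function name `dtc_holds` and the dictionary layout are assumptions made for this sketch; in a real system no single process has such a view, which is why a detection algorithm is needed.

```python
def dtc_holds(active, in_transit):
    """active: process name -> bool; in_transit: channel -> list of messages."""
    all_passive = not any(active.values())
    no_messages = all(len(msgs) == 0 for msgs in in_transit.values())
    return all_passive and no_messages

# Every process is passive, but a work message is still travelling to P2,
# so P2 will become active again: the computation has not terminated.
pending = dtc_holds({"P1": False, "P2": False}, {("P1", "P2"): ["work"]})
finished = dtc_holds({"P1": False, "P2": False}, {("P1", "P2"): []})
```

The first call shows why checking only the process states is not enough.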
Distributed Termination Detection (continued)
• Credit distribution based termination detection
– A distributed computation is initiated with credit C, which is
distributed among its processes
• A process sends some of its credit in each message
• A process receiving a message adds the credit from the message to its
own credit
– When a process becomes passive, it sends its credit to a collector
process
– The distributed computation is known to have terminated when
the credit accumulated by the collector equals C
• Diffusion computation based termination detection
– Each process that becomes passive initiates a diffusion
computation to determine if DTC holds
– Assumption: sender of a message becomes blocked until it
receives an ACK for the message
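The credit distribution scheme described above can be sketched as follows. This is a minimal illustration with assumed details: the `Collector` and `Worker` classes are hypothetical, a sender gives away half of its credit with each message (the slides only say "some"), and exact fractions are used so that the returned credit can be compared with C without rounding error.

```python
from fractions import Fraction

class Collector:
    def __init__(self, total):
        self.total = total              # the initial credit C
        self.returned = Fraction(0)

    def give_back(self, credit):
        self.returned += credit

    def terminated(self):
        return self.returned == self.total

class Worker:
    def __init__(self, collector):
        self.collector = collector
        self.credit = Fraction(0)

    def receive(self, credit):
        # A message activates the worker and carries credit with it.
        self.credit += credit

    def send_work(self):
        # Assumption: attach half of the current credit to the message.
        half = self.credit / 2
        self.credit -= half
        return half                     # credit travelling inside the message

    def become_passive(self):
        self.collector.give_back(self.credit)
        self.credit = Fraction(0)

C = Fraction(1)
collector = Collector(C)
a, b = Worker(collector), Worker(collector)

a.receive(C)                    # the computation starts with all credit at a
msg = a.send_work()             # message to b is now in transit
a.become_passive()
before = collector.terminated() # credit in the message is still missing
b.receive(msg)
b.become_passive()
after = collector.terminated()
```

While the message is in transit its credit is held by neither process nor the collector, so the collector cannot reach C until every message has been received and every process has become passive.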
Diffusion Computation Based Termination Detection
Definition: The first query received by a node is called an engaging query.