DS 3

Introduction

This chapter focuses on the importance of time in distributed systems, which are
networks of computers that work together. It covers how we can monitor these systems
and manage the timing of events that happen across different computers.

Why is Time Important in Distributed Systems?

1. Accurate Measurement:

o In many scenarios, like online shopping, events occur at different computers (e.g., a merchant's site and a bank). To keep track of when these events happen, we need to synchronize the clocks of these computers with a reliable external time source. This is crucial for auditing and ensuring that transactions are recorded correctly.

2. Clock Synchronization Algorithms:

o Several algorithms help synchronize clocks in distributed systems. These algorithms are essential for:

▪ Data Consistency: Ensuring that all copies of data are the same
across different computers.

▪ Authentication: Verifying that requests sent to servers are legitimate, which can depend on synchronized clocks (as in the Kerberos protocol).

▪ Duplicate Processing Prevention: Avoiding the reprocessing of the same updates or transactions.

Challenges of Measuring Time

• Time can be tricky to measure because different observers may perceive time
differently. This concept is rooted in Einstein’s Special Theory of Relativity,
which tells us that:

o The speed of light is constant for all observers, regardless of their motion.

o Events that seem simultaneous to one observer may not appear simultaneous to another moving observer. For example, someone on Earth might disagree with a person in a spaceship about the timing of events.

• Causality:

o While the order of events can differ for observers, if one event causes
another, all observers will agree on that order. However, the time it takes
between cause and effect can differ.
Time in Distributed Systems

• In distributed systems, we face a similar challenge: there is no universal clock that can provide an absolute measure of time for all computers. This means:

o We can't definitively say when events happen relative to each other across different nodes (computers).

o We need to determine whether certain events occurred at the same time or in a specific order.

For example, in object-oriented programming, we need to check whether references to an object still exist to know if it can be safely deleted (garbage collection). This requires observing the states of processes (to see if they still reference the object) and checking communication channels (to see if messages referencing the object are still in transit).

Chapter Structure

1. Clock Synchronization:

o The first part of the chapter discusses methods to synchronize computer clocks using message passing techniques.

2. Logical Clocks:

o The chapter introduces logical clocks, such as vector clocks, which help
establish the order of events without needing to measure physical time.

3. Global State Capture:

o The latter part of the chapter describes algorithms designed to capture the global state of distributed systems as they operate, allowing us to understand the system's overall condition at any point in time.

14.2 Clocks, Events, and Process States

This section builds on the basics of distributed systems introduced earlier. Here, we
focus on how to understand and track the evolution of these systems, especially how to
order and timestamp events that occur within them.

Understanding Distributed Systems

1. What is a Distributed System?

o A distributed system consists of multiple processes (call them p1, p2, …, pN) running on separate computers. Each process runs on its own processor and does not share memory with other processes.

2. Process State:

o Each process has a state (si), which includes:

▪ The values of all its variables.

▪ The status of any objects or files it interacts with in its local environment.

o As a process executes, it can change its state.

3. Communication Between Processes:

o Processes can communicate only by sending and receiving messages over the network. They cannot interact directly or share memory.

Events in a Process

• An event is defined as a single action performed by a process. This can be:

o A message being sent or received.

o A change in the process's state.

• The sequence of events within a single process can be ordered. We denote this order with a relation <i:

o If event e occurs before event e′ in process pi, we write e <i e′.

• History of a Process:

o The history of a process pi is the series of events that occur in that process, ordered by the <i relation:

history(pi) = ⟨e0, e1, e2, …⟩

Clocks and Timestamps

• Physical Clocks:

o Each computer has a physical clock that counts time based on oscillations in a crystal. These clocks are used to timestamp events.

o The operating system reads the hardware clock and adjusts it to create a software clock Ci(t) that approximates real time. This clock might not be perfectly accurate due to various factors.

• Timestamping Events:

o We can use the value of the software clock Ci(t) to timestamp events in process pi. However, for timestamps to differ between consecutive events, the clock must update frequently enough (the clock resolution must be smaller than the time between events).

Clock Issues: Skew and Drift

• Clock Skew:

o This refers to the difference in time readings between two clocks at a single moment. For instance, if two computers check their clocks simultaneously, they may show different times.

• Clock Drift:

o Over time, clocks can run at slightly different speeds due to physical
variations. This means that even if two clocks start at the same time, they
will diverge as time passes.

o For example, a typical quartz clock might drift about one second every
1,000,000 seconds (or roughly every 11.6 days).

Coordinated Universal Time (UTC)

• Synchronization with Accurate Time Sources:

o To maintain accurate time, computers can synchronize their clocks with external, highly accurate time sources, such as atomic clocks. These clocks have a very low drift rate (about one part in 10^13).

• Defining UTC:

o UTC is the international standard for timekeeping, based on atomic time but adjusted with occasional leap seconds to stay in sync with astronomical time.

o UTC signals are broadcast from various sources, including radio stations
and satellites like GPS, allowing computers to synchronize their clocks
with high accuracy (GPS can be accurate to about 1 microsecond).
Synchronizing Physical Clocks in Distributed Systems

In distributed systems, it's crucial to know the exact time when events occur for various
reasons, such as record-keeping and coordination. To achieve this, we need to
synchronize the clocks of the processes involved. There are two main types of
synchronization: external synchronization and internal synchronization.

1. Types of Synchronization

External Synchronization:

• This involves synchronizing the clocks of processes with an authoritative external time source, such as Coordinated Universal Time (UTC).

• The goal is to ensure that the clocks Ci of all processes are accurate to within a specified bound D of the external time source S for all times t in a given interval I.

• Mathematically, we say:

|Ci(t) − S(t)| < D   for all i and t ∈ I

• This means the clocks are accurate to within the bound D.

Internal Synchronization:

• This refers to the synchronization of clocks with each other, ensuring that they agree to within a specified bound D.

• Even if the clocks are not synchronized with an external source, as long as they
are synchronized with each other, we can measure the time intervals between
events.

• The condition for internal synchronization can be expressed as:

|Ci(t) − Cj(t)| < D   for all i, j and t ∈ I

• If the system is externally synchronized with a bound D, it is also internally synchronized with a bound of 2D: each clock is within D of the external source, so by the triangle inequality any two clocks agree to within 2D. The converse does not hold, since internally synchronized clocks may drift collectively away from the external source while still agreeing with one another.

2. Clock Correctness

To ensure that clocks function correctly, several conditions are defined:

• Correctness: A hardware clock H is considered correct if its drift rate (the rate at which it diverges from the true time) is within a known limit, such as 10^−6 seconds per second. This means the error in measuring time is bounded:

|t − H(t)| < bounded error

• Monotonicity: This is a weaker condition that requires a clock C only to advance over time; it should never jump backward. For example, if a clock running fast is reset backwards, it could lead to incorrect event ordering in applications (like the UNIX make tool, which relies on timestamps). Monotonicity ensures that:

t1 < t2 ⟹ C(t1) < C(t2)

• Hybrid Condition: Sometimes a hybrid condition is used, where a clock must obey monotonicity and have a bounded drift rate between synchronization points, but may be allowed to jump ahead at synchronization points.
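In practice, monotonicity is preserved by slewing the clock (temporarily changing its rate) rather than stepping it backwards. The following minimal Python sketch illustrates the idea; the class name, the 10% slew rate, and the interface are illustrative assumptions, not from the text:

```python
class MonotonicClock:
    """A software clock that never runs backwards (illustrative sketch).

    If an adjustment would set the clock back, the correction is absorbed
    gradually by slowing the clock, preserving t1 < t2 => C(t1) < C(t2).
    """
    def __init__(self):
        self.offset = 0.0     # correction currently applied to hardware time
        self.pending = 0.0    # negative correction still to be absorbed
        self.slew_rate = 0.1  # absorb at most 10% of each elapsed interval
        self.last_hw = 0.0

    def read(self, hw_time):
        """Return the software clock value for hardware time hw_time."""
        elapsed = hw_time - self.last_hw
        self.last_hw = hw_time
        if self.pending < 0.0:
            # absorb part of the pending backward correction, never more
            # than slew_rate * elapsed, so the clock keeps moving forward
            absorb = max(self.pending, -self.slew_rate * elapsed)
            self.offset += absorb
            self.pending -= absorb
        return hw_time + self.offset

    def adjust(self, delta):
        """Apply a correction: forward jumps immediately, backward by slewing."""
        if delta >= 0.0:
            self.offset += delta
        else:
            self.pending += delta

clock = MonotonicClock()
a = clock.read(1.0)   # initial reading
clock.adjust(-0.05)   # asked to go back 0.05: slewed, not stepped
b = clock.read(1.1)   # later reading is still ahead of the earlier one
```

Even though the clock was told to go back, `b > a` holds: the correction is spread over subsequent readings instead of being applied as a backward step.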

3. Faulty Clocks

A clock that does not adhere to the correctness conditions is considered faulty. There
are two types of clock failures:

• Crash Failure: The clock stops working entirely.

• Arbitrary Failure: The clock behaves unpredictably, such as when the "Y2K bug"
caused clocks to incorrectly register dates after December 31, 1999, as January
1, 1900.

Importantly, a clock does not need to be accurate (i.e., showing the correct current
time) to be considered correct. The focus is on the clock's ability to function reliably,
maintaining proper event ordering and synchronization with other clocks.

4. Algorithms for Synchronization

The chapter goes on to discuss various algorithms designed for both external and
internal synchronization of clocks in distributed systems. These algorithms help ensure
that processes can coordinate their actions based on time, which is essential for
maintaining consistency and correctness across the system.
Synchronization in a Synchronous System

In this section, we explore how synchronization works between processes in a synchronous distributed system, focusing on internal synchronization. Let's break down the concepts step by step.

1. Understanding Synchronous Systems

A synchronous distributed system is one where certain timing constraints are known:

• Drift Rate: The maximum rate at which the clocks can drift from the true time.

• Message Transmission Delay: The maximum time it takes for a message to be sent from one process to another.

• Execution Time: The maximum time required for a process to execute a single
step.

These known bounds help ensure that processes can synchronize their clocks
effectively.

2. Internal Synchronization Between Two Processes

Let’s consider two processes, P1 and P2, that need to synchronize their clocks.

• Step 1: Sending the Time:

o Process P1 sends its local clock time t to P2 in a message m.

• Step 2: Receiving the Time:

o Upon receiving the message, P2 can adjust its clock. In principle, P2 could set its clock to t + Ttrans, where Ttrans is the time taken for the message to travel from P1 to P2.

However, the challenge is that Ttrans can vary and is unknown, due to factors such as network congestion and the processing load on the nodes.

3. Transmission Time and Uncertainty

• Minimum and Maximum Transmission Time:

o There is always a minimum transmission time (Tmin) that can be measured or estimated. This is the best-case scenario, where no other processes are competing for resources.

o There is also a maximum transmission time (Tmax), which is the worst-case delay for any message to be sent.

• Uncertainty in Transmission Time:

o We define the uncertainty in message transmission time as u, where:

u = Tmax − Tmin

• This means that the actual transmission time Ttrans can lie anywhere between Tmin and Tmax.

4. Setting the Clock

When P2 receives the message, it has to decide how to set its clock, given the uncertainty:

• Setting the Clock to t + Tmax:

o If P2 sets its clock to t + Tmax, the clock skew (the difference between the two clocks) can be as much as u if the message actually took the minimum time to arrive.

• Setting the Clock to t + Tmin:

o If it sets its clock to t + Tmin, the skew can likewise be as large as u.

• Optimal Setting:

o To minimize the skew, P2 can set its clock to the halfway point:

t + (Tmin + Tmax)/2

• This approach reduces the maximum possible skew to:

u/2
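The midpoint rule can be sketched in a few lines of Python (the function name and the sample values are illustrative):

```python
def set_receiver_clock(t, t_min, t_max):
    """Set the receiver's clock on receipt of a time message.

    t      -- sender's clock value carried in the message
    t_min  -- minimum message transmission time
    t_max  -- maximum message transmission time
    Returns (new_clock_value, worst_case_skew).
    """
    u = t_max - t_min                   # uncertainty in transmission time
    new_time = t + (t_min + t_max) / 2  # midpoint of the possible arrival window
    return new_time, u / 2              # resulting skew is at most u/2

# Example: sender's clock reads 100.0; Tmin = 1 ms, Tmax = 5 ms
clock, skew = set_receiver_clock(100.0, 0.001, 0.005)
```

With these values the receiver sets its clock to 100.003 and the worst-case skew is 2 ms, half the 4 ms uncertainty.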

5. Synchronizing Multiple Clocks

When synchronizing N clocks in a synchronous system, the optimal bound on clock skew that can be achieved is:

u · (1 − 1/N)

For N = 2 this agrees with the u/2 bound above; as N grows the achievable bound approaches u. In all cases the skew remains proportional to the uncertainty u.

6. Asynchronous Systems

Most real-world distributed systems are asynchronous. In these systems:

• The factors leading to message delays are unpredictable, and there is no upper
bound on transmission delays.

• The transmission time can be expressed as:

Ttrans = Tmin + x

where x can be any non-negative value that is not known in advance.

In asynchronous systems, because of the lack of known bounds, synchronization becomes much more complex and challenging.

Cristian’s Method for Synchronizing Clocks

Cristian's method, proposed in 1989, is a technique for synchronizing the clocks of computers using a centralized time server. The method is designed to provide external synchronization with Coordinated Universal Time (UTC). Let's break down how it works, its assumptions, and its limitations.

1. Overview of Cristian's Method

• Time Server: Cristian’s method uses a time server (call it S) that is connected to a device receiving UTC time signals. The time server is responsible for providing the current time to requesting processes.

• Request-Response Mechanism:

o A process p (the client) sends a request message mr to the time server S asking for the current time.

o The server S responds with a message mt that contains the current time t. This time t is recorded just before the message is sent.

2. Measuring Round-Trip Time

• Round-Trip Time: The time it takes for the request mr to reach the server and for the response mt to come back to process p is called the round-trip time Tround.

• Clock Drift: The accuracy of the round-trip time measurement depends on the clock drift of process p. For example, if the drift rate is 10^−6 seconds per second and the round-trip time is about 1–10 milliseconds (0.001 to 0.01 seconds), the clock drift during that period is negligible (at most about 10^−5 milliseconds).

3. Estimating the Time

• Setting the Clock: To set its clock accurately, process p can estimate the time to which it should set its clock using:

t′ = t + Tround/2

This assumes that the time taken for the request to reach the server and for the response to return are roughly equal.

• Accounting for Transmission Time:

o If the minimum transmission time Tmin is known, the accuracy can be further refined. The earliest S could have placed the time in mt is Tmin after p sent mr, and the latest it could have done so is Tmin before mt arrives back at p.

o Therefore, the time by the server's clock when the reply arrives lies in the range:

[t + Tmin, t + Tround − Tmin]

• The width of this range, Tround − 2Tmin, indicates the potential error: the estimate t + Tround/2 is accurate to within ±(Tround/2 − Tmin).
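Putting the estimate and its accuracy bound together, a small Python sketch (the function name and sample values are illustrative):

```python
def cristian_estimate(t_server, t_round, t_min=0.0):
    """Estimate the clock setting from a time-server reply (Cristian's method).

    t_server -- time t carried in the server's reply
    t_round  -- round-trip time measured at the client
    t_min    -- known minimum one-way transmission time (0 if unknown)
    Returns (estimate, accuracy): set the clock to `estimate`,
    which is accurate to within +/- `accuracy`.
    """
    estimate = t_server + t_round / 2
    accuracy = t_round / 2 - t_min  # half the range width Tround - 2*Tmin
    return estimate, accuracy

# Example: server replied t = 1000.0, round trip 8 ms, Tmin = 1 ms
est, acc = cristian_estimate(1000.0, 0.008, 0.001)
```

Here the client sets its clock to 1000.004 with an accuracy of ±3 ms; note that without a known Tmin the accuracy bound degrades to ±Tround/2.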

4. Improving Accuracy

• Multiple Requests: To improve accuracy, process p can send multiple requests to the time server, spacing them out to avoid temporary network congestion. By taking the minimum round-trip time among these requests, p can obtain a more accurate estimate of the synchronization time.

5. Limitations of Cristian’s Method

• Single Point of Failure: One significant drawback of Cristian's method is that it relies on a single time server. If this server fails, synchronization becomes impossible. To mitigate this risk, Cristian suggested using multiple synchronized time servers; a client could send requests to all of them and use the first response.

• Faulty or Malicious Servers: If a time server provides incorrect time values, or if an imposter server responds with false times, it can cause significant problems in the system. Cristian's original work did not address these security issues, assuming that external time sources are reliable.

6. Related Work

• Cristian and his colleagues later proposed probabilistic protocols for internal
clock synchronization that tolerate certain failures.

• Other researchers, like Srikanth and Toueg, developed algorithms that optimize
accuracy while tolerating some faulty clocks. Dolev et al. highlighted the need for
a minimum number of correct clocks to achieve agreement in the presence of
faulty ones.

• The Berkeley algorithm, which will be described later, addresses synchronization among multiple clocks and includes mechanisms for dealing with faulty clocks.
7. Security Considerations

• To prevent malicious interference with time synchronization, authentication techniques can be implemented. This ensures that only trusted servers can provide time updates to clients.

The Berkeley Algorithm for Clock Synchronization

The Berkeley algorithm, developed by Gusella and Zatti in 1989, is a method for
synchronizing the clocks of multiple computers in a distributed system, specifically
designed for Berkeley UNIX. Here’s a detailed yet straightforward explanation of how the
algorithm works, its components, and its advantages.

1. Overview of the Berkeley Algorithm

The Berkeley algorithm is an internal synchronization method involving a designated master computer (the coordinator) and several slave computers whose clocks need to be synchronized. Here’s how it operates:

• Coordinator (Master): A specific computer is chosen to act as the master. It is responsible for polling the other computers to gather their clock values.

• Slaves: The other computers in the network are referred to as slaves. They send
their current clock readings to the master when requested.

2. Polling and Collecting Clock Values

• Polling Process: The master periodically sends requests to the slave computers
to ask for their current clock values.

• Round-Trip Time Measurement: As in Cristian’s method, the master estimates the time it takes for messages to travel to and from the slaves (the round-trip time). This tells the master how much delay may have occurred in receiving each clock value.

3. Calculating the Average Clock Time

• Averaging Clock Values: Once the master has collected the clock values from
all the slaves, it calculates an average clock time. This average includes the
master’s own clock reading. The idea is that averaging helps to cancel out any
individual clock errors (whether they run fast or slow).
• Handling Outliers: The master also sets a nominal maximum round-trip time. If
any clock readings are associated with round-trip times longer than this
maximum, those readings are ignored. This helps eliminate outliers caused by
faulty clocks or network delays.

4. Adjusting Slave Clocks

• Sending Adjustments: Instead of sending the updated time back to the slaves
(which could introduce uncertainty due to transmission delays), the master
calculates how much each slave’s clock needs to be adjusted. This adjustment
can be either positive (to speed up the clock) or negative (to slow it down).

5. Fault Tolerance

• Fault-Tolerant Average: The Berkeley algorithm is designed to handle faulty clocks effectively. If a clock reading differs significantly from the others, it may indicate a malfunctioning clock.

• Subset Selection: The master selects a subset of clock readings that do not
differ from one another by more than a specified threshold. The average is then
calculated only from these selected clocks, which helps ensure that faulty
clocks do not skew the results.
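One plausible way to realize this subset selection is sketched below in Python. This simplified version picks the largest subset of readings lying within the threshold of some candidate reading and averages it; the real master additionally discards readings whose round-trip times exceed the nominal maximum. Function and parameter names are illustrative:

```python
def fault_tolerant_average(readings, threshold):
    """Average only clock readings that agree with each other.

    readings  -- clock values collected from master and slaves
    threshold -- maximum allowed difference from a candidate reading
    Returns the average of the largest agreeing subset, so that a
    wildly wrong (faulty) clock cannot skew the result.
    """
    best = []
    for candidate in readings:
        subset = [r for r in readings if abs(r - candidate) <= threshold]
        if len(subset) > len(best):
            best = subset
    return sum(best) / len(best)

# Three clocks agree closely; one (3.50) is faulty and gets excluded.
avg = fault_tolerant_average([1.00, 1.02, 0.98, 3.50], threshold=0.1)
```

The faulty reading 3.50 forms a subset of its own, so the average is computed only over the three agreeing clocks.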

6. Example and Performance

• Experiment Results: In experiments conducted by Gusella and Zatti, 15 computers’ clocks were synchronized to within about 20–25 milliseconds. The local clocks had drift rates of less than 2×10^−5, and the maximum round-trip time was set at 10 milliseconds.

7. Handling Master Failures

• Master Election: If the master fails, another computer can be elected to take
over its role. This election process is crucial for maintaining synchronization in
the network.

• Election Algorithms: While there are algorithms for electing a new master, they
do not guarantee that a new master will be elected within a bounded time frame.
This means that if there is a delay in electing a new master, the clocks may drift
apart during that time.

The Network Time Protocol (NTP)


The Network Time Protocol (NTP) is a widely used method for synchronizing the clocks of computers over the Internet. Designed by David Mills (and described in detail in a 1995 paper), NTP addresses the challenges of achieving accurate time synchronization across networks that experience variable delays. Here’s a detailed but easy-to-understand overview of how NTP works, its architecture, and its key features.

1. Purpose of NTP

NTP is designed to:

• Synchronize Clients to UTC: NTP helps clients across the Internet synchronize
their clocks accurately to Coordinated Universal Time (UTC), despite the large
delays that can occur in Internet communication.

• Provide Reliability: It offers a reliable service that can continue to function even
if some servers or connections fail.

• Offset Clock Drift: Clients can resynchronize frequently enough to counteract the natural drift of computer clocks.

• Ensure Security: NTP uses authentication techniques to verify that the time data
comes from trusted sources and to protect against malicious interference.

2. NTP Architecture

NTP operates through a hierarchical structure of time servers, organized into levels
called strata:

• Stratum 1: These are primary servers directly connected to a reliable time source, such as a GPS clock or an atomic clock. They provide the most accurate time.

• Stratum 2: These servers synchronize their clocks with Stratum 1 servers. They
are one step removed from the primary time sources.

• Stratum 3 and Lower: These servers synchronize with Stratum 2 servers, and so
on. Each lower stratum level may introduce some error, making them slightly
less accurate.

The overall structure is referred to as a synchronization subnet, in which servers communicate and share time information (see Figure 14.3 in your reference).

3. Synchronization Process

NTP uses different modes for synchronization:

• Multicast Mode: Suitable for high-speed LANs, where one or more servers
periodically broadcast their time to other computers. This mode is less accurate
but useful for many applications.
• Procedure-Call Mode: Similar to Cristian’s method, where a server responds to
requests from clients with its current timestamp. This mode is used when higher
accuracy is required.

• Symmetric Mode: Used between servers that need to share time information
accurately. Servers exchange timing data, and this mode is designed for high
accuracy, especially between servers at lower strata.

4. Message Exchange and Timestamping

In NTP, messages are sent using the User Datagram Protocol (UDP), which is an
unreliable transport method. Each NTP message includes several timestamps:

• Ti−3: the local time at which the first message was sent.

• Ti−2: the receiving server's local time at which the first message arrived.

• Ti−1: the receiving server's local time at which the second message was sent back.

• Ti: the local time at which the second message was received back at the sender.

Using these timestamps, NTP calculates:

• Offset (o): The estimated difference between the two clocks.

• Delay (d): The total time taken for the messages to be sent and received.
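From these four timestamps NTP computes the standard estimates o = ((Ti−2 − Ti−3) + (Ti−1 − Ti))/2 and d = (Ti − Ti−3) − (Ti−1 − Ti−2). A minimal Python sketch (function and argument names are illustrative):

```python
def ntp_offset_delay(t0, t1, t2, t3):
    """Estimate offset and delay from an NTP message pair.

    t0 -- Ti-3: local time the first message was sent
    t1 -- Ti-2: peer's time the first message arrived
    t2 -- Ti-1: peer's time the reply was sent
    t3 -- Ti:   local time the reply arrived
    Returns (offset, delay): offset is the estimated amount by which
    the peer's clock is ahead of ours; delay is the total transmission
    time of the two messages.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay

# Peer's clock is 5 ahead of ours; each one-way trip takes 2.
# Sent at 10, arrives at peer time 17; reply sent at peer time 18,
# arrives back at our time 15.
o, d = ntp_offset_delay(10, 17, 18, 15)
```

The sketch recovers offset 5 and total delay 4, matching the scenario in the comment; real NTP applies these estimates only after statistical filtering over many message pairs.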

5. Filtering and Accuracy

NTP employs statistical techniques to ensure accuracy:

• Data Filtering: NTP servers filter the timing data they receive, keeping track of
the most recent pairs of offset and delay values. This helps to identify and
discard unreliable data.

• Peer Selection: NTP servers prioritize synchronization with peers that have
lower stratum numbers (closer to the primary time source) and lower dispersion
(less variability in timekeeping).

6. Phase Lock Loop Model

NTP uses a phase lock loop model to adjust the local clock based on its observed drift rate. For example, if a clock consistently gains time, NTP slightly lowers the rate at which the local clock advances to compensate, thereby improving overall accuracy.

7. Performance

NTP can achieve synchronization accuracies of:

• Tens of milliseconds over the Internet.


• One millisecond on local area networks (LANs).

8. Redundancy and Resilience

NTP is designed to handle failures:

• Redundant Servers: If one server fails, others can take over without disrupting the time service.

• Dynamic Reconfiguration: The synchronization subnet can adapt to changes, such as servers becoming unreachable or failing.

Understanding Logical Time and Logical Clocks

In a distributed system, where multiple processes run on different machines, it is challenging to synchronize their clocks perfectly. Each process has its own local clock, and the timestamps these clocks produce cannot be used to determine the order of events across different processes. To address this, Lamport introduced a way to order events using logical time.

The Happened-Before Relation

The happened-before relation (denoted “→”) establishes a partial ordering of events in a distributed system. Here’s how it works:

1. Local Events: If two events occur in the same process, they are ordered by the time they are observed in that process. For example, if process p1 performs action a and then action b, we say a → b.

2. Message Passing: When a process sends a message to another process, the sending of the message is considered to happen before the receiving of that message. So, if process p1 sends a message m and process p2 receives it, we have send(m) → receive(m).

3. Transitive Relation: If event e1 happened before e2 (i.e., e1 → e2) and e2 happened before e3 (i.e., e2 → e3), then we can conclude that e1 happened before e3 (i.e., e1 → e3). This is similar to how we think about time in a linear fashion.

Example of the Happened-Before Relation

Imagine three processes p1, p2, and p3:

• Process p1:

o Event a (local event)

o Event b (sends a message to p2, after a)

• Process p2:

o Event c (receives the message from p1)

o Event d (sends a message to p3)

• Process p3:

o Event e (receives that message)

From these events, we can establish the following relations:

• a → b (since they occur in the same process)

• b → c (since b sends a message that c receives)

• d → e (since d sends a message that e receives)

Concurrent Events

Not all events are related by the happened-before relation. If two events, say a and e, occur in different processes and no chain of messages connects them, we say that a and e are concurrent, denoted a ∥ e. This means we cannot determine which event happened first based on the happened-before relation.

Potential Causality

It’s important to note that the happened-before relation captures potential causality,
not actual causality. Just because one event happened before another in terms of the
relation does not mean that the first event caused the second. For example, if a server
receives a request and then sends a reply, the reply is ordered after the request.
However, if the server sends replies every five minutes regardless of requests, there is
no causal link between the two events, even though they are ordered by the happened-
before relation.

1. Logical Clocks

Logical clocks were introduced by Leslie Lamport in 1978 to provide a way to order
events in a distributed system where processes cannot perfectly synchronize their
physical clocks.

Key Features of Lamport Logical Clocks:

• Monotonically Increasing Counter: Each process pi maintains its own logical clock Li, a simple counter that increases over time. Its value does not relate to any physical time; it simply counts events.

• Timestamping Events: When a process executes an event, it increments its logical clock before recording the event's timestamp. The timestamp of an event e at process pi is denoted Li(e).

Rules for Updating Logical Clocks:

1. Increment Before Event: Before any event occurs (such as sending a message), the process increments its logical clock:

Li := Li + 1

2. Sending Messages: When process pi sends a message m, it includes its current logical clock value Li with the message:

send(m, Li)

3. Receiving Messages: When process pj receives a message m with timestamp t, it updates its logical clock:

Lj := max(Lj, t) + 1

Then it timestamps the receive event.

Example:

• Suppose three processes p1, p2, and p3 start with L1 = 0, L2 = 0, and L3 = 0.

• If p1 sends a message, it increments its clock to L1 = 1 and sends (m, 1).

• If p2 receives this message, it updates its clock to L2 = max(L2, 1) + 1 = 2.

This method allows us to maintain a logical order of events based on the happened-
before relation.
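The three rules can be collected into a small Python class (a minimal sketch; the class and method names are illustrative):

```python
class LamportClock:
    """Lamport logical clock for a single process (illustrative sketch)."""
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1                 # rule 1: increment before each event
        return self.time

    def send(self):
        self.time += 1                 # sending is itself an event
        return self.time               # timestamp piggybacked on the message

    def receive(self, msg_time):
        self.time = max(self.time, msg_time) + 1  # rule 3: merge, then +1
        return self.time

p1, p2 = LamportClock(), LamportClock()
t = p1.send()        # p1's clock becomes 1; the message carries timestamp 1
r = p2.receive(t)    # p2's clock becomes max(0, 1) + 1 = 2
```

This reproduces the example above: the send event at p1 gets timestamp 1 and the receive event at p2 gets timestamp 2, so send(m) → receive(m) implies L(send) < L(receive).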

2. Totally Ordered Logical Clocks

While Lamport's logical clocks help to establish a partial order of events, sometimes we
need a total order, meaning every pair of events can be compared.

How to Create a Total Order:

1. Combining Timestamps with Process Identifiers: To ensure that every event can be uniquely ordered, we combine the logical clock value with the identifier of the process that generated the event. If e is an event at process pi with timestamp Ti and e′ is at process pj with timestamp Tj, we define:

o Global timestamps: (Ti, i) for event e and (Tj, j) for event e′.

o We say (Ti, i) is less than (Tj, j) if:

▪ Ti < Tj, or

▪ Ti = Tj and i < j.

Example:

• If e occurs at p1 with timestamp 2 and e′ occurs at p2 with timestamp 3, then (2, 1) < (3, 2).

• If both events have the same timestamp (e.g., both have timestamp 2), we can
order them based on their process identifiers.

This total ordering is useful in scenarios like managing access to shared resources
(critical sections).
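Because Python tuples compare lexicographically, the (Ti, i) ordering can be expressed directly (a minimal sketch; the function name is illustrative):

```python
def before(ts_a, ts_b):
    """Total-order comparison of global timestamps (lamport_time, process_id).

    Python compares tuples lexicographically: first by timestamp,
    then by process identifier when the timestamps are equal.
    """
    return ts_a < ts_b

# (2, 1) < (3, 2): decided by the timestamps 2 < 3.
# (2, 1) < (2, 2): equal timestamps, tie broken by process ids 1 < 2.
```

Since every pair of distinct (timestamp, id) tuples is comparable, this gives the total order used, for example, to grant critical-section access in timestamp order.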

3. Vector Clocks

Vector clocks, developed by Mattern and Fidge, improve upon Lamport's clocks by
allowing us to determine concurrency (whether two events are independent) and
potential causality more effectively.

Key Features of Vector Clocks:

• Array of Counters: Each process pi maintains a vector clock Vi, an array of integers of size N (the total number of processes). Entry Vi[j] counts the number of events at process pj that could have affected pi.

Rules for Updating Vector Clocks:

1. Initialization: Each process initializes its vector clock to zero:

Vi[j] = 0 for all j

2. Increment Before Event: Before process pi timestamps an event, it increments its own entry:

Vi[i] := Vi[i] + 1

3. Sending Messages: When pi sends a message, it includes its vector clock:

send(m, Vi)

4. Receiving Messages: When pj receives a message with vector clock V, it updates its own vector clock:

Vj[k] := max(Vj[k], V[k]) for all k

Then it increments its own entry:

Vj[j] := Vj[j] + 1

Example:

• Suppose we have three processes with vector clocks initialized to (0, 0, 0).

• If p1 sends a message after incrementing its clock to (1, 0, 0), then p2, on
receiving it, updates its clock to (1, 1, 0).

Comparing Vector Clocks:

To determine the relationship between two events based on their vector clocks Ve
and Vf:

• Causally Related: Ve < Vf if Ve[j] ≤ Vf[j] for all j and there exists at
least one k such that Ve[k] < Vf[k].

• Concurrent Events: If neither Ve < Vf nor Vf < Ve, then the events are
concurrent.
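The four update rules and the comparison test can be sketched together. Class and function names are illustrative:

```python
class VectorClock:
    """Vector clock for one process in a system of n processes (sketch)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n          # Rule 1: initialize every entry to zero

    def event(self):
        self.v[self.pid] += 1     # Rule 2: increment own entry before an event

    def send(self):
        self.event()
        return list(self.v)       # Rule 3: piggyback a copy on the message

    def receive(self, other):
        # Rule 4: merge component-wise, then increment own entry.
        self.v = [max(a, b) for a, b in zip(self.v, other)]
        self.v[self.pid] += 1

def happened_before(ve, vf):
    """Ve < Vf iff Ve[j] <= Vf[j] for all j and the clocks differ somewhere."""
    return all(a <= b for a, b in zip(ve, vf)) and ve != vf

def concurrent(ve, vf):
    return not happened_before(ve, vf) and not happened_before(vf, ve)

# Replaying the example above: p1 sends, p2 receives.
p1, p2 = VectorClock(0, 3), VectorClock(1, 3)
m = p1.send()                 # p1's clock: [1, 0, 0]
p2.receive(m)                 # p2's clock: [1, 1, 0]
```

Note that `happened_before` gives only a partial order: two independent events such as [1, 0, 0] and [0, 1, 0] compare as concurrent, which Lamport clocks alone cannot detect.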

Global States in Distributed Systems

In distributed systems, multiple processes run concurrently and may communicate


with each other via messages. A global state is a snapshot of the entire system at a
particular point in time, capturing the states of all processes and the status of
communication channels (like messages in transit). Understanding global states is
essential for solving various problems in distributed systems.

1. Distributed Garbage Collection

Garbage collection is the process of identifying and reclaiming memory that is no


longer needed. In a distributed system, an object is considered "garbage" if there are no
references to it from any process.

Key Points:

• References: An object is only useful if at least one process has a reference to it.
If no process has a reference, the object can be safely deleted.

• In-Transit Messages: We must also consider messages that are currently being
sent between processes. If a message contains a reference to an object, that
object cannot be considered garbage until the message is delivered.
Example:

• Suppose process p1 has two objects: one referenced by itself and another
referenced by process p2. If process p2 has an object that no one
references, it can be considered garbage. However, if there is a message in transit
from p2 to p1 that references another object, that object cannot be garbage
until the message is delivered.

2. Distributed Deadlock Detection

A deadlock occurs when a set of processes are blocked because each process is
waiting for a message from another process in the set, forming a cycle in the wait-for
graph.

Key Points:

• Wait-For Graph: This graph represents which process is waiting for which other
process. If there is a cycle in this graph, a deadlock exists.

• Example: If process p1 is waiting for a message from process p2,
and process p2 is waiting for a message from process p1, they are in a
deadlock. Neither can proceed, and the system cannot make progress.
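Detecting such a deadlock amounts to finding a cycle in the wait-for graph. A minimal depth-first-search sketch (the dictionary encoding of the graph is illustrative):

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph given as
    {process: set of processes it is waiting on}.
    A cycle means the processes on it are deadlocked."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited / on current path / done
    color = {p: WHITE for p in wait_for}

    def dfs(p):
        color[p] = GREY
        for q in wait_for.get(p, ()):
            if color.get(q, WHITE) == GREY:   # back edge: a cycle
                return True
            if color.get(q, WHITE) == WHITE and dfs(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and dfs(p) for p in wait_for)

# p1 waits for p2 and p2 waits for p1: a cycle, hence deadlock.
assert has_deadlock({"p1": {"p2"}, "p2": {"p1"}})
# p1 waits for p2, but p2 is not waiting: no cycle, no deadlock.
assert not has_deadlock({"p1": {"p2"}, "p2": set()})
```

This checks a single, already-collected graph; the distributed difficulty discussed in this chapter is assembling a consistent wait-for graph in the first place.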

3. Distributed Termination Detection

Termination detection is about figuring out whether a distributed algorithm has


finished executing. It may seem simple at first, but it can be tricky due to the nature of
distributed processes.

Key Points:

• Active vs. Passive Processes: An active process is currently doing work, while
a passive process is not actively engaged but is ready to respond to requests.

• Example Scenario: Imagine two processes, p1 and p2. If both are found to
be passive, you might think the algorithm has terminated. However, if there is a
message in transit from p2 to p1, then p1 could become active again when it
receives the message, indicating that the algorithm has not truly terminated.

4. Distributed Debugging

Debugging distributed systems is challenging because we need to understand the


interactions between processes over time.

Key Points:

• Consistency Constraints: In some applications, variables in different processes


must remain consistent or within a certain range of each other.
• Example: Suppose each process pi has a variable xi. If there is a bug
causing xi and xj to differ by more than the allowed range, debugging requires
knowing the values of these variables at the same time, which is difficult in a
distributed setting.

Global States in Distributed Systems

In distributed systems, multiple processes run concurrently, and each process has its
own local state. A global state represents the collective state of all processes and their
communication channels at a particular moment. Understanding global states is
crucial for analyzing the behavior of distributed systems, especially when detecting
issues like deadlocks or ensuring consistency.

Challenges in Observing Global States

1. Absence of Global Time: Unlike centralized systems, distributed systems lack a
synchronized clock across all processes. This makes it difficult to capture a
global state at a specific instant, since each process may record its state at
a different time.

2. Meaningful Global States: Even if we could collect states from each process,
not all combinations of these states would represent a valid global state
because of the asynchronous nature of communication.

Definitions

To understand global states, we need to define a few key concepts:

• History of a Process: The history of a process pi is the sequence of events that
occur in that process. We denote this as:

hi = ⟨ei^0, ei^1, ei^2, …⟩

where each ei^k is an event in the history of process pi.

• State of a Process: The state si^k of process pi is its state immediately
before the k-th event occurs.

• Global History: The global history H of the system is the union of the histories
of all processes:

H = h1 ∪ h2 ∪ … ∪ hN

• Cut: A cut is a subset of the global history that consists of prefixes from each
process's history. A cut can be represented as:

C = ⟨h1^c1, h2^c2, …, hN^cN⟩

where ci indicates the last event from process pi included in the cut.

Consistent Cuts

A cut is consistent if it respects the causal relationships between events. This means
that if an event e in the cut happens after an event f (i.e., f happened-before e),
then f must also be included in the cut.

Causal Relationship

The happened-before relation (denoted as →) is a way to express the order of events:

• If a process sends a message, and another process receives it, the sending event
happened-before the receiving event.

• If two events occur in the same process, they are ordered by their occurrence.

Example of Cuts

Consider two processes p1 and p2 with the following events:

• p1: e1^0 (send message m1), e1^1 (send message m2)

• p2: e2^0 (receive message m1), e2^1 (receive message m2)

Inconsistent Cut: A cut that includes e2^0 (the receipt of m1) but does not
include e1^0 (the sending of m1) is inconsistent because it shows an effect
without its cause.

Consistent Cut: A cut that includes both e1^0 and e2^0 (sending and
receiving m1) is consistent. It reflects the actual execution, where the message was
sent before it was received.
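The consistency condition can be checked mechanically: a cut is consistent exactly when every receive event it contains is accompanied by the matching send event. A minimal sketch (the event names mirror the example above and are illustrative):

```python
def is_consistent(cut, messages):
    """cut: set of event names included in the cut.
    messages: list of (send_event, receive_event) pairs.
    A cut is consistent iff no receive appears without its send."""
    return all(send in cut for send, recv in messages if recv in cut)

msgs = [("e1_0", "e2_0"),   # m1: sent at e1_0, received at e2_0
        ("e1_1", "e2_1")]   # m2: sent at e1_1, received at e2_1

# Receipt of m1 without its send: effect without cause, inconsistent.
assert not is_consistent({"e2_0"}, msgs)
# Both send and receive of m1 included: consistent.
assert is_consistent({"e1_0", "e2_0"}, msgs)
# A send without its receive is fine: the message is simply in transit.
assert is_consistent({"e1_0"}, msgs)
```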

Global States and Transitions

A global state corresponds to a consistent cut. The system transitions between global
states as events occur. Each transition involves a single event happening in one
process, which can be:

• Sending a message

• Receiving a message

• An internal event (like a computation)

Linearization and Runs


• A linearization is a total ordering of events from the global history that respects
the happened-before relation. This means that if event f happened before
event e, then f must appear before e in the linearization.

• A run is a sequence of events that may not necessarily respect the order of
concurrent events. However, all linearizations must pass through consistent
global states.

Reachability of States

A state S′ is said to be reachable from a state S if there is a linearization that passes
through S and then S′. This means you can transition from one global state to
another through a valid sequence of events.

Global State Predicates

• Global State Predicate: This is a function that evaluates the possible global
states of a distributed system and returns either True or False. For instance,
predicates can check conditions like:

o Is the system in a deadlock?

o Is the system terminated?

o Is an object eligible for garbage collection?

Characteristics of Global State Predicates:

1. Stability: A predicate is stable if, once a global state satisfies the predicate
(returns True), all future reachable states from that state also satisfy it. This is an
important property for certain conditions we want to monitor.

o Example of Stability: If a system is in a state of deadlock (i.e., the


deadlock predicate evaluates to True), it will always be in deadlock for any
future states it might reach, unless some intervention occurs.

2. Non-Stable Predicates: Conversely, some predicates are non-stable. These are


conditions where the state may only hold for a moment and can change
afterward.

o Example of Non-Stability: If a program variable is supposed to always


have a certain value (like a bounded difference constraint), it might reach
a state where it’s true, but this doesn’t guarantee it will remain true in
subsequent states. So, the condition can change over time.
Safety and Liveness

These concepts help categorize the properties of global states into two high-level
concepts: safety and liveness.

Safety

• Safety refers to the absence of undesirable conditions. A system is said to be
safe with respect to a predicate if that predicate evaluates to False for all
states reachable from an initial state S0.

• Example of Safety: Suppose we have a condition that indicates a deadlock (let's
call this predicate P_deadlock). If the system starts in state S0 (which is not
deadlocked), then safety asserts that there should be no reachable states
from S0 where the system is deadlocked. In essence, “nothing bad happens.”

Liveness

• Liveness, on the other hand, pertains to desirable conditions. A system is
considered live with respect to a predicate if, for every linearization L
starting from an initial state S0, there exists some state SL in L such that
the predicate returns True.

• Example of Liveness: If our predicate indicates termination (let's call this
predicate P_termination), being live means that from S0, no matter how you
execute the processes (following any valid sequence of events), you will
eventually reach a state where the system terminates. In other words,
“something good eventually happens.”

Overview

• Global State Predicate: A function that evaluates system states to True/False


based on certain conditions.

• Stability: Once a stable predicate is True, it remains True in all reachable future
states.

• Safety: Ensures that undesirable conditions (like deadlock) do not occur from a
given initial state.

• Liveness: Guarantees that from any initial state, a desirable condition (like
termination) will eventually be satisfied.

Purpose of the Snapshot Algorithm


The snapshot algorithm aims to record a consistent global state of a distributed
system, even though the individual states are not recorded simultaneously. This is
important for evaluating stable global predicates: conditions that, once true,
remain true in all subsequent states of the system.

Key Assumptions

Before we dive into the algorithm, here are some key assumptions it makes:

1. Reliable Communication: All messages sent between processes are received


intact, and no messages are lost.

2. Unidirectional Channels: Messages flow in one direction (from one process to


another) and are delivered in the order they were sent (FIFO - First In, First Out).

3. Strong Connectivity: Every process can communicate with every other process,
meaning there’s a path from any process to any other process.

4. Independent Initiation: Any process can start the snapshot at any time.

5. Concurrent Execution: Processes can continue to send and receive messages


while the snapshot is being taken.

How the Algorithm Works

The algorithm records the state of each process and the state of the communication
channels. Here’s how it operates:

Step-by-Step Process

1. Recording State:

o Each process pipi records its own state when it receives a


special marker message. This state includes its local variables and any
relevant information.

o For each incoming channel, it records a set of messages that have arrived
after it recorded its state.

2. Marker Messages:

o Marker messages are special messages used to trigger state recording.


These messages are distinct from regular application messages.

o When a process sends a marker message, it must do so after recording


its state but before sending any other application messages.

3. Incoming Channel States:

o When a process receives a marker message:


▪ If it has not yet recorded its state, it records its state and
initializes the state of the incoming channel as empty.

▪ If it has already recorded its state, it records the state of the


incoming channel as the set of messages received since it
recorded its state.
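The marker rules above can be sketched as a small per-process class. This is a simplified, single-snapshot sketch under the chapter's assumptions (reliable FIFO channels); the class, the numeric application state, and the `transport` function are illustrative names, not part of the algorithm's original presentation:

```python
from collections import deque

class SnapshotProcess:
    """One process running the snapshot marker rules (simplified sketch)."""

    MARKER = "MARKER"

    def __init__(self, pid, incoming, outgoing, transport):
        self.pid = pid
        self.state = 0                  # application state (here, just a number)
        self.recorded_state = None      # None until we have recorded
        self.channel_states = {}        # finished per-channel recordings
        self.buffering = {}             # incoming channels still being recorded
        self.incoming, self.outgoing = incoming, outgoing
        self.transport = transport      # transport(channel, message)

    def start_snapshot(self):
        # Record own state, then send a marker on every outgoing channel
        # before any further application messages.
        self.recorded_state = self.state
        self.buffering = {c: [] for c in self.incoming}
        for c in self.outgoing:
            self.transport(c, self.MARKER)

    def on_message(self, channel, msg):
        if msg == self.MARKER:
            if self.recorded_state is None:
                self.start_snapshot()          # first marker seen
                self.buffering[channel] = []   # arrival channel recorded empty
            # The marker closes this channel: freeze its recorded contents.
            self.channel_states[channel] = self.buffering.pop(channel)
        else:
            if self.recorded_state is not None and channel in self.buffering:
                # Arrived after we recorded but before the channel's marker:
                # it belongs to the channel's recorded state.
                self.buffering[channel].append(msg)
            self.state += msg              # normal application processing

# Two processes, channel c1 carries p2 -> p1, channel c2 carries p1 -> p2.
queues = {"c1": deque(), "c2": deque()}
def transport(channel, msg):
    queues[channel].append(msg)

p1 = SnapshotProcess("p1", incoming=["c1"], outgoing=["c2"], transport=transport)
p2 = SnapshotProcess("p2", incoming=["c1"] and ["c1"], outgoing=["c1"], transport=transport)
p2.incoming, p2.outgoing = ["c2"], ["c1"]

p1.start_snapshot()                 # p1 records; marker now in transit on c2
transport("c1", 5)                  # an application message already in flight
p2.on_message("c2", queues["c2"].popleft())  # p2 gets marker, records, replies
p1.on_message("c1", queues["c1"].popleft())  # p1 receives the message "5"
p1.on_message("c1", queues["c1"].popleft())  # p1 receives p2's marker
```

After this run, p1 has recorded channel c1 as containing the in-flight message, and p2 has recorded channel c2 as empty, mirroring the widget example that follows.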

Example Scenario

Let’s illustrate the algorithm with a simple example involving two processes, p1
and p2, connected by two unidirectional channels, c1 and c2.

Initial States

• Process p1:

o Money: $1000

o Widgets: 0

• Process p2:

o Money: $50

o Widgets: 2000

Execution Steps

1. Process p1 Records Its State:

o p1 records its state as S0: (Money: $1000, Widgets: 0).

o It sends a marker message over channel c2 before sending an order
message (e.g., “Order 10 widgets for $100”).

2. State Changes:

o After sending the marker, p1 sends the order message, and the
system enters a new state S1.

3. Process p2 Sends Widgets:

o Before receiving the marker, p2 sends a message back to p1 over
channel c1 (e.g., “5 widgets”), reducing its stock to 1995. This
creates another new state S2.

4. Process p2 Receives the Marker:

o p2 receives the marker message from p1.

o Since p2 has not yet recorded its state, it records its current state:
(Money: $50, Widgets: 1995).

o It records the state of channel c2 (the channel the marker arrived
on) as empty.

o p2 then sends a marker message over channel c1.

5. Final State Recording:

o When p1 receives the message “5 widgets” followed by p2’s
marker, it records the state of channel c1 as the set of messages it
received in between (the message “5 widgets”).

o The final recorded states are:

▪ p1: (Money: $1000, Widgets: 0)

▪ p2: (Money: $50, Widgets: 1995)

▪ Channel c1: (Message: “5 widgets”)

▪ Channel c2: (Empty)

Final Notes

• Consistency: The recorded global state is consistent because it respects the


order of messages and the states of processes at the time the snapshot was
taken.

• Termination: The algorithm ensures that all processes will eventually record
their states due to the reliable communication and connectivity assumptions.

Termination of the Snapshot Algorithm

Termination refers to the point where all processes in a distributed system have
successfully recorded their states and the states of their channels.

Key Points for Termination

1. Finite Time Recording:

o Each process that receives a marker message will record its state within a
finite amount of time.

o After recording its state, the process will also send marker messages over
each of its outgoing channels within a finite time.

2. Communication Path:
o If there is a communication path from process pi to process pj
(meaning pi can send messages to pj), then pj will record its state a
finite time after pi records its state.

3. Strong Connectivity:

o Since the graph of processes and channels is strongly connected


(meaning every process can reach every other process), it follows that all
processes will eventually record their states.

o This is guaranteed because once one process starts the snapshot, the
markers will propagate through the system, leading all processes to
record their states.

Characterizing the Observed State

The snapshot algorithm effectively selects a cut from the history of the execution of the
distributed system. A cut is a way to divide the events in the execution into two parts:
those that happened before the snapshot and those that happened after.

Consistency of the Cut

1. Events and Cuts:

o If an event ei occurs at process pi and another event ej occurs at
process pj, and if ei happens before ej (denoted ei → ej), then
whenever ej is in the cut, ei must also be in the cut.

2. Proof of Consistency:

o Suppose, for contradiction, that ej is in the cut but ei is not, i.e.,
that pi recorded its state before ei occurred.

o Because a process sends markers ahead of any subsequent application
messages (and channels are FIFO), any chain of messages carrying
ei's effect to pj is preceded by a marker, so pj would have
recorded its state before ej occurred.

o This contradicts the assumption that ej is in the cut, so our initial
assumption must be wrong: ei is in the cut as well.

Reachability of States

Reachability refers to the relationship between the recorded global state (snapshot)
and the initial and final global states of the system.

Understanding the Reachability Relation

1. States Defined:

o Sinit: The global state just before the first process records its state.

o Ssnap: The global state recorded by the snapshot algorithm.

o Sfinal: The global state immediately after all processes have
recorded their states.

2. Execution Linearization:

o The execution of the system can be represented as a sequence of events,


which we can organize into a linear order (called linearization).

o In this linearization, we can categorize events into pre-snap events (those


that occurred before any process recorded its state) and post-
snap events (those that occurred after).

3. Ordering Events:

o We can rearrange the sequence of events so that all pre-snap events


occur before any post-snap events. This is possible because:

▪ A post-snap event cannot occur before a pre-snap event at the


same process.

▪ By using the rules of the marker messages, we can ensure that the
order of events respects the happened-before relationship.

4. Establishing Reachability:

o By organizing the events this way, we show that:

▪ Ssnap is reachable from Sinit because all pre-snap
events lead up to the recorded state.

▪ Sfinal is reachable from Ssnap because all post-snap
events occur after the snapshot has been taken.

Stability and Reachability of Observed State

The reachability property is useful for detecting stable predicates—conditions that


remain true across various states of the system.

1. Stable Predicates:

o If a stable predicate (a condition that, once true, remains true) is
true in the recorded state Ssnap, it must also be true in the final
state Sfinal.

o Conversely, if the predicate is false in Ssnap, it must also be false
in the initial state Sinit.
