
21CS401 / DISTRIBUTED SYSTEMS

Course Objective
• To explain the foundation and challenges of distributed systems.
• To infer the knowledge of message ordering and group communication.
• To demonstrate the distributed mutual exclusion and deadlock detection algorithms.
• To predict the significance of checkpointing and rollback recovery algorithms.
• To summarize the characteristics of peer-to-peer and distributed shared memory systems.
SYLLABUS
UNIT-1
• Introduction: Definition – Characteristics – Relation to computer system components – Motivation – Message-passing systems versus shared memory systems – Primitives for distributed communication – Synchronous versus asynchronous executions – Challenges of Distributed system: System Perspective. A model of distributed computations: A distributed program – A model of distributed executions – Models of communication networks – Global state – Cuts of a distributed computation.
UNIT-2-MESSAGE ORDERING & GROUP
COMMUNICATION
• Message ordering and group communication: Message ordering
paradigms –Asynchronous execution with synchronous
communication –Synchronous program order on an asynchronous
system –Group communication – Causal order (CO) - Total order.
UNIT-3-DISTRIBUTED MUTEX & DEADLOCK
• Distributed mutual exclusion algorithms: Introduction – Preliminaries – Lamport's algorithm – Ricart–Agrawala algorithm – Maekawa's algorithm – Suzuki–Kasami's broadcast algorithm. Deadlock detection in distributed systems: Introduction – System model – Preliminaries – Models of deadlocks – Knapp's classification.
UNIT-4-CHECKPOINTING AND ROLLBACK
RECOVERY
• Introduction – Background and definitions – Issues in failure recovery – Checkpoint-based recovery – Log-based rollback recovery – Koo–Toueg coordinated checkpointing algorithm – Juang–Venkatesan algorithm for asynchronous checkpointing and recovery.
UNIT-5-P2P & DISTRIBUTED SHARED MEMORY
• Peer-to-peer computing and overlay graphs: Introduction – Data
indexing and overlays – Content addressable networks – Tapestry.
Distributed shared memory: Abstraction and advantages – Types of
memory consistency models.
COURSE OUTCOMES
• CO1: Illustrate the models of communication in building a distributed environment. (K2-Understand)
• CO2: Interpret the order of messages in a communication network for synchronous and asynchronous systems. (K2-Understand)
• CO3: Use the mutex and deadlock detection algorithms in real-time applications. (K2-Understand)
• CO4: Discover the issues of checkpointing and rollback recovery mechanisms in a distributed environment. (K2-Understand)
• CO5: Relate the features of peer-to-peer and memory consistency models for a given application. (K2-Understand)
TEXT BOOK(S):
• T1: Kshemkalyani, Ajay D., and Mukesh Singhal, "Distributed Computing: Principles, Algorithms, and Systems", Cambridge University Press, 2011.
• T2: George Coulouris, Jean Dollimore, and Tim Kindberg, "Distributed Systems: Concepts and Design", 5th Edition, Pearson Education, 2017.
• T3: Tanenbaum A. S., and Van Steen M., "Distributed Systems: Principles and Paradigms", 2nd Edition, Pearson Education, 2017.
UNIT I
Chapters 1, 2
Introduction: Definition – Characteristics – Relation to computer system components – Motivation – Message-passing systems versus shared memory systems – Primitives for distributed communication – Synchronous versus asynchronous executions – Challenges of Distributed system: System Perspective. A model of distributed computations: A distributed program – A model of distributed executions – Models of communication networks – Global state – Cuts of a distributed computation.
Chapter 1 - Introduction
• A distributed system is a system in which components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another.
• A distributed system is a collection of autonomous computer systems that are physically separated but connected by a computer network and equipped with distributed system software.
• A distributed system is a collection of independent (autonomous) entities that cooperate to solve a problem that cannot be individually solved.
Middleware
• Middleware refers to software that acts as an intermediary between different applications, services, or components in a distributed system.
Types of Distributed Systems
• Client-server systems: the most traditional and simplest type of distributed system; a multitude of networked computers interact with a central server for data storage, processing, or another common goal.

Types of Distributed Systems
• Peer-to-peer networks: they distribute workloads among hundreds or thousands of computers, all running the same software.
Characteristics of distributed system:
• No common physical clock
• This is an important element of "distribution" in the system and gives rise to the inherent asynchrony among the processors.
• No shared memory
• This requires message-passing for communication.
• A distributed system may still provide the abstraction of a common address space via the distributed shared memory abstraction.
Characteristics of distributed system
• Geographical separation
• The processors of a distributed system may be geographically far apart.
• They may be connected over a wide-area network (WAN), or by a network/cluster of workstations (NOW/COW) configuration connecting processors on a LAN.
• The NOW configuration is built from low-cost, high-speed, off-the-shelf processors. Example: the Google search engine is based on the NOW architecture.
Characteristics of distributed system
• Autonomy and heterogeneity
• The processors are "loosely coupled": they have different speeds and may each run a different operating system, but they cooperate with one another by offering services for solving a problem jointly.
Relation to computer system components
• In a distributed system, each computer has a memory-processing unit, and the computers are connected by a communication network.
• Figure 1.2 shows the relationships of the software components that run on each computer, using the local operating system and the network protocol stack for functioning.
• The distributed software is also termed middleware.

Relation to computer system components
• A distributed execution is the execution of processes across the distributed system to collaboratively achieve a common goal; it is also termed a computation or a run.
• A distributed system follows a layered architecture that reduces the complexity of the system design.
• Middleware hides the heterogeneity transparently at the platform level.
Relation to computer system components
• It is assumed that the middleware layer does not contain application layer functions such as HTTP, mail, FTP, and telnet.
• The user program code includes the code to invoke libraries of the middleware layer to support reliable and ordered multicasting.
Relation to computer system components
• Some commercial versions of middleware often in use are CORBA (common object request broker architecture), DCOM (distributed component object model), Java RMI (remote method invocation), and the message-passing interface (MPI).
Motivation
• Inherently distributed computations
• Resource sharing
• Access to geographically remote data and resources
• Enhanced reliability: availability, integrity, fault-tolerance
• Increased performance/cost ratio
• Scalability
• Modularity and incremental expandability
Motivation
• Inherently distributed computations
• Resource sharing
• Resources cannot be fully replicated at all the sites because it is often neither practical nor cost-effective.
• Access to geographically remote data and resources
• In many scenarios, the data cannot be replicated at every site participating in the distributed execution because it may be too large or too sensitive to be replicated.
Enhanced reliability
• A distributed system provides increased reliability because of the possibility of replicating resources and executions.
• Geographically distributed resources are not likely to crash or malfunction at the same time under normal circumstances.
• Availability: the resource should be accessible at all times.
• Integrity: the value/state of the resource must be correct, in the face of concurrent access from multiple processors.
• Fault-tolerance: the ability to recover from system failures.

• Increased performance/cost ratio
• By resource sharing and accessing geographically remote data and resources, the performance/cost ratio is increased.
• Scalability
• As the processors are usually connected by a wide-area network, adding more processors does not pose a direct bottleneck for the communication network.
• Modularity and incremental expandability
• Heterogeneous processors may be easily added into the system without affecting performance, as long as those processors run the same middleware algorithms.
• Similarly, existing processors may be easily replaced by other processors.


Message-passing vs. Shared Memory
Shared memory systems are those in which there is a (common) shared address space throughout the system.
• Communication among processors takes place via shared data variables, and control variables for synchronization among the processors.
• Examples of shared memory synchronization mechanisms: semaphores and monitors.
Message passing
• Multicomputer systems that do not have a shared address space communicate by message passing.
• Programmers find it easier to program using shared memory than by message passing; this leads to the development of an abstraction.
• The abstraction of shared memory is provided to simulate a shared address space.
• For a distributed system, this abstraction is called distributed shared memory.
Message-passing vs. Shared Memory
• 1.5.1 Emulating message-passing on a shared memory system (MP → SM)
• The shared address space is partitioned into disjoint parts, one part being assigned to each processor.
• "Send" and "receive" operations are implemented by writing to and reading from the destination/sender processor's address space, respectively.
Message-passing vs. Shared Memory
• Specifically, a separate location is reserved as a mailbox (assumed to be of unbounded size) for each ordered pair of processes.
• A Pi–Pj message-passing can be emulated by a write by Pi to the mailbox and then a read by Pj from the mailbox.
• The write and read operations are controlled using synchronization primitives to inform the receiver/sender after the data has been sent/received.
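The following is a minimal sketch (in Python, with threads standing in for processes; all names are illustrative, not from the textbook) of this MP → SM emulation: one mailbox per ordered pair of processes, with a condition variable as the synchronization primitive.

import threading
from collections import deque

class Mailbox:
    # Unbounded mailbox for one ordered pair (sender, receiver).
    def __init__(self):
        self._messages = deque()          # lives in the shared address space
        self._cv = threading.Condition()  # synchronization primitive

    def send(self, data):
        # "Send" is a write into the part of shared memory assigned to the pair.
        with self._cv:
            self._messages.append(data)
            self._cv.notify()             # inform the receiver that data arrived

    def receive(self):
        # "Receive" is a read from the mailbox; block until a message exists.
        with self._cv:
            while not self._messages:
                self._cv.wait()
            return self._messages.popleft()

# One mailbox per ordered pair (Pi, Pj) of processes.
n = 3
mailboxes = {(i, j): Mailbox() for i in range(n) for j in range(n) if i != j}

# A P0–P1 message passing emulated as a write by P0 and a read by P1:
mailboxes[(0, 1)].send("hello from P0")
print(mailboxes[(0, 1)].receive())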
Message-passing vs. Shared Memory
• 1.5.2 Emulating shared memory on a message-passing system (SM → MP)
• This involves the use of "send" and "receive" operations for "write" and "read" operations.
• Each shared location can be modeled as a separate process.
• A "write" to a shared location is emulated by sending an update message to the corresponding owner process, and a "read" by sending a query message.
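A minimal sketch of the reverse SM → MP emulation (again in Python with illustrative names): each shared location is an owner process that serves "write" (update) and "read" (query) messages over queues.

import queue
import threading

class SharedLocation(threading.Thread):
    # Owner process for one shared memory location.
    def __init__(self):
        super().__init__(daemon=True)
        self.requests = queue.Queue()   # incoming messages to the owner
        self.value = None
        self.start()

    def run(self):
        while True:
            op, payload, reply = self.requests.get()
            if op == "write":           # an update message emulates a write
                self.value = payload
                reply.put("ack")
            elif op == "read":          # a query message emulates a read
                reply.put(self.value)

    def write(self, value):
        reply = queue.Queue()
        self.requests.put(("write", value, reply))  # send the update message
        reply.get()                                  # wait for the acknowledgement

    def read(self):
        reply = queue.Queue()
        self.requests.put(("read", None, reply))     # send the query message
        return reply.get()                           # receive the value back

x = SharedLocation()
x.write(42)       # emulated "write" to the shared location
print(x.read())   # emulated "read"; prints 42

Note that every access costs a send and a receive, which is why the next slide calls this emulation expensive.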
Message-passing vs. Shared Memory
• As accessing another processor's memory requires send and receive operations, this emulation is expensive.
• In a MIMD message-passing multicomputer system, each "processor" may be a tightly coupled multiprocessor system with shared memory. Within the multiprocessor system, the processors communicate via shared memory.
• Between two computers, communication is by message passing, which is more suited for wide-area distributed systems.
Primitives for distributed communication
• Blocking/non-blocking, synchronous/asynchronous primitives
• Processor synchrony
• Libraries and standards
Blocking/non-blocking, synchronous/asynchronous primitives
• Message send and message receive communication primitives are denoted Send() and Receive().
• The Send primitive has two parameters: the destination, and the user buffer space containing the data to be sent.
• The Receive primitive has two parameters: the source from which the data is to be received, and the user buffer space into which the data is to be received.
Blocking/non-blocking, synchronous/asynchronous primitives
There are two ways of sending data: the buffered option and the unbuffered option.
• Buffered option: the data is copied from the user buffer to the kernel buffer; later, the data gets copied from the kernel buffer onto the network.
• Unbuffered option: the data gets copied directly from the user buffer onto the network.
Blocking/non-blocking, synchronous/asynchronous primitives
• The Send primitive can use either the buffered or the unbuffered option.
• For the Receive primitive, the buffered option is required because the data may already have arrived when the primitive is invoked, and it needs a storage place in the kernel.
Blocking/non-blocking, synchronous/asynchronous primitives
Synchronous (send/receive)
• A Send and a Receive are synchronous when they establish a handshake between sender and receiver.
• The Send completes when the corresponding Receive completes.
• The Receive completes when the data is copied into the receiver's buffer.
Asynchronous (send)
• Control returns to the process as soon as the data has been copied out of the user-specified buffer.
Blocking/non-blocking, synchronous/asynchronous primitives
Blocking (send/receive)
• Control returns to the invoking process only after the processing of the primitive (whether synchronous or asynchronous) completes.
Non-blocking (send/receive)
• Control returns to the process immediately after invocation, even though the operation has not completed.
• Send: control returns to the process even before the data is copied out of the user buffer.
• Receive: control returns to the process even before the data may have arrived from the sender.
Processor synchrony
• Processor synchrony indicates that all the processors execute in lock-step with their clocks synchronized.
• Lock-step: similar devices share the same timing and triggering, essentially acting as a single device.

Libraries and standards
• Many commercial software products (banking, payroll, and other applications) use proprietary primitive libraries supplied with the software marketed by the vendors (e.g., the IBM CICS software, which has a very widely installed customer base worldwide, uses its own primitives).
• The message-passing interface (MPI) library and the PVM (parallel virtual machine) library are used largely by the scientific community, but other alternative libraries exist.
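To make the blocking/non-blocking distinction concrete, here is a small MPI sketch using the mpi4py Python binding (assuming mpi4py is installed; run with two processes, e.g., mpiexec -n 2 python primitives.py):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Blocking send: control returns only after the primitive completes.
    comm.send({"k": 1}, dest=1, tag=0)
    # Non-blocking send: control returns immediately with a handle;
    # wait() blocks until the operation has actually completed.
    req = comm.isend({"k": 2}, dest=1, tag=1)
    req.wait()
elif rank == 1:
    msg1 = comm.recv(source=0, tag=0)   # blocking receive
    req = comm.irecv(source=0, tag=1)   # non-blocking receive
    msg2 = req.wait()                   # block here until the data arrives
    print(msg1, msg2)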
Synchronous versus asynchronous executions
• In addition to the two classifications of processor synchrony/asynchrony and of synchronous/asynchronous communication primitives, there is another classification, namely that of synchronous/asynchronous executions.
Asynchronous execution
• No processor synchrony, and no bound on the drift rate of clocks
• Message delays are finite but unbounded
• No bound on the time for a step at a process

Synchronous versus asynchronous executions
Synchronous execution
• Processors are synchronized; the clock drift rate is bounded
• Message delivery occurs in one logical step/round
• There is a known upper bound on the time to execute a step at a process
EMULATION: Synchronous versus asynchronous executions
• We have already discussed how a shared memory system can be emulated by a message-passing system, and vice versa.
• We now have four broad classes of programs, as shown in Figure 1.11.
• Using the emulations shown, any class can be emulated by any other.
• If system A can be emulated by system B, denoted A/B, and if a problem is not solvable in B, then it is also not solvable in A.
• Likewise, if a problem is solvable in A, it is also solvable in B.
• Hence, in a sense, all four classes are equivalent in terms of "computability" – what can and cannot be computed – in failure-free systems.
Challenges of Distributed system: System Perspective
• Communication mechanisms: e.g., Remote Procedure Call (RPC), remote object invocation (ROI), message-oriented vs. stream-oriented communication
• Processes: code migration, process/thread management at clients and servers, design of software and mobile agents
Challenges of Distributed system: System Perspective
• Naming: easy-to-use identifiers are needed to locate resources and processes transparently and scalably
• Synchronization: synchronization or coordination among processes is essential; mutual exclusion is the classical example of synchronization
Challenges of Distributed system: System Perspective
• Data storage and access
• Schemes for data storage, search, and lookup should be fast and scalable across the network
• Consistency and replication
• Replication is used for fast access and scalability, and to avoid bottlenecks
• It requires consistency management among the replicas
Challenges of Distributed system: System Perspective
• Fault-tolerance: correct operation despite link, node, and process failures, and the ability to recover from them
• Distributed systems security
• Secure channels, access control, key management (key generation and key distribution), authorization, secure group management
Challenges of Distributed system: System Perspective
• Scalability and modularity of algorithms, data, and services
• The algorithms, data (objects), and services must be as distributed as possible.
Challenges of Distributed system: System Perspective
• API for communications and services: ease of use (including for non-technical users)
• Transparency: hiding implementation policies from the user
• Access: hide differences in data representation across systems; provide uniform operations to access resources
• Location: the locations of resources are transparent
• Migration: relocate resources without renaming them
• Relocation: relocate resources even as they are being accessed
• Replication: hide replication from the users
• Concurrency: mask the use of shared resources
• Failure: provide reliable and fault-tolerant operation
Chapter 2: A Model of Distributed Computations
• A model of distributed computations: A distributed program – A model of distributed executions – Models of communication networks – Global state – Cuts of a distributed computation.
A Distributed Program
• A distributed program is composed of a set of n asynchronous processes p1, p2, ..., pi, ..., pn.
• The processes do not share a global memory and communicate solely by passing messages.
A Distributed Program
• Process execution and message transfer are asynchronous.
• Assume that each process is running on a different processor.
• Let Cij denote the channel from process pi to process pj, and let mij denote a message sent by pi to pj.
• The message transmission delay is finite and unpredictable.
A Model of Distributed Executions
• The execution of a process consists of a sequential execution of its actions.
• The actions are atomic, and the actions of a process are modeled as three types of events:
• internal events,
• message send events,
• message receive events.
A Model of Distributed Executions
• For a message m, let send(m) and rec(m) denote its send and receive events, respectively.
• The occurrence of events changes the states of the respective processes and channels, thus causing transitions in the global system state.
• An internal event changes the state of the process at which it occurs.
A Model of Distributed Executions
• A send event (or a receive event) changes
• the state of the process that sends (or receives) the message, and
• the state of the channel on which the message is sent (or received).
• An internal event affects only the process at which it occurs.
A Model of Distributed Executions
• For every message m that is exchanged between two processes, we have send(m) →msg rec(m).
• The relation →msg defines causal dependencies between send and receive events.
• Figure 2.1 shows a distributed execution involving three processes, using a space–time diagram.
• A horizontal line represents the progress of a process; a dot indicates an event; a slanted arrow indicates a message transfer.
A Model of Distributed Executions
• In this figure, for process p1, the second event is a message send event, the third event is an internal event, and the fourth event is a message receive event.
Causal Precedence Relation
• The execution of a distributed application results in a set of distributed events produced by the processes.
• Let H = ∪i hi denote the set of events executed in a distributed computation.
• Define a binary relation → on the set H that expresses causal dependencies between events in the distributed execution:
• program order: if ei and ej occur at the same process and ei occurs before ej, then ei → ej;
• message order: for every message m, send(m) → rec(m);
• transitivity: if ei → ek and ek → ej, then ei → ej.
• Concurrent events: two events ei and ej are concurrent, denoted ei ∥ ej, if neither ei → ej nor ej → ei.
• Logical vs. physical concurrency: two events are logically concurrent if and only if they do not causally affect each other; physical concurrency additionally means they occur at the same instant in physical time.
Models of communication networks
The models of the service provided by communication networks are:
• In the FIFO model, each channel acts as a first-in first-out message queue; thus, message ordering is preserved by the channel.
• In the non-FIFO model, a channel acts like a set in which the sender process adds messages and the receiver process removes messages from it in a random order.
Models of communication networks
• The "causal ordering" (CO) model is based on Lamport's "happens before" relation. A system that supports the causal ordering model satisfies the following property:
CO: for any two messages mij and mkj sent to the same process pj, if send(mij) → send(mkj), then rec(mij) → rec(mkj).
Models of communication networks
• Causally ordered delivery of messages implies FIFO message delivery.
• Note that CO ⊂ FIFO ⊂ non-FIFO.
• The causal ordering model is useful in developing distributed algorithms.
• Example: in replicated database systems, every process that updates a replica must receive the updates in the same order to maintain database consistency.
Global state of a distributed system
• The global state of a distributed system is a collection of the local states of its components, namely, the processes and the communication channels.
• The state of a process at any time is defined by the contents of processor registers, stacks, local memory, etc., and depends on the local context of the distributed application.
Global state of a distributed system
• The state of a channel is given by the set of messages in transit in the channel.
• The occurrence of events changes the states of the respective processes and channels, thus causing transitions in the global system state.
Global state of a distributed system
• For example, an internal event changes the state of the process at which it occurs.
• A send event (or a receive event) changes the state of the process that sends (or receives) the message and the state of the channel on which the message is sent (or received).
Global state of a distributed system
• LS_i^0 denotes the initial state of process pi.
• LS_i^x is the result of the execution of all the events executed by process pi up to and including event e_i^x.
Global state of a distributed system
• Let SC_ij^{x,y} denote the state of a channel Cij, defined as
SC_ij^{x,y} = { mij | send(mij) ∈ LS_i^x and rec(mij) ∉ LS_j^y }.
• Thus, the channel state SC_ij^{x,y} denotes all messages that pi sent up to event e_i^x and which process pj had not received until event e_j^y.
Global state
• The global state GS of a distributed system, a collection of the local states of the processes and the channels, is defined as
GS = { ∪i LS_i^{xi} , ∪j,k SC_jk^{yj,zk} }.
Global state
• For a global snapshot, the states of all the components of the distributed system must be recorded at the same instant.
• This is possible only if the local clocks at the processes are perfectly synchronized.
• The basic idea is that a message cannot be received if it was not sent, i.e., the state should not violate causality. Such states are called consistent global states.
Global state
• Inconsistent global states are not meaningful in a distributed system.
• For example, the global state GS consisting of the local states {LS_1^1, LS_2^3, LS_3^3, LS_4^2} is inconsistent, whereas {LS_1^2, LS_2^4, LS_3^4, LS_4^2} is consistent.
Cuts of a distributed computation
• A consistent global state corresponds to a cut in which every message received in the PAST of the cut was sent in the PAST of that cut. Such a cut is known as a consistent cut.
• All messages that cross the cut from the PAST to the FUTURE are in transit in the corresponding consistent global state.
• A cut is inconsistent if a message crosses the cut from the FUTURE to the PAST.
• For example, the space–time diagram of Figure 2.3 shows two cuts, C1 and C2.
• C1 is an inconsistent cut, whereas C2 is a consistent cut.
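The consistency condition above can be stated as a short check. The following sketch (an illustrative representation, not from the textbook) describes a cut by the index of the last event of each process in its PAST, and each message by the event indices of its send and receive:

def is_consistent_cut(cut, messages):
    # cut: dict mapping process -> index of its last event in the cut's PAST.
    # messages: list of (sender, send_event_idx, receiver, recv_event_idx).
    for sender, send_idx, receiver, recv_idx in messages:
        received_in_past = recv_idx <= cut[receiver]
        sent_in_past = send_idx <= cut[sender]
        if received_in_past and not sent_in_past:
            return False   # message crosses from FUTURE to PAST: inconsistent
    return True

# Message m sent as p1's event 3 and received as p2's event 2:
msgs = [("p1", 3, "p2", 2)]
print(is_consistent_cut({"p1": 3, "p2": 2}, msgs))  # True: send is in the PAST
print(is_consistent_cut({"p1": 2, "p2": 2}, msgs))  # False: rec(m) without send(m)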

Past and Future Cones of an Event
Past Cone of an Event
• An event ej could have been affected only by events ei such that ei → ej. In this situation, all the information available at ei could be made accessible at ej.
• All such events ei belong to the past of ej.
• Let Past(ej) denote all events in the past of ej in a computation (H, →). Then
Past(ej) = {ei | ∀ei ∈ H, ei → ej}.
• Figure 2.4 (next slide) shows the past of an event ej.

Past and Future Cones of an Event
Figure 2.4: Illustration of the past and future cones of an event ej, showing PAST(ej) and FUTURE(ej) together with max(Past_i(ej)) and min(Future_i(ej)) at each process pi.

Past and Future Cones of an Event
• Let Past_i(ej) be the set of all those events of Past(ej) that are on process pi.
• Past_i(ej) is a totally ordered set, ordered by the relation →i, whose maximal element is denoted max(Past_i(ej)).
• max(Past_i(ej)) is the latest event at process pi that affected event ej (Figure 2.4).

Past and Future Cones of an Event
• Let Max_Past(ej) = ∪(∀i) {max(Past_i(ej))}.
• Max_Past(ej) consists of the latest event at every process that affected event ej and is referred to as the surface of the past cone of ej.
• Past(ej) represents all events on the past light cone that affect ej.
Future Cone of an Event
• The future of an event ej, denoted Future(ej), contains all events ei that are causally affected by ej (see Figure 2.4).
• In a computation (H, →), Future(ej) is defined as
Future(ej) = {ei | ∀ei ∈ H, ej → ei}.

Past and Future Cones of an Event
• Define Future_i(ej) as the set of those events of Future(ej) that are on process pi.
• Define min(Future_i(ej)) as the first event on process pi that is affected by ej.
• Define Min_Future(ej) as ∪(∀i) {min(Future_i(ej))}, which consists of the first event at every process that is causally affected by event ej.
• Min_Future(ej) is referred to as the surface of the future cone of ej.
• All events at a process pi that occurred after max(Past_i(ej)) but before min(Future_i(ej)) are concurrent with ej.
• Therefore, all and only those events of computation H that belong to the set "H − Past(ej) − Future(ej)" are concurrent with event ej.

Models of Process Communications
• There are two basic models of process communications: synchronous and asynchronous.
• The synchronous communication model is a blocking type, where on a message send, the sender process blocks until the message has been received by the receiver process.
• The sender process resumes execution only after it learns that the receiver process has accepted the message.
• Thus, the sender and the receiver processes must synchronize to exchange a message.
• The asynchronous communication model is a non-blocking type, where the sender and the receiver do not synchronize to exchange a message.
• After having sent a message, the sender process does not wait for the message to be delivered to the receiver process.
• The message is buffered by the system and is delivered to the receiver process when it is ready to accept the message.

Models of Process Communications
• Neither of the communication models is superior to the other.
• Asynchronous communication provides higher parallelism because the sender process can execute while the message is in transit to the receiver.
• However, a buffer overflow may occur if a process sends a large number of messages in a burst to another process.
• Thus, an implementation of asynchronous communication requires more complex buffer management.
• In addition, due to the higher degree of parallelism and non-determinism, it is much more difficult to design, verify, and implement distributed algorithms for asynchronous communications.
• Synchronous communication is simpler to handle and implement.
• However, due to frequent blocking, it is likely to have poor performance and is likely to be more prone to deadlocks.
Chapter 3 - Logical Time
• Logical Time: A framework for a system of logical clocks – Scalar time – Vector time – Physical clock synchronization: NTP.
Introduction
• The concept of causality between events is fundamental to the design and analysis of parallel and distributed computing and operating systems.
• Usually, causality is tracked using physical time.
• In distributed systems, it is not possible to have a global physical time.
• As asynchronous distributed computations make progress in spurts, logical time is sufficient to capture the fundamental monotonicity property associated with causality in distributed systems.
• There are three ways to implement logical time: scalar time, vector time, and matrix time.
A Framework for a System of Logical Clocks

• Definition
• Implementing logical clocks
Definition
• A system of logical clocks consists of a time domain T and a logical clock C.
• The elements of T form a partially ordered set over a relation <.
• This relation < is called the happened before or causal precedence relation.
• The logical clock C is a function that maps an event e in a distributed system to an element in the time domain T, denoted C(e) and called the timestamp of e:
C : H → T
such that the following property is satisfied:
for two events ei and ej, ei → ej ⇒ C(ei) < C(ej).
• This monotonicity property is called the clock consistency condition.
• When T and C satisfy the condition
for two events ei and ej, ei → ej ⇔ C(ei) < C(ej),
the system of clocks is said to be strongly consistent.
Implementing Logical Clocks
• Implementation of logical clocks requires addressing two issues:
• data structures local to every process to represent logical time, and
• a protocol to update the data structures to ensure the consistency condition.
• Each process pi maintains data structures that allow the following two capabilities:
• a local logical clock, denoted lc_i, that helps process pi measure its own progress, and
• a logical global clock, denoted gc_i, that is a representation of process pi's local view of the logical global time.
• The protocol ensures that a process's logical clock, and its view of the global time, is managed consistently.
• The protocol consists of the following two rules:
• R1: This rule governs how the local logical clock is updated by a process when it executes an event.
• R2: This rule governs how a process updates its global logical clock to update its view of the global time and global progress.
• Systems of logical clocks differ in their representation of logical time and also in the protocol used to update the logical clocks.
Scalar Time
• Proposed by Lamport in 1978 as an attempt to totally order events in a distributed system.
• The time domain is the set of non-negative integers.
• The logical local clock of a process pi and its local view of the global time are squashed into one integer variable Ci.
• Rules R1 and R2 to update the clocks are as follows:
• R1: Before executing an event (send, receive, or internal), process pi executes the following:
Ci := Ci + d (d > 0)
• In general, every time R1 is executed, d can have a different value; however, typically d is kept at 1.
• R2: Each message piggybacks the clock value of its sender at sending time. When a process pi receives a message with timestamp Cmsg, it executes the following actions:
1. Ci := max(Ci, Cmsg)
2. Execute R1.
3. Deliver the message.
• Figure 3.1 shows the evolution of scalar time with d = 1.
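A minimal sketch of rules R1 and R2 in Python (with d = 1; class and method names are illustrative, not from the textbook):

class ScalarClock:
    def __init__(self):
        self.c = 0

    def tick(self, d=1):
        # R1: before executing any event, advance the local clock.
        self.c += d
        return self.c

    def send(self):
        # R2, sender side: piggyback the clock value on the message.
        return self.tick()

    def receive(self, c_msg):
        # R2, receiver side: take the max, execute R1, then deliver.
        self.c = max(self.c, c_msg)
        return self.tick()

p1, p2 = ScalarClock(), ScalarClock()
ts = p1.send()            # p1 sends a message timestamped ts = 1
p2.tick()                 # an internal event at p2
print(p2.receive(ts))     # p2's clock advances past ts, preserving causality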
Basic Properties
• Consistency Property
• Scalar clocks satisfy monotonicity and hence the consistency property:
for two events ei and ej, ei → ej ⇒ C(ei) < C(ej).
• Total Ordering
• Scalar clocks can be used to totally order events in a distributed system.
• The main problem in totally ordering events is that two or more events at different processes may have identical timestamps.
• For example, in Figure 3.1, the third event of process P1 and the second event of process P2 have identical scalar timestamps.
• A tie-breaking mechanism is needed to order such events. A tie is broken as follows:
• Process identifiers are linearly ordered, and a tie among events with identical scalar timestamps is broken on the basis of their process identifiers.
• The lower the process identifier in the ranking, the higher the priority.
• The timestamp of an event is denoted by a tuple (t, i), where t is its time of occurrence and i is the identity of the process where it occurred.
• The total order relation ≺ on two events x and y with timestamps (h, i) and (k, j), respectively, is defined as follows:
x ≺ y ⇔ (h < k) or (h = k and i < j)
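This total order is exactly lexicographic comparison of (t, i) pairs; in Python, tuple comparison implements it directly (the values below are made up for illustration):

x = (4, 2)    # event with scalar time h = 4 at process i = 2
y = (4, 3)    # event with scalar time k = 4 at process j = 3
print(x < y)  # True: equal times, tie broken by the lower process identifier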


• Event counting
• If the increment value d is always 1, scalar time has the following interesting property: if an event e has a timestamp h, then h − 1 represents the minimum logical duration, counted in units of events, required before producing the event e.
• We call it the height of the event e.
• In other words, h − 1 events have been produced sequentially before the event e, regardless of the processes that produced these events.
• For example, in Figure 3.1, five events precede event b on the longest causal path ending at b.

Vector Time
• The system of vector clocks was developed independently by Fidge, Mattern, and Schmuck.
• In the system of vector clocks, the time domain is represented by a set of n-dimensional non-negative integer vectors.
• Each process pi maintains a vector vti[1..n], where vti[i] is the local logical clock of pi and describes the logical time progress at process pi.
• vti[j] represents process pi's latest knowledge of process pj's local time.
• If vti[j] = x, then process pi knows that the local time at process pj has progressed till x.
• The entire vector vti constitutes pi's view of the global logical time and is used to timestamp events.

Vector Time
Process pi uses the following two rules R1 and R2 to update its clock:
• R1: Before executing an event, process pi updates its local logical time as follows:
vti[i] := vti[i] + d (d > 0)
• R2: Each message m is piggybacked with the vector clock vt of the sender process at sending time. On the receipt of such a message (m, vt), process pi executes the following sequence of actions:
1. Update its global logical time: for 1 ≤ k ≤ n, vti[k] := max(vti[k], vt[k])
2. Execute R1.
3. Deliver the message m.
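A minimal sketch of these two rules in Python (d = 1; names are illustrative, not from the textbook):

class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid
        self.vt = [0] * n            # the vector vti[1..n] from the slide text

    def tick(self, d=1):
        # R1: before executing an event, advance the local component.
        self.vt[self.pid] += d

    def send(self):
        # R2, sender side: piggyback a copy of the vector clock on the message.
        self.tick()
        return list(self.vt)

    def receive(self, vt_msg):
        # R2, receiver side: component-wise max, then execute R1, then deliver.
        self.vt = [max(a, b) for a, b in zip(self.vt, vt_msg)]
        self.tick()

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
m = p0.send()        # p0 sends a message with timestamp [1, 0]
p1.receive(m)        # p1's clock becomes [1, 1]
print(p1.vt)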


Vector Time
• The timestamp of an event is the value of the vector clock of its process when the event is executed.
• Figure 3.2 shows an example of vector clock progress with the increment value d = 1.
• Initially, a vector clock is [0, 0, 0, ..., 0].

Vector Time
An Example of Vector Clocks
Figure 3.2: Evolution of vector time (the figure shows the vector timestamps assigned to the events of processes p1, p2, and p3, with d = 1).

Vector Time
Comparing Vector Timestamps
The following relations are defined to compare two vector timestamps, vh and vk:
vh = vk ⇔ ∀x : vh[x] = vk[x]
vh ≤ vk ⇔ ∀x : vh[x] ≤ vk[x]
vh < vk ⇔ vh ≤ vk and ∃x : vh[x] < vk[x]
vh ∥ vk ⇔ ¬(vh < vk) ∧ ¬(vk < vh)
If the process at which an event occurred is known, the test to compare two timestamps can be simplified as follows: if events x and y occurred at processes pi and pj and are assigned timestamps vh and vk, respectively, then
x → y ⇔ vh[i] ≤ vk[i]
x ∥ y ⇔ vh[i] > vk[i] ∧ vh[j] < vk[j]
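Written out directly in Python (an illustrative sketch of the relations above):

def vle(vh, vk):         # vh <= vk
    return all(a <= b for a, b in zip(vh, vk))

def vlt(vh, vk):         # vh < vk
    return vle(vh, vk) and any(a < b for a, b in zip(vh, vk))

def concurrent(vh, vk):  # vh || vk
    return not vlt(vh, vk) and not vlt(vk, vh)

print(vlt([1, 0], [2, 1]))         # True: the first event causally precedes
print(concurrent([1, 0], [0, 1]))  # True: the events are concurrent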


Vector Time
Properties of Vector Time
Isomorphism
• If events in a distributed system are timestamped using a system of vector clocks, we have the following property: if two events x and y have timestamps vh and vk, respectively, then
x → y ⇔ vh < vk
x ∥ y ⇔ vh ∥ vk.
• Thus, there is an isomorphism between the set of partially ordered events produced by a distributed computation and their vector timestamps.

Vector Time
Strong Consistency
• The system of vector clocks is strongly consistent; thus, by examining the vector timestamps of two events, we can determine if the events are causally related.
• However, Charron-Bost showed that the dimension of vector clocks cannot be less than n, the total number of processes in the distributed computation, for this property to hold.
Event Counting
• If d = 1 (in rule R1), then the ith component of the vector clock at process pi, vti[i], denotes the number of events that have occurred at pi until that instant.
• So, if an event e has timestamp vh, then vh[j] denotes the number of events executed by process pj that causally precede e. Clearly, Σj vh[j] − 1 represents the total number of events that causally precede e in the distributed computation.

Physical Clock Synchronization: NTP
Motivation
• In centralized systems, there is only a single clock. A process gets the time by simply issuing a system call to the kernel.
• In distributed systems, there is no global clock or common memory. Each processor has its own internal clock and its own notion of time.
• These clocks can easily drift seconds per day, accumulating significant errors over time.
• Also, because different clocks tick at different rates, they may not remain always synchronized, even though they might be synchronized when they start.
• This clearly poses serious problems for applications that depend on a synchronized notion of time.

Physical Clock Synchronization: NTP
Motivation
• For most applications and algorithms that run in a distributed system, we need to know time in one or more of the following contexts:
• the time of day at which an event happened on a specific machine in the network;
• the time interval between two events that happened on different machines in the network;
• the relative ordering of events that happened on different machines in the network.
• Unless the clocks in each machine have a common notion of time, time-based queries cannot be answered.
• Clock synchronization has a significant effect on many problems, such as secure systems, fault diagnosis and recovery, scheduled operations, database systems, and real-world clock values.

Physical Clock Synchronization: NTP
• Clock synchronization is the process of ensuring that physically distributed processors have a common notion of time.
• Due to different clock rates, the clocks at various sites may diverge with time, and periodically a clock synchronization must be performed to correct this clock skew in distributed systems.
• Clocks are synchronized to an accurate real-time standard like UTC (Universal Coordinated Time).
• Clocks that must not only be synchronized with each other but also have to adhere to physical time are termed physical clocks.

Physical Clock Synchronization: NTP
Definitions and Terminology
• Let Ca and Cb be any two clocks.
• Time: The time of a clock in a machine p is given by the function Cp(t), where Cp(t) = t for a perfect clock.
• Frequency: Frequency is the rate at which a clock progresses. The frequency at time t of clock Ca is C′a(t).
• Offset: Clock offset is the difference between the time reported by a clock and the real time. The offset of the clock Ca is given by Ca(t) − t. The offset of clock Ca relative to Cb at time t ≥ 0 is given by Ca(t) − Cb(t).
• Skew: The skew of a clock is the difference in the frequencies of the clock and the perfect clock. The skew of a clock Ca relative to clock Cb at time t is C′a(t) − C′b(t). If the skew is bounded by ρ, then, as per Equation (1), clock values are allowed to diverge at a rate in the range 1 − ρ to 1 + ρ.
• Drift (rate): The drift of clock Ca is the second derivative of the clock value with respect to time, namely C″a(t). The drift of clock Ca relative to clock Cb at time t is C″a(t) − C″b(t).

Physical Clock Synchronization: NTP
Clock Inaccuracies
• Physical clocks are synchronized to an accurate real-time standard like UTC (Universal Coordinated Time).
• However, due to the clock inaccuracies discussed above, a timer (clock) is said to be working within its specification if
1 − ρ ≤ dC/dt ≤ 1 + ρ,    (1)
where the constant ρ is the maximum skew rate specified by the manufacturer.
• Figure 3.5 illustrates the behavior of fast, slow, and perfect clocks with respect to UTC.

Physical Clock Synchronization: NTP
Figure 3.5: The behavior of fast (dC/dt > 1), slow (dC/dt < 1), and perfect (dC/dt = 1) clocks with respect to UTC (clock time C plotted against UTC time t).

Physical Clock Synchronization: NTP
Offset delay estimation method
• The Network Time Protocol (NTP), which is widely used for clock synchronization on the Internet, uses the offset delay estimation method.
• The design of NTP involves a hierarchical tree of time servers:
• the primary server at the root synchronizes with the UTC;
• the next level contains secondary servers, which act as a backup to the primary server;
• at the lowest level is the synchronization subnet, which has the clients.

Physical Clock Synchronization: NTP
Clock offset and delay estimation:
• In practice, a source node cannot accurately estimate the local time on the target node due to varying message or network delays between the nodes.
• This protocol employs the common practice of performing several trials, and it chooses the trial with the minimum delay.
• Figure 3.6 shows how NTP timestamps are numbered and exchanged between peers A and B.
• Let T1, T2, T3, T4 be the values of the four most recent timestamps, as shown.
• Assume that clocks A and B are stable and running at the same speed.

Figure 3.6: Offset and delay estimation (timestamps T1 and T2 are taken at peer B; T3 and T4 are taken at peer A).

• Let a = T1 − T3 and b = T2 − T4.
• If the network delay difference from A to B and from B to A, called the differential delay, is small, the clock offset θ and roundtrip delay δ of B relative to A at time T4 are approximately given by
θ = (a + b)/2,    δ = a − b.    (2)
• Each NTP message includes the latest three timestamps T1, T2, and T3, while T4 is determined upon arrival.
• Thus, both peers A and B can independently calculate the delay and offset using a single bidirectional message stream, as shown in Figure 3.7.
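A small sketch of equation (2) in Python (the timestamp values are made up for illustration):

def ntp_offset_delay(t1, t2, t3, t4):
    a = t1 - t3
    b = t2 - t4
    theta = (a + b) / 2   # clock offset of B relative to A
    delta = a - b         # roundtrip delay
    return theta, delta

# Example: one-way delays of 5 and 7 ticks, with B's clock 100 ticks ahead of A's.
theta, delta = ntp_offset_delay(t1=1105, t2=1110, t3=1000, t4=1017)
print(theta, delta)   # 99.0 12: offset near 100, off by half the delay asymmetry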


Figure 3.7: Timing diagram for the two servers (server A takes timestamps Ti−2 and Ti−1; server B takes Ti−3 and Ti).
