UNIT I
Course Objectives
• To explain the foundation and challenges of distributed systems.
• To infer the knowledge of message ordering and group
communication.
• To demonstrate the distributed mutual exclusion and deadlock
detection algorithms.
• To predict the significance of checkpointing and rollback recovery
algorithms.
• To summarize the characteristics of peer-to-peer and distributed
shared memory systems.
SYLLABUS
UNIT-1
• Introduction: Definition – Characteristics – Relation to computer system
components – Motivation – Message-passing systems versus shared
memory systems – Primitives for distributed communication –
Synchronous versus asynchronous executions – Challenges of
Distributed Systems: System Perspective. A model of distributed
computations: A distributed program – A model of distributed
executions – Models of communication networks – Global state – Cuts
of a distributed computation.
UNIT-2-MESSAGE ORDERING & GROUP
COMMUNICATION
• Message ordering and group communication: Message ordering
paradigms –Asynchronous execution with synchronous
communication –Synchronous program order on an asynchronous
system –Group communication – Causal order (CO) - Total order.
UNIT-3-DISTRIBUTED MUTEX & DEADLOCK
• Distributed mutual exclusion algorithms: Introduction – Preliminaries
– Lamport's algorithm – Ricart-Agrawala algorithm – Maekawa's
algorithm – Suzuki-Kasami's broadcast algorithm. Deadlock detection
in distributed systems: Introduction – System model – Preliminaries –
Models of deadlocks – Knapp's classification.
UNIT-4-CHECKPOINTING AND ROLLBACK
RECOVERY
• Introduction – Background and definitions – Issues in failure recovery
– Checkpoint-based recovery – Log-based rollback recovery – Koo-
Toueg coordinated checkpointing algorithm – Juang- Venkatesan
algorithm for asynchronous checkpointing and recovery.
UNIT-5-P2P & DISTRIBUTED SHARED MEMORY
• Peer-to-peer computing and overlay graphs: Introduction – Data
indexing and overlays – Content addressable networks – Tapestry.
Distributed shared memory: Abstraction and advantages – Types of
memory consistency models.
Distributed System: Definition
• A distributed system has no shared memory;
• it requires message passing for communication.
• Each node may run a different operating system, but the nodes cooperate
with one another.
Motivation
• Resource sharing
• Enhanced reliability
• Scalability
• Resources cannot be fully replicated at all the sites because full replication is often
neither practical nor cost-effective.
• To emulate message passing on a shared-memory system, the shared address
space is partitioned into disjoint parts, one part being assigned to each processor.
Message-passing vs. Shared Memory
• Specifically, a separate location is reserved as a mailbox (assumed to be
unbounded in size) for each ordered pair of processes.
• Unbuffered option → the data gets copied directly from the user buffer onto
the network.
Blocking/non-blocking, synchronous/asynchronous
primitives
Synchronous (send/receive)
• Send and Receive are synchronous when they establish a handshake between
sender and receiver
• A Send completes only when the corresponding Receive completes
• A Receive completes when the data is copied into the receiver's buffer
Asynchronous (send)
• Control returns to the process as soon as the data is copied out of the user-specified buffer
Blocking (send/receive)
• Control returns to the invoking process only after the processing of the primitive
(whether synchronous or asynchronous) completes
Non-blocking (send/receive)
• Control returns to the process immediately after invocation, even though the
operation has not completed
• Send: control returns to the process even before the data is copied out of the user buffer
• Receive: control returns to the process even before the data may have arrived from
the sender
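The blocking/non-blocking distinction can be sketched in a few lines. This is a minimal illustration (not from the course text): a `queue.Queue` stands in for a communication channel, and `block=False` vs. `block=True` on `get` models a non-blocking vs. blocking Receive.

```python
import queue
import threading
import time

# The channel between sender and receiver, modeled as an unbounded queue.
channel = queue.Queue()

# Non-blocking receive: control returns immediately, even though no data
# has arrived from the sender yet.
try:
    early = channel.get(block=False)
except queue.Empty:
    early = None

def sender():
    time.sleep(0.05)           # the message arrives a little later
    channel.put("hello")

threading.Thread(target=sender).start()

# Blocking receive: control returns only after the data is in the buffer.
msg = channel.get(block=True)
print(early, msg)              # None hello
```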
Processor synchrony
• Processor synchrony indicates that all the processors execute in lock-step with their
clocks synchronized
lock-step → similar devices share the same timing and triggering and essentially act
as a single device
Asynchronous execution
• Schemes for data storage, search, and lookup should be fast and
scalable across the network
• Let Cij denote the channel from process pi to process pj, and let mij
denote a message sent by pi to pj.
• The actions are atomic and the actions of a process are modeled as
three types of events.
• internal events
• message send events,
• message receive events.
A Model of Distributed Executions
• For a message m, let send(m) & rec(m) denote send and receive
events, respectively.
• In this figure, for process p1, the 2nd event is a message send event, the 3rd
event is an internal event, and the 4th event is a message receive event.
Causal Precedence Relation
• The execution of a distributed application results in a set of distributed
events produced by the processes.
• Let H = ∪i hi denote the set of events executed in a distributed computation,
where hi is the set of events executed by process pi.
• Next, define a binary relation → on the set H that expresses
causal dependencies between events in the distributed execution.
• Concurrent events: two events e and f are concurrent (e ǁ f) if neither
e → f nor f → e.
• Logical vs. physical concurrency: two events may be logically concurrent
even though they do not occur at the same physical instant.
Models of communication networks
Models of the service provided by communication networks are
• In the FIFO model, each channel acts as a first-in first-out message queue
and thus, message ordering is preserved by a channel.
• In the non-FIFO model, a channel acts like a set in which the sender
process adds messages and the receiver process removes messages from
it in a random order.
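The two channel models can be sketched directly (class names are mine, not from the text): a FIFO channel is a queue that preserves send order, while a non-FIFO channel behaves like a bag from which the receiver removes messages in arbitrary order.

```python
import random
from collections import deque

class FifoChannel:
    """Delivers messages in exactly the order they were sent."""
    def __init__(self):
        self._q = deque()
    def send(self, m):
        self._q.append(m)
    def deliver(self):
        return self._q.popleft()        # first-in, first-out

class NonFifoChannel:
    """Delivers in-transit messages in an arbitrary order."""
    def __init__(self):
        self._bag = []                  # a bag of in-transit messages
    def send(self, m):
        self._bag.append(m)
    def deliver(self):
        i = random.randrange(len(self._bag))
        return self._bag.pop(i)         # removed in random order

fifo = FifoChannel()
for m in ("m1", "m2", "m3"):
    fifo.send(m)
print([fifo.deliver() for _ in range(3)])   # ['m1', 'm2', 'm3']
```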
Global State
• The global state of a distributed system is a collection of the local states of its
processes and channels; the local state of a process includes its registers,
stacks, local memory, etc., and depends on the local context of the
distributed application.
Global State
• A send event (or a receive event) changes the state of the process
that sends (or receives) the message and the state of the channel on
which the message is sent (or received).
Cuts of a Distributed Computation
• All messages that cross the cut from the PAST to the FUTURE are in
transit in the corresponding consistent global state.
Past and Future Cones of an Event
[Figure 2.4: the past cone PAST(ej) and future cone FUTURE(ej) of an event ej.]
Let Pasti (ej ) be the set of all those events of Past(ej ) that are on
process pi .
max (Pasti (ej )) is the latest event at process pi that affected event ej
(Figure 2.4).
Max Past(ej ) consists of the latest event at every process that affected event
ej and is referred to as the surface of the past cone of ej .
Past(ej ) represents all events on the past light cone that affect ej .
Future Cone of an Event
The future of an event ej , denoted by Future(ej ), contains all events ei that
are causally affected by ej (see Figure 2.4).
In a computation (H, →), Future(ej ) is defined as:
Future(ej ) = {ei ∈ H | ej → ei }.
Define Futurei (ej ) as the set of those events of Future(ej ) that are on process
pi .
define min(Futurei (ej )) as the first event on process pi that is affected by ej .
Define Min Future(ej ) as ∪(∀i ){min(Futurei (ej ))}, which consists of the first event
at every process that is causally affected by event ej .
Min Future(ej ) is referred to as the surface of the future cone of ej .
All events at a process pi that occurred after max (Pasti (ej )) but before
min(Futurei (ej )) are concurrent with ej .
Therefore, all and only those events of computation H that belong to the set
“H − Past(ej ) − Future(ej )” are concurrent with event ej .
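The past/future/concurrent partition above can be computed mechanically. A minimal sketch (event names and edges are my own example): given the direct causal edges of a small computation, take the transitive closure to get Past(e) and Future(e), and everything else in H is concurrent with e.

```python
# Direct causal edges: process order plus message (send -> receive) edges.
edges = {
    "e11": {"e12"}, "e12": {"e13", "e22"},   # e12's send is received at e22
    "e13": set(),
    "e21": {"e22"}, "e22": {"e23"}, "e23": set(),
}
H = set(edges)

def reachable(start):
    """All events causally reachable from `start` (transitive closure)."""
    seen, stack = set(), list(edges[start])
    while stack:
        e = stack.pop()
        if e not in seen:
            seen.add(e)
            stack.extend(edges[e])
    return seen

def past(e):
    return {f for f in H if e in reachable(f)}

future = reachable        # Future(e) is everything reachable from e

e = "e22"
concurrent = H - past(e) - future(e) - {e}   # H − Past(e) − Future(e)
print(sorted(past(e)))       # ['e11', 'e12', 'e21']
print(sorted(future(e)))     # ['e23']
print(sorted(concurrent))    # ['e13']
```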
Chapter 3-Logical time
• Logical Time: A framework for a system of logical clocks – Scalar time
– Vector time – Physical clock synchronization: NTP.
Introduction
• The concept of causality between events is fundamental to the design and
analysis of parallel and distributed computing and operating systems.
• Three ways to implement logical time - scalar time, vector time, and matrix time.
A Framework for a System of Logical Clocks
• Definition
• Implementing logical clocks
Definition
• A system of logical clocks consists of a time domain T and a logical clock C .
• Elements of T form a partially ordered set over a relation <
• This relation < is called the happened before or causal precedence relation.
• The logical clock C is a function that maps an event e in a distributed system to an
element in the time domain T , denoted as C(e) and called the timestamp of e, and
is defined as follows:
C : H → T
• such that the following property is satisfied:
for two events ei and ej , ei → ej =⇒ C(ei ) < C(ej ).
• This monotonicity property is called the clock consistency condition.
• When T and C satisfy the following condition,
for two events ei and ej , ei → ej ⇔ C(ei ) < C(ej )
the system of clocks is said to be strongly consistent.
Implementing Logical Clocks
• Implementation of logical clocks requires addressing two issues:
• Data structures local to every process to represent logical time
• Protocol to update the data structures to ensure the consistency condition.
• Each process pi maintains data structures that allow it the following two
capabilities:
• a local logical clock, denoted lci, that helps pi measure its own progress;
• a logical global clock, denoted gci, that represents pi's local view of the logical global time.
• The protocol ensures that a process's logical clock, and its view of the
global time, is managed consistently.
Scalar Time
• In scalar time, the logical local clock of a process pi and its local view of the global time are squashed
into one integer variable Ci .
• Process identifiers are linearly ordered, and ties among events with identical scalar
timestamps are broken on the basis of their process identifiers.
• The lower the process identifier in the ranking, the higher the priority.
• The timestamp of an event is denoted by a tuple (t, i ) where t is its time of
occurrence and i is the identity of the process where it occurred.
• The total order relation ≺ on two events x and y with timestamps (h,i) and (k,j),
respectively, is defined as follows:
x ≺ y ⇔ (h < k) ∨ (h = k ∧ i < j).
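The scalar-time rules and the tie-breaking total order can be sketched as follows (class and function names are mine). Each process keeps one integer C; before every event C is incremented (rule R1), and a receive first takes the maximum of C and the piggybacked timestamp, then applies R1 (rule R2).

```python
class ScalarClock:
    """A minimal Lamport scalar clock for one process."""
    def __init__(self, pid, d=1):
        self.pid, self.c, self.d = pid, 0, d

    def internal(self):
        self.c += self.d                     # R1
        return (self.c, self.pid)

    def send(self):
        self.c += self.d                     # R1; timestamp is piggybacked
        return (self.c, self.pid)

    def receive(self, msg_ts):
        self.c = max(self.c, msg_ts[0])      # R2: max with sender's clock...
        self.c += self.d                     # ...then execute R1
        return (self.c, self.pid)

def precedes(x, y):
    """Total order ≺: compare timestamps; break ties by process identifier."""
    return x[0] < y[0] or (x[0] == y[0] and x[1] < y[1])

p1, p2 = ScalarClock(1), ScalarClock(2)
t_send = p1.send()           # (1, 1)
p2.internal()                # (1, 2)
t_recv = p2.receive(t_send)  # (2, 2): max(1, 1) + 1
print(t_send, t_recv, precedes(t_send, t_recv))
```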
Vector Time
• The system of vector clocks was developed independently by Fidge,
Mattern and Schmuck.
• In the system of vector clocks, the time domain is represented by a set of
n-dimensional non-negative integer vectors.
• Each process pi maintains a vector vti [1..n], where vti [i ]is the local logical clock of pi and
describes the logical time progress at process pi .
• vti [j] represents process pi 's latest knowledge of process pj 's local time.
• If vti [j] = x , then process pi knows that the local time at process pj has progressed till x .
• The entire vector vti constitutes pi ’s view of the global logical time and is used to timestamp
events.
Vector Time
Process pi uses the following two rules R1 and R2 to update its clock:
R1: Before executing an event, process pi updates its local logical time as
follows:
vti [i] := vti [i] + d (d > 0)
R2: Each message m is piggybacked with the vector clock vt of the sender
process at sending time. On the receipt of such a message (m,vt), process pi
executes the following sequence of actions:
◮ Update its global logical time as follows: vti [k] := max(vti [k], vt[k]), for 1 ≤ k ≤ n.
◮ Execute R1.
◮ Deliver the message m.
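Rules R1 and R2 translate into a short implementation (a sketch; the class name is mine, and processes are 0-indexed here rather than 1-indexed as in the text).

```python
class VectorClock:
    """A minimal vector clock for process `pid` in a system of n processes."""
    def __init__(self, pid, n, d=1):
        self.pid, self.d = pid, d
        self.vt = [0] * n                    # initially [0, 0, ..., 0]

    def tick(self):
        self.vt[self.pid] += self.d          # R1: advance own component

    def send(self):
        self.tick()                          # R1 before the send event
        return list(self.vt)                 # piggyback a copy of vt

    def receive(self, msg_vt):
        # R2: component-wise max with the piggybacked clock, then R1,
        # then the message can be delivered.
        self.vt = [max(a, b) for a, b in zip(self.vt, msg_vt)]
        self.tick()

p1, p2, p3 = (VectorClock(i, 3) for i in range(3))
m = p1.send()                # p1: [1, 0, 0]
p2.tick()                    # p2: [0, 1, 0]
p2.receive(m)                # p2: max([0,1,0], [1,0,0]) then R1 -> [1, 2, 0]
print(p1.vt, p2.vt)          # [1, 0, 0] [1, 2, 0]
```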
Vector Time
The timestamp of an event is the value of the vector clock of its process
when the event is executed.
Figure 3.2 shows an example of the progress of vector clocks with the
increment value d = 1.
Initially, each vector clock is [0, 0, 0, ..., 0].
Vector Time
An Example of Vector Clocks
[Figure 3.2: the progress of vector clocks at processes p1, p2, and p3 with
d = 1; each event is labeled with the vector timestamp assigned when it
executes.]
Vector Time
Comparing Vector Timestamps
If events x and y have timestamps vh and vk, respectively:
x → y ⇔ vh[i ] ≤ vk [i ] for all i , and vh ≠ vk
x ǁ y ⇔ ∃i : vh[i ] < vk [i ] ∧ ∃j : vh[j] > vk [j]
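These two comparison rules are a few lines of code (a sketch; function names are mine):

```python
def happened_before(vh, vk):
    """x -> y iff vh <= vk component-wise and the vectors differ."""
    return all(a <= b for a, b in zip(vh, vk)) and vh != vk

def concurrent(vh, vk):
    """x || y iff neither timestamp dominates the other."""
    return not happened_before(vh, vk) and not happened_before(vk, vh)

print(happened_before([1, 0, 0], [1, 2, 0]))  # True: causally related
print(concurrent([2, 0, 0], [1, 2, 0]))       # True: neither dominates
```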
Vector Time
Properties of Vector Time
Isomorphism
If events in a distributed system are timestamped using a system of vector
clocks, we have the following property.
If two events x and y have timestamps vh and vk, respectively, then
x → y ⇔ vh < vk and x ǁ y ⇔ vh ǁ vk.
Vector Time
Strong Consistency
The system of vector clocks is strongly consistent; thus, by examining the
vector timestamp of two events, we can determine if the events are causally
related.
However, Charron-Bost showed that the dimension of vector clocks cannot be
less than n, the total number of processes in the distributed computation, for
this property to hold.
Event Counting
If d=1 (in rule R1), then the i th component of vector clock at process pi ,
vti [i ], denotes the number of events that have occurred at pi until that
instant.
So, if an event e has timestamp vh, vh[j] denotes the number of events
executed by process pj that causally precede e. Clearly, Σj vh[j] − 1
represents the total number of events that causally precede e in the
distributed computation.
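A quick check of the event-counting property (the timestamp is my own example): with d = 1, an event whose vector timestamp is vh is causally preceded by sum(vh) − 1 events in the computation.

```python
# e.g., a receive event whose vector timestamp is [1, 2, 0]: it is preceded
# by 1 event at p1 and by 1 earlier event at its own process p2.
vh = [1, 2, 0]
preceding = sum(vh) - 1    # total events that causally precede this event
print(preceding)           # 2
```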
Physical Clock Synchronization: NTP
[Figure 3.5: the behavior of fast (dC/dt > 1), slow (dC/dt < 1), and perfect
(dC/dt = 1) clocks with respect to UTC, where C is clock time and t is UTC time.]
[Figure: timestamp exchange between processes A and B for offset and delay
estimation; A sends a request at T1, B receives it at T2 and replies at T3,
and A receives the reply at T4.]
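From the four timestamps in the exchange above, NTP estimates the round-trip delay and the clock offset of B relative to A. A minimal sketch with made-up timestamp values:

```python
# A records T1 (request sent) and T4 (reply received);
# B records T2 (request received) and T3 (reply sent).
# Standard NTP estimates:
#   delay  = (T4 - T1) - (T3 - T2)        network round-trip time
#   offset = ((T2 - T1) + (T3 - T4)) / 2  B's clock minus A's clock
T1, T2, T3, T4 = 100.0, 110.0, 112.0, 106.0   # hypothetical timestamps

delay = (T4 - T1) - (T3 - T2)
offset = ((T2 - T1) + (T3 - T4)) / 2
print(delay, offset)     # 4.0 8.0
```

The offset formula assumes the request and reply experience roughly symmetric network delays; the estimation error is bounded by half the measured delay.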