
DISTRIBUTED

COMPUTING
CS-3551
V.Sathiya
AP/IT
UNIT I
INTRODUCTION
Introduction: Definition – Relation to Computer System
Components – Motivation – Message-Passing
Systems versus Shared Memory Systems – Primitives
for Distributed Communication –
Synchronous versus Asynchronous Executions –
Design Issues and Challenges; A Model of
Distributed Computations: A Distributed Program – A
Model of Distributed Executions – Models of
Communication Networks – Global State of a
Distributed System.
Introduction:
• Distributed computing is a model in which
components of a software system are shared
among multiple computers or nodes. Even though
the software components may be spread out across
multiple computers in multiple locations, they're
run as one system.
• A distributed system is a collection of independent
computers, interconnected via a network, capable
of collaborating on a task.
• Distributed computing is computing performed in a
distributed system.
Features of Distributed Systems:
• No common physical clock - It introduces the element of
“distribution” in the system and gives rise to the inherent
asynchrony amongst the processors.
• No shared memory - A key feature that requires message-
passing for communication. This feature also implies the
absence of a common physical clock.
• Geographical separation – The more geographically
distant the processors are, the more representative the
system is of a distributed system.
• Autonomy and heterogeneity – Here the processors are
“loosely coupled” in that they have different speeds and each
can be running a different operating system.
Issues in distributed
systems
• Heterogeneity
• Openness
• Security
• Scalability
• Failure handling
• Concurrency
• Transparency
• Quality of service
Interaction of layers of
network
Definition – Relation to
Computer System Components
• A computer system is a set of integrated
devices that input, output, process, and
store data and information.
• Computer systems are currently built
around at least one digital processing
device.
• There are five main hardware
components in a computer system:
Input, Processing, Storage, Output and
Communication devices.
Motivation
• Inherently distributed computations
• Resource sharing
• Access to geographically remote data
resources
• Enhanced reliability
• Increased performance/ Cost ratio
• Scalability
• Modularity and incremental expandability.
Motivation
• Inherently distributed computations: DS can process
the computations at geographically remote locations.
• Resource sharing: Hardware, databases, and special
libraries can be shared between systems without
owning a dedicated copy or a replica.
• This is cost effective and reliable.
• Access to geographically remote data and resources:
Resources such as centralized servers can also be
accessed from distant locations.
• Enhanced reliability: DS provides enhanced reliability,
since they run on multiple copies of resources
• Availability: The resource/ service
provided by the resource should be
accessible at all times
• Integrity: the value/state of the
resource should be correct and
consistent.
• Fault-tolerance: Ability to recover from
system failures.
Message –Passing Systems
and Shared Memory Systems
• Shared memory systems are those in
which there is a common shared
address space throughout the system.
• Communication among processors takes
place through shared data variables and control
variables for synchronization among the
processes.
Shared memory systems:
• Shared memory is memory that can be simultaneously
accessed by multiple processes.
• This is done so that the processes can communicate with each
other.
• Communication among processors takes place through shared
data variables, and control variables for synchronization
among the processors.
• Semaphores and monitors are common synchronization
mechanisms on shared memory systems.
• When shared memory model is implemented in a distributed
environment, it is termed as distributed shared memory.
Message passing systems:
• This allows multiple processes to read
and write data to the message queue
without being connected to each other.
• Messages are stored in the queue
until their recipient retrieves them.
Message passing
• Two processes communicate with each other by
passing messages.
• Message passing may be direct or indirect
communication.
• Indirect communication uses a mailbox for sending
and receiving messages from other processes.
• MPS requires synchronization and communication
between the two processes.
• MPS come in many forms.
• Messages sent by a process can be either fixed or
variable size.
• Send(destination_name, message)
• Receive(source_name, message)
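A minimal sketch of these two primitives using Python's multiprocessing.Queue as the channel; the process and message names here are illustrative, not from the slides:

from multiprocessing import Process, Queue

def sender(chan: Queue):
    # Send(destination_name, message): the queue stands in for the
    # destination mailbox (indirect communication).
    chan.put(("P1", "hello"))

def receiver(chan: Queue):
    # Receive(source_name, message): blocks until a message arrives.
    source, message = chan.get()
    print(f"received {message!r} from {source}")

if __name__ == "__main__":
    chan = Queue()                      # the shared mailbox
    p1 = Process(target=sender, args=(chan,))
    p2 = Process(target=receiver, args=(chan,))
    p1.start(); p2.start()
    p1.join(); p2.join()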
The two paradigms
are Equivalent
• Emulating message passing on a shared
memory system(MP-SM)
• Emulating shared memory system on a
message passing (SM-MP)
Emulating message passing on a shared memory
system (MP-SM)
• Shared memory systems are those in
which there is a common shared address
space throughout the system.
• Communication among the processors takes
place through shared data variables and control
variables for synchronization among the
processors.
• This gives faster communication
compared to message-passing
techniques.
Differences between MPS and SMS
• Platforms: Platforms that exchange messages for
sharing data are called message-passing systems;
platforms that provide a shared memory for data
sharing are called multiprocessors.
• Communication: Message-passing communication is
slower when compared to the shared memory
technique; shared memory is the faster communication
strategy.
• Processes: In the message-passing model, a process
communicates with others by exchanging messages; in
the shared memory model, a process that wishes to
initiate communication and has data to share creates a
shared memory region.
Emulating shared memory system on a message
passing (SM-MP)

• This involves the use of “send” and
“receive” operations for “write” and
“read” operations.
• An application can of course use a
combination of shared memory and
message-passing.
• In a MIMD message-passing
multicomputer system, each
processor may be a tightly coupled
multiprocessor system with shared memory.
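A small sketch of the SM-MP direction, assuming one server process owns the emulated memory and clients issue "read"/"write" as send/receive round-trips; all names are illustrative:

from multiprocessing import Process, Queue

def memory_server(requests: Queue, replies: Queue):
    store = {}                                # the emulated shared memory
    while True:
        op, key, value = requests.get()       # "receive" a read/write request
        if op == "write":
            store[key] = value
            replies.put("ok")                 # "send" the acknowledgement
        elif op == "read":
            replies.put(store.get(key))       # "send" the value back
        else:                                 # "stop": shut the server down
            break

if __name__ == "__main__":
    requests, replies = Queue(), Queue()
    server = Process(target=memory_server, args=(requests, replies))
    server.start()
    requests.put(("write", "x", 42)); replies.get()   # write x := 42
    requests.put(("read", "x", None))                 # read x
    print("x =", replies.get())                       # prints: x = 42
    requests.put(("stop", None, None))
    server.join()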
Primitives for Distributed
Communication
• Message send and message receive
communication primitives are denoted
• Send()
• Receive()
• A receive primitive has at least two
parameters
• The source from which the data is to be received
• The user buffer into which the data is to be
received
• Synchronous / Asynchronous
• Blocking / Non-blocking
• Blocking synchronous send()
• Non-blocking synchronous send()
• Blocking asynchronous send()
• Non-blocking asynchronous send()
• Blocking receive()
• Non-blocking receive()
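The blocking versus non-blocking distinction can be sketched with the standard-library queue module; the channel and message names are illustrative:

import queue

chan = queue.Queue()

# Non-blocking receive: returns control immediately; if no message is
# buffered, the caller must poll again later.
try:
    msg = chan.get(block=False)
except queue.Empty:
    msg = None
print("non-blocking receive ->", msg)      # None: nothing had arrived

chan.put("m1")

# Blocking receive: the caller stays suspended until data arrives
# (returns at once here because "m1" is already buffered).
print("blocking receive ->", chan.get(block=True, timeout=1.0))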
Synchronous versus Asynchronous Executions
• Task: Synchronous execution means the first task in a
program must finish processing before moving on to
executing the next task; asynchronous execution means a
second task can begin executing in parallel, without
waiting for an earlier task to finish.
• Execution time: With synchronous execution, lower and
upper bounds on the execution time of processes can be
set; with asynchronous execution, there is no bound on
process execution time.
synchronous process

• A synchronous process is a process that


can be executed without interruption
from start to finish.
• synchronous Distributed
Computing(system) can be used for hard
real time applications.
• It is possible and safe to use timeouts in
order to detect failures of processes or
communication links.
Asynchronous Executions

• Asynchronous is a non-blocking architecture,


so the execution of one task isn't dependent
on another.
• Synchronous is a blocking architecture, so
the execution of each operation is dependent
on the completion of the one before it.
• Timeouts cannot be used to detect failures.
• Asynchronous systems are widely and
successfully used in practice.
AS
SA
• AS
• Asynchronous program can be emulated on
asynchronous system failry trivially as the
synchronous system is a special caes of an
asynchronous sysytem.
• SA
• Synchronous program can be emulated on
an asynchronous system using a tool called
synchronizer.
Design Issues and Challenges
• Challenges having a greater component related to
systems design and operating system design
• Challenges having a greater component related to
algorithm design
• Challenges emerging from recent technology advances
and/or driven by new applications
• These challenges reflect the concerns of the systems
community and the forces driving the emerging applications.
Challenges from a systems perspective
• Communication mechanisms
• Processes
• Naming
• Synchronization
• Data storage and access
• Distributed systems security
• Communication mechanisms: this task involves
designing appropriate communication mechanisms among the
processes in the network.
• Examples: remote procedure call (RPC),
remote object invocation (ROI).
• Processes: the issues involved are code migration,
process/thread management at clients and servers, and the
design of software and mobile agents.
• Naming: easy-to-use identifiers are needed to locate
resources and processes transparently and scalably.
• Synchronization: mutual exclusion is
the classical problem; coordination among the
processes is essential.
• Data storage and access: various
schemes for data storage, searching, and
lookup should be fast and scalable across the
network.
• Distributed systems security: secure
channels and access-key management are
methods used to provide security.
CHALLENGES
• Middleware: a software layer that
provides a programming abstraction as well as
masking the heterogeneity of the underlying platforms.
• E.g., DCOM, Java RMI.
• Middleware serves to hide both these aspects
by providing uniform, standard, high-level
interfaces to the application developers, so that
applications can be easily composed, reused,
ported, and made to interoperate.
Middleware layers
• Application software
• API for standardized, high-level services
• Middleware
• Distributed heterogeneous hardware nodes
Openness & security
• Openness means that the system can be easily
extended and modified.
• New components can be integrated with existing
components.
• Security becomes even more important in a
distributed system.
• Components:-
• Confidentiality
• Integrity
• Availability
• Encryption provides protection of shared
resources and keeps information secret
when transmitted.
• Denial of service attacks: an attempt to
make a computer or network resource
unavailable to its intended users.
• Security of mobile code: mobile code
systems are conceived to operate in
large-scale settings whose networks are composed of
links of different bandwidths.
scalability
• A system is said to be scalable if it can handle the
addition of users and resources without suffering
loss of performance or increase in complexity.
• In size: dealing with large numbers of machines and
user tasks.
• In location: dealing with geographic distribution
and mobility.
• In administration: addressing data passing
through different regions of ownership.
• Controlling the cost of resources.
• Controlling the performance loss.
• Avoiding performance bottlenecks.
Failure handling
• Hardware, software, and networks are not free of
failures.
• A distributed system can maintain availability even at
low levels of hardware/software/network reliability.
• There should be at least two routes between any two
routers in the internet.
• In DNS, every name table is replicated in several
servers.
• Detecting failures
• Masking failures
• Recovering from failures
Transparency
• Hide different aspects of distribution from the
client.
• This is the ultimate goal of many distributed systems.
• Location
• Access
• Concurrency
• Failure
• Mobility
• scaling
A Model of Distributed
computations
• A Model of Distributed Executions. The execution of a
process consists of a sequential execution of its
actions.
• The actions are atomic and the actions of a process are
modeled as three types of events, namely, internal
events, message send events, and message receive
events.
• A distributed execution is the execution of processes
across the distributed system to collaboratively
achieve a common goal; an execution is also
sometimes termed a computation or a run.
• A distributed program is composed of a set of n
asynchronous processes, p1, p2, ..., pi , ..., pn.
• The processes do not share a global memory and
communicate solely by passing messages.
• The processes do not share a global clock that is
instantaneously accessible to these processes.
• Process execution and message transfer are asynchronous.
Without loss of generality, we assume that each process is
running on a different processor.
• Let Cij denote the channel from process pi to process pj and
let mij denote a message sent by pi to pj .
• The message transmission delay is finite and unpredictable.
Logical vs. Physical
Concurrency
• In a distributed computation, two events are
logically concurrent if and only if they do not
causally affect each other.
• Physical concurrency, on the other hand, has a
connotation that the events occur at the same
instant in physical time.
• Two or more events may be logically
concurrent even though they do not occur at
the same instant in physical time
Models of
Communication Networks
• There are several models of the service
provided by communication networks, namely,
FIFO, Non-FIFO, and causal ordering.
• In the FIFO model, each channel acts as a first-
in first-out message queue and thus, message
ordering is preserved by a channel.
• In the non-FIFO model, a channel acts like a
set in which the sender process adds messages
and the receiver process removes messages
from it in a random order.
Global State of a
Distributed System
• “A collection of the local states of its components, namely,
the processes and the communication channels.”
• The state of a process is defined by the contents of
processor registers, stacks, local memory, etc. and
depends on the local context of the distributed application.
• The state of channel is given by the set of messages in
transit in the channel.
• The occurrence of events changes the states of respective
processes and channels.
• An internal event changes the state of the process at
which it occurs.
• A send event changes the state of the
process that sends the message and the
state of the channel on which the
message is sent.
• A receive event changes the state of the
process that receives the message
and the state of the channel on which
the message is received.
Models of Process Communications
• There are two basic models of process communication:
synchronous and asynchronous.
• The synchronous communication model is a blocking type
where on a message send, the sender process blocks until
the message has been received by the receiver process.
• The sender process resumes execution only after it learns
that the receiver process has accepted the message.
• Thus, the sender and the receiver processes must
synchronize to exchange a message. On the other hand,
asynchronous communication model is a non-blocking
type where the sender and the receiver do not
synchronize to exchange a message.
UNIT II LOGICAL TIME
AND GLOBAL STATE
• Logical Time: Physical Clock Synchronization: NTP –
A Framework for a System of Logical Clocks– Scalar
Time – Vector Time; Message Ordering and Group
Communication: Message Ordering
Paradigms – Asynchronous Execution with
Synchronous Communication – Synchronous Program
Order on Asynchronous System – Group
Communication – Causal Order – Total Order; Global
State and Snapshot Recording Algorithms:
Introduction – System Model and Definitions –
Snapshot
Algorithms for FIFO Channels.
Logical Time
• The logical time in distributed systems is used to maintain
the consistent ordering of events. The concept of causality,
i.e. the causal precedence relationship, is fundamental for
distributed systems.
• The concept of causality between events is fundamental to
the design and analysis of parallel and distributed
computing and operating systems.
• Usually causality is tracked using physical time. In
distributed systems, it is not possible to have a global
physical time. As asynchronous distributed computations
make progress in spurts, the logical time is sufficient to
capture the fundamental monotonicity property associated
with causality in distributed systems.
Physical Clock
Synchronization: NTP
• NTP is an internet protocol used to
synchronize with computer clock time
sources in a network.
• NTP uses the UDP protocol on port 123
to synchronize clocks to within a few milliseconds
of UTC time.
• Each computer(OS) comes with the NTP
Package.
• Network Time Protocol
• NTP is structured as a hierarchy of servers.
• Each level in the hierarchy is called a stratum & it represents the
distance from the reference clock.
• Reference Clock — Reference clocks are extremely accurate clocks like
atomic clocks and GPS clocks. Reference clocks have a Stratum of 0.
• Primary Time Servers — Primary Time Servers are directly connected
to the Reference Clocks and sync their time with the Reference Clocks.
They have a Stratum of 1. They are a few microseconds behind Stratum
0 clocks.
• Stratum 2 Clocks — Stratum 2 clocks are connected to Stratum 1
clocks over the network. They are a few milliseconds behind Stratum 0
clocks.
• Stratum 3 and beyond — Stratum X clocks are connected to Stratum
X-1 clocks over the network. The latency from the Stratum 0 clocks
keeps increasing as the levels keep increasing.
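A sketch of querying an NTP server from Python; it assumes the third-party ntplib package (pip install ntplib) and network access to pool.ntp.org, neither of which is part of the slides:

import ntplib
from time import ctime

client = ntplib.NTPClient()
response = client.request("pool.ntp.org", version=3)  # UDP, port 123
print("server time :", ctime(response.tx_time))       # server transmit time
print("clock offset:", response.offset, "seconds")    # local clock vs. server
print("stratum     :", response.stratum)              # distance from stratum 0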
A Framework for a
System of Logical Clocks
• A system of logical clocks must satisfy the monotonicity
property: for two events ei and ej, ei → ej ⇒ C(ei) < C(ej).
This property is called the clock consistency condition.
• When T and C satisfy the following condition, for two events
ei and ej, ei → ej ⇔ C(ei) < C(ej), the system of clocks is said
to be strongly consistent.
• Implementing logical clocks requires addressing two issues:
data structures local to every process to represent logical
time, and a protocol to update the data structures to ensure
the consistency condition.
• Each process pi maintains data structures that allow it the
following capability: a local logical clock, denoted by
lci, that helps process pi measure its own progress.
Scalar Time
• Proposed by Lamport in 1978 as an attempt to totally
order events in a distributed system.
• Time domain is the set of non-negative integers. The
logical local clock of a process pi and its local view of
the global time are squashed into one integer variable
Ci .
• Rules R1 and R2 to update the clocks are as follows:
• R1: Before executing an event (send, receive, or
internal), process pi executes the following: Ci := Ci +
d (d > 0). In general, every time R1 is executed, d can
have a different value; however, typically d is kept at 1.
• R2: Each message piggybacks the clock value of its sender
at sending time. When a process pi receives a message with
timestamp Cmsg, it executes Ci := max(Ci, Cmsg) and then
executes R1 before delivering the message.
• Consistency Property
• Scalar clocks satisfy the monotonicity and hence the
consistency property: for two events ei and ej , ei → ej =⇒
C(ei) < C(ej).
• Total Ordering Scalar clocks can be used to totally order
events in a distributed system.
• The main problem in totally ordering events is that two or
more events at different processes may have identical
timestamps.
• For example in Figure 3.1, the third event of process P1
and the second event of process P2 have identical scalar
timestamp.
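Rules R1 and R2 can be sketched in a few lines of Python; the class and method names are illustrative (with d = 1):

class ScalarClock:
    def __init__(self):
        self.c = 0

    def tick(self):                    # R1: executed before every event
        self.c += 1
        return self.c                  # this value is piggybacked on sends

    def receive(self, c_msg):          # R2: merge with the piggybacked value,
        self.c = max(self.c, c_msg)    # then execute R1 before delivery
        return self.tick()

p1, p2 = ScalarClock(), ScalarClock()
ts = p1.tick()                         # p1 sends a message stamped ts = 1
print(p2.receive(ts))                  # p2 delivers it; its clock becomes 2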
Vector Time
• The system of vector clocks was developed independently by
Fidge, Mattern and Schmuck.
• In the system of vector clocks, the time domain is represented by
a set of n-dimensional non-negative integer vectors.
• Each process pi maintains a vector vti [1..n], where vti [i] is the
local logical clock of pi and describes the logical time progress at
process pi .
• vti [j] represents process pi ’s latest knowledge of process pj
local time.
• If vti [j]=x, then process pi knows that local time at process pj has
progressed till x.
• The entire vector vti constitutes pi ’s view of the global logical
time and is used to timestamp events.
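A minimal vector-clock sketch under the analogous update rules (tick the own entry before an event; take the component-wise maximum on receive); all names are illustrative:

class VectorClock:
    def __init__(self, i, n):
        self.i, self.vt = i, [0] * n

    def tick(self):                              # before an internal/send event
        self.vt[self.i] += 1
        return list(self.vt)                     # timestamp piggybacked on sends

    def receive(self, vt_msg):                   # component-wise max, then tick
        self.vt = [max(a, b) for a, b in zip(self.vt, vt_msg)]
        return self.tick()

def happened_before(u, v):                       # u -> v iff u <= v and u != v
    return all(a <= b for a, b in zip(u, v)) and u != v

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
ts = p0.tick()                                   # p0 sends with timestamp [1, 0]
print(p1.receive(ts))                            # p1 delivers: [1, 1]
print(happened_before([1, 0], [1, 1]))           # True: the send precedes it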
Parameters: Scalar versus vector clocks
• Representation: A scalar clock squashes logical time into a
single integer per process; a vector clock keeps an
n-dimensional vector of integers, one entry per process.
• Consistency: Scalar clocks are consistent (ei → ej ⇒
C(ei) < C(ej)) but not strongly consistent; vector clocks are
strongly consistent (ei → ej ⇔ vti < vtj) and therefore
characterize causality exactly.
• Overhead: Scalar timestamps add one integer per message;
vector timestamps add n integers per message.
Message Ordering and
Group Communication
• For any two events a and b, where each can be
either a send or a receive event, the notation
a ~ b denotes that a and b occur at the same
process, i.e., a ∈ Ei and b ∈ Ei for some process
i. The send and receive events for a message are
called a pair of corresponding events.
• For a given execution E, let the set of all
send–receive event pairs be denoted 𝒯 =
{(s, r) ∈ Ei × Ej | s corresponds to r}.
Message ordering
paradigms
• Distributed program logic greatly
depends on the order of delivery of messages.
• Several orderings on messages have
been defined:
• (i) non-FIFO,
• (ii) FIFO,
• (iii) causal order,
• (iv) synchronous order.
Asynchronous executions
• Definition 6.1 (A-execution): An asynchronous execution
(or A-execution) is an execution (E, ≺) for which the
causality relation is a partial order.
• On a logical link between two nodes (formed as
multiple paths may exist) in the system, if the
messages are delivered in any order then it is
known as a non-FIFO execution. Example: IPv4.
• Each physical link delivers the messages sent on it
in FIFO order due to the physical properties of the
medium.
FIFO
• In general, on any logical link, messages are delivered in
the order in which they are sent.
• To implement a FIFO logical channel over a non-FIFO
channel, use a separate numbering scheme to sequence
the messages.
• The sender assigns and appends a (sequence_num) tuple to
each message. The receiver uses a buffer to order the
incoming messages as per the sender’s sequence numbers,
and accepts only the “next” message in sequence.
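A sketch of the receiver side of this numbering scheme: out-of-order arrivals are buffered and released only when the next expected sequence number is present (names illustrative):

class FifoReceiver:
    def __init__(self):
        self.expected = 0
        self.buffer = {}                   # seq -> message, held until in order

    def on_arrival(self, seq, msg):
        self.buffer[seq] = msg
        delivered = []
        while self.expected in self.buffer:        # drain the in-order prefix
            delivered.append(self.buffer.pop(self.expected))
            self.expected += 1
        return delivered

rx = FifoReceiver()
print(rx.on_arrival(1, "b"))     # []: message 0 missing, so "b" is buffered
print(rx.on_arrival(0, "a"))     # ['a', 'b']: FIFO order restored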
Causally ordered (CO)
executions
• If two send events s and s′ are related
by causality ordering then their
corresponding receive events r and r′
must occur in the same order at all
common destinations.
• Figure 6.2 shows an execution that
satisfies CO. s2 and s1 are related by
causality but the destinations of the
corresponding messages are different.
Similarly for s2 and s3.
• Applications of causal order: applications that require
updates to shared data, implementing distributed shared
memory, and fair resource allocation in distributed mutual
exclusion.
• Definition (causal order (CO) for implementations): if
send(m1) ≺ send(m2), then for each common destination d
of messages m1 and m2, deliverd(m1) ≺ deliverd(m2)
must be satisfied.
• If m1 and m2 are sent by the same process, then the property
degenerates into the FIFO property.
• In a FIFO execution, no message can be overtaken by
another message between the same (sender, receiver) pair
of processes.
Synchronous execution
(SYNC)
• When all the communication between pairs of processes uses
synchronous send and receive primitives, the resulting order is the
synchronous order.
• As each synchronous communication involves a handshake between
the receiver and the sender, the corresponding send and receive
events can be viewed as occurring instantaneously and atomically.
• In a timing diagram, the “instantaneous” message communication can
be shown by bidirectional vertical message lines.
• The “instantaneous communication” property of synchronous
executions requires that two events are viewed as being atomic and
simultaneous, and neither event precedes the other.
• Definition (Causality in a synchronous execution) The synchronous
causality relation ≪ on E is the smallest transitive relation that
satisfies the following:
• Definition (S- execution): A synchronous execution is an execution
(E, ≪) for which the causality relation ≪ is a partial order.
• Timestamping a synchronous execution: an execution (E, ≺) is
synchronous if and only if there exists a mapping from E to T
(scalar timestamps) such that
• for any message M, T(s(M)) = T(r(M));
• for each process Pi, if ei ≺ ei′ then T(ei) < T(ei′).
Asynchronous execution with synchronous communication
• When all the communication between pairs of processes is by
using synchronous send and receive primitives, the resulting
order is synchronous order.
• A distributed program that runs correctly on an asynchronous
system may not run correctly with synchronous primitives: there is a
possibility that the program may deadlock, as shown by the code in
Figure 6.4.
Executions realizable with
synchronous communication (RSC)
• In an A-execution, the messages can be made to
appear instantaneous if there exists a linear extension
of the execution, such that each send event is
immediately followed by its corresponding receive
event.
• Such an A-execution that is realized under synchronous
communication is called a realizable with synchronous
communication (RSC) execution.
• Non-separated linear extension: a non-separated linear
extension of (E, ≺) is a linear extension of (E, ≺) such
that for each pair (s, r) ∈ 𝒯, the interval {x ∈ E | s ≺ x
≺ r} is empty.
Synchronous Program
Order on Asynchronous System
• There do not exist real systems with instantaneous
communication that allow synchronous communication
to be naturally realized.
Non-determinism
• The above suggests that distributed programs are
deterministic, i.e., repeated runs of the same program will
produce the same partial order.
• In many cases, programs are non-deterministic in the
following senses:
• 1. A receive call can receive a message from any sender who
has sent a message, if the expected sender is not specified.
• 2. Multiple send and receive calls which are enabled at a
process can be executed in an interchangeable order.
Group Communication
• Processes across a distributed system cooperate to solve
a task. Hence there is need for group communication.
• A message broadcast is sending a message to all
members.
• In multicasting, the message is sent to a certain subset,
identified as a group.
• Unicasting is point-to-point message
communication.
• Broadcast and multicast are supported by the network
protocol stack using variants of the spanning tree. This
is an efficient mechanism for distributing information.
• However, the hardware or network layer protocol assisted
multicast cannot efficiently provide the following features:
• Application-specific ordering semantics on the order of
delivery of messages.
• Adapting groups to dynamically changing membership.
• Sending multicasts to an arbitrary set of processes at
each send event.
• Providing various fault-tolerance semantics.
• If a multicast algorithm requires the sender to be a part
of the destination group, the multicast algorithm is said to
be a closed group algorithm.
Causal Order
• Causal ordering of messages is one of the four
semantics of multicast communication, namely
unordered, totally ordered, causal, and sync-
ordered communication.
• Multicast communication methods vary
according to the message’s reliability
guarantee and ordering guarantee.
• The causal ordering of messages describes the
causal relationship between a message send
event and a message receive event.
• Reasons that may lead to violation of causal
ordering of messages
• It may happen due to a transmission delay.
• Congestion in the network.
• Failure of a system.
• Protocols that are used to provide causal
ordering of messages:
• Birman–Schiper–Stephenson protocol
• Schiper–Eggli–Sandoz protocol
Total Order
• For each pair of processes Pi and Pj and for each
pair of messages Mx and My that are delivered to
both the processes, Pi is delivered Mx before My if
and only if Pj is delivered Mx before My.
• Example: the execution in Figure 6.11(b) does not satisfy
total order.
• Even if the message m did not exist, total order
would not be satisfied.
• The execution in Figure 6.11(c) satisfies total
order.
Centralized algorithm for total
order

• The algorithm assumes all processes
broadcast messages. It enforces total
order and also causal order in a
system with FIFO channels.
• Each process sends the message it
wants to broadcast to a centralized
process.
• The centralized process relays all the
messages it receives to every other
process over FIFO channels.
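A single-threaded sketch of the sequencer idea: all broadcasts funnel through one relay, so every process sees the same delivery order (queues and names are illustrative):

import queue

n = 3
to_sequencer = queue.Queue()
to_process = [queue.Queue() for _ in range(n)]   # FIFO channel to each Pi

def sequencer_step():
    sender, msg = to_sequencer.get()             # the single relay point
    for ch in to_process:                        # relay in one global order
        ch.put((sender, msg))

to_sequencer.put((0, "m1"))                      # P0 broadcasts m1
to_sequencer.put((2, "m2"))                      # P2 broadcasts m2
sequencer_step(); sequencer_step()
for i, ch in enumerate(to_process):              # every Pi delivers m1 then m2
    print(f"P{i}:", ch.get(), ch.get())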
Global State and Snapshot Recording Algorithms
• A distributed computing system consists
of spatially separated processes that do
not share a common memory and
communicate asynchronously with each
other by message passing over
communication channels.
• The state of a channel is the set of messages in
transit.
• The global state of a distributed system is the collection
of the states of the processes and the channels.
• Applications that use the global state information are:
• deadlock detection
• failure recovery
• debugging distributed software
• If shared memory were available, then an up-to-date state of
the entire system would be available to the processes sharing
the memory.
System model and
definitions
The system consists of a collection of n processes,
p1, p2,…, pn, that are connected by channels.
• There is no globally shared memory and processes
communicate solely by passing messages (send
and receive) asynchronously i.e., delivered reliably
with finite but arbitrary time delay.
• There is no physical global clock in the system.
• The system can be described as a directed graph
where vertices represents processes and edges
represent unidirectional communication channels.
• For a message mij that is sent by process pi to process pj, let
send(mij) and rec(mij) denote its send and receive events,
respectively.
• The events at a process are linearly ordered by their order of
occurrence.
• At any instant, the state of process pi, denoted by LSi, is a
result of the sequence of all the events executed by pi up to
that instant.
• For an event e and a process state LSi, e ∈ LSi iff e belongs
to the sequence of events that have taken process pi to state
LSi.
• For an event e and a process state LSi, e ∉ LSi iff e does not
belong to the sequence of events that have taken process pi
to state LSi.
Snapshot algorithms
for FIFO channels
• This algorithm uses a control message
called a marker. After a site has
recorded its snapshot, it sends a marker
along all of its outgoing channels before
sending out any more messages.
• Since channels are FIFO, a marker
separates the messages in the channel
into those to be included in the snapshot
and those not to be recorded in the
snapshot. This addresses issue I1.
• Since all messages that follow a marker on channel
Cij have been sent by process pi after pi has taken
its snapshot, process pj must record its snapshot if
not recorded earlier and record the state of the
channel on which the marker message was
received. This addresses issue I2.
• The algorithm is initiated by any
process by executing the marker sending rule.
• The algorithm terminates after each process
has received a marker on all of its incoming
channels.
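The marker rules for one process can be sketched as below; the channel bookkeeping is an illustrative stand-in, not a full simulation of the algorithm:

class SnapshotProcess:
    def __init__(self, pid, incoming, outgoing):
        self.pid, self.incoming, self.outgoing = pid, incoming, outgoing
        self.state, self.recorded = 0, False
        self.chan_state = {c: [] for c in incoming}  # in-transit messages
        self.recording = set()                       # channels being recorded

    def record_snapshot(self, send_marker):
        # Marker sending rule: record the local state, then send a marker
        # on every outgoing channel before any further messages.
        self.recorded, self.local_snapshot = True, self.state
        for c in self.outgoing:
            send_marker(c, self.pid)
        self.recording = set(self.incoming)

    def on_marker(self, channel, send_marker):
        # Marker receiving rule: take the snapshot on the first marker;
        # in either case stop recording the channel the marker came on.
        if not self.recorded:
            self.record_snapshot(send_marker)        # channel state = empty
        self.recording.discard(channel)

    def on_message(self, channel, msg):
        if self.recorded and channel in self.recording:
            self.chan_state[channel].append(msg)     # counts as in transit
        self.state += msg                            # apply to local state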
Correctness
• To prove the correctness of the algorithm, it is shown
that a recorded snapshot satisfies conditions C1 and C2.
• Since a process records its snapshot when it receives the
first marker on any incoming channel, no messages that
follow markers on the channels incoming to it are
recorded in the process’s snapshot.
• Moreover, a process stops recording the state of an
incoming channel when a marker is received on that
channel.
• Due to FIFO property of channels, it follows that no
message sent after the marker on that channel is recorded
in the channel state. Thus, condition C2 is satisfied.
Complexity
• The recording part of a single instance
of the algorithm requires O(e) messages
and O(d) time, where e is the number of
edges in the network and d is the
diameter of the network.
UNIT III DISTRIBUTED
MUTEX AND DEADLOCK
• Distributed Mutual Exclusion Algorithms:
Introduction – Preliminaries – Lamport’s
Algorithm – Ricart–Agrawala’s Algorithm –
Token-Based Algorithms – Suzuki–Kasami’s
Broadcast Algorithm;
Deadlock Detection in Distributed Systems:
Introduction – System Model – Preliminaries
– Models of Deadlocks – Chandy–Misra–Haas
Algorithm for the AND Model and OR Model.
• Mutual exclusion: Concurrent access of
processes to a shared resource or data is
executed in mutually exclusive manner.
• Only one process is allowed to execute the
critical section (CS) at any given time.
• In a distributed system, shared variables
(semaphores) or a local kernel cannot be used to
implement mutual exclusion.
• Message passing is the sole means for
implementing distributed mutual exclusion.
Distributed Mutual
exclusion Algorithms
• Distributed mutual exclusion algorithms must deal with
unpredictable message delays and incomplete knowledge
of the system state.
• Three basic approaches for distributed mutual exclusion:
• 1. Token-based approach
• 2. Non-token-based approach
• 3. Quorum-based approach
• Token-based approach:
• A unique token is shared among the sites.
• A site is allowed to enter its CS if it possesses the token.
• Mutual exclusion is ensured because the token is unique.
• Non-token-based approach:
• Two or more successive rounds of messages are
exchanged among the sites to determine which site
will enter the CS next.
• Quorum-based approach:
• Each site requests permission to execute the CS
from a subset of sites (called a quorum).
• Any two quorums contain a common site.
• This common site is responsible for making sure that
only one request executes the CS at any time.
Lamport’s algorithm

• Requests for the CS are executed in the
increasing order of timestamps, and time
is determined by logical clocks. Every
site Si keeps a queue, request_queuei,
which contains mutual exclusion
requests ordered by their timestamps.
This algorithm requires communication
channels to deliver messages in FIFO
order.
• Requesting the critical section:
• When a site Si wants to enter the CS, it broadcasts a
REQUEST(tsi , i) message to all other sites and places the
request on request queuei .
• ((tsi , i) denotes the timestamp of the request.) When a site Sj
receives the REQUEST(tsi , i) message from site Si ,places site
Si ’s request on request queuej and it returns a timestamped
REPLY message to Si .
• Executing the critical section: Site Si enters the CS when the
following two conditions hold:
• L1: Si has received a message with a timestamp larger than
(tsi, i) from all other sites.
• L2: Si’s request is at the top of request_queuei.
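A sketch of the bookkeeping behind conditions L1 and L2 at one site; message transport is abstracted away and all names are illustrative:

import heapq

class LamportSite:
    def __init__(self, sid, n):
        self.sid, self.n = sid, n
        self.queue = []                    # heap of (ts, site) requests
        self.last_seen = {}                # site -> latest timestamp seen

    def on_request(self, ts, site):        # REQUEST received (or our own)
        heapq.heappush(self.queue, (ts, site))
        self.last_seen[site] = ts

    def on_reply(self, ts, site):          # timestamped REPLY received
        self.last_seen[site] = ts

    def can_enter(self, my_ts):
        l1 = all(self.last_seen.get(j, -1) > my_ts          # condition L1
                 for j in range(self.n) if j != self.sid)
        l2 = bool(self.queue) and self.queue[0] == (my_ts, self.sid)  # L2
        return l1 and l2

s0 = LamportSite(0, 3)
s0.on_request(1, 0)                        # own request with timestamp 1
s0.on_reply(2, 1); s0.on_reply(3, 2)       # REPLYs with larger timestamps
print(s0.can_enter(1))                     # True: L1 and L2 both hold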
correctness
• Theorem: Lamport’s algorithm is fair. Proof:

• The proof is by contradiction. Suppose a site Si ’s request has a


smaller timestamp than the request of another site Sj and Sj is able to
execute the CS before Si .
• For Sj to execute the CS, it has to satisfy the conditions L1 and L2.
• This implies that at some instant in time say t, Sj has its own request
at the top of its queue and it has also received a message with
timestamp larger than the timestamp of its request from all other
sites.
• But request queue at a site is ordered by timestamp, and according to
our assumption Si has lower timestamp. So Si ’s request must be
placed ahead of the Sj ’s request in the request queuej . This is a
contradiction!
RicartAgrawala’s
Algorithm
• The Ricart-Agrawala algorithm assumes the communication channels
are FIFO. The algorithm uses two types of messages:
• REQUEST and REPLY.
• A process sends a REQUEST message to all other processes to request
their permission to enter the critical section.
• A process sends a REPLY message to a process to give its permission
to that process.
• Processes use Lamport-style logical clocks to assign a timestamp to
critical section requests and timestamps are used to decide the priority
of requests.
• Each process pi maintains the request-deferred array, RDi, the size of
which is the same as the number of processes in the system. Initially,
∀i ∀j: RDi[j] = 0. Whenever pi defers the request sent by pj, it sets
RDi[j] = 1, and after it has sent a REPLY message to pj, it sets RDi[j] = 0.
• Requesting the critical section:
• (a) When a site Si wants to enter the CS, it broadcasts
a timestamped REQUEST message to all other sites.
• (b) When site Sj receives a REQUEST message from
site Si , it sends a REPLY message to site Si if site Sj
is neither requesting nor executing the CS, or if the
site Sj is requesting and Si ’s request’s timestamp is
smaller than site Sj ’s own request’s timestamp.
• Otherwise, the reply is deferred and Sj sets RDj [i]=1
• Executing the critical section:
• (c) Site Si enters the CS after it has received a REPLY
message from every site it sent a REQUEST message
to
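A sketch of the defer/reply rule at one site; it tracks only the requesting state (being inside the CS is folded into it), and all names are illustrative:

class RASite:
    def __init__(self, sid, n):
        self.sid = sid
        self.rd = [0] * n                # the request-deferred array RD
        self.requesting = False
        self.my_ts = None                # (timestamp, site_id) of own request

    def on_request(self, ts, j, send_reply):
        mine_older = self.requesting and self.my_ts < (ts, j)
        if mine_older:
            self.rd[j] = 1               # defer: our request has priority
        else:
            send_reply(j)                # grant permission immediately

    def on_exit_cs(self, send_reply):
        self.requesting = False
        for j, deferred in enumerate(self.rd):
            if deferred:                 # release all deferred permissions
                self.rd[j] = 0
                send_reply(j)

site = RASite(0, 3)
site.requesting, site.my_ts = True, (1, 0)
site.on_request(2, 1, lambda j: print("reply to", j))  # deferred: no output
site.on_exit_cs(lambda j: print("reply to", j))        # prints: reply to 1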
Correctness
• Theorem: The Ricart–Agrawala algorithm achieves
mutual exclusion.
• Proof: The proof is by contradiction.
Suppose two sites Si and Sj are executing the CS
concurrently and Si’s request has higher priority
than the request of Sj. Clearly, Si received Sj’s
request after it made its own request. Thus, Sj
can concurrently execute the CS with Si only if Si
returns a REPLY to Sj (in response to Sj’s request)
before Si exits the CS. However, this is impossible
because Sj’s request has lower priority. Therefore,
the Ricart–Agrawala algorithm achieves mutual
exclusion.
Performance

• For each CS execution, the Ricart–Agrawala
algorithm requires (N − 1) REQUEST
messages and (N − 1) REPLY messages.
• Thus, it requires 2(N − 1) messages per
CS execution. The synchronization delay in
the algorithm is T (one message transmission time).
Token-Based Algorithms

• The sites of a distributed system can communicate with
each other to achieve one common goal as a task.
• There are many algorithms used to achieve
mutual exclusion,
• where there are multiple processes (or sites)
requesting access to a single shared
resource (often called the critical section),
• and these are broadly divided into two
categories:
Token-Based Algorithms versus Non-Token-Based
Algorithms
• In a token-based algorithm, a unique token is shared
among all the sites in the distributed computing
system; in a non-token-based algorithm, there is no
token, nor any concept of sharing a token for access.
• In a token-based algorithm, a site enters the critical
section when it holds the token; in a non-token-based
algorithm, two or more successive rounds of messages
are exchanged between sites to determine which site
is to enter the critical section next.
• A token-based algorithm produces less message traffic
as compared to a non-token-based algorithm.
example

• A token-ring network is a local area
network (LAN) topology that sends data
in one direction through a specified
number of locations by using a token.
The token is the symbol of authority for
control of the transmission line.
Suzuki-Kasami’s
Broadcast Algorithm
• If a site wants to enter the CS and it does not
have the token, it broadcasts a REQUEST
message for the token to all other sites.
• A site which possesses the token sends it to
the requesting site upon the receipt of its
REQUEST message.
• If a site receives a REQUEST message when
it is executing the CS, it sends the token only
after it has completed the execution of the CS
Suzuki-Kasami’s
Broadcast Algorithm
• Suzuki–Kasami algorithm is a token-
based algorithm for achieving mutual
exclusion in distributed systems.
• It is a modification of the Ricart–Agrawala
algorithm, a permission-based (non-
token-based) algorithm which
uses REQUEST and REPLY messages
to ensure mutual exclusion.
Data structure and
Notations:
• An array of integers RN[1…N]:
A site Si keeps RNi[1…N], where RNi[j] is the
largest sequence number received so far
in a REQUEST message from site Sj.
• An array of integers LN[1…N]:
This array is carried by the token. LN[j] is the
sequence number of the request of site Sj that was
most recently executed.
• A queue Q:
This data structure is used by the token to keep
a record of the IDs of sites waiting for the token.
Algorithm:

• To enter Critical section:


• When a site Si wants to enter the critical section and it does not have the
token then it increments its sequence number RNi[i] and sends a request
message REQUEST(i, sn) to all other sites in order to request the token.
Here sn is update value of RNi[i]
• When a site Sj receives the request message REQUEST(i, sn) from site Si, it
sets RNj[i] to maximum of RNj[i] and sn i.e RNj[i] = max(RNj[i], sn).
• After updating RNj[i], Site Sj sends the token to site Si if it has token
and RNj[i] = LN[i] + 1

• To execute the critical section:


• Site Si executes the critical section if it has acquired the token.

• To release the critical section:


After finishing the execution Site Si exits the critical section and does
following:
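The RN/LN/Q bookkeeping can be sketched as follows; the token transport between sites is left out, and all names are illustrative:

class Token:
    def __init__(self, n):
        self.ln = [0] * n                # LN[j]: last satisfied request of Sj
        self.q = []                      # queue of waiting site ids

class SKSite:
    def __init__(self, sid, n):
        self.sid, self.n = sid, n
        self.rn = [0] * n                # RN[j]: highest request seen from Sj
        self.token = None

    def request(self):                   # broadcast REQUEST(sid, sn)
        self.rn[self.sid] += 1
        return self.sid, self.rn[self.sid]

    def on_request(self, j, sn):         # a REQUEST(j, sn) arrives
        self.rn[j] = max(self.rn[j], sn)
        # Send the token iff we hold it (outside the CS) and the request
        # is not outdated: RN[j] = LN[j] + 1.
        if self.token and self.rn[j] == self.token.ln[j] + 1:
            t, self.token = self.token, None
            return t
        return None

    def release(self):                   # executed when leaving the CS
        t = self.token
        t.ln[self.sid] = self.rn[self.sid]            # own request satisfied
        for j in range(self.n):                       # enqueue new waiters
            if j not in t.q and self.rn[j] == t.ln[j] + 1:
                t.q.append(j)
        if t.q:                                       # pass token to head of Q
            nxt, self.token = t.q.pop(0), None
            return nxt, t
        return None

sites = [SKSite(i, 3) for i in range(3)]
sites[0].token = Token(3)                 # S0 holds the idle token initially
j, sn = sites[1].request()                # S1 broadcasts REQUEST(1, 1)
print(sites[0].on_request(j, sn) is not None)   # True: token goes to S1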
Message Complexity
• The algorithm requires no messages
if the site already holds the
idle token at the time of its critical section
request, or a maximum of N messages per
critical section execution. These N
messages involve:
• (N – 1) request messages
• 1 reply (token) message
performance

• Synchronization delay is 0 and no


message is needed if the site holds the
idle token at the time of its request.
• In case the site does not hold the idle
token, the maximum synchronization
delay is equal to the maximum message
transmission time, and a maximum of N
messages are required per critical section
invocation.
Deadlock Detection in
Distributed Systems
• In a distributed system deadlock can neither be
prevented nor avoided as the system is so vast that it is
impossible to do so. Therefore, only deadlock detection
can be implemented. The techniques of deadlock
detection in the distributed system require the following:

• Progress –
The method should be able to detect all the deadlocks in
the system.
• Safety –
The method should not detect false or phantom
deadlocks.
They are as follows:

• Centralized approach –
In the centralized approach, there is only one resource responsible for
detecting deadlock. The advantage of this approach is that it is simple and
easy to implement, while the drawbacks include excessive workload at one
node and single-point failure (the whole system is dependent on one node;
if that node fails, the whole system crashes), which in turn makes the
system less reliable.

• Distributed approach –
In the distributed approach, different nodes work together to detect
deadlocks. There is no single point of failure, as the workload is equally
divided among all nodes. The speed of deadlock detection also increases.

• Hierarchical approach –
This approach is the most advantageous. It is the combination of both the
centralized and distributed approaches to deadlock detection in a
distributed system. In this approach, some selected nodes or clusters of
nodes are responsible for deadlock detection, and these selected nodes
are controlled by a single node.
Issues of Deadlock Detection

• Deadlock detection-based deadlock handling


requires addressing two fundamental issues:
first, detecting existing deadlocks, and second,
resolving detected deadlocks.
• Detecting deadlocks entails tackling two issues:
WFG maintenance and searching the WFG for the
presence of cycles.
• In a distributed system, a cycle may include
multiple sites. The search for cycles is highly
dependent on the system's WFG as represented
across the system.
Resolution of Deadlock Detection

• Various resolutions of deadlock
detection in a distributed system are
as follows:
• Deadlock resolution includes
breaking existing wait-for dependencies
in the system WFG.
• It includes rolling back one or more deadlocked
processes and giving their resources to
the blocked processes in the deadlock
so that they may resume execution.
Deadlock detection algorithms in
Distributed system are as follows

• Path-Pushing Algorithms
• Edge-chasing Algorithms
• Diffusing Computations Based
Algorithms
• Global State Detection Based
Algorithms
• Path-Pushing Algorithms
• Path-pushing algorithms detect distributed deadlocks by
keeping an explicit global WFG. The main concept is to
create a global WFG for each distributed system site.
When a site in this class of algorithms performs a
deadlock computation, it sends its local WFG to all
neighboring sites.
• Edge-Chasing Algorithms
• An edge-chasing method verifies a cycle in a distributed
graph structure by sending special messages called
probes along the graph's edges. These probing messages
are distinct from request and response messages.
Models
of Deadlocks
• A deadlock occurs when a set of
processes is stalled because each
process is holding a resource and
waiting for another process to release
another resource.
• For example, Process 1 holds Resource 1
and waits for Resource 2, while
Process 2 holds Resource 2 and waits
for Resource 1.
System Model :

• For the purposes of deadlock discussion,


a system can be modeled as a collection
of limited resources that can be divided
into different categories and allocated to
a variety of processes, each with
different requirements.
• Memory, printers, CPUs, open files, tape
drives, CD-ROMs, and other resources
are examples of resource categories.
• Some categories may only have one
resource.
• The kernel keeps track of which
resources are free and which are
allocated, to which process they are
allocated, and a queue of processes
waiting for this resource to become
available for all kernel-managed
resources. Mutexes or wait() and
signal() calls can be used to control
application-managed resources.
Methods for Handling
Deadlocks

In general, there are three approaches to dealing with deadlocks, as follows:
• Deadlock prevention or avoidance: do not allow the system to
get into a deadlocked state.
• Deadlock detection and recovery: when deadlocks are detected,
abort the process or preempt some resources.
• Ignore the problem entirely.
• To avoid deadlocks, the system requires more information about all
processes. The system, in particular, must understand what resources a
process will or may request in the future. ( Depending on the algorithm,
this can range from a simple worst-case maximum to a complete resource
request and release plan for each process. )
• Deadlock detection is relatively simple, but deadlock recovery necessitates
either aborting processes or preempting resources, neither of which is an
appealing option.
Recovery From
Deadlock

There are three basic approaches to
recovering from deadlock:
• Inform the system operator and allow
him/her to intervene manually.
• Stop one or more of the processes
involved in the deadlock.
• Preempt resources.
Chandy-Misra-Haas Algorithm
for the AND model.
• Another fully distributed deadlock detection
algorithm is given by Chandy, Misra, and Haas
(1983). This is considered an edge-
chasing, probe-based algorithm. It is also
considered one of the best deadlock detection
algorithms for distributed systems.
• If a process makes a request for a resource
which fails or times out, the process generates
a probe message and sends it to each of the
processes holding one or more of its requested
resources.
• This algorithm uses a special message called a probe,
which is a triplet (i, j, k), denoting that it belongs to a
deadlock detection initiated for process Pi and it is being
sent by the home site of process Pj to the home site of
process Pk.
• Each probe message contains the following information:
• the id of the process that is blocked (the one that
initiates the probe message);
• the id of the process sending this particular version
of the probe message;
• the id of the process that should receive this probe
message.
• A probe message travels along the edges of the
global WFG graph, and a deadlock is detected
when a probe message returns to the process
that initiated it.
• A process Pj is said to be dependent on another
process Pk if there exists a sequence of
processes Pj, Pi1, Pi2, …, Pim, Pk such that
each process except Pk in the sequence is
blocked, and each process, except Pj, holds a
resource for which the previous process in the
sequence is waiting.
• When a process receives a probe message, it
checks to see if it is also waiting for resources. If
not, it is currently using the needed resource
and will eventually finish and release the resource.
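A toy sketch of edge chasing over an explicit WFG: probes (i, j, k) follow wait-for edges, and deadlock is declared when a probe returns to its initiator; the graph below is an illustrative example, not from the slides:

wfg = {1: [2], 2: [3], 3: [1]}            # wfg[k]: processes Pk waits for
blocked = {1, 2, 3}

def detect(initiator):
    probes = [(initiator, initiator, k) for k in wfg[initiator]]
    seen = set()
    while probes:
        i, j, k = probes.pop()
        if k == i:
            return True                    # probe returned: deadlock
        if k in blocked and (j, k) not in seen:
            seen.add((j, k))               # forward the probe along edges
            probes.extend((i, k, m) for m in wfg.get(k, []))
    return False

print(detect(1))                           # True: 1 -> 2 -> 3 -> 1 is a cycle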
The advantages of this algorithm
include the following:

• It is easy to implement.
• Each probe message is of fixed length.
• There is very little computation.
• There is very little overhead.
• There is no need to construct a graph, nor to
pass graph information to other sites.
• This algorithm does not find false (phantom)
deadlock.
• There is no need for special data structures.
CHANDY–MISRA–HAAS
ALGORITHM FOR THE OR MODEL

• A blocked process determines if it is
deadlocked by initiating a diffusion
computation.
• Two types of messages are used in a
diffusion computation:
• query(i, j, k)
• reply(i, j, k)
CHANDY–MISRA–HAAS
ALGORITHM FOR THE OR MODEL

• A blocked process initiates deadlock detection by
sending query messages to all processes in its
dependent set.
• If an active process receives a query or reply message,
it discards it.
• When a blocked process Pk receives a query(i, j, k)
message, it takes the following actions:
• 1. If this is the first query message received by Pk for
the deadlock detection initiated by Pi, then it
propagates the query to all the processes in its
dependent set and sets a local variable numk(i) to the
number of query messages sent.
UNIT IV CONSENSUS
AND RECOVERY
• Consensus and Agreement Algorithms: Problem
Definition – Overview of Results – Agreement in a
Failure-Free System (Synchronous and
Asynchronous) – Agreement in Synchronous
Systems with Failures; Checkpointing and Rollback
Recovery: Introduction – Background and Definitions –
Issues in Failure Recovery – Checkpoint-Based
Recovery – Coordinated Checkpointing
Algorithm – Algorithm for Asynchronous Checkpointing
and Recovery.
Consensus and
Agreement Algorithms
• System assumptions:
• Failure models
• Synchronous/asynchronous communication
• Network connectivity
• Sender identification
• Channel reliability
• Authenticated vs. non-authenticated messages
• Agreement variable
• Up to f (< n) crash failures possible.
• In f + 1 rounds, at least one round has no failures.
• Justify: the agreement, validity, and termination conditions are
satisfied.
• Complexity: O((f + 1)n²) messages; f + 1 is a lower bound on the
number of rounds.
• (global constants) integer: f; // maximum number of crash failures
tolerated
• (local variables) integer: x ← local value;
• (1) Process Pi (1 ≤ i ≤ n) executes the consensus algorithm for up to f
crash failures:
• (1a) for round from 1 to f + 1 do
• (1b) if the current value of x has not been broadcast then
• (1c) broadcast(x);
• (1d) yj ← value (if any) received from process j in this round;
• (1e) x ← min(x, yj);
• (1f) output x as the consensus value.
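A toy simulation of the algorithm above, assuming crashes happen cleanly between rounds (real crash-mid-broadcast behavior is not modeled); all names are illustrative:

def consensus(values, crash_round):        # crash_round[p]: round p dies, or None
    n = len(values)
    f = sum(r is not None for r in crash_round)
    x, last_bcast = list(values), [None] * n
    for rnd in range(1, f + 2):            # rounds 1 .. f + 1
        received = []
        for p in range(n):
            alive = crash_round[p] is None or crash_round[p] > rnd
            if alive and x[p] != last_bcast[p]:      # (1b): not yet broadcast
                received.append(x[p])                # (1c): broadcast(x)
                last_bcast[p] = x[p]
        for p in range(n):
            alive = crash_round[p] is None or crash_round[p] > rnd
            if alive and received:                   # (1d)-(1e): take the min
                x[p] = min([x[p]] + received)
    return [x[p] for p in range(n) if crash_round[p] is None]   # (1f)

print(consensus([5, 3, 7], [None, 2, None]))   # survivors both decide 3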
• Byzantine agreement (single source has an initial value):
• Agreement: All non-faulty processes must agree on the same value.
• Validity: If the source process is non-faulty, then the agreed-upon value
by all the non-faulty processes must be the same as the initial value of
the source.
• Termination: Each non-faulty process must eventually decide on a value.
• Consensus problem (all processes have an initial value):
• Agreement: All non-faulty processes must agree on the same (single)
value.
• Validity: If all the non-faulty processes have the same initial value, then
the agreed-upon value by all the non-faulty processes must be that same
value.
• Termination: Each non-faulty process must eventually decide on a value.
• Interactive consistency (all processes have an initial value):
• Agreement: All non-faulty processes must agree on the same array of
values A[v1 … vn].
• Validity: If process i is non-faulty and its initial value is vi, then all
non-faulty processes agree on vi as the ith element of the array A. If
process j is faulty, then the non-faulty processes can agree on any value
for A[j].
• Termination: Each non-faulty process must eventually decide on the
array A.
• These problems are equivalent to one another! Show using reductions.
Agreement in a
Failure-Free System (Synchronous and Asynchronous)
• In a failure-free system, consensus can be reached by collecting
information from the different processes, arriving at a decision,
and distributing this decision in the system.
• In a synchronous system, this can be done simply in a constant
number of rounds.
• Further, common knowledge of the decision value can be obtained
using an additional round.
• In an asynchronous system, consensus can similarly be reached in
a constant number of message hops.
• Further, concurrent common knowledge of the consensus value
can also be attained.
Consensus Algorithm for Crash Failures
(Synchronous System)
• Consensus algorithm for crash failures in a
message-passing synchronous system.
• The consensus algorithm handles n
processes, where up to f processes,
f < n, may fail in the fail-stop failure
model.
Agreement in Synchronous
Systems with Failures
• Of all the values received within the
round and its own value xi at the start of
the round, the process takes the minimum
and updates xi. After f + 1 rounds, the
local value xi is guaranteed to be the
consensus value.
• If one process among three processes is
faulty, then f = 1, so the agreement
requires f + 1, that is, two rounds.
• The agreement condition is satisfied because in the f + 1 rounds,
there must be at least one round in which no process failed.

• In this round, say round r, all the processes that have not failed so
far succeed in broadcasting their values, and all these processes
take the minimum of the values broadcast and received in that
round.
• Thus, the local values at the end of the round are the same, say
xi, for all non-failed processes.

• In further rounds, only this value may be sent by each process at
most once, and no process i will update its value xi.
Checkpointing and
Rollback Recovery
• Rollback recovery protocols:
• restore the system back to a consistent state after a failure;
• achieve fault tolerance by periodically saving the state of a
process during failure-free execution;
• treat a distributed system application as a collection of
processes that communicate over a network.
• Checkpoints: the saved states of a process.
• Why is rollback recovery of distributed systems
complicated? Messages induce inter-process
dependencies during failure-free operation.
• Rollback propagation: the dependencies may force some
of the processes that did not fail to roll back.
This phenomenon is called the “domino effect.”
Background and Definitions
• A global state of a distributed system is a collection of the individual states of all participating processes and the states of the communication channels.
• A consistent global state is a global state that may occur during a failure-free execution of a distributed computation: if a process's state reflects a message receipt, then the state of the corresponding sender must reflect the sending of that message.
• A global checkpoint is a set of local checkpoints, one from each process.
• A consistent global checkpoint is a global checkpoint such that no message is sent by a process after taking its local checkpoint that is received by another process before taking its checkpoint.
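The consistency condition above can be checked mechanically. Below is a small sketch (the event representation is an assumption for illustration) that tests whether a global checkpoint contains an orphan message:

```python
def is_consistent(checkpoints, messages):
    """checkpoints: dict process -> index of its last event included
                    in its local checkpoint
    messages:      list of (sender, send_event, receiver, recv_event)

    A global checkpoint is inconsistent if some message was received
    before the receiver's checkpoint but sent after the sender's
    checkpoint (an orphan message)."""
    for sender, send_ev, receiver, recv_ev in messages:
        sent_after = send_ev > checkpoints[sender]
        received_before = recv_ev <= checkpoints[receiver]
        if sent_after and received_before:
            return False    # orphan message crosses the recovery line
    return True

# P's 3rd event sends a message that Q receives as its 2nd event.
msgs = [("P", 3, "Q", 2)]
print(is_consistent({"P": 2, "Q": 2}, msgs))  # False: orphan message
print(is_consistent({"P": 3, "Q": 2}, msgs))  # True: consistent
```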
Issues in Failure Recovery
• The rollback of process 𝑃𝑖 to checkpoint 𝐶𝑖,1 created an orphan message H.
• Orphan message I is created due to the rollback of process 𝑃𝑗 to checkpoint 𝐶𝑗,1.
• Messages C, D, E, and F are potentially problematic:
• Message C: a delayed message.
• Message D: a lost message, since the send event for D is recorded in the restored state for 𝑃𝑗, but the receive event has been undone at process 𝑃𝑖. Lost messages can be handled by having processes keep a message log of all the sent messages.
• Messages E, F: delayed orphan messages. After resuming execution from their checkpoints, the processes will generate both of these messages.
Uncoordinated Checkpointing
• Each process has autonomy in deciding when to take checkpoints.
• Advantage: lower runtime overhead during normal execution.
• Disadvantages:
• Domino effect during a recovery.
• Recovery from a failure is slow because processes need to iterate to find a consistent set of checkpoints.
• Each process maintains multiple checkpoints and must periodically invoke a garbage collection algorithm.
• Not suitable for applications with frequent output commits.
• The processes record the dependencies among their checkpoints caused by message exchanges during failure-free operation.
• Blocking Checkpointing: after a process takes a local checkpoint, to prevent orphan messages it remains blocked until the entire checkpointing activity is complete. Disadvantage: the computation is blocked during checkpointing.
• Non-blocking Checkpointing: the processes need not stop their execution while taking checkpoints.
• A fundamental problem in coordinated checkpointing is to prevent a process from receiving application messages that could make the checkpoint inconsistent.
Coordinated Checkpointing Algorithm
• Coordinated checkpointing simplifies failure recovery and eliminates domino effects in case of failures by preserving a consistent global checkpoint on stable storage.
• However, the approach suffers from the high overhead associated with the checkpointing process.
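A minimal sketch of the blocking variant described earlier (class and method names are assumptions for illustration): a coordinator asks every process to checkpoint and unblocks them only once all local checkpoints are taken, so no orphan messages can arise.

```python
import threading

class Participant:
    def __init__(self, name):
        self.name = name
        self.resume = threading.Event()   # blocked until the coordinator commits

    def take_checkpoint(self):
        # Save local state to stable storage; send no application
        # messages until `resume` is set by the coordinator.
        print(f"{self.name}: tentative checkpoint written")

    def run_after_commit(self):
        self.resume.wait()
        print(f"{self.name}: resuming normal execution")

def coordinated_checkpoint(participants):
    # Phase 1: every participant checkpoints and stays blocked.
    for p in participants:
        p.take_checkpoint()
    # Phase 2: the set of local checkpoints now forms a consistent
    # global checkpoint; commit it and unblock everyone.
    for p in participants:
        p.resume.set()

procs = [Participant(f"P{i}") for i in range(3)]
threads = [threading.Thread(target=p.run_after_commit) for p in procs]
for t in threads:
    t.start()
coordinated_checkpoint(procs)
for t in threads:
    t.join()
```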
Checkpoint Algorithm
• The record-based global checkpoint algorithm commits transactions at a specified number of iterations of the processJobStep method of a batch step.
• Each call to the processJobStep method is treated as iterating through one record.
Algorithm for Asynchronous Checkpointing and Recovery
• Asynchronous state
checkpointing attempts to perform the
checkpointing asynchronously so that
the micro-batch execution doesn't have
to wait for the checkpoint to complete.
• In other words, the next micro-batch
can start as soon as the computation of
the previous micro-batch has been
completed.
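A toy illustration of this overlap (function names and timings are assumptions): the checkpoint of batch n is written by a background thread while batch n + 1 starts computing.

```python
import threading, time

def compute_micro_batch(n):
    time.sleep(0.1)                       # stand-in for the real computation
    return f"state-after-batch-{n}"

def write_checkpoint(state):
    time.sleep(0.3)                       # slow write to stable storage
    print(f"checkpoint persisted: {state}")

pending = None
for n in range(3):
    state = compute_micro_batch(n)
    if pending:
        pending.join()                    # keep at most one checkpoint in flight
    pending = threading.Thread(target=write_checkpoint, args=(state,))
    pending.start()                       # next micro-batch starts immediately
    print(f"batch {n} computed; its checkpoint runs in the background")
pending.join()
```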
• If each process takes its checkpoints independently, then the system cannot avoid the domino effect; this scheme is called independent or uncoordinated checkpointing.
• Techniques that avoid the domino effect:
• Coordinated checkpointing rollback recovery: processes coordinate their checkpoints to form a system-wide consistent state.
• Communication-induced checkpointing rollback recovery: forces each process to take checkpoints based on information piggybacked on the application messages.
• Log-based rollback recovery: combines checkpointing with logging of non-deterministic events; relies on the piecewise deterministic (PWD) assumption.
UNIT V CLOUD
COMPUTING
Definition of Cloud Computing – Characteristics of Cloud – Cloud Deployment Models – Cloud Service Models – Driving Factors and Challenges of Cloud – Virtualization – Load Balancing – Scalability and Elasticity – Replication – Monitoring – Cloud Services and Platforms: Compute Services – Storage Services – Application Services
Definition of Cloud
Computing
• Cloud computing is a virtualization-based
technology that allows us to create, configure, and
customize applications via an internet connection.
• The cloud technology includes a development
platform, hard disk, software application, and
database.
• The following operations can be performed using cloud computing:
• Developing new applications and services
• Storage, back up, and recovery of data
• Hosting blogs and websites
• Delivery of software on demand
• Analysis of data
• Streaming video and audio
Characteristics of Cloud
• 1) Agility
• The cloud works in a distributed computing environment. It
shares resources among users and works very fast.
• 2) High availability and reliability
• The availability of servers is high and more reliable because
the chances of infrastructure failure are minimum.
• 3) High Scalability
• Cloud offers "on-demand" provisioning of resources on a large scale, without the need to engineer capacity for peak loads.
• 4) Multi-Sharing
• With the help of cloud computing, multiple users and
applications can work more efficiently with cost reductions by
sharing common infrastructure.
• 5) Device and Location Independence
• Cloud computing enables the users to access systems using a web browser
regardless of their location or what device they use e.g. PC, mobile phone, etc. As
infrastructure is off-site (typically provided by a third-party) and accessed via
the Internet, users can connect from anywhere.
• 6) Maintenance
• Maintenance of cloud computing applications is easier, since they do not need to
be installed on each user's computer and can be accessed from different
places. So, it reduces the cost also.
• 7) Low Cost
• By using cloud computing, costs are reduced because an IT company need not set up its own infrastructure; it pays only as per its usage of resources.
• 8) Services in the pay-per-use mode
• Application Programming Interfaces (APIs) are provided to the users so that
they can access services on the cloud by using these APIs and pay the charges
as per the usage of services.
Cloud Deployment
Models
• Today, organizations have many exciting
opportunities to reimagine, repurpose
and reinvent their businesses with the
cloud.
• The last decade has seen even more
businesses rely on it for quicker time to
market, better efficiency, and scalability.
• It helps them achieve long-term digital goals as part of their digital strategy.
• The common deployment models are the public cloud, the private cloud, the hybrid cloud, and the community cloud.
Benefits of Public Cloud
• Minimal Investment - As a pay-per-use service, there is no large upfront cost, and it is ideal for businesses that need quick access to resources.
• No Hardware Setup - The cloud service providers fully fund the entire Infrastructure.
• No Infrastructure Management - Utilizing the public cloud does not require an in-house team.
• Benefits of Private Cloud
• Data Privacy - It is ideal for storing corporate data where only authorized personnel get access.
• Security - Segmentation of resources within the
same Infrastructure can help with better access
and higher levels of security.
• Supports Legacy Systems - This model supports
legacy systems that cannot access the public
cloud.
Cloud Service Models
• There are the following three types of cloud service models -
• Infrastructure as a Service (IaaS)
• Platform as a Service (PaaS)
• Software as a Service (SaaS)
Infrastructure as a Service (IaaS)
• IaaS is also known as Hardware as a Service (HaaS). It is a computing infrastructure managed over the internet. The main advantage of using IaaS is that it helps users avoid the cost and complexity of purchasing and managing physical servers.
• Characteristics of IaaS -
• Resources are available as a service
• Services are highly scalable
• Dynamic and flexible
• GUI and API-based access
• Automated administrative tasks
Platform as a Service (PaaS)
• The PaaS cloud computing platform is created for the programmer to develop, test, run, and manage applications.
• Characteristics of PaaS -
• Accessible to various users via the same development application.
• Integrates with web services and databases.
• Builds on virtualization technology, so resources can easily be scaled up or down as per the organization's need.
• Supports multiple languages and frameworks.
• Provides the ability to "auto-scale".
• Example: AWS Elastic Beanstalk, Windows Azure, Heroku, Force.com, Google App Engine, Apache Stratos, Magento Commerce Cloud, and OpenShift.
Software as a Service (SaaS)
• SaaS is also known as "on-demand software". It is software in which the applications are hosted by a cloud service provider. Users can access these applications with the help of an internet connection and a web browser.
• Characteristics of SaaS -
• Managed from a central location
• Hosted on a remote server
• Accessible over the internet
• Users are not responsible for hardware and software updates; updates are applied automatically.
• The services are purchased on a pay-as-per-use basis
Difference between IaaS, PaaS, and SaaS

| IaaS | PaaS | SaaS |
| It provides a virtual data center to store information and create platforms for app development, testing, and deployment. | It provides virtual platforms and tools to create, test, and deploy apps. | It provides web software and apps to complete business tasks. |
| It provides access to resources such as virtual machines, virtual storage, etc. | It provides runtime environments and deployment tools for applications. | It provides software as a service to the end-users. |
| It is used by network architects. | It is used by developers. | It is used by end users. |
| IaaS provides only Infrastructure. | PaaS provides Infrastructure + Platform. | SaaS provides Infrastructure + Platform + Software. |
Driving Factors and Challenges of Cloud
• 5 Factors Driving Cloud Adoption
• Cloud Ecosystem Maturity - Traditionally, organizations have developed a business strategy and then determined solutions to help achieve specific outcomes associated with business objectives.
• Resilience
• Security
• Data and Analytics
Virtualization
• Virtualization is the "creation of a virtual (rather
than actual) version of something, such as a server,
a desktop, a storage device, an operating system or
network resources".
• In other words, virtualization is a technique which allows us to share a single physical instance of a resource or an application among multiple customers and organizations.
• It does so by assigning a logical name to physical storage and providing a pointer to that physical resource when demanded.
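A toy illustration of this logical-name indirection (all names are hypothetical): consumers dereference only the logical name, so the physical backing can change transparently.

```python
# Minimal sketch: a virtualization layer maps logical names to
# physical resources and resolves them on demand.
physical_stores = {
    "disk-A": "/dev/sda1 on host-1",
    "disk-B": "/dev/sdb2 on host-2",
}
logical_to_physical = {"vol-users": "disk-A"}

def read(logical_name):
    backing = logical_to_physical[logical_name]   # resolved at access time
    return f"reading from {physical_stores[backing]}"

print(read("vol-users"))                     # reading from /dev/sda1 on host-1
logical_to_physical["vol-users"] = "disk-B"  # migrate the backing store
print(read("vol-users"))                     # reading from /dev/sdb2 on host-2
```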
[Diagram: applications run on virtual hardware (e.g., Windows and Unix guests) layered above the host operating system and the physical hardware - CPU, memory, and secondary storage.]
Types of Virtualization:
• The machine on which the virtual machine is created is known as the Host Machine, and that virtual machine is referred to as the Guest Machine.
• Hardware Virtualization
• Operating System Virtualization
• Server Virtualization
• Storage Virtualization
Hardware Virtualization
• When the virtual machine software or virtual machine manager (VMM) is directly installed on the hardware system, it is known as hardware virtualization.
• The main job of the hypervisor is to control and monitor the processor, memory, and other hardware resources.
• After virtualization of the hardware system, we can install different operating systems on it and run different applications on those operating systems.
• Usage:
• Hardware virtualization is mainly done for server platforms, because controlling virtual machines is much easier than controlling a physical server.
Operating System Virtualization
• When the virtual machine software or virtual machine manager (VMM) is installed on the host operating system instead of directly on the hardware system, it is known as operating system virtualization.
• Usage:
• Operating system virtualization is mainly used for testing applications on different OS platforms.
Server Virtualization:
• When the virtual machine software or virtual machine manager (VMM) is directly installed on the server system, it is known as server virtualization.
• Usage:
• Server virtualization is done because a single physical server can be divided into multiple servers on demand and for load balancing.
Storage Virtualization
• Storage virtualization is the process of grouping physical storage from multiple network storage devices so that it looks like a single storage device.
• Storage virtualization is also implemented by using software applications.
• Usage:
• Storage virtualization is mainly done for back-up and recovery purposes.
Load Balancing
• Load balancing is the method of distributing network traffic equally across a pool of resources that support an application.
• Modern applications must process millions of users simultaneously and return the correct text, videos, images, and other data to each user in a fast and reliable manner.
• A load-balancing method is what helps the load balancer decide which server will receive an incoming request. Common methods include the following (a minimal sketch of two of them appears after this list):
• Round Robin
• Weighted Round Robin
• Least Connections
• Weighted Least Connections
• Source IP Hash
• Least Response Time
• Least Pending Requests
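The sketch below illustrates the Round Robin and Least Connections methods (the server addresses and connection counters are hypothetical):

```python
import itertools

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round Robin: hand requests to servers in a fixed rotating order.
rotation = itertools.cycle(servers)
def round_robin():
    return next(rotation)

# Least Connections: pick the server with the fewest active connections.
active = {s: 0 for s in servers}
def least_connections():
    server = min(active, key=active.get)
    active[server] += 1            # the new request is now in flight
    return server

for _ in range(4):
    print("RR ->", round_robin(), "| LC ->", least_connections())
```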
Scalability and Elasticity
• Scalability is the ability to add, remove, or reconfigure hardware and software resources to handle an increase or decrease in usage.
• Elasticity is automatically scaling resources up or down to meet user demands.
• The key difference between scalability and elasticity is the level of automation.
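A toy threshold-based autoscaler (the thresholds and instance limits are assumptions) showing elasticity as an automated scaling decision:

```python
def autoscale(instances, cpu_utilization, lo=0.3, hi=0.7,
              min_instances=1, max_instances=10):
    """Return the new instance count for the observed average CPU load."""
    if cpu_utilization > hi and instances < max_instances:
        return instances + 1     # scale out under heavy load
    if cpu_utilization < lo and instances > min_instances:
        return instances - 1     # scale in when idle to save cost
    return instances             # within the comfort band: do nothing

n = 2
for load in [0.85, 0.9, 0.6, 0.2, 0.1]:
    n = autoscale(n, load)
    print(f"load={load:.0%} -> {n} instance(s)")
```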
Replication
• Cloud Replication refers to the process of replicating data
from on-premises storage to the cloud, or from one cloud
instance to another.
• Traditional data replication involves replicating data across
different physical servers on the company's local network.
• Replicated data across multiple clusters can support high-
availability storage and active cluster failover so that
systems are never unavailable and are always up-to-date.
• Accuracy: cloud replication can ensure that you always have accessible data that is at most a very short interval behind the source.
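A minimal primary/replica sketch (in-memory stand-ins for real storage endpoints): every write to the primary is forwarded to each replica, so any copy can serve an up-to-date read.

```python
class Store:
    def __init__(self):
        self.data = {}
    def put(self, k, v):
        self.data[k] = v
    def get(self, k):
        return self.data.get(k)

class ReplicatedStore(Store):
    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas
    def put(self, k, v):
        super().put(k, v)          # write locally first
        for r in self.replicas:    # then synchronously replicate
            r.put(k, v)

replicas = [Store(), Store()]
primary = ReplicatedStore(replicas)
primary.put("order:42", "shipped")
print(replicas[0].get("order:42"))  # "shipped": replicas can serve reads
```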
Monitoring
• Cloud monitoring is the process of evaluating the
health of cloud-based IT infrastructures.
• Using cloud-monitoring tools, organizations can
proactively monitor the availability, performance, and
security of their cloud environments to find and fix
problems before they impact the end-user experience.
• In other words, cloud monitoring is the practice of measuring, evaluating, and managing workloads inside cloud tenancies against specific metrics and thresholds.
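A minimal sketch of that threshold-based monitoring idea (the metric names and threshold values are hypothetical):

```python
THRESHOLDS = {"cpu": 0.80, "latency_ms": 250, "error_rate": 0.01}

def check(metrics):
    """Compare sampled metrics against thresholds; return any alerts."""
    return [f"ALERT: {name}={value} exceeds {THRESHOLDS[name]}"
            for name, value in metrics.items()
            if name in THRESHOLDS and value > THRESHOLDS[name]]

sample = {"cpu": 0.91, "latency_ms": 120, "error_rate": 0.02}
for alert in check(sample):
    print(alert)
```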
Cloud Services and
Platforms:
• Cloud service providers provide various
applications in the field of art, business,
data storage and backup services,
education, entertainment, management,
social networking, etc.
• The most widely used cloud computing
applications are given below -
Compute Services
• Cloud applications are software that users access primarily through the internet, meaning that at least part of the application is managed by a server rather than by the users' local machines.
• Application Services (often used interchangeably with application management services or application services management) are a pool of services, such as load balancing, application performance monitoring, application acceleration, autoscaling, micro-segmentation, service proxy, and service discovery, needed to deploy applications optimally.
• Application services are software solutions that improve the speed, security, and operability of applications.
Storage Services
• Cloud storage is a mode of computer data storage in which digital data is stored on servers in off-site locations. The servers are maintained by a third-party provider who is responsible for hosting, managing, and securing data stored on its infrastructure.
• There are three main cloud storage types: object storage, file storage, and block storage. Each offers its own advantages and has its own use cases.
What are the examples of storage services?
• Top 10 cloud storage services of 2020:
• DropBox
• iCloud
• Google Drive
• Microsoft OneDrive
• IDrive
• Mega
• Box
• pCloud
Application Services
• Art Applications
• Cloud computing offers various art applications for quickly and easily designing attractive cards, booklets, and images. Some of the most commonly used cloud art applications are given below:
• i. Moo
• Moo is one of the best cloud art applications. It is used for designing and printing business cards, postcards, and mini cards.
• ii. Vistaprint
• Vistaprint allows us to easily design various printed marketing products such as business cards, postcards, booklets, and wedding invitation cards.
• iii. Adobe Creative Cloud
• Adobe Creative Cloud is made for designers, artists, filmmakers, and other creative professionals. It is a suite of apps which includes the Photoshop image-editing program, Illustrator, InDesign, Typekit, Dreamweaver, XD, and Audition.
Business Applications

• Business applications are based on cloud service providers.


Today, every organization requires the cloud business
application to grow their business. It also ensures that
business applications are 24*7 available to users.
• i. MailChimp
• ii. Salesforce
• iii. Chatter
• iv. Bitrix24
• v. PayPal
• vi. Slack
• vii. QuickBooks
Data Storage and Backup Applications
• Cloud computing allows us to store information (data, files, images, audio, and video) on the cloud and access this information using an internet connection. As the cloud provider is responsible for providing security, it also offers various backup and recovery applications for retrieving lost data.
• i. Box.com
• ii. Mozy
• iii. Joukuu
• iv. Google G Suite
Benefits of Application Management Services
• Innovation
• Increased performance
• Business continuity
• Platform stability
• Better end-user experience
• Higher productivity
• Reduced need to rely on expensive, external experts