DC UNIT 4 - Summary Distributed computing

Distributed computing (Anna University)

UNIT IV

Consensus and Recovery

Syllabus
Consensus and Agreement Algorithms : Problem Definition - Overview of Results - Agreement in a Failure-Free System (Synchronous or Asynchronous) - Agreement in Synchronous Systems with Failures; Checkpointing and Rollback Recovery : Introduction - Background and Definitions - Issues in Failure Recovery - Checkpoint-based Recovery - Coordinated Checkpointing Algorithm - Algorithm for Asynchronous Checkpointing and Recovery.

Contents
4.1  Introduction
4.2  Agreement/Consensus Problem : Definition ............ AU : May-22, Dec.-22, Marks 13
4.3  Overview of Results
4.4  Lamport - Shostak - Pease Algorithm
4.5  Agreement in a Failure-Free System (Synchronous or Asynchronous) ............ AU : May-22
4.6  Agreement in Synchronous Systems with Failures
4.7  Checkpointing and Rollback Recovery : Introduction ............ AU : May-22, Dec.-22, Marks 13
4.8  Background and Definitions
4.9  Consistent Set of Checkpoints and Checkpointing Algorithms
4.10 Issues in Failure Recovery ............ AU : May-22, Dec.-22
4.11 Checkpoint-based Recovery ............ AU : May-22, Dec.-22, Marks 13
4.12 Coordinated Checkpointing Algorithm
4.13 Algorithm for Asynchronous Checkpointing and Recovery
4.14 Two Marks Questions with Answers

4.1 Introduction

- In a distributed system, processes often compete as well as cooperate to achieve a common goal. The processes (sites) must therefore be able to establish mutual trust/agreement - for example, to commit a distributed transaction or to agree on the values of data exchanged between processes.
- In the absence of failures, reaching a common decision is easy. Agreement protocols are needed to reach agreement in the presence of failures, where a processor may fail, misbehave intentionally, or relay wrong values to others.

4.2 Agreement/Consensus Problem : Definition                    (AU : May-22, Dec.-22, Marks 13)

- Examples of agreement problems : a set of distributed data managers must agree on "whether a transaction must be committed or aborted"; processes must agree on a common clock value; replicated databases must agree on the updates to be applied.
- To reach a decision/agreement, each processor exchanges its value with the others; the processors may need several rounds of message exchange, relaying the values received from others, before a common decision can be made. A faulty processor may relay different (wrong) values to different processors.

Byzantine Agreement Problem
- "Several divisions of the Byzantine army are camped outside an enemy city. Each division is commanded by its own general. The generals must decide upon a common plan of action : whether to attack the enemy or to retreat. Some of the generals may be treacherous (traitors), trying to prevent the loyal generals from reaching agreement."
- If the commanding general is treacherous, he proposes attacking to one lieutenant and retreating to another. If a lieutenant is treacherous, he tells one peer that the commander told him to attack and tells another that the commander told him to retreat.
- After exchanging messages in several rounds and taking a majority vote over the values received, the loyal generals must agree on a common plan of action.

4.2.1 Byzantine Agreement Problem
- A single source processor broadcasts its initial value to all other processors. A solution must meet the following objectives :
i. Agreement : All non-faulty processors agree on the same value.
ii. Validity : If the source processor is non-faulty, then the common agreed upon value of all non-faulty processors must be the initial value supplied by the source. If the source processor is faulty, then the agreed upon value is irrelevant (any common value will do).
- Fig. 4.2.1 shows Byzantine agreement among three processes with a single traitor : in one scenario General 1 (the source) orders "Attack" to General 2 and "Retreat" to General 3; in the other, a treacherous lieutenant relays conflicting orders. A non-faulty process cannot distinguish the two scenarios, so with three processes and one traitor no agreement can be reached.
- In general, agreement can be achieved with m faulty processors only if the total number of processors is at least 3m + 1, i.e., only if more than two-thirds of the processors (at least 2m + 1) are non-faulty.

4.2.2 Consensus Problem
- Every processor broadcasts its own initial value to all other processors. A solution must meet the following conditions :
i. Agreement : All non-faulty processors agree on the same single value.
ii. Validity : If the initial value of every non-faulty processor is v, then the agreed upon common value of all non-faulty processors must be v. If the initial values of the non-faulty processors are different, then the agreed upon value is irrelevant.

Interactive Consistency Problem
- Every processor broadcasts its own initial value, and all non-faulty processors must agree on a common set (vector) of values :
i. Agreement : All non-faulty processors agree on the same vector of values.
ii. Validity : If a processor is non-faulty and its initial value is v, then the value agreed upon for that processor by all non-faulty processors must be v.


4.3 Overview of Results

- The Byzantine agreement problem was first defined and solved by Lamport. The solvability results for the agreement problems are summarised below, where n is the total number of processes and f is the number of processes that may fail.

Sr. No. | Failure mode       | Synchronous system                              | Asynchronous system
1       | No failure         | Agreement attainable                            | Agreement attainable
2       | Crash failure      | Agreement attainable; f < n processes           | Agreement not attainable
3       | Byzantine failure  | Agreement attainable; f <= trunc[(n - 1)/3]     | Agreement not attainable

- In an asynchronous system, the consensus problem cannot be solved even if only one process can fail, and only by crashing.
- For Byzantine agreement, the number of faulty processors cannot exceed trunc[(n - 1)/3]. This bound can be relaxed by using authenticated messages.
- In the Byzantine agreement problem only the source processor has an initial value; in the consensus problem every processor has its own initial value and all non-faulty processors must agree on a single common value; in the interactive consistency problem every processor has its own initial value and all non-faulty processors must agree on a set of common values. A solution to any one of these problems can be used to solve the other two.

4.4 Lamport - Shostak - Pease Algorithm

4.4.1 Impossible Scenario
- Consider a system with three processors p0, p1 and p2, in which p0 is the source and at most one processor is faulty. Two possibilities arise :
Case 1 : p0 (the source) is non-faulty and p2 is faulty. p0 sends the value 1 to p1 and p2; the faulty p2 relays 0 to p1.
Case 2 : p0 (the source) is faulty. p0 sends 1 to p1 and 0 to p2; the non-faulty p2 correctly relays 0 to p1.
- In both cases p1 receives 1 from the source and 0 from p2, so p1 cannot tell which processor is faulty and cannot decide which value to agree upon. Agreement among three processors with one faulty processor is therefore not possible (Fig. 4.4.1); in general, n >= 3m + 1 processors are needed to tolerate m faulty processors.

4.4.2 Algorithm
- The Lamport - Shostak - Pease algorithm is also known as the Oral Message algorithm OM(m), where m is the number of faulty processors tolerated. It is defined recursively as follows.
Algorithm OM(0) :
1. The source broadcasts its value to every other processor.
2. Each processor uses the value it receives from the source; if no value is received, a default value 0 is used.
Algorithm OM(m), m > 0 :
1. The source broadcasts its value to every other processor.
2. For each i, let vi be the value processor i receives from the source (or the default value if none is received). Processor i acts as the new source and recursively initiates OM(m - 1), sending vi to the remaining n - 2 processors.
3. For each i and each j != i, let vj be the value processor i received from processor j in step 2 (or the default value if none is received). Processor i uses the value majority(v1, v2, ..., vn-1).
- Time complexity : OM(m) uses m + 1 rounds of message exchange. Message complexity : the number of messages grows exponentially with m (O(n^m)); the message complexity can be reduced to polynomial at the cost of increasing the number of rounds.

Example 1 : Lamport's algorithm with a non-faulty source
- System with four processors p0, p1, p2 and p3; p0 is the source; the initial value of p0 is 1; only p2 is faulty.
- Step 1 : p0 initiates OM(1) (algorithm with m = 1) and sends its value 1 to p1, p2 and p3.
- Step 2 : p1, p2 and p3 execute OM(0). p1 sends 1 to p2 and p3, and p3 sends 1 to p1 and p2; the faulty p2 sends 0 to p1 and p3.
- Step 3 : The majority function at p1 gives majority(1, 1, 0) = 1 and at p3 gives majority(1, 1, 0) = 1, which is the desired result (Fig. 4.4.2).

Example 2 : Lamport's algorithm with a faulty source
- System with four processors p0, p1, p2 and p3; p0 is the source and is the only faulty processor.
- Step 1 : p0 initiates OM(1) and sends 1 to p1, 0 to p2 and 1 to p3.
- Step 2 : p1, p2 and p3 execute OM(0). p1 sends 1 to p2 and p3; p2 sends 0 to p1 and p3; p3 sends 1 to p1 and p2.
- Step 3 : The majority function at p1, p2 and p3 gives majority(1, 0, 1) = 1 at every processor. All non-faulty processors agree on the same value, which is still the desired result - with a faulty source we are not bothered about which value is chosen, only that it is common (Fig. 4.4.3).
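The recursion in OM(m) and the two examples above can be condensed into a short simulation. This is an illustrative sketch, not part of the original notes: the `send` helper, the "alternating-lie" behaviour of faulty processors, and the process names are assumptions made so the example is runnable.

```python
from collections import Counter

DEFAULT = 0  # value adopted when no message (or no strict majority) exists

def majority(values):
    value, freq = Counter(values).most_common(1)[0]
    return value if freq * 2 > len(values) else DEFAULT

def send(sender, value, seq, faulty):
    # A non-faulty sender transmits its value; a faulty one lies by
    # alternating 0 and 1 across receivers (one assumed adversary).
    return seq % 2 if sender in faulty else value

def om(m, source, processors, value, faulty):
    """OM(m): return {p: value p adopts} for each processor in `processors`."""
    # Step 1: the source sends its value to every other processor.
    msgs = {p: send(source, value, i, faulty) for i, p in enumerate(processors)}
    if m == 0:
        return msgs  # OM(0): simply use the value received from the source
    # Step 2: each processor relays its received value via OM(m - 1).
    relayed = {q: om(m - 1, q, [r for r in processors if r != q], msgs[q], faulty)
               for q in processors}
    # Step 3: each processor decides by majority over its own value and
    # the values relayed by the other processors.
    return {p: majority([msgs[p]] + [relayed[q][p] for q in processors if q != p])
            for p in processors}

# n = 4, m = 1 (so n >= 3m + 1): non-faulty source p0 with value 1, p2 faulty.
print(om(1, "p0", ["p1", "p2", "p3"], 1, faulty={"p2"}))
```

With a non-faulty source the non-faulty processors decide the source's value 1 (Example 1); rerunning with `faulty={"p0"}` reproduces Example 2 - the non-faulty processors still agree on one common value, as the validity condition permits.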

4.5 Agreement in a Failure-Free System (Synchronous or Asynchronous)                    (AU : May-22)

- In a failure-free system, consensus can be reached by collecting information from the different processes, distributing this information to all processes, and having each process apply the same deterministic function on the received values to compute the "decision" value.
- In a synchronous system this can be done in a constant number of rounds; further, common knowledge of the decision value is obtained using an additional round of message exchange.
- In an asynchronous system, consensus can similarly be reached in a constant number of message hops; the decision value can then be distributed with a reliable broadcast mechanism.
- The agreement condition is satisfied because all processes apply the same function (for example, the maximum or the minimum) to the same set of values. The validity condition is satisfied because processes do not send fictitious values : the decision is computed from values actually proposed. The termination condition is trivially seen to be satisfied.

4.6 Agreement in Synchronous Systems with Failures

- The following consensus algorithm works in a synchronous system in which up to f processes, where f < n, may fail by simply crashing (fail-stop model). The algorithm runs for f + 1 rounds; in each round a process broadcasts any value it has not broadcast before and keeps the minimum of all values seen so far.

(global constants)
integer : f;                // maximum number of crash failures tolerated
(local variables)
integer : x <- local value;

Process Pi (1 <= i <= n) executes the consensus algorithm for up to f crash failures :
for round = 1 to f + 1 do
    if the current value of x has not been broadcast then
        broadcast(x);
    yj <- value (if any) received from process j in this round;
    x <- min(x, yj);
output x as the consensus value.

- The agreement condition is satisfied because, of the f + 1 rounds, at least one round must be free of failures; in that round every process that is still up receives the same set of values, and from then on all processes hold the same minimum value.
- The validity condition is satisfied because processes do not send fictitious values; the decided value is the input of some process.
- The termination condition is evident from the code : the algorithm always completes after f + 1 rounds.
- Complexity : The number of messages is O(n^2) in each round, and each message carries one integer; hence the total number of messages is O((f + 1) * n^2).
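The f + 1-round algorithm above can be simulated as follows. The crash model here is deliberately simplified (a crashed process sends nothing from its crash round onward); in the real model a process can crash midway through a broadcast, which is exactly why f + 1 rounds are needed. The process ids and the dictionary-based "network" are assumptions of this sketch.

```python
def crash_consensus(initial, f, crashes):
    """Simulate f + 1 synchronous rounds of the min-value consensus.

    initial : {pid: integer input value}
    crashes : {pid: round from which pid no longer sends} - an assumed,
    simplified crash model (a real crash can interrupt a broadcast midway).
    Returns the decision of every process that never crashes.
    """
    x = dict(initial)                 # current estimate at each process
    sent = {p: set() for p in x}      # values each process has broadcast
    for rnd in range(1, f + 2):       # rounds 1 .. f + 1
        msgs = []
        for p in x:
            if crashes.get(p, f + 2) > rnd and x[p] not in sent[p]:
                sent[p].add(x[p])     # broadcast x if not sent before
                msgs.append(x[p])
        for p in x:                   # every surviving process keeps the min
            if crashes.get(p, f + 2) > rnd:
                x[p] = min([x[p]] + msgs)
    return {p: x[p] for p in x if p not in crashes}

# Process 2 holds the minimum (1) but crashes before sending anything;
# the survivors still agree, deciding the minimum of the remaining values.
print(crash_consensus({1: 3, 2: 1, 3: 5}, f=1, crashes={2: 1}))
```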


4.7 Checkpointing and Rollback Recovery : Introduction                    (AU : May-22, Dec.-22, Marks 13)

University Question
1. Illustrate briefly the checkpointing and rollback recovery.                    (AU : May-22, Dec.-22, Marks 13)

- Rollback recovery treats a distributed system application as a collection of processes that communicate over a network. Checkpointing and rollback recovery are well-known techniques that allow processes to make progress in spite of failures; the failures under consideration are transient problems such as hardware errors and transaction aborts.
- Checkpoint : A checkpoint is a designated place in a program at which normal processing is interrupted specifically to preserve the status information necessary to allow resumption of processing at a later time. Checkpointing is the process of saving that status information.
- Rollback recovery : When a failure occurs, rollback recovery restores the system to the most recent consistent state by rolling the processes back to previously saved checkpoints and restarting the interrupted execution from there, instead of from the beginning.
- Checkpointing and rollback recovery is more complicated in a distributed system than in a single-process system, because the processes induce dependencies on one another through the messages they exchange : rolling back one process may force other processes to roll back as well.

4.8 Background and Definitions

4.8.1 System Model
- We consider a distributed system consisting of a fixed number of processes (P1, P2, ..., Pn) which communicate only through messages.
- The processes cooperate to execute a distributed application and interact with the outside world by receiving and sending input and output messages.
- The processes do not share a common memory or a common clock; message passing is the only mode of inter-process communication.
- Fig. 4.8.1 shows a system consisting of three processes exchanging messages, together with the input and output messages to and from the outside world.

4.8.2 Local Checkpoint
- We assume that the communication channels are reliable and that the computation is asynchronous : each process progresses at its own speed, and message transmission delays are arbitrary but finite.
- A local checkpoint is a snapshot of the state of a process at a given instance. All processes save their local states at certain instants of time; recording these local states is called local checkpointing.
- The checkpoints of a process Pp are assigned unique, monotonically increasing sequence numbers. The checkpoint of process Pp with sequence number i is denoted by Cp,i. The i-th checkpoint interval of a process denotes all the computation performed between its i-th and (i + 1)-th checkpoints, including the i-th checkpoint but not the (i + 1)-th.
- Each process is assumed to take an initial checkpoint Cp,0 immediately before its execution begins, and to take checkpoints periodically thereafter.

4.9 Consistent Set of Checkpoints
- In a distributed computation, the processes (sites) establish checkpoints periodically. Because the processes interact with one another through messages, the local checkpoints - one per process - must collectively form a consistent global state from which the system can be restarted after a failure.
- Fig. 4.9.1 shows a distributed computation of three processes, their local checkpoints, and the messages exchanged between them.


- A consistent set of checkpoints contains one checkpoint per process and corresponds to a consistent global state; such a set is also called a recovery line. In a consistent set there is no orphan message, i.e., no message whose receipt is recorded at a checkpoint while its sending is not recorded at any checkpoint.
- A strongly consistent set of checkpoints is a consistent set in which, additionally, no information flow takes place (no messages are in transit) during the interval spanned by the checkpoints.
- Fig. 4.9.1 : (a) a consistent state; (b) an inconsistent state, in which message m is sent by P1 after its checkpoint but received by P2 before P2's checkpoint, so m is an orphan.
- Domino effect : The domino effect is caused by orphan messages. When a process rolls back to its most recent checkpoint, the sending of some message m may be undone; the receiver of m must then roll back to a checkpoint taken before m's receipt, which in turn may undo other message sends and force further rollbacks. In the worst case this cascade pushes every process back to its initial state, and all the work performed before the failure is lost. Fig. 4.9.2 shows how a failure causes such cascaded rollbacks across the processes during recovery.

- Fig. 4.9.3 shows three processes X, Y and Z with checkpoints x1, y1, z1 and x2, y2, z2.
- The set {x1, y1, z1} is strongly consistent : no message is in transit during the interval spanned by the checkpoints.
- The set {x2, y2, z2} is consistent : it contains no orphan message, although rollback to it requires some messages to be handled as lost messages (Fig. 4.9.4).
- Lost messages : Suppose process Y fails after receiving message m and rolls back to its latest checkpoint. The receipt of m is undone, while the record of its sending at the sender remains; m becomes a lost message. When Y restarts from its checkpoint, m must be detected and handled (for example, retransmitted).
- To support this, each process labels the messages it sends with a monotonically increasing counter. In addition, each node records, for every other node X, the label of the first message it sent to X since its last checkpoint (first_label_sent[X]) and the label of the last message it received from X (last_label_rcvd[X]). These labels are recorded along with the checkpoints and are used by the checkpointing and rollback-recovery algorithms described next.
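The definitions of consistent and strongly consistent sets can be checked mechanically. The function below is an illustrative sketch, not notation from the notes: representing checkpoints as local times and messages as 4-tuples is an assumption made for the example.

```python
def classify_checkpoint_set(cp, msgs):
    """Classify a checkpoint set against a message history.

    cp   : {pid: local time of the chosen checkpoint}
    msgs : list of (sender, send_time, receiver, recv_time)
    Returns (consistent, strongly_consistent).
    """
    # Orphan: sent after the sender's checkpoint but received before the
    # receiver's checkpoint - the receive is recorded, the send is not.
    orphans = [(s, r) for (s, st, r, rt) in msgs if st > cp[s] and rt <= cp[r]]
    # In transit: sent before the sender's checkpoint but received after
    # the receiver's - information flow across the recovery line.
    in_transit = [(s, r) for (s, st, r, rt) in msgs if st <= cp[s] and rt > cp[r]]
    return len(orphans) == 0, (len(orphans) == 0 and len(in_transit) == 0)

# A message sent at t=6 by X (checkpoint at t=5) and received at t=4 by Y
# (checkpoint at t=5) is an orphan, so the set is inconsistent.
print(classify_checkpoint_set({"X": 5, "Y": 5}, [("X", 6, "Y", 4)]))
```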


4.9.1 Synchronous Checkpointing Algorithm
- In synchronous (coordinated) checkpointing, the processes coordinate their local checkpointing actions such that the set of all most recent checkpoints in the system is guaranteed to be consistent. The algorithm makes the following simplifying assumptions :
1. Processes communicate by exchanging messages through communication channels, and the channels are FIFO.
2. End-to-end protocols are assumed to cope with message loss due to rollback recovery and communication failures.
3. Communication failures do not partition the network.
- The algorithm takes two kinds of checkpoints on stable storage :
1. Tentative checkpoint : a temporary checkpoint, which is made permanent only on the successful termination of the checkpointing algorithm.
2. Permanent checkpoint : a local checkpoint that is part of a consistent global checkpoint; once made permanent, it is not discarded.
- A single process invokes the algorithm, and the checkpointing algorithm and the rollback-recovery algorithm are not invoked concurrently. Because the set of permanent checkpoints is always kept consistent, the livelock problem that interleaved, uncoordinated rollbacks can cause is avoided.

The Checkpointing Algorithm - Phase One :
- The initiating process Pi takes a tentative checkpoint and requests all other processes to take tentative checkpoints. Each process informs Pi whether it succeeded in taking a tentative checkpoint; a process may reply "no" if it fails to take one, for example because it is already participating in another instance of the algorithm.
- If Pi learns that all processes have successfully taken tentative checkpoints, Pi decides that all tentative checkpoints should be made permanent; otherwise, Pi decides that all the tentative checkpoints should be discarded.

Phase Two :
1. Pi propagates its decision to all the processes.
2. On receiving the message from Pi, each process acts accordingly : it makes its tentative checkpoint permanent if the decision was to commit, and discards it otherwise.
- Between taking its tentative checkpoint and receiving Pi's decision, a process must not send any messages of the underlying computation. Together with the all-or-nothing decision, this guarantees that the resulting set of permanent checkpoints is consistent : either all or none of the processes take permanent checkpoints, and no message can be sent after one new checkpoint and received before another, so no orphan messages are created.

Optimization :
- The algorithm may cause some processes to take checkpoints unnecessarily - checkpoints that do not contribute to the consistency of the resulting set.
- Fig. 4.9.5 shows an example : {x1, y1, z1} is a consistent set of checkpoints for the processes X, Y and Z. When the algorithm is initiated and tentative checkpoints x2, y2 and z2 are requested, a process whose messages since its last checkpoint have not been received by the initiator gains nothing from the new checkpoint - its previous checkpoint is already consistent with the new ones.
- Using the message labels, a process therefore needs to take a tentative checkpoint only if the requester has recorded the receipt of a message the process sent after its own last checkpoint (checked by comparing the recorded last_label_rcvd and first_label_sent values). Processes that have not communicated with the initiator since their last checkpoints need not take new checkpoints.
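The two-phase structure of the checkpointing algorithm can be sketched as a small simulation. Everything here (the class names, and the `refusers` parameter standing for processes that answer "no") is an assumed illustration of the commit/discard decision, not the algorithm's original pseudocode.

```python
class Process:
    def __init__(self, pid):
        self.pid = pid
        self.permanent = None   # last committed checkpoint
        self.tentative = None

    def take_tentative(self, state, willing=True):
        if not willing:         # e.g. already busy with another instance
            return False
        self.tentative = state
        return True

    def decide(self, commit):
        if commit and self.tentative is not None:
            self.permanent = self.tentative   # phase two: make permanent
        self.tentative = None                 # otherwise discard

def coordinate(processes, states, refusers=()):
    # Phase one: every process is asked for a tentative checkpoint.
    ok = all(p.take_tentative(states[p.pid], p.pid not in refusers)
             for p in processes)
    # Phase two: propagate the all-or-nothing commit/discard decision.
    for p in processes:
        p.decide(ok)
    return ok
```

If any process refuses, the round returns `False` and every tentative checkpoint is discarded, so the committed set never mixes old and new checkpoints.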


4.9.2 The Rollback Recovery Algorithm
- The rollback recovery algorithm restores the system to a consistent state after a failure. It also operates in two phases, and it is not invoked concurrently with the checkpointing algorithm.

Phase One :
- The initiating process Pi checks whether all processes are willing to restart from their previous checkpoints. A process may reply "no" if it is already participating in a checkpointing or a recovery operation initiated by some other process.
- If all processes are willing to restart from their previous checkpoints, Pi decides that they should roll back; otherwise Pi aborts the rollback attempt, and it may try again later.

Phase Two :
- Pi propagates its decision to all the processes. On receiving Pi's decision, each process acts accordingly and restarts from its previous (permanent) checkpoint.
- From the time a process agrees to roll back until it receives Pi's decision, the process must not perform any activities related to the underlying computation; this prevents new dependencies from forming while the recovery is in progress.

Optimization :
- The algorithm as stated makes all processes roll back, even though some of them need not; a minimum number of processes should be made to roll back.
- Using the message labels, a process X must restart from its permanent checkpoint only if some rolling-back process Y unsends a message that X has received; this is checked by comparing last_recv(x, y), the label of the last message X received from Y, with first_send(y, x), the label of the first message Y sent to X after the checkpoint Y restarts from.

Disadvantages of synchronous checkpointing :
1. Additional messages must be exchanged and synchronization delays are introduced during normal operations, even when no failure occurs.
2. No computational messages can be sent while the checkpointing algorithm is in progress.
3. If failures rarely occur between successive checkpoints, the checkpointing algorithm places an extra load on the system, which can significantly affect performance.

4.9.3 Types of Messages
- Fig. 4.9.6 shows a failure, the resulting rollback of processes X and Y, and the kinds of messages that a rollback produces :
1. In-transit messages : messages that have been sent but not yet received.
2. Lost messages : messages whose "send" is recorded but whose "receive" is undone due to the rollback of the receiving process.
3. Delayed messages : messages whose "receive" is not recorded because the receiving process was either down or the message arrived after the rollback of the receiving process.
4. Orphan messages : messages with "receive" recorded but "send" not recorded; they arise when a rollback undoes the sending of an already received message. A correct recovery must leave no orphan messages.
5. Duplicate messages : messages that arise due to message logging and replaying during process recovery.

4.10 Issues in Failure Recovery                    (AU : May-22, Dec.-22)
- Recovery refers to restoring a system to its normal operational state after a failure, and it is an essential part of fault tolerance. Once a failure has occurred, it is in many cases important that the affected processes be recovered to a consistent global state.
- If a process fails, recovery generally involves :
1. Undoing the modifications the failed process made, for example to files and databases.
2. Reclaiming the resources allocated to the failed process, such as memory and locked shared resources. For example, a shared resource locked by the failed process must be released so that other processes can use it.
3. Restarting the failed process from an earlier consistent point (its checkpoint) and resuming its execution, rolling back other processes where necessary.


4.10.1 Basic Concept
- In a distributed system, recovery is performed by a collection of cooperating processes. To discuss it precisely we need the notions of system, fault, error and failure.
- System : A system is a combination of hardware and software components. It provides a specified service. A system is said to "fail" when it does not meet its promises, i.e., when it cannot provide its service in the manner specified.
- A system state is a valid state if it can be reached from an initial state through a specified sequence of valid state transitions.
- Erroneous state : A state is erroneous if it could lead to a system failure by a sequence of valid state transitions. Error : An error is the part of the system state which differs from its intended value; it is a manifestation of a fault and could lead to a failure.
- Fault : A fault is an anomalous physical condition. Its causes include design errors, manufacturing problems, damage and external disturbances.
- Failure : A system failure occurs when the system deviates from the behaviour specified for it - for example, it does not deliver the intended value, or it does not perform its service within the specified period of time.
- Fig. 4.10.1 shows the concept : a fault causes errors; an error puts the system into an erroneous state, which can lead to a failure through otherwise valid state transitions. Failure recovery undoes the effect of the errors and brings the erroneous state back to a valid state.

4.11 Checkpoint-based Recovery

University Questions
1. Illustrate the checkpoint-based recovery technique with an example.                    (AU : May-22, Marks 13)
2. Discuss the issues in designing checkpoint-recovery mechanisms for distributed systems.                    (AU : Dec.-22, Marks 13)

- The basic idea behind checkpoint-recovery is the saving and restoration of system state. By saving the current system state periodically, or before critical sections of code, the mechanism provides a baseline from which the system can recover after a failure.
- When a failure occurs, the state restoration mechanism restores the system state to the most recently checkpointed state; the previously failed task can then continue processing from that point rather than restarting from the beginning.
- By the time the state is restored, the event that indicated the failure has passed. If the fault that caused the failure was transient, the restored system will continue normally. If the fault was not transient, restoring an identical system state cannot prevent the failure from recurring, and the system may fail endlessly in a cycle of failure and restoration.
- Clearly, then, the most important design parameter is the type of fault the system must guard against : checkpoint-recovery provides protection mainly against transient faults.
- Typically, the checkpointed state is saved to non-volatile storage so that it survives the failure. The cost of checkpointing depends upon the amount of state to be saved and the bandwidth available to the storage medium; to minimize this cost, a designer can reduce the amount of state that is saved and use a high-bandwidth storage device. Depending on the application - from large distributed systems to small embedded devices - checkpoints may be taken as often as every few seconds.
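The save-and-restore idea can be illustrated for a single task. This sketch is an assumption-laden toy - the checkpoint path, the per-iteration checkpoint frequency, and `pickle` serialization are choices made only for the example. The write-then-rename step keeps the stored checkpoint intact even if a crash interrupts the save.

```python
import os
import pickle
import tempfile

CKPT = os.path.join(tempfile.gettempdir(), "task.ckpt")  # assumed location

def checkpoint(state):
    # Write to a temporary file first, then rename: the saved checkpoint
    # is replaced atomically, so a crash mid-save cannot corrupt it.
    tmp = CKPT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CKPT)

def restore():
    with open(CKPT, "rb") as f:
        return pickle.load(f)

def run(total, crash_at=None):
    # Resume from the last checkpoint if one exists, else start fresh.
    state = restore() if os.path.exists(CKPT) else {"i": 0, "sum": 0}
    while state["i"] < total:
        if state["i"] == crash_at:
            raise RuntimeError("transient fault")   # simulated failure
        state["sum"] += state["i"]
        state["i"] += 1
        checkpoint(state)                           # checkpoint every step
    return state["sum"]
```

A first call such as `run(10, crash_at=5)` fails mid-computation; calling `run(10)` again resumes from the saved state instead of recomputing from the beginning.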


1. Uncoordinated Checkpointing
- In uncoordinated checkpointing, each process takes its checkpoints independently. This gives each process the maximum autonomy in deciding when to checkpoint : a process may take a checkpoint when it is cheapest, for example when the amount of state to be saved is small, and the technique has a lower runtime overhead during normal execution because no coordination messages are needed.
- To be able to determine a consistent global checkpoint during recovery, the processes record the dependencies among their checkpoints during failure-free operation; this is the direct dependency tracking technique.
- Assume each process Pi starts its execution with an initial checkpoint Ci,0, and let Ii,x denote the checkpoint interval between the checkpoints Ci,x-1 and Ci,x. When process Pj receives a message m during its checkpoint interval Ij,y, and m was sent by process Pi during its interval Ii,x, Pj records the dependency Ii,x -> Ij,y; this dependency is saved onto stable storage when Pj takes the checkpoint Cj,y.
- When a failure occurs, the recovering process initiates rollback by broadcasting a dependency request message to collect the dependency information maintained by each process. The collected information is then used to find a consistent set of checkpoints to which the processes roll back.
- Disadvantages :
a. A process may suffer from the domino effect during recovery.
b. A process may take useless checkpoints that will never be part of any consistent global state.
c. Each process has to maintain multiple checkpoints, and a garbage collection algorithm must be invoked periodically to reclaim the checkpoints that are no longer useful.
d. The technique is not suitable for applications with frequent output commits, because committing output requires global coordination to compute a recovery line.

2. Coordinated Checkpointing
- In coordinated checkpointing, the processes orchestrate their checkpointing activities so that all local checkpoints together form a consistent global state.
- Advantages : Coordinated checkpointing simplifies recovery and is not susceptible to the domino effect, since every process always restarts from its most recent checkpoint. Each process needs to maintain only one checkpoint on stable storage, which reduces the storage overhead and eliminates the need for garbage collection.
- Coordinated checkpointing can be achieved with blocking or non-blocking algorithms.
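Direct dependency tracking as described above can be sketched as follows; the class and attribute names are assumptions, and the "network" is simply a function call passing the tagged message.

```python
from collections import defaultdict

class Proc:
    def __init__(self, pid):
        self.pid = pid
        self.interval = 1               # index x of current interval I(i, x)
        self.pending = set()            # dependencies seen this interval
        self.stable = defaultdict(set)  # checkpoint index -> saved dependencies

    def send(self, msg):
        # Piggyback the sender id and current interval index on the message.
        return (self.pid, self.interval, msg)

    def receive(self, tagged):
        sender, s_interval, msg = tagged
        # Receiving m in I(j, y), sent in I(i, x), records I(i,x) -> I(j,y).
        self.pending.add(((sender, s_interval), (self.pid, self.interval)))
        return msg

    def take_checkpoint(self):
        # C(j, y) saves the dependencies recorded during interval I(j, y).
        self.stable[self.interval] = set(self.pending)
        self.interval += 1
        self.pending = set()

p1, p2 = Proc(1), Proc(2)
p2.receive(p1.send("m"))   # m: sent in I(1,1), received in I(2,1)
p2.take_checkpoint()       # the dependency is saved with C(2,1)
print(p2.stable[1])
```

On recovery, an initiator would gather every process's `stable` sets and walk these edges to pick a checkpoint per process with no orphaned dependency.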

Blocking Checkpointing
- A straightforward approach to coordinated checkpointing is to block communications while the checkpointing protocol executes. After a process takes a local checkpoint, it remains blocked - to prevent orphan messages - until the entire checkpointing activity is complete. Fig. 4.11.1 shows checkpointing with this blocking protocol :
a) The coordinator takes a checkpoint and broadcasts a checkpoint-request message to all processes.
b) On receiving the request, each process stops its execution and takes a tentative checkpoint.
c) Each process replies to the coordinator and, while blocked, sends no messages of the underlying computation.
d) After receiving replies from all processes, the coordinator broadcasts a commit message that completes the protocol.
e) On receiving the commit, each process makes its tentative checkpoint permanent, removes its old permanent checkpoint, and resumes its execution.
- Disadvantage : the computation is blocked during the checkpointing.

Non-blocking Checkpointing
- In non-blocking checkpointing, the processes need not stop their execution while taking checkpoints.
- A fundamental problem in coordinated checkpointing is to prevent a process from receiving application messages that would make the checkpoint inconsistent. Consider a message m sent by process P0 after P0 has taken its checkpoint and received by process P1 before P1 takes its checkpoint : the receipt of m is recorded at P1's checkpoint but its sending is not recorded at P0's, so m is an orphan and the checkpoints are inconsistent.
- Key issue : when the channels are FIFO, this problem can be avoided by preceding the first post-checkpoint message on each channel by a checkpoint request, forcing each process to take its checkpoint before receiving any post-checkpoint message.


Example :
- Fig. 4.11.2 (a) shows the checkpoint inconsistency described above : the initiator P0 takes its checkpoint and then sends message m; P1 receives m before taking its own checkpoint, so m becomes an orphan.
- Fig. 4.11.2 (b) shows how the inconsistency is avoided when the channels are FIFO : the checkpoint request from P0 reaches P1 before the post-checkpoint message m, P1 takes its checkpoint on receiving the request, and only then receives m.
- The Chandy - Lamport snapshot algorithm is a popular non-blocking protocol of this kind for systems with FIFO channels : immediately after taking its checkpoint, a process sends a marker (checkpoint request) on each outgoing channel, before any post-checkpoint message. Because the channels are FIFO, every process receives the request before any post-checkpoint message and therefore takes its checkpoint first; a system-wide consistent global state is always obtained.

3. Communication-induced Checkpointing
- Communication-induced checkpointing (CIC) avoids the domino effect while still allowing the processes to take some of their checkpoints independently.
- Processes take two types of checkpoints : local checkpoints, which each process takes independently, and forced checkpoints, which a process must take to guarantee the eventual progress of the recovery line and to bound rollback propagation.
- CIC protocols exchange no special coordination messages; instead, protocol-related information is piggybacked on each application message. The receiver uses the piggybacked information to decide whether it must take a forced checkpoint before processing the message.

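The FIFO fix of Fig. 4.11.2(b) can be sketched with a simple queue: because the checkpoint request is enqueued on the channel before the post-checkpoint message m, the receiver necessarily checkpoints before processing m. This is a toy illustration; all variable names are invented.

```python
from collections import deque

# FIFO channel from P0 to P1: the checkpoint request is enqueued BEFORE the
# post-checkpoint application message m, so P1 must see the request first.
channel = deque()
channel.append(("CKPT_REQUEST", None))   # sent immediately after P0 checkpoints
channel.append(("APP", "m"))             # post-checkpoint message m

p1_state = "state of P1"
p1_checkpoint = None
received_after_ckpt = []

while channel:
    kind, payload = channel.popleft()    # FIFO delivery order
    if kind == "CKPT_REQUEST" and p1_checkpoint is None:
        p1_checkpoint = p1_state         # P1 checkpoints before processing m
    else:
        received_after_ckpt.append(payload)

# m's receipt is NOT recorded in P1's checkpoint: no orphan message.
print(p1_checkpoint, received_after_ckpt)
```

If the two `append` calls were swapped (modelling a non-FIFO delivery), m would be processed before the checkpoint and would become an orphan, which is exactly the inconsistency of Fig. 4.11.2(a).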
- Two types of communication-induced checkpointing protocols exist : model-based checkpointing and index-based checkpointing.
- Model-based checkpointing prevents patterns of communications and checkpoints that could result in inconsistent states among the existing checkpoints. A process detects the possibility that such patterns could be forming within the system and independently takes forced checkpoints to prevent them; no coordination with the other processes is required, and the detection is typically based on a heuristic.
- Index-based checkpointing works by assigning monotonically increasing indexes to checkpoints, such that the checkpoints having the same index at different processes form a consistent global state. The indexes are piggybacked on the application messages to help the receivers decide when they should take a forced checkpoint.

Difference between uncoordinated, coordinated and communication-induced checkpointing :

Parameters | Uncoordinated checkpointing | Coordinated checkpointing | Communication-induced checkpointing
Number of checkpoints per process | Many | One | Several
Possibility of domino effect | Possible | Not possible | Not possible
Orphan messages | Possible | Not possible | Possible
Rollback extent | Unbounded | Last global checkpoint | Possibly several checkpoints
Output commit | Not possible | Global coordination required | Global coordination required
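The index-based rule can be sketched as follows: each process keeps a checkpoint index, piggybacks it on outgoing messages, and takes a forced checkpoint when a message arrives carrying a larger index. This is a simplified sketch of one common formulation of index-based CIC; the class and attribute names are invented.

```python
# Sketch of index-based communication-induced checkpointing.
# Names (CICProcess, take_checkpoint, ...) are invented for the example.

class CICProcess:
    def __init__(self):
        self.index = 0          # index of the latest local checkpoint
        self.checkpoints = []   # (index, kind) pairs, for illustration

    def take_checkpoint(self, kind):
        # An independent (autonomous) checkpoint bumps the local index.
        self.index += 1
        self.checkpoints.append((self.index, kind))

    def send(self):
        # Piggyback the current checkpoint index on the message.
        return {"payload": "m", "index": self.index}

    def receive(self, msg):
        # Forced checkpoint BEFORE processing, if the sender's index is
        # ahead: checkpoints with equal indexes at different processes then
        # form a consistent global state.
        if msg["index"] > self.index:
            self.index = msg["index"]
            self.checkpoints.append((self.index, "forced"))
        # ... now the application may process msg["payload"] ...

p, q = CICProcess(), CICProcess()
p.take_checkpoint("autonomous")   # p's index becomes 1
q.receive(p.send())               # q is forced to checkpoint at index 1
print(q.checkpoints)              # [(1, 'forced')]
```

Note that no extra control message is exchanged: the only protocol traffic is the index riding on the application message, which is the defining trait of CIC.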

4.12 Koo-Toueg Coordinated Checkpointing Algorithm

- There are two basic approaches to taking checkpoints :
1. Uncoordinated checkpointing, in which each process checkpoints independently; during recovery this may lead to the domino effect.
2. Coordinated checkpointing, in which the processes coordinate so that the set of saved checkpoints is guaranteed to be consistent.
- Koo and Toueg proposed, in 1987, a coordinated checkpointing and recovery technique that takes a consistent set of checkpoints and avoids the domino effect and livelock problems during recovery. The technique includes a checkpointing algorithm as well as a recovery algorithm.
- The algorithm uses two types of checkpoints : tentative and permanent. A tentative checkpoint becomes permanent only when the checkpointing of the whole set of involved processes succeeds.
- A process P that wants to establish a checkpoint first records its current state in a tentative checkpoint. It then asks every process Q from which it has received a message since its last checkpoint to take a tentative checkpoint as well : otherwise the last message m_qp that Q sent to P would become an orphan, since its receipt would be recorded by P while its sending was not recorded by Q.
- Q needs to take a tentative checkpoint only if m_qp was sent after Q's last checkpoint; if m_qp was sent before that checkpoint, its sending is already recorded and Q need not checkpoint again. P therefore tells Q the identity of the last message m_qp it received from Q.
- The request propagates transitively : Q, in turn, asks the processes from which it has received messages since its own last checkpoint, and so on, until a consistent set of tentative checkpoints has been established. If all processes in the set succeed in taking tentative checkpoints, a commit makes them permanent; if any process is unable to take a checkpoint, all the tentative checkpoints are discarded.
- Taking checkpoints near-simultaneously at all processes would place a heavy load on the stable storage (e.g., the disk) at the same time. Staggering the checkpoints in time, as illustrated for processes P_1, P_2 and P_3 in Fig. 4.12.1, can help avoid this contention while still producing a consistent set of checkpoints.
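The transitive propagation of tentative-checkpoint requests in the Koo-Toueg scheme can be sketched as a reachability computation over "received a message since the last checkpoint" dependencies. This is a simplified model, not the full algorithm; all names (`received_since_ckpt`, `checkpoint_set`, the process labels) are invented for the example.

```python
# Simplified Koo-Toueg sketch: a process asks every process from which it
# received a message since its last checkpoint to also take a tentative
# checkpoint, and the request propagates transitively.

# received_since_ckpt[p] = set of processes from which p received messages
# after p's last checkpoint.
received_since_ckpt = {
    "P": {"Q"},       # P received a message from Q
    "Q": {"R"},       # Q received a message from R
    "R": set(),
    "S": set(),       # S is not involved in this checkpointing round
}

def checkpoint_set(initiator):
    """Return the set of processes that must take tentative checkpoints."""
    tentative, frontier = set(), [initiator]
    while frontier:
        p = frontier.pop()
        if p in tentative:
            continue
        tentative.add(p)
        # Ask every sender whose last message would otherwise become an orphan.
        frontier.extend(received_since_ckpt[p])
    return tentative

print(sorted(checkpoint_set("P")))   # ['P', 'Q', 'R']
```

Process S takes no checkpoint at all, which reflects a key property of the algorithm: only the processes that causally affected the initiator since its last checkpoint are disturbed.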
4.13 Algorithm for Asynchronous Checkpointing and Recovery

- In asynchronous checkpointing, each process takes its checkpoints independently, without any synchronization with the other processes. This avoids the synchronization overhead of coordinated checkpointing, but during recovery the processes must search for a consistent set of checkpoints, and the rollback of one process can potentially spawn a chain of further rollbacks at other processes.
- Juang and Venkatesan presented an algorithm for checkpointing and recovery in such an asynchronous system.

Assumptions :
- The communication channels are reliable, deliver messages in FIFO order, and have infinite buffers.
- The message transmission delay is arbitrary, but finite.
- The underlying computation is event-driven : a process moves to a new state on the receipt of a message, in reaction to which it may send messages to other processes.

Two types of log are maintained by each process :
- Volatile log : It has short access time, but its contents are lost if the corresponding processor crashes. The contents of the volatile log are periodically moved to the stable log.
- Stable log : It has longer access time, but its contents survive a processor crash.

Checkpointing :
- After executing an event, a process records a triplet in its volatile log, consisting of the state of the process before the event, the message whose arrival triggered the event, and the set of messages sent during the event. A local checkpoint at a process thus consists of the record of an event occurring at that process, so taking a local checkpoint is inexpensive.

Recovery :
- The recovery algorithm achieves recovery by rolling back the processes, using the number of messages sent and received on each channel to detect inconsistency. If the number of messages that a process Pj has received from Pi is greater than the number of messages that Pi's restored state records as sent to Pj, then orphan messages exist, and Pj must roll back further until the two counts agree.
- One or more iterations of such rollbacks may be required; the algorithm terminates after a finite number of iterations with all processes in a mutually consistent set of states. During recovery, the logs (volatile where available, otherwise stable) are used to restore each process to the most recent state consistent with the rest of the system.
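The count-based orphan test at the heart of the recovery step can be sketched directly: for each channel, compare the number of messages the restored sender state says it sent with the number the receiver says it received. This is a simplified illustration of the idea only; the variable and function names are invented.

```python
# After a failure, each process reports counts from its restored state:
#   sent[i][j] = messages p_i's state records as sent to p_j
#   recv[j][i] = messages p_j's state records as received from p_i
sent = {"p1": {"p2": 2}, "p2": {"p1": 0}}
recv = {"p2": {"p1": 3}, "p1": {"p2": 0}}

def orphan_channels(sent, recv):
    """Channels on which more messages were received than sent: orphan
    messages exist there, so the receiver must roll back further."""
    bad = []
    for i, out in sent.items():
        for j, n_sent in out.items():
            if recv[j][i] > n_sent:
                bad.append((i, j))
    return bad

print(orphan_channels(sent, recv))   # [('p1', 'p2')]: p2 must roll back
```

In the full algorithm this test is re-run after every rollback, which is why recovery may take several iterations before all counts agree.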

4.14 Two Marks Questions with Answers

Q.1 Define recovery.
Ans. : Recovery refers to restoring a system to its normal operational state after a failure has occurred. Once a failure has occurred, it is essential that the process where the failure happened can recover to a correct state.

Q.2 Define checkpointing. AU : Dec.-22
Ans. : Checkpointing is the process of periodically saving the state of an executing program to stable storage; each saved state is called a checkpoint. After a failure, the computation can be restarted from the most recent checkpoint instead of from the beginning.

Q.3 What is the purpose of checkpointing? AU : May-22
Ans. : Checkpointing is most typically used to provide fault tolerance : if a failure occurs, the computation is restarted from the last checkpoint, limiting the amount of lost work. It is also useful for process migration, load balancing and the debugging of long-running applications.

Q.4 What is the difference between the agreement problem and the consensus problem? AU : May-22, Dec.-22
Ans. : In the agreement problem, a single process has the initial value, whereas in the consensus problem, all processes have an initial value. In both cases, all non-faulty processes must agree on a single value.

Q.5 What is rollback recovery?
Ans. : Rollback recovery treats a distributed system as a collection of application processes that communicate over a network. It achieves fault tolerance by periodically saving the state of the processes during failure-free execution and, after a failure, rolling the system back to a consistent set of checkpoints, i.e., restoring it to a consistent state that existed before the failure.

Q.6 How is a consistent state found from a set of checkpoints?
Ans. : From the set of saved checkpoints, the recovery procedure must find a consistent set of checkpoints (a recovery line). Consistency can be checked by comparing, for each pair of processes, the number of messages sent by p_i to p_j with the number of messages received by p_j from p_i; the counts must not indicate any message received but never sent.
Q.7 How can the failures in a distributed system be classified?
Ans. : Failures can be classified as follows :
1. Process failure 2. System failure 3. Secondary storage failure 4. Communication medium failure.

Q.8 What is an orphan process?
Ans. : An orphan process is a process that survives the crash of another process but whose state is inconsistent with the crashed process after it recovers. The inconsistency arises when the surviving process depends on messages whose sending is no longer recorded in the recovered state of the crashed process.

Q.9 Define domino effect.
Ans. : If checkpoints are taken independently, the rollback of one crashed process may force other processes to roll back to their earlier checkpoints, which in turn may trigger still further rollbacks. This anomalous, cascaded rollback propagation is called the domino effect; in the worst case, the system rolls back to its initial state, losing all the work performed before the failure.

Q.10 What is the main drawback of uncoordinated checkpointing?
Ans. : Since each process takes its checkpoints independently, recovery may lead to cascaded rollbacks (the domino effect), and some of the checkpoints taken may turn out to be useless, i.e., never part of any consistent global state.

Q.11 Explain the shadow versions technique.
Ans. : The shadow versions technique uses a map to locate the versions of the server's objects in a file called a version store. The versions written by each transaction are 'shadows' of the previous committed versions and are stored separately. When a transaction commits, a new map is made by copying the old map and entering the positions of the new committed versions; the new map then replaces the old map as the current one.

Q.12 How can faults be classified?
Ans. : Based on their duration, faults are classified as transient, intermittent and permanent. Permanent faults are caused by physical damage, design errors or manufacturing defects and persist until the faulty component is repaired or replaced; temporary faults are introduced by external disturbances or anomalous physical conditions and disappear on their own.

Q.13 What are the disadvantages of synchronous check pointing?
Ans. :
1. Additional messages must be exchanged to coordinate the check pointing.
2. Synchronization delays are introduced during normal operations.
3. No computational messages can be sent while the check pointing algorithm is in progress.
4. If failures rarely occur between successive checkpoints, the coordination overhead significantly affects performance without being helpful.

Q.14 What are the requirements of an agreement protocol?
Ans. : An agreement protocol must satisfy :
1. Agreement : All non-faulty processes must agree on a common value.
2. Validity (integrity) : The agreed-upon value must be related to the initial values of the processes; if the initiator is non-faulty, the decided value must be its initial value.
3. Termination : Every non-faulty process must reach a decision within a finite number of steps.

Q.15 What is the Byzantine agreement problem?
Ans. : In the Byzantine agreement problem, a designated source process has an initial value. All non-faulty processes must agree on a common value (agreement); if the source is non-faulty, the agreed-upon value must be the source's initial value (validity); and every non-faulty process must eventually decide (termination). The faulty processes may behave in arbitrary, even malicious, ways.

Q.16 Why do physical clocks need to be synchronized?
Ans. : The physical clocks of different processors in a distributed system drift at slightly different rates, so over time they show different values. Clock synchronization keeps the clocks of all processors approximately equal, to within a specified bound, so that the system can order events and coordinate actions consistently.

Q.17 Mention an application of agreement algorithms.
Ans. : Agreement algorithms are used in fault-tolerant applications of distributed systems, for example fault-tolerant clock synchronization : distributed systems require the physical clocks of different processors to be synchronized even in the presence of faulty clocks and processors, and the non-faulty processors must collaborate to reach agreement on the clock value.

Q.18 How do the processors reach an agreement?
Ans. : The processors exchange their values with each other over a number of rounds of message exchange. At the end of the exchange, each non-faulty processor applies a decision rule, typically taking the majority of the values it has collected, to arrive at the common agreed value.

Q.19 What is a forced checkpoint?
Ans. : In communication-induced check pointing, protocol-specific information is piggybacked on each application message. The receiver examines this information to decide whether it must take a checkpoint, called a forced checkpoint, before processing the message. Forced checkpoints guard against the domino effect and against the creation of useless checkpoints, at the cost of occasionally taking checkpoints that a coordinated protocol would not need.
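The majority decision rule used in simple agreement protocols can be illustrated with a single exchange round. This is a toy sketch of the decision step only, not a full Byzantine agreement protocol; the processor labels and collected vectors are invented for the example.

```python
from collections import Counter

# Each non-faulty processor holds the vector of values it collected in the
# exchange round; a faulty processor may have sent different values to
# different peers (here, C sent 0 to A but 1 to B).
collected = {
    "A": [1, 1, 0],   # A's own value, B's value, faulty C's value
    "B": [1, 1, 1],
}

def decide(values):
    # Each non-faulty processor takes the majority of what it collected.
    return Counter(values).most_common(1)[0][0]

decisions = {p: decide(v) for p, v in collected.items()}
print(decisions)   # {'A': 1, 'B': 1}: the non-faulty processors agree on 1
```

Despite receiving conflicting values from the faulty processor, both non-faulty processors decide the same value; tolerating f Byzantine processors in general requires more processors (n >= 3f + 1) and multiple rounds.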
Q.20 Define useless checkpoint. AU : Dec.-16
Ans. : A useless checkpoint of a process is a checkpoint that will never be part of any global consistent state. Useless checkpoints are not desirable because they consume resources and cause performance overhead, but do not contribute to the recovery of the system to a consistent state.

Q.21 Define cut.
Ans. : A cut C = {c_1, c_2, ..., c_n} is a set of cut events, one from each process, that slices the execution of the distributed computation into a PAST and a FUTURE. The cutset associated with a cut is the set of messages that cross the cut, i.e., whose send and receive events fall on opposite sides of it.

Q.22 What is a consistent cut? AU : May-17
Ans. : A cut is consistent if every message whose receive event lies in the PAST of the cut also has its send event in the PAST. Formally, for every pair of events (e_i, e_j) with e_i -> e_j, if e_j belongs to the cut then e_i belongs to it as well; no message crosses the cut from the FUTURE to the PAST.

Q.23 Define orphan messages. AU : May-17
Ans. : Messages whose receive events are recorded in the states of their destination processes, but whose send events are not recorded in the states of the senders (for example, because the senders rolled back past the point of sending), are called orphan messages. A consistent global state must contain no orphan messages.

Q.24 Why is it difficult to find an optimal task assignment? AU : May-18
Ans. : In the task assignment approach, each process is split into tasks, and the tasks are assigned to the nodes so as to minimize the total cost, which is the sum of the execution costs of the tasks on their assigned nodes and the inter-process communication (IPC) costs between tasks placed on different nodes. Finding an optimal assignment requires the task weights (execution costs) and the IPC costs to be known in advance, which is not always possible; hence optimal task assignment is, in general, not achievable, and heuristic assignments are used instead.

Q.25 Mention the motivations for replication.
Ans. : Replication - maintaining copies of data at multiple computers - is motivated by :
1. Performance enhancement : For example, the caching of data at clients and servers reduces latency and spreads load.
2. Increased availability : Users require services to be highly available; replication allows a service to remain usable despite server failures or network partitions.
3. Fault tolerance : Highly available data is not necessarily strictly correct data; fault tolerance additionally guarantees strictly correct behaviour despite a certain number of failures.

Q.26 What is the limitation of caching as a form of replication?
Ans. : A cached copy may quickly become out of date, so a client may work with stale data. Keeping the cached copies up to date itself requires communication, which reduces the performance gain that caching provides.
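The consistency test for cuts can be expressed directly in code: a cut is consistent exactly when no message is received in its PAST but sent in its FUTURE. This is a small sketch under a simplified event model (per-process step counters); all names are invented for the example.

```python
# Events are identified by (process, local_step). The cut records, per
# process, the last local step included in its PAST.
cut = {"p1": 2, "p2": 1}

# Messages as (sender, send_step, receiver, recv_step).
messages = [
    ("p1", 1, "p2", 1),   # sent and received in the PAST: fine
    ("p2", 2, "p1", 2),   # sent in p2's FUTURE, received in p1's PAST: orphan
]

def is_consistent(cut, messages):
    for snd, s_step, rcv, r_step in messages:
        received_in_past = r_step <= cut[rcv]
        sent_in_past = s_step <= cut[snd]
        if received_in_past and not sent_in_past:
            return False          # an orphan message crosses the cut
    return True

print(is_consistent(cut, messages))   # False
```

Advancing p2's cut event to include the send (cut = {"p1": 2, "p2": 2}) makes the same set of messages consistent, which mirrors how recovery searches for a recovery line.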