0% found this document useful (0 votes)
13 views14 pages

DC (Unit 4)

The document outlines consensus and recovery algorithms in distributed systems, focusing on the Byzantine Agreement Problem and various recovery techniques such as checkpointing and rollback recovery. It discusses the challenges of achieving agreement in both synchronous and asynchronous systems, particularly in the presence of failures. Additionally, it covers the definitions, algorithms, and issues related to failure recovery, including coordinated checkpointing and the complexities involved in distributed systems.

Uploaded by

maniputu11
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
13 views14 pages

DC (Unit 4)

The document outlines consensus and recovery algorithms in distributed systems, focusing on the Byzantine Agreement Problem and various recovery techniques such as checkpointing and rollback recovery. It discusses the challenges of achieving agreement in both synchronous and asynchronous systems, particularly in the presence of failures. Additionally, it covers the definitions, algorithms, and issues related to failure recovery, including coordinated checkpointing and the complexities involved in distributed systems.

Uploaded by

maniputu11
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 14
Consensus and Recovery Syllabus Consensus and Agreement Algorithms : Problem Definition - Overview of Results - Agreement in a Failure-Free System (Synchronous and Asynchronous) - Agreement in Synchronous Systems with * Failures; Check-pointing and Rollback Recovery : Introduction - Background and Definitions ~ Issues in Failure Recovery - Checkpoint-based Recovery - Coordinated Checkpointing Algorithm - Algorithm for Asynchronous Checkpointing and Recovery. Contents 4.1 Consensus and Agreement Algorithms : Problem Definition 4.2. Byzantine Agreement Problem 43 Overview of Results 4.4 Solution to Byzantine Agreement Problem 45 Agreement in a Failure-Free System (Synchronous and Asynchronous) 4.6 Agreement in Synchronous Systems with Failures . 4.7 Introduction of Check-pointing and Rollback Recovery . . 4.8 Background and Definitions 49 Consistent Set of Checkpoints Issues in Failure Recovery Checkpoint-based Recovery Coordinated Checkpointing Algorithm Algorithm for Asynchronous Checkpointing and Recovery Two Marks Questions with Answers Marks 13 Marks 13 May-22, Dec.-22, «+--+ -- Marks 13 a7 a 42 oor ob Dion itt HERI Consensus and Agreement Algor in distributed systems oft e a common goal. Mutual Trust/agree len compete 95 well a5 cooperate 4, rent is very much required. jan where data managers have jon”, When there is no Agreement protocols Examples : Aj Several rounds agreement can be reached. [EE] Byzantine Agreement Problem * The Problem : “Several divisions of the Byzantine army are camped outside an y, each division commanded by its own general, After observing, the decide upon a common plan of action, Some of the generals generals from reaching agreement.” Is are agree fo atack oF to retreat, Once the commander i vutenants to the commander 3k or retreat retreating to another. + Ifa lieutenant eacherous, he him to attack and another that they are Of is pers thatthe commander told to retreat * Source processor broadcasts objectives Agreement : its val ves f0 others, Solution must meet followité ty processor Y Processors agree on the same value. TECHNICAL PUBLICATIONS? , 99 woth for knowedgo puted Computing a If source is nonfaulty, then the comy al on agreed value must be the ice is faulty then all non - fa pure 'y processors can agree on any common value”. “Value ageced upon by faulty processor ie erent «Fig 42.1 shows Byzantine agreement, Consensus Problem Me ty processors ty processors is different then all non - ‘can agree on any common value. gree on the same single value. processor is v , then the common alee nota Pe ny processors must be ue rent then all agreed value by © “If initial value of non: e processors can agree on any common vs Processors is irrelevant”. EEEA interactive Consistency Problem i. Every processor has its own ji, All non faulty processors must 9Br°< TECHNICAL PUBLICATIONS reo on a set of common values. ‘an uptist fer hood ail non fully Processors must egg, rement is on a single value. In Byzan 1 is on a set of common values, ment tency problems aer® initializes the value whe lem, only one Processor teas has its own initial value. ‘© In interactive consis it pro processor EEE] overview of Results in asynchronous system even if one Process can fail by ‘Synchronous system Asynchronous system ‘Agreement not attainable “Agreement attainable 3. | Byzantine fare Z [f= (a= 1) 6) byzantine process EEZI solution to Byzantine Agreement Problem + First defined and solved by Lamport + An arbitrary source processor broadcasts its initial value to all others. + If the source processor is faulty, other non-faulty processor can agree on sy ‘common value. + Faulty processors’ values and agreements do not matter, 1 tty pes ae my, he nay proceso can! 0h * Number of faulty processors, cannot exced : tune n-)/3] {This bound canbe relaxed for systems using authenticated messages. + Solution must meet following objectives: \ Agreement: All nonaulty processors agree on the same value. source is nonfaulty, by the source processor, bia Raa" 3” upset for knowledge then the common agreed value must be the val Disiio ee ——— 4-5 [EEE impossible Scenario 1 Consider a system with 3 processors: po, pi, pa Consensus and Recovery Ource) is not faulty. 2 is faulty. pt should agree upon 1 as the value. Not possible, ae ee «Case 2 : p0is faulty. pl may agree on 1 and p2 on 0 Fig. 44.1 processor pO non faulty and processor po faulty EEE] Lamport-Shostak-Pease Al: + This algorithm also known as Oral Message Algorithm OM(m) where m is the number of ‘+’ = Number of processors and n >= 3m+l recursively defined as fllows = ——y ee Lampert ays a 1p ph. ps source pis ay aS Hy Land 0. pppoe oa jal value to be 1. to [p2, p3I- pd sends he faulty one) sends 110 pl and 0 t0 p3. ction at pl and pis 1, which is the desired result, (yy fu algorithm OM| Fig. 442 Examplo 2: © System with 4 proce 2, pB. pO is source, and i ‘Assumption : Possible and 0. + st ial value to be 1 for pl and p3, For p2, it sends a * Step 2: OM(0). pl sends 1 to {p2, pi. p3 sends 1 to {pl,p2) ‘= p2 sends 0 to pl and ps. Step 3: Majority function at 3 Majority * Ply p2, p3 is still the same (1), which is the desired TECHIICAL PuBLICATIONS® +8 wiht for knowledge pare ee ae —omenting ese cry Agreement in a Failure-F and Asynchronous)" S¥8t@m (Synchronous In a failure-free system, consensus can ing information from ing this decision in A distributed mechanism would have each p and each process computes the cast its values to others, received. KONOUS syst number of rounds he decision value can be obtained using an additional round. lop processes in a system, / global constant ; maximum number of crash failures tolerated Integer: x+ local value; Process P, executes the consensus algorithm for up tof crash failures for round from 1 t0 +1.do {if tho curtont value of x haa not been broadcast then broadcast(x); y) ¢ value (if any) received from process j in this round; ‘The agreement cor least one round in which no process failed. ty condition is satisfied because processes do not send fictitious values + The val in this + The termination condition is sen to be satisfied ‘There are { + 1 rounds, where f 0) checkpoint of process Pp is assigned a sequence number i and is denoted by Chi We also assume that each process Fp with a virtual checkpoint that represents the last takes an initial checkpoint C,,o imme« before execution begins and ends state attained before termination. ‘The it® checkpoint interval of process Fy, denotes ll 1) checkpoint, including the i I the computation performed (® checkpoint but not the between its i and +1)" checkpoint. EE] consistent Set of Checkpoints Fig, 4.9.1 shows consistent and incons processes (sites) that interact with one another es checkpoints, All the sites sav one from each TATIONS? = an pth fr howe? TECHNICAL PUBLIC (1) Conaatet state (bp inconsistent stato Fig. 4.9.4 effect Is caused by omphan messages, which in turn are caused by "y psrouted COMPUT an «Fig, 49.3 shows consistent set of checkpoint Consensus and Recovery labelled. 2 Records of lost abo_recelve,M ‘nat (a message mani fiat Jel seny8) TECHNICAL Fig. 49.4 ing counter with which each message from that node is last message fom and the first message to all other nodes. —pusureaTIOnS®- on wast fr Mowe Consensus and Recovey (Detrbuted Computing “on Druted Computing Note : “a” denotes a “smallest label” that is < any other label and “II” denotes , “largest label” that is > any other label. ‘Simple method for taking consistent set of checkpoint 4+ The set of the most recent checkpoints is always consistent, EEEEI synchronous checkpointing and Recovery Livelock problem during fecovery is avoided by taking a consistent set of checkpoints ‘Algorithm is said to be synchronous when the processes involved coordinate their local checkpointing actions such that the set of all recent checkpoints in the system guaranteed to be EEE chectpoining agortnm + Make some simplifying assumptions 1. Processes communicate by exchanging messages through channels. “2 Channels are FIFO, end-to-end protocols cope with message loss due to rollback recovery. 3. Communication failures do not partition the network. * A single process invokes the algorithm. The checkpoint and the rollback recovery algorithms are not invoked concurrently. a : A temporary checkpoint that is mad. : is made a permanent checkpoint on the ‘sctsful termination ofthe checkpoint algorithm _ 2 Permanent: A local checkpoint at a process. Phase One * Initiating process P, takes a tentati ive checkpoint : take tentative and requests that all the process + Each process informs P, whether i N succeeded in taking a tentative checkpoint. + IF; leams that all processes ha checkpoint ve taken tentative na tentative checkpoints should be made permanen, “eee * Otherwise F desis tha al etic cecpoints should be disca s ded. TT aetase paeanaP * 8 tet fr inonledge ated Computing tn phase TWO 1. P, propagates its decision to all processes 2. Om receiving the message from P, sl process ac 3 h y act accordingly. + Between tentative checkpoint and comit/abort of ehckpoin process must hold Consensus and Recovery back messages. + Does this guarantee we have a strongly consistent state ? Can you construct an example that shows we can stil have lost messages ? synchronous Checkpointing : Properties, + All processes from which P, has received messages after it has taken its last checkpoint take a checkpoint to record the sending of those messages. + Fig. 49.5 shows the checkpoints taken unnecessarily Fig. 49.5 Checkpoints taken unnecessarily 1. Process X decides to initiate checkpoint aigoritn ater receiving SSG 2 Te take a tentative checkpoints Xp and sends take tentative checkpoint messages to processes ¥ and Z,, causing Y and 3. Now { xp, yp, 22) forms a consist os 4. xy, yp, zi) also forms a consistent sf oF ~ aor Tepe PUBS Z to take checkpoints yz and 2 respectively of checkpoints points a Consensus Daeuted Conpuing eM _______RNN ny last message received by X from Y way the last checkpoint (last_recv(x, y) >. Synchronous Checkpointing Disadvantagos al messages must be exchanged to coordinate checkpointing.. progress, algorithm places an unnecessary extra load on the system, which can si affect performance. AZEXA Tho Rollback Recovery Algorithm + Restore the system st Phaso One : Process checks whether al “ Al proess are willing to restart fom thee previous checkpoints, = ae mre # A process may reply “no” if it is alread is already participating in a inting of secoverng Poss insted by some cer proces * StHPointn © + Wall processes are willing to ling to restat from their tha ey rd i 1 rst fom their previous checkpoints, , desis * Otherwise, P, decides that all the processes continue with the activities. Phase Two: +P propagates decison a pce * On receiving P;s decision, the : accor the pres set cd ‘Optimization ae + Aino nanber imum mantra + wl es rm i peomany where the sending of one of mare se Pet SHY if X is rolling back to sat ‘Messages from X to Y is being undone. OR aa gpg 20 upset fr bo pnbuted COMPUIPS aot i Consensus and Recovery 1s Fig, 49.6 shows the unnecessary rolack, =f af — eee Ka] Me iz T Fig. 46 Unnecessary rock EEE] Message Types Intransit message : Messages that have been sent but not yet received Lost messages : Messages whose “send” is done but "receive" is undone due to rollback Delayed messages receiving process was ei ages whose "receive" is not recorded because the 1 down or the message arrived after rollback ive” recorded but message "send" not Orphan messages : Messages wi recorded - do not arise if processes roll back to a consistent global state Duplicate messages : Arise due to message logging and replaying during process recovery. eae ERD issues in Failure Recovery operational state, Once a failure i al + Recovery rofers to restoring a system 10 is norm re ou crerrial that the process where the failure happened

You might also like