DC (Unit 4)

The document outlines consensus and recovery algorithms in distributed systems, focusing on the Byzantine Agreement Problem and various recovery techniques such as checkpointing and rollback recovery. It discusses the challenges of achieving agreement in both synchronous and asynchronous systems, particularly in the presence of failures. Additionally, it covers the definitions, algorithms, and issues related to failure recovery, including coordinated checkpointing and the complexities involved in distributed systems.

Uploaded by

maniputu11

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF or read online on Scribd

0% found this document useful (0 votes)

13 views14 pages

DC (Unit 4)

Uploaded by

maniputu11

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF or read online on Scribd

You are on page 1/ 14

Consensus and Recovery Syllabus Consensus and Agreement Algorithms : Problem Definition - Overview of Results - Agreement in a Failure-Free System (Synchronous and Asynchronous) - Agreement in Synchronous Systems with * Failures; Check-pointing and Rollback Recovery : Introduction - Background and Definitions ~ Issues in Failure Recovery - Checkpoint-based Recovery - Coordinated Checkpointing Algorithm - Algorithm for Asynchronous Checkpointing and Recovery. Contents 4.1 Consensus and Agreement Algorithms : Problem Definition 4.2. Byzantine Agreement Problem 43 Overview of Results 4.4 Solution to Byzantine Agreement Problem 45 Agreement in a Failure-Free System (Synchronous and Asynchronous) 4.6 Agreement in Synchronous Systems with Failures . 4.7 Introduction of Check-pointing and Rollback Recovery . . 4.8 Background and Definitions 49 Consistent Set of Checkpoints Issues in Failure Recovery Checkpoint-based Recovery Coordinated Checkpointing Algorithm Algorithm for Asynchronous Checkpointing and Recovery Two Marks Questions with Answers Marks 13 Marks 13 May-22, Dec.-22, «+--+ -- Marks 13 a7a 42 oor ob Dion itt HERI Consensus and Agreement Algor in distributed systems oft e a common goal. Mutual Trust/agree len compete 95 well a5 cooperate 4, rent is very much required. jan where data managers have jon”, When there is no Agreement protocols Examples : Aj Several rounds agreement can be reached. [EE] Byzantine Agreement Problem * The Problem : “Several divisions of the Byzantine army are camped outside an y, each division commanded by its own general, After observing, the decide upon a common plan of action, Some of the generals generals from reaching agreement.” Is are agree fo atack oF to retreat, Once the commander i vutenants to the commander 3k or retreat retreating to another. + Ifa lieutenant eacherous, he him to attack and another that they are Of is pers thatthe commander told to retreat * Source processor broadcasts objectives Agreement : its val ves f0 others, Solution must meet followité ty processor Y Processors agree on the same value. TECHNICAL PUBLICATIONS? , 99 woth for knowedgo puted Computing a If source is nonfaulty, then the comy al on agreed value must be the ice is faulty then all non - fa pure 'y processors can agree on any common value”. “Value ageced upon by faulty processor ie erent «Fig 42.1 shows Byzantine agreement, Consensus Problem Me ty processors ty processors is different then all non - ‘can agree on any common value. gree on the same single value. processor is v , then the common alee nota Pe ny processors must be ue rent then all agreed value by © “If initial value of non: e processors can agree on any common vs Processors is irrelevant”. EEEA interactive Consistency Problem i. Every processor has its own ji, All non faulty processors must 9Br°< TECHNICAL PUBLICATIONS reo on a set of common values. ‘an uptist fer hoodail non fully Processors must egg, rement is on a single value. In Byzan 1 is on a set of common values, ment tency problems aer® initializes the value whe lem, only one Processor teas has its own initial value. ‘© In interactive consis it pro processor EEE] overview of Results in asynchronous system even if one Process can fail by ‘Synchronous system Asynchronous system ‘Agreement not attainable “Agreement attainable 3. | Byzantine fare Z [f= (a= 1) 6) byzantine process EEZI solution to Byzantine Agreement Problem + First defined and solved by Lamport + An arbitrary source processor broadcasts its initial value to all others. + If the source processor is faulty, other non-faulty processor can agree on sy ‘common value. + Faulty processors’ values and agreements do not matter, 1 tty pes ae my, he nay proceso can! 0h * Number of faulty processors, cannot exced : tune n-)/3] {This bound canbe relaxed for systems using authenticated messages. + Solution must meet following objectives: \ Agreement: All nonaulty processors agree on the same value. source is nonfaulty, by the source processor, bia Raa" 3” upset for knowledge then the common agreed value must be the val Disiio ee ——— 4-5 [EEE impossible Scenario 1 Consider a system with 3 processors: po, pi, pa Consensus and Recovery Ource) is not faulty. 2 is faulty. pt should agree upon 1 as the value. Not possible, ae ee «Case 2 : p0is faulty. pl may agree on 1 and p2 on 0 Fig. 44.1 processor pO non faulty and processor po faulty EEE] Lamport-Shostak-Pease Al: + This algorithm also known as Oral Message Algorithm OM(m) where m is the number of ‘+’ = Number of processors and n >= 3m+l recursively defined as fllows= ——y ee Lampert ays a 1p ph. ps source pis ay aS Hy Land 0. pppoe oa jal value to be 1. to [p2, p3I- pd sends he faulty one) sends 110 pl and 0 t0 p3. ction at pl and pis 1, which is the desired result, (yy fu algorithm OM| Fig. 442 Examplo 2: © System with 4 proce 2, pB. pO is source, and i ‘Assumption : Possible and 0. + st ial value to be 1 for pl and p3, For p2, it sends a * Step 2: OM(0). pl sends 1 to {p2, pi. p3 sends 1 to {pl,p2) ‘= p2 sends 0 to pl and ps. Step 3: Majority function at 3 Majority * Ply p2, p3 is still the same (1), which is the desired TECHIICAL PuBLICATIONS® +8 wiht for knowledge pare ee ae —omenting ese cry Agreement in a Failure-F and Asynchronous)" S¥8t@m (Synchronous In a failure-free system, consensus can ing information from ing this decision in A distributed mechanism would have each p and each process computes the cast its values to others, received. KONOUS syst number of rounds he decision value can be obtained using an additional round. lop processes in a system, / global constant ; maximum number of crash failures tolerated Integer: x+ local value; Process P, executes the consensus algorithm for up tof crash failures for round from 1 t0 +1.do {if tho curtont value of x haa not been broadcast then broadcast(x); y) ¢ value (if any) received from process j in this round; ‘The agreement cor least one round in which no process failed. ty condition is satisfied because processes do not send fictitious values + The val in this + The termination condition is sen to be satisfied ‘There are { + 1 rounds, where f 0) checkpoint of process Pp is assigned a sequence number i and is denoted by Chi We also assume that each process Fp with a virtual checkpoint that represents the last takes an initial checkpoint C,,o imme« before execution begins and ends state attained before termination. ‘The it® checkpoint interval of process Fy, denotes ll 1) checkpoint, including the i I the computation performed (® checkpoint but not the between its i and +1)" checkpoint. EE] consistent Set of Checkpoints Fig, 4.9.1 shows consistent and incons processes (sites) that interact with one another es checkpoints, All the sites sav one from each TATIONS? = an pth fr howe? TECHNICAL PUBLIC(1) Conaatet state (bp inconsistent stato Fig. 4.9.4 effect Is caused by omphan messages, which in turn are caused by "y psrouted COMPUT an «Fig, 49.3 shows consistent set of checkpoint Consensus and Recovery labelled. 2 Records of lost abo_recelve,M ‘nat (a message mani fiat Jel seny8) TECHNICAL Fig. 49.4 ing counter with which each message from that node is last message fom and the first message to all other nodes. —pusureaTIOnS®- on wast fr MoweConsensus and Recovey (Detrbuted Computing “on Druted Computing Note : “a” denotes a “smallest label” that is < any other label and “II” denotes , “largest label” that is > any other label. ‘Simple method for taking consistent set of checkpoint 4+ The set of the most recent checkpoints is always consistent, EEEEI synchronous checkpointing and Recovery Livelock problem during fecovery is avoided by taking a consistent set of checkpoints ‘Algorithm is said to be synchronous when the processes involved coordinate their local checkpointing actions such that the set of all recent checkpoints in the system guaranteed to be EEE chectpoining agortnm + Make some simplifying assumptions 1. Processes communicate by exchanging messages through channels. “2 Channels are FIFO, end-to-end protocols cope with message loss due to rollback recovery. 3. Communication failures do not partition the network. * A single process invokes the algorithm. The checkpoint and the rollback recovery algorithms are not invoked concurrently. a : A temporary checkpoint that is mad. : is made a permanent checkpoint on the ‘sctsful termination ofthe checkpoint algorithm _ 2 Permanent: A local checkpoint at a process. Phase One * Initiating process P, takes a tentati ive checkpoint : take tentative and requests that all the process + Each process informs P, whether i N succeeded in taking a tentative checkpoint. + IF; leams that all processes ha checkpoint ve taken tentative na tentative checkpoints should be made permanen, “eee * Otherwise F desis tha al etic cecpoints should be disca s ded. TT aetase paeanaP * 8 tet fr inonledge ated Computing tn phase TWO 1. P, propagates its decision to all processes 2. Om receiving the message from P, sl process ac 3 h y act accordingly. + Between tentative checkpoint and comit/abort of ehckpoin process must hold Consensus and Recovery back messages. + Does this guarantee we have a strongly consistent state ? Can you construct an example that shows we can stil have lost messages ? synchronous Checkpointing : Properties, + All processes from which P, has received messages after it has taken its last checkpoint take a checkpoint to record the sending of those messages. + Fig. 49.5 shows the checkpoints taken unnecessarily Fig. 49.5 Checkpoints taken unnecessarily 1. Process X decides to initiate checkpoint aigoritn ater receiving SSG 2 Te take a tentative checkpoints Xp and sends take tentative checkpoint messages to processes ¥ and Z,, causing Y and 3. Now { xp, yp, 22) forms a consist os 4. xy, yp, zi) also forms a consistent sf oF ~ aor Tepe PUBS Z to take checkpoints yz and 2 respectively of checkpoints pointsa Consensus Daeuted Conpuing eM _______RNN ny last message received by X from Y way the last checkpoint (last_recv(x, y) >. Synchronous Checkpointing Disadvantagos al messages must be exchanged to coordinate checkpointing.. progress, algorithm places an unnecessary extra load on the system, which can si affect performance. AZEXA Tho Rollback Recovery Algorithm + Restore the system st Phaso One : Process checks whether al “ Al proess are willing to restart fom thee previous checkpoints, = ae mre # A process may reply “no” if it is alread is already participating in a inting of secoverng Poss insted by some cer proces * StHPointn © + Wall processes are willing to ling to restat from their tha ey rd i 1 rst fom their previous checkpoints, , desis * Otherwise, P, decides that all the processes continue with the activities. Phase Two: +P propagates decison a pce * On receiving P;s decision, the : accor the pres set cd ‘Optimization ae + Aino nanber imum mantra + wl es rm i peomany where the sending of one of mare se Pet SHY if X is rolling back to sat ‘Messages from X to Y is being undone. OR aa gpg 20 upset fr bo pnbuted COMPUIPS aot i Consensus and Recovery 1s Fig, 49.6 shows the unnecessary rolack, =f af — eee Ka] Me iz T Fig. 46 Unnecessary rock EEE] Message Types Intransit message : Messages that have been sent but not yet received Lost messages : Messages whose “send” is done but "receive" is undone due to rollback Delayed messages receiving process was ei ages whose "receive" is not recorded because the 1 down or the message arrived after rollback ive” recorded but message "send" not Orphan messages : Messages wi recorded - do not arise if processes roll back to a consistent global state Duplicate messages : Arise due to message logging and replaying during process recovery. eae ERD issues in Failure Recovery operational state, Once a failure i al + Recovery rofers to restoring a system 10 is norm re ou crerrial that the process where the failure happened

DC - Unit Iv - Consensus and Recovery Notes
No ratings yet
DC - Unit Iv - Consensus and Recovery Notes
33 pages
# Consensus and Agreement Algorithms: Distributed Computing
No ratings yet
# Consensus and Agreement Algorithms: Distributed Computing
9 pages
Byzantine Generals Problem MIT6 - 852JF09 - Lec04 (Ocw - Mit.edu)
No ratings yet
Byzantine Generals Problem MIT6 - 852JF09 - Lec04 (Ocw - Mit.edu)
48 pages
Distributed Computing UNIT-4
No ratings yet
Distributed Computing UNIT-4
27 pages
Distributed UNIT IV
No ratings yet
Distributed UNIT IV
60 pages
Unit 4
No ratings yet
Unit 4
94 pages
1904050001
No ratings yet
1904050001
119 pages
DC Unit 4 Summary Distributed Computing
No ratings yet
DC Unit 4 Summary Distributed Computing
15 pages
Concurrent Checkpointing and Recovery in Distributed Systems
No ratings yet
Concurrent Checkpointing and Recovery in Distributed Systems
61 pages
DC MergedPostMidSem
No ratings yet
DC MergedPostMidSem
262 pages
DC - Unit IV
No ratings yet
DC - Unit IV
36 pages
Checkpointing and Rollback Recovery For Distributed Systems 5cvcuy5txm
No ratings yet
Checkpointing and Rollback Recovery For Distributed Systems 5cvcuy5txm
23 pages
Unit Iv
No ratings yet
Unit Iv
46 pages
Byzantine Agreement: Distributed Systems
No ratings yet
Byzantine Agreement: Distributed Systems
32 pages
Distributed Systems - Agreement Protocol
No ratings yet
Distributed Systems - Agreement Protocol
17 pages
Presentation On Consistent Checkpoints & Recovery in Distributed System
100% (1)
Presentation On Consistent Checkpoints & Recovery in Distributed System
26 pages
Dc-3551 Unit IV Notes
No ratings yet
Dc-3551 Unit IV Notes
32 pages
DC - Unit 4 Latest
No ratings yet
DC - Unit 4 Latest
110 pages
Document 32distributed Computing Concept
No ratings yet
Document 32distributed Computing Concept
16 pages
Agreement Protocols-I
No ratings yet
Agreement Protocols-I
38 pages
Unit Iv Consensus and Recovery
No ratings yet
Unit Iv Consensus and Recovery
38 pages
Lm1-Consensus Algorithm
No ratings yet
Lm1-Consensus Algorithm
35 pages
4th Unit Topics Recovery
No ratings yet
4th Unit Topics Recovery
73 pages
Chapter 15
No ratings yet
Chapter 15
33 pages
DC Unit IV
No ratings yet
DC Unit IV
37 pages
UNIT 4 DC Final
No ratings yet
UNIT 4 DC Final
38 pages
4.2.1. Problem Definition
No ratings yet
4.2.1. Problem Definition
14 pages
Da10 Byzantine
No ratings yet
Da10 Byzantine
28 pages
DC Unit 4 Important
No ratings yet
DC Unit 4 Important
6 pages
Chapter 8-Fault Tolerance
100% (1)
Chapter 8-Fault Tolerance
71 pages
0 - Chap-8 Agreement Protocols
No ratings yet
0 - Chap-8 Agreement Protocols
9 pages
Ch8 Distributed
No ratings yet
Ch8 Distributed
12 pages
Lecture 18: Distributed Agreement: CSC 469H1F / CSC 2208H1F Fall 2007 Angela Demke Brown
No ratings yet
Lecture 18: Distributed Agreement: CSC 469H1F / CSC 2208H1F Fall 2007 Angela Demke Brown
35 pages
Unit 4 Final-1
No ratings yet
Unit 4 Final-1
25 pages
Process Resilience: by Ravalika Pola
No ratings yet
Process Resilience: by Ravalika Pola
17 pages
Unit 4 Part 1 B
No ratings yet
Unit 4 Part 1 B
31 pages
CS 425 / ECE 428 Distributed Systems Fall 2016: Lecture 16-A: Impossibility of Consensus
No ratings yet
CS 425 / ECE 428 Distributed Systems Fall 2016: Lecture 16-A: Impossibility of Consensus
40 pages
Module 5
No ratings yet
Module 5
11 pages
Module 5 Notes
No ratings yet
Module 5 Notes
10 pages
CH 4
No ratings yet
CH 4
25 pages
Agreement in A Failure-Free System: Rohini College of Engineering & Technology
No ratings yet
Agreement in A Failure-Free System: Rohini College of Engineering & Technology
6 pages
15 - Algorithm For Agreement
No ratings yet
15 - Algorithm For Agreement
3 pages
CS8603 UNIT 4 Agreement in A Failure Free System
No ratings yet
CS8603 UNIT 4 Agreement in A Failure Free System
37 pages
Unit5 Compressed Fault Tolerance - PACE
No ratings yet
Unit5 Compressed Fault Tolerance - PACE
11 pages
Another Advantage of Free Choice: Completely Asynchronous Agreement Protocols (Extended Abstract)
No ratings yet
Another Advantage of Free Choice: Completely Asynchronous Agreement Protocols (Extended Abstract)
5 pages
Impossibility of Distributed Consensus With One Faulty Process
No ratings yet
Impossibility of Distributed Consensus With One Faulty Process
7 pages
Fault Tolerance: Click To Add Text Dealing Successfully With Partial System. Key Technique: Redundancy
No ratings yet
Fault Tolerance: Click To Add Text Dealing Successfully With Partial System. Key Technique: Redundancy
48 pages
Unit 4
No ratings yet
Unit 4
11 pages
Synchronous Systems With Failures
No ratings yet
Synchronous Systems With Failures
9 pages
Agreement in Synchronous Systems With Failures
No ratings yet
Agreement in Synchronous Systems With Failures
8 pages
Consensus Failure
No ratings yet
Consensus Failure
79 pages
System Recovery
No ratings yet
System Recovery
38 pages
Manjula - RES
No ratings yet
Manjula - RES
10 pages
Nikil DS Report
No ratings yet
Nikil DS Report
4 pages
Lm3 Checkpointing Algorithm
No ratings yet
Lm3 Checkpointing Algorithm
40 pages
Fault Tolerance:-: Introduction, Process Resilience, Distributed Commit, Recovery
No ratings yet
Fault Tolerance:-: Introduction, Process Resilience, Distributed Commit, Recovery
52 pages
Aggrement Protocols
No ratings yet
Aggrement Protocols
17 pages
Agreement Protocols
No ratings yet
Agreement Protocols
17 pages
Consensus & Agreement: Arvind Krishnamurthy Fall 2003
No ratings yet
Consensus & Agreement: Arvind Krishnamurthy Fall 2003
41 pages
TechSurfCollegeInvite A4 June2025
No ratings yet
TechSurfCollegeInvite A4 June2025
1 page
Compiler Design - YesDee
No ratings yet
Compiler Design - YesDee
427 pages
DC (Unit 3)
No ratings yet
DC (Unit 3)
12 pages
Important Questions CN
No ratings yet
Important Questions CN
9 pages
Operating System
No ratings yet
Operating System
5 pages

DC (Unit 4)

Uploaded by

DC (Unit 4)

Uploaded by

You might also like