0% found this document useful (0 votes)
14 views

Document 32Distributed computing concept

Distributed computing concept

Uploaded by

devikrishna8300
Copyright
© © All Rights Reserved
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
14 views

Document 32Distributed computing concept

Distributed computing concept

Uploaded by

devikrishna8300
Copyright
© © All Rights Reserved
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 16
Consensus and Agreement Algorithms : Problem Definition +” Processes/Sites in distributed systems often compete .as well as cooperate to achieve a common goal. Mutual Trust/agreement is very much required In distributed data bases; there may be a situation where data managers have to decide “Whether to commit or Abort the Transaction”. When there is no failure, reaching an agreement is easy. However, in case of failures, processes must exchange their values with other processes and relay the values received from others several times to isolate the effect of faulty processor: Agreement protocols helps to reach an agreemient in, presence of failures. Examples : Agreeing whether to commit or to abort a transaction in a distributed database management system. Agreeing on a common clock value in a distributed system. i In.the absence of failures or faulty processors, values (that is to be decided) can be exchanged. A vote can be taken aiid’ decision/ agreement can be made based on : majority, minimum vote, mean, etc. Presence of failure : Processors can fail or misbehave intentionally. ‘of message exchanges might be needed before agreement can be reached. Several rounds ate ee ene 1 ERE system Wan alone mn * Agreement problems have been studied under following system model 1. ‘n’ processors in the system with at most m of them being faulty. 2. Processors can exchange messages directly (no need to go through another processor) ( 3 Receiver knows the identity of the sender. i ~ 4 Coniminnication miedium is reliable : messages are deliveréd without errors. ( ERIEI Synchronous vs Asynchronous Computation ‘Synchronous Computation ( 1. Processes run in lock step manner, oF in rounds. ( 2 Step of synchronous computation is called round 3 In each step/round i, processes receive messages that were sent in the previous ( step/round i-1. ( 4. Processes then do some computation and send out messages, that will be received in step/round i + 1. 5: lesoige deligg c' Haw prose owt tones te hdc sd 6. Agreement protocols to be discussed assume synchronous computation. ' Asynchronous Computation ' i Cita dove nak peocead ek ta 2. Process can send receive messages and perform computation at any time. Model of Processor Failures "+ Processor'can fail int three modes : - 1. Crash fault : Processor stops and never resumes operation, 2 Omission fault : Processor Omits to send message to some processors 3. Malicious fault : Also known as Byzantine faults. Processor may send fictitious values /message to other processes to confuse them. Tough to detect/correct. Authenticated vs Non-authenticated Messages + There are two types of messages 1. Authenticated 2. Non-authenticated messages Authenticated Mossages ; * Alto known as signed message * Processor can not forge/change a reccived message. + Processor can ve + Wis easier to reach on an agreement inv this case “Non-authenticated Messages : + Also known as Oral Message. ify the authenticity of the message. * Processor ‘can forge/change a received message and daims to have received it from others. Processor can not verify the authenticity of the message in this cas ERIE] Pertormance Aspects of Agreement Protocols + Fellowing metrics are used 1. Time : No of rounds needed to reach an agreement 2 Message traffic : Number of messages exchanged to reich an agreement. 3. Storage overhead :,Amount of information that needs to stored at processors during execution of the protocol. 1. Compare synchronous vs asynchronous computation, 2 Explain model of processor fiers are Seni reoron revo wen (SynchoorOS 5y Sis enemy, they must decide upon a common plan of action. Some of the may be traitors, trying to pr generals ent the loyal generals from reaching agreement.” + Three or more generals are agree {to attack or to retreat. Once the commander is iseues the order, lieutenants to the commander are to decide to attack or retreat + But the one or more of the rals may be treacherous, ie. faulty + If the commander is treacherous, he Proposes attacking to one general and retreating to another. + Wa lieutenant is treacherous, he tells one of his peers that the commander told tem to attack and another that they are to retreat + Sourwe processor broadcasts its values to others. Solution must meet following adjectives : Agreement: All non-faulty processors agree on. thi sime vale ; Validity : If source is nonfaulty, then the comution agreed Value must bé the value supplied by the source processor. + "WK source is faulty then all non - faulty’ processors can agree value", “Value agreed upon by faulty processors is irelevant” + Fg 4121 shows Byzantine agreement any common Fig. 4.124 * Ne lution for. three processes can handle single ta. Ina system with m ‘aay processes agreement can be achieved onl if there ate vt] (more then 2/3) functioning correctly. ERE consensus Problem \ Frery processor broadcaits ts initial value to all ther processors Intl values may be different for different processor ‘very processor has its own initial value: All non faulty processor must a igre on a single common value. * Winital value of non-faulty Processors is di (2 agree on any common value. * Nalue agreed upon by faulty processors is irrelevant Atseement: All non-aulty processors agree on the same single value. 72 Validity +1 initial value of eve Agreed value by all non faulty W inlal value of nonfaully processors is dilecar then all non PeeSOR AN agree on any common. value Processors is irrelevant. ——— ferent then all non - faulty processors ‘ery non-faulty processor is v , then the common Processors must be ¥. > faulty *. "Value agreed upon by faulty 4.10 AGREEMENT IN A FAILURE -FREE SYSTEM: In a failure-tree system, consensus can be reached by collecting information from the different. processes, arriving at a decision, and distributing this decision in the system. A distributed mechanism would have each process broadcast its values to others, and each process computes the same function on the values received. The decision can be reached by using an application specific function + Algorithms to collect the initial values and then distribute the decision may be based , on the token circulation on a logical ring, or the three-phase tree-based broadcast converge cast: broadcast, or direct communication with all nodes. * Inasynchronous system, this can be done simply in a constant number of rounds i + Further, common knowledge of the decision value can be obtained using an additional round «In an asynchronous system, consensus can similarly be reached in a constant number ' of message hops. A-~ +. Further, concurrent common knowledge of the consensus value can also be attained © 4,11 AGREEMENT IN(MESSAGE PASSING)SYNCHRONOUS SYSTEMS WITH FAILURES: \ 411.1 Consensus algorithm for rash failures (synchronous system): + Consensus algorithm for crash failures message passing synchronous system — « The consensus algorithm for n processes where up to f processes where f 0) checkpoint of process Pp is assigned a sequence number i and is denote’ by Cy : : We also assume that each process F, takes an initial checkpoint Cp 9 immediately before execution begins and ends with a virtual checkpoint that represents the las! state attained before: termination. ; The i" checkpoint interval ‘of process Pp denotes all the computation performed between its i" and (i+1)": checkpoint; including the it" checkpoint but not the! (i+1)"™ checkpoint. Issues in Failure Recovery : Recovery refers to restoring a system to its normal operational ste. Once 4 ee has occu, it resent thatthe proves where the failure happened can recover to a correct state Fundamental to fault tolerance is the recovery from an error. Resources are_allocated to executing, processes in a computer. For example : a process has memory allocated to it and a process may have locked shared resources, such as files and memory. Following are some solution on process recovery, Reclaim resources allocated to process Undo modification made to databases and Restatt the process (Or restart process from point of failure and resume execution distributed process recovery, undo effect’of interactions of failed process with other cooperating processes. peepee Basic Concept 7 System is combination of hardware and software components. These components provide a specified service. Failure of 3 sysiem Sccurs when the system does ‘not perform its service in the manner specified. ‘An erroneous state ofthe system is a state which could lead to a system failure by 2 sequence of valid slate transitions. A system is said to “fail” when it cannot meet it about by the existence of “errors” in the system, A system is said to have a failure if the service it deliv {rom compliance with the system specification for a specif Fig. 44.1 shows concept of fault and recovery. 's promises. A failure is brought fers to the user deviates ied period of time. 77 System failure : System does not meet require stole saci ‘ments, ice does not perform its Leads 10 Fig. 44:1 Concept af recovery 77 Exoncous system state State which could lead ta system file by a sequence of valid sate transitions eA Bre the pat of the sytem state which ifr rom its ited vane 27 Fault: Anomalous physical condition, eg. design erors; manufacturing, problems, damage, external disturbances = | 1 ges Eror b Fault cb EE] checkpoint Based Recovery + The basic idea behind checkpoint-recovér isthe saving and cestoration-of-sysiom state By saving the current state of the system periodically or before ertical code Sections, it provides the baseline information needed for the restoration of lost slate in the event of a system failure: + While the cost of checkpointxecovery cai be high, by using techniques tke memory exclusion, and by designing a system to have as small a critical state as Possible may minimize the cost of checkpointing enough to be useful in even cost sensitive embedded applications. + When a system is checkpointed, the state of the entre system is saved ty” nonvolatile storage, —_ + The checkpointing mechanism takes a snapshot of the system state and stores the 4ata on some non-volatile storage inediumn. + Cheatly, the cos saved and the | of a checkpoint will vary with the amount of state required to be im the event of a system failure, the intemal state of the system can be restore can continui® service from the point at which its state was last saved. Typically this involves restarting the failed task or system, and providing Parameter indicating that there js state to be recovered. Depending on the task complexity, the amount of state, and the bandwidth to the Storage device this process could take from a fraction. of a second to mar seconds. This technique provides protection against the ransent fault suc typical ‘upon state restoration the system Will continue processing in an identical manneg as it did previously. This will tolerate any transient fault, however if the fault was caused by a desig error, then the system will continue to fail and recover endlessly. In some cases, this may be the most important type of fault to guard against, but not in everff case, 1. Uncoordinated Checkpointing i *+ Esch process has autonomiy in deciding when to take checkpoints ‘ + Advantages : The lower runtime overhead during normal execution + Disadvantages e a Domino effect during a recovery Bb. Recovery from a failure is slow because processes need to iterate to find a Consistent set of checkpoints ; ¢ Each process maintains multiple checkpoints and periodically invoke a garbage collection algorithin 4. Not suitable for application with frequent output commits + The. processes record the dependencies among their checkpoints caused by message exchange during failure-free-operation record the dependencies among their checkpoints caused by message exchange during failure free operation. Direct dependency tracking technique + Assume each process P, starts its execution with an initia ckckpoint C, o + I, is a checkpoint interval and itis an interval between Cy xy and C,,,. + When P} receives a message m during I, y it records the dependency from I 1y which is later saved onto stable storage when By takes Cy.” ‘ < t q + Inorder to determine a consistent global checkpoint during recovery, the processes { « 4 ‘ ‘ r + When failure occurs, the recovering, process initiates rollback by broad casting, & dependency request message to collect al the dependency information maintained by each process. 2. Coordinated Checkpointing “*, Cooidinated checkpointing, simplifies faihure recovery and eliminates domino elles in case of failures by Fiseiving a corsistent global checkpoint on stable orage : tv the “approach sulférs fom high overhead associated with the checkpointing process + Two approaches are used to reduce the overhead: fis s fo minimize the nurter of synchronization messages and the numberof checkpoints, the other is to make the checkpooting process nonblosking Blocking Checkpointing + After a process takes local checkpoint, 0 prevent orphan messages, it _semains ‘blocked until the entire checkpointing activity is complete. + Fig 451 shows blocking checkpointing Fig, 45.1 Blocking checkpointing + When a process takes a checkpoint, it engages a protocol fo coordinate with other processes to also checkpoint a) Coordinator takes a checkpoint; broadcasts a message to all processes bb) Braces rectves this message and halts execution: takes tentative checkpoint ©) Coordinator receives acknowledgement (rom all processes; broadcasts comunit message 19 end protocal 4d) Process receives commit message, removes old permanent checkpoint and makes tentative chekpoini permanent ) Processes resume execution + Disadvantages : The comp ‘Non blocking Checkpointing ‘+ The processes need not stop their execution while taking checkpoints. 2 +. Key issue with coordinated checkpointing: Being able lo prevent a process from receiving application messages that could make the checkpoint inconsistent | Problem can be avoided by preceding, the first postcheckpoint message on each channel by a checkpoint request, forcing each process to take a checkpoint upon receiving the first checkpoint-request message J+A fundamental problem in coordinated checkpointing isto prevent a process from receiving application messages that could make the checkpoint inconsistent, ly fig. 452 shows non-blocking checkpoint. ion is blocked during the checkpointing nate Iaiitoe aioe esipoint request Checkpoint request checkpont Po Gs Po Com Pe Gun 's fs Cy Fee ® © @ Fig. 43.2 Non-blockinig checkpoint + The Chandy-Lamport-algorithin is the nonblocking algorithm for coordinated sbeckpointing, . Example of Coordinated Checkpointing - 2) Checkpoint inconsistency + the process Py sent message “in” after receiving a checkpoint request from the checkpoint coordinator. Assume message “m* reaches process [before the checkpoint request, This situation results in. an inconsistent checkpoint since checkpoint C),« shows the receipt of message “m™ fooen Py, while checkpoint Co, dogs not show m being sent from Py ) Solution with FIFO channels : If channels are FIFO, this problem can be avoided by preceding, the first post-checkpoint message on each channel by a checkpoint request, forcing each process 10 take a checkpoint before receiving the first post-checkpoint message. cmmuncaninduced Checkooining Baa ese ends Cicapeting (CK) protese e Papas Oo ns ates Ge ee Fein nahh reciente ri Sas eee ess + It axoids doming cect, while allowing processes to take some oftheir checkpoints = £ Eaeeee ete ined ected feed ns lin i teat Wola! poland SMe arate ee = Eigen av lt ott pa yg lain oa tats cad ws all ons ous Geet see + Comimunication induced — checkpointing . piggybacks protocol and related information on each application’ message + The receiver of each application message uses the piggybacked information to determine if thas to take a forced checkpoint to advance the global recovery line. The forced checkpoint must be takeir before the application may process thé contents of the message.. In contrast with coordinated checkpointing, no special coordination messages are exchanged. * Two_types of | communication induced’ checkpointing are model-based

You might also like