
Quorum

Quorum based protocols

Uploaded by

Eugene Gitonga
A QUORUM-BASED COMMIT PROTOCOL

Dale Skeen

TR 82-483
February 1982

Department of Computer Science
Cornell University
Ithaca, New York 14853

Abstract

Herein, we propose a commit protocol and an associated recovery protocol that are resilient to site failures, lost messages, and network partitioning. The protocols do not require that a failure be correctly identified or even detected. The only potential effect of undetected failures is a degradation in performance. The protocols use a weighted voting scheme that supports an arbitrary degree of data replication (including none) and allows unilateral aborts by any site. This last property facilitates the integration of these protocols with concurrency control protocols. Both protocols are centralized protocols with low message overhead.

1. Introduction

A transaction is, by definition, an atomic operation on a distributed database system. Either all changes by the transaction are permanently installed in the database, in which case the transaction is said to be committed, or no changes persist, in which case the transaction is said to be aborted. It is the task of a commit protocol to ensure that a transaction is atomically executed.

In this paper we propose a commit protocol that is resilient to multiple occurrences of the following classes of benevolent failures: arbitrary site failures, lost messages, and network partitioning. It does not require that the type of failure be correctly determined; in fact, resiliency is guaranteed even if failures go undetected.

The protocol uses a weighted voting scheme to resolve conflicts during failures. When failures occur, a transaction is committed only if a minimum number of votes, called a commit quorum and denoted V_c, are cast for committing.
Similarly, in the presence of failures, a transaction will be aborted only if a minimum number of votes, called an abort quorum and denoted V_a, are cast for aborting. A commit quorum does not have to equal an abort quorum, but their sum must exceed the total number of votes.

Voting schemes have been proposed previously for transaction management. Thomas introduced a majority voting scheme to ensure consistency in a fully replicated database ([THOM79]). Gifford extended the scheme by assigning weights to sites and using quorums rather than a simple majority ([GIFF79]). The proposed protocol differs from the previous work in several important ways:

(1) It is a commit protocol, not a concurrency control scheme. It provides atomicity on a per-transaction basis. Nonetheless, it is straightforward to integrate any type of concurrency control protocol into this protocol.

(2) It allows unilateral aborts during the first phase of the transaction. A site may decide to abort for any of several reasons; for example, a deadlock may be detected locally.

(3) It is primarily intended for partially replicated distributed databases where a transaction can read from any copy but must update all copies.

In addition, the protocol exhibits the following properties:

(1) It is a centralized protocol and, thus, benefits from the economy of centralized protocols.

(2) In the absence of failures it is no more expensive than previously proposed protocols that are resilient only to coordinator failures (and not to a partitioning of the network).

(3) If all failures are eventually repaired, then the protocol will eventually terminate.

(4) It is a blocking protocol -- operational sites must occasionally wait until a failure is repaired. This is an undesirable but necessary property exhibited by any protocol that is resilient to network partitioning ([SKEE81a]). However, the protocol can be tuned so that the frequency of blocking is low.

This paper is divided into six sections.
The second section states our assumptions and defines the terminology used in the remainder of the paper. The third section develops a resilient quorum-based commit protocol, and the fourth section develops a resilient quorum-based recovery protocol. The recovery protocol is invoked whenever a group of sites can no longer communicate with the original coordinator (either it has failed or the network has partitioned). Like the commit protocol, it is a centralized protocol. The fifth section discusses performance, and the sixth section concludes the paper.

Although the protocols proposed are resilient to many classes of failures, this paper will focus on the problem of network partitioning. This class of failures is generally agreed to be the most difficult class to handle. The other two classes, site failures and lost messages, can be cast as special cases of a partitioned network. In a site failure, a single site is isolated (partitioned) from the remainder of the network. A lost message can be viewed as a very short-lived partitioning. In all cases, the protocols work without modification.

2. Background

We assume that an underlying communications network provides point-to-point communication between any pair of sites. We also assume that it generates no spontaneous messages, and that garbled messages are detected and deleted. We do not assume that messages arrive in order, nor that the network detects lost messages.

A partitioned network occurs when there are two or more disjoint groups of sites such that no communication is possible between the groups. Each of the disjoint groups is called a partition.

A distributed transaction T is decomposed into subtransactions T_1, T_2, ..., T_N, where each subtransaction is executed at one of the N participating sites. Any subtransaction can be unilaterally aborted, which results in the abortion of the entire transaction. Hence, for transaction T to be committed, all sites must agree to commit their subtransactions.
We assume that a subtransaction can be atomically executed by a local transaction management system ([GRAY79, LIND79]).

It is the responsibility of a commit protocol to ensure that all subtransactions are consistently committed or aborted. One of the simplest commit protocols is the two-phase protocol ([GRAY79, LAMP76]) depicted in Figure 1. The protocol uses a central site, the coordinator, to direct the execution of the transaction at the other sites. Each slave has a chance to abort the transaction by replying with a "no" in the first round.

     COORDINATOR                          SLAVE

(1)  Transaction is received.
     Subtransactions are sent             Subtransaction is received.
     to each slave.                       A reply is sent: yes to commit,
                                          no to abort.

(2)  If all sites respond yes,
     then commit is sent;                 Either commit or abort is
     else, abort is sent.                 received and processed.

Figure 1. The two-phase commit protocol.

Figure 2. The state diagram for the two-phase commit protocol.

A commit protocol can be conveniently described by a set of state diagrams, one for each participating site ([SKEE81a]). The diagram for Site i describes the processing of subtransaction T_i. A state in the diagram is called a local transaction state. In the two-phase commit protocol, a single state diagram (illustrated in Figure 2) suffices to describe processing at all sites. For both the coordinator and the slaves, there are four distinct and easily identified local transaction states: the initial state (state q in the diagram), the wait state (w), the abort state (a), and the commit state (c). A site occupies the initial state until it decides whether to unilaterally abort the transaction. If the site decides against an abort, then the wait state is entered. This state represents a period of uncertainty for the site, during which it has agreed to proceed with the transaction but does not yet know its outcome (i.e., committed or aborted). The commit and abort states are self-explanatory.
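The two rounds of Figure 1 and the four local states (q, w, a, c) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: there are no failures, no lost messages, and no real message passing, and the names `State` and `two_phase_commit` are hypothetical rather than taken from the paper.

```python
from enum import Enum


class State(Enum):
    """Local transaction states of the two-phase protocol."""
    INITIAL = "q"
    WAIT = "w"
    ABORT = "a"
    COMMIT = "c"


def two_phase_commit(votes):
    """Run one failure-free execution of two-phase commit.

    votes maps a site name to True ("yes") or False ("no").
    Returns the coordinator's decision and the final state of each site.
    """
    # Round 1: every site starts in q. A "no" reply is a unilateral
    # abort (q -> a); a "yes" reply moves the site to the wait state (q -> w).
    states = {site: State.WAIT if yes else State.ABORT
              for site, yes in votes.items()}

    # Round 2: the coordinator commits only if all sites replied yes.
    decision = State.COMMIT if all(votes.values()) else State.ABORT

    # Sites waiting in w adopt the decision; sites already in a stay there.
    for site, state in states.items():
        if state is State.WAIT:
            states[site] = decision
    return decision, states


if __name__ == "__main__":
    print(two_phase_commit({"site1": True, "site2": True}))
    print(two_phase_commit({"site1": True, "site2": False}))
```

The sketch deliberately omits the case that motivates the rest of the paper: a slave in the wait state that loses contact with the coordinator has no rule for leaving w.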
The local transaction states of any protocol form two disjoint subsets: the committable states and the noncommittable states. A site occupies a committable state only if all sites have agreed to proceed with the transaction. For example, the only committable state in the two-phase commit protocol is the commit state. A state that is not a committable state is a noncommittable state.

3. A Resilient Commit Protocol

The two-phase commit protocol is not a very robust protocol. Whenever the coordinator fails or becomes partitioned from the slaves, the slaves must block until the failure can be repaired. In this section we develop a very resilient commit protocol that allows recovery from both of these types of failures. The section develops the commit protocol in detail; the next section discusses the associated recovery protocols for handling coordinator failures and partitioning.

Each site is assigned an integral nonnegative number of votes. (The number can be 0, in which case the site is a passive participant.) The basic idea is that whenever a group of communicating sites establishes a quorum, they are allowed to proceed. There are two distinct types of quorums: a commit quorum and an abort quorum. Let V denote the total number of votes, V_c the number required for a commit quorum, and V_a the number required for an abort quorum. A resilient quorum-based protocol must obey the following properties ([SKEE81c]):

(1) V_a + V_c > V, where 0 [...]

[...] V_a. One argument concerns protocols allowing unilateral aborts: if a significant number of transactions are unilaterally aborted, then clearly V_a should be smaller. A stronger argument is that most site failures are expected to occur during Phase 1 of the commit protocol, since most of the transaction execution time is spent in Phase 1. This phase is time consuming because the majority of the data processing takes place during it, whereas Phase 2 and Phase 3 synchronize state information among the sites and require very little local processing.
If sites fail during Phase 1, then the transaction must be aborted -- hence, it should be easy to abort.

An interesting heuristic for choosing V_a is based on a rough estimate of the failure distribution of the sites. This heuristic is useful in environments where site failures, rather than network partitions, predominate. Let P(V_a) be the probability that at least an abort quorum is operational. P(V_a) is a decreasing function of V_a. The point is to choose the maximum V_a such that V_a <= V_c and P(V_a) exceeds a minimum level of desired availability.

As mentioned before, the weight of a site can be zero, in which case the site contributes nothing toward forming a quorum. (However, such a site can still unilaterally abort the transaction.) When designing a protocol, a zero-weighted site can be eliminated from all phases requiring the formation of a quorum. In the extreme case, where only a single site has a nonzero weight, a quorum-based commit protocol degenerates into the standard two-phase protocol with all of its disadvantages. Specifically, all sites must block on the failure of the only nonzero-weighted site (which is normally the coordinator).

6. Conclusion

The use of quorums is a standard recovery technique for handling network partitioning (even primary site schemes, e.g. [STON79], are a degenerate case of using quorums). We have presented a very general quorum-based commit protocol that can be used with both replicated and nonreplicated data. Unlike previous schemes, it allows a single site to unilaterally abort the transaction.

Quorum-based protocols are resilient because a site is allowed to participate in only one type of quorum. Quorum sizes are carefully chosen such that the formation of both a commit and an abort quorum requires the participation of a common site. In this way mutual exclusion is assured -- only one type of quorum can be formed during the execution of a transaction.
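The mutual-exclusion argument can be checked by brute force on a small example. The sketch below is illustrative only: the function names, site names, and vote weights are all made up, and it tests just the rule stated in the introduction, that the commit and abort quorums must sum to more than the total number of votes. Since each site may join only one type of quorum, the question is whether two disjoint groups of sites could reach V_c and V_a at the same time.

```python
from itertools import combinations


def satisfies_quorum_rule(weights, v_c, v_a):
    """Check that the two quorum sizes sum to more than the total votes."""
    return v_a + v_c > sum(weights.values())


def disjoint_quorums_exist(weights, v_c, v_a):
    """Return True if a commit quorum and an abort quorum could be formed
    from two disjoint groups of sites (which would break consistency)."""
    sites = list(weights)
    for size in range(len(sites) + 1):
        for commit_group in combinations(sites, size):
            if sum(weights[s] for s in commit_group) < v_c:
                continue  # this group cannot form a commit quorum
            # Weights are nonnegative, so the strongest disjoint abort
            # attempt is the set of all remaining sites.
            remaining = sum(weights[s] for s in sites if s not in commit_group)
            if remaining >= v_a:
                return True
    return False


if __name__ == "__main__":
    weights = {"A": 2, "B": 1, "C": 1, "D": 1}  # hypothetical vote assignment
    for v_c, v_a in [(3, 3), (4, 2), (3, 2)]:
        print(v_c, v_a,
              satisfies_quorum_rule(weights, v_c, v_a),
              disjoint_quorums_exist(weights, v_c, v_a))
```

With five votes in total, the pairs (3, 3) and (4, 2) satisfy the rule and admit no disjoint quorums, while (3, 2) violates it: sites A and B can reach a commit quorum while C and D reach an abort quorum.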
(However, it is possible for multiple occurrences of a single type of quorum to be formed. For example, since abort quorums are usually small, more than one can be formed concurrently.) In such a scheme the concurrent execution of several coordinators, even if they are within the same partition, does not destroy consistency.

When a new coordinator is elected in the proposed recovery protocol, it polls all sites about their current local state. In making a commit decision, only the replies from the latest poll are used -- information obtained in earlier polls is ignored. Less conservative approaches that use previous information can be found in [SKEE81c].

REFERENCES

[ALSB76] Alsberg, P. and Day, J., "A Principle for Resilient Sharing of Distributed Resources," Proc. 2nd International Conference on Software Engineering, San Francisco, Ca., October 1976.

[GARC79] Garcia-Molina, Hector, Ph.D. Thesis, Stanford University, 1979.

[GARC81] Garcia-Molina, Hector, "Elections in a Distributed Computing System," TR No. 280, Princeton University, December 1980.

[GIFF79] Gifford, David, "Weighted Voting for Replicated Data," Operating Systems Review, 13, 5, Dec. 1979, pp. 150-9.

[GRAY79] Gray, J. N., "Notes on Database Operating Systems," in Operating Systems: An Advanced Course, Springer-Verlag, 1979.

[HAMM79] Hammer, M. and Shipman, D., "Reliability Mechanisms for SDD-1: A System for Distributed Databases," Computer Corporation of America, Cambridge, Mass., July 1979.

[LAMP76] Lampson, B. and Sturgis, H., "Crash Recovery in a Distributed Storage System," Tech. Report, Computer Science Laboratory, Xerox PARC, Palo Alto, California, 1976.

[LIND79] Lindsay, B.G. et al., "Notes on Distributed Databases," IBM Research Report, no. RJ2571 (July 1979).

[SKEE81a] Skeen, D. and M. Stonebraker, "A Formal Model of Crash Recovery in a Distributed System," IEEE Transactions on Software Engineering, (to appear).

[SKEE81b] Skeen, D., "Nonblocking Commit Protocols," SIGMOD International Conf. on Management of Data, Ann Arbor, Michigan, 1981.

[SKEE81c] Skeen, D., "Crash Recovery in a Distributed Database System," Ph.D. Thesis, University of California, Berkeley (in preparation).

[STON79] Stonebraker, M., "Concurrency Control and Consistency of Multiple Copies in Distributed INGRES," IEEE Transactions on Software Engineering, May 1979.

[THOM79] Thomas, Robert, "A Majority Consensus Approach to Concurrency Control," Transactions on Database Systems, 4, 2, June 1979.
