Lecture 2 Ho PDF


Outline

Q Introduction
Q Distributed DBMS Architecture
Q Distributed Database Design
Q Distributed Query Processing
Q Distributed Concurrency Control
¯ Transaction Concepts & Models
¯ Serializability
¯ Distributed Concurrency Control Protocols

Q Distributed Reliability Protocols

Distributed DBMS 1

Transaction
A transaction is a collection of actions that make consistent
transformations of system states while preserving system
consistency.
¯ concurrency transparency
¯ failure transparency

[Figure: timeline of a transaction. The database is in a consistent state at Begin Transaction, may be temporarily in an inconsistent state during execution of the transaction, and is again in a consistent state at End Transaction.]

Example Database

Consider an airline reservation example with the relations:

FLIGHT(FNO, DATE, SRC, DEST, STSOLD, CAP)
CUST(CNAME, ADDR, BAL)
FC(FNO, DATE, CNAME, SPECIAL)


Example Transaction

Begin_transaction Reservation
begin
    input(flight_no, date, customer_name);
    EXEC SQL UPDATE FLIGHT
        SET STSOLD = STSOLD + 1
        WHERE FNO = flight_no AND DATE = date;
    EXEC SQL INSERT
        INTO FC(FNO, DATE, CNAME, SPECIAL)
        VALUES (flight_no, date, customer_name, null);
    output(“reservation completed”);
end . {Reservation}

Termination of Transactions
Begin_transaction Reservation
begin
    input(flight_no, date, customer_name);
    EXEC SQL SELECT STSOLD, CAP
        INTO temp1, temp2
        FROM FLIGHT
        WHERE FNO = flight_no AND DATE = date;
    if temp1 = temp2 then
        output(“no free seats”);
        Abort
    else
        EXEC SQL UPDATE FLIGHT
            SET STSOLD = STSOLD + 1
            WHERE FNO = flight_no AND DATE = date;
        EXEC SQL INSERT
            INTO FC(FNO, DATE, CNAME, SPECIAL)
            VALUES (flight_no, date, customer_name, null);
        Commit
        output(“reservation completed”)
    endif
end . {Reservation}
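
The Reservation transaction can be tried out with a small runnable sketch. It assumes SQLite through Python's sqlite3 module in place of embedded SQL; the helper names make_db and reserve are hypothetical, and Abort and Commit are mapped to ROLLBACK and COMMIT.

```python
import sqlite3


def make_db():
    """Create an in-memory database holding one flight with a single seat."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # transaction boundaries below are exactly the ones we issue ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE FLIGHT(FNO, DATE, SRC, DEST, STSOLD, CAP)")
    conn.execute("CREATE TABLE FC(FNO, DATE, CNAME, SPECIAL)")
    conn.execute("INSERT INTO FLIGHT VALUES ('F1', '2024-01-01', 'YYZ', 'CDG', 0, 1)")
    return conn


def reserve(conn, flight_no, date, customer_name):
    """Run the Reservation transaction; True means Commit, False means Abort."""
    cur = conn.cursor()
    cur.execute("BEGIN")
    try:
        cur.execute(
            "SELECT STSOLD, CAP FROM FLIGHT WHERE FNO = ? AND DATE = ?",
            (flight_no, date),
        )
        row = cur.fetchone()
        if row is None or row[0] >= row[1]:
            cur.execute("ROLLBACK")  # Abort: no such flight or no free seats
            return False
        cur.execute(
            "UPDATE FLIGHT SET STSOLD = STSOLD + 1 WHERE FNO = ? AND DATE = ?",
            (flight_no, date),
        )
        cur.execute(
            "INSERT INTO FC(FNO, DATE, CNAME, SPECIAL) VALUES (?, ?, ?, NULL)",
            (flight_no, date, customer_name),
        )
        cur.execute("COMMIT")
        return True
    except sqlite3.Error:
        cur.execute("ROLLBACK")  # atomicity: undo partial work on any error
        raise
```

With a capacity of one seat, the first call to reserve commits and the second aborts, leaving FLIGHT and FC unchanged by the aborted attempt.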

Properties of Transactions
ATOMICITY
¯ all or nothing

CONSISTENCY
¯ no violation of integrity constraints

ISOLATION
¯ concurrent changes invisible ⇒ serializable

DURABILITY
¯ committed updates persist

Transactions Provide…

Q Atomic and reliable execution in the presence of failures

Q Correct execution in the presence of multiple user accesses

Q Correct management of replicas (if they support it)


Transaction Processing Issues


Q Transaction structure (usually called transaction model)
¯ Flat (simple), nested

Q Internal database consistency
¯ Semantic data control (integrity enforcement) algorithms

Q Reliability protocols
¯ Atomicity & Durability
¯ Local recovery protocols
¯ Global commit protocols

Transaction Processing Issues

Q Concurrency control algorithms
¯ How to synchronize concurrent transaction executions (correctness criterion)
¯ Intra-transaction consistency, Isolation

Q Replica control protocols
¯ How to control the mutual consistency of replicated data
¯ One-copy equivalence and ROWA (read-one/write-all)


Architecture Revisited
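[Figure: the distributed execution monitor. The Transaction Manager (TM) receives Begin_transaction, Read, Write, Commit and Abort and returns results; it communicates with other TMs and passes scheduling/descheduling requests to the Scheduler (SC), which communicates with other SCs and forwards operations to the data processor.]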

Centralized Transaction Execution

[Figure: centralized transaction execution. User applications send Begin_Transaction, Read, Write, Abort and EOT to the Transaction Manager (TM) and receive results and user notifications. The TM passes Read, Write, Abort and EOT to the Scheduler (SC), which passes scheduled operations to the Recovery Manager (RM); results flow back up the same path.]


Distributed Transaction Execution

[Figure: distributed transaction execution. The user application sends Begin_transaction, Read, Write, EOT and Abort to a TM and receives results and user notifications. The TMs at different sites cooperate through the distributed transaction execution model and the replica control protocol, the SCs through the distributed concurrency control protocol, and each RM runs a local recovery protocol.]

Concurrency Control
Q The problem of synchronizing concurrent
transactions such that the consistency of the
database is maintained while, at the same time,
maximum degree of concurrency is achieved.
Q Anomalies:
¯ Lost updates
X The effects of some transactions are not reflected on the
database.
¯ Inconsistent retrievals
X A transaction, if it reads the same data item more than
once, should always read the same value.


Execution Schedule (or History)


Q An order in which the operations of a set of
transactions are executed.
Q A schedule (history) can be defined as a partial
order over the operations of a set of transactions.

T1: Read(x), Write(x), Commit
T2: Write(x), Write(y), Read(z), Commit
T3: Read(x), Read(y), Read(z), Commit

H1={W2(x),R1(x), R3(x),W1(x),C1,W2(y),R3(y),R2(z),C2,R3(z),C3}

Serial History
Q All the actions of a transaction occur
consecutively.
Q No interleaving of transaction operations.
Q If each transaction is consistent (obeys integrity
rules), then the database is guaranteed to be
consistent at the end of executing a serial history.

T1: Read(x), Write(x), Commit
T2: Write(x), Write(y), Read(z), Commit
T3: Read(x), Read(y), Read(z), Commit

Hs={W2(x),W2(y),R2(z),C2,R1(x),W1(x),C1,R3(x),R3(y),R3(z),C3}


Serializable History
Q Transactions execute concurrently, but the net
effect of the resulting history upon the database is
equivalent to some serial history.
Q Equivalent with respect to what?
¯ Conflict equivalence: the relative order of execution of the
conflicting operations belonging to unaborted transactions in
two histories are the same.
¯ Conflicting operations: two incompatible operations (e.g.,
Read and Write) conflict if they both access the same data
item.
X Incompatible operations of each transaction are assumed to
conflict; do not change their execution order.
X If two operations from two different transactions conflict,
the corresponding transactions are also said to conflict.
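
Conflict serializability can be tested mechanically: draw an edge Ti → Tj whenever an operation of Ti conflicts with and precedes an operation of Tj, then check that the resulting precedence graph has no cycle. The sketch below is one way to do this; the function names are hypothetical, histories are lists of strings such as "W2(x)", and all transactions are assumed to commit.

```python
def parse(history):
    """Turn 'W2(x)' into ('W', 2, 'x'); commit entries such as 'C1' are skipped."""
    ops = []
    for token in history:
        if token[0] in "RW":
            txn, item = token[1:].rstrip(")").split("(")
            ops.append((token[0], int(txn), item))
    return ops


def precedence_edges(history):
    """Edges (Ti, Tj): an operation of Ti conflicts with a later one of Tj."""
    ops = parse(history)
    edges = set()
    for i, (kind1, txn1, item1) in enumerate(ops):
        for kind2, txn2, item2 in ops[i + 1:]:
            # Two operations conflict if they come from different transactions,
            # touch the same item, and at least one of them is a Write.
            if txn1 != txn2 and item1 == item2 and "W" in (kind1, kind2):
                edges.add((txn1, txn2))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


def is_conflict_serializable(history):
    return not has_cycle(precedence_edges(history))
```

Applied to H1 from the Execution Schedule slide, the edges are T2 → T1, T2 → T3 and T3 → T1, so H1 is serializable in the order T2, T3, T1.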

Serializability in Distributed
DBMS
Q Somewhat more involved. Two histories have to
be considered:
¯ local histories
¯ global history

Q For global transactions (i.e., global history) to be serializable, two conditions are necessary:
¯ Each local history should be serializable.
¯ Two conflicting operations should be in the same relative order in all of the local histories where they appear together.


Global Non-serializability
T1: Read(x), x ← x + 5, Write(x), Commit
T2: Read(x), x ← x ∗ 15, Write(x), Commit

The following two local histories are individually serializable (in fact serial), but the two transactions are not globally serializable.
LH1={R1(x),W1(x),C1,R2(x),W2(x),C2}
LH2={R2(x),W2(x),C2,R1(x),W1(x),C1}
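
The second condition for global serializability (the same relative order of conflicting operations at every site) can be checked with a short sketch. It assumes each local history is a list of (kind, transaction, item) tuples with commits omitted; the function names are hypothetical.

```python
def conflict_edges(history):
    """Edges (Ti, Tj): an operation of Ti conflicts with a later one of Tj."""
    edges = set()
    for i, (kind1, txn1, item1) in enumerate(history):
        for kind2, txn2, item2 in history[i + 1:]:
            if txn1 != txn2 and item1 == item2 and "W" in (kind1, kind2):
                edges.add((txn1, txn2))
    return edges


def same_relative_order(local_histories):
    """True if no pair of transactions is ordered one way at one site
    and the opposite way at another (or at the same) site."""
    edges = set().union(*(conflict_edges(h) for h in local_histories))
    return not any((dst, src) in edges for src, dst in edges)


LH1 = [("R", 1, "x"), ("W", 1, "x"), ("R", 2, "x"), ("W", 2, "x")]
LH2 = [("R", 2, "x"), ("W", 2, "x"), ("R", 1, "x"), ("W", 1, "x")]
```

LH1 orders T1 before T2 and LH2 orders T2 before T1, so same_relative_order([LH1, LH2]) is False even though each local history is serial.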

Concurrency Control
Algorithms
Q Pessimistic
¯ Two-Phase Locking-based (2PL)
X Centralized (primary site) 2PL
X Primary copy 2PL
X Distributed 2PL
¯ Timestamp Ordering (TO)
X Basic TO
X Multiversion TO
X Conservative TO
¯ Hybrid

Q Optimistic
¯ Locking-based
¯ Timestamp ordering-based


Locking-Based Algorithms
Q Transactions indicate their intentions by requesting
locks from the scheduler (called lock manager).
Q Locks are either read lock (rl) [also called shared
lock] or write lock (wl) [also called exclusive lock]
Q Read locks and write locks conflict (because Read and Write operations are incompatible):

        rl     wl
  rl    yes    no
  wl    no     no
Q Locking works nicely to allow concurrent processing
of transactions.

Two-Phase Locking (2PL)
1. A transaction locks an object before using it.
2. When an object is locked by another transaction, the requesting transaction must wait.
3. When a transaction releases a lock, it may not request another lock.

[Figure: number of locks held over time, from BEGIN to END. In Phase 1 the transaction obtains locks up to the lock point; in Phase 2 it releases them.]
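
A minimal sketch of a lock manager that applies the rl/wl compatibility matrix and enforces the two-phase rule follows. The class and method names are hypothetical, a denied request simply returns False instead of queueing the requester, and deadlock handling is left out.

```python
class TwoPhaseViolation(Exception):
    """Raised when a transaction requests a lock after releasing one."""


# Compatibility matrix: only two read locks are compatible.
COMPATIBLE = {
    ("rl", "rl"): True,
    ("rl", "wl"): False,
    ("wl", "rl"): False,
    ("wl", "wl"): False,
}


class LockManager:
    def __init__(self):
        self.held = {}          # item -> list of (txn, mode)
        self.shrinking = set()  # transactions past their lock point

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if txn must wait."""
        if txn in self.shrinking:
            raise TwoPhaseViolation(f"{txn} already released a lock")
        holders = self.held.setdefault(item, [])
        # Grant only if every lock held by another transaction is compatible.
        if all(t == txn or COMPATIBLE[(m, mode)] for t, m in holders):
            holders.append((txn, mode))
            return True
        return False

    def release(self, txn, item):
        """Release txn's locks on item; txn enters its shrinking phase."""
        self.shrinking.add(txn)
        self.held[item] = [(t, m) for t, m in self.held.get(item, []) if t != txn]
```

Strict 2PL corresponds to calling release only at commit or abort, for all items at once.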

Strict 2PL
Hold locks until the end.

[Figure: number of locks held over the transaction duration. Locks are obtained during the period of data item use and are all released together at END.]

Centralized 2PL
Q There is only one 2PL scheduler in the distributed system.
Q Lock requests are issued to the central scheduler.
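[Figure: message flow in centralized 2PL among the data processors at participating sites, the coordinating TM and the central site LM: (1) Lock Request from the coordinating TM to the central LM, (2) Lock Granted back to the TM, (3) Operation from the TM to the data processors, (4) End of Operation back to the TM, (5) Release Locks from the TM to the central LM.]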


Distributed 2PL
Q 2PL schedulers are placed at each site. Each
scheduler handles lock requests for data at that
site.
Q A transaction may read any of the replicated
copies of item x, by obtaining a read lock on one
of the copies of x. Writing into x requires
obtaining write locks for all copies of x.

Distributed 2PL Execution
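[Figure: message flow in distributed 2PL among the coordinating TM, the participating LMs and the participating DPs: (1) Lock Request from the TM to the LMs, (2) Operation from the LMs to the DPs, (3) End of Operation from the DPs to the TM, (4) Release Locks from the TM to the LMs.]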


Timestamp Ordering
1. Transaction (Ti) is assigned a globally unique timestamp ts(Ti).
2. Transaction manager attaches the timestamp to all operations issued by the transaction.
3. Each data item is assigned a write timestamp (wts) and a read timestamp (rts):
¯ rts(x) = largest timestamp of any read on x
¯ wts(x) = largest timestamp of any write on x
4. Conflicting operations are resolved by timestamp order.
Basic T/O:
for Ri(x):
    if ts(Ti) < wts(x)
    then reject Ri(x)
    else accept Ri(x) and set rts(x) ← max(rts(x), ts(Ti))
for Wi(x):
    if ts(Ti) < rts(x) or ts(Ti) < wts(x)
    then reject Wi(x)
    else accept Wi(x) and set wts(x) ← ts(Ti)
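
The basic T/O rules fit in a few lines of Python. This is a sketch under stated assumptions: the class name is hypothetical, timestamps are positive integers, and items never accessed have rts and wts of 0. A read is rejected when a younger transaction has already written x; a write is rejected when a younger transaction has already read or written x.

```python
class BasicTOScheduler:
    def __init__(self):
        self.rts = {}  # item -> largest timestamp of any accepted read
        self.wts = {}  # item -> largest timestamp of any accepted write

    def read(self, ts, item):
        """Return True if Ri(x) is accepted, False if it is rejected."""
        if ts < self.wts.get(item, 0):
            return False  # a younger transaction already wrote the item
        self.rts[item] = max(self.rts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        """Return True if Wi(x) is accepted, False if it is rejected."""
        if ts < self.rts.get(item, 0) or ts < self.wts.get(item, 0):
            return False  # a younger transaction already read or wrote it
        self.wts[item] = ts
        return True
```

A rejected operation means the issuing transaction is aborted and restarted with a new timestamp; that restart logic is outside this sketch.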

Multiversion Timestamp
Ordering
Q Do not modify the values in the database, create
new values.
Q A Ri(x) is translated into a read on one version of
x.
¯ Find a version of x (say xv) such that ts(xv) is the largest
timestamp less than ts(Ti).

Q A Wi(x) is translated into Wi(xw) and accepted if the scheduler has not yet processed any Rj(xr) such that
ts(Ti) < ts(xr) < ts(Tj)
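
The read rule (pick the version with the largest timestamp less than ts(Ti)) is a binary search over the version timestamps. The sketch below covers only that read rule; the function name is hypothetical and the versions of x are assumed to be kept as a sorted list of their timestamps.

```python
from bisect import bisect_left


def version_for_read(version_timestamps, reader_ts):
    """Return the timestamp of the version a reader should see.

    version_timestamps must be sorted in ascending order. The result is
    the largest version timestamp strictly less than reader_ts, or None
    if every version is at least as new as the reader.
    """
    # bisect_left gives the index of the first timestamp >= reader_ts,
    # so the entry just before it is the newest strictly older version.
    index = bisect_left(version_timestamps, reader_ts)
    return version_timestamps[index - 1] if index > 0 else None
```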


Optimistic Concurrency Control Algorithms

Pessimistic execution: Validate → Read → Compute → Write

Optimistic execution: Read → Compute → Validate → Write

Optimistic Concurrency Control Algorithms

Q Transaction execution model: divide into subtransactions, each of which executes at a site
¯ Tij: transaction Ti that executes at site j

Q Transactions run independently at each site until they reach the end of their read phases
Q All subtransactions are assigned a timestamp at the end of their read phase
Q Validation test performed during validation phase. If the test fails for one subtransaction, all are rejected.


Optimistic CC Validation Test


1. If all transactions Tk where ts(Tk) < ts(Tij) have completed their write phase before Tij has started its read phase, then validation succeeds
¯ Transaction executions in serial order

[Figure: timelines with phases R, V, W. Tk finishes its W phase before Tij starts its R phase.]

Optimistic CC Validation Test
2. If there is any transaction Tk such that ts(Tk) < ts(Tij) and which completes its write phase while Tij is in its read phase, then validation succeeds if WS(Tk) ∩ RS(Tij) = Ø
¯ Read and write phases overlap, but Tij does not read data items written by Tk

[Figure: timelines with phases R, V, W. The W phase of Tk ends during the R phase of Tij.]


Optimistic CC Validation Test


3. If there is any transaction Tk such that ts(Tk) < ts(Tij) and which completes its read phase before Tij completes its read phase, then validation succeeds if WS(Tk) ∩ RS(Tij) = Ø and WS(Tk) ∩ WS(Tij) = Ø
¯ They overlap, but don't access any common data items.

[Figure: timelines with phases R, V, W. The R phase of Tk ends before the R phase of Tij ends.]
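
The three validation rules can be combined into one test. The sketch below assumes each transaction records when its read phase starts and ends and when its write phase ends (None while unfinished), all on a common clock; the class, field and function names are hypothetical, and rule boundaries at equal times are resolved in favour of the stricter rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Txn:
    ts: int                    # timestamp assigned at the end of the read phase
    read_start: int
    read_end: int
    write_end: Optional[int]   # None if the write phase has not finished
    read_set: set
    write_set: set


def validate(tij, others):
    """Apply the three optimistic validation rules to tij."""
    for tk in others:
        if tk.ts >= tij.ts:
            continue  # only transactions with smaller timestamps matter
        reads_written = tk.write_set & tij.read_set
        both_written = tk.write_set & tij.write_set
        finished = tk.write_end is not None
        if finished and tk.write_end < tij.read_start:
            continue  # rule 1: serial execution
        if finished and tk.write_end <= tij.read_end:
            if reads_written:
                return False  # rule 2 violated: tij read data tk wrote
            continue
        if tk.read_end < tij.read_end:
            if reads_written or both_written:
                return False  # rule 3 violated: common data items
            continue
        return False  # none of the three rules applies
    return True
```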

Deadlock
Q A transaction is deadlocked if it is blocked and will
remain blocked until there is intervention.
Q Locking-based CC algorithms may cause deadlocks.
Q TO-based algorithms that involve waiting may cause
deadlocks.
Q Wait-for graph
¯ If transaction Ti waits for another transaction Tj to release a lock on
an entity, then Ti → Tj in WFG.

[Figure: wait-for graph with a single edge Ti → Tj.]


Local versus Global WFG


Assume T1 and T2 run at site 1, T3 and T4 run at site 2.
Also assume T3 waits for a lock held by T4 which waits for
a lock held by T1 which waits for a lock held by T2 which,
in turn, waits for a lock held by T3.
Local WFG:
Site 1: T1 → T2
Site 2: T3 → T4

Global WFG:
T1 → T2 → T3 → T4 → T1
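
The example can be reproduced with a short cycle finder: neither local WFG contains a cycle, but the global WFG formed by adding the inter-site edges does. The sketch assumes a WFG is a list of (waiter, holder) pairs; find_cycle is a hypothetical helper name.

```python
def find_cycle(edges):
    """Return one cycle as a list of transactions, or None if there is none."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, []).append(holder)
    done = set()  # nodes already explored without finding a cycle

    def dfs(node, path):
        if node in path:
            return path[path.index(node):]  # closed a cycle
        if node in done:
            return None
        for nxt in graph.get(node, []):
            cycle = dfs(nxt, path + [node])
            if cycle:
                return cycle
        done.add(node)
        return None

    for start in list(graph):
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None


site1 = [("T1", "T2")]                     # local WFG at site 1
site2 = [("T3", "T4")]                     # local WFG at site 2
inter_site = [("T4", "T1"), ("T2", "T3")]  # waits that cross sites
global_wfg = site1 + site2 + inter_site
```

find_cycle(site1) and find_cycle(site2) both return None, while find_cycle(global_wfg) returns a cycle through all four transactions.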

Deadlock Management
Q Prevention
¯ Guaranteeing that deadlocks can never occur in the first
place. Check transaction when it is initiated. Requires no run
time support.
Q Avoidance
¯ Detecting potential deadlocks in advance and taking action to
ensure that deadlock will not occur. Requires run time support.
Q Detection and Recovery
¯ Allowing deadlocks to form and then finding and breaking
them. As in the avoidance scheme, this requires run time
support.


Deadlock Prevention
Q All resources which may be needed by a transaction
must be predeclared.
¯ The system must guarantee that none of the resources will be
needed by an ongoing transaction.
¯ Resources must only be reserved, but not necessarily allocated a
priori
¯ Unsuitability of the scheme in database environment
¯ Suitable for systems that have no provisions for undoing processes.

Q Evaluation:
– Reduced concurrency due to preallocation
– Evaluating whether an allocation is safe leads to added overhead.
– Difficult to determine (partial order)
+ No transaction rollback or restart is involved.

Deadlock Avoidance
Q Transactions are not required to request
resources a priori.
Q Transactions are allowed to proceed unless a
requested resource is unavailable.
Q In case of conflict, transactions may be
allowed to wait for a fixed time interval.
Q Order either the data items or the sites and
always request locks in that order.
Q More attractive than prevention in a database
environment.


Deadlock Detection

Q Transactions are allowed to wait freely.


Q Wait-for graphs and cycles.
Q Topologies for deadlock detection algorithms
¯ Centralized
¯ Distributed
¯ Hierarchical

Centralized Deadlock Detection
Q One site is designated as the deadlock detector for
the system. Each scheduler periodically sends its
local WFG to the central site which merges them to a
global WFG to determine cycles.
Q How often to transmit?
¯ Too often ⇒ higher communication cost but lower delays due to
undetected deadlocks
¯ Too seldom ⇒ higher delays due to deadlocks, but lower
communication cost

Q Would be a reasonable choice if the concurrency control algorithm is also centralized.
Q Proposed for Distributed INGRES


Hierarchical Deadlock Detection


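Build a hierarchy of detectors

[Figure: tree of deadlock detectors. DD0x is the root; DD11 and DD14 are intermediate detectors; DD21, DD22, DD23 and DD24 are the local detectors at Site 1 to Site 4.]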

Distributed Deadlock Detection
Q Sites cooperate in detection of deadlocks.
Q One example:
¯ The local WFGs are formed at each site and passed on to other
sites. Each local WFG is modified as follows:
1. Since each site receives the potential deadlock cycles from other sites, these edges are added to the local WFGs.
2. The edges in the local WFG which show that local transactions are waiting for transactions at other sites are joined with edges in the local WFGs which show that remote transactions are waiting for local ones.
¯ Each local deadlock detector:
X looks for a cycle that does not involve the external edge. If it
exists, there is a local deadlock which can be handled locally.
X looks for a cycle involving the external edge. If it exists, it
indicates a potential global deadlock. Pass on the information to
the next site.


Outline
Q Introduction
Q Distributed DBMS Architecture
Q Distributed Database Design
Q Distributed Query Processing
Q Distributed Concurrency Control
Q Distributed Reliability Protocols
¯ Distributed Commit Protocols
¯ Distributed Recovery Protocols

Reliability

Problem:
How to maintain the atomicity and durability properties of transactions


Types of Failures
Q Transaction failures
¯ Transaction aborts (unilaterally or due to deadlock)
¯ Avg. 3% of transactions abort abnormally
Q System (site) failures
¯ Failure of processor, main memory, power supply, …
¯ Main memory contents are lost, but secondary storage contents
are safe
¯ Partial vs. total failure
Q Media failures
¯ Failure of secondary storage devices such that the stored data is
lost
¯ Head crash/controller failure (?)
Q Communication failures
¯ Lost/undeliverable messages
¯ Network partitioning

Local Recovery Management –
Architecture
Q Volatile storage
¯ Consists of the main memory of the computer system (RAM).
Q Stable storage
¯ Resilient to failures and loses its contents only in the presence of
media failures (e.g., head crashes on disks).
¯ Implemented via a combination of hardware (non-volatile storage)
and software (stable-write, stable-read, clean-up) components.

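[Figure: local recovery architecture. The stable database resides in secondary storage; the Local Recovery Manager and the Database Buffer Manager run in main memory. The Local Recovery Manager issues Fetch and Flush to the Database Buffer Manager, which reads and writes both the stable database and the database buffers (volatile database).]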


Update Strategies

Q In-place update
¯ Each update causes a change in one or more data values on
pages in the database buffers

Q Out-of-place update
¯ Each update causes the new value(s) of data item(s) to be
stored separate from the old value(s)

In-Place Update Recovery
Information
Database Log
Every action of a transaction must not only perform the action,
but must also write a log record to an append-only file.

[Figure: an update operation takes the old stable database state to the new stable database state and writes to the database log.]


Logging Interface

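[Figure: logging interface. The local recovery architecture extended with a stable log in secondary storage and log buffers in main memory. The Local Recovery Manager reads and writes the log buffers, which are written to the stable log; it still issues Fetch and Flush to the Database Buffer Manager, which reads and writes the stable database and the database buffers (volatile database).]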

REDO Protocol
[Figure: REDO takes the old stable database state, together with the database log, to the new stable database state.]

Q REDO'ing an action means performing it again.
Q The REDO operation uses the log information and performs the action that might have been done before, or not done due to failures.
Q The REDO operation generates the new image.

UNDO Protocol
[Figure: UNDO takes the new stable database state, together with the database log, back to the old stable database state.]

Q UNDO'ing an action means to restore the object to its before image.
Q The UNDO operation uses the log information and restores the old value of the object.

Write–Ahead Log (WAL)
Protocol
Q Notice:
¯ If a system crashes before a transaction is committed, then all
the operations must be undone. Only need the before images
(undo portion of the log).
¯ Once a transaction is committed, some of its actions might
have to be redone. Need the after images (redo portion of the
log).
Q WAL protocol :
1. Before a stable database is updated, the undo portion of the log should be written to the stable log.
2. When a transaction commits, the redo portion of the log must be written to stable log prior to the updating of the stable database.
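
A toy store shows how the log drives UNDO and REDO after a crash. This is a sketch under simplifying assumptions: the class name is hypothetical, each log record carries both the before image and the after image, the log record is always appended before the stable database is touched, and executions are strict (no transaction overwrites another's uncommitted update).

```python
class WalStore:
    def __init__(self):
        self.stable_db = {}   # item -> value
        self.stable_log = []  # append-only list of log records

    def update(self, txn, item, value):
        before = self.stable_db.get(item)
        # WAL: the log record reaches the stable log first ...
        self.stable_log.append(("UPDATE", txn, item, before, value))
        # ... and only then is the stable database changed.
        self.stable_db[item] = value

    def commit(self, txn):
        self.stable_log.append(("COMMIT", txn))

    def recover(self):
        """Undo uncommitted updates, then redo committed ones."""
        committed = {rec[1] for rec in self.stable_log if rec[0] == "COMMIT"}
        # UNDO: restore before images, newest log record first.
        for rec in reversed(self.stable_log):
            if rec[0] == "UPDATE" and rec[1] not in committed:
                _, _, item, before, _ = rec
                if before is None:
                    self.stable_db.pop(item, None)
                else:
                    self.stable_db[item] = before
        # REDO: reapply after images of committed updates in log order.
        for rec in self.stable_log:
            if rec[0] == "UPDATE" and rec[1] in committed:
                _, _, item, _, after = rec
                self.stable_db[item] = after
```

If T1 commits x = 1 and T2 then writes x = 2 and y = 7 without committing, recover leaves the database as {"x": 1}.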


Distributed Reliability Protocols


Q Commit protocols
¯ How to execute commit command for distributed transactions.
¯ Issue: how to ensure atomicity and durability?
Q Termination protocols
¯ If a failure occurs, how can the remaining operational sites deal with
it.
¯ Non-blocking : the occurrence of failures should not force the sites
to wait until the failure is repaired to terminate the transaction.
Q Recovery protocols
¯ When a failure occurs, how do the sites where the failure occurred
deal with it.
¯ Independent : a failed site can determine the outcome of a
transaction without having to obtain remote information.
Q Independent recovery ⇒ non-blocking termination

Two-Phase Commit (2PC)
Phase 1 : The coordinator gets the participants
ready to write the results into the database
Phase 2 : Everybody writes the results into the
database
¯ Coordinator :The process at the site where the transaction
originates and which controls the execution
¯ Participant :The process at the other sites that participate in
executing the transaction
Global Commit Rule:
1. The coordinator aborts a transaction if and only if at least one participant votes to abort it.
2. The coordinator commits a transaction if and only if all of the participants vote to commit it.
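
The global commit rule reduces to a single all-votes-commit test. The sketch below runs the coordinator side in one process; the function name is hypothetical, participants are modelled as callables that return "commit" or "abort", and a participant that raises TimeoutError is treated as voting to abort (an assumption matching the coordinator's option to abort unilaterally on timeout).

```python
def run_two_phase_commit(participants):
    """Collect votes and apply the global commit rule.

    participants maps a name to a callable returning 'commit' or 'abort'.
    Returns the global decision and the coordinator's log records.
    """
    log = ["begin_commit"]  # written before the prepare messages go out
    votes = {}
    for name, vote in participants.items():
        try:
            votes[name] = vote()
        except TimeoutError:
            votes[name] = "abort"  # no reply: the coordinator may abort
    all_commit = all(v == "commit" for v in votes.values())
    decision = "commit" if all_commit else "abort"
    log.append(decision)              # decision is logged before it is sent
    log.append("end_of_transaction")  # written once all acks are in
    return decision, log
```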


Centralized 2PC

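[Figure: centralized 2PC communication between the coordinator (C) and the participants (P). Phase 1: the coordinator sends "ready?" to every participant and each replies yes/no. Phase 2: the coordinator sends commit/abort and each participant replies committed/aborted.]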

2PC Protocol Actions
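Coordinator:
¯ INITIAL: write begin_commit in log, send PREPARE, enter WAIT
¯ WAIT: if any participant voted no (VOTE-ABORT), write abort in log, send GLOBAL-ABORT, enter ABORT; otherwise (all VOTE-COMMIT) write commit in log, send GLOBAL-COMMIT, enter COMMIT
¯ ABORT or COMMIT: on receiving the ACKs, write end_of_transaction in log

Participant:
¯ INITIAL: on PREPARE, decide whether it is ready to commit. If no, write abort in log, send VOTE-ABORT, enter ABORT; if yes, write ready in log, send VOTE-COMMIT, enter READY
¯ READY: on GLOBAL-ABORT, write abort in log, send ACK, enter ABORT; on GLOBAL-COMMIT, write commit in log, send ACK, enter COMMIT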


Linear 2PC

[Figure: linear 2PC over sites 1, 2, ..., N. Phase 1: Prepare leaves site 1 and each site passes VC/VA forward to the next. Phase 2: GC/GA is passed back from site N to site 1.]

VC: Vote-Commit, VA: Vote-Abort, GC: Global-commit, GA: Global-abort

Distributed 2PC
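[Figure: distributed 2PC, shown as a single phase (Phase 1). The coordinator sends prepare to all participants; every participant sends its vote-abort/vote-commit to all other participants, and the global-commit/global-abort decision is made independently at each site.]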


State Transitions in 2PC


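Coordinator:
¯ INITIAL → WAIT: on Commit command, send Prepare
¯ WAIT → ABORT: on Vote-abort, send Global-abort
¯ WAIT → COMMIT: on Vote-commit (all), send Global-commit

Participants:
¯ INITIAL → READY: on Prepare, send Vote-commit
¯ INITIAL → ABORT: on Prepare, send Vote-abort
¯ READY → ABORT: on Global-abort, send Ack
¯ READY → COMMIT: on Global-commit, send Ack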

Site Failures - 2PC Termination
COORDINATOR

Q Timeout in INITIAL
¯ Who cares
Q Timeout in WAIT
¯ Cannot unilaterally commit
¯ Can unilaterally abort
Q Timeout in ABORT or COMMIT
¯ Stay blocked and wait for the acks

[Figure: coordinator state transition diagram (INITIAL, WAIT, ABORT, COMMIT).]


Site Failures - 2PC Termination


PARTICIPANTS

Q Timeout in INITIAL
¯ Coordinator must have failed in INITIAL state
¯ Unilaterally abort
Q Timeout in READY
¯ Stay blocked

[Figure: participant state transition diagram (INITIAL, READY, ABORT, COMMIT).]

Site Failures - 2PC Recovery
COORDINATOR

Q Failure in INITIAL
¯ Start the commit process upon recovery
Q Failure in WAIT
¯ Restart the commit process upon recovery
Q Failure in ABORT or COMMIT
¯ Nothing special if all the acks have been received
¯ Otherwise the termination protocol is involved

[Figure: coordinator state transition diagram (INITIAL, WAIT, ABORT, COMMIT).]


Site Failures - 2PC Recovery


PARTICIPANTS

Q Failure in INITIAL
¯ Unilaterally abort upon recovery
Q Failure in READY
¯ The coordinator has been informed about the local decision
¯ Treat as timeout in READY state and invoke the termination protocol
Q Failure in ABORT or COMMIT
¯ Nothing special needs to be done

[Figure: participant state transition diagram (INITIAL, READY, ABORT, COMMIT).]

Problem With 2PC
Q Blocking
¯ Ready implies that the participant waits for the coordinator
¯ If coordinator fails, site is blocked until recovery
¯ Blocking reduces availability

Q Independent recovery is not possible


Q However, it is known that:
¯ Independent recovery protocols exist only for single site
failures; no independent recovery protocol exists which is
resilient to multiple-site failures.
Q So we search for these protocols – 3PC


Three-Phase Commit
Q 3PC is non-blocking.
Q A commit protocol is non-blocking iff
¯ it is synchronous within one state transition, and
¯ its state transition diagram contains
X no state which is “adjacent” to both a commit and an abort state, and
X no non-committable state which is “adjacent” to a commit state
Q Adjacent: possible to go from one state to another with a single state transition
Q Committable: all sites have voted to commit a transaction
¯ e.g.: COMMIT state

Communication Structure

[Figure: 3PC communication between the coordinator (C) and the participants (P). Phase 1: ready? then yes/no. Phase 2: pre-commit/pre-abort? then yes/no. Phase 3: commit/abort then ack.]


Network Partitioning
Q Simple partitioning
¯ Only two partitions

Q Multiple partitioning
¯ More than two partitions

Q Formal bounds (due to Skeen):


¯ There exists no non-blocking protocol that is resilient to a
network partition if messages are lost when partition
occurs.
¯ There exist non-blocking protocols which are resilient to a
single network partition if all undeliverable messages are
returned to sender.
¯ There exists no non-blocking protocol which is resilient to
a multiple partition.

Independent Recovery Protocols for
Network Partitioning
Q No general solution possible
¯ allow one group to terminate while the other is blocked
¯ improve availability

Q How to determine which group should proceed?
¯ The group with a majority

Q How does a group know if it has majority?
¯ centralized
X whichever partition contains the central site should terminate the transaction
¯ voting-based (quorum)
X different for replicated vs non-replicated databases


Quorum Protocols for
Non-Replicated Databases
Q The network partitioning problem is handled
by the commit protocol.
Q Every site is assigned a vote Vi.
Q Total number of votes in the system V
Q Abort quorum Va, commit quorum Vc
¯ Va + Vc > V where 0 ≤ Va , Vc ≤ V
¯ Before a transaction commits, it must obtain a commit
quorum Vc
¯ Before a transaction aborts, it must obtain an abort
quorum Va
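
A small sketch of the quorum arithmetic: the condition Va + Vc > V guarantees that two partitions can never hold a commit quorum and an abort quorum at the same time. The function names and the five-site vote assignment are hypothetical.

```python
def valid_quorums(votes, va, vc):
    """Check Va + Vc > V with 0 <= Va, Vc <= V, where V is the vote total."""
    v = sum(votes.values())
    return 0 <= va <= v and 0 <= vc <= v and va + vc > v


def has_quorum(votes, reachable_sites, quorum):
    """True if the sites in one partition together hold at least quorum votes."""
    return sum(votes[site] for site in reachable_sites) >= quorum


votes = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1}  # V = 5
```

With Va = Vc = 3, the partition {A, B, C} can commit, while {D, E} can neither commit nor abort and must wait.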

Quorum Protocols for
Replicated Databases
Q Network partitioning is handled by the replica
control protocol.
Q One implementation:
¯ Assign a vote to each copy of a replicated data item (say
Vi) such that Σi Vi = V
¯ Each operation has to obtain a read quorum (Vr) to read
and a write quorum (Vw) to write a data item
¯ Then the following rules have to be obeyed in determining
the quorums:
X Vr + Vw > V: a data item is not read and written by two transactions concurrently
X Vw > V/2: two write operations from two transactions cannot occur concurrently on the same data item
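
The two rules are easy to check for a given vote assignment. In this sketch the function name is hypothetical and copy_votes lists the vote Vi of each copy of one data item.

```python
def valid_rw_quorums(copy_votes, vr, vw):
    """Check Vr + Vw > V and Vw > V/2 for one replicated data item."""
    v = sum(copy_votes)
    # Integer arithmetic (2 * vw > v) avoids rounding issues with V / 2.
    return vr + vw > v and 2 * vw > v
```

For three copies with one vote each, (Vr, Vw) = (2, 2) and (1, 3) are valid; the second choice makes reads cheapest, in the style of ROWA. (1, 2) is invalid because a read quorum and a write quorum could then be disjoint.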


Use for Network Partitioning


Q Simple modification of the ROWA rule:
¯ When the replica control protocol attempts to read or write
a data item, it first checks if a majority of the sites are in the
same partition as the site that the protocol is running on (by
checking its votes). If so, execute the ROWA rule within
that partition.
Q Assumes that failures are “clean” which means:
¯ failures that change the network's topology are detected by
all sites instantaneously
¯ each site has a view of the network consisting of all the
sites it can communicate with
