DTS Unit IV
Clock may jump ahead at synchronization points
Working of Computer timer:
To implement a clock in a computer, a counter register and a holding register are used.
The counter is decremented by a quartz crystal oscillator.
When it reaches zero, an interrupt is generated and the counter is reloaded from the holding
register.
Clock skew problem
To avoid the clock skew problem, two types of clocks are used:
Logical clocks : to provide consistent event ordering
Physical clocks: clocks whose values must not deviate from the real time by more than a
certain amount.
Software based solutions for synchronising clocks
The following techniques are used to synchronize clocks:
time stamps of real-time clocks
message passing
round-trip time (local measurement)
Based on the above mentioned techniques, the following algorithms provide clock
synchronization:
1. Cristian’s algorithm
2. Berkeley algorithm
3. Network time protocol (Internet)
4.1.3 Cristian’s algorithm
Cristian suggested the use of a time server, connected to a device that receives signals from a
source of UTC, to synchronize computers externally. Round trip times between processes are often
reasonably short in practice, yet theoretically unbounded. A practical estimate is possible only if the
round-trip times are sufficiently short in comparison to the required accuracy.
Principle:
The UTC-synchronized time server S is used here.
A process P sends requests to S and measures round-trip time Tround .
In a LAN, Tround is typically around 1-10 ms.
During this time, a clock with a drift rate of 10^-6 sec/sec varies by at most 10^-8 sec.
Hence the estimate of Tround is reasonably accurate.
Now set clock to t + ½ Tround.
2
For this algorithm, the following results hold (where min is the minimum one-way transmission time and t0 is the time at which P sends its request):
Earliest time that S can have sent the reply: t0 + min
Latest time that S can have sent the reply: t0 + Tround – min
Total time range for the answer: Tround – 2·min
Accuracy is ± (½Tround – min)
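A minimal sketch of this estimate in Python is shown below; request_server_time and the 0.001 s minimum delay are hypothetical stand-ins for the real request to the time server S and the value of min, not part of the original algorithm description.

import time

MIN_DELAY = 0.001  # assumed minimum one-way transmission time ("min"), in seconds

def cristian_estimate(request_server_time):
    # Send a request to the time server S and measure the round-trip time.
    t0 = time.monotonic()
    t = request_server_time()          # server time t carried in S's reply
    t_round = time.monotonic() - t0    # measured Tround
    estimate = t + t_round / 2         # set clock to t + 1/2 * Tround
    accuracy = t_round / 2 - MIN_DELAY # +/- (1/2 * Tround - min)
    return estimate, accuracy

# Example with a stand-in "server" whose clock is 2 seconds ahead:
fake_server = lambda: time.time() + 2.0
est, acc = cristian_estimate(fake_server)
print("estimated server time %.3f, accuracy +/- %.4f s" % (est, acc))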
Problems in this system:
Timer must never run backward.
Variable delays in message passing / delivery occur.
4.1.4 Berkeley algorithm
Berkeley algorithm was developed to solve the problems of Cristian's algorithm.
This algorithm does not need external synchronization.
Master slave approach is used here.
The master polls the slaves periodically about their clock readings.
Estimates of the slaves' local clock times are calculated using the round-trip times.
The average value is obtained from the group of processes.
This method cancels out individual clocks' tendencies to run fast or slow, and the master tells each
slave process by what amount of time to adjust its local clock.
In case of master failure, master election algorithm is used.
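A sketch of the master's averaging step is given below, assuming the slave readings have already been corrected for round-trip delay; the function name and the example values are illustrative only.

def berkeley_adjustments(master_time, slave_times):
    # Average the master's reading and the polled slave readings, then
    # return the amount each clock should adjust by (not a new absolute
    # time), so that slow clocks are sped up rather than set backwards.
    readings = [master_time] + list(slave_times)
    average = sum(readings) / len(readings)
    return [average - r for r in readings]

# Master reads 3:00, slaves read 3:25 and 2:50 (minutes relative to 3:00):
print(berkeley_adjustments(0, [25, -10]))   # -> [5.0, -20.0, 15.0]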
most computers.
d. To provide protection against interference with the time service, whether malicious or
accidental.
Fig 4.4: Messages exchanged between a pair of NTP peers
Delay and offset:
Let o be the true offset of B's clock relative to A's clock, and let t and t' be the true transmission
times of m and m' (the recorded timestamps Ti-3, Ti-2, Ti-1, Ti are not true times).
Ti-2 = Ti-3 + t + o ................... (1)
Ti = Ti-1 + t' – o ................... (2)
Adding (1) and (2) gives the delay di = t + t' = (Ti-2 – Ti-3) + (Ti – Ti-1); the clock errors cancel out, so di is (almost) the true total transmission time.
Offset oi = ½ ((Ti-2 – Ti-3) + (Ti-1 – Ti)) (only an estimate of o)
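The two estimates above translate directly into code. The following sketch simply evaluates the formulas for one message pair; the function name and the example timestamps are made up for illustration.

def ntp_offset_delay(ti_3, ti_2, ti_1, ti):
    # ti_3 = Ti-3: send of m at A,  ti_2 = Ti-2: receive of m at B,
    # ti_1 = Ti-1: send of m' at B, ti   = Ti:   receive of m' at A.
    delay = (ti_2 - ti_3) + (ti - ti_1)           # di = t + t'
    offset = ((ti_2 - ti_3) + (ti_1 - ti)) / 2    # oi, an estimate of o
    return offset, delay

# B's clock runs 5 units ahead of A's; the one-way delays are 3 and 4 units:
print(ntp_offset_delay(100, 108, 110, 109))  # -> (4.5, 7); true o = 5, di = 7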
Implementing NTP
Statistical algorithms based on 8 most recent pairs are used in NTP to determine quality of
estimates.
The value of oi that corresponds to the minimum di is chosen as an estimate for o.
A time server communicates with multiple peers, eliminates peers with unreliable data, and favours
peers with lower stratum numbers (e.g. when selecting a primary synchronization partner).
NTP phase lock loop model: modify local clock in accordance with observed drift rate.
Experiments show that NTP achieves synchronization accuracies of about 10 msec over the Internet and 1 msec on
a LAN.
4.2 LOGICAL TIME AND LOGICAL CLOCKS
If two processes do not interact, their clocks need not be synchronized.
But when processes interact, their events must be ordered. This is done by logical clocks.
Consider a distributed system with n processes, p1, p2, …pn.
Each process pi executes on a separate processor without any memory sharing.
Each pi has a state si. The process execution is a sequence of events.
As a result of the events, changes occur in the local state of the process.
They either send or receive messages.
4.2.1 Lamport Ordering of Events
The partial ordering obtained by generalizing the relationship between two processes is called as
happened-before relation or causal ordering or potential causal ordering. This term was coined by
Lamport. Happened-before defines a partial order on the events of a distributed system; some pairs of events cannot be
ordered at all and are said to be concurrent.
We say e →i e’ if e happens before e’ at process i.
e → e’ is defined using the following rules:
Local ordering: e → e' if e →i e' for some process i
Messages: send(m) → receive(m) for any message m
Transitivity: e → e'' if e → e' and e' → e''
4.2.2 Logical Clocks
A Lamport clock L orders events consistently with the logical happened-before ordering: if e → e',
then L(e) < L(e').
But the converse is not true:
L(e) < L(e') does not imply e → e'.
Similar rules for concurrency are:
L(e) = L(e') implies e ║ e' (for distinct e, e')
e ║ e' does not imply L(e) = L(e')
This implies that Lamport clocks arbitrarily order some concurrent events.
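A minimal Lamport clock sketch follows, illustrating the two rules (increment before every local event or send; take the maximum on receipt); the class and method names are only illustrative.

class LamportClock:
    def __init__(self):
        self.time = 0

    def event(self):
        # Increment before each local event (including a send).
        self.time += 1
        return self.time

    def receive(self, msg_timestamp):
        # On receipt, set LC := max(LC, timestamp) + 1,
        # so that L(send(m)) < L(receive(m)).
        self.time = max(self.time, msg_timestamp) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
t = p1.event()          # p1 sends a message with timestamp 1
p2.event()              # an unrelated local event at p2 (timestamp 1)
print(p2.receive(t))    # p2's clock becomes max(1, 1) + 1 = 2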
Fig 4.6: Vector Timestamps
Vector timestamps have the disadvantage, compared with Lamport timestamps, of taking up an
amount of storage and message payload that is proportional to N, the number of processes.
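A sketch of a vector clock for N processes is shown below; it makes the storage cost proportional to N visible and shows how the happened-before test works. The class, function names and example are illustrative.

class VectorClock:
    def __init__(self, n, i):
        self.v = [0] * n    # entry j counts the events of process j known here
        self.i = i          # index of the owning process

    def event(self):
        self.v[self.i] += 1

    def send(self):
        self.event()
        return list(self.v)         # timestamp piggybacked on the message

    def receive(self, ts):
        # Merge: component-wise maximum, then count the receive event itself.
        self.v = [max(a, b) for a, b in zip(self.v, ts)]
        self.event()

def happened_before(u, v):
    # u -> v iff u <= v component-wise and u != v; if neither direction
    # holds, the two events are concurrent.
    return all(a <= b for a, b in zip(u, v)) and u != v

p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
ts = p0.send()                          # p0: [1, 0]
p1.receive(ts)                          # p1: [1, 1]
print(happened_before([1, 0], p1.v))    # True
print(happened_before([1, 0], [0, 1]))  # False: the events are concurrent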
4.3 GLOBAL STATES
This section considers how to determine whether a particular property is true of a distributed system. Such
global properties arise in the following problems:
distributed garbage collection
deadlock detection
termination detection
debugging
The global state of a distributed system consists of the local state of each process,
together with the messages that are currently in transit, that is, that have been sent
but not delivered.
Distributed debugging:
Consider an application with processes p1, p2, …, pN, where each process pi has a variable xi. As the program
executes, the variables may change value, but they are required to satisfy some stated constraint. Such relationships
between variables must be evaluated only for values of the variables that hold at the same time.
4.3.1 Global States and Consistency cuts (Distributed Snapshots)
Distributed Snapshot represents a state in which the distributed system might have
been in. A snapshot of the system is a single configuration of the system.
A distributed snapshot should reflect a consistent state. A global state is consistent if it could have
been observed by an external observer; for a meaningful global state, all recorded local states must be consistent.
If we have recorded that a process P has received a message from a process Q, then we should
also have recorded that process Q actually sent that message.
Otherwise, a snapshot will contain the recording of messages that have been received but
never sent.
The reverse condition (Q has sent a message that P has not received) is allowed.
8
The notion of a global state can be graphically represented by what is called a cut. A cut
represents the last event that has been recorded for each process.
The history of each process pi is given by:
history(pi) = hi = <ei^0, ei^1, ei^2, ...>
Each event is either an internal action of the process or the sending or receiving of a message.
We denote by si^k the state of process pi immediately before the kth event occurs. The state si of pi in the
global state S corresponding to the cut C is that of pi immediately after the last event processed by pi in the
cut, ei^ci. The set of events {ei^ci} is called the frontier of the cut.
4.5 DISTRIBUTED MUTUAL EXCLUSION
Mutual exclusion ensures that concurrent processes access shared resources or data in a
serialized way. If a process, say Pi, is executing in its critical section, then no other process can be
executing in its critical section.
Fig 4.12: Central Server Algorithm
The central server algorithm fulfils ME1 and ME2 but not ME3, i.e. safety and liveness are
ensured but ordering is not satisfied. The performance of the algorithm is measured as follows:
Bandwidth: this is measured by the messages needed to enter and exit. Entering takes two
messages (a request followed by a grant), which are delayed by the round-trip time. Exiting
takes one release message, and does not delay the exiting process.
Throughput: this is measured by the synchronization delay, the round-trip of a release message
followed by a grant message.
4.5.3 Ring Based Algorithm
The simplest way to arrange mutual exclusion between N processes without
requiring an additional process is to arrange them in a logical ring.
Each process pi has a communication channel to the next process in the ring,
p(i+1) mod N.
The unique token is a message passed from process to process around the ring in a single
direction (clockwise).
If a process does not require to enter the CS when it receives the token, then it
immediately forwards the token to its neighbour.
A process that requires the token waits until it receives it, and then retains it while in the CS.
To exit the critical section, the process sends the token on to its neighbour.
Fig 4.13: Ring based algorithm
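The sketch below simulates the token circulating around the ring described above; the function name, the wants_cs set and the print statements are purely illustrative.

def ring_mutex(n, wants_cs, rounds=2):
    # Pass the unique token around the ring; a process that wants the CS
    # holds the token while inside, then forwards it to (i + 1) mod n.
    token_holder = 0
    pending = set(wants_cs)
    for _ in range(rounds * n):
        if token_holder in pending:
            print("p%d enters the critical section" % token_holder)
            pending.discard(token_holder)      # exit the CS, release the token
        token_holder = (token_holder + 1) % n  # forward the token to the neighbour
    return pending

ring_mutex(5, wants_cs={1, 3})   # p1 and p3 each enter the CS once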
This algorithm satisfies ME1 and ME2 but not ME3, i.e. safety and liveness are satisfied but not
ordering. The performance measures include:
Bandwidth: the token continuously consumes bandwidth except when a process is inside the CS.
Exit requires only one message.
Delay: the delay experienced by a process ranges from zero messages (it has just received the token) to N
messages (it has just passed the token on).
Throughput: the synchronization delay between one exit and the next entry is anywhere from 1 (the next
process in the ring) to N (the same process again) message transmissions.
4.5.4 Multicast Synchronisation
This achieves mutual exclusion between N peer processes using multicast.
Processes that require entry to a critical section multicast a request message, and can enter it only
when all the other processes have replied to this message.
The conditions under which a process replies to a request are designed to ensure that ME1, ME2 and
ME3 are met.
Each process pi keeps a Lamport clock. Messages requesting entry are of the form <T, pi>.
Each process records its state of RELEASED, WANTED or HELD in a variable state.
If a process requests entry and all other processes are in state RELEASED, then all processes
reply immediately.
If some process is in state HELD, then that process will not reply until it has finished with the CS.
If some process is in state WANTED and has a smaller timestamp than the incoming
request, it will queue the request until it has finished.
If two or more processes request entry at the same time, then whichever bears the lowest
timestamp will be the first to collect N-1 replies.
Fig 4.13: Multicast Synchronisation
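The decision a process pj makes on receiving a request <T, pi>, as described above, can be sketched as follows; requests are compared as (Lamport timestamp, process id) pairs, and the function name and example values are illustrative.

RELEASED, WANTED, HELD = "RELEASED", "WANTED", "HELD"

def on_request(state, own_request, incoming):
    # own_request is pj's own pending request (Tj, j), or None;
    # incoming is the received request (Ti, i).
    if state == HELD:
        return "queue"                         # reply only after leaving the CS
    if state == WANTED and own_request < incoming:
        return "queue"                         # pj's earlier request has priority
    return "reply"                             # RELEASED, or pj's request is later

print(on_request(WANTED, (34, 1), (41, 2)))    # queue: 34 < 41
print(on_request(WANTED, (41, 2), (34, 1)))    # reply: the incoming request wins
print(on_request(RELEASED, None, (34, 1)))     # reply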
Maekawa's Voting Algorithm
On initialization
    state := RELEASED;
    voted := FALSE;
For pi to enter the critical section
    state := WANTED;
    Multicast request to all processes in Vi;
    Wait until (number of replies received = K);
    state := HELD;
On receipt of a request from pi at pj
    if (state = HELD or voted = TRUE)
    then
        queue request from pi without replying;
    else
        send reply to pi;
        voted := TRUE;
    end if
For pi to exit the critical section
    state := RELEASED;
    Multicast release to all processes in Vi;
On receipt of a release from pi at pj
    if (queue of requests is non-empty)
    then
        remove head of queue – from pk, say;
        send reply to pk;
        voted := TRUE;
    else
        voted := FALSE;
    end if
If three processes concurrently request entry to the CS, then it is possible for p1 to reply to itself
and hold off p2, for p2 to reply to itself and hold off p3, and for p3 to reply to itself and hold off p1.
Each process has then received only one of the two replies it needs, and none can proceed.
If processes queue outstanding requests in happened-before order, ME3 can be satisfied and the
algorithm becomes deadlock-free.
Performance Evaluation:
Bandwidth utilization is 2√N messages per entry to the CS and √N per exit.
Client delay is the same as in Ricart and Agrawala's algorithm: one round-trip time.
Synchronization delay is one round-trip time, which is worse than Ricart and Agrawala's single message transmission time.
4.5.6 Fault Tolerance
Fault tolerance concerns how the algorithms behave when messages are lost or when a process
crashes.
None of the algorithms described above would tolerate the loss of messages if the
channels were unreliable.
The ring-based algorithm cannot tolerate any single process crash failure.
Maekawa's algorithm can tolerate some process crash failures: a crash is tolerated if the crashed process
is not in any voting set that is currently required.
The central server algorithm can tolerate the crash failure of a client process that neither holds
nor has requested the token.
4.6 ELECTIONS
Fig 4.14: Election process using the ring-based election algorithm
Steps in the election process:
1. Initially, every process is marked as non-participant. Any process can begin an election.
2. The starting process marks itself as a participant and places its identifier in a message to its
neighbour.
3. A process that receives the message compares the identifier in it with its own. If the arrived identifier is larger, it
passes on the message.
4. If the arrived identifier is smaller and the receiver is not a participant, it substitutes its own identifier in the
message and forwards it. It does not forward the message if it is already a participant.
5. On forwarding a message in either case, the process marks itself as a participant.
6. If the received identifier is that of the receiver itself, then this process's identifier must be the
greatest, and it becomes the coordinator.
7. The coordinator marks itself as a non-participant, sets elected_i and sends an elected message to
its neighbour enclosing its identifier.
8. When a process receives an elected message, it marks itself as a non-participant, sets its variable
elected_i and forwards the message.
The election was started by process 17. The highest process identifier encountered so far is 24.
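A small simulation of the steps above is given below; the ring contents are made up for illustration and do not reproduce the figure.

def ring_election(ring_ids, starter):
    # The starter puts its id in an election message; each process forwards
    # the larger of the received id and its own, until some process receives
    # its own id back and becomes the coordinator.
    n = len(ring_ids)
    pos = ring_ids.index(starter)
    msg, election_msgs = starter, 0
    while True:
        pos = (pos + 1) % n               # the message moves to the neighbour
        election_msgs += 1
        if ring_ids[pos] == msg:          # own id received back: coordinator
            return ring_ids[pos], election_msgs + n   # + n elected messages
        msg = max(msg, ring_ids[pos])     # substitute the larger identifier

print(ring_election([3, 9, 17, 24, 33, 4, 15], starter=17))   # -> (33, 16)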
Requirements:
E1 is met. All identifiers are compared, since a process must receive its own ID back before
sending an elected message.
E2 is also met due to the guaranteed traversals of the ring.
The fact that it tolerates no failures makes the ring-based algorithm of limited practical use.
Performance Evaluation
If only a single process starts an election, the worst case is when its anti-clockwise
neighbour has the highest identifier. A total of N-1 messages is then required to reach this neighbour,
which will not announce its election until its identifier has completed another circuit, taking a further N
messages. The elected message is then sent N times, making 3N-1 messages in all.
The turnaround time is also 3N-1 sequential message transmission times.
4.6.2 Bully Algorithm
This algorithm allows processes to crash during an election, although it assumes that message
delivery between processes is reliable.
Assume that the system is synchronous (so timeouts can be used to detect a process failure), that each
process knows which processes have higher identifiers, and that it can communicate with all
such processes.
In this algorithm, there are three types of messages:
o Election message: this is sent to announce an election. A process begins an
election when it notices, through timeouts, that the coordinator has failed; it then expects
an answer within T = 2Ttrans + Tprocess from the time of sending.
o Answer message: this is sent in response to an election message.
o Coordinator message: this is sent to announce the identity of the elected process.
Suppose P3 crashes and is replaced by another process. P2 sets P3 as coordinator and P1 sets P2 as
coordinator.
E2 is clearly met by the assumption of reliable transmission.
Performance Evaluation
In the best case, the process with the second highest identifier notices the coordinator's failure. It can then
immediately elect itself and send N-2 coordinator messages.
The bully algorithm requires O(N^2) messages in the worst case, that is, when the process with
the lowest identifier is the first to detect the coordinator's failure: then N-1 processes altogether begin elections,
each sending messages to the processes with higher identifiers.
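A compact sketch of the outcome of a bully election follows; crashed processes are modelled simply by leaving them out of alive_ids, the worst-case restart behaviour is only approximated, and the names and example are illustrative.

def bully_election(alive_ids, initiator):
    # The initiator sends election messages to all processes with higher ids;
    # any higher process that answers takes over the election, and the highest
    # surviving id finally multicasts a coordinator message.
    candidate = initiator
    while True:
        higher = [p for p in alive_ids if p > candidate]
        if not higher:
            return candidate          # no answer: candidate becomes coordinator
        candidate = min(higher)       # a higher process continues the election

# p4 (the old coordinator) has crashed; p1 notices and starts an election:
print(bully_election(alive_ids=[1, 2, 3], initiator=1))   # -> 3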
4.7 TRANSACTION AND CONCURRENCY CONTROL
Fundamentals of transactions and concurrency control
A transaction is atomic.
The intermediate state of a transaction is not visible to others. If the transaction is not completed, any changes
it made are undone. This is known as rollback.
Concurrency is needed when multiple users want to access the same data at the same time.
Concurrency control (CC) ensures that correct results for parallel operations are generated.
CC provides rules, methods, design methodologies and theories to maintain the consistency of
components operating simultaneously while interacting with the same object.
All concurrency control protocols are based on serial equivalence and are derived from rules of
conflicting operations:
Locks are used to order transactions that access the same object according to the order of their requests.
Optimistic concurrency control allows transactions to proceed until they are ready to commit,
whereupon a check is made to see any conflicting operation on objects.
Timestamp ordering uses timestamps to order transactions that access the same object
according to their starting time.
Synchronisation without transactions
Synchronization without transactions is done with multiple threads.
The use of multiple threads is beneficial to the performance. Multiple threads may access the
same objects.
In Java, the synchronized keyword can be applied to a method so that only one thread at a time can
access the object.
If one thread invokes a synchronized method on an object, then that object is locked; another
thread that invokes one of its synchronized methods will be blocked.
Enhancing Client Cooperation by Signaling
The clients may use a server as a means of sharing some resources.
In some applications, threads need to communicate and coordinate their actions by signaling.
Failure modes in transactions
The following are the failure modes: Writes to permanent storage may fail, either by writing
nothing or by writing wrong values.
Servers may crash occasionally.
There may be an arbitrary delay before a message arrives.
4.8 TRANSACTIONS
Transactions originally come from database management systems.
Clients require a sequence of separate requests to a server to be atomic in the sense that:
• They are free from interference by operations being performed on behalf of other
concurrent clients.
• Either all of the operations must be completed successfully or they must have no effect at
all in the presence of server crashes.
Atomicity
All or nothing: a transaction either completes successfully, and effects of all of its operations
are recorded in the object, or it has no effect at all.
Failure atomicity: effects are atomic even when server crashes
Durability: after a transaction has completed successfully, all its effects are
saved in permanent storage for later recovery.
Isolation
Each transaction must be performed without interference from other transactions. The intermediate
effects of a transaction must not be visible to other transactions.
4.8.1 Concurrency Control
Serial Equivalence
If these transactions are done one at a time in some order, then the final result will be correct.
If we do not want to sacrifice the concurrency, an interleaving of the operations of transactions
may lead to the same effect as if the transactions had been performed one at a time in some
order.
We say it is a serially equivalent interleaving.
The use of serial equivalence is a criterion for correct concurrent execution to prevent lost
updates and inconsistent retrievals.
Conflicting Operations
When we say that a pair of operations conflict, we mean that their combined effect depends on the
order in which they are executed, e.g. a read and a write of the same object.
There are three ways to ensure serializability:
Locking
Timestamp ordering
Optimistic concurrency control
If another client wants to access the same (locked) object, it has to wait until the object is unlocked
at the end of the transaction.
There are two types of locks: read locks and write locks.
Read locks are shared locks, i.e. they do not conflict with one another.
Write locks are exclusive locks, since the order in which writes are done may give rise to
conflicts.
Lock compatibility (does a new request conflict with a lock already set?):
Lock already set    Read request    Write request
None                No              No
Read                No              Yes
Write               Yes             Yes
A wait-for graph can be used to represent the waiting relationships between current transactions.
In a wait-for graph the nodes represent transactions and the edges represent wait-for relationships
between transactions – there is an edge from node T to node U when transaction T is waiting for
transaction U to release a lock.
Fig 4.17: Wait for graph
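A sketch of deadlock detection on such a wait-for graph is given below, under the simplifying assumption that each transaction waits for at most one other; the dictionary edges and names are hypothetical.

def find_cycle(wait_for):
    # wait_for maps each waiting transaction to the transaction it waits for;
    # a cycle reachable from some transaction means deadlock.
    for start in wait_for:
        path, t = [], start
        while t in wait_for and t not in path:
            path.append(t)
            t = wait_for[t]               # follow the edge T -> U
        if t in path:
            return path[path.index(t):]   # the transactions forming the cycle
    return None

print(find_cycle({"T": "U", "U": "V", "V": "T"}))   # ['T', 'U', 'V']
print(find_cycle({"T": "U", "U": "V"}))             # None: no deadlock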
4.11 OPTIMISTIC CONCURRENCY CONTROL
Locking as a means of serializing transactions has a number of disadvantages:
Lock maintenance represents an overhead that is not present in systems that do not support
concurrent access to shared data. Locking is sometimes needed only for cases in which conflicts have a low
probability of occurring.
The use of lock can result in deadlock. Deadlock prevention reduces concurrency severely. The
use of timeout and deadlock detection is not ideal for interactive programs.
To avoid cascading aborts, locks cannot be released until the end of the transaction. This
may reduce the potential for concurrency.
Optimistic concurrency control is based on the observation that, in most applications, the likelihood of two clients' transactions
accessing the same object is low.
Transactions are allowed to proceed as though there were no possibility of conflict with other
transactions until the client completes its task and issues a closeTransaction request.
When conflict arises, some transaction is generally aborted and will need to be restarted by the
client.
Each transaction has the following phases:
Working phase:
Each transaction has a tentative version of each of the objects that it updates.
This is a copy of the most recently committed version of the object.
The tentative version allows the transaction to abort with no effect on the object, either
during the working phase or if it fails validation due to other conflicting transaction.
Several different tentative values of the same object may coexist.
In addition, two records are kept of the objects accessed within a transaction, a read set
and a write set containing all objects either read or written by this transaction.
Reads are performed on the committed versions (so no dirty reads can occur) and writes record
the new values of the objects as tentative values, which are invisible to other transactions.
Validation phase:
When a closeTransaction request is received, the transaction is validated to establish
whether or not its operations on objects conflict with the operations of other transactions on
the same objects.
If validation succeeds, then the transaction can commit.
If it fails, then either the current transaction or those with which it conflicts will need to
be aborted.
Update phase:
If a transaction is validated, all of the changes recorded in its tentative versions are
made permanent.
Read-only transaction can commit immediately after passing validation.
Write transactions are ready to commit once the tentative versions have been recorded
in permanent storage.
Validation of Transactions
Validation uses the read-write conflict rules to ensure that the scheduling of a particular
transaction is serially equivalent with respect to all other overlapping transactions- that is, any
transactions that had not yet committed at the time this transaction started.
Each transaction is assigned a number when it enters the validation phase (when the client
issues a closeTransaction).
Such number defines its position in time.
A transaction always finishes its working phase after all transactions with lower numbers.
That is, a transaction with the number Ti always precedes a transaction with number Tj if i < j.
The validation test on transaction Tv is based on conflicts between operations in pairs of
transactions Ti and Tv. For a transaction Tv to be serializable with respect to an overlapping
transaction Ti, their operations must conform to the rules below.
Rule    Tv      Ti      Requirement
1       Write   Read    Ti must not read objects written by Tv.
2       Read    Write   Tv must not read objects written by Ti.
3       Write   Write   Ti must not write objects written by Tv, and Tv must not
                        write objects written by Ti.
Types of Validation
Backward validation: checks the transaction undergoing validation against preceding
overlapping transactions, i.e. those that entered the validation phase before it.
Forward validation: checks the transaction undergoing validation against later
transactions, which are still active.
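A sketch of backward validation using the read and write sets kept during the working phase follows: Tv's read set is checked against the write sets of overlapping transactions that committed before it (rule 1 needs no check here, because those earlier transactions cannot have read Tv's still-invisible tentative writes). The function name and sets are hypothetical.

def backward_validate(tv_read_set, earlier_committed_write_sets):
    # Tv fails validation if it read any object written by an overlapping,
    # already committed transaction Ti (rule 2); otherwise it may commit.
    for ti_writes in earlier_committed_write_sets:
        if tv_read_set & ti_writes:
            return False                  # conflict: abort or restart Tv
    return True

committed = [{"x"}, {"y", "z"}]           # write sets of overlapping, committed Ti
print(backward_validate({"a", "b"}, committed))   # True  -> Tv can commit
print(backward_validate({"x", "b"}, committed))   # False -> Tv must abort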
Starvation
When a transaction is aborted, it will normally be restarted by the client program.
There is no guarantee that a particular transaction will ever pass the validation checks, for it
may come into conflict with other transactions for the use of objects each time it is restarted.
The prevention of a transaction ever being able to commit is called starvation.
4.12 TIMESTAMP ORDERING
Timestamps may be assigned from the server's clock or from a counter that is incremented
whenever a timestamp value is issued.
Each object has a write timestamp and a set of tentative versions, each of which has a write
timestamp associated with it; and a set of read timestamps.
The write timestamp of the committed object is earlier than that of any of its tentative versions,
and the set of read timestamps can be represented by its maximum member.
Whenever a transaction's write operation on an object is accepted, the server creates a new
tentative version of the object with its write timestamp set to the transaction timestamp. Whenever
a read operation is accepted, the timestamp of the transaction is added to the object's set of read
timestamps.
When a transaction is committed, the values of the tentative version become the values of the
object, and the timestamps of the tentative version become the write timestamp of the
corresponding object.
Each request from a transaction is checked to see whether it conforms to the operation conflict
rules.
A conflict occurs when an operation already performed by another transaction Ti has a timestamp later than that of the current
transaction Tc: the request from Tc has been submitted too late.
Rule    Tc      Ti      Description
1       Write   Read    Tc must not write an object that has been read by any Ti
                        where Ti > Tc; this requires that Tc >= the maximum read
                        timestamp of the object.
2       Write   Write   Tc must not write an object that has been written by any Ti
                        where Ti > Tc; this requires that Tc > the write timestamp of the
                        committed object.
3       Read    Write   Tc must not read an object that has been written by any Ti
                        where Ti > Tc; this requires that Tc > the write timestamp of the
                        committed object.
Timestamp ordering by write rule
Tc must not write objects that have been read by any Ti where Ti >Tc
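The three rules can be sketched as simple comparisons against the object's maximum read timestamp and committed write timestamp; the function names and the timestamps in the example are made up.

def write_allowed(tc, max_read_ts, committed_write_ts):
    # Rules 1 and 2: the write is rejected if a "later" transaction has
    # already read or written the committed object.
    return tc >= max_read_ts and tc > committed_write_ts

def read_allowed(tc, committed_write_ts):
    # Rule 3: the read is rejected if a "later" transaction already wrote it.
    return tc > committed_write_ts

# The object was last written by transaction 3 and last read by transaction 5:
print(write_allowed(tc=6, max_read_ts=5, committed_write_ts=3))   # True
print(write_allowed(tc=4, max_read_ts=5, committed_write_ts=3))   # False: too late
print(read_allowed(tc=2, committed_write_ts=3))                   # False: too late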
2. When a participant receives a canCommit? request it replies with its vote (Yes or No) to
the coordinator. Before voting Yes, it prepares to commit by saving objects in permanent
storage. If the vote is No, the participant aborts immediately.
Phase 2 (completion according to outcome of vote):
3. The coordinator collects the votes (including its own).
(a) If there are no failures and all the votes are Yes, the coordinator decides to commit the
transaction and sends a doCommit request to each of the participants.
(b) Otherwise, the coordinator decides to abort the transaction and sends doAbort requests to
all participants that voted Yes.
4. Participants that voted Yes are waiting for a doCommit or doAbort request from the
coordinator. When a participant receives one of these messages it acts accordingly and, in the
case of commit, makes a haveCommitted call as confirmation to the coordinator.
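The coordinator side of the two phases can be sketched as below; the participant callables stand in for canCommit? requests, and a real implementation would also log decisions to permanent storage and handle timeouts.

def two_phase_commit(participants, trans="T"):
    # Phase 1 (voting): ask every participant canCommit?(trans).
    votes = [p(trans) for p in participants]
    # Phase 2 (completion): commit only if every vote was Yes.
    return "doCommit" if all(votes) else "doAbort"

print(two_phase_commit([lambda t: True, lambda t: True]))    # doCommit
print(two_phase_commit([lambda t: True, lambda t: False]))   # doAbort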
Hierarchic Two phase commit protocol
In this approach, the two-phase commit protocol becomes a multi-level nested protocol.
The coordinator of the top-level transaction communicates with the coordinators of the sub-
transactions for which it is the immediate parent.
It sends canCommit? messages to each of the latter, which in turn pass them on to the
coordinators of their child transactions.
canCommit?(trans, subTrans) Yes / No: Call from coordinator to coordinator of child sub-
transaction to ask whether it can commit a sub-transaction subTrans. The first argument, trans,
is the transaction identifier of the top-level transaction. Participant replies with its vote, Yes /
No.
Flat Two phase commit protocol
In this approach, the coordinator of the top-level transaction sends canCommit? messages to
the coordinators of all of the sub-transactions in the provisional commit list.
During the commit protocol, the participants refer to the transaction by its top-level TID.
Each participant looks in its transaction list for any transaction or sub-transaction matching that
TID.
A participant can commit descendants of the top-level transaction unless they have aborted
ancestors.
When a participant receives a canCommit? request, it does the following:
If the participant has any provisionally committed transactions that are descendants of the top-
level transaction, trans, it:
checks that they do not have aborted ancestors in the abortList, then prepares
to commit (by recording the transaction and its objects in permanent storage);
aborts those with aborted ancestors;
sends a Yes vote to the coordinator.
If the participant does not have a provisionally committed descendant of the top-level
transaction, it must have failed since it performed the sub-transaction, and it therefore sends a No vote
to the coordinator.
4.14 DISTRIBUTED DEADLOCKS
• A cycle in the global wait-for graph (but not in any single local one) represents a distributed
deadlock.
• A deadlock that is detected but is not really a deadlock is called a phantom deadlock.
• Two-phase locking prevents phantom deadlocks; autonomous aborts may cause phantom
deadlocks.
• Deadlock is the permanent blocking of a set of processes that either compete for system resources or
communicate with each other.
• No node has complete and up-to-date knowledge of the entire distributed system; this is what makes
detecting distributed deadlocks difficult.
Fig 4.17: Distributed deadlocks and wait-for graphs
Types of distributed deadlock:
Resource deadlock: a set of deadlocked processes, where each process waits for a resource
held by another process (e.g., a data object in a database, an I/O resource on a server).
Communication deadlock: a set of deadlocked processes, where each process waits to
receive messages (communication) from other processes in the set.
Local and Global Wait for Graphs
Edge Chasing
When a server notes that a transaction T starts waiting for another transaction U, which is
waiting to access a data item at another server, it sends a probe containing
T → U to the server of the data item at which transaction U is blocked.
Detection: servers receive probes and decide whether a deadlock has occurred and whether to forward the
probes.
When a server receives a probe T → U and finds that the transaction that U is waiting for, say V, is
itself waiting for another data item elsewhere, it forwards the probe T → U → V.
Resolution: select a transaction in the cycle to abort
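A sketch of probe forwarding follows, with the wait-for edges of all servers collapsed into one dictionary for simplicity (each transaction is assumed to wait for at most one other); the transaction names are illustrative.

def edge_chase(wait_for, blocked_edge):
    # blocked_edge (T, U) is the edge that starts the probe; the probe is
    # extended while the last transaction in it is itself waiting, and a
    # repeated transaction means a cycle, i.e. a distributed deadlock.
    probe = list(blocked_edge)                 # initiation: probe <T -> U>
    while probe[-1] in wait_for:
        nxt = wait_for[probe[-1]]
        if nxt in probe:
            return probe + [nxt]               # deadlock detected
        probe.append(nxt)                      # forward probe <T -> U -> V ...>
    return None                                # the last transaction is not waiting

print(edge_chase({"U": "V", "V": "T"}, ("T", "U")))   # ['T', 'U', 'V', 'T']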
Transaction priorities
Every transaction involved in a deadlock cycle can cause deadlock detection to be initiated.
The effect of several transactions in a cycle initiating deadlock detection is that detection may
happen at several different servers in the cycle, with the result that more than one transaction in the
cycle is aborted.
4.14.1 System Model
4.15 CASE STUDY: CODA
Coda is a distributed file system developed at Carnegie Mellon University (CMU)
in the 1990s. It is now integrated with a number of popular UNIX-based operating systems such as
Linux.
Features of Coda File System (CFS):
The main goal of CFS is to achieve high availability.
It has advanced caching schemes.
It provides transparency.
Architecture of Coda
The clients cache entire files locally.
Cache coherence is maintained by the use of callbacks. Clients dynamically find files on
servers and cache location information.
For security, token-based authentication and end-to-end encryption is used.
Communication in Coda
Processes in Coda
Coda maintains a distinction between client and server processes: clients are known as Venus
processes and servers as Vice processes. The threads are non-preemptive and operate entirely in user
space; low-level threads handle I/O operations.
Caching in Coda
Cache consistency in Coda is maintained using callbacks.
The Vice server keeps track of every client that has a copy of a file by recording a callback promise for it;
these promises (tokens obtained from the Vice server) guarantee that Venus will be notified if the file is modified.
Upon modification, the Vice server sends an invalidation (a callback break) to the clients.
REVIEW QUESTIONS
PART – A
1. Define clock skew.
Clock skew is defined as the difference between the times on two clocks.
2. Define clock drift.
Clock drift means that clocks count time at different rates, and so gradually diverge from one another.
3. What is Clock drift rate ?
Clock drift rate is the difference in precision between a perfect reference clock and a physical clock.
4. What is External synchronization?
External synchronization synchronizes a process's clock with an authoritative external reference clock S(t),
limiting the skew to a bound D > 0: |S(t) - Ci(t)| < D for all t.
For example, synchronization with a UTC (Coordinated Universal Time) source.
5. What is Internal synchronization?
Internal synchronization synchronizes the local clocks within a distributed system so that they disagree by no more than a
bound D > 0, without necessarily achieving external synchronization: |Ci(t) - Cj(t)| < D for all
i, j, t.
For a system with an external synchronization bound of D, the internal synchronization is
bounded by 2D.
6. Give the types of clocks.
Two types of clocks are used:
Logical clocks : to provide consistent event ordering
Physical clocks : clocks whose values must not deviate from the real time by more than a
certain amount.
7. What are the techniques are used to synchronize clocks?
time stamps of real-time clocks
message passing
round-trip time (local measurement)
8. List the algorithms that provides clock synchronization.
Cristian's algorithm
Berkeley algorithm
Network time protocol (Internet)
9. Give the working of Berkley’s algorithm.
The time daemon asks all the other machines for their clock values. The machines answer the
request. The time daemon tells everyone how to adjust their clock.
10. What is NTP?
The Network Time Protocol defines architecture for a time service and a protocol to distribute time
information over the Internet.
11. Give the working of Procedure call and symmetric modes:
All messages carry timing history information.
The history includes the local timestamps of send and receive of the previous NTP message
and the local timestamp of send of this message
For each pair i of messages (m, m') exchanged between two servers, the following
values are computed:
39
- offset oi: estimate of the actual offset between the two clocks
- delay di: true total transmission time for the pair of messages.
12. Define causal ordering.
The partial ordering obtained by generalizing the relationship between two processes is called the
happened-before relation, causal ordering or potential causal ordering.
13. Define logical clock.
A Lamport logical clock is a monotonically increasing software counter, whose value need bear no
particular relationship to any physical clock.
14. What is global state?
The global state of a distributed system consists of the local state of each process, together with the
messages that are currently in transit, that is, that have been sent but not delivered.
15. When do you call an object to be a garbage?
An object is considered to be garbage if there are no longer any references to it anywhere in the
distributed system.
16. Define distributed deadlock.
A distributed deadlock occurs when each of a collection of processes waits for another process to
send it a message, and where there is a cycle in the graph of this "waits-for" relationship.
17. What is distributed snapshot?
Distributed Snapshot represents a state in which the distributed system might have been in. A
snapshot of the system is a single configuration of the system.
18. What is consistent cut?
A consistent global state is one that corresponds to a consistent cut.
19. Define run.
A run is a total ordering of all the events in a global history that is consistent with each local
history's ordering.
20. What is linearization?
A linearization or consistent run is an ordering of the events in a global history that is consistent
with the happened-before relation → on H.
21. What is global state predicate?
A global state predicate is a function that maps from the set of global states of processes in the
system to {True, False}.
22. What are the features of the unreliable failure detectors?
• Its verdicts are unsuspected or suspected, i.e. there may be no definite evidence of failure.
• Each process sends an "alive" message to everyone else.
• A process from which no "alive" message is received within the timeout is suspected.
• Unreliable failure detectors are what most practical systems provide.
23. What are the features of the reliable failure detectors?
• Its verdicts are unsuspected or failed.
• Reliable failure detectors require a synchronous system.
24. Define distributed mutual exclusion.
Distributed mutual exclusion provides critical regions (mutually exclusive access to shared resources) in a distributed environment.
25. What are the requirements for Mutual Exclusion (ME)?
[ME1] Safety: at most one process may execute in the critical section at a time.
[ME2] Liveness: requests to enter and exit the critical section eventually succeed.
[ME3] Happened-before ordering: entry to the critical section is granted in the happened-before order of the requests.
26. What are the criteria for performance measures?
Bandwidth consumption, which is proportional to the number of messages sent in each entry and
exit operations.
The client delay incurred by a process at each entry and exit operation.
Throughput of the system: Rate at which the collection of processes as a whole can access the
critical section.
27. What is ring based algorithm?
The simplest way to arrange mutual exclusion between N processes without requiring an
additional process is to arrange them in a logical ring.
28. What is multicast synchronisation?
This exploits mutual exclusion between N peer processes based upon multicast. Processes that
require entry to a critical section multicast a request message, and can enter it only when all the
other processes have replied to this message.
29. What is Maekawa's Voting Algorithm?
In this algorithm it is not necessary for all of a process's peers to grant it access: a process only needs to obtain
permission to enter from a subset of its peers (its voting set), as long as the voting sets used by any two processes
overlap.
30. What is election algorithm?
An algorithm for choosing a unique process to play a particular role is called an election algorithm.
31. What is Ring based Election Algorithm?
All the processes arranged in a logical ring.
Each process has a communication channel to the next process.
All messages are sent clockwise around the ring.
Assume that no failures occur, and system is asynchronous.
The ultimate goal is to elect a single coordinator process, which is the process with the largest identifier.
32. What is bully algorithm?
This algorithm allows processes to crash during an election, although it assumes that message delivery
between processes is reliable.
33. What are the messages in Bully algorithm?
There are three types of messages:
a. Election message: this is sent to announce an election. A process begins an election
when it notices, through timeouts, that the coordinator has failed; it then expects an answer
within T = 2Ttrans + Tprocess from the time of sending.
b. Answer message: this is sent in response to an election message.
c. Coordinator message: this is sent to announce the identity of the elected process.
34. Define transaction.
A transaction defines a sequence of server operations that is guaranteed to be atomic in the
presence of multiple clients and server crashes.
35. List the methods to ensure serializability.
There are three ways to ensure serializability:
Locking
Timestamp ordering
Optimistic concurrency control
36. Give the advantages of nested transactions.
Sub-transactions at the same level can run concurrently.
Sub-transactions can commit or abort independently.
PART - B
1. Explain the clocking in detail.
2. Describe Cristian's algorithm.
3. Write about Berkeley algorithm
4. Brief about NTP.
5. Explain about logical time and logical clocks.
6. Describe global states.
7. Brief about distributed mutual exclusion.
8. Write in detail about election algorithms.
9. Explain transactions and concurrency control.
10. Describe nested transaction and its issues.
11. What is optimistic concurrency control?
12. Write in detail about timestamp ordering.
13. Describe atomic commit protocol.
14. Explain distributed deadlocks.
15. Describe replication.
16. Write about Coda.