
Distributed System

This document discusses the characterization of distributed systems. It is divided into 5 parts. Part 1 provides an introduction and examples of distributed systems. It defines distributed systems and their key characteristics. Part 2 covers resource sharing and challenges in distributed architectures. Part 3 discusses theoretical foundations, including limitations and models. Part 4 focuses on logical clocks. Part 5 is about concepts in message ordering and global state detection in distributed systems.

Uploaded by

DataBase
Copyright
© © All Rights Reserved

UNIT 1
Characterization of
Distributed System

CONTENTS
Part-1 : Introduction, Example of Distributed System ......... 1-2B to 1-4B
Part-2 : Resource Sharing and Web Challenges, Architectural
         Models, Fundamental Models .......................... 1-4B to 1-9B
Part-3 : Theoretical Foundation for Distributed System :
         Limitations of Distributed System, Absence of
         Global Clock, Shared Memory ......................... 1-10B to 1-12B
Part-4 : Logical Clock, Lamport's and Vectors Logical Clocks . 1-12B to 1-15B
Part-5 : Concept in Message System : Causal Order, Total
         Order, Total Causal Order, Techniques for Message
         Ordering, Causal Ordering of Messages, Global State
         and Termination Detection ........................... 1-16B to 1-20B

1-1 B (CS-Sem-7)
1-2 B (CS-Sem-7) Characterization of Distributed System

PART- 1

Introduction, Example of Distributed System.

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 1.1. What is a distributed system ? Describe the
characteristics of distributed system. Give two examples of
distributed system. AKTU 2014-15, Marks 05

Answer
Distributed system :
1. A distributed system is a system in which software or hardware
components connected via a communication network communicate and
coordinate their actions only by passing messages.
2. Computers that are connected by a network may be spatially separated
by distance.
3. Resources may be managed by servers and accessed by clients.
Characteristics of distributed system :
1. Heterogeneity : A distributed system enables users to access services
and run applications over a heterogeneous collection of computers and
networks.
2. Openness : The openness of a computer system is the characteristic
that determines whether the system can be extended and re-implemented
in various ways.
3. Concurrency : Concurrency in a distributed system allows different
users to access a shared resource at the same time.
4. Scalability : A system is described as scalable if it remains effective
when there is a significant increase in the number of resources and the
number of users.
5. Security : Security provides confidentiality, integrity and availability of
the information resources.
Examples of distributed system :
1. Internet : The Internet is a very large distributed system. It enables
users to make use of services such as the World Wide Web, e-mail and
file transfer.
2. Intranet :
a. An intranet is a private network that is contained within an
enterprise.
b. An intranet is connected to the Internet via a router, which allows
the users inside the intranet to make use of services such as web or
e-mail.

Que 1.2. What are distributed systems ? What are significant
advantages and applications of distributed system ?
AKTU 2018-19, Marks 10
Answer
Distributed system : Refer Q. 1.1, Page 1-2B, Unit-1.
Advantages of distributed system :
1. Data sharing : It allows many users to access a common database.
2. Resource sharing : Expensive peripherals like color printers can be
shared among different nodes (or systems).
3. Communication : It enhances human-to-human communication, e.g.,
email, chat.
4. Flexibility : It spreads the workload over the available machines.
Applications of distributed systems :
1. Telecommunication networks such as telephone networks and cellular
networks.
2. Network applications such as the World Wide Web and peer-to-peer
networks.
3. Real-time process control, such as aircraft control systems.
4. Parallel computation.
Que 1.3. How is the distributed computing system better than
parallel processing system ? Explain. AKTU 2017-18, Marks 10
Answer
A distributed computing system is better than a parallel processing system
because of the following advantages :
1. Economics : Microprocessors offer a better price/performance ratio
than a parallel processing system.
2. Speed : A distributed system may have more total computing power
than a parallel processing system.
3. Reliability : If some of the machines are down, the distributed
system as a whole can still survive with a small degradation of
performance.
4. Incremental growth : Computing power can be added in small
increments.
5. Data sharing : It allows many users to access a common database.
6. Device sharing : It allows many users to share expensive peripherals.
7. Flexibility : In distributed computing, the workload can be spread over
the available machines in the most cost-effective way.

Que 1.4. What is distributed transparency ? Explain the different
types of distributed transparencies. AKTU 2017-18, Marks 10

Answer
Distributed transparency is the property of distributed databases by
virtue of which the internal details of the distribution are hidden from the
users.
Types of transparencies :
1. Access transparency : It enables local and remote resources to be
accessed using identical operations.
2. Location transparency : It enables resources to be accessed without
knowledge of their physical or network location.
3. Concurrency transparency : It enables several processes to operate
concurrently using shared resources without interference between
them.
4. Replication transparency : It enables multiple instances of resources
to be used to increase reliability and performance without knowledge of
the replicas by users or application programmers.
5. Failure transparency : It enables the concealment of faults, allowing
users and application programs to complete their tasks despite the failure
of hardware or software components.
6. Performance transparency : It allows the system to be reconfigured
to improve performance as the load varies.

PART-2

Resource Sharing and Web Challenges, Architectural


Models, Fundamental Models.

Questions-Answers
Long Answer Type and Medium Answer Type Questions
Que 1.5. How is resource sharing done in a distributed system ?
Explain with an example.
Answer
1. Resource sharing is one of the major advantages obtained from a
distributed system.
2. In a distributed system, the resources are enclosed within computers
and can only be accessed from other computers by communication.
3. Each resource must be managed by a program that offers a
communication interface enabling the resource to be accessed.
4. The program also helps the resources to be updated reliably and
consistently.
For example :
1. The client-server model provides an effective general purpose approach
to the sharing of information and resources in distributed systems.
2. In this model, a client sends a request to a server for some
service, such as reading a block of a file.
3. The server executes the request and sends back a reply to the client that
contains the result of request processing.
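
The request/reply exchange described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Request`, `Reply`, `FileServer` and `Client` are hypothetical, the 4-byte block size is chosen to keep the example small, and a direct method call stands in for the network.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """A client request; the operation name and fields are illustrative."""
    op: str
    filename: str
    block: int


@dataclass
class Reply:
    ok: bool
    data: bytes = b""
    error: str = ""


class FileServer:
    """Manages a resource (files split into blocks) behind a message interface."""

    BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow

    def __init__(self, files: dict[str, bytes]) -> None:
        self._files = files  # the resource is enclosed within the server

    def handle(self, request: Request) -> Reply:
        if request.op != "read_block":
            return Reply(ok=False, error=f"unknown operation {request.op!r}")
        content = self._files.get(request.filename)
        if content is None:
            return Reply(ok=False, error="no such file")
        start = request.block * self.BLOCK_SIZE
        return Reply(ok=True, data=content[start:start + self.BLOCK_SIZE])


class Client:
    """Accesses the resource only by sending requests to the server."""

    def __init__(self, server: FileServer) -> None:
        self._server = server  # stands in for a network connection

    def read_block(self, filename: str, block: int) -> bytes:
        reply = self._server.handle(Request("read_block", filename, block))
        if not reply.ok:
            raise IOError(reply.error)
        return reply.data


server = FileServer({"notes.txt": b"distributed"})
client = Client(server)
first_block = client.read_block("notes.txt", 0)   # bytes 0..3 of the file
third_block = client.read_block("notes.txt", 2)   # bytes 8..10 of the file
```

The client never touches the file contents directly; everything it learns arrives in a `Reply`, which mirrors the point that resources are reachable only through the managing program's interface.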
Que 1.6. Discuss the major issues in designing a distributed
system. AKTU 2017-18, Marks 10
Answer
Major issues in designing a distributed system :
1. Heterogeneity :
a. A distributed system must be constructed from a variety of different
networks, operating systems, computer hardware and
programming languages.
b. Internet communication protocols mask the differences
(heterogeneity) in networks, and middleware can deal with the
other differences.
2. Openness : A distributed system should be extensible, i.e., interfaces
should be developed for its components so that they can be
integrated with new extensions of the system.
3. Security :
a. Encryption can be used to provide adequate protection of shared
resources and to keep sensitive information secret when it is
transmitted in messages over a network.
b. Denial of Service (DoS) attacks are one of the big problems for security.
4. Scalability :
a. Scalability refers to the capability of a system to adapt to
increased service load.
b. It is inevitable that a distributed system will grow with time, since it
is very common to add new machines to take care of increased
work load. Therefore, a distributed system should be designed to
easily cope with the growth of nodes and users in the system.
5. Fault avoidance :
a. Fault avoidance deals with designing the components of the system
in such a way that the occurrence of faults is minimized.
b. Conservative design practices such as using high-reliability
components are often employed for improving the system's
reliability based on the idea of fault avoidance.
6. Transparency :
a. Transparency aims to hide the details of distribution from the users.
b. For example, a user or programmer need not be concerned with
the location of a component, the details of how its operations are
accessed by other components, or whether it will be replicated or
migrated.
Que 1.7. Why is scalability an important feature in the design of
distributed system ? Discuss some of the guiding principles for
designing a scalable distributed system.
AKTU 2014-15, Marks 10
Answer
Scalability is an important feature in the design of a distributed system
because :
a. It helps the system to work efficiently with an increase in the number of
users.
b. It increases the system performance by incorporating additional
resources.
Guiding principles for designing a scalable distributed system :
1. Avoid centralized entities :
a. Use of centralized entities should be avoided in the design of a scalable
distributed system because :
i. In a centralized system, if the centralized entity fails then the entire
system will also fail.
ii. The capacity of the network that connects the centralized entity
gets saturated.
iii. In case of a wide-area network system, traffic in the network
increases.
b. Replication of resources and distributed control algorithms are
frequently used techniques to achieve a scalable system.
c. For better scalability, a functionally symmetric configuration should
be used, in which all nodes of the system play an equal role in
the operation of the system.
2. Avoid centralized algorithms :
a. A centralized algorithm is one that operates by collecting information
from all nodes, processing this information on a single node and
then distributing the result to other nodes.
b. The time complexity of a centralized algorithm may be very high, and
it creates heavy network traffic and consumes network bandwidth.
c. Therefore, in the design of a distributed operating system, only
decentralized algorithms should be used.
3. Perform most operations on client workstations :
a. If possible, an operation should be performed on the client's
workstation.
b. This principle enhances the scalability of the system, since it allows
graceful degradation of system performance as the system grows
in size.
c. Caching is a frequently used technique for the realization of this
principle.
4. Controlling the cost of physical resources : As the demand for a
resource grows, it should be possible to extend the system, at reasonable
cost, to meet it.

Que 1.8. Discuss architectural models of distributed system.

Answer
1. An architectural model of a distributed system simplifies and abstracts
the functions of the individual components of a distributed system.
2. It also considers the placement of the components across a network of
computers and the interrelationship between the components.
3. The main objective of these models is to make the system reliable,
manageable, adaptable and cost-effective.
4. The two main types of architectural model are :
a. Client-server model (Search engine) :
i. Fig. 1.8.1 illustrates the simple structure in which client
processes interact with individual server processes in separate
host computers in order to access the shared resources that
they manage.

Fig. 1.8.1. Clients invoke individual servers. (Each client sends an
invocation to a server and receives a result; a server may in turn
invoke another server.)

ii. This is the architecture that is most widely employed.
iii. The client-server model offers a direct and simple approach to the
sharing of data and other resources.
iv. Servers may act as clients of other servers.
v. For example, a web server is often a client of a local file server
that manages the files in which the web pages are stored.
b. Peer-to-Peer model :
i. In this architecture, all of the processes which are involved in a
task play similar roles, interacting cooperatively as peers
without any distinction between client and server processes.
ii. Fig. 1.8.2 illustrates the form of a peer-to-peer application.

Fig. 1.8.2. Distributed application based on peer-to-peer processes.
(Peer-1, Peer-2 and Peer-3 each run the application together with
coordination code.)

iii. Applications are composed of large numbers of peer processes
running on separate computers, and the pattern of
communication between them depends on application
requirements.
Que 1.9. Explain the fundamental models of distributed system.
Answer
1. Fundamental models are based on the fundamental properties of
distributed systems that allow us to be more specific about their
characteristics and about the failures and security risks that they
might exhibit.
2. The purpose of a model is :
a. To make explicit all the relevant assumptions about the systems.
b. To make generalizations concerning what is possible or impossible,
given those assumptions.
Following are the fundamental models :
1. Interaction model :
a. It is concerned with the performance of processes and communication
channels and with the absence of a global clock.
b. The interaction model is further classified into synchronous and
asynchronous systems.
c. Interacting processes perform all of the activity in a distributed system.
d. Each process has its own state, consisting of the set of data that it
can access and update, including the variables in its program.
e. The state belonging to each process is completely private.
2. Failure model :
a. In a distributed system both processes and communication channels
may fail, so this model accounts for both kinds of failure.
b. The failure model defines and classifies the faults that occur in the
system.
c. It provides a model to understand the effects of faults in the system.
3. Security model :
a. It identifies the possible threats to processes and communication
channels in an open distributed system, covering concerns such as
integrity, authentication and privacy.
b. The architectural model provides the basis for our security model :
i. The security of a distributed system can be achieved by securing
the processes and the channels used for their interactions and
by protecting the objects that they encapsulate against
unauthorized access.
ii. Protection is described in terms of objects, although the concepts
apply equally well to resources of all types.

PART-3

Theoretical Foundation for Distributed System : Limitations of


Distributed System, Absence of Global Clock, Shared Memory.

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 1.10. Explain the limitations of distributed system with
example. AKTU 2018-19, Marks 10

Answer
Limitations of distributed systems are as follows :
1. Absence of global clock :
a. In a distributed system, a global clock (or common clock) is not
present.
b. Suppose a global clock is available for all the processes in the
system.
c. In this case, two different processes can observe a global clock
value at different instants due to unpredictable message
transmission delays.
d. Therefore, two different processes may falsely perceive two
different instants in physical time to be a single instant in physical
time.
2. Absence of shared memory :
a. Since the computers in a distributed system do not share common
memory, an up-to-date state of the entire system is not available
to any individual process.
b. Such a state is necessary for reasoning about the system's behaviour,
debugging, recovering from failures, etc.
c. A process in a distributed system can obtain a coherent but partial
view of the system or a complete but incoherent view of the system.
d. A view is said to be coherent if all the observations of different
processes (computers) are made at the same physical time.
e. Because of the absence of a global clock in a distributed system,
obtaining a coherent global state of the system is difficult.
Example :
Initial state : S1 (A) holds Rs. 500, S2 (B) holds Rs. 200 and the
communication channels are empty.
After A debits Rs. 50 : S1 (A) holds Rs. 450, S2 (B) holds Rs. 200 and
a message carrying Rs. 50 is in transit (not yet reached B).
Fig. 1.10.1.
a. S1 records its local state (Rs. 450) just after the debit (- 50) and S2
records its local state (Rs. 200) before receiving the message.
b. If the message in transit is not taken care of :
Global state = Local state S1 + Local state S2
= 450 + 200
= Rs. 650
Rs. 50 is missing from the actual total of Rs. 700, i.e., the recorded
global state is incoherent.
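
The arithmetic of this example can be checked with a short Python sketch. The variable names are illustrative only; the amounts are the ones given for Fig. 1.10.1.

```python
# Balances from Fig. 1.10.1 (amounts in Rs.).
initial = {"A": 500, "B": 200}
true_total = sum(initial.values())

# A debits 50 and sends it to B; the message is still in the channel
# when both sites record their local states.
recorded_local = {"A": 450, "B": 200}
in_transit = [50]

# Summing only the local states ignores the channel.
naive_global_state = sum(recorded_local.values())

# Including the channel contents restores the true total.
full_global_state = naive_global_state + sum(in_transit)

missing = true_total - naive_global_state
print(naive_global_state, full_global_state, missing)
```

The gap between the two sums is exactly the content of the channel, which is why a global state must record channel states as well as local states.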

Que 1.11. What are distributed systems ? What are significant
advantages, applications and limitations of distributed systems ?
Explain with examples, what could be the impact of absence of global
clock & shared memory. AKTU 2015-16, Marks 10
Answer
Distributed systems : Refer Q. 1.1, Page 1-2B, Unit-1.
Significant advantages and applications of distributed systems :
Refer Q. 1.2, Page 1-3B, Unit-1.
Limitations of distributed systems :
1. Absence of shared memory.
2. Absence of global clock.
3. The initial deployment cost of a distributed system is very high.
Impact of the absence of global clock :
1. It is difficult in a distributed system to reason about the temporal order
of events.
2. It is difficult to design and debug algorithms for a distributed system as
compared to centralized systems.
3. Collecting up-to-date information on the state of the entire system is
harder.
Impact of the absence of shared memory :
1. An up-to-date state of the entire system is not available to any of the
individual processes.
2. Recovery from failures becomes difficult.
For example : Refer Q. 1.10, Page 1-10B, Unit-1.

PART-4

Logical Clock, Lamport's and Vectors Logical Clocks.

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 1.12. What are Lamport logical clocks ? List the important
conditions to be satisfied by Lamport logical clocks. If A and B
represent two distinct events in a process and if $A \rightarrow B$ then
$C(A) < C(B)$ but vice-versa is not true. Justify the statement.
AKTU 2015-16, Marks 10

Answer
Lamport logical clocks :
A Lamport logical clock is a monotonically increasing software counter,
whose value need bear no particular relationship to any physical clock.
Following conditions are to be satisfied by Lamport logical clocks :
1. If a and b are two events within the same process $P_i$ and a occurs before
b, then $C_i(a) < C_i(b)$.
2. If a is the sending of a message by process $P_i$ and b is the receipt of that
message by process $P_j$, then $C_i(a) < C_j(b)$.
3. A clock $C_i$ associated with a process $P_i$ must always go forward, never
backward. That is, corrections to the time of a logical clock must always be
made by adding a positive value to the clock, never by subtracting a
value.
Justification : Event A causally affects event B if $A \rightarrow B$. Now, if
$A \rightarrow B$ then $C(A) < C(B)$, but the reverse is not true, because
$C(A) < C(B)$ can also hold for two events that are not causally related.
Hence nothing can be said about the causal relation between events by
comparing their timestamps alone.
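
A minimal Python sketch of these conditions follows. The class name `LamportClock` and its method names are illustrative, and the increment is assumed to be 1. The last lines show the justification in action: event `c` has a larger timestamp than event `a` although no message links the two.

```python
class LamportClock:
    """Scalar logical clock; the increment d is assumed to be 1."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local or send event: the clock only moves forward (conditions 1 and 3)."""
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        """Receive event: move past the sender's timestamp (condition 2)."""
        self.time = max(self.time, msg_time) + 1
        return self.time


p1, p2, p3 = LamportClock(), LamportClock(), LamportClock()

a = p1.tick()         # P1 sends a message stamped with C(a)
p2.tick()             # unrelated local event on P2
b = p2.receive(a)     # P2 receives that message, so a -> b and C(a) < C(b)
p3.tick()             # two independent local events on P3
c = p3.tick()         # C(a) < C(c), yet a and c are not causally related
```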

Que 1.13. Discuss the limitations of Lamport's logical clock with
suitable example. AKTU 2016-17, Marks 05
OR
What are Lamport logical clocks ? List the important conditions to
be satisfied by Lamport logical clocks. Discuss the limitations of
Lamport logical clocks. AKTU 2018-19, Marks 10
Answer
Lamport logical clock and important conditions to be satisfied by
Lamport logical clocks : Refer Q. 1.12, Page 1-12B, Unit-1.
Limitation of Lamport's clocks :
1. According to Lamport's system of logical clocks, if $a \rightarrow b$ then
$C(a) < C(b)$.
2. However, the reverse is not necessarily true if the events have occurred
in different processes.
3. That is, if a and b are events in different processes and $C(a) < C(b)$, then
$a \rightarrow b$ is not necessarily true; events a and b may or may not be
causally related.
4. Thus, Lamport's system of clocks is not powerful enough to capture
such situations.
For example :
Fig. 1.13.1. Space-time diagram of a computation over three processes.
Timestamps in order of occurrence : $P_1$ : (1), (2); $P_2$ : (1), (3);
$P_3$ : (1), (2).
a. Fig. 1.13.1 shows a computation over three processes. Clearly,
$C(e_{11}) < C(e_{22})$ and $C(e_{11}) < C(e_{32})$.
b. However, we can see from Fig. 1.13.1 that event $e_{11}$ is causally
related to event $e_{22}$ but not to event $e_{32}$.
c. Note that the initial clock values are assumed to be zero and d is
assumed to equal 1.
d. In other words, in Lamport's system of clocks, we can guarantee
that if $C(a) < C(b)$ then $b \not\rightarrow a$; however, we cannot say whether
events a and b are causally related or not by just looking at the
timestamps of the events.
e. The reason for the above limitation is that each clock can
independently advance due to the occurrence of local events in a
process.
f. Lamport's clock system cannot distinguish between the
advancements of clocks due to local events and those due to the
exchange of messages between processes.
g. Therefore, using the timestamps assigned by Lamport's clocks, we
cannot reason about the causal relationship between two events
occurring in different processes by just looking at the timestamps
of the events.

Que 1.14. What are vector clocks ? Explain with the help of
implementation rule of vector clocks, how they are implemented ?
Give the advantages of vector clock over Lamport clock.
AKTU 2014-15, Marks 05

Answer

Vector clocks :
1. Vector clocks are used in a distributed system to determine whether
pairs of events are causally related.
2. Using vector clocks, timestamps are generated for each event in the
system, and their causal relationship is determined by comparing those
timestamps.
Implementation of vector clocks :
1. Let n be the number of processes in a distributed system. Each process
$P_i$ is equipped with a clock $C_i$, which is an integer vector of length n.
2. Let a, b be a pair of events. Let $C(a)[i]$ be the $i$-th element of the vector
clock for the event a.
3. $C(a)$ is dominated by $C(b)$, i.e., $C(a) < C(b)$, if and only if the following two
conditions hold :
a. $\forall i,\ 0 \le i \le n-1 : C(a)[i] \le C(b)[i]$
b. $\exists i,\ 0 \le i \le n-1 : C(a)[i] < C(b)[i]$
4. To implement a system of vector clocks, initialize the vector clock of
each process to (0, 0, ..., 0) (n components).
5. The implementation rules for vector clocks :
Rule 1 : Each local event at process $P_i$ increments the $i$-th component of
its vector clock (i.e., $C_i[i]$) by 1.
Rule 2 : The sender appends the vector timestamp to every message
that it sends.
Rule 3 : When process $P_j$ receives a message with a vector timestamp $T$
from another process, it first increments the $j$-th component $C_j[j]$ of its
own vector clock, i.e., $C_j[j] = C_j[j] + 1$, and then updates its vector clock as :
$\forall i,\ 0 \le i \le n-1 : C_j[i] = \max(T[i], C_j[i])$
6. When the vector clock values of two events are incomparable, the events
are concurrent.
7. An example of vector timestamps is shown in Fig. 1.14.1. The event $e_{12}$
with vector timestamp (2, 1, 0) is causally ordered before the event $e_{34}$
with the vector timestamp (2, 1, 4), but is concurrent with the event $e_{32}$
having timestamp (0, 0, 2).

Fig. 1.14.1. Space-time diagram with vector timestamps; every clock
starts at (0, 0, 0). Timestamps in order of occurrence :
$P_1$ : $e_{11}$ (1, 1, 0), $e_{12}$ (2, 1, 0)
$P_2$ : $e_{21}$ (0, 1, 0), then (2, 2, 4)
$P_3$ : $e_{31}$ (0, 0, 1), $e_{32}$ (0, 0, 2), then (2, 1, 3), $e_{34}$ (2, 1, 4)

Advantage of vector clock over Lamport's clock :
1. Events a and b, with vector timestamps $t_a$ and $t_b$, are causally related
if $t_a < t_b$ or $t_b < t_a$. Otherwise, these events are concurrent.
2. In the system of vector clocks,
$a \rightarrow b$ iff $t_a < t_b$
Thus, the system of vector clocks allows us to order events and decide
whether two events are causally related or not by simply looking at the
timestamps of the events.
3. In Lamport's clock this is not possible, because $t_a < t_b$ does not always
imply that $a \rightarrow b$.
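
The three rules and the dominance test can be sketched in Python as below. The class and function names are illustrative, processes are numbered from 0, and the order of messages used to replay Fig. 1.14.1 is an assumption inferred from the timestamps in the figure rather than something the text states.

```python
class VectorClock:
    """Vector clock for one process in a system of n processes."""

    def __init__(self, n: int, pid: int) -> None:
        self.pid = pid
        self.clock = [0] * n

    def local_event(self) -> tuple[int, ...]:
        self.clock[self.pid] += 1                 # Rule 1
        return tuple(self.clock)

    def send(self) -> tuple[int, ...]:
        # Rule 2: sending is an event; the returned timestamp travels with the message.
        return self.local_event()

    def receive(self, t: tuple[int, ...]) -> tuple[int, ...]:
        self.clock[self.pid] += 1                 # Rule 3: increment own component,
        self.clock = [max(own, other) for own, other in zip(self.clock, t)]  # then merge
        return tuple(self.clock)


def happened_before(ta, tb) -> bool:
    """True when ta is dominated by tb (ta < tb in the vector order)."""
    return all(x <= y for x, y in zip(ta, tb)) and any(x < y for x, y in zip(ta, tb))


def concurrent(ta, tb) -> bool:
    """True when the two timestamps are incomparable."""
    return ta != tb and not happened_before(ta, tb) and not happened_before(tb, ta)


# Replay of Fig. 1.14.1 under the assumed message pattern.
p1, p2, p3 = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
e21 = p2.send()            # P2 sends to P1
e31 = p3.local_event()
e32 = p3.local_event()
e11 = p1.receive(e21)
e12 = p1.send()            # P1 sends to P3
e33 = p3.receive(e12)
e34 = p3.send()            # P3 sends to P2
e22 = p2.receive(e34)
```

With these timestamps, `happened_before(e12, e34)` holds while `concurrent(e12, e32)` holds, which matches item 7 of the answer.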

PART-5

Concept in Message System : Causal Order, Total Order, Total
Causal Order, Techniques for Message Ordering, Causal
Ordering of Messages, Global State and Termination Detection.

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 1.15. What do you mean by causal ordering of messages ? If
process P sends two messages $m_1$ and $m_2$ to another process Q, what
problems may arise if the two messages are not received by recipient
Q in the order they were sent by process P ? Develop an algorithm
which guarantees the causal ordering of messages in distributed
system. AKTU 2015-16, Marks 10
OR
Discuss causal ordering of messages. Give one algorithm which can
order the messages according to causal dependencies.
AKTU 2016-17, Marks 10
Answer
Causal ordering of messages : The causal ordering of messages deals with
maintaining the same causal relationship that holds among
"message send" events with the corresponding "message receive" events.
That is, if the sending of $m_1$ causally precedes the sending of $m_2$, every
recipient of both messages must receive $m_1$ before $m_2$.
Problem :
If the two messages $m_1$ and $m_2$ are not received by recipient Q in the order
they were sent by process P, message delivery is not causal : Q may act
on $m_2$ before it has seen $m_1$, on which $m_2$ may depend.
Algorithm :
Schiper-Eggli-Sandoz algorithm :
Instead of maintaining a vector clock based on the number of messages sent
to each process, the vector clock for this algorithm can be incremented at
any rate and has no additional meaning related to the number of messages
sent to the processes.
Sending a message :
1. All messages are timestamped and sent out with a list of all timestamps
of messages sent to other processes.
2. Locally store the timestamp of the sent message.
Receiving a message :
1. A message cannot be delivered if there is a message that predates it in
the list of timestamps.
2. Otherwise, a message can be delivered, performing the following steps :
a. Add the timestamp of the delivered message to the list :
i. Add knowledge of messages destined for other processes to
our list of processes.
ii. If the new list has a timestamp greater than one we already
had stored, update our timestamp to match.
b. Update the local logical clock.
c. Check all the local buffered messages to see if they can now be
delivered.

Que 1.16. Explain the algorithm for causal ordering of message in
distributed system.

Answer
Algorithm for causal ordering of messages in distributed system :
a. Birman-Schiper-Stephenson algorithm :
There are three basic principles to this algorithm :
1. All messages are timestamped by the sending process.
2. A message cannot be delivered until :
i. All the messages before this one have been delivered locally.
ii. All the other messages that have been sent out from the original
process have been accounted as delivered at the receiving
process.
3. When a message is delivered, the clock is updated.
This algorithm requires that the processes communicate through
broadcast messages, which ensure that only one message could be
received at any one time.
b. Schiper-Eggli-Sandoz algorithm : Refer Q. 1.15, Page 1-16B,
Unit-1.
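
The delivery test of the Birman-Schiper-Stephenson algorithm can be sketched as follows. This is a simplified illustration, not a full protocol: the class `BSSProcess` and its method names are hypothetical, the standard vector-timestamp form of the delivery condition is assumed, and messages are handed to receivers by direct calls instead of a real broadcast.

```python
class BSSProcess:
    """One process taking part in causally ordered broadcast."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.vt = [0] * n      # vt[k] = number of broadcasts from process k delivered here
        self.buffer = []       # messages that arrived too early
        self.delivered = []    # payloads in delivery order

    def broadcast(self, payload):
        """Timestamp the message with the sender's vector (principle 1)."""
        self.vt[self.pid] += 1
        return (self.pid, tuple(self.vt), payload)

    def _deliverable(self, sender: int, vt_m) -> bool:
        # Principle 2: it is the next message from the sender, and everything
        # the sender had already delivered has been delivered here as well.
        next_from_sender = vt_m[sender] == self.vt[sender] + 1
        seen_the_rest = all(
            vt_m[k] <= self.vt[k] for k in range(len(self.vt)) if k != sender
        )
        return next_from_sender and seen_the_rest

    def receive(self, message) -> None:
        self.buffer.append(message)
        progress = True
        while progress:                     # re-check the buffer after every delivery
            progress = False
            for m in list(self.buffer):
                sender, vt_m, payload = m
                if self._deliverable(sender, vt_m):
                    self.buffer.remove(m)
                    self.vt[sender] = vt_m[sender]   # principle 3: update the clock
                    self.delivered.append(payload)
                    progress = True


p0, p1, p2 = BSSProcess(0, 3), BSSProcess(1, 3), BSSProcess(2, 3)
m1 = p0.broadcast("m1")
p1.receive(m1)
m2 = p1.broadcast("m2")          # m2 causally follows m1
p2.receive(m2)                   # arrives first at P2 and is buffered
after_m2 = list(p2.delivered)
p2.receive(m1)                   # m1 is delivered, which releases m2
```

Process `p2` ends with `m1` delivered before `m2` even though `m2` arrived first, which is the causal order the algorithm is meant to enforce.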

Que 1.17. Write short note on global state.

Answer

1. The global state of a distributed computation is the set of local states of
all individual processes involved in the computation and the state of
the communication channels.
2. The global state of the system is a collection of the local states (LS) of
its sites :
$GS = (LS_1, LS_2, LS_3, \ldots, LS_N)$
where N is the number of sites in the system.
Consistent global state :
1. A global state GS is a consistent global state iff it satisfies the following
two conditions :
a. Every message $m_{ij}$ that is recorded as sent in the local state of a
process $P_i$ must be captured in the state of the channel $C_{ij}$ or in the
collected local state of the receiver process $P_j$.
b. If a message $m_{ij}$ is not recorded as sent in the local state of process
$P_i$, then it must neither be present in the state of the channel $C_{ij}$
nor in the collected local state of the receiver process $P_j$.
2. Thus, in a consistent global state, for every received message a
corresponding send event is recorded in the global state.
3. In an inconsistent global state, there is at least one message whose
receive event is recorded but its send event is not recorded in the
global state.
4. In Fig. 1.17.1, the global states $(LS_{12}, LS_{23}, LS_{33})$ and
$(LS_{11}, LS_{22}, LS_{32})$ correspond to consistent and inconsistent global
states, respectively.
Transitless global state : A global state is transitless if and only if
$\forall i, \forall j : 1 \le i, j \le n : transit(LS_i, LS_j) = \emptyset$. Thus, all
communication channels are empty in a transitless global state.

Fig. 1.17.1. Global states in a distributed computation. Local states
recorded at each site :
$S_1$ : $LS_{11}$, $LS_{12}$
$S_2$ : $LS_{21}$, $LS_{22}$, $LS_{23}$
$S_3$ : $LS_{31}$, $LS_{32}$, $LS_{33}$

Strongly consistent global state :
1. A global state is strongly consistent if it is consistent and transitless.
2. In a strongly consistent state, not only the send events of all the recorded
receive events are recorded, but the receive events of all the recorded
send events are also recorded.
3. Thus, a strongly consistent state corresponds to a consistent global
state in which all channels are empty.
4. In Fig. 1.17.1, the global state $(LS_{11}, LS_{21}, LS_{31})$ is a strongly
consistent global state.
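
The three definitions can be compared side by side with a small Python sketch. It assumes, purely for illustration, that a global state is summarised by two sets of message identifiers: those recorded as sent and those recorded as received. The function name `classify` is hypothetical.

```python
def classify(sent: set, received: set) -> str:
    """Classify a global state from its recorded send and receive events."""
    orphans = received - sent       # receive recorded, send not recorded
    in_transit = sent - received    # send recorded, receive not yet recorded
    if orphans:
        return "inconsistent"
    if in_transit:
        return "consistent"         # consistent, but some channel is not empty
    return "strongly consistent"    # consistent and transitless


only_sent = classify({"m1"}, set())
only_received = classify(set(), {"m1"})
both_recorded = classify({"m1"}, {"m1"})
```

Here `only_sent` is `"consistent"`, `only_received` is `"inconsistent"` and `both_recorded` is `"strongly consistent"`.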

Que 1.18. Give the Chandy-Lamport's global state recording
algorithm. AKTU 2016-17, Marks 05

Answer

Chandy-Lamport global state recording algorithm :
1. The Chandy-Lamport algorithm uses a control message, called a marker,
whose role in a FIFO system is to separate messages in the channels.
2. After a site has recorded its local state, it sends a marker along all of its
outgoing channels before sending out any more messages.
3. A marker separates the messages in the channel into those which are
to be included in the recorded state and those which are not to be
recorded in it.
4. A process must record its local state no later than when it receives a
marker on any of its incoming channels.
Chandy-Lamport algorithm :
1. Marker receiving rule for process j :
On receiving a marker along channel C :
If (j has not recorded its state) then
Record its process state
Record the state of C as the empty set
Follow the "marker sending rule"
else
Record the state of C as the set of messages received along C after
j's state was recorded and before j received the marker along C.
2. Marker sending rule for process i :
a. Process i records its state.
b. For each outgoing channel C on which a marker has not been sent,
i sends a marker along C before i sends further messages along C.
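
The two rules can be exercised on the two-site bank example of Q. 1.10 with the simulation below. It is a sketch under stated assumptions: the class `Process`, the `MARKER` constant and the use of `deque` objects as FIFO channels are illustrative choices, and site A is assumed to start the recording on its own initiative.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, name, balance, network):
        self.name = name
        self.balance = balance
        self.network = network        # (src, dst) -> deque acting as a FIFO channel
        self.recorded_state = None    # local state captured by the snapshot
        self.channel_state = {}       # incoming channel -> messages recorded for it
        self.recording = set()        # incoming channels still being recorded

    def send_money(self, dst, amount):
        self.balance -= amount
        self.network[(self.name, dst)].append(amount)

    def record_and_send_markers(self):
        """Marker sending rule: record own state, then send a marker on each outgoing channel."""
        self.recorded_state = self.balance
        self.recording = {c for c in self.network if c[1] == self.name}
        for channel in self.network:
            if channel[0] == self.name:
                self.network[channel].append(MARKER)

    def receive(self, src):
        """Take the next message from channel (src, self) and apply the marker receiving rule."""
        channel = (src, self.name)
        message = self.network[channel].popleft()
        if message == MARKER:
            if self.recorded_state is None:
                self.record_and_send_markers()
                self.channel_state[channel] = []       # channel recorded as empty
            self.recording.discard(channel)            # stop recording this channel
            self.channel_state.setdefault(channel, [])
        else:
            self.balance += message
            if channel in self.recording:              # arrived after recording, before marker
                self.channel_state.setdefault(channel, []).append(message)


network = {("A", "B"): deque(), ("B", "A"): deque()}
a = Process("A", 500, network)
b = Process("B", 200, network)

a.send_money("B", 50)          # Rs. 50 is now in transit from A to B
a.record_and_send_markers()    # A starts the snapshot and records Rs. 450
b.send_money("A", 30)          # B sends before it has seen the marker
b.receive("A")                 # B receives the Rs. 50
b.receive("A")                 # B receives the marker and records its state
a.receive("B")                 # A receives Rs. 30 while recording channel B -> A
a.receive("B")                 # A receives B's marker; recording is complete

snapshot_total = (
    a.recorded_state
    + b.recorded_state
    + sum(sum(msgs) for p in (a, b) for msgs in p.channel_state.values())
)
```

The recorded local states plus the recorded channel contents add up to the Rs. 700 that the system holds, so no money is lost or counted twice in the snapshot.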

Que 1.19. What is termination detection in distributed system ?


Explain any algorithm for termination detection.

AKTU2017-18, Marks 10

Answer

1. The termination detection problem involves detecting whether an
ongoing distributed computation has finished all its activities.
2. The termination detection problem arises when a distributed
computation terminates implicitly, that is, once the computation finishes
all its activities, no single process knows about the termination.
Therefore, a separate algorithm has to be run to detect termination of
the computation.
Notations used in algorithm :
1. B(DW) : Computation message sent as a part of the computation, where
DW is the weight assigned to it.
2. C(DW) : Control message sent from a process to the controlling
agent, where DW is the weight assigned to it.
Huang's termination detection algorithm :
Initially, the controlling agent has weight 1 and every other process is idle
with weight 0.
Rule 1 : The controlling agent or an active process having weight W may
send a computation message to a process P by doing :
Derive W1 and W2 such that
W1 + W2 = W, W1 > 0, W2 > 0;
W := W1;
send B(W2) to P;
Rule 2 : On receipt of B(DW), a process P having weight W does :
W := W + DW;
If P is idle, P becomes active;
Rule 3 : An active process having weight W may become idle at any time by
doing :
send C(W) to controlling agent;
W := 0;
(Process becomes idle);
Rule 4 : On receiving C(DW), the controlling agent having weight W takes
the following actions :
W := W + DW;
If W = 1, conclude that the computation has terminated.
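The four rules can be illustrated with a short simulation. This is a minimal sketch, not a definitive implementation: the class name `WeightThrowing` and its methods are hypothetical, weights are always split in half (any split with both parts positive satisfies Rule 1), and exact fractions are used so that the test W = 1 is not disturbed by rounding.

```python
from fractions import Fraction


class WeightThrowing:
    def __init__(self, n):
        self.controller = Fraction(1)     # controlling agent starts with weight 1
        self.weight = [Fraction(0)] * n   # every process starts idle with weight 0
        self.in_transit = []              # (destination, DW) for messages B(DW)

    def send_from_controller(self, dst):
        # Rule 1 with the controlling agent as sender: keep W1, send W2.
        half = self.controller / 2
        self.controller -= half
        self.in_transit.append((dst, half))

    def send(self, src, dst):
        # Rule 1 with an active process as sender.
        assert self.weight[src] > 0, "only an active process may send"
        half = self.weight[src] / 2
        self.weight[src] -= half
        self.in_transit.append((dst, half))

    def deliver(self):
        # Rule 2: the receiver adds DW to its weight (and becomes active).
        dst, dw = self.in_transit.pop(0)
        self.weight[dst] += dw

    def become_idle(self, p):
        # Rule 3 followed by Rule 4: return the weight to the controlling agent
        # and report whether termination can be concluded.
        self.controller += self.weight[p]
        self.weight[p] = Fraction(0)
        return self.controller == 1
```

Termination is reported only when all the weight is back at the controlling agent, which cannot happen while any process is active or any computation message is still in transit.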
UNIT 2
Distributed Mutual Exclusion

CONTENTS
Part-1 : Distributed Mutual Exclusion : Classification of
Distributed Mutual Exclusion, Requirement of Mutual
Exclusion Theorem .......... 2-2B to 2-3B
Part-2 : Token Based and Non-Token Based Algorithm,
Performance Metric for Distributed Mutual Exclusion
Algorithm .......... 2-3B to 2-10B
Part-3 : Distributed Deadlock Detection : System Model,
Resource vs Communication Deadlocks, Deadlock
Prevention, Avoidance, Detection & Resolution .......... 2-10B to 2-12B
Part-4 : Centralized Deadlock Detection, Distributed
Deadlock Detection, Path-Pushing Algorithm,
Edge Chasing Algorithms .......... 2-12B to 2-18B


PART- 1
Distributed Mutual Exclusion : Classification of Distributed Mutual
Exclusion, Requirement of Mutual Exclusion Theorem.
Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 2.1. What do you mean by mutual exclusion in distributed
system ? What are requirements of a good mutual exclusion
algorithm ? AKTU 2014-15, Marks 05
OR
State the classification of distributed mutual exclusion. What is
requirement of mutual exclusion theorem ?
AKTU 2018-19, Marks 10

Answer
Mutual exclusion :
1. Mutual exclusion is a problem that arises when several processes rely on a
common resource that can be used by only one process at a time.
2. Concurrent access to the shared resource must be prevented.
3. A mutual exclusion algorithm guarantees that only one process executes
the critical section (CS) at a time.
4. There are two classes of distributed mutual exclusion algorithm :
a. Non-token based algorithm
b. Token based algorithm
Requirements of good mutual exclusion algorithm :
1. Freedom from deadlocks : Two or more sites should not endlessly
wait for messages that will never arrive.
2. Freedom from starvation : A site should not be forced to wait
indefinitely to execute the CS, i.e., every requesting site should get an
opportunity to execute the CS in a finite time.
3. Fairness : Fairness dictates that requests must be executed in the order in
which they arrive in the system.
4. Fault tolerance : A mutual exclusion algorithm is fault-tolerant if, in
the wake of a failure, it can reorganize itself so that it continues to
function without any (prolonged) disruptions.

Que 2.2. How is distributed mutual exclusion different from
mutual exclusion in a single-computer system ?
Answer
Difference :
S. No.  Mutual exclusion in                Mutual exclusion in
        distributed system                 single-computer system
1.      Shared memory does not exist.      Shared memory exists.
2.      Both shared resources and the      Both shared resources and the
        users may be distributed.          users are present in shared
                                           memory.
3.      Mutual exclusion problem is        Mutual exclusion problem is
        solved by using message            solved by using shared variables
        passing approach.                  approach, e.g., semaphores.

Que 2.3. What is mutual exclusion ? Describe the requirements
of mutual exclusion in distributed system. Is mutual exclusion
problem more complex in distributed system than single computer
system ? Justify your answer. AKTU 2017-18, Marks 10
Answer
Mutual exclusion and its requirements: Refer Q. 2.1, Page 2-2B,
Unit-2.
Yes, the problem of mutual exclusion is more complex in a distributed
system than in a single-computer system because of the absence of both
shared memory and a common physical clock, and because of unpredictable
message delays. Given these factors, it is virtually impossible for a
site to have correct and complete knowledge of the state of the system.

PART-2
Token Based and Non-Token Based Algorithm, Performance Metric
for Distributed Mutual Exclusion Algorithm.
Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 2.4. What is token based algorithm and non-token based
algorithm in distributed system ? Explain with example.
AKTU 2018-19, Marks 10
Answer
Token based algorithm :
1. In a token based algorithm, a unique token is shared among all sites.
A site is allowed to enter its CS (critical section) only if it possesses
the token.
2. Token based algorithms use sequence numbers instead of timestamps.
3. Every request for the token contains a sequence number, and the
sequence numbers of sites advance independently. A site increments
its sequence number counter every time it makes a request for
the token.
Example :
Suzuki-Kasami algorithm :
In the Suzuki-Kasami algorithm, if a site attempting to enter the CS
does not have the token, it broadcasts a request message for the token to all
other sites.
Non-token based algorithm :
1. In non-token based mutual exclusion algorithms, a site communicates
with a set of other sites to arbitrate who should execute the CS next.
2. Non-token based mutual exclusion algorithms use timestamps to order
requests for the CS and to resolve conflicts between simultaneous
requests for the CS.
3. Each request for the CS gets a timestamp, and requests with smaller
timestamps have priority over requests with larger timestamps. Each
process freely and equally competes for the right to use the shared
resource; requests are arbitrated by a central control site or by
distributed agreement.
Example :
Lamport's algorithm :
In Lamport's algorithm, $\forall i : 1 \le i \le N :: R_i = \{S_1, S_2, \ldots, S_N\}$,
i.e., the request set of every site contains all sites. Every site $S_i$
keeps a queue, $request\_queue_i$, which contains mutual exclusion requests
ordered by their timestamps. This algorithm requires messages to be delivered
in FIFO order between every pair of sites.
Que 2.5. Differentiate between token and non-token based
algorithms. AKTU 2014-15, 2016-17; Marks 05



Answer

S. No.  Token based algorithm              Non-token based algorithm
1.      A unique token is shared among     A site communicates with a set of
        all the sites (or nodes).          other sites to decide who enters
                                           the critical section next.
2.      Requests for the critical          Requests for the critical section
        section are identified by          are ordered by timestamps.
        sequence numbers.
3.      Only the site holding the token    The request with the smallest
        can enter the critical section.    timestamp enters the critical
                                           section first.
4.      Token based algorithms are :       Non-token based algorithms are :
        a. Suzuki-Kasami's broadcast       a. Lamport's algorithm
           algorithm                       b. Ricart-Agrawala algorithm
        b. Raymond's tree based            c. Maekawa's algorithm
           algorithm
        c. Singhal's heuristic
           algorithm
Que 2.6. Explain any one non-token based mutual exclusion
algorithm with its performance.
OR
Explain Lamport's algorithm for mutual exclusion.
Answer

Lamport's algorithm :
1. In Lamport's algorithm, $\forall i : 1 \le i \le N :: R_i = \{S_1, S_2, \ldots, S_N\}$.
Every site $S_i$ keeps a queue, $request\_queue_i$, which contains mutual
exclusion requests ordered by their timestamps.
2. This algorithm requires messages to be delivered in FIFO order
between every pair of sites.
Algorithm :
1. Requesting the critical section :
a. When a site $S_i$ wants to enter the critical section (CS), it sends a
REQUEST($ts_i$, i) message to all the sites in its request set $R_i$ and
places the request on $request\_queue_i$, where ($ts_i$, i) is the timestamp
of the request.
b. When a site $S_j$ receives the REQUEST($ts_i$, i) message from site $S_i$,
it returns a timestamped REPLY message to $S_i$ and places site $S_i$'s
request on $request\_queue_j$.
2. Executing the critical section : Site $S_i$ enters the CS when the
following two conditions hold :
a. $S_i$ has received a message with timestamp larger than ($ts_i$, i) from
all other sites.
b. $S_i$'s request is at the top of $request\_queue_i$.
3. Releasing the critical section :
a. Site $S_i$, upon exiting the CS, removes its request from the top of its
request queue and sends a timestamped RELEASE message to all
the sites in its request set.
b. When a site $S_j$ receives a RELEASE message from site $S_i$, it removes
$S_i$'s request from its request queue.
c. When a site removes a request from its request queue, its own
request may come to the top of the queue, enabling it to enter the
CS. The algorithm executes CS requests in the increasing order of
timestamps.
Performance :
1. Lamport's algorithm requires 3(N - 1) messages per CS invocation :
(N - 1) REQUEST, (N - 1) REPLY, and (N - 1) RELEASE messages.
Synchronization delay in the algorithm is T, where T is the average
message delay.
2. Lamport's algorithm can be optimized to require between 2(N - 1) and
3(N - 1) messages per CS execution by suppressing REPLY messages in
certain situations.
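The per-site bookkeeping of Lamport's algorithm can be sketched as follows. This is an illustrative sketch only: the class `LamportSite` and its method names are hypothetical, the caller is assumed to carry messages between sites in FIFO order, and the request queue is kept as a heap ordered by (timestamp, site id).

```python
import heapq


class LamportSite:
    def __init__(self, site_id, all_ids):
        self.id = site_id
        self.others = [s for s in all_ids if s != site_id]
        self.clock = 0
        self.queue = []                               # heap of (ts, site) requests
        self.last_seen = {s: 0 for s in self.others}  # latest timestamp received per site
        self.my_request = None

    def request(self):
        # Step 1a: timestamp the request, queue it locally, broadcast it.
        self.clock += 1
        self.my_request = (self.clock, self.id)
        heapq.heappush(self.queue, self.my_request)
        return ("REQUEST", self.my_request)

    def on_message(self, kind, stamp):
        ts, sender = stamp
        self.clock = max(self.clock, ts) + 1
        self.last_seen[sender] = max(self.last_seen[sender], ts)
        if kind == "REQUEST":
            # Step 1b: queue the request and answer with a timestamped REPLY.
            heapq.heappush(self.queue, stamp)
            return ("REPLY", (self.clock, self.id))
        if kind == "RELEASE":
            # Step 3b: drop the sender's request from the local queue.
            self.queue = [r for r in self.queue if r[1] != sender]
            heapq.heapify(self.queue)
        return None

    def can_enter(self):
        # Step 2: own request at the top of the queue, and a later-stamped
        # message has been received from every other site.
        if self.my_request is None or self.queue[0] != self.my_request:
            return False
        return all((self.last_seen[s], s) > self.my_request for s in self.others)

    def release(self):
        # Step 3a: remove own request and broadcast a timestamped RELEASE.
        self.queue.remove(self.my_request)
        heapq.heapify(self.queue)
        self.my_request = None
        self.clock += 1
        return ("RELEASE", (self.clock, self.id))
```

With two sites requesting at the same logical time, the tie is broken by site id, so site 1 enters first and site 2 enters only after it has processed site 1's RELEASE.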

Que 2.7. Explain Suzuki-Kasami algorithm.

Answer

Suzuki-Kasami algorithm :
1. In the Suzuki-Kasami algorithm, if a site attempting to enter the CS
does not have the token, it broadcasts a request message for the
token to all other sites.
2. The main design issues in this algorithm are :
a. Distinguishing outdated request messages from current
request messages.
b. Determining which site has an outstanding request for the critical
section.
Algorithm :
1. Requesting the critical section :
a. If the requesting site $S_i$ does not have the token, then it increments
its sequence number, $RN_i[i]$, and sends a REQUEST(i, sn) message
to all other sites (sn is the updated value of $RN_i[i]$; $RN_i[j]$ is the
largest sequence number that $S_i$ has received in a request from $S_j$).
b. When a site $S_j$ receives this message, it sets $RN_j[i]$ to
max($RN_j[i]$, sn). If $S_j$ has the idle token, then it sends the token to
$S_i$ if $RN_j[i] = LN[i] + 1$ ($LN[i]$ is the sequence number of the
request that site $S_i$ executed most recently).
2. Executing the critical section :
a. Site $S_i$ executes the CS when it has received the token.
3. Releasing the critical section : After finishing the execution of the
CS, site $S_i$ takes the following actions :
a. It sets the $LN[i]$ element of the token array equal to $RN_i[i]$.
b. For every site $S_j$ whose ID is not in the token queue, it appends
$S_j$'s ID to the token queue if $RN_i[j] = LN[j] + 1$.
c. If the token queue is non-empty after the above update, then it deletes
the top site ID from the queue and sends the token to the site
indicated by the ID.
Thus, after having executed its CS, a site gives priority to other sites with
outstanding requests for the CS (over its pending requests for the CS).
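The token and per-site arrays can be sketched as below. This is a minimal illustration under assumptions: the classes `Token` and `Site` and their method names are hypothetical, sites are numbered from 0, and the caller delivers the broadcast request and the token by hand.

```python
from collections import deque


class Token:
    def __init__(self, n):
        self.LN = [0] * n      # LN[j]: sequence number of site j's last executed request
        self.queue = deque()   # IDs of sites waiting for the token


class Site:
    def __init__(self, i, n, token=None):
        self.i, self.n = i, n
        self.RN = [0] * n      # RN[j]: largest request number seen from site j
        self.token = token     # Token object if this site holds it, else None
        self.in_cs = False

    def request(self):
        """Return REQUEST(i, sn) to broadcast, or None if the token is already here."""
        if self.token is not None:
            self.in_cs = True
            return None
        self.RN[self.i] += 1
        return (self.i, self.RN[self.i])

    def on_request(self, j, sn):
        """Return the token if it should be passed to site j, else None."""
        self.RN[j] = max(self.RN[j], sn)
        if (self.token is not None and not self.in_cs
                and self.RN[j] == self.token.LN[j] + 1):
            tok, self.token = self.token, None
            return tok
        return None

    def on_token(self, token):
        self.token, self.in_cs = token, True

    def release(self):
        """Return (next_site, token) if the token is passed on, else None."""
        self.in_cs = False
        tok = self.token
        tok.LN[self.i] = self.RN[self.i]
        for j in range(self.n):
            # Append every site with an outstanding request not yet queued.
            if j != self.i and j not in tok.queue and self.RN[j] == tok.LN[j] + 1:
                tok.queue.append(j)
        if tok.queue:
            self.token = None
            return tok.queue.popleft(), tok
        return None
```

The comparison RN[j] == LN[j] + 1 is what separates a current request from an outdated one: a request whose sequence number is not exactly one more than the last executed request of that site is ignored.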
Que 2.8. Explain the Ricart-Agrawala algorithm for mutual
exclusion. Mention the performance of this algorithm.
AKTU 2014-15, Marks 05

Answer

The Ricart-Agrawala algorithm is an optimization of Lamport's algorithm
that dispenses with RELEASE messages by merging them with REPLY
messages. In this algorithm,
$\forall i : 1 \le i \le N :: R_i = \{S_1, S_2, \ldots, S_N\}$.
Algorithm :
1. Requesting the critical section :
a. When a site $S_i$ wants to enter the CS, it sends a timestamped
REQUEST message to all the sites in its request set.
b. When site $S_j$ receives a REQUEST message from site $S_i$, it sends a
REPLY message to site $S_i$ if site $S_j$ is neither requesting nor
executing the CS, or if site $S_j$ is requesting and $S_i$'s request's
timestamp is smaller than site $S_j$'s own request's timestamp. The
request is deferred otherwise.
2. Executing the critical section :
a. Site $S_i$ enters the CS after it has received REPLY messages from all
the sites in its request set.
3. Releasing the critical section :
a. When site $S_i$ exits the CS, it sends REPLY messages to all the deferred
requests.
A site's REPLY messages are blocked only by sites that are requesting the CS
with higher priority (i.e., a smaller timestamp). Thus, when a site sends out
REPLY messages to all the deferred requests, the site with the next highest
priority request receives the last needed REPLY message and enters the CS.
The execution of CS requests in this algorithm is always in the order of their
timestamps.
Performance : The Ricart-Agrawala algorithm requires 2(N - 1) messages
per CS execution : (N - 1) REQUEST and (N - 1) REPLY messages.
Synchronization delay in the algorithm is T.
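The reply-or-defer decision is the core of the algorithm and can be sketched per site as follows. The class `RASite` and its method names are hypothetical, timestamps are (clock, site id) pairs so that ties are broken by site id, and message transport is left to the caller.

```python
class RASite:
    def __init__(self, site_id, all_ids):
        self.id = site_id
        self.others = set(all_ids) - {site_id}
        self.clock = 0
        self.requesting = False
        self.in_cs = False
        self.my_stamp = None
        self.pending_replies = set()
        self.deferred = []

    def request(self):
        # Step 1a: timestamp the request; it is broadcast to all other sites.
        self.clock += 1
        self.requesting = True
        self.my_stamp = (self.clock, self.id)
        self.pending_replies = set(self.others)
        return self.my_stamp

    def on_request(self, stamp):
        """Return True to send REPLY now, False if the request is deferred."""
        ts, sender = stamp
        self.clock = max(self.clock, ts)
        # Step 1b: defer if executing the CS, or if our own pending request
        # has the smaller timestamp (higher priority).
        if self.in_cs or (self.requesting and self.my_stamp < stamp):
            self.deferred.append(sender)
            return False
        return True

    def on_reply(self, sender):
        # Step 2: enter the CS once every other site has replied.
        self.pending_replies.discard(sender)
        if self.requesting and not self.pending_replies:
            self.in_cs = True
        return self.in_cs

    def release(self):
        # Step 3: leave the CS and return the sites whose REPLY was deferred.
        self.in_cs = self.requesting = False
        to_reply, self.deferred = self.deferred, []
        return to_reply
```

Because the deferred REPLY doubles as the release notification, no separate RELEASE message is needed, which is where the saving of (N - 1) messages over Lamport's algorithm comes from.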
Que 2.9. How is the performance of mutual exclusion algorithms
measured ?
OR
Discuss the performance metric for distributed mutual exclusion
algorithms. AKTU 2016-17, Marks 7.5
OR
How is distributed mutual exclusion different from mutual exclusion
in a single computer system ? How is the performance of mutual
exclusion algorithms measured ? AKTU 2018-19, Marks 10
Answer
Difference : Refer Q. 2.2, Page 2-3B, Unit-2.
Performance of mutual exclusion algorithm :
Performance of mutual exclusion algorithm is measured by the following
four metrics :
1. Response time (RT) : It is the time interval between the instant a site's
request messages are sent out and the instant it completes execution of
the critical section.
[Fig. 2.9.1 : Timeline showing the CS request arriving, its request messages
being sent out, the site entering the CS, and the site exiting the CS;
response time runs from the sending of the request messages to the exit
from the CS, and CS execution time from entry to exit.]
2. Synchronization delay (sd) : It is the time between two consecutive CS
executions, that is, the time between the end of one CS and the beginning of
the next. In this period, messages are exchanged to arrive at a mutual
exclusion decision.
[Fig. 2.9.2 : Timeline showing the last site exiting the CS and the next site
entering the CS; the synchronization delay is the interval between the two.]
3. Number of messages per CS : The number of messages required per CS
invocation; the fewer messages exchanged, the better the performance.
4. System throughput : It is the rate at which the system executes requests for
the CS. If sd is the synchronization delay and E is the average critical
section execution time, then the throughput is given by the following equation :
$$\text{System throughput} = \frac{1}{sd + E}$$
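As a small worked example of the throughput formula, the sketch below plugs in illustrative numbers. The values T = 10 ms and E = 40 ms are assumptions chosen only for the arithmetic, and the function name `throughput` is hypothetical; the synchronization delays T (Lamport) and 2T (Maekawa) are the ones listed in the comparison in Q. 2.10.

```python
def throughput(sd, e):
    """CS executions per unit time under heavy load: 1 / (sd + E)."""
    return 1.0 / (sd + e)


# Illustrative numbers (assumptions): T = 10 ms message delay, E = 40 ms in the CS.
T, E = 0.010, 0.040
lamport = throughput(T, E)        # synchronization delay sd = T
maekawa = throughput(2 * T, E)    # synchronization delay sd = 2T
```

Under these assumed values the system completes about 20 CS executions per second with sd = T and about 16.7 per second with sd = 2T, which shows how a larger synchronization delay lowers throughput.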

Que 2.10. How is distributed mutual exclusion different from
mutual exclusion in a single-computer system ? Classify mutual
exclusion algorithms. How is the performance of mutual exclusion
algorithms measured ? Compare the performance of token and
non-token based algorithms. How does the Ricart-Agrawala algorithm
optimize Lamport's algorithm ? AKTU 2015-16, Marks 10

Answer
Difference : Refer Q. 2.2, Page 2-3B, Unit-2.
Classification of mutual exclusion algorithm :
1. Token based algorithm : In a token based algorithm, a unique
token is shared among all sites. A site is allowed to enter its CS
(critical section) only if it possesses the token.
2. Non-token based algorithm : In non-token based algorithms, a site
communicates with a set of other sites to arbitrate who should execute
the CS next.
Performance of mutual exclusion algorithm : Refer Q. 2.9, Page 2-8B,
Unit-2.
Comparison of performance of token and non-token based
algorithms : (ll = light load, hl = heavy load, T = average message delay,
E = average CS execution time, N = number of sites)
Non-token based       Response time (ll)   Synchronization delay   Messages (ll)   Messages (hl)
Lamport               2T + E               T                       3(N - 1)        3(N - 1)
Ricart-Agrawala       2T + E               T                       2(N - 1)        2(N - 1)
Maekawa               2T + E               2T                      3√N             5√N

Token based           Response time (ll)   Synchronization delay   Messages (ll)   Messages (hl)
Suzuki-Kasami         2T + E               T                       N               N
Singhal's heuristic   2T + E               T                       N/2             N
Raymond               T(log N) + E         T log(N)/2              log(N)          4

Ricart-Agrawala algorithm : Refer Q. 2.8, Page 2-7B, Unit-2.

PART-3
Distributed Deadlock Detection : System Model,
Resource vs Communication Deadlocks, Deadlock Prevention,
Avoidance, Detection & Resolution.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 2.11. What is deadlock ? What are necessary conditions for
the occurrence of deadlock in distributed system ?
Answer
Deadlock is the permanent blocking of a set of processes, each of which is
waiting for an event that only another process in the set can cause.
Necessary conditions for deadlock :
1. Mutual exclusion : A resource can be held by at most one process.
2. Hold and wait : Processes that already hold resources can wait for
other resources.
3. No preemption : A resource, once granted, cannot be taken away from
a process until the process releases it.
4. Circular wait : Two or more processes are waiting for resources held
by one of the other processes. We can represent resource allocation as
a graph where P ← R means a resource R is currently held by a process
P. Deadlock exists when the resource allocation graph has a cycle.

Que 2.12. What is distributed deadlock ? Explain various deadlock
handling strategies.

Answer
1. Deadlock is a situation in which a set of processes are blocked because
each process is holding a resource and waiting for another resource
acquired by some other process.
2. The detection of deadlocks in a distributed DBMS is more complicated,
because it involves several different sites.
3. Thus, in a distributed DBMS it is necessary to construct a global wait-for
graph (GWFG) for the entire system to detect a deadlock situation.
Deadlock handling strategies in distributed system :
1. Deadlock prevention :
a. Deadlock prevention is achieved by having a process acquire all the
needed resources at once before it begins executing, or by preempting
a process that holds the needed resource.
b. Mutual exclusion, hold-and-wait, no preemption and circular
wait are the four necessary conditions for a deadlock to occur. If
one of these conditions is never satisfied, then deadlock cannot
occur.
c. Deadlock prevention methods are :
i. Collective requests : These methods deny the hold-and-wait
condition by ensuring that whenever a process requests a
resource, it does not hold any other resource.
ii. Ordered requests : In this method, circular wait is denied by
assigning each resource type a unique global number, which
imposes a total ordering on all resource types; a process may
request resources only in increasing order of these numbers.
iii. Preemption : A preemptable resource is one whose state can
be easily saved and restored later. Deadlocks can be prevented
by using resource allocation policies that deny the no-preemption
condition.
d. Deadlock prevention is highly inefficient and impractical in
distributed systems.
2. Deadlock avoidance :
a. For deadlock avoidance in distributed system, a resource is granted
to a process only if the resulting state of the global system is safe.
b. The state of the global system includes all processes and resources in the
distributed system.
c. Deadlock avoidance can be done in the following steps :
i. When a process requests a resource, even if the resource is
available for allocation, it is not immediately allocated to the
process. Rather, the system assumes that the request is
granted.
ii. Using advance knowledge of resource usage of processes and
the assumption made in step (i), the system performs some
analysis to decide whether granting the process request is safe
or unsafe.
iii. The resource is allocated to the process only if it is safe to do so,
otherwise the request is deferred.
3. Deadlock detection :
a. In this approach, the system does not make
any attempt to prevent deadlock but allows processes to request
resources and wait for each other in an uncontrolled manner.
b. Deadlock detection requires examining the status of process-resource
interactions for the presence of a cyclic wait.
c. Deadlock detection algorithms are easily implemented by
maintaining a wait-for graph (WFG) and searching it for cycles.
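Searching a wait-for graph for cycles can be done with a depth-first search. The sketch below is illustrative: the function name `find_cycle` and the dictionary representation of the WFG (process name mapped to the list of processes it waits for) are assumptions, not part of any particular detection algorithm.

```python
def find_cycle(wfg):
    """Return one cycle in the wait-for graph as a list of nodes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2       # unvisited, on the current path, finished
    color = {p: WHITE for p in wfg}
    stack = []                          # nodes on the current DFS path

    def visit(p):
        color[p] = GREY
        stack.append(p)
        for q in wfg.get(p, ()):
            if color.get(q, WHITE) == GREY:
                # q is already on the current path: the path from q onwards is a cycle.
                return stack[stack.index(q):]
            if color.get(q, WHITE) == WHITE:
                found = visit(q)
                if found:
                    return found
        stack.pop()
        color[p] = BLACK
        return None

    for p in list(wfg):
        if color[p] == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None
```

For instance, with P1 waiting for P2, P2 for P3 and P3 for P1, the function returns the three processes involved in the deadlock; with the last edge removed it returns None.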
Que 2.13. Distinguish between resource deadlock and
communication deadlock. AKTU2014-15, Marks 05
Answer

S. No.  Resource deadlock                  Communication deadlock
1.      The dependence of one              A process can know the identity
        transaction on actions of          of those processes on whose
        other transactions is not          actions it depends.
        directly known.
2.      A process cannot proceed           A process cannot proceed with its
        with its execution until it        execution until it can
        receives all the resources for     communicate with at least one of
        which it is waiting.               the processes for which it is
                                           waiting.

PART-4

Centralized Deadlock Detection, Distributed Deadlock Detection,
Path-Pushing Algorithm, Edge Chasing Algorithm.

Questions-Answers

Long Answer Type and Medium Answer Type Questions



Que 2.14. What are the deadlock handling strategies in distributed
file system ? What is control organization for distributed deadlock
detection ? Discuss an algorithm which can remove phantom
deadlock. AKTU 2016-17, Marks 05
OR
Explain edge chasing algorithm.
Answer
Deadlock handling strategies : Refer Q. 2.12, Page 2-10B, Unit-2.
Control organizations : Algorithms for detecting distributed deadlocks can
be organized in the following ways :
1. Centralized control :
a. In centralized deadlock detection algorithms, a designated site
(control site) has the responsibility of constructing the global WFG
and searching it for cycles.
b. Centralized deadlock detection algorithms are conceptually simple
and are easy to implement.
2. Distributed control :
a. In these algorithms, the responsibility for detecting a global deadlock
is shared equally among all sites.
b. The global state graph is spread over many sites and several sites
participate in the detection of a global cycle.
3. Hierarchical control : In hierarchical deadlock detection algorithms,
sites are arranged in a hierarchical fashion, and a site detects deadlocks
involving only its descendant sites.
Edge chasing algorithm :
1. Edge chasing algorithm is used for phantom deadlock removal in
distributed systems.
2. In this, the global wait-for graph is not constructed, but each of the
servers involved has knowledge about some of its edges.
3. The servers attempt to find cycles by forwarding messages called probes,
which follow the edges of the graph throughout the distributed system.
4. A probe message consists of transaction wait-for relationships
representing a path in the global wait-for graph.
5. Edge chasing has three steps :
a. Initiation : When a server notes that a transaction starts waiting for
another transaction that is itself blocked at another server, it initiates
detection by sending a probe.
b. Detection : Detection consists of receiving probes, deciding
whether deadlock has occurred, and deciding whether to forward the probes.
c. Resolution : When a cycle is detected, a transaction in the cycle is
aborted to break the deadlock.
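Probe forwarding can be illustrated with a small centralised simulation. This is only a sketch under assumptions: the function `edge_chase` is hypothetical, each server's local knowledge is given as a dictionary of wait-for edges, and the probes that real servers would exchange as messages are modelled as paths held in a queue.

```python
from collections import deque


def edge_chase(local_edges, initiator):
    """local_edges: server -> {transaction: [transactions it waits for]}.
    Return the cycle found starting from `initiator`, or None."""

    def waits_for(t):
        # Each transaction's outgoing edges are known to only one server.
        for edges in local_edges.values():
            if t in edges:
                return edges[t]
        return []

    probes = deque([[initiator]])   # each probe carries the path followed so far
    seen = set()
    while probes:
        path = probes.popleft()
        for nxt in waits_for(path[-1]):
            if nxt == initiator:
                return path          # the probe came back: cycle detected
            if nxt not in seen:
                seen.add(nxt)
                probes.append(path + [nxt])   # forward the probe along the edge
    return None
```

No server in this sketch ever builds the global wait-for graph; a deadlock is declared only when a probe returns to the transaction that initiated it, after which one transaction on the returned path would be aborted.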

Que 2.15. Give the deadlock handling strategies in distributed
system. What are the differences in centralized, distributed and
hierarchical control organizations for distributed deadlock
detection ? AKTU 2014-15, Marks 10
Answer
Deadlock handling strategies : Refer Q. 2.12, Page 2-10B, Unit-2.
Difference :
S. No.  Centralized control        Distributed control        Hierarchical control
1.      A control site has the     All sites share the        A site detects
        responsibility to          responsibility to          deadlocks involving
        construct the global       detect a global            only its descendant
        WFG and search it for      deadlock.                  sites.
        cycles.
2.      Has a single point of      No single point of         No single point of
        failure.                   failure.                   failure.
3.      Easy to implement.         Difficult to implement.    Simple to implement.
4.      Completely centralized     Path-pushing and           Menasce-Muntz and
        and Ho-Ramamoorthy         edge chasing               Ho-Ramamoorthy
        algorithms are used        algorithms are used        algorithms are used
        for deadlock               for deadlock               for deadlock
        detection.                 detection.                 detection.

Que 2.16. Classify the deadlock detection algorithms. Describe
the path-pushing deadlock detection algorithm.
AKTU 2017-18, Marks 10
OR
Discuss Obermarck's path-pushing algorithm.
AKTU 2016-17, Marks 7.5
Answer
Distributed deadlock detection algorithms can be divided into four
classes :
a. Path-pushing algorithm : In path-pushing algorithms, wait-for
dependency information of the global WFG (wait-for graph) is circulated
in the form of paths.
b. Edge chasing algorithm : In edge chasing algorithms, special messages
called probes are circulated along the edges of the WFG to detect a cycle.
When a blocked process receives a probe, it propagates the probe along
its outgoing edges in the WFG.
c. Diffusion computation based algorithm : Diffusion computation
type algorithms make use of echo algorithms to detect deadlocks.
Deadlock detection messages are successively propagated (i.e., "diffused")
through the edges of the WFG.
d. Global state detection based algorithm : These algorithms detect
deadlocks by taking a snapshot of the system and by examining it for the
condition of a deadlock. Several sites in the distributed system participate in
the detection of a global cycle. Thus, the global state graph is spread over many
sites. The responsibility of detecting deadlock is shared equally among
all sites.
Obermarck's path-pushing algorithm :
1. Obermarck's path-pushing algorithm was designed for distributed
database systems.
2. In path-pushing deadlock detection algorithms, information about the
wait-for dependencies is propagated in the form of a path.
Algorithm : Deadlock detection at a site follows the following iterative
process :
1. The site waits for deadlock-related information from other sites in the
system.
2. The site combines the received information with its local TWF
(transaction wait-for) graph to build an updated TWF graph. It then
detects all cycles and breaks local cycles which do not contain the node
'Ex' (external node).
3. For all cycles 'Ex → T1 → T2 → Ex' which contain the node 'Ex', the site
transmits them in string form 'Ex, T1, T2, Ex' to all other sites where a
subtransaction of T2 is waiting to receive a message from the
subtransaction of T2 at this site. The algorithm reduces message traffic
by lexically ordering transactions and sending the string 'Ex, T1, T2, T3,
Ex' to other sites only if T1 is higher than T3 in the lexical ordering. Also,
for a deadlock, the highest priority transaction detects the deadlock.
Que 2.17. Discuss various centralized deadlock- detection
algorithms.
Answer
Various centralized deadlock detection algorithms are :
1. Completely centralized algorithm :
a. It is the simplest type of deadlock detection algorithm, wherein a
designated site, called the control site, maintains the WFG (wait-for
graph) of the entire system and checks it for the existence of
deadlock cycles.
b. All sites request and release resources (even local resources) by
sending request resource and release resource messages to the
control site, respectively.
c. When the control site receives a request resource or a release
resource message, it correspondingly updates its WFG.
d. The control site checks the WFG for deadlocks whenever a request
edge is added to the WFG.
2. The Ho-Ramamoorthy algorithms : Ho and Ramamoorthy gave
two centralized deadlock detection algorithms called the two-phase and
one-phase algorithms.
a. The two-phase algorithm :
i. In the two-phase algorithm, every site maintains a status table
that contains the status of all the processes initiated at that
site.
ii. Periodically, a designated site requests the status table from
all sites, constructs a WFG from the information received, and
searches it for cycles.
iii. If there is no cycle, then the system is free from deadlocks;
otherwise, the designated site again requests status tables
from all the sites and again constructs a WFG using only those
transactions which are common to both reports.
iv. If the same cycle is detected again, the system is declared
deadlocked.
b. The one-phase algorithm :
i. The one-phase algorithm requires only one status report from
each site; however, each site maintains two status tables : a
resource status table and a process status table.
ii. The resource status table at a site keeps track of the
transactions that have locked or are waiting for resources
stored at that site.
iii. The process status table at a site keeps track of the resources
locked by or waited for by all the transactions at that site.
iv. Periodically, a designated site requests both the tables from
every site, constructs a WFG using only those transactions for
which the entry in the resource table matches the
corresponding entry in the process table, and searches the WFG
for cycles.
v. If no cycle is found, then the system is not deadlocked; otherwise
a deadlock is detected.
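The idea behind the two-phase algorithm, confirming a suspected cycle with a second report, can be sketched as follows. This is a simplified illustration under assumptions: each report is modelled as a set of wait-for edges (waiter, holder) already merged from all sites' status tables, "common to both reports" is approximated by intersecting the two edge sets, and the function names `has_cycle` and `two_phase_detect` are hypothetical.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (waiter, holder) pairs has a cycle."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)

    def reachable_from_itself(start):
        stack, seen = list(graph.get(start, ())), set()
        while stack:
            n = stack.pop()
            if n == start:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(graph.get(n, ()))
        return False

    return any(reachable_from_itself(n) for n in graph)


def two_phase_detect(first_report, second_report):
    """Declare deadlock only if a cycle survives in the edges common to both reports."""
    if not has_cycle(first_report):
        return False                      # phase 1 found no cycle: no deadlock
    return has_cycle(first_report & second_report)
```

An edge that appears only in the first report belongs to a wait that has since ended, so a cycle depending on it is not reported as a deadlock in this sketch.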

Que 2.18. Explain various hierarchical deadlock detection
algorithms.
Answer
In hierarchical algorithms, sites are logically arranged in a hierarchical fashion,
and a site is responsible for detecting deadlocks involving only its descendant
sites.
Various hierarchical deadlock detection algorithms :
1. The Menasce-Muntz algorithm :
a. In this algorithm, all the controllers are arranged in a tree fashion.
b. The controllers at the bottom-most level (leaf controllers) manage
resources and the others (non-leaf controllers) are responsible for
deadlock detection.
c. Whenever a change occurs in a controller's TWF (transaction wait-for)
graph due to a resource allocation, wait or release, it is propagated to its
parent controller.
d. The parent controller makes changes in its TWF graph, searches
for cycles, and propagates the changes upward, if necessary.
e. A non-leaf controller can receive up-to-date information concerning
the TWF graphs of its children continuously or periodically.
2. The Ho-Ramamoorthy algorithm :
a. In this algorithm, sites are grouped into several disjoint clusters.
b. Periodically, a site is chosen as a central control site, which
dynamically chooses a control site for each cluster.
c. The central control site requests from every control site their
intercluster transaction status information and wait-for relations.
[Fig. 2.18.1 : Central control site and the control sites of the clusters.]
d. As a result, a control site collects status tables from all the sites in its
cluster and applies the one-phase deadlock detection algorithm to
detect all deadlocks involving only intracluster transactions.
e. Then, it sends intercluster transaction status information and wait-for
relations to the central control site.
f. The central site splices the intercluster information it receives,
constructs a system WFG, and searches it for cycles.
g. Thus, a control site detects all deadlocks located in its cluster, and
the central control site detects all intercluster deadlocks.

VERY IMPORTANT QUESTIONS

Following questions are very important. These questions
may be asked in your SESSIONALS as well as
UNIVERSITY EXAMINATION.

Q. 1. What do you mean by mutual exclusion in distributed
system ? What are requirements of a good mutual exclusion
algorithm ?
Ans. Refer Q. 2.1.
Q. 2. What is token based algorithm and non-token based
algorithm in distributed system ? Explain with example.
Ans. Refer Q. 2.4.
Q. 3. Differentiate between token and non-token based
algorithms.
Ans. Refer Q. 2.5.
Q. 4. Explain the Ricart-Agrawala algorithm for mutual
exclusion. Mention the performance of this algorithm.
Ans. Refer Q. 2.8.
Q. 5. Discuss the performance metric for distributed mutual
exclusion algorithms.
Ans. Refer Q. 2.9.
Q. 6. Classify the deadlock detection algorithms. Describe the
path-pushing deadlock detection algorithm.
Ans. Refer Q. 2.16.
Q. 7. What are the deadlock handling strategies in distributed
file system ? What is control organization for distributed
deadlock detection ? Discuss an algorithm which can
remove phantom deadlock.
Ans. Refer Q. 2.14.
UNIT 3
Agreement Protocols

CONTENTS
Part-1 : Agreement Protocol : Introduction,
System Models .......... 3-2B to 3-4B
Part-2 : Classification of Agreement Problem, Byzantine
Agreement Problem, Consensus Problem, Interactive
Consistency Problem, Solution to Byzantine Agreement
Problem .......... 3-4B to 3-11B
Part-3 : Application of Agreement Protocol, Atomic Commit
in Distributed Database System .......... 3-11B to 3-14B
Part-4 : Distributed Resource Management : Issues in
Distributed File System, Mechanism for Building
Distributed File System .......... 3-14B to 3-23B
Part-5 : Design Issues in Distributed
Shared Memory .......... 3-24B to 3-26B
Part-6 : Algorithm for Implementation of Distributed
Shared Memory .......... 3-26B to 3-30B

3-1 B (Cs-Sem-7)
3-2 B (CS-Sem-7) Agreement Protocols

PART- 1
Agreement Protocol : Introduction, System Models.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 3.1. Explain agreement protocol.

Answer
Agreement protocol :
1. The process by which all sites exchange messages and reach an
agreement is called an agreement protocol.
2. In a distributed system, agreement protocols are very useful for
error-free communication among the various sites.
3. In a distributed system, the chances of having faulty processors are
higher. A faulty processor may send wrong messages, fail to respond
to a message, etc.
4. Also, the presence of a faulty processor is not known to the non-faulty
processors. So, the non-faulty processors do not restrict message
transfer to the faulty processors.
5. The agreement protocols allow the non-faulty processors to reach a
common agreement in the distributed system, whether or not other
processors are faulty.
6. The common agreement among the processors is reached through the
agreement protocol.
Que 3.2. What is agreement protocol ? Discuss the general system
model where agreement protocols are used.
Answer
Agreement protocol : Refer Q. 3.1, Page 3-2B, Unit-3.
System model : Agreement protocols are studied under the following
system model :
1. There are n processors in the distributed system, and at most m of
them may be faulty.
2. Every processor in the system can communicate directly with every
other processor, because the processors are logically fully connected.
3. A receiver processor always knows the identity of the sender processor
of a message.
4. The communication medium is reliable (i.e., it delivers all messages
without introducing any errors) and only processors are prone to failures.
Que 3.3. Discuss various aspects for recognizing the agreement
protocol.
Answer
Following are various aspects to consider for recognizing the
agreements in distributed system :
1. Synchronous and asynchronous computations :
a. In a synchronous computation, processes in the system run in lock
step manner, where in each step, a process receives messages,
performs a computation and sends messages to other processes.
b. In a synchronous computation, a process knows all the messages
which it expects to receive in a round.
c. A message delay or a slow process can slow down the entire system
or computation.
d. In an asynchronous computation, processes in the system do not
proceed in lock step.
e. A process can send and receive messages and perform computation
at any time.
2. Processor failure model :
a. The processor failure is the most common failure model considered
in the study of agreement protocols.
b. A processor can fail in three modes :
i. Crash fault : In a crash fault, a processor stops functioning
and never resumes operation.
ii. Omission fault : In an omission fault, a processor "omits" to
send messages to some processors.
iii. Malicious fault : In a malicious fault, a processor behaves
randomly and arbitrarily.
3. Authenticated and non-authenticated messages :
a. In an authenticated message system, a (faulty) processor cannot
forge a message or change the contents of a received message.
b. A processor can verify the authenticity of a received message.
c. An authenticated message is also called a signed message.
d. In a non-authenticated message system, a (faulty) processor can
forge a message and claim to have received it from another processor
or change the contents of a received message before it relays the
message to other processors.
e. A processor is not able to verify the authenticity of a received message.
f. A non-authenticated message is also called an oral message.
g. It is easier to reach agreement in an authenticated message system
because faulty processors are capable of doing less damage.
4. Performance aspects :
a. The performance of agreement protocols is generally determined
by the following three metrics :
i. Time : Time refers to the time taken to reach an agreement
under a protocol.
ii. Message traffic : Message traffic is measured by the number
of messages exchanged to reach an agreement.
iii. Storage overhead : Storage overhead measures the amount
of information that needs to be stored at processors during the
execution of a protocol.
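The difference between signed and oral messages (point 3 above) can be illustrated with a short sketch. It is only an illustration under stated assumptions: HMAC-SHA256 from the Python standard library stands in for a signature scheme, `key_p0` is a hypothetical secret that the faulty relay does not hold, and the helper names `sign` and `verify` do not come from the text.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Tag a message so later changes can be detected (assumed scheme: HMAC-SHA256)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check that the message still matches the tag produced by the sender."""
    return hmac.compare_digest(sign(key, message), tag)


key_p0 = b"hypothetical-secret-of-P0"   # assumption: unknown to the faulty relay
original = b"value=1"
tag = sign(key_p0, original)

# A faulty relay changes the contents but cannot produce a matching tag.
tampered = b"value=0"
untouched_ok = verify(key_p0, original, tag)
tampered_ok = verify(key_p0, tampered, tag)
```

With oral messages there is no tag to check, so the receiver would have to accept `tampered` as if it came from the original sender.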

PART-2
Classification of Agreement Problem, Byzantine Agreement Problem,
Consensus Problem, Interactive Consistency Problem, Solution to
Byzantine Agreement Problem.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 3.4. What are agreement protocols ? Explain Byzantine
agreement problem, the consensus problem and interactive
consistency problem. AKTU 2016-17, Marks 10
Answer
Agreement protocols : Refer Q. 3.1, Page 3-2B, Unit-3.
Classification of agreement problems :
1. The Byzantine agreement problem :
a. In the Byzantine agreement problem, an arbitrarily chosen
processor, called the source processor, broadcasts its initial value to
all other processors.
b. A solution to the Byzantine agreement problem should meet the
following objectives :
i. Agreement : All non-faulty processors agree on the same value.
ii. Validity : If the source processor is non-faulty, then the common
agreed upon value by all non-faulty processors should be the
initial value of the source.
iii. Termination : Each non-faulty processor must eventually
decide on a value.
2. The consensus problem :
a. In the consensus problem, every processor broadcasts its initial
value to all other processors.
b. Initial values of the processors may be different.
c. A protocol for reaching consensus should meet the following
conditions :
i. Agreement : All non-faulty processors agree on the same single
value.
ii. Validity : If the initial value of every non-faulty processor is $v$,
then the agreed upon common value by all non-faulty processors
must be $v$.
iii. Termination : Each non-faulty processor must eventually
decide on a value.
3. The interactive consistency problem :
a. In the interactive consistency problem, every processor broadcasts
its initial value to all other processors.
b. The initial values of the processors may be different.
c. A protocol for the interactive consistency problem should meet the
following conditions :
i. Agreement : All non-faulty processors agree on the same
vector $(v_1, v_2, \ldots, v_n)$.
ii. Validity : If the $i$th processor is non-faulty and its initial value
is $v_i$, then the $i$th value to be agreed on by all non-faulty
processors must be $v_i$.
iii. Termination : Each non-faulty processor must eventually
decide on the values of the vector.
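The agreement and validity conditions of the consensus problem can be checked mechanically for one finished run. The following is a minimal sketch, assuming the initial and decided values are available as dictionaries; the function name `check_consensus` and the sample runs are hypothetical. Termination cannot be checked this way, because it is a property of the execution and not of the final values.

```python
def check_consensus(initial, decided, faulty):
    """Return (agreement, validity) for one finished run.

    initial, decided: dicts mapping processor id -> value
    faulty: set of faulty processor ids (their values are ignored)
    """
    good = [p for p in initial if p not in faulty]
    decided_values = {decided[p] for p in good}
    initial_values = {initial[p] for p in good}
    agreement = len(decided_values) == 1
    # Validity only constrains runs in which all non-faulty processors
    # start with the same value.
    validity = len(initial_values) != 1 or decided_values == initial_values
    return agreement, validity


# Processor 3 is faulty in both hypothetical runs.
run_ok = check_consensus({1: 1, 2: 1, 3: 0}, {1: 1, 2: 1, 3: 0}, faulty={3})
run_bad = check_consensus({1: 1, 2: 1, 3: 0}, {1: 1, 2: 0, 3: 0}, faulty={3})
```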

Que 3.5. What are agreement protocols ? Explain Byzantine
agreement problem, the consensus problem and interactive
consistency problem. Describe Lamport-Shostak-Pease algorithm.
AKTU 2014-15, Marks 10
Answer
Agreement protocol : Refer Q. 3.1, Page 3-2B, Unit-3.
Byzantine agreement problem, the consensus problem and
interactive consistency problem : Refer Q. 3.4, Page 3-4B, Unit-3.
Lamport-Shostak-Pease algorithm :
The Lamport-Shostak-Pease algorithm, also referred to as the Oral Message
algorithm OM(m), $m > 0$, solves the Byzantine agreement problem for
$3m + 1$ or more processors in the presence of at most $m$ faulty
processors.
Let $n$ denote the total number of processors (where $n \ge 3m + 1$). The
algorithm is recursively defined as follows :
Algorithm OM(0) :
1. The source processor sends its value to every processor.
2. Each processor uses the value it receives from the source. If it receives
no value, then it uses a default value of 0.
Algorithm OM(m), $m > 0$ :
1. The source processor sends its value to every processor.
2. For each $i$, let $v_i$ be the value processor $i$ receives from the source
(if it receives no value, then it uses a default value of 0). Processor $i$ acts
as the new source and initiates algorithm OM(m - 1), wherein it sends the
value $v_i$ to each of the other $n - 2$ processors.
3. For each $i$ and $j$ (where $i$ and $j$ are not equal), let $v_j$ be the value
processor $i$ received from processor $j$ in step 2 using algorithm
OM(m - 1). If it receives no value, then it uses a default value of 0.
Processor $i$ uses the value majority$(v_1, v_2, \ldots, v_{n-1})$.
4. The majority function is used to select the majority value out of the
values received in a round of message exchange.
5. The function majority$(v_1, v_2, \ldots, v_{n-1})$ computes the majority of
the values $v_1, v_2, \ldots, v_{n-1}$ if it exists (otherwise it returns 0).
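The recursion in OM(m) can be sketched in a few lines of Python. This is a simulation under stated assumptions, not a networked implementation: values are bits, processor ids are integers, and the helper `send` models a faulty processor with a hypothetical rule (it flips the bit for odd-numbered receivers). The names `om`, `send` and `majority` are chosen for illustration.

```python
from collections import Counter


def majority(values, default=0):
    """Return the strict majority value if it exists, otherwise the default 0."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else default


def send(sender, receiver, value, faulty):
    """Assumed fault model: a faulty sender flips the bit for odd-numbered receivers."""
    if sender in faulty and receiver % 2 == 1:
        return 1 - value
    return value


def om(m, source, lieutenants, value, faulty):
    """Simulate OM(m); return {lieutenant: value it uses}."""
    # Step 1: the source sends its value to every other processor.
    received = {i: send(source, i, value, faulty) for i in lieutenants}
    if m == 0:
        return received
    # Step 2: each lieutenant j relays its value to the others with OM(m - 1).
    relayed = {}
    for j in lieutenants:
        others = [i for i in lieutenants if i != j]
        relayed[j] = om(m - 1, j, others, received[j], faulty)
    # Step 3: each lieutenant takes the majority of its own value and the relayed ones.
    return {
        i: majority([received[i]] + [relayed[j][i] for j in lieutenants if j != i])
        for i in lieutenants
    }


# n = 4, m = 1: processor 0 is the source and sends the value 1.
faulty_lieutenant = om(1, 0, [1, 2, 3], 1, faulty={3})
faulty_source = om(1, 0, [1, 2, 3], 1, faulty={0})
```

In the first run the non-faulty lieutenants 1 and 2 both use the source value 1. In the second run the source is faulty, yet all three lieutenants use the same value, which is what the agreement condition requires.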
Que 3.6. Describe Lamport-Shostak-Pease algorithm. How does
vector clock overcome the disadvantages of Lamport clock ? Explain
with an example. AKTU 2016-17, Marks 15
Answer
Lamport-Shostak-Pease algorithm : Refer Q. 3.5, Page 3-5B, Unit-3.
Vector clock overcomes the disadvantage of Lamport clock : With Lamport
clocks, we cannot determine whether two events are causally related by
looking at their timestamps, because $C(A) < C(B)$ does not always mean
$A \rightarrow B$, while vector clocks allow us to compare the timestamps of
the events to determine whether they are causally related or not.
For example : Vector timestamps are shown in Fig. 3.6.1. The event $e_{12}$
with vector timestamp (2, 1, 0) is causally ordered before the event $e_{34}$
with the vector timestamp (2, 1, 4), but is concurrent with the event $e_{32}$
having timestamp (0, 0, 2).
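[Figure : space-time diagram of processes $P_1$, $P_2$ and $P_3$ (space on
the vertical axis, time on the horizontal axis), each starting at (0, 0, 0).
Vector timestamps : $P_1$ : $e_{11}$ (1, 1, 0), $e_{12}$ (2, 1, 0);
$P_2$ : $e_{21}$ (0, 1, 0), $e_{22}$ (2, 2, 4);
$P_3$ : $e_{31}$ (0, 0, 1), $e_{32}$ (0, 0, 2), $e_{33}$ (2, 1, 3), $e_{34}$ (2, 1, 4).]
Fig. 3.6.1.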
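The comparison used in this example can be written down directly. A minimal sketch, assuming vector timestamps are tuples of equal length; the function names `happened_before` and `concurrent` are illustrative.

```python
def happened_before(a, b):
    """True if timestamp a is causally before b: a <= b component-wise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True if neither timestamp is causally before the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


e12 = (2, 1, 0)
e34 = (2, 1, 4)
e32 = (0, 0, 2)

e12_before_e34 = happened_before(e12, e34)
e12_concurrent_e32 = concurrent(e12, e32)
```

A single Lamport timestamp cannot support the second test: two concurrent events always have comparable scalar values, so concurrency is invisible.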
Que 3.7. What do you understand by Byzantine agreement
problem ? AKTU 2018-19, Marks 10
OR
What is Byzantine agreement problem ? Provide the solution to
Byzantine agreement problem. AKTU 2018-19, Marks 10
Answer
1. In the Byzantine agreement problem, a single value, which is to be
agreed on, is initialized by an arbitrary process and all non-faulty
processes have to agree on that value.
2. There are $n$ processes, $P = \{p_1, p_2, \ldots, p_n\}$, with unique names over
$N = \{1, \ldots, n\}$, and at most $t < n$ of the processes can be Byzantine.
3. Each process starts with an input value $v$ from a set of values.
4. The goal of this protocol is to ensure that all non-faulty processes
eventually output the same value.
5. The output of a non-faulty process is called the decision value.
6. An algorithm solves the Byzantine agreement if the following conditions
hold :
a. Agreement : All non-faulty processes agree on the same value
(i.e., there are no two non-faulty processes that decide different
values).
b. Validity : If all non-faulty processes start with the same value $v$,
the decision value of all non-faulty participants is $v$.
c. Termination : All non-faulty processes decide a value.
7. Reaching agreement in the presence of Byzantine processes is expensive,
as the number of messages grows quadratically with the number of
participants $n$ and the number of rounds (time) grows linearly with the
number of Byzantine participants $t$ (with $n > 3t$).
Solution to Byzantine agreement : Solution to Byzantine agreement
problem is given by Lamport-Shostak-Pease algorithm.
Lamport-Shostak-Pease algorithm : Refer Q. 3.5, Page 3-5B, Unit-3.
Que 3.8. Show that a Byzantine agreement cannot be reached
among three processors, where one processor is faulty.
OR
Explain treatment of impossible result for the solution of Byzantine
agreement problem.
Answer
1. Sometimes, the agreement problem may lead to a condition which
is impossible to solve.
2. The situation where agreement is impossible is called an impossible
result.
3. In this type of problem, agreement cannot be reached.
4. In a system, the impossible result situation is found with more than two
processors.
5. Let us check the situation of impossible result in a system with three
processors.
6. Consider a system with three processors, $P_0$, $P_1$ and $P_2$.
7. We assume that there are only two values, 0 and 1, on which processors
agree, and that processor $P_0$ initiates the initial value.
8. There are two possibilities :
Case I : $P_0$ is not faulty :
1. Assume $P_2$ is faulty.
2. Suppose that $P_0$ broadcasts an initial value of 1 to both $P_1$ and $P_2$.
3. Processor $P_2$ acts maliciously and communicates a value of 0 to
processor $P_1$.
4. Thus, $P_1$ receives conflicting values from $P_0$ and $P_2$.
5. However, since $P_0$ is non-faulty, processor $P_1$ must accept 1 as the
agreed upon value to satisfy the validity condition.
[Figure : $P_0$ sends the value 1 to $P_1$ and $P_2$.]
Fig. 3.8.1. Processor $P_0$ is non-faulty.


Case II : $P_0$ is faulty :
1. Suppose that processor $P_0$ sends an initial value of 1 to $P_1$ and 0 to
$P_2$.
2. Processor $P_2$ will communicate the value 0 to $P_1$.
3. As far as $P_1$ is concerned, this case will look identical to Case I.
4. So any agreement protocol which works for three processors cannot
distinguish between the two cases and must force $P_1$ to accept 1 as
the agreed upon value whenever $P_1$ is faced with such situations.
5. However, in Case II, this will work only if $P_2$ is also made to accept
1 as the agreed upon value.
6. Using a similar argument, we can show that if $P_2$ receives an initial
value of 0 from $P_0$, then it must take 0 as the agreed upon value,
even if $P_1$ communicates a value of 1.
7. However, if this is followed in Case II, $P_1$ will agree on a value of 1
and $P_2$ will agree on a value of 0, which violates the agreement condition.

[Figure : $P_0$ sends the value 1 to $P_1$ and the value 0 to $P_2$.]
Fig. 3.8.2. Processor $P_0$ is faulty.


Therefore, no solution exists for the Byzantine agreement problem for three
processors which can work under a single processor failure.
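The two cases can be replayed in a small sketch to make the argument concrete. The code assumes one round of relaying between $P_1$ and $P_2$; the function names `views` and `forced_decision` are hypothetical, and `forced_decision` encodes the rule that the validity condition forces on a lieutenant (follow the value received directly from $P_0$).

```python
def views(sent_by_p0, relay_p1, relay_p2):
    """What each of P1 and P2 sees: (value from P0, value relayed by the other one)."""
    return {
        "P1": (sent_by_p0["P1"], relay_p2),
        "P2": (sent_by_p0["P2"], relay_p1),
    }


# Case I: P0 is non-faulty and sends 1 to both; faulty P2 relays 0 instead of 1.
case_1 = views({"P1": 1, "P2": 1}, relay_p1=1, relay_p2=0)
# Case II: P0 is faulty and sends 1 to P1 and 0 to P2; both relay honestly.
case_2 = views({"P1": 1, "P2": 0}, relay_p1=1, relay_p2=0)

# P1 sees exactly the same messages in both cases.
p1_cannot_distinguish = case_1["P1"] == case_2["P1"]


def forced_decision(view):
    """Rule forced by validity: accept the value received directly from P0."""
    return view[0]


decisions_case_2 = {p: forced_decision(v) for p, v in case_2.items()}
```

In Case II the rule makes $P_1$ decide 1 and $P_2$ decide 0, so the two non-faulty processors disagree.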
Que 3.9. Describe Byzantine agreement problem, and explain
its solution. Show that Byzantine agreement cannot always be
reached among four processors if two processors are faulty.
AKTU 2017-18, Marks 10
Answer
Byzantine agreement problem and its solution : Refer Q. 3.7,
Page 3-7B, Unit-3.
Proof :
1. Consider a system with four processors $P_1$, $P_2$, $P_3$ and $P_4$. Assume
that the processors exchange three values x, y and z with each other, $P_1$
initiates the initial value and processors $P_2$ and $P_4$ are faulty.
2. To initiate the agreement, processor $P_1$ executes algorithm OM(1) and
sends its value x to all other processors as shown in Fig. 3.9.1.

[Figure : $P_1$ sends the value x to $P_2$, $P_3$ and $P_4$.]
Fig. 3.9.1.
3. After receiving the value x from the source processor $P_1$, processors
$P_2$, $P_3$ and $P_4$ execute the algorithm OM(0).
4. Processor $P_3$ is non-faulty and sends value x to processors $P_2$ and $P_4$.
Faulty processors $P_2$ and $P_4$ send value y to ($P_3$, $P_4$) and z to
($P_2$, $P_3$) respectively, as shown in Fig. 3.9.2.
[Figure : message exchange among $P_2$, $P_3$ and $P_4$ in OM(0).]
Fig. 3.9.2.
5. After receiving all the messages, processors $P_2$, $P_3$ and $P_4$ decide on
the majority value.
Majority values for Byzantine solution :
Processor | Received values | Majority value
$P_2$ | (x, x, z) | x
$P_3$ | (x, y, z) | 0 (no majority, default value)
$P_4$ | (x, x, y) | x
6. According to the majority value table, the processors do not agree on a
single common majority value : the non-faulty processor $P_3$ falls back to
the default value 0 although the non-faulty source sent x, which violates
the conditions of the Byzantine agreement problem.
7. This proves that Byzantine agreement cannot always be reached among
four processors if two processors are faulty.
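The majority table can be reproduced with a short simulation. It is a sketch under the assumptions used in the proof above: $P_1$ is a non-faulty source sending x, $P_3$ relays x honestly, and the two faulty processors relay y and z. The dictionary layout and the name `relayed` are illustrative.

```python
from collections import Counter


def majority(values, default=0):
    """Return the strict majority value if it exists, otherwise the default 0."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else default


source_value = "x"                      # sent by the non-faulty source P1
relayed = {                             # relayed[sender][receiver]
    "P2": {"P3": "y", "P4": "y"},       # faulty: relays y instead of x
    "P3": {"P2": "x", "P4": "x"},       # non-faulty: relays what it received
    "P4": {"P2": "z", "P3": "z"},       # faulty: relays z instead of x
}

received = {
    p: [source_value] + [relayed[q][p] for q in relayed if q != p]
    for p in relayed
}
decisions = {p: majority(values) for p, values in received.items()}
```

The only non-faulty lieutenant, `P3`, receives three different values, finds no majority and falls back to 0 instead of the source value x.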

PART-3
Application of Agreement Protocol, Atomic Commit in Distributed
Database System.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 3.10. What do you understand by the fault-tolerant clock
synchronization ?
Answer
1. In distributed systems, it is often necessary that sites (or processes)
maintain physical clocks that are synchronized with one another.
2. Since physical clocks have a drift problem, they must be periodically
resynchronized.
3. Such periodic synchronization becomes extremely difficult if the
Byzantine failure is allowed because a faulty process can report different
clock values to different processes.
4. The assumptions regarding the system are :
a. All clocks are initially synchronized to approximately the same
values.
b. A non-faulty process's clock runs at approximately the correct rate.
c. A non-faulty process can read the clock value of another non-faulty
process with at most a small error $\epsilon$.
5. A clock synchronization algorithm should satisfy the following two
conditions :
a. At any time, the values of the clocks of all non-faulty processes
must be approximately equal.
b. There is a small bound on the amount by which the clock of a
non-faulty process is changed during each resynchronization. This
condition implies that resynchronization does not cause a clock
value to jump arbitrarily far, thereby preventing the clock rate
from being too far from the real time.
Que 3.11. Explain the interactive convergence algorithm for clock
synchronization.
Answer
The interactive convergence algorithm :
1. It is called interactive convergence algorithm because it causes the
convergence of non-faulty clocks.
2. This algorithm assumes that the clocks are initially synchronized and
they are resynchronized often enough so that two non-faulty clocks never
differ by more than $\delta$.
3. According to the algorithm, each process reads the value of all other
processes' clocks and sets its clock value to the average of these values.
4. If a clock value differs from its own clock value by more than $\delta$, it
replaces that value by its own clock value when taking the average.
5. The processing of the algorithm is very simple.
6. Now, assume that two processes $p$ and $q$ use $C_{pr}$ and $C_{qr}$,
respectively, as the clock value of a third process $r$ when computing
their averages.
7. If $r$ is non-faulty, then
$C_{pr} = C_{qr}$
8. If $r$ is faulty, then
$|C_{pr} - C_{qr}| \le 3\delta$
When $p$ and $q$ compute their averages for the $n$ clock values, they both
use identical values for the clocks of the $n - m$ non-faulty processes, and
the difference in the clock values of the $m$ faulty processes they use is
bounded by $3\delta$.
9. In this way, the average values computed by $p$ and $q$ differ by at most
$(3m/n)\delta$. Since $n > 3m$, we have $(3m/n)\delta < \delta$.
10. Thus, each resynchronization brings the clocks closer by a factor of
$(3m/n)$.
11. This implies that we can keep the clocks synchronized within any desired
degree by resynchronizing them often enough using the algorithm.
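One resynchronization round of the interactive convergence algorithm can be sketched as follows. The clock readings are hypothetical numbers chosen for illustration ($n = 4$, $m = 1$, $\delta = 10$), and the function name `resynchronize` is not from the text.

```python
def resynchronize(own, readings, delta):
    """One round of interactive convergence for a single process.

    own: this process's clock value
    readings: clock values read for all n processes (own value included)
    delta: assumed bound on the difference between non-faulty clocks
    """
    # Replace any reading that is more than delta away by the own clock value.
    adjusted = [v if abs(v - own) <= delta else own for v in readings]
    return sum(adjusted) / len(adjusted)


delta = 10
# Three non-faulty clocks read 100, 104 and 98. The faulty process reports
# 500 to p and -300 to q.
new_p = resynchronize(100, [100, 104, 98, 500], delta)
new_q = resynchronize(104, [100, 104, 98, -300], delta)
bound = (3 * 1 / 4) * delta          # (3m/n) * delta with m = 1, n = 4
```

Before the round, the clocks of p and q differ by 4. After it they differ by 1, which is within the bound $(3m/n)\delta = 7.5$ derived above.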
Que 3.12. Explain interactive consistency algorithm.
Answer
1. In this algorithm we consider two important conditions :
$C_1$ : Any two processes obtain approximately the same value for a process
$p$'s clock (even if $p$ is faulty).
This condition is important because any two processes always
compute approximately the same median if they get approximately the
same set of clock values for other processes.
$C_2$ : If $q$ is a non-faulty process, then every non-faulty process obtains
approximately the correct value for process $q$'s clock.
Thus, if a majority of the processes are non-faulty, the median of all
the clock values is either approximately equal to a good clock's
value or it lies between the values of two good clocks.
2. The algorithm is as follows :
a. All the processes in a system execute an algorithm to collect values
for clocks that satisfy the conditions $C_1$ and $C_2$.
b. Every process uses the median of collected values to compute its
new clock value.
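The median step can be illustrated with the standard library. The collected values below are hypothetical: four good clocks and one faulty clock reporting an absurd value.

```python
from statistics import median

# Values collected for the five clocks; the last one comes from a faulty process.
collected = [100, 104, 98, 101, 9999]

# With a majority of good clocks, the median stays among the good values.
new_clock_value = median(collected)
```

An average over the same values would be pulled far away by the faulty reading, which is why the algorithm uses the median.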

Que 3.13. Write a short note on atomic commit in distributed
database system.
Answer
1. In the problem of atomic commit, sites of a distributed system must
agree whether to commit or abort a transaction.
2. In the first phase of the atomic commit, sites execute their part of a
distributed transaction and broadcast their decisions (commit or abort)
to all other sites.
3. In the second phase, each site, based on what it received from other
sites in the first phase, decides whether to commit or abort its part of the
distributed transaction.
4. If every site receives an identical response from all other sites, they will
reach the same decision.
5. If some sites behave maliciously, they can send conflicting responses to
other sites, causing them to make conflicting decisions.
6. In these situations, we can use algorithms for the Byzantine agreement
to ensure that all non-faulty processors reach a common decision about
a distributed transaction.
7. It works as follows :
a. In the first phase, after a site has made a decision, it starts the
Byzantine agreement.
b. In the second phase, processors determine a common decision based
on the agreed vector of values.
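The second-phase decision, and the way a malicious site can break it, can be sketched as follows. The rule "commit only if every response is commit" is an assumption (the usual atomic commit rule, not spelled out in the text), and the site names A, B and C are hypothetical.

```python
def decide(responses):
    """Assumed rule: commit only if every site voted to commit."""
    return "commit" if all(v == "commit" for v in responses.values()) else "abort"


# All sites are honest: every site sees the same responses and decides alike.
honest = {"A": "commit", "B": "commit", "C": "commit"}
decisions_honest = {site: decide(honest) for site in honest}

# Site C is malicious: it tells A "commit" and B "abort".
seen_by_a = {"A": "commit", "B": "commit", "C": "commit"}
seen_by_b = {"A": "commit", "B": "commit", "C": "abort"}
conflict = (decide(seen_by_a), decide(seen_by_b))
```

Running a Byzantine agreement on each site's response first gives A and B the same agreed vector, so `decide` returns the same result at both.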
Que 3.14. What are agreement protocols ? Discuss the general
system model where agreement protocols are used. Give the
applications of agreement protocols. AKTU 2015-16, Marks 10
Answer
Agreement protocols : Refer Q. 3.1, Page 3-2B, Unit-3.
System model : Refer Q. 3.2, Page 3-2B, Unit-3.
Applications of agreement protocols :
1. Fault-tolerant clock synchronization :
a. Distributed systems require physical clocks to be synchronized.
b. Physical clocks have a drift problem.
c. Agreement protocols may help to reach a common clock value.
2. Atomic commit in distributed database system (DDBS) :
a. DDBS sites must agree whether to commit or abort the transaction.
b. Agreement protocols may help to reach a consensus.

PART-4
Distributed Resource Management : Issues in Distributed File
System, Mechanism for Building Distributed File System.

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 3.15. Explain typical architecture of Distributed File System
(DFS).
Answer
1. In a distributed file system, files can be stored at one machine and the
computation can be performed at another machine.
2. When a machine needs to access a file stored on a remote machine, the
remote machine performs the necessary file access operations and
returns data if a read operation is performed.
3. File servers are higher performance machines which are used to store
files and perform storage and retrieval operations.
4. Client machines are used for computational purposes and to access the
files stored on servers.
[Figure : clients, each with a client cache (one also with a local disk),
connected through a communication network to servers, each with a server
cache.]
Fig. 3.15.1. Architecture of DFS.


5. The two most important services present in a distributed file system
are :
a. Name server :
i. A name server is a process that maps names specified by clients
to stored objects such as files and directories.
ii. The mapping occurs when a process references a file or
directory for the first time.
b. Cache manager :
i. A cache manager is a process that implements file caching.
ii. In file caching, a copy of data stored at a remote file server is
brought to the client's machine when referenced by the client.
iii. Cache managers can be present at both clients and file servers.
iv. Cache managers at the servers cache files in the main memory
to reduce delays due to disk latency.
v. If multiple clients are allowed to cache a file and modify it, the
copies can become inconsistent.
vi. To avoid this inconsistency problem, cache managers at both
servers and clients coordinate to perform data storage and
retrieval operations.
Que 3.16. Explain the mechanism for distributed file system.
Answer
Mechanism for building distributed file system :
1. Mounting :
a. A mount mechanism allows the binding of different file name spaces
to form a single hierarchically structured name space.
[Figure : a name space tree whose subtrees are stored on Server X, Server Y
and Server Z and are joined at mount points.]
Fig. 3.16.1. Name space hierarchy.


b. A name space (or a collection of files) can be bound to or mounted
at an internal node or a leaf node of a name space tree.
c. A node onto which a name space is mounted is known as a mount
point. The kernel maintains a structure called the mount table
which maps mount points to appropriate storage devices.
2. Caching :
a. In caching, a copy of data stored at a remote file server is brought to
the client when referenced by the client.
b. On further requests, the data is searched locally at the client side
only, which reduces the access delay through the network.
c. Data can either be cached in the main memory or on the local disk
of the clients, or at the servers to reduce disk access latency.
3. Hints :
a. Caching is an important mechanism to improve the performance of
a file system.
b. Guaranteeing the consistency of the cached items requires elaborate
and expensive client/server protocols.
c. An alternative approach to maintaining consistency is to treat cached
data as hints.
d. With this scheme, cached data is not expected to be completely
consistent, but when it is, it can dramatically improve performance.
e. To prevent the occurrence of negative consequences if a hint is
erroneous, the classes of applications that use hints must be
restricted to those that can recover after discovering that the cached
data is invalid, that is, the data should be self-validating upon use.
4. Bulk data transfer :
a. In this mechanism, multiple consecutive data blocks are transferred
from server to client.
b. This reduces file access overhead by obtaining multiple blocks with
a single seek, by formatting and transmitting multiple large packets
in a single context switch and by reducing the number of
acknowledgements that need to be sent.
c. Bulk transfer amortizes the cost of the fixed communication protocol
overheads and disk seek time over many consecutive blocks of a
file.
5. Encryption :
a. Encryption is the process used for data security in the distributed
system.
b. A number of possible threats exist, such as unauthorized release of
information and unauthorized modification of information.
c. Encryption prevents unauthorized release and modification of
information.
d. For performance, encryption/decryption may be performed by
special hardware at the client and server.
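The mount table described under Mounting can be sketched as a mapping from mount points to servers, resolved by the longest matching mount point. This is only an illustration: real kernels keep richer mount structures, the paths are hypothetical, and only the server names X, Y and Z come from Fig. 3.16.1.

```python
def resolve(path, mount_table):
    """Return (server, path relative to the mount point) for an absolute path.

    Assumes the table contains the root mount point "/".
    """
    candidates = [
        mp for mp in mount_table
        if path == mp or path.startswith(mp.rstrip("/") + "/")
    ]
    best = max(candidates, key=len)          # longest matching mount point wins
    return mount_table[best], path[len(best):].lstrip("/")


# Hypothetical mount table: mount point -> server holding that subtree.
mount_table = {
    "/": "Server X",
    "/a/c": "Server Y",
    "/a/c/k": "Server Z",
}

on_y = resolve("/a/c/g/file.txt", mount_table)
on_x = resolve("/b/notes.txt", mount_table)
on_z = resolve("/a/c/k", mount_table)
```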
Que 3.17.
i. Explain typical architecture of distributed file system. Give the
mechanisms for building distributed file system.
ii. What is caching ? How is it useful in DFS ?
AKTU 2014-15, Marks 10
Answer
i. Architecture of distributed file system : Refer Q. 3.15, Page 3-14B,
Unit-3.
Mechanisms for building DFS : Refer Q. 3.16, Page 3-15B, Unit-3.
ii. Caching : Refer Q. 3.16, Page 3-15B, Unit-3.
Uses of caching in DFS :
a. The file system performance can be improved by caching, since
accessing remote disks is much slower than accessing local memory
or local disks.
b. Caching reduces the frequency of access to the file servers and the
communication network, thereby improving the scalability of a file
system.
c. Caching in the distributed file system is used to reduce delays in
accessing data.
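A client cache that treats its entries as hints (the alternative to full cache consistency mentioned in Q. 3.16) can be sketched as follows. All names are hypothetical: `LocationHints` caches which server holds a file, `authoritative_lookup` stands in for a slow name server query, and a hint is validated simply by using it.

```python
class LocationHints:
    """Client-side cache of file name -> server hints; a hint may be stale."""

    def __init__(self, lookup):
        self.lookup = lookup      # authoritative but slow lookup (assumed)
        self.hints = {}
        self.lookups = 0          # number of slow lookups performed

    def read(self, name, servers):
        server = self.hints.get(name)
        # The hint validates itself on use: the file is either there or not.
        if server is None or name not in servers[server]:
            self.lookups += 1
            server = self.lookup(name)
            self.hints[name] = server
        return servers[server][name]


servers = {"s1": {"report.txt": b"data"}, "s2": {}}


def authoritative_lookup(name):
    return next(s for s, files in servers.items() if name in files)


client = LocationHints(authoritative_lookup)
first = client.read("report.txt", servers)     # no hint yet: slow lookup
second = client.read("report.txt", servers)    # hint is valid: no lookup
servers["s2"]["report.txt"] = servers["s1"].pop("report.txt")   # file migrates
third = client.read("report.txt", servers)     # stale hint detected and refreshed
```

Three reads need only two slow lookups, and the stale hint never produces a wrong result because it is checked when used.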
Que 3.18. What is cache ? Discuss read operation with cache and
write operation with cache. AKTU 2017-18, Marks 05
Answer
Cache :
1. A cache is a data storing technique that provides the ability to access
data or files at a higher speed.
2. Caches are implemented both in hardware and software.
3. Caching serves as an intermediary component between the primary
storage appliance and the recipient hardware or software device to
reduce the latency in data access.
Read operation :
1. The sequence of steps in a cache read operation is shown in Fig. 3.18.1.
2. A read operation starts with a lookup operation and has a partial overlap
between the lookup and data read operations.
3. If there is a cache hit, then the cache returns the value to the processor,
or the higher level cache.
4. If there is a cache miss, then we need to cancel the data read operation,
and send a request to the lower level cache.
5. The lower level cache will perform the same sequence of accesses, and
return the entire cache block.
6. The cache controller can then extract the requested data from the
block, and send it to the processor.
7. Simultaneously, the cache controller invokes the insert operation to
insert the block into the cache.
[Figure : timeline of a cache read. On a hit : lookup, then data read. On a
miss : lookup, request to the lower level cache, then insert (replace, evict)
and read block.]
Fig. 3.18.1. The read operation.


Cache write operation (write back cache) :
1. Fig. 3.18.2 shows the sequence of operations for a cache write operation
for a write back cache.
2. The sequence of operations is similar to that of a cache read.
3. If there is a cache hit, then we invoke a data write operation, and set
the modified bit to 1.
4. Otherwise, we issue a read request for the block to the lower level
cache.
5. After the block arrives, most cache controllers store it in a small
temporary buffer.
6. In some processors, the cache controller might wait till all the
sub-operations complete.
7. After writing into the temporary buffer, the cache controller invokes the
insert operation for writing the contents of the block.
[Figure : timeline of a cache write. On a hit : lookup, then data write. On a
miss : lookup, request to the lower level cache, then insert (replace, evict)
and write block.]
Fig. 3.18.2. The write operation (write back cache).
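The read and write sequences can be condensed into a toy write back cache. The sketch makes several simplifying assumptions that are not in the text: a block holds a single value, a dictionary stands in for the lower level cache, the temporary buffer is omitted, and the oldest inserted block is evicted first.

```python
class WriteBackCache:
    def __init__(self, lower, capacity=2):
        self.lower = lower        # dict standing in for the lower level cache
        self.capacity = capacity
        self.blocks = {}          # addr -> [value, modified bit], in insertion order

    def _insert(self, addr, value):
        if len(self.blocks) >= self.capacity:
            # Evict the oldest block; write it back only if it was modified.
            victim, (old_value, modified) = next(iter(self.blocks.items()))
            if modified:
                self.lower[victim] = old_value
            del self.blocks[victim]
        self.blocks[addr] = [value, False]

    def read(self, addr):
        if addr not in self.blocks:               # miss: fetch from the lower level
            self._insert(addr, self.lower[addr])
        return self.blocks[addr][0]

    def write(self, addr, value):
        if addr not in self.blocks:               # miss: fetch the block first
            self._insert(addr, self.lower[addr])
        self.blocks[addr][0] = value
        self.blocks[addr][1] = True               # set the modified bit to 1


lower = {0: 10, 1: 11, 2: 12}
cache = WriteBackCache(lower, capacity=2)
cache.write(0, 99)
lower_before_eviction = lower[0]    # the write has not reached the lower level yet
cache.read(1)
cache.read(2)                       # cache is full: block 0 is evicted and written back
lower_after_eviction = lower[0]
```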

Que 3.19. Write and explain various issues that must be addressed
in design and implementation of distributed file system.
AKTU 2017-18, Marks 05
Answer
Following are various issues that must be addressed in the design and
implementation of distributed file system :
1. Naming and name resolution :
a. A name in file systems is associated with an object.
b. Name resolution refers to the process of mapping a name to an
object or, in the case of replication, to multiple objects.
c. A name space is a collection of names which may or may not share
an identical resolution mechanism.
2. Caches on disk or main memory :
a. The advantages of having the cache in the main memory are as
follows :
i. Diskless workstations, which are cheaper, can also take
advantage of caching.
ii. Accessing a cache in main memory is much faster than
accessing a cache on local disk.
iii. The server cache is in the main memory at the server, hence
a single design for a caching mechanism can be used for both
clients and servers.
b. A cache in main memory has to deal with memory contention
between the cache and the virtual memory system.
3. Writing policy :
a. The writing policy decides when a modified cache block at a client
should be transferred to the server.
b. The simplest policy is write-through.
c. In write-through, all writes requested by the applications at clients
are also carried out at the servers immediately.
d. The main advantage of write-through is reliability.
4. Availability :
a. Availability is one of the important issues in the design of distributed
file systems.
b. The failure of servers or the communication network can severely
affect the availability of files.
c. Replication is the primary mechanism used for enhancing the
availability of files in distributed file systems.
5. Scalability : The issue of scalability deals with the suitability of the
design of a system to handle the demands of the growing system.
6. Semantics :
a. The semantics of a file system characterizes the effects of accesses
on files.
b. The basic semantics are easily understood by programmers and are
easy to handle.
c. A read operation will return the data (stored) due to the latest write
operation.

Que 3.20. Explain naming in distributed system. What is flat


naming and structured naming ? AKTU2017-18, Marks 05
Answer
Naming :
1. The naming facility enables users and programs to assign character
string names to objects and use the names to refer those objects.
2. The locating facility, which is an integral part of the naming facility,
maps an object's name to the object's location in a distributed system.
Distributed System 3-21 B (CS-Sem-7)

3. The naming and locating facilities joint.ly form a naming system that
hides the details of how and where an object is actually located in the
network.
4. The naming system plays a very important role in achieving the goal of
location transparency, facilitating transparent migration and replication
of objects, object sharing.
Flat naming :
1 Flat name is a simplest name space where names are character strings.
2. Flat names are fixed size bit strings that can be efficiently handed by
machines.
3 Names defined ina flat name space are called primitive or flat names.
4. Flat names do not have any structure.
5 Flat names are suitable for use either for small name spaces having
names for only a few objects or for system-oriented name spaces.
Structured naming :
1. Structured names are organized into name spaces.
2. A structured name space is represented as a labeled directed graph
with two types of nodes.
3. A leaf node represents a named entity and stores information about
the entity.
4. A directory node stores a directory table of (edge label, node ID) pairs.
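The structured name space described above can be sketched in Python. This is only an illustration under stated assumptions: the node IDs (`n0` to `n3`), the edge labels, and the function name `resolve` are hypothetical and are not part of any real naming service.

```python
# Hypothetical name graph: each directory node stores a table of
# (edge label -> node ID) pairs; each leaf node stores information
# about the named entity.
DIRECTORY, LEAF = "directory", "leaf"

graph = {
    "n0": (DIRECTORY, {"home": "n1", "etc": "n2"}),
    "n1": (DIRECTORY, {"alice": "n3"}),
    "n2": (LEAF, "system configuration"),
    "n3": (LEAF, "mailbox of alice"),
}

def resolve(graph, root, path):
    """Follow the edge labels in `path` from `root` and return the node ID reached."""
    node_id = root
    for label in path:
        kind, content = graph[node_id]
        if kind != DIRECTORY:
            raise KeyError(f"{node_id} is not a directory node")
        if label not in content:
            raise KeyError(f"no edge labeled {label!r} at node {node_id}")
        node_id = content[label]
    return node_id

node = resolve(graph, "n0", ["home", "alice"])
print(node, graph[node][1])
```

A flat name, by contrast, would be looked up in a single table with no path to follow.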
Que 3.21. Explain shared memory architecture and distributed
memory architecture. AKTU 2017-18, Marks 10
Answer

Shared memory architecture : It has the following features :
1. All processors can directly access all memory locations in the system,
thus providing a convenient mechanism for processors to communicate.
2. It is convenient in the sense of :
a. Location transparency.
b. Abstraction support.
3. Memory is usually centrally placed.
4. Symmetric multiprocessor (SMP) systems use this centralized memory
approach, where each processor is connected to a shared bus. This
shared bus handles all accesses to main memory and I/O.
5. A schematic view of the centralized shared memory model in an SMP
system is given in Fig. 3.21.1. Each processor (denoted by P) accesses
memory through a shared bus and has a local cache (denoted by $).
[Figure labels: Shared Bus, Memory]
Fig. 3.21.1.
6. It belongs to the MIMD (Multiple Instruction, Multiple Data)
computational model.
Distributed memory architecture : Its main features are :
1. Each processor in the system is directly connected to its own memory
and caches. No processor can directly access another processor's
memory.
2. Each node has a network interface (NI).
3. All communication and synchronization between processors happens
via messages passed through the NI.
4. Since this approach uses messages for communication and
synchronization, it is often called message passing architecture.
5. This architecture belongs to the MIMD (Multiple Instruction Stream,
Multiple Data Stream) programming model.
6. A schematic view of the distributed memory approach is shown in
Fig. 3.21.2, where each processor has local memory and processors
(each denoted by P) communicate through an interconnection network.
Also, each processor has its own cache (denoted by $).
[Figure labels: P, M, I/O, Switched Interconnection Network]
Fig. 3.21.2.
Que 3.22. Caching is one of the techniques used to improve access
to naming data. What are the benefits of caching and what
assumptions must hold for it to be useful ?
AKTU 2015-16, Marks 10

Answer
Benefits of caching:
a. With caching, data is kept in main memory at the servers to
reduce disk access latency.
b. File system performance can be improved by caching.
c. It improves the scalability of a file system.
Assumptions : Following assumptions must hold for caching to be useful :
1. Client-specific assumptions : The assumptions that are client-specific
are :
a. The client might issue a list of requests instead of individual
requests. This is known as a request-list. The client suffers an
insignificant performance penalty in the construction of this list.
b. The client will receive a bundle that is a collection of responses to
some or all of its requests (in a request-list). The client will exert a
fixed amount of resources to disassemble the bundle.
c. The client is prepared to receive responses not in the same order as
that of its requests. In addition, the order of requests and responses
is not important to the client.
2. Server-specific assumptions : The assumptions that are server-specific
are :
a. The server will determine whether or not to use the PC-Bundle
mechanism whenever a request-list is received. The server incurs
insignificant overhead for this action.
b. The server will determine which of the responses to a request-list
should be bundled. It will exert a fixed amount of resources to
determine and assemble a bundle.
3. Proxy-specific assumptions : The assumptions that are proxy-specific
are :
a. Upon receiving the request-list in a PC-bundle, the proxy has the
ability to parse and understand all individual object requests.
b. The proxy will remove from the request-list all objects that are
found in its cache. It will form a new PC-bundle's request-list to be
forwarded to the server or next proxy.
PART-5
Design Issues in Distributed Shared Memory.

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 3.23. What is Distributed Shared Memory (DSM) ? Explain
with diagram the architecture of distributed shared memory.
AKTU 2014-15, Marks 05

Answer
1. Distributed Shared Memory (DSM) is a form of memory architecture
where the (physically separate) memories can be addressed as one
(logically shared) address space.
2. The shared memory model provides a virtual address space shared
between all nodes.
3. DSM is primarily a tool for any distributed application in which individual
shared data items can be accessed directly.
Architecture:
1. DSM consists of a number of nodes (computers), each of which is
connected to the others through a high speed communication channel.
[Figure: Node 1, Node 2, ..., Node n, each with its own Memory and Mapping manager, above the Distributed Shared memory]
Fig. 3.23.1.
2. Each node consists of one or more Central Processing Units and a single
memory unit.
3. Messages are passed from one node to another by means of a simple
message passing technique.
4. Data moves between main memory and secondary memory (within a
node) and between the main memories of different nodes.
5. The main memory of individual nodes is used to cache pieces of the
shared memory space using the mapping manager.
6. When a process accesses data in the shared address space, the mapping
manager maps the shared memory address to physical memory.
7. The mapping manager is a layer of software implemented either in the
operating system kernel or as a runtime library routine.
8. For the mapping operation, the shared memory space is partitioned into
blocks.
9. A simple message passing system allows processes on different nodes
to exchange messages with each other.

Que 3.24. Explain design issues in distributed shared memory.
OR
Explain the mechanism for building distributed file system also
explain the design issues in distributed shared memory.
AKTU 2018-19, Marks 10
Answer
Mechanisms for building distributed file system : Refer Q. 3.16,
Page 3-15B, Unit-3.
Design issues in distributed shared memory:
1. Granularity : Granularity refers to the block size of a DSM system.
2. Structure of shared memory space :
a. Structure refers to the layout of the shared data in memory.
b. The structure of the shared memory is dependent on the type of
applications that the DSM system is intended to support.
3. Memory coherence and access synchronization :
a. In a DSM system, shared data may be accessed concurrently.
b. Therefore, a memory coherence protocol and access synchronization
are needed to maintain the consistency of shared data.
4. Data location and access : To share data, a DSM system must
implement some form of data block locating mechanism in order to
service network data block faults and to meet the requirements of the
memory coherence semantics being used.
5. Replacement strategy :
a. If the local memory of a node is full, a cache miss at that node implies
not only a fetch of the accessed data block from a remote node but
also a replacement.
b. Therefore, a cache replacement strategy is also necessary in the
design of a DSM system.

PART-6

Algorithm for Implementation of Distributed Shared Memory.

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 3.25. Explain typical architecture of distributed file system.
State the algorithm for implementation of distributed shared
memory. AKTU 2018-19, Marks 10
OR
Give the design issues in distributed shared memory. State the
algorithm for implementation of distributed shared memory.
AKTU 2018-19, Marks 10
OR
Explain design issues in distributed shared memory and also write
algorithm for implementation of shared memory.
AKTU 2016-17, Marks 10

Answer

Architecture of distributed file system : Refer Q. 3.15, Page 3-14B,
Unit-3.
Design issues in distributed shared memory: Refer Q. 3.24,
Page 3-25B, Unit-3.
Following four algorithms are used to implement DSM systems :
1. The central-server algorithm
2. The migration algorithm
3. The read-replication algorithm
4. The full-replication algorithm
Migration algorithm :
1. In the migration algorithm, data is migrated to the location of the data
access request, allowing subsequent accesses to the data to be performed
locally.
2. The migration algorithm allows only one node at a time to access the
shared data.
3. Data is migrated between servers in fixed size units called blocks to
facilitate the management of the data.
Algorithm :
1. The client checks whether the block is local.
2. If the block is not local, the location of the block must be determined.
3. The client sends a request to the server to determine the location of
the block.
4. The server receives the request from the client.
5. The server sends the location of the block to the client.
6. The client receives the location of the block from the server and
accesses the data block.
[Figure labels: Data access request, Clients, Data block]
Fig. 3.25.1.
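The migration steps above can be sketched in Python. This is a minimal single-process simulation, assuming an in-memory owner table stands in for the location server; the class name `MigrationDSM` and its fields are hypothetical.

```python
class MigrationDSM:
    """Toy model of the migration algorithm: each block lives at exactly one node."""

    def __init__(self, nodes, blocks):
        # blocks maps block_id -> (initial owner node, data)
        self.memory = {node: {} for node in nodes}
        self.owner = {}  # assumed stand-in for the location server
        for block_id, (node, data) in blocks.items():
            self.memory[node][block_id] = data
            self.owner[block_id] = node

    def access(self, node, block_id):
        # Step 1: check whether the block is local.
        if block_id not in self.memory[node]:
            # Steps 2 to 5: determine the current location of the block.
            holder = self.owner[block_id]
            # Step 6: migrate the block to the requesting node.
            self.memory[node][block_id] = self.memory[holder].pop(block_id)
            self.owner[block_id] = node
        # Subsequent accesses from this node are now local.
        return self.memory[node][block_id]

dsm = MigrationDSM(["A", "B"], {"b1": ("A", [1, 2, 3])})
print(dsm.access("B", "b1"), dsm.owner["b1"])
```

Because the block is removed from the previous holder, only one node can access it at a time, which matches point 2 of the description.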

Que 3.26. Describe the following algorithm for implementing
DSM :
i. The Migration Algorithm
ii. The Full-Replication Algorithm AKTU 2014-15, Marks 05

Answer

i. The Migration Algorithm : Refer Q. 3.25, Page 3-26B, Unit-3.
ii. The Full-Replication Algorithm :
1. The full replication algorithm allows multiple nodes to have both read
and write access to shared data blocks.
2. Because many nodes can write shared data concurrently, access to
shared data must be controlled to maintain its consistency.
3. One simple way to maintain consistency is to use a gap-free sequencer.
4. In this scheme, all nodes wishing to modify shared data send the
modifications to a sequencer.
5. The sequencer assigns a sequence number and multicasts the
modification with the sequence number to all the nodes that have a copy
of the shared data item.
6. Each node processes the modification requests in sequence number
order.
7. A gap between the sequence number of a modification request and the
expected sequence number at a node indicates that one or more
modifications have been missed.

[Figure: a client sends a write request with its data to the sequencer; the sequencer adds a sequence number and multicasts the update; hosts receive the data and update local memory; the client receives an acknowledgment and updates local memory]
Fig. 3.26.1. Full replication algorithm.

8. Under such circumstances, the node will ask for the retransmission of
the modifications it has missed.
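The gap-free sequencer scheme can be sketched as follows. This is a simplified illustration, assuming modifications are (key, value) pairs and that retransmission is a direct call; the class names `Sequencer` and `Replica` are hypothetical.

```python
class Sequencer:
    """Assigns consecutive (gap-free) sequence numbers to modifications."""

    def __init__(self):
        self.next_seq = 1
        self.log = {}  # kept so that missed modifications can be retransmitted

    def submit(self, modification):
        seq = self.next_seq
        self.next_seq += 1
        self.log[seq] = modification
        return seq, modification  # in a real system this pair would be multicast

    def retransmit(self, seq):
        return seq, self.log[seq]


class Replica:
    """A node holding a full copy; applies modifications in sequence number order."""

    def __init__(self):
        self.data = {}
        self.expected = 1
        self.pending = {}  # modifications that arrived ahead of a gap

    def receive(self, seq, modification):
        """Apply what can be applied; return the sequence numbers still missing."""
        if seq < self.expected:
            return []  # duplicate of something already applied
        self.pending[seq] = modification
        while self.expected in self.pending:
            key, value = self.pending.pop(self.expected)
            self.data[key] = value
            self.expected += 1
        return [s for s in range(self.expected, seq) if s not in self.pending]


sequencer, replica = Sequencer(), Replica()
m1 = sequencer.submit(("x", 1))
m2 = sequencer.submit(("x", 2))
m3 = sequencer.submit(("y", 3))
replica.receive(*m1)
missing = replica.receive(*m3)          # m2 was never delivered to this replica
for seq in missing:
    replica.receive(*sequencer.retransmit(seq))
print(replica.data)
```

The replica holds back modification 3 until the missing modification 2 has been retransmitted, so every copy applies writes in the same order.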

Que 3.27. What are the advantages of DSM?

Answer
Following are the advantages of DSM:
1. DSM provides a simple abstraction for sharing data.
2. DSM systems allow complex structures to be passed by reference, thus
simplifying the development of algorithms for distributed applications.
3. DSM takes advantages of the locality of reference exhibited by programs
and thereby cuts down on the overhead of communicating over the
network.
4. DSM systems are cheaper to build than tightly coupled multiprocessor
systems.
5. The physical memory available at all the nodes of a DSM system
combined together is enormous. This large memory can be used to
efficiently run programs that require large memory without incurring
disk latency.
6. In tightly coupled multiprocessor systems with a single shared memory,
main memory is accessed via a common bus that limits the size
of the multiprocessor system to a few tens of processors. DSM systems
do not suffer from this drawback and can easily be scaled upwards.
7. Programs written for shared memory multiprocessors can be run on
DSM systems without any changes.

Que 3.28. Explain central-server algorithm for implementing
distributed shared memory.
Answer
1. In the central-server algorithm, a central-server maintains all the shared
data.
2. It services the read request from other nodes or clients by returning the
data items to them.
3. It updates the data on write request by clients and returns
acknowledgment messages.
4. A timeout can be employed to resend the requests in case of failed
acknowledgements.
5. Duplicate write requests can be detected by associating sequence
numbers with write requests.
6. A failure condition is returned to the application trying to access shared
data after several retransmissions without a response.
[Figure: a client sends a data request to the central server; the server receives the request, performs the data access, and sends a response; the client receives the response]
Fig. 3.28.1. Central-server algorithm.
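The retransmission and duplicate detection described in points 4 to 6 can be sketched as below. This is an assumption-laden illustration: `send` is a hypothetical transport function that returns `None` on a timeout, and `CentralServer` and `write_with_retry` are made-up names.

```python
class CentralServer:
    """Holds all shared data and detects duplicate writes by sequence number."""

    def __init__(self):
        self.data = {}
        self.last_seq = {}  # client -> highest write sequence number applied
        self.applied = 0    # number of writes actually performed

    def read(self, key):
        return self.data.get(key)

    def write(self, client, seq, key, value):
        if seq <= self.last_seq.get(client, 0):
            return "ack"  # duplicate request: acknowledge again, do not re-apply
        self.data[key] = value
        self.last_seq[client] = seq
        self.applied += 1
        return "ack"


def write_with_retry(server, client, seq, key, value, send, max_tries=3):
    """Resend the write after each timeout; report failure after max_tries."""
    for _ in range(max_tries):
        if send(server, client, seq, key, value) == "ack":
            return True
    return False  # failure condition returned to the application


def reliable_send(server, client, seq, key, value):
    # Assumed transport that never loses messages.
    return server.write(client, seq, key, value)


server = CentralServer()
print(write_with_retry(server, "c1", 1, "x", 5, reliable_send), server.read("x"))
```

If an acknowledgment is lost, the client resends the same sequence number and the server recognizes the request as a duplicate instead of applying it twice.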


Que 3.29. Describe the read replication algorithm for
implementing distributed shared memory.
Answer

Read replication algorithm :


1. The read replication algorithm extends the migration algorithm by
replicating data blocks and allows multiple nodes to have read access
or one node to have read-write access (the multiple readers-one
writer protocol).
2. Read replication can improve system performance by allowing
multiple nodes to access data concurrently.
3. However, the write operation is expensive, as all the copies of a shared
block at various nodes will either have to be invalidated or updated
with the current value to maintain the consistency of the shared data
block.
4. In the read replication algorithm, DSM must keep track of the location
of all the copies of data blocks.
[Figure: on a data access request, if the block is not local the client determines its location and sends a request; the remote host receives the request and sends the block; the client receives the block and multicasts an invalidate; other hosts receive the invalidate and invalidate the block; the client then accesses the data]
Fig. 3.29.1. Write operation in the read replication algorithm.
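The multiple readers-one writer protocol can be sketched with write-invalidate as follows. This is a simplified model, assuming a writer overwrites a whole block; the class name `ReadReplicationDSM` and the `copyset` table are hypothetical.

```python
class ReadReplicationDSM:
    """Toy model: many read copies of a block, all but one invalidated on a write."""

    def __init__(self, nodes, block_id, owner, value):
        self.copies = {node: {} for node in nodes}
        self.copies[owner][block_id] = value
        # The DSM must track the location of every copy of every block.
        self.copyset = {block_id: {owner}}

    def read(self, node, block_id):
        if block_id not in self.copies[node]:
            holder = next(iter(self.copyset[block_id]))
            self.copies[node][block_id] = self.copies[holder][block_id]  # replicate
            self.copyset[block_id].add(node)
        return self.copies[node][block_id]

    def write(self, node, block_id, value):
        # Invalidate every other copy before the write takes effect.
        for other in self.copyset[block_id] - {node}:
            del self.copies[other][block_id]
        self.copies[node][block_id] = value
        self.copyset[block_id] = {node}


dsm = ReadReplicationDSM(["A", "B", "C"], "b", "A", 0)
dsm.read("B", "b")
dsm.read("C", "b")
dsm.write("B", "b", 7)
print(dsm.copyset["b"], dsm.read("A", "b"))
```

Reads are cheap once a copy is local, while each write pays for one invalidation per outstanding copy, which is the cost noted in point 3.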

VERY IMPORTANT QUESTIONS


Following questions are very important. These questions
may be asked in your SESSIONALS as well as
UNIVERSITY EXAMINATION.

Q. 1. What are agreement protocols ? Explain Byzantine
agreement problem, the consensus problem and interactive
consistency problem.
4
UNIT
Failure Recovery in
Distributed System

CONTENTS
Part-1 : Failure Recovery in Distributed ...... 4-2B to 4-4B
System : Concepts in Backward
and Forward Recovery, Recovery
in Concurrent System

Part-2 : Obtaining Consistent ................. 4-4B to 4-13B
Checkpoints, Recovery in
Distributed Database System

Part-3 : Fault Tolerance : Issues ............. 4-13B to 4-15B
in Fault Tolerance

Part-4 : Commit Protocol, Voting .............. 4-15B to 4-20B
Protocol, Dynamic Voting
Protocol


PART- 1
Failure Recovery in Distributed System : Concepts in Backward
and Forward Recovery, Recovery in Concurrent System.

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 4.1. Define forward recovery and backward recovery. List
advantages and disadvantages of forward recovery. Explain two
approaches of backward-error recovery.
AKTU 2014-15, 2016-17; Marks 10
Answer
Failure recovery is a process that involves restoring an erroneous state to an
error-free state. There are two approaches for restoring an erroneous state
to an error-free state:
1. Forward-error recovery :
a. If the nature of errors and damages caused by faults can be
completely and accurately assessed, then it is possible to remove
those errors in the process state and enable the process to move
forward. This technique is known as forward-error recovery.
b. The forward-error recovery technique incurs less overhead because
only those parts of the state that deviate from the intended value
need to be corrected.
c. This technique can be used only where the damages due to faults
can be correctly assessed.
Advantages of forward recovery : Relatively low overhead.
Disadvantages of forward recovery :
1. It depends on accurate damage assessment and error prediction.
2. It cannot provide a general recovery mechanism; it must be designed
specifically for a particular system.
2. Backward-error recovery :
a. If it is not possible to foresee the nature of faults and to remove all
the errors in the process state, then the process state can be restored
to a previous error-free state of the process. This technique is
known as backward-error recovery.
b. In backward-error recovery, a process is restored to a prior state in
the hope that the prior state is free of errors.
c. Backward-error recovery is simpler than forward-error recovery
as it is independent of the fault and the errors caused by the fault.
d. Thus, a system can recover from an arbitrary fault by restoring to
a previous state. This enables backward-error recovery to act as a
general recovery mechanism to any type of system.
e. The points in the execution of a process to which the process can
later be restored are known as recovery points.
f. A recovery point is said to be restored when the current state of a
process is replaced by the state of the process at the recovery point.
Approaches for implementing backward-error recovery :
1. Operation-based approach :
a. In the operation-based approach, all the modifications that are made
to the state of a process are recorded in sufficient detail so that a
previous state of the process can be restored by reversing all the
changes made to the state.
b. The record of the system activity is known as an audit trail or a log.
2. State-based approach :
a. In the state-based approach, the complete state of a process is
saved when a recovery point is established, and recovering a process
involves reinstating its saved state and resuming the execution of
the process from that state.
b. The process of saving state is also referred to as checkpointing. The
recovery point at which checkpointing occurs is often referred to as
a checkpoint.
c. The process of restoring a process to a prior state is referred to as
rolling back the process.
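The two approaches to backward-error recovery can be contrasted with a short Python sketch. It assumes the process state is a plain dictionary; the class names `LoggedProcess` and `CheckpointedProcess` are hypothetical.

```python
import copy


class LoggedProcess:
    """Operation-based approach: record each change so that it can be reversed."""

    def __init__(self):
        self.state = {}
        self.log = []  # audit trail of (key, key_existed, old_value)

    def set(self, key, value):
        self.log.append((key, key in self.state, self.state.get(key)))
        self.state[key] = value

    def recovery_point(self):
        return len(self.log)

    def rollback(self, point):
        # Reverse the recorded changes, newest first, back to the recovery point.
        while len(self.log) > point:
            key, existed, old_value = self.log.pop()
            if existed:
                self.state[key] = old_value
            else:
                del self.state[key]


class CheckpointedProcess:
    """State-based approach: save the complete state at each recovery point."""

    def __init__(self):
        self.state = {}

    def checkpoint(self):
        return copy.deepcopy(self.state)

    def rollback(self, saved_state):
        self.state = copy.deepcopy(saved_state)


p = LoggedProcess()
p.set("a", 1)
point = p.recovery_point()
p.set("a", 2)
p.set("b", 3)
p.rollback(point)
print(p.state)
```

The log grows with the number of modifications, whereas a checkpoint costs a full copy of the state each time it is taken.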
Que 4.2. Define forward and backward recovery. Also list the
advantages and disadvantages of both. AKTU 2018-19, Marks 10

Answer
Forward and backward recovery, advantages and disadvantages
of forward recovery : Refer Q. 4.1, Page 4-2B, Unit-4.
Advantages of backward recovery :
1. Backward recovery can handle unpredictable errors caused by residual
design faults.
2. Backward recovery can be used regardless of the damage sustained by
the state.
3. Backward recovery can handle transient or permanent arbitrary
faults.
Disadvantages of backward recovery :
1. Backward recovery requires significant resources (i.e., time,
computation, and stable storage) to perform checkpointing and
recovery.
2. The implementation of backward recovery often requires that the
system be halted temporarily.
3. Restoring the previous state of a system or process is relatively
costly in terms of performance.
Que 4.3. What do you mean by recovery in concurrent system ?
Explain. AKTU 2014-15, Marks 05
OR
What do you mean by backward and forward error recovery? Discuss
recovery in concurrent systems in detail.
AKTU2015-16, Marks 10
Answer
Backward and forward error recovery : Refer Q. 4.1, Page 4-2B,
Unit-4.
Recovery in concurrent system :
1. In concurrent systems, several processes cooperate by exchanging
information to accomplish a task.
2. The information exchange can be through a shared memory in the case
of shared memory machines or through messages in the case of
distributed system.
3. Recovery in a concurrent system means rolling back all the processes
involved at the time of failure.
4. During a failure, the system assigns a recovery point at the point where
the failure occurs in the process.
5. If the failed process has interacted with an active process, then the
active process must also roll back to an earlier state.
6. A recovery point helps to undo the effects caused by the failed process.

PART-2
Obtaining Consistent Checkpoints, Recovery in
Distributed Database System.

Questions-Answers

Long Answer Type and Medium Answer Type Questions


Que 4.4. What is checkpointing ? Explain strongly consistent set
of checkpoint.
Answer
1. Checkpointing is a mechanism that enables transactions to recover
from an inconsistent state using backward error recovery.
2. Checkpointing in a distributed system involves taking a checkpoint by all
the processes (sites) or at least by a set of processes (sites) that interact
with one another in performing a distributed computation.
3. In distributed systems, all the sites save their local states, which are
known as local checkpoints, and the process of saving local states is
called local checkpointing.
4. All the local checkpoints, one from each site, collectively form a global
checkpoint.
Strongly consistent set of checkpoints:
1. The domino effect is caused by orphan messages, which themselves are
due to rollbacks.
2. To overcome the domino effect, a set of local checkpoints is needed such
that no information flow takes place between any pair of processes in
the set, or between any process in the set and any process
outside the set, during the interval spanned by the checkpoints.
3. Such a set of checkpoints is known as a recovery line or a strongly
consistent set of checkpoints.
[Figure: checkpoints of processes X, Y, and Z plotted against Time]
Fig. 4.4.1. Consistent set of checkpoints.


4. In Fig. 4.4.1, the set {x1, y1, z1} is a strongly consistent set of
checkpoints, and the thinly dotted lines denote the interval spanned by
the checkpoints.
5. A strongly consistent set of checkpoints corresponds to a strongly
consistent global state, wherein all messages have been delivered and
processed, and no message is in transit.
6. Processes X, Y, and Z can be rolled back to their respective checkpoints
x1, y1, and z1 and resume execution in the event of a failure.
7. No further rollbacks due to the domino effect would be necessary as no
information exchange took place in the interval spanned by the set of
checkpoints.
8. That is, no local checkpoint includes an effect whose cause would be
undone due to the rollback of another process.
Que 4.5. Write short note on consistent checkpoints.

Answer
Consistent checkpoints :
[Figure: checkpoints of processes X, Y, and Z plotted against Time, with message m]
Fig. 4.5.1.

1. Suppose that Y fails after receiving message m as shown in
Fig. 4.5.1.
2. If Y restarts from checkpoint y2, message m is lost due to rollback. The
set {x2, y2, z2} is referred to as a consistent set of checkpoints.
3. A consistent set of checkpoints is similar to a consistent global state in
that it requires that each message recorded as received in a checkpoint
(state) should also be recorded as sent in another checkpoint (state).
4. Therefore, systems that do not establish a strongly consistent set of
checkpoints have to deal with lost messages during rollback recovery.
5. While the systems which establish strongly consistent set of checkpoints
do not have to deal with lost messages during rollback recovery, they
experience delays during the checkpointing process as processes cannot
exchange messages while checkpointing is in progress.
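The difference between a consistent and a strongly consistent set of checkpoints can be checked mechanically. The sketch below assumes each checkpoint records the identifiers of the messages sent and received so far; the function names and the dictionary layout are hypothetical.

```python
def orphan_messages(checkpoints):
    """Messages recorded as received in some checkpoint but not as sent in any."""
    sent = set().union(*(c["sent"] for c in checkpoints.values()))
    received = set().union(*(c["received"] for c in checkpoints.values()))
    return received - sent


def lost_messages(checkpoints):
    """Messages recorded as sent but not recorded as received."""
    sent = set().union(*(c["sent"] for c in checkpoints.values()))
    received = set().union(*(c["received"] for c in checkpoints.values()))
    return sent - received


def is_consistent(checkpoints):
    return not orphan_messages(checkpoints)


def is_strongly_consistent(checkpoints):
    # No orphan messages and no message still unreceived.
    return not orphan_messages(checkpoints) and not lost_messages(checkpoints)


# X recorded sending m, but Y's checkpoint was taken before m was received.
checkpoints = {
    "X": {"sent": {"m"}, "received": set()},
    "Y": {"sent": set(), "received": set()},
}
print(is_consistent(checkpoints), is_strongly_consistent(checkpoints))
print(lost_messages(checkpoints))
```

In this example the set is consistent but not strongly consistent, so rollback recovery has to deal with the lost message m.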
Que 4.6. Write a short note on method to obtain consistent set of
checkpoints. AKTU 2016-17, Marks 10

Answer
Method to obtain consistent set of checkpoints :
1. Assume that the action of taking a checkpoint and the action of sending
or receiving a message are indivisible; that is, they are not interrupted
by any other events.
2. If every process takes a checkpoint after sending every message, the set
of the most recent checkpoints is always consistent.
3. The set of latest checkpoints is consistent because the latest checkpoint
at every process corresponds to a state where all the messages recorded
as received in it have already been recorded elsewhere as sent.
4. Therefore, rolling back a process to its latest checkpoint would not
result in any orphan messages.
5. Taking a checkpoint after each message is sent is expensive, so
the overhead can be reduced by taking a checkpoint after every K (K > 1)
messages sent.
6. However, this method suffers from the domino effect.

Que 4.7. Write short note on :
i. Livelocks
ii. Domino effects
iii. Failure resilient processes
iv. Consistent checkpoints AKTU 2014-15, Marks 10
Answer
i. Livelocks:
1. In rollback recovery, livelock is a situation in which a single failure
can cause an infinite number of rollbacks, preventing the system
from making progress.
2. A livelock situation in a distributed system is shown in
Fig. 4.7.1. Here Fig. 4.7.1(a) illustrates the activity of two processes
X and Y until the failure of Y.
3. Process Y fails before receiving message n1, sent by X.
4. When Y rolls back to y1, there is no record of sending message m1,
hence X must roll back to x1.
5. When process Y recovers, it sends out m2 and receives n1
[Fig. 4.7.1(b)].
[Figure: (a) activity of processes X and Y against Time up to the Failure of Y; (b) activity after recovery, with messages m and n, leading to the 2nd rollback]
Fig. 4.7.1.
6. Process X, after resuming from x1, sends n2 and receives m2.
7. However, because X is rolled back, there is no record of sending n1,
and hence Y has to roll back for the second time.
8. This forces X to roll back too, as it has received m2, and there is no
record of sending m2 at Y.
9. This situation can repeat indefinitely, preventing the system from
making any progress.
ii. Domino effect :
1. Consider the system activity illustrated in Fig. 4.7.2.
2. In Fig. 4.7.2, X, Y, and Z are three processes that cooperate by
exchanging information (shown by the arrows).
3. Each symbol marks a recovery point to which a process can be
rolled back in the event of a failure.
[Figure: recovery points of processes X, Y, and Z plotted against Time, with arrows showing exchanged messages]
Fig. 4.7.2.
4. If process X is to be rolled back, it can be rolled back to the recovery
point x3 without affecting any other process.
5. Suppose that Y fails after sending message m and is rolled back to
y2.
6. In this case, the receipt of m is recorded in x3, but the sending of m
is not recorded in y2.
7. Now we have a situation where X has received message m from Y,
but Y has no record of sending it, which corresponds to an
inconsistent state.
8. Under such circumstances, m is referred to as an orphan message
and process X must roll back.
9. X must roll back because Y interacted with X after establishing its
recovery point y2.
10. When Y is rolled back to y2, the event that is responsible for the
interaction is undone.
11. Therefore, all the effects at X caused by the interaction must also
be undone.
12. This can be achieved by rolling back X to recovery point x2.
13. In the same way, if Z is rolled back, all three processes must roll back
to their very first recovery points, namely x1, y1, and z1.
14. This effect, where rolling back one process causes one or more
other processes to roll back, is known as the domino effect.
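The cascading rollbacks of the domino effect can be reproduced with a small simulation. This is a sketch under assumptions: each process is given as a list of saved states (oldest first, with the last entry standing for its current state), each state records the message identifiers sent and received so far, and the function name `recovery_line` is hypothetical. It does not model the exact message pattern of Fig. 4.7.2.

```python
def recovery_line(history, failed):
    """Return, for each process, the index of the state it ends up restarting from."""
    line = {p: len(states) - 1 for p, states in history.items()}
    # The failed process loses its current state and restarts from its latest checkpoint.
    line[failed] = max(line[failed] - 1, 0)
    changed = True
    while changed:
        changed = False
        sent = set().union(*(history[p][i]["sent"] for p, i in line.items()))
        for p, i in line.items():
            orphans = history[p][i]["received"] - sent
            if orphans and i > 0:
                line[p] = i - 1  # an orphan message forces this process back too
                changed = True
    return line


history = {
    "X": [{"sent": set(), "received": set()}, {"sent": {"k"}, "received": {"m"}}],
    "Y": [{"sent": set(), "received": set()}, {"sent": {"m"}, "received": set()}],
    "Z": [{"sent": set(), "received": set()}, {"sent": set(), "received": {"k"}}],
}
print(recovery_line(history, "Y"))
```

Here the failure of Y undoes the sending of m, which forces X back, which in turn undoes the sending of k and forces Z back as well.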
iii. Failure resilient processes :
1. The fundamental unit of execution is a process.
2. Hence, in order for any system to be fault-tolerant, the processes of
that system must be resilient to system failures.
3. A process is said to be resilient if it masks failures and guarantees
progress despite a certain number of system failures.
4. In other words, minimum disruption is caused to the service provided
by the process in the event of a system failure.
5. Two approaches have been proposed to implement resilient processes :
a. Backup processes
b. Replicated execution
iv. Consistent checkpoints : Refer Q. 4.5, Page 4-6B, Unit-4.
Que 4.8. Explain synchronous checkpointing algorithm.

Answer
Synchronous checkpointing :
1. In this approach, processes synchronize their checkpointing activity so
that a globally consistent set of checkpoints is always maintained in the
system.
2. In this method, a consistent set of checkpoints is used, which avoids
livelock problems during recovery.
Algorithm:
1. It assumes the following characteristics :
a. Processes communicate by exchanging messages through
communication channels.
b. Channels are FIFO in nature.
c. Communication failures do not partition the network.
2. The checkpoint algorithm takes two kinds of checkpoints on stable
storage :
a. Permanent checkpoint : A permanent checkpoint is a local
checkpoint at a process and is a part of a consistent global checkpoint.
b. Tentative checkpoint : A tentative checkpoint is a temporary
checkpoint that is made a permanent checkpoint on the successful
termination of the checkpoint algorithm.
3. Processes roll back only to their permanent checkpoints.
4. The algorithm has two phases :
a. First phase :
i. An initiating process Pi takes a tentative checkpoint and requests
all the processes to take tentative checkpoints.
ii. Each process informs Pi whether it accepts or rejects
the request to take a tentative checkpoint.
iii. If all the processes have successfully taken tentative
checkpoints, Pi decides to make the tentative checkpoints
permanent. Otherwise, the tentative checkpoints are
discarded.
b. Second phase :
i. Pi informs all the processes of the decision it reached at the
end of the first phase.
ii. A process, on receiving the message from Pi, acts accordingly.
iii. Therefore, either all or none of the processes take permanent
checkpoints.
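The two phases above can be sketched in Python. This is a single-process simulation that ignores message passing and failures; the class `Process`, its `willing` flag, and the function `synchronous_checkpoint` are hypothetical names used only for illustration.

```python
class Process:
    def __init__(self, name, willing=True):
        self.name = name
        self.willing = willing  # assumed flag: can this process take a checkpoint now?
        self.state = 0
        self.permanent = 0      # last permanent checkpoint (on stable storage)
        self.tentative = None

    def take_tentative(self):
        if not self.willing:
            return False
        self.tentative = self.state
        return True

    def decide(self, make_permanent):
        if make_permanent and self.tentative is not None:
            self.permanent = self.tentative
        self.tentative = None   # tentative checkpoint is promoted or discarded


def synchronous_checkpoint(initiator, others):
    everyone = [initiator] + list(others)
    # First phase: every process tries to take a tentative checkpoint and replies.
    replies = [p.take_tentative() for p in everyone]
    decision = all(replies)
    # Second phase: the initiator's decision is applied by every process.
    for p in everyone:
        p.decide(decision)
    return decision


a, b, c = Process("A"), Process("B"), Process("C")
a.state, b.state, c.state = 10, 20, 30
print(synchronous_checkpoint(a, [b, c]), [p.permanent for p in (a, b, c)])
```

Since the decision is taken only after all replies are in, either every process makes its tentative checkpoint permanent or none does.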
Que 4.9. Write short note on lost message.

Answer
Lost messages :
a. Suppose that checkpoints x1 and y1 (Fig. 4.9.1) are chosen as the recovery
points for processes X and Y, respectively.
b. In this case, the event that sent message m is recorded in x1, while the
event of its receipt at Y is not recorded in y1.
c. If Y fails after receiving message m, the system is restored to state
{x1, y1}, in which message m is lost as process X is past the point where
it sends message m.
d. This condition can also arise if m is lost in the communication channel
and processes X and Y are in states x1 and y1, respectively.
e. Both the above conditions are indistinguishable.

[Figure: processes X and Y against Time, with message m and the Failure of Y]
Fig. 4.9.1.

Que 4.10. Explain the approaches to implement resilient process.


Answer
Two approaches have been proposed to implement resilient process:
1. Backup processes :
a. In the backup processes approach, each resilient process is
implemented by a primary process and one or more backup
processes. The primary process executes while the backup processes
are inactive.
b. If the primary process terminates because of a failure, one of the
backup processes becomes active and takes over the functions of
the primary process.
c. To minimize the computation that has to be redone by the backup
process, the state of the primary process is stored (checkpointed) at
appropriate intervals.
d. The checkpointed state is stored in a suitable place such that the
failure of the primary process's machine does not affect the
checkpoint's availability.
2. Replicated execution:
a. In the replicated execution approach, several processes execute
the same program concurrently.
b. As long as one of the processes survives failures, the computation
or the service continues.
c. A significant advantage of replicated execution is that it can be used
to increase reliability as well as availability.
d. The reliability of a computation can be increased by taking a majority
consensus among the results generated by all the processes.
e. This final result can then be used in subsequent computations.
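The majority consensus in point d can be sketched in a few lines of Python. The function name `majority_result` is hypothetical, and the sketch assumes the results of the replicas are hashable values collected in a list.

```python
from collections import Counter


def majority_result(results):
    """Return the value produced by more than half of the replicas."""
    if not results:
        raise ValueError("no results to vote on")
    value, count = Counter(results).most_common(1)[0]
    if count > len(results) // 2:
        return value
    raise ValueError("no majority among the replicated results")


# Three replicas ran the same program; one produced a faulty result.
print(majority_result([42, 42, 7]))
```

With three replicas, one faulty result is outvoted; if no value reaches a majority, the sketch raises an error rather than guessing.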

Que 4.11. Explain rollback recovery algorithm in distributed
database system.
Answer
1. This algorithm assumes that a single process invokes the algorithm as
opposed to several processes concurrently invoking it to rollback and
recover.

2. It also assumes that the checkpoint and the rollback recovery algorithms
are not concurrently invoked.
3. This algorithm has two phases :
a. First phase :
i. An initiating process Pi checks whether all the processes are
willing to restart from their previous checkpoints.
ii. A process may reject the request of Pi if it is already participating
in a checkpointing or a recovery process initiated by some
other process.
iii. If all the processes accept the request of Pi to restart from
their previous checkpoints, Pi decides to restart all the
processes.
iv. Otherwise, all the processes continue with their normal
activities.
b. Second phase :
i. Pi propagates its decision to all the processes. On receiving Pi's
decision, a process acts accordingly.
ii. The recovery algorithm requires that no process sends
messages related to the underlying computation while it is
waiting for Pi's decision.
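The two phases above can be sketched as a single-machine simulation. The class `Process`, its fields and the function `rollback_recovery` are hypothetical names chosen for illustration; real message passing and checkpoint storage are not modelled.

```python
class Process:
    def __init__(self, name, busy=False):
        self.name = name
        self.busy = busy              # already in another checkpoint/recovery run
        self.state = "running"
        self.sending_blocked = False  # True while waiting for the decision

    def on_rollback_request(self):
        # First phase: reject if busy, otherwise agree and stop sending
        # computation messages until the initiator's decision arrives.
        if self.busy:
            return False
        self.sending_blocked = True
        return True

    def on_decision(self, restart):
        # Second phase: act on the initiator's decision.
        if restart:
            self.state = "restarted from checkpoint"
        self.sending_blocked = False


def rollback_recovery(processes):
    """Run both phases on behalf of the initiator; return its decision."""
    replies = [p.on_rollback_request() for p in processes]
    decision = all(replies)           # restart only if every process agreed
    for p in processes:
        p.on_decision(decision)
    return decision
```

If any process rejects the request, every process simply continues its normal activity, which matches step iv of the first phase.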
Que 4.12. Write the difference between deadlock and livelock.

Answer
Difference :
S. No. | Deadlock | Livelock
1. | A deadlock is a situation where a process or a set of processes is blocked and waiting for an event that will never occur. | Livelock is a situation in which a single failure of a process can cause an infinite number of rollbacks, preventing the system from making progress.
2. | The problem of deadlocks is common in computer systems where resource sharing is frequent. | Livelocks are common in distributed systems where synchronization is maintained by message passing.
3. | Deadlock can be handled by various deadlock handling strategies like prevention, detection and avoidance. | Livelocks can be prevented by coordinating the processes either at the time of establishing checkpoints or at the beginning of recovery.

PART-3
Fault Tolerance :Issues in Fault Tolerance.

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 4.13. Define fault and failure. What are different approaches
to fault-tolerance ? Explain. AKTU 2014-15, Marks 05

Answer
Fault : A fault is an anomalous physical condition. The causes of a fault
include design errors, manufacturing problems, damage, fatigue or other
deterioration, and external disturbances.
Failure : Failure of a system occurs when the system does not perform its
services in the manner specified. An erroneous state of the system is a state
which could lead to a system failure by a sequence of valid state transitions.
Different fault tolerance approaches :
1. Replication :
a. Replication is the process of creating and maintaining multiple copies
of data objects or processes on several nodes.
b. Therefore, if a failure occurs on one node, the data remains accessible
to the user from another node.
c. Replication provides high data availability and performance.
2. Checkpointing :
a. Fault tolerance can be achieved through checkpointing.
b. Checkpointing means to periodically save the consistent state of
the system in a reliable storage medium. Each such instance when
a system is in the consistent state is called a checkpoint.
c. Checkpointing is primarily used to avoid losing all the useful
processing done before a fault has occurred.
d. In case of a fault, a checkpoint enables the execution of a program to
be resumed from a previous consistent state rather than resuming
the execution from the beginning.
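As a minimal sketch of checkpointing, assume a single process whose whole state fits in a small JSON file. The function name `run_with_checkpoints`, the `interval` and `fail_at` parameters and the running-sum workload are all hypothetical; a real distributed checkpoint must also be consistent across processes.

```python
import json
import os
import tempfile

def run_with_checkpoints(total_steps, path, interval=3, fail_at=None):
    """Add 1..total_steps, saving the state every `interval` steps."""
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)          # resume from the last checkpoint
    else:
        state = {"step": 0, "total": 0}   # no checkpoint: start from scratch
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated fault")
        state["step"] += 1
        state["total"] += state["step"]
        if state["step"] % interval == 0:
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, path)         # swap in the new checkpoint atomically
    return state["total"]

checkpoint_file = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    run_with_checkpoints(10, checkpoint_file, fail_at=7)   # fault after step 7
except RuntimeError:
    pass
result = run_with_checkpoints(10, checkpoint_file)         # resumes at step 6
```

Only the work done after the last checkpoint (here, step 7) is repeated on restart; steps 1 to 6 are not recomputed.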
Que 4.14. Discuss at least three main issues that are relevant to
the understanding of distributed fault tolerance system. Explain
how that makes it important. AKTU 2015-16, Marks 10

Answer
Issues in the fault tolerance are as follows :
1. Process deaths:
a. When a process dies, it is important that the resources allocated to
that process are recouped, otherwise they may be permanently
lost.
b. Many distributed systems are structured along the client-server
model in which a client requests a service by sending a message to
a server.
c. If the server process fails, it is necessary that the client machine be
informed so that the client process waiting for a reply can be
unblocked to take suitable action.
d. Similarly, if a client process dies after sending a request to a server,
it is imperative that the server be informed that the client process
no longer exists.
e. This will help the server reclaim any resources it has
allocated to the client process.
2. Machine failure :
a. In the case of machine failure, all the processes running at the
machine will die.
b. As far as the behaviour of a client process or a server process is
concerned, there is not much difference between a machine
failure and a process death.
c. In case of machine failure, the absence of any kind of message
indicates either a process death or a machine failure.
3. Network failure:
a. A communication link failure can partition a network into subnets,
making it impossible for a machine to communicate with another
machine in a different subnet.
b. A process cannot tell the difference between a machine failure and a
communication link failure, unless the underlying communication
network (such as a slotted ring network) can recognize a machine
failure.
c. If the communication network (such as Ethernet) cannot recognize
machine failures and thus cannot return a suitable error code, a
fault-tolerant design will have to assume that a machine may be
operating and processes on that machine are active.

PART-4
Commit Protocol, Voting Protocol, Dynamic Voting Protocol.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 4.15. Explain two phase commit protocol.


Answer
Two phase commit protocol : The two phase commit protocol is designed to
allow any participant to abort its part of a transaction. Due to the requirement
for atomicity, if one part of a transaction is aborted then the whole transaction
must also be aborted.
Phase 1 (Voting phase) :
1. The coordinator sends a canCommit request to each of the participants
in the transaction.
2. When a participant receives a canCommit request, it replies with its vote
(yes or no) to the coordinator. Before voting yes, it prepares to commit
by saving objects in permanent storage. If the vote is no, the participant
aborts immediately.
Phase 2 (Completion according to outcome of vote) :
1. The coordinator collects the votes (including its own) :
a. If there are no failures and all the votes are yes, the coordinator
decides to commit the transaction and sends a doCommit request to
each of the participants.
b. Otherwise, the coordinator decides to abort the transaction and
sends doAbort requests to all participants that voted yes.
2. Participants that voted yes are waiting for a doCommit or doAbort request
from the coordinator. When a participant receives one of these messages,
it acts accordingly and, in the case of commit, makes a haveCommitted
call as confirmation to the coordinator.

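[Figure: steps and status of the coordinator and a participant]
Step 1 (coordinator) : prepared to commit (waiting for votes); sends canCommit?
Step 2 (participant) : prepared to commit (uncertain); replies yes
Step 3 (coordinator) : committed; sends doCommit
Step 4 (participant) : committed; sends haveCommitted
Coordinator : done

Fig. 4.15.1. Communication in two phase commit protocol.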
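The voting and completion phases can be sketched as follows for the failure-free case. The `Participant` class, its `willing` flag and the function `two_phase_commit` are hypothetical; the method names mirror the canCommit, doCommit and doAbort requests in Python naming style, and permanent storage, timeouts and the haveCommitted confirmation are not modelled.

```python
class Participant:
    def __init__(self, name, willing=True):
        self.name = name
        self.willing = willing      # hypothetical flag: can this site commit?
        self.state = "active"

    def can_commit(self):
        # Phase 1: vote. A yes voter first prepares (saving to permanent
        # storage is not modelled); a no voter aborts immediately.
        if self.willing:
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def do_commit(self):
        self.state = "committed"

    def do_abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic for a run without crashes or lost messages."""
    votes = [(p, p.can_commit()) for p in participants]   # voting phase
    if all(vote for _, vote in votes):
        for p, _ in votes:
            p.do_commit()
        return "committed"
    for p, vote in votes:
        if vote:                    # only yes voters are still waiting
            p.do_abort()
    return "aborted"
```

A single no vote is enough to abort the whole transaction, which is how the protocol preserves atomicity.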

Que 4.16. What are commit protocols ? Explain how two phase
protocols respond to failure of participating site and failure of
coordinator. AKTU 2014-15, Marks 05

Answer
Commit protocols :
1. In a distributed system, commit protocols ensure atomicity across
sites, i.e., when a transaction executes at multiple sites it must either be
committed at all the sites or aborted at all the sites.
2. The goal of commit protocols is to have all the concerned participants
agree either to commit or to abort a transaction.
Handling a failure of a participating site :
Let us assume that the failed site is Si and the Transaction Coordinator is TC.
There are two things we need to look into to handle the failure of a participating
site :
1. The response of the Transaction Coordinator of transaction T :
a. If the failed site has not sent any message (<ready T>), the TC
cannot decide to commit the transaction. Hence, the transaction T
should be aborted and the other participating sites informed.
b. If the failed site has sent a message (<ready T>), the TC can
assume that the failed site was also ready to commit; hence the
transaction can be committed by the TC and the other sites will be
informed to commit. In this case, the site which recovers from
failure has to execute the two phase commit (2PC) protocol to bring
its local database up to date.
2. The response of the failed site when it recovers :
a. When it recovers from failure, the recovering site Si must identify the
fate of the transactions which were in progress during the failure of Si.
This can be done by examining the log file entries of site Si.
b. This is how the two phase commit (2PC) protocol handles the failure
of a participating site.
Handling the failure of a coordinator site :
Let us suppose that the coordinator site failed during execution of the two phase
commit (2PC) protocol for a transaction T. This situation can be handled in the
following two ways :
1. The other sites which are participating in the transaction T may try to
decide the fate of the transaction. That is, they may try to decide on
commit or abort of T using the control messages available at every site.
2. The second way is to wait until the coordinator site recovers.
Que 4.17. Describe three phase commit protocol. How is three phase
commit protocol different from two phase commit protocol ?
AKTU 2017-18, Marks 10
Answer
Phases in three phase commit protocol :
1. The three-phase commit (3PC) protocol is a distributed algorithm which
lets all nodes in a distributed system agree to commit a transaction.
2. 3PC is a non-blocking protocol.
3. 3PC places an upper bound on the amount of time required before a
transaction either commits or aborts.
4. This property ensures that if a given transaction holds some resource
locks, it will release the locks after the time-out.
5. The three-phase commit (3PC) protocol is more complicated and more
expensive than the two-phase commit (2PC) protocol.
Phase 1 : Voting/Prepare phase :
1. The Transaction Coordinator (TC) of the transaction writes a
BEGIN_COMMIT message in its log file, sends a PREPARE message
to all the participating sites and waits.
2. On receiving the PREPARE message, if a site is ready to commit, then the
site's Transaction Manager (TM) writes READY in its log and sends
VOTE_COMMIT to the TC.
3. If any site is not ready to commit, it writes ABORT in its log and
responds with VOTE_ABORT to the TC.
Phase 2 : Buffering/Pre-commit phase :
1. If the TC receives VOTE_COMMIT from all the participating sites, then it
writes PREPARE_TO_COMMIT in its log and sends a
PREPARE_TO_COMMIT message to all the participating sites.
2. If the TC receives any one VOTE_ABORT message, it writes ABORT in its
log, sends GLOBAL_ABORT to all the participating sites and also
writes an END_OF_TRANSACTION message in its log.
3. On receiving the PREPARE_TO_COMMIT message, the TMs of the
participating sites write PREPARE_TO_COMMIT in their logs and
respond with a READY_TO_COMMIT message to the TC.
4. If they receive a GLOBAL_ABORT message, then the TMs of the sites write
ABORT in their logs and acknowledge the abort.
Phase 3 : Decision/Commit or abort phase :
1. If all responses are READY_TO_COMMIT, then the TC writes COMMIT
in its log and sends a GLOBAL_COMMIT message to the TMs of all the
participating sites.
2. The TM of each site then writes COMMIT in its log and sends an
acknowledgement to the TC. Then, the TC writes
END_OF_TRANSACTION in its log.
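The message and log sequence of the three phases can be traced with a small failure-free simulation. The `Site` class and the function `three_phase_commit` are hypothetical names; time-outs and crash recovery, which give 3PC its non-blocking property, are deliberately left out. The notes do not say what the TC does if some site fails to answer READY_TO_COMMIT, so the abort in that branch is an assumption.

```python
class Site:
    def __init__(self, name, ready=True):
        self.name = name
        self.ready = ready          # hypothetical flag: is the site ready to commit?
        self.log = []

    def prepare(self):              # Phase 1: answer the PREPARE message
        if self.ready:
            self.log.append("READY")
            return "VOTE_COMMIT"
        self.log.append("ABORT")
        return "VOTE_ABORT"

    def prepare_to_commit(self):    # Phase 2: acknowledge the pre-commit
        self.log.append("PREPARE_TO_COMMIT")
        return "READY_TO_COMMIT"

    def global_abort(self):
        if self.log[-1:] != ["ABORT"]:
            self.log.append("ABORT")

    def global_commit(self):        # Phase 3
        self.log.append("COMMIT")


def three_phase_commit(sites):
    """Return the TC's decision together with the TC's log."""
    log = ["BEGIN_COMMIT"]
    votes = [s.prepare() for s in sites]
    if all(v == "VOTE_COMMIT" for v in votes):
        log.append("PREPARE_TO_COMMIT")
        acks = [s.prepare_to_commit() for s in sites]
        if all(a == "READY_TO_COMMIT" for a in acks):
            log.append("COMMIT")
            for s in sites:
                s.global_commit()
            log.append("END_OF_TRANSACTION")
            return "COMMIT", log
    # A VOTE_ABORT (or, by assumption, a missing acknowledgement) aborts.
    log.append("ABORT")
    for s in sites:
        s.global_abort()
    log.append("END_OF_TRANSACTION")
    return "ABORT", log
```

Comparing the TC log with the site logs shows that every site records PREPARE_TO_COMMIT before any site records COMMIT.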
Three phase vs. two phase commit protocol :
1. In the two-phase commit protocol, when the coordinator fails during
execution, the participating sites are unable to determine whether the
coordinator has made a decision to abort or commit the transaction, which
causes the participating sites to be in a blocked state.
2. To remove this blocking problem in 2PC, the three phase commit protocol
was proposed. The three-phase commit protocol prevents this
blocking problem by adding the pre-commit phase, so that a decision can
be taken based on the states of all the sites.

Que 4.18. What is voting protocol ? Explain static voting and
dynamic voting protocols. AKTU 2014-15, Marks 05
OR
Describe in detail :
a. Dynamic voting protocols
b. Method to obtain consistent set of checkpoint
AKTU 2016-17, Marks 10
Answer
Voting protocol :
1. A voting protocol is a common approach to provide fault tolerance in a
distributed system by replicating data at many sites (or nodes).
2. If a site is not available, the data can still be obtained from copies at other
sites.
3. With the voting mechanism, each replica is assigned some number of
votes, and a process must collect a majority of votes before it
can access a replica.
4. The voting mechanism is more fault tolerant than a commit protocol in
that it allows access to data under network partitions, site failures, and
message losses without compromising the integrity of the data.
Static voting protocol :
1. In the static voting scheme, the replicas of files are stored at different sites.
2. Every file access operation requires that an appropriate lock is obtained.
3. The lock granting rules allow either 'one writer and no readers' or
'multiple readers and no writers' to access a file simultaneously.
4. It is assumed that at every site there is a lock manager that performs
the lock related operations, and every file is associated with a version
number, which gives the number of times the file has been updated.
5. The version numbers are stored on stable storage, and every successful
write operation on a replica updates its version number.
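A minimal sketch of the majority test in static voting follows. The vote assignment in `votes`, the version numbers and the function names are assumptions made for illustration; locking is not modelled.

```python
# Hypothetical vote assignment: site -> number of votes (total of 5 votes).
votes = {1: 1, 2: 1, 3: 1, 4: 2}
# Hypothetical version number stored with each replica.
versions = {1: 7, 2: 7, 3: 6, 4: 7}

def has_quorum(votes, reachable):
    """True if the reachable sites together hold a majority of all votes."""
    total = sum(votes.values())
    collected = sum(votes[site] for site in reachable)
    return collected > total / 2

def current_version(versions, reachable):
    """Within a quorum, the highest version number marks the current copy."""
    return max(versions[site] for site in reachable)
```

With these assumed votes, sites 1, 2 and 3 hold 3 of the 5 votes and can proceed, while site 4 alone holds only 2 and cannot.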
Dynamic voting protocol :
1. Suppose that in the system shown in Fig. 4.18.1, site 4 becomes
unreachable from the rest of the sites due to its failure or due to a
network partition.
[Figure: four sites connected by a network, each labelled with its number of votes (three sites with 1 vote, one site with 2 votes) and an access time (75, 750, 750 and 100 msecs).]
Fig. 4.18.1.
2. Sites 1, 2 and 3 can still collect a quorum (also referred to as
a majority), while site 4 cannot collect a quorum.
3. If another partition or a failure of a site occurs, making any site
unavailable, the system cannot serve any read or write requests as a
quorum cannot be collected in any partition.
4. In other words, the system is completely unavailable, which is a serious
problem.
5. Dynamic voting protocols solve this problem by adapting the number of
votes, or the set of sites that can form a quorum, to the changing state of
the system due to site and communication failures.
6. In the dynamic protocols, the following two approaches are used to enhance
availability :
a. Majority based approach : The set of sites that can form a majority
to allow access to replicated data changes with the changing state
of the system.
b. Dynamic vote reassignment : The number of votes assigned to
a site changes dynamically.
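The majority based approach can be illustrated with a simplified sketch in which each site has one vote and each replica remembers how many sites took part in the latest update. The field names `version` and `cardinality` and both function names are assumptions; the notes do not give the bookkeeping details, so this is one possible realization rather than the protocol itself.

```python
def can_update(replicas, reachable):
    """Allow access if the partition holds a majority of the sites that
    took part in the most recent update (not of all sites ever known)."""
    current = max(replicas[s]["version"] for s in reachable)
    holders = [s for s in reachable if replicas[s]["version"] == current]
    cardinality = replicas[holders[0]]["cardinality"]
    return len(holders) > cardinality / 2

def do_update(replicas, reachable):
    """Perform an update in the partition `reachable` if it is allowed."""
    if not can_update(replicas, reachable):
        return False
    new_version = max(replicas[s]["version"] for s in reachable) + 1
    for s in reachable:
        # Record how many sites participated in this update.
        replicas[s] = {"version": new_version, "cardinality": len(reachable)}
    return True

# Four replicas, all current; every site took part in the last update.
replicas = {s: {"version": 1, "cardinality": 4} for s in (1, 2, 3, 4)}
first = do_update(replicas, {1, 2, 3})    # site 4 becomes unreachable
second = do_update(replicas, {1, 2})      # then site 3 also fails
```

Under a fixed majority of four one-vote sites, sites 1 and 2 alone could not proceed; here they can, because they form a majority of the three sites that performed the previous update, while the partition containing sites 3 and 4 is still refused.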
Method to obtain consistent set of checkpoint : Refer Q. 4.6, Page
4-7B, Unit-4.

VERY IMPORTANT QUESTIONS


Following questions are very important. These questions
may be asked in your SESSIONALS as well as
UNIVERSITY EXAMINATION.

Q. 1. Define forward recovery and backward recovery. List
advantages and disadvantages of forward recovery. Explain
two approaches of backward-error recovery.
Ans. Refer Q. 4.1.

Q. 2. What do you mean by recovery in concurrent system ?
Explain.
Ans. Refer Q. 4.3.

Q. 3. Write a short note on method to obtain consistent set of
checkpoints.
Ans. Refer Q. 4.6.

Q. 4. Write short note on :
i. Livelocks
ii. Domino effects
5 UNIT
Transaction and
Concurrency Control

CONTENTS
Part-1 : Transaction and Concurrency Control : Transaction, Nested Transactions .......... 5-2B to 5-4B
Part-2 : Locks, Optimistic Concurrency Control, Timestamp Ordering, Comparison of Methods for Concurrency Control .......... 5-4B to 5-10B
Part-3 : Distributed Transaction : Flat and Nested Transaction .......... 5-10B to 5-14B
Part-4 : Atomic Commit Protocol, Concurrency Control in Distributed Transaction .......... 5-14B to 5-17B
Part-5 : Distributed Deadlocks, Transaction Recovery .......... 5-17B to 5-21B
Part-6 : Replication : System Model and Group Communication, Fault Tolerant Services, Highly Available Services and Transaction With Replicated Data .......... 5-21B to 5-31B

5-2 B (CS-Sem-7) Transaction and Concurrency Control

PART-1

Transaction and Concurrency Control : Transaction, Nested
Transactions.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 5.1. Explain transaction and its properties.


Answer
A transaction is a sequence of read, compute and write statements that refer
to the data objects of a database.
Properties of transactions : Transactions have four essential properties,
known as ACID properties :
1. Atomicity : Atomicity means that the transaction completes its
processing as an indivisible and atomic unit, either successfully or
unsuccessfully. After the transaction, the database is either changed
consistently or not changed at all.
2. Consistency : Consistency means that the database is in a consistent
state before the transaction and, after the termination of the transaction,
the database will also be in a consistent state.
3. Isolation : The isolation property indicates that every action processed by a
transaction is kept isolated until the completion of the transaction. During
transaction processing, the processing is hidden from outside
the transaction.
4. Durability : Durability means that any failure after the commit
operation will not affect the database. The commit action will reflect its
results to the database after its termination.
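Atomicity can be illustrated with a tiny in-memory example. The function `transfer`, the dictionary `accounts` and the rule that a balance may not go negative are hypothetical; durability and isolation are not modelled.

```python
def transfer(db, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing unit."""
    working = dict(db)            # tentative copy, invisible to other readers
    working[src] -= amount
    if working[src] < 0:
        return False              # abort: the real database is left untouched
    working[dst] += amount
    db.update(working)            # commit: both changes become visible together
    return True

accounts = {"a": 100, "b": 0}
ok = transfer(accounts, "a", "b", 30)
rejected = transfer(accounts, "a", "b", 500)
```

The failed transfer leaves both balances exactly as they were, so the total amount of money is the same before and after every transaction.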

Que 5.2. Write short note on nested transaction.

AKTU 2016-17, Marks 10

Answer
Nested transaction :
1. In a nested transaction, the top-level transaction can open
subtransactions, and each subtransaction can open further
subtransactions down to any depth of nesting.
2. Fig. 5.2.1 shows a client's transaction T that opens two
subtransactions T1 and T2, which access objects at servers X and Y.
3. The subtransactions T1 and T2 open further subtransactions T11,
T12, T21 and T22, which access objects at servers M, N, O and P.
4. In the nested case, subtransactions at the same level can run
concurrently, so T1 and T2 are concurrent, and as they invoke objects in
different servers, they can run in parallel.
5. The four subtransactions T11, T12, T21 and T22 also run
concurrently.
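[Figure: a client's nested transaction T with subtransactions T1 and T2 at servers X and Y, which open further subtransactions (such as T22) at other servers (such as M).]
Fig. 5.2.1.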

Que 5.3. Explain how the two phase commit protocol for nested
transaction ensures that if the top level transactions commit, all
the right descendents are committed or aborted ?
AKTU 2015-16, Marks 10
Answer
1. Consider the top-level transaction T and its subtransactions shown in
Fig. 5.3.1.
T
   T1 : provisional commit (at X)
      T11 : abort (at M)
      T12 : provisional commit (at N)
   T2 : aborted (at Y)
      T21 : provisional commit (at N)
      T22 : provisional commit (at P)

Fig. 5.3.1. Transaction T decides whether to commit.
2. Each subtransaction has either provisionally committed or aborted. For
example,
a. T12 has provisionally committed and T11 has aborted, but the fate of
T12 depends on its parent T1 and eventually on the top-level
transaction, T.
b. Although T21 and T22 have both provisionally committed, T2 has
aborted and this means that T21 and T22 must also abort.
3. Suppose that T decides to commit in spite of the fact that T2 has
aborted, and also that T1 decides to commit in spite of the fact that T11
has aborted.
4. When a top-level transaction completes, its coordinator carries out a
two phase commit protocol. The only reason for a participant
subtransaction being unable to complete is if it has crashed since it
completed its provisional commit.
5. When each subtransaction was created, it joined its parent transaction.
6. Therefore, the coordinator of each parent transaction has a list of its
child subtransactions.
7. When a nested transaction provisionally commits, it reports its status
and the status of its descendants to its parent.
8. When a nested transaction aborts, it just reports abort to its parent
without giving any information about its descendants.
9. Eventually, the top-level transaction receives a list of all the
subtransactions in the tree, together with the status of each.
10. Descendants of aborted subtransactions are actually omitted from this
list.
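How the list of subtransactions that may commit is derived can be sketched with a small tree walk. The labels T1, T11, T12, T2, T21 and T22 and their statuses follow the example of transaction T above (T11 and T2 aborted, the rest provisionally committed); the dictionaries `children` and `status` and the function `commit_list` are hypothetical.

```python
# Parent -> child subtransactions.
children = {"T": ["T1", "T2"], "T1": ["T11", "T12"], "T2": ["T21", "T22"]}
status = {
    "T1": "provisional", "T11": "aborted", "T12": "provisional",
    "T2": "aborted", "T21": "provisional", "T22": "provisional",
}

def commit_list(node, children, status):
    """Subtransactions that commit if the top-level transaction commits."""
    result = []
    for child in children.get(node, []):
        if status[child] == "aborted":
            continue              # its descendants are omitted as well
        result.append(child)
        result.extend(commit_list(child, children, status))
    return result

to_commit = commit_list("T", children, status)
```

T21 and T22 never appear in the result even though they provisionally committed, because their parent T2 aborted.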

PART-2

Locks, Optimistic Concurrency Control, Timestamp Ordering,
Comparison of Methods for Concurrency Control.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 5.4. What is lock ? What are the different modes in which
transaction can lock a data object.

Answer
1. A lock is a variable associated with a shared resource, such as a data item,
that determines whether a read/write operation can be performed on
that data item.
2. In lock based techniques, each data object has a lock associated with it.
3. A transaction can hold, request or release the lock on a data object, as
required by the transaction.
4. A transaction is said to have locked a data object if it holds a lock on it.
5. There are two modes of locking in which a transaction can lock a data object :
a. Exclusive :
i. If a transaction has locked the data object in exclusive mode,
no other transaction can lock it in any mode.
ii. In this locking scheme, the server attempts to lock any object
that is about to be used by a transaction.
iii. If a client requests access to an object that is already locked, the
request is suspended and the client must wait until the object is unlocked.
b. Shared :
i. If the transaction has locked the data object in shared mode,
other transactions can concurrently lock it, but only in shared
mode.
ii. If a client requests read access to an object locked in shared mode,
the request is always successful.
iii. All the transactions reading the same object can share their
read lock.
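The compatibility of the two lock modes can be sketched with a small lock table. The class `LockTable` and its methods are hypothetical; a request that would have to wait simply returns `False` here instead of being queued.

```python
class LockTable:
    def __init__(self):
        self.locks = {}   # object -> (mode, set of transactions holding it)

    def request(self, txn, obj, mode):
        """Try to lock obj in 'shared' or 'exclusive' mode; True if granted."""
        held = self.locks.get(obj)
        if held is None:
            self.locks[obj] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:
            # Sole holder: keep the lock, promoting it if necessary.
            new_mode = "exclusive" if "exclusive" in (mode, held_mode) else "shared"
            self.locks[obj] = (new_mode, holders)
            return True
        if mode == "shared" and held_mode == "shared":
            holders.add(txn)      # read locks can be shared
            return True
        return False              # conflicting lock: the requester must wait

    def release(self, txn, obj):
        mode, holders = self.locks[obj]
        holders.discard(txn)
        if not holders:
            del self.locks[obj]
```

Two readers can hold the same object together, but a writer is granted the lock only when no other transaction holds it in any mode.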

Que 5.5. Discuss 2PL and strict 2PL in context of distributed
system. AKTU 2015-16, 2016-17; Marks 7.5

Answer

Two phase locking :


1. This locking scheme is also called 2PL.
2. Two phase locking is a dynamic locking scheme in which a transaction
requests a lock on a data object when it needs the data object.
3. Two phase locking is so called because it has two phases :
a. Growing phase : It is a phase during which new locks are acquired.
b. Shrinking phase : It is a phase during which locks are released.
4. Two phase locking imposes a constraint on the lock acquisition and lock
release actions of a transaction to guarantee consistency.
5. The stage of a transaction at which it holds locks on all
the needed data objects, before releasing any of them, is referred to as
the lock point. An execution is shown in Fig. 5.5.1.

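[Figure: number of locks held plotted against time. The count rises during the growing phase (first lock acquisition, second lock acquisition), reaches its peak at the lock point, and falls during the shrinking phase (release of first lock, release of second lock).]

Fig. 5.5.1. Two phase locking scheme.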
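The two phase constraint itself (no lock may be acquired once any lock has been released) can be sketched as a small check. The class `TwoPhaseTransaction` is hypothetical and tracks only one transaction's own locks; conflicts with other transactions are not modelled.

```python
class TwoPhaseTransaction:
    def __init__(self):
        self.held = set()
        self.shrinking = False    # becomes True at the first release

    def lock(self, obj):
        if self.shrinking:
            # Acquiring after a release would violate two phase locking.
            raise RuntimeError("2PL violation: lock requested after a release")
        self.held.add(obj)        # growing phase

    def unlock(self, obj):
        self.shrinking = True     # the shrinking phase has begun
        self.held.discard(obj)

t = TwoPhaseTransaction()
t.lock("a")
t.lock("b")                       # lock point: all needed locks are held
t.unlock("a")
try:
    t.lock("c")
    violated = False
except RuntimeError:
    violated = True
```

In this run the lock point is reached after the second acquisition, and the later attempt to lock "c" is rejected.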


Strict two phase locking :
1. Under a strict execution, a transaction that needs to read or write an
object must be delayed until other transactions that wrote the same
object have committed or aborted.
2. To enforce this rule, any locks applied during the progress of a transaction
are held until the transaction either commits or aborts. This is called
strict two phase locking.
3. Strict two phase locking eliminates cascaded aborts because a transaction
can read data objects modified by another transaction only after that
transaction has completed.
4. However, strict two phase locking reduces concurrency, as a transaction
holds locks for a longer period than required for consistency.
Que 5.6. How is two phase locking implemented in a distributed
database system ?
Answer
Two phase locking can be implemented in a distributed database system in
the following way :
1. A Data Manager (DM) at a site controls the locks associated with objects
stored at that site.
2. A Transaction Manager (TM) communicates with the appropriate DM to
lock or unlock a data object.
3. If a request for a lock cannot be granted, the DM puts it on the waiting
queue of the object.
4. When a lock on an object is released, one of the waiting requests for the
lock on that object is granted.
Que 5.7. Explain optimistic concurrency control.
AKTU 2014-15, 2016-17; Marks 05

Answer
Optimistic concurrency control assumes that conflicts among transactions
are rare in a distributed database system. Because it rests on this assumption,
it is called optimistic. In the optimistic concurrency control scheme, each
transaction goes through three phases :
1. Working phase : During this phase, each transaction has a tentative
version of each of the objects that it updates. The use of tentative versions
allows the transaction to abort either during the working phase or if it
fails validation. The rules for read/write are :
a. A read operation is performed immediately. If a tentative version for
that transaction already exists, the read accesses it; otherwise it
accesses the most recently committed value of the object.
b. A write operation records the new values of the objects as tentative
values, which are invisible to other transactions. With several
concurrent transactions, several tentative values of the same object
may coexist.
2. Validation phase : When the close transaction request is received, the
transaction is validated to establish whether or not its operations on
objects conflict with operations of other transactions on the same objects.
3. Update phase : If the transaction is validated, all the changes recorded
in its tentative versions are made permanent. Read-only transactions
can commit immediately after passing validation.
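The three phases can be sketched with an in-memory store. The classes `Store` and `Txn` are hypothetical, and the validation shown compares a transaction's read set with the write sets of transactions that committed after it started (often called backward validation); the notes do not prescribe this particular scheme, so it is an assumption.

```python
class Store:
    def __init__(self, data):
        self.data = dict(data)
        self.committed = []       # write sets of committed transactions, in order

class Txn:
    def __init__(self, store):
        self.store = store
        self.start = len(store.committed)   # commits that happened before we began
        self.tentative = {}                 # working phase: private new values
        self.read_set = set()

    def read(self, key):
        self.read_set.add(key)
        if key in self.tentative:
            return self.tentative[key]      # own tentative version
        return self.store.data[key]         # latest committed value

    def write(self, key, value):
        self.tentative[key] = value         # invisible to other transactions

    def close(self):
        # Validation phase: did anyone who committed since we started
        # write something that we have read?
        for write_set in self.store.committed[self.start:]:
            if write_set & self.read_set:
                return False                # abort: tentative values are dropped
        # Update phase: make the tentative versions permanent.
        self.store.data.update(self.tentative)
        self.store.committed.append(set(self.tentative))
        return True
```

A transaction that loses validation has made no change to the store, so aborting it needs no undo work.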
Que 5.8. Discuss the optimistic methods for distributed
concurrency control. What are the different validation conditions
for optimistic concurrency control ? Explain.
AKTU 2015-16, Marks 10

Answer

Optimistic concurrency control : Refer Q. 5.7, Page 5-7B, Unit-5.


Validation condition for optimistic concurrency control : Let Tv and
Ti be the two transactions. For a transaction Tv to be serializable with respect
to an overlapping transaction Ti, their operations must conform to the
following rules/conditions :
1. Ti must not read objects written by Tv.
2. Tv must not read objects written by Ti.
3. Ti must not write objects written by Tv and Tv must not write objects
written by Ti.
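The three rules can be expressed as set intersections over the read and write sets of the two overlapping transactions. The function `broken_rules`, its parameter names and the labels Tv and Ti for the two transactions are used here for illustration only.

```python
def broken_rules(tv_reads, tv_writes, ti_reads, ti_writes):
    """Return the numbers of the validation rules that the pair violates."""
    broken = []
    if ti_reads & tv_writes:      # rule 1: Ti read an object written by Tv
        broken.append(1)
    if tv_reads & ti_writes:      # rule 2: Tv read an object written by Ti
        broken.append(2)
    if tv_writes & ti_writes:     # rule 3: both wrote the same object
        broken.append(3)
    return broken

# Tv reads x and writes y; Ti reads y and writes z.
result = broken_rules({"x"}, {"y"}, {"y"}, {"z"})
```

An empty result means the two transactions are serializable with respect to each other; in the example only rule 1 is broken, because Ti read the object y that Tv wrote.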
Que 5.9. What are the different validation conditions for
optimistic concurrency control ? How do they affect the transaction in
distributed system ? AKTU 2018-19, Marks 10

Answer
Validation condition for optimistic concurrency control :
Refer Q. 5.8, Page 5-7B, Unit-5.
Effects of validation conditions on transaction in distributed
system :
1. If the validation conditions are satisfied, then the transaction can
commit.
2. If the validation conditions fail, then some form of conflict resolution
must be used and the current transaction will be aborted.
3. Rules 1 and 2 test whether there is an overlap between the objects read
by one transaction of the pair Tv and Ti and the objects written by the other.
4. Rule 3 ensures that no two transactions can overlap in the update phase.
5. Due to the restriction on write operations, no dirty read can occur.
Que 5.10. Write short notes on timestamp ordering transaction
management. AKTU 2015-16, Marks 05

Answer
1. In distributed transactions, each coordinator issues globally unique
timestamps.
2. A globally unique transaction timestamp is issued to the client by the
first coordinator accessed by a transaction.
3. The transaction timestamp is passed to the coordinator at each server
whose objects perform an operation in the transaction.
4. The servers of distributed transactions are jointly responsible for
ensuring that they are performed in a serially equivalent manner.
5. A timestamp consists of a pair <local timestamp, server-id>.
6. The agreed ordering of pairs of timestamps is based on a comparison in
which the server-id part is less significant.
7. The same ordering of transactions can be achieved at all the servers
even if their local clocks are not synchronized.
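The ordering of <local timestamp, server-id> pairs can be illustrated with Python tuples, which compare element by element so that the first element is the most significant. The server names and timestamp values below are made up for the example.

```python
# (local timestamp, server-id) pairs issued by three different coordinators.
ts_a = (5, "server-B")
ts_b = (5, "server-A")
ts_c = (4, "server-Z")

# The local timestamp decides first; the server-id only breaks ties, so
# every server that sorts these pairs arrives at the same order.
ordered = sorted([ts_a, ts_b, ts_c])
```

Because the server-id makes every pair unique, two transactions never receive equal timestamps even when their local clock readings coincide.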

Que 5.11. Explain strict two phase locking with its rules.
Answer
Strict two phase locking : Refer Q. 5.5, Page 5-5B, Unit-5.
The rules for the use of locks in a strict two phase locking
implementation are as follows :
1. When an operation accesses an object within a transaction :
a. If the object is not already locked, it is locked and the operation
proceeds.
b. If the object has a conflicting lock set by another transaction, the
transaction must wait until it is unlocked.
c. If the object has a non-conflicting lock set by another transaction,
the lock is shared and the operation proceeds.
d. If the object has already been locked in the same transaction, the
lock will be promoted if necessary and the operation proceeds.
2. When a transaction is committed or aborted, the server unlocks all
objects it locked for the transaction.
These rules ensure strictness because the locks are held until a transaction
has either committed or aborted.

Que 5.12. Explain multiversion timestamp ordering protocol.


Answer
1. In multiversion timestamp ordering, a list of old committed versions as
well as tentative versions is kept for each object.
2. This list represents the history of the values of the object.
3. Each version has a read timestamp recording the largest timestamp of
any transaction that has read from it, in addition to a write timestamp.
4. Whenever a write operation is accepted, it is directed to a tentative
version with the write timestamp of the transaction.
5. Whenever a read operation is carried out, it is directed to the version
with the largest write timestamp less than the transaction timestamp.
6. If the transaction timestamp is larger than the read timestamp of the
version being used, the read timestamp of the version is set to the
transaction timestamp.
7. When a read arrives late, it can be allowed to read from an old committed
version, so there is no need to abort the read operations.
8. In multiversion timestamp ordering, read operations are always
permitted, although they may have to wait for earlier transactions to
complete (either commit or abort), which ensures that executions are
recoverable.
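The read rule (steps 5 and 6 above) can be sketched as follows. The function `mv_read` and the dictionary layout of a version are hypothetical; tentative versions, waiting and the write rule are not modelled.

```python
def mv_read(versions, txn_ts):
    """Read for a transaction with timestamp txn_ts from a version list."""
    candidates = [v for v in versions if v["write_ts"] < txn_ts]
    if not candidates:
        raise LookupError("no version is older than this transaction")
    # Choose the version with the largest write timestamp below txn_ts.
    version = max(candidates, key=lambda v: v["write_ts"])
    if txn_ts > version["read_ts"]:
        version["read_ts"] = txn_ts       # remember the latest reader
    return version["value"]

# Committed history of one object (values and timestamps are made up).
history = [
    {"write_ts": 1, "read_ts": 1, "value": "a"},
    {"write_ts": 5, "read_ts": 5, "value": "b"},
]
late_read = mv_read(history, 3)    # arrives late, still served from version 1
fresh_read = mv_read(history, 9)
```

The late read with timestamp 3 is not aborted; it is served from the older version written at timestamp 1.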

Que 5.13. What are the advantages and drawback of multiversion
timestamp ordering in comparison with the basic timestamp
ordering ? AKTU 2014-15, Marks 10

Answer
Advantages of multiversion timestamp ordering :
1. It allows more concurrency in a distributed system.
2. It improves system responsiveness by providing multiple versions.
3. It reduces the probability of conflicts between transactions.
4. Read request never fails and is never made to wait.
Disadvantages of multiversion timestamp ordering :
1. Reading of a data item also requires the updating of the read timestamp
field, resulting in two potential disk accesses rather than one.
2. The conflicts between transactions are resolved through rollbacks, rather
than through waits.
3. It requires a huge amount of storage for storing multiple versions of data
objects.
4. It does not ensure recoverability and cascadelessness.

PART-3

Distributed Transaction : Flat and Nested Transaction.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 5.14. Explain distributed transaction.


Answer
1. A distributed transaction is a database transaction in which two or
more network hosts are involved.
2. These hosts provide transactional resources, while the transaction
manager is responsible for creating and managing a global transaction
that encompasses all operations against such resources.
3. Distributed transactions, like any other transactions, must have all four
ACID (Atomicity, Consistency, Isolation and Durability) properties.
4. For distributed transactions, each computer (or node) has a local
transaction manager.
5. If the transaction works at several computers, the transaction managers
communicate with various other transaction managers by means of
superior or subordinate relationships, which are accurate only for a
specific transaction.
Que 5.15. Explain distributed transactions. Discuss the
functionality of flat and nested distributed transactions with
example. AKTU 2018-19, Marks 10
OR
Write short note on flat and nested transaction.
AKTU 2016-17, Marks 7.5
Answer
Distributed transaction : Refer Q. 5.14, Page 5-10B, Unit-5.
Functionality of flat transaction with example :
1. In a flat transaction, a client makes requests to more than one server.
2. A flat client transaction completes each of its requests before going on to
the next one. Therefore, each transaction accesses server objects
sequentially.
3. For example, in Fig. 5.15.1, transaction T is a flat transaction that
invokes operations on objects in servers X, Y and Z.
X

T
T Y
Client

Fig. 5.15.1, Flat transactions.


Functionality of nested transaction with example : Refer Q. 5.2,
Page 5-2B, Unit-5.

Que 5.16.
i. What are the goals of distributed transaction ? Distinguish
between flat and nested transaction along with its structure.
ii. Explain optimistic concurrency control.
AKTU 2014-15, 2016-17; Marks 10

Answer
i. Goals of distributed transaction :
1. The goal of a distributed transaction is to ensure that all of the objects
managed by a server remain in a consistent state when they are
accessed by multiple transactions and in the presence of server
crashes.
2. To maintain the ACID properties of transactions in a distributed system.
3. To complete the overall transaction occurring at different nodes.
4. To ensure the consistency of a set of shared data objects accessed
by users at the time of failures and concurrent access.
Difference between flat and nested transaction :
S. No. | Flat transaction | Nested transaction
1. | A flat client transaction completes each of its requests before going on to the next one. Therefore, each transaction accesses server objects sequentially. | In a nested transaction, the top-level transaction can open subtransactions, and each subtransaction can open further subtransactions down to any depth of nesting.
2. | In Fig. 5.16.1, transaction T is a flat transaction that invokes operations on objects in servers X, Y and Z. | In a nested transaction, as shown in Fig. 5.16.2, subtransactions at the same level can run concurrently, so T1 and T2 are concurrent, and as they invoke objects in different servers, they can run in parallel.

Fig. 5.16.1. Flat transaction.
Fig. 5.16.2. Nested transactions.

ii. Optimistic concurrency control : Refer Q. 5.7, Page 5-7B, Unit-5.


Que 5.17. Draw a schematic diagram of the distributed
transaction management model. Explain each component in brief.
AKTU 2017-18, Marks 05
Answer
The transaction management model contains the following three components :
1. Transaction manager (TM) :
a. The transaction manager supervises the execution of a transaction.
b. It intercepts and executes all the submitted transactions.
c. TM interacts with the DM to carry out the execution of a transaction.
d. TM assigns a timestamp to a transaction or issues requests to lock
and unlock data objects on behalf of a user.
e. TM acts as an interface between the user and the database system.
2. Scheduler :
a. Scheduler is used for enforcing concurrency control.
b. It grants or releases locks on data objects as requested by a
transaction.
3. Data manager :
a. The data manager (DM) manages the database.
b. It carries out the read-write requests issued by the TM on behalf of
a transaction by operating them on the database.
c. Thus, DM is an interface between scheduler and database.

Fig. 5.17.1. Distributed transaction management model (at each site, transactions are submitted to a TM, which works through a scheduler and a DM on the local database).
d. Execution of a transaction at the TM results in the execution of its
actions at the DM.
e. So, the DM executes a stream of transaction actions, directed
towards it by the TM.

PART-4

Atomic Commit Protocol, Concurrency Control in Distributed System.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 5.18. Write short note on atomic commit protocol.


Answer
1. Atomic commit protocol (ACP) is a protocol used by the database manager
to ensure that all the sub-transactions are consistently committed or
aborted.
2. In this, each server applies local concurrency control to its own objects,
which ensures that transactions are serialized locally as well as
globally.
3. When a distributed transaction comes to an end, either all the servers
commit the transaction or all of them abort it.
4. There are two atomic commit protocols used in distributed databases :
a. Two phase commit protocol.
b. Three phase commit protocol.

Que 5.19. Explain two phase and three phase commit protocol.
Answer
Two phase commit protocol : Refer Q. 4.15, Page 4-15B, Unit-4.
Three phase commit protocol : Refer Q. 4.17, Page 4-17B, Unit-4.
Que 5.20. Write short note on conflict resolution.
Answer
A conflict is resolved by taking one of the following actions :
1. Wait : The requesting transaction is made to wait until the conflicting
transaction either completes or aborts.
2. Restart :
a. Either the requesting transaction or the transaction it conflicts
with is aborted and started afresh.
b. Restarting is achieved by using one of the following primitives (Die or Wound).
3. Die : The requesting transaction aborts and starts afresh.
4. Wound :
a. The transaction in conflict with the requesting transaction is tagged
as wounded and a message "wounded" is sent to all sites that the
wounded transaction has visited.
b. If the message is received before the wounded transaction has
committed at a site, the concurrency control algorithm at that site
initiates an abort of the wounded transaction, otherwise the
message is ignored.
c. If a wounded transaction is aborted, it is started again.
d. The requesting transaction proceeds after the wounded transaction
completes or aborts.

Que 5.21. What are the algorithms for conflict resolution in
timestamp concurrency control ?

Answer
Following are the algorithms for conflict resolution in timestamp
concurrency control :
1. Wait-die algorithm :
a. The wait-die algorithm is a nonpreemptive algorithm because a
requesting transaction never forces the transaction holding the
requested data object to abort.
b. Suppose requesting transaction T1 is in conflict with a transaction
T2. If T1 is older (i.e., has a smaller timestamp), then T1 waits,
otherwise T1 dies.
c. The older transaction waits for the younger transaction if the
younger has accessed the granule first.
d. The younger transaction is aborted and restarted if it tries to access
a granule after an older concurrent transaction.
2. Wound-wait algorithm :
a. The wound-wait algorithm is a preemptive algorithm.
b. Suppose a requesting transaction T1 is in conflict with a transaction
T2. If T1 is older, it wounds T2, otherwise it waits.
c. The older transaction preempts the younger by suspending it if the
younger transaction tries to access a granule after an older
concurrent transaction.
d. An older transaction will wait for a younger one to commit if the
younger has accessed a granule that both want.
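
The two rules in point (b) of each algorithm can be written as small decision functions. This is only an illustrative sketch: the function names and the returned strings are assumptions made for this example, and timestamps are plain integers where a smaller value means an older transaction.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die (nonpreemptive): an older requester waits, a younger one dies."""
    return "wait" if requester_ts < holder_ts else "die"


def wound_wait(requester_ts, holder_ts):
    """Wound-wait (preemptive): an older requester wounds the holder,
    a younger requester waits."""
    return "wound" if requester_ts < holder_ts else "wait"


if __name__ == "__main__":
    # Transaction with timestamp 1 is older than the one with timestamp 2.
    print(wait_die(1, 2), wait_die(2, 1))
    print(wound_wait(1, 2), wound_wait(2, 1))
```

In both schemes it is always the younger transaction that is restarted, which is why neither scheme lets an old transaction starve.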
Que 5.22. What are the advantages, problems and applications of
optimistic concurrency control ?
Answer
Advantages :
i. Optimistic concurrency control is very efficient when conflicts are rare.
The occasional conflicts result in the transaction being rolled back.
ii. The rollback involves only the local copy of data, and thus no cascading
rollback occurs.
Problems :
i. Conflicts are expensive to deal with.
ii. Longer transactions are more likely to have conflicts and may be
repeatedly rolled back because of conflicts with short transactions.
Applications :
i. Only suitable for environments where there are few conflicts and no
long transactions.
ii. Acceptable for mostly read or query database systems that require very
few update transactions.
Que 5.23. Explain the schemes for detecting conflicts when obtaining local
locks.

Answer
Schemes for detecting conflicts when obtaining local locks :
1. Write-locks-all, read-locks-one :
a. In this scheme exclusive locks are acquired on all copies, while
shared locks are acquired only on one arbitrary copy.
b. A conflict is always detected, because a shared-exclusive conflict is
detected at the site where the shared lock is requested and
exclusive-exclusive conflicts are detected at all sites.
2. Majority locking :
a. Both shared and exclusive locks are requested at a majority of the
copies of the data item.
b. If two transactions need to lock the same item, there is at
least one copy of it where the conflict is discovered.
3. Primary copy locking : In primary copy locking, one copy of each data
item is designated as the primary copy and all locks must be requested at this
copy, so that conflicts are discovered at the site where the primary copy
resides.
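
Why the first two schemes always detect a conflict can be checked by brute force: two lock requests meet at some copy whenever every possible choice of copies overlaps. The sketch below is an illustration only; the helper names `majority` and `always_overlap` are assumptions of this example, and copies are simply numbered 0 to n-1.

```python
from itertools import combinations


def majority(n_copies):
    """Smallest number of copies that forms a majority."""
    return n_copies // 2 + 1


def always_overlap(n_copies, size_a, size_b):
    """True if every set of size_a copies shares a copy with every set of size_b copies."""
    copies = range(n_copies)
    return all(
        set(a) & set(b)
        for a in combinations(copies, size_a)
        for b in combinations(copies, size_b)
    )


if __name__ == "__main__":
    n = 5
    # Majority locking: any two majorities share at least one copy.
    print(always_overlap(n, majority(n), majority(n)))
    # Write-locks-all, read-locks-one: all copies versus one arbitrary copy.
    print(always_overlap(n, n, 1))
    # Two shared locks on single copies need not meet, which is harmless
    # because shared locks do not conflict with each other.
    print(always_overlap(n, 1, 1))
```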

Que 5.24. Explain conservative timestamp method in distributed
system.
Answer
The conservative timestamp method is based on the following rules :
1. Each transaction is executed at one site only and does not activate
remote programs. It can only issue read or write requests to remote
sites.
2. A site i must receive all the read requests from a different site j in
timestamp order. Similarly, a site i must receive all the write requests
from a different site j in timestamp order.
3. Assume that a site i has at least one buffered read and one buffered
write operation from each other site of the network.
a. For a read operation R that arrives at site i : If there is some
write operation W buffered at site i such that TS(R) > TS(W), then
R is buffered until these writes are executed, otherwise R is
executed.
b. For a write operation W that arrives at site i : If there is some
read operation R buffered at site i such that TS(W) > TS(R), or
there is some write operation W' buffered at site i such that
TS(W) > TS(W'), then W is buffered until these operations are
executed, otherwise W is executed.
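
Rule 3 reduces to two timestamp comparisons against the operations already buffered at site i. A minimal sketch, assuming integer timestamps and hypothetical function names chosen for this example:

```python
def must_buffer_read(ts_r, buffered_write_ts):
    """Rule 3(a): buffer read R if some buffered write W has TS(R) > TS(W)."""
    return any(ts_r > ts_w for ts_w in buffered_write_ts)


def must_buffer_write(ts_w, buffered_read_ts, buffered_write_ts):
    """Rule 3(b): buffer write W if some buffered read R has TS(W) > TS(R),
    or some buffered write W' has TS(W) > TS(W')."""
    return (
        any(ts_w > ts_r for ts_r in buffered_read_ts)
        or any(ts_w > ts_other for ts_other in buffered_write_ts)
    )


if __name__ == "__main__":
    # A read with timestamp 5 must wait for the buffered write with timestamp 3.
    print(must_buffer_read(5, [3, 9]))
    # A write with timestamp 4 is older than everything buffered, so it executes.
    print(must_buffer_write(4, [6], [7]))
```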

PART-5

Distributed Deadlocks, Transaction Recovery.

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 5.25. Write short notes on wait for graph with example of
distributed transaction. AKTU 2015-16, Marks 05
Answer
1. A Wait-For Graph (WFG) is a graph where :
a. Each node represents a process.
b. An edge Pi → Pj means that Pi is blocked waiting for Pj to release
a resource.
2. A system is deadlocked if and only if there is a directed cycle in the WFG.
3. In Distributed Database Systems (DDBS), users access the data objects
of the database by executing transactions.
4. The data objects of a database can be viewed as resources that are
acquired (through locking) and released (through unlocking) by
transactions.
5. In DDBS a wait for graph is referred to as a transaction-wait-for graph
(TWF graph).
6. In a TWF graph, nodes are transactions and there is a directed edge
from node T1 to node T2 if T1 is blocked and is waiting for T2 to release
some resource.
7. A system is deadlocked if and only if there is a directed cycle or a knot in
its TWF graph.
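
Detecting a directed cycle in a wait-for graph is a depth-first search. The sketch below is one possible way to do it, not a prescribed algorithm: the graph is assumed to be a dictionary mapping each transaction to the transactions it waits for, the name `has_deadlock` is hypothetical, and only cycles (not knots) are checked.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a directed cycle."""
    visiting = set()  # nodes on the current DFS path
    done = set()      # nodes fully explored without finding a cycle

    def visit(node):
        if node in visiting:
            return True   # reached a node already on the path: cycle
        if node in done:
            return False
        visiting.add(node)
        if any(visit(nxt) for nxt in wait_for.get(node, ())):
            return True
        visiting.remove(node)
        done.add(node)
        return False

    return any(visit(node) for node in list(wait_for))


if __name__ == "__main__":
    print(has_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))
    print(has_deadlock({"T1": ["T2"], "T2": ["T3"]}))
```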

Que 5.26. What is phantom deadlock ? Describe the conditions for
the occurrence of phantom deadlock.
Answer
Phantom deadlock:
1. A deadlock that is 'detected' but is not really a deadlock is called a phantom
deadlock.
2. In distributed deadlock detection, information about wait-for
relationships between transactions is transmitted from one server to
another.
3. If there is a deadlock, the necessary information will eventually be
collected in one place and a cycle will be detected.
4. As this procedure will take some time, there is a chance that one of the
transactions that holds a lock will meanwhile have released it, in which
case the deadlock will no longer exist.
5. For example :
a. Consider the case of a global deadlock detector that receives local
wait-for graphs from servers X and Y, as shown in
Fig. 5.26.1.
b. Suppose that transaction U then releases an object at server X and
requests the one held by V at server Y.
c. Suppose also that the global detector receives server Y's local graph
before server X's.
d. In this case, it would detect a cycle T → U → V → T, although the
edge T → U no longer exists.

Fig. 5.26.1. Local and global wait-for graphs (local wait-for graphs at servers X and Y, and the graph assembled by the global deadlock detector).

6. A phantom deadlock could also be detected if a waiting transaction in a
deadlock cycle aborts during the deadlock detection procedure.
7. For example, if there is a cycle T → U → V → T and U aborts after the
information concerning U has been collected, then the cycle has been
broken already and there is no deadlock.
Necessary conditions for the occurrence of phantom deadlock are :
1. Presence of delay between two processes.
2. Notification of false global states by the communicating processes.
a. Let P1 and P2 be two processes executing on different nodes, and let a
third node N3 test for deadlock. Process P1 holds resource R1
and process P2 holds resource R2.
b. Process P1 releases resource R1 and sends message M1 to node N3.
Process P1 then requests resource R2 and sends message M2 to
node N3.
c. Process P2 releases resource R2 and sends message M3 to node N3.
Finally, P2 requests R1 and sends message M4 to node N3.
d. If there is network latency or delay, messages M2 and M4 arrive at
node N3 before messages M1 and M3.
e. Node N3, which is testing for deadlock, will detect a deadlock that does
not exist. This results in the occurrence of a phantom deadlock.
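
The effect of message reordering in this scenario can be reproduced with a toy detector. Everything in the sketch below is an assumption made for illustration (the message tuples, the function names, and the rule that a released resource goes to the first waiter); it only shows that the same four messages give a different verdict when M2 and M4 overtake M1 and M3.

```python
def has_cycle(edges):
    """edges maps each waiting process to the process it waits for."""
    for start in edges:
        seen, node = set(), start
        while node in edges:
            if node in seen:
                return True
            seen.add(node)
            node = edges[node]
    return False


def detector_sees_deadlock(messages):
    """Replay release/request messages at the detecting node (N3 in the text)."""
    holder = {"R1": "P1", "R2": "P2"}  # initial state from the text
    waiting = {}                       # process -> resource it waits for
    for kind, proc, res in messages:
        if kind == "release":
            holder[res] = None
            for waiter, wanted in list(waiting.items()):
                if wanted == res:      # hand the resource to the first waiter
                    holder[res] = waiter
                    del waiting[waiter]
                    break
        elif holder[res] is None:      # request for a free resource
            holder[res] = proc
        else:                          # request for a held resource
            waiting[proc] = res
        if has_cycle({w: holder[r] for w, r in waiting.items()}):
            return True
    return False


M1 = ("release", "P1", "R1")
M2 = ("request", "P1", "R2")
M3 = ("release", "P2", "R2")
M4 = ("request", "P2", "R1")

if __name__ == "__main__":
    print(detector_sees_deadlock([M1, M2, M3, M4]))  # order in which they were sent
    print(detector_sees_deadlock([M2, M4, M1, M3]))  # delayed M1 and M3
```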

Que 5.27. Briefly explain the objectives of distributed transaction
management.
Answer
Following are the objectives of distributed transaction
management :
1. CPU and main memory utilization should be improved : Most of
the typical database applications spend much of their time waiting for
I/O operations rather than on computations. To improve CPU and main
memory utilization, a transaction manager should adopt specialized
techniques.
2. Response time should be minimized : To improve the performance
of transaction executions, the response time of each individual transaction
must be considered and should be minimized.
3. Availability should be maximized : Although the availability in a
distributed system is better than that in a centralized system, it must be
maximized for transaction recovery and concurrency control.
4. Communication cost should be minimized : In distributed systems,
an additional communication cost is incurred, because a number of
message transfers are required between sites to control the execution
of a global transaction. Preventative measures should be adopted by the
transaction manager to minimize the communication cost.

Que 5.28. Write short note on logging.


Answer

1. In the logging technique, the recovery file represents a log containing
the history of all the transactions performed by a server.
2. The history consists of values of objects, transaction status entries and
intentions lists of transactions.
3. The order of the entries in the log reflects the order in which transactions
have prepared, committed and aborted at that server.
4. During the normal operation of a server, its recovery manager is called
whenever a transaction prepares to commit, commits or aborts a
transaction.
5. When the server is prepared to commit a transaction, the recovery
manager appends all the objects in its intentions list to the recovery file,
followed by the current status of that transaction (prepared) together
with its intentions list.
6. When a transaction is eventually committed or aborted, the recovery
manager appends the corresponding status of the transaction to its
recovery file.
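
The append-only behaviour in points 5 and 6 can be sketched with a list standing in for the recovery file. The class name, the entry layout and the backward-reading `recover` method are assumptions of this example rather than a prescribed format; an intentions list is modelled as a mapping from object identifier to the position of its value in the log.

```python
class RecoveryLog:
    """Toy recovery file: a list of entries that is only ever appended to."""

    def __init__(self):
        self.entries = []

    def prepare(self, tid, changes):
        """Append the changed objects, then a 'prepared' status with the intentions list."""
        intentions = {}
        for obj, value in changes.items():
            self.entries.append(("object", obj, value))
            intentions[obj] = len(self.entries) - 1
        self.entries.append(("status", tid, "prepared", intentions))

    def commit(self, tid):
        self.entries.append(("status", tid, "committed", None))

    def abort(self, tid):
        self.entries.append(("status", tid, "aborted", None))

    def recover(self):
        """Read the log backwards and keep the newest committed value of each object."""
        committed, state = set(), {}
        for entry in reversed(self.entries):
            if entry[0] != "status":
                continue
            _, tid, status, intentions = entry
            if status == "committed":
                committed.add(tid)
            elif status == "prepared" and tid in committed:
                for obj, pos in intentions.items():
                    state.setdefault(obj, self.entries[pos][2])
        return state


if __name__ == "__main__":
    log = RecoveryLog()
    log.prepare("T1", {"A": 100}); log.commit("T1")
    log.prepare("T2", {"A": 50, "B": 7}); log.abort("T2")
    log.prepare("T3", {"B": 9}); log.commit("T3")
    print(log.recover())
```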

Que 5.29. Write short note on shadow versions.

Answer

1. The shadow versions technique is an alternative way to organize a
recovery file.
2. It uses a map to locate versions of the server's objects in a file called a
version store.
3. The map associates the identifiers of the server's objects with the
positions of their current versions in the version store.
4. The versions written by each transaction are shadows of the previous
committed versions.
5. When a transaction is prepared to commit, any of the objects changed by
the transaction are appended to the version store, leaving the
corresponding committed versions unchanged.
6. When a transaction commits, a new map is made by copying the old map
and entering the positions of the shadow versions.
7. To restore the objects when a server is replaced after a crash, its
recovery manager reads the map and uses the information in the map to locate
the objects in the version store.
8. The shadow version method provides faster recovery than logging
because the positions of the current committed objects are recorded in
the map, whereas recovery from a log requires searching throughout
the log for objects.
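
A minimal in-memory sketch of the map and version store follows. The class and method names are hypothetical, and a real implementation would keep both structures on disk and replace the map atomically; the sketch only shows that committed versions stay untouched until a new map is installed.

```python
class ShadowVersionStore:
    def __init__(self):
        self.version_store = []  # append-only list of object versions
        self.map = {}            # object id -> position of committed version
        self.shadows = {}        # transaction id -> {object id: shadow position}

    def prepare(self, tid, changes):
        """Append shadow versions; the committed versions are left unchanged."""
        positions = {}
        for obj, value in changes.items():
            self.version_store.append(value)
            positions[obj] = len(self.version_store) - 1
        self.shadows[tid] = positions

    def commit(self, tid):
        """Build a new map from a copy of the old one plus the shadow positions."""
        new_map = dict(self.map)
        new_map.update(self.shadows.pop(tid))
        self.map = new_map

    def abort(self, tid):
        self.shadows.pop(tid, None)

    def read(self, obj):
        """Recovery and normal reads both go straight through the map."""
        return self.version_store[self.map[obj]]


if __name__ == "__main__":
    store = ShadowVersionStore()
    store.prepare("T1", {"A": 100}); store.commit("T1")
    store.prepare("T2", {"A": 50})   # shadow written, not yet visible
    print(store.read("A"))
    store.commit("T2")
    print(store.read("A"))
```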

PART-6
Replication : System Model and Group Communication, Fault-Tolerant
Services, Highly Available Services and Transactions with
Replicated Data.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 5.30. What is replication and replica manager ? Give the
architectural model for replicated data.
AKTU 2014-15, Marks 05
Answer
Replication :
1. Replication is the process of storing copies of data at more than one
node.
2. Replication is a key for providing high availability and fault tolerance in
distributed systems.
3. High availability means that all the users can access data after failure of
one or more of the servers.
4. Fault tolerance is the property that enables a system to continue
operating properly in the event of a failure.

Replica manager :
1. Replica manager is a subsystem that is responsible for managing the
synchronization of replicas.
2. Replica refers to a single copy of the data in a system that employs
replication.
3. Replica managers hold various replicas and perform operations upon
them.
4. A replica manager acts as a server in a client-server environment.
5. A collection of replica managers provides service to clients.
Architectural model for replicated data :

Fig. 5.30.1. Architectural model for replicated data (clients send requests and receive replies through front ends (FE), which communicate with the replica managers (RM)).

1. In this model, replicas are held by distinct replica managers.
2. This general model may be applied in a client-server environment, in
which case a replica manager acts as a server.
3. It may also be applied to an application. The application processes in that case
act as both clients and replica managers.
4. The general model of replica management is shown in Fig. 5.30.1.
5. A collection of replica managers provides service to clients.
6. The clients see a service that gives them access to objects which are
replicated at the replica managers.
7. Each client requests a series of operations upon one or more of the
objects.
8. An operation involves a combination of reads and updates to objects.
Requested operations that involve no updates are called read-only
requests; requested operations that update an object are called update
requests.
9. Each client's requests are handled by a component called front end. The
role of the front end is to communicate by message passing with one or
more of the replica managers.

Que 5.31. Explain the following :
i. Gossip architecture
ii. Quorum consensus methods
AKTU 2014-15, Marks 10
OR
Explain the processing of queries and update operation in gossip
services.
Answer
Fig. 5.31.1. Query and update operations in a gossip service (clients, front ends (FE) and replica managers (RM) exchanging queries, updates and gossip messages).
i. Gossip architecture :
1. Gossip architecture is a framework for implementing highly available
services by replicating data close to the points where groups of clients
need it.
2. Here the replica managers exchange 'gossip' messages periodically in
order to convey the updates received from clients.
3. A gossip service provides two basic types of operation : queries
(read-only operations) and updates (modify but do not read the state).
4. A key feature is that front ends send queries and updates to any replica
manager that is available and can provide reasonable response times.
5. The system makes two guarantees :
a. Each client obtains a consistent service over time.
b. Relaxed consistency between replicas.
Processing of queries and update operations in gossip service :
a. Request :
i. The front end normally sends requests to only a single replica
manager at a time.
ii. However, a front end will communicate with a different replica
manager when the one it normally uses fails or becomes
unreachable, or if the normal manager is heavily loaded.
iii. Front ends, and thus clients, may be blocked on query
operations.
iv. The default arrangement for update operations is to return to the
client as soon as the operation has been passed to the front end; the
front end then propagates the operation in the background.
b. Update response : If the request is an update then the replica manager
replies as soon as it has received the update.
c. Coordination :
i. The replica manager that receives a request does not process it
until it can apply the request according to the required ordering
constraints.
ii. This may involve receiving updates from other replica managers,
in gossip messages.
d. Execution : The replica manager executes the request.
e. Query response : If the request is a query then the replica manager
replies at this point.
f. Agreement : The replica managers update one another by exchanging
gossip messages, which contain the most recent updates they have
received.
ii. Quorum consensus methods :
1. A quorum is a subgroup of replica managers whose size gives it the right
to carry out operations.
2. In this scheme, an update operation on a logical object may be completed
successfully by a subgroup of its group of replica managers.
3. The other members of the group will therefore have out-of-date copies
of the object.
4. Version numbers may be used to determine whether copies are
up-to-date.
5. Each copy of an object has a version number, but only the copies that
are up-to-date have the current version number.
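
How version numbers identify the up-to-date copies can be seen in a small sketch. It assumes three replicas with read and write quorums of two each, so that any read quorum overlaps any write quorum; that sizing rule (R + W > N) is a standard assumption added for this example and is not stated in the text above, and the names `Replica`, `quorum_read` and `quorum_write` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    version: int = 0
    value: object = None


def quorum_read(replicas, quorum):
    """Return (version, value) of the newest copy found in the read quorum."""
    newest = max((replicas[i] for i in quorum), key=lambda r: r.version)
    return newest.version, newest.value


def quorum_write(replicas, quorum, value):
    """Write to the quorum only; replicas outside it keep out-of-date copies."""
    current_version, _ = quorum_read(replicas, quorum)
    for i in quorum:
        replicas[i].version = current_version + 1
        replicas[i].value = value


replicas = [Replica() for _ in range(3)]  # N = 3, R = W = 2 (assumed sizes)
quorum_write(replicas, [0, 1], "x")       # replica 2 is now out of date
quorum_write(replicas, [1, 2], "y")       # replica 0 is now out of date

if __name__ == "__main__":
    print(quorum_read(replicas, [0, 1]))
    print(quorum_read(replicas, [0, 2]))
```

Replica 0 still holds version 1, but every read quorum of two contains at least one replica with the current version, so the stale copy is never returned.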
Que 5.32. Discuss the following in terms of distributed system :
i. Sequential consistency
ii. Highly available services AKTU 2018-19, Marks 10
OR
Write short note on highly available services and sequential
consistency. AKTU 2015-16, Marks 10
Answer

i. Sequential consistency :
1. Sequential consistency is a strong safety property for concurrent systems.
2. A system is sequentially consistent if the result of any execution of the
operations of all the processors is the same as if they were executed in a
sequential order, and the operations of each individual processor appear
in this sequence in the order specified by its program.
3. It implies that operations appear to take place in some total order, and
that order is consistent with the order of operations on each individual
process.
4. Sequential consistency cannot be totally available : in the event of a
network partition, some or all nodes will be unable to make progress.
ii. Highly available services :
1. Availability of service means the percentage of time that a service is up.
2. Highly available service is the service whose availability is close to
100% with reasonable response time.
3. It may not conform to sequential consistency.
4. Gossip architecture is a framework for implementing highly available
services by replicating data close to the points where groups of clients
need it.

Que 5.33. Describe the architecture of replicated transactions.

Answer
Architectures for replicated transactions :
1. In this architecture, we assume that a front end sends client requests to
one of the group of replica managers of a logical object.
2. In the primary copy approach, all front ends communicate with a
distinguished 'primary' replica manager to perform an operation, and
that replica manager keeps the backups up to date.
3. Alternatively, front ends may communicate with any replica manager to perform an
operation.
4. The replica manager that receives a request to perform an operation on
a particular object is responsible for getting the cooperation of
the other replica managers in the group that have copies of that object.
5. Different replication schemes have different rules as to how many of
the replica managers in a group are required for the successful
completion of an operation.
6. In the read-one write-all scheme, a read request can be performed by a
single replica manager, whereas a write request must be performed by
all the replica managers in the group, as shown in Fig. 5.33.1.
7. Quorum consensus schemes are designed to reduce the number of
replica managers that must perform update operations, but at the expense
of increasing the number of replica managers required to perform
read-only operations.

Fig. 5.33.1. Transactions on replicated data (clients with front ends issue getBalance(A) and deposit(B, 3) to groups of replica managers).

Que 5.34. Explain the group communication in replicated data.


Answer
Group communication :
1. Multicast communication is also known as group communication because
process groups are the destinations of multicast messages.
2. Groups are useful for managing replicated data and in other systems
where processes cooperate towards a common goal by receiving and
processing the same set of multicast messages.
3. They are also useful where the group members independently consume
one or more common streams of messages, such as messages carrying
events to which the processes react independently.
4. A full implementation of group communication incorporates a group
membership service to manage the dynamic membership of groups, in
addition to multicast communication.
5. Multicast and group membership management are strongly interrelated.
6. Fig. 5.34.1 shows an open group, in which a process outside the group
sends to the group without knowing the group's membership.

Fig. 5.34.1. Services provided for process groups (group address expansion, multicast communication, and group membership management handling join, leave and fail events).

7. The group communication services have to manage changes in the
group's membership while multicasts take place concurrently.
Que 5.35. Explain fault-tolerant services.
Answer
1. Fault tolerant services are obtained by using replication.
2. Using multiple independent server replicas, each managing replicated
data, helps in designing a service which exhibits graceful degradation
during partial failure and improves overall server performance.
Following are two main fault-tolerance service models :
1. Passive (primary-backup) replication :
a. In a passive model of replication for fault tolerance, there is at any
time a single primary replica manager and one or more secondary
replica managers, i.e., 'backups' or 'slaves'.
b. In the pure form of the model, front ends communicate only with
the primary replica manager to obtain the service.
c. The primary replica manager executes the operations and sends
copies of the updated data to the backups.
d. If the primary fails, one of the backups is promoted to act as the
primary.
e. The sequence of events when a client requests an operation to be
performed is as follows :
i. Request : The front end issues the request, containing a unique
identifier, to the primary replica manager.
ii. Coordination : The primary takes each request atomically,
in the order in which it receives it. It checks the unique identifier,
in case it has already executed the request, and if so it simply
re-sends the response.
iii. Execution : The primary executes the request and stores the
response.
iv. Agreement : If the request is an update then the primary
sends the updated state, the response and the unique identifier
to all the backups. The backups send an acknowledgement.
v. Response : The primary responds to the front end, which
hands the response back to the client.
Fig. 5.35.1. The passive (primary-backup) model for fault tolerance.
2. Active replication :
a. In the active model of replication for fault tolerance, the replica
managers are state machines that play equivalent roles and are
organized as a group.
b. Front ends multicast their requests to the group of replica managers
and all the replica managers process the request independently but
identically and reply.
c. If any replica manager crashes, this need have no impact
upon the performance of the service, since the remaining replica
managers continue to respond in the normal way.
d. Under active replication, the sequence of events when a client
requests an operation to be performed is as follows :
i. Request : The front end attaches a unique identifier to the
request and multicasts it to the group of replica managers,
using a totally ordered, reliable multicast primitive.
ii. Coordination : The group communication system delivers
the request to every correct replica manager in the same (total)
order.

Fig. 5.35.2. Active replication.

iii. Execution : Every replica manager executes the request.
iv. Agreement : No agreement phase is needed, because of the
multicast delivery semantics.
V. Response : Each replica manager sends its response to the
front end.
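
The five phases of the passive (primary-backup) model described earlier in this answer can be traced in a short single-process sketch. The classes, the `handle` method and the "add an amount to an account" operation are assumptions made for illustration; real replica managers would exchange these messages over a network and would also handle primary failure, which the sketch leaves out.

```python
class Backup:
    def __init__(self):
        self.state = {}
        self.responses = {}

    def update(self, state, request_id, response):
        """Store the primary's updated state and response, then acknowledge."""
        self.state = dict(state)
        self.responses[request_id] = response
        return "ack"


class Primary:
    def __init__(self, backups):
        self.state = {}
        self.responses = {}   # request id -> stored response
        self.backups = backups

    def handle(self, request_id, account, amount):
        # Coordination: a repeated identifier means the request was already
        # executed, so the stored response is simply re-sent.
        if request_id in self.responses:
            return self.responses[request_id]
        # Execution: apply the update and store the response.
        self.state[account] = self.state.get(account, 0) + amount
        response = self.state[account]
        self.responses[request_id] = response
        # Agreement: send state, response and identifier to all backups.
        acks = [b.update(self.state, request_id, response) for b in self.backups]
        if not all(ack == "ack" for ack in acks):
            raise RuntimeError("a backup did not acknowledge the update")
        # Response: returned to the front end.
        return response


if __name__ == "__main__":
    backups = [Backup(), Backup()]
    primary = Primary(backups)
    print(primary.handle("req-1", "A", 10))
    print(primary.handle("req-1", "A", 10))  # retransmission, not re-executed
    print(backups[0].state)
```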

Que 5.36. What are stub and skeleton and why are they needed in
remote procedure calls ? AKTU 2017-18, Marks 10
Answer
Stub is a function which converts a function call into a network request
and a network response into a function return.
Skeleton converts requests into function calls and function returns into
network replies.
Need of stub and skeleton in Remote Procedure Call (RPC) :
1. RPC allows a local computer (client) to remotely call procedures on a
different computer (server).
2. The client and server use different address spaces, so parameters used
in a function (procedure) call have to be converted, otherwise the
values of those parameters could not be used, because pointers to
parameters in one computer's memory would point to different data
on the other computer.
3. The client and server may also use different data representations,
even for simple parameters.
4. Stubs perform the conversion of the parameters, so a remote procedure
call looks like a local function call.
5. Stub libraries must be installed on both the client and server side.
6. A client stub is responsible for conversion of parameters used in a
function call and deconversion of results passed from the server after
execution of the function.
7. A server skeleton, the stub on the server side, is responsible for
deconversion of parameters passed by the client and conversion of the
results after the execution of the function.
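
The division of work between a client stub and a server skeleton can be sketched without a network. In the example below JSON stands in for the wire format and an ordinary function call stands in for the transport; the names `make_stub`, `skeleton` and `PROCEDURES` are hypothetical and belong to no RPC library.

```python
import json


def add(a, b):
    """The actual procedure, which lives on the 'server'."""
    return a + b


PROCEDURES = {"add": add}


def skeleton(request_bytes):
    """Server side: deconvert the parameters, call the procedure, convert the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def make_stub(proc_name, transport):
    """Client side: build a function that looks local but sends a request."""
    def stub(*args):
        request = json.dumps({"proc": proc_name, "args": list(args)}).encode("utf-8")
        reply = transport(request)  # a real stub would send this over the network
        return json.loads(reply.decode("utf-8"))["result"]
    return stub


# The transport is assumed to be a direct call to the skeleton.
remote_add = make_stub("add", transport=skeleton)

if __name__ == "__main__":
    print(remote_add(2, 3))
```

Only byte strings cross the boundary between `stub` and `skeleton`, which mirrors the point that pointers and in-memory representations cannot be shared between address spaces.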

Que 5.37. What is the purpose of an Interface Definition
Language ? Why does CORBA not just use the Java interface
construct ? AKTU 2017-18, Marks 10
Answer
Purposes of Interface Definition Language (IDL) are :
1. To describe a software component's Application Programming Interface
(API).
2. To describe an interface in a language-independent way, enabling
communication between software components that do not share one
language, for example, between those written in C++ and those written
in Java.
3. IDLs act as a bridge between two different systems. For example,
in case of RPC software the machines at either end of the link may be
using different operating systems and computer languages.
Reasons for CORBA not just using the Java interface construct :
1. CORBA builds on the idea that all objects are remotable.
2. Objects can be accessed via their object reference. So, we pass object
references in a call, not the object itself.
3. If we create our remote applications starting from CORBA IDL, then it
will not pass objects by value.
4. However, we face the following problem :
a. Java interfaces can define method invocations that include
parameters that reference local objects.
b. These references to local Java objects are only useful within a
single virtual machine. So the ORB must copy these objects across
Java virtual machines.

Que 5.38. How does a server know that one of the remote objects
it provides is no longer used by clients and can be collected ?
How does Java RMI handle this problem and what alternatives
are there ? AKTU 2017-18, Marks 10
Answer
1. When a client first receives a reference to a remote object, a "referenced"
message is sent to the server that is exporting the object.
2. Every subsequent reference within the client's local machine causes a
reference counter to be incremented.
3. As a local reference is finalized, the reference count is decremented,
and once the count goes to zero, an 'unreferenced' message is sent to
the server.
4. Once the server has no more live references to an object from any
client and there are no local references, the object is free to be
finalized and garbage collected.
5. This condition tells a server that a remote object it provides is no
longer used by clients and can be collected.
RMI uses its distributed garbage collection feature to collect remote server
objects that are no longer referenced by any client in the network.
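
The counting scheme described in the steps above can be sketched as follows. This is a simplified, in-process illustration: the class names RefCountingServer and ClientRefTable and their methods are hypothetical, messages are modelled as direct method calls, and the sketch omits the leases that Java RMI's actual distributed garbage collector additionally uses to cope with clients that crash.

```java
import java.util.HashMap;
import java.util.Map;

// Server side (hypothetical): tracks how many clients hold each exported object.
class RefCountingServer {
    private final Map<String, Integer> holders = new HashMap<>();

    void export(String objectId) { holders.put(objectId, 0); }

    // A client sent a "referenced" message.
    void referenced(String objectId) { holders.merge(objectId, 1, Integer::sum); }

    // A client sent an "unreferenced" message.
    void unreferenced(String objectId) {
        holders.computeIfPresent(objectId, (id, n) -> Math.max(0, n - 1));
    }

    // True when no client holds the object, so it may be collected.
    boolean collectable(String objectId) {
        return holders.getOrDefault(objectId, 0) == 0;
    }
}

// Client side (hypothetical): counts local references and notifies the server
// only on the first acquisition and the last release.
class ClientRefTable {
    private final RefCountingServer server;
    private final Map<String, Integer> localCounts = new HashMap<>();

    ClientRefTable(RefCountingServer server) { this.server = server; }

    void acquire(String objectId) {
        int n = localCounts.merge(objectId, 1, Integer::sum);
        if (n == 1) server.referenced(objectId);    // first local reference
    }

    void release(String objectId) {
        Integer n = localCounts.get(objectId);
        if (n == null) return;                      // nothing to release
        if (n == 1) {
            localCounts.remove(objectId);
            server.unreferenced(objectId);          // count reached zero
        } else {
            localCounts.put(objectId, n - 1);
        }
    }
}
```

Only the first acquire and the last release on a client generate a message, so the server sees one count per client rather than one per local reference.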
Que 5.39. De-activation is a technology used to preserve server
resources where a server which provides remote objects to clients
can de-activate those remote objects. Clients should not know
about this. What must the server do to avoid surprises for the
clients ? AKTU 2017-18, Marks 10

Answer
While using de-activation technologies, the server must do the following to
avoid surprises for the clients :
1. It must allow a client invocation to recreate (activate) the object
again.
2. The remote objects must remain available for a long period without any
predetermined expiration timeout.
3. The remote object's state must not be lost between individual
invocations and must be available to all clients.
4. It may provide remote objects whose lifetime is controlled by clients.
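
Points 1 and 3 above can be illustrated with a small sketch in which an object is de-activated by the server and transparently re-activated on the next call. The names Counter and ActivatableCounter are hypothetical, and an ordinary field stands in for persistent storage; this is not the Java RMI activation API.

```java
// Hypothetical implementation object that holds the state clients care about.
class Counter {
    int value;
    Counter(int value) { this.value = value; }
}

// Hypothetical server-side wrapper that hides de-activation from clients.
class ActivatableCounter {
    private Counter live;        // null while the object is de-activated
    private int savedState;      // assumption: stands in for persistent storage
    private int activations;     // how often the object had to be (re)created

    // Server housekeeping: save the state, then free the in-memory object.
    void deactivate() {
        if (live != null) {
            savedState = live.value;
            live = null;
        }
    }

    // Client-visible operation: works the same whether or not the object is active.
    int increment() {
        Counter c = activate();
        c.value++;
        return c.value;
    }

    // Recreate the object from its saved state if it is not in memory.
    private Counter activate() {
        if (live == null) {
            live = new Counter(savedState);
            activations++;
        }
        return live;
    }

    boolean isActive() { return live != null; }

    int activationCount() { return activations; }
}
```

A client that calls increment() before and after a de-activation sees consecutive values, so the de-activation stays invisible to it.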

VERY IMPORTANT QUESTIONS


Following questions are very important. These questions
may be asked in your SESSIONALS as well as
UNIVERSITY EXAMINATION.

Q. 1. Explain transaction and its properties.
Ans. Refer Q. 5.1.

Q. 2. Discuss 2PL and strict 2PL in context of distributed system.
Ans. Refer Q. 5.5.

Q. 3. Explain optimistic concurrency control.
Ans. Refer Q. 5.7.

Q. 4. Discuss the optimistic methods for distributed concurrency
control. What are the different validation conditions for
optimistic concurrency control ? Explain.
Ans. Refer Q. 5.8.

Q. 5. Write short notes on timestamp ordering transaction
management.
Ans. Refer Q. 5.10.

Q. 6. What are the advantages and drawbacks of multiversion
timestamp ordering in comparison with the basic
timestamp ordering ?
Ans. Refer Q. 5.13.

Q. 7. Explain distributed transactions. Discuss the functionality
of flat and nested distributed transactions with example.
Ans. Refer Q. 5.15.
