
Module 3 - Mutual Exclusion and Deadlock Detection
Introduction
Mutual exclusion: Concurrent access of processes to a shared resource or data is
executed in a mutually exclusive manner.
Only one process is allowed to execute the critical section (CS) at any given time.
In a distributed system, shared variables (semaphores) or a local kernel cannot be used
to implement mutual exclusion.
Message passing is the sole means for implementing distributed mutual exclusion.
Distributed mutual exclusion algorithms must deal with unpredictable message delays
and incomplete knowledge of the system state.

Three basic approaches for distributed mutual exclusion:

1. Token-based approach
2. Non-token-based approach
3. Quorum-based approach

Token-based approach
A unique token is shared among the sites.
A site is allowed to enter its CS if it possesses the token.
Mutual exclusion is ensured because the token is unique.

Non-token-based approach

Two or more successive rounds of messages are exchanged among the sites to determine which site will enter the CS next.

Quorum-based approach

Each site requests permission to execute the CS from a subset of sites (called a quorum).
Any two quorums contain a common site.
This common site is responsible for ensuring that only one request executes the CS at any time.
System model for Distributed Mutual Exclusion Algorithms
The system consists of N sites, S1, S2, ..., SN. Without loss of generality, we assume
that a single process is running on each site.
The process at site Si is denoted by pi.
All these processes communicate asynchronously over an underlying communication
network.
A process wishing to enter the CS sends REQUEST messages to all other processes or to a subset of them, and waits for appropriate replies before entering the CS.
While waiting, the process is not allowed to make further requests to enter the CS.
A site can be in one of the following three states: requesting the CS, executing the CS, or neither requesting nor executing the CS (idle).
In the “requesting the CS” state, the site is blocked and cannot make further requests
for the CS.
In the “idle” state, the site is executing outside the CS.
In token-based algorithms, a site can also be in a state where a site holding the token is executing outside the CS.
Such a state is referred to as the idle token state.
At any instant, a site may have several pending requests for the CS.
A site queues up these requests and serves them one at a time.
We do not make any assumption about whether communication channels are FIFO; this is algorithm specific.
We assume that channels reliably deliver all messages, sites do not crash, and the network does not get partitioned.

Requirements of Mutual Exclusion Algorithms
Safety Property: At any instant, only one process can execute the critical section.
Liveness Property: This property states the absence of deadlock and starvation. Two
or more sites should not endlessly wait for messages which will never arrive.
Fairness: Each process gets a fair chance to execute the CS. Fairness generally means that CS execution requests are executed in the order of their arrival in the system, where arrival time is determined by a logical clock.

Performance Metrics
The performance of mutual exclusion algorithms is generally measured by the following four
metrics:
1. Message complexity: the number of messages required per CS execution by a site.
2. Synchronization delay: the time between one site leaving the CS and the next site entering it.
3. Response time: the time interval a request waits for its CS execution to be over after its request messages have been sent out.
4. System throughput: the rate at which the system executes requests for the CS. If SD is the synchronization delay and E is the average critical section execution time, then the throughput is given by the following equation:

System throughput = 1 / (SD + E)
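To make the formula concrete, here is a minimal Python sketch with assumed values for SD and E (the numbers are illustrative, not from the notes):

```python
# Assumed values for illustration only.
SD = 0.002  # synchronization delay: 2 ms
E = 0.008   # average CS execution time: 8 ms

throughput = 1 / (SD + E)
print(f"System throughput = {throughput:.0f} CS executions per second")  # 100
```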

Lamport’s Algorithm
Requests for the CS are executed in increasing order of timestamps, and time is determined by logical clocks.
Every site Si keeps a queue, request_queue_i, which contains mutual exclusion requests ordered by their timestamps.
This algorithm requires communication channels to deliver messages in FIFO order.

Requesting the critical section


When a site Si wants to enter the CS, it broadcasts a REQUEST(tsi, i) message to all other sites and places the request on request_queue_i. ((tsi, i) denotes the timestamp of the request.)
When a site Sj receives the REQUEST(tsi, i) message from site Si, it places site Si's request on request_queue_j and returns a timestamped REPLY message to Si.

Executing the critical section


Site Si enters the CS when the following two conditions hold:

L1: Si has received a reply message with timestamp larger than (tsi, i) from every other site.
L2: Si's request is at the top of request_queue_i.

Releasing the critical section


Site Si, upon exiting the CS, removes its request from the top of its request queue and broadcasts a timestamped RELEASE message to all other sites.
When a site Sj receives a RELEASE message from site Si, it removes Si's request from its request queue.
When a site removes a request from its request queue, its own request may come to the top of the queue, enabling it to enter the CS.
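The following is a minimal, single-process Python simulation of these three steps, offered only as an illustrative sketch: message delivery is modeled as direct method calls (FIFO by construction), and all identifiers (LamportSite, on_request, etc.) are inventions of this sketch, not from the notes.

```python
import heapq

class LamportSite:
    """One site in a single-process simulation of Lamport's algorithm.
    Messages are direct method calls, so delivery is FIFO by construction."""

    def __init__(self, site_id, sites):
        self.id = site_id
        self.sites = sites        # shared dict: site_id -> LamportSite
        self.clock = 0            # Lamport logical clock
        self.queue = []           # request_queue_i: min-heap of (ts, id)
        self.replies = set()
        self.my_request = None

    def tick(self, received_ts=0):
        self.clock = max(self.clock, received_ts) + 1

    def request_cs(self):
        self.tick()
        self.my_request = (self.clock, self.id)
        heapq.heappush(self.queue, self.my_request)
        self.replies = set()
        for s in self.sites.values():          # broadcast REQUEST(ts_i, i)
            if s.id != self.id:
                s.on_request(self.my_request)

    def on_request(self, req):
        self.tick(req[0])
        heapq.heappush(self.queue, req)        # queue the incoming request
        self.tick()
        self.sites[req[1]].on_reply(self.clock, self.id)  # timestamped REPLY

    def on_reply(self, ts, sender):
        self.tick(ts)
        self.replies.add(sender)

    def can_enter_cs(self):
        # L1: a REPLY from every other site (each carries a larger timestamp,
        # since repliers tick past the request's timestamp before replying);
        # L2: own request at the top of request_queue_i.
        return (self.replies == set(self.sites) - {self.id}
                and self.queue[0] == self.my_request)

    def release_cs(self):
        heapq.heappop(self.queue)              # remove own request from the top
        self.tick()
        for s in self.sites.values():          # broadcast RELEASE
            if s.id != self.id:
                s.on_release(self.clock, self.id)

    def on_release(self, ts, sender):
        self.tick(ts)
        self.queue = [r for r in self.queue if r[1] != sender]
        heapq.heapify(self.queue)

sites = {}
for i in (1, 2, 3):
    sites[i] = LamportSite(i, sites)
sites[1].request_cs()
assert sites[1].can_enter_cs()
sites[1].release_cs()
```

With synchronous delivery, condition L1 reduces to having a REPLY from every other site, since each REPLY is timestamped after the request it answers.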

Ricart-Agrawala Algorithm
The Ricart-Agrawala algorithm assumes the communication channels are FIFO. The
algorithm uses two types of messages: REQUEST and REPLY.
A process sends a REQUEST message to all other processes to request their
permission to enter the critical section. A process sends a REPLY message to a process
to give its permission to that process.

Requesting the critical section


When a site Si wants to enter the CS, it broadcasts a timestamped REQUEST message to all other sites.
When site Sj receives a REQUEST message from site Si, it sends a REPLY message to site Si if
site Sj is neither requesting nor executing the CS, or
site Sj is requesting and Si's request's timestamp is smaller than site Sj's own request's timestamp.
Otherwise, the reply is deferred and Sj sets RDj[i] := 1.

Executing the critical section


Site Si enters the CS after it has received a REPLY message from every site it sent a
REQUEST message to.

Releasing the critical section


When site Si exits the CS, it sends all the deferred REPLY messages: ∀j, if RDi[j] = 1, then send a REPLY message to Sj and set RDi[j] := 0.
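A minimal single-process sketch of the same three steps, under the same simplifying assumptions as the Lamport sketch above (direct method calls instead of a network; all identifiers are illustrative). Note how a deferred reply is recorded in RD and sent on release.

```python
class RASite:
    """One site in a single-process simulation of the Ricart-Agrawala
    algorithm; identifiers are illustrative, not from the notes."""

    def __init__(self, site_id, sites):
        self.id = site_id
        self.sites = sites            # shared dict: site_id -> RASite
        self.clock = 0
        self.requesting = self.executing = False
        self.my_ts = None
        self.replies = set()
        self.RD = {}                  # RD[j] = 1: the REPLY to Sj is deferred

    def request_cs(self):
        self.clock += 1
        self.requesting, self.my_ts = True, (self.clock, self.id)
        self.replies = set()
        for s in self.sites.values():
            if s.id != self.id:
                s.on_request(self.my_ts)

    def on_request(self, req):
        self.clock = max(self.clock, req[0]) + 1
        # Reply at once if idle, or if the incoming request is older than
        # ours (tuple comparison gives the total order on timestamps).
        if (not self.requesting and not self.executing) or \
           (self.requesting and req < self.my_ts):
            self.sites[req[1]].on_reply(self.id)
        else:
            self.RD[req[1]] = 1       # defer the reply

    def on_reply(self, j):
        self.replies.add(j)

    def enter_cs(self):
        assert self.replies == set(self.sites) - {self.id}
        self.requesting, self.executing = False, True

    def release_cs(self):
        self.executing = False
        for j in self.RD:             # send all deferred REPLY messages
            if self.RD[j] == 1:
                self.RD[j] = 0
                self.sites[j].on_reply(self.id)

sites = {}
for i in (1, 2):
    sites[i] = RASite(i, sites)
sites[1].request_cs()
sites[1].enter_cs()
sites[2].request_cs()                 # S1 is executing, so S2's REPLY is deferred
sites[1].release_cs()
assert sites[2].replies == {1}        # deferred REPLY arrives on release
```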

Quorum-Based Mutual Exclusion Algorithms
Quorum-based mutual exclusion algorithms differ from the preceding algorithms in the following two ways:

1. A site does not request permission from all other sites, but only from a subset of the
sites. Consequently, every pair of sites has a site which mediates conflicts between that
pair.
2. A site can send out only one REPLY message at any time. A site can send a REPLY
message only after it has received a RELEASE message for the previous REPLY
message.
Quorum-based mutual exclusion algorithms significantly reduce the message
complexity of invoking mutual exclusion by having sites ask permission from only a
subset of sites.
Since these algorithms are based on the notion of “Coteries” and “Quorums,” we first
describe the idea of coteries and quorums.
A coterie C is defined as a set of sets, where each set g ∈ C is called a quorum. The following properties hold for quorums in a coterie:
Intersection property: for any two quorums g, h ∈ C, g ∩ h ≠ ∅.
Minimality property: no quorum in C is a subset of another quorum in C.
Coteries and quorums can be used to develop algorithms to ensure mutual exclusion in a distributed environment.
A simple protocol works as follows: let “a” be a site in quorum “A.”
If “a” wants to invoke mutual exclusion, it requests permission from all sites in its quorum “A.”
The intersection property ensures mutual exclusion; the minimality property ensures efficiency (see the sketch below).
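As a small illustration, the sketch below checks the two properties on an assumed seven-site coterie (the classic projective-plane construction; the example is not from the notes):

```python
from itertools import combinations

# A hypothetical coterie for 7 sites; each 3-element set is a quorum.
coterie = [
    {1, 2, 3}, {1, 4, 5}, {1, 6, 7},
    {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6},
]

def is_coterie(C):
    # Intersection property: every pair of quorums shares at least one site.
    intersecting = all(g & h for g, h in combinations(C, 2))
    # Minimality property: no quorum is a proper subset of another.
    minimal = not any(g < h for g in C for h in C)
    return intersecting and minimal

print(is_coterie(coterie))  # True
```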

Maekawa’s Algorithm
Maekawa’s algorithm was the first quorum-based mutual exclusion algorithm.

The Algorithm
A site Si executes the following steps to execute the CS.

Requesting the critical section


A site Si requests access to the CS by sending REQUEST(i) messages to all sites in its request set Ri.
When a site Sj receives the REQUEST(i) message, it sends a REPLY(j) message to Si provided it hasn't sent a REPLY message to any site since its receipt of the last RELEASE message. Otherwise, it queues up the REQUEST(i) for later consideration.

Executing the critical section


Site Si executes the CS only after it has received a REPLY message from every site in Ri.

Releasing the critical section


After the execution of the CS is over, site Si sends a RELEASE(i) message to every site
in Ri.
When a site Sj receives a RELEASE(i) message from site Si, it sends a REPLY
message to the next site waiting in the queue and deletes that entry from the queue. If
the queue is empty, then the site updates its state to reflect that it has not sent out any
REPLY message since the receipt of the last RELEASE message.
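A minimal single-process sketch of these steps, with illustrative identifiers and assumed grid-style request sets; it implements only the basic grant-and-queue protocol described above (as in the notes) and omits the extra messages Maekawa's full algorithm uses to resolve deadlocks among sites.

```python
from collections import deque

class MaekawaSite:
    """One site in a single-process simulation of (simplified) Maekawa's
    algorithm. A site grants at most one REPLY between RELEASE messages
    and queues the other requests."""

    def __init__(self, site_id, request_set, sites):
        self.id = site_id
        self.R = request_set          # request set R_i (includes the site itself)
        self.sites = sites
        self.granted_to = None        # site our one outstanding REPLY went to
        self.waiting = deque()        # queued REQUESTs
        self.replies = set()

    def request_cs(self):
        self.replies = set()
        for j in self.R:
            self.sites[j].on_request(self.id)

    def on_request(self, i):
        if self.granted_to is None:   # no REPLY outstanding since last RELEASE
            self.granted_to = i
            self.sites[i].on_reply(self.id)
        else:
            self.waiting.append(i)    # queue for later consideration

    def on_reply(self, j):
        self.replies.add(j)

    def can_enter_cs(self):
        return self.replies == set(self.R)

    def release_cs(self):
        for j in self.R:
            self.sites[j].on_release(self.id)

    def on_release(self, i):
        self.granted_to = None
        if self.waiting:              # grant the next queued request, if any
            self.granted_to = self.waiting.popleft()
            self.sites[self.granted_to].on_reply(self.id)

# Assumed request sets for 4 sites; every pair of sets intersects.
sets = {1: {1, 2, 3}, 2: {1, 2, 4}, 3: {1, 3, 4}, 4: {2, 3, 4}}
sites = {}
for i in sets:
    sites[i] = MaekawaSite(i, sets[i], sites)
sites[1].request_cs()
assert sites[1].can_enter_cs()
sites[1].release_cs()
```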

Token-based algorithm
Suzuki–Kasami’s broadcast algorithm
In token-based algorithms, a unique token is shared among the sites. A site is allowed
to enter its CS if it possesses the token.
Token-based algorithms use sequence numbers instead of timestamps. (Used to
distinguish between old and current requests.)
If a site wants to enter the CS and it does not have the token, it broadcasts a REQUEST
message for the token to all other sites.
A site which possesses the token sends it to the requesting site upon the receipt of its
REQUEST message.
If a site receives a REQUEST message when it is executing the CS, it sends the token
only after it has completed the execution of the CS.

This algorithm must efficiently address the following two design issues:

1. How to distinguish an outdated REQUEST message from a current REQUEST message.
   Due to variable message delays, a site may receive a token request message after the corresponding request has been satisfied.
   If a site cannot determine whether the request corresponding to a token request has been satisfied, it may dispatch the token to a site that does not need it.
   This does not violate correctness; however, it may seriously degrade performance.
2. How to determine which site has an outstanding request for the CS.
   After a site has finished the execution of the CS, it must determine which sites have an outstanding request for the CS so that the token can be dispatched to one of them.

The first issue is addressed in the following manner:

A REQUEST message of site Sj has the form REQUEST(j, n), where n (n = 1, 2, ...) is a sequence number which indicates that site Sj is requesting its nth CS execution.
A site Si keeps an array of integers RNi[1..N], where RNi[j] denotes the largest sequence number received in a REQUEST message so far from site Sj.
When site Si receives a REQUEST(j, n) message, it sets RNi[j] := max(RNi[j], n); the request is outdated if RNi[j] > n.

The second issue is addressed in the following manner:


The token consists of a queue of requesting sites, Q, and an array of integers LN[1..N],
where LN[j] is the sequence number of the request which site Sj executed most
recently.
After executing its CS, a site Si updates LN[i] := RNi[i] to indicate that its request corresponding to sequence number RNi[i] has been executed.
At site Si, if RNi[j] = LN[j] + 1, then site Sj is currently requesting the token.

The Algorithm

Requesting the critical section


If the requesting site Si does not have the token, then it increments its sequence number, RNi[i], and sends a REQUEST(i, sn) message to all other sites. ('sn' is the updated value of RNi[i].)
When a site Sj receives this message, it sets RNj[i] to max(RNj[i], sn). If Sj has the idle token, then it sends the token to Si if RNj[i] = LN[i] + 1.

Executing the critical section


Site Si executes the CS after it has received the token.

Releasing the critical section


Having finished the execution of the CS, site Si takes the following actions:

It sets the LN[i] element of the token array equal to RNi[i].
For every site Sj whose id is not in the token queue, it appends Sj's id to the token queue if RNi[j] = LN[j] + 1.
If the token queue is nonempty after the above update, Si deletes the top site id from the token queue and sends the token to the site indicated by that id.
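Putting the pieces together, here is a minimal single-process sketch (illustrative identifiers; direct method calls instead of a network). The token is modeled as a pair (Q, LN), and RN as a per-site array:

```python
from collections import deque

class SKSite:
    """One site in a single-process simulation of Suzuki-Kasami.
    RN tracks, per site, the largest sequence number seen in a REQUEST."""

    def __init__(self, site_id, n, sites, has_token=False):
        self.id = site_id
        self.sites = sites
        self.RN = [0] * (n + 1)                # 1-based site ids
        self.token = [deque(), [0] * (n + 1)] if has_token else None
        self.in_cs = False

    def request_cs(self):
        if self.token is not None:             # already holding the idle token
            self.in_cs = True
            return
        self.RN[self.id] += 1
        sn = self.RN[self.id]                  # broadcast REQUEST(i, sn)
        for s in self.sites.values():
            if s.id != self.id:
                s.on_request(self.id, sn)

    def on_request(self, i, sn):
        self.RN[i] = max(self.RN[i], sn)       # outdated if RN[i] > sn
        if self.token is not None and not self.in_cs:
            Q, LN = self.token
            if self.RN[i] == LN[i] + 1:        # current, unsatisfied request
                self.send_token(i)

    def send_token(self, i):
        token, self.token = self.token, None
        self.sites[i].on_token(token)

    def on_token(self, token):
        self.token = token
        self.in_cs = True

    def release_cs(self):
        Q, LN = self.token
        LN[self.id] = self.RN[self.id]         # mark own request satisfied
        for j in self.sites:                   # enqueue outstanding requesters
            if j != self.id and j not in Q and self.RN[j] == LN[j] + 1:
                Q.append(j)
        self.in_cs = False
        if Q:
            self.send_token(Q.popleft())

sites = {}
for i in (1, 2, 3):
    sites[i] = SKSite(i, 3, sites, has_token=(i == 1))
sites[2].request_cs()      # S1 holds an idle token and forwards it to S2
assert sites[2].in_cs
sites[2].release_cs()
```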

Deadlock detection in distributed systems


Deadlock is a fundamental problem in distributed systems.
A process may request resources in any order, and a process can request a resource while holding others.
If the sequence of the allocation of resources to the processes is not controlled, deadlocks can occur.
A deadlock is a state where a set of processes request resources that are held by other processes in the set.
A distributed program is composed of a set of n asynchronous processes p1, p2, ..., pi, ..., pn that communicate by message passing over the communication network.
Each process is running on a different processor.
The processors do not share a common global memory and communicate by passing
messages over the communication network.
There is no physical global clock in the system to which processes have instantaneous
access.
A process can be in two states: running or blocked.
In the running state (also called active state), a process has all the needed resources
and is either executing or is ready for execution.
In the blocked state, a process is waiting to acquire some resource.

Deadlock Handling Strategies


There are three strategies for handling deadlocks, viz.,
deadlock prevention
deadlock avoidance
deadlock detection
Deadlock prevention is commonly achieved either by having a process acquire all the needed resources simultaneously before it begins executing or by preempting a process which holds the needed resource. This approach is highly inefficient and impractical in distributed systems.
In the deadlock avoidance approach, a resource is granted to a process if the resulting global system state is safe.
Deadlock detection requires an examination of the status of process-resource interactions for the presence of a cyclic wait. Deadlock detection seems to be the best approach to handle deadlocks in distributed systems.

To resolve the deadlock, we have to abort a deadlocked process.

System Model
A distributed system consists of a set of processors that are connected by a
communication network.
The communication delay is finite but unpredictable.
A distributed program is composed of a set of n asynchronous processes P1, P2, ..., Pi, ..., Pn that communicate by message passing over the communication network.
Without loss of generality, we assume that each process is running on a different processor.
The processors do not share a common global memory and communicate solely by
passing messages over the communication network.
There is no physical global clock in the system to which processes have instantaneous
access.
The communication medium may deliver messages out of order, messages may be lost,
garbled, or duplicated due to timeout and retransmission, processors may fail, and
communication links may go down.
The system can be modeled as a directed graph in which vertices represent the
processes and edges represent unidirectional communication channels.

We make the following assumptions:

The systems have only reusable resources.


Processes are allowed only exclusive access to resources.
There is only one copy of each resource.
A process can be in two states, running or blocked.
In the running state (also called the active state), a process has all the needed resources and is either executing or is ready for execution.
In the blocked state, a process is waiting to acquire some resource.

Wait-for-Graph (WFG)
In distributed systems, the state of the system can be modeled by a directed graph, called a wait-for graph (WFG).
In a WFG, nodes are processes and there is a directed edge from node P1 to node P2 if P1 is blocked and is waiting for P2 to release some resource.
A system is deadlocked if and only if there exists a directed cycle or knot in the WFG (which of the two applies depends on the request model; see the models below).
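As an illustration of the cycle condition, the sketch below runs a depth-first search over a centralized snapshot of a WFG; the graph is an assumed example. Real distributed detection algorithms avoid constructing such a global WFG, so this only shows what "a cycle in the WFG" means:

```python
def find_cycle(wfg):
    """Return a list of processes forming a cycle in the WFG, or None.
    wfg maps each process to the processes it is waiting for."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    stack = []

    def dfs(p):
        color[p] = GRAY
        stack.append(p)
        for q in wfg.get(p, ()):
            if color.get(q, WHITE) == GRAY:    # back edge: cycle found
                return stack[stack.index(q):]
            if color.get(q, WHITE) == WHITE:
                cycle = dfs(q)
                if cycle:
                    return cycle
        color[p] = BLACK
        stack.pop()
        return None

    for p in list(wfg):
        if color.get(p, WHITE) == WHITE:
            cycle = dfs(p)
            if cycle:
                return cycle
    return None

# Assumed example: P1 waits for P2, P2 for P3, P3 for P1; P4 waits for P1.
wfg = {"P1": ["P2"], "P2": ["P3"], "P3": ["P1"], "P4": ["P1"]}
print(find_cycle(wfg))  # ['P1', 'P2', 'P3']
```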

Issues in Deadlock Detection


Deadlock handling using the approach of deadlock detection addresses two basic issues:
detection of existing deadlocks
resolution of detected deadlocks.

Detection of Deadlocks
Detection of deadlocks involves addressing two issues: maintenance of the WFG and searching the WFG for the presence of cycles (or knots).
Correctness Criteria: A deadlock detection algorithm must satisfy the following two
conditions:
Progress (No undetected deadlocks)
The algorithm must detect all existing deadlocks in finite time.
Safety (No false deadlocks)
The algorithm should not report deadlocks which do not exist (called phantom
or false deadlocks).

Resolution of a Detected Deadlock


Deadlock resolution involves breaking existing wait-for dependencies between the
processes to resolve the deadlock.
It involves rolling back one or more deadlocked processes and assigning their resources to blocked processes so that they can resume execution.

Models of Deadlocks
Distributed systems allow several kinds of resource requests.

The Single Resource Model


In the single resource model, a process can have at most one outstanding request for
only one unit of a resource.
Since the maximum out-degree of a node in a WFG for the single resource model is 1, the presence of a cycle in the WFG indicates that there is a deadlock.

The AND Model


In the AND model, a process can request for more than one resource simultaneously
and the request is satisfied only after all the requested resources are granted to the
process.
The out-degree of a node in the WFG for the AND model can be more than 1.
The presence of a cycle in the WFG indicates a deadlock in the AND model.

The OR Model
In the OR model, a process can make a request for numerous resources simultaneously and the request is satisfied if any one of the requested resources is granted.
The presence of a cycle in the WFG of an OR model does not imply a deadlock, because a process on the cycle may still be unblocked by a grant from a process outside the cycle; in the OR model, a deadlock corresponds to a knot in the WFG.
The AND-OR Model
A generalization of the previous two models (the OR model and the AND model) is the AND-OR model.
In the AND-OR model, a request may specify any combination of AND and OR in the resource request.

Unrestricted Model
In the unrestricted model, no assumptions are made regarding the underlying structure of
resource requests.
