
UNIT - 2

Distributed Operating Systems

Adv.O.S 1
Definition:
Distributed operating system:
The term distributed system describes a system with the following characteristics: it consists of several computers that do not share a memory or a clock; the computers communicate with each other by exchanging messages over a communication network; and each computer has its own memory and runs its own operating system.

Architecture of a distributed operating system

[Figure: several computers, each with its own CPU, memory, and disk, connected to a common communication network.]
Advantages of Distributed Systems:
 Inherently distributed applications
Several applications are inherently distributed in nature and require a distributed computing system for their realization.
Examples:
 A large bank with offices all over the world – customers can deposit/withdraw money from their accounts at any branch of the bank
 Airline reservation systems
 Employee databases

 Extensibility
 It is possible to gradually extend the power and functionality of a distributed computing system by simply adding additional resources (hardware and software)
 Attractive – resources can be extended without affecting the normal functioning of the existing system (such systems are called open distributed systems)

Advantages of Distributed Systems
 Information sharing among distributed users
 Groupware / computer-supported cooperative working: an emerging technology in which a group of users employ a distributed computing system to work cooperatively by transferring files and logging into each other's machines to run programs
 Resource sharing
 Provides mechanisms for sharing files at remote sites, processing information in distributed databases, printing files at remote sites, and using remote specialized hardware devices (e.g., an array processor)
 Better price-performance ratio
 Facilitates resource sharing among multiple computers, e.g., costly hardware
Advantages of Distributed Systems
 Shorter response times and higher throughput
 Expected to perform better than a centralized system
 Metrics: response time and throughput
 Achieved by partitioning the computations
 Achieved by evenly distributing the load from overloaded processors
 Reliability
 Refers to the degree of tolerance against errors and component failures in a system
 A reliable system prevents loss of information even in the event of component failures
 The multiplicity of processors and storage devices allows the maintenance of multiple copies of critical information
 If one processor fails, the computation can be successfully completed at another processor
 If one storage device fails, information is still available on other devices
 Increased availability
Advantages of Distributed Systems
 Better flexibility
 Different types of computers are suitable for performing different types of computations
 A DCS may have a pool of different types of computers
 Computation speedup
 Partition a computation into sub-computations and run them concurrently at different sites

These advantages can be achieved only by careful design of the DCS.
ISSUES IN DISTRIBUTED OPERATING SYSTEM

Global Knowledge
Scalability
Naming
Compatibility
Process Synchronization
Resource Management
Security
Structuring

Global knowledge
In shared-memory computer systems, the up-to-date state of all the processes and resources (the global state of the system) is completely and accurately known.

In a distributed computing system, it is not possible to obtain the global state, due to the unavailability of global memory and a global clock.

Issues: unavailability of global memory, unavailability of a global clock, and unpredictable message delays

Solutions:
• Determine efficient techniques to implement decentralized system-wide control
• Determine efficient ways to order the events that happen at different times in the DS
Scalability
Scalability refers to the capability of a system to adapt to an increased service load. A DCS should be designed to easily cope with the growth of nodes and users in the system.

Some guiding principles for designing scalable DS:

 Perform most operations on client workstations:
The server is a common resource for several clients – perform operations on the client's own machine rather than on a server machine.
This enables scalability by reducing contention for shared resources (e.g., servers).

 Avoid centralized algorithms:
A centralized algorithm collects information from all nodes, processes it on a single node, and distributes the results to the other nodes – not acceptable from a scalability point of view.
Example: a scheduling algorithm that makes scheduling decisions by first inquiring of all the nodes and then selecting the most lightly loaded node as a candidate for receiving jobs has a poor scalability factor.

 Avoid centralized entities:
Use of centralized entities such as a single central file server or a single database for the entire system makes the DS non-scalable (single point of failure, contention, etc.).
Naming: Names are used to refer to objects. Objects that can be named in computer systems include computers, printers, services, files, and users. An example of a service is a name service. A name service maps a logical name into a physical address by making use of a table lookup, an algorithm, or a combination of the two.

Compatibility: Compatibility refers to the notion of interoperability among the resources in a system. The three levels of compatibility that exist in distributed systems are the binary level, the execution level, and the protocol level.

Process synchronization: Process synchronization in a distributed system is difficult because of the unavailability of shared memory. A distributed operating system has to synchronize processes running at different computers when they try to concurrently access a shared resource, such as a file directory.
Resource Management: Resource management in a distributed operating system is concerned with making both local and remote resources available to users in an effective manner. The resources of a distributed system are made available to users in the following ways:
Data migration
Computation migration
Distributed scheduling

 Data migration: In data migration, data is brought by the distributed operating system to the location of the computation that needs access to it.
 Computation migration: The computation migrates to another location. Migrating the computation may be more efficient than migrating the data under certain circumstances. The remote procedure call mechanism has been widely used for computation migration and for providing communication between computers.

 Distributed scheduling: In distributed scheduling, processes are transferred from one computer to another by the distributed operating system, which is responsible for judiciously and transparently distributing processes amongst computers such that overall performance is maximized.
Security: The security of a system is the responsibility of its operating system. Two issues that must be considered in the design of security for computer systems are authentication and authorization.
Authentication is the process of guaranteeing that an entity is what it claims to be.
Authorization is the process of deciding what privileges an entity has and making only these privileges available.

Structuring: The structure of an operating system defines how its various parts are organized. Common structuring approaches are the monolithic kernel, the collective kernel structure, and the object-oriented operating system.
Communication Primitives
Communication primitives are the high-level constructs with which programs use the underlying communication network. The communication primitives influence a programmer's choice of algorithms as well as the ultimate performance of the programs.
Two communication models, namely message passing and remote procedure call, provide communication primitives. These two models have been widely used to develop distributed operating systems and applications for distributed systems.
INHERENT LIMITATIONS OF A DISTRIBUTED SYSTEM
A distributed system is a collection of computers that are spatially separated and do not share a common memory.
We discuss the inherent limitations of distributed systems and their impact on the design and development of distributed systems:

 Absence of a global clock
 Absence of shared memory
A distributed system with two sites
[Figure: site S1 hosting process A and site S2 hosting process B, connected by communication channels, shown in three successive states.]
(a) Local state of A: $500; local state of B: $200
(b) Local state of A: $450; local state of B: $200
(c) Local state of A: $450; local state of B: $250
Lamport’s Logical Clocks
Definition: Due to the absence of perfectly synchronized clocks and global time in distributed systems, the order in which two events occur at two different computers cannot be determined based on the local times at which they occur.
 Happened-before relation:
 a -> b : event a occurred before event b, where a and b are events in the same process.
 a -> b : a is the event of sending a message m in a process and b is the event of receipt of the same message m by another process.
 If a -> b and b -> c, then a -> c ("->" is transitive).
 Causally ordered events:
 a -> b : event a "causally" affects event b.
 Concurrent events: two distinct events a and b are said to be concurrent if a !-> b and b !-> a; in other words, concurrent events do not causally affect each other.
For any two events a and b in a system, either a -> b, b -> a, or a || b.
Space-time Diagram
Example:
[Figure: space-time diagram with time on the horizontal axis and space (processes) on the vertical axis. Process P1 has events e11, e12, e13, e14; process P2 has events e21, e22, e23, e24. Internal events appear on each process line; messages are arrows between the lines.]
Logical Clocks
Conditions satisfied:
 Ci is the clock in process Pi.
 If a -> b in process Pi, then Ci(a) < Ci(b).
 Let a be the event of sending message m in Pi and b the event of receiving message m in Pj; then Ci(a) < Cj(b).
Implementation rules:
 R1: Ci = Ci + d (d > 0); the clock is updated between two successive events.
 R2: Cj = max(Cj, tm) + d (d > 0); applied when Pj receives a message m with timestamp tm (tm is assigned by the sender Pi: tm = Ci(a), a being the event of sending message m).
A reasonable value for d is 1.
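Rules R1 and R2 can be sketched in a few lines; this is an illustrative implementation with d = 1 (class and method names are not from the slides):

```python
# Sketch of Lamport's logical clock rules R1 and R2, with d = 1.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """R1: advance the clock between two successive local events."""
        self.time += 1
        return self.time

    def send(self):
        """Ticking before a send yields the timestamp tm carried by the message."""
        return self.tick()

    def receive(self, tm):
        """R2: on receiving a message stamped tm, set C = max(C, tm) + d."""
        self.time = max(self.time, tm) + 1
        return self.time

# Usage: P1 sends to P2; the receive event is guaranteed a later timestamp.
p1, p2 = LamportClock(), LamportClock()
tm = p1.send()           # P1's clock becomes 1
t_recv = p2.receive(tm)  # P2's clock becomes max(0, 1) + 1 = 2
assert tm < t_recv
```

Note how R2 is what makes a receive event always follow the corresponding send, regardless of how far behind the receiver's local clock is.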

Space-time Diagram
Example 2: how Lamport's logical clocks advance
[Figure: process P1 has events e11–e17 with clock values (1) through (7); process P2 has events e21–e25 with clock values (1), (2), (3), (4), (7). The jump from (4) to (7) at e25 shows rule R2 applied on receipt of a message with a higher timestamp.]
Limitation of Lamport’s Clock
[Figure: three processes. P1 has events e11, e12, e13 with clock values (1), (2), (3); P2 has events e21, e22, e23 with clock values (1), (3), (4); P3 has events e31, e32, e33, with e31 at (1) and e32 at (2).]
C(e11) < C(e32), but the two events are not causally related.
This inter-dependency is not reflected in Lamport's clocks.
The above figure shows a computation over three processes. Clearly C(e11) < C(e22) and C(e11) < C(e32). However, we can see from the figure that event e11 is causally related to event e22 but not to event e32, since a path exists from e11 to e22 but not from e11 to e32.

The reason for this limitation is that each clock can independently advance due to the occurrence of local events in a process; Lamport's clock system cannot distinguish between advancements of clocks due to local events and those due to the exchange of messages between processes.
Vector Clocks
 Vector clocks are an algorithm for generating a partial ordering of events in a distributed system and detecting causality violations. Just as with Lamport timestamps, interprocess messages contain the state of the sending process's logical clock. A vector clock for a system of N processes is an array/vector of N logical clocks, one clock per process; a local "smallest possible values" copy of the global clock array is kept in each process, with the following rules for clock updates:
 Initially all clocks are zero.
 Each time a process experiences an internal event, it increments its own logical clock in the vector by one.
 Each time a process prepares to send a message, it increments its own logical clock in the vector by one and then sends its entire vector along with the message being sent.
 Each time a process receives a message, it increments its own logical clock in the vector by one and updates each element in its vector by taking the maximum of the value in its own vector clock and the value in the vector in the received message (for every element).
Vector Clocks
Vector clocks keep track of transitive dependencies among processes, for example for recovery purposes.
Ci[1..n] is a "vector" clock at process Pi whose entries are the "assumed"/"best guess" clock values of the different processes.
Ci[j] (j != i) is Pi's best guess of Pj's clock.
Vector clock rules:
 Ci[i] = Ci[i] + d (d > 0), for successive events in Pi
 For all k, Cj[k] = max(Cj[k], tm[k]), when a message m with timestamp tm is received by Pj from Pi.
Vector Clock Comparison
Equal: ta = tb iff ∀i, ta[i] = tb[i]
Not equal: ta ≠ tb iff ∃i, ta[i] ≠ tb[i]
Less than or equal: ta ≤ tb iff ∀i, ta[i] ≤ tb[i]
Not less than or equal: ta ≰ tb iff ∃i, ta[i] > tb[i]
Less than: ta < tb iff (ta ≤ tb ∧ ta ≠ tb)
Not less than: ta ≮ tb iff ¬(ta < tb)
Concurrent: ta || tb iff ta ≮ tb ∧ tb ≮ ta
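The update rules and comparison relations above can be sketched together; this is an illustrative implementation (class and function names are not from the slides), with d = 1:

```python
# A minimal vector-clock sketch for n processes: update rules plus the
# comparison relations (<=, <, and concurrency).

class VectorClock:
    def __init__(self, n, i):
        self.v = [0] * n   # Ci[1..n], all zero initially
        self.i = i         # index of the owning process Pi

    def tick(self):
        """Ci[i] = Ci[i] + d for a local event (d = 1)."""
        self.v[self.i] += 1

    def send(self):
        """Tick, then ship a copy of the whole vector as the timestamp tm."""
        self.tick()
        return list(self.v)

    def receive(self, tm):
        """Tick, then take the component-wise maximum with tm."""
        self.tick()
        self.v = [max(a, b) for a, b in zip(self.v, tm)]

def leq(ta, tb):
    """ta <= tb iff every component of ta is <= the matching one of tb."""
    return all(a <= b for a, b in zip(ta, tb))

def less(ta, tb):
    """ta < tb iff ta <= tb and ta != tb (causal precedence)."""
    return leq(ta, tb) and ta != tb

def concurrent(ta, tb):
    """ta || tb iff neither timestamp causally precedes the other."""
    return not less(ta, tb) and not less(tb, ta)

# Usage: P0 sends to P1 while P2 works independently.
p0, p1, p2 = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
tm = p0.send()      # p0.v == [1, 0, 0]
p1.receive(tm)      # p1.v == [1, 1, 0]
p2.tick()           # p2.v == [0, 0, 1]
assert less(tm, p1.v)           # the send causally precedes the receive
assert concurrent(p1.v, p2.v)   # P2's event is concurrent with the receive
```

Unlike Lamport clocks, `less(ta, tb)` holds exactly when the event stamped ta causally precedes the one stamped tb, which is what removes the limitation discussed earlier.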
Vector Clock …
Dissemination of time in vector clocks
[Figure: process P1 has events e11 (1,0,0), e12 (2,0,0), e13 (3,4,1); process P2 has events e21 (0,1,0), e22 (2,2,0), e23 (2,3,1), e24 (2,4,1); process P3 has events e31 (0,0,1), e32 (0,0,2). Messages between the processes propagate the vectors, which are merged component-wise on receipt.]
Causal Ordering of Messages
[Figure: P1 sends message M1; P2 later sends message M2, so Send(M1) -> Send(M2). P3 receives M2 first (1) and M1 afterwards (2), violating the causal order.]
Message Ordering …
 Here we do not really worry about maintaining clocks.
 Instead, we order the messages sent and received among all processes in a distributed system.
 E.g., if Send(M1) -> Send(M2), then M1 should be received ahead of M2 by all processes.
 This is not guaranteed by the communication network, since M1 may be from P1 to P2 and M2 may be from P3 to P4.
 Message ordering:
 Deliver a message only if the preceding one has already been delivered.
 Otherwise, buffer it up.
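The buffering rule above can be sketched in the style of the Birman–Schiper–Stephenson protocol (an assumption; the slides do not name an algorithm). Each message carries its sender's id and a vector counting messages sent per process; a receiver delivers a message only once every message that causally precedes it has been delivered, and buffers it otherwise:

```python
# Sketch of causal message delivery via buffering (BSS-style; illustrative).

class CausalReceiver:
    def __init__(self, n):
        self.delivered = [0] * n   # delivered[j]: messages delivered from Pj
        self.buffer = []           # held-back messages: (sender, tm, payload)

    def _deliverable(self, sender, tm):
        # Next-in-sequence from the sender, and no missing causal predecessor.
        return (tm[sender] == self.delivered[sender] + 1 and
                all(tm[k] <= self.delivered[k]
                    for k in range(len(tm)) if k != sender))

    def receive(self, sender, tm, payload):
        """Returns the list of payloads delivered as a result of this arrival."""
        self.buffer.append((sender, tm, payload))
        out, progress = [], True
        while progress:            # drain the buffer until nothing is deliverable
            progress = False
            for msg in list(self.buffer):
                s, t, p = msg
                if self._deliverable(s, t):
                    self.buffer.remove(msg)
                    self.delivered[s] += 1
                    out.append(p)
                    progress = True
        return out

# Usage: M2 causally follows M1 but arrives first, so it is buffered.
r = CausalReceiver(2)
assert r.receive(1, [1, 1], "M2") == []           # waits for P0's message
assert r.receive(0, [1, 0], "M1") == ["M1", "M2"]
```

This is exactly the scenario in the causal-ordering figure: the late-sent message is held back until its causal predecessor arrives.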

Global State
The global state of a distributed system consists of the local states of all processes along with the messages that have been sent but not yet delivered.
On many occasions, the global state of a distributed system needs to be known. In a distributed database system, for example, the local state consists of the database records; records that are temporarily used for communication purposes are removed.
Global State: global states and their transitions in the bank account example
Global State 1: A = $500, B = $200; channels C1 and C2 empty
Global State 2: A = $450, B = $200; C1 carries a $50 transfer (Tx $50), C2 empty
Global State 3: A = $450, B = $250; C1 and C2 empty
Recording Global State...
E.g., suppose the global state of A is recorded at point (1) and not at point (2), while the states of B, C1, and C2 are recorded at (2).
 An extra amount of $50 will appear in the global state.
 Reason: A's state was recorded before sending the message, and C1's state after sending the message.
The global state is inconsistent if n < n', where
 n is the number of messages sent by A along the channel before A's state was recorded, and
 n' is the number of messages sent by A along the channel before the channel's state was recorded.
Consistent global state: n = n'.
Recording Global State...
Similarly, for consistency, m = m', where
 m' is the number of messages received along the channel before B's state was recorded, and
 m is the number of messages received along the channel by B before the channel's state was recorded.
Also, n' >= m, since in no system can the number of messages received along a channel exceed the number sent along it.
Hence, n >= m.
A consistent global state should satisfy the above equation.
Consistent global state:
 Channel state: the sequence of messages sent before recording the sender's state, excluding the messages received before the receiver's state was recorded.
 Only transit messages are recorded in the channel state.
Recording Global State
send(Mij): message M sent from Si to Sj
rec(Mij): message M received by Sj from Si
time(x): time of event x
LSi: local state at Si
send(Mij) is in LSi iff (if and only if) time(send(Mij)) < time(LSi)
rec(Mij) is in LSj iff time(rec(Mij)) < time(LSj)
transit(LSi, LSj): the set of messages sent/recorded at LSi and NOT received/recorded at LSj
Recording Global State …
inconsistent(LSi, LSj): the set of messages NOT sent/recorded at LSi but received/recorded at LSj
Global state: GS = {LS1, LS2, …, LSn}
Consistent global state: GS = {LS1, …, LSn} such that for all i, j, inconsistent(LSi, LSj) is empty.
Transitless global state: GS = {LS1, …, LSn} such that for all i, j, transit(LSi, LSj) is empty.
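These definitions translate directly into set operations; the sketch below is illustrative (each recorded local state is modeled as a dict of its recorded send and receive events, a representation chosen here, not one from the slides):

```python
# Sketch of the transit/inconsistent definitions and the consistency test.

def transit(ls_i, ls_j):
    """Messages sent/recorded at LSi but not received/recorded at LSj."""
    return {m for m in ls_i["sent"] if m not in ls_j["received"]}

def inconsistent(ls_i, ls_j):
    """Messages received/recorded at LSj whose send is NOT recorded at LSi."""
    return {m for m in ls_j["received"] if m not in ls_i["sent"]}

def is_consistent(states):
    """GS = {LS1..LSn} is consistent iff every inconsistent set is empty."""
    return all(not inconsistent(si, sj)
               for si in states for sj in states if si is not sj)

# Usage: S2 has recorded the receipt of M2 although S1's state does not
# record the send -- the inconsistent case; M1's send is recorded but its
# receipt is not -- the transit case.
ls1 = {"sent": {"M1"}, "received": set()}
ls2 = {"sent": set(), "received": {"M2"}}
assert transit(ls1, ls2) == {"M1"}
assert inconsistent(ls1, ls2) == {"M2"}
assert not is_consistent([ls1, ls2])
```

A transitless global state would additionally require every `transit` set to be empty; combining both conditions gives the strongly consistent state discussed later.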

Recording Global State ..
[Figure: S1 records LS1 and S2 records LS2. M1's send is recorded in LS1 but its receipt is not recorded in LS2, so M1 is in transit. M2's send is not recorded in LS1 but its receipt is recorded in LS2, so M2 is inconsistent.]
Recording Global State...
Strongly consistent global state: consistent and transitless, i.e., all send events and the corresponding receive events are recorded in the local states LSi.
[Figure: three processes with successive recorded local states LS11, LS12 at the first; LS21, LS22, LS23 at the second; LS31, LS32, LS33 at the third.]
Requirements of Mutual Exclusion Algorithms
The primary objective of a mutual exclusion algorithm is to maintain mutual exclusion: that is, to guarantee that only one request accesses the critical section at a time. In addition, the following characteristics are considered important in a mutual exclusion algorithm:
 Freedom from deadlocks: two or more sites should not endlessly wait for messages that will never arrive.
 Freedom from starvation: a site should not be forced to wait indefinitely to execute the critical section while other sites are repeatedly executing it. That is, every requesting site should get an opportunity to execute the critical section in finite time.
 Fairness: fairness dictates that requests must be executed in the order in which they are made. Since a physical global clock does not exist, time is determined by logical clocks. Note that fairness implies freedom from starvation, but not vice versa.
 Fault tolerance: a mutual exclusion algorithm is fault-tolerant if, in the wake of a failure, it can reorganize itself so that it continues to function without any disruptions.
DEADLOCK HANDLING STRATEGIES IN DISTRIBUTED SYSTEMS
 There are three strategies to handle deadlocks: deadlock prevention, deadlock avoidance, and deadlock detection.
 Deadlock prevention: commonly achieved either by having a process acquire all the needed resources simultaneously before it begins execution, or by preempting a process that holds a needed resource.
 Deadlock avoidance: in the deadlock-avoidance approach, a resource is granted to a process only if the resulting global system state is safe. Deadlock avoidance is impractical in distributed systems because of the following problems:
 Every site has to maintain information on the global state of the system, which translates into huge storage requirements and extensive communication costs.
 The process of checking for a safe global state must be mutually exclusive, because if several sites concurrently perform checks for a safe global state, they may all find the state safe even though the resulting global state is not.
 Due to the large number of processes and resources, it is computationally expensive to check for a safe state.
 Deadlock detection: deadlock detection in distributed systems has two favorable conditions:
1. Once a cycle is formed in the wait-for graph (WFG), it persists until it is detected and broken.
2. Cycle detection can proceed concurrently with the normal activities of the system.
Control Organizations for Distributed Deadlock Detection:
Centralized control
Distributed control
Hierarchical control

 Centralized control: in centralized deadlock-detection algorithms, a designated site has the responsibility of constructing the global WFG and searching it for cycles. However, centralized deadlock-detection algorithms have a single point of failure. Also, the message traffic generated by the deadlock detection activity is independent of the rate of deadlock formation and the structure of the deadlock cycles.
 Distributed control: in distributed deadlock-detection algorithms, the responsibility for detecting a global deadlock is shared equally among all sites. The global state graph is spread over many sites, and several sites participate in the detection of a global cycle.
Unlike centralized control, distributed control is not vulnerable to a single point of failure, and no site is swamped with deadlock detection activity.
Also, deadlock detection is initiated only when a waiting process is suspected to be part of a deadlock cycle.
 Hierarchical control: in hierarchical deadlock-detection algorithms, sites are arranged in a hierarchical fashion, and a site detects deadlocks involving only its descendant sites.
Hierarchical algorithms exploit access patterns local to a cluster of sites to detect deadlocks efficiently.
They tend to get the best of both the centralized and the distributed control organizations, in that there is no single point of failure and a site is not bogged down by deadlock detection activities with which it is not concerned.
Centralized Deadlock Detection Algorithms
The completely centralized algorithm is the simplest centralized deadlock-detection algorithm: a designated site, called the control site, maintains the WFG of the entire system and checks it for the existence of deadlock cycles.
All sites request and release resources by sending request-resource and release-resource messages to the control site.
When the control site receives a request-resource or a release-resource message, it updates its WFG accordingly.
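The cycle check performed by the control site can be sketched as a depth-first search over the WFG; this is an illustrative sketch (the slides do not prescribe a particular search), with process names made up for the example:

```python
# Sketch of the control site's cycle check: the WFG is an adjacency map
# from each process to the processes it is waiting for.

def has_cycle(wfg):
    """Depth-first search for a cycle in the wait-for graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {p: WHITE for p in wfg}

    def visit(p):
        color[p] = GREY                      # p is on the current DFS path
        for q in wfg.get(p, ()):
            if color.get(q, WHITE) == GREY:  # back edge: a wait-for cycle
                return True
            if color.get(q, WHITE) == WHITE and visit(q):
                return True
        color[p] = BLACK                     # fully explored, no cycle via p
        return False

    return any(color[p] == WHITE and visit(p) for p in wfg)

# Usage: P1 waits for P2, P2 for P3, P3 for P1 -- a deadlock.
wfg = {"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}
assert has_cycle(wfg)
assert not has_cycle({"P1": ["P2"], "P2": ["P3"], "P3": []})
```

In the completely centralized scheme, the control site would run such a check after applying each request-resource update to its WFG.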
Distributed Deadlock Detection Algorithms
In distributed deadlock-detection algorithms, all sites collectively cooperate to detect a cycle in the state graph, which is likely to be distributed over several sites of the system.
A distributed deadlock-detection algorithm can be initiated whenever a process is forced to wait, either by the local site of the process or by the site where the process waits.
Distributed deadlock-detection algorithms can be divided into four classes: path-pushing, edge-chasing, diffusion computation, and global state detection.
 In path-pushing algorithms, the wait-for dependency information of the global WFG is disseminated in the form of paths.
 In edge-chasing algorithms, special messages called probes are circulated along the edges of the WFG to detect a cycle. A process declares a deadlock when it receives a probe initiated by itself. An interesting feature of edge-chasing algorithms is that probes are of fixed size.
 Diffusion-computation algorithms make use of echo algorithms to detect deadlocks. To detect a deadlock, a process sends out query messages along all of its outgoing edges in the WFG. These queries are successively propagated through the edges of the WFG.
 Global-state-detection-based deadlock detection algorithms exploit the following facts:
 A consistent snapshot of a distributed system can be obtained without freezing the underlying computation.
 A consistent snapshot may not represent the system state at any moment in time, but if a stable property holds in the system before the snapshot collection is initiated, this property will still hold in the snapshot.
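The edge-chasing idea can be sketched in the style of the Chandy–Misra–Haas algorithm (an assumption; the slides only describe the class). A blocked process forwards a fixed-size probe (initiator, sender, receiver) along its wait-for edges, and the initiator declares a deadlock if its own probe returns:

```python
# Sketch of edge-chasing deadlock detection with fixed-size probes.
# wait_for[p] is the set of processes that p is blocked on.

def detect_deadlock(initiator, wait_for):
    seen = set()
    frontier = [(initiator, initiator, q) for q in wait_for.get(initiator, ())]
    while frontier:
        init, sender, receiver = frontier.pop()   # one probe message
        if receiver == init:
            return True                           # own probe came back: deadlock
        if receiver in seen:
            continue                              # probe already forwarded here
        seen.add(receiver)
        for q in wait_for.get(receiver, ()):      # still blocked: forward probe
            frontier.append((init, receiver, q))
    return False

# Usage: a three-process wait-for cycle.
wf = {"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}}
assert detect_deadlock("P1", wf)
assert not detect_deadlock("P1", {"P1": {"P2"}, "P2": set()})
```

Note that each probe is a fixed-size triple regardless of how long the wait-for path is, which is the feature highlighted above; in a real system the forwarding would happen via messages between sites rather than a local worklist.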

HIERARCHICAL DEADLOCK DETECTION ALGORITHMS
In hierarchical algorithms, sites are arranged in a hierarchical fashion, and a site is responsible for detecting deadlocks involving only its children sites.

These algorithms take advantage of access patterns that are localized to a cluster of sites to optimize performance.
