Unit-9 Distributed Operating System
Introduction
A distributed operating system is an operating system that runs on several machines and provides
a useful set of services, generally making the collection of machines behave more like a single
machine. The distributed operating system plays the same role in making the collective resources
of the machines more usable that a typical single-machine operating system plays in making that
machine's resources more usable. Usually, the machines controlled by a distributed operating
system are connected by a relatively high-quality network, such as a high-speed local area
network. Most commonly, the participating nodes of the system are in a relatively small
geographical area, something between an office and a campus.
A distributed OS involves a collection of autonomous computer systems, capable of
communicating and cooperating with each other through a LAN or WAN. A distributed OS provides
a virtual machine abstraction to its users and wide sharing of resources such as computational
capacity, I/O devices, and files. A distributed operating system (DOS) is thus a model in which
distributed applications run on multiple computers linked by a communication network.
Advantages of Distributed System
1. Scalability:
As computing occurs on each node independently, it is simple and inexpensive to add more
nodes and functionality as required.
2. Reliability:
Most distributed systems are made from many nodes that work together which ultimately
make them fault tolerant. The system doesn’t experience any disruptions if a single
machine fails.
3. Performance:
These systems are regarded as very efficient because the workload can be broken up and sent
to multiple machines, reducing overall data-processing time.
4. Data sharing:
Nodes can easily share data with other nodes as they are connected with each other.
5. No domino effect in case of a node failure:
The failure of one node in a DOS does not cause a domino effect that makes all the other
nodes fail. The remaining nodes can still communicate with each other despite the failure.
6. Shareable:
Resources such as printers can be shared by multiple nodes rather than being
constrained to just one node.
Disadvantages of Distributed System
1. Scheduling:
The system has to decide which jobs need to be executed, when they should be executed,
and where they should be executed. A scheduler has limitations, which may lead to
under-utilised hardware and unpredictable runtimes.
2. Latency:
The more widely distributed a system is, the more latency its communications experience.
This forces teams and developers to make trade-offs between availability, consistency,
and latency.
3. Observability:
It can be a real challenge to gather, process, present, and monitor hardware usage metrics
for large clusters.
4. Security:
It is difficult to place adequate security in DOS, as the nodes and the connections need to
be secured.
5. Data loss:
Some data/messages may be lost in the network while moving from one node to another.
6. Complicated database:
In comparison to a single user system, the database connected to a DOS is relatively
complicated and difficult to handle.
7. Overloading:
If multiple nodes in DOS send data all at once, then the system network may become
overloaded.
8. Expensive:
These systems are not readily available, as they are regarded as very expensive.
9. Complex software:
Underlying software is highly complex and is not understood very well compared to other
systems.
Network Operating System
A network operating system (NOS) is a computer operating system (OS) that is designed primarily
to support workstations, personal computers and, in some instances, older terminals that are
connected on a local area network (LAN).
Network Operating System runs on a server and gives the server the capability to manage data,
users, groups, security, applications, and other networking functions. The basic purpose of the
network operating system is to allow shared file and printer access among multiple computers in
a network, typically a local area network (LAN), a private network or to other networks.
Some examples of network operating systems include Microsoft Windows Server 2003, Microsoft
Windows Server 2008, UNIX, Linux, Mac OS X, Novell NetWare, and BSD.
Difference between Network OS and Distributed OS
The main difference between these two operating systems is that in a network operating system
each node can run its own operating system, whereas in a distributed operating system every
node runs the same operating system.
Table: Network Operating System vs Distributed Operating System
Hardware Concepts
All distributed systems consist of multiple CPUs. There are several different ways the hardware
can be arranged. The important thing about the hardware is how the machines are interconnected
and how they communicate with each other. It is important to take a deep look at distributed
system hardware, in particular, how the machines are connected together and how they interact.
Many classification schemes for multiple-CPU computer systems have been proposed over the
years, but none of them has really caught on. The most commonly used taxonomy is still
Flynn's (1972), even though it is quite rudimentary. In this scheme, Flynn considered only two
characteristics: the number of instruction streams and the number of data streams.
1. Single Instruction, Single Data Stream (SISD)
A computer with a single instruction stream and a single data stream is called SISD. All traditional
uni-processor computers (i.e., those having only one CPU) fall under this category, from personal
computers to large mainframes. SISD flow concept is given in the figure below.
Fig: SISD flow structure
2. Single Instruction, Multiple Data Stream (SIMD)
An SIMD machine has a single instruction stream that operates on multiple data streams in
parallel: one control unit fetches each instruction and directs many processing elements to
apply it, each to its own data. Array processors and the vector units of modern CPUs fall
into this category.
3. Multiple Instruction, Single Data Stream (MISD)
In an MISD machine, multiple instruction streams operate on a single data stream. Few, if
any, practical computers fit this category.
Fig: MISD flow structure
4. Multiple Instruction, Multiple Data Stream (MIMD)
The last category is MIMD, in which multiple instruction streams operate on multiple data
streams. This means a group of independent computers, each with its own program counter,
program, and data. All distributed systems are MIMD, so this classification system is not
very useful for our purposes.
Loosely coupled software allows machines and users of a distributed system to be fundamentally independent
of one another. Consider a group of personal computers, each of which has its own CPU, its
own memory, its own hard disk, and its own operating system, but which share some resources,
such as laser printers and databases over a LAN. This system is loosely coupled, since the
individual machines are clearly distinguishable, each with its own job to do. If the network
should go down for some reason, the individual machines can still continue to run to a
considerable degree, although some functionality may be lost.
For a tightly coupled system, consider a multiprocessor dedicated to running a single chess
program in parallel. Each CPU is assigned a board to evaluate and it spends its time examining
that board and all the boards that can be generated from it. When the evaluation is finished, the
CPU reports back the results and is given a new board to work on.
1. Distributed Shared Memory
Distributed shared memory (DSM) gives processes running on different machines the illusion of
a single shared address space; the DSM runtime moves and replicates memory pages between
machines on demand.
Fig: Distributed shared memory
Advantages
- Programs written for shared-memory multiprocessors can run with little or no change, since
no explicit message passing is required.
- Complex data structures can be passed by reference instead of being copied between machines.
Disadvantages
- Keeping replicated pages consistent adds communication overhead.
- Access to remote memory is much slower than access to local memory, which makes performance
hard to predict.
2. Message Passing
The message passing model allows multiple processes to read and write data to a message queue
without being connected to each other. Messages are stored on the queue until their recipient
retrieves them. Message queues are quite useful for interprocess communication and are used by
most operating systems.
Communication in the message passing paradigm, in its simplest form, is performed using the
send() and receive() primitives. The syntax is generally of the form:
send(receiver, message)
receive(sender, message)
The send() primitive requires the name of the destination process and the message data as
parameters. The addition of the name of the sender as a parameter for the send() primitive would
enable the receiver to acknowledge the message. The receive() primitive requires the name of the
anticipated sender and should provide a storage buffer for the message.
Fig: Message passing between two machines. Task 0 on Machine A send()s data that Task 1 on
Machine B recv()s, while Task 3 on Machine B send()s data that Task 2 on Machine A recv()s.
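The send()/receive() exchange above can be sketched in Python with a queue shared between two threads (a minimal sketch; the names send, receive, and task0 are illustrative, and a thread-safe queue stands in for the network):

```python
import queue
import threading

def send(mailbox, sender_name, message):
    # send(receiver, message): post the message on the receiver's queue;
    # including the sender's name lets the receiver acknowledge it.
    mailbox.put((sender_name, message))

def receive(mailbox):
    # receive(sender, message): block until a message is available,
    # then return the sender's name and the message data.
    return mailbox.get()

if __name__ == "__main__":
    mailbox = queue.Queue()  # messages are stored here until retrieved
    t = threading.Thread(target=send, args=(mailbox, "task0", "hello"))
    t.start()
    print(receive(mailbox))  # prints ('task0', 'hello')
    t.join()
```

Note that the queue decouples the two sides: the sender can post its message before the receiver is ready, just as the text describes.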
3. Remote Procedure Call
Message passing leaves the programmer with the burden of the explicit control of the movement
of data. Remote procedure calls (RPC) relieve this burden by increasing the level of abstraction
and providing semantics similar to a local procedure call.
Remote Procedure Call (RPC) is a powerful technique for constructing distributed, client-server-
based applications. It is based on extending the conventional local procedure calling so that
the called procedure need not exist in the same address space as the calling procedure. The two
processes may be on the same system, or they may be on different systems with a network
connecting them.
The client process blocks at the call() until the reply is received. The remote procedure is the
server process, which has already begun executing on a remote machine. It blocks at the receive()
until it receives a message and parameters from the sender. The server then sends a reply() when
it has finished its task. The syntax of the reply primitive is as follows:
reply(caller, result_parameters)
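The call/receive/reply pattern can be sketched with Python's standard xmlrpc module (a minimal illustration, not the text's own system; the port number and the add procedure are assumptions):

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def add(a, b):
    # The remote procedure: it executes in the server's address space.
    return a + b

# Server side: register the procedure and serve requests in the background.
server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the proxy makes the remote call look like a local one;
# the caller blocks until the reply arrives.
proxy = ServerProxy("http://localhost:8000")
print(proxy.add(2, 3))  # prints 5
```

Here the proxy object hides the message passing entirely: marshalling the parameters, sending the request, and blocking for the reply all happen behind an ordinary-looking function call, which is exactly the abstraction RPC aims for.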
Clock Synchronization
Distributed System is a collection of computers connected via a high-speed communication
network. In a distributed system, the hardware and software components communicate and
coordinate their actions by message passing. Each node in a distributed system can share its
resources with other nodes, so there is a need for proper allocation of resources to preserve
the state of resources and to coordinate the several processes. To resolve such conflicts,
synchronization is used. Synchronization in distributed systems is achieved via clocks.
In a centralized system, time is unambiguous: every process reads the same clock. In distributed
systems, this is not the case. Unfortunately, each system has its own timer that drives its clock.
These timers are based either on the oscillation of a quartz crystal or an equivalent IC.
Although they are reasonably precise, stable, and accurate, they are not perfect. This means that
the clocks will drift away from the true time. Each timer has different characteristics --
characteristics that might change with time, temperature, etc. This implies that each system's
time will drift away from the true time at a different rate -- and perhaps in a different
direction (slow or fast).
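A toy calculation illustrates how quickly independent clocks drift apart (the drift rates here are made-up values for illustration):

```python
def drifted_time(true_seconds, drift_rate):
    # A clock with drift rate r reads (1 + r) * t after t true seconds.
    return true_seconds * (1 + drift_rate)

if __name__ == "__main__":
    day = 24 * 3600
    # A clock running fast by 50 ppm and one running slow by 30 ppm
    # move apart by roughly 6.9 seconds over a single day.
    fast = drifted_time(day, 50e-6)
    slow = drifted_time(day, -30e-6)
    print(round(fast - slow, 1))  # prints 6.9
```

Even these modest, quartz-like drift rates accumulate to seconds per day, which is why the synchronization algorithms below are needed.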
Berkeley algorithm
- The time daemon polls each machine periodically, asking it for the time.
- Based on the answers, it computes the average time and tells all the other machines to
advance their clocks to the new time or slow their clocks down until some specified
reduction has been achieved.
- The time daemon's own time must be set manually by an operator periodically.
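The averaging step can be sketched as follows (a simplified single-round calculation; the function name and the sample readings are illustrative):

```python
def berkeley_adjustments(daemon_time, client_times):
    """Return the offset each machine should apply, Berkeley-style.

    daemon_time  -- the time daemon's own clock reading
    client_times -- clock readings polled from the other machines
    """
    all_times = [daemon_time] + client_times
    average = sum(all_times) / len(all_times)
    # Every machine (the daemon included) is told how far to move its
    # clock: slow clocks advance, fast clocks are slowed down gradually.
    return [average - t for t in all_times]

if __name__ == "__main__":
    # Daemon reads 180, clients read 205 and 170; the average is 185,
    # so the offsets are +5, -20, and +15 respectively.
    print(berkeley_adjustments(180, [205, 170]))  # prints [5.0, -20.0, 15.0]
```

In a real implementation the daemon would also compensate for message transit time and would slow fast clocks gradually rather than setting them back, since time should never run backwards.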
Logical Clock
Logical Clocks refer to implementing a protocol on all machines within your distributed
system, so that the machines are able to maintain consistent ordering of events within some
virtual timespan. A logical clock is a mechanism for capturing chronological and causal
relationships in a distributed system. Distributed systems may have no physically
synchronous global clock, so a logical clock allows global ordering on events from different
processes in such systems.
Lamport’s algorithm
Each message carries a timestamp of the sender's clock.
When a message arrives:
- If the receiver's clock < the message timestamp, set the receiver's clock to
(message timestamp + 1);
- else do nothing.
Clock must be advanced between any two events in the same process.
# event is known
time = time + 1;
# event happens
send(message, time);
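The rules above can be collected into a small class (a minimal sketch; the class and method names are not from the text):

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        # The clock must be advanced between any two events in a process.
        self.time += 1
        return self.time

    def send_event(self):
        # Sending is an event: tick, then use the clock as the timestamp.
        return self.tick()

    def receive_event(self, message_timestamp):
        # If the receiver's clock < the message timestamp, set the clock
        # to (message timestamp + 1); else do nothing.
        if self.time < message_timestamp:
            self.time = message_timestamp + 1
        return self.time

if __name__ == "__main__":
    p1, p2 = LamportClock(), LamportClock()
    ts = p1.send_event()          # P1's clock becomes 1
    print(p2.receive_event(ts))   # prints 2: P2 jumps to ts + 1
```

The correction in receive_event guarantees that every message is received "after" it was sent according to the logical clocks, which is the property the figures below illustrate.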
Fig: Lamport's algorithm. Three processes P1, P2, and P3 run clocks that tick 6, 8, and 10
units per step. Messages m1 (P1 to P2) and m2 (P2 to P3) satisfy T(P1) < T(P2) and
T(P2) < T(P3), but m3 (P3 to P2) arrives with T(P3) not < T(P2), and m4 (P2 to P1) arrives
with T(P2) not < T(P1), so the receivers' clocks must be corrected.
Correcting the clock
Fig: Correcting the clock. When m3 (timestamp 60) arrives, P2 advances its clock from 56 to 61
(adding 5); when m4 (timestamp 69) arrives, P1 advances its clock from 54 to 70 (adding 16).