
Introduction To Distributed Systems


Chapter 1

Introduction to Distributed Systems


Introduction:

Distributed systems are everywhere. The Internet enables users throughout the world to access its
services wherever they may be located. Each organization manages an intranet, which provides
local services and Internet services for local users and generally provides services to other users
in the Internet. Small distributed systems can be constructed from mobile computers and other
small computational devices that are attached to a wireless network.
Resource sharing is the main motivating factor for constructing distributed systems. Resources
such as printers, files, web pages or database records are managed by servers of the appropriate
type. For example, web servers manage web pages and other web resources. Resources are
accessed by clients – for example, the clients of web servers are generally called browsers.
We define a distributed system as one in which hardware or software components located at
networked computers communicate and coordinate their actions only by passing messages. This
simple definition covers the entire range of systems in which networked computers can usefully
be deployed.
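The definition above can be made concrete with a minimal sketch: two components share no memory and coordinate only by exchanging messages. Here a Python queue stands in for the network channel, and the message contents and component roles are purely illustrative.

```python
import queue
import threading

# Two channels stand in for the network; the components never share state.
channel_to_server = queue.Queue()
channel_to_client = queue.Queue()

def server():
    # The server acts only on messages it receives, never on shared memory.
    msg = channel_to_server.get()
    channel_to_client.put(f"reply to {msg}")

t = threading.Thread(target=server)
t.start()
channel_to_server.put("request-1")   # client sends a message...
reply = channel_to_client.get()      # ...and coordinates via the reply
t.join()
print(reply)  # reply to request-1
```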
Definition: - A distributed system is a collection of independent computers that appear to the
users of the system as a single computer.

Figure 1.1: A distributed system


Figure 1.1 shows four networked computers and three applications, of which application B is
distributed across computers 2 and 3. Each application is offered the same interface. The
distributed system provides the means for components of a single distributed application to
communicate with each other, and also lets different applications communicate. At the same
time, it hides, as best and reasonable as possible, the differences in hardware and operating
systems from each application.
Consequences of Distributed Systems:
Concurrency: In a network of computers, concurrent program execution is the norm. I can do
my work on my computer while you do your work on yours, sharing resources such as web
pages or files when necessary. The capacity of the system to handle shared resources can be
increased by adding more resources (for example, computers) to the network. We will describe
ways in which this extra capacity can be usefully deployed at many points in this book. The
coordination of concurrently executing programs that share resources is also an important and
recurring topic.
No global clock: When programs need to cooperate they coordinate their actions by exchanging
messages. Close coordination often depends on a shared idea of the time at which the programs’
actions occur. But it turns out that there are limits to the accuracy with which the computers in a
network can synchronize their clocks – there is no single global notion of the correct time. This
is a direct consequence of the fact that the only communication is by sending messages through a
network.
Independent failures: All computer systems can fail, and it is the responsibility of system
designers to plan for the consequences of possible failures. Distributed systems can fail in new
ways. Faults in the network result in the isolation of the computers that are connected to it, but
that doesn’t mean that they stop running. In fact, the programs on them may not be able to detect
whether the network has failed or has become unusually slow. Similarly, the failure of a
computer, or the unexpected termination of a program somewhere in the system (a crash), is not
immediately made known to the other components with which it communicates. Each component
of the system can fail independently, leaving the others still running. The consequences of this
characteristic of distributed systems will be a recurring theme throughout the book.
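A small sketch of this ambiguity, with an in-process thread standing in for a remote peer: from the caller's point of view, a crashed peer and a slow network both manifest as a missing reply, so failure can only be suspected via a timeout. All names and the timeout value are illustrative.

```python
import queue
import threading

def unreliable_peer(inbox, outbox, crashed):
    if crashed:
        return  # the peer stops silently; it sends nothing at all
    msg = inbox.get()
    outbox.put("ack:" + msg)

def call_with_timeout(crashed, timeout=0.2):
    inbox, outbox = queue.Queue(), queue.Queue()
    threading.Thread(target=unreliable_peer, args=(inbox, outbox, crashed)).start()
    inbox.put("hello")
    try:
        return outbox.get(timeout=timeout)
    except queue.Empty:
        # Could equally be a slow link -- the caller cannot tell the difference.
        return "suspected failure"

print(call_with_timeout(crashed=False))  # ack:hello
print(call_with_timeout(crashed=True))   # suspected failure
```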
Examples of Distributed Systems:

1. Network of workstations

2. Automatic banking (teller machine) system

3. The cloud

Advantages and Disadvantages of Distributed Systems:


Distributed Computing Systems:
An important class of distributed systems is the one used for high-performance computing tasks.
Roughly speaking, one can make a distinction between two subgroups. In cluster computing the
underlying hardware consists of a collection of similar workstations or PCs, closely connected by
means of a high speed local-area network. In addition, each node runs the same operating
system. The situation becomes quite different in the case of grid computing. This subgroup
consists of distributed systems that are often constructed as a federation of computer systems,
where each system may fall under a different administrative domain, and may be very different
when it comes to hardware, software, and deployed network technology.
 Cluster Computing Systems
Cluster computing systems became popular when the price/performance ratio of personal
computers and workstations improved. In virtually all cases, cluster computing is used for
parallel programming, in which a single (compute-intensive) program is run in parallel on
multiple machines.

One well-known example of a cluster computer is formed by Linux-based Beowulf clusters, of
which the general configuration is shown in Fig. 1-6. Each cluster consists of a collection of
compute nodes that are controlled and accessed by means of a single master node. The master
typically handles the allocation of nodes to a particular parallel program, maintains a batch queue
of submitted jobs, and provides an interface for the users of the system. As such, the master
actually runs the middleware needed for the execution of programs and management of the
cluster, while the compute nodes often need nothing else but a standard operating system.
 Grid Computing Systems
A key issue in a grid computing system is that resources from different organizations are brought
together to allow the collaboration of a group of people or institutions. Such collaboration is
realized in the form of a virtual organization. The people belonging to the same virtual
organization have access rights to the resources that are provided to that organization. Typically,
resources consist of computer servers (including supercomputers, possibly implemented as
cluster computers), storage facilities, and databases. In addition, special networked devices such
as telescopes, sensors, etc., can be provided as well.
The architecture consists of four layers. The lowest fabric layer provides interfaces to local
resources at a specific site. The connectivity layer consists of communication protocols for
supporting grid transactions that span the usage of multiple resources. The resource layer is
responsible for managing a single resource. It uses the functions provided by the connectivity
layer and calls directly the interfaces made available by the fabric layer. The next layer in the
hierarchy is the collective layer; it deals with handling access to multiple resources and typically
consists of services for resource discovery, allocation and scheduling of tasks onto multiple
resources, data replication, and so on.
Challenges:
The construction of distributed systems produces many challenges:
Heterogeneity: Distributed systems must be constructed from a variety of different networks,
operating systems, computer hardware and programming languages. The Internet communication
protocols mask the differences between networks, and middleware can deal with the other differences.
Heterogeneity (that is, variety and difference) applies to all of the following:
• Networks;
• Computer Hardware;
• Operating Systems;
• Programming Languages;
• Implementations by different developers
Openness: Distributed systems should be extensible – the first step is to publish the interfaces of
the components, but the integration of components written by different programmers is a real
challenge.
Security: Encryption can be used to provide adequate protection of shared resources and to keep
sensitive information secret when it is transmitted in messages over a network. Denial of service
attacks are still a problem.
Scalability: A distributed system is scalable if the cost of adding a user is a constant amount in
terms of the resources that must be added. The algorithms used to access shared data should
avoid performance bottlenecks and data should be structured hierarchically to get the best access
times. Frequently accessed data can be replicated.
Failure handling: Any process, computer or network may fail independently of the others.
Therefore each component needs to be aware of the possible ways in which the components it
depends on may fail and be designed to deal with each of those failures appropriately.
Concurrency: The presence of multiple users in a distributed system is a source of concurrent
requests to its resources. Each resource must be designed to be safe in a concurrent environment.
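As a minimal illustration of making a resource safe in a concurrent environment, the sketch below guards a shared counter with a lock so that concurrent increments do not lose updates (the names are illustrative):

```python
import threading

class Counter:
    """A shared resource made safe for concurrent requests with a lock."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # mutual exclusion around the read-modify-write
            self._value += 1

    @property
    def value(self):
        return self._value

counter = Counter()
threads = [threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 4000 -- no update is lost despite 4 concurrent users
```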
Transparency: The aim is to make certain aspects of distribution invisible to the application
programmer so that they need only be concerned with the design of their particular application.
For example, they need not be concerned with its location or the details of how its operations are
accessed by other components, or whether it will be replicated or migrated. Even failures of
networks and processes can be presented to application programmers in the form of exceptions –
but they must be handled.
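A sketch of the last point: a hypothetical remote fetch() surfaces a network failure as an exception, and the application handles it with a bounded retry. The failure behavior is simulated; none of these names are a real network API.

```python
class NetworkError(Exception):
    pass

def fetch(resource, _fail_times=[2]):
    # Illustrative stand-in for a remote call; fails a fixed number of times.
    if _fail_times[0] > 0:
        _fail_times[0] -= 1
        raise NetworkError("timeout talking to " + resource)
    return "contents of " + resource

def fetch_with_retry(resource, attempts=3):
    for attempt in range(attempts):
        try:
            return fetch(resource)
        except NetworkError:
            if attempt == attempts - 1:
                raise  # transparency has limits: the failure surfaces here

result = fetch_with_retry("page.html")
print(result)  # contents of page.html
```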
Quality of service: It is not sufficient to provide access to services in distributed systems. In
particular, it is also important to provide guarantees regarding the qualities associated with such
service access. Examples of such qualities include parameters related to performance, security
and reliability.
Chapter-2
Distributed Computing System Models and Issues

Distributed Computing System Models:

Various models are used for building distributed computing systems. These models can be
broadly classified into the following categories – minicomputer, workstation, processor pool,
workstation-server and hybrid. They are briefly described below.

Minicomputer Model:

 The minicomputer model is a simple extension of the centralized time-sharing system. As
shown in Figure 1.2, a distributed computing system based on this model consists of a
few minicomputers (they may be large supercomputers as well) interconnected by a
communication network. Each minicomputer usually has multiple users simultaneously
logged on to it. For this, several interactive terminals are connected to each
minicomputer. Each user is logged on to one specific minicomputer, with remote access
to other minicomputers. The network allows a user to access remote resources that are
available on some machine other than the one on to which the user is currently logged.
 The minicomputer model may be used when resource sharing (Such as sharing of
information databases of different types, with each type of database located on a different
machine) with remote users is desired.
 The early ARPANET is an example of a distributed computing system based on the
minicomputer model.
Workstation Model:

 As shown in Fig. 1.3, a distributed computing system based on the workstation model
consists of several workstations interconnected by a communication network. A
company’s office or a university department may have several workstations scattered
throughout a building or campus, each workstation equipped with its own disk and
serving as a single-user computer.

 It has been often found that in such an environment, at any one time (especially at night),
a significant proportion of the workstations are idle (not being used), resulting in the
waste of large amounts of CPU time. Therefore, the idea of the workstation model is to
interconnect all these workstations by a high speed LAN so that idle workstations may be
used to process jobs of users who are logged onto other workstations and do not have
sufficient processing power at their own workstations to get their jobs processed
efficiently.

 In this model, a user logs onto one of the workstations, called his or her “home”
workstation, and submits jobs for execution. When the system finds that the user’s
workstation does not have sufficient processing power for executing the processes of the
submitted jobs efficiently, it transfers one or more of the processes from the user’s
workstation to some other workstation that is currently idle, gets the processes executed
there, and finally returns the result of execution to the user’s workstation.

Processor Pool Model:

The processor-pool model is based on the observation that most of the time a user does not
need any computing power but once in a while he or she may need a very large amount of
computing power for a short time. (e.g., when recompiling a program consisting of a large
number of files after changing a basic shared declaration). Therefore, unlike the
workstation-server model in which a processor is allocated to each user, in the processor-pool model the
processors are pooled together to be shared by the users as needed. The pool of processors
consists of a large number of microcomputers and minicomputers attached to the network. Each
processor in the pool has its own memory to load and run a system program or an application
program of the distributed computing system.

In the pure processor-pool model, the processors in the pool have no terminals attached directly
to them, and users access the system from terminals that are attached to the network via special
devices. These terminals are either small diskless workstations or graphic terminals, such as X
terminals. A special server (Called a run server) manages and allocates the processors in the pool
to different users on a demand basis. When a user submits a job for computation, an appropriate
number of processors are temporarily assigned to his or her job by the run server. For example, if
the user’s computation job is the compilation of a program having n segments, in which each of
the segments can be complied independently to produce separate re-locatable object files, n
processors from the pool can be allocated to this job to compile all the n segments in parallel.
When the computation is completed, the processors are returned to the pool for use by other
users.
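The compilation example can be sketched as follows, with a thread pool standing in for the run server's pool of processors and a stub standing in for the compiler (all names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def compile_segment(segment):
    # Stand-in for compiling one independent segment into an object file.
    return segment.rsplit(".", 1)[0] + ".o"

# A job with n independent segments is given n workers from the pool,
# each "compiling" one segment in parallel.
segments = ["main.c", "util.c", "io.c"]
with ThreadPoolExecutor(max_workers=len(segments)) as pool:
    object_files = list(pool.map(compile_segment, segments))

print(object_files)  # ['main.o', 'util.o', 'io.o']
```

When the `with` block exits, the workers are released, mirroring the return of processors to the pool after the computation completes.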

In the processor-pool model there is no concept of a home machine. That is, a user does not log
onto a particular machine but to the system as a whole.

Workstation-Server Model:


Out of the models described above, the workstation-server model is the most widely used
model for building distributed computing systems. This is because a large number of computer
users only perform simple interactive tasks such as editing jobs, sending electronic mails, and
executing small programs. The workstation-server model is ideal for such simple usage.
However, in a working environment that has groups of users who often perform jobs needing
massive computation, the processor-pool model is more attractive and suitable.
As compared to the workstation model, the workstation-server model has several advantages:

1. In general, it is much cheaper to use a few minicomputers equipped with large, fast disks
that are accessed over the network than a large number of diskful workstations, with each
workstation having a small, slow disk.

2. Diskless workstations are also preferred to diskful workstations from a system
maintenance point of view. Backup and hardware maintenance are easier to perform with
a few large disks than with many small disks scattered all over a building or campus.
Furthermore, installing new releases of software (Such as a file server with new
functionalities) is easier when the software is to be installed on a few file server machines
than on every workstation.

3. In the workstation-server model, since all files are managed by the file servers, users have
the flexibility to use any workstation and access the files in the same manner irrespective
of which workstation the user is currently logged onto. Note that this is not true with the
workstation model, in which each workstation has its local file system, because different
mechanisms are needed to access local and remote files.

4. In the workstation-server model, the request-response protocol described above is mainly
used to access the services of the server machines. Therefore, unlike the workstation
model, this model does not need a process migration facility, which is difficult to
implement.

Hybrid Model:
To combine the advantages of both the workstation-server and processor-pool models, a hybrid
model may be used to build a distributed computing system. The hybrid model is based on the
workstation-server model but with the addition of a pool of processors. The processors in the
pool can be allocated dynamically for computations that are too large for workstations or that
require several computers concurrently for efficient execution.

In addition to efficient execution of computation-intensive jobs, the hybrid model gives
guaranteed response to interactive jobs by allowing them to be processed on local workstations
of the users. However, the hybrid model is more expensive to implement than the
workstation-server model or the processor-pool model.

Issues in designing a Distributed Operating System


In general, designing a distributed operating system is more difficult than designing a centralized
operating system for several reasons. In the design of a centralized operating system, it is
assumed that the operating system has access to complete and accurate information about the
environment in which it is functioning. For example, a centralized operating system can request
status information, being assured that the interrogated component will not change state while
awaiting a decision based on that status information, since only the single operating system
asking the question may give commands. However, a distributed operating system must be
designed with the assumption that complete information about the system environment will
never be available. In a distributed system, the resources are physically separated, there is no
common clock among the multiple processors, delivery of messages is delayed, and messages
could even be lost. Due to all these reasons, a distributed operating system does not have
up-to-date, consistent knowledge about the state of the various components of the underlying
distributed system. Obviously, lack of up-to-date and consistent information makes many
things (such as management of resources and
synchronization of cooperating activities) much harder in the design of a distributed operating
system. For example, it is hard to schedule the processors optimally if the operating system is not
sure how many of them are up at the moment.

Despite these complexities and difficulties, a distributed operating system must be designed to
provide all the advantages of a distributed system to its users. That is, the users should be able to
view a distributed system as a virtual centralized system that is flexible, efficient, reliable, secure
and easy to use. To meet this challenge, the designers of a distributed operating system must deal
with several design issues.

Transparency:

A distributed system that is able to present itself to users and applications as if it were only a single
computer system is said to be transparent. There are eight types of transparencies in a distributed
system:

 Access Transparency: It hides differences in data representation and in how a resource is
accessed by a user. For example, a distributed system may include computer systems that run
different operating systems, each having its own file naming conventions. Differences
in naming conventions as well as how files can be manipulated should be hidden from the
users and applications.

 Location Transparency: Hides where exactly the resource is located physically. For
example, by assigning logical names to resources, such as yahoo.com, one cannot get an idea
of the location of the web page’s main server.

 Migration Transparency: Distributed systems in which resources can be moved without
affecting how the resources can be accessed are said to provide migration transparency. It
hides that the resource may move from one location to another.
 Relocation Transparency: This transparency deals with the fact that resources can be
relocated while they are being accessed, without the user of the application noticing
anything. Example: using a Wi-Fi connection on a laptop while moving from place to place
without getting disconnected.

 Replication Transparency: Hides the fact that multiple copies of a resource could exist
simultaneously. To hide replication, it is essential that the replicas have the same name.
Consequently, a system that supports replication should also support location
transparency.

 Concurrency Transparency: It hides the fact that the resource may be shared by several
competing users. For example, two independent users may each have stored their files on the
same server and may be accessing the same table in a shared database. In such cases, it is
important that each user doesn’t notice that the others are making use of the same
resource.

 Failure Transparency: Hides failure and recovery of the resources. It is the most
difficult task of a distributed system and is even impossible when certain apparently
realistic assumptions are made. Example: a user cannot distinguish between a very slow
and a dead resource. The same error message appears when a server is down, when the network
is overloaded, or when the connection from the client side is lost. So the user is unable
to decide what to do: wait for the network to clear up, or try again later when the server
is working again.

 Persistence Transparency: It hides whether the resource is in memory or on disk. For
example, an object-oriented database provides facilities for directly invoking methods on
stored objects. First the database server copies the object state from disk into main
memory, then performs the operation, and finally writes the state back to disk. The user
does not know that the server is moving objects between primary and secondary memory.

Summary of the Transparencies:


In a distributed system, multiple users who are spatially separated use the system concurrently.
In such a situation, it is economical to share the system resources (hardware or software) among
the concurrently executing user processes. However, since the number of available resources in a
computing system is limited, one user process may necessarily influence the actions of other
concurrently executing user processes as it competes for resources.

For example, concurrent updates to the same file by two different processes should be prevented.
Concurrency transparency means that each user has a feeling that he or she is the sole user of the
system and other users do not exist in the system. For providing concurrency transparency, the
resource sharing mechanisms of the distributed operating system must have the following four
properties:

1. An event-ordering property ensures that all access requests to various system resources
are properly ordered to provide a consistent view to all users of the system.
2. A mutual-exclusion property ensures that at any time at most one process accesses a
shared resource, which must not be used simultaneously by multiple processes if program
operation is to be correct.
3. A no-starvation property ensures that if every process that is granted a resource, which
must not be used simultaneously by multiple processes, eventually releases it, every
request for that resource is eventually granted.
4. A no-deadlock property ensures that a situation will never occur in which competing
processes prevent their mutual progress even though no single one requests more
resources than available in the system.
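Of the four properties, the no-deadlock property is perhaps the easiest to illustrate: if every process acquires the resources it needs in a single agreed global order, a circular wait can never form. A minimal sketch with two locks (resource and process names are illustrative):

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
results = []

def use_both_resources(name):
    # Every process acquires lock_a before lock_b -- a fixed global order,
    # so no circular wait (and hence no deadlock) is possible.
    with lock_a:
        with lock_b:
            results.append(name)

threads = [threading.Thread(target=use_both_resources, args=(f"proc-{i}",))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # ['proc-0', 'proc-1'] -- both processes made progress
```

Had one process acquired the locks in the opposite order, the two could each hold one lock while waiting forever for the other.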

Reliability:
In general, distributed systems are expected to be more reliable than centralized systems due to
the existence of multiple instances of resources. However, the existence of multiple instances of
the resources alone cannot increase the system’s reliability. Rather, the distributed operating
system, which manages these resources, must be designed properly to increase the system’s
reliability by taking full advantage of this characteristic feature of a distributed system.
A fault is a mechanical or algorithmic defect that may generate an error. A fault in a system
causes system failure. Depending on the manner in which a failed system behaves, system
failures are of two types – fail stop and Byzantine. In the case of fail-stop failure, the system
stops functioning after changing to a state in which its failure can be detected. On the other hand,
in the case of Byzantine failure, the system continues to function but produces wrong results.
Undetected software bugs often cause Byzantine failure of a system. Obviously, Byzantine
failures are much more difficult to deal with than fail-stop failures.
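A common way to mask a Byzantine failure is to replicate the computation on several nodes and take a majority vote over the results, so that a replica returning a wrong answer is simply outvoted. A minimal sketch (the replica answers are illustrative):

```python
from collections import Counter

def majority(results):
    # Accept a value only if a strict majority of replicas agree on it.
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None

# Third replica is Byzantine: it keeps running but returns a wrong result.
replica_answers = [42, 42, 7]
print(majority(replica_answers))  # 42 -- the faulty replica is outvoted
```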

For higher reliability, the fault-handling mechanisms of a distributed operating system must be
designed properly to avoid faults, to tolerate faults, and to detect and recover from faults.

Flexibility:
Another important issue in the design of distributed operating systems is flexibility. Flexibility is
one of the most important features of open distributed systems. The design of a distributed operating
system should be flexible due to the following reasons:

1. Ease of modification: From the experience of system designers, it has been found that
some parts of the design often need to be replaced / modified either because some bug is
detected in the design or because the design is no longer suitable for the changed system
environment or new-user requirements. Therefore, it should be easy to incorporate
changes in the system in a user-transparent manner or with minimum interruption caused
to the users.

2. Ease of enhancement: In every system, new functionalities have to be added from time
to time to make it more powerful and easy to use. Therefore, it should be easy to add new services
to the system. Furthermore, if a group of users do not like the style in which a particular
service is provided by the operating system, they should have the flexibility to add and
use their own service that works in the style with which the users of that group are more
familiar and feel more comfortable.

Fault Avoidance:
Fault avoidance deals with designing the components of the system in such a way that the
occurrence of faults is minimized. Conservative design practices, such as using high-reliability
components, are often employed for improving the system’s reliability based on the idea of fault
avoidance. Although a distributed operating system often has little or no role to play in
improving the fault avoidance capability of a hardware component, the designers of the various
software components of the distributed operating system must test them thoroughly to make
these components highly reliable.

Fault Tolerance:
Fault tolerance is the ability of a system to continue functioning in the event of partial system
failure. The performance of the system might be degraded due to partial failure, but otherwise
the system functions properly. Some of the important concepts that may be used to improve the
fault tolerance ability of a distributed operating system are as follows:

 Redundancy techniques: The basic idea behind redundancy techniques is to avoid
single points of failure by replicating critical hardware and software components, so that
if one of them fails, the others can be used to continue. Obviously, having two or more
copies of a critical component makes it possible, at least in principle, to continue
operations in spite of occasional partial failures. For example, a critical process can be
simultaneously executed on two nodes so that if one of the two nodes fails, the execution
of the process can be completed at the other node. Similarly, a critical file may be
replicated on two or more storage devices for better reliability.
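The process-replication example above can be sketched as follows: the same critical computation is submitted to two "nodes" (threads here), and the first successful result is used even though one replica fails. The node behavior is simulated and all names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_on_node(node_id):
    # Stand-in for executing the critical process on one node.
    if node_id == 0:
        raise RuntimeError("node 0 crashed")
    return f"result from node {node_id}"

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(run_on_node, i) for i in range(2)]
    result = None
    for future in as_completed(futures):
        try:
            result = future.result()
            break  # the first replica to succeed completes the job
        except RuntimeError:
            continue  # that replica failed; wait for the other one

print(result)  # result from node 1
```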

 Distributed control: For better reliability, many of the particular algorithms or protocols
used in a distributed operating system must employ a distributed control mechanism to
avoid single points of failure. For example, a highly available distributed file system
should have multiple and independent file servers controlling multiple and independent
storage devices. In addition to file servers, a distributed control technique could also be
used for name servers, scheduling algorithms, and other executive control functions. It is
important to note here that when multiple distributed servers are used in a distributed
system to provide a particular type of service, the servers must be independent. That is, the design
must not require simultaneous functioning of the servers; otherwise, the reliability will
become worse instead of getting better. Distributed control mechanisms are described
throughout this book.

Fault Detection and Recovery:


The fault detection and recovery method of improving reliability deals with the use of hardware
and software mechanisms to determine the occurrence of a failure and then to correct the system
to a state acceptable for continued operation. Some of the commonly used techniques for
implementing this method in a distributed operating system are as follows.

1. Atomic Transactions: An atomic transaction (or just transaction, for short) is a
computation consisting of a collection of operations that take place indivisibly in the
presence of failures and concurrent computations. That is, either all of the operations are
performed successfully or none of their effects prevails, and other processes executing
concurrently cannot modify or observe intermediate states of the computation.
Transactions help to preserve the consistency of a set of shared data objects (e.g., files) in
the face of failures and concurrent access. They make crash recovery much easier,
because transactions can only end in two states: Either all the operations of the
transaction are performed or none of the operations of the transaction is performed.

In a system with a transaction facility, if a process halts unexpectedly due to a hardware
error before a transaction is completed, the system subsequently restores any data objects
that were undergoing modification to their original states. Notice that if a system does not
support a transaction mechanism, unexpected failure of a process during the processing of
an operation may leave the data objects that were undergoing modification in an
inconsistent state. Therefore, without transaction facility, it may be difficult or even
impossible in some cases to roll back (recover) the data objects from their current
inconsistent states to their original states.
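A minimal sketch of the all-or-nothing behavior described above: operations run against a private snapshot, and only a successful commit makes their effects visible, so a crash mid-transaction leaves the original data objects intact. The account names and operations are illustrative.

```python
def run_transaction(state, operations):
    snapshot = dict(state)  # work on a private copy of the data objects
    try:
        for op in operations:
            op(snapshot)
    except Exception:
        return state  # abort: the original state prevails, nothing changed
    state.clear()
    state.update(snapshot)  # commit: all effects become visible at once
    return state

accounts = {"alice": 100, "bob": 50}

def debit(s): s["alice"] -= 30
def credit(s): s["bob"] += 30
def crash(s): raise RuntimeError("node failed mid-transaction")

run_transaction(accounts, [debit, crash, credit])
print(accounts)  # {'alice': 100, 'bob': 50} -- aborted, state untouched

run_transaction(accounts, [debit, credit])
print(accounts)  # {'alice': 70, 'bob': 80} -- committed as a whole
```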

2. Stateless Servers: The client-server model is frequently used in distributed systems to
service user requests. In this model, a server may be implemented by using either of the
following two service paradigms – stateful or stateless. The two paradigms are
distinguished by one aspect of the client – server relationship, whether or not the history
of the serviced requests between a client and a server affects the execution of the next
service request. The stateful approach does depend on the history of the serviced requests,
but the stateless approach does not depend on it. Stateless servers have a distinct
advantage over stateful servers in the event of a failure. That is, the stateless service
paradigm makes crash recovery very easy because no client state information is
maintained by the server. On the other hand, the stateful service paradigm requires
complex crash recovery procedures. Both the client and server need to reliably detect
crashes. The server needs to detect client crashes so that it can discard any state it is
holding for the client, and the client must detect server crashes so that it can perform
necessary error – handling activities. Although stateful service becomes necessary in
some cases, to simplify the failure detection and recovery actions, the stateless service
paradigm must be used, wherever possible.

3. Acknowledgments and Timeout-based Retransmission of Messages: In a distributed
system, events such as a node crash or a communication link failure may interrupt a
communication that was in progress between two processes, resulting in the loss of a
message. Therefore, a reliable interprocess communication mechanism must have ways
to detect lost messages so that they can be retransmitted. Handling of lost messages
usually involves the return of acknowledgment messages and retransmission on the basis of
timeouts. That is, the receiver must return an acknowledgment message for every
message received, and if the sender does not receive any acknowledgment for a message
within a fixed timeout period, it assumes that the message was lost and retransmits it. A
problem associated with this approach is that of duplicate messages: duplicates may be
sent in the event of failures or because of timeouts.

Therefore, a reliable interprocess communication mechanism should also be capable of
detecting and handling duplicate messages. Handling of duplicate messages usually
involves a mechanism for automatically generating and assigning appropriate sequence
numbers to messages. Together, acknowledgment messages, timeout-based retransmission,
and the handling of duplicate messages provide the basis for reliable communication.
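A toy sketch of this scheme follows. All names here are invented for illustration; channel_send and wait_for_ack stand in for a real transport and timeout mechanism. The sender retransmits until an acknowledgment arrives, and the receiver uses sequence numbers to drop duplicates while still re-acknowledging them.

```python
def reliable_send(message, seq, channel_send, wait_for_ack, max_retries=5):
    """Retransmit (seq, message) until acknowledged or retries are exhausted.
    channel_send and wait_for_ack are assumed transport hooks; wait_for_ack
    returns False when no ack arrives within the timeout period."""
    for _ in range(max_retries):
        channel_send((seq, message))
        if wait_for_ack(seq):         # ack received in time
            return True
    return False                      # give up: receiver presumed unreachable


class Receiver:
    """Delivers each sequence number at most once; duplicates are re-acked."""
    def __init__(self):
        self.delivered = set()

    def on_message(self, seq, message, deliver):
        if seq not in self.delivered:     # first copy: deliver it
            self.delivered.add(seq)
            deliver(message)
        return ("ack", seq)               # always acknowledge, even duplicates
```

Re-acknowledging duplicates is essential: a duplicate usually means the original acknowledgment was lost, so the sender is still waiting for one.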

The mechanisms described above may be employed to create a very reliable distributed
system. However, the main drawback of increased system reliability is potential loss of
execution time efficiency due to the extra overhead involved in these techniques. For
many systems it is just too costly to incorporate a large number of reliability mechanisms.
Therefore, the major challenge for distributed operating system designers is to integrate
these mechanisms in a cost-effective manner for producing a reliable system.
Performance:
If a distributed system is to be used its performance must be at least as good as a centralized
system. That is, when a particular application is run on a distributed system, its overall
performance should be better than or at least equal to that of running the same application on a
single processor system. However, to achieve this goal, it is important that the various
components of the operating system of a distributed system be designed properly; otherwise, the
overall performance of the distributed system may turn out to be worse than a centralized system.
Some design principles considered useful for better performance are as follows:

1. Batch if possible: Batching often helps in improving performance greatly. For
example, transferring data across the network in large chunks rather than as individual
pages is much more efficient. Similarly, piggybacking the acknowledgment of previous
messages on the next message during a series of messages exchanged between two
communicating entities also improves performance.
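A toy batcher along these lines might look as follows (the Batcher class and its transport callback are invented for illustration): items are accumulated locally and handed to the transport as one large chunk instead of one network message each.

```python
class Batcher:
    """Accumulates items and sends them to a transport in large chunks."""
    def __init__(self, transport, batch_size=4):
        self.transport = transport     # callable standing in for a network send
        self.batch_size = batch_size
        self.pending = []

    def submit(self, item):
        self.pending.append(item)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            # one network message carries the whole batch
            self.transport(list(self.pending))
            self.pending.clear()
```

Seven submissions with a batch size of three result in only three transport calls rather than seven.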
2. Cache whenever possible: Caching of data at clients’ sites frequently improves overall
system performance because it makes data available wherever it is being currently used,
thus saving a large amount of computing time and network bandwidth. In addition,
caching reduces contention on centralized resources.
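As a small stdlib illustration of client-side caching, Python's functools.lru_cache memoizes results so that repeated requests for the same data never reach the (simulated) network; the fetch function and calls list are invented for the example.

```python
from functools import lru_cache

calls = []   # records which pages actually crossed the "network"

@lru_cache(maxsize=128)
def fetch(page):
    # stands in for an expensive remote fetch
    calls.append(page)
    return f"<data for {page}>"

fetch("a")
fetch("a")   # served from the local cache; no network traffic
fetch("b")
```

Only two real fetches occur for the three calls; the repeat is answered locally, saving both computing time and bandwidth.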

3. Minimize copying of data: Data copying overhead (e.g., moving data in and out of
buffers) accounts for a substantial part of the CPU cost of many operations. For example,
while being transferred from its sender to its receiver, message data may take the
following path on the sending side:

 From the sender’s stack to its message buffer.
 From the message buffer in the sender’s address space to the message buffer in the
kernel’s address space.
 Finally, from the kernel to the network interface board.

On the receiving side, the data probably takes a similar path in the reverse direction.
Therefore, in this case, a total of six copy operations are involved in the message transfer.

Similarly, in several systems, the data copying overhead is also large for read and write
operations on block I/O devices. Therefore, for better performance, it is desirable to avoid
copying of data, although this is not always simple to achieve. Making optimal use of
memory management often helps in eliminating much data movement between the
kernel, block I/O devices, clients, and servers.
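In Python, for instance, a memoryview illustrates the idea of avoiding a copy: slicing it references the underlying buffer instead of duplicating the bytes. This is a small stdlib illustration of the principle, not a distributed-systems mechanism in itself.

```python
buf = bytearray(b"header|payload")
view = memoryview(buf)[7:]   # refers to the same bytes; nothing is copied
payload = bytes(view)        # an explicit copy, made only when needed
buf[7] = ord("P")            # a write to the buffer is visible through the view
```

The same principle underlies zero-copy I/O paths in operating systems: pass references to buffers around and copy only at the point where a copy is genuinely required.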

4. Minimize network traffic: System performance may also be improved by reducing inter
node communication costs. For example, accesses to remote resources require
communication, possibly through intermediate nodes. Therefore, migrating a process
closer to the resources it is using most heavily may be helpful in reducing network traffic
in the system if the decreased cost of accessing its favorite resource offsets the possible
increased cost of accessing its less favored ones. Another way to reduce network traffic is
to use the process migration facility to cluster two or more processes that frequently
communicate with each other on the same node of the system. Avoiding the collection of
global state information for making some decision also helps in reducing network traffic.

5. Take advantage of fine-grain parallelism for multiprocessing: Performance can also
be improved by taking advantage of fine-grain parallelism for multiprocessing. For
example, threads are often used for structuring server processes. Servers structured as a
group of threads can operate efficiently because they can simultaneously service requests
from several clients. Fine-grained concurrency control of simultaneous accesses by
multiple processes to a shared resource is another example of applying this
principle for better performance.
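As a small illustration of the thread-group idea, Python's standard ThreadPoolExecutor can service several simulated client requests concurrently; the handle_request function and request names here are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def handle_request(req):
    # stands in for servicing one client request (file read, DB lookup, ...)
    return f"reply to {req}"

requests = ["c1", "c2", "c3", "c4"]

# a group of worker threads services the requests in parallel;
# map() returns the replies in the same order as the requests
with ThreadPoolExecutor(max_workers=4) as pool:
    replies = list(pool.map(handle_request, requests))
```

While one worker blocks on I/O, the others keep serving their clients, which is the source of the efficiency gain described above.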

Scalability:
Scalability refers to the capability of a system to adapt to increased service load. It is inevitable
that a distributed system will grow with time since it is very common to add new machines or an
entire sub network to the system to take care of increased workload or organizational changes in
a company. Therefore, a distributed operating system should be designed to easily cope with the
growth of nodes and users in the system. That is, such growth should not cause serious disruption
of service or significant loss of performance to users.
Chapter-3

PROCESSES

It turns out that having a finer granularity of control, in the form of multiple threads of
control per process, offers important advantages.

3.1 Introduction to Threads: To execute a program, an operating system creates a number of
virtual processors, each one for running a different program. To keep track of these virtual
processors, the operating system has a process table, containing entries to store CPU register
values, memory maps, open files, accounting information, privileges, etc. A thread is a
lightweight process; executing more than one thread on a CPU, seemingly simultaneously, is
called multithreading.
Advantage of multithreading:

It becomes possible to exploit parallelism when executing the program on a multiprocessor


system. In that case, each thread is assigned to a different CPU while shared data are stored
in shared main memory.

3.2 Context Switching: The major drawback of all IPC mechanisms is that communication
often requires extensive context switching.

3.3 Thread Implementation:

Threads are often provided in the form of a thread package. Such a package contains
operations to create and destroy threads as well as operations on synchronization
variables. There are basically two approaches to implementing a thread package. The first
approach is to construct a thread library that is executed entirely in user mode. The second
approach is to have the kernel be aware of threads and schedule them.

3.4 Multithreaded servers: The file server normally waits for an incoming request for a file
operation, subsequently carries out the request, and then sends back the reply. One possible,
and particularly popular, organization is shown in the figure below. Here one thread, the
dispatcher, reads incoming requests for a file operation. The requests are sent by clients to a
well-known end point for this server. After examining the request, the server chooses an idle
(i.e., blocked) worker thread and hands it the request. The worker proceeds by performing a
blocking read on the local file system, which may cause the thread to be suspended until the
data are fetched from disk. Meanwhile, another thread is selected to be executed.

Fig: A Multithreaded server organized in a dispatcher/worker model.
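The dispatcher/worker organization can be sketched with a shared request queue (illustrative only; the queue, worker count, and "served" replies are invented for the example). The dispatcher enqueues incoming requests, and blocked workers pick them up one at a time.

```python
import queue
import threading

requests = queue.Queue()
replies = []
lock = threading.Lock()

def worker():
    while True:
        req = requests.get()            # blocks until the dispatcher hands over work
        if req is None:                 # shutdown signal
            break
        with lock:
            replies.append(f"served {req}")   # stands in for the blocking file read
        requests.task_done()

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

# the dispatcher: reads incoming requests and hands them to idle workers
for r in ["req1", "req2", "req3", "req4"]:
    requests.put(r)
requests.join()                         # wait until all requests are served
for _ in workers:
    requests.put(None)
for w in workers:
    w.join()
```

While one worker is suspended on disk I/O, the queue lets another worker run, which is exactly the behaviour the dispatcher/worker model is designed to exploit.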

3.5 The Role of Virtualization in Distributed Systems: Every (distributed) computer system
offers a programming interface to higher-level software. There are many different types of
interfaces, ranging from the basic instruction set offered by a CPU to the vast collection of
application programming interfaces that are shipped with many current middleware systems.
In its essence, virtualization deals with extending or replacing an existing interface so as to
mimic the behaviour of another system.
Fig: virtualizing system A on top of system B.

3.7 Code Migration: Code migration in distributed systems originally took place in the form
of process migration, in which an entire process was moved from one machine to another.
Performance can be improved if processes are moved from heavily loaded to lightly loaded
machines.
Chapter 4
COMMUNICATION

 Interprocess communication is at the heart of all distributed systems


 communication in distributed systems is based on message passing as offered by the underlying
network as opposed to using shared memory
 In a distributed system, processes
 run on different machines
 exchange information through message passing
 Successful distributed systems depend on communication models that hide or simplify
message passing
 modern distributed systems consist of thousands of processes scattered across an unreliable
network such as the Internet
 unless the primitive communication facilities of the network are replaced by more advanced ones,
development of large scale Distributed Systems becomes extremely difficult

4.1 Network Protocols and Standards


 a protocol is a set of rules that governs data communications
 a protocol defines what is communicated, how it is communicated, and when it is communicated
 the key elements of a protocol are syntax, semantics, and timing
 syntax: refers to the structure or format of the data
 semantics: refers to the meaning of each section of bits
 timing: refers to when data should be sent and how fast they can be sent
 functions of protocols
 each device must perform the same steps the same way so that the data will arrive
and reassemble properly;
 if one device uses a protocol with different steps, the two devices will not be able
to communicate with each other
 for instance, for one computer to send a message to another computer, both must
agree on how the message is broken into packets and reassembled

Network (Reference) Models


 Layers and Services
 within a single machine, each layer uses the services immediately below it and provides
services for the layer immediately above it
 between machines, layer x on one machine communicates with layer x on another
machine
 Two important network models or architectures
 The ISO OSI (Open Systems Interconnection) Reference Model
 The TCP/IP Reference Model

a. The OSI Reference Model


 consists of 7 layers
 was never fully implemented as a protocol stack, but is a good theoretical model
 Open – to connect open systems or systems that are open for communication with
other systems
4.2 Types of Communication

Persistent versus transient

Synchronous versus asynchronous

Discrete versus streaming

Persistent versus Transient Communication

 Persistent: messages are held by the middleware communication service until they can be
delivered (e.g., email)
 Sender can terminate after executing send
 Receiver will get message next time it runs
 Transient: messages exist only while the sender and receiver are running
 Communication errors or inactive receiver cause the message to be discarded
 Transport-level communication is transient
Asynchronous v Synchronous Communication

 Asynchronous: (non-blocking) sender resumes execution as soon as the message is passed to the
communication/middleware software
 Synchronous: sender is blocked until
 The OS or middleware notifies acceptance of the message, or
 The message has been delivered to the receiver, or
 The receiver processes it & returns a response
Discrete versus Streaming Communication

 Discrete: communicating parties exchange discrete messages


 Streaming: one-way communication; a “session” consists of multiple messages from the sender
that are related either by send order (TCP streams), temporal proximity (multimedia streams), etc.
4.1 Layered Protocols:

Due to the absence of shared memory, all communication in distributed systems is based on
sending and receiving (low-level) messages. When process A wants to communicate with process B, it
first builds a message in its own address space. Then it executes a system call that causes the operating
system to send the message over the network to B. A model called the Open Systems Interconnection
Reference Model (Day and Zimmerman, 1983) is used in communication.
Both connection-oriented and connectionless communication are used.

Fig: Viewing middleware as an intermediate (distributed) service in application-

level communication.

An e-mail system is a typical example in which communication is persistent. With persistent
communication, a message that has been submitted for transmission is stored by the communication
middleware for as long as it takes to deliver it to the receiver.

Remote Method Invocation (RMI): RMI is a way of invoking a method on an object that
resides on a remote server.

4.2 Remote Procedure Call: When a process on machine A calls a procedure on machine B, the calling
process on A is suspended, and execution of the called procedure takes place on B. Information can be
transported from the caller to the callee in the parameters and can come back in the procedure result. No
message passing at all is visible to the programmer. This method is known as Remote Procedure Call, or
often just RPC.

4.3 Principle of RPC behind a client and a server program:

The client stub packs the parameters into a message and requests that the message be sent
to the server.

When the message arrives at the server, the server's operating system passes it up to a server stub.
A server stub is the server-side equivalent of a client stub which is a piece of code that transforms
requests coming in over the network into local procedure calls. The server stub unpacks the parameters
from the message and then calls the server procedure in the usual way.

The client stub inspects the message, unpacks the result, copies it to its caller, and returns in the
usual way. Packing parameters into a message is called parameter marshalling.

 Steps of a Remote Procedure Call


1. Client procedure calls client stub in the normal way
2. Client stub builds a message and calls the local OS (packing parameters into a message is
called parameter marshaling)
3. Client's OS sends the message to the remote OS
4. Remote OS gives the message to the server stub
5. Server stub unpacks the parameters and calls the server
6. Server does the work and returns the result to the stub
7. Server stub packs it in a message and calls the local OS
8. Server's OS sends the message to the client's OS
9. Client's OS gives the message to the client stub
10. Stub unpacks the result and returns to client
 hence, for the client remote services are accessed by making ordinary (local) procedure calls;
not by calling send and receive
4.5 Binding a Client to a Server

To allow a client to call a server, it is necessary that the server be registered


and prepared to accept incoming calls. Registration of a server makes it possible for a client to locate the
server and bind to it. Server location is done in two steps:

1. Locate the server's machine.

2. Locate the server (i.e., the correct process) on that machine.

4.6 DCE RPC (Distributed Computing Environment RPC): Developed by the Open Software
Foundation (OSF), now called The Open Group. DCE is a true middleware system in that it is
designed to execute as a layer of abstraction between existing (network) operating systems and
distributed applications.

Fig: Client-to-server binding in DCE.


Chapter 5

Naming

Introduction

 names play an important role to:


o share resources
o uniquely identify entities
o refer to locations
o Etc.
 an important issue is that a name can be resolved to the entity it refers to
 to resolve names, it is necessary to implement a naming system
 in a distributed system, the implementation of a naming system is itself often distributed,
unlike in non-distributed systems
 efficiency and scalability of the naming system are the main issues
5.1 Names, Identifiers, and Addresses

 a name in a distributed system is a string of bits or characters that is used to refer to


an entity
 an entity is anything;
 e.g., resources such as hosts, printers, disks, files, objects, processes, users,
Web pages, ...
 entities can be operated on
 e.g., a resource such as a printer offers an interface containing operations
for printing a document, requesting the status of a job, ...
 to operate on an entity, it is necessary to access it through its access point, itself an entity
(special)
 Access point
 the name of an access point is called an address (such as IP address
and port number as used by the transport layer)
 the address of the access point of an entity is also referred to as the
address of the entity.
 an entity can have more than one access point (similar to accessing
an individual through different telephone numbers)
 an entity may change its access point in the course of time (e.g., a
mobile computer getting a new IP address as it moves)
 an address is a special kind of name
 it refers to at most one entity
 each entity is referred by at most one address; even when replicated
such as in Web pages
 an entity may change an access point, or an access point may be
reassigned to a different entity (like telephone numbers in offices)
 separating the name of an entity and its address makes it easier and
more flexible; such a name is called location independent
 there are also other types of names that uniquely identify an entity; in any case
an identifier is a name with the following properties
 it refers to at most one entity
 each entity is referred by at most one identifier
 it always refers to the same entity (it is never reused)
 identifiers allow us to unambiguously refer to an entity

5.2 Flat Naming


 a name is a sequence of characters without structure; like human names? maybe, if it
is not an Ethiopian name!
 difficult to be used in a large system since it must be centrally controlled to
avoid duplication
 how are flat names resolved?
 name resolution: mapping a name to an address or an address to a name is called
name- address resolution
 possible solutions: simple, home-based approaches, and hierarchical approaches
5.3 Structured Naming
 flat names are not convenient for humans
 Name Spaces
 names are organized into a name space
 each name is made of several parts; the first may define the nature of
the organization, the second the name, the third departments, ...
 the authority to assign and control the name spaces can be
decentralized where a central authority assigns only the first two parts
 a name space is generally organized as a labeled, directed graph with two types of nodes
 leaf node: represents the named entity and stores information such as its
address or the state of that entity
 directory node: a special entity that has a number of outgoing edges, each
labeled with a name
 each node in a naming graph is considered as another entity with an identifier
 Name Space Distribution
 in large scale distributed systems, it is necessary to distribute the name service
over multiple name servers, usually organized hierarchically
 a name service can be partitioned into logical layers
 the following three layers can be distinguished.
 global layer
 formed by highest level nodes (root node and nodes close to it or its children)
 nodes on this layer are characterized by their stability, i.e., directory tables are
rarely changed
 they may represent organizations, groups of organizations, ..., where names are stored
in the name space
 Administrational layer
 groups of entities that belong to the same organization or administrational unit, e.g.,
departments
 relatively stable
 managerial layer
 nodes that may change regularly, e.g., nodes representing hosts of a LAN, shared
files such as libraries or binaries, …
 nodes are managed not only by system administrators, but also by end users
Item                            | Global    | Administrational | Managerial
Geographical scale of network   | Worldwide | Organization     | Department
Total number of nodes           | Few       | Many             | Vast numbers
Responsiveness to lookups       | Seconds   | Milliseconds     | Immediate
Update propagation              | Lazy      | Immediate        | Immediate
Availability requirement        | Very high | High             | Low
Number of replicas              | Many      | None or few      | None
Is client-side caching applied? | Yes       | Yes              | Sometimes

5.4 Attribute-Based Naming

 flat naming: provides a unique and location-independent way of referring entities


 structured naming: also provides a unique and location-independent way of referring
entities as well as human-friendly names
 but do not allow searching entities by giving a description of an entity
 each entity is assumed to have a collection of attributes that say something about the
entity
 then a user can search for an entity by specifying (attribute, value) pairs; this is
known as attribute-based naming
 Directory Services
 attribute-based naming systems are also called directory services
Chapter 6

Synchronization

Discussion point

 the issue of synchronization based on time (actual time and relative ordering)
 distributed mutual exclusion to protect shared resources from simultaneous
access by multiple processes
 how a group of processes can appoint a process as a coordinator; can be done
by means of election algorithms
6.1 Clock Synchronization
 in centralized systems, time can be unambiguously decided by a system call; e.g., if process
A at time t1 gets the time, say tA, and process B at time t2, where t1 < t2, gets the time, say
tB, then tA is always less than (possibly equal to but never greater than) tB
 achieving agreement on time in distributed systems is difficult

 e.g., consider the make program on a UNIX machine;


 it compiles only source files for which the time of their last update was later than
the existing object file
Physical Clocks

Is it possible to synchronize all clocks in a distributed system?

 no

 even if all computers initially start at the same time, they will get out of synch
after some time due to crystals in different computers running at different
frequencies, this phenomenon called clock skew
 How is time actually measured?
 earlier astronomically;
 based on the earth’s rotation on its axis: 1 solar second = 1/86,400th of a solar
day (24 × 3600 = 86,400)
 it was later discovered that the period of the earth’s rotation is not constant
 the earth is slowing down due to tidal friction and atmospheric drag
 geologists believe that 300 million years ago there were about 400 days
per year
 the length of the year is not affected; only the days have become longer
 in some countries, UTC (Universal Coordinated Time) is broadcasted on
shortwave radio and satellites (as a short pulse at the start of each UTC second)
for those who need precise time; but one has to pay for the propagation delay
Clock Synchronization Algorithms

 two situations:
 one machine has a receiver of UTC time, then how do we synchronize
all other machines to it
 no machine has a receiver, each machine keeps track of its own time, then
how to synchronize them
 many algorithms have been proposed
 a model for all algorithms
 each machine has a timer that ticks H times per second, causing an interrupt; the
interrupt handler adds 1 to the clock
 let the value of the clock obtained this way be C
 when the UTC time is t, the value of the clock on machine p is Cp(t); if everything
is perfect, Cp(t) = t, or dC/dt = 1
 but in practice there will be errors; a clock either ticks faster or slower
 if ρ is a constant such that 1 − ρ ≤ dC/dt ≤ 1 + ρ, then the timer is said to be
working within its specification
 ρ is set by the manufacturer and is called the maximum drift rate
the relation between clock time and UTC when clocks tick at different rates

 The Berkeley Algorithm
 in the previous algorithm, the time server is passive; other machines merely ask
it the time periodically
 in Berkeley UNIX, a time daemon polls every machine from time to time to ask
its time
 it then calculates the average and sends messages to all machines so that they will
adjust their clocks accordingly
 suitable when no machine has a UTC receiver
 the time daemon’s time must be set manually from time to time
a) the time daemon asks all the other machines for their clock values
b) the machines answer how far ahead or behind the time daemon they are
c) the time daemon tells everyone how to adjust their clock
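The averaging step (b)–(c) can be sketched as a pure function. This is a toy version with invented names; a real implementation must also compensate for message propagation delays when reading remote clocks.

```python
def berkeley_adjust(daemon_time, machine_times):
    """Return the clock adjustment for the daemon and for each machine,
    in the same order as the inputs (toy version: network delay ignored)."""
    clocks = [daemon_time] + machine_times
    average = sum(clocks) / len(clocks)
    # each participant adds its adjustment to its local clock
    return [average - c for c in clocks]

# daemon at 3.0, two machines at 2.0 and 4.0 -> average is 3.0
adjustments = berkeley_adjust(3.0, [2.0, 4.0])
```

The slow machine is told to advance by 1.0, the fast one to fall back by 1.0, and the daemon needs no change, so afterwards all three agree on the average.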
6.2 Logical Clocks

For some applications, it is sufficient if all machines agree on the same time, rather than with the
real time; we need internal consistency of the clocks rather than being close to the real time

 hence the concept of logical clocks


 what matters is the order in which events occur
Lamport Timestamps

 Lamport defined a relation called happens before
 a → b, read as “a happens before b”, means all processes agree that first
event a occurs, then event b occurs
 this relation can be observed in two situations
 if a and b are events in the same process, and a occurs before b,
then a → b is true
 if a is the event of a message being sent by one process, and b is
the event of the same message being received by another process, then
a → b is also true
 happens before is a transitive relation
 if a → b and b → c, then a → c
 if two events, x and y, happen in different processes that do not
exchange messages, then neither x → y nor y → x is true
 these events are said to be concurrent
 it means that nothing can be said about the order of these events
 for every event a, we can assign a time value C(a) on which all processes agree;
if a → b, then C(a) < C(b)
 Lamport’s proposed algorithm for assigning times for processes
 consider three processes each running on a different machine, each with its
own clock
 the solution follows the happens before relation; each message carries the sending
time
 if the receiver’s clock shows a value prior to the time the message was sent, the
receiver fast forwards its clock to be one more than the sending time
a) three processes, each with its own clock; the clocks run at different rates
b) Lamport's algorithm corrects the clocks
Implementation:

 Each process maintains a local counter Ci.


 These counters are updated as follows:
 Before executing an event, Pi executes
Ci ← Ci + 1.
 When process Pi sends a message m to Pj, it sets m’s timestamp ts (m) equal to Ci
after having executed the previous step
 Upon the receipt of a message m, process Pj adjusts its own local counter as
Cj ← max {Cj , ts (m)}, after which it then executes the first step and delivers the
message to the application
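The three update rules above translate almost directly into code. The small class below is an illustrative sketch (not a standard library facility): tick covers rule 1, send attaches the timestamp of rule 2, and receive applies the max-then-increment of rule 3.

```python
class LamportClock:
    """Logical clock implementing Lamport's update rules for one process."""
    def __init__(self):
        self.c = 0

    def tick(self):
        # rule 1: increment before executing a local event
        self.c += 1
        return self.c

    def send(self):
        # rule 2: the outgoing message carries the clock value after the tick
        return self.tick()

    def receive(self, ts):
        # rule 3: take the maximum of local clock and timestamp,
        # then tick to count the receive event itself
        self.c = max(self.c, ts)
        return self.tick()
```

If process p sends at time 1 and process q's clock is still 0, q's receive event is stamped 2, preserving C(send) < C(receive).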
 Vector Clocks
 with Lamport timestamps, C(a) < C(b) does not necessarily mean that a happened
before b; the guarantee runs only in the other direction, although Lamport clocks
do ensure a total ordering
 Lamport timestamps also do not capture causality between events
 vector timestamps are designed for this purpose
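A minimal vector clock can be sketched as follows (illustrative only; the process count n and index i are parameters of this toy class, not part of any standard API). Each process keeps one counter per process, increments its own entry on local events, and merges entry-wise maxima on receives.

```python
class VectorClock:
    """Toy vector clock for one process in a group of n processes."""
    def __init__(self, n, i):
        self.v = [0] * n      # one counter per process
        self.i = i            # this process's own index

    def event(self):
        # local event (including a send): increment our own entry
        self.v[self.i] += 1
        return list(self.v)   # timestamp to attach to an outgoing message

    def receive(self, ts):
        # merge entry-wise maxima, then count the receive as a local event
        self.v = [max(a, b) for a, b in zip(self.v, ts)]
        return self.event()
```

Unlike Lamport timestamps, comparing two vector timestamps entry by entry reveals whether one event causally precedes the other or whether they are concurrent.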
