Lecture 2.0 - Issues in Design of Distributed System
Introduction
Over the past two decades, advances in microelectronic technology have resulted in
the availability of fast, inexpensive processors, and advances in communication
technology have resulted in the availability of cost-effective and highly efficient computer
networks. This topic will deepen the student's understanding of distributed systems and of
the issues that arise in their design.
Objectives
By the end of the course, students should understand:
2.1 Issues in Designing a Distributed Operating System
2.2 Transparency
2.3 Performance Transparency
2.4 Scaling Transparency
2.5 Reliability
2.6 Fault Avoidance
2.7 Fault Tolerance
2.8 Fault Detection and Recovery
2.9 Flexibility
2.10 Performance
2.11 Scalability
Design Issues
2.2 TRANSPARENCY
A distributed system that is able to present itself to users and applications as if it were only a
single computer system is said to be transparent.
There are eight types of transparency in a distributed system. One of them, concurrency
transparency, is discussed below.
In a distributed system, multiple users who are spatially separated use the system
concurrently. In such a situation, it is economical to share the system resources (hardware
or software) among the concurrently executing user processes. However, since the number
of available resources in a computing system is restricted, one user process necessarily
influences the actions of other concurrently executing user processes as it competes for
resources. For example, concurrent updates to the same file by two different processes
should be prevented. Concurrency transparency means that each user has the feeling that he
or she is the sole user of the system and that other users do not exist in the system. To provide
concurrency transparency, the resource-sharing mechanisms of the distributed operating
system must have the following four properties:
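The mutual-exclusion aspect of this requirement can be illustrated with a small sketch: a
shared counter stands in for a shared file, and a lock ensures that concurrent updates do not
interleave (all names here are illustrative, not part of any particular system):

```python
import threading

# A shared counter stands in for a shared resource such as a file.
balance = 0
lock = threading.Lock()

def deposit(amount, times):
    global balance
    for _ in range(times):
        with lock:          # mutual exclusion: one update at a time
            balance += amount

threads = [threading.Thread(target=deposit, args=(1, 10000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # 40000: no concurrent update is lost
```

Without the lock, two processes could read the same old value and each write back an
update, losing one of them; the lock serializes the read-modify-write sequence.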
2.3 PERFORMANCE TRANSPARENCY
The aim of performance transparency is to allow the system to be automatically
reconfigured to improve performance as loads vary. This requirement calls for the support
of intelligent resource allocation and process migration facilities in distributed operating
systems.
2.4 SCALING TRANSPARENCY
The aim of scaling transparency is to allow the system to expand in scale without
disrupting the activities of the users. This requirement calls for an open-system architecture
and the use of scalable algorithms for designing the distributed operating system
components.
2.5 RELIABILITY
2.6 FAULT AVOIDANCE
Fault avoidance deals with designing the components of the system in such a way
that the occurrence of faults is minimized. Conservative design practices, such as using
high-reliability components, are often employed to improve the system's reliability based on
the idea of fault avoidance. Although a distributed operating system often has little or no
role to play in improving the fault-avoidance capability of a hardware component, the
designers of the various software components of the distributed operating system must test
them thoroughly to make these components highly reliable.
2.7 FAULT TOLERANCE
Fault tolerance is the ability of a system to continue functioning in the event of partial
system failure. The performance of the system might be degraded due to the partial failure,
but otherwise the system functions properly. Some of the important concepts that may be
used to improve the fault-tolerance ability of a distributed operating system are as follows:
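One widely used concept is redundancy: replicating a service so that a request can fail over
to a surviving replica. A minimal sketch, with hypothetical server names and behavior:

```python
# Hypothetical replicated servers: each replica either answers or raises.
def make_server(name, healthy):
    def serve(request):
        if not healthy:
            raise ConnectionError(name + " is down")
        return name + ":" + request
    return serve

replicas = [make_server("s1", False), make_server("s2", True), make_server("s3", True)]

def call_with_failover(replicas, request):
    # Try each replica in turn; a partial failure degrades performance,
    # not correctness, as long as at least one replica survives.
    last_error = None
    for serve in replicas:
        try:
            return serve(request)
        except ConnectionError as err:
            last_error = err
    raise last_error

result = call_with_failover(replicas, "read")
print(result)  # s2:read  (s1 is down, so the call fails over to s2)
```

The system keeps functioning despite the failure of s1; only the extra round of retries is
paid as a performance cost.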
2.8 FAULT DETECTION AND RECOVERY
The fault detection and recovery method of improving reliability deals with the use
of hardware and software mechanisms to determine the occurrence of a failure and then to
restore the system to a state acceptable for continued operation.
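A common detection mechanism is the heartbeat: nodes report in periodically, and a node
that misses its deadline is suspected to have failed. A minimal sketch, with an assumed
timeout value and invented node names:

```python
# Hypothetical heartbeat table mapping node name -> time of last heartbeat.
HEARTBEAT_TIMEOUT = 2.0  # seconds; an assumed value, not a standard one

last_heartbeat = {}

def record_heartbeat(node, now):
    last_heartbeat[node] = now

def detect_failed(now):
    # A node whose last heartbeat is older than the timeout is suspected
    # to have failed and can be scheduled for recovery.
    return [n for n, t in last_heartbeat.items() if now - t > HEARTBEAT_TIMEOUT]

record_heartbeat("nodeA", now=0.0)
record_heartbeat("nodeB", now=1.5)
failed = detect_failed(now=3.0)
print(failed)  # ['nodeA']: nodeB checked in recently enough
```

Once a failure is detected, recovery mechanisms (such as restarting the service or rolling
back to a checkpoint) can bring the system back to an acceptable state.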
2.9 FLEXIBILITY
Flexibility is one of the most important features of open distributed systems. The design
of a distributed operating system should be flexible for the following reasons:
The most important design factor that influences the flexibility of a distributed
operating system is the model used for designing its kernel. The kernel of an operating
system is its central controlling part that provides basic system facilities. It operates in a
separate address space that is inaccessible to user processes. It is the only part of an
operating system that a user cannot replace or modify. We saw that in the case of a
distributed operating system identical kernels are run on all the nodes of the distributed
system.
The two commonly used models for kernel design in distributed operating systems
are the monolithic kernel and the microkernel. In the monolithic kernel model, most
operating system services such as process management, memory management, device
management, file management, name management, and inter-process communication are
provided by the kernel. As a result, the kernel has a large, monolithic structure. Many
distributed operating systems that are extensions or imitations of the UNIX operating
system use the monolithic kernel model. This is mainly because UNIX itself has a large,
monolithic kernel.
On the other hand, in the microkernel model, the main goal is to keep the kernel as
small as possible. Therefore, in this model, the kernel is a very small nucleus of software
that provides only the minimal facilities necessary for implementing additional operating
system services. The only services provided by the kernel in this model are inter-process
communication, low-level device management, a limited amount of low-level process
management, and some memory management. All other operating system services, such as
file management, name management, additional process and memory management
activities, and much system-call handling, are implemented as user-level server processes.
Each server process has its own address space and can be programmed separately.
As compared to the monolithic kernel model, the microkernel model has several
advantages. In the monolithic kernel model, the large size of the kernel reduces the overall
flexibility and configurability of the resulting operating system. On the other hand, the
resulting operating system of the microkernel model is highly modular in nature. Due to this
characteristic feature, the operating system of the microkernel model is easy to design,
implement, and install. Moreover, since most of the services are implemented as user-level
server processes, it is also easy to modify the design or add new services.
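The structure described above can be sketched in miniature: a toy "file server" runs as a
user-level thread with private state, and the "kernel" merely delivers messages. All names
and the in-memory file table are invented for illustration:

```python
import queue
import threading

# The "kernel" in this toy sketch does nothing but deliver messages.
kernel_mailbox = queue.Queue()

def file_server():
    files = {"/etc/motd": "hello"}  # hypothetical in-memory file table
    while True:
        op, path, reply_box = kernel_mailbox.get()
        if op == "stop":
            break
        reply_box.put(files.get(path, ""))  # service the request, send a reply

# The file service is an ordinary user-level server with its own private
# state, standing in for a server process with its own address space.
server = threading.Thread(target=file_server)
server.start()

# A client obtains a service by message passing rather than by trapping
# into a large monolithic kernel.
reply_box = queue.Queue()
kernel_mailbox.put(("read", "/etc/motd", reply_box))
content = reply_box.get()
kernel_mailbox.put(("stop", None, None))
server.join()
print(content)  # hello
```

Because the server is a separate component behind a message interface, it can be replaced
or upgraded without touching the kernel, which is the modularity argument made above.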
In spite of its potential performance cost, the microkernel model is preferred
for the design of modern distributed operating systems. A main reason for this is that the
advantages of the microkernel model more than compensate for the performance cost.
Notice that the situation here is very similar to the one that caused high-level programming
languages to be preferred to assembly languages. In spite of the better performance of
programs written in assembly languages, most programs are written in high-level languages
due to the advantages of ease of design, maintenance, and portability. Similarly, the
flexibility advantages of the microkernel model described previously more than outweigh
its small performance penalty.
2.10 PERFORMANCE
If a distributed system is to be used, its performance must be at least as good as that of a
centralized system; otherwise, the overall performance of the distributed system may turn
out to be worse than a centralized system. Some design principles considered useful for
better performance are as follows:
3. Minimize copying of data: Data copying overhead (e.g., moving data in and
out of buffers) imposes a substantial CPU cost on many operations. For
example, while being transferred from its sender to its receiver, a message's
data may take the following path on the sending side:
(b) From the message buffer in the sender’s address space to the message
buffer in the kernel’s address space
On the receiving side, the data probably takes a similar path in the reverse direction.
Therefore, in this case, a total of six copy operations are involved in the message transfer
operation. Similarly, in several systems, the data copying overhead is also large for read and
write operations on block I/O devices. Therefore, for better performance, it is desirable to
avoid copying of data, although this is not always simple to achieve. Making optimal use of
memory management often helps in eliminating much data movement between the kernel,
block I/O devices, clients, and servers.
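The saving from avoiding copies can be seen even within a single process. In Python, for
instance, slicing a bytes object copies the data, while a memoryview exposes the same
underlying buffer without copying, which is the same idea as avoiding per-layer
message-buffer copies during a message transfer:

```python
# A 16-byte message standing in for data to be transmitted.
message = bytes(range(16))

header = message[:4]         # this slice allocates and copies 4 bytes
view = memoryview(message)   # zero-copy window onto the same buffer
payload = view[4:]           # still no copy: only the offsets change

shares_buffer = payload.obj is message
print(shares_buffer)       # True: the view shares the original buffer
print(bytes(payload[:2]))  # b'\x04\x05'
```

Passing views (or, at the operating system level, remapping pages) lets each layer hand the
same buffer along instead of duplicating it.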
2.11 SCALABILITY
SECURITY
For users to trust the system and rely on it, the various resources of a
computer system must be protected against destruction and unauthorized access. Enforcing
security in a distributed system is more difficult than in a centralized system because of the
lack of a single point of control and the use of insecure networks for data communication.
In a centralized system, all users are authenticated by the system at login time, and the
system can easily check whether a user is authorized to perform the requested operation on
an accessed resource. In a distributed system, however, since the client-server model is
often used for requesting and providing services, when a client sends a request message to
a server, the server must have some way of knowing who the client is. This is not as simple
as it might appear, because any client-identification field in the message cannot be trusted.
This is because an intruder (a person or program trying to obtain unauthorized access to
system resources) may pretend to be an authorized client or may change the message
contents during transmission. Therefore, as compared to a centralized system, the
enforcement of security in a distributed system has the following additional requirements:
1. It should be possible for the sender of a message to know that the message was
received by the intended receiver.
2. It should be possible for the receiver of a message to know that the message
was sent by the genuine sender.
Cryptography is the only known practical method for dealing with these security
aspects of a distributed system. In this method, comprehension of private information by
unauthorized parties is prevented by encrypting the information, which can then be
decrypted only by authorized users.
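As an illustration of the second requirement above, a message authentication code lets the
receiver check that a message came from a holder of a shared secret and was not altered in
transit. This sketch uses Python's standard hmac module; the hard-coded key is purely for
illustration, since in practice a key would be established by a key-distribution protocol:

```python
import hashlib
import hmac

# Shared secret between client and server (illustrative only).
SECRET = b"example-shared-key"

def sign(message):
    # Message authentication code computed over the message contents.
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def verify(message, tag):
    # The receiver recomputes the tag; an intruder who alters the message
    # or forges the client-identification field cannot produce a valid tag
    # without knowing the secret.
    return hmac.compare_digest(tag, sign(message))

msg = b"transfer 10 to alice"
tag = sign(msg)
print(verify(msg, tag))                        # True
print(verify(b"transfer 10 to mallory", tag))  # False
```

Note that this provides authenticity and integrity; confidentiality additionally requires
encrypting the message contents, as described above.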
Another guiding principle for security is that a system whose security depends on the
integrity of the fewest possible entities is more likely to remain secure as it grows. For
example, it is much simpler to ensure security based on the integrity of a much smaller
number of servers than to trust thousands of clients. In that case, it is sufficient to ensure
only the physical security of these servers and the software they run.
Revision Exercise:
2) How are location, relocation, and migration transparencies different from each other?
Explain with examples.
REFERENCES: