Course File Distributed System
COURSE FILE
Program : B.E.
Semester : VII
1. Scheme
2. Syllabus
3. Time Table
4. Lecture Plan
5. List of Books
8. Tutorial Questions
9. Assignment Questions
Unit-II
Distributed Shared Memory and Distributed File System
Basic Concepts of Distributed Shared Memory (DSM), DSM Architecture & its Types, Design &
Implementation Issues in DSM Systems, Structure of Shared Memory Space, Consistency Models,
and Thrashing. Desirable Features of a Good Distributed File System, File Models, File Service
Architecture, File Accessing Models, File Sharing Semantics, File Caching Schemes, File
Replication & Fault Tolerance. Naming: Features, System Oriented Names, Object Locating
Mechanism, Human Oriented Names.
Unit-III
Inter Process Communication and Synchronization
APIs for Internet Protocols, Data Representation & Marshalling, Group Communication, Client-
Server Communication, RPC: Implementing the RPC Mechanism, Stub Generation, RPC Messages.
Synchronization: Clock Synchronization, Mutual Exclusion, Election Algorithms: Bully & Ring
Algorithms.
Unit-IV
Distributed Scheduling and Deadlock
Distributed Scheduling-Issues in Load Distributing, Components for Load Distributing
Algorithms, Different Types of Load Distributing Algorithms, Task Migration and its issues.
Deadlock: Issues in Deadlock Detection & Resolution, Deadlock Handling
Strategies, Distributed Deadlock Algorithms.
Unit-V
Distributed Multimedia & Database Systems
Distributed Data Base Management Systems (DDBMS), Types of Distributed Databases, and
Distributed Multimedia: Characteristics of Multimedia Data, Quality of Service Management.
Case Study of Distributed Systems: Amoeba, Mach, Chorus.
Time Table
Lecture Plan
34  Issues in deadlock detection & Resolutions   NOTES / R1: 316-319
35  Deadlock Handling Strategy                   R2: 163, 312-316
36  Distributed Deadlock Algorithms              NOTES
Reference Books
R1: Pradeep K. Sinha, Distributed Operating Systems: Concepts and Design, PHI
R2: Andrew S. Tanenbaum
R3: Singhal and Shivaratri, Advanced Concepts in Operating Systems, McGraw Hill
List of Books
Q.4 a) Explain the various services which can be performed on a file in a distributed file system.
Q.4 b) What is file caching? Explain which cache location provides the highest performance gain.
Q.5 a) Discuss remote procedure call (RPC) mechanism in brief. Also discuss client and server
stub.
Q 5 b) Give a brief introduction to Group Communication.
Q.6 Write short notes on (any three):
1. Tightly coupled system
2. DFS
3. DSM
4. Transparency
5. NUMA System
NAME GROUP OF INSTITUTES
MID SEMESTER II
BRANCH- COMPUTER SCIENCE & TECHNOLOGY
SEMESTER: VII
SUBJECT: Distributed Systems (CS-702)
1. a) What are the main differences between a network operating system and a distributed operating
system?
b) Explain the distributed system characteristics. Also explain its advantages and disadvantages.
2.a) Differentiate between the following:
1. Stateless and Stateful servers
2. Monolithic kernel and Microkernel
b) Compare and contrast the techniques of caching disk blocks locally on a client system and
remotely on a server.
Q.3 a) Prove that the presence of a cycle in a general resource allocation graph is a necessary but not a
sufficient condition for the existence of deadlock.
b) Differentiate between replication and caching.
Q.4 a) Explain design and implementation issues of Distributed shared memory.
b) A directory service is very similar to a Yellow Pages service: entities are looked up by specifying
properties instead of exact names (such as in DNS). Why is it so difficult to build an efficient
distributed directory service?
Q.5 a) Explain the bully algorithm. What will happen in the bully algorithm for electing a coordinator
when two or more processes almost simultaneously discover that the coordinator has crashed?
b) Make comparison between Weak and Release consistency.
Q.6 a) Differentiate between NFS and Coda file system on various characteristics.
b) What is timestamping? Does using timestamping for concurrency control ensure
serializability? Discuss.
Q.7 a) Give reasons why:
1. Calling the printf statement of C using RPC is impossible.
2. Call by reference is not possible through RPC.
Q.8) Write short notes on any three of the following:
1. Unix emulation in MACH    2. DCE directory services
3. Object based DSM    4. Load balancing
5. RPC time service
NAME College of Technology, Bhopal
Department of Computer Science & Engineering
Assignment-1
Unit-1
Q.1) Compare centralized system and distributed system.
Assignment-2
Unit-2
Assignment-3
Unit-3
Assignment-4
Unit-4
Assignment-5
Unit-5
Tutorial-1
Unit-1
Tutorial-2
Unit-2
Tutorial-3
Unit-3
Tutorial-4
Unit-4
1. Write a translation scheme to generate intermediate code for assignment statement with
array references.
2. Write syntax directed definition to translate switch statement. With a suitable example,
show translation of the source language switch statement.
3. Write short note on back patching.
4. Write short note on code generation.
5. What are general issues in designing a code generator?
6. Explain code generation algorithm.
7. What is basic block? With suitable example discuss various transformations on the basic
blocks.
8. What are the different types of intermediate codes? Explain in brief.
NAME College of Technology, Bhopal
Department of Computer Science & Engineering
Tutorial-5
Unit-5
Note: Attempt one question from each Unit. Draw neat diagrams where needed. All
questions carry equal marks.
UNIT-I
1. (a) What are the design issues for a good distributed system?
OR
4 .(a) Draw the file service architecture and explain all modules with example.
(b) What do you understand by a naming system in distributed systems? Explain the object
locating mechanism.
UNIT-III
5. (a) How does a procedure call at a remote system take place?
(b) Discuss one to one and one to many in Group Communication.
OR
UNIT-IV
7. (a) How can distributed deadlocks be prevented? Give any one strategy with an
explanation.
(b) Compare the various types of deadlocks in distributed systems.
OR
8. (a) What are the issues in load distributing in distributed scheduling?
(b) Discuss task migration and its issues.
UNIT-V
OR
10. Write short notes on the following :
(i) Quality of Service Management in distributed systems.
(ii) Characteristics of distributed multimedia.
Unit Wise Blow-up
Unit-I
Introduction
The word distributed in terms such as "distributed system", "distributed programming", and
"distributed algorithm" originally referred to computer networks where individual computers were
physically distributed within some geographical area. The terms are nowadays used in a much
wider sense, even when referring to autonomous processes that run on the same physical computer
and interact with each other by message passing. While there is no single definition of a distributed
system, the following defining properties are commonly used:
There are several autonomous computational entities, each of which has its own local
memory.
The entities communicate with each other by message passing. In this article, the
computational entities are called computers or nodes. A distributed system may have a common
goal, such as solving a large computational problem. Alternatively, each computer may have its
own user with individual needs, and the purpose of the distributed system is to coordinate the use
of shared resources or provide communication services to the users. Other typical properties of
distributed systems include the following:
The system has to tolerate failures in individual computers.
The structure of the system (network topology, network latency, number of computers) is not
known in advance, the system may consist of different kinds of computers and network links, and
the system may change during the execution of a distributed program.
Each computer has only a limited, incomplete view of the system. Each computer
may know only one part of the input.
Architecture for Distributed System
Various hardware and software architectures are used for distributed computing. At a lower level, it
is necessary to interconnect multiple CPUs with some sort of network, regardless of whether that
network is printed onto a circuit board or made up of loosely coupled devices and cables. At a
higher level, it is necessary to interconnect processes running on those CPUs with some sort of
communication system. Distributed programming typically falls into one of several basic
architectures or categories: client-server, 3-tier architecture, n-tier architecture, distributed objects,
loose coupling, or tight coupling.
Client-server: Smart client code contacts the server for data, then formats and displays it to
the user. Input at the client is committed back to the server when it represents a permanent change.
3-tier architecture: Three tier systems move the client intelligence to a middle tier so that
stateless clients can be used. This simplifies application deployment. Most web applications are 3-
Tier.
n-tier architecture: n-tier refers typically to web applications which further forward their
requests to other enterprise services. This type of application is the one most responsible for the
success of application servers.
Tightly coupled (clustered): refers typically to a cluster of machines that closely work
together, running a shared process in parallel. The task is subdivided into parts that are worked on
individually by each machine and then put back together to make the final result.
Peer-to-peer: an architecture where there is no special machine or machines that provide a
service or manage the network resources. Instead all responsibilities are uniformly divided among
all machines, known as peers. Peers can serve both as clients and servers.
Space based: refers to an infrastructure that creates the illusion (virtualization) of one single
address-space. Data are transparently replicated according to application needs. Decoupling in
time, space and reference is achieved. Another basic aspect of distributed computing architecture is
the method of communicating and coordinating work among concurrent processes. Through
various message passing protocols, processes may communicate directly with one another,
typically in a master/slave relationship. Alternatively, a "database-centric" architecture can enable
distributed computing to be done without any form of direct inter process communication, by
utilizing a shared database.
Hardware Concepts
1. Multiprocessors
2. Multicomputers
3. Networks of Computers
Distinguishing Features:
Private versus shared memory
Bus versus switched interconnection
Networks of Computers
Software Concepts
Distributed operating system
Network operating system
Middleware
The discussion below focuses on the case of multiple computers, although many of the issues are
the same for concurrent processes running on a single computer. Three viewpoints are commonly
used:
Centralized algorithms
Consider, as a running example, the problem of finding a colouring of a graph G. In the centralized
view, the graph G is encoded as a string, and the string is given as input to a computer. The computer
program finds a colouring of the graph, encodes the colouring as a string, and outputs the result.
Parallel algorithms
Again, the graph G is encoded as a string. However, multiple computers can access the same
string in parallel. Each computer might focus on one part of the graph and produce a coloring for
that part.
The main focus is on high-performance computation that exploits the processing power of
multiple computers in parallel.
Distributed algorithms
The graph G is the structure of the computer network. There is one computer for each node
of G and one communication link for each edge of G. Initially, each computer only knows about its
immediate neighbors in the graph G; the computers must exchange messages with each other to
discover more about the structure of G. Each computer must produce its own colour as output.
The main focus is on coordinating the operation of an arbitrary distributed system. While the
field of parallel algorithms has a different focus than the field of distributed algorithms, there is a
lot of interaction between the two fields. For example, the Cole-Vishkin algorithm for graph
colouring was originally presented as a parallel algorithm, but the same technique can also be used
directly as a distributed algorithm. Moreover, a parallel algorithm can be implemented either in a
parallel system (using shared memory) or in a distributed system (using message passing). The
traditional boundary between parallel and distributed algorithms (choose a suitable network vs. run
in any given network) does not lie in the same place as the boundary between parallel and
distributed systems (shared memory vs. message passing).
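To make the centralized viewpoint above concrete, here is a minimal greedy graph colouring sketch in Python; the adjacency list G is an invented example, and a distributed formulation would instead let each node of the network choose its own colour by exchanging messages with its neighbours.

# Minimal sketch: greedy graph colouring as a centralized algorithm.
def greedy_colouring(graph):
    colouring = {}
    for node in graph:                                # visit nodes in a fixed order
        used = {colouring[n] for n in graph[node] if n in colouring}
        colour = 0
        while colour in used:                         # smallest colour unused by neighbours
            colour += 1
        colouring[node] = colour
    return colouring

if __name__ == "__main__":
    G = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
    print(greedy_colouring(G))                        # {'a': 0, 'b': 1, 'c': 2, 'd': 1}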
Distributed systems also have some disadvantages:
1. Security: as noted earlier, distributed systems have inherent security issues.
2. Networking: if the network gets saturated, problems with transmission will surface.
3. Software: there is currently very little software support for distributed systems.
4. Troubleshooting: troubleshooting and diagnosing problems in a distributed system can also
become more difficult, because the analysis may require connecting to remote nodes or inspecting
communication between nodes.
Synchronization
Clock synchronization is a problem from computer science and engineering which deals with the
idea that internal clocks of several computers may differ. Even when initially set accurately, real
clocks will differ after some amount of time due to clock drift, caused by clocks counting time at
slightly different rates. There are several problems that occur as a repercussion of rate differences
and several solutions, some being more appropriate than others in certain contexts. In serial
communication, some people use the term "clock synchronization" merely to mean getting one
metronome-like clock signal to pulse at the same frequency as another (frequency
synchronization and phase synchronization). Such "clock synchronization" is used in
synchronization in telecommunications and automatic baud rate detection.
Process scheduling
Preemptive scheduling is widespread in operating systems and in parallel processing on symmetric
multiprocessors. However, in distributed systems it is practically unheard of. Scheduling in
distributed systems is an important issue, and has performance impact on parallel processing, load
balancing and metacomputing. Non-preemptive scheduling can perform well if the task lengths
and processor speeds are known in advance, since job placement can then be done intelligently.
Deadlock handling
Deadlocks in distributed systems are similar to deadlocks in single-processor systems, only worse.
They are harder to avoid, prevent or even detect, and harder to cure when tracked down because all
relevant information is scattered over many machines. People sometimes classify deadlocks into
two types: communication deadlocks (processes competing for buffers for send/receive) and
resource deadlocks (exclusive access to I/O devices, files, locks, and other resources). If we treat
everything as a resource, there are only resource deadlocks. The four best-known strategies to
handle deadlocks are:
The ostrich algorithm (ignore the problem)
Detection (let deadlocks occur, detect them, and try to recover)
Prevention (statically make deadlocks structurally impossible)
Avoidance (avoid deadlocks by allocating resources carefully)
Ignoring the problem is as good and as popular in distributed systems as it is in single-processor
systems. In distributed systems used for programming, office automation, and process control, no
system-wide deadlock mechanism is present; distributed databases will implement their own if
they need one. Deadlock detection and recovery is popular because prevention and avoidance are
so difficult to implement. Deadlock prevention is possible because of the presence of atomic
transactions; we will have two algorithms for this. Deadlock avoidance is never used in distributed
systems; in fact, it is not even used in single-processor systems. The problem is that the banker's
algorithm needs to know in advance how much of each resource every process will eventually
need, and this information is rarely, if ever, available. Hence, we will just talk about deadlock
detection and deadlock prevention.
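Deadlock detection in practice usually means looking for a cycle in a wait-for graph. A minimal illustrative sketch in Python follows; the process names and edges are invented for the example.

# Deadlock detection sketch: look for a cycle in a wait-for graph.
# An edge P -> Q means "process P is waiting for a resource held by Q".
def has_cycle(wait_for):
    WHITE, GREY, BLACK = 0, 1, 2
    state = {p: WHITE for p in wait_for}

    def visit(p):
        state[p] = GREY
        for q in wait_for.get(p, []):
            if state.get(q, WHITE) == GREY:           # back edge => cycle => deadlock
                return True
            if state.get(q, WHITE) == WHITE and visit(q):
                return True
        state[p] = BLACK
        return False

    return any(state[p] == WHITE and visit(p) for p in wait_for)

if __name__ == "__main__":
    # P1 waits for P2, P2 waits for P3, P3 waits for P1: a deadlock.
    print(has_cycle({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}))   # True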
Load Balancing
Resource Scheduling (RS)
RS continuously monitors utilization across resource pools and intelligently aligns resources with
business needs, enabling you to:
Dynamically allocate IT resources to the highest priority applications.
Create rules and policies to prioritize how resources are allocated to virtual machines.
Give IT autonomy to business organizations: provide dedicated IT infrastructure to business
units while still achieving higher hardware utilization through resource pooling.
Empower business units to build and manage virtual machines within their resource pool
while giving central IT control over hardware resources.
File Sharing
File sharing is the practice of distributing or providing access to digitally stored information, such
as computer programs, multi-media (audio, video), documents, or electronic books. It may be
implemented through a variety of storage, transmission, and distribution models and common
methods of file sharing incorporate manual sharing using removable media, centralized computer
file server installations on computer networks, World Wide Web-based hyper linked documents,
and the use of distributed peer-to-peer (P2P) networking. The Distributed File System is used to
build a hierarchical view of multiple file servers and shares on the network. Instead of having to
think of a specific machine name for each set of files, the user will only have to remember one
name, which will be the 'key' to a list of shares found on multiple servers on the network. Think of
it as the home of all file shares with links that point to one or more servers that actually host those
shares. DFS has the capability of routing a client to the closest available file server by using Active
Directory site metrics. It can also be installed on a cluster for even better performance and
reliability. Medium to large sized organizations are most likely to benefit from the use of DFS - for
smaller companies it is simply not worth setting up since an ordinary file server would be just fine.
Concurrency Control
In computer science, especially in the fields of computer programming (see also concurrent
programming, parallel programming), operating systems (see also parallel computing),
multiprocessors, and databases, concurrency control ensures that correct results for concurrent
operations are generated, while getting those results as quickly as possible. Distributed
concurrency control is the concurrency control of a system distributed over a computer network.
Failure handling
In a distributed system, failure transparency refers to the extent to which errors and subsequent
recoveries of hosts and services within the system are invisible to users and applications. For
example, if a server fails, but users are automatically redirected to another server and never notice
the failure, the system is said to exhibit high failure transparency.
Failure transparency is one of the most difficult types of transparency to achieve since it
is often difficult to determine whether a server has actually failed, or whether it is simply
responding very slowly. Additionally, it is generally impossible to achieve full failure transparency
in a distributed system since networks are unreliable.
Configuration
Dynamic system configuration is the ability to modify and extend a system while it is running. The
facility is a requirement in large distributed systems where it may not be possible or economic to
stop the entire system to allow modification to part of its hardware or software. It is also useful
during production of the system to aid incremental integration of component parts, and during
operation to aid system evolution.
Unit-II
Distributed Shared Memory and Distributed File System
Distributed Shared Memory (DSM), also known as a distributed global address space
(DGAS), is a term in computer science that refers to a wide class of software and hardware
implementations, in which each node of a cluster has access to shared memory in addition to each
node's non-shared private memory.
Software DSM systems can be implemented in an operating system, or as a programming
library. Software DSM systems implemented in the operating system can be thought of as
extensions of the underlying virtual memory architecture. Such systems are transparent to the
developer, which means that the underlying distributed memory is completely hidden from the
users. In contrast, Software DSM systems implemented at the library or language level are not
transparent and developers usually have to program differently. However, these systems offer a
more portable approach to DSM system implementation. Software DSM systems also have the
flexibility to organize the shared memory region in different ways. The page based approach
organizes shared memory into pages of fixed size. In contrast, the object based approach organizes
the shared memory region as an abstract space for storing shareable objects of variable sizes.
Another commonly seen implementation uses a tuple space, in which the unit of sharing is a tuple.
Shared memory architecture may involve separating memory into shared parts distributed amongst
nodes and main memory; or distributing all memory between nodes. A coherence protocol, chosen
in accordance with a consistency model, maintains memory coherence.
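As a toy illustration of a write-invalidate coherence protocol (one common choice for keeping DSM copies consistent), here is a single-process Python sketch; real systems do this with page faults and messages between nodes, and the class and node names are invented.

# Toy write-invalidate coherence for a single shared "page".
# Each node keeps (or loses) a cached copy; a writer invalidates all other copies.
class ToyDSMPage:
    def __init__(self, nodes, initial=0):
        self.owner_value = initial                # the authoritative copy
        self.copies = {n: None for n in nodes}    # per-node cached copy (None = invalid)

    def read(self, node):
        if self.copies[node] is None:             # "page fault": fetch a fresh copy
            self.copies[node] = self.owner_value
        return self.copies[node]

    def write(self, node, value):
        for other in self.copies:                 # invalidate every other cached copy
            if other != node:
                self.copies[other] = None
        self.owner_value = value
        self.copies[node] = value

if __name__ == "__main__":
    page = ToyDSMPage(["A", "B"])
    print(page.read("A"), page.read("B"))   # 0 0  (both nodes cache the page)
    page.write("A", 42)                      # B's copy is invalidated
    print(page.read("B"))                    # 42 (B re-fetches the page)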
Examples of such systems include:
Delphi DSM
JIAJIA
Kerrighed
NanosDSM
OpenSSI
MOSIX
Strings
Terracotta
TreadMarks
DIPC
Intel Cluster OpenMP is internally a Software DSM.
ScaleMP
RNA networks
DSM Subsystem
Routines to handle page faults relating to virtual addresses corresponding to a DSM region.
Code to service system calls which allow a user process to get, attach and detach a DSM
region.
Code to handle system calls from the DSM server.
DSM Server
In-server: receives messages from remote DSM servers and takes appropriate action (e.g.
invalidating its copy of a page).
Out-server: receives requests from the DSM subsystem and communicates with its peer DSM
servers at remote nodes. Note that the DSM subsystem itself does not directly communicate over
the network with other hosts.
Communication with the key server.
Key Server
Each region must be uniquely identifiable across the entire LAN. When a process executes a
system call with a key and is the first process at that host to do so, the key server is consulted.
The key server's internal table is looked up for the key; if the key is not found, the specified key is
stored in the table as a new entry.
Design & Implementations issues In DSM System
There are various factors that have to be kept in mind while designing and implementing the DSM
systems. They are as follows:
1. Block Size:
As we know, transfer of memory blocks is the major operation in DSM systems. Therefore the
block size matters a lot here. Block size is often referred to as the granularity; it is the unit of
sharing, or the unit of data transfer, in the event of a network block fault. A block can be a few
words, a page, or a few pages. The size of the block depends on various factors such as paging
overhead, thrashing, false sharing, and directory size.
3. Replacement Strategy:
It may happen that one node is accessing a memory block from DSM when its own local memory
is completely full. In such a case, when the memory block migrating from the remote node arrives,
it finds no space in which to be placed. Thus a replacement strategy is of major concern in the
design and implementation of DSM systems: some block must be removed so as to make room for
the new block in such a situation. Several techniques are used for the replacement of old blocks,
such as removal of the least recently used (LRU) memory block, as the sketch below illustrates.
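A minimal least-recently-used replacement sketch in Python using the standard library's OrderedDict; the capacity and block identifiers are made up for the example.

from collections import OrderedDict

# LRU replacement for locally cached DSM blocks: when local memory is full,
# the least recently used block is evicted to make room for the incoming one.
class LRUBlockCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()            # block id -> block data

    def access(self, block_id, data=None):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
        else:
            if len(self.blocks) >= self.capacity:
                evicted, _ = self.blocks.popitem(last=False)   # least recently used
                print("evicting block", evicted)
            self.blocks[block_id] = data
        return self.blocks[block_id]

if __name__ == "__main__":
    cache = LRUBlockCache(capacity=2)
    cache.access("b1", "...")
    cache.access("b2", "...")
    cache.access("b1")            # b1 becomes most recently used
    cache.access("b3", "...")     # evicts b2, the least recently used block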
4. Thrashing:
Sometimes two or more processes might access the same memory block. Suppose two processes
need to perform write operations on the same memory block. To accomplish this, the block has to
migrate in both directions at very short intervals, so it is transferred back and forth at such a high
rate that neither process can perform its operation accurately and completely, and no real work
gets done. This condition is called thrashing. A technique to avoid thrashing should be
incorporated while designing DSM systems.
5. Heterogeneity:
DSM systems should be able to function on computers with different architectures.
Issues involved in DSM:
Network communication
Consistency
Data granularity
Coherence
Consistency Models
Strict consistency in shared memory systems
Sequential consistency in shared memory systems (our focus)
Other consistency protocols:
Causal consistency protocol
Weak and release consistency protocols
In computing, a distributed file system or network file system is any file system that allows
access to files from multiple hosts, shared via a computer network. This makes it possible for
multiple users on multiple machines to share files and storage resources. The client nodes do not
have direct access to the underlying block storage but interact over the network using a protocol.
This makes it possible to restrict access to the file system depending on access lists or capabilities
on both the servers and the clients, depending on how the protocol is designed. In contrast, in a
shared disk file system all nodes have equal access to the block storage where the file system is
located. On these systems the access control must reside on the client. Distributed file systems may
include facilities for transparent replication and fault tolerance. That is, when a limited number of
nodes in a file system go offline, the system continues to work without any data loss. The
difference between a distributed file system and a distributed data store can be
vague, but DFSes are generally geared towards use on local area networks.
Features
DFS offers many features that make managing multiple file servers much simpler and more effective.
Unified Namespace
DFS links multiple shared folders on multiple servers into a folder hierarchy. This hierarchy looks
the same as a physical directory structure on a single hard disk. However, in this case, an individual
branch of the hierarchy can be on any of the participating servers.
Location Transparency
Even if the files are scattered across multiple servers, users need to go to only one network
location. This is a very powerful feature. Users do not need to know if the actual file location has
changed. There is no need to inform everyone about using new paths or server names! Imagine
how much time and energy this can save. It reduces downtime required during server renames,
planned or unplanned shutdowns and so on.
Continuous Availability
As mentioned, during planned shutdowns, the file resources can be temporarily made available
from another standby server, without users needing to be notified about it. This way, downtime
related to maintenance or disaster recovery tasks is completely eliminated. This is especially useful
for Web servers: the Web server file locations can be configured in such a way that even
when the physical location of the files changes to another server, the HTML links continue to
work without breaking.
Replication
It is possible to replicate data to one or more servers within the DFS structure. This way, if one
server is down, files will be automatically served from other replicated locations. What's more,
users will not even know the difference.
Load Balancing
This is a conceptual extension of the replication feature. Since copies of the same file can be placed
across multiple locations, if the file is requested by more than one user at the same time, DFS will
serve it from different locations. This way, the load on one server is balanced across multiple
servers, which increases performance. Users do not even come to know that the file
came from a particular replica on DFS.
Security
DFS utilises the same NTFS security and file sharing permissions. Therefore, no special
configuration is required to integrate base security with DFS.
File Model
File Service Architecture: In this architecture the client computer runs a client module through
which application programs communicate with the directory service and the flat file service on the
server.
Distributed systems based on objects are one of the newest and most popular approaches to the
design and construction of distributed systems. The CORBA platform is built from several standards
published by the OMG (Object Management Group), whose objective is to provide a
common system for the construction of distributed systems in a heterogeneous environment. The
role of the ORB is to provide the system with the following services: network communication,
locating objects, delivering invocations to objects, and returning the results to clients. The basic
features of CORBA are independence from the programming language (through the IDL language)
and independence of the operating system, hardware, and communication protocol (IIOP).
Java RMI (Remote Method Invocation) is a second example of a distributed system platform based
on objects. RMI is a structure built on Java. The model assumes a proxy (stub) in the client's
address space and a corresponding facility in the server's address space that carries out the
operations on the object. The state of a remote object resides on a single machine; only the local
interface of the object is exported.
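CORBA and Java RMI are platform-specific, but the stub/proxy idea behind them can be illustrated in a few lines with Python's standard xmlrpc module, where ServerProxy plays the role of the client-side stub; the port number and the add function are chosen arbitrarily for this sketch.

# Remote invocation in miniature with Python's standard library (xmlrpc).
# The server registers a procedure; the client calls it through a proxy (stub).
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy
import threading

def add(x, y):
    return x + y

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True, logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = ServerProxy("http://localhost:8000")   # client-side stub/proxy object
print(proxy.add(2, 3))                         # marshals the call, prints 5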
Human Oriented Name
Names allow us to identify objects, to talk about them, and to access them. Naming is therefore an
important issue for large-scale distributed systems. It becomes a critical issue when those systems
are intended to support collaboration between humans. A large volume of research has already
been published on the subject of naming, particularly within the context of name servers and
directories. However, it can be argued that the hierarchical nature of many of the naming
mechanisms so far proposed is too constraining to fully support the great flexibility of human
naming practice, particularly where group work is concerned.
Unit-III
Inter Process Communication and Synchronization
In computer networking, an Internet socket or network socket is an endpoint of a bidirectional
inter-process communication flow across an Internet Protocol based computer network, such as the
Internet. The term Internet sockets is also used as a name for an application programming interface
(API) for the TCP/IP protocol stack, usually provided by the operating system. Internet sockets
constitute a mechanism for delivering incoming data packets to the appropriate application process
or thread, based on a combination of local and remote IP addresses and port numbers. Each socket
is mapped by the operating system to a communicating application process or thread.
A socket address is the combination of an IP address (the location of the computer) and a port
(which is mapped to the application program process) into a single identity, much like one end of a
telephone connection is the combination of a phone number and a particular extension. An Internet
socket is characterized by a unique combination of the following:
Protocol: A transport protocol (e.g., TCP, UDP), raw IP, or others. TCP port 53 and UDP port
53 are different, distinct sockets.
Local socket address: Local IP address and port number
Remote socket address: Only for established TCP sockets. As discussed in the Client-Server
section below, this is necessary since a TCP server may serve several clients concurrently. The
server creates one socket for each client, and these sockets share the same local socket address.
Sockets are usually implemented by an API library such as Berkeley sockets, first introduced in
1983. Most implementations are based on Berkeley sockets, for example Winsock introduced in
1991. Other socket API implementations exist, such as the STREAMS-based Transport Layer
Interface (TLI). Development of application programs that utilize this API is called socket
programming or network programming. These are examples of functions or methods typically
provided by the API library:
socket() creates a new socket of a certain socket type, identified by an integer number, and
allocates system resources to it.
bind() is typically used on the server side, and associates a socket with a socket address
structure, i.e. a specified local port number and IP address.
listen() is used on the server side, and causes a bound TCP socket to enter listening state.
connect() is used on the client side, and assigns a free local port number to a socket. In case of
a TCP socket, it causes an attempt to establish a new TCP connection.
accept() is used on the server side. It accepts a received incoming attempt to create a new TCP
connection from the remote client, and creates a new socket associated with the socket address pair
of this connection.
send() and recv(), or write() and read(), or recvfrom() and sendto(), are used for sending and
receiving data to/from a remote socket.
close() causes the system to release resources allocated to a socket. In case of TCP, the
connection is terminated.
gethostbyname() and gethostbyaddr() are used to resolve host names and addresses.
select() is used to prune a provided list of sockets for those that are ready to read, ready to
write or have errors
poll() is used to check on the state of a socket. The socket can be tested to see if it can be
written to, read from or has errors.
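A minimal sketch in Python tying together several of the calls listed above: a TCP echo server built from socket(), bind(), listen(), accept(), recv(), send() and close(), and a client using connect(); the port number is arbitrary and the server runs in a background thread only so that the example is self-contained.

import socket, threading, time

# Server side: socket() -> bind() -> listen() -> accept() -> recv()/send() -> close()
def echo_server(port):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("localhost", port))
    srv.listen(1)
    conn, addr = srv.accept()        # one new socket per connected client
    conn.send(conn.recv(1024))       # echo the received bytes back
    conn.close()
    srv.close()

# Client side: socket() -> connect() -> send() -> recv() -> close()
def echo_client(port, message):
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect(("localhost", port))
    cli.send(message)
    reply = cli.recv(1024)
    cli.close()
    return reply

if __name__ == "__main__":
    threading.Thread(target=echo_server, args=(9000,), daemon=True).start()
    time.sleep(0.2)                  # give the server a moment to start listening
    print(echo_client(9000, b"hello"))   # b'hello'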
Group Communication
Computer systems consisting of multiple processors are becoming commonplace. Many companies
and institutions, for example, own a collection of workstations connected by a local area network
(LAN). Although the hardware for distributed computer systems is advanced, the software has
many problems. We believe that one of the main problems is the communication paradigms that
are used. This thesis is concerned with software for distributed computer systems. In it, we will
study an abstraction, called group communication that simplifies building reliable efficient
distributed systems. We will discuss a design for group communication, show that it can be
implemented efficiently, and describe the design and implementation of applications based on
group communication. Finally, we will give extensive performance measurements. Our goal is to
demonstrate that group communication is a suitable abstraction for distributed systems.
Clock Synchronization
Clock synchronization is a problem from computer science and engineering which deals with the
idea that internal clocks of several computers may differ. Even when initially set accurately, real
clocks will differ after some amount of time due to clock drift, caused by clocks counting time at
slightly different rates. There are several problems that occur as a repercussion of rate differences
and several solutions, some being more appropriate than others in certain contexts. In a centralized
system the solution is trivial; the centralized server will dictate the system time. Cristian's
algorithm and the Berkeley Algorithm are some solutions to the clock synchronization problem in
a centralized server environment. In a distributed system the problem takes on more complexity
because a global time is not easily known. The most used clock synchronization solution on the
Internet is the Network Time Protocol (NTP) which is a layered client-server architecture based on
UDP message passing. Lamport timestamps and Vector clocks are concepts of the logical clocks in
distributed systems.
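Lamport timestamps, mentioned above, can be captured in a few lines; the sketch below is only illustrative, and the two processes and their events are invented.

# Lamport logical clock: increment on every local event;
# on receiving a message, jump ahead of the sender's timestamp if necessary.
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):                      # local event or message send
        self.time += 1
        return self.time

    def receive(self, sender_time):      # message receive
        self.time = max(self.time, sender_time) + 1
        return self.time

if __name__ == "__main__":
    p1, p2 = LamportClock(), LamportClock()
    t_send = p1.tick()           # P1 sends a message stamped with its clock
    p2.tick(); p2.tick()         # P2 has already seen two local events
    print(p2.receive(t_send))    # 3: the receive is ordered after both histories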
Cristian's algorithm
Cristian's algorithm relies on the existence of a time server. The time server maintains its clock by
using a radio clock or other accurate time source, then all other computers in the system stay
synchronized with it. A time client will maintain its clock by making a procedure call to the time
server. Variations of this algorithm make more precise time calculations by factoring in network
propagation time.
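A hedged sketch of the round-trip compensation used by Cristian's algorithm; get_server_time() here is only a stand-in for the real request sent to the time server over the network.

import time

def get_server_time():
    # Placeholder for the real request to the time server (e.g. over UDP/TCP).
    return time.time()

# Cristian's algorithm: ask the time server for the time, and compensate for
# network delay by adding half of the measured round-trip time.
def cristian_adjusted_time():
    t0 = time.time()                 # client clock when the request is sent
    server_time = get_server_time()  # value returned by the time server
    t1 = time.time()                 # client clock when the reply arrives
    round_trip = t1 - t0
    return server_time + round_trip / 2.0

print(cristian_adjusted_time())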
Berkeley algorithm
This algorithm is more suitable for systems where a radio clock is not present; such a system has
no way of determining the actual time other than by maintaining a global average time as the
global time. A time server will periodically fetch the time from all the time clients, average the
results, and then report back to the clients the adjustment that needs to be made to their local clocks
to achieve the average. This algorithm highlights the fact that internal clocks may vary not only in
the time they contain but also in the clock rate. Often, any client whose clock differs by a value
outside of a given tolerance is disregarded when averaging the results. This prevents the overall
system time from being drastically skewed due to one erroneous clock.
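The averaging step of the Berkeley algorithm, sketched in Python; the clock readings, node names and tolerance are invented for the example, and whether an out-of-tolerance machine still receives an adjustment is a design choice of this sketch.

# Berkeley algorithm (averaging step): the time daemon polls every clock,
# discards readings that differ from its own by more than a tolerance,
# averages the rest, and tells each machine how much to adjust its clock.
def berkeley_adjustments(master_time, client_times, tolerance=10.0):
    readings = {"master": master_time}
    readings.update({c: t for c, t in client_times.items()
                     if abs(t - master_time) <= tolerance})   # drop wild outliers
    average = sum(readings.values()) / len(readings)
    # Each machine (master included) receives the offset it must apply.
    return {name: average - t
            for name, t in {"master": master_time, **client_times}.items()}

if __name__ == "__main__":
    print(berkeley_adjustments(100.0, {"A": 102.0, "B": 98.0, "C": 250.0}))
    # C's reading is ignored for the average, but C still gets an adjustment.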
Mutual Exclusion
Assumptions
The system consists of n processes; each process Pi resides at a different processor. Each process
has a critical section that requires mutual exclusion.
Basic requirement: if Pi is executing in its critical section, then no other process Pj is executing in
its critical section. The presented algorithms ensure mutually exclusive execution of processes in
their critical sections. The requirements are:
Mutual exclusion must be enforced: only one process at a time is allowed in its critical section.
A process that halts in its non-critical section must do so without interfering with other processes.
It must not be possible for a process requiring access to a critical section to be delayed
indefinitely: no deadlock or starvation.
When no process is in a critical section, any process that requests entry to its critical section must
be permitted to enter without delay.
No assumptions are made about relative process speeds or the number of processors.
A process remains inside its critical section for a finite time only.
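One simple way to satisfy these requirements (not the only one) is a centralized coordinator that grants the critical section to one process at a time; the Python sketch below is illustrative, with invented process names.

from collections import deque

# A central coordinator grants the critical section to one process at a time
# and queues every other request until the holder releases it.
class Coordinator:
    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def request(self, process):
        if self.holder is None:
            self.holder = process
            return "GRANTED"
        self.waiting.append(process)     # no reply yet: the process must wait
        return "QUEUED"

    def release(self, process):
        assert process == self.holder
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder               # next process granted, if any

if __name__ == "__main__":
    c = Coordinator()
    print(c.request("P1"))   # GRANTED
    print(c.request("P2"))   # QUEUED
    print(c.release("P1"))   # P2 (now holds the critical section)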
Election Algorithms: Bully & Ring Algorithms
There are at least two basic strategies by which a distributed system can adapt to failures.
Operate continuously as failures occur and are repaired
The second alternative is to temporarily halt normal operation and to take some time out to
reorganize the system.
The reorganization of the system is managed by a single node called the coordinator.
So as a first step in any reorganization, the operating or active nodes must elect a coordinator.
Similar
Like mutual exclusion, all processes must come to an agreement about a single process; here the
agreement is about who the leader is rather than about who enters the critical region.
Different
The election protocol must properly deal with the case of a coordinator failing. On the other
hand, mutual exclusion algorithms assume that the process in the critical region (i.e., the
coordinator) will not fail.
A new coordinator must inform all active nodes that it is the coordinator. In a mutual
exclusion algorithm, the nodes not in the critical region have no need to know what node is in the
region.
The two classical election algorithms by Garcia-Molina
Bully Algorithm
Invitation Algorithm
Ring Algorithm
Election algorithms
We often need one process to act as a coordinator. It may not matter which process does this, but
there should be group agreement on only one. An assumption in election algorithms is that all
processes are exactly the same with no distinguishing characteristics. Each process can obtain a
unique identifier (for example, a machine address and process ID) and each process knows of
every other process but does not know which is up and which is down.
Bully algorithm
The bully algorithm selects the process with the largest identifier as the coordinator. It works as
follows:
1. When a process p detects that the coordinator is not responding to requests, it initiates an
election:
a. p sends an election message to all processes with higher numbers.
b. If nobody responds, then p wins and takes over.
c. If one of the processes answers, then p's job is done.
2. If a process receives an election message from a lower-numbered process at any time, it:
a. sends an OK message back.
b. holds an election (unless it's already holding one).
3. A process announces its victory by sending all processes a message telling them that it is the
new coordinator.
4. If a process that has been down recovers, it holds an election.
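A compact, synchronous sketch of the steps above in Python; a real implementation uses messages and timeouts, and the process identifiers and liveness table here are invented.

# Bully election, simplified: the initiator "sends" ELECTION to all higher-numbered
# processes; if any live one answers, it takes over the election, otherwise the
# initiator wins. The highest-numbered live process always ends up as coordinator.
def bully_election(initiator, processes, alive):
    higher = [p for p in processes if p > initiator and alive.get(p, False)]
    if not higher:
        return initiator                       # nobody answered: initiator wins
    # Any answering process holds its own election; the outcome is the same
    # as letting the highest live process continue.
    return bully_election(max(higher), processes, alive)

if __name__ == "__main__":
    procs = [1, 2, 3, 4, 5]
    alive = {1: True, 2: True, 3: True, 4: True, 5: False}   # old coordinator 5 is down
    print(bully_election(2, procs, alive))    # 4 becomes the new coordinator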
Ring algorithm
The ring algorithm uses the same ring arrangement as in the token ring mutual exclusion
algorithm, but does not employ a token. Processes are physically or logically ordered so that each
knows its successor.
If any process detects failure, it constructs an election message with its process I.D. (e.g.
network address and local process I.D.) and sends it to its successor.
If the successor is down, it skips over it and sends the message to the next party. This process is
repeated until a running process is located.
At each step, the process adds its own process I.D. to the list in the message. Eventually, the
message comes back to the process that started it, which recognizes its own I.D. in the list; the
highest-numbered process in the list is then announced as the new coordinator.
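A simplified sketch of one circulation of the election message in Python; the ring order, process identifiers and liveness table are invented, and message passing is replaced by a loop over the ring.

# Ring election, simplified: the election message circulates, collecting the
# identifiers of live processes; when it returns to the initiator, the highest
# identifier in the list becomes the coordinator.
def ring_election(initiator, ring, alive):
    ids = []
    i = ring.index(initiator)
    n = len(ring)
    for step in range(n):                      # go once around the ring
        p = ring[(i + step) % n]
        if alive.get(p, False):
            ids.append(p)                      # each live process appends its ID
        # a dead successor is simply skipped
    return max(ids)                            # announced as the new coordinator

if __name__ == "__main__":
    ring = [3, 6, 1, 7, 4]                     # logical ring order (invented IDs)
    alive = {3: True, 6: True, 1: True, 7: False, 4: True}
    print(ring_election(6, ring, alive))       # 6 is the highest live ID -> coordinator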
Migration issues
Obviously, achieving the goals of load balancing and transparency with as low over head as
possible presents a formidable task. The following are some of the main issues to be dealt with:
(1) Allocation and scheduling: How is a target node chosen? What are the factors taken into
consideration while choosing a destination host? Is load balanced dynamically, or only reallocated
during special circumstances like eviction or imminent host failure? Does the previous history of
allocation on a node make that node more attractive due to the presence of "warm caches" , also
known as cache affinity scheduling ? Considering that all of the above systems represent loosely
coupled environments, how much of a difference can such a consideration make? Similarly, what
is the best allocation policy for an I/O intensive process?
(2) Once a target has been chosen, how is the process state saved and transferred? For example,
would virtual memory pages be transferred all at once, increasing the latency between process
suspension and resumption, or transferred on a demand-paged basis, thus speeding up migration?
An important consideration over here is how much of "residual dependency" do we allow on the
ex-host?
(3) How is migration supported by the underlying file system for kernel level schemes? Are files
assumed to be accessible from any point? For transparency, a transparent file system would itself
seem to be a prerequisite.
(4) How are name spaces dealt with? Do process IDs, file descriptors, etc., change with migration?
How does global naming help? How are sockets and signals managed?
(5) What are the scaling considerations that have been incorporated into the design?
(6) What is the level of transparency?
Distributed Multimedia Database involves network technology, distributed control, security, and
multimedia computing. This chapter discusses fundamental concepts and introduces issues of
image database and digital libraries, video-on-demand systems, multimedia synchronization, as
well as some case studies of distributed multimedia database systems. Requirements of multimedia
database management systems and their functions are also presented.
Overview
Distributed database management system is software for managing databases stored on multiple
computers in a network. A distributed database is a set of databases stored on multiple computers
that typically appears to applications as a single database. Consequently, an application can
simultaneously access and modify the data in several databases in a network. A DDBMS is specially
developed for heterogeneous database platforms, focusing mainly on heterogeneous database
management system (HDBMS).
A distributed database can be defined as a collection of data whose different parts are under the
control of separate DBMSs running on independent computer systems. Alternatively, a distributed
database is a database that is under the control of a central database management system (DBMS)
in which storage devices are not all attached to a common CPU. It may be stored in multiple
computers located in the same physical location, or may be dispersed over a network of
interconnected computers. Collections of data (eg. in a database) can be distributed across multiple
physical locations. A distributed database can reside on network servers on the Internet, on
corporate intranets or extranets, or on other company networks. Replication and distribution of
databases improve database performance at end-user worksites. To ensure that the distributed
databases are kept up to date and current, there are two processes: replication and duplication.
Replication involves using specialized software that looks for changes in the distributed database.
Once the changes have been identified, the replication process makes all the databases look the
same. The replication process can be very complex and time consuming, depending on the size and
number of the distributed databases. This process can also require a lot of time and computer
resources. Duplication, on the other hand, is not as complicated. It basically identifies one database
as a master and then duplicates that database. The duplication process is normally done at a set
time after hours. This is to ensure that each distributed location has the same data. During the
duplication process, changes to the master database only are allowed. This is to ensure that local
data will not be overwritten. Both of these processes can keep the data current in all distributed
locations. Besides distributed database replication and fragmentation, there are many other
distributed database design technologies. For example, local autonomy, synchronous and
asynchronous distributed database technologies. These technologies' implementation can and does
depend on the needs of the business and the sensitivity/confidentiality of the data to be stored in
the database, and hence the price the business is willing to spend on ensuring data security,
consistency and integrity.
Amoeba
Amoeba is the open source microkernel-based distributed operating system developed by Andrew
S. Tanenbaum and others at the Vrije Universiteit. The aim of the Amoeba project is to build a
timesharing system that makes an entire network of computers appear to the user as a single
machine. Development at Vrije Universiteit was stopped: the files in the latest version (5.3) were
last modified on 12 February 2001. Recent development is carried forward by Dr. Stefan Bosse at
BSS Lab. Amoeba runs on several platforms, including SPARC, i386, i486, 68030, Sun 3/50 and
Sun 3/60. The system uses FLIP as a network protocol. The Python programming language was
originally developed for this platform.
WHAT IS AMOEBA?
Amoeba is a general-purpose distributed operating system. It is designed to take a collection of
machines and make them act together as a single integrated system. In general, users are not aware
of the number and location of the processors that run their commands, nor of the number and
location of the file servers that store their files. To the casual user, an Amoeba system looks like a
single old-fashioned time-sharing system. Amoeba is an ongoing research project. It should be
thought of as a platform for doing research and development in distributed and parallel systems,
languages, protocols and applications. Although it provides some UNIX emulation, and has a
definite UNIX-like flavor (including over 100 UNIX-like utilities), it is NOT a plug-compatible
replacement for UNIX. It should be of interest to educators and researchers who want the source
code of a distributed operating system to inspect and tinker with, as well as to those who need
a base to run distributed and parallel applications. Amoeba is intended for both distributed
computing (multiple independent users working on different projects) and parallel computing
(e.g., one user using 50 CPUs to play chess in parallel). Amoeba provides the necessary
mechanism for doing both distributed and parallel applications, but the policy is entirely
determined by user-level programs. For example, both a traditional (i.e. sequential) make and a
new parallel
amake are supplied.
DESIGN GOALS
The basic design goals of Amoeba are:
Distribution: connecting together many machines
Parallelism: allowing individual jobs to use multiple CPUs easily
Transparency: having the collection of computers act like a single system
Performance: achieving all of the above in an efficient manner
Amoeba is a distributed system, in which multiple machines can be connected together. These
machines need not all be of the same kind. The machines can be spread around a building on a
LAN. Amoeba uses the high performance FLIP network protocol for LAN communication. If an
Amoeba machine has more than one network interface it will automatically act as a FLIP router
between the various networks and thus connect the various LANs together. Amoeba is also a
parallel system. This means that a single job or program can use multiple processors to gain speed.
For example, a branch and bound problem such as the Traveling Salesman Problem can use tens
or even hundreds of CPUs, if available, all working together to solve the problem more quickly.
Large back end multiprocessors, for example, can be harnessed this way as big compute
engines. Another key goal is transparency. The user need not know the number or the location of
the CPUs, nor the place where the files are stored. Similarly, issues like file replication
are handled largely automatically, without manual intervention by the users. Put in different terms,
a user does not log into a specific machine, but into the system as a whole. There is no concept of a
home machine. Once logged in, the user does not have to give special remote login commands
to take advantage of multiple processors or do special remote mount operations to access distant
files. To the user, the whole system looks like a single conventional timesharing system.
Performance and reliability are always key issues in operating systems, so substantial effort has
gone into dealing with them. In particular, the basic communication mechanism has been
optimized to allow messages to be sent and replies received with a minimum of delay, and to allow
large blocks of data to be shipped from machine to machine at high bandwidth. These building
blocks serve as the basis for implementing high performance subsystems and applications on
Amoeba.
Mach
Mach is an operating system microkernel developed at Carnegie Mellon University to support
operating system research, primarily distributed and parallel computation. It is one of the earliest
examples of a microkernel, and its derivatives are the basis of the modern operating system kernels
in Mac OS X and GNU Hurd.
Goals of Mach
Providing a base for building other operating systems (UNIX)
Supporting large sparse address spaces
Allowing transparent access to network resources
Exploiting parallelism in both the system and the applications
Making Mach portable to a larger collection of Machines
The basic abstractions in Mach are:
Task: an execution environment
Thread: the basic unit of execution and must run in the context of a task.
Port: a communication channel with an associated message queue
Port set: a group of ports sharing a common message queue
Message: a typed collection of data objects used in communication between threads (can
contain port rights in addition to pure data)
Memory object: a source of memory
Features of Mach
Multiprocessor operation
Transparent extension to network operation
User-level servers
Operating system emulation
Flexible virtual memory implementation
Portability
Mach was designed to execute on a shared memory multiprocessor so that both kernel threads and
user-mode threads could be executed by any processor.
Chorus
The evolution of computer applications has led to the design of large, distributed systems for
which the requirement for efficiency and availability has increased, as has the need for higher level
tools used in their construction, operation, and administration. This evolution has introduced the
following requirements for new system structures that are difficult to fulfill merely by assembling
networks of cooperating systems:
Separate applications running on different machines, often from different suppliers, using
different operating systems, and written in a variety of programming languages, need to be tightly
coupled and logically integrated. The loose coupling provided by current computer networking is
insufficient. A requirement exists for a higher-level coupling of applications.
Applications often evolve by growing in size. Typically, this growth leads to distribution of
programs to different machines, to treating several geographically distributed sets of files as a
unique logical file, and to upgrading hardware and software to take advantage of the latest
technologies. A requirement exists for a gradual on-line evolution.
Applications grow in complexity and become more difficult to understand, specify, debug,
and tune. A requirement exists for a straightforward underlying architecture which allows the
modularity of the application to be mapped onto the operational system and which conceals
unnecessary details of distribution from the application. These structural properties can best be
accomplished through a basic set of unified and coherent concepts which provide a rigorous
framework that is well adapted to constructing distributed systems. The CHORUS architecture has
been designed to meet these requirements. Its foundation is a generic Nucleus running on each
machine. Communication and distribution are managed at the lowest level by this Nucleus. The
generic CHORUS Nucleus implements the real-time services required by real-time applications.
Traditional operating systems are built as subsystems on top of the generic Nucleus and use its
basic services. User application programs run in the context of these operating systems. CHORUS
provides a UNIX subsystem as one example of a host operating system running on top of the
CHORUS Nucleus. UNIX programs can run unmodified under this subsystem, optionally taking
advantage of the distributed nature of the CHORUS environment.
The CHORUS Architecture
Overall Organization
A CHORUS System is composed of a small Nucleus and a set of system servers, which cooperate
in the context of subsystems (Figure 1). This overall organization provides the basis for an open
operating system. It can be mapped onto a centralized as well as a distributed configuration. At this
level, distribution is hidden.
The choice was made to build a two-level logical structure, with a generic Nucleus at the lowest
level and almost autonomous subsystems providing applications with traditional operating system
services. Therefore, the CHORUS Nucleus is not the core of a specific operating system, rather it
provides generic tools designed to support a variety of host subsystems, which can co-exist on top
of the Nucleus. This structure supports application programs, which already run on an existing
operating system, by reproducing the operating system's interfaces within a subsystem. An
example of this approach is given using a UNIX emulation environment called CHORUS/MiX.
The classic idea of separating the functions of an operating system into groups of services provided
by autonomous servers is central to the CHORUS philosophy. In monolithic systems, these
functions are usually part of the kernel. This separation of functions increases modularity, and
therefore the portability of the overall system.
The CHORUS Nucleus
The CHORUS Nucleus manages, at the lowest level, the local physical resources of a site. At the
highest level, it provides a location transparent inter-process communication (IPC) mechanism.
The Nucleus is composed of four major components providing local and global services (Figure
2):
The CHORUS supervisor dispatches interrupts, traps, and exceptions delivered by the
hardware;
The CHORUS real-time executive controls the allocation of processors and provides fine
grained synchronization and priority-based preemptive scheduling;
The CHORUS virtual memory manager is responsible for manipulating the virtual
memory hardware and local memory resources;
The CHORUS inter-process communication manager provides asynchronous message
exchange and remote procedure call (RPC) facilities in a location independent fashion. There are
no interdependencies among the four components of the CHORUS Nucleus. As a result, the
distribution of services provided by the Nucleus is almost hidden. Local services deal with local
resources and can be mostly managed using only local information. Global services involve
cooperation between Nuclei to provide distribution. In CHORUS-V3 it was decided, based on
experience with the efficiency of CHORUS-V2, to include in the Nucleus some functions that could have
been provided by system servers: actor and port management, name management, and RPC
management. The standard CHORUS IPC mechanism is the primary means used to communicate
with managers in a CHORUS system. For example, the virtual memory manager uses CHORUS
IPC to request remote data to service a page fault. The Nucleus was also designed to be highly
portable, which, in some instances, may preclude the use of some underlying hardware features.
Experience gained from porting the Nucleus to
The Subsystems
System servers work cooperatively to provide a coherent operating system interface, referred to as
a subsystem.
System Interfaces
A CHORUS system provides different levels of interface (Figure 1).
The Nucleus interface provides direct access to the low-level services of the CHORUS
Nucleus.
A subsystem interface is implemented by a set of cooperating, trusted servers, and typically
represents complex operating system abstractions. Several different subsystems may be resident on
a CHORUS system simultaneously, providing a variety of operating system or high-level
interfaces to different application procedures.
User libraries, such as the C library, further enhance the CHORUS interface by providing
commonly used programming facilities.