Distributed Computing CSC413 Note
Contents
Module I
Introduction to distributed systems, characteristics of distributed systems, Design
considerations for distributed systems, Advantages and Disadvantages of Distributed Computing
System. Advantages of distributed system over centralized system. Evolution of Distributed
Computing System. Multiple Processor Connection – tightly coupled/loosely coupled systems.
Module II
Forms of Distributed Computing System – Term Paper presentation.
Models of Distributed Computing System. Challenges of Distributed Computing System/Issues
in designing Distributed Computing System. Failure Detection in Distributed Computing
System. Fault Avoidance, Tolerance, Detection and Recovery in Distributed System.
Module III
Communication in Distributed Computing System. Overview of Data Communication.
Transmission mechanism. Transmission mode. Message Transport technique.
Module IV
Expectations: Each student will be assigned one of the forms of distributed computing paradigm to research: an overview of the form, its key characteristics, an explanation of the fundamental principle behind its operation, and a case study scenario of problem solving with the form. This will take the form of a term paper and serve as a continuous assessment test. The date for presentation of term papers by each student will be communicated later.
Definitions
Distributed computing is a method of computer processing in which different parts of
a program are run simultaneously on two or more computers that are communicating
with each other over a network.
Distributed computing means designing and implementing programs that run on two
or more interconnected computer systems. It is a model in which components of a
software system are shared among multiple computers to improve efficiency and
performance.
The primary goal of a distributed computing system is to connect users and resources in a transparent, open, and scalable way. Ideally, this arrangement should be drastically more fault tolerant and more powerful than any combination of stand-alone computer systems. The main goals of distributed computing are therefore device sharing, data sharing, communication, and flexibility.
3. Shared State: If a subset of nodes cooperates to provide a service, those nodes maintain a shared state, which is distributed or replicated among the participants.
3. Scalability: the ability to serve more users, provide acceptable response times with increased amounts of data, and reduce cost.
2. "A distributed system is one in which the failure of a computer you didn't even
know existed can render your own computer unusable. "Troubleshooting and
diagnosing problems in a distributed system can also become more difficult,
because the analysis may require connecting to remote nodes or inspecting
communication between nodes.
3. Many types of computation are not well suited for distributed environments,
typically owing to the amount of network communication or synchronization that
would be required between nodes. If bandwidth, latency, or communication
requirements are too significant, then the benefits of distributed computing may
be negated and the performance may be worse than a non-distributed
environment.
4. As distributed systems get larger, it becomes harder and harder to predict or even
understand their behavior. Part of the reason for this is that programmers have not
yet developed the kind of tools for managing complexity (like subroutines or
objects with narrow interfaces, or even simple structured programming
mechanisms like loops or if/then statements) that are standard in sequential
programming.
5. Distributed systems bring with them large amounts of inherent nondeterminism: unpredictable events such as delays in message arrival and the sudden failure of components.
Project-related problems
1. Distributed computing projects may generate data that is proprietary to private
industry, even though the process of generating that data involves the resources of
volunteers. This may result in controversy as private industry profits from the data
which is generated with the aid of volunteers.
2. In addition, some distributed computing projects, such as biology projects that aim
to develop thousands or millions of "candidate molecules" for solving various
medical problems, may create vast amounts of raw data. This raw data may be
useless by itself without refinement of the raw data or testing of candidate results
in real-world experiments. Such refinement and experimentation may be so
expensive and time-consuming that it may literally take decades to sift through the
data. Until the data is refined, no benefits can be acquired from the computing
work.
3. Other projects suffer from lack of planning on the part of their well-meaning originators. These poorly planned projects may not generate tangible results, or may not generate data that ultimately result in finished, innovative scientific papers. Sensing that a project may not be generating useful data,
the project managers may decide to abruptly terminate the project without
definitive results, resulting in wastage of the electricity and computing resources
used in the project. Volunteers may feel disappointed and abused by such
outcomes.
4. There is an obvious opportunity cost of devoting time and energy to a project that
ultimately is useless, when that computing power could have been devoted to a
better planned distributed computing project generating useful, concrete results.
5. Some distributed computing projects may also attempt to use computers to find
solutions by number-crunching mathematical or physical models. With such
projects there is the risk that the model may not be designed well enough to
efficiently generate concrete solutions.
• Speed: a distributed system may have more total computing power than a
mainframe.
• Reliability: If one machine crashes, the system as a whole can still survive.
• The existence of a large number of personal computers and the need for people to collaborate and share information.
2. Loosely coupled systems: In these systems, the processors do not share memory, and each processor has its own local memory, as shown in Figure 2. If a processor writes the value 100 to memory location x, this write operation changes only the contents of its own local memory and does not affect the memory of any other processor. In
these systems, all physical communication between the processors is done by passing
messages across the network that interconnects the processors. Loosely coupled
systems are referred to as distributed computing systems, or simply distributed
systems.
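This separation is easy to demonstrate in code. Below is a minimal Python sketch, an illustration only: two operating-system processes stand in for two loosely coupled processors, a pipe stands in for the interconnecting network, and the variable x matches the example above.

```python
from multiprocessing import Process, Pipe

def processor_a(conn):
    x = 100                    # write to A's local memory only
    conn.send(("x", x))        # the only way B can learn the value
    conn.close()

def processor_b(conn):
    x = 0                      # B's own local copy, unaffected by A's write
    name, value = conn.recv()  # the value arrives as a message, not via shared memory
    print(f"B's local x is still {x}; received {name}={value} by message passing")

if __name__ == "__main__":
    a_end, b_end = Pipe()
    pa = Process(target=processor_a, args=(a_end,))
    pb = Process(target=processor_b, args=(b_end,))
    pa.start(); pb.start()
    pa.join(); pb.join()
```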
2. Workstation Model:
In this model, idle workstations may be used to process the jobs of users who are logged onto other workstations and do not have sufficient processing power at their own workstations to get their jobs processed efficiently. A user logs onto one of the workstations, called his or her "home" workstation, and submits jobs for
execution. When the system finds that the user’s workstation does not have sufficient
processing power for executing the processes of the submitted jobs efficiently, it transfers
one or more of the processes from the user's workstation to some other workstation that is
currently idle and gets the process executed there, and finally the result of execution is
returned to the user’s workstation.
The workstation model is a network of personal workstations, each with its own disk and a
local file system. A workstation with its own local disk is usually called a diskful workstation
and a workstation without a local disk is called a diskless workstation. With the proliferation
of high-speed networks, diskless workstations have become more popular in network
environments than diskful workstations, making the workstation-server model more
popular than the workstation model for building distributed computing systems.
3. Workstation-Server Model:
A distributed computing system based on the workstation server model consists of a few
minicomputers and several workstations (most of which are diskless, but a few of which may
be diskful) interconnected by a communication network.
Normal computation activities required by the user's processes are performed at the user's
home workstation, but requests for services provided by special servers (such as a file server
or a database server) are sent to a server providing that type of service that performs the
user’s requested activity and returns the result of request processing to the user’s
workstation.
Therefore, in this model, the user’s processes need not be migrated to the server machines
for getting the work done by those machines.
For better overall system performance, the local disk of a diskful workstation is normally used for such purposes as storage of temporary files, storage of unshared files, storage of shared files that are rarely changed, paging activity in virtual-memory management, and caching of remotely accessed data.
1. In general, it is much cheaper to use a few minicomputers equipped with large, fast disks
that are accessed over the network than a large number of diskful workstations, with each
workstation having a small, slow disk.
3. In the workstation-server model, since all files are managed by the file servers, users have the flexibility to use any workstation and access files in the same manner irrespective of which workstation the user is currently logged onto. Note that this is not true with the workstation model, in which each workstation has its local file system, because different mechanisms are needed to access local and remote files.
4. In the workstation-server model, the request-response protocol is mainly used to access the services of the server machines. Therefore, unlike the workstation model, this model does not need a process migration facility, which is difficult to implement.
5. A user has guaranteed response time because workstations are not used for executing
remote processes. However, the model does not utilize the processing capability of idle
workstations.
As shown in Figure 6, in the pure processor-pool model, the processors in the pool have no
terminals attached directly to them, and users access the system from terminals that are
attached to the network via special devices. These terminals are either small diskless
workstations or graphic terminals, such as X terminals. A special server (called a run server)
manages and allocates the processors in the pool to different users on a demand basis. When
a user submits a job for computation, an appropriate number of processors are temporarily
assigned to his or her job by the run server. For example, if the user’s computation job is the
compilation of a program having n segments, in which each of the
segments can be compiled independently to produce separate relocatable object files, n
processors from the pool can be allocated to this job to compile all the n segments in parallel.
When the computation is completed, the processors are returned to the pool for use by other
users. In the processor-pool model there is no concept of a home machine. That is, a user
does not log onto a particular machine but to the system as a whole.
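The run-server idea can be sketched in miniature with a worker pool. In the hedged sketch below, compile_segment is a hypothetical placeholder for compiling one independent segment, the process pool plays the role of the processor pool, and the executor plays the role of the run server.

```python
from concurrent.futures import ProcessPoolExecutor

def compile_segment(segment: str) -> str:
    # Placeholder for really compiling one independent program segment.
    return f"{segment}.o"

if __name__ == "__main__":
    segments = [f"segment{i}" for i in range(8)]  # a program with n = 8 segments
    # The executor (playing the run server) hands the n independent
    # compilations to workers from the pool, which run them in parallel.
    with ProcessPoolExecutor() as pool:
        object_files = list(pool.map(compile_segment, segments))
    print(object_files)  # workers return to the pool once the job completes
```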
5. Hybrid Model:
Out of the four models described above, the workstation-server model is the most widely used model for building distributed computing systems. This is because a large number of computer users only perform simple interactive tasks such as editing jobs, sending electronic mail, and executing small programs. The workstation-server model is ideal for such simple usage.
However, in a working environment that has groups of users who often perform jobs needing massive computation, the processor-pool model is more attractive and suitable. Combining the advantages of both the workstation-server and processor-pool models results in a hybrid model that may be used to build a distributed computing system.
The hybrid model is based on the workstation-server model but with the addition of a pool of processors. The processors in the pool can be allocated dynamically for computations that are too large for workstations or that require several computers concurrently for efficient execution. In addition to efficient execution of computation-intensive jobs, the hybrid model gives guaranteed response to interactive jobs by allowing them to be processed on local workstations of the users.
However, the hybrid model is more expensive to implement than the workstation-server model or the processor-pool model.
A. Transparency
A distributed system that is able to present itself to users and applications as if it were a single computer system is said to be transparent. There are eight types of transparency in a distributed system:
1) Access Transparency: Hides differences in data representation and the way a resource is accessed by a user. For example, a distributed system may contain computers that run different operating systems, each having its own file-naming conventions. Differences in naming conventions, as well as in how files can be manipulated, should be hidden from users and applications.
2) Location Transparency: Hides where exactly the resource is located physically. For example, by assigning a logical name such as yahoo.com to a resource, users get no idea of the location of the web page's main server.
4) Relocation Transparency: Deals with the fact that a resource can be relocated while it is being accessed, without the user of the application noticing anything.
Example: using a Wi-Fi system on laptops while moving from place to place without getting
disconnected.
5) Replication Transparency: Hides the fact that multiple copies of a resource may exist simultaneously. To hide replication, it is essential that the replicas have the same name. Consequently, a system that supports replication transparency should also support location transparency.
6) Concurrency Transparency: Hides the fact that a resource may be shared by several competing users. For example, two independent users may each have stored their files on the same server and may be accessing the same table in a shared database. In such cases, it is important that neither user notices that the other is making use of the same resource.
7) Failure Transparency: Hides failure and recovery of resources. This is the most difficult task in a distributed system and is even impossible when certain apparently realistic assumptions are made. For example, a user cannot distinguish between a very slow and a dead resource. The same error message may appear when a server is down, when the network is overloaded, or when the connection from the client side is lost. The user therefore cannot tell what should be done: wait for the network to clear up, or try again later when the server is working again.
B. Reliability
Reliability in a distributed system depends on handling faults in the components of the system. Some types of faults, and some of their solutions, include:
Component Faults
There are three types of component faults: transient faults, intermittent faults and
permanent faults.
- A transient fault occurs once and then disappears. If the operation is repeated then
the system will behave normally.
- An intermittent fault appears, then disappears, then reappears, and so on. A common cause of an intermittent fault is a loose contact on a connector. These faults are very annoying and hard to fix because of their sporadic nature.
- Permanent faults are caused by faulty components. The system will not work until the component is replaced. Burnt-out chips, software bugs, and processor failure are all examples of permanent faults.
Processor Faults
A special type of component is the processor, and it can fail in three ways: fail-silent, Byzantine, and slowdown. Each leads to a different kind of failure.
- A fail-silent fault, also known as a fail-stop fault, occurs when a processor stops functioning. It no longer accepts input and outputs nothing, except perhaps to announce that it is no longer functioning.
- A Byzantine fault occurs when a faulty processor continues to run, giving wrong answers and possibly working with other faulty processors to give the impression that they are all working correctly. Compared with fail-silent faults, Byzantine faults are harder to diagnose and more difficult to deal with.
- A slowdown fault occurs when a certain processor executes very slowly. That processor might be labeled as failed by the other processors. However, the "failed" processor may return to its normal speed and begin to issue orders, causing problems within the distributed system.
Network Failures
Network failures keep processors from communicating with each other. We will look at the
failures resulting in total loss of communication along parts of the network. Two problems
arise from this: one-way links and network partitions.
- One-way links cause problems similar to processor slowdown. For example,
processor A can send messages to processor B but cannot receive messages from
B. Processor C can talk to both A and B. So each processor has a different idea of which
processors have failed. Processor A might think that B has failed since it does not
receive any messages from it. Processor C thinks both A and B are working properly
since it can send and receive messages from both.
- Network partitions occur when a line connecting two sections of a network fails, so that processors A and B can communicate with each other but cannot communicate with processors C and D, and vice versa. Suppose processor A updates a file and processor C updates the same file but in a different way. When the partition is repaired, the file will not be consistent among all processors, and it is not clear to the processors how to make the data consistent again.
C. Flexibility
Another important issue in the design of distributed operating systems is flexibility, which is one of the most important features of open distributed systems. The design of a distributed operating system should be flexible for the following reasons:
1. Ease of modification: From the experience of system designers, it has been found that some parts of a design often need to be replaced or modified, either because a bug is detected in the design or because the design is no longer suitable for a changed system environment or new user requirements. Therefore, it should be easy to incorporate changes in the system in a user-transparent manner, or with minimum interruption to the users.
2. Ease of enhancement: In every system, new functionalities have to be added from time to time to make it more powerful and easier to use. Therefore, it should be easy to add new services
to the system. Furthermore, if a group of users do not like the style in which a particular
service is provided by the operating system, they should have the flexibility to add and
use their own service that works in the style with which the users of that group are more
familiar and feel more comfortable. The most important design factor that influences the
flexibility of a distributed operating system is the model used for designing its kernel. The
kernel of an operating system is its central controlling part that provides basic system
facilities. It operates in a separate address space that is inaccessible to user processes. It
is the only part of an operating system that a user cannot replace or modify.
The two commonly used models for kernel design in distributed operating systems are the
monolithic kernel and the microkernel.
- In the monolithic kernel model, most operating system
services such as process management, memory management, device management, file
management, name management, and inter-process communication are provided by the kernel.
As a result, the kernel has a large, monolithic structure. Many distributed operating systems that are extensions or imitations of the UNIX operating system use the monolithic kernel model. This is mainly because UNIX itself has a large, monolithic kernel.
- On the other hand, in the microkernel model, the main goal is to keep the kernel as small
as possible. Therefore, in this model, the kernel is a very small nucleus of software that provides
only the minimal facilities necessary for implementing additional operating system services. The
only services provided by the kernel in this model are inter-process communication, low-level device management, a limited amount of low-level process management, and some memory management. All other operating system services, such as file management, name management, additional process and memory management activities, and much of the system-call handling, are implemented as user-level server processes. Each server process has its own address space and
can be programmed separately.
As compared to the monolithic kernel model, the microkernel model has several advantages. In
the monolithic kernel model, the large size of the kernel reduces the overall flexibility and
configurability of the resulting operating system. On the other hand, the resulting operating
system of the microkernel model is highly modular in nature. Due to this characteristic feature,
the operating system of the microkernel model is easy to design, implement, and install.
Moreover, since most of the services are implemented as user-level server processes, it is also
easy to modify the design or add new services. In spite of its potential performance cost, the
microkernel model is being preferred for the design of modern distributed operating systems.
D. Performance
1. Batch if possible: Batching often improves performance greatly. For example, transferring data across the network in large chunks rather than as individual pages is much more efficient. Similarly, piggybacking acknowledgements of previous messages on the next outgoing message in a series of messages exchanged between two communicating entities also improves performance, as the toy sketch below illustrates.
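In the sketch, CountingTransport is a hypothetical stand-in for a real connection on which every send pays a fixed per-message overhead; it simply counts how many network operations a transport would issue.

```python
class CountingTransport:
    """Toy transport that counts how many network operations are issued."""
    def __init__(self):
        self.operations = 0

    def send(self, payload: bytes):
        self.operations += 1  # each send would pay per-message overhead

records = [f"record-{i}".encode() for i in range(1000)]

unbatched = CountingTransport()
for r in records:
    unbatched.send(r)              # 1000 separate network operations

batched = CountingTransport()
batched.send(b"\n".join(records))  # one large chunk: 1 network operation

print(unbatched.operations, "vs", batched.operations)
```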
2. Cache whenever possible: Caching of data at clients' sites frequently improves overall system performance because it makes data available where it is currently being used, saving a large amount of computing time and network bandwidth. In addition, caching reduces contention on centralized resources, as sketched below.
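A minimal sketch of client-side caching follows, assuming fetch_from_server is a hypothetical stand-in for a real network round trip; a real system would also have to decide when cached entries become stale.

```python
import functools

def fetch_from_server(key: str) -> str:
    # Stand-in for an expensive remote fetch (a network round trip).
    print(f"network round trip for {key}")
    return f"value-of-{key}"

@functools.lru_cache(maxsize=128)
def cached_fetch(key: str) -> str:
    return fetch_from_server(key)

cached_fetch("config")  # first access: goes to the server
cached_fetch("config")  # repeated access: served from the local cache, no round trip
```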
3. Minimize copying of data: Data copying overhead (e.g., moving data in and out of buffers) imposes a substantial CPU cost on many operations. For example, while being transferred from its sender to its receiver, message data may take the following path on the sending side:
(a) From sender’s stack to its message buffer
(b) From the message buffer in the sender’s address space to the message buffer in the kernel’s
address space
(c) Finally, from the kernel to the network interface board.
On the receiving side, the data probably takes a similar path in the reverse direction. Therefore,
in this case, a total of six copy operations are involved in the message transfer operation.
Similarly, in several systems, the data copying overhead is also large for read and write
operations on block I/O devices.
Therefore, for better performance, it is desirable to avoid copying of data, although this is not
always simple to achieve. Making optimal use of memory management often helps in eliminating
much data movement between the kernel, block I/O devices, clients, and servers.
4. Minimize network traffic: System performance may also be improved by reducing internode
communication costs. For example, accesses to remote resources require communication,
possibly through intermediate nodes. Therefore, migrating a process closer to the resources it is
using most heavily may be helpful in reducing network traffic in the system if the decreased cost
of accessing its favorite resource offsets the possible increased cost of accessing its less favored
ones. Another way to reduce network traffic is to use the process migration facility to cluster two
or more processes that frequently communicate with each other on the same node of the system.
Avoiding the collection of global state information for making some decision also helps in
reducing network traffic.
E. Scalability
Scalability refers to the capability of a system to adapt to increased service load. It is inevitable
that a distributed system will grow with time since it is very common to add new machines or
an entire subnetwork to the system to take care of increased workload or organizational changes
in a company. Therefore, a distributed operating system should be designed to easily cope with
the growth of nodes and users in the system. That is, such growth should not cause serious
disruption of service or significant loss of performance to users.
In the design of a distributed operating system, the use of centralized entities such as a single central file server or a single database for the entire system makes the distributed system non-scalable.
F. Security
1. It should be possible for the sender of a message to know that the message was received by the intended receiver.
2. It should be possible for the receiver of a message to know that the message was sent by the
genuine sender.
3. It should be possible for both the sender and receiver of a message to be guaranteed that the
contents of the message were not changed while it was in transfer.
Cryptography is the only known practical method for dealing with these security aspects of a distributed system. In this method, comprehension of private information by unauthorized parties is prevented by encrypting the information, which can then be decrypted only by authorized users.
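As a concrete illustration (these notes do not prescribe a particular algorithm or library), the sketch below uses the Fernet recipe from the third-party Python cryptography package. Fernet provides both secrecy and integrity, so a message altered in transit (requirement 3 above) is rejected rather than silently accepted.

```python
# Requires the third-party 'cryptography' package (pip install cryptography).
from cryptography.fernet import Fernet, InvalidToken

key = Fernet.generate_key()  # shared secret known only to authorized users
f = Fernet(key)

token = f.encrypt(b"transfer 100 to account 42")  # sender encrypts the message
print(f.decrypt(token))                           # authorized receiver decrypts it

tampered = token[:-1] + (b"A" if token[-1:] != b"A" else b"B")
try:
    f.decrypt(tampered)           # a message altered in transit...
except InvalidToken:
    print("tampering detected")   # ...fails integrity checking and is rejected
```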
Many distributed systems must handle crash failures, such as application crashes, operating
system crashes, device driver crashes, application deadlocks, and hardware failures. A common
way to handle crashes involves two steps: (1) Detect the failure; and (2) Recover, by restarting.
A common way for a distributed system to tolerate crashes is to explicitly detect them and then
recover from them. Interestingly, detection can take much longer than recovery, as a result of
many advances in recovery techniques, making failure detection the dominant factor in these
systems’ unavailability when a crash occurs. A failure detector is a service that reports the status
of a remote process as UP or DOWN. A failure detector should ideally have three properties.
i. First, it should be a reliable failure detector (RFD): when a process is up, it is reported
as UP, and when it crashes, it is reported as DOWN after a while.
ii. Second, the failure detector should be fast: the time taken to report DOWN, known as
the detection time, should be short (less than a second), so as not to delay recovery.
The above properties are in tension with each other and with other desired properties. For
instance, a short detection time based on timeouts would compromise reliability, since the
detector would report as DOWN a process that is UP. As an alternative, a detector could ensure
reliability and a short detection time by killing processes at the slightest provocation, but that
would be disruptive. Also, short detection times often require probing the target incessantly,
which is costly. Another challenge is comprehensiveness: how can the detector maximize its
coverage of failures?
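A common way to approximate such a detector is with heartbeats and a timeout, which exposes the reliability/speed tension directly: a short timeout detects crashes quickly but may report a slow-but-alive process as DOWN. Below is a minimal single-machine Python sketch; the 0.5-second timeout and the process name are illustrative assumptions.

```python
import time

DETECTION_TIMEOUT = 0.5  # seconds; shorter = faster but less reliable detection

class HeartbeatFailureDetector:
    def __init__(self):
        self.last_heartbeat = {}  # process name -> time of last heartbeat

    def heartbeat(self, process: str):
        self.last_heartbeat[process] = time.monotonic()

    def status(self, process: str) -> str:
        elapsed = time.monotonic() - self.last_heartbeat.get(process, 0.0)
        return "UP" if elapsed < DETECTION_TIMEOUT else "DOWN"

detector = HeartbeatFailureDetector()
detector.heartbeat("worker-1")
print(detector.status("worker-1"))  # UP: a heartbeat was seen recently
time.sleep(0.6)                     # worker-1 misses its heartbeats...
print(detector.status("worker-1"))  # ...and is reported DOWN after the timeout
```

Note that this detector cannot tell a crashed process from a merely slow one, which is precisely the limitation of asynchronous systems discussed below.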
A distributed system is synchronous if the time to execute each step of a process has an
upper and lower bound. Each message transmitted over a channel is received within a known
bounded delay. Each process has a local clock whose drift rate from real time has a known
bound.
In synchronous message-passing, every process sends out messages at time t that are delivered
at time t+1, at which point more messages are sent out that are delivered at time t + 2, and so on:
the whole system runs in lockstep, marching forward in perfect synchrony. Such systems are
difficult to build when the components become too numerous or too widely dispersed, but they
are often easier to analyze. Timeouts can be used to detect failures.
A distributed system is asynchronous if each step of a process can take an arbitrary time, i.e., message delivery times and clock drift rates are arbitrary. In this case it is difficult, and in general impossible, to build a reliable failure detector. Messages are delivered eventually, after some unknown delay. Unfortunately, it is impossible to distinguish with certainty a crashed process from a very slow process in a purely asynchronous distributed system.
1. Redundancy techniques: The basic idea is to replicate critical components so that if one of them fails, the others can be used to continue. Obviously, having two or more copies of a critical component makes it possible, at least in principle, to continue operations in spite of occasional partial failures.
two nodes so that if one of the two nodes fails, the execution of the process can be completed
at the other node. Similarly, a critical file may be replicated on two or more storage devices
for better reliability. Notice that with redundancy techniques additional system overhead is
needed to maintain two or more copies of a replicated resource and to keep all the copies of
a resource consistent. For example, if a file is replicated on two or more nodes of a distributed
system, additional disk storage space is required and for correct functioning, it is often
necessary that all the copies of the file are mutually consistent. In general, the larger the number of copies kept, the better the reliability, but the greater the incurred overhead.
2. Distributed control: For better reliability, many of the particular algorithms or protocols
used in a distributed operating system must employ a distributed control mechanism to
avoid single points of failure. For example, a highly available distributed file
system should have multiple and independent file servers controlling multiple and
independent storage devices. In addition to file servers, a distributed control technique could
also be used for name servers, scheduling algorithms, and other executive control functions.
Transactions help to preserve the consistency of a set of shared data objects (e.g. files) in the face of failures and concurrent access. They make crash recovery much easier, because a transaction can only end in one of two states: either all the operations of the transaction are performed or none of them is performed.
In a system with transaction facility, if a process halts unexpectedly due to a hardware error
before a transaction is completed, the system subsequently restores any data objects that
were undergoing modification to their original states. Notice that if a system does not
support a transaction mechanism, unexpected failure of a process during the processing of
an operation may leave the data objects that were undergoing modification in an
inconsistent state. Therefore, without transaction facility, it may be difficult or even
impossible in some cases to roll back (recover) the data objects from their current
inconsistent states to their original states.
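The all-or-nothing property can be sketched with an in-memory transaction that applies updates to a private workspace and installs them only at commit. This is an illustration only: the account objects are assumed, and a real system would use a write-ahead log on stable storage to survive genuine crashes.

```python
import copy

class Transaction:
    def __init__(self, data: dict):
        self.data = data
        self.workspace = copy.deepcopy(data)  # updates go to a private copy

    def write(self, key, value):
        self.workspace[key] = value           # original objects stay untouched

    def commit(self):
        self.data.clear()
        self.data.update(self.workspace)      # install all updates together

accounts = {"A": 100, "B": 0}
txn = Transaction(accounts)
txn.write("A", accounts["A"] - 50)
# If the process crashed here, before commit, the original state survives:
print(accounts)   # {'A': 100, 'B': 0}
txn.write("B", 50)
txn.commit()      # either both operations take effect or neither does
print(accounts)   # {'A': 50, 'B': 50}
```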
Although stateful service is necessary in some cases, the stateless service paradigm should be used wherever possible to simplify failure detection and recovery actions.
The effectiveness of a data communication system depends on four fundamental characteristics:
1. Delivery: The system must deliver data to the correct destination. Data must be
received by the intended device or user and only by that device or user.
2. Accuracy: The system must deliver the data accurately. Data that have been
altered in transmission and left uncorrected are unusable.
3. Timeliness: The system must deliver data in a timely manner. Data delivered late
are useless. In the case of video and audio, timely delivery means delivering data
as they are produced, in the same order that they are produced, and without
significant delay. This kind of delivery is called real-time transmission.
4. Jitter: Jitter refers to variation in packet arrival time, i.e., uneven delay in the delivery of packets. For example, assume that video packets are sent every 30 ms. If some of the packets arrive with a 30 ms delay and others with a 40 ms delay, the result is uneven quality in the video.
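Given a list of packet arrival times, jitter shows up as variation in the inter-arrival gaps. A small Python sketch, with assumed timestamps matching the example above:

```python
# Arrival times (ms) of video packets that were sent every 30 ms.
arrivals = [0, 30, 70, 100, 140]  # assumed timestamps, for illustration

gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
print("inter-arrival gaps (ms):", gaps)       # [30, 40, 30, 40] -> uneven delivery
print("jitter (ms):", max(gaps) - min(gaps))  # 10 ms of variation
```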
2. Sender. The sender is the device that sends the data message. It can be a
computer, workstation, telephone handset, video camera, and so on.
3. Receiver. The receiver is the device that receives the message. It can be a
computer, workstation, telephone handset, television, and so on.
Network communication services are of two types:
1. Connection-oriented network
2. Connectionless network
There are three types of message transport techniques, namely circuit switching, message switching, and packet switching.
1. Circuit-Switched Network. It transmits a message by providing a complete path of transmission links from the source of the message to the destination node. A path is set up by special signaling sent by the source and acknowledged by the destination.
In this technique, data is transmitted progressively along the path with no intermediate store-and-forward delay. The entire path is allocated to this
transmission until the source node releases this path. A dedicated path is required between the calling and the called parties, e.g., the telephone system.
2. Message-Switched Network. It routes a message through intermediate nodes, which buffer the message data. A message is stored by an intermediate node and forwarded to the next node upon allocation of a buffer for the data at the next node; hence it is called a store-and-forward network.
In a message-switched network, there is no direct connection between the source and destination nodes. In such networks, the intermediary nodes (switches) have the responsibility of conveying the received message from one node to another in the network. Therefore, each intermediary node within the network must store all messages before retransmitting them, one at a time, as proper resources become available, e.g., the telegraph system. Message switching offers a number of attractive benefits, including efficient usage of network resources, traffic regulation, and message storage when messages cannot be delivered. However, large storage space is needed to store data at the nodes.
3. Packet-Switched Network. This is similar to a message-switched network. The only difference is that the message to be sent is broken up into fixed-size segments called packets. The packets traverse the network independently until they reach the destination node, where they are reassembled, as sketched below.
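A minimal sketch of packetizing and reassembly follows; the 8-byte packet size is an illustrative assumption, and sequence numbers let the destination rebuild the message even when packets arrive out of order.

```python
PACKET_SIZE = 8  # fixed segment size in bytes (the final packet may be shorter)

def packetize(message: bytes):
    # Break the message into fixed-size, sequence-numbered packets.
    return [(seq, message[i:i + PACKET_SIZE])
            for seq, i in enumerate(range(0, len(message), PACKET_SIZE))]

def reassemble(packets):
    # Packets may arrive in any order; sort by sequence number to rebuild.
    return b"".join(data for _, data in sorted(packets))

packets = packetize(b"a message routed as independent packets")
packets.reverse()  # simulate out-of-order arrival in the network
print(reassemble(packets))
```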
Communication Protocols
1. Syntax: The term syntax refers to the structure or format of the data and the order in which they are presented. For example, a simple protocol might expect the first 8 bits of data to be the address of the sender, the next 8 bits to be the address of the receiver, and the rest of the stream to be the message itself (see the sketch after this list).
3. Timing: The term timing refers to when data should be sent and how fast they can be sent. For example, if a sender produces data at 100 Mbps but the receiver can process data at only 1 Mbps, the transmission will overload the receiver and some data will be lost.
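The syntax example in point 1 can be made concrete with Python's struct module. The frame layout below (an 8-bit sender address, an 8-bit receiver address, then the message bytes) is the hypothetical protocol from that example, not a real wire format.

```python
import struct

def encode(sender: int, receiver: int, message: bytes) -> bytes:
    # First 8 bits: sender address; next 8 bits: receiver address; then the message.
    return struct.pack("!BB", sender, receiver) + message

def decode(frame: bytes):
    sender, receiver = struct.unpack("!BB", frame[:2])
    return sender, receiver, frame[2:]

frame = encode(0x12, 0x34, b"hello")
print(decode(frame))  # (18, 52, b'hello')
```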
Assignment:
CLIENT-SERVER ARCHITECTURE
Client-Server Architecture: An information-passing scheme that works as follows: a client
program, such as Mosaic, sends a request to a server. The server takes the request,
disconnects from the client and processes the request. When the request is processed, the
server reconnects to the client program and the information is transferred to the client.
This architecture differs from traditional Internet databases where the client connects to
the server and runs the program from the remote site.
1. Clients and servers are functional modules with well-defined interfaces (i.e., they hide
internal information). The functions performed by a client and a server can be
implemented by a set of software modules, hardware components, or a combination
thereof.
2. Each client/server relationship is established between two functional modules when
one module (client) initiates a service request and the other (server) chooses to respond
to the service request.
For a given service request (SR), clients and servers do not reverse roles (i.e., a client stays a client and a server stays a server).
However, a server for SR1 may become a client for SR2 when it issues requests to another server. For example, a client may issue an SR that may generate other SRs.
3. Information exchange between clients and servers is strictly through messages (i.e., no information is exchanged through global variables). The service request and additional information are placed into a message that is sent to the server.
The server's response is similarly another message that is sent back to the client. This is an extremely crucial feature of the C/S model.
4. Messages exchanged are typically interactive. In other words, the C/S model does not support an off-line process. There are a few exceptions. For example, message-queuing systems allow clients to store messages on a queue to be picked up asynchronously by the servers at a later stage.
5. Clients and servers typically reside on separate machines connected through a network.
Conceptually, clients and servers may run on the same machine or on separate machines.
The implication of the last two features is that C/S service requests are real-time messages
that are exchanged through network services. This feature increases the appeal of the C/S
model (i.e., flexibility, scalability) but introduces several technical issues such as
portability, interoperability, security, and performance.
Characteristics of a Client
The request sender is known as the client
Initiates requests
Waits for and receives replies
Usually connects to a small number of servers at one time
Typically interacts directly with end-users using a graphical user interface
Characteristics of a Server
The receiver of requests sent by clients is known as the server
Passive (slave)
Waits for requests from clients
Upon receipt of requests, processes them and then serves replies
Usually accepts connections from a large number of clients
Typically does not interact directly with end-users
Note
A client process needing a service (e.g. reading data from a file) sends a message to the
server and waits for a reply message. The server process, after performing the requested
task, sends the result in the form of a reply message to the client process.
Note that servers merely respond to the request of the clients, and do not typically initiate
conversations with clients.
In systems with multiple servers, it is desirable that, when providing services to clients, the locations of and conversations among the servers are transparent to the clients.
Clients typically make use of a cache to minimize the frequency of sending data requests to the servers.
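The request/reply exchange described above can be sketched with sockets. This is an illustration only: the loopback address, port 5001, and the request text are assumptions, and a thread stands in for a separate server machine.

```python
import socket
import threading
import time

def server():
    # Server process: waits passively, performs the requested task, replies.
    with socket.create_server(("127.0.0.1", 5001)) as srv:
        conn, _ = srv.accept()
        with conn:
            request = conn.recv(1024)               # e.g. a request to read a file
            conn.sendall(b"result of " + request)   # reply message with the result

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)  # crude: give the server a moment to start listening

# Client process: initiates the request and waits for the reply.
with socket.create_connection(("127.0.0.1", 5001)) as cli:
    cli.sendall(b"READ /tmp/data")
    print(cli.recv(1024))  # b'result of READ /tmp/data'
```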
Server Model
In the client/server software architecture, the two parties interact as superior and subordinate: the client process always initiates requests, and the server process always responds to requests. Theoretically, client and server can reside on the same machine.
Typically, however, they run on separate machines linked by a local area network. Many different types of servers exist, though most people tend to equate the term client-server with a network of PCs or workstations connected by a network to a remote machine running a database server. Examples of servers are:
1. Mail servers.
2. Print servers.
3. File Servers.
4. Database (SQL) Servers
The client/server model is a form of distributed computing where one program (the
client) communicates with another program (the server) for the purpose of exchanging
information.
This is another major advantage of client/server computing; it tends to use the strengths
of divergent computing platforms to create more powerful applications.
TIERED ARCHITECTURE
Generic client/server architecture has two types of nodes on the network: clients and
servers. As a result, these generic architectures are sometimes referred to as "two-tier"
architectures.
Some networks will consist of three different kinds of nodes: clients, application servers which process data for the clients, and database servers which store data for the application servers. This is called a three-tier architecture.
The advantage of an n-tier architecture compared with a two-tier architecture (or of a three-tier compared with a two-tier) is that it separates out the processing in order to better balance the load on the different servers; it is more scalable.
First generation systems are 2-tiered architectures where a client presents a graphical
user interface (GUI) to the user, and acts according to the user's actions to perform
requests of a database server running on a different machine.
2-Tier Architectures.
Client/server applications started with a simple, 2-tiered model consisting of a client and
an application server. The most common implementation is a 'fat' client - 'thin' server
architecture, placing application logic in the client. As seen in Figure 11, the database
simply reports the results of queries implemented via dynamic SQL using a call level
interface (CLI) such as Microsoft's Open Database Connectivity (ODBC).
An alternate approach is to use a thin client - fat server architecture that invokes procedures stored at the database server, as shown in Figure 12. The term thin client generally refers to user devices whose functionality is minimized, either to reduce the cost of ownership per desktop or to provide more user flexibility and mobility.
Remote database transport protocols such as SQL-Net are used to carry the transaction.
The network 'footprint' is very large per query so that the effective bandwidth of the
network, and thus the corresponding number of users who can effectively use the
network, is reduced.
Furthermore, this heavy interaction increases network transaction size and slows query speed.
These architectures are not intended for mission-critical applications.
3-Tier Architectures
Inserting a middle tier in between a client and server achieves a 3-tier configuration. The components of a three-tiered architecture are divided into three layers: a presentation layer, a functionality layer, and a data layer, which must be logically separate.
As seen in Figure 13, the 3-tier architecture attempts to overcome some of the limitations
of 2-tier schemes by separating presentation, processing, and data into separate distinct
entities.
The middle-tier servers are typically coded in a highly portable, non-proprietary language
such as C.
Middle-tier functionality servers may be multithreaded and can be accessed by multiple
clients, even those from separate applications.
The client interacts with the middle tier via a standard mechanism such as a dynamic link library (DLL), an API, or RPC. The middle tier interacts with the server via standard database protocols.
The middle-tier contains most of the application logic, translating client calls into
database queries and other actions, and translating data from the database into client data
in return.
If the middle tier is located on the same host as the database, it can be tightly bound to
the database via an embedded third generation language (3gl) interface. This yields a very
highly controlled and high performance interaction, thus avoiding the costly processing
and network overhead of SQL-Net, ODBC, or other CLIs.
Furthermore, the middle tier can be distributed to a third host to gain additional processing power.
As more users access applications remotely for business-critical functions, the ability of
servers to scale becomes the key determinant of end-to-end performance. There are
several ways to address this ever-increasing load on servers. Three techniques are widely
used:
Upsizing the servers
Deploying clustered servers
Partitioning server functions into a "tiered" arrangement
N-Tier Architectures
The 3-tier architecture can be extended to N-tiers when the middle tier provides
connections to various types of services, integrating and coupling them to the client, and
to each other. Partitioning the application logic among various hosts can also create an
N-tiered system.
Encapsulation of distributed functionality in such a manner provides significant
advantages such as reusability, and thus reliability.
As applications become Web-oriented, Web server front ends can be used to offload the
networking required to service user requests, providing more scalability and introducing
points of functional optimization. In this architecture, as seen in Figure 14, the client
sends HTTP requests for content and presents the responses provided by the application
system.
On receiving requests, the Web server either returns the content directly or passes it on
to a specific application server. The application server might then run CGI scripts for
dynamic content, parse database requests, or assemble formatted responses to client
queries, accessing dates or files as needed from a back-end database server or a file server.
By segregating each function, system bottlenecks can be more easily identified and
cleared by scaling the particular layer that is causing the bottleneck.
For example, if the Web server layer is the bottleneck, multiple Web servers can be deployed, with an appropriate server load-balancing solution to ensure effective load balancing across the servers, as illustrated in Figure 15 and sketched below.
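As an illustrative sketch of the idea (the round-robin policy and server names are assumptions, not taken from these notes), a simple balancer can spread incoming requests across the replicated Web servers:

```python
import itertools

web_servers = ["web1:80", "web2:80", "web3:80"]  # hypothetical replicated tier
next_server = itertools.cycle(web_servers)       # simple round-robin policy

def route(request: str) -> str:
    return f"{request} -> {next(next_server)}"

for r in ["GET /a", "GET /b", "GET /c", "GET /d"]:
    print(route(r))  # requests are spread evenly across the server pool
```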
1. Different aspects of the application can be developed and rolled out independently.
2. Servers can be optimized separately for database and application server functions
3. Servers can be sized appropriately for the requirements of each tier of the architecture
4. More overall server horsepower can be deployed
5. They are far more scalable, since they balance and distribute the processing load among multiple, often redundant, specialized server nodes.
6. This in turn improves overall system performance and reliability, since more of the processing load can be accommodated simultaneously.