Distributed System Architecture
If we look back at the history of computers, we realize that growth has taken place at a tremendous pace. Early computers were so large that entire rooms were needed to house them. They were also very expensive, and it was not possible for an ordinary person to use them. That was the era of serial processing systems. From that stage we have now reached a point where we can carry computers in our pockets.
The evolution of computer systems can be divided into various phases or generations [15]. During the early phases of development, changes took place only after long intervals of time; new technologies took over a decade to evolve and to be accepted. Now the changes are very fast and their acceptance rate is also very high: everyone wants to switch to a new technology as soon as it reaches the market. Let us look at how we have moved from the era of vacuum tubes to present-day technology.
The earliest electronic digital computers had no operating system. In that era, programs were entered into the system one bit at a time on mechanical switches or plug boards. Programming languages and operating systems were unheard of, and it was not possible for a common person to use computer systems. Only specially trained groups of people designed and built these systems, and their programming, operation and maintenance also required special training. To use these systems, programmers had to reserve them by signing up beforehand. At the designated time they would come down to the computer room, insert their own plugboard and execute their program. The systems of this era were huge, occupying entire rooms, and plug boards and mechanical switches were used to build and program them.
At the start of the 1950s, only one job was executed in a system at a time. These were single-user systems, since only one person executed his or her job at a time and all the resources were available to that single user. The computer systems of that time did not have hard disks as we have today; tapes and disks had to be loaded before the execution of a program. This took a considerable amount of time, known as the job setup time, so the computer system was dedicated to a job for longer than the job's execution time. Similarly, after the execution of a job, considerable "teardown" time was needed to remove the tapes and disk packs. The computer system sat idle during job setup and job teardown.
In a serial processing system this procedure of job setup and teardown was repeated for each job submitted. The new idea that came up in this era was to group together jobs that required the same type of execution environment. By running all of these jobs one after the other, the setup and teardown had to be done only once for the complete group. This is known as batch processing, and it saved a considerable amount of time.
The programming language used for the control cards was called Job Control Language (JCL). These single-stream batch processing systems became very popular in the early 1960s. The General Motors Operating System, the Input Output System, the FORTRAN Monitor System and SAGE (Semi-Automatic Ground Environment) are some of the operating systems that came into existence in the 1950s. SAGE was a real-time control system developed for air defence.
In the 1960s the concept of multiprogramming came into existence. In these systems jobs were still submitted in batches, but the advantage was that multiple programs were loaded into memory at the same time. A program, during the course of its execution, typically waits for I/O operations to complete; while one of the programs was busy doing I/O, the CPU was given to some other program in memory. UNIX, VMS, and Windows NT are some of the most popular multiprogramming operating systems.
Spooling (Simultaneous Peripheral Operations On-Line) was another feature added in the operating systems of the third generation. There is a big difference between the speed of the CPU and the speed of a peripheral device such as a printer. If the system has to write directly to the printer, the write operation can only be as fast as the printer, which in fact is very slow. Spooling removed this drawback by putting a high-speed device such as a disk between the running program and the slow peripheral device. Instead of writing to the peripheral device, the system writes to this high-speed disk and goes back to do some other job, and the time of the system is saved.
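For illustration, the spooling idea can be sketched in a few lines of Python (this example is not from the original text; the timings and device are stand-ins): the program drops its output into a fast in-memory spool and a background thread drains the spool to the slow device at its own pace.

```python
import queue
import threading
import time

spool = queue.Queue()              # stands in for the fast spool disk

def slow_printer():
    """Spooler: drains queued jobs to the slow peripheral device."""
    while True:
        job = spool.get()
        if job is None:            # sentinel: nothing more to print
            break
        time.sleep(0.5)            # the printer is slow
        print(f"printed: {job}")

printer = threading.Thread(target=slow_printer)
printer.start()

# The running program hands its output to the spool and moves on at once.
for i in range(3):
    spool.put(f"report {i}")       # returns immediately; no waiting on the printer

spool.put(None)                    # tell the spooler to finish
printer.join()
```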
Time-sharing systems were one of the major developments in this area. These systems allowed multiple users to share computer resources simultaneously by distributing the processor time among the users, with each user receiving a very small slice of time. The users are unaware of this sharing of system resources: during its allocated time, a user's job utilises all the resources of the system, and when its time slice expires the other users get those resources. The difference between time-sharing and multiprogramming is subtle: in a time-sharing system every job is allocated only a specific, small time period. The first time-sharing system was developed in November 1961 and was called CTSS, the Compatible Time-Sharing System.
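The time-slicing just described can be pictured with a tiny round-robin sketch; the job names, quantum and time units below are invented purely for illustration.

```python
from collections import deque

def round_robin(jobs, quantum):
    """Share the CPU among jobs; each job is a (name, remaining_time) pair."""
    ready = deque(jobs)
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        ran = min(quantum, remaining)
        timeline.append((name, ran))                 # the job briefly owns the CPU
        if remaining > ran:
            ready.append((name, remaining - ran))    # back of the queue for its next slice
    return timeline

# Three users' jobs sharing one processor in one-unit slices
print(round_robin([("alice", 3), ("bob", 2), ("carol", 1)], quantum=1))
```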
Fourth Generation (Microprocessors and Personal Computers) - The 1970s
The 1970s saw several important developments in the field of operating systems, along with many technological advancements in the field of data communication. Military and university computing started making heavy use of TCP/IP networking. Personal computers were also developed in this era. The microprocessor, or microchip, was the main advancement that led to the development of these personal computers. The first IBM PC, the IBM 5150, is shown in figure 2.4.
These microprocessors had thousands of transistors integrated onto small silicon chips and hence were also known as integrated circuits, or ICs. Intel's 4004 was the first commercially available microprocessor. Other early microprocessors were Intel's 8008, the Motorola 68000, the Zilog Z80 and the MOS Technology 6502. The first operating system for the Intel 8080 was written in 1976 and was known as CP/M.
When more than one computer or processor runs an application in parallel it is known as distributed processing. There are numerous ways in which this distribution can take place. Parallel processing is one such example, in which there are multiple CPUs in a single system and execution of a program occurs on more than one processor so that it is completed in a shorter time.
When a group of identical or similar computers are connected in the same geographic location using high-speed connections, they form a cluster. In cluster computing these systems operate as a single computer, and the computers that form the cluster cannot work as independent separate systems. A cluster works as one big computer in which the individual nodes cooperate on a single workload.
In a grid, a number of computers are also connected using a communication network and work together on a large problem. The basic difference between a grid and a cluster is the type of systems that are connected. In a cluster we have similar systems, but in a grid we have a heterogeneous environment, i.e. the systems connected in a grid are of different types. This heterogeneity is in terms of both hardware and software, which means that the computers that form a grid can run different operating systems and can also have different hardware. A grid can be formed over a Local Area Network, a Metropolitan Area Network or a Wide Area Network, which means that grids are inherently distributed in nature. Another difference between a cluster and a grid is that the nodes in a grid are autonomous, i.e. they have their own resource manager and each node behaves like an independent entity. As far as physical location is concerned, these nodes can be geographically far apart and they can be operated independently.
Unused resources such as processing time and memory on one computer can be used by a process running on another computer. This is achieved with the help of a program which runs on each node of the distributed environment. The processing speed of a computer is much greater than the speed of the communication network between the computers, so the task on a system is broken down into smaller independent parts and these parts are migrated to various nodes. The nodes process their portions of the task and return the results. So in a grid structure, normally, a server logs onto a bunch of computers (the grid) and sends them data and a program to run. The program is run on those computers, and when the results are ready they are sent back to the server.
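The scatter/gather pattern described above can be sketched as follows. This is an illustrative Python example (not from the original text) that simulates the grid nodes with local worker processes; the chunking and the summing job are assumptions made for the example.

```python
from concurrent.futures import ProcessPoolExecutor

def work_on_part(part):
    """Stand-in for the program shipped to a grid node: sum one slice of data."""
    return sum(part)

def run_on_grid(data, nodes=4):
    # Break the task into smaller independent parts ("scatter") ...
    chunk = max(1, len(data) // nodes)
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    # ... send each part to a node, then combine the partial results ("gather").
    with ProcessPoolExecutor(max_workers=nodes) as pool:
        partial_results = list(pool.map(work_on_part, parts))
    return sum(partial_results)

if __name__ == "__main__":
    print(run_on_grid(list(range(1_000_000))))   # 499999500000
```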
In cloud computing, computing is delivered as a utility rather than as a product. This means that shared resources such as software and information are provided on demand, much like the electricity grid, where electric power runs over the wires, is accessible to everyone who has a connection, and each user pays according to their utilisation. Analogously, in cloud computing the resources can be accessed over the network and paid for according to use.
The term cloud computing was first used in this context in 1997 by Ramnath Chellappa. Salesforce.com began delivering business applications through its website in 1999; this was the time when commercial applications of this technology started to come into the market, and many more players soon emerged. Amazon launched its Amazon Web Services in 2002, and Google Docs, which came in 2006, brought cloud computing to the forefront of public consciousness. Amazon also introduced its Elastic Compute Cloud (EC2) as a commercial web service in 2006. We can compare such a service to a rental agency which provides computing resources to small companies and individuals who cannot afford a full-fledged infrastructure of their own; these companies pay the service provider in accordance with their usage.
In the year 2007 corporate giants Google, IBM and a number of universities across the
United States came together in an industry-wide collaboration for this technology. The
concept of private cloud came with Eucalyptus in 2008 which was the first open source
AWS (Amazon Web Services) API compatible platform. This was followed by
OpenNebula which was the first open source software for deploying private and hybrid
clouds.
Microsoft also entered into cloud computing with Windows Azure in November 2009.
By this time most of the big companies had entered cloud computing. The more recent entrants in this technology include Dell, Oracle, Fujitsu, HP, Teradata, and a number of other household names. Fundamentally the concepts of grid computing and cloud computing are different, but we can still have a cloud cluster within a computational grid and vice versa.
The difference between cloud computing and grid computing lies in the way tasks are handled within their environments. In a grid environment a single task is broken down into smaller subtasks and each of these subtasks is distributed among different computing machines. Once these smaller tasks are completed, their results are sent back to the primary machine, which combines them into the final result. The main focus of a cloud computing architecture, on the other hand, is to enable users to access computing resources as services on demand. In a grid a similar facility for computing power is offered, but cloud computing goes beyond that: with a cloud, various services such as web hosting are also provided to the users.
The main feature of the cloud is that it offers infrastructure as a service (IaaS), software as a service (SaaS) and platform as a service (PaaS), as well as Web 2.0. The cost and complexity of the infrastructure needed to build and deploy applications is eliminated here; instead, these applications are delivered as services over the network. Over the years, computing has thus moved from a centralised computing model to a distributed computing model. Let us discuss these two models in detail.
In centralized computing, all computation is controlled through one or more central terminal servers, which provide the processing, programs and storage. The workstations (thin clients, PCs, appliances) are used only for input and display purposes; they connect to the server(s), where all tasks are performed. All server resources are purchased once and shared by all users. Security issues are far easier to coordinate and nail down centrally. Centralized computing thus takes some of the control, and all of the parts most susceptible to failure, away from the desktop appliance. All computing power, processing, program installation, back-ups and security are handled centrally.
CC Advantages
• Processing power, programs, storage, back-ups and security are managed centrally and shared by all users.
CC Disadvantages
• In the rare event of a network failure, the thin-client terminal may lose access to the terminal server. If this happens, there are still means to use some local resources.
Traditionally, this type of computing was found only in enterprise-level businesses. More recently, reduced server and network costs have made this type of computing attractive to smaller organisations as well.
In the distributed model, every user has their own PC (desktop or laptop) for processing, programs and storage. Storage is often spread over the network between the local PC, shared PCs, or a dedicated file server. Each PC requires the purchase of its own resources (operating system, programs, etc.). This is also known as the peer-to-peer (P2P) model. The environment is an ad hoc network that generally grows from a small group of independent computers that need to share files and resources such as printers and network/internet connections, and it has allowed small businesses to improve some forms of productivity. If all is to run smoothly, this model usually needs internal technical skills, or access to outsourced technical support.
DC Advantages
• Each user can add their own programs at their own leisure.
DC Disadvantages
• Many moving parts (fans, hard drives) which are susceptible to failure.
This is the more widely used computing configuration, because it has grown out of what
most users and many IT people were used to, within the comfort zone of their home PCs.
As a result, there has been extensive development of many business practices, systems
and security products to help the distributed system fully function in a business
environment.
The processes running on the CPU’s of the different nodes are interconnected with some
sort of communication system. Various models are used for building distributed
shown in Figure 2.7, a distributed computing system based on this model consists of a
logged on to it. For this, several interactive terminals are connected to each
minicomputer.
Each user is logged on to one specific minicomputer, with remote access to other
minicomputers. The network allows a user to access remote resources that are available
on some machine other than the one on to which the user is currently logged. The
minicomputer model may be used when resource sharing (such as sharing of information
databases of different types, with each type of database located on a different machine)
with remote users is desired. The early ARPAnet is an example of a distributed computing system based on the minicomputer model. In the workstation model, the system consists of several workstations interconnected by a high-speed LAN, as shown in figure 2.8. Some of the workstations may be in offices, and thus
implicitly dedicated to a single user, whereas others may be in public areas and have
several different users during the course of a day. In both cases, at any instant of time, a
workstation either has a single user logged into it, and thus has an "owner" (however
temporary), or it is idle.
Figure 2.8. A network of personal workstations, each with a local file system.
In some systems the workstations have local disks and in others they do not. The latter are universally called diskless workstations, but the former are variously known as diskful workstations. If the workstations are diskless, the file system must be implemented by one or more remote file servers. Requests to read and write files are sent to a file server, which performs the work and sends back the replies. Diskless workstations are popular for several reasons, not the least of which is price. Having a large number of workstations equipped with small, slow disks is typically much more expensive than having one or two file servers equipped with large, fast disks whose capacity is shared by all the workstations.
A second reason that diskless workstations are popular is their ease of maintenance. When a new release of some program, say a compiler, comes out, the system administrators can easily install it on a small number of file servers in the machine room. Installing it on hundreds of machines scattered all over a building or campus is another matter entirely. Backup and hardware maintenance are also simpler with one centrally located 5-gigabyte disk than with fifty 100-megabyte disks scattered over the building.
Another point against disks is that they have fans and make noise. Many people find this
noise objectionable and do not want it in their office. Finally, diskless workstations
provide symmetry and flexibility. A user can walk up to any workstation in the system
and log in. Since all his files are on the file server, one diskless workstation is as good as
another. In contrast, when all the files are stored on local disks, using someone else's
workstation means that you have easy access to his files, but getting to your own requires
extra effort, and is certainly different from using your own workstation.
When the workstations have private disks, these disks can be used in one of at least four
ways:
The first design is based on the observation that while it may be convenient to keep all
the user files on the central file servers (to simplify backup and maintenance, etc.) disks
are also needed for paging (or swapping) and for temporary files. In this model, the local
disks are used only for paging and files that are temporary, unshared, and can be
discarded at the end of the login session. For example, most compilers consist of multiple
passes, each of which creates a temporary file read by the next pass. When the file has
been read once, it is discarded. Local disks are ideal for storing such files.
The second model is a variant of the first one in which the local disks also hold the binary
(executable) programs, such as the compilers, text editors, and electronic mail handlers.
When one of these programs is invoked, it is fetched from the local disk instead of from a
file server, further reducing the network load. Since these programs rarely change, they
can be installed on all the local disks and kept there for long periods of time. When a new release of a program becomes available, it can be sent to every machine. However, if a machine happens to be down when the program is sent, it will miss the new release and continue to run the old version. Thus some administration is needed to keep all the local copies up to date.
A third approach to using local disks is to use them as explicit caches (in addition to
using them for paging, temporaries, and binaries). In this mode of operation, users can
download files from the file servers to their own disks, read and write them locally, and
then upload the modified ones at the end of the login session. The goal of this
architecture is to keep long-term storage centralized, but reduce network load by keeping
files local while they are being used. A disadvantage is keeping the caches consistent.
What happens if two users download the same file and then each modifies it in different
ways? This problem is not easy to solve, and we will discuss it in detail later in the book.
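One common way to at least detect such conflicts is to attach a version number to each file and reject an upload whose base version is stale. The sketch below is only an illustration of that idea (the class, names and policy are not from the text), not a full solution to the consistency problem.

```python
class FileServer:
    """Toy central store keeping a version number per file (illustrative only)."""
    def __init__(self):
        self.files = {}                      # name -> (version, contents)

    def download(self, name):
        return self.files.get(name, (0, ""))

    def upload(self, name, base_version, contents):
        current_version, _ = self.files.get(name, (0, ""))
        if base_version != current_version:
            # Someone else uploaded a newer copy since we downloaded ours.
            raise RuntimeError("conflict: file changed on the server")
        self.files[name] = (current_version + 1, contents)

server = FileServer()
server.upload("notes.txt", 0, "draft 1")

# Two users cache the same version locally and edit it independently.
v, text = server.download("notes.txt")
server.upload("notes.txt", v, text + " edited by user A")      # succeeds
try:
    server.upload("notes.txt", v, text + " edited by user B")  # stale version
except RuntimeError as err:
    print(err)
```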
Fourth, each machine can have its own self-contained file system, with the possibility of
mounting or otherwise accessing other machines' file systems. The idea here is that each
machine is basically self-contained and that contact with the outside world is limited.
This organization provides a uniform and guaranteed response time for the user and puts
little load on the network. The disadvantage is that sharing is more difficult, and the
resulting system is much closer to a network operating system than to a true transparent distributed system.
The advantages of the workstation model are manifold and clear. The model is certainly
easy to understand. Users have a fixed amount of dedicated computing power, and thus
guaranteed response time. Sophisticated graphics programs can be very fast, since they
can have direct access to the screen. Each user has a large degree of autonomy and can
allocate his workstation's resources as he sees fit. Local disks add to this independence,
and make it possible to continue working to a lesser or greater degree even in the face of file server crashes.
However, the model also has two problems. First, as processor chips continue to get
cheaper, it will soon become economically feasible to give each user first 10 and later
100 CPUs. Having 100 workstations in your office makes it hard to see out the window.
Second, much of the time users are not using their workstations, which are idle, while
other users may need extra computing capacity and cannot get it. From a system-wide
perspective, allocating resources in such a way that some users have resources they do
not need while other users need these resources badly is inefficient.
The second problem, idle workstations, has been the subject of considerable research, mainly because many organisations have a substantial number of personal workstations, some of which are idle (an idle workstation is the devil's playground?). Measurements
show that even at peak periods in the middle of the day, often as many as 30 percent of
the workstations are idle at any given moment. In the evening, even more are idle. A
variety of schemes have been proposed for using idle or otherwise underutilized
workstations [17, 18]. The earliest attempt to allow idle workstations to be utilized was the rsh program that comes with Berkeley UNIX. It is invoked as rsh machine command, in which the first argument names a machine and the second names a command to run on it. What rsh does is run the specified command on the specified machine. Although
widely used, this program has several serious flaws. First, the user must tell which
machine to use, putting the full burden of keeping track of idle machines on the user.
Second, the program executes in the environment of the remote machine, which is
usually different from the local environment. Finally, if someone should log into an idle
machine on which a remote process is running, the process continues to run and the
newly logged-in user either has to accept the lower performance or find another machine.
The research on idle workstations has centered on solving these problems. The key issues are: how is an idle workstation found, how can a remote process be run transparently, and what happens if the machine's owner comes back?
To start with, what is an idle workstation? At first glance, it might appear that a
workstation with no one logged in at the console is an idle workstation, but with modern
computer systems things are not always that simple. In many systems, even with no one
logged in there may be dozens of processes running, such as clock daemons, mail
daemons, news daemons, and all manner of other daemons. On the other hand, a user
who logs in when arriving at his desk in the morning, but otherwise does not touch the
computer for hours, hardly puts any additional load on it. Different systems make
different decisions as to what "idle" means, but typically, if no one has touched the
keyboard or mouse for several minutes and no user-initiated processes are running, the workstation can be said to be idle. Nevertheless, there may be substantial differences in load between one idle workstation and another, due, for example, to the volume of mail arriving at the workstation.
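A minimal sketch of such an "idle" test, under the assumptions just described (no input activity for several minutes and no user-initiated processes), might look like this; the threshold and the data sources are illustrative, not taken from the text.

```python
import time

IDLE_AFTER_SECONDS = 5 * 60   # "several minutes" without keyboard or mouse input

def is_idle(last_input_time, user_process_count, now=None):
    """Decide whether a workstation may be offered as a compute server."""
    now = now if now is not None else time.time()
    no_recent_input = (now - last_input_time) > IDLE_AFTER_SECONDS
    no_user_work = user_process_count == 0     # daemons are not counted here
    return no_recent_input and no_user_work

# Example: last keystroke ten minutes ago, only daemons running -> idle
print(is_idle(last_input_time=time.time() - 600, user_process_count=0))  # True
```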
The algorithms used to locate idle workstations can be divided into two categories: server
driven and client driven. In the former, when a workstation goes idle, and thus becomes a
potential compute server, it announces its availability. It can do this by entering its name,
network address, and properties in a registry file (or database), for example. Later, when a user wants to execute a command on an idle workstation, he types something like remote command, and the remote program looks in the registry to find a suitable idle workstation.
An alternative way for the newly idle workstation to announce the fact that it has become
unemployed is to put a broadcast message onto the network. All other workstations then
record this fact. In effect, each machine maintains its own private copy of the registry.
The advantage of doing it this way is less overhead in finding an idle workstation and greater redundancy, since each machine already holds a copy of the registry.
Figure 2.9: A registry-based algorithm for finding and using idle workstations.
Whether there is one registry or many, there is a potential danger of occurring of race
conditions. If two users invoke the remote command simultaneously, and both of them
discover that the same machine is idle, they may both try to start up processes there at the
same time. To detect and avoid this situation, the remote program can check with the idle
workstation, which, if still free, removes itself from the registry and gives the go-ahead
sign. At this point, the caller can send over its environment and start the remote process, as shown in figure 2.9.
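The registry-based, server-driven scheme of figure 2.9 can be sketched as follows; the registry here is just an in-memory dictionary and every name and property is an assumption made for the example.

```python
registry = {}    # idle workstation name -> properties (the shared registry)

def announce_idle(name, props):
    """A workstation that has gone idle registers itself as a potential server."""
    registry[name] = props

def claim_idle_workstation(needed_memory):
    """remote: pick a suitable entry and remove it so nobody else claims it."""
    for name, props in list(registry.items()):
        if props["memory_mb"] >= needed_memory:
            del registry[name]      # "go-ahead": the workstation is now taken
            return name
    return None                     # no suitable idle workstation right now

announce_idle("ws-12", {"memory_mb": 256})
announce_idle("ws-31", {"memory_mb": 64})

chosen = claim_idle_workstation(needed_memory=128)
print(chosen)          # ws-12
print(registry)        # ws-31 is still available to other callers
```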
The other way to locate idle workstations is to use a client-driven approach. When remote
is invoked, it broadcasts a request saying what program it wants to run, how much
memory it needs, whether or not floating point is needed, and so on. These details are not
needed if all the workstations are identical, but if the system is heterogeneous and not
every program can run on every workstation, they are essential. When the replies come
back, remote picks one and sets it up. One nice twist is to have "idle" workstations delay
their responses slightly, with the delay being proportional to the current load. In this way,
the reply from the least heavily loaded machine will come back first and be selected.
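The load-proportional delay trick can be simulated in a few lines; in this sketch (not from the text) each idle workstation's reply time grows with its load, so the least-loaded reply arrives first and is chosen.

```python
import heapq

def pick_least_loaded(workstations, delay_per_load=0.1):
    """Simulate broadcast replies: reply delay is proportional to current load."""
    replies = [(load * delay_per_load, name) for name, load in workstations]
    heapq.heapify(replies)               # earliest (smallest delay) reply first
    first_delay, first_name = heapq.heappop(replies)
    return first_name

workstations = [("ws-07", 0.9), ("ws-12", 0.1), ("ws-31", 0.5)]   # (name, load)
print(pick_least_loaded(workstations))   # ws-12 replies first and is selected
```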
Finding a workstation is only the first step. Now the process has to be run there. Moving
the code is easy. The trick is to set up the remote process so that it sees the same
environment it would have locally, on the home workstation, and thus carries out the same computation it would have carried out there.
To start with, it needs the same view of the file system, the same working directory, and
the same environment variables (shell variables), if any. After these have been set up, the
program can begin running. The trouble starts when the first system call, say a READ, is
executed. What should the kernel do? The answer depends very much on the system
architecture. If the system is diskless, with all the files located on file servers, the kernel
can just send the request to the appropriate file server, the same way the home machine
would have done had the process been running there. On the other hand, if the system has
local disks, each with a complete file system, the request has to be forwarded back to the home machine for execution.
Some system calls must be forwarded back to the home machine no matter what, even if
all the machines are diskless. For example, reads from the keyboard and writes to the
screen can never be carried out on the remote machine. However, other system calls must
be done remotely under all conditions. For example, the UNIX system calls SBRK
(adjust the size of the data segment), NICE (set CPU scheduling priority), and PROFIL
(enable profiling of the program counter) cannot be executed on the home machine. In
addition, all system calls that query the state of the machine have to be done on the
machine on which the process is actually running. These include asking for the machine's
name and network address, asking how much free memory it has, and so on.
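The split just described, with some calls forwarded home and others executed where the process runs, can be pictured with a small dispatch table. The call names and categories below simply mirror the examples in the text; they are not a real system interface.

```python
# Where each request should be carried out for a remotely executing process.
FORWARD_HOME = {"read_keyboard", "write_screen"}          # interact with the user
EXECUTE_LOCALLY = {"sbrk", "nice", "profil", "hostname"}  # concern this machine

def dispatch(syscall, home_machine, remote_machine):
    """Return the machine on which the system call must be performed."""
    if syscall in FORWARD_HOME:
        return home_machine
    if syscall in EXECUTE_LOCALLY:
        return remote_machine
    # Anything else (e.g. file reads) depends on the architecture: on a
    # diskless system it goes to a file server, otherwise back home.
    return "file-server"

print(dispatch("write_screen", "home-ws", "idle-ws"))   # home-ws
print(dispatch("sbrk", "home-ws", "idle-ws"))           # idle-ws
```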
System calls involving time are a problem because the clocks on different machines may
not be synchronized. Using the time on the remote machine may cause programs that
depend on time, like make, to give incorrect results. Forwarding all time-related calls
back to the home machine, however, introduces delay, which also causes problems with
time.
To complicate matters further, certain special cases of calls which normally might have to
be forwarded back, such as creating and writing to a temporary file, can be done much
more efficiently on the remote machine. In addition, mouse tracking and signal
propagation have to be thought out carefully as well. Programs that write directly to
hardware devices, such as the screen's frame buffer, diskette, or magnetic tape, cannot be
run remotely at all. All in all, making programs run on remote machines as though they
were running on their home machines is possible, but it is a complex and tricky business.
The final question on our original list is what to do if the machine's owner comes back
(i.e., somebody logs in or a previously inactive user touches the keyboard or mouse). The
easiest thing is to do nothing, but this tends to defeat the idea of "personal" workstations.
If other people can run programs on your workstation at the same time that you are trying to use it, your guaranteed response time disappears.
Another possibility is to kill off the intruding process. The simplest way is to do this
abruptly and without warning. The disadvantage of this strategy is that all work will be
lost and the file system may be left in a chaotic state. A better way is to give the process
fair warning, by sending it a signal to allow it to detect impending doom, and shut down
gracefully (write edit buffers to the disk, close files, and so on). If it has not exited within
a few seconds, it is then terminated. Of course, the program must be written to expect and
handle this signal, something most existing programs definitely are not.
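The "fair warning" idea, a signal followed by termination a few seconds later, might look like this for a cooperating guest process; the choice of SIGTERM and the timeout are assumptions made for the sketch.

```python
import signal
import sys
import time

def save_state_and_exit(signum, frame):
    """Fair-warning handler: write edit buffers, close files, then exit cleanly."""
    print("owner returned: saving edit buffers and closing files...")
    sys.exit(0)

signal.signal(signal.SIGTERM, save_state_and_exit)

# The evicting machine would send SIGTERM, wait a few seconds, then SIGKILL:
#   os.kill(pid, signal.SIGTERM); time.sleep(5); os.kill(pid, signal.SIGKILL)

for _ in range(60):          # stand-in for the long-running guest computation
    time.sleep(1)
```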
A completely different approach is to migrate the process to another machine, either back
to the home machine or to yet another idle workstation. Migration is rarely done in
practice because the actual mechanism is complicated. The hard part is not moving the
user code and data, but finding and gathering up all the kernel data structures relating to
the process that is leaving. For example, it may have open files, running timers, queued
incoming messages, and other bits and pieces of information scattered around the kernel.
These must all be carefully removed from the source machine and successfully reinstalled
on the destination machine. There are no theoretical problems here, but the practical
engineering difficulties are substantial. Further literature about this can be found in [19,
20].
In both cases, when the process is gone, it should leave the machine in the same state in
which it found it, to avoid disturbing the owner. Among other items, this requirement
means that not only must the process go, but also all its children and their children. In
addition, mailboxes, network connections, and other system-wide data structures must be
deleted, and some provision must be made to ignore RPC replies and other messages that
arrive for the process after it is gone. If there is a local disk, temporary files must be
deleted, and if possible, any files that had to be removed from its cache restored.
The Sprite system [21] and an experimental system developed at Xerox PARC [22] are two examples of systems in which such process migration has been implemented.
The workstation model is a network of personal workstations, each with its own disk and
a local file system. A workstation with its own local disk is usually called a diskful
workstation and a workstation without a local disk is called a diskless workstation. Several factors, discussed below, have made the workstation-server model more popular than the workstation model for building distributed computing systems.
Figure 2.10: A distributed computing system based on the workstation-server
model.
As shown in figure 2.10, a distributed computing system based on the workstation server
model consists of a few minicomputers and several workstations (most of which are diskless, but a few of which may be diskful) interconnected by a communication network.
Note that when diskless workstations are used on a network, the file system to be used by these workstations must be implemented either by a diskful workstation or by a minicomputer equipped with a disk for file storage. In this model the minicomputers are used for this purpose: one or more of the minicomputers are used for implementing the file system.
Other minicomputers may be used for providing other types of services, such as database
service and print service. Therefore, each minicomputer is used as a server machine to
provide one or more types of services. Hence in the workstation-server model, in addition
to the workstations, there are specialized machines (may be specialized workstations) for
running server processes (called servers) for managing and providing access to shared
resources.
For a number of reasons, such as higher reliability and better scalability, multiple servers
are often used for managing the resources of a particular type in a distributed computing
system. For example, there may be multiple file servers, each running on a separate
minicomputer and cooperating via the network, for managing the files of all the users in
the system. Due to this reason, a distinction is often made between the services that are
provided to clients and the servers that provide them. That is, a service is an abstract
entity that is provided by one or more servers. For example, one or more file servers may
be used in a distributed computing system to provide file service to the users. In this
model, a user logs onto a workstation called his or her home workstation. Normal
computation activities required by the user's processes are performed at the user's home
workstation, but requests for services provided by special servers (such as a file server or
a database server) are sent to a server providing that type of service that performs the
user's requested activity and returns the result of request processing to the user's
workstation. Therefore, in this model, the user's processes need not be migrated to the
server machines for getting the work done by those machines. For better overall system
performance, the local disk of a diskful workstation is normally used for such purposes as storage of temporary files, storage of unshared files, storage of shared files that are rarely changed, paging activity in virtual-memory management, and caching of remotely accessed data.
As compared to the workstation model, the workstation-server model has the following advantages:
1. In general, it is much cheaper to use a few minicomputers equipped with large, fast disks that are accessed over the network than a large number of diskful workstations, each with a small, slow disk.
2. Diskless workstations are also preferred to diskful workstations from a system maintenance point of view. Backup and hardware maintenance are easier to perform with a few large disks than with many small disks scattered all over a building or campus. Furthermore, installing new releases of software (such as a file server with new functionalities) is easier when the software is to be installed on a few file server machines than on every workstation.
3. In the workstation-server model, since all files are managed by the file servers, users have the flexibility to use any workstation and access their files in the same manner irrespective of which workstation they are currently logged on to. Note that this is not true of the workstation model, in which each workstation has its own local file system, because different mechanisms are needed to access local and remote files.
4. In the workstation-server model, a request-response protocol is mainly used to access the services of the server machines. Therefore, unlike the workstation model, this model does not need a process migration facility, which is difficult to implement.
In this request-response model, a client process (which in this case resides on a workstation) sends a request to a server process (which in this case resides on a minicomputer) for some service, such as reading a block of a file. The server executes the request and sends back a reply to the client containing the result of the request processing. This client-server model provides an effective general-purpose approach to the sharing of information and resources in distributed computing systems. It is not only meant for use with the workstation-server model but can also be implemented in a variety of hardware and software environments. The computers used to run the client and server processes need not necessarily be workstations and minicomputers; they can be of many types and there is no need to distinguish between them. It is even possible for both the client and server processes to be run on the same computer. Moreover, some processes are both client and server processes; that is, a server process may use the services of another server, thereby acting as a client of that server.
5. A user has guaranteed response time because workstations are not used for executing
remote processes. However, the model does not utilize the processing capability of idle
workstations.
The V-System [23] is an example of a distributed computing system that is based on the
workstation-server model.
Although using idle workstations adds a little computing power to the system, it does not
address a more fundamental issue: What happens when it is feasible to provide 10 or 100
times as many CPUs as there are active users? One solution, as we saw, is to give everyone a personal multiprocessor. A more cost-effective approach, however, is to construct a processor pool: a rack full of CPUs in the machine room, which can be dynamically allocated to users on demand. The processor
pool approach is illustrated in figure 2.11. Instead of giving users personal workstations,
in this model they are given high-performance graphics terminals, such as X terminals
(although small workstations can also be used as terminals). This idea is based on the
observation that what many users really want is a high-quality graphical interface and good performance. Conceptually, this is much closer to traditional timesharing than to the personal computer model, although it is built with modern technology (low-cost microprocessors).
The motivation for the processor pool idea comes from taking the diskless workstation
idea a step further. If the file system can be centralized in a small number of file servers
to gain economies of scale, it should be possible to do the same thing for compute
servers. By putting all the CPUs in a big rack in the machine room, power supply and
other packaging costs can be reduced, giving more computing power for a given amount
of money. Furthermore, it permits the use of cheaper X terminals (or even ordinary
ASCII terminals), and decouples the number of users from the number of workstations.
The model also allows for easy incremental growth. If the computing load increases by
10 percent, you can just buy 10 percent more processors and put them in the pool.
In effect, we are converting all the computing power into "idle workstations" that can be
accessed dynamically. Users can be assigned as many CPUs as they need for short
periods, after which they are returned to the pool so that other users can have them. There is no concept of ownership here: the processors belong equally to everyone.
So far we have tacitly assumed that a pool of n processors is effectively the same thing as
a single processor that is n times as fast as a single processor. In reality, this assumption
is justified only if all requests can be split up in such a way as to allow them to run on all
the processors in parallel. If a job can be split into, say, only 5 parts, then the processor pool model has an effective service time only 5 times better than that of a single processor, no matter how many processors the pool contains.
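This bound amounts to a one-line calculation; the helper below simply illustrates the point made above and is not a model taken from the text.

```python
def effective_speedup(pool_size, parallel_parts):
    """A job split into k independent parts can occupy at most k pooled CPUs."""
    return min(pool_size, parallel_parts)

print(effective_speedup(pool_size=100, parallel_parts=5))   # 5, not 100
```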
Still, the processor pool model is a much cleaner way of getting extra computing power
than looking around for idle workstations and sneaking over there while nobody is
looking. By starting out with the assumption that no processor belongs to anyone, we get
a design based on the concept of requesting machines from the pool, using them, and
putting them back when done. There is also no need to forward anything back to a home machine, because there is none, and there is no danger of the owner coming back, because there are no owners. In the
end, it all comes down to the nature of the workload. If all people are doing is simple
editing and occasionally sending an electronic mail message or two, having a personal
workstation is probably enough. If, on the other hand, the users are engaged in a large
software development project, frequently running make on large directories, or are trying
to invert massive sparse matrices, or do major simulations, or run big artificial intelligence programs, constantly hunting for idle workstations will be no fun at all. In all these situations, the processor pool idea is fundamentally simpler and more attractive.
A possible compromise is to provide each user with a personal workstation and to have a
processor pool in addition. Although this solution is more expensive than either a pure
workstation model or a pure processor pool model, it combines the advantages of both. Interactive work can be done on the personal workstations, giving guaranteed response. Idle workstations, however, are not utilized, making for a simpler system design; they are just left unused. Instead, all non-interactive processes run on the processor pool, as does all heavy computing in general. This model provides fast interactive response, efficient use of resources, and a simple design.
A distributed system consists of components that operate together over a network (e.g., the Internet, a virtual private network, a local area network, an intranet) to accomplish a set of goals. An architectural model simplifies and abstracts the functions of the individual components of a distributed system and then considers the interrelationships between the various components. Based on how the responsibilities are distributed between system components and how these components are placed, we can identify the following architectures.
a) Client-server Architecture
The client/server model is a computing model that acts as a distributed application which partitions tasks or workloads between the providers of a resource or service, called servers, and the service requesters, called clients. Often clients and servers communicate over a computer network on separate hardware, but both client and server may reside in the same system. A server machine is a host that runs one or more server programs which share their resources with clients. A client does not share any of its resources, but requests a server's content or service function. Clients therefore initiate communication sessions with servers, which await incoming requests.
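A minimal request/reply exchange over TCP, in the spirit of the description above, might look like the following sketch; the address, port and message format are made up for the example.

```python
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 5050          # illustrative address and port

def server():
    """The server awaits an incoming request and sends back a reply."""
    with socket.create_server((HOST, PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            request = conn.recv(1024).decode()
            conn.sendall(f"result of: {request}".encode())

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)                          # give the server a moment to start listening

# The client initiates the session and requests a service.
with socket.create_connection((HOST, PORT)) as client:
    client.sendall(b"read block 42")
    print(client.recv(1024).decode())    # result of: read block 42
```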
The client/server characteristic describes the relationship of cooperating programs in an application. The server component provides a function or service to one or many clients, which initiate requests for such services. With the development of large-scale information systems, which are very complex in nature, the client-server model has become very popular, and these systems have evolved at a tremendous rate over the past two decades.
The C/S computing architecture is the backbone of technologies like groupware and workflow systems, and client-server technology has had a huge impact on the way information systems are built. Client-server is perhaps best viewed as a logical model in which the nodes are divided into clients and servers.
This architecture has been in existence since the very beginning of the information technology era; it can be seen even in mainframe systems, where the mainframe acts as a server and an unintelligent terminal acts as a client.
When the client communicates directly with a database server, the arrangement is known as a two-tier architecture. The business or application logic can reside either on the client or on the database server. This model started to emerge in the late eighties and early nineties in applications developed for Local Area Networks. These applications were based on simple file-sharing techniques implemented by X-base style products.
Fat Clients
In the beginning, client-server systems were such that most of the processing occurred on the client node itself, with the host being a non-mainframe system acting as a network file server. Since most of the processing took place on the client node, the client was known as a "fat client". This configuration is shown in figure 2.13. The disadvantage of this model was that it was not able to handle large or even mid-size information systems.
For desktop computing the Graphical User Interface (GUI) became the most popular
environment. New horizons started to appear for the two- tier architecture. Specialized
database servers started to replace the general purpose LAN file server. New
development tools such as PowerBuilder, Visual Basic, Delphi etc. started to emerge.
Figure 2.13: Fat Client Model
In this new scheme, datasets of information were sent to the client using Structured Query Language (SQL) techniques. Most of the processing was still carried out on the "fat" clients.
If we want to carry out most of the processing on the client, the client hardware must be powerful enough to support it; that is, we need an ever fatter client. If the client technology is not that advanced, this is not feasible and the application cannot be afforded. Fat clients also require a large amount of network bandwidth, which puts a heavy load on the network.
Thin Client
As opposed to the fat-client model, where most of the processing is carried out on the client, we have the thin-client model, shown in figure 2.14. In this model the procedures stored at the database server are invoked by the user as and when required.
Performance is better in the thin-client model because the network traffic is reduced. However, the drawback of this model is that it relies heavily on stored procedures, which
are much customized and vendor specific. These stored procedures are very closely
linked to the database and hence they have to be changed whenever the business logic
changes. If the database is very large, it becomes very difficult to make and maintain these changes. A remote database transport protocol has to be used in such cases; one example is using SQL*Net to carry out the transactions. The client/server interaction has to be mediated through a "heavy" network process in such situations, which increases the load on the network.
Whether we used the thin-client or the fat-client model, two-tier (C/S) systems were not able to handle distributed systems larger than about 100 users, and they were not flexible enough to cope with rapidly changing business requirements.
In order to cope with the limitations of the two-tier architecture, a new model was proposed in which one or more application servers are placed between the client and the database server. The client here is actually a thin client; the application server(s) hold the business logic and the database server manages the data.
Multi-tier client/server architectures have many more advantages, which include:
• The applications can be modified easily to adapt to changing user requirements and application logic.
• Only the data that is actually required is transferred to the client by the application layer, hence network bottlenecks are minimized.
• Because the server holds the business logic, whenever there is any change in the business logic we just have to update the server; no changes are needed at the client side.
• The client has no information about the database and network operations; it can access data easily and quickly without having any knowledge about them.
• Several users can share database connections by pooling, so the cost of maintaining connections is reduced.
• Standard SQL is used for writing the data layer, and because standard SQL is database independent, a variety of languages can be used for writing the application layer, so the programming can easily be done in different languages.
Fat Middle
In a multi-tier architecture, one or more middle-tier components are added to the traditional client/server architecture. Standard protocols such as RPC or HTTP are used for interaction between the client system and the middle tier, and standard database connectivity interfaces such as SQL, ODBC and JDBC are used for interaction between the middle tier and the database.
Most of the application logic is present at the middle-tier. The client calls here are
translated into database queries and other actions and the data from the database is
translated into client data in return. Scalability is easier to achieve here because the
business logic is present on the application server. This also provides for easier handling
of the rapidly changing business needs. Apart from these advantages, the middle tier can also provide connections to different types of services; when it can integrate and couple such services to the client and to each other, we get an N-tier architecture.
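The division of responsibilities in such a layered design can be sketched as three small functions, one per tier; in this illustration (not from the text) the tier boundaries are plain function calls rather than real network hops, and the data and business rule are invented.

```python
# Data tier: owns the data and answers queries (a dict stands in for the DB).
ORDERS = {42: {"item": "keyboard", "quantity": 2, "unit_price": 25.0}}

def data_tier_fetch_order(order_id):
    return ORDERS[order_id]

# Application (business logic) tier: applies rules, never touches the UI.
def app_tier_order_total(order_id):
    order = data_tier_fetch_order(order_id)
    total = order["quantity"] * order["unit_price"]
    if total > 40:                       # illustrative business rule
        total *= 0.9                     # bulk discount
    return round(total, 2)

# Presentation tier: the thin client only formats and displays the result.
def presentation_tier(order_id):
    return f"Order {order_id} total: ${app_tier_order_total(order_id)}"

print(presentation_tier(42))             # Order 42 total: $45.0
```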
With the developments in the field of distributed architectures we got models where in a
multi tier environment a client side computer works as a client as well as a server. In
these client-server systems the processes at the client end are smaller but more
specialized. These processes have the advantage that they can be developed faster and
maintained easily. Similarly on the server side also small specialized processes were
developed.
The industry nowadays is embracing this N-tier architecture at a rapid pace, and most applications today are written using this technology. But this does not mean that the two-tier and three-tier models are completely obsolete: depending on the type of application, the size of the distributed environment and the type of data access, a two- or three-tier model may still be the appropriate choice.
• When traffic increases due to growing business needs, each tier can be expanded and moved to its own machine and then clustered; this is one example of how an N-tier design scales.
• Applications can be made more readable and reusable by using the N-tier model. It is easier to port EJBs and custom tag libraries into readable, well-structured applications.
• N-tier architectures are more robust. The various tiers are independent of each other. For example, if a business changes database vendors, they just have to replace the data tier and adjust the integration tier to any changes that affect it; the business logic tier and the presentation tier remain unchanged. Likewise, if the presentation layer changes, this will not affect the integration or data layer. In a 3-tier architecture all the layers exist in one unit and affect each other, so a developer would have to pick through much of the application to make a change.
• The N-tier model allows developers to apply their specific skills to the part of the program that best suits their skill set: graphic artists can focus on the presentation tier, while programmers concentrate on the business logic and data tiers.
It is expected that applications using the N-tier model will grow almost four-fold over the coming years. In this model the application logic is broken down among various servers, and this partitioning provides consistent and global access to critical data. The N-tier architecture also reduces network traffic, which in turn results in greater reliability and faster network response. A typical configuration consists of an intelligent client, a database server and one or more intelligent agents in the middle which control functions such as transaction handling, security and object store control. Object-oriented methodologies are also widely used in building such systems.
TP monitors
TP (transaction processing) monitors coordinate transactions that span many different resources. However, although the transactional paradigm this middleware employs provides an excellent mechanism for method sharing, it is not as effective for simple information sharing, the primary goal of B2B application integration. For example, transactional middleware tends to create a tightly coupled integration solution, while messaging solutions tend to be more cohesive. In addition, in order to take advantage of transactional middleware, the source and target applications usually have to be changed.
A TP monitor provides a mechanism for communication between two or more applications, as well as a location for application logic. Examples of TP monitors include Tuxedo from BEA Systems, MTS from Microsoft, and CICS from IBM.
The TP monitor performs two major services. On one side, a TP monitor provides services that guarantee the integrity of transactions (a transaction service). On the other side, a TP monitor provides resource management and runtime management services (an application service). TP monitors also provide connectors to resources such as databases, files and message queues. These connectors are typically low-level connectors that require specialized programming to communicate with the connected resources. Once connected, these resources are integrated into the transaction and leveraged as part of the transaction; as a result, they are also able to recover if a failure occurs.
TP monitors are well suited to handling a high transaction load and many clients. They take advantage of queued input buffers to protect against peaks in the workload: if the load increases, the engine is able to press on without a loss in response time. TP monitors can also use priority scheduling to prioritize messages and support server threads, thus saving the overhead of heavyweight processes. Finally, they can reduce the number of simultaneous connections needed to back-end resources. In recent years, much of this space has been taken up by the many new products touting themselves as application servers. What is interesting about this is that application servers are nothing new (and TP monitors should be considered application servers because of their many common features). Most application servers are used as web-enabled middleware, processing transactions from web-enabled applications. What's more, they employ modern languages such as Java instead of traditional procedural languages such as C and COBOL.
To put it simply, application servers provide not only for the sharing and processing of application logic, but also for connecting to back-end resources. These resources include databases, ERP applications, and even traditional mainframe applications. Application servers also provide user interface development mechanisms, and they usually offer mechanisms for deploying applications to the web.
Application server vendors are repositioning their products as a technology that solves B2B application integration problems (some without the benefit of a technology that works!). Since this is the case, application servers and TP monitors are sure to play a major role in the B2B application integration domain. Many of these vendors are also adding features such as message transformation and intelligent routing, services that are currently native to message brokers. This area of middleware is evolving rapidly.
Earlier distributed applications typically used procedural programming or shell scripts to implement both client and server application logic. In a distributed object architecture (DOA), the application logic is organized as objects and distributed over multiple networked hosts. These objects collaborate over the network to provide the overall application functionality. The object that invokes a method is called the "client object" and the remote object on a different host whose method is being invoked is called the "server object". Since this invocation happens over a network, a reference to the remote object has to be obtained by the client object before the invocation of a method.
One thing to be noted is that the distribution of the logic is transparent: the client object thinks it is calling a local object, and the task of actually making the call over the network is taken over by the infrastructure software. The three most famous frameworks in this area are CORBA, DCOM and Java RMI.
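The location transparency described here is usually achieved with a client-side proxy (stub) that hides the remote call. The tiny sketch below fakes the "network" with a local dictionary of server objects, purely as an illustration; the class names are invented.

```python
class BankAccount:
    """The server object, conceptually living on some remote host."""
    def __init__(self, balance):
        self.balance = balance
    def deposit(self, amount):
        self.balance += amount
        return self.balance

# Stand-in for the object request broker / remoting infrastructure.
REMOTE_OBJECTS = {"accounts/alice": BankAccount(100)}

class Proxy:
    """Client-side stub: looks like the real object, forwards every call."""
    def __init__(self, object_name):
        self._name = object_name
    def __getattr__(self, method):
        def forward(*args, **kwargs):
            target = REMOTE_OBJECTS[self._name]        # "network" lookup
            return getattr(target, method)(*args, **kwargs)
        return forward

# The client code calls the proxy as if it were a local object.
alice = Proxy("accounts/alice")
print(alice.deposit(50))        # 150
```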
Components are the units of processing in the distributed component model (DCM) [25]. A software component can be defined as a unit of composition with contractually specified interfaces and explicit context dependencies. A component exposes its functionality only through its interface: an interface defines a set of properties, methods, and events through which external entities can connect to, and communicate with, the component. According to Lowy [26], this principle contrasts with the object-oriented view of the world that places the object rather than its interface at the center.
Location transparency
Clients should be able to interact with components without hard coding their location into the client code. This allows the location of a component to be changed without requiring changes to, or recompilation of, the client. Components are usually at a higher level of abstraction than objects and are explicitly geared towards reuse. Components differ from other types of reusable software modules in that they can be modified at design time as binary executables, whereas libraries of reusable code must be modified in source form.
Component standards specify how to build and interconnect software components. They show how a component must present itself to the outside world, independent of its internal implementation. The most widely used platforms are .NET, Java and CORBA, which provide support for the distributed component model through Enterprise Services, Enterprise JavaBeans (EJB) and the CORBA Component Model (CCM) respectively. Components often exist and operate within containers, which provide a shared context for interaction with other components. Containers also offer common access to system-level services (such as threads and memory resources). The component standard defines the contract between a component and its container; compliant containers all support the same set of interfaces, which means that components can freely migrate between different containers. Containers are typically provided by application servers, which offer services of the underlying middleware systems such as transactions, security, persistence and notification. Server components are often organised into hierarchies.
CORBA (Common Object Request Broker Architecture) is a standard defined by the Object Management Group (OMG) for a distributed object architecture and infrastructure. The CORBA model is based on objects (components) that collaborate over a network. It provides the mechanism for exposing an object's methods to remote callers (to act as a server) and for discovering such an exposed server object within the CORBA infrastructure (to invoke it as a client).
CORBA objects can act as servers and clients simultaneously. CORBA uses a platform-independent Interface Definition Language (IDL) for the definition of the calling interfaces and their signatures. An IDL compiler is a tool that a platform vendor must provide: it compiles the IDL file into platform-specific stub code and maps the parameter types to platform-specific types. An IDL compiler can generate both the client stubs and the server skeleton code. The IDL interface definition is mapped to several programming languages via OMG standards: OMG has standardized mappings from IDL to C, C++, Java, COBOL, Smalltalk, Ada, Lisp, Python, and IDLscript. Thus, CORBA is language independent, provided that there is a mapping from the language constructs to the IDL. In CORBA terminology, the programming-language object that implements the operations of a CORBA IDL interface is called a "servant". The heart of the CORBA infrastructure is the Object Request Broker (ORB), which acts as a software bus for objects. An ORB makes it possible for CORBA objects to communicate with each other by connecting objects making requests (clients) with objects servicing those requests (servers), handling issues such as object location, marshalling, fault recovery, and security. Figure 6 shows the structure of an ORB in a CORBA environment.
DCOM (Distributed Component Object Model) is Microsoft's distributed object technology. It allows two objects,
one acting as a client and the other acting as the server object, to communicate regardless
of whether the two objects are on the same or on different machines. This communication
structure is achieved using a proxy object in the client and a stub in the server. When
client and component reside on different machines, DCOM simply replaces the local
inter- process communication with a network protocol. The COM run-time provides
object-oriented services to clients and components and uses DCE-RPC and the security
provider to generate standard network packets that conform to the DCOM wire-protocol
standard. A DCOM object has one or more interfaces that a client accesses via interface
pointers.
It is not possible to directly access an object itself; it is accessible only through its
interfaces. Thus, a DCOM object is completely defined by the interfaces that comprise it.
Each DCOM interface is unique in the system. A Globally Unique Identifier (GUID – a
128 bit integer that guarantees uniqueness in space and time for an interface, an object or
a class) allows them to be uniquely named. A DCOM interface is not modifiable; if new functionality is needed, a new interface with a new GUID must be defined.
There are many definitions of service-oriented architecture (SOA). Some take a technical perspective, some a business perspective, and a few define SOA from an architectural perspective. For example, the W3C (World Wide Web Consortium) defines SOA as "a set of components which can be invoked, and whose interface descriptions can be published and discovered". This is not very clear, as it describes the architecture as a technical implementation and not in the sense in which the term "architecture" is generally used, that is, to describe the overall structure of a system. A definition from a developer's perspective is provided in MSDN magazine, where SOA is defined as "an architecture for
a system or application that is built using a set of services”. A SOA defines application
functionality as a set of shared, reusable services. However, it is not just a system that is
built as a set of services. An application or a system built using SOA could still contain
code that implements functionality specific to that application. On the other hand, all of
the application’s functionality could be made up of services. Some of the other definitions
of SOA include:
• "SOA is an approach that takes everyday business applications and breaks them down into individual business functions and processes, called services. An SOA lets you build, deploy and integrate these services independently of applications and the computing platforms on which they run."
• [Microsoft 2004a]: SOA is built around three components: a service provider, a service consumer and a service directory. These three components interact with each other to achieve the desired automation.
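The provider/consumer/directory interaction can be sketched as follows; the directory here is an in-process dictionary and the "services" are plain functions, purely to illustrate publish, discover and invoke (all names are invented).

```python
service_directory = {}     # service name -> callable endpoint (the "directory")

def publish(name, endpoint):
    """Service provider registers an interface description and endpoint."""
    service_directory[name] = endpoint

def discover(name):
    """Service consumer looks the service up in the directory."""
    return service_directory[name]

# Provider side: an everyday business function exposed as a service.
def currency_conversion(amount, rate):
    return round(amount * rate, 2)

publish("currency-conversion", currency_conversion)

# Consumer side: discover the service, then invoke it.
convert = discover("currency-conversion")
print(convert(100, 0.92))   # 92.0
```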
b) Peer-to-peer Architecture
Peer-to-peer (P2P) computing is a distributed application architecture that partitions tasks or workloads among peers. Peers are equally privileged, equipotent
participants in the application. They are said to form a peer-to-peer network of nodes.
Peers make a portion of their resources, such as processing power, disk storage or
network bandwidth, directly available to other network participants, without the need for
central coordination by servers or stable hosts. Peers are both suppliers and consumers of
resources, in contrast to the traditional client-server model, where only servers supply resources and clients consume them. The peer-to-peer model was popularized by file-sharing systems such as Napster. The concept has inspired new structures and philosophies in many areas of
human interaction, including social processes with a peer-to-peer dynamic.