Distributed System Models - Workstation Model

The document describes the distributed system model of a workstation model. It consists of workstations connected by a LAN that can either be used by a dedicated user or be idle. When idle workstations are available, they can be used to run processes transparently by finding idle workstations through a registry, setting up the remote process so it sees the same environment as the original workstation, and forwarding certain system calls like disk access back to the original machine while having other calls refer to the machine running the process. Managing consistency when caching files on local disks of workstations is also a challenge for this distributed system model.

Uploaded by Supriya Gupta
© Attribution Non-Commercial (BY-NC)

Distributed System Models -- Workstation Model

• The system consists of workstations (high-end personal computers) scattered throughout a


building or campus connected by a high-speed LAN.
• Dedicated users Vs. Public terminals
• At any instant of time, a workstation either has a user (owner) logged into it or it is idle.
• Diskless workstation Vs. Diskful / Disky workstation
– Price: Having a large number of workstations equipped with small, slow disks is typically
much more expensive than having one or two servers equipped with huge, fast disks
accessed over the LAN.
– Ease of maintenance: It is always easier for the system administrator to install and
maintain software on a couple of servers than on hundreds of machines all over
the campus. The same applies to backup and hardware maintenance.
– Noise: 200 PC fans Vs. 1 server fan!
– Provide symmetry and flexibility: A user can walk up to any workstation in the system
and log in. Since all his files are on the file server, one workstation is as good as another.
By contrast, if files are stored on local disks, using someone else's workstation means that
you have easy access to his files, but getting to your own requires extra effort and is
different from using your own workstation.
Workstation Model (continue)
• When workstations have local disks, these disks can be used as follows:
– Paging and temporary files:
• Paging (swapping) and files that are temporary, unshared, and can be discarded at
the end of the login session.
• For example, most compilers consist of multiple passes, each of which creates a
temporary file read by the next pass.
– Paging, temporary files & system binaries:
• Same as the above, but the local disk also holds the binary (executable) programs, such as
compilers, text editors, and electronic mail handlers, to reduce the network
load.
• Since these programs rarely change, installing them on the local disk is acceptable.
• When a new release of a program is available, it is broadcast to all machines. But
if a machine is down at the time of the update, it misses the update and continues to
run the old version when it comes up again.
• Hence, some administrative work has to be done to keep track of which machine
has which version of which program.
Workstation Model (continue .)
– Paging, temporary files, system binaries, and file caching:
• Users can download their programs from file servers and work on private
copies; after the job is done, the private copy is copied back to the server.
• The goal of this architecture is to keep long-term storage centralized but reduce the
network load by keeping files local while they are being used.
• The disadvantage is keeping the caches consistent: what happens if two users
download the same file and then modify it in different ways?
– Complete local file system:
• Each machine can have its own file system, with the possibility of mounting or
otherwise accessing other machines’ file systems.
• The idea is that each machine is basically self-contained and that contact with the
outside world is limited. This organization guarantees response time for the user
and puts little load on the network.
• The disadvantage is that sharing is more difficult, and the resulting system is much
closer to a network operating system than to a true transparent distributed operating
system.
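The cache-consistency question raised above can be made concrete with a small sketch (the file name and contents here are made up): two users cache the same server file, edit their private copies independently, and copy them back with no coordination.

```python
# Sketch of the cache-consistency problem: two users cache the same file,
# modify their private copies, and copy them back. With no coordination
# the second write-back silently overwrites the first (a "lost update").

server = {"report.txt": "v0"}

# Both users download the same file into their local caches.
cache_a = {"report.txt": server["report.txt"]}
cache_b = {"report.txt": server["report.txt"]}

# Each modifies a private copy...
cache_a["report.txt"] = "v0 + A's edits"
cache_b["report.txt"] = "v0 + B's edits"

# ...and copies it back to the server when done.
server["report.txt"] = cache_a["report.txt"]
server["report.txt"] = cache_b["report.txt"]

print(server["report.txt"])   # -> v0 + B's edits: A's work has vanished
```

A real system would need version numbers, locks, or an ownership protocol to detect or prevent the lost update.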
Disk usage on workstations
• The advantages of the workstation model:
– Users have a fixed amount of dedicated computing power, and thus
guaranteed response time.
– Sophisticated graphics programs can be very fast, since they can have
direct access to the screen.
– Each user has a large degree of autonomy and can allocate his
workstation’s resources as he sees fit.
– Local disks add to this independence, and make it possible to continue
working to a lesser or greater degree even in the face of file server
crashes.
• Two Problems:
– As processor chips continue to get cheaper, it will soon become
economically feasible to give each user 10 to 100 CPUs.
– Much of the time users are not using their workstations, which sit idle,
while other users may need extra computing capacity and cannot get it.
Disk usage on workstations -- summary

Disk usage                     Advantages                            Disadvantages
Diskless                       Low cost; easy hardware and           Heavy network usage; file servers
                               software maintenance; symmetry        may become bottlenecks
                               and flexibility
Paging, scratch files          Reduces network load over the         Higher cost due to large number
                               diskless case                         of disks needed
Paging, scratch files,         Reduces network load even more        Higher cost; additional complexity
binaries                                                             of updating the binaries
Paging, scratch files,         Still lower network load; reduces     Higher cost; cache consistency
binaries, file caching         load on file servers as well          problems
Full local file system         Hardly any network load;              Loss of transparency
                               eliminates need for file servers
Using Idle Workstations
• Using idle workstations has been the subject of considerable research
because many universities have a substantial number of personal
workstations, some of which are idle.
• Measurements show that even at peak periods, as many as 30 percent of the
workstations are idle.
• The earliest attempt to allow idle workstations to be utilized was the rsh
program that comes with Berkeley UNIX. It is invoked as
rsh <machine-name> <command-string>
• What rsh does is run the specified command on the specified machine.
Although widely used, this program has several serious flaws:
– The user must say which machine to use -- finding an idle workstation is left
entirely to the user;
– The program executes in the environment of the remote machine, which is
usually different from the local environment;
– If someone logs into an idle machine on which a remote process is
running, the process continues to run, and the newly logged-in user either
has to accept the lower performance or find another machine.
Using Idle Workstations (continue)
• 1) How is an idle workstation found?
– What is an idle workstation? One with no one logged in at the console? But
what about remote logins (telnet) and background processes, like clock daemons, mail
daemons, news daemons, ftpd, and httpd? On the other hand, a user might log in and do
nothing.
– Server driven:
• When a workstation goes idle, and thus becomes a potential compute server, it
announces its availability. It can do this by entering its name, network address, and
properties in a registry file. Later, when a user wants to execute a command on an idle
workstation, this registry file serves as a tool for locating one. For
reliability, the registry file can be duplicated.
• An alternative way is to have the newly idle workstation send out a broadcast message
onto the network and have all the other workstations record this fact, so that each
machine maintains its own private copy of the registry.
• The advantage is less overhead in finding an idle workstation and greater
redundancy.
• The disadvantage is requiring all machines to do the work of maintaining the registry.
• An overall disadvantage of the registry-file method is the race condition: two users may
consult the registry at the same moment and both select the same idle workstation.
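As a rough illustration of the server-driven registry (the class and method names here are invented, and the registry is modeled as an in-memory table rather than a replicated file), note how doing the lookup and the removal in one step sidesteps the race condition mentioned above:

```python
# Hypothetical sketch of a server-driven registry for idle workstations.
# Names (Registry, announce, claim, withdraw) are illustrative only.

class Registry:
    """The shared registry file, modeled here as an in-memory table."""

    def __init__(self):
        self._idle = {}  # workstation name -> (network address, properties)

    def announce(self, name, address, properties):
        # A workstation that goes idle enters itself in the registry.
        self._idle[name] = (address, set(properties))

    def claim(self, needed):
        # A client removes a suitable entry. Doing the lookup and the
        # removal as one step avoids the race in which two clients both
        # pick the same idle workstation.
        for name, (addr, props) in list(self._idle.items()):
            if needed.issubset(props):
                del self._idle[name]
                return name, addr
        return None

    def withdraw(self, name):
        # An owner returning to the console removes the entry.
        self._idle.pop(name, None)

reg = Registry()
reg.announce("ws17", "10.0.0.17", {"fpu", "8GB"})
print(reg.claim({"fpu"}))   # -> ('ws17', '10.0.0.17')
print(reg.claim({"fpu"}))   # -> None: the entry was removed atomically
```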
Using Idle Workstations (continue .)
• Client-driven:
– When remote execution is invoked, the client broadcasts a request saying what
program it wants to run, how much memory it needs, whether or not
floating point is needed, and so on. These details are not needed if all the
workstations are identical, but if the system is heterogeneous and not
every program can run on every workstation, they are essential.
– When the replies come back, the client picks one workstation and sets it up.
– One nice twist is to have “idle” workstations delay their responses
slightly, with the delay being proportional to the current load. In this
way, the reply from the least heavily loaded machine comes back first
and is selected.

Refer to Figure 4-12 for an overall picture of a registry-based algorithm for
finding and using idle workstations.
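The client-driven scheme might be sketched as follows (the names and data layout are illustrative; a real implementation would use network broadcast and real timers). The load-proportional delay is modeled directly, so the least loaded eligible machine wins:

```python
# Sketch of the client-driven scheme with load-proportional reply delays.
# All names are made up; a real system would broadcast on the LAN.
import heapq

def pick_server(request, workstations):
    """Broadcast a request; idle workstations able to run it reply after a
    delay proportional to their current load, so the reply from the least
    loaded machine arrives first and is the one selected."""
    replies = []
    for ws in workstations:
        if ws["can_run"](request):
            delay = 0.01 * ws["load"]        # delay proportional to load
            heapq.heappush(replies, (delay, ws["name"]))
    if replies:
        return heapq.heappop(replies)[1]     # earliest reply wins
    return None

stations = [
    {"name": "ws3", "load": 5, "can_run": lambda r: True},
    {"name": "ws9", "load": 1, "can_run": lambda r: True},
    # a small machine that cannot satisfy large memory requests
    {"name": "ws4", "load": 0, "can_run": lambda r: r["mem_mb"] <= 512},
]
req = {"mem_mb": 1024, "needs_fp": True}
print(pick_server(req, stations))  # -> ws9 (least loaded machine able to run it)
```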
Using Idle Workstations (continue ..)
• 2) How can a remote process be run transparently? Moving the code is the easy part.
– The trick is to set up the remote process so that it sees the same environment it would
have locally, on the home workstation, and thus carries out the same computation it
would have locally.
– It needs the same view of the file system, the same working directory, and the same
environment variables (shell variables), if any.
– Problems:
• When READ is called on a diskless workstation, that is fine, but if the call refers to
the home machine’s local disk, the request has to be forwarded back to the home machine.
• Furthermore, some system calls must always refer back to the home machine, such as
keyboard input and writes to the screen.
• On the other hand, some calls should be done on the remote machine, such as adjusting
the size of the data segment and the NICE and PROFILE calls. And all system calls
that query the state of the machine should refer to the machine that is running
the process -- asking for the machine name, network address, or amount of free
memory.
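One way to picture the rules above is as a routing table consulted for every system call trapped on the remote machine. The call names and the table itself are illustrative, not taken from any real kernel:

```python
# Illustrative routing table for system calls made by a remotely executed
# process; the categories follow the discussion above, the names are
# hypothetical.

HOME = "home machine"          # forwarded back to the home workstation
LOCAL = "executing machine"    # handled where the process is running

SYSCALL_ROUTING = {
    # I/O tied to the user's terminal always goes home.
    "read_keyboard": HOME,
    "write_screen": HOME,
    # Disk access goes home when it refers to the home machine's disk.
    "read_home_disk": HOME,
    # Calls about the process's own execution stay local.
    "set_data_segment_size": LOCAL,
    "nice": LOCAL,
    "profile": LOCAL,
    # Queries about machine state refer to the executing machine.
    "get_hostname": LOCAL,
    "get_free_memory": LOCAL,
}

def route(syscall):
    # Default to local execution for calls not listed above.
    return SYSCALL_ROUTING.get(syscall, LOCAL)

print(route("read_keyboard"))   # -> home machine
print(route("get_hostname"))    # -> executing machine
```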
Using Idle Workstations (continue ...)
• Problems: (continue)
– System calls involving time are a problem because the clocks on
different machines may not agree. Using the time on the remote
machine may cause programs that depend on time to give incorrect
results. Forwarding all time-related calls back to the home machine,
however, introduces delay, which also causes problems with time.
– Certain special cases of calls that normally would have to be forwarded
back, such as creating and writing to a temporary file, can be done much
more efficiently on the remote machine.
– Mouse tracking and signal propagation have to be thought out carefully.
– Programs that write directly to hardware devices, such as the screen’s
frame buffer, diskette, or magnetic tape, cannot be run remotely at all.
– All in all, making programs run on remote machines as though they were
running on their home machines is possible, but it is a complex and
tricky business.
Using Idle Workstations (continue ….)
• 3) What to do if the machine’s owner comes back?
– The easiest thing is to do nothing, but this tends to defeat the idea of
“personal” workstations. If other people can run programs on your
workstation at the same time that you are trying to use it, there goes your
guaranteed response time.
– Another possibility is to kill off the intruding process. The simplest way
is to do this abruptly and without warning.
– The disadvantage is that all work will be lost and the file system may be
left in a chaotic state.
– A better way is to give the process fair warning, by sending it a signal to
allow it to detect impending doom and shut down gracefully (write edit
buffers to the disk, close files, and so on). If it has not exited within a
few seconds, it is then terminated.
– Of course, the program must be written to expect and handle this signal,
something most existing programs definitely are not.
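A minimal sketch of this warn-then-kill policy, using ordinary POSIX signals (the function names are invented; a real distributed OS would drive this from the workstation's process manager rather than plain signals):

```python
# Sketch: guest process installs a handler so the eviction warning lets it
# save state and exit; the manager kills it if it lingers past the grace
# period. install_eviction_handler / evict are hypothetical names.
import os
import signal
import sys
import time

def install_eviction_handler(save_state):
    """Guest side: on SIGTERM, save state and shut down gracefully."""
    def on_warning(signum, frame):
        save_state()     # write edit buffers to disk, close files, ...
        sys.exit(0)      # then exit cleanly
    signal.signal(signal.SIGTERM, on_warning)

def evict(pid, grace_seconds=5):
    """Manager side: warn the guest, then terminate it if needed."""
    os.kill(pid, signal.SIGTERM)           # fair warning
    deadline = time.time() + grace_seconds
    while time.time() < deadline:
        done, _ = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return "exited gracefully"
        time.sleep(0.1)
    os.kill(pid, signal.SIGKILL)           # it ignored the warning
    os.waitpid(pid, 0)
    return "killed"
```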
Using Idle Workstations (continue …..)
– Process migration: A completely different approach is to migrate the process to
another machine, either back to its home machine or to another idle workstation.
– Migration is rarely done in practice because the actual mechanism is complicated.
The hard part is not moving the user code and data, but finding and gathering up all
the kernel data structures relating to the process that is leaving.
– For example, it may have open files, running timers, queued incoming messages, and
other bits and pieces scattered around the kernel.
– These must be carefully removed from the source machine and successfully reinstalled
on the destination machine.
– In any case, when the process is gone, it should leave the machine in the same state
in which it found it, to avoid disturbing the owner. Any spawned children should also
be gone.
– In addition, mailboxes, network connections, and other system-wide data structures
must be deleted, and some provision must be made to ignore RPC replies and other
messages that arrive for the process after it is gone. If there is a local disk, temporary
files must be deleted and, if possible, any files the process cached must be removed.
Distributed System Models -- Processor Pool Model
• What happens when it becomes feasible to provide 10 to 100 times as many CPUs as
there are active users?
• Giving everyone a personal multiprocessor is an inefficient design.
• What the system designer aims at is price / performance, and purchasing 20
workstations only to find that just 50-70% of them are active at any one time is not
what the designer wants.
• The next approach is to construct a processor pool, a rack full of CPUs in the
machine room, which can be dynamically allocated to users on demand.

[Figure: several workstations and a processor pool connected by a LAN, with a
file server and its disk array.]
Processor Pool Model
• This idea is based on the observation that what many users really want is a high-quality
graphical interface and good performance.
• It is much closer to traditional timesharing than to the personal computer model.
• The idea comes from the diskless workstation: if the file system can be centralized in a small
number of servers to gain economies of scale, it should be possible to do the same thing for
compute servers.
• By putting all the CPUs in a big rack in the machine room, power supply and other packaging
costs can be reduced, giving more computing power for a given amount of money.
• Furthermore, it permits the use of cheaper X terminals and decouples the number of users
from the number of workstations (no longer a one-to-one ratio).
• The processor pool model allows for easy incremental growth: if the computing load
increases by 10%, you can just install 10% more processors.
• In effect, we are converting all the computing power into “idle workstations” that can be
accessed dynamically. Users can be assigned as many CPUs as they need for short periods,
after which they are returned to the pool so that other users can have them. There is no
concept of ownership: all the processors belong equally to everyone.
Queueing Theory
• The biggest argument for centralizing the computing power in a processor
pool comes from queueing theory.
• A queueing system is a situation in which users generate random requests for
work from a server. When the server is busy, the users queue for service and
are processed in turn (FIFO).
• Common examples of queueing systems are airport check-in counters,
supermarket check-out counters, bank counters, cafeterias, etc.
[Figure: jobs J1..J6, generated with Poisson inter-arrival and service times,
enter a FIFO queue and are processed in turn by the server; a statistics
collector records the results.]
Queueing Theory (continue)
• Queueing systems are useful because it is possible to model them analytically.
• Let us call λ the total input rate, in requests per second, from all the users combined.
• Let us call μ the rate at which the server can process requests.
• For stable operation, we must have μ > λ. If the server can handle 100
requests/sec but the users continuously generate 110 requests/sec, the queue
will grow without bound.
• We can tolerate small intervals in which the input rate exceeds the service rate,
provided that the mean input rate is lower than the service rate and there is enough
buffer space.
• It was proven by Kleinrock that the mean time between issuing a request and getting
a complete response, T, is related to λ and μ by the formula:
T = 1 / (μ - λ)
• As an example, consider a file server that is capable of handling 50 requests/sec but
gets only 40 requests/sec. The mean response time will be 1/10 sec. If λ goes to 0,
the response time drops to 1/50 sec.
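Plugging the example's numbers into Kleinrock's formula confirms the figures in the text:

```python
# The M/M/1 response-time formula from the text, checked on the file-server
# example: mu = 50 requests/sec capacity, lambda = 40 requests/sec offered.

def mean_response_time(lam, mu):
    """Kleinrock's formula T = 1 / (mu - lam); requires mu > lam."""
    if lam >= mu:
        raise ValueError("unstable: arrival rate must be below service rate")
    return 1.0 / (mu - lam)

print(mean_response_time(40, 50))  # -> 0.1   (1/10 sec under load)
print(mean_response_time(0, 50))   # -> 0.02  (1/50 sec when idle)
```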
Queueing Theory (continue .)
• Suppose that we have n personal multiprocessors, each with some number of
CPUs, and each one forms a separate queueing system with request arrival
rate λ and CPU processing rate μ.
• The mean response time, T, will be:
T = 1 / (μ - λ)
• Now consider the processor pool model. Instead of having n small queueing
systems running in parallel, we now have one large one, with an input rate
nλ and a service rate nμ.
• Let us call the mean response time of this combined system T1. From the
formula above we find:
T1 = 1 / (nμ - nλ) = T / n
• This surprising result says that by replacing n small resources with one big one
that is n times more powerful, we can reduce the average response time n-fold.
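The n-fold speedup can be checked numerically with the same formula (the rates and the value of n here are arbitrary). Note that this treats the pool as a single server n times as fast, exactly as the formula above does:

```python
# Comparing n separate queueing systems against one pooled system with
# input rate n*lam and service rate n*mu, using T = 1/(mu - lam).

def separate_T(lam, mu):
    return 1.0 / (mu - lam)             # each small system: T

def pooled_T(lam, mu, n):
    return 1.0 / (n * mu - n * lam)     # one big system: T1 = T / n

lam, mu, n = 40.0, 50.0, 20
T = separate_T(lam, mu)
T1 = pooled_T(lam, mu, n)
print(T, T1)      # -> 0.1 0.005 : the pooled response time is T / n
```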
Why a Distributed System if a Centralized System Is Better?
• If the processor pool model beats the workstation model, and centralizing the
computing power is better still, why bother with a distributed system at all?
• Given a choice between one centralized 1000-MIPS CPU and 100 private,
dedicated, 10-MIPS CPUs, the mean response time of the one-CPU system will
be 100 times better, because no cycles are ever wasted.
• However, mean response time is not everything. There is also cost.
• If a single 1000-MIPS CPU is much more expensive than 100 10-MIPS CPUs,
the price / performance ratio of the latter may be much better.
• It may not even be possible to build such a large machine at any price.
• Reliability and fault tolerance are also factors.
• Also, personal workstations give a uniform response, independent of what
other people are doing (except when the network or file server is jammed).
• For some users, a low variance in response time may be perceived as more
important than the mean response time itself. (When opening a file, users might
be happier if it always takes 255 msec than if 95% of the time it takes 5 msec and
5% of the time it takes 5 seconds.)
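The parenthetical example is not arbitrary: the mixed distribution's mean works out to about 255 msec, so the two systems have (nearly) the same mean but wildly different variances:

```python
# Working out the numbers in the example above: a response of 5 msec 95%
# of the time and 5 sec (5000 msec) the other 5% has a mean of about
# 255 msec, the same as the always-255-msec system, but a huge variance.

mean = 0.95 * 5 + 0.05 * 5000      # in milliseconds
var_mixed = 0.95 * (5 - mean) ** 2 + 0.05 * (5000 - mean) ** 2
var_fixed = 0.0                    # a constant 255 msec never varies

print(round(mean, 2))    # -> 254.75 (about 255 msec)
print(var_mixed > var_fixed)   # -> True
```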
Processor Pool Model Vs. Workstation Model (revisit)
• So far we have tacitly assumed that a pool of n processors is effectively the same thing as a single
processor that is n times as fast.
• In reality, this assumption is justified only if all requests can be split up in such a way as to allow them
to run on all the processors in parallel.
• If a job can be split into, say, only 5 parts, then the processor pool model has an effective service time
only 5 times better than that of a single processor, not n times better.
• Still, the processor pool model is much better than the workstation model. With the assumption that no
processor belongs to anyone:
– There is no need to forward anything back to a “home” machine;
– There is no danger of the owner coming back.
• In conclusion, it depends on the application: the workstation model is good for simple work like editing
files and sending e-mail. But for heavy-duty work like large software development, intensive matrix
calculations, major simulations, AI programs, and VLSI routing, the processor pool model is a better
choice.
Note: A possible compromise is to provide each user with a personal workstation and to have a pool of
processors as a computation server. This is more expensive, but it gets the best of both worlds: users get
a guaranteed response time through their workstations, and also have the capacity to run computation-
intensive programs on the processor pool. (To keep the system simple, idle workstations are not
utilized in this hybrid model.)
