Chapter 3

Uploaded by

Legesse Samuel
Copyright
© © All Rights Reserved

CHAPTER 3-PROCESSES

INTRODUCTION
 The concept of a process originates from the field of operating systems (OS).
 A process is a program in execution.
 From the OS perspective, the management and scheduling of processes are the most important issues.
 Other important issues arise in distributed systems:
    multithreading:
       to organize client-server systems efficiently;
       to enhance performance by overlapping communication and local processing.
    virtualization:
       allows an application and its environment to run concurrently with other applications, independent of the underlying hardware and platforms, leading to a high degree of portability.
    process or code migration:
       moving processes between different machines (in wide-area distributed systems);
       can help in achieving scalability;
       can also help to configure clients and servers dynamically.

3.1. THREADS
 To execute a program, an OS creates a number of virtual processors, each one for running a different program.
 To keep track of these virtual processors, the OS has a process table, containing entries to store CPU register values, memory maps, open files, accounting information, privileges, etc.
 A process is a program that is currently being executed on one of the OS's virtual processors.
 There are usually many processes executing concurrently.
 Processes should not interfere with each other; the fact that they share resources is made transparent to them.
 This concurrency transparency comes at a high price: allocating resources for a new process and context switching both take time.
CONT…
 A thread is a single sequential flow of control within a program.
 A thread also executes independently of other threads, but without a high degree of concurrency transparency, which results in better performance.
 A Web browser is an example of a multithreaded application. Within a typical browser, you can:
    scroll a page while it is downloading an applet or an image,
    play animation and sound concurrently,
    print a page in the background while you download a new page.
 A main contribution of threads in distributed systems is that they allow clients and servers to be constructed so that communication and local processing can overlap.
 Threads can be used in both:
    non-distributed systems
    distributed systems
3.1.1. THREADS IN NON-DISTRIBUTED SYSTEMS
 A process has an address space (containing program text and data) and a single thread of control, as well as other resources such as open files, child processes, accounting information, etc.

Fig. 3.1. (a) Three processes, each with one thread. (b) One process with three threads.


CONT…
 each thread has its own program counter, registers,
stack, and state; but all threads of a process share
address space, global variables and other resources such
as open files, etc.

Fig.3.2. Threads
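
The split between per-thread and shared state can be sketched in Python. This is only an illustration: the names `counter`, `worker`, and `run` are made up for the example. Each thread keeps `local_count` on its own stack, while all threads update the same global `counter`, which the program itself has to protect with a lock.

```python
import threading

counter = 0                       # global variable: shared by all threads
counter_lock = threading.Lock()   # protection is the application's job


def worker(increments, results, index):
    """Count locally (private to this thread) and globally (shared)."""
    global counter
    local_count = 0               # lives on this thread's own stack
    for _ in range(increments):
        with counter_lock:        # serialize access to the shared variable
            counter += 1
        local_count += 1
    results[index] = local_count


def run(num_threads=3, increments=1000):
    """Start the threads, wait for them, return (shared total, local counts)."""
    global counter
    counter = 0
    results = [0] * num_threads
    threads = [
        threading.Thread(target=worker, args=(increments, results, i))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter, results
```

Calling `run(3, 1000)` gives every thread a private count of 1000, while the shared counter accumulates the increments of all three.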
CONT…

 Threads allow multiple executions to take place in the same process environment; this is called multithreading.
 Multithreading provides concurrency with less overhead,
    but also with less transparency: the application itself must protect shared memory against concurrent access by its threads.
 Thread Usage – Why do we need threads?
    simplifying the programming model, since many activities are going on at once, more or less independently;
    threads are easier to create and destroy than processes, since they do not have any resources attached to them;
    performance improves by overlapping activities when there is a lot of I/O,
       i.e., a thread that blocks waiting for input does not stop another thread from doing calculations, say in a spreadsheet;
    real parallelism is possible in a multiprocessor system.
CONT…
 in non-distributed systems, threads can be used with
shared data instead of processes to avoid context
switching overhead in interprocess communication (IPC).

Fig. 3.3. Context switching as the result of IPC


THREAD IMPLEMENTATION
 Threads are often provided in the form of a thread
package.
 Such a package contains operations to create and destroy
threads as well as operations on synchronization
variables such as mutexes and condition variables.
 The two approaches to implement a thread package are:
user-level and kernel-level thread.
User-level threads:
 construct a thread library that is executed entirely in user mode;
 the OS is not aware of threads.
 Advantages:
    it is cheap to create and destroy threads: just allocate and free memory;
    context switching can be done in just a few instructions: store and reload only CPU register values.
CONT…
 Drawback:
 invocation of a blocking system call will block the entire
process to which the thread belongs, and all the other
threads in that process.
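
As a rough analogy (not an actual thread package), Python generators can play the role of user-level threads: the scheduler below runs entirely in user code, and a "context switch" is just resuming the next generator. The names `task` and `run_round_robin` are invented for this sketch.

```python
from collections import deque


def task(name, steps, log):
    """A cooperative 'thread': does one step, then yields the CPU."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield                      # hand control back to the scheduler


def run_round_robin(tasks):
    """User-space scheduler: the OS only ever sees one thread of control."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # "context switch": resume this task
            ready.append(current)  # still runnable, go to the back of the queue
        except StopIteration:
            pass                   # task finished, drop it


log = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
# log is now ["A:0", "B:0", "A:1", "B:1", "B:2"]
```

The drawback shows up directly: if one task made a blocking call (for example `time.sleep`) instead of yielding, the scheduler and every other task would stall with it.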
Kernel-level threads:
 let the kernel be aware of threads and schedule them, i.e., implement threads in the OS's kernel;
 Drawback: thread operations such as creation, deletion, and synchronization are expensive, since each requires a system call.
 Solution: use a hybrid form of user-level and kernel-level threads, called lightweight processes (LWP).

LIGHTWEIGHT PROCESSES (LWP)
 An LWP runs in the context of a single (heavy-weight) process, and there can be several LWPs per process.
 The system also offers a user-level thread package
    for creating and destroying threads, and
    for thread synchronization, with facilities such as mutexes and condition variables.
 The important point is that the thread package is implemented entirely in user space;
    in other words, all operations on threads are carried out without intervention of the kernel.

CONT…
 The thread package can be shared by multiple LWPs, as shown in Fig. 3.4.
 This means that each LWP can be running its own (user-level) thread.

Fig. 3.4. Combining kernel-level lightweight processes and user-level threads
CONT…
 Advantages of using LWPs in combination with a user-level thread package:
    creating, destroying, and synchronizing threads is relatively cheap and involves no kernel intervention at all;
    a blocking system call will not suspend the entire process;
    there is no need for an application to know about the LWPs: all it sees are user-level threads;
    LWPs can easily be used in multiprocessing environments, by executing different LWPs on different CPUs; this multiprocessing can be hidden entirely from the application.
 Drawback:
    LWPs still need to be created and destroyed, which is just as expensive as with kernel-level threads.
3.1.2. THREADS IN DISTRIBUTED SYSTEMS
 Threads allow blocking system calls without blocking the entire process;
    this means multiple logical connections (communications) can be maintained at the same time.
 Threads gain much of their power by sharing an address space;
    there is no shared address space across the machines of a distributed system.
 Individual processes, e.g., a client or a server, can nevertheless be multithreaded to improve performance.

CONT…
Multithreaded Clients:
 The main advantage is hiding communication latency;
    e.g., the delays in downloading documents from web servers in a WAN.
 The usual way to hide communication latencies is to initiate communication and immediately proceed with something else.
 Example: consider web browsers:
    fetching the different parts of a page can be implemented as separate threads,
    each opening its own TCP connection to the server;
    each can display the results as it gets its part of the page.

CONT…
 Hide latency by starting several threads:
    one to download the text (displayed as it arrives),
    others to download photographs, figures, etc.
 Parallelism can also be achieved with replicated servers, since the request of each thread can be forwarded to a separate replica;
    if servers are replicated, the requests of the multiple threads may be sent to separate sites.
 As a result, data can be downloaded in several parallel streams, improving performance.

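
A minimal sketch of such a client, under the assumption that each download can be simulated by a `time.sleep` of a given length (no real network is used). The function names `fetch` and `load_page` and the part names are made up for the example.

```python
import threading
import time


def fetch(part, delay, displayed, lock):
    """Simulated blocking download of one part of a page."""
    time.sleep(delay)              # stands in for waiting on the network
    with lock:
        displayed.append(part)     # "display" the part as soon as it arrives


def load_page(parts):
    """Start one thread per part; return the parts shown and the time taken."""
    displayed, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=fetch, args=(name, delay, displayed, lock))
        for name, delay in parts.items()
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return displayed, time.monotonic() - start
```

With `load_page({"text": 0.1, "figure": 0.2, "photo": 0.3})` the total time is close to the slowest download rather than the sum of all three, which is the latency hiding described above.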
CONT…
Multithreaded Servers:
 servers can be constructed in three ways:

A. Single-threaded process
 it processes one request at a time
 it gets a request, examines it, carries it out to completion
before getting the next request
 while waiting for the disk, the server is idle and does not
process any other requests;
 consequently, requests from other clients cannot be
handled

CONT…
B. Threads (multithreaded process)
 Threads are particularly important for implementing servers.
 e.g., a file server:
    the dispatcher thread reads incoming requests for a file operation from clients and passes each one to an idle worker thread;
    the worker thread performs a blocking disk read, during which another thread may continue, say the dispatcher or another worker thread.

Fig. 3.5. A multithreaded server organized in a dispatcher/worker model

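
The dispatcher/worker model might be sketched as follows. This is a simplification under stated assumptions: requests are plain file names, the blocking disk read is replaced by building a string, and a shared queue stands in for "pass the request to an idle worker" (whichever worker is idle takes the next item). `worker` and `serve` are hypothetical names.

```python
import queue
import threading


def worker(requests, responses):
    """Worker thread: take a request, 'read the file', post the reply."""
    while True:
        req = requests.get()
        if req is None:            # sentinel: no more work
            break
        # A real worker would block here on a disk read.
        responses.put((req, f"contents of {req}"))


def serve(incoming, num_workers=3):
    """Dispatcher: hand every incoming request to the worker pool."""
    requests, responses = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(requests, responses))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for req in incoming:           # the dispatcher loop
        requests.put(req)
    for _ in workers:              # one sentinel per worker
        requests.put(None)
    for w in workers:
        w.join()
    return dict(responses.get() for _ in range(len(incoming)))
```

For example, `serve(["a.txt", "b.txt"])` returns a dictionary mapping each file name to its simulated contents.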
CONT…
C. Finite-state machine
 used if threads are not available;
 the server gets a request, examines it, and tries to fulfill it from the cache; otherwise it sends a request to the file system;
 but instead of blocking, it records the state of the current request and proceeds to the next request;
 the drawback is that it is hard to program.
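
The bookkeeping a finite-state machine server has to do can be illustrated with a small simulation. It assumes events arrive as tuples, either `("request", id, filename)` or `("disk_done", id, data)`; this event format and the name `run_fsm_server` are inventions for the sketch, not part of any real server API.

```python
def run_fsm_server(events, cache):
    """Single-threaded, non-blocking server driven by a stream of events."""
    pending = {}                   # request id -> file it is waiting for
    replies = []
    for event in events:
        kind = event[0]
        if kind == "request":
            _, req_id, filename = event
            if filename in cache:
                replies.append((req_id, cache[filename]))  # served from cache
            else:
                # Record the state of this request and move on
                # instead of blocking on the disk.
                pending[req_id] = filename
        elif kind == "disk_done":
            _, req_id, data = event
            filename = pending.pop(req_id)  # restore the recorded state
            cache[filename] = data
            replies.append((req_id, data))
    return replies
```

The explicit `pending` table is exactly the state that a thread would otherwise keep implicitly on its stack, which is why this style is harder to program.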
Summary

3.2. ANATOMY OF CLIENT
A. (Networked) User Interfaces:
 A major task of client machines is to provide the means for users to interact with remote servers.
 There are two ways to support this interaction:
    First, for each remote service the client machine has a separate counterpart that can contact the service over the network.
    Second, the client machine provides direct access to remote services by offering only a convenient user interface;
       this means that the client machine is used only as a terminal, with no need for local storage.
 In the case of networked user interfaces, everything is processed and stored at the server.
CONT…

Fig. 3.6. A networked application with its own protocol (left) and a general solution to allow access to remote applications (right)

Fat client:
 each remote application has two parts: one on the client, one on the server;
 communication is application specific.

Thin client:
 the client is basically a terminal and does little more than provide a GUI to remote services.
CONT…
B. Client-Side Software:
 In addition to the user interface, parts of the processing and data level in a client-server application are executed at the client side.
    Examples are embedded client software for ATMs, cash registers, etc.
 Moreover, client software can also include components to achieve distribution transparency:
    Access transparency: client-side stubs hide communication and hardware details.
    Location, migration, and relocation transparency: these rely on naming systems (e.g., when a server changes location, the client software can be informed without the user knowing).
    Failure transparency: e.g., client middleware can make multiple attempts to connect to a server.
CONT…
 Replication transparency:
    e.g., assume a distributed system with replicated servers;
    the client-side proxy can send a request to each replica, transparently collect all responses, and pass a single return value to the client application.

Fig. 3.7. Transparent replication of a server using a client-side solution
3.3. ANATOMY OF SERVERS
3.3.1. General Design Issues
 A server is a process implementing a specific service on behalf of a collection of clients.
 Each server is organized in the same way: it waits until a request arrives.
A. How to organize servers?
 Iterative server:
    the server itself handles the request and, if necessary, returns a response to the requesting client.
 Concurrent server:
    the server does not handle the request itself, but passes it to a separate thread or another process, after which it immediately waits for the next incoming request;
    a multithreaded server is an example of a concurrent server.
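
A concurrent server can be sketched with Python's standard `socketserver` module, whose `ThreadingMixIn` hands each connection to a new thread. The service itself (returning the request line in upper case) and the names `UpperHandler`, `ConcurrentServer`, `start_server`, and `ask` are assumptions made for the example; using plain `socketserver.TCPServer` instead would give an iterative server.

```python
import socket
import socketserver
import threading


class UpperHandler(socketserver.StreamRequestHandler):
    """Handles one client: read a line, send it back in upper case."""

    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line.upper())


class ConcurrentServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    """Each accepted connection is passed to its own thread."""

    daemon_threads = True
    allow_reuse_address = True


def start_server():
    """Start the server on a free local port and return it."""
    server = ConcurrentServer(("127.0.0.1", 0), UpperHandler)  # port 0: OS picks
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask(port, text):
    """Tiny client: send one line and return the server's reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
        conn.sendall(text.encode() + b"\n")
        return conn.makefile("rb").readline().decode().strip()
```

After `server = start_server()`, the port chosen by the OS is available as `server.server_address[1]`, and `server.shutdown()` stops the accept loop.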
CONT…
B. Where do clients contact a server?
 Clients send requests to an end point, also called a port, at the machine where the server is running.
 Each server listens to a specific end point.
 How do clients know the end point of a service?
    End points are globally assigned for well-known services,
       e.g., FTP is on TCP port 21 and HTTP is on TCP port 80;
       these end points have been assigned by the Internet Assigned Numbers Authority (IANA);
       with assigned end points, the client only needs to find the network address of the machine where the server is running.
    For services that do not require a pre-assigned end point, one can be dynamically assigned by the local OS;
       a client will then first have to look up the end point.
CONT…
 IANA ranges: IANA divides the port numbers into three ranges.
 Well-known ports:
    assigned and controlled by IANA for standard services,
    e.g., DNS uses port 53.
 Registered ports:
    not assigned or controlled by IANA;
    can only be registered with IANA to prevent duplication,
    e.g., MySQL uses port 3306.
 Dynamic ports:
    neither controlled nor registered by IANA.
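
The three ranges can be captured in a small helper. The numeric boundaries are not given on the slide; the sketch assumes the ones published by IANA (0 to 1023 well-known, 1024 to 49151 registered, 49152 to 65535 dynamic). The function name `classify_port` is made up.

```python
def classify_port(port):
    """Return the IANA range a TCP/UDP port number falls into."""
    if not 0 <= port <= 65535:
        raise ValueError(f"not a valid port number: {port}")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"
```

For the examples above, `classify_port(53)` gives `"well-known"` and `classify_port(3306)` gives `"registered"`.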


CONT…
 How can the client know endpoints that are not well known? There are two approaches:
i. Have a daemon running (on each machine that runs servers) and listening to a well-known endpoint;
    it keeps track of the endpoints of all services on the collocated server;
    the client first contacts the daemon, which provides it with the endpoint, and then contacts the specific server.

Fig. 3.8. Client-to-server binding using a daemon
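
The two-step binding can be mimicked without any networking. In this sketch the daemon is an in-memory table and the servers are ordinary functions keyed by port number; `EndpointDaemon`, `bind_to_service`, the service name, and the port are all hypothetical.

```python
class EndpointDaemon:
    """Stand-in for the daemon: maps service names to endpoints."""

    def __init__(self):
        self._endpoints = {}

    def register(self, service, port):
        """Called by a server when it starts listening on a port."""
        self._endpoints[service] = port

    def lookup(self, service):
        """Called by a client; returns None for an unknown service."""
        return self._endpoints.get(service)


def bind_to_service(daemon, servers, service, request):
    """Step 1: ask the daemon for the endpoint. Step 2: contact the server."""
    port = daemon.lookup(service)
    if port is None:
        raise LookupError(f"no endpoint registered for {service!r}")
    return servers[port](request)
```

A time service registered on port 5001 would then be reached with `bind_to_service(daemon, servers, "time", "now")`, without the client knowing the port in advance.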


CONT…

ii. Use a superserver (as in UNIX) that listens to all endpoints and then forks a process to take care of each request;
    this avoids having a lot of servers running simultaneously with most of them idle.

Fig. 3.9. Client-to-server binding using a superserver


CONT…

C. Whether and how can a server be interrupted?
 For instance, a user may want to interrupt a file transfer; maybe it was the wrong file.
    Let the user exit the client application;
       this will break the connection to the server;
       the server will tear down the connection, assuming that the client has crashed.
   OR
    Let the client send out-of-band data:
       data that is to be processed by the server before any other data from the client;
       the server may listen on a separate control endpoint, or the data may be sent on the same connection as urgent data, as in TCP.

CONT…
D. Whether or not the server is stateless
 A stateless server does not keep information on the state of its clients, and can change its own state without having to inform any client.
    Example: a web server that honors HTTP requests does not need to remember which clients have contacted it.
 A stateful server maintains information on its clients;
    the information needs to be explicitly deleted by the server.
    Example: a file server that allows a client to keep a local copy of a file and perform update operations on it;
       such a server would maintain a table containing (client, file) entries;
       this improves the performance of read and write operations,
       but requires a recovery procedure in case of a server crash:
       a stateful server needs to recover its entire state as it was just before the crash.
CONT…
3.3.2. Server Clusters
 A server cluster is a collection of machines connected through a network, where each machine runs one or more servers.
 The machines are connected through a LAN, with high bandwidth and low latency.
 It is logically organized into three tiers:
    the first tier consists of a (logical) switch through which client requests are routed;
    the second tier consists of (application/compute) servers, by which data is processed;
    the third tier consists of data-processing servers, e.g., file servers and database servers;
       for some applications, the major part of the workload may be here.
CONT…

Fig. 3.10. The general organization of a three-tiered server cluster
CONT…

 Distributed Servers
    The problem with a server cluster is that when the logical switch (the single access point) fails, the cluster becomes unavailable.
    To eliminate this potential problem, several access points can be provided, whose addresses are publicly available, leading to a distributed server.
    For example, the Domain Name System (DNS) can return several addresses, all belonging to the same host name.

3.4. CODE MIGRATION
 So far, we have been mainly concerned with distributed systems in which communication is limited to passing data.
 However, there are situations in which passing programs, sometimes even while they are running and even between heterogeneous systems, simplifies the design of a distributed system.
 Code migration in distributed systems originally took place in the form of process migration, in which an entire process was moved from one machine to another.
 Code migration may also involve moving data:
    when a program migrates while running, its status, pending signals, and other parts of its environment, such as the stack and the program counter, also have to be moved.
CONT…
 Reasons for Migrating Code:
    to improve performance
       move processes from heavily loaded to lightly loaded machines (load balancing)
    to reduce communication
       move a client application that performs many database operations to a server if the database resides on the server; then send only the results to the client
    to exploit parallelism (for nonparallel programs)
       e.g., copies of a mobile program (called a mobile agent or a crawler) moving from site to site, searching the web
    to have flexibility
       by dynamically configuring distributed systems, instead of having a multi-tiered client-server application in which it is decided in advance which parts of a program are to be run where.
CONT…

Fig. 3.11. The principle of dynamically configuring a client to communicate to a server; the client first fetches the necessary software, and then invokes the server
CLIENT-SERVER EXAMPLES
 Example 1: Send client code to the server
    The server manages a huge database.
    If a client application needs to perform many database operations, it may be better to ship part of the client application to the server and send only the results across the network.
 Example 2: Send server code to the client
    In many interactive DB applications, clients need to fill in forms that are subsequently translated into a series of DB operations, with validation normally required at the server side.
    Shipping the form-processing part of the server to the client lets the form be processed locally, so that only the completed form has to cross the network.

3.4.1. MODELS FOR CODE MIGRATION
 Communication in distributed systems is concerned with exchanging data between processes.
 Code migration deals with moving programs between machines, with the intention of having those programs executed at the target.
 In some cases, as in process migration, the execution status of a program, pending signals, and other parts of the environment must be moved as well.
 To get a better understanding of the different models for code migration, we use a framework described in Fuggetta et al. (1998).
CONT…

 In this framework, a process consists of three segments: the code segment, the resource segment, and the execution segment.
 The code segment is the part that contains the set of instructions that make up the program that is being executed.
 The resource segment contains references to external resources needed by the process, such as files, printers, devices, other processes, and so on.
 The execution segment is used to store the current execution state of a process, consisting of private data, the stack, and the program counter.
CONT…

1. Weak Mobility
 transfer only the code segment and maybe some initialization data;
 a process can only migrate before it begins to run, or perhaps at a few intermediate points;
 the characteristic feature of weak mobility is that a transferred program is always started from its initial state;
    e.g., Java applets (which always start execution from the beginning).
 The benefit of this approach is its simplicity:
    it requires only that the target machine can execute the code.

CONT…
 In the case of weak mobility, the migrated code is executed either by the target process (in its own address space) or by a separate process.
 For example:
    Java applets are simply downloaded by a web browser and are executed in the browser's address space.
 Advantage of executing in the target process:
    no need to start a separate process, thereby avoiding communication at the target machine.
 Drawback:
    the target process needs to be protected against malicious or inadvertent code executions;
    a simple solution is to let the operating system take care of that by creating a separate process to execute the migrated code.
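
Weak mobility can be imitated in a few lines of Python by shipping source text plus some initialization data and running it from the first statement at the target. This is only a sketch: `run_migrated_code` and `CODE` are invented names, and `exec` runs the code inside the receiving process, so it must never be used on untrusted code without the kind of protection discussed above.

```python
def run_migrated_code(source, init_data):
    """Run received code from its beginning, inside the target process."""
    namespace = dict(init_data)    # only initialization data, no execution state
    exec(source, namespace)        # always starts at the first statement
    return namespace


# The "code segment" as it might arrive over the network.
CODE = "result = sum(range(n))"

outcome = run_migrated_code(CODE, {"n": 5})
# outcome["result"] is 10
```

Nothing of the sender's stack or program counter travels along, which is what keeps this model simple.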
CONT…
2. Strong Mobility
 transfer the code segment and the execution segment;
 processes can migrate after they have already started to execute;
 the characteristic feature is that a running process can be stopped, subsequently moved to another machine, and then resume execution where it left off;
 it is much harder to implement than weak mobility;
 can also be supported by remote cloning: creating an exact copy of the original process, which runs on a different machine;
    the cloned process is executed in parallel to the original process;
    in UNIX, this is done by forking a child process and letting that child continue on a remote machine.
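
Real strong mobility needs the stack and program counter of a running process. The sketch below only approximates the idea by keeping the execution state explicit in a dictionary, so that it can be serialized, shipped, and resumed; the computation (summing integers) and the names `run`, `capture`, and `resume` are assumptions for illustration.

```python
import json


def run(state, max_steps=None):
    """Advance the computation; stop early after max_steps if given."""
    steps = 0
    while state["next_value"] < state["limit"] and (
        max_steps is None or steps < max_steps
    ):
        state["total"] += state["next_value"]   # private data
        state["next_value"] += 1                # plays the program counter
        steps += 1
    return state


def capture(state):
    """Serialize the execution state so it can be sent to another machine."""
    return json.dumps(state).encode()


def resume(blob):
    """At the target: restore the state and continue where it stopped."""
    return run(json.loads(blob.decode()))
```

A task started with `{"limit": 10, "next_value": 0, "total": 0}` can be stopped after four steps, passed through `capture` and `resume`, and it finishes with the same total as an uninterrupted run.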
CONT…

 Migration can be sender-initiated or receiver-initiated.
 Sender-initiated:
    migration is initiated at the machine where the code currently resides or is being executed.
    Examples:
       uploading programs to a server; this requires that the client has previously been registered and authenticated at that server;
       sending a search program across the Internet to a web database server to perform the queries at that server.

CONT…

 Receiver-initiated:
    the initiative for code migration is taken by the target machine.
    Example: Java applets.
    Code migration occurs between a client and a server, where the client takes the initiative for migration.
    The server is generally not interested in the client's resources; instead, code migration to the client is done only to improve client-side performance.

 Summary of models for code migration

Fig. 3.12. Alternatives for code migration
3.4.2. MIGRATION AND LOCAL RESOURCES
 So far, only the migration of the code and execution segments has been taken into account.
 What often makes code migration so difficult is that the resource segment cannot always be simply transferred along with the other segments without being changed.
 For example:
    suppose a process holds a reference to a specific TCP port through which it was communicating with other (remote) processes;
    such a reference is held in its resource segment;
    when the process moves to another location, it will have to give up the port and request a new one at the destination.
CONT…
 To understand the implications that code migration has on the resource segment, we distinguish three types of process-to-resource bindings.
1. Binding by identifier: the strongest binding
    when a process refers to a resource by its identifier;
    the process requires precisely the referenced resource;
    e.g., when a process uses a URL to refer to a specific web site, or an IP address to refer to an FTP server.
2. Binding by value: a weaker binding
    when only the value of a resource is needed;
    in this case, another resource can provide the same value without affecting the execution of the process;
    e.g., when a program relies on standard libraries of programming languages such as C or Java, which are normally locally available, but whose location in the file system may vary from site to site.
3. Binding by type: the weakest binding
    when a process needs only a resource of a specific type, i.e., it refers to a resource by its type;
    e.g., local devices such as a printer or a monitor, and so on.
CONT…
 When migrating code, we need to change the references to resources; how a reference should be changed depends on whether the resource can be moved along with the code, i.e., on the resource-to-machine binding.
Types of Resource-to-Machine Bindings
1. Unattached resources: can easily be moved between different machines with the migrating program (such as data files associated with the program).
2. Fastened resources: moving or copying may be possible, but is more expensive (such as local databases and complete web sites).
3. Fixed resources: intimately bound to a specific machine or environment and cannot be moved (such as local devices).
CONT…
 When migrating code, we therefore have nine combinations to consider (three process-to-resource bindings times three resource-to-machine bindings).

Fig. 3.13. Actions to be taken with respect to the references to local resources when migrating code to another machine
CONT…
 When a process is bound to a resource by identifier:
    when the resource is unattached:
       it is best to move it along with the migrating code;
       but when the resource is shared by other processes, an alternative is to establish a global reference, that is, a reference that can cross machine boundaries;
       an example of such a reference is a URL.
    when the resource is fastened or fixed:
       the best solution is to create a global reference.

CONT…
 When a process is bound to a resource by value:
    when the resource is fixed:
       this occurs, for example, when a process assumes that memory can be shared between processes;
       establishing a global reference would mean implementing a distributed form of shared memory,
       which is not an efficient solution.
    when the resource is fastened:
       such resources are typically runtime libraries;
       copies of such resources are normally available on the target machine;
       establishing a global reference is a better alternative when huge amounts of data would otherwise have to be copied.
    when the resource is unattached:
       the best solution is to copy (or move) the resource to the new destination;
       establishing a global reference is the other option.
CONT…
 When a process is bound to a resource by type:
    irrespective of the resource-to-machine binding, the solution is to rebind the process to a locally available resource of the same type;
    only when such a resource is not available do we need to copy or move the original one to the new destination, or establish a global reference.

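
The nine combinations can be written down as a lookup table. The table below is one reading of the rules stated in the text, not a reproduction of Fig. 3.13: the first entry is the action the text prefers, followed by the alternatives it mentions, and moving or copying is left out for fixed resources because they cannot be moved. The names `ACTIONS` and `action_for` are invented.

```python
# (process-to-resource binding, resource-to-machine binding)
#     -> (preferred action, alternatives)
ACTIONS = {
    ("identifier", "unattached"): ("move", ["global reference"]),
    ("identifier", "fastened"): ("global reference", []),
    ("identifier", "fixed"): ("global reference", []),
    ("value", "unattached"): ("copy", ["move", "global reference"]),
    ("value", "fastened"): ("copy", ["global reference"]),
    ("value", "fixed"): ("global reference", []),
    ("type", "unattached"): ("rebind", ["move", "copy", "global reference"]),
    ("type", "fastened"): ("rebind", ["copy", "global reference"]),
    ("type", "fixed"): ("rebind", ["global reference"]),
}


def action_for(binding, resource):
    """Return the preferred action for a migrating process's reference."""
    preferred, _alternatives = ACTIONS[(binding, resource)]
    return preferred
```

For instance, `action_for("identifier", "fixed")` yields `"global reference"`, while any binding by type yields `"rebind"`.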
3.4.3. MIGRATION IN HETEROGENEOUS SYSTEMS
 So far, we have assumed that the migrated code can be easily executed at the target machine, which holds when dealing with homogeneous systems.
 However, distributed systems are constructed on a heterogeneous collection of platforms, each with its own OS and machine architecture.
 Migration in such systems requires that:
    each platform is supported, i.e., the code segment can be executed on each platform;
    the execution segment can be properly represented at each platform.
 Heterogeneity problems are similar to those of portability.

CONT…
 Heterogeneity can be addressed by providing process virtual machines, which
    for scripting languages, directly interpret the migrated code at the host site;
    for Java, interpret the intermediate code generated by a compiler.
 A virtual machine encapsulates an entire computing environment.
    If properly implemented, the virtual machine provides strong mobility, since local resources may be part of the migrated environment.
 One reason for wanting to migrate entire environments is that it allows operation to continue while a machine needs to be shut down.
CONT…
 For example:
    in a server cluster, the systems administrator may decide to shut down or replace a machine, but will not have to stop all its running processes;
    instead, the administrator can temporarily freeze an environment, move it to another machine (where it sits next to other, existing environments), and simply unfreeze it again;
    this is an extremely powerful way to manage long-running compute environments and their processes.

END
