Chapter 3 Processes
Process
An operating system creates a number of virtual processors, each one for running a different
program.
The operating system maintains a process table to keep track of these virtual processors; each entry is a process control block (PCB).
The process table contains entries to store CPU register values, memory maps, open files,
accounting information, privileges, etc.
A process is a running instance of a program, including all variables and other state attributes on
one of the operating system's virtual processors.
The operating system ensures that independent processes cannot affect each other's behavior.
Sharing the same CPU and other hardware resources is made transparent with hardware support
to enforce this separation.
Each time a process is created, the operating system must create a complete independent address
space.
Example: zeroing a data segment, copying the associated program into a text segment,
and setting up a stack for temporary data.
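As a minimal sketch of this address-space independence (assuming a Unix-like system; the variable name and values are invented), a child created with fork gets its own copy of the parent's data, so writes in the child are invisible to the parent:

```python
import os

value = 0  # lives in this process's data segment

pid = os.fork()            # create a child with an independent address space
if pid == 0:
    value = 99             # modifies only the child's copy (copy-on-write)
    os._exit(0)            # child exits without touching the parent
os.waitpid(pid, 0)         # parent waits for the child to finish
print(value)               # still 0: the child's write did not leak back
```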
Switching the CPU between two processes requires:
Saving the CPU context (which consists of register values, program counter, stack
pointer, etc.),
Modifying the registers of the memory management unit (MMU), and
Invalidating address translation caches such as the translation lookaside buffer (TLB),
a cache in the CPU used to speed up virtual-address translation.
If the operating system supports more processes than it can simultaneously hold in main
memory, it may have to swap processes between main memory and disk before the actual
switch can take place.
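The context-save and restore steps can be modeled with a toy sketch (the register set and process names are invented for illustration; a real switch also reloads MMU registers and flushes the TLB):

```python
# Saved CPU contexts for two hypothetical processes, A and B.
contexts = {
    "A": {"pc": 100, "sp": 0x7FFE, "regs": [1, 2]},
    "B": {"pc": 200, "sp": 0x7FFC, "regs": [3, 4]},
}
cpu = dict(contexts["A"])        # process A is currently running

def switch(cpu, contexts, old, new):
    contexts[old] = dict(cpu)    # save the CPU context of the old process
    cpu.clear()
    cpu.update(contexts[new])    # restore the saved context of the new one

switch(cpu, contexts, "A", "B")
print(cpu["pc"])                 # 200: the CPU now resumes B where it left off
```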
Threads
A thread is a basic unit of CPU utilization, consisting of a thread ID, a program counter,
a set of registers, and a stack.
Traditional (heavyweight) processes have a single thread of control - There is one program counter,
and one sequence of instructions that can be carried out at any given time.
As shown in Figure 1, multi-threaded applications have multiple threads within a single process,
each having their own program counter, stack and set of registers, but sharing common code, data,
and certain structures such as open files.
Thread Types
User Threads
Threads are implemented at the user level by a thread library
Library provides support for thread creation, scheduling and management.
User threads are fast to create and manage.
Kernel Threads
Supported and managed directly by the OS.
Thread creation, scheduling and management take place in kernel space.
Slower to create and manage.
Thread Implementation
Threads are provided in the form of a thread package.
The package contains operations to create and destroy threads as well as operations on
synchronization variables such as mutexes and condition variables.
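Python's threading module is one example of such a package (a sketch; the worker function and counts are illustrative): it offers thread creation and destruction plus mutexes and condition variables.

```python
import threading

counter = 0
lock = threading.Lock()                # a mutex from the thread package

def worker():
    global counter
    for _ in range(1000):
        with lock:                     # serialize access to shared data
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]  # create
for t in threads:
    t.start()
for t in threads:
    t.join()                           # wait for each thread to finish
print(counter)                         # 4000
```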
Two approaches to implement a thread package.
1. Construct a thread library that is executed entirely in user mode.
Advantages:
It is cheap to create and destroy threads.
All thread administration is kept in the user's address space; the price of creating a thread is
primarily determined by the cost of allocating memory to set up the thread stack.
Destroying a thread mainly involves freeing the memory for the stack, which is no longer used.
Switching thread context can be done in just a few instructions.
Disadvantage:
A blocking system call will immediately block the entire process to which the thread belongs,
and thus also all the other threads in that process
2. Have the kernel be aware of threads and schedule them.
Advantages
Eliminates blocking problem.
Disadvantage:
Every thread operation (creation, deletion, synchronization, etc.) will have to be carried out by
the kernel, requiring a system call.
Multithreading Models
Three common ways of establishing a relationship between user level threads and kernel-level threads
1. Many-to-One: Many user-level threads mapped to a single kernel thread.
Easier thread management.
Suffers from the blocking problem: one blocking call suspends the whole process.
No true parallelism, since only one thread can run at a time.
2. One-to-One: Each user-level thread mapped to its own kernel thread.
More concurrency, at the cost of creating a kernel thread for every user thread.
3. Many-to-Many: Many user-level threads mapped to many (typically fewer) kernel threads.
Allows the OS to create a sufficient number of kernel threads.
Users can create as many user threads as necessary.
Avoids the blocking and concurrency problems of the other models.
The two-level model is a variant that additionally allows a user thread to be bound to a kernel thread.
Multithreaded Servers
The main use of multithreading in distributed systems is found at the server side. Practice shows that
multithreading not only simplifies server code considerably, but also makes it much easier to develop
servers that exploit parallelism to attain high performance, even on uniprocessor systems.
To understand the benefits of threads for writing server code, consider the organization of a file server
that occasionally has to block waiting for the disk. The file server normally waits for an incoming
request for a file operation, subsequently carries out the request, and then sends back the reply. One
possible and particularly popular organization is shown in Figure 2. Here one thread, the dispatcher,
reads incoming requests for a file operation. The requests are sent by clients to a well-known end point.
After examining the request, the dispatcher chooses an idle (i.e., blocked) worker thread and hands it the request.
The worker proceeds by performing a blocking read on the local file system, which may cause the
thread to be suspended until the data are fetched from disk. If the thread is suspended, another thread is
selected to be executed. For example, the dispatcher may be selected to acquire more work.
Alternatively, another worker thread can be selected that is now ready to run.
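This dispatcher/worker organization can be sketched with a shared queue (the request strings and thread count are invented; a real file server would read requests from the network and perform disk I/O):

```python
import queue
import threading

requests = queue.Queue()     # dispatcher hands work to workers through this
replies = []
replies_lock = threading.Lock()

def worker():
    while True:
        req = requests.get()             # block until work arrives
        if req is None:                  # sentinel: no more requests
            return
        # A real worker would now do a blocking disk read; while it is
        # suspended, other workers (or the dispatcher) keep running.
        with replies_lock:
            replies.append("reply:" + req)

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

# Dispatcher loop: read each incoming request and hand it to a worker.
for req in ["read a", "read b", "read c"]:
    requests.put(req)
for _ in workers:
    requests.put(None)                   # tell each worker to stop
for w in workers:
    w.join()
print(sorted(replies))
```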
Now consider how the file server might have been written in the absence of threads. One possibility is
to have it operate as a single thread. The main loop of the file server gets a request, examines it, and
carries it out to completion before getting the next one. While waiting for the disk, the server is idle and
does not process any other requests. Consequently, requests from other clients cannot be handled. In
addition, if the file server is running on a dedicated machine, as is commonly the case, the CPU is
simply idle while the file server is waiting for the disk. The net result is that many fewer requests per
time unit can be processed. Thus threads provide a considerable performance gain, while each thread is
still programmed sequentially, in the usual way.
Virtualization
Virtualization is a broad term that refers to the abstraction of computer resources.
Virtualization creates an external interface that hides an underlying implementation
Issue:
Networking has become completely pervasive, which forces administrators to maintain a large,
heterogeneous collection of server machines and the software running on them.
Issue:
Management of content delivery networks that support replication of dynamic content becomes
easier if edge servers support virtualization, allowing a complete site, including its environment,
to be dynamically copied.
Solution:
Virtualization provides a high degree of portability and flexibility, and it is primarily such
portability arguments that make it an important mechanism for distributed systems.
Figure 5: (a) A process virtual machine, with multiple instances of (application, runtime) combinations
and (b) A Native virtual machine monitor, with multiple instances of (applications, operating system)
combinations and (c) A Hosted Virtual Machine Monitor.
VMMs will become increasingly important in the context of reliability and security for (distributed)
systems.
Since they allow for the isolation of a complete application and its environment, a failure caused
by an error or security attack need no longer affect a complete machine.
Portability is greatly improved as VMMs provide a further decoupling between hardware and
software, allowing a complete environment to be moved from one machine to another.
Clients
Networked User Interfaces
Two ways to support client-server interaction:
Figure 6 (a) A networked application with its own protocol and (b) A general solution to allow access
to remote applications.
Example: The X Window System (X)
Used to control bit-mapped terminals, which include a monitor, keyboard, and a pointing device
such as a mouse.
Viewed as that part of an operating system that controls the terminal.
X kernel is heart of the system.
Contains all the terminal-specific device drivers - highly hardware dependent.
X kernel offers a low-level interface for controlling the screen and for capturing events
from the keyboard and mouse.
This interface is made available to applications as a library called Xlib.
X kernel and the X applications need not necessarily reside on the same machine.
Several applications can communicate at the same time with the X kernel.
One specific application that is given special rights - the window manager (WM).
WM can dictate the "look and feel" of the display as it appears to the user.
The window manager can prescribe how each window is decorated with extra
buttons, how windows are to be placed on the display, and so on.
Other applications will have to adhere to these rules.
Servers
General Design Issues
Concurrent server
Concurrent server does not handle the request itself; a separate thread or sub-process handles the
request and returns any results to the client; the server is then free to immediately service the next client
(i.e., there’s no waiting, as service requests are processed in parallel).
A multithreaded server is an example of a concurrent server.
An alternative implementation of a concurrent server is to fork a new process for each new
incoming request.
This approach is followed in many UNIX systems.
The thread or process that handles the request is responsible for returning a response to the
requesting client.
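A minimal concurrent echo server along these lines (a sketch using a thread per connection rather than a forked process per request; the messages and client loop are illustrative):

```python
import socket
import threading

def handle(conn):
    # Worker: service exactly one request, return the reply, clean up.
    data = conn.recv(1024)
    conn.sendall(b"echo:" + data)
    conn.close()

def serve(srv, n):
    # The accept loop never services a request itself: it hands each
    # connection to a new thread and is immediately free to accept again.
    for _ in range(n):
        conn, _addr = srv.accept()
        threading.Thread(target=handle, args=(conn,)).start()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))           # port 0: let the OS pick a free port
srv.listen()
port = srv.getsockname()[1]
threading.Thread(target=serve, args=(srv, 2), daemon=True).start()

replies = []
for msg in (b"a", b"b"):
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(msg)
        replies.append(c.recv(1024))
print(replies)                       # [b'echo:a', b'echo:b']
```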
Design issue
State of server: A stateless server is a server that treats each request as an independent transaction that
is unrelated to any previous request. A stateless server does not keep information on the state of its
clients, and can change its own state without having to inform any client. Example: Web server is
stateless.
It merely responds to incoming HTTP requests, which can be either for uploading a file to the
server or (most often) for fetching a file.
When the request has been processed, the Web server forgets the client completely.
The collection of files that a Web server manages (possibly in cooperation with a file server),
can be changed without clients having to be informed.
A stateful server remembers client data (state) from one request to the next.
Information needs to be explicitly deleted by the server.
Example:
A file server that allows a client to keep a local copy of a file, even for performing update
operations.
The server maintains a table containing (client, file) entries.
This table allows the server to keep track of which client currently has the update permissions
on which file and the most recent version of that file.
Improves performance of read and write operations as perceived by the client.
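The server's bookkeeping can be sketched as a table keyed by (client, file) pairs (the function names and version-counter scheme here are invented for illustration):

```python
# (client, file) -> most recent version the client has been granted
permissions = {}

def grant_update(client, filename):
    permissions[(client, filename)] = 0        # client now holds a local copy

def record_update(client, filename):
    permissions[(client, filename)] += 1       # a newer version exists

def holds_permission(client, filename):
    return (client, filename) in permissions

grant_update("client-1", "notes.txt")
record_update("client-1", "notes.txt")
print(holds_permission("client-1", "notes.txt"),
      permissions[("client-1", "notes.txt")])
```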
Server Clusters
General Organization
A server cluster is a collection of machines connected through a network, where each machine runs one
or more servers. A server cluster is logically organized into three tiers.
First Tier - Consists of a (logical) switch through which client requests are routed.
Switches vary:
Transport-layer switches accept incoming TCP connection requests and pass requests on to one
of servers in the cluster,
A Web server that accepts incoming HTTP requests, but that partly passes requests to
application servers for further processing only to later collect results and return an HTTP
response.
Issue:
When a server cluster offers multiple services, different machines may run different application servers.
The switch will have to be able to distinguish services or otherwise it cannot forward requests to
the proper machines.
Many second-tier machines run only a single application.
This limitation comes partly from dependencies on available software and hardware, and partly from
the fact that different applications are often managed by different administrators.
Consequence - certain machines are temporarily idle, while others are receiving an overload of
requests.
Solution:
Temporarily migrate services to idle machines to balance the load. Using virtual machines allows
relatively easy migration of code between real machines.
The Switch
Design goal for server clusters: Access transparency i.e. client applications running on remote machines
should not know the internal organization of the cluster.
Implementation: A single access point employing a dedicated machine. The switch forms the entry
point for the server cluster, offering a single network address.
Standard way of accessing a server cluster: A TCP connection over which application-level requests
are sent as part of a session. A session ends by tearing down the connection. The switch accepts
incoming TCP connection requests, and hands off such connections to one of the servers.
When the switch receives a TCP connection request, it identifies the best server for handling
that request, and forwards the request packet to that server.
The server will send an acknowledgment back to the requesting client, inserting the switch's
IP address in the source field of the header of the IP packet carrying the TCP segment.
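One simple policy the switch might use to identify the best server is least-loaded selection (the server names and load figures below are made up):

```python
# Outstanding requests currently assigned to each server in the cluster.
loads = {"server-1": 3, "server-2": 1, "server-3": 2}

def pick_server(loads):
    # Hand the new TCP connection to the least-loaded server.
    return min(loads, key=loads.get)

chosen = pick_server(loads)
loads[chosen] += 1            # account for the connection just handed off
print(chosen)                 # server-2
```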
Code Migration
Traditionally, code migration in distributed systems took place in the form of process migration, in
which an entire process is moved from one machine to another. The reason for doing so: overall system
performance can be improved if processes are moved from heavily-loaded to lightly-loaded machines.
Load is typically expressed in terms of CPU queue length or CPU utilization, but moving a running
process to a different machine is a costly and intricate task.
In many modern distributed systems, optimizing computing capacity is less of an issue than
minimizing communication.
Because platforms and networks are heterogeneous, decisions to improve performance through code
migration are often based on qualitative reasoning rather than mathematical models.
Examples:
Client-server system where server manages a huge database
If a client application needs to perform many database operations involving large quantities of data,
it may be better to ship part of the client application to the server and send only the results across
the network.
Otherwise, the network may be swamped with the transfer of data from the server to the client. In
this case, code migration is based on the assumption that it generally makes sense to process data
close to where those data reside.
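The trade-off can be sketched with back-of-the-envelope numbers (all figures invented): shipping the query code wins whenever the code plus the result is smaller than the raw data that would otherwise cross the network.

```python
records = 1_000_000
bytes_per_record = 200
code_bytes = 50_000                           # size of the shipped query code
result_bytes = 4_000                          # aggregate sent back afterwards

ship_data = records * bytes_per_record        # process at the client
ship_code = code_bytes + result_bytes         # process at the server
print(ship_code < ship_data)                  # True: migrate the code
```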
Reason for doing so: Flexibility - It is possible to dynamically configure distributed systems.
Example - Client / Server application
Traditional Implementation - server implements a standardized interface to a file system.
Remote clients communicate with the server using a proprietary protocol.
The client-side implementation of the file system interface needs to be linked with the client
application.
This approach requires that the software be readily available to the client at the time the client
application is being developed.
An alternative is to let the client dynamically download the client-side implementation when it is
needed: the client first fetches the necessary software, and then invokes the server.
Advantages –
Clients need not have all the software preinstalled to talk to servers.
The software can be moved as required and discarded when no longer needed.
With standardized interfaces the client-server protocol and its implementation can be changed at
will.
Changes will not affect existing client applications that rely on the server.
Disadvantage: Security.
Downloading code means blindly trusting that it implements only the advertised interface, while it may
in fact be accessing an unprotected hard disk.
Receiver-initiated migration is simpler than sender-initiated migration. In many cases, code migration
occurs between a client and a server, where the client takes the initiative for migration. Securely
uploading code to a server, as is done in sender-initiated migration, often requires that the client has
previously been registered and authenticated at that server. In other words, the server is required to
know all its clients, the reason being that the client will presumably want access to the server's
resources such as its disk. Protecting such resources is essential. In contrast, downloading code as in the
receiver-initiated case can often be done anonymously. Moreover, the server is generally not interested
in the client's resources. Instead, code migration to the client is done only for improving client-side
performance. To that end, only a limited number of resources need to be protected, such as memory and
network connections.