Unit 3 Distributed Systems
Distributed Objects refer to objects that reside in a networked environment and can be accessed
by clients across different locations. These objects can be thought of as encapsulated entities that
combine state (data) and behavior (methods) but are not confined to a single physical machine.
Instead, they exist in a distributed manner, enabling applications to leverage remote resources and
services seamlessly.
Encapsulation:
Distributed objects encapsulate data and behavior, allowing clients to interact with them without
needing to understand their internal implementation. This encapsulation promotes modular design
and abstraction.
Location Transparency:
Users of distributed objects can interact with them without concern for their physical location. The
system handles the complexities of locating and accessing these objects, providing a consistent
interface.
Interoperability:
Distributed objects can be implemented in different programming languages and run on various
platforms. Standards like CORBA (Common Object Request Broker Architecture) and RMI (Remote
Method Invocation) facilitate interoperability among distributed objects.
Scalability:
Distributed objects can be scaled easily by deploying additional instances across different nodes,
accommodating increased demand and enhancing performance.
Remote Invocation is the mechanism that enables a client to call methods on a distributed object
that resides on a remote server. When a remote method is invoked, the client does not execute the
method directly; instead, it sends a request to the remote object, which processes the request and
returns the results. This mechanism abstracts the underlying network communication and provides
a seamless experience for the client.
Types of Remote Invocation:
Remote Procedure Call (RPC):
RPC allows a program to execute a procedure (or method) on a remote server as if it were a local
call. The client sends a request containing the procedure name and parameters, and the server
executes the procedure and returns the result.
Remote Method Invocation (RMI):
RMI is a Java-specific technology that allows an object in one Java Virtual Machine (JVM) to invoke
methods on an object in another JVM, even if they are on different machines. RMI uses serialization
to transmit the object state and method invocations over the network.
The architecture of distributed objects and remote invocation typically involves several key
components:
Client:
The client initiates a request to invoke a method on a distributed object. It may be a user application,
a web client, or any entity that requires access to remote services.
Stub:
The stub acts as a proxy for the remote object on the client side. It provides a local interface to the
remote methods, handling the serialization of arguments and the transmission of requests to the
server.
Skeleton:
The skeleton resides on the server side and provides the implementation of the remote methods. It
receives requests from the client, deserializes the parameters, invokes the appropriate method on
the actual object, and sends back the results.
Remote Object:
The remote object is the actual implementation of the distributed object that processes the method
calls. It encapsulates the business logic and data required for the application.
Communication Protocol:
A communication protocol (e.g., HTTP, TCP/IP) is used to facilitate the transfer of requests and
responses between the client and server. This protocol ensures reliable and secure communication
over the network.
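As a small, hypothetical illustration in Java RMI terms, a remote interface describes the operations a distributed object exports; the client-side stub presents exactly this interface, while the skeleton and remote object realise it on the server (the interface name and methods below are assumptions made for this example):

import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical remote interface: the contract shared by the client stub and the server-side object.
public interface AccountService extends Remote {
    double getBalance(String accountId) throws RemoteException;
    void deposit(String accountId, double amount) throws RemoteException;
}

The client never sees the implementation class; it only holds a reference of type AccountService, which is what makes location transparency possible.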
Advantages of Distributed Objects and Remote Invocation
Flexibility: Developers can design systems with distributed components, allowing for modularity
and scalability.
Resource Sharing: Distributed objects enable applications to utilize remote resources, such as
databases or services, efficiently.
Improved Performance: By distributing tasks across multiple nodes, applications can achieve better
load balancing and performance.
Challenges of Distributed Objects and Remote Invocation
Latency: Network delays can impact the performance of remote invocations, leading to slower response times.
Fault Tolerance: Ensuring that the system can recover from failures or network issues is crucial for
maintaining reliability.
Security: Protecting the data transmitted over the network and ensuring that only authorized clients
can access remote objects is vital for security.
The architecture of an object model in distributed systems typically involves several key components:
1. Client: The entity that requests services from the distributed objects.
2. Server: The entity that provides services through the distributed objects.
3. Object Request Broker (ORB): A middleware that facilitates communication between clients
and servers. It handles the location and invocation of objects.
4. Distributed Objects: The actual objects that encapsulate data and functionality, which can
be accessed remotely.
This architecture allows for a clear separation of concerns, where clients interact with the object
model without needing to know the details of the server-side implementation.
Example of CORBA Object Model
The Common Object Request Broker Architecture (CORBA) is a well-known example of an object
model in distributed systems. CORBA defines a standard for software components written in multiple
programming languages to work together across different platforms. In the CORBA object model,
clients use an interface defined in the Interface Definition Language (IDL) to interact with remote
objects. The ORB facilitates the communication, allowing clients to make requests to server-side
objects seamlessly.
For instance, consider a distributed banking application where a client wants to retrieve account
details. The client would invoke a method defined in the IDL on the remote account object. The ORB
translates this request into a network message, routes it to the appropriate server, and returns the
result to the client. This architecture exemplifies how CORBA enables interoperability and modular
design in distributed systems, allowing for flexible and scalable applications.
DCOM, or Distributed Component Object Model, is an extension of the Component Object Model
(COM) developed by Microsoft. It facilitates communication between software components located
on different networked computers. Here’s a breakdown of its key features, architecture, working
principles, and the issues it addresses.
Overview of DCOM
• Purpose: DCOM was designed to support the development of components that can be
dynamically activated and can interact seamlessly across a network.
• Protocol: DCOM uses the Object Remote Procedure Call (ORPC) protocol, which allows
remote method calls and facilitates communication between distributed objects.
• User Base: It is widely used in networked environments, with millions of Windows users
relying on it daily.
Key Issues Addressed by DCOM
• Size and Complexity of System: DCOM addresses the challenges associated with the complexity and size of distributed systems by promoting a modular architecture.
• Dynamic Data Exchange (DDE): DDE is a protocol used in Microsoft Windows for applications to exchange data. Although it offers a way to share information, its complexity led to the creation of the Dynamic Data Exchange Management Library (DDEML), which simplifies interaction with DDE.
DCOM Architecture
Software Bus: DCOM architecture promotes software interoperability by utilizing a "software bus"
that allows reusable software components to interact seamlessly.
Working of DCOM
To operate correctly, DCOM requires proper configuration of COM objects on both the client and
server systems. Here’s how it works:
1. Registry Configuration: DCOM configurations are stored in the Windows Registry, consisting of
three identifiers:
• CLSID (Class Identifier): A globally unique identifier (GUID) for each class. It enables
Windows to locate and execute the appropriate component.
• PROGID (Programmatic Identifier): An optional, more user-friendly identifier than CLSID.
PROGIDs are easier to read but can lead to ambiguities if multiple classes share the same
name.
• APPID (Application Identifier): A unique identifier for applications that helps in managing
permissions and security. It identifies all classes related to a specific executable.
2. Component Activation: When a DCOM object is requested, the system uses the appropriate
CLSID or PROGID to locate the component in the registry and instantiate it.
3. Security and Permissions: DCOM relies on APPID for security. If the APPID is incorrect or
misconfigured, the application may encounter permissions errors when attempting to create or
access remote objects.
DCOM is a powerful framework for developing distributed applications and enabling communication
between components across networks. Its architecture promotes interoperability and modularity,
addressing the challenges of modern software systems. By managing component identifiers through
the Windows Registry, DCOM facilitates the dynamic activation of components while ensuring
security and reliability in networked environments. Despite its complexities, DCOM remains a critical
technology for distributed systems, particularly in Windows-based applications.
Design Issues in Remote Method Invocation (RMI)
1. Object Serialization
RMI relies on object serialization to send objects between JVMs. The design must address how to
serialize and deserialize objects efficiently. This includes ensuring that all objects used in remote
method calls are serializable and handling potential performance overhead associated with
serialization.
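For instance (the class and its fields are hypothetical), a value object passed as a parameter or result of a remote call simply needs to implement java.io.Serializable so that RMI can marshal it between JVMs:

import java.io.Serializable;

// Hypothetical value object carried across a remote call.
public class AccountDetails implements Serializable {
    private static final long serialVersionUID = 1L; // guards against class-version mismatches

    private final String accountId;
    private final double balance;

    public AccountDetails(String accountId, double balance) {
        this.accountId = accountId;
        this.balance = balance;
    }

    public String getAccountId() { return accountId; }
    public double getBalance()   { return balance; }
}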
2. Network Communication
The choice of transport protocol (e.g., TCP or UDP) impacts performance and reliability. TCP is
commonly used for its reliable stream-oriented communication, but it introduces latency. The design
should consider the trade-offs between reliability and performance.
3. Exception Handling
Remote method calls can fail due to various reasons, such as network issues or server crashes. The
design must incorporate robust exception handling to manage remote exceptions and ensure that
clients can recover from failures gracefully.
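A minimal sketch of client-side recovery, reusing the hypothetical AccountService interface from the earlier example: the remote call is retried a bounded number of times before the failure is propagated to the caller:

import java.rmi.RemoteException;

// Illustrative client-side helper that retries a failed remote call a bounded number of times.
public class RetryingClient {
    public static double getBalanceWithRetry(AccountService service, String accountId)
            throws RemoteException {
        RemoteException lastFailure = null;
        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                return service.getBalance(accountId);   // the remote call that may fail
            } catch (RemoteException e) {
                lastFailure = e;                         // remember the failure and try again
            }
        }
        throw lastFailure;                               // give up after three attempts
    }
}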
4. Security
RMI applications can expose sensitive data and functionalities. The design should implement security
measures, including authentication, authorization, and encryption, to protect data during
transmission and prevent unauthorized access.
5. Scalability
RMI systems must handle varying loads and scales effectively. The design should consider how to
manage multiple clients, load balancing, and resource management to ensure the system remains
responsive under different conditions.
6. Naming and Object Location
RMI requires a mechanism to locate remote objects. The design should include a naming service
(e.g., RMI Registry) that allows clients to look up remote objects. This requires careful consideration
of object registration, discovery, and lifetime management.
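A client typically resolves a remote reference through the RMI registry. A minimal lookup sketch, reusing the hypothetical AccountService interface (host, port, and binding name are assumptions):

import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

// Resolve a remote object by name from the RMI registry and invoke it.
public class LookupExample {
    public static void main(String[] args) throws Exception {
        Registry registry = LocateRegistry.getRegistry("server.example.com", 1099); // illustrative host/port
        AccountService service = (AccountService) registry.lookup("AccountService"); // illustrative name
        System.out.println("Balance: " + service.getBalance("ACC-1"));
    }
}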
7. Performance Optimization
The performance of RMI applications can be affected by factors like latency and bandwidth. The
design should consider optimizations such as connection pooling, caching of remote objects, and
minimizing the size of serialized objects to improve responsiveness.
8. Versioning
Over time, remote objects may change. The design must address how to manage different versions
of objects, ensuring backward compatibility and handling changes in the interface without breaking
existing clients.
9. Concurrency Control
When multiple clients access remote objects concurrently, issues such as data consistency and race
conditions may arise. The design should include concurrency control mechanisms to manage access
to shared resources effectively.
10. Latency and Timeouts
Network latency can significantly affect the user experience. The design should account for timeout
settings to avoid long waits for unresponsive remote calls and include mechanisms for handling
latency issues.
Addressing these design issues is crucial for creating robust, efficient, and secure RMI systems. By
carefully considering each aspect, developers can build scalable and reliable applications that
leverage the power of remote method invocation while minimizing potential pitfalls.
Implementation of Remote Method Invocation (RMI)
RMI (Remote Method Invocation) is an API that provides a mechanism for creating distributed applications in Java. RMI allows an object to invoke methods on an object running in another JVM.
RMI provides remote communication between applications using two objects: the stub and the skeleton.
Understanding stub and skeleton
RMI uses stub and skeleton objects for communication with the remote object.
A remote object is an object whose methods can be invoked from another JVM. Let's understand the stub and skeleton objects:
Stub
The stub is an object that acts as a gateway on the client side. All outgoing requests are routed through it. It resides on the client side and represents the remote object. When the caller invokes a method on the stub object, it performs the following tasks:
1. It initiates a connection with the remote Virtual Machine (JVM),
2. It writes and transmits (marshals) the parameters to the remote Virtual Machine (JVM),
3. It waits for the result,
4. It reads (unmarshals) the return value or exception, and
5. It returns the value to the caller.
Skeleton
The skeleton is an object that acts as a gateway for the server-side object. All incoming requests are routed through it. When the skeleton receives an incoming request, it performs the following tasks:
1. It reads (unmarshals) the parameters for the remote method,
2. It invokes the method on the actual remote object, and
3. It writes and transmits (marshals) the result back to the caller.
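The source listings for the Calculator example used in the steps below are not reproduced in these notes; the following is a minimal sketch of what they might look like, with names chosen to match the commands used later (Calculator, CalculatorImpl, CalculatorServer, CalculatorClient):

// Calculator.java — remote interface shared by client and server.
import java.rmi.Remote;
import java.rmi.RemoteException;

public interface Calculator extends Remote {
    int add(int a, int b) throws RemoteException;
    int subtract(int a, int b) throws RemoteException;
}

// CalculatorImpl.java — the remote object; exporting via UnicastRemoteObject makes it reachable from other JVMs.
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

public class CalculatorImpl extends UnicastRemoteObject implements Calculator {
    public CalculatorImpl() throws RemoteException { super(); }
    public int add(int a, int b)      { return a + b; }
    public int subtract(int a, int b) { return a - b; }
}

// CalculatorServer.java — registers the remote object under a well-known name.
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

public class CalculatorServer {
    public static void main(String[] args) throws Exception {
        Calculator stub = new CalculatorImpl();
        Registry registry = LocateRegistry.getRegistry();   // registry is started separately (see below)
        registry.rebind("CalculatorService", stub);
        System.out.println("Calculator server is ready.");
    }
}

// CalculatorClient.java — looks up the remote object and invokes its methods.
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

public class CalculatorClient {
    public static void main(String[] args) throws Exception {
        Registry registry = LocateRegistry.getRegistry("localhost");
        Calculator calc = (Calculator) registry.lookup("CalculatorService");
        System.out.println("5 + 3 = " + calc.add(5, 3));
        System.out.println("5 - 3 = " + calc.subtract(5, 3));
    }
}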
Use the javac command to compile the classes. Make sure to include the correct classpath if
necessary:
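Assuming the four source files sketched above, the compilation command would be, for example:
javac Calculator.java CalculatorImpl.java CalculatorServer.java CalculatorClient.java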
Before running the server, start the RMI registry. Open a command prompt and run the following
command:
rmiregistry
This command will start the RMI registry on the default port (1099).
In a separate command prompt, run the server:
java CalculatorServer
You should see the message indicating that the server is ready.
In another command prompt, run the client:
java CalculatorClient
The client will connect to the server and display the results of the remote method invocations.
Distributed garbage collection is a process used in distributed systems to manage memory across
multiple interconnected computers. In such systems, each computer, or node, maintains its own
memory, and as applications run, they generate temporary data that may become obsolete. Unlike
in a single-machine environment where garbage collection can be straightforward, distributed
systems face the challenge of coordinating memory management across various nodes.
• Distributed garbage collection addresses this by implementing strategies that allow nodes to
identify and reclaim unused memory from other nodes as well.
• This involves tracking object references, detecting when data is no longer needed, and
coordinating with other nodes to ensure that memory is cleaned up efficiently.
• By automating these tasks, distributed garbage collection helps prevent memory leaks,
reduces resource wastage, and maintains system performance and reliability as the
distributed system grows.
Key Concepts in Distributed Garbage Collection
1. Object Reachability: This concept involves determining whether an object in memory is still
in use or can be discarded. In distributed systems, reachability must be checked across
multiple nodes to identify which objects are still referenced and which can be collected.
2. Reference Counting: This technique tracks the number of references to an object. When an
object's reference count drops to zero, it can be safely removed. In a distributed context,
reference counts must be managed across different nodes, which requires coordination.
3. Distributed Reference Management: This involves managing references to objects that are
distributed across different nodes. It ensures that when objects are moved or deleted, all
nodes are updated accordingly to prevent invalid references.
4. Garbage Collection Protocols: These are algorithms and protocols used to collect and
reclaim unused memory. Examples include the Mark-and-Sweep algorithm and the
Generational Garbage Collection, adapted for distributed environments to handle
coordination between nodes.
6. Fault Tolerance: Since distributed systems are prone to failures, garbage collection protocols
must be designed to handle node crashes and network partitions, ensuring that memory
management can proceed even when some parts of the system are unavailable.
7. Global vs. Local Collection: In distributed systems, garbage collection can be performed
either globally (across all nodes) or locally (within individual nodes). Global collection is more
complex but can be more efficient, while local collection simplifies the process but may be
less optimal.
8. Scalability: The garbage collection system must scale with the distributed system. As the
number of nodes and objects grows, the garbage collection mechanism should efficiently
handle increased complexity and data volume.
Distributed garbage collection (DGC) algorithms manage memory across multiple nodes in a
distributed system. Each algorithm addresses different challenges associated with this task, such as
coordination between nodes, handling of object references, and communication overhead. Here are
some key algorithms used in distributed garbage collection:
1. Reference Counting:
• Keeps a count of references to each object. When the count reaches zero, the object
is eligible for garbage collection.
• Each node maintains a local count and updates counts on other nodes when references
are passed.
• This requires coordination to avoid inconsistent reference counts and to handle object
migration.
• Inefficient for handling cyclic references (where objects reference each other) and
involves significant communication overhead for updating reference counts.
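A minimal, single-node sketch of the bookkeeping behind distributed reference counting (the class and its methods are illustrative, not part of any standard API):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Per-node table of reference counts for locally hosted objects.
public class ReferenceCountTable {
    private final Map<String, AtomicInteger> counts = new ConcurrentHashMap<>();

    // Called when a remote node reports that it now holds a reference to objectId.
    public void addReference(String objectId) {
        counts.computeIfAbsent(objectId, id -> new AtomicInteger()).incrementAndGet();
    }

    // Called when a remote node reports that it has dropped its reference.
    public void removeReference(String objectId) {
        AtomicInteger count = counts.get(objectId);
        if (count != null && count.decrementAndGet() == 0) {
            counts.remove(objectId);
            reclaim(objectId);           // no remote references remain: the object is garbage
        }
    }

    private void reclaim(String objectId) {
        System.out.println("Collecting " + objectId);
    }
}

As the text notes, a scheme like this cannot reclaim cyclic garbage on its own, and it depends on the increment/decrement messages between nodes arriving reliably and in a safe order.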
2. Mark-and-Sweep Algorithm:
• Involves two phases: marking reachable objects and sweeping to collect those that are
not marked.
• Nodes perform local marking and then synchronize to complete the global sweep.
• This requires communication to ensure all nodes have an accurate view of which
objects are reachable.
3. Tracing Algorithms:
• Traces the object graph from roots to determine reachability. Common tracing
algorithms include variations of mark-and-sweep.
• Nodes trace their local object references and communicate with others to aggregate
reachability information.
4. Distributed Reference Counting:
• Each node updates counts as references are created or destroyed. Nodes exchange messages to maintain accurate counts and detect when objects can be collected.
5. Snapshot-Based Collection:
• Uses global snapshots of the system’s state to determine which objects are reachable.
• The system periodically takes snapshots of memory across nodes to identify live
objects.
• Nodes periodically freeze their state and communicate to create a consistent global
snapshot.
Remote Procedure Call (RPC) is a powerful technique for constructing distributed, client-server
based applications. It is based on extending the conventional local procedure calling so that
the called procedure need not exist in the same address space as the calling procedure. The
two processes may be on the same system, or they may be on different systems with a network
connecting them.
Types of RPC
Callback RPC
Callback RPC allows processes to act as both clients and servers. It helps with remote processing of
interactive applications. The server gets a handle to the client, and the client waits during the
callback. This type of RPC manages callback deadlocks and enables peer-to-peer communication
between processes.
Broadcast RPC
In Broadcast RPC, a client’s request is sent to all servers on the network that can handle it. This type
of RPC lets you specify that a client’s message should be broadcast. You can set up
special broadcast ports. Broadcast RPC helps reduce the load on the network.
Batch-mode RPC
Batch-mode RPC collects multiple RPC requests on the client side and sends them to the server in
one batch. This reduces the overhead of sending many separate requests. Batch-mode RPC works
best for applications that don’t need to make calls very often. It requires a reliable way to send data.
Working of RPC
1. A client invokes a client stub procedure, passing parameters in the usual way. The client stub
resides within the client’s own address space.
2. The client stub marshals (packs) the parameters into a message. Marshalling includes converting
the representation of the parameters into a standard format, and copying each parameter into the
message.
3. The client stub passes the message to the transport layer, which sends it to the remote server
machine. On the server, the transport layer passes the message to a server stub,
which demarshals (unpacks) the parameters and calls the desired server routine using the regular
procedure call mechanism.
4. When the server procedure completes, it returns to the server stub (e.g., via a normal procedure
call return), which marshals the return values into a message.
5. The server stub then hands the message to the transport layer. The transport layer sends the result
message back to the client transport layer, which hands the message back to the client stub.
6. The client stub demarshals the return parameters and execution returns to the caller.
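Steps 2 and 3 above (marshalling on the client side, demarshalling and dispatch on the server side) can be sketched in Java; the procedure name, parameter types, and message format here are illustrative assumptions:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class MarshalExample {

    // Client-stub side: pack the procedure name and parameters into a flat byte message.
    public static byte[] marshalAddCall(int a, int b) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeUTF("add");   // procedure name in a standard, machine-independent format
        out.writeInt(a);       // first parameter
        out.writeInt(b);       // second parameter
        out.flush();
        return buffer.toByteArray();   // handed to the transport layer for sending
    }

    // Server-stub side: unpack the message and call the desired server routine.
    public static int demarshalAndDispatch(byte[] message) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(message));
        String procedure = in.readUTF();
        int a = in.readInt();
        int b = in.readInt();
        if ("add".equals(procedure)) {
            return a + b;              // the local procedure invoked on behalf of the client
        }
        throw new IOException("Unknown procedure: " + procedure);
    }
}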
Understanding Events
An event is a significant occurrence that can trigger actions within a system. In distributed systems,
events can originate from various sources, such as user interactions, changes in system state, or
messages received from other components. Events are typically categorized as either internal
events, which occur within a component, or external events, which result from interactions with
other components or external systems.
Events can be processed in different ways, depending on the design of the system. In event-driven
architectures, components react to events asynchronously, allowing for a decoupled interaction
model where components do not need to be aware of each other’s state. This approach promotes
scalability and flexibility, enabling the system to handle numerous events concurrently without
blocking operations.
Notification Mechanisms
Notifications are messages sent in response to events, informing other components of changes or
significant occurrences. In distributed systems, notifications can take various forms, such as direct
messages, broadcast messages, or even updates to shared data stores. The mechanism used for
sending notifications often depends on the architecture of the distributed system.
A common pattern for notifications is the publish-subscribe model, where components can
subscribe to specific events or topics. When an event occurs, the publisher sends notifications to all
subscribed components. This decouples the producer and consumer of notifications, allowing for
greater modularity and scalability. Implementations of this model can be found in messaging
systems like Apache Kafka and RabbitMQ, which enable efficient event propagation across
distributed systems.
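The publish-subscribe pattern described above can be reduced to a small in-process sketch (the names are illustrative; real systems such as Apache Kafka or RabbitMQ add persistence, partitioning, and delivery guarantees on top of the same idea):

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Minimal in-memory event broker: components subscribe to topics and are notified on publish.
public class EventBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    public void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    public void publish(String topic, String event) {
        // The publisher knows nothing about the subscribers: producer and consumers are decoupled.
        subscribers.getOrDefault(topic, List.of()).forEach(handler -> handler.accept(event));
    }

    public static void main(String[] args) {
        EventBroker broker = new EventBroker();
        broker.subscribe("account-updated", e -> System.out.println("Audit log: " + e));
        broker.subscribe("account-updated", e -> System.out.println("Cache refresh: " + e));
        broker.publish("account-updated", "ACC-1 balance changed");
    }
}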
In distributed systems, effective event handling is crucial for ensuring timely responses and
maintaining system consistency. Events must be processed in a manner that respects their order and
guarantees that related events are handled together. Techniques such as event sourcing and
eventual consistency are often employed to manage state changes resulting from events, allowing
the system to reconstruct its state by processing a series of events.
Event handling also involves error management and recovery strategies. If an event cannot be
processed, the system must determine how to handle the failure gracefully, such as by retrying the
operation, logging the error, or alerting administrators. Robust error handling ensures the resilience
of distributed systems, allowing them to recover from failures without losing critical data or
functionality.
While events and notifications are powerful tools for enabling interaction in distributed systems,
they also introduce challenges.
Latency is a significant concern, as events must traverse network boundaries, potentially causing
delays in processing.
Message loss is another risk, as notifications may not always reach their intended recipients due to
network issues or system failures. To address these challenges, distributed systems often implement
reliability mechanisms, such as acknowledgments and retries, to ensure that notifications are
successfully delivered.
Additionally, maintaining order among events is critical, especially in systems where the sequence
of operations matters. Various algorithms, such as Lamport timestamps, are employed to establish
a logical ordering of events across distributed components, helping to resolve potential conflicts.
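As a minimal sketch of the Lamport-timestamp idea (the class below is illustrative, not a library API): each process keeps a counter, increments it on every local event or message send, and on receiving a message advances it past the sender's timestamp:

import java.util.concurrent.atomic.AtomicLong;

// Minimal Lamport logical clock for establishing a logical ordering of events.
public class LamportClock {
    private final AtomicLong time = new AtomicLong(0);

    // Local event or message send: advance the clock and use the new value as the timestamp.
    public long tick() {
        return time.incrementAndGet();
    }

    // Message receive: move past both the local time and the sender's timestamp.
    public long onReceive(long senderTimestamp) {
        return time.updateAndGet(local -> Math.max(local, senderTimestamp) + 1);
    }

    public long current() {
        return time.get();
    }
}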
Resource Management: Operating systems are responsible for managing the hardware resources
of a computer system, including CPU, memory, storage, and I/O devices. In a distributed
environment, resource management becomes more complex as resources are spread across multiple
machines. The OS must efficiently allocate these resources, handle requests from various
applications, and ensure that resources are used optimally.
Process Management: The OS handles process creation, scheduling, and termination. In distributed
systems, this includes managing processes that may be executing on different nodes. The operating
system must ensure that processes can communicate with one another and synchronize their
actions, even when they are distributed across different machines.
Security and Access Control: Security is a critical concern in distributed systems, where multiple
users and processes interact over a network. The OS implements security mechanisms to protect
system resources and data from unauthorized access. This includes authentication, authorization,
and encryption, ensuring that only legitimate users and processes can access sensitive information.
Fault Tolerance and Reliability: Operating systems are designed to handle faults and ensure system
reliability. In distributed systems, this involves implementing strategies for detecting and recovering
from failures, such as process crashes or network disruptions. The OS provides redundancy and
replication mechanisms to enhance system robustness, ensuring that the system can continue
functioning even in the face of failures.
Concurrency Control: In distributed systems, multiple processes may access shared resources
simultaneously, leading to potential conflicts. The OS must implement concurrency control
mechanisms to ensure that these processes can safely interact without causing data inconsistencies
or corruption. Techniques such as locking, semaphores, and monitors are commonly used to manage
concurrency in distributed environments.
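A minimal sketch of lock-based concurrency control on a server-side shared resource (the account class and its fields are hypothetical):

import java.util.concurrent.locks.ReentrantLock;

// Shared server-side resource protected by a lock so concurrent client requests stay consistent.
public class SharedAccount {
    private final ReentrantLock lock = new ReentrantLock();
    private double balance;

    public void deposit(double amount) {
        lock.lock();
        try {
            balance += amount;   // the read-modify-write is now atomic with respect to other threads
        } finally {
            lock.unlock();
        }
    }

    public double getBalance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }
}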
The Operating System Layer in Distributed Systems
The operating system (OS) layer plays a crucial role in the functioning of distributed systems, serving
as a foundational platform for middleware and applications. This layer provides abstractions of local
hardware resources, including processing power, storage, and communication capabilities. By
offering a standardized interface to these resources, the OS enables middleware to efficiently
manage distributed operations and facilitate remote invocations among distributed objects and
processes across different nodes.
Middleware acts as a bridge between applications and the underlying OS-hardware combinations
present at each node in a distributed system. Each node consists of an OS that includes a kernel and
user-level services such as communication libraries. These OS components provide the necessary
abstractions to utilize local resources effectively, allowing middleware to implement mechanisms for
remote invocations. The middleware layer benefits from the OS by leveraging its capabilities for
resource sharing, ensuring that users experience good performance and responsiveness when
interacting with distributed applications.
The OS layer's architectural components include the kernel and server processes that manage
resources and offer interfaces to clients. These components are essential for delivering services and
ensuring that user requests are processed efficiently. The following requirements are fundamental
for effective OS and middleware interaction:
1. Encapsulation: The OS should provide a clear and useful service interface to its resources,
hiding the complexities of resource management, such as memory and device handling, from
the clients. This allows clients to interact with resources without needing to understand their
underlying implementations.
2. Protection: Security is paramount; the OS must protect resources from unauthorized access.
For instance, file permissions prevent users from reading files without appropriate rights,
while device registers must be shielded from application processes to maintain system
integrity.
3. Concurrent Processing: In distributed systems, multiple clients may access shared resources
simultaneously. The OS must ensure concurrency transparency, enabling clients to share and
access resources without interfering with one another's operations. This is achieved through
effective resource management and scheduling.
Invocation Mechanisms
Clients access encapsulated resources through various invocation mechanisms, such as remote
method invocations (RMIs) or system calls to the kernel. These invocations involve several critical
tasks that the OS and middleware must handle, including:
• Scheduling: When a client invokes an operation, the OS must schedule its execution,
determining when and where the processing occurs within the kernel or server.
Core OS Functionality
The OS layer encompasses core functionalities essential for the management of distributed systems. Key components include:
• Process Manager: Responsible for the creation and management of processes, which are
units of resource management, encompassing an address space and one or more threads.
• Thread Manager: Manages thread creation, synchronization, and scheduling. Threads are
lightweight units of execution within a process, enabling concurrent processing.
• Memory Manager: Oversees physical and virtual memory management, ensuring efficient
allocation and sharing of memory resources among processes.
• Supervisor: Handles interrupts, system calls, and exceptions, managing the hardware
abstraction layer and coordinating interactions with hardware components.
Shared-Memory Multiprocessors
Shared-memory multiprocessor machines contain several processors that share a single main memory. The kernel must be designed so that threads can be scheduled onto any of the processors, allowing threads within and across processes to execute in parallel.
Portability of OS Software
Operating system software is typically designed for portability across different computer
architectures. Much of the OS code is written in high-level programming languages like C, C++, or
Modula-3, with layered architectures to minimize machine-dependent components. This approach
enables kernels to operate on various hardware platforms, enhancing the flexibility and scalability
of distributed systems.
The operating system layer is foundational to the success of distributed systems, providing essential
services and abstractions that enable middleware to function effectively. By managing resources,
facilitating communication, and ensuring security and concurrency, the OS layer supports the
development of distributed applications that can leverage the power of networked environments,
ultimately leading to improved user satisfaction and system performance.
Protection
To illustrate illegitimate access, consider a file with two operations: read and write. Protecting the file
involves two primary sub-problems:
Authorization of Operations: The first challenge is ensuring that only authorized clients can
perform specific operations. For instance, the file owner, Smith, should have both read and write
permissions, while another user, Jones, should only be allowed to read the file. An illegitimate access
in this scenario occurs if Jones is able to perform a write operation, violating the established access
rights.
Bypassing Exported Operations: The second challenge involves preventing clients from executing
operations that are not explicitly provided. For example, if Smith were to access the file pointer
variable directly and create a nonsensical operation like setFilePointerRandomly, it could disrupt the
file's normal operation. This highlights the need for safeguards against unauthorized invocations
that could lead to inconsistent states within the system.
Approaches to Protection
Type-Safe Languages: One approach is to implement the system's components in a type-safe programming language, such as Java or Modula-3, in which a module cannot access arbitrary memory or invoke operations that a resource does not export. This protection is enforced at the language level, without special hardware support.
Hardware Support: Hardware mechanisms can also play a crucial role in protecting modules from one another at the invocation level, irrespective of the programming language. This requires a robust kernel that manages these protections.
The kernel is a critical component that remains loaded from system initialization, executing with
complete access privileges to the system's physical resources. Its responsibilities include:
Privileged Mode Operation: The kernel operates in a privileged mode, allowing it to execute
sensitive instructions, such as managing protection tables used by the memory management unit.
This capability is essential for controlling access to the system's resources and enforcing security
policies.
Address Space Management: The kernel creates and manages distinct address spaces for each
process, ensuring that no process can access memory outside its designated space. This prevents
aberrant processes from interfering with the memory of other processes or the kernel itself. Each
address space can have specific memory access rights, such as read-only or read-write, further
enhancing protection.
User vs. Kernel Mode Execution: When processes execute application code, they operate in a user-
level address space with restricted access rights. Conversely, when executing kernel code, they
transition to the kernel's address space. This transition occurs through exceptions like interrupts or
system call traps, which safely transfer control to the kernel while preventing any process from
gaining unauthorized control over the hardware.
Costs of Protection
Implementing protection mechanisms incurs certain costs. Switching between user and kernel
address spaces can be processor-intensive, consuming multiple cycles. Additionally, executing a
system call trap is inherently more expensive than a standard procedure or method call due to the
overhead associated with transitioning control to the kernel. These performance implications must
be considered when designing distributed systems and their invocation mechanisms.
Protection in distributed systems is vital to maintaining system integrity and preventing illegitimate
access to resources. By employing type-safe programming practices, leveraging hardware support,
and utilizing robust kernel mechanisms, distributed systems can effectively safeguard their
resources. While these protections may introduce performance costs, they are essential for ensuring
reliable and secure operations in complex distributed environments.
Processes and Threads in Operating Systems
The traditional model of a process in operating systems—where a single activity is executed—proved
inadequate for the demands of distributed systems and complex single-computer applications that
require internal concurrency. This limitation arises because traditional processes make sharing
between related activities both cumbersome and costly. To address these challenges, the concept of
a process has evolved to encompass multiple activities within a single execution environment,
leading to the contemporary understanding of processes and threads.
Today, a process is defined as an execution environment that can host one or more threads. A thread
represents an individual activity or the "thread of execution." This execution environment is
considered the unit of resource management and includes the following key components:
• Address Space: This is the range of memory addresses that a process can use, providing a
unique space for its execution.
• Higher-Level Resources: These consist of resources that the process might use, such as open
files, network connections, and graphical windows.
While execution environments are relatively costly to create and manage, they can be efficiently
shared among multiple threads. This shared structure allows threads to access all resources within
their environment, creating a protection domain that ensures threads operate within defined
boundaries.
Benefits of Multi-Threading
The introduction of multiple threads within a single process aims to enhance concurrent execution
among operations. This concurrency enables the overlap of computation with input/output (I/O)
operations, significantly improving overall system performance. In server applications, for instance,
having multiple threads allows for concurrent processing of client requests, reducing bottlenecks.
For example, while one thread processes a client's request, another thread can wait for a disk access
to complete, optimizing resource utilization and response times.
A memorable analogy illustrates the relationship between threads and execution environments.
Imagine an execution environment as a stoppered jar containing air and food, with a fly representing
a thread inside it. The fly can reproduce, creating additional flies (threads) that can also consume
resources within the jar. However, these flies must be disciplined in their resource consumption;
otherwise, they risk colliding and causing unpredictable outcomes.
In this analogy, the jar acts as a protective boundary, preventing threads from accessing resources
in other jars (execution environments). While threads within a jar can communicate with flies in other
jars, they cannot escape, nor can foreign flies enter the jar. This encapsulation provides the necessary
protection, ensuring that data and resources within one execution environment remain inaccessible
to threads in others. However, some operating systems allow controlled sharing of resources, such
as physical memory, among execution environments on the same machine.
In older operating systems, the concept of a single thread per process was common, leading to the
term multi-threaded process to emphasize the presence of multiple threads. The terminology can
sometimes be confusing; for instance, in various programming models and operating systems, the
term process may refer to what is defined here as a thread.
Address Space in Operating Systems
A process's address space is the range of memory it can use, and it is typically organized into the following segments:
1. Text Segment: This segment contains the compiled code of the program. It is usually read-
only to prevent the program from accidentally modifying its instructions during execution.
2. Data Segment: This segment holds global and static variables that are initialized by the
program. It can be further divided into initialized and uninitialized data segments.
3. Heap Segment: The heap is a dynamically allocated memory area used for variables whose
size may change during runtime. Memory management functions like malloc in C or new in
C++ allocate space from the heap.
4. Stack Segment: The stack segment stores temporary data such as function parameters,
return addresses, and local variables. It grows and shrinks as functions are called and return,
respectively. The stack operates in a last-in, first-out (LIFO) manner.
Each process's address space is isolated from others, preventing them from accessing each other’s
memory directly. This isolation enhances security and stability, as one process cannot corrupt or
manipulate the data of another.
Address spaces can be viewed at two levels:
1. Physical Address Space: This refers to the actual hardware memory addresses available on
the machine. It represents the total RAM that can be accessed directly by the system.
2. Virtual Address Space: Modern operating systems often use a virtual memory management
system, where each process operates in its own virtual address space. The virtual addresses
are mapped to physical addresses by the operating system and the memory management
unit (MMU). This allows processes to use more memory than is physically available and
provides isolation and protection.
The operating system is responsible for managing address spaces through various mechanisms:
• Address Space Allocation: When a process is created, the operating system allocates an
address space for it. This allocation includes setting up segments for text, data, heap, and
stack.
• Context Switching: During multitasking, the operating system performs context switching,
where it saves the state of the current process and loads the state of another process. This
involves switching the address space so that the CPU can execute the instructions and access
the data of the new process.
• Memory Protection: The operating system ensures that processes can only access their own
address space, using mechanisms like paging and segmentation to enforce this protection.
• Swapping: If a system runs low on physical memory, the operating system may swap
processes in and out of physical memory, temporarily storing them on disk. This involves
managing the virtual address space to ensure that the necessary data is accessible when
needed.
The address space is a crucial concept in operating systems, providing the framework for process
isolation, security, and resource management. By maintaining separate address spaces for each
process, the operating system can ensure efficient operation while protecting the integrity of data
and programs running on the system. This abstraction allows developers to work with memory
without concern for conflicts or corruption, thus facilitating the development of complex and reliable
applications.
Creation of a New Process in Operating Systems
The creation of a new process is a fundamental operation in operating systems, enabling
multitasking and efficient resource management. This process allows the system to run multiple
programs simultaneously, each operating within its own isolated environment. The mechanism for
process creation varies across different operating systems, but the basic principles remain consistent.
1. System Call Invocation: The process creation typically begins with a system call. In UNIX-like
operating systems, this is often done using the fork() system call. When a process invokes this
call, it requests the operating system to create a new process.
2. Duplication of the Parent Process: Upon receiving the fork() request, the operating system
duplicates the calling process, referred to as the parent process. This duplication includes
creating a new process control block (PCB) for the child process, which holds essential
information such as process state, program counter, CPU registers, memory management
information, and I/O status.
3. Allocation of Resources: The operating system allocates resources for the new process. This
includes:
o Memory Allocation: Assigning a unique address space to the child process. The child
typically inherits a copy of the parent’s address space, including its text, data, heap,
and stack segments. However, techniques like Copy-on-Write can be used to optimize
memory usage, allowing both processes to share the same memory pages until one of
them modifies a page.
o File Descriptors: The child process inherits file descriptors from the parent, allowing it
to access the same files that the parent process can.
4. Setting Process States: The operating system sets the state of the new process to ready or
waiting. The new process is added to the system's process queue, making it eligible for
execution by the CPU.
5. Process Identification: The child process is assigned a unique process identifier (PID). This
identifier is crucial for the operating system to manage and differentiate between multiple
processes.
6. Execution Context: After the process is created, the operating system prepares the execution
context for the child process. This context includes the program counter, which indicates the
next instruction to execute in the child process.
7. Return to Parent: The fork() system call returns twice: once in the parent process (returning
the PID of the child) and once in the child process (returning 0). This allows both processes
to continue executing different segments of code, often determining their subsequent actions
based on the return value.
8. Execution of Child Process: The child process may then execute a different program using
the exec() family of system calls (e.g., execl(), execv(), etc.). These calls replace the child’s
memory space with a new program, allowing for the execution of a completely different task.
9. Process Termination: Once a process has completed its execution, it can terminate using the
exit() system call. The operating system cleans up the resources associated with the
terminated process, updating the process tables and freeing memory.
• UNIX/Linux: The fork() system call is widely used, and process creation can be followed by
exec() calls to run different programs. Processes are created with a hierarchical relationship
(parent-child).
• Windows: The CreateProcess() function is utilized, which combines the functionalities of fork()
and exec() into a single call. This function allows for more control over the new process's
attributes and execution environment.
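Java offers no direct equivalent of fork(), but the create/execute/wait lifecycle described above can be illustrated with the standard ProcessBuilder API; the command launched below is arbitrary:

import java.io.IOException;

// Launch a child process, inspect its PID, and wait for it to terminate.
public class ProcessCreationExample {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder("java", "-version"); // illustrative command
        builder.inheritIO();                       // child shares the parent's console streams
        Process child = builder.start();           // OS creates a new process (new PCB, new address space)
        System.out.println("Child PID: " + child.pid());
        int exitCode = child.waitFor();            // parent blocks until the child terminates
        System.out.println("Child exited with code " + exitCode);
    }
}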
The creation of a new process is a crucial mechanism that underpins the multitasking capabilities of
modern operating systems. By allowing processes to operate independently within their own
address spaces while managing resources effectively, operating systems enable efficient execution
of multiple applications concurrently. Understanding the process creation model is essential for
developers and system administrators to optimize performance and ensure system stability.
Threads in Operating Systems
Threads are a critical aspect of modern operating systems, allowing processes to execute multiple
tasks concurrently. This section explores the advantages of multi-threading for client and server
processes, programming with threads using Java as a case study, and various designs for
implementing threads in servers.
Advantages of Multi-threading:
Increased Throughput:
By allowing several operations to proceed concurrently, a multi-threaded process can overlap computation with I/O and complete more requests per unit of time than a single thread could.
Improved Responsiveness:
Threads can improve the responsiveness of applications. In a client-server model, while one thread
processes a request, another can handle user interactions, ensuring that the application remains
responsive to user input.
Lower Overhead:
Threads share the same address space and resources of their parent process, making them more
lightweight than processes. This leads to lower overhead in terms of memory and context switching.
Simplified Complexity:
Multi-threading can simplify the design of applications that require concurrent operations, such as
web servers or applications that perform background tasks while allowing users to interact with the
main interface.
Single-threaded Scenario:
If a server has a single thread and each request takes an average of 10 milliseconds (2 ms for
processing + 8 ms for I/O), it can handle 100 requests per second. New requests are queued while
the server processes the current request.
Multi-threaded Scenario:
When the server uses two threads, they can be scheduled independently: while one thread is blocked waiting for I/O, the second thread can process another request. If both requests need the single disk drive, however, the disk becomes the bottleneck at 8 ms per request, limiting the throughput to 125 requests per second.
Using Caching:
Introducing disk block caching can further improve performance. With a 75% hit rate, the average
I/O time drops to 2 milliseconds, increasing the theoretical maximum throughput to 500 requests
per second. However, if processing time increases to 2.5 milliseconds due to caching overhead, the
effective throughput becomes 400 requests per second.
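The throughput figures above follow directly from the stated costs per request:
Single thread: 2 ms CPU + 8 ms disk = 10 ms per request, so 1000 / 10 = 100 requests per second.
Two threads, one disk: the 8 ms disk access is the bottleneck, so 1000 / 8 = 125 requests per second.
Two threads with a 75% cache hit rate: average disk time = 0.25 × 8 ms = 2 ms, so the 2 ms of CPU work becomes the limit and 1000 / 2 = 500 requests per second.
With caching overhead raising CPU time to 2.5 ms: 1000 / 2.5 = 400 requests per second.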
Multi-processor Environment:
On a shared-memory multiprocessor, the threads can be scheduled onto different processors and run genuinely in parallel, so request processing is no longer limited to a single CPU and throughput can rise further.
Threading Architectures
1. Worker-Pool Architecture:
The server maintains a fixed pool of worker threads that process requests from a shared queue. An I/O thread receives requests and adds them to this queue. This architecture can struggle with request prioritization and may lead to inflexibility if the request load fluctuates (a thread-pool sketch appears after this list).
2. Thread-per-Request Architecture:
Each incoming request spawns a new worker thread that processes the request and then terminates.
This allows for maximum throughput since there’s no contention for a shared queue. However, the
overhead of creating and destroying threads can be significant.
3. Thread-per-Connection Architecture:
A thread is created for each client connection. This approach allows clients to send multiple requests
over the same connection, reducing the overhead compared to the thread-per-request model.
4. Thread-per-Object Architecture:
Each remote object has its dedicated thread. An I/O thread queues requests for the workers, which
helps in managing resources effectively while maintaining lower thread-management overhead.
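A minimal sketch of the worker-pool architecture (item 1 above) using Java's ExecutorService; the pool size and request handling are placeholders:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Worker-pool server sketch: a fixed pool of threads services requests from a shared queue.
public class WorkerPoolServer {
    private final ExecutorService workers = Executors.newFixedThreadPool(8); // fixed pool size

    // Called by the I/O thread for each incoming request; the request is queued for a worker.
    public void handle(Runnable request) {
        workers.submit(request);
    }

    public static void main(String[] args) {
        WorkerPoolServer server = new WorkerPoolServer();
        for (int i = 1; i <= 20; i++) {
            final int requestId = i;
            server.handle(() -> System.out.println(
                Thread.currentThread().getName() + " processed request " + requestId));
        }
        server.workers.shutdown();
    }
}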
Threads in Clients
Asynchronous Operations:
In a client process, one thread can handle remote method invocations (RMIs) while another thread
continues to generate results or handle user interactions. This approach prevents the client from
blocking unnecessarily.
Web Browsers:
Web browsers benefit from multi-threading by handling multiple page requests simultaneously. This
allows users to navigate and interact with the browser without experiencing delays while waiting for
pages to load.
Threads play a vital role in modern computing, enabling efficient concurrent execution of tasks in
both server and client environments. By leveraging the advantages of multi-threading, applications
can achieve higher throughput, improved responsiveness, and efficient resource utilization, making
them well-suited for the demands of contemporary software systems.