Unit-2 Notes
Imagine you have a friend (the server) who is really good at solving puzzles, but they live far
away. You can call them on the phone (the network) and ask for help. They solve the puzzle
and tell you the answer.
In computing, RPC is like that phone call. A program (the client) asks another computer (the
server) to do some work and send back the result. The magic is that to the client, it feels like
calling a local function, even though the work is done somewhere else.
Sockets:
o Definition: Sockets provide a low-level interface for network
communication between processes running on different computers.
o Characteristics: They allow processes to establish connections,
send data streams (TCP) or datagrams (UDP), and receive
responses. Sockets are fundamental for implementing higher-level
communication protocols.
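The stream-based socket communication described above can be sketched with a minimal TCP echo pair on the loopback interface. This is an illustrative, self-contained example (the class and method names are made up for this sketch, not from any particular framework):

```java
import java.io.*;
import java.net.*;

// Minimal sketch of socket-based IPC: a TCP echo server and client on loopback.
public class TcpEchoDemo {
    // Starts an echo server on an ephemeral port, sends one line, returns the reply.
    public static String roundTrip(String message) {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 = pick any free port
            Thread serverThread = new Thread(() -> {
                try (Socket conn = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(conn.getInputStream()));
                     PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
                    out.println(in.readLine()); // echo the single line back
                } catch (IOException ignored) { }
            });
            serverThread.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(),
                                            server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()))) {
                out.println(message);          // send the request over the stream
                String reply = in.readLine();  // block until the server responds
                serverThread.join();
                return reply;
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

A real server would of course accept many connections in a loop; here one accept is enough to show the connection-oriented, stream-based exchange.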
Message Queuing Systems:
o Description: Message queuing systems facilitate asynchronous
communication by allowing processes to send messages to and
receive messages from queues.
o Characteristics: They decouple producers (senders) and consumers
(receivers) of messages, providing fault tolerance, scalability, and
persistence of messages. Examples include Apache Kafka,
RabbitMQ, and AWS SQS.
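The producer/consumer decoupling described above can be sketched in-process with a `BlockingQueue` standing in for a broker such as RabbitMQ (the message names and the STOP sentinel are illustrative assumptions, not part of any broker's API):

```java
import java.util.concurrent.*;

// Sketch of queue-based decoupling: the producer never waits for the consumer;
// the queue buffers messages between them.
public class QueueDemo {
    public static java.util.List<String> run() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        java.util.List<String> received = new java.util.ArrayList<>();
        // Producer: enqueues messages plus a sentinel marking end of stream.
        Thread producer = new Thread(() -> {
            for (String m : new String[]{"order-1", "order-2", "STOP"}) queue.add(m);
        });
        // Consumer: drains the queue until it sees the sentinel.
        Thread consumer = new Thread(() -> {
            try {
                String m;
                while (!(m = queue.take()).equals("STOP")) received.add(m);
            } catch (InterruptedException ignored) { }
        });
        producer.start(); consumer.start();
        try { producer.join(); consumer.join(); } catch (InterruptedException ignored) { }
        return received;
    }
}
```

A real queuing system adds persistence and network transport, but the decoupling principle is the same.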
Publish-Subscribe Systems:
o Description: Publish-subscribe (pub-sub) systems enable
communication between components without requiring them to
directly know each other.
o Characteristics: Publishers publish messages to topics, and
subscribers receive messages based on their interest in specific
topics. This model supports one-to-many communication and is
scalable for large-scale distributed systems. Examples include
MQTT and Apache Pulsar.
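The pub-sub model above can be sketched as a tiny in-process message bus: publishers and subscribers share only a topic name, never a direct reference to each other. All names here are illustrative:

```java
import java.util.*;

// Minimal in-process publish-subscribe sketch.
public class PubSubDemo {
    private final Map<String, List<List<String>>> topics = new HashMap<>();

    // A subscriber registers interest in a topic and gets a private inbox.
    public List<String> subscribe(String topic) {
        List<String> inbox = new ArrayList<>();
        topics.computeIfAbsent(topic, t -> new ArrayList<>()).add(inbox);
        return inbox;
    }

    // Publishing fans the message out to every inbox subscribed to the topic.
    public void publish(String topic, String message) {
        for (List<String> inbox : topics.getOrDefault(topic, List.of()))
            inbox.add(message);
    }
}
```

One publish reaches every current subscriber of the topic, which is exactly the one-to-many property the text describes.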
These types of IPC mechanisms each have distinct advantages and are chosen
based on factors such as communication requirements, performance
considerations, and the nature of the distributed system architecture. Successful
implementation often involves selecting the most suitable IPC type or
combination thereof to meet specific application needs.
Benefits of Interprocess Communication in Distributed Systems
Below are the benefits of IPC in Distributed Systems:
Facilitates Communication:
o IPC enables processes or components distributed across different
nodes to communicate seamlessly.
o This allows for building complex distributed applications where
different parts of the system can exchange information and
coordinate their activities.
Integration of Heterogeneous Systems:
o IPC mechanisms provide a standardized way for integrating
heterogeneous systems and platforms.
o Processes written in different programming languages or running
on different operating systems can communicate using common
IPC protocols and interfaces.
Scalability:
o Distributed systems often need to scale horizontally by adding
more nodes or instances.
o IPC mechanisms, especially those designed for distributed
environments, can facilitate scalable communication patterns such
as publish-subscribe or message queuing, enabling efficient scaling
without compromising performance.
Fault Tolerance and Resilience:
o IPC techniques in distributed systems often include mechanisms
for handling failures and ensuring resilience.
o For example, message queues can buffer messages during network
interruptions, and RPC frameworks can retry failed calls or
implement failover strategies.
Performance Optimization:
o Effective IPC can optimize performance by minimizing latency and
overhead associated with communication between distributed
components.
o Techniques like shared memory or efficient message passing
protocols help in achieving low-latency communication.
Challenges of Interprocess Communication in Distributed Systems
Below are the challenges of IPC in Distributed Systems:
Network Latency and Bandwidth:
o Distributed systems operate over networks where latency (delay in
transmission) and bandwidth limitations can affect IPC
performance.
o Minimizing latency and optimizing bandwidth usage are critical
challenges, especially for real-time applications.
Reliability and Consistency:
o Ensuring reliable and consistent communication between
distributed components is challenging.
o IPC mechanisms must handle network failures, message loss, and
out-of-order delivery while maintaining data consistency across
distributed nodes.
Security:
o Securing IPC channels against unauthorized access, eavesdropping,
and data tampering is crucial.
o Distributed systems often transmit sensitive data over networks,
requiring robust encryption, authentication, and access control
mechanisms.
Complexity in Error Handling:
o IPC errors, such as network timeouts, connection failures, or
protocol mismatches, must be handled gracefully to maintain
system stability.
o Designing robust error handling and recovery mechanisms adds
complexity to distributed system implementations.
Synchronization and Coordination:
o Coordinating actions and ensuring synchronization between
distributed components can be challenging, especially when using
shared resources or implementing distributed transactions.
o IPC mechanisms must support synchronization primitives and
consistency models to avoid race conditions and ensure data
integrity.
Example of Interprocess Communication in Distributed System
Let’s consider a scenario to understand the Interprocess Communication in
Distributed System:
Consider a distributed system where you have two processes running on
separate computers, a client process (Process A) and a server process (Process
B). The client process needs to request information from the server process and
receive a response.
IPC Example using Remote Procedure Calls (RPC):
1. RPC Setup:
Process A (Client): Initiates an RPC call to Process B (Server).
Process B (Server): Listens for incoming RPC requests and
responds accordingly.
2. Steps Involved:
Client-side (Process A):
o The client process prepares an RPC request, which includes
the name of the remote procedure to be called and any
necessary parameters.
o It sends this request over the network to the server process.
Server-side (Process B):
o The server process (Process B) listens for incoming RPC
requests.
o Upon receiving an RPC request from Process A, it executes
the requested procedure using the provided parameters.
o After processing the request, the server process prepares a
response (if needed) and sends it back to the client process
(Process A) over the network.
3. Communication Flow:
Process A and Process B communicate through the RPC
framework, which manages the underlying network
communication and data serialization.
The RPC mechanism abstracts away the complexities of network
communication and allows the client and server processes to
interact as if they were local.
4. Example Use Case:
Process A (Client) could be a web application requesting user data
from a database hosted on Process B (Server).
Process B (Server) receives the request, queries the database,
processes the data, and sends the results back to Process A (Client)
via RPC.
The client application then displays the retrieved data to the user.
In this example, RPC serves as the IPC mechanism facilitating communication
between the client and server processes in a distributed system. It allows
processes running on different machines to collaborate and exchange data
transparently, making distributed computing more manageable and scalable.
API For Internet Protocols
Differences between TCP and UDP
Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) are both transport-layer protocols of the Internet protocol suite. TCP is a connection-oriented, reliable protocol, whereas UDP is connectionless and unreliable. In this article, we will discuss the differences between TCP and UDP.
What is Transmission Control Protocol (TCP)?
What is Transmission Control Protocol (TCP)?
TCP (Transmission Control Protocol) is one of the main protocols of the Internet protocol suite. It sits at the transport layer, between the Application and Network layers, and provides reliable delivery services. It is a connection-oriented protocol that enables the exchange of messages between different devices over a network. It works together with the Internet Protocol (IP), which defines how data packets are sent between computers.
Handshaking Techniques: TCP uses handshakes such as SYN, ACK, and SYN-ACK; UDP is a connectionless protocol, i.e. no handshake.
Protocols: TCP is used by HTTP, HTTPS, FTP, SMTP, and Telnet; UDP is used by DNS, DHCP, TFTP, SNMP, RIP, and VoIP.
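The connectionless UDP side of this comparison can be sketched with a single datagram sent over loopback. Note there is no handshake and no connection setup; this is only a sketch, and in real use the datagram may be lost without notice:

```java
import java.net.*;
import java.nio.charset.StandardCharsets;

// One UDP datagram over loopback: no connection, no handshake, no delivery guarantee.
public class UdpDemo {
    public static String sendAndReceive(String message) {
        try (DatagramSocket receiver = new DatagramSocket(0);  // bind any free port
             DatagramSocket sender = new DatagramSocket()) {
            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            // Each datagram carries its own destination address and port.
            sender.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getLoopbackAddress(), receiver.getLocalPort()));
            byte[] buf = new byte[1024];
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            receiver.receive(packet);  // blocks until a datagram arrives
            return new String(packet.getData(), 0, packet.getLength(),
                    StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Contrast this with the TCP case: there is no accept/connect pair, and if the datagram were dropped, the sender would never know.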
Approaches:
There are three main approaches to communicating data between computers.
1. Common Object Request Broker Architecture (CORBA):
CORBA is a specification defined by the Object Management Group (OMG)
that is currently the most widely used middleware in most distributed systems.
It allows systems with diverse architectures, operating systems, programming
languages, and computer hardware to work together. It allows software
applications and their objects to communicate with one another. It is a standard
for creating and using distributed objects. It is made up of five major
components. Components and their function are given below:
Object Request Broker (ORB): It provides a communication
infrastructure for the objects to communicate across a network.
Interface Definition Language (IDL): It is a specification language
used to provide an interface in a software component. To exemplify, it
allows communication between software components written in C++ and
Java.
Dynamic Invocation Interface (DII): Using DII, client applications are
permitted to use server objects without even knowing their types at
compile time. Here client obtains an instance of a CORBA object and
then invocation requests can be made dynamically on the corresponding
object.
Interface Repository (IR): As the name implies, interfaces can be added
to the interface repository. Its purpose is to let a client discover an
object that was not known at compile time, along with information about
its interface, so that a request can then be constructed and sent via the ORB.
Object Adapter (OA): It is used to access ORB services like object
reference generation.
Usage:
Marshalling is used in various remote procedure call (RPC) protocols,
where separate processes and threads often use distinct data formats,
necessitating marshalling between them.
To transmit data across COM object boundaries, Microsoft Component
Object Model (COM) interface pointers employ marshalling. The same
happens in the .NET framework when a common-language-runtime-based type
has to interoperate with unmanaged types. DCOM stands for Distributed
Component Object Model.
1. Unicast Communication
Unicast communication refers to the point-to-point transmission of data
between two nodes in a network. In the context of distributed systems:
Definition: Unicast involves a sender (one node) transmitting a message
to a specific receiver (another node) identified by its unique network
address.
Characteristics:
o One-to-One: Each message has a single intended recipient.
o Direct Connection: The sender establishes a direct connection to
the receiver.
o Efficiency: Suitable for scenarios where targeted communication is
required, such as client-server interactions or direct peer-to-peer
exchanges.
Use Cases:
o Request-Response: Common in client-server architectures where
clients send requests to servers and receive responses.
o Peer-to-Peer: Direct communication between two nodes in a
decentralized network.
Advantages:
o Efficient use of network resources as messages are targeted.
o Simplified implementation due to direct connections.
o Low latency since messages are sent directly to the intended
recipient.
Disadvantages:
o Not scalable for broadcasting to multiple recipients without
sending separate messages.
o Increased overhead if many nodes need to be contacted
individually.
2. Multicast Communication
Multicast communication involves sending a single message from one sender to
multiple receivers simultaneously within a network. It is particularly useful in
distributed systems where broadcasting information to a group of nodes is
necessary:
Definition: A sender transmits a message to a multicast group, which
consists of multiple recipients interested in receiving the message.
Characteristics:
o One-to-Many: Messages are sent to multiple receivers in a single
transmission.
o Efficient Bandwidth Usage: Reduces network congestion
compared to multiple unicast transmissions.
o Group Membership: Receivers voluntarily join and leave
multicast groups as needed.
Use Cases:
o Content Distribution: Broadcasting updates or notifications to
subscribers.
o Collaborative Systems: Real-time collaboration tools where
changes made by one user need to be propagated to others.
Advantages:
o Saves bandwidth and network resources by transmitting data only
once.
o Simplifies management by addressing a group rather than
individual nodes.
o Supports scalable communication to a large number of recipients.
Disadvantages:
o Requires mechanisms for managing group membership and
ensuring reliable delivery.
o Vulnerable to network issues such as packet loss or congestion
affecting all recipients.
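The group-membership and one-to-many delivery semantics above can be sketched in-process (avoiding real IP multicast, which needs network support). The node names and inbox structure are illustrative assumptions:

```java
import java.util.*;

// In-process sketch of multicast group semantics: join/leave membership
// plus one logical send delivered to every current member.
public class MulticastGroupDemo {
    private final Map<String, List<String>> members = new LinkedHashMap<>();

    public void join(String node)  { members.put(node, new ArrayList<>()); }
    public void leave(String node) { members.remove(node); }

    // One logical send fans the message out to every current group member.
    public void multicast(String message) {
        for (List<String> inbox : members.values()) inbox.add(message);
    }

    public List<String> delivered(String node) { return members.get(node); }
}
```

Note how members that join late miss earlier messages and members that leave stop receiving; real multicast protocols must additionally cope with loss and reordering on the network.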
3. Broadcast Communication
Broadcast communication involves sending a message from one sender to all
nodes in the network, ensuring that every node receives the message:
Definition: A sender transmits a message to all nodes within the network
without the need for specific recipients.
Characteristics:
o One-to-All: Messages are delivered to every node in the network.
o Broadcast Address: Uses a special network address (e.g., IP
broadcast address) to reach all nodes.
o Global Scope: Suitable for disseminating information to all
connected nodes simultaneously.
Use Cases:
o Network Management: Broadcasting status updates or
configuration changes.
o Emergency Alerts: Disseminating critical information to all
recipients in a timely manner.
Advantages:
o Ensures that every node receives the message without requiring
explicit recipient lists.
o Efficient for scenarios where global dissemination of information is
necessary.
o Simplifies communication in small-scale networks or LAN
environments.
Disadvantages:
o Prone to network congestion and inefficiency in large networks.
o Security concerns, as broadcast messages are accessible to all
nodes, potentially leading to unauthorized access or information
leakage.
o Requires careful network design and management to control the
scope and impact of broadcast messages.
Reliable Multicast Protocols for Group Communication
Reliable multicast protocols are essential in distributed systems to ensure that
messages sent from a sender to multiple recipients are delivered reliably,
consistently, and in a specified order. These protocols are designed to handle the
complexities of group communication, where ensuring every member of a
multicast group receives the message correctly is crucial. Types of Reliable
Multicast Protocols include:
FIFO Ordering:
o Ensures that messages are delivered to all group members in the
order they were sent by the sender.
o Achieved by sequencing messages and delivering them
sequentially to maintain the correct order.
Causal Ordering:
o Preserves the causal relationships between messages based on their
dependencies.
o Ensures that messages are delivered in an order that respects the
causal dependencies observed by the sender.
Total Order and Atomicity:
o Guarantees that all group members receive messages in the same
global order.
o Ensures that operations based on the multicast messages (like
updates to shared data) appear atomic or indivisible to all
recipients.
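The FIFO ordering mechanism above (sequencing messages and delivering them in order) can be sketched as a receiver that buffers out-of-order arrivals until the next expected sequence number shows up. The class name and single-sender assumption are simplifications; real protocols keep per-sender state:

```java
import java.util.*;

// Sketch of FIFO-ordered delivery from one sender.
public class FifoReceiver {
    private int expected = 0;                        // next sequence number to deliver
    private final Map<Integer, String> pending = new HashMap<>();
    private final List<String> delivered = new ArrayList<>();

    // Called whenever a (possibly reordered) message arrives off the network.
    public void receive(int seq, String msg) {
        pending.put(seq, msg);
        // Deliver every message that is now in sequence.
        while (pending.containsKey(expected)) {
            delivered.add(pending.remove(expected));
            expected++;
        }
    }

    public List<String> delivered() { return delivered; }
}
```

Causal and total ordering require more machinery (vector clocks, or a sequencer agreed on by the whole group), but the buffer-until-deliverable pattern is the same.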
Scalability and Performance for Group Communication
Scalability and performance are critical aspects of group communication in
distributed systems, where the ability to handle increasing numbers of nodes,
messages, and participants while maintaining efficient operation is essential.
Here’s an in-depth explanation of scalability and performance considerations in
this context:
1. Scalability
Scalability in group communication refers to the system’s ability to efficiently
accommodate growth in terms of:
Number of Participants: As the number of nodes or participants in a
group increases, the system should be able to manage communication
without significant degradation in performance.
Volume of Messages: Handling a larger volume of messages being
exchanged among group members, ensuring that communication remains
timely and responsive.
Geographical Distribution: Supporting communication across
geographically dispersed nodes or networks, which may introduce
additional latency and bandwidth challenges.
2. Challenges in Scalability
Communication Overhead: As the group size increases, the overhead
associated with managing group membership, message routing, and
coordination can become significant.
Network Bandwidth: Ensuring that the network bandwidth can handle
the increased traffic generated by a larger group without causing
congestion or delays.
Synchronization and Coordination: Maintaining consistency and
synchronization among distributed nodes becomes more complex as the
system scales up.
3. Strategies for Scalability
Partitioning and Sharding: Dividing the system into smaller partitions
or shards can reduce the scope of communication and management tasks,
improving scalability.
Load Balancing: Distributing workload evenly across nodes or partitions
to prevent bottlenecks and ensure optimal resource utilization.
Replication and Caching: Replicating data or messages across multiple
nodes can reduce access latency and improve fault tolerance, supporting
scalability.
Scalable Protocols and Algorithms: Using efficient communication
protocols and algorithms designed for large-scale distributed systems,
such as gossip protocols or scalable consensus algorithms.
4. Performance
Performance in group communication involves optimizing various aspects to
achieve:
Low Latency: Minimizing the time delay between sending and receiving
messages within the group.
High Throughput: Maximizing the rate at which messages can be
processed and delivered across the system.
Efficient Resource Utilization: Using network bandwidth, CPU, and
memory resources efficiently to support fast and responsive
communication.
5. Challenges in Performance
Message Ordering: Ensuring that messages are delivered in the correct
order while maintaining high throughput can be challenging, especially in
protocols that require strict ordering guarantees.
Concurrency Control: Managing concurrent access to shared resources
or data within the group without introducing contention or bottlenecks.
Network Conditions: Adapting communication strategies to varying
network conditions, such as bandwidth limitations or packet loss, to
maintain optimal performance.
6. Strategies for Performance
Optimized Message Routing: Using efficient routing algorithms to
minimize the number of network hops and reduce latency.
Asynchronous Communication: Employing asynchronous messaging
patterns to decouple sender and receiver activities, improving
responsiveness.
Caching and Prefetching: Pre-fetching or caching frequently accessed
data or messages to reduce latency and improve response times.
Parallelism: Leveraging parallel processing techniques to handle
multiple tasks or messages concurrently, enhancing throughput.
Challenges of Group Communication in Distributed Systems
Group communication in distributed systems poses several challenges due to the
inherent complexities of coordinating activities across multiple nodes or entities
that may be geographically dispersed or connected over unreliable networks.
Here are some of the key challenges:
Reliability: Ensuring that messages are reliably delivered to all intended
recipients despite network failures, node crashes, or temporary
disconnections. Reliable delivery becomes especially challenging when
nodes join or leave the group dynamically.
Scalability: As the number of group members increases, managing
communication becomes more challenging. Scalability issues arise in
terms of bandwidth consumption, message processing overhead, and the
ability to maintain performance as the system scales.
Concurrency and Consistency: Ensuring consistency of shared data
across distributed nodes while allowing concurrent updates can be
difficult. Coordinating access to shared resources to prevent conflicts and
maintain data integrity requires robust synchronization mechanisms.
Fault Tolerance: Dealing with node failures, network partitions, and
transient communication failures without compromising the overall
reliability and availability of the system. This involves mechanisms for
detecting failures, managing group membership changes, and ensuring
that communication continues uninterrupted.
Request/Reply Protocol:
The Request-Reply Protocol is also known as the RR protocol.
It works well for systems that involve simple RPCs.
In simple RPCs, the parameters and result values fit in a single packet
buffer, and both the duration of a call and the interval between calls
are brief.
This protocol is based on using implicit acknowledgements instead of
explicit acknowledgements.
Here, a reply from the server is treated as the acknowledgement (ACK)
for the client’s request message, and a client’s following call is considered
as an acknowledgement (ACK) of the server’s reply message to the
previous call made by the client.
To deal with failure handling e.g. lost messages, the timeout transmission
technique is used with RR protocol.
If a client does not get a response message within the predetermined
timeout period, it retransmits the request message.
Servers can provide exactly-once semantics by holding responses in a
reply cache, which filters duplicate request messages; cached replies
are retransmitted without processing the request again.
If there is no mechanism for filtering duplicate messages, the RR
protocol combined with timeout-based retransmission provides
at-least-once semantics.
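The reply cache behaviour described above can be sketched as follows: the server remembers the reply for each request id, so a retransmitted request (sent after a client timeout) is answered from the cache instead of being executed twice. The request-id scheme and method names are illustrative assumptions:

```java
import java.util.*;

// Sketch of an RR-protocol server with a reply cache for exactly-once semantics.
public class RRServer {
    private final Map<Integer, String> replyCache = new HashMap<>();
    private int executions = 0;  // counts how often the real procedure ran

    public String handle(int requestId, String payload) {
        // Duplicate request: return the cached reply without re-processing.
        if (replyCache.containsKey(requestId)) return replyCache.get(requestId);
        executions++;                          // the real work runs exactly once
        String reply = "result(" + payload + ")";
        replyCache.put(requestId, reply);
        return reply;
    }

    public int executions() { return executions; }
}
```

Without the cache, the retransmission would re-run the procedure, which matters for non-idempotent operations such as a bank deposit.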
Transparency in RPC:
Syntactic transparency: This implies that a remote procedure call
should have the same syntax as a local procedure call.
Semantic transparency: This implies that the semantics, i.e. the
meaning, of a remote procedure call should be identical to that of a
local procedure call.
Working of RPC:
There are 5 elements used in the working of RPC:
Client
Client Stub
RPC Runtime
Server Stub
Server
Client: The client process initiates RPC. The client makes a standard call,
which triggers a correlated procedure in the client stub.
Client Stub: Stubs are used by RPC to achieve semantic transparency.
The client calls the client stub, which performs the following tasks:
o When it receives a request from the client, it packs (marshals) the
parameters and the required specification of the remote/target
procedure into a message.
o Upon receiving the result values after execution, it unpacks
(unmarshals) those results and returns them to the client.
RPC Runtime: The RPC runtime is in charge of message transmission
between client and server over the network. Retransmission,
acknowledgement, routing, and encryption are all handled by it. On the
client side, it accepts call request messages from the client stub and
forwards them to the server machine, and it passes result messages
received from the server side on to the client stub. On the server
side, it forwards incoming request messages to the server stub and
transmits the server stub's reply messages back to the client machine.
Server Stub: Server stub does the following tasks:
o It unpacks (unmarshals) the call request message received from the
local RPC Runtime and makes a regular call to invoke the required
procedure in the server.
o When it receives the result of the server's procedure execution, it
packs (marshals) it into a message and asks the local RPC Runtime to
transmit it to the client stub, where it is unpacked.
Server: After receiving a call request from the client machine, the server
stub passes it to the server. The execution of the required procedure is
made by the server and finally, it returns the result to the server stub so
that it can be passed to the client machine using the local RPC Runtime.
RPC process:
The client, the client stub, and one instance of RPC Runtime are all
running on the client machine.
A client initiates a client stub process by giving parameters as normal.
The client stub acquires storage in the address space of the client.
At this point, the user can access RPC by using a normal Local
Procedural Call. The RPC runtime is in charge of message transmission
between client and server via the network. Retransmission,
acknowledgment, routing, and encryption are all tasks performed by it.
On the server-side, values are returned to the server stub, after the
completion of server operation, which then packs (which is also known as
marshaling) the return values into a message. The transport layer receives
a message from the server stub.
The resulting message is transmitted by the transport layer to the client
transport layer, which then sends a message back to the client stub.
The client stub unpacks (which is also known as unmarshalling) the
return arguments in the resulting packet, and the execution process
returns to the caller at this point.
When the client process makes a call to a local procedure, the
arguments/parameters are packed into a request message that is sent to
the remote server. The remote server then executes the requested
procedure (based on the request that arrived from the client machine)
and, after execution, returns a response to the client in the form of a
message. Until then the client is blocked; as soon as the response
arrives from the server side, the client extracts the result from the
message. In some cases, RPCs can also be executed asynchronously, in
which case the client is not blocked while waiting for the response.
Parameters can be passed in two ways: pass by value and pass by
reference. In pass by reference, a function is called with pointers
that carry the addresses of the variables, so the parameters receiving
those addresses must themselves be pointers. Call by value, in
contrast, sends copies of the variables' actual values.
The language designers are usually the ones who decide which parameter
passing method to utilize. It is sometimes dependent on the data type that is
being provided. Integers and other scalar types are always passed by value in C,
whereas arrays are always passed by reference.
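The distinction above can be sketched in Java terms: scalars are copied into the callee (call by value), while an array parameter passes a reference, so mutations inside the callee are visible to the caller. The method names are illustrative:

```java
// Call by value vs. passing a reference, sketched in Java.
public class ParamDemo {
    static void incrementScalar(int x)  { x = x + 1; }        // modifies only a local copy
    static void incrementArray(int[] a) { a[0] = a[0] + 1; }  // modifies the caller's array

    // Returns {value of n after the calls, value of arr[0] after the calls}.
    public static int[] run() {
        int n = 5;
        int[] arr = {5};
        incrementScalar(n);   // n is unchanged afterwards
        incrementArray(arr);  // arr[0] becomes 6
        return new int[]{n, arr[0]};
    }
}
```

This mirrors the C behaviour described above: the scalar survives the call untouched, while the change made through the reference is visible to the caller. For RPC this choice matters, because a raw memory address is meaningless on the remote machine, so reference parameters must be marshalled by copying the referenced data.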
Remote Method Invocation
Remote Method Invocation (RMI) is an API that allows an object to invoke a
method on an object that exists in another address space, which could be on the
same machine or on a remote machine. Through RMI, an object running in a
JVM present on a computer (Client-side) can invoke methods on an object
present in another JVM (Server-side). RMI creates a public remote server object
that enables client and server-side communications through simple method calls
on the server object.
Stub Object: The stub object on the client machine builds an information block
and sends this information to the server.
The block consists of
An identifier of the remote object to be used
Method name which is to be invoked
Parameters to the remote JVM
Skeleton Object: The skeleton object passes the request from the stub object to
the remote object. It performs the following tasks
It calls the desired method on the real object present on the server.
It forwards the parameters received from the stub object to the method.
Working of RMI
The communication between client and server is handled by using two
intermediate objects: a Stub object (on the client side) and a Skeleton
object (on the server side).
Step 3: Creating Stub and Skeleton objects from the implementation class
using rmic
The rmic tool is used to invoke the RMI compiler that creates the Stub
and Skeleton objects. Its usage is rmic classname. For the above
program, the following command needs to be executed at the command
prompt:
rmic SearchQuery
Step 4: Start the rmiregistry
Start the registry service by issuing the following command at the
command prompt:
start rmiregistry
Step 5: Create and execute the server application program
The next step is to create the server application program and execute it on a
separate command prompt.
The server program uses createRegistry method of LocateRegistry class
to create rmiregistry within the server JVM with the port number passed
as an argument.
The rebind method of Naming class is used to bind the remote object to
the new name.
Note: The above client and server program is executed on the same machine so
localhost is used. In order to access the remote object from another machine,
localhost is to be replaced with the IP address where the remote object is
present.
Save the files according to their class names as
Search.java, SearchQuery.java, SearchServer.java and ClientRequest.java.
Important Observations:
1. RMI is a pure Java solution to Remote Procedure Calls (RPC) and is
used to create distributed applications in Java.
2. Stub and Skeleton objects are used for communication between the client
and server-side.