Module 1: Introduction
By
Prof. Ankita Mandore,
Assistant Professor,
CSE (Data Science), DSCE
What is Distributed Computing?
A distributed system is a collection of independent computers,
interconnected via a network, capable of collaborating on a task.
Distributed computing is computing performed in a distributed system.
Computation originally meant working on a single processor.
This uniprocessor computing can be termed centralized computing.
Example of Distributed System
Features of Distributed systems
Communication is hidden from users
Applications interact in a uniform and consistent way
High degree of scalability
Resource sharing is possible in distributed systems.
Distributed systems act as fault-tolerant systems.
Enhanced performance
Centralized Systems vs Distributed Systems
Centralized Systems | Distributed Systems
Several jobs are done on a particular central processing unit (CPU). | Jobs are distributed among several processors, which are interconnected by a computer network.
Centralized control and authority | Decentralized control and authority
Communication flows to a central node | Direct communication between nodes
Single point of failure | Redundancy, less vulnerable to single points of failure
Limited scalability due to centralization | Highly scalable, new nodes can be added easily
Relatively simpler to manage | More complex to manage
Relation to Computer System Components
In a distributed computing system, each node consists of a processor (CPU),
local memory and an interface.
Communication between any two or more nodes is only by message passing
because there is no common memory available.
Distributed software is also termed as middleware. The distributed system
uses layered architecture to break down the complexity of system design.
Each computer has its own memory and processing unit, and the computers are connected
by a communication network. All the computers can communicate with each
other through a LAN or WAN.
A distributed system is an information processing system that contains a
number of independent computers that cooperate with one another over a
communications network in order to achieve a specific objective.
A distributed computer system consists of multiple software components that
are on multiple computers but run as a single system.
A distributed system can consist of any number of possible configurations,
such as mainframes, personal computers, workstations, minicomputers and
so on.
Motivation
Economics
A collection of microprocessors offers a better price/performance ratio than mainframes.
A low price/performance ratio is a cost-effective way to increase computing power.
Speed
A distributed system may have more total computing power than a mainframe
Scalability
Distributed systems can be extended through the addition of components, thereby
providing better scalability compared to centralized systems.
Inherent distribution
Some applications are inherently distributed, e.g. a supermarket chain.
Reliability
If one machine crashes, the system as a whole can still survive. This gives higher availability
and improved reliability.
Incremental growth
Computing power can be added in small increments.
Need of Distributed System
Resource sharing is the main motivation for distributed systems. The term
resource is a rather abstract one, but it best characterizes the range of things
that can usefully be shared in a networked computer system.
Resources may be software resources or hardware resources. Printers,
disks, CD-ROMs and data are examples of hardware and software resources.
A resource manager is a software module that manages a set of resources of a
particular type.
The primary requirements of a distributed system are as follows:
1. Fault tolerance
2. Consistency
3. Security
4. Reliability
5. Replicated data
6. Concurrent transactions
Focus on Resource sharing
The term resource is a rather abstract one, but it best characterizes the
range of things that can usefully be shared in a networked computer system.
Equipment is shared to reduce cost. Data shared in databases or web pages
are high-level resources that are more significant to users, without regard
for the server or servers that provide them.
Types of resources
1. Hardware resource: Hard disk, printer, camera
2. Data: File, database, web page.
3. Service: Search Engine
Patterns of resource sharing vary widely in their scope and in how closely users
work together.
Search Engine:
Users of a search engine need no contact with one another.
Computer supported co-operative working (CSCW):
Users who cooperate directly share resources. The mechanisms to coordinate users' actions are
determined by the pattern of sharing and the geographic distribution of the users.
For effective sharing, each resource must be managed by a program that offers a
communication interface enabling the resource to be accessed and updated reliably and
consistently.
Service:
Manages a collection of related resources and presents their functionalities to users and
applications.
Server:
A server stores resources and provides services to authenticated clients.
It is a running program on a networked computer. The server accepts requests from clients,
performs the service and responds to the requests.
Example: Apache server
The complete interaction between server machine and client machine, from the point when
the client sends its request to when it receives the server's response is called a remote
invocation.
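The request and response cycle of a remote invocation can be sketched in a single process. This is only an illustration: the queue standing in for the network, the function names server and remote_invoke, and the upper-casing "service" are all assumptions, not part of any real server.

```python
import queue
import threading

# Hypothetical stand-in for the network: requests travel through this queue.
requests = queue.Queue()

def server():
    """Accept one request, perform the service, and respond to it."""
    reply_box, payload = requests.get()
    reply_box.put(payload.upper())        # the "service" here is made up

def remote_invoke(payload):
    """Send a request and wait until the server's response arrives."""
    reply_box = queue.Queue()
    requests.put((reply_box, payload))
    return reply_box.get(timeout=1)

worker = threading.Thread(target=server)
worker.start()
result = remote_invoke("status")
worker.join()
print(result)
```

Everything between the call to remote_invoke and its return value corresponds to one remote invocation.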
Hardware resources
CPU
a) Computing server: It executes processor-intensive applications for clients
b) Remote object server: It executes methods on behalf of clients
c) Worm program: It shares the CPU capacity of a desktop machine with the local user
Memory
Cache server holds recently accessed web pages in its RAM, for faster access by other
local computers
Disk
File server, virtual disk server, video-on-demand server
Screen
Network window systems
Printer
Networked printers accept print jobs from many computers and manage them with a
queuing system
Software Resources
Web page
Web servers enable multiple clients to share read-only page content.
File
File servers enable multiple clients to share read-write files.
Object
Possibilities for software objects are limitless. Shared white board, Shared diary and
room booking system are examples of this type.
Database
Databases are intended to record the definitive state of some related sets of data.
They have been shared ever since multi-user computers appeared. They include
techniques to manage concurrent updates.
Newsgroup Content
The net news system makes read-only copies of the recently posted news items
available to clients throughout the Internet.
Video/Audio Stream
Servers can store entire videos on disk and deliver them at playback speed to
multiple clients simultaneously
Advantages of Distributed Computing
Disadvantages of Distributed Computing
Architectures of Distributed Systems
Inter Process Communication - Shared Data
Inter Process Communication - Message Passing
Message Passing in Distributed System
A process is a program in execution.
Each resource manager process monitors the current usage status of its local
resources. All resource managers communicate with each other from time to time to
dynamically balance the system load.
Therefore a distributed operating system (DOS) needs to provide an inter-process communication (IPC) mechanism for
these communication activities.
IPC basically requires information sharing among two or more processes.
Two basic methods for information sharing
Original sharing, or shared-data approach
Copy sharing, or message-passing approach
In the shared-data approach, the information to be shared is placed in a common
memory area that is accessible to all the processes involved in the IPC.
In message-passing approach, the information to be shared is physically
copied from the sender process address space to the address spaces of all
receiver processes
This is done by transmitting the data in the form of messages.
A message is a block of information.
Communicating processes interact directly with each other.
Distributed systems communicate by exchanging messages.
Message passing is the basic IPC mechanism in a distributed system.
A message-passing system is a subsystem of a distributed operating system that provides a set of
message-based IPC protocols.
It enables processes to communicate through the simple communication primitives
send and receive.
The send primitive is denoted by send() and the receive
primitive by receive().
The message-passing primitives take the form SEND(msg, dest) and RECEIVE(src, buffer).
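A minimal sketch of the two primitives, assuming an in-process "network" in which each process owns one mailbox. The mailboxes dictionary, the process names P1 and P2, and the extra src argument to send are assumptions made for illustration; a real message-passing system would copy the data between address spaces.

```python
import queue

# Hypothetical in-process network: one mailbox (queue) per process name.
mailboxes = {"P1": queue.Queue(), "P2": queue.Queue()}

def send(msg, dest, src):
    """SEND(msg, dest): place the message in the destination's mailbox."""
    mailboxes[dest].put((src, msg))

def receive(me):
    """RECEIVE(src, buffer): block until a message arrives, return both parts."""
    return mailboxes[me].get()

send("hello", dest="P2", src="P1")
sender, buffer = receive("P2")
print(sender, buffer)
```

The receiver learns who sent the message (src) and gets the contents (buffer), matching the two arguments of RECEIVE.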
Desirable Features of a Good Message-Passing System
1. Simplicity
• A message-passing system (MPS) should be simple and easy to use
• It should be straightforward to construct new applications and to communicate with existing ones by using
the primitives provided by the MPS
• Different modules of a distributed application should be able to exchange messages through
simple primitives without bothering about system or network details
• Clean and simple semantics of the IPC protocols make this easier
2. Uniform Semantics
• Two types of communication are used
• Local communication: the communicating processes are on the same node
• Remote communication: the communicating processes are on different nodes
• The semantics of remote communication should be as close as possible to those of local
communication
3. Efficiency
• If the MPS is not efficient, IPC may become so expensive that
application designers try to avoid using it in their applications
• An IPC protocol of an MPS can be made efficient by reducing the number of message
exchanges during communication
• Some optimizations are
• Avoiding the cost of establishing and terminating connections between the same pair of processes
for every message exchange
• Minimizing the cost of maintaining connections
• Piggybacking acknowledgements on subsequent messages
4. Reliability
• A reliable IPC protocol can cope with failures and guarantees the
delivery of a message
• Failures may be due to a node crash or a communication link failure
• Handling of lost messages usually involves acknowledgements and retransmissions
on the basis of timeouts
• Another issue related to reliability is duplicate messages
• Duplicate messages may be sent in the event of failures or timeouts
• A reliable IPC protocol is also capable of detecting and handling duplicates
• Sequence numbers are used to detect duplicate messages
5. Correctness
• An IPC system may support group communication
• One sender to multiple receivers, or multiple senders to one receiver
• Correctness relates to IPC protocols for group communication
Issues related to correctness are
Atomicity
Ensures that every message sent to a group of receivers will be delivered to either all of
them or none of them
Ordered delivery
Ensures that messages arrive at all receivers in an order acceptable to the application
Survivability
Guarantees that messages will be delivered despite partial failures of processes,
machines, or communication links
6. Flexibility
• Not all applications require the same degree of reliability and correctness of the
IPC protocols
• Many applications do not require atomicity or ordered delivery of messages
• The IPC primitives should be such that users have the flexibility to choose and
specify the types and levels of reliability and correctness requirements of
applications
• Flexibility also permits a choice of control flow, such as synchronous or asynchronous send/receive
7. Security
• An MPS should be capable of providing secure end-to-end communication
• A message in transit on the network should not be accessible to any user other
than those to whom it is addressed and the sender
• Steps necessary for secure communication are
• Authentication of the receiver(s) of a message by the sender
• Authentication of the sender of a message by its receiver(s)
• Encryption of a message before sending it over the network
8. Portability
• Two different aspects of portability
• It should be possible to easily construct a new IPC facility on another system by reusing the basic design
of the existing MPS
• Applications written using the MPS should also be portable, so heterogeneity must be considered while designing the MPS
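The reliability feature (item 4 above) combines acknowledgements, timeout-based retransmission and sequence numbers. The following is a small single-process sketch of that combination. The classes Receiver and LossyLink, the function reliable_send, and the idea of a None return standing in for a timeout are all assumptions made for illustration.

```python
class Receiver:
    def __init__(self):
        self.seen = set()        # sequence numbers already delivered
        self.delivered = []      # messages handed to the application

    def on_message(self, seq, data):
        """Deliver each sequence number once, but always acknowledge."""
        if seq not in self.seen:
            self.seen.add(seq)
            self.delivered.append(data)
        return seq               # the acknowledgement carries the sequence number


class LossyLink:
    """Hypothetical link that loses the transmissions whose attempt numbers are listed."""
    def __init__(self, receiver, drop_requests=(), drop_acks=()):
        self.receiver = receiver
        self.drop_requests = set(drop_requests)
        self.drop_acks = set(drop_acks)
        self.attempt = 0

    def transmit(self, seq, data):
        self.attempt += 1
        if self.attempt in self.drop_requests:
            return None          # message lost: the receiver never sees it
        ack = self.receiver.on_message(seq, data)
        if self.attempt in self.drop_acks:
            return None          # acknowledgement lost on the way back
        return ack


def reliable_send(link, seq, data, max_retries=5):
    """Retransmit until acknowledged; a None reply stands in for a timeout."""
    for attempt in range(1, max_retries + 1):
        if link.transmit(seq, data) == seq:
            return attempt
    raise TimeoutError("no acknowledgement received")


receiver = Receiver()
link = LossyLink(receiver, drop_requests={1}, drop_acks={2})
attempts = reliable_send(link, seq=0, data="update")
print(attempts, receiver.delivered)
```

The second transmission reaches the receiver but its acknowledgement is lost, so the third transmission is a duplicate. The sequence number lets the receiver acknowledge it again without delivering the message twice.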
Message Structure
A message is a block of information formatted by a sending process in such a
manner that it is meaningful to the receiving process
It consists of fixed-length header and a variable-size collection of typed data
objects
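As a sketch of this structure, the message below has a fixed-length header followed by a variable number of typed data objects. The header fields (sender id, receiver id, sequence number, object count), the field widths and the numeric type codes are assumptions chosen for illustration, not a standard layout.

```python
import struct

# Assumed header: sender id, receiver id, sequence number, object count.
# Four unsigned 32-bit integers in network byte order, so always 16 bytes.
HEADER = struct.Struct("!IIII")
# Assumed per-object prefix: 1-byte type code and 2-byte payload length.
OBJECT_PREFIX = struct.Struct("!BH")

def build_message(src, dest, seq, items):
    """items is a list of (type_code, payload_bytes) pairs."""
    body = b""
    for type_code, payload in items:
        body += OBJECT_PREFIX.pack(type_code, len(payload)) + payload
    return HEADER.pack(src, dest, seq, len(items)) + body

def parse_message(raw):
    """Read the fixed header first, then walk the variable-size part."""
    src, dest, seq, count = HEADER.unpack_from(raw, 0)
    offset, items = HEADER.size, []
    for _ in range(count):
        type_code, length = OBJECT_PREFIX.unpack_from(raw, offset)
        offset += OBJECT_PREFIX.size
        items.append((type_code, raw[offset:offset + length]))
        offset += length
    return src, dest, seq, items

raw = build_message(1, 2, 7, [(1, b"abc"), (2, b"\x00\x2a")])
print(HEADER.size, len(raw), parse_message(raw))
```

Because the header length never changes, the receiver can always read it first and use the object count to know how much of the variable part to parse.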
Issues in IPC by message passing
In a message-oriented IPC protocol, the sending process determines the actual
contents of a message
The receiving process must know how to interpret the contents
Special primitives are explicitly used for sending and receiving the messages
The following issues need to be addressed in the design of an IPC protocol
Who is the sender?
Who is the receiver?
Is there one receiver or many receivers?
Is the message guaranteed to have been accepted by its receiver(s)?
Does the sender need to wait for a reply?
What should be done in case of failure (crash or communication)?
What should be done if the receiver is not ready to accept the message?
Will the message be discarded or stored in a buffer? In case of buffering,
what should be done if the buffer is full?
If there are several outstanding messages for a receiver, can it choose the order in which
to service the messages?
Synchronization
A major issue for communicating processes is synchronization
The semantics are classified as blocking and non-blocking
A primitive has non-blocking semantics if its invocation does not block the execution of its invoker
Otherwise the primitive is blocking (execution of the invoker is blocked)
These two types of semantics are used for both the send and receive primitives
With a blocking send, after the execution of send, the sending process is
blocked until an acknowledgement is received
With a blocking receive, after execution of the receive statement, the receiving process is
blocked until it receives a message
With a non-blocking send, the sending process is allowed to continue execution as soon as the message has been copied to a buffer
With a non-blocking receive, the receiving process proceeds with its execution after
executing the receive statement
An important issue with a non-blocking receive primitive is how the receiving
process knows that the message has arrived in the message buffer
The following two methods are used for this
Polling
A test primitive allows the receiver to check the buffer status
The receiver periodically polls the kernel to check the buffer
Interrupt
When the message has been placed in the buffer, a software interrupt is used to notify the receiving
process
This method permits the receiving process to continue without issuing unsuccessful test
requests
It is highly efficient and allows maximum parallelism
The drawback is that user-level interrupts make programming difficult
A variant of the non-blocking receive primitive is the conditional receive primitive
It returns control immediately, either with a message or with an indicator that no
message is available
A blocking send primitive often uses a timeout value
The value is set by the user, or a default value is used
A timeout value may also be used with a blocking receive primitive to prevent the receiving
process from being blocked indefinitely
When both the send and receive primitives of a communication between two
processes use blocking semantics, the communication is said to be synchronous
If it uses non-blocking primitives, the communication is asynchronous
Synchronous communication is simple and easy to implement
It provides high reliability
Its drawbacks are
It limits concurrency and is subject to communication deadlocks
It is less flexible because the sending process always has to wait for an acknowledgement,
even when it is not required
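The receive variants described in this section can be sketched with Python's standard queue module, whose get method supports both a timeout and a non-blocking mode. The helper names conditional_receive and blocking_receive are assumptions, and the queue stands in for a kernel message buffer.

```python
import queue

mailbox = queue.Queue()   # stands in for the receiver's message buffer

def conditional_receive(box):
    """Return at once, with a message or with None if nothing is available."""
    try:
        return box.get_nowait()
    except queue.Empty:
        return None

def blocking_receive(box, timeout):
    """Block until a message arrives, but give up after the timeout."""
    try:
        return box.get(timeout=timeout)
    except queue.Empty:
        return None

first = conditional_receive(mailbox)             # nothing has been sent yet
mailbox.put("m1")                                # non-blocking send: returns immediately
second = conditional_receive(mailbox)            # the message is now available
third = blocking_receive(mailbox, timeout=0.05)  # buffer is empty, so this times out
print(first, second, third)
```

Calling conditional_receive in a loop is the polling method described above. The timeout on blocking_receive is what prevents the receiver from being blocked indefinitely.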
Buffering
Message passing involves copying messages from the address space of the sending process to the
address space of the receiving process
If the receiving process is not ready to receive a message, the message should be
saved for later use
The message buffering strategy is closely related to the synchronization strategy
The following are the buffering strategies
1. Null buffer or no buffer
2. Buffer with unbounded capacity
3. Single-message buffer
4. Finite-bound or multiple-message buffer
Null buffer (or no buffering)
There is no place to temporarily store the message
One of the following implementation strategies is used
The message remains in sender address space and execution of send is delayed
until the receiver executes receive
The message is simply discarded and the timeout mechanism is used to resend the
message after a timeout period
Single-Message buffer
A buffer with capacity to store a single message is used on the receiver's node
An application module may have at most one message outstanding at a time
The idea of the single-message buffer strategy is to keep the message ready for use at the
location of the receiver
The request message is buffered on the receiver's node if the receiver is not
ready to receive the message
The message buffer may either be located in the kernel's address space or in
the receiver process's address space
Unbounded-capacity buffer
A sender does not wait for the receiver to be ready
An unbounded-capacity message buffer can store all unreceived
messages
It assures that all the messages sent to the receiver will be delivered
Finite-bound (or multiple-message) buffer
Asynchronous modes of communication use finite-bound buffers
A mechanism is needed to handle the problem of buffer overflow
There are two ways to handle buffer overflow
Unsuccessful communication
Message transfers simply fail whenever there is no more buffer space
The send normally returns an error message to the sending process
This method is less reliable
Flow-controlled communication
The sender is blocked until the receiver accepts some messages
This method introduces synchronization between sender and receiver
It may result in unexpected deadlocks
The amount of buffer space to be allocated depends on implementation
A create-buffer system call is provided to the users
The receiver's mailbox is located in the kernel address space or in the receiver
process's address space
This buffering provides better concurrency and flexibility
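The two overflow policies can be sketched with a bounded queue from Python's standard library. The buffer size of two and the helper names send_or_fail and send_flow_controlled are assumptions. The timeout on the flow-controlled send is added only so the example cannot hang.

```python
import queue

buffer = queue.Queue(maxsize=2)   # finite-bound buffer holding two messages

def send_or_fail(box, msg):
    """Unsuccessful communication: report failure when the buffer is full."""
    try:
        box.put_nowait(msg)
        return True
    except queue.Full:
        return False

def send_flow_controlled(box, msg, timeout):
    """Flow control: block until space appears (bounded by a timeout here)."""
    try:
        box.put(msg, timeout=timeout)
        return True
    except queue.Full:
        return False

results = [send_or_fail(buffer, m) for m in ("m1", "m2", "m3")]
blocked = send_flow_controlled(buffer, "m3", timeout=0.05)  # receiver has not drained
buffer.get()                                                # receiver accepts one message
retry = send_flow_controlled(buffer, "m3", timeout=0.05)    # space is now available
print(results, blocked, retry)
```

The third send fails under the first policy. Under the second policy the sender waits, and it succeeds only after the receiver has taken a message out of the buffer.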
Multidatagram Messages
All networks have an upper bound on the size of data that can be transmitted at a time
This size is known as the Maximum Transfer Unit (MTU) of the network
A message larger than the MTU has to be fragmented into pieces no larger than the MTU
Each fragment is sent separately
Each fragment is sent in a packet with control information and data
Each packet is known as datagram
Messages smaller than the MTU of the network can be sent in a single packet
known as single-datagram messages
Messages larger than the MTU of the network have to be fragmented and sent
in multiple packets known as multidatagram messages
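Fragmentation and reassembly can be sketched as follows. The MTU of 8 bytes is deliberately tiny so the example is easy to follow, and the (index, total, chunk) tuple used as control information is an assumption rather than a real packet format.

```python
MTU = 8  # assumed payload bytes per packet, far smaller than any real network

def fragment(message: bytes, mtu: int = MTU):
    """Split a message into (index, total, chunk) datagrams."""
    chunks = [message[i:i + mtu] for i in range(0, len(message), mtu)] or [b""]
    return [(i, len(chunks), chunk) for i, chunk in enumerate(chunks)]

def reassemble(datagrams):
    """Rebuild the message even if the datagrams arrived out of order."""
    total = datagrams[0][1]
    if len(datagrams) != total:
        raise ValueError("missing fragments")
    return b"".join(chunk for _, _, chunk in sorted(datagrams))

small = fragment(b"hi")                        # fits in one packet
packets = fragment(b"distributed systems")     # 19 bytes need three packets
rebuilt = reassemble(list(reversed(packets)))  # simulate out-of-order arrival
print(len(small), len(packets), rebuilt)
```

The first message is a single-datagram message and the second is a multidatagram message. The index in each datagram is what allows the receiver to put the fragments back in order.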
Encoding and Decoding of Message Data
The structure of program objects should be preserved while they are transmitted
from the address space of the sending process to that of the receiving process
This is difficult, especially when the two processes are on computers of different architectures,
for two reasons
An absolute pointer value loses its meaning when transferred from one address
space to another
Different program objects occupy varying amounts of storage space, e.g. long int,
short int, variable-size character strings
Because of these problems, program objects are first converted to a stream form for
transmission and placed into a message buffer
This conversion process on the sender side is known as encoding of the message
data
On receipt, the stream form is converted back to the original program objects
This is known as decoding
Two representations used for encoding and decoding
Tagged representation
The type of each program object along with its value is encoded in the message
The receiving process can check the type of each program object in the message
The coded data format is therefore self-describing
Untagged representation
The message data only contains program objects
No information is included in the message data to specify the type of each program
object
The receiver process must have prior knowledge of how to decode the data
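The difference between the two representations can be sketched with Python's struct module. The one-byte tags b"i" and b"s", the 32-bit integer width and the layout "!i5s" are assumptions chosen for this example, not a standard encoding.

```python
import struct

def encode_tagged(values):
    """Prefix every value with a type tag: b'i' for int, b's' for str."""
    out = b""
    for value in values:
        if isinstance(value, int):
            out += b"i" + struct.pack("!i", value)
        else:
            data = value.encode("utf-8")
            out += b"s" + struct.pack("!H", len(data)) + data
    return out

def decode_tagged(raw):
    """The tags alone tell the receiver how to decode each object."""
    values, pos = [], 0
    while pos < len(raw):
        tag = raw[pos:pos + 1]
        pos += 1
        if tag == b"i":
            values.append(struct.unpack_from("!i", raw, pos)[0])
            pos += 4
        elif tag == b"s":
            (length,) = struct.unpack_from("!H", raw, pos)
            pos += 2
            values.append(raw[pos:pos + length].decode("utf-8"))
            pos += length
        else:
            raise ValueError("unknown tag")
    return values

# Untagged: no type information travels, so both sides must agree on this
# layout beforehand (one 32-bit integer followed by a 5-byte string).
UNTAGGED_LAYOUT = struct.Struct("!i5s")

tagged = encode_tagged([42, "hello"])
untagged = UNTAGGED_LAYOUT.pack(42, b"hello")
print(decode_tagged(tagged), UNTAGGED_LAYOUT.unpack(untagged), len(tagged), len(untagged))
```

The tagged form is longer because it carries the type information, but it can be decoded without prior agreement. The untagged form is more compact and relies entirely on the receiver knowing the layout.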
Algorithmic challenges in distributed computing
Designing useful execution models and frameworks
Dynamic distributed graph algorithms and distributed routing algorithms
Time and global state in a distributed system
Synchronization/coordination mechanisms
Group communication, multicast, and ordered message delivery
Monitoring distributed events and predicates
Distributed program design and verification tools
Debugging distributed programs
Data replication, consistency models, and caching.
Applications of distributed computing and newer challenges
1. Mobile systems
2. Sensor networks
3. Ubiquitous or pervasive computing
4. Peer-to-peer computing
5. Publish-subscribe, content distribution, and multimedia
6. Distributed agents
7. Distributed data mining
8. Grid computing
9. Security in distributed system
Types of Message Passing in Distributed Systems
Message passing describes the method by which nodes or processes interact
and share information in distributed systems.
Message passing can be classified according to the timing and synchronization
between sender and receiver (synchronous or asynchronous) and according to the
number of receivers (unicast, multicast, or broadcast)
1. Synchronous Message Passing
Synchronous message passing involves a tightly coordinated interaction between
the sender and receiver. The key characteristics include:
Timing Coordination: Before proceeding with execution, the sender waits for the
recipient to confirm receipt of the message or finish processing it.
Request-Response Pattern: Often uses a request-response paradigm in which the
sender sends a message requesting something and then waits for the recipient to
respond.
Advantages:
Ensures precise synchronization between communicating entities.
Simplifies error handling as the sender knows when the message has been successfully
received or processed.
Disadvantages:
May introduce latency if the receiver is busy or unavailable.
Synchronous blocking can reduce overall system throughput if many processes are
waiting for responses.
2. Asynchronous Message Passing
Asynchronous message passing allows processes to operate independently of each
other in terms of timing. Key features include:
Decoupled Timing: The sender does not wait for an immediate response from the
receiver after sending a message. It continues its execution without blocking.
Event-Driven Model: Communication is often event-driven, where processes
respond to messages or events as they occur asynchronously.
Advantages:
Enhances system responsiveness and throughput by allowing processes to execute
concurrently.
Supports loosely coupled interactions, allowing processes to handle
messages at their own speed.
Disadvantages:
Requires additional mechanisms (like callbacks or event handlers) to manage responses
or coordinate actions.
Handling out-of-order messages or ensuring message delivery reliability can be more
complex compared to synchronous communication.
3. Unicast Messaging
Unicast messaging is a one-to-one communication where a message is sent from a
single sender to a specific receiver. The key characteristics include:
Direct Communication: The message is targeted at a single, specific node or
endpoint.
Efficiency for Point-to-Point: Since only one recipient receives the message,
resources are efficiently used for direct, point-to-point communication.
Advantages:
Optimized for targeted communication, as the message is only sent to the intended
recipient.
Minimizes network load compared to group messaging, as it doesn’t broadcast to
unnecessary nodes.
Disadvantages:
Not scalable for group communications; sending multiple unicast messages can strain the
system in larger networks.
Can increase the complexity of managing multiple unicast connections in large-scale
applications.
4. Multicast Messaging
Multicast messaging enables one-to-many communication, where a message is sent from one
sender to a specific group of receivers. The key characteristics include:
Group-Based Communication: Messages are delivered to a subset of nodes that have joined
the multicast group.
Efficient for Groups: Saves bandwidth by sending the message once to all nodes in the group
instead of individually.
Advantages:
Reduces network traffic by sending a single message to multiple recipients, making it ideal for
content distribution or group updates.
Scales efficiently for applications where data needs to reach specific groups, like video
conferencing or online gaming.
Disadvantages:
Complex to implement as nodes need mechanisms to manage group memberships and handle
node join/leave requests.
Not all network infrastructures support multicast natively, which can limit its applicability.
5. Broadcast Messaging
Broadcast messaging involves sending a message from one sender to all nodes
within the network. The key characteristics include:
Wide Coverage: The message is sent to every node, ensuring that all nodes in the
network receive it.
Network-Wide Reach: Suitable for announcements, alerts, or updates intended
for all nodes without targeting specific ones.
Advantages:
Guarantees that every node in the network receives the message, which is useful for
critical notifications or status updates.
Simplifies dissemination of information when all nodes need to be aware of an event or
data change.
Disadvantages:
Consumes significant network resources since every node, regardless of relevance,
receives the message.
Can lead to unnecessary processing at nodes that don’t need the message, potentially
causing inefficiency.
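The three delivery patterns (unicast, multicast, broadcast) can be compared in a small in-memory sketch. The Network class, its method names, the node names and the group name "video" are assumptions made for illustration. No real networking is involved.

```python
class Network:
    """Hypothetical in-memory network: each node name maps to an inbox list."""
    def __init__(self, nodes):
        self.inbox = {node: [] for node in nodes}
        self.groups = {}                      # group name -> set of member nodes

    def join(self, group, node):
        self.groups.setdefault(group, set()).add(node)

    def unicast(self, dest, msg):
        self.inbox[dest].append(msg)          # one sender, one specific receiver

    def multicast(self, group, msg):
        for node in self.groups.get(group, ()):
            self.inbox[node].append(msg)      # only nodes that joined the group

    def broadcast(self, msg):
        for box in self.inbox.values():
            box.append(msg)                   # every node, relevant or not

net = Network(["A", "B", "C", "D"])
net.join("video", "B")
net.join("video", "C")
net.unicast("A", "u")
net.multicast("video", "m")
net.broadcast("b")
print(net.inbox)
```

Node D never joined the group and was never addressed directly, yet it still has to process the broadcast message, which is the inefficiency noted above.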