
Distributed computing

1. Design issues and challenges in distributed computing?
Distributed computing, where tasks are processed across multiple machines, is
an architecture that promises scalability, fault tolerance, and resource efficiency.
However, designing and implementing such systems involves addressing a myriad of
unique challenges that stem from their very nature. These challenges span technical,
architectural, and operational domains, and effectively dealing with them is crucial for
building a robust and efficient distributed system.

1. Network Latency and Bandwidth Constraints


 Issue: One of the fundamental aspects of distributed systems is that the
various nodes communicate over a network, which introduces inherent
latency. Network latency is the time it takes for data to travel from one node
to another, while bandwidth represents the volume of data that can be
transmitted over the network within a given timeframe.

 Challenge: Minimizing network latency is particularly difficult, especially in
globally distributed systems where physical distances between nodes are
significant. Additionally, network bandwidth limitations can restrict the
amount of data that can be exchanged between nodes, leading to
bottlenecks. Strategies to address this include minimizing data transfers,
compressing data, and intelligently partitioning tasks so that nodes can
operate more independently without constantly needing to communicate.
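
As a small illustration of the bandwidth point, the following Python sketch (the message contents and sizes are invented) compresses a serialized message before it is sent, trading a little CPU time for fewer bytes on the wire.

```python
import json
import zlib

def pack(payload: dict) -> bytes:
    """Serialize and compress a message before it crosses the network."""
    return zlib.compress(json.dumps(payload).encode("utf-8"))

def unpack(data: bytes) -> dict:
    """Decompress and deserialize a received message."""
    return json.loads(zlib.decompress(data).decode("utf-8"))

# Hypothetical telemetry message with highly compressible contents.
message = {"sensor": "node-17", "readings": [0] * 1000}
wire = pack(message)
raw_len = len(json.dumps(message).encode("utf-8"))
print(raw_len, "bytes uncompressed vs", len(wire), "bytes on the wire")
assert unpack(wire) == message
```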

2. Data Consistency
 Issue: In distributed computing, maintaining data consistency is a significant
concern, particularly when multiple copies of data are spread across different
nodes. Different nodes might read or write data concurrently, which can lead
to inconsistencies.

 Challenge: Balancing consistency with system availability and performance is
not straightforward. Strong consistency guarantees that all nodes see the
same data at the same time, but this can significantly degrade performance
due to the need for frequent synchronization across nodes. Eventual
consistency relaxes these guarantees, allowing the system to be more
available but accepting that not all nodes may have the latest data at any
given time. The CAP theorem highlights this trade-off between Consistency,
Availability, and Partition tolerance, forcing system architects to carefully
consider which model to adopt based on the system's requirements.
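
To make the consistency trade-off concrete, here is a purely illustrative quorum sketch in Python (it models no real datastore and all names are invented): with N replicas, requiring W write acknowledgements and R read responses such that R + W > N guarantees that every read overlaps the most recent committed write.

```python
# Quorum parameters: R + W > N, so read and write sets always intersect.
N, W, R = 3, 2, 2
replicas = [{} for _ in range(N)]          # each dict stands in for one replica

def write(key, value, version):
    acks = 0
    for replica in replicas:               # in reality some replicas may be unreachable
        replica[key] = (version, value)
        acks += 1
    return acks >= W                       # committed once a write quorum acknowledges

def read(key):
    answers = [replica[key] for replica in replicas[:R] if key in replica]
    return max(answers)[1] if answers else None   # keep the highest-versioned value

write("x", "hello", version=1)
print(read("x"))   # -> "hello"
```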
3. Fault Tolerance and Reliability
 Issue: Distributed systems are inherently more vulnerable to failures than
centralized systems because they rely on a network of interconnected nodes,
each of which can fail independently. The possibility of partial failures, where
some components fail while others continue to operate, complicates recovery
and reliability strategies.

 Challenge: Designing systems that can continue to function correctly in the
presence of failures requires redundancy, replication, and intelligent recovery
mechanisms. For example, data might be replicated across multiple nodes so
that if one node fails, the data is still accessible from another node. However,
this increases storage and synchronization overhead. Fault tolerance
mechanisms, such as consensus algorithms (e.g., Paxos, Raft), allow systems
to continue making progress despite failures but at the cost of increased
complexity.
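
A minimal sketch of the replication idea, assuming hypothetical node names and a simulated network call: if one replica is down, the read simply falls over to another copy of the data.

```python
REPLICAS = ["node-a", "node-b", "node-c"]      # hypothetical replica names

def fetch_from(node, key):
    # Stand-in for a real network call; pretend the first replica is down.
    if node == "node-a":
        raise ConnectionError(f"{node} is unreachable")
    return f"value-of-{key}@{node}"

def read_with_failover(key):
    last_error = None
    for node in REPLICAS:
        try:
            return fetch_from(node, key)
        except ConnectionError as err:
            last_error = err                   # partial failure: try the next replica
    raise RuntimeError("all replicas failed") from last_error

print(read_with_failover("user:42"))           # served by node-b despite node-a's failure
```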

4. Scalability
 Issue: One of the primary motivations behind distributed computing is
scalability—being able to increase system capacity by adding more nodes.
However, as more nodes are added, the system may experience diminishing
returns due to increased overhead in coordinating and managing the nodes.

 Challenge: Scaling efficiently involves not just adding more hardware but also
ensuring that the software architecture can handle the additional complexity.
For example, adding more nodes increases the number of messages that
must be exchanged between them, which can lead to congestion and
bottlenecks. Load balancing, dynamic resource allocation, and distributed
algorithms that minimize inter-node communication are necessary to ensure
the system scales effectively.
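
One widely used technique for spreading load while keeping coordination cheap is consistent hashing. The sketch below (node names are invented) builds a hash ring so that adding or removing a node relocates only a fraction of the keys.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Map keys to nodes; membership changes move only nearby keys."""
    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)
        self._points = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node-1", "node-2", "node-3"])
print(ring.node_for("session:alice"), ring.node_for("session:bob"))
```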

5. Synchronization and Coordination


 Issue: Distributed systems often require multiple nodes to work together to
complete a task, which necessitates synchronization. Tasks must be
coordinated to avoid conflicts, ensure correct execution order, and prevent
data corruption.

 Challenge: Achieving synchronization in a distributed system is complex due
to the asynchronous nature of communication and the potential for node
failures. Distributed locks, consensus protocols, and leader election
algorithms are commonly used to coordinate tasks. However, these
mechanisms can introduce additional latency and complexity. A significant
challenge is preventing deadlocks, where two or more nodes are waiting for
each other to release resources, leading to a system stall.
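
As a conceptual sketch (not a real lock service), the lease-based lock below shows one common way to avoid a permanent stall: a holder that crashes simply lets its lease expire, so no other node waits on it forever.

```python
import time

class LeaseLock:
    """Single-process illustration of a lease: ownership expires automatically."""
    def __init__(self, lease_seconds=5.0):
        self._owner = None
        self._expires = 0.0
        self._lease = lease_seconds

    def acquire(self, owner: str) -> bool:
        now = time.monotonic()
        if self._owner is None or now >= self._expires:   # free, or previous lease expired
            self._owner, self._expires = owner, now + self._lease
            return True
        return False

    def release(self, owner: str) -> None:
        if self._owner == owner:
            self._owner = None

lock = LeaseLock(lease_seconds=2.0)
print(lock.acquire("node-a"))   # True
print(lock.acquire("node-b"))   # False while node-a's lease is still live
```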

6. Security Concerns
 Issue: Distributed systems expose data and computations across multiple
nodes, often over untrusted networks, which significantly increases the
system's vulnerability to attacks such as data breaches, eavesdropping, and
distributed denial of service (DDoS) attacks.

 Challenge: Securing a distributed system requires implementing encryption
for data in transit and at rest, authenticating and authorizing access to
nodes, and ensuring that communications are secure. Moreover, ensuring
that sensitive data remains protected while being processed across
potentially insecure nodes presents additional challenges. Techniques like
zero-trust security models, where each node and communication must be
continuously verified, are increasingly employed to combat these risks.
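
As one small, self-contained example of authenticating node-to-node messages (key distribution and transport encryption are out of scope here, and the key shown is a placeholder), an HMAC lets a receiver reject forged or tampered messages:

```python
import hashlib
import hmac

SECRET = b"shared-secret-key"        # placeholder pre-shared key between two nodes

def sign(message: bytes) -> bytes:
    return hmac.new(SECRET, message, hashlib.sha256).digest()

def verify(message: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(message), signature)

msg = b"transfer 10 units to node-b"
tag = sign(msg)
print(verify(msg, tag))                      # True: message is authentic
print(verify(b"transfer 9999 units", tag))   # False: tampering is detected
```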

7. Heterogeneity of Nodes
 Issue: Distributed systems often consist of nodes with varying hardware,
operating systems, and configurations, especially in environments like cloud
computing or edge computing, where resources are pooled from multiple
providers or locations.

 Challenge: Designing a distributed system that performs well across
heterogeneous nodes requires abstracting away the differences in hardware
and software environments. Middleware platforms can help standardize
communication and execution environments, but ensuring that the system
remains efficient despite these differences remains a challenge.

8. Debugging and Monitoring


 Issue: Distributed systems are notoriously difficult to debug and monitor due
to their complexity, scale, and the asynchronous nature of their operations.
Failures may not be immediately apparent, and diagnosing the root cause of
an issue often requires analyzing logs and performance metrics from multiple
nodes.

 Challenge: Centralized logging and monitoring solutions that collect and
aggregate data from all nodes are essential but can introduce overhead and
latency. Distributed tracing, which tracks the flow of requests across nodes, is
another technique used to diagnose issues, but implementing such systems
requires careful design to avoid excessive performance degradation.
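
A toy illustration of the tracing idea (service names and the log format are invented): every hop logs the same trace ID, so logs gathered from many nodes can later be stitched back into a single request flow. In a real system the ID would travel in an RPC or HTTP header.

```python
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("tracing-demo")

def handle_frontend(request):
    trace_id = str(uuid.uuid4())        # generated once, at the system's edge
    log.info("frontend trace=%s received %s", trace_id, request)
    return call_backend(request, trace_id)

def call_backend(request, trace_id):
    log.info("backend  trace=%s processing %s", trace_id, request)
    return f"done:{request}"

handle_frontend("GET /orders/42")
```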

9. Concurrency and Parallelism


 Issue: Distributed systems aim to exploit parallelism by having multiple
nodes work on different parts of a task simultaneously. However, concurrency
introduces the risk of race conditions, where the outcome depends on the
order of operations, leading to unpredictable results.

 Challenge: Writing code that effectively takes advantage of concurrency
without introducing errors like race conditions, deadlocks, or livelocks is
difficult. Locking mechanisms, atomic operations, and transactional memory
can help ensure correctness but often at the cost of reduced performance
due to increased contention for shared resources.
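
The classic single-machine illustration of a race condition, in Python: several threads increment a shared counter, and without the lock the read-modify-write steps can interleave and lose updates.

```python
import threading

counter = 0
lock = threading.Lock()

def worker(iterations=100_000):
    global counter
    for _ in range(iterations):
        with lock:            # remove this lock and the final count may come up short
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                # 400000 with the lock in place
```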

10. Latency Tolerance


 Issue: Distributed systems often span wide geographic areas, which results in
variable and sometimes high latency. This can negatively impact
performance, particularly for applications that require real-time or near-real-
time responses.

 Challenge: Designing latency-tolerant applications involves techniques such
as predictive loading, where future data is pre-fetched based on anticipated
requests, or caching frequently accessed data at nodes closer to the users.
Additionally, optimizing communication protocols to minimize round trips
between nodes and reducing dependencies on remote data are key to
improving latency tolerance.
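
A minimal sketch of the caching technique (the TTL value and key names are arbitrary): values are served from a local copy while they are still fresh, so repeated requests avoid the slow round trip to a remote node.

```python
import time

class TTLCache:
    """Serve recently fetched values locally for `ttl_seconds` before refetching."""
    def __init__(self, ttl_seconds=30.0):
        self._ttl = ttl_seconds
        self._store = {}

    def get(self, key, fetch_remote):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry and now - entry[1] < self._ttl:
            return entry[0]                 # fresh local copy: no network round trip
        value = fetch_remote(key)           # slow path: go to the remote node
        self._store[key] = (value, now)
        return value

cache = TTLCache(ttl_seconds=10)
print(cache.get("profile:42", lambda k: f"fetched-{k}-remotely"))
print(cache.get("profile:42", lambda k: "this fetch is skipped while the entry is fresh"))
```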

11. Operational Costs and Resource Management


 Issue: Distributed systems typically operate on large scales, which can result
in significant operational costs, particularly in cloud environments where
resources are billed based on usage.

 Challenge: Resource management is crucial to controlling costs. Techniques
like auto-scaling, which dynamically adjusts the number of nodes based on
the system’s workload, can help reduce expenses. Additionally, optimizing
resource allocation algorithms to ensure that nodes are efficiently utilized
while minimizing wastage is critical for cost-effective operations.
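
A tiny, purely illustrative auto-scaling rule (the thresholds and bounds are invented): grow the cluster when average utilization is high, shrink it when utilization is low, and stay within fixed limits.

```python
def desired_nodes(current_nodes, avg_utilization,
                  scale_up_at=0.75, scale_down_at=0.30,
                  min_nodes=2, max_nodes=20):
    """Return how many nodes the cluster should have on the next adjustment."""
    if avg_utilization > scale_up_at:
        return min(current_nodes + 1, max_nodes)
    if avg_utilization < scale_down_at:
        return max(current_nodes - 1, min_nodes)
    return current_nodes

print(desired_nodes(current_nodes=5, avg_utilization=0.82))   # -> 6 (scale up)
print(desired_nodes(current_nodes=5, avg_utilization=0.10))   # -> 4 (scale down)
```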

2. Explain in detail the following topics:


i. OMG    ii. CORBA
iii. RPC   iv. DCOM
v. PMI

 OMG:

In distributed computing, OMG (Object Management Group) is a significant
organization that plays a key role in standardizing various aspects of distributed
systems and technologies. Here's a detailed overview of OMG and its contributions:

Overview of OMG:
The Object Management Group (OMG) is an international, open
membership, not-for-profit consortium that works to create standards for distributed
object computing. OMG's standards are designed to ensure interoperability and
integration of different systems and technologies.

Established in 1989, OMG has brought together various technology
vendors, users, and academic institutions to create and maintain standards that
support distributed computing environments.

OMG's Impact:
Interoperability

Flexibility

Innovation

 CORBA:
CORBA (Common Object Request Broker Architecture) is a standard designed by
OMG to enable communication between objects in a distributed system. It
allows applications and services to communicate with each other regardless of
where they are located, the programming languages they are written in, or the
platforms they are running on.

CORBA is based on the idea of an Object Request Broker (ORB), which acts
as a middleman to handle the communication and method calls between
distributed objects across different systems. Essentially, CORBA enables object-
oriented programming in a distributed environment by abstracting the
complexities of network communication.

KEY CONCEPTS:
Object Request Broker (ORB):
The ORB is the core of CORBA. It enables objects to send requests
and receive responses in a transparent manner, without needing to know where the
object resides (locally or remotely) or what language it's implemented in.

Interface Definition Language (IDL):


CORBA uses a neutral language called IDL to define the interfaces
that objects expose to other objects. IDL is independent of any programming language
and platform.

POA (Portable Object Adapter)


The Portable Object Adapter (POA) is responsible for managing
the lifecycle of server-side objects. It handles object activation, request
demultiplexing, and object reference creation.

CORBA Architecture
The CORBA architecture consists of several components:

1. Client: The entity that requests services by invoking methods on remote
objects.

2. Server: The entity that provides services by hosting the remote objects and
responding to client requests.

3. ORB (Object Request Broker): The intermediary that handles communication
between clients and servers, making the distributed nature of the system
transparent to both sides.
4. IDL (Interface Definition Language): Used to define the object interfaces in
a language-neutral manner.

5. GIOP/IIOP (General Inter-ORB Protocol/Internet Inter-ORB Protocol):
Protocols used to enable communication between ORBs across different
networks, including the internet.

CORBA Features:
Language and Platform Independence

Transparency

Scalability
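
To make the ORB's role concrete, here is a purely conceptual, in-process Python sketch (this is not the CORBA API; the class and object names are invented): a registry forwards named method invocations to registered objects, much as an ORB transparently routes a client's request to a local or remote servant.

```python
class ToyORB:
    """Conceptual stand-in for an ORB: resolves a name and forwards the call."""
    def __init__(self):
        self._objects = {}

    def register(self, name, servant):
        self._objects[name] = servant

    def invoke(self, name, method, *args):
        # In real CORBA this step may cross the network via GIOP/IIOP.
        return getattr(self._objects[name], method)(*args)

class AccountServant:                     # a server-side object implementation
    def balance(self, account_id):
        return {"id": account_id, "balance": 100.0}

orb = ToyORB()
orb.register("Accounts", AccountServant())
print(orb.invoke("Accounts", "balance", "acct-7"))   # caller never sees where the object lives
```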

 RPC:
RPC (Remote Procedure Call) is a communication paradigm where a program
(the client) can request a service or procedure from a program (the server)
located on a remote system, in a way that looks like a regular procedure call in
the local system. The goal of RPC is to hide the complexities of remote
communication, such as message passing, from the programmer.

Diagram of RPC Flow:


Client Application
    | calls procedure (local call)
    v
Client Stub
    | marshals the request (serialization)
    v
Communication Layer (Network Transport)
    | sends the message over the network
    v
Server Stub
    | unmarshals the request (deserialization)
    v
Server Application
    | executes the procedure (remote)
    v
Server Stub
    | marshals the response
    v
Communication Layer (Network Transport)
    | sends the response over the network
    v
Client Stub
    | unmarshals the response
    v
Client Application (receives the result)
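
One concrete way to see this flow is Python's standard-library XML-RPC, shown below as a minimal sketch (the host, port, and the `add` procedure are just examples). The `ServerProxy` object plays the client-stub role: it marshals the arguments, sends them over the network, and unmarshals the result.

```python
# server.py -- registers a procedure and waits for remote calls
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
server.register_function(add, "add")
server.serve_forever()
```

```python
# client.py -- the remote call reads like an ordinary local function call
from xmlrpc.client import ServerProxy

proxy = ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))   # -> 5
```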
Types of RPC:
Synchronous RPC

Asynchronous RPC

 DCOM:
DCOM (Distributed Component Object Model) is a Microsoft
technology that extends the Component Object Model (COM) to
support communication between objects across network boundaries,
allowing for distributed applications in a networked environment. It
was designed to enable software components to communicate with
each other regardless of whether they are on the same machine or
across a network, making it an essential technology in the context of
distributed computing, especially within Windows environments.
Steps of a Typical DCOM Interaction
1. Client Object Reference: The client obtains a reference to the
remote object, either by querying a directory service or by
receiving the object reference from another component.

2. Method Invocation: The client invokes a method on the local proxy,
which represents the remote object.

3. Marshalling: The proxy marshals the method call’s arguments into
a format suitable for network transmission.

4. Network Communication: The marshaled data is transmitted over
the network to the remote machine using DCOM’s underlying
communication protocol (usually RPC).

5. Unmarshalling on Server: The stub on the server side receives the
data, unmarshals it, and invokes the appropriate method on the
actual object.

6. Execution: The server object executes the method and returns the
result.

7. Return Path: The result is marshaled by the stub, transmitted back
over the network, and unmarshaled by the proxy on the client side.

8. Result: The client receives the result of the method call as if the
object were local.

DCOM Architecture
The DCOM architecture extends the core COM architecture with the
following key components:

1. Client-Side Proxy: The proxy acts as a local representative for the
remote object, allowing the client to interact with it as though it
were a local COM object.

2. Server-Side Stub: The stub handles the method call on the server
side, translating network messages into method invocations on the
actual object.

3. RPC Protocol: DCOM relies on the underlying RPC (Remote
Procedure Call) protocol to handle the details of transmitting
method calls across the network.

4. Network Transport: Typically, DCOM uses TCP/IP as the transport
protocol for communication between machines.
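
A heavily simplified, Windows-only illustration of DCOM activation (this assumes the third-party pywin32 package, a COM server such as Excel installed and DCOM-enabled on the target machine, and suitable permissions; "remote-host" is a placeholder machine name):

```python
import win32com.client

# DispatchEx asks DCOM to create the COM object on the named machine and
# returns a local proxy; each attribute access below is marshalled over RPC.
excel = win32com.client.DispatchEx("Excel.Application", machine="remote-host")
print(excel.Version)     # the property read travels to the remote object and back
excel.Quit()
```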
 PMI:
In distributed computing, PMI (Process Management Interface) is a
specification and framework that is primarily used to manage and control
processes in distributed systems, especially in the context of parallel computing.
PMI provides a standard way for the various processes of a parallel application
(which might be spread across multiple nodes in a distributed system) to
coordinate with one another, exchange information, and synchronize their
activities.
PMI is most notably associated with the Message Passing Interface (MPI), a
widely used standard for programming parallel systems. In this context, PMI plays
a crucial role in launching and managing processes in parallel programs.

PMI in MPI Programs


In an MPI program, multiple processes need to be launched and coordinated
across different nodes (computers) in a distributed system. PMI is responsible
for managing these processes and ensuring that they can communicate and
coordinate effectively. This is especially critical in large-scale systems where
thousands or even millions of processes may be running in parallel.
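
For example, a minimal MPI program using the mpi4py package looks as follows (the file name and launch command are only examples, and which PMI version is used underneath depends on the MPI implementation and process manager). Launching it with something like `mpiexec -n 4 python hello_mpi.py` lets the process manager use PMI to start the processes and tell each one its rank and how to reach its peers.

```python
# hello_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()    # this process's ID, assigned during PMI-managed startup
size = comm.Get_size()    # total number of processes in the job

print(f"hello from rank {rank} of {size}")
comm.Barrier()            # collective operation relying on the wiring PMI set up
```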

PMI Versions
There have been multiple versions of the PMI specification, each introducing
improvements and new features to better support the needs of parallel and
distributed computing:

1. PMI-1: The first version of the PMI specification provided basic functionality
for launching and managing processes in MPI programs. It introduced
mechanisms for process startup, communication, and coordination.

2. PMI-2: PMI-2 introduced enhancements to improve scalability and
performance, particularly for large-scale parallel systems. This version added
support for more efficient key-value exchange between processes, better
handling of process groups, and optimizations for collective operations.

3. PMI-Ex (Extended PMI, also known as PMIx): PMI-Ex is an extension to PMI that offers even
greater scalability and flexibility. It was developed to support exascale
computing systems, which are capable of performing at least one exaflop
(10^18 floating point operations per second). PMI-Ex includes features like
dynamic process management, improved fault tolerance, and better support
for hierarchical process topologies.

Components of PMI
PMI typically consists of the following components:

1. PMI Server: The PMI server is responsible for coordinating the startup
and initialization of processes. It acts as a central authority that
processes can communicate with during the initialization phase to
exchange information such as process ranks and network addresses.
2. PMI Client: Each process in the MPI program runs a PMI client that
communicates with the PMI server. The PMI client is responsible for
registering the process with the PMI server, retrieving information
about other processes, and participating in collective operations such
as barrier synchronization.

3. Key-Value Store: The key-value store is a mechanism provided by
PMI that allows processes to share information with one another. Each
process can store key-value pairs in the store, which other processes
can then retrieve. This is used, for example, to share information about
process locations (e.g., hostnames and ports).
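
A purely conceptual simulation of that exchange (this is not the real PMI API; the endpoints are invented): each process publishes its contact information under its rank, and after a synchronization step any process can look up any other process's address.

```python
kv_store = {}                                   # stands in for the PMI server's store

def pmi_put(rank, key, value):
    kv_store[f"{rank}:{key}"] = value

def pmi_get(rank, key):
    return kv_store[f"{rank}:{key}"]

# Each of four simulated processes publishes a hypothetical host:port endpoint.
for rank in range(4):
    pmi_put(rank, "endpoint", f"10.0.0.{rank + 1}:5000")

# ... a barrier ("fence") would happen here in a real PMI exchange ...

print(pmi_get(2, "endpoint"))                   # any rank can now find rank 2's address
```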
