What is distributed computing?

Distributed computing is a model in which components of a software system are shared among multiple computers or nodes. Even though the software components are spread out across multiple computers in multiple locations, they're run as one system to improve efficiency and performance. The systems on different networked computers communicate and coordinate by sending messages back and forth to achieve a defined task.

Due to its ability to provide parallel processing between multiple systems, distributed
computing can increase performance, resilience and scalability, making it a
common computing model in database systems and application design.

Distributed computing is sometimes also known as distributed systems, distributed programming or distributed algorithms.

How distributed computing works


Distributed computing networks can be connected as local networks or through
a wide area network if the machines are in different geographic locations. Processors
in distributed computing systems typically run in parallel.

Common functions involved in distributed computing include the following (a short code sketch follows the list):

 Task distribution. A central algorithm divides a large task into smaller subtasks. These subtasks are then assigned to different nodes within the system to distribute the workload.

 Parallel execution. Once the nodes are assigned, they independently execute
their assigned subtask concurrently with other nodes. This parallel processing
enables faster computation of complex tasks compared to sequential
processing.
 Communication. Nodes in a distributed system communicate with one another
to share resources, coordinate tasks and maintain synchronization. This
communication can take place through a variety of network protocols.

 Aggregation of results. After completing their respective subtasks, nodes often send their results back to a central node or aggregator. The aggregator combines these results to produce the final output of the overall computation.

 Fault tolerance. Distributed systems are designed to handle failures gracefully. They often incorporate redundancy, replication of data and mechanisms for detecting and recovering from failures of individual nodes or communication channels.
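The distribute-execute-aggregate flow above can be made concrete with a short sketch. The following Python fragment is a minimal stand-in: processes on a single machine play the role of networked nodes, and the four-way chunking and worker count are arbitrary choices for illustration.

    # Minimal sketch of the distribute / execute / aggregate flow described above.
    # Processes on one machine stand in for networked nodes.
    from concurrent.futures import ProcessPoolExecutor

    def run_subtask(chunk):
        # Parallel execution: each "node" works on its subtask independently.
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        # Task distribution: split the large task into smaller subtasks.
        subtasks = [data[i::4] for i in range(4)]
        with ProcessPoolExecutor(max_workers=4) as pool:
            partial_results = pool.map(run_subtask, subtasks)
        # Aggregation of results: a central node combines the partial outputs.
        print(sum(partial_results))

A real deployment would add the communication and fault-tolerance machinery described above, but the division of labor is the same.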

An example showing how networks, servers and computers are structured in distributed computing.
Types of distributed computing architecture
In enterprise settings, distributed computing generally puts various steps in
business processes at the most efficient places in a computer network. For
example, a typical distribution has a three-tier model that organizes applications
into the presentation tier -- or user interface -- the application tier and the data tier.
These tiers function as follows (a small sketch follows the list):

1. User interface processing occurs on the PC at the user's location.

2. Application processing takes place on a remote computer.

3. Database access and algorithm processing occur on another computer that provides centralized access for many business processes.
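As a hedged illustration of this three-tier flow, the sketch below collapses the tiers into one Python process, with each function standing in for a machine at its tier; the function names and the sample product table are invented for the example.

    # Illustrative only: each function stands in for a machine at its tier.
    PRODUCTS = {"widget": 9.99, "gadget": 24.50}   # data tier's storage (invented)

    def data_tier(product):
        # Database access: centralized storage shared by many business processes.
        return PRODUCTS.get(product)

    def application_tier(product):
        # Application processing: business logic on a remote computer.
        price = data_tier(product)
        return f"{product}: ${price:.2f}" if price is not None else "not found"

    def presentation_tier(product):
        # User interface processing: runs on the PC at the user's location.
        print(application_tier(product))

    presentation_tier("widget")   # prints "widget: $9.99"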

In addition to the three-tier model, other types of distributed computing architectures include the following:

 Client-server architectures. Smart clients contact a server for data, then format and display that data to the user (a minimal socket sketch follows this list).

 N-tier system architectures. Typically used in application servers, these architectures use web applications to forward requests to other enterprise services.

 Peer-to-peer architectures. These divide all responsibilities among all peer computers, which can serve as clients or servers. This model is popular in several use cases, including blockchain networks, content exchange and media streaming.

 Scale-out architectures. The scale-out architecture is typically used by distributed computing clusters, making it easier to add new hardware as the load on the network increases.

 Distributed shared memory architecture. This type of memory architecture is applied to loosely coupled distributed memory systems. It enables end-user processes to access shared data without the need for inter-process communication.
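To ground the client-server entry above, here is a minimal, hedged sketch of a smart client contacting a server over TCP; the port number is arbitrary, and a short sleep stands in for proper startup coordination in this toy example.

    # Minimal client-server sketch: the client requests data, the server responds.
    import socket
    import threading
    import time

    def server():
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.bind(("127.0.0.1", 50007))
            srv.listen(1)
            conn, _ = srv.accept()
            with conn:
                request = conn.recv(1024)            # the client's request
                conn.sendall(b"echo: " + request)    # the server's response

    threading.Thread(target=server, daemon=True).start()
    time.sleep(0.2)                                  # give the server time to start

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.connect(("127.0.0.1", 50007))         # smart client contacts server
        client.sendall(b"hello")
        print(client.recv(1024).decode())            # prints "echo: hello"

A peer-to-peer system would run both roles in every node; a scale-out cluster would put many such servers behind a load balancer.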
Advantages of distributed computing
Distributed computing offers the following benefits:

 Enhanced performance. Distributed computing can help improve performance by having each computer in a cluster handle different parts of a task simultaneously.

 Scalability. Distributed computing clusters are scalable by adding new hardware when needed. Additionally, they can keep running even if one or more of the systems malfunctions, thus offering scalability and fault tolerance.
 Resilience and redundancy. Multiple computers can provide the same
services. This way, if one machine isn't available, others can fill in for the
service. Likewise, if two machines that perform the same service are in
different data centers and one data center goes down, an organization can still
operate.

 Cost-effectiveness. Distributed computing can use low-cost, off-the-shelf hardware. These systems aim to achieve high performance by reducing latency, improving response time and maximizing throughput to meet shared objectives.

 Efficiency. Complex requests can be broken down into smaller pieces and
distributed among different systems. This way, the request is simplified and
worked on as a form of parallel computing, reducing the time needed to
compute requests.

 Flexibility. Unlike traditional applications that run on a single system, distributed applications run on multiple systems simultaneously.

 Transparency. Distributed systems can present resources such as files, databases and services as if they exist at a single computer or location, even though they could be spread across multiple nodes. This transparency enables users or applications to access resources without needing to know where they're physically located.
Disadvantages of distributed computing
Along with its various advantages, distributed computing also presents certain
limitations, including the following:

 Configuration issues. For a distributed system to work properly, all nodes in the network should have the same configuration and be able to interact with one another. This can bring challenges for organizations that have complicated IT infrastructure or have IT staff that lacks the necessary skills.

 Communication overhead. Coordinating and communicating between multiple nodes in a distributed system can create additional communication overhead, potentially reducing overall system performance.
 Security management. Managing security in distributed systems can be complex, as administrators must control replicated data across multiple locations and ensure network security. Additionally, ensuring consistent performance and uptime across distributed resources can be challenging.

 Cost. While distributed systems can be cost-effective in the long run, they often
incur high deployment costs initially. This can be an issue for some
organizations, especially when compared to the relatively lower upfront costs
of centralized systems.

 Complexity. Distributed systems can be more complex to design, execute and maintain compared to centralized systems. This can lead to challenges in debugging, monitoring and ensuring the overall system's reliability.
Use cases for distributed computing
Distributed computing has use cases and applications across several industries,
including the following:

 Healthcare and life sciences. Distributed computing is used to model and simulate complex life science data. This enables the efficient processing, storage and analysis of large volumes of medical data along with the exchange of electronic health records and other information among healthcare providers, hospitals, clinics and laboratories.

 Telecommunication networks. Telephone and cellular networks are examples of distributed networks. Telephone networks were initially operated as early peer-to-peer networks. Cellular networks, on the other hand, consist of distributed base stations across designated cells. As telephone networks transitioned to voice over Internet Protocol, they've evolved into more complex distributed networks.

 Aerospace and aviation. The aerospace industry uses distributed computing, including the Distributed Aircraft Maintenance Environment (DAME), a distributed diagnostic system for airplane engines. DAME handles extensive in-flight data from operating aircraft using grid computing. This enables the development of decision support systems for aircraft diagnosis and maintenance.
 Manufacturing and logistics. Distributed computing is used in industries such as manufacturing and logistics to provide real-time tracking, automation control and dispatching systems. These industries typically use apps that monitor and check equipment, as well as provide real-time tracking of logistics and e-commerce activities.

 Cloud computing and IT services. The IT industry takes advantage of distributed computing to ensure fault tolerance, facilitate resource management and accessibility and maximize performance. It's also integral to cloud computing platforms, as it offers dynamic, flexible infrastructures and quality of service guarantees.

 Financial services. Financial services organizations use distributed systems for various use cases, such as rapid economic simulations, portfolio risk assessment, market prediction and financial decision-making. They also deploy web applications powered by distributed systems to offer personalized premiums, handle large-scale financial transactions securely via distributed databases and bolster user authentication to prevent fraud.

 Entertainment and gaming. Distributed computing is essential in the online gaming and entertainment sectors because it provides the necessary resources for efficient operation, supporting multiplayer online games, high-quality video animation and various entertainment applications.

 Distributed artificial intelligence. Distributed AI uses complex algorithms and large-scale systems for learning and decision-making, relying on computational data points distributed across multiple locations.
Grid computing, distributed computing and cloud
computing
Grid computing and cloud computing are variants of distributed computing.
The following are key characteristics, differences and applications of the
grid, distributed and cloud computing models:

Grid computing
Grid computing involves a distributed architecture of multiple computers connected to solve a complex problem. Servers or PCs run independent tasks and are linked loosely by the internet or low-speed networks. In the grid computing model, individual participants can contribute some of their computer's processing time to solve complex problems.

SETI@home is one example of a grid computing project. Although the project's first phase wrapped up in March 2020, for more than 20 years, individual computer owners volunteered some of their multitasking processing cycles -- while concurrently still using their computers -- to the Search for Extraterrestrial Intelligence (SETI) project. This computer-intensive problem used thousands of PCs to download and search radio telescope data.

Distributed computing
Grid computing and distributed computing are similar concepts that can be
hard to tell apart. Generally, distributed computing has a broader definition
than grid computing. Grid computing is typically a large group of dispersed
computers working together to accomplish a defined task.

Conversely, distributed computing can work on numerous tasks simultaneously. Grid computing can also be defined as just one type of distributed computing. In addition, while grid computing typically has well-defined architectural components, distributed computing can have various architectures, such as grid, cluster and cloud computing.

Cloud computing
Cloud computing is also similar in concept to distributed computing. Cloud
computing is a general term for anything that involves delivering hosted
services and computing power over the internet. These services, however,
are divided into three main types: infrastructure as a service, platform as a
service and software as a service. Cloud computing is also divided
into private and public clouds. A public cloud sells services to another party,
while a private cloud is a proprietary network that supplies a hosted service
to a limited number of people, with specific access and permissions
settings. Cloud computing aims to provide easy, scalable access to
computing resources and IT services.

Cloud and distributed computing both focus on spreading a service or services to several different machines; however, cloud computing typically offers a service such as specific software or storage for organizations to use on their own tasks. Distributed computing involves distributing services to different computers to aid in or around the same task.

Examples and Use Cases of Distributed Computing
1. Artificial Intelligence and Machine Learning
Artificial Intelligence (AI) and Machine Learning (ML) are two of the most exciting
and rapidly developing fields in technology today. They are also among the most
notable use cases for distributed computing.

AI and ML algorithms require enormous amounts of data to train their models. Dealing with such vast amounts of data and performing complex computations is not feasible using traditional computing models. Therefore, distributed computing is used extensively in these fields.

One specific example of distributed computing in AI and ML is in training neural networks. Neural networks are a type of machine learning model that is inspired by the human brain. Training these networks involves processing vast amounts of data, which is distributed across multiple machines for faster computation. This distributed approach to machine learning is what makes it possible to train complex AI models in a reasonable amount of time.
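A hedged sketch of the idea: in data-parallel training, each worker computes a gradient on its shard of the data and a coordinator averages them before updating the model. The toy model below (one weight, squared-error loss) and the worker count are invented for illustration; real systems rely on ML frameworks to do this across machines.

    # Toy data-parallel gradient descent: workers compute gradients on shards,
    # the coordinator averages them and updates the shared model weight.
    from concurrent.futures import ProcessPoolExecutor

    def shard_gradient(args):
        w, shard = args
        # d/dw of mean squared error for the model y = w * x on this shard.
        return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

    if __name__ == "__main__":
        data = [(x, 3.0 * x) for x in range(1, 101)]   # true weight is 3.0
        shards = [data[i::4] for i in range(4)]        # distribute data to 4 workers
        w, lr = 0.0, 0.0001
        with ProcessPoolExecutor(max_workers=4) as pool:
            for _ in range(200):
                grads = list(pool.map(shard_gradient, [(w, s) for s in shards]))
                w -= lr * sum(grads) / len(grads)      # average gradients, update w
        print(round(w, 3))                             # converges toward 3.0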

Moreover, distributed computing also makes it possible to deploy AI models at scale. For instance, recommendation algorithms used by companies like Netflix and Amazon are deployed on distributed computing platforms. This allows these models to process millions of requests per second, providing personalized recommendations to users in real-time.

2. Scientific Research and High-Performance Computing (HPC)
Another area where distributed computing is used extensively is scientific research and high-performance computing (HPC). In these fields, distributed computing is used to solve complex scientific problems that require enormous computational resources.

For instance, distributed computing is used in the field of genomics to analyze large-scale DNA sequences. The Human Genome Project, which mapped the entire human genome, is a prime example of this. The project involved processing and analyzing vast amounts of genetic data, which was distributed across multiple machines for faster computation.

Similarly, distributed computing is used in climate modeling and weather forecasting. These simulations require processing massive amounts of data to make accurate predictions. This is achieved by distributing the data and computations across multiple machines, which allows for faster and more accurate modeling.

In the field of physics, distributed computing is used to simulate particle collisions in high-energy physics experiments. The Large Hadron Collider, the world's largest and most powerful particle accelerator, relies on distributed computing to process the vast amounts of data generated by its experiments.

3. Financial Services
In the financial services sector, examples of distributed computing are plentiful. Financial institutions deal with vast amounts of data, from customer transactions to market data. Processing and analyzing this data in real-time is critical to making informed decisions.

One notable example of distributed computing in financial services is in risk management. Financial institutions use distributed computing to analyze market data and calculate risk in real-time. This allows them to make informed decisions about investments and trading.

Additionally, distributed computing is used in fraud detection. By distributing data and computations across multiple machines, financial institutions can analyze transaction patterns in real-time and identify suspicious activity. This allows them to detect and prevent fraud more effectively.

4. Energy and Environment
Distributed computing is also used in the energy and environment sectors. For example, it is used in smart grid technology to manage and optimize energy consumption.
Smart grids use distributed computing to collect data from various sources, such as
smart meters and sensors. This data is then analyzed in real-time to optimize energy
distribution and consumption. This not only improves energy efficiency but also
enables the integration of renewable energy sources into the grid.

In the environmental sector, distributed computing is used in climate modeling and environmental monitoring. For instance, it is used to analyze satellite data to monitor environmental changes, such as deforestation and sea-level rise. By distributing these computations across multiple machines, scientists can process and analyze data more quickly and accurately.

5. Internet of Things (IoT)
The Internet of Things (IoT) is another area where distributed computing is utilized. IoT devices generate vast amounts of data, which need to be processed and analyzed in real-time.

Distributed computing is used in IoT to manage and process this data. For instance,
it is used in smart home systems to control and monitor various devices, such as
thermostats and security systems. By distributing data and computations across
multiple devices, these systems can operate more efficiently and effectively.

Moreover, distributed computing is used in industrial IoT applications, such as manufacturing and logistics. By distributing data and computations across various machines and sensors, companies can monitor and optimize their operations in real-time.

6. Blockchain and Cryptocurrencies
Finally, one of the most prominent distributed computing examples is blockchain and cryptocurrencies, two technologies that rely on distributed computing to operate.

In a blockchain, data is stored across a network of computers, each of which maintains a copy of the entire blockchain. This ensures that the data is secure and resistant to tampering.

In cryptocurrencies like Bitcoin, distributed computing is used to process transactions and maintain the blockchain. This involves solving complex mathematical problems, which are distributed across a network of computers. This distributed approach ensures that the system is secure and can handle a large volume of transactions.

Latest Trends in Distributed Systems
Distributed systems are rapidly evolving, shaping how we handle data, compute resources, and network architecture. This article explores current trends driving innovation in this field, including advancements in cloud computing, edge processing, and decentralized technologies, highlighting their impact on scalability, reliability, and efficiency.

Importance of Staying Updated with Latest Trends in Distributed Systems
Staying updated with the latest trends in distributed systems is crucial for several reasons:
 Performance Optimization: New trends often bring improvements in
efficiency and scalability, helping to enhance system performance and
manage growing workloads.
 Security Enhancements: Emerging trends can introduce advanced
security measures and protocols to protect against evolving cyber
threats.
 Cost Efficiency: Innovations in distributed systems can lead to more
cost-effective solutions by optimizing resource usage and reducing
operational expenses.
 Competitive Edge: Keeping abreast of the latest developments allows
organizations to leverage cutting-edge technologies, maintaining a
competitive advantage in the market.
 Adaptability: Understanding new trends helps organizations adapt to
changing technology landscapes and user demands, ensuring systems
remain relevant and effective.
Cloud Computing and Distributed Systems
Integrations between cloud computing and distributed systems involve
combining the principles and technologies of both to create efficient,
scalable, and resilient computing environments. Here’s how these
integrations work and their significance:
1. Cloud-Based Distributed Systems
Cloud computing platforms often use distributed systems principles to
deliver their services. For example:
 Scalable Infrastructure: Cloud providers use distributed systems to
manage large-scale data centers and networks. This allows them to
scale resources dynamically based on demand.
 Load Balancing: Cloud services distribute incoming network traffic
across multiple servers to ensure no single server becomes a
bottleneck, improving performance and reliability.
 Data Replication: Cloud storage solutions replicate data across
multiple nodes or locations to ensure high availability and fault
tolerance.
2. Distributed Cloud Services
Distributed systems principles are applied to create cloud services that
span multiple geographic locations or data centers:
 Multi-Region Deployments: Cloud providers offer services that are
distributed across various geographic regions to enhance performance,
reduce latency, and increase redundancy.
 Edge Computing: Cloud providers use distributed systems to push
computing resources closer to end users at the edge of the network,
improving response times and reducing bandwidth use.
Microservices and Containerization in Distributed
Systems
Microservices and containerization are key concepts in modern distributed
systems, enhancing scalability, flexibility, and efficiency. Here’s a detailed
explanation of each and their roles in distributed systems:
1. Microservices
Microservices is an architectural style where a large application is divided
into smaller, loosely coupled services, each responsible for a specific
functionality. Each microservice operates independently but interacts with
other services through well-defined APIs. Key characteristics include:
 Modularity: Each microservice focuses on a single business
capability, making it easier to develop, test, and deploy independently.
 Scalability: Microservices can be scaled individually based on
demand, allowing more efficient use of resources. For example, if a
particular service experiences high traffic, it can be scaled up without
affecting the entire system.
 Fault Isolation: Failures in one microservice are less likely to impact
others, improving the overall reliability and resilience of the application.
 Continuous Deployment: Microservices support continuous
integration and continuous delivery (CI/CD) practices, allowing for more
frequent and reliable releases.
Example:
An e-commerce application might be divided into microservices for user
authentication, product catalog, payment processing, and order
management. Each service handles its own data and logic, and they
communicate through APIs.
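As a minimal, hedged sketch of one such service (stdlib only; the route, port and response shape are invented), the product-catalog microservice might expose its single capability over HTTP like this, while the other services run as separate programs with their own endpoints:

    # A tiny product-catalog microservice: one business capability behind an API.
    # Real services would add persistence, authentication and configuration.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    CATALOG = {"1": {"name": "widget", "price": 9.99}}   # invented sample data

    class CatalogHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # e.g. GET /products/1 -> that product as JSON, or 404 if unknown.
            product = CATALOG.get(self.path.rsplit("/", 1)[-1])
            body = json.dumps(product or {"error": "not found"}).encode()
            self.send_response(200 if product else 404)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("127.0.0.1", 8081), CatalogHandler).serve_forever()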
2. Containerization
Containerization is a technology that encapsulates an application and its
dependencies into a container, which is a lightweight, portable unit that
can run consistently across various computing environments. Containers
are built on the principles of isolation and resource efficiency. Key
characteristics include:
 Isolation: Containers package an application with its runtime
environment, ensuring that it runs the same way regardless of the
underlying infrastructure or operating system. This helps avoid conflicts
and inconsistencies.
 Portability: Containers can be deployed across different environments
—such as development, testing, and production—without modification,
facilitating a smooth transition between stages.
 Efficiency: Containers share the host operating system’s kernel,
making them more lightweight and resource-efficient compared to
traditional virtual machines, which require separate OS instances.
 Scalability: Containers can be easily scaled up or down based on
demand, and orchestration tools can manage large numbers of
containers across distributed systems.
Example:
A container might include a microservice for user authentication,
packaged with its necessary libraries and configurations. This container
can be deployed on various cloud platforms or on-premises infrastructure
with consistent behavior.
3. Integration of Microservices and Containerization in
Distributed Systems
Microservices and containerization are often used together in distributed
systems to maximize their benefits:
 Deployment Flexibility: Containers allow microservices to be
deployed and managed consistently across different environments,
from development to production. This flexibility enhances the
deployment process and reduces compatibility issues.
 Scalability and Management: Containers can be orchestrated using
tools like Kubernetes, which manage the deployment, scaling, and
operation of microservices across a distributed system. Kubernetes
handles containerized microservices, ensuring they are deployed
efficiently and scaled as needed.
 Fault Tolerance and Resilience: By combining microservices with
containers, systems can achieve greater fault tolerance. Containers
can be quickly restarted or replaced if they fail, and microservices
ensure that failures are isolated and do not disrupt the entire system.
 Continuous Integration and Delivery: Containers support CI/CD
pipelines by providing a consistent environment for building, testing,
and deploying microservices, streamlining the development process
and accelerating delivery cycles.
Networking Advances in Distributed Systems
Networking advances in distributed systems are crucial for improving
performance, reliability, and scalability. Here are some key areas of
advancement:
 High-Speed Networking
o Optical Networks: Increased use of fiber optics provides
higher bandwidth and lower latency compared to traditional
copper cables.
o 5G and Beyond: Enhancements in cellular technology offer
faster speeds and lower latency, benefiting distributed
systems especially in mobile and IoT contexts.
 Software-Defined Networking (SDN)
o Network Virtualization: SDN allows for flexible and
programmable network configurations, enabling dynamic
adjustments based on traffic demands and network
conditions.
o Centralized Control: By decoupling control and data planes,
SDN simplifies network management and can optimize routing
and resource allocation.
 Network Function Virtualization (NFV)
o Virtual Network Functions (VNFs): NFV allows network
services to be virtualized and run on standard hardware,
making the deployment of network functions more flexible and
cost-effective.
o Service Chaining: VNFs can be chained together to create
complex services without relying on specialized hardware.
 Data-Centric Networking (DCN)
o Content Distribution: DCN focuses on efficient content
delivery and caching, reducing latency and load on origin
servers.
o Named Data Networking (NDN): NDN enhances data
retrieval by naming data rather than locations, allowing for
more efficient and resilient data access.
Future Directions of Distributed Systems
The future of distributed systems is poised to be shaped by several
emerging trends and technological advancements. Here are some key
directions where distributed systems are likely to evolve:
 Ubiquitous Edge Computing
o Edge-AI Integration: Combining edge computing with
artificial intelligence to enable real-time data processing and
decision-making at the edge of the network.
o Smart Infrastructure: Development of smart cities and smart
infrastructure with distributed systems handling tasks like
traffic management, energy distribution, and environmental
monitoring.
 Enhanced Security and Privacy
o Zero Trust Architectures: Widespread adoption of zero trust
models that enforce strict identity verification and continuous
monitoring.
o Advanced Cryptography: Use of quantum-resistant
cryptography to safeguard against future quantum computing
threats and enhanced encryption methods for data security.
 Quantum Computing and Networking
o Quantum-Enhanced Systems: Integration of quantum
computing for tasks requiring high computational power and
optimization, potentially revolutionizing problem-solving in
distributed systems.
o Quantum Networks: Development of quantum
communication networks that leverage quantum entanglement
for ultra-secure data transmission.
 Interoperability and Standards
o Cross-Domain Interoperability: Increased focus on creating
standards and protocols that allow different distributed
systems and applications to work together seamlessly.
o Open Standards and APIs: Growth of open standards and
APIs to facilitate easier integration and communication
between diverse systems and platforms.

Resource Sharing in Distributed System
Resource sharing in distributed systems is very important for
optimizing performance, reducing redundancy, and enhancing
collaboration across networked environments. By enabling multiple
users and applications to access and utilize shared resources such
as data, storage, and computing power, distributed systems improve
efficiency and scalability.
Importance of Resource Sharing in Distributed
Systems
Resource sharing in distributed systems is of paramount importance for
several reasons:
 Efficiency and Cost Savings: By sharing resources like storage,
computing power, and data, distributed systems maximize utilization
and minimize waste, leading to significant cost reductions.
 Scalability: Distributed systems can easily scale by adding more
nodes, which share the workload and resources, ensuring the system
can handle increased demand without a loss in performance.
 Reliability and Redundancy: Resource sharing enhances system
reliability and fault tolerance. If one node fails, other nodes can take
over, ensuring continuous operation.
 Collaboration and Innovation: Resource sharing facilitates
collaboration among geographically dispersed teams, fostering
innovation by providing access to shared tools, data, and
computational resources.
 Load Balancing: Efficient distribution of workloads across multiple
nodes prevents any single node from becoming a bottleneck, ensuring
balanced performance and preventing overloads.
Types of Resources in Distributed Systems
In distributed systems, resources are diverse and can be broadly
categorized into several types:
 Computational Resources: These include CPU cycles and
processing power, which are shared among multiple users and
applications to perform various computations and processing tasks.
 Storage Resources: Distributed storage systems allow data to be
stored across multiple nodes, ensuring data availability, redundancy,
and efficient access.
 Memory Resources: Memory can be distributed and shared across
nodes, allowing applications to utilize a larger pool of memory than
what is available on a single machine.
 Network Resources: These include bandwidth and network interfaces,
which facilitate communication and data transfer between nodes in a
distributed system.
 Data Resources: Shared databases, files, and data streams that are
accessible by multiple users and applications for reading and writing
operations.
 Peripheral Devices: Devices such as printers, scanners, and
specialized hardware that can be accessed remotely within the
distributed network.
Resource Sharing Mechanisms
Resource sharing in distributed systems is facilitated through various
mechanisms designed to optimize utilization, enhance collaboration, and
ensure efficiency. Some common mechanisms include:
1. Client-Server Architecture: A classic model where clients request services or resources from centralized servers. This architecture centralizes resources and services, providing efficient access but potentially leading to scalability and reliability challenges.
2. Peer-to-Peer (P2P) Networks: Distributed networks where each node
can act as both a client and a server. P2P networks facilitate direct
resource sharing between nodes without reliance on centralized
servers, promoting decentralized and scalable resource access.
3. Distributed File Systems: Storage systems that distribute files across
multiple nodes, ensuring redundancy and fault tolerance while allowing
efficient access to shared data.
4. Load Balancing: Mechanisms that distribute workload across multiple
nodes to optimize resource usage and prevent overload on individual
nodes, thereby improving performance and scalability.
5. Virtualization: Techniques such as virtual machines (VMs) and
containers that abstract physical resources, enabling efficient resource
allocation and utilization across distributed environments.
6. Caching: Storing frequently accessed data closer to users or
applications to reduce latency and improve responsiveness, enhancing
overall system performance.
7. Replication: Creating copies of data or resources across multiple nodes to ensure data availability, fault tolerance and improved access speed (a small sketch of this mechanism follows the list).
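To make the replication mechanism concrete, here is a hedged, in-memory sketch: writes go to every replica and reads fall over to the next replica when one is down. The three-node setup and the simulated failure are invented for illustration.

    # In-memory sketch of replication: write to all replicas, read with failover.
    class Replica:
        def __init__(self, name):
            self.name, self.store, self.alive = name, {}, True

        def put(self, key, value):
            if self.alive:
                self.store[key] = value

        def get(self, key):
            if not self.alive:
                raise ConnectionError(f"{self.name} is down")
            return self.store[key]

    replicas = [Replica(f"node{i}") for i in range(3)]

    def replicated_put(key, value):
        for r in replicas:              # replicate the write to every node
            r.put(key, value)

    def replicated_get(key):
        for r in replicas:              # fail over to the next replica if one is down
            try:
                return r.get(key)
            except ConnectionError:
                continue
        raise RuntimeError("no replica available")

    replicated_put("config", "v1")
    replicas[0].alive = False           # simulate a node failure
    print(replicated_get("config"))     # still "v1", from a surviving replica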
Best Architectures for Resource Sharing in
Distributed System
The best architectures for resource sharing in distributed systems depend
on the specific requirements and characteristics of the system. Here are
some commonly adopted architectures that facilitate efficient resource
sharing:
1. Client-Server Architecture:
 Advantages: Centralized management simplifies resource
allocation and access control. It is suitable for applications where
clients primarily consume services or resources from centralized
servers.
 Use Cases: Web applications, databases, and enterprise systems
where centralized control and management are critical.
2. Peer-to-Peer (P2P) Architecture:
 Advantages: Decentralized nature facilitates direct resource
sharing between peers without dependency on centralized servers,
enhancing scalability and fault tolerance.
 Use Cases: File sharing, content distribution networks (CDNs), and
collaborative computing environments.
3. Service-Oriented Architecture (SOA):
 Advantages: Organizes services as reusable components that can
be accessed and shared across distributed systems, promoting
interoperability and flexibility.
 Use Cases: Enterprise applications, where modular services such
as authentication, messaging, and data access are shared across
different departments or systems.
4. Microservices Architecture:
 Advantages: Decomposes applications into small, independent
services that can be developed, deployed, and scaled
independently. Each microservice can share resources selectively,
optimizing resource usage.
 Use Cases: Cloud-native applications, where scalability, agility, and
resilience are paramount.
5. Distributed File System Architecture:
 Advantages: Distributes file storage across multiple nodes,
providing redundancy, fault tolerance, and efficient access to shared
data.
 Use Cases: Large-scale data storage and retrieval systems, such
as Hadoop Distributed File System (HDFS) for big data processing.
6. Container Orchestration Architectures (e.g., Kubernetes):
 Advantages: Orchestrates containers across a cluster of nodes,
facilitating efficient resource utilization and management of
applications in distributed environments.
 Use Cases: Cloud-native applications, where scalability, portability,
and resource efficiency are critical.
Choosing the best architecture involves considering factors such as
scalability requirements, fault tolerance, performance goals, and the
nature of applications and services being deployed.
Resource Allocation Strategies in Distributed
System
Resource allocation strategies in distributed systems are crucial for
optimizing performance, ensuring fairness, and maximizing resource
utilization. Here are some common strategies:
1. Static Allocation:
 Description: Resources are allocated based on fixed, predetermined
criteria without considering dynamic workload changes.
 Advantages: Simple to implement and manage, suitable for
predictable workloads.
 Challenges: Inefficient when workload varies or when resources are
underutilized during low-demand periods.
2. Dynamic Allocation:
 Description: Resources are allocated based on real-time demand and
workload conditions.
 Advantages: Maximizes resource utilization by adjusting allocations
dynamically, responding to varying workload patterns.
 Challenges: Requires sophisticated monitoring and management
mechanisms to handle dynamic changes effectively.
3. Load Balancing:
 Description: Distributes workload evenly across multiple nodes or
resources to optimize performance and prevent overload.
 Strategies: Round-robin scheduling, least connection method, and weighted distribution based on resource capacities (see the sketch after this list).
 Advantages: Improves system responsiveness and scalability by
preventing bottlenecks.
 Challenges: Overhead of monitoring and adjusting workload
distribution.
4. Reservation-Based Allocation:
 Description: Resources are reserved in advance based on anticipated
future demand or specific application requirements.
 Advantages: Guarantees resource availability when needed, ensuring
predictable performance.
 Challenges: Potential resource underutilization if reservations are not
fully utilized.
5. Priority-Based Allocation:
 Description: Assigns priorities to different users or applications,
allowing higher-priority tasks to access resources before lower-priority
tasks.
 Advantages: Ensures critical tasks are completed promptly, maintaining
service-level agreements (SLAs).
 Challenges: Requires fair prioritization policies to avoid starvation of
lower-priority tasks.
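Two of the load-balancing strategies named above, round-robin scheduling and the least connection method, can be sketched in a few lines; the node names and connection counts below are invented for the example.

    # Minimal sketches of two load-balancing strategies from the list above.
    import itertools

    nodes = ["node-a", "node-b", "node-c"]           # invented node names
    active = {n: 0 for n in nodes}                   # open connections per node

    rr = itertools.cycle(nodes)                      # round-robin scheduling

    def pick_round_robin():
        return next(rr)                              # each node in turn

    def pick_least_connections():
        return min(active, key=active.get)           # node with fewest connections

    for _ in range(4):
        print("round-robin ->", pick_round_robin())

    active.update({"node-a": 5, "node-b": 1, "node-c": 3})
    print("least-connections ->", pick_least_connections())   # picks node-b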
Challenges in Resource Sharing in Distributed
System
Resource sharing in distributed systems presents several challenges that
need to be addressed to ensure efficient operation and optimal
performance:
 Consistency and Coherency: Ensuring that shared resources such as data or files remain consistent across distributed nodes despite concurrent accesses and updates.
 Concurrency Control: Managing simultaneous access and updates to shared resources to prevent conflicts and maintain data integrity.
 Fault Tolerance: Ensuring resource availability and continuity of service in the event of node failures or network partitions.
 Scalability: Efficiently managing and scaling resources to accommodate increasing demands without compromising performance.
 Load Balancing: Distributing workload and resource usage evenly across distributed nodes to prevent bottlenecks and optimize resource utilization.
 Security and Privacy: Safeguarding shared resources against unauthorized access and data breaches, and ensuring privacy compliance.
 Communication Overhead: Minimizing the overhead and latency associated with communication between distributed nodes accessing shared resources.
 Synchronization: Coordinating activities and maintaining synchronization between distributed nodes to ensure consistent and coherent resource access.

Case Study: WWW

System Models in DS
Interprocess Communication in Distributed
Systems
Interprocess Communication (IPC) in distributed systems is crucial
for enabling processes across different nodes to exchange data and
coordinate activities. This article explores various IPC methods, their
benefits, and challenges in modern distributed computing
environments.
What is Interprocess Communication in a
Distributed system?
Interprocess communication in a distributed system is the process of exchanging data between two or more independent processes in a distributed environment. Interprocess communication on the internet provides both datagram and stream communication.
Characteristics of Inter-process Communication in
Distributed Systems
There are mainly five characteristics of inter-process communication in a
distributed environment/system.
 Synchronous System Calls: In synchronous system calls, both sender and receiver use blocking system calls to transmit the data, which means the sender will wait until the acknowledgment is received from the receiver, and the receiver waits until the message arrives.
 Asynchronous System Calls: In asynchronous system calls, both sender and receiver use non-blocking system calls to transmit the data, which means the sender doesn't wait for the receiver's acknowledgment.
 Message Destination: A local port is a message destination within a computer, specified as an integer. A port has exactly one receiver but many senders. Processes may use multiple ports from which to receive messages. Any process that knows the number of a port can send a message to it.
 Reliability: It is defined as validity and integrity.
 Integrity: Messages must arrive at the destination without corruption or duplication.
Types of Interprocess Communication in
Distributed Systems
Below are the types of interprocess communication (IPC) commonly used
in distributed systems:
 Message Passing:
o Definition: Message passing involves processes
communicating by sending and receiving messages.
Messages can be structured data packets containing
information or commands.
o Characteristics: It is a versatile method suitable for both
synchronous and asynchronous communication. Message
passing can be implemented using various protocols such as
TCP/IP, UDP, or higher-level messaging protocols like AMQP
(Advanced Message Queuing Protocol) or MQTT (Message
Queuing Telemetry Transport).
 Remote Procedure Calls (RPC):
o Definition: RPC allows one process to invoke a procedure (or
function) in another process, typically located on a different
machine over a network.
o Characteristics: It abstracts the communication between
processes by making it appear as if a local procedure call is
being made. RPC frameworks handle details like parameter
marshalling, network communication, and error handling.
 Sockets:
o Definition: Sockets provide a low-level interface for network
communication between processes running on different
computers.
o Characteristics: They allow processes to establish connections,
send data streams (TCP) or datagrams (UDP), and receive
responses. Sockets are fundamental for implementing higher-
level communication protocols.
 Message Queuing Systems:
o Description: Message queuing systems facilitate asynchronous
communication by allowing processes to send messages to
and receive messages from queues.
o Characteristics: They decouple producers (senders) and
consumers (receivers) of messages, providing fault tolerance,
scalability, and persistence of messages. Examples
include Apache Kafka, RabbitMQ, and AWS SQS.
 Publish-Subscribe Systems:
o Description: Publish-subscribe (pub-sub) systems enable
communication between components without requiring them
to directly know each other.
o Characteristics: Publishers publish messages to topics, and
subscribers receive messages based on their interest in
specific topics. This model supports one-to-many
communication and is scalable for large-scale distributed
systems. Examples include MQTT and Apache Pulsar (a minimal in-process sketch appears after this section).
These types of IPC mechanisms each have distinct advantages and are
chosen based on factors such as communication requirements,
performance considerations, and the nature of the distributed system
architecture. Successful implementation often involves selecting the most
suitable IPC type or combination thereof to meet specific application
needs.
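As a hedged illustration of the publish-subscribe type above, here is an in-process sketch; real systems such as MQTT or Apache Pulsar do the same thing across networks, adding brokers, persistence and delivery guarantees.

    # In-process sketch of publish-subscribe: publishers and subscribers are
    # decoupled and share only topic names, never direct references.
    from collections import defaultdict

    subscribers = defaultdict(list)    # topic -> list of callback functions

    def subscribe(topic, callback):
        subscribers[topic].append(callback)

    def publish(topic, message):
        for callback in subscribers[topic]:    # one-to-many delivery
            callback(message)

    subscribe("orders", lambda m: print("billing saw:", m))
    subscribe("orders", lambda m: print("shipping saw:", m))
    publish("orders", {"id": 42, "item": "widget"})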
Benefits of Interprocess Communication in
Distributed Systems
Below are the benefits of IPC in Distributed Systems:
 Facilitates Communication:
o IPC enables processes or components distributed across
different nodes to communicate seamlessly.
o This allows for building complex distributed applications where
different parts of the system can exchange information and
coordinate their activities.
 Integration of Heterogeneous Systems:
o IPC mechanisms provide a standardized way for integrating
heterogeneous systems and platforms.
o Processes written in different programming languages or
running on different operating systems can communicate
using common IPC protocols and interfaces.
 Scalability:
o Distributed systems often need to scale horizontally by adding
more nodes or instances.
o IPC mechanisms, especially those designed for distributed
environments, can facilitate scalable communication patterns
such as publish-subscribe or message queuing, enabling
efficient scaling without compromising performance.
 Fault Tolerance and Resilience:
o IPC techniques in distributed systems often include
mechanisms for handling failures and ensuring resilience.
o For example, message queues can buffer messages during
network interruptions, and RPC frameworks can retry failed
calls or implement failover strategies.
 Performance Optimization:
o Effective IPC can optimize performance by minimizing latency
and overhead associated with communication between
distributed components.
o Techniques like shared memory or efficient message passing
protocols help in achieving low-latency communication.
Challenges of Interprocess Communication in
Distributed Systems
Below are the challenges of IPC in Distributed Systems:
 Network Latency and Bandwidth:
o Distributed systems operate over networks where latency
(delay in transmission) and bandwidth limitations can affect
IPC performance.
o Minimizing latency and optimizing bandwidth usage are critical
challenges, especially for real-time applications.
 Reliability and Consistency:
o Ensuring reliable and consistent communication between
distributed components is challenging.
o IPC mechanisms must handle network failures, message loss,
and out-of-order delivery while maintaining data consistency
across distributed nodes.
 Security:
o Securing IPC channels against unauthorized access,
eavesdropping, and data tampering is crucial.
o Distributed systems often transmit sensitive data over
networks, requiring robust encryption, authentication, and
access control mechanisms.
 Complexity in Error Handling:
o IPC errors, such as network timeouts, connection failures, or
protocol mismatches, must be handled gracefully to maintain
system stability.
o Designing robust error handling and recovery mechanisms
adds complexity to distributed system implementations.
 Synchronization and Coordination:
o Coordinating actions and ensuring synchronization between
distributed components can be challenging, especially when
using shared resources or implementing distributed
transactions.
o IPC mechanisms must support synchronization primitives and
consistency models to avoid race conditions and ensure data
integrity.
Example of Interprocess Communication in
Distributed System
Let’s consider a scenario to understand the Interprocess Communication
in Distributed System:
Consider a distributed system where you have two processes running on
separate computers, a client process (Process A) and a server process
(Process B). The client process needs to request information from the
server process and receive a response.
IPC Example using Remote Procedure Calls (RPC):
1. RPC Setup:
 Process A (Client): Initiates an RPC call to Process B (Server).
 Process B (Server): Listens for incoming RPC requests and
responds accordingly.
2. Steps Involved:
 Client-side (Process A):
o The client process prepares an RPC request, which
includes the name of the remote procedure to be called
and any necessary parameters.
o It sends this request over the network to the server
process.
 Server-side (Process B):
o The server process (Process B) listens for incoming RPC
requests.
o Upon receiving an RPC request from Process A, it
executes the requested procedure using the provided
parameters.
o After processing the request, the server process prepares
a response (if needed) and sends it back to the client
process (Process A) over the network.
3. Communication Flow:
 Process A and Process B communicate through the RPC
framework, which manages the underlying network communication
and data serialization.
 The RPC mechanism abstracts away the complexities of network
communication and allows the client and server processes to
interact as if they were local.
4. Example Use Case:
 Process A (Client) could be a web application requesting user data
from a database hosted on Process B (Server).
 Process B (Server) receives the request, queries the database,
processes the data, and sends the results back to Process A
(Client) via RPC.
 The client application then displays the retrieved data to the user.
In this example, RPC serves as the IPC mechanism facilitating
communication between the client and server processes in a distributed
system. It allows processes running on different machines to collaborate
and exchange data transparently, making distributed computing more
manageable and scalable.
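The same client-server RPC flow can be demonstrated with Python's standard xmlrpc modules, which implement this pattern directly; the procedure name, the sample data and the port below are invented for the sketch.

    # RPC sketch: the server (Process B) registers a procedure; the client
    # (Process A) calls it as if it were a local function.
    import threading
    from xmlrpc.server import SimpleXMLRPCServer
    import xmlrpc.client

    def get_user(user_id):
        # Stand-in for the server-side database lookup (invented data).
        return {"id": user_id, "name": "Alice"}

    server = SimpleXMLRPCServer(("127.0.0.1", 8000), logRequests=False)
    server.register_function(get_user)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:8000")
    print(proxy.get_user(7))   # marshalling and network I/O are handled for us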

What is an API (Application Programming Interface)?
Web development has been one of the most in-demand and best-paying career paths for years, and anyone getting into it must be aware of its most important terms. Among them, API is a term that plays a very important role in building a website. So, what is an API (Application Programming Interface)?
To make the idea clear, take a real-life example of an API: you can think of an API as a waiter in a restaurant who listens to your order request, goes to the chef, takes the food items ordered and comes back to you with the order. For an example of an API at work, suppose you're searching for a course (let's say DSA-Self Paced) on the XYZ website. You send a request (a product search) through an API, and the database checks whether the course is available; the API is responsible for carrying your request to the database and responding with the output (the matching courses).
What is an API?
API stands for Application Programming Interface: a collection of communication protocols and subroutines used by various programs to communicate between them. A programmer can make use of various API tools to make their program easier and simpler. An API also gives programmers an efficient way to develop their software programs. In other words, an API helps two programs or applications communicate with each other by providing them with the necessary tools and functions. It takes a request from the user, sends it to the service provider, and then returns the result generated by the service provider to the user.
A developer extensively uses APIs in their software to implement various features by using an API call, without writing complex code for the same. We can create an API for an operating system, database system, hardware system, JavaScript file or similar object-oriented files. Also, an API is similar to a GUI (Graphical User Interface), with one major difference: unlike GUIs, an application programming interface helps software developers access web tools, while a GUI helps make a program easier for users to understand.
APIs are the building blocks of today's websites, in which heavy data is transferred between the client and server and vice versa.
How do APIs Work?
The working of an API can be explained with a few simple steps. Think of a client-server architecture where the client sends a request via a medium to the server and receives the response through the same medium. An API acts as the communication medium between the two programs or systems. The client is the user/customer (who sends the request), the medium is the application programming interface, and the server is the backend (where the request is accepted and a response is provided). The steps followed in the working of an API are:
 The client initiates the request via the API's URI (Uniform Resource Identifier)
 The API makes a call to the server after receiving the request
 Then the server sends the response back to the API with the
information
 Finally, the API transfers the data to the client
APIs are considered safe against attacks, as they include authorization credentials and an API gateway to limit access and minimize security threats. To provide additional security layers to the data, HTTP headers, query string parameters or cookies are used.
If we talk about architectures, the main API architectures are:
 REST (Representational State Transfer)
 SOAP (Simple Object Access Protocol)
SOAP defines a standard communication protocol for exchanging messages in XML (Extensible Markup Language), while REST is an architectural style whose messages are commonly exchanged as JSON or XML over HTTP.
How is an API Different From a Web Application?
An API acts as an interface that allows proper communication between
two programs whereas a web application is a network-based resource
responsible for completing a single task. Also, it’s important to know
that “All web services are APIs, but not all APIs are web services”.
The difference between an API and a web application is that API allows
two-way communication and web applications are just a way for users to
interact through a web browser. A web application may have an API to
complete the requests.
Types of APIs
There are three basic forms of API –
1. WEB APIs
A Web API, also called a web service, is an API extensively used over the web that can be easily accessed using the HTTP protocol. A web application programming interface is an open interface that can be used by a large number of clients through their phones, tablets or PCs.
2. LOCAL APIs
In this type of API, the programmers get the local middleware services.
TAPI (Telephony Application Programming Interface), and .NET are
common examples of Local APIs.
3. PROGRAM APIs
It makes a remote program appear to be local by making use of RPCs (Remote Procedure Calls). SOAP is a well-known example of this type of API.
Few other types of APIs:
 SOAP (SIMPLE OBJECT ACCESS PROTOCOL): It defines messages in XML
format used by web applications to communicate with each other.
 REST (Representational State Transfer): It makes use of HTTP to GET,
POST, PUT, or DELETE data. It is basically used to take advantage of
the existing data.
 JSON-RPC: It uses JSON for data transfer and is a lightweight remote procedure call protocol defining only a few data structure types.
 XML-RPC: It is based on XML and uses HTTP for data transfer. This API
is widely used to exchange information between two or more networks.
What are REST APIs?
REST stands for Representational State Transfer, and follows the
constraints of REST architecture allowing interaction with RESTful web
services. It defines a set of functions (GET, PUT, POST, DELETE) that
clients use to access server data. The functions used are:
 GET (retrieve a record)
 PUT (update a record)
 POST (create a record)
 DELETE (delete the record)
Its main feature is that REST API is stateless, i.e., the servers do not save
clients’ data between requests.
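A hedged sketch of these functions in use with the standard library; the endpoint URL and payload are invented placeholders that assume some REST service is listening there, so treat this as illustration rather than a working call.

    # Sketch of the REST verbs; the endpoint URL and payload are invented.
    import json
    import urllib.request

    BASE = "http://127.0.0.1:8080/api/records"   # placeholder endpoint

    def rest_call(method, url, payload=None):
        data = json.dumps(payload).encode() if payload is not None else None
        req = urllib.request.Request(url, data=data, method=method,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:   # server keeps no client state
            return resp.status, resp.read().decode()

    print(rest_call("GET", BASE + "/1"))                      # retrieve a record
    print(rest_call("POST", BASE, {"name": "demo"}))          # create a record
    print(rest_call("PUT", BASE + "/1", {"name": "demo2"}))   # update a record
    print(rest_call("DELETE", BASE + "/1"))                   # delete the record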
What is a Web API?
A Web API is simply an API for the web. It is an API that can be accessed using the HTTP protocol. It can be built using Java, .NET, etc. It is implemented to extend the functionality of a browser, simplify complex functions, and provide easy syntax to complex code.
The four main types of web APIs are:
 Open API
 Partner API
 Internal API
 Composite API
SOAP vs. REST
 SOAP (Simple Object Access Protocol) is a protocol with specific requirements like XML messaging; REST (Representational State Transfer) is a set of guidelines (architectural style) offering flexible implementation.
 SOAP is heavier and needs more bandwidth; REST is lightweight and needs less bandwidth.
 SOAP defines its own security; REST inherits security from the underlying transport.
 SOAP permits XML-based data format only; REST permits different data formats such as plain text, HTML, XML, JSON, etc.
 SOAP calls cannot be cached; REST calls can be cached.
Also, a major difference is that SOAP cannot make use of REST, whereas REST can make use of SOAP.
What is API (Application Programming Interface)
Integration?
API (Application Programming Interface) integration is the connection between two or more applications via their APIs, letting the applications exchange data. It is a medium through which applications share data and communicate with each other, allowing web tools to work together. Due to the rise in cloud-based products, API integration has become very important.
What is API (Application Programming Interface)
Testing?
API (Application Programming Interface) testing is a kind of software testing that analyzes an API in terms of its functionality, security, performance, and reliability. It is very important to test an API to check whether it is working as expected; if it is not, changes are made to the architecture and re-verified.
APIs are at the center of software development for exchanging data across applications. API testing involves sending requests to one or more API endpoints and validating the response. It focuses mainly on business logic, data responses, security, and performance bottlenecks.
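A minimal sketch of such a check in Java follows. The /health endpoint and the expected body are an assumed contract, and a real suite would run many such checks, typically through one of the tools listed further below.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ApiSmokeTest {
    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/health")) // hypothetical endpoint
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Validate the response against the assumed contract
        if (response.statusCode() != 200) {
            throw new AssertionError("expected 200, got " + response.statusCode());
        }
        if (!response.body().contains("\"status\":\"up\"")) { // assumed body fragment
            throw new AssertionError("unexpected body: " + response.body());
        }
        System.out.println("API responded as expected");
    }
}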
Types of Testing:
 Unit Testing
 Integration Testing
 Security Testing
 Performance Testing
 Functional Testing
API Testing Tools:
 Postman
 Apigee
 JMeter
 Ping API
 SoapUI
 vREST
How to Create APIs?
Creating an API is an easy task if you are clear on the basic concepts. It is an iterative process (based on feedback) that includes just a few steps:
 Plan your goal and the intended users
 Design the API architecture
 Develop (implement the code) and test the API
 Monitor how it works and act on feedback
Restrictions of Using APIs
When an API (Application Programming Interface) is made, it is not really released as software for download; it has policies governing or restricting its use. There are usually three main types of policies governing APIs:
 Private: These APIs are made only for a single person or entity (such as a company that has spent the resources to make it or bought it).
 Partner: As the name suggests, the entity that owns the API grants some of its partners the authority to use it.
 Public: These APIs are openly available for anyone's use, with no need to obtain access from the owning entity. An example of a public API is the 'Windows API' by Microsoft; for more public APIs, you can visit this GitHub repository -> https://github.com/public-apis/public-apis .
Advantages of APIs
 Efficiency: APIs produce quicker, more efficient, and more reliable results than the outputs produced by humans in an organization.
 Flexible delivery of services: APIs provide fast and flexible delivery of services according to developers' requirements.
 Integration: The best feature of APIs is that they allow the movement of data between various sites, enhancing the integrated user experience.
 Automation: As APIs are operated by computers rather than humans, they produce better and more automated results.
 New functionality: While using APIs, developers discover new tools and functionality for API exchanges.
Disadvantages of APIs
 Cost: Developing and implementing an API can be costly and may require high maintenance and support from developers.
 Security issues: Using an API adds another attack surface, so security risks are a common problem with APIs.
Conclusion
By now, you should have a clear idea of what an API is, how it works, its types, the testing tools used, and so on. After understanding these concepts, you can try working with them by implementing some of them in projects. Beyond theoretical knowledge, you should also gain a practical grasp by working with APIs, as developers must understand APIs deeply in order to implement them.
Application Program Interface (API) for internet protocols

The API is a programming interface between application programs and communication subsystems based on open network protocols. The API lets any application program operating in its own MVS address space access and use communication services provided by an MVS subsystem that implements this interface. TCP access, which provides communication services using TCP/IP protocols, is an example of such a subsystem.
This programmer's reference describes an interface to the transport layer of the Basic Reference Model of Open Systems Interconnection (OSI). Although the API is capable of interfacing to proprietary protocols, the Internet open network protocols are the intended providers of the transport service. This document uses the term "open" to emphasize that any system conforming to one of these standards can communicate with any other system conforming to the same standard, regardless of vendor. These protocols are contrasted with proprietary protocols, which generally support a closed community of systems supplied by a single vendor.
Data Representation & Marshalling

The information stored in running programs is represented as data structures – for example, by sets of interconnected objects – whereas the information in messages consists of sequences of bytes. Irrespective of the form of communication used, the data structures must be flattened (converted to a sequence of bytes) before transmission and rebuilt on arrival.
The individual primitive data items transmitted in messages can be data values of many different types, and not all computers store primitive values such as integers in the same order. The representation of floating-point numbers also differs between architectures. To support RMI or RPC, any data type that can be passed as an argument or returned as a result must be able to be flattened, and the individual primitive data values must be represented in an agreed format.
External data representation – an agreed standard for the representation of data structures and primitive values.

Marshalling – the process of taking a collection of data items and assembling them into a form suitable for transmission in a message.

Unmarshalling – the process of disassembling them on arrival into an equivalent representation at the destination.

Marshalling and unmarshalling are intended to be carried out by the middleware layer. (Reference: George Coulouris et al., Distributed Systems: Concepts and Design.)
Group Communication

What is a group?
 A number of processes which cooperate to provide a service.
 An abstract identity used to name a collection of processes.

Group communication: communication for coordination among the processes of a group.

Who needs group communication?
 Highly available servers (client-server)
 Database replication
 Multimedia conferencing
 Online games
 Cluster management
Client Server Communication

Client and server communication takes place when both are connected to each other via a network. The client and the server are two individual computing systems, each having its own operating system, applications, and functions. When connected via a network, they are able to share their applications with each other.

It is not necessary that the client and server use the same platform as their operating system; many varied operating systems can be connected with each other for advanced communication using a communication protocol. The responsibility of implementing the communication protocol lies with an application known as communication software.

Using the features of communication software, the client and server can exchange files and data for effective communication. The process of communication between client and server can be explained as follows (a minimal sketch follows the list):
 Data resides in the server.

 The client system sends a query to the server.

 The server searches the data for the information to be exchanged.

 The server sends the requested data in the form of a final result.
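The following is a minimal Java sketch of these four steps over a TCP socket on a single machine; the in-memory map standing in for the server's data, the port number 5050, and the query are all illustrative.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Map;

public class ClientServerDemo {
    public static void main(String[] args) throws Exception {
        Map<String, String> data = Map.of("alice", "[email protected]"); // data resides in the server

        ServerSocket listener = new ServerSocket(5050); // arbitrary port
        Thread server = new Thread(() -> {
            try (Socket conn = listener.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
                 PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
                String query = in.readLine();                       // the client's query arrives
                out.println(data.getOrDefault(query, "NOT FOUND")); // server searches and replies
            } catch (Exception e) { e.printStackTrace(); }
        });
        server.start();

        try (Socket socket = new Socket("localhost", 5050);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()))) {
            out.println("alice");                                   // client sends a query
            System.out.println("final result: " + in.readLine());   // client receives the result
        }
        server.join();
        listener.close();
    }
}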
External Data Representation and Marshalling

In a distributed system, the connected computers in a network need to collaborate concurrently to provide a particular service. Computers exchange messages to communicate within the network for this collaborative work. Interprocess communication is an important part of distributed computing, in which two processes communicate with each other to share data. The two processes may be on the same machine or on different machines. When communication happens between processes, the data types of the transmitted data differ according to the platform each process is running on. This is the situation where external data representation and marshalling come into action.

An agreed standard for the representation of data structures and primitive values is said to be an external data representation.

The information stored in running programs is represented


as data structures whereas the information in messages
consists of sequences of bytes.Whatever form of
communication used, the data structures must be
converted to a sequence of bytes before transmission and
rebuilt the sequence of bytes to data structure while
receiving. The individual primitive data items transmitted
in messages can be data values of many different types,
and not all computers store primitive values as it is,
difference in platform language etc., have different type of
data value ranges and storing patterns. The representation
of floating-point numbers also differs between
architectures.
When we examine the storage of integers in memory, there are two variants for the ordering of bytes: big-endian order, in which the most significant byte comes first, and little-endian order, in which it comes last.
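A small Java sketch makes the difference visible (the value 0x0A0B0C0D is arbitrary):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndiannessDemo {
    public static void main(String[] args) {
        int value = 0x0A0B0C0D;

        // The same integer flattened under the two byte orders
        byte[] big = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(value).array();
        byte[] little = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();

        System.out.printf("big-endian:    %02x %02x %02x %02x%n", big[0], big[1], big[2], big[3]);
        // -> 0a 0b 0c 0d (most significant byte first)
        System.out.printf("little-endian: %02x %02x %02x %02x%n", little[0], little[1], little[2], little[3]);
        // -> 0d 0c 0b 0a (most significant byte last)
    }
}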

Another issue is the set of codes used to represent characters: for example, the majority of applications on systems such as UNIX use ASCII character coding, taking one byte per character, whereas the Unicode standard allows for the representation of texts in many different languages and takes two bytes per character.
As we have seen, for communication between two computers or two processes, the transmitted data must be converted so that both computers can understand it. The conversion can be done in one of two ways:

 The values are converted to an agreed external format before transmission and converted to the local form on receipt. If the two computers are known to be of the same type, the conversion to the external format can be omitted.

 The values are transmitted in the sender's format, together with an indication of the format used, and the recipient converts the values if necessary.
To support RMI (remote method invocation) or RPC (remote procedure call), any data type that can be passed as an argument or returned as a result must be able to be flattened, and the individual primitive data values must be represented in an agreed format. A standardized, widely agreed data representation needs to be followed.
What is Marshalling?

Marshalling is the process of taking a collection of data items and assembling them into a form suitable for transmission in a message.

"In computer science, marshalling or marshaling is the process of transforming the memory representation of an object to a data format suitable for storage or transmission, and it is typically used when data must be moved between different parts of a computer program or from one program to another." (Wikipedia)
Unmarshalling is the process of disassembling the data items on arrival to produce an equivalent collection of data items at the destination. Thus marshalling consists of the translation of structured data items and primitive values into an external data representation.
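As a hand-written illustration of the idea, the following Java sketch flattens a (name, year) pair into an agreed byte format and rebuilds it. The record and its format are invented for illustration; middleware normally generates such code automatically.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;

public class MarshallingDemo {
    public static void main(String[] args) throws Exception {
        // Marshalling: structured data -> sequence of bytes in an agreed format
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF("Smith"); // length-prefixed string in a standard encoding
            out.writeInt(1984);    // DataOutputStream always writes big-endian
        }
        byte[] message = buffer.toByteArray(); // suitable for transmission in a message

        // Unmarshalling: sequence of bytes -> equivalent data items at the destination
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(message))) {
            String name = in.readUTF();
            int year = in.readInt();
            System.out.println(name + ", " + year); // Smith, 1984
        }
    }
}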
As we have seen, communication between processes in inter-process communication needs a recognized data representation between the processes; marshalling happens in the middle, between the running program and transmission.
There are several approaches to external data representation and marshalling, such as CORBA's CDR, Java's object serialization, and XML (Extensible Markup Language).
CORBA's CDR (Common Data Representation)

CORBA CDR is the external data representation defined with CORBA 2.0. CDR can represent all of the data types that can be used as arguments and return values in remote invocations in the CORBA architecture. These consist of 15 primitive types and 'any' (which can represent any basic or constructed type), together with a range of composite types. Each argument or result in a remote invocation is represented by a sequence of bytes in the invocation or result message.
Primitive types: CDR defines a representation for both big-endian and little-endian orderings. Values are transmitted in the sender's ordering, which is specified in each message. The recipient translates if it requires a different ordering.

Constructed types: The primitive values that comprise each constructed type are added to a sequence of bytes in a particular order.
(Figure: CORBA CDR for constructed types.)
The type of a data item is not given with the data representation in the message in CORBA CDR. This is because it is assumed that the sender and receiver have common knowledge of the order and types of the data items in a message. In particular, for RMI or RPC, each method invocation passes arguments of particular types, and the result is a value of a particular type.
Marshalling in CORBA

Marshalling operations can be generated automatically from the specification of the types of the data items to be transmitted in a message. The types of the data structures and the types of the basic data items are described in CORBA IDL, which provides a notation for describing the types of the arguments and results of RMI methods. The CORBA interface compiler generates appropriate marshalling and unmarshalling operations for the arguments and results of remote methods from the definitions of the types of their parameters and results.
Java object serialization

Java's object serialization is concerned with the flattening and external data representation of any single object or tree of objects that may need to be transmitted in a message or stored on disk. It is for use only by Java. In Java RMI, both objects and primitive data values may be passed as arguments and results of method invocations. An object is an instance of a class. Stating that a class implements the Serializable interface, which is provided in the java.io package, has the effect of allowing its instances to be serialized.
In Java, serialization refers to the activity of flattening an object or a connected set of objects into a serial form that is suitable for storing on disk or transmitting in a message. Deserialization consists of restoring the state of an object or set of objects from their serialized form. It is assumed that the process that does the deserialization has no prior knowledge of the types of the objects in the serialized form; therefore, some information about the class of each object is included in the serialized form. This information enables the recipient to load the appropriate class when an object is deserialized.
Serialization and deserialization of the arguments and results of remote invocations are generally carried out automatically by the middleware, without any participation by the application programmer. If necessary, programmers with special requirements may write their own versions of the methods that read and write objects. Another way in which a programmer may modify the effects of serialization is by declaring variables that should not be serialized as transient.
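A minimal Java sketch follows; the Person class and its fields are invented for illustration, and the transient field shows the exclusion mechanism described above.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationDemo {
    static class Person implements Serializable {
        String name = "Smith";
        int year = 1984;
        transient String cachedGreeting = "hello"; // excluded from the serialized form
    }

    public static void main(String[] args) throws Exception {
        // Serialization: flatten the object into a byte sequence
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new Person());
        }

        // Deserialization: class information included in the stream lets the
        // receiver load the appropriate class and restore an equivalent object
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            Person copy = (Person) in.readObject();
            System.out.println(copy.name + ", " + copy.year); // Smith, 1984
            System.out.println(copy.cachedGreeting);          // null (transient)
        }
    }
}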
XML (Extensible Markup Language)

XML is a markup language that was defined by the World Wide Web Consortium (W3C) for general use on the Web. XML was designed for writing structured documents for the Web. XML data items are tagged with 'markup' strings. The tags are used to describe the logical structure of the data and to associate attribute-value pairs with logical structures.

XML is used to enable clients to communicate with web services and for defining the interfaces and other properties of web services. However, XML is also used in many other ways, including in archiving and retrieval systems; although an XML archive may be larger than a binary one, it has the advantage of being readable on any computer.

XML is extensible in the sense that users can define their own tags, in contrast to HTML, which uses a fixed set of tags. However, if an XML document is intended to be used by more than one application, then the names of the tags must be agreed between them. When an XML representation is used to share data, the defined tags must be shared between the computers.
XML was intended to be used by multiple applications for different purposes. The provision of tags, together with the use of namespaces to define the meaning of the tags, has made this possible. In addition, the use of tags enables applications to select just those parts of a document they need to process: they will not be affected by the addition of information relevant to other applications.
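The following Java sketch shows tagged data and an application selecting only the elements it needs; the <person> structure and its tag names are invented for illustration and would, in practice, be agreed between the communicating applications.

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlDemo {
    public static void main(String[] args) throws Exception {
        // Illustrative tagged data: tags describe the logical structure,
        // and the id attribute is an attribute-value pair on <person>
        String xml = "<person id=\"123456789\">"
                   + "<name>Smith</name>"
                   + "<place>London</place>"
                   + "</person>";

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Element person = doc.getDocumentElement();
        // The application selects just the parts it needs; tags added for
        // other applications would simply be ignored here
        System.out.println(person.getAttribute("id"));   // 123456789
        System.out.println(person.getElementsByTagName("name").item(0).getTextContent()); // Smith
    }
}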
Comparisons of CDR, Java object serialization and XML

1. In CDR and Java's object serialization, the marshalling and unmarshalling activities are intended to be carried out by a middleware layer without any involvement on the part of the application programmer. Even in the case of XML, which is textual and therefore more accessible to hand-encoding, software for marshalling and unmarshalling is available for all commonly used platforms and programming environments (for example, XML parsers in Java). Compactness is another issue that can be addressed in the design of automatically generated marshalling procedures.
2. In CDR and Java's object serialization, the primitive data types are marshalled into a binary form. In XML, the primitive data types are represented textually. The textual representation of a data value will generally be longer than the equivalent binary representation.
3. Another issue with regard to the design of marshalling methods is whether the marshalled data should include information concerning the type of its contents. CORBA's representation includes just the values of the objects transmitted, and nothing about their types. On the other hand, both Java serialization and XML do include type information, but in different ways. Java puts all of the required type information into the serialized form, whereas XML documents may refer to externally defined sets of names (with types) called namespaces.
4. Although we are interested in the use of an external data representation for the arguments and results of RMIs and RPCs, it does have a more general use for representing data structures, objects or structured documents in a form suitable for transmission in messages or storing in files.
5. CORBA CDR does not need to be self-describing, because it is assumed that the client and server exchanging a message have prior knowledge of the order and the types of the information it contains. XML, by contrast, needs to be self-describing through its tags for communication.
Multicast Communication
A single message (as transmitted by the sender) is delivered to a group of
processes. One way to achieve this is to use a special multicast address.
Figure 3.10 illustrates multicast communication in which a group (a
prechosen subset) of processes receive a message sent to the group. The
light-shaded processes are members of the target group, so each will
receive the message sent by process A; the dark-shaded processes are
not members of the group and so ignore the message. The multicast
address can be considered to be a filter; either processes listen for
messages on that address (conceptually they are part of the group) or
they do not.
The sender may not know how many processes receive the message or
their identities; this depends on the implementation of the multicast
mechanism.
Multicast communication can also be achieved using a broadcast mechanism. Where the underlying protocol supports broadcast (as UDP does) rather than multicast addressing, transport layer ports can be used as a means of group message filtering, by arranging that only the subset of processes that are members of the group listen on the particular port. The group membership action "join group" can then be implemented locally by the process binding to the appropriate port and issuing a receive-from call.
In both types of multicast communication, that is, directly supported by
the communication protocol or fabricated by using a broadcast
mechanism, there can be multiple groups, and each individual process
can be a member of several different groups. This provides a useful way
to impose some control and structure on the communication at the higher
level of the system or application. For example, the processes concerned
with a particular functionality or service within the system can join a
specific group related to that activity.
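A minimal Java sketch of group membership and receipt follows; the group address 230.0.0.1 and port 4446 are arbitrary values in the IPv4 multicast range.

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;

public class MulticastReceiver {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("230.0.0.1"); // arbitrary multicast address
        try (MulticastSocket socket = new MulticastSocket(4446)) {
            // Joining the group is the "filter": only members receive messages
            // sent to the group address; non-members never see them
            socket.joinGroup(new InetSocketAddress(group, 4446), null); // null = default interface

            byte[] buf = new byte[1024];
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            socket.receive(packet); // blocks until some process sends to the group
            System.out.println(new String(packet.getData(), 0, packet.getLength()));

            socket.leaveGroup(new InetSocketAddress(group, 4446), null);
        }
        // A sender needs no membership; it simply sends a datagram to the group address:
        // new DatagramSocket().send(new DatagramPacket(data, data.length, group, 4446));
    }
}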