DS Ass
Q1. How distributed computing systems are going to be evolved in future and
explain it briefly mentioning/citing with proper references.
ANS:
The present computing paradigm does not scale well because it depends on shared memory, whereas most physical systems communicate by message passing, so making progress requires convincing developers to give up one model in favor of the other. The barrier to progress can be lowered by making simultaneous multiprocessing work as a programming paradigm on top of message passing.
Because of the rapid advancement of computer hardware, software, the web, sensor networks, mobile device communications, and multimedia technologies, distributed computing systems have evolved radically to improve and grow various applications with better quality of service and lower cost, particularly those involving human factors [2]. Besides reliability, performance and availability, many other attributes, such as security, privacy, trustworthiness, situation awareness, flexibility and rapid development of various applications, have also become important. Distributed computing systems will continue to serve us and evolve in the long run.
With the rapid development of emerging distributed computing technologies such as Web services, Grid computing, and Cloud computing, computer networks have become an integral part of next-generation distributed computing systems. Therefore, integrating networking and distributed computing systems is an important research problem for building the next-generation high-performance distributed information infrastructure.
In the near future, distributed application frameworks will support mobile code, multimedia data
streams, user and device mobility, and spontaneous networking [3].
Looking further into the future, essential techniques of distributed systems will be incorporated
into an emerging new area, envisioning billions of communicating smart devices forming a
world-wide distributed computing system several orders of magnitude larger than today's
Internet.
a. XML:
Parsing and processing XML documents can be computationally expensive and time-consuming,
especially for large documents.
XML is verbose and can lead to larger file sizes, which can affect network transfer times and
storage requirements.
The complexity of XML can make it difficult to read and understand, which can increase the
likelihood of errors and decrease developer productivity.
b. SOA:
SOA is a complex and highly configurable architecture that can be difficult to design and
implement correctly, leading to higher development and maintenance costs.
The decoupling of services in SOA can lead to increased network traffic, which can affect
system performance.
The service-oriented nature of SOA can lead to a proliferation of services, which can become
difficult to manage and govern.
c. SOAP:
SOAP messages can be verbose and can lead to larger file sizes, which can affect network
transfer times and storage requirements.
SOAP can be slower than lighter-weight alternatives such as REST, due to its use of XML envelopes and additional processing requirements.
SOAP is tightly coupled and can be difficult to modify once deployed, which can affect system
agility and flexibility.
d. RESTful:
RESTful services can be less secure than SOAP services, as they rely on transport-level security (HTTPS) and do not provide built-in message-level security standards such as WS-Security.
RESTful services can be less reliable than SOAP services, as they rely on the statelessness of
HTTP, which can be affected by network errors and server failures.
The lack of a standardized approach to RESTful service design and documentation can lead to
inconsistencies and difficulties in service discovery and integration.
It is important to note that these technologies have their own strengths and are widely used in
various applications despite their limitations. Therefore, it is important to carefully consider the
specific requirements of an application before selecting a technology to use.
Two-phase locking (2PL) and three-phase locking (3PL) are concurrency control protocols used
in database management systems to ensure transaction atomicity, consistency, isolation, and
durability (ACID properties). They are used to prevent conflicts between transactions that may
try to access the same data simultaneously.
Two-phase locking (2PL): In two-phase locking, a transaction acquires all the locks it
needs before it performs any modifications to the data. The protocol consists of two phases: the
growing phase and the shrinking phase.
Growing phase: During the growing phase, a transaction acquires locks on the data items it needs to access. Once a lock is acquired, it cannot be released until the transaction enters its shrinking phase.
Shrinking phase: During the shrinking phase, a transaction releases the locks it has acquired
after it has completed its modifications to the data. Once a lock is released, it cannot be
reacquired.
The two-phase locking protocol guarantees serializability, meaning that the transactions are
executed in a way that produces the same result as if they were executed serially, one after the
other.
Three-phase locking (3PL): In three-phase locking, a validation phase is added between the growing and shrinking phases.
Growing phase: During the growing phase, a transaction acquires locks on the data items it needs to access.
Validation phase: During the validation phase, a transaction checks if it can acquire all the locks
it needs to complete its modifications. If it cannot, it releases all the locks it has acquired and
starts again from the beginning of the growing phase.
Shrinking phase: During the shrinking phase, a transaction releases the locks it has acquired
after it has completed its modifications to the data.
The three-phase locking protocol ensures strict two-phase locking, meaning that a transaction
does not release any locks until the end of the transaction. It also prevents deadlocks by releasing
all locks if a transaction cannot acquire all the locks it needs during the validation phase.
In summary, 2PL and 3PL are concurrency control protocols that ensure the atomicity,
consistency, isolation, and durability of transactions in a database management system. Two-
phase locking acquires all the locks before modifications and releases them all at the end, while
three-phase locking adds a validation phase to prevent deadlock.
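To make the two-phase locking discipline above concrete, here is a minimal single-process Python sketch (not a real lock manager; the transaction class and the data items "x" and "y" are hypothetical). It simply enforces the rule that all acquires happen in the growing phase and all releases happen in the shrinking phase:

```python
import threading

class TwoPhaseLockingTransaction:
    """Minimal sketch of the 2PL discipline: all acquires happen before any release."""

    def __init__(self, locks):
        self.locks = locks      # dict: data item name -> threading.Lock
        self.held = []          # locks acquired so far, in acquisition order
        self.growing = True     # True while still in the growing phase

    def acquire(self, item):
        if not self.growing:
            raise RuntimeError("2PL violation: cannot acquire a lock after releasing one")
        self.locks[item].acquire()
        self.held.append(item)

    def release_all(self):
        # Entering the shrinking phase: no further acquires are allowed.
        self.growing = False
        for item in reversed(self.held):
            self.locks[item].release()
        self.held.clear()

# Usage sketch with two hypothetical data items.
locks = {"x": threading.Lock(), "y": threading.Lock()}
txn = TwoPhaseLockingTransaction(locks)
txn.acquire("x")
txn.acquire("y")
# ... read and write x and y here ...
txn.release_all()
```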
Transaction management is a critical aspect of distributed computing as it ensures data
consistency and integrity when multiple systems interact with each other. In distributed systems,
transactions may span multiple nodes, and failures in any of the participating nodes can cause
data inconsistencies. Transaction management provides a mechanism for coordinating and
controlling transactions, ensuring that they are atomic, consistent, isolated, and durable (ACID).
ACID properties ensure that transactions are executed reliably and that data integrity is
maintained even in the event of failures or conflicts. The following are the essential properties of
a transaction:
Atomicity: A transaction should be treated as a single, indivisible unit of work. Either all of the
changes in the transaction are committed, or none of them are committed.
Consistency: A transaction should ensure that the database is in a consistent state both before
and after the transaction is executed.
Isolation: Transactions should be executed independently of each other. Changes made by one
transaction should not affect other transactions.
Durability: Once a transaction is committed, its changes should persist even in the event of a
system failure.
The two-phase commit (2PC) protocol is commonly used to coordinate such distributed transactions. The transaction coordinator (TC) sends a prepare message to all the nodes participating in the transaction, asking them to prepare to commit the transaction.
Each node checks whether it can commit the transaction. If it can, it sends an agreement message to the TC. If it cannot, it sends an abort message.
The TC collects all the agreement messages from the nodes. If it receives an abort message from
any node, it sends a rollback message to all the nodes, instructing them to abort the transaction.
If the TC receives agreement messages from all the nodes, it sends a commit message to all the
nodes, instructing them to commit the transaction.
This protocol ensures that all the nodes participating in the transaction agree to commit the
transaction before any changes are made, ensuring consistency and durability across the
distributed system.
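The following Python sketch illustrates the commit decision just described. It is an in-process simplification with hypothetical participant objects standing in for real network nodes, and it omits timeouts, logging, and recovery:

```python
class Participant:
    """Hypothetical participant; a real node would answer these calls over the network."""
    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit

    def prepare(self):
        # Vote "agree" only if the local work can be made durable.
        return self.can_commit

    def commit(self):
        print(f"{self.name}: committed")

    def rollback(self):
        print(f"{self.name}: rolled back")


def two_phase_commit(participants):
    # Phase 1: the coordinator collects votes from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every participant agreed; otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"


nodes = [Participant("node-A", True), Participant("node-B", True), Participant("node-C", False)]
print(two_phase_commit(nodes))   # -> "aborted", because node-C voted no
```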
In summary, transaction management is essential for ensuring data consistency and integrity in
distributed systems. The two-phase commit protocol provides a mechanism for coordinating and
controlling transactions, ensuring that they are executed reliably and that data integrity is
maintained even in the event of failures or conflicts.
There are several simulation and emulation frameworks available for distributed computing
platforms that allow for the testing, comparison, and optimization of various distributed
computing capabilities. Here are a few examples of such frameworks:
CloudSim: CloudSim is a simulation framework for modeling and simulating cloud computing
environments. It provides a platform for simulating various cloud computing scenarios, including
infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS)
models. CloudSim can help to analyze and optimize the performance, energy consumption, and
cost-effectiveness of cloud computing environments.
GridSim: GridSim is a simulation framework for modeling and simulating grid computing
environments. It provides a platform for simulating various grid computing scenarios, including
job scheduling, data management, and resource allocation. GridSim can help to analyze and
optimize the performance and efficiency of grid computing environments.
Distem: Distem is an emulation framework for distributed systems and applications. It allows for
the creation of a virtual testbed for running and testing distributed computing applications.
Distem can simulate different network topologies, communication protocols, and application
behaviors to analyze and optimize system performance.
Shadow: Shadow is an emulation framework for network systems and applications. It provides a
platform for running and testing distributed systems and applications in a realistic environment.
Shadow can simulate various network scenarios, including different network topologies, link
delays, packet losses, and congestion, to analyze and optimize the performance of distributed
systems and applications.
These simulation and emulation frameworks can help developers and researchers to analyze,
compare, and optimize the performance of various distributed computing platforms and
applications. By simulating and emulating various scenarios, these frameworks can help to
identify potential issues and bottlenecks, and optimize the performance and efficiency of
distributed computing systems and applications.
Load Balancing: Load balancing is the process of distributing workloads across multiple
machines in a way that ensures no machine is overloaded. This approach can help to improve
performance by making sure that resources are being used efficiently.
Caching: Caching involves storing frequently accessed data in memory, which can help to reduce the number of times the data needs to be retrieved from disk. This approach can help to improve performance by reducing latency (see the caching sketch after this list).
Data Partitioning: Data partitioning involves dividing a dataset into smaller, more manageable
pieces that can be processed in parallel. This approach can help to improve performance by
reducing the amount of data that needs to be processed at any one time.
Replication: Replication involves duplicating data across multiple machines. This approach can
help to improve performance by reducing the time it takes to access data, as data can be retrieved
from the nearest machine.
Message Queuing: Message queuing involves sending messages between machines using a
queuing system. This approach can help to improve performance by reducing the time it takes to
process messages, as messages can be processed asynchronously.
These approaches can be combined and customized based on the specific requirements of a
distributed system to achieve high performance.
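As a small illustration of the caching approach listed above (a single-process sketch; fetch_user and its 0.1-second delay are hypothetical stand-ins for a slow disk or network read), Python's standard functools.lru_cache keeps recently used results in memory so repeated requests skip the slow lookup:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1024)
def fetch_user(user_id):
    # Stand-in for a slow disk or network read.
    time.sleep(0.1)
    return {"id": user_id, "name": f"user-{user_id}"}

start = time.time()
fetch_user(42)                       # miss: pays the 0.1 s cost
first = time.time() - start

start = time.time()
fetch_user(42)                       # hit: served from memory
second = time.time() - start
print(f"first call {first:.3f}s, cached call {second:.6f}s")
```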
Q7. Explain major distributed platform areas and their algorithm strengths and
weaknesses.
ANS:
Distributed platforms are designed to support the execution of complex distributed applications
across multiple nodes in a network. They typically provide a set of services and APIs to enable
the development of distributed applications that can leverage the underlying infrastructure's
processing and storage capabilities. Here are some major areas of distributed platforms and their
algorithm strengths and weaknesses:
Distributed Storage:
Distributed storage systems are designed to store and manage large amounts of data across
multiple nodes in a network. These systems typically use algorithms such as distributed hash
tables (DHTs) and gossip protocols to manage data distribution, replication, and consistency.
Strengths:
High availability and fault tolerance: Data is distributed across multiple nodes, making it highly
available and resilient to node failures.
Scalability: The storage capacity can be easily increased by adding more nodes to the network.
Low latency: Data can be accessed quickly from the node closest to the user.
Weaknesses:
Complexity: Managing a distributed storage system can be complex due to the need for data
distribution, replication, and consistency.
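To illustrate the DHT-style data placement mentioned above, here is a minimal consistent-hashing sketch in Python. The storage node names are hypothetical, and a real system would add virtual nodes and replication:

```python
import bisect
import hashlib

def _hash(key):
    # Map any string key onto a large integer position on the ring.
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hashing ring: each key lives on the first node clockwise from its hash."""
    def __init__(self, nodes):
        self.ring = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key):
        h = _hash(key)
        positions = [p for p, _ in self.ring]
        idx = bisect.bisect_right(positions, h) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["storage-1", "storage-2", "storage-3"])   # hypothetical node names
for k in ["user:42", "order:7", "photo:99"]:
    print(k, "->", ring.node_for(k))
```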
Distributed Computing:
Distributed computing systems are designed to distribute computational tasks across multiple nodes in a network. These systems typically use programming models and frameworks such as MapReduce, Apache Spark, and Hadoop to distribute tasks, process data, and aggregate results.
Strengths:
Fault tolerance: Distributed computing systems can continue to function even if some nodes fail.
Efficiency: Distributed computing systems can process large amounts of data in a relatively
short amount of time.
Weaknesses:
Overhead: The overhead of data transfer and coordination between nodes can affect
performance.
Complexity: Developing and managing distributed computing systems can be complex due to
the need for distributed data processing, task coordination, and error handling.
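To make the MapReduce model mentioned above concrete, here is a minimal single-machine word-count sketch in Python; a real framework such as Hadoop or Spark would run the map and reduce functions on many nodes in parallel and handle the shuffle, partitioning, and fault tolerance itself:

```python
from collections import defaultdict

def map_phase(document):
    # Emit (word, 1) pairs, as a MapReduce mapper would.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Group intermediate values by key (normally done by the framework).
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

documents = ["the quick brown fox", "the lazy dog", "the quick dog"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(shuffle(intermediate)))   # e.g. {'the': 3, 'quick': 2, ...}
```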
Distributed Messaging:
Distributed messaging systems deliver messages and events between services across a network, typically through message queues or publish/subscribe brokers (a minimal queue-based sketch follows the weaknesses below).
Strengths:
Scalability: Messaging systems can handle large volumes of messages and events.
Asynchronous: Messaging systems can process messages and events asynchronously, allowing
for more efficient use of computational resources.
Weaknesses:
Reliability: Messaging systems can be affected by network latency and failures, which can
affect reliability.
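A minimal in-process sketch of the asynchronous, queue-based decoupling described above, using Python's standard queue and threading modules; a real deployment would place a broker such as Kafka or RabbitMQ between machines, and the event payloads here are hypothetical:

```python
import queue
import threading
import time

messages = queue.Queue()

def producer():
    for i in range(5):
        messages.put(f"event-{i}")     # hypothetical event payloads
    messages.put(None)                 # sentinel: no more messages

def consumer():
    while True:
        msg = messages.get()
        if msg is None:
            break
        time.sleep(0.05)               # stand-in for slow downstream work
        print("processed", msg)

threading.Thread(target=producer).start()
worker = threading.Thread(target=consumer)
worker.start()
worker.join()
```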
In summary, distributed platforms are designed to provide a set of services and APIs to enable
the development of distributed applications that can leverage the underlying infrastructure's
processing and storage capabilities. Each of the major areas of distributed platforms has its own
algorithm strengths and weaknesses, which must be carefully considered when designing and
implementing distributed applications.
Clock synchronization can be achieved in two ways: external and internal clock synchronization. The algorithms used for synchronization fall into two broad classes, centralized and distributed:
1. Centralized: a single time server is used as a reference. The time server propagates its time to the nodes, and all the nodes adjust their clocks accordingly. Because the approach depends on a single time server, if that node fails the whole system loses synchronization. Examples of centralized algorithms are the Berkeley Algorithm, the Passive Time Server, the Active Time Server, etc. (a Berkeley-style averaging sketch follows this list).
2. Distributed: no centralized time server is present. Instead, the nodes adjust their clocks by using their local time and the average of the time differences measured against the other nodes. Distributed algorithms overcome the issues of centralized algorithms, such as limited scalability and the single point of failure. Examples of distributed algorithms are the Global Averaging Algorithm, the Localized Averaging Algorithm, NTP (Network Time Protocol), etc.
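The following Python sketch illustrates the Berkeley-style averaging mentioned above (simplified: it ignores round-trip-time estimation and outlier rejection, and the clock readings are hypothetical). The master averages all clock values and returns the adjustment each node should apply:

```python
def berkeley_sync(master_time, follower_times):
    """Sketch of Berkeley-style averaging: the master averages all clock offsets
    and tells every node (including itself) how much to adjust its clock."""
    clocks = [master_time] + follower_times
    offsets = [t - master_time for t in clocks]
    average_offset = sum(offsets) / len(offsets)
    # Each node receives the adjustment it should apply to its local clock.
    return [master_time + average_offset - t for t in clocks]

# Hypothetical clock readings, in seconds since some common epoch.
adjustments = berkeley_sync(1000.0, [1002.0, 998.5, 1001.5])
print(adjustments)   # master and followers each shift toward the common average
```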
Leadership algorithms for distributed platforms
Many distributed election algorithms have been proposed to resolve the problem of leader election. Among the existing algorithms, the most prominent are the following:
BULLY ALGORITHM
The bully algorithm is one of the most prominent election algorithms; it was presented by Garcia-Molina in 1982.
It requires that every process knows the identity of every other process in the system, so it consumes a large amount of space.
It involves a large number of messages during communication, which creates heavy traffic; its message-passing complexity is O(n²).
MODIFIED BULLY ALGORITHM
The modified bully algorithm overcomes the disadvantages of the original bully algorithm. Its main concept is that the algorithm declares the new coordinator before the actual or current coordinator has crashed.
Disadvantages
It performs better than the bully algorithm but still has O(n²) complexity in the worst case.
RING ALGORITHM
This election algorithm is based on the use of a ring. We assume that the processes are physically or logically ordered, so that each process knows who its successor is [4].
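A minimal single-process Python sketch of the ring election idea described above (the process IDs are hypothetical, and the sketch ignores failures and the final coordinator announcement): the election message travels once around the ring collecting IDs, and the highest ID becomes the coordinator.

```python
def ring_election(ids, initiator_index):
    """Sketch of ring election: the message circulates once around the ring
    collecting process IDs, and the highest collected ID wins."""
    n = len(ids)
    collected = []
    i = initiator_index
    while True:
        collected.append(ids[i])
        i = (i + 1) % n                 # pass the message to the successor
        if i == initiator_index:        # message has returned to the initiator
            break
    return max(collected)

# Hypothetical process IDs arranged in a logical ring; the process at index 2 starts the election.
print(ring_election([7, 3, 12, 9, 5], initiator_index=2))   # -> 12
```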
This section traces the history of distributed computing systems from the mainframe era to the present day. It is important to understand the history of anything in order to track how far we have progressed. Distributed computing is essentially a story of evolution from centralization to decentralization: it depicts how centralized systems evolved over time towards decentralization. We had centralized systems such as the mainframe around 1955, but today we use decentralized systems such as edge computing and containers.
5. P2P, Grids & Web Services: Peer-to-peer (P2P) computing or networking is a distributed application architecture that partitions tasks or workloads between peers without the need for a central coordinator. Peers share equal privileges, and in a P2P network each node acts as both a client and a server. P2P file sharing was introduced in 1999 when American college student Shawn Fanning created the music-sharing service Napster. P2P networking enables a decentralized internet. With the introduction of Grid computing, multiple tasks can be completed by computers jointly connected over a network. It basically makes use of a data grid, i.e., a set of computers that can directly interact with each other to perform similar tasks by using middleware. During 1994–2000, we also saw the creation of effective x86 virtualization. With the introduction of web services, platform-independent communication was established, using XML-based information exchange systems that rely on the Internet for direct application-to-application interaction. Through web services, Java can talk with Perl and Windows applications can talk with Unix applications. Peer-to-peer networks are often created by collections of 12 or fewer machines.
6. Cloud, Mobile & IoT: Cloud computing emerged from the convergence of cluster technology, virtualization, and middleware. Through cloud computing, you can manage your resources and applications online over the internet without hosting them on your own hard drive or server. The major advantage is that these resources can be accessed by anyone from anywhere in the world. Many cloud providers offer subscription-based services: after paying for a subscription, customers can access all the computing resources they need. Customers no longer need to update outdated servers, buy hard drives when they run out of storage, install software updates or buy software licenses; the vendor does all that for them. Mobile computing allows us to transmit data, such as voice and video, over a wireless network; we no longer need to connect our mobile phones with switches.
The evolution of Application Programming Interface (API) based communication over the REST model was needed to provide scalability, flexibility, portability, caching, and security. Instead of implementing these capabilities separately in each and every API, the requirement arose for a common component that applies these features on top of the API. This requirement led to the evolution of API management platforms, and today they have become one of the core features of any distributed system. Likewise, instead of treating one computer as a single machine, the idea of running multiple systems within one computer came into existence.
7. Fog and Edge Computing: When the data produced by mobile computing and IoT services started to grow tremendously, collecting and processing millions of data items in real time was still an issue. This led to the concept of edge computing, in which client data is processed at the periphery of the network; it is largely a matter of location. Instead of moving data across a WAN such as the internet to a centralized data center, which can cause latency issues, the data is processed and analyzed closer to the point where it is created, such as a corporate LAN. Fog
computing greatly reduces the need for bandwidth by not sending every bit of information over
cloud channels, and instead aggregating it at certain access points. This type of distributed
strategy lowers costs and improves efficiencies. Companies like IBM are the driving force
behind fog computing. The combination of fog and edge computing further extends the cloud computing model away from centralized stakeholders to decentralized multi-stakeholder systems that are capable of providing ultra-low service response times and increased aggregate bandwidth.
Today, distributed systems are programmed by application programmers, while the underlying infrastructure is managed by a cloud provider. This is the current state of distributed computing, and it keeps evolving.
Distributed computing platforms have become increasingly popular in recent years due to their
ability to handle large-scale and complex computations. Here are the current limitations and
strengths of some leading distributed computing platforms:
Apache Hadoop: Strengths: Hadoop is widely used for processing large volumes of data in a
fault-tolerant and scalable manner. It supports a variety of data sources and offers a flexible
data processing framework. Hadoop is well-supported by a large community and offers a
variety of tools for data analytics.
Limitations: Hadoop has a high latency due to its reliance on disk I/O and can suffer from
performance issues when processing small files. It is not designed for real-time data processing
and can be complex to set up and manage.
Apache Spark: Strengths: Spark is a high-performance data processing engine that can
handle both batch and real-time processing. It supports a variety of data sources and offers a
flexible programming model. Spark is well-supported by a large community and offers a
variety of tools for data analytics.
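As an illustration of Spark's programming model, here is a minimal PySpark word-count sketch. It assumes PySpark is installed and runs in local mode; the input file name input.txt is a placeholder:

```python
from pyspark import SparkContext

# Local Spark context; on a cluster the master URL would point at the cluster manager.
sc = SparkContext("local[*]", "WordCount")

counts = (sc.textFile("input.txt")                      # placeholder input file
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

print(counts.take(10))
sc.stop()
```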
Apache Flink: Strengths: Flink is a high-performance data processing engine that offers both
batch and stream processing capabilities. It is designed for low-latency data processing and
offers a flexible programming model. Flink is well-suited for complex event processing and
real-time analytics.
Limitations: Flink is relatively new compared to other distributed computing platforms and may
not have as large of a community or ecosystem of tools. It may also require more expertise to set
up and manage compared to other platforms.
Apache Kafka: Strengths: Kafka is a high-performance messaging system that can handle
large volumes of data streams. It offers low-latency and high-throughput data processing and
is well-suited for real-time data processing and event-driven architectures. Kafka is widely
used for building scalable and reliable data pipelines.
Limitations: Kafka is not a full-featured data processing engine and may require additional tools
for data processing and analytics. It may also require more expertise to set up and manage
compared to other messaging systems.
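A minimal sketch of producing and consuming a Kafka stream with the kafka-python client. It assumes kafka-python is installed, a broker is reachable at localhost:9092, and the topic name "events" is a placeholder:

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer: publish a few JSON events to the (placeholder) "events" topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
for i in range(3):
    producer.send("events", {"event_id": i, "type": "demo"})
producer.flush()

# Consumer: read the events back from the beginning of the topic.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    consumer_timeout_ms=5000,          # stop iterating if no new messages arrive
)
for record in consumer:
    print(record.value)
```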
In summary, distributed computing platforms have strengths and limitations that should be
considered when selecting a platform for a particular use case. Apache Hadoop, Spark, Flink,
and Kafka are popular platforms with different strengths and limitations that make them suitable
for different types of data processing and analytics.
References
[1] M. van Steen and A. S. Tanenbaum, “A brief introduction to distributed
systems,” Computing, vol. 98, no. 10, pp. 967–1009, 2016, doi:
10.1007/s00607-016-0508-7.
[2] S. S. Yau, “Challenges and Future Trends of Distributed Computing
Systems,” pp. 758–758, 2011, doi: 10.1109/hpcc.2011.151.
[3] J. Brier and L. D. Jayanti, No 主観的健康感を中心とした在宅高齢者における健康関連指標に関する共分散構造分析 Title, vol. 21, no. 1, 2020. [Online]. Available: http://journal.um-surabaya.ac.id/index.php/JKM/article/view/2203
[4] S. Balhara and K. Khanna, “Leader Election Algorithms in Distributed
Systems,” Int. J. Comput. Sci. Mob. Comput., vol. 3, no. 6, pp. 374–379,
2014.
[5] M. Zaharia et al., "Spark: Cluster Computing with Working Sets," in Proc. 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud), 2010.
[6] P. Carbone et al., "Apache Flink: Stream and Batch Processing in a Single Engine," IEEE Data Engineering Bulletin, vol. 38, no. 4, 2015.
[7] Apache Kafka, "Why Use Kafka?," 2022. [Online].
Web sites
1. https://www.geeksforgeeks.org/evolution-of-distributed-computing-systems/
2. https://insights.daffodilsw.com/blog/distributed-cloud-computing-benefits-and-limitations
3. https://www.geeksforgeeks.org/limitation-of-distributed-system/
4. https://www.techtarget.com/whatis/definition/distributed-computing
5. https://www.geeksforgeeks.org/synchronization-in-distributed-systems/