Week 9
1. Hadoop's Limitations:
b. Latency: Hadoop's core processing framework, MapReduce, is built for batch
processing. While it is excellent at managing massive datasets, it is not well suited to
real-time or near-real-time data processing. The batch-oriented nature of MapReduce can
result in high latency, making Hadoop less appropriate for applications that demand
immediate data insights (AltexSoft). This constraint may affect applications where quick
decision-making is critical, such as fraud detection or recommendation systems (All You
Need to Know About Big Data Analytics, n.d.).
c. Storage Overheads: Hadoop's HDFS is designed for large files and is less efficient when
managing many small files. HDFS replicates data across multiple nodes for fault
tolerance, which adds storage overhead when dealing with small files (AltexSoft); a rough
estimate of this overhead is sketched after this list. This can be a drawback for
enterprises with varied data storage demands, as it may lead to wasteful disk utilization
(All You Need to Know About Big Data Analytics, n.d.).
d. High Disk I/O: Hadoop depends extensively on disk I/O for storing and retrieving
data. While this approach provides durability and fault tolerance, it can become a
bottleneck for certain workloads. Disk I/O may limit the pace of data processing and
impair Hadoop's overall performance (AltexSoft). This constraint becomes more apparent in
circumstances where low-latency data access is necessary (All You Need to Know About Big
Data Analytics, n.d.); the caching sketch after this list shows how in-memory engines
sidestep repeated disk reads.
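
To make the small-file overhead in item c concrete, the sketch below estimates disk and NameNode memory costs for many small files. The 3x replication factor is HDFS's default, and the roughly 150 bytes of NameNode heap per file or block object is a commonly cited rule of thumb; both figures are assumptions here rather than exact values for any particular cluster.

```python
# Back-of-the-envelope estimate of HDFS overhead for many small files.
# Assumptions: default replication factor of 3, and ~150 bytes of
# NameNode heap per file object and per block object (rule of thumb).

REPLICATION_FACTOR = 3
NAMENODE_BYTES_PER_OBJECT = 150  # approximate; varies by Hadoop version

def hdfs_overhead(num_files: int, avg_file_mb: float) -> None:
    raw_mb = num_files * avg_file_mb
    stored_mb = raw_mb * REPLICATION_FACTOR
    # Each file smaller than one block still costs one file object plus
    # one block object in NameNode memory.
    namenode_mb = num_files * 2 * NAMENODE_BYTES_PER_OBJECT / 1e6
    print(f"{num_files:,} files of {avg_file_mb} MB:")
    print(f"  raw data:          {raw_mb / 1e3:,.1f} GB")
    print(f"  disk after 3x:     {stored_mb / 1e3:,.1f} GB")
    print(f"  NameNode memory:   {namenode_mb:,.1f} MB")

# NameNode memory scales with the number of objects, not bytes stored,
# so millions of tiny files can exhaust it long before disks fill up.
hdfs_overhead(10_000_000, 1.0)
```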
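The disk I/O bottleneck in item d is one reason in-memory engines such as Spark gained ground. The PySpark sketch below (the input path is an illustrative assumption) reads a dataset once, caches it in executor memory, and reuses it across two queries, avoiding the repeated disk reads that a pure MapReduce pipeline would incur between stages.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("io-demo").getOrCreate()

# Hypothetical input path; any large text dataset would do.
logs = spark.read.text("hdfs:///data/server_logs")

# cache() keeps the data in executor memory after the first action,
# so the second query does not re-read everything from disk.
logs.cache()

errors = logs.filter(logs.value.contains("ERROR")).count()   # reads from disk once
warnings = logs.filter(logs.value.contains("WARN")).count()  # served from memory

print(f"errors={errors}, warnings={warnings}")
spark.stop()
```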
2. Spark's Drawbacks:
a. Complexity: While Spark is a robust and adaptable data processing framework, its
breadth of features can add complexity. Advanced capabilities such as Spark Streaming for
real-time data and the machine learning libraries can be complicated to set up and
operate, making Spark less accessible to novice users (AltexSoft); even the minimal
streaming example after this list involves several moving parts. Organizations may need
to invest in training or hire specialists to use these capabilities successfully (Murthy, 2017).
c. Limited Hadoop Ecosystem Integration: Spark can work with Hadoop, but it does not
fully replace Hadoop in every case. This means enterprises running mixed systems
with both Hadoop and Spark must manage and integrate the two technologies. This
integration can add complexity, as it requires coordination between the systems, data
transfer between HDFS and Spark (sketched after this list), and sometimes operating and
monitoring separate clusters (AltexSoft). This complexity can raise operational overhead,
making adoption less smooth for enterprises (All You Need to Know About Big Data
Analytics, n.d.).
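
As a taste of the complexity mentioned in item a, here is a minimal PySpark Structured Streaming word count. Even this small sketch requires a streaming source, an aggregation, an output mode, and an explicit termination call; the socket host and port are illustrative assumptions (the socket source is intended for testing, not production).

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("streaming-wordcount").getOrCreate()

# Test-only socket source; assumes `nc -lk 9999` is feeding text lines.
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# Split each line into words and count occurrences across the stream.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Continuously print the updated counts to the console.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```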
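The HDFS-to-Spark data movement in item c typically looks like the sketch below; the NameNode address, paths, and column names are assumptions for illustration. Spark reads and writes HDFS URIs directly, but operating this pipeline means keeping HDFS (and often YARN) healthy alongside Spark itself.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-integration").getOrCreate()

# Read input that an upstream Hadoop job left in HDFS (illustrative path).
sales = spark.read.csv("hdfs://namenode:8020/warehouse/sales.csv",
                       header=True, inferSchema=True)

# Aggregate in Spark...
totals = sales.groupBy("region").sum("amount")

# ...and write the result back to HDFS for downstream Hadoop tools.
totals.write.mode("overwrite").parquet("hdfs://namenode:8020/warehouse/sales_totals")

spark.stop()
```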
• Hadoop 1.0: The first version of Hadoop principally comprised the Hadoop
Distributed File System (HDFS) for storage and the MapReduce processing engine. While it
was groundbreaking in managing enormous datasets, it had drawbacks, such as a single point
of failure in the JobTracker and restrictions on scalability. The JobTracker was
responsible for both resource management and job scheduling (In-Depth Big Data Framework
Comparison, n.d.).
• Hadoop 2.0: Hadoop 2.0 introduced important advancements, most notably the inclusion of
Yet Another Resource Negotiator (YARN). YARN decoupled resource management and
job scheduling from MapReduce. This major improvement allowed Hadoop to
become more versatile and capable of running alternative data processing frameworks
alongside MapReduce. YARN improved scalability, fault tolerance, and the ability to
handle multiple workloads simultaneously; the sketch below shows a Spark application
submitted to a YARN cluster. In summary, Hadoop 2.0 made Hadoop a more versatile and
capable data processing platform (In-Depth Big Data Framework Comparison, n.d.).
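
To illustrate YARN hosting a non-MapReduce framework, this sketch submits a Spark job to a YARN cluster. It assumes HADOOP_CONF_DIR points at the cluster's configuration; the script name and the trivial computation are illustrative.

```python
# Launched with, e.g.:
#   spark-submit --master yarn --deploy-mode cluster yarn_demo.py
# YARN, not Spark's standalone manager, allocates the executors, so this
# job can share cluster resources with MapReduce jobs running alongside it.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("yarn-demo")
         .getOrCreate())

# Trivial distributed computation to confirm YARN granted the executors.
total = spark.sparkContext.parallelize(range(1_000_000)).sum()
print(f"sum = {total}")

spark.stop()
```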
References: