Big Data

Apache Hadoop and Apache Spark are two popular frameworks for big data analytics. Hadoop stores data across servers in HDFS and uses MapReduce for processing. Spark has its own in-memory processing engine; it does not run on MapReduce, although it can read from HDFS and run on Hadoop clusters. Both are highly scalable, but Spark tends to be faster for workloads that reuse data, because it can keep intermediate results in memory. Spark also has simpler APIs, while Hadoop can be more complex to set up. Neither framework is better in every case; the choice depends on the specific use case and requirements.

Uploaded by

sukhpreet singh


Big Data Analytics: A Comparative Evaluation of Apache Hadoop and Apache Spark

In this presentation, we'll be exploring the differences between two of
the most popular big data processing frameworks, Apache Hadoop
and Apache Spark.

by Sukhpreet Singh

What is Big Data Analytics?

1 Definition

Big Data Analytics refers to the process of extracting insights and valuable information from large and complex datasets.

2 Importance

Big Data Analytics enables organizations to drive innovation and make data-driven decisions that can lead to greater efficiency and profitability.

3 Tools

There are various tools available for Big Data Analytics, but Apache Hadoop and Apache Spark are two of the most widely used platforms.

Overview of Apache Hadoop

What is Hadoop?

Apache Hadoop is an open-source Big Data processing framework that allows distributed storage and processing of large datasets across computing clusters.

How does it work?

Hadoop stores data across multiple servers in a distributed file system called Hadoop Distributed File System (HDFS). The processing itself is done using a framework called MapReduce.

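The MapReduce model mentioned above can be illustrated with a small word count. This is a single-machine sketch in plain Python, not Hadoop's API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and a real Hadoop job would run the map and reduce steps in parallel across the cluster, reading its input from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big clusters", "data in clusters"]
    print(word_count(sample))
```

The point of the split is that each map call only needs its own line and each reduce call only needs the values for its own key, which is what lets a framework spread the work over many servers.
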
Overview of Apache Spark

What is Spark?

Apache Spark is an open-source Big Data processing engine that allows fast and efficient processing of large datasets in a distributed fashion.

How does it work?

Spark uses its own processing engine rather than Hadoop's MapReduce framework, although it can read data from HDFS and run on Hadoop clusters. Its design allows faster and more efficient processing, including in-memory processing and caching.

Features

Spark includes a wide range of features, including support for real-time stream processing, machine learning, graph processing, and more.

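Why in-memory processing and caching help can be shown with a toy model of an iterative job. The sketch below is plain Python, not Spark code, and every name in it (`CountingSource`, `iterate_without_cache`, `iterate_with_cache`) is hypothetical; it only counts how often the data has to be read from storage. In Spark itself, the comparable step is calling `cache()` or `persist()` on a dataset that will be reused.

```python
class CountingSource:
    """Stands in for slow storage and counts how often the data is read."""

    def __init__(self, records):
        self._records = list(records)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._records)


def iterate_without_cache(source, passes):
    """Each pass goes back to storage, as a chain of disk-based jobs would."""
    total = 0
    for _ in range(passes):
        total += sum(source.load())
    return total


def iterate_with_cache(source, passes):
    """Load once, keep the data in memory, and reuse it on every pass."""
    cached = source.load()
    total = 0
    for _ in range(passes):
        total += sum(cached)
    return total


if __name__ == "__main__":
    uncached = CountingSource([1, 2, 3])
    cached = CountingSource([1, 2, 3])
    print(iterate_without_cache(uncached, 5), "storage reads:", uncached.reads)
    print(iterate_with_cache(cached, 5), "storage reads:", cached.reads)
```

Both functions compute the same result; the difference is only in how many times storage is touched, which is where iterative workloads such as machine learning training tend to spend their time.
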
Comparison between Hadoop and Spark

Speed

Spark is generally faster than Hadoop, especially for iterative processing and real-time stream processing.

Scalability

Both platforms are highly scalable, but Spark tends to be more efficient due to its in-memory processing capabilities.

Applications

Both platforms can be used for a wide range of Big Data processing applications, but Spark is better suited for certain types of processing, such as machine learning and real-time stream processing.

Usability

Hadoop can be more complex to set up and use, while Spark has a simpler and more user-friendly API.

Evaluation Criteria

Performance

How well does each platform handle large-scale data processing?

Scalability

How easy is it to scale each platform to handle larger and more complex datasets?

Usability

How easy is it to use and learn each platform?

Features

What are the key features of each platform, and how well do they meet the needs of your specific use case?

Conclusion

Which is better?

There is no clear answer to this question, as it largely depends on your specific use case and requirements.

Final Thoughts

Both Apache Hadoop and Apache Spark are powerful Big Data processing platforms that can help organizations gain valuable insights from their data.
