
Presentation on Apache Spark

By:
B V S Mridula (1039060)
Milind Baluni (1003192)
G. Ganesh ()
Dilip Payra ()
Sameer Nayak ()

Introduction to Apache Spark

It is a framework for performing general data analytics on a distributed
computing cluster like Hadoop.
It provides in-memory computation for increased speed of data processing
over MapReduce.
It runs on top of an existing Hadoop cluster and can access the Hadoop
data store (HDFS).
It can also process structured data in Hive and streaming data from HDFS,
Flume, Kafka, and Twitter.

High-Productivity Language Support

Native support for multiple languages with identical APIs
Use of closures, iterations, and other common language constructs to
minimize code
Unified API for batch and streaming

Python:

lines = sc.textFile(...)
lines.filter(lambda s: "ERROR" in s).count()

Scala:

val lines = sc.textFile(...)
lines.filter(s => s.contains("ERROR")).count()

Java:

JavaRDD<String> lines = sc.textFile(...);
lines.filter(new Function<String, Boolean>() {
  Boolean call(String s) {
    return s.contains("ERROR");
  }
}).count();

Why is Apache Spark used?

Apache Spark is best for performing iterative jobs like machine learning.
Spark keeps working data in memory as RDDs (Resilient Distributed
Datasets).
Once a dataset is loaded into memory, it does not have to be reloaded
from disk for every pass over the data, which gives a tremendous
increase in speed.
Frequently used datasets can be cached in memory and reused across
computations.
Programs run up to 100x faster than Hadoop MapReduce in memory, or
10x faster on disk.
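
The sketch below illustrates this caching behavior with the RDD API. It is
a minimal example, not the deck's own code: the HDFS path, the app name,
and the ten-iteration loop are illustrative assumptions.

from pyspark import SparkContext

sc = SparkContext(appName="IterativeCachingDemo")

# Load the data once and keep it in memory; the path is a
# hypothetical placeholder.
values = sc.textFile("hdfs:///data/values.txt") \
           .map(lambda line: float(line)) \
           .cache()

# The first action reads from HDFS; the later iterations reuse the
# in-memory copy instead of rereading the input from disk.
total = 0.0
for step in range(10):
    total += values.sum()

print(total)
sc.stop()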

Spark libraries

Spark SQL:
Spark's module for working with structured data.
Spark Streaming:
Makes it easy to build scalable, fault-tolerant streaming applications.
MLlib:
Apache Spark's scalable machine learning library.
GraphX:
Apache Spark's API for graphs and graph-parallel computation.
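
As a minimal sketch of what a Spark Streaming application looks like, the
word count below reads text from a TCP socket in one-second micro-batches;
the host, port, and app name are illustrative assumptions.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="StreamingWordCount")
ssc = StreamingContext(sc, 1)  # 1-second micro-batches

# Count words arriving on a socket; localhost:9999 is a placeholder.
lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split(" "))
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()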

Is Apache Spark going to replace Hadoop?

Hadoop essentially consists of a MapReduce engine and a file system
(HDFS), whereas Spark is a framework that executes jobs, so in practice
Spark can only replace the MapReduce phase of the Hadoop ecosystem.
Spark was mainly designed to run on top of Hadoop so as to minimize job
execution time.
Spark is an alternative to the traditional MapReduce model, which worked
only in batch mode; Spark supports both batch and real-time processing.
Spark mainly utilizes the primary memory of the system to provide
efficient output, so it requires high-end machines to execute jobs.
Hadoop, on the other hand, can easily run on commodity hardware.
Spark's way of handling fault tolerance, recomputing lost data from RDD
lineage rather than replicating it, is very fast compared to Hadoop's;
it minimizes network I/O while still guaranteeing fault tolerance.

Thank You!
