Apache Spark Features
Fault Tolerance: Apache Spark is designed to handle worker node failures. It achieves this
fault tolerance by using the DAG (Directed Acyclic Graph) and RDDs (Resilient Distributed
Datasets). The DAG records the lineage of all the transformations and actions needed to
complete a task, so in the event of a worker node failure, the same results can be reproduced
by rerunning the steps recorded in the DAG, as the sketch below illustrates.
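To make the lineage concrete, here is a minimal PySpark sketch that prints the lineage Spark
records for an RDD; the input file name events.txt is a placeholder. If a partition is lost
with a failed worker, Spark recomputes it by replaying exactly these recorded steps.

from pyspark import SparkContext

sc = SparkContext("local[*]", "lineage-demo")
rdd = (sc.textFile("events.txt")                      # hypothetical input file
         .map(lambda line: line.split(","))
         .filter(lambda fields: len(fields) > 1))

# toDebugString() prints the recorded lineage (the DAG of transformations).
print(rdd.toDebugString().decode("utf-8"))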
Dynamic nature: Spark offers over 80 high-level operators that make it easy to build parallel
applications.
Lazy Evaluation: Spark does not evaluate any transformation immediately; all
transformations are lazily evaluated. Transformations are added to the DAG, and the final
computation or results become available only when an action is called. This gives Spark the
ability to make optimization decisions, since all the transformations are visible to the Spark
engine before any action is performed, as the sketch below shows.
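A minimal PySpark sketch of this behaviour (names are illustrative): the map and filter calls
only record work in the DAG, and nothing executes until count() is called.

from pyspark import SparkContext

sc = SparkContext("local[*]", "lazy-demo")
nums = sc.parallelize(range(1, 1000001))

squares = nums.map(lambda x: x * x)           # transformation: recorded, not run
evens = squares.filter(lambda x: x % 2 == 0)  # transformation: still nothing runs

print(evens.count())  # action: Spark now optimizes and executes the whole chain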
Speed: Spark enables applications running on Hadoop to run up to 100x faster in memory and
up to 10x faster on disk. Spark achieves this by minimising disk read/write operations for
intermediate results: it stores them in memory and performs disk operations only when
essential. Spark accomplishes this using the DAG, a query optimizer and a highly optimized
physical execution engine.
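Keeping an intermediate result in memory can also be requested explicitly with cache(); here
is a minimal sketch, assuming a hypothetical HDFS log file. The first action materializes the
filtered result in memory, and later actions reuse it instead of rereading from disk.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-demo").getOrCreate()
logs = spark.read.text("hdfs:///data/app.log")   # placeholder path

errors = logs.filter(logs.value.contains("ERROR")).cache()

print(errors.count())  # first action: reads from disk and fills the cache
errors.show(10)        # later actions reuse the in-memory result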
Reusability: Spark code can be reused for batch processing, for joining streaming data against
historical data, and for running ad-hoc queries on streaming state, as in the sketch below.
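A minimal sketch of this reuse, assuming hypothetical HDFS paths and an orders dataset with
an amount column: the same transformation function is applied unchanged to a batch
DataFrame and to a Structured Streaming DataFrame.

from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("reuse-demo").getOrCreate()

def high_value(orders: DataFrame) -> DataFrame:
    # Identical business logic for batch and streaming inputs.
    return orders.filter(col("amount") > 100)

batch = high_value(spark.read.json("hdfs:///orders/history"))      # batch
stream = high_value(spark.readStream.schema(batch.schema)
                         .json("hdfs:///orders/incoming"))         # streaming
stream.writeStream.format("console").start()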
Advanced Analytics: Apache Spark has rapidly become the de facto standard for big data
processing and data science across multiple industries. Spark provides both machine learning
(MLlib) and graph processing (GraphX) libraries, which companies across sectors leverage to
tackle complex problems, all done easily with the power of Spark and highly scalable
clusters. Databricks provides an advanced analytics platform built on Spark.
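As a small taste of the machine learning library, here is a minimal MLlib sketch that clusters
a few made-up points with k-means; the data and parameters are purely illustrative.

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("mllib-demo").getOrCreate()
df = spark.createDataFrame([(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (8.5, 9.0)],
                           ["x", "y"])

# Assemble the raw columns into the feature vector MLlib expects.
features = VectorAssembler(inputCols=["x", "y"], outputCol="features").transform(df)
model = KMeans(k=2, seed=42).fit(features)
print(model.clusterCenters())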
Supporting Multiple Languages: Spark comes with built-in multi-language support, with
most of its APIs available in Java, Scala, Python and R, and there are advanced features
available with the R language for data analytics. Spark also comes with Spark SQL, which
offers a SQL-like interface, so SQL developers find it very easy to use and the learning curve
is greatly reduced; a small example follows.
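A minimal Spark SQL sketch (table and column names are illustrative): a DataFrame is
registered as a temporary view and then queried with plain SQL.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-demo").getOrCreate()
people = spark.createDataFrame([("Alice", 34), ("Bob", 45)], ["name", "age"])
people.createOrReplaceTempView("people")

spark.sql("SELECT name FROM people WHERE age > 40").show()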
Integrated with Hadoop: Apache Spark integrates very well with the Hadoop file system, HDFS.
It supports multiple file formats such as Parquet, JSON, CSV, ORC and Avro, and Hadoop can
easily be leveraged with Spark as an input data source or destination.
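A minimal sketch of this integration, assuming placeholder HDFS paths: Spark reads a CSV
file from HDFS and writes it back as Parquet.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-demo").getOrCreate()

# Read CSV from HDFS (placeholder path), then write it back as Parquet.
df = spark.read.option("header", True).csv("hdfs:///input/sales.csv")
df.write.mode("overwrite").parquet("hdfs:///output/sales_parquet")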
Cost efficient: Apache Spark is open source software, so it has no licensing fee associated
with it; users only have to account for the hardware cost. Apache Spark also reduces many
other costs, since stream processing, machine learning and graph processing come built in.
Spark does not lock users in to any vendor, which makes it very easy for organizations to
pick and choose Spark features as per their use case.