
Big Data Computing

Prof. Rajiv Misra


Computer Science and Engineering, IIT Patna

Lecture – 11

Spark Built-in Libraries


Refer slide time: (0:14)

The Spark Built-in Libraries.

Refer slide time: (0:16)


Apache Spark is a fast and general-purpose cluster computing engine for large-scale data processing. Spark provides high-level APIs in Java, Scala, Python and R. Now let us look at Spark Core, the fast and general-purpose cluster computing engine at the centre of the stack. Because Spark Core provides this fast, general-purpose computation, various applications can run on top of it, and the components or libraries that Spark supports for these different applications are summarized in this diagram. The first is Spark SQL, which provides SQL-like commands so that key-value stores and other structured data can be queried and programmed against using SQL-like statements. The next is Spark Streaming, for real-time streaming applications; here the data arrives in the form of micro-batches and the streams are computed in real time. Another is Spark MLlib, the machine learning library provided over Spark Core. Finally, graph processing is done on top of Spark through the library called GraphX.
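To make the idea of one engine with many libraries concrete, here is a minimal Scala sketch of the common entry point, assuming a recent Spark release where SparkSession wraps the SparkContext; the application name and the local master URL are illustrative choices only.

import org.apache.spark.sql.SparkSession

object SparkStackSketch {
  def main(args: Array[String]): Unit = {
    // One entry point for the whole stack: Spark SQL and the DataFrame-based
    // ML API hang off this SparkSession, while spark.sparkContext exposes the
    // underlying Spark Core (RDDs), which GraphX and Spark Streaming build on.
    val spark = SparkSession.builder()
      .appName("SparkStackSketch")   // illustrative application name
      .master("local[*]")            // local mode, just for this sketch
      .getOrCreate()

    val sc = spark.sparkContext      // Spark Core entry point (RDD API)
    val rdd = sc.parallelize(1 to 10)
    println(s"Sum via the Spark Core RDD API: ${rdd.sum()}")

    spark.stop()
  }
}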

Refer slide time: (2:12)


So these are the standard libraries used for various big data applications; they also support many of the common algorithms used in big data analytics.

Refer slide time: (2:26)

So let us see these standard libraries, which are part of the Spark core, in more detail.

Refer slide time: (2:31)


For example, consider the Spark machine learning library, called MLlib. MLlib provides a collection of scalable machine learning algorithms for big data analytics. Let us summarize the scalable machine learning algorithms available in MLlib. For classification, algorithms such as logistic regression, linear support vector machines, naive Bayes and decision trees are available. For regression, generalized linear models (GLM) and regression trees are available. Similarly, for collaborative filtering, alternating least squares and non-negative matrix factorization are available. For unsupervised clustering or cluster analysis, a parallel k-means algorithm is available. For decomposition, singular value decomposition (SVD) and principal component analysis are available. And for optimization, routines such as stochastic gradient descent and L-BFGS are available.
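As an illustration of one of these classification algorithms, here is a minimal Scala sketch of training logistic regression with Spark's DataFrame-based ML API, assuming a tiny made-up dataset and arbitrary parameter values chosen only for the example.

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object MLlibSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MLlibSketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // A tiny, made-up labelled dataset: (label, feature vector)
    val training = Seq(
      (1.0, Vectors.dense(0.0, 1.1, 0.1)),
      (0.0, Vectors.dense(2.0, 1.0, -1.0)),
      (0.0, Vectors.dense(2.0, 1.3, 1.0)),
      (1.0, Vectors.dense(0.0, 1.2, -0.5))
    ).toDF("label", "features")

    // Logistic regression, one of the classification algorithms listed above
    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
    val model = lr.fit(training)
    println(s"Coefficients: ${model.coefficients}  Intercept: ${model.intercept}")

    spark.stop()
  }
}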

Refer slide time: (4:24)


Let us see another library, which supports graph applications, that is, large-scale graph computation over Spark, provided in the form of GraphX. As this figure shows, GraphX takes the raw data and, through an extract, transform and load (ETL) step, creates the initial graph. It can then perform various transformation operations on the graph, such as creating subgraphs, and it can run graph algorithms such as PageRank to carry out different analyses. All of these operations are supported by GraphX.

Refer slide time: (5:15)

So GraphX is a general-purpose graph processing library; it builds the graph from RDDs of nodes and edges, and a large library of graph algorithms is available as part of GraphX.
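Here is a minimal Scala sketch of that idea, assuming a toy three-vertex graph: the vertex and edge RDDs are built first, the property graph is assembled from them, and then one of the built-in algorithms (PageRank, with an illustrative convergence tolerance) is run on it.

import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object GraphXSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("GraphXSketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // RDD of vertices: (vertex id, vertex attribute)
    val vertices: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    // RDD of edges: source id, destination id, edge attribute
    val edges: RDD[Edge[Int]] =
      sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1)))

    // Build the property graph from the two RDDs
    val graph = Graph(vertices, edges)

    // Run PageRank until the per-iteration change falls below 0.001
    val ranks = graph.pageRank(0.001).vertices
    ranks.collect().foreach { case (id, rank) => println(s"vertex $id -> $rank") }

    spark.stop()
  }
}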

Refer slide time: (5:31)


Let us summarize some of the algorithm families available as part of the GraphX library for graph processing: collaborative filtering, structured prediction, semi-supervised machine learning, community detection, graph analytics, and classification using neural networks. Within graph analytics, algorithms such as PageRank on graphs, personalized PageRank, shortest paths and graph colouring are all available as part of the GraphX library.
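For instance, personalized PageRank and shortest paths can be invoked directly on a GraphX graph; the sketch below assumes the same toy three-vertex graph as before, with the source vertex, landmark vertex and tolerance chosen only for illustration.

import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.graphx.lib.ShortestPaths
import org.apache.spark.sql.SparkSession

object GraphAlgorithmsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("GraphAlgorithmsSketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Same toy graph as in the previous sketch
    val graph = Graph(
      sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c"))),
      sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1))))

    // Personalized PageRank: ranks computed relative to a chosen source vertex (1L)
    val pers = graph.personalizedPageRank(1L, 0.001).vertices
    pers.collect().foreach { case (id, r) => println(s"personalized rank of $id = $r") }

    // Shortest-path distances from every vertex to the landmark vertex 3L
    val paths = ShortestPaths.run(graph, Seq(3L)).vertices
    paths.collect().foreach { case (id, dists) => println(s"distances from $id: $dists") }

    spark.stop()
  }
}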
Refer slide time: (6:23)

Similarly, for community detection, algorithms such as triangle counting, k-core decomposition and k-truss are also available for graph analytics.
Similarly, for streaming, Spark Core provides the Spark Streaming library. Large-scale streaming applications are supported through this standard streaming library, which integrates with Spark so that batch, interactive and streaming computations are unified.
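As a minimal Scala sketch of this micro-batch model, here is the classic word count over a socket text stream, assuming a one-second batch interval and a test text source on localhost:9999 (for example, one started with nc -lk 9999).

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    // Two local threads: one to receive the stream, one to process it
    val conf = new SparkConf().setAppName("StreamingSketch").setMaster("local[2]")
    // Micro-batches of one second each
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed text source on localhost:9999 (e.g. started with `nc -lk 9999`)
    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.print()   // print the word counts computed for each micro-batch

    ssc.start()
    ssc.awaitTermination()
  }
}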

Refer slide time: (6:57)

The next library available with Spark Core is Spark SQL. It enables loading and querying structured data from within Spark, and it also has connectors and APIs for Hive and JSON.
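Here is a minimal Scala sketch of loading and querying structured data, assuming a hypothetical input file people.json with name and age fields; the file and its columns exist only for this illustration.

import org.apache.spark.sql.SparkSession

object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkSqlSketch").master("local[*]").getOrCreate()

    // Assumed input: people.json, one JSON object per line with "name" and "age"
    val people = spark.read.json("people.json")
    people.createOrReplaceTempView("people")

    // SQL-like querying of the loaded structured data
    val adults = spark.sql("SELECT name, age FROM people WHERE age >= 18")
    adults.show()

    spark.stop()
  }
}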

Refer slide time: (7:16)


The Spark community is now the most active open-source community in big data, with 200-plus developers and 50-plus companies contributing. These icons show the presence of Spark in the production clusters of many of these companies, and you can see that Spark has surpassed Hadoop MapReduce in the number of contributions.
Thank you.
