Lecture 6 - Spark ML

The document provides an overview of machine learning, its definitions, types, and applications, particularly focusing on Spark MLlib. It discusses various machine learning techniques such as classification, regression, clustering, and collaborative filtering, as well as the K-means algorithm. Additionally, it highlights the tools and libraries available for machine learning, emphasizing the scalability and efficiency of Spark for data processing.

Uploaded by

Tuân Nguyễn

06/10/2024

Spark MLlib

Instructor: Van-Dang Tran, Ph.D.

MACHINE LEARNING

“Programming computers to optimize performance using example data or past experience”


MACHINE LEARNING?

Field of study that gives "computers the ability to learn without being explicitly programmed."

-- Arthur Samuel, 1959

HAVE YOU PLAYED MARIO?

How much time did it take you to learn & win the princess?


HOW ABOUT AUTOMATING IT?

• Program learns to play Mario
• Observes the game & presses keys
• Maximises score

So?
• The program learnt to play Mario and other games
• Without any game-specific rules being programmed

Question: To make this program learn other games, such as PacMan, we will have to …

1. Write new rules as per the game

2. Just hook it to the new game and let it play for a while

MACHINE LEARNING

• Branch of Artificial Intelligence

• Design and Development of Algorithms
• Computers Evolve Behaviour based on Empirical Data


MACHINE LEARNING - APPLICATIONS

• Recommend friends, dates, and products to the end user.
• Classify content into predefined groups.
• Identify key topics in large collections of text.
• Computer vision: identifying objects.
• Natural language processing.
• Find similar content based on object properties.
• Detect anomalies within given data.
• Rank search results by learning from user feedback.
• Classify DNA sequences.
• Sentiment analysis / opinion mining.
• Bioinformatics.
• Speech and handwriting recognition.

MACHINE LEARNING - TYPES?

• Supervised: given example inputs & outputs, learn to map inputs to outputs
  - Classification
  - Regression
• Unsupervised: no labels given, find structure
  - Clustering
• Reinforcement: act in a dynamic environment to achieve a certain goal

MACHINE LEARNING - CLASSIFICATION?

Example: check an email. Is it spam? The answer is either yes or no.

We use logistic regression.

MACHINE LEARNING - REGRESSION?

Predicting a continuous-valued attribute associated with an object.

In linear regression, we consider all possible lines through the data and choose the one that is closest to all the points, typically by minimizing the sum of squared errors.
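
To make "the line closest to all the points" concrete, here is a minimal sketch of an ordinary least-squares fit for a single feature. It uses plain Scala collections rather than Spark, and the names `LinearFit` and `fit` are illustrative assumptions, not part of MLlib.

```scala
object LinearFit {
  /** Ordinary least-squares fit of y = slope * x + intercept.
    * Returns (slope, intercept). Hypothetical helper, not an MLlib API. */
  def fit(points: Seq[(Double, Double)]): (Double, Double) = {
    require(points.size >= 2, "need at least two points")
    val n = points.size.toDouble
    val meanX = points.map(_._1).sum / n
    val meanY = points.map(_._2).sum / n
    // Covariance-like and variance-like sums around the means
    val sxy = points.map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = points.map { case (x, _) => (x - meanX) * (x - meanX) }.sum
    require(sxx > 0.0, "x values must not all be equal")
    val slope = sxy / sxx
    (slope, meanY - slope * meanX)
  }
}
```

For example, `LinearFit.fit(Seq((0.0, 1.0), (1.0, 3.0), (2.0, 5.0)))` recovers a slope of 2 and an intercept of 1, because those three points lie exactly on y = 2x + 1.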

MACHINE LEARNING - CLUSTERING?

• To form clusters based on some definition of nearness

MACHINE LEARNING - TOOLS

DATA SIZE                        CLASSIFICATION               TOOLS
Lines (sample data)              Analysis and visualization   Whiteboard, …
KBs - low MBs (prototype data)   Analysis and visualization   Matlab, Octave, R, Processing
MBs - low GBs (online data)      Analysis                     NumPy, SciPy, Weka
                                 Visualization                Flare, AmCharts, Raphael, Protovis
GBs - TBs - PBs (big data)       Analysis                     MLlib, SparkR, GraphX, Mahout, Giraph

MACHINE LEARNING USING SPARK

• Spark RDDs → efficient data sharing

• In-memory caching accelerates performance

• Up to 20x faster than Hadoop

• Easy to use high-level programming interface

• Express complex algorithms in ~100 lines.

MACHINE LEARNING LIBRARY (MLlib)

Goal is to make practical machine learning scalable and easy

Consists of common learning algorithms and utilities, including:

• Classification
• Regression
• Clustering
• Collaborative filtering
• Dimensionality reduction
• Lower-level optimization primitives
• Higher-level pipeline APIs

MLlib STRUCTURE

• ML Algorithms: common learning algorithms, e.g. classification, regression, clustering, and collaborative filtering
• Featurization: feature extraction, transformation, dimensionality reduction, and selection
• Pipelines: tools for constructing, evaluating, and tuning ML Pipelines
• Persistence: saving and loading algorithms, models, and Pipelines
• Utilities: linear algebra, statistics, data handling, etc.

MLLIB - COLLABORATIVE FILTERING

• Commonly used for recommender systems
• Techniques aim to fill in the missing entries of a user-item association matrix
• Supports model-based collaborative filtering:
  - Users and products are described by a small set of latent factors that can be used to predict missing entries.
  - MLlib uses the alternating least squares (ALS) algorithm to learn these latent factors.
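
As a small illustration of how latent factors predict a missing entry, the sketch below scores a user-item pair as the dot product of their factor vectors. The factor values in the example are made up, and `LatentFactors.predict` is a hypothetical helper rather than an MLlib call. In practice ALS would learn the factors from the observed ratings.

```scala
object LatentFactors {
  /** Predicted rating for a (user, item) pair: the dot product of the
    * user's and the item's latent-factor vectors (hypothetical helper). */
  def predict(user: Seq[Double], item: Seq[Double]): Double = {
    require(user.length == item.length, "factor vectors must have the same rank")
    user.zip(item).map { case (u, v) => u * v }.sum
  }
}
```

With rank-2 factors `Seq(1.0, 2.0)` for a user and `Seq(3.0, 0.5)` for an item, the predicted entry is 1.0 * 3.0 + 2.0 * 0.5 = 4.0.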

PIPELINES
DataFrame: This ML API uses DataFrame from Spark SQL as an ML dataset, which can hold a variety of data types. E.g., a DataFrame could have different columns storing text, feature vectors, true labels, and predictions.

Transformer: A Transformer is an algorithm which can transform one DataFrame into another DataFrame. E.g., an ML model is a Transformer which transforms a DataFrame with features into a DataFrame with predictions.

Estimator: An Estimator is an algorithm which can be fit on a DataFrame to produce a Transformer. E.g., a learning algorithm is an Estimator which trains on a DataFrame and produces a model.

Pipeline: A Pipeline chains multiple Transformers and Estimators together to specify an ML workflow.

Parameter: All Transformers and Estimators now share a common API for specifying parameters.
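
The relationship between these abstractions can be sketched with a toy model in plain Scala. This is not Spark's API: `Table` stands in for a DataFrame, and `Square`, `MeanEstimator`, and `fitPipeline` are hypothetical names chosen for illustration. The sketch only shows that fitting an Estimator yields a Transformer, and that a fitted pipeline is itself a Transformer.

```scala
object ToyPipeline {
  type Row = Map[String, Double]
  type Table = Seq[Row] // toy stand-in for a DataFrame

  trait Transformer { def transform(data: Table): Table }
  trait Estimator { def fit(data: Table): Transformer }

  /** Transformer: adds a column "x2" holding x squared. */
  object Square extends Transformer {
    def transform(data: Table): Table =
      data.map(r => r + ("x2" -> r("x") * r("x")))
  }

  /** Estimator: learns the mean of "label"; the fitted model predicts that mean. */
  object MeanEstimator extends Estimator {
    def fit(data: Table): Transformer = {
      val mean = data.map(_("label")).sum / data.size
      new Transformer {
        def transform(d: Table): Table = d.map(r => r + ("prediction" -> mean))
      }
    }
  }

  /** Fits the stages in order; each stage sees the output of the previous ones. */
  def fitPipeline(stages: Seq[Either[Transformer, Estimator]], data: Table): Transformer = {
    var current = data
    val fitted: Seq[Transformer] = stages.map {
      case Left(t) =>
        current = t.transform(current)
        t
      case Right(e) =>
        val model = e.fit(current)
        current = model.transform(current)
        model
    }
    new Transformer {
      def transform(d: Table): Table = fitted.foldLeft(d)((acc, t) => t.transform(acc))
    }
  }
}
```

Fitting `Seq(Left(ToyPipeline.Square), Right(ToyPipeline.MeanEstimator))` on a training table returns one Transformer that applies both stages to new rows, mirroring how a fitted pipeline is applied to unseen data.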


spark.mllib - BASIC STATISTICS

Summary statistics
Correlations
Stratified sampling
Hypothesis testing
Random data generation
Kernel density estimation
See https://spark.apache.org/docs/latest/mllib-statistics.html

MLlib - CLASSIFICATION AND REGRESSION

MLlib supports various methods:
Binary Classification
linear SVMs, logistic regression, decision trees, random forests,
gradient-boosted trees, naive Bayes
Multiclass Classification
logistic regression, decision trees, random forests, naive Bayes
Regression
linear least squares, Lasso, ridge regression, decision trees,
random forests, gradient-boosted trees, isotonic regression


MLlib - Other Classes of Algorithms

Dimensionality reduction:
https://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html
Feature extraction and transformation:
https://spark.apache.org/docs/latest/mllib-feature-extraction.html
Frequent pattern mining:
https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html
Evaluation metrics:
https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html
PMML model export:
https://spark.apache.org/docs/latest/mllib-pmml-model-export.html
Optimization (developer):
https://spark.apache.org/docs/latest/mllib-optimization.html

MACHINE LEARNING TECHNIQUES

Classification

Clustering

Regression

Active learning

Collaborative filtering

K-Means Clustering using Spark

Focus: Implementation and Performance


CLUSTERING
Grouping data according to similarity.
E.g. archaeological dig
[Figure: finds plotted by Distance East (x-axis) and Distance North (y-axis)]

K-MEANS ALGORITHM
Benefits:
• Popular
• Fast
• Conceptually straightforward
[Figure: archaeological dig example, Distance East (x-axis) vs. Distance North (y-axis)]

K-MEANS: PRELIMINARIES
Data: collection of values
    data = lines.map(line => parseVector(line))
Dissimilarity: squared Euclidean distance
    dist = p.squaredDist(q)
K = number of clusters
Data assignments to clusters: S1, S2, . . ., SK
[Figures: data points plotted by Feature 1 (x-axis) and Feature 2 (y-axis)]
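
The slides use `parseVector`, `squaredDist`, and (later) `closestPoint` without showing their bodies. The following is one possible sketch of these helpers on plain Scala vectors. The input format (whitespace-separated numbers) and the `KMeansHelpers` object are assumptions made for illustration, not the lecture's actual definitions.

```scala
object KMeansHelpers {
  type Point = Vector[Double]

  /** Parses a line of whitespace-separated numbers (assumed input format). */
  def parseVector(line: String): Point =
    line.trim.split("\\s+").map(_.toDouble).toVector

  /** Squared Euclidean distance between two points of equal dimension. */
  def squaredDist(p: Point, q: Point): Double = {
    require(p.length == q.length, "points must have the same dimension")
    p.zip(q).map { case (a, b) => (a - b) * (a - b) }.sum
  }

  /** Index of the center closest to p under squared Euclidean distance. */
  def closestPoint(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => squaredDist(p, centers(i)))
}
```

Using the squared distance avoids a square root and does not change which center is closest, since the square root is monotonic.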

K-MEANS ALGORITHM
• Initialize K cluster centers
• Repeat until convergence:
  - Assign each data point to the cluster with the closest center.
  - Assign each cluster center to be the mean of its cluster’s data points.
[Figure: data points and cluster centers plotted by Feature 1 (x-axis) and Feature 2 (y-axis)]

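K-MEANS ALGORITHM
// closestPoint and average are helper functions assumed to be defined elsewhere.
// Initialize K cluster centers
var centers = data.takeSample(false, K, seed)
var d = Double.PositiveInfinity
// Repeat until the centers move by no more than ɛ
while (d > ɛ)
{
  // Assign each data point to the cluster with the closest center
  val closest = data.map(p => (closestPoint(p, centers), p))
  val pointsGroup = closest.groupByKey()
  // Each new center is the mean of its cluster's data points
  val newCenters = pointsGroup.mapValues(ps => average(ps)).collectAsMap()
  d = newCenters.map { case (i, c) => centers(i).squaredDist(c) }.sum
  for ((i, c) <- newCenters) centers(i) = c
}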
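
To try the loop without a cluster, here is a minimal local sketch on plain Scala collections instead of RDDs: `groupBy` plays the role of `groupByKey`, and the initial centers are passed in rather than sampled. The `maxIter` cap and the choice to keep the old center for an empty cluster are assumptions added for robustness and are not on the slides.

```scala
object LocalKMeans {
  type Point = Vector[Double]

  def squaredDist(p: Point, q: Point): Double =
    p.zip(q).map { case (a, b) => (a - b) * (a - b) }.sum

  def closestPoint(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => squaredDist(p, centers(i)))

  /** Component-wise mean of a non-empty group of points. */
  def average(ps: Seq[Point]): Point =
    ps.transpose.map(_.sum / ps.size).toVector

  /** Runs K-means from the given initial centers until the total squared
    * movement of the centers is at most epsilon (or maxIter is reached). */
  def run(data: Seq[Point], initial: Seq[Point], epsilon: Double, maxIter: Int = 100): Seq[Point] = {
    var centers = initial.toVector
    var d = Double.PositiveInfinity
    var iter = 0
    while (d > epsilon && iter < maxIter) {
      // Assignment step: group points by the index of their closest center
      val groups = data.groupBy(p => closestPoint(p, centers))
      // Update step: mean of each group; an empty cluster keeps its old center
      val newCenters = centers.indices
        .map(i => groups.get(i).map(average).getOrElse(centers(i)))
        .toVector
      d = centers.zip(newCenters).map { case (a, b) => squaredDist(a, b) }.sum
      centers = newCenters
      iter += 1
    }
    centers
  }
}
```

On four points forming two tight pairs, starting from one point of each pair, the centers settle on the two pair midpoints after the second pass, when the movement `d` drops to zero.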


EASE OF USE
• Interactive shell:
  Useful for featurization, pre-processing data
• Lines of code for K-Means
  - Spark: ~90 lines (part of the hands-on tutorial!)
  - Hadoop/Mahout: ~4 files, > 300 lines

PERFORMANCE
[Figure: iteration time (s) vs. number of machines (25, 50, 100) for K-Means and Logistic Regression, comparing Hadoop, HadoopBinMem, and Spark]
[Zaharia et al., NSDI’12]

CONCLUSION
• Spark: framework for cluster computing
• Fast and easy machine learning programs
• K-means clustering using Spark

Examples and more: www.spark-project.org
