Machine Learning Spark ML

The document discusses machine learning and Spark MLlib. It covers supervised and unsupervised learning and the machine learning process: data preparation, feature engineering, model building, and model evaluation. Spark MLlib can be used to build machine learning models on large datasets.

Uploaded by

Aditya Kumar


Machine Learning with Spark MLlib

Name: Aditya Kumar

Roll No: MCA/40053/22

Department: MCA

Project Guide: Dr. Partha Sarathi Bishnu

Institute Name: Birla Institute of Technology, Mesra, Lalpur

MACHINE LEARNING (ML)

• ML IS A BRANCH OF ARTIFICIAL INTELLIGENCE:

• USES COMPUTER-BASED SYSTEMS TO MAKE SENSE OF DATA
• EXTRACTING PATTERNS, FITTING DATA TO FUNCTIONS, CLASSIFYING DATA, ETC.
• ML SYSTEMS CAN LEARN AND IMPROVE
• WITH HISTORICAL DATA, TIME AND EXPERIENCE
• BRIDGES THEORETICAL COMPUTER SCIENCE AND REAL, NOISY DATA.
ML IN REAL-LIFE

SUPERVISED AND UNSUPERVISED LEARNING
• UNSUPERVISED LEARNING
• THERE IS NO PREDEFINED, KNOWN SET OF OUTCOMES
• LOOKS FOR HIDDEN PATTERNS AND RELATIONS IN THE DATA
• A TYPICAL EXAMPLE: CLUSTERING
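
As an illustration of clustering, the following is a minimal sketch of k-means in plain Python (not Spark MLlib). The sample points, the function names `sq_dist` and `kmeans`, and the choice to start from the first k points are assumptions made for this example only; no labels are given to the algorithm, it only looks for groups in the data.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means on a small list of numeric tuples."""
    # Assumption: the first k points are the starting centroids so the run is
    # deterministic; real implementations usually randomize this step.
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    labels = [min(range(k), key=lambda i: sq_dist(p, centroids[i])) for p in points]
    return centroids, labels


# Hypothetical data: two visibly separate groups, with no outcome column.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.8, 1.1), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The cluster numbers themselves carry no meaning; only the grouping does.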

SUPERVISED AND UNSUPERVISED LEARNING
• SUPERVISED LEARNING
• FOR EVERY EXAMPLE IN THE DATA THERE IS ALWAYS A PREDEFINED OUTCOME
• MODELS THE RELATIONS BETWEEN A SET OF DESCRIPTIVE FEATURES AND A TARGET (FITS DATA TO A FUNCTION)
• TWO GROUPS OF PROBLEMS:
• CLASSIFICATION
• REGRESSION
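
To make "fits data to a function" concrete, here is a small sketch that fits a straight line to labelled examples with ordinary least squares. It is plain Python rather than Spark MLlib, and the data values and the name `fit_line` are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: one descriptive feature and a known target.
features = [1.0, 2.0, 3.0, 4.0]
targets = [3.1, 4.9, 7.2, 8.8]

slope, intercept = fit_line(features, targets)
prediction = slope * 5.0 + intercept  # estimate the target for an unseen x
print(round(slope, 2), round(intercept, 2), round(prediction, 2))
```

Every example carries a predefined outcome (the target), which is what distinguishes this from the unsupervised case.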
SUPERVISED LEARNING
• CLASSIFICATION
• PREDICTS WHICH CLASS A GIVEN SAMPLE OF DATA (SAMPLE OF DESCRIPTIVE FEATURES) IS PART OF (DISCRETE VALUE).
• REGRESSION
• PREDICTS CONTINUOUS VALUES.
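
The difference in output type can be sketched with a tiny classifier. The example below uses a 1-nearest-neighbour rule in plain Python as one possible classification technique (the slides do not name a specific algorithm); the training rows, class names and function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_class(training_rows, sample):
    """Return the label of the training row closest to the sample."""
    _, label = min(training_rows, key=lambda row: sq_dist(row[0], sample))
    return label


# Hypothetical labelled data: (descriptive features, class).
training_rows = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((6.0, 7.0), "large"),
    ((7.0, 6.5), "large"),
]

first = predict_class(training_rows, (2.0, 1.5))
second = predict_class(training_rows, (6.5, 6.0))
print(first, second)  # small large
```

The prediction is always one of a fixed set of classes, whereas a regression model would return a number on a continuous scale.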

MACHINE LEARNING AS A PROCESS
• Define Objectives
- Define measurable and quantifiable goals
- Use this stage to learn about the problem
• Data Preparation
- Normalization
- Transformation
- Missing Values
- Outliers
• Model Building
- Data Splitting
- Features Engineering
- Estimating Performance
- Evaluation and Model Selection
• Model Evaluation
- Study models accuracy
- Work better than the naïve approach or previous system
- Do the results make sense in the context of the problem
• Model Deployment
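
One way to check that a model works better than the naïve approach is to compare its error with a baseline that always predicts the training mean. The sketch below assumes mean absolute error as the measure and uses made-up numbers; `model_predictions` stands in for the output of whatever model is being evaluated.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical targets and predictions, for illustration only.
train_targets = [10.0, 12.0, 14.0, 16.0]
test_targets = [18.0, 20.0]
model_predictions = [17.5, 20.5]

# Naive approach: always predict the mean of the training targets.
baseline_value = mean(train_targets)
baseline_predictions = [baseline_value] * len(test_targets)

baseline_mae = mean_absolute_error(test_targets, baseline_predictions)
model_mae = mean_absolute_error(test_targets, model_predictions)
beats_baseline = model_mae < baseline_mae
print(baseline_mae, model_mae, beats_baseline)  # 6.0 0.5 True
```

If the model cannot beat such a baseline, the extra complexity is hard to justify.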
ML AS A PROCESS: DATA PREPARATION
• Needed for several reasons
- Some models have strict data requirements (scale of the data, data point intervals, etc.)
- Some characteristics of the data may dramatically affect model performance
• Time spent on data preparation should not be underestimated
• Raw Data: typical problems
- Missing Values
- Error Values
- Different Scales
- Dimensionality
- Type Problems
- Many others
• Data Transformation
- Scaling
- Centering
- Skewness
- Outliers
- Missing Values
- Errors
• Data Ready for the Modeling phase
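
Two of the transformations listed above, handling missing values and centering/scaling, can be sketched in a few lines of plain Python. Filling gaps with the column mean is only one possible strategy and is an assumption of this example, as are the function names and the sample column.

```python
from statistics import mean, pstdev


def impute_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread to scale; only center it.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical raw column with one missing value.
raw_column = [2.0, None, 4.0, 6.0]

filled = impute_missing(raw_column)
scaled = standardize(filled)
print(filled)  # [2.0, 4.0, 4.0, 6.0]
print([round(v, 3) for v in scaled])
```

After this step every column is on a comparable scale, which matters for models with strict data requirements.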

ML AS A PROCESS: FEATURE ENGINEERING
• Determining which predictors (features) to use is one of the most critical questions
• Sometimes we need to add predictors
• Reduce Number:
- Fewer predictors mean a more interpretable and less costly model
- Most models are affected by high dimensionality, especially by non-informative predictors
- Wrappers: multiple models, adding and removing parameters
- Genetic Algorithms: algorithms that use models as input and performance as output
- Filters: evaluate the relevance of the predictor, normally based on correlations
• Binning predictors
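
A filter of the kind described above can be sketched as follows: score each predictor by the absolute Pearson correlation with the target and keep those above a cut-off. The threshold of 0.5, the column names and the data are assumptions for illustration; this is plain Python, not a Spark MLlib API.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def filter_predictors(columns, target, threshold=0.5):
    """Keep predictors whose |correlation| with the target meets the threshold."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return kept, scores


# Hypothetical predictors: "size" tracks the target, "noise" does not.
columns = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [3.0, 1.0, 2.0, 1.0, 3.0],
}
target = [2.1, 3.9, 6.2, 8.0, 9.8]

kept, scores = filter_predictors(columns, target)
print(kept)  # ['size']
```

A filter looks at each predictor on its own, so it is cheap, but it can miss predictors that are only useful in combination; wrappers address that at a higher computational cost.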

ML AS A PROCESS: MODEL BUILDING
• DATA SPLITTING
• ALLOCATE DATA TO DIFFERENT TASKS
• MODEL TRAINING
• PERFORMANCE EVALUATION
• DEFINE TRAINING, VALIDATION AND TEST SETS

• FEATURE SELECTION (REVIEW THE DECISION MADE PREVIOUSLY)

• ESTIMATING PERFORMANCE
• VISUALIZATION OF RESULTS – DISCOVER INTERESTING AREAS OF THE PROBLEM SPACE
• STATISTICS AND PERFORMANCE MEASURES

• EVALUATION AND MODEL SELECTION
• THE ‘NO FREE LUNCH’ THEOREM: NO A PRIORI ASSUMPTIONS CAN BE MADE
• AVOID RELYING ON FAVORITE MODELS
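
The splitting and selection steps of this slide can be sketched together: shuffle the data, allocate it to training, validation and test sets, fit several candidate models, pick the one with the best validation score, and report its score on the untouched test set. Everything below is an assumption made for illustration (the 60/20/20 proportions, the seed, the toy data and the two candidate models); it is plain Python rather than Spark MLlib.

```python
import random


def split_data(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle rows and split them into training, validation and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def fit_majority(train):
    """Naive candidate: always predict the most common training label."""
    labels = [y for _, y in train]
    majority = max(sorted(set(labels)), key=labels.count)
    return lambda x: majority


def fit_threshold(train):
    """Candidate: cut halfway between the two class means."""
    zeros = [x for x, y in train if y == 0]
    ones = [x for x, y in train if y == 1]
    cut = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x >= cut else 0


# Hypothetical data set: one feature x, label 1 when x >= 10.
data = [(float(x), 1 if x >= 10 else 0) for x in range(20)]
train, validation, test = split_data(data)

# No model is assumed best in advance: every candidate is fitted and compared.
candidates = {"majority": fit_majority, "threshold": fit_threshold}
models = {name: fit(train) for name, fit in candidates.items()}
val_scores = {name: accuracy(model, validation) for name, model in models.items()}
best_name = max(val_scores, key=val_scores.get)

# The test set is used once, only to estimate the chosen model's performance.
test_accuracy = accuracy(models[best_name], test)
print(best_name, val_scores, test_accuracy)
```

Keeping the test set out of the selection step is what makes the final figure an honest estimate of performance on new data.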
