
Machine Learning Spark ML

The document provides an introduction to machine learning, including defining machine learning, examples of machine learning in real life, supervised vs unsupervised learning, and machine learning as a process involving data preparation, feature engineering, model building, evaluation and deployment.

Introduction to Machine Learning

Perike Chandra Sekhar


Machine Learning (ML)
• ML is a branch of artificial intelligence:
• Uses computer-based systems to make sense of data
• Extracts patterns, fits data to functions, classifies data, etc.
• ML systems can learn and improve
• With historical data, time, and experience
• Bridges theoretical computer science and real, noisy data

ML in real-life

Supervised and Unsupervised Learning
• Unsupervised Learning
• There is no predefined, known set of outcomes
• Looks for hidden patterns and relations in the data
• A typical example: Clustering
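
To make the clustering example concrete, here is a minimal sketch of k-means (Lloyd's algorithm), one common clustering method. The slides do not name a specific algorithm, so the choice of k-means, the function name `kmeans`, and the sample points are all assumptions made for illustration.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters with Lloyd's algorithm (plain k-means)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster ends up empty
                centroids[i] = (
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                )
    return centroids, clusters

# Hypothetical data: two visibly separate groups, with no labels attached.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (8.2, 7.9), (7.9, 8.3)]
centroids, clusters = kmeans(points, k=2)
```

No outcome column is supplied anywhere: the grouping comes only from the distances between points, which is what distinguishes this from the supervised case.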

Supervised and Unsupervised Learning
• Supervised Learning
• Every example in the data has a predefined outcome
• Models the relation between a set of descriptive features and a target (fits data to a function)
• 2 groups of problems:
• Classification
• Regression

Supervised Learning
• Classification
• Predicts which class (a discrete value) a given sample of descriptive features belongs to.

• Regression
• Predicts continuous values.
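
The difference between the two problem groups can be sketched as follows. This is only an illustration under stated assumptions: least-squares line fitting stands in for regression and a 1-nearest-neighbor rule stands in for classification. Neither method, nor the names `fit_line` and `nearest_neighbor`, nor the toy data, comes from the slides.

```python
def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

def nearest_neighbor(train, sample):
    """Classification: return the label of the closest training example."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    _, label = min(train, key=lambda row: sq_dist(row[0], sample))
    return label

# Regression: hypothetical (feature, target) pairs that follow y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted_value = slope * 5 + intercept  # a continuous value

# Classification: hypothetical labeled samples with two descriptive features.
train = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.9), "small"),
    ((5.0, 5.1), "large"),
    ((4.8, 5.3), "large"),
]
predicted_class = nearest_neighbor(train, (4.5, 5.0))  # a discrete label
```

In both cases the training data carries a known outcome for every example; only the type of the predicted target differs.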

Machine Learning as a Process
The process moves through five stages:
• Define Objectives
- Define measurable and quantifiable goals
- Use this stage to learn about the problem
• Data Preparation
- Normalization
- Transformation
- Missing values
- Outliers
• Model Building
- Data splitting
- Feature engineering
- Estimating performance
- Evaluation and model selection
• Model Evaluation
- Study the model's accuracy
- Does it work better than the naïve approach or the previous system?
- Do the results make sense in the context of the problem?
• Model Deployment

ML as a Process: Data Preparation
• Needed for several reasons
• Some Models have strict data requirements
• Scale of the data, data point intervals, etc
• Some characteristics of the data can dramatically affect model performance
• The time needed for data preparation should not be underestimated
Raw Data → Data Transformation → Data Ready → Modeling phase
• Raw data problems: missing values, error values, different scales, dimensionality, type problems, many others
• Data transformation covers: scaling, centering, skewness, outliers, missing values, errors
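
Two of the transformations named above, handling missing values and centering plus scaling, can be sketched in a few lines. This is a minimal sketch, assuming mean imputation and standardization to zero mean and unit standard deviation; the slides do not prescribe either technique, and the names `impute_mean`, `standardize`, and `raw_ages` are hypothetical.

```python
from statistics import mean, pstdev

def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def standardize(column):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:  # a constant column carries no spread to rescale
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]

# Hypothetical raw column with two missing values.
raw_ages = [20.0, None, 40.0, 30.0, None]
clean = impute_mean(raw_ages)
scaled = standardize(clean)
```

When data is later split for model building, the fill value, mean, and standard deviation are normally computed on the training set only and then reused on the other sets, so that no information leaks from the evaluation data.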

ML as a Process: Feature engineering
• Determining which predictors (features) to use is one of the most critical questions
• Sometimes we need to add predictors
• Reduce Number:
• Fewer predictors give a more interpretable and less costly model
• Most models are affected by high dimensionality, especially by non-informative predictors
• Wrappers: algorithms that use models as input and performance as output (multiple models adding and removing parameters, genetic algorithms)
• Filters: evaluate the relevance of the predictor, normally based on correlations

• Binning predictors
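
A filter and a binning step might look like the sketch below. It assumes Pearson correlation with the target as the relevance measure and a fixed cut-off of 0.5; the cut-off, the function names, and the toy columns are hypothetical choices for illustration, not something the slides specify.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0

def filter_features(features, target, threshold=0.5):
    """Filter: keep predictors whose |correlation| with the target reaches threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    ]

def bin_value(value, edges):
    """Binning: index of the interval that value falls into, given sorted edges."""
    return sum(value >= edge for edge in edges)

# Hypothetical predictors: "rooms" tracks the target, "noise" does not.
features = {"rooms": [1, 2, 3, 4, 5], "noise": [3, 1, 3, 1, 3]}
target = [10, 20, 30, 40, 50]
selected = filter_features(features, target)

# Hypothetical age bins: below 18, 18 to 64, 65 and above.
age_bins = [bin_value(age, [18, 65]) for age in (10, 30, 70)]
```

A filter scores each predictor on its own, so it is cheap but can miss predictors that are only useful in combination. A wrapper avoids that by training models on candidate subsets, at a much higher cost.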

ML as a Process: Model Building
• Data Splitting
• Allocate data to different tasks
• model training
• performance evaluation
• Define Training, Validation and Test sets
• Feature Selection (Review the decision made previously)
• Estimating Performance
• Visualization of results: discover interesting areas of the problem space
• Statistics and performance measures
• Evaluation and Model selection
• The ‘no free lunch’ theorem: no a priori assumptions can be made about which model will work best
• Avoid defaulting to favorite models when the problem calls for something else
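
The splitting and selection steps can be sketched as follows. This is a minimal sketch under stated assumptions: a 60/20/20 split, mean absolute error as the performance measure, and two hypothetical candidates (a naive mean predictor and a fixed linear rule standing in for a fitted model). None of these specifics comes from the slides.

```python
import random

def split_data(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle rows and cut them into training, validation, and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )

def mean_absolute_error(rows, predict):
    """Average absolute gap between the target and the model's prediction."""
    return sum(abs(y - predict(x)) for x, y in rows) / len(rows)

# Hypothetical data set of (feature, target) pairs following y = 2x + 1.
rows = [(x, 2 * x + 1) for x in range(100)]
train_set, validation_set, test_set = split_data(rows)

# Candidates: a naive baseline and a stand-in for a model fitted on train_set.
mean_y = sum(y for _, y in train_set) / len(train_set)
candidates = {
    "naive_mean": lambda x: mean_y,
    "linear": lambda x: 2 * x + 1,
}

# Select on the validation set, then report once on the untouched test set.
validation_errors = {
    name: mean_absolute_error(validation_set, model)
    for name, model in candidates.items()
}
best = min(validation_errors, key=validation_errors.get)
test_error = mean_absolute_error(test_set, candidates[best])
```

Keeping the test set out of the selection loop is the point of the three-way split: the validation set picks the model, and the test set gives an estimate that the selection itself has not influenced.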