Introduction ML
Machine Learning
Why Learn?
Machine learning is programming computers to optimize a
performance criterion using example data or past
experience.
There is no need to “learn” to calculate payroll, because an exact algorithm is already known.
Why Learn?
Learning is used when:
◦Human expertise does not exist (navigating on Mars)
◦Humans are unable to explain their expertise (speech
recognition)
◦Solution changes in time (routing on a computer network)
◦Solution needs to be adapted to particular cases (user
biometrics)
Why Learn?
Learning general models from data about particular examples
Data is cheap and abundant; knowledge is expensive and
scarce.
Example in retail: Customer transactions to consumer
behavior:
◦People who bought “Da Vinci Code” also bought “The Five People
You Meet in Heaven”
Build a model that is a good and useful approximation to
the data.
What is Machine Learning?
Machine Learning
◦Study of algorithms that improve their performance at
some task with experience
Optimize a performance criterion using example data
or past experience.
Role of Statistics: Inference from a sample
Role of Computer Science: Efficient algorithms to
◦Solve the optimization problem
◦Represent and evaluate the model for inference
Growth of Machine Learning
Machine learning is the preferred approach to
◦Speech recognition, Natural language processing
◦Computer vision
◦Medical outcomes analysis
◦Robot control
◦Computational biology
This trend is accelerating
◦Improved machine learning algorithms
◦Improved data capture, networking, faster computers
◦Software too complex to write by hand
◦New sensors / IO devices
◦Demand for self-customization to user, environment
Machine Learning Algorithms
Machine learning algorithms are the procedures that let machines learn from
the data provided.
Machine learning algorithms are also called “learners”.
These algorithms can be categorized as:
◦Association Analysis
◦Supervised Learning (Classification, Regression/Prediction)
◦Unsupervised Learning
◦Reinforcement Learning
Association Analysis
Basket analysis:
◦P(Y | X): the probability that somebody who buys X also buys Y, where X
and Y are products or services.
◦Example: P(chips | beer) = 0.7
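The conditional probability above can be estimated by counting. The following is a minimal sketch, assuming a small made-up list of baskets; the `transactions` data and the helper name `conditional_probability` are illustrative and do not come from the slides.

```python
# Hypothetical shopping baskets: each set is one customer's transaction.
transactions = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer", "diapers"},
    {"chips", "salsa"},
    {"beer", "chips", "diapers"},
]


def conditional_probability(transactions, x, y):
    """Estimate P(y | x) as count(x and y) / count(x)."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        raise ValueError(f"no transaction contains {x!r}")
    with_both = [t for t in with_x if y in t]
    return len(with_both) / len(with_x)


p_chips_given_beer = conditional_probability(transactions, "beer", "chips")
print(p_chips_given_beer)  # 3 of the 4 beer baskets contain chips -> 0.75
```

In association-rule terminology this estimate is usually called the confidence of the rule "beer implies chips".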
Association Analysis: Applications
Medical diagnosis
Protein Sequences
Fraud Detection in Credit Card Transactions
Customer Relationship Management (CRM).
Supervised Learning
The class of machine learning algorithms that take labeled
data as input and train a model to predict the output for new inputs
Types of supervised learning algorithms:
◦Classification
◦Regression
Supervised Learning: Classification
A classification problem is one where the output variable is a
category, such as “red” or “blue”, or “disease” or “no disease”.
Important Algorithms
◦Naïve Bayes
◦Logistic Regression
◦Support Vector Machines (SVM)
◦Decision Tree
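As an illustration of one of the algorithms listed above, here is a minimal sketch of logistic regression trained with plain gradient descent on a single feature. The data (hours studied versus pass/fail), the learning rate, and the number of epochs are assumptions chosen for the example, not values from the slides.

```python
import math

# Hypothetical labeled data: hours studied -> failed (0) or passed (1).
X = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.1, epochs=5000):
    """Fit weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        errors = [sigmoid(w * xi + b) - yi for xi, yi in zip(X, y)]
        grad_w = sum(e * xi for e, xi in zip(errors, X)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return the predicted category: 1 if P(class 1) >= 0.5, else 0."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


w, b = train_logistic_regression(X, y)
print(predict(w, b, 1.0), predict(w, b, 4.0))
```

The output of the model is a category, which is what distinguishes classification from regression.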
Classification: Applications
Spam emails
Sentiment analysis
Image classification
Audio analysis
Text classification
Supervised Learning: Regression
A regression problem is one where the output variable is a real
value, such as an amount in dollars or a weight.
Algorithms
◦Simple Linear Regression
◦Multiple Linear Regression
◦Polynomial Regression
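Simple linear regression, the first algorithm listed above, has a closed-form least-squares solution. The sketch below assumes made-up data lying exactly on y = 2x + 1; the function name `fit_simple_linear_regression` is illustrative.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data generated from y = 2x + 1 with no noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)  # recovers slope 2.0 and intercept 1.0
```

Multiple and polynomial regression extend the same least-squares idea to several input features or to powers of the input.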
Regression: Applications
Predictive analytics
Operations efficiency
Supporting decisions
Error correction
Unsupervised Learning
The class of machine learning algorithms that take unlabeled
data as input and discover structure in it, such as distinct groups
◦Clustering
◦Principal Component Analysis (PCA)
◦Anomaly detection
Clustering
A clustering problem is where you want to discover the
inherent groupings in the data, such as grouping customers by
purchasing behavior.
Algorithms
◦k-means
◦DBSCAN
◦Hierarchical Clustering
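A minimal sketch of k-means, the first algorithm listed above, follows. It assumes two-dimensional points and caller-supplied starting centroids; the customer data and the choice of initial centroids are made up for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (visits per month, average basket size).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
initial_centroids = [points[0], points[3]]

centroids, clusters = kmeans(points, initial_centroids)
print(clusters[0])
print(clusters[1])
```

No labels are supplied; the two groups emerge from the distances between the points alone.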
Clustering: Applications
Customer segmentation
Inventory categorization
Anomaly or fraud detection
Pattern recognition
Environmental risks
Principal Component Analysis
PCA is a technique that takes a set of correlated variables and
linearly transforms those variables into a set of uncorrelated
factors.
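For two variables, the linear transformation described above reduces to a rotation of the centered data. The sketch below is one way to illustrate this, assuming a small made-up data set; the function names are illustrative, and the rotation-angle formula applies only to the two-variable case.

```python
import math


def covariance(u, v):
    """Sample covariance of two equally long sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)


def pca_2d(xs, ys):
    """Rotate centered 2-D data so the new coordinates are uncorrelated."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cxx, cyy, cxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    # Angle of the rotation that diagonalizes the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    pc1 = [(x - mean_x) * cos_t + (y - mean_y) * sin_t for x, y in zip(xs, ys)]
    pc2 = [-(x - mean_x) * sin_t + (y - mean_y) * cos_t for x, y in zip(xs, ys)]
    return pc1, pc2


# Hypothetical correlated measurements.
xs = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
ys = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

pc1, pc2 = pca_2d(xs, ys)
print(covariance(xs, ys))    # clearly non-zero: the inputs are correlated
print(covariance(pc1, pc2))  # approximately zero after the transformation
```

Keeping only `pc1`, the direction of largest variance, gives a one-dimensional summary of the two original variables.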
Principal Component Analysis: Applications
Data Compression
Dimensionality reduction
Anomaly Detection
Anomaly detection is a technique used to identify unusual
patterns, called outliers, that do not conform to expected
behavior.
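One simple way to flag outliers is a z-score rule: mark any value that lies more than a chosen number of standard deviations from the mean. This is a minimal sketch of that one technique, not the only approach; the sensor readings and the threshold of 2.0 are assumptions for illustration.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds threshold * stdev."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []  # all values are identical, so nothing is unusual
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical sensor readings with one abnormal value.
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 25.0, 10.0]

outliers = find_outliers(readings)
print(outliers)  # [25.0]
```

Because the outlier itself inflates the standard deviation, this rule can miss anomalies in small samples when the threshold is set too high.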
Anomaly Detection: Applications
Intrusion detection
Health monitoring
Fraud detection
Reinforcement Learning
Problems involving an agent interacting with an environment,
which provides numeric reward signals
Goal: Learn how to take actions in order to maximize reward
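The agent, environment, and reward loop can be sketched with a two-action bandit, one of the simplest reinforcement learning settings. Everything here is an assumption for illustration: the reward means, the bounded noise, the epsilon-greedy strategy, and the function name `run_bandit`.

```python
import random


def run_bandit(mean_rewards, steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy agent learning which action yields the most reward."""
    rng = random.Random(seed)
    n_actions = len(mean_rewards)
    counts = [0] * n_actions
    estimates = [0.0] * n_actions
    total_reward = 0.0
    for t in range(steps):
        if t < n_actions:
            action = t  # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore a random action
        else:
            action = estimates.index(max(estimates))  # exploit the best so far
        # Environment (hypothetical): mean reward plus bounded noise.
        reward = mean_rewards[action] + rng.uniform(-0.1, 0.1)
        # Agent: update the running average reward of the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total_reward = run_bandit([0.2, 1.0])
best_action = estimates.index(max(estimates))
print(best_action, counts)
```

The agent is never told which action is better; it learns this from the numeric rewards alone.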
Reinforcement Learning: Applications
Autonomous Driving
Traffic Control
Computer Games (AlphaGo, etc.)
Machine Learning Pipeline
Data Preparation for training, development and testing
◦ Re-scaling
◦ Normalization
◦ Train/dev/test split
Model training
◦ Fit the model using the training data set
Model Evaluation
◦ Cost function
◦ Performance
Model Improvement
◦ Regularization
◦ Hyperparameter tuning
Model deployment
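The data preparation stage above can be sketched as follows. This is a minimal example assuming a 60/20/20 split and min-max re-scaling; the function names, the fractions, and the data are illustrative. The scaling range is fitted on the training split only and then reused for the other splits, so no information leaks from the development or test data.

```python
import random


def train_dev_test_split(rows, dev_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the rows and cut them into train, dev, and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    n_dev = int(len(rows) * dev_fraction)
    test = rows[:n_test]
    dev = rows[n_test:n_test + n_dev]
    train = rows[n_test + n_dev:]
    return train, dev, test


def fit_min_max(values):
    """Learn the scaling range from the training values only."""
    return min(values), max(values)


def rescale(values, lo, hi):
    """Map values linearly so that lo -> 0.0 and hi -> 1.0."""
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical single-feature data set with ten examples.
data = [float(v) for v in range(10, 110, 10)]

train, dev, test = train_dev_test_split(data)
lo, hi = fit_min_max(train)
train_scaled = rescale(train, lo, hi)
dev_scaled = rescale(dev, lo, hi)
test_scaled = rescale(test, lo, hi)
print(len(train), len(dev), len(test))  # 6 2 2
```

Values in the development and test splits can fall outside the range 0 to 1 after re-scaling, since the range was learned from the training split.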
Summary
Machine learning relies on learning algorithms that enable a machine (computer) to learn
from previously collected data
There are three main classes of machine learning algorithms
◦ Supervised Learning
◦ Unsupervised Learning
◦ Reinforcement Learning