Introduction To Machine Learning: Jaime S. Cardoso

This document provides an introduction to machine learning. It discusses different types of machine learning problems including supervised learning problems like classification and regression. For classification problems, the goal is to predict a class label for new examples, while for regression the goal is to predict a continuous valued output like price. The document uses examples like sorting fish and predicting house prices to illustrate machine learning concepts.


Jaime S. Cardoso
[email protected]
INESC TEC and Faculdade de Engenharia, Universidade do Porto, Portugal

Introduction to Machine Learning

FEUP 2021/22
October 2021, Porto, Portugal
Roadmap
• What is Machine Learning?
• Distinct Learning Problems
• For the same problem, different solutions
• Different solutions but with common traits
– … and ingredients
• Avoiding overfitting and data memorization
• A fair judgement of your algorithm
• Some classical ML algorithms
• Beyond the classics
Artificial Intelligence (AI)
• “ […automation of] activities that we
associate with human thinking, activities such
as decision-making, problem solving,
learning…” (Bellman, 1978)
• “ The branch of computer science that is
concerned with the automation of intelligent
behaviour.” (Luger and Stubblefield, 1993)
• “The ultimate goal of AI is to create
technology that allows computational
machines to function in a highly intelligent
manner.” (Li Deng, 2018)
AI: three generations
1st wave of AI: the sixties
• emulates the decision-making process of a
human expert

[Diagram: Data + Program → Computer → Output]
AI: three generations
1st wave of AI: the sixties
• Based on expert knowledge
– “if-then-else”
• Effective in narrow-domain problems
• Focus on the head or most important parameters
(identified in advance), leaving the “tail” parameters
and cases untouched.

• Transparent and interpretable


• Difficulty in generalizing to new situations and domains
• Cannot handle uncertainty
• Lack the ability to learn algorithmically from data
AI: three generations
2nd wave of AI: the eighties
• Based on (shallow) machine learning

[Diagrams: machine learning: Data + Output → Machine Learning → Program; the learned Program is then run as Data + Program → Computer → Output]
An example*

• Problem: sorting incoming fish on a conveyor belt according to species
• Assume that we have only two kinds of fish:
– Salmon
– Sea bass
*Adapted from Duda, Hart and Stork, Pattern Classification, 2nd Ed.
An example: decision process
• What kind of information can distinguish one species
from the other?
– Length, width, weight, number and shape of fins, tail
shape, etc.
• What can cause problems during sensing?
– Lighting conditions, position of fish on the conveyor belt,
camera noise, etc.
• What are the steps in the process?
– Capture image -> isolate fish -> take measurements ->
make decision

An example: our system
• Sensor
– The camera captures an image as a new fish enters the sorting area
• Preprocessing
– Adjustments for average intensity levels
– Segmentation to separate fish from background
• Feature Extraction
– Assume a fisherman told us that a sea bass is generally longer than a salmon. We
can use length as a feature and decide between sea bass and salmon according to a
threshold on length.

[Pipeline diagram: Sensor → Pixels → Filtering → Features → Decision]
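As a minimal sketch of this decision rule (the threshold value below is an illustrative assumption, not the one tuned on the training data):

```python
def classify_fish(length_cm, threshold_cm=50.0):
    """Decide 'sea bass' if the fish is longer than the threshold, else 'salmon'."""
    # threshold_cm is illustrative; in the example it would be chosen
    # from the training histograms of the length feature.
    return "sea bass" if length_cm > threshold_cm else "salmon"
```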
An example: features

We estimate the system’s probability of error and obtain a discouraging result of 40%. Can we improve this result?
An example: features
• Even though sea bass is longer than salmon on the
average, there are many examples of fish where this
observation does not hold
• Committed to achieving a higher recognition rate, we
try a number of features
– Width, Area, Position of the eyes w.r.t. the mouth...
– only to find out that these features contain no
discriminatory information
• Finally, we find a “good” feature: the average intensity of
the fish scales
An example: features

Histogram of the lightness feature for the two types of fish in the training samples. It looks easier to choose the threshold, but we still cannot make a perfect decision.
An example: multiple features

• We can use two features in our decision:
– lightness: 𝒙1
– length: 𝒙2
• Each fish image is now represented as a point (feature vector)

$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$

in a two-dimensional feature space.
An example: multiple features

Scatter plot of the lightness and length features for the training samples. We can compute a decision boundary to divide the feature space into two regions, with a classification rate of 95.7%.
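A hedged sketch of such a linear rule on the two-feature representation (the weight and bias values are illustrative assumptions, not the parameters behind the 95.7% figure):

```python
import numpy as np

# Linear decision rule on x = [lightness, length].
w = np.array([-1.0, 0.8])   # assumed weights (illustrative)
b = 0.5                     # assumed bias (illustrative)

def classify(x):
    # Points on one side of the hyperplane w.x + b = 0 are sea bass,
    # points on the other side are salmon.
    return "sea bass" if w @ x + b > 0 else "salmon"
```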
An example: cost of error

• We should also consider the costs of the different errors we make in our decisions.
• For example, if the fish packing company knows that:
– Customers who buy salmon will object vigorously if they
see sea bass in their cans.
– Customers who buy sea bass will not be unhappy if they
occasionally see some expensive salmon in their cans.
• How does this knowledge affect our decision?

An example: cost of error

We could intuitively shift the decision boundary to minimize an alternative cost function.
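Following Duda et al. (from whom the example is adapted), this can be made precise by choosing the action $\alpha_i$ that minimizes the conditional risk

$R(\alpha_i \mid \mathbf{x}) = \sum_j \lambda(\alpha_i \mid C_j)\, p(C_j \mid \mathbf{x})$,

where $\lambda(\alpha_i \mid C_j)$ is the cost of deciding $\alpha_i$ when the true class is $C_j$; making “sea bass in a salmon can” expensive shifts the boundary exactly as suggested above.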
An example: generalization
• The issue of generalization
– The recognition rate of our linear classifier (95.7%) met the
design specifications, but we still think we can improve the
performance of the system
– We then design a classifier that obtains an impressive
classification rate of 99.9975% with the following decision
boundary

[Figure: a highly complex decision boundary that separates all training samples]
Data Driven Design
• When to use?
– Difficult to reason about a generic rule that solves
the problem
– Easy to collect examples (with the solution)

Data Driven Design
• There is little or no domain theory
• Thus the system will learn (i.e., generalize)
from training data the general input-output
function
– Programming computers to use example data or past
experience
• The system produces a program that
implements a function that assigns the
decision to any observation (and not just the
input-output patterns of the training data)
What is Machine Learning?
• Automating the Automation

[Diagram: conventional programming: Data + Program → Computer → Output; machine learning: Data + Output → Machine Learning → Program]
Data Driven Design
• A good learning program learns something
about the data beyond the specific cases that
have been presented to it
– Indeed, it is trivial to just store and retrieve the
cases that have been seen in the past
• This does not address the problem of how to handle
new cases, however
• Over-fitting a model to the data means that,
instead of general properties of the
population, we learn idiosyncrasies (i.e., non-
representative properties) of the sample.
DISTINCT LEARNING PROBLEMS

Taxonomy of the Learning Settings
Goals and available data dictate the type of learning problem
• Supervised Learning
  – Classification
    • Binary
    • Multiclass
      – Nominal
      – Ordinal
  – Regression
  – Ranking
  – Counting
• Semi-supervised Learning
• Unsupervised Learning
• Reinforcement Learning
• etc.
Supervised Learning: Examples

Classification/Regression

y = f(x), where y is the output, f is the prediction function, and x is the feature vector

• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
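A minimal sketch of this train/test protocol, using logistic regression as one concrete (assumed) choice of f and made-up toy data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy labeled training set {(x1, y1), ..., (xN, yN)} (made up).
X_train = np.array([[1.0, 5.0], [2.0, 4.0], [6.0, 1.0], [7.0, 2.0]])
y_train = np.array([0, 0, 1, 1])

# Training: estimate f by fitting to the labeled examples.
f = LogisticRegression().fit(X_train, y_train)

# Testing: apply f to a never-before-seen example.
y_pred = f.predict(np.array([[6.5, 1.5]]))
```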

Regression
• Predicting house price
– Output: price (a scalar)
– Inputs: size, orientation, localization, distance to key
services, etc.

• Given a collection of labelled examples (= houses with known price), come up with a function that will predict the price of new examples (houses).
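A hedged sketch of this with ordinary least squares; the features (size, distance to key services) and all numbers are made up for illustration:

```python
import numpy as np

# One row per house: [size in m^2, distance to key services in km].
X = np.array([[90.0, 1.2],
              [120.0, 0.5],
              [60.0, 3.0],
              [150.0, 0.8]])
y = np.array([200_000.0, 310_000.0, 120_000.0, 390_000.0])  # known prices

# Fit a linear model with an intercept by least squares.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the price of a new house (size 100 m^2, 1 km away).
new_house = np.array([100.0, 1.0, 1.0])
predicted_price = new_house @ coef
```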
Supervised Learning in computer vision

[Diagram. Training: Training Images + Training Labels → Image Features → Training → Learned model. Testing: Test Image → Image Features → Learned model → Prediction]
FOR THE SAME PROBLEM, DIFFERENT SOLUTIONS
… but with common traits
Design of a Classifier

[Three figures: alternative decision boundaries on the same color/length feature space for the fish-sorting problem]
Taxonomy of the Learning Tools
Classifiers can be split by whether or not they compute posterior probabilities (the probability of a certain class given the data):
• Discriminant functions (no computation of posterior probabilities)
– Properties: directly map each x onto a class label
– Tools: Least Squares Classification, Fisher’s Linear Discriminant, SVM, etc.
• Probabilistic Discriminative Models (computation of posterior probabilities)
– Properties: model the posterior probabilities p(Ck|x) directly
– Tools: Logistic Regression
• Probabilistic Generative Models (computation of posterior probabilities)
– Properties: model the class priors p(Ck) and the class-conditional densities p(x|Ck); use them to compute the posterior probabilities p(Ck|x)
– Tools: Bayes
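For the generative route, these posterior probabilities are obtained with Bayes’ theorem:

$p(C_k \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid C_k)\, p(C_k)}{\sum_j p(\mathbf{x} \mid C_j)\, p(C_j)}$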
Pros and Cons of the three approaches
• Discriminant functions are the simplest and most
intuitive approach to classifying data, but they do not
allow one to
– compensate for class priors (e.g. class 1 is a very rare
disease)
– minimize risk (e.g. classifying sick person as healthy
more costly than classifying healthy person as sick)
– implement reject option (e.g. person cannot be
classified as sick or healthy with a sufficiently high
probability)

Pros and Cons of the three approaches
• Generative models provide a probabilistic model of all
variables that allows one to synthesize new data and to do
novelty detection, but
– generating all this information is computationally expensive and
complex, and is not needed for a simple classification decision

• Discriminative models provide a probabilistic model for the
target variable (classes) conditional on the observed variables
– this is usually sufficient for making a well-informed
classification decision, without the disadvantages of the
simple discriminant functions
DIFFERENT SOLUTIONS BUT WITH
COMMON INGREDIENTS

Common steps
• The learning of a model from the data entails:
– Model representation
– Evaluation
– Optimization

Linear Regression
• Model representation
• Evaluation
• Optimization: finding the model that maximizes our measure of quality
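As a hedged stand-in for the formulas on the original slides, here is a minimal sketch of the three ingredients, assuming a univariate linear model, mean squared error as the evaluation measure, and gradient descent as the optimizer (all data below is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: house sizes (m^2) and noisy, linearly related prices.
x = rng.uniform(50, 200, size=100)
y = 1000.0 * x + 20000 + rng.normal(0, 5000, size=100)
x = (x - x.mean()) / x.std()   # standardize for stable gradient descent

# Model representation: y_hat = w * x + b
w, b = 0.0, 0.0
lr = 0.1                       # learning rate (assumed)

for _ in range(500):
    err = (w * x + b) - y
    # Evaluation: mean squared error, J = mean(err^2)
    # Optimization: gradient descent on J with respect to w and b
    w -= lr * 2 * np.mean(err * x)
    b -= lr * 2 * np.mean(err)
```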
Let’s design a classifier
• Use the (hyper-)plane orthogonal to the line
joining the means
– project the data in the direction given by the line
joining the class means

[Figure: the resulting decision boundary, orthogonal to the line joining the class means]
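A minimal sketch of this mean-based rule (sometimes called a nearest-class-mean classifier); the function and class labels here are illustrative:

```python
import numpy as np

def fit_nearest_mean(X1, X2):
    """Hyperplane orthogonal to the line joining the class means,
    placed at their midpoint."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    w = m1 - m2                 # direction given by the line joining the means
    b = -w @ (m1 + m2) / 2.0    # boundary passes through the midpoint
    return w, b

def predict(w, b, x):
    # Project x onto w and threshold: positive side is class 1.
    return 1 if w @ x + b > 0 else 2
```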
Fisher's linear discriminant
• Every algorithm has three components:
– Model representation
– Evaluation
– Optimization
• Model representation: class of linear models
• Evaluation: find the direction w that maximizes
$J(\mathbf{w}) = \dfrac{(m_2 - m_1)^2}{s_1^2 + s_2^2}$
• Optimization (a sketch of the closed-form solution follows below)
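For two classes, the maximizer of J(w) has the well-known closed form $\mathbf{w} \propto S_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$, with $S_W$ the within-class scatter matrix; a minimal sketch:

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher's discriminant direction for two classes:
    w proportional to Sw^{-1} (m2 - m1)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of (x - m)(x - m)^T over each class.
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    return np.linalg.solve(Sw, m2 - m1)
```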

AVOIDING OVERFITTING AND DATA MEMORIZATION
Hyper parameters / user defined parameters
Regularization
• To build a machine learning algorithm we specify a
model family, a cost function and an optimization
procedure
• Regularization is any modification we make to a
learning algorithm that is intended to reduce its
generalization error but not its training error
– There are many regularization strategies
• Regularization works by trading increased bias for
reduced variance. An effective regularizer is one
that makes a profitable trade, reducing variance
significantly while not overly increasing the bias.

Regularized Regression

Regularized classifier
• Hyper parameters / user defined parameters

Parameter Norm Penalties
• Penalize complexity in the loss function
– Model complexity
– Weight Decay

Regularization
• Evaluation
– Minimize (error in data) + λ (model complexity)
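A hedged sketch of this objective as ridge regression, where the complexity term is the squared norm of the weights (the weight decay mentioned above):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form minimizer of ||Xw - y||^2 + lam * ||w||^2."""
    n_features = X.shape[1]
    # lam (the lambda hyper-parameter) trades data error for model complexity.
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)
```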

1-Nearest neighbour classifier
Assign the label of the nearest training data point to each test data point.

[Figure, from Duda et al.: black = negative, red = positive. A novel test example falls closest to a positive example from the training set, so it is classified as positive; the plot shows the Voronoi partitioning of the feature space for 2-category 2D data.]
k-Nearest neighbour classifier
• For a new point, find the k closest points from training data
• Labels of the k points “vote” to classify

[Figure: k = 5, black = negative, red = positive. If the query lands in the illustrated region, the 5 nearest neighbours comprise 3 negatives and 2 positives, so we classify it as negative.]
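A minimal sketch of the voting rule, assuming Euclidean distance and integer class labels (e.g. 0 = negative, 1 = positive); with k = 1 it reduces to the nearest-neighbour classifier of the previous slide:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5):
    """Majority vote among the k training points closest to x_query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                    # indices of k closest
    return np.bincount(y_train[nearest]).argmax()      # majority label
```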
kNN as a classifier

• Advantages:
– Simple to implement
– Flexible to feature / distance choices
– Naturally handles multi-class cases
– Can do well in practice with enough representative data
• Disadvantages:
– Large search problem to find nearest neighbors → Highly
susceptible to the curse of dimensionality
– Storage of data
– Must have a meaningful distance function

What is Machine Learning?
• Automating the Automation
[Diagram: conventional programming: Data + Program → Computer → Output; machine learning: Data + Output + user parameters (hyper parameters) → Machine Learning → Program (model)]
