Introduction To Machine Learning: Jaime S. Cardoso

This document provides an introduction to machine learning. It discusses different types of machine learning problems including supervised learning problems like classification and regression. For classification problems, the goal is to predict a class label for new examples, while for regression the goal is to predict a continuous valued output like price. The document uses examples like sorting fish and predicting house prices to illustrate machine learning concepts.


Jaime S. Cardoso
[email protected]
INESC TEC and Faculdade de Engenharia, Universidade do Porto, Portugal

Introduction to Machine Learning

FEUP 2021/22
October 2021, Porto, Portugal
Roadmap
• What is Machine Learning?
• Distinct Learning Problems
• For the same problem, different solutions
• Different solutions but with common traits
– … and ingredients
• Avoiding overfitting and data memorization
• A fair judgement of your algorithm
• Some classical ML algorithms
• Beyond the classics
Artificial Intelligence (AI)
• “ […automation of] activities that we
associate with human thinking, activities such
as decision-making, problem solving,
learning…” (Bellman, 1978)
• “ The branch of computer science that is
concerned with the automation of intelligent
behaviour.” (Luger and Stubblefield, 1993)
• “The ultimate goal of AI is to create
technology that allows computational
machines to function in a highly intelligent
manner.” (Li Deng, 2018)
AI: three generations
1st wave of AI: the sixties
• emulates the decision-making process of a
human expert

[Diagram: Data + Program → Computer → Output]
AI: three generations
1st wave of AI: the sixties
• Based on expert knowledge
– “if-then-else”
• Effective in narrow-domain problems
• Focus on the head or most important parameters
(identified in advance), leaving the “tail” parameters
and cases untouched.

• Transparent and interpretable


• Difficulty in generalizing to new situations and domains
• Cannot handle uncertainty
• Lack the ability to learn algorithmically from data
AI: three generations
2nd wave of AI: the eighties
• Based on (shallow) machine learning

[Diagrams: machine learning: Data + Output → Machine Learning → Program; the learned Program is then run as Data + Program → Computer → Output]
An example*

• Problem: sorting incoming fish on a conveyor belt according to species
• Assume that we have only two kinds of fish:
– Salmon
– Sea bass
*Adapted from Duda, Hart and Stork, Pattern Classification, 2nd Ed.
An example: decision process
• What kind of information can distinguish one species
from the other?
– Length, width, weight, number and shape of fins, tail
shape, etc.
• What can cause problems during sensing?
– Lighting conditions, position of fish on the conveyor belt,
camera noise, etc.
• What are the steps in the process?
– Capture image -> isolate fish -> take measurements ->
make decision

An example: our system
• Sensor
– The camera captures an image as a new fish enters the sorting area
• Preprocessing
– Adjustments for average intensity levels
– Segmentation to separate fish from background
• Feature Extraction
– Assume a fisherman told us that a sea bass is generally longer than a salmon. We
can use length as a feature and decide between sea bass and salmon according to a
threshold on length.

[Pipeline diagram: Sensor → Pixels → Filtering → Features → Decision]
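As a minimal sketch of this decision rule (the threshold value below is an illustrative assumption, not the one tuned on the training data):

```python
def classify_fish(length_cm, threshold_cm=50.0):
    """Decide 'sea bass' if the fish is longer than the threshold, else 'salmon'."""
    # threshold_cm is illustrative; in the example it would be chosen
    # from the training histograms of the length feature.
    return "sea bass" if length_cm > threshold_cm else "salmon"
```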
An example: features

We estimate the system’s probability of error and obtain a discouraging result of 40%. Can we improve this result?
An example: features
• Even though sea bass is longer than salmon on the
average, there are many examples of fish where this
observation does not hold
• Committed to achieving a higher recognition rate, we
try a number of features
– Width, Area, Position of the eyes w.r.t. the mouth...
– only to find out that these features contain no
discriminatory information
• Finally, we find a “good” feature: the average intensity of
the fish scales
An example: features

Histogram of the lightness feature for the two types of fish in the training samples. It looks easier to choose the threshold, but we still cannot make a perfect decision.
An example: multiple features

• We can use two features in our decision:
– lightness: 𝒙1
– length: 𝒙2
• Each fish image is now represented as a point (feature vector)

$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$

in a two-dimensional feature space.
An example: multiple features

Scatter plot of the lightness and length features for the training samples. We can compute a decision boundary to divide the feature space into two regions, with a classification rate of 95.7%.
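A hedged sketch of such a linear rule on the two-feature representation (the weight and bias values are illustrative assumptions, not the parameters behind the 95.7% figure):

```python
import numpy as np

# Linear decision rule on x = [lightness, length].
w = np.array([-1.0, 0.8])   # assumed weights (illustrative)
b = 0.5                     # assumed bias (illustrative)

def classify(x):
    # Points on one side of the hyperplane w.x + b = 0 are sea bass,
    # points on the other side are salmon.
    return "sea bass" if w @ x + b > 0 else "salmon"
```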
An example: cost of error

• We should also consider the costs of the different errors we make in our decisions.
• For example, if the fish packing company knows that:
– Customers who buy salmon will object vigorously if they
see sea bass in their cans.
– Customers who buy sea bass will not be unhappy if they
occasionally see some expensive salmon in their cans.
• How does this knowledge affect our decision?

An example: cost of error

We could intuitively shift the decision boundary to minimize an alternative cost function.
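Following Duda et al. (from whom the example is adapted), this can be made precise by choosing the action $\alpha_i$ that minimizes the conditional risk

$R(\alpha_i \mid \mathbf{x}) = \sum_j \lambda(\alpha_i \mid C_j)\, p(C_j \mid \mathbf{x})$,

where $\lambda(\alpha_i \mid C_j)$ is the cost of deciding $\alpha_i$ when the true class is $C_j$; making “sea bass in a salmon can” expensive shifts the boundary exactly as suggested above.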
An example: generalization
• The issue of generalization
– The recognition rate of our linear classifier (95.7%) met the
design specifications, but we still think we can improve the
performance of the system
– We then design a classifier that obtains an impressive
classification rate of 99.9975% with the following decision
boundary

[Figure: a highly complex decision boundary that separates all training samples]
Data Driven Design
• When to use?
– Difficult to reason about a generic rule that solves
the problem
– Easy to collect examples (with the solution)

Data Driven Design
• There is little or no domain theory
• Thus the system will learn (i.e., generalize)
from training data the general input-output
function
– Programming computers to use example data or past
experience
• The system produces a program that
implements a function that assigns the
decision to any observation (and not just the
input-output patterns of the training data)
What is Machine Learning?
• Automating the Automation

[Diagram: conventional programming: Data + Program → Computer → Output; machine learning: Data + Output → Machine Learning → Program]
Data Driven Design
• A good learning program learns something
about the data beyond the specific cases that
have been presented to it
– Indeed, it is trivial to just store and retrieve the
cases that have been seen in the past
• This does not address the problem of how to handle
new cases, however
• Over-fitting a model to the data means that,
instead of general properties of the
population, we learn idiosyncrasies (i.e., non-
representative properties) of the sample.
DISTINCT LEARNING PROBLEMS

Taxonomy of the Learning Settings
Goals and available data dictate the type of learning problem
• Supervised Learning
  – Classification
    • Binary
    • Multiclass
      – Nominal
      – Ordinal
  – Regression
  – Ranking
  – Counting
• Semi-supervised Learning
• Unsupervised Learning
• Reinforcement Learning
• etc.
Supervised Learning: Examples

Classification/Regression

y = f(x), where y is the output, f is the prediction function, and x is the feature vector

• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
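A minimal sketch of this train/test protocol, using logistic regression as one concrete (assumed) choice of f and made-up toy data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy labeled training set {(x1, y1), ..., (xN, yN)} (made up).
X_train = np.array([[1.0, 5.0], [2.0, 4.0], [6.0, 1.0], [7.0, 2.0]])
y_train = np.array([0, 0, 1, 1])

# Training: estimate f by fitting to the labeled examples.
f = LogisticRegression().fit(X_train, y_train)

# Testing: apply f to a never-before-seen example.
y_pred = f.predict(np.array([[6.5, 1.5]]))
```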

Regression
• Predicting house price
– Output: price (a scalar)
– Inputs: size, orientation, localization, distance to key
services, etc.

• Given a collection of labelled examples (= houses with known price), come up with a function that will predict the price of new examples (houses).
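A hedged sketch of this with ordinary least squares; the features (size, distance to key services) and all numbers are made up for illustration:

```python
import numpy as np

# One row per house: [size in m^2, distance to key services in km].
X = np.array([[90.0, 1.2],
              [120.0, 0.5],
              [60.0, 3.0],
              [150.0, 0.8]])
y = np.array([200_000.0, 310_000.0, 120_000.0, 390_000.0])  # known prices

# Fit a linear model with an intercept by least squares.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the price of a new house (size 100 m^2, 1 km away).
new_house = np.array([100.0, 1.0, 1.0])
predicted_price = new_house @ coef
```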
Supervised Learning in computer vision

[Diagram. Training: Training Images + Training Labels → Image Features → Training → Learned model. Testing: Test Image → Image Features → Learned model → Prediction]
FOR THE SAME PROBLEM, DIFFERENT SOLUTIONS
… but with common traits
Design of a Classifier

[Three figures: alternative decision boundaries on the same color/length feature space for the fish-sorting problem]
Taxonomy of the Learning Tools
Classifiers can be split by whether or not they compute posterior probabilities (the probability of a certain class given the data):
• Discriminant functions (no computation of posterior probabilities)
– Properties: directly map each x onto a class label
– Tools: Least Squares Classification, Fisher’s Linear Discriminant, SVM, etc.
• Probabilistic Discriminative Models (computation of posterior probabilities)
– Properties: model the posterior probabilities p(Ck|x) directly
– Tools: Logistic Regression
• Probabilistic Generative Models (computation of posterior probabilities)
– Properties: model the class priors p(Ck) and the class-conditional densities p(x|Ck); use them to compute the posterior probabilities p(Ck|x)
– Tools: Bayes
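For the generative route, these posterior probabilities are obtained with Bayes’ theorem:

$p(C_k \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid C_k)\, p(C_k)}{\sum_j p(\mathbf{x} \mid C_j)\, p(C_j)}$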
Pros and Cons of the three approaches
• Discriminant functions are the simplest and most
intuitive approach to classifying data, but they do not
allow one to
– compensate for class priors (e.g. class 1 is a very rare
disease)
– minimize risk (e.g. classifying sick person as healthy
more costly than classifying healthy person as sick)
– implement reject option (e.g. person cannot be
classified as sick or healthy with a sufficiently high
probability)

Pros and Cons of the three approaches
• Generative models provide a probabilistic model of all
variables that allows one to synthesize new data and to do
novelty detection, but
– generating all this information is computationally expensive and
complex, and is not needed for a simple classification decision

• Discriminative models provide a probabilistic model for the
target variable (classes) conditional on the observed variables
– this is usually sufficient for making a well-informed
classification decision, without the disadvantages of the
simple discriminant functions
DIFFERENT SOLUTIONS BUT WITH
COMMON INGREDIENTS

Common steps
• The learning of a model from the data entails:
– Model representation
– Evaluation
– Optimization

Linear Regression
• Model representation
• Evaluation
• Optimization: finding the model that maximizes our measure of quality
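As a hedged stand-in for the formulas on the original slides, here is a minimal sketch of the three ingredients, assuming a univariate linear model, mean squared error as the evaluation measure, and gradient descent as the optimizer (all data below is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: house sizes (m^2) and noisy, linearly related prices.
x = rng.uniform(50, 200, size=100)
y = 1000.0 * x + 20000 + rng.normal(0, 5000, size=100)
x = (x - x.mean()) / x.std()   # standardize for stable gradient descent

# Model representation: y_hat = w * x + b
w, b = 0.0, 0.0
lr = 0.1                       # learning rate (assumed)

for _ in range(500):
    err = (w * x + b) - y
    # Evaluation: mean squared error, J = mean(err^2)
    # Optimization: gradient descent on J with respect to w and b
    w -= lr * 2 * np.mean(err * x)
    b -= lr * 2 * np.mean(err)
```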
Let’s design a classifier
• Use the (hyper-)plane orthogonal to the line
joining the means
– project the data in the direction given by the line
joining the class means

[Figure: the resulting decision boundary, orthogonal to the line joining the class means]
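A minimal sketch of this mean-based rule (sometimes called a nearest-class-mean classifier); the function and class labels here are illustrative:

```python
import numpy as np

def fit_nearest_mean(X1, X2):
    """Hyperplane orthogonal to the line joining the class means,
    placed at their midpoint."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    w = m1 - m2                 # direction given by the line joining the means
    b = -w @ (m1 + m2) / 2.0    # boundary passes through the midpoint
    return w, b

def predict(w, b, x):
    # Project x onto w and threshold: positive side is class 1.
    return 1 if w @ x + b > 0 else 2
```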
Fisher's linear discriminant
• Every algorithm has three components:
– Model representation
– Evaluation
– Optimization
• Model representation: class of linear models
• Evaluation: find the direction w that maximizes
$J(\mathbf{w}) = \dfrac{(m_2 - m_1)^2}{s_1^2 + s_2^2}$
• Optimization (a sketch of the closed-form solution follows below)
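For two classes, the maximizer of J(w) has the well-known closed form $\mathbf{w} \propto S_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$, with $S_W$ the within-class scatter matrix; a minimal sketch:

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher's discriminant direction for two classes:
    w proportional to Sw^{-1} (m2 - m1)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of (x - m)(x - m)^T over each class.
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    return np.linalg.solve(Sw, m2 - m1)
```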

AVOIDING OVERFITTING AND DATA MEMORIZATION
Hyper parameters / user defined parameters
Regularization
• To build a machine learning algorithm we specify a
model family, a cost function and an optimization
procedure
• Regularization is any modification we make to a
learning algorithm that is intended to reduce its
generalization error but not its training error
– There are many regularization strategies
• Regularization works by trading increased bias for
reduced variance. An effective regularizer is one
that makes a profitable trade, reducing variance
significantly while not overly increasing the bias.

Regularized Regression

Regularized classifier
• Hyper parameters / user defined parameters

Parameter Norm Penalties
• Penalize complexity in the loss function
– Model complexity
– Weight Decay

Regularization
• Evaluation
– Minimize (error in data) + λ (model complexity)
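A hedged sketch of this objective as ridge regression, where the complexity term is the squared norm of the weights (the weight decay mentioned above):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form minimizer of ||Xw - y||^2 + lam * ||w||^2."""
    n_features = X.shape[1]
    # lam (the lambda hyper-parameter) trades data error for model complexity.
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)
```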

1-Nearest neighbour classifier
Assign the label of the nearest training data point to each test data point.

[Figure, from Duda et al.: black = negative, red = positive. A novel test example falls closest to a positive example from the training set, so it is classified as positive; the plot shows the Voronoi partitioning of the feature space for 2-category 2D data.]
k-Nearest neighbour classifier
• For a new point, find the k closest points from training data
• Labels of the k points “vote” to classify

[Figure: k = 5, black = negative, red = positive. If the query lands in the illustrated region, the 5 nearest neighbours comprise 3 negatives and 2 positives, so we classify it as negative.]
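A minimal sketch of the voting rule, assuming Euclidean distance and integer class labels (e.g. 0 = negative, 1 = positive); with k = 1 it reduces to the nearest-neighbour classifier of the previous slide:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5):
    """Majority vote among the k training points closest to x_query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                    # indices of k closest
    return np.bincount(y_train[nearest]).argmax()      # majority label
```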
kNN as a classifier

• Advantages:
– Simple to implement
– Flexible to feature / distance choices
– Naturally handles multi-class cases
– Can do well in practice with enough representative data
• Disadvantages:
– Large search problem to find nearest neighbors → Highly
susceptible to the curse of dimensionality
– Storage of data
– Must have a meaningful distance function

What is Machine Learning?
• Automating the Automation
[Diagram: conventional programming: Data + Program → Computer → Output; machine learning: Data + Output + user parameters (hyper parameters) → Machine Learning → Program (model)]
