Introduction To Machine Learning
Jaime S. Cardoso
[email protected]
FEUP 2021/22
October 2021, Porto, Portugal
Roadmap
• What’s Machine Learning
• Distinct Learning Problems
• For the same problem, different solutions
• Different solutions but with common traits
– … and ingredients
• Avoiding overfitting and data memorization
• A fair judgement of your algorithm
• Some classical ML algorithms
• Beyond the classics
Artificial Intelligence (AI)
• “ […automation of] activities that we
associate with human thinking, activities such
as decision-making, problem solving,
learning…” (Bellman, 1978)
• “ The branch of computer science that is
concerned with the automation of intelligent
behaviour.” (Luger and Stubblefield, 1993)
• “The ultimate goal of AI is to create technology that allows
computational machines to function in a highly intelligent
manner.” (Li Deng, 2018)
AI: three generations
1st wave of AI: the sixties
• emulates the decision-making process of a
human expert
[Diagram: Data + Program → Computer → Output (traditional programming)]
AI: three generations
1st wave of AI: the sixties
• Based on expert knowledge
– “if-then-else”
• Effective in narrow-domain problems
• Focus on the head or most important parameters
(identified in advance), leaving the “tail” parameters
and cases untouched.
[Diagram: Data + Output → Machine Learning → Program, contrasted with the traditional Data + Program → Computer → Output]
An example*
*Adapted from Duda, Hart and Stork, Pattern Classification, 2nd Ed.
An example: decision process
• What kind of information can distinguish one species
from the other?
– Length, width, weight, number and shape of fins, tail
shape, etc.
• What can cause problems during sensing?
– Lighting conditions, position of fish on the conveyor belt,
camera noise, etc.
• What are the steps in the process?
– Capture image -> isolate fish -> take measurements ->
make decision
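As a rough sketch of these steps in code, the pipeline could be organized as below; every function name and the camera interface are hypothetical placeholders, not part of the original example:

```python
# Hypothetical skeleton of the fish-sorting pipeline; the names and
# the camera interface are illustrative placeholders only.

def capture_image(camera):
    """Grab a frame as a new fish enters the sorting area."""
    return camera.read()  # assumes the camera object exposes read()

def isolate_fish(image):
    """Segment the fish from the conveyor-belt background."""
    ...

def take_measurements(fish_region):
    """Extract features such as length, width, and lightness."""
    ...

def make_decision(features, model):
    """Map the feature vector to 'salmon' or 'sea bass'."""
    return model.predict(features)
```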
An example: our system
• Sensor
– The camera captures an image as a new fish enters the sorting area
• Preprocessing
– Adjustments for average intensity levels
– Segmentation to separate fish from background
• Feature Extraction
– Assume a fisherman told us that a sea bass is generally longer than a salmon. We
can use length as a feature and decide between sea bass and salmon according to a
threshold on length.
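A minimal sketch of this threshold rule; the threshold value and units below are invented for illustration:

```python
# Single-feature threshold classifier; the threshold is illustrative,
# in practice it would be chosen from training data.
LENGTH_THRESHOLD = 40.0  # hypothetical value, e.g. in centimetres

def classify_by_length(length: float) -> str:
    """Sea bass are generally longer than salmon, so decide by a
    threshold on length."""
    return "sea bass" if length > LENGTH_THRESHOLD else "salmon"

print(classify_by_length(52.3))  # -> sea bass
print(classify_by_length(31.0))  # -> salmon
```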
An example: features
$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$
in a two-dimensional feature space.
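In code such a feature vector is simply a small array; the feature choices and values below are illustrative:

```python
import numpy as np

# One fish as a point in a two-dimensional feature space;
# the concrete features (length, width) and values are made up.
length, width = 52.3, 11.8
x = np.array([length, width])  # x = [x1, x2]^T
print(x.shape)                 # (2,)
```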
An example: multiple features
An example: cost of error
Data Driven Design
• When to use?
– Difficult to reason about a generic rule that solves
the problem
– Easy to collect examples (with the solution)
Data Driven Design
• There is little or no domain theory
• Thus the system will learn (i.e., generalize) the general
input-output function from the training data
Programming computers to use example data or past
experience
• The system produces a program that
implements a function that assigns the
decision to any observation (and not just the
input-output patterns of the training data)
What is Machine Learning?
• Automating the Automation
[Diagram: traditional programming, Data + Program → Computer → Output, versus machine learning, Data + Output → Machine Learning → Program]
Data Driven Design
• A good learning program learns something about the data beyond
the specific cases that have been presented to it
– Indeed, it is trivial to just store and retrieve the cases that
have been seen in the past (see the sketch below)
• This does not, however, address the problem of how to handle new cases
• Over-fitting a model to the data means that instead of general
properties of the population we learn idiosyncrasies (i.e.,
non-representative properties) of the sample.
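To make the point concrete, here is a deliberately naive "learner" that only memorizes its training cases; it is a hypothetical sketch, not an algorithm from the slides:

```python
# A "learner" that merely stores and retrieves past cases.
# It answers perfectly on the training data yet is useless on any
# observation it has not literally seen, which is exactly why
# generalization beyond the sample is required.
class LookupTableLearner:
    def fit(self, X, y):
        self.table = {tuple(x): label for x, label in zip(X, y)}
        return self

    def predict(self, x):
        return self.table.get(tuple(x))  # None for unseen inputs

learner = LookupTableLearner().fit(
    [(40.0, 9.1), (55.2, 12.0)], ["salmon", "sea bass"])
print(learner.predict((55.2, 12.0)))  # sea bass (seen in training)
print(learner.predict((48.7, 10.5)))  # None (a new case: no answer)
```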
DISTINCT LEARNING PROBLEMS
Taxonomy of the Learning Settings
Goals and available data dictate the type of learning problem
• Supervised Learning
– Classification
• Binary
• Multiclass
– Nominal
– Ordinal
– Regression
– Ranking
– Counting
• Semi-supervised Learning
• Unsupervised Learning
• Reinforcement Learning
• etc.
Supervised Learning: Examples
Classification/Regression
y = f(x), where y is the output prediction, f is the prediction function, and x is the feature vector
Regression
• Predicting house price
– Output: price (a scalar)
– Inputs: size, orientation, location, distance to key
services, etc.
[Diagram: testing phase, Test Image → Image Features → Learned model → Prediction]
… but with common traits
Design of a Classifier
[Figure: training examples in a two-dimensional feature space (length vs. color)]
Taxonomy of the Learning Tools
Classifiers divide into those that do not compute posterior probabilities and those that do (the posterior probability of a certain class given the data).
Pros and Cons of the three approaches
• Generative models provide a probabilistic model of all the
variables, which allows us to synthesize new data and to perform
novelty detection, but
– generating all this information is computationally expensive and
complex, and is not needed for a simple classification decision
Common steps
• The learning of a model from the data entails:
– Model representation
– Evaluation
– Optimization
Linear Regression
• Model
Representation
Linear Regression
• Evaluation
Linear Regression
• Optimization: finding the model that
maximizes our measure of quality
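Putting the three components together for linear regression, a minimal sketch might look like the following; the data are synthetic and the least-squares choice is just one possible evaluation/optimization pair:

```python
import numpy as np

# Model representation: y_hat = w0 + w1 * x (a linear model)
# Evaluation: mean squared error on the training data
# Optimization: ordinary least squares via numpy's lstsq

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)                # synthetic inputs
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)  # noisy synthetic targets

X = np.column_stack([np.ones_like(x), x])      # design matrix [1, x]
w, *_ = np.linalg.lstsq(X, y, rcond=None)      # least-squares solution

mse = np.mean((X @ w - y) ** 2)                # evaluation of the fit
print(f"w0={w[0]:.2f}, w1={w[1]:.2f}, MSE={mse:.2f}")
```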
Let’s design a classifier
• Use the (hyper-)plane orthogonal to the line
joining the means
– project the data in the direction given by the line
joining the class means
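A minimal sketch of this mean-based classifier on synthetic two-class data:

```python
import numpy as np

rng = np.random.default_rng(1)
class1 = rng.normal([2, 2], 1.0, size=(30, 2))  # synthetic class 1
class2 = rng.normal([6, 5], 1.0, size=(30, 2))  # synthetic class 2

m1, m2 = class1.mean(axis=0), class2.mean(axis=0)
w = m2 - m1               # direction of the line joining the means
midpoint = (m1 + m2) / 2  # the separating hyperplane passes here

def classify(x):
    # Project x onto w; the sign relative to the midpoint decides.
    return 2 if np.dot(x - midpoint, w) > 0 else 1

print(classify(np.array([2.5, 2.0])))  # -> 1
print(classify(np.array([6.0, 5.5])))  # -> 2
```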
Fisher's linear discriminant
• Every algorithm has three components:
– Model representation
– Evaluation
– Optimization
• Model representation: class of linear models
• Evaluation: find the direction w that maximizes
$J(w) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2}$
• Optimization
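The optimization step has a well-known closed-form solution: the maximizing direction is proportional to $S_W^{-1}(m_2 - m_1)$, where $S_W$ is the within-class scatter matrix. A minimal sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
class1 = rng.normal([2, 2], 1.0, size=(30, 2))  # synthetic class 1
class2 = rng.normal([6, 5], 1.0, size=(30, 2))  # synthetic class 2
m1, m2 = class1.mean(axis=0), class2.mean(axis=0)

# Within-class scatter matrix S_W
S_w = ((class1 - m1).T @ (class1 - m1)
       + (class2 - m2).T @ (class2 - m2))

# Fisher direction: w proportional to S_W^{-1} (m2 - m1)
w = np.linalg.solve(S_w, m2 - m1)
w /= np.linalg.norm(w)  # normalize; only the direction matters
print(w)
```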
Hyperparameters / user-defined parameters
Regularization
• To build a machine learning algorithm we specify a model family,
a cost function, and an optimization procedure
• Regularization is any modification we make to a
learning algorithm that is intended to reduce its
generalization error but not its training error
– There are many regularization strategies
• Regularization works by trading increased bias for
reduced variance. An effective regularizer is one
that makes a profitable trade, reducing variance
significantly while not overly increasing the bias.
Regularized Regression
Regularized classifier
• Hyperparameters / user-defined parameters
Parameter Norm Penalties
• Penalize complexity in the loss function
– Model complexity
– Weight Decay
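With an L2 penalty (weight decay), for example, the regularized objective takes the form

$\tilde{J}(\mathbf{w}) = J(\mathbf{w}) + \frac{\lambda}{2} \lVert \mathbf{w} \rVert_2^2$

where $J$ is the original loss and the hyperparameter $\lambda$ controls how strongly large weights are penalized.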
Regularization
• Evaluation
– Minimize (error in data) + λ (model complexity)
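A minimal ridge-regression sketch of this trade-off; the data are synthetic, the value of λ is arbitrary, and for simplicity the intercept is penalized along with the slope:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=30)
X = np.column_stack([np.ones_like(x), x])  # design matrix [1, x]

lam = 1.0  # hyperparameter weighting the complexity penalty

# Minimize ||Xw - y||^2 + lam * ||w||^2 (ridge regression);
# closed form: w = (X^T X + lam * I)^{-1} X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(w)
```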
1-Nearest neighbour classifier
Assign the label of the nearest training data point to each test
data point
[Figure: k-nearest-neighbour example with k = 5; black = negative, red = positive]
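A minimal k-nearest-neighbours sketch (Euclidean distance, majority vote, synthetic data):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Label x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 5], [7, 6], [6, 6]])
y_train = np.array(["negative"] * 3 + ["positive"] * 3)
print(knn_predict(X_train, y_train, np.array([6.5, 5.5])))  # positive
```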
kNN as a classifier
• Advantages:
– Simple to implement
– Flexible to feature / distance choices
– Naturally handles multi-class cases
– Can do well in practice with enough representative data
• Disadvantages:
– Large search problem to find nearest neighbors → Highly
susceptible to the curse of dimensionality
– Storage of data
– Must have a meaningful distance function
What is Machine Learning?
• Automating the Automation
[Diagram: traditional programming, Data + Program → Computer → Output, versus machine learning, Data + Output + user parameters (hyperparameters) → Machine Learning → Program (model)]