
Machine Learning

This document provides an introduction and overview of a machine learning course. It outlines the course information including instructors, schedule, textbooks, and goals. It then discusses what machine learning is, defining it as a computer program that improves its performance on tasks through experience. Traditional programming is contrasted with machine learning, where the machine learns a program from example data rather than being explicitly programmed. Examples of applications of machine learning are provided.

Uploaded by

Marco Caruso
Machine Learning

Introduction

Marcello Restelli

February 25, 2018


Outline

1 Course Information

2 What is Machine Learning?

3 Supervised Learning

Marcello Restelli February 25, 2018 2 / 41


Course Information

Admin 03/03/20

Instructor: Marcello Restelli – [email protected]
Teaching assistant: Francesco Trovò – [email protected]
Class website on BeeP: https://beep.metid.polimi.it/web/2017-18-machine-learning-marcello-restelli-/
Assessment
Written exam
Thesis (only a few)
March 6th at 5:00 p.m.: Meeting for thesis proposals (Sala Conferenze Emilio Gatti, Building 20)

Relations to Other Courses

A course of 5 credits is not enough to cover the main aspects of Machine Learning
Fortunately, there are other courses that deal with some machine learning topics:
Data Mining and Text Mining
Soft Computing
Applied Statistics
Model Identification and Data Analysis
Artificial Neural Networks and Deep Learning

Practical classes

For each topic there will be practical classes
In these classes Francesco will present
Exercises similar to the ones you will find in the exam
Practical exercises using Matlab
We suggest bringing your laptop

Schedule: first part

Date         Topic                           Lecturer  Reading
26-Feb-2018  Introduction                    Restelli  Bishop, Ch. 1, 2
01-Mar-2018  Matlab                          Trovò
05-Mar-2018  Linear regression               Restelli  Bishop, Ch. 3.1, 3.2, 3.3
08-Mar-2018  Linear regression               Restelli  Bishop, Ch. 4.1.1, 4.1.2, 4.1.3, 4.1.7, 4.3.1, 4.3.2
12-Mar-2018  Ex. on linear regression        Trovò
15-Mar-2018  Linear classification           Restelli  Bishop, Ch. 3.2, 1.3
19-Mar-2018  Ex. on linear classification    Trovò
22-Mar-2018  Bias-Variance                   Restelli  Bishop, Ch. 12.1, 14.2, 14.3
26-Mar-2018  Model Selection                 Restelli  Mitchell, Ch. 7.1, 7.2, 7.3
05-Apr-2018  PAC-Learning and VC dimension   Restelli  Mitchell, Ch. 7.4
09-Apr-2018  Ex. on learning theory          Trovò
12-Apr-2018  Kernel Methods                  Restelli  Bishop, Ch. 6.1, 6.2
16-Apr-2018  Gaussian Processes              Restelli  Bishop, Ch. 6.4
19-Apr-2018  Ex. on Gaussian Processes       Trovò
23-Apr-2018  Support Vector Machines         Restelli  Bishop, Ch. 7.1.1, 7.1.2
26-Apr-2018  Ex. on Support Vector Machines  Trovò

Schedule: second part

Date         Topic                        Lecturer  Reading
03-May-2018  Markov Decision Processes    Restelli  Sutton&Barto, Ch. 1, 2, 3
07-May-2018  Dynamic Programming          Restelli  Sutton&Barto, Ch. 4
10-May-2018  Ex. on Dynamic Programming   Trovò
14-May-2018  RL in finite MDPs            Restelli  Sutton&Barto, Ch. 5, 6
17-May-2018  RL in finite MDPs            Restelli  Sutton&Barto, Ch. 7
21-May-2018  RL in finite MDPs            Restelli  Sutton&Barto, Ch. 8
24-May-2018  Multi-armed bandit           Trovò
28-May-2018  RL in continuous MDPs        Restelli  Szepesvari, Ch. 4.3, 4.4
31-May-2018  Ex. on multi-armed bandit    Trovò
04-Jun-2018  Ex. on RL in finite MDPs     Trovò

Textbooks

Supervised Learning
Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006.
Mitchell, “Machine Learning”, McGraw Hill, 1997.
Hastie, Tibshirani, Friedman, “The Elements of Statistical Learning: Data Mining, Inference, and Prediction”, Springer, 2009.
Reinforcement Learning
Sutton and Barto, “Reinforcement Learning: an Introduction”, MIT Press, 1998. New draft available at: http://www.incompleteideas.net/book/the-book-2nd.html
Buşoniu, Babuška, De Schutter and Ernst, “Reinforcement Learning and Dynamic Programming Using Function Approximators”, CRC Press, 2010.
Szepesvari, “Algorithms for Reinforcement Learning”, Morgan and Claypool, 2010.
Bertsekas and Tsitsiklis, “Neuro–Dynamic Programming”, Athena Scientific, 1996.

Course Goals

Learn to correctly model machine learning problems
Learn the principles of ML and the main techniques
Learn to apply ML to practical problems
Learn limitations of ML techniques
Provide the basic background to do research in this field
Note: all algorithms have hyper-parameters.
My expectations
ask questions
interact
get involved



What is Machine Learning?

What is Machine Learning?

The real question is: what is learning?
Mitchell (1997):
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E”
Note: in practice, with more data the algorithm will reach better performance.
ML is the sub-field of AI where the knowledge comes from:
Experience
Induction
Note: induction is the inference mechanism used here. It differs from deduction, which starts from true facts and deduces new true facts: with induction we start from observations and try to explain those phenomena.
Machine learning is not magic!
You need to know how it works
You need to know how to use it
It can extract information from data, not create information
Note: if the information is not in the data, an ML algorithm will not find anything.

Traditional Programming vs ML

Traditional Programming
Data + Program → Computer → Output

Machine Learning
Data + Output → Computer → Program

Note: in ML you provide the data and the desired output for some of the data, and the algorithm tries to learn the program that produces that output from those data. The goal is to replace software developers: the machine will program itself.
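The contrast between the two diagrams can be made concrete with a small sketch. Everything in it is hypothetical and chosen only for illustration: the temperature task, the 25-degree hand-written rule, the function names, and the made-up training pairs. The first function is a program written by a person; the second derives a comparable program (a threshold) from data and desired outputs.

```python
# Traditional programming: a person writes the rule (the "program") by hand.
def handwritten_rule(temperature_c):
    """Hypothetical hand-coded program: call it 'hot' above 25 degrees."""
    return temperature_c > 25.0


# Machine learning: the rule is derived from (data, desired output) pairs.
def learn_threshold(examples):
    """Pick the threshold that misclassifies the fewest training examples.

    `examples` is a list of (temperature, is_hot) pairs. Candidate thresholds
    are the midpoints between consecutive sorted temperatures.
    """
    temps = sorted(t for t, _ in examples)
    candidates = [(a + b) / 2 for a, b in zip(temps, temps[1:])]

    def errors(threshold):
        return sum((t > threshold) != label for t, label in examples)

    return min(candidates, key=errors)


# Made-up training data: inputs together with the desired outputs.
training = [(12.0, False), (18.0, False), (21.0, False),
            (27.0, True), (30.0, True), (35.0, True)]

threshold = learn_threshold(training)


def learned_rule(temperature_c):
    """The 'program' produced by learning: compare against the learned threshold."""
    return temperature_c > threshold
```

On these six examples the learned threshold is 24.0, so `learned_rule` behaves much like the hand-written rule without anyone having typed the number in.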

Why Machine Learning?


We need computers to make informed decisions on new, unseen data
Often it is too difficult to design a set of rules “by hand”
Machine learning allows us to automatically extract relevant information from data and apply it to analyze new data
Automating automation
Getting computers to program themselves
Writing software is the bottleneck
Let the data do the work instead!
Note: we want our algorithm to be able to make decisions on unseen examples. This means optimizing performance with respect to a function we never observe, which is difficult.

What is ML useful for?

These are exciting times for ML


ML is becoming widespread
Computer vision and robotics
Speech recognition
Biology and medicine
Finance
Information retrieval, Web search, ...
Video gaming
Space exploration
Many applications and many jobs...

A few quotes

“A breakthrough in machine learning would be worth ten Microsofts” (Bill Gates, Chairman, Microsoft)
“Machine learning is the next Internet” (Tony Tether, Director, DARPA)
“Machine learning is the hot new thing” (John Hennessy, President, Stanford)
“Web rankings today are mostly a matter of machine learning” (Prabhakar Raghavan, Dir. Research, Yahoo)
“Machine learning is going to result in a real revolution” (Greg Papadopoulos, Former CTO, Sun)
“Machine learning is today’s discontinuity” (Jerry Yang, Founder, Yahoo)
“Machine learning today is one of the hottest aspects of computer science” (Steve Ballmer, CEO, Microsoft)

The Machine Learning Bubble?

ML Top Venues
Journals
Journal of Machine Learning Research (JMLR)
Machine Learning Journal (MLJ)
Journal of Artificial Intelligence Research (JAIR)
Conferences
International Conference on Machine Learning (ICML)
Neural Information Processing Systems (NIPS)
Association for the Advancement of Artificial Intelligence (AAAI)
International Joint Conference on Artificial Intelligence (IJCAI)
Uncertainty in Artificial Intelligence (UAI)
Artificial Intelligence and Statistics (AI&Stats)
Conference on Learning Theory (CoLT)
Note: these are top journals, with very good papers and content. They can be read for free from the Polimi network (MLJ through the Polimi subscription).

Machine Learning

Note: there are three main areas. Other areas exist, but they are minor.

Machine Learning Models


Supervised Learning
Learn the model
Unsupervised Learning
Learn the representation
Reinforcement Learning
Learn to control

Note (supervised learning): we learn the model that explains the relation between input and output. Both the inputs and the outputs are given.

Note (unsupervised learning): this topic is covered in the Data Mining and Text Mining course. We learn the best possible representation of the data. Here we have only input data, no output, and we want to find structure in the input, e.g. clusters (in the figure, the data are grouped in clusters). That information comes only from the full dataset. With data in more dimensions we have to rely on ML, because such data are difficult for us to visualize.

Note (reinforcement learning): we learn to control, i.e. to take decisions, for example learning to walk or to carry out some task. Decisions are taken in order to optimize the performance of the agent in the task.

Supervised Learning

Goal
Estimating the unknown model that maps known inputs to known outputs
Training set: D = {⟨x, t⟩} ⇒ t = f(x)
Problems
Classification
Regression
Probability estimation
Techniques
Artificial Neural Networks
Support Vector Machines
Decision trees
Etc.

Note: x is the input and t is the output (target value), i.e. the value we want to predict. For example, x is an image (a vector of colors) and t is a class, Dog or Cat: we want to find a function f that, given an image, tells whether it contains a dog or a cat.
Note: classification covers problems where the target takes a finite set of values (e.g. dog or cat). Regression covers problems where the target is continuous (e.g. predicting tomorrow's temperature). Probability estimation covers problems where the target is a probability (a continuous value between 0 and 1) or a density (a value in [0, +∞)); the constraint is that the result must sum (or integrate) to 1.

Supervised Learning: Classification Example


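[Figure: images of people with binary outputs (0 or 1), shown as Input/Output pairs in a Training column and a Testing column]

Note: looking at the training pairs one might guess that the output is 0 if female and 1 if male, but it could equally be 0 if not smiling and 1 otherwise.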
Supervised Learning: Regression Example


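[Figure: images of people with numeric outputs (e.g. 78, 34, 25), shown as Input/Output pairs in a Training column and a Testing column]

Note: here the inputs are shown in a form in which we are not used to seeing people.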

Note: AGE is the concept behind the outputs. We are used to processing data only when it is arranged in a certain way; for a machine this makes no difference.

Unsupervised Learning

Goal
Learning a more efficient representation of a set of unknown inputs
Training set: D = {x} ⇒ ? = f(x)
Problems
Compression
Clustering
Techniques
K-means
Self-organizing maps
Principal Component Analysis
Etc.
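As an illustration of the first technique in the list, here is a minimal K-means sketch in plain Python. It is not the course's implementation: the 2-D points are made up, the function name `kmeans` is hypothetical, and the centroids are simply initialized to the first k points to keep the run deterministic.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on 2-D points; centroids start at the first k points."""
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: each centroid moves to the mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, assignment


# Made-up inputs only (no targets): two visually separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, labels = kmeans(points, k=2)
```

Note that the training set contains only inputs x; the cluster index in `labels` is the "?" that the algorithm fills in.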

Unsupervised Learning: Clustering Example

Unsupervised Learning: Dimensionality Reduction


Example
X   Y   W   Z
2   3   1   10
5   8   1   2
7   2   1   6
9   8   1   -2
8   1   1   6
4   10  1   1
8   8   1   -1
8   1   1   6
4   5   1   6
3   10  1   2
W is useless
Actually the points lie on a 2-dimensional plane

Note: W is useless because it never changes. We could use only 2 coordinates by changing the representation (choosing the plane). Even if the points were only near a plane, we could still represent them on a plane with little error.
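The claim about the table can be checked directly. The short sketch below copies the ten rows, confirms that W is constant, and confirms that X + Y + Z takes a single value, which is what it means here for the points to lie on a plane. The helper names `compress` and `reconstruct` are hypothetical and only illustrate that two coordinates are enough to rebuild every row.

```python
# Rows (X, Y, W, Z) copied from the example table.
rows = [
    (2, 3, 1, 10), (5, 8, 1, 2), (7, 2, 1, 6), (9, 8, 1, -2), (8, 1, 1, 6),
    (4, 10, 1, 1), (8, 8, 1, -1), (8, 1, 1, 6), (4, 5, 1, 6), (3, 10, 1, 2),
]

# W takes a single value, so it carries no information.
w_values = {w for _, _, w, _ in rows}

# X + Y + Z is the same for every row, so Z is determined by X and Y.
sums = {x + y + z for x, y, _, z in rows}


def compress(row):
    """Keep only the two informative coordinates."""
    x, y, _, _ = row
    return (x, y)


def reconstruct(pair, w, total):
    """Rebuild the full row from (X, Y), the constant W, and the constant sum."""
    x, y = pair
    return (x, y, w, total - x - y)


w_const = next(iter(w_values))
total = next(iter(sums))
restored = [reconstruct(compress(r), w_const, total) for r in rows]
```

For real data that lie only approximately on a plane, a technique such as Principal Component Analysis would be used to find the plane instead of spotting the relation by hand.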

Reinforcement Learning

Goal
Learning the optimal policy
Training set: D = {⟨x, u, x′, r⟩} ⇒ π*(x) = arg max_u {Q*(x, u)}, where Q*(x, u) must be estimated.
Problems
Markov Decision Process (MDP)
Partially Observable MDP (POMDP)
Stochastic Games (SG)
Techniques
Q-learning
SARSA
Fitted Q-iteration
Etc.

Note: the policy is the way the agent takes actions; the goal is the best possible behaviour. In this course we work only with Markov Decision Processes.
Note: x is the current state; u is the action (the control), i.e. what the agent decides to do; x′ is the state the agent reaches when it applies action u in state x; r is the reward, which says how good it was to take the action and reach x′. The goal is to optimize the reward over a long time horizon: sometimes it is fine to take actions with bad rewards in order to obtain a very high reward later.
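To make the notation concrete, the following sketch applies the standard Q-learning update, Q(x, u) ← Q(x, u) + α [r + γ max_u′ Q(x′, u′) − Q(x, u)], to a handful of ⟨x, u, x′, r⟩ samples and then reads off the greedy policy. The tiny two-state problem, the rewards, and the values of the discount factor and learning rate are all assumptions made up for this example; they are not taken from the slides.

```python
GAMMA = 0.9   # discount factor (assumed)
ALPHA = 0.5   # learning rate (assumed)
ACTIONS = ("a", "b")
TERMINAL = "end"

# Hypothetical transitions <x, u, x', r> from a tiny deterministic problem.
samples = [
    (0, "a", TERMINAL, 1.0),   # small immediate reward, episode ends
    (0, "b", 1, -1.0),         # immediate cost, but leads to state 1
    (1, "a", TERMINAL, 10.0),  # large delayed reward
    (1, "b", TERMINAL, 0.0),
]

Q = {(x, u): 0.0 for x in (0, 1) for u in ACTIONS}


def best_value(state):
    """max_u Q(state, u); a terminal state has value 0."""
    if state == TERMINAL:
        return 0.0
    return max(Q[(state, u)] for u in ACTIONS)


# Replay the samples many times, applying the Q-learning update to each.
for _ in range(200):
    for x, u, x_next, r in samples:
        target = r + GAMMA * best_value(x_next)
        Q[(x, u)] += ALPHA * (target - Q[(x, u)])

# Greedy policy: pi(x) = argmax_u Q(x, u).
policy = {x: max(ACTIONS, key=lambda u: Q[(x, u)]) for x in (0, 1)}
```

In state 0 the greedy policy picks action "b" even though its immediate reward is negative, because the estimated Q-value (about −1 + 0.9 · 10 = 8) accounts for the large reward that follows.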

Reinforcement Learning: Example

But Who’s Counting?


First game
Best possible value: 75421
Value following the optimal policy: 75142
Second game
Best possible value: 76530
Value following the optimal policy: 75630



Supervised Learning

Supervised Learning

Supervised (inductive) learning is the largest, most mature, most widely used sub-field of machine learning
Given: training data set including desired outputs: D = {⟨x, t⟩} from some unknown function f
Find: A good approximation of f that generalizes well on test data
Input variables x are also called features, predictors, attributes
Output variables t are also called targets, responses, labels
If t is discrete: classification
If t is continuous: regression
If t is the probability of x: probability estimation
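A minimal regression sketch of the "Given / Find" statement above: from a few ⟨x, t⟩ pairs it fits a straight line by least squares and then measures the error on separate test pairs. The data are made up (generated from t = 2x + 1 without noise), and the function names are hypothetical; the closed-form least-squares formulas for a line are standard.

```python
def fit_line(data):
    """Least-squares fit of t = w0 + w1 * x to a list of (x, t) pairs."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_t = sum(t for _, t in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in data)
    w1 = sxt / sxx
    w0 = mean_t - w1 * mean_x
    return w0, w1


def mean_squared_error(model, data):
    """Average squared difference between predictions and targets."""
    w0, w1 = model
    return sum((w0 + w1 * x - t) ** 2 for x, t in data) / len(data)


# Made-up samples of an "unknown" f (here secretly t = 2x + 1, without noise).
train = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
test = [(4.0, 9.0), (5.0, 11.0)]

h = fit_line(train)                       # approximation of f learned from D
test_error = mean_squared_error(h, test)  # how well h generalizes
```

Because t is continuous, this is a regression problem; with a discrete t the same Given/Find structure would describe classification.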

Appropriate applications

There is no human expert


e.g., DNA analysis
Humans can perform the task but cannot explain how
e.g., character recognition
Desired function changes frequently
e.g., predicting stock prices based on recent trading data
Each user needs a customized function f
e.g., email filtering

Overview of Supervised Learning

We want to approximate f given the data set D
The steps are
1 Define a loss function L
2 Choose some hypothesis space H
3 Optimize to find an approximate model h
What happens if we enlarge the hypothesis space?
We do not know f and we have only a finite number of samples

Note: in the figure, the square F is the space of all possible functions, and f is the true function we would like to learn.

Note: the loss function measures the distance between the true function and the function we found. In the figure, lighter colors mark functions close to f and darker colors mark functions far from it.

Note: the hypothesis space H is a subset of the space of all functions F, and we want to find in H the function that best approximates f. In the figure the search is restricted to H1, so the hypothesis h1 lies inside H1.

Note: "hypothesis" is a synonym for function. We search H1 for the function (hypothesis) h1 that is closest to the true function f.

Note: if we enlarge H1 to H2 and the true function f is known, performance can only get better: H2 contains every function in H1, so the best function in H2 is at least as close to f as the best function in H1, and f itself is more likely to fall inside H2. This is no longer guaranteed when we do not know the true function.

Note: how do we know the loss function if we do not know f? What we have seen so far is a function approximation problem, where the true function is available and we only look for a similar function (for example a polynomial one). In ML we do not know the true function, so we do not know the true loss function either.
Note: in ML the loss function is built from the samples we are given. It is therefore noisy, because we do not know the true function precisely: we only have an approximation of the true loss function, and it can be good or bad depending on the quality of the samples.

Note: in this case we enlarge H1 to H2 while the loss function is noisy. In the figure its lightest region has moved below f (the white part is no longer on f), so with h1 we are closer to f and with h2 we are farther away. If the loss function is built from data, a bigger hypothesis space is not always better.

Note: with a small hypothesis space we may be far from the solution; with a big hypothesis space we are more exposed to the noise in the data. The goal is to find the proper size for the hypothesis space. When there is not enough data, the bigger H is, the more we are subject to the noise in the data. A bigger H also means a bigger model. We have to find a trade-off between the hypothesis space and the loss function.
Note: we can increase both the features and the dataset (features: male, female, etc.; data: the number of pictures).
Note: the true loss function is unknown. The loss function we use is obtained from samples, and its quality depends on the quality and quantity of those samples.
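The effect of enlarging the hypothesis space can be tried out on synthetic data. In the sketch below every choice is an assumption made for illustration: the target is a sine wave, the samples carry Gaussian noise, and the hypothesis spaces are piecewise-constant functions on 1, 2, 4, 8 and 16 equal bins, so that each space contains the previous one. The loss measured on the samples can only go down as the space grows, while the loss against the true function (computable here only because the target is known) need not.

```python
import math
import random


def true_f(x):
    """The 'unknown' target; known here only so we can measure the true loss."""
    return math.sin(2 * math.pi * x)


def fit_bins(data, n_bins):
    """Hypothesis space H_n: piecewise-constant functions on n equal bins of [0, 1).

    The least-squares fit is the mean target in each bin (0.0 for empty bins).
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, t in data:
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += t
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def predict(model, x):
    n_bins = len(model)
    return model[min(int(x * n_bins), n_bins - 1)]


def empirical_loss(model, data):
    """Mean squared error on the samples (the loss we can actually compute)."""
    return sum((predict(model, x) - t) ** 2 for x, t in data) / len(data)


def true_loss(model, grid=1000):
    """Mean squared distance from true_f on a dense grid (unavailable in practice)."""
    xs = [(i + 0.5) / grid for i in range(grid)]
    return sum((predict(model, x) - true_f(x)) ** 2 for x in xs) / grid


random.seed(0)
xs = [random.random() for _ in range(20)]
data = [(x, true_f(x) + random.gauss(0.0, 0.3)) for x in xs]

# Map: number of bins -> (loss on the samples, loss against the true function).
results = {}
for n_bins in (1, 2, 4, 8, 16):
    h = fit_bins(data, n_bins)
    results[n_bins] = (empirical_loss(h, data), true_loss(h))
```

Printing `results` shows the two columns side by side; with only 20 noisy samples, the largest spaces typically fit the samples almost perfectly while drifting away from `true_f`.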
Key Elements of Supervised Learning

Tens of thousands of machine learning algorithms
Hundreds of new ones every year
Every machine learning algorithm has three components:
Representation
Evaluation
Optimization

Representation

Linear models
Instance-based
Decision trees
Set of rules
Graphical models
Neural networks
Gaussian Processes
Support vector machines
Model ensembles
etc.

Evaluation

Accuracy
Precision and recall
Squared error
Likelihood
Posterior probability
Cost/Utility
Margin
Entropy
KL divergence
Etc.
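A few of the measures in this list are simple enough to write out. The sketch below uses the standard definitions of accuracy, precision, recall, and mean squared error; the label and prediction vectors are made up, and the function names are only illustrative.

```python
def confusion_counts(targets, predictions):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(targets, predictions))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(targets, predictions):
    """Fraction of predictions that match the target."""
    tp, fp, fn, tn = confusion_counts(targets, predictions)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(targets, predictions):
    """Fraction of predicted positives that are truly positive."""
    tp, fp, _, _ = confusion_counts(targets, predictions)
    return tp / (tp + fp)


def recall(targets, predictions):
    """Fraction of true positives that were predicted positive."""
    tp, _, fn, _ = confusion_counts(targets, predictions)
    return tp / (tp + fn)


def mean_squared_error(targets, predictions):
    """Average squared difference, the usual measure for regression."""
    return sum((p - t) ** 2 for t, p in zip(targets, predictions)) / len(targets)


# Made-up classification outputs.
class_targets = [1, 0, 1, 1, 0, 0, 1, 0]
class_predictions = [1, 0, 0, 1, 1, 0, 1, 0]

# Made-up regression outputs.
reg_targets = [1.0, 2.0, 3.0]
reg_predictions = [1.5, 2.0, 2.0]
```

On the classification vectors above there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so accuracy, precision and recall all come out to 0.75.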

Optimization

Combinatorial optimization
e.g.: Greedy search
Convex optimization
e.g.: Gradient descent (we will see it in this course)
Constrained optimization
e.g.: Linear programming
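As a small illustration of the convex case, here is gradient descent on a one-dimensional quadratic. The objective L(w) = (w − 3)², the step size, and the number of steps are assumptions picked for the example; only the update rule w ← w − η · dL/dw is the general technique.

```python
def loss(w):
    """Convex objective chosen for illustration; its minimum is at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the loss with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0, learning_rate=0.1, steps=100):
    """Repeatedly move against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


w_star = gradient_descent(w0=0.0)
```

With this step size the distance from the minimum shrinks by a factor of 0.8 at every step, so after 100 steps `w_star` is within about 1e-9 of 3.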

Dichotomies in ML

Parametric vs Nonparametric
Parametric: fixed and finite number of parameters
Nonparametric: the number of parameters depends on the training set
Frequentist vs Bayesian
Frequentist: use probabilities to model the sampling process
Bayesian: use probability to model uncertainty about the estimate
Generative vs Discriminative
Generative: Learns the joint probability distribution p(x, t)
Discriminative: Learns the conditional probability distribution p(t|x)
Empirical Risk Minimization vs Structural Risk Minimization
Empirical Risk: Error over the training set
Structural Risk: Balance training error with model complexity
Note (Bayesian): we have some prior knowledge in addition to the samples.
Note (Structural Risk): the complexity term keeps the model from fitting the training set too closely.
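The generative/discriminative pair is linked by p(t|x) = p(x, t) / Σ_t′ p(x, t′): a model of the joint distribution can always be turned into the conditional one, but not the other way round. The sketch below shows this on a made-up joint table with two input values and two classes; the numbers and names are purely illustrative.

```python
# Hypothetical joint distribution p(x, t) over x in {0, 1} and t in {"cat", "dog"}.
joint = {
    (0, "cat"): 0.3, (0, "dog"): 0.1,
    (1, "cat"): 0.2, (1, "dog"): 0.4,
}


def conditional(joint_table, x):
    """Return p(t | x) by normalizing the slice of the joint table at x."""
    slice_x = {t: p for (xi, t), p in joint_table.items() if xi == x}
    marginal = sum(slice_x.values())  # p(x) = sum over t of p(x, t)
    return {t: p / marginal for t, p in slice_x.items()}


p_t_given_0 = conditional(joint, 0)
p_t_given_1 = conditional(joint, 1)
```

A discriminative method would estimate tables like `p_t_given_0` directly from data, without ever modelling how likely each x is.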
