Machine Learning
Introduction
Marcello Restelli
1 Course Information
3 Supervised Learning
Admin 03/03/20
Practical classes
Textbooks
Supervised Learning
Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006.
Mitchell, “Machine Learning”, McGraw Hill, 1997.
Hastie, Tibshirani, Friedman, “The Elements of Statistical Learning: Data Mining, Inference, and Prediction”, Springer, 2009.
Reinforcement Learning
Sutton and Barto, “Reinforcement Learning: An Introduction”, MIT Press, 1998. New draft available at: http://www.incompleteideas.net/book/the-book-2nd.html
Buşoniu, Babuška, De Schutter and Ernst, “Reinforcement Learning and Dynamic Programming Using Function Approximators”, CRC Press, 2010.
Szepesvari, “Algorithms for Reinforcement Learning”, Morgan and Claypool, 2010.
Bertsekas and Tsitsiklis, “Neuro-Dynamic Programming”, Athena Scientific, 1996.
Course Goals
Traditional Programming vs ML
Traditional Programming
Data + Program → Computer → Output
Machine Learning
Data + Output → Computer → Program
You provide data and the desired output for some of the data, and ML tries to learn the program that produces the output from those data.
Goal: replace software developers. The machine will program itself.
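A minimal sketch of the contrast, in Python (the temperature-conversion task and all names here are hypothetical illustrations, not from the course):

import numpy as np

# Traditional programming: we write the program (the rule) by hand.
def celsius_to_fahrenheit(c):
    return 1.8 * c + 32.0

# Machine learning: we provide data and the desired outputs,
# and the machine learns the program (here, the two coefficients).
x = np.array([0.0, 10.0, 20.0, 30.0, 40.0])   # data (inputs)
t = celsius_to_fahrenheit(x)                  # desired outputs
A = np.stack([x, np.ones_like(x)], axis=1)    # design matrix [x, 1]
w, b = np.linalg.lstsq(A, t, rcond=None)[0]   # least-squares fit

print(w, b)  # ~1.8 and ~32: the "program" recovered from the data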
A few quotes
ML Top Venues
Top journals and conferences, with very good papers and content.
Machine Learning
ML has three main areas; other subfields exist, but they are comparatively small.
Supervised Learning
Learn the model
Unsupervised Learning
Learn the representation (e.g., group the data into clusters)
Reinforcement Learning
Learn to control
Supervised Learning
Goal
Estimating the unknown model that maps known inputs to known
outputs
Training set: D = {⟨x, t⟩} ⇒ t = f(x), where x is the input and t is the output (target), the value we want to predict.
Example: x is an image (a vector of colors) and t is a class, Dog or Cat. We want to find a function f that, given an image, tells whether it contains a Dog or a Cat.
Problems
Classification: the possible values for the target are finite (e.g., Dog or Cat).
Regression: the target value is continuous (e.g., predict tomorrow's temperature).
Probability estimation: the target is a probability (a continuous value between 0 and 1) or a density (on [0, +∞)), with the constraint that the result must integrate to 1 (probabilities sum to 1).
Techniques
Artificial Neural Networks
Support Vector Machines
Decision trees
Etc.
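As a concrete sketch of learning f from D = {⟨x, t⟩}, here is a 1-nearest-neighbour classifier, one of the simplest supervised techniques (the toy data are hypothetical):

import numpy as np

# Training set D = {<x, t>}: inputs x (2-D feature vectors) with targets t.
X_train = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
t_train = np.array([0, 0, 1, 1])   # classification: finite target values

def f(x):
    """1-nearest-neighbour estimate of the unknown model t = f(x)."""
    distances = np.linalg.norm(X_train - x, axis=1)
    return t_train[np.argmin(distances)]

print(f(np.array([0.15, 0.15])))  # -> 0
print(f(np.array([0.85, 0.85])))  # -> 1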
What is Machine Learning?
[Figure: Training and Testing phases]
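The figure contrasts the two phases. A minimal sketch of the protocol, assuming hypothetical noisy linear data: fit the model on a training split, then evaluate it on a held-out test split.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy data from an unknown linear model.
x = rng.uniform(0, 1, size=100)
t = 3.0 * x + 1.0 + rng.normal(0, 0.1, size=100)

# Training: fit the model on the first 80 examples only.
A = np.stack([x[:80], np.ones(80)], axis=1)
w = np.linalg.lstsq(A, t[:80], rcond=None)[0]

# Testing: evaluate on the 20 held-out examples.
pred = w[0] * x[80:] + w[1]
print("test mean squared error:", np.mean((pred - t[80:]) ** 2))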
Unsupervised Learning
Goal
Learning a more efficient representation of a set of unknown inputs
Training set: D = {x} ⇒ ? = f(x)
Problems
Compression
Clustering
Techniques
K-means
Self-organizing maps
Principal Component Analysis
Etc.
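A minimal sketch of K-means, the first technique listed above (the two-blob data are hypothetical):

import numpy as np

rng = np.random.default_rng(0)

# Unlabelled inputs D = {x}: two hypothetical blobs in 2-D.
X = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(3, 0.3, (50, 2))])

# Plain K-means: alternate assignment and centroid update.
k = 2
centroids = X[rng.choice(len(X), k, replace=False)]
for _ in range(10):
    # assign each point to its nearest centroid
    labels = np.argmin(np.linalg.norm(X[:, None] - centroids[None], axis=2), axis=1)
    # move each centroid to the mean of its assigned points
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centroids)  # ~ the two blob centres: a compact representation of X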
Reinforcement Learning
Goal
Learn the best possible behaviour. A policy is the way the agent takes actions; the goal is to find the optimal policy.
First game
Best possible value: 75421
Value following the optimal policy: 75142
Second game
Best possible value: 76530
Value following the optimal policy: 75630
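A minimal sketch of what learning a policy means, using tabular Q-learning on a hypothetical 5-state chain (an illustrative environment, not the game from the slides):

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-state chain: action 1 moves right, action 0 moves left;
# reaching the last state yields reward 1 and ends the episode.
n_states, n_actions, gamma, lr = 5, 2, 0.9, 0.1
Q = np.zeros((n_states, n_actions))

for episode in range(500):
    s = 0
    while s != n_states - 1:
        a = int(rng.integers(n_actions))  # explore randomly (Q-learning is off-policy)
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # learn from the reward plus the discounted value of the next state
        Q[s, a] += lr * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

# The learned policy: the best action in each non-terminal state (all 1 = go right).
print(np.argmax(Q[:-1], axis=1))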
Supervised Learning
Appropriate applications
Representation
Linear models
Instance-based
Decision trees
Set of rules
Graphical models
Neural networks
Gaussian Processes
Support vector machines
Model ensembles
etc.
Evaluation
Accuracy
Precision and recall
Squared error
Likelihood
Posterior probability
Cost/Utility
Margin
Entropy
KL divergence
Etc.
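A short sketch of some of these evaluation measures on hypothetical predictions:

import numpy as np

# Hypothetical predictions vs. targets for a binary classifier.
t    = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # true labels
pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # predicted labels

tp = np.sum((pred == 1) & (t == 1))
fp = np.sum((pred == 1) & (t == 0))
fn = np.sum((pred == 0) & (t == 1))

print("accuracy :", np.mean(pred == t))   # fraction of correct predictions
print("precision:", tp / (tp + fp))       # correct among predicted positives
print("recall   :", tp / (tp + fn))       # found among actual positives

# For regression, squared error plays the same evaluation role.
y, y_hat = np.array([2.0, 3.5]), np.array([2.1, 3.0])
print("squared error:", np.sum((y - y_hat) ** 2))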
Optimization
Combinatorial optimization
e.g.: Greedy search
Convex optimization
e.g.: Gradient descent (we will see this)
Constrained optimization
e.g.: Linear programming
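A minimal sketch of gradient descent on a one-dimensional convex objective (the objective and step size are illustrative choices):

import numpy as np

# Convex objective: f(w) = (w - 3)^2, with gradient f'(w) = 2 (w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

w, lr = 0.0, 0.1          # start far from the optimum, fixed step size
for _ in range(100):
    w -= lr * grad(w)     # step against the gradient

print(w)  # ~3.0, the minimiser of the convex objective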
Dichotomies in ML
Parametric vs Nonparametric
Parametric: fixed and finite number of parameters
Nonparametric: the number of parameters depends on the training set
Frequentist vs Bayesian
Frequentist: use probabilities to model the sampling process
Bayesian: use probability to model uncertainty about the estimate (we have some prior knowledge in addition to the sampling)
Generative vs Discriminative
Generative: Learns the joint probability distribution p(x, t)
Discriminative: Learns the conditional probability distribution p(t|x)
Empirical Risk Minimization vs Structural Risk Minimization
Empirical Risk: Error over the training set
Structural Risk: Balance training error with model complexity, to avoid fitting the training data too closely
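A sketch of the last dichotomy, using ridge regularization as one common way to implement structural risk minimization (the polynomial data and the value of lam are hypothetical):

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy linear data, with a high-degree polynomial
# basis so the model is able to overfit the training set.
x = rng.uniform(-1, 1, 20)
t = x + rng.normal(0, 0.2, 20)
Phi = np.vander(x, 10)          # 9th-degree polynomial features

# Empirical risk minimization: minimise training error only.
w_erm = np.linalg.lstsq(Phi, t, rcond=None)[0]

# Structural risk minimization (here via ridge regularization):
# balance training error against model complexity lam * ||w||^2.
lam = 1.0
w_srm = np.linalg.solve(Phi.T @ Phi + lam * np.eye(10), Phi.T @ t)

print("ERM weight norm:", np.linalg.norm(w_erm))   # typically large: fits noise
print("SRM weight norm:", np.linalg.norm(w_srm))   # smaller, smoother model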