Chapter 19
Machine Learning
Eng. Abdulrazak A. Dirie
Learning from Examples: Machine Learning
Machine Learning
Learning: Improve performance after making observations about the world. That is, learn what works
and what doesn’t to get closer to optimal decisions.
How to learn a model to make better decisions from data/experience?
Supervised Learning: Learn a function (model) to map input to output from a
training set.
Examples:
Use a naïve Bayesian classifier to distinguish between spam/not spam
Learn a playout policy to simulate games (current board -> good move)
Unsupervised Learning: Organize data (e.g., clustering, embedding)
Deep Learning: Learn using multi-layer artificial neural networks (e.g., ANN, CNN, RNN)
Reinforcement Learning: Learn from rewards/punishment (e.g., winning a game)
obtained via interaction with the environment over time.
Supervised Learning
Examples
We assume there exists a target function $y = f(x)$ that produces i.i.d. (independent and identically distributed) examples, possibly with noise and errors.
Examples are observed input-output pairs $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$, where $x$ is a vector called the feature vector.
Learning problem
Given a hypothesis space $H$ of representable models, find a hypothesis $h \in H$ such that $h(x) \approx y$ for the observed examples.
That is, we want to approximate $f$ by using $h$.
[Figure: a very simple hypothesis that is not very consistent with the data!]
Consistency: $h$ agrees with the observed examples.
Simplicity: small number of model parameters.
Measuring Consistency using Loss
Goal of learning: Find a hypothesis $h$ that makes predictions that are consistent with the examples $E$. That is, $h(x) \approx y$ for the examples $(x, y) \in E$.
Loss: $L(y, \hat{y})$ measures how much the prediction $\hat{y} = h(x)$ deviates from the true output $y$ (e.g., 0/1 loss for classification, squared error for regression).
Empirical loss: the average loss over the observed examples:
$EmpLoss_{L,E}(h) = \frac{1}{|E|} \sum_{(x,y) \in E} L(y, h(x))$
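A minimal Python sketch of these definitions, assuming 0/1 loss; the function names (`zero_one_loss`, `empirical_loss`) and the toy data are illustrative, not from the slides:

```python
# Minimal sketch: 0/1 loss and empirical loss over a set of examples.

def zero_one_loss(y, y_hat):
    """0/1 loss: 0 if the prediction is correct, 1 otherwise."""
    return 0 if y == y_hat else 1

def empirical_loss(h, examples, loss=zero_one_loss):
    """Average loss of hypothesis h over the observed examples E."""
    return sum(loss(y, h(x)) for x, y in examples) / len(examples)

# Toy usage: a trivial hypothesis that always predicts class 1.
examples = [([1.0, 2.0], 1), ([0.5, 0.1], 0), ([3.0, 1.5], 1)]
h = lambda x: 1
print(empirical_loss(h, examples))  # 1/3 of the examples are misclassified
```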
Reasons for $h^* \neq f$:
a) Realizability: $f \notin H$, i.e., the hypothesis space cannot represent $f$.
b) $f$ is nondeterministic or examples are noisy.
c) It is computationally intractable to search all of $H$, so we use a non-optimal heuristic.
The Bayes Classifier
For 0/1 loss, the empirical loss is minimized by the model that predicts for each $x$ the most likely class using MAP (maximum a posteriori) estimates. This is called the Bayes classifier.
Optimality: The Bayes classifier is optimal for 0/1 loss. It is the most consistent classifier possible, with the lowest possible error, called the Bayes error rate. No better classifier is possible!
Ease of use: Simpler hypotheses have fewer model parameters to estimate and store.
Penalty term: Adding a penalty term for model complexity to the loss helps to avoid overfitting.
Model Selection: Bias vs. Variance
[Figure: points show two samples from the same function, to illustrate variance.]
Examples (instances, observations)
[Table: dataset of examples, one row per example, with feature columns and a class label.]
Find a hypothesis (called “model”) to predict the class given the features.
Feature Engineering
Training and Testing
Model Evaluation (Testing)
The model was trained on the training examples $E$. We want to test how well the model will perform on new examples (i.e., how well it generalizes to new data).
Testing loss: Calculate the empirical loss for predictions on a testing data set that
is different from the data used for training.
For classification we often use the accuracy measure, the proportion of correctly
classified test examples.
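As a minimal illustration (plain Python, with made-up predictions), accuracy is just the fraction of matching predictions:

```python
# Minimal sketch: accuracy as the proportion of correctly classified test examples.
def accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```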
Models are “trained” (learned) on the training data. This involves estimating the model parameters.
Hyperparameter Tuning/Model Selection
Notes:
The validation set was not used for training, so we get generalization accuracy for the
different hyperparameter settings.
If no model selection is necessary, then no validation set is used.
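A sketch of hyperparameter tuning with a held-out validation set; it assumes scikit-learn is available and uses the iris dataset and a k-NN classifier purely as an illustration:

```python
# Sketch: pick the hyperparameter value with the best validation accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_k, best_acc = None, 0.0
for k in [1, 3, 5, 7, 9]:                       # candidate hyperparameter values
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = model.score(X_val, y_val)             # generalization accuracy on validation data
    if acc > best_acc:
        best_k, best_acc = k, acc
print(best_k, best_acc)
```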
Testing a Model
After the model is selected, the final model is evaluated against the
test set to estimate the final model accuracy.
Very important: never “peek” at the test set during training!
How to Split the Dataset
Stratified splits: Like random splits, but balance classes and other properties of the examples.
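A minimal sketch of a stratified split (assumes scikit-learn; the dataset and split ratio are illustrative):

```python
# Sketch: stratify=y preserves the class proportions in both partitions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```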
k-fold cross validation: Use training & validation data better.
Split the training & validation data randomly into k folds.
For k rounds, hold one fold back for testing and use the remaining folds for training.
Use the average error/accuracy as a better estimate.
Some algorithms/tools do this internally.
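A compact sketch of k-fold cross validation (assumes scikit-learn; `cross_val_score` performs the fold splitting, training, and scoring internally, matching the last bullet above):

```python
# Sketch: 5-fold cross validation, reporting the average accuracy over the folds.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(scores.mean())   # average accuracy over the 5 folds
```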
LOOCV (leave-one-out cross validation): k-fold cross validation with k = N (each fold is a single example); used if very little data is available.
Learning Curve
More data is better! At some point the learning curve flattens out and more data does not contribute much.
Comparing to Baselines
Linear Regression
Model: $\hat{y} = h_{\mathbf{w}}(\mathbf{x}) = \mathbf{w}^\top \mathbf{x}$
Empirical loss: squared error loss over the whole data matrix $X$:
$Loss(h_{\mathbf{w}}) = \|\mathbf{y} - X\mathbf{w}\|^2$
Gradient: the gradient is a vector of partial derivatives:
$\nabla_{\mathbf{w}} Loss(h_{\mathbf{w}}) = -2X^\top(\mathbf{y} - X\mathbf{w})$
Find: $\nabla_{\mathbf{w}} Loss(h_{\mathbf{w}}) = 0$
Analytical solution: $\mathbf{w}^* = (X^\top X)^{-1}X^\top\mathbf{y}$, where $(X^\top X)^{-1}X^\top$ is the pseudoinverse of $X$.
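A NumPy sketch of the analytical solution above; the synthetic data and the true weights (3, 2, -1) are made up for illustration:

```python
# Sketch: least-squares weights via the pseudoinverse, w* = (X^T X)^{-1} X^T y.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                 # 100 examples, 2 features
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

Xb = np.hstack([np.ones((100, 1)), X])        # prepend a column of 1s for the intercept
w = np.linalg.pinv(Xb) @ y                    # pseudoinverse solves the normal equations
print(w)                                      # approximately [3.0, 2.0, -1.0]
```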
Naïve Bayes Classifier
Approximates a Bayes classifier with the naïve independence assumption that all features are conditionally independent given the class.
Gaussian Naïve Bayes Classifiers extend the approach to continuous features by assuming each feature follows a class-conditional normal distribution: $P(x_i \mid y) = \mathcal{N}(x_i;\, \mu_{i,y}, \sigma_{i,y}^2)$.
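A minimal sketch using scikit-learn's GaussianNB (the iris dataset is just an example; GaussianNB estimates a per-class mean and variance for each feature):

```python
# Sketch: Gaussian naive Bayes, predicting the MAP class for each input.
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
model = GaussianNB().fit(X, y)
print(model.predict(X[:3]))                   # MAP class predictions
```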
k-Nearest Neighbors Classifier
The class is predicted by looking at the majority in the set of the $k$ nearest neighbors; $k$ is a hyperparameter. Larger $k$ smooths the decision boundary.
Neighbors are found using a distance measure (e.g., Euclidean distance between points).
Approximates a Bayes classifier by estimating the most likely class locally from the neighborhood of the query point.
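A minimal scikit-learn sketch (the dataset and k are illustrative; the Euclidean metric is spelled out explicitly):

```python
# Sketch: k-NN classification by majority vote among the k nearest neighbors.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean").fit(X, y)
print(knn.predict(X[:3]))                     # majority vote among the 5 nearest neighbors
```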
Support Vector Machine (SVM)
[Figure: maximum-margin separator, with the margin and the decision boundary labeled.]
Linear classifier that finds the maximum margin separator using only the points that are “support
vectors” and quadratic optimization.
The kernel trick can be used to learn non-linear decision boundaries.
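A short sketch with scikit-learn's SVC; the RBF kernel is one common choice for the kernel trick (kernel="linear" would give the linear case):

```python
# Sketch: SVM with an RBF kernel for a non-linear decision boundary.
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
svm = SVC(kernel="rbf").fit(X, y)
print(len(svm.support_))                      # number of support vectors found
```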
Artificial Neural Networks/Deep Learning
Computational graph: represent $h$ as a network of weighted sums with non-linear activation functions $g$ (e.g., logistic, ReLU).
Perceptron: a single unit computing a weighted sum with a bias term, followed by a non-linear activation function.
Hidden layer(s) compute intermediate representations; for classification, the output layer typically uses a softmax activation function returning class probabilities.
Learn weights from examples using backpropagation of prediction errors (gradient descent).
ANNs are universal approximators: large networks can approximate any function (no bias). Regularization is typically used to avoid overfitting.
Deep learning adds more hidden layers and layer types (e.g., convolution layers) for better learning.
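A minimal NumPy sketch of the forward pass of a one-hidden-layer network; the weights here are random stand-ins for values that would normally be learned by backpropagation:

```python
# Sketch: forward pass of a tiny network (ReLU hidden layer, softmax output).
import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - z.max())                   # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer: 3 inputs -> 4 units
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # output layer: 4 units -> 2 classes

x = np.array([0.5, -1.2, 0.3])                # one input feature vector
h = relu(W1 @ x + b1)                         # weighted sum + non-linear activation g
p = softmax(W2 @ h + b2)                      # class probabilities
print(p)                                      # sums to 1
```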
Other
Many other models exist
Machine Learning
Supervised learning (k-nearest neighbors, linear regression, ANN)
Unsupervised learning (clustering, …)
Deep learning (ANN, CNN, RNN, …)
Reinforcement learning (Q-table, …)
Practice of supervised learning:
Linear regression.
Support vector machine.
Ensemble learning.
Decision tree.
Naïve Bayes classifier.
END