Lecture Slide 02 - Supervised Learning - Summer 2023
This implies the existence of a “teacher” who knows the right answers
h : X1 × X2 × … × Xn → Y
h is called a hypothesis
• Training examples:
ei = <xi, yi>
for i = 1, …, 10
Linear Hypothesis
Which error function? Which algorithm?
Want to find the weight vector w = (w0, …, wn) such that hw(xi) ≈ yi
Define an error function that measures the difference between the predictions and the true answers
Then pick the w that minimizes this error
Sum-of-squares error function: J(w) = (1/2) Σi (hw(xi) − yi)²
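As a concrete sketch (the synthetic data below is illustrative, not from the slides), minimizing the sum-of-squares error for a linear hypothesis has a closed-form solution, which numpy's least-squares solver computes directly:

```python
import numpy as np

# Toy data: y is roughly linear in x (synthetic, for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=20)
y = 3.0 * x + 1.5 + rng.normal(0, 0.5, size=20)

# Design matrix with a bias column, so w = (w0, w1).
X = np.column_stack([np.ones_like(x), x])

# Minimize J(w) = 1/2 * sum_i (X @ w - y)^2; lstsq solves the
# least-squares problem in a numerically stable way.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

pred = X @ w
sse = 0.5 * np.sum((pred - y) ** 2)
print(w, sse)
```

The recovered weights should be close to the generating values (1.5, 3.0), up to noise.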
Overfitting = low error on training data but high error on unseen data
Assume D is drawn from some unknown probability distribution
Given the universe U of data, we want to learn from the training set D a hypothesis h that minimizes the error on unseen data
Every h has a true error JU(h) on U: the expected error when the data is drawn from the distribution
We can only measure the empirical error JD(h) on D; we do not have U
Then… how can we estimate JU(h) from D?
Apply a cross-validation method on D
Determining the hypothesis h that generalizes best is called model selection.
Avoiding Overfitting
• Red curve = error on the test set
• Blue curve = error on the training set
Optimal choice is d = 2
Overfitting for d > 2
Very high validation error for d = 8 and 9
Model Selection
J(d*) is not unbiased, since it was obtained using all m sample data
We chose the hypothesis class d* based on the very same data used to estimate its error
We want both a hypothesis class and an unbiased true-error estimate
If we want to compare different learning algorithms (or different hypotheses), an independent test set U is required in order to decide on the best algorithm or the best hypothesis
In our case, we are trying to decide which regression model to use: d = 1, d = 2, …, or d = 11?
And which one gives the best unbiased true-error estimate?
k-Fold Cross-Validation
Partition D into k disjoint subsets of same size and
same distribution, P1, P2, …, Pk
For each degree d do:
for i ← 1 to k do:
1. Validation set Vi ← Pi ; leave Pi out for validation
2. Training set Ti ← D \ Vi
3. wd,i ← Train(Ti, d) ; train on Ti
4. J(d, i) ← Test(wd,i, Vi) ; compute validation error on Vi
Average validation error: J(d) ← (1/k) Σi J(d, i)
d* ← arg mind J(d) ; return optimal degree d*
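The loop above can be sketched in Python; the function name, the synthetic quadratic data, and the candidate degrees are illustrative assumptions, not from the slides:

```python
import numpy as np

def kfold_cv_degree(x, y, degrees, k=5, seed=0):
    """Pick the polynomial degree with the lowest average validation error.

    Sketch of the k-CV loop: split D into k folds, train on k-1 folds,
    validate on the held-out fold, and average the k validation errors.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)                   # P1, ..., Pk
    avg_err = {}
    for d in degrees:
        errs = []
        for i in range(k):
            val = folds[i]                           # Vi <- Pi
            tr = np.concatenate([folds[j] for j in range(k) if j != i])  # Ti <- D \ Vi
            w = np.polyfit(x[tr], y[tr], d)          # w_{d,i} <- Train(Ti, d)
            pred = np.polyval(w, x[val])
            errs.append(np.mean((pred - y[val]) ** 2))  # J(d, i) on Vi
        avg_err[d] = np.mean(errs)                   # J(d)
    return min(avg_err, key=avg_err.get)             # d* <- argmin_d J(d)

# Quadratic toy data: degree 2 should beat underfitting (d=1)
# and overfitting (d=9) on validation error.
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 60)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(0, 0.3, 60)
best = kfold_cv_degree(x, y, degrees=[1, 2, 3, 5, 9], k=5)
print(best)
```

On this toy data the selected degree should land at (or very near) the true degree 2.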
kCV-Based Model Selection
Partition D into k disjoint subsets P1, P2, …, Pk
1. For j ← 1 to k do:
1. Test set Uj ← Pj ; leave Pj out for testing
2. TrainingAndValidation set Dj ← D \ Uj
3. dj* ← kCV(Dj) ; find best degree d at iteration j
4. wj* ← Train(Dj, dj*) ; find associated w using full data Dj
5. J(hj*) ← Test(wj*, Uj) ; estimate unbiased predictive error of hj* on Uj
2. Performance of method: E ← (1/k) Σj J(hj*)
; return E as the performance of the learning algorithm
3. Best hypothesis: hbest ← arg minj J(hj*) ; final selected predictor
; several approaches can be used to come up with just one hypothesis
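A sketch of this nested procedure, reusing polynomial regression as the hypothesis class; the helper names, fold counts, and toy data are assumptions for illustration:

```python
import numpy as np

def inner_cv_degree(x, y, degrees, k):
    # Inner k-CV: pick the degree with the lowest average validation MSE.
    folds = np.array_split(np.arange(len(x)), k)
    def cv_err(d):
        errs = []
        for i in range(k):
            val = folds[i]
            tr = np.concatenate([folds[j] for j in range(k) if j != i])
            w = np.polyfit(x[tr], y[tr], d)
            errs.append(np.mean((np.polyval(w, x[val]) - y[val]) ** 2))
        return np.mean(errs)
    return min(degrees, key=cv_err)

def nested_cv(x, y, degrees, k_outer=5, k_inner=4, seed=0):
    """Outer loop of kCV-based model selection: Uj <- Pj, Dj <- D \\ Uj,
    dj* <- kCV(Dj), retrain on the full Dj, then test once on Uj."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    outer = np.array_split(idx, k_outer)             # P1, ..., Pk
    test_errs = []
    for j in range(k_outer):
        U = outer[j]                                  # Uj <- Pj
        D = np.concatenate([outer[i] for i in range(k_outer) if i != j])
        d_star = inner_cv_degree(x[D], y[D], degrees, k_inner)  # dj*
        w = np.polyfit(x[D], y[D], d_star)            # wj* trained on full Dj
        test_errs.append(np.mean((np.polyval(w, x[U]) - y[U]) ** 2))  # J(hj*)
    return np.mean(test_errs)                         # E: performance of the method

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 80)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(0, 0.3, 80)
E = nested_cv(x, y, degrees=[1, 2, 3, 5])
print(E)
```

Because each Uj was never touched during degree selection or training, E is an honest estimate of the whole learning procedure's predictive error.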
Variations of k-Fold Cross-Validation
LOOCV (leave-one-out CV): k-CV with k = m, i.e. m-fold CV
Best error estimate, but very slow on large D
k-CV is a good trade-off between true-error estimate quality, speed, and data size
Each sample is used for validation exactly once; usually k = 10
Learning a Class from Examples
Class C of a “family car”
Prediction: Is car x a family car?
Knowledge extraction: What do people expect from a
family car?
Output:
Positive (+) and negative (–) examples
Input representation:
x1: price, x2 : engine power
Training set X
Class C
Hypothesis class H
Error of h ∈ H on the training set X: E(h | X) = Σt 1(h(xt) ≠ rt)
Every h ∈ H between S (most specific hypothesis) and G (most general hypothesis) is consistent with the training set; together they make up the version space (Mitchell, 1997)
Margin
Choose the h with the largest margin: the distance between the boundary and the closest instances
VC Dimension
N points can be labeled in 2^N ways as +/–
H shatters the N points if, for every one of these labelings, there exists an h ∈ H consistent with it: VC(H) = N (the VC dimension is the maximum such N)
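A tiny illustrative check (not from the slides): 1-D threshold hypotheses h_θ(x) = 1 iff x ≥ θ can realize any labeling of one point, but not every labeling of two points, so their VC dimension is 1:

```python
from itertools import product

def achievable(points, labeling, thetas):
    # True if some threshold hypothesis h_theta reproduces the labeling.
    return any(all((1 if x >= t else 0) == lab
                   for x, lab in zip(points, labeling))
               for t in thetas)

points = [1.0, 2.0]
# Thresholds only matter relative to the points, so three candidates
# (below both, between, above both) cover every distinct hypothesis.
thetas = [0.0, 1.5, 3.0]
results = {lab: achievable(points, lab, thetas)
           for lab in product([0, 1], repeat=2)}
print(results)
# Labeling (1, 0) is impossible: x1 < x2 forces h(x1) <= h(x2),
# so thresholds cannot shatter two points.
```

Since all four labelings of two points would be needed for shattering, and (1, 0) fails, VC(thresholds) = 1.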
Multiple classes: train K hypotheses hi(x), i = 1, …, K, one per class
Regression
Model Selection & Generalization
Learning is an ill-posed problem; data is not sufficient
to find a unique solution
The need for inductive bias, assumptions about H
Generalization: How well a model performs on new
data
Overfitting: H more complex than C or f
Underfitting: H less complex than C or f
Triple Trade-Off
Trade-off between the complexity of H, the size of the training set, and the generalization error on new data
1. Model:
2. Loss function:
3. Optimization procedure:
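These ingredients can be made concrete in a short sketch (the synthetic data and hyperparameters are illustrative assumptions): the model is g(x|θ) = θ0 + θ1·x, the loss is squared error, and the optimization procedure is plain gradient descent.

```python
import numpy as np

# Synthetic data from y = 2x + 1 plus noise (illustrative only).
rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 50)

theta = np.zeros(2)                  # model parameters (theta0, theta1)
lr = 0.5                             # learning rate
for _ in range(2000):
    pred = theta[0] + theta[1] * x   # 1. model: g(x|theta)
    # 2. loss: mean squared error / 2; its gradients w.r.t. theta:
    grad0 = np.mean(pred - y)
    grad1 = np.mean((pred - y) * x)
    # 3. optimization procedure: one gradient-descent step
    theta -= lr * np.array([grad0, grad1])

print(theta)
```

After convergence the parameters should be close to the generating values (1.0, 2.0); swapping the loss or the optimizer changes the learning algorithm while the model stays fixed.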
Textbook/ Reference Materials