Regularization
The problem of overfitting
Machine Learning
Example: Linear regression (housing prices)
[Figure: three Price vs. Size plots, showing an underfit model, a good fit, and an overfit model]
Overfitting: if we have too many features, the learned hypothesis may fit the training set very well ($J(\theta) \approx 0$), but fail to generalize to new examples (e.g., fail to predict prices on new examples).
Example: Logistic regression
[Figure: three decision boundaries in the $(x_1, x_2)$ plane, from a simple underfit boundary to a highly contorted overfit one]

($g$ = sigmoid function)
Addressing overfitting:

Features for predicting housing prices:
- $x_1$ = size of house
- $x_2$ = no. of bedrooms
- $x_3$ = no. of floors
- $x_4$ = age of house
- $x_5$ = average income in neighborhood
- ...
- $x_{100}$ = kitchen size

With a large number of features like these and relatively few training examples, overfitting can occur even when each feature seems relevant.
Addressing overfitting:

Options:
1. Reduce the number of features.
   - Manually select which features to keep.
   - Model selection algorithm (later in course).
2. Regularization.
   - Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
   - Works well when we have a lot of features, each of which contributes a bit to predicting $y$.
Regularization
Cost function
Machine Learning
Intuition

[Figure: two Price vs. Size-of-house plots, a quadratic fit and a wiggly higher-order polynomial fit]

$$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$$

Suppose we penalize $\theta_3$ and $\theta_4$ and make them really small, e.g. by minimizing

$$\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2.$$

The only way to make this objective small is to drive $\theta_3 \approx 0$ and $\theta_4 \approx 0$, so the fitted curve is essentially quadratic.
Regularization.

Small values for parameters $\theta_1, \theta_2, \ldots, \theta_n$:
- "Simpler" hypothesis
- Less prone to overfitting

Housing:
- Features: $x_1, x_2, \ldots, x_{100}$
- Parameters: $\theta_0, \theta_1, \ldots, \theta_{100}$

Regularized cost function:

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$

Here $\lambda$ is the regularization parameter; by convention the penalty sum starts at $j = 1$, so the bias term $\theta_0$ is not penalized.
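As a concrete reference, here is a minimal Octave sketch of this regularized cost, assuming X is the m-by-(n+1) design matrix with a leading column of ones, y the targets, theta the parameter vector, and lambda the regularization parameter (the variable names are illustrative, not from the slides):

function J = regularizedCost(X, y, theta, lambda)
  m = length(y);
  h = X * theta;                              % linear hypothesis h_theta(x)
  penalty = lambda * sum(theta(2:end) .^ 2);  % skip theta(1), i.e. theta_0
  J = (sum((h - y) .^ 2) + penalty) / (2 * m);
end

Indexing from theta(2) mirrors the convention above: Octave is 1-indexed, so theta(1) holds $\theta_0$.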
Regularization.

[Figure: Price vs. Size of house, comparing an overfit curve with the smoother curve obtained after regularization]
In regularized linear regression, we choose $\theta$ to minimize

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$

What if $\lambda$ is set to an extremely large value (perhaps too large for our problem, say $\lambda = 10^{10}$)?
- Algorithm works fine; setting $\lambda$ to be very large can't hurt it.
- Algorithm fails to eliminate overfitting.
- Algorithm results in underfitting (fails to fit even the training data well).
- Gradient descent will fail to converge.
If $\lambda$ is set to an extremely large value, the penalty term dominates, driving $\theta_1, \theta_2, \ldots, \theta_n$ toward zero and leaving $h_\theta(x) \approx \theta_0$: the algorithm underfits, fitting a nearly flat line that fails to capture even the training data.

[Figure: Price vs. Size of house with a nearly horizontal fit, $h_\theta(x) \approx \theta_0$]
Regularization
Regularized linear regression
Machine Learning
Regularized linear regression

Gradient descent:

Repeat {
$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j = 1, 2, \ldots, n)$$
}

Equivalently,

$$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)},$$

where $1 - \alpha\frac{\lambda}{m}$ is slightly less than 1, so each update shrinks $\theta_j$ a little before taking the usual gradient step.
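A minimal Octave sketch of this loop, under the same assumed names as before (X with a leading column of ones, y, theta, learning rate alpha, lambda, and an iteration count, all illustrative):

function theta = gradientDescentReg(X, y, theta, alpha, lambda, num_iters)
  m = length(y);
  for iter = 1:num_iters
    h = X * theta;                                          % current predictions
    grad = (X' * (h - y)) / m;                              % unregularized gradient
    grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end); % penalize theta_1..theta_n only
    theta = theta - alpha * grad;                           % simultaneous update of all theta_j
  end
end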
Normal equation:

$$\theta = \left(X^\top X + \lambda\begin{bmatrix}0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1\end{bmatrix}\right)^{-1} X^\top y$$

where the matrix next to $\lambda$ is $(n+1)\times(n+1)$, with a 0 in the top-left entry so that $\theta_0$ is not penalized.
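In Octave this solve might look as follows, with X, y, and lambda assumed as before; the backslash operator is used instead of an explicit inverse for numerical stability:

function theta = normalEqnReg(X, y, lambda)
  n = size(X, 2) - 1;          % number of features (excluding the intercept column)
  M = eye(n + 1);
  M(1, 1) = 0;                 % zero in the top-left so theta_0 is not penalized
  theta = (X' * X + lambda * M) \ (X' * y);
end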
Non-invertibility (optional/advanced).

Suppose $m \le n$ (#examples $\le$ #features). Then in

$$\theta = \left(X^\top X\right)^{-1} X^\top y,$$

the matrix $X^\top X$ is non-invertible / singular.

If $\lambda > 0$, the regularized matrix $X^\top X + \lambda M$ (with $M$ as above) is guaranteed to be invertible, so regularization also takes care of the non-invertibility issue.
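A tiny illustrative check of this fact (the numbers are made up): with m = 2 examples and n = 3 features, X'X is singular, but adding the lambda term restores full rank:

X = [1 2 3 4;
     1 5 6 7];                % m = 2 rows, n + 1 = 4 columns (leading ones)
lambda = 1;
M = eye(4);  M(1, 1) = 0;
rank(X' * X)                  % prints 2: rank-deficient, so X'X is singular
rank(X' * X + lambda * M)     % prints 4: full rank, so the regularized matrix is invertible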
Regularization
Regularized logistic regression
Machine Learning
Regularized logistic regression.

[Figure: a highly contorted, overfit decision boundary in the $(x_1, x_2)$ plane]

Cost function:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

The added term $\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$ penalizes large parameters and discourages an overly complex decision boundary.
Gradient descent:

Repeat {
$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j = 1, 2, \ldots, n)$$
}

This looks identical to the update for regularized linear regression, but here $h_\theta(x) = \frac{1}{1 + e^{-\theta^\top x}}$, so it is a different algorithm.
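Putting the cost and the gradient together, a minimal Octave sketch for regularized logistic regression might look like this (X with a leading column of ones, y in {0, 1}; the names are illustrative, not from the slides):

function [J, grad] = costFunctionReg(theta, X, y, lambda)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                          % sigmoid hypothesis
  J = (-y' * log(h) - (1 - y)' * log(1 - h)) / m ...
      + (lambda / (2 * m)) * sum(theta(2:end) .^ 2);
  grad = (X' * (h - y)) / m;                               % unregularized gradient
  grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end); % skip theta_0
end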
Advanced optimization

function [jVal, gradient] = costFunction(theta)
  jVal = [code to compute J(theta)];
  gradient(1) = [code to compute ∂J(θ)/∂θ_0];
  gradient(2) = [code to compute ∂J(θ)/∂θ_1];
  gradient(3) = [code to compute ∂J(θ)/∂θ_2];
  ...
  gradient(n+1) = [code to compute ∂J(θ)/∂θ_n];
end
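To make the placeholder concrete: the costFunctionReg sketch above already returns [jVal, gradient] in exactly this shape, so it can be handed directly to Octave's fminunc. The options below (gradient flag, iteration cap, zero initialization) are illustrative choices, not prescribed by the slides:

options = optimset('GradObj', 'on', 'MaxIter', 400);   % tell fminunc we supply the gradient
initialTheta = zeros(size(X, 2), 1);                   % one parameter per column of X
[optTheta, functionVal, exitFlag] = ...
    fminunc(@(t) costFunctionReg(t, X, y, lambda), initialTheta, options);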