Multiclass Classification and Regularization

Multiclass Classification
[Figure: data with three classes (Class 1, Class 2, Class 3) plotted against features x1 and x2. One-vs-all reduces the multiclass problem to one binary classifier per class (that class vs. all others); one-vs-one trains a binary classifier for each pair of classes.]
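As a minimal sketch of the one-vs-all approach, assuming scikit-learn's LogisticRegression as the binary classifier (the library choice and the synthetic three-class data are illustrative, not from the slides):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative 2-D data with three classes (features x1, x2).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 2))
               for c in ([0, 0], [3, 0], [1.5, 3])])
y = np.repeat([0, 1, 2], 20)

# One-vs-all: fit one binary classifier per class (class k vs. the rest).
classifiers = [LogisticRegression().fit(X, (y == k).astype(int))
               for k in (0, 1, 2)]

# Predict by picking the class whose classifier is most confident.
probs = np.column_stack([clf.predict_proba(X)[:, 1] for clf in classifiers])
print("training accuracy:", np.mean(np.argmax(probs, axis=1) == y))
```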
Regularization
A fundamental problem is that the algorithm tries to pick parameters that minimize loss on the
training set, but this may not result in a model that has low loss on future data. This is called
overfitting.
Example: suppose we want to predict the probability of heads when tossing a coin. We toss it $N = 3$ times and observe 3 heads, so the maximum-likelihood estimate is $\hat{\theta}_{\text{mle}} = 3/3 = 1$. If we use $\hat{\theta}_{\text{mle}}$ as our model, we will predict that all future coin tosses will also be heads, which seems rather unlikely.
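A quick numeric check of this example; the add-one smoothing shown as a regularized alternative is an illustration I've added, not from the slides:

```python
# Coin-toss example: N = 3 tosses, all heads.
heads, tosses = 3, 3

# Maximum-likelihood estimate: puts all probability mass on heads.
theta_mle = heads / tosses                    # = 1.0

# Add-one (Laplace) smoothing, i.e. the posterior mean under a
# uniform Beta(1, 1) prior -- a simple form of regularization.
theta_smoothed = (heads + 1) / (tosses + 2)   # = 0.8

print(theta_mle, theta_smoothed)
```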
• The core of the problem is that the model has enough parameters to perfectly fit the observed
training data, so it can perfectly match the empirical distribution.
• However, in most cases the empirical distribution is not the same as the true distribution, so
putting all the probability mass on the observed set of examples will not leave over any probability
for novel data in the future. That is, the model may not generalize.
Solution

The main solution to overfitting is to use regularization, which means adding a penalty term to the cost function. Thus we optimize an objective of the form

$$\mathcal{L}(\theta; \lambda) = \frac{1}{N} \sum_{n=1}^{N} \ell\left(y_n, f(x_n; \theta)\right) + \lambda\, C(\theta)$$

where $\lambda \ge 0$ is a tuning parameter that controls the relative impact of these two terms on the
coefficient estimates, and $C(\theta)$ is a complexity penalty.
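A minimal sketch of this objective in Python; the particular choices of loss (squared error) and penalty ($C(\theta) = \|\theta\|_2^2$) are illustrative, and all names here are my own:

```python
import numpy as np

def regularized_loss(theta, X, y, lam, loss, penalty):
    """L(theta; lambda) = (1/N) * sum_n loss(y_n, f(x_n; theta)) + lambda * C(theta)."""
    data_term = np.mean(loss(y, X @ theta))   # average loss on the training set
    return data_term + lam * penalty(theta)   # plus the complexity penalty

# Example: squared-error loss with an L2 penalty C(theta) = ||theta||^2.
squared_error = lambda y, yhat: (y - yhat) ** 2
l2_penalty = lambda theta: np.sum(theta ** 2)

X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])  # leading column of ones
y = np.array([2.0, 3.0, 4.0])
theta = np.array([0.5, 0.9])
print(regularized_loss(theta, X, y, lam=0.1, loss=squared_error, penalty=l2_penalty))
```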
[Figure: three fits of housing-price regression (Price vs. Size) and three logistic-regression decision boundaries (features x1, x2; hypothesis $h_\theta(x) = g(\theta^T x)$, where $g$ = sigmoid function), illustrating underfitting, a good fit, and overfitting.]
Addressing overfitting:

[Figure: Price vs. Size, a high-order fit to housing data.]

Features might include:
― size of house
― no. of bedrooms
― no. of floors
― age of house
― average income in neighborhood
― kitchen size

With many such features and relatively few training examples, overfitting becomes a problem.
Addressing overfitting:

Options:
1. Reduce number of features
― Manually select which features to keep.
― Model selection algorithm (later in course).
2. Regularization
― Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
― Works well when we have a lot of features, each of
which contributes a bit to predicting $y$.
Regularization: Cost function

Intuition: [Figure: Price vs. Size of house; penalizing large parameter values yields a smoother, simpler hypothesis that is less prone to overfitting.]

In regularized linear regression, we choose $\theta$ to minimize

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2\right]$$
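A sketch of this cost function in Python, assuming X carries a leading column of ones and that $\theta_0$ is left unpenalized, as in the formula above (function and variable names are my own):

```python
import numpy as np

def cost_regularized(theta, X, y, lam):
    """J(theta) = (1/2m) [ sum_i (h(x_i) - y_i)^2 + lambda * sum_{j>=1} theta_j^2 ]."""
    m = len(y)
    residuals = X @ theta - y                 # h_theta(x) = X @ theta
    penalty = lam * np.sum(theta[1:] ** 2)    # skip the intercept theta_0
    return (np.sum(residuals ** 2) + penalty) / (2 * m)
```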
Regularization: Regularized linear regression

Gradient descent:

Repeat {
$$\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j = 1, \dots, n)$$
}

Note that $\theta_0$ is not regularized.

Normal equation (regularized):

$$\theta = \left(X^T X + \lambda M\right)^{-1} X^T y, \qquad M = \mathrm{diag}(0, 1, \dots, 1)$$

Non-invertibility:

Suppose $m \le n$ ($m$ = #examples, $n$ = #features); then $X^T X$ is singular / non-invertible.
If $\lambda > 0$, the matrix $X^T X + \lambda M$ is invertible, so regularization also takes care of this issue.
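A sketch of both methods in Python (names and hyperparameter defaults are illustrative):

```python
import numpy as np

def normal_equation_regularized(X, y, lam):
    """theta = (X^T X + lambda*M)^{-1} X^T y, with M = diag(0, 1, ..., 1)
    so the intercept theta_0 is not penalized. For lambda > 0 the matrix
    is invertible even when m <= n."""
    n = X.shape[1]
    M = np.eye(n)
    M[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * M, X.T @ y)

def gradient_descent_regularized(X, y, lam, alpha=0.01, iters=1000):
    """Repeat the regularized update; theta_0 gets no lambda term."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m      # unregularized gradient
        grad[1:] += (lam / m) * theta[1:]     # shrink everything except theta_0
        theta -= alpha * grad
    return theta
```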
Regularization: Regularized logistic regression

[Figure: two-class data in (x1, x2) with an overly complex, wiggly decision boundary, i.e., an overfit logistic-regression hypothesis.]

Cost function:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

Gradient descent:

Repeat {
$$\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j = 1, \dots, n)$$
}

These are the same updates as for regularized linear regression, but with $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$.
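A sketch of the regularized logistic-regression cost and one gradient step in Python; the eps guard against log(0) is a numerical safeguard I've added, and all names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_regularized(theta, X, y, lam):
    """Cross-entropy cost plus (lambda / 2m) * sum_{j>=1} theta_j^2."""
    m = len(y)
    h = sigmoid(X @ theta)
    eps = 1e-12                                # avoid log(0)
    ce = -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
    return ce + (lam / (2 * m)) * np.sum(theta[1:] ** 2)

def logistic_gradient_step(theta, X, y, lam, alpha):
    """One update; same form as regularized linear regression, h = sigmoid."""
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += (lam / m) * theta[1:]          # theta_0 is not regularized
    return theta - alpha * grad
```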