Multiclass Classification Regularization


Multiclass classification

Email foldering/tagging: Work, Friends, Family, Hobby

Medical diagnosis: Not ill, Cold, Flu

Weather: Sunny, Cloudy, Rain, Snow


Binary classification vs. multi-class classification:
[Figure: a two-class dataset and a three-class dataset plotted in the x1–x2 plane.]

One-vs-one:
[Figure: the multi-class problem decomposed into binary problems, one per pair of classes.]

One-vs-all:
[Figure: three binary problems, one per class (Class 1, Class 2, Class 3), each separating that class from the remaining classes.]
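To make the one-vs-all idea concrete, here is a minimal Python sketch (not part of the original slides): it uses scikit-learn's LogisticRegression as the per-class binary classifier, and the three-class toy data is made up purely for illustration.

```python
# One-vs-all sketch: train one binary classifier per class, then predict
# the class whose classifier is most confident.
import numpy as np
from sklearn.linear_model import LogisticRegression

def one_vs_all_fit(X, y, classes):
    """Fit one binary logistic regression per class (class k vs. rest)."""
    models = {}
    for k in classes:
        models[k] = LogisticRegression().fit(X, (y == k).astype(int))
    return models

def one_vs_all_predict(models, X):
    """For each example, pick the class with the highest P(y = k | x)."""
    classes = list(models.keys())
    # Column j holds P(y = classes[j] | x) from the j-th binary model.
    scores = np.column_stack([models[k].predict_proba(X)[:, 1] for k in classes])
    return np.array(classes)[np.argmax(scores, axis=1)]

# Toy data with three classes (e.g. Work / Friends / Family emails mapped to 0/1/2).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 2)) for c in (0, 2, 4)])
y = np.repeat([0, 1, 2], 20)
models = one_vs_all_fit(X, y, classes=[0, 1, 2])
print(one_vs_all_predict(models, X[:5]))
```

(Scikit-learn's LogisticRegression can also handle several classes directly; the explicit loop above just spells out the one-vs-all decomposition shown in the figure.)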
Regularization
A fundamental problem is that the algorithm tries to pick parameters that minimize loss on the
training set, but this may not result in a model that has low loss on future data. This is called
overfitting.

Example

Suppose we want to predict the probability of heads when tossing a coin. We toss it N = 3 times and observe 3 heads. The MLE is θ̂_mle = N₁/(N₁ + N₀) = 3/(3 + 0) = 1, where N₁ and N₀ count the observed heads and tails. However, if we use Ber(y | θ̂_mle) as our model, we will predict that all future coin tosses will also be heads, which seems rather unlikely.
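A quick numeric check of this example; the add-one smoothed estimate at the end is my own illustration of a very simple regularizer, not something stated in the text.

```python
# Coin example: N = 3 tosses, all heads.
n_heads, n_tails = 3, 0

# The MLE puts all probability mass on heads.
theta_mle = n_heads / (n_heads + n_tails)                 # = 1.0

# One simple regularizer (illustration only): add-one (Laplace) smoothing,
# i.e. pretend we also saw one extra head and one extra tail.
theta_smoothed = (n_heads + 1) / (n_heads + n_tails + 2)  # = 0.8

print(theta_mle, theta_smoothed)
```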
Regularization
• The core of the problem is that the model has enough parameters to perfectly fit the observed
training data, so it can perfectly match the empirical distribution.

• However, in most cases the empirical distribution is not the same as the true distribution, so
putting all the probability mass on the observed set of examples will not leave over any probability
for novel data in the future. That is, the model may not generalize.
Solution
The main solution to overfitting is to use regularization, which means to add a penalty term to the
Cost function. Thus we optimize an objective of the form

ℒ(θ; D) = (1/N) Σ_n ℓ(y_n, θ; x_n) + λ C(θ)

λ ≥ 0 is a tuning parameter and controls the relative impact of these two terms on the regression
coefficient estimates.

When λ = 0, the penalty term has no effect.

However, as λ → ∞, the impact of the shrinkage penalty grows, and the ridge regression
coefficient estimates will approach zero.
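As a concrete instance of this objective, here is a minimal sketch with assumed names: squared-error loss for ℓ and an L2 penalty C(θ) = ‖θ‖², i.e. a ridge-style objective (not code from the source).

```python
import numpy as np

def regularized_loss(theta, X, y, lam):
    """Average squared-error training loss plus an L2 penalty lam * ||theta||^2.

    lam = 0 recovers the plain training loss; increasing lam shrinks the
    fitted coefficients toward zero.
    """
    residuals = X @ theta - y
    return np.mean(residuals ** 2) + lam * np.sum(theta ** 2)
```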
Example: Linear regression (housing prices)
[Figure: three fits of Price vs. Size, from a simple model that underfits to a high-order polynomial that overfits.]

Overfitting: If we have too many features, the learned hypothesis
may fit the training set very well (J(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))² ≈ 0), but fail
to generalize to new examples (predict prices on new examples).
Example: Logistic regression

[Figure: three decision boundaries in the x1–x2 plane, from a simple boundary that underfits to a highly complex boundary that overfits.]

h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2 + …)   (g = sigmoid function)
Addressing overfitting:
― size of house
― no. of bedrooms
― no. of floors
― age of house
― average income in neighborhood
― kitchen size
[Figure: Price vs. Size.]
Addressing overfitting:

Options:
1. Reduce number of features
― Manually select which features to keep.
― Model selection algorithm (later in course).
2. Regularization
― Keep all the features, but reduce magnitude/values of
parameters θ_j.
― Works well when we have a lot of features, each of
which contributes a bit to predicting y.
Regularization
Cost function
Intuition

[Figure: Price vs. Size of house — a quadratic fit shown next to a higher-order polynomial fit of the same data.]

Suppose we penalize and make θ_3, θ_4 really small.


Regularization.

Small values for parameters θ_0, θ_1, …, θ_n

― “Simpler” hypothesis
― Less prone to overfitting
Housing:
― Features: x_1, x_2, …, x_n
― Parameters: θ_0, θ_1, …, θ_n
Regularization.

[Figure: Price vs. Size of house.]
In regularized linear regression, we choose θ to minimize

J(θ) = (1/2m) [ Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))² + λ Σ_{j=1}^{n} θ_j² ]

What if λ is set to an extremely large value (perhaps too large
for our problem, say λ = 10^10)?
- Algorithm works fine; setting λ to be very large can’t hurt it.
- Algorithm fails to eliminate overfitting.
- Algorithm results in underfitting (fails to fit even the training data
well).
- Gradient descent will fail to converge.
In regularized linear regression, we choose θ to minimize the same regularized cost J(θ).
What if λ is set to an extremely large value (say λ = 10^10)? The penalty then drives θ_1, …, θ_n
toward zero, so the hypothesis reduces to h_θ(x) ≈ θ_0.

[Figure: Price vs. Size of house — a nearly flat line h_θ(x) ≈ θ_0 that underfits the data.]
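A quick demonstration of this effect, using scikit-learn's Ridge as a stand-in for regularized linear regression (my illustration, with synthetic housing data):

```python
# With a huge penalty, ridge regression drives the slope toward zero,
# leaving roughly a flat prediction h(x) ~ theta_0 (underfitting).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
size = rng.uniform(50, 300, size=(40, 1))                # house size
price = 100 + 2.5 * size[:, 0] + rng.normal(0, 20, 40)   # roughly linear prices

small = Ridge(alpha=1e-3).fit(size, price)
huge = Ridge(alpha=1e10).fit(size, price)
print("small lambda slope:", small.coef_[0])   # close to the true slope ~2.5
print("huge lambda slope: ", huge.coef_[0])    # ~0: the model underfits
```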
Regularization
Regularized linear regression
Gradient descent
Repeat {
    θ_0 := θ_0 − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_0^(i)
    θ_j := θ_j − α [ (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i) + (λ/m) θ_j ]      (j = 1, 2, …, n)
}
Normal equation (regularized)

θ = (X^T X + λ M)^(−1) X^T y,
where M is the (n+1)×(n+1) identity matrix with the entry corresponding to θ_0 set to 0.
Non-invertibility
Suppose m ≤ n (#examples ≤ #features). Then X^T X is non-invertible / singular.

If λ > 0, the matrix X^T X + λ M is invertible, so regularization also takes care of this issue.
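A minimal NumPy sketch of both procedures above, under the usual convention that X carries a leading column of ones and θ_0 is not regularized (assumed names, not course code):

```python
import numpy as np

def gradient_descent_ridge(X, y, lam, alpha=0.01, iters=1000):
    """X is m x (n+1) with a leading column of ones; theta_0 is not regularized."""
    m, n1 = X.shape
    theta = np.zeros(n1)
    for _ in range(iters):
        grad = (X.T @ (X @ theta - y)) / m     # gradient of the squared-error term
        reg = (lam / m) * theta                # gradient of the (lam/2m) * sum theta_j^2 penalty
        reg[0] = 0.0                           # do not regularize the intercept theta_0
        theta -= alpha * (grad + reg)
    return theta

def normal_equation_ridge(X, y, lam):
    """theta = (X^T X + lam * M)^(-1) X^T y, with M = identity except M[0, 0] = 0."""
    n1 = X.shape[1]
    M = np.eye(n1)
    M[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * M, X.T @ y)
```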
Regularization
Regularized logistic regression

[Figure: a complex decision boundary in the x1–x2 plane that overfits the training data.]

Cost function:

J(θ) = −(1/m) Σ_{i=1}^{m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ] + (λ/2m) Σ_{j=1}^{n} θ_j²
Gradient descent
Repeat {
    θ_0 := θ_0 − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_0^(i)
    θ_j := θ_j − α [ (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i) + (λ/m) θ_j ]      (j = 1, 2, …, n)
}
(The update looks identical to regularized linear regression, but here h_θ(x) = 1 / (1 + e^(−θ^T x)).)
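A minimal NumPy sketch of this cost and gradient step (assumed names; θ_0 is again left unpenalized):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, lam):
    """Cross-entropy loss plus (lam / 2m) * sum_{j>=1} theta_j^2."""
    m = len(y)
    h = sigmoid(X @ theta)
    eps = 1e-12                                   # avoid log(0)
    data = -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
    return data + (lam / (2 * m)) * np.sum(theta[1:] ** 2)

def gradient_step(theta, X, y, lam, alpha):
    """One regularized gradient-descent update; theta_0 is not regularized."""
    m = len(y)
    grad = (X.T @ (sigmoid(X @ theta) - y)) / m
    reg = (lam / m) * theta
    reg[0] = 0.0
    return theta - alpha * (grad + reg)
```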
