Lecture 4 and 5

MACHINE LEARNING (CS 403/603)

Bias-variance; Linear Regression and Ridge Regression

By: Dr. Puneet Gupta


Overfitting
Overfitting means fitting the training set "too well", so that performance
on the test set degrades.
Underfitting refers to a model that can neither model the training data nor
generalize to new data.

As the model keeps learning, the error on both the training and testing
data decreases. If learning goes on too long, overfitting sets in because
the model starts fitting noise and less relevant attributes, and the
performance of the model on the test set decreases. For a good model, we
stop at the point just before the test error starts increasing, i.e., the
point where the model performs well on both the training data and the
unseen testing data.
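This stopping rule is a form of early stopping. As a rough illustration
(not from the slides), here is a minimal sketch of monitoring a held-out
error during training; train_one_epoch and validation_error are
hypothetical placeholders for whatever model and data are in use:

import math

# Hypothetical helpers (assumed, not from the lecture):
# train_one_epoch(model) updates the model on the training set;
# validation_error(model) returns the error on held-out data.

def train_with_early_stopping(model, max_epochs=100, patience=5):
    best_err = math.inf
    epochs_since_best = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        err = validation_error(model)
        if err < best_err:           # still improving on unseen data
            best_err = err
            epochs_since_best = 0
        else:                        # held-out error started rising
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break                # stop before overfitting worsens
    return model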
K-Nearest Neighbor
[Figure: decision boundaries for 1-NN (K = 1) and 5-NN (K = 5)]

● In one-nearest-neighbor (1-NN), the label of x is given by the label of
  its nearest neighbor in the training data.
● Distance measures are used to find the nearest neighbor.
● Better results can be expected when more (K > 1) neighbors are utilized.
● Classification: use majority voting among the K neighbors (see the
  sketch below).
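A minimal K-NN classifier sketch in Python (an illustration, not code from
the lecture), using Euclidean distance and majority voting over NumPy
arrays:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    # Euclidean distance from x to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # indices of the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # majority vote over the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

With k=1 this reduces to 1-NN; larger k smooths the decision boundary, as
in the 5-NN panel of the figure.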
Mitigating Overfitting by Holdout Model Selection Techniques

[Figure: holdout model-selection pipeline]
● Split the data into a training set, a validation set, and a test set.
● Design q different ML algorithms (Model 1, ..., Model q) by varying
  hyperparameters.
● Train each model on the training set and compute Error 1, ..., Error q
  (or average errors) on the validation set.
● Select the model with the minimum validation error as the final model,
  and report its error on the test set (see the sketch below).
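A minimal holdout-selection sketch in Python (an assumed illustration;
make_model, the hyperparameter list, and the sklearn-style fit/predict
interface are hypothetical):

import numpy as np

def holdout_select(X, y, hyperparams, make_model, seed=0):
    # shuffle and split: 60% train, 20% validation, 20% test
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr, n_va = int(0.6 * len(X)), int(0.2 * len(X))
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

    def mse(model, rows):
        return np.mean((model.predict(X[rows]) - y[rows]) ** 2)

    # train one model per hyperparameter setting, score on validation set
    models = [make_model(h).fit(X[tr], y[tr]) for h in hyperparams]
    errors = [mse(m, va) for m in models]
    best = int(np.argmin(errors))            # minimum validation error
    return models[best], mse(models[best], te)  # final test-set error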


Overfitting vs Underfitting
Bias and Variance

● Variance is the error from sensitivity to small fluctuations in the
  training set.
● Bias describes how far the model's fit, averaged over datasets, i.e.,
  its expected prediction, deviates from the value of the underlying
  target function.

[Figure: dartboard illustration of the four combinations of low/high bias
and low/high variance]
Expected Prediction Error and Bias-Variance Tradeoff

What is expected prediction error?
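The decomposition itself did not survive extraction; the standard answer
for squared loss, assuming y = f(x) + ε with E[ε] = 0 and Var(ε) = σ²
(expectations taken over training sets), is:

\[
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
\]

Simple models tend to have high bias and low variance; flexible models
have low bias but high variance, hence the tradeoff.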


Example
A person with high bias is someone who starts to answer before you can
even finish asking. A person with high variance is someone who can think
of all sorts of crazy answers. Combining these gives:
● High bias/low variance: someone who usually gives you the same answer,
  no matter what you ask, and is always wrong about it.
● High bias/high variance: someone who takes wild guesses, all of which
  are sort of wrong; he might be right sometimes due to chance.
● Low bias/high variance: a knowledgeable person who listens to you and
  tries to answer the best they can, but who daydreams a lot and may say
  something totally crazy.
● Low bias/low variance: a person who listens to you very carefully and
  gives you good answers pretty much all the time.
Basic Maths refresher
Previous Example of Supervised Learning

[Figure: two-class dataset with labels −1 and +1]
Linear Regression
● Linear models are used in SVMs, deep learning, etc.
● Defining a rule by hand is not always feasible.
● How do we learn their weights (or unknowns) from data?
● Formulate learning as an optimization problem w.r.t. the weights.

Parametric ML
Equation of a line: y = mx + c.

If c is considered the bias, then m is the weight.

For simplicity, we absorb the bias into the weight vector:
● w = [m, c]^T
● x = [x, 1]^T
so that y = w^T x.
[Figure: 1-D pictorial example of linear regression]


Linear Regression
● Squared loss is chosen for simplicity.
● The best w minimizes the training error w.r.t. w.

Closed form solution for w:
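The formulas did not survive extraction; the standard least-squares
objective and its closed-form minimizer (consistent with the (X^T X) term
mentioned below) are:

\[
L(\mathbf{w}) = \sum_{n=1}^{N} \big(y_n - \mathbf{w}^{\top}\mathbf{x}_n\big)^2
= \|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2,
\qquad
\mathbf{w}^{*} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}
\]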


Linear Regression

In w, w_d denotes the importance of the d-th input feature for
predicting y.

Problems with the closed form solution (see the sketch after this list):
● Sensitivity to outliers or noise.
● (X^T X) may not be invertible.
● Overfitting: the solution is based solely on minimizing the training
  error.
● Matrix inversion is expensive for large D.
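As a rough illustration (not from the slides), a closed-form fit in NumPy;
np.linalg.lstsq is used instead of an explicit inverse for numerical
stability when X^T X is ill-conditioned:

import numpy as np

def fit_linear(X, y):
    # append a constant-1 column so the bias c is absorbed into w
    Xb = np.hstack([X, np.ones((len(X), 1))])
    # solve min_w ||y - Xb w||^2; lstsq avoids explicitly inverting
    # X^T X (which may be singular)
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

# usage: noisy 1-D line y = 2x + 1
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 2 * X[:, 0] + 1 + 0.1 * rng.standard_normal(100)
print(fit_linear(X, y))   # approximately [2.0, 1.0]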
Ridge Regression or Regularized Linear Regression

Why l2 regularization?
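The slide's formulas are missing from the extraction; the standard ridge
objective and its closed-form solution (a hedged reconstruction) are:

\[
L_{\text{ridge}}(\mathbf{w})
= \sum_{n=1}^{N} \big(y_n - \mathbf{w}^{\top}\mathbf{x}_n\big)^2
+ \lambda \|\mathbf{w}\|^2,
\qquad
\mathbf{w}^{*} = (\mathbf{X}^{\top}\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}^{\top}\mathbf{y}
\]

The l2 penalty shrinks the weights toward zero, discouraging overfitting,
and adding λI makes X^T X + λI invertible, addressing two of the problems
listed above.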
Linear and Ridge Regression
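A minimal sketch comparing the two closed-form fits (illustrative, not
from the slides; λ = 0 recovers plain linear regression):

import numpy as np

def fit_ridge(X, y, lam=1.0):
    # closed form: w = (X^T X + lam * I)^(-1) X^T y
    D = X.shape[1]
    A = X.T @ X + lam * np.eye(D)
    return np.linalg.solve(A, X.T @ y)

# usage: ridge shrinks weights relative to the unregularized fit
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)
print(fit_ridge(X, y, lam=0.0))   # ordinary least squares
print(fit_ridge(X, y, lam=10.0))  # shrunk ridge weights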
