
MACHINE LEARNING (CS 403/603)

Bias-variance; Linear Regression and Ridge Regression

By: Dr. Puneet Gupta


Overfitting
Overfitting means fitting the training set "too well", so that performance on the test set degrades.
Underfitting refers to a model that can neither model the training data nor generalize to new data.
● As the model keeps learning, the error on both the training and testing data decreases.
● If learning goes on too long, overfitting starts due to noise and less relevant attributes, and the performance of the model on the test set decreases.
● For a good model, we stop at the point just before the test error starts increasing, i.e., the point where the model performs well on both the training data and the unseen testing data.
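One way to implement this stopping rule is sketched below. This is a hedged sketch, not the lecture's code: `train_step`, `eval_error`, `state`, and `load_state` are hypothetical placeholders for whatever iterative learner is being trained.

```python
# Hypothetical early-stopping sketch: keep training while validation error
# improves, and stop (restoring the best snapshot) once it starts rising,
# i.e., just before overfitting sets in. All model methods are placeholders.

def fit_with_early_stopping(model, train_data, val_data,
                            max_epochs=100, patience=5):
    best_err = float("inf")
    best_state = model.state()          # snapshot of current parameters
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        model.train_step(train_data)    # one pass of learning
        err = model.eval_error(val_data)

        if err < best_err:              # validation error still decreasing
            best_err = err
            best_state = model.state()
            epochs_without_improvement = 0
        else:                           # error started increasing
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                   # stop just past the minimum

    model.load_state(best_state)        # restore the best model seen
    return model
```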
K-Nearest Neighbor
(Figure: decision boundaries for 1-NN, i.e., K=1, and 5-NN, i.e., K=5.)
● In one-nearest-neighbor (1-NN), the label of x is given by the label of its nearest neighbor in the training data.
● Distance measures are used to find the nearest neighbor.
● Better results can be expected when more (K > 1) neighbors are utilized.
● Classification: use majority voting.
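A minimal k-NN classifier matching this description, using Euclidean distance and majority voting (the data here is made up for illustration, not from the lecture):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    # Euclidean distance from x to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # indices of the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # majority vote over their labels (ties broken by label order)
    votes = np.bincount(y_train[nearest])
    return np.argmax(votes)

# Example: with k=1 the label is simply that of the single nearest neighbor.
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [0.1, -0.2]])
y_train = np.array([0, 1, 1, 0])
print(knn_predict(X_train, y_train, np.array([0.8, 0.9]), k=1))  # -> 1
print(knn_predict(X_train, y_train, np.array([0.8, 0.9]), k=3))  # -> 1
```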
Mitigating Overfitting by Holdout Model Selection Techniques

(Figure: the data is split into a training set and a test/validation set. Each candidate algorithm, ML Algo. 1 through ML Algo. q, is trained on the training set to produce Model 1 through Model q. Each model's error, or average error, on the validation set gives Error_1 through Error_q, and the model with minimum error is selected as the final model.)

Different ML algorithms are designed by varying hyperparameters; a sketch of the procedure is given below.
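A hedged, runnable sketch of holdout selection, where the synthetic data and candidate models are ours, not the lecture's: each "ML algorithm" is polynomial regression with a different degree (the hyperparameter), and we keep the degree with the minimum validation error.

```python
import numpy as np

# Synthetic 1-D regression data
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + 0.1 * rng.normal(size=60)

# Holdout split: train on 40 points, validate on the remaining 20
x_tr, y_tr, x_val, y_val = x[:40], y[:40], x[40:], y[40:]

best_degree, best_err = None, float("inf")
for degree in [1, 3, 5, 9]:                      # candidate hyperparameters
    coeffs = np.polyfit(x_tr, y_tr, degree)      # train Model_i
    err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)  # Error_i
    if err < best_err:                           # select minimum-error model
        best_degree, best_err = degree, err

print(best_degree, best_err)
```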


Overfitting vs Underfitting: Bias and Variance

● Variance is an error from sensitivity to small fluctuations in the training set.
● Bias describes how much the model's prediction, averaged over datasets, deviates from the value of the underlying target function.

(Figure: the classic dartboard diagram with four panels arranged by Low/High Bias and Low/High Variance; the low-bias, high-variance corner corresponds to overfitting, and the high-bias, low-variance corner to underfitting.)
Expected Prediction Error and Bias-variance Tradeoff

What is expected prediction error?
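For squared loss, the standard answer is the bias-variance decomposition, written out below in our own notation (the slide poses the question without showing the derivation):

```latex
% Assume y = f(x) + \varepsilon with \mathbb{E}[\varepsilon] = 0 and
% \operatorname{Var}(\varepsilon) = \sigma^2, and let \hat{f} be a model
% trained on a random training set. The expected squared error at x is:
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```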


Example
A person with high bias is someone who starts to answer before you can even finish asking. A person with high variance is someone who can think of all sorts of crazy answers. Combining these gives:
● High bias/low variance: this is someone who usually gives you
the same answer, no matter what you ask, and is always wrong
about it.
● High bias/high variance: someone who takes wild guesses, all of
which are sort of wrong; he might be right sometimes due to
chance.
● Low bias/high variance: a knowledgeable person who listens to you and tries to answer the best they can, but who daydreams a lot and may say something totally crazy.
● Low bias/low variance: a person who listens to you very carefully
and gives you good answers pretty much all the time.
Basic Maths refresher
Previous example of Supervised Learning
(Figure: a binary classification example with class labels -1 and +1.)
Linear Regression
● Linear models are used in SVMs, deep learning, etc.
● Defining a rule by hand is not always feasible.
● How do we learn the weights (or unknowns) from data?
● Formulate learning as an optimization problem w.r.t. the weights.

Parametric ML
Equation of a line:
y = mx + c

c can be considered as the bias; then m is the weight.

For simplicity, we consider the following:

w = [m c]^T
x = [x 1]^T
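A quick numeric check of this trick (the values are made up for illustration): absorbing the bias c into the weight vector turns y = mx + c into a single dot product.

```python
import numpy as np

# Appending a constant 1 to the input absorbs the bias into w,
# so y = m*x + c becomes y = w^T x.
m, c = 2.0, -1.0
w = np.array([m, c])        # w = [m c]^T
x = np.array([3.0, 1.0])    # x = [x 1]^T, here with x = 3
print(w @ x)                # 5.0, the same as m*3 + c
```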
(Figure: 1-D pictorial example of linear regression.)
Linear Regression
● Squared loss is chosen for simplicity.
● The best w minimizes the training error w.r.t. w.
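In symbols (our notation, consistent with the augmented x above, not copied from the slide), the training objective is:

```latex
% Squared training loss over N examples (x_n, y_n), with the bias
% absorbed into w as on the previous slide:
L(\mathbf{w}) = \sum_{n=1}^{N} \big(y_n - \mathbf{w}^{\top}\mathbf{x}_n\big)^2,
\qquad
\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} L(\mathbf{w})
```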
Linear Regression

In w, w_d denotes the importance of the d-th input feature for predicting y.

Problems with the closed-form solution:
● Outliers or noise
● (X^T X) may not be invertible
● Overfitting: based solely on minimizing the training error
● Expensive inversion for large D
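For concreteness, a minimal NumPy sketch of the standard closed form, w = (X^T X)^{-1} X^T y, on synthetic data (the data and names are ours, not the lecture's); note that it inherits the problems listed above.

```python
import numpy as np

# np.linalg.solve avoids forming the inverse explicitly, but it still
# fails when X^T X is singular and costs O(D^3) for D features.
def fit_linear(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)
X = np.c_[rng.normal(size=(50, 2)), np.ones(50)]  # two features + bias column
true_w = np.array([2.0, -3.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=50)
print(fit_linear(X, y))  # close to [2.0, -3.0, 0.5]
```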


Ridge Regression, or Regularized Linear Regression
Why l2 regularization?
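A minimal NumPy sketch of the standard ridge closed form, w = (X^T X + λI)^{-1} X^T y, which addresses the invertibility and overfitting problems above. The regularization strength λ (written `lam` below) is a hyperparameter one could tune with the holdout procedure from earlier.

```python
import numpy as np

# Ridge regression closed form: w = (X^T X + lam * I)^{-1} X^T y.
# The lam * I term shrinks the weights (l2 regularization) and makes the
# matrix invertible even when X^T X alone is singular.
def fit_ridge(X, y, lam=1.0):
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

# With lam = 0 this reduces to ordinary least squares (when X^T X is
# invertible); larger lam trades extra bias for lower variance.
```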
Linear and Ridge Regression
Reference

A Course in Machine Learning by Hal Daumé III.
Link: http://ciml.info/dl/v0_99/ciml-v0_99-all.pdf
