Lecture 09 ML
TODAY'S TOPICS
• Learning Curves
• Regularized Linear Models
• Ridge Regression
• Lasso Regression
• Elastic Net
#1 Polynomial Regression
• Polynomial Regression is a special case of Linear Regression in which the relationship
between the independent variables and the dependent variable is modeled as an
nth-degree polynomial.
A good way to reduce overfitting is to regularize the model (i.e., to constrain it): the
fewer degrees of freedom it has, the harder it will be for it to overfit the data. For
example, a simple way to regularize a polynomial model is to reduce the number of
polynomial degrees.
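• For instance, here is a minimal sketch of that idea (assuming scikit-learn; the dataset and the degree values are made up for illustration): the degree-2 model has far fewer degrees of freedom than the degree-10 one, and is therefore much less prone to overfitting.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Toy 1-D dataset: quadratic signal plus noise (illustrative only)
rng = np.random.RandomState(42)
X = 6 * rng.rand(100, 1) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.randn(100)

# A high-degree polynomial has many degrees of freedom and can overfit;
# reducing the degree is a simple way to regularize the model.
for degree in (10, 2):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    print(degree, model.score(X, y))  # training-set R^2; the higher degree also fits the noise
```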
- Here the weights w1, w2, …, wn represent the importance of the features (x1, x2, …, xn).
A feature has high importance if it has a large weight associated with it.
- The weights are learned by minimizing a cost function; for linear regression the cost function is the
mean squared error (MSE). The weights are tweaked repeatedly, the MSE is recomputed each time, and the
set of weights with the minimum MSE is kept as the final model.
- To improve the model, or to reduce the effect of noise, we need to shrink the weights associated with
noisy features: the smaller the weight on a noisy feature, the less it contributes to the predicted output.
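• For reference, the MSE cost function minimized by plain linear regression over $m$ training instances can be written as:
$\mathrm{MSE}(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\theta^{\mathsf{T}}\mathbf{x}^{(i)} - y^{(i)}\right)^{2}$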
• Ridge Regression adds a regularization term to this cost function. This forces the learning algorithm
to not only fit the data but also keep the model weights as small as possible.
• The hyperparameter α controls how much you want to regularize the model. If α = 0 then
Ridge Regression is just Linear Regression. If α is very large, then all weights end up very
close to zero and the result is a flat line going through the data’s mean. To choose the best
hyperparameter value, we do hyperparameter tuning.
• Note: whenever we apply this technique, we first scale the data, as Ridge Regression is
sensitive to the scale of the input features. This is true for most regularized models.
#3.1 Ridge Regression
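• A minimal sketch of this workflow (assuming scikit-learn; the data and the alpha value are made up for illustration): scale the features first, then fit Ridge Regression, which adds an L2 penalty on the weights to the MSE.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# Illustrative data: three features on wildly different scales
rng = np.random.RandomState(0)
X = rng.rand(100, 3) * [1, 100, 10_000]
y = X @ np.array([2.0, 0.03, 0.0002]) + rng.randn(100)

# Scale first (Ridge is sensitive to feature scale), then fit.
# alpha = 0 would reduce to plain Linear Regression; a very large alpha
# shrinks all weights toward zero. Tune alpha via hyperparameter tuning in practice.
ridge_model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
ridge_model.fit(X, y)
print(ridge_model[-1].coef_)  # learned weights on the scaled features
```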
• Elastic Net regularized cost function: $J(\theta) = \mathrm{MSE}(\theta) + r\alpha \sum_{i=1}^{n} |\theta_i| + \frac{1-r}{2}\,\alpha \sum_{i=1}^{n} \theta_i^2$, where the mix ratio $r$ controls the balance between the Lasso (L1) and Ridge (L2) penalty terms.
• So when should you use Linear Regression, Ridge, Lasso, or Elastic Net?
It is almost always preferable to have at least a little bit of regularization, so
generally you should avoid plain Linear Regression. Ridge is a good default, but if
you suspect that only a few features are actually useful, you should prefer Lasso or
Elastic Net since they tend to reduce the useless features’ weights down to zero as we
have discussed.
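• A small sketch of that behavior (scikit-learn assumed; the data, alpha, and l1_ratio values are made up for illustration): only two of the ten features carry signal, and Lasso / Elastic Net drive most of the useless weights to exactly zero, while Ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.RandomState(0)
X = rng.randn(200, 10)
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(200)  # only features 0 and 1 are useful

for model in (Ridge(alpha=1.0),
              Lasso(alpha=0.1),
              ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))  # Lasso/ElasticNet: mostly zeros
```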
#4 Early Stopping
• A very different way to regularize iterative learning
algorithms such as Gradient Descent is to stop
training as soon as the validation error reaches a
minimum.
• As the epochs go by, the algorithm learns and its
prediction error (RMSE) on the training set naturally
goes down, and so does its prediction error on the
validation set.
• However, after a while the validation error stops decreasing and actually starts to go back
up. This indicates that the model has started to overfit the training data.
[Figure: high-degree Polynomial Regression model trained using Batch Gradient Descent]
• With early stopping you just stop training as soon as the validation error reaches the minimum.
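• A simplified sketch of early stopping (assuming scikit-learn; the dataset, polynomial degree, and learning rate are made up for illustration). Stochastic Gradient Descent is used here instead of Batch Gradient Descent, but the idea is the same: keep the model with the lowest validation RMSE seen so far.

```python
import numpy as np
from copy import deepcopy
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Noisy quadratic data, fitted with a deliberately high-degree polynomial model
rng = np.random.RandomState(42)
X = 6 * rng.rand(200, 1) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.randn(200)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=42)

preprocess = make_pipeline(PolynomialFeatures(degree=30, include_bias=False), StandardScaler())
X_train_p = preprocess.fit_transform(X_train)
X_val_p = preprocess.transform(X_val)

# warm_start=True: each call to fit() continues from the current weights,
# so one fit() call behaves like one extra epoch of training.
sgd_reg = SGDRegressor(max_iter=1, tol=None, warm_start=True,
                       learning_rate="constant", eta0=0.0005, random_state=42)

best_val_rmse, best_model = float("inf"), None
for epoch in range(500):
    sgd_reg.fit(X_train_p, y_train)
    val_rmse = mean_squared_error(y_val, sgd_reg.predict(X_val_p)) ** 0.5
    if val_rmse < best_val_rmse:   # remember the best model seen on the validation set
        best_val_rmse, best_model = val_rmse, deepcopy(sgd_reg)

print(best_val_rmse)
```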
#5 Logistic Regression
• Logistic Regression (also called Logit Regression) is commonly used to estimate the
probability that an instance belongs to a particular class (e.g., what is the probability
that this email is spam?).
• If the estimated probability is greater than 50%, then the model predicts that the
instance belongs to that class (called the positive class, labeled “1”), or else it predicts
that it does not (i.e., it belongs to the negative class, labeled “0”). This makes it a
binary classifier.
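• A minimal sketch of this behavior (assuming scikit-learn; the tiny "spam" dataset below is made up for illustration): the model estimates a probability, and predicts the positive class when that probability is at least 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up features per email, e.g. counts of suspicious words and exclamation marks
X = np.array([[0, 1], [1, 0], [2, 3], [5, 4], [6, 7], [7, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])  # 1 = spam (positive class), 0 = not spam

clf = LogisticRegression().fit(X, y)

new_email = np.array([[4.0, 4.0]])
p_spam = clf.predict_proba(new_email)[0, 1]   # estimated probability of class 1
print(p_spam, clf.predict(new_email))         # predicts 1 when p_spam >= 0.5
```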
#5 Logistic Regression
• Imagine you have a magical box that can tell you if a fruit is an apple or an orange. You
have a bunch of fruits, and you want to know if each fruit is an apple (1) or an orange (0).
• Logistic regression is like a magical way to use the features of the fruits (like color, shape,
size) to make predictions. It’s as if the magical box draws a line on the features to separate
apples from oranges.
• The magic box uses a special formula called “logistic function” to calculate the probability of
a fruit being an apple (the chances of it being 1). If the probability is more than 0.5, the
box says it’s an apple; if it’s less than 0.5, the box says it’s an orange.
• For example, if the probability of a fruit being an apple is 0.8, the box is quite confident it’s
an apple. But if the probability is 0.2, the box thinks it’s more likely to be an orange.
• Logistic regression helps us classify things into two groups (like apples and oranges) based
on their features. It’s like a magical tool that uses probabilities to make smart decisions and
sort things out!
#5.1 Estimating Probabilities
• The Logistic Regression model computes a weighted sum of the input features (plus a bias
term), but instead of outputting the result directly like the Linear Regression model does, it
outputs the logistic of this result.
• The logistic is a sigmoid function that outputs a number between 0 and 1.
• Logistic function: $\sigma(t) = \dfrac{1}{1 + e^{-t}}$
#5.1 Estimating Probabilities
• Once the Logistic Regression model has estimated the probability $\hat{p}$ that an instance x
belongs to the positive class, it can make its prediction ŷ easily:
• $\hat{y} = \begin{cases} 0 & \text{if } \hat{p} < 0.5 \\ 1 & \text{if } \hat{p} \ge 0.5 \end{cases}$
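• Putting the last two slides together, a tiny NumPy sketch (the weights and the instance are made up for illustration):

```python
import numpy as np

def sigmoid(t):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

theta = np.array([0.5, -1.2, 3.0])   # illustrative parameters (bias first)
x = np.array([1.0, 2.0, 0.5])        # instance with a leading 1 for the bias term

p_hat = sigmoid(theta @ x)           # estimated probability of the positive class
y_hat = int(p_hat >= 0.5)            # prediction: 1 if p_hat >= 0.5, else 0
print(p_hat, y_hat)
```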
#5.2 Training and Cost Function
• The objective of training is to set the parameter vector θ so that the model estimates high
probabilities for positive instances (y = 1) and low probabilities for negative instances (y = 0).
• Cost function of a single training instance: $c(\theta) = \begin{cases} -\log(\hat{p}) & \text{if } y = 1 \\ -\log(1 - \hat{p}) & \text{if } y = 0 \end{cases}$
• The bad news is that there is no known closed-form equation to compute the value of 𝜃 that
minimizes this cost function (there is no equivalent of the Normal Equation).
But the good news is that this cost function is convex, so Gradient Descent (or any other
optimization algorithm) is guaranteed to find the global minimum (if the learning rate is not
too large and you wait long enough).
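• A minimal Gradient Descent sketch on this cost function (NumPy only; the synthetic data, learning rate, and number of iterations are made up for illustration). Because the log loss is convex, these updates keep moving toward the global minimum.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Synthetic binary-classification data (bias column of 1s prepended)
rng = np.random.RandomState(0)
m = 200
X = np.c_[np.ones(m), rng.randn(m, 2)]
true_theta = np.array([-0.5, 2.0, -3.0])
y = (sigmoid(X @ true_theta) > rng.rand(m)).astype(float)

eta, theta = 0.1, np.zeros(3)          # learning rate and initial parameters
for _ in range(1000):
    p_hat = sigmoid(X @ theta)
    gradient = X.T @ (p_hat - y) / m   # gradient of the average log loss
    theta -= eta * gradient            # one Batch Gradient Descent step

print(theta)  # approaches the minimizer of the convex log loss
```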
#5.3 Decision Boundaries
#5.4 SoftMax Regression
• The idea is quite simple: when given an instance x, the SoftMax Regression model first
computes a score Sk(x) for each class k, then estimates the probability of each class by
applying the SoftMax function (also called the normalized exponential) to the scores.
• Once you have computed the score of every class for the instance x, you can estimate the
probability pk that the instance belongs to class k by running the scores through the
softmax function
• The Softmax Regression classifier predicts only one class at a time (i.e., it is multiclass, not
multioutput) so it should be used only with mutually exclusive classes such as different
types of plants. You cannot use it to recognize multiple people in one picture.
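• A short sketch of the SoftMax step itself (NumPy only; the class scores are made up for illustration): exponentiate the scores, normalize them into probabilities, and predict the single highest-probability class.

```python
import numpy as np

def softmax(scores):
    """Normalized exponential: turns raw class scores s_k(x) into probabilities."""
    exps = np.exp(scores - scores.max())   # subtract the max for numerical stability
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])  # illustrative scores for 3 mutually exclusive classes
probas = softmax(scores)

print(probas)             # class probabilities, summing to 1
print(probas.argmax())    # SoftMax Regression predicts the single most likely class
```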
Thanks…