Machine Learning (CSO851) - Lecture 02

The document discusses regression models, focusing on linear regression, its assumptions, and optimization techniques such as gradient descent. It includes data on India's population growth rate and explains concepts like the cost function, R-squared, and polynomial regression for non-linear relationships. The document emphasizes the importance of model fitting strategies and considerations when using polynomial models.


Regression Models, Penalty and Optimization
Lecture - 02
Regression Models
India – Population Growth Rate
Data
Year   Population      Growth Rate      Year   Population      Growth Rate
2022   1,406,631,776   0.95%            2005   1,147,609,927   1.59%
2021   1,393,409,038   0.97%            2004   1,129,623,456   1.63%
2020   1,380,004,385   0.99%            2003   1,111,523,144   1.67%
2019   1,366,417,754   1.02%            2002   1,093,317,189   1.70%
2018   1,352,642,280   1.04%            2001   1,075,000,085   1.74%
2017   1,338,676,785   1.07%            2000   1,056,575,549   1.78%
2016   1,324,517,249   1.10%            1999   1,038,058,156   1.82%
2015   1,310,152,403   1.12%            1998   1,019,483,581   1.86%
2014   1,295,600,772   1.15%            1997   1,000,900,030   1.89%
2013   1,280,842,125   1.19%            1996   982,365,243     1.91%
2012   1,265,780,247   1.24%            1995   963,922,588     1.94%
2011   1,250,287,943   1.30%            1994   945,601,831     1.96%
2010   1,234,281,170   1.36%            1993   927,403,860     1.99%
2009   1,217,726,215   1.42%            1992   909,307,016     2.02%
2008   1,200,669,765   1.48%            1991   891,273,209     2.06%
2007   1,183,209,472   1.52%            1990   873,277,798     2.10%
2006   1,165,486,291   1.56%            1989   855,334,678     2.13%
                                        1988   837,468,930     2.17%
Regression Models
Least Squares Regression
• It is useful: the underlying model assumptions are, in many
applications, close enough to correct that this method has been
widely used in the literature of many scientific domains.
• It can provide insight into “what is going on” with this approach to
regression.
• It enables the transformation of classification problems into
regression problems.
• A thorough understanding of least-squares regression provides a solid foundation for the related regression topics that follow.
General Framework
The model assumption of linear regression is that the function to be learned is $f:\mathbb{R}^m \to \mathbb{R}$. Given a vector $x_i \in \mathbb{R}^m$, $f(x_i)$ is the vector inner product

$$f(x_i) = f_\theta(x_i) = \begin{bmatrix} 1 & x_{i1} & x_{i2} & \cdots & x_{im} \end{bmatrix} \begin{bmatrix} \theta_0 \\ \vdots \\ \theta_m \end{bmatrix} = \theta_0 + x_{i1}\theta_1 + \cdots + x_{im}\theta_m$$

for some fixed real (m+1)-vector of parameters

$$\theta = (\theta_0, \theta_1, \ldots, \theta_m)^T.$$

Observed data $(x_1, y_1), \ldots, (x_n, y_n)$ are taken to be

$$y_i = f_\theta(x_i) + e_i$$
General Framework
for each $i$, where $e_1, \ldots, e_n$ are realizations of independent, identically distributed random variables with mean zero and unknown variance $\sigma^2$.
In matrix notation with n observations, the data satisfy

$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} f_\theta(x_1) + e_1 \\ \vdots \\ f_\theta(x_n) + e_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1m} \\ \vdots & & & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix} \begin{bmatrix} \theta_0 \\ \vdots \\ \theta_m \end{bmatrix} + \begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix}$$

$$y = X\theta + e$$

where the n×(m+1) matrix X, the n-long column vectors y and e, and the (m+1)-long parameter vector θ are the corresponding vectors and matrices.
Estimating the Model
Parameters
Given an (m+1)-long (column) parameter vector $\theta$, the linear model predicts that at a point $x_i$ in feature space, the response will be $f_\theta(x_i) = \theta_0 + x_{i1}\theta_1 + \cdots + x_{im}\theta_m$.

The difference between an actual response $y_i$ from an observed datum and the predicted response is the residual $y_i - f_\theta(x_i)$.

Using training risk as an estimate of risk, searching for a minimum-risk approximation to $f$ is equivalent to searching for a parameter vector $\theta$ which minimizes the training risk,

$$\hat{R}(\theta) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - f_\theta(x_i)\right)^2.$$
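A minimal sketch of this estimation with NumPy on synthetic data (an assumption, not the lecture's dataset); the least-squares solution that minimizes the training risk is obtained with np.linalg.lstsq:

```python
import numpy as np

# Synthetic data for illustration (assumed, not taken from the lecture)
rng = np.random.default_rng(0)
n, m = 100, 3
features = rng.normal(size=(n, m))                 # n observations, m features
theta_true = np.array([2.0, -1.0, 0.5, 3.0])       # theta_0, theta_1, ..., theta_m

# Design matrix X with a leading column of ones for the intercept theta_0
X = np.column_stack([np.ones(n), features])
y = X @ theta_true + rng.normal(scale=0.1, size=n) # y = X theta + e

# Least-squares estimate: the theta minimizing the training risk (mean squared residual)
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_hat)                                   # close to theta_true
```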
Linear Regression
• Linear Regression is a
supervised machine
learning algorithm.
• Predicted output is
continuous and has a
constant slope.
• Predict values within a
continuous range rather
than trying to classify them
into categories.
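As a usage sketch (assuming scikit-learn is available; the data below are made up), such a model can be fit and then used for continuous prediction:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: one continuous feature with a roughly linear response
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 4.0 + 1.5 * X.ravel() + rng.normal(scale=0.5, size=50)

model = LinearRegression().fit(X, y)   # supervised learning on (features, targets)
print(model.intercept_, model.coef_)   # fitted intercept and slope
print(model.predict([[7.5]]))          # continuous prediction for a new input
```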
Cost Function for Linear
Regression

In linear regression, the Mean Squared Error (MSE) cost function is generally used; it is the average of the squared errors between the predicted and observed values of the dependent variable,

$$J(B_0, B_1) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - (B_0 + B_1 x_i)\right)^2.$$

• Using the MSE function, we update the values of the intercept ($B_0$) and slope ($B_1$) so that the MSE settles at its minimum.

• These parameters can be determined using the gradient descent method so that the value of the cost function is minimized.
Gradient Descent for Linear
Regression
• Gradient Descent is an optimization algorithm that minimizes the cost function (objective function) to reach an optimal solution.

• To find the optimum solution we need to reduce the cost function (MSE) over all data points.

• This is done by updating the values of B0 and B1 iteratively until we reach an optimal solution.

• A regression model is fit with gradient descent by starting from (often randomly chosen) coefficient values and then iteratively updating them in the direction that reduces the cost function, until the minimum is reached.
Gradient Descent for Linear
Regression
To update B0 and B1, we take gradients of the cost function, i.e., its partial derivatives with respect to B0 and B1:

$$\frac{\partial J}{\partial B_0} = -\frac{2}{n}\sum_{i=1}^{n}\left(y_i - (B_0 + B_1 x_i)\right), \qquad \frac{\partial J}{\partial B_1} = -\frac{2}{n}\sum_{i=1}^{n} x_i\left(y_i - (B_0 + B_1 x_i)\right)$$
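A minimal sketch of these updates, assuming NumPy and illustrative choices of learning rate, iteration count, and data that are not specified in the slides (the helper name gradient_descent is ours):

```python
import numpy as np

def gradient_descent(x, y, lr=0.01, n_iters=1000):
    """Fit y ≈ B0 + B1*x by gradient descent on the MSE cost."""
    b0, b1 = 0.0, 0.0                       # arbitrary starting values
    n = len(x)
    for _ in range(n_iters):
        y_pred = b0 + b1 * x
        # Partial derivatives of the MSE cost with respect to B0 and B1
        grad_b0 = -2.0 / n * np.sum(y - y_pred)
        grad_b1 = -2.0 / n * np.sum(x * (y - y_pred))
        b0 -= lr * grad_b0                  # step against the gradient
        b1 -= lr * grad_b1
    return b0, b1

# Illustrative data (assumed): y ≈ 3 + 2x plus noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 5, size=200)
y = 3.0 + 2.0 * x + rng.normal(scale=0.3, size=200)
print(gradient_descent(x, y))               # roughly (3, 2)
```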
Coefficient of Determination or
R-Squared (R2)
• R-squared is a number that quantifies the proportion of the variation in the response that is captured by the fitted model.
• It always ranges between 0 and 1.
• Overall, the higher the value of R-squared, the better the model fits the data.
• Mathematically it can be represented as R² = 1 − (RSS/TSS).

The Residual Sum of Squares (RSS) is defined as the sum of the squared residuals over all data points; it measures the difference between the observed outputs and the model's predictions:

$$RSS = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

The Total Sum of Squares (TSS) is defined as the sum of the squared deviations of the data points from the mean of the response variable:

$$TSS = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$$
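A small sketch of this computation with NumPy, using hypothetical observed and predicted values (the helper name r_squared is ours):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - RSS/TSS."""
    rss = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
    tss = np.sum((y_true - np.mean(y_true)) ** 2)   # total sum of squares
    return 1.0 - rss / tss

# Hypothetical example
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
print(r_squared(y_true, y_pred))   # close to 1 for a good fit
```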
Assumptions of Linear
Regression
Linearity of residuals: There needs to be a
linear relationship between the
dependent variable and independent
variable(s).

Independence of residuals: The error terms should not be dependent on one another; there should be no correlation between the residual terms. The presence of such correlation between residuals is known as autocorrelation.
Assumptions of Linear
Regression
Normal distribution of residuals: The residuals should follow a normal distribution with a mean equal to zero or close to zero. Checking this helps to verify that the selected line is actually the line of best fit.
The equal variance of residuals: The error
terms must have constant variance. This
phenomenon is known as
Homoscedasticity. The presence of non-
constant variance in the error terms is
referred to as Heteroscedasticity.
Generally, non-constant variance arises in
the presence of outliers or extreme
leverage values.
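These assumptions can be checked numerically; the sketch below (assuming NumPy, with made-up data, and a helper name of our own) illustrates rough checks for zero-mean residuals, autocorrelation, and constant variance, not a procedure prescribed by the slides:

```python
import numpy as np

def residual_checks(y_true, y_pred):
    """Rough numerical checks of the residual assumptions (illustrative, not exhaustive)."""
    resid = y_true - y_pred
    mean_resid = resid.mean()                         # should be close to zero
    # Lag-1 correlation of the residuals: values near zero suggest no autocorrelation
    lag1_corr = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    # Compare residual spread over the lower and upper halves of the fitted values:
    # similar variances suggest homoscedasticity
    order = np.argsort(y_pred)
    half = len(order) // 2
    var_low = resid[order[:half]].var()
    var_high = resid[order[half:]].var()
    return mean_resid, lag1_corr, var_low, var_high

# Hypothetical fitted values and observations
y_pred = np.linspace(0, 10, 100)
y_true = y_pred + np.random.default_rng(6).normal(scale=0.4, size=100)
print(residual_checks(y_true, y_pred))
```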
Multiple Linear Regression
Multiple Linear Regression: Considerations
Multiple Linear Regression: New Relationships
Multiple Linear Regression: Many Relationships
Multiple Linear Regression Model
Estimated Multiple Regression Equation
Interpreting Coefficients
Multiple Regression: Data Preparation
Data and Variable Naming
Sketching Out Relationships
Checking Relevancy: Scatterplot
DV vs IV Scatterplot
Scatterplot Summary
IV vs IV Scatterplot
IV Scatterplot: Multicollinearity
IV Scatterplot Summary
Correlations
Correlation Summary
Polynomial Regression Model
Summary for Quadratic Function
Polynomial Regression
• A simple linear regression algorithm works only when the relationship between the variables is linear.
• If the data are non-linear, linear regression will not be able to draw a good best-fit line.
Polynomial Regression
A model is said to be linear when it is linear in its parameters. So the models

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$$

and

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \varepsilon$$

are also linear models; in fact, they are the second-order polynomials in one and two variables, respectively.

Polynomial models can be used in those situations where the relationship between the study and explanatory variables is curvilinear.
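A small sketch of fitting such a curvilinear model, assuming NumPy and made-up data; the polynomial terms are built explicitly, so the fit remains ordinary linear least squares in the parameters:

```python
import numpy as np

# Illustrative curvilinear data (assumed): y = 1 + 2x - 0.5x^2 plus noise
rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=150)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.2, size=150)

# Second-order polynomial model in one variable: columns [1, x, x^2]
X = np.column_stack([np.ones_like(x), x, x**2])

# Linear least squares on the expanded features -> beta_0, beta_1, beta_2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # roughly [1, 2, -0.5]
```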
Polynomial Models in One Variable
The k-th order polynomial model in one variable is given by

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \varepsilon$$

Polynomial models can be used to approximate a complex nonlinear relationship; in such a case, the polynomial model is essentially a (truncated) Taylor series expansion of the unknown nonlinear function.
Considerations in Fitting Polynomial in One Variable
Order of the model:
• The order of the polynomial model should be kept as low as possible.
• Some transformations can be used to keep the model first order.
• If this is not satisfactory, then a second-order polynomial is used.
• Arbitrary fitting of higher-order polynomials can be a serious abuse of regression analysis.
• A model consistent with the knowledge of the data and its environment should be chosen.
• It is always possible for a polynomial of order (n − 1) to pass through n points, so a polynomial of sufficiently high degree can always be found that provides a "good" fit to the data, as the sketch after this list illustrates.
• Such models neither enhance the understanding of the unknown function nor serve as good predictors.
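A tiny demonstration of that exact-fit behaviour, assuming NumPy and made-up points; a polynomial of degree n − 1 passes through all n noisy points yet says little about the underlying trend:

```python
import numpy as np

# n noisy points from a simple underlying trend (illustrative only)
rng = np.random.default_rng(3)
n = 8
x = np.linspace(0, 1, n)
y = 2.0 * x + rng.normal(scale=0.3, size=n)

# A degree n-1 polynomial interpolates the n points exactly ...
coeffs = np.polyfit(x, y, deg=n - 1)
print(np.max(np.abs(np.polyval(coeffs, x) - y)))   # essentially zero residuals

# ... but its behaviour between and beyond the points need not reflect
# the underlying trend (here, roughly 2 * 1.05)
print(np.polyval(coeffs, 1.05))
```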
Considerations in Fitting Polynomial in One Variable
Model building strategy:
• A good strategy should be used to choose the order of the approximating polynomial.
• One approach is to successively fit models of increasing order and test the significance of the regression coefficients at each step, increasing the order until the t-test for the highest-order term is non-significant. This is called a forward selection procedure (a sketch of this procedure follows after this list).
• Another approach is to fit the highest-order model considered appropriate and then delete terms one at a time, starting with the highest order. This continues until the highest-order remaining term has a significant t statistic. This is called a backward elimination procedure.
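A minimal sketch of the forward selection idea, assuming statsmodels is available, an arbitrary 0.05 significance level, and made-up data:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative data with a genuinely quadratic trend (assumed)
rng = np.random.default_rng(4)
x = rng.uniform(-2, 2, size=200)
y = 1.0 + 0.5 * x + 1.5 * x**2 + rng.normal(scale=0.3, size=200)

alpha, max_order = 0.05, 6
selected_order = 0
for k in range(1, max_order + 1):
    # Design matrix with columns 1, x, x^2, ..., x^k
    X = sm.add_constant(np.column_stack([x**p for p in range(1, k + 1)]))
    fit = sm.OLS(y, X).fit()
    p_highest = fit.pvalues[-1]        # p-value of the highest-order term
    if p_highest >= alpha:             # highest-order term non-significant: stop
        break
    selected_order = k

print(selected_order)                  # expected to settle at 2 for this data
```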
Considerations in Fitting Polynomial in One Variable
Extrapolation:
• One has to be very cautious when extrapolating with polynomial models: the curvature in the region of the data and the curvature in the region of extrapolation can be different.
• For example, the trend of the data may be increasing over the region of the original observations but decreasing in the region of extrapolation.
• In that case the predicted response would not be based on the true behaviour of the data.
Considerations in Fitting Polynomial in One Variable
Ill-Conditioning:
• As the order of the polynomial increases, the columns of the X matrix (successive powers of x) become nearly linearly dependent, so X'X becomes ill-conditioned and the least-squares estimates are computed with poor numerical accuracy. Centering or rescaling x, or using orthogonal polynomials, mitigates this.
Considerations in Fitting Polynomial in One Variable
Hierarchy:
• A polynomial model is said to be hierarchical if it contains all terms of order lower than its highest-order term.
• It is expected that all polynomial models should have this property, because only hierarchical models are invariant under a linear transformation of the regressor.
Orthogonal Polynomials
• While fitting a linear regression model to a given set of data, we begin with a simple linear regression model.
• Later we may decide to change it to a quadratic model, or wish to increase the order from quadratic to cubic, and so on.
• In each case, we have to begin the modeling from scratch, i.e., from the simple linear regression model.
• Orthogonal polynomials avoid this: when the basis terms are orthogonal over the observed data, adding a higher-order term does not change the estimates of the lower-order coefficients, as the sketch below illustrates.
• The classical cases of orthogonal polynomials of special kinds are due to Legendre, Hermite and Chebyshev.
• These are continuous orthogonal polynomials (where the orthogonality relation involves integration), whereas in our case we have discrete orthogonal polynomials (where the orthogonality relation involves summation).
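One way to see this numerically is the sketch below (assuming NumPy; the helper name is ours): a discrete orthogonal basis is built from the Vandermonde matrix via QR, so orthogonality holds with respect to summation over the observed x values, and the lower-order coefficients are unchanged when the order increases:

```python
import numpy as np

# Illustrative data (assumed)
rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, size=100)
y = 0.3 + 1.2 * x - 0.8 * x**2 + rng.normal(scale=0.1, size=100)

def orth_poly_coeffs(x, y, order):
    """Regression coefficients in a discrete orthogonal polynomial basis of the given order."""
    V = np.vander(x, N=order + 1, increasing=True)   # columns 1, x, ..., x^order
    Q, _ = np.linalg.qr(V)                           # columns of Q are orthonormal under summation
    return Q.T @ y                                   # least-squares coefficients in the Q basis

c2 = orth_poly_coeffs(x, y, order=2)
c3 = orth_poly_coeffs(x, y, order=3)
print(c2)
print(c3[:3])   # the first three coefficients are unchanged when the order increases
```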
