Machine Learning (CSO851) - Lecture 02
Penalty and Optimization
Regression Models
India – Population Growth Rate
Data
Year   Population       Growth Rate
2022   1,406,631,776    0.95%
2021   1,393,409,038    0.97%
2020   1,380,004,385    0.99%
2019   1,366,417,754    1.02%
2018   1,352,642,280    1.04%
2017   1,338,676,785    1.07%
2016   1,324,517,249    1.10%
2015   1,310,152,403    1.12%
2014   1,295,600,772    1.15%
2013   1,280,842,125    1.19%
2012   1,265,780,247    1.24%
2011   1,250,287,943    1.30%
2010   1,234,281,170    1.36%
2009   1,217,726,215    1.42%
2008   1,200,669,765    1.48%
2007   1,183,209,472    1.52%
2006   1,165,486,291    1.56%
2005   1,147,609,927    1.59%
2004   1,129,623,456    1.63%
2003   1,111,523,144    1.67%
2002   1,093,317,189    1.70%
2001   1,075,000,085    1.74%
2000   1,056,575,549    1.78%
1999   1,038,058,156    1.82%
1998   1,019,483,581    1.86%
1997   1,000,900,030    1.89%
1996   982,365,243      1.91%
1995   963,922,588      1.94%
1994   945,601,831      1.96%
1993   927,403,860      1.99%
1992   909,307,016      2.02%
1991   891,273,209      2.06%
1990   873,277,798      2.10%
1989   855,334,678      2.13%
1988   837,468,930      2.17%
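As a quick sketch of where this lecture is headed, data like this can be fed directly into a regression model. The following minimal example (NumPy assumed; only a few rows of the table transcribed) fits a straight line to population versus year by least squares:

```python
import numpy as np

# A few (year, population) pairs transcribed from the table above.
years = np.array([1988, 1995, 2000, 2005, 2010, 2015, 2022], dtype=float)
pop = np.array([837_468_930, 963_922_588, 1_056_575_549,
                1_147_609_927, 1_234_281_170, 1_310_152_403,
                1_406_631_776], dtype=float)

# Fit population = theta0 + theta1 * year by least squares.
theta1, theta0 = np.polyfit(years, pop, deg=1)
print(f"slope ~ {theta1:,.0f} people/year")
print(f"extrapolated 2023 population ~ {theta0 + theta1 * 2023:,.0f}")
```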
Regression Models
Least-Squares Regression
• It is useful: in many applications, the underlying model assumptions are close enough to correct that the method has been widely used in the literature of many scientific domains.
• It provides insight into “what is going on” with this approach to regression.
• It enables classification problems to be recast as regression problems.
• A thorough understanding of least-squares regression lays the groundwork for the related topics that follow.
General Framework
The model assumption of linear regression is that the function to be learned is linear in the parameters. Given a feature vector $x_i \in \mathbb{R}^m$ and a parameter vector $\theta = [\theta_0, \theta_1, \dots, \theta_m]^\top$, $f_\theta(x_i)$ is the vector inner product

$$f(x_i) = f_\theta(x_i) = \begin{bmatrix} 1 & x_{i1} & x_{i2} & \cdots & x_{im} \end{bmatrix} \begin{bmatrix} \theta_0 \\ \vdots \\ \theta_m \end{bmatrix} = \theta_0 + x_{i1}\theta_1 + \cdots + x_{im}\theta_m,$$

and each observation satisfies

$$y_i = f_\theta(x_i) + e_i$$
for each $i$, where $e_1, \dots, e_n$ are realizations of independent, identically distributed random variables with mean zero and unknown variance $\sigma^2$.
In matrix notation with $n$ observations, the data satisfy

$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} f_\theta(x_1) + e_1 \\ \vdots \\ f_\theta(x_n) + e_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1m} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix} \begin{bmatrix} \theta_0 \\ \vdots \\ \theta_m \end{bmatrix} + \begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix},$$

i.e., $y = X\theta + e$. The least-squares estimate $\hat{\theta}$ is the $\theta$ that minimizes the empirical risk, $\frac{1}{n}\lVert y - X\theta \rVert_2^2$.
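A minimal NumPy sketch of this framework, using synthetic data and hypothetical parameter values: it builds the design matrix $X$ with a leading column of ones and solves the least-squares problem for $\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 3                                 # n observations, m features

theta_true = np.array([2.0, 0.5, -1.0, 3.0])  # hypothetical [theta_0, ..., theta_m]
X = np.hstack([np.ones((n, 1)),               # leading column of ones for theta_0
               rng.normal(size=(n, m))])      # the m feature columns
e = rng.normal(scale=0.1, size=n)             # i.i.d. zero-mean noise
y = X @ theta_true + e                        # y = X theta + e

# Least-squares estimate: theta_hat minimizes ||y - X theta||^2.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_hat)                              # close to theta_true
```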
Linear Regression
• Linear regression is a supervised machine learning algorithm.
• The predicted output is continuous and has a constant slope.
• It predicts values within a continuous range rather than classifying them into categories.
Cost Function for Linear Regression
In linear regression, the Mean Squared Error (MSE) cost function is generally used; it is the average of the squared errors between the predicted and observed values of the dependent variable.
• Using the MSE function, we update the values of the intercept and slope so that the MSE settles at its minimum.
• These parameters can be determined using the gradient descent method so that the value of the cost function is minimized.
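In symbols, with $n$ observations and model $f_\theta$, the MSE cost being minimized is

$$\mathrm{MSE}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f_\theta(x_i) \bigr)^2 .$$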
Gradient Descent for Linear Regression
• Gradient Descent is an optimization algorithm that minimizes the cost function (objective function) to reach the optimal solution, as in the sketch below.
The Total Sum of Squares (TSS) is defined as the sum of squared deviations of the data points from the mean of the response variable. Mathematically, $\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$.
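A minimal sketch of batch gradient descent on the MSE cost, with synthetic data and a hypothetical learning rate and iteration count:

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, n_iters=1000):
    """Minimize MSE(theta) = (1/n) * ||y - X @ theta||^2 by batch gradient descent."""
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(n_iters):
        residual = X @ theta - y           # predictions minus observations
        grad = (2.0 / n) * X.T @ residual  # gradient of the MSE cost
        theta -= lr * grad                 # step against the gradient
    return theta

# Example: recover the intercept and slope of a noisy line.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=200)
y = 1.5 + 2.0 * x + rng.normal(scale=0.05, size=200)
X = np.column_stack([np.ones_like(x), x])  # design matrix [1, x]
print(gradient_descent(X, y))              # approximately [1.5, 2.0]
```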
Assumptions of Linear Regression
Linearity of residuals: there needs to be a linear relationship between the dependent variable and the independent variable(s).
Polynomial Regression Models
Models such as
$$y = \theta_0 + \theta_1 x + \theta_2 x^2 + e \quad \text{and} \quad y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_{11} x_1^2 + \theta_{22} x_2^2 + \theta_{12} x_1 x_2 + e$$
are also linear models, since they are linear in the parameters; in fact, they are second-order polynomials in one and two variables, respectively.
Polynomial models can be used in those situations where the relationship between the study and explanatory variables is curvilinear.
Polynomial Models in One Variable
The $k$th-order polynomial model in one variable is given by
$$y = \theta_0 + \theta_1 x + \theta_2 x^2 + \cdots + \theta_k x^k + e.$$
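Because the model is linear in the parameters, a $k$th-order polynomial can be fitted with the same least-squares machinery as before; a minimal sketch for $k = 2$ on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, size=100)
y = 1.0 - 0.5 * x + 0.8 * x**2 + rng.normal(scale=0.1, size=100)

k = 2
X = np.vander(x, N=k + 1, increasing=True)   # design-matrix columns: 1, x, x^2
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_hat)                              # approximately [1.0, -0.5, 0.8]
```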
It is expected that all polynomial models should be hierarchical, i.e., a model that contains the term $x^k$ should also contain all lower-order terms, because only hierarchical models are invariant under linear transformation of the regressors.
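A one-line check of why hierarchy matters: under the linear shift $z = x - c$, a non-hierarchical model such as $y = \theta_0 + \theta_2 x^2$ does not keep its form, since

$$\theta_0 + \theta_2 (z + c)^2 = (\theta_0 + \theta_2 c^2) + 2\theta_2 c\, z + \theta_2 z^2,$$

which now contains a first-order term. A hierarchical model, by contrast, maps to another model of the same form.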
Orthogonal Polynomials
• While fitting a linear regression model to a given set of data, we begin with a simple linear regression model.
• Later we may decide to change it to a quadratic model, or to increase the order from quadratic to cubic, and so on.
• In each case, we have to begin the modeling from scratch, i.e., from the simple linear regression model. Fitting with orthogonal polynomials avoids this: the lower-order coefficients stay the same when higher-order terms are added (see the sketch after this list).
• The classical cases of orthogonal polynomials of special kinds are due to Legendre, Hermite, and Chebyshev.
• These are continuous orthogonal polynomials (where the orthogonality relation involves integration), whereas in our case we have discrete orthogonal polynomials (where the orthogonality relation involves summation).
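A minimal sketch of the discrete case, under the assumption that orthogonal regressors are obtained by QR-decomposing the Vandermonde matrix (one standard construction, not necessarily the lecture's): the coefficients fitted at a lower order are unchanged when the order is increased.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 50)
y = 0.3 + 1.2 * x - 0.7 * x**3 + rng.normal(scale=0.05, size=50)

def orth_poly_fit(x, y, k):
    """Fit a k-th order polynomial using discrete orthogonal regressors."""
    V = np.vander(x, N=k + 1, increasing=True)   # columns 1, x, ..., x^k
    Q, _ = np.linalg.qr(V)                       # Q's columns are orthonormal under summation
    return Q.T @ y                               # coefficients on the orthogonal basis

# Raising the order from 2 to 3 leaves the first three coefficients untouched,
# so there is no need to refit the model from scratch.
print(orth_poly_fit(x, y, 2))
print(orth_poly_fit(x, y, 3)[:3])   # same leading values as the order-2 fit
```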