w3 - Linear Model - Linear Regression

The document provides an overview of linear regression models. It discusses that regression analysis is used to predict the value of a response variable from attribute variables. The key aspects covered include:
- Regression models involve parameters, independent variables, dependent variables, and error terms. The objective is to estimate the function that best fits the data.
- The least squares method is commonly used to estimate the parameters by minimizing the sum of squared errors between observed and predicted values.
- Gradient descent is another approach to determine the optimal parameter weights by iteratively updating the weights in steps that reduce the error function. It avoids problems with singular matrices and large computations compared to the pseudoinverse method.


LINEAR MODEL - LINEAR REGRESSION
Dr. Srikanth Allamsetty
Formulation & Mathematical Foundation of the Regression Problem
What is Regression
• Regression – predict value of response variable from attribute variables.
• Variables – continuous numeric values
• Regression analysis – a set of statistical processes for estimating the relationships
between a dependent variable and one or more independent variables.
• The dependent variable is often called the 'predictand', 'outcome' or 'response' variable;
• independent variables are often called 'predictors', 'covariates', 'explanatory variables' or 'features'.
• Regression analysis is a way of mathematically sorting out which of the independent variables actually has an impact on the response.
• It is also used for modeling and forecasting the relationship between the variables.
• Statistical process – the science of collecting, organizing, analyzing and interpreting data, and of exploring patterns and trends, in order to answer questions and make decisions (a broad area).
Basics of Regression Models
• Regression models predict a value of the Y variable given known values
of the X variables.
• Prediction within the range of values in the dataset used for model-fitting
is known as interpolation.
• Prediction outside this range of the data is known as extrapolation.
• First, a model to estimate the outcome needs to be chosen.
• Then the parameters of that model need to be estimated using any
chosen method (e.g., least squares).
Formulation of Regression Models
• Regression models involve the following components:
• The unknown parameters, often denoted as β or ω or w.
• The independent variables, which are observed in data and are often
denoted as a vector Xi (where i denotes a row of data).
• The dependent variable, which is observed in data and often denoted
using the scalar Yi.
• The error terms, which are not directly observed in data and are often
denoted using the scalar ei.
Formulation of Regression Models
• Most regression models propose that Yi is a function of Xi and β, with ei representing an additive error term that may stand in for random statistical noise:

Yi = f(Xi , β) + ei

• Our objective is to estimate the function f(Xi , β) that most closely fits the data.
• To carry out regression analysis, the form of the function f must be specified.
• Sometimes the form of this function is based on knowledge about the
relationship between Yi and Xi .
• If no such knowledge is available, a flexible or convenient form for f is chosen.
Formulation of Regression Models
• You may start with a simple univariate linear regression:

Yi = β0 + β1 Xi + ei

• It indicates that you believe that a reasonable approximation for Yi is:

Yi ≈ β0 + β1 Xi

• Now the next objective is to estimate the parameters β0 and β1:
• maybe using the least squares method;
• or you may go with other alternatives such as least absolute deviations, least trimmed squares, the quantile regression estimator, the Theil–Sen estimator, M-estimation (maximum likelihood type) or S-estimation (scale).
Formulation of Regression Models
• Find the value of β that minimizes the sum of squared errors:

SSE(β) = Σi=1..N (Yi − f(Xi , β))²

• A given regression method will ultimately provide an estimate of β, usually denoted β̂.
• Using this estimate, you can then find the fitted value for prediction or to
assess the accuracy of the model in explaining the data.
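As a small illustration of this objective (a minimal sketch, assuming NumPy and made-up toy data that are not from the slides), the least-squares estimates for a univariate model can be computed in closed form and used to obtain fitted values and the SSE:

```python
import numpy as np

# Hypothetical toy data, only to illustrate the least-squares objective.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares estimates for Yi = b0 + b1*Xi + ei:
# b1 = cov(X, Y) / var(X), b0 = mean(Y) - b1 * mean(X)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

Y_hat = b0 + b1 * X                   # fitted values used for prediction
sse = np.sum((Y - Y_hat) ** 2)        # the sum of squared errors being minimized

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {sse:.4f}")
```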
Variants in Regression Models
• The most common models are simple linear and multiple linear (discussed in last class):
• Simple linear: Y = a + bX + ϵ
• Multiple linear: Y = a + bX1 + cX2 + dX3 + ϵ
• Nonlinear regression analysis is commonly used for more complicated data sets in which the dependent and independent variables show a nonlinear relationship.
• Examples: logistic regression; the Michaelis–Menten model for enzyme kinetics, Y = β1X / (β2 + X).
Multiple Linear Regression
x y
Size (feet2) Price ($1000)
Number of bedrooms Number of floors Age of home (years)
i 1 2 3 4
1 2104 5 1 45 460
2 1416 3 2 40 232
3 1534 3 2 30 315
4 852 2 1 36 178
… … … … …
N

Notation:
= number of features
= input (features) of training example.
= value of feature in training example.
N = number of training examples
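To make this notation concrete, here is a minimal sketch (NumPy assumed; the numbers are the four example rows from the table above) that builds the feature matrix and reads off N, n, x(i) and xj(i). Note that the table uses 1-based indices while Python arrays are 0-based:

```python
import numpy as np

# The four training examples from the table: size, bedrooms, floors, age -> price.
X_raw = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [ 852, 2, 1, 36],
], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

N, n = X_raw.shape              # N = number of training examples, n = number of features
print("N =", N, ", n =", n)     # N = 4, n = 4

# x^(2): the input (feature vector) of the 2nd training example.
print("x^(2) =", X_raw[1])      # [1416. 3. 2. 40.]

# x_3^(2): the value of feature 3 (number of floors) in the 2nd training example.
print("x_3^(2) =", X_raw[1, 2]) # 2.0
```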
The Regression Model & The Concepts of Least Squares
What is Least Square Method
• The least-squares method is a statistical method used to find a regression line, or best-fit line, for the given pattern.
• The method of least squares is used in regression.
• In regression analysis, this method is said to be a standard approach for the
approximation of sets of equations having more equations than the number of
unknowns (overdetermined systems).
• It is used to approximate the solution by minimizing the sum of the squares of the
residuals made in the results of each individual equation.
• Residual: the difference between an observed value and the fitted value provided by a model
• The problem of finding a linear regressor function will be formulated as a problem
of minimizing a criterion function.
• The widely-used criterion function for regression purposes is the sum-of-error-squares.
Least Square Method with Linear Regression
• In general, regression methods are used to predict the value of the response (dependent) variable from attribute (independent) variables.
• The linear regressor model fits a linear function (relationship) between the dependent (output) variable and the independent (input) variables:

ŷ = w0 + w1 x1 + w2 x2 + … + wn xn

• where {w0, w1, …, wn} are the parameters of the model.
• The method of linear regression is to choose the (n + 1) coefficients w0, w1, …, wn, to minimize the residual sum of squares (the squared differences between observed and fitted values) over all the N training instances.
Minimal Sum-of-Error-Squares
• The sum-of-error-squares criterion over the N training instances is

E(w) = ½ Σi=1..N (y(i) − ŷ(i))² = ½ Σi=1..N (y(i) − wT x(i))²

• For an optimum solution for w, the following equations need to be satisfied:

∂E(w)/∂wj = 0,  j = 0, 1, …, n
Minimal Sum-of-Error-Squares
• In this least-squares estimation task, the objective is to find the optimal w* that minimizes E(w).
• The solution to this classic problem in calculus is found by setting the gradient of E(w), with respect to w, to zero:

∇w E(w) = 0
Minimal Sum-of-Error-Squares
• Setting the gradient to zero gives the normal equations XXT w = Xy, where X is the (n + 1) x N matrix whose columns are the augmented input vectors x(i) and y is the vector of the N target values.
• The (n + 1) x N matrix X+ = (XXT)–1X is called the pseudoinverse matrix of the matrix XT. Thus, the optimal solution is

w* = X+y
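As a minimal sketch of this pseudoinverse solution (NumPy assumed, with small hypothetical data and the document's convention that X is the (n + 1) x N matrix whose columns are the augmented inputs):

```python
import numpy as np

# Hypothetical data: N = 5 examples, n = 2 features.
rng = np.random.default_rng(0)
N, n = 5, 2
inputs = rng.normal(size=(N, n))               # one example per row
X = np.vstack([np.ones(N), inputs.T])          # (n + 1) x N, columns are augmented inputs x^(i)
y = np.array([1.0, 2.0, 0.5, 1.5, 2.5])

# X+ = (X X^T)^(-1) X, so w* = X+ y.
X_plus = np.linalg.inv(X @ X.T) @ X            # (n + 1) x N pseudoinverse of X^T
w_star = X_plus @ y

# The same result via NumPy's built-in (and numerically safer) pseudoinverse of X^T.
w_star_pinv = np.linalg.pinv(X.T) @ y

print(w_star)
print(w_star_pinv)                             # agrees when X X^T is nonsingular
```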
Unique solution?
• It might happen that the input features (the rows of X, i.e., the columns of XT) are not linearly independent.
• Then XXT is singular and the least-squares coefficients w* are not uniquely defined.
• The singular case occurs most often when two or more inputs are perfectly correlated.
• A natural way to resolve the non-unique representation is by dropping the redundant inputs.
• Most regression software packages detect these redundancies and
automatically implement some strategy for removing them.
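A small sketch of the singular case (NumPy, hypothetical numbers): when one input is an exact multiple of another, XXT loses rank, the plain inverse is no longer available, and np.linalg.pinv falls back to a minimum-norm solution:

```python
import numpy as np

# Feature 2 is exactly twice feature 1, so the two inputs are perfectly correlated.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = 2.0 * x1
X = np.vstack([np.ones_like(x1), x1, x2])      # (n + 1) x N with a redundant input
y = np.array([3.0, 5.0, 7.0, 9.0])             # here y = 1 + 2*x1 exactly

print(np.linalg.matrix_rank(X @ X.T))          # 2, so the 3 x 3 matrix X X^T is singular

# np.linalg.inv(X @ X.T) would typically raise LinAlgError here; the pseudoinverse
# still returns one valid (minimum-norm) coefficient vector among the many that fit.
w = np.linalg.pinv(X.T) @ y
print(w)
print(X.T @ w)                                 # fitted values still reproduce y
```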
Error Reduction-Gradient Descent
Basics of Gradient Descent
• Gradient descent search helps determine a weight vector that minimizes E
by starting with an arbitrary initial weight vector and then altering it again
and again in small steps.
• Batch gradient descent: when the weight update is calculated based on all examples in the training dataset, it is called batch gradient descent (a minimal sketch follows this slide).
• Stochastic gradient descent: when the weight update is calculated incrementally after each training example, or a small group of training examples, it is called stochastic gradient descent.
• Gradient descent procedure has two advantages over merely computing
the pseudoinverse:
• (1) it avoids the problems that arise when XXT is singular (it always yields a
solution regardless of whether or not XXT is singular);
• (2) it avoids the need for working with large matrices.
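A minimal batch-gradient-descent sketch for the linear regressor (NumPy assumed; the data, learning rate, and epoch count are hypothetical choices made only for illustration):

```python
import numpy as np

def batch_gradient_descent(X, y, eta=0.05, epochs=2000):
    """X: (N, n + 1) design matrix with a leading column of ones; y: (N,) targets."""
    N, d = X.shape
    w = np.zeros(d)                       # arbitrary initial weight vector
    for _ in range(epochs):
        y_hat = X @ w                     # predictions for all N training examples
        grad = -(X.T @ (y - y_hat)) / N   # gradient of the mean squared-error cost
        w -= eta * grad                   # small step opposite to the gradient
    return w

# Hypothetical data generated from y = 1 + 2*x plus a little noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 5, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=50)
X = np.column_stack([np.ones_like(x), x])

print(batch_gradient_descent(X, y))       # approximately [1.0, 2.0]
```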
Basics of Gradient Descent
• The error surface may have multiple local minima but a single global minimum.
• The objective is to find the global minimum.
What is Gradient Descent
Gradient Descent Optimization Schemes
● The Gradient Descent Method is an optimization method used for minimization tasks. Changes of the weights are made according to the following algorithm:

w(k + 1) = w(k) − η ∇E(w(k))

where η denotes the learning rate, and k stands for the actual iteration step.
Note:
● Need to choose η.
● Needs many iterations.
● Works well even when n is large.
● Gradient descent serves as the basis for learning algorithms that search
the hypothesis space of possible weight vectors to find the weights that
best fit the training examples.
What is Gradient Descent
Gradient Descent Optimization Schemes
● The same weight-update rule, w(k + 1) = w(k) − η ∇E(w(k)), can be applied with different definitions of the iteration step k.
Approaches for deciding the iteration step k:
1. Batch methods use all the data in one shot.
● The iteration step k means the kth presentation of the training dataset.
● The gradient is calculated across the entire set of training patterns.
2. Online methods:
● The iteration step k is incremented after a single data pair is presented (a sketch follows below).
● They share almost all the good features of the recursive least-squares algorithm, with reduced computational complexity.
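A corresponding online (stochastic) sketch, where the weights are updated after every single data pair instead of once per pass over the dataset (same hypothetical setup and NumPy as in the batch sketch above):

```python
import numpy as np

def online_gradient_descent(X, y, eta=0.02, epochs=100, seed=0):
    """Online updates: one small step after each (x^(i), y^(i)) pair is presented."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(N):       # shuffle the presentation order each pass
            error = y[i] - X[i] @ w        # residual for this single pair
            w += eta * error * X[i]        # w_j <- w_j + eta * (y - y_hat) * x_j
    return w

rng = np.random.default_rng(2)
x = rng.uniform(0, 5, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=50)
X = np.column_stack([np.ones_like(x), x])

print(online_gradient_descent(X, y))       # approximately [1.0, 2.0]
```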
The gradient descent training rule

Linear classification using the regression technique:

Performance Criterion:

E(w) = ½ Σi=1..N (y(i) − ŷ(i))²   --- To be minimized. Called the cost function.

(The factor ½ is used for computational convenience.)
The gradient descent training rule
• The error surface is parabolic with a single global minimum.
• The specific parabola will depend on the particular set of training examples.
• The direction of steepest descent along the error surface can be found by computing the derivative of E with respect to each component of the vector w.
• This vector derivative is called the gradient of E w.r.t. w, written ∇E(w). Remember, it can be applied to any objective function, not just to squared distances.
• The negative of this vector gives the direction of steepest decrease.
• Therefore, the training rule for gradient descent is

w ← w − η ∇E(w)

where η is the learning rate, a positive constant that determines the step size in the search.
The gradient descent training rule
• This training rule can also be written in its component form:

wj ← wj − η ∂E/∂wj

• which shows that steepest descent is achieved by altering each component wj of w in proportion to ∂E/∂wj (a numerical check is sketched below).
• Starting with an arbitrary initial weight vector, w is changed in the direction producing the steepest descent along the error surface.
• The process goes on till the global minimum error is attained.
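A small numerical check of this component-wise rule (NumPy, hypothetical data): the analytic ∂E/∂wj used in the update should agree with a finite-difference estimate of the gradient of the cost E(w):

```python
import numpy as np

def cost(w, X, y):
    """E(w) = 1/2 * sum_i (y^(i) - w^T x^(i))^2, the cost function above."""
    return 0.5 * np.sum((y - X @ w) ** 2)

def analytic_grad(w, X, y):
    """dE/dw_j = -sum_i (y^(i) - y_hat^(i)) * x_j^(i)."""
    return -X.T @ (y - X @ w)

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(10), rng.normal(size=10)])   # hypothetical design matrix
y = rng.normal(size=10)
w = rng.normal(size=2)

# Central finite differences for each component dE/dw_j.
eps = 1e-6
numeric = np.array([
    (cost(w + eps * np.eye(2)[j], X, y) - cost(w - eps * np.eye(2)[j], X, y)) / (2 * eps)
    for j in range(2)
])
print(analytic_grad(w, X, y))
print(numeric)      # the two should agree to several decimal places
```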
The gradient with respect to weight wj
• For the cost function E(w) = ½ Σi=1..N (y(i) − ŷ(i))² with ŷ(i) = wT x(i), the gradient with respect to weight wj is

∂E/∂wj = − Σi=1..N (y(i) − ŷ(i)) xj(i)
The gradient with respect to weight wj
• Substituting this gradient into the training rule gives

wj ← wj + η Σi=1..N (y(i) − ŷ(i)) xj(i)

• An epoch is a complete run through all the N associated pairs.
The gradient with respect to weight wj
• Once an epoch is completed, the pair (x(1), y(1)) is presented again and a run is performed through all the pairs again.
• After several epochs, the output error is expected to be sufficiently small.
• Here k corresponds to the epoch number, i.e., the number of times the set of N pairs is presented and the cumulative error is compounded (a training-loop sketch follows below).
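Putting the epoch idea together, a short sketch (NumPy; the data, learning rate, and stopping threshold are hypothetical) that repeats full passes over the N pairs until the cumulative error of an epoch is sufficiently small:

```python
import numpy as np

def train_by_epochs(X, y, eta=0.02, tol=1e-3, max_epochs=10_000):
    """Repeat epochs (complete runs through all N pairs) until the epoch error is small."""
    N, d = X.shape
    w = np.zeros(d)
    for k in range(1, max_epochs + 1):             # k = epoch number
        epoch_error = 0.0
        for i in range(N):                         # present every pair (x^(i), y^(i)) once
            error = y[i] - X[i] @ w
            w += eta * error * X[i]
            epoch_error += 0.5 * error ** 2        # cumulative error over this epoch
        if epoch_error < tol:                      # stop once the error is sufficiently small
            return w, k
    return w, max_epochs

# Hypothetical noise-free data from y = 1 + 2*x, so the epoch error can become tiny.
rng = np.random.default_rng(4)
x = rng.uniform(0, 5, size=30)
y = 1.0 + 2.0 * x
X = np.column_stack([np.ones_like(x), x])

w, epochs_used = train_by_epochs(X, y)
print(w, "after", epochs_used, "epochs")           # w approximately [1.0, 2.0]
```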
