
Linear Regression

Module 4
Session 2

Department of Computer Science & Engineering

Example

This is a training set.

Can we learn to predict the prices of houses of other sizes in the city, as a function of their living area?
Example(cont.)

This is an example of a supervised learning problem.

When the target variable we are trying to predict is continuous, it is a regression problem.
How to learn the hypothesis

Learn a function h(x), so that h(x) is a good predictor for the corresponding value of y.

h: hypothesis function
How to represent the hypothesis h?

hθ(x) = θ0 + θ1 x

θi are the parameters:
- θ0 is the zero condition (intercept)
- θ1 is the gradient (slope)

θ: vector of all the parameters

We assume y is a linear function of x: univariate linear regression.
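As a quick illustration, here is a minimal Python sketch of this univariate hypothesis (the function and parameter names are ours, and the values in the usage line are made up):

```python
def h(x, theta0, theta1):
    """Univariate linear hypothesis: predict y from a single feature x."""
    return theta0 + theta1 * x

# Example: with theta0 = 50 and theta1 = 0.1, a 2104 sq.ft. house would be
# predicted at 50 + 0.1 * 2104 = 260.4 (in $1000's).
print(h(2104, 50.0, 0.1))  # 260.4
```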
How to learn the values of the parameters θi?
Regression Analysis
▪ A technique for using data to identify relationships among variables.
▪ Use these relationships to make predictions.
▪ Assume that the outcome we are predicting depends linearly on the information used to make the prediction.
▪ Linear dependence means a constant rate of increase of one variable with respect to another.



Example
▪ Suppose we have data on sales of houses in some area. For each house, we have complete information about its size, the number of bedrooms, bathrooms, total rooms, the size of the lot, the corresponding property tax, etc., and also the price at which the house was eventually sold.
▪ Can we use this data to predict the selling price of a house currently on the market?



Example(cont.)
▪ Selling price = β0 + β1 (sq.ft.) + β2 (no. bedrooms) + β3 (no. bath) + β4 (no. acres) + β5 (taxes) + error
▪ In this expression, β1 represents the increase in selling price for each additional square foot of area: it is the marginal cost of additional area.
▪ β2 and β3 are the marginal costs of additional bedrooms and bathrooms, and so on.
▪ The intercept β0 could in theory be thought of as the price of a house for which all the variables specified are zero; of course, no such house could exist, but including β0 gives us more flexibility in picking a model.



Example(cont.)
▪ The last term in the equation above, the “error,” reflects the fact that two houses with exactly the same characteristics need not sell for exactly the same price.
▪ There is always some variability left over, even after we specify the values of a large number of variables.
▪ This variability is captured by an error term, which we will treat as a random variable.
▪ Regression gives us a method for computing estimates of the parameters β0, β1, . . . , β5 from data about past sales.
▪ Once we have these estimates, we can plug in the values of the variables for a new house to get an estimate of its selling price.
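As an illustration of this last step, here is a minimal Python sketch (the function and argument names are ours, not from the slides) that plugs a new house's characteristics into the fitted equation:

```python
def predicted_price(beta, sqft, bedrooms, baths, acres, taxes):
    """Estimate the selling price of a new house from fitted coefficients.

    beta is the tuple (b0, b1, b2, b3, b4, b5) of estimated regression
    coefficients; the return value is in the same units as the prices
    used to fit the model.
    """
    b0, b1, b2, b3, b4, b5 = beta
    return b0 + b1 * sqft + b2 * bedrooms + b3 * baths + b4 * acres + b5 * taxes
```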

Linear Regression
▪ A group of techniques for fitting and studying the straight-line relationship between two variables.
▪ Linear regression estimates the regression coefficients β0 and β1 in the equation

  Y = β0 + β1 X + ε

▪ where X is the independent variable, Y is the dependent variable, β0 is the Y intercept, β1 is the slope, and ε is the error.
Linear Regression
▪ How well a set of data points fits a straight line can be measured by calculating the distance between the data points and the line.
▪ The total error between the data points and the line is obtained by squaring each distance and then summing the squared values.
▪ The regression equation is designed to produce the minimum sum of squared errors.


What is the intuition of the hypothesis function?

• We are attempting to fit a straight line to the data in the training set.
• The values of the parameters decide the equation of the straight line.
• Which is the best straight line to fit the data?
What is the intuition of the hypothesis function? (cont.)

• Which is the best straight line to fit the data?
• How to learn the values of the parameters θi?
• Choose the parameters such that the predictions hθ(x) are close to the actual y values for the training examples.
What is the cost function?

• A measure of how close the predictions are to the actual y-values
• Averaged over all the m training instances

• Squared error cost function:
  J(θ0, θ1) = (1/2m) Σ_{i=1..m} (hθ(x(i)) − y(i))²

• Choose the parameters θ so that J(θ) is minimized
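A minimal Python sketch of this squared error cost (NumPy-based; the array and function names are ours):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared error cost J(theta0, theta1) for univariate linear regression.

    x, y are 1-D NumPy arrays holding the m training inputs and targets.
    """
    m = len(y)
    predictions = theta0 + theta1 * x          # h_theta(x(i)) for every example
    return np.sum((predictions - y) ** 2) / (2 * m)
```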
Hypothesis: hθ(x) = θ0 + θ1 x

Parameters: θ0, θ1

Cost function: J(θ0, θ1) = (1/2m) Σ_{i=1..m} (hθ(x(i)) − y(i))²

Goal: minimize J(θ0, θ1) over θ0, θ1

[Figure: training data plotted as Price ($) in 1000's against Size in feet2 (x); for fixed θ0, θ1, hθ(x) is a function of x.]
Minimizing a function

• Gradient descent algorithm
• Used in many applications where a function must be minimized
Have some function J(θ0, θ1)

Want to minimize J(θ0, θ1) over θ0, θ1

Outline:
• Start with some initial θ0, θ1
• Keep changing θ0, θ1 to reduce J(θ0, θ1), until we end up at a minimum

[Figure: surface plot of J(θ0, θ1).]

If the function has multiple local minima, where one starts can decide which minimum is reached.

[Figure: surface plot of J(θ0, θ1).]
Gradient descent algorithm

Repeat until convergence:
  θj := θj − α · ∂J(θ0, θ1)/∂θj   (for j = 0 and j = 1)

α is the learning rate.

Correct: simultaneous update — compute the update terms for all θj from the current parameter values, then assign them together (see the sketch after this slide).
Incorrect: updating θ0 first and then using the new θ0 when computing the update for θ1.

For simplicity, let us first consider a function of a single variable.
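A small Python sketch of one correct, simultaneous update step for a function of two parameters (the function and variable names are ours; dJ_dtheta0 and dJ_dtheta1 stand for the partial derivatives of J):

```python
def gradient_descent_step(theta0, theta1, dJ_dtheta0, dJ_dtheta1, alpha):
    """One simultaneous gradient descent update for a function J(theta0, theta1).

    dJ_dtheta0 and dJ_dtheta1 are callables that return the partial derivatives
    of J at a given point; alpha is the learning rate.
    """
    # Correct: evaluate both partial derivatives at the *current* point ...
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    # ... and only then overwrite the parameters (simultaneous update).
    return temp0, temp1
```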
What is the learning rate?
• Gradient descent can converge to a local minimum, even with the learning rate α kept fixed.
• But the value of α needs to be chosen judiciously.
• If α is too small, gradient descent can be slow to converge.
• If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
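As a toy illustration of these two behaviours (our own example, minimizing J(θ) = θ², whose gradient is 2θ):

```python
def minimize_quadratic(alpha, theta=10.0, steps=20):
    """Gradient descent on J(theta) = theta**2; the gradient is 2 * theta."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(minimize_quadratic(alpha=0.1))   # small alpha: steadily approaches 0
print(minimize_quadratic(alpha=1.1))   # too-large alpha: overshoots and diverges
```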
Gradient descent for univariate linear regression

Apply the gradient descent algorithm to the linear regression model hθ(x) = θ0 + θ1 x with the squared error cost J(θ0, θ1):

Repeat until convergence (updating θ0 and θ1 simultaneously):
  θ0 := θ0 − α (1/m) Σ_{i=1..m} (hθ(x(i)) − y(i))
  θ1 := θ1 − α (1/m) Σ_{i=1..m} (hθ(x(i)) − y(i)) · x(i)
[Figures: a sequence of paired plots — left, hθ(x) for fixed θ0, θ1 (a function of x); right, J(θ0, θ1) as a function of the parameters — repeated over successive gradient descent iterations.]
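A minimal NumPy sketch of this procedure (our own implementation of the update rule above). The sizes and prices reuse the four rows of the table shown later in the deck; rescaling the sizes to 1000's of square feet is our choice so that a fixed learning rate converges comfortably:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=5000):
    """Fit h_theta(x) = theta0 + theta1 * x by batch gradient descent.

    x, y are 1-D NumPy arrays holding the m training inputs and targets.
    """
    theta0, theta1 = 0.0, 0.0
    m = len(y)
    for _ in range(iterations):
        error = (theta0 + theta1 * x) - y              # h_theta(x(i)) - y(i)
        temp0 = theta0 - alpha * error.sum() / m       # update term for theta0
        temp1 = theta1 - alpha * (error * x).sum() / m # update term for theta1
        theta0, theta1 = temp0, temp1                  # simultaneous update
    return theta0, theta1

x = np.array([2.104, 1.416, 1.534, 0.852])   # size in 1000's of feet2
y = np.array([460.0, 232.0, 315.0, 178.0])   # price in $1000's
print(gradient_descent(x, y))
```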
Least Squares Estimation
▪ Suppose we have a sample of n pairs of observations (x1, y1), (x2, y2), . . . , (xn, yn).
▪ These observations are assumed to satisfy the simple linear regression model, and so we can write
  yi = β0 + β1 xi + εi,  i = 1, . . . , n.



Least Squares Estimation
▪ The principle of least squares estimates the parameters β0 and β1 by minimizing the sum of squares of the differences between the observations and the line in the scatter diagram.
▪ When the vertical difference between the observations and the line in the scatter diagram is considered, and its sum of squares is minimized to obtain the estimates of β0 and β1, the method is known as direct regression.
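A minimal NumPy sketch of these direct regression estimates, using the familiar closed-form least squares solution (the function name is ours):

```python
import numpy as np

def direct_regression(x, y):
    """Closed-form least squares estimates of beta0 and beta1 (direct regression).

    Minimizes the sum of squared vertical distances between the observations
    (x, y) and the fitted line y = beta0 + beta1 * x.
    """
    x_bar, y_bar = x.mean(), y.mean()
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1
```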



Least Squares Estimation(cont.)
▪ The sum of squares of the differences between the observations and the line in the horizontal direction in the scatter diagram can be minimized to obtain the estimates of β0 and β1.
▪ This is known as the reverse (or inverse) regression method.



Least Squares Estimation(cont.)
▪ Instead of horizontal or vertical errors, the sum of squares of the perpendicular distances between the observations and the line in the scatter diagram can be minimized to obtain the estimates of β0 and β1.
▪ This method is known as orthogonal regression or the major axis regression method.



Least Squares Estimation(cont.)
▪ Instead of minimizing the distance, the area can also be minimized.
▪ The reduced major axis regression method minimizes the sum of the areas of rectangles defined between the observed data points and the nearest point on the line in the scatter diagram to obtain the estimates of the regression coefficients.
When we have Multiple Variables

Size (feet2) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
2104         | 5                  | 1                | 45                  | 460
1416         | 3                  | 2                | 40                  | 232
1534         | 3                  | 2                | 30                  | 315
852          | 2                  | 1                | 36                  | 178
…            | …                  | …                | …                   | …
Notation:
n = number of features
x(i) = input (features) of the i-th training example
xj(i) = value of feature j in the i-th training example
Hypothesis:

Previously (one feature):
  hθ(x) = θ0 + θ1 x

For multi-variate linear regression:
  hθ(x) = θ0 + θ1 x1 + θ2 x2 + … + θn xn

For convenience of notation, define x0 = 1, so that
  hθ(x) = θ0 x0 + θ1 x1 + … + θn xn = θT x
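A minimal NumPy sketch of this vectorized hypothesis (the names and the parameter values in the usage lines are ours, purely for illustration):

```python
import numpy as np

def h(theta, x):
    """Multivariate hypothesis h_theta(x) = theta^T x, with x0 = 1 prepended."""
    x = np.concatenate(([1.0], x))   # add the convenience feature x0 = 1
    return theta @ x

# Features [size, bedrooms, floors, age] for the first house in the table above,
# with illustrative (not learned) parameter values.
theta = np.array([80.0, 0.15, 10.0, -5.0, -0.5])
print(h(theta, np.array([2104.0, 5.0, 1.0, 45.0])))
```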
Multiple Linear Regression
• Extension of the simple linear regression model to two or more independent variables

  y = β0 + β1 x1 + β2 x2 + … + βn xn + ε

  Expression = Baseline + Age + Tissue + Sex + Error

• Partial regression coefficients: βi = the effect on the dependent variable of increasing the i-th independent variable by 1 unit, holding all other predictors constant
Hypothesis: hθ(x) = θT x = θ0 x0 + θ1 x1 + … + θn xn

Parameters: θ0, θ1, …, θn

Cost function:
  J(θ0, θ1, …, θn) = (1/2m) Σ_{i=1..m} (hθ(x(i)) − y(i))²

Gradient descent:
Repeat {
  θj := θj − α · ∂J(θ)/∂θj
} (simultaneously update θj for every j = 0, …, n)
Gradient Descent: new algorithm for n features

Previously (n = 1):
Repeat {
  θ0 := θ0 − α (1/m) Σ_{i=1..m} (hθ(x(i)) − y(i))
  θ1 := θ1 − α (1/m) Σ_{i=1..m} (hθ(x(i)) − y(i)) · x(i)
} (simultaneously update θ0 and θ1)

New algorithm (n ≥ 1):
Repeat {
  θj := θj − α (1/m) Σ_{i=1..m} (hθ(x(i)) − y(i)) · xj(i)
} (simultaneously update θj for j = 0, …, n)
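A minimal NumPy sketch of this multivariate update in vectorized form (our own implementation of the rule above; variable names are ours):

```python
import numpy as np

def gradient_descent_multi(X, y, alpha=0.01, iterations=1000):
    """Batch gradient descent for multivariate linear regression.

    X is an (m, n) feature matrix, y an (m,) target vector. A column of ones
    is prepended so that theta[0] plays the role of the intercept (x0 = 1).
    """
    m = len(y)
    X1 = np.hstack([np.ones((m, 1)), X])
    theta = np.zeros(X1.shape[1])
    for _ in range(iterations):
        error = X1 @ theta - y                       # h_theta(x(i)) - y(i)
        theta = theta - alpha * (X1.T @ error) / m   # simultaneous update of all theta_j
    return theta
```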
Thank You
