3CP10 Final MJJ Linear Regression

The document provides an overview of linear regression, explaining its purpose in estimating relationships between independent and dependent variables. It covers various types of regression models, the estimation process, and the least squares method for fitting a regression line. Additionally, it discusses the uses of regression, including prediction, forecasting, and the impact of outliers.


Linear Regression

Prepared by
M J Joshi
Regression
 In regression the output is continuous.
 Many models could be used – the simplest is linear regression.
 Fit the data with the best hyperplane which "goes through" the points.

[Figure: scatter plot with x, the independent variable (input), on the horizontal axis and y, the dependent variable (output), on the vertical axis]
Regression (contd.)
 A regression model estimates the nature of the relationship between the independent and dependent variables:
 The change in the dependent variable that results from changes in the independent variables, i.e. the size of the relationship.
 The strength of the relationship.
 The statistical significance of the relationship.
Examples
 Dependent variable is employment income – independent variables might be hours of work, education, occupation, gender, age, region, years of experience, etc.
 Price of a product and quantity produced or sold:
 Quantity sold affected by price: the dependent variable is quantity of product sold – the independent variable is price.
 Price affected by quantity offered for sale: the dependent variable is price – the independent variable is quantity sold.
A Simple Example: Fitting a Polynomial
 The green curve is the true function (which is not a polynomial).
 We may use a loss function that measures the squared error in the prediction of y(x) from x.

[Figure: noisy samples of the true function with polynomial fits, from Bishop's book on machine learning]

Some fits to the data: which is best?

[Figure: polynomial fits of different degrees, from Bishop]
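The squared-error fitting described above can be sketched with NumPy's polynomial tools. The sin(2πx) target (in the spirit of Bishop's example) and the noise level are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Noisy samples of a smooth (non-polynomial) true function.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)

# Fit polynomials of different degrees by minimizing squared error.
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)      # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)
    sse = np.sum((y - y_hat) ** 2)         # squared-error loss on the data
    print(f"degree {degree}: training SSE = {sse:.4f}")
```

The training error always shrinks as the degree grows, which is exactly why "which fit is best?" cannot be answered from training error alone.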
Types of Regression Models

Regression models split by the number of features and the form of the relationship:
 Simple (1 feature): Linear or Non-Linear
 Multiple (2+ features): Linear or Non-Linear
Linear regression
 Given an input x, compute an output y.
 For example:
 - Predict height from age
 - Predict house price from house area

[Figure: scatter plot of Y against X with a fitted line]
Bivariate or simple linear regression
 x is the independent variable.
 y is the dependent variable.
 The regression model is

y = β0 + β1x + ε

 The model has two variables: the independent or explanatory variable, x, and the dependent variable y, the variable whose variation is to be explained.
 The relationship between x and y is a linear or straight-line relationship.
 There are two parameters to estimate: the slope of the line, β1, and the y-intercept, β0 (where the line crosses the vertical axis).
 ε is the unexplained, random, or error component.
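A quick simulation makes the roles of β0, β1, and ε concrete. The parameter values and noise level below are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Simulate the model y = beta0 + beta1*x + eps.
beta0, beta1 = 2.0, 0.5               # assumed intercept and slope
rng = np.random.default_rng(1)

x = rng.uniform(0, 10, size=200)      # independent (explanatory) variable
eps = rng.normal(0, 1.0, size=200)    # unexplained random error component
y = beta0 + beta1 * x + eps           # dependent variable

# x and y show a strong positive linear association:
print(np.corrcoef(x, y)[0, 1])
```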
Simple Linear Regression Equation
 Positive Linear Relationship: the slope b1 is positive.

[Figure: E(y) vs. x with an upward-sloping regression line; intercept b0, slope b1 > 0]

Simple Linear Regression Equation
 Negative Linear Relationship: the slope b1 is negative.

[Figure: E(y) vs. x with a downward-sloping regression line; intercept b0, slope b1 < 0]

Simple Linear Regression Equation
 No Relationship: the slope b1 is 0.

[Figure: E(y) vs. x with a horizontal regression line at the intercept b0]
Estimation Process

Regression model: y = β0 + β1x + ε
Regression equation: E(y) = β0 + β1x
Unknown parameters: β0, β1

Sample data: (x1, y1), (x2, y2), …, (xn, yn)

The sample statistics b0 and b1 provide estimates of β0 and β1, giving the estimated regression equation:

ŷ = b0 + b1x
Regression line
 The regression model is y = β0 + β1x + ε.
 Data about x and y are obtained from a sample.
 From the sample of values of x and y, estimates b0 of β0 and b1 of β1 are obtained using the least squares or another method.
 The symbol ŷ is termed "y hat" and refers to the predicted values of the dependent variable y that are associated with values of x, given the linear model:

ŷ = b0 + b1x
Least Squares Method
 Least Squares Criterion:

min Σ (yi − ŷi)²

where:
 yi = observed value of the dependent variable for the ith observation
 ŷi = estimated value of the dependent variable for the ith observation
Least Squares Method
 Slope for the Estimated Regression Equation:

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

where:
 xi = value of the independent variable for the ith observation
 yi = value of the dependent variable for the ith observation
 x̄ = mean value of the independent variable
 ȳ = mean value of the dependent variable
Least Squares Method
 y-Intercept for the Estimated Regression Equation:

b0 = ȳ − b1x̄
Simple Linear Regression
 Example: Reed Auto Sales

Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.
Simple Linear Regression
 Example: Reed Auto Sales

Number of TV Ads (x)    Number of Cars Sold (y)
1                       14
3                       24
2                       18
1                       17
3                       27

Σx = 10    Σy = 100
x̄ = 2      ȳ = 20
Estimated Regression Equation
 Slope for the Estimated Regression Equation:

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 20 / 4 = 5

 y-Intercept for the Estimated Regression Equation:

b0 = ȳ − b1x̄ = 20 − 5(2) = 10

 Estimated Regression Equation:

ŷ = 10 + 5x
Answer
 b0 = 218.385
 b1 = −19.558
 ŷ = 218.385 − 19.558x

Example
Answer
3 squared differences
Coefficient of Determination
 Relationship among SST, SSR, SSE:

SST = SSR + SSE

where:
 SST = total sum of squares
 SSR = sum of squares due to regression
 SSE = sum of squares due to error

 The coefficient of determination is:

r² = SSR / SST
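A minimal sketch computing SST, SSR, and SSE for the Reed Auto fit ŷ = 10 + 5x, confirming the decomposition SST = SSR + SSE:

```python
# Reed Auto data and the fitted line y_hat = 10 + 5x.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]

y_bar = sum(y) / len(y)                  # 20.0
y_hat = [10 + 5 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                    # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)                # due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))       # due to error

r2 = ssr / sst
print(sst, ssr, sse, r2)   # SST = 114, SSR = 100, SSE = 14, r2 ≈ 0.877
```

About 87.7% of the variation in cars sold is explained by the number of TV ads.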
Example 2 (contd.)
Uses of regression
 Amount of change in a dependent variable that results from changes in the independent variable(s) – can be used to estimate elasticities, returns on investment in human capital, etc.
 Attempt to determine causes of phenomena.
 Prediction and forecasting of sales, economic growth, etc.
 Support or negate a theoretical model.
 Modify and improve theoretical models and explanations of phenomena.
Outliers
 Rare, extreme values may distort the outcome.
 An outlier could be an error.
 It could also be a very important observation.
 Rule of thumb: an outlier is more than 3 standard deviations from the mean.

[Figure: scatter plot of GPA vs. Time Online — GPA (50–100) on the horizontal axis, Time Online (0–12) on the vertical axis]
Multiple linear regression
Linear Regression – Multiple Variables
 The model is

Y = β0 + β1X1 + β2X2 + … + βpXp + ε

 β0 is the intercept (i.e. the average value of Y if all the X's are zero); βj is the slope for the jth variable Xj.
Regression Model
 Our model assumes that

E(Y | X = x) = β0 + β1x   (the "population line")

[Figure: scatter plot showing the population line and the least squares line]

 We use b0 through bp as guesses for β0 through βp, and ŷi as a guess for Yi. The guesses will not be perfect.
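A sketch of estimating the multiple-regression coefficients by least squares with NumPy; the synthetic data and the "true" coefficients are assumptions for illustration.

```python
import numpy as np

# Multiple linear regression Y = b0 + b1*X1 + ... + bp*Xp + eps.
rng = np.random.default_rng(2)
n, p = 100, 3
X = rng.normal(size=(n, p))
true_beta = np.array([1.0, 2.0, -1.0, 0.5])     # [intercept, b1, b2, b3]
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0, 0.1, size=n)

# Prepend a column of ones so the intercept is estimated with the slopes.
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta_hat)   # guesses b0..bp, close to (but not exactly) true_beta
```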
Example
Prediction
Calculate R²
Polynomial regression
Logistic Regression
 One commonly used algorithm is logistic regression.
 It assumes that the dependent (output) variable is binary, which is often the case in medical and other studies (does the person have the disease or not, survive or not, get accepted or not, etc.).
 Like Quadric, logistic regression does a particular non-linear transform on the data, after which it just does linear regression on the transformed data.
 Logistic regression fits the data with a sigmoidal/logistic curve rather than a line and outputs an approximation of the probability of the output given the input.

CS 478 - Regression
Difference between logistic regression and linear regression
 Logistic regression is used when the dependent variable is binary in nature.
 Linear regression is used when the dependent variable is continuous and the regression line is linear.
 Polynomial regression is used when the dependent variable is continuous but the regression line is non-linear.
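A minimal sketch of the sigmoid transform behind logistic regression; the coefficients b0 and b1 below are illustrative assumptions, not fitted values.

```python
import math

# Logistic regression passes a linear combination z = b0 + b1*x through
# the sigmoid, giving an estimated probability that y = 1.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

b0, b1 = -4.0, 2.0   # assumed coefficients for illustration

def predict_proba(x):
    return sigmoid(b0 + b1 * x)

print(predict_proba(0.0))   # well below 0.5 -> predict class 0
print(predict_proba(2.0))   # exactly 0.5 (decision boundary at x = 2)
print(predict_proba(4.0))   # well above 0.5 -> predict class 1
```

Note the output is always between 0 and 1, unlike a linear fit, which is what makes it usable as a probability.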
Example
Conclusion
