3CP10 Final MJJ Linear Regression
Prepared by
M J Joshi
Regression
In regression the output is continuous.
Many models could be used – the simplest is linear regression.
Fit the data with the best hyper-plane which "goes through" the points.
[Figure: scatter plot of the independent variable x (input) on the horizontal axis against the dependent variable y (output) on the vertical axis]
Regression (contd.)
[Diagram: taxonomy of regression models]
- 1 feature: Simple regression (Linear or Non-Linear)
- 2+ features: Multiple regression (Linear or Non-Linear)
Linear regression
Given an input x, compute an output y.
For example:
- Predict height from age
- Predict house price from house area
[Figure: scatter plot of x vs. y with a fitted line]
Bivariate or simple linear regression
Simple Linear Regression Equation: E(y) = b0 + b1x
[Figure: regression line vs. x with intercept b0; slope b1 is positive]
Simple Linear Regression Equation
E(y) = b0 + b1x
[Figure: regression line vs. x with intercept b0; slope b1 is negative]
Simple Linear Regression Equation
No Relationship: E(y) = b0 (slope b1 = 0)
[Figure: horizontal regression line vs. x]
Estimation Process
The regression model: y = β0 + β1x + ε
Data about x and y are obtained from a sample. From the sample values of x and y, estimates b0 of β0 and b1 of β1 are computed using the least squares or another method. The sample statistics b0 and b1 provide the estimated regression equation:
ŷ = b0 + b1x
The symbol ŷ is termed "y hat" and refers to the predicted value of the dependent variable y associated with a given value of x, under the linear model.
Least Squares Method
Least Squares Criterion: minimize the sum of squared errors,
min Σ (yi − ŷi)²
where:
yi = observed value of the dependent variable for the ith observation
ŷi = estimated value of the dependent variable for the ith observation
Least Squares Method
Slope for the Estimated Regression Equation:
b1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²
Intercept: b0 = ȳ − b1 x̄
where:
xi = value of the independent variable for the ith observation
yi = value of the dependent variable for the ith observation
x̄ = mean value of the independent variable
ȳ = mean value of the dependent variable
Least Squares Method
Example data:
Number of TV Ads (x)   Number of Cars Sold (y)
1                      14
3                      24
2                      18
1                      17
3                      27
Σx = 10, Σy = 100
x̄ = 2, ȳ = 20
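The least-squares formulas above can be applied directly to the TV-ads data. A minimal sketch in plain Python (variable names are my own):

```python
# Least-squares fit: b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²,  b0 = ȳ - b1·x̄
x = [1, 3, 2, 1, 3]       # number of TV ads
y = [14, 24, 18, 17, 27]  # number of cars sold

x_bar = sum(x) / len(x)   # 2.0
y_bar = sum(y) / len(y)   # 20.0

b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

print(b0, b1)  # → 10.0 5.0
```

So for this data the estimated regression equation is ŷ = 10 + 5x.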
Estimated Regression Equation
Slope for the Estimated Regression Equation
b0 = 218.385
b1 = −19.558
ŷ = 218.385 − 19.558x
Example
Answer
3 squared differences
Coefficient of Determination
Relationship among SST, SSR, SSE:
SST = SSR + SSE
where:
SST = total sum of squares = Σ (yi − ȳ)²
SSR = sum of squares due to regression = Σ (ŷi − ȳ)²
SSE = sum of squares due to error = Σ (yi − ŷi)²
Coefficient of Determination
r² = SSR / SST
where:
SSR = sum of squares due to regression
SST = total sum of squares
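These sums of squares can be checked numerically for the TV-ads fit ŷ = 10 + 5x (a sketch in plain Python):

```python
# SST = SSR + SSE and r² = SSR/SST for the TV ads example (ŷ = 10 + 5x)
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
y_hat = [10 + 5 * xi for xi in x]   # predicted values
y_bar = sum(y) / len(y)             # 20.0

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares

assert abs(sst - (ssr + sse)) < 1e-9   # SST = SSR + SSE holds
r2 = ssr / sst
print(sst, ssr, sse, r2)
```

Here SST = 114, SSR = 100, SSE = 14, so r² = 100/114 ≈ 0.877: about 88% of the variability in cars sold is explained by the number of TV ads.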
Example 2 (contd.)
Uses of regression
[Figure: scatter plot of Time Online (0–12 hours, vertical axis) vs. GPA (50–100, horizontal axis)]
Multiple linear regression
Linear Regression – Multiple Variables
[Figure: population line and least squares line]
We use b0 through bp as guesses for β0 through βp, and ŷi as a guess for Yi. The guesses will not be perfect.
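A minimal sketch of estimating b0 through bp with NumPy's least-squares solver (the toy data and variable names are my own illustration):

```python
import numpy as np

# Toy data with two features, generated from y = 1 + 2·x1 + 3·x2 (no noise)
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [4.0, 3.0]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

# Prepend a column of ones so the intercept b0 is estimated alongside b1, b2
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution

print(coef)  # ≈ [1. 2. 3.]  i.e. (b0, b1, b2)
```

With noisy real data the recovered coefficients are only estimates of β0 through βp, as noted above.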
Example
Prediction
Calculate r²
Polynomial regression
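Polynomial regression reduces to linear regression on powers of x. A minimal sketch with NumPy (the quadratic data is my own illustration):

```python
import numpy as np

# Quadratic data: y = 2x² − 3x + 1 (no noise), recovered by a degree-2 fit
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 2 * x**2 - 3 * x + 1

coeffs = np.polyfit(x, y, deg=2)   # least-squares fit of a degree-2 polynomial
print(coeffs)  # ≈ [ 2. -3.  1.]  (highest power first)
```

Equivalently, this is ordinary linear regression on the transformed features [x², x, 1].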
Logistic Regression
One commonly used algorithm is Logistic Regression.
It assumes that the dependent (output) variable is binary, which is often the case in medical and other studies (does the person have the disease or not, survive or not, accepted or not, etc.).
Like Quadric, Logistic Regression does a particular non-linear transform on the data, after which it just does linear regression on the transformed data.
Logistic regression fits the data with a sigmoidal/logistic curve rather than a line, and outputs an approximation of the probability of the output given the input.
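The sigmoid curve described above can be sketched with a tiny gradient-descent fit in NumPy (one toy feature; all data and names are my own):

```python
import numpy as np

def sigmoid(z):
    # logistic function: maps any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary data: small x belongs to class 0, large x to class 1
x = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Fit P(y=1 | x) = sigmoid(b0 + b1·x) by gradient descent on the log-loss
b0, b1 = 0.0, 0.0
lr = 0.5
for _ in range(5000):
    p = sigmoid(b0 + b1 * x)
    b0 -= lr * np.mean(p - y)        # ∂(log-loss)/∂b0
    b1 -= lr * np.mean((p - y) * x)  # ∂(log-loss)/∂b1

# Outputs are probabilities, not raw values: low for x=1.0, high for x=3.5
print(sigmoid(b0 + b1 * 1.0), sigmoid(b0 + b1 * 3.5))
```

The model is still linear in b0 and b1; the sigmoid only squashes the linear score into a probability.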
CS 478 - Regression
Difference between logistic regression and linear regression
Logistic regression is used when the dependent variable is binary in nature.
Linear regression is used when the dependent variable is continuous and the nature of the regression line is linear.
Multiple linear regression is used when the dependent variable is continuous and there are two or more independent variables.
Example
Conclusion