Introduction to Regression
In this session
• Introduction to Regression.
– What is Regression?
– Why do we need Regression?
– Different types of Regression Models
– How to create a Regression model?
Simple Regression
Multiple Regression
• Nonlinear regression.
Y = β0 + 1 / (β1 + β2X1) + β3X2 + ε
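A model like the one above is nonlinear in its parameters (β1 and β2 appear inside a reciprocal), so it cannot be fit by ordinary least squares directly. A minimal sketch of fitting it with `scipy.optimize.curve_fit` follows; the exact model form is my reading of the slide, and all data and coefficient values are made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed reading of the slide's model:
#   Y = b0 + 1 / (b1 + b2*X1) + b3*X2 + error
def model(X, b0, b1, b2, b3):
    x1, x2 = X
    return b0 + 1.0 / (b1 + b2 * x1) + b3 * x2

# Simulated data with known (illustrative) coefficients
rng = np.random.default_rng(0)
x1 = rng.uniform(1, 5, 200)
x2 = rng.uniform(0, 2, 200)
true = (2.0, 1.0, 0.5, 3.0)
y = model((x1, x2), *true) + rng.normal(0, 0.05, 200)

# Nonlinear least squares needs a starting guess (p0)
est, _ = curve_fit(model, (x1, x2), y, p0=(1, 1, 1, 1))
```

The need for a starting guess `p0`, and the possibility of non-convergence, is one practical difference between nonlinear and linear regression.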
Linear Regression
Y = β0 + β1x1 + β2x2 + ... + βkxk + ε

Y = β0 + β1x1 + β2x2 + β3x1x2 + β4x2² + ... + βkxk + ε
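Note that the second model, despite the interaction term x1x2 and the squared term x2², is still linear in the β's, so OLS applies once those terms are added as columns of the design matrix. A minimal sketch with simulated data and made-up coefficients:

```python
import numpy as np

# Multiple regression with an interaction (x1*x2) and a quadratic (x2^2)
# term, fit by OLS. The model stays linear in the parameters.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
beta = np.array([1.0, 2.0, -1.0, 0.5, 0.25])      # b0, b1, b2, b3, b4

# Design matrix: intercept, x1, x2, interaction, quadratic
X = np.column_stack([np.ones(n), x1, x2, x1 * x2, x2**2])
y = X @ beta + rng.normal(0, 0.1, n)

# Least-squares solution
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```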
[Flowchart: revise the model until it satisfies the diagnostic tests (NO → revise; YES → stop)]
Regression Functional Form
1. Hypothesize the Deterministic Component

Yi = β0 + β1Xi + εi

where Y is the dependent (response) variable (e.g., income) and X is the independent (explanatory) variable (e.g., education).
Deterministic Component in Regression
General form of Regression Models
Interpreting a Scatter plot
Linear Regression Model Assumptions
• The regression model is linear in parameters.
• The explanatory variable X is assumed to be non-stochastic.
• Given the value of X (say Xi), the mean of the random error term
εi is zero.
• The error term, εi, follows a normal distribution.
• Given the value of X, the variance of εi is constant
(Homoscedasticity).
• There is no autocorrelation between any two error terms εi and εj (i ≠ j).
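The assumptions above can be checked empirically from the residuals of a fitted model. A minimal sketch (illustrative and far from exhaustive, using simulated data that satisfies the assumptions by construction):

```python
import numpy as np

# Simulated data satisfying the assumptions: linear in parameters,
# errors with zero mean and constant variance
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 1000)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, 1000)

# Fit simple linear regression by OLS
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Zero-mean check: with an intercept, OLS residuals average to ~0.
# Homoscedasticity check: residual spread similar in both halves of x.
lo, hi = resid[x < 5].std(), resid[x >= 5].std()
```

More formal diagnostics (e.g., normality or autocorrelation tests) exist in dedicated statistics libraries; the point here is only that each assumption translates into a property of the residuals.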
Assumptions Continued…
[Figure: observations scattered around the unknown population relationship Yi = β0 + β1Xi + εi]
Population Linear Regression Model
Yi = β0 + β1Xi + εi, where εi is the random error.

[Figure: observed values scattered about the population regression line E(Y|Xi) = β0 + β1Xi]
What is the best fit?
How would you draw a line through the points?
How do you determine which line ‘fits best’?
[Scatter plot: Y against X, both axes 0–60]
Method of Ordinary Least Squares (OLS)
Least Squares Graphically
LS minimizes ∑i=1..n ε̂i² = ε̂1² + ε̂2² + ε̂3² + ε̂4²

[Figure: the residuals ε̂1, …, ε̂4 are the vertical distances from the observed points (e.g., Y2 = β̂0 + β̂1X2 + ε̂2) to the fitted line Ŷi = β̂0 + β̂1Xi]
Estimation of Parameters in Regression
SSE = ∑i εi² = ∑i ( yi − β0 − ∑j βj xij )², summing i from 1 to n and j from 1 to k.
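The OLS estimate is the β that minimizes this SSE; with a full-rank design matrix it is given by the normal equations, β̂ = (XᵀX)⁻¹Xᵀy. A minimal sketch verifying the minimizing property on simulated data (all values illustrative):

```python
import numpy as np

# Simulated multiple-regression data with made-up coefficients
rng = np.random.default_rng(3)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ beta + rng.normal(0, 0.5, n)

def sse(b):
    """Sum of squared errors for coefficient vector b."""
    e = y - X @ b
    return e @ e

# Normal equations: beta_hat minimizes sse() over all b
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

Any other coefficient vector, including the true β (because of the noise in y), yields a larger SSE than β̂.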
Regression Coefficient (β1) in SLR
β̂1 = ∑(xi − x̄)(yi − ȳ) / ∑(xi − x̄)² = Cov(X, Y) / Var(X)

Equivalently,

β̂1 = r × SY / SX

where r is the correlation coefficient between X and Y, SY is the standard deviation of Y, and SX is the standard deviation of X.
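The two expressions are algebraically identical, since r = Cov(X, Y)/(SX·SY). A minimal numeric check on simulated data:

```python
import numpy as np

# Verify that Cov(X,Y)/Var(X) equals r * S_Y / S_X
rng = np.random.default_rng(4)
x = rng.normal(size=300)
y = 1.5 * x + rng.normal(0, 1, 300)

# Covariance form of the slope estimate
cov_form = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Correlation form of the slope estimate
r = np.corrcoef(x, y)[0, 1]
corr_form = r * y.std(ddof=1) / x.std(ddof=1)
```

Note that the same ddof must be used throughout (here the sample convention, ddof=1) for the two forms to match exactly.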
Why Least Squares Estimates?
• OLS beta estimates are "Best Linear Unbiased Estimates" (BLUE), provided the error terms are uncorrelated (no autocorrelation) and have equal variance (homoscedasticity). That is,

E[β̂ − β] = 0
Advantages of OLS Estimates
• They are unbiased estimates.
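Unbiasedness means that the estimator is right on average over repeated samples, not in any single sample. A minimal Monte Carlo sketch illustrating this (all parameter values are made up; X is held fixed across replications, matching the non-stochastic-X assumption above):

```python
import numpy as np

# Monte Carlo illustration of unbiasedness: average beta1_hat over
# many simulated samples is close to the true beta1.
rng = np.random.default_rng(5)
true_b0, true_b1 = 2.0, 1.5
x = rng.uniform(0, 10, 50)            # fixed (non-stochastic) X

estimates = []
for _ in range(2000):
    # New error draw each replication; same X
    y = true_b0 + true_b1 * x + rng.normal(0, 1.0, 50)
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    estimates.append(b1)

mean_b1 = np.mean(estimates)          # should be close to true_b1
```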