Bivariate Regression Analysis: The Beginning of Many Types of Regression
Y = a + bX + e

Where a = constant, alpha, or intercept (the value of
Y when X = 0); b = slope or beta, the change in Y for
a one-unit change in X; e = stochastic term or error of
estimation, which captures everything else that affects
Y but is not captured by X.
The Intercept
• The intercept estimate (the constant) is where
the regression line crosses the Y axis, which
is where X = 0.

Ŷ = a + bX

The intercept operates as a baseline for the
estimation of the equation.
The Slope
• The slope estimate equals the average
change in Y associated with a unit change
in X.
Ŷi = a + bXi

[Figures: fitted regression lines contrasting an estimator
that is NOT BLUE with one that is BLUE (Best Linear
Unbiased Estimator)]
Ordinary Least Square (OLS)
• OLS is the technique used to estimate the line
that minimizes the error: the difference
between the predicted and the actual values
of Y.

ei = Yi − Ŷi
OLS
• Equation for a population:
Yi = α + βXi + εi
• Equation for a sample:
Yi = a + bXi + ei
The Least Squares Concept
• The goal is to minimize the error in the
prediction of Y. This means summing the
errors of each prediction, or more
appropriately, the Sum of the Squared
Errors (SSE).

SSE = Σ(Yi − Ŷi)²
The Least Squares and b coefficient
• The sum of the squares is “least” when

b = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

and

a = Ȳ − bX̄
Knowing the intercept and the slope, we can predict
values of Y given X.
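For instance, with hypothetical estimates a = 2.565 and b = 0.575 (illustrative values, not derived here), predicting Y for a given X is a single plug-in step; a minimal Python sketch:

```python
# Hypothetical estimates used for illustration only.
a, b = 2.565, 0.575

def predict(x):
    """Predicted value of Y for a given X: Y-hat = a + bX."""
    return a + b * x

print(predict(4.0))  # 2.565 + 0.575 * 4.0 = 4.865
```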
Calculating the slope & intercept
b = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

a = Ȳ − bX̄
Step by step
1. Calculate the means of Y and X (Ȳ and X̄)
2. Calculate each difference of X from its mean: (Xi − X̄)
3. Calculate each difference of Y from its mean: (Yi − Ȳ)
4. Sum the cross-products: Σ(Xi − X̄)(Yi − Ȳ)
5. Square each difference of X: (Xi − X̄)²
6. Sum the squared differences: Σ(Xi − X̄)²
7. Divide (step 4 / step 6) to get b:
   b = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
8. Calculate a:
   a = Ȳ − bX̄
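The steps above can be sketched in plain Python (the function name ols_line is my own):

```python
def ols_line(xs, ys):
    """Estimate intercept a and slope b by the step-by-step method above."""
    n = len(xs)
    x_bar = sum(xs) / n                                  # step 1: mean of X
    y_bar = sum(ys) / n                                  # step 1: mean of Y
    dx = [x - x_bar for x in xs]                         # step 2: differences of X
    dy = [y - y_bar for y in ys]                         # step 3: differences of Y
    sum_xy = sum(dxi * dyi for dxi, dyi in zip(dx, dy))  # step 4: sum of cross-products
    sum_xx = sum(dxi ** 2 for dxi in dx)                 # steps 5-6: sum of squared X differences
    b = sum_xy / sum_xx                                  # step 7: slope
    a = y_bar - b * x_bar                                # step 8: intercept
    return a, b
```

For example, `ols_line([1, 2, 3], [2, 4, 6])` returns (0.0, 2.0): a perfect line through the origin with slope 2.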
An Example: Choosing two points
Y X
5.13 4.02
5.2 4.54
4.53 3.53
4.79 3.8
4.78 3.86
4.72 4.17
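As a cross-check, fitting these six observations with NumPy's least-squares `polyfit` (a sketch, not part of the original slides) reproduces the coefficients reported in the SPSS output later in the deck:

```python
import numpy as np

# The six observations from the table above.
Y = np.array([5.13, 5.2, 4.53, 4.79, 4.78, 4.72])
X = np.array([4.02, 4.54, 3.53, 3.8, 3.86, 4.17])

# Degree-1 polyfit performs a least-squares line fit;
# it returns the coefficients highest degree first: (slope, intercept).
b, a = np.polyfit(X, Y, 1)
print(round(b, 3), round(a, 3))  # 0.575 2.565
```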
Forecasting Home Values
[Figure: scatter plot of LOG_VALU (y axis, 4.5–5.2) against
LOG_SQFT (x axis, 3.4–4.6) with a fitted linear trend line]
Forecasting Home Values
[Figure: the same LOG_VALU vs. LOG_SQFT scatter plot,
annotated with a slope computed from two chosen points]

slope = (Y2 − Y1) / (X2 − X1) = (5.2 – 4.5) / (4.54 – 3.53) ≈ .69
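The two-point calculation above can be checked in a few lines of Python; it is only a rough estimate, since it ignores the other observations that a full least-squares fit would use:

```python
# Slope from the two points chosen in the figure annotation.
y2, y1 = 5.2, 4.5
x2, x1 = 4.54, 3.53
slope = (y2 - y1) / (x2 - x1)
print(round(slope, 2))  # 0.69
```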
SPSS OUTPUT
Coefficients(a)

                    Unstandardized     Standardized
                    Coefficients       Coefficients
Model               B      Std. Error  Beta           t       Sig.
1    (Constant)     2.565  .929                       2.761   .051
     X              .575   .232        .778           2.476   .068
a. Dependent Variable: Y
                    Unstandardized      Standardized            95% Confidence
                    Coefficients        Coefficients            Interval for B
Model               B        Std. Error  Beta    t        Sig.  Lower Bound  Upper Bound
1    (Constant)     797.952  45.360              17.592   .000  708.478      887.425
     UNEMP          -69.856  6.500       -.615   -10.747  .000  -82.678      -57.034
a. Dependent Variable: STOCKS
R²
• What amount of variance in Y is explained
by the X variable?

R² = r²

This measure is based on the degree to which
the point estimates of Y fall on the regression
line. The higher the error from the line, the
lower the R square (scale between 0 and 1).
Σ(Yi − Ȳ)² = Total sum of squared deviations (TSS)

Σ(Ŷi − Ȳ)² = Regression (explained) sum of squared
deviations (RSS)

R² = RSS / TSS
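A sketch of TSS, RSS, and R² computed in plain Python for the six-observation example used earlier in the deck:

```python
# The six observations from the earlier example.
Y = [5.13, 5.2, 4.53, 4.79, 4.78, 4.72]
X = [4.02, 4.54, 3.53, 3.8, 3.86, 4.17]
n = len(Y)
x_bar, y_bar = sum(X) / n, sum(Y) / n

# Least-squares slope and intercept.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
     / sum((x - x_bar) ** 2 for x in X))
a = y_bar - b * x_bar

Y_hat = [a + b * x for x in X]                 # fitted values
tss = sum((y - y_bar) ** 2 for y in Y)        # total sum of squared deviations
rss = sum((yh - y_bar) ** 2 for yh in Y_hat)  # regression (explained) sum of squares
r_squared = rss / tss
print(round(r_squared, 3))  # 0.605
```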
Coefficients(a)

                    Unstandardized        Standardized
                    Coefficients          Coefficients
Model               B          Std. Error  Beta        t        Sig.
1    (Constant)     3.057      .041                    74.071   .000
     UPOP           4.176E-05  .000        .133        13.193   .000
a. Dependent Variable: DEMOC

Correlations

                               DEMOC    UPOP
Pearson Correlation  DEMOC     1.000    .133
                     UPOP      .133     1.000
Sig. (1-tailed)      DEMOC     .        .000
                     UPOP      .000     .
N                    DEMOC     9622     9622
                     UPOP      9622     9622

Model Summary

                              Adjusted    Std. Error of
Model   R       R Square     R Square    the Estimate
1       .133a   .018         .018        3.86927
a. Predictors: (Constant), UPOP
Interpreting a Regression
• The correlation between X and Y is weak (.133).
Yi = α + βXi + εi
OLS Assumptions
1. No specification error
a) Linear relationship between X and Y
b) No relevant X variables excluded
c) No irrelevant X variables included
2. No Measurement Error
• (self-evident I hope, otherwise what would we
be modeling?)
OLS Assumptions
3. On Error Term:
a. Zero mean: E(εi) = 0, meaning we expect
that for each observation the error equals
zero.
b. Homoskedasticity: The variance of the
error term is constant for all values of Xi.
c. No autocorrelation: The error terms are
uncorrelated.
d. The X variable is uncorrelated with the
error term.
e. The error term is normally distributed.
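These error-term assumptions can be illustrated with a small simulation (a sketch; the true parameters α = 3 and β = 2 are arbitrary choices): data are generated with independent, zero-mean, constant-variance normal errors, and OLS then recovers the line closely.

```python
import numpy as np

rng = np.random.default_rng(0)

# True population line: Y = 3 + 2X + error.
alpha, beta = 3.0, 2.0
X = rng.uniform(0, 10, size=1000)
errors = rng.normal(0, 1, size=1000)  # zero mean, constant variance, uncorrelated
Y = alpha + beta * X + errors

# Degree-1 least-squares fit returns (slope, intercept).
b, a = np.polyfit(X, Y, 1)
print(round(a, 2), round(b, 2))  # estimates should land close to 3 and 2
```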
OLS Assumptions
• Some of these assumptions are complex and
are topics for a second-level course
(autocorrelation, heteroskedasticity).