Lecture 2 - Regression
Introduction
Response variable or dependent variable: the variable being predicted.
Predictor variables or independent variables: the variables used to predict the value of
the dependent variable.
Simple (single) linear regression: a regression analysis with one independent variable, x,
in which any one-unit change in x is assumed to result in a constant change in the
dependent variable, y.
Multiple linear regression: a regression analysis involving two or more independent
variables.
Confidence interval for the regression slope: b1 ± t* × SE(b1), where the critical value t*
depends on the confidence level and has n − 2 degrees of freedom.
Use technology (Excel) to calculate SE(b1) and the confidence interval for you.
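The notes defer this calculation to Excel; as a rough Python equivalent, here is a minimal sketch (the x and y arrays are made-up illustration data) that gets SE(b1) from scipy and builds the 95% confidence interval:

```python
import numpy as np
from scipy import stats

# Made-up illustration data: x = predictor, y = response
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])

n = len(x)
res = stats.linregress(x, y)        # fits y-hat = b0 + b1*x
b1, se_b1 = res.slope, res.stderr   # estimated slope and its standard error

# 95% CI: b1 +/- t* * SE(b1), where t* has n - 2 degrees of freedom
t_star = stats.t.ppf(0.975, df=n - 2)
lo, hi = b1 - t_star * se_b1, b1 + t_star * se_b1
print(f"b1 = {b1:.3f}, SE(b1) = {se_b1:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```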
Assumptions and Conditions
1. Make sure you have quantitative variables x, y.
2. Draw a scatterplot and check if there is a linear relationship between variables.
3. Fit a regression line and calculate the residuals (e). Make a scatterplot of the residuals
against the variable x or against the predicted values; this plot should show no pattern
(see the sketch below).
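A minimal Python sketch of steps 2 and 3 (the lecture uses Excel; the data here are made up for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Made-up illustration data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])

res = stats.linregress(x, y)
predicted = res.intercept + res.slope * x
residuals = y - predicted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(x, y)                  # step 2: look for a linear relationship
ax1.set(xlabel="x", ylabel="y", title="y vs. x")
ax2.scatter(predicted, residuals)  # step 3: residuals should show no pattern
ax2.axhline(0, linestyle="--")
ax2.set(xlabel="predicted values", ylabel="residuals", title="Residual plot")
plt.show()
```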
Hypothesis Test for the Regression Slope: t-test for the Regression Slope
When the assumptions and conditions are met, we can test the hypothesis H0: β1 = 0 vs.
HA: β1 ≠ 0 (or a one-sided alternative hypothesis) using the standardized
estimated regression slope
t = (b1 − β1) / SE(b1)
which follows a Student's t-model with n − 2 degrees of freedom. We can use the t-model
to find the P-value of the test.
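A minimal sketch of this t-test in Python (same made-up data as above); scipy's linregress also reports this two-sided P-value directly:

```python
import numpy as np
from scipy import stats

# Made-up illustration data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])

n = len(x)
res = stats.linregress(x, y)

# Under H0: beta1 = 0, the statistic reduces to t = b1 / SE(b1)
t_stat = res.slope / res.stderr
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)  # two-sided P-value

print(f"t = {t_stat:.3f}, P-value = {p_value:.4f}")
print(f"linregress P-value = {res.pvalue:.4f}")  # same value
```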
ANOVA and the F-Statistic
Analysis of variance (ANOVA) for regression is another test of the slope β1 (b1).
But instead of stating the test in terms of the parameter, we restate it in terms of the
model.
Is the regression model worthwhile?
o (i.e., does the predictor variable contain useful information about the response
variable?)
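A sketch of the ANOVA calculation for simple regression (made-up data again), building F = MSR/MSE from the sums of squares; a large F suggests the model explains more variation than chance would:

```python
import numpy as np
from scipy import stats

# Made-up illustration data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])

n = len(x)
res = stats.linregress(x, y)
predicted = res.intercept + res.slope * x

ssr = np.sum((predicted - y.mean()) ** 2)  # regression sum of squares
sse = np.sum((y - predicted) ** 2)         # error (residual) sum of squares

# For simple regression: F = (SSR/1) / (SSE/(n-2)) with (1, n-2) df
f_stat = (ssr / 1) / (sse / (n - 2))
p_value = stats.f.sf(f_stat, dfn=1, dfd=n - 2)
print(f"F = {f_stat:.3f}, P-value = {p_value:.4f}")
# Note: with one predictor, F equals the square of the slope's t-statistic
```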
F-Distribution (review)
1. The curve is not symmetrical but skewed to the right.
2. There is a different curve for each pair of degrees of freedom (numerator and
denominator).
3. As the degrees of freedom for the numerator and for the denominator get larger, the
curve approximates the normal.
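A short sketch that plots the F density for a few df pairs (chosen arbitrarily here) to visualize all three properties:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

xs = np.linspace(0.01, 4, 400)
# Arbitrary (numerator df, denominator df) pairs for illustration
for dfn, dfd in [(2, 5), (5, 20), (30, 200)]:
    plt.plot(xs, stats.f.pdf(xs, dfn, dfd), label=f"F({dfn}, {dfd})")

plt.xlabel("F")
plt.ylabel("density")
plt.title("F curves: right-skewed, one per df pair, less skewed as dfs grow")
plt.legend()
plt.show()
```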
Variation in the Model and R-Squared
All regression models fall somewhere between the two extremes of zero correlation and
perfect correlation (r = ±1).
We consider the square of the correlation coefficient r to get r-squared, a value
between 0 and 1.
R-squared: the fraction of the data’s variation accounted for by the model.
1-r-squared: the fraction of the original variation left in the residuals.
Two ways to calculate r-squared (both shown in the sketch below):
o Square the correlation coefficient r (r^2)
o Divide the explained variation by the total variation: r^2 = SSR/SST = 1 − SSE/SST
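A minimal sketch confirming the two calculations agree (made-up data as before):

```python
import numpy as np
from scipy import stats

# Made-up illustration data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])

res = stats.linregress(x, y)
predicted = res.intercept + res.slope * x

# Way 1: square the correlation coefficient r
r_sq_corr = res.rvalue ** 2

# Way 2: explained variation / total variation = 1 - SSE/SST
sse = np.sum((y - predicted) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r_sq_ss = 1 - sse / sst

print(f"r^2 from correlation:     {r_sq_corr:.4f}")
print(f"r^2 from sums of squares: {r_sq_ss:.4f}")  # identical
```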