Lecture 8 Regression
REGRESSION ANALYSIS
Reading materials: Chap 16, 17 (Keller)

• Simple Regression:
  Form of the general model
  Procedure in SPSS
  Interpretation of SPSS output
  Testing significance of a slope/intercept
  Assumption checking
• Multiple Regression:
  As above
Simple linear relationship
• In a simple linear relationship, we want to see whether a linear relationship exists b/w one dependent variable (Y) and one independent variable (X).
• Example: want to see whether the time persons have

Simple linear relationship: example
Respondent Number   Duration of Residence   Quality of infrastructure   Attitude Towards City
1                   10                      3                           6
2                   12                      11                          9
3                   12                      4                           8
4                   4                       1                           3

1. Analyse the nature of the relationship b/w independent and dependent variables
2. Make a scatterplot
3. Formulate the mathematical model that describes the relationship b/w the independent and dependent variables
4. Estimate and interpret the coefficients of the model
5. Test the model
6. Evaluate the strength of the relationship (fitness) and prediction accuracy

• Simple regression – one predictor
• We have n observations.
• Xi = value of the independent variable on the ith obs.
• Yi = value of the dependent variable on the ith obs.
• sx = sample standard deviation of the independent variable
• sy = sample standard deviation of the dependent variable
• Ȳ is the sample average of the dependent variable
• X̄ is the sample average of the independent variable
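
The notation above can be illustrated with a short Python sketch. It is not part of the SPSS procedure in this lecture. It assumes X = duration of residence and Y = attitude towards city, and it uses only the four respondents listed in the example table, so the numbers are illustrative and will not match results based on the full data set.

```python
from statistics import mean, stdev

# Four respondents from the example table (illustrative subset only).
# Assumption: X = duration of residence, Y = attitude towards city.
x = [10, 12, 12, 4]
y = [6, 9, 8, 3]

n = len(x)        # number of observations
x_bar = mean(x)   # sample average of the independent variable
y_bar = mean(y)   # sample average of the dependent variable
s_x = stdev(x)    # sample standard deviation (divides by n - 1)
s_y = stdev(y)

print(f"n = {n}, X-bar = {x_bar}, Y-bar = {y_bar}")
print(f"s_x = {s_x:.3f}, s_y = {s_y:.3f}")
```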
Are good students managers?
Simple linear regression: scatterplot
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
($\beta_0$ = intercept, $\beta_1$ = slope)
Slope and intercept are estimated by the ordinary least squares (OLS) method.
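
A minimal sketch of the OLS estimates for the simple model, using the standard closed-form formulas (slope = Sxy / Sxx, intercept = Ȳ minus slope times X̄). The function name `ols_simple` is hypothetical, and the data are only the four respondents from the example table, so the estimates differ from the ones SPSS reports for the full data set.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx            # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1


# Illustrative subset: duration of residence (X), attitude towards city (Y)
duration = [10, 12, 12, 4]
attitude = [6, 9, 8, 3]

b0, b1 = ols_simple(duration, attitude)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```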
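[Figure: Y plotted against X. Observed value: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$; regression line: $\mu_{Y|X} = \beta_0 + \beta_1 X_i$; $\varepsilon_i$ = error terms.]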
OLS method (3)
Gauss-Markov assumptions
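Step 5: Testing for significance of estimated parameters
• Can test significance of linear relationship
• H0: β1 = 0
• HA: β1 ≠ 0
• Test Statistic: $t = \dfrac{\hat{\beta}_1 - \beta_1}{s_{\hat{\beta}_1}}$

Applying this to example
• H0: β1 = 0
• HA: β1 ≠ 0
• Test Statistic: $t = \dfrac{\hat{\beta}_1 - \beta_1}{s_{\hat{\beta}_1}} = \dfrac{0.5897 - 0}{0.0701} = 8.412$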
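
The arithmetic of the test statistic can be checked with a small Python sketch. The helper name `t_statistic` is hypothetical; the estimate 0.5897 and standard error 0.0701 are the values quoted in the example above.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error of the estimate."""
    return (estimate - hypothesized) / std_error


# Slope estimate and its standard error from the lecture example
t = t_statistic(0.5897, 0.0701)
print(f"t = {t:.3f}")
```

This reproduces the 8.412 shown in the example. Judging significance still requires comparing t with the t distribution for the appropriate degrees of freedom, or reading the Sig. column in SPSS.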
Step 6: Determine the strength or fitness of the relationship
• Measured by r² – coefficient of determination.
• r² measures the proportion of total variation in Y explained by the variation in X, i.e. r² = (explained variation in Y) / (total variation in Y).
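
A minimal sketch of how r² could be computed by hand, as 1 minus (unexplained variation / total variation), which equals explained / total for a least-squares line with an intercept. The function name `r_squared` is hypothetical, and the data are only the four respondents from the example table (X = duration of residence, Y = attitude), so the value will not match the SPSS output for the full data set.

```python
def r_squared(x, y):
    """Coefficient of determination for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)               # total variation in Y
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # unexplained variation
    return 1 - ss_error / ss_total


duration = [10, 12, 12, 4]   # illustrative subset
attitude = [6, 9, 8, 3]

r2 = r_squared(duration, attitude)
print(f"r^2 = {r2:.4f}")
```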
Example using SPSS

Multiple Regression
• Imagine a case with two predictors
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon_i$
• Attitude to city now being explained by
  Duration of residence
  Quality of infrastructure
General Model
• Let

Estimation (SPSS)
The regression equation is
Attitude Towards City = 0.337 + 0.481 Duration of Residence + 0.289 Quality of infrastructure
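
To illustrate how the estimated equation is used for prediction, here is a small Python sketch. The function name `predict_attitude` is hypothetical; the coefficients 0.337, 0.481 and 0.289 are the ones in the regression equation above, and the inputs are taken from the example table.

```python
def predict_attitude(duration, quality):
    """Predicted attitude towards city from the estimated regression equation."""
    return 0.337 + 0.481 * duration + 0.289 * quality


# Respondent 1 in the example table: duration = 10, quality = 3
print(f"{predict_attitude(10, 3):.3f}")
# Respondent 2: duration = 12, quality = 11
print(f"{predict_attitude(12, 11):.3f}")
```

Each coefficient is read as the change in predicted attitude for a one-unit change in that predictor, holding the other predictor fixed.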
Applying this to example–SPSS output
• This is the test done in the ANOVA section of the output.
• In this case, we reject the null hypothesis – at least one of the slopes is significantly different from zero.

2. Significance of specific partial regression coefficients.
• H0: βi = 0
• HA: βi ≠ 0
• Test Statistic: $t = \dfrac{\hat{\beta}_i - \beta_i}{s_{\hat{\beta}_i}} = \dfrac{\hat{\beta}_i}{s_{\hat{\beta}_i}}$
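
As a rough check of this statistic, the sketch below divides each B by its Std. Error, using the rounded values printed in the SPSS Coefficients table of this lecture. The dictionary layout is only an illustration. Small differences from the t column in SPSS are expected, because SPSS works with unrounded values.

```python
# (B, Std. Error) pairs as printed in the SPSS Coefficients table
coefficients = {
    "(Constant)": (0.337, 0.567),
    "duration": (0.481, 0.059),
    "quality": (0.289, 0.086),
}

# Under H0: beta_i = 0 the test statistic reduces to B / Std. Error
t_ratios = {name: b / se for name, (b, se) in coefficients.items()}

for name, t in t_ratios.items():
    print(f"{name:<12} t = {t:.3f}")
```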
Coefficients(a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta                        t        Sig.
1  (Constant)    .337     .567                                             .595     .567
   duration      .481     .059                 .764                        8.160    .000
   quality       .289     .086                 .314                        3.353    .008
a. Dependent Variable: attitude

• Once the quality of infrastructure is considered, the duration of residence still has a significant linear relationship with the attitude to a city.

• Assumptions made:
• Error terms normally distributed
• Error terms have mean 0, constant variance
• Error terms are independent

• Definition: A residual (the estimate of the error term) is the difference between the observed response value Yi and the value predicted by the regression equation, Ŷi.
• (Vertical distance between point and line.)
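
The definition of a residual can be illustrated with a short Python sketch: observed attitude minus the value predicted by the estimated equation (0.337 + 0.481 duration + 0.289 quality). The names `fitted_value` and `rows` are hypothetical, and only the four respondents listed in the example table are used.

```python
def fitted_value(duration, quality):
    """Value predicted by the estimated regression equation."""
    return 0.337 + 0.481 * duration + 0.289 * quality


# (duration, quality, observed attitude) for the four listed respondents
rows = [
    (10, 3, 6),
    (12, 11, 9),
    (12, 4, 8),
    (4, 1, 3),
]

# Residual = observed value - fitted value
residuals = [attitude - fitted_value(d, q) for d, q, attitude in rows]

for (d, q, attitude), e in zip(rows, residuals):
    print(f"duration={d:>2} quality={q:>2} observed={attitude} residual={e:+.3f}")
```

Residuals computed this way are the quantities examined in the assumption checks (normality, constant variance, independence).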
Error terms normally distributed
Error terms have mean 0, constant variance
Error terms are independent
• Check in previous plots; also in residuals vs time/order.
• Look for random scatter of residuals.

Example
[Figure: Residual Plots for Attitude Towards City. Four panels: Normal Probability Plot of the Residuals (Percent vs Standardized Residual); Residuals Versus the Fitted Values (Standardized Residual vs Fitted Value); Frequency vs Standardized Residual; Standardized Residual vs Observation Order.]