3 Simple Linear Regression
Lesson Outline
1 The Straight-Line Probabilistic Model
2 Fitting the Model: The Method of Least Squares
3 Model Assumptions
4 An Estimator of σ²
5 Assessing the Utility of the Model:
Making Inferences About the Slope β1
6 The Coefficient of Determination
7 Using the Model for Estimation and Prediction
8 A Complete Example
9 Regression Through the Origin (Optional)
7
8 The straight-line model
9 Fitting the model: The Method of Least Squares
10
11 Straight-line fit to data in Table 3.1
12
13
14
15
Plot of the least squares line ŷ = -0.1 + 0.7x
16
17 SAS printout for advertising-sales regression
18 SPSS printout for advertising-sales regression
19 MINITAB printout for advertising-sales regression
20 Student GPA and Family Death Rate
GPA (Grade)   4.00 (A)   3.00 (B)   2.00 (C)   1.00 (D)   0.00 (F)   Mean
No Exam       0.04       0.07       0.05       0.05       0.06       0.054
Mid-term      0.06       0.21       0.49       0.86       1.25       0.574
Final         0.09       0.41       0.96       1.57       2.18       1.042
21 Scatterplot for the data: Family Death Rate versus GPA
[Figure: scatterplot with series No Exam, Mid-term, and Final; horizontal axis GPA from 4.00 (A) to 0.00 (F)]
22 Student GPA and Family Death Rate - Final Exam
[Figure: Death Rate (vertical axis) versus GPA (horizontal axis) with fitted line f(x) = −0.534x + 2.11]
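The fitted line on this slide can be checked by hand. The sketch below applies the usual least squares formulas (slope = SSxy / SSxx, intercept = ȳ − slope · x̄) to the Final row of the table on slide 20. The function name `least_squares_line` is illustrative and not part of the course material.

```python
# Final-exam row of the Student GPA and Family Death Rate table (slide 20).
gpa = [4.00, 3.00, 2.00, 1.00, 0.00]
death_rate = [0.09, 0.41, 0.96, 1.57, 2.18]


def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares about the means.
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = least_squares_line(gpa, death_rate)
print(f"fitted line: y = {b0:.3f} + ({b1:.3f})x")
```

With these five points the result should agree with the line shown in the figure (intercept 2.11, slope −0.534).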
23 Model assumptions
24 The probability distribution of ε
25 Is The Method of Least Squares Always Reliable?
26 An Estimator of σ²
27
28 Advertising and Sales Regression Results
29 Student GPA and Family Death Rate - Final Exam Regression Results
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.99396714
R Square            0.987970675
Adjusted R Square   0.9839609
Standard Error      0.107579428
Coeff. Var.         10.32%
Observations        5

ANOVA
             df   SS        MS            F            Significance F
Regression   1    2.85156   2.85156       246.390553   0.000561986
Residual     3    0.03472   0.011573333
Total        4    2.88628

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
Intercept   2.11           0.083330667      25.32081028    0.000135084   1.844804628    2.375195372    1.844804628    2.375195372
GPA         -0.534         0.034019602      -15.69683258   0.000561986   -0.642265557   -0.425734443   -0.642265557   -0.425734443
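Several entries in this output can be reproduced from the five data points. The sketch below assumes the standard simple linear regression definitions: s² = SSE / (n − 2), R² = 1 − SSE / SSyy, and coefficient of variation = 100 · s / ȳ. Variable names are illustrative.

```python
import math

# Final-exam row of the table on slide 20.
gpa = [4.00, 3.00, 2.00, 1.00, 0.00]
death_rate = [0.09, 0.41, 0.96, 1.57, 2.18]
b0, b1 = 2.11, -0.534  # coefficients taken from the SUMMARY OUTPUT

n = len(gpa)
y_bar = sum(death_rate) / n

# Residuals y - y_hat and the sums of squares used in the ANOVA table.
residuals = [y - (b0 + b1 * x) for x, y in zip(gpa, death_rate)]
sse = sum(e ** 2 for e in residuals)                # Residual SS
ss_yy = sum((y - y_bar) ** 2 for y in death_rate)   # Total SS

s2 = sse / (n - 2)      # estimator of sigma^2 (Residual MS)
s = math.sqrt(s2)       # "Standard Error" in the output
cv = 100 * s / y_bar    # coefficient of variation, in percent
r2 = 1 - sse / ss_yy    # "R Square"

print(f"SSE={sse:.5f}  s^2={s2:.6f}  s={s:.6f}  CV={cv:.2f}%  R^2={r2:.6f}")
```

The divisor n − 2 reflects the two parameters (intercept and slope) estimated from the data, which matches the 3 residual degrees of freedom in the ANOVA table.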
30 Assessing the Utility of the Model: Making Inferences About the Slope β1
31 Graphing the model with β1 = 0: y = β0 + ε
32
Slide 23 & 24
33 Sampling distribution of β̂1
34
35 Rejection region and calculated t-value for testing whether the slope β1 = 0
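As a worked illustration of the slope test, the sketch below uses the Final row of the Student GPA and Family Death Rate table (slide 20). It assumes a two-tailed test at α = 0.05 and hard-codes the critical value t(0.025, 3 df) = 3.182 from a standard t table. Both choices are assumptions made for this example.

```python
import math

gpa = [4.00, 3.00, 2.00, 1.00, 0.00]
death_rate = [0.09, 0.41, 0.96, 1.57, 2.18]

n = len(gpa)
x_bar = sum(gpa) / n
y_bar = sum(death_rate) / n
ss_xx = sum((x - x_bar) ** 2 for x in gpa)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gpa, death_rate))

# Least squares estimates.
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Estimated standard deviation of the error term, with n - 2 df.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(gpa, death_rate))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope and the test statistic for H0: beta1 = 0.
se_b1 = s / math.sqrt(ss_xx)
t_stat = b1 / se_b1

T_CRIT = 3.182  # assumed: t table value for alpha/2 = 0.025, df = n - 2 = 3
reject_h0 = abs(t_stat) > T_CRIT
ci_95 = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
print(f"95% CI for slope: ({ci_95[0]:.4f}, {ci_95[1]:.4f})")
```

The t statistic and interval should agree, up to rounding of the critical value, with the GPA row of the SUMMARY OUTPUT on slide 29.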
36 Advertising & Sales Regression Results (SAS)
37
38 Student GPA and Family Death Rate - Final Exam Regression Results (same output as slide 29)
39
45 MINITAB graph of simple linear model relating price (y) to floor height (x)
46 Using the Model for Estimation and Prediction
47 Estimated mean value and predicted individual value of sales revenue y for x = 4
48
49
50
51 SAS printout showing 95% confidence intervals for E(y)
52 SAS printout showing 95% prediction intervals for y
53 Error of estimating the mean value of y for a given value of x
54 Error of predicting a future value of y for a given value of x
55 Comparison of widths of 95% confidence and prediction intervals
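The difference in widths can be seen numerically. The sketch below uses the Final row of the GPA table on slide 20 rather than the advertising-sales data, which is not reproduced in these notes. The evaluation point x_p = 2.5 and the critical value 3.182 (t table, 3 df, 95% level) are assumptions chosen for illustration.

```python
import math


def estimate_and_predict(x, y, x_p, t_crit):
    """Return (y_hat, ci_half_width, pi_half_width) at x = x_p."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    y_hat = b0 + b1 * x_p
    leverage = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    # Standard error for the mean E(y) versus for one new observation y.
    se_mean = s * math.sqrt(leverage)
    se_pred = s * math.sqrt(1 + leverage)
    return y_hat, t_crit * se_mean, t_crit * se_pred


gpa = [4.00, 3.00, 2.00, 1.00, 0.00]
death_rate = [0.09, 0.41, 0.96, 1.57, 2.18]

y_hat, ci_half, pi_half = estimate_and_predict(gpa, death_rate, x_p=2.5, t_crit=3.182)
print(f"y_hat = {y_hat:.3f}")
print(f"95% CI for E(y): {y_hat:.3f} +/- {ci_half:.3f}")
print(f"95% PI for y:    {y_hat:.3f} +/- {pi_half:.3f}")
```

The prediction interval is always the wider of the two because its standard error carries the extra "1 +" term for the variability of a single new observation around its mean. Both intervals are narrowest at x_p = x̄.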
56 Another example about simple linear regression
57
58 SAS printout for fire damage linear regression
59 Least squares model for the fire damage data
60 Test Your Understanding
Suppose a student wants to know whether 1) the appearance and 2) the presentation skills of a lecturer
affect students’ overall satisfaction with the course. He collects a random sample of 10 observations and
runs two separate simple linear regression analyses, which give R² values of 1) 0.68
and 2) 0.55, respectively.
Assume the regression model assumptions are all satisfied and the t-tests for the two slopes β1 are both
significant. What can he conclude?
1) The appearance of the lecturer is a more important factor than presentation skills in
explaining students’ overall satisfaction with the course.
2) The appearance and presentation skills of a lecturer explain 68% and 55%, respectively, of the
variation in students’ overall satisfaction with the course.
3) Both of the above.
4) Neither of the above.
61 Regression Through the Origin (Optional)
62
63
64
65
66 SPSS regression through origin printout for Example 3.6
67 MINITAB scatterplot for data in Example 3.6
68 SPSS spreadsheet with 95% prediction intervals
69 Using a straight line to approximate a curvilinear relationship when the true relationship passes through the origin
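For the model through the origin, y = β1x + ε, the least squares slope is Σxy / Σx² and the error variance estimate uses n − 1 degrees of freedom, since only one parameter is estimated. The sketch below illustrates this. The data for Example 3.6 is not reproduced in these notes, so the four points used here are made up purely for illustration.

```python
import math


def fit_through_origin(x, y):
    """Least squares fit of y = b1*x (no intercept). Returns (b1, sse, s)."""
    n = len(x)
    b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
    sse = sum((yi - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 1))  # n - 1 df: only the slope is estimated
    return b1, sse, s


# Hypothetical data (NOT from Example 3.6), roughly following y = 2x.
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [2.1, 3.9, 6.2, 7.8]

slope, sse, s = fit_through_origin(x_demo, y_demo)
print(f"slope = {slope:.3f}, SSE = {sse:.4f}, s = {s:.4f}")
```

Note that the sums are taken about zero rather than about the sample means, which is what forces the fitted line through the origin.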
70 Fitting Regression Model with Population Data
http://stattrek.com/regression/regression-example.aspx?Tutorial=AP
71 Do University Administrators Deserve Their Raises?

Administrator   Raise     Average Rating
1               $18,000   2.76
2               $16,700   1.52
3               $10,608   3.10
4               $10,268   3.83
5               $9,795    2.84
6               $9,513    2.10
7               $8,459    2.38
8               $6,099    3.59
9               $4,557    4.11
10              $3,751    3.14
11              $3,718    3.64
12              $3,652    3.36
13              $3,227    2.92
14              $2,808    3.00

Ratings (5-pt scale)   Expected Raise
Very Poor   1.00       $15,939
            1.50       $13,960
Poor        2.00       $11,980
            2.50       $10,001
Average     3.00       $8,021
            3.50       $6,042
Good        4.00       $4,062
            4.50       $2,083
Very Good   5.00       $103