
3 Simple Linear Regression

Uploaded by Wu Chun Ricardo
Copyright © All Rights Reserved

Simple Linear Regression

Dr. William Lau


Tel: 3943 8572
[email protected]
Why Regression Analysis?
• Frank Schmidt and John Hunter (1998) reviewed all the relevant HR research of the previous 85 years and concluded that:
  • In general, the top 16% of employees are 19% more productive than average employees.
  • For professional jobs or management positions, the top performers are 48% more productive.
  • For programmers, the top performers are 5 times more productive than the average, and 10 times more productive than those at the bottom.

• Professor Jeffrey Pfeffer of Stanford used regression analysis to help a company replace its existing staff with higher-quality hires, which increased the company's sales by 20% within a year!
A Case Study at Eastern Connecticut State University, USA
The Dead Grandmother / Exam Syndrome
Over 20 years of teaching at a university in the U.S., Prof. Mike Adams found that:
• The week before an exam is an extremely dangerous time for the relatives, particularly the grandmothers, of university students!
• "A student's grandmother is far more likely to die suddenly just before the student takes an exam, than at any other time of the year!"

                                       No Exam   Mid-Term   Final
Family Death Rate (per 100 students)   0.054     0.574      1.042
Is it grade dependent…?
           A      B      C      D      F      Mean
No Exam    0.04   0.07   0.05   0.05   0.06   0.054
Mid-term   0.06   0.21   0.49   0.86   1.25   0.574
Final      0.09   0.41   0.96   1.57   2.18   1.042
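As a quick arithmetic check, the Mean column above is the simple average of the five grade columns in each row. A minimal Python sketch; the dictionary layout and the `row_means` name are illustrative choices, not taken from the slides:

```python
# Family death rates per 100 students for grades A, B, C, D, F (from the table above).
DEATH_RATES = {
    "No Exam": [0.04, 0.07, 0.05, 0.05, 0.06],
    "Mid-term": [0.06, 0.21, 0.49, 0.86, 1.25],
    "Final": [0.09, 0.41, 0.96, 1.57, 2.18],
}


def row_means(table):
    """Return the arithmetic mean of each row of the table."""
    return {name: sum(rates) / len(rates) for name, rates in table.items()}


for name, mean in row_means(DEATH_RATES).items():
    print(f"{name}: {mean:.3f}")
```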

[Figure: Family Death Rate versus Grade (A to F), with separate lines for No Exam, Mid-term and Final]
Lesson Outline
1 The Straight-Line Probabilistic Model
2 Fitting the Model: The Method of Least Squares
3 Model Assumptions
4 An Estimator of σ²
5 Assessing the Utility of the Model: Making Inferences About the Slope β₁
6 The Coefficient of Determination
7 Using the Model for Estimation and Prediction
8 A Complete Example
9 Regression Through the Origin (Optional)
7
8 The straight-line model
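For reference, the straight-line probabilistic model is usually written as below, where y is the dependent variable, x the independent variable, β₀ the y-intercept, β₁ the slope, and ε the random error component. This is the standard textbook notation and is assumed, not confirmed, to match the slide's own figure.

```latex
y = \underbrace{\beta_0 + \beta_1 x}_{E(y)} + \varepsilon,
\qquad
E(y) = \beta_0 + \beta_1 x \quad \text{because } E(\varepsilon) = 0
```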
9 Fitting the model: The Method of Least Squares
10
11 Straight-line fit to data in Table 3.1
12
13
14
15
Plot of the least squares line
ŷ = -0.1 + 0.7x
16
17
SAS printout for advertising-sales regression
18
SPSS printout for advertising-sales regression
19
MINITAB printout for advertising-sales regression
20 Student GPA and Family Death Rate

GPA (Grade)   4.00 (A)   3.00 (B)   2.00 (C)   1.00 (D)   0.00 (F)   Mean
No Exam       0.04       0.07       0.05       0.05       0.06       0.054
Mid-term      0.06       0.21       0.49       0.86       1.25       0.574
Final         0.09       0.41       0.96       1.57       2.18       1.042
21 Scatterplot for the data
[Figure: Family Death Rate versus GPA, from 4.00 (A) to 0.00 (F), with separate series for No Exam, Mid-term and Final]
22
Student GPA and Family Death Rate - Final Exam
[Figure: scatterplot of Death Rate versus GPA with the fitted line f(x) = -0.534x + 2.11]
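The fitted line above can be reproduced with the least squares formulas β̂₁ = SSxy/SSxx and β̂₀ = ȳ - β̂₁x̄. A minimal stdlib-only Python sketch using the Final Exam row of the GPA table (GPA as x, death rate as y); the function name `least_squares` is illustrative:

```python
def least_squares(x, y):
    """Fit y = b0 + b1*x by ordinary least squares and return (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx          # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


gpa = [4.0, 3.0, 2.0, 1.0, 0.0]
final_rate = [0.09, 0.41, 0.96, 1.57, 2.18]
b0, b1 = least_squares(gpa, final_rate)
print(f"y-hat = {b0:.3f} + ({b1:.3f})x")  # y-hat = 2.110 + (-0.534)x
```

Here SSxy = -5.34 and SSxx = 10, so β̂₁ = -0.534 and β̂₀ = 1.042 - (-0.534)(2) = 2.11.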
23
Model assumptions

24 The probability distribution of ε
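The usual assumptions about ε are that (1) its mean is 0 for every x, (2) its variance σ² is constant for every x, (3) it is normally distributed, and (4) errors for different observations are independent. Written compactly (standard textbook form, assumed to match the slide's figure):

```latex
\varepsilon_1, \dots, \varepsilon_n \ \text{independent}, \qquad
\varepsilon_i \sim N(0, \sigma^2)
\quad\Longrightarrow\quad
y_i \sim N(\beta_0 + \beta_1 x_i,\ \sigma^2)
```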
25 Is The Method of Least Squares Always Reliable?
26 An Estimator of σ²
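The estimator of σ² divides the sum of squared residuals by n - 2, since two parameters (β₀ and β₁) are estimated. The worked numbers below use SSE = 0.03472 and n = 5 from the Final Exam Excel output; they match that output's Residual MS and Standard Error.

```latex
s^2 = \frac{\mathrm{SSE}}{n-2}
    = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2},
\qquad s = \sqrt{s^2}

% Final Exam data: SSE = 0.03472, n = 5
s^2 = \frac{0.03472}{5-2} \approx 0.011573,
\qquad s \approx 0.1076
```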
27
28 Advertising and Sales Regression Results
29 Student GPA and Family Death Rate - Final Exam Regression Results
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.99396714
R Square             0.987970675
Adjusted R Square    0.9839609
Standard Error       0.107579428
Observations         5
Coeff. Var.          10.32%

ANOVA
             df   SS        MS            F            Significance F
Regression   1    2.85156   2.85156       246.390553   0.000561986
Residual     3    0.03472   0.011573333
Total        4    2.88628

             Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept    2.11           0.083330667      25.32081028    0.000135084   1.844804628    2.375195372
GPA          -0.534         0.034019602      -15.69683258   0.000561986   -0.642265557   -0.425734443
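The ANOVA part of the Excel output can be reproduced from the raw data. A minimal stdlib-only Python sketch, assuming the Final Exam data from the GPA table; `anova_simple` is an illustrative name, and SSR is obtained as SST - SSE:

```python
def anova_simple(x, y):
    """Return the ANOVA quantities for the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual SS
    sst = sum((yi - y_bar) ** 2 for yi in y)                         # total SS
    ssr = sst - sse                                                  # regression SS
    mse = sse / (n - 2)   # residual mean square, i.e. s^2
    f_stat = ssr / mse    # regression has 1 df in simple linear regression
    return {"SSR": ssr, "SSE": sse, "SST": sst, "MSE": mse, "F": f_stat}


gpa = [4.0, 3.0, 2.0, 1.0, 0.0]
final_rate = [0.09, 0.41, 0.96, 1.57, 2.18]
for name, value in anova_simple(gpa, final_rate).items():
    print(f"{name}: {value:.6f}")
```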
30 Assessing the Utility of the Model: Making Inferences About the Slope β₁
31 Graphing the model with β₁ = 0: y = β₀ + ε
32
Slide 23 & 24
33 Sampling distribution of β̂₁
34

35 Rejection region and calculated t-value for testing whether the slope β₁ = 0
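The t statistic for H0: β₁ = 0 is t = β̂₁ / s_β̂₁, where s_β̂₁ = s/√SSxx. A minimal stdlib-only Python sketch on the Final Exam data; the critical value for a two-tailed test at α = 0.05 with n - 2 = 3 df is hard-coded from a t table (an assumption, not computed), and `slope_t_test` is an illustrative name:

```python
import math


def slope_t_test(x, y):
    """Return (b1, standard error of b1, t statistic) for H0: beta1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))   # estimate of sigma
    se_b1 = s / math.sqrt(ss_xx)   # standard error of b1
    return b1, se_b1, b1 / se_b1


T_CRIT = 3.182  # t_{0.025} with df = 3, taken from a t table (assumed)

gpa = [4.0, 3.0, 2.0, 1.0, 0.0]
final_rate = [0.09, 0.41, 0.96, 1.57, 2.18]
b1, se_b1, t_stat = slope_t_test(gpa, final_rate)
print(f"b1 = {b1:.3f}, SE = {se_b1:.6f}, t = {t_stat:.3f}")
print("Reject H0: beta1 = 0" if abs(t_stat) > T_CRIT else "Do not reject H0")
```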
36 Advertising & Sales Regression Results (SAS)
37
38 Student GPA and Family Death Rate - Final Exam Regression Results
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.99396714
R Square             0.987970675
Adjusted R Square    0.9839609
Standard Error       0.107579428
Observations         5

ANOVA
             df   SS        MS            F            Significance F
Regression   1    2.85156   2.85156       246.390553   0.000561986
Residual     3    0.03472   0.011573333
Total        4    2.88628

             Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept    2.11           0.083330667      25.32081028    0.000135084   1.844804628    2.375195372
GPA          -0.534         0.034019602      -15.69683258   0.000561986   -0.642265557   -0.425734443
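The Lower 95% and Upper 95% columns for GPA are β̂₁ ± t_{0.025} · s_β̂₁ with n - 2 = 3 df. A minimal stdlib-only Python sketch; the tabled value t_{0.025,3} ≈ 3.182446305 is hard-coded (assumed, not computed), and `slope_confidence_interval` is an illustrative name:

```python
import math


def slope_confidence_interval(x, y, t_crit):
    """Return (lower, upper) = b1 -/+ t_crit * s / sqrt(SSxx)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    half_width = t_crit * math.sqrt(sse / (n - 2)) / math.sqrt(ss_xx)
    return b1 - half_width, b1 + half_width


T_025_DF3 = 3.182446305  # t_{0.025} with 3 df, from a t table (assumed)

gpa = [4.0, 3.0, 2.0, 1.0, 0.0]
final_rate = [0.09, 0.41, 0.96, 1.57, 2.18]
lower, upper = slope_confidence_interval(gpa, final_rate, T_025_DF3)
print(f"95% CI for beta1: ({lower:.6f}, {upper:.6f})")
```

Because the whole interval lies below 0, it agrees with the t-test's conclusion that β₁ ≠ 0.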
39 The Coefficient of Determination
40 A comparison of the sum of squares of deviations for two models
41 Interpretation of R²

About 100(R²)% of the sample variation in y can be explained by using x to predict y in the straight-line model.
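R² compares the residual sum of squares with the total variation about ȳ: R² = 1 - SSE/SSyy. A minimal stdlib-only Python sketch on the Final Exam data (`coefficient_of_determination` is an illustrative name); it reproduces the R Square and Multiple R rows of the Excel output, where Multiple R is √R²:

```python
import math


def coefficient_of_determination(x, y):
    """Return R^2 = 1 - SSE / SSyy for the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / ss_yy


gpa = [4.0, 3.0, 2.0, 1.0, 0.0]
final_rate = [0.09, 0.41, 0.96, 1.57, 2.18]
r2 = coefficient_of_determination(gpa, final_rate)
multiple_r = math.sqrt(r2)  # Excel's "Multiple R" is |r| = sqrt(R^2)
print(f"R^2 = {r2:.4f}, Multiple R = {multiple_r:.4f}")
```

So about 98.8% of the sample variation in the Final Exam death rate is explained by GPA in this model.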
42
43 Portion of SPSS printout for advertising-sales regression
44 Student GPA and Family Death Rate - Final Exam Regression Results
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.99396714
R Square             0.987970675
Adjusted R Square    0.9839609
Standard Error       0.107579428
Observations         5

ANOVA
             df   SS        MS            F            Significance F
Regression   1    2.85156   2.85156       246.390553   0.000561986
Residual     3    0.03472   0.011573333
Total        4    2.88628

             Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept    2.11           0.083330667      25.32081028    0.000135084   1.844804628    2.375195372
GPA          -0.534         0.034019602      -15.69683258   0.000561986   -0.642265557   -0.425734443
45 MINITAB graph of simple linear model relating price (y) to floor height (x)
46 Using the Model for Estimation and Prediction
47 Estimated mean value and predicted individual value of sales revenue y for x = 4
48
49
50
51
SAS printout showing 95% confidence intervals for E(y)
52
SAS printout showing 95% prediction intervals for y
53 Error of estimating the mean value of y for a given value of x
54 Error of predicting a future value of y for a given value of x
55 Comparison of widths of 95% confidence and prediction intervals
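The two intervals share the centre ŷ and differ only by the extra "1 +" under the square root, which accounts for the error of a single new observation: ŷ ± t·s·√(1/n + (x_p - x̄)²/SSxx) for E(y), and ŷ ± t·s·√(1 + 1/n + (x_p - x̄)²/SSxx) for an individual y. A minimal stdlib-only Python sketch on the Final Exam data; the point x_p = 2.5 is a hypothetical GPA chosen only for illustration, t_{0.025,3} is hard-coded from a t table, and `interval_at` is an illustrative name:

```python
import math


def interval_at(x, y, x_p, t_crit):
    """Return (y_hat, ci_half_width, pi_half_width) at x = x_p.

    ci: confidence interval for the mean E(y) at x_p
    pi: prediction interval for a single new y at x_p
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_p
    factor = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    ci_half = t_crit * s * math.sqrt(factor)
    pi_half = t_crit * s * math.sqrt(1 + factor)
    return y_hat, ci_half, pi_half


T_025_DF3 = 3.182446305  # t_{0.025} with 3 df, from a t table (assumed)
X_P = 2.5                # hypothetical GPA, for illustration only

gpa = [4.0, 3.0, 2.0, 1.0, 0.0]
final_rate = [0.09, 0.41, 0.96, 1.57, 2.18]
y_hat, ci_half, pi_half = interval_at(gpa, final_rate, X_P, T_025_DF3)
print(f"y-hat = {y_hat:.3f}")
print(f"95% CI for E(y): {y_hat - ci_half:.3f} to {y_hat + ci_half:.3f}")
print(f"95% PI for y:    {y_hat - pi_half:.3f} to {y_hat + pi_half:.3f}")
```

The prediction interval is always the wider of the two, because it adds the variance σ² of an individual observation to the uncertainty in the estimated mean.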
56 Another example about simple linear regression
57
58 SAS printout for fire damage linear regression
59 Least squares model for the fire damage data
60 Test Your Understanding
Suppose a student wants to know whether 1) the appearance and 2) the presentation skills of a lecturer affect students' overall satisfaction with the course. He collects a random sample of 10 observations and runs two separate simple linear regression analyses, which give R² values of 1) 0.68 and 2) 0.55, respectively.

Assume that the regression model assumptions are all satisfied and that the t-tests for both β₁ are significant. What can he conclude?
1) Appearance of the lecturer is a more important factor than presentation skills in explaining students' overall satisfaction with the course.
2) Appearance and presentation skills of a lecturer can explain 68% and 55%, respectively, of the variation in students' overall satisfaction with the course.
3) Both of the above.
4) Neither of the above.
61 Regression Through the Origin (Optional)
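When the model is forced through the origin (y = β₁x + ε), the least squares slope becomes β̂₁ = Σxy / Σx², and s² = SSE/(n - 1) because only one parameter is estimated. A minimal stdlib-only Python sketch; `fit_through_origin` is an illustrative name, and the two-point input is hypothetical toy data (not Example 3.6) chosen so the result is easy to check by hand:

```python
def fit_through_origin(x, y):
    """Least squares fit of y = b1 * x with no intercept.

    Returns (b1, s2) where b1 = sum(x*y) / sum(x^2) and
    s2 = SSE / (n - 1), since only one parameter is estimated.
    """
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b1 = sum_xy / sum_x2
    sse = sum((yi - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, sse / (n - 1)


# Hypothetical toy data: sum(xy) = 7, sum(x^2) = 5, so b1 = 1.4.
b1, s2 = fit_through_origin([1.0, 2.0], [1.0, 3.0])
print(f"b1 = {b1:.2f}, s^2 = {s2:.2f}")  # b1 = 1.40, s^2 = 0.20
```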
62
63
64
65
66 SPSS regression through origin printout for Example 3.6
67 MINITAB scatterplot for data in Example 3.6
68 SPSS spreadsheet with 95% prediction intervals
69 Using a straight line to approximate a curvilinear relationship when the true relationship passes through the origin
70 Fitting a Regression Model with Population Data
http://stattrek.com/regression/regression-example.aspx?Tutorial=AP
71 Do University Administrators Deserve Their Raises?

Administrator   Raise     Average Rating (5-pt scale)
1               $18,000   2.76
2               $16,700   1.52
3               $10,608   3.10
4               $10,268   3.83
5               $9,795    2.84
6               $9,513    2.10
7               $8,459    2.38
8               $6,099    3.59
9               $4,557    4.11
10              $3,751    3.14
11              $3,718    3.64
12              $3,652    3.36
13              $3,227    2.92
14              $2,808    3.00

Ratings     Score   Expected Raise
Very Poor   1.00    $15,939
            1.50    $13,960
Poor        2.00    $11,980
            2.50    $10,001
Average     3.00    $8,021
            3.50    $6,042
Good        4.00    $4,062
            4.50    $2,083
Very Good   5.00    $103

Response Rate: around 10-20%, University of South Florida


Source: Chronicle of Higher Education (January 1991)

Q1: Is there any evidence of an inverse linear relationship at 5% significance level?


Q2: Based on the results of the regression in Q1, the union computed estimated raises for different ratings, as shown in the expected-raise table above. Is there any problem with using this table for estimation?
Q3: Would you support the union’s claim of an inverse relationship between raise and rating?
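Q1 calls for a one-tailed t-test of H0: β₁ = 0 against Ha: β₁ < 0, with raise as y and average rating as x. A minimal stdlib-only Python sketch using the 14 (raise, rating) pairs transcribed from the table above; `slope_t_stat` is an illustrative name, the critical value t_{0.05} with 12 df is hard-coded from a t table (assumed), and the fitted coefficients are not guaranteed to match whatever the union used to build its expected-raise table:

```python
import math

# Raise ($) and average rating (5-pt scale) for administrators 1 to 14.
RAISES = [18000, 16700, 10608, 10268, 9795, 9513, 8459,
          6099, 4557, 3751, 3718, 3652, 3227, 2808]
RATINGS = [2.76, 1.52, 3.10, 3.83, 2.84, 2.10, 2.38,
           3.59, 4.11, 3.14, 3.64, 3.36, 2.92, 3.00]


def slope_t_stat(x, y):
    """Return (b0, b1, t) for the least squares line of y on x, testing H0: beta1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(ss_xx)
    return b0, b1, b1 / se_b1


T_CRIT_ONE_TAILED = 1.782  # t_{0.05} with df = 14 - 2 = 12, from a t table (assumed)

b0, b1, t_stat = slope_t_stat(RATINGS, RAISES)
print(f"raise-hat = {b0:.0f} + ({b1:.1f}) * rating, t = {t_stat:.2f}")
# Ha: beta1 < 0 (inverse relationship), so reject H0 when t < -T_CRIT_ONE_TAILED.
print("Reject H0" if t_stat < -T_CRIT_ONE_TAILED else "Do not reject H0")
```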
