Simple Regression
Objectives
In this chapter, you learn:
◼ How to use regression analysis to predict the value of a
dependent variable based on a value of an independent
variable.
◼ To understand the meaning of the regression coefficients
b0 and b1.
◼ The assumptions of regression analysis (Note: Sections
13.5 and 13.6 not covered).
◼ To make inferences about the slope (t-test and F-test)
(Note: t-test for the Correlation coefficient (pages 479 and 480) and section 3.8 not
covered).
Types of Relationships
[Figure: four panels, each plotting Y against X.]
Copyright © 2017 Pearson Education, Ltd. Chapter 13 - 4
[Figure: four further panels, each plotting Y against X.]
No relationship
Introduction to Regression Analysis
◼ Regression analysis is used to:
◼ Predict the value of a dependent variable based on
the value of at least one independent variable.
◼ Explain the impact of changes in an independent
variable on the dependent variable.
Dependent variable: the variable we wish to predict or explain.
Independent variable: the variable used to predict or explain the
dependent variable.
Simple Linear Regression Model
Yi = β0 + β1Xi + εi
where:
Yi = dependent variable.
β0 = population Y intercept.
β1 = population slope coefficient.
Xi = independent variable.
εi = random error term.
β0 + β1Xi is the linear component; εi is the random error component.
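To make the two components of the model concrete, the sketch below simulates observations from Yi = β0 + β1Xi + εi. The parameter values (β0 = 100, β1 = 0.1, σ = 40), the function name `simulate`, the fixed seed, and the choice of normally distributed errors are illustrative assumptions, not values taken from the chapter.

```python
import random

def simulate(beta0, beta1, xs, sigma, seed=0):
    """Draw Yi = beta0 + beta1*Xi + eps_i, with eps_i ~ Normal(0, sigma) (assumed)."""
    rng = random.Random(seed)             # fixed seed so the sketch is repeatable
    ys = []
    for x in xs:
        linear = beta0 + beta1 * x        # linear component
        eps = rng.gauss(0.0, sigma)       # random error component
        ys.append(linear + eps)
    return ys

xs = [1000, 1500, 2000, 2500]             # hypothetical X values
print(simulate(100.0, 0.1, xs, sigma=40.0))
# With sigma = 0 there is no random error, so every point lies on the line.
print(simulate(100.0, 0.1, xs, sigma=0.0))
```

Setting `sigma` to zero removes the random error component and leaves only the linear component, which is one way to see what εi contributes.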
Simple Linear Regression Model (continued)
[Figure: plot of Y against X showing the line Yi = β0 + β1Xi + εi, with slope = β1 and intercept = β0. For a given Xi, the observed value of Y differs from the predicted value of Y on the line by the random error εi for this Xi value.]
Ŷi = b0 + b1Xi
where:
Ŷi = estimated (or predicted) Y value for observation i.
b0 = estimate of the regression intercept.
b1 = estimate of the regression slope.
Xi = value of X for observation i.
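A minimal sketch of how b0 and b1 can be computed, assuming the standard least-squares formulas (b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)², b0 = Ȳ − b1X̄), which are not shown in the text above. The data are the ten houses listed in the t-test example later in this chapter; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line Y-hat = b0 + b1*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)                       # sum of squares of X
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # cross-products
    b1 = sxy / ssx                                                # estimated slope
    b0 = y_bar - b1 * x_bar                                       # estimated intercept
    return b0, b1

# House data from the chapter's example: square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

b0, b1 = fit_line(square_feet, price)
print(round(b0, 5), round(b1, 5))
```

The printed values should agree, up to rounding, with the coefficients reported in the Excel output for the house price example.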
Interpretation of the Slope and the Intercept
◼ b0 is the estimated mean value of Y when
the value of X is zero.
◼ b1 is the estimated change in the mean value of Y
as a result of a one-unit increase in X.
Simple Linear Regression
Example: Scatter Plot
House price model: Scatter Plot.
[Figure: scatter plot of House Price ($1000s), axis 0 to 450, against Square Feet, axis 0 to 3000.]
Ŷi = b0 + b1Xi
Simple Linear Regression Example:
Excel Output
The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
             df    SS            MS            F         Significance F
Regression   1     18934.9348    18934.9348    11.0848   0.01039
Residual     8     13665.5652    1708.1957
Total        9     32600.5000
[Figure: scatter plot of House Price ($1000s) against Square Feet with the regression line; slope = 0.10977, intercept = 98.248.]
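Simple Linear Regression Example: Interpreting b1
◼ b1 = 0.10977 estimates the change in the mean value of Y for a
one-unit increase in X. Because house price is measured in $1000s,
each additional square foot is associated with an increase in mean
house price of 0.10977 × $1000 = $109.77.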
[Figure: house price plotted against Square Feet, axis 0 to 3000.]
Do not try to extrapolate beyond the range of observed X’s.
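The warning about extrapolation can be built into a prediction helper. The sketch below uses the coefficients from the Excel output and takes the observed range of X (1100 to 2450 square feet) from the ten houses in the chapter's data table; the function name `predict_price` and the decision to raise an error outside that range are illustrative assumptions.

```python
B0, B1 = 98.24833, 0.10977     # intercept and slope from the Excel output
X_MIN, X_MAX = 1100, 2450      # smallest and largest observed square footage

def predict_price(square_feet):
    """Predicted price in $1000s; refuses to extrapolate outside the observed X range."""
    if not X_MIN <= square_feet <= X_MAX:
        raise ValueError(
            f"{square_feet} sq. ft. is outside the observed range {X_MIN}-{X_MAX}"
        )
    return B0 + B1 * square_feet

# 98.24833 + 0.10977 * 2000 = 317.78833, i.e. about $317,790
print(round(predict_price(2000), 2))
```

A call such as `predict_price(3000)` raises `ValueError` here, because 3000 square feet lies beyond the largest house in the sample.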
Measures of Variation
◼ Total variation is made up of two parts:
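SST = SSR + SSE
where:
$SST = \sum (Y_i - \bar{Y})^2$ = total sum of squares.
$SSR = \sum (\hat{Y}_i - \bar{Y})^2$ = regression sum of squares.
$SSE = \sum (Y_i - \hat{Y}_i)^2$ = error sum of squares.
[Figure: plot of Y against X. At a given Xi, the distance from Yi to the mean Ȳ (SST) splits into the distance from Ŷi to Ȳ (SSR) and the distance from Yi to Ŷi (SSE).]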
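As a check on the decomposition of total variation, the sketch below computes SST, SSR, and SSE for the house price data given in this chapter. The least-squares formulas used for b0 and b1 are the standard ones and are an assumption here, since they are not shown in the surviving slide text.

```python
# House data from the chapter's example: square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n

# Standard least-squares slope and intercept (assumed formulas).
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
      / sum((x - x_bar) ** 2 for x in square_feet))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in square_feet]            # predicted values

sst = sum((y - y_bar) ** 2 for y in price)            # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(price, y_hat))  # unexplained variation

print(round(sst, 4), round(ssr, 4), round(sse, 4))
print(abs(sst - (ssr + sse)) < 1e-6)                  # SST = SSR + SSE
```

The three sums should match the SS column of the ANOVA table in the Excel output up to rounding.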
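Excel Output of the Measures of Variation
From the SS column of the ANOVA table in the Excel output:
SSR = 18934.9348, SSE = 13665.5652, SST = 32600.5000.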
Coefficient of Determination, r²
◼ The coefficient of determination is the portion
of the total variation in the dependent variable
that is explained by variation in the
independent variable.
◼ The coefficient of determination is also called
r-square and is denoted as r².
$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$
note: $0 \le r^2 \le 1$
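As a worked instance of this ratio, the values in the SS column of the ANOVA table from the house price example give the following. Only the numbers reported in the Excel output are used; the percentage reading in the last line is the usual interpretation, not a quotation from the slides.

```latex
r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} \approx 0.58082
% About 58.08% of the variation in house prices is explained by
% variation in square feet.
```

This agrees with the "R Square" entry in the Regression Statistics block of the Excel output.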
Examples of Approximate r² Values
[Figure: Y against X with r² = 1.]
[Figure: Y against X with 0 < r² < 1.]
[Figure: Y against X with r² = 0. No linear relationship between X and Y.]
Standard Error of Estimate
◼ The standard deviation of the variation of
observations around the regression line is
estimated by:
$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$
Where:
SSE = error sum of squares.
n = sample size.
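The formula can be checked against the house price example using the residual sum of squares (SSE = 13665.5652) and the sample size (n = 10) reported in the Excel output. The function name `standard_error_of_estimate` is a hypothetical helper for this sketch.

```python
import math

def standard_error_of_estimate(sse, n):
    """S_YX = sqrt(SSE / (n - 2)) for simple linear regression."""
    return math.sqrt(sse / (n - 2))

s_yx = standard_error_of_estimate(13665.5652, 10)   # SSE and n from the Excel output
print(round(s_yx, 5))
```

The result should match the "Standard Error" entry in the Regression Statistics block of the Excel output up to rounding.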
Assumptions of Regression L.I.N.E.
◼ Linearity:
◼ The relationship between X and Y is linear.
◼ Independence of Errors:
◼ Error values are statistically independent.
◼ Particularly important when data are collected over a
period of time.
◼ Normality of Error:
◼ Error values are normally distributed for any given
value of X.
◼ Equal Variance (also called homoscedasticity):
◼ The probability distribution of the errors has constant
variance.
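The standard error of the regression slope coefficient (b1) is estimated by:
$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$
where:
$S_{b_1}$ = estimate of the standard error of the slope.
$S_{YX} = \sqrt{\frac{SSE}{n-2}}$ = standard error of the estimate.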
Inferences About the Slope: t-Test
Example (Recall):
House Price in $1000s (y)    Square Feet (x)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
Estimated Regression Equation:
house price = 98.25 + 0.1098 (sq.ft.)
The slope of this model is 0.1098.
Is there a relationship between the square footage of the house and its
sales price?
H0: β1 = 0
H1: β1 ≠ 0
b1 = 0.10977, Sb1 = 0.03297
$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$
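The test statistic can be reproduced from the raw house data. The sketch below assumes the standard formulas for the least-squares estimates and for the standard error of the slope, Sb1 = SYX / √Σ(Xi − X̄)². The critical value 2.3060 is the upper 2.5% point of the t distribution with 8 degrees of freedom, taken from a standard t table rather than from the slide text.

```python
import math

# House data from the chapter's example: square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
ssx = sum((x - x_bar) ** 2 for x in square_feet)

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price)) / ssx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, price))
s_yx = math.sqrt(sse / (n - 2))        # standard error of the estimate
s_b1 = s_yx / math.sqrt(ssx)           # standard error of the slope
t_stat = (b1 - 0) / s_b1               # hypothesized slope under H0 is 0

T_CRITICAL = 2.3060                    # t table value, alpha/2 = .025, d.f. = 8 (assumed)
print(round(s_b1, 5), round(t_stat, 5))
print("Reject H0" if abs(t_stat) > T_CRITICAL else "Do not reject H0")
```

Because the test is two-tailed, the comparison uses the absolute value of the test statistic.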
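d.f. = 10 − 2 = 8
[Figure: t distribution with rejection regions of area α/2 = .025 in each tail.]
Decision: Reject H0, because tSTAT = 3.329 is greater than the upper
critical value of 2.3060 (α/2 = .025, d.f. = 8).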
Inferences About the Slope: t-Test
Example (continued):
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
               Coefficients    Standard Error    t Stat     P-value
Intercept      98.24833        58.03348          1.69296    0.12892
Square Feet    0.10977         0.03297           3.32938    0.01039
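F Test for the Slope
$F_{STAT} = \frac{MSR}{MSE}$
where
$MSR = \frac{SSR}{k}$
$MSE = \frac{SSE}{n - k - 1}$
k = number of independent variables (k = 1 in simple linear regression).
FSTAT follows an F distribution with k numerator and (n − k − 1)
denominator degrees of freedom.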
Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
             df    SS            MS            F         Significance F
Regression   1     18934.9348    18934.9348    11.0848   0.01039

$F_{STAT} = \frac{MSR}{MSE} = \frac{18{,}934.9348}{1{,}708.1957} = 11.0848$
With 1 and 8 degrees of freedom. Significance F (0.01039) is the p-value for the F-Test.
F Test for the Slope (continued)
H0: β1 = 0
H1: β1 ≠ 0
α = .05
df1 = 1, df2 = 8
Critical Value: Fα = 5.32
Test Statistic: $F_{STAT} = \frac{MSR}{MSE} = 11.08$
[Figure: F distribution starting at 0. Do not reject H0 below F.05 = 5.32; reject H0 above it, where the upper-tail area is α = .05.]
Decision: Reject H0 at α = 0.05.
Conclusion: There is sufficient evidence that house size affects selling price.
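The F test can be reproduced from the ANOVA values alone. The sketch below uses SSR, SSE, n, and the critical value 5.32 as reported in this chapter. The final lines illustrate the general fact that, with a single independent variable, FSTAT equals the square of tSTAT; that link is not stated in the slides and is added here as a cross-check.

```python
ssr, sse = 18934.9348, 13665.5652   # SS column of the ANOVA table
n, k = 10, 1                        # sample size, number of independent variables

msr = ssr / k                       # mean square regression
mse = sse / (n - k - 1)             # mean square error
f_stat = msr / mse

F_CRITICAL = 5.32                   # F.05 with 1 and 8 d.f., as given in the chapter
print(round(f_stat, 4))
print("Reject H0" if f_stat > F_CRITICAL else "Do not reject H0")

# Cross-check: in simple linear regression, F_STAT equals t_STAT squared.
t_stat = 3.32938                    # t Stat for Square Feet from the Excel output
print(round(t_stat ** 2, 4))
```

Both tests therefore lead to the same decision here, and both report the same p-value (0.01039) in the Excel output.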
Pitfalls of Regression Analysis
◼ Lacking an awareness of the assumptions of least-
squares regression.
◼ Not knowing how to evaluate the assumptions of least-
squares regression.
◼ Not knowing the alternatives to least-squares regression
if a particular assumption is violated.
◼ Using a regression model without knowledge of the
subject matter.
◼ Extrapolating outside the relevant range.
◼ Concluding that a significant relationship identified always
reflects a cause-and-effect relationship.
Chapter Summary
In this chapter we discussed:
◼ How to use regression analysis to predict the value of
a dependent variable based on a value of an
independent variable.
◼ Understanding the meaning of the regression
coefficients b0 and b1.
◼ Evaluating the assumptions of regression analysis.
◼ Making inferences about the slope.
Week 12 Tutorial
Selected Practice Questions (see Levine et al., 8th edition):
Page 460: 13.1a-c, and 13.2a-d.
Page 466: 13.11, and 13.13.
Page 480: 13.40a-d, and 13.41a-e.