Simple Linear Reg Ex 1
SimpleLinearRegEx1-1
Sample Data for House Price Model
House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
SimpleLinearRegEx1-2
Graphical Presentation
• House price model: scatter plot
[Figure: scatter plot of House Price ($1000s), axis 0 to 450, against Square Feet, axis 0 to 3000]
SimpleLinearRegEx1-3
Regression Using Excel
• Tools / Data Analysis / Regression
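The same least squares fit can be reproduced outside Excel. The following is a minimal Python sketch, assuming only the ten sample observations listed earlier; the function name fit_simple_regression is hypothetical and chosen for illustration.

```python
# Sample data from the house price table (price in $1000s).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def fit_simple_regression(x, y):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


b0, b1 = fit_simple_regression(square_feet, price)
print(f"intercept b0 = {b0:.5f}, slope b1 = {b1:.5f}")
```

The printed coefficients should agree with the intercept and slope that Excel reports for this data.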
SimpleLinearRegEx1-4
Excel Output
Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)
\[ \hat{y} = 98.24833 + 0.10977\,X \]
ANOVA
df SS MS F Significance F
SimpleLinearRegEx1-5
Graphical Presentation
• House price model: scatter plot and regression line
[Figure: scatter plot of house price against square feet with the fitted regression line]
SimpleLinearRegEx1-7
Interpretation of the Slope Coefficient, b1
SimpleLinearRegEx1-8
Measures of Variation
SimpleLinearRegEx1-10
Coefficient of Determination, R^2
note: 0 ≤ R^2 ≤ 1
SimpleLinearRegEx1-11
Examples of Approximate r^2 Values
[Figure: two panels of Y against X, each labeled r^2 = 1]
SimpleLinearRegEx1-12
Examples of Approximate r^2 Values
[Figure: Y against X, labeled 0 < r^2 < 1]
SimpleLinearRegEx1-13
Examples of Approximate r^2 Values
[Figure: Y against X, labeled r^2 = 0]
r^2 = 0: no linear relationship between X and Y
SimpleLinearRegEx1-14
Excel Output
Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

\[ R^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082 \]

58.08% of the variation in house prices is explained by variation in square feet
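The sums of squares in the ANOVA table can be recomputed directly from the sample data. The sketch below is illustrative only; the variable names (sst, ssr, sse, r_squared) are assumptions, not names from the Excel output.

```python
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # $1000s

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n

# Least squares coefficients.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
      / sum((x - x_bar) ** 2 for x in square_feet))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in square_feet]

sst = sum((y - y_bar) ** 2 for y in price)               # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)              # explained by the regression
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))   # unexplained (residual)
r_squared = ssr / sst

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"R^2 = {r_squared:.5f}")
```

For a least squares fit with an intercept, SST = SSR + SSE, so the three values give a quick consistency check on the ANOVA table.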
SimpleLinearRegEx1-15
Correlation and R^2
• The coefficient of determination, R^2, for a simple regression is equal to the simple correlation squared:
\[ R^2 = r_{xy}^2 \]
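This identity can be checked numerically on the house price data. The following is a rough sketch using only the standard library; r_xy and r_squared are illustrative variable names.

```python
import math

square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # $1000s

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
s_xx = sum((x - x_bar) ** 2 for x in square_feet)
s_yy = sum((y - y_bar) ** 2 for y in price)

# Sample correlation between X and Y.
r_xy = s_xy / math.sqrt(s_xx * s_yy)

# R^2 from the fitted line, computed as 1 - SSE / SST.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, price))
r_squared = 1 - sse / s_yy

print(f"r_xy = {r_xy:.5f}, r_xy^2 = {r_xy ** 2:.5f}, R^2 = {r_squared:.5f}")
```

The correlation should match the "Multiple R" entry in the Excel output, and its square should match "R Square".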
SimpleLinearRegEx1-16
Estimation of Model Error Variance
\[ \hat{\sigma}^2 = s_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{SSE}{n - 2} \]
• The divisor is n - 2 rather than n - 1 because the simple regression model estimates two parameters, b0 and b1, instead of one
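As a quick arithmetic check, the standard error of the estimate can be recovered from the residual sum of squares in the ANOVA output. This is a minimal sketch; the variable names are illustrative.

```python
import math

sse = 13665.5652  # residual sum of squares from the ANOVA table
n = 10            # number of observations

error_variance = sse / (n - 2)   # s_e^2, with n - 2 degrees of freedom
s_e = math.sqrt(error_variance)  # standard error of the estimate

print(f"s_e^2 = {error_variance:.4f}, s_e = {s_e:.5f}")
```

The variance here is the Residual MS entry of the ANOVA table, and its square root is the "Standard Error" entry under Regression Statistics.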
SimpleLinearRegEx1-17
Excel Output
Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

se = 41.33032

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000
SimpleLinearRegEx1-18
Comparing Standard Errors
se is a measure of the variation of observed y values from the regression line
[Figure: two scatter plots of Y against X with fitted lines, one labeled "small se" and one labeled "large se"]
For example, se = $41.33K is moderately small relative to house prices in the $200K to $300K range
SimpleLinearRegEx1-19
Inferences About the Regression Model
\[ s_{b_1}^2 = \frac{s_e^2}{\sum (x_i - \bar{x})^2} = \frac{s_e^2}{(n - 1)\, s_x^2} \]
where:
sb1 = Estimate of the standard error of the least squares slope
se = Standard error of the estimate:
\[ s_e = \sqrt{\frac{SSE}{n - 2}} \]
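Both forms of the slope variance formula can be evaluated on the sample data. The sketch below assumes the se value reported by Excel (41.33032) and uses the standard library's statistics.variance for the sample variance of X; variable names are illustrative.

```python
import math
import statistics

square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
s_e = 41.33032  # standard error of the estimate, from the Excel output
n = len(square_feet)

# Form 1: divide by the sum of squared deviations of x.
x_bar = sum(square_feet) / n
s_xx = sum((x - x_bar) ** 2 for x in square_feet)
s_b1 = math.sqrt(s_e ** 2 / s_xx)

# Form 2: divide by (n - 1) times the sample variance of x.
s_x_squared = statistics.variance(square_feet)
s_b1_alt = math.sqrt(s_e ** 2 / ((n - 1) * s_x_squared))

print(f"s_b1 = {s_b1:.5f} (alternative form: {s_b1_alt:.5f})")
```

The two forms agree because the sample variance of X is the sum of squared deviations divided by n - 1.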
SimpleLinearRegEx1-20
Excel Output
Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

sb1 = 0.03297

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000
SimpleLinearRegEx1-21
Inference about the Slope: t Test
• t test for a population slope
• Is there a linear relationship between X and Y?
• Null and alternative hypotheses
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (linear relationship does exist)
• Under H0, the test statistic follows a t distribution with n - 2 degrees of freedom:
\[ t = \frac{b_1 - \beta_1}{s_{b_1}} \sim t_{(n-2)} \]
where:
b1 = regression slope coefficient
β1 = hypothesized slope
sb1 = standard error of the slope
SimpleLinearRegEx1-22
Inference about the Slope: t Test
(continued)
SimpleLinearRegEx1-23
Inferences about the Slope: t Test Example
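H0: β1 = 0
H1: β1 ≠ 0

From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039

b1 = 0.10977 and sb1 = 0.03297 are the Square Feet coefficient and its standard error.

\[ t = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938 \]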
SimpleLinearRegEx1-24
Inferences about the Slope: t Test Example
(continued)
H0: β1 = 0
H1: β1 ≠ 0

From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039

Test Statistic: t = 3.329
d.f. = 10 - 2 = 8
t8,.025 = 2.3060

[Figure: t distribution with α/2 = .025 in each tail; reject H0 below -tn-2,α/2 = -2.3060 or above tn-2,α/2 = 2.3060, do not reject H0 in between; the test statistic 3.329 lies in the upper rejection region]

Decision:
Reject H0
Conclusion:
There is sufficient evidence that square footage affects house price
SimpleLinearRegEx1-25
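The decision rule for this two-sided test can be written out as a short sketch. The function name slope_t_test is hypothetical, and the critical value 2.3060 is taken from the slide rather than computed.

```python
def slope_t_test(b1, s_b1, t_critical, beta1_null=0.0):
    """Two-sided t test for the slope.

    Returns the t statistic and whether H0: beta1 = beta1_null is rejected.
    """
    t_stat = (b1 - beta1_null) / s_b1
    reject_h0 = abs(t_stat) > t_critical
    return t_stat, reject_h0


# Coefficient and standard error for Square Feet from the Excel output;
# t_critical is t(8, .025) = 2.3060 as given on the slide.
t_stat, reject_h0 = slope_t_test(b1=0.10977, s_b1=0.03297, t_critical=2.3060)
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Because the inputs are the rounded values from the output table, the computed statistic can differ from Excel's 3.32938 in the last digit.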
Inferences about the Slope: t Test Example
(continued)
H0: β1 = 0
H1: β1 ≠ 0

From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039

P-value = 0.01039
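The reported P-value can be approximated without a statistics package by numerically integrating the t density. This is only a sketch under stated assumptions: the helper names t_pdf and two_sided_p_value are hypothetical, and Simpson's rule with 2000 steps is an arbitrary choice that is accurate enough for this check.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| > |t_stat|), using composite Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t_stat|)
    return 1 - 2 * central_area


p_value = two_sided_p_value(3.32938, df=8)
print(f"two-sided P-value = {p_value:.5f}")
```

A P-value below the chosen significance level (.05 here, matching α/2 = .025 per tail) leads to the same decision as comparing t with the critical value.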
SimpleLinearRegEx1-28
F-Test for Significance
• F test statistic:
\[ F = \frac{MSR}{MSE} \]
where
\[ MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n - k - 1} \]
and F follows an F distribution with k numerator and (n - k - 1) denominator degrees of freedom
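Applied to the house price example, the F statistic follows from the ANOVA sums of squares. The sketch below takes SSR, SSE, and the t statistic from the Excel output shown earlier; variable names are illustrative.

```python
ssr = 18934.9348  # regression sum of squares
sse = 13665.5652  # residual sum of squares
n = 10            # observations
k = 1             # number of independent variables (square feet only)

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F = {f_stat:.4f}")

# With a single independent variable, F equals the square of the slope's t statistic.
t_stat = 3.32938
print(f"t^2 = {t_stat ** 2:.4f}")
```

With k = 1 the F test and the two-sided t test on the slope are equivalent, which is why both report the same significance level of 0.01039.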
SimpleLinearRegEx1-30
F-Test for Significance
(continued)
\[ \hat{y} = 98.25 + 0.1098(2000) = 317.85 \]
The predicted price for a house with 2000 square feet is 317.85 ($1000s) = $317,850
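The same prediction can be expressed as a small function. This sketch uses the rounded coefficients from the slide (98.25 and 0.1098); the function name predict_price is hypothetical.

```python
def predict_price(square_feet, b0=98.25, b1=0.1098):
    """Predicted house price in $1000s for a given size in square feet."""
    return b0 + b1 * square_feet


predicted = predict_price(2000)
print(f"predicted price: {predicted:.2f} ($1000s) = ${predicted * 1000:,.0f}")
```

The sample contains houses between 1100 and 2450 square feet, so predictions for sizes outside that range extrapolate beyond the observed data.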
SimpleLinearRegEx1-32
Graphical Analysis
• The linear regression model is based on minimizing
the sum of squared errors
• If outliers exist, their potentially large squared errors
may have a strong influence on the fitted regression
line
• Be sure to examine your data graphically for outliers
and extreme points
• Decide, based on your model and logic, whether the
extreme points should remain or be removed
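Alongside a plot, listing the residuals by size is one simple way to see which observations sit farthest from the fitted line. The following is a rough sketch on the house price data; it ranks points only and does not decide whether any of them is an outlier.

```python
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # $1000s

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
      / sum((x - x_bar) ** 2 for x in square_feet))
b0 = y_bar - b1 * x_bar

# Residual = observed price minus fitted price.
residuals = [y - (b0 + b1 * x) for x, y in zip(square_feet, price)]

# Rank observations by absolute residual, largest first.
ranked = sorted(zip(square_feet, price, residuals),
                key=lambda row: abs(row[2]), reverse=True)
for x, y, e in ranked[:3]:
    print(f"square feet = {x}, price = {y}, residual = {e:+.2f}")
```

Points at the top of this list contribute the most to the sum of squared errors, so they are the natural ones to inspect on the scatter plot.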
SimpleLinearRegEx1-33
Summary
• Introduced the linear regression model
• Reviewed correlation and the assumptions of linear
regression
• Discussed estimating the simple linear regression
coefficients
• Described measures of variation
• Described inference about the slope
• Addressed estimation of mean values and prediction
of individual values
SimpleLinearRegEx1-34