Linear Regression Example Data: House Price in $1000s (Y) Square Feet (X)

This document describes linear regression analysis performed on a housing data set. It includes a scatterplot of house price versus square feet, the regression equation calculated in Excel, interpretation of the regression coefficients, an example of making a prediction, and measures of goodness of fit including the coefficient of determination and standard error of the estimate.

Linear Regression Example Data

House Price in $1000s (Y)   Square Feet (X)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700
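The slides that follow compute the regression in Excel; as a minimal sketch (not part of the original workflow), the least-squares coefficients can also be computed directly from the table above in Python:

```python
# Minimal sketch: least-squares fit of house price (Y, $1000s) on square feet (X),
# using the 10 observations from the table above.
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # square feet
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]            # price, $1000s

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least squares: b1 = Sxy / Sxx, b0 = y_bar - b1 * x_bar
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

print(f"house price = {b0:.5f} + {b1:.5f} (square feet)")
```

This reproduces the coefficients the Excel output reports for this data set (b0 ≈ 98.24833, b1 ≈ 0.10977).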
Linear Regression Example
Scatterplot
• House price model: scatter plot

[Scatter plot: House Price ($1000s), 0–450, versus Square Feet, 0–3000]
Linear Regression Example
Using Excel

Tools → Data Analysis → Regression
Linear Regression Example
Excel Output

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA
             df   SS           MS           F         Significance F
Regression    1   18934.9348   18934.9348   11.0848   0.01039
Residual      8   13665.5652   1708.1957
Total         9   32600.5000

             Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept    98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet  0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
Linear Regression Example
Graphical Representation
• House price model: scatter plot and regression line

[Scatter plot with fitted line: House Price ($1000s) versus Square Feet; the line has intercept = 98.248 and slope = 0.10977]

house price = 98.24833 + 0.10977 (square feet)
Linear Regression Example
Interpretation of b0

house price = 98.24833 + 0.10977 (square feet)

• b0 is the estimated mean value of Y when the value of X is zero (if X = 0 is in the range of observed X values)
• Because the square footage of the house cannot be 0, the Y intercept has no practical application.
Linear Regression Example
Interpretation of b1

house price = 98.24833 + 0.10977 (square feet)

• b1 measures the estimated change in the mean value of Y as a result of a one-unit change in X
• Here, b1 = .10977 tells us that the mean value of a house increases by .10977($1000) = $109.77, on average, for each additional square foot of size
Linear Regression Example
Making Predictions

Predict the price for a house with 2000 square feet:

house price = 98.25 + 0.1098 (sq. ft.)
            = 98.25 + 0.1098(2000)
            = 317.85

The predicted price for a house with 2000 square feet is 317.85($1000s) = $317,850
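The same prediction can be sketched in code, using the rounded coefficients from the slide:

```python
# Sketch: predicting the price of a 2000 sq. ft. house using the rounded
# coefficients shown on the slide (98.25 and 0.1098).
b0, b1 = 98.25, 0.1098

sq_ft = 2000
predicted = b0 + b1 * sq_ft          # price in $1000s
print(f"${predicted * 1000:,.2f}")   # prints $317,850.00
```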
Linear Regression Example
Making Predictions
• When using a regression model for prediction, only predict within the relevant range of data

[Scatter plot: the relevant range for interpolation spans the observed Square Feet values; do not try to extrapolate beyond the range of observed X's]

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.
Measures of Variation
Total variation is made up of two parts:

SST = SSR + SSE
(Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)

SST = Σ(Yi − Ȳ)²    SSR = Σ(Ŷi − Ȳ)²    SSE = Σ(Yi − Ŷi)²

where:
Ȳ  = mean value of the dependent variable
Yi = observed values of the dependent variable
Ŷi = predicted value of Y for the given Xi value
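As a sketch, the decomposition can be verified numerically for the housing data (the snippet refits the line so it is self-contained):

```python
# Sketch: verify SST = SSR + SSE for the housing data.
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]  # predicted values

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares

print(round(sst, 4), round(ssr, 4), round(sse, 4))
# SST = 32600.5, SSR ≈ 18934.9348, SSE ≈ 13665.5652, and SSR + SSE = SST
```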
Coefficient of Determination, r2
• The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
• The coefficient of determination is also called r-squared and is denoted as r2

r2 = SSR / SST = regression sum of squares / total sum of squares

0 ≤ r2 ≤ 1
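For instance, plugging in the sums of squares from the ANOVA table in the Excel output (a one-line sketch):

```python
# Sketch: r-squared as the ratio of regression to total sum of squares,
# using the values from the document's ANOVA table.
ssr = 18934.9348  # regression sum of squares
sst = 32600.5000  # total sum of squares

r_squared = ssr / sst
print(round(r_squared, 5))  # prints 0.58082
```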
Linear Regression Example
Coefficient of Determination, r2

From the Excel output: SSR = 18934.9348, SST = 32600.5000

r2 = SSR / SST = 18934.9348 / 32600.5000 = 0.58082

58.08% of the variation in house prices is explained by variation in square feet
Standard Error of Estimate

• The standard deviation of the variation of observations around the regression line is estimated by

SYX = √( SSE / (n − 2) ) = √( Σ(Yi − Ŷi)² / (n − 2) )

where:
SSE = error sum of squares
n = sample size
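A quick sketch of the formula, using the SSE value from the ANOVA table in the Excel output:

```python
import math

# Sketch: standard error of the estimate from the document's ANOVA values
# (SSE = 13665.5652, n = 10).
sse = 13665.5652  # error sum of squares
n = 10            # sample size

s_yx = math.sqrt(sse / (n - 2))
print(round(s_yx, 5))  # prints 41.33032, matching the Excel "Standard Error"
```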
Linear Regression Example
Standard Error of Estimate

From the Excel output (Regression Statistics, Standard Error): SYX = 41.33032
Comparing Standard Errors
SYX is a measure of the variation of observed Y values from the regression line

[Two scatter plots: points tightly clustered about the line illustrate a small SYX; widely scattered points illustrate a large SYX]

The magnitude of SYX should always be judged relative to the size of the Y values in the sample data
Inferences About the Slope: t Test
• t test for a population slope
  – Is there a linear relationship between X and Y?
• Null and alternative hypotheses
  – H0: β1 = 0 (no linear relationship)
  – H1: β1 ≠ 0 (linear relationship does exist)
• Test statistic:

t = (b1 − β1) / Sb1,   d.f. = n − 2

where:
b1 = regression slope coefficient
β1 = hypothesized slope
Sb1 = standard error of the slope
Inferences About the Slope: t Test Example

Estimated regression equation:
house price = 98.25 + 0.1098 (sq. ft.)

The slope of this model is 0.1098.

Is there a relationship between the square footage of the house and its sales price?
Inferences About the Slope: t Test Example

• H0: β1 = 0
• H1: β1 ≠ 0

From Excel output:
             Coefficients   Standard Error   t Stat    P-value
Intercept    98.24833       58.03348         1.69296   0.12892
Square Feet  0.10977        0.03297          3.32938   0.01039

t = (b1 − β1) / Sb1 = (0.10977 − 0) / 0.03297 = 3.32938
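The arithmetic above can be sketched directly from the coefficients table:

```python
# Sketch: the slope t statistic from the Excel coefficients table
# (b1 = 0.10977, Sb1 = 0.03297; hypothesized slope 0 under H0).
b1 = 0.10977      # estimated slope
beta1 = 0         # hypothesized value under H0
s_b1 = 0.03297    # standard error of the slope

t = (b1 - beta1) / s_b1
print(round(t, 3))  # prints 3.329, matching the Excel t Stat to rounding
```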
Inferences About the Slope: t Test Example

• H0: β1 = 0
• H1: β1 ≠ 0

Test statistic: t = 3.329
d.f. = 10 − 2 = 8
α/2 = .025, so the critical values are ±tα/2 = ±2.3060

Decision: Reject H0, since t = 3.329 > 2.3060

There is sufficient evidence that square footage affects house price
Inferences About the Slope: t Test Example

• H0: β1 = 0
• H1: β1 ≠ 0

From Excel output, the p-value for Square Feet is 0.01039:
             Coefficients   Standard Error   t Stat    P-value
Intercept    98.24833       58.03348         1.69296   0.12892
Square Feet  0.10977        0.03297          3.32938   0.01039

Decision: Reject H0, since p-value < α

There is sufficient evidence that square footage affects house price.
F-Test for Significance
• F test statistic:

F = MSR / MSE

where:
MSR = SSR / k
MSE = SSE / (n − k − 1)

F follows an F distribution with k numerator degrees of freedom and (n − k − 1) denominator degrees of freedom
(k = the number of independent variables in the regression model)
F-Test for Significance
Excel Output

F = MSR / MSE = 18934.9348 / 1708.1957 = 11.0848, with 1 and 8 degrees of freedom

The p-value for the F test is Significance F = 0.01039

ANOVA
             df   SS           MS           F         Significance F
Regression    1   18934.9348   18934.9348   11.0848   0.01039
Residual      8   13665.5652   1708.1957
Total         9   32600.5000
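A sketch of the same computation from the sums of squares:

```python
# Sketch: the F statistic from the ANOVA sums of squares above
# (SSR = 18934.9348, SSE = 13665.5652, n = 10, k = 1 independent variable).
ssr, sse = 18934.9348, 13665.5652
n, k = 10, 1

msr = ssr / k            # mean square regression
mse = sse / (n - k - 1)  # mean square error
f = msr / mse
print(round(f, 4))  # prints 11.0848
```

With one independent variable, F = t² for the slope test (3.32938² ≈ 11.08), so the two tests agree.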
F-Test for Significance

• H0: β1 = 0
• H1: β1 ≠ 0
• α = .05
• df1 = 1, df2 = 8

Test statistic: F = MSR / MSE = 11.08
Critical value: F.05 = 5.32

Decision: Reject H0 at α = 0.05, since F = 11.08 > 5.32

Conclusion: There is sufficient evidence that house size affects selling price
Residual Analysis

ei = Yi − Ŷi

• The residual for observation i, ei, is the difference between its observed and predicted value
• Check the L.I.N.E. assumptions of regression by examining the residuals:
  – Examine for Linearity assumption
  – Evaluate Independence assumption
  – Evaluate Normal distribution assumption
  – Examine Equal variance for all levels of X
• Graphical analysis of residuals
  – Can plot residuals vs. X
Residual Analysis for Linearity

[Two panels: when the Y-vs-X scatter is curved, the residual plot shows a curved, non-random pattern (not linear); when the scatter follows a straight line, the residuals scatter randomly about zero (linear)]
Residual Analysis for Independence

[Two panels: residuals that trend or cycle with X indicate dependence (not independent); residuals scattered randomly about zero indicate independence]
Checking for Normality

• Examine the stem-and-leaf display of the residuals
• Examine the box-and-whisker plot of the residuals
• Examine the histogram of the residuals
• Construct a normal probability plot of the residuals
Residual Analysis for Equal Variance

[Two panels: residuals whose spread grows with X (a fan shape) indicate unequal variance; residuals with constant spread across X indicate equal variance]
Linear Regression Example
Excel Residual Output

RESIDUAL OUTPUT
Observation   Predicted House Price   Residuals
 1            251.92316                -6.92316
 2            273.87671                38.12329
 3            284.85348                -5.85348
 4            304.06284                 3.93716
 5            218.99284               -19.99284
 6            268.38832               -49.38832
 7            356.20251                48.79749
 8            367.17929               -43.17929
 9            254.66736                64.33264
10            284.85348               -29.85348

[Residual plot: residuals vs. Square Feet scatter randomly about zero]

Does not appear to violate any regression assumptions
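As a sketch, the residuals in the table above can be reproduced by refitting the line:

```python
# Sketch: residuals e_i = y_i - yhat_i for the housing data, refit from scratch.
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print([round(e, 5) for e in residuals])
# First residual ≈ -6.92316, matching the Excel residual output.
# Residuals from an OLS fit with an intercept always sum to zero.
```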
Measuring Autocorrelation: The Durbin-Watson Statistic

• Used when data are collected over time to detect if autocorrelation is present
• Autocorrelation exists if residuals in one time period are related to residuals in another period
Autocorrelation
• Autocorrelation is correlation of the errors (residuals) over time

[Residual plot over time: here, the residuals suggest a cyclic pattern, not random]

• Violates the regression assumption that residuals are statistically independent
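The slides name the statistic but do not define it; its standard form is DW = Σ(et − et−1)² / Σet², with values near 2 suggesting no first-order autocorrelation. A sketch using the residuals from the housing model (treating row order as "time" purely for illustration, since this is cross-sectional data):

```python
# Sketch: Durbin-Watson statistic (standard definition) on the housing-model
# residuals; row order is treated as time purely for illustration.
residuals = [-6.92316, 38.12329, -5.85348, 3.93716, -19.99284,
             -49.38832, 48.79749, -43.17929, 64.33264, -29.85348]

num = sum((residuals[t] - residuals[t - 1]) ** 2
          for t in range(1, len(residuals)))
den = sum(e ** 2 for e in residuals)
dw = num / den  # always falls between 0 and 4; near 2 means no autocorrelation
print(round(dw, 2))
```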
Strategies for Avoiding the Pitfalls of Regression
• Start with a scatter plot of Y versus X to observe possible relationship
• Perform residual analysis to check the assumptions
  – Plot the residuals vs. X to check for violations of assumptions such as equal variance
  – Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality
Strategies for Avoiding the Pitfalls of Regression
• If there is violation of any assumption, use alternative methods or models
• If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals
• Avoid making predictions or forecasts outside the relevant range
