
Chapter 13

Simple Linear Regression

Copyright © 2017 Pearson Education, Ltd. Chapter 13 - 1

Objectives
In this chapter, you learn:
◼ How to use regression analysis to predict the value of a
dependent variable based on a value of an independent
variable.
◼ To understand the meaning of the regression coefficients
b0 and b1.
◼ The assumptions of regression analysis (Note: Sections 13.5 and 13.6 are not covered).
◼ To make inferences about the slope (t-test and F-test) (Note: the t-test for the correlation coefficient (pages 479 and 480) and Section 13.8 are not covered).

Correlation vs. Regression


◼ A scatter plot can be used to show the
relationship between two variables.
◼ Correlation analysis is used to measure the
strength of the association (linear relationship)
between two variables.
◼ Correlation is only concerned with strength of the
relationship.
◼ No causal effect is implied with correlation.
◼ Scatter plots were first presented in Chapter 2.
◼ Correlation was first presented in Chapter 3.

Types of Relationships

Linear relationships vs. curvilinear relationships:

[Figure: four scatter plots — positive and negative linear relationships (left column) and two curvilinear relationships (right column).]

Types of Relationships (continued)

Strong relationships vs. weak relationships:

[Figure: four scatter plots — points tightly clustered around a line (left column) versus loosely scattered points (right column).]

Types of Relationships (continued)

No relationship:

[Figure: two scatter plots in which Y shows no pattern across X.]
Introduction to
Regression Analysis
◼ Regression analysis is used to:
◼ Predict the value of a dependent variable based on
the value of at least one independent variable.
◼ Explain the impact of changes in an independent
variable on the dependent variable.
Dependent variable: the variable we wish to predict or explain.
Independent variable: the variable used to predict or explain the dependent variable.


Simple Linear Regression Model


◼ Only one independent variable, X.
◼ Relationship between X and Y is
described by a linear function.
◼ Changes in Y are assumed to be
related to changes in X.


Simple Linear Regression Model

Yi = β0 + β1Xi + εi

where:
Yi = dependent variable.
Xi = independent variable.
β0 = population Y intercept.
β1 = population slope coefficient.
εi = random error term.

β0 + β1Xi is the linear component; εi is the random error component.

Simple Linear Regression Model (continued)

[Figure: the line Yi = β0 + β1Xi + εi plotted in the XY plane. The intercept β0 is where the line crosses the Y axis and the slope is β1; for a given Xi, the random error εi is the vertical distance between the observed value of Y and the predicted value of Y on the line.]

Simple Linear Regression Equation


(Prediction Line)
The simple linear regression equation provides an estimate of the population regression line:

Ŷi = b0 + b1Xi

where:
Ŷi = estimated (or predicted) Y value for observation i.
b0 = estimate of the regression intercept.
b1 = estimate of the regression slope.
Xi = value of X for observation i.


The Least Squares Method


◼ b0 and b1 are obtained by finding the values
that minimize the sum of the squared
differences between Y and Ŷ:

min  (Yi −Ŷi )2 = min  (Yi − (b 0 + b1Xi ))2


◼ The coefficients b0 and b1, and other
regression results in this chapter, will be
found using Excel.
◼ The calculations for b0 and b1 are not shown
here but are available in the text.
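The slides use Excel for the computation; as a cross-check (not part of the original slides), the least-squares formulas b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and b0 = Ȳ − b1X̄ can be applied in a few lines of Python to the house-price data used later in this chapter:

```python
# Least-squares estimates for the chapter's house-price example, computed
# from the textbook formulas b1 = SSXY / SSX and b0 = Ybar - b1 * Xbar.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # in $1,000s

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n

ssx = sum((x - x_bar) ** 2 for x in square_feet)   # sum of squared X deviations
ssxy = sum((x - x_bar) * (y - y_bar)
           for x, y in zip(square_feet, price))    # sum of cross-products

b1 = ssxy / ssx           # slope
b0 = y_bar - b1 * x_bar   # intercept

print(round(b0, 5), round(b1, 5))  # 98.24833 0.10977, matching the Excel output
```

The printed coefficients agree with the Excel regression output shown later in the chapter.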
Interpretation of the
Slope and the Intercept
◼ b0 is the estimated mean value of Y when
the value of X is zero.

◼ b1 is the estimated change in the mean


value of Y as a result of a one-unit increase
in X.


Simple Linear Regression Example


◼ A real estate agent wishes to examine the
relationship between the selling price of a home
and its size (measured in square feet).

◼ A random sample of 10 houses is selected.


◼ Dependent variable (Y) = house price in $1,000s.

◼ Independent variable (X) = square feet.


Simple Linear Regression


Example: Data
House Price in $1000s (Y)    Square Feet (X)
245                          1,400
312                          1,600
279                          1,700
308                          1,875
199                          1,100
219                          1,550
405                          2,350
324                          2,450
319                          1,425
255                          1,700

Simple Linear Regression
Example: Scatter Plot
House price model: Scatter Plot.
[Figure: scatter plot of house price ($1000s, vertical axis, 0–450) versus square feet (horizontal axis, 0–3,000).]


Simple Linear Regression:


Using Excel Data Analysis Function
1. Choose Data.
2. Choose Data Analysis.
3. Choose Regression.

Ŷi = b0 + b1Xi


Simple Linear Regression:


Using Excel Data Analysis Function

Enter Y range and X range and desired options.

Simple Linear Regression Example:
Excel Output
Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

The regression equation is:

house price = 98.24833 + 0.10977 (square feet)

ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000

             Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept    98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet  0.10977       0.03297         3.32938  0.01039  0.03374    0.18580


Simple Linear Regression Example:


Graphical Representation

House price model: Scatter Plot and Prediction Line.

[Figure: scatter plot of house price ($1000s) versus square feet with the fitted prediction line; intercept = 98.248, slope = 0.10977.]

house price = 98.24833 + 0.10977 (square feet)


Simple Linear Regression Example:


Interpretation of b0
house price = 98.24833 + 0.10977 (square feet)

◼ b0 is the estimated mean value of Y when the


value of X is zero (if X = 0 is in the range of
observed X values).
◼ Because a house cannot have a square footage
of 0, b0 has no practical application.

Simple Linear Regression Example:
Interpreting b1

house price = 98.24833 + 0.10977 (square feet)

◼ b1 estimates the change in the mean


value of Y as a result of a one-unit
increase in X.
◼ Here, b1 = 0.10977 tells us that the mean selling price of a house increases by 0.10977 × $1,000 = $109.77 for each additional square foot of size.

Simple Linear Regression Example:


Making Predictions
Predict the price for a house
with 2,000 square feet:

house price = 98.25 + 0.1098 (sq. ft.)
            = 98.25 + 0.1098 × 2,000
            = 317.85
The predicted price for a house with 2,000
square feet is 317.85($1,000s) = $317,850
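The arithmetic above can be checked in a one-line sketch using the rounded coefficients from the slide:

```python
# Predict the price of a 2,000-square-foot house from the fitted line
# house price = 98.25 + 0.1098 (sq. ft.); the result is in $1,000s.
b0, b1 = 98.25, 0.1098
predicted = b0 + b1 * 2000
print(round(predicted, 2))  # 317.85, i.e. $317,850
```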


Simple Linear Regression Example:


Making Predictions
◼ When using a regression model for prediction,
only predict within the relevant range of data.
[Figure: scatter plot with the prediction line; the relevant range for interpolation spans the observed X values (about 1,100 to 2,450 square feet). Do not try to extrapolate beyond the range of observed X's.]
Measures of Variation
◼ Total variation is made up of two parts:

SST = SSR + SSE

(Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)

SST = Σ(Yi − Ȳ)²    SSR = Σ(Ŷi − Ȳ)²    SSE = Σ(Yi − Ŷi)²

where:
Ȳ = mean value of the dependent variable.
Yi = observed value of the dependent variable.
Ŷi = predicted value of Y for the given Xi value.


Measures of Variation (continued)

◼ SST = total sum of squares (total variation).
◼ Measures the variation of the Yi values around their mean Ȳ.
◼ SSR = regression sum of squares (explained variation).
◼ Variation attributable to the relationship between X and Y.
◼ SSE = error sum of squares (unexplained variation).
◼ Variation in Y attributable to factors other than X.


Measures of Variation (continued)

[Figure: a data point Yi above the fitted line, showing the decomposition at Xi: the total deviation (Yi − Ȳ) splits into the explained part (Ŷi − Ȳ) and the unexplained part (Yi − Ŷi), giving SST = Σ(Yi − Ȳ)², SSR = Σ(Ŷi − Ȳ)², and SSE = Σ(Yi − Ŷi)².]
Excel Output Of The Measures Of Variation

SST = SSR + SSE


32,600.5000 = 18,934.9348 + 13,665.5652
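The decomposition can be verified numerically (a sketch, not part of the original slides; the line is refit from the house data so each sum of squares can be computed from its definition):

```python
# Verify SST = SSR + SSE for the house-price example.
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]                      # predicted values

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation

print(round(sst, 4), round(ssr, 4), round(sse, 4))
# 32600.5 18934.9348 13665.5652 -> SST = SSR + SSE
```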


Coefficient of Determination, r2
◼ The coefficient of determination is the portion
of the total variation in the dependent variable
that is explained by variation in the
independent variable.
◼ The coefficient of determination is also called
r-square and is denoted as r2.
r² = SSR / SST = regression sum of squares / total sum of squares

note: 0 ≤ r² ≤ 1
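As a sketch, r² for the house-price example follows directly from the sums of squares reported in the ANOVA table:

```python
# Coefficient of determination from the chapter's sums of squares.
ssr = 18934.9348   # regression sum of squares
sst = 32600.5      # total sum of squares
r_sq = ssr / sst
print(round(r_sq, 5))  # 0.58082, matching Excel's "R Square"
```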

Examples of Approximate r2 Values


r² = 1

[Figure: two scatter plots in which every point lies exactly on the fitted line, one with positive slope and one with negative slope.]

Perfect linear relationship between X and Y: 100% of the variation in Y is explained by variation in X.
Examples of Approximate r2 Values
0 < r² < 1

[Figure: two scatter plots with points scattered loosely around the fitted line.]

Weaker linear relationships between X and Y: some but not all of the variation in Y is explained by variation in X.

Examples of Approximate r2 Values

r² = 0

[Figure: a scatter plot with no pattern; the fitted line is horizontal.]

No linear relationship between X and Y: the value of Y does not depend on X. (None of the variation in Y is explained by variation in X.)


Simple Linear Regression Example:


Coefficient of Determination, r2 in Excel
r² = SSR / SST = 18,934.9348 / 32,600.5000 = 0.58082

58.08% of the variation in house prices is explained by variation in square feet.

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA
             df    SS          MS          F        Significance F
Regression   1     18934.9348  18934.9348  11.0848  0.01039
Residual     8     13665.5652  1708.1957
Total        9     32600.5000

             Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept    98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet  0.10977       0.03297         3.32938  0.01039  0.03374    0.18580

Standard Error of Estimate
◼ The standard deviation of the variation of
observations around the regression line is
estimated by:
SYX = √( SSE / (n − 2) ) = √( Σ(Yi − Ŷi)² / (n − 2) )

Where:
SSE = error sum of squares.
n = sample size.
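A one-line check of the formula with the chapter's numbers (SSE and n taken from the house-price example):

```python
# Standard error of the estimate: S_YX = sqrt(SSE / (n - 2)).
import math

sse, n = 13665.5652, 10
s_yx = math.sqrt(sse / (n - 2))
print(round(s_yx, 5))  # 41.33032, matching Excel's "Standard Error"
```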


Simple Linear Regression Example:


Standard Error of Estimate in Excel
Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

SYX = 41.33032

ANOVA
             df    SS          MS          F        Significance F
Regression   1     18934.9348  18934.9348  11.0848  0.01039
Residual     8     13665.5652  1708.1957
Total        9     32600.5000

             Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept    98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet  0.10977       0.03297         3.32938  0.01039  0.03374    0.18580


Comparing Standard Errors


SYX is a measure of the variation of observed Y values from the regression line.

[Figure: two scatter plots around the same fitted line, one with small SYX (points tight to the line) and one with large SYX (points widely scattered).]

The magnitude of SYX should always be judged relative to the size of the Y values in the sample data. For example, SYX = $41.33K is moderately small relative to house prices in the $200K–$400K range.
Assumptions of Regression L.I.N.E.
◼ Linearity:
◼ The relationship between X and Y is linear.
◼ Independence of Errors:
◼ Error values are statistically independent.
◼ Particularly important when data are collected over a
period of time.
◼ Normality of Error:
◼ Error values are normally distributed for any given
value of X.
◼ Equal Variance (also called homoscedasticity):
◼ The probability distribution of the errors has constant
variance.


Inferences About the Slope


◼ The standard error of the regression slope coefficient (b1) is estimated by:

Sb1 = SYX / √SSX = SYX / √( Σ(Xi − X̄)² )

where:
Sb1 = estimate of the standard error of the slope.
SYX = √( SSE / (n − 2) ) = standard error of the estimate.


Inferences About the Slope: t-Test


◼ t-test for a population slope:
◼ Is there a linear relationship between X and Y?
◼ Null and alternative hypotheses:
◼ H0: β1 = 0 (no linear relationship)
◼ H1: β1 ≠ 0 (linear relationship does exist)
◼ Test statistic:

tSTAT = (b1 − β1) / Sb1,   d.f. = n − 2

where:
b1 = regression slope coefficient.
β1 = hypothesized slope.
Sb1 = standard error of the slope.

Inferences About the Slope: t-Test
Example (Recall):

Square Feet (X)   House Price in $1000s (Y)
1,400             245
1,600             312
1,700             279
1,875             308
1,100             199
1,550             219
2,350             405
2,450             324
1,425             319
1,700             255

Estimated regression equation:

house price = 98.25 + 0.1098 (sq. ft.)

The slope of this model is 0.1098. Is there a relationship between the square footage of the house and its sales price?


Inferences About the Slope: t-Test


Example (continued):

From Excel output (b1 = 0.10977, Sb1 = 0.03297):

             Coefficients  Standard Error  t Stat   P-value
Intercept    98.24833      58.03348        1.69296  0.12892
Square Feet  0.10977       0.03297         3.32938  0.01039

H0: β1 = 0
H1: β1 ≠ 0

tSTAT = (b1 − β1) / Sb1 = (0.10977 − 0) / 0.03297 = 3.32938


Inferences About the Slope: t-Test


Example (continued):

H0: β1 = 0
H1: β1 ≠ 0

Test statistic: tSTAT = 3.329
d.f. = 10 − 2 = 8

[Figure: two-tailed t rejection regions with α/2 = .025 in each tail; critical values ±2.3060. The test statistic 3.329 falls in the upper rejection region.]

Decision: Reject H0. There is sufficient evidence that square footage affects house price.

Inferences About the Slope: t-Test
Example (continued):

H0: β1 = 0
H1: β1 ≠ 0

From Excel output:

             Coefficients  Standard Error  t Stat   P-value
Intercept    98.24833      58.03348        1.69296  0.12892
Square Feet  0.10977       0.03297         3.32938  0.01039

Decision: Reject H0, since p-value = 0.01039 < α = 0.05.

There is sufficient evidence that square footage affects house price.
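The slope test can be reproduced end-to-end in a short sketch (not part of the original slides; the critical value 2.3060 for d.f. = 8 comes from the t table, as on the earlier slide):

```python
# t-test for the slope of the house-price regression.
import math

x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ssx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s_yx / math.sqrt(ssx)      # standard error of the slope

t_stat = (b1 - 0) / s_b1          # hypothesized slope under H0 is 0
print(round(s_b1, 5), round(t_stat, 5))  # 0.03297 3.32938

# |tSTAT| = 3.329 > 2.3060 (t critical value, alpha = .05, d.f. = 8): reject H0.
```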


F Test for The Slope

◼ F test statistic:

FSTAT = MSR / MSE

where:
MSR = SSR / k
MSE = SSE / (n − k − 1)

FSTAT follows an F distribution with k numerator and (n − k − 1) denominator degrees of freedom.

(k = the number of independent variables in the regression model.)
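As a sketch, the F statistic for the house example can be computed from the ANOVA sums of squares shown earlier (k = 1, n = 10):

```python
# F-test statistic for the slope: FSTAT = MSR / MSE.
ssr, sse = 18934.9348, 13665.5652
n, k = 10, 1

msr = ssr / k              # mean square regression
mse = sse / (n - k - 1)    # mean square error
f_stat = msr / mse
print(round(f_stat, 4))    # 11.0848, matching the Excel ANOVA table
```

With one independent variable, FSTAT equals the square of the slope's tSTAT (3.32938² ≈ 11.0848).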


F-Test for The Slope


Excel Output

FSTAT = MSR / MSE = 18,934.9348 / 1,708.1957 = 11.0848,
with 1 and 8 degrees of freedom; the p-value for the F-test is 0.01039 (Significance F).

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA
             df    SS          MS          F        Significance F
Regression   1     18934.9348  18934.9348  11.0848  0.01039
Residual     8     13665.5652  1708.1957
Total        9     32600.5000

F Test for The Slope (continued)

H0: β1 = 0
H1: β1 ≠ 0
α = .05
df1 = 1, df2 = 8

Test statistic: FSTAT = MSR / MSE = 11.08
Critical value: Fα = F.05 = 5.32

[Figure: F distribution with the rejection region beyond F.05 = 5.32; the test statistic 11.08 falls in the rejection region.]

Decision: Reject H0 at α = 0.05.
Conclusion: There is sufficient evidence that house size affects selling price.

Confidence Interval Estimate


for the Slope
Confidence interval estimate of the slope:

b1 ± tα/2 · Sb1,   d.f. = n − 2

Excel Printout for House Prices:


Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

At 95% level of confidence, the confidence interval for


the slope is (0.0337, 0.1858).
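The interval can be rebuilt from the coefficient table in a short sketch (the critical value t.025 with d.f. = 8 is 2.3060, from the t table):

```python
# 95% confidence interval for the slope: b1 +/- t(alpha/2) * Sb1.
b1, s_b1 = 0.10977, 0.03297   # from the Excel output
t_crit = 2.3060               # t critical value, alpha = .05, d.f. = 8

lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1
print(round(lower, 4), round(upper, 4))  # 0.0337 0.1858
```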


Confidence Interval Estimate


for the Slope (continued)

             Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept    98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet  0.10977       0.03297         3.32938  0.01039  0.03374    0.18580

Since the units of the house price variable are $1,000s, we are 95% confident that the average impact on sales price is between $33.74 and $185.80 per additional square foot of house size.

This 95% confidence interval does not include 0.


Conclusion: There is a significant relationship between house price and
square feet at the .05 level of significance.

Pitfalls of Regression Analysis
◼ Lacking an awareness of the assumptions of least-
squares regression.
◼ Not knowing how to evaluate the assumptions of least-
squares regression.
◼ Not knowing the alternatives to least-squares regression
if a particular assumption is violated.
◼ Using a regression model without knowledge of the
subject matter.
◼ Extrapolating outside the relevant range.
◼ Concluding that an identified significant relationship always reflects a cause-and-effect relationship.


Strategies for Avoiding


the Pitfalls of Regression
◼ Start with a scatter plot of X vs. Y to observe the possible relationship.
◼ Perform residual analysis to check the
assumptions:
◼ Plot the residuals vs. X to check for violations of
assumptions such as homoscedasticity.
◼ Use a histogram, stem-and-leaf display, boxplot,
or normal probability plot of the residuals to
uncover possible non-normality.


Strategies for Avoiding


the Pitfalls of Regression (continued)

◼ If any assumption is violated, use alternative methods or models.
◼ Refrain from making predictions or forecasts
outside the relevant range.

Chapter Summary
In this chapter we discussed:
◼ How to use regression analysis to predict the value of
a dependent variable based on a value of an
independent variable.
◼ Understanding the meaning of the regression
coefficients b0 and b1.
◼ Evaluating the assumptions of regression analysis.
◼ Making inferences about the slope.


Week 12 Tutorial
Selected Practice Questions (see Levine et al., 8th edition):
Page 460: 13.1a-c, and 13.2a-d.
Page 466: 13.11, and 13.13.
Page 480: 13.40a-d, and 13.41a-e.


Week 12 Tutorial (Continued)

Selected Practice Questions:


Additional Question (7th edition Levine et al):
The marketing manager of a large supermarket chain has the business objective of using shelf space
most efficiently. Toward that goal, she would like to use shelf space to predict the sales of a specialty
pet food. Data are collected from a random sample of 12 equal-sized stores, with the following
results:

Store   Shelf Space (X, square metres)   Weekly Sales (Y, $)
1       0.5                              160
2       0.5                              220
3       0.5                              140
4       0.9                              190
5       0.9                              240
6       0.9                              260
7       1.4                              230
8       1.4                              270
9       1.4                              280
10      1.9                              260
11      1.9                              290
12      1.9                              310

a. Construct a scatter plot of weekly sales and shelf space.


b. Use the least squares method to determine the coefficients of intercept and slope.
c. Interpret the meaning of the slope, in this problem.
d. Predict the weekly sales of pet food for stores with 0.7 square metres of shelf space.
e. Determine the coefficient of determination, r², and interpret its meaning.
f. Determine the standard error of the estimate.
g. How useful do you think this regression model is for predicting weekly sales?
h. Can you think of other variables that might explain the variation in weekly sales?
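For self-checking parts (b), (d), and (e), the least-squares formulas from this chapter can be applied to the shelf-space data in a short Python sketch (not part of the original tutorial; the rounded values in the comment are this sketch's own computation, not an official answer key):

```python
# Least-squares fit of weekly pet-food sales on shelf space.
shelf = [0.5, 0.5, 0.5, 0.9, 0.9, 0.9, 1.4, 1.4, 1.4, 1.9, 1.9, 1.9]
sales = [160, 220, 140, 190, 240, 260, 230, 270, 280, 260, 290, 310]

n = len(shelf)
x_bar, y_bar = sum(shelf) / n, sum(sales) / n

b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(shelf, sales))
      / sum((x - x_bar) ** 2 for x in shelf))
b0 = y_bar - b1 * x_bar                      # part (b)

pred_07 = b0 + b1 * 0.7                      # part (d): sales at 0.7 sq. m

y_hat = [b0 + b1 * x for x in shelf]
sst = sum((y - y_bar) ** 2 for y in sales)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
r_sq = ssr / sst                             # part (e)

print(round(b0, 2), round(b1, 2), round(pred_07, 2), round(r_sq, 3))
# 146.17 77.73 200.58 0.669
```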

