
Cor Regression


Chapter 7

Simple Linear Regression and Correlation
Chapter Objectives
• Calculate and interpret the simple linear correlation between two
variables.
• Calculate and interpret the simple linear regression equation for
a set of data.
• Understand the assumptions behind the simple linear regression
analysis.
Scatter Plots and Correlation

• A scatter plot (or scatter diagram) is used to show the
relationship between two variables
• Correlation analysis is used to measure the strength of the
association (linear relationship) between two variables
Scatter Plot Examples
[Figure: example scatter plots of y against x showing linear relationships, curvilinear relationships, strong relationships, weak relationships, and no relationship]
Correlation Coefficient

• The population correlation coefficient, ρ (rho), measures the
strength of the association between the variables
• The sample correlation coefficient, r, is an estimate of ρ and is
used to measure the strength of the linear relationship in the
sample observations
Features of ρ and r
• Unit free
• Range between -1 and 1
• The closer r is to -1, the stronger the negative linear relationship
between the two variables
• The closer r is to 1, the stronger the positive linear relationship
between the two variables
• The closer r is to 0, the weaker the linear relationship between the
two variables
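The "unit free" and range properties can be checked numerically. The sketch below uses the standard deviation-based formula for r; the helper name `pearson_r` and the small data set are hypothetical and only serve as an illustration.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient computed from deviations about the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustration data (not taken from the slides)
x_cm = [150, 160, 170, 180, 190]
y = [52, 58, 69, 71, 85]
x_m = [v / 100 for v in x_cm]          # the same x values expressed in metres

r_cm = pearson_r(x_cm, y)
r_m = pearson_r(x_m, y)                # changing the unit of x leaves r unchanged
r_neg = pearson_r(x_cm, [-v for v in y])  # flipping the sign of y flips the sign of r

print(r_cm, r_m, r_neg)
```

Rescaling x changes the covariance term and the spread of x by the same factor, so the ratio that defines r is unaffected.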
Examples of Approximate r Values
[Figure: five scatter plots of y against x with r = -1, r = -0.6, r = 0, r = +0.3, and r = +1]
Calculating the Correlation Coefficient
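Sample correlation coefficient:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$$

or the algebraic equivalent:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}$$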
where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
Calculation Example

Tree Height (y)   Trunk Diameter (x)   xy     y²      x²
35                8                    280    1225    64
49                9                    441    2401    81
27                7                    189    729     49
33                6                    198    1089    36
60                13                   780    3600    169
21                7                    147    441     49
45                11                   495    2025    121
51                12                   612    2601    144
Σy = 321          Σx = 73              Σxy = 3142   Σy² = 14111   Σx² = 713
Calculation Example (continued)

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}} = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886$$

[Figure: scatter plot of Tree Height, y, against Trunk Diameter, x]

r = 0.886 → relatively strong positive linear association between x and y
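The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch of the algebraic formula applied to the tree data from the table; the variable names are chosen here for illustration.

```python
import math

# Tree data from the calculation example
height = [35, 49, 27, 33, 60, 21, 45, 51]      # y
diameter = [8, 9, 7, 6, 13, 7, 11, 12]         # x
n = len(height)

# Column totals used by the algebraic formula
sum_x = sum(diameter)
sum_y = sum(height)
sum_xy = sum(x * y for x, y in zip(diameter, height))
sum_x2 = sum(x * x for x in diameter)
sum_y2 = sum(y * y for y in height)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(r, 3))   # 0.886, as in the worked example
```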
Introduction to Regression Analysis
• Regression analysis is used to:
– Predict the value of a dependent variable based on the value of
at least one independent variable
– Explain the impact of changes in an independent variable on
the dependent variable
Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain the dependent
variable
Simple Linear Regression Model

• Only one independent variable, x
• Relationship between x and y is described by a linear function
• Changes in y are assumed to be caused by changes in x

Dependent variable                   Independent variable
the revenue the business generates   the number of sales people employed by a company
the revenue the business generates   the number of stores they operate
price of apartments                  size of the apartment in square feet
price of apartments                  distance of each apartment from downtown in kilometres
Consumption                          Income
Types of Regression Models
[Figure: four scatter plots illustrating a positive linear relationship, a negative linear relationship, a relationship that is not linear, and no relationship]
Population Linear Regression

The population regression model:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where:
y = Dependent variable
β0 = y intercept
β1 = Regression coefficient (slope)
x = Independent variable
ε = Random error

Linear Regression Assumption: The underlying relationship between the x
variable and the y variable must be linear
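Population Linear Regression (continued)
[Figure: the population regression line y = β0 + β1x + ε plotted against x, with intercept β0 and slope β1; at a given xi, the random error εi for this x value is the vertical distance between the observed value of y for xi and the predicted value of y for xi]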
Estimated Regression Model

The sample regression line provides an estimate of
the population regression line:

$$\hat{y}_i = b_0 + b_1 x$$

where:
ŷi = Estimated (or predicted) y value
b0 = Estimate of the regression intercept
b1 = Estimate of the regression slope
x = Independent variable

The individual random error terms ei have a mean of zero

Least Squares Criterion

• b0 and b1 are obtained by finding the values of b0 and
b1 that minimize the sum of the squared residuals

$$\sum e^2 = \sum (y - \hat{y})^2 = \sum \left(y - (b_0 + b_1 x)\right)^2$$
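The Least Squares Equation

• The formulas for b1 and b0 are:

$$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

The algebraic equivalent is:

$$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \quad \text{and} \quad b_0 = \bar{y} - b_1 \bar{x}$$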
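To see that these formulas do minimize the sum of squared residuals, the sketch below computes b0 and b1 for the tree data from the correlation example (regressing height on trunk diameter, a pairing assumed here purely for illustration) and compares the criterion at the fitted values with its value after small, arbitrary nudges. The function names are hypothetical.

```python
def sum_squared_residuals(b0, b1, x, y):
    """Least squares criterion: sum of (y - (b0 + b1*x))^2 over all points."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def least_squares(x, y):
    """Closed-form estimates using deviations about the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Tree data: trunk diameter (x) and tree height (y)
x = [8, 9, 7, 6, 13, 7, 11, 12]
y = [35, 49, 27, 33, 60, 21, 45, 51]

b0, b1 = least_squares(x, y)
best = sum_squared_residuals(b0, b1, x, y)

# Arbitrary perturbations of the intercept and slope; each should do worse
nudges = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5), (0.0, -0.5)]
worse = [sum_squared_residuals(b0 + d0, b1 + d1, x, y) for d0, d1 in nudges]

print(b0, b1, best)
print(worse)
```

Every perturbed pair gives a larger sum of squared residuals than the fitted pair, which is what the criterion requires.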
Interpretation of the Slope and the Intercept

• b0 is the estimated average value of y when the value of x is zero
• b1 is the estimated change in the average value of y as a result of
a one-unit change in x
• NB: The coefficients b0 and b1 will usually be found using
computer software such as Excel, Minitab, SPSS, or R
Simple Linear Regression Example

• A real estate agent wishes to examine the relationship between
the selling price of a home and its size (measured in square feet)
• A random sample of 10 houses is selected
– Dependent variable (y) = house price in $1000s
– Independent variable (x) = home size in square feet
Sample Data for House Price Model

House Price in $1000s (y)   Home Size in Square Feet (x)   x²          xy
245                         1400                           1960000     343000
312                         1600                           2560000     499200
279                         1700                           2890000     474300
308                         1875                           3515625     577500
199                         1100                           1210000     218900
219                         1550                           2402500     339450
405                         2350                           5522500     951750
324                         2450                           6002500     793800
319                         1425                           2030625     454575
255                         1700                           2890000     433500
Total: 2865                 17150                          30983750    5085975

Solution

$$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(5085975) - (2865)(17150)}{10(30983750) - (17150)^2} = 0.10977$$

$$b_0 = \bar{y} - b_1 \bar{x} = \frac{\sum y}{n} - b_1 \frac{\sum x}{n} = \frac{2865}{10} - 0.10977\left(\frac{17150}{10}\right) = 98.24833$$

(b0 is computed with the unrounded value of b1.)

house price = 98.24833 + 0.10977 * home size
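The solution can be checked with a short script. This is a sketch of the algebraic formulas applied to the ten houses in the sample; the variable names are illustrative only.

```python
# House data from the sample: price in $1000s (y) and size in square feet (x)
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
size = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
n = len(price)

# Totals needed by the algebraic least squares formulas
sum_x = sum(size)
sum_y = sum(price)
sum_xy = sum(x * y for x, y in zip(size, price))   # sum of products x*y
sum_x2 = sum(x * x for x in size)

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

print(sum_x, sum_y, sum_xy, sum_x2)
print(round(b1, 5), round(b0, 5))
```

Note that the intercept is computed from the unrounded slope; using the rounded 0.10977 by hand gives a slightly different value in the third decimal place.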


Excel Output

The regression equation is:

house price = 98.24833 + 0.10977 (square feet)

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
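Most of the figures in this output can be recomputed from the raw data with the standard simple-regression formulas. The sketch below is plain Python under that assumption; it does not reproduce the p-values or confidence limits, which need a t distribution, and the variable names are illustrative.

```python
import math

price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]       # y, $1000s
size = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # x, sq ft
n = len(price)

x_bar = sum(size) / n
y_bar = sum(price) / n
sxx = sum((x - x_bar) ** 2 for x in size)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(size, price))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# ANOVA decomposition: total = regression + residual
sst = sum((y - y_bar) ** 2 for y in price)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(size, price))
ssr = sst - sse

r_square = ssr / sst
multiple_r = math.sqrt(r_square)
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - 2)
mse = sse / (n - 2)                 # residual mean square, n - 2 degrees of freedom
std_error = math.sqrt(mse)          # "Standard Error" in the regression statistics
f_stat = ssr / mse                  # regression has 1 degree of freedom
se_b1 = std_error / math.sqrt(sxx)  # standard error of the slope
t_b1 = b1 / se_b1

print(round(multiple_r, 5), round(r_square, 5), round(adj_r_square, 5))
print(round(std_error, 5), round(f_stat, 4), round(se_b1, 5), round(t_b1, 5))
```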
Graphical Presentation
• House price model: scatter plot and regression line

[Figure: scatter plot of House Price ($1000s) against Square Feet with the fitted regression line; slope = 0.10977, intercept = 98.248]

house price = 98.24833 + 0.10977 (size)

Interpretation of the Slope Coefficient, b1

house price = 98.24833 + 0.10977 (square feet)

• b1 measures the estimated change in the average value of y as a
result of a one-unit change in x
– Here, b1 = 0.10977 tells us that the value of a house increases
by 0.10977($1000) = $109.77, on average, for each additional
square foot of size.
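The slope interpretation can be made concrete by evaluating the fitted equation at two sizes one square foot apart. In this sketch the function name `predict_price` and the 2,000 square foot house are hypothetical; the size was picked only because it lies inside the range of the sample data.

```python
# Fitted coefficients from the house price example (price in $1000s)
B0 = 98.24833
B1 = 0.10977

def predict_price(square_feet):
    """Estimated average house price in $1000s for a given size."""
    return B0 + B1 * square_feet

p2000 = predict_price(2000)   # hypothetical 2,000 sq ft house
p2001 = predict_price(2001)   # same house, one square foot larger

print(round(p2000, 5))             # estimated price in $1000s
print(round(p2001 - p2000, 5))     # equals the slope, i.e. about $109.77
```

The difference between the two predictions is the slope itself, regardless of the starting size, because the model is linear.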
