Linear Correlation
&
Regression Analysis
Correlation & Regression Analysis 1
Chapter Six
Linear Regression and Correlation
GOALS
When you have completed this chapter, you will be able to:
ONE
Draw a scatter diagram.
TWO
Understand and interpret the terms dependent variable and
independent variable.
THREE
Calculate and interpret the coefficient of correlation, the
coefficient of determination, and the standard error of
estimate.
FOUR
Calculate the least squares regression line and interpret
the slope and intercept values.
Correlation Analysis
Correlation Analysis is a group of statistical
techniques used to measure the strength of the
association between two variables.
Types of correlation:
a. Positive and Negative
b. Simple, Multiple, and Partial
c. Linear and Nonlinear
Correlation Analysis
A Scatter Diagram is a chart that portrays the
relationship between the two variables.
The Dependent Variable is the variable being
predicted or estimated.
The Independent Variable provides the basis for
estimation. It is the predictor variable.
The Coefficient of Correlation, r
The characteristics of the coefficient of correlation are:
It requires interval or ratio-scaled data.
It can range from -1.00 to 1.00.
Values of -1.00 or 1.00 indicate perfect correlation.
Values close to 0.0 indicate weak correlation.
Negative values indicate an inverse relationship
and positive values indicate a direct relationship.
Perfect Negative Correlation (r = -1)
[Figure: scatter plot of Y (0 to 10) against X (0 to 10)]
Perfect Positive Correlation (r = +1)
[Figure: scatter plot of Y (0 to 10) against X (0 to 10)]
Zero Correlation (r = 0)
[Figure: scatter plot of Y (0 to 10) against X (0 to 10)]
Strong Positive Correlation
[Figure: scatter plot of Y (0 to 10) against X (0 to 10)]
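A minimal Python sketch, assuming small made-up data sets, reproduces the reference cases named on these slides: points on a rising line give r = +1, points on a falling line give r = -1, and a pattern with no linear trend gives r = 0. The function name `pearson_r` and the data lists are illustrative, not taken from the chapter.

```python
import math

def pearson_r(xs, ys):
    """Coefficient of correlation r, computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data on the 0-to-10 scale used in the slides
xs = list(range(11))
perfect_pos = [x for x in xs]          # Y rises exactly with X
perfect_neg = [10 - x for x in xs]     # Y falls exactly as X rises
u_shape = [(x - 5) ** 2 for x in xs]   # fully determined by X, but not linearly

print(round(pearson_r(xs, perfect_pos), 6))  # 1.0
print(round(pearson_r(xs, perfect_neg), 6))  # -1.0
print(round(pearson_r(xs, u_shape), 6))      # 0.0
```

The U-shaped data are completely determined by X, yet r = 0: the coefficient of correlation measures only linear association, which is one reason to distinguish linear from nonlinear correlation.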
Formula for r
We calculate the coefficient of correlation from the
following formulas.
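$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - n\bar{X}^2\right)\left(\sum Y^2 - n\bar{Y}^2\right)}}$$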
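The two formulas are algebraically equivalent; the second needs only the raw sums ΣX, ΣY, ΣXY, ΣX² and ΣY². A short Python sketch on a hypothetical five-point data set checks that both forms give the same r. The names `r_deviation` and `r_computational` are illustrative.

```python
import math

def r_deviation(xs, ys):
    """r from the deviation form: cross-products over the root of the sums of squares."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_computational(xs, ys):
    """r from the computational form, which uses only raw sums and the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - n * x_bar * y_bar
    denominator = math.sqrt((sum_x2 - n * x_bar ** 2) * (sum_y2 - n * y_bar ** 2))
    return numerator / denominator

# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(r_deviation(xs, ys), 4), round(r_computational(xs, ys), 4))  # 0.7746 0.7746
```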
Coefficient of Determination
The coefficient of determination (r²) is the proportion of
the total variation in the dependent variable (Y) that is
explained or accounted for by the variation in the
independent variable (X).
Coefficient of Determination
The features of the coefficient of determination are:
It is the square of the coefficient of correlation.
It ranges from 0 to 1.
It does not give any information on the direction of the
relationship between the variables.
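To see these features together, a minimal sketch (hypothetical data; `fit_summary` is an illustrative name) computes r² directly and also as 1 - SSE/SST, the share of the total variation in Y accounted for by the least squares line. Reversing the sign of every Y value reverses the sign of r but leaves r² unchanged, so r² says nothing about direction.

```python
import math

def fit_summary(xs, ys):
    """Return r, r squared, and 1 - SSE/SST for the least squares line through the data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sst = sum((y - y_bar) ** 2 for y in ys)       # total variation in Y
    r = sxy / math.sqrt(sxx * sst)
    b = sxy / sxx                                  # least squares slope
    a = y_bar - b * x_bar                          # least squares intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    return r, r ** 2, 1 - sse / sst

xs = [1, 2, 3, 4, 5]            # hypothetical data
ys = [2, 4, 5, 4, 5]
ys_inverse = [-y for y in ys]   # same pattern, opposite direction

for label, y_vals in [("direct", ys), ("inverse", ys_inverse)]:
    r, r2, explained = fit_summary(xs, y_vals)
    print(label, round(r, 4), round(r2, 4), round(explained, 4))
# direct 0.7746 0.6 0.6
# inverse -0.7746 0.6 0.6
```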
Example # 1
Dan Ireland, the student body president at Toledo State
University, is concerned about the cost to students of
textbooks. He believes there is a relationship between the
number of pages in the text and the selling price of the
book. To provide insight into the problem he selects a
sample of eight textbooks currently on sale in the
bookstore. Draw a scatter diagram. Compute the
correlation coefficient.
Example # 1 continued
Book                  Pages (X)   Price ($) (Y)
Intro to History         500           84
Basic Algebra            700           75
Intro to Psyc            800           99
Intro to Sociology       600           72
Bus. Mgmt                400           69
Intro to Biology         500           81
Fund. of Jazz            600           63
Princ. of Nursing        800           93
Example # 1 continued
Scatter Diagram of Number of Pages and Selling Price of Text
[Figure: scatter plot of Price ($), 60 to 100, against Page, 400 to 800]
Example # 1 continued
Book                  X (Pages)   Y (Price $)      XY          X²         Y²
Intro to History         500          84         42,000      250,000     7,056
Basic Algebra            700          75         52,500      490,000     5,625
Intro to Psyc            800          99         79,200      640,000     9,801
Intro to Sociology       600          72         43,200      360,000     5,184
Bus. Mgmt                400          69         27,600      160,000     4,761
Intro to Biology         500          81         40,500      250,000     6,561
Fund. of Jazz            600          63         37,800      360,000     3,969
Princ. of Nursing        800          93         74,400      640,000     8,649
Total                  4,900         636        397,200    3,150,000    51,606
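The column totals can be checked with a few lines of Python. The list names `pages` and `price` are illustrative; the values are those in the table.

```python
# Pages (X) and selling price in dollars (Y) for the eight textbooks in Example # 1
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]

totals = {
    "X": sum(pages),
    "Y": sum(price),
    "XY": sum(x * y for x, y in zip(pages, price)),
    "X2": sum(x * x for x in pages),
    "Y2": sum(y * y for y in price),
}
print(totals)
# {'X': 4900, 'Y': 636, 'XY': 397200, 'X2': 3150000, 'Y2': 51606}
```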
Example # 1 continued
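$$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - n\bar{X}^2\right)\left(\sum Y^2 - n\bar{Y}^2\right)}} = \frac{397200 - 8(612.5)(79.5)}{\sqrt{\left(3150000 - 8(612.5)^2\right)\left(51606 - 8(79.5)^2\right)}} = 0.614$$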
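The same arithmetic, written as a sketch that starts from the column totals (variable names are illustrative), makes the intermediate quantities visible: the numerator is 7,650, and the two bracketed terms in the denominator are 148,750 and 1,044.

```python
import math

n = 8
sum_x, sum_y = 4900, 636
sum_xy, sum_x2, sum_y2 = 397200, 3150000, 51606
x_bar, y_bar = sum_x / n, sum_y / n        # 612.5 and 79.5

numerator = sum_xy - n * x_bar * y_bar     # 7650.0
sxx = sum_x2 - n * x_bar ** 2              # 148750.0
syy = sum_y2 - n * y_bar ** 2              # 1044.0
r = numerator / math.sqrt(sxx * syy)
print(round(r, 3))  # 0.614
```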
Example # 1 continued
Interpretation:
The correlation between the number of pages and the
selling price of the book is 0.614. This indicates a moderate,
positive (direct) association between the two variables: books
with more pages tend to have higher selling prices.
Regression Analysis
In regression analysis we use the independent variable (X) to
estimate the dependent variable (Y).
The relationship between the variables is assumed to be linear.
Both variables must be at least interval scale.
The least squares criterion is used to determine the
equation. That is, the term $\sum (Y - Y')^2$ is minimized.
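One way to see what "minimized" means, sketched here with the textbook data: compute Σ(Y - Y')² for the least squares line and for a few nearby lines. Every nudged line has a larger sum of squared errors. The helper name `sse` and the size of the nudges are illustrative choices.

```python
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]

def sse(a, b):
    """Sum of squared vertical distances, sum of (Y - Y')^2, for the line Y' = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(pages, price))

n = len(pages)
x_bar, y_bar = sum(pages) / n, sum(price) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pages, price))
sxx = sum((x - x_bar) ** 2 for x in pages)
b_ls = sxy / sxx              # least squares slope
a_ls = y_bar - b_ls * x_bar   # least squares intercept
best = sse(a_ls, b_ls)

# Shift the intercept or the slope slightly: each of these lines fits worse
for da, db in [(1, 0), (-1, 0), (0, 0.001), (0, -0.001)]:
    assert sse(a_ls + da, b_ls + db) > best

print(round(best, 2))  # 650.57
```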
Regression Analysis
The regression equation: Y' = a + bX,
where:
Y' is the average predicted value of Y for any X.
a is the Y-intercept. It is the estimated Y value when
X = 0.
b is the slope of the line, or the average change in Y'
for each change of one unit in X.
The least squares principle is used to obtain a and b.
Regression Analysis
The least squares principle is used to obtain a and b. The
equations to determine a and b are:
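$$b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$$
and
$$a = \bar{Y} - b_{yx}\bar{X}$$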
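A direct translation of the deviation form into Python might look like the sketch below; `least_squares` is an illustrative name and the five data points are hypothetical. Because a = Ȳ - bX̄, the fitted line always passes through the point (X̄, Ȳ).

```python
def least_squares(xs, ys):
    """Intercept a and slope b_yx of the least squares line Y' = a + bX."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar   # makes the line pass through (x_bar, y_bar)
    return a, b

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
print(round(a, 4), round(b, 4))  # 2.2 0.6
```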
Example # 2
Develop a regression equation for the information given
in Example # 1 that can be used to estimate the selling
price based on the number of pages.
Solution:
$$b = \frac{397200 - 8(612.5)(79.5)}{3150000 - 8(612.5)^2} = 0.05143$$
$$a = \frac{636}{8} - 0.05143\left(\frac{4900}{8}\right) = 48.0$$
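The Example # 2 arithmetic can be reproduced from the totals in the Example # 1 table using the computational form; the variable names are illustrative.

```python
n = 8
sum_x, sum_y = 4900, 636
sum_xy, sum_x2 = 397200, 3150000
x_bar, y_bar = sum_x / n, sum_y / n                            # 612.5 and 79.5

b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)   # slope
a = y_bar - b * x_bar                                          # intercept
print(round(b, 5), round(a, 1))  # 0.05143 48.0
```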
Example # 2 continued
The regression equation is:
Y' = 48.0 + 0.05143X
The equation crosses the Y-axis at $48. Taken literally, a book
with no pages would cost $48, although X = 0 lies far outside
the sampled range of 400 to 800 pages.
The slope of the line is 0.05143. Each additional page adds
about $0.05 (a nickel) to the price.
The sign of the b value and the sign of r will always
be the same.
Example # 2 continued
We can use the regression equation to estimate
values of Y. The estimated selling price of an 800-page
book is $89.14, found by:
$$Y' = 48.0 + 0.05143X = 48.0 + 0.05143(800) = 89.14$$
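As a sketch, the estimate can be wrapped in a small function that uses the rounded coefficients a = 48.0 and b = 0.05143. Both `predict_price` and `in_sample_range` are hypothetical names; the second simply flags inputs outside the 400 to 800 pages covered by the sample, where an estimate is an extrapolation.

```python
def predict_price(pages, a=48.0, b=0.05143):
    """Estimated selling price Y' = a + bX (defaults are the rounded Example # 2 coefficients)."""
    return a + b * pages

def in_sample_range(pages, low=400, high=800):
    """True when pages lies within the range of the eight sampled books."""
    return low <= pages <= high

print(round(predict_price(800), 2), in_sample_range(800))  # 89.14 True
print(round(predict_price(0), 2), in_sample_range(0))      # 48.0 False
```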
The Standard Error of Estimate
The standard error of estimate measures the scatter, or
dispersion, of the observed values around the line of
regression.
The Standard Error of Estimate
• The formulas that are used to compute the standard
error:
$$s_{y \cdot x} = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}}$$
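A sketch that computes s_y.x both ways on hypothetical data (the function name is illustrative) shows that the two formulas agree. They agree because, when a and b are the least squares estimates, Σ(Y - Y')² = ΣY² - aΣY - bΣXY.

```python
import math

def std_error_of_estimate(xs, ys):
    """Return s_y.x from the definitional and the computational formulas."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Definitional form: squared residuals Y - Y', divided by n - 2
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_def = math.sqrt(sse / (n - 2))
    # Computational form: needs only sum of Y^2, sum of Y, and sum of XY
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    s_comp = math.sqrt((sum_y2 - a * sum(ys) - b * sum_xy) / (n - 2))
    return s_def, s_comp

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
print([round(s, 4) for s in std_error_of_estimate(xs, ys)])  # [0.8944, 0.8944]
```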
Example # 3
Find the standard error of estimate for the problem
involving the number of pages in a book and the selling
price.
$$s_{y \cdot x} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}} = \sqrt{\frac{51606 - 48(636) - 0.05143(397200)}{8 - 2}} = 10.408$$
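Because b multiplies a large total (ΣXY = 397,200), the computational formula is sensitive to rounding in b. The sketch below, using the totals from Example # 1 (variable names are illustrative), reproduces 10.408 with the rounded slope 0.05143 and gives about 10.413 when the unrounded slope is carried through.

```python
import math

n = 8
sum_x, sum_y = 4900, 636
sum_xy, sum_x2, sum_y2 = 397200, 3150000, 51606
x_bar, y_bar = sum_x / n, sum_y / n
a = 48.0

b_rounded = 0.05143                                                   # slope as rounded in Example # 2
b_unrounded = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)  # same totals, no rounding

results = {}
for label, b in [("rounded b", b_rounded), ("unrounded b", b_unrounded)]:
    results[label] = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))
    print(label, round(results[label], 3))
# rounded b 10.408
# unrounded b 10.413
```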
Assumptions Underlying Linear Regression
For each value of X, there is a group of Y values, and
these Y values are normally distributed.
The standard deviations of these normal distributions
are equal.
The Y values are statistically independent. This
means that in the selection of a sample, the Y values
chosen for a particular X value do not depend on the
Y values for any other X values.
The means of these normal distributions of Y values
all lie on the straight line of regression.
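These assumptions describe how the data are generated. As a rough illustration only, the sketch below simulates data that satisfy them for a hypothetical population line Y = 2.0 + 0.5X: at every X the Y values are normal, centered on the line, with the same standard deviation (1.0), and drawn independently. It then fits the least squares line. With 10,000 points the estimates land close to the population values. The seed, parameters, and sample size are arbitrary choices.

```python
import random

rng = random.Random(42)               # fixed seed so the run is repeatable
alpha, beta, sigma = 2.0, 0.5, 1.0    # hypothetical population line and error SD

xs, ys = [], []
for _ in range(10_000):
    x = rng.uniform(0, 10)
    # Normal around the line, same SD at every X, each draw independent of the others
    y = alpha + beta * x + rng.gauss(0, sigma)
    xs.append(x)
    ys.append(y)

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sxy / sxx
a = y_bar - b * x_bar
s_yx = (sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)) ** 0.5
print(round(a, 2), round(b, 3), round(s_yx, 2))
```

The fitted a, b, and s_y.x estimate the population intercept, slope, and error standard deviation; when the assumptions fail (for example, unequal spreads at different X values), s_y.x no longer describes the scatter equally well everywhere along the line.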