Chap 6 Linear Correlation and Regression

Chapter 6 covers linear regression and correlation, defining dependent and independent variables, and explaining how to calculate correlation coefficients and regression equations. It includes examples of correlation analysis, scatter diagrams, and the Pearson correlation coefficient, illustrating the strength and direction of relationships between variables. The chapter also discusses the coefficient of determination and the standard error of estimate, providing insights into the reliability of regression estimates.

Chapter 6

Linear Regression and Correlation


Prepared by
Dr. Mohammad Bayezid Ali
Professor
Department of Finance
Jagannath University, Dhaka

McGraw-Hill/Irwin ©The McGraw-Hill Companies, Inc. 2008


GOALS

⚫ Understand and interpret the terms dependent and independent variable.
⚫ Calculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate.
⚫ Calculate the least squares regression equation and regression line.

Definition: Dependent and Independent Variable

⚫ A dependent variable is a variable whose value is expected to depend on the value of some other variable. For example, profit depends on the amount of revenue; agricultural production depends on the amount of rainfall or the use of fertilizer; and student satisfaction in the classroom depends on the academic qualifications of the faculty or the logistic support in the classroom. In statistics, a dependent variable is also referred to as the predictand, regressand, explained variable, effect variable, or target variable.
⚫ An independent variable is a variable whose value is not expected to depend on other variables. In statistics, independent variables are also known as explanatory variables, predictors, regressors, or control variables.

Correlation Analysis

⚫ Correlation analysis is the study of the relationship between variables. It is also defined as a group of techniques for measuring the association between two variables. The measure of correlation is called the correlation coefficient and is denoted by the symbol ‘r’.
⚫ A scatter diagram is a chart that portrays the relationship between the two variables. It is the usual first step in correlation analysis.

Scatter Plot (or scatter diagram)

A scatter plot, or scatter diagram, is a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.

Positive Linear Correlation

[Figure: three scatter plots of y against x: (a) Positive, (b) Strong positive, (c) Perfect positive]

Negative Linear Correlation

[Figure: three scatter plots of y against x: (d) Negative, (e) Strong negative, (f) Perfect negative]

No Linear Correlation

[Figure: two scatter plots of y against x: (g) No Correlation, (h) Nonlinear Correlation]

Example: Correlation

There is a general intuition that an intelligent salesperson will be able to make more sales. The table below presents a random sample of 10 salespersons, their intelligence test scores, and their sales performance in a particular month.

Salesperson   Intelligence Test Score   Sales ('000 Tk.)
1             45                        2.0
2             75                        6.5
3             50                        3.5
4             60                        5.0
5             80                        4.5
6             90                        6.0
7             85                        6.5
8             40                        2.5
9             80                        5.5
10            55                        4.5

Scatter Diagram

[Figure: scatter diagram of Sales Performance (vertical axis) against Test Scores (horizontal axis, 20 to 100)]

Pearson Correlation Coefficient

The coefficient of correlation (r) is a measure of the direction as well as the strength of the relationship between two variables. It measures a linear relationship only.

Direction of the relationship between X and Y:
⚫ Positive (+r): as X goes up, Y goes up.
⚫ Negative (-r): as X goes up, Y goes down.

Strength of the relationship between X and Y:
⚫ The closer r is to ±1.0, the stronger the relationship.
⚫ The closer r is to 0, the weaker the relationship.
⚫ When r = 0, the relationship between X and Y cannot be described by a straight line.

Different Levels of the Strength of Relationship

What does r represent?

r = (degree to which X and Y vary together) / (degree to which X and Y vary separately)

r = (covariance of X and Y) / (square root of the product of the variances of X and Y)

\[
r = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}}
\]

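The link between the "covariance" description and the computational formula can be sketched with a short expansion. The symbols \bar{X} and \bar{Y} (the sample means) are notation assumed here and do not appear in the slides:

```latex
\begin{aligned}
\sum (X - \bar{X})(Y - \bar{Y})
  &= \sum XY - \bar{X}\sum Y - \bar{Y}\sum X + n\bar{X}\bar{Y}
   = \sum XY - \frac{\sum X \sum Y}{n}, \\
\sum (X - \bar{X})^2
  &= \sum X^2 - \frac{(\sum X)^2}{n}, \qquad
\sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}, \\
r &= \frac{\sum (X - \bar{X})(Y - \bar{Y})}
          {\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} .
\end{aligned}
```

Dividing the numerator and both sums in the denominator by n - 1 gives the covariance over the product of the standard deviations; the factors of n - 1 cancel, so r is unchanged.
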
Solving for correlation coefficient

Salesperson   Test Score (X)   Sales, '000 Tk. (Y)   X²      Y²       XY
1             45               2.0                   2025    4.00     90.0
2             75               6.5                   5625    42.25    487.5
3             50               3.5                   2500    12.25    175.0
4             60               5.0                   3600    25.00    300.0
5             80               4.5                   6400    20.25    360.0
6             90               6.0                   8100    36.00    540.0
7             85               6.5                   7225    42.25    552.5
8             40               2.5                   1600    6.25     100.0
9             80               5.5                   6400    30.25    440.0
10            55               4.5                   3025    20.25    247.5

n = 10, ΣX = 660, ΣY = 46.5, ΣX² = 46500, ΣY² = 238.75, ΣXY = 3292.5

\[
r = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}}
  = \frac{3292.5 - \dfrac{660 \times 46.5}{10}}{\sqrt{\left(46500 - \dfrac{660^2}{10}\right)\left(238.75 - \dfrac{46.5^2}{10}\right)}}
  = \frac{223.5}{\sqrt{2940 \times 22.525}}
  = 0.869
\]

Comment: A correlation coefficient of 0.869 implies that there exists a strong, positive statistical association between the intelligence test scores of the salespersons and their sales performance.

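The calculation can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the names `scores`, `sales` and `pearson_r` are illustrative and not part of the chapter.

```python
from math import sqrt

# Sales example data: intelligence test scores (X) and sales in '000 Tk. (Y).
scores = [45, 75, 50, 60, 80, 90, 85, 40, 80, 55]
sales = [2.0, 6.5, 3.5, 5.0, 4.5, 6.0, 6.5, 2.5, 5.5, 4.5]


def pearson_r(x, y):
    """Pearson correlation coefficient via the computational (sums) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


r = pearson_r(scores, sales)
print(round(r, 3))  # 0.869
```

Working from the raw data rather than from rounded column totals avoids carrying rounding error into the result.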
Coefficient of Determination

The coefficient of determination (r²) is the proportion of the total variation in the dependent variable (Y) that is explained, or accounted for, by the variation in the independent variable (X). It is the square of the coefficient of correlation.
⚫ It ranges from 0 to 1.
⚫ It does not give any information on the direction of the relationship between the variables.
⚫ In our example, r² = (0.869)² ≈ 0.754 (computed from the unrounded r), which means that 75.4% of the variation in sales performance can be explained by the variation in intelligence test scores.

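The "proportion of variation explained" reading can be illustrated directly. The sketch below assumes a least squares line y = a + bx fitted to the same data and compares the variation it leaves unexplained with the total variation in Y; all variable names are illustrative.

```python
# Sales example data: intelligence test scores (X) and sales in '000 Tk. (Y).
scores = [45, 75, 50, 60, 80, 90, 85, 40, 80, 55]
sales = [2.0, 6.5, 3.5, 5.0, 4.5, 6.0, 6.5, 2.5, 5.5, 4.5]

n = len(scores)
mean_x = sum(scores) / n
mean_y = sum(sales) / n

# Least squares line fitted to the data.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, sales))
s_xx = sum((x - mean_x) ** 2 for x in scores)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Total variation in Y, and the part the line leaves unexplained.
total_variation = sum((y - mean_y) ** 2 for y in sales)
unexplained = sum(
    (y - (intercept + slope * x)) ** 2 for x, y in zip(scores, sales)
)

r_squared = 1 - unexplained / total_variation
print(round(r_squared, 3))  # 0.754
```

For a simple linear regression with an intercept, this ratio equals the square of the Pearson correlation coefficient.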
Linear Regression Model

Regression analysis is a technique used to develop an equation that expresses the linear (straight-line) relationship between two variables and provides estimates. The objective of this analysis is to estimate a regression equation, which is then used to estimate the change in the dependent variable resulting from a change in the independent variable.

The least squares principle is used to obtain a and b. The equations to determine a and b are:

\[
b = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}
\qquad \text{and} \qquad
a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}
\]

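For readers who want to see where these two equations come from, here is a brief sketch of the standard derivation. The function S(a, b) is notation introduced here for the sum of squared vertical deviations; it is not used in the slides.

```latex
\begin{aligned}
S(a, b) &= \sum (Y - a - bX)^2 \\
\frac{\partial S}{\partial a} = -2\sum (Y - a - bX) = 0
  &\;\Longrightarrow\; \sum Y = na + b\sum X \\
\frac{\partial S}{\partial b} = -2\sum X\,(Y - a - bX) = 0
  &\;\Longrightarrow\; \sum XY = a\sum X + b\sum X^2
\end{aligned}
```

Solving the first equation for a gives a = ΣY/n - b ΣX/n, and substituting that into the second gives the expression for b.
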
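Computing the Slope of the Line

Since y = a + bx, the slope is

\[
b = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}
  = \frac{3292.5 - \dfrac{660 \times 46.5}{10}}{46500 - \dfrac{660^2}{10}}
  = \frac{223.5}{2940}
  = 0.076
\]
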
Computing the Y-Intercept

\[
a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}
  = \frac{46.5}{10} - \left(0.076 \times \frac{660}{10}\right)
  = -0.366
\]

Hence, the regression equation of sales on test score is y = -0.366 + 0.076x. It can be interpreted as follows: if the intelligence test score increases by 1 mark, sales performance is expected to increase by 0.076 thousand Tk., that is, Tk. 76.

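The slope and intercept can be reproduced from the raw data. This is a minimal sketch; `least_squares_fit` is an illustrative name, not a function from any library.

```python
# Sales example data: intelligence test scores (X) and sales in '000 Tk. (Y).
scores = [45, 75, 50, 60, 80, 90, 85, 40, 80, 55]
sales = [2.0, 6.5, 3.5, 5.0, 4.5, 6.0, 6.5, 2.5, 5.5, 4.5]


def least_squares_fit(x, y):
    """Return (a, b) for the line y = a + b*x using the sums formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = least_squares_fit(scores, sales)
print(round(a, 3), round(b, 3))  # -0.367 0.076
```

The intercept prints as -0.367 rather than -0.366 because the code carries the unrounded slope (about 0.07602) into the intercept, whereas the hand calculation uses b rounded to 0.076.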
Predicting Values with Regression Equation

If a particular salesperson has an intelligence test score of 94, what is the expected sales performance?

Here the regression equation of sales on test score is y = -0.366 + 0.076x.

When the test score is 94, the expected sales amount is -0.366 + 0.076 × 94 = 6.778, or about Tk. 6.78 thousand.

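As a quick check of the arithmetic, the regression equation can be wrapped in a small function. The sketch uses the rounded coefficients quoted in the text; `predict_sales` is an illustrative name.

```python
# Rounded coefficients of the regression equation y = a + b*x from the text.
A = -0.366  # intercept, in '000 Tk.
B = 0.076   # slope, in '000 Tk. per test-score mark


def predict_sales(score):
    """Expected sales ('000 Tk.) for a given intelligence test score."""
    return A + B * score


print(round(predict_sales(94), 2))  # 6.78
```

Note that a score of 94 lies slightly above the highest observed score (90), so this prediction is a mild extrapolation beyond the sample range.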
Difference Between Correlation and Regression

The Standard Error of Estimate

⚫ The standard error of estimate measures the scatter, or dispersion, of the observed values around the line of regression.
⚫ The formula used to compute the standard error is:

\[
s_{y \cdot x} = \sqrt{\frac{\sum Y^2 - a \sum Y - b \sum XY}{n - 2}}
\]

⚫ In our example, the standard error of the regression estimates is 0.847865 when computed from the rounded values ΣY² ≈ 239, ΣXY ≈ 3293, a = -0.366 and b = 0.076; unrounded values give about 0.832. The lower the value of the standard error, the greater the statistical reliability of the regression estimates.

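The sketch below computes the standard error of estimate in two ways from the raw data: with the sums formula, and directly from the residuals around the fitted line. Both use unrounded coefficients, so they agree with each other; variable names are illustrative.

```python
from math import sqrt

# Sales example data: intelligence test scores (X) and sales in '000 Tk. (Y).
scores = [45, 75, 50, 60, 80, 90, 85, 40, 80, 55]
sales = [2.0, 6.5, 3.5, 5.0, 4.5, 6.0, 6.5, 2.5, 5.5, 4.5]

n = len(scores)
sum_x, sum_y = sum(scores), sum(sales)
sum_xy = sum(x * y for x, y in zip(scores, sales))
sum_x2 = sum(x * x for x in scores)
sum_y2 = sum(y * y for y in sales)

# Least squares coefficients (unrounded).
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

# Sums formula: sqrt((sum Y^2 - a*sum Y - b*sum XY) / (n - 2)).
se_sums = sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

# Residual form: sqrt(sum of squared residuals / (n - 2)).
se_residuals = sqrt(
    sum((y - (a + b * x)) ** 2 for x, y in zip(scores, sales)) / (n - 2)
)

print(round(se_sums, 3), round(se_residuals, 3))  # 0.832 0.832
```

The sums formula subtracts large, nearly equal quantities, so it is sensitive to rounding in a, b and the column totals; the residual form is less affected.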
Charles Spearman’s Coefficient of Correlation (Rank Correlation)

Two managers are asked to rank a group of employees in order of potential for eventually becoming top managers. The rankings are as follows:

Employee   Ranking by Manager I   Ranking by Manager II
A          10                     9
B          2                      4
C          1                      2
D          4                      3
E          3                      1
F          6                      5
G          5                      6
H          8                      8
I          7                      7
J          9                      10

Compute the coefficient of rank correlation and comment on the value.

Solution: Rank Correlation

Employee   Ranking by Manager I (R1)   Ranking by Manager II (R2)   (R1 - R2)²
A          10                          9                            1
B          2                           4                            4
C          1                           2                            1
D          4                           3                            1
E          3                           1                            4
F          6                           5                            1
G          5                           6                            1
H          8                           8                            0
I          7                           7                            0
J          9                           10                           1

Thus we find that there is a high degree of positive correlation in the ranks assigned by the two managers.

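A minimal sketch of the computation, assuming the standard Spearman formula for rankings without ties, r_s = 1 - 6Σd² / (n(n² - 1)), where d is the difference between the two ranks given to an employee. The formula and the variable names are assumptions supplied here; the text itself shows only the rankings and the conclusion.

```python
# Rankings of employees A to J by the two managers.
rank_manager_1 = [10, 2, 1, 4, 3, 6, 5, 8, 7, 9]
rank_manager_2 = [9, 4, 2, 3, 1, 5, 6, 8, 7, 10]

n = len(rank_manager_1)

# Squared rank differences, one per employee.
squared_diffs = [(r1 - r2) ** 2 for r1, r2 in zip(rank_manager_1, rank_manager_2)]
sum_d2 = sum(squared_diffs)

# Spearman rank correlation (valid when there are no tied ranks).
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d2, round(r_s, 3))  # 14 0.915
```

A value of about 0.915 is close to +1, which is consistent with the conclusion that the two managers' rankings agree strongly.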
End of Chapter 6
