Chapter-9-Simple Linear Regression & Correlation
Linear regression and correlation analysis studies and measures the linear relationship among
two or more variables. When only two variables are involved, the analysis is referred to
as simple correlation and simple linear regression analysis; when there are more than
two variables, the terms multiple regression and partial correlation are used.
Correlation Analysis: deals with the measurement of the closeness of the relationship
that is described in the regression equation.
We say there is correlation when the two series of items vary together directly or
inversely.
Simple Correlation
Suppose we have two variables X and Y.
When higher values of X are associated with higher values of Y and lower values
of X are associated with lower values of Y, then the correlation is said to be
positive or direct.
Examples:
- Income and expenditure
- Number of hours spent in studying and the score obtained
- Height and weight
- Distance covered and fuel consumed by car.
When higher values of X are associated with lower values of Y and lower values
of X are associated with higher values of Y, then the correlation is said to be
negative or inverse.
Examples:
- Demand and supply
- Income and the proportion of income spent on food.
The correlation between X and Y, measured by the correlation coefficient r, may be one of the following:
1. Perfect positive (r = 1)
2. Positive (r between 0 and 1)
3. No correlation (r = 0)
4. Negative (r between -1 and 0)
5. Perfect negative (r = -1)
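The presence of correlation between two variables may be due to three reasons:
1. One variable being the cause of the other.
2. Both variables being the result of a common cause.
Example:
Let X1 = ESLCE result
Y1 = rate of surviving in the University
Y2 = rate of getting a scholarship.
Here Y1 and Y2 may be correlated with each other only because both are related to X1.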
3. Chance:
Examples:
Price of teff in Addis Ababa and grade of students in USA.
Weight of individuals in Ethiopia and income of individuals in Kenya.
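The simple correlation coefficient is denoted by r and is computed as
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
Remark:
r always lies between -1 and 1 inclusive.
Interpretation of r:
1. Perfect positive linear relationship (r = 1)
2. Some positive linear relationship (r between 0 and 1)
3. No linear relationship (r = 0)
4. Some negative linear relationship (r between -1 and 0)
5. Perfect negative linear relationship (r = -1)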
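The five cases can be checked numerically. The following is a minimal Python sketch, not part of the original notes: it uses the deviation form of the product-moment formula, and the function names `pearson_r` and `interpret_r`, the rounding tolerance `tol`, and the three small data sets are all illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Simple correlation coefficient of paired observations x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def interpret_r(r, tol=1e-9):
    """Map r to one of the five cases; tol absorbs floating-point rounding."""
    if r >= 1 - tol:
        return "perfect positive"
    if r <= -1 + tol:
        return "perfect negative"
    if abs(r) <= tol:
        return "no linear relationship"
    return "some positive" if r > 0 else "some negative"

# Made-up illustrative data, not taken from the notes
x = [1, 2, 3, 4, 5]
print(interpret_r(pearson_r(x, [2, 4, 6, 8, 10])))  # perfect positive (y = 2x)
print(interpret_r(pearson_r(x, [10, 8, 6, 4, 2])))  # perfect negative (y = 12 - 2x)
print(interpret_r(pearson_r(x, [2, 1, 4, 3, 5])))   # some positive (r = 0.8)
```

Note that a perfect relationship gives r = 1 or r = -1 whatever the steepness of the line: the first data set has slope 2, yet r = 1.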
Examples:
1. Calculate the simple correlation between the mid-semester and final exam scores of 10
students (both out of 50).
This means the mid-semester and final exam scores have a slight positive correlation.
2. The following data were collected from a certain household on the monthly income
(X) and consumption (Y) for the past 10 months. Compute the simple correlation
coefficient. (Exercise)
X: 650 654 720 456 536 853 735 650 536 666
Y: 450 523 235 398 500 632 500 635 450 360
The above formula and procedure are applicable only to quantitative data. When we
have qualitative data such as efficiency, honesty, intelligence, etc., we calculate
what is called Spearman's rank correlation coefficient as follows:
Steps
i. Rank the different items in X and Y.
ii. Find the difference of the ranks in each pair; denote them by Di.
iii. Use the following formula:
$$r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)}$$
where n is the number of pairs.
Example:
Aster and Almaz were asked to rank 7 different types of lipsticks. See if there is
correlation between the tastes of the ladies.
Lipsticks A B C D E F G
Aster 2 1 4 3 5 7 6
Almaz 1 3 2 4 5 6 7
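Solution:
X (R1)   Y (R2)   D = R1-R2   D^2
2        1         1          1
1        3        -2          4
4        2         2          4
3        4        -1          1
5        5         0          0
7        6         1          1
6        7        -1          1
Total                         12
$$r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)} = 1 - \frac{6(12)}{7(7^2 - 1)} = 1 - \frac{72}{336} \approx 0.786$$
Yes, there is a positive correlation between the tastes of the two ladies.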
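The rank-correlation steps can be sketched in Python as follows. The function name `spearman_from_ranks` is an assumption made for illustration; the ranks are those given for Aster and Almaz, and the formula r_s = 1 - 6ΣD²/(n(n² - 1)) is the standard one for rankings without ties.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rank correlation from two lists of ranks (no ties assumed)."""
    n = len(rank_x)
    # Sum of squared rank differences, the D^2 column of the table
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

aster = [2, 1, 4, 3, 5, 7, 6]   # ranks for lipsticks A..G
almaz = [1, 3, 2, 4, 5, 6, 7]

r_s = spearman_from_ranks(aster, almaz)
print(round(r_s, 4))  # 0.7857
```

If some items share a rank, this shortcut formula is only approximate and the ordinary correlation formula applied to the ranks is usually preferred.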
- Simple linear regression refers to the linear relationship between two variables.
- We usually denote the dependent variable by Y and the independent variable by X.
- A simple regression line is the line fitted to the points plotted in the scatter diagram,
which describes the average relationship between the two variables. Therefore,
to see the type of relationship, it is advisable to prepare a scatter plot before fitting the
model.
- The linear model is $Y = \alpha + \beta X + \varepsilon$. To estimate the parameters ($\alpha$ and $\beta$) we have several methods:
  - The free hand method
  - The semi-average method
  - The least squares method
  - The maximum likelihood method
  - The method of moments
  - Bayesian estimation technique.
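- The above model is estimated by:
$$\hat{Y} = a + bX$$
where a is a constant which gives the value of Y when X = 0; it is called the Y-intercept.
b is a constant indicating the slope of the regression line, and it gives a
measure of the change in Y for a unit change in X. It is also called the regression coefficient of Y
on X.
- a and b are found by minimizing
$$SSE = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - a - bX_i)^2$$
which gives
$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}, \qquad a = \bar{Y} - b\bar{X}$$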
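For reference, here is a sketch of how the least squares estimates are obtained. It assumes the fitted line is written as Ŷ = a + bX and that SSE denotes the sum of squared deviations of the observed Y values from the line; these symbols are notation assumed here. Setting the two partial derivatives of SSE to zero gives the normal equations, which are then solved for a and b.

```latex
\begin{aligned}
SSE(a,b) &= \sum_{i=1}^{n} (Y_i - a - bX_i)^2 \\
\frac{\partial SSE}{\partial a} &= -2\sum (Y_i - a - bX_i) = 0
  \;\Longrightarrow\; \sum Y = na + b\sum X \\
\frac{\partial SSE}{\partial b} &= -2\sum X_i (Y_i - a - bX_i) = 0
  \;\Longrightarrow\; \sum XY = a\sum X + b\sum X^2 \\
a &= \bar{Y} - b\bar{X}
  \quad\text{(from the first normal equation)} \\
b &= \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}
  \quad\text{(substituting } a \text{ into the second)}
\end{aligned}
```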
Example 1: The following data show the scores of 12 students in Accounting and Statistics
examinations.
a) Calculate the simple correlation coefficient.
b) Fit a regression line of Statistics on Accounting using least squares estimates.
c) Predict the score in Statistics if the score in Accounting is 85.
No.   Accounting (X)   Statistics (Y)
1     74.00            81.00
2     93.00            86.00
3     55.00            67.00
4     41.00            35.00
5     23.00            30.00
6     92.00            100.00
7     64.00            55.00
8     40.00            52.00
9     71.00            76.00
10    33.00            24.00
11    30.00            48.00
12    71.00            87.00
No.     Accounting (X)   Statistics (Y)   X^2        Y^2        XY
1       74.00            81.00            5476.00    6561.00    5994.00
2       93.00            86.00            8649.00    7396.00    7998.00
3       55.00            67.00            3025.00    4489.00    3685.00
4       41.00            35.00            1681.00    1225.00    1435.00
5       23.00            30.00            529.00     900.00     690.00
6       92.00            100.00           8464.00    10000.00   9200.00
7       64.00            55.00            4096.00    3025.00    3520.00
8       40.00            52.00            1600.00    2704.00    2080.00
9       71.00            76.00            5041.00    5776.00    5396.00
10      33.00            24.00            1089.00    576.00     792.00
11      30.00            48.00            900.00     2304.00    1440.00
12      71.00            87.00            5041.00    7569.00    6177.00
Total   687.00           741.00           45591.00   52525.00   48407.00
Mean    57.25            61.75
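a)
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} = \frac{12(48407) - (687)(741)}{\sqrt{\left[12(45591) - 687^2\right]\left[12(52525) - 741^2\right]}} = \frac{71817}{\sqrt{(75123)(81219)}} \approx 0.92$$
The coefficient of correlation (r) has a value of 0.92. This indicates that the two
variables are positively correlated (Y tends to increase as X increases).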
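The value of r can be reproduced from the column totals of the table above. This is a minimal Python sketch, assuming the computational (sums-based) form of the correlation formula; the variable names are illustrative.

```python
import math

# Column totals taken from the table for the 12 students
n = 12
sum_x, sum_y = 687.0, 741.0
sum_x2, sum_y2, sum_xy = 45591.0, 52525.0, 48407.0

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(numerator)    # 71817.0
print(round(r, 2))  # 0.92
```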
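b) Using OLS:
$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{48407 - 12(57.25)(61.75)}{45591 - 12(57.25)^2} = \frac{5984.75}{6260.25} \approx 0.956$$
$$a = \bar{Y} - b\bar{X} = 61.75 - 0.956(57.25) \approx 7.02$$
The fitted regression line is $\hat{Y} = 7.02 + 0.956X$.
c) For X = 85: $\hat{Y} = 7.02 + 0.956(85) \approx 88.28$.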
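The least squares fit for parts b) and c) can be sketched in Python as follows. The function name `fit_line` and the variable names are assumptions for illustration; the data are the twelve score pairs from the table.

```python
accounting = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]     # X
stats_scores = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]  # Y

def fit_line(x, y):
    """Least squares intercept a and slope b of the line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * mean_x * mean_y
    sxx = sum(xi ** 2 for xi in x) - n * mean_x ** 2
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line(accounting, stats_scores)
print(round(a, 2), round(b, 3))  # 7.02 0.956
print(round(a + b * 85, 2))      # 88.28, predicted Statistics score for X = 85
```

An Accounting score of 85 lies inside the observed range of X (23 to 93), so the prediction is an interpolation rather than an extrapolation.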
Figure: Scatter Diagram and Regression Line
Example 2:
A car rental agency is interested in studying the relationship between the distance
driven in kilometers (Y) and the maintenance cost for their cars (X in birr). The
following summarized information is given based on a sample of size 5. (Exercise)
, ,
- To know how far the regression equation has been able to explain the variation in Y, we
use a measure called the coefficient of determination ($r^2$).
- $r^2$ gives the proportion of the variation in Y explained by the regression of Y on X.
- $1 - r^2$ gives the unexplained proportion and is called the coefficient of indetermination.
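The explained and unexplained proportions can be computed directly from a fitted line. This is a minimal Python sketch using the Accounting and Statistics scores of Example 1; the names `sst` (total variation in Y) and `sse` (variation left unexplained by the line) are assumed notation, not taken from the notes.

```python
x = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]   # Accounting
y = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]  # Statistics

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares line of Y on X
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b = sxy / sxx
a = mean_y - b * mean_x

sst = sum((yi - mean_y) ** 2 for yi in y)                      # total variation
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))    # unexplained variation

r2 = 1 - sse / sst
print(round(r2, 3))      # 0.845, coefficient of determination
print(round(1 - r2, 3))  # 0.155, coefficient of indetermination
```

So about 84.5% of the variation in the Statistics scores is explained by the regression on Accounting scores, which agrees with r = 0.92 squared.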
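- The regression coefficients of the two possible regression lines are related to r as follows:
i. Regression of Y on X: $b_{YX} = r\,\dfrac{S_Y}{S_X}$
ii. Regression of X on Y: $b_{XY} = r\,\dfrac{S_X}{S_Y}$
Then $b_{YX} \cdot b_{XY} = r^2$, where $S_X$ and $S_Y$ are the standard deviations of X and Y.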
- Moreover, the two regression coefficients, $b_{YX}$ (Y on X) and $b_{XY}$ (X on Y), are completely different numerically as well as
conceptually.
- Let us consider three cases concerning these coefficients.
1. If the correlation is perfect positive, i.e. $r = 1$, then the b values are reciprocals of each
other.
2. If $S_X = S_Y$ (equal standard deviations of X and Y), then irrespective of the value of r the b values are equal, i.e.
$b_{YX} = b_{XY} = r$ (but this is an unlikely case).
3. The most important case is when $S_X \neq S_Y$ and $r \neq \pm 1$; here the b values are neither equal
nor reciprocals of each other, and the two lines differ, intersecting at the
common point ($\bar{X}, \bar{Y}$).
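The third case can be illustrated with the Accounting and Statistics scores of Example 1. The sketch below is an illustration under assumed variable names: it fits both regression lines, shows that the two slopes are neither equal nor reciprocal, that their product equals r², and that both lines pass through the point of means.

```python
import math

accounting = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]     # X
stats_scores = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]  # Y

n = len(accounting)
mean_x = sum(accounting) / n
mean_y = sum(stats_scores) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(accounting, stats_scores))
sxx = sum((x - mean_x) ** 2 for x in accounting)
syy = sum((y - mean_y) ** 2 for y in stats_scores)

b_yx = sxy / sxx                 # slope of the regression of Y on X
a_yx = mean_y - b_yx * mean_x
b_xy = sxy / syy                 # slope of the regression of X on Y
a_xy = mean_x - b_xy * mean_y

r = sxy / math.sqrt(sxx * syy)
print(round(b_yx, 3), round(b_xy, 3))          # 0.956 0.884
print(round(b_yx * b_xy, 3), round(r ** 2, 3)) # 0.845 0.845

# Both lines pass through the point of means (57.25, 61.75)
print(math.isclose(a_yx + b_yx * mean_x, mean_y))  # True
print(math.isclose(a_xy + b_xy * mean_y, mean_x))  # True
```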
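Thus, to determine whether a regression equation is X on Y or Y on X,
we use the formula $r^2 = b_{YX} \cdot b_{XY}$.
If $0 \le r^2 \le 1$, our assumption is correct.
If $r^2 > 1$, our assumption is wrong.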
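This check can be sketched as a small Python function. Everything named here is an assumption for illustration: the function `identify_lines`, the representation of each line as a tuple (p, q, c) standing for pX + qY = c with nonzero p and q, and the two lines in the usage example, which are hypothetical and are not the equations of the example that follows.

```python
def identify_lines(line1, line2):
    """Decide which of two regression lines is Y on X.

    Each line is a tuple (p, q, c) standing for p*X + q*Y = c.
    Returns (label, b_yx, b_xy) for the assignment whose product
    b_yx * b_xy is a valid r^2, i.e. lies between 0 and 1.
    """
    p1, q1, _ = line1
    p2, q2, _ = line2
    # Assumption: line1 is Y on X (solve for Y), line2 is X on Y (solve for X)
    b_yx, b_xy = -p1 / q1, -q2 / p2
    if 0 <= b_yx * b_xy <= 1:
        return "line1 is Y on X", b_yx, b_xy
    # r^2 outside [0, 1] is impossible, so reverse the roles of the two lines
    b_yx, b_xy = -p2 / q2, -q1 / p1
    return "line2 is Y on X", b_yx, b_xy

# Hypothetical lines: 5X - Y = 22 and 64X - 45Y = 24
label, b_yx, b_xy = identify_lines((5, -1, 22), (64, -45, 24))
print(label)                  # line2 is Y on X
print(round(b_yx * b_xy, 4))  # 0.2844, the value of r^2
```

With the first assumption the product of the slopes would be 5 × 45/64 ≈ 3.52, which cannot be a value of r², so the roles are reversed.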
Example: The regression lines between height (X) in inches and weight (Y) in lbs of
male students are:
Solution
We will assume one of the equations to be the regression of X on Y and the other to be the regression of Y on X,
and calculate
,
This is impossible (contradiction). Hence our assumption is not correct. Thus
To verify: