Online Correlation and Regression

Correlation and Regression

What is correlation?

Correlation is a statistical tool for studying the relationship between two or
more variables. The measure of correlation, called the coefficient of
correlation (denoted by the symbol r), summarizes in one figure the direction
and the degree of correlation. Correlation analysis refers to the techniques
used in measuring the closeness of the relationship between the variables.

Types of Correlation

Correlation is described or classified in several different ways. Three of the most important
are:

(i) Positive and negative

(ii) Simple, partial, and multiple

(iii) Linear and non-linear

Methods of studying Correlation

(i) Scatter Diagram Method,
(ii) Karl Pearson’s Coefficient of Correlation,
(iii) Spearman’s Rank Correlation Coefficient, and
(iv) Method of Least Squares.

Interpretation of the Values of correlation co-efficient


The coefficient of correlation r lies between -1 and +1. When r = +1, there is perfect positive
correlation between the variables. When r = -1, there is perfect negative correlation between the
variables. When r = 0, there is no linear relationship between the variables. The closer the
absolute value of r is to 1, the stronger the relationship between the variables. The coefficient
of correlation thus describes not only the magnitude of the correlation but also its direction.

Properties of the Co-efficient of Correlation

(i) The coefficient of correlation lies between -1 and +1.
(ii) The coefficient of correlation is independent of change of origin and scale.
(iii) If X and Y are independent variables, then the coefficient of correlation is zero.
(iv) The coefficient of correlation is the geometric mean of the two regression coefficients.
Symbolically, $r = \pm\sqrt{b_{yx}\, b_{xy}}$, where r takes the common sign of the two regression coefficients.
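
Properties (ii) and (iv) can be checked numerically. The sketch below is illustrative only: it uses the age and sick-day figures from the example that follows, and the helper names `pearson_r` and `slope` are assumptions introduced here, not part of the original notes. Property (iv) holds because b_yx = r(σy/σx) and b_xy = r(σx/σy), so their product is r².

```python
import math


def pearson_r(x, y):
    """Pearson's r computed from deviations about the actual means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


age = [20, 30, 32, 35, 40, 46, 52, 55, 58, 62]
sick = [1, 2, 0, 3, 4, 6, 5, 7, 8, 9]

r = pearson_r(age, sick)

# Property (ii): change the origin and the scale of both variables.
# The scale factors here are positive; a negative factor on one
# variable would flip the sign of r.
age_shifted = [(a - 43) * 12 for a in age]
sick_shifted = [(s - 5) / 2 for s in sick]
r_transformed = pearson_r(age_shifted, sick_shifted)

# Property (iv): r is the geometric mean of the two regression
# coefficients, carrying their common sign.
b_yx = slope(age, sick)   # sick days regressed on age
b_xy = slope(sick, age)   # age regressed on sick days
r_from_slopes = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(round(r, 3), round(r_transformed, 3), round(r_from_slopes, 3))
```

All three printed values should agree up to floating-point rounding.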

Example: The following data relate to the age of 10 employees and the number of days on
which they reported sick in a month.

Age         20  30  32  35  40  46  52  55  58  62
Sick days    1   2   0   3   4   6   5   7   8   9

Calculate Karl Pearson’s co-efficient of correlation and interpret its value.

Solution:

Let age and sick days be represented by variables X and Y respectively.


Age (X)     dx = X - 43   dx²           Sick days (Y)   dy = Y - 5   dy²         dx·dy
20          -23           529           1               -4           16          92
30          -13           169           2               -3           9           39
32          -11           121           0               -5           25          55
35          -8            64            3               -2           4           16
40          -3            9             4               -1           1           3
46          3             9             6               1            1           3
52          9             81            5               0            0           0
55          12            144           7               2            4           24
58          15            225           8               3            9           45
62          19            361           9               4            16          76
ΣX = 430    Σdx = 0       Σdx² = 1712   ΣY = 45         Σdy = -5     Σdy² = 85   Σdx·dy = 353

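$$r_{xy} = \frac{N \sum d_x d_y - \sum d_x \sum d_y}{\sqrt{N \sum d_x^2 - (\sum d_x)^2}\;\sqrt{N \sum d_y^2 - (\sum d_y)^2}} = \frac{10 \times 353 - (0)(-5)}{\sqrt{10 \times 1712 - (0)^2}\;\sqrt{10 \times 85 - (-5)^2}} = \frac{3530}{\sqrt{17120}\;\sqrt{825}}$$

$$= \frac{3530}{130.84 \times 28.72} = 0.939$$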

Thus, there is a very high degree of positive correlation between age and sick days taken. Hence,
it can be concluded that, in this group, older employees tend to report sick on more days than
younger ones.
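
The tabular calculation above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and 43 and 5 are the assumed means used in the table.

```python
import math

age = [20, 30, 32, 35, 40, 46, 52, 55, 58, 62]
sick = [1, 2, 0, 3, 4, 6, 5, 7, 8, 9]

A_X, A_Y = 43, 5  # assumed means, as in the table

# Deviations from the assumed means
dx = [x - A_X for x in age]
dy = [y - A_Y for y in sick]

n = len(age)
sum_dx, sum_dy = sum(dx), sum(dy)
sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(a * b for a, b in zip(dx, dy))

# Karl Pearson's coefficient from deviations about assumed means
numerator = n * sum_dxdy - sum_dx * sum_dy
denominator = (math.sqrt(n * sum_dx2 - sum_dx ** 2)
               * math.sqrt(n * sum_dy2 - sum_dy ** 2))
r = numerator / denominator

print(sum_dx, sum_dy, sum_dx2, sum_dy2, sum_dxdy)  # 0 -5 1712 85 353
print(round(r, 3))  # 0.939
```

Because the formula subtracts the product of the deviation sums, the result does not depend on which assumed means are chosen.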

RANK CORRELATION COEFFICIENT


The rank correlation coefficient is applied to a set of ordinal rank numbers, with 1 for the individual
ranked first in quantity or quality, and so on, to N for the individual ranked last in a group of N
individuals (or N pairs of individuals). Spearman’s rank correlation coefficient is defined as:

$$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$

where R = rank coefficient of correlation,
D = difference between the ranks of paired items in the two series,
N = number of pairs of observations.

The value of this coefficient lies between -1 and +1. When R = +1, there is complete agreement in
the order of the ranks and the ranks are in the same direction. When R = -1, there is complete
disagreement in the order of the ranks: the two rankings are exactly reversed.

Example: Calculate the rank correlation coefficient for the following data of marks of 2
tests given to candidates for a clerical job.

Preliminary test   92  89  87  86  83  77  71  63  53  50
Final test         86  83  91  77  68  85  52  82  37  57

Solution

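Ranks are assigned in ascending order of marks (1 = lowest mark, 10 = highest) in both series;
ranking both series in descending order instead gives the same value of R.

Preliminary test   R1   Final test   R2   D² = (R1 - R2)²
92                 10   86           9    1
89                 9    83           7    4
87                 8    91           10   4
86                 7    77           5    4
83                 6    68           4    4
77                 5    85           8    9
71                 4    52           2    4
63                 3    82           6    9
53                 2    37           1    1
50                 1    57           3    4
N = 10                                    ΣD² = 44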

$$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6(44)}{990} = 1 - 0.267 = 0.733$$

Thus, there is a high degree of positive correlation between the preliminary and final
test marks.
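
The ranking and the coefficient can also be computed programmatically. The following is a small sketch in standard-library Python; the helper `ranks_ascending` is a name introduced here for illustration and assumes there are no tied marks, which holds for this data set.

```python
def ranks_ascending(values):
    """Rank 1 for the smallest value, up to N for the largest (no ties assumed)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


prelim = [92, 89, 87, 86, 83, 77, 71, 63, 53, 50]
final = [86, 83, 91, 77, 68, 85, 52, 82, 37, 57]

r1 = ranks_ascending(prelim)
r2 = ranks_ascending(final)

# Sum of squared rank differences
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))

# Spearman's rank correlation coefficient
n = len(prelim)
R = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(r1)                   # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(r2)                   # [9, 7, 10, 5, 4, 8, 2, 6, 1, 3]
print(sum_d2, round(R, 3))  # 44 0.733
```

With tied values, the simple formula above would need the usual correction for ties, which this sketch does not attempt.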

Regression Analysis
Regression is the statistical tool with which one can estimate (or predict) the
unknown values of one variable from known values of another variable.

Assumptions

(1) Linear regression model,

(2) X values are fixed in repeated sampling,

(3) Zero mean value of the disturbance ui,

(4) Equal variance of ui,

(5) Zero covariance between ui and xi,

(6) The number of observations must be greater than the number of parameters to be
estimated,

(7) The regression model must be correctly specified,

(8) There is no perfect linear relationship among the explanatory variables.

Regression analysis helps in three important ways:

(i) It provides estimates of values of the dependent variable from values of the
independent variables.
(ii) It is used to obtain a measure of the error involved in using the regression line as
a basis of estimation.
(iii) It provides a measure of the degree of association or correlation that exists
between the variables.

Example 2

The following data relate to advertising expenditure (in lakh Tk.) and the corresponding
sales (in crore Tk.):

Advertising expenditure   10  12  15  23  20
Sales                     14  17  23  25  21

(i) Estimate the regression line of Y on X; (ii) estimate the sales corresponding to
advertising expenditure of Tk. 30 lakh; (iii) test the significance of the regression
coefficient at the 5% level of significance.
Solution: Let advertising expenditure be denoted by X and sales by Y.
Adv. expenses (X)   dx = X - 16   dx²          Sales (Y)   dy = Y - 20   dy²         dx·dy
10                  -6            36           14          -6            36          36
12                  -4            16           17          -3            9           12
15                  -1            1            23          3             9           -3
23                  7             49           25          5             25          35
20                  4             16           21          1             1           4
ΣX = 80             Σdx = 0       Σdx² = 118   ΣY = 100    Σdy = 0       Σdy² = 80   Σdx·dy = 84

(i) Regression equation of Y on X:

$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$

$$b_{yx} = \frac{\sum d_x d_y}{\sum d_x^2} = \frac{84}{118} = 0.712; \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{100}{5} = 20; \qquad \bar{X} = \frac{\sum X}{n} = \frac{80}{5} = 16$$

$$Y - 20 = 0.712\,(X - 16); \qquad Y = 0.712X + 20 - 11.392; \qquad Y = 8.608 + 0.712X$$

(ii) Sales corresponding to advertising expenditure of Tk. 30 lakh:
Y = 8.608 + 0.712(30) = 8.608 + 21.36 = 29.968

Thus, the likely sales corresponding to advertising expenditure of Tk. 30 lakh are
Tk. 29.968 crore.
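(iii) Hypotheses: $H_0: \beta = 0$; $H_1: \beta \neq 0$.

To test the significance of the regression coefficient, the test statistic t is used, where

$$t = \frac{\beta}{\sqrt{s^2 / SS(x)}}, \qquad s^2 = \frac{1}{n-2}\left[SS(y) - \beta\,\{SP(xy)\}\right]$$

Here β is the estimated regression coefficient (0.712), SS(x) = Σdx² = 118, SS(y) = Σdy² = 80
and SP(xy) = Σdx·dy = 84.

$$s^2 = \frac{1}{3}\left[80 - 0.712(84)\right] = 6.73; \qquad t = \frac{0.712}{\sqrt{6.73/118}} = \frac{0.712}{0.239} = 2.98$$

Comments: The calculated value of t is 2.98, while the tabulated value t(.05, 3) = 3.182 is
higher than the calculated value. Therefore, we fail to reject the null hypothesis: the
regression coefficient is not statistically significant at the 5% level.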
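
The three parts of this example can be reproduced in a few lines. The sketch below is illustrative and uses only the Python standard library; the variable names are assumptions, and the critical value 3.182 is taken from the text rather than computed. It carries the unrounded slope through every step, so the intercept and the prediction differ slightly in the last digits from the hand calculation, which rounds the slope to 0.712 first.

```python
import math

x = [10, 12, 15, 23, 20]  # advertising expenditure (lakh Tk.)
y = [14, 17, 23, 25, 21]  # sales (crore Tk.)

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums of squares and products about the means
ss_x = sum((a - x_bar) ** 2 for a in x)                       # 118
ss_y = sum((b - y_bar) ** 2 for b in y)                       # 80
sp_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))  # 84

# (i) Regression line of Y on X: Y = a + b X
b = sp_xy / ss_x
a = y_bar - b * x_bar

# (ii) Estimated sales for advertising expenditure of 30
y_hat_30 = a + b * 30

# (iii) t statistic for H0: beta = 0, with n - 2 degrees of freedom
s2 = (ss_y - b * sp_xy) / (n - 2)
t = b / math.sqrt(s2 / ss_x)

T_CRIT = 3.182  # tabulated value quoted in the text (5% level, 3 d.f.)
reject_h0 = abs(t) > T_CRIT

print(round(b, 3), round(a, 2), round(y_hat_30, 2))
print(round(s2, 2), round(t, 2), reject_h0)
```

Comparing `abs(t)` with the tabulated value mirrors the two-sided alternative stated in the hypotheses.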
