Online Correlation and Regression
What is correlation?
Correlation is a statistical tool for studying the relationship between two or
more variables. The measure of correlation, called the co-efficient of
correlation (denoted by the symbol r), summarizes in one figure the direction
and the degree of correlation. Correlation analysis refers to the techniques
used to measure the closeness of the relationship between the variables.
Types of Correlation
Correlation is described or classified in several different ways. Three of the most important
are:
(iv) The co-efficient of correlation is the geometric mean of the two regression co-efficients.
Symbolically, $r = \sqrt{b_{yx} \, b_{xy}}$.
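The identity can be checked from the standard expressions for the two regression co-efficients in terms of r. The symbols $\sigma_x$ and $\sigma_y$ (the standard deviations of X and Y) are notation assumed here for illustration; they do not appear in the text above.

```latex
\begin{align*}
b_{yx} &= r\,\frac{\sigma_y}{\sigma_x}, \qquad b_{xy} = r\,\frac{\sigma_x}{\sigma_y} \\
b_{yx}\,b_{xy} &= r^2 \\
r &= \pm\sqrt{b_{yx}\,b_{xy}}
\end{align*}
```

The sign of r is the common sign of the two regression co-efficients, since both carry the sign of the covariance between X and Y.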
Example: The following data relate to the age of 10 employees and the number of days on
which they reported sick in a month.
Age        20  30  32  35  40  46  52  55  58  62
Sick days   1   2   0   3   4   6   5   7   8   9
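Solution: Taking deviations $d_x = X - 43$ (the mean age) and $d_y = Y - 5$ (an assumed mean), the data give $\sum d_x = 0$, $\sum d_y = -5$, $\sum d_x^2 = 1712$, $\sum d_y^2 = 85$ and $\sum d_x d_y = 353$, with $N = 10$. Then

$$r_{xy} = \frac{N \sum d_x d_y - \sum d_x \sum d_y}{\sqrt{N \sum d_x^2 - \left(\sum d_x\right)^2} \, \sqrt{N \sum d_y^2 - \left(\sum d_y\right)^2}} = \frac{10 \times 353 - (0)(-5)}{\sqrt{10 \times 1712 - (0)^2} \, \sqrt{10 \times 85 - (-5)^2}} = \frac{3530}{\sqrt{17120} \, \sqrt{825}}$$

$$= \frac{3530}{130.84 \times 28.72} = 0.939$$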
Thus, there is a very high degree of positive correlation between age and the number of sick days
taken. Hence, it can be concluded that as an employee's age increases, the employee is likely to
report sick more often.
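The calculation above can be sketched in Python. This is a minimal illustration, not part of the original notes: the function name `pearson_r` and its parameters are hypothetical, and the origins 43 and 5 are inferred from the sums used in the solution (∑dx = 0 and ∑dy = -5).

```python
from math import sqrt

# Data from the example above.
ages = [20, 30, 32, 35, 40, 46, 52, 55, 58, 62]
sick_days = [1, 2, 0, 3, 4, 6, 5, 7, 8, 9]


def pearson_r(xs, ys, x_origin=0.0, y_origin=0.0):
    """Pearson's r computed from deviations about arbitrary origins."""
    n = len(xs)
    dx = [x - x_origin for x in xs]
    dy = [y - y_origin for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = sqrt(n * sum_dx2 - sum_dx ** 2) * sqrt(n * sum_dy2 - sum_dy ** 2)
    return numerator / denominator


# Origins 43 and 5 are assumptions inferred from the sums in the solution.
r = pearson_r(ages, sick_days, x_origin=43, y_origin=5)
print(round(r, 3))
```

Because the formula subtracts the products of the sums, the result does not depend on the origins chosen; calling `pearson_r(ages, sick_days)` with the default origins gives the same value.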
$$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$
Example: Calculate the rank correlation co-efficient for the following marks in two
tests given to candidates for a clerical job.
Preliminary test  92  89  87  86  83  77  71  63  53  50
Final test        86  83  91  77  68  85  52  82  37  57
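Solution: Ranking the marks in each test from highest (rank 1) to lowest and taking the rank difference D for each candidate gives $\sum D^2 = 44$. With $N = 10$,

$$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6(44)}{990} = 1 - 0.267 = 0.733$$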
Thus, there is a high degree of positive correlation between the marks in the preliminary
and final tests.
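The ranking and the sum of squared rank differences can be reproduced with a short Python sketch. The helper names `ranks_descending` and `spearman_r` are hypothetical, and the ranking helper assumes there are no tied marks, which holds for this data set.

```python
# Marks from the example above.
prelim = [92, 89, 87, 86, 83, 77, 71, 63, 53, 50]
final = [86, 83, 91, 77, 68, 85, 52, 82, 37, 57]


def ranks_descending(scores):
    """Rank 1 goes to the highest score; assumes there are no ties."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]


def spearman_r(xs, ys):
    """Return (R, sum of squared rank differences) for untied data."""
    n = len(xs)
    rank_x, rank_y = ranks_descending(xs), ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1)), sum_d2


rho, sum_d2 = spearman_r(prelim, final)
print(sum_d2, round(rho, 3))
```

For this data the sketch gives a sum of squared rank differences of 44 and a rank correlation of about 0.733.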
Regression Analysis
Regression is the statistical tool used to estimate (or predict) the unknown
values of one variable from the known values of another variable.
Assumption
Example 2
The following data relate to advertising expenditure (in lakh Tk.) and the corresponding
sales (in crore Tk.):
Advertising Expenditure  10  12  15  23  20
Sales                    14  17  23  25  21
(i) Estimate the regression line of Y on X; (ii) estimate the sales corresponding to an
advertising expenditure of Tk. 30 lakh; (iii) test the significance of the regression
co-efficient at the 5% level of significance.
Solution: Let advertising expenditure be denoted by X and sales by Y.
Adv. expenses (X)   dx = X - 16   dx²   Sales (Y)   dy = Y - 20   dy²   dx·dy
10                  -6            36    14          -6            36    36
12                  -4            16    17          -3             9    12
15                  -1             1    23           3             9    -3
23                   7            49    25           5            25    35
20                   4            16    21           1             1     4
Totals: ∑X = 80, ∑dx = 0, ∑dx² = 118, ∑Y = 100, ∑dy = 0, ∑dy² = 80, ∑dx·dy = 84
Thus, the likely sales corresponding to an advertising expenditure of Tk. 30 lakh are
Tk. 29.968 crore.
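The figure of 29.968 can be checked with a minimal Python sketch of the least-squares line of Y on X. The sketch assumes the usual estimates, slope b = SP(xy)/SS(x) and intercept a = mean(Y) - b·mean(X), and rounds the slope to three decimals as a hand calculation would; the variable names are illustrative only.

```python
# Data from Example 2.
x = [10, 12, 15, 23, 20]  # advertising expenditure, lakh Tk.
y = [14, 17, 23, 25, 21]  # sales, crore Tk.

n = len(x)
x_mean = sum(x) / n  # 16
y_mean = sum(y) / n  # 20

# Sum of squares of X and sum of products about the means.
ss_x = sum((xi - x_mean) ** 2 for xi in x)
sp_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

# Assumption: slope rounded to 3 decimals, as in a hand calculation.
b = round(sp_xy / ss_x, 3)
a = y_mean - b * x_mean

predicted = a + b * 30  # estimated sales for Tk. 30 lakh of advertising
print(b, round(a, 3), round(predicted, 3))
```

With the rounded slope of 0.712 the intercept is 8.608, so the fitted line is Y = 8.608 + 0.712X and the estimate at X = 30 is 29.968. Without rounding the slope, the estimate is marginally lower (about 29.966).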
(iii) Hypotheses: $H_0: \beta = 0$; $H_1: \beta \neq 0$.
To test the significance of $\beta$, the test statistic t is used, where

$$t = \frac{\beta}{\sqrt{s^2 / SS(x)}}, \qquad s^2 = \frac{1}{n-2}\left[SS(y) - \beta\,SP(xy)\right]$$

Here

$$s^2 = \frac{1}{3}\left[80 - 0.712(84)\right] = 6.73; \qquad t = \frac{0.712}{\sqrt{6.73/118}} = \frac{0.712}{0.239} = 2.98$$

Comments: The calculated value of t is 2.98, while the tabulated value $t_{(0.05,\,3)} = 3.182$ is
higher than the calculated value. Therefore, we fail to reject the null hypothesis: the
regression co-efficient is not significant at the 5% level.
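The test can be sketched in Python using the quantities from the worked table. The slope 0.712 and the tabulated value 3.182 are taken from the text; the variable names are illustrative, and the critical value is hard-coded rather than looked up from a statistical library.

```python
from math import sqrt

n = 5                              # number of observations
ss_x, ss_y, sp_xy = 118, 80, 84    # SS(x), SS(y), SP(xy) from the table
beta = 0.712                       # slope as rounded in the text
t_critical = 3.182                 # tabulated t at the 5% level, 3 d.f. (from the text)

# Residual variance and the t statistic for H0: beta = 0.
s2 = (ss_y - beta * sp_xy) / (n - 2)
t = beta / sqrt(s2 / ss_x)

# Reject H0 only if |t| exceeds the tabulated value.
significant = abs(t) > t_critical
print(round(s2, 2), round(t, 2), significant)
```

At full precision the statistic comes out close to 2.98; small differences from a hand-computed figure arise from rounding intermediate values. Since it is below 3.182, the sketch reports the co-efficient as not significant at the 5% level.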