Correlation and Regression
Correlation: Correlation is a statistical tool used to measure the relationship between two
or more variables.
Types of correlation
$$ r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{n \sum x^2 - (\sum x)^2} \, \sqrt{n \sum y^2 - (\sum y)^2}} $$
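As a minimal sketch, the formula above can be evaluated directly from the raw sums. The function name `pearson_r` and the data (`hours`, `marks`) are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient computed from raw sums, term by term as in the formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # A constant variable has no spread, so r is undefined.
        raise ValueError("correlation is undefined when a variable is constant")
    return numerator / denominator

# Hypothetical data, for illustration only.
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]
r = pearson_r(hours, marks)
print(round(r, 4))
```

For this data the sums are n = 5, ∑x = 15, ∑y = 20, ∑xy = 66, ∑x² = 55 and ∑y² = 86, so the numerator is 30 and the denominator is √50 · √30, giving r ≈ 0.7746.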
$$ r = 1 - \frac{6 \sum d^2}{n^3 - n} $$
Where $d = R_1 - R_2$
$R_1$ = Ranks of one variable (x)
$R_2$ = Ranks of the other variable (y)
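The rank formula can be sketched in the same way. This is an illustrative version that assumes no tied values (ties would need averaged ranks, which the formula above does not cover). The helper names `ranks` and `rank_correlation` and the two judges' scores are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def rank_correlation(x, y):
    """r = 1 - 6 * sum(d^2) / (n^3 - n), with d = R1 - R2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this sketch does not handle tied values")
    r1, r2 = ranks(x), ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Hypothetical scores given by two judges to five contestants.
judge_a = [86, 70, 92, 65, 78]
judge_b = [80, 75, 90, 60, 72]
rho = rank_correlation(judge_a, judge_b)
print(rho)
```

Here the ranks are R1 = [4, 2, 5, 1, 3] and R2 = [4, 3, 5, 1, 2], so ∑d² = 2 and r = 1 − 12/120 = 0.9.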
2) The coefficient of correlation is not affected by a change of origin or scale.
A correlation coefficient measures the degree of relationship between two variables and
can vary from -1 to +1.
The stronger the relationship between the variables, the closer the coefficient will be to
either –1 or +1.
The weaker the relationship between the variables, the closer the coefficient will be to 0.
A correlation of 0 between two variables indicates the absence of any linear relationship, as
might occur by chance. The variables may still be related in a nonlinear way.
The sign preceding the correlation coefficient indicates whether the observed relationship
is positive or negative.
The terms positive and negative do not refer to good and bad relationships or strong or
weak relationships but rather to how the variables are related.
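The properties above can be checked numerically. The sketch below uses the deviation form of the coefficient, which is algebraically equivalent to the raw-sum formula. All data and the particular shifts and scale factors are hypothetical, and the scale factors are assumed to be positive.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_original = pearson_r(x, y)

# Change of origin and scale: u = (x - 3) / 2, v = (y - 10) * 5.
u = [(a - 3) / 2 for a in x]
v = [(b - 10) * 5 for b in y]
r_transformed = pearson_r(u, v)   # same value as r_original

# Reversing the direction of one variable flips only the sign.
r_negative = pearson_r(x, [-b for b in y])

# y = x^2 on symmetric x: a perfect nonlinear relationship, yet r = 0.
x_sym = [-2, -1, 0, 1, 2]
r_parabola = pearson_r(x_sym, [a * a for a in x_sym])

print(r_original, r_transformed, r_negative, r_parabola)
```

The last case illustrates why a coefficient of 0 should be read as "no linear relationship" rather than "no relationship at all".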
Regression
Correlation coefficients tell us only about linear relationships. Regression analysis is a tool
that enables us to predict an individual’s score on one variable from known values of one or
more other variables.
The tendency of actual values to lie close to the estimated values is called regression.
Regression is the theory of estimating the unknown value of one variable with the help of known
values of other variables.
Regression analysis involves determining the equation for the best-fitting line for a data set.
Regression analysis allows you to make such predictions by developing a regression equation.
In short, regression is a procedure that allows us to predict an individual’s score on one
variable based on knowing one or more other variables.
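As an illustration of developing a regression equation and using it for prediction, the sketch below fits the line of y on x by least squares, using the slope formula for the regression coefficient of y on x listed under the important formulae. The function name `fit_y_on_x` and the data are hypothetical.

```python
def fit_y_on_x(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # The fitted line passes through the point of means (x-bar, y-bar).
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = fit_y_on_x(x, y)

# Predict y for a new x value from the fitted equation.
predicted = intercept + slope * 6
print(intercept, slope, predicted)
```

For this data the slope is 30/50 = 0.6 and the intercept is 4 − 0.6 · 3 = 2.2, so the predicted value at x = 6 is about 5.8.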
Properties of regression coefficients:
1) The regression coefficient of x on y, i.e. $b_{xy}$, is the change occurring in x for a unit change in y.
The regression coefficient of y on x, i.e. $b_{yx}$, is the change occurring in y for a unit change in x.
2) Regression coefficients are independent of a change of origin but not of a change of scale.
3) The geometric mean of the regression coefficients is equal to the coefficient of correlation, i.e.
$r = \pm\sqrt{b_{xy} \cdot b_{yx}}$, where r takes the common sign of the two regression coefficients.
4) Since the coefficient of correlation cannot numerically be greater than one, the product of the
regression coefficients cannot be greater than one.
5) The two regression coefficients cannot be of different signs. Both regression coefficients are
positive or both are negative.
6) If the variables are positively correlated, the regression line will have a positive slope. If the
variables are negatively correlated, the regression line will have a negative slope.
8) If there is no correlation, then the two regression lines are perpendicular to each other.
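Several of these properties can be verified on a small data set. The sketch below is illustrative only: the function name `regression_coefficients`, the data, and the chosen shifts and scale factor are hypothetical, and the coefficients are computed in deviation form, which is algebraically equivalent to the raw-sum form.

```python
import math

def regression_coefficients(x, y):
    """Return (b_xy, b_yx, r) computed from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    b_xy = sxy / syy   # change in x for a unit change in y
    b_yx = sxy / sxx   # change in y for a unit change in x
    r = sxy / math.sqrt(sxx * syy)
    return b_xy, b_yx, r

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_xy, b_yx, r = regression_coefficients(x, y)

# Both coefficients share one sign, and r carries that sign.
same_sign = (b_xy > 0) == (b_yx > 0)
geometric_mean = math.copysign(math.sqrt(b_xy * b_yx), b_yx)
product = b_xy * b_yx   # equals r squared, so it cannot exceed one

# Change of origin: the coefficients are unchanged.
shifted = regression_coefficients([a + 10 for a in x], [b - 3 for b in y])

# Change of scale: doubling x doubles b_xy and halves b_yx.
scaled = regression_coefficients([2 * a for a in x], y)

print(b_xy, b_yx, r, geometric_mean, product)
```

With this data b_xy = 1.0 and b_yx = 0.6, so their product is 0.6 and r = √0.6 ≈ 0.7746. After doubling x the coefficients become 2.0 and 0.3, while r is unchanged.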
Important formulae:
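1) Regression equation of X on Y is given by $X - \bar{X} = b_{xy} (Y - \bar{Y})$
$$ b_{xy} = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum y^2 - (\sum y)^2} $$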
2) Regression equation of Y on X is given by $Y - \bar{Y} = b_{yx} (X - \bar{X})$
$$ b_{yx} = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2} $$
3) $r = \pm\sqrt{b_{xy} \cdot b_{yx}}$, with the sign of r the same as that of the regression coefficients
4) $b_{xy} = r \cdot \frac{\sigma_x}{\sigma_y}$ and $b_{yx} = r \cdot \frac{\sigma_y}{\sigma_x}$
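A short numerical check of these formulae, as a sketch under stated assumptions: the data are hypothetical, and the standard deviations are taken as population values via `statistics.pstdev`. Because only the ratio of the two standard deviations enters the formulae, sample standard deviations would give the same coefficients.

```python
import math
import statistics

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Regression coefficients from the raw-sum formulae.
numerator = n * sum_xy - sum_x * sum_y
b_xy = numerator / (n * sum_y2 - sum_y ** 2)
b_yx = numerator / (n * sum_x2 - sum_x ** 2)

# r is the geometric mean of the coefficients, carrying their common sign.
r = math.copysign(math.sqrt(b_xy * b_yx), numerator)

# The same coefficients recovered from r and the standard deviations.
sigma_x = statistics.pstdev(x)
sigma_y = statistics.pstdev(y)
b_xy_from_sigma = r * sigma_x / sigma_y
b_yx_from_sigma = r * sigma_y / sigma_x

# Regression equation of X on Y: X - mean_x = b_xy * (Y - mean_y).
mean_x, mean_y = sum_x / n, sum_y / n

def predict_x(y_value):
    return mean_x + b_xy * (y_value - mean_y)

print(b_xy, b_xy_from_sigma, b_yx, b_yx_from_sigma, predict_x(5))
```

Both routes give b_xy = 1.0 and b_yx = 0.6 for this data, and the equation of X on Y estimates x = 4 when y = 5.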