Correlation and Regression

Uploaded by

ojas.saxena

Unit 1: Correlation and Regression

Correlation: Correlation is a statistical tool used to measure the relationship between two
or more variables.

Types of correlation

• Positive correlation: indicates a direct relationship between variables. An increase in
one variable is related to an increase in the other, and a decrease in one is related to a
decrease in the other. In a scatter plot, the majority of the data points fall along an
upward angle (from the lower-left corner to the upper-right corner).
• Negative correlation: indicates an inverse relationship between two variables. An increase
in one variable is accompanied by a decrease in the other variable.
• No relationship: in the scatter plot, the data points are scattered in a random fashion.
The correlation coefficient for uncorrelated data is very close to 0.

Correlations can be weak, moderate, or strong. A correlation coefficient (a descriptive
statistic) helps by assigning a numerical value to the observed relationship.
Correlation can be determined using a scatter diagram, the Karl Pearson correlation
coefficient, or the Spearman rank correlation.

Methods to calculate correlation coefficient

1) Karl Pearson correlation coefficient

$$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{n \sum x^2 - (\sum x)^2} \, \sqrt{n \sum y^2 - (\sum y)^2}}$$
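
The raw-sum formula above can be turned into code almost term by term. The following is a minimal sketch, not a reference implementation: the function name `pearson_r` and the sample data are assumptions made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson correlation coefficient computed from raw sums."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator

# Illustrative data (made up for this example)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 30 / sqrt(50 * 30), about 0.7746
```

For this data the sums are n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55 and Σy² = 86, so the numerator is 30 and the two square-root factors are √50 and √30.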

2) Spearman rank correlation

$$r = 1 - 6 \left[ \frac{\sum d^2}{n^3 - n} \right]$$

where $d = R_1 - R_2$,

$R_1$ = ranks of one variable (x),

$R_2$ = ranks of the other variable (y),

$n$ = number of pairs of observations.
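
As a rough illustration of the rank formula, the sketch below ranks each variable, takes the rank differences d, and applies 1 − 6Σd²/(n³ − n). The helper names `ranks` and `spearman_r` and the marks data are hypothetical. The sketch assumes there are no tied values, since this simple form of the formula does not handle ties.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch assumes no tied values")
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]

def spearman_r(xs, ys):
    """Spearman rank correlation: 1 - 6 * sum(d^2) / (n^3 - n)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    r1, r2 = ranks(xs), ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))  # d = R1 - R2
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Illustrative marks of 9 students in two subjects (made up)
marks_maths = [35, 23, 47, 17, 10, 43, 9, 6, 28]
marks_stats = [30, 33, 45, 23, 8, 49, 12, 4, 31]

rho = spearman_r(marks_maths, marks_stats)
print(round(rho, 4))  # sum of d^2 is 12 and n^3 - n is 720, giving 0.9
```

Ranking from the smallest or from the largest value gives the same result, as long as both variables are ranked in the same direction.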

Properties of correlation coefficient:

1) The correlation coefficient is independent of units of measurements of the variables.

2) The coefficient of correlation is not affected by changes of origin and scale (provided the
scale factors applied to the two variables have the same sign).

3) The coefficient of correlation ranges from −1 to +1, i.e. $-1 \le r \le +1$.

• A correlation coefficient measures the degree of relationship between two variables and
can vary between −1 and +1.
• The stronger the relationship between the variables, the closer the coefficient will be to
either −1 or +1.
• The weaker the relationship between the variables, the closer the coefficient will be to 0.
• A correlation of 0 between two variables indicates the absence of any linear relationship,
as might occur by chance.
• The sign preceding the correlation coefficient indicates whether the observed relationship
is positive or negative.
• The terms positive and negative do not refer to good and bad relationships or strong or
weak relationships but rather to how the variables are related.
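
The first two properties can be checked numerically. The sketch below is only an illustration under assumed data: the temperature and sales figures are made up, and `pearson_r` is written in deviation form, which is algebraically equivalent to the raw-sum formula. Converting Celsius to Fahrenheit changes both origin and scale, yet r stays the same; a negative scale factor only flips its sign.

```python
import math

def pearson_r(xs, ys):
    """Pearson r in deviation-from-mean form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Illustrative data (made up): temperature in Celsius and units sold
celsius = [10, 15, 20, 25, 30]
sales = [12, 18, 21, 30, 33]

# Change of origin and scale: new units for both variables
fahrenheit = [c * 9 / 5 + 32 for c in celsius]
sales_in_dozens = [s / 12 for s in sales]

r_original = pearson_r(celsius, sales)
r_rescaled = pearson_r(fahrenheit, sales_in_dozens)
r_flipped = pearson_r([-c for c in celsius], sales)

print(math.isclose(r_original, r_rescaled))  # same r in the new units
print(math.isclose(r_flipped, -r_original))  # negative scale flips the sign
print(-1 <= r_original <= 1)                 # r stays within [-1, +1]
```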

Regression

Correlation coefficients tell us only about the strength and direction of a linear relationship.
Regression analysis is a procedure that enables us to predict an individual's score on one
variable based on knowing one or more other variables.
The tendency of actual values to lie close to the estimated values is called regression.

Regression is the theory of estimating the unknown value of one variable with the help of known
values of other variables.
Regression analysis involves determining the equation of the best-fitting line for a data set.
This regression equation is what allows such predictions to be made.
Properties of regression coefficients:

1) The regression coefficient of x on y, $b_{xy}$, is the change occurring in x for a unit change
in y. The regression coefficient of y on x, $b_{yx}$, is the change occurring in y for a unit
change in x.

2) Regression coefficients are independent of a change of origin but not of a change of scale.

3) The geometric mean of the regression coefficients is equal in magnitude to the coefficient of
correlation, i.e. $r = \pm\sqrt{b_{xy} \, b_{yx}}$, where r takes the common sign of the two
regression coefficients.

4) Since the coefficient of correlation cannot numerically be greater than one, the product of
the regression coefficients cannot be greater than one: $b_{xy} \, b_{yx} = r^2 \le 1$.

5) The two regression coefficients cannot be of different signs. Either both regression
coefficients are positive or both are negative.

6) If the variables are positively correlated, the regression lines have positive slope. If the
variables are negatively correlated, the regression lines have negative slope.

7) If there is perfect correlation, the two regression lines coincide.

8) If there is no correlation, the two regression lines are perpendicular to each other.
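
Several of these properties can be verified on a small data set. The sketch below is illustrative only: the helper `regression_coefficients` and the data are assumptions, and the coefficients are computed with the usual raw-sum formulae for $b_{xy}$ and $b_{yx}$.

```python
import math

def regression_coefficients(xs, ys):
    """Return (b_xy, b_yx) computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    cross = n * sum_xy - sum_x * sum_y
    b_xy = cross / (n * sum_y2 - sum_y ** 2)
    b_yx = cross / (n * sum_x2 - sum_x ** 2)
    return b_xy, b_yx

# Illustrative data (made up for this example)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_xy, b_yx = regression_coefficients(x, y)

# Properties 3 to 5: same sign, product at most 1, r is the signed geometric mean
same_sign = (b_xy > 0) == (b_yx > 0)
product = b_xy * b_yx
r = math.copysign(math.sqrt(product), b_yx)
print(b_xy, b_yx, same_sign, product <= 1, round(r, 4))

# Property 2: shifting the origin leaves the coefficients unchanged ...
shifted = regression_coefficients([v + 10 for v in x], [v - 3 for v in y])
# ... but rescaling x (here by 2) halves b_yx and doubles b_xy
scaled = regression_coefficients([2 * v for v in x], y)
print(shifted, scaled)
```

For this data $b_{yx} = 30/50 = 0.6$ and $b_{xy} = 30/30 = 1$, so the product is 0.6 and r is about 0.7746.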

Important formulae:

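1) The regression equation of X on Y is given by

$$X - \bar{X} = b_{xy} (Y - \bar{Y})$$

where $b_{xy}$ is the regression coefficient of X on Y and is given by

$$b_{xy} = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum y^2 - (\sum y)^2}$$

2) The regression equation of Y on X is given by

$$Y - \bar{Y} = b_{yx} (X - \bar{X})$$

where $b_{yx}$ is the regression coefficient of Y on X and is given by

$$b_{yx} = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$$

3) $r = \pm\sqrt{b_{xy} \, b_{yx}}$, with r taking the common sign of $b_{xy}$ and $b_{yx}$

4) $b_{xy} = r \, \dfrac{\sigma_x}{\sigma_y}$ and $b_{yx} = r \, \dfrac{\sigma_y}{\sigma_x}$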
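
To see how these formulae fit together, the sketch below builds both regression lines for a small made-up data set, uses them for prediction, and checks formula 4 against population standard deviations. The function names `predict_y` and `predict_x` and the data are hypothetical, chosen only for illustration.

```python
import math
from statistics import mean, pstdev

# Illustrative data (made up for this example)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Regression coefficients from the raw-sum formulae
cross = n * sum_xy - sum_x * sum_y
b_xy = cross / (n * sum_y2 - sum_y ** 2)
b_yx = cross / (n * sum_x2 - sum_x ** 2)
x_bar, y_bar = mean(x), mean(y)

def predict_y(x_value):
    """Regression line of Y on X: Y - y_bar = b_yx * (X - x_bar)."""
    return y_bar + b_yx * (x_value - x_bar)

def predict_x(y_value):
    """Regression line of X on Y: X - x_bar = b_xy * (Y - y_bar)."""
    return x_bar + b_xy * (y_value - y_bar)

# Formula 3: r is the signed geometric mean of the two coefficients
r = math.copysign(math.sqrt(b_xy * b_yx), b_yx)

# Formula 4: the same coefficients via r and the standard deviations
sigma_x, sigma_y = pstdev(x), pstdev(y)
print(math.isclose(b_xy, r * sigma_x / sigma_y))
print(math.isclose(b_yx, r * sigma_y / sigma_x))

print(round(predict_y(6), 2))  # estimated Y when X = 6
print(round(predict_x(3), 2))  # estimated X when Y = 3
```

Note that the two lines answer different questions: the line of Y on X is used to estimate Y from a known X, and the line of X on Y to estimate X from a known Y. Both pass through the point of means.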
