Lecture 4 - 7 - Association Between Variables - Correlation
Usha Mohan
Statistics for Data Science -1
Association between numerical variables
Measuring association: Correlation
Learning objectives
Correlation
- A more easily interpreted measure of linear association between
  two numerical variables is correlation.
- It is derived from covariance.
- To find the correlation between two numerical variables x and
  y, divide the covariance between x and y by the product of the
  standard deviations of x and y. The Pearson correlation
  coefficient, r, between x and y is given by

\[
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
  = \frac{\mathrm{cov}(x, y)}{s_x s_y}
\]
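The two forms of the formula can be checked against each other with a short Python sketch. The function names `pearson_r` and `pearson_r_from_cov` and the data values are hypothetical, chosen only for illustration; the sketch assumes the sample (n - 1) versions of covariance and standard deviation.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson r from the sums-of-deviations form of the formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    # Undefined (division by zero) if either variable is constant.
    return s_xy / sqrt(s_xx * s_yy)


def pearson_r_from_cov(x, y):
    """Pearson r as sample covariance over the product of sample SDs."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    return cov_xy / (stdev(x) * stdev(y))


# Hypothetical data, not taken from the lecture.
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [10.0, 14.0, 19.0, 22.0, 30.0]

r1 = pearson_r(x, y)
r2 = pearson_r_from_cov(x, y)
print(r1, r2)  # the two values agree up to floating-point error
```

The (n - 1) factors in the covariance and in the two standard deviations cancel, which is why the two forms agree.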
Remark
The units of the standard deviations cancel out the units of
covariance, so the correlation is a unitless number.
Remark
It can be shown that the correlation measure always lies between
-1 and +1
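Both remarks can be illustrated numerically. The sketch below uses hypothetical height and weight values (not from the lecture) and hypothetical helper names `covariance` and `correlation`; it rescales one variable from centimetres to metres and compares covariance and correlation before and after.

```python
from math import sqrt, isclose
from statistics import mean


def covariance(x, y):
    """Sample covariance with the (n - 1) divisor."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Covariance divided by the product of the standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical measurements, for illustration only.
heights_cm = [150.0, 160.0, 165.0, 172.0, 181.0]
weights_kg = [52.0, 58.0, 63.0, 70.0, 79.0]
heights_m = [h / 100 for h in heights_cm]  # same data, different unit

cov_cm = covariance(heights_cm, weights_kg)  # units: cm * kg
cov_m = covariance(heights_m, weights_kg)    # units: m * kg
r_cm = correlation(heights_cm, weights_kg)   # unitless
r_m = correlation(heights_m, weights_kg)     # unitless

print(cov_cm, cov_m)  # differ by a factor of 100
print(r_cm, r_m)      # equal up to floating-point error
```

The covariance changes with the unit of measurement, while the correlation does not and stays within the interval from -1 to +1.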
Correlation: Example 1
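- s_x = 1.58, s_y = 13.01
- The correlation, computed either way, is

\[
r = \frac{82}{\sqrt{10 \times 677.2}}
\quad \text{or} \quad
r = \frac{20.5}{1.58 \times 13.01} = 0.9964
\]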
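The arithmetic in this example can be re-checked from the summary figures alone. In the sketch below the variable names are hypothetical, and the value n - 1 = 4 is not stated in the text; it is inferred from the ratio 82 / 20.5.

```python
from math import sqrt

# Summary figures quoted in the example.
sum_xy_dev = 82.0    # sum of (x_i - x_bar)(y_i - y_bar)
sum_xx_dev = 10.0    # sum of (x_i - x_bar)^2
sum_yy_dev = 677.2   # sum of (y_i - y_bar)^2
cov_xy = 20.5        # sample covariance

# Inferred, not stated in the text: cov = sum / (n - 1), so n - 1 = 82 / 20.5.
n_minus_1 = sum_xy_dev / cov_xy

# Form 1: ratio of sums of deviations.
r_from_sums = sum_xy_dev / sqrt(sum_xx_dev * sum_yy_dev)

# Form 2: covariance over the product of unrounded standard deviations.
s_x = sqrt(sum_xx_dev / n_minus_1)
s_y = sqrt(sum_yy_dev / n_minus_1)
r_from_cov = cov_xy / (s_x * s_y)

# Form 2 again, using the standard deviations rounded to two decimals.
r_rounded_sd = cov_xy / (1.58 * 13.01)

print(n_minus_1, r_from_sums, r_from_cov, r_rounded_sd)
```

With unrounded standard deviations both forms give the same value, close to 0.9964; plugging in the rounded 1.58 and 13.01 gives a slightly larger value (about 0.997), which is a rounding effect only.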
Correlation: Example 2
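- s_x = 1.58, s_y = 1.58
- The correlation, computed either way, is

\[
r = \frac{-10}{\sqrt{10} \times \sqrt{10}}
\quad \text{or} \quad
r = \frac{-2.5}{1.58 \times 1.58} = -1
\]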
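A correlation of exactly -1 arises when the points lie on a downward-sloping straight line. The data for this example is not present in the text, so the sketch below uses a hypothetical data set, chosen because it reproduces the same summary figures (sums of squared deviations of 10 and a cross-deviation sum of -10).

```python
from math import sqrt
from statistics import mean

# Hypothetical data lying exactly on the line y = 6 - x.
x = [1, 2, 3, 4, 5]
y = [6 - xi for xi in x]

x_bar, y_bar = mean(x), mean(y)

sum_xy_dev = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sum_xx_dev = sum((xi - x_bar) ** 2 for xi in x)
sum_yy_dev = sum((yi - y_bar) ** 2 for yi in y)

cov_xy = sum_xy_dev / (len(x) - 1)  # sample covariance
r = sum_xy_dev / (sqrt(sum_xx_dev) * sqrt(sum_yy_dev))

print(sum_xy_dev, sum_xx_dev, sum_yy_dev, cov_xy, r)
```

Any other data set with the same three sums of deviations would give the same r.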
Section summary
Fitting a line
Learning objectives
Section summary