16. Correlation Analysis - Michael
Correlation & Causation
Correlation denotes the interdependence between variables: it describes how two phenomena vary together.
Cont..
• If two variables vary in such a way that movement in one is accompanied by movement in the other, the variables are said to be correlated. Such co-movement alone does not establish a cause-and-effect relationship.
Types of correlation
Type 1
A) POSITIVE CORRELATION
Pearson’s Linear Correlation Coefficient:
$$ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}} $$
Where:
i. r = correlation coefficient
ii. xᵢ = values of the x-variable in a sample
iii. x̄ = mean of the values of the x-variable
iv. yᵢ = values of the y-variable in a sample
v. ȳ = mean of the values of the y-variable
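The formula can be checked with a short calculation. The sketch below is a minimal plain-Python illustration (standard library only); the helper name `pearson_r` and the small sample data are hypothetical, chosen only to show the definitional formula term by term.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from the definitional formula:
    sum((x_i - x_bar)(y_i - y_bar)) / sqrt(sum((x_i - x_bar)^2) * sum((y_i - y_bar)^2))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be the same length, with at least 2 pairs")
    n = len(x)
    x_bar = sum(x) / n                      # mean of the x-variable
    y_bar = sum(y) / n                      # mean of the y-variable
    dx = [xi - x_bar for xi in x]           # deviations x_i - x_bar
    dy = [yi - y_bar for yi in y]           # deviations y_i - y_bar
    s_xy = sum(a * b for a, b in zip(dx, dy))
    s_xx = sum(a * a for a in dx)
    s_yy = sum(b * b for b in dy)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / sqrt(s_xx * s_yy)

print(round(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]), 3))  # 0.965
```

Here the numerator is 11 and the two sums of squares are 5 and 26, so r = 11 / √130 ≈ 0.965.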
Characteristics of r
• It is a bivariate correlation coefficient summarizing the magnitude (strength) and direction of the linear relationship between two variables.
Cont..
• Ranges between -1 and +1:
  r = 0: no linear relationship (uncorrelated)
  r = +1: perfect positive linear relationship
  r = -1: perfect negative linear relationship
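These bounds can be checked numerically. The following sketch uses a hypothetical plain-Python `pearson_r` helper and made-up data: an exactly increasing straight line gives r = +1, an exactly decreasing one gives r = -1, and scattered values fall in between.

```python
from math import sqrt

def pearson_r(x, y):
    # Definitional formula for Pearson's r (hypothetical helper; assumes neither variable is constant)
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

x = [1, 2, 3, 4, 5]
up = [2 * v + 1 for v in x]     # exact increasing straight line
down = [10 - 3 * v for v in x]  # exact decreasing straight line
scattered = [3, 1, 4, 1, 5]     # illustrative values with a weak upward pattern

print(pearson_r(x, up))                    # 1.0
print(pearson_r(x, down))                  # -1.0
print(round(pearson_r(x, scattered), 3))   # 0.354
```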
1. Calculate the correlation coefficient between age (years) and body mass index (BMI) for patients who attended the clinic in a certain department.
No.  Age (years)  BMI (kg/m²)
1    73           28
2    22           22
3    74           27
4    34           29
5    50           29
6    42           27
7    64           28
8    53           29
9    43           24
10   21           19
11   12           17
• Recall the formula for r:
$$ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}} $$
X    Y    X - X̄           Y - Ȳ          (X - X̄)(Y - Ȳ)   (X - X̄)²   (Y - Ȳ)²
73   28   73 - 44 = 29    28 - 25 = 3    87                841         9
22   22   22 - 44 = -22   22 - 25 = -3   66                484         9
74   27   74 - 44 = 30    27 - 25 = 2    60                900         4
34   29   34 - 44 = -10   29 - 25 = 4    -40               100         16
50   29   50 - 44 = 6     29 - 25 = 4    24                36          16
42   27   42 - 44 = -2    27 - 25 = 2    -4                4           4
64   28   64 - 44 = 20    28 - 25 = 3    60                400         9
53   29   53 - 44 = 9     29 - 25 = 4    36                81          16
43   24   43 - 44 = -1    24 - 25 = -1   1                 1           1
21   19   21 - 44 = -23   19 - 25 = -6   138               529         36
12   17   12 - 44 = -32   17 - 25 = -8   256               1024        64
X̄ ≈ 44   Ȳ ≈ 25   Σ = 4   Σ = 4   Σ = 684   Σ = 4400   Σ = 184
(The means are rounded: X̄ = 488/11 ≈ 44.36 and Ȳ = 279/11 ≈ 25.36.)
r = 684 / √(4400 × 184) = 684 / 899.78 ≈ 0.760
Using the unrounded means gives r ≈ 0.762, so rounding the means changes the result only slightly.
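The column-by-column hand calculation can be reproduced in a few lines of plain Python. This sketch (the helper `r_from_deviations` is hypothetical) builds the same columns as the table and evaluates r twice: with the means rounded to 44 and 25, and with the exact means 488/11 and 279/11.

```python
from math import sqrt

# Age (years) and BMI (kg/m²) for the 11 patients in the example table
age = [73, 22, 74, 34, 50, 42, 64, 53, 43, 21, 12]
bmi = [28, 22, 27, 29, 29, 27, 28, 29, 24, 19, 17]

def r_from_deviations(x, y, x_bar, y_bar):
    """Pearson's r built column by column, as in the hand-worked table."""
    dx = [xi - x_bar for xi in x]                 # X - X̄
    dy = [yi - y_bar for yi in y]                 # Y - Ȳ
    s_xy = sum(a * b for a, b in zip(dx, dy))     # Σ(X - X̄)(Y - Ȳ)
    s_xx = sum(a * a for a in dx)                 # Σ(X - X̄)²
    s_yy = sum(b * b for b in dy)                 # Σ(Y - Ȳ)²
    return s_xy, s_xx, s_yy, s_xy / sqrt(s_xx * s_yy)

# Means rounded to whole numbers, as in the hand calculation
s_xy, s_xx, s_yy, r_rounded = r_from_deviations(age, bmi, 44, 25)
print(s_xy, s_xx, s_yy, round(r_rounded, 3))   # 684 4400 184 0.76

# Exact means: 488/11 and 279/11
_, _, _, r_exact = r_from_deviations(age, bmi, sum(age) / len(age), sum(bmi) / len(bmi))
print(round(r_exact, 3))                       # 0.762
```

With exact means the two deviation columns would each sum to 0; rounding the means to whole numbers shifts r only in the third decimal place here.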
Degree of Correlation
[Figure: panels a, b and c]
Coefficient of Determination
• The coefficient of determination, r², is the square of Pearson’s correlation coefficient r.
• It gives the proportion (often expressed as a percentage) of the variation in y that is explained by its linear relationship with x.
• In other words, r² = 0.612 would mean that only 61.2% of the variation in SBP (y) is explained by its linear relationship with age (x); the remaining 38.8% is left unexplained.
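The same reading can be applied to the age and BMI data from the example table. This is a plain-Python sketch with a hypothetical `pearson_r` helper; it computes r from the unrounded means and then squares it to get the share of the variation in BMI explained by the linear relationship with age.

```python
from math import sqrt

# Age (years) and BMI (kg/m²) from the example table
age = [73, 22, 74, 34, 50, 42, 64, 53, 43, 21, 12]
bmi = [28, 22, 27, 29, 29, 27, 28, 29, 24, 19, 17]

def pearson_r(x, y):
    # Definitional formula for Pearson's r (hypothetical helper, not from the slides)
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

r = pearson_r(age, bmi)
r_squared = r ** 2  # coefficient of determination
print(f"r = {r:.3f}, r² = {r_squared:.3f}")  # r = 0.762, r² = 0.580
print(f"explained: {r_squared:.1%}; unexplained: {1 - r_squared:.1%}")  # explained: 58.0%; unexplained: 42.0%
```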
Assumptions of Pearson’s Correlation Coefficient
• The relationship between the two variables is linear, i.e. when the two variables are plotted on a scatter diagram, the points lie approximately along a straight line.
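Why the linearity assumption matters can be shown with a small, made-up example (the `pearson_r` helper is again a hypothetical plain-Python sketch). Here y = x² is completely determined by x, yet r is 0 when x is symmetric about zero, and still short of 1 when x is positive only.

```python
from math import sqrt

def pearson_r(x, y):
    # Definitional formula for Pearson's r (hypothetical helper; assumes neither variable is constant)
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

x_sym = [-2, -1, 0, 1, 2]
y_sym = [v * v for v in x_sym]      # perfect, but curved (non-linear), dependence
print(pearson_r(x_sym, y_sym))      # 0.0

x_pos = [1, 2, 3, 4, 5]
y_pos = [v * v for v in x_pos]      # same curve on positive x only
print(round(pearson_r(x_pos, y_pos), 3))  # 0.981
```

In both cases a scatter diagram would show a curve rather than a straight line, so r understates how strongly y depends on x.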
Advantages of Pearson’s Coefficient
• It summarizes both the degree and the direction of correlation in a single value.
THANK YOU