Lesson4 1515

This document discusses correlation and correlation coefficients. It defines correlation as the strength and direction of the relationship between two variables. It describes correlation coefficients as dimensionless numbers between -1 and 1 that indicate the strength and direction of correlation. Strong positive correlations near 1 indicate that the variables increase together, while strong negative correlations near -1 indicate that as one variable increases, the other decreases. The document contrasts Pearson's coefficient, which assumes normal distributions, with Spearman's coefficient, which is preferred when variables are not normally distributed. It introduces scatterplots and the coefficient of determination, which expresses the shared variance between variables.

Uploaded by

gm hash
Copyright © All Rights Reserved

Correlation

Dr. Penina Olga


USMF “Nicolae Testemițanu”,
Department of Social Medicine and Health Management
Correlation
• Correlation is used to establish and quantify the strength
and direction of the relationship between two variables
Correlation coefficient
Correlation coefficient: expresses the strength and direction of the
relationship between two variables.
Examples: salt intake and blood pressure; cigarette consumption and life
expectancy
Independent or explanatory variable (X): salt intake, cigarette consumption
Dependent or outcome variable (Y): blood pressure, life expectancy
The value of the correlation coefficient ranges from -1 to +1
It is a dimensionless number (it has no units)
The direction of correlation: the sign (+ or -) of the coefficient
The strength of correlation: the absolute size of the coefficient
Direction of correlation
Positive correlation (+): high values of one variable (salt intake) are associated with
high values of the other variable (blood pressure)
Negative correlation (-): high values of one variable (cigarette consumption) are
associated with low values of the other variable (life expectancy)
Strength of correlation
From 0 to 0.25 (±): no or weak correlation (none or little)
From 0.25 to 0.50 (±): moderate correlation (fair)
From 0.50 to 0.75 (±): strong correlation (good)
From 0.75 to 1 (±): very strong correlation (very good, excellent)

+1.00: a “perfect” (excellent) positive correlation
−1.00: a “perfect” (excellent) negative correlation
0.00: no linear relationship between the two variables
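The verbal scale above can be expressed as a small lookup. This is a minimal Python sketch; the helper name `describe_strength` is hypothetical. The slide does not say which band a boundary value (0.25, 0.50, 0.75) belongs to, so the sketch assumes it falls in the higher band.

```python
def describe_strength(r: float) -> tuple[str, str]:
    """Return (strength, direction) for a correlation coefficient r.

    Hypothetical helper following the bands on the slide. Assumption: a value
    exactly on a boundary (0.25, 0.50, 0.75) is placed in the higher band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient lies between -1 and +1")
    size = abs(r)  # strength depends on the size of r, not on its sign
    if size < 0.25:
        strength = "no or weak"
    elif size < 0.50:
        strength = "moderate"
    elif size < 0.75:
        strength = "strong"
    else:
        strength = "very strong"
    # direction comes only from the sign of r
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction


for value in (-0.8, 0.3, 0.0, 1.0):
    print(value, describe_strength(value))
```

Strength and direction are returned separately because the slide treats them as two independent readings of the same number: size and sign.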
Scatterplots: a visual display of the relationship between two numerical variables,
recommended for checking whether the relationship is linear and for spotting extreme values

[Figure: four scatterplots of the outcome variable (Y) against the explanatory variable (X): r = 0, no correlation (not possible to draw any straight line); r = +1.0, “perfect” (excellent) positive correlation; r = -0.8, very strong (very good) negative correlation; r = +0.3, moderate (fair) positive correlation]
Types of correlation coefficient
• Pearson’s correlation coefficient (r)

• Spearman’s rank correlation coefficient (rₛ or ρ)

Both techniques evaluate the strength of a “straight line” relationship: Pearson’s r
between the values themselves, Spearman’s ρ between their ranks (so Spearman’s ρ
also reaches ±1 for any relationship that consistently rises or consistently falls)
A strong nonlinear relationship

If there is a very strong nonlinear relationship between two variables,
the Pearson or Spearman correlation coefficients can underestimate the
true strength of the relationship.
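To see how a nonlinear relationship can be missed, here is a minimal Python sketch with made-up data. Y is completely determined by X (Y = X²), but the pattern is U-shaped rather than a straight line. The `pearson_r` helper is written out by hand for illustration and is not taken from the lesson.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r computed from deviations around the means."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: Y = X squared, a perfect but U-shaped relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
print(pearson_r(x, y))  # prints 0.0 although Y depends completely on X
```

The decreasing half and the increasing half cancel each other out, so no single straight line fits. This is why a scatterplot should be checked before a coefficient is reported.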
Pearson’s correlation coefficient (r)
• For two numerical variables (e.g., salt intake and blood pressure)

Assumption: the two variables, X and Y, vary together in a joint
distribution that is normal, called the bivariate normal distribution

If either of the two variables is not normally distributed (e.g., the
data are skewed), Pearson’s correlation coefficient is not the most
appropriate method; Spearman’s rho correlation is advised.
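Pearson’s r is the sum of the products of the deviations of X and Y from their means, divided by the square root of the product of their sums of squared deviations. The following is a minimal Python sketch of that formula. The function name and the salt/blood-pressure numbers are hypothetical and were made up only to exercise the code.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's correlation coefficient:

        r = sum((x_i - mean_x) * (y_i - mean_y))
            / sqrt(sum((x_i - mean_x)**2) * sum((y_i - mean_y)**2))
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length (n >= 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / sqrt(sxx * syy)


# Made-up illustrative values, not data from this lesson:
salt_g_per_day = [4.0, 6.0, 7.5, 9.0, 11.0, 12.5]
systolic_mmhg = [118, 121, 130, 128, 139, 142]
print(round(pearson_r(salt_g_per_day, systolic_mmhg), 2))
```

In practice, statistical packages compute this directly. For example, SciPy's `scipy.stats.pearsonr` also returns a p-value. The formula itself does not check the bivariate normality assumption, so that still has to be judged from the data.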
Spearman’s rank correlation coefficient (rₛ or ρ)
• For two ordinal variables
• For one ordinal and one numerical variable
• For two numerical variables if either of the two variables is
not normally distributed

The values of each variable are ranked from lowest to highest,
then the ranks are treated as if they were the actual values
themselves (see the formula further on)
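The ranking procedure can be sketched in Python as follows. Tied values receive the average of the ranks they would occupy; this is the usual convention, but it is an assumption here because the slide does not mention ties. The helper names are hypothetical. When there are no ties, the result equals the common shortcut ρ = 1 − 6Σd²/(n(n² − 1)), where d is the difference between the two ranks of each pair. The sketch includes that shortcut for comparison.

```python
from math import sqrt


def average_ranks(values):
    """Rank from lowest (1) to highest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


def spearman_no_ties(x, y):
    """Shortcut formula, valid only when neither variable has ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # hypothetical, nonlinear but always increasing
print(spearman_rho(x, y))  # prints 1.0: the ranks agree perfectly
print(pearson_r(x, y))     # slightly below 1, because the points are not on a line
```

Because only the order of the values matters, ρ is unaffected by skewed values. This is why it is advised when a numerical variable is not normally distributed.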
Coefficient of determination (r²)
• The coefficient of determination is found by squaring the value of r
(r² can be multiplied by 100% to express it as a percentage)

• The coefficient of determination expresses the proportion of
the variance (s²) in one variable that is accounted for, or
“explained,” by the variance in the other variable

Recall: the variance (s²) is a measure of variability:

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
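The variance formula can be checked numerically. This Python sketch computes s² directly from the formula and compares it with the standard library's `statistics.variance`, which also divides by n − 1. The data values are hypothetical.

```python
import statistics


def sample_variance(values):
    """s^2 = sum((X_i - mean)^2) / (n - 1), the sample variance."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(sample_variance(data))      # from the formula above
print(statistics.variance(data))  # standard library, also uses n - 1
```

Dividing by n − 1 instead of n gives the sample variance, which estimates the population variance from a sample.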
Coefficient of determination (r²)

[Figure: r = 0.00, r² = 0.00; r = 0.40, r² = 0.16 × 100% = 16%]

When two variables are correlated, there is a certain amount of shared variance
between them. The stronger the correlation, the greater the amount of shared variance.

Example: if a study finds a correlation (r) of 0.40 between salt intake and blood pressure,
it could be concluded that 0.40 × 0.40 = 0.16, or 16%, of the variance in blood pressure in
this study is accounted for by variance in salt intake
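The arithmetic in the example can be written as a short Python sketch. The helper name `coefficient_of_determination` is hypothetical.

```python
def coefficient_of_determination(r: float) -> float:
    """r squared: the proportion of variance shared by the two variables."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    return r ** 2


r = 0.40  # correlation between salt intake and blood pressure in the example
r2 = coefficient_of_determination(r)
print(f"r^2 = {r2:.2f}, i.e. {r2 * 100:.0f}% shared variance")
# prints: r^2 = 0.16, i.e. 16% shared variance
```

Squaring discards the sign, so r = -0.40 would also give 16%. The direction of the correlation has to be read from r itself.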
Important
• Correlation does not imply causation
Example: the correlation between salt intake and blood pressure does not
necessarily mean that the changes in salt intake caused the changes in blood
pressure.

• The fact that a correlation is present between two variables in a sample does
not necessarily mean that the correlation actually exists in the population.
It is necessary to test the null hypothesis that there is no correlation in the
population. A special t test is used for this (inferential statistics)
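The slide does not give the test statistic. A commonly used form of this t test is t = r√(n − 2) / √(1 − r²), with n − 2 degrees of freedom. The Python sketch below assumes that form. The helper name is hypothetical, and the sample size n = 27 was chosen only for illustration.

```python
from math import sqrt


def correlation_t_statistic(r: float, n: int) -> tuple[float, int]:
    """t statistic and degrees of freedom for H0: no correlation in the population.

    Assumes the usual form t = r * sqrt(n - 2) / sqrt(1 - r**2), df = n - 2.
    """
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r ** 2)
    return t, df


t, df = correlation_t_statistic(0.40, 27)  # hypothetical sample of 27 pairs
print(f"t = {t:.3f} with {df} degrees of freedom")
# prints: t = 2.182 with 25 degrees of freedom
```

The resulting t is compared with the critical value of the t distribution for n − 2 degrees of freedom (from a t table). SciPy's `scipy.stats.pearsonr` reports the corresponding two-sided p-value directly. With the same r, a larger sample gives a larger t, which is why small samples can show sizeable correlations that are not statistically significant.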
• Correlation coefficient (strength and direction)
• Interpret a scatterplot and coefficient of variation
• Under what conditions Pearson’s coefficient is used
• Under what conditions Spearman’s coefficient is used
• Coefficient of determination
