
Biostatistics Lect 7a - Correlation - 142021

The document discusses correlation and simple linear regression, focusing on the relationship between two quantitative variables and the significance of their linear association. It explains the correlation coefficient, methods for hypothesis testing, and the importance of scatterplots in assessing relationships. Additionally, it highlights common errors in interpreting correlation and the distinction between correlation and causation.

Uploaded by

sm

LEC 7
BIOSTATISTICS

CORRELATION
AND
SIMPLE LINEAR REGRESSION
Objectives
To examine the linear relationship between two quantitative variables using

CORRELATION
• Questions answered by correlation
• Scatterplots
• An example
• Correlation coefficient
• Other kinds of correlations
• Testing for significance

REGRESSION
• Questions answered by regression
• An example
• Coefficient of determination
• Testing for significance
• Predictions
CORRELATION
Correlation

• Finding the relationship between two quantitative variables without inferring causal relationships

• Correlation is a statistical technique that determines the degree to which two variables are related
• Gives a single value to describe the relationship

• Considers mostly linear associations (scatterplots approximate a straight-line pattern)
Simple Linear Correlation Coefficient (r)

• Pearson's correlation coefficient (r), also called the product-moment correlation coefficient

• Single number between –1 and +1

• It measures the nature and strength of the association between two quantitative variables
• The sign of r denotes the nature (direction) of the association

• The absolute value of r denotes the strength of the association


[Figure: Anscombe's scatterplots. Are these relationships linear?]
Requirements when making inferences about r

• The sample of paired data (x, y) is a random sample of quantitative data

• Bivariate data (pairs), matched or unmatched

• The scatterplot must show an approximate straight-line pattern

• Outliers should be removed from the data

• Formal requirement: the (x, y) data must have a bivariate normal distribution.
Correlation Coefficient r

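• ρ = linear correlation coefficient for the population

• r = linear correlation coefficient for the sample

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$

Formula components:
• $n$: number of pairs of data
• $\sum x$: sum of all x values
• $\sum x^2$: each x value is squared, then the squares are summed
• $\left(\sum x\right)^2$: add up all the x values, then square the total
• $\sum xy$: multiply each x value by its paired y value, then add up the products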
Calculating r
Scatter plot of the points (1,8), (3,5), (5,4), (1,2)
The formula
• The scatter plot does not really follow a straight line
• But let's apply the formula anyway:

• Or
Properties of the linear correlation coefficient r
• Values of r are always between -1 and +1

• The value of r does not change if all values of either variable are converted to a different scale (e.g. converting inches to cm)

• Exchanging the values of x and y does not change the value of r

• r measures the strength of a linear association

• DO NOT use r to measure nonlinear associations
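The scale and exchange properties above can be checked numerically. The sketch below is only an illustration: the function name `pearson_r` and the height/weight values are hypothetical and do not come from the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the computational (sums) formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    )

# Hypothetical paired data, invented for this illustration only.
heights_in = [60.0, 62.5, 65.0, 68.0, 71.5]
weights_kg = [52.0, 58.0, 57.0, 70.0, 74.0]

# Convert one variable to a different scale (inches to cm).
heights_cm = [h * 2.54 for h in heights_in]

r_inches = pearson_r(heights_in, weights_kg)
r_cm = pearson_r(heights_cm, weights_kg)        # rescaled x
r_swapped = pearson_r(weights_kg, heights_in)   # x and y exchanged

print(f"r (inches)  = {r_inches:.6f}")
print(f"r (cm)      = {r_cm:.6f}")
print(f"r (swapped) = {r_swapped:.6f}")
```

All three printed values should agree up to floating-point rounding, and each lies between -1 and +1.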
Are these relationships linear?
[Figure: Anscombe's four scatterplots; r = 0.816]
• Follows the assumption of normality
• Not distributed normally; non-linear
• Strong linear relationship; near perfect, except for one outlier which lowers the correlation
• One outlier is enough to produce a high correlation coefficient, even though the relationship between the two variables is not linear
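To see why r alone cannot answer "are these relationships linear?", the sketch below recomputes r for the four Anscombe data sets. The data values are not on the slides; they are the commonly published values of Anscombe's quartet (Anscombe, 1973), typed in here as an assumption, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the computational (sums) formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    )

# Anscombe's quartet as commonly published (assumed, not from the slides).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

quartet = {"I": (x123, y1), "II": (x123, y2), "III": (x123, y3), "IV": (x4, y4)}
results = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}

for name, r in results.items():
    print(f"Data set {name}: r = {r:.3f}")
```

The four values of r are nearly identical even though the scatterplots look completely different, which is why the scatterplot has to be inspected before r is interpreted.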
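The range and strength of association

r value            Strength               Direction
-1                 perfect correlation    indirect
-1 to -0.75        strong                 indirect
-0.75 to -0.25     intermediate           indirect
-0.25 to 0         weak                   indirect
0                  no relation
0 to 0.25          weak                   direct
0.25 to 0.75       intermediate           direct
0.75 to 1          strong                 direct
1                  perfect correlation    direct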
[Figure: two scatterplots, one labeled Direct Relationship and one labeled Indirect Relationship]
Scatterplot: Video Games and Alcohol Consumption (x-axis: Average Hours of Video Games Per Week; y-axis: Average Number of Alcoholic Drinks Per Week)
Scatterplot: Video Games and Test Score (x-axis: Average Hours of Video Games Per Week; y-axis: Exam Score)
The formula
• This is a very weak negative association
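As a check on this conclusion, the computational formula for r can be applied to the four points listed under "Calculating r". This is a minimal sketch that assumes the points (1,8), (3,5), (5,4), (1,2) were read correctly off the slide's scatter plot; the variable names are illustrative.

```python
import math

# Points as labeled on the slide's scatter plot (assumed to be the full data set).
points = [(1, 8), (3, 5), (5, 4), (1, 2)]

n = len(points)
sum_x = sum(x for x, _ in points)        # sum of x
sum_y = sum(y for _, y in points)        # sum of y
sum_x2 = sum(x * x for x, _ in points)   # sum of x squared
sum_y2 = sum(y * y for _, y in points)   # sum of y squared
sum_xy = sum(x * y for x, y in points)   # sum of x*y products

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator

print(f"n={n}, sum_x={sum_x}, sum_y={sum_y}, "
      f"sum_x2={sum_x2}, sum_y2={sum_y2}, sum_xy={sum_xy}")
print(f"r = {r:.3f}")
```

For these four points r comes out at about -0.174, which falls in the weak negative range of the scale.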


• Perfect positive correlation (direct relationship)
• Perfect negative correlation (inverse/indirect relationship)
• Strong positive correlation (direct relationship)
• Strong negative correlation (inverse/indirect relationship)
• No correlation
• Non-linear correlation
A Correlation Interpretation Guide
Source: Kenyon, COVID-19 Infection Fatality Rate Associated with Incidence. Biology 2020, 9(6), 128; https://doi.org/10.3390/biology9060128
Other Kinds of Correlation
• Spearman Rank-Order Correlation Coefficient (r_sp)
• used with 2 ranked/ordinal variables
• applies the same Pearson formula to the ranks

Attractiveness   Symmetry
3                2
4                6
1                1
2                3
5                4
6                5

r_sp = 0.77
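The value r_sp = 0.77 can be reproduced by applying the Pearson formula directly to the ranks in the table. The sketch below also shows the usual shortcut formula based on rank differences, which is not on the slide and is valid here because there are no tied ranks; function and variable names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the computational (sums) formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    )

# Ranks from the slide's table.
attractiveness = [3, 4, 1, 2, 5, 6]
symmetry = [2, 6, 1, 3, 4, 5]

# Spearman's coefficient: Pearson's formula applied to the ranks.
r_sp = pearson_r(attractiveness, symmetry)

# Shortcut for untied ranks: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
n = len(attractiveness)
sum_d2 = sum((a - s) ** 2 for a, s in zip(attractiveness, symmetry))
r_shortcut = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(f"r_sp (Pearson on ranks) = {r_sp:.2f}")
print(f"r_sp (shortcut formula) = {r_shortcut:.2f}")
```

Both routes give 0.77 for this table.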
TESTING THE STRENGTH OF THE ASSOCIATION
Hypothesis Testing for Correlation (Method 1)
• Determine if a significant linear correlation exists between two variables

• Hypothesis test:
H0: ρ = 0
H1: ρ ≠ 0
• (two-tailed test, although a one-tailed test is possible)

• Test statistic (Student t distribution): $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$

• Critical value using df = n - 2


Hypothesis testing: Correlation (Method 1 cont'd)
Conclusion: using critical values from Table A-3

• If |t| > critical value, reject H0
  Conclude: there is a significant linear correlation

• If |t| ≤ critical value, fail to reject H0
  Conclude: there is not sufficient evidence of a significant linear correlation.
Hypothesis testing: Correlation (Method 2)
Conclusion: using critical values from Table A-6
Comparison with r

• If |r| > critical value, reject H0
  Conclude: there is a significant linear correlation

• If |r| ≤ critical value, fail to reject H0
  Conclude: there is not sufficient evidence of a significant linear correlation.
Example: Heights of fathers and sons
• Text: Sect. 9.2, #8.
• Data about heights of fathers and sons:

• Construct a scatterplot, find the value of r, and use a significance level of α = 0.05 to determine whether there is a significant linear correlation between the two variables.
Scatterplot: Sons' heights vs. fathers' heights
• The scatterplot does not approximate a straight-line pattern.
• A hypothesis test is not indicated.
• We will perform one anyway to demonstrate the procedure.
• Calculate the linear correlation coefficient.
Example cont'd: Calculating r

• Low positive correlation… but is it a significant linear correlation?


Example cont’d: Hypothesis testing, Method 2

• From Table A-6, for n = 7 and α = 0.05, the critical value is 0.754.

• Since |r| = 0.342 < 0.754, we fail to reject H0 (no correlation).
[Figure: number line from -1 to 1 with critical values -0.754 and 0.754; r = 0.342 falls between them]
There is not sufficient evidence of a significant linear correlation between the heights of fathers and their sons. The data do not suggest that taller fathers tend to have taller sons.
Example cont’d: Hypothesis testing, Method 1
• The formal hypothesis test follows (α = 0.05, two-tailed, so 0.025 in each tail):
H0: ρ = 0
H1: ρ ≠ 0
• The test statistic is (Student t with n - 2 d.f.)

$t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = 0.815$

• The critical values from Table A-3: t0.025, 5 = ±2.571.

• The test statistic lies between -2.571 and 2.571, so we fail to reject the null hypothesis.
• There is not sufficient sample evidence to conclude there is a significant linear correlation between fathers' heights and their sons' heights. The data do not suggest that taller fathers tend to have taller sons.
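The two methods can be compared numerically. The sketch below uses only the values quoted in the example (r = 0.342, n = 7, critical t = 2.571); the function names are hypothetical. It also converts the critical t into a critical r using the standard identity r = t / sqrt(t² + df), which is how a table of critical r values relates to the t table.

```python
import math

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

def critical_r_from_t(t_crit, n):
    """Critical |r| that corresponds to a critical |t| with n - 2 d.f."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

r, n = 0.342, 7      # rounded r and sample size from the example
t_crit = 2.571       # t(0.025, 5) as quoted from Table A-3

t = t_statistic(r, n)
reject_h0 = abs(t) > t_crit
r_crit = critical_r_from_t(t_crit, n)

print(f"t = {t:.3f}, critical t = ±{t_crit}, reject H0: {reject_h0}")
print(f"critical r implied by the t table = {r_crit:.3f}")
```

With the rounded r = 0.342 the statistic is about 0.814; the slide's 0.815 presumably comes from an unrounded r. The implied critical r of about 0.754 matches the Table A-6 value, so Methods 1 and 2 necessarily reach the same conclusion.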
Correlation: Common errors

• Correlation does not imply causality (lurking variables)

• When data averages are used, correlation can be inflated

• Correlation is only applicable to linear relationships

• Low correlation values do not imply a lack of relationship between two variables.
• The relationship can be nonlinear
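The last point can be illustrated with a small constructed example (not from the lecture): y is completely determined by x through y = x², yet r is zero because the relationship is not linear. The helper name `pearson_r` is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the computational (sums) formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    )

# Constructed data: a perfect but nonlinear (quadratic) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```

Here r is 0 even though knowing x gives y exactly, so a low r only rules out a linear association.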
Comments on Graphs
• Always draw a graph
• The examples highlight that the correlation coefficient alone, as a summary statistic, is not sufficient for a final decision about the data

• If the coefficient of linear correlation between (x, y) is significant, y can be expressed in terms of x by a linear equation.
• This equation can be used to predict the values of y given values of x.
• This equation is called the regression equation.

• The value r² is the proportion of the variation in y that is explained by the linear association between x and y.
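As a short worked illustration using the r reported in the fathers-and-sons example (r = 0.342, a correlation that was not statistically significant, so this is arithmetic only):

```latex
r^2 = (0.342)^2 \approx 0.117
```

That is, about 11.7% of the variation in sons' heights in that sample would be accounted for by the linear association with fathers' heights, leaving roughly 88% unexplained.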
