Biostatistics Lect 7a - Correlation - 142021
BIOSTATISTICS
CORRELATION
AND
SIMPLE LINEAR REGRESSION
Objectives
To examine the linear relationship between two quantitative variables using:
CORRELATION
• Questions answered by correlation
• Scatterplots
• An example
• Correlation coefficient
• Other kinds of correlations
• Testing for significance
REGRESSION
• Questions answered by regression
• An example
• Coefficient of determination
• Testing for significance
• Predictions
CORRELATION
Correlation
Are these relationships linear?
Requirements when making inferences about r
Formula components:
• n: number of pairs of data
• $\sum x$: sum of all x values
• $\sum x^2$: each x value is squared, then the squares are summed
• $(\sum x)^2$: add up all the x values, then square the total
• $\sum xy$: multiply each x value by its paired y value, then add up the products
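These components combine in the usual computational formula for Pearson's linear correlation coefficient r, where $\sum y$ and $\sum y^2$ are formed from the y values exactly as $\sum x$ and $\sum x^2$ are formed from the x values:

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{n\sum x^{2} - \left(\sum x\right)^{2}}\;\sqrt{n\sum y^{2} - \left(\sum y\right)^{2}}}
```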
Calculating r
Scatter plot of the sample points: (1, 8), (3, 5), (5, 4), (1, 2)
The formula
• The scatter plot does not really follow a straight line
• But let’s apply the formula:
• Or
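As a minimal sketch, the computational formula can be coded directly and applied to the four points on the scatter plot, taken as printed. The function name `pearson_r` is illustrative, not from the slides. If all x values (or all y values) are equal, the denominator is zero, r is undefined, and Python raises `ZeroDivisionError`.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r via the computational formula built from n, Σx, Σy, Σx², Σy², Σxy."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)               # Σx²: square each x, then sum
    sum_y2 = sum(y * y for y in ys)               # Σy²: square each y, then sum
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # Σxy: multiply each pair, then sum
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# The four points shown on the scatter plot
points = [(1, 8), (3, 5), (5, 4), (1, 2)]
xs = [p[0] for p in points]
ys = [p[1] for p in points]
r = pearson_r(xs, ys)
print(f"r = {r:.3f}")  # r = -0.174
```

For these points Σx = 10, Σy = 19, Σx² = 36, Σy² = 109 and Σxy = 45, so r = -10 / √3300 ≈ -0.174: a weak negative correlation, in line with the scatter plot not following a straight line.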
Properties of the linear correlation coefficient r
• Values of r are always between -1 and +1
r = 0.816
• Strong linear relationship
• Near perfect, except for one outlier, which lowers the correlation
• One outlier is enough to produce a high correlation coefficient, even though the relationship between the two variables is not linear
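The effect of a single outlier can be reproduced with a small sketch on hypothetical data (not the data behind the figure): a shapeless cluster with r = 0 and a perfect line with r = 1, each recomputed after adding one extreme point. The helper `pearson_r` is illustrative and uses the computational formula.

```python
import math

def pearson_r(xs, ys):
    # Computational formula for Pearson's r
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Hypothetical data 1: a shapeless cluster (r = 0) plus one far-away point
cluster_x, cluster_y = [1, 2, 1, 2], [2, 1, 1, 2]
r_cluster = pearson_r(cluster_x, cluster_y)
r_with_outlier = pearson_r(cluster_x + [10], cluster_y + [10])

# Hypothetical data 2: a perfect line (r = 1) plus one point far off the line
line_x, line_y = [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]
r_line = pearson_r(line_x, line_y)
r_line_outlier = pearson_r(line_x + [6], line_y + [0])

print(f"cluster alone: r = {r_cluster:.3f}, with outlier: r = {r_with_outlier:.3f}")
print(f"line alone:    r = {r_line:.3f}, with outlier: r = {r_line_outlier:.3f}")
```

Adding (10, 10) to the cluster raises r from 0 to 289/294 ≈ 0.983, while adding (6, 0) to the line drops r from 1 to 1/7 ≈ 0.143.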
The range and strength of association
-1: perfect indirect correlation; 0: no relation; +1: perfect direct correlation
Figure panels: Direct Relationship; Indirect Relationship
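A quick check of the two ends and the middle of this range, using hypothetical data and the deviation-from-the-mean form of r (algebraically equivalent to the computational formula): an exact increasing line gives +1, an exact decreasing line gives -1, and a pattern with no linear trend gives 0, up to floating-point rounding. The names here are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r in deviation form: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

x = [1, 2, 3, 4, 5]
examples = {
    "perfect direct (y = 2x + 1)": [2 * v + 1 for v in x],
    "perfect indirect (y = 10 - 3x)": [10 - 3 * v for v in x],
    "no linear relation": [2, 4, 1, 4, 2],  # symmetric around x = 3, so no linear trend
}
results = {label: pearson_r(x, y) for label, y in examples.items()}
for label, r in results.items():
    print(f"{label}: r = {r:+.3f}")
```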
Figure: Scatterplot: Video Games and Alcohol Consumption (x: Average Hours of Video Games Per Week; y: Average Number of Alcoholic Drinks Per Week)
Figure: Scatterplot: Video Games and Test Score (x: Average Hours of Video Games Per Week; y: Exam Score)
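Strength of association: strong, intermediate, weak (indirect side, from -1 toward 0); weak, intermediate, strong (direct side, from 0 toward +1)
The formula
-1: perfect indirect correlation; 0: no relation; +1: perfect direct correlation
• No correlation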
Attractiveness rank   Symmetry rank
3                     2
4                     6
1                     1
2                     3
5                     4
6                     5
rsp = 0.77 (Spearman rank correlation)
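The value rsp = 0.77 can be reproduced with Spearman's rank correlation, r_s = 1 - 6Σd² / (n(n² - 1)), where d is the difference between the two ranks in each pair. The sketch assumes, as the table suggests, that both columns are ranks 1 to 6 with no ties (this shortcut formula is exact only without ties); the helper names are illustrative.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """r_s = 1 - 6 Σd² / (n(n² - 1)); returns (r_s, Σd²). Valid only without ties."""
    if len(set(xs)) != len(xs) or len(set(ys)) != len(ys):
        raise ValueError("this shortcut formula assumes no tied values")
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # squared rank differences
    return 1 - 6 * sum_d2 / (n * (n * n - 1)), sum_d2

attractiveness = [3, 4, 1, 2, 5, 6]
symmetry = [2, 6, 1, 3, 4, 5]
rs, sum_d2 = spearman_rs(attractiveness, symmetry)
print(f"sum of d^2 = {sum_d2}, r_s = {rs:.2f}")  # sum of d^2 = 8, r_s = 0.77
```

Here Σd² = 8 and n = 6, so r_s = 1 - 48/210 ≈ 0.77, matching the slide. With no ties, this equals Pearson's r computed on the ranks.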
TESTING THE STRENGTH OF THE ASSOCIATION
Hypothesis Testing for Correlation (Method 1)
• Determine if a significant linear correlation exists between two variables
• Hypothesis test:
$H_0: \rho = 0$ (no linear correlation)
$H_1: \rho \neq 0$ (linear correlation)
• (Two-tailed test, although a one-tailed test is possible)
No Correlation
• Since |r| = 0.342 < 0.754, r falls in the no-correlation region between the critical values -0.754 and 0.754:
-1 ... -0.754 ... r = 0.342 ... 0.754 ... 1
There is not sufficient evidence of a significant linear correlation between the heights of fathers and their sons. The data do not suggest that taller fathers tend to have taller sons.
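The critical value 0.754 is tied to the t distribution: solving t = r√(n - 2) / √(1 - r²) for r gives r_crit = t_crit / √(t_crit² + n - 2). This sketch assumes n = 7, which is what the 5 degrees of freedom behind the critical t value 2.571 used in this example imply, and it takes t_crit from the slides rather than computing it. The function name `critical_r` is illustrative.

```python
import math

def critical_r(t_crit, n):
    """Critical |r| implied by a two-tailed t critical value with n - 2 d.f."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

n = 7            # assumption: inferred from the 5 d.f. behind t_crit = 2.571
t_crit = 2.571   # two-tailed, alpha = 0.05, 5 d.f.
r = 0.342        # sample correlation from the example

r_crit = critical_r(t_crit, n)
significant = abs(r) > r_crit
print(f"critical r = {r_crit:.4f}, |r| = {abs(r):.3f}, significant: {significant}")
```

The critical r comes out at about 0.7545, matching the tabled 0.754 up to rounding of t_crit; since |r| = 0.342 is below it, the correlation is not significant.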
Example cont’d: Hypothesis testing, Method 1
• The formal hypothesis test follows (α = 0.05, two-tailed, i.e., 0.025 in each tail):
$H_0: \rho = 0$
$H_1: \rho \neq 0$
• The test statistic (Student t with n - 2 d.f.) is
$t = \dfrac{r}{\sqrt{(1 - r^2)/(n - 2)}} = 0.815$
• The test statistic lies between the critical values -2.571 and 2.571, so we fail to reject the null hypothesis.
• There is not sufficient sample evidence to conclude that there is a significant linear correlation between fathers' heights and their sons' heights. The data do not suggest that taller fathers tend to have taller sons.
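The t test can be sketched the same way, with n = 7 again inferred from the 5 d.f. behind ±2.571 and r = 0.342 as reported; the function name `t_statistic` is illustrative. With r rounded to three decimals the statistic comes out at 0.814; the slide's 0.815 likely reflects an unrounded r.

```python
import math

def t_statistic(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)); Student t with n - 2 d.f. under H0: rho = 0."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

r, n = 0.342, 7   # assumption: n = 7 inferred from the 5 d.f. behind 2.571
t_crit = 2.571    # two-tailed critical value, alpha = 0.05, 5 d.f.

t = t_statistic(r, n)
reject_h0 = abs(t) > t_crit
print(f"t = {t:.3f}, reject H0: {reject_h0}")  # t = 0.814, reject H0: False
```

The decision matches the one above: |t| is well inside ±2.571, so H0 is not rejected.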
Correlation: Common errors