Week 1
By Pels
LECTURE SLIDES
Introduction
Correlation analysis measures the relationship between two quantitative
variables; on its own it does not allow causal relationships to be inferred.
Correlation is a statistical method used to determine whether a linear
relationship exists between two variables.
Note
Correlation does not imply causality.
Correlation does not distinguish between independent and dependent
variables.
The phrase "correlation does not imply causation" refers to the
inability to legitimately deduce a cause-and-effect relationship
between two events or variables solely on the basis of an observed
association or correlation between them.
Pels (KNUST) Regression Analysis: STAT 367 January 24, 2024 7 / 31
What’s the difference between correlation and causation?
While causation and correlation can exist at the same time,
correlation does not imply causation. Causation explicitly applies to
cases where action A causes outcome B. On the other hand,
correlation is simply a relationship: action A is related to outcome B, but
one does not necessarily cause the other to happen.
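\begin{align}
r &= \frac{\dfrac{\sum (x - \bar{x})(y - \bar{y})}{n-1}}{\sqrt{\dfrac{\sum (x - \bar{x})^2}{n-1}}\,\sqrt{\dfrac{\sum (y - \bar{y})^2}{n-1}}} \tag{2} \\
  &= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} \tag{3} \\
  &= \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\sum X^2 - n\bar{X}^2}\,\sqrt{\sum Y^2 - n\bar{Y}^2}} \tag{4} \\
  &= \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \tag{5}
\end{align}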
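As a rough check that forms (3) and (4) give the same value, the sketch below computes r both ways in plain Python. The function names `r_deviation_form` and `r_computational_form` are illustrative and do not come from the slides; the data are the age and weight values from Example 2 of these slides.

```python
from math import sqrt

def r_deviation_form(x, y):
    """Form (3): sums of deviations from the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

def r_computational_form(x, y):
    """Form (4): raw sums of products and squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - n * x_bar * y_bar
    denominator = sqrt(sum_x2 - n * x_bar ** 2) * sqrt(sum_y2 - n * y_bar ** 2)
    return numerator / denominator

# Age (years) and weight (kg) from Example 2 of these slides
age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]

print(round(r_deviation_form(age, weight), 4))
print(round(r_computational_form(age, weight), 4))
```

The two forms differ only in how the sums are accumulated, so any difference between the printed values would come from floating-point rounding alone.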
Figure: Data
Figure: Plot
CORRELATION
Figure: Scatter plots of data with various correlation coefficients
Interpretation of Correlation Coefficient
Strength
The greater the absolute value of the correlation coefficient, the stronger
the relationship.
The extreme values of -1 and 1 indicate a perfectly linear relationship
where a change in one variable is accompanied by a perfectly consistent
change in the other.
For these relationships, all of the data points fall on a line. In practice,
real data will rarely show either type of perfect relationship.
A coefficient of zero represents no linear relationship. As one variable
increases, there is no tendency in the other variable to either increase or
decrease.
When the value lies strictly between 0 and ±1, there is a relationship, but
the points don’t all fall on a line. As r approaches -1 or 1, the strength of
the relationship increases and the data points tend to fall closer to a line.
Interpretation of Correlation Coefficient
Direction
The sign of the correlation coefficient represents the direction of the
relationship.
1 Positive coefficients indicate that when the value of one variable
increases, the value of the other variable also tends to increase.
Positive relationships produce an upward slope on a scatterplot.
2 Negative coefficients indicate that when the value of one variable
increases, the value of the other variable tends to decrease. Negative
relationships produce a downward slope on a scatterplot.
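The link between the sign of r and the slope of the scatterplot can be illustrated with a small sketch. The values of `x`, `y_up` and `y_down` below are made up purely for illustration (they are not from the slides), and `pearson_r` is a hypothetical helper name. Because both y series lie exactly on a straight line, r should come out at approximately +1 and -1.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient from sums of deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Illustrative, made-up values (not from the slides)
x = [1, 2, 3, 4, 5]
y_up = [2 * v + 1 for v in x]     # y rises as x rises: upward slope
y_down = [-2 * v + 1 for v in x]  # y falls as x rises: downward slope

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)

print("upward line:  ", r_up)
print("downward line:", r_down)
```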
Example
Weight (kg) 67 69 85 83 74 81 97 92 114 85
SBP (mmHg) 120 125 140 160 130 180 150 140 200 130
Solution
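\begin{equation}
r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\sum X^2 - n\bar{X}^2}\,\sqrt{\sum Y^2 - n\bar{Y}^2}} \tag{6}
\end{equation}
With n = 10:
\[ \sum XY = 127325, \quad \bar{X} = 84.7, \quad \bar{Y} = 147.5 \]
\[ \sum X^2 = 73495, \quad \sum Y^2 = 223525, \quad \bar{X}^2 = 7174.09, \quad \bar{Y}^2 = 21756.25 \]
\[ r = \frac{127325 - 10(84.7)(147.5)}{\sqrt{73495 - 10(7174.09)}\,\sqrt{223525 - 10(21756.25)}} = \frac{2392.5}{\sqrt{1754.1}\,\sqrt{5962.5}} = 0.7398 \]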
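The intermediate sums in this solution can be reproduced with a short Python sketch. It follows the computational formula step by step on the weight and SBP data from the example; the variable names are illustrative choices, not notation from the slides.

```python
from math import sqrt

# Weight (kg) and systolic blood pressure from the example
weight = [67, 69, 85, 83, 74, 81, 97, 92, 114, 85]
sbp = [120, 125, 140, 160, 130, 180, 150, 140, 200, 130]

n = len(weight)
x_bar = sum(weight) / n
y_bar = sum(sbp) / n

# Raw sums used in the computational formula
sum_xy = sum(x * y for x, y in zip(weight, sbp))
sum_x2 = sum(x * x for x in weight)
sum_y2 = sum(y * y for y in sbp)

numerator = sum_xy - n * x_bar * y_bar
denominator = sqrt(sum_x2 - n * x_bar ** 2) * sqrt(sum_y2 - n * y_bar ** 2)
r = numerator / denominator

print("sum XY =", sum_xy, " sum X^2 =", sum_x2, " sum Y^2 =", sum_y2)
print("X bar =", x_bar, " Y bar =", y_bar)
print("r =", round(r, 4))
```

Rounded to four decimal places, the result agrees with the r = 0.7398 reported in the solution.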
cont’d
Nature of the scatter plot
What is the nature of the Scatter Plot?
Positive relationship
Negative relationship
No relationship
Example 2
A sample of children was selected, data about their age in years and
weight in kilograms was recorded as shown below. It is required to find the
correlation between age and weight.
No. Age(years) Weight(kg)
1 7 12
2 6 8
3 8 12
4 5 10
5 6 11
6 9 13
Solution
Correlation Matrix
Age(years) Weight(kg)
Age(years) 1 0.75955452531275
Weight(kg) 0.75955452531275 1
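A correlation matrix of this kind can be built by computing r for every pair of variables; the diagonal is 1 because each variable is perfectly correlated with itself. The sketch below does this in plain Python for the Example 2 data. `pearson_r` and `correlation_matrix` are hypothetical helper names, and this is not necessarily how the slide's matrix was produced.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient from sums of deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

def correlation_matrix(columns):
    """Return a nested dict holding r for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}

# Example 2 data from the slides
data = {
    "Age(years)": [7, 6, 8, 5, 6, 9],
    "Weight(kg)": [12, 8, 12, 10, 11, 13],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(value, 4) for value in row.values()])
```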
x 12 15 18 21 27
y 2 4 6 8 12
r = 0.84
Note:
Like the variance of a proportion, the variance of the correlation
coefficient depends on the correlation coefficient itself, so the estimated
r is substituted when computing it.
Olivia is studying for a test, and she wonders if her friend, Laney, is
also studying for the test. She calls Laney and asks her how long she
has been studying. Laney has been studying for her test all week,
approximately 8 hours total. Olivia has only been studying for her test
for a couple of hours. The next week, Olivia and Laney get their test
scores back. Laney got an A on her test, and Olivia got a C. Olivia
wonders if there is a correlation between the number of hours spent
studying and the grade a student earns. Take a look at the data Olivia
collected from her classmates, and see if you can find a correlation.
x 8 2 6 4 2
y 98 74 87 82 72
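One way to explore Olivia's question is to compute r from her data using the S_xy / sqrt(S_xx S_yy) form. The sketch below assumes that x is hours studied and y is the test score, which the story suggests but the table does not state outright; the variable names are illustrative.

```python
from math import sqrt

# Assumed meaning: x = hours studied, y = test score
hours = [8, 2, 6, 4, 2]
score = [98, 74, 87, 82, 72]

n = len(hours)
sum_x = sum(hours)
sum_y = sum(score)

# Corrected sums of squares and cross-products
s_xy = sum(x * y for x, y in zip(hours, score)) - sum_x * sum_y / n
s_xx = sum(x * x for x in hours) - sum_x ** 2 / n
s_yy = sum(y * y for y in score) - sum_y ** 2 / n

r = s_xy / sqrt(s_xx * s_yy)
print("r =", round(r, 3))
```

For these five students r comes out close to 1, a strong positive linear association between hours studied and score. As noted earlier in the slides, this alone does not establish that studying longer causes the higher grade.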
x 11.1 10.3 12.0 15.1 13.7 18.5 17.3 14.2 14.8 15.2
y 10.9 14.2 13.8 21.5 13.2 21.1 16.4 19.3 17.4 19.0