Association Between Two Variables: - No Association - Linear Association
Association Between Two Variables: - No Association - Linear Association
Variable Variable
Correlation
and Continuous Continuous
This Week Regression
• No association
• Linear association
– Positive association
– Negative association
• Curvilinear association
1 1
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
1 1
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
Strength of Linear Association
1 1
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
Quantifying the Strength
of Linear Correlation
• What does a positive Student SAT GPA
linear correlation # (X) (Y)
mean?
1 450 2.7
– Large numbers on one
variable go with large 2 520 3.1
numbers on the other
variable. 3 600 3.5
• How to decide what 4 470 2.6
are large and small 5 460 3.1
numbers? µ 500 3.0
– Relative to the means.
σ 55.5 0.32
3.6
3.4
3.2
3
350 400 450 500 550 600 650
2.8
2.6
2.4
Student SAT GPA X – µ X Y – µ Y (X – µ X)(Y – µ Y)
# (X) (Y)
1 450 2.7 -50 -0.3 15
σ 55.5 0.32
Co var iance σ XY
r= =
σ X ⋅σ Y σ X ⋅σ Y
• r = 15 / (55.5 x 0.32) = 0.84
Alternative Approach
• Standardize X and Y first (z-scores), then
calculate the covariance between the z-
scores.
r=
∑z X ⋅ zY
N
Significance Testing
• The following has a t distribution:
r N −2
t=
1− r2
df = N – 2
r = .84, t = 2.68, df = 3, p = .075
Not significant at .05 level. Small sample size.
When There’s a Significant
Correlation
• Correlation and Causation
• X causes Y
• Y causes X
• Z causes both X and Y
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
An Example
70
60 r = .86
50
40
Pushup
30
20
10
0
0 5 10 15 20 25
Spinach
Coefficientsa
Unstandardized Standardized
Coefficients Coefficients
Model B Std. Error Beta t Sig.
1 (Constant) 19.443 3.494 5.565 .000
spinach 1.550 .220 .856 7.031 .000
a. Dependent Variable: pushup
Yˆ =19.48+1.55X
)
zY = (.856) z X
Understanding R2:
Proportion of Variance Explained, or
Proportion Reduction in Error
70
60
50
40
Pushup
30
20
10
0
0 5 10 15 20 25
Spinach
70
60
50
40
Pushup
30
70
60
50
40
Pushup
30
60
50
40
Pushup
30
20
10
0
0 5 10 15 20 25
Spinach
70
60
50
40
Pushup
30
0
0 5 10 15 20 25
Spinach
70
60
50
40
Pushup
30
Green
20
)
R2 =
∑ (Y − Y ) 2
10
∑ (Y − Y ) 2
Green and
Red
0
0 5 10 15 20 25
Spinach
# of push-ups
Spinach consumption
Association Between Two
Categorical Variables
• Angelina Jolie or Jennifer Aniston?
Right
Total
Expected Frequency
• Expected assuming the null hypothesis is
true, i.e., no association between the two
variables.
C⋅R
Expected =
N
• C: Column total, R: Row total,
N: Grand total
Chi-Square
(Observed − Expected ) 2
χ =∑
2
Expected
• Degree of Freedom
df = (# of Columns – 1)(# of Rows – 1)
• What is the df for a 2 x 2 table?
• The shape of Chi-Square distribution
depends on the degree of freedom
Chi-Square Distribution
Critical Region
Chi-Square
• The chi-square statistic is always positive.
Why?
• When df = 1, chi-square distribution is the
distribution of z2.
• Without looking up in a reference, what is
the alpha = .05 cutoff value for the chi-
square distribution (df = 1)?
– (1.96)2 = 3.84
Back to Angelina and Jennifer
• In SPSS.
Chi-Square Test
for Goodness of Fit
To test whether a distribution is
the same as a predetermined or
theoretical distribution.
Next Week
• Integrating t-test, correlation, regression,
and chi-square test for independence
• They are all special cases of the general
linear model
• Effect size and power for the above tests