Lecture 5 Correlation
Three segments
- Overview
- Calculation of r
- Assumptions
Lecture 5 ~ Segment 1
Correlation: Overview
- Important concepts & topics
  - What is a correlation?
  - What are they used for?
  - Scatterplots
  - CAUTION!
  - Types of correlations
Correlation: Overview
- Correlation
  - A statistical procedure used to measure and describe the relationship between two variables
  - Correlations can range between +1 and -1
    - +1 is a perfect positive correlation
    - 0 is no correlation (no linear relationship; independent variables have r = 0, but r = 0 alone does not establish independence)
    - -1 is a perfect negative correlation
Correlation: Overview
- When two variables, let's call them X and Y, are correlated, one variable can be used to predict the other
  - More precisely, a person's score on X can be used to predict his or her score on Y
Correlation: Overview
- Example:
  - Working memory capacity is strongly correlated with intelligence, or IQ, in healthy young adults
  - So if we know a person's IQ, we can predict how they will do on a test of working memory
Correlation: Overview
- CAUTION!
  - Correlation does not imply causation
Correlation: Overview
- CAUTION!
  - The magnitude of a correlation depends upon many factors, including:
    - Sampling (random and representative?)
Correlation: Overview
- CAUTION!
  - The magnitude of a correlation is also influenced by:
    - Measurement of X & Y (see Lecture 6)
    - Several other assumptions (see Segment 3)
Correlation: Overview
- For now, consider just one assumption:
  - Random and representative sampling
  - There is a strong correlation between IQ and working memory among all healthy young adults.
  - What is the correlation between IQ and working memory among college graduates?
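One way to explore the question above is a small simulation. The sketch below assumes an invented population in which IQ and working memory correlate at about 0.7, then recomputes r on only the high-IQ cases as a stand-in for a selective sample such as college graduates. The cutoff of 115, the population size, and the seed are illustrative assumptions, not values from the lecture.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson r from the sum of products and the sums of squares."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


rng = random.Random(5)  # fixed seed so the run is repeatable

# Hypothetical population: IQ ~ N(100, 15); working memory (in z units)
# is built to correlate about 0.7 with IQ.
iq = [rng.gauss(100, 15) for _ in range(5000)]
wm = [0.7 * (x - 100) / 15 + math.sqrt(1 - 0.7 ** 2) * rng.gauss(0, 1)
      for x in iq]

r_all = pearson_r(iq, wm)

# Non-representative sample: keep only cases above an invented IQ cutoff.
high = [(x, y) for x, y in zip(iq, wm) if x > 115]
r_high = pearson_r([x for x, _ in high], [y for _, y in high])

print(f"all cases: r = {r_all:.2f}")
print(f"IQ > 115:  r = {r_high:.2f}")
```

Under these assumptions the restricted sample should show a noticeably smaller r, because cutting off part of the IQ range removes much of the variation that X and Y share.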
Correlation: Overview
- CAUTION!
  - Finally & perhaps most important:
    - The correlation coefficient is a sample statistic, just like the mean
    - It may not be representative of ALL individuals
    - For example, in school I scored very high on Math and Science but below average on Language and History
Correlation: Overview
- Note: there are several types of correlation coefficients, for different variable types
  - Pearson product-moment correlation coefficient (r)
    - When both variables, X & Y, are continuous
Correlation: Overview
- Note: there are several types of correlation coefficients
  - Phi coefficient
    - When both variables are dichotomous
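As a rough illustration of the phi coefficient, the sketch below computes it from the four cell counts of a 2 x 2 table and compares it with Pearson's r computed on the same data coded as 0 and 1. The ten cases are invented, and the function names are hypothetical. The two values agree because phi is Pearson's r applied to two dichotomous variables.

```python
import math


def phi_coefficient(xs, ys):
    """Phi from the cell counts of a 2 x 2 table of 0/1 codes."""
    a = sum(1 for x, y in zip(xs, ys) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(xs, ys) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(xs, ys) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(xs, ys) if x == 0 and y == 0)
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def pearson_r(xs, ys):
    """Pearson r from the sum of products and the sums of squares."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


# Invented dichotomous data (1 = yes, 0 = no) for ten cases.
x = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

phi = phi_coefficient(x, y)
r = pearson_r(x, y)
print(f"phi = {phi:.2f}, Pearson r on the 0/1 codes = {r:.2f}")
```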
Segment summary
- Important concepts/topics
  - What is a correlation?
  - What are they used for?
  - Scatterplots
  - CAUTION!
  - Types of correlations
END SEGMENT
Lecture 5 ~ Segment 2
Calculation of r
Calculation of r
- Important topics
  - r
    - Pearson product-moment correlation coefficient
      - Raw score formula
      - Z-score formula
Calculation of r
- r = the degree to which X and Y vary together, relative to the degree to which X and Y vary independently
- r = (Covariance of X & Y) / (SD of X × SD of Y), that is, the covariance divided by the square root of the product of the two variances
Calculation of r
- Two ways to calculate r
  - Raw score formula
  - Z-score formula
Calculation of r
- Let's quickly review calculations from Lecture 4 on summary statistics
  - Variance = $SD^2$ = MS = SS/N
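The review formula above can be sketched in a few lines. The scores below are invented for illustration and are not the lecture's data set. The variance divides SS by N, as on the slide, not by N - 1.

```python
import math

# Invented scores for illustration only (not the lecture's data).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = sum(scores) / n                      # M
ss = sum((x - mean) ** 2 for x in scores)   # SS: sum of squared deviations
variance = ss / n                           # SD^2 = MS = SS / N
sd = math.sqrt(variance)                    # SD

print(f"M = {mean}, SS = {ss}, variance = {variance}, SD = {sd}")
# prints: M = 5.0, SS = 32.0, variance = 4.0, SD = 2.0
```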
Linsanity!
Results
- M = Mean = 22.7
- $SD^2$ = Variance = MS = SS/N = 92.21
- SD = Standard Deviation = 9.6
- $(X - M_x)$
- $(X - M_x)^2$
Calculation of r
Raw score formula:
$r = \dfrac{SP_{xy}}{\sqrt{SS_x \times SS_y}}$
Calculation of r
$SP_{xy} = \sum [(X - M_x) \times (Y - M_y)]$
$SS_x = \sum (X - M_x)^2 = \sum [(X - M_x) \times (X - M_x)]$
$SS_y = \sum (Y - M_y)^2 = \sum [(Y - M_y) \times (Y - M_y)]$
Formulae to calculate r
$r = \dfrac{SP_{xy}}{\sqrt{SS_x \times SS_y}}$
$r = \dfrac{\sum [(X - M_x) \times (Y - M_y)]}{\sqrt{\sum (X - M_x)^2 \times \sum (Y - M_y)^2}}$
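A minimal sketch of the raw score formula follows, using five invented (X, Y) pairs. The function name `pearson_r_raw` is hypothetical. Each variable in the code maps to one term of the formula: `sp` is SPxy, `ssx` is SSx, and `ssy` is SSy.

```python
import math


def pearson_r_raw(xs, ys):
    """Raw score formula: r = SPxy / sqrt(SSx * SSy)."""
    n = len(xs)
    mx = sum(xs) / n                                        # Mx
    my = sum(ys) / n                                        # My
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # SPxy
    ssx = sum((x - mx) ** 2 for x in xs)                    # SSx
    ssy = sum((y - my) ** 2 for y in ys)                    # SSy
    return sp / math.sqrt(ssx * ssy)


# Invented data: Mx = 3, My = 4, SPxy = 6, SSx = 10, SSy = 6.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r_raw(x, y)
print(round(r, 4))  # prints 0.7746, which is 6 / sqrt(60)
```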
Formulae to calculate r
Z-score formula:
$r = \dfrac{\sum (Z_x \times Z_y)}{N}$
Formulae to calculate r
$Z_x = (X - M_x) / SD_x$
$Z_y = (Y - M_y) / SD_y$
$SD_x = \sqrt{\sum (X - M_x)^2 / N}$
$SD_y = \sqrt{\sum (Y - M_y)^2 / N}$
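The Z-score formula can be sketched the same way. The example below converts each variable to z-scores using the SD that divides by N, averages the products, and checks the result against the raw score formula on the same invented data. The function names are hypothetical.

```python
import math


def z_scores(values):
    """Z = (X - M) / SD, with SD = sqrt(SS / N)."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]


def pearson_r_z(xs, ys):
    """Z-score formula: r = sum(Zx * Zy) / N."""
    zx = z_scores(xs)
    zy = z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


def pearson_r_raw(xs, ys):
    """Raw score formula: r = SPxy / sqrt(SSx * SSy)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


# Invented data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_z = pearson_r_z(x, y)
r_raw = pearson_r_raw(x, y)
print(round(r_z, 4), round(r_raw, 4))
```

The two results should match up to floating-point rounding, which is what the algebraic proof of equivalence predicts.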
Formulae to calculate r
Proof of equivalence:
$Z_x = \dfrac{X - M_x}{\sqrt{\sum (X - M_x)^2 / N}}$
$Z_y = \dfrac{Y - M_y}{\sqrt{\sum (Y - M_y)^2 / N}}$
Formulae to calculate r
$r = \dfrac{1}{N} \sum \left\{ \dfrac{X - M_x}{\sqrt{\sum (X - M_x)^2 / N}} \times \dfrac{Y - M_y}{\sqrt{\sum (Y - M_y)^2 / N}} \right\}$
Formulae to calculate r
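$r = \dfrac{1}{N} \sum \left\{ \dfrac{X - M_x}{\sqrt{\sum (X - M_x)^2 / N}} \times \dfrac{Y - M_y}{\sqrt{\sum (Y - M_y)^2 / N}} \right\}$
The two square roots are the same for every case, so they factor out of the outer sum, and the $N$ they contribute cancels the outer $1/N$:
$r = \dfrac{\sum [(X - M_x) \times (Y - M_y)]}{N \times \sqrt{\sum (X - M_x)^2 / N} \times \sqrt{\sum (Y - M_y)^2 / N}}$
$r = \dfrac{\sum [(X - M_x) \times (Y - M_y)]}{\sqrt{\sum (X - M_x)^2 \times \sum (Y - M_y)^2}}$
$r = \dfrac{SP_{xy}}{\sqrt{SS_x \times SS_y}}$
- The raw score formula!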
Segment summary
- Important topics
  - r
    - Pearson product-moment correlation coefficient
      - Raw score formula
      - Z-score formula
END SEGMENT
Lecture 5 ~ Segment 3
Assumptions
Assumptions
- Assumptions when interpreting r
  - Normal distributions for X and Y
  - Linear relationship between X and Y
  - Homoscedasticity
Assumptions
- Assumptions when interpreting r
  - Reliability of X and Y
  - Validity of X and Y
  - Random and representative sampling
Assumptions
- Assumptions when interpreting r
  - Normal distributions for X and Y
    - How to detect violations?
      - Plot histograms and examine summary statistics
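As one possible way to carry out this check without a plotting library, the sketch below prints a crude text histogram and a skewness statistic (the mean cubed z-score, which is near 0 for a symmetric distribution). Both data sets are invented, and the helper names `skewness` and `text_histogram` are hypothetical. In practice a plotted histogram would serve the same purpose.

```python
import math
from collections import Counter


def skewness(values):
    """Mean cubed z-score; roughly 0 when the distribution is symmetric."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return sum(((v - m) / sd) ** 3 for v in values) / n


def text_histogram(values, bin_width):
    """Return one text line per occupied bin, with a # for each case."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return [f"{start:>4} | {'#' * bins[start]}" for start in sorted(bins)]


# Invented data: one symmetric set and one with a long right tail.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
skewed = [1, 1, 1, 1, 2, 2, 3, 5, 9]

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    print(f"{name}: skewness = {skewness(data):.2f}")
    print("\n".join(text_histogram(data, 1)))
```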
Assumptions
- Assumptions when interpreting r
  - Linear relationship between X and Y
    - How to detect violations?
      - Examine scatterplots (see following examples)
Assumptions
- Assumptions when interpreting r
  - Homoscedasticity
    - How to detect violations?
      - Examine scatterplots (see following examples)
Homoscedasticity
- In a scatterplot, the vertical distance between a dot and the regression line reflects the amount of prediction error (known as the residual)
Homoscedasticity
- Homoscedasticity means that the distances (the residuals) are not related to the variable plotted on the X axis (they are not a function of X)
- This is best illustrated with scatterplots
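A numeric companion to the scatterplot check is sketched below. It fits a least-squares line, then compares the average size of the residuals in the upper half of X with the average in the lower half. The two data sets are invented, and the ratio is an informal diagnostic chosen for illustration, not a formal test. A ratio near 1 is consistent with homoscedasticity, while a ratio well above 1 suggests the residuals fan out as X increases.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting Y from X."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def residual_spread_ratio(xs, ys):
    """Mean |residual| in the upper half of X over that in the lower half."""
    slope, intercept = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))  # order the cases by X
    resid = [abs(y - (intercept + slope * x)) for x, y in pairs]
    half = len(resid) // 2
    lower = sum(resid[:half]) / half
    upper = sum(resid[half:]) / (len(resid) - half)
    return upper / lower


# Invented data: the same trend with two different error patterns.
xs = list(range(1, 21))
signs = [1 if x % 2 == 0 else -1 for x in xs]
even_spread = [2 * x + s for x, s in zip(xs, signs)]        # constant error size
fanning = [2 * x + s * 0.2 * x for x, s in zip(xs, signs)]  # error grows with X

print(f"even spread: ratio = {residual_spread_ratio(xs, even_spread):.2f}")
print(f"fanning out: ratio = {residual_spread_ratio(xs, fanning):.2f}")
```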
Anscombe's quartet
- In 1973, statistician Dr. Frank Anscombe developed a classic example to illustrate several of the assumptions underlying correlation and regression
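Because the scatterplots did not survive in this text version, the sketch below recomputes r for the four data sets as they are commonly reproduced from Anscombe's 1973 paper. The values are typed in from that commonly cited table and are worth checking against the original before reuse. The helper name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the sum of products and the sums of squares."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


# Sets I to III share the same X values; set IV has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in results.items():
    print(f"Set {name}: r = {r:.3f}")
```

All four sets give nearly the same r, even though their scatterplots look very different, which is why the plots need to be examined alongside the coefficient.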
Segment summary
- Assumptions when interpreting r
  - Normal distributions for X and Y
  - Linear relationship between X and Y
  - Homoscedasticity
Segment summary
- Assumptions when interpreting r
  - Reliability of X and Y
  - Validity of X and Y
  - Random and representative sampling
END SEGMENT
END LECTURE 5