
Lecture 5 Correlation

Uploaded by

Juan
Copyright
© © All Rights Reserved

Statistics One

Lecture 5 Correlation

Three segments
• Overview
• Calculation of r
• Assumptions

Lecture 5 ~ Segment 1
Correlation: Overview

Correlation: Overview
• Important concepts & topics
  • What is a correlation?
  • What are they used for?
  • Scatterplots
  • CAUTION!
  • Types of correlations

Correlation: Overview
• Correlation
  • A statistical procedure used to measure and describe the relationship between two variables
  • Correlations can range between +1 and -1
    • +1 is a perfect positive correlation
    • 0 is no linear correlation (independent variables have r = 0, but r = 0 alone does not establish independence)
    • -1 is a perfect negative correlation

Correlation: Overview
• When two variables, let's call them X and Y, are correlated, then one variable can be used to predict the other variable
  • More precisely, a person's score on X can be used to predict his or her score on Y

Correlation: Overview
• Example:
  • Working memory capacity is strongly correlated with intelligence, or IQ, in healthy young adults
  • So if we know a person's IQ then we can predict how they will do on a test of working memory

Correlation: Overview

Correlation: Overview
• CAUTION!
  • Correlation does not imply causation

Correlation: Overview
• CAUTION!
  • The magnitude of a correlation depends upon many factors, including:
    • Sampling (random and representative?)

Correlation: Overview
• CAUTION!
  • The magnitude of a correlation is also influenced by:
    • Measurement of X & Y (See Lecture 6)
    • Several other assumptions (See Segment 3)

Correlation: Overview
• For now, consider just one assumption:
  • Random and representative sampling
  • There is a strong correlation between IQ and working memory among all healthy young adults.
    • What is the correlation between IQ and working memory among college graduates?
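
One common reading of the college-graduate question is restriction of range: if the subgroup covers only the upper part of the IQ distribution, the correlation inside that subgroup is typically weaker than in the full population. The simulation below is only a sketch of that idea. The population correlation of 0.8, the top-25% cutoff standing in for "college graduates", and the helper name pearson_r are all illustrative assumptions, not values from the lecture.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson r from deviation scores (hypothetical helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)

rng = random.Random(0)   # fixed seed so the run is repeatable
rho = 0.8                # assumed population correlation

iq, wm = [], []
for _ in range(2000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    iq.append(100 + 15 * z1)                             # simulated IQ
    wm.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)   # simulated working memory

r_all = pearson_r(iq, wm)

# Keep only the top 25% of IQ scores (the assumed "graduate" subgroup).
cutoff = sorted(iq)[int(0.75 * len(iq))]
top = [(x, y) for x, y in zip(iq, wm) if x >= cutoff]
r_top = pearson_r([x for x, _ in top], [y for _, y in top])

print(f"r in full sample:       {r_all:.2f}")
print(f"r in restricted sample: {r_top:.2f}")
```

Under these assumptions the restricted sample should show a noticeably smaller r, even though the underlying relationship between the two variables is unchanged.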

Correlation: Overview


Correlation: Overview
• CAUTION!
• Finally & perhaps most important:
  • The correlation coefficient is a sample statistic, just like the mean
    • It may not be representative of ALL individuals
      • For example, in school I scored very high on Math and Science but below average on Language and History

Correlation: Overview


Correlation: Overview
• Note: there are several types of correlation coefficients, for different variable types
  • Pearson product-moment correlation coefficient (r)
    • When both variables, X & Y, are continuous
  • Point-biserial correlation
    • When 1 variable is continuous and 1 is dichotomous

Correlation: Overview
• Note: there are several types of correlation coefficients
  • Phi coefficient
    • When both variables are dichotomous
  • Spearman rank correlation
    • When both variables are ordinal (ranked data)
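
The four coefficients are closely related: the point-biserial and phi coefficients are what the Pearson formula gives when one or both variables are coded 0/1, and the Spearman coefficient is the Pearson formula applied to ranks. The sketch below illustrates this with made-up data. The variable names and the helpers pearson_r and ranks are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from deviation scores (hypothetical helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

# Hypothetical data for five people
hours = [1, 2, 3, 4, 5]        # continuous
score = [1, 4, 9, 16, 25]      # continuous, monotone but not linear in hours
passed = [0, 0, 1, 1, 1]       # dichotomous
tutored = [0, 1, 0, 1, 1]      # dichotomous

r_pearson = pearson_r(hours, score)                   # both continuous
r_point_biserial = pearson_r(score, passed)           # continuous vs dichotomous
r_phi = pearson_r(passed, tutored)                    # both dichotomous
r_spearman = pearson_r(ranks(hours), ranks(score))    # Pearson on the ranks

print(f"Pearson        {r_pearson:.3f}")
print(f"Point-biserial {r_point_biserial:.3f}")
print(f"Phi            {r_phi:.3f}")
print(f"Spearman       {r_spearman:.3f}")
```

Because score rises with hours in a perfectly monotone but curved way, the Spearman value is 1 while the Pearson value falls slightly short of 1.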

Segment summary
• Important concepts/topics
  • What is a correlation?
  • What are they used for?
  • Scatterplots
  • CAUTION!
  • Types of correlations

END SEGMENT


Lecture 5 ~ Segment 2
Calculation of r


Calculation of r
• Important topics
  • r
    • Pearson product-moment correlation coefficient
      • Raw score formula
      • Z-score formula
  • Sum of cross products (SP) & Covariance

Calculation of r
• r = the degree to which X and Y vary together, relative to the degree to which X and Y vary independently
• r = (Covariance of X & Y) / SQRT(Variance of X × Variance of Y)

Calculation of r
• Two ways to calculate r
  • Raw score formula
  • Z-score formula

Calculation of r
• Let's quickly review calculations from Lecture 4 on summary statistics
  • Variance = SD² = MS = SS / N

Linsanity!


Jeremy Lin (10 games)

Points per game      (X - M)         (X - M)²
      28               5.3             28.09
      26               3.3             10.89
      10             -12.7            161.29
      27               4.3             18.49
      20              -2.7              7.29
      38              15.3            234.09
      23               0.3              0.09
      28               5.3             28.09
      25               2.3              5.29
       2             -20.7            428.49
M = 227/10 = 22.7    M = 0/10 = 0    M = 922.1/10 = 92.21

Results
• M = Mean = 22.7
• SD² = Variance = MS = SS/N = 92.21
• SD = Standard Deviation = 9.6
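
These results can be reproduced in a few lines. The sketch below uses the ten point totals from the Jeremy Lin slide and follows the Lecture 4 definitions (divide by N, not N - 1). The variable names are illustrative.

```python
import math

# Points per game from the Jeremy Lin (10 games) slide
points = [28, 26, 10, 27, 20, 38, 23, 28, 25, 2]

n = len(points)
mean = sum(points) / n                     # M = 227 / 10
deviations = [x - mean for x in points]    # (X - M) for each game
ss = sum(d ** 2 for d in deviations)       # SS = sum of squared deviations
variance = ss / n                          # MS = SS / N
sd = math.sqrt(variance)                   # standard deviation

print(f"M = {mean:.1f}, SS = {ss:.1f}, variance = {variance:.2f}, SD = {sd:.1f}")
# M = 22.7, SS = 922.1, variance = 92.21, SD = 9.6
```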

Just one new concept!


• SP = Sum of cross Products

Just one new concept!


• Review: To calculate SS
  • For each row, calculate the deviation score: (X - Mx)
  • Square the deviation scores: (X - Mx)²
  • Sum the squared deviation scores
    • SSx = Σ[(X - Mx)²] = Σ[(X - Mx) × (X - Mx)]
Just one new concept!


• To calculate SP
  • For each row, calculate the deviation score on X: (X - Mx)
  • For each row, calculate the deviation score on Y: (Y - My)

Just one new concept!


• To calculate SP
  • Then, for each row, multiply the deviation score on X by the deviation score on Y: (X - Mx) × (Y - My)
  • Then, sum the cross products
    • SP = Σ[(X - Mx) × (Y - My)]
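
The SP steps can be traced row by row. The sketch below uses a small made-up data set (five hypothetical X and Y scores, not data from the lecture) so that each cross product is easy to check by hand.

```python
xs = [1, 2, 3, 4, 5]   # hypothetical X scores
ys = [2, 4, 5, 4, 5]   # hypothetical Y scores

mx = sum(xs) / len(xs)   # Mx = 3.0
my = sum(ys) / len(ys)   # My = 4.0

cross_products = []
print("  X   Y   X-Mx   Y-My  product")
for x, y in zip(xs, ys):
    dx = x - mx          # deviation score on X
    dy = y - my          # deviation score on Y
    cross_products.append(dx * dy)
    print(f"{x:>3} {y:>3} {dx:>6.1f} {dy:>6.1f} {dx * dy:>8.1f}")

sp = sum(cross_products)   # SP = sum of the cross products
print("SP =", sp)          # SP = 6.0
```

A positive SP means that scores above the mean on X tend to go with scores above the mean on Y. A negative SP would indicate the opposite pattern.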

Calculation of r
Raw score formula:

r = SPxy / SQRT(SSx × SSy)

Calculation of r
SPxy = Σ[(X - Mx) × (Y - My)]

SSx = Σ(X - Mx)² = Σ[(X - Mx) × (X - Mx)]

SSy = Σ(Y - My)² = Σ[(Y - My) × (Y - My)]

Formulae to calculate r
r = SPxy / SQRT(SSx × SSy)

r = Σ[(X - Mx) × (Y - My)] / SQRT(Σ(X - Mx)² × Σ(Y - My)²)

Formulae to calculate r
Z-score formula:

r = Σ(Zx × Zy) / N

Formulae to calculate r
Zx = (X - Mx) / SDx
Zy = (Y - My) / SDy

SDx = SQRT(Σ(X - Mx)² / N)
SDy = SQRT(Σ(Y - My)² / N)
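
Before working through the algebra, it can help to confirm numerically that the raw score and Z-score formulae give the same r. The sketch below implements both as written on the slides. The function names r_raw and r_z and the five-row data set are hypothetical.

```python
import math

def r_raw(xs, ys):
    """Raw score formula: r = SPxy / SQRT(SSx * SSy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)

def r_z(xs, ys):
    """Z-score formula: r = sum(Zx * Zy) / N, with SD computed using N."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sdx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sdy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    zx = [(x - mx) / sdx for x in xs]
    zy = [(y - my) / sdy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / n

xs = [1, 2, 3, 4, 5]   # hypothetical X scores
ys = [2, 4, 5, 4, 5]   # hypothetical Y scores

print(f"raw score formula: r = {r_raw(xs, ys):.4f}")
print(f"Z-score formula:   r = {r_z(xs, ys):.4f}")
```

For this data SP = 6, SSx = 10 and SSy = 6, so both functions should return 6 / SQRT(60), about 0.77, apart from floating-point rounding.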

Formulae to calculate r
Proof of equivalence:

Zx = (X - Mx) / SQRT(Σ(X - Mx)² / N)

Zy = (Y - My) / SQRT(Σ(Y - My)² / N)

Formulae to calculate r
r = Σ { [(X - Mx) / SQRT(Σ(X - Mx)² / N)] × [(Y - My) / SQRT(Σ(Y - My)² / N)] } / N

Formulae to calculate r
r = Σ { [(X - Mx) / SQRT(Σ(X - Mx)² / N)] × [(Y - My) / SQRT(Σ(Y - My)² / N)] } / N

r = Σ[(X - Mx) × (Y - My)] / SQRT(Σ(X - Mx)² × Σ(Y - My)²)

r = SPxy / SQRT(SSx × SSy)

The raw score formula!
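
The step between the Z-score expression and the raw score formula is where the N terms cancel. One way to write it out, using SS_x and SS_y for the two sums of squares (the standard deviations are constants, so they can be pulled outside the sum):

```latex
\begin{aligned}
r &= \frac{1}{N}\sum \frac{(X - M_x)}{\sqrt{SS_x / N}} \cdot \frac{(Y - M_y)}{\sqrt{SS_y / N}} \\
  &= \frac{1}{N} \cdot \frac{\sum (X - M_x)(Y - M_y)}{\sqrt{SS_x \, SS_y} \, / \, N} \\
  &= \frac{\sum (X - M_x)(Y - M_y)}{\sqrt{SS_x \, SS_y}}
   = \frac{SP_{xy}}{\sqrt{SS_x \, SS_y}}
\end{aligned}
```

The product of the two standard deviations equals SQRT(SSx × SSy) / N, and that N cancels against the 1/N in front of the sum.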

Variance and covariance


• Variance = MS = SS / N
• Covariance = COV = SP / N
• Correlation is standardized COV
  • Standardized so the value is in the range -1 to 1
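
To see what "standardized" buys, the sketch below computes COV and r for a small made-up data set, then rescales X by 100 (as if changing its units). The covariance changes with the units while r does not. The function name cov_and_r and the data are hypothetical.

```python
import math

def cov_and_r(xs, ys):
    """Return (COV, r) using N in the denominators (descriptive statistics)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs) / n   # variance of X = SSx / N
    var_y = sum((y - my) ** 2 for y in ys) / n   # variance of Y = SSy / N
    cov = sp / n                                 # COV = SP / N
    r = cov / (math.sqrt(var_x) * math.sqrt(var_y))
    return cov, r

xs = [1, 2, 3, 4, 5]   # hypothetical X scores
ys = [2, 4, 5, 4, 5]   # hypothetical Y scores

cov, r = cov_and_r(xs, ys)
cov_scaled, r_scaled = cov_and_r([100 * x for x in xs], ys)

print(f"original units: COV = {cov:.2f},   r = {r:.4f}")
print(f"X times 100:    COV = {cov_scaled:.2f}, r = {r_scaled:.4f}")
```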

Note on the denominators


• Correlation for descriptive statistics
  • Divide by N
• Correlation for inferential statistics
  • Divide by N - 1
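
The choice of denominator changes the variance and covariance, but as long as the same denominator is used throughout, it cancels in r itself. The sketch below checks this on made-up data. The function name stats_with_denominator and the data are hypothetical.

```python
import math

def stats_with_denominator(xs, ys, denom):
    """Return (variance of X, covariance, r) using the given denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    var_x = ssx / denom
    var_y = ssy / denom
    cov = sp / denom
    r = cov / math.sqrt(var_x * var_y)
    return var_x, cov, r

xs = [1, 2, 3, 4, 5]   # hypothetical X scores
ys = [2, 4, 5, 4, 5]   # hypothetical Y scores
n = len(xs)

var_n, cov_n, r_n = stats_with_denominator(xs, ys, n)         # descriptive
var_n1, cov_n1, r_n1 = stats_with_denominator(xs, ys, n - 1)  # inferential

print(f"divide by N:     variance = {var_n:.2f}, COV = {cov_n:.2f}, r = {r_n:.4f}")
print(f"divide by N - 1: variance = {var_n1:.2f}, COV = {cov_n1:.2f}, r = {r_n1:.4f}")
```

The N - 1 versions of the variance and covariance are larger, while r agrees up to floating-point rounding.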

Segment summary
• Important topics
  • r
    • Pearson product-moment correlation coefficient
      • Raw score formula
      • Z-score formula
  • Sum of cross Products (SP) & Covariance

END SEGMENT


Lecture 5 ~ Segment 3
Assumptions


Assumptions
• Assumptions when interpreting r
  • Normal distributions for X and Y
  • Linear relationship between X and Y
  • Homoscedasticity

Assumptions
• Assumptions when interpreting r
  • Reliability of X and Y
  • Validity of X and Y
  • Random and representative sampling

Assumptions
• Assumptions when interpreting r
  • Normal distributions for X and Y
    • How to detect violations?
      • Plot histograms and examine summary statistics
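
A rough version of this check can be done without any plotting library. The sketch below prints a text histogram of a made-up, right-skewed sample and compares the mean, the median, and a simple skewness index (the mean of the cubed z-scores). The data and the choice of skewness index are illustrative assumptions, not a procedure prescribed by the lecture.

```python
import statistics
from collections import Counter

scores = [1, 1, 2, 2, 2, 3, 3, 4, 5, 9]   # hypothetical, right-skewed sample

# Text histogram: one '#' per observation at each value
counts = Counter(scores)
for value in range(min(scores), max(scores) + 1):
    print(f"{value:>2} | {'#' * counts[value]}")

mean = statistics.mean(scores)
median = statistics.median(scores)
sd = statistics.pstdev(scores)   # population SD (divide by N)

# Skewness index: roughly 0 for a symmetric distribution,
# positive when the long tail is on the right
skewness = sum(((x - mean) / sd) ** 3 for x in scores) / len(scores)

print(f"mean = {mean:.2f}, median = {median:.2f}, skewness = {skewness:.2f}")
```

A mean well above the median together with a clearly positive skewness index suggests the distribution is not normal, which is the kind of violation the histogram is meant to reveal.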

Assumptions
• Assumptions when interpreting r
  • Linear relationship between X and Y
    • How to detect violation?
      • Examine scatterplots (see following examples)

Assumptions
• Assumptions when interpreting r
  • Homoscedasticity
    • How to detect violation?
      • Examine scatterplots (see following examples)

Homoscedasticity
• In a scatterplot the vertical distance between a dot and the regression line reflects the amount of prediction error (known as the residual)

Homoscedasticity
• Homoscedasticity means that the distances (the residuals) are not related to the variable plotted on the X axis (they are not a function of X)
• This is best illustrated with scatterplots
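
Alongside the scatterplot, a crude numerical check is to fit the least-squares line and compare the typical size of the residuals at low and high values of X. The sketch below does this for two made-up data sets, one with constant scatter and one where scatter grows with X. The helper names, the data, and the "upper half versus lower half" ratio are illustrative assumptions rather than a formal test.

```python
def residuals(xs, ys):
    """Vertical distances from each point to the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    slope = sp / ssx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def spread_ratio(xs, ys):
    """Mean |residual| in the upper half of X divided by the lower half."""
    pairs = sorted(zip(xs, residuals(xs, ys)))
    half = len(pairs) // 2
    low = [abs(e) for _, e in pairs[:half]]
    high = [abs(e) for _, e in pairs[half:]]
    return (sum(high) / len(high)) / (sum(low) / len(low))

xs = list(range(1, 11))
signs = [1 if i % 2 == 0 else -1 for i in range(10)]   # alternating +1 / -1

# Hypothetical data: the same line, with two different error patterns
y_constant = [2 * x + s * 1.0 for x, s in zip(xs, signs)]      # constant scatter
y_growing = [2 * x + s * 0.2 * x for x, s in zip(xs, signs)]   # scatter grows with X

ratio_constant = spread_ratio(xs, y_constant)
ratio_growing = spread_ratio(xs, y_growing)

print(f"constant scatter: ratio = {ratio_constant:.2f}")
print(f"growing scatter:  ratio = {ratio_growing:.2f}")
```

A ratio near 1 is consistent with homoscedasticity. A ratio well above or below 1 suggests the residuals depend on X.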

Anscombe's quartet
• In 1973, statistician Dr. Frank Anscombe developed a classic example to illustrate several of the assumptions underlying correlation and regression
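
The quartet is easy to reproduce. The sketch below uses the four published data sets from Anscombe's 1973 example and computes the mean of X, the mean of Y, and r for each. The helper name pearson_r is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson r via the raw score formula (hypothetical helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared by sets I, II, III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (x4,   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {}
for name, (xs, ys) in quartet.items():
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    results[name] = pearson_r(xs, ys)
    print(f"Set {name:<3}: Mx = {mean_x:.2f}, My = {mean_y:.2f}, r = {results[name]:.3f}")
```

All four sets give nearly the same means and nearly the same r, even though their scatterplots look very different. That is why the summary numbers alone cannot confirm that the assumptions hold.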

Anscombe's quartet
(Figure slides: scatterplots of the four data sets; images not reproduced in this text.)

Segment summary
• Assumptions when interpreting r
  • Normal distributions for X and Y
  • Linear relationship between X and Y
  • Homoscedasticity

Segment summary
• Assumptions when interpreting r
  • Reliability of X and Y
  • Validity of X and Y
  • Random and representative sampling

END SEGMENT


END LECTURE 5
