DSUR I Chapter 06 (Correlation)

The document discusses measuring relationships between variables using correlation. It defines correlation as a way to measure how two variables change together, and introduces several correlation coefficients including Pearson's r, Spearman's rho, and Kendall's tau. Examples are provided calculating these coefficients in R and interpreting the results. Bootstrapping methods are also described as a way to estimate confidence intervals for correlation values.


Correlation

Prof. Andy Field


Aims
• Measuring relationships
– Scatterplots
– Covariance
– Pearson’s correlation coefficient
• Nonparametric measures
– Spearman’s rho
– Kendall’s tau
• Interpreting correlations
– Causality
• Partial correlations
What is a Correlation?
• It is a way of measuring the extent to which
two variables are related.
• It measures the pattern of responses across
variables.
[Slide 4: scatterplot of Appreciation of Dimmu Borgir against Age]

[Slide 5: scatterplot of Appreciation of Dimmu Borgir against Age]

[Slide 6: scatterplot of Appreciation of Dimmu Borgir against Age]
Measuring Relationships
• We need to see whether as one variable
increases, the other increases, decreases or
stays the same.
• This can be done by calculating the
covariance.
– We look at how much each score deviates from
the mean.
– If both variables deviate from the mean by the
same amount, they are likely to be related.
Revision of Variance
• The variance tells us by how much scores
deviate from the mean for a single variable.
• It is closely linked to the sum of squares.
• Covariance is similar – it tells us by how
much scores on two variables differ from
their respective means.
Variance
• The variance tells us by how much scores
deviate from the mean for a single variable.
• It is closely linked to the sum of squares.
$$\text{variance}\,(s^2) = \frac{\sum (x_i - \bar{x})^2}{N - 1} = \frac{\sum (x_i - \bar{x})(x_i - \bar{x})}{N - 1}$$
Covariance
• Calculate the error between the mean and
each subject’s score for the first variable (x).
• Calculate the error between the mean and
their score for the second variable (y).
• Multiply these error values.
• Add these values and you get the
cross-product deviations.
• The covariance is the average of the
cross-product deviations:
$$\text{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$

$$= \frac{(-0.4)(-3) + (-1.4)(-2) + (-1.4)(-1) + (0.6)(2) + (2.6)(4)}{4} = \frac{1.2 + 2.8 + 1.4 + 1.2 + 10.4}{4} = \frac{17}{4} = 4.25$$
Problems with Covariance
• It depends upon the units of measurement.
– E.g. the covariance of two variables measured in
miles might be 4.25, but if the same scores are
converted to kilometres, the covariance is 11.
• One solution: standardize it!
– Divide by the standard deviations of both
variables.
• The standardized version of covariance is
known as the correlation coefficient.
– It is relatively unaffected by units of measurement.
The Correlation Coefficient

$$r = \frac{\text{cov}_{xy}}{s_x s_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$$
The Correlation Coefficient

$$r = \frac{\text{cov}_{xy}}{s_x s_y} = \frac{4.25}{1.67 \times 2.92} = .87$$
Correlation: Example
• Anxiety and exam performance
• Participants:
– 103 students
• Measures
– Time spent revising (hours)
– Exam performance (%)
– Exam Anxiety (the EAQ, score out of 100)
– Gender
Doing a Correlation with R Commander
General Procedure for Correlations
Using R
• To compute basic correlation coefficients
there are three main functions that can be
used:
cor(), cor.test() and rcorr().
Correlations using R
• Pearson correlations:
– cor(examData, use = "complete.obs", method
= "pearson")
– rcorr(examData, type = "pearson")
– cor.test(examData$Exam, examData$Anxiety,
method = "pearson")
• If we predicted a negative correlation:
– cor.test(examData$Exam, examData$Anxiety,
alternative = "less", method = "pearson")
Pearson Correlation Output

Exam Anxiety Revise


Exam 1.0000000 -0.4409934 0.3967207
Anxiety -0.4409934 1.0000000 -0.7092493
Revise 0.3967207 -0.7092493 1.0000000
Reporting the Results
• Exam performance was significantly
correlated with exam anxiety, r = -.44, and
time spent revising, r = .40; the time spent
revising was also correlated with exam
anxiety, r = -.71 (all ps < .001).
Things to Know about the Correlation
• It varies between -1 and +1
– 0 = no relationship
• It is an effect size
– ±.1 = small effect
– ±.3 = medium effect
– ±.5 = large effect
• Coefficient of determination, r2
– By squaring the value of r you get the proportion
of variance in one variable shared by the other.
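Using the Exam–Anxiety value from the correlation output shown earlier, the coefficient of determination is a one-liner:

```r
# r for Exam and Anxiety, taken from the cor() output shown earlier
r <- -0.4409934
r^2   # approx 0.194: anxiety shares about 19.4% of the variance in exam scores
```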
Correlation and Causality
• The third-variable problem:
– In any correlation, causality between two
variables cannot be assumed because there
may be other measured or unmeasured
variables affecting the results.
• Direction of causality:
– Correlation coefficients say nothing about
which variable causes the other to change.
Non-parametric Correlation
• Spearman’s rho
– Pearson’s correlation on the ranked data
• Kendall’s tau
– Better than Spearman’s for small samples
• World’s Biggest Liar competition
– 68 contestants
– Measures
• Where they were placed in the competition (first,
second, third, etc.)
• Creativity questionnaire (maximum score 60)
Spearman’s Rho
cor(liarData$Position, liarData$Creativity, method =
"spearman")
• The output of this command will be:
[1] -0.3732184
• To get the significance value use rcorr() (NB:
first convert the dataframe to a matrix):
liarMatrix<-as.matrix(liarData[, c("Position",
"Creativity")])
rcorr(liarMatrix)
• Or:
cor.test(liarData$Position, liarData$Creativity,
alternative = "less", method = "spearman")
Spearman's Rho Output
Spearman's rank correlation rho
data: liarData$Position and liarData$Creativity
S = 71948.4, p-value = 0.0008602
alternative hypothesis: true rho is less than 0
sample estimates:
rho
-0.3732184
Kendall’s Tau (Non-parametric)
• To carry out Kendall’s correlation on the
World’s Biggest Liar data simply follow the
same steps as for Pearson and Spearman
correlations but use method = “kendall”:
cor(liarData$Position, liarData$Creativity,
method = "kendall")
cor.test(liarData$Position, liarData$Creativity,
alternative = "less", method = "kendall")
Kendall’s Tau (Non-parametric)

• The output is much the same as for
Spearman's correlation.
Kendall's rank correlation tau
data: liarData$Position and liarData$Creativity
z = -3.2252, p-value = 0.0006294
alternative hypothesis: true tau is less than 0
sample estimates:
tau
-0.3002413
Bootstrapping Correlations
• If we stick with our World’s Biggest Liar data and
want to bootstrap Kendall’s tau, then our function
will be:
bootTau <- function(liarData, i)
  cor(liarData$Position[i], liarData$Creativity[i],
      use = "complete.obs", method = "kendall")

• To bootstrap a Pearson or Spearman correlation
you do it in exactly the same way except that you
specify method = "pearson" or method =
"spearman" when you define the function.
Bootstrapping Correlations Output
• To create the bootstrap object, we execute:
library(boot)
boot_kendall<-boot(liarData, bootTau, 2000)
boot_kendall
• To get the 95% confidence interval for the
boot_kendall object:
boot.ci(boot_kendall)
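The pieces above can be combined into one self-contained sketch. The data frame here is a simulated stand-in for liarData (the real data are the 68 contestants' competition positions and creativity scores):

```r
library(boot)

set.seed(111)
# Simulated stand-in for liarData: 68 contestants
liarData <- data.frame(Position   = sample(1:68),
                       Creativity = sample(1:60, 68, replace = TRUE))

# Statistic function: Kendall's tau on the bootstrap sample selected by index i
bootTau <- function(liarData, i)
  cor(liarData$Position[i], liarData$Creativity[i],
      use = "complete.obs", method = "kendall")

boot_kendall <- boot(liarData, bootTau, 2000)
boot.ci(boot_kendall, type = c("norm", "basic", "perc", "bca"))  # 95% CIs
```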
Bootstrapping Correlations Output
• The output below shows the contents of boot_kendall:

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = liarData, statistic = bootTau, R = 2000)

Bootstrap Statistics :
original bias std. error
t1* -0.3002413 0.001058191 0.097663
Bootstrapping Correlations Output
• The output below shows the contents of the boot.ci() function:

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 2000 bootstrap replicates

CALL :
boot.ci(boot.out = boot_kendall)

Intervals :
Level Normal Basic
95% (-0.4927, -0.1099 ) (-0.4956, -0.1126 )

Level     Percentile            BCa
95%   (-0.4879, -0.1049 )   (-0.4777, -0.0941 )
Partial and Semi-partial Correlations

• Partial correlation:
– Measures the relationship between two
variables, controlling for the effect that a third
variable has on them both.
• Semi-partial correlation:
– Measures the relationship between two
variables controlling for the effect that a third
variable has on only one of the others.

[Slides 37-38: Venn diagrams of Exam Performance, Exam Anxiety and Revision Time. Exam Anxiety accounts for 19.4% of the variance in Exam Performance and Revision Time for 15.7%, with some variance accounted for by both. The final panel contrasts the unique variance used by the partial correlation with that used by the semi-partial correlation.]
Doing Partial Correlation using R
• The general form of pcor() is:
pcor(c("var1", "var2", "control1", "control2", ...),
var(dataframe))
• We can then see the partial correlation and
the value of R² in the console by executing:
pc
pc^2
Doing Partial Correlation using R
• The general form of pcor.test() is:
pcor.test(pcor object, number of control variables,
sample size)
• Basically, you enter an object that you have
created with pcor() (or you can put the
pcor() command directly into the function):
pcor.test(pc, 1, 103)
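The slides assume that pc has already been created with pcor() from the ggm package. A minimal runnable sketch, using simulated stand-in data (the chapter's real example uses the Exam, Anxiety and Revise variables from the 103-student dataset):

```r
library(ggm)  # provides pcor() and pcor.test()

set.seed(10)
# Simulated stand-in for the exam data: 103 students
examData2 <- data.frame(Exam    = rnorm(103),
                        Anxiety = rnorm(103),
                        Revise  = rnorm(103))

# Partial correlation between Exam and Anxiety, controlling for Revise
pc <- pcor(c("Exam", "Anxiety", "Revise"), var(examData2))
pc^2                   # proportion of variance shared, controlling for Revise
pcor.test(pc, 1, 103)  # significance test: 1 control variable, n = 103
```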
Partial Correlation Output
> pc
[1] -0.2466658

> pc^2
[1] 0.06084403
> pcor.test(pc, 1, 103)
$tval
[1] -2.545307

$df
[1] 100

$pvalue
[1] 0.01244581
