DSUR I Chapter 06 (Correlation)
Slide 4: [Scatterplot of Appreciation of Dimmu Borgir against Age]
Slide 5: [Scatterplot against Age]
Slide 6: [Scatterplot of Appreciation of Dimmu Borgir against Age]
Measuring Relationships
• We need to see whether as one variable
increases, the other increases, decreases or
stays the same.
• This can be done by calculating the
covariance.
– We look at how much each score deviates from
the mean.
– If both variables deviate from the mean by the
same amount, they are likely to be related.
Revision of Variance
• The variance tells us by how much scores
deviate from the mean for a single variable.
• It is closely linked to the sum of squares.
• Covariance is similar – it tells us by how
much scores on two variables differ from
their respective means.
Variance
• The variance tells us by how much scores
deviate from the mean for a single variable.
• It is closely linked to the sum of squares.
$$\text{variance}\ (s^2) = \frac{\sum (x_i - \bar{x})^2}{N - 1} = \frac{\sum (x_i - \bar{x})(x_i - \bar{x})}{N - 1}$$
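As a minimal sketch of this formula (the five scores are made-up values for illustration), the sum of squared deviations divided by N − 1 matches R's built-in var():
x <- c(5, 4, 4, 6, 8)                    # illustrative scores, mean = 5.4
sum((x - mean(x))^2) / (length(x) - 1)   # 2.8, the variance computed by hand
var(x)                                   # identical: var() also divides by N - 1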
Covariance
• Calculate the error between the mean and
each subject’s score for the first variable (x).
• Calculate the error between the mean and
their score for the second variable (y).
• Multiply these error values.
• Add these values and you get the cross-product deviations.
• The covariance is the average of the cross-product deviations:
$$\text{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$
$$\text{cov}(x, y) = \frac{(-0.4)(-3) + (-1.4)(-2) + (-1.4)(-1) + (0.6)(2) + (2.6)(4)}{4} = \frac{1.2 + 2.8 + 1.4 + 1.2 + 10.4}{4} = \frac{17}{4} = 4.25$$
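A sketch in R: raw scores whose deviations match those in the calculation above (means of 5.4 and 11; the raw values themselves are assumed for illustration) reproduce the 4.25:
x <- c(5, 4, 4, 6, 8)     # deviations from the mean (5.4): -0.4, -1.4, -1.4, 0.6, 2.6
y <- c(8, 9, 10, 13, 15)  # deviations from the mean (11):  -3, -2, -1, 2, 4
sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)   # 4.25, average cross-product deviation
cov(x, y)                                              # 4.25, same N - 1 denominator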
Problems with Covariance
• It depends upon the units of measurement.
– E.g. the covariance of two variables measured in
miles might be 4.25, but if the same scores are
converted to kilometres, the covariance is 11.
• One solution: standardize it!
– Divide by the standard deviations of both
variables.
• The standardized version of covariance is
known as the correlation coefficient.
– It is relatively unaffected by units of measurement.
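A sketch of this point, reusing the covariance-example scores and treating them as distances in miles: converting to kilometres rescales the covariance (roughly 4.25 to 11) but leaves the correlation untouched.
x <- c(5, 4, 4, 6, 8)    # the covariance-example scores, read as miles
y <- c(8, 9, 10, 13, 15)
x_km <- x * 1.609        # the same scores converted to kilometres
y_km <- y * 1.609
cov(x, y)            # 4.25
cov(x_km, y_km)      # about 11: covariance depends on the units of measurement
cor(x, y)            # about .87
cor(x_km, y_km)      # identical: standardizing removes the units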
The Correlation Coefficient
$$r = \frac{\text{cov}_{xy}}{s_x s_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$$
The Correlation Coefficient
$$r = \frac{\text{cov}_{xy}}{s_x s_y} = \frac{4.25}{1.67 \times 2.92} = .87$$
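Continuing the same sketch data, the standard deviations come out at roughly 1.67 and 2.92, and dividing the covariance by their product gives the same value as R's cor():
x <- c(5, 4, 4, 6, 8)        # sketch data from the covariance example
y <- c(8, 9, 10, 13, 15)
sd(x)                        # about 1.67
sd(y)                        # about 2.92
cov(x, y) / (sd(x) * sd(y))  # about .87
cor(x, y)                    # identical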
Correlation: Example
• Anxiety and exam performance
• Participants:
– 103 students
• Measures
– Time spent revising (hours)
– Exam performance (%)
– Exam Anxiety (the EAQ, score out of 100)
– Gender
Doing a Correlation with R Commander
General Procedure for Correlations Using R
• To compute basic correlation coefficients, there are three main functions that can be used:
cor(), cor.test() and rcorr() (the last of these comes from the Hmisc package).
Correlations using R
• Pearson correlations:
– cor(examData, use = "complete.obs", method = "pearson")
– rcorr(examData, type = "pearson")
– cor.test(examData$Exam, examData$Anxiety, method = "pearson")
• If we predicted a negative correlation:
– cor.test(examData$Exam, examData$Anxiety, alternative = "less", method = "pearson")
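A sketch of the full sequence, assuming the exam anxiety data have already been read into a data frame called examData (as in the slides) with numeric columns for exam performance, anxiety and revision time plus a Gender column; rcorr() expects a numeric matrix, so the non-numeric columns are dropped first:
library(Hmisc)
numericData <- as.matrix(examData[, sapply(examData, is.numeric)])  # drop non-numeric columns such as Gender
cor(numericData, use = "complete.obs", method = "pearson")          # matrix of Pearson r values
rcorr(numericData, type = "pearson")                                # r values, n and p-values
cor.test(examData$Exam, examData$Anxiety, method = "pearson")       # one pair: r, CI and p-value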
Pearson Correlation Output
Bootstrapping Correlations Output
Call:
boot(data = liarData, statistic = bootTau, R = 2000)

Bootstrap Statistics :
     original        bias    std. error
t1* -0.3002413  0.001058191    0.097663
• The output below shows the contents of the boot.ci() function:
CALL :
boot.ci(boot.out = boot_kendall)
Intervals :
Level Normal Basic
95% (-0.4927, -0.1099 ) (-0.4956, -0.1126 )
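A sketch of how output like the two blocks above might have been produced; the liarData column names (Creativity and Position) are assumptions, and the statistic function simply recomputes Kendall's tau on each bootstrap resample of rows:
library(boot)

bootTau <- function(liarData, i) {
  # i is the vector of row indices for the current bootstrap resample
  cor(liarData$Position[i], liarData$Creativity[i],
      use = "complete.obs", method = "kendall")
}

boot_kendall <- boot(data = liarData, statistic = bootTau, R = 2000)
boot_kendall                      # original estimate, bias and bootstrap std. error
boot.ci(boot.out = boot_kendall)  # confidence intervals for tau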
Partial and Semi-Partial Correlations
• Partial correlation:
– Measures the relationship between two variables, controlling for the effect that a third variable has on them both.
• Semi-partial correlation:
– Measures the relationship between two variables, controlling for the effect that a third variable has on only one of the two.
Slide 37: Variance Accounted for by Exam Anxiety
[Venn diagrams: variance in Exam Performance shared with Exam Anxiety (19.4%) and with Revision Time]
Doing Partial Correlation using R
• The general form of pcor() is:
pcor(c("var1", "var2", "control1", "control2", etc.), var(dataframe))
• We can then see the partial correlation and the value of R² in the console by executing the name of the pcor() object (here, pc) and its square:
pc
pc^2
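A sketch of the whole sequence for the exam anxiety example; pcor() and pcor.test() come from the ggm package, and the column names Exam, Anxiety and Revise (and the numeric-only subset) are assumptions about how the data frame is organised:
library(ggm)
examData2 <- examData[, c("Exam", "Anxiety", "Revise")]     # numeric variables only

pc <- pcor(c("Exam", "Anxiety", "Revise"), var(examData2))  # Exam-Anxiety, controlling for Revise
pc                     # the partial correlation
pc^2                   # R²: shared variance after controlling for Revise
pcor.test(pc, 1, 103)  # significance test: 1 control variable, N = 103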
Doing Partial Correlation using R
• The general form of pcor.test() is:
pcor.test(pcor object, number of control variables, sample size)
• Basically, you enter an object that you have
created with pcor() (or you can put the
pcor() command directly into the function):
pcor.test(pc, 1, 103)
Partial Correlation Output
> pc
[1] -0.2466658
> pc^2
[1] 0.06084403
> pcor.test(pc, 1, 103)
$tval
[1] -2.545307
$df
[1] 100
$pvalue
[1] 0.01244581