Canonical correlation analysis
Ryan Tibshirani
Data Mining: 36-462/36-662
February 14 2013
Review: correlation
Given two random variables X, Y ∈ R, the (Pearson) correlation between X and Y is defined as
$$\mathrm{Cor}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}}$$
Recall that
$$\mathrm{Cov}(X, Y) = \mathrm{E}\big[(X - \mathrm{E}[X])(Y - \mathrm{E}[Y])\big]$$
and
$$\mathrm{Var}(X) = \mathrm{E}\big[(X - \mathrm{E}[X])^2\big] = \mathrm{Cov}(X, X)$$
Given samples x, y ∈ R^n, the sample correlation is defined analogously:
$$\mathrm{cor}(x, y) = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x)}\,\sqrt{\mathrm{var}(y)}}$$
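As a quick sanity check of the sample formula, here is a minimal R sketch using simulated vectors x and y (names chosen here purely for illustration):

set.seed(1)
x = rnorm(50)
y = 0.5*x + rnorm(50)
# correlation from the definition: covariance over the product of standard deviations
cov(x, y) / (sqrt(var(x)) * sqrt(var(y)))
# agrees with the built-in
cor(x, y)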
Given data matrices X ∈ R^{n×p} and Y ∈ R^{n×q} (with centered columns), the first canonical directions α1 ∈ R^p, β1 ∈ R^q are defined by
$$\alpha_1, \beta_1 = \operatorname*{argmax}_{\|X\alpha\|_2 = 1,\ \|Y\beta\|_2 = 1} (X\alpha)^T (Y\beta)$$
and the kth canonical directions αk, βk by
$$\alpha_k, \beta_k = \operatorname*{argmax}_{\substack{\|X\alpha\|_2 = 1,\ \|Y\beta\|_2 = 1 \\ (X\alpha)^T (X\alpha_j) = 0,\ j = 1, \ldots, k-1 \\ (Y\beta)^T (Y\beta_j) = 0,\ j = 1, \ldots, k-1}} (X\alpha)^T (Y\beta)$$
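As a one-line check (assuming, as above, that the columns of X and Y are centered), the objective coincides with the sample correlation of the variates under the unit-norm constraints:
$$\widehat{\mathrm{cor}}(X\alpha, Y\beta) = \frac{(X\alpha)^T (Y\beta)}{\|X\alpha\|_2\, \|Y\beta\|_2} = (X\alpha)^T (Y\beta) \quad \text{when } \|X\alpha\|_2 = \|Y\beta\|_2 = 1.$$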
Example: scores data
Example: n = 88 students took tests in each of 5 subjects:
mechanics, vectors, algebra, analysis, statistics. (From Mardia et
al. (1979) “Multivariate analysis”.) Each test is out of 100 points.
Here X contains the mechanics and vectors scores, and Y contains the algebra, analysis, and statistics scores. The first canonical directions (multiplied by 10^3):
$$\alpha_1 = \begin{pmatrix} 2.770 \\ 5.517 \end{pmatrix}\!\!\begin{matrix} \text{ mec} \\ \text{ vec} \end{matrix}, \qquad \beta_1 = \begin{pmatrix} 8.782 \\ 0.860 \\ 0.370 \end{pmatrix}\!\!\begin{matrix} \text{ alg} \\ \text{ ana} \\ \text{ sta} \end{matrix}$$
The first canonical correlation is ρ1 = 0.663, and the variates:
[Figure: scatter plot of the first canonical variates, Xα1 (horizontal axis) against Yβ1 (vertical axis).]
How many canonical directions are there?
There are min{p, q} of them in total (assuming n ≥ p and n ≥ q; in general, there are actually only r = min{rank(X), rank(Y)} canonical directions).
Transforming the problem
Sphering
For any symmetric, positive definite (hence invertible) matrix A ∈ R^{n×n}, there is a matrix A^{1/2} ∈ R^{n×n}, called the (symmetric) square root of A, such that
$$A^{1/2} A^{1/2} = A$$
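For concreteness, here is a small R sketch (not from the slides) that computes the symmetric square root through an eigendecomposition, using the sample covariance of simulated data as A:

set.seed(1)
Z = matrix(rnorm(100*3), 100, 3)
A = cov(Z)                         # symmetric, positive definite
e = eigen(A, symmetric=TRUE)       # A = U diag(d) U^T
Ahalf = e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)
max(abs(Ahalf %*% Ahalf - A))      # essentially zero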
Recall that then $\alpha_1 = V_X^{-1/2}\,\tilde\alpha_1$ and $\beta_1 = V_Y^{-1/2}\,\tilde\beta_1$, where $V_X, V_Y$ are the sample covariance matrices of $X$ and $Y$, and $\tilde\alpha_1, \tilde\beta_1$ are the first canonical directions of the sphered data $\tilde X = X V_X^{-1/2}$, $\tilde Y = Y V_Y^{-1/2}$.
The sphered problem is solved by a singular value decomposition $M = U D V^T$, where (up to normalization) $M = \tilde X^T \tilde Y$: the columns of $U$ and $V$ give the sphered directions $\tilde\alpha_k, \tilde\beta_k$, and the diagonal of $D$ gives the canonical correlations $\rho_k$.
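The whole construction fits in a few lines of R. This is an illustrative from-scratch sketch (not the slides' code), assuming x and y are data matrices with centered columns:

# canonical correlation analysis via sphering + SVD
ccasvd = function(x, y) {
  sqrtinv = function(A) {                 # symmetric inverse square root of A
    e = eigen(A, symmetric=TRUE)
    e$vectors %*% diag(1/sqrt(e$values), nrow(A)) %*% t(e$vectors)
  }
  vx = sqrtinv(crossprod(x))              # (X^T X)^{-1/2}
  vy = sqrtinv(crossprod(y))              # (Y^T Y)^{-1/2}
  s = svd(vx %*% crossprod(x, y) %*% vy)  # SVD of the sphered cross-product
  list(alpha = vx %*% s$u,                # map sphered directions back
       beta  = vy %*% s$v,
       rho   = s$d)                       # canonical correlations
}

The canonical correlations returned here should match cancor(x, y)$cor; individual directions may differ by sign and scaling.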
Example: olive oil data
The data set contains n = 572 olive oil samples, each with 9 measured variables:
1. region
2. palmitic
3. palmitoleic
4. stearic
5. oleic
6. linoleic
7. linolenic
8. arachidic
9. eicosenoic
Variable 1 takes values in {1, 2, 3}, indicating the region (in Italy)
of origin. Variables 2-9 are continuous valued and measure the
percentage composition of 8 different fatty acids.
We are interested in the correlations between the region of origin and the fatty acid measurements. Hence we take X ∈ R^{572×8} to contain the fatty acid measurements, and Y ∈ R^{572×3} to be an indicator matrix, i.e., each row of Y indicates the region with a 1 and otherwise has 0s. This might look like:
$$Y = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ \vdots & \vdots & \vdots \end{pmatrix}$$
(In this case, canonical correlation analysis actually does the exact same thing as linear discriminant analysis, an important tool that we will learn later for classification.)
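One way to build such an indicator matrix in R, assuming a vector region containing the labels 1, 2, 3 (the variable name is just for illustration):

# each row of Y is the indicator of the corresponding region label
Y = diag(3)[region, ]
# equivalently, using a factor:
# Y = model.matrix(~ factor(region) - 1)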
The first two canonical X variates, with the points colored by
region:
[Figure: first canonical x variate (horizontal axis) against second canonical x variate (vertical axis), with points colored by region (Region 1, Region 2, Region 3).]
Canonical correlation analysis in R
# canonical correlation analysis with the built-in cancor() function
cc = cancor(x, y)
alpha = cc$xcoef      # canonical directions for x (as columns)
beta = cc$ycoef       # canonical directions for y (as columns)
rho = cc$cor          # canonical correlations
xvars = x %*% alpha   # canonical x variates
yvars = y %*% beta    # canonical y variates
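For instance, a plot like the olive oil one above could be produced along these lines (a sketch, assuming x holds the fatty acid measurements and region the labels 1, 2, 3):

xvars = x %*% cc$xcoef
# first two canonical x variates, colored by region
plot(xvars[,1], xvars[,2], col=region, xlab="First canonical x variate",
     ylab="Second canonical x variate")
legend("bottomleft", legend=paste("Region", 1:3), col=1:3, pch=1)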
Recap: canonical correlation analysis
Next time: measures of correlation
A lot of work has been done, but there's still a lot of interest ...
[Figure: work on measures of correlation spanning 1888 to 2012.]