Canonical Correlation Analysis

Nathaniel E. Helwig

Assistant Professor of Psychology and Statistics

University of Minnesota (Twin Cities)

Updated 16-Mar-2017

Nathaniel E. Helwig (U of Minnesota) Canonical Correlation Analysis Updated 16-Mar-2017 : Slide 1


Outline of Notes

1) Canonical Correlations
   Overview
   Population Definition
   Sample Estimates
   Large Samples

2) Decathlon Example
   Data Overview
   Two Sets of Variables
   CCA of Unstandardized Data
   CCA of Standardized Data


Canonical Correlations


Canonical Correlations Overview

Purpose of Canonical Correlation Analysis

Canonical Correlation Analysis (CCA) connects two sets of variables
by finding linear combinations of variables that maximally correlate.

There are two typical purposes of CCA:

1 Data reduction: explain covariation between two sets of variables
using a small number of linear combinations
2 Data interpretation: find features (i.e., canonical variates) that are
important for explaining covariation between sets of variables


Canonical Correlations Population Definition

Linear Combinations of Two Sets of Variables

Let $X = (X_1, \ldots, X_p)'$ and $Y = (Y_1, \ldots, Y_q)'$ denote random vectors
with mean vectors $\mu_X$ and $\mu_Y$ and covariance matrices $\Sigma_X$ and $\Sigma_Y$.

Let $Z' = (X', Y')$ and note that $Z \sim (\mu, \Sigma)$ where $\mu' = (\mu_X', \mu_Y')$ and

$$\Sigma = \begin{pmatrix} \Sigma_X & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_Y \end{pmatrix}$$

where $\Sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)']$ is the covariance between $X$ and $Y$.

Define new variables $U$ and $V$ via linear combinations of $X$ and $Y$:

$$U = a'X, \qquad V = b'Y$$


Defining Canonical Variates (and Correlations)

Note that $U = a'X$ and $V = b'Y$ have properties

$$
\begin{aligned}
\mathrm{Var}(U) &= a'\Sigma_X a \\
\mathrm{Var}(V) &= b'\Sigma_Y b \\
\mathrm{Cov}(U, V) &= a'\Sigma_{XY} b
\end{aligned}
$$

The first pair of canonical variates $(U_1, V_1)$ is defined via the pair of
linear combination vectors $\{a_1, b_1\}$ that maximize

$$\mathrm{Cor}(U, V) = \frac{\mathrm{Cov}(U, V)}{\sqrt{\mathrm{Var}(U)}\sqrt{\mathrm{Var}(V)}} = \frac{a'\Sigma_{XY} b}{\sqrt{a'\Sigma_X a}\sqrt{b'\Sigma_Y b}}$$

subject to $U_1$ and $V_1$ having unit variance.

Remaining canonical variates $(U_\ell, V_\ell)$ maximize the above subject to
having unit variance and being uncorrelated with $(U_k, V_k)$ for all $k < \ell$.

Computing Canonical Variates (and Correlations)

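The $k$-th pair of canonical variates is given by

$$U_k = \underbrace{u_k' \Sigma_X^{-1/2}}_{a_k'} X \quad\text{and}\quad V_k = \underbrace{v_k' \Sigma_Y^{-1/2}}_{b_k'} Y$$

where
$u_k$ is the $k$-th eigenvector of $\Sigma_X^{-1/2} \Sigma_{XY} \Sigma_Y^{-1} \Sigma_{YX} \Sigma_X^{-1/2}$
$v_k$ is the $k$-th eigenvector of $\Sigma_Y^{-1/2} \Sigma_{YX} \Sigma_X^{-1} \Sigma_{XY} \Sigma_Y^{-1/2}$

The $k$-th canonical correlation is given by

$$\mathrm{Cor}(U_k, V_k) = \rho_k$$

where $\rho_k^2$ is the $k$-th eigenvalue of $\Sigma_X^{-1/2} \Sigma_{XY} \Sigma_Y^{-1} \Sigma_{YX} \Sigma_X^{-1/2}$
[$\rho_k^2$ is also the $k$-th eigenvalue of $\Sigma_Y^{-1/2} \Sigma_{YX} \Sigma_X^{-1} \Sigma_{XY} \Sigma_Y^{-1/2}$]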
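
The slides state the eigenvector solution without deriving it. The standard argument is sketched below as a reading aid; the symbols $u$, $v$, and $M$ are introduced here for illustration and are not notation from the original notes. Substituting $u = \Sigma_X^{1/2} a$ and $v = \Sigma_Y^{1/2} b$ turns the constrained maximization into a singular value problem:

```latex
\begin{aligned}
\mathrm{Cor}(U, V)
  &= \frac{a'\Sigma_{XY} b}{\sqrt{a'\Sigma_X a}\,\sqrt{b'\Sigma_Y b}}
   = \frac{u' M v}{\sqrt{u'u}\,\sqrt{v'v}},
  \qquad M = \Sigma_X^{-1/2}\,\Sigma_{XY}\,\Sigma_Y^{-1/2} \\[4pt]
\max_{u'u = v'v = 1} u' M v
  &= \rho_1 \quad \text{(the largest singular value of } M\text{)} \\[4pt]
M M' u_k &= \rho_k^2\, u_k,
  \qquad M M' = \Sigma_X^{-1/2}\,\Sigma_{XY}\,\Sigma_Y^{-1}\,\Sigma_{YX}\,\Sigma_X^{-1/2} \\
M' M v_k &= \rho_k^2\, v_k,
  \qquad M' M = \Sigma_Y^{-1/2}\,\Sigma_{YX}\,\Sigma_X^{-1}\,\Sigma_{XY}\,\Sigma_Y^{-1/2}
\end{aligned}
```

Under this reading, $u_k$ and $v_k$ are the left and right singular vectors of $M$, and mapping back with $a_k = \Sigma_X^{-1/2} u_k$ and $b_k = \Sigma_Y^{-1/2} v_k$ gives the coefficient vectors above.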

Covariance of Original and Canonical Variables

$U = A'X$ and $V = B'Y$ where $A = [a_1, \ldots, a_p]$ and $B = [b_1, \ldots, b_q]$.

$U = (U_1, \ldots, U_p)'$ contains the $p$ canonical variates from $X$
$V = (V_1, \ldots, V_q)'$ contains the $q$ canonical variates from $Y$
If $p \le q$, we are interested in the first $p$ canonical variates from $Y$

The canonical variates and original variables have covariance matrices

$$
\begin{aligned}
\mathrm{Cov}(U, X) &= \mathrm{Cov}(A'X, X) = A'\Sigma_X \\
\mathrm{Cov}(U, Y) &= \mathrm{Cov}(A'X, Y) = A'\Sigma_{XY} \\
\mathrm{Cov}(V, X) &= \mathrm{Cov}(B'Y, X) = B'\Sigma_{YX} \\
\mathrm{Cov}(V, Y) &= \mathrm{Cov}(B'Y, Y) = B'\Sigma_Y
\end{aligned}
$$


Correlation of Original and Canonical Variables

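The canonical variates and original variables have correlation matrices

$$
\begin{aligned}
\mathrm{Cor}(U, X) &= \mathrm{Cov}(A'X, \tilde{\Sigma}_X^{-1/2} X) = A'\Sigma_X \tilde{\Sigma}_X^{-1/2} \\
\mathrm{Cor}(U, Y) &= \mathrm{Cov}(A'X, \tilde{\Sigma}_Y^{-1/2} Y) = A'\Sigma_{XY} \tilde{\Sigma}_Y^{-1/2} \\
\mathrm{Cor}(V, X) &= \mathrm{Cov}(B'Y, \tilde{\Sigma}_X^{-1/2} X) = B'\Sigma_{YX} \tilde{\Sigma}_X^{-1/2} \\
\mathrm{Cor}(V, Y) &= \mathrm{Cov}(B'Y, \tilde{\Sigma}_Y^{-1/2} Y) = B'\Sigma_Y \tilde{\Sigma}_Y^{-1/2}
\end{aligned}
$$

given that $\mathrm{Var}(U_k) = \mathrm{Var}(V_\ell) = 1$ for all $k, \ell$.

$\tilde{\Sigma}_X = \mathrm{diag}(\Sigma_X)$ is a diagonal matrix containing the $X$ variances
$\tilde{\Sigma}_Y = \mathrm{diag}(\Sigma_Y)$ is a diagonal matrix containing the $Y$ variances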


Canonical Variates and Summarizing Variability

The linear transformations $U = A'X$ and $V = B'Y$ are defined to
maximize the correlations between the canonical variables.

This is not the same as maximizing the explained variance in $\Sigma_X$ or $\Sigma_Y$.

If the first few pairs of canonical variables do not explain the
variability in $\Sigma_X$ and $\Sigma_Y$ well, then the interpretation becomes less clear.


Canonical Correlations Sample Estimates

Moving to the Sample Situation

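Assume that $z_i = (x_i', y_i')' \overset{\mathrm{iid}}{\sim} N(\mu, \Sigma)$ where

$$\mu = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix} \quad\text{and}\quad \Sigma = \begin{pmatrix} \Sigma_X & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_Y \end{pmatrix}$$

and let the sample mean vector and covariance matrix be denoted by

$$\bar{z} = \begin{pmatrix} \bar{x} \\ \bar{y} \end{pmatrix} \quad\text{and}\quad S = \begin{pmatrix} S_X & S_{XY} \\ S_{YX} & S_Y \end{pmatrix}$$

where

$$
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i \quad\text{and}\quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \\
S_X &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})' \\
S_Y &= \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})(y_i - \bar{y})' \\
S_{XY} &= S_{YX}' = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})'
\end{aligned}
$$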
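
The block formulas above can be checked against R's built-in `cov()`. The following is a minimal sketch on simulated data; the data, seed, and dimensions are illustrative assumptions and are not the decathlon data used later in these notes.

```r
# simulated data (hypothetical, for illustration only)
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 3), n, 3)   # p = 3
Y <- matrix(rnorm(n * 2), n, 2)   # q = 2

# center each column by subtracting its sample mean
Xc <- sweep(X, 2, colMeans(X))
Yc <- sweep(Y, 2, colMeans(Y))

# sample covariance blocks from the definitions (divisor n - 1)
Sx  <- crossprod(Xc) / (n - 1)
Sy  <- crossprod(Yc) / (n - 1)
Sxy <- crossprod(Xc, Yc) / (n - 1)

# assemble the partitioned matrix S and compare with cov() on the stacked data
S <- rbind(cbind(Sx, Sxy), cbind(t(Sxy), Sy))
all.equal(S, cov(cbind(X, Y)))
```

The same blocks are what `cov(X)`, `cov(Y)`, and `cov(X, Y)` return directly, which is how the decathlon example computes them.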


Defining Canonical Variates (and Correlations)

Note that $U = a'X$ and $V = b'Y$ have sample properties

$$
\begin{aligned}
\widehat{\mathrm{Var}}(U) &= a'S_X a \\
\widehat{\mathrm{Var}}(V) &= b'S_Y b \\
\widehat{\mathrm{Cov}}(U, V) &= a'S_{XY} b
\end{aligned}
$$

The first pair of sample canonical variates $(U_1, V_1)$ is defined via the
pair of linear combination vectors $\{a_1, b_1\}$ that maximize

$$\widehat{\mathrm{Cor}}(U, V) = \frac{\widehat{\mathrm{Cov}}(U, V)}{\sqrt{\widehat{\mathrm{Var}}(U)}\sqrt{\widehat{\mathrm{Var}}(V)}} = \frac{a'S_{XY} b}{\sqrt{a'S_X a}\sqrt{b'S_Y b}}$$

subject to $U_1$ and $V_1$ having unit variance.

Remaining canonical variates $(U_\ell, V_\ell)$ maximize the above subject to
having unit variance and being uncorrelated with $(U_k, V_k)$ for all $k < \ell$.
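
The sample identities for linear combinations can be verified numerically. Below is a small sketch, assuming simulated data and arbitrary weight vectors `a` and `b` (both hypothetical choices made only for this illustration): the correlation computed from the covariance blocks matches the correlation of the linear combinations themselves.

```r
# simulated data (hypothetical, for illustration only)
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)                    # p = 3
Y <- cbind(X[, 1] + rnorm(n), X[, 2] + rnorm(n))   # q = 2, correlated with X

a <- c(1, -0.5, 2)   # arbitrary weights for X (not canonical coefficients)
b <- c(0.3, 1)       # arbitrary weights for Y (not canonical coefficients)

U <- drop(X %*% a)   # U = a'x_i for each observation
V <- drop(Y %*% b)   # V = b'y_i for each observation

Sx <- cov(X); Sy <- cov(Y); Sxy <- cov(X, Y)

# correlation implied by the covariance blocks
cor_formula <- drop(t(a) %*% Sxy %*% b) /
  sqrt(drop(t(a) %*% Sx %*% a) * drop(t(b) %*% Sy %*% b))

# correlation computed directly from the linear combinations
cor_direct <- cor(U, V)

all.equal(cor_formula, cor_direct)
```

CCA then searches over all `a` and `b` for the pair that makes this quantity as large as possible.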

Calculating Canonical Variates (and Correlations)

The sample estimate of the $k$-th pair of canonical variates is given by

$$\hat{U}_k = \underbrace{\hat{u}_k' S_X^{-1/2}}_{\hat{a}_k'} X \quad\text{and}\quad \hat{V}_k = \underbrace{\hat{v}_k' S_Y^{-1/2}}_{\hat{b}_k'} Y$$

where
$\hat{u}_k$ is the $k$-th eigenvector of $S_X^{-1/2} S_{XY} S_Y^{-1} S_{YX} S_X^{-1/2}$
$\hat{v}_k$ is the $k$-th eigenvector of $S_Y^{-1/2} S_{YX} S_X^{-1} S_{XY} S_Y^{-1/2}$

The sample estimate of the $k$-th canonical correlation is given by

$$\widehat{\mathrm{Cor}}(U_k, V_k) = \hat{\rho}_k$$

where $\hat{\rho}_k^2$ is the $k$-th eigenvalue of $S_X^{-1/2} S_{XY} S_Y^{-1} S_{YX} S_X^{-1/2}$
[$\hat{\rho}_k^2$ is also the $k$-th eigenvalue of $S_Y^{-1/2} S_{YX} S_X^{-1} S_{XY} S_Y^{-1/2}$]

Covariance of Original and Canonical Variables

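$\hat{U} = \hat{A}'X$ and $\hat{V} = \hat{B}'Y$ where $\hat{A} = [\hat{a}_1, \ldots, \hat{a}_p]$ and $\hat{B} = [\hat{b}_1, \ldots, \hat{b}_q]$.
$\hat{U} = (\hat{U}_1, \ldots, \hat{U}_p)'$ contains the $p$ canonical variates from $X$
$\hat{V} = (\hat{V}_1, \ldots, \hat{V}_q)'$ contains the $q$ canonical variates from $Y$
If $p \le q$, we are interested in the first $p$ canonical variates from $Y$

The sample canonical variates and original variables have covariances

$$
\begin{aligned}
\widehat{\mathrm{Cov}}(\hat{U}, X) &= \widehat{\mathrm{Cov}}(\hat{A}'X, X) = \hat{A}'S_X \\
\widehat{\mathrm{Cov}}(\hat{U}, Y) &= \widehat{\mathrm{Cov}}(\hat{A}'X, Y) = \hat{A}'S_{XY} \\
\widehat{\mathrm{Cov}}(\hat{V}, X) &= \widehat{\mathrm{Cov}}(\hat{B}'Y, X) = \hat{B}'S_{YX} \\
\widehat{\mathrm{Cov}}(\hat{V}, Y) &= \widehat{\mathrm{Cov}}(\hat{B}'Y, Y) = \hat{B}'S_Y
\end{aligned}
$$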


Correlation of Original and Canonical Variables

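The sample canonical variates and original variables have correlations

$$
\begin{aligned}
\widehat{\mathrm{Cor}}(\hat{U}, X) &= \widehat{\mathrm{Cov}}(\hat{A}'X, \tilde{S}_X^{-1/2} X) = \hat{A}'S_X \tilde{S}_X^{-1/2} \\
\widehat{\mathrm{Cor}}(\hat{U}, Y) &= \widehat{\mathrm{Cov}}(\hat{A}'X, \tilde{S}_Y^{-1/2} Y) = \hat{A}'S_{XY} \tilde{S}_Y^{-1/2} \\
\widehat{\mathrm{Cor}}(\hat{V}, X) &= \widehat{\mathrm{Cov}}(\hat{B}'Y, \tilde{S}_X^{-1/2} X) = \hat{B}'S_{YX} \tilde{S}_X^{-1/2} \\
\widehat{\mathrm{Cor}}(\hat{V}, Y) &= \widehat{\mathrm{Cov}}(\hat{B}'Y, \tilde{S}_Y^{-1/2} Y) = \hat{B}'S_Y \tilde{S}_Y^{-1/2}
\end{aligned}
$$

given that $\mathrm{Var}(\hat{U}_k) = \mathrm{Var}(\hat{V}_\ell) = 1$ for all $k, \ell$.

$\tilde{S}_X = \mathrm{diag}(S_X)$ is a diagonal matrix containing the $X$ variances
$\tilde{S}_Y = \mathrm{diag}(S_Y)$ is a diagonal matrix containing the $Y$ variances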
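
The notes later check the covariance formulas in R but not these correlation formulas. A minimal sketch of the analogous check follows, on simulated data (an assumption made for illustration). It uses `cancor()` and rescales its coefficients by `sqrt(n-1)` so that the canonical variates have unit variance, as the decathlon example does.

```r
# simulated data (hypothetical, for illustration only)
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)                                  # p = 3
Y <- cbind(X[, 1] + rnorm(n), X[, 2] - X[, 3] + rnorm(n),
           rnorm(n), rnorm(n))                                   # q = 4

cca  <- cancor(X, Y)
Ahat <- cca$xcoef * sqrt(n - 1)   # rescaled so that cov(U) is the identity
U    <- X %*% Ahat                # canonical variates for the X set

Sx  <- cov(X)
Sxy <- cov(X, Y)

# inverse square roots of the diagonal variance matrices
Dx_isqrt <- diag(1 / sqrt(diag(Sx)))
Dy_isqrt <- diag(1 / sqrt(diag(cov(Y))))

# correlations from the formulas
cor_UX_formula <- crossprod(Ahat, Sx)  %*% Dx_isqrt   # A' S_X  S~_X^{-1/2}
cor_UY_formula <- crossprod(Ahat, Sxy) %*% Dy_isqrt   # A' S_XY S~_Y^{-1/2}

# compare with correlations computed directly (differences near machine precision)
max(abs(cor_UX_formula - cor(U, X)))
max(abs(cor_UY_formula - cor(U, Y)))
```

These correlations are often easier to interpret than the raw coefficients because they do not depend on the measurement units of the original variables.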


Covariance Matrix Implied by CCA for X

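Note that we have the following property

$$\widehat{\mathrm{Cov}}(\hat{U}) = \hat{A}'S_X \hat{A} = I_p$$

This implies that we can write

$$
\begin{aligned}
\hat{A}'S_X \hat{A} &= I_p \\
(\hat{A}')^{-1} \hat{A}'S_X \hat{A} (\hat{A}^{-1}) &= (\hat{A}')^{-1} (\hat{A}^{-1}) \\
S_X &= (\hat{A}^{-1})' (\hat{A}^{-1}) \\
    &= \sum_{j=1}^{p} \hat{a}^{(j)} (\hat{a}^{(j)})'
\end{aligned}
$$

where $\hat{a}^{(j)}$ denotes the $j$-th column of $(\hat{A}^{-1})'$.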


Covariance Matrix Implied by CCA for Y

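Note that we have the following property

$$\widehat{\mathrm{Cov}}(\hat{V}) = \hat{B}'S_Y \hat{B} = I_q$$

This implies that we can write

$$
\begin{aligned}
\hat{B}'S_Y \hat{B} &= I_q \\
(\hat{B}')^{-1} \hat{B}'S_Y \hat{B} (\hat{B}^{-1}) &= (\hat{B}')^{-1} (\hat{B}^{-1}) \\
S_Y &= (\hat{B}^{-1})' (\hat{B}^{-1}) \\
    &= \sum_{j=1}^{q} \hat{b}^{(j)} (\hat{b}^{(j)})'
\end{aligned}
$$

where $\hat{b}^{(j)}$ denotes the $j$-th column of $(\hat{B}^{-1})'$.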


Covariance Matrix Implied by CCA for (X , Y )

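Note that we have the following property (assuming $p < q$)

$$
\widehat{\mathrm{Cov}}(\hat{U}, \hat{V}) = \hat{A}'S_{XY} \hat{B} =
\begin{pmatrix}
\hat{\rho}_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \hat{\rho}_2 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \hat{\rho}_p & 0 & \cdots & 0
\end{pmatrix}
= \hat{\boldsymbol{\rho}}
$$

This implies that we can write

$$
\begin{aligned}
\hat{A}'S_{XY} \hat{B} &= \hat{\boldsymbol{\rho}} \\
(\hat{A}')^{-1} \hat{A}'S_{XY} \hat{B} (\hat{B}^{-1}) &= (\hat{A}')^{-1} \hat{\boldsymbol{\rho}} (\hat{B}^{-1}) \\
S_{XY} &= (\hat{A}^{-1})' \hat{\boldsymbol{\rho}} (\hat{B}^{-1}) \\
       &= \sum_{j=1}^{p} \hat{\rho}_j \, \hat{a}^{(j)} (\hat{b}^{(j)})'
\end{aligned}
$$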


CCA Error of Approximation Matrices

Using $r < p$ canonical variates, the approximation error matrices are

$$
\begin{aligned}
E_X &= S_X - \sum_{j=1}^{r} \hat{a}^{(j)} (\hat{a}^{(j)})' = \sum_{k=r+1}^{p} \hat{a}^{(k)} (\hat{a}^{(k)})' \\
E_Y &= S_Y - \sum_{j=1}^{r} \hat{b}^{(j)} (\hat{b}^{(j)})' = \sum_{k=r+1}^{q} \hat{b}^{(k)} (\hat{b}^{(k)})' \\
E_{XY} &= S_{XY} - \sum_{j=1}^{r} \hat{\rho}_j \, \hat{a}^{(j)} (\hat{b}^{(j)})' = \sum_{k=r+1}^{p} \hat{\rho}_k \, \hat{a}^{(k)} (\hat{b}^{(k)})'
\end{aligned}
$$

The error matrices provide a descriptive measure of how well the first $r$
pairs of canonical variates explain the covariation in the data.
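
Because the raw size of an error matrix depends on the measurement units, it can help to report it relative to the matrix being approximated. The sketch below, on simulated data, checks that the two expressions for $E_X$ agree and then computes a relative Frobenius-norm error. The relative measure `rel_err` is an illustrative addition and is not a quantity defined in the original notes.

```r
# simulated data (hypothetical, for illustration only)
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)                                  # p = 3
Y <- cbind(X[, 1] + rnorm(n), X[, 2] - X[, 3] + rnorm(n),
           rnorm(n), rnorm(n))                                   # q = 4
p <- ncol(X)

Sx   <- cov(X)
Ahat <- cancor(X, Y)$xcoef * sqrt(n - 1)   # unit-variance canonical coefficients
Ainv <- solve(Ahat)                        # row j of Ainv is the vector a^(j)

r <- 2
# E_X as S_X minus the first r rank-one terms
Ex <- Sx - crossprod(Ainv[1:r, , drop = FALSE])
# E_X as the sum of the remaining rank-one terms
Ex_tail <- crossprod(Ainv[(r + 1):p, , drop = FALSE])

all.equal(Ex, Ex_tail)   # the two forms agree

# size of the error relative to S_X (Frobenius norms); always between 0 and 1
rel_err <- sqrt(sum(Ex^2)) / sqrt(sum(Sx^2))
rel_err
```

The `drop = FALSE` argument keeps the selected rows as a matrix, so the same code also works when `r = 1` or when only one term remains.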
Canonical Correlations Large Sample Inference

Likelihood Ratio Test: Is CCA Worthwhile?

Note that if $\Sigma_{XY} = 0_{p \times q}$, then $\mathrm{Cov}(U, V) = a'\Sigma_{XY} b = 0$ for all $a$ and $b$.
This implies that all canonical correlations must be zero.
Then there is no point in pursuing CCA.

For large $n$, we reject $H_0: \Sigma_{XY} = 0_{p \times q}$ in favor of $H_1: \Sigma_{XY} \neq 0_{p \times q}$ if

$$-2\ln(\Lambda) = n \ln\!\left( \frac{|S_X|\,|S_Y|}{|S|} \right) = -n \sum_{j=1}^{p} \ln(1 - \hat{\rho}_j^2)$$

is larger than $\chi^2_{pq}(\alpha)$.

For an improvement to the $\chi^2$ approximation, Bartlett suggested
replacing the scaling factor of $n$ by $n - 1 - (1/2)(p + q + 1)$:

$$-2\ln(\Lambda) \approx -\left[ n - 1 - \tfrac{1}{2}(p + q + 1) \right] \sum_{j=1}^{p} \ln(1 - \hat{\rho}_j^2)$$
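
The test statistic is straightforward to compute from the sample canonical correlations. The following is a sketch only: `cca_lrt` is a hypothetical helper written for this illustration (it is not a function from base R or from the original notes), and the data are simulated. It also checks that the determinant form and the canonical-correlation form of the statistic agree.

```r
# hypothetical helper: large-sample test of H0: Sigma_XY = 0
cca_lrt <- function(rho, n, p, q) {
  log_term  <- sum(log(1 - rho^2))
  stat      <- -n * log_term                           # -2 ln(Lambda)
  stat_bart <- -(n - 1 - (p + q + 1) / 2) * log_term   # Bartlett's scaling
  df        <- p * q
  list(stat = stat,
       stat_bartlett = stat_bart,
       df = df,
       p_value = pchisq(stat, df, lower.tail = FALSE),
       p_value_bartlett = pchisq(stat_bart, df, lower.tail = FALSE))
}

# simulated data (hypothetical, for illustration only)
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)                                  # p = 3
Y <- cbind(X[, 1] + rnorm(n), X[, 2] - X[, 3] + rnorm(n),
           rnorm(n), rnorm(n))                                   # q = 4
p <- ncol(X); q <- ncol(Y)

rho <- cancor(X, Y)$cor
res <- cca_lrt(rho, n, p, q)

# determinant form of the same statistic: n * ln(|S_X||S_Y| / |S|)
S <- cov(cbind(X, Y))
stat_det <- n * (log(det(cov(X))) + log(det(cov(Y))) - log(det(S)))
all.equal(res$stat, stat_det)

res$df   # degrees of freedom p * q
```

Since Bartlett's factor is smaller than $n$, the corrected statistic is always smaller than the uncorrected one, which makes the test slightly more conservative.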
Decathlon Example


Decathlon Example Data Overview

Men’s Olympic Decathlon Data from 1988

Data from men’s 1988 Olympic decathlon

Total of n = 34 athletes
Have p = 10 variables giving score for each decathlon event
Have overall decathlon score also (score)

> decathlon[1:9,]
         run100 long.jump  shot high.jump run400 hurdle discus pole.vault javelin run1500 score
Schenk    11.25      7.43 15.48      2.27  48.90  15.13  49.28        4.7   61.32  268.95  8488
Voss      10.87      7.45 14.97      1.97  47.71  14.46  44.36        5.1   61.76  273.02  8399
Steen     11.18      7.44 14.20      1.97  48.29  14.81  43.66        5.2   64.16  263.20  8328
Thompson  10.62      7.38 15.02      2.03  49.06  14.72  44.80        4.9   64.04  285.11  8306
Blondel   11.02      7.43 12.92      1.97  47.44  14.40  41.20        5.2   57.46  256.64  8286
Plaziat   10.83      7.72 13.58      2.12  48.34  14.18  43.06        4.9   52.18  274.07  8272
Bright    11.18      7.05 14.12      2.06  49.34  14.39  41.68        5.7   61.60  291.20  8216
De.Wit    11.05      6.95 15.34      2.00  48.21  14.36  41.32        4.8   63.00  265.86  8189
Johnson   11.15      7.12 14.52      2.03  49.15  14.66  42.36        4.9   66.46  269.62  8180


Re-signing Running Events

For the running events (run100, run400, run1500, and hurdle),
lower scores correspond to better performance, whereas higher scores
represent better performance for the other events.

To make interpretation simpler, we will re-sign (i.e., negate) the running events:

> decathlon[,c(1,5,6,10)] <- (-1)*decathlon[,c(1,5,6,10)]
> decathlon[1:9,]
         run100 long.jump  shot high.jump run400 hurdle discus pole.vault javelin run1500 score
Schenk   -11.25      7.43 15.48      2.27 -48.90 -15.13  49.28        4.7   61.32 -268.95  8488
Voss     -10.87      7.45 14.97      1.97 -47.71 -14.46  44.36        5.1   61.76 -273.02  8399
Steen    -11.18      7.44 14.20      1.97 -48.29 -14.81  43.66        5.2   64.16 -263.20  8328
Thompson -10.62      7.38 15.02      2.03 -49.06 -14.72  44.80        4.9   64.04 -285.11  8306
Blondel  -11.02      7.43 12.92      1.97 -47.44 -14.40  41.20        5.2   57.46 -256.64  8286
Plaziat  -10.83      7.72 13.58      2.12 -48.34 -14.18  43.06        4.9   52.18 -274.07  8272
Bright   -11.18      7.05 14.12      2.06 -49.34 -14.39  41.68        5.7   61.60 -291.20  8216
De.Wit   -11.05      6.95 15.34      2.00 -48.21 -14.36  41.32        4.8   63.00 -265.86  8189
Johnson  -11.15      7.12 14.52      2.03 -49.15 -14.66  42.36        4.9   66.46 -269.62  8180


Decathlon Example Two Sets of Variables

Split Events into Two Sets: Arms vs Legs

We will split the decathlon events into two different sets:

X : shot, discus, javelin, pole.vault
Y : run100, run400, run1500, hurdle, long.jump, high.jump

Note that X are “arm events” (throwing/vaulting), whereas Y are “leg
events” (running/jumping).

R code to split the decathlon data into two sets of events

> X <- as.matrix(decathlon[,c("shot", "discus", "javelin", "pole.vault")])
> Y <- as.matrix(decathlon[,c("run100", "run400", "run1500", "hurdle", "long.jump", "high.jump")])
> n <- nrow(X) # n = 34
> p <- ncol(X) # p = 4
> q <- ncol(Y) # q = 6


Decathlon Example Canonical Correlation Analysis of Unstandardized Data

CCA of Decathlon Data in R

R code to conduct canonical correlation analysis

# canonical correlations of covariance (unstandardized data)
> cca <- cancor(X, Y)

# cca (the normal way)

> Sx <- cov(X)
> Sy <- cov(Y)
> Sxy <- cov(X,Y)
> Sxeig <- eigen(Sx, symmetric=TRUE)
> Sxisqrt <- Sxeig$vectors %*% diag(1/sqrt(Sxeig$values)) %*% t(Sxeig$vectors)
> Syeig <- eigen(Sy, symmetric=TRUE)
> Syisqrt <- Syeig$vectors %*% diag(1/sqrt(Syeig$values)) %*% t(Syeig$vectors)
> Xmat <- Sxisqrt %*% Sxy %*% solve(Sy) %*% t(Sxy) %*% Sxisqrt
> Ymat <- Syisqrt %*% t(Sxy) %*% solve(Sx) %*% Sxy %*% Syisqrt
> Xeig <- eigen(Xmat, symmetric=TRUE)
> Yeig <- eigen(Ymat, symmetric=TRUE)

# compare correlations (same)

> cca$cor
[1] 0.7702006 0.5033532 0.4184145 0.3052556
> rho <- sqrt(Xeig$values)
> rho
[1] 0.7702006 0.5033532 0.4184145 0.3052556
> sqrt(Yeig$values[1:p])
[1] 0.7702006 0.5033532 0.4184145 0.3052556


CCA of Decathlon Data in R (continued)

R code to compare the CCA coefficients:
# compare linear combinations (different!)
> Ahat <- Sxisqrt %*% Xeig$vectors
> Bhat <- Syisqrt %*% Yeig$vectors
> sum((cca$xcoef - Ahat)^2)
[1] 6.710414
> sum((cca$ycoef[,1:p] - Bhat[,1:p])^2)
[1] 42.98483

# NOTE: you need to multiply R’s xcoef and ycoef by sqrt(n-1)

# to obtain the results we are expecting...

# compare linear combinations (same!)

> Ahat <- Sxisqrt %*% Xeig$vectors
> Bhat <- Syisqrt %*% Yeig$vectors
> sum((cca$xcoef * sqrt(n-1) - Ahat)^2)
[1] 3.031301e-28
> sum((cca$ycoef[,1:p] * sqrt(n-1) - Bhat[,1:p])^2)
[1] 2.414499e-25
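
The `sqrt(n-1)` factor is consistent with `cancor()` scaling its coefficients so that the centered canonical variates have unit sum of squares rather than unit sample variance. A short sketch on simulated data (an assumption for illustration, not the decathlon data) makes the difference visible:

```r
# simulated data (hypothetical, for illustration only)
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)                                  # p = 3
Y <- cbind(X[, 1] + rnorm(n), X[, 2] - X[, 3] + rnorm(n),
           rnorm(n), rnorm(n))                                   # q = 4

cca <- cancor(X, Y)
Xc  <- scale(X, center = TRUE, scale = FALSE)   # column-centered X

# canonical variates using cancor's coefficients as returned
U_raw <- Xc %*% cca$xcoef
ss_raw  <- crossprod(U_raw)   # sum-of-squares matrix: identity
cov_raw <- cov(U_raw)         # identity divided by (n - 1)

# canonical variates after rescaling the coefficients by sqrt(n - 1)
U_scaled   <- Xc %*% (cca$xcoef * sqrt(n - 1))
cov_scaled <- cov(U_scaled)   # identity

round(ss_raw, 4)
round(cov_raw * (n - 1), 4)
round(cov_scaled, 4)
```

Either scaling yields the same canonical correlations; only the length of the coefficient vectors changes.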

Plot the CCA Coefficients

[Figure: two scatter plots of the CCA coefficients, with points labeled by event name. Left panel, "X Coefficients": A2 Coefficients versus A1 Coefficients for shot, discus, javelin, and pole.vault. Right panel, "Y Coefficients": B2 Coefficients versus B1 Coefficients for run100, run400, run1500, hurdle, long.jump, and high.jump.]

R code for left plot:

plot(Ahat[,1:2], xlab="A1 Coefficients", ylab="A2 Coefficients",
type="n", main="X Coefficients", xlim=c(-2, 0.1), ylim=c(-1.1, 0.5))
text(Ahat[,1:2], labels=colnames(X))


Define the Canonical Variables

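If $X = \{x_{ij}\}_{n \times p}$ and $Y = \{y_{ij}\}_{n \times q}$, then

$\hat{U} = X\hat{A} = \{\hat{u}_{ij}\}_{n \times p}$ where columns of $\hat{U}$ contain the canonical
variables for the $X$ set
$\hat{V} = Y\hat{B} = \{\hat{v}_{ij}\}_{n \times q}$ where columns of $\hat{V}$ contain the canonical
variables for the $Y$ set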

R code to define canonical variables:

> U <- X %*% Ahat
> V <- Y %*% Bhat


Covariance Matrices of Canonical Variables

R code to check covariance matrices of the canonical variables:
# canonical variable covariances
> round(cov(U),4)
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 0 0 1 0
[4,] 0 0 0 1

> round(cov(V),4)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 0 0 0 0 0
[2,] 0 1 0 0 0 0
[3,] 0 0 1 0 0 0
[4,] 0 0 0 1 0 0
[5,] 0 0 0 0 1 0
[6,] 0 0 0 0 0 1

> round(cov(U,V),4)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.7702 0.0000 0.0000 0.0000 0 0
[2,] 0.0000 0.5034 0.0000 0.0000 0 0
[3,] 0.0000 0.0000 0.4184 0.0000 0 0
[4,] 0.0000 0.0000 0.0000 0.3053 0 0

> rho
[1] 0.7702006 0.5033532 0.4184145 0.3052556


Covariances of Canonical and Observed Variables

R code to check covariances of the canonical and observed variables:
# covariance of original and canonical variables (U and X)
> Ainv <- solve(Ahat)
> sum( ( cov(U, X) - crossprod(Ahat, Sx) )^2 )
[1] 3.396329e-30
> sum( ( Sx - crossprod(Ainv) )^2 )
[1] 4.364327e-27

# covariance of original and canonical variables (V and Y)

> Binv <- solve(Bhat)
> sum( ( cov(V, Y) - crossprod(Bhat, Sy) )^2 )
[1] 1.696269e-28
> sum( ( Sy - crossprod(Binv) )^2 )
[1] 3.027024e-26

# covariance of original and canonical variables (U and Y)

> sum( (cov(U, Y) - crossprod(Ahat, Sxy))^2 )
[1] 2.071712e-29

# covariance of original and canonical variables (V and X)

> sum( (cov(V, X) - crossprod(Bhat, t(Sxy)))^2 )
[1] 2.943246e-28

# covariance of canonical variables (U and V)

> rhomat <- cbind(diag(rho), matrix(0, p, q-p))
> sum( (cov(U, V) - rhomat)^2 )
[1] 1.241068e-27
> sum( (Sxy - crossprod(Ainv, rhomat) %*% Binv)^2 )
[1] 1.355523e-25


Error of Approximation Matrices (r = 2)

R code to calculate error of approximation matrices with r = 2:

# error of approximation matrices (with r=2)

> Ainv <- solve(Ahat)
> Binv <- solve(Bhat)
> r <- 2
> Ex <- Sx - crossprod(Ainv[1:r,])
> Ey <- Sy - crossprod(Binv[1:r,])
> Exy <- Sxy - crossprod(diag(rho[1:r]) %*% Ainv[1:r,], Binv[1:r,])

# get norms of error matrices

> sqrt(mean(Ex^2))
[1] 6.561393
> sqrt(mean(Ey^2))
[1] 18.37339
> sqrt(mean(Exy^2))
[1] 1.725392


Decathlon Example Canonical Correlation Analysis of Standardized Data

CCA of Standardized Decathlon Data in R

R code to conduct standardized canonical correlation analysis
# standardize data
> Xs <- scale(X)
> Ys <- scale(Y)

# canonical correlations of correlation (standardized data)

> ccas <- cancor(Xs, Ys)

# cca (the normal way)

> Sx <- cov(Xs)
> Sy <- cov(Ys)
> Sxy <- cov(Xs,Ys)
> Sxeig <- eigen(Sx, symmetric=TRUE)
> Sxisqrt <- Sxeig$vectors %*% diag(1/sqrt(Sxeig$values)) %*% t(Sxeig$vectors)
> Syeig <- eigen(Sy, symmetric=TRUE)
> Syisqrt <- Syeig$vectors %*% diag(1/sqrt(Syeig$values)) %*% t(Syeig$vectors)
> Xmat <- Sxisqrt %*% Sxy %*% solve(Sy) %*% t(Sxy) %*% Sxisqrt
> Ymat <- Syisqrt %*% t(Sxy) %*% solve(Sx) %*% Sxy %*% Syisqrt
> Xeig <- eigen(Xmat, symmetric=TRUE)
> Yeig <- eigen(Ymat, symmetric=TRUE)

# compare correlations (same)

> ccas$cor
[1] 0.7702006 0.5033532 0.4184145 0.3052556
> sqrt(Xeig$values)
[1] 0.7702006 0.5033532 0.4184145 0.3052556
> sqrt(Yeig$values[1:p])
[1] 0.7702006 0.5033532 0.4184145 0.3052556
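
The canonical correlations here match those from the unstandardized analysis because CCA is unaffected by rescaling individual variables; only the coefficients change. A minimal sketch on simulated data (an assumption for illustration) shows both facts: the correlations are identical, and the standardized coefficients equal the unstandardized ones multiplied by each variable's standard deviation, up to sign.

```r
# simulated data with X columns on very different scales (hypothetical)
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), n, 3) %*% diag(c(1, 10, 100))
Y <- cbind(X[, 1] + rnorm(n), X[, 2] / 10 - X[, 3] / 100 + rnorm(n),
           rnorm(n), rnorm(n))

cca  <- cancor(X, Y)                  # unstandardized data
ccas <- cancor(scale(X), scale(Y))    # standardized data

# canonical correlations are the same
all.equal(cca$cor, ccas$cor)

# row i of the raw coefficient matrix multiplied by sd of variable i
sx <- apply(X, 2, sd)
xcoef_rescaled <- cca$xcoef * sx

# compare absolute values because the sign of each column is arbitrary
max_diff <- max(abs(abs(ccas$xcoef) - abs(xcoef_rescaled)))
max_diff
```

This is why the coefficient plots for standardized and unstandardized data can look quite different even though the canonical correlations agree.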


CCA of Standardized Decathlon Data in R (continued)

R code to compare the CCA coefficients:
# compare linear combinations (different?)
> Ahat <- Sxisqrt %*% Xeig$vectors
> Bhat <- Syisqrt %*% Yeig$vectors
> sum((ccas$xcoef * sqrt(n-1) - Ahat)^2)
[1] 3.332536e-29
> sum((ccas$ycoef[,1:p] * sqrt(n-1) - Bhat[,1:p])^2)
[1] 11.59453

# note that the signing is arbitrary!!

> ccas$ycoef[,1:p] * sqrt(n-1)
[,1] [,2] [,3] [,4]
run100 -0.1439138 -0.2404940 0.5274876 -0.13754449
run400 -0.1373435 0.7655659 -1.2826821 0.96359176
run1500 0.3023537 -1.0519285 -0.1514027 -0.52923644
hurdle -0.4396044 -1.0374417 0.6303782 0.49905604
long.jump -0.3564702 0.4110878 -0.0253127 -1.09325282
high.jump -0.1855627 0.5731149 -0.2615838 -0.09007821
> Bhat[,1:p]
[,1] [,2] [,3] [,4]
[1,] 0.1439138 -0.2404940 -0.5274876 -0.13754449
[2,] 0.1373435 0.7655659 1.2826821 0.96359176
[3,] -0.3023537 -1.0519285 0.1514027 -0.52923644
[4,] 0.4396044 -1.0374417 -0.6303782 0.49905604
[5,] 0.3564702 0.4110878 0.0253127 -1.09325282
[6,] 0.1855627 0.5731149 0.2615838 -0.09007821
> Bhat[,1:p] <- Bhat[,1:p] %*% diag(c(-1,1,-1,1))
> sum((ccas$ycoef[,1:p] * sqrt(n-1) - Bhat[,1:p])^2)
[1] 1.132493e-28
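
Rather than reading the sign pattern off the printed output and hard-coding `diag(c(-1,1,-1,1))`, the flips can be determined automatically. The sketch below uses `align_signs`, a hypothetical helper written for this illustration (not part of base R or the original notes), which flips each column so that it points in the same direction as the matching column of a reference matrix.

```r
# hypothetical helper: match the column signs of B to those of ref
align_signs <- function(B, ref) {
  s <- sign(colSums(B * ref))   # +1 or -1 per column (0 if orthogonal)
  s[s == 0] <- 1                # leave a column unchanged if the sign is undetermined
  sweep(B, 2, s, "*")           # multiply column j by s[j]
}

# toy example: B equals ref except that the first column is sign-flipped
ref <- matrix(c(1, 2, 3,
                4, -5, 6), nrow = 3)
B   <- ref %*% diag(c(-1, 1))

B_aligned <- align_signs(B, ref)
all.equal(B_aligned, ref)
```

With the objects from this example, a call such as `align_signs(Bhat[,1:p], ccas$ycoef[,1:p])` would presumably reproduce the manual flip, since only the direction of each column matters and not its scale.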


Plot the Standardized CCA Coefficients

[Figure: two scatter plots of the standardized CCA coefficients, with points labeled by event name. Left panel, "X Coefficients": A2 Coefficients versus A1 Coefficients for shot, discus, javelin, and pole.vault. Right panel, "Y Coefficients": B2 Coefficients versus B1 Coefficients for run100, run400, run1500, hurdle, long.jump, and high.jump.]

R code for left plot:

plot(Ahat[,1:2], xlab="A1 Coefficients", ylab="A2 Coefficients",
type="n", main="X Coefficients", xlim=c(-2, 0.1), ylim=c(-1.1, 0.5))
text(Ahat[,1:2], labels=colnames(X))


Define the Canonical Variables

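If $X_s = \{(x_{ij} - \bar{x}_j)/s_{x_j}\}_{n \times p}$ and $Y_s = \{(y_{ij} - \bar{y}_j)/s_{y_j}\}_{n \times q}$, then
$\hat{U} = X_s\hat{A} = \{\hat{u}_{ij}\}_{n \times p}$ where columns of $\hat{U}$ contain the canonical
variables for the $X_s$ set
$\hat{V} = Y_s\hat{B} = \{\hat{v}_{ij}\}_{n \times q}$ where columns of $\hat{V}$ contain the canonical
variables for the $Y_s$ set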

R code to define canonical variables:

> U <- Xs %*% Ahat
> V <- Ys %*% Bhat


Covariance Matrices of Canonical Variables

R code to check covariance matrices of the canonical variables:
# canonical variable covariances
> round(cov(U),4)
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 0 0 1 0
[4,] 0 0 0 1

> round(cov(V),4)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 0 0 0 0 0
[2,] 0 1 0 0 0 0
[3,] 0 0 1 0 0 0
[4,] 0 0 0 1 0 0
[5,] 0 0 0 0 1 0
[6,] 0 0 0 0 0 1

> round(cov(U,V),4)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.7702 0.0000 0.0000 0.0000 0 0
[2,] 0.0000 0.5034 0.0000 0.0000 0 0
[3,] 0.0000 0.0000 0.4184 0.0000 0 0
[4,] 0.0000 0.0000 0.0000 0.3053 0 0

> rho
[1] 0.7702006 0.5033532 0.4184145 0.3052556


Covariances of Canonical and Observed Variables

R code to check covariances of the canonical and observed variables:
# covariance of original and canonical variables (U and Xs)
> Ainv <- solve(Ahat)
> sum( ( cov(U, Xs) - crossprod(Ahat, Sx) )^2 )
[1] 2.759323e-31
> sum( ( Sx - crossprod(Ainv) )^2 )
[1] 6.569732e-30

# covariance of original and canonical variables (V and Ys)

> Binv <- solve(Bhat)
> sum( ( cov(V, Ys) - crossprod(Bhat, Sy) )^2 )
[1] 2.406961e-31
> sum( ( Sy - crossprod(Binv) )^2 )
[1] 3.136492e-29

# covariance of original and canonical variables (U and Ys)

> sum( (cov(U, Ys) - crossprod(Ahat, Sxy))^2 )
[1] 5.477785e-32

# covariance of original and canonical variables (V and Xs)

> sum( (cov(V, Xs) - crossprod(Bhat, t(Sxy)))^2 )
[1] 1.336149e-31

# covariance of canonical variables (U and V)

> rhomat <- cbind(diag(rho), matrix(0, p, q-p))
> sum( (cov(U, V) - rhomat)^2 )
[1] 1.272906e-29
> sum( (Sxy - crossprod(Ainv, rhomat) %*% Binv)^2 )
[1] 7.505349e-30


Error of Approximation Matrices (r = 2)

R code to calculate error of approximation matrices with r = 2:

# error of approximation matrices (with r=2)

> Ainv <- solve(Ahat)
> Binv <- solve(Bhat)
> r <- 2
> Ex <- Sx - crossprod(Ainv[1:r,])
> Ey <- Sy - crossprod(Binv[1:r,])
> Exy <- Sxy - crossprod(diag(rho[1:r]) %*% Ainv[1:r,], Binv[1:r,])

# get norms of error matrices

> sqrt(mean(Ex^2))
[1] 0.2432351
> sqrt(mean(Ey^2))
[1] 0.2296716
> sqrt(mean(Exy^2))
[1] 0.07458264

Nathaniel E. Helwig (U of Minnesota) Canonical Correlation Analysis Updated 16-Mar-2017 : Slide 39

Olah Data Normal Fix
No ratings yet
Olah Data Normal Fix
28 pages
Module 5 Kendall's Tau
100% (1)
Module 5 Kendall's Tau
10 pages
Jurnal Persalinan Normal
No ratings yet
Jurnal Persalinan Normal
8 pages
Title: Author: Publisher: Isbn10 - Asin: Print Isbn13: Ebook Isbn13: Language: Subject Publication Date: LCC: DDC: Subject
No ratings yet
Title: Author: Publisher: Isbn10 - Asin: Print Isbn13: Ebook Isbn13: Language: Subject Publication Date: LCC: DDC: Subject
137 pages
Correlation and Regression
No ratings yet
Correlation and Regression
10 pages
Kendall Rank Correlation Coefficient
100% (1)
Kendall Rank Correlation Coefficient
4 pages
PAHS 306: Session 5 - Simple Correlation
100% (1)
PAHS 306: Session 5 - Simple Correlation
14 pages
Random Variables: Presented by in Stochastic Analysis and Inverse Modelling
100% (1)
Random Variables: Presented by in Stochastic Analysis and Inverse Modelling
21 pages
The Specialist Lexicon: Allen C. Browne, Alexa T. Mccray, Suresh Srinivasan
No ratings yet
The Specialist Lexicon: Allen C. Browne, Alexa T. Mccray, Suresh Srinivasan
96 pages
Python Igraph
No ratings yet
Python Igraph
429 pages
CH-3d (Cov, Corr, & Indep)
No ratings yet
CH-3d (Cov, Corr, & Indep)
71 pages
Correlation - DPP 03 - (Aarambh 2024)
No ratings yet
Correlation - DPP 03 - (Aarambh 2024)
3 pages
Unit5 3
No ratings yet
Unit5 3
46 pages
1525695618CanonicalCorrelation 1
No ratings yet
1525695618CanonicalCorrelation 1
51 pages
OpenCOBOL Programmers Guide
0% (1)
OpenCOBOL Programmers Guide
237 pages
Canonical Correlation Analysis: James H. Steiger
No ratings yet
Canonical Correlation Analysis: James H. Steiger
35 pages
Canonical Corr
No ratings yet
Canonical Corr
49 pages
1404 2465 PDF
No ratings yet
1404 2465 PDF
21 pages
Covariance and Correlation: Multiple Random Variables
No ratings yet
Covariance and Correlation: Multiple Random Variables
2 pages
Lec4 IntroToProbabilityAndStatistics
No ratings yet
Lec4 IntroToProbabilityAndStatistics
44 pages
Random Vectors
No ratings yet
Random Vectors
44 pages
Week 3
No ratings yet
Week 3
3 pages
Part2 Gaussian RVs - PPTX Annotated
No ratings yet
Part2 Gaussian RVs - PPTX Annotated
39 pages
Chapter 5 Analysis of Several Groups Canonical Variate Analysis - 1999 - Aspects of Multivariate Statistical Analysis in Geology
No ratings yet
Chapter 5 Analysis of Several Groups Canonical Variate Analysis - 1999 - Aspects of Multivariate Statistical Analysis in Geology
37 pages
Covariance - Correlation and Regression (Lecture)
No ratings yet
Covariance - Correlation and Regression (Lecture)
11 pages
Canonical Correlation PDF
No ratings yet
Canonical Correlation PDF
10 pages
Kuylen 1981
No ratings yet
Kuylen 1981
21 pages
Correlation Rank - Correlation Curve - Fitting For Student
No ratings yet
Correlation Rank - Correlation Curve - Fitting For Student
26 pages
CCA - Canonical Correlation Analysis
No ratings yet
CCA - Canonical Correlation Analysis
12 pages
Notes - Correlation and Regression
No ratings yet
Notes - Correlation and Regression
26 pages
TABEL RANK SPEARMAN-spearman Ranked Correlation Table
No ratings yet
TABEL RANK SPEARMAN-spearman Ranked Correlation Table
1 page
1.2. Ch-2 - Correlation Theory-1
No ratings yet
1.2. Ch-2 - Correlation Theory-1
29 pages
Canonical Correlation Analysis: An Overview With Application To Learning Methods
No ratings yet
Canonical Correlation Analysis: An Overview With Application To Learning Methods
22 pages
10 Cor1
No ratings yet
10 Cor1
18 pages
Covariance and Some Conditional Expectation Exercises: Scott Sheffield
No ratings yet
Covariance and Some Conditional Expectation Exercises: Scott Sheffield
69 pages
1.13 Covariance: Definition
No ratings yet
1.13 Covariance: Definition
24 pages
Stat2024 9
No ratings yet
Stat2024 9
15 pages
R07 Correlation and Regression IFT Notes
No ratings yet
R07 Correlation and Regression IFT Notes
27 pages
Correlation and Regression
No ratings yet
Correlation and Regression
17 pages
JUDE, ESEMOKUMO and OTI PUBLISHED PAPER
No ratings yet
JUDE, ESEMOKUMO and OTI PUBLISHED PAPER
12 pages
Corelatii
No ratings yet
Corelatii
16 pages
Abdi CCA2018
No ratings yet
Abdi CCA2018
16 pages
Canonical Correlation Analysis: An Overview With Application To Learning Methods
No ratings yet
Canonical Correlation Analysis: An Overview With Application To Learning Methods
22 pages
Malacarne
No ratings yet
Malacarne
22 pages
Durbin Watson Tabel (Anwar)
No ratings yet
Durbin Watson Tabel (Anwar)
151 pages
Jurnal Melkianus Mone
No ratings yet
Jurnal Melkianus Mone
12 pages