Principal Component Analysis

PREREQUISITE NOTIONS AND NOTATIONS
Matrices are denoted in upper case bold letters, vectors in lower case bold letters, and their elements in lower case italic; matrices, vectors, and elements from the same matrix all use the same letter (e.g., A, a, a). The transpose operation is denoted by the superscript T. The identity matrix is denoted I.
The data table to be analyzed by PCA comprises I observations described by J variables, and it is represented by the I × J matrix X, whose generic element is x_{i,j}. The matrix X has rank L, where L ≤ min{I, J}.
In general, the data table will be preprocessed
before the analysis. Almost always, the columns of X
will be centered so that the mean of each column
is equal to 0 (i.e., Xᵀ1 = 0, where 0 is a J by 1 vector of zeros and 1 is an I by 1 vector of ones). If, in addition, each element of X is divided by √I (or √(I − 1)), the analysis is referred to as a covariance PCA because, in this case, the matrix XᵀX is a covariance matrix. In addition to centering,
when the variables are measured with different units,
it is customary to standardize each variable to unit
norm. This is obtained by dividing each variable by
its norm (i.e., the square root of the sum of all the
squared elements of this variable). In this case, the
analysis is referred to as a correlation PCA because,
then, the matrix XᵀX is a correlation matrix (most
statistical packages use correlation preprocessing as a
default).
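These preprocessing options translate directly into code. The following is a minimal sketch (numpy-based; the function name and keyword strings are ours, not the paper's):

```python
import numpy as np

def preprocess(X, kind="correlation"):
    """Center the columns of X and optionally rescale them.

    kind="centered":    centering only (X^T 1 = 0 afterwards).
    kind="covariance":  divide by sqrt(I - 1), so X.T @ X is a covariance matrix.
    kind="correlation": divide each centered column by its norm, so X.T @ X
                        is a correlation matrix.
    """
    I = X.shape[0]
    Xc = X - X.mean(axis=0)              # the mean of each column becomes 0
    if kind == "covariance":
        return Xc / np.sqrt(I - 1)
    if kind == "correlation":
        return Xc / np.linalg.norm(Xc, axis=0)
    return Xc
```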
The matrix X has the following singular value decomposition [SVD, see Refs 11–13 and Appendix B for an introduction to the SVD]:

$$X = P \Delta Q^{T} \tag{1}$$

where P is the I × L matrix of left singular vectors, Q is the J × L matrix of right singular vectors, and Δ is the L × L diagonal matrix of singular values.
The inertia of a column is defined as the sum of the squared elements of this column and is computed as

$$\gamma_j^2 = \sum_{i}^{I} x_{i,j}^2 . \tag{2}$$

The sum of all the γ²_j is denoted $\mathcal{I}$; it is called the inertia of the data table, or the total inertia. The squared (Euclidean) distance of the i-th observation to the center of gravity g (the vector of the column means, equal to the zero vector when X is centered) is

$$d_{i,g}^2 = \sum_{j}^{J} \left( x_{i,j} - g_j \right)^2 , \tag{3}$$

which, when the data are centered, reduces to

$$d_{i,g}^2 = \sum_{j}^{J} x_{i,j}^2 . \tag{4}$$

Note that the sum of all the d²_{i,g} is equal to $\mathcal{I}$, the inertia of the data table.
GOALS OF PCA
The goals of PCA are to
(1) extract the most important information from the
data table;
(2) compress the size of the data set by keeping only
this important information;
(3) simplify the description of the data set; and
(4) analyze the structure of the observations and the
variables.
The components are obtained from the SVD of the data table (cf. Eq. 1). Specifically, the I × L matrix of factor scores, denoted F, is obtained as

$$F = P \Delta . \tag{5}$$

The matrix Q gives the coefficients of the linear combinations used to compute the factor scores; it can also be interpreted as a projection matrix because

$$F = P \Delta = P \Delta Q^{T} Q = X Q . \tag{6}$$

Equivalently, X can be written as the product of the factor scores by the transposed loadings,

$$X = F Q^{T} , \tag{7}$$

with FᵀF = Δ² and QᵀQ = I. Note that the ℓ-th diagonal element of FᵀF (i.e., the sum of the squared factor scores on component ℓ) is the eigenvalue λℓ = δ²ℓ associated with this component.
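In code, factor scores, loadings, and eigenvalues all come from a single SVD call. A minimal sketch (our names; note that numpy returns Qᵀ rather than Q), which also verifies Eq. 6:

```python
import numpy as np

def pca_from_svd(X):
    """PCA of a column-centered matrix X via the SVD X = P @ diag(delta) @ Q.T (Eq. 1)."""
    P, delta, Qt = np.linalg.svd(X, full_matrices=False)
    Q = Qt.T                    # right singular vectors = loadings
    F = P * delta               # factor scores, F = P Delta (Eq. 5)
    eigenvalues = delta ** 2    # lambda_l = delta_l^2; their sum is the total inertia
    return F, Q, eigenvalues

# Sanity check of Eq. 6 (F = XQ) on random centered data:
X = np.random.randn(20, 2)
X -= X.mean(axis=0)
F, Q, lam = pca_from_svd(X)
assert np.allclose(F, X @ Q)
```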
[TABLE 1 | Raw scores, deviations from the mean, factor scores (coordinates), squared factor scores on the components, contributions of the observations to the components, squared distances to the center of gravity, and squared cosines of the observations, for the example 'length of words (Y) and number of lines of the definition (W).' The 20 words are: bag, across, on, insane, by, monastery, relief, slope, scoundrel, with, neither, pretentious, solid, this, for, therefore, generality, arise, blot, and infectious. M_W = 8, M_Y = 6; the deviation columns are labeled w = (W − M_W) and y = (Y − M_Y). The contributions and the squared cosines are multiplied by 100 for ease of reading. The positive important contributions are italicized, and the negative important contributions are represented in bold.]
[FIGURES | Panels (a)–(c): the 20 words plotted on the centered variables 'length of words' and 'number of lines of the definition,' and on the first and second components.]

PROJECTING NEW OBSERVATIONS ONTO THE COMPONENTS
A new (supplementary) observation can be projected onto the components computed from the active observations. For example, the supplementary word 'sur' has Y_sur = 3 letters and W_sur = 12 lines for its definition; its deviations from the means are

$$y_{sur} = Y_{sur} - M_Y = 3 - 6 = -3 \quad\text{and}\quad w_{sur} = W_{sur} - M_W = 12 - 8 = 4 . \tag{8}$$

The factor scores of a supplementary observation, collected in the row vector xᵀ_sup, are then obtained by projecting it onto the loadings:

$$f_{sup}^{T} = x_{sup}^{T}\, Q . \tag{9}$$
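A sketch of this projection step; Q would come from the PCA of the 20 active words, so here we only show Eq. 9 applied to the deviations of Eq. 8:

```python
import numpy as np

def project_supplementary(x_sup, Q):
    """Factor scores of a centered supplementary observation (Eq. 9): f_sup^T = x_sup^T Q."""
    return x_sup @ Q

# Deviations of the supplementary word 'sur' from the means (Eq. 8).
x_sup = np.array([-3.0, 4.0])      # (y_sur, w_sur)
# Q = ...  (the 2 x 2 loading matrix from the analysis of the active words)
# f_sup = project_supplementary(x_sup, Q)
```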
INTERPRETING PCA
Inertia Explained by a Component
The importance of a component is reflected by its inertia or by the proportion of the total inertia explained by this factor. In our example (see Table 2), the inertia of the first component is equal to 392; this corresponds to 88% of the total inertia of 444 (i.e., 392/444 ≈ .88).
Contribution of an Observation to a
Component
Recall that the eigenvalue associated with a component is equal to the sum of the squared factor scores for this component. Therefore, the importance of an observation for a component can be obtained as the ratio of the squared factor score of this observation to the eigenvalue associated with that component. This ratio is called the contribution of the observation to the component. Formally, the contribution of observation i to component ℓ, denoted ctr_{i,ℓ}, is obtained as:
$$ctr_{i,\ell} = \frac{f_{i,\ell}^{2}}{\sum_{i} f_{i,\ell}^{2}} = \frac{f_{i,\ell}^{2}}{\lambda_{\ell}} , \tag{10}$$

where λℓ is the eigenvalue of the ℓ-th component. The larger the value of the contribution, the more the observation contributes to that component.

[FIGURES 2 and 3 | Plots of the centered data with the first and second components, showing the projections of the words onto the components (e.g., the projection of 'neither' on the first and on the second component) and the projection of the supplementary word 'sur'.]
TABLE 2 | Eigenvalues and percentages of inertia for the word example.

Component   λi (eigenvalue)   Cumulated (eigenvalues)   Percent of inertia   Cumulated (percentage)
1                 392                    392                   88.29                  88.29
2                  52                    444                   11.71                 100.00
Squared Cosine of a Component with an Observation
The squared cosine shows the importance of a component for a given observation; it is computed as

$$\cos^2_{i,\ell} = \frac{f_{i,\ell}^{2}}{d_{i,g}^{2}} , \tag{11}$$

where d²_{i,g} is the squared distance of a given observation to the origin. The squared distance, d²_{i,g}, is computed (thanks to the Pythagorean theorem) as the sum of the squared values of all the factor scores of this observation (cf. Eq. 4). Components with a large value of cos²_{i,ℓ} contribute a relatively large portion to the total distance and therefore these components are important for that observation.

The distance to the center of gravity is defined for supplementary observations, so the squared cosine can be computed for them and is meaningful. Therefore, the value of cos² can help find the components that are important to interpret both active and supplementary observations.
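Both diagnostics follow from the factor scores alone. A minimal sketch of Eqs. 10 and 11 (names are ours):

```python
import numpy as np

def contributions_and_cosines(F):
    """Interpretation diagnostics from a factor-score matrix F (observations x components).

    ctr[i, l]  = f_{i,l}^2 / lambda_l  (Eq. 10): columns sum to 1.
    cos2[i, l] = f_{i,l}^2 / d^2_{i,g} (Eq. 11): rows sum to 1.
    """
    F2 = F ** 2
    ctr = F2 / F2.sum(axis=0)                   # eigenvalue = column sum of squared scores
    cos2 = F2 / F2.sum(axis=1, keepdims=True)   # d^2_{i,g} = row sum of squared scores
    return ctr, cos2
```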
[TABLE | Loadings and squared loadings of the variables Y (length of words) and W (number of lines of the definition) on Components 1 and 2, together with the correlations of the supplementary variables 'frequency' (frequency of use of each word) and '# entries' (number of dictionary entries for each word).]
STATISTICAL INFERENCE:
EVALUATING THE QUALITY
OF THE MODEL
Fixed Effect Model
In a fixed effect model, the observations are considered to be the population of interest, and the model is evaluated by how well it reconstitutes the data table. With all L components, X is exactly reconstituted from its factor scores and loadings:

$$X = F Q^{T} = X Q Q^{T} . \tag{12}$$

With only the first M components, the estimation of X is

$$\widehat{X}^{[M]} = F^{[M]} Q^{[M]T} \tag{13}$$

$$\phantom{\widehat{X}^{[M]}} = X Q^{[M]} Q^{[M]T} , \tag{14}$$

where F^[M] and Q^[M] keep only the first M columns of F and Q. The quality of the estimation is measured by the residual sum of squares

$$RESS_{M} = \left\| X - \widehat{X}^{[M]} \right\|^{2} = \mathrm{trace}\left\{ E^{T} E \right\} = \mathcal{I} - \sum_{\ell=1}^{M} \lambda_{\ell} , \tag{15}$$

where E = X − X̂^[M] is the matrix of residuals, where ‖ ‖ is the norm of X (i.e., the square root of the sum of all the squared elements of X), and where the trace of a matrix is the sum of its diagonal elements. The smaller the value of RESS, the better the PCA model. For a fixed effect model, a larger M gives a better estimation of X̂^[M], and the matrix X is always perfectly reconstituted with L components (recall that L is the rank of X).

In addition, Eq. 12 can be adapted to compute the estimation of the supplementary observations as

$$\widehat{x}_{sup}^{[M]T} = x_{sup}^{T} Q^{[M]} Q^{[M]T} . \tag{16}$$

[FIGURE 4 | Circle of correlations for (a) the variables 'length (number of letters)' and 'number of lines of the definition' with principal components 1 and 2, and (b) the variables and the supplementary variables 'frequency' and '# entries' with principal components 1 and 2. Note that the supplementary variables are not positioned on the unit circle.]
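The rank-M reconstitution and RESS of Eqs. 13 to 15 take only a few lines; a sketch (our names), assuming X is already centered:

```python
import numpy as np

def reconstruct(X, M):
    """Rank-M estimate of X (Eqs. 13-14) and the residual sum of squares (Eq. 15)."""
    P, delta, Qt = np.linalg.svd(X, full_matrices=False)
    X_hat = P[:, :M] @ np.diag(delta[:M]) @ Qt[:M, :]
    ress = np.sum((X - X_hat) ** 2)   # equals total inertia minus the first M eigenvalues
    return X_hat, ress
```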
Random Model
In a random effect model, the observations are considered a sample from a larger population, and the model is evaluated by its ability to predict new (or left-out) observations. Denoting by X̃^[M] the matrix of observations predicted with M components (obtained, e.g., by jackknife, dropping each observation in turn and predicting it with Eq. 16), the quality of prediction is measured by the predicted residual sum of squares

$$PRESS_{M} = \left\| X - \widetilde{X}^{[M]} \right\|^{2} . \tag{17}$$

A simple heuristic keeps the components whose eigenvalue exceeds the average eigenvalue,

$$\lambda_{\ell} > \frac{1}{L} \sum_{\ell}^{L} \lambda_{\ell} = \frac{1}{L}\, \mathcal{I} . \tag{18}$$

A cross-validation index compares the prediction error of the ℓ-th component with the residual error of the solution with one component less,

$$Q_{\ell}^{2} = 1 - \frac{PRESS_{\ell}}{RESS_{\ell - 1}} , \tag{19}$$

and a component can also be evaluated with an F-like ratio,

$$W_{\ell} = \frac{PRESS_{\ell-1} - PRESS_{\ell}}{PRESS_{\ell}} \times \frac{df_{residual,\,\ell}}{df_{\ell}} , \tag{20}$$

with

$$df_{\ell} = I + J - 2\ell \tag{21}$$

and

$$df_{residual,\,\ell} = J(I - 1) - \sum_{k=1}^{\ell} \left( I + J - 2k \right) . \tag{22}$$
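PRESS requires refitting the model without each observation. The paper does not prescribe an exact algorithm, so the following jackknife scheme is only one reasonable reading of Eqs. 16, 17, and 19:

```python
import numpy as np

def press(X, M):
    """Jackknifed PRESS_M (Eq. 17): drop each observation, fit a PCA on the rest,
    and predict the dropped row with Eq. 16."""
    total = 0.0
    for i in range(X.shape[0]):
        Xi = np.delete(X, i, axis=0)
        mu = Xi.mean(axis=0)
        _, _, Qt = np.linalg.svd(Xi - mu, full_matrices=False)
        QM = Qt[:M].T                       # J x M loadings from the reduced data
        x = X[i] - mu
        total += np.sum((x - x @ QM @ QM.T) ** 2)
    return total

def q2(X, M):
    """Q^2_M = 1 - PRESS_M / RESS_{M-1} (Eq. 19), with RESS_0 = total inertia."""
    Xc = X - X.mean(axis=0)
    delta2 = np.linalg.svd(Xc, compute_uv=False) ** 2
    ress_prev = delta2.sum() - delta2[: M - 1].sum()
    return 1.0 - press(X, M) / ress_prev
```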
ROTATION
After the number of components has been determined,
and in order to facilitate the interpretation, the
analysis often involves a rotation of the components
that were retained [see, e.g., Refs 40 and 67 for
more details]. Two main types of rotation are used:
orthogonal when the new axes are also orthogonal
to each other, and oblique when the new axes are
not required to be orthogonal. Because the rotations
are always performed in a subspace, the new axes
will always explain less inertia than the original
components (which are computed to be optimal).
However, the part of the inertia explained by the
total subspace after rotation is the same as it was
before rotation (only the partition of the inertia has
changed). It is also important to note that because
rotation always takes place in a subspace (i.e., the
space of the retained components), the choice of this
subspace strongly influences the result of the rotation.
Therefore, it is strongly recommended to try several
sizes for the subspace of the retained components in
order to assess the robustness of the interpretation of
the rotation. When performing a rotation, the term loadings almost always refers to the elements of the matrix Q. We will follow this tradition in this section.
Orthogonal Rotation
An orthogonal rotation is specified by a rotation
matrix, denoted R, where the rows stand for the
original factors and the columns for the new (rotated)
factors. At the intersection of row m and column n we have the cosine of the angle θ_{m,n} between the original axis and the new one: r_{m,n} = cos θ_{m,n}. A rotation matrix
has the important property of being orthonormal
because it corresponds to a matrix of direction cosines
and therefore RT R = I.
Varimax rotation, developed by Kaiser,41 is the
most popular rotation method. For varimax a simple
solution means that each component has a small
number of large loadings and a large number of zero
(or small) loadings. This simplifies the interpretation
because, after a varimax rotation, each original variable tends to be associated with one (or a small number) of the components, and each component represents only a small number of variables. Formally, varimax searches for the rotation that maximizes the variance of the squared loadings, i.e., that maximizes

$$\nu = \sum \left( q_{j,\ell}^{2} - \bar{q}_{\ell}^{2} \right)^{2} , \tag{23}$$

where q²_{j,ℓ} is the squared (rotated) loading of variable j on component ℓ and q̄²ℓ is the mean of the squared loadings on component ℓ.
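The paper does not spell out an algorithm for maximizing Eq. 23; a common SVD-based iteration (the one used in many statistical packages) looks like this:

```python
import numpy as np

def varimax(Q, max_iter=100, tol=1e-8):
    """Varimax rotation of a loading matrix Q (J variables x M components).

    Iterates toward the orthogonal matrix R that maximizes the variance of the
    squared loadings (Eq. 23); returns the rotated loadings Q @ R and R itself.
    """
    J, M = Q.shape
    R = np.eye(M)
    crit_old = 0.0
    for _ in range(max_iter):
        L = Q @ R
        # gradient of the varimax criterion with respect to the rotation
        G = Q.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / J)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt                 # closest orthogonal matrix to the gradient
        if s.sum() < crit_old * (1 + tol):
            break
        crit_old = s.sum()
    return Q @ R, R
```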
Oblique Rotations
With oblique rotations, the new axes are free to
take any position in the component space, but the
degree of correlation allowed among factors is small
because two highly correlated components are better
interpreted as only one factor. Oblique rotations,
therefore, relax the orthogonality constraint in order
to gain simplicity in the interpretation. They were
strongly recommended by Thurstone,42 but are used
more rarely than their orthogonal counterparts.
For oblique rotations, the promax rotation has
the advantage of being fast and conceptually simple.
The first step in promax rotation defines the target
matrix, almost always obtained as the result of a
varimax rotation whose entries are raised to some
power (typically between 2 and 4) in order to force
the structure of the loadings to become bipolar.
The second step is obtained by computing a least
square fit from the varimax solution to the target
matrix. Promax rotations are interpreted by looking at the correlations (regarded as loadings) between the rotated axes and the original variables. An interesting recent development of the concept of oblique rotation corresponds to the technique of independent component analysis (ICA), where the axes are computed in order to replace the notion of orthogonality by statistical independence [see Ref 43 for a tutorial].
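A simplified sketch of these two promax steps (a full implementation would also rescale the result to obtain the pattern and structure matrices; names are ours):

```python
import numpy as np

def promax(Q_varimax, power=3):
    """Promax rotation sketch: raise varimax loadings to a power (keeping signs)
    to build a target, then least-squares fit the varimax solution to it."""
    target = np.sign(Q_varimax) * np.abs(Q_varimax) ** power
    R, *_ = np.linalg.lstsq(Q_varimax, target, rcond=None)
    R /= np.sqrt((R ** 2).sum(axis=0))      # normalize the transformation columns
    return Q_varimax @ R                    # oblique (correlated) rotated loadings
```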
EXAMPLES

Correlation PCA
In this example, five wines are described by seven variables: how much the wine is liked (hedonic), how well it goes with meat ('for meat'), how well it goes with dessert ('for dessert'), its price, its sugar content, its alcohol content, and its acidity [data from Ref 44].

[TABLE | Ratings of the five wines on the seven variables (data from Ref 44).]

[TABLE 7 | PCA wine characteristics: factor scores, contributions, and squared cosines of the five wines on Components 1 and 2.]

[FIGURE 5 | PCA wine characteristics: factor scores of the five wines on the first two principal components.]
We can see from Figure 5 that the first component separates Wines 1 and 2 from Wines 4 and
5, while the second component separates Wines 2
and 5 from Wines 1 and 4. The examination of the
values of the contributions and cosines, shown in
Table 7, complements and refines this interpretation
because the contributions suggest that Component 1
essentially contrasts Wines 1 and 2 with Wine 5 and
that Component 2 essentially contrasts Wines 2 and
5 with Wine 4. The cosines show that Component
1 contributes highly to Wines 1 and 5, while
Component 2 contributes most to Wine 4.
To find the variables that account for these
differences, we examine the loadings of the variables
on the first two components (see Table 8) and the circle
of correlations (see Figure 6 and Table 9). From these,
we see that the first component contrasts price with the wine's hedonic qualities, its acidity, its amount of alcohol, and how well it goes with meat (i.e., the wine tasters preferred inexpensive wines). The second component contrasts the wine's hedonic qualities, acidity, and alcohol content with its sugar content and how well it goes with dessert. From this, it appears that the first component represents characteristics that are inversely correlated with a wine's price while the second component represents the wine's sweetness.
To strengthen the interpretation, we can apply
a varimax rotation, which gives a clockwise rotation of 15° (the rotated loadings are shown in Table 10).
[TABLE 8 | PCA wine characteristics: loadings (i.e., the Q matrix) of the seven variables (hedonic, for meat, for dessert, price, sugar, alcohol, acidity) on the first two components.]

[TABLE 9 | PCA wine characteristics: correlations of the seven variables with the first two components.]

[FIGURE 6 | Circle of correlations: the seven wine variables plotted on the first two principal components.]
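The correlations of Table 9 (and the coordinates of the circle of correlations in Figure 6) are plain variable-by-component Pearson correlations; a sketch:

```python
import numpy as np

def correlation_circle(X, F):
    """Correlation of each original variable (column of X) with each
    component (column of the factor scores F)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Fs = (F - F.mean(axis=0)) / F.std(axis=0)
    return (Xs.T @ Fs) / X.shape[0]          # J x M matrix of correlations
```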
Covariance PCA
Here we use data from a survey performed in the
1950s in France [data from Ref 45]. The data table
gives the average number of Francs spent on several
categories of food products according to social class
and the number of children per family. Because a
Franc spent on one item has the same value as a Franc
spent on another item, we want to keep the same unit of measurement for the complete space; therefore we will perform a covariance PCA rather than a correlation PCA. The data are shown in Table 11.
[TABLE 10 | PCA wine characteristics: loadings of the seven variables on the first two components after varimax rotation.]
TABLE 11 | Average Number of Francs Spent (per month) on Different Types of Food According to Social Class and Number of Children [dataset from Ref 45]

                                       Type of Food
                          Bread  Vegetables  Fruit   Meat  Poultry  Milk  Wine
Blue collar, 2 children     332        428    354   1437      526   247   427
White collar, 2 children    293        559    388   1527      567   239   258
Upper class, 2 children     372        767    562   1948      927   235   433
Blue collar, 3 children     406        563    341   1507      544   324   407
White collar, 3 children    386        608    396   1501      558   319   363
Upper class, 3 children     438        843    689   2345     1148   243   341
Blue collar, 4 children     534        660    367   1620      638   414   407
White collar, 4 children    460        699    484   1856      762   400   416
Upper class, 4 children     385        789    621   2366     1149   304   282
Blue collar, 5 children     655        776    423   1848      759   495   486
White collar, 5 children    584        995    548   2056      893   518   319
Upper class, 5 children     515       1097    887   2630     1167   561   284
Mean                        447        732    505   1887      803   358   369
S                           107        189    165    396      250   117    72
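Because Table 11 is complete, this covariance PCA can be reproduced directly; the eigenvalues should match (up to rounding) those reported below in Table 14:

```python
import numpy as np

# Rows: BC2, WC2, UC2, BC3, WC3, UC3, BC4, WC4, UC4, BC5, WC5, UC5
# Columns: bread, vegetables, fruit, meat, poultry, milk, wine
X = np.array([
    [332,  428, 354, 1437,  526, 247, 427],
    [293,  559, 388, 1527,  567, 239, 258],
    [372,  767, 562, 1948,  927, 235, 433],
    [406,  563, 341, 1507,  544, 324, 407],
    [386,  608, 396, 1501,  558, 319, 363],
    [438,  843, 689, 2345, 1148, 243, 341],
    [534,  660, 367, 1620,  638, 414, 407],
    [460,  699, 484, 1856,  762, 400, 416],
    [385,  789, 621, 2366, 1149, 304, 282],
    [655,  776, 423, 1848,  759, 495, 486],
    [584,  995, 548, 2056,  893, 518, 319],
    [515, 1097, 887, 2630, 1167, 561, 284],
], dtype=float)

Xc = X - X.mean(axis=0)                      # covariance PCA: center only
P, delta, Qt = np.linalg.svd(Xc, full_matrices=False)
F = P * delta                                # factor scores of the 12 rows
print(delta ** 2)   # eigenvalues; the text reports lambda_1 ~ 3,023,141 (88%)
```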
[TABLE 12 | PCA example, amount of Francs spent (per month) by food type, social class, and number of children: factor scores, contributions of the observations to the components, and squared cosines of the observations on principal components 1 and 2. The positive important contributions are italicized, and the negative important contributions are represented in bold. For convenience, squared cosines and contributions have been multiplied by 100 and rounded.]
[FIGURE | PCA example, amount of Francs spent (per month) on food type by social class and number of children: factor scores for principal components 1 and 2. λ1 = 3,023,141.24, τ1 = 88%; λ2 = 290,575.84, τ2 = 8%. BC = blue collar; WC = white collar; UC = upper class; 2 to 5 = number of children.]

[FIGURE | PCA wine characteristics: (a) loadings of the seven variables on the first two components; (b) the loadings of the seven variables showing the original axes and the new (rotated) axes derived from varimax (a clockwise rotation of 15°); (c) the loadings after varimax rotation of the seven variables.]
CORRESPONDENCE ANALYSIS
Correspondence analysis (CA) is a generalization of PCA adapted to the analysis of contingency tables; it provides one set of factor scores for the rows and one for the columns. These factor scores are, in general, scaled such that their inertia is equal to their eigenvalue (some versions of CA compute row or column factor scores normalized to unity). The grand total of the table is denoted N.
[FIGURE | Covariance PCA example: the seven food variables (bread, vegetables, fruit, meat, poultry, milk, wine) plotted on the first two components.]

Notations
The I × J contingency table to be analyzed is denoted X. The matrix of relative frequencies is Z = N⁻¹X; the vector of row masses is r = Z1 and the vector of column masses is c = Zᵀ1, with D_r = diag{r} and D_c = diag{c}.

Computations
The factor scores are obtained from the following generalized singular value decomposition (see Appendix B):

$$\left( Z - r c^{T} \right) = \widetilde{P}\, \widetilde{\Delta}\, \widetilde{Q}^{T} \quad\text{with}\quad \widetilde{P}^{T} D_r^{-1} \widetilde{P} = \widetilde{Q}^{T} D_c^{-1} \widetilde{Q} = I . \tag{24}$$

The row and column factor scores are then obtained as

$$F = D_r^{-1} \widetilde{P}\, \widetilde{\Delta} \quad\text{and}\quad G = D_c^{-1} \widetilde{Q}\, \widetilde{\Delta} . \tag{25}$$

As in PCA, contributions can be computed for rows and for columns; because rows and columns have masses, the contributions of row i and of column j to component ℓ are

$$ctr_{i,\ell} = \frac{r_i\, f_{i,\ell}^{2}}{\lambda_{\ell}} \quad\text{and}\quad ctr_{j,\ell} = \frac{c_j\, g_{j,\ell}^{2}}{\lambda_{\ell}} . \tag{26}$$
TABLE 14 | PCA example: amount of Francs spent (per month) on food type by social class and number of children. Eigenvalues, cumulative eigenvalues, RESS, PRESS, and Q² values for all seven components (ℐ = 3,435,249.75).

Component      λ             λ/ℐ    Σλ (cumulated)   Σλ/ℐ    RESS         RESS/ℐ   PRESS        PRESS/ℐ    Q²
PC 1     3,023,141.24   0.88   3,023,141.24    0.88   412,108.51    0.12   610,231.19    0.18    0.82
PC 2       290,575.84   0.08   3,313,717.07    0.96   121,532.68    0.04   259,515.13    0.08    0.37
PC 3        68,795.23   0.02   3,382,512.31    0.98    52,737.44    0.02   155,978.58    0.05   -0.28
PC 4        25,298.95   0.01   3,407,811.26    0.99    27,438.49    0.01   152,472.37    0.04   -1.89
PC 5        22,992.25   0.01   3,430,803.50    1.00     4,446.25    0.00    54,444.52    0.02   -0.98
PC 6         3,722.32   0.00   3,434,525.83    1.00       723.92    0.00     7,919.49    0.00   -0.78
PC 7           723.92   0.00   3,435,249.75    1.00         0.00    0.00         0.00    0.00    1.00
The vectors of the squared (χ²) distances from the rows and from the columns to their respective barycenters are denoted d_r and d_c, with elements

$$d_{r,i}^{2} = \sum_{\ell} f_{i,\ell}^{2} \quad\text{and}\quad d_{c,j}^{2} = \sum_{\ell} g_{j,\ell}^{2} . \tag{27}$$

As in PCA, the total inertia of the data table is the sum of the eigenvalues, and it can also be computed from the masses and the squared distances:

$$\mathcal{I} = \sum_{\ell}^{L} \lambda_{\ell} = r^{T} d_r = c^{T} d_c . \tag{28}$$

The squared cosine between row i and component ℓ, and between column j and component ℓ, are obtained respectively as

$$\cos^2_{i,\ell} = \frac{f_{i,\ell}^{2}}{d_{r,i}^{2}} \quad\text{and}\quad \cos^2_{j,\ell} = \frac{g_{j,\ell}^{2}}{d_{c,j}^{2}} \tag{29}$$
(with d²_{r,i} and d²_{c,j} being, respectively, the i-th element of d_r and the j-th element of d_c). Just like for PCA, squared cosines help locate the components that are important for a given observation or variable.
And just like for PCA, supplementary or illustrative elements can be projected onto the components, but the CA formula needs to take into account masses and weights. The projection formula is called the transition formula, and it is specific to CA. Specifically, let iᵀ_sup be an illustrative row and j_sup be an illustrative column to be projected (note that in CA, prior to projection, an illustrative row or column is rescaled such that its sum is equal to one). The coordinates of the illustrative rows (denoted f_sup) and columns (denoted g_sup) are obtained as

$$f_{sup}^{T} = \left( i_{sup}^{T} 1 \right)^{-1} i_{sup}^{T}\, G\, \widetilde{\Delta}^{-1} \quad\text{and}\quad g_{sup}^{T} = \left( j_{sup}^{T} 1 \right)^{-1} j_{sup}^{T}\, F\, \widetilde{\Delta}^{-1} \tag{30}$$

[note that the scalar terms (iᵀ_sup 1)⁻¹ and (jᵀ_sup 1)⁻¹ are used to ensure that the sum of the elements of i_sup and of j_sup is equal to one].
Example
For this example, we use a contingency table that
gives the number of punctuation marks used by the
French writers Rousseau, Chateaubriand, Hugo, Zola,
Proust, and Giraudoux [data from Ref 52]. This table
indicates how often each writer used the period, the
comma, and all other punctuation marks combined
(i.e., interrogation mark, exclamation mark, colon,
and semicolon). The data are shown in Table 15.
A CA of the punctuation table extracts two
components which together account for 100% of
the inertia (with eigenvalues of .0178 and .0056,
respectively). The factor scores of the observations (rows) and variables (columns) are shown in Tables 16 and 17, and the corresponding map is displayed in Figure 10.
We can see from Figure 10 that the first component separates Proust's and Zola's pattern of punctuation from the pattern of punctuation of the other four authors, with Chateaubriand, Proust, and Zola contributing most to the component. The squared cosines show that the first component accounts for all of Zola's pattern of punctuation (see Table 16).

The second component separates Giraudoux's pattern of punctuation from that of the other authors. Giraudoux also has the highest contribution, indicating that Giraudoux's pattern of punctuation is important for the second component. In addition, for Giraudoux the highest squared cosine (94%) is obtained for Component 2. This shows that the second component is essential to understand Giraudoux's pattern of punctuation (see Table 16).
In contrast with PCA, the variables (columns) in CA are interpreted identically to the rows. The factor scores for the variables (columns) are shown in Table 17, and they are displayed in the same map as the observations (see Figure 10).
TABLE 15 | The Punctuation Marks of Six French Writers (from Ref 52)

Author's Name      Period    Comma    Other     x_{i+}          r
Rousseau            7,836   13,112    6,026     26,974     0.0189
Chateaubriand      53,655  102,383   42,413    198,451     0.1393
Hugo              115,615  184,541   59,226    359,382     0.2522
Zola              161,926  340,479   62,754    565,159     0.3966
Proust             38,177  105,101   12,670    155,948     0.1094
Giraudoux          46,371   58,367   14,299    119,037     0.0835
x_{+j}            423,580  803,983  197,388    N = 1,424,951   1.0000
cᵀ                 0.2973   0.5642   0.1385

The column labeled x_{i+} gives the total number of punctuation marks used by each author. N is the grand total of the data table. The vector of mass for the rows, r, is the proportion of punctuation marks used by each author (r_i = x_{i+}/N). The row labeled x_{+j} gives the total number of times each punctuation mark was used. The centroid row, cᵀ, gives the proportion of each punctuation mark in the sample (c_j = x_{+j}/N).
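Table 15 is also complete, so the CA can be reproduced. The sketch below computes Eqs. 24 and 25 through an ordinary SVD of the mass-rescaled matrix (a standard equivalence; see Appendix B):

```python
import numpy as np

# Rows: Rousseau, Chateaubriand, Hugo, Zola, Proust, Giraudoux
# Columns: period, comma, other marks
X = np.array([
    [  7836,  13112,  6026],
    [ 53655, 102383, 42413],
    [115615, 184541, 59226],
    [161926, 340479, 62754],
    [ 38177, 105101, 12670],
    [ 46371,  58367, 14299],
], dtype=float)

N = X.sum()
Z = X / N                                # relative frequencies
r = Z.sum(axis=1)                        # row masses
c = Z.sum(axis=0)                        # column masses
S = (Z - np.outer(r, c)) / np.sqrt(np.outer(r, c))
P, delta, Qt = np.linalg.svd(S, full_matrices=False)
F = (P * delta) / np.sqrt(r)[:, None]    # row factor scores (Eq. 25)
G = (Qt.T * delta) / np.sqrt(c)[:, None] # column factor scores (Eq. 25)
print(delta[:2] ** 2)   # the text reports eigenvalues of ~.0178 and ~.0056
```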
[TABLE 16 | CA punctuation: factor scores, contributions, masses, mass-weighted squared factor scores, inertias to the barycenter, and squared cosines for the rows (Rousseau, Chateaubriand, Hugo, Zola, Proust, Giraudoux). λ1 = .0178 (76% of the inertia), λ2 = .0056 (24%), total inertia .0234. The positive important contributions are italicized, and the negative important contributions are represented in bold; squared cosines and contributions are multiplied by 100 and rounded.]
[FIGURE 10 | CA punctuation: the authors (rows) and the punctuation marks (columns: period, comma, other marks) plotted on the first two components.]
[TABLE 17 | CA punctuation: factor scores, contributions, masses, mass-weighted squared factor scores, inertias to the barycenter, and squared cosines for the columns (period, comma, other marks). λ1 = .0178 (76%), λ2 = .0056 (24%), total inertia .0234. The positive important contributions are italicized, and the negative important contributions are represented in bold; squared cosines and contributions are multiplied by 100 and rounded.]
MULTIPLE FACTOR ANALYSIS
Multiple factor analysis [MFA; see Refs 53–55] analyzes a set of observations described by several groups (tables) of variables.

Notations
The data consist of T data tables, with the t-th table denoted [t]X comprising the same I observations measured on J_[t] variables; the total number of variables is

$$J = \sum_{t} J_{[t]} . \tag{31}$$

Computations
Each table is first analyzed by its own PCA, and its first singular value [t]φ1 is recorded. Each table is then normalized by dividing all of its elements by this first singular value,

$$[t]\widetilde{X} = [t]\varphi_1^{-1}\, [t]X , \tag{32}$$

and the normalized tables are concatenated into a global matrix whose PCA gives the global factor scores.
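A minimal sketch of this normalize, concatenate, and re-analyze pipeline (function name and return values are our choices):

```python
import numpy as np

def mfa(tables):
    """Multiple factor analysis sketch: normalize each centered subtable by its
    first singular value (Eq. 32), concatenate, then run one global PCA."""
    normalized = []
    for Xt in tables:
        Xt = Xt - Xt.mean(axis=0)                        # center each subtable
        phi1 = np.linalg.svd(Xt, compute_uv=False)[0]    # first singular value
        normalized.append(Xt / phi1)
    Z = np.hstack(normalized)                            # global data table
    P, delta, Qt = np.linalg.svd(Z, full_matrices=False)
    return P * delta, Qt.T, delta ** 2   # global factor scores, loadings, eigenvalues
```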
Example
Suppose that three experts were asked to rate six
wines aged in two different kinds of oak barrel from
the same harvest of Pinot Noir [example from Ref 55].
Wines 1, 5, and 6 were aged with a first type of oak,
and Wines 2, 3, and 4 with a second type of oak. Each
expert was asked to choose from 2 to 5 variables to
describe the wines. For each wine, the expert rated the
intensity of his/her variables on a 9-point scale. The
data consist of T = 3 subtables, which are presented
in Table 18.
The PCAs on each of the three subtables extracted first eigenvalues of [1]λ1 = 2.86, [2]λ1 = 3.65, and [3]λ1 = 2.50, with singular values of [1]φ1 = 1.69, [2]φ1 = 1.91, and [3]φ1 = 1.58, respectively.
Following normalization and concatenation of
the subtables, the global PCA extracted five components (with eigenvalues of 2.83, 0.36, 0.11, 0.03, and
0.01). The first two components explain 95% of the
inertia. The factor scores for the first two components
of the global analysis are given in Table 19 and the
corresponding map is displayed in Figure 11a.
We can see from Figure 11 that the first
component separates the first type of oak (Wines 1, 5,
and 6) from the second oak type (Wines 2, 3, and 4).
In addition to examining the placement of the wines, we wanted to see how each expert's ratings fit into the global PCA space. We achieved this by projecting the data set of each expert as a supplementary element [see Ref 18 for details of the procedure]. The factor scores are shown in Table 19. The experts' placement in the global map is shown in Figure 11b. Note that the position of each wine in the global analysis is the center of gravity of its positions for the experts. The projection of the experts shows that Expert 3's ratings differ from those of the other two experts.
The variable loadings show the correlations
between the original variables and the global factor
scores (Table 20). These loadings are plotted in
Figure 12. This figure also represents the loadings (Table 21) between the components of each
subtable and the components of the global analysis
as the circle of correlations specific to each expert.
From this we see that Expert 3 differs from the other
experts, and is mostly responsible for the second
component of the global PCA.
FIGURE 11 | MFA wine ratings and oak type. (a) Plot of the global analysis of the wines on the first two principal components. (b) Projection of the experts onto the global analysis. Experts are represented by their faces. A line segment links the position of the wine for a given expert to its global position. λ1 = 2.83, τ1 = 84%; λ2 = 0.36, τ2 = 11%.
CONCLUSION
PCA is very versatile; it is the oldest and remains the most popular technique in multivariate analysis. In addition to the basics presented here, PCA can also be interpreted as a neural network model [see, e.g., Refs 56,57]. In addition to CA, covered in this paper, generalized PCA can also be shown to incorporate a very large set of multivariate techniques such as canonical variate analysis and linear discriminant analysis.
[TABLE 18 | Raw data for the MFA wine example: six wines (Wines 1 to 6, rows) described by the oak type and by the ratings of three experts. Expert 1: fruity, woody, coffee; Expert 2: red fruit, roasted, vanillin, woody; Expert 3: fruity, butter, woody.]
[TABLE 19 | MFA wine ratings and oak type: factor scores of the six wines for the first two components of the global analysis, together with the factor scores of the (supplementary) projections of Experts 1, 2, and 3.]
[TABLE 20 | MFA wine ratings and oak type: loadings (i.e., correlations) of the original variables of each expert on the principal components of the global analysis; only the first dimensions are kept. λ1 = 2.83 (85%), λ2 = 0.36 (11%).]
[TABLE 21 | MFA wine ratings and oak type: loadings (i.e., correlations) of the principal components of each subtable PCA ([1]PC1, [1]PC2, [2]PC1, [2]PC2, [3]PC1, [3]PC2) on the principal components of the global analysis.]
[FIGURE 12 | MFA wine ratings and oak type: circles of correlations for the original variables of (a) Expert 1, (b) Expert 2, and (c) Expert 3, together with the principal components of each subtable, plotted on the first two components of the global analysis.]

APPENDIX A: EIGENVECTORS AND EIGENVALUES
A (nonzero) vector u is an eigenvector of a square matrix A if

$$A u = \lambda u , \tag{A.1}$$

where the scalar λ is the eigenvalue associated with u; equivalently, (A − λI)u = 0. (A.2) For example, the matrix

$$A = \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix} \tag{A.3}$$

has eigenvector u1 = [3, 2]ᵀ with eigenvalue λ1 = 4 (A.4) and eigenvector u2 = [−1, 1]ᵀ with eigenvalue λ2 = −1. (A.5)

For most applications we normalize the eigenvectors (i.e., transform them such that their length is equal to one), therefore

$$u^{T} u = 1 . \tag{A.6}$$

Collecting the eigenvectors as the columns of a matrix U and the eigenvalues as the diagonal elements of a matrix Λ, the eigenvector equation becomes AU = UΛ (A.7), which gives the eigendecomposition

$$A = U \Lambda U^{-1} . \tag{A.8}$$

Positive (semi-)definite matrices, such as correlation, covariance, and cross-product matrices, have only nonnegative eigenvalues, and their eigenvectors are pairwise orthogonal; for these matrices the eigendecomposition can also be written as

$$A = U \Lambda U^{T} \tag{A.10}$$

with

$$U^{T} U = I \tag{A.11}$$

(i.e., U⁻¹ = Uᵀ). For example, the positive definite matrix [3 1; 1 3] decomposes as

$$\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{bmatrix} \tag{A.12}$$

and

$$U^{T} U = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} . \tag{A.15}$$

PCA can be derived within this framework. The factor scores

$$F = X Q \tag{A.16}$$

are required to have maximal variance, i.e., to maximize

$$F^{T} F = Q^{T} X^{T} X Q \tag{A.17}$$

under the constraint QᵀQ = I. Introducing a diagonal matrix Λ of Lagrange multipliers and setting the derivative of the Lagrangian to zero gives

$$X^{T} X Q - Q \Lambda = 0 \iff X^{T} X Q = Q \Lambda . \tag{A.22}$$

Because Λ is diagonal, this is clearly an eigendecomposition problem, and this indicates that Λ is the matrix of eigenvalues of the positive semi-definite matrix XᵀX ordered from the largest to the smallest and that Q is the matrix of eigenvectors of XᵀX associated to Λ. Finally, we find that the factor matrix has the form

$$F = X Q . \tag{A.24}$$
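These small examples can be verified numerically; a sketch using numpy's eigensolvers:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
lam, U = np.linalg.eig(A)
print(lam)                      # 4 and -1, as in Eqs. A.4 and A.5 (order may vary)

B = np.array([[3.0, 1.0],
              [1.0, 3.0]])      # positive definite
lam_b, U_b = np.linalg.eigh(B)  # eigh is appropriate for symmetric matrices
assert np.allclose(U_b @ np.diag(lam_b) @ U_b.T, B)   # B = U Lambda U^T (Eq. A.10)
assert np.allclose(U_b.T @ U_b, np.eye(2))            # U^T U = I (Eq. A.11)
```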
(A.25)
(B.1)
with
P: The (normalized) eigenvectors of the matrix
AAT (i.e., PT P = I). The columns of P are called
the left singular vectors of A.
Q: The (normalized) eigenvectors of the matrix
AT A (i.e., QT Q = I). The columns of Q are called
the right singular vectors of A.
!: The diagonal matrix of the singular values,
1
! = " 2 with " being the diagonal matrix of the
eigenvalues of matrix AAT and of the matrix AT A
(as they are the same).
L
#
# p# qT# ,
(B.2)
#=1
455
Overview
www.wiley.com/wires/compstats
. T WQ
. = I.
with: .
PT M.
P=Q
(B.3)
(B.4)
with: PT P = QT Q = I.
(B.5)
. = W 12 Q.
Q
(B.6)
. = !.
!
.T
.Q
A =.
P!
456
(B.7)
by substitution:
1
AW 2
A = M 2 .
1
= M 2 P!QT W 2
.T
=.
P!Q
(from Eq. B.6).
(B.8)
and
1
1
.
PT M.
P = PT M 2 MM 2 P = PT P = I
(B.9)
. T WQ
. = QT W 12 WW 12 Q = QT Q = I.
Q
(B.10)
(B.11)
'
(
Q[M] = q1 , . . . , qm , . . . , qM
(B.12)
![M] = diag { 1 , . . . , m , . . . , M } .
(B.13)
M
#
m pm qTm ,
(B.14)
= trace
7/
[M]
AA
6
62
= min 6A X6
X
0/
[M]
AA
0T 8
(B.15)
Vo lu me 2, Ju ly /Au gu s t 2010
(B.16)
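The recipe of Eqs. B.4 to B.6 translates directly into code; a sketch for the usual case of diagonal metrics, passed here as vectors of diagonal elements:

```python
import numpy as np

def gsvd(A, m_diag, w_diag):
    """Generalized SVD of A under the diagonal row metric M = diag(m_diag) and
    column metric W = diag(w_diag): A = P_tilde @ diag(delta) @ Q_tilde.T with
    P_tilde.T M P_tilde = Q_tilde.T W Q_tilde = I (Eq. B.3)."""
    mh = np.sqrt(m_diag)                      # M^(1/2), element-wise
    wh = np.sqrt(w_diag)                      # W^(1/2)
    A_tilde = mh[:, None] * A * wh[None, :]   # Eq. B.4
    P, delta, Qt = np.linalg.svd(A_tilde, full_matrices=False)   # Eq. B.5
    P_tilde = P / mh[:, None]                 # M^(-1/2) P  (Eq. B.6)
    Q_tilde = Qt.T / wh[:, None]              # W^(-1/2) Q  (Eq. B.6)
    return P_tilde, delta, Q_tilde
```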
ACKNOWLEDGEMENT
We would like to thank Yoshio Takane for his very helpful comments on a draft of this article.
REFERENCES
1. Pearson K. On lines and planes of closest fit to systems of points in space. Philos Mag A 1901, 6:559–572.
2. Cauchy AL. Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes, vol. 9. Oeuvres Complètes (IIème Série). Paris: Blanchard; 1829.
12. Abdi H. Eigen-decomposition: eigenvalues and eigenvectors. In: Salkind NJ, ed. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage Publications; 2007, 304–308.
13. Takane Y. Relationships among various kinds of eigenvalue and singular value decompositions. In: Yanai H, Okada A, Shigemasu K, Kano Y, Meulman J, eds. New Developments in Psychometrics. Tokyo: Springer Verlag; 2002, 45–56.
21. Efron B. The Jackknife, the Bootstrap and Other Resampling Plans, vol. 38, CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia, PA: SIAM; 1982.
40. Abdi H. Multivariate analysis. In: Lewis-Beck M, Bryman A, Futing T, eds. Encyclopedia for Research Methods for the Social Sciences. Thousand Oaks, CA: Sage Publications; 2003, 669–702.
47. Greenacre MJ. Theory and Applications of Correspondence Analysis. London: Academic Press; 1984.
49. Abdi H, Valentin D. Multiple correspondence analysis. In: Salkind NJ, ed. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage Publications; 2007, 651–657.
50. Hwang H, Tomiuk MA, Takane Y. Correspondence analysis, multiple correspondence analysis and recent developments. In: Millsap R, Maydeu-Olivares A, eds. Handbook of Quantitative Methods in Psychology. London: Sage Publications; 2009, 243–263.
51. Abdi H, Williams LJ. Correspondence analysis. In: Salkind NJ, ed. Encyclopedia of Research Design. Thousand Oaks, CA: Sage Publications; 2010 (in press).
52. Brunet E. Faut-il pondérer les données linguistiques? CUMFID 1989, 16:39–50.
53. Escofier B, Pagès J. Analyses factorielles simples et multiples: objectifs, méthodes, interprétation. Paris: Dunod; 1990.
54. Escofier B, Pagès J. Multiple factor analysis. Comput Stat Data Anal 1994, 18:121–140.
55. Abdi H, Valentin D. Multiple factor analysis (MFA). In: Salkind NJ, ed. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage Publications; 2007, 657–663.
56. Diamantaras KI, Kung SY. Principal Component Neural Networks: Theory and Applications. New York: John Wiley & Sons; 1996.
57. Abdi H, Valentin D, Edelman B. Neural Networks. Thousand Oaks, CA: Sage; 1999.
FURTHER READING
Basilevsky A. Applied Matrix Algebra in the Statistical Sciences. New York: Elsevier; 1983.