
Lecture 9

PRINCIPAL COMPONENT ANALYSIS

➢ Principal component analysis (PCA) is a popular technique for analyzing
large datasets containing a high number of dimensions/features per
observation,
✓ increasing the interpretability of the data while preserving the
maximum amount of information, and
✓ enabling the visualization of multidimensional data.
➢ Formally, PCA is a statistical technique for reducing the dimensionality of
a dataset.
➢ This is accomplished by linearly transforming the data into a
new coordinate system where (most of) the variation in the data can be
described with fewer dimensions than the initial data.
➢ Many studies use the first two principal components in order
✓ to plot the data in two dimensions and
✓ to visually identify clusters of closely related data points.
➢ Principal component analysis has applications in many fields such
as population genetics, microbiome studies, and atmospheric science.

➢ In data analysis, the first principal component of a set of $p$ variables,
presumed to be jointly normally distributed, is the derived variable formed
as a linear combination of the original variables that explains the most
variance.
➢ The second principal component explains the most variance in what is left
once the effect of the first component is removed, and we may proceed
through $p$ iterations until all the variance is explained.
➢ PCA is most commonly used when
✓ many of the variables are highly correlated with each other and
✓ it is desirable to reduce their number to an independent set.
➢ A principal component analysis is concerned with explaining the variance-
covariance structure of a set of variables through a few linear combinations
of these variables.
➢ Its general objectives are (1) data reduction and (2) interpretation.
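To make these two objectives concrete, here is a minimal numerical sketch of PCA with NumPy. It is our own illustration, not part of the lecture: it simulates data, estimates the covariance matrix, eigendecomposes it, and keeps the first two components. The covariance matrix is borrowed from the worked example later in these notes; all names are our own choices.

```python
import numpy as np

# A minimal PCA sketch (illustrative only). Simulate n observations of
# p = 3 jointly normal variables, using the covariance matrix from the
# worked example later in this lecture.
rng = np.random.default_rng(0)
Sigma = np.array([[1.0, -2.0, 0.0],
                  [-2.0, 5.0, 0.0],
                  [0.0, 0.0, 2.0]])
X = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=1000)

S = np.cov(X, rowvar=False)                # p x p sample covariance matrix
lam, E = np.linalg.eigh(S)                 # eigh returns ascending eigenvalues
lam, E = lam[::-1], E[:, ::-1]             # reorder: lambda_1 >= ... >= lambda_p

scores = (X - X.mean(axis=0)) @ E[:, :2]   # data reduced to the first two PCs
print(lam)                                 # sample estimates of 5.83, 2.00, 0.17
```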

Population Principal Components


➢ Algebraically, principal components are particular linear combinations of
the $p$ random variables $X_1, X_2, \ldots, X_p$.
➢ Geometrically, these linear combinations represent the selection of a new
coordinate system obtained by rotating the original system with
$X_1, X_2, \ldots, X_p$ as the coordinate axes.
➢ The new axes represent the directions with maximum variability and
provide a simpler and more parsimonious description of the covariance
structure.

➢ Principal components depend solely on the covariance matrix $\Sigma$ (or the
correlation matrix $\rho$) of $X_1, X_2, \ldots, X_p$.
➢ Let the random vector $X' = [X_1, X_2, \ldots, X_p]$ have the covariance matrix
$\Sigma$ with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$.

Consider the linear combinations

$$
\begin{aligned}
Y_1 &= a_1' X = a_{11} X_1 + a_{12} X_2 + \cdots + a_{1p} X_p \\
Y_2 &= a_2' X = a_{21} X_1 + a_{22} X_2 + \cdots + a_{2p} X_p \\
&\;\;\vdots \\
Y_p &= a_p' X = a_{p1} X_1 + a_{p2} X_2 + \cdots + a_{pp} X_p
\end{aligned}
\qquad (1)
$$

Then,

$$\mathrm{Var}(Y_i) = a_i' \Sigma a_i, \qquad i = 1, 2, \ldots, p \qquad (2)$$

$$\mathrm{Cov}(Y_i, Y_k) = a_i' \Sigma a_k, \qquad i, k = 1, 2, \ldots, p \qquad (3)$$

➢ The principal components are those uncorrelated linear combinations
$Y_1, Y_2, \ldots, Y_p$ whose variances in (2) are as large as possible.
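Equations (2) and (3) are simply quadratic and bilinear forms in $\Sigma$, which is easy to check numerically. Below is a small sketch of our own, using $\Sigma$ and the coefficient vectors from the worked example at the end of this lecture:

```python
import numpy as np

# Checking (2) and (3): Var(Y_i) = a_i' Sigma a_i, Cov(Y_i, Y_k) = a_i' Sigma a_k.
# Sigma and the coefficient vectors are taken from the worked example below.
Sigma = np.array([[1.0, -2.0, 0.0],
                  [-2.0, 5.0, 0.0],
                  [0.0, 0.0, 2.0]])
a1 = np.array([0.383, -0.924, 0.0])
a2 = np.array([0.0, 0.0, 1.0])

print(a1 @ Sigma @ a1)   # Var(Y1)      -> approx 5.83
print(a2 @ Sigma @ a2)   # Var(Y2)      -> 2.0
print(a1 @ Sigma @ a2)   # Cov(Y1, Y2)  -> 0.0
```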

➢ The first principal component is the linear combination with maximum
variance.
✓ That is, it maximizes $\mathrm{Var}(Y_1) = a_1' \Sigma a_1$.
✓ It is clear that $\mathrm{Var}(Y_i) = a_i' \Sigma a_i$ can be increased by multiplying any
$a_i$ by some constant.
✓ To eliminate this indeterminacy, it is convenient to restrict attention to
coefficient vectors of unit length. We therefore define

First principal component = linear combination $a_1' X$ that maximizes
$\mathrm{Var}(a_1' X)$ subject to $a_1' a_1 = 1$

Second principal component = linear combination $a_2' X$ that maximizes
$\mathrm{Var}(a_2' X)$ subject to $a_2' a_2 = 1$ and
$\mathrm{Cov}(a_1' X, a_2' X) = 0$

At the $i$th step,

$i$th principal component = linear combination $a_i' X$ that maximizes
$\mathrm{Var}(a_i' X)$ subject to $a_i' a_i = 1$ and
$\mathrm{Cov}(a_i' X, a_k' X) = 0$ for $k < i$
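One way to see this constrained maximization in action is power iteration, a technique not covered in the lecture itself: repeatedly applying $\Sigma$ to a unit vector converges to the direction maximizing $a' \Sigma a$, provided the largest eigenvalue is unique. A sketch of our own:

```python
import numpy as np

# Power iteration: a numerical illustration of maximizing a' Sigma a subject
# to a'a = 1 (assumes lambda_1 is strictly larger than lambda_2).
Sigma = np.array([[1.0, -2.0, 0.0],
                  [-2.0, 5.0, 0.0],
                  [0.0, 0.0, 2.0]])
a = np.ones(3)                    # arbitrary starting vector
for _ in range(100):
    a = Sigma @ a
    a /= np.linalg.norm(a)        # renormalize so that a'a = 1
print(a)                          # approx [.383, -.924, 0], up to sign
print(a @ Sigma @ a)              # the maximized variance, approx 5.83
```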

Theorem: Let $\Sigma$ be the covariance matrix associated with the random vector
$X' = [X_1, X_2, \ldots, X_p]$. Let $\Sigma$ have the eigenvalue-eigenvector pairs
$(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_p, e_p)$ where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$. Then the $i$th
principal component is given by
$$Y_i = e_i' X = e_{i1} X_1 + e_{i2} X_2 + \cdots + e_{ip} X_p, \qquad i = 1, 2, \ldots, p$$
With these choices,
$$
\left.
\begin{aligned}
\mathrm{Var}(Y_i) &= e_i' \Sigma e_i = \lambda_i, & i &= 1, 2, \ldots, p \\
\mathrm{Cov}(Y_i, Y_k) &= e_i' \Sigma e_k = 0, & i &\ne k
\end{aligned}
\right\} \qquad (4)
$$
If some $\lambda_i$ are equal, the choices of the corresponding coefficient vectors, $e_i$, and
hence $Y_i$, are not unique.
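The theorem reduces finding all principal components to a single eigendecomposition, and the claims in (4) can be verified numerically for any covariance matrix. A sketch of our own, using a randomly generated positive semidefinite matrix:

```python
import numpy as np

# Verifying (4): with the eigenvectors of Sigma as the columns of E,
# E' Sigma E is diagonal, so Var(Y_i) = lambda_i and Cov(Y_i, Y_k) = 0
# for i != k.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T                       # a random symmetric PSD matrix
lam, E = np.linalg.eigh(Sigma)        # ascending eigenvalues
lam, E = lam[::-1], E[:, ::-1]        # descending: lambda_1 >= ... >= lambda_p
print(np.round(E.T @ Sigma @ E - np.diag(lam), 10))  # all (near) zeros
```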

➢ Total population variance $= \sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp}$
$= \lambda_1 + \lambda_2 + \cdots + \lambda_p \qquad (5)$
➢ Proportion of total population variance due to the $k$th principal component
$$= \frac{\lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_p}, \qquad k = 1, 2, \ldots, p$$
➢ If most (for instance, 80 to 90%) of the total population variance, for large
$p$, can be attributed to the first one, two, or three components, then these
components can "replace" the original $p$ variables without much loss of
information.
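In code, these proportions come straight from the eigenvalues. A sketch of our own, using the eigenvalues from the example below:

```python
import numpy as np

# Proportion of total population variance per component: lambda_k / sum(lambda).
lam = np.array([5.83, 2.00, 0.17])   # eigenvalues from the example below
prop = lam / lam.sum()
print(prop)                          # approx [0.73, 0.25, 0.02]
print(np.cumsum(prop))               # cumulative: first two PCs give approx 0.98
```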

➢ Each component of the coefficient vector $e_i' = [e_{i1}, \ldots, e_{ik}, \ldots, e_{ip}]$
also merits inspection.
✓ The magnitude of $e_{ik}$ measures the importance of the $k$th variable to
the $i$th principal component, irrespective of the other variables.
✓ In particular, $e_{ik}$ is proportional to the correlation coefficient between
$Y_i$ and $X_k$.
$$\rho_{Y_i, X_k} = \frac{e_{ik} \sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \qquad i, k = 1, 2, \ldots, p$$
are the correlation coefficients between the components $Y_i$ and the
variables $X_k$.
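This loadings-to-correlations conversion vectorizes naturally. A sketch of our own, again using the example's covariance matrix; note that np.linalg.eigh may flip eigenvector signs, which does not affect the components:

```python
import numpy as np

# Correlations rho(Y_i, X_k) = e_ik * sqrt(lambda_i) / sqrt(sigma_kk),
# computed for all i, k at once. Sigma is the example matrix below.
Sigma = np.array([[1.0, -2.0, 0.0],
                  [-2.0, 5.0, 0.0],
                  [0.0, 0.0, 2.0]])
lam, E = np.linalg.eigh(Sigma)
lam, E = lam[::-1], E[:, ::-1]            # columns of E are e_1, e_2, e_3
sd = np.sqrt(np.diag(Sigma))              # sqrt(sigma_kk) for each variable
corr = E * np.sqrt(lam) / sd[:, None]     # corr[k, i] = rho(Y_i, X_k)
print(np.round(corr[:, 0], 3))            # approx [.925, -.998, 0], up to sign
```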
Example: (Calculating the population principal components) Suppose the
random variables $X_1, X_2,$ and $X_3$ have covariance matrix
$$\Sigma = \begin{bmatrix} 1 & -2 & 0 \\ -2 & 5 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$
It may be verified that the eigenvalue-eigenvector pairs are
$$\lambda_1 = 5.83, \qquad e_1' = [.383, -.924, 0]$$
$$\lambda_2 = 2.00, \qquad e_2' = [0, 0, 1]$$
$$\lambda_3 = 0.17, \qquad e_3' = [.924, .383, 0]$$
Therefore, the principal components become
$$Y_1 = e_1' X = .383 X_1 - .924 X_2$$
$$Y_2 = e_2' X = X_3$$
$$Y_3 = e_3' X = .924 X_1 + .383 X_2$$
The variable $X_3$ is one of the principal components, because it is uncorrelated
with the other two variables. Equation (4) can be demonstrated from first principles.
For example, using $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$,
$$
\begin{aligned}
\mathrm{Var}(Y_1) &= \mathrm{Var}(.383 X_1 - .924 X_2) \\
&= (.383)^2 \mathrm{Var}(X_1) + (.924)^2 \mathrm{Var}(X_2) + 2(.383)(-.924)\,\mathrm{Cov}(X_1, X_2) \\
&= .147(1) + .854(5) - .708(-2) \\
&= 5.83 = \lambda_1
\end{aligned}
$$
$$\mathrm{Var}(Y_2) = \mathrm{Var}(X_3) = 2 = \lambda_2$$
$$
\begin{aligned}
\mathrm{Var}(Y_3) &= \mathrm{Var}(.924 X_1 + .383 X_2) \\
&= (.924)^2 \mathrm{Var}(X_1) + (.383)^2 \mathrm{Var}(X_2) + 2(.924)(.383)\,\mathrm{Cov}(X_1, X_2) \\
&= .854(1) + .147(5) + .708(-2) \\
&= .854 + .735 - 1.416 \\
&= .17 = \lambda_3
\end{aligned}
$$
$$
\begin{aligned}
\mathrm{Cov}(Y_1, Y_2) &= \mathrm{Cov}(.383 X_1 - .924 X_2, X_3) \\
&= .383\,\mathrm{Cov}(X_1, X_3) - .924\,\mathrm{Cov}(X_2, X_3) \\
&= .383(0) - .924(0) = 0
\end{aligned}
$$
It is readily apparent that
$$\sigma_{11} + \sigma_{22} + \sigma_{33} = 1 + 5 + 2 = 8$$
and also
$$\lambda_1 + \lambda_2 + \lambda_3 = 5.83 + 2.00 + .17 = 8,$$
validating Equation (5).
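These hand computations can be cross-checked in a few lines. A sketch of our own:

```python
import numpy as np

# Cross-checking the example: eigenvalues of Sigma and the trace identity (5).
Sigma = np.array([[1.0, -2.0, 0.0],
                  [-2.0, 5.0, 0.0],
                  [0.0, 0.0, 2.0]])
lam, E = np.linalg.eigh(Sigma)
print(np.sort(lam)[::-1])          # [5.8284, 2.0, 0.1716] -> 5.83, 2.00, 0.17
print(np.trace(Sigma), lam.sum())  # both equal 8, validating Equation (5)
```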

The proportion of total variance accounted for by the first principal component is
$$\frac{\lambda_1}{\lambda_1 + \lambda_2 + \lambda_3} = \frac{5.83}{8} = .73$$
Further, the first two components account for a proportion $(5.83 + 2)/8 = .98$ of the
population variance. In this case, the components $Y_1$ and $Y_2$ could replace the
original three variables with little loss of information.

Next, we obtain
$$\rho_{Y_1, X_1} = \frac{e_{11} \sqrt{\lambda_1}}{\sqrt{\sigma_{11}}} = \frac{.383 \sqrt{5.83}}{\sqrt{1}} = .925$$
$$\rho_{Y_1, X_2} = \frac{e_{12} \sqrt{\lambda_1}}{\sqrt{\sigma_{22}}} = \frac{-.924 \sqrt{5.83}}{\sqrt{5}} = -.998$$
➢ Notice here that the variable $X_2$, with coefficient $-.924$, receives the greatest
weight in the component $Y_1$.
➢ It also has the largest correlation (in absolute value) with $Y_1$.
➢ The correlation of $X_1$ with $Y_1$, .925, is almost as large as that for $X_2$, indicating
that the variables are about equally important to the first principal component.
➢ The relative sizes of the coefficients of $X_1$ and $X_2$ suggest, however, that $X_2$
contributes more to the determination of $Y_1$ than does $X_1$.
➢ Since, in this case, both coefficients are reasonably large and they have
opposite signs, we would argue that both variables aid in the interpretation of
$Y_1$.
Finally,
$$\rho_{Y_2, X_1} = \rho_{Y_2, X_2} = 0 \quad \text{and} \quad \rho_{Y_2, X_3} = \frac{e_{23} \sqrt{\lambda_2}}{\sqrt{\sigma_{33}}} = \frac{1 \cdot \sqrt{2}}{\sqrt{2}} = 1 \quad \text{(as it should)}$$
The remaining correlations can be neglected, since the third component is
unimportant.
