Principal Component Analysis
1 INTRODUCTION
One of the main problems inherent in statistics with more than two variables is visualizing and interpreting the data. Fortunately, the problem can often be simplified by replacing a group of variables with a single new variable. The reason might be that more than one variable is measuring the same driving principle governing the behaviour of the system. One of the methods for reducing the number of variables is Principal Component Analysis (PCA).
The purpose of this report is to give a precise mathematical definition of PCA and then to present its mathematical derivation.
The method creates a new set of variables called principal components. Each of the new variables is a linear combination of the original variables. Each principal component is chosen so that it describes most of the still-remaining variance, and all principal components are orthogonal to each other; hence there is no redundant information. The first principal component has the maximum variance among all possible choices. (The MathWorks, 2010) (Jolliffe, 1986)
PCA is used for different purposes: finding interrelations between the variables in the data, interpreting and visualizing data, decreasing the number of variables to make further analysis simpler, and many other similar reasons.
The definition and derivation of principal component analysis are described. Along the way, the method of Lagrange multipliers for finding the maximum of a function subject to constraints, as well as eigenvalues and eigenvectors, are explained, because these ideas are needed in the derivation.
2.1 DEFINITION OF PRINCIPAL COMPONENTS

Suppose that $x$ is a vector of $r$ random variables,
$$x = [x_1, x_2, \ldots, x_r]^T.$$
The first step is to look at the linear function $\alpha_1^T x$ of the elements of $x$ which has maximum variance, where $\alpha_1$ is a vector of $r$ constants $\alpha_{11}, \alpha_{12}, \ldots, \alpha_{1r}$, so that
$$\alpha_1^T x = \alpha_{11} x_1 + \alpha_{12} x_2 + \cdots + \alpha_{1r} x_r = \sum_{j=1}^{r} \alpha_{1j} x_j.$$
There must be some constraints imposed, otherwise the variance is unbounded. In the current paper $\alpha_1^T \alpha_1 = 1$ is used, which means that the sum of squares of the elements of $\alpha_1$ is 1, i.e. the length of $\alpha_1$ is 1.
So, the aim is to find the linear function $\alpha_1^T x$ that transforms the random variables $x_1, \ldots, x_r$ into a new random variable so that the new variable has maximum variance.
Repeating the process, each time requiring the new linear combination to be uncorrelated with the previous ones, creates $r$ new random variables $\alpha_1^T x, \alpha_2^T x, \ldots, \alpha_r^T x$, which are called principal components. In general, the hope is that only the first few of these random variables are needed to explain the necessary amount of variability in the dataset.
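As a quick illustration of this hope, the sketch below (my own synthetic data; scikit-learn is assumed to be available, although the report itself does not rely on any particular software) shows that the first principal component of three correlated variables already carries most of the variance:

```python
# Minimal sketch (assumptions: scikit-learn is available; the data are synthetic).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Three correlated variables: the second and third are noisy copies of the first
# (the third with flipped sign).
x1 = rng.uniform(-1, 1, size=200)
X = np.column_stack([x1,
                     x1 + 0.2 * rng.normal(size=200),
                     -x1 + 0.2 * rng.normal(size=200)])

pca = PCA()   # keeps all components by default
pca.fit(X)
# The first ratio is close to 1: one component explains most of the variability.
print(pca.explained_variance_ratio_)
```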
2.1.1 EXAMPLE
In the simplest case, there is a pair of random variables $x_1$ and $x_2$ which are highly correlated. In this case one might want to reduce the two variables to only one, so that it would be easier to conduct further analysis. Figure 1 shows 30 such pairs $(x_1, x_2)$ of random variables, where $x_1$ is a random number from the interval $(-1, 1)$ and $x_2$ is $x_1$ plus 0.4 times a random number from the interval $(-1, 1)$.
Figure 1 - Scatter plot of the 30 $(x_1, x_2)$ pairs, with the directions of the first (PC 1) and second (PC 2) principal components drawn through the data; both axes run from -1 to 1.
A strong correlation is visible. Principal component analysis tries to find the first principal component, which explains most of the variance in the dataset. In this case it is clear that most of the variance is retained if the new random variable (the first principal component) lies along the direction labelled PC 1 in the figure. This new random variable explains most of the variation in the data set and could be used for further analysis instead of the original variables.
With two random variables the method looks similar to a regression model, but there is a difference: the first principal component is chosen so that the perpendicular distances from the sample points to the new axis are as small as possible, whereas in regression analysis the vertical distances are made as small as possible.
In reality, the random variables $x_1$ and $x_2$ can have some meaning as well. For example, $x_1$ might be standardized mathematics exam scores and $x_2$ might be standardized physics exam scores. In that case it would be possible to conclude that the new variable (the first principal component) might account for some general logical ability, while the remaining variation could be interpreted as some other factor.
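A minimal sketch of this example in Python (assumptions of this sketch: the random numbers are taken uniformly from the interval (-1, 1), which matches the axes of Figure 1, and numpy is available):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x1 = rng.uniform(-1, 1, size=n)                # first random variable
x2 = x1 + 0.4 * rng.uniform(-1, 1, size=n)     # highly correlated second variable
X = np.column_stack([x1, x2])

# Centre the data and form the sample covariance matrix.
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (n - 1)

# The first principal component direction is the eigenvector of the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(S)           # eigh returns eigenvalues in ascending order
pc1 = eigvecs[:, -1]
# Up to sign, pc1 is roughly parallel to the diagonal direction (1, 1)/sqrt(2) in Figure 1.
print("first PC direction:", pc1)
```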
The following part shows how to find those principal components. The basic structure of the definition and derivation follows I. T. Jolliffe’s (1986) book “Principal Component Analysis”.
It is assumed that the covariance matrix of the random variables is known; it is denoted by $\Sigma$. $\Sigma$ is a non-singular symmetric matrix of dimension $r \times r$. $\Sigma$ is also positive semi-definite, which means that all its eigenvalues are non-negative. The element $(i, j)$ of the matrix shows the covariance between $x_i$ and $x_j$ in case $i \neq j$. The elements $(i, i)$ on the diagonal show the variance of the element $x_i$. So,
$$\Sigma = \begin{bmatrix}
E[(x_1 - \mu_1)(x_1 - \mu_1)] & E[(x_1 - \mu_1)(x_2 - \mu_2)] & \cdots & E[(x_1 - \mu_1)(x_r - \mu_r)] \\
E[(x_2 - \mu_2)(x_1 - \mu_1)] & E[(x_2 - \mu_2)(x_2 - \mu_2)] & \cdots & E[(x_2 - \mu_2)(x_r - \mu_r)] \\
\vdots & \vdots & \ddots & \vdots \\
E[(x_r - \mu_r)(x_1 - \mu_1)] & E[(x_r - \mu_r)(x_2 - \mu_2)] & \cdots & E[(x_r - \mu_r)(x_r - \mu_r)]
\end{bmatrix},$$
where $E[x_i]$ is the expected value of $x_i$ and $\mu_i$ is the mean of $x_i$. In this report the mean is assumed to be 0 because it can be subtracted from the data before the analysis.
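As a small sketch of this definition (numpy assumed, synthetic data of my own choosing), the covariance matrix can be estimated from centred data and compared with numpy's built-in estimator:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))          # 100 observations of r = 3 random variables

mu = X.mean(axis=0)                    # estimate of the mean of each variable
Xc = X - mu                            # subtract the mean, as assumed in the report
Sigma = Xc.T @ Xc / (X.shape[0] - 1)   # element (i, j) estimates E[(x_i - mu_i)(x_j - mu_j)]

# np.cov expects variables in columns when rowvar=False, so the two estimates agree.
assert np.allclose(Sigma, np.cov(X, rowvar=False))
```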
Finding the principal components is reduced to finding the eigenvalues and eigenvectors of the matrix $\Sigma$ such that the $k$th principal component is given by $\alpha_k^T x$. Here $\alpha_k$ is an eigenvector of $\Sigma$, which corresponds to the $k$th largest eigenvalue $\lambda_k$. In addition to this, the variance of $\alpha_k^T x$ is $\lambda_k$ because $\alpha_k$ is chosen to be of unit length. (Jolliffe, 1986)
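This central claim can be checked numerically. The following sketch (numpy assumed; the covariance structure is an arbitrary illustrative choice, not taken from the report) sorts the eigenpairs of an estimated $\Sigma$ and confirms that the variance of the $k$th principal component is approximately $\lambda_k$:

```python
import numpy as np

rng = np.random.default_rng(3)
# Draw correlated data with a known covariance structure (illustrative choice).
X = rng.multivariate_normal(mean=[0, 0, 0],
                            cov=[[2.0, 0.8, 0.3],
                                 [0.8, 1.0, 0.2],
                                 [0.3, 0.2, 0.5]],
                            size=5000)
Sigma = np.cov(X, rowvar=False)

# Eigenvectors of Sigma, sorted by decreasing eigenvalue, give the principal components.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = (X - X.mean(axis=0)) @ eigvecs   # k-th column is the k-th principal component
print(np.var(scores, axis=0, ddof=1))     # approximately equal to eigvals
print(eigvals)
```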
Before this result is derived, two topics must be explained. Firstly, eigenvalues and eigenvectors are described together with an example, and then the method of Lagrange multipliers is explained.
An eigenvector is a non-zero vector that stays parallel after matrix multiplication, i.e. $v$ is an eigenvector of dimension $n$ of a matrix $A$ with dimension $n \times n$ if $Av$ and $v$ are parallel. Parallel means that there exists a scalar $\lambda$ (the eigenvalue) such that $Av = \lambda v$. (Roberts, 1985)
The eigenvalues $\lambda$ of $A$ can be found as the roots of the characteristic equation $\det(A - \lambda I) = 0$, where $I$ is the $n \times n$ identity matrix. After finding all the eigenvalues $\lambda_k$, all the corresponding eigenvectors can be found by solving the standard linear matrix equation $(A - \lambda_k I)\,v = 0$. (Strang, 1999)
For example, let $A$ be a $2 \times 2$ matrix. Its characteristic equation
$$\det(A - \lambda I) = \lambda^2 - \operatorname{tr}(A)\,\lambda + \det(A) = 0$$
is a quadratic equation in $\lambda$ whose two roots are the eigenvalues $\lambda_1$ and $\lambda_2$. Substituting each eigenvalue back into
$$(A - \lambda_k I)\,v_k = 0$$
and solving the resulting linear system gives the corresponding eigenvectors $v_1$ and $v_2$. Therefore, two pairs of eigenvalues and eigenvectors have been found as required.
In practice eigenvalues are computed by better algorithms than finding the roots of
algebraic equations.
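As a concrete numerical illustration (the matrix below is my own example, not the one used in the original report), the eigenvalues of a small matrix can be obtained from the roots of the characteristic polynomial and cross-checked against a library routine:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])            # illustrative 2x2 matrix (not from the report)

# Characteristic polynomial: lambda^2 - trace(A)*lambda + det(A) = 0.
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
roots = np.roots(coeffs)
print("eigenvalues from the characteristic polynomial:", np.sort(roots))   # [2. 5.]

# A library routine gives the same eigenvalues plus the eigenvectors.
eigvals, eigvecs = np.linalg.eig(A)
print("eigenvalues from np.linalg.eig:", np.sort(eigvals))
for k in range(2):
    v = eigvecs[:, k]
    assert np.allclose(A @ v, eigvals[k] * v)   # A v = lambda v: A v stays parallel to v
```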
Sometimes it is necessary to find the maximum or minimum of a function that depends upon several variables whose values must satisfy certain equalities, i.e. constraints. In this report, it is necessary to find principal components which are linear combinations of the original
random variables, such that the length of the vector that represents the linear combination is 1 and such that the resulting new variables are uncorrelated with each other. The idea is to change the constrained optimisation problem into an unconstrained one (Gowers, 2008).
Figure 2 - Find x and y to maximize f(x,y) subject to a constraint (shown in red) g(x,y) = c. Source: (Lagrange multipliers)
Figure 3 - Contour map of Figure 2. The red line shows the constraint g(x,y) = c. The blue lines are contours of f(x,y). The point where the red line tangentially touches a blue contour is the solution. Source: (Lagrange multipliers)
A new variable $\lambda$, called a Lagrange multiplier, is introduced, and the Lagrange function is defined by
$$\Lambda(x, y, \lambda) = f(x, y) + \lambda\,\big(g(x, y) - c\big).$$
The solution of the constrained problem is found among the stationary points of $\Lambda$, i.e. the points where
$$\nabla_{x, y, \lambda}\, \Lambda(x, y, \lambda) = 0.$$
The Lagrange multiplier method gives necessary conditions for finding the maximum points of a function subject to constraints.
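A small worked sketch of the method (the function and constraint are illustrative choices of mine; sympy is assumed to be available) maximizes f(x, y) = x + y subject to x² + y² = 1 by finding the stationary points of the Lagrange function:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)

f = x + y                 # function to maximize
g = x**2 + y**2 - 1       # constraint g(x, y) = c rewritten as g(x, y) - c = 0

# Lagrange function and its stationary points.
L = f + lam * g
stationary = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)

for s in stationary:
    print(s, "f =", f.subs(s))
# Two stationary points at x = y = +-1/sqrt(2); the constrained maximum is f = sqrt(2).
```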
The quantity to be maximized for the first principal component is
$$\operatorname{var}(\alpha_1^T x) = E\big[(\alpha_1^T x)(\alpha_1^T x)^T\big] = E\big[\alpha_1^T x\, x^T \alpha_1\big] = \alpha_1^T E[x x^T]\,\alpha_1 = \alpha_1^T \Sigma \alpha_1,$$
using the assumption that the mean of $x$ is 0. The technique of Lagrange multipliers is used. So it is necessary to maximize
$$\alpha_1^T \Sigma \alpha_1 - \lambda\,(\alpha_1^T \alpha_1 - 1).$$
Differentiating with respect to $\alpha_1$ and setting the result equal to zero gives
$$\Sigma \alpha_1 - \lambda \alpha_1 = 0, \quad \text{i.e.} \quad (\Sigma - \lambda I_r)\,\alpha_1 = 0,$$
where $I_r$ is the $r \times r$ identity matrix. Thus $\lambda$ is an eigenvalue of $\Sigma$ and $\alpha_1$ is the corresponding eigenvector.
Next, it is necessary to decide which of the eigenvectors gives the maximizing value for the first principal component. It is necessary to maximize $\alpha_1^T \Sigma \alpha_1$. Let $\alpha_1$ be any eigenvector of $\Sigma$ and $\lambda$ be the corresponding eigenvalue. We have
$$\alpha_1^T \Sigma \alpha_1 = \alpha_1^T (\lambda \alpha_1) = \lambda\, \alpha_1^T \alpha_1 = \lambda.$$
As $\lambda$ must be the largest possible, $\alpha_1$ must be the eigenvector which corresponds to the largest eigenvalue $\lambda_1$.
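This can also be verified numerically: in the sketch below (numpy assumed, with an arbitrary covariance matrix chosen for illustration) no randomly drawn unit vector achieves a larger value of $\alpha^T \Sigma \alpha$ than the eigenvector belonging to the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(4)
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.0, 0.2],
                  [0.3, 0.2, 0.5]])          # illustrative covariance matrix

eigvals, eigvecs = np.linalg.eigh(Sigma)
lambda_max = eigvals[-1]
alpha_1 = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
print(alpha_1 @ Sigma @ alpha_1, lambda_max) # equal: alpha' Sigma alpha = lambda

# Compare against many random unit-length vectors satisfying a' a = 1.
for _ in range(10000):
    a = rng.normal(size=3)
    a /= np.linalg.norm(a)
    assert a @ Sigma @ a <= lambda_max + 1e-12
```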
The first principal component has now been derived. The same process can be applied to the others, such that the $k$th principal component of $x$ is $\alpha_k^T x$ and the variance of $\alpha_k^T x$ is $\lambda_k$, where $\lambda_k$ is the $k$th largest eigenvalue of $\Sigma$ with the corresponding eigenvector $\alpha_k$, for $k = 2, \ldots, r$. The derivation is shown for the second principal component $\alpha_2^T x$, which must again have unit length and must be uncorrelated with the first principal component. The latter constraint can be written as
$$\operatorname{cov}(\alpha_1^T x, \alpha_2^T x) = E\big[(\alpha_1^T x)(\alpha_2^T x)^T\big] = \alpha_1^T \Sigma \alpha_2 = \lambda_1\, \alpha_1^T \alpha_2 = 0,$$
where $\operatorname{cov}(\alpha_1^T x, \alpha_2^T x)$ denotes the covariance between $\alpha_1^T x$ and $\alpha_2^T x$, and $\Sigma \alpha_1 = \lambda_1 \alpha_1$ is already known from the derivation of the first principal component. So, $\alpha_2^T \Sigma \alpha_2$ must be maximized with the following constraints: $\alpha_2^T \alpha_2 = 1$ and $\alpha_1^T \alpha_2 = 0$. The method of Lagrange multipliers is used.
The Lagrange function for this problem is
$$\alpha_2^T \Sigma \alpha_2 - \lambda\,(\alpha_2^T \alpha_2 - 1) - \phi\, \alpha_1^T \alpha_2,$$
and differentiating with respect to $\alpha_2$ and applying the constraints gives $\phi = 0$ and
$$\Sigma \alpha_2 - \lambda \alpha_2 = 0,$$
where $\lambda$ is an eigenvalue and $\alpha_2$ the corresponding eigenvector of $\Sigma$. As before, $\alpha_2^T \Sigma \alpha_2 = \lambda$, so $\lambda = \lambda_2$, because $\lambda$ must be as big as possible and due to the correlation constraint it must not be equal to $\lambda_1$ with the eigenvector $\alpha_1$.
As said before, the method applies to all of the principal components; the proof is similar but is not given in this report.
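As a closing sketch (synthetic data, numpy assumed), the full set of principal components can be computed at once; the covariance matrix of the scores is diagonal, which confirms that the principal components are uncorrelated and that their variances are the eigenvalues of $\Sigma$ in decreasing order:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))   # correlated synthetic data
Xc = X - X.mean(axis=0)
Sigma = np.cov(Xc, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]            # sort eigenpairs by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = Xc @ eigvecs                             # all r principal components at once
C = np.cov(Z, rowvar=False)
print(np.round(C, 6))                        # diagonal matrix: the PCs are uncorrelated
print(eigvals)                               # the diagonal entries, largest first
```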
3 WORKS CITED
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2, 559-572.
Strang, G. (1999). MIT video lectures in linear algebra. Retrieved April 16, 2010, from MIT Open Courseware: http://ocw.mit.edu/OcwWeb/Mathematics/18-06Spring-2005/VideoLectures/detail/lecture21.htm