Principal Component Analysis: by Eesha Tur Razia Babar
PRINCIPAL COMPONENT ANALYSIS
BY EESHA TUR RAZIA BABAR
2 PCA
• The goal is to transform a high-dimensional dataset (one with a large number of features) into a low-dimensional one (with a smaller number of features) without losing too much information.
• Datasets can include images or simple structured (tabular) data.
• PCA helps deal with the curse of dimensionality, which leads to complex models and makes data hard to visualize.
• It also helps remove multicollinearity, a situation in which some input features are correlated with each other and provide redundant information.
• In short, PCA reduces the number of features/variables of a dataset while preserving as much information as possible (a minimal usage sketch follows below).
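For concreteness, here is a minimal sketch of this idea in Python using scikit-learn; the data, variable names, and the choice of two components are illustrative assumptions, not taken from these slides.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative data only: 100 samples with 8 features, two of them redundant.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)   # built-in multicollinearity

X_std = StandardScaler().fit_transform(X)        # put all features on the same scale
pca = PCA(n_components=2)                        # keep only 2 new features
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                  # (100, 2): fewer features
print(pca.explained_variance_ratio_)    # how much information each component keeps
```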
3 HOW PCA WORKS
4 PCA STEPS
5 STEP 1: NORMALIZATION
• After normalization, all the variables will be transformed to the same scale (a standardization sketch follows below).
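A minimal standardization sketch in Python/NumPy; the data values here are made up for illustration.

```python
import numpy as np

# Hypothetical raw data: rows are samples, columns are variables on different scales.
X = np.array([[170.0, 65.0],
              [160.0, 58.0],
              [180.0, 75.0],
              [175.0, 68.0]])

# z-score standardization: subtract each column's mean and divide by its standard
# deviation, so every variable ends up with mean 0 and standard deviation 1.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0).round(6))   # ~[0, 0]
print(X_std.std(axis=0))             # [1, 1]
```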
6 STEP 2: COVARIANCE MATRIX COMPUTATION
• The goal of this step is to understand whether there is any relationship between the input variables.
• Correlated variables = redundant information (a computation sketch follows below).
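A sketch of the covariance computation in NumPy on synthetic data; the variables and the injected correlation are assumptions for illustration.

```python
import numpy as np

# Synthetic data with one deliberately redundant feature.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 0] + 0.2 * rng.normal(size=50)    # feature 2 largely duplicates feature 0

# Standardize, then compute the covariance matrix of the variables
# (rowvar=False tells NumPy that columns, not rows, are the variables).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
C = np.cov(X_std, rowvar=False)

print(C.round(2))   # large off-diagonal entries flag correlated (redundant) variables
```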
• Eigenvectors and eigenvalues are the linear algebra concepts that we need to compute from the
covariance matrix
• In this context, the eigenvectors are also called principal components.
• Principal components are new variables constructed as linear combinations, or mixtures, of the initial variables.
• The combinations are done in such a way that the new variables (i.e., the principal components) are uncorrelated.
• Most of the information within the initial variables is squeezed or compressed into the first components (the sketch below checks both of these properties).
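The following NumPy sketch, on made-up data, checks both claims: the principal component scores are uncorrelated, and the first component carries most of the variance.

```python
import numpy as np

# Synthetic correlated data.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [1.0, 1.0]])
X = X - X.mean(axis=0)

C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)          # eigh: symmetric matrices, ascending order
order = np.argsort(eigvals)[::-1]             # reorder: highest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = X @ eigvecs                               # principal component scores

print(np.cov(Z, rowvar=False).round(3))       # ~diagonal: the new variables are uncorrelated
print((eigvals / eigvals.sum()).round(3))     # first component holds most of the variance
```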
10 SCREE PLOT
11 NOTE THAT
• Principal components are less interpretable and don’t have any real meaning since they
are constructed as linear combinations of the initial variables.
• Geometrically, principal components represent the directions of the data that explain
a maximal amount of variance.
• In simple terms, they are the lines that capture most of the information in the data.
12 HOW PCA CONSTRUCTS THE PRINCIPAL
COMPONENTS
• There are as many principal components as there are variables in the data.
• The first principal component accounts for the largest possible variance, the second for the largest remaining variance, and so on.
13 EIGENVECTORS AND EIGENVALUES
• Eigenvectors of the covariance matrix are the directions of the axes along which there is the most variance (the most information); we call these directions the principal components.
• Eigenvalues are simply the coefficients attached to the eigenvectors, and they give the amount of variance carried by each principal component.
• We construct a total of n principal components, where n is the number of dimensions of the dataset.
• By sorting the principal components by their eigenvalues, highest to lowest, you get the principal components in order of significance.
• In the last step, we decide how many principal components to keep, choose the ones of greater significance, and arrange their eigenvectors into a matrix that we call the feature vector. These are our features in the new, lower-dimensional space (see the sketch below).
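A NumPy sketch of this procedure on synthetic data: eigendecompose the covariance matrix, sort by eigenvalue, keep the top k eigenvectors as the feature vector matrix, and project the data onto it. All names and values are illustrative.

```python
import numpy as np

# Synthetic data: 100 samples, 4 features, one of them largely redundant.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
X[:, 3] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=100)
X_centered = X - X.mean(axis=0)

C = np.cov(X_centered, rowvar=False)           # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)           # eigenvalues/eigenvectors (ascending)

order = np.argsort(eigvals)[::-1]              # sort: highest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                          # number of components to keep
feature_vector = eigvecs[:, :k]                # the "feature vector" matrix (d x k)
X_new = X_centered @ feature_vector            # data in the new, lower-dimensional space

print(eigvals.round(3))                        # variance carried by each component
print(X_new.shape)                             # (100, 2)
```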
15 EIGEN VALUES AND EIGEN VECTOR EXAMPLE
16 EXAMPLE 1: STEP 1: FIND MEAN
17 EXAMPLE 1: STEP 2: FIND COVARIANCE
18 EXAMPLE 1: STEP 3: FIND EIGEN VALUE
19 EXAMPLE 1: STEP 4: FIND EIGEN VALUE
20 EXAMPLE 1: STEP 4: FIND EIGEN VALUE
21 EXAMPLE 1: STEP 4: FIND EIGEN VECTOR
22 EXAMPLE 1: STEP 4: FIND PRINCIPAL COMPONENT
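The numbers from the original example slides did not survive in this text, so the sketch below reruns the same steps (find the mean, the covariance matrix, the eigenvalues and eigenvectors, and the principal component) on a small made-up 2-D dataset.

```python
import numpy as np

# Illustrative 2-D dataset (not the values from the slides).
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

# Step 1: find the mean of each variable and subtract it.
mean = X.mean(axis=0)
X_centered = X - mean

# Step 2: find the covariance matrix of the centered data.
C = np.cov(X_centered, rowvar=False)

# Step 3: find the eigenvalues, then the eigenvectors, of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: find the principal component by projecting onto the top eigenvector.
pc1 = X_centered @ eigvecs[:, 0]

print(mean)              # [1.81 1.91]
print(C.round(3))
print(eigvals.round(3))  # larger eigenvalue first
print(pc1.round(3))
```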
DATA AND SCATTER PLOT
SUBTRACTING MEAN
CONT.
27 COVARIANCE MATRIX
28 EIGEN VALUES AND EIGEN VECTORS
29 EIGEN VECTOR: UNIT LENGTH VECTOR
30 SORT BY EIGEN VALUES
31 PRINCIPAL COMPONENTS CALCULATION
32 CONT.
33 INTERPRET THE PRINCIPAL COMPONENTS
34 HOW MANY COMPONENTS SHOULD WE
EXTRACT?
• The motivation for PCA was to reduce the number of features.
• The question arises, “How do we determine how many components to extract?”
• For example, should we retain only the first principal component, as it explains nearly
half the variability? Or, should we retain all eight components, as they explain 100% of
the variability?
• Retaining all eight components does not help us reduce the number of dimensions.
• The answer lies somewhere between these two extremes.
35 HOW MANY COMPONENTS SHOULD WE
EXTRACT?
• The criteria used for deciding how many components to extract are the following:
1. The Eigenvalue Criterion
2. The Proportion of Variance Explained Criterion
3. The Scree Plot Criterion
4. The Minimum Communality Criterion
36 1. THE EIGENVALUE CRITERION
• An eigenvalue of 1 means that the component explains about "one variable's worth" of the variability (with standardized variables, each original variable contributes a variance of exactly 1).
• The rationale for the eigenvalue criterion is that each retained component should explain at least one variable's worth of the variability; therefore, the criterion states that only components with eigenvalues greater than 1 should be retained (a sketch of this rule follows below).
• Note that if there are fewer than 20 variables, the eigenvalue criterion tends to recommend extracting too few components, while if there are more than 50 variables, it may recommend extracting too many.
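A sketch of the eigenvalue criterion in NumPy; the data is synthetic, and the eigenvalues are taken from the correlation matrix, i.e., from standardized variables.

```python
import numpy as np

# Synthetic data: 8 variables, a few of them deliberately redundant.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 8))
X[:, 1] = X[:, 0] + 0.3 * rng.normal(size=200)
X[:, 2] = X[:, 0] - 0.3 * rng.normal(size=200)

# Eigenvalues of the correlation matrix (equivalent to standardizing first).
R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

# Eigenvalue (Kaiser) criterion: keep components with eigenvalue > 1.
n_keep = int((eigvals > 1).sum())

print(eigvals.round(2))
print("components to retain:", n_keep)
```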
37 2. THE PROPORTION OF VARIANCE EXPLAINED
CRITERION
• The analyst specifies how much of the total variability the principal components should account for.
• Components are then selected one by one until the desired proportion of variability explained is attained.
• For example, suppose we would like our components to explain 85% of the variability in the variables (a sketch of this rule follows below).
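A sketch of this criterion; the eigenvalues below are made up, chosen only so that the first component explains close to half the variance, as in the eight-variable example mentioned earlier.

```python
import numpy as np

# Hypothetical eigenvalues for 8 standardized variables, sorted high to low.
eigvals = np.array([3.8, 1.9, 0.9, 0.6, 0.4, 0.2, 0.1, 0.1])

explained = eigvals / eigvals.sum()      # proportion of variance per component
cumulative = np.cumsum(explained)        # running total

target = 0.85                            # analyst's desired proportion
n_keep = int(np.searchsorted(cumulative, target) + 1)

print((cumulative * 100).round(1))            # cumulative % of variance explained
print("components needed for 85%:", n_keep)   # 4 with these illustrative values
```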
38 3. THE SCREE PLOT CRITERION
• A scree plot is a graphical plot of the eigenvalues against the component
number.
• Scree plots are useful for finding an upper bound (maximum) for the number of
components that should be retained.
• Most scree plots look broadly similar in shape, starting high on the left, falling
rather quickly, and then flattening out at some point.
• This is because the first component usually explains much of the variability, the
next few components explain a moderate amount, and the latter components
only explain a small amount of the variability.
• The scree plot criterion is this: the maximum number of components that should be extracted is the number just before the point where the plot first begins to flatten out into a horizontal line (the plotting sketch below illustrates this).
• Sometimes, the curve in a scree plot is so gradual that no such elbow point is
evident; in that case, turn to the other criteria.
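A minimal scree plot sketch with matplotlib, reusing the illustrative eigenvalues from the previous criterion; with real data these would come from the covariance or correlation matrix.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative eigenvalues, sorted from highest to lowest.
eigvals = np.array([3.8, 1.9, 0.9, 0.6, 0.4, 0.2, 0.1, 0.1])
components = np.arange(1, len(eigvals) + 1)

# Scree plot: eigenvalue against component number; extract components up to
# the point just before the curve flattens into a horizontal line.
plt.plot(components, eigvals, marker="o")
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```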