Principal Component Analysis
• Outlier Detection
– Principal Component Analysis (PCA) can be used for
outlier detection.
– Outliers are data points that differ significantly
from the other data points in the dataset.
– PCA can identify them as points that lie far from the
rest of the data in the principal component space, or
that are poorly reconstructed from the retained components.
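As a rough illustration of the outlier-detection idea above, the sketch below scores each point by its reconstruction error after projecting onto the leading components. This is one common variant, not necessarily the one the slides intend. The function name `pca_outlier_scores`, the synthetic data, and the "mean plus three standard deviations" threshold are all assumptions made for the example.

```python
import numpy as np

def pca_outlier_scores(X, n_components):
    """Distance of each row from its projection onto the top components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components]
    scores = Xc @ W.T              # coordinates in principal component space
    reconstruction = scores @ W    # back in the (centered) original space
    return np.linalg.norm(Xc - reconstruction, axis=1)

# Hypothetical data: 200 points close to the line y = 2x, plus one point far off it.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + rng.normal(scale=0.1, size=200)])
X = np.vstack([X, [[2.0, -4.0]]])

errors = pca_outlier_scores(X, n_components=1)
threshold = errors.mean() + 3 * errors.std()   # illustrative cut-off, not a rule
outliers = np.where(errors > threshold)[0]
print(outliers)
```

With this data the appended point (index 200) should have by far the largest error. In practice the threshold would need to be chosen for the dataset at hand.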
Disadvantages of Principal Component Analysis
• Interpretation of Principal Components
– Each principal component is a linear combination of
all the original variables, so it is often difficult
to say what a component means in terms of those
variables.
– This can make it difficult to explain the results of PCA to
others.
• Data Scaling
– Principal Component Analysis is sensitive to the scale of
the data. Variables measured on larger scales have larger
variances and can dominate the principal components.
– Therefore, it is important to scale the data (for example,
standardize each variable to zero mean and unit variance)
before applying Principal Component Analysis.
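The effect of scale can be seen in a small sketch. It assumes two made-up, correlated variables, `small` and `large`, whose units differ by several orders of magnitude, and compares the first principal direction before and after standardization. The helper `first_component` is hypothetical.

```python
import numpy as np

def first_component(X):
    """Unit vector of the first principal direction of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

# Hypothetical data: two related variables on very different scales.
rng = np.random.default_rng(0)
small = rng.normal(0.0, 0.1, size=500)
large = 50_000 * small + rng.normal(0.0, 5_000, size=500)
X = np.column_stack([small, large])

raw_pc1 = first_component(X)

# Standardize each column to zero mean and unit variance, then repeat.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
std_pc1 = first_component(X_std)

print("raw loadings:         ", np.abs(raw_pc1))
print("standardized loadings:", np.abs(std_pc1))
```

On the raw data the first component is almost entirely the large-scale variable. After standardization both variables contribute with equal weight.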
Disadvantages of Principal Component Analysis (continued)
• Information Loss
– Principal Component Analysis reduces the number of variables
by discarding the components with the smallest variance, so
some information is lost.
– The degree of information loss depends on the number of
principal components retained.
– Therefore, it is important to select the number of components
carefully, for example by checking the cumulative explained
variance.
• Non-linear Relationships
– Principal Component Analysis assumes that the relationships
between variables are linear.
– If the relationships are non-linear, PCA may not capture
them: each component is a linear combination of the variables,
so curved structure cannot be summarized by a few components.
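The information-loss point above can be made concrete with the explained variance ratio: the share of total variance carried by each component. The sketch below assumes a 95% target, which is a common but arbitrary choice, along with synthetic data and the hypothetical helpers `explained_variance_ratio` and `components_for`.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance carried by each principal component."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    variances = s**2 / (X.shape[0] - 1)
    return variances / variances.sum()

def components_for(X, target=0.95):
    """Smallest number of components whose cumulative ratio reaches target."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(cumulative))            # guard against rounding at target=1.0

# Hypothetical data: 10 observed variables driven by 3 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 10))

ratio = explained_variance_ratio(X)
k = components_for(X, target=0.95)
print("cumulative ratio:", np.round(np.cumsum(ratio), 3))
print("components kept: ", k)
```

The variance not covered by the retained components, one minus the cumulative ratio at `k`, is a simple measure of what is lost. Note that it measures variance only, which is not always the information that matters for a downstream task.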
Disadvantages of Principal Component Analysis (continued)
• Computational Complexity
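– Computing Principal Component Analysis can be
computationally expensive for large datasets.
– This is especially true if the number of variables is
large: for n samples and p variables, forming the covariance
matrix costs on the order of n p^2 operations and its
eigendecomposition on the order of p^3.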
• Overfitting
– Principal Component Analysis is itself unsupervised, but a
model built on its components can overfit, that is, fit the
training data too well and perform poorly on new data.
– This can happen if too many principal components are
retained (low-variance components often capture mostly
noise) or if the components are estimated from a small dataset.