Principal Component Analysis and Cluster Analysis
1. Covariance Structure:
○ Covariance measures the extent to which two variables change
together. In PCA, we are interested in understanding the
relationships between variables in terms of their covariance.
○ PCA seeks to transform the data into new coordinates (principal
components) where these covariances are zero, meaning the
components are uncorrelated.
2. Orthogonal Transformation:
○ PCA applies an orthogonal transformation to the dataset to achieve
decorrelation. Each new axis is orthogonal (at a right angle) to the
others, which eliminates redundancy in the data.
● Now let us consider how this applies to the covariance matrix in the PCA
process. Let Σ be an n × n covariance matrix. There exists an orthogonal n × n
matrix Φ whose columns are eigenvectors of Σ, and a diagonal matrix Λ
whose diagonal elements are the eigenvalues of Σ, such that:

Φᵀ Σ Φ = Λ
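To make this concrete, here is a minimal NumPy sketch (the data and names such as X and scores are illustrative assumptions, not from the source) that diagonalizes a sample covariance matrix with its eigenvector matrix and verifies that the resulting components are uncorrelated:

    import numpy as np

    # Minimal sketch: diagonalize a sample covariance matrix Sigma with its
    # orthogonal eigenvector matrix Phi, so that Phi.T @ Sigma @ Phi = Lambda.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    X[:, 1] += 0.8 * X[:, 0]                  # introduce correlation

    Xc = X - X.mean(axis=0)                   # PCA works on mean-centered data
    Sigma = np.cov(Xc, rowvar=False)          # n x n covariance matrix

    eigenvalues, Phi = np.linalg.eigh(Sigma)  # Phi orthogonal (Sigma symmetric)
    Lambda = Phi.T @ Sigma @ Phi              # numerically diagonal
    print(np.round(Lambda, 6))                # off-diagonals ~ 0: decorrelated

    # Projecting onto the eigenvectors yields the principal components,
    # whose covariance matrix is Lambda itself.
    scores = Xc @ Phi
    print(np.round(np.cov(scores, rowvar=False), 6))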
Limitations of PCA
● Linear Assumption: PCA assumes that data variation is linear, which may
not always hold, especially in complex datasets.
● Sensitivity to Scaling: PCA is sensitive to the scale of the data. It is
essential to standardize the data to avoid misleading principal component
results (see the sketch after this list).
● Loss of Interpretability: Reduced dimensions might lead to a loss of
direct interpretability, as principal components are combinations of
original variables.
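The scaling issue can be demonstrated with a short sketch (the two variables, height_m and salary, are invented for illustration): without standardization, the variable with the larger numeric range dominates the first principal component.

    import numpy as np

    # Illustrative sketch of the scaling limitation: a large-unit variable
    # dominates PCA unless the data are standardized first.
    rng = np.random.default_rng(1)
    height_m = rng.normal(1.7, 0.1, 500)        # small numeric range
    salary = rng.normal(50_000, 10_000, 500)    # large numeric range
    X = np.column_stack([height_m, salary])

    def first_pc(data):
        data = data - data.mean(axis=0)
        _, vecs = np.linalg.eigh(np.cov(data, rowvar=False))
        return vecs[:, -1]          # eigenvector of the largest eigenvalue

    print(first_pc(X))              # almost entirely along the salary axis
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    print(first_pc(X_std))          # both variables now contribute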
Applications of PCA
PCA is widely used across various fields due to its versatility in simplifying
datasets.
Cluster Analysis
The major clustering methods can be organized into the following categories:
A. Partitioning Methods
B. Hierarchical Methods
● A hierarchical method creates a hierarchical decomposition of the given
set of data objects. A hierarchical method can be classified as either
agglomerative or divisive, based on how the hierarchical decomposition is
formed (a minimal agglomerative sketch follows after these bullets).
○ The Agglomerative approach, also called the bottom-up approach,
starts with each object forming a separate group. It successively
merges the objects or groups that are close to one another, until all
of the groups are merged into one or until a termination condition
holds.
○ The divisive approach, also called the top-down approach, starts
with all of the objects in the same cluster. In each successive
iteration, a cluster is split into smaller clusters, until eventually
each object forms its own cluster, or until a termination condition holds.
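The following sketch uses SciPy's hierarchical-clustering routines to run the agglomerative (bottom-up) approach on toy data; the two-blob data set and the choice of single linkage are assumptions made for illustration:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Agglomerative sketch: every point starts as its own cluster; the
    # closest groups are merged step by step, and fcluster cuts the
    # resulting tree at k = 2 (the termination condition here).
    rng = np.random.default_rng(2)
    pts = np.vstack([rng.normal(0, 0.3, (20, 2)),
                     rng.normal(3, 0.3, (20, 2))])

    Z = linkage(pts, method="single")                  # full merge history
    labels = fcluster(Z, t=2, criterion="maxclust")    # stop at 2 clusters
    print(labels)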
C. Density-Based Methods
D. Grid-Based Methods
● Grid-based methods quantize the object space into a finite number of
cells that form a grid structure.
● All of the clustering operations are performed on the grid structure, i.e., on
the quantized space. The main advantage of this approach is its fast
processing time, which is typically independent of the number of data
objects and dependent only on the number of cells in each dimension of
the quantized space.
● STING is a typical example of a grid-based method. WaveCluster applies a
wavelet transformation for clustering analysis and is both grid-based and
density-based (a toy quantization sketch follows below).
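As a rough illustration of the grid-based idea (a toy sketch, not STING or WaveCluster themselves; the helper name grid_summary is an assumption), the snippet below quantizes the space into cells and summarizes points by cell counts; any later clustering would operate on these cell summaries, so its cost depends on the number of cells rather than on n:

    import numpy as np
    from collections import defaultdict

    # Toy grid-based step: quantize the object space into a finite grid
    # and keep per-cell counts; clustering then works on cells, not points.
    def grid_summary(points, cells_per_dim=10):
        lo, hi = points.min(axis=0), points.max(axis=0)
        idx = np.floor((points - lo) / (hi - lo + 1e-12) * cells_per_dim)
        counts = defaultdict(int)
        for cell in map(tuple, idx.astype(int)):
            counts[cell] += 1
        return counts

    rng = np.random.default_rng(3)
    pts = rng.normal(size=(1000, 2))
    summary = grid_summary(pts)
    print(len(summary), "occupied cells summarize", len(pts), "points")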
E. Model-Based Methods
The classic example of a partitioning method is the k-means algorithm. It takes
the input parameter k and partitions a set of n objects into k clusters so that
the resulting intra-cluster similarity is high but the inter-cluster similarity is
low. Cluster similarity is measured with respect to the mean value of the
objects in a cluster, which can be viewed as the cluster's centroid or center of
gravity. The algorithm minimizes the squared-error criterion

E = Σ_{i=1..k} Σ_{p ∈ Cᵢ} |p − mᵢ|²

where E is the sum of the squared error for all objects in the data set, p is
the point in space representing a given object, and mᵢ is the mean of cluster Cᵢ.
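A minimal from-scratch k-means sketch follows; the random initialization, fixed iteration count, and helper name kmeans are illustrative assumptions rather than a production implementation. It reports the squared-error criterion E defined above:

    import numpy as np

    # Minimal k-means: assign objects to the nearest mean, recompute the
    # means, repeat; E is the squared-error criterion from the text.
    def kmeans(X, k, iters=50, seed=0):
        rng = np.random.default_rng(seed)
        means = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)       # nearest centroid m_i
            for i in range(k):                  # recompute each centroid
                if np.any(labels == i):
                    means[i] = X[labels == i].mean(axis=0)
        E = sum(((X[labels == i] - means[i]) ** 2).sum() for i in range(k))
        return labels, means, E

    rng = np.random.default_rng(4)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
    labels, means, E = kmeans(X, k=2)
    print("squared-error E =", round(E, 2))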
The related k-medoids approach instead minimizes the absolute-error criterion

E = Σ_{j=1..k} Σ_{p ∈ Cⱼ} |p − oⱼ|

where E is the sum of the absolute error for all objects in the data set, p is
the point in space representing a given object in cluster Cⱼ, and oⱼ is the
representative object of Cⱼ.
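For contrast, this short sketch computes the absolute-error criterion for a k-medoids-style partition; the Manhattan distance used for |p − oⱼ| and the toy cluster assignment are assumptions for illustration:

    import numpy as np

    # Absolute-error criterion: each cluster C_j is summarized by an actual
    # data point o_j (the medoid), not a mean; |p - o_j| is taken here as
    # the Manhattan distance (an assumption; any dissimilarity would do).
    def absolute_error(X, labels, medoid_indices):
        E = 0.0
        for j, m in enumerate(medoid_indices):
            members = X[labels == j]
            E += np.abs(members - X[m]).sum()
        return E

    rng = np.random.default_rng(5)
    X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])
    labels = (np.arange(60) >= 30).astype(int)   # toy cluster assignment
    print(absolute_error(X, labels, medoid_indices=[0, 30]))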
Outlier Analysis
● There exist data objects that do not comply with the general behavior or
model of the data. Such data objects, which are grossly different from or
inconsistent with the remaining set of data, are called outliers.
● Many data mining algorithms try to minimize the influence of outliers or
eliminate them altogether. This, however, could result in the loss of
important hidden information, because one person's noise could be
another person's signal.
● In other words, the outliers may be of particular interest, such as in the
case of fraud detection, where outliers may indicate fraudulent activity.
● Thus, outlier detection and analysis is an interesting data mining task,
referred to as outlier mining. It can be used in fraud detection, for
example, by detecting unusual usage of credit cards or telecommunication
services.
● In addition, it is useful in customized marketing for identifying the
spending behavior of customers with extremely low or extremely high
incomes, or in medical analysis for finding unusual responses to various
medical treatments.
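To close, here is one simple statistical sketch of outlier mining (a z-score rule; the data, the 3-sigma threshold, and the charge-amount framing are illustrative assumptions, and real systems use far more sophisticated methods):

    import numpy as np

    # Flag objects more than 3 standard deviations from the mean; the
    # flagged points may be noise to one analysis and signal (e.g. fraud)
    # to another.
    rng = np.random.default_rng(6)
    amounts = rng.normal(100, 20, 1000)          # e.g. charge amounts
    amounts[::250] = [900, 5, 1200, 850]         # inject a few anomalies

    z = (amounts - amounts.mean()) / amounts.std()
    outliers = np.where(np.abs(z) > 3)[0]
    print("flagged indices:", outliers)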