
Principal Component Analysis and Cluster Analysis

Principal Components Analysis

● Definition: PCA is a dimensionality reduction technique that transforms a
dataset with potentially correlated variables into a set of linearly
uncorrelated variables called principal components.
● Objective: The primary goal of PCA is to simplify the covariance structure
of the dataset by finding new axes (principal components) that capture
the maximum variance in the data.

Key Concepts in PCA

1. Covariance Structure:
○ Covariance measures the extent to which two variables change
together. In PCA, we are interested in understanding the
relationships between variables in terms of their covariance.
○ PCA seeks to transform the data into new coordinates (principal
components) where these covariances are zero, meaning the
components are uncorrelated.

2. Principal Directions of Variance:


○ Principal Directions are the directions in which the data shows the
most variance (or spread).
○ Principal Components are vectors that represent these directions.
The first principal component represents the direction with the
highest variance, while the second is the direction orthogonal to it
with the next highest variance, and so on.

3. Orthogonal Transformation:
○ PCA applies an orthogonal transformation to the dataset to achieve
decorrelation. Each new axis is orthogonal (at a right angle) to the
others, which eliminates redundancy in the data.

Understanding PCA with an Example

Consider a dataset with two variables, represented by the X and Y axes in a
2D plane.

● Principal Directions in Data:
○ Suppose the data distribution is primarily along an axis called the
U-axis, which represents the direction of maximum variance. The V-axis is
orthogonal to the U-axis and represents the secondary direction of
variance.
○ By re-orienting the coordinate system to align with the U and V
axes, the dataset gains a more compact representation centered
around its mean.

● Transformation to the U-V System:
○ Each data point, initially represented as (X,Y), can be transformed
into the (U,V) coordinate system.
○ In this transformed system, the covariance between U and V is zero,
meaning the dataset is decorrelated, providing a simplified and clear
view of the underlying data structure.
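
A minimal NumPy sketch of this transformation (the synthetic dataset and all
variable names here are illustrative assumptions, not taken from the document):
the eigenvectors of the covariance matrix supply the U and V directions, and the
covariance of the transformed points comes out diagonal.

import numpy as np

rng = np.random.default_rng(0)

# Correlated 2D data in the original (X, Y) system.
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(scale=0.3, size=500)
data = np.column_stack([x, y])

# Center the data and compute its covariance matrix.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# Eigenvectors of the covariance matrix give the U and V directions.
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh: cov is symmetric
order = np.argsort(eigvals)[::-1]           # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Transform (X, Y) points into the (U, V) system.
uv = centered @ eigvecs

# The covariance in the new system is (numerically) diagonal:
# U and V are uncorrelated.
print(np.round(np.cov(uv, rowvar=False), 4))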

Dimensionality Reduction Using PCA

1. Reducing Complexity in High-Dimensional Data:


○ When working with multidimensional datasets, some dimensions
might add minimal new information (often due to noise or
redundant relationships). PCA helps to identify and eliminate such
dimensions, thus reducing the dataset’s dimensionality without
losing essential information.

2. Illustration with Linearly Related Variables:


○ If two variables in the dataset are linearly related, PCA will identify
one main direction of variance (e.g., the U-axis).
○ In such cases, all values along the secondary direction (V-axis) may
be close to zero, primarily due to noise or insignificant variance.
○ By discarding the V-axis, we can represent the data solely by the U
variable, reducing dimensionality from two to one, while retaining
most of the dataset’s variance (see the sketch after this list).

3. Hyper-Ellipse and Class Boundaries:


○ When data follows a normal distribution, PCA can represent the
data spread with a hyper-ellipse (or an ellipse in 2D).
○ This hyper-ellipse, enclosing most data points, acts as a boundary
within which data points are likely to fall, helping identify classes
and outliers.
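
The two-to-one reduction described in point 2 can be sketched with
scikit-learn's PCA class (a library choice assumed here for illustration; the
nearly linear synthetic data is likewise made up):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(scale=0.1, size=500)   # two almost linearly related variables
data = np.column_stack([x, y])

pca = PCA(n_components=1)        # keep only the main direction of variance (the U-axis)
u = pca.fit_transform(data)      # each point is now a single U value

# Nearly all of the variance survives the 2 -> 1 reduction.
print(pca.explained_variance_ratio_)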

Computing the Principal Components

● In computational terms, the principal components are found by calculating
the eigenvectors and eigenvalues of the data covariance matrix. This
process is equivalent to finding the axis system in which the covariance
matrix is diagonal.
● The eigenvector with the largest eigenvalue is the direction of greatest
variation, the one with the second largest eigenvalue is the (orthogonal)
direction with the next highest variation and so on.
● Let A be an n × n matrix. The eigenvalues of A are defined as the roots of:
det(A − λI) = 0
○ where I is the n × n identity matrix. This equation is called the
characteristic equation (or characteristic polynomial) and has n roots.
● Let λ be an eigenvalue of A. Then there exists a vector x such that:
Ax = λx

● The vector x is called an eigenvector of A associated with the eigenvalue
λ. It is a direction vector only and can be scaled to any magnitude.
● To find a numerical solution for x we need to set one of its elements to an
arbitrary value, say 1, which gives us a set of simultaneous equations to
solve for the other elements.
● If there is no solution we repeat the process with another element.
Ordinarily we normalise the final values so that x has length one, that is
xTx = 1. Suppose we have a 3 × 3 matrix A with eigenvectors x1, x2, x3, and
eigenvalues λ1, λ2, λ3 so that:
Ax1 = λ1x1, Ax2 = λ2x2, Ax3 = λ3x3

● Putting the eigenvectors as the columns of a matrix gives:
Φ = [x1 x2 x3]

● and writing the eigenvalues as a diagonal matrix:
Λ = diag(λ1, λ2, λ3)

● gives us the matrix equation: AΦ = ΦΛ

● We normalize the eigenvectors to unit magnitude, and they are
orthogonal, so: ΦΦT = ΦTΦ = I

● which means that: ΦTAΦ = Λ


● And: A = ΦΛΦT

● Now let us consider how this applies to the covariance matrix in the PCA
process. Let Σ be an n×n covariance matrix. There is an orthogonal n × n
matrix Φ whose columns are eigenvectors of Σ and a diagonal matrix Λ
whose diagonal elements are the eigenvalues of Σ, such that: ΦT ΣΦ = Λ

● We can look at the matrix of eigenvectors Φ as a linear transformation
which, in the example of figure 1, transforms data points in the [X, Y] axis
system into the [U, V] axis system.
● In the general case the linear transformation given by Φ transforms the
data points into a data set where the variables are uncorrelated. The
covariance matrix of the data in the new coordinate system is Λ, which has
zeros in all the off-diagonal elements.
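
These relations can be checked numerically. The sketch below (with an
arbitrary, made-up three-variable dataset) builds the covariance matrix Σ,
takes its eigendecomposition, and verifies that ΦTΣΦ = Λ, that Σ = ΦΛΦT, and
that Φ is orthogonal:

import numpy as np

rng = np.random.default_rng(1)

# Three correlated variables (illustrative data only).
data = rng.normal(size=(200, 3))
data[:, 1] += 0.7 * data[:, 0]
data[:, 2] += 0.4 * data[:, 0]

sigma = np.cov(data, rowvar=False)      # covariance matrix Σ (3 x 3)
eigvals, phi = np.linalg.eigh(sigma)    # columns of phi are eigenvectors of Σ
lam = np.diag(eigvals)                  # Λ: eigenvalues on the diagonal

print(np.allclose(phi.T @ sigma @ phi, lam))    # ΦTΣΦ = Λ  -> True
print(np.allclose(phi @ lam @ phi.T, sigma))    # Σ = ΦΛΦT  -> True
print(np.allclose(phi @ phi.T, np.eye(3)))      # ΦΦT = I   -> True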

Benefits of Using PCA

● Reduction of Noise and Overfitting: By removing components that
mainly represent noise, PCA reduces the risk of overfitting in machine
learning applications.
● Simplification of Data Interpretation: PCA provides a more compact
representation of data, making complex relationships easier to visualize
and analyze.
● Efficient Computation in High-Dimensional Data: For high-dimensional
datasets, PCA reduces computational costs by focusing only on significant
components.

Limitations of PCA

● Linear Assumption: PCA assumes that data variation is linear, which may
not always hold, especially in complex datasets.
● Sensitivity to Scaling: PCA is sensitive to the scale of the data. It’s
essential to standardize data to avoid misleading principal component
results.
● Loss of Interpretability: Reduced dimensions might lead to a loss of
direct interpretability, as principal components are combinations of
original variables.

Applications of PCA

PCA is widely used across various fields due to its versatility in simplifying
datasets:

● Image Compression: PCA helps in reducing image data by focusing on
key visual components.
● Genomics and Bioinformatics: PCA helps in gene expression analysis by
identifying significant patterns in genetic data.
● Finance: PCA is used in stock market analysis to reduce the complexity of
data and find key patterns.
● Data Visualization: PCA aids in visualizing high-dimensional data in two
or three dimensions, facilitating easier pattern recognition.

Cluster Analysis

The process of grouping a set of physical or abstract objects into classes of
similar objects is called clustering. A cluster is a collection of data objects that
are similar to one another within the same cluster and are dissimilar to the
objects in other clusters. A cluster of data objects can be treated collectively as
one group and so may be considered as a form of data compression. Cluster
analysis tools based on k-means, k-medoids, and several other methods have also
been built into many statistical analysis software packages or systems, such as
S-Plus, SPSS, and SAS.

Applications:

● Cluster analysis has been widely used in numerous applications, including
market research, pattern recognition, data analysis, and image processing.
● In business, clustering can help marketers discover distinct groups in their
customer bases and characterize customer groups based on purchasing
patterns.
● Clustering helps in the identification of areas of similar land use in an
earth observation database and in the identification of groups of houses
in a city according to house type, value, and geographic location, as well
as the identification of groups of automobile insurance policy holders with
a high average claim cost.
● Clustering is also called data segmentation in some applications because
clustering partitions large data sets into groups according to their
similarity.

Typical Requirements Of Clustering In Data Mining

● Scalability: Many clustering algorithms work well on small data sets
containing fewer than several hundred data objects; however, a large
database may contain millions of objects. Clustering on a sample of a
given large data set may lead to biased results.
● Ability to deal with different types of attributes: Many algorithms are
designed to cluster interval-based (numerical) data. However,
applications may require clustering other types of data, such as binary,
categorical (nominal), and ordinal data, or mixtures of these data types.
● Discovery of clusters with arbitrary shape: Many clustering algorithms
determine clusters based on Euclidean or Manhattan distance measures.
Algorithms based on such distance measures tend to find spherical
clusters with similar size and density. However, a cluster could be of any
shape. It is important to develop algorithms that can detect clusters of
arbitrary shape.
● Minimal requirements for domain knowledge to determine input
parameters: Many clustering algorithms require users to input certain
parameters in cluster analysis (such as the number of desired clusters).
The clustering results can be quite sensitive to input parameters.
Parameters are often difficult to determine, especially for data sets
containing high-dimensional objects. This not only burdens users, but it
also makes the quality of clustering difficult to control.
● Ability to deal with noisy data: Most real-world databases contain
outliers or missing, unknown, or erroneous data. Some clustering
algorithms are sensitive to such data and may lead to clusters of poor
quality.
● Incremental clustering and insensitivity to the order of input records:
Some clustering algorithms cannot incorporate newly inserted data (i.e.,
database updates) into existing clustering structures and, instead, must
determine a new clustering from scratch. Some clustering algorithms are
sensitive to the order of input data. That is, given a set of data objects,
such an algorithm may return dramatically different clusterings depending
on the order in which the input objects are presented.

Major Clustering Methods:

A. Partitioning Methods

● A partitioning method constructs k partitions of the data, where each
partition represents a cluster and k ≤ n. That is, it classifies the data into
k groups, which together satisfy the following requirements: each group
must contain at least one object, and each object must belong to exactly
one group.
● A partitioning method creates an initial partitioning. It then uses an
iterative relocation technique that attempts to improve the partitioning by
moving objects from one group to another.
● The general criterion of a good partitioning is that objects in the same
cluster are close or related to each other, whereas objects of different
clusters are far apart or very different.

B. Hierarchical Methods
● A hierarchical method creates a hierarchical decomposition of the given
set of data objects. A hierarchical method can be classified as being either
agglomerative or divisive, based on how the hierarchical decomposition is
formed.
○ The Agglomerative approach, also called the bottom-up approach,
starts with each object forming a separate group. It successively
merges the objects or groups that are close to one another, until all
of the groups are merged into one or until a termination condition
holds.
○ The divisive approach, also called the top-down approach, starts
with all of the objects in the same cluster. In each successive
iteration, a cluster is split up into smaller clusters, until eventually
each object forms its own cluster, or until a termination condition holds.
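
A short sketch of the agglomerative (bottom-up) approach described above,
using SciPy's hierarchical clustering routines; the toy data and the choice of
Ward linkage are assumptions made for illustration:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)

# Two well-separated groups of 2D points (illustrative data).
data = np.vstack([rng.normal(0, 0.5, size=(20, 2)),
                  rng.normal(5, 0.5, size=(20, 2))])

# Agglomerative clustering: each point starts as its own group and the
# closest groups are merged step by step (Ward linkage here).
merges = linkage(data, method="ward")

# Cut the resulting hierarchy so that two clusters remain.
labels = fcluster(merges, t=2, criterion="maxclust")
print(labels)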

C. Density-Based Methods

● Most partitioning methods cluster objects based on the distance between
objects. Such methods can find only spherical-shaped clusters and
encounter difficulty in discovering clusters of arbitrary shapes.
● Other clustering methods have been developed based on the notion of
density. Their general idea is to continue growing the given cluster as
long as the density in the neighborhood exceeds some threshold; that is,
for each data point within a given cluster, the neighborhood of a given
radius has to contain at least a minimum number of points. Such a method
can be used to filter out noise (outliers) and discover clusters of arbitrary
shape.
● DBSCAN and its extension, OPTICS, are typical density-based methods
that grow clusters according to a density-based connectivity analysis.
DENCLUE is a method that clusters objects based on the analysis of the
value distributions of density functions.
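
A minimal DBSCAN sketch with scikit-learn; the eps and min_samples values,
like the generated data, are illustrative and would need tuning on real data:

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(3)

# A ring-shaped cluster, a compact blob, and a few scattered noise points.
angles = rng.uniform(0, 2 * np.pi, size=200)
ring = np.column_stack([np.cos(angles), np.sin(angles)])
ring += rng.normal(scale=0.05, size=ring.shape)
blob = rng.normal(loc=[3.0, 0.0], scale=0.1, size=(100, 2))
noise = rng.uniform(-2, 5, size=(10, 2))
data = np.vstack([ring, blob, noise])

# Grow clusters wherever the eps-neighbourhood contains at least
# min_samples points; sparse points are left unassigned.
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(data)

# Label -1 marks points treated as noise (outliers); the ring is found
# as one arbitrary-shaped cluster even though it is not spherical.
print(sorted(set(labels)))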

D. Grid-Based Methods
● Grid-based methods quantize the object space into a finite number of
cells that form a grid structure.
● All of the clustering operations are performed on the grid structure i.e., on
the quantized space. The main advantage of this approach is its fast
processing time, which is typically independent of the number of data
objects and dependent only on the number of cells in each dimension in
the quantized space.
● STING is a typical example of a grid-based method. WaveCluster applies
wavelet transformation for clustering analysis and is both grid-based and
density-based.

E. Model-Based Methods

● Model-based methods hypothesize a model for each of the clusters and
find the best fit of the data to the given model.
● A model-based algorithm may locate clusters by constructing a density
function that reflects the spatial distribution of the data points.
● It also leads to a way of automatically determining the number of clusters
based on standard statistics, taking "noise" or outliers into account and
thus yielding robust clustering methods.
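
One common instance of the model-based family is a Gaussian mixture model.
The sketch below (using scikit-learn's GaussianMixture on made-up data, a
choice assumed here for illustration) also shows the point about choosing the
number of clusters from a standard statistic, here the BIC score:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)

# Data drawn from three Gaussian "clusters" (illustrative only).
data = np.vstack([rng.normal([0, 0], 0.4, size=(100, 2)),
                  rng.normal([4, 0], 0.4, size=(100, 2)),
                  rng.normal([2, 3], 0.4, size=(100, 2))])

# Fit mixtures with different numbers of components and keep the best BIC.
best_k, best_bic = None, np.inf
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(data)
    bic = gmm.bic(data)
    if bic < best_bic:
        best_k, best_bic = k, bic

print("chosen number of clusters:", best_k)   # typically 3 for this data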

Classical Partitioning Methods

1. Centroid-Based Technique: The K-Means Method:

The k-means algorithm takes the input parameter, k, and partitions a set of n
objects into k clusters so that the resulting intracluster similarity is high but the
intercluster similarity is low. Cluster similarity is measured in regard to the
mean value of the objects in a cluster, which can be viewed as the cluster’s
centroid or center of gravity.

The k-means algorithm proceeds as follows:


● First, it randomly selects k of the objects, each of which initially
represents a cluster mean or center.
● For each of the remaining objects, an object is assigned to the cluster to
which it is the most similar, based on the distance between the object and
the cluster mean.
● It then computes the new mean for each cluster.
● This process iterates until the criterion function converges.
● Typically, the square-error criterion is used, defined as:
E = Σ(i=1 to k) Σ(p ∈ Ci) |p − mi|²
where E is the sum of the square error for all objects in the data set, p is
the point in space representing a given object, and mi is the mean of
cluster Ci.
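
A compact NumPy sketch of exactly these steps (random initial means,
assignment by distance, mean update, stop when the square-error E no longer
decreases); it is an illustrative implementation, not taken from any
particular package:

import numpy as np

def k_means(data, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly select k of the objects as the initial cluster means.
    means = data[rng.choice(len(data), size=k, replace=False)]
    prev_error = np.inf
    for _ in range(max_iter):
        # Assign each object to the cluster with the nearest mean.
        dists = np.linalg.norm(data[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Square-error criterion E = sum over clusters of |p - mi|^2.
        error = np.sum((data - means[labels]) ** 2)
        if error >= prev_error:     # criterion has converged
            break
        prev_error = error
        # Recompute each cluster mean (keep the old mean if a cluster is empty).
        means = np.array([data[labels == i].mean(axis=0) if np.any(labels == i)
                          else means[i] for i in range(k)])
    return labels, means

# Usage on two obvious groups of points.
rng = np.random.default_rng(5)
data = np.vstack([rng.normal(0, 0.3, size=(50, 2)),
                  rng.normal(3, 0.3, size=(50, 2))])
labels, means = k_means(data, k=2)
print(means)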

2. The k-Medoids Method

● The k-means algorithm is sensitive to outliers because an object with an
extremely large value may substantially distort the distribution of data.
This effect is particularly exacerbated due to the use of the square-error
function.
● Instead of taking the mean value of the objects in a cluster as a reference
point, we can pick actual objects to represent the clusters, using one
representative object per cluster. Each remaining object is clustered with
the representative object to which it is the most similar.
● The Partitioning method is then performed based on the principle of
minimizing the sum of the dissimilarities between each object and its
corresponding reference point. That is, an absolute-error criterion is used,
defined as:
E = Σ(j=1 to k) Σ(p ∈ Cj) |p − oj|
where E is the sum of the absolute error for all objects in the data set, p is
the point in space representing a given object in cluster Cj, and oj is the
representative object of Cj.

● The initial representative objects are chosen arbitrarily. The iterative
process of replacing representative objects by nonrepresentative objects
continues as long as the quality of the resulting clustering is improved.
● This quality is estimated using a cost function that measures the average
dissimilarity between an object and the representative object of its
cluster.
● To determine whether a nonrepresentative object, orandom, is a good
replacement for a current representative object, oj, the following four
cases are examined for each of the nonrepresentative objects.

Case 1: p currently belongs to representative object oj. If oj is replaced by
orandom as a representative object and p is closest to one of the other
representative objects, oi, i≠j, then p is reassigned to oi.

Case 2: p currently belongs to representative object oj. If oj is replaced by
orandom as a representative object and p is closest to orandom, then p is
reassigned to orandom.

Case 3: p currently belongs to representative object oi, i≠j. If oj is replaced by
orandom as a representative object and p is still closest to oi, then the
assignment does not change.

Case 4: p currently belongs to representative object oi, i≠j. If oj is replaced by
orandom as a representative object and p is closest to orandom, then p is
reassigned to orandom.
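
A small PAM-style sketch that makes the swap test concrete: the absolute-error
criterion E is evaluated before and after replacing a representative object
with a nonrepresentative one, and the swap is kept only when E decreases. The
function names and data are illustrative assumptions:

import numpy as np

def absolute_error(data, medoid_idx):
    # E = sum over all objects of the distance to the nearest representative object.
    dists = np.linalg.norm(data[:, None, :] - data[medoid_idx][None, :, :], axis=2)
    return dists.min(axis=1).sum()

def k_medoids(data, k, seed=0, max_iter=50):
    rng = np.random.default_rng(seed)
    medoids = list(rng.choice(len(data), size=k, replace=False))
    for _ in range(max_iter):
        improved = False
        current = absolute_error(data, medoids)
        for j in range(k):                    # each representative object oj
            for cand in range(len(data)):     # each candidate orandom
                if cand in medoids:
                    continue
                trial = medoids.copy()
                trial[j] = cand
                trial_error = absolute_error(data, trial)
                if trial_error < current:     # keep the swap only if E drops
                    medoids, current, improved = trial, trial_error, True
        if not improved:
            break
    # Reassign every object to its nearest representative object.
    dists = np.linalg.norm(data[:, None, :] - data[medoids][None, :, :], axis=2)
    return np.array(medoids), dists.argmin(axis=1)

rng = np.random.default_rng(6)
data = np.vstack([rng.normal(0, 0.3, size=(30, 2)),
                  rng.normal(3, 0.3, size=(30, 2))])
medoids, labels = k_medoids(data, k=2)
print(data[medoids])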

Outlier Analysis

● There exist data objects that do not comply with the general behavior or
model of the data. Such data objects, which are grossly different from or
inconsistent with the remaining set of data, are called outliers.
● Many data mining algorithms try to minimize the influence of outliers or
eliminate them altogether. This, however, could result in the loss of
important hidden information because one person’s noise could be
another person’s signal.
● In other words, the outliers may be of particular interest, such as in the
case of fraud detection, where outliers may indicate fraudulent activity.
● Thus, outlier detection and analysis is an interesting data mining task,
referred to as outlier mining. It can be used in fraud detection, for
example, by detecting unusual usage of credit cards or telecommunication
services.
● In addition, it is useful in customized marketing for identifying the
spending behavior of customers with extremely low or extremely high
incomes, or in medical analysis for finding unusual responses to various
medical treatments.
