ML Unit 4

DBSCAN is a density-based clustering algorithm that efficiently identifies clusters of varying sizes and handles noise by classifying points as core, border, or noise based on local density. Hierarchical clustering, on the other hand, builds a tree-like structure of clusters through agglomerative or divisive methods, allowing for flexible cluster numbers determined by dendrogram analysis. K-Means is a centroid-based partitioning method that requires a predefined number of clusters and iteratively assigns data points to the nearest centroid to minimize intra-cluster variance.


DBSCAN Clustering in ML | Density based clustering

Introduction
DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It is an unsupervised clustering algorithm. DBSCAN can discover clusters of any size in huge amounts of data and can work with datasets containing a significant amount of noise. It is based on the criterion of a minimum number of points within a region.
What is the DBSCAN Algorithm?
The DBSCAN algorithm groups densely packed points into one cluster by examining the local density around each data point, and it handles outliers very effectively. An advantage of DBSCAN over the K-means algorithm is that the number of clusters (centroids) does not need to be known beforehand.
The DBSCAN algorithm depends on two parameters, epsilon and minPoints.
Epsilon is the radius of the neighbourhood around each data point within which density is considered.
minPoints is the minimum number of points required within that radius for the data point to become a core point.
In two dimensions the neighbourhood is a circle; the same idea extends to higher dimensions.
Working of DBSCAN Algorithm
In the DBSCAN algorithm, a circle of radius epsilon is drawn around each data point, and the data point is classified as a Core Point, Border Point, or Noise Point. A data point is classified as a core point if it has at least minPoints data points within its epsilon radius. A point with fewer than minPoints neighbours that still lies within the epsilon radius of a core point is a Border Point, and a point that is neither a core point nor within the epsilon radius of one is a Noise Point.

In the figure referred to above, point A has no points inside its epsilon (e) radius, hence it is a Noise Point. Point B has minPoints (= 4) points within its epsilon radius, thus it is a Core Point, while the remaining point has only 1 neighbour (fewer than minPoints), hence it is a Border Point.
Steps Involved in DBSCAN Algorithm.
• First, all the points within the epsilon radius of each point are found, and the core points are identified as those with a number of neighbours greater than or equal to minPoints.
• Next, for each core point that is not yet assigned to a cluster, a new cluster is created.
• All the density-connected points related to the core point are found and assigned to the same cluster. Two points are density-connected if there is a neighbouring point that has both of them within epsilon distance.
• Finally, all the points in the data are iterated over, and the points that do not belong to any cluster are marked as noise.
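As a rough illustration of these steps, the sketch below runs scikit-learn's DBSCAN on a synthetic two-moons dataset; the eps and min_samples values are assumptions chosen for this toy data, not values prescribed by the algorithm.

# A minimal DBSCAN sketch with scikit-learn (parameter values are illustrative).
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: a shape K-Means handles poorly but DBSCAN clusters well.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# eps is the neighbourhood radius, min_samples is minPoints.
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

labels = db.labels_  # cluster index per point, -1 means noise
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print("clusters:", n_clusters, "noise points:", n_noise)

Points labelled -1 correspond to the noise points described in the last step above.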

Hierarchical Clustering in Machine Learning


In data mining and statistics, hierarchical cluster analysis is a method of cluster analysis that seeks to build a hierarchy of clusters, i.e. a tree-type structure of nested clusters.
In machine learning, clustering is an unsupervised learning technique that groups data based on the similarity between data points. There are several types of clustering algorithms in machine learning:
• Connectivity-based clustering: This type of clustering algorithm builds clusters based on the connectivity between the data points. Example: hierarchical clustering.
• Centroid-based clustering: This type of clustering algorithm forms clusters around the centroids of the data points. Example: K-Means clustering, K-Modes clustering.
• Distribution-based clustering: This type of clustering algorithm is modelled using statistical distributions. It assumes that the data points in a cluster are generated from a particular probability distribution, and the algorithm aims to estimate the parameters of the distribution in order to group similar data points into clusters. Example: Gaussian Mixture Models (GMM).
• Density-based clustering: This type of clustering algorithm groups together data points that lie in high-density regions and separates points in low-density regions. The basic idea is that it identifies regions of the data space that have a high density of data points and groups those points together into clusters. Example: DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
Hierarchical clustering
Hierarchical clustering is a connectivity-based clustering model that groups the data points together that are
close to each other based on the measure of similarity or distance. The assumption is that data points that
are close to each other are more similar or related than data points that are farther apart.
A dendrogram, a tree-like figure produced by hierarchical clustering, depicts the hierarchical relationships
between groups. Individual data points are located at the bottom of the dendrogram, while the largest
clusters, which include all the data points, are located at the top. In order to generate different numbers of
clusters, the dendrogram can be sliced at various heights.
The dendrogram is created by iteratively merging or splitting clusters based on a measure of similarity or
distance between data points. Clusters are divided or merged repeatedly until all data points are contained
within a single cluster, or until the predetermined number of clusters is attained.
To estimate the ideal number of clusters, we can look at the dendrogram and find the height at which its branches separate into distinct clusters; slicing the dendrogram at that height determines the number of clusters.
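As a small sketch of this idea, the code below builds a hierarchy with SciPy and cuts the dendrogram at a chosen height; the toy data and the cut height of 3.0 are assumptions made for illustration.

# Build a hierarchy with SciPy and "slice" it at a chosen height.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two loose blobs of 2-D points (illustrative data).
X = np.vstack([rng.normal(0, 0.5, size=(10, 2)),
               rng.normal(5, 0.5, size=(10, 2))])

Z = linkage(X, method="average")                    # agglomerative merge history
labels = fcluster(Z, t=3.0, criterion="distance")   # cut the tree at height 3.0
print(labels)                                       # cluster id for every data point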
Types of Hierarchical Clustering
Basically, there are two types of hierarchical Clustering:
1. Agglomerative Clustering
2. Divisive clustering
Hierarchical Agglomerative Clustering
It is also known as the bottom-up approach or hierarchical agglomerative clustering (HAC). It produces a structure that is more informative than the unstructured set of clusters returned by flat clustering, and it does not require us to prespecify the number of clusters. Bottom-up algorithms treat each data point as a singleton cluster at the outset and then successively agglomerate pairs of clusters until all clusters have been merged into a single cluster that contains all the data.
Algorithm:
given a dataset (d1, d2, d3, ..., dN) of size N
# compute the distance matrix
for i = 1 to N:
    # as the distance matrix is symmetric about
    # the primary diagonal we compute only the lower
    # part of the primary diagonal
    for j = 1 to i:
        dis_mat[i][j] = distance(di, dj)

each data point is a singleton cluster
repeat
    merge the two clusters having the minimum distance
    update the distance matrix
until only a single cluster remains
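The pseudocode can be turned into a small runnable sketch. The Python below is a naive single-linkage agglomerative pass written only for illustration; the function name and the toy points are assumptions, not part of the original description.

# Naive agglomerative clustering sketch (single linkage), following the pseudocode above.
import numpy as np

def naive_agglomerative(X):
    """Merge clusters pairwise until one cluster remains; return the merge history."""
    clusters = [[i] for i in range(len(X))]          # every point starts as a singleton
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the minimum (single-linkage) distance.
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
for left, right, dist in naive_agglomerative(pts):
    print("merge", left, "and", right, "at distance", round(dist, 2))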
Figure: hierarchical agglomerative clustering of points A–F.
Steps:
• Consider each letter as a single cluster and calculate the distance of each cluster from all the other clusters.
• In the second step, comparable clusters are merged to form a single cluster. Say cluster (B) and cluster (C) are very similar to each other, so we merge them in the second step, and similarly clusters (D) and (E); we are left with the clusters [(A), (BC), (DE), (F)].
• We recalculate the proximities according to the algorithm and merge the two nearest clusters, (DE) and (F), to form the new clusters [(A), (BC), (DEF)].
• Repeating the same process, the clusters DEF and BC are comparable and are merged together to form a new cluster; we are now left with the clusters [(A), (BCDEF)].
• Finally, the two remaining clusters are merged together to form the single cluster [(ABCDEF)].

Hierarchical Divisive clustering


It is also known as the top-down approach. This algorithm also does not require us to prespecify the number of clusters. Top-down clustering requires a method for splitting a cluster that contains all the data, and it proceeds by splitting clusters recursively until every individual data point has been split into a singleton cluster.
Algorithm:
given a dataset (d1, d2, d3, ..., dN) of size N
at the top we have all the data in one cluster
the cluster is split using a flat clustering method, e.g. K-Means
repeat
    choose the best cluster among all the clusters to split
    split that cluster with the flat clustering algorithm
until each data point is in its own singleton cluster
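A rough sketch of this top-down idea is given below, using scikit-learn's KMeans as the flat "subroutine". Always splitting the largest cluster is an assumption made to keep the example short; other criteria, such as splitting the cluster with the largest squared error, are also common.

# Divisive (top-down) clustering sketch using K-Means (k=2) as the splitting subroutine.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def divisive_clustering(X, n_clusters):
    clusters = [np.arange(len(X))]            # start with all points in one cluster
    while len(clusters) < n_clusters:
        # Assumption: always split the largest cluster.
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[members])
        clusters.append(members[labels == 0])
        clusters.append(members[labels == 1])
    return clusters

X, _ = make_blobs(n_samples=150, centers=3, random_state=1)
for i, c in enumerate(divisive_clustering(X, 3)):
    print("cluster", i, "has", len(c), "points")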



Computing Distance Matrix
While merging two clusters we check the distance between every pair of clusters and merge the pair with the least distance (most similarity). The question is how that distance is determined. There are different ways of defining the inter-cluster distance/similarity; some of them are:
1. Min Distance: Find the minimum distance between any two points of the clusters.
2. Max Distance: Find the maximum distance between any two points of the clusters.
3. Group Average: Find the average distance between every pair of points of the clusters.
4. Ward's Method: The similarity of two clusters is based on the increase in squared error when the two clusters are merged.
For example, if we group a given dataset using these different methods, we may get different results, as the sketch below illustrates:
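The following sketch compares the four linkage criteria with SciPy; the random two-blob data and the side-by-side dendrogram plot are assumptions chosen purely for illustration.

# Compare linkage criteria (single, complete, average, ward) on the same data.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, size=(15, 2)),
               rng.normal(6, 1, size=(15, 2))])

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, method in zip(axes, ["single", "complete", "average", "ward"]):
    dendrogram(linkage(X, method=method), ax=ax)   # merge heights differ per criterion
    ax.set_title(method)
plt.tight_layout()
plt.show()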

Figure: distance matrix (linkage) comparison in hierarchical clustering; output: hierarchical clustering dendrograms.
Hierarchical Agglomerative vs Divisive Clustering
• Divisive clustering is more complex than agglomerative clustering, because divisive clustering needs a flat clustering method as a "subroutine" to split each cluster until every data point has its own singleton cluster.
• Divisive clustering is more efficient if we do not generate a complete hierarchy all the way down to individual data leaves. The time complexity of a naive agglomerative clustering is O(n^3), because we exhaustively scan the N x N matrix dist_mat for the lowest distance in each of the N-1 iterations. Using a priority queue data structure this can be reduced to O(n^2 log n), and with some further optimizations it can be brought down to O(n^2). For divisive clustering, given a fixed number of top levels and an efficient flat algorithm such as K-Means, divisive algorithms are linear in the number of patterns and clusters.
• A divisive algorithm is also more accurate. Agglomerative clustering makes decisions by considering local patterns or neighbouring points without initially taking the global distribution of the data into account, and these early decisions cannot be undone, whereas divisive clustering takes the global distribution of the data into consideration when making its top-level partitioning decisions.
Partitioning Method (K-Mean) in Data Mining
Partitioning Method: This clustering method classifies the information into multiple groups based on the characteristics and similarity of the data. It is up to the data analyst to specify the number of clusters that has to be generated for the clustering method. In the partitioning method, given a database (D) that contains multiple (N) objects, the method constructs K user-specified partitions of the data, in which each partition represents a cluster and a particular region. There are many algorithms that come under the partitioning method; some of the popular ones are K-Means, PAM (K-Medoids) and the CLARA algorithm (Clustering LARge Applications). In this article we will look at the working of the K-Means algorithm in detail.
K-Means (a centroid-based technique): The K-Means algorithm takes the input parameter K from the user and partitions the dataset containing N objects into K clusters so that the resulting similarity among the data objects inside a group (intra-cluster) is high, while the similarity of data objects with data objects outside the cluster (inter-cluster) is low. The similarity of a cluster is determined with respect to the mean value of the cluster. It is a type of squared-error algorithm. At the start, K objects are chosen randomly from the dataset, each representing a cluster mean (centre). Each of the remaining data objects is assigned to the nearest cluster based on its distance from the cluster mean, and the new mean of each cluster is then calculated from the newly added data objects.
Algorithm:
K mean:
Input:
K: The number of clusters in which the dataset has to be divided
D: A dataset containing N number of objects

Output:
A dataset of K clusters
Method:
1. Randomly choose K objects from the dataset (D) as the initial cluster centres (C).
2. (Re)assign each object to the cluster whose mean it is most similar to.
3. Update the cluster means, i.e., recalculate the mean of each cluster with the newly assigned objects.
4. Repeat steps 2 and 3 until no assignments change.

Figure – K-mean Clustering Flowchart:


Example: Suppose we want to group the visitors to a website using just their age as follows:
16, 16, 17, 20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66
Initial Cluster:
K=2
Centroid(C1) = 16 [16]
Centroid(C2) = 22 [22]
Note: These two points are chosen randomly from the dataset.
Iteration-1:
C1 = 16.33 [16, 16, 17]
C2 = 37.25 [20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-2:
C1 = 19.55 [16, 16, 17, 20, 20, 21, 21, 22, 23]
C2 = 46.90 [29, 36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-3:
C1 = 20.50 [16, 16, 17, 20, 20, 21, 21, 22, 23, 29]
C2 = 48.89 [36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-4:
C1 = 20.50 [16, 16, 17, 20, 20, 21, 21, 22, 23, 29]
C2 = 48.89 [36, 41, 42, 43, 44, 45, 61, 62, 66]
There is no change between iterations 3 and 4, so we stop. Therefore, (16-29) and (36-66) are the two clusters obtained with the K-Means algorithm.
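The iterations above can be checked with a short script. The sketch below is a plain NumPy implementation of the assignment/update loop (Lloyd's algorithm), seeded with the same initial centroids 16 and 22.

# Reproduce the worked example: one-dimensional K-Means on visitor ages with K = 2.
import numpy as np

ages = np.array([16, 16, 17, 20, 20, 21, 21, 22, 23, 29,
                 36, 41, 42, 43, 44, 45, 61, 62, 66], dtype=float)
centroids = np.array([16.0, 22.0])            # same initial centres as the example

while True:
    # Assignment step: each point goes to its nearest centroid.
    labels = np.argmin(np.abs(ages[:, None] - centroids[None, :]), axis=1)
    # Update step: recompute each centroid as the mean of its assigned points.
    new_centroids = np.array([ages[labels == k].mean() for k in range(2)])
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print(centroids)   # approximately [20.5, 48.89], matching iterations 3 and 4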

K-Means Clustering Algorithm


K-Means Clustering is an unsupervised learning algorithm that is used to solve clustering problems in machine learning and data science. In this topic, we will learn what the K-means clustering algorithm is and how the algorithm works, along with a Python implementation of k-means clustering.
What is K-Means Algorithm?
K-Means Clustering is an unsupervised learning algorithm which groups an unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process: if K=2, there will be two clusters, for K=3 there will be three clusters, and so on.
It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a way that each data point belongs to only one group of points with similar properties.
It allows us to cluster the data into different groups and is a convenient way to discover the categories of groups in an unlabeled dataset on its own, without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids.
The algorithm takes the unlabeled dataset as input, divides the dataset into K clusters, and repeats the process until it finds the best clusters. The value of K should be predetermined in this algorithm.
The k-means clustering algorithm mainly performs two tasks:
o It determines the best values for the K centre points or centroids by an iterative process.
o It assigns each data point to its closest k-centre. The data points that are nearest to a particular k-centre form a cluster.
Hence each cluster contains data points with some commonalities and is far away from the other clusters.
The below diagram explains the working of the K-means Clustering Algorithm:

How does the K-Means Algorithm Work?


The working of the K-Means algorithm is explained in the steps below:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points or centroids. (They can be points other than those in the input dataset.)
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat the third step, i.e. reassign each data point to the new closest centroid of each cluster.
Step-6: If any reassignment occurs, go to step-4; otherwise go to FINISH.
Step-7: The model is ready.
Let's understand the above steps by considering the visual plots:
Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two variables is given below:

o Let's take the number of clusters k, i.e., K=2, to identify the dataset and to put the points into different clusters. It means that here we will try to group the dataset into two different clusters.
o We need to choose some random k points or centroids to form the clusters. These points can be either points from the dataset or any other points. So here we select two points as the k points, which are not part of our dataset. Consider the image below:
o Now we will assign each data point of the scatter plot to its closest k-point or centroid. We compute this by calculating the distance between the two points, and then we draw a median line between both the centroids. Consider the image below:
From the image it is clear that the points on the left side of the line are nearer to the K1 or blue centroid, and the points to the right of the line are closer to the yellow centroid. Let's colour them blue and yellow for clear visualization.

o As we need to find the closest cluster, we repeat the process by choosing new centroids. To choose the new centroids, we compute the centre of gravity of the points currently in each cluster and place the new centroids there:
o Next, we reassign each data point to the new centroids. For this, we repeat the same process of finding a median line. The new median line is shown in the image below:
From that image we can see that one yellow point is on the left side of the line and two blue points are to the right of the line, so these three points are assigned to the new centroids.
As reassignment has taken place, we again go to step-4, which is finding the new centroids or k-points.
o We repeat the process by finding the centre of gravity of the clusters, so the new centroids will be as shown in the image below:
o As we have new centroids, we again draw the median line and reassign the data points. The result looks like the image below:
o We can see in the image that there are no data points on the wrong side of the line, which means our model is formed. Consider the image below:

As our model is ready, we can now remove the assumed centroids, and the two final clusters will be as shown in the image below:
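The walkthrough above can be reproduced programmatically. The sketch below runs scikit-learn's KMeans on a synthetic two-blob dataset; the data and K=2 are assumptions mirroring the illustration rather than anything prescribed by the text.

# K-Means with K = 2 on synthetic 2-D data, mirroring the walkthrough above.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=2, random_state=3)   # two groups of points

km = KMeans(n_clusters=2, init="random", n_init=10, random_state=0).fit(X)
print("final centroids:")
print(km.cluster_centers_)
print("first ten cluster assignments:", km.labels_[:10])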
How to choose the value of "K number of clusters" in K-means Clustering?
The performance of the K-means clustering algorithm depends on how efficient the clusters it forms are, but choosing the optimal number of clusters is a big task. There are several ways to find the optimal number of clusters; here we discuss the most appropriate method to find the number of clusters, or value of K. The method is given below:
Elbow Method
The Elbow method is one of the most popular ways to find the optimal number of clusters. This method
uses the concept of WCSS value. WCSS stands for Within Cluster Sum of Squares, which defines the
total variations within a cluster. The formula to calculate the value of WCSS (for 3 clusters) is given below:
WCSS = ∑(Pi in Cluster1) distance(Pi, C1)² + ∑(Pi in Cluster2) distance(Pi, C2)² + ∑(Pi in Cluster3) distance(Pi, C3)²
In the above formula of WCSS,
∑(Pi in Cluster1) distance(Pi, C1)² is the sum of the squared distances between each data point Pi in Cluster1 and its centroid C1, and the same holds for the other two terms.
To measure the distance between data points and centroid, we can use any method such as Euclidean
distance or Manhattan distance.
To find the optimal number of clusters, the elbow method follows the steps below:
o It executes K-means clustering on a given dataset for different values of K (typically ranging from 1 to 10).
o For each value of K, it calculates the WCSS value.
o It plots a curve of the calculated WCSS values against the number of clusters K.
o The sharp point of bend in the plot, where the curve looks like an arm, is taken as the best value of K.
Since the graph shows a sharp bend that looks like an elbow, this is known as the elbow method. The graph for the elbow method looks like the image below:

Note: We can choose the number of clusters equal to the number of data points. If we do so, the value of WCSS becomes zero, and that will be the endpoint of the plot.

The steps to be followed for the implementation are given below:


o Data Pre-processing
o Finding the optimal number of clusters using the elbow method
o Training the K-means algorithm on the training dataset
o Visualizing the clusters
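A compact sketch of these steps is given below using scikit-learn; inertia_ is scikit-learn's name for the WCSS of a fitted model, and the synthetic dataset is an assumption used only for illustration.

# Elbow method: plot WCSS (inertia_) for K = 1..10 and look for the bend.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
X = StandardScaler().fit_transform(X)        # data pre-processing step

wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)                 # within-cluster sum of squares for this K

plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("number of clusters K")
plt.ylabel("WCSS")
plt.title("Elbow method")
plt.show()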
Grid-Based Clustering - STING, WaveCluster & CLIQUE
Grid-Based Clustering
Grid-Based Clustering method uses a multi-resolution grid data structure.

Other families of clustering methods include partitional, hierarchical and density-based methods. Several interesting grid-based methods are:
• STING (a STatistical INformation Grid approach) by Wang, Yang, and Muntz (1997)
• WaveCluster by Sheikholeslami, Chatterjee, and Zhang (VLDB'98), a multi-resolution clustering approach using the wavelet method
• CLIQUE by Agrawal et al. (SIGMOD'98)

STING - A Statistical Information Grid Approach


STING was proposed by Wang, Yang, and Muntz (VLDB’97).

In this method, the spatial area is divided into rectangular cells.


There are several levels of cells corresponding to different levels of resolution.

Each cell at a high level is partitioned into several smaller cells at the next lower level.

The statistical information of each cell is calculated and stored beforehand and is used to answer queries.
The parameters of higher-level cells can easily be calculated from the parameters of the lower-level cells:
• count, mean, standard deviation (s), min, max
• type of distribution: normal, uniform, etc.
Spatial data queries are then answered using a top-down approach.

We start from a pre-selected layer, typically one with a small number of cells.

For each cell in the current level we compute the confidence interval, and the irrelevant cells are removed from further consideration.

When we finish examining the current layer, we proceed to the next lower level.

This process is repeated until the bottom layer is reached.

Advantages:
It is query-independent, easy to parallelize, and supports incremental updates.
Its complexity is O(K), where K is the number of grid cells at the lowest level.
Disadvantages:
All the cluster boundaries are either horizontal or vertical, and no diagonal boundary is detected.

WaveCluster
It was proposed by Sheikholeslami, Chatterjee, and Zhang (VLDB’98).
It is a multi-resolution clustering approach which applies a wavelet transform to the feature space.
• A wavelet transform is a signal processing technique that decomposes a signal into different frequency sub-bands.
It can be considered both a grid-based and a density-based method.

Input parameters:
• the number of grid cells for each dimension
• the wavelet, and the number of applications of the wavelet transform.

How to apply the wavelet transform to find clusters


• It summarizes the data by imposing a multidimensional grid structure onto the data space.
• The multidimensional spatial data objects are represented in an n-dimensional feature space.
• A wavelet transform is applied to the feature space to find the dense regions in the feature space.
• The wavelet transform is then applied multiple times, which results in clusters at different scales, from fine to coarse.

Why is wavelet transformation useful for clustering


• It uses hat-shaped filters to emphasize regions where points cluster, while simultaneously suppressing weaker information at their boundaries.
• It is an effective method for removing outliers.
• It is a multi-resolution method.
• It is cost-efficient.

Major features:
• The time complexity of this method is O(N).
• It detects arbitrarily shaped clusters at different scales.
• It is not sensitive to noise and not sensitive to the input order.
• It is only applicable to low-dimensional data.

CLIQUE - Clustering In QUEst


It was proposed by Agrawal, Gehrke, Gunopulos and Raghavan (SIGMOD'98).
• It is based on automatically identifying the subspaces of a high-dimensional data space that allow better clustering than the original space.
CLIQUE can be considered both density-based and grid-based:
• It partitions each dimension into the same number of equal-length intervals.
• It partitions an m-dimensional data space into non-overlapping rectangular units.
• A unit is dense if the fraction of the total data points contained in the unit exceeds an input model parameter.
• A cluster is a maximal set of connected dense units within a subspace.
Partition the data space and find the number of points that lie inside each cell of the partition.

Identify the subspaces that contain clusters using the Apriori principle.

Identify clusters:
• Determine the dense units in all subspaces of interest.
• Determine the connected dense units in all subspaces of interest.

Generate a minimal description for the clusters:
• Determine the maximal regions that cover each cluster of connected dense units.
• Determine the minimal cover for each cluster.
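To make the grid/density idea concrete, the sketch below partitions a 2-D space into equal-length intervals, marks units whose fraction of points exceeds a threshold as dense, and connects adjacent dense units. The grid size and the threshold are assumptions, and this covers only the counting/connection step, not CLIQUE's full subspace search.

# Find dense grid units in 2-D and connect adjacent ones (the counting idea behind CLIQUE).
import numpy as np
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

n_intervals = 10          # equal-length intervals per dimension (assumption)
tau = 0.02                # a unit is dense if it holds more than 2% of all points (assumption)

# Count points per grid cell.
counts, edges = np.histogramdd(X, bins=n_intervals)
dense = counts / len(X) > tau

# Connect adjacent dense units (4-connectivity) with a simple flood fill.
clusters = np.zeros_like(dense, dtype=int)
current = 0
for i in range(n_intervals):
    for j in range(n_intervals):
        if dense[i, j] and clusters[i, j] == 0:
            current += 1
            stack = [(i, j)]
            while stack:
                a, b = stack.pop()
                if 0 <= a < n_intervals and 0 <= b < n_intervals and dense[a, b] and clusters[a, b] == 0:
                    clusters[a, b] = current
                    stack.extend([(a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)])

print("dense units:", int(dense.sum()), "| clusters of connected dense units:", current)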

Advantages
It automatically finds subspaces of the highest dimensionality such that high-density clusters exist in those
subspaces.

It is insensitive to the order of records in input and does not presume some canonical data distribution.
It scales linearly with the size of input and has good scalability as the number of dimensions in the data
increases.

Disadvantages
The accuracy of the clustering result may be degraded in return for the simplicity of the method.

Summary
Grid-Based Clustering -> It is one of the methods of cluster analysis which uses a multi-resolution grid data
structure.

Principal Component Analysis


Principal Component Analysis is an unsupervised learning algorithm that is used for dimensionality reduction in machine learning. It is a statistical process that converts observations of correlated features into a set of linearly uncorrelated features with the help of an orthogonal transformation. These new transformed features are called the Principal Components. It is one of the popular tools used for exploratory data analysis and predictive modelling. It is a technique for drawing out strong patterns from a given dataset by reducing the number of dimensions while retaining as much of the variance as possible.
PCA generally tries to find a lower-dimensional surface onto which the high-dimensional data can be projected.
PCA works by considering the variance of each attribute, because an attribute with high variance shows a good split between the classes; in this way it reduces the dimensionality. Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels. It is a feature extraction technique, so it retains the important variables and drops the least important ones.
The PCA algorithm is based on some mathematical concepts such as:
o Variance and Covariance
o Eigenvalues and Eigenvectors
Some common terms used in PCA algorithm:
o Dimensionality: It is the number of features or variables present in the given dataset. More simply, it is the number of columns in the dataset.
o Correlation: It signifies how strongly two variables are related to each other; if one changes, the other changes as well. The correlation value ranges from -1 to +1, where -1 occurs if the variables are inversely proportional to each other and +1 indicates that the variables are directly proportional to each other.
o Orthogonal: It means that the variables are not correlated with each other, and hence the correlation between a pair of variables is zero.
o Eigenvectors: Given a square matrix M and a non-zero vector v, v is an eigenvector of M if Mv is a scalar multiple of v.
o Covariance Matrix: A matrix containing the covariances between pairs of variables is called the covariance matrix.
Principal Components in PCA
As described above, the transformed new features, or the output of PCA, are the Principal Components. The number of these PCs is either equal to or less than the number of original features present in the dataset. Some properties of these principal components are given below:
o Each principal component is a linear combination of the original features.
o The components are orthogonal, i.e., the correlation between any pair of components is zero.
o The importance of each component decreases when going from 1 to n: the 1st PC has the most importance and the nth PC has the least importance.
Steps for PCA algorithm
1. Getting the dataset
Firstly, we need to take the input dataset and divide it into two subparts X and Y, where X is the
training set, and Y is the validation set.
2. Representing the data in a structure
Now we will represent our dataset in a structure, i.e., a two-dimensional matrix of the independent variable X. Here each row corresponds to a data item and each column corresponds to a feature. The number of columns is the dimensionality of the dataset.
3. Standardizing the data
In this step we will standardize our dataset. Within a particular column, the features with high variance are more important than the features with lower variance. If the importance of features should be independent of the variance of the features, we divide each data item in a column by the standard deviation of the column. The resulting matrix is named Z.
4. Calculating the Covariance of Z
To calculate the covariance of Z, we will take the matrix Z, and will transpose it. After transpose,
we will multiply it by Z. The output matrix will be the Covariance matrix of Z.
5. Calculating the Eigenvalues and Eigenvectors
Now we need to calculate the eigenvalues and eigenvectors of the resulting covariance matrix of Z. The eigenvectors of the covariance matrix are the directions of the axes carrying the most information (variance), and the corresponding eigenvalues give the amount of variance along those directions.
6. Sorting the Eigenvectors
In this step we take all the eigenvalues and sort them in decreasing order, i.e., from largest to smallest, and simultaneously sort the eigenvectors accordingly into a matrix P. The resulting sorted matrix is named P*.
7. Calculating the new features, or Principal Components
Here we calculate the new features. To do this, we multiply the standardized matrix Z by the matrix P*. In the resulting matrix Z*, each observation is a linear combination of the original features, and the columns of Z* are independent of each other.
8. Removing less important features from the new dataset
The new feature set has been obtained, so we decide what to keep and what to remove. We keep only the relevant or important features in the new dataset, and the unimportant features are removed.
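A minimal NumPy sketch of steps 2 to 8 is shown below; the random data, the choice to keep two components, and the use of eigh on the covariance matrix are illustrative assumptions rather than part of the description above.

# PCA steps with plain NumPy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # step 2: rows = data items, columns = features

Z = (X - X.mean(axis=0)) / X.std(axis=0)           # step 3: standardize each column
cov = (Z.T @ Z) / (len(Z) - 1)                     # step 4: covariance matrix of Z

eigvals, eigvecs = np.linalg.eigh(cov)             # step 5: eigenvalues and eigenvectors
order = np.argsort(eigvals)[::-1]                  # step 6: sort from largest to smallest
P_star = eigvecs[:, order]

Z_star = Z @ P_star                                # step 7: principal components
Z_reduced = Z_star[:, :2]                          # step 8: keep only the top two components
print("explained variance ratio of the kept components:", eigvals[order][:2] / eigvals.sum())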
Applications of Principal Component Analysis
o PCA is mainly used as the dimensionality reduction technique in various AI applications such as
computer vision, image compression, etc.
o It can also be used for finding hidden patterns if data has high dimensions. Some fields where PCA
is used are Finance, data mining, Psychology, etc.
