
Cluster Analysis

Clustering analysis is a method in unsupervised learning that groups similar data objects into clusters for better interpretation and analysis. It has various applications, including market research and image processing, and requires certain characteristics like interpretability and scalability. Different clustering methods include partitioning, hierarchical, centroid-based, and density-based approaches, each with unique features and use cases.

Uploaded by

Rohan Malik
Copyright © All Rights Reserved

Clustering Analysis

Unsupervised Learning

By: Muhammad Fraz


Clustering
• Clustering groups data objects so that similar objects are placed together.
• Each such group is called a cluster.
• In cluster analysis, a data set is divided into groups based on the similarity of the data.
• After the data is divided into groups, a label is assigned to each group. Grouping the data in this way helps the analysis adapt to changes in the data.
Applications of Data Mining Cluster Analysis

• Market research, pattern recognition, data analysis, and image processing.
• Discovering distinct groups in a customer base.
• Deriving plant and animal taxonomies and categorizing genes with similar functionality.
• Information discovery on the Web.
Requirements of Clustering in Data Mining

•Interpretability
The result of clustering should be usable, understandable and interpretable.
•Helps in dealing with messy data
Data is usually messy and unstructured, so it cannot be analyzed quickly; this is why clustering is so significant in data mining. Clustering gives the data some structure by organizing it into groups of similar objects, which makes it easier for a data expert to process the data and to discover new things.
•High dimensionality
Clustering should be able to handle high-dimensional data as well as data with only a few dimensions.
•Discovery of clusters with arbitrary shape
Clustering algorithms should be able to detect clusters of arbitrary shape, not only small, spherical clusters.
•Ability to deal with different kinds of data
Clustering algorithms should work with many kinds of data, such as binary, categorical, and interval-based data.
•Scalability
Databases are usually very large, so a clustering algorithm must be scalable enough to handle them.
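To make the "different kinds of data" requirement concrete, here is a minimal sketch of one dissimilarity measure per data type. The function names and the choice of measures (Euclidean distance for interval-based data, simple matching for categorical data, Jaccard distance for asymmetric binary data) are our assumptions; other measures are equally valid.

```python
import math
from typing import Sequence


def interval_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance for interval-based (numeric) attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def categorical_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Simple matching: the fraction of categorical attributes that differ."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def binary_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard distance for asymmetric binary (0/1) attributes.

    Attributes where both objects are 0 are ignored.
    """
    both_one = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either_one = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either_one == 0:
        return 0.0  # no positive attributes at all: treat the objects as identical
    return 1.0 - both_one / either_one


print(interval_distance([0.0, 0.0], [3.0, 4.0]))        # 5.0
print(categorical_distance(["red", "S"], ["red", "M"]))  # 0.5
print(binary_distance([1, 0, 1, 0], [1, 1, 0, 0]))       # 0.6666666666666667
```

Whichever measure is used, a clustering algorithm only needs a dissimilarity function, so supporting a new data type mostly means choosing an appropriate measure.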
Data Mining Clustering Methods
1. Partitioning Clustering Method
In this method, suppose the database contains p objects and m partitions are made from them. Each partition represents a cluster, and m < p.
• Each object must belong to exactly one group.
• Each group must contain at least one object.
• Some points to remember about this type of partitioning clustering method, e.g. K
• If the number of partitions (say m) is given in advance, the method first creates an initial partitioning.
• It then uses a technique called iterative relocation: objects are moved from one group to another to improve the partitioning.
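The partitioning steps above (an initial partitioning into m groups, then iterative relocation) can be sketched as a minimal k-means loop on 2-D points. This is an illustrative pure-Python sketch, not the slides' own method: the function names `initial_centres` and `partition`, the farthest-first initialization, and the iteration cap are our assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def initial_centres(points: List[Point], m: int) -> List[Point]:
    """Farthest-first initialization: start from the first point, then keep
    adding the point farthest from all centres chosen so far."""
    centres = [points[0]]
    while len(centres) < m:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        centres.append(farthest)
    return centres


def partition(points: List[Point], m: int, max_iter: int = 100) -> List[int]:
    """Assign each point to one of m groups by iterative relocation (k-means)."""
    centres = initial_centres(points, m)
    labels: List[int] = []
    for _ in range(max_iter):
        # Relocation: every object moves to the group whose centre is nearest,
        # so each object belongs to exactly one group.
        new_labels = [min(range(m), key=lambda c: math.dist(p, centres[c])) for p in points]
        if new_labels == labels:
            break  # nothing moved: the partitioning has converged
        labels = new_labels
        # Update each centre to the mean of its members (an empty group keeps
        # its old centre; a fuller implementation would reseed it).
        for c in range(m):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
print(partition(points, 2))  # [0, 0, 0, 1, 1, 1]
```

Because each object receives exactly one label, the "one group per object" rule holds by construction; the "no empty group" rule is only loosely guarded, as the comment in the update step notes.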
2. Hierarchical Clustering Methods (Nested Cluster)
In hierarchical clustering, the given set of data objects is organized into a hierarchical decomposition. Methods of this kind are classified by how the hierarchical decomposition is formed. There are two approaches to creating it:
a. Divisive Approach (top-down approach)
At the beginning of this method, all the data objects are kept in the same cluster. In each successive iteration, a cluster is split into smaller clusters, and the iteration continues until a termination condition is met. Once a group is split or merged, the step cannot be undone, which is why this method is not very flexible.
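A minimal sketch of the divisive (top-down) idea on one-dimensional data, assuming the split rule "cut a cluster at the widest gap between neighboring sorted values" and the termination condition "stop when k clusters exist". Both the rule and the function name `divisive_1d` are illustrative choices, not taken from the slides.

```python
from typing import List


def divisive_1d(values: List[float], k: int) -> List[List[float]]:
    """Top-down clustering of 1-D values.

    Start with all values in one cluster, then repeatedly split the cluster
    that contains the widest gap between consecutive sorted values, until
    k clusters exist (the termination condition).
    """
    clusters = [sorted(values)]
    while len(clusters) < k:
        best = None  # (gap width, cluster index, split position)
        for ci, cluster in enumerate(clusters):
            for i in range(1, len(cluster)):
                gap = cluster[i] - cluster[i - 1]
                if best is None or gap > best[0]:
                    best = (gap, ci, i)
        if best is None:
            break  # every cluster is a single value; nothing left to split
        _, ci, i = best
        cluster = clusters.pop(ci)
        # Replace the cluster by its two halves; later iterations never undo this.
        clusters[ci:ci] = [cluster[:i], cluster[i:]]
    return clusters


print(divisive_1d([1.0, 1.5, 2.0, 9.0, 9.5, 20.0], 3))
# [[1.0, 1.5, 2.0], [9.0, 9.5], [20.0]]
```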
b. Agglomerative Approach (bottom-up approach)
All the objects start in separate groups. Groups are then merged step by step until all of them are merged into one group or a termination condition is met.
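The bottom-up procedure can be sketched as follows, assuming single linkage (the distance between two groups is the distance between their closest members) and the termination condition "stop when k groups remain". This naive version re-examines every pair of groups on each merge, so it only suits small inputs; the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]


def single_link(a: List[Point], b: List[Point]) -> float:
    """Distance between the closest pair of members of groups a and b."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points: List[Point], k: int) -> List[List[Point]]:
    """Bottom-up clustering: every point starts in its own group, and the two
    closest groups are merged until only k groups remain."""
    groups = [[p] for p in points]
    while len(groups) > k:
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda ij: single_link(groups[ij[0]], groups[ij[1]]),
        )
        groups[i].extend(groups[j])  # merge group j into group i (i < j)
        del groups[j]
    return groups


pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
print(agglomerative(pts, 2))
# [[(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)], [(20.0, 20.0)]]
```

Choosing a different linkage (for example, the distance between group means) changes only `single_link`; the merge loop stays the same.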
Two approaches can be used to improve the quality of hierarchical clustering in data mining:
1. Carefully analyze the object linkages at each hierarchical partitioning.
2. Integrate hierarchical agglomeration with other techniques: first, a hierarchical agglomerative algorithm groups the objects into micro-clusters; then macro-clustering is performed on the micro-clusters.
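The second approach (micro-clusters first, then macro-clustering) can be sketched in one dimension. Assumptions: the micro step is single-link agglomeration stopped at a distance `radius`, each micro-cluster is summarized only by its (count, sum), and the macro step merges the micro-clusters with the closest centres until `k` remain. These rules and all names are hypothetical simplifications.

```python
from typing import List, Tuple

# A micro-cluster summary: (number of values, sum of values).
Summary = Tuple[int, float]


def micro_clusters(values: List[float], radius: float) -> List[Summary]:
    """Single-link agglomeration stopped at `radius`: in 1-D this chains
    consecutive sorted values whose gap is at most `radius`."""
    if not values:
        return []
    ordered = sorted(values)
    summaries: List[Summary] = []
    count, total = 1, ordered[0]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev <= radius:
            count, total = count + 1, total + cur
        else:
            summaries.append((count, total))
            count, total = 1, cur
    summaries.append((count, total))
    return summaries


def macro_clusters(micro: List[Summary], k: int) -> List[Summary]:
    """Merge the two micro-clusters with the closest centres until k remain.

    Only the (count, sum) summaries are used, never the original values.
    """
    groups = list(micro)  # sorted by centre, so the closest pair is adjacent
    while len(groups) > k:
        centres = [s / n for n, s in groups]
        i = min(range(len(groups) - 1), key=lambda j: centres[j + 1] - centres[j])
        (n1, s1), (n2, s2) = groups[i], groups[i + 1]
        groups[i : i + 2] = [(n1 + n2, s1 + s2)]
    return groups


data = [1.0, 1.1, 1.2, 3.0, 3.1, 10.0, 10.2, 10.4]
micro = micro_clusters(data, radius=0.5)
print([n for n, _ in micro])                         # [3, 2, 3]
print([n for n, _ in macro_clusters(micro, k=2)])    # [5, 3]
```

Working on compact summaries is what makes the macro step cheap: it touches three micro-clusters here instead of eight raw values.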
3. Centroid-based Clustering

Centroid-based clustering organizes the data into non-hierarchical clusters, in contrast to hierarchical clustering.
k-means is the most widely used centroid-based clustering algorithm.
Centroid-based algorithms are efficient but sensitive to initial conditions and outliers.
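To illustrate the sensitivity to outliers: a centroid is the mean of its members, so a single extreme value can drag it far away from the bulk of the cluster. This small sketch, using made-up 1-D values, compares the mean with the median of the same group; the median is shown only as a contrast.

```python
from statistics import mean, median

# A tight 1-D cluster, and the same cluster after one outlier joins it.
cluster = [2.0, 2.5, 3.0, 3.5, 4.0]
with_outlier = cluster + [100.0]

# The centroid (mean) moves from 3.0 to about 19.17 because of one value...
print(mean(cluster), mean(with_outlier))
# ...while the median only moves from 3.0 to 3.25.
print(median(cluster), median(with_outlier))
```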
4. Exclusive clustering

• Each object or data point belongs to exactly one cluster.
• In most cases, this is also the most desired form of clustering.
• For example, a customer may need to belong to only one segmentation group, so that a unique, dedicated, and exclusive marketing effort can be formulated as part of a campaign.
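Exclusive (hard) assignment can be sketched by giving each customer exactly one segment label: the segment whose centre is nearest. The segment names, centres, and the two customer features (age, annual spend in thousands) are made-up illustrative values, and the features are not normalized, for brevity.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]

# Hypothetical segment centres: (age, annual spend in thousands).
SEGMENTS: Dict[str, Point] = {
    "young_budget": (22.0, 5.0),
    "family": (40.0, 20.0),
    "senior": (68.0, 12.0),
}


def assign_segment(customer: Point) -> str:
    """Return the single segment whose centre is nearest (exclusive clustering)."""
    return min(SEGMENTS, key=lambda name: math.dist(customer, SEGMENTS[name]))


customers = {"c1": (24.0, 6.0), "c2": (38.0, 22.0), "c3": (70.0, 10.0)}
assignment = {cid: assign_segment(x) for cid, x in customers.items()}
print(assignment)  # {'c1': 'young_budget', 'c2': 'family', 'c3': 'senior'}
```

Because `assign_segment` returns one name, each customer ends up in exactly one segment, which is what a dedicated per-segment campaign needs.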
5. Non-Exclusive clustering

An object can be part of more than one cluster. Such objects are mostly borderline objects, for which the cluster boundaries are defined to overlap.
An example is demographic clustering, in which a student could be part of both a student cluster and a high-spender cluster, although this would be rare.
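Overlapping (non-exclusive) membership can be sketched by letting an object join every cluster whose centre lies within a chosen radius, so borderline objects receive more than one label. The cluster names, centres, and radius value below are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical cluster centres: (age, annual spend in thousands).
CENTRES: Dict[str, Point] = {
    "student": (20.0, 4.0),
    "high_spender": (35.0, 30.0),
}


def memberships(obj: Point, radius: float) -> List[str]:
    """Return every cluster whose centre lies within `radius` of obj.

    Clusters may overlap, so a borderline object can get several labels
    (or none, if it lies far from every centre).
    """
    return [name for name, c in CENTRES.items() if math.dist(obj, c) <= radius]


print(memberships((19.0, 5.0), radius=20.0))   # ['student']
print(memberships((25.0, 18.0), radius=20.0))  # ['student', 'high_spender']
```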
6. Model-Based Clustering Methods

In this type of clustering method, a model is hypothesized for each cluster, and the method finds the best fit of the data to that model. Clusters can be located by constructing a density function that reflects the spatial distribution of the data points. Approaches include:
• Neural network approaches
• Machine learning approaches
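As one concrete model-based example (our choice; the slides do not name a specific model), the sketch below hypothesizes a Gaussian model for each of two clusters in 1-D data and fits them with the expectation-maximization (EM) algorithm. The fitted mixture is a density function describing the data's distribution, and each point is assigned to the component most likely to have generated it. The min/max initialization and the fixed iteration count are simplifying assumptions.

```python
import math
from typing import List, Tuple


def normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(xs: List[float], iters: int = 50) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 2-component 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    weights = [0.5, 0.5]
    means = [min(xs), max(xs)]  # simple deterministic initialization
    overall = sum(xs) / len(xs)
    var0 = sum((x - overall) ** 2 for x in xs) / len(xs)
    variances = [var0, var0]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate each component from its weighted points.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk,
                1e-6,  # floor keeps a component from collapsing onto one point
            )
    return weights, means, variances


def assign(xs: List[float], weights: List[float], means: List[float], variances: List[float]) -> List[int]:
    """Label each point with the component that most likely generated it."""
    return [
        max(range(2), key=lambda k: weights[k] * normal_pdf(x, means[k], variances[k]))
        for x in xs
    ]


data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.3, 4.8, 5.1, 4.9]
w, mu, var = fit_two_gaussians(data)
print([round(m, 2) for m in mu])  # approximately [1.0, 5.02]
print(assign(data, w, mu, var))   # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```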
7. Density-Based Clustering Method

In this clustering method, density is the main focus: the notion of density is the basis for forming clusters. A cluster keeps growing as long as the density of its neighborhood is high enough; that is, for each data point in the cluster, the neighborhood of a given radius must contain at least a minimum number of points.
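The density-based idea above is what DBSCAN implements: a point with at least `min_pts` neighbors within radius `eps` is a core point, and a cluster keeps growing through the neighborhoods of its core points. The sketch below is a minimal O(n²) version for 2-D points; the parameter names follow common DBSCAN convention, and the label -1 for noise is our choice.

```python
import math
from collections import deque
from typing import List, Optional, Tuple

Point = Tuple[float, float]
NOISE = -1


def region_query(points: List[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id per point, or NOISE (-1) for points in no dense region."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep growing
        cluster_id += 1
    return [NOISE if lab is None else lab for lab in labels]


pts = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5),
       (10.0, 10.0), (10.0, 10.5), (10.5, 10.0), (5.0, 20.0)]
print(dbscan(pts, eps=1.0, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, this does not need the number of clusters in advance, and clusters can take arbitrary shapes because they grow through chains of dense neighborhoods.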
8. Constraint-Based Clustering Method
Application-oriented or user-oriented constraints are incorporated to perform the clustering.
A constraint expresses the user's expectation of the clustering result.
Because the user supplies these constraints, the grouping process is highly interactive.
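A common way to express user constraints is with cannot-link (and must-link) pairs of objects. The sketch below shows only a constrained assignment step in the style of COP-KMeans: each point goes to the nearest centre that would not place it in the same cluster as a point it cannot link with. The fixed centres, the `cannot_link` pairs, and the function name are illustrative assumptions; must-link handling and the centre-update loop are omitted, and the greedy result depends on point order.

```python
import math
from typing import List, Optional, Set, Tuple

Point = Tuple[float, float]


def constrained_assign(
    points: List[Point],
    centres: List[Point],
    cannot_link: Set[Tuple[int, int]],
) -> List[Optional[int]]:
    """Assign each point to its nearest centre that respects cannot-link pairs.

    cannot_link holds index pairs (i, j) that must end up in different clusters.
    A point gets None if every cluster would violate a constraint.
    """
    labels: List[Optional[int]] = [None] * len(points)
    for i, p in enumerate(points):
        # Clusters already used by points that i is not allowed to join.
        forbidden = set()
        for a, b in cannot_link:
            other = b if a == i else a if b == i else None
            if other is not None and labels[other] is not None:
                forbidden.add(labels[other])
        allowed = [c for c in range(len(centres)) if c not in forbidden]
        if allowed:
            labels[i] = min(allowed, key=lambda c: math.dist(p, centres[c]))
    return labels


pts = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0)]
centres = [(0.0, 0.0), (5.0, 5.0)]
print(constrained_assign(pts, centres, set()))     # [0, 0, 1]
print(constrained_assign(pts, centres, {(0, 1)}))  # [0, 1, 1]
```

The second call shows the user's constraint overriding pure distance: point 1 is nearer centre 0 but is forced into cluster 1.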
9. Grid-Based Clustering Method

Grid-based clustering methods use a multi-resolution grid data structure. They quantize the object space into a finite number of cells that form a grid structure, on which all clustering operations are performed.
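A single-resolution version of the grid idea can be sketched as follows: quantize 2-D points into square cells of side `cell_size`, keep the cells holding at least `min_count` points, and join touching dense cells into clusters. Real grid-based methods use multi-resolution structures; the parameter names and the 8-neighbor adjacency rule here are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def to_cell(p: Point, cell_size: float) -> Cell:
    """Quantize a point to the integer coordinates of its grid cell."""
    return (math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))


def grid_clusters(points: List[Point], cell_size: float, min_count: int) -> Dict[Cell, int]:
    """Return a cluster id for every dense cell.

    After quantization, all clustering work is done on cell counts,
    not on the individual points.
    """
    counts = Counter(to_cell(p, cell_size) for p in points)
    dense = {cell for cell, n in counts.items() if n >= min_count}

    cluster_of: Dict[Cell, int] = {}
    next_id = 0
    for start in sorted(dense):
        if start in cluster_of:
            continue
        # Flood-fill through dense cells that touch (including diagonally).
        stack = [start]
        cluster_of[start] = next_id
        while stack:
            cx, cy = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in cluster_of:
                        cluster_of[nb] = next_id
                        stack.append(nb)
        next_id += 1
    return cluster_of


def label_points(points: List[Point], cell_size: float, cluster_of: Dict[Cell, int]) -> List[int]:
    """Label each point by its cell's cluster, or -1 if its cell is not dense."""
    return [cluster_of.get(to_cell(p, cell_size), -1) for p in points]


pts = [(0.1, 0.1), (0.5, 0.5), (1.2, 0.3), (1.5, 0.8), (9.1, 9.2), (9.5, 9.9), (5.0, 5.0)]
cells = grid_clusters(pts, cell_size=1.0, min_count=2)
print(label_points(pts, 1.0, cells))  # [0, 0, 0, 0, 1, 1, -1]
```

The cost of the clustering step depends on the number of occupied cells rather than the number of points, which is the main appeal of grid-based methods for large data sets.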
