
Study Material of Unit-5

Syllabus - Clustering and unsupervised learning: learning from unclassified data. Clustering. Hierarchical Agglomerative clustering. K-Means partitional clustering.
What is Unsupervised Learning?
As the name suggests, unsupervised learning is a machine learning technique in which models are not supervised using a labeled training dataset. Instead, the model itself finds the hidden patterns and insights in the given data. It can be compared to the learning that takes place in the human brain while learning new things. It can be defined as:

Unsupervised learning is a type of machine learning in which models are trained on an unlabeled dataset and are allowed to act on that data without any supervision.

Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have input data but no corresponding output data. The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.

Example: Suppose an unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs. The algorithm is never trained on the given dataset, which means it has no prior knowledge of the features of the dataset. The task of the unsupervised learning algorithm is to identify the image features on its own. It will perform this task by clustering the image dataset into groups according to the similarities between images.
Why use Unsupervised Learning?
Below are the main reasons that describe the importance of unsupervised learning:

o Unsupervised learning is helpful for finding useful insights from data.
o Unsupervised learning closely resembles how humans learn to think from their own experiences, which brings it closer to true AI.
o Unsupervised learning works on unlabeled and uncategorized data, which makes it all the more important.
o In the real world, we do not always have input data with corresponding outputs; to solve such cases, we need unsupervised learning.

Working of Unsupervised Learning


The working of unsupervised learning can be understood from the diagram below:

Here, we take unlabeled input data, meaning it is not categorized and no corresponding outputs are given. This unlabeled input data is fed to the machine learning model in order to train it. First, the model interprets the raw data to find hidden patterns, and then a suitable algorithm, such as k-means clustering or hierarchical clustering, is applied.

Once a suitable algorithm is applied, it divides the data objects into groups according to the similarities and differences between the objects.
Types of Unsupervised Learning Algorithm:
The unsupervised learning algorithm can be further categorized into two types of
problems:

o Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in a group and have few or no similarities with the objects of another group. Cluster analysis finds the commonalities between data objects and categorizes them according to the presence or absence of those commonalities.
o Association: An association rule is an unsupervised learning method used for finding relationships between variables in a large database. It determines the sets of items that occur together in the dataset. Association rules make marketing strategies more effective; for example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam). A typical application of association rules is Market Basket Analysis.
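To make the association idea concrete, here is a minimal Python sketch (not a full Apriori implementation) that counts how often pairs of items occur together in a few hypothetical transactions; the data is invented purely for illustration:

from itertools import combinations
from collections import Counter

# Toy market-basket transactions (hypothetical data for illustration)
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"bread", "jam"},
]

# Count how often each unordered pair of items occurs together
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support of a pair = fraction of transactions containing both items
for pair, count in pair_counts.most_common(3):
    print(pair, "support =", count / len(transactions))

Pairs with high support are candidates for rules such as "bread implies butter".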

Clustering in Machine Learning


Clustering or cluster analysis is a machine learning technique that groups an unlabelled dataset. It can be defined as "a way of grouping the data points into different clusters consisting of similar data points. The objects with possible similarities remain in a group that has few or no similarities with another group."

It does this by finding similar patterns in the unlabelled dataset, such as shape, size, color, and behavior, and divides the data according to the presence or absence of those patterns.
It is an unsupervised learning method; hence no supervision is provided to the algorithm, and it deals with an unlabeled dataset.

After applying this clustering technique, each cluster or group is given a cluster ID, which the ML system can use to simplify the processing of large and complex datasets.

The clustering technique is commonly used for statistical data analysis.

Note: Clustering is somewhat similar to classification, but the difference is the type of dataset used. In classification, we work with a labeled dataset, whereas in clustering, we work with an unlabelled dataset.

Example: Let's understand the clustering technique with a real-world example of a shopping mall: when we visit a mall, we can observe that things with similar usage are grouped together, such as t-shirts in one section and trousers in another; similarly, in the fruit and vegetable sections, apples, bananas, mangoes, etc., are kept separately so that we can easily find things. The clustering technique works in the same way. Another example of clustering is grouping documents by topic.

The clustering technique can be widely used in various tasks. Some of the most common uses of this technique are:

o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.

Apart from these general usages, clustering is used by Amazon in its recommendation system to provide recommendations based on past product searches. Netflix also uses this technique to recommend movies and web series to its users based on their watch history.

The diagram below explains the working of the clustering algorithm: the different fruits are divided into several groups with similar properties.
Types of Clustering Methods
Clustering methods are broadly divided into hard clustering (each data point belongs to only one group) and soft clustering (a data point can belong to more than one group). Beyond this, various other approaches to clustering exist. Below are the main clustering methods used in machine learning:

1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering

Partitioning Clustering

It is a type of clustering that divides the data into non-hierarchical groups. It is also known as the centroid-based method. The most common example of partitioning clustering is the K-Means clustering algorithm.

In this type, the dataset is divided into a set of k groups, where K defines the number of pre-defined groups. The cluster centers are created in such a way that the distance between the data points of one cluster is minimal compared with their distance to other cluster centroids.
Partitioning clustering is further subdivided into:

➢ K-Means clustering
➢ Fuzzy C-Means clustering
Here, the features or characteristics are compared, and all objects having similar
characteristics are clustered together.

• In k-means clustering, the objects are divided into the number of clusters specified by the number 'K.' So if we say K = 2, the objects are divided into two clusters, c1 and c2, as shown:

• Fuzzy c-means is very similar to k-means in the sense that it clusters objects with similar characteristics together. In k-means clustering, a single object cannot belong to two different clusters, but in c-means, objects can belong to more than one cluster, as shown.
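To illustrate partitioning with K = 2, here is a minimal sketch using scikit-learn's KMeans on invented toy data (the points and parameter values are chosen only for demonstration):

import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D points forming two loose groups (illustrative data)
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [8.0, 8.0], [8.5, 7.8], [7.9, 8.3]])

# K = 2: partition the points into two clusters
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print("Cluster labels:", kmeans.labels_)        # e.g. [0 0 0 1 1 1]
print("Cluster centers:\n", kmeans.cluster_centers_)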
Density-Based Clustering

The density-based clustering method connects highly dense areas into clusters, and arbitrarily shaped clusters are formed as long as the dense regions can be connected. The algorithm does this by identifying different clusters in the dataset and connecting the areas of high density into clusters. The dense areas in the data space are separated from each other by sparser areas.

These algorithms can face difficulty in clustering the data points if the dataset has
varying densities and high dimensions.
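As a brief illustration of density-based clustering, the sketch below runs scikit-learn's DBSCAN on the two-moons toy dataset, an arbitrary shape that centroid-based methods handle poorly; the eps and min_samples values are illustrative and would normally need tuning:

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-circles: a shape partition-based methods struggle with
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# eps: neighborhood radius; min_samples: points needed to form a dense region
db = DBSCAN(eps=0.3, min_samples=5).fit(X)

# Label -1 marks noise points that belong to no dense region
print("Clusters found:", len(set(db.labels_) - {-1}))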
Distribution Model-Based Clustering

In the distribution model-based clustering method, the data is divided based on the probability that a data point belongs to a particular distribution. The grouping is done by assuming some distribution, most commonly the Gaussian distribution.

An example of this type is the Expectation-Maximization clustering algorithm, which uses Gaussian Mixture Models (GMM).
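A minimal sketch of distribution-based clustering with scikit-learn's GaussianMixture, which is fitted by Expectation-Maximization; the synthetic blobs are for illustration only:

from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic blobs drawn from Gaussian-like clusters
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit a mixture of 3 Gaussians via Expectation-Maximization
gmm = GaussianMixture(n_components=3, random_state=42).fit(X)

labels = gmm.predict(X)            # hard assignment to the most likely component
probs = gmm.predict_proba(X[:5])   # soft (probabilistic) memberships
print(labels[:10])
print(probs.round(3))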

Hierarchical Clustering

Hierarchical clustering can be used as an alternative to partitional clustering, as there is no requirement to pre-specify the number of clusters to be created. In this technique, the dataset is divided into clusters to create a tree-like structure, which is also called a dendrogram. Any number of clusters can then be selected by cutting the tree at the appropriate level. The most common example of this method is the Agglomerative Hierarchical algorithm.
Fuzzy Clustering

Fuzzy clustering is a type of soft method in which a data object may belong to more than one group or cluster. Each data point has a set of membership coefficients that express its degree of membership in each cluster. The Fuzzy C-means algorithm is the standard example of this type of clustering; it is sometimes also known as the Fuzzy k-means algorithm.
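To show what the membership coefficients look like in practice, here is a minimal NumPy sketch of the fuzzy c-means update loop, written from the standard textbook update equations; the function name, toy data, and parameter choices are illustrative, not a reference implementation:

import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means sketch: returns centers and soft memberships."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row (one per data point) sums to 1
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Centers: membership-weighted means of the data
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distances from every point to every center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        # Membership update: closer centers receive higher membership
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8], [3.0, 3.0]])
centers, U = fuzzy_c_means(X, c=2)
print(U.round(2))  # the middle point gets split membership across both clusters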

Clustering Algorithms
Clustering algorithms can be divided based on the models explained above. Many different clustering algorithms have been published, but only a few are commonly used. The choice of clustering algorithm depends on the kind of data we are using: some algorithms need the number of clusters to be guessed, whereas others work by finding the minimum distance between observations of the dataset.

Below is the list of some popular unsupervised learning algorithms:

o K-means clustering
o KNN (k-nearest neighbors)
o Hierarchical clustering
o Anomaly detection
o Neural networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition

Here we discuss the most popular clustering algorithms that are widely used in machine learning:

1. K-Means algorithm: The k-means algorithm is one of the most popular clustering algorithms. It partitions the dataset by dividing the samples into clusters of roughly equal variance. The number of clusters must be specified for this algorithm. It is fast, requiring comparatively few computations, with complexity linear in the number of points, O(n).
2. Mean-shift algorithm: The mean-shift algorithm tries to find dense areas in a smoothed density of data points. It is an example of a centroid-based model that works by updating candidate centroids to be the mean of the points within a given region.
3. DBSCAN algorithm: It stands for Density-Based Spatial Clustering of Applications with Noise. It is an example of a density-based model, similar to mean-shift but with some notable advantages. In this algorithm, areas of high density are separated by areas of low density; because of this, clusters can be found in any arbitrary shape.
4. Expectation-Maximization clustering using GMM: This algorithm can be used as an alternative to the k-means algorithm, or for cases where k-means can fail. In GMM, the data points are assumed to be Gaussian distributed.
5. Agglomerative Hierarchical algorithm: The agglomerative hierarchical algorithm performs bottom-up hierarchical clustering. In this, each data point is treated as a single cluster at the outset, and clusters are then successively merged. The cluster hierarchy can be represented as a tree structure.
6. Affinity Propagation: It differs from other clustering algorithms in that it does not require the number of clusters to be specified. In this algorithm, data points exchange messages between pairs of points until convergence. Its O(N²T) time complexity (N points, T iterations) is the main drawback of this algorithm.
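Several of the algorithms above are available in scikit-learn behind the same fit/labels_ interface, so, as a rough sketch, they can be compared side by side on one toy dataset (the parameter values are illustrative defaults, not tuned):

from sklearn.cluster import KMeans, MeanShift, DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=1)

models = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=1),
    "mean-shift": MeanShift(),
    "DBSCAN": DBSCAN(eps=1.0, min_samples=5),
    "agglomerative": AgglomerativeClustering(n_clusters=3),
}

for name, model in models.items():
    labels = model.fit(X).labels_
    n = len(set(labels) - {-1})   # DBSCAN labels noise points as -1
    print(f"{name}: {n} clusters")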

Applications of Clustering
Below are some commonly known applications of clustering technique in Machine
Learning:
o In Identification of Cancer Cells: Clustering algorithms are widely used for the identification of cancerous cells, dividing cancerous and non-cancerous data points into different groups.
o In Search Engines: Search engines also work on the clustering technique. The search results appear based on the objects closest to the search query. This is done by grouping similar data objects into one group that is far from dissimilar objects. The accuracy of a query's results depends on the quality of the clustering algorithm used.
o Customer Segmentation: Clustering is used in market research to segment customers based on their choices and preferences.
o In Biology: It is used in the biology stream to classify different species of
plants and animals using the image recognition technique.
o In Land Use: The clustering technique is used to identify areas of similar land use in a GIS database. This can be very useful for determining the purpose for which a particular piece of land is most suitable.

Hierarchical Clustering in Machine Learning


Hierarchical clustering is another unsupervised machine learning algorithm, used to group unlabeled datasets into clusters; it is also known as Hierarchical Cluster Analysis or HCA.

In this algorithm, we develop the hierarchy of clusters in the form of a tree, and
this tree-shaped structure is known as the dendrogram.

Sometimes the results of K-means clustering and hierarchical clustering may look similar, but the two differ in how they work, and in hierarchical clustering there is no requirement to predetermine the number of clusters as there is in the K-means algorithm.

The hierarchical clustering technique has two approaches:

1. Agglomerative: Agglomerative is a bottom-up approach, in which the algorithm starts by taking all data points as single clusters and merges them until one cluster is left.
2. Divisive: The divisive algorithm is the reverse of the agglomerative algorithm, as it is a top-down approach: it starts with all data points in one cluster and recursively splits it.
Why hierarchical clustering?

We already have other clustering algorithms such as K-means clustering, so why do we need hierarchical clustering? As we have seen, K-means clustering has some challenges: it needs a predetermined number of clusters, and it always tries to create clusters of roughly the same size. To solve these two challenges, we can opt for the hierarchical clustering algorithm because, in this algorithm, we don't need prior knowledge of the number of clusters.

In this topic, we will discuss the Agglomerative Hierarchical clustering algorithm.

Agglomerative Hierarchical clustering

The agglomerative hierarchical clustering algorithm is a popular example of HCA. To group the data into clusters, it follows the bottom-up approach: the algorithm treats each data point as a single cluster at the beginning and then starts combining the closest pairs of clusters. It does this until all clusters are merged into a single cluster that contains the entire dataset.

This hierarchy of clusters is represented in the form of the dendrogram.

How does Agglomerative Hierarchical Clustering work?

The working of the AHC algorithm can be explained using the below steps:

o Step-1: Treat each data point as a single cluster. If there are N data points, the number of clusters will also be N.
o Step-2: Take the two closest data points or clusters and merge them to form one cluster, so there will now be N-1 clusters.
o Step-3: Again, take the two closest clusters and merge them together to form one cluster, leaving N-2 clusters.
o Step-4: Repeat Step 3 until only one cluster is left, giving the clusters shown in the images below.
o Step-5: Once all the clusters are combined into one big cluster, develop the dendrogram to divide the clusters as the problem requires.
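A minimal sketch of these steps using scikit-learn's AgglomerativeClustering, which performs the successive merging internally; the toy data is invented for illustration:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Six points in two obvious groups (illustrative data)
X = np.array([[1, 1], [1.2, 1.1], [0.9, 0.8],
              [6, 6], [6.3, 5.9], [5.8, 6.2]])

# Bottom-up merging; stop when 2 clusters remain
agg = AgglomerativeClustering(n_clusters=2, linkage="single").fit(X)
print(agg.labels_)   # e.g. [0 0 0 1 1 1]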

Note: To better understand hierarchical clustering, it is advised to first have a look at k-means clustering.

Measures for the distance between two clusters

As we have seen, the distance between the two closest clusters is crucial for hierarchical clustering. There are various ways to calculate the distance between two clusters, and these ways decide the rule for clustering. These measures are called linkage methods. Some of the popular linkage methods are given below:
1. Single Linkage: It is the shortest distance between the closest points of the two clusters. Consider the below image:

2. Complete Linkage: It is the farthest distance between points of two different clusters. It is one of the popular linkage methods, as it forms tighter clusters than single linkage.

3. Average Linkage: It is the linkage method in which the distance between each pair of points (one from each cluster) is added up and then divided by the total number of pairs to calculate the average distance between two clusters. It is also one of the most popular linkage methods.
4. Centroid Linkage: It is the linkage method in which the distance between the centroids of the two clusters is calculated. Consider the below image:

We can apply any of the above approaches according to the type of problem or business requirement.
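As a small sketch, SciPy's hierarchical clustering routines accept these linkage methods by name, so their effect can be compared directly on toy data; the method strings below are SciPy's own:

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.array([[1, 1], [1.5, 1.2], [5, 5], [5.5, 5.2], [3, 4]])

for method in ("single", "complete", "average", "centroid"):
    Z = linkage(X, method=method)            # the merge history (one row per merge)
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
    print(method, labels)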

Working of Dendrogram in Hierarchical clustering

The dendrogram is a tree-like structure that records each merge step the HC algorithm performs. In the dendrogram plot, the Y-axis shows the Euclidean distances between the data points (or clusters), and the X-axis shows all the data points of the given dataset.

The working of the dendrogram can be explained using the below diagram, in which the left part shows how clusters are created in agglomerative clustering, and the right part shows the corresponding dendrogram.

o As we have discussed above, the data points P2 and P3 first combine to form a cluster, and correspondingly a dendrogram link is created connecting P2 and P3 with a rectangular shape. The height is decided according to the Euclidean distance between the data points.
o In the next step, P5 and P6 form a cluster, and the corresponding dendrogram link is created. It is higher than the previous one, as the Euclidean distance between P5 and P6 is a little greater than that between P2 and P3.
o Again, two new dendrogram links are created, combining P1, P2, and P3 in one branch, and P4, P5, and P6 in another.
o At last, the final dendrogram is created that combines all the data points together.

We can cut the dendrogram tree structure at any level as per our requirement.
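A minimal sketch of producing such a dendrogram with SciPy and Matplotlib; the six points loosely mirror P1 to P6 in the description above, and cutting the tree then amounts to choosing a distance threshold:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Six labeled points, loosely mirroring P1..P6 in the description above
X = np.array([[1, 2], [2, 2], [2.2, 2.1], [6, 6], [7, 7], [7.1, 7.2]])

Z = linkage(X, method="single")   # merge history for the dendrogram
dendrogram(Z, labels=["P1", "P2", "P3", "P4", "P5", "P6"])
plt.ylabel("Euclidean distance")
plt.show()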

K-Means Clustering Algorithm


K-Means clustering is an unsupervised learning algorithm that is used to solve clustering problems in machine learning and data science. In this topic, we will learn what the k-means clustering algorithm is and how it works, along with a Python implementation of k-means clustering.

What is K-Means Algorithm?

K-Means clustering is an unsupervised learning algorithm that groups an unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process: if K=2, there will be two clusters; for K=3, there will be three clusters; and so on.

It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group of points with similar properties.

It allows us to cluster the data into different groups and provides a convenient way to discover the categories of groups in an unlabeled dataset on its own, without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids.

The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process until the best clusters are found. The value of k should be predetermined in this algorithm.

The k-means clustering algorithm mainly performs two tasks:

o Determines the best values for the K center points or centroids by an iterative process.
o Assigns each data point to its closest k-center. The data points which are near a particular k-center create a cluster.

Hence each cluster contains data points with some commonalities and lies away from the other clusters.

The below diagram explains the working of the K-means Clustering Algorithm:

How does the K-Means Algorithm Work?

The working of the K-Means algorithm is explained in the below steps:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points as centroids. (They need not be points from the input dataset.)

Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.

Step-4: Calculate the variance and place a new centroid for each cluster (each centroid moves to the mean of its cluster's points).

Step-5: Repeat the third step: reassign each data point to the new closest centroid of its cluster.

Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.

Step-7: The model is ready.
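The steps above translate almost line for line into NumPy. Here is a minimal, illustrative sketch that assumes random initialization from the data points and Euclidean distance; it is a teaching sketch, not a production implementation:

import numpy as np

def k_means(X, k=2, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step-2: pick K random points from the data as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step-3/5: assign every point to its closest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step-4: move each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step-6: stop when no centroid moves (no reassignment will occur)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [8.5, 7.5], [1.1, 2.1]])
centroids, labels = k_means(X, k=2)
print(labels)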

Let's understand the above steps by considering the visual plots:

Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two
variables is given below:

o Let's take the number of clusters k, i.e., K=2, to identify the dataset and put the points into different clusters. It means here we will try to group these data points into two different clusters.
o We need to choose some random k points or centroids to form the clusters. These points can be either points from the dataset or any other points. So, here we select the two points below as k points, which are not part of our dataset. Consider the below image:

o Now we will assign each data point of the scatter plot to its closest k-point or centroid. We compute this by applying the mathematics we have studied for calculating the distance between two points. So, we will draw a median line (the perpendicular bisector) between both centroids. Consider the below image:

From the above image, it is clear that the points on the left side of the line are nearer to the K1 or blue centroid, and the points to the right of the line are closer to the yellow centroid. Let's color them blue and yellow for clear visualization.
o As we need to find the closest clusters, we will repeat the process by choosing new centroids. To choose the new centroids, we will compute the center of gravity of each cluster's points, and will find new centroids as below:

o Next, we will reassign each data point to the new centroids. For this, we will repeat the same process of finding a median line. The median will be as in the below image:

From the above image, we can see that one yellow point is on the left side of the line, and two blue points are to the right of the line. So, these three points will be assigned to the new centroids.

As reassignment has taken place, we will again go to step-4, which is finding new centroids or k-points.
o We will repeat the process by finding the center of gravity of each cluster, so the new centroids will be as shown in the below image:

o As we have the new centroids, we will again draw the median line and reassign the data points. So, the image will be:
o We can see in the above image that there are no dissimilar data points on either side of the line, which means our model has converged. Consider the below image:
As our model is ready, we can now remove the assumed centroids, and the two final clusters will be as shown in the below image:

How to choose the value of "K number of clusters" in K-means Clustering?

The performance of the K-means clustering algorithm depends on the quality of the clusters it forms, but choosing the optimal number of clusters is a big task. There are several different ways to find the optimal number of clusters; here we discuss one of the most appropriate methods for finding the number of clusters, or the value of K.
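One widely used approach is the elbow method: run k-means for a range of K values and plot the within-cluster sum of squares (exposed as inertia_ in scikit-learn); the K at which the curve bends sharply and then flattens (the "elbow") is a good choice. A minimal sketch on synthetic data:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# WCSS (inertia) for K = 1..10; look for the "elbow" in the plot
wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        for k in range(1, 11)]

plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS (inertia)")
plt.show()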
