Machine_Learning_Unit_4
1. Introduction to Clustering
2. Partitioning Methods
3. Hierarchical Methods
4. K-means Clustering
5. Association Rules
Overview of Clustering
Introduction to clustering
Definition 1:
Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other and dissimilar to the data points in other groups. It is basically a grouping of objects on the basis of the similarity and dissimilarity between them.
Definition 2:
"A way of grouping the data points into different clusters, consisting of similar data points. The
objects with the possible similarities remain in a group that has less or no similarities with another
group."
What is clustering?
Examples of clustering (grouping) include medical imaging, image segmentation, and anomaly detection.
The clustering methods are broadly divided into Hard clustering (each data point belongs to only one group) and Soft clustering (a data point can also belong to more than one group). The main types of clustering methods are:
Partitioning Clustering
Hierarchical Clustering
Density-Based Clustering
Model-Based Clustering
Grid-Based Methods
Partitioning Clustering
The most common example of partitioning clustering is the K-Means Clustering algorithm.
In this type, the dataset is divided into a set of k-groups, where K is used to define the number of
pre-defined groups.
The cluster center is created in such a way that the distance between the data points of one cluster is minimum as compared with the distance to another cluster centroid.
The following stages will help us understand how the K-Means clustering technique works:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points or centroids (they may or may not be points from the dataset).
Step-3: Assign each data point to its closest centroid, which forms the predefined K clusters.
Step-4: Compute the sum of squared distances between the data points and the centroids, place a new centroid for each cluster by averaging all of that cluster's data points, and allocate each data point to the cluster whose centroid is now closest.
Step-5: Repeat Step-3 and Step-4, i.e., reassign each data point to the new closest centroid of each cluster.
Step-6: If any reassignment occurs, go back to Step-4; otherwise the centroids are ideal and we go to FINISH.
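These steps map almost line by line onto code. Below is a minimal NumPy sketch of the procedure; the kmeans function name, the iteration cap, and the toy 2-D data are illustrative assumptions, not part of the notes.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-Means following the steps above; assumes no cluster becomes empty."""
    rng = np.random.default_rng(seed)
    # Step-2: pick k random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step-3: assign each point to its closest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step-4: place a new centroid for each cluster by averaging its points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step-5/6: stop once the centroids no longer move (no reassignment)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Toy 2-D data with two obvious groups, matching the K=2 example
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
              [8.0, 8.0], [8.5, 9.0], [7.8, 8.2]])
print(kmeans(X, k=2))
```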
Let us now walk through an example.
Step-1: We pick K=2 clusters, i.e., we want to identify two groups in the dataset and put the points into different clusters.
Step-2: We need to select some random K data points or centroids to form the clusters. These points can be either points from the dataset or any other points. Here we select two points as the K points, which are not part of our dataset.
Now we assign each data point to its closest K-point or centroid. We compute this by applying the mathematics we have studied to calculate the distance between two points, and we draw a median line between the two centroids.
Points on the left side of this line are nearer to the K1 or blue centroid, and points to the right of the line are closer to the yellow centroid. The left cluster therefore gets the blue centroid, whereas the right cluster gets the yellow centroid.
Next, we choose a new centroid for each cluster as its center of gravity and reassign each data point to the new centroids. For this, we repeat the same process of drawing a median line.
After redrawing the line, one yellow point lies on the left side of the line and two blue points lie to the right of the line, so these three points are assigned to the new centroids.
As reassignment has taken place, we again go to Step-4, which is finding new centroids or K-points.
We repeat the process of finding the center of gravity of each cluster, which gives us new centroids.
With the new centroids we again draw the median line and reassign the data points.
Now there are no dissimilar data points on either side of the line, which means our model is formed.
As our model is ready, we can remove the assumed centroid markers, and the two final clusters remain.
There are many different ways to find the optimal number of clusters, but here we discuss the most widely used one, the Elbow method.
The Elbow method is one of the most popular ways to find the optimal number of clusters.
WCSS stands for Within Cluster Sum of Squares, which defines the total variations within a cluster.
The formula to calculate the value of WCSS (for 3 clusters) is given below:
WCSS = Σ(Pi in Cluster1) distance(Pi, C1)² + Σ(Pi in Cluster2) distance(Pi, C2)² + Σ(Pi in Cluster3) distance(Pi, C3)²
where C1, C2, C3 are the centroids of the three clusters.
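A common way to apply the Elbow method in practice is sketched below, assuming scikit-learn is available; KMeans exposes the WCSS of a fitted model as its inertia_ attribute, and the three-blob toy data are an illustrative assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy data: three well-separated blobs, so the elbow should appear near k = 3
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)   # inertia_ is the within-cluster sum of squares (WCSS)

for k, w in zip(range(1, 11), wcss):
    print(k, round(w, 1))
# Plotting k against WCSS and picking the "elbow" (the sharp bend) gives the chosen K.
```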
Association rule
Association rule learning is an unsupervised learning technique that checks for the dependency of one data item on another data item and maps the items accordingly, so that the results can be used more profitably.
Association rule learning is an important approach in machine learning and is employed in market basket analysis, web usage mining, continuous production, etc. In market basket analysis, it is used by several big retailers to find the relations between items. Two popular algorithms for association rule learning are:
1. Apriori algorithm
2. Eclat algorithm
Association rule learning works on the concept of If-Then statements, such as: if A, then B.
To measure the associations between thousands of data items, there are several metrics.
Support
Confidence
Lift
Applications of Machine Learning:
Learning Associations:
Support count (σ): the frequency of occurrence of an itemset, e.g., σ({Milk, Diaper, Beer}) = 2
Association Rule: an implication expression of the form X -> Y, where X and Y are any two itemsets.
Definition of Support:
Support is the frequency of X, or how frequently an itemset appears in the dataset:
Support(X -> Y) = (number of transactions containing both X and Y) / (total number of transactions)
In the worked example, Support = 2/5 = 0.4.
Definition of Confidence:
Confidence indicates how often the items X and Y occur together, given that the occurrence of X is already known. It is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X:
Confidence(X -> Y) = Support(X, Y) / Support(X)
In the worked example, Confidence = 0.4 / 0.6 ≈ 0.67.
Definition of Lift (l): The lift of the rule X -> Y is the confidence of the rule divided by the expected confidence, assuming that the itemsets X and Y are independent of each other. The expected confidence in that case is simply the support of Y.
If Lift = 1, X and Y are independent; if Lift > 1, they occur together more often than expected; if Lift < 1, they appear together less often than expected. Greater lift values indicate a stronger association.
Lift(l) = Support(X, Y) / (Support(X) * Support(Y))
In the worked example, Lift = 0.4 / (0.6 * 0.6) ≈ 1.11.
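These three metrics can be reproduced in a few lines of Python. The five-transaction basket data below are an assumption chosen to be consistent with the numbers used above (support count 2, support 0.4, confidence ≈ 0.67, lift ≈ 1.11); they are not given in the notes.

```python
# Assumed example transactions (market baskets), not given in the notes
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

X, Y = {"Milk", "Diaper"}, {"Beer"}                 # rule X -> Y
print(support(X | Y))                               # Support    = 2/5 = 0.4
print(support(X | Y) / support(X))                  # Confidence = 0.4/0.6       ~ 0.67
print(support(X | Y) / (support(X) * support(Y)))   # Lift       = 0.4/(0.6*0.6) ~ 1.11
```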
Market Basket Analysis: This is one of the most popular examples and applications of association rule mining. The technique is commonly used by big retailers to determine the associations between items; by discovering such associations, retailers design marketing strategies based on which products are frequently bought together.
Protein Sequence: Association rules help in determining the synthesis of artificial proteins.
Web Usage Mining: Web usage mining is the extraction of various kinds of interesting data that are readily available and accessible in the ocean of huge web pages on the Internet.
Hierarchical clustering
The hierarchical clustering methods are used to group the data into a hierarchy or tree-like structure.
For example, consider the employee data of an organization: the employees can first be grouped according to their departments, and then within each department they can be grouped according to their roles, such as manager, developer, and tester.
This creates a hierarchical structure of the employee data and eases visualization and analysis.
Similarly, a data set may have an underlying hierarchical structure that we want to discover, and we can use the hierarchical clustering methods to achieve that.
There are two main hierarchical clustering methods: Agglomerative clustering and Divisive
clustering.
Agglomerative clustering: a bottom-up technique which starts with individual objects as separate clusters and then merges them iteratively into larger clusters.
Divisive clustering: starts with one cluster with all given objects and then splits it iteratively to form
smaller clusters.
Agglomerative Clustering:
It starts with each object forming its own cluster and then iteratively merges the clusters according to their similarity, forming larger clusters.
It terminates either when a certain clustering condition imposed by the user is achieved, or when all objects have been merged into a single cluster.
Divisive Clustering:
The starting point is one large cluster containing all objects, which is then split recursively to form smaller clusters.
It terminates when the user-defined condition is achieved or when the final clusters contain only one object each.
The result of hierarchical clustering is usually drawn as a dendrogram, a tree in which individual objects are represented by leaf nodes and clusters are represented by internal nodes, with the root node representing the cluster of all objects.
Hierarchical Clustering
Agglomerative (bottom-up) clustering: It builds the dendrogram (tree) from the bottom level by repeatedly merging the closest pair of clusters, and stops when all the data points have been merged into a single cluster (i.e., the root cluster).
Divisive (top-down) clustering: It starts with all data points in one cluster, the root. It splits the root into a set of child clusters, and each child cluster is recursively divided further. It stops when only singleton clusters of individual data points remain, i.e., each cluster contains only a single point.
Dendrogram
Each level of the tree represents a partition of the input data into several (nested) clusters or groups.
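As a small illustration of how a dendrogram is built and read in practice, the sketch below uses SciPy's agglomerative linkage; the toy data, the single-link choice, and the no_plot option are illustrative assumptions, not part of the notes.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
# Toy 2-D data: two loose groups of five points each
X = np.vstack([rng.normal(0, 1, size=(5, 2)), rng.normal(6, 1, size=(5, 2))])

# Agglomerative (bottom-up) clustering; 'single' means single-link merges
Z = linkage(X, method="single")
print(Z)   # each row: the two clusters merged, the merge distance, the new cluster size

tree = dendrogram(Z, no_plot=True)   # compute the tree layout without drawing it
print(tree["ivl"])                   # leaf (data point) order along the dendrogram
# Cutting the dendrogram at a given height yields the (nested) partition at that level.
```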
Hierarchical clustering
The basic agglomerative procedure works on a distance/proximity matrix:
Initialization: start with each point as its own cluster (e.g., C1, C2, C3, C4, C5) and compute the distance/proximity matrix between all clusters.
Intermediate state: at each step, merge the two closest clusters (here C2 and C5) and update the distance matrix.
After merging: the merged cluster C2 U C5 replaces C2 and C5 in the matrix, and its distances to the remaining clusters (C1, C3, C4) must be recomputed before the next merge.
How the "closest pair" of clusters is defined distinguishes the different agglomerative variants:
Single-link: the distance between two clusters is the distance between their two closest points. It can result in straggly (long and thin) clusters due to the chaining effect.
Centroid: the distance between two clusters is the distance between their centroids.
Average-link: the distance between two clusters is the average of all pairwise distances between points of the two clusters.
To perform single-link clustering on a small example, we first create a distance matrix consisting of the distance between each pair of points.
For convenience, since the matrix is symmetric, we consider only its lower-triangular values, as shown below; single-link then repeatedly merges the pair of clusters with the minimum distance.
The minimum value in the matrix is 0.11, for the pair (p3, p6), so the first cluster formed is (p3, p6). We then update the distances from this cluster to the remaining points using the single-link (minimum) rule:
dist((p3,p6), p2) = min[dist(p3,p2), dist(p6,p2)] = min(0.15, 0.25) = 0.15
dist((p3,p6), p4) = min[dist(p3,p4), dist(p6,p4)] = min(0.15, 0.22) = 0.15
dist((p3,p6), p5) = min[dist(p3,p5), dist(p6,p5)] = min(0.28, 0.39) = 0.28
Updated distance matrix:
In the updated matrix, the minimum value is 0.14, for the pair (p2, p5), so the next cluster formed is (p2, p5). Again we update the distances:
dist((p2,p5), (p3,p6)) = min[dist(p2,(p3,p6)), dist(p5,(p3,p6))] = min(0.15, 0.28) = 0.15
dist((p2,p5), p4) = min[dist(p2,p4), dist(p5,p4)] = min(0.20, 0.29) = 0.20
In the resulting matrix, two entries share the same minimum value 0.15; we take the first of them, the pair ((p2,p5), (p3,p6)), and merge it into the cluster (p2, p5, p3, p6). Its distances to the remaining points are:
dist((p2,p5,p3,p6), p1) = min[dist((p2,p5),p1), dist((p3,p6),p1)] = min(0.23, 0.22) = 0.22
The same formula is used for p4:
dist((p2,p5,p3,p6), p4) = min[dist((p2,p5),p4), dist((p3,p6),p4)] = min(0.20, 0.15) = 0.15
The minimum value is now 0.15, between the cluster (p2,p5,p3,p6) and p4, so the fourth merge forms the cluster (p2, p5, p3, p6, p4). The only remaining step is to merge p1 into this cluster, which completes the single-link hierarchy.
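The merge sequence above can be cross-checked with SciPy's single-link implementation. The full symmetric distance matrix below is an assumed reconstruction: the entries quoted in the walkthrough are used where available, and the remaining p1 distances are filled in with plausible values, so treat it as a sketch rather than the notes' exact data.

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage

# Assumed full symmetric distance matrix for p1..p6 (indices 0..5). Entries quoted in
# the walkthrough are used where available; the remaining p1 distances are assumed.
D = np.array([
    [0.00, 0.23, 0.22, 0.37, 0.34, 0.23],   # p1
    [0.23, 0.00, 0.15, 0.20, 0.14, 0.25],   # p2
    [0.22, 0.15, 0.00, 0.15, 0.28, 0.11],   # p3
    [0.37, 0.20, 0.15, 0.00, 0.29, 0.22],   # p4
    [0.34, 0.14, 0.28, 0.29, 0.00, 0.39],   # p5
    [0.23, 0.25, 0.11, 0.22, 0.39, 0.00],   # p6
])

# Single-link (MIN) agglomerative clustering on the condensed distance matrix
Z = linkage(squareform(D), method="single")
for i, j, dist, size in Z:
    print(f"merge {int(i)} and {int(j)} at distance {dist:.2f} (cluster size {int(size)})")
# The first merges are (p3, p6) at 0.11 and (p2, p5) at 0.14, as in the walkthrough;
# the tied 0.15 merges that follow may be ordered differently by SciPy.
```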