
UNIT-4: Unsupervised Learning

Introduction to Clustering

Unsupervised vs Supervised Learning

Different types of clustering techniques:

Partitioning Methods

Hierarchical Methods

Density-Based Methods

Grid-Based Methods

Model-Based Clustering Methods

K-means clustering

Apriori algorithm and Association rule

Hierarchical clustering, K-Medoids

Density-based methods: DBSCAN

Overview of Clustering

Introduction to clustering

Definition 1:

Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to one another and dissimilar to the data points in other groups.

In other words, it is a way of collecting objects into groups on the basis of the similarity and dissimilarity between them.

Definition 2:

"A way of grouping the data points into different clusters, consisting of similar data points. The objects with possible similarities remain in a group that has little or no similarity with another group."

What is clustering?

Grouping unlabeled examples is called clustering.

As the examples are unlabeled, clustering relies on unsupervised machine learning.

If the examples were labeled, the grouping task would instead be classification.

Figure 1: Unlabeled examples grouped into three clusters.


Differentiate between Classification and Clustering

In brief: classification assigns labeled examples to predefined classes and is a supervised task, whereas clustering groups unlabeled examples by similarity and is an unsupervised task.


What are the Uses of Clustering?

Common applications for clustering include the following:


market segmentation

social network analysis

search result grouping

medical imaging

image segmentation

anomaly detection

Examples of clustering (grouping):

Group stars by brightness.

Group organisms by genetic information into a taxonomy.

Group documents by topic.

Types of Clustering Methods

The clustering methods are broadly divided into Hard clustering (each data point belongs to exactly one group) and Soft clustering (a data point can belong to more than one group).

Partitioning Clustering

Hierarchical Clustering

Density-Based Clustering

Model-Based Clustering

Grid-Based Methods

Partitioning Clustering

It is a type of clustering that divides the data into non-hierarchical groups.

It is also known as the centroid-based method.

The most common example of partitioning clustering is the K-Means Clustering algorithm.
In this type, the dataset is divided into a set of K groups, where K defines the number of pre-defined groups.

The cluster centers are chosen so that the distance between a data point and its own cluster centroid is minimal compared with its distance to any other cluster centroid.

Working of K-Means Algorithm:

The following stages will help us understand how the K-Means clustering technique works:

Step 1: First, we need to provide the number of clusters, K.

Step 2: Next, choose K data points at random and assign each data point to a cluster.

Step 3: The cluster centroids will now be computed.

Step 4: Iterate the steps below until we find the ideal centroids, i.e., until the assignment of data points to clusters no longer changes.

4.1 The sum of squared distances between data points and centroids is calculated first.

4.2 Each data point is allocated to the cluster whose centroid is closest to it.

4.3 Finally, the centroid of each cluster is recomputed by averaging all of that cluster's data points.

The working of the K-Means algorithm is given below:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points as centroids. (They may be points other than those in the input dataset.)

Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.

Step-4: Calculate the distance between each data point and each centroid, allocate each data point to the cluster whose centroid is closest, and then place a new centroid for each cluster.

Step-5: Repeat the third step, which means reassign each data point to the new closest centroid of its cluster.

Step-6: If any reassignment occurs, then go to Step-4; else go to FINISH.

Step-7: The model is ready.
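A minimal sketch of these steps in Python with NumPy is given below. The function name kmeans, the iteration limit, the random seed, and the small sample dataset are illustrative assumptions, not part of the algorithm itself.

import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Basic K-Means: assign points to the nearest centroid, then recompute centroids."""
    rng = np.random.default_rng(seed)
    # Step 2: choose K data points at random as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assign each point to the cluster whose centroid is closest (Step 4.2)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points (Step 4.3);
        # keep the old centroid if a cluster ends up empty
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop when the centroids (and hence the assignments) no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on a small, assumed 2-D dataset
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)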

Let's understand the above steps by considering the visual plots:

Step-1: Let us pick the number of clusters, K = 2, so the dataset will be grouped into two different clusters.

Step-2: We need to select K random data points or centroids to form the clusters.

These points can be either points from the dataset or any other points.

So, here we are selecting the below two points as K points, which are not part of our dataset.

Consider the image below:


Now we will assign each data point of the scatter plot to its closest K-point or centroid.

We will compute it by calculating the distance between two points using the mathematics we have already studied (e.g., the Euclidean distance).

So, we will draw a median line between the two centroids (the line of points equidistant from both centroids).

Consider the below image:

From the above image, it is clear that the points on the left side of the line are nearer to the K1 (blue) centroid, and the points to the right of the line are closer to the yellow centroid.

Let's color them blue and yellow for clear visualization.

The left cluster has the blue centroid, whereas the right cluster has the yellow centroid.

Next, we repeat the procedure by choosing new centroids.

To choose the new centroids, we will compute the center of gravity (mean) of the points currently assigned to each cluster, and will find the new centroids as below:

Next, we will reassign each data point to the new centroids. For this, we will repeat the same process of drawing a median line. The median line will be as in the below image:

From the above image, we can see that one yellow point is on the left side of the line, and two blue points are to the right of the line. So, these three points will be assigned to their new nearest centroids.

As reassignment has taken place, we will again go to Step-4, which is finding new centroids or K-points.

We will repeat the process by finding the center of gravity of each cluster, so the new centroids will be as shown in the below image:

As we have the new centroids, we will again draw the median line and reassign the data points. So, the image will be:

We can see in the above image that no data points change sides of the line, i.e., no further reassignment occurs, which means our model is formed. Consider the below image:

As our model is ready, we can now remove the assumed centroids, and the two final clusters are as shown in the figure below:

How to choose the value of "K number of clusters" in K-means Clustering?

Choosing the optimal number of clusters is a big task.

There are many different ways to find the optimal number of clusters, but here we discuss the most widely used method for determining the number of clusters, i.e., the value of K.

The method is Elbow Method:

The Elbow method is one of the most popular ways to find the optimal number of clusters.

This method uses the concept of WCSS value.

WCSS stands for Within-Cluster Sum of Squares, which measures the total variation within the clusters.

The formula to calculate the value of WCSS (for 3 clusters) is given below:

WCSS = Σ(Pi in Cluster1) distance(Pi, C1)² + Σ(Pi in Cluster2) distance(Pi, C2)² + Σ(Pi in Cluster3) distance(Pi, C3)²
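As a hedged illustration of the Elbow method, the sketch below computes WCSS for K = 1 to 10 with scikit-learn, whose inertia_ attribute is the within-cluster sum of squares; the random sample data and the chosen range of K values are arbitrary.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.RandomState(42).rand(200, 2)  # placeholder data, for illustration only

wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss.append(km.inertia_)  # inertia_ is the within-cluster sum of squared distances (WCSS)

# Plot K against WCSS and look for the "elbow" where the curve stops dropping sharply
plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS")
plt.show()

The value of K at the elbow of this curve is taken as the optimal number of clusters.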

Association rule
Association rule learning is an unsupervised learning technique that checks for the dependency of one data item on another data item and maps them accordingly so that the result can be exploited more profitably.

Association rule learning is an important approach in machine learning, and it is employed in Market Basket analysis, Web usage mining, continuous production, etc. In market basket analysis, it is used by several big retailers to find the relations between items.

Association rule learning can be divided into three types of algorithms:

1. Apriori algorithm

2. Eclat algorithm

3. F-P Growth algorithm

Associations: Market Basket Analysis

How does Association Rule Learning work?

Association rule learning works on the concept of if-then rules, such as: if A, then B.

To measure the associations between thousands of data items, there are several metrics.

These metrics are given below:

Support

Confidence

Lift

Support Count (σ): the frequency of occurrence of an itemset. Here σ({Milk, Diaper, Beer}) = 2 and σ({Bread, Diaper, Beer}) = 2.

Association Rule: an implication expression of the form X -> Y, where X and Y are any two itemsets.

Example: {Milk, Diaper} -> {Beer}

Definition of Support:

Support is how frequently an itemset appears in the dataset, i.e., the fraction of transactions that contain the itemset.

From the above table:

Support(s) = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4

where |T| is the total number of transactions.

Definition of Confidence:

Confidence measures how often the items X and Y occur together in the dataset, given that X has already occurred.

It is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X.

From the example {Milk, Diaper} -> {Beer}:

Confidence(c) = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67

Definition of Lift (l): The lift of the rule X -> Y is the confidence of the rule divided by the expected confidence, assuming that the itemsets X and Y are independent of each other. The expected confidence is simply the support (frequency) of {Y}.

If Lift(l) = 1: X and Y appear together about as often as expected (they are independent).

If Lift(l) > 1: they appear together more often than expected.

If Lift(l) < 1: they appear together less often than expected. Greater lift values indicate a stronger association.

Lift(l) = Support(X, Y) / (Support(X) * Support(Y))

l = Supp({Milk, Diaper, Beer}) / (Supp({Milk, Diaper}) * Supp({Beer})) = 0.4 / (0.6 * 0.6) ≈ 1.11
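The small Python sketch below recomputes these three metrics. The five-transaction dataset is an assumed example (the table referenced above is not reproduced here), chosen only so that the counts match the numbers above (support 0.4, confidence ≈ 0.67, lift ≈ 1.11).

# Assumed five-transaction dataset, chosen so the counts match the worked numbers above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y):
    """support(X and Y) / support(X), for the rule X -> Y."""
    return support(set(X) | set(Y)) / support(X)

def lift(X, Y):
    """support(X and Y) / (support(X) * support(Y))."""
    return support(set(X) | set(Y)) / (support(X) * support(Y))

print(support({"Milk", "Diaper", "Beer"}))       # 0.4
print(confidence({"Milk", "Diaper"}, {"Beer"}))  # 0.666...
print(lift({"Milk", "Diaper"}, {"Beer"}))        # 1.111...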

Applications of Association Rule Learning

Below are some popular applications of association rule learning:

Market Basket Analysis: This is one of the most popular examples and applications of association rule mining. The technique is commonly used by big retailers to determine the associations between items. By discovering such associations, retailers can design marketing strategies based on which items are frequently purchased together by customers.

Medical Diagnosis: Association rules help in identifying the probability of a particular illness given a combination of observed symptoms or conditions, which supports diagnosis and treatment.

Protein Sequence: Association rules help in determining the synthesis of artificial proteins.

Web usage mining: Web usage mining is the extraction of various kinds of interesting patterns from the huge amount of web data that is readily available and accessible on the Internet.


Hierarchical clustering

Hierarchical clustering methods are used to group the data into a hierarchy or tree-like structure.

For example, in a machine learning problem of organizing employees of a university in different


departments, first the employees are grouped under the different departments in the university, and

then within each department, the employees can be grouped according to their roles such as

professors, assistant professors, supervisors, lab assistants, etc.

This creates a hierarchical structure of the employee data and eases visualization and analysis.

Similarly, there may be a data set with an underlying hierarchical structure that we want to discover, and we can use hierarchical clustering methods to achieve that.

Types of hierarchical clustering

There are two main hierarchical clustering methods: Agglomerative clustering and Divisive

clustering.

Agglomerative clustering: is a bottom-up technique which starts with individual objects as clusters

and then iteratively merges them to form larger clusters.

Divisive clustering: starts with one cluster with all given objects and then splits it iteratively to form

smaller clusters.

Agglomerative Clustering:

It uses a bottom-up approach.

It starts with each object forming its own cluster and then iteratively merges the clusters according to

their similarity to form large clusters.

It terminates either when a certain clustering condition imposed by the user is achieved or when all clusters merge into a single cluster.


Divisive Clustering:

Divisive clustering is the opposite of Agglomerative clustering.

It uses the top-down approach.

The starting point is one large cluster containing all the objects, which is then split recursively to form smaller and smaller clusters.

It terminates when the user-defined condition is achieved or when each of the final clusters contains only one object.
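As a brief illustration of the agglomerative (bottom-up) approach, here is a minimal scikit-learn sketch; the sample points, the choice of two clusters, and the single-linkage setting are illustrative assumptions.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Illustrative 2-D points (assumed, not taken from the text)
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)

# Bottom-up clustering: start from single-point clusters and keep merging the
# nearest pair until the requested number of clusters (here 2) remains.
model = AgglomerativeClustering(n_clusters=2, linkage="single")
labels = model.fit_predict(X)
print(labels)  # cluster index assigned to each point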

A dendrogram, which is a tree-like structure, is used to represent hierarchical clustering.

Individual objects are represented by leaf nodes, clusters are represented by internal nodes, and the root node represents the single cluster containing all objects. A representation of a dendrogram is shown in this figure:

Hierarchical Clustering

Produce a nested sequence of clusters.

One approach: recursive application of a partitional clustering algorithm.

Types of hierarchical clustering

Agglomerative (bottom-up) clustering: builds the dendrogram (tree) from the bottom level by merging the most similar (or nearest) pair of clusters; it stops when all the data points have been merged into a single cluster (i.e., the root cluster).

Divisive (top-down) clustering: starts with all data points in one cluster, the root. It splits the root into a set of child clusters, and each child cluster is recursively divided further; it stops when only singleton clusters of individual data points remain, i.e., each cluster contains only a single point.

Dendrogram: Hierarchical Clustering

Given an input set S, the nodes of the dendrogram represent subsets of S.

Features of the tree:

The root is the whole input set S.

The leaves are the individual elements of S.

The internal nodes are defined as the union of their children.

Each level of the tree represents a partition of the input data into several (nested) clusters or groups.

The dendrogram may be cut at any level: each connected component then forms a cluster.

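A hedged sketch of building and cutting such a dendrogram with SciPy is given below; the six 2-D points and the cut height of 2.0 are placeholder choices for illustration.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Placeholder 2-D points; any small dataset would do
X = np.array([[1, 1], [2, 3], [3, 5], [4, 5], [6, 6], [7, 5]], dtype=float)

# Agglomerative (bottom-up) clustering with single linkage; Z encodes the tree
Z = linkage(X, method="single")

# Cutting the tree at a chosen height turns each connected component into a cluster
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)

# Draw the dendrogram; each merge appears at the height (distance) at which it occurred
dendrogram(Z)
plt.show()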

Hierarchical clustering

Initialization

Each individual point is taken as a cluster


Construct distance/proximity matrix

Intermediate State

After some merging steps, we have some clusters (say C1, C2, C3, C4, C5) and a correspondingly updated distance/proximity matrix.

Merge the two closest clusters (here C2 and C5) and update the distance matrix.

After Merging

Update the distance matrix: the rows and columns for C2 and C5 are replaced by a single row and column for the merged cluster C2 U C5, and its distances to C1, C3, and C4 are recomputed.

Closest Pair

A few ways to measure the distance between two clusters:

Single-link: the similarity of the most similar pair of points, one from each cluster.

Complete-link: the similarity of the least similar pair of points, one from each cluster.

Centroid: clusters whose centroids (centers of gravity) are the most similar.

Average-link: the average similarity (e.g., average cosine similarity) over all pairs of elements, one from each cluster.
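The sketch below shows how these cluster-distance measures can be computed for two small clusters; the points are placeholders, and Euclidean distance is used throughout (including for average-link, instead of cosine similarity) to keep the example simple.

import numpy as np

A = np.array([[1.0, 1.0], [2.0, 1.5]])               # cluster A (placeholder points)
B = np.array([[5.0, 4.0], [6.0, 5.0], [5.5, 4.5]])   # cluster B (placeholder points)

# All pairwise Euclidean distances between points of A and points of B
pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

single_link   = pairwise.min()   # distance of the closest (most similar) pair
complete_link = pairwise.max()   # distance of the farthest (least similar) pair
average_link  = pairwise.mean()  # average over all cross-cluster pairs
centroid_dist = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))  # distance between centroids

print(single_link, complete_link, average_link, centroid_dist)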


Single Link Example

It can result in straggly (long and thin) clusters due to the chaining effect.

Agglomerative Clustering Example


Consider the points A(1, 1), B(2, 3), C(3, 5), D(4, 5), E(6, 6), and F(7, 5) and try to cluster them.

To perform clustering, we first create a distance matrix consisting of the distances between every pair of points in the dataset. The distance matrix looks as follows.

Consider the following sample data of six points p1 to p6:

The distance matrix is:

For convenience, we will consider only the lower-triangular values of the matrix, as shown below; since the distance matrix is symmetric, these values contain all the pairwise distances we need.

Step 1: Calculate the Euclidean distances and create the distance matrix.

Step 2: Find the minimum value in the distance matrix.

The minimum value is 0.11, for the pair (p3, p6), so we form the cluster (p3, p6).

Step 3: Recalculate (update) the distance matrix for the cluster (p3, p6) using the single-link rule. The same formula is applied to p2, p4, and p5:

dist((p3, p6), p2) = min[dist(p3, p2), dist(p6, p2)] = min[0.15, 0.25] = 0.15

dist((p3, p6), p4) = min[dist(p3, p4), dist(p6, p4)] = min[0.15, 0.22] = 0.15

dist((p3, p6), p5) = min[dist(p3, p5), dist(p6, p5)] = min[0.28, 0.39] = 0.28
Updated distance matrix:

Step 4: Repeat steps 2 and 3.

The minimum value is 0.14, for the pair (p2, p5), so we form the cluster (p2, p5).

Recalculate (update) the distance matrix for the cluster (p2, p5). The same formula is applied to (p3, p6) and p4:

dist((p2, p5), (p3, p6)) = min[dist(p2, (p3, p6)), dist(p5, (p3, p6))] = min[0.15, 0.28] = 0.15

dist((p2, p5), p4) = min[dist(p2, p4), dist(p5, p4)] = min[0.20, 0.29] = 0.20

Updated distance matrix:

Step 5: Repeat steps 2 and 3.

The minimum value is 0.15. Two entries share this minimum, so the first one is chosen, merging (p2, p5) with (p3, p6) into the cluster (p2, p5, p3, p6).

Recalculate (update) the distance matrix for the cluster (p2, p5, p3, p6):

dist((p2, p5, p3, p6), p1) = min[dist((p2, p5), p1), dist((p3, p6), p1)] = min[0.23, 0.22] = 0.22

The same formula is applied to p4:

dist((p2, p5, p3, p6), p4) = min[dist((p2, p5), p4), dist((p3, p6), p4)] = min[0.20, 0.15] = 0.15

Updated distance matrix:

Step 6: Repeat steps 2 and 3.

The minimum value is 0.15, for the pair ((p2, p5, p3, p6), p4), giving our 4th cluster (p2, p5, p3, p6, p4).

Recalculate (update) the distance matrix for the cluster (p2, p5, p3, p6, p4).

Updated distance matrix:

Step 7: Repeat steps 2 and 3.

The minimum value is 0.22, for the pair ((p2, p5, p3, p6, p4), p1), giving our 5th cluster (p2, p5, p3, p6, p4, p1).

Recalculate (update) the distance matrix for the cluster (p2, p5, p3, p6, p4, p1). In this step only one value remains, so the final merge happens by default.

Step 8: Draw the dendrogram.
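To tie the worked example together, here is a minimal from-scratch single-linkage routine in Python. The distances quoted in the steps above are used directly; the p1 row (0.23, 0.22, 0.37, 0.24, 0.23) is an assumption, chosen only to be consistent with the 0.23 and 0.22 that appear in Steps 5 and 7. The printed merges follow the same sequence of distances as the hand calculation: 0.11, 0.14, 0.15, 0.15, 0.22.

import numpy as np

# Symmetric distance matrix for p1..p6. The p1 row is an ASSUMPTION (see above);
# all other entries are the values used in the worked steps.
labels = ["p1", "p2", "p3", "p4", "p5", "p6"]
D = np.array([
    [0.00, 0.23, 0.22, 0.37, 0.24, 0.23],
    [0.23, 0.00, 0.15, 0.20, 0.14, 0.25],
    [0.22, 0.15, 0.00, 0.15, 0.28, 0.11],
    [0.37, 0.20, 0.15, 0.00, 0.29, 0.22],
    [0.24, 0.14, 0.28, 0.29, 0.00, 0.39],
    [0.23, 0.25, 0.11, 0.22, 0.39, 0.00],
])

clusters = [[name] for name in labels]
dist = D.copy()

while len(clusters) > 1:
    # Find the pair of clusters with the minimum distance (Step 2)
    n = len(clusters)
    iu = np.triu_indices(n, k=1)
    k = np.argmin(dist[iu])
    i, j = iu[0][k], iu[1][k]
    print(f"merge {clusters[i]} + {clusters[j]} at distance {dist[i, j]:.2f}")
    # Single-link update (Step 3): the distance from the merged cluster to any
    # other cluster is the minimum of the distances from its two parts,
    # i.e. the min[dist(...)] formula used in the steps above
    merged_row = np.minimum(dist[i], dist[j])
    clusters[i] = clusters[i] + clusters[j]
    dist[i, :] = merged_row
    dist[:, i] = merged_row
    # Remove the row, column, and cluster entry for j
    dist = np.delete(np.delete(dist, j, axis=0), j, axis=1)
    del clusters[j]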
