Unsupervised Learning 2024-PPG
By
Prof. (Dr.) Premanand P Ghadekar
Unsupervised Learning
❖ K-Means Clustering
❖ C-Means Clustering
❖ Association Rule Mining
Unsupervised Learning
Unsupervised Learning is a type of Machine Learning algorithm used to draw
inferences from datasets consisting of input data without labelled responses.
Unsupervised Learning-Process Flow
Training data is a collection of information without any labels.
What is Clustering
Clustering is the process of dividing a dataset into groups consisting
of similar data points.
o It groups objects based on the information found in the data
describing the objects or their relationships.
o Points in the same group are as similar as possible.
o Points in different groups are as dissimilar as possible.
Why is Clustering used
The goal of clustering is to determine the intrinsic grouping in a set of unlabeled
data.
Sometimes partitioning itself is the goal.
Where it is used
Clustering Example
Image segmentation
Goal: Break up the image into meaningful or perceptually similar regions
Types of Clustering
K-Means Clustering- Division of objects into clusters such that each object is in exactly
one cluster, not several clusters.
Types of Clustering
C-Means Clustering- Division of objects into clusters such that each object can belong to
multiple clusters.
Types of Clustering
o Exclusive Clustering
o Overlapping Clustering
o Hierarchical Clustering- Agglomerative and Divisive
K-Means Algorithm Working
The distance measure determines the similarity between two elements, and it influences
the shape of the clusters. Common measures: Euclidean Distance, Manhattan Distance,
Squared Euclidean Distance, and Cosine Distance.
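As an illustration, here is a minimal NumPy sketch of these four measures (the two
sample points are hypothetical):

import numpy as np

def euclidean(a, b):
    # Straight-line distance between two points
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # Sum of absolute coordinate differences
    return np.sum(np.abs(a - b))

def squared_euclidean(a, b):
    # Euclidean distance without the square root
    return np.sum((a - b) ** 2)

def cosine_distance(a, b):
    # One minus the cosine of the angle between the vectors
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a, b = np.array([185.0, 72.0]), np.array([170.0, 56.0])
print(euclidean(a, b), manhattan(a, b), squared_euclidean(a, b), cosine_distance(a, b))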
K-Means Clustering Steps
1. First, we need to decide the number of clusters to be made (an initial guess).
2. Then we provide the centroids of all the clusters (also initially guessed).
3. The algorithm calculates the Euclidean distance of each point from every centroid and
assigns the point to the closest cluster.
K-Means Clustering Algorithm
❑ Divide the points into three clusters, where K=3.
K-Means Clustering Algorithm
Step-1: Select the number of Clusters to be identified, i.e. select a value of k=3 in
this case.
Step-2: Randomly select three distinct data points as the initial centroids.
K-Means Clustering Algorithm
Step-3: Measure the distance between the 1st point and the three selected cluster centroids.
K-Means Clustering Algorithm
Step-4: Assign the 1st point to the nearest cluster.
K-Means Clustering Algorithm
Step-5: Calculate the mean value, including the new point, for the red cluster.
K-Means Clustering Algorithm
Step-6: To which cluster does point 2 belong, and how do we find out?
Repeat the same procedure, but measure the distance to the updated red mean,
then calculate the mean value of the cluster again, including the new point.
K-Means Clustering Algorithm
Step-7:
o Measure the distance
o Assign the point to the nearest Cluster
o Calculate the Cluster mean using the new point.
K-Means Clustering Algorithm
According to the K-Means algorithm, the procedure iterates over and over until the
data points within each cluster stop changing.
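A compact sketch of this loop in plain NumPy (illustrative only, not a production
implementation) might look as follows:

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 1-2: choose k and pick k distinct data points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Steps 3-4: assign every point to its nearest centroid (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 5 onward: recompute each centroid as the mean of its points
        new_centroids = np.array([X[labels == i].mean(axis=0)
                                  if np.any(labels == i) else centroids[i]
                                  for i in range(k)])
        # Stop once the cluster assignments (hence centroids) no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids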
K-Means Clustering Example
Sr No  Height  Weight
1      185     72

Initial cluster centroids: K1 = (185, 72) and K2 = (170, 56).
Points 2 and 3 belong to cluster K2.
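For instance, the Euclidean distances from the first point (185, 72) to the two
initial centroids work out to

d(P_1, K_1) = \sqrt{(185-185)^2 + (72-72)^2} = 0
d(P_1, K_2) = \sqrt{(185-170)^2 + (72-56)^2} = \sqrt{225 + 256} = \sqrt{481} \approx 21.9

so the first point is assigned to cluster K1.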
How to decide the number of Clusters
The Elbow Method
First of all, compute the Sum of Squared Errors (SSE) for a range of values of k (for
example 2, 4, 6, 8, etc.). The SSE is defined as the sum of the squared distances between
each member of a cluster and its centroid. Mathematically:

SSE = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}(x, c_i)^2

The idea of the elbow method is to choose the k after which the decrease in SSE becomes
almost constant.
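A short scikit-learn sketch of this procedure (the data here is a random placeholder;
swap in your own feature matrix):

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)               # placeholder data
sse = []
ks = range(2, 9)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse.append(km.inertia_)              # inertia_ is exactly the SSE defined above
for k, s in zip(ks, sse):
    print(k, s)
# Pick the k at the "elbow": beyond it, the SSE decrease is almost constant.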
Pros and Cons of K-means Clustering
Pros
o Simple and understandable.
o Items are automatically assigned to clusters.
Cons
o Must define the number of clusters in advance.
o All items are forced into clusters.
o Unable to handle noisy data and outliers.
Applications of K-Means Clustering
o Academic Performance
o Diagnostic System
o Search Engine
o Wireless Sensor Network
Fuzzy C-means Clustering
Pros and Cons of C-means Clustering
Pros
o Allows a data point to be in multiple clusters.
o A more natural representation of the behavior of genes.
o Genes usually are involved in multiple functions.
Cons
o Need to define c, the number of clusters.
o Need to determine membership cut-off value.
o Clusters are sensitive to initial assignment of centroids.
o Fuzzy C-Means is not a deterministic algorithm.
K-Means versus Fuzzy C-Means
Fuzzy C-means Clustering
Fuzzy Sets
Steps in Fuzzy C-Means
The process flow of fuzzy C-Means
The Fuzzy C-Means Example
o In fuzzy clustering, each data point can have membership to multiple clusters.
o By relaxing the definition of membership coefficients from strictly 1 or 0, these
values can take any value between 0 and 1.
o The following image shows the data set from the previous clustering, but now
fuzzy c-means clustering is applied.
o First, a new threshold value defining two clusters may be generated.
o Next, new membership coefficients for each data point are generated based on the
cluster centroids, as well as the distance from each cluster centroid.
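A minimal sketch of how such membership coefficients are typically computed for one
data point (the fuzzifier m is an assumed parameter; m = 2 is a common default):

import numpy as np

def fuzzy_memberships(x, centroids, m=2.0):
    # Distance from the point to every cluster centroid
    d = np.linalg.norm(centroids - x, axis=1)
    d = np.fmax(d, 1e-12)                # guard against division by zero
    inv = d ** (-2.0 / (m - 1.0))        # standard fuzzy c-means weighting
    return inv / inv.sum()               # coefficients lie in [0, 1] and sum to 1

print(fuzzy_memberships(np.array([1.0, 2.0]),
                        np.array([[0.0, 0.0], [3.0, 3.0]])))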
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
Evaluation Metrics for Clusters
Some popular measures used to evaluate C-Means clusters:
1. Homogeneity analysis of the clusters formed.
2. The clusters formed using Fuzzy C-Means need to be homogeneous and well
separated from other clusters.
3. Coefficient of variance analysis for each cluster.
4. Pearson correlation can be used for validating the quality of clusters.
5. If we have ground-truth cluster values, precision, recall, and F-score can also
be considered.
6. The Elbow Method and Silhouette score are also statistical measures for
evaluating clusters, though they are more often used to pre-select the number of
clusters (see the sketch below).
7. Entropy-based methods.
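For example, a short scikit-learn sketch of the Silhouette check (random placeholder
data; swap in your own feature matrix):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.rand(300, 2)               # placeholder data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(silhouette_score(X, labels))       # closer to 1 means better-separated clusters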
Hierarchical Clustering
Hierarchical Clustering is an alternative approach which builds a hierarchy from
the bottom up, and doesn’t require us to specify the number of clusters beforehand.
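A minimal SciPy sketch of the bottom-up (agglomerative) variant; note that the number
of clusters is chosen only when the finished hierarchy is cut:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)                        # placeholder data
Z = linkage(X, method="ward")                    # build the merge tree bottom-up
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 clusters afterwards
print(labels)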
Pros and Cons : Hierarchical Clustering
Pros
o No assumption of a particular number of clusters.
o May correspond to meaningful taxonomies.
Cons
o Once a decision is made to combine two clusters, it cannot be undone.
o Too slow for large datasets.
Why to use Market Basket Analysis
Market Basket Analysis
In order to understand why Market Basket Analysis is important, we need to understand
the objective of MBA.
The primary objectives of Market Basket Analysis are to:
o Improve the effectiveness of marketing, and
o Improve sales tactics using customer data collected during sales transactions.
Market Basket Analysis is a modelling technique based upon the theory that if you buy a
certain group of items, you are more (or less) likely to buy another group of items.
What Questions Does Market Basket Analysis Answer?
o What products are customers really interested in?
o What products are sold well and which products can be combined with them?
o Which combinations are working well in terms of products?
o Other random observations or hidden patterns, if any?
What is Market Basket Analysis ?
o Market Basket Analysis (MBA) is a Data Mining technique or algorithm used to
find association rules in the available data.
o The mathematical concepts behind this algorithm are simple:
o Support
o Confidence
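These two quantities are defined as follows (the same definitions are used in the
worked examples below):

\mathrm{Support}(A) = \frac{n(A)}{n}, \qquad
\mathrm{Confidence}(A \Rightarrow B) = \frac{\mathrm{Support}(A \cup B)}{\mathrm{Support}(A)}

where n(A) is the number of transactions containing itemset A and n is the total
number of transactions.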
Example
Given below is the data from the transaction table of a shop/supermarket.
Association Rule Mining
Association Rule Mining is a technique that shows how items are associated
with each other.
Examples
1. Customers who purchase bread have a 60% likelihood of also purchasing
jam.
2. Customers who purchase laptops are more likely to purchase laptop bags.
Association Rule Mining
Example of an Association Rule:
A → B
This means that if a person buys item A, then he will also buy item B.
Apriori Algorithm-Example
Example- For the following Given Transaction Data-Set, Generate Rules using
Apriori Algorithm. Consider Values as Support=50%, and Confidence=75%.
Step 1: Build the frequent 1-item set, where Support(Bread) = n(Bread)/n.
Remove Egg and Yogurt from the list, as their support values are less than 50%.
Apriori Algorithm-Example
Make the 2-item candidate sets and write their frequencies.

Item Pair        Frequency   Support
Bread, Cheese    2           2/5 = 40%
Bread, Juice     3           3/5 = 60%
Bread, Milk      2           2/5 = 40%
Cheese, Juice    3           3/5 = 60%
Cheese, Milk     1           1/5 = 20%
Juice, Milk      2           2/5 = 40%

With Support = 50% and Confidence = 75%, the frequent pairs are:
I. (Bread, Juice)
II. (Cheese, Juice)

Rules, with Confidence(A → B) = Support(A ∪ B) / Support(A):
1. Bread → Juice
2. Juice → Bread
3. Cheese → Juice
4. Juice → Cheese

As the confidence of all the rules equals 75%, all the rules are good.
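The same support and confidence arithmetic can be scripted. The five transactions
below are hypothetical (only the item names come from the example), so the printed
numbers illustrate the method rather than reproduce the slide exactly:

from itertools import combinations

transactions = [                 # assumed data, n = 5 transactions
    {"Bread", "Juice", "Cheese"},
    {"Bread", "Juice", "Milk"},
    {"Cheese", "Juice", "Milk"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk", "Egg"},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions that contain every item in the itemset
    return sum(itemset <= t for t in transactions) / n

# 2-item candidate sets, kept only if they meet the 50% minimum support
items = sorted(set().union(*transactions))
for pair in combinations(items, 2):
    s = support(set(pair))
    if s >= 0.5:
        print(pair, f"support = {s:.0%}")

def confidence(A, B):
    # Confidence(A -> B) = Support(A U B) / Support(A)
    return support(A | B) / support(A)

print("Bread -> Juice:", f"{confidence({'Bread'}, {'Juice'}):.0%}")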
Apriori Algorithm-Example
Example- For the following Given Transaction Data-Set, Generate Rules using
Apriori Algorithm. Consider Values as Support=50%, and Confidence=70%.
Step I: Find the frequency of the individual items in the data set.
Step II: Find the support/frequency of pairs of items.
Step III: Find the support/frequency of 3-item sets.
Apriori Algorithm-Example
Consider Values as Support=50%, and Confidence=70%.
Define Rules and Calculate Confidence Values.
Step IV: Define the rules and calculate their confidence values, where
Confidence(A → B) = S(A ∪ B) / S(A). For example, (2 ∧ 3) → 5 has confidence 2/2 = 100%.

Rule             Confidence
1. (2 ∧ 3) → 5   2/2 = 100%
2. (3 ∧ 5) → 2   2/2 = 100%
3. (2 ∧ 5) → 3   2/3 = 66%
4. 2 → (3 ∧ 5)   2/3 = 66%
5. 5 → (2 ∧ 3)   2/3 = 66%
6. 3 → (2 ∧ 5)   2/3 = 66%

With the 70% confidence threshold, only rules 1 and 2 are accepted.
Apriori Algorithm
Apriori Algorithm-First Iteration
Apriori Algorithm-Second Iteration
Apriori Algorithm-Third Iteration
Apriori Algorithm-Pruning
Apriori Algorithm-Fourth Iteration
Apriori Algorithm-Subset Creation
Apriori Algorithm-Applying Rules