Unsupervised Machine Learning
K ADISESHA
Unsupervised Learning and Association Mining
Contents:
Introduction
Clustering
Clustering Types
K-Means algorithm
Association Mining
FP-growth
Introduction
Clustering Algorithms:
Clustering in unsupervised machine learning is the process of grouping unlabeled data
into clusters based on their similarities.
➢ The goal of clustering is to identify patterns and relationships in the data without any
prior knowledge of the data’s meaning.
➢ Some common clustering algorithms:
❖ Hierarchical Clustering: Creates clusters by building a tree step-by-step, either merging or splitting
groups.
❖ K-means Clustering: Groups data into K clusters based on how close the points are to each other.
❖ Density-Based Clustering (DBSCAN): Finds clusters in dense areas and treats scattered points as noise.
❖ Mean-Shift Clustering: Discovers clusters by moving points toward the most crowded areas.
❖ Probabilistic Clustering: Creates clusters using probability distributions (e.g., Gaussian mixture models).
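As an illustration, the following minimal sketch (assuming scikit-learn is installed, with made-up 2-D points) shows how hierarchical, K-means, and density-based clustering are invoked:

```python
# Hedged sketch: calling a few of the clustering algorithms above via scikit-learn.
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans, DBSCAN

# Two obvious groups of 2-D points (made-up data).
X = np.array([[1, 1], [1.1, 0.9], [0.9, 1.2], [5, 5], [5.1, 4.9], [4.8, 5.2]])

print(AgglomerativeClustering(n_clusters=2).fit_predict(X))          # hierarchical clustering
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))  # K-means
print(DBSCAN(eps=0.5, min_samples=2).fit_predict(X))                  # density-based; -1 marks noise
```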
Unsupervised Learning
Dimensionality Reduction:
Dimensionality reduction is the process of reducing the number of features in a dataset
while preserving as much information as possible.
➢ Here are some popular Dimensionality Reduction algorithms:
❖ Principal Component Analysis (PCA): Reduces dimensions by transforming data into
uncorrelated principal components.
❖ Linear Discriminant Analysis (LDA): Reduces dimensions while maximizing class separability
for classification tasks.
❖ Non-negative Matrix Factorization (NMF): Breaks data into non-negative parts to simplify
representation.
❖ Isomap: Captures global data structure by preserving distances along a manifold.
K-means Clustering:
K-means clustering is an unsupervised learning algorithm used for data clustering,
which groups unlabeled data points into groups or clusters.
➢ For example, an online store can use K-Means to group customers based on purchase frequency and spending, creating segments for personalised marketing.
➢ The algorithm works by first randomly picking some central points called centroids; each data point is then assigned to the closest centroid, forming a cluster.
➢ After all the points are assigned to a cluster, the centroids are updated by finding the average position of the points in each cluster.
Clustering Models
K-means Clustering:
The algorithm will categorize the items into k groups or clusters of similarity. To
calculate that similarity, we will use the Euclidean distance as a measurement.
➢ The algorithm works as follows:
❖ First, we randomly initialize k points, called means or cluster centroids.
❖ We categorize each item to its closest mean, and
we update the mean’s coordinates, which are the
averages of the items categorized in that cluster so
far.
❖ We repeat the process for a given number of
iterations and at the end, we have our clusters.
➢ Given a set of observations (x1, x2, ..., xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k (≤ n) sets S = {S1, S2, ..., Sk} so as to minimize the within-cluster sum of squares (variance). Formally, the objective is to find:
arg min_S Σ_{i=1..k} Σ_{x ∈ Si} ||x − μi||², where μi is the mean (centroid) of the points in Si.
K-means Clustering:
The goal of the K-Means algorithm is to find clusters in the given input data by repeating two steps: assign every point to its nearest centroid, then recompute each centroid as the mean of the points assigned to it, until the assignments stop changing.
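A minimal from-scratch sketch of this loop is shown below; it assumes NumPy, and the function and variable names are illustrative rather than taken from the slides.

```python
# Minimal K-Means sketch (illustrative only).
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Cluster the rows of X into k groups using the assign/update loop described above."""
    rng = np.random.default_rng(seed)
    # 1. Randomly pick k data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # 2. Assign each point to its closest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update each centroid to the mean of the points assigned to it.
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        # Stop early once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example: two obvious groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans(X, k=2)
print(labels, centroids)
```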
K-means Clustering – Details:
➢ Initial centroids are often chosen randomly.
❖ Clusters produced vary from one run to another.
➢ The centroid is (typically) the mean of the points in the cluster.
➢ ‘Closeness’ is measured by Euclidean distance, cosine similarity, correlation, etc.
➢ K-means will converge for common similarity measures mentioned above.
➢ Most of the convergence happens in the first few iterations.
❖ Often the stopping condition is changed to ‘Until relatively few points change clusters’
➢ Complexity is O( n * K * I * d )
❖ n = number of points, K = number of clusters, I = number of iterations, d = number of
attributes
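As a complement, the following sketch uses scikit-learn's KMeans (assuming scikit-learn is installed); n_clusters and max_iter correspond to K and I in the complexity above, and n_init reruns the algorithm with different random initial centroids to reduce the run-to-run variation noted earlier.

```python
# Hedged sketch using scikit-learn's KMeans on made-up 2-D data.
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[1, 1], [1.2, 0.8], [0.9, 1.1], [8, 8], [8.2, 7.9], [7.8, 8.1]])

model = KMeans(n_clusters=2, max_iter=300, n_init=10, random_state=0).fit(X)
print(model.labels_)           # cluster assignment of each point
print(model.cluster_centers_)  # final centroids
print(model.inertia_)          # within-cluster sum of squares (the objective above)
```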
Limitations of K-means:
K-means has problems when clusters are of differing
❖ Sizes
❖ Densities
❖ Non-globular shapes
K-Medoids clustering:
The k-medoids problem is a clustering problem similar to k-means. The name was
coined by Leonard Kaufman and Peter J. Rousseeuw with their PAM (Partitioning
Around Medoids) algorithm.
➢ K-medoids is a classical partitioning technique of clustering that splits the data set of n objects into k clusters, where the number k of clusters is assumed to be known a priori.
➢ In K-medoids, instead of using the centroids of clusters, actual data points (medoids) are used to represent the clusters.
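Below is a simplified K-Medoids sketch in NumPy that updates each medoid to the cluster member with the smallest total distance to the other members; it is illustrative only and not the full PAM swap procedure (packages such as scikit-learn-extra provide a ready-made KMedoids estimator).

```python
# Simplified K-Medoids sketch (not the full PAM swap search).
import numpy as np

def k_medoids(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iters):
        # Assign each point to its nearest medoid (an actual data point).
        labels = D[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            # New medoid: the member minimizing total distance to the other members.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[costs.argmin()]
        if np.array_equal(np.sort(new_medoids), np.sort(medoids)):
            break
        medoids = new_medoids
    return labels, X[medoids]

labels, medoid_points = k_medoids(np.array([[1., 1.], [1.1, .9], [.9, 1.2], [5., 5.], [5.1, 4.9]]), k=2)
print(labels, medoid_points)
```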
Applications of K-medoids:
➢ Academic Performance: Based on the scores obtained by the student, they are classified
into grades like A, B, C, D etc.
➢ Diagnostic System: In the medical profession, it helps in creating smarter medical
decision support systems, especially in the treatment of liver ailments.
➢ Document Classification: Cluster documents in multiple categories based on their tags,
topics, and the content of the document.
➢ Customer Segmentation: It helps marketers improve their customer base, work on target areas, and segment customers based on purchase history and interests. The classification helps the company target specific clusters of customers with specific campaigns and sell its products according to their interests.
K-means vs K-medoids:
K-Medoids is less sensitive to outliers, which often makes it more robust than the K-Means method, and it is applicable in various fields.
➢ Objective: K-Means attempts to minimize the total squared error, while K-Medoids minimizes the sum of dissimilarities between points labelled to be in a cluster and their closest selected object (the medoid).
➢ Cluster representative: K-Means takes the means of elements in a dataset; K-Medoids takes medoids (actual data points).
➢ Time complexity: K-Means is O(n^(dk+1)); K-Medoids is O(k(n-k)^2).
➢ Sensitivity: K-Means is more sensitive to outliers; K-Medoids is less sensitive to outliers.
➢ Method: K-Means uses Euclidean distance; K-Medoids uses Partitioning Around Medoids (PAM).
Association Rule Learning
➢ Association rule learning discovers If-Then relationships between items in large datasets; the If part of a rule is called the antecedent, and the Then part is called the consequent.
➢ Association rule learning can be divided into three types of algorithms:
❖ Apriori: It is mainly used for market basket analysis and helps to understand the products
that can be bought together.
❖ Eclat: Eclat algorithm stands for Equivalence Class Transformation. This algorithm uses a
depth-first search technique to find frequent itemsets in a transaction database.
❖ F-P Growth Algorithm: It stands for Frequent Pattern Growth, and it represents the database in the form of a tree structure known as a frequent pattern tree (FP-tree).
➢ To measure the associations between thousands of data items, several metrics are used. These metrics are given below:
❖ Support: Support is the frequency of an itemset X, i.e. how frequently X appears in the dataset.
❖ Confidence: It is the ratio of the transaction that contains X and Y to the number of
records that contain X.
❖ Lift: It is the ratio of the observed support measure and expected support if X and Y
are independent of each other.
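These three metrics can be computed directly from a list of transactions. The helper functions below are an illustrative sketch with made-up transaction data.

```python
# Illustrative helpers for support, confidence, and lift; the transactions are hypothetical.
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    """support(X and Y) / support(X): how often Y appears when X does."""
    return support(set(X) | set(Y), transactions) / support(X, transactions)

def lift(X, Y, transactions):
    """Observed support of X and Y relative to what independence would predict."""
    return confidence(X, Y, transactions) / support(Y, transactions)

transactions = [{"bread", "butter"}, {"bread", "milk"}, {"bread", "butter", "milk"}, {"milk"}]
print(support({"bread"}, transactions))                 # 0.75
print(confidence({"bread"}, {"butter"}, transactions))  # ~0.67
print(lift({"bread"}, {"butter"}, transactions))        # ~1.33 (> 1 suggests a positive association)
```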
Apriori Algorithm:
The Apriori algorithm is a machine learning algorithm that finds frequent patterns in
data. It's used to identify associations between items and create rules based on those
associations.
➢ Apriori Algorithm is a foundational method in data mining used for discovering
frequent item sets and generating association rules.
➢ Its significance lies in its ability to identify relationships between items in large datasets
which is particularly valuable in market basket analysis.
➢ For example, if a grocery store finds that customers who buy bread often also buy
butter, it can use this information to optimize product placement or marketing
strategies.
Apriori Algorithm:
The Apriori Algorithm operates through a systematic process that involves several key
steps:
➢ Identifying Frequent Itemsets: The algorithm begins by scanning the dataset to identify individual items (1-itemsets) and their frequencies.
➢ Creating Candidate Itemsets: Once frequent 1-itemsets (single items) are identified, the algorithm generates candidate 2-itemsets by combining frequent items.
➢ Removing Infrequent Itemsets: The algorithm employs a pruning technique based on the
Apriori Property, which states that if an itemset is infrequent, all its supersets must also be
infrequent.
➢ Generating Association Rules: After identifying frequent itemsets, the algorithm generates association rules that illustrate how items relate to one another, using metrics like support, confidence, and lift to evaluate the strength of these relationships.
Example of the Apriori Algorithm (minimum support = 2):
➢ Start with the data in the database
➢ Calculate the support/frequency of all items
➢ Discard the items with support less than the minimum support of 2
➢ Combine two items and calculate their support
➢ Discard the itemsets with support less than the minimum support of 2
➢ Combine three items and calculate their support
➢ Discard the itemsets with support less than the minimum support of 2
Result: Only one itemset, {Eggs, Tea, Cold Drink}, is frequent, because it meets the minimum support of 2.
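In practice these steps are rarely coded by hand. The sketch below is a hedged example that assumes the third-party mlxtend library and made-up transactions; it mirrors the identify, prune, and generate-rules sequence described above.

```python
# Hedged Apriori sketch assuming the mlxtend library; item names are illustrative.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["bread", "butter", "milk"],
                ["bread", "butter"],
                ["bread", "milk"],
                ["butter", "jam"]]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Find itemsets with support >= 0.5, pruning infrequent candidates along the way.
frequent = apriori(df, min_support=0.5, use_colnames=True)

# Generate rules and score them with confidence and lift.
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```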
Advantages of the Apriori Algorithm:
➢ The Apriori Algorithm is the simplest and easiest-to-understand algorithm for mining frequent itemsets.
➢ The Apriori Algorithm is fully unsupervised, so it does not require labeled data.
➢ The Apriori Algorithm is an exhaustive algorithm, so it gives satisfactory results when mining all the rules within the specified confidence and support.
➢ The Apriori principle exploits the downward closure property of frequent patterns, which means that all subsets of any frequent itemset must also be frequent.
Eclat Algorithm:
Eclat algorithm stands for Equivalence Class Transformation. This algorithm uses a
depth-first search technique to find frequent itemsets in a transaction database.
➢ Here’s how it works step by step:
❖ Transaction Database: Eclat starts with a transaction database, where each row represents a transaction,
and each column represents an item.
❖ Itemset Generation: Initially, Eclat creates a list of single items as 1-itemsets. It counts the support
(frequency) of each item in the database by scanning it once.
❖ Building Equivalence Classes: Eclat constructs equivalence classes by grouping transactions that share
common items in their 1-itemsets.
❖ Recursive Search: Eclat recursively explores larger itemsets by combining smaller ones. It does this by
taking the intersection of equivalence classes of items.
❖ Pruning: Eclat prunes infrequent itemsets at each step to reduce the search space, just like Apriori.
➢ Let’s say you have a transactional dataset for a grocery store:
1: {Milk, Bread, Eggs}
2: {Milk, Bread, Diapers}
3: {Milk, Beer, Chips}
4: {Bread, Diapers, Beer, Chips}
5: {Bread, Eggs, Beer}
❖ Suppose you want to find frequent itemsets with a minimum support of 2 transactions.
❖ Initially, the 1-itemsets are {Milk}, {Bread}, {Eggs}, {Diapers}, {Beer}, {Chips}.
❖ Calculate their support.
❖ Construct equivalence classes (item tidsets):
{Milk}: Transactions 1, 2, 3
{Bread}: Transactions 1, 2, 4, 5
{Eggs}: Transactions 1, 5
{Diapers}: Transactions 2, 4
{Beer}: Transactions 3, 4, 5
{Chips}: Transactions 3, 4
❖ Recursively generate larger itemsets:
{Milk, Bread}, {Milk, Eggs}, {Milk, Diapers}, {Milk, Beer}, {Milk, Chips},
{Bread, Eggs}, {Bread, Diapers}, {Bread, Beer}, {Bread, Chips}, {Eggs, Diapers},
{Eggs, Beer}, {Diapers, Beer}, {Diapers, Chips}, {Beer, Chips}
❖ Prune itemsets with support less than 2.
❖ Continue this process until no more frequent itemsets can be found.
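The following sketch walks through this process on the grocery dataset above using only the Python standard library: each item is mapped to its tidset, and larger itemsets are grown by intersecting tidsets and pruning those whose support falls below 2.

```python
# Eclat-style sketch: tidset intersections on the grocery example above.
from itertools import combinations

transactions = {
    1: {"Milk", "Bread", "Eggs"},
    2: {"Milk", "Bread", "Diapers"},
    3: {"Milk", "Beer", "Chips"},
    4: {"Bread", "Diapers", "Beer", "Chips"},
    5: {"Bread", "Eggs", "Beer"},
}
min_support = 2

# Vertical format: itemset -> set of transaction ids (tidset).
tidsets = {}
for tid, items in transactions.items():
    for item in items:
        tidsets.setdefault(frozenset([item]), set()).add(tid)

# Keep only frequent 1-itemsets, then grow itemsets by intersecting tidsets.
frequent = {k: v for k, v in tidsets.items() if len(v) >= min_support}
level = frequent
while level:
    next_level = {}
    for (a, ta), (b, tb) in combinations(level.items(), 2):
        candidate, tids = a | b, ta & tb
        # Support of an itemset = size of the intersected tidset; prune below min_support.
        if len(candidate) == len(a) + 1 and len(tids) >= min_support:
            next_level[candidate] = tids
    frequent.update(next_level)
    level = next_level

for itemset, tids in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(set(itemset), "support =", len(tids))
```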
➢ This vertical approach of the ECLAT algorithm makes it a faster algorithm than the Apriori
algorithm.
➢ Transactions, originally stored in horizontal format, are read from disk and converted to
vertical format.
➢ Another example: consider the following transaction record, with minimum support = 2.
k = 1:
Item     Tidset
Bread    {T1, T4, T5, T7, T8, T9}
Milk     {T3, T5, T6, T7, T8, T9}

k = 2:
Itemset            Tidset
{Bread, Butter}    {T1, T4, T8, T9}
{Bread, Milk}      {T5, T7, T8, T9}
{Bread, Jam}       {T1, T8}

The same procedure is repeated for k = 3 and k = 4: tidsets are intersected and itemsets whose support falls below 2 are discarded.
Dimensionality Reduction:
Dimensionality reduction is the process of reducing the number of features (or
dimensions) in a dataset while retaining as much information as possible.
➢ The number of input features, variables, or columns present in a given dataset is known as
dimensionality, and the process to reduce these features is called dimensionality reduction.
➢ These techniques are widely used in machine learning to obtain a better-fitting predictive model when solving classification and regression problems.
➢ Several techniques for dimensionality reduction:
❖ Principal Component Analysis (PCA)
❖ Singular Value Decomposition (SVD)
❖ Linear Discriminant Analysis (LDA)
❖ In PCA, the principal components are the eigenvectors v of the data's covariance matrix A, obtained by solving Av = λv, i.e. (A − λI)v = 0, where I is the identity matrix of the same shape as matrix A. This condition holds for a non-zero v only if (A − λI) is non-invertible (i.e. a singular matrix), which gives the characteristic equation det(A − λI) = 0.
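The eigenvalue relation above is the core of PCA. The following sketch, using NumPy and randomly generated toy data, computes principal components by eigen-decomposing the covariance matrix and projecting the data onto the top components.

```python
# PCA via eigen-decomposition of the covariance matrix (toy data).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 samples, 3 features (made-up data)

Xc = X - X.mean(axis=0)                # center the data
A = np.cov(Xc, rowvar=False)           # covariance matrix A

# Solve A v = lambda v; eigh is used because A is symmetric.
eigvals, eigvecs = np.linalg.eigh(A)
order = np.argsort(eigvals)[::-1]      # sort components by explained variance
components = eigvecs[:, order[:2]]     # keep the top-2 principal components

X_reduced = Xc @ components            # project the data onto 2 dimensions
print(X_reduced.shape)                 # (100, 2)
```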
Confusion Matrix
➢ Misclassification rate: Also termed the error rate, it defines how often the model gives wrong predictions.
➢ Precision: Out of all the instances the model predicted as positive, the proportion that are actually positive.
➢ F-measure: If two models have low precision and high recall or vice versa, it is difficult to compare them; the F-score evaluates precision and recall at the same time.
➢ Null Error rate: It defines how often our model would be incorrect if it always predicted the
majority class. It is said that "the best classifier has a higher error rate than the null error rate."
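These metrics follow directly from the four counts in a binary confusion matrix. The snippet below uses made-up counts (TP, FP, FN, TN) purely to illustrate the formulas, including recall, which the F-measure combines with precision.

```python
# Metric formulas from a binary confusion matrix; the counts are made-up for illustration.
TP, FP, FN, TN = 40, 10, 5, 45
total = TP + FP + FN + TN

accuracy   = (TP + TN) / total
error_rate = (FP + FN) / total                   # misclassification rate
precision  = TP / (TP + FP)                      # correct positives among predicted positives
recall     = TP / (TP + FN)                      # correct positives among actual positives
f_measure  = 2 * precision * recall / (precision + recall)
null_error = min(TP + FN, TN + FP) / total       # error if we always predicted the majority class

print(accuracy, error_rate, precision, recall, f_measure, null_error)
```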