Unsupervised Machine Learning
K ADISESHA
Unsupervised Learning and Association Mining
Contents:
Introduction
Clustering
Clustering Types
K-Means algorithm
Association Mining
FP-growth
Introduction
Clustering Algorithms:
Clustering in unsupervised machine learning is the process of grouping unlabeled data
into clusters based on their similarities.
➢ The goal of clustering is to identify patterns and relationships in the data without any
prior knowledge of the data’s meaning.
➢ Some common clustering algorithms:
❖ Hierarchical Clustering: Creates clusters by building a tree step-by-step, either merging or splitting
groups.
❖ K-means Clustering: Groups data into K clusters based on how close the points are to each other.
❖ Density-Based Clustering (DBSCAN): Finds clusters in dense areas and treats scattered points as noise.
❖ Mean-Shift Clustering: Discovers clusters by moving points toward the most crowded areas.
❖ Probabilistic Clustering: Creates clusters using probability distributions (e.g., Gaussian mixture models).
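As an illustration, the following minimal sketch (assuming scikit-learn is installed, with made-up 2-D points) shows how hierarchical, K-means, and density-based clustering are invoked:

```python
# Hedged sketch: calling a few of the clustering algorithms above via scikit-learn.
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans, DBSCAN

# Two obvious groups of 2-D points (made-up data).
X = np.array([[1, 1], [1.1, 0.9], [0.9, 1.2], [5, 5], [5.1, 4.9], [4.8, 5.2]])

print(AgglomerativeClustering(n_clusters=2).fit_predict(X))          # hierarchical clustering
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))  # K-means
print(DBSCAN(eps=0.5, min_samples=2).fit_predict(X))                  # density-based; -1 marks noise
```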
Unsupervised Learning
Dimensionality Reduction:
Dimensionality reduction is the process of reducing the number of features in a dataset
while preserving as much information as possible.
➢ Here are some popular Dimensionality Reduction algorithms:
❖ Principal Component Analysis (PCA): Reduces dimensions by transforming data into
uncorrelated principal components.
❖ Linear Discriminant Analysis (LDA): Reduces dimensions while maximizing class separability
for classification tasks.
❖ Non-negative Matrix Factorization (NMF): Breaks data into non-negative parts to simplify
representation.
❖ Isomap: Captures global data structure by preserving distances along a manifold.
K-means Clustering:
K-means clustering is an unsupervised learning algorithm used for data clustering,
which groups unlabeled data points into groups or clusters.
➢ For example, an online store can use K-Means to group customers based on purchase frequency and spending, creating segments for personalised marketing.
➢ The algorithm works by first randomly picking some central points called centroids; each data point is then assigned to the closest centroid, forming a cluster.
➢ After all the points are assigned to a cluster, the centroids are updated by finding the average position of the points in each cluster.
Clustering Models
K-means Clustering:
The algorithm will categorize the items into k groups or clusters of similarity. To
calculate that similarity, we will use the Euclidean distance as a measurement.
➢ The algorithm works as follows:
❖ First, we randomly initialize k points, called means or cluster centroids.
❖ We categorize each item to its closest mean, and
we update the mean’s coordinates, which are the
averages of the items categorized in that cluster so
far.
❖ We repeat the process for a given number of
iterations and at the end, we have our clusters.
➢ Given a set of observations (x1, x2, ..., xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k (≤ n) sets S = {S1, S2, ..., Sk} so as to minimize the within-cluster sum of squares (variance). Formally, the objective is to find:
arg min_S Σ_{i=1..k} Σ_{x ∈ Si} ||x − μi||², where μi is the mean (centroid) of the points in Si.
K-means Clustering:
The goal of the K-Means algorithm is to find clusters in the given input data by repeating two steps: assign every point to its nearest centroid, then recompute each centroid as the mean of the points assigned to it, until the assignments stop changing.
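A minimal from-scratch sketch of this loop is shown below; it assumes NumPy, and the function and variable names are illustrative rather than taken from the slides.

```python
# Minimal K-Means sketch (illustrative only).
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Cluster the rows of X into k groups using the assign/update loop described above."""
    rng = np.random.default_rng(seed)
    # 1. Randomly pick k data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # 2. Assign each point to its closest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update each centroid to the mean of the points assigned to it.
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        # Stop early once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example: two obvious groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans(X, k=2)
print(labels, centroids)
```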
K-means Clustering – Details:
➢ Initial centroids are often chosen randomly.
❖ Clusters produced vary from one run to another.
➢ The centroid is (typically) the mean of the points in the cluster.
➢ ‘Closeness’ is measured by Euclidean distance, cosine similarity, correlation, etc.
➢ K-means will converge for common similarity measures mentioned above.
➢ Most of the convergence happens in the first few iterations.
❖ Often the stopping condition is changed to ‘Until relatively few points change clusters’
➢ Complexity is O( n * K * I * d )
❖ n = number of points, K = number of clusters, I = number of iterations, d = number of
attributes
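As a complement, the following sketch uses scikit-learn's KMeans (assuming scikit-learn is installed); n_clusters and max_iter correspond to K and I in the complexity above, and n_init reruns the algorithm with different random initial centroids to reduce the run-to-run variation noted earlier.

```python
# Hedged sketch using scikit-learn's KMeans on made-up 2-D data.
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[1, 1], [1.2, 0.8], [0.9, 1.1], [8, 8], [8.2, 7.9], [7.8, 8.1]])

model = KMeans(n_clusters=2, max_iter=300, n_init=10, random_state=0).fit(X)
print(model.labels_)           # cluster assignment of each point
print(model.cluster_centers_)  # final centroids
print(model.inertia_)          # within-cluster sum of squares (the objective above)
```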
Limitations of K-means:
K-means has problems when clusters are of differing
❖ Sizes
❖ Densities
❖ Non-globular shapes
K-Medoids clustering:
The k-medoids problem is a clustering problem similar to k-means. The name was
coined by Leonard Kaufman and Peter J. Rousseeuw with their PAM (Partitioning
Around Medoids) algorithm.
➢ K-medoids is a classical partitioning technique of clustering that splits the data set of n objects into k clusters, where the number k of clusters is assumed to be known a priori.
➢ In K-medoids, instead of using the centroids of clusters, actual data points (medoids) are used to represent the clusters.
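Below is a simplified K-Medoids sketch in NumPy that updates each medoid to the cluster member with the smallest total distance to the other members; it is illustrative only and not the full PAM swap procedure (packages such as scikit-learn-extra provide a ready-made KMedoids estimator).

```python
# Simplified K-Medoids sketch (not the full PAM swap search).
import numpy as np

def k_medoids(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iters):
        # Assign each point to its nearest medoid (an actual data point).
        labels = D[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            # New medoid: the member minimizing total distance to the other members.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[costs.argmin()]
        if np.array_equal(np.sort(new_medoids), np.sort(medoids)):
            break
        medoids = new_medoids
    return labels, X[medoids]

labels, medoid_points = k_medoids(np.array([[1., 1.], [1.1, .9], [.9, 1.2], [5., 5.], [5.1, 4.9]]), k=2)
print(labels, medoid_points)
```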
Applications of K-medoids:
➢ Academic Performance: Based on the scores obtained by the student, they are classified
into grades like A, B, C, D etc.
➢ Diagnostic System: In the medical profession, it helps in creating smarter medical
decision support systems, especially in the treatment of liver ailments.
➢ Document Classification: Cluster documents in multiple categories based on their tags,
topics, and the content of the document.
➢ Customer Segmentation: It helps marketers improve their customer base, work on target areas, and segment customers based on purchase history and interests. The classification helps the company target specific clusters of customers with specific campaigns and sell its products according to their interests.
K-means vs K-medoids:
K-Medoids is less sensitive to outliers, which often makes it more robust than the K-Means method, and it is applicable in various fields.
➢ Objective: K-Means attempts to minimize the total squared error, while K-Medoids minimizes the sum of dissimilarities between points labelled to be in a cluster and their closest selected object (the medoid).
➢ Cluster representative: K-Means takes the means of elements in a dataset; K-Medoids takes medoids (actual data points).
➢ Time complexity: K-Means is O(n^(dk+1)); K-Medoids is O(k(n-k)^2).
➢ Sensitivity: K-Means is more sensitive to outliers; K-Medoids is less sensitive to outliers.
➢ Method: K-Means uses Euclidean distance; K-Medoids uses Partitioning Around Medoids (PAM).
Association Rule Learning
➢ Association rule learning discovers If-Then relationships between items in large datasets; the If part of a rule is called the antecedent, and the Then part is called the consequent.
➢ Association rule learning can be divided into three types of algorithms:
❖ Apriori: It is mainly used for market basket analysis and helps to understand the products
that can be bought together.
❖ Eclat: Eclat algorithm stands for Equivalence Class Transformation. This algorithm uses a
depth-first search technique to find frequent itemsets in a transaction database.
❖ F-P Growth Algorithm: It stands for Frequent Pattern Growth, and it represents the database in the form of a tree structure known as a frequent pattern tree (FP-tree).
➢ To measure the associations between thousands of data items, several metrics are used. These metrics are given below:
❖ Support: Support is the frequency of an itemset X, i.e. how frequently X appears in the dataset.
❖ Confidence: It is the ratio of the transaction that contains X and Y to the number of
records that contain X.
❖ Lift: It is the ratio of the observed support measure and expected support if X and Y
are independent of each other.
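These three metrics can be computed directly from a list of transactions. The helper functions below are an illustrative sketch with made-up transaction data.

```python
# Illustrative helpers for support, confidence, and lift; the transactions are hypothetical.
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    """support(X and Y) / support(X): how often Y appears when X does."""
    return support(set(X) | set(Y), transactions) / support(X, transactions)

def lift(X, Y, transactions):
    """Observed support of X and Y relative to what independence would predict."""
    return confidence(X, Y, transactions) / support(Y, transactions)

transactions = [{"bread", "butter"}, {"bread", "milk"}, {"bread", "butter", "milk"}, {"milk"}]
print(support({"bread"}, transactions))                 # 0.75
print(confidence({"bread"}, {"butter"}, transactions))  # ~0.67
print(lift({"bread"}, {"butter"}, transactions))        # ~1.33 (> 1 suggests a positive association)
```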
Apriori Algorithm:
The Apriori algorithm is a machine learning algorithm that finds frequent patterns in
data. It's used to identify associations between items and create rules based on those
associations.
➢ Apriori Algorithm is a foundational method in data mining used for discovering
frequent item sets and generating association rules.
➢ Its significance lies in its ability to identify relationships between items in large datasets
which is particularly valuable in market basket analysis.
➢ For example, if a grocery store finds that customers who buy bread often also buy
butter, it can use this information to optimize product placement or marketing
strategies.
Apriori Algorithm:
The Apriori Algorithm operates through a systematic process that involves several key
steps:
➢ Identifying Frequent Itemsets: The algorithm begins by scanning the dataset to identify individual items (1-itemsets) and their frequencies.
➢ Creating Candidate Itemsets: Once frequent 1-itemsets (single items) are identified, the algorithm generates candidate 2-itemsets by combining frequent items.
➢ Removing Infrequent Itemsets: The algorithm employs a pruning technique based on the
Apriori Property, which states that if an itemset is infrequent, all its supersets must also be
infrequent.
➢ Generating Association Rules: After identifying frequent itemsets, the algorithm generates association rules that illustrate how items relate to one another, using metrics like support, confidence, and lift to evaluate the strength of these relationships.
Example of the Apriori Algorithm (minimum support = 2):
➢ Start with the data in the database
➢ Calculate the support/frequency of all items
➢ Discard the items with support less than the minimum support of 2
➢ Combine two items and calculate their support
➢ Discard the itemsets with support less than the minimum support of 2
➢ Combine three items and calculate their support
➢ Discard the itemsets with support less than the minimum support of 2
Result: Only one itemset, {Eggs, Tea, Cold Drink}, is frequent, because it meets the minimum support of 2.
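In practice these steps are rarely coded by hand. The sketch below is a hedged example that assumes the third-party mlxtend library and made-up transactions; it mirrors the identify, prune, and generate-rules sequence described above.

```python
# Hedged Apriori sketch assuming the mlxtend library; item names are illustrative.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["bread", "butter", "milk"],
                ["bread", "butter"],
                ["bread", "milk"],
                ["butter", "jam"]]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Find itemsets with support >= 0.5, pruning infrequent candidates along the way.
frequent = apriori(df, min_support=0.5, use_colnames=True)

# Generate rules and score them with confidence and lift.
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```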
Advantages of the Apriori Algorithm:
➢ The Apriori Algorithm is the simplest and easiest-to-understand algorithm for mining frequent itemsets.
➢ The Apriori Algorithm is fully unsupervised, so it does not require labeled data.
➢ The Apriori Algorithm is an exhaustive algorithm, so it gives satisfactory results when mining all the rules within the specified confidence and support.
➢ The Apriori principle exploits the downward closure property of frequent patterns, which means that all subsets of any frequent itemset must also be frequent.
Eclat Algorithm:
Eclat algorithm stands for Equivalence Class Transformation. This algorithm uses a
depth-first search technique to find frequent itemsets in a transaction database.
➢ Here’s how it works step by step:
❖ Transaction Database: Eclat starts with a transaction database, where each row represents a transaction,
and each column represents an item.
❖ Itemset Generation: Initially, Eclat creates a list of single items as 1-itemsets. It counts the support
(frequency) of each item in the database by scanning it once.
❖ Building Equivalence Classes: Eclat constructs equivalence classes by grouping transactions that share
common items in their 1-itemsets.
❖ Recursive Search: Eclat recursively explores larger itemsets by combining smaller ones. It does this by
taking the intersection of equivalence classes of items.
❖ Pruning: Eclat prunes infrequent itemsets at each step to reduce the search space, just like Apriori.
➢ Let’s say you have a transactional dataset for a grocery store:
1: {Milk, Bread, Eggs}
2: {Milk, Bread, Diapers}
3: {Milk, Beer, Chips}
4: {Bread, Diapers, Beer, Chips}
5: {Bread, Eggs, Beer}
❖ Suppose you want to find frequent itemsets with a minimum support of 2 transactions.
❖ Initially, the 1-itemsets are {Milk}, {Bread}, {Eggs}, {Diapers}, {Beer}, {Chips}.
❖ Calculate their support.
❖ Construct equivalence classes (item tidsets):
{Milk}: Transactions 1, 2, 3
{Bread}: Transactions 1, 2, 4, 5
{Eggs}: Transactions 1, 5
{Diapers}: Transactions 2, 4
{Beer}: Transactions 3, 4, 5
{Chips}: Transactions 3, 4
❖ Recursively generate larger itemsets:
{Milk, Bread}, {Milk, Eggs}, {Milk, Diapers}, {Milk, Beer}, {Milk, Chips},
{Bread, Eggs}, {Bread, Diapers}, {Bread, Beer}, {Bread, Chips}, {Eggs, Diapers},
{Eggs, Beer}, {Diapers, Beer}, {Diapers, Chips}, {Beer, Chips}
❖ Prune itemsets with support less than 2.
❖ Continue this process until no more frequent itemsets can be found.
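The following sketch walks through this process on the grocery dataset above using only the Python standard library: each item is mapped to its tidset, and larger itemsets are grown by intersecting tidsets and pruning those whose support falls below 2.

```python
# Eclat-style sketch: tidset intersections on the grocery example above.
from itertools import combinations

transactions = {
    1: {"Milk", "Bread", "Eggs"},
    2: {"Milk", "Bread", "Diapers"},
    3: {"Milk", "Beer", "Chips"},
    4: {"Bread", "Diapers", "Beer", "Chips"},
    5: {"Bread", "Eggs", "Beer"},
}
min_support = 2

# Vertical format: itemset -> set of transaction ids (tidset).
tidsets = {}
for tid, items in transactions.items():
    for item in items:
        tidsets.setdefault(frozenset([item]), set()).add(tid)

# Keep only frequent 1-itemsets, then grow itemsets by intersecting tidsets.
frequent = {k: v for k, v in tidsets.items() if len(v) >= min_support}
level = frequent
while level:
    next_level = {}
    for (a, ta), (b, tb) in combinations(level.items(), 2):
        candidate, tids = a | b, ta & tb
        # Support of an itemset = size of the intersected tidset; prune below min_support.
        if len(candidate) == len(a) + 1 and len(tids) >= min_support:
            next_level[candidate] = tids
    frequent.update(next_level)
    level = next_level

for itemset, tids in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(set(itemset), "support =", len(tids))
```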
➢ This vertical approach of the ECLAT algorithm makes it a faster algorithm than the Apriori
algorithm.
➢ Transactions, originally stored in horizontal format, are read from disk and converted to
vertical format.
➢ Another example: consider the following transaction record, with minimum support = 2.
k = 1:
Item     Tidset
Bread    {T1, T4, T5, T7, T8, T9}
Milk     {T3, T5, T6, T7, T8, T9}

k = 2:
Itemset            Tidset
{Bread, Butter}    {T1, T4, T8, T9}
{Bread, Milk}      {T5, T7, T8, T9}
{Bread, Jam}       {T1, T8}

The same procedure is repeated for k = 3 and k = 4: tidsets are intersected and itemsets whose support falls below 2 are discarded.
Dimensionality Reduction:
Dimensionality reduction is the process of reducing the number of features (or
dimensions) in a dataset while retaining as much information as possible.
➢ The number of input features, variables, or columns present in a given dataset is known as
dimensionality, and the process to reduce these features is called dimensionality reduction.
➢ These techniques are widely used in machine learning to obtain a better-fitting predictive model when solving classification and regression problems.
➢ Several techniques for dimensionality reduction:
❖ Principal Component Analysis (PCA)
❖ Singular Value Decomposition (SVD)
❖ Linear Discriminant Analysis (LDA)
❖ In PCA, the principal components are the eigenvectors v of the data's covariance matrix A, obtained by solving Av = λv, i.e. (A − λI)v = 0, where I is the identity matrix of the same shape as matrix A. This condition holds for a non-zero v only if (A − λI) is non-invertible (i.e. a singular matrix), which gives the characteristic equation det(A − λI) = 0.
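The eigenvalue relation above is the core of PCA. The following sketch, using NumPy and randomly generated toy data, computes principal components by eigen-decomposing the covariance matrix and projecting the data onto the top components.

```python
# PCA via eigen-decomposition of the covariance matrix (toy data).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 samples, 3 features (made-up data)

Xc = X - X.mean(axis=0)                # center the data
A = np.cov(Xc, rowvar=False)           # covariance matrix A

# Solve A v = lambda v; eigh is used because A is symmetric.
eigvals, eigvecs = np.linalg.eigh(A)
order = np.argsort(eigvals)[::-1]      # sort components by explained variance
components = eigvecs[:, order[:2]]     # keep the top-2 principal components

X_reduced = Xc @ components            # project the data onto 2 dimensions
print(X_reduced.shape)                 # (100, 2)
```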
Confusion Matrix
➢ Misclassification rate: Also termed the error rate, it defines how often the model gives wrong predictions.
➢ Precision: Out of all the instances the model predicted as positive, the proportion that are actually positive.
➢ F-measure: If two models have low precision and high recall or vice versa, it is difficult to compare them; the F-score evaluates precision and recall at the same time.
➢ Null Error rate: It defines how often our model would be incorrect if it always predicted the
majority class. It is said that "the best classifier has a higher error rate than the null error rate."
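These metrics follow directly from the four counts in a binary confusion matrix. The snippet below uses made-up counts (TP, FP, FN, TN) purely to illustrate the formulas, including recall, which the F-measure combines with precision.

```python
# Metric formulas from a binary confusion matrix; the counts are made-up for illustration.
TP, FP, FN, TN = 40, 10, 5, 45
total = TP + FP + FN + TN

accuracy   = (TP + TN) / total
error_rate = (FP + FN) / total                   # misclassification rate
precision  = TP / (TP + FP)                      # correct positives among predicted positives
recall     = TP / (TP + FN)                      # correct positives among actual positives
f_measure  = 2 * precision * recall / (precision + recall)
null_error = min(TP + FN, TN + FP) / total       # error if we always predicted the majority class

print(accuracy, error_rate, precision, recall, f_measure, null_error)
```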