
Unit 3: Unsupervised Learning

Clustering (k-means, hierarchical), dimensionality reduction (PCA), association rules.

Clustering (k-means, hierarchical)

What is K-means Clustering?


K-means clustering is a simple and widely used unsupervised machine learning
algorithm that iteratively groups a collection of data points into a fixed number
of clusters (k) according to their similarity. The algorithm aims to minimize the
distance between each data point and its corresponding cluster center, also
called the centroid. The algorithm terminates when either the centroids remain
stable or a maximum number of iterations is reached. K-means clustering has
various applications, such as data analysis, image segmentation, and anomaly
detection.
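
As a quick illustration, here is a minimal sketch of K-means in Python using
scikit-learn; the library choice and the toy data are our own, not prescribed
by the text above:

import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data: two obvious groups, around x = 1 and x = 10
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# k must be fixed in advance; n_init restarts guard against bad initial centroids
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster index assigned to each point
print(kmeans.cluster_centers_)  # final centroids after convergence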

What is Hierarchical Clustering?


Hierarchical clustering is a type of unsupervised machine learning algorithm
that organizes data points into a hierarchy of clusters based on their similarity
or distance. It is also called hierarchical cluster analysis or HCA. Hierarchical
clustering has two variants: agglomerative and divisive.
Agglomerative clustering: begins with each data point as a separate cluster and
then repeatedly combines the nearest clusters until only one cluster is left.
Divisive clustering: begins with all data points in a single cluster and then
repeatedly divides clusters until each data point has its own cluster.
Hierarchical clustering can be represented by a dendrogram, a tree diagram
that illustrates the nested arrangement of clusters and the distances at which
they merge.
Hierarchical clustering has various uses, such as finding patterns, discovering
hierarchies, or detecting outliers in data.
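
As a minimal sketch, the agglomerative variant can be run with scikit-learn
(again, the library and toy data are our choice; the divisive variant has no
direct counterpart in that library):

import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Bottom-up merging of nearest clusters; here we cut the hierarchy at 2 clusters
agg = AgglomerativeClustering(n_clusters=2, linkage='ward')
labels = agg.fit_predict(X)
print(labels)  # cluster index assigned to each point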
Differences Between K-Means Clustering and Hierarchical Clustering
Hierarchical Clustering in Machine Learning
Hierarchical clustering is another unsupervised machine learning algorithm,
which is used to group unlabelled datasets into clusters; it is also known
as hierarchical cluster analysis or HCA.
In this algorithm, we develop the hierarchy of clusters in the form of a tree, and
this tree-shaped structure is known as the dendrogram.
Sometimes the results of K-means clustering and hierarchical clustering may
look similar, but the two differ in how they work: unlike the K-means
algorithm, hierarchical clustering has no requirement to predetermine the
number of clusters.
The hierarchical clustering technique has two approaches:

1. Agglomerative: a bottom-up approach, in which the algorithm starts by
taking every data point as its own cluster and merges the closest pairs
until one cluster is left.

2. Divisive: a top-down approach, working in reverse of the agglomerative
algorithm.

Why hierarchical clustering?

Since we already have other clustering algorithms such as K-means
clustering, why do we need hierarchical clustering? As we have seen,
K-means clustering comes with some challenges: it requires a
predetermined number of clusters, and it tends to create clusters of
similar size. To address these two challenges, we can opt for the
hierarchical clustering algorithm, because it does not require prior
knowledge of the number of clusters.

In this topic, we will discuss the Agglomerative Hierarchical clustering
algorithm.

Agglomerative Hierarchical clustering

The agglomerative hierarchical clustering algorithm is a popular example of
HCA. To group a dataset into clusters, it follows the bottom-up approach:
the algorithm treats each data point as a single cluster at the beginning,
and then starts combining the closest pairs of clusters. It does this until
all the clusters are merged into a single cluster that contains every data
point.

This hierarchy of clusters is represented in the form of a dendrogram.

How does Agglomerative Hierarchical clustering work?

The working of the AHC algorithm can be explained using the steps below,
followed by a short code sketch:

o Step-1: Treat each data point as a single cluster. If there
are N data points, the number of clusters will also be N.

o Step-2: Take the two closest data points or clusters and merge them to
form one cluster. There will now be N-1 clusters.

o Step-3: Again, take the two closest clusters and merge them
together to form one cluster. There will be N-2 clusters.

o Step-4: Repeat Step-3 until only one cluster is left.

o Step-5: Once all the clusters are combined into one big cluster,
use the dendrogram to divide the clusters as the problem requires.
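
These steps can be traced in code. Below is a small sketch using SciPy (our
choice of library; any hierarchical clustering implementation would do) that
performs the merges and draws the dendrogram:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# linkage() performs Steps 1-4: starting from N singleton clusters,
# it repeatedly merges the two closest clusters (Ward distance here)
Z = linkage(X, method='ward')

# Each row of Z records one merge: (cluster a, cluster b, distance, new size)
print(Z)

# Step 5: the dendrogram visualizes the nested merges; cutting it at a
# chosen height yields the desired number of clusters
dendrogram(Z)
plt.show()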

Association rules offer a powerful tool for data analysis, providing
insights into patterns and relationships within large datasets. While they
are a staple in market basket analysis, their application extends across
various domains, offering invaluable insights into customer behaviour and
beyond.

Introduction to Association Rules in Data Mining

Association rule mining is a technique in data mining for discovering
interesting relationships, frequent patterns, associations, or correlations
between variables in large datasets. It is widely used in various fields such
as market basket analysis, web usage mining, bioinformatics, and more.
The basic idea is to find rules that predict the occurrence of an item based
on the occurrences of other items in the same transaction.

Understanding the Basics

To explain association rule mining, we can use a simple example of a
grocery store's transaction data. Let's start by defining a sample
transaction table and then move on to discuss itemsets and association
rules derived from this data.

Imagine a small dataset representing transactions in a grocery store:

Table 1: Sample Transaction Table
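
Transaction ID    Items Purchased
1                 Milk, Bread
2                 Bread, Butter
3                 Milk, Bread, Butter, Cola
4                 Bread, Diapers, Beer
5                 Milk, Bread, Diapers, Beer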

In this table, each row represents a transaction (a customer's purchase),
and each transaction has a unique ID. The 'Items Purchased' column lists
the items bought in that transaction.

Concept of Itemset

An 'itemset' is a collection of one or more items found within a dataset. For
example, consider a dataset containing various groceries. An itemset could
be a combination like {Cheese, Tomato}.

The 'length' of an itemset is the number of items it contains. Thus,
{Cheese, Tomato} is a 2-itemset.

· Single-item itemsets: {Milk}, {Bread}, {Butter}, {Diapers}, {Beer}, {Cola}

· Two-item itemsets: {Milk, Bread}, {Bread, Butter}, {Diapers, Beer}, etc.

· Three-item itemsets: {Milk, Bread, Butter}, {Bread, Diapers, Beer}, etc.
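
As a quick sketch, itemsets of a given length can be enumerated in Python
with the standard library (a minimal illustration, not a full frequent-itemset
miner):

from itertools import combinations

items = ['Milk', 'Bread', 'Butter', 'Diapers', 'Beer', 'Cola']

# Enumerate every two-item itemset from the item universe
for pair in combinations(items, 2):
    print(set(pair))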

Association Rules
· An association rule is a fundamental concept in data mining that reveals
how items within a dataset are connected. It is a directive that suggests a
strong, potentially useful relationship between two sets of items.

· These rules are expressed in the form of "If-Then" statements, typically
written as {X} → {Y}, where X and Y are disjoint sets of items.
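
To make "strong" concrete, two conventional measures of rule strength (not
defined in the text above, but standard in the field) are support and
confidence. A minimal Python sketch over the illustrative transactions from
Table 1:

# Each transaction from the illustrative table, as a set of items
transactions = [
    {'Milk', 'Bread'},
    {'Bread', 'Butter'},
    {'Milk', 'Bread', 'Butter', 'Cola'},
    {'Bread', 'Diapers', 'Beer'},
    {'Milk', 'Bread', 'Diapers', 'Beer'},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

X, Y = {'Bread'}, {'Butter'}
rule_support = support(X | Y)              # how often X and Y occur together
confidence = support(X | Y) / support(X)   # how often "If X Then Y" holds

print(f"support({X} -> {Y}) = {rule_support:.2f}")      # 0.40
print(f"confidence({X} -> {Y}) = {confidence:.2f}")     # 0.40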
