Detailed Clustering in Machine Learning Notes
-------------------------------
Clustering is a type of unsupervised learning in which we group data points into distinct clusters,
such that the points in each cluster are more similar to each other than to those in other clusters.
It is widely used for data exploration, pattern recognition, and as a pre-processing step for other
algorithms.
--------------------------------
1. **Types of Clustering**:
a) **Partitioning Clustering**:
- Partitioning methods divide the data set into non-overlapping subsets (clusters). A popular example is K-Means:
- **K-Means**: An iterative algorithm that assigns each point to one of \(K\) clusters based on the nearest cluster mean (centroid). The algorithm minimizes the intra-cluster variance, i.e. the within-cluster sum of squared distances (see the sketch after this list).
- These methods are sensitive to the initial selection of centroids, and the number of clusters \(K\) must be specified in advance.
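As a quick illustration, here is a minimal K-Means run using scikit-learn; the synthetic `make_blobs` data and the choice of \(K=3\) are assumptions for the example, not part of these notes.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 well-separated blobs (an assumed example dataset).
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# n_init runs the algorithm several times from different initial centroids
# to reduce the sensitivity to initialization mentioned above.
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

print(km.cluster_centers_)  # final centroids
print(km.inertia_)          # the intra-cluster variance K-Means minimizes
```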
b) **Distribution-Based Clustering**:
- This type of clustering assumes that the data is generated by a mixture of several probability distributions (usually Gaussian). The goal is to estimate the parameters of these distributions.
- **Gaussian Mixture Models (GMM)**: A probabilistic model where the data points are modeled as a mixture of several Gaussian distributions. Each data point has a probability of belonging to each cluster.
- **Expectation Maximization (EM)**: An iterative algorithm used for fitting a GMM. The algorithm alternates between estimating each point's cluster probabilities (Expectation step) and updating the distribution parameters to maximize the likelihood (Maximization step); a short fitting example follows this list.
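A minimal GMM fit sketched with scikit-learn's `GaussianMixture`; the synthetic 1-D data and the two-component setting are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic 1-D Gaussian populations, stacked as a column vector.
X = np.concatenate([rng.normal(-2, 0.5, 200),
                    rng.normal(3, 1.0, 200)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# predict_proba returns the soft memberships: each point's probability
# of belonging to each Gaussian component.
print(gmm.means_.ravel(), gmm.weights_)
print(gmm.predict_proba(X[:3]))
```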
c) **Hierarchical Clustering**:
- Hierarchical clustering builds a hierarchy of clusters by either starting with individual data
points and merging them (agglomerative) or starting with all points in one cluster and splitting them
(divisive).
- **Agglomerative Clustering**: Begins with each data point as its own cluster and iteratively merges the closest pair of clusters until one cluster (or a stopping criterion) remains.
- **Divisive Clustering**: Starts with a single cluster that contains all the data points and recursively splits it into smaller clusters.
- A key advantage of hierarchical clustering is that the number of clusters does not need to be predefined; the hierarchy can be cut at any level after it is built (see the sketch below).
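A small agglomerative sketch using SciPy; the five sample points and the Ward linkage choice are assumptions. Note how the hierarchy is built once and only cut into a cluster count afterwards.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]])

# Ward linkage merges, at each step, the pair of clusters that least
# increases within-cluster variance, building the full hierarchy.
Z = linkage(X, method="ward")

# The hierarchy can be cut afterwards -- here into 3 clusters -- which is
# why the cluster count need not be fixed in advance.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```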
d) **Fuzzy Clustering**:
- In fuzzy clustering, each data point can belong to multiple clusters with different degrees of membership.
- **Fuzzy C-Means**: The algorithm assigns each data point a membership value for each cluster, and the sum of the membership values for each data point is equal to 1. This allows for soft cluster boundaries (a sketch follows).
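scikit-learn does not ship Fuzzy C-Means, so the following is a compact NumPy sketch of the standard update rules; the data, the cluster count `c=2`, and the fuzzifier `m=2` are assumed for illustration.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distances from every point to every center (epsilon avoids /0).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        # Standard FCM update: inverse-distance ratios, renormalized.
        U = 1.0 / d ** (2 / (m - 1))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
centers, U = fuzzy_c_means(X)
print(centers)       # cluster centers
print(U[:3].sum(1))  # each row of memberships sums to 1
```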
2. **BIRCH Algorithm**:
- The BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) algorithm is an efficient clustering algorithm for large datasets. It constructs a Clustering Feature (CF) tree, which compactly summarizes the data.
- The CF tree is built incrementally, where each node in the tree represents a cluster summary. BIRCH uses this structure to efficiently compute clusters without needing to store the entire dataset.
- The BIRCH algorithm is particularly useful when the dataset is too large to fit into memory; a streaming example follows.
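A brief sketch with scikit-learn's `Birch`; the `threshold`, `branching_factor`, and chunking scheme are illustrative assumptions. `partial_fit` feeds the CF tree in chunks, mimicking data that arrives piece by piece.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=10_000, centers=4, random_state=1)

# The CF tree is grown incrementally; each partial_fit call inserts one
# chunk, so the full dataset never has to be held at once by the model.
brc = Birch(threshold=0.8, branching_factor=50, n_clusters=4)
for chunk in np.array_split(X, 10):
    brc.partial_fit(chunk)

print(brc.predict(X[:5]))  # cluster labels for a few points
```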
3. **CURE Algorithm**:
- CURE (Clustering Using REpresentatives) is an algorithm designed for clustering large datasets.
- The algorithm addresses the issue of outliers and high dimensionality by selecting a fixed number of representative points from each cluster. These representative points are shrunk toward the cluster centroid, and the algorithm uses a combination of distance- and centroid-based techniques to build the final cluster structure.
- CURE is efficient and effective for clustering large datasets with clusters of varying shapes and sizes (see the sketch below).
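CURE has no implementation in scikit-learn, so below is a minimal sketch of its central idea only: greedily picking well-scattered representative points and shrinking them toward the centroid. The `n_reps` and `alpha` values are assumptions; a full CURE would repeatedly merge the clusters whose representatives are closest.

```python
import numpy as np

def cure_representatives(cluster_points, n_reps=4, alpha=0.3):
    centroid = cluster_points.mean(axis=0)
    # Start from the point farthest from the centroid.
    start = np.argmax(np.linalg.norm(cluster_points - centroid, axis=1))
    reps = [cluster_points[start]]
    # Greedily pick points farthest from the representatives chosen so
    # far, so the set is well scattered across the cluster.
    while len(reps) < min(n_reps, len(cluster_points)):
        dists = np.min([np.linalg.norm(cluster_points - r, axis=1)
                        for r in reps], axis=0)
        reps.append(cluster_points[np.argmax(dists)])
    # Shrinking toward the centroid dampens the influence of outliers.
    return np.array([r + alpha * (centroid - r) for r in reps])

pts = np.random.default_rng(0).normal(size=(30, 2))
print(cure_representatives(pts))
```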
4. **Gaussian Mixture Models and Expectation Maximization**:
- **Gaussian Mixture Models (GMM)**: This is a probabilistic model used for clustering that assumes the data is generated by a mixture of several Gaussian distributions. Each cluster in a GMM is represented by a Gaussian distribution, and the model estimates the parameters (mean, covariance, and mixture weight) of each.
- **Expectation Maximization (EM)**: The EM algorithm is used to estimate the parameters of the GMM. It alternates between two steps:
- **Expectation (E-step)**: Compute the probability (responsibility) that each data point belongs to each cluster, given the current parameters.
- **Maximization (M-step)**: Update the parameters (mean, covariance, and mixture weights) of the Gaussians based on the probabilities computed in the E-step (a bare-bones loop follows this list).
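To make the alternation concrete, here is a bare-bones EM loop for a two-component 1-D GMM in NumPy; the synthetic data and initial parameter guesses are assumptions for the sketch.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1.5, 300)])

# Initial guesses for means, std devs, and mixture weights.
mu, sigma, w = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(50):
    # E-step: responsibility of each component for each point.
    dens = w * norm.pdf(x[:, None], mu, sigma)   # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and std devs from responsibilities.
    nk = resp.sum(axis=0)
    w = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print(mu, sigma, w)  # should approach (0, 5), (1, 1.5), (0.5, 0.5)
```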
5. **Parameter Estimation**:
- **Maximum Likelihood Estimation (MLE)**: MLE is a method for estimating the parameters of a statistical model. It chooses the parameter values that maximize the likelihood function, i.e. the probability of the observed data under the model.
- **Maximum A Posteriori (MAP)**: MAP is similar to MLE, but it incorporates a prior probability distribution on the parameters, representing any prior knowledge we have about them. MAP estimation maximizes the posterior probability, which combines the likelihood and the prior (in symbols below).
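In symbols (a standard formulation, stated here for reference): MLE chooses \(\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} \, p(D \mid \theta)\), while MAP chooses \(\hat{\theta}_{\text{MAP}} = \arg\max_{\theta} \, p(D \mid \theta)\,p(\theta)\), where \(D\) is the observed data; under a uniform prior \(p(\theta)\), MAP reduces to MLE.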
6. **Applications of Clustering**:
- **Image Segmentation**: Clustering is used to group similar pixels in an image, allowing regions or objects to be separated.
- **Market Segmentation**: Businesses use clustering to group customers with similar behaviors for targeted marketing.
- **Anomaly Detection**: Clustering can identify outliers or anomalous data points that do not fit well into any cluster.
- **Social Network Analysis**: Clustering can be used to detect communities in social networks, where nodes (individuals) within the same cluster are more strongly connected or more similar to each other than to the rest of the network.
- **Document Categorization**: In text mining, clustering can be used to group similar documents by topic.