Machine Learning - Clustering
Prof. Dr. Dewan Md. Farid: Unsupervised Learning in Machine Learning Professor of Computer Science, United International University
Outline Introduction K-Means Clustering Similarity-Based Clustering Nearest Neighbor Clustering Ensemble Clustering Subspace Clustering
Introduction
K-Means Clustering
Similarity-Based Clustering
Ensemble Clustering
Subspace Clustering
What is Clustering?
Area of Applications
Clustering has been widely used in many real world applications, such as:
- Human genetic clustering
- Medical imaging clustering
- Market research
- Field robotics
- Crime analysis
- Pattern recognition
Clustering Instances
$X = \{x_1, x_2, \dots, x_N\}$ (1)
The basic clustering methods fall into four categories:
1. Partitioning methods
2. Hierarchical methods
3. Density-based methods
4. Grid-based methods
Partitioning Method
Hierarchical Method
Density-based method
Density-based methods cluster instances based on how densely the
instances are packed around each other, rather than on the distance to a
cluster centre, so they can find arbitrarily shaped clusters. Clusters are
modelled as dense regions in the data space, separated by sparse regions.
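The idea of dense regions separated by sparse regions can be sketched in the style of DBSCAN. The slides do not name a specific algorithm, so the choice of DBSCAN and the parameters `eps` (neighbourhood radius) and `min_pts` (minimum number of neighbours for a dense region) are assumptions made for illustration only.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # sparse region (may later become a border point)
            continue
        cluster += 1  # points[i] lies in a dense region: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a dense point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # keep growing through dense points
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
print(dbscan(points, eps=1.5, min_pts=3))
# [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point (5, 5) has no neighbours within `eps`, so it is labelled as noise rather than forced into a cluster.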
Grid-based method
Similarity Measure
Distance Measure
Where $x_i = (x_{i1}, x_{i2}, \dots, x_{im})$ and $x_l = (x_{l1}, x_{l2}, \dots, x_{lm})$ are two
instances in Euclidean $m$-space.
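As a small illustration, assuming the measure intended here is the standard Euclidean distance between two instances with m attributes, it can be computed as follows. The function name `euclidean_distance` is hypothetical.

```python
import math

def euclidean_distance(xi, xl):
    """Euclidean distance between two instances given as equal-length sequences."""
    if len(xi) != len(xl):
        raise ValueError("instances must have the same number of attributes")
    # Sum the squared attribute-wise differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xl)))

print(euclidean_distance((1.0, 2.0, 3.0), (4.0, 6.0, 3.0)))  # 5.0
```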
k-Means or c-Means
Cluster Mean
K-Means - An Example
Run Information
kMeans
======
Number of iterations: 4
Within cluster sum of squared errors: 21.000000000000004
Missing values globally replaced with mean/mode
Cluster centroids:
                           Cluster#
Attribute      Full Data          0          1
                    (14)       (10)        (4)
==============================================
outlook            sunny      sunny   overcast
temperature         mild       mild       cool
humidity            high       high     normal
windy              FALSE      FALSE       TRUE
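The run information above reports a number of iterations and a within-cluster sum of squared errors. A minimal sketch of the standard k-means procedure (Lloyd's algorithm) that produces both quantities is given below. It is not Weka's implementation: it assumes numeric attributes only (the run above is on nominal attributes, which this sketch does not handle), and the initial centroids are passed in explicitly rather than chosen at random.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids, iterations, sse)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each instance joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Within-cluster sum of squared errors for the final assignment.
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, iteration, sse

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, centroids, iterations, sse = k_means(points, [(1.0, 1.0), (2.0, 1.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this sketch the iteration count includes the final pass that only confirms the assignments no longer change.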
Similarity-Based Clustering
SCM - An Example
Instances are iteratively merged into the closest existing cluster.
In NN clustering, a threshold, t, determines whether an instance is
added to an existing cluster or starts a new cluster. The complexity
of the NN clustering algorithm depends on the number of instances in
the dataset: in each iteration, the current instance must be compared
with every instance already in a cluster.
Thus, the time complexity of the NN clustering algorithm is O(n^2).
Because the distances between instances are needed often, we assume
they are stored, so the space requirement is also O(n^2).
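A minimal sketch of nearest neighbor clustering with a threshold, assuming Euclidean distance and that instances are processed in input order. The function name `nn_clustering` is hypothetical. This version recomputes distances on demand instead of storing them, so it shows the quadratic number of comparisons but not the quadratic storage.

```python
import math

def nn_clustering(points, t):
    """Return one cluster label per instance, processed in input order."""
    labels = []
    next_label = 0
    for i, p in enumerate(points):
        # Find the nearest instance that has already been clustered.
        nearest, nearest_dist = None, math.inf
        for j in range(i):
            d = math.dist(p, points[j])
            if d < nearest_dist:
                nearest, nearest_dist = j, d
        if nearest is not None and nearest_dist <= t:
            labels.append(labels[nearest])  # join the nearest neighbor's cluster
        else:
            labels.append(next_label)       # too far from everything: new cluster
            next_label += 1
    return labels

print(nn_clustering([(0, 0), (0, 1), (5, 5), (5, 6), (1, 1)], t=2.0))
# [0, 0, 1, 1, 0]
```

The result depends on the order of the instances and on the threshold: a smaller t produces more, tighter clusters.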
The distance between two points in the plane with coordinates (x, y)
and (a, b) is given by:
$\mathrm{EuclideanDistance}\big((x, y), (a, b)\big) = \sqrt{(x - a)^2 + (y - b)^2}$ (8)
Ensemble Clustering
Subspace Clustering
Subspace clustering finds clusters that exist in subspaces of
high-dimensional data. The methods can be classified into three groups:
1. Subspace search methods
2. Correlation-based clustering methods
3. Biclustering methods
A subspace search method searches the subspaces of the full space for
clusters (sets of instances that are similar to each other within a
subspace). It uses one of two strategies:
- Bottom-up approach: start from low-dimensional subspaces and
search higher-dimensional subspaces.
- Top-down approach: start with the full space and search smaller
subspaces recursively.
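The bottom-up strategy can be sketched as follows. This is an illustration under stated assumptions, not a method named in the slides: a subspace is considered interesting if some cell of a fixed-width grid over it contains at least `min_pts` instances, and a higher-dimensional candidate is examined only if all of its lower-dimensional projections were interesting. The names `has_dense_cell`, `bottom_up_search`, `width` and `min_pts` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def has_dense_cell(points, dims, width, min_pts):
    """True if some grid cell of the subspace `dims` holds at least min_pts instances."""
    cells = Counter(tuple(int(p[d] // width) for d in dims) for p in points)
    return max(cells.values()) >= min_pts

def bottom_up_search(points, width, min_pts):
    """Return every subspace (tuple of attribute indices) that contains a dense cell."""
    n_dims = len(points[0])
    # Start from the one-dimensional subspaces.
    level = [(d,) for d in range(n_dims) if has_dense_cell(points, (d,), width, min_pts)]
    found = list(level)
    while level:
        kept = set(level)
        size = len(level[0]) + 1
        attrs = sorted({d for s in level for d in s})
        # A candidate is worth checking only if all of its lower-dimensional
        # projections were dense (a dense cell stays dense when projected).
        candidates = [
            c for c in combinations(attrs, size)
            if all(sub in kept for sub in combinations(c, size - 1))
        ]
        level = [c for c in candidates if has_dense_cell(points, c, width, min_pts)]
        found.extend(level)
    return found

points = [(1.0, 1.2, 0.5), (1.1, 1.3, 3.5), (1.2, 1.1, 6.5), (1.3, 1.4, 9.5),
          (5.5, 8.2, 2.5), (8.1, 4.6, 12.5)]
print(bottom_up_search(points, width=1.0, min_pts=4))
# [(0,), (1,), (0, 1)]
```

Four of the instances are close together in attributes 0 and 1 but spread out in attribute 2, so the cluster is visible in the subspace (0, 1) and not in the full space.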
Weka Explorer