Machine Learning - Clustering

The document discusses various clustering methods in unsupervised learning, including K-Means, similarity-based, nearest neighbor, ensemble, and subspace clustering. It outlines the principles of clustering, its applications, and the requirements for effective clustering, as well as detailing different clustering algorithms and their characteristics. Additionally, it highlights the k-Means algorithm, its process, and its limitations.



Unsupervised Learning in Machine Learning

Prof. Dr. Dewan Md. Farid

Professor of Computer Science, United International University

June 29, 2024

Outline

Introduction

K-Means Clustering

Similarity-Based Clustering

Nearest Neighbor Clustering

Ensemble Clustering

Subspace Clustering


What is Clustering?

Clustering is the process of grouping a set of instances (data points, examples, or vectors) into clusters (subsets or groups) so that instances within a cluster have high similarity to one another but are very dissimilar to instances in other clusters.

Clustering may be found under different names in different contexts, such as:
- Unsupervised Learning
- Data Segmentation
- Automatic Classification
- Learning by Observation


What is Clustering? (cont.)

Figure: Clustering of a set of instances.

Similarities and dissimilarities of instances are based on the predefined features of the data. The most similar instances are grouped into a single cluster.


Area of Applications

Clustering has been widely used in many real-world applications, such as:
- Human genetic clustering
- Medical imaging clustering
- Market research
- Field robotics
- Crime analysis
- Pattern recognition


Clustering Instances

Let X be the unlabelled data set, that is,

X = {x1, x2, · · · , xN};   (1)

X is partitioned into k clusters, C1, · · · , Ck, so that the following conditions are met:

Ci ≠ ∅, i = 1, · · · , k;   (2)

∪_{i=1}^{k} Ci = X;   (3)

Ci ∩ Cj = ∅, i ≠ j, i, j = 1, · · · , k;   (4)
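To make the three conditions concrete, here is a minimal sketch (with assumed example data) that checks non-emptiness (2), coverage (3), and disjointness (4) for a candidate partition:

```python
# Minimal sketch: checking conditions (2)-(4) for a candidate clustering.
# The data set X and the clusters are assumed example values.
X = {1, 2, 3, 4, 5, 6}
clusters = [{1, 2}, {3, 4, 5}, {6}]

non_empty = all(len(c) > 0 for c in clusters)           # Eq. (2): no empty cluster
covers_X = set().union(*clusters) == X                  # Eq. (3): the clusters cover X
disjoint = all(clusters[i].isdisjoint(clusters[j])      # Eq. (4): pairwise disjoint
               for i in range(len(clusters))
               for j in range(i + 1, len(clusters)))

print(non_empty and covers_X and disjoint)  # True: a valid partition
```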


Requirements for Clustering

The goal of clustering is to group a set of unlabelled data. There are many typical requirements of clustering in machine learning and data mining, such as:
- Dealing with large data sets containing different types of attributes.
- Finding clusters of arbitrary shape.
- The ability to deal with noisy data in a data streaming environment.
- Handling high-dimensional data sets.
- Constraint-based clustering.


Types of Clustering Methods

The basic clustering methods are organised into four categories:
1. Partitioning methods
2. Hierarchical methods
3. Density-based methods
4. Grid-based methods


Partitioning Method

- The partitioning method constructs k clusters from the given set of N instances, where k ≤ N. It finds mutually exclusive clusters of spherical shape using traditional distance measures (e.g., Euclidean distance).
- To find the cluster center, it may use the mean or medoid and applies an iterative relocation technique that improves the clustering by moving instances from one cluster to another, as in k-means clustering.
- Partitioning algorithms are ineffective for clustering high-dimensional big data.


Hierarchical Method

The hierarchical methods create a hierarchical decomposition of the N instances. They can be divided into two categories:
1. the top-down (or divisive) approach
2. the bottom-up (or agglomerative) approach
The top-down approach starts with a single cluster holding all N instances and splits it into smaller clusters in each successive iteration, until each instance is in its own cluster or a termination condition holds.
The bottom-up approach starts with each instance forming a separate cluster and successively merges the clusters closest to one another, until all the clusters are merged into a single cluster or a termination condition holds.


Density-based method
The density-based methods cluster instances based on the density of instances rather than the distance between them, which allows them to find arbitrarily shaped clusters: clusters are dense regions in the data space separated by sparse regions.

Figure: Clustering of a set of instances using density-based clustering.


Grid-based method

The grid-based methods use a multi-resolution grid data structure. Their processing time is fast and typically independent of the number of instances, though dependent on the grid size.


Similarity Measure

A similarity measure (SM), sim(xi, xl), is defined between any two instances xi, xl ∈ X. Given an integer value k, the clustering problem is to define a mapping f : X → {1, · · · , k}, where each instance xi is assigned to one cluster Ci, 1 ≤ i ≤ k.

Given a cluster Ci, ∀ xil, xim ∈ Ci and xj ∉ Ci, sim(xil, xim) > sim(xil, xj).

A good clustering is one in which instances in the same cluster are “close” or related to each other, whereas instances of different clusters are “far apart” or very different from one another, and which satisfies the following requirements:
- Each cluster must contain at least one instance.
- Each instance must belong to exactly one cluster.


Distance Measure

A distance measure (DM), dis(xi, xl), where xi, xl ∈ X, is often used in clustering instead of a similarity measure. Consider the well-known Euclidean distance or Euclidean metric (i.e., straight-line distance) between two instances in Euclidean space, given in Eq. 5:

dis(xi, xl) = √( Σ_{j=1}^{m} (xij − xlj)² )   (5)

where xi = (xi1, xi2, · · · , xim) and xl = (xl1, xl2, · · · , xlm) are two instances in Euclidean m-space.
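A direct translation of Eq. 5 into code, as a minimal sketch (the example instances are assumed):

```python
import math

def euclidean(xi, xl):
    # Eq. 5: straight-line distance between two m-dimensional instances.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xl)))

# Two assumed instances in Euclidean 3-space:
print(euclidean((1.0, 2.0, 3.0), (4.0, 6.0, 3.0)))  # 5.0
```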


k-Means or c-Means

k-Means defines the centroid of a cluster Ci as the mean value of the instances {xi1, xi2, · · · , xiN} ∈ Ci. It proceeds as follows. First, it randomly selects k instances from X, each of which initially represents a cluster mean or center. Each of the remaining instances xi ∈ X is assigned to the cluster to which it is most similar, based on the Euclidean distance between the instance and the cluster mean. The algorithm then iteratively improves the within-cluster variation: for each cluster Ci, it computes a new mean using the instances assigned to that cluster in the previous iteration, and all instances xi ∈ X are then reassigned using the updated means as the new cluster centers. The iterations continue until the assignment is stable, that is, until the clusters formed in the current round are the same as those formed in the previous round.


Cluster Mean

A high degree of similarity among instances within a cluster is obtained while, at the same time, a high degree of dissimilarity among instances in different clusters is achieved. The cluster mean of Ci = {xi1, xi2, · · · , xiN} is defined in Eq. 6:

mean(Ci) = (1/N) Σ_{j=1}^{N} xij   (6)
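For instance (with assumed values), the feature-wise mean of a small cluster:

```python
# Eq. 6 on an assumed example cluster of three 2-D instances.
C_i = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
N = len(C_i)
centroid = tuple(sum(x[j] for x in C_i) / N for j in range(len(C_i[0])))
print(centroid)  # (3.0, 4.0)
```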


Algorithm 1 k-Means Clustering

Input: X = {x1, x2, · · · , xN} // a set of unlabelled instances
       k // the number of clusters
Output: A set of k clusters.
Method:
1: arbitrarily choose k instances from X as the initial cluster centers;
2: repeat
3:   (re)assign each xi ∈ X to the cluster whose mean xi is most similar to;
4:   update the cluster means, that is, recalculate the mean value of the instances in each cluster;
5: until no change
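The following is a minimal, self-contained sketch of Algorithm 1 (an illustration under simple assumptions, not the author's implementation; the sample data are made up):

```python
import math, random

def k_means(X, k, max_iter=100):
    # Minimal k-Means sketch following Algorithm 1. X is a list of numeric tuples.
    centers = random.sample(X, k)          # step 1: arbitrary initial centers
    assignment = None
    for _ in range(max_iter):
        # step 3: (re)assign each instance to its nearest center (Euclidean)
        new_assignment = [min(range(k), key=lambda c: math.dist(x, centers[c]))
                          for x in X]
        if new_assignment == assignment:   # step 5: stop when nothing changes
            break
        assignment = new_assignment
        # step 4: update each cluster mean from its assigned instances
        for c in range(k):
            members = [x for x, a in zip(X, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centers, assignment

centers, labels = k_means([(1, 1), (1.5, 2), (8, 8), (9, 9)], k=2)
print(labels)  # e.g. [0, 0, 1, 1] (cluster ids depend on the random start)
```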


Drawbacks of k-Means Clustering

k-Means clustering is not guaranteed to converge to the global optimum and often terminates at a local optimum (since the initial cluster means are assigned randomly). It may not be applicable in some settings, such as when the data have nominal features. The k-Means method is also not suitable for discovering clusters with non-convex shapes or clusters of very different sizes.
The time complexity of the k-Means algorithm is O(nkt), where n is the total number of instances, k is the number of clusters, and t is the number of iterations. Normally, k ≪ n and t ≪ n.


K-Means - An Example

Figure: Weather Numeric Data.


K-Means using Weka 3

Figure: SimpleKMeans on Weather Nominal Data.


Run Information

=== Run information ===


Scheme:weka.clusterers.SimpleKMeans -N 2 -A "weka.core.EuclideanDistance -R
first-last" -I 500 -S 10
Relation: weather.symbolic-weka.filters.unsupervised.attribute.Remove-R5
Instances: 14
Attributes: 4
outlook
temperature
humidity
windy
Test mode:evaluate on training data

=== Model and evaluation on training set ===

kMeans
======
Number of iterations: 4
Within cluster sum of squared errors: 21.000000000000004
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster#
Attribute Full Data 0 1
(14) (10) (4)
==============================================
outlook sunny sunny overcast
temperature mild mild cool
humidity high high normal
windy FALSE FALSE TRUE

Time taken to build model (full training data) : 0 seconds

=== Model and evaluation on training set ===


Clustered Instances
0 10 ( 71%)
1 4 ( 29%)


Weka Cluster Visualize

Figure: Clustering Weather Nominal Data.


k-Means: Another Example


Table: Height Data
Name Gender Height Output
Kristina F 1.6 m Short
Jim M 2 m Tall
Maggie F 1.9 m Medium
Martha F 1.88 m Medium
Stephanie F 1.7 m Short
Bob M 1.85 m Medium
Kathy F 1.6 m Short
Dave M 1.7 m Short
Worth M 2.2 m Tall
Steven M 2.1 m Tall
Debbie F 1.8 m Medium
Todd M 1.95 m Medium
Kim F 1.9 m Medium
Amy F 1.8 m Medium
Wynette F 1.75 m Medium


Similarity-Based Clustering

A similarity-based clustering method (SCM) is an effective and robust clustering approach based on the similarity of instances: it is robust to the initialisation of the cluster number and efficient at detecting clusters of different volumes. SCM clusters a data set so that the most similar instances fall in the same cluster and the most dissimilar instances fall in different clusters. The instances in SCM can self-organise a locally optimal cluster number and cluster volumes without using cluster validity functions.


Similarity between Instances

Let sim(xi, xl) be the similarity measure between instance xi and the lth cluster center xl. The goal is to find the centers xl that maximise the total similarity measure shown in Eq. 7:

Js(C) = Σ_{l=1}^{k} Σ_{i=1}^{N} f(sim(xi, xl))   (7)

where f(sim(xi, xl)) is a reasonable similarity measure and C = {C1, · · · , Ck}. In general, the similarity-based clustering method uses feature values to check the similarity between instances; however, any suitable distance measure can also be used.
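One plausible choice of sim (an assumption for illustration; the slides do not fix a particular measure) is the cosine similarity of the feature vectors:

```python
import math

def cosine_sim(xi, xl):
    # Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal ones.
    dot = sum(a * b for a, b in zip(xi, xl))
    norm = math.sqrt(sum(a * a for a in xi)) * math.sqrt(sum(b * b for b in xl))
    return dot / norm if norm else 0.0

print(cosine_sim((1, 0, 1), (1, 1, 1)))  # ~0.816
```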


Algorithm 2 Similarity-based Clustering

Input: X = {x1, x2, · · · , xN} // a set of unlabelled instances
Output: A set of clusters, C = {C1, C2, · · · , Ck}.
Method:
1: C = ∅;
2: k = 1;
3: Ck = {x1};
4: C = C ∪ Ck;
5: for i = 2 to N do
6:   for l = 1 to k do
7:     find the lth cluster center xl ∈ Cl that maximises the similarity measure sim(xi, xl);
8:   end for
9:   if sim(xi, xl) ≥ threshold value then
10:     Cl = Cl ∪ {xi};
11:   else
12:     k = k + 1;
13:     Ck = {xi};
14:     C = C ∪ Ck;
15:   end if
16: end for
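A minimal Python sketch of Algorithm 2, under the simplifying assumption that a cluster's first instance serves as its center (the threshold and similarity function are parameters):

```python
def scm(X, sim, threshold):
    clusters = [[X[0]]]            # steps 1-4: the first cluster holds x1
    for xi in X[1:]:               # step 5
        # steps 6-8: the cluster whose center is most similar to xi
        best = max(range(len(clusters)),
                   key=lambda l: sim(xi, clusters[l][0]))
        if sim(xi, clusters[best][0]) >= threshold:   # step 9
            clusters[best].append(xi)                 # step 10
        else:                                         # steps 11-14: new cluster
            clusters.append([xi])
    return clusters

# Usage with the cosine similarity sketched earlier (assumed example data):
# clusters = scm([(1, 0), (0.9, 0.1), (0, 1)], cosine_sim, threshold=0.9)
```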


SCM - An Example

Figure: Weather Nominal Data.


Nearest Neighbor (NN) Clustering

Instances are iteratively merged into the existing cluster that is closest. In NN clustering a threshold, t, is used to determine whether an instance will be added to an existing cluster or a new cluster is created. The complexity of the NN clustering algorithm depends on the number of instances in the data set: in each loop, each instance must be compared to every instance already in a cluster.
Thus, the time complexity of the NN clustering algorithm is O(n²). Since the distances between instances are needed repeatedly, we assume that the space requirement is also O(n²).


Algorithm 3 Nearest Neighbor Clustering

Input: D = {x1, x2, · · · , xn} // a set of instances
       A // adjacency matrix showing the distance between instances
Output: A set of clusters C.
Method:
1: C1 = {x1};
2: C = {C1};
3: k = 1;
4: for i = 2 to n do
5:   find xm in some cluster Cm in C such that dis(xi, xm) is the smallest;
6:   if dis(xi, xm) ≤ t, the threshold value, then
7:     Cm = Cm ∪ {xi};
8:   else
9:     k = k + 1;
10:     Ck = {xi};
11:     C = C ∪ Ck;
12:   end if
13: end for
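A self-contained sketch of Algorithm 3 (illustrative: it computes distances directly rather than reading the adjacency matrix A, and the example points are assumed):

```python
import math

def nn_clustering(D, t):
    clusters = [[D[0]]]                        # steps 1-3: first cluster from x1
    for xi in D[1:]:                           # step 4
        # step 5: nearest already-clustered instance xm across all clusters
        best, best_d = 0, float("inf")
        for idx, c in enumerate(clusters):
            for xm in c:
                d = math.dist(xi, xm)
                if d < best_d:
                    best, best_d = idx, d
        if best_d <= t:                        # step 6: within threshold t
            clusters[best].append(xi)          # step 7
        else:                                  # steps 8-11: open a new cluster
            clusters.append([xi])
    return clusters

print(nn_clustering([(0, 0), (0, 1), (5, 5), (5, 6)], t=2.0))
# [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
```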


Euclidean vs. Manhattan Distance

The distance between two points in the plane with coordinates (x, y) and (a, b) is given by:

Euclidean distance: dis((x, y), (a, b)) = √((x − a)² + (y − b)²)   (8)

Manhattan distance: dis((x, y), (a, b)) = |x − a| + |y − b|   (9)
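Both metrics computed on the same assumed pair of points, as a quick sketch:

```python
import math

def euclidean(p, q):   # Eq. 8: straight-line distance
    return math.hypot(p[0] - q[0], p[1] - q[1])

def manhattan(p, q):   # Eq. 9: sum of absolute coordinate differences
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

p, q = (1, 2), (4, 6)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
```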


Ensemble Clustering

Ensemble clustering is the process of integrating multiple clustering algorithms to form a single strong clustering approach that usually provides better clustering results. It generates a set of clusterings from a given unlabelled data set and then combines them into a final clustering to improve on the quality of the individual clusterings.
- No single cluster analysis method is optimal.
- Different clustering methods may produce different clusters because they impose different structures on the data set.
- Ensemble clustering performs more effectively on high-dimensional complex data.
- It is a good alternative when facing cluster analysis problems.


Ensemble Clustering (cont.)

Generally, three strategies are applied in ensemble clustering:
1. Using different clustering algorithms on the same data set to create heterogeneous clusters.
2. Using different samples/subsets of the data with different clustering algorithms to produce component clusters.
3. Running the same clustering algorithm many times on the same data set with different parameters or initialisations to create homogeneous clusters.
The main goal of ensemble clustering is to integrate the component clusterings into one final clustering with a higher accuracy.
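A minimal sketch of the combination step (an assumed illustration, not a specific published method): component clusterings vote through a co-association matrix, and instances co-assigned in most runs end up in the same final cluster.

```python
# Each row is one component clustering of the same five instances
# (assumed example labelings; label ids are arbitrary per run).
labelings = [
    [0, 0, 1, 1, 1],   # e.g. k-Means, run 1
    [1, 1, 0, 0, 0],   # k-Means, run 2: labels permuted, same grouping
    [0, 0, 0, 1, 1],   # a different algorithm disagreeing on instance 2
]
n, runs = 5, len(labelings)
# co[i][j]: fraction of runs in which instances i and j share a cluster
co = [[sum(lab[i] == lab[j] for lab in labelings) / runs
       for j in range(n)] for i in range(n)]
print(co[0][1])  # 1.0   -> always together
print(co[2][3])  # ~0.67 -> together in two of three runs
```

Pairs whose co-association exceeds a chosen threshold are merged into the final clusters, which side-steps the arbitrary label ids of the individual runs.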


Subspace Clustering
Subspace clustering finds subspace clusters in high-dimensional data. It can be classified into three groups:
1. Subspace search methods
2. Correlation-based clustering methods
3. Biclustering methods
A subspace search method searches various subspaces for clusters (sets of instances that are similar to each other in a subspace) within the full space. It uses two kinds of strategies:
- Bottom-up approach: start from low-dimensional subspaces and search higher-dimensional subspaces.
- Top-down approach: start with the full space and search smaller subspaces recursively.


Subspace Clustering (cont.)

A correlation-based approach uses space transformation methods to derive a set of new, uncorrelated dimensions and then mines clusters in the new space or its subspaces. Examples include PCA-based approaches (principal components analysis), the Hough transform, and fractal dimensions.
Biclustering methods cluster both instances and features simultaneously, where cluster analysis involves searching data matrices for sub-matrices that show unique patterns as clusters.
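A tiny sketch of the correlation-based (PCA) idea under assumed toy data: derive uncorrelated dimensions, project, and then cluster in the reduced space.

```python
import numpy as np

# Assumed toy data: two correlated features (the second ~ twice the first).
X = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [8.0, 16.1], [9.0, 17.8]])
Xc = X - X.mean(axis=0)                            # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # principal directions
Z = Xc @ Vt.T[:, :1]                               # project onto the first component
print(Z.ravel())  # 1-D coordinates; cluster these (e.g., with k-Means)
```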


Weka 3: Data Mining Software in Java

Weka (Waikato Environment for Knowledge Analysis) is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.

Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations, Volume 11, Issue 1.


Clustering Algorithms in Weka 3


1. SimpleKMeans - Cluster using the k-Means method.
2. XMeans - Extension of k-Means.
3. DBScan - Nearest-neighbor-based clustering that automatically
determines the number of clusters.
4. OPTICS - Extension of DBScan to hierarchical clustering.
5. HierarchicalClusterer - Agglomerative hierarchical clustering.
6. MakeDensityBasedCluster - Wrap a clusterer to make it return
distribution and density.
7. EM - Cluster using expectation maximization.
8. CLOPE - Fast clustering of transactional data.
9. Cobweb - Implements the Cobweb and Classit clustering algorithms.
10. FarthestFirst - Cluster using the farthest first traversal algorithm.
11. FilteredClusterer - Runs a clusterer on filtered data.
12. sIB - Cluster using the sequential information bottleneck algorithm.

Weka GUI Chooser

Figure: Weka GUI Chooser.


Weka Explorer

Figure: Weka Explorer.



Clustering using Weka

Figure: Cluster - Weka Explorer.


Reference Books

1. Data Mining: Concepts and Techniques, by Jiawei Han, Micheline Kamber, and Jian Pei (Third Edition)
2. Data Mining: Practical Machine Learning Tools and Techniques, by Ian H. Witten, Eibe Frank, and Mark A. Hall (Third Edition)
3. Data Mining Knowledge Discovery and Applications, edited by Adem Karahoca
4. Mining Complex Data, by Djamel A. Zighed, Shusaku Tsumoto, Zbigniew W. Ras, and Hakim Hacid


*** THANK YOU ***

