ICS 2408 Lecture 7 Clustering

The document discusses the major types of clustering methods. It describes clustering as the process of grouping similar data objects into clusters. The main methods discussed are partitioning methods such as k-means, which create non-overlapping clusters; hierarchical methods such as agglomerative clustering, which create nested clusters; density-based methods, which find clusters based on density connections; grid-based methods, which operate on multi-level spatial data structures; and model-based methods, which fit a statistical model to each cluster.


Clustering

 What is Cluster Analysis?
 A Categorization of Major Clustering Methods
 Partitioning Methods
 Hierarchical Methods
 Density-Based Methods
 Grid-Based Methods
 Model-Based Clustering Methods
 Outlier Analysis

February 19, 2024 Moso J : Dedan Kimathi University 1


What is Cluster Analysis?
 Cluster: a collection of data objects
 Similar to one another within the same cluster
 Dissimilar to the objects in other clusters
 Cluster analysis
 Finding similarities between data according to the characteristics
found in the data and grouping similar data objects into clusters
 Unsupervised learning: no predefined classes
 Typical applications
 As a stand-alone tool to get insight into data distribution
 As a preprocessing step for other algorithms



General Applications of Clustering
 Pattern Recognition
 Spatial Data Analysis
 Create thematic maps in GIS by clustering feature spaces
 Detect spatial clusters or for other spatial mining tasks
 Image Processing
 Economic Science (especially market research)
 WWW
 Document classification
 Cluster Weblog data to discover groups of similar access patterns



Examples of Clustering Applications
 Marketing: Help marketers discover distinct groups in their customer bases, and
then use this knowledge to develop targeted marketing programs
 Land use: Identification of areas of similar land use in an earth observation
database
 Insurance: Identifying groups of motor insurance policy holders with a high average
claim cost
 City-planning: Identifying groups of houses according to their house type, value,
and geographical location
 Earthquake studies: Observed earthquake epicenters should be clustered along
continent faults



What Is Good Clustering?

 A good clustering method will produce high-quality clusters with
 high intra-class similarity
 low inter-class similarity
 The quality of a clustering result depends on both the
similarity measure used by the method and its implementation
 The quality of a clustering method is also measured by its
ability to discover some or all of the hidden patterns
Requirements of Clustering in Data Mining
 Scalability
 Ability to deal with different types of attributes
 Discovery of clusters with arbitrary shape
 Minimal requirements for domain knowledge to determine input
parameters
 Able to deal with noise and outliers
 Insensitive to order of input records
 High dimensionality
 Incorporation of user-specified constraints
 Interpretability and usability



Measure the Quality of Clustering
 Dissimilarity/Similarity metric: Similarity is expressed in terms of a
distance function, typically metric: d(i, j)
 There is a separate “quality” function that measures the “goodness”
of a cluster.
 The definitions of distance functions are usually very different for
interval-scaled, boolean, categorical, ordinal, ratio, and vector
variables.
 Weights should be associated with different variables based on
applications and data semantics.
 It is hard to define “similar enough” or “good enough”
 the answer is typically highly subjective.
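For interval-scaled variables, the distance function d(i, j) is typically a Minkowski metric; a minimal pure-Python sketch (the function name `minkowski` is illustrative):

```python
def minkowski(x, y, p=2):
    """Minkowski distance d(i, j) between two numeric vectors.

    p=2 gives the Euclidean distance, p=1 the Manhattan distance."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

# Euclidean distance between (0, 0) and (3, 4) is 5.
d_euclid = minkowski((0, 0), (3, 4), p=2)
# Manhattan distance between the same points is 7.
d_manhattan = minkowski((0, 0), (3, 4), p=1)
```

Attribute weights, when required by the application, can be folded in by multiplying each `abs(a - b) ** p` term by a per-variable weight.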



Type of data in clustering analysis

 Interval-scaled variables
 Binary variables
 Nominal, ordinal, and ratio variables
 Variables of mixed types



Major Clustering Approaches (I)

 Partitioning approach:
 Construct various partitions and then evaluate them by some criterion, e.g.,
minimizing the sum of square errors
 Typical methods: k-means, k-medoids, CLARANS
 Hierarchical approach:
 Create a hierarchical decomposition of the set of data (or objects) using some
criterion
 Typical methods: DIANA, AGNES, BIRCH, ROCK, CHAMELEON
 Density-based approach:
 Based on connectivity and density functions
 Typical methods: DBSCAN, OPTICS, DenClue



Major Clustering Approaches (II)

 Grid-based approach:
 based on a multiple-level granularity structure
 Typical methods: STING, WaveCluster, CLIQUE
 Model-based:
 A model is hypothesized for each of the clusters, and the method tries to find the best fit of
that model to the data
 Typical methods: EM, SOM, COBWEB
 Frequent pattern-based:
 Based on the analysis of frequent patterns
 Typical methods: pCluster
 User-guided or constraint-based:
 Clustering by considering user-specified or application-specific constraints
 Typical methods: COD (obstacles), constrained clustering
Partitioning Algorithms: Basic Concept

 Partitioning method: Construct a partition of a database D of n
objects into a set of k clusters.
 Given a k, find a partition of k clusters that optimizes the chosen
partitioning criterion
 Global optimal: exhaustively enumerate all partitions
 Heuristic methods: k-means and k-medoids algorithms
 k-means (MacQueen’67): Each cluster is represented by the
center of the cluster
 k-medoids or PAM (Partition around medoids) (Kaufman &
Rousseeuw’87): Each cluster is represented by one of the
objects in the cluster
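The k-means heuristic alternates two steps: assign each object to its nearest center, then recompute each center as the mean of its cluster. A minimal pure-Python sketch (the function name, random initialisation, and fixed iteration count are illustrative simplifications, not MacQueen's original formulation):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd-style k-means on a list of numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # pick k initial centers
    clusters = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = tuple(sum(col) / len(cl) for col in zip(*cl))
    return centers, clusters
```

On two well-separated blobs the loop converges to one center per blob; the quality criterion being minimised is the sum of squared errors mentioned earlier.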



Hierarchical Clustering

 A hierarchical clustering method works by grouping
objects into a tree of clusters.
 Hierarchical clustering methods can be further classified
as either agglomerative or divisive, depending on
whether the hierarchical decomposition is formed in a
bottom-up (merging) or top-down (splitting) fashion.



Hierarchical Clustering: Agglomerative

 This bottom-up strategy starts by placing each object in its own
cluster and then merges these atomic clusters into larger and
larger clusters, until all of the objects are in a single cluster or until
certain termination conditions are satisfied.
 Method:
 Start with partition Pn, where each object forms its own cluster.
 Merge the two closest clusters, obtaining Pn-1.
 Repeat merge until only one cluster is left or termination condition
is satisfied.
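The merge loop above can be sketched with single linkage, where the distance between two clusters is the distance between their closest members (the function name and the `target_k` stopping condition are illustrative; AGNES also supports other linkage criteria):

```python
def single_linkage(points, target_k=1):
    """AGNES-style agglomerative clustering with single linkage."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    clusters = [[p] for p in points]      # partition Pn: one cluster per object
    while len(clusters) > target_k:
        # Find the pair of clusters with the smallest single-link distance.
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: min(dist2(a, b)
                               for a in clusters[ij[0]]
                               for b in clusters[ij[1]]),
        )
        clusters[i].extend(clusters.pop(j))   # merge, obtaining the next partition
    return clusters
```

Each pass through the loop performs one merge step, so running it to `target_k=1` traces the full dendrogram from Pn down to P1.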



Hierarchical Clustering: Divisive (DIANA)

 This top-down strategy does the reverse of agglomerative
hierarchical clustering by starting with all objects in one cluster. It
subdivides the clusters into smaller and smaller pieces, until each
object forms a cluster on its own or until it satisfies certain termination
conditions, such as a desired number of clusters or the diameter of
each cluster being within a certain threshold.
 Method:
 Start with P1.
 Split the collection into two clusters that are as homogenous (and as
different from each other) as possible.
 Apply splitting procedure recursively to the clusters.
Hierarchical Clustering

 Example: A data set has five objects {a, b, c, d, e}
 AGNES (Agglomerative Nesting)
 DIANA (Divisive Analysis)

[Dendrogram: reading left to right (steps 0-4), AGNES merges a and b into ab, d and e into de, c and de into cde, and finally ab and cde into abcde; reading right to left over the same steps, DIANA performs the corresponding splits top-down.]



Density-Based Clustering Methods

 Clustering based on density (local cluster criterion), such as density-
connected points
 Major features:
 Discover clusters of arbitrary shape
 Handle noise
 One scan
 Need density parameters as termination condition
 Several interesting studies:
 DBSCAN: Ester, et al. (KDD'96)
 OPTICS: Ankerst, et al. (SIGMOD'99)
 DENCLUE: Hinneburg & D. Keim (KDD'98)
 CLIQUE: Agrawal, et al. (SIGMOD'98) (more grid-based)
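As a rough illustration of the density idea, a minimal DBSCAN-style sketch: points with at least `min_pts` neighbours within radius `eps` are core points, clusters grow outward from them, and points reachable from no core point are labelled noise. This naive version recomputes neighbourhoods and is quadratic, unlike the index-supported original:

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch; returns one label per input point (-1 = noise)."""
    def neighbours(i):
        return [j for j in range(len(points))
                if sum((a - b) ** 2
                       for a, b in zip(points[i], points[j])) <= eps ** 2]

    labels = [None] * len(points)          # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1                 # noise (may become a border point later)
            continue
        labels[i] = cluster                # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:         # j is core too: keep expanding
                queue.extend(nb)
        cluster += 1
    return labels
```

The `eps` and `min_pts` arguments are the density parameters the slide lists as a termination condition.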



Grid-Based Clustering Method

 Using multi-resolution grid data structure
 Several interesting methods
 STING (a STatistical INformation Grid approach) by Wang, Yang
and Muntz (1997)
 WaveCluster by Sheikholeslami, Chatterjee, and Zhang (VLDB’98)
 A multi-resolution clustering approach using wavelet method
 CLIQUE: Agrawal, et al. (SIGMOD’98)
 On high-dimensional data



Model-Based Clustering

 What is model-based clustering?
 Attempt to optimize the fit between the given data and some
mathematical model
 Based on the assumption that data are generated by a mixture of
underlying probability distributions
 Typical methods
 Statistical approach
 EM (Expectation Maximization), AutoClass
 Machine learning approach
 COBWEB, CLASSIT
 Neural network approach
 SOM (Self-Organizing Feature Map)
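The EM idea can be sketched for a one-dimensional mixture of two Gaussians: the E-step computes each point's responsibility under the current model, and the M-step re-fits the means, variances, and mixing weights from those responsibilities. A simplified sketch (the min/max initialisation and the small variance floor are ad-hoc choices for the illustration):

```python
import math

def em_1d(data, iters=50):
    """EM for a 1-D mixture of two Gaussians; returns (means, variances, weights)."""
    mu = [min(data), max(data)]            # crude initialisation
    var = [1.0, 1.0]
    w = [0.5, 0.5]

    def pdf(x, m, v):
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        r = []
        for x in data:
            p = [w[k] * pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(p)
            r.append([pk / s for pk in p])
        # M-step: re-estimate parameters from responsibility-weighted points.
        for k in range(2):
            nk = sum(ri[k] for ri in r)
            mu[k] = sum(ri[k] * x for ri, x in zip(r, data)) / nk
            var[k] = (sum(ri[k] * (x - mu[k]) ** 2
                          for ri, x in zip(r, data)) / nk) + 1e-6
            w[k] = nk / len(data)
    return mu, var, w
```

Unlike k-means, each point gets a soft (fractional) assignment to every cluster rather than a hard label.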



Clustering High-Dimensional Data

 Clustering high-dimensional data
 Many applications: text documents, DNA micro-array data
 Major challenges:
 Many irrelevant dimensions may mask clusters
 Distance measure becomes meaningless—due to equi-distance
 Clusters may exist only in some subspaces
 Methods
 Feature transformation: only effective if most dimensions are relevant
 PCA & SVD useful only when features are highly correlated/redundant
 Feature selection: wrapper or filter approaches
 useful to find a subspace where the data have nice clusters
 Subspace-clustering: find clusters in all the possible subspaces
 CLIQUE, ProClus, and frequent pattern-based clustering



The Curse of Dimensionality

 Data in only one dimension is relatively packed
 Adding a dimension "stretches" the points across that dimension, making them
further apart
 Adding more dimensions will make the points further apart—high dimensional data
is extremely sparse
 Distance measure becomes meaningless—due to equi-distance

(graphs adapted from Parsons et al., KDD Explorations 2004)
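The equi-distance effect can be checked numerically: for random points in the unit hypercube, the ratio of the farthest to the nearest distance from the origin shrinks toward 1 as the dimensionality grows. A small sketch (the `contrast` function and its parameters are illustrative):

```python
import random

def contrast(dim, n=200, seed=1):
    """Farthest/nearest distance ratio from the origin for n random
    points in [0, 1]^dim; approaches 1 as dim grows, i.e. all points
    become roughly equi-distant."""
    rng = random.Random(seed)
    dists = [sum(rng.random() ** 2 for _ in range(dim)) ** 0.5
             for _ in range(n)]
    return max(dists) / min(dists)
```

In 2 dimensions the ratio is large (some points land near the origin, others far away); in 200 dimensions the distances concentrate tightly around their mean.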
What Is Outlier Discovery (analysis)?

 What are outliers?
 A set of objects that are considerably dissimilar from the
remainder of the data
 Problem: Define and find outliers in large data sets
 Applications:
 Credit card fraud detection
 Telecom fraud detection
 Customer segmentation
 Medical analysis



Outlier Discovery: Statistical Approaches

 Assume a model of the underlying distribution that
generates the data set (e.g. a normal distribution)
 Use discordancy tests depending on
 data distribution
 distribution parameters (e.g., mean, variance)
 number of expected outliers
 Drawbacks
 most tests are for a single attribute
 in many cases, the data distribution may not be known



Outlier Discovery: Distance-Based Approach

 Introduced to counter the main limitations imposed by statistical
methods
 We need multi-dimensional analysis without knowing the data
distribution
 Distance-based outlier: A DB(p, D)-outlier is an object O in a dataset
T such that at least a fraction p of the objects in T lies at a distance
greater than D from O
 Algorithms for mining distance-based outliers
 Index-based algorithm
 Nested-loop algorithm
 Cell-based algorithm
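The nested-loop idea can be sketched directly from the DB(p, D) definition (this sketch excludes O itself when counting the fraction p, which is one possible reading of the definition; the function name is illustrative):

```python
def db_outliers(points, p, D):
    """Nested-loop sketch of DB(p, D)-outliers: O is an outlier if at
    least a fraction p of the other objects lie at distance > D from O."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    outliers = []
    for o in points:
        # Count how many other objects lie farther than D from o.
        far = sum(1 for q in points if q is not o and dist(o, q) > D)
        if far >= p * (len(points) - 1):
            outliers.append(o)
    return outliers
```

This is the quadratic nested-loop algorithm from the list above; the index-based and cell-based variants reduce the cost of the inner distance scan.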



References (1)

 R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high
dimensional data for data mining applications. SIGMOD'98.
 M. R. Anderberg. Cluster Analysis for Applications. Academic Press, 1973.
 M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander. Optics: Ordering points to identify the clustering
structure, SIGMOD’99.
 P. Arabie, L. J. Hubert, and G. De Soete. Clustering and Classification. World Scientific, 1996.
 M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large
spatial databases. KDD'96.
 M. Ester, H.-P. Kriegel, and X. Xu. Knowledge discovery in large spatial databases: Focusing techniques
for efficient class identification. SSD'95.
 D. Fisher. Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2:139-172,
1987.
 D. Gibson, J. Kleinberg, and P. Raghavan. Clustering categorical data: An approach based on dynamic
systems. In Proc. VLDB’98.
 S. Guha, R. Rastogi, and K. Shim. Cure: An efficient clustering algorithm for large databases.
SIGMOD'98.
 A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.



References (2)

 L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: an Introduction to Cluster Analysis. John Wiley
& Sons, 1990.
 E. Knorr and R. Ng. Algorithms for mining distance-based outliers in large datasets. VLDB’98.
 G. J. McLachlan and K. E. Basford. Mixture Models: Inference and Applications to Clustering. John Wiley
and Sons, 1988.
 P. Michaud. Clustering techniques. Future Generation Computer systems, 13, 1997.
 R. Ng and J. Han. Efficient and effective clustering method for spatial data mining. VLDB'94.
 E. Schikuta. Grid clustering: An efficient hierarchical clustering method for very large data sets. Proc.
1996 Int. Conf. on Pattern Recognition, 101-105.
 G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution clustering approach for
very large spatial databases. VLDB’98.
 W. Wang, J. Yang, and R. Muntz. STING: A statistical information grid approach to spatial data mining.
VLDB'97.
 T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH : an efficient data clustering method for very large
databases. SIGMOD'96.

