Data Mining-Partitioning Methods

Major clustering approaches include partitioning, hierarchical, density-based, grid-based, and model-based methods. Partitioning approaches like k-means and k-medoids iteratively assign objects to clusters to minimize distances between objects and cluster centers or medoids. K-medoids is more robust to outliers than k-means but does not scale well to large datasets, leading to methods like CLARA and CLARANS that apply k-medoids to samples of the data.

Uploaded by

Raj Endran

CLUSTERING PARTITIONING METHODS

Major Clustering Approaches

Partitioning approach:

- Construct k partitions (k <= n) and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
- Each group has at least one object, and each object belongs to exactly one group
- Iterative relocation technique: objects are moved between groups to improve the partitioning
- Avoid exhaustive enumeration by storing the centroids
- Typical methods: k-means, k-medoids, CLARANS

Hierarchical approach:

- Create a hierarchical decomposition of the set of data (or objects) using some criterion
- Agglomerative (bottom-up) vs. divisive (top-down)
- Rigid: once a merge or split has been performed, it cannot be undone
- Possible improvements: perform an analysis of object linkages, or integrate with iterative relocation
- Typical methods: DIANA, AGNES, BIRCH

Density-based methods:

- Distance-based methods tend to find only spherical clusters
- Density: for each data point within a given cluster, the neighbourhood should contain at least a minimum number of points
- Typical methods: DBSCAN, OPTICS

Grid-based methods:

- The object space is quantized into a finite number of cells that form a grid structure
- Fast processing time
- Typical methods: STING, WaveCluster, CLIQUE

Model-based:

- A model is hypothesized for each of the clusters, and the method tries to find the best fit of the data to that model
- Typical methods: EM, COBWEB

Frequent pattern-based:

- Based on the analysis of frequent patterns
- Typical methods: pCluster

User-guided or constraint-based:

- Clustering by considering user-specified or application-specific constraints
- Typical methods: COD, constrained clustering

Partitioning Algorithms: Basic Concept

Partitioning method: construct a partition of a database D of n objects into a set of k clusters such that the sum of squared distances is minimized.

Given a k, find a partition of k clusters that optimizes the chosen partitioning criterion:

- Global optimum: exhaustively enumerate all partitions
- Heuristic methods: the k-means and k-medoids algorithms
- k-means: each cluster is represented by the center of the cluster
- k-medoids or PAM (Partitioning Around Medoids): each cluster is represented by one of the objects in the cluster
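
The sum-of-squared-distances criterion can be written out explicitly. This is a sketch in notation that the slides do not define: C_i is assumed to denote the i-th cluster, c_i its representative (the mean for k-means), and p an object in the cluster.

```latex
E = \sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - c_i \rVert^{2}
```

A smaller E corresponds to more compact clusters around their representatives.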

K-Means Clustering Method

Given k, the k-means algorithm is implemented in 4 steps:

1. Partition the objects into k non-empty subsets.
2. Compute seed points as the centroids of the clusters of the current partition. The centroid is the center (mean point) of the cluster.
3. Assign each object to the cluster with the nearest seed point.
4. Go back to Step 2; stop when there are no more new assignments.

The k-means algorithm is implemented as below:

Input: number of clusters k, database of n objects
Output: set of k clusters that minimizes the squared error

Choose k objects as the initial cluster centers
Repeat
    (Re)assign each object to the cluster to which the object is most similar, based on the mean value of the objects in the cluster
    Update the cluster means
Until no change
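
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the `max_iter` safety cap, the fixed random seed, and the sample points are all assumptions made for the example, and squared Euclidean distance is assumed as the similarity measure.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Choose k objects as the initial cluster centers.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # (Re)assign each object to the cluster with the nearest center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Stop when no assignment changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update the cluster means (an empty cluster keeps its old center).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = mean_point(members)
    return centers, assignment

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = k_means(points, k=2)
print(centers, labels)
```

Each iteration touches every object once per center, which is where the O(tkn) cost of the method comes from.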

K-Means Method

Strength: relatively efficient, O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations. Normally, k, t << n.

Comment: often terminates at a local optimum. The global optimum may be found using techniques such as deterministic annealing and genetic algorithms.

Weaknesses:

- Applicable only when the mean is defined, which is a problem for categorical data
- Need to specify k, the number of clusters, in advance
- Unable to handle noisy data and outliers
- Not suitable for discovering clusters with non-convex shapes

Variations of the K-Means Method

A few variants of k-means differ in:

- Selection of the initial k means
- Dissimilarity calculations
- Strategies to calculate cluster means

Handling categorical data: k-modes

- Replaces the means of clusters with modes
- Uses new dissimilarity measures to deal with categorical objects
- For a mixture of categorical and numerical data: the k-prototype method

Expectation Maximization

- Assigns objects to clusters based on the probability of membership

Scalability of k-means

- Regions of the data are identified as compressible, discardable, or to be maintained in main memory
- Compressed regions are summarized by clustering features
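
To make the k-modes idea concrete, the sketch below shows the two pieces that replace the mean and the Euclidean distance. It assumes simple matching (the count of attributes on which two objects differ) as the dissimilarity measure. The function names and the sample records are hypothetical.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(objects):
    """Attribute-wise most frequent value; plays the role of the cluster mean."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*objects))

# Hypothetical cluster of categorical objects (colour, size).
cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
mode = cluster_mode(cluster)
print(mode)
print(matching_dissimilarity(("blue", "large"), mode))
```

With these two functions substituted for the mean and the distance, the k-means loop itself can stay unchanged.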

Problem of the K-Means Method

The k-means algorithm is sensitive to outliers, since an object with an extremely large value may substantially distort the distribution of the data.

K-Medoids: instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used, which is the most centrally located object in a cluster.
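
A small numeric illustration of this sensitivity follows. The values are made up, and the medoid is taken here to be the object with the smallest total distance to all other objects in the cluster.

```python
def mean(values):
    return sum(values) / len(values)

def medoid(values):
    """Object with the smallest total distance to all objects in the cluster."""
    return min(values, key=lambda v: sum(abs(v - w) for w in values))

cluster = [1, 2, 3, 4, 5]
with_outlier = cluster + [100]  # one extreme value added

print(mean(cluster), medoid(cluster))
print(mean(with_outlier), medoid(with_outlier))
```

The single outlier drags the mean far away from the bulk of the data, while the medoid stays at one of the original central objects.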

K-Medoids Clustering Method

PAM (Partitioning Around Medoids)

- Starts from an initial set of medoids and iteratively replaces one of the medoids by one of the non-medoids if it improves the total distance of the resulting clustering
- All pairs are analyzed for replacement
- PAM works effectively for small data sets, but does not scale well for large data sets
- Alternatives for large data sets: CLARA, CLARANS

K-Medoids

Input: k, and database of n objects
Output: a set of k clusters
Method:

Arbitrarily choose k objects as the initial medoids
Repeat
    Assign each remaining object to the cluster with the nearest medoid
    Randomly select a non-medoid object o_random
    Compute the total cost S of swapping a medoid o_j with o_random
    If S < 0, swap o_j with o_random to form the new set of k medoids
Until no change
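
The sketch below is one possible reading of this method, not the definitive PAM. Instead of picking a single random non-medoid per round, it tries every (medoid, non-medoid) pair and accepts any swap that lowers the total cost, which matches the "all pairs are analyzed" description. The function names, the Manhattan distance, the seed, and the sample points are assumptions for the example.

```python
import random

def total_cost(points, medoids, dist):
    """Sum over all objects of the distance to the nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(points, k, dist, seed=0):
    rng = random.Random(seed)
    # Arbitrarily choose k objects as the initial medoids.
    medoids = rng.sample(points, k)
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:  # "until no change"
        improved = False
        for j in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                # Replace the j-th medoid by the candidate non-medoid.
                trial = medoids[:j] + [candidate] + medoids[j + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:  # swap cost S = cost - best is negative
                    medoids, best = trial, cost
                    improved = True
    # Assign each object to the cluster of its nearest medoid.
    clusters = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: dist(p, m))].append(p)
    return medoids, clusters

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
medoids, clusters = pam(points, 2, manhattan)
print(medoids)
```

Every pass recomputes the cost for each candidate swap over all objects, which is why this approach becomes expensive as n grows.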

K-medoids: reassignment cases

When a medoid o_j is replaced by o_random, each non-medoid object p falls into one of four cases:

Case 1: p currently belongs to medoid o_j. If o_j is replaced by o_random as a medoid and p is closest to one of the o_i, where i != j, then p is reassigned to o_i.

Case 2: p currently belongs to medoid o_j. If o_j is replaced by o_random as a medoid and p is closest to o_random, then p is reassigned to o_random.

Case 3: p currently belongs to medoid o_i (i != j). If o_j is replaced by o_random as a medoid and p is still closest to o_i, then the assignment does not change.

Case 4: p currently belongs to medoid o_i (i != j). If o_j is replaced by o_random as a medoid and p is closest to o_random, then p is reassigned to o_random.

K-medoids: cost of a swap

After each reassignment, the difference in the error criterion E is calculated. The total cost of swapping is the sum of the costs incurred by all non-medoid objects.

If the total cost is negative, o_j is replaced with o_random, since E will be reduced.
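
The swap cost can be worked through on a small example. The sketch below sums, over all objects, the change in distance to the nearest medoid when one medoid is exchanged for a non-medoid; medoids that stay in place contribute zero. The function names, the 1-D data, and the absolute-difference distance are assumptions for illustration.

```python
def nearest_distance(p, medoids, dist):
    return min(dist(p, m) for m in medoids)

def swap_cost(points, medoids, o_j, o_random, dist):
    """Total cost S of replacing medoid o_j by non-medoid o_random."""
    new_medoids = [o_random if m == o_j else m for m in medoids]
    return sum(
        nearest_distance(p, new_medoids, dist) - nearest_distance(p, medoids, dist)
        for p in points
    )

def abs_dist(a, b):
    return abs(a - b)

# Hypothetical 1-D data with two groups: {1, 2, 3} and {10, 11, 12}.
points = [1, 2, 3, 10, 11, 12]

# Both medoids start in the low group; moving one to the high group helps.
s_good = swap_cost(points, [1, 2], o_j=2, o_random=11, dist=abs_dist)
# Undoing that swap makes the clustering worse by the same amount.
s_bad = swap_cost(points, [1, 11], o_j=11, o_random=2, dist=abs_dist)
print(s_good, s_bad)  # -23 23
```

Only the first swap has a negative total cost, so only that one would be accepted.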

Problem with PAM

- PAM is more robust than k-means in the presence of noise and outliers, because a medoid is less influenced by outliers or other extreme values than a mean
- PAM works efficiently for small data sets but does not scale well for large data sets

CLARA (Clustering LARge Applications)

- Choose a representative sample of the data
- Choose the medoids from this sample and cluster the data around them
- Draw multiple such samples and apply PAM on each
- Return the best clustering
- Effectiveness depends on the sample size
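
A compact sketch of the sampling idea follows. It assumes a simple greedy PAM on each sample and scores every candidate set of medoids against the full data set; the parameter names `n_samples` and `sample_size`, their defaults, and the 1-D data are hypothetical choices for the example.

```python
import random

def total_cost(points, medoids, dist):
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(points, k, dist, rng):
    """Greedy PAM: accept any single-medoid swap that lowers the cost."""
    medoids = rng.sample(points, k)
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for j in range(k):
            for cand in points:
                if cand in medoids:
                    continue
                trial = medoids[:j] + [cand] + medoids[j + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids

def clara(points, k, dist, n_samples=5, sample_size=20, seed=0):
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        # Run PAM on a random sample only.
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k, dist, rng)
        # Judge the medoids on the full data set, not just the sample.
        cost = total_cost(points, medoids, dist)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost

def abs_dist(a, b):
    return abs(a - b)

# Hypothetical 1-D data: two groups of 50 objects, far apart.
points = list(range(0, 50)) + list(range(10000, 10050))
medoids, cost = clara(points, 2, abs_dist)
print(sorted(medoids), cost)
```

Because PAM only ever sees the sample, the medoids can be no better than the best objects that happen to be drawn, which is why the result depends on the sample size.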

CLARANS (Clustering Large Applications based on RANdomized Search)

- Uses sampling and PAM
- Does not restrict itself to any particular sample
- Performs a graph search in which each node is a potential solution (a set of k medoids)
- The clustering obtained after replacing a single medoid is a neighbor of the current clustering
- The number of neighbors to be tried is limited
- Moves to a better neighbor when one is found
- A silhouette coefficient can be used to find the most natural number of clusters
- Complexity: O(n^2)
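
The randomized neighbor search can be sketched as below. This is an illustration under assumptions, not the published algorithm verbatim: the parameter names `num_local` (number of restarts) and `max_neighbor` (limit on consecutive neighbors tried without improvement), their defaults, and the 1-D data are chosen for the example.

```python
import random

def total_cost(points, medoids, dist):
    return sum(min(dist(p, m) for m in medoids) for p in points)

def clarans(points, k, dist, num_local=3, max_neighbor=50, seed=0):
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(num_local):
        # Start from a random node: an arbitrary set of k medoids.
        current = rng.sample(points, k)
        current_cost = total_cost(points, current, dist)
        tried = 0
        while tried < max_neighbor:
            # A neighbor differs from the current node in exactly one medoid.
            j = rng.randrange(k)
            cand = rng.choice(points)
            if cand in current:
                tried += 1
                continue
            neighbor = current[:j] + [cand] + current[j + 1:]
            cost = total_cost(points, neighbor, dist)
            if cost < current_cost:
                # Move to the better neighbor and reset the counter.
                current, current_cost, tried = neighbor, cost, 0
            else:
                tried += 1
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost

def abs_dist(a, b):
    return abs(a - b)

# Hypothetical 1-D data: two groups of 50 objects, far apart.
points = list(range(0, 50)) + list(range(10000, 10050))
medoids, cost = clarans(points, 2, abs_dist)
print(sorted(medoids), cost)
```

Unlike CLARA, every candidate medoid is drawn from the whole data set, so the search is not confined to one fixed sample.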
