Partitioning Algorithms

Uploaded by

Pradeep ravikumar

Data Mining

(19ADOCN1001)
Mr.M.VijayaKumar, AP/AI&DS

19ADCN1303 - Data Mining 1

Course Outcomes

CO4: Classify data for the given dataset using real-world applications.

UNIT IV – Classification and Clustering
Classification: Basic Concepts - Decision Tree
Induction – Bayes Classification Methods – Rule
Based Classification – K-Nearest-Neighbor
Classifier - Model Evaluation and Selection –
Techniques to Improve Classification Accuracy.
Cluster Analysis: Basic Concepts and Methods-
Cluster Analysis - Partitioning Methods -
Hierarchical Methods - Density-Based Methods -
Grid-Based Methods.

Partitioning Algorithms: Basic Concept
• Partitioning method: Partitioning a database D of n objects into a set of k clusters, such that the sum of squared distances is minimized (where c_i is the centroid or medoid of cluster C_i)

$$E = \sum_{i=1}^{k} \sum_{p \in C_i} (p - c_i)^2$$

• Given k, find a partition of k clusters that optimizes the chosen partitioning criterion
• Global optimum: exhaustively enumerate all partitions
• Heuristic methods: k-means and k-medoids algorithms
• k-means (MacQueen’67, Lloyd’57/’82): Each cluster is represented by the center of the cluster
• k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw’87): Each cluster is represented by one of the objects in the cluster
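
The criterion E can be evaluated directly for any given partition. The following is a minimal sketch, assuming points are tuples of numbers and squared Euclidean distance is the intended meaning of (p - c_i)^2; the function names and the sample data are illustrative and do not come from the slides.

```python
def squared_distance(p, c):
    """Squared Euclidean distance between point p and representative c."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))


def centroid(cluster):
    """Mean point of a non-empty cluster (coordinate-wise average)."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def sum_of_squared_errors(clusters, centers):
    """E = sum over clusters C_i and points p in C_i of (p - c_i)^2."""
    return sum(
        squared_distance(p, c)
        for cluster, c in zip(clusters, centers)
        for p in cluster
    )


# Hypothetical partition of five 2-D objects into k = 2 clusters.
clusters = [
    [(1.0, 1.0), (2.0, 1.0)],
    [(8.0, 8.0), (9.0, 9.0), (10.0, 10.0)],
]
centers = [centroid(c) for c in clusters]
E = sum_of_squared_errors(clusters, centers)
print(E)  # 0.5 from the first cluster + 4.0 from the second = 4.5
```

Passing medoids (actual objects) instead of centroids as `centers` evaluates the same criterion for a k-medoids style partition.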
The K-Means Clustering Method
• Given k, the k-means algorithm is implemented in four steps:
1. Partition objects into k nonempty subsets
2. Compute seed points as the centroids of the clusters of the current partitioning (the centroid is the center, i.e., mean point, of the cluster)
3. Assign each object to the cluster with the nearest seed point
4. Go back to Step 2; stop when the assignment does not change
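
The four steps can be sketched in plain Python as follows. This is an illustrative sketch, not a reference implementation: the round-robin initial partition, the `max_iter` safety cap, and the rule of keeping the previous centroid when a cluster becomes empty are assumptions added here, and the sample points are made up.

```python
def squared_distance(p, c):
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))


def k_means(points, k, max_iter=100):
    """Return (labels, centers) following the four steps listed above."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Step 1: partition objects into k nonempty subsets
    # (assumption: a simple round-robin split).
    labels = [i % k for i in range(len(points))]
    centers = [None] * k

    for _ in range(max_iter):
        # Step 2: compute the centroid (mean point) of each current cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an emptied cluster keeps its old center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))

        # Step 3: assign each object to the cluster with the nearest seed point.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]

        # Step 4: stop when the assignment does not change, else repeat.
        if new_labels == labels:
            break
        labels = new_labels

    return labels, centers


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
labels, centers = k_means(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```

Because the result depends on the initial partition, practical use typically runs the procedure from several different starting partitions and keeps the one with the smallest E.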

An Example of K-Means Clustering

Comments on the K-Means Method
• Strength: Efficient: O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations. Normally, k, t << n.
• For comparison: PAM: O(k(n-k)^2), CLARA: O(ks^2 + k(n-k)), where s is the sample size
• Comment: Often terminates at a local optimum.
• Weaknesses
• Applicable only to objects in a continuous n-dimensional space
• The k-modes method can be used for categorical data
• In comparison, k-medoids can be applied to a wide range of data
• Need to specify k, the number of clusters, in advance (there are ways to automatically determine the best k; see Hastie et al., 2009)
• Sensitive to noisy data and outliers
• Not suitable for discovering clusters with non-convex shapes
Variations of the K-Means Method
• Most variants of k-means differ in
• Selection of the initial k means
• Dissimilarity calculations
• Strategies to calculate cluster means
• Handling categorical data: k-modes
• Replacing means of clusters with modes
• Using new dissimilarity measures to deal with categorical objects
• Using a frequency-based method to update modes of clusters
• A mixture of categorical and numerical data: k-prototype method
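
The k-modes idea can be sketched by swapping two pieces of k-means: the distance and the center update. The sketch below assumes simple matching (the number of attributes on which two records differ) as the dissimilarity measure, which is one common choice and not necessarily the one intended by the slides; the records, the choice of initial modes, and the function names are hypothetical.

```python
from collections import Counter


def mismatch(a, b):
    """Simple-matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(members):
    """Frequency-based update: most frequent value of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))


def k_modes(records, initial_modes, max_iter=100):
    modes = list(initial_modes)
    labels = None
    for _ in range(max_iter):
        # Assign each record to the mode it mismatches least.
        new_labels = [
            min(range(len(modes)), key=lambda j: mismatch(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Replace each cluster's mode by the per-attribute most frequent values.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = cluster_mode(members)
    return labels, modes


records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("green", "large", "square"),
]
# Assumption: two of the records are picked by hand as the initial modes.
labels, modes = k_modes(records, [records[0], records[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)   # [('red', 'small', 'round'), ('blue', 'large', 'square')]
```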

What Is the Problem of the K-Means Method?
• The k-means algorithm is sensitive to outliers!
• An object with an extremely large value may substantially distort the distribution of the data
• K-Medoids: Instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used, which is the most centrally located object in the cluster
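
A small one-dimensional example illustrates the difference. This is a sketch with made-up values; the medoid is taken here to be the object with the smallest total absolute distance to all other objects in the cluster, which is an assumed (but common) reading of "most centrally located".

```python
def mean(values):
    return sum(values) / len(values)


def medoid(values):
    """Object of the cluster minimizing the total distance to all objects."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))


cluster = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = cluster + [100.0]  # one extremely large value

print(mean(cluster), medoid(cluster))            # 3.0 3.0
print(mean(with_outlier), medoid(with_outlier))  # mean jumps to about 19.17, medoid stays 3.0
```

The mean is pulled far outside the original range of the cluster by the single outlier, while the medoid remains one of the typical objects (3.0 and 4.0 tie here, and `min` returns the first of them).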

PAM: A Typical K-Medoids Algorithm

The K-Medoid Clustering Method
• K-Medoids Clustering: Find representative objects (medoids) in clusters
• PAM (Partitioning Around Medoids, Kaufman & Rousseeuw 1987)
• Starts from an initial set of medoids and iteratively replaces one of the medoids by one of the non-medoids if the swap reduces the total distance of the resulting clustering
• PAM works effectively for small data sets, but does not scale well for large data sets (due to the computational complexity)
• Efficiency improvement on PAM
• CLARA (Kaufman & Rousseeuw, 1990): PAM on samples
• CLARANS (Ng & Han, 1994): Randomized re-sampling
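
The swap idea behind PAM can be sketched as a simple local search. This is a simplified first-improvement variant written for illustration, not the exact procedure from Kaufman & Rousseeuw: taking the first k objects as initial medoids, the Manhattan distance, and the sample points are all assumptions.

```python
from itertools import product


def total_cost(points, medoids, dist):
    """Sum over all objects of the distance to their nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist):
    """Return (medoid indices, total cost) after no swap improves the cost."""
    medoids = list(range(k))  # assumption: first k objects as initial medoids
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        # Try replacing each medoid slot i by each non-medoid candidate.
        for i, candidate in product(range(k), range(len(points))):
            if candidate in medoids:
                continue
            trial = medoids[:i] + [candidate] + medoids[i + 1:]
            cost = total_cost(points, trial, dist)
            if cost < best:  # keep the swap only if it lowers the total distance
                medoids, best, improved = trial, cost, True
    return medoids, best


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
medoids, cost = pam(points, k=2, dist=manhattan)
print(sorted(medoids), cost)  # [0, 3] 4.0
```

Each pass evaluates on the order of k(n-k) candidate swaps and recomputes the total cost for every one of them, which is where the poor scaling on large data sets comes from and why CLARA runs PAM on samples instead.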

Summary
• Partitioning Methods

Reference
1. Jiawei Han, Micheline Kamber, Jian Pei, “Data Mining:
Concepts and Techniques”, 3rd Edition, Elsevier, 2012.

Thank you
