PARTITIONING METHODS
The simplest and most fundamental version of cluster analysis is partitioning, which organizes the objects of a set into several exclusive groups or clusters. Given a data set, D, of n objects, and k, the number of clusters to form, a partitioning algorithm organizes the objects into k partitions (k <= n), where each partition represents a cluster. The clusters are formed to optimize an objective partitioning criterion, such as a dissimilarity function based on distance, so that the objects within a cluster are "similar" to one another and "dissimilar" to objects in other clusters in terms of the data set attributes.

PARTITIONING ALGORITHMS
k-Means: A Centroid-Based Technique
k-Medoids: A Representative Object-Based Technique
CLARA (Clustering LARge Applications)
CLARANS (Clustering Large Applications based upon RANdomized Search)

1. k-Means: A Centroid-Based Technique

Suppose a data set, D, contains n objects. Partitioning methods distribute the objects in D into k clusters, C1, ..., Ck, such that Ci ⊂ D and Ci ∩ Cj = ∅ (for 1 <= i, j <= k, i != j). An objective function is used to assess the partitioning quality so that objects within a cluster are similar to one another but dissimilar to objects in other clusters. That is, the objective function aims for high intra-cluster similarity and low inter-cluster similarity.

Conceptually, the centroid of a cluster is its center point. The centroid can be defined in various ways, such as by the mean or medoid of the objects (or points) assigned to the cluster. The quality of cluster Ci can be measured by the within-cluster variation, which is the sum of squared error between all objects in cluster Ci and the centroid ci, defined as:

E = Σ (i=1..k) Σ (p ∈ Ci) dist(p, ci)^2

where E is the sum of the squared error for all objects in the data set; p is the point in space representing a given object; and ci is the centroid of cluster Ci (both p and ci are multidimensional). In other words, for each object in each cluster, the distance from the object to its cluster center is squared, and the distances are summed.

To obtain good results in practice, it is common to run the k-means algorithm multiple times with different initial cluster centers. The time complexity of the k-means algorithm is O(nkt),
where n is the total number of objects, k is the number of clusters, and t is the number of iterations. Normally, k << n and t << n, so the method is relatively scalable and efficient in processing large data sets.

DISADVANTAGES

1. The k-means method can be applied only when the mean of a set of objects is defined. This may not be the case in some applications, such as when data with nominal attributes are involved. The k-modes method is a variant of k-means that extends the k-means paradigm to cluster nominal data by replacing the means of clusters with modes. It uses new dissimilarity measures to deal with nominal objects and a frequency-based method to update the modes of clusters. The k-means and k-modes methods can be integrated to cluster data with mixed numeric and nominal values.

2. The k-means method is not suitable for discovering clusters with nonconvex shapes or clusters of very different sizes.

3. It is sensitive to noise and outlier data points, because a small number of such data can substantially influence the mean value.

2. k-Medoids: A Representative Object-Based Technique

Instead of taking the mean value of the objects in a cluster as a reference point, we can pick actual objects to represent the clusters, using one representative object per cluster. Each remaining object is assigned to the cluster whose representative object it is most similar to. The partitioning method is then performed based on the principle of minimizing the sum of the dissimilarities between each object p and its corresponding representative object. That is, an absolute-error criterion is used, defined as:

E = Σ (i=1..k) Σ (p ∈ Ci) dist(p, oi)

where E is the sum of the absolute error for all objects p in the data set, and oi is the representative object of Ci. This is the basis for the k-medoids method, which groups n objects into k clusters by minimizing the absolute error. The Partitioning Around Medoids (PAM) algorithm is a popular realization of k-medoids clustering. It tackles the problem in an iterative, greedy way.
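The k-means iteration described above (assign each point to its nearest centroid, then recompute each centroid as the mean of its cluster, until the centroids stop moving) can be sketched as follows. This is a minimal illustration for numeric 2-D points, not a production implementation; the function name and the random choice of initial centers are our own for the example.

```python
import math
import random

def kmeans(points, k, t=100, seed=0):
    """Minimal k-means sketch: points is a list of (x, y) tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)           # arbitrary initial centers
    for _ in range(t):                          # at most t iterations
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[i].append(p)
        # Update step: recompute each centroid as its cluster's mean.
        new_centroids = [
            tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:          # centroids stable: converged
            break
        centroids = new_centroids
    # Within-cluster variation E: sum of squared distances to centroids.
    sse = sum(math.dist(p, centroids[i]) ** 2
              for i, cl in enumerate(clusters) for p in cl)
    return centroids, clusters, sse
```

Because the result depends on the initial centers, running this with several different `seed` values and keeping the partition with the lowest `sse` corresponds to the multiple-runs practice mentioned above.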
As in the k-means algorithm, the initial representative objects (called seeds) are chosen arbitrarily. We then consider whether replacing a representative object with a nonrepresentative object would improve the clustering quality.

ADVANTAGES

The k-medoids method is more robust than k-means in the presence of noise and outliers, because a medoid is less influenced by outliers or other extreme values than a mean. The complexity of each iteration in the k-medoids algorithm is O(k(n-k)^2).

3. CLARA (Clustering LARge Applications)

A typical k-medoids partitioning algorithm like PAM works effectively for small data sets, but does not scale well to large data sets. To deal with larger data sets, a sampling-based method called CLARA (Clustering LARge Applications) can be used. Instead of taking the whole data set into consideration, CLARA uses a random sample of the data set. The PAM algorithm is then applied to compute the best medoids from the sample. Ideally, the sample should closely represent the original data set. CLARA builds clusterings from multiple random samples and returns the best clustering as the output. The complexity of computing the medoids on a random sample of size s is O(ks^2 + k(n-k)). The effectiveness of CLARA depends on the sample size: PAM searches for the best k medoids among the whole data set, whereas CLARA searches for the best k medoids only among the selected sample. CLARA cannot find a good clustering if any of the best sampled medoids is far from the best k medoids; if an object is one of the best k medoids but is not selected during sampling, CLARA will never find the best clustering.

4. CLARANS (Clustering Large Applications based upon RANdomized Search)

A randomized algorithm called CLARANS (Clustering Large Applications based upon RANdomized Search) presents a trade-off between the cost and the effectiveness of using samples to obtain clustering. First, it randomly selects k objects in the data set as the current medoids.
It then randomly selects a current medoid x and an object y that is not one of the current medoids. If replacing x with y would improve the absolute-error criterion, the replacement is made. CLARANS conducts such a randomized search l times; the set of current medoids after the l steps is considered a local optimum. CLARANS repeats this randomized process m times and returns the best local optimum as the final result.
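The randomized medoid-swap search just described can be sketched as below. This is an illustrative simplification, not the full CLARANS algorithm: it makes l random swap attempts per local search and m restarts (the parameter names follow the description above), assumes numeric points with Euclidean distance, and keeps any improving swap rather than managing CLARANS's neighbor counter exactly.

```python
import math
import random

def absolute_error(points, medoids):
    """E = sum over all objects of the distance to the nearest medoid."""
    return sum(min(math.dist(p, mo) for mo in medoids) for p in points)

def clarans(points, k, l=50, m=5, seed=0):
    """CLARANS-style sketch: m randomized local searches of l swap attempts each."""
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(m):                              # m restarts
        medoids = rng.sample(points, k)             # random current medoids
        err = absolute_error(points, medoids)
        for _ in range(l):                          # l randomized swap attempts
            x = rng.choice(medoids)                 # a current medoid
            y = rng.choice([p for p in points if p not in medoids])
            candidate = [y if mo == x else mo for mo in medoids]
            cand_err = absolute_error(points, candidate)
            if cand_err < err:                      # keep an improving swap
                medoids, err = candidate, cand_err
        if err < best_err:                          # best local optimum so far
            best, best_err = medoids, err
    return best, best_err
```

Running the same search on a small random sample of `points` instead of the whole list, and evaluating the winner on the full data set, gives the flavor of CLARA's sampling idea, while searching over all objects as above follows CLARANS.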