Many types of data analysis, such as the interpretation of Landsat images
discussed in the accompanying article, involve datasets so large that
their direct manipulation is impractical. Some method of data compres-
sion or consolidation must first be applied to reduce the size of the dataset without
losing the essential character of the data. All consolidation methods sacrifice
some detail; the most desirable methods are computationally efficient and yield re-
sults that are—at least for practical applications—representative of the original
data. Here we introduce several widely used algorithms that consolidate data by
clustering, or grouping, and then present a new method, the continuous k-means
algorithm,* developed at the Laboratory specifically for clustering large datasets.
Test scores are an example of one-dimensional data; each data point represents a
single measured quantity. Multidimensional data can include any number of mea-
surable attributes; a biologist might use four attributes of duck bills (four-dimen-
sional data: size, straightness, thickness, and color) to sort a large set of ducks
into several species. Each independent characteristic, or measurement, is one di-
mension. The consolidation of large, multidimensional datasets is the main purpose of the field of cluster analysis.

* The continuous k-means algorithm is part of a patented application for improving both the processing speed and the appearance of color video displays. The application is commercially available for Macintosh computers under the names Fast Eddie (1992) and Planet Color (1993) by Paradigm Concepts, Inc., Santa Fe, NM. This software was developed by Vance Faber, Mark O. Mundt, Jeffrey S. Saltzman, and James M. White.

Figure 1. Clustering Test Scores. The figure illustrates an arbitrary partitioning of 20 test scores into 5 non-overlapping clusters (dashed lines), corresponding to the 5 letter grades F, D, C, B, and A. The reference points (means) are indicated in red. The 20 scores are 47, 52, 53, 56, 57, 59, 61, 65, 67, 68, 70, 71, 73, 75, 77, 79, 82, 83, 87, and 97.

We will describe several clustering methods below. In all of these methods the desired number of clusters k is specified beforehand. The reference point zi for cluster i is usually the centroid of the cluster. In the case of one-dimensional data, such as the test scores, the centroid is the arithmetic average of the values of the points in a cluster. For multidimensional data, where each data point has several components, the centroid will have the same number of components, and each component will be the arithmetic average of the corresponding components of all the data points in the cluster.
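In code, the centroid computation just described is a one-line reduction over the points of a cluster. The following minimal sketch (NumPy is used here purely for convenience; the function name is ours) applies to data of any dimension.

```python
import numpy as np

def centroid(points):
    """Centroid of a cluster: the component-wise arithmetic average.

    points has shape (number of points, number of dimensions); for
    one-dimensional data such as test scores the second dimension is 1.
    """
    return np.asarray(points, dtype=float).mean(axis=0)

# One-dimensional example using four of the test scores from Figure 1.
print(centroid([[52], [53], [56], [57]]))   # -> [54.5]
```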
Perhaps the simplest and oldest automated clustering method is to combine data
points into clusters in a pairwise fashion until the points have been condensed into
the desired number of clusters; this type of agglomerative algorithm is found in
many off-the-shelf statistics packages. Figure 2 illustrates the method applied to
the set of test scores given in Figure 1.
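As a rough illustration of the agglomerative procedure, the sketch below repeatedly merges the two clusters whose centroids are closest until only k clusters remain. The merging criterion (distance between cluster centroids) is one common choice assumed here; the article does not spell out the exact rule used in Figure 2.

```python
# Rough sketch of pairwise agglomerative clustering for one-dimensional data.
# Each cluster is a list of scores; at every step the two clusters whose
# centroids are closest are merged, until only k clusters remain.

def agglomerate(scores, k):
    clusters = [[s] for s in scores]                  # start with one cluster per point
    while len(clusters) > k:
        best = None                                   # (distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = sum(clusters[a]) / len(clusters[a])
                cb = sum(clusters[b]) / len(clusters[b])
                d = abs(ca - cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]       # merge the closest pair
        del clusters[b]
    return clusters

scores = [47, 52, 53, 56, 57, 59, 61, 65, 67, 68,
          70, 71, 73, 75, 77, 79, 82, 83, 87, 97]
print(agglomerate(scores, 5))
```

Note that every merge rescans all remaining pairs of clusters, which is precisely the computational inefficiency discussed below.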
There are two major drawbacks to this algorithm. First—and absolutely prohibi-
tive for the analysis of large datasets—the method is computationally inefficient.
Each step of the procedure requires calculation of the distance between every pos-
sible pair of data points and comparison of all the distances. The second difficulty
is connected to a more fundamental problem in cluster analysis: Although the al-
gorithm will always produce the desired number of clusters, the centroids of these
clusters may not be particularly representative of the data.
A standard way to quantify how well the reference points represent the data is the error measure E, the total squared distance of the data points from their reference points:

E = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \| x_{ij} - z_i \|^2,

where xij is the jth point in the ith cluster, zi is the reference point of the ith cluster, and ni is the number of points in that cluster. The notation ||xij - zi|| stands for the distance between xij and zi. Hence, the error measure E indicates the overall spread of data points about their reference points. To achieve a representative clustering, E should be as small as possible.
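Written directly from this definition, a computation of E for a given partitioning might look like the following sketch (NumPy is assumed for the array arithmetic; the function name is ours).

```python
import numpy as np

def error_measure(clusters, refs):
    """Error measure E: total squared distance of points from their reference points."""
    E = 0.0
    for points, z in zip(clusters, refs):          # cluster i with reference point z_i
        diff = np.asarray(points, dtype=float) - np.asarray(z, dtype=float)
        E += float((diff ** 2).sum())              # sum of ||x_ij - z_i||^2 over the cluster
    return E
```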
Figure 2. Agglomerative Clustering of the Test Scores. The figure traces the pairwise combination of the 20 test scores of Figure 1 through steps S1 to S15; the value recorded at each step is the reference point of the newly merged cluster (S1 52.5, S2 56.5, S3 67.5, S4 70.5, S5 82.5, S6 60, S7 74, S8 78, S9 66.25, S10 58.25, S11 72.25, S12 80.25, S13 49.75, S14 69.25, S15 83.75).
When clustering is done for the purpose of data reduction, as in the case of the
Landsat images, the goal is not to find the best partitioning. We merely want a
reasonable consolidation of N data points into k clusters, and, if necessary, some
efficient way to improve the quality of the initial partitioning. For that purpose,
there is a family of iterative-partitioning algorithms that is far superior to the ag-
glomerative algorithm described above.
Iterative algorithms begin with a set of k reference points whose initial values are
usually chosen by the user. First, the data points are partitioned into k clusters: A
data point x becomes a member of cluster i if zi is the reference point closest to x.
The positions of the reference points and the assignment of the data points to clus-
ters are then adjusted during successive iterations. Iterative algorithms are thus
similar to fitting routines, which begin with an initial “guess” for each fitted parameter and then optimize its value. Algorithms within this family differ in the
details of generating and adjusting the partitions. Three members of this family
are discussed here: Lloyd’s algorithm, the standard k-means algorithm, and a con-
tinuous k-means algorithm first described in 1967 by J. MacQueen and recently
developed for general use at Los Alamos.
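For concreteness, here is a minimal sketch of the iterative-partitioning scheme in the batch form usually associated with Lloyd's algorithm: every iteration reassigns all the data points and only then moves the reference points. It is an illustration under those assumptions, not the article's implementation; the function and parameter names are ours.

```python
import numpy as np

def lloyd(data, refs, max_iterations=100):
    """Batch iterative partitioning (Lloyd's algorithm).

    data: array (N, d) of data points; refs: initial reference points, array (k, d).
    """
    data = np.asarray(data, dtype=float)
    refs = np.asarray(refs, dtype=float).copy()
    for _ in range(max_iterations):
        # Partition: assign each data point to the cluster of its nearest reference point.
        labels = ((data[:, None, :] - refs[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Update: move each reference point to the centroid of its cluster.
        new_refs = np.array([data[labels == i].mean(axis=0) if np.any(labels == i) else refs[i]
                             for i in range(len(refs))])
        if np.allclose(new_refs, refs):
            break                                  # the partitioning is stable
        refs = new_refs
    return refs, labels
```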
For Lloyd’s and other iterative algorithms, improvement of the partitioning and
convergence of the error measure E to a local minimum are often quite fast, even
when the initial reference points are badly chosen. However, unlike guesses for
parameters in simple fitting routines, slightly different initial partitionings general-
ly do not produce the same set of final clusters. A final partitioning will be better
than the initial choice, but it will not necessarily be the best possible partitioning.
For many applications, this is not a significant problem. For example, the differ-
ences between Landsat images made from the original data and those made from
the clustered data are seldom visible even to trained analysts, so small differences
in the clustered data are even less important. In such cases, the judgment of the
analyst is the best guide as to whether a clustering method yields reasonable results.
The standard k-means algorithm differs from Lloyd's in its more efficient use of information at every step. The setup for both algorithms is the same: Reference points are chosen and all the data points are assigned to clusters. As with Lloyd's, the k-means algorithm then uses the cluster centroids as reference points in subsequent partitionings, but the centroids are adjusted both during and after each partitioning. For data point x in cluster i, if the centroid zi is the nearest reference point, no adjustments are made and the algorithm proceeds to the next data point. However, if the centroid zj of cluster j is the reference point closest to data point x, then x is reassigned to cluster j, the centroids of the “losing” cluster i (minus point x) and the “gaining” cluster j (plus point x) are recomputed, and the reference points zi and zj are moved to their new centroids. After each step, every one of the k reference points is a centroid, or mean, hence the name “k-means.” An example of clustering using the standard k-means algorithm is shown in Figure 3.

Figure 3. Clustering by the Standard k-Means Algorithm. The diagrams show results during two iterations in the partitioning of nine two-dimensional data points into two well-separated clusters, using the standard k-means algorithm. Points in cluster 1 are shown in red and points in cluster 2 in black; data points are denoted by open circles and reference points by filled circles. Clusters are indicated by dashed lines. Note that the iteration converges quickly to the correct clustering, even for this bad initial choice of the two reference points. (a) Setup: Reference point 1 (filled red circle) and reference point 2 (filled black circle) are chosen arbitrarily. All data points (open circles) are then partitioned into two clusters: each data point is assigned to cluster 1 or cluster 2, depending on whether it is closer to reference point 1 or reference point 2, respectively. (b) Results of first iteration: Next, each reference point is moved to the centroid of its cluster. Then each data point is considered in the sequence shown. If the reference point closest to the data point belongs to the other cluster, the data point is reassigned to that other cluster, and both cluster centroids are recomputed. (c) Results of second iteration: During the second iteration, the process in Figure 3(b) is performed again for every data point. The resulting partition is stable; it will not change with any further iteration.
There are a number of variants of the k-means algorithm. In some versions, the
error measure E is evaluated at each step, and a data point is reassigned to a dif-
ferent cluster only if that reassignment decreases E. In MacQueen’s original paper
on the k-means method, the centroid update (assign data point to cluster, recom-
pute the centroid, move the reference point to the centroid) is applied at each step
in the initial partitioning, as well as during the iterations. In all of these cases, the
standard k-means algorithm requires about the same amount of computation for a
single pass through all the data points, or one iteration, as does Lloyd’s algorithm.
However, the k-means algorithm, because it constantly updates the clusters, is un-
likely to require as many iterations as the less efficient Lloyd’s algorithm and is
therefore considerably faster.
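A sketch of the single-point update that distinguishes the standard k-means algorithm might look like the following: a data point is reassigned as soon as another centroid is found to be closer, and the two affected centroids are immediately recomputed. The incremental centroid formulas are a standard shortcut assumed here, not necessarily the article's implementation, and the function name is ours.

```python
import numpy as np

def kmeans_pass(data, labels, refs, counts):
    """One pass of the standard k-means update over all the data points.

    data:   array (N, d) of data points
    labels: current cluster index of each data point, array (N,)
    refs:   current reference points (cluster centroids), float array (k, d)
    counts: current number of points in each cluster, array (k,)
    labels, refs, and counts are modified in place.
    """
    for idx, x in enumerate(data):
        i = labels[idx]
        j = int(((refs - x) ** 2).sum(axis=1).argmin())   # nearest reference point
        if j == i or counts[i] == 1:
            continue                      # no adjustment (and never empty a cluster)
        # Reassign x from the "losing" cluster i to the "gaining" cluster j
        # and move both centroids to their new positions.
        refs[i] = (refs[i] * counts[i] - x) / (counts[i] - 1)
        refs[j] = (refs[j] * counts[j] + x) / (counts[j] + 1)
        counts[i] -= 1
        counts[j] += 1
        labels[idx] = j
```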
The continuous k-means algorithm is faster than the standard version and thus ex-
tends the size of the datasets that can be clustered. It differs from the standard
version in how the initial reference points are chosen and how data points are se-
lected for the updating process.
In the standard algorithm the initial reference points are chosen more or less arbi-
trarily. In the continuous algorithm reference points are chosen as a random sam-
ple from the whole population of data points. If the sample is sufficiently large,
the distribution of these initial reference points should reflect the distribution of
points in the entire set. If the whole set of points is densest in Region 7, for ex-
ample, then the sample should also be densest in Region 7. When this process is
applied to Landsat data, it effectively puts more cluster centroids (and the best
color resolution) where there are more data points.
Another difference between the standard and continuous k-means algorithms is the
way the data points are treated. During each complete iteration, the standard algo-
rithm examines all the data points in sequence. In contrast, the continuous algo-
rithm examines only a random sample of data points. If the dataset is very large
and the sample is representative of the dataset, the algorithm should converge
much more quickly than an algorithm that examines every point in sequence. In
fact, the continuous algorithm adopts MacQueen’s method of updating the cen-
troids during the initial partitioning, when the data points are first assigned to clus-
ters. Convergence is usually fast enough so that a second pass through the data
points is not needed.
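Combining the two differences, a single continuous k-means pass can be sketched as follows. The sample sizes and the random-number generator are illustrative choices rather than values taken from the article, and the function name is ours.

```python
import numpy as np

def continuous_kmeans(data, k, sample_size, rng=None):
    """Sketch of a single continuous k-means pass.

    The k reference points are drawn as a random sample of the data, so their
    distribution mirrors the density of the full dataset; the centroids are then
    updated, one sampled data point at a time, in MacQueen's fashion.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    refs = data[rng.choice(len(data), size=k, replace=False)].copy()
    counts = np.ones(k)                       # each reference point starts as its own cluster
    for x in data[rng.choice(len(data), size=sample_size, replace=False)]:
        j = int(((refs - x) ** 2).sum(axis=1).argmin())   # nearest reference point
        counts[j] += 1
        refs[j] += (x - refs[j]) / counts[j]  # running-mean (centroid) update
    return refs
```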
The theoretical basis for this use of random samples is MacQueen's formulation of the clustering problem for continuously distributed data. In that formulation the error measure associated with cluster i, whose points occupy the region Ri of the data space, is

E_i = \int_{x \in R_i} \rho(x)\, \| x - z_i \|^2 \, dx,
where ρ (x) is the probability density function, a continuous function defined over
the space, and the total error measure E is given by the sum of the Ei’s. In Mac-
Queen’s concept of the algorithm, a very large set of discrete data points can be
thought of as a large sample—and thus a good estimate—of the continuous proba-
bility density ρ(x). It then becomes apparent that a random sample of the dataset
can also be a good estimate of ρ(x). Such a sample yields a representative set of
cluster centroids and a reasonable estimate of the error measure without using all
the points in the original dataset.
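The same reasoning gives a cheap estimate of the error measure itself: compute the error over a random sample of m points and scale it by N/m. The sketch below assumes that scaling; it is an illustration, not a formula given in the article.

```python
import numpy as np

def estimate_error(data, refs, m, rng=None):
    """Estimate the error measure E from a random sample of m data points."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    refs = np.asarray(refs, dtype=float)
    sample = data[rng.choice(len(data), size=m, replace=False)]
    # Squared distance from each sampled point to its nearest reference point,
    # scaled up from the m sampled points to all N points.
    d2 = ((sample[:, None, :] - refs[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum()) * len(data) / m
```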
The computer time can be further reduced by making the individual steps in the
algorithm more efficient. A substantial fraction of the computation time required
by any of these clustering algorithms is typically spent in finding the reference
point closest to a particular data point. In a “brute-force” method, the distances
from a given data point to all of the reference points must be calculated and com-
pared. More elegant methods of “point location” avoid much of this time-consum-
ing process by reducing the number of reference points that must be considered—
but some computational time must be spent to create data structures. Such
structures range from particular orderings of reference points, to “trees” in which
reference points are organized into categories. A tree structure allows one to elim-
inate entire categories of reference points from the distance calculations. The con-
tinuous k-means algorithm uses a tree method to cluster three-dimensional data,
such as pixel colors on a video screen. When applied to seven-dimensional Land-
sat data, the algorithm uses single-axis boundarizing, which orders the reference
points along the direction of maximum variation. In either method only a few
points need be considered when calculating and comparing distances. The choice
of a particular method will depend on the number of dimensions of the dataset.
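As a concrete illustration of the single-axis idea, the following sketch orders the reference points along the coordinate of greatest variance and scans outward from a query point, pruning candidates once the projected gap alone exceeds the best full distance found so far. It is written in the spirit of the Friedman, Baskett, and Shustek method, not as the Laboratory's implementation; the class and method names are ours.

```python
import numpy as np
from bisect import bisect_left

class SingleAxisLocator:
    """Nearest-reference-point search using a single-axis ordering.

    Reference points are sorted along the axis of maximum variance; the
    projected distance along that axis is a lower bound on the true distance,
    so the outward scan can stop early.
    """
    def __init__(self, refs):
        self.refs = np.asarray(refs, dtype=float)
        self.axis = int(np.argmax(self.refs.var(axis=0)))   # axis of maximum variation
        self.order = np.argsort(self.refs[:, self.axis])
        self.keys = self.refs[self.order, self.axis]         # sorted projections

    def nearest(self, x):
        x = np.asarray(x, dtype=float)
        start = bisect_left(self.keys, x[self.axis])
        best_idx, best_d2 = -1, np.inf
        lo, hi = start - 1, start
        # Scan outward from the insertion position, always taking the side
        # whose projected gap to the query point is smaller.
        while lo >= 0 or hi < len(self.keys):
            d_lo = x[self.axis] - self.keys[lo] if lo >= 0 else np.inf
            d_hi = self.keys[hi] - x[self.axis] if hi < len(self.keys) else np.inf
            if d_lo <= d_hi:
                gap, j = d_lo, lo
                lo -= 1
            else:
                gap, j = d_hi, hi
                hi += 1
            if gap * gap > best_d2:
                break                         # no closer reference point can remain
            d2 = float(np.sum((self.refs[self.order[j]] - x) ** 2))
            if d2 < best_d2:
                best_idx, best_d2 = int(self.order[j]), d2
        return best_idx
```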
Further Reading
James M. White, Vance Faber, and Jeffrey S. Saltzman. 1992. Digital color representation. U.S.
Patent Number 5,130,701.
Stuart P. Lloyd. 1982. Least squares quantization in PCM. IEEE Transactions on Information Theory
IT-28: 129–137.
Edward Forgy. 1965. Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications. Biometrics 21: 768.
J. MacQueen. 1967. Some methods for classification and analysis of multivariate observations. In
Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Volume I,
Statistics. Edited by Lucien M. Le Cam and Jerzy Neyman. University of California Press.
Jerome H. Friedman, Forest Baskett, and Leonard J. Shustek. 1975. An algorithm for finding nearest neighbors. IEEE Transactions on Computers C-24: 1000–1006. [Single-axis boundarizing, dimensionality.]
Jerome H. Friedman, Jon Louis Bentley, and Raphael Ari Finkel. 1977. An algorithm for finding best
matches in logarithmic expected time. ACM Transactions on Mathematical Software 3: 209–226.
[Tree methods.]
Helmuth Späth. 1980. Cluster Analysis Algorithms for Data Reduction and Classification of Objects.
Halsted Press.
Anil K. Jain and Richard C. Dubes. 1988. Algorithms for Clustering Data. Prentice Hall.