
Objective Criteria for the Evaluation of Clustering Methods
Author(s): William M. Rand
Source: Journal of the American Statistical Association, Vol. 66, No. 336 (Dec., 1971), pp. 846-850
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2284239

Journal of the American Statistical Association
December 1971, Volume 66, Number 336
Theory and Methods Section

Objective Criteria for the Evaluation of Clustering Methods

WILLIAM M. RAND*

* William M. Rand is assistant professor of biostatistics, Massachusetts Institute of Technology, Cambridge, Mass. 02139. This article is based in part on the author's doctoral dissertation, submitted to the University of California at Los Angeles and sponsored by PHS grant GM-00049 and NIH grant FR-3. The author wishes to thank Professor A. A. Afifi, UCLA, for valuable suggestions.

Many intuitively appealing methods have been suggested for clustering data; however, interpretation of their results has been hindered by the lack of objective criteria. This article proposes several criteria which isolate specific aspects of the performance of a method, such as its retrieval of inherent structure, its sensitivity to resampling, and the stability of its results in the light of new data. These criteria depend on a measure of similarity between two different clusterings of the same set of data; the measure essentially considers how each pair of data points is assigned in each clustering.

1. INTRODUCTION AND STATEMENT OF PROBLEM

Cluster analysis has come to be used as a generic term for techniques which are concerned with the problem: given a number of objects, how to select those which are closer to each other than they are to the rest of the objects. While the general problems of clustering have long intrigued investigators, the necessity and recent availability of large computing power has shaped much of the recent research (see [1, 6, 7], which contain computer-oriented reviews of recent effort). Although some theoretical investigations have been made [e.g., 2, 4, 5], most effort has been directed toward the investigation and development of specific methods in specific situations [e.g., 3, 4, 8, 9]. Much of this research has been directed toward finding natural definitions of the term closer and developing and evaluating clustering methods in these terms. This article assumes that every definition of "closer" is natural for some situation and therefore that the problem can be considered without this aspect. This article focuses instead on the development of procedures for evaluating clustering methods in objective terms of how they cluster.

The specific clustering problem discussed is defined as the study of the triplet (X, Y, m). In this notation X represents the set of N objects (or points) to be clustered, X = {X1, X2, ..., XN}, and Y, a specific partitioning of these objects into K disjoint sets. This partitioning will be called a clustering and written as a set of clusters, Y = {Y1, Y2, ..., YK}, where each cluster is a set of the given points, Yk = {Xk1, Xk2, ..., Xknk}, with Σ nk = N and nk ≥ 1 for k = 1, 2, ..., K. The set of all such partitions, for all K = 1, 2, ..., N, of a given set of N points will be denoted by 𝒴.

The symbol m is used for the method of choosing a particular Y given the set X. In general, clustering methods have two components, a criterion and a technique. The criterion assigns to each clustering a numerical value which indicates its relative desirability in the context of the given method. The technique selects a particular subset of 𝒴 to be searched for a clustering which minimizes (maximizes) the given criterion. The technique is an essential part of any operational method, since even for moderate N, the number of elements of 𝒴 precludes examination of all possible clusterings.
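To make the notation concrete, the following minimal Python sketch (ours, not from the paper) represents a clustering Y as a list of clusters and checks that it is a valid partition of X with every nk ≥ 1. (The size of 𝒴 is the Nth Bell number, which grows faster than exponentially in N, hence the remark above about exhaustive search.)

```python
from itertools import chain

def is_clustering(Y, X):
    """True if Y = {Y1, ..., YK} partitions X into disjoint clusters,
    each with n_k >= 1, as the paper's notation requires."""
    clusters = [set(c) for c in Y]
    if any(not c for c in clusters):
        return False                               # every n_k must be >= 1
    union = set(chain.from_iterable(clusters))
    disjoint = sum(map(len, clusters)) == len(union)
    return disjoint and union == set(X)

X = {"a", "b", "c", "d", "e", "f"}
Y = [{"a", "b", "c"}, {"d", "e", "f"}]             # a clustering with K = 2
print(is_clustering(Y, X))                         # True
```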
2. A STATISTIC FOR INVESTIGATION OF THE CLUSTERING PROBLEM

There are two general ways of comparing clustering methods; the first is to consider how easy they are to use, and the second is to evaluate how well they perform when used. The first type is usually computer-oriented and considers execution time and storage requirements (see [1]). In actual application of any clustering method, such a criterion must be considered. Yet how well a clustering method performs is still the ultimate concern. While a quick good method is always better than a slow bad one, the choice between a quick bad method and a slow good one requires some quantification of what makes a method good or bad.

Evaluation of the performance of a clustering method requires some means of comparing its results to either standard results, or to the results of another method. This section develops such a measure of the similarity between clusterings, while the next section proposes several standard situations in which this measure can be used to evaluate methods on the basis of different aspects of their performance.

In standard classification problems, such as discriminant function analysis, there is a correct classification against which to view the results of classification schemes. Often a measure of the "goodness" of such a system is a simple count of the points misclassified or this number normalized into percent error. In the clustering situation organization is sought which arises from the data themselves, and there is no absolute scheme with which to measure clusterings. However, there is a natural extension of this idea involving the comparison of two arbitrary clusterings.

Motivation of this measure follows three basic considerations. First, clustering is discrete in the sense that every point is unequivocally assigned to a specific cluster. Second, clusters are defined just as much by those points which they do not contain as by those points which they do contain. Third, all points are of equal importance in the determination of clusterings. While specific examples in which these considerations do not hold can be devised, these three assumptions form the basis for a general clustering problem. From them it follows that a basic unit of comparison between two clusterings is how pairs of points are clustered. If the elements of an individual point-pair are placed together in a cluster in each of the two clusterings, or if they are assigned to different clusters in both clusterings, this represents a similarity between the clusterings, as opposed to the case in which the elements of the point-pair are in the same cluster in one clustering and in different clusters in the other. From this, a measure of the similarity between two clusterings of the same data, Y and Y', can be defined as c(Y, Y') equal to the number of similar assignments of point-pairs normalized by the total number of point-pairs.

Consider the following illustration which calculates c between two clusterings of six points. Let Y = {(a, b, c), (d, e, f)} and Y' = {(a, b), (c, d, e), (f)}; then the point-pairs are tabulated as follows:

  Together in both:  ab, de                        (2 pairs)
  Separate in both:  ad, ae, af, bd, be, bf, cf    (7 pairs)
  Mixed:             ac, bc, cd, ce, df, ef        (6 pairs)

The total of nine similarities out of a possible 15 gives c(Y, Y') = .6.
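This pair-counting definition translates directly into code. A minimal Python sketch (ours; the name rand_c is not from the paper) that reproduces the 9 of 15 count above:

```python
from itertools import combinations

def rand_c(Y, Yp):
    """c(Y, Y'): the fraction of point-pairs treated alike by both
    clusterings (together in both, or separate in both)."""
    k1 = {x: i for i, cl in enumerate(Y) for x in cl}    # point -> cluster in Y
    k2 = {x: i for i, cl in enumerate(Yp) for x in cl}   # point -> cluster in Y'
    pairs = list(combinations(sorted(k1), 2))
    agree = sum((k1[a] == k1[b]) == (k2[a] == k2[b]) for a, b in pairs)
    return agree / len(pairs)

Y  = [("a", "b", "c"), ("d", "e", "f")]
Yp = [("a", "b"), ("c", "d", "e"), ("f",)]
print(rand_c(Y, Yp))                                     # 0.6
```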
More precisely, given N points, X1, X2, ..., XN, and two clusterings of them, Y = {Y1, ..., YK1} and Y' = {Y'1, ..., Y'K2}, we define

    c(Y, Y') = \sum_{i<j}^{N} \gamma_{ij} / \binom{N}{2}    (2.1)

where

    \gamma_{ij} = 1  if there exist k and k' such that both X_i and X_j are in both Y_k and Y'_{k'},
    \gamma_{ij} = 1  if there exist k and k' such that X_i is in both Y_k and Y'_{k'} while X_j is in neither Y_k nor Y'_{k'},
    \gamma_{ij} = 0  otherwise.

Note that there is a simple computational form for c. Given a pair of clusterings Y and Y' of the same N points, arbitrarily number the clusters in each clustering and let n_{ij} be the number of points simultaneously in the ith cluster of Y and the jth cluster of Y'. Then the similarity between Y and Y' is:

    c(Y, Y') = \left[ \binom{N}{2} - \left[ \tfrac{1}{2} \left\{ \sum_i \Big( \sum_j n_{ij} \Big)^2 + \sum_j \Big( \sum_i n_{ij} \Big)^2 \right\} - \sum_i \sum_j n_{ij}^2 \right] \right] / \binom{N}{2}    (2.2)

There are three fundamental properties of c. First, it is a measure of similarity, ranging from c = 0 when the two clusterings have no similarities (i.e., when one consists of a single cluster and the other only of clusters containing single points), to c = 1 when the clusterings are in fact identical. Second, while c is a measure of similarity, 1 - c is a measure of distance, being a metric on the set of all clusterings of a given set of points, 𝒴. Third, if a distribution is assumed for X, and under various conditions to be discussed in the next section, c is a random variable.
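Equation (2.2) computes the same quantity from the contingency table n_ij without enumerating pairs. A sketch of that computational form (ours), which returns the same .6 for the six-point example; recent versions of scikit-learn expose this quantity as sklearn.metrics.rand_score:

```python
import numpy as np

def rand_c_table(y, yp):
    """c(Y, Y') via equation (2.2); y and yp are integer cluster labels
    for the same N points."""
    y, yp = np.asarray(y), np.asarray(yp)
    N = len(y)
    n = np.zeros((y.max() + 1, yp.max() + 1), dtype=np.int64)
    np.add.at(n, (y, yp), 1)                 # n_ij: the contingency table
    pairs = N * (N - 1) // 2                 # (N choose 2)
    penalty = 0.5 * ((n.sum(axis=1) ** 2).sum()
                     + (n.sum(axis=0) ** 2).sum()) - (n ** 2).sum()
    return (pairs - penalty) / pairs

# the six points a..f of the illustration, as label vectors
print(rand_c_table([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 2]))   # 0.6
```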
Since operational clustering methods search a subset of 𝒴 which is usually defined as all of a certain type of rearrangement of a specific initial clustering (such as all clusterings formed by joining pairs of clusters of the original clustering), it is of interest to examine the behavior of the measure c in some of these situations. Table 1 displays several comparisons between similar clusterings and their limits as the number of points and clusters increase. Given an initial clustering Y having k clusters with n points in each, Table 1A considers the similarity between this clustering and new clusterings formed from Y by various simple operations. Table 1B considers the similarity between two clusterings each of which is formed by the same operation from the same initial clustering but with the operation applied to different parts of the original clustering. Table 1C shows the similarity of the initial clustering to certain standard clusterings, and to the clustering which can be considered the opposite clustering, that which consists of n clusters each of which contains k points, one from each of the original k clusters.

Note that in all three parts of the table the similarities depend on both k, the number of clusters, and n, the number of points in each cluster. This follows directly from the fact that the similarity of two clusterings is essentially the proportion of point-pairs whose relationship is the same in both. Thus the joining of two of three clusters is much more significant than the joining of two of thirty (given equal sized clusters).

1. EXPRESSIONS FOR THE MEASURE c BETWEEN TWO SIMILAR CLUSTERINGS,
   GIVEN AN INITIAL CLUSTERING, Y, WHICH HAS k CLUSTERS OF n POINTS EACH

A. c(Y, Y'), where Y' is a simple modification of Y

  Modification of Y                        c(Y, Y')                        Limit, n → ∞    Limit, k → ∞
  Two clusters joined                      ((k² − 2)n − k)/(k²n − k)       (k² − 2)/k²     1
  One cluster split into two               ((2k² − 1)n − 2k)/(2k²n − 2k)   (2k² − 1)/2k²   1
    equal parts (n even)
  One cluster split into single            ((k² − 1)n − k + 1)/(k²n − k)   (k² − 1)/k²     1
    point clusters
  One point taken from each cluster        (kn² − 3n − k + 3)/(kn² − n)    1               (n² − 1)/n²
    to form a new cluster of k points

B. c(Y', Y''), where Y' and Y'' are similar modifications of the original clustering Y

  Differences between Y' and Y''           c(Y', Y'')                      Limit, n → ∞    Limit, k → ∞
  Movement of a point to different         (k²n − k − 4)/(k²n − k)         1               1
    clusters
  Different clusters split into two        ((k² − 1)n − k)/(k²n − k)       (k² − 1)/k²     1
    equal parts (n even)
  Different pairs of clusters joined       ((k² − 4)n − k)/(k²n − k)       (k² − 4)/k²     1

C. c(Y, Y'), where Y' is a major modification of the original clustering Y

  Modification of Y                        c(Y, Y')                        Limit, n → ∞    Limit, k → ∞
  All clusters joined into one             (n − 1)/(kn − 1)                1/k             0
    large cluster
  All clusters split into single           (k − 1)n/(kn − 1)               (k − 1)/k       1
    point clusters
  n clusters formed with k points in       (k − 1)(n − 1)/(kn − 1)         (k − 1)/k       (n − 1)/n
    each, one point from each original
    cluster
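The closed forms in Table 1 can be spot-checked numerically. A brief sketch (ours) verifying the "two clusters joined" row of Table 1A against brute-force pair counting, here with k = 4 clusters of n = 5 points:

```python
import numpy as np

def rand_c(y, yp):
    """Brute-force c(Y, Y') over all point-pairs i < j."""
    y, yp = np.asarray(y), np.asarray(yp)
    iu = np.triu_indices(len(y), k=1)
    same = (y[:, None] == y[None, :])[iu]
    samep = (yp[:, None] == yp[None, :])[iu]
    return (same == samep).mean()

k, n = 4, 5
y = np.repeat(np.arange(k), n)                 # initial Y: k clusters of n points
yp = np.where(y == 1, 0, y)                    # Y': clusters 0 and 1 joined
print(rand_c(y, yp))                           # 0.86842...
print(((k**2 - 2) * n - k) / (k**2 * n - k))   # same value from Table 1A
```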
3. EVALUATION OF METHODS

The most important problem facing an investigator with data he would like to examine by clustering methods is that of which method to use. While many researchers would be unable to specify the specific method which best suited their needs, most could suggest characteristics such a method should possess. The preceding machinery allows clustering methods to be evaluated with respect to such requirements. The following four questions illustrate this process of evaluation for four fundamental aspects of clustering methods. Each is translated into a distribution problem which is investigated in the next section by Monte Carlo techniques for two specific clustering methods.

3.1 How well does a method retrieve "natural" clusters?

Clustering methods produce clusterings irrespective of the presence or absence of "natural" structure within the data. An important consideration is what happens when there is obvious structure present. Consider N points drawn randomly from K distinct distributions (differing only in location) with ni ≥ 1 points from the ith distribution and denote as Y the clustering which clusters together points which are drawn from the same distribution. A specific method applied to these data produces a clustering, Y'. For each sample of N points c(Y, Y') can be calculated; its distribution represents how well the method retrieves clusters inherent in the data.

3.2 How sensitive is a method to perturbation of the data?

In many applications it is not known whether the data are good representations of their respective populations. The changes of clustering which result from slight movement of points are therefore of critical importance in both choice of methods and interpretation of results. Consider N points drawn randomly from a specific distribution and clustered by a specific method as Y. Add to each point a quantity drawn from a distribution with zero mean and small variance and cluster these perturbed points by the same method to produce Y'. The distribution of c(Y, Y') indicates the sensitivity of the particular method to errors of measurement or resampling.

3.3 How sensitive is a method to missing individuals?

Sometimes an investigator knows, or suspects, that his data set is incomplete, that whole subpopulations are missing or not well represented. In this case he is interested in the agreement or lack of agreement between the clusterings he derives from the data he has and the clusterings he would get if he had more data. Consider N1 points drawn from a single distribution and clustered by a specific method as Y. If N2 additional points are drawn from the same distribution and all N1 + N2 points are clustered by the same method as Y', the assignments of the original N1 points in Y' define the clustering Y''. Assuming that Y'' represents how the initial N1 points should have been clustered, c(Y, Y'') describes how close the specific method comes to finding this clustering using only those N1 points.

3.4 Given two methods, when do they produce different results when applied to the same data?

An investigator, trying to choose between methods, would be helped by knowing how different the methods are in terms of the clusterings they produce. Given, for example, a complex method which requires much computing, it would be of value to determine if a simpler method could be used as an approximation. (This question also suggests a stopping criterion for iterative methods. Thus, a method could be iterated until results of successive steps agreed within a prechosen similarity.) Consider N points drawn from a given distribution. Clustering by one method as Y and by another as Y' permits the agreement between the two methods for any specific number of clusters to be measured as c(Y, Y').
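Each of the four questions reduces to sampling the distribution of c under a simulation design. As one illustration, here is a sketch (ours) of question 3.4 using scikit-learn; since the T/N and AA criteria of the next section are not library built-ins, two standard linkage rules stand in for the pair of methods being compared:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import rand_score          # rand_score(y, y') == c(Y, Y')

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))                # N = 30 points, no structure

for K in range(2, 11):
    y1 = AgglomerativeClustering(n_clusters=K, linkage="single").fit_predict(X)
    y2 = AgglomerativeClustering(n_clusters=K, linkage="complete").fit_predict(X)
    print(K, round(rand_score(y1, y2), 2))      # agreement at each K
```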
4. EXAMPLES OF USAGE

These four questions were formulated as procedures for the evaluation of methods (details are given later in this section) and applied to two simple clustering methods. These methods are both agglomerative in that, given a best clustering of K clusters, they examine all clusterings of K − 1 clusters formed by joining pairs of clusters, and select one for which their specific criterion is a minimum. These methods were applied in a stepwise fashion, starting with the clustering in which each point is itself a cluster and proceeding until all points are in a single cluster. Thus for each set of points a sequence of best clusterings is generated. The important question of how to choose which is the best number of clusters is not considered here.

The first clustering method, denoted by T/N, takes for its criterion the sum of all within distances (those distances between points which are in the same cluster) divided by the number of such distances. Denoting the total of the distances between the n_k points in the kth cluster as W_k, this criterion is written \sum_k W_k / \sum_k \binom{n_k}{2}, where the summation is over all K clusters. The second method, denoted by AA, takes as its criterion the average of the average within distance, or (1/K) \sum_k W_k / \binom{n_k}{2}. These methods were chosen for their simplicity and similarity, since both minimize a type of within distance.

The procedures were applied by means of Monte Carlo simulation, each being replicated 100 times. The distribution of c is described by three statistics: the mean, the standard deviation, and the percentage of complete agreement. Sample results for K = 2, 3, ..., 10 are displayed in Tables 2 through 5.

For retrieval (Section 3.1), six points were drawn randomly from each of five five-dimensional normal populations with unity covariance matrix and means symmetrically four units apart. These points were clustered by each of the two methods to produce sequences of clusterings. These clusterings were then compared (c calculated) with the clustering which clustered together those points drawn from the same population. Table 2 displays the results of the application of this procedure.

2. RETRIEVAL: COMPARISON OF THE ABILITY OF TWO CLUSTERING METHODS
   TO RETRIEVE FIVE MULTIVARIATE NORMAL POPULATIONS

  Number of    Average of c     Standard deviation of c    Percentage of complete agreement
  clusters     T/N     AA       T/N      AA                T/N     AA
  10           .87     .71      .020     .018              0       0
  9            .88     .68      .023     .019              0       0
  8            .88     .68      .027     .016              0       0
  7            .89     .63      .034     .015              0       0
  6            .89     .56      .039     .013              0       0
  5            .88     .50      .046     .013              3       0
  4            .84     .43      .038     .013              0       0
  3            .75     .35      .045     .007              0       0
  2            .60     .26      .048     .004              0       0
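The two criteria defined above are simple enough to implement directly. Below is a compact sketch (ours, not the paper's program) of the stepwise agglomerative search with either criterion; Euclidean distance and the handling of singleton clusters under AA (a one-point cluster contributes an average within distance of zero) are our assumptions:

```python
import numpy as np
from itertools import combinations

def criterion(clusters, D, kind):
    """T/N: sum of within distances over number of within pairs.
       AA : mean over clusters of each cluster's average within distance."""
    W, P = [], []
    for c in clusters:
        pairs = list(combinations(c, 2))
        W.append(sum(D[i, j] for i, j in pairs))
        P.append(len(pairs))
    if kind == "T/N":
        return sum(W) / max(sum(P), 1)
    return np.mean([w / p if p else 0.0 for w, p in zip(W, P)])   # AA

def stepwise(X, kind):
    """Greedy agglomeration: from singletons, repeatedly join the pair of
    clusters whose merge minimizes the criterion; record each K's result."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    clusters = [[i] for i in range(len(X))]
    best_by_K = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        def merged(a, b):
            rest = [c for i, c in enumerate(clusters) if i not in (a, b)]
            return rest + [clusters[a] + clusters[b]]
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: criterion(merged(*ab), D, kind))
        clusters = merged(a, b)
        best_by_K[len(clusters)] = [c[:] for c in clusters]
    return best_by_K

rng = np.random.default_rng(1)
X = rng.standard_normal((12, 5))
print(stepwise(X, "T/N")[3])        # the K = 3 clustering chosen by T/N
```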

For perturbation (Section 3.2), 30 points were drawn randomly from a five-dimensional unit normal distribution and a clustering sequence produced by application of each method. Random perturbations drawn from N(0, .01) were added to each coordinate of each of the 30 points and a new clustered sequence derived. These sequences were compared for each K to produce Table 3.

3. PERTURBATION: COMPARISON OF THE SENSITIVITY OF TWO CLUSTERING
   METHODS TO SLIGHT MOVEMENT OF OBJECTS BEING CLUSTERED^a

  Number of    Average of c     Standard deviation of c    Percentage of complete agreement
  clusters     T/N     AA       T/N      AA                T/N     AA
  10           .92     .68      .080     .022              0       0
  9            .91     .65      .082     .022              0       0
  8            .89     .65      .082     .028              0       0
  7            .87     .61      .091     .035              0       0
  6            .85     .59      .110     .041              0       0
  5            .81     .60      .116     .046              0       0
  4            .76     .63      .121     .060              0       1
  3            .68     .70      .106     .075              0       2
  2            .61     .81      .091     .132              0       17

^a The similarity of the clusterings of the data before and after perturbation is measured by c.

For missing data (Section 3.3), 25 points were drawn from a five-dimensional unit normal distribution and clustered. Then an additional five points were drawn from the same distribution and the total 30 points clustered. The assignments of the original 25 points within this sequence of clusterings were then compared with the clustering sequence based on only the original 25 for each K. The results are summarized in Table 4.

4. MISSING DATA: COMPARISON OF TWO CLUSTERING METHODS IN TERMS OF THE
   EFFECT ELIMINATION OF OBJECTS HAS ON THE ASSIGNMENT OF THOSE LEFT^a

  Number of    Average of c     Standard deviation of c    Percentage of complete agreement
  clusters     T/N     AA       T/N      AA                T/N     AA
  10           .96     .84      .020     .102              6       6
  9            .95     .82      .023     .103              2       4
  8            .94     .80      .030     .106              3       0
  7            .93     .77      .032     .117              1       1
  6            .91     .75      .039     .119              1       2
  5            .87     .78      .054     .133              1       16
  4            .82     .81      .071     .131              2       24
  3            .76     .88      .103     .114              3       42
  2            .68     .94      .163     .104              7       72

^a The similarity of clusterings with and without these additional objects is measured by c.
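The missing-data procedure behind Table 4 can likewise be reproduced with stock tools. A sketch (ours), assuming SciPy and scikit-learn are available, with average-linkage hierarchical clustering standing in for the paper's T/N and AA methods:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import rand_score

rng = np.random.default_rng(2)
X1 = rng.standard_normal((25, 5))        # the N1 = 25 original points
X2 = rng.standard_normal((5, 5))         # N2 = 5 additional points
Z1 = linkage(X1, method="average")       # cluster the original points -> Y
Zall = linkage(np.vstack([X1, X2]), method="average")   # all 30 points -> Y'

for K in range(2, 11):
    y = fcluster(Z1, t=K, criterion="maxclust")
    ypp = fcluster(Zall, t=K, criterion="maxclust")[:25]  # Y'': Y' on N1 only
    print(K, round(rand_score(y, ypp), 2))                # c(Y, Y'') per K
```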
For comparison of the two methods (Section 3.4), 30 points were randomly chosen from a five-dimensional unit normal distribution. Each method was applied to produce a clustering sequence and the sequences compared for each value of K (see Table 5).

5. DIRECT COMPARISON OF THE AGREEMENT BETWEEN TWO CLUSTERING METHODS,
   T/N AND AA, WHEN APPLIED TO SAME SETS OF DATA

  Number of    Average    Standard deviation    Percentage of complete
  clusters     of c       of c                  agreement
  10           .76        .015                  0
  9            .72        .018                  0
  8            .71        .022                  0
  7            .64        .019                  0
  6            .57        .023                  0
  5            .50        .025                  0
  4            .44        .027                  0
  3            .43        .032                  0
  2            .52        .041                  0

The application of these procedures illustrates their utility. Table 5 shows that the examined methods are not similar, while the other three tables indicate where the dissimilarities lie. Essentially, method T/N is better for retrieval of structure while method AA is less affected by missing data. The situation with regard to perturbation illustrates the further generalization that method AA is better for small K while method T/N is better for larger K.

REFERENCES

[1] Ball, Geoffrey H., "Data Analysis in the Social Sciences: What About the Details?" in AFIPS Conference Proceedings, Vol. 27, Part 1: Fall Joint Computer Conference, (1965), 533-59.
[2] Fisher, Walter D., "On Grouping for Maximum Homogeneity," Journal of the American Statistical Association, 53 (December 1958), 789-98.
[3] Fortier, J. J. and Solomon, H., "Clustering Procedures," in P. R. Krishnaiah, ed., Multivariate Analysis, New York: Academic Press, 1966, 493-506.
[4] Friedman, H. P. and Rubin, J., "On Some Invariant Criteria for Grouping Data," Journal of the American Statistical Association, 62 (December 1967), 1159-78.
[5] Jardine, N. and Sibson, R., "The Construction of Hierarchic and Non-hierarchic Classifications," The Computer Journal, 11 (August 1968), 177-84.
[6] Lance, G. N. and Williams, W. T., "A General Theory of Classificatory Sorting Strategies. I. Hierarchical Systems," The Computer Journal, 9 (February 1967), 373-80.
[7] Lance, G. N. and Williams, W. T., "A General Theory of Classificatory Sorting Strategies. II. Clustering Systems," The Computer Journal, 10 (November 1967), 271-7.
[8] Rubin, J., "Optimal Classification into Groups: An Approach for Solving the Taxonomy Problem," Journal of Theoretical Biology, 15 (April 1967), 103-44.
[9] Ward, Joe H., Jr., "Hierarchical Grouping to Optimize an Objective Function," Journal of the American Statistical Association, 58 (March 1963), 236-44.
