
Objective Criteria for the Evaluation of Clustering Methods
Author(s): William M. Rand
Source: Journal of the American Statistical Association, Vol. 66, No. 336 (Dec., 1971), pp. 846-850
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2284239

Journal of the American Statistical Association
December 1971, Volume 66, Number 336
Theory and Methods Section

Objective Criteria for the Evaluation of Clustering Methods

WILLIAM M. RAND*

* William M. Rand is assistant professor of biostatistics, Massachusetts Institute of Technology, Cambridge, Mass. 02139. This article is based in part on the author's doctoral dissertation, submitted to the University of California at Los Angeles and sponsored by PHS grant GM-00049 and NIH grant FR-3. The author wishes to thank Professor A. A. Afifi, UCLA, for valuable suggestions.

Many intuitively appealing methods have been suggested for clustering data; however, interpretation of their results has been hindered by the lack of objective criteria. This article proposes several criteria which isolate specific aspects of the performance of a method, such as its retrieval of inherent structure, its sensitivity to resampling, and the stability of its results in the light of new data. These criteria depend on a measure of similarity between two different clusterings of the same set of data; the measure essentially considers how each pair of data points is assigned in each clustering.

1. INTRODUCTION AND STATEMENT OF PROBLEM

Cluster analysis has come to be used as a generic term for techniques which are concerned with the problem: given a number of objects, how to select those which are closer to each other than they are to the rest of the objects. While the general problems of clustering have long intrigued investigators, the necessity and recent availability of large computing power has shaped much of the recent research (see [1, 6, 7], which contain computer-oriented reviews of recent effort). Although some theoretical investigations have been made [e.g., 2, 4, 5], most effort has been directed toward the investigation and development of specific methods in specific situations [e.g., 3, 4, 8, 9]. Much of this research has been directed toward finding natural definitions of the term closer and developing and evaluating clustering methods in these terms. This article assumes that every definition of "closer" is natural for some situation and therefore that the problem can be considered without this aspect. This article focuses instead on the development of procedures for evaluating clustering methods in objective terms of how they cluster.

The specific clustering problem discussed is defined as the study of the triplet (X, Y, m). In this notation X represents the set of N objects (or points) to be clustered, X = {X1, X2, ..., XN}, and Y, a specific partitioning of these objects into K disjoint sets. This partitioning will be called a clustering and written as a set of clusters, Y = {Y1, Y2, ..., YK}, where each cluster is a set of the given points, Yk = {Xk1, Xk2, ..., Xknk}, with Σ nk = N and nk ≥ 1 for k = 1, 2, ..., K. The set of all such partitions, for all K = 1, 2, ..., N, of a given set of N points will be denoted by 𝒴.

The symbol m is used for the method of choosing a particular Y given the set X. In general, clustering methods have two components, a criterion and a technique. The criterion assigns to each clustering a numerical value which indicates its relative desirability in the context of the given method. The technique selects a particular subset of 𝒴 to be searched for a clustering which minimizes (maximizes) the given criterion. The technique is an essential part of any operational method, since even for moderate N, the number of elements of 𝒴 precludes examination of all possible clusterings.
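To make the notation concrete, the following minimal Python sketch (ours, not from the paper) represents a clustering Y as a list of clusters and checks that it is a valid partition of X with every nk ≥ 1. (The size of 𝒴 is the Nth Bell number, which grows faster than exponentially in N, hence the remark above about exhaustive search.)

```python
from itertools import chain

def is_clustering(Y, X):
    """True if Y = {Y1, ..., YK} partitions X into disjoint clusters,
    each with n_k >= 1, as the paper's notation requires."""
    clusters = [set(c) for c in Y]
    if any(not c for c in clusters):
        return False                               # every n_k must be >= 1
    union = set(chain.from_iterable(clusters))
    disjoint = sum(map(len, clusters)) == len(union)
    return disjoint and union == set(X)

X = {"a", "b", "c", "d", "e", "f"}
Y = [{"a", "b", "c"}, {"d", "e", "f"}]             # a clustering with K = 2
print(is_clustering(Y, X))                         # True
```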
2. A STATISTIC FOR INVESTIGATION OF THE CLUSTERING PROBLEM

There are two general ways of comparing clustering methods; the first is to consider how easy they are to use, and the second is to evaluate how well they perform when used. The first type is usually computer-oriented and considers execution time and storage requirements (see [1]). In actual application of any clustering method, such a criterion must be considered. Yet how well a clustering method performs is still the ultimate concern. While a quick good method is always better than a slow bad one, the choice between a quick bad method and a slow good one requires some quantification of what makes a method good or bad.

Evaluation of the performance of a clustering method requires some means of comparing its results to either standard results, or to the results of another method. This section develops such a measure of the similarity between clusterings, while the next section proposes several standard situations in which this measure can be used to evaluate methods on the basis of different aspects of their performance.

In standard classification problems, such as discriminant function analysis, there is a correct classification against which to view the results of classification schemes. Often a measure of the "goodness" of such a system is a simple count of the points misclassified or this number normalized into percent error. In the clustering situation organization is sought which arises from the data themselves, and there is no absolute scheme with which to measure clusterings. However, there is a natural extension of this idea involving the comparison of two arbitrary clusterings.

Motivation of this measure follows three basic considerations. First, clustering is discrete in the sense that every point is unequivocally assigned to a specific cluster. Second, clusters are defined just as much by those points which they do not contain as by those points which they do contain. Third, all points are of equal importance in the determination of clusterings. While specific examples in which these considerations do not hold can be devised, these three assumptions form the basis for a general clustering problem. From them it follows that a basic unit of comparison between two clusterings is how pairs of points are clustered. If the elements of an individual point-pair are placed together in a cluster in each of the two clusterings, or if they are assigned to different clusters in both clusterings, this represents a similarity between the clusterings, as opposed to the case in which the elements of the point-pair are in the same cluster in one clustering and in different clusters in the other. From this, a measure of the similarity between two clusterings of the same data, Y and Y', can be defined as c(Y, Y') equal to the number of similar assignments of point-pairs normalized by the total number of point-pairs.

Consider the following illustration which calculates c between two clusterings of six points. Let Y = {(a, b, c), (d, e, f)} and Y' = {(a, b), (c, d, e), (f)}; then the point-pairs are tabulated as follows:

  Together in both:  ab, de                        (2 pairs)
  Separate in both:  ad, ae, af, bd, be, bf, cf    (7 pairs)
  Mixed:             ac, bc, cd, ce, df, ef        (6 pairs)

The total of nine similarities out of a possible 15 gives c(Y, Y') = .6.
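This pair-counting definition translates directly into code. A minimal Python sketch (ours; the name rand_c is not from the paper) that reproduces the 9 of 15 count above:

```python
from itertools import combinations

def rand_c(Y, Yp):
    """c(Y, Y'): the fraction of point-pairs treated alike by both
    clusterings (together in both, or separate in both)."""
    k1 = {x: i for i, cl in enumerate(Y) for x in cl}    # point -> cluster in Y
    k2 = {x: i for i, cl in enumerate(Yp) for x in cl}   # point -> cluster in Y'
    pairs = list(combinations(sorted(k1), 2))
    agree = sum((k1[a] == k1[b]) == (k2[a] == k2[b]) for a, b in pairs)
    return agree / len(pairs)

Y  = [("a", "b", "c"), ("d", "e", "f")]
Yp = [("a", "b"), ("c", "d", "e"), ("f",)]
print(rand_c(Y, Yp))                                     # 0.6
```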
More precisely, given N points, X1, X2, ..., XN, and two clusterings of them, Y = {Y1, ..., YK1} and Y' = {Y'1, ..., Y'K2}, we define

    c(Y, Y') = \sum_{i<j}^{N} \gamma_{ij} / \binom{N}{2}    (2.1)

where

    \gamma_{ij} = 1  if there exist k and k' such that both X_i and X_j are in both Y_k and Y'_{k'},
    \gamma_{ij} = 1  if there exist k and k' such that X_i is in both Y_k and Y'_{k'} while X_j is in neither Y_k nor Y'_{k'},
    \gamma_{ij} = 0  otherwise.

Note that there is a simple computational form for c. Given a pair of clusterings Y and Y' of the same N points, arbitrarily number the clusters in each clustering and let n_{ij} be the number of points simultaneously in the ith cluster of Y and the jth cluster of Y'. Then the similarity between Y and Y' is:

    c(Y, Y') = \left[ \binom{N}{2} - \left[ \tfrac{1}{2} \left\{ \sum_i \Big( \sum_j n_{ij} \Big)^2 + \sum_j \Big( \sum_i n_{ij} \Big)^2 \right\} - \sum_i \sum_j n_{ij}^2 \right] \right] / \binom{N}{2}    (2.2)

There are three fundamental properties of c. First, it is a measure of similarity, ranging from c = 0 when the two clusterings have no similarities (i.e., when one consists of a single cluster and the other only of clusters containing single points), to c = 1 when the clusterings are in fact identical. Second, while c is a measure of similarity, 1 - c is a measure of distance, being a metric on the set of all clusterings of a given set of points, 𝒴. Third, if a distribution is assumed for X, and under various conditions to be discussed in the next section, c is a random variable.
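Equation (2.2) computes the same quantity from the contingency table n_ij without enumerating pairs. A sketch of that computational form (ours), which returns the same .6 for the six-point example; recent versions of scikit-learn expose this quantity as sklearn.metrics.rand_score:

```python
import numpy as np

def rand_c_table(y, yp):
    """c(Y, Y') via equation (2.2); y and yp are integer cluster labels
    for the same N points."""
    y, yp = np.asarray(y), np.asarray(yp)
    N = len(y)
    n = np.zeros((y.max() + 1, yp.max() + 1), dtype=np.int64)
    np.add.at(n, (y, yp), 1)                 # n_ij: the contingency table
    pairs = N * (N - 1) // 2                 # (N choose 2)
    penalty = 0.5 * ((n.sum(axis=1) ** 2).sum()
                     + (n.sum(axis=0) ** 2).sum()) - (n ** 2).sum()
    return (pairs - penalty) / pairs

# the six points a..f of the illustration, as label vectors
print(rand_c_table([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 2]))   # 0.6
```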
Since operational clustering methods search a subset of 𝒴 which is usually defined as all of a certain type of rearrangement of a specific initial clustering (such as all clusterings formed by joining pairs of clusters of the original clustering), it is of interest to examine the behavior of the measure c in some of these situations. Table 1 displays several comparisons between similar clusterings and their limits as the number of points and clusters increase. Given an initial clustering Y having k clusters with n points in each, Table 1A considers the similarity between this clustering and new clusterings formed from Y by various simple operations. Table 1B considers the similarity between two clusterings each of which is formed by the same operation from the same initial clustering but with the operation applied to different parts of the original clustering. Table 1C shows the similarity of the initial clustering to certain standard clusterings, and to the clustering which can be considered the opposite clustering, that which consists of n clusters each of which contains k points, one from each of the original k clusters.

Note that in all three parts of the table the similarities depend on both k, the number of clusters, and n, the number of points in each cluster. This follows directly from the fact that the similarity of two clusterings is essentially the proportion of point-pairs whose relationship is the same in both. Thus the joining of two of three clusters is much more significant than the joining of two of thirty (given equal sized clusters).

1. EXPRESSIONS FOR THE MEASURE c BETWEEN TWO SIMILAR CLUSTERINGS,
   GIVEN AN INITIAL CLUSTERING, Y, WHICH HAS k CLUSTERS OF n POINTS EACH

A. c(Y, Y'), where Y' is a simple modification of Y

  Modification of Y                        c(Y, Y')                        Limit, n → ∞    Limit, k → ∞
  Two clusters joined                      ((k² − 2)n − k)/(k²n − k)       (k² − 2)/k²     1
  One cluster split into two               ((2k² − 1)n − 2k)/(2k²n − 2k)   (2k² − 1)/2k²   1
    equal parts (n even)
  One cluster split into single            ((k² − 1)n − k + 1)/(k²n − k)   (k² − 1)/k²     1
    point clusters
  One point taken from each cluster        (kn² − 3n − k + 3)/(kn² − n)    1               (n² − 1)/n²
    to form a new cluster of k points

B. c(Y', Y''), where Y' and Y'' are similar modifications of the original clustering Y

  Differences between Y' and Y''           c(Y', Y'')                      Limit, n → ∞    Limit, k → ∞
  Movement of a point to different         (k²n − k − 4)/(k²n − k)         1               1
    clusters
  Different clusters split into two        ((k² − 1)n − k)/(k²n − k)       (k² − 1)/k²     1
    equal parts (n even)
  Different pairs of clusters joined       ((k² − 4)n − k)/(k²n − k)       (k² − 4)/k²     1

C. c(Y, Y'), where Y' is a major modification of the original clustering Y

  Modification of Y                        c(Y, Y')                        Limit, n → ∞    Limit, k → ∞
  All clusters joined into one             (n − 1)/(kn − 1)                1/k             0
    large cluster
  All clusters split into single           (k − 1)n/(kn − 1)               (k − 1)/k       1
    point clusters
  n clusters formed with k points in       (k − 1)(n − 1)/(kn − 1)         (k − 1)/k       (n − 1)/n
    each, one point from each original
    cluster
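The closed forms in Table 1 can be spot-checked numerically. A brief sketch (ours) verifying the "two clusters joined" row of Table 1A against brute-force pair counting, here with k = 4 clusters of n = 5 points:

```python
import numpy as np

def rand_c(y, yp):
    """Brute-force c(Y, Y') over all point-pairs i < j."""
    y, yp = np.asarray(y), np.asarray(yp)
    iu = np.triu_indices(len(y), k=1)
    same = (y[:, None] == y[None, :])[iu]
    samep = (yp[:, None] == yp[None, :])[iu]
    return (same == samep).mean()

k, n = 4, 5
y = np.repeat(np.arange(k), n)                 # initial Y: k clusters of n points
yp = np.where(y == 1, 0, y)                    # Y': clusters 0 and 1 joined
print(rand_c(y, yp))                           # 0.86842...
print(((k**2 - 2) * n - k) / (k**2 * n - k))   # same value from Table 1A
```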
3. EVALUATION OF METHODS

The most important problem facing an investigator with data he would like to examine by clustering methods is that of which method to use. While many researchers would be unable to specify the specific method which best suited their needs, most could suggest characteristics such a method should possess. The preceding machinery allows clustering methods to be evaluated with respect to such requirements. The following four questions illustrate this process of evaluation for four fundamental aspects of clustering methods. Each is translated into a distribution problem which is investigated in the next section by Monte Carlo techniques for two specific clustering methods.

3.1 How well does a method retrieve "natural" clusters?

Clustering methods produce clusterings irrespective of the presence or absence of "natural" structure within the data. An important consideration is what happens when there is obvious structure present. Consider N points drawn randomly from K distinct distributions (differing only in location) with ni ≥ 1 points from the ith distribution and denote as Y the clustering which clusters together points which are drawn from the same distribution. A specific method applied to these data produces a clustering, Y'. For each sample of N points c(Y, Y') can be calculated; its distribution represents how well the method retrieves clusters inherent in the data.

3.2 How sensitive is a method to perturbation of the data?

In many applications it is not known whether the data are good representations of their respective populations. The changes of clustering which result from slight movement of points are therefore of critical importance in both choice of methods and interpretation of results. Consider N points drawn randomly from a specific distribution and clustered by a specific method as Y. Add to each point a quantity drawn from a distribution with zero mean and small variance and cluster these perturbed points by the same method to produce Y'. The distribution of c(Y, Y') indicates the sensitivity of the particular method to errors of measurement or resampling.

3.3 How sensitive is a method to missing individuals?

Sometimes an investigator knows, or suspects, that his data set is incomplete, that whole subpopulations are missing or not well represented. In this case he is interested in the agreement or lack of agreement between the clusterings he derives from the data he has and the clusterings he would get if he had more data. Consider N1 points drawn from a single distribution and clustered by a specific method as Y. If N2 additional points are drawn from the same distribution and all N1 + N2 points are clustered by the same method as Y', the assignments of the original N1 points in Y' define the clustering Y''. Assuming that Y'' represents how the initial N1 points should have been clustered, c(Y, Y'') describes how close the specific method comes to finding this clustering using only those N1 points.

3.4 Given two methods, when do they produce different results when applied to the same data?

An investigator, trying to choose between methods, would be helped by knowing how different the methods are in terms of the clusterings they produce. Given, for example, a complex method which requires much computing, it would be of value to determine if a simpler method could be used as an approximation. (This question also suggests a stopping criterion for iterative methods. Thus, a method could be iterated until results of successive steps agreed within a prechosen similarity.) Consider N points drawn from a given distribution. Clustering by one method as Y and by another as Y' permits the agreement between the two methods for any specific number of clusters to be measured as c(Y, Y').
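Each of the four questions reduces to sampling the distribution of c under a simulation design. As one illustration, here is a sketch (ours) of question 3.4 using scikit-learn; since the T/N and AA criteria of the next section are not library built-ins, two standard linkage rules stand in for the pair of methods being compared:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import rand_score          # rand_score(y, y') == c(Y, Y')

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))                # N = 30 points, no structure

for K in range(2, 11):
    y1 = AgglomerativeClustering(n_clusters=K, linkage="single").fit_predict(X)
    y2 = AgglomerativeClustering(n_clusters=K, linkage="complete").fit_predict(X)
    print(K, round(rand_score(y1, y2), 2))      # agreement at each K
```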
4. EXAMPLES OF USAGE

These four questions were formulated as procedures for the evaluation of methods (details are given later in this section) and applied to two simple clustering methods. These methods are both agglomerative in that, given a best clustering of K clusters, they examine all clusterings of K − 1 clusters formed by joining pairs of clusters, and select one for which their specific criterion is a minimum. These methods were applied in a stepwise fashion, starting with the clustering in which each point is itself a cluster and proceeding until all points are in a single cluster. Thus for each set of points a sequence of best clusterings is generated. The important question of how to choose which is the best number of clusters is not considered here.

The first clustering method, denoted by T/N, takes for its criterion the sum of all within distances (those distances between points which are in the same cluster) divided by the number of such distances. Denoting the total of the distances between the n_k points in the kth cluster as W_k, this criterion is written \sum_k W_k / \sum_k \binom{n_k}{2}, where the summation is over all K clusters. The second method, denoted by AA, takes as its criterion the average of the average within distance, or (1/K) \sum_k W_k / \binom{n_k}{2}. These methods were chosen for their simplicity and similarity, since both minimize a type of within distance.

The procedures were applied by means of Monte Carlo simulation, each being replicated 100 times. The distribution of c is described by three statistics: the mean, the standard deviation, and the percentage of complete agreement. Sample results for K = 2, 3, ..., 10 are displayed in Tables 2 through 5.

For retrieval (Section 3.1), six points were drawn randomly from each of five five-dimensional normal populations with unity covariance matrix and means symmetrically four units apart. These points were clustered by each of the two methods to produce sequences of clusterings. These clusterings were then compared (c calculated) with the clustering which clustered together those points drawn from the same population. Table 2 displays the results of the application of this procedure.

2. RETRIEVAL: COMPARISON OF THE ABILITY OF TWO CLUSTERING METHODS
   TO RETRIEVE FIVE MULTIVARIATE NORMAL POPULATIONS

  Number of    Average of c     Standard deviation of c    Percentage of complete agreement
  clusters     T/N     AA       T/N      AA                T/N     AA
  10           .87     .71      .020     .018              0       0
  9            .88     .68      .023     .019              0       0
  8            .88     .68      .027     .016              0       0
  7            .89     .63      .034     .015              0       0
  6            .89     .56      .039     .013              0       0
  5            .88     .50      .046     .013              3       0
  4            .84     .43      .038     .013              0       0
  3            .75     .35      .045     .007              0       0
  2            .60     .26      .048     .004              0       0
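The two criteria defined above are simple enough to implement directly. Below is a compact sketch (ours, not the paper's program) of the stepwise agglomerative search with either criterion; Euclidean distance and the handling of singleton clusters under AA (a one-point cluster contributes an average within distance of zero) are our assumptions:

```python
import numpy as np
from itertools import combinations

def criterion(clusters, D, kind):
    """T/N: sum of within distances over number of within pairs.
       AA : mean over clusters of each cluster's average within distance."""
    W, P = [], []
    for c in clusters:
        pairs = list(combinations(c, 2))
        W.append(sum(D[i, j] for i, j in pairs))
        P.append(len(pairs))
    if kind == "T/N":
        return sum(W) / max(sum(P), 1)
    return np.mean([w / p if p else 0.0 for w, p in zip(W, P)])   # AA

def stepwise(X, kind):
    """Greedy agglomeration: from singletons, repeatedly join the pair of
    clusters whose merge minimizes the criterion; record each K's result."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    clusters = [[i] for i in range(len(X))]
    best_by_K = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        def merged(a, b):
            rest = [c for i, c in enumerate(clusters) if i not in (a, b)]
            return rest + [clusters[a] + clusters[b]]
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: criterion(merged(*ab), D, kind))
        clusters = merged(a, b)
        best_by_K[len(clusters)] = [c[:] for c in clusters]
    return best_by_K

rng = np.random.default_rng(1)
X = rng.standard_normal((12, 5))
print(stepwise(X, "T/N")[3])        # the K = 3 clustering chosen by T/N
```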

For perturbation (Section 3.2), 30 points were drawn randomly from a five-dimensional unit normal distribution and a clustering sequence produced by application of each method. Random perturbations drawn from N(0, .01) were added to each coordinate of each of the 30 points and a new clustered sequence derived. These sequences were compared for each K to produce Table 3.

3. PERTURBATION: COMPARISON OF THE SENSITIVITY OF TWO CLUSTERING
   METHODS TO SLIGHT MOVEMENT OF OBJECTS BEING CLUSTERED^a

  Number of    Average of c     Standard deviation of c    Percentage of complete agreement
  clusters     T/N     AA       T/N      AA                T/N     AA
  10           .92     .68      .080     .022              0       0
  9            .91     .65      .082     .022              0       0
  8            .89     .65      .082     .028              0       0
  7            .87     .61      .091     .035              0       0
  6            .85     .59      .110     .041              0       0
  5            .81     .60      .116     .046              0       0
  4            .76     .63      .121     .060              0       1
  3            .68     .70      .106     .075              0       2
  2            .61     .81      .091     .132              0       17

^a The similarity of the clusterings of the data before and after perturbation is measured by c.

For missing data (Section 3.3), 25 points were drawn from a five-dimensional unit normal distribution and clustered. Then an additional five points were drawn from the same distribution and the total 30 points clustered. The assignments of the original 25 points within this sequence of clusterings were then compared with the clustering sequence based on only the original 25 for each K. The results are summarized in Table 4.

4. MISSING DATA: COMPARISON OF TWO CLUSTERING METHODS IN TERMS OF THE
   EFFECT ELIMINATION OF OBJECTS HAS ON THE ASSIGNMENT OF THOSE LEFT^a

  Number of    Average of c     Standard deviation of c    Percentage of complete agreement
  clusters     T/N     AA       T/N      AA                T/N     AA
  10           .96     .84      .020     .102              6       6
  9            .95     .82      .023     .103              2       4
  8            .94     .80      .030     .106              3       0
  7            .93     .77      .032     .117              1       1
  6            .91     .75      .039     .119              1       2
  5            .87     .78      .054     .133              1       16
  4            .82     .81      .071     .131              2       24
  3            .76     .88      .103     .114              3       42
  2            .68     .94      .163     .104              7       72

^a The similarity of clusterings with and without these additional objects is measured by c.
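The missing-data procedure behind Table 4 can likewise be reproduced with stock tools. A sketch (ours), assuming SciPy and scikit-learn are available, with average-linkage hierarchical clustering standing in for the paper's T/N and AA methods:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import rand_score

rng = np.random.default_rng(2)
X1 = rng.standard_normal((25, 5))        # the N1 = 25 original points
X2 = rng.standard_normal((5, 5))         # N2 = 5 additional points
Z1 = linkage(X1, method="average")       # cluster the original points -> Y
Zall = linkage(np.vstack([X1, X2]), method="average")   # all 30 points -> Y'

for K in range(2, 11):
    y = fcluster(Z1, t=K, criterion="maxclust")
    ypp = fcluster(Zall, t=K, criterion="maxclust")[:25]  # Y'': Y' on N1 only
    print(K, round(rand_score(y, ypp), 2))                # c(Y, Y'') per K
```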
For comparison of the two methods (Section 3.4), 30 points were randomly chosen from a five-dimensional unit normal distribution. Each method was applied to produce a clustering sequence and the sequences compared for each value of K (see Table 5).

5. DIRECT COMPARISON OF THE AGREEMENT BETWEEN TWO CLUSTERING METHODS,
   T/N AND AA, WHEN APPLIED TO SAME SETS OF DATA

  Number of    Average    Standard deviation    Percentage of complete
  clusters     of c       of c                  agreement
  10           .76        .015                  0
  9            .72        .018                  0
  8            .71        .022                  0
  7            .64        .019                  0
  6            .57        .023                  0
  5            .50        .025                  0
  4            .44        .027                  0
  3            .43        .032                  0
  2            .52        .041                  0

The application of these procedures illustrates their utility. Table 5 shows that the examined methods are not similar, while the other three tables indicate where the dissimilarities lie. Essentially, method T/N is better for retrieval of structure while method AA is less affected by missing data. The situation with regard to perturbation illustrates the further generalization that method AA is better for small K while method T/N is better for larger K.

REFERENCES

[1] Ball, Geoffrey H., "Data Analysis in the Social Sciences: What About the Details?" in AFIPS Conference Proceedings, Vol. 27, Part 1: Fall Joint Computer Conference, (1965), 533-59.
[2] Fisher, Walter D., "On Grouping for Maximum Homogeneity," Journal of the American Statistical Association, 53 (December 1958), 789-98.
[3] Fortier, J. J. and Solomon, H., "Clustering Procedures," in P. R. Krishnaiah, ed., Multivariate Analysis, New York: Academic Press, 1966, 493-506.
[4] Friedman, H. P. and Rubin, J., "On Some Invariant Criteria for Grouping Data," Journal of the American Statistical Association, 62 (December 1967), 1159-78.
[5] Jardine, N. and Sibson, R., "The Construction of Hierarchic and Non-hierarchic Classifications," The Computer Journal, 11 (August 1968), 177-84.
[6] Lance, G. N. and Williams, W. T., "A General Theory of Classificatory Sorting Strategies. I. Hierarchical Systems," The Computer Journal, 9 (February 1967), 373-80.
[7] Lance, G. N. and Williams, W. T., "A General Theory of Classificatory Sorting Strategies. II. Clustering Systems," The Computer Journal, 10 (November 1967), 271-7.
[8] Rubin, J., "Optimal Classification into Groups: An Approach for Solving the Taxonomy Problem," Journal of Theoretical Biology, 15 (April 1967), 103-44.
[9] Ward, Joe H., Jr., "Hierarchical Grouping to Optimize an Objective Function," Journal of the American Statistical Association, 58 (March 1963), 236-44.
