Theor. Appl. Climatol. 57, 103–110 (1997)

Theoretical and Applied Climatology
© Springer-Verlag 1997
Printed in Austria

Potsdam Institute for Climate Impact Research, Potsdam, Germany

A Method to Estimate the Statistical Confidence of Cluster Separation

F.-W. Gerstengarbe and P. C. Werner

With 1 Figure

Received November 24, 1995
Revised May 2, 1996

Summary

Cluster analysis comprises several multivariate methods for the separation of patterns (clusters). The definition of the optimum or universally best cluster analysis is an unresolved issue. Three problems are of special importance: 1. the statistical confidence of cluster separation, 2. the definition of the optimal number of clusters, 3. the description of the internal cluster structure. Two new methods addressing these problems are presented. On the basis of non-hierarchical minimum-distance cluster analysis, a new method is described that allows clusters to be separated in a statistically well-founded way. This method solves problems one and two. Using a newly developed special rank-sum analysis, a solution to the third problem is possible. An example shows the practicability of the proposed procedures.

1. Introduction

The main idea of cluster analysis is to relate to each other an existing number M of elements $e_i$, each of which is described by N parameters p, i.e.:

$$ e_i = f(p_{i1}, \ldots, p_{iN}). \qquad (1) $$

Two main techniques are possible. Using hierarchical methods, different sequences of groups on different levels may be constructed. The result is a hierarchy of clusters in a "tree structure". The disadvantage of this technique is that elements can no longer be exchanged once the "tree structure" has been built up. This disadvantage restricts the application.

With non-hierarchical methods, the elements $e_i$ are simultaneously partitioned into a given number of clusters K. By displacing elements between the clusters according to a given quality criterion, a given initial partition is developed step by step into a steadily improving grouping until the optimum is reached; for more details, see Steinhausen and Langer (1977). The starting point of the following method is the non-hierarchical minimum-distance method according to Forgy (1965). The starting condition for this method is that the elements $e_i$ are equally distributed over a number K of given clusters (initial partition). With M given elements and K clusters, each cluster receives L = M/K elements as follows:

$$ \begin{aligned} e_1, \ldots, e_L &\in C_1 \\ e_{L+1}, \ldots, e_{2L} &\in C_2 \\ &\;\vdots \\ e_{(k-1)L+1}, \ldots, e_{kL} &\in C_k \end{aligned} \qquad (2) $$

(The number of clusters K must be defined empirically; the number of elements depends on the data series and the problem being investigated.)

A so-called group centroid $\bar{c}_k$ is then calculated for each of the K clusters.
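Eq. (2) amounts to cutting the sequence of elements into consecutive blocks of L elements. The paper sets L = M/K but does not say what happens when M is not a multiple of K (in the Potsdam example below, M = 101 and K = 9), so this small sketch, with the hypothetical function name `initial_partition`, assumes that the leftover elements are appended to the last cluster.

```python
def initial_partition(M: int, K: int) -> list[int]:
    """Initial partition of Eq. (2): consecutive blocks of L = M // K elements.

    Returns the 0-based cluster index of each element e_1, ..., e_M.
    Assumption: the M - K*L leftover elements are put into the last cluster.
    """
    if not 1 <= K <= M:
        raise ValueError("need 1 <= K <= M")
    L = M // K
    return [min(i // L, K - 1) for i in range(M)]


labels = initial_partition(101, 9)
print([labels.count(k) for k in range(9)])  # [11, 11, 11, 11, 11, 11, 11, 11, 13]
```

Other remainder rules (for example spreading the leftovers over the first clusters) would work equally well; the iteration that follows only needs some starting partition.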
The centroid is the cluster mean value of the existing parameters, which have to be normalized accordingly in the case of different scalings:

$$ \bar{c}_k = \frac{1}{L} \sum_{i=(k-1)L+1}^{kL} e_i \qquad (3) $$

By applying the Euclidean distance, the following objective function a(g) can be defined for each grouping step g:

$$ a(g) = \sum_{k=1}^{K} \sum_{i \in k} \lvert e_i - \bar{c}_k \rvert^2 \qquad (4) $$

With the Euclidean distance, each grouping step can be seen as a displacement of the element $e_i$ into the cluster that has the nearest centroid. The objective function can thus be minimized:

$$ a(g)\ \forall g \rightarrow \min \qquad (5) $$

This procedure is repeated until a local minimum of the objective function is reached. The objective function has reached a local minimum if two successive grouping steps give the same result; the iteration is then discontinued, i.e., the optimum classification with respect to the given number of clusters has been reached.

An important disadvantage of this method is that one does not know whether an absolute minimum or just a secondary minimum of the objective function has been obtained (Fovell and Fovell, 1993; Milligan and Cooper, 1985). For this reason the quality of separation is unknown, as is the objective number of clusters. The following procedure shows a solution to this problem.

2. Definition of a Quality Criterion to Separate Clusters

The quality criterion represents the statistical confidence of the cluster separation. It can be described as follows. After the local minimum has been reached, each cluster contains a generally varying number of elements. Each element is defined by N parameters, i.e., it is located in an N-dimensional parameter space. Since each cluster consists of a certain number of elements, each cluster forms a scatter plot of elements in this space. If the clustering leads to a local secondary minimum, overlaps occur between the scatter plots of single clusters. The principle of this method is presented in Fig. 1, which depicts the projection of two parameters within the N-dimensional space. The number of overlaps O of the two clusters a and b over the N parameters can accordingly be defined as follows:

$$ O_{a,b} = \sum_{i_a=1}^{L_a} \sum_{i_b=1}^{L_b} \sum_{j=1}^{N} o_{i_a,i_b,j}, \qquad a = 1, \ldots, k-1; \; b = 2, \ldots, k \qquad (6) $$

with

$$ o_{i_a,i_b,j} = \begin{cases} 1, & p_{i_b,j} \ge p_{i_a,j} \\ 0, & p_{i_b,j} < p_{i_a,j} \end{cases} \qquad (7) $$
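A minimal NumPy sketch of the counting rule of Eqs. (6) and (7), together with the maximum $N L_a L_b$ of Eq. (9), is given below. The function names are hypothetical, and the sketch assumes that the clusters are passed in order of decreasing centroid, so that cluster a always has the larger centroid; it illustrates the definition and is not the authors' code.

```python
import numpy as np


def overlap_count(A: np.ndarray, B: np.ndarray) -> int:
    """O_ab of Eqs. (6)-(7) for cluster a (rows of A) and cluster b (rows of B).

    A has shape (L_a, N), B has shape (L_b, N). Cluster a is assumed to have the
    larger centroid; o = 1 whenever an element of b reaches or exceeds an
    element of a in parameter j.
    """
    # Broadcast to shape (L_a, L_b, N) and count every comparison that holds.
    return int((B[np.newaxis, :, :] >= A[:, np.newaxis, :]).sum())


def max_overlap_count(A: np.ndarray, B: np.ndarray) -> int:
    """O^max_ab = N * L_a * L_b of Eq. (9)."""
    return A.shape[1] * A.shape[0] * B.shape[0]


def overlap_tables(clusters):
    """Upper-triangular tables of O_ab and O^max_ab for all pairs a < b.

    `clusters` is a list of (L_k, N) arrays ordered by decreasing centroid.
    """
    K = len(clusters)
    O = np.zeros((K, K), dtype=int)
    O_max = np.zeros((K, K), dtype=int)
    for a in range(K - 1):
        for b in range(a + 1, K):
            O[a, b] = overlap_count(clusters[a], clusters[b])
            O_max[a, b] = max_overlap_count(clusters[a], clusters[b])
    return O, O_max


# Toy example with N = 2 parameters and three clusters ordered warm -> cool.
warm = np.array([[5.0, 6.0], [6.0, 5.5]])
mild = np.array([[4.0, 5.8], [1.0, 2.0]])
cool = np.array([[0.0, 0.5]])
O, O_max = overlap_tables([warm, mild, cool])
print(O[0, 1], O_max[0, 1])  # 1 8
```

Broadcasting compares all $L_a \times L_b \times N$ parameter pairs at once, which is cheap for clusters of the size used in the example; for very large clusters a loop over the N parameters would need less memory.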

[Figure: projection of two of the N parameters; x-axis: Parameter 1; the region of "overlaps between the clusters" is marked.]

Fig. 1. Principle scheme of the description of the clustering quality (square/cross: overlapped clusters; double cross: fully separated cluster)
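The clusters compared by Eqs. (6) and (7) are the output of the minimum-distance iteration of Eqs. (2) to (5). The sketch below reads that iteration as a k-means-style batch procedure: centroids are cluster means, every element moves to its nearest centroid, and the loop stops when two successive grouping steps give the same partition. The function name, the `max_iter` cap, the remainder rule for Eq. (2) and the handling of empty clusters are assumptions, not details taken from the paper.

```python
import numpy as np


def minimum_distance_clustering(X, K, max_iter=100):
    """Sketch of the minimum-distance iteration of Eqs. (2)-(5).

    X: (M, N) array of normalized parameters, one row per element e_i.
    Returns (labels, centroids, a), where a is the objective a(g) of Eq. (4)
    for the last evaluated partition.
    """
    M, N = X.shape
    if not 1 <= K <= M:
        raise ValueError("need 1 <= K <= M")
    L = M // K
    # Eq. (2): consecutive blocks of L elements; leftovers join the last cluster.
    labels = np.minimum(np.arange(M) // L, K - 1)
    for _ in range(max_iter):
        # Eq. (3): centroids as cluster means (np.maximum avoids 0/0 for empty clusters).
        counts = np.bincount(labels, minlength=K)
        sums = np.zeros((K, N))
        np.add.at(sums, labels, X)
        centroids = sums / np.maximum(counts, 1)[:, np.newaxis]
        # Squared Euclidean distance of every element to every centroid, shape (M, K).
        d2 = ((X[:, np.newaxis, :] - centroids[np.newaxis, :, :]) ** 2).sum(axis=2)
        a = d2[np.arange(M), labels].sum()  # objective a(g) of Eq. (4)
        d2[:, counts == 0] = np.inf  # never move elements into an empty cluster
        # Grouping step: each element moves to the cluster with the nearest centroid.
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # two successive grouping steps agree: local minimum reached
        labels = new_labels
    return labels, centroids, a


X = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]])
labels, centroids, a = minimum_distance_clustering(X, K=2)
print(labels, a)  # [0 1 0 1] 1.0
```

Ordering the resulting clusters by decreasing centroid, as the overlap count requires, would still need a scalar summary of each centroid; Table 1 lists one value per cluster without stating how it is obtained, so that step is left out here.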
Eqs. (6) and (7) apply under the additional condition

$$ \bar{c}_1 > \bar{c}_2 > \cdots > \bar{c}_k. \qquad (8) $$

If $O_{a,b} = 0$, then the clusters a and b are completely separated from each other. The maximum possible number of overlaps is

$$ O^{\max}_{a,b} = N L_a L_b. \qquad (9) $$

This number is reached if both clusters cover the same region within the N-dimensional space. Thus, by applying Eqs. (6) to (9), the quality of the separation of clusters can be determined statistically by the following steps:

1. Calculation of the mean maximum possible number of overlaps $\bar{O}^{\max}$ and the mean actual number of overlaps $\bar{O}$ over all combinations of cluster pairs.
2. A test of whether $\bar{O}$ and $\bar{O}^{\max}$ originate from the same basic population. Assuming a Gaussian normal distribution, Student's t-test can be used. (Because of the necessary normalization of the parameters, a normal distribution is generally realized.) The null hypothesis states that both mean values originate from the same population. The clusters can be separated only if the null hypothesis is rejected. The procedure then continues as follows:
3. The ratio $v_{a,b}$ of the actual to the maximum possible number of overlaps is determined for each cluster pair:

$$ v_{a,b} = \frac{O_{a,b}}{O^{\max}_{a,b}} \qquad (10) $$

4. The mean value $\bar{v}$ over all $v_{a,b}$ is calculated. It is the empirical estimate of the actual occurrence probability of overlaps.
5. If the mean values $\bar{O}$ and $\bar{O}^{\max}$ are not identical, point 2 implies that, at the chosen level of significance, there is a statistically significant separation of those clusters for which $v_{a,b} \le \bar{v}$.
6. The quality of separation in the case $v_{a,b} > \bar{v}$ still needs to be determined. The point is to clarify whether a given number of actual overlaps $O_{a,b}$ is compatible with the mean number of actual overlaps $\bar{O}$. If the overlaps are interpreted as empirical occurrence frequencies, a statistical comparison between the two is possible. This can be done, for instance, by the $\chi^2$-test (e.g., Taubenheim, 1969), which can be written as follows:

$$ \chi^2 = \frac{(O_{a,b} - \bar{O})^2 \, (2 O^{\max}_{a,b} - 1)}{(O_{a,b} + \bar{O}) \, (2 O^{\max}_{a,b} - O_{a,b} - \bar{O})} \qquad (11) $$

with one degree of freedom (df).

The result of this test can be interpreted as follows: if the calculated $\chi^2$-value is greater than a given threshold of significance, the number of overlaps $O_{a,b}$, which exceeds the mean value $\bar{O}$, differs significantly from it. The separation between the clusters is therefore not statistically significant.

3. Determination of an Optimum Number of Clusters

The optimum number of clusters is defined as the number that leads to the best separation between all clusters. The method presented above allows the optimum number of clusters for non-hierarchical clustering to be determined in the best possible way. The following procedure is required to this end:

1. If a clustering with a given initial number of clusters does not lead to a separation, the initial number of clusters is varied until at least one statistically reliable separation between one cluster and the rest exists.
2. If point 1 is fulfilled, the elements of the separated clusters are noted as a final partial result.
3. The initial series is reduced by the elements of the separated clusters.
4. This algorithm is repeated, using the method presented in Section 2, until all clusters are statistically reliably separated.
5. The optimum number of clusters results from the number of clusters separated in each algorithm step.

4. Rank Sum Analysis

In addition to the previous investigations, a rank sum analysis can be carried out, which allows the clustering to be checked and determines the internal structure of the clusters more precisely. After the clustering process is finished, each cluster contains a certain number of elements. The order of the elements is random, i.e., there is no knowledge of a possibly existing order, which, however, is of
interest to certain investigations. One way to solve this problem is to interpret the elements $e_i$ directly (see Eq. (1)); the disadvantage is that this procedure depends directly on the cluster analysis. The following method is therefore suggested to solve the problem. For cluster analysis, the elements are described by their parameters, whereas for rank sum analysis, the elements are described by the ranks of their parameters, the rank of a parameter being determined by its position within the existing values of this parameter. Thus each element can be considered as a function of the ranks of its parameters:

$$ e_i = h(R_{i1}, \ldots, R_{iN}) \qquad (12) $$

with $R_{ij}$ = rank of the parameter $p_{ij}$.

Rank sum analysis starts with the calculation of the weighted sum of the ranks of the parameters for each element. In order to scale the parameters with one another and to reduce their possible interdependence to a minimum, it is useful to weight the parameters. The weighting is based on the correlation between the parameters. The starting point is the distribution-free fourfold table test (Taubenheim, 1969), which allows the estimation of the tetrachoric correlation coefficient r between two parameters:

$$ r = \sin\left(q \cdot \frac{\pi}{2}\right) \qquad (13) $$

with $q = 1 - 4a/M$ (quadrant ratio), a = number of values within the first quadrant, M = number of values in all quadrants.

The weights $w_j$ ($j = 2, \ldots, N$) are determined under the following conditions:

- A reference parameter is chosen with the weight $w_1 = 1$ (which is necessary to determine the weights by means of correlation).
- The weight of the rank of a parameter should reach at least the value 1/N, which is the case if the correlation coefficient between this parameter and the reference parameter is r = 1.
- As the magnitude of the correlation decreases, the weight increases; if there is no correlation, the weight is one.

The weighting depends on the sign of the correlation:

$$ w_j = \frac{N - (N-1)\,\lvert r_j \rvert}{N} \, \operatorname{sign}(r_j), \qquad j = 2, \ldots, N \qquad (14) $$

By employing (14), the rank sums $RS_i$ for the element $e_i$ can be determined as follows:

$$ RS_i = \sum_{j=1}^{N} w_j R_{ij}, \qquad i = 1, \ldots, K_L \qquad (15) $$

Once all rank sums have been determined, they can in turn be assigned ranks $R_i(RS_i)$. If these ranks are assigned to the respective elements $e_i$, each cluster is given an internal structure ordered according to ranks.

5. Example

Many meteorological events are characterized by more than one parameter, for instance the description of seasonal temperature conditions. A preliminary study (Gerstengarbe and Werner, 1992) led to the conclusion that the temperature conditions of Central European summers have to be described by five parameters:

Number of summer days: $T_{\max} \ge 25\,^\circ$C, May–September,
Number of hot days: $T_{\max} \ge 30\,^\circ$C, May–September,
Heat sum: $T_{\max} > 20\,^\circ$C, May–September,
Summer mean: $\sum T / n$, June–August,
Extreme value mean: $(T_{M1} + T_{M2} + T_{M3})/3$, June–August,

with n = number of days June–August, $T_{M1}, T_{M2}, T_{M3}$ = monthly maximum of the air temperature in June, July and August, $T_{\max}$ = daily maximum of the air temperature, T = daily mean of the air temperature.

To keep the example easy to follow, the following calculations are, without loss of generality, based on a single station. Time series of the daily mean and daily maximum air temperature covering the period 1893–1993 were available for the meteorological station at Potsdam. The aim of the study is to classify the 101 summers of the Potsdam station.

5.1 Calculation of the Non-Hierarchical Cluster Analysis with a Fixed Initial Number of Clusters

The following quantities have been determined for the calculation: number of elements M = 101, number of parameters N = 5, number of clusters K = 9. The calculation steps are as follows:

- Determination of the initial partition according to Eq. (2),
- Calculation of the group centroids according to Eq. (3),
- Minimization of the objective function according to Eqs. (4) and (5).

The values of the group centroids are given in Table 1. Because the parameters are normalized, the values have no units. The group centroids are ranked in order from large to small values. This corresponds to a ranking of the clusters from "warm" to "cool" (see Table 2).

Table 1. Values of the Group Centroids with Regard to the Initial Number of Clusters

Cluster       1      2      3      4      5      6      7      8      9
Centroid  12.17   5.37   1.50   1.20   0.27  -2.26  -2.70  -5.43  -6.04

Table 2. Results of the Cluster Analysis with a Fixed Number of Clusters

Cluster 1 2 3 4 5 6 7 8 9

1947 1983 1938 1949 1943 1919 1967 1984 1898
1992 1982 1986 1901 1900 1931 1933 1977 1993
1911 1917 1895 1914 1945 1979 1955 1894 1918
1975 1939 1906 1904 1981 1988 1974 1899
1976 1970 1966 1941 1940 1980 1923
1934 1951 1908 1928 1897 1987 1903
1959 1968 1915 1961 1924 1962 1926
1989 1991 1942 1920 1927 1913
1921 1925 1936 1910 1958 1907
1944 1990 1930 1978 1896 1956
1964 1946 1893 1960 1912 1965
1932 1905 1952 1922 1916
1963 1957 1954 1902
1937 1985 1909
1971 1972
1969
1929
1973
1953
1948
1935
1950

5.2 Calculation of the Quality of the Separation of Clusters

Calculation steps for cluster separation are:

- Calculation of the actual overlaps according to Eqs. (6)–(8),
- Calculation of the maximum possible number of overlaps according to Eq. (9),
- Determination of the ratios of overlaps according to Eq. (10),
- $\chi^2$-test.

As shown in Table 3, actual overlaps exist, so the quality of cluster separation must be checked. The necessary maximum number of overlaps is given in Table 4. Using the values of Tables 3 and 4, the ratios of overlaps were calculated; the result can be seen in Table 5. According to the null hypothesis of the $\chi^2$-test, Table 6 shows all those clusters that cannot be statistically reliably separated from each other (at the 1% or 5% error level). It is therefore necessary to carry out an additional separation of clusters.

5.3 Optimization of the Number of Clusters

The optimum number of clusters is determined according to the method described in Section 3. The results are presented in Table 7, which shows that the number of clusters increases from 9 (initial number of clusters) to 14 (number of statistically significantly separated clusters).
Table 3. Number of Actual Overlaps

Cluster     1     2     3     4     5     6     7     8     9
1           –    23     2     0     2     0     0     0     0
2           –     –   120    28    93    18     0     1     0
3           –     –     –   113   253    79     7     2     9
4           –     –     –     –   105    31    17     5     3
5           –     –     –     –     –   127   130     3    17
6           –     –     –     –     –     –   317    52    82
7           –     –     –     –     –     –     –    79    90
8           –     –     –     –     –     –     –     –   210
9           –     –     –     –     –     –     –     –     –

Table 4. Maximum Number of Overlaps

Cluster     1     2     3     4     5     6     7     8     9
1           –   325   505   565   790   985  1150  1255  1465
2           –     –  1495  1995  3870  5495  6870  7745  9495
3           –     –     –   735  3510  5915  7950  9245 11835
4           –     –     –     –  3070  5735  7990  9425 12295
5           –     –     –     –     –  3635  6715  8675 12595
6           –     –     –     –     –     –  3790  6205 11035
7           –     –     –     –     –     –     –  2795  8395
8           –     –     –     –     –     –     –     –  6085
9           –     –     –     –     –     –     –     –     –

Table 5. Ratio of Overlaps (in units of $10^{-4}$)

Cluster     1     2     3     4     5     6     7     8     9
1           –   707  39.6     0  25.3     0     0     0     0
2           –     –   803   140   240  32.8     0   1.3     0
3           –     –     –  1537   721   134   8.8   2.2   7.6
4           –     –     –     –   342  50.4  21.3   5.3   2.4
5           –     –     –     –     –   349   194   3.4  13.5
6           –     –     –     –     –     –   836  83.8  74.3
7           –     –     –     –     –     –     –  28.6  10.7
8           –     –     –     –     –     –     –     –   345
9           –     –     –     –     –     –     –     –     –
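The ratio of Eq. (10) and the $\chi^2$ comparison of step 6 can be checked against Tables 3 to 5. The sketch below recomputes $v_{a,b}$ from Tables 3 and 4 (for example 23/325 ≈ 0.0707, the 707 × $10^{-4}$ of Table 5), takes $\bar{O}$ as the mean over all 36 cluster pairs, and applies to the pairs with $v_{a,b} > \bar{v}$ a fourfold-table $\chi^2$ with one degree of freedom, $\chi^2 = (O_{a,b} - \bar{O})^2 (2O^{\max}_{a,b} - 1) / [(O_{a,b} + \bar{O})(2O^{\max}_{a,b} - O_{a,b} - \bar{O})]$. That reading of Eq. (11), the use of $O^{\max}_{a,b}$ rather than the mean maximum, and the helper names are assumptions; the output is not claimed to reproduce Table 6.

```python
import numpy as np

# Upper triangles of Table 3 (actual overlaps O_ab) and Table 4 (maximum
# overlaps O^max_ab); row a lists the values for b = a+1, ..., 9.
ACTUAL = [
    [23, 2, 0, 2, 0, 0, 0, 0],
    [120, 28, 93, 18, 0, 1, 0],
    [113, 253, 79, 7, 2, 9],
    [105, 31, 17, 5, 3],
    [127, 130, 3, 17],
    [317, 52, 82],
    [79, 90],
    [210],
]
MAXIMUM = [
    [325, 505, 565, 790, 985, 1150, 1255, 1465],
    [1495, 1995, 3870, 5495, 6870, 7745, 9495],
    [735, 3510, 5915, 7950, 9245, 11835],
    [3070, 5735, 7990, 9425, 12295],
    [3635, 6715, 8675, 12595],
    [3790, 6205, 11035],
    [2795, 8395],
    [6085],
]
CHI2_5, CHI2_1 = 3.841, 6.635  # chi-square critical values for 1 df (5 % and 1 %)


def chi_square(o_ab, o_mean, o_max_ab):
    """Assumed reading of Eq. (11): fourfold-table chi-square with 1 df."""
    num = (o_ab - o_mean) ** 2 * (2 * o_max_ab - 1)
    den = (o_ab + o_mean) * (2 * o_max_ab - o_ab - o_mean)
    return num / den


# (a, b, O_ab, O^max_ab) for all 36 pairs a < b, clusters numbered from 1.
pairs = [
    (a + 1, a + 2 + i, o, m)
    for a, (row_o, row_m) in enumerate(zip(ACTUAL, MAXIMUM))
    for i, (o, m) in enumerate(zip(row_o, row_m))
]
o_mean = np.mean([o for _, _, o, _ in pairs])  # mean actual number of overlaps
v = {(a, b): o / m for a, b, o, m in pairs}  # Eq. (10)
v_mean = np.mean(list(v.values()))  # empirical occurrence probability of overlaps

for a, b, o, m in pairs:
    if v[(a, b)] > v_mean:  # step 6: only these pairs still need testing
        chi2 = chi_square(o, o_mean, m)
        # Error level at which O_ab differs significantly from the mean ("n.s." = none).
        level = "1 %" if chi2 > CHI2_1 else "5 %" if chi2 > CHI2_5 else "n.s."
        print(f"clusters {a}-{b}: v = {v[(a, b)]:.4f}, chi2 = {chi2:.2f} ({level})")
```

Because the ratios are recomputed from Tables 3 and 4 instead of being read from Table 5, a pair where the two sources differ (for instance clusters 7 and 8, where 79/2795 ≈ 283 × $10^{-4}$) may be selected for testing here even if it is not in the printed table.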

Table 6. Error Probability (%) for the Significant Difference Between the Frequency of Occurrence of "Overlaps" of the Clusters among Each Other

Cluster 1 2 3 4 5 6 7 8 9
1 1
2 1 5
3 1 1
4 1
5 1
6 1
7
8 1
9

5.4 Rank Sum Analysis

Calculation steps for rank sum analysis are:

- Calculation of the correlation coefficients according to Eq. (13),
- Determination of the weights according to Eq. (14),
- Calculation of the rank sums according to Eq. (15),
- Determination of the ranks of the rank sums,
- Assignment of the ranks to the elements (summers).
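The five rank sum steps listed above can be sketched in NumPy as follows. Several details are assumptions rather than statements from the paper: ranks are assigned in descending order (rank 1 = largest value, ties broken by position), so that low ranks correspond to warm summers as in Table 7; the count a of Eq. (13) is taken from the quadrant with x below and y above their medians, so that perfectly concordant parameters give r = 1; a correlation of exactly zero gets the sign +1 so that its weight is one, as the text requires; and the final ranks give rank 1 to the smallest rank sum. All function names are hypothetical.

```python
import numpy as np


def descending_ranks(x):
    """Rank 1 for the largest value; ties are broken by position (assumption)."""
    order = np.argsort(-x, kind="stable")
    ranks = np.empty(len(x), dtype=int)
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks


def quadrant_correlation(x, y):
    """Eq. (13): r = sin(q * pi / 2) with the quadrant ratio q = 1 - 4a/M.

    Assumption: a counts values with x below and y above their medians, so that
    perfectly concordant parameters give r = 1 and discordant ones r = -1.
    """
    a = np.sum((x < np.median(x)) & (y > np.median(y)))
    q = 1.0 - 4.0 * a / len(x)
    return float(np.sin(q * np.pi / 2))


def rank_sum_analysis(P):
    """P: (M, N) parameter matrix whose column 0 is the reference parameter.

    Returns the weights of Eq. (14), the rank sums of Eq. (15) and the ranks of
    the rank sums (rank 1 = smallest rank sum).
    """
    M, N = P.shape
    # R_ij: rank of parameter p_ij within column j.
    R = np.column_stack([descending_ranks(P[:, j]) for j in range(N)])
    w = np.ones(N)  # w_1 = 1 for the reference parameter
    for j in range(1, N):
        r = quadrant_correlation(P[:, 0], P[:, j])
        # Eq. (14); r = 0 gets sign +1 so that its weight is one, as the text requires.
        sign = 1.0 if r >= 0 else -1.0
        w[j] = (N - (N - 1) * abs(r)) / N * sign
    RS = R @ w  # Eq. (15): weighted rank sum of each element
    final = np.empty(M, dtype=int)
    final[np.argsort(RS, kind="stable")] = np.arange(1, M + 1)
    return w, RS, final


x = np.array([1.0, 2.0, 3.0, 4.0])
P = np.column_stack([x, 2 * x, x + 1])  # three perfectly concordant parameters
w, RS, final = rank_sum_analysis(P)
print(final)  # [4 3 2 1]: the element with the largest values gets rank 1
```

With perfectly concordant parameters each non-reference weight drops to the minimum 1/N, which is the behaviour the weighting conditions describe.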
Table 7 shows, in addition to the results of the optimum clustering, those that are obtained from
the rank sum analysis. The clusters are classified according to the above results, i.e., they are ranked in order from "very warm summers" (cluster 1) to "very cool summers" (cluster 14) on the basis of the group centroids. Within the clusters, the elements are classified in the order of the ranks of the rank sums. Because the ranks of the rank sums are also ordered from "very warm summers" to "very cool summers" (increasing ranks), one obtains within each cluster a sequence of the elements (summers) from "warm" to "cool". One can see that the results of the rank sum analysis correspond well with the ranked clusters, even though single elements are arranged differently within the sequence from "warm" to "cool".

Table 7. Results of the Cluster and Rank Sum Analysis ("Summers"). The clusters are ranked in order from "warm" to "cold"; for each cluster, 1st column: year, 2nd column: rank of the rank sum

Cluster 1 2 3 4 5

1947 1 1921 12 1917 6 1976 8 1944 13
1992 2 1964 14 1934 9 1932 15 1971 18
1911 3 1959 10 1950 26 1969 19
1983 4 1989 11 1953 22
1982 5 1948 23
1975 7

Cluster 6 7 8 9 10

1963 16 1973 21 1949 29 1943 28 1925 39
1937 17 1935 25 1939 31 1904 45 1905 43
1929 20 1895 30 1901 32 1930 52 1952 56
1938 24 1970 33
1986 27 1951 34
1914 37
1990 41

Cluster 11 12 13 14

1900 35 1967 54 1931 60 1898 76
1968 36 1933 55 1979 63 1993 78
1991 38 1955 59 1981 64 1918 82
1906 40 1988 61 1941 66 1899 83
1946 42 1940 65 1928 68 1923 85
1945 44 1897 67 1961 69 1977 87
1966 46 1920 70 1903 88
1908 47 1910 71 1926 89
1915 48 1924 72 1913 91
1942 49 1978 73 1974 92
1919 50 1960 74 1980 93
1936 51 1922 75 1907 94
1893 53 1927 77 1987 95
1957 57 1954 79 1956 96
1985 58 1958 80 1965 97
1972 62 1896 81 1916 98
1912 84 1902 99
1894 86 1909 100
1984 90 1962 101

6. Conclusions

The results presented show that the suggested procedure is the first that allows the quality of the separation of clusters to be calculated in a statistically well-founded way; it avoids the often
adverse effects of a predefined number of clusters in non-hierarchical cluster analysis by using the optimum number of clusters, which guarantees a statistically reliable separation of all clusters from each other. The procedure is completed by the described rank sum analysis, which makes it possible to indicate the internal order of the elements in each calculated cluster.

The combined application of the described procedures is thus a suitable method to improve the use of cluster analysis, and the methods also indicate whether an application of cluster analysis is possible or not.

References

Forgy, E. W., 1965: Cluster analysis of multivariate data: efficiency versus interpretability of classifications (abstract). Biometrics, 21, 768.
Fovell, R. G., Fovell, M. C., 1993: Climate zones of the conterminous United States defined using cluster analysis. Journal of Climate, 6, 2103–2135.
Gerstengarbe, F.-W., Werner, P. C., 1992: The time structure of extreme summers in Central Europe between 1901 and 1980. Meteor. Zeitschrift, N.F., 1, 285–289.
Milligan, G. W., Cooper, M. C., 1985: An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50, 159–179.
Steinhausen, D., Langer, K., 1977: Clusteranalyse. Einführung in Methoden und Verfahren der automatischen Klassifikation. Berlin: Walter de Gruyter, 411 pp.
Taubenheim, J., 1969: Statistische Auswertung geophysikalischer und meteorologischer Daten. Leipzig: Akad. Verlagsges. Geest & Portig, 386 pp.

Authors' address: Dr. F.-W. Gerstengarbe, Dr. P. C. Werner, Potsdam Institute for Climate Impact Research, P.O. Box 60 12 03, D-14412 Potsdam, Federal Republic of Germany.
