IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-1, NO. 3, JULY 1979

PFS Clustering Method


MARK A. VOGEL, MEMBER, IEEE, AND ANDREW K. C. WONG

Abstract—This paper presents a method of cluster analysis based on a pseudo F-statistic (PFS) criterion function. It is designed to subdivide an ensemble into an optimal set of groups, where the number of groups is not specified and no ad hoc parameters are employed. Univariate and multivariate F-statistic and pseudo F-statistic consistency is displayed. Algorithms for feasible application of PFS are given. Results from simulations are utilized to demonstrate the capabilities of the PFS clustering method and to provide a comparative guide for other users.

Index Terms—Cluster analysis, Euclidean distance clustering, group separation criteria, hierarchical clustering, pseudo F-statistic, sum of squares within minimization.

Manuscript received November 15, 1976; revised November 3, 1978. This work was supported in part by the National Research Council of Canada, Operating Grant A4716. M. A. Vogel is with The Analytic Sciences Corporation, Reading, MA 01867. A. K. C. Wong is with the Department of Systems Design, University of Waterloo, Waterloo, Ont., Canada.

I. INTRODUCTION

IN the past, several criteria for measuring the quality of group separations have been proposed. Different criteria were used to divide or evaluate the division of n samples into k groups where the number of groups is specified. Friedman and Rubin [1] examined the feasibility of several criteria for application in cluster analysis. The analyzed criteria utilized the pooled within group scatter matrix W [2]. They recognized that optimal clusters should have minimal variability and concluded that a criterion of minimizing the determinant |W| was generally the best due to its sensitivity to local structure. Since the |W| value for k + 1 groups would be less than or equal to that for k groups, a comparison of |W| values could not be used to determine the appropriate number of groups. Marriott [3] examined the min |W| criterion and proposed a method for deciding the optimal value of k. He noted that the optimum subdivision of a uniformly distributed population into k groups reduces |W| by a factor k^2. In his paper the |W| criterion was presented along with a rough method for judging the significance of the groupings.

Minimizing |W| is computationally difficult. The simpler trace of W criterion is not considered as useful by Marriott or Friedman and Rubin because it does not take into account the within group covariance. Consequently, it does not as readily detect those highly elliptical clusters whose correlations are high. However, Maronna and Jacovkis [4] point out some of the major problems associated with Friedman and Rubin's criteria while noting the importance and general acceptance of clustering algorithms based on the trace of W. The trace criterion possesses certain advantages. Its computation is fast and it cannot be overwhelmed by a single variable.

Since feature preprocessing is often necessary prior to cluster analysis, one can use some of the well-documented techniques [5]-[9] to obtain a feature set that approximates a Euclidean space. In this space, where Euclidean distances represent the differences or dissimilarities between objects, the min tr (W) criterion is very appropriate for cluster analysis. The min tr (W) criterion by itself, like the min |W| criterion, cannot indicate the optimal total number of clusters present, as it will always decrease as the number of subdivisions is increased. If a situation arises such that the min tr (W) cluster configuration (with the total number of clusters preset) is optimal, then one is accepting Euclidean distance as the best measure of object separability.

Since minimizing Euclidean distance within clusters is a heuristically appealing concept, techniques evolved from it have received considerable attention. Additional heuristics are necessary when the number of clusters to be detected is inherently unknown prior to the analysis. ISODATA [10], [11], with its splitting and merging heuristics, is a good example of such development. Unfortunately, it requires the user to specify parameters upon which the splitting and merging are to be based and does not provide a good measure of the quality of cluster configurations obtained.

We are presenting here the PFS (pseudo F-statistic) clustering method, which uses a Euclidean distance ratio and finds an optimal number of clusters without the necessity of specifying any ad hoc parameters. This technique does not require any prior knowledge of cluster size or cluster variance. It contains an objective criterion which determines the total number of clusters in a way which matches the basic assumption (that Euclidean distance is the proper measure of object similarities in the given feature space) made for these types of clustering routines. The pseudo F-statistic criterion treats Euclidean distance as if it is a single measurement variable. When a single variable is used for the comparison of object similarities, the F-statistic can be employed to determine the statistical significance of the group separations with respect to that variable [12]. Furthermore, it can indicate whether a separation into k or k + 1 groups is more significant. By treating the Euclidean distance as though it is a single measurement variable and applying the F-statistic type of significance test, one can create a criterion which both responds to the minimization of the trace of W for k groups and optimizes for the best value of k. Thus, when the total number of classes is unknown, the PFS clustering method provides a matching extension to min tr (W).




II. BASIC FORMULATION

A. Univariate and Multivariate Relationship

In this section the relationships between the quantities under discussion and properties of the pseudo F-statistic are examined. The notation for matrix quantities is adopted from the standard multivariate analysis of variance usage [12], [13].

Fig. 1 represents the MANOVA table for multivariate observations on n objects divided into k groups. Let X represent the observation matrix of size n X p such that each row contains all of the p observations about a single object and each column represents all of the objects' values on a single measurement variable [14]. Let C_i be a vector of length p representing the centroid of the ith cluster or group. Let n_i be the number of members of cluster i, and n = \sum_{i=1}^{k} n_i. Y^i is an n_i X p matrix of observations on the members of group i. Therefore, the set {Y^i} has the identical information contained in X. Now we can consider the formulation of the analysis of variance table for the univariate case (Fig. 2).

Fig. 1. MANOVA table.

  Source           Degrees of Freedom   Dispersion Matrix
  Between groups   k - 1                B
  Within groups    n - k                W
  Total            n - 1                T

Fig. 2. ANOVA table.

  Source           Degrees of Freedom   Sums of Squares
  Between groups   k - 1                SSB
  Within groups    n - k                SSW
  Total            n - 1                SST

From this table we can generate the F-statistic ratio

    F = \frac{SSB (n - k)}{SSW (k - 1)}

which follows the F-distribution with (k - 1, n - k) degrees of freedom. In both the univariate and multivariate analysis which follows we can consider the global mean or mean vector to be zero without loss of generality. For the univariate process, c_i is the mean (a scalar) of group i and Y^i is a vector of length n_i such that Y^i = {y_1^i, ..., y_{n_i}^i}.

The univariate SSB, SSW, and SST quantities can be computed by

    SSB = \sum_{i=1}^{k} n_i c_i^2

    SSW = \sum_{i=1}^{k} \sum_{v=1}^{n_i} (y_v^i - c_i)^2

    SST = \sum_{i=1}^{k} \sum_{v=1}^{n_i} (y_v^i)^2.

Euclidean sums of squares SSBe, SSWe, and SSTe can be computed similarly:

    SSBe = \sum_{i=1}^{k} \sum_{j=1}^{p} n_i c_{ij}^2

    SSWe = \sum_{i=1}^{k} \sum_{v=1}^{n_i} \sum_{j=1}^{p} (y_{vj}^i - c_{ij})^2

    SSTe = \sum_{i=1}^{k} \sum_{v=1}^{n_i} \sum_{j=1}^{p} (y_{vj}^i)^2

where

    C_i = {c_{i1}, c_{i2}, ..., c_{ip}}

and

    Y^i = \begin{bmatrix} y_{11}^i & \cdots & y_{1p}^i \\ \vdots & & \vdots \\ y_{n_i 1}^i & \cdots & y_{n_i p}^i \end{bmatrix}.

Relating this back to our original matrices we get

    tr (B) = SSBe
    tr (W) = SSWe
    tr (T) = SSTe.

The following relationship holds true for all cases:

    SSW + SSB = SST
    SSWe + SSBe = SSTe

or

    tr (W) + tr (B) = tr (T).    (1)

Just as before, where we formed the sums of squares ratio to obtain an F-statistic, we can now form a pseudo F-statistic:

    PFS = \frac{SSBe (n - k)}{SSWe (k - 1)} = \frac{tr (B)(n - k)}{tr (W)(k - 1)}.

B. Properties of the PFS

First we can examine the weighting system which enables the PFS to select the appropriate number of groups. Consider the case when k = 2:

    PFS = \frac{tr (B)(n - 2)}{tr (W)}.

For two clusters the tr (B)/tr (W) ratio is multiplied by a large number, which tends to make the PFS larger. At the opposite extreme, when k = n - 1,

    PFS = \frac{tr (B)}{tr (W)(n - 2)}.

Now the tr (B)/tr (W) ratio is divided by the same large number to lower the PFS.
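To make these quantities concrete, here is a minimal sketch (not the authors' program) that computes tr (W), tr (B), and the PFS from an observation matrix and a cluster labeling; the function name and the use of NumPy are our own assumptions.

```python
import numpy as np

def pfs(X, labels):
    """Pseudo F-statistic for a labeling of the rows of X (n x p).

    Sketch of the definitions above: tr(W) = SSWe, tr(B) = SSBe, and
    PFS = [tr(B) (n - k)] / [tr(W) (k - 1)].  Requires 2 <= k < n.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    grand_mean = X.mean(axis=0)   # the text takes the global mean as zero;
                                  # subtracting it here is equivalent
    groups = np.unique(labels)
    k = groups.size

    ssw = 0.0   # pooled within-group sum of squares, tr(W)
    ssb = 0.0   # between-group sum of squares, tr(B)
    for g in groups:
        Yi = X[labels == g]            # members of group g
        ci = Yi.mean(axis=0)           # group centroid
        ssw += ((Yi - ci) ** 2).sum()
        ssb += Yi.shape[0] * ((ci - grand_mean) ** 2).sum()

    return (ssb * (n - k)) / (ssw * (k - 1))
```

Because the global mean is removed, ssw + ssb equals the total sum of squares about that mean, which matches relation (1) and gives a quick consistency check on an implementation.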
The variation in the PFS as k progresses, caused by this weighting, renders a means to decide the number of clusters that best represents the distribution of objects in the feature space. It should be noted that, if k is held fixed, maximizing the PFS minimizes tr (W) due to (1). As more clusters are formed tr (W) always decreases while tr (B) always increases, but the PFS does not continually increase because the weighting acts against this trend.

A simple illustration of the variation of the PFS with respect to k follows (Fig. 3). If there actually exists a hierarchy of nested clusters such as that represented in Fig. 4(a), where there are two distinct levels of clusters, then the PFS should remain relatively constant over this range [Fig. 4(b)].

Fig. 3. Expected result for well-separable clusters: K' is the number of clusters that best represents the distribution of objects in the feature space.

Fig. 4. Hierarchical clustering. (a) Nested clusters. (b) PFS versus k plot.

In theory, to maximize the significance of a grouping in Euclidean space one should maximize the related percent level of significance on the F-distribution. It seems, however, reasonable to accept a solution which maximizes the PFS value. A comparison of PFS values from samples generated randomly supports this view. These results are presented in the simulation section. One further simplification can be employed if desired. When n is much greater than k, the PFS will not be changed much by the (n - k) term and the simpler ratio

    \frac{tr (B)}{tr (W)(k - 1)}

can be observed. This can be especially useful if one wishes to determine the relative significance of clusterings in different runs with different sample sizes. In addition, maximization of the PFS ratio to choose the best cluster structure is mentioned by Cormack [15] in his classification review.

III. ALGORITHMS FOR PFS CLUSTERING

Any clustering routine which calculates a single measure of similarity (as with the minimization of Euclidean distance within clusters type) can be formulated in the PFS setting. The PFS can be used as an evaluator after processing or incorporated into a procedure as the controller.

Two implementations in a Euclidean setting are presented. In the first, a modified version of Ball and Hall's ISODATA [10], [11] and the K-means [16], [17] clustering procedures are employed. It first minimizes the tr (W) with a fixed number of clusters. The PFS is then used after different runs for the evaluation and selection of the best cluster formation. In the second, a more complex procedure is involved. It uses part of the ISODATA and K-means heuristics and also incorporates the PFS as controller. The ISODATA procedure as devised by Ball and Hall was capable of splitting and merging clusters based on ad hoc parameters which the user set. Our clustering method eliminates this arbitrariness since the PFS criterion selects the proper number of clusters without prior knowledge of cluster variability.

The clustering method presented is optimal in the sense that the maximization of a particular (PFS) objective function is conceptually desired. Assuring this maximization with respect to all possible numbers of clusters and all possible configurations of cluster members could, of course, be accomplished by formulating all cluster configurations and choosing the one with the highest PFS value. This is not, however, computationally practical, and two algorithms which tend towards the objective of PFS maximization are presented.

Our first algorithm A1 presents a practical system in which the tr (W) tends towards a minimum for k = 2, then k = 3, etc., and continues until the PFS value starts to decline. It retains the hill climbing aspect of the ISODATA heuristic and hence requires several runs to assure a good result. The PFS is only an evaluator in this procedure and is used to terminate the iteration that keeps increasing the cluster numbers. Rescaling is included as a standard procedure and the initial set of centroids can be specified by the user based on prior knowledge. Individuals who already are using K-means or ISODATA types of clustering algorithms may find this algorithm easy to implement by addition and modification of their existing routines.
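As an illustration of the A1 idea, the sketch below minimizes tr (W) for each k with a standard K-means routine, evaluates the PFS, and stops once the PFS declines. It is not the authors' implementation: scikit-learn's KMeans is assumed as a stand-in for the K-means/ISODATA inner loop, its n_init restarts stand in for the several runs recommended above, and calinski_harabasz_score computes the same between-to-within trace ratio as the PFS.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score   # same ratio as the PFS

def pfs_evaluator(X, k_max=10, random_state=0):
    """Sketch of the PFS evaluator method: scan k = 2..k_max, minimize tr(W)
    at each k, and return the configuration with the highest PFS."""
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)     # zero mean, unit variance features

    best = None        # (pfs, k, labels) for the peak seen so far
    prev_pfs = 0.0
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(X)
        score = calinski_harabasz_score(X, labels)   # PFS for this k
        if best is None or score > best[0]:
            best = (score, k, labels)
        if score < prev_pfs:     # PFS started to decline: stop the search
            break
        prev_pfs = score
    return best
```

Stopping at the first decline mirrors the termination rule described above, while keeping the running best guards against a shallow local dip in the PFS curve.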
Our second algorithm A2 is a more complex procedure with the maximization of the PFS as its objective. Initial cluster memberships can be specified for hypothesis testing. Through the use of the control parameter SUPER one can obtain knowledge of the quality and stability of the initial assignment. When SUPER is set to 1, the PFS for the original clusters is computed and then only one reassignment of members to the cluster with the closest centroid is performed. This tests the assignment hypothesis with respect to the given feature set. In normal operation this algorithm can search for the best clustering by PFS standards. Although more difficult to implement than algorithm A1, this algorithm displays better stability and convergence properties. This advantage can be especially important for problems in which the cluster separability is not very distinct.

A choice of three alternative methods can be specified by control parameter METH. For METH set to zero, the routine will stop after finding the best clustering (still hill climbing) for the initially specified number of clusters. This provides a maximization of PFS for a fixed k and a corresponding minimization of tr (W). The second alternative, called the cluster splitting method, splits the most variable cluster (starting with k = 2) and reassigns members into a stable configuration. The number of clusters k will then increase until the PFS declines. The third alternative is known as the cluster combination method. This procedure initially divides the feature space into many clusters and combines clusters two at a time, then reassigns until the clusters are stable for each k. The process continues until the PFS starts to decline or k = 2. A special function is constructed in order to determine which clusters should be combined. This function has been previously advanced for hierarchical grouping by Ward and Hook [18], [19] and examined by Wishart [20] and Beale [21].

The merging and splitting methods are not used to just hierarchically partition the data. They are merely strategies in the attempt to locate the best PFS grouping. The merging or splitting criteria are used in conjunction with iterations to reassign cluster members to the closest centroids. This produces an overall result without some of the mathematically precise justifications of the purely hierarchical techniques but with good cluster location capabilities.

A. Algorithm A1: PFS Evaluator Method

A1.1 [Form observation matrix]. The original measurement variables are processed into a final feature set for clustering. Standard procedure is to remove the global mean and normalize the variance of each feature to unity. The feature set forms an n X p matrix X.
A1.2 [Specify range of K]. The range of K is specified to provide computation limits to the optimal clusters search procedure.
A1.3 [Initialize centroids]. Initial centroid starting values can be specified; if not, they are generated randomly. CENT <- {C_i}. PFSO <- 0.
A1.4 [Assign members to clusters]. Euclidean distances are computed between each member and all centroids. Members are assigned to clusters {Y^i} (with centroids closest to them). CLM = cluster membership assignments.
A1.5 [Compute new centroids]. Based on {Y^i}, compute a new {C_i}, CENT <- {C_i}.
A1.6 [Compute variability measures]. SSWe and SSBe are computed on each iteration.
A1.7 [Store old assignment]. CLMO <- present cluster member assignment.
A1.8 [Reassign members to clusters]. Each member is reassigned to a cluster (based on the new C_i's). CLM <- new cluster membership assignment.
A1.9 [Determine if clusters are changing]. If CLM ≠ CLMO, go to Step A1.5. If CLM = CLMO, proceed to the next step.
A1.10 [Compute PFS and output results]. Output PFS, CENT, and CLM.
A1.11 [Determine if iteration continues]. If PFS < PFSO, stop. If PFS > PFSO, proceed.
A1.12 [Store and increment]. PFSO <- PFS and K <- K + 1.
A1.13 [Return]. Return to Step A1.3 to start a new minimization of tr (W) with a higher number of clusters if K < upper limit of K, else stop.

B. Algorithm A2: PFS Controller Method

A2.1 [Form observation matrix]. Obtain X as described in A1.1.
A2.2 [Assign initial cluster members]. Input CLM, the initial membership assignment. If no initial assignment is given, a random assignment CLM will be generated.
A2.3 [Specify procedure]. Input control parameter SUPER: if SUPER <- 0, normal operations will be pursued; if SUPER <- 1, the quality and stability of the clusters assigned in CLM will be tested. Input control parameter METH: if METH <- 0, stop after obtaining the best result for the initial K; if METH <- 1, subdivide the ensemble until the PFS declines; if METH <- 2, combine initial clusters until the PFS declines. Input K (starting number of clusters). Default: K <- 2 when METH = 1; K <- N/5 when METH = 2. PFSO <- 0.
A2.4 [Compute group centroids]. CENT is a K X P matrix of centroid values, CENT <- {C_i}.
A2.5 [Compute variability measures]. SSWe and SSBe are computed.
A2.6 [Select procedure]. If SUPER = 1, go to Step ASUB.1; if SUPER ≠ 1, proceed to the next step.
A2.7 [Store and assign membership]. CLMO <- CLM.
A2.8 [Reassign cluster members]. Assignment is based on Euclidean distances between all members and all centroids. CLM <- new cluster membership assignment.
A2.9 [Determine if clusters are changing]. If CLM ≠ CLMO, go to Step A2.4. If CLM = CLMO, proceed.
A2.10 [Compute PFS and output all results]. Output PFS, CENT, and CLM.
A2.11 [Determine if PFS is increasing]. If PFS < PFSO, stop; else proceed.
A2.12 [Branch to appropriate method]. If METH = 0, stop; if METH = 1, go to A2M1.1; if METH = 2, go to A2M2.1.
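The subroutines ASUB, A2M1, and A2M2 and the distance membership function (DMF) that drives the combination step are listed next. As a reading aid, the following sketch condenses the METH = 2 path of the controller under those definitions: stabilize the membership by repeated centroid reassignment, then merge the pair of clusters with the smallest DMF and repeat until the PFS declines. It is a simplified illustration rather than the authors' program; the names are ours, NumPy is assumed, and degenerate cases (such as a cluster emptying out) and the hill-climbing restarts of the full procedure are not handled.

```python
import numpy as np

def pfs(X, labels):
    """Pseudo F-statistic, tr(B)(n - k) / [tr(W)(k - 1)] (same ratio as in Section II)."""
    n, k = X.shape[0], len(np.unique(labels))
    gm = X.mean(axis=0)
    ssw = ssb = 0.0
    for g in np.unique(labels):
        Yi = X[labels == g]
        ci = Yi.mean(axis=0)
        ssw += ((Yi - ci) ** 2).sum()            # contribution to SSWe = tr(W)
        ssb += len(Yi) * ((ci - gm) ** 2).sum()  # contribution to SSBe = tr(B)
    return ssb * (n - k) / (ssw * (k - 1))

def stabilize(X, labels):
    """Steps A2.4-A2.9: recompute centroids and reassign each member to its
    nearest centroid until the membership stops changing."""
    while True:
        ids = np.unique(labels)
        cent = np.stack([X[labels == g].mean(axis=0) for g in ids])
        d2 = ((X[:, None, :] - cent[None, :, :]) ** 2).sum(axis=2)
        new = ids[np.argmin(d2, axis=1)]
        if np.array_equal(new, labels):
            return labels
        labels = new

def combine_until_pfs_declines(X, labels):
    """METH = 2 branch: repeatedly merge the pair with the smallest DMF,
    restabilize, and keep the configuration with the highest PFS."""
    labels = stabilize(X, np.asarray(labels).copy())
    best_pfs, best_labels = pfs(X, labels), labels.copy()
    while len(np.unique(labels)) > 2:               # A2M2.1: stop at K = 2
        ids = list(np.unique(labels))
        cent = {g: X[labels == g].mean(axis=0) for g in ids}
        size = {g: int((labels == g).sum()) for g in ids}
        # DMF_ij = D_ij * n_i * n_j / (n_i + n_j), D_ij = squared centroid distance
        _, a, b = min(
            (np.sum((cent[p] - cent[q]) ** 2) * size[p] * size[q] / (size[p] + size[q]), p, q)
            for i, p in enumerate(ids) for q in ids[i + 1:])
        labels = labels.copy()
        labels[labels == b] = a                     # A2M2.3: combine the chosen pair
        labels = stabilize(X, labels)
        score = pfs(X, labels)
        if score < best_pfs:                        # stop once the PFS declines
            break
        best_pfs, best_labels = score, labels.copy()
    return best_labels, best_pfs
```

Starting from a deliberately over-fine random assignment (the default K of roughly N/5 in step A2.3) corresponds to the cluster combination method; the splitting branch differs only in how candidate clusterings are proposed.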
Subroutine ASUB

ASUB.1 [Compute PFS].
ASUB.2 [Reassign cluster members]. The procedure is identical to A2.8. CLM = new assignment.
ASUB.3 [Output results]. PFS, CLM, and CENT.

Cluster Splitting Method A2M1

A2M1.1 [Compute tr (W_i)]. tr (W_i) is computed for each cluster i.
A2M1.2 [Select most variable cluster]. The most variable cluster Y^i is the one where tr (W_i) = max_j [tr (W_j)].
A2M1.3 [Split Y^i]. One half of the members of cluster Y^i are randomly chosen and assigned to a new cluster Y^{k+1}.
A2M1.4 [Store and increment]. PFSO <- PFS, K <- K + 1.
A2M1.5 [Return]. Return to A2.4.

Cluster Combination Method A2M2

A2M2.1 [Test for cluster combination possibility]. If K = 2, stop; else proceed.
A2M2.2 [Compute distance membership function DMF]. (The DMF formulation follows this algorithm.)
A2M2.3 [Combine clusters]. Select the smallest DMF and combine its corresponding clusters into one.
A2M2.4 [Store and decrement]. PFSO <- PFS, K <- K - 1.
A2M2.5 [Return]. Return to A2.4.

Distance Membership Function (DMF)

The DMF for combining clusters can be formulated as follows:

D_{ij} — squared Euclidean distance between C_i and C_j.
n_i — number of members of cluster i.
n_j — number of members of cluster j.

    DMF_{ij} = D_{ij} \frac{n_i n_j}{n_i + n_j}.

Some of its properties are listed as follows:
1) It is easy to compute.
2) When all cluster sizes are equal, it will be minimum for those clusters whose centroids are separated by the minimum Euclidean distance.¹
3) When distances between clusters are equal, it will be minimum for the clusters with fewest members.¹
4) When cluster sizes are highly imbalanced, all other factors being equal, it will be minimum for the most imbalanced cluster set.¹
5) Mathematically: tr (W_i) + tr (W_j) + DMF_{ij} = tr (T_{ij}).

¹(These characteristics correspond to a PFS maximization objective.) Since reassignments will occur immediately after a combination is made and continue until stability is obtained, the maximization of the PFS is not completely dependent on the DMF.

IV. SIMULATIONS

Two areas are independently explored through simulated data and separately presented here. First we justify the PFS criterion as an evaluator of cluster configurations and examine some of its properties using the clustering procedure outlined in Algorithm A1. Having established the practical abilities of the PFS criterion, we then present the simulated performance of the PFS clustering algorithm A2. For this purpose a clear but difficult clustering problem is created.

A. Evaluation of PFS Procedure A1 Through Simulation

The following clustering tasks were designed in order to evaluate the PFS performance with various types of simulated data: 1) well separable, 2) slightly overlapping, 3) hierarchically nested, and 4) randomly distributed data. All of the simulated data have four features. Various numbers of separable clusters are generated stochastically based on a normal distribution with unit variance. Only the centroids were varied to define separate clusters. All initially generated clusters were of equal size. In the tasks the results were obtained from Algorithm A1, with the global means automatically subtracted and the global variance in each feature normalized. (For testing purposes, the algorithm was not permitted to stop until six clusters were formed even though the PFS declines earlier.)

Nonhierarchical Clustering Examples

1) Two slightly overlapping clusters (sample size 60, 30 per cluster):

  K   PFS
  2   70
  3   43
  4   34
  5   30
  6   26

2) Three well-separated clusters (20 per cluster):

  K   PFS
  2   38
  3   98
  4   68
  5   58
  6   53

3) Four well-separated clusters (15 per cluster):

  K   PFS
  2   34
  3   43
  4   108
  5   87
  6   72

From the above examples we see that, at least in the well-separable clustering problem, the PFS successfully identifies the number of clusters without prior knowledge.

Hierarchical Clustering Examples

1) Three clusters, two tightly grouped (60 samples, 20 in each cluster):
Centroids:
  a)  1  1 -1  1
  b)  1  1  1 -1
  c) -2 -2 -2 -2

  K   PFS   SPLIT ON DATA
  2   70    40, 20
  3   65    18, 22, 20
  4   50
  5   41
  6   34

2) Four clusters, grouped by twos:

  K   PFS   SPLIT ON DATA
  2   96    30, 30
  3   64    15, 15, 30
  4   60    15, 15, 15, 15
  5   50
  6   47

Three runs had to be made for this last problem as the clustering from K = 4 on often stopped at suboptimal solutions. The best run for each K is displayed (the largest PFS is considered the best run). Since the PFS at a fixed percent level of significance decreases slightly with an increasing number of groups, the PFS values of 70 and 65 for K equal to 2 and 3 in the first hierarchical example can be roughly equal in percent level of significance.

Random Distributions

In order to discover the lower limits that the PFS would take, two single cluster problems were tried, one with normally distributed random variables and one with uniformly distributed random variables, shown below. Four features with zero means, unit variance, and a sample size of 60 are used.

  Normal        Uniform
  K   PFS       K   PFS
  2   14        2   20
  3   13        3   18
  4   14        4   16
  5   15        5   16
  6   14        6   15

Here the PFS variation indicates no strong preferences for any division of the data, especially in the normal case. All of the statistics displayed have been well above the 0.999 percent level on the F-distribution, even for the purely random samples. The reason for this is that the F-statistic tests the deviation from randomness of the groupings. Since SSW minimization produces minimally overlapping groupings, the groups are always very nonrandom (Fig. 5).

Fig. 5. Overlapping and nonoverlapping groupings. (a) Separable clusters, high PFS. (b) Nonseparable clusters, low PFS. x's are group 1 members and +'s are group 2 members.

Continuous data of a biological origin is often initially assumed to conform to a normal distribution. The members of an ensemble might be expected to be normally distributed in any given measurement variable. If an ensemble is composed of subgroups, though the subgroups might be expected to individually follow the normal distribution, the large ensemble would not be expected to. When attempting cluster analysis one is proposing that two or more subgroups are present. The obvious alternative is the single unified ensemble hypothesis. This indicates that a comparison of the PFS for any proposed partition might best be made with the PFS obtained by partitioning a single normally distributed ensemble into the same number of groups. The PFS for the unified normally distributed ensemble remains relatively constant in a min tr (W) partitioning. Accordingly, the cluster formation with the highest PFS is also the one with a PFS most different from the single ensemble result. Therefore, maximizing the PFS over many divisions into varying numbers of groups provides a good criterion for selecting the appropriate number of groups.

B. Evaluation of the PFS Controller Clustering Algorithm A2 Through Simulation

Although the primary objective of this paper is to demonstrate the usefulness of the pseudo F-statistic as an evaluator for cluster analysis, we have also presented a complex algorithm for locating PFS optimal cluster configurations. Through experiments on simulated data, we shall explore some of the properties of this algorithm and the PFS criterion.

In our simulation experiment we set up a four class problem with three features for simplicity of illustration. Centroids of the clusters were placed at (10, 10, 0), (0, 0, 10), (5, 0, 0), and (5, 10, 10) as displayed in Fig. 6(a). Each cluster was given the same internal variance by generating all deviations from the centroid with the same Gaussian random number generator. In order to tax the capabilities of the clustering procedures the cluster sizes were varied by 5 to 1 (from largest to smallest cluster), with 50 samples in cluster A, 10 in B, 20 in C, and 30 in D.

To explore the PFS criterion fairly, the distribution of samples in the space must be varied and several runs conducted. We varied the sample distribution in space by altering the variance of the Gaussian generator. For the initial trial, we desired well-separable clusters to obtain a PFS maximum which would correctly identify the generation procedure. As the variance is increased [Fig. 6(b)], the clusters become less distinct.
When the internal variance becomes much larger than the distance separating the cluster centroids, there should really appear to be only one cluster in the space.

Fig. 6. Spatial distribution. (a) Representation of clusters when generator variance = 1. (b) Representation of clusters when generator variance = 9.

Fig. 7. Simulation results for PFS Algorithm A2 (PFS versus number of clusters for the merging and splitting methods).

Six total runs were conducted, with one run by merging and one run by splitting, for three different variances. The results are displayed in Fig. 7. Note that the PFS value peaks sharply for the well-separable clusters with variance 1. In this case, both merging and splitting methods reach the identical correct result with only one run although both are started at random points. For the barely intersecting clusters of variance 4, the overall PFS values are markedly lower, but the peak at 4 clusters is still clearly evident. Once again, both methods obtained the same results for the PFS peaked cluster configuration as well as for the best 3 and 5 cluster configurations.
Finally, when the variance is increased to 9, the clusters become less distinct and the PFS is further lowered, with the peak value shifted to a 3 cluster configuration. This best 3 cluster configuration contains most members of the two smaller clusters B and C in a single cluster. If the variance were further increased the PFS values would continue to decline while the peak value shifted more toward 2 and finally leveled out.

As long as the clusters are reasonably separable the PFS criterion highlights them even when they differ greatly in the number of samples. That the algorithm performed well is reflected by the correct solutions obtained from random starting points without the necessity of several runs. By using both methods of the algorithm, confirmation of a result can be obtained, although one or the other may be best for different types of data.

ISODATA would also be able to find the four well-separable clusters if given the correct merging and splitting parameter values. This method achieves that result without any setting of parameters. If one just neglects merging and splitting considerations for the cluster analysis and simply uses a technique of randomly separating the original samples into four groups, locating the centroids, reassigning, and iterating until a stable solution is achieved, then the correct solution is obtained only part of the time, as shown in Fig. 8.

Fig. 8. Table of results using the K-means iteration process from different random starting points. (No splitting or merging heuristics.)

  Problem                Trial   PFS     Remarks                          Cumulative
  Well Separable         1       128.8   A-split, C-ok, B+D combined
  Clusters               2       614.3   Correct result
  (generator             3       614.3   Correct result                   3 out of 5 correct
  variance 1)            4       614.3   Correct result
                         5       156.8   A-ok, D-split, B+C combined

  Barely Intersecting    1        99.4   A-split, D-ok, B+C combined
  Clusters               2       105.3   A-split, D-ok, B+C combined
  (generator             3       158.6   Correct result                   1 out of 5 correct
  variance 4)            4        98.3   A-ok, B-split, B+C combined
                         5        98.4   A-split, D-ok, B+C combined

V. CONCLUSION

With simulated data, the PFS criterion does perform well in determining the number of clusters present. In practical applications on biomedical data we have found [22] that the highest PFS does not always correspond to the most desirable cluster formation (based upon other criteria dealing with the applicability of the result). The apparent inconsistency is due to the fact that sometimes the clusters were not very separable in the feature space. Despite this, the most useful cluster formations do have a relatively high PFS, often within 5 percent of the maximum obtainable. To avoid overlooking additional cluster formations which might be meaningful, the PFS algorithms can be modified to continue searching until the PFS declines by at least 5 percent from its maximum value.

To obtain a good feature set for this type of cluster technique, we [22] have found it most helpful to create composite features with as many ordered states as possible. Much prior knowledge is often utilized in the selection, scaling, and manipulation of the feature set prior to clustering. By retaining a clustering procedure independent of human intervention, we maintain a certain objectivity in the final result. This makes the results more meaningful for new data.

In regards to the use of the PFS controller algorithm for cluster analysis, we feel that it is clearly desirable to utilize it when the feature space is reasonably appropriate and when the number of clusters is unknown. The results have a consistency not present in the K-means procedure. Since it can also be applied without setting splitting and merging parameters, it maintains this advantage over the ISODATA procedure. There is another variation on ISODATA reported by From and Northouse [23] which eliminates the setting of parameters through the use of heuristic methods, but does not provide any rationale for their inherent quality.

As was shown in Fig. 8, the K-means iterative procedure, even when the total number of clusters was known a priori, must often be restarted many times before arriving at the correct solution. And ISODATA, if given the wrong values for merging and splitting, can have extreme difficulty in obtaining a good result. In all of the cases presented here the PFS algorithm obtained the correct result by both the split and combine procedures in only one run. Since it is also a modified hill climbing procedure, this will not always be the case and different starting trials can lead to different stable solutions if the clusters are not all well separable. In general, the combining method has an advantage in being less likely to overlook distinct but small clusters, while the splitting method is computationally much faster for problems with few clusters. As in method A1, when an inconsistency is present, the solution with the highest PFS value is chosen as the more naturally accurate one.

REFERENCES

[1] H. P. Friedman and J. Rubin, "On some invariant criteria for grouping data," J. Amer. Statist. Ass., vol. 62, pp. 1152-1178, 1967.
[2] S. S. Wilks, "Certain generalizations in the analysis of variance," Biometrika, vol. 24, pp. 271-294, 1932.
[3] F. H. C. Marriott, "Practical problems in a method of cluster analysis," Biometrics, vol. 27, pp. 501-514, 1971.
[4] R. Maronna and P. M. Jacovkis, "Multivariate clustering procedures with variable metrics," Biometrics, vol. 30, pp. 499-505, 1974.
[5] D. E. Bailey and R. C. Tryon, Cluster Analysis. New York: McGraw-Hill, 1970.
[6] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1972.
[7] S. A. Mulaik, The Foundations of Factor Analysis. New York: McGraw-Hill, 1972.
[8] A. K. C. Wong, M. A. Vogel, and N. L. Steg, "A systematic approach to computer aided diagnosis and prognosis," in Proc. 1975 Int. Conf. Cybern. Soc., 1975, pp. 189-191.
[9] T. Y. Young and T. W. Calvert, Classification Estimation and Pattern Recognition. New York: Elsevier, 1970.
[10] G. H. Ball and D. J. Hall, "ISODATA, a novel method of data analysis and pattern classification," Stanford Res. Inst., Menlo Park, CA, Tech. Rep., 1965.
[11] G. H. Ball, "Data analysis in the social sciences: What about the details," in AFIPS Proc. Fall Joint Comput. Conf., vol. 27, 1965, pp. 533-559.
[12] K. A. Brownlee, Statistical Theory and Methodology in Science and Engineering. New York: Wiley, 1965.
[13] M. M. Tatsuoka, Multivariate Analysis. New York: Wiley, 1971.
[14] A. K. C. Wong and T. S. Liu, "Typicality, diversity and feature pattern of an ensemble," IEEE Trans. Comput., vol. C-24, pp. 158-181, Feb. 1975.
[15] R. M. Cormack, "A review of classification," J. Roy. Statist. Soc., series A, vol. 134, part 3, pp. 321-353, 1971.
[16] E. W. Forgy, "Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications," Biometrics, vol. 21, pp. 768-769, 1965.
[17] J. MacQueen, "Some methods for the classification and analysis of multivariate observations," in Proc. 5th Berkeley Symp. Math. Statist. Prob., 1967, pp. 281-297.
[18] J. H. Ward, "Hierarchical grouping to optimize an objective function," J. Amer. Statist. Ass., vol. 58, pp. 236-244, 1963.
[19] J. H. Ward and M. E. Hook, "Application of a hierarchical grouping procedure to a problem of grouping profiles," Educ. Psychol. Measurement, vol. 23, pp. 69-82, 1963.
[20] D. Wishart, "An algorithm for hierarchical classifications," Biometrics, vol. 25, pp. 165-170, 1969.
[21] E. M. L. Beale, "Euclidean cluster analysis," Bull. I.S.I., vol. 43, book 2, pp. 92-94, 1969.
[22] A. K. C. Wong and M. A. Vogel, "Unsupervised classification for prognosis of Legg Calve Perthes' disease," Biomedical Information Processing Program, A. I. DuPont Inst. and Carnegie-Mellon Univ., Pittsburgh, PA, Int. Rep., 1975.
[23] K. R. From and C. Northouse, "A nonparametric clustering algorithm," Pattern Recognition, vol. 8, pp. 107-114, 1976.

Mark A. Vogel (S'76-M'77) was born in New York, NY, in 1950. He received the B.S., M.S., and Ph.D. degrees in electrical engineering from Carnegie-Mellon University, Pittsburgh, PA, in 1972, 1973, and 1977, respectively.
From 1974 to 1977 he was Coordinator of a Joint Biomedical Research Project between Carnegie-Mellon University and the A. I. DuPont Institute. He is presently with The Analytic Sciences Corporation, Reading, MA. His research interests include image processing, pattern recognition, simulation, modeling, and classification and clustering methodology.

Andrew K. C. Wong received the Ph.D. degree from Carnegie-Mellon University, Pittsburgh, PA, in 1968.
Prior to 1976 he was an Associate Professor in the Biomedical Engineering Program of Carnegie-Mellon University. Since 1976 he has been with the Department of Systems Design, University of Waterloo, Waterloo, Ont., Canada. He is the author and coauthor of many papers concerning pattern, image, and scene analysis, biomedical signal analysis, biomedical and health-care systems, synthesis and analysis of system states and relational structures, applications of information theory, and complex information processing to molecular biology and genetics.
Dr. Wong is currently an Associate Editor of the Journal of Computers in Biology and Medicine.
