Correspondence Analysis and Classification
(corrected version)

LEBART L.
and
MIRKIN B.G.
Department of Applied Statistics and Informatics,
Central Economics-Mathematics Institute of the Russian Academy of Sciences
(currently at the International Energy Agency,
2, rue André Pascal, 75775 Paris Cedex 16, France)

Seventh International Conference on Multivariate Analysis, Barcelona Meeting,
September 21-24, 1992. Published in: Multivariate Analysis, Future Directions,
C. Cuadras and C.R. Rao, Eds, North Holland, 1993, p. 341-357.
1. Introduction
The present paper contains a survey of some of the most salient results about the links and
the complementarity between clustering and correspondence analysis (CA) of
contingency tables. It includes also a presentation of certain new contributions and
domains of research.
Practitioners often complement one approach with the other when a thorough exploration of the data is needed, since the two points of view may provide quite different portraits of the data. The processes involved are obviously distinct (projection onto a principal subspace on the one hand, grouping of similar categories on the other), but they can lead to identical results in specific situations. In more general cases, the parameters they produce are not independent. We focus below precisely on this interdependence and on these specific situations.
Two characteristics of CA favour a reconciliation with classification: the symmetry of the roles of rows and columns in the process, and the property of distributional equivalence (Benzécri, 1973; Escofier, 1978; Gilula, 1986; Greenacre, 1988), which ensures a great stability of the results when agglomerating elements with similar profiles. Agglomerating the rows or the columns of a contingency table is "natural" in the sense that it merely replaces classes by classes (instead of replacing individuals by groups, or variables by groups of variables).
Questions of clustering in contingency tables based on the grouping of homogeneous items are discussed in Cazes (1986), Escoufier (1988), Greenacre (1988), Gilula (1986), Goodman (1981), and Jambu (1978).
One can find a series of theoretical bridges between these approaches, exemplified by particular models. We discuss below a set of such models with the following purposes in mind: to unify the previous developments and to propose certain new approaches. Let us illustrate this discussion with a numerical example of a symmetric 8 by 8 contingency table K_IJ = (k_ij) comprising k = 640 cases (Table 1). The marginals k_i and k_j are identical (all equal to 80) in this particular example, but all the results apply as well to cases with unequal marginals.
Table 1
Contingency table K_IJ

ROW1  30 18 12 12  2  2  2  2
ROW2  18 30 12 12  2  2  2  2
ROW3  12 12 27 21  2  2  2  2
ROW4  12 12 21 27  2  2  2  2
ROW5   2  2  2  2 24 20 14 14
ROW6   2  2  2  2 20 24 14 14
ROW7   2  2  2  2 14 14 23 21
ROW8   2  2  2  2 14 14 21 23
[Figure 1 about here: sketched dendrogram of rows ROW1-ROW8. ROW1 and ROW2 join at index .023, ROW3 and ROW4 at .006, and these two pairs at .090; ROW5 and ROW6 join at .003, ROW7 and ROW8 at .001, and these two pairs at .040; the top node joins the two halves at .640.]

Figure 1
Sketched dendrogram from the hierarchical clustering of the (8,8) table K_IJ
The moving-centers (k-means) method based on the chi-square distance gives similar results. For example, starting with the two centers corresponding to elements 1 and 8, we easily obtain the 2-class partition corresponding to the upper part of the dendrogram (cf. Figure 1).
To express both the symmetry and the distributional equivalence in a unified form, let us consider for each i ∈ I and j ∈ J the value

q_ij = (k k_ij)/(k_i k_j) - 1    (i ∈ I, j ∈ J)

which expresses the relative increment (or decrement) RIP(i/j) of the probability of row i due to the knowledge of column j. The dual interpretation of q_ij as the relative increment RIP(j/i) of the probability of column j due to row i is straightforward. Relative increments for subsets are defined in an analogous way using the total probabilities (or frequencies). The RIP values in Table 1 are calculated by multiplying the entries by .1 and subtracting 1 afterwards. Note that we have the two following relationships expressing the classical chi-square X² as a function of the RIP coefficients (with p_ij = k_ij/k, p_i = k_i/k, p_j = k_j/k):

X² = k Σ_ij p_i p_j (q_ij)² = k Σ_ij p_ij q_ij
The RIP concept is useful in many respects (Mirkin, 1985, 1992). In the present context we should point out that it underlies the basic reconstruction formula of CA:
q_ij = Σ_h µ_h F_h(i) G_h(j)    (1)

In CA itself, µ_h = √λ_h, where λ_h is the h-th eigenvalue, and F_h, G_h are the standardized row and column factors. In the Boolean form of (1), F_h and G_h are taken to be class indicators: given partitions S of I and T of J, one sets, for each pair h = (s,t) ∈ H = S×T, F_h(i) = 1 if i ∈ s (0 otherwise) and G_h(j) = 1 if j ∈ t (0 otherwise), and looks for corresponding values µ_h to approximate the RIP matrix Q = {q_ij}, that is, to minimize the difference between the left and right parts of (1) (in the Boolean form) measured by the weighted least squares criterion L2 such that:

L2 = Σ_ij k_i k_j [q_ij - Σ_h µ_h F_h(i) G_h(j)]²    (2)

(the weight of entry (i,j) being equal to k_i k_j; Carroll, Pruzansky, and Green, 1977; Escoufier, 1988). When the user wants to cluster only one of the sets, I (or J), the corresponding partition of J (or of I) consists of the set of singletons.
Evidently, for F_h and G_h (h ∈ H) fixed, the optimal values µ_h are equal to the corresponding RIP values, that is, for each h = (s,t), the optimal value is µ_h = q_st. It is not difficult to prove also that the alternating algorithm for minimizing L2 is equivalent to the chi-square distance moving-centers method, and that an agglomerative suboptimal algorithm is equivalent to the chi-square distance based agglomerative clustering procedure using the generalized Ward criterion (Mirkin, 1992). The value of the criterion can be expressed through the difference of the chi-square contingency coefficients for the initial and aggregated contingency tables: L2 = X²(I,J) - X²(S,T). This approach can account for various results and findings derived in Benzécri et al. (1980), Cazes (1986), Moussaoui (1987), and Jambu (1978).
In the sequential fitting of one box at a time, a single term of (1) is fitted at each iteration with the criterion:

L2 = Σ_ij k_i k_j [q_ij - µ F(i) G(j)]²    (3)

where µ, F(i) and G(j) are unknown, and the q_ij are the residuals computed after each iteration (for the first iteration, the q_ij are the initial RIP values).
The optimal µ for any fixed box V×W (defined as V = {i : F(i) = 1} and W = {j : G(j) = 1}) is given by the weighted average of the q_ij computed within the box:

µ = Σ_{i∈V} Σ_{j∈W} k_i k_j q_ij / (k_V k_W)

which equals q_VW.
Substituting this value into (3) leads to the following equality:

L2 = Σ_{i∈I} Σ_{j∈J} k_i k_j (q_ij)² - µ² k_V k_W

which shows that minimizing L2 is equivalent to maximizing the following form of the criterion, depending on the box V×W only:

g(V,W) = µ² k_V k_W = (q_VW)² k_V k_W
To maximize this criterion, the following step-by-step box-generation procedure can be performed: each step adds to the box issued from the previous step a single element, a row or a column, chosen to maximize the increment of the criterion due to the added element. At the first step, two elements are selected simultaneously: a row i and a column j maximizing g({i},{j}) over all pairs of singletons. The process stops when the maximal increment becomes negative. The suboptimal cluster box obtained through this algorithm has the following property (Mirkin, 1992): for each row i or column j outside the cluster box, the absolute values of the relative increments q_Vj = q_jV and q_iW = q_Wi are at most half the absolute value of the relative "internal" increment q_VW = q_WV.
The residual data in this sequential fitting procedure are obtained by subtracting the solution provided by the h-th iteration from the residual data of the preceding iteration. Even in the case of overlapping boxes, the initial chi-square can be partitioned into components corresponding to these boxes in order to evaluate the contribution of each cluster, and to help fixing the number of clusters (by using traditional values of the chi-square statistic).
Unfortunately, the Boolean form of decomposition (1) no longer has the weighted orthonormality properties of the CA factors. But for symmetric matrices K_II (which is exactly the case of our example), Benzécri (1973, vol. 2, ch. 11) has pointed out a situation where discrete orthonormal eigenfunctions are relevant.
The preceding symmetric (8,8) contingency table K_IJ thus has the property of providing an exact coincidence between correspondence analysis and hierarchical clustering (using Ward's criterion) in the following sense: each eigenvalue of the CA corresponds exactly to a node of the classification.
The associated axis of the CA separates the two sets of elements constituting this node.

Table 2
[Eigenvalues of the CA of table K_IJ; values not reproduced here]
Correspondence analysis of table K_IJ leads to 7 clearly separated eigenvalues (see Table 2). The sequence of patterns that can be observed in the columns of Table 3 (eigenvectors) is typical of a hierarchical structure: the non-zero coordinates on each principal axis take only two distinct values, opposing two groups of elements.

Table 3
[Row coordinates on axes 1 to 6; values not reproduced here]
The first axis, for instance, opposes (ROW1 ... ROW4) to (ROW5 ... ROW8). The second axis, within the first group isolated by axis 1, opposes (ROW1, ROW2) to (ROW3, ROW4), etc. Correspondence analysis performs in this case like a divisive algorithm, working iteratively from the upper to the lower levels of a hierarchy.
[Figure 2 about here: first principal plane of the CA of K_IJ; axis 1 (80% of the inertia) separates ROW1-ROW4 from ROW5-ROW8; several row points are superimposed]

Figure 2
Figure 2 gives neither pertinent information about the distance between ROW1 and ROW2 (the corresponding points are superimposed on the plane, suggesting a null distance), nor useful information about the distances between ROW4, ROW5, and ROW6, also superimposed on the graphical display. This shrinkage of distances, easily explained by the geometrical properties of the initial swarm of points, should prompt users to apply the two kinds of methods simultaneously in order to obtain a reliable description of the data.
The above example concerns the case of a binary hierarchy H, each nonterminal element h ∈ H of which can be partitioned in a unique way into two sets a(h) and b(h) belonging to H. The orthonormal set of "3-valued" functions f_h is defined as follows: f_h(i) equals d_a for i ∈ a(h), -d_b for i ∈ b(h), and 0 for the other elements i, where d_a and d_b are chosen so as to make the average of f_h equal to zero and its norm equal to 1. Evidently,

d_a = [(k k_b(h)) / (k_a(h) k_h)]^1/2,    d_b = [(k k_a(h)) / (k_b(h) k_h)]^1/2.
We say that a square symmetric contingency table is compatible if (1) holds for some binary hierarchy H with F_h(i) = f_h(i), G_h(j) = f_h(j) and some µ_h > 0 (h ∈ H). In general, a method approximating the RIP values with this 3-valued eigenfunction decomposition can be developed. The method fits model (1) sequentially, each iteration finding a bipartition of the current set h into two subsets a(h) and b(h) so as to minimize the weighted least squares criterion or, equivalently, to maximize the "explained" part of the chi-square value, which is shown to be equal to:

(µ_h)² = (q_a(h)a(h) + q_b(h)b(h) - 2 q_a(h)b(h))².
This divisive clustering procedure, in our example, leads to the hierarchy of Figure 1.
The largest eigenvalue issued from the CA of a contingency table is greater than or equal to the largest index, corresponding to the last node, of a hierarchical clustering of the rows or
of the columns of this contingency table (using the chi-square distance and the generalized Ward criterion to ensure compatibility between the two techniques). Equality occurs for special tables such as the compatible matrices dealt with in the previous section. This upper bound for the indices can be derived easily from the above considerations, since the indices and the eigenvalues appear as solutions of the same optimization problem, with supplementary constraints for the indices. Benzécri and Cazes (1978) have shown more generally that the quantity (λ1 + λ2 + ... + λp) is greater than or equal to the sum of the p indices corresponding to the p highest nodes of the associated hierarchy (a property which can be derived directly from the general criterion (2), where F and G are less constrained in the case of CA). Moreover, these authors have produced a counter-example showing that there exists no general lower bound for the index corresponding to the highest node: one can find densities such that the largest index remains an arbitrarily small fraction of the largest eigenvalue.
We give in this section some empirical results about the joint behavior of the indices and the eigenvalues issued from the same random contingency table.
Under the hypothesis of independence (also called homogeneity in the case of contingency tables), a series of 1000 pseudo-random independent (8,8) contingency tables with equal theoretical marginals is generated according to a multinomial scheme. For each generated table, the total number of observations k is 1000.
Table 4
Mean values and standard deviations of the eigenvalues and the clustering indices
(1000 independent random (8,8) contingency tables; for each table, k = 1000)
[values not reproduced here]
The 7 eigenvalues issued from the CA of each table, as well as the 7 indices of the hierarchical classification of the rows and of the columns (always using the generalized Ward criterion and the chi-square distance) of the same table, are computed, enabling estimation of the means, variances and correlations of these 21 variates.
Table 4 summarizes the results concerning the means, the standard deviations of the initial variables, and the standard deviations of the means.
The results concerning the eigenvalues are consistent with some previous approximations (Lebart, 1976), since their distribution is similar to that of the eigenvalues of a Wishart matrix (n = 7, p = 7). The sum τ of the means of the eigenvalues equals 0.0492; the statistic kτ thus takes the value 49.2 (not significantly different from the expectation of a chi-square with 7×7 = 49 degrees of freedom).
The indices corresponding to the clustering of the rows and to the clustering of the columns are distinct for each simulated matrix. The statistical identity of their first and second order moments is a further indication of the consistency of the simulation process.
As expected, the largest indices INR1 and INC1 are smaller than the largest eigenvalue λ1 = EV1, whereas the smallest indices INR7 and INC7 are on average much larger than their counterpart.
[Figure 3 about here: mean eigenvalues and mean clustering indices plotted against rank 1 to 7; vertical scale 0 to 0.025; legend: Eigenvalue, Clust. Index]
Figure 3 shows the compared trajectories of these two quantities, highlighting the smaller range of variation of the indices.
Figure 4 below presents the scatter diagram of the joint distribution of the first eigenvalue λ1 = EV1 and the first row-clustering index INR1, both issued from the same pseudo-random matrix. The correlation coefficient between λ1 and INR1 is 0.91 (the same value is obtained for the correlation coefficient between λ1 and INC1). The theoretical constraint INR1 ≤ λ1 clearly defines the upper left boundary of the swarm of points.
[Figure 4 about here: scatter diagram of INR1 (vertical axis, .000 to .030) against the first eigenvalue (horizontal axis, .004 to .044)]

Figure 4
Correlation between λ1 = EV1 and the first clustering index INR1
To study the complex system of relationships between the various indices and the eigenvalues, we visualize the corresponding correlation matrix through a principal component analysis (PCA), which summarizes the main observable patterns.
Figure 5 shows the principal plane of a PCA in which the active elements are the eigenvalues and the illustrative elements are the indices. A classical size effect (all the
coordinates on the first axis are positive) corresponds to the fact that all the involved
correlation coefficients are positive.
[Figure 5 about here: principal plane of the PCA of the correlation matrix; axis 1 carries 40% of the variance and axis 2 19%; the eigenvalues EV1-EV7 (active) and the indices INR1-INR7, INC1-INC7 (illustrative) form regular trajectories.
Legend: EVi = eigenvalue i; INRi = row index i; INCi = column index i]
The first indices are clearly correlated with the first eigenvalues. As mentioned
previously, the two correlation coefficients between each of the largest indices (INR1 and
INC1) and the first eigenvalue take the value 0.91 (the correlation between INR1 and
INR2 is only 0.80, but these relatively small differences are not visible on the display).
The positive autocorrelations between successive eigenvalues or indices entail regular trajectories on the plane spanned by the first two principal components, but these trajectories diverge for the smallest eigenvalues and indices.
This pattern, established from pseudo-random matrices, confirms the intuitive experience of practitioners: on the one hand, the upper part of the dendrogram provides the user with about the same results as the first axes; on the other hand, the lower part of the dendrogram often pinpoints some interesting local properties of the data, while the smallest eigenvalues correspond to unidentifiable noise.
Two series of works simultaneously involving, at different levels, both CA (or another principal axes method) and clustering are briefly mentioned below.
In the case of individuals described by several categorical variables (these variables could be measured on nominal, ordinal or interval scales), van Buuren and Heiser (1989) propose an algorithm achieving simultaneously a coding of the variables and a clustering of the individuals. An alternating least squares algorithm is used, starting from a multiple correspondence analysis of the data table.
Some techniques related to projection pursuit and discrimination can also be considered as an intermediate step between the two approaches.
Let us consider n objects described by p variables (y_ij is the value of variable j for object i). Furthermore, these objects are also the vertices of a symmetric graph G, whose associated matrix is M (m_ii' = 1 if nodes i and i' are joined by an edge, m_ii' = 0 otherwise). Such a situation occurs when objects are time-points or geographic areas, or when they are assigned to a priori classes. Contiguity analysis simultaneously uses the local covariance matrix C (such that c_jj' = (1/2m) Σ_i,i' m_ii' (y_ij - y_i'j)(y_ij' - y_i'j')) and the global covariance matrix V. If the graph is made of k disjoint complete subgraphs, C is very similar to the classical "within covariance matrix" used in linear discriminant analysis, and coincides with it when the graph is regular (i.e. each vertex has the same number of edges). The
minimization of the ratio u'Cu / u'Vu (u being a p-vector) then provides a generalization of linear discriminant analysis to the case of overlapping clusters (see for instance Aluja and Lebart, 1984).
Using more general similarity indices in place of the binary quantity m_ii' allows one to define a series of indices analogous to those used in projection pursuit (see Caussinus, 1992).
It is easy to derive a contiguity matrix from the basic data array itself: any threshold applied to the set of n(n-1) distances or similarities between observations defines a binary relationship which can be described by a symmetric graph. Similarly, a contiguity matrix can be derived from the k nearest neighbours of each observation.
Contiguity analysis applied to such matrices (Burtschy and Lebart, 1991) is closely related to the techniques proposed by Gnanadesikan et al. (1982) and Art et al. (1982). It produces planar (or low-dimensional) representations which can be viewed as compromises between the outcomes of principal axis techniques (CA or PCA) and those of clustering techniques.
Various authors have insisted upon the complementarity between principal axes techniques and classification, which concerns the comprehension of the data structure as well as the interpretation of the results. Gower and Ross (1969), for example, have shown how drawing a minimum spanning tree onto a principal plane issued from a principal component analysis can enrich the interpretation of the represented distances between points. Benzécri et al. (1980) have developed a thorough methodology for the conjoint use of CA and hierarchical clustering, comprising various parameters which describe the mutual links between axes and nodes.
CA, like PCA, can entail shrinkages and distortions due both to the projection onto the principal dimensions and to the possible lack of robustness of the global fit (sensitivity to outliers). It is then advisable to complement it with a classification performed in the whole space. The clusters are not only used to mark out the factorial planes with a sample of well-described areas: being derived in a much higher dimensional space, they can supply elements of information that may have been hidden by the projection onto a low dimensional subspace.
A practical issue reinforces this need for both approaches: it is much easier to describe a set of clusters than a continuous space. The most significant categories or variables for each cluster can be automatically selected, therefore producing a computer-aided description of the classes, and hence of the whole space. A series of statistical tests
allows one to select and to sort (according to the computed levels of significance) the most characteristic items for each cluster (see for instance Lebart et al., 1984).
From a purely computational point of view, when dealing with very large data sets such as those provided by survey data files, it may prove efficient to perform the classification on a limited number of factors issued from CA, to improve the performance of the techniques (Morineau and Lebart, 1986).
Finally, the user may wish to discover some unexpected latent factors or some hidden groups within the data. Although the theoretical models underlying CA and classification are seldom referred to by exploratory data analysts, it is clear that each tool has its own vocation and idiosyncrasies. Even though the history of statistical applications abounds in examples of groups discovered through eigen-analyses as well as of latent factors discovered through clustering, it seems wiser to use both techniques systematically.
References
Aluja Banet T., L. Lebart (1984). Local and Partial Principal Component Analysis and
Correspondence Analysis, COMPSTAT Proceedings, 113-118, Physica Verlag, Vienna.
Art D., Gnanadesikan R, Kettenring J.R.(1982). Data Based Metrics for Cluster
Analysis, Utilitas Mathematica, 21 A, 75-99.
Benzécri J.P. (1973) Analyse des Données. Paris: Dunod.
Benzécri, J.P. (1983) Analyse d'inertie intraclasse par l'analyse d'un tableau de
correspondance, Les Cahiers d'Analyse des Données, 8, no.3, 351-358.
Benzécri J.P., Cazes P. (1978) Problème sur la classification. Les Cahiers d'Analyse des
Données, 3, no.1, 95-101.
Benzécri J.P., Jambu M. (1976) Agrégation suivant le saut minimum et arbre de longueur
minimum. Les Cahiers d'Analyse des Données, 1, no.4, 441-452.
Benzécri, J.P., Lebeaux M.O., and Jambu M. (1980) Aides a l'interpretation en
classification automatique, Les Cahiers de l'Analyse des Données, vol.V, n.1, 101-123.
Bock H. H. (1979) Simultaneous clustering of objects and variables. in Analyse des
donnees et informatique, European C.C. Courses, INRIA, p 187-203.
Braverman E.M., Kiseleva N.E., Muchnik I.B., and Novikov, S.G. (1974) Linguistic
approach to the problem of processing large bodies of data, Automation and Remote
Control, 35, no.11, part 1, 1768-1788.
Burtschy B., and Lebart L. (1991) Contiguity analysis and projection pursuit, 117-128. in
Applied Stochastic Models and Data Analysis, World Scientific , Singapore.
van Buuren S., and Heiser W.J. (1989) Clustering N objects into k groups under optimal
scaling of variables, Psychometrika, 54, no.4, 699-706.
Carroll J.D., Pruzansky S., and Green P.F. (1977) Estimation of the parameters of Lazarsfeld's latent class model by application of canonical decomposition (CANDECOMP) to multi-way contingency tables, AT&T Bell Laboratories, unpublished paper, 18 p.
Cazes P. (1986) Correspondance entre deux ensembles et partition de ces deux
ensembles, Les Cahiers de l'Analyse des Données, vol.XI, no.3, 335-340.
Cazes P., and Moreau J. (1991) Contingency table in which the rows and columns have a
graph structure, in E.Diday, Y.Lechevallier (Eds) Symbolic-Numeric Data Analysis and
Learning, Nova Science Publishers: New York, 271-280.
Caussinus H.(1992). Projections Revelatrices in Modèles pour l'Analyse des Données
Multidimensionnelles, J.J. Droesbeke, B. Fichet, P.Tassi, eds, Economica, Paris.
Escofier B. (1978). Analyse factorielle et distances répondant au principe d'équivalence
distributionnelle. Revue de Statist. Appl. vol. 26, n°4, p 29-37.
Escoufier Y. (1988) Beyond correspondence analysis. In: H.H.Bock (Ed.) Classification
and Related Methods of Data Analysis. Elsevier Sc.P.
Gilula Z. (1986) Grouping and association in contingency tables: an exploratory
canonical correlation approach, Journal of American Statistical Association, vol.81,
no.395, 773-779.
Gnanadesikan R., Kettenring J.R., Landwehr J.M. (1982). Projection Plots for Displaying
Clusters, in Statistics and Probability, Essays in Honor of C.R. Rao, G. Kallianpur, P.R.
Krishnaiah, J.K.Ghosh, eds, North-Holland.
Goodman L.A. (1991) Measures, models, and graphical displays in the analysis of cross-
classified data (with Discussion), Journal of American Statistical Association, vol.86,
No.416, 1085-1138.
Goodman L.A.(1981) Criteria for determining whether certain categories in a cross-
classification table should be combined with special reference to occupational categories
in an occupational mobility table, American Journal of Sociology, 87, 612-650.
Govaert G. (1977) Algorithme de classification d'un tableau de contingence. In:
"Premières Journées Internationales Analyse des Données et Informatique (Versailles
1977)" INRIA, p. 487-500.
Gower J.C., Ross G. (1969) Minimum spanning trees and single linkage cluster analysis. Applied Statistics, vol. 18, p. 54-64.
Greenacre M.J. (1988) Clustering the rows and columns of a contingency table, Journal
of Classification, 5, 39-51.
Hartigan J.A. (1972) Direct clustering of a data matrix, Journal of American Statistical
Association, vol.67, p. 123-129.
Jambu M. (1978) Classification Automatique pour l'Analyse des Données, I- Méthodes et
Algorithms. Paris:Dunod.
Kharchaf I., Rousseau R. (1988, 1989) Reconnaissance de la structure de blocs d'un
tableau de correspondance par la classification ascendante hiérarchique: parts 1 and 2, Les
Cahiers de l'Analyse des Données, vol.XIII, n.4, 439-443; vol.XIV, n.3, 257-266.
Lebart L. (1976) The significance of eigenvalues issued from correspondence analysis. Proceedings in Computational Statistics, COMPSTAT, Physica Verlag, Wien, p. 38-45.
Lebart L., Morineau A., Warwick K. (1984) Multivariate Descriptive Statistical Analysis, J. Wiley, New York.
Marcotorchino F. (1987) Block seriation problems: a unified approach, Applied Stochastic Models and Data Analysis, vol.3, no.3, 73-93.
Mirkin B.G. (1985) Grouping in SocioEconomic Studies. Finansy i Statistika Publishers,
Moscow (in Russian).
Mirkin B.G. (1992) Correspondence-wise clustering for contingency tables, submitted for
publication.
Morineau A., Lebart L. (1986) Specific Clustering Algorithms for Large data sets and
Implementation in SPAD Software. in Classification as a Tool of Research, Gaul W.,
Schader M., Eds, North Holland, 1986.
Moussaoui A.E. (1987) Sur la reconstruction approchée d'un tableau de correspondance a
partir du tableau cumulé par blocs suivant deux partitions des ensembles I et J, Les
Cahiers de l'Analyse des Données, vol.XII, n.3, 365-370.
Key-words :
Correspondence Analysis, Clustering techniques, Classification, Hybrid
approaches in Data Analysis, Contingency tables.