
Conference paper in Lecture Notes in Computer Science, August 2002.
DOI: 10.1007/3-540-46084-5_163 · Source: DBLP


Correlation Visualization of High Dimensional Data Using Topographic Maps

Ignacio Dı́az Blanco, Abel A. Cuadrado Vega, and Alberto B. Diez González

Área de Ingenierı́a de Sistemas y Automática


Universidad de Oviedo
Campus de Viesques s/n, 33204, Gijón, Spain
{idiaz,cuadrado,alberto}@isa.uniovi.es

Abstract. Correlation analysis has always been a key technique for understanding data. However, traditional methods are only applicable to the whole data set and thus provide only global information on correlations. Correlations usually have a local nature, and two variables can be directly and inversely correlated at different points of the same data set. This situation typically arises in nonlinear processes. In this paper we propose a method to visualize the distribution of local correlations along the whole data set using dimension reduction mappings. The ideas are illustrated through an artificial data example.

1 Introduction
Visualization and dimension reduction techniques have received considerable attention in recent years for the analysis of large sets of multidimensional data [1–3], and particularly for the supervision and condition monitoring of complex industrial processes [4–6]. These techniques make it possible to discover unknown features and relationships of high dimensional data in a visual manner by means of a mapping from a data space D (also called input space) onto a low dimensional visualization space V, where complex relationships among input variables can be easily represented and visualized while preserving the information significant to a given problem.
Another very useful technique when dealing with high dimensional data is
correlation analysis. Correlation analysis is concerned with finding how the components x1, …, xp of the sample data vectors {xi}, i = 1, …, n, are mutually related. The standard way to cope with this problem is through the analysis of second order statistics such as the correlation matrix R, whose coefficients rij ∈ [−1, 1] describe how the variables xi and xj are related. These coefficients are the result of a normalized inner product (the cosine) between the vectors formed by the values of xi and xj over the whole data set and, in consequence, they provide correlation information of a global nature. However, in many cases data
variables can be correlated in different ways for different regions of the data
space. This is the case, for instance, of multimodal or nonlinear processes, which
behave locally in different ways depending on the working point. Thus, we need
a local description of correlation.
In this paper, we suggest a method that combines correlation analysis with the power of dimension reduction visualization methods, such as the Self-Organizing Map (SOM) [7] or the Generative Topographic Map (GTM) [8], making it possible to visualize local correlations for each pair of variables xi, xj through so-called correlation maps defined in the visualization space. The paper is organized as follows. In section 2 the ideas of local covariance and local correlation are introduced, and a method to display the information provided by local second order statistics in the visualization space is proposed. In section 3 the proposed ideas are illustrated through a simple example. Finally, in section 4 some concluding remarks and future research lines are outlined.

2 Correlation Maps

2.1 Local Covariance Matrix

Let ψ(y) : R² → Rⁿ be a continuous mapping which takes a point y of the visualization space V ⊂ R² and obtains a point ψ(y) pertaining to the manifold which approximates the distribution of the input data points xi in the data space D ⊂ Rⁿ. Let us define the following neighborhood function

$$w_i(y) = e^{-\frac{1}{2}\,\|x_i - \psi(y)\|^2/\sigma^2}$$

which describes the degree of locality or proximity of sample xi with respect to ψ(y) in the data space D. We define the local mean vector m(y) and the local covariance matrix C(y) associated to a point y in the visualization space V as

$$m(y) = \frac{\sum_i x_i\, w_i(y)}{\sum_i w_i(y)} \qquad (1)$$

$$C(y) = (c_{ij}) = \frac{\sum_i [x_i - m(y)]\,[x_i - m(y)]^T\, w_i(y)}{\sum_i w_i(y)} \qquad (2)$$

Taken independently, the n × n components cij(y) of the covariance matrix C(y) can be regarded as local covariance values which describe the local dependency between variables xi and xj. Expressions (1) and (2) represent local versions of the first and second order sample moments of the input data distribution around the image of point y in the visualization space, i.e., ψ(y). The width factor σ is a design parameter related to the degree of locality to be taken into account, allowing a tradeoff to be established between global and local correlations.
The local covariance C(y) described in (2) defines in V a field of covariance matrices from D, each of which provides a local description of the second order statistical features of the data in D lying in the vicinity of ψ(y).
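Expressions (1) and (2) can be sketched directly in NumPy. This is an illustrative implementation, not the authors' original code; the function name and array layout are assumptions:

```python
import numpy as np

def local_moments(X, psi_y, sigma):
    """Local mean m(y) and local covariance C(y), as in Eqs. (1)-(2).

    X      : (num_samples, n) data matrix, rows x_i.
    psi_y  : (n,) image psi(y) of a visualization-space point y.
    sigma  : width factor controlling the degree of locality.
    """
    d2 = np.sum((X - psi_y) ** 2, axis=1)        # ||x_i - psi(y)||^2
    w = np.exp(-0.5 * d2 / sigma ** 2)           # neighborhood weights w_i(y)
    m = (w[:, None] * X).sum(axis=0) / w.sum()   # Eq. (1): weighted local mean
    Xc = X - m                                   # centered samples x_i - m(y)
    C = (w[:, None] * Xc).T @ Xc / w.sum()       # Eq. (2): weighted local covariance
    return m, C
```

Note that for very large σ the weights become nearly uniform, so m(y) and C(y) reduce to the global sample mean and (biased) covariance; this is exactly the global-local tradeoff the width factor controls.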

2.2 Local Correlation Matrix

The previously defined covariance matrix provides insight into the approach of a local description of second order statistics. However, when looking for correlations, correlation coefficients are preferred, as they provide a normalized description of correlations in the interval [−1, +1]. The local correlation matrix around y can be defined as

$$R(y) = (r_{ij}), \qquad r_{ij} = \frac{c_{ij}}{\sqrt{c_{ii}\, c_{jj}}} \qquad (3)$$

The local correlation matrix R(y) has n × n components rij(y) which represent the local correlation coefficient between variables xi and xj and always lie in the interval [−1, +1], where +1 denotes full direct correlation, 0 denotes no correlation, and −1 denotes full inverse correlation.
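Equation (3) is a simple elementwise normalization of the local covariance matrix; a minimal sketch (the helper name is illustrative):

```python
import numpy as np

def local_correlation(C):
    """Normalize a local covariance matrix C(y) into the local
    correlation matrix R(y) of Eq. (3): r_ij = c_ij / sqrt(c_ii c_jj)."""
    d = np.sqrt(np.diag(C))      # local standard deviations sqrt(c_ii)
    return C / np.outer(d, d)    # elementwise normalization
```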

2.3 Visualization of Second Order Statistical Features

Both the covariance matrix C(y) and the correlation matrix R(y) are defined for each point y of V. In addition, the powerful geometrical and statistical interpretations underlying both matrices can be represented in V using scalar quantities. Thus, for instance, each component cij(y) or rij(y) defines a scalar quantity that can be represented in the same way as SOM planes, using a color code for each pixel y. In the same way, the principal values λi(y) of the covariance matrix or the components of the principal vectors ui(y) can be represented as SOM planes.
This representation provides a unified visualization of the underlying correlations and, in general, of second order statistical properties. Moreover, it is coherent with other SOM representations, such as SOM planes or the u-matrix, providing insight into the pattern of correlation dependencies among variables or revealing the most important features describing the behavior of the underlying process in each data region.
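One way such a per-pixel map could be assembled, assuming a trained 2D map whose codebook is stored as an (H, W, n) array used as ψ(y); a sketch with illustrative names, evaluating (1)–(3) at each grid unit:

```python
import numpy as np

def correlation_map(X, prototypes, i, j, sigma):
    """Scalar map of the local correlation r_ij(y): one value per grid
    unit, displayable with a color code exactly like a SOM plane.

    X          : (num_samples, n) data matrix.
    prototypes : (H, W, n) codebook of a trained 2D map, taken as psi(y).
    """
    H, W, _ = prototypes.shape
    rmap = np.empty((H, W))
    for a in range(H):
        for b in range(W):
            d2 = np.sum((X - prototypes[a, b]) ** 2, axis=1)
            w = np.exp(-0.5 * d2 / sigma ** 2)                 # weights w_i(y)
            m = (w[:, None] * X).sum(axis=0) / w.sum()         # Eq. (1)
            Xc = X - m
            C = (w[:, None] * Xc).T @ Xc / w.sum()             # Eq. (2)
            rmap[a, b] = C[i, j] / np.sqrt(C[i, i] * C[j, j])  # Eq. (3)
    return rmap
```

On a toy parabolic "arc" (y = x²), a small σ recovers the expected sign change: units sitting on the descending branch yield negative local correlation between the two coordinates, units on the ascending branch a positive one.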

3 Application to Artificial Data

All these ideas are illustrated in figures 1, 2 and 3. A simple 2D data set was used to train both a 1D-SOM and a 2D-SOM. Local covariances were obtained for the 2D-SOM using (1) and (2) and then plotted in both the data space D and the visualization space V. Local correlations were also obtained using (3) to build the correlation maps rxx, rxy, ryx, ryy shown in figure 2. A set of points with negative local correlations (corresponding to the right part of the "arc" in the data) can be discovered by looking at the upper left corner of the rxy plane. Similarly, moderately high correlations appear in the upper right corner of the map, revealing the positive local correlations existing in the left part of the "arc" in the data space. It can also be observed that the graphical information provided by the correlation maps in figure 2 is consistent with that shown in the SOM planes in figure 3, because both are descriptions in the same visualization space V. Finally, as we should expect, the planes rxx and ryy are identically equal to 1, and rxy = ryx, due to the symmetry of correlation matrices.
[Figure 1 appears here: four panels showing the 1D-SOM and 2D-SOM in the data space (top row) and the corresponding visualization spaces (bottom row).]

Fig. 1. Local covariances in D (top) and in V (bottom) obtained for both a 1D-SOM (left) and a 2D-SOM (right). In the thick areas (low correlations) the covariances are nearly spherical, while in thin areas (high correlations) the covariances become low rank and oriented, showing in V the nature of the local correlations.

4 Concluding Remarks

We have proposed here a method for the visualization of local second order statistical properties using dimension reduction mappings such as (but not restricted to) the SOM. The proposed idea has strong connections with local model approaches such as [9], where local linear PCA projections are proposed to capture the nonlinear structure of data.
We showed here through an artificial data example how local second order statistical properties can be revealed by means of correlation maps which, in addition, are consistent with other standard representations in the visualization space, such as the component planes or the distance matrix. This provides an alternative to the standard SOM-based methods for high dimensional data visualization (the u-matrix, SOM planes, response surfaces or SOM plane rearrangement [10], as well as SOM clustering methods [11]), one that combines classical correlation analysis techniques (the correlation matrix) with the power of SOM data visualization.

[Figure 2 appears here: four correlation-map panels rxx, rxy, ryx, ryy over the 2D-SOM grid, color-coded in the interval [−1, 1].]

Fig. 2. Correlation maps for the 2D-SOM show a region in V (upper left) related to highly negative local correlations and another region (upper right) revealing positive local correlations.
As a matter for further study, the idea of local moments is not restricted to correlation analysis or even to second order statistics. The eigenvalues λi(y) or the components of the eigenvectors ui(y) of the local covariance matrix can lead to meaningful maps, which can be derived in a straightforward manner from the ideas described here. In a similar way, higher order statistics (cumulants) can be obtained in a local fashion, opening exciting new research lines in data visualization.
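Such eigen-maps follow directly from the local covariance matrix; a minimal sketch per grid unit (the helper name is illustrative, and NumPy's `eigh` is assumed since C(y) is symmetric):

```python
import numpy as np

def local_pca(C):
    """Eigenvalues (descending) and eigenvectors of a local covariance
    matrix C(y); each lambda_i(y), or each eigenvector component, can be
    drawn as a map over V, just like a correlation map."""
    lam, U = np.linalg.eigh(C)       # eigh handles symmetric matrices
    order = np.argsort(lam)[::-1]    # sort principal values descending
    return lam[order], U[:, order]
```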
[Figure 3 appears here: three panels over the 2D-SOM grid, titled x, y and Interneuron Distance Matrix.]

Fig. 3. SOM planes of variables x and y, and the distance matrix.

The ideas proposed in this paper are currently being tested in the steel industry, investigating the effects of several dozen process variables on several quality factors of the processed coils in a tandem mill, with encouraging results.

References
1. Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, December 22, 2000.
2. Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, December 22, 2000.
3. Jianchang Mao and Anil K. Jain. Artificial neural networks for feature extrac-
tion and multivariate data projection. IEEE Transactions on Neural Networks,
6(2):296–316, March 1995.
4. David J. H. Wilson and George W. Irwin. RBF principal manifolds for process
monitoring. IEEE Transactions on Neural Networks, 10(6):1424–1434, November
1999.
5. Teuvo Kohonen, Erkki Oja, Olli Simula, Ari Visa, and Jari Kangas. Engineering applications of the self-organizing map. Proceedings of the IEEE, 84(10):1358–1384, October 1996.
6. Esa Alhoniemi, Jaakko Hollmén, Olli Simula, and Juha Vesanto. Process mon-
itoring and modeling using the self-organizing map. Integrated Computer Aided
Engineering, 6(1):3–14, 1999.
7. Teuvo Kohonen. Self-Organizing Maps. Springer-Verlag, 1995.
8. Christopher M. Bishop, Markus Svensén, and Christopher K. I. Williams. GTM: The generative topographic mapping. Neural Computation, 10(1):215–234, 1998.
9. M. Tipping and C. Bishop. Mixtures of probabilistic principal component analyz-
ers. Neural Computation, 11(2):443–482, 1999.
10. Juha Vesanto. SOM-based data visualization methods. Intelligent Data Analysis, 3(2):111–126, 1999.
11. Juha Vesanto and Esa Alhoniemi. Clustering of the self-organizing map. IEEE
Transactions on Neural Networks, 11(3):586–600, May 2000.
