Fuzzy Parallel Coordinates

Lawrence O. Hall¹ and Michael R. Berthold²,³


¹Dept. of Computer Science and Engineering, ENB 118
University of South Florida, Tampa, FL 33620
hall@csee.usf.edu

²Berkeley Initiative in Soft Computing (BISC)
Computer Science Division, University of California, Berkeley, CA 94720
berthold@cs.berkeley.edu

³Utopy Inc.
330 Fell Street, San Francisco, CA 94102

Abstract

The ability to visualize data often leads to new insights. Data that is more than three-dimensional must be visualized as a series of projections or transformed into some other representation, which usually causes a loss of detail. Parallel coordinates allows one to visualize data in two dimensions without a loss of information. In this paper, we discuss the use of parallel coordinates to visualize fuzzy data. Fuzzy data may consist of fuzzy rules, which can be viewed as cutting a swath through an n-dimensional space. Fuzzy clusters may also be considered fuzzy data in a similar way. Examples are given from three domains. The examples show that parallel coordinates can be used to find extraneous fuzzy rules, separate fuzzy clusters, and validate previous findings about data sets.

1 Introduction

The automatic analysis of large datasets has gained increasing attention in the past couple of years. Data analysis techniques can be applied in many scenarios, stemming from a wide range of applications. Most techniques nowadays try to extract descriptors (usually in numerical form) or relationships (for example, in the form of rules or mathematical functions). An introduction to the most commonly used techniques can be found in [1]. Due to the complexity of the resulting models (or their inaccuracy if the model's complexity is too low), an increasing number of methods have been proposed to visualize either the data itself or a model which aims to summarize all or a subset of the data. This way, the user can hopefully find structures of interest and, through a process commonly referred to as exploratory data analysis, fine-tune the focus of analysis.

For data visualization many techniques exist, most prominently (probably) two-dimensional scatter plots along pairs of the features. Often this technique is extended to scatter plots of the most important principal components, hence providing projections of the data which have high variance. Of course, histograms of each feature also belong in this category. However, such lower-dimensional, linear projections always run the risk of hiding interesting, higher-dimensional structures. More sophisticated non-linear techniques are often based on multidimensional scaling [16] and try to find a lower-dimensional representation that best matches the original data set.

Since data sets tend to be large and usually also redundant, processing and plotting all data points is often computationally expensive. Hence, attention has shifted to the visualization of models that summarize the data. Some techniques, such as Kohonen's self-organizing feature map [15], try to find non-linear structures of fixed topology and size in the data, which are then displayed in a lower-dimensional (usually 2 or 3) space. Hierarchical clustering techniques are often used to display structure in the data, based on presumably more and less important prototypes. Data points themselves or representative cluster centers can also be visualized through multidimensional scaling techniques. These are often based on spring embedding [14], where the spring constants resemble relationships between parts of the model (often simply their distance in the original space).

All these techniques are based on some sort of compression; that is, high-dimensional structures are summarized and represented through elements in a lower-dimensional visual representation. Hence complex, high-dimensional relationships are compressed into a much lower-dimensional relation. This loss of information can be harmful, especially when the underlying dimensionality of the data (or representative model) is high¹. Therefore it is of great interest to find representations which enable a visualization of the entire structure of a dataset or the corresponding model without such a loss of information.

¹ In contrast to a model in high dimensions which has a simple low-dimensional structure and can hence be transformed into a low-dimensional representation without a substantial loss of information. Such models, unfortunately, are rarely found for real-world problems.

0-7803-6274-8/00/$10.00 © 2000 IEEE

Authorized licensed use limited to: CENTRO FED DE EDUCACAO TECNOLOGICA DE MINAS GERAIS. Downloaded on May 24,2023 at 20:42:32 UTC from IEEE Xplore. Restrictions apply.

An interesting approach to visualize high-dimensional data sets in the same dimension without any loss of information is parallel coordinates [10, 12, 13], a technique which was recently proposed to find trends in data. Parallel coordinates allows one to visualize n-dimensional data points in two dimensions. Essentially, parallel coordinates transforms multi-dimensional patterns into two-dimensional patterns without loss of information. Visualization is facilitated by viewing the two-dimensional representation of the n-dimensional data points as lines crossing n parallel axes, each of which represents one dimension of the original feature space. This approach scales well with increasing n and has already been incorporated in some data analysis tools.

Fuzzy models have not received much attention with respect to visualization, even though powerful algorithms to construct such models from data exist. In this paper we therefore explore whether and how parallel coordinates can be used to visualize fuzzy points in n-dimensional space. In each dimension a fuzzy point has some fuzzy extent, which in the narrowest case (that is, a singleton) would be a single value or point. Since fuzzy rules can also be considered as fuzzy points in n-dimensional space, this visualization extends to such rules as well. Another closely related example of fuzzy points is a fuzzy partition of a data set. The fuzzy partition can be represented by fuzzy clusters, which in turn can be viewed as centroids with a corresponding fuzzy neighborhood, again representable in the framework of fuzzy points. Of course, fuzzy clusters have also been converted to fuzzy rules [5, 7, 8].

2 Parallel Coordinates

Parallel coordinates [10, 12] allows one to visualize n-dimensional data in 2-D. Essentially, parallel coordinates transforms multi-dimensional problems into 2-D patterns without loss of information. Visualization is facilitated by viewing the 2-D representation of the n-dimensional data.

If one takes each of the n coordinate axes and lines them up in parallel, one has the basis for parallel coordinates. The distance between each pair of adjacent axes is assumed to be 1. A point in n-dimensional space becomes a series of n - 1 connected line segments in parallel coordinates, which intersect each axis at the appropriate value for that dimension. A parallel coordinates example of 3 points in 3-D lying on a line, a = (1, 3, 1), b = (4, 0, 2), and c = (2.5, 1.5, 1.5), is shown in Figure 1.

Figure 1. A parallel coordinate depiction of 3 points on a line with a = (1.0, 3.0, 1.0) (dark line), b = (4.0, 0.0, 2.0) (gray), c = (2.5, 1.5, 1.5) (light gray). The two intersection points at $i_{1,2} = (0.5, 2)$ and $i_{2,3} = (0.75, 1.5)$ uniquely describe the line going through all three points in the original 3-D space.

The dual of an n-dimensional line in Cartesian coordinates is a set of n - 1 points in parallel coordinates [11, 6]; for the example in Figure 1 these are indicated by $i_{1,2} = (0.5, 2)$ and $i_{2,3} = (0.75, 1.5)$, which uniquely describe a line in 3 dimensions.

The n-dimensional line in Cartesian coordinates can be represented by n - 1 linearly independent equations, each of which results from equating a different pair of the following fractions [12], where $(a_1, \ldots, a_n)$ is a point on the line and $(u_1, \ldots, u_n)$ its direction vector:

$$\frac{x_1 - a_1}{u_1} = \frac{x_2 - a_2}{u_2} = \cdots = \frac{x_n - a_n}{u_n} \qquad (1)$$

Without loss of generality, it may be assumed that the n - 1 linearly independent equations are obtained by pairing the n - 1 adjacent fractions. This yields

$$x_{i+1} = m_i x_i + b_i, \qquad i = 1, \ldots, n-1,$$

where $m_i = u_{i+1}/u_i$ represents the slope and $b_i = a_{i+1} - m_i a_i$ the intercept on the $x_{i+1}$-axis of the projected line in the $x_i x_{i+1}$-plane. The dual point of the n-dimensional line in parallel coordinates therefore corresponds to the set of n - 1 indexed points

$$\left( \frac{1}{1 - m_i},\ \frac{b_i}{1 - m_i} \right), \qquad i = 1, \ldots, n-1, \quad m_i \neq 1,$$

with the horizontal coordinate measured from the $x_i$ axis. For the line of Figure 1, taking the point a and the direction $u = b - a = (3, -3, 1)$ gives $m_1 = -1$, $b_1 = 4$ and $m_2 = -1/3$, $b_2 = 2$, hence the points $(0.5, 2)$ and $(0.75, 1.5)$.

There are other useful results about the parallel coordinate representation [13, 9, 6] that are not germane to this paper.

3 Fuzzy data in parallel coordinates

Our initial example showed several non-fuzzy points on a line. There are several ways that a fuzzy point might be represented. In Figure 2a we show that a fuzzy point can be represented as a single point, which would be the core or centroid of the fuzzy rule, cluster, or point. Here the line thickness represents the number of training examples the point covers (thicker lines for more examples). A more intuitive, but perhaps more difficult to decipher, approach is to model the fuzzy point as a region in each dimension (Figure 2b). When there are many fuzzy points, each color or gray level (as we use
here) is overlaid. This can make it difficult to discern patterns in the data, which is why the core point representation is offered. In Figure 2c, we show two overlapping fuzzy points/rules in different gray levels.

Figure 2. Displayed in parallel coordinates: a) fuzzy point as its core point, b) fuzzy point as a region, c) overlapping fuzzy points or rules.

After the fuzzy data is displayed, it is necessary to view it in different ways in order to extract the information contained. For example, it may be desirable to view fuzzy points from one or two classes or regions only. One may wish to restrict the ranges of fuzzy points on one or more axes in order to search for separation or trends.

The tool used to display the fuzzy points in Figure 2 and in the following figures is part of the fuzzy data mining tool which contains a learning algorithm (and other features) described in [3] and which was developed at UC Berkeley's BISC group (Berkeley Initiative in Soft Computing). The display feature allows for several different types of fuzzy point displays, of which parallel coordinates is one. Within parallel coordinates there are several options for the display of fuzzy points. The core point of a fuzzy point is a vector whose elements are the centers of the core region of a fuzzy rule in the corresponding dimensions.

It is possible to display only the region of the fuzzy set where the membership is one (the core) or the full range covered by a fuzzy set. Since the learning tool used here provides a very large region of partial coverage, only the core support of the fuzzy sets is displayed in each dimension here.

To display regions of less than full membership in a fuzzy set, it will be necessary to lighten the color or gray level to indicate decreasing membership in the set. An example for a fuzzy rule in three dimensions is shown in Figure 3. Recognizing overlapping lighter colors will be challenging. It is also possible to plot alpha-cuts of each fuzzy set so that only the region whose points have a membership equal to or above α is shown. Under an alpha-cut paradigm, the remaining figures show plots with α = 1.

Figure 3. An example of a fuzzy rule represented as its core support region and, in lighter gray, its partial support region.

A fuzzy point can also be represented by its coverage in each dimension, as shown in Figure 2c. Overlapping fuzzy points tend to obscure the overlap unless colors are used and allowed to intermix. There is a mode for overlapping fuzzy points in which the colors for each point appear one vertical pixel slice at a time, which we call interlaced or mixed lines. This helps visualize the overlap of two or more fuzzy points by allowing the user to see the different colors.

It is also possible to remove labeled classes from a plot. This is useful when one wants to concentrate on the fuzzy points from a subset of classes. There are sliders on each axis which allow fuzzy points to be masked out. Any fuzzy point that, in some dimension, is not within the range allowed by the sliders is masked out and shown in a distinct color. These points can also (optionally) be removed from the parallel coordinate display.

The described functionality is illustrated by a set of examples in the next section. These examples show the value of each way of displaying fuzzy points in parallel coordinates.
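The line/dual-point relationship of Section 2 can be checked numerically. The following is a minimal Python sketch, not the authors' tool: it assumes unit spacing between adjacent axes and measures the horizontal coordinate of each dual point from the left axis of its pair, which is the convention the Figure 1 values $i_{1,2} = (0.5, 2)$ and $i_{2,3} = (0.75, 1.5)$ imply. The function names `line_dual` and `polyline_height` are illustrative only.

```python
from fractions import Fraction


def polyline_height(point, i, t):
    """Height of a point's polyline between axes i and i+1 (0-based).

    Adjacent axes are one unit apart; t in [0, 1] is measured from axis i.
    """
    return point[i] + t * (point[i + 1] - point[i])


def line_dual(p, q):
    """Dual points, in parallel coordinates, of the line through p and q.

    Uses a = p and u = q - p in Eq. (1) and returns one (x, y) pair per
    adjacent axis pair, with x measured from the left axis of the pair.
    A pair with slope m_i == 1 has no finite dual point and yields None.
    Exact fractions avoid floating-point rounding in the check below.
    """
    a = [Fraction(v) for v in p]
    u = [Fraction(w) - Fraction(v) for v, w in zip(p, q)]
    duals = []
    for i in range(len(a) - 1):
        if u[i] == 0:
            raise ValueError(f"u_{i + 1} is zero; Eq. (1) cannot be paired here")
        m = u[i + 1] / u[i]        # slope m_i = u_{i+1} / u_i
        b = a[i + 1] - m * a[i]    # intercept b_i = a_{i+1} - m_i a_i
        if m == 1:
            duals.append(None)     # segments are parallel: no finite crossing
        else:
            duals.append((1 / (1 - m), b / (1 - m)))
    return duals


# The three points of Figure 1 lie on one line.
a, b, c = (1, 3, 1), (4, 0, 2), (2.5, 1.5, 1.5)
duals = line_dual(a, b)
print([(float(x), float(y)) for x, y in duals])  # [(0.5, 2.0), (0.75, 1.5)]

# Each point's polyline passes through every dual point.
for i, (x, y) in enumerate(duals):
    for pt in (a, b, c):
        assert polyline_height([Fraction(v) for v in pt], i, x) == y
```

Using two points on the line to obtain $u$ mirrors the Figure 1 setup; any third point on the same line (here c) crosses the same intersection points, which is what makes the n - 1 dual points a complete description of the line.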
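Section 3 describes displaying a fuzzy point either by its core point or by a per-axis region, optionally restricted to an alpha-cut. A minimal Python sketch follows. It assumes trapezoidal membership functions (support $[a, d]$, core $[b, c]$); the paper does not state the shape of the fuzzy sets produced by the learning tool, and the names `Trapezoid`, `core_point`, and `region` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trapezoid:
    """Assumed trapezoidal fuzzy set: support [a, d], core [b, c], a <= b <= c <= d."""
    a: float
    b: float
    c: float
    d: float

    def alpha_cut(self, alpha):
        """Interval whose membership is >= alpha, for 0 < alpha <= 1."""
        if not 0 < alpha <= 1:
            raise ValueError("alpha must lie in (0, 1]")
        # Invert the rising edge (a -> b) and the falling edge (c -> d).
        return (self.a + alpha * (self.b - self.a),
                self.d - alpha * (self.d - self.c))


def core_point(fuzzy_point):
    """Core point: the center of the core region in each dimension."""
    return [(s.b + s.c) / 2 for s in fuzzy_point]


def region(fuzzy_point, alpha=1.0):
    """Per-axis interval to draw for a fuzzy point; alpha=1 gives the core."""
    return [s.alpha_cut(alpha) for s in fuzzy_point]


# A two-dimensional fuzzy rule; the second axis is a singleton.
rule = [Trapezoid(0, 1, 2, 4), Trapezoid(1, 1, 1, 1)]
print(core_point(rule))   # [1.5, 1.0]
print(region(rule))       # [(1.0, 2.0), (1.0, 1.0)]
print(region(rule, 0.5))  # [(0.5, 3.0), (1.0, 1.0)]
```

With α = 1 the drawn interval is exactly the core, matching the α = 1 plots used for the remaining figures; lowering α widens each interval toward the support, and a singleton stays a single value at every α.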
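The slider masking described above, together with the per-class coverage readout used in the next section, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the BISC tool: a rule stays visible only if its core point lies within every slider range, each rule carries a precomputed set of training-example ids it covers (how coverage is determined is not modeled), and a class's percentage counts only examples of that class covered by visible rules of the same class. `Rule`, `visible`, and `coverage` are hypothetical names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    label: str          # class the rule describes
    core: tuple         # core point, one value per axis
    covers: frozenset   # ids of training examples the rule covers (assumed given)


def visible(rules, sliders):
    """Keep rules whose core lies in [low, high] on every axis."""
    return [r for r in rules
            if all(lo <= x <= hi for x, (lo, hi) in zip(r.core, sliders))]


def coverage(rules, labels):
    """Percent of each class's training examples correctly covered by the rules.

    labels maps example id -> class; an example only counts for its own class.
    """
    totals = Counter(labels.values())
    covered = defaultdict(set)
    for r in rules:
        covered[r.label] |= {e for e in r.covers if labels[e] == r.label}
    return {k: 100.0 * len(covered[k]) / n for k, n in totals.items()}


labels = {0: "A", 1: "A", 2: "B", 3: "B"}
rules = [Rule("A", (1.0,), frozenset({0, 1})),
         Rule("B", (5.0,), frozenset({2})),
         Rule("B", (2.0,), frozenset({1, 3}))]   # also touches example 1 (class A)
print(coverage(visible(rules, [(0.0, 3.0)]), labels))  # {'A': 100.0, 'B': 50.0}
print(coverage(visible(rules, [(0.0, 1.5)]), labels))  # {'A': 100.0, 'B': 0.0}
```

Narrowing the single slider from [0, 3] to [0, 1.5] hides the class-B rule whose core is at 2.0, so class B drops to 0% while class A stays at 100%; this is the kind of readout the examples in Section 4 rely on when isolating a class.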
4 Fuzzy Data analysis

The operations applied to fuzzy data sets are best illustrated by examples. Three data sets were used to show the utility of parallel coordinates in analyzing fuzzy data. Here we describe results on the well-studied Iris plant database. Other experiments using more complex data sets are reported in [2].

The Iris plant database consists of 150 examples, each with 4 features, which describe 3 types or classes of Iris. In our experiment, fuzzy points (fuzzy rules in this instance) were created from a training set of 75 examples. In Figure 4a we show the 11 fuzzy rules that were learned for the three classes. The rules for a given class all have the same gray level. This means the three rules whose cores are at the bottom of the petal width feature are from the same class. It can also be seen that the other two classes have 3 and 5 fuzzy rules describing them. Also shown in the upper left-hand corner of the plot is the percentage of training examples correctly covered by the union of the fuzzy rules for each class. The check boxes allow the user to hide one or more classes by unchecking them. Here, the union of the rules for each class correctly covers all training examples of that class in this experiment.

Figure 4. The parallel coordinate view of the Iris data set displaying only the centers of the cores (a) and using interlaced lines (b). Rules for the same class are displayed using the same gray level.

One thing that is clear from viewing in parallel coordinates is that features 3 and 4 (petal length and width) appear to cleanly separate one class. In fact, for this data set it appears that all classes can be separated using only feature 3. However, the cores do not show the coverage of the fuzzy points, so in Figure 4b we show overlaid or mixed colors (which we call transparent colors) with just the classes Virginica and Versicolour displayed.

Now, we investigate isolating classes by using the sliding boxes (sliders) on each axis. Only the fuzzy points whose cores lie within the region between the two sliders on each axis are displayed, and only their coverage (on the training set) is reflected in the upper left-hand coverage area. In Figure 5a, the class Virginica has been isolated from the other two classes using just petal length. This can be seen because the only fuzzy points between the sliders are of one gray level. Also, the display in the upper left shows that 100% of the training examples for Virginica are covered and no examples of the other classes (0% for each of the other two) are covered.

Given known results about the Iris data set [4], the fact that a single dimension can separate this class is simply confirmed in the parallel coordinate view. In Figure 5b, we attempt to isolate the fuzzy points for the Setosa class using just petal length. We can almost do it, but are left with 12% of the examples from the Versicolour class. This is because we cannot remove the last few fuzzy points for Versicolour without also removing some points/rules for Setosa. This is consistent with what is known about the Iris data set, namely that these two classes overlap.

Figure 5. The parallel coordinate view after using the slider on petal length to isolate only the fuzzy points describing the Virginica class (a) and after using the slider on petal length to attempt to isolate only the fuzzy points describing the Setosa class (b).

5 Summary and Future Work

Parallel coordinates allow n-dimensional fuzzy points to be viewed in 2 dimensions without loss of information. This paper shows how parallel coordinates can be used both to view fuzzy points in several different ways and to understand them. Fuzzy points can be viewed by their cores or centroids, by thick-line cores which indicate how many examples each point covers, by interlaced lines with each point assigned a color or gray level, and by transparent colors or gray levels. The last representation is difficult to see with gray levels.

A Java-based tool has been developed to display fuzzy points in each of the ways described above in n-dimensional parallel coordinates. The tool also allows restrictive ranges to be applied to each dimension. For fuzzy points which correspond to a particular fuzzy or non-fuzzy class, each time restrictions are applied to an axis, the percentage of examples removed for each class is shown.

The visualization can be used, for example, to remove fuzzy rules that have no impact, as shown in the Shuttle data set experiments [2] (not reproduced here). It can also be used to remove features that are not helpful in separating classes, as shown with experiments on the Iris data set, in which the overlap between the fuzzy points in the two sepal features clearly precludes their use as class separators.

Parallel coordinates was also effective in visualizing a set of fuzzy clusters (by centroids). The visualization was used to separate red tide phytoplankton blooms from other types of blooms and water. We were able to recreate the results of a painstaking search through 2-D projections that produced a similar rule for isolating red tide. The re-creation required less than 5 minutes [2].

We have applied the concept of viewing fuzzy points in parallel coordinates only to fuzzy points which are rules, or centroids of potential rules, that will be used to place an object in a class. The technique may also be useful for fuzzy control rules in which the consequent of a rule is a fuzzy set or fuzzy function. The visualization process may indicate where coverage is not fine enough (light overlap) and appears likely to make it easy to view uncovered regions. To apply the idea to fuzzy control rules, an indication of goodness of control would need to be displayed after every change to the fuzzy points/rules.

One intriguing possibility with parallel coordinates is to add features that are, for example, projections onto non-axis-parallel lines, or more complicated features, to allow the data to be separated. The extra features can still easily be visualized and may separate the data much better, even though they may be redundant.

The tool we have built could be further extended to allow fuzzy sets to be defined on each axis. Then a fuzzy rule/point could be created by linking a choice of one or more fuzzy sets per axis. Not all axes need to be included in every rule. Hence, it seems a natural approach to the visual development of fuzzy sets. In a way analogous to our current accuracy percentage for the training set, after a rule is visually created an accuracy measure can be displayed for it when it is applied to classification data. Then the fuzzy sets may be modified to achieve the desired accuracy on the training or validation set. This approach seems to hold promise for developing fuzzy classification rules and for interactive data exploration.

Acknowledgements

This research was partially done at BISC while L. Hall was on sabbatical. Thanks to UC Berkeley's Div. of CS and Prof. Zadeh for the use of their facilities. M. Berthold was supported by the Deutsche Forschungsgemeinschaft through grant DFG Be1740/7-1.

References

[1] M. Berthold and D. H. (eds.). Intelligent Data Analysis, An Introduction. Springer-Verlag, 1999.
[2] M. R. Berthold and L. O. Hall. Visualizing fuzzy points in parallel coordinates. Technical Report UCB-CSD-99-1082, University of California at Berkeley, 1999.
[3] M. R. Berthold and K.-P. Huber. Constructing fuzzy graphs from examples. Intelligent Data Analysis, 3(1):37-54, 1999.
[4] C. Blake and C. Merz. UCI repository of machine learning databases, 1998.
[5] S. Chiu. Fuzzy model identification based on cluster estimation. Journal of Intelligent & Fuzzy Systems, 2(3), 1994.
[6] S.-Y. Chou, S.-W. Lin, and C.-S. Yeh. Cluster identification with parallel coordinates. Pattern Recognition Letters, 20:565-572, 1999.
[7] M. Delgado, G.-S. A.F., and M. F. A fuzzy clustering-based rapid prototyping for fuzzy rule-based modeling. IEEE Transactions on Fuzzy Systems, 5(2):223-233, May 1997.
[8] M. Friedman and A. Kandel. Introduction to Pattern Recognition: Statistical, Structural, Neural and Fuzzy Logic Approaches. World Scientific, 1998.
[9] C. Gennings, K. Dawson, W. Carter, and R. Myers. Interpreting plots of a multidimensional dose-response surface in a parallel coordinate system. Biometrics, 46:719-735, 1990.
[10] A. Inselberg. The plane with parallel coordinates. Visual Computer, 1:69-91, 1985.
[11] A. Inselberg. Multidimensional detective. In IEEE Conference on Visualization, 1997.
[12] A. Inselberg and B. Dimsdale. Multidimensional lines I: representation. SIAM J. Applied Math, 54(2):559-577, 1994.
[13] A. Inselberg and B. Dimsdale. Multidimensional lines II: proximity and applications. SIAM J. Applied Math, 54(2):578-596, 1994.
[14] N. Q. Jr. and M. Breuer. A force directed component placement procedure for printed circuit boards. IEEE Transactions on Circuits and Systems, 26(6):377-388, 1979.
[15] T. Kohonen. Self-organizing Maps. Springer-Verlag, Berlin, Heidelberg, 1995.
[16] J. Meulman. A distance approach to nonlinear multivariate analysis. DSWO Press, Leiden, The Netherlands, 1986.
