Data Visualization Techniques
Figure 2.9 Three cases where there is no observed correlation between the two plotted attributes in each
of the data sets.
from lower left to upper right, this means that the values of X increase as the values
of Y increase, suggesting a positive correlation (Figure 2.8a). If the pattern of plotted
points slopes from upper left to lower right, the values of X increase as the values of Y
decrease, suggesting a negative correlation (Figure 2.8b). A line of best fit can be drawn
to study the correlation between the variables. Statistical tests for correlation are given
in Chapter 3 on data integration (Eq. (3.3)). Figure 2.9 shows three cases for which
there is no correlation between the two attributes in each of the given data
sets. Section 2.3.2 shows how scatter plots can be extended to n attributes, resulting in a
scatter-plot matrix.
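The slope reading described above can be checked numerically. The sketch below is a minimal illustration, not the statistical test referenced in Chapter 3: it assumes Pearson's correlation coefficient as the measure and an ordinary least-squares line as the "line of best fit". The function names and the sample data are made up for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def best_fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Made-up sample data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 5, 4, 6]    # points rise from lower left to upper right
ys_down = [6, 4, 5, 4, 2]  # points fall from upper left to lower right

r_up = pearson_r(xs, ys_up)      # positive for the rising pattern
r_down = pearson_r(xs, ys_down)  # negative for the falling pattern
```

A coefficient near zero would correspond to the uncorrelated cases of Figure 2.9, although a value near zero only rules out a linear relationship.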
In conclusion, basic data descriptions (e.g., measures of central tendency and measures
of dispersion) and graphic statistical displays (e.g., quantile plots, histograms, and
scatter plots) provide valuable insight into the overall behavior of your data. By helping
to identify noise and outliers, they are especially useful for data cleaning.
How can we convey data to users effectively? Data visualization aims to communicate
data clearly and effectively through graphical representation. Data visualization has been
used extensively in many applications—for example, at work for reporting, managing
business operations, and tracking progress of tasks. More popularly, we can take advantage
of visualization techniques to discover data relationships that are otherwise not
easily observable by looking at the raw data. Nowadays, people also use data visualization
to create fun and interesting graphics.
In this section, we briefly introduce the basic concepts of data visualization. We start
with multidimensional data such as those stored in relational databases. We discuss
several representative approaches, including pixel-oriented techniques, geometric projection
techniques, icon-based techniques, and hierarchical and graph-based techniques.
We then discuss the visualization of complex data and relations.
2.3 Data Visualization 57
Figure 2.10 Pixel-oriented visualization of four attributes by sorting all customers in income ascending
order.
58 Chapter 2 Getting to Know Your Data
[Figure 2.12 panel labels: (a) one data record across segments Dim 1 to Dim 6; (b) pixel layout across segments Dim 1 to Dim 6]
Figure 2.12 The circle segment technique. (a) Representing a data record in circle segments. (b) Laying
out pixels in circle segments.
to fill the windows. A space-filling curve is a curve with a range that covers the entire
n-dimensional unit hypercube. Since the visualization windows are 2-D, we can use any
2-D space-filling curve. Figure 2.11 shows some frequently used 2-D space-filling curves.
Note that the windows do not have to be rectangular. For example, the circle segment
technique uses windows in the shape of segments of a circle, as illustrated in Figure 2.12.
This technique can ease the comparison of dimensions because the dimension windows
are located side by side and form a circle.
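To make the idea of filling a 2-D window along a space-filling curve concrete, here is a minimal sketch that uses the Hilbert curve. The Hilbert curve is one well-known 2-D space-filling curve; choosing it here is an assumption of this example, and the function names `hilbert_d2xy` and `fill_window` are hypothetical. The sketch assumes a square window whose side is a power of two.

```python
def hilbert_d2xy(side, d):
    """Map position d along a Hilbert curve to (x, y) in a side x side grid.

    side must be a power of two, and 0 <= d < side * side.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate (and possibly flip) the current sub-square.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def fill_window(values, side):
    """Place values, already in their global order, into a 2-D window."""
    window = [[None] * side for _ in range(side)]
    for d, value in enumerate(values[: side * side]):
        x, y = hilbert_d2xy(side, d)
        window[y][x] = value
    return window

# Sixteen made-up attribute values, already sorted in the global order.
window = fill_window(list(range(16)), 4)
```

Because consecutive positions on the curve are always neighboring pixels, records that are close in the global order stay close in the window, which is the property a plain row-by-row fill lacks at the end of each row.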
[Figure 2.13 plot: X axis 0 to 80, Y axis 0 to 80]
Figure 2.13 Visualization of a 2-D data set using a scatter plot. Source:
www.cs.sfu.ca/jpei/publications/rareevent-geoinformatica06.pdf.
Figure 2.14 Visualization of a 3-D data set using a scatter plot. Source:
http://upload.wikimedia.org/wikipedia/commons/c/c4/Scatter_plot.jpg.
A data record is represented by a polygonal line that intersects each axis at the point
corresponding to the associated dimension value (Figure 2.16).
A major limitation of the parallel coordinates technique is that it cannot effectively
show a data set of many records. Even for a data set of several thousand records,
visual clutter and overlap often reduce the readability of the visualization and make the
patterns hard to find.
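The polygonal-line construction can be sketched as follows. This is a minimal illustration under stated assumptions: each dimension is min-max scaled so that all axes share the same drawn height, which is a common convention rather than something the text prescribes, and the function name `to_polylines` and the sample records are chosen only for this example.

```python
def to_polylines(records):
    """Return one polyline per record: a list of (axis_index, height) points.

    Each dimension is min-max scaled to [0, 1] so that every axis has the
    same drawn length; a constant dimension is placed at mid-height.
    """
    dims = len(records[0])
    lows = [min(r[i] for r in records) for i in range(dims)]
    highs = [max(r[i] for r in records) for i in range(dims)]
    polylines = []
    for r in records:
        points = []
        for i in range(dims):
            span = highs[i] - lows[i]
            height = 0.5 if span == 0 else (r[i] - lows[i]) / span
            points.append((i, height))
        polylines.append(points)
    return polylines

# Three illustrative four-dimensional records.
records = [(5.1, 3.5, 1.4, 0.2), (7.0, 3.2, 4.7, 1.4), (6.3, 3.3, 6.0, 2.5)]
lines = to_polylines(records)
```

Each record contributes one vertex per axis, so a data set of several thousand records produces several thousand overlapping lines, which is where the clutter described above comes from.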
[Figure 2.15 panels: Sepal length (mm), Petal length (mm), Sepal width (mm), Petal width (mm); legend: Iris species Setosa, Versicolor, Virginica]
Figure 2.15 Visualization of the Iris data set using a scatter-plot matrix. Source:
http://support.sas.com/documentation/cdl/en/grstatproc/61948/HTML/default/images/gsgscmat.gif.
Viewing large tables of data can be tedious. By condensing the data, Chernoff faces
make the data easier for users to digest. In this way, they facilitate visualization of regularities
and irregularities present in the data, although their power in relating multiple
relationships is limited. Another limitation is that specific data values are not shown.
Furthermore, facial features vary in perceived importance. This means that the similarity
of two faces (representing two multidimensional data points) can vary depending on the
order in which dimensions are assigned to facial characteristics. Therefore, this mapping
should be carefully chosen. Eye size and eyebrow slant have been found to be important.
Asymmetrical Chernoff faces were proposed as an extension to the original technique.
Since a face has vertical symmetry (along the y-axis), the left and right sides of a face are
identical, which wastes space. Asymmetrical Chernoff faces double the number of facial
characteristics, thus allowing up to 36 dimensions to be displayed.
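The effect of assignment order on perceived similarity can be illustrated with a toy model. Everything numeric below is an assumption made for this sketch: the feature names, the perceptual weights in `FEATURE_WEIGHTS`, and the weighted-difference measure are hypothetical and are not taken from any study. The weights only encode the idea that some features, such as eye size and eyebrow slant, are noticed more than others.

```python
# Hypothetical perceptual weights: larger means viewers notice the feature more.
FEATURE_WEIGHTS = {
    "eye_size": 3.0,
    "eyebrow_slant": 3.0,
    "mouth_curve": 1.0,
    "nose_length": 1.0,
}

def to_face(point, assignment):
    """Assign each dimension value of a data point to a facial feature."""
    return dict(zip(assignment, point))

def perceived_distance(face_a, face_b):
    """Weighted absolute difference between two faces (an assumed toy model)."""
    return sum(FEATURE_WEIGHTS[f] * abs(face_a[f] - face_b[f]) for f in face_a)

# Two made-up data points with values already scaled to [0, 1].
p = (0.1, 0.2, 0.9, 0.8)
q = (0.2, 0.1, 0.1, 0.2)

order_1 = ("eye_size", "eyebrow_slant", "mouth_curve", "nose_length")
order_2 = ("mouth_curve", "nose_length", "eye_size", "eyebrow_slant")

d1 = perceived_distance(to_face(p, order_1), to_face(q, order_1))
d2 = perceived_distance(to_face(p, order_2), to_face(q, order_2))
```

Under these assumed weights the same two points look more than twice as different with `order_2`, because the dimensions on which they differ most land on the heavily weighted features.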
The stick figure visualization technique maps multidimensional data to five-piece
stick figures, where each figure has four limbs and a body. Two dimensions are mapped
to the display (x and y) axes and the remaining dimensions are mapped to the angle
[Figure 2.16 plot: parallel axes x1 to x10; vertical scale y from -10 to 10]
Figure 2.16 A visualization that uses parallel coordinates. Source:
www.stat.columbia.edu/~cook/movabletype/archives/2007/10/parallel coordi.thml.
Figure 2.17 Chernoff faces. Each face represents an n-dimensional data point (n ≤ 18).
and/or length of the limbs. Figure 2.18 shows census data, where age and income are
mapped to the display axes, and the remaining dimensions (gender, education, and so
on) are mapped to stick figures. If the data items are relatively dense with respect to
the two display dimensions, the resulting visualization shows texture patterns, reflecting
data trends.
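The stick figure mapping can be sketched as a small function. This is a minimal illustration under stated assumptions: only limb angles are used (the technique also allows limb lengths), angles are scaled linearly into an assumed range of -90 to 90 degrees, and the function name `to_stick_figure`, the attribute names, and the value ranges are hypothetical.

```python
def to_stick_figure(record, x_dim, y_dim, limb_dims, ranges):
    """Map a record (a dict) to a stick-figure specification.

    x_dim, y_dim: the two dimensions used as display axes.
    limb_dims: up to four dimensions, one per limb, mapped to limb angle.
    ranges: {dimension: (low, high)} used to scale values to angles.
    """
    if len(limb_dims) > 4:
        raise ValueError("a five-piece figure has only four limbs")
    angles = []
    for dim in limb_dims:
        low, high = ranges[dim]
        fraction = (record[dim] - low) / (high - low)
        angles.append(-90.0 + 180.0 * fraction)  # assumed range: -90 to 90 degrees
    return {"x": record[x_dim], "y": record[y_dim], "limb_angles": angles}

# A made-up census-style record; attribute names are for illustration only.
person = {"age": 40, "income": 52000, "education_years": 16, "household_size": 3}
figure = to_stick_figure(
    person,
    x_dim="age",
    y_dim="income",
    limb_dims=["education_years", "household_size"],
    ranges={"education_years": (0, 20), "household_size": (1, 5)},
)
```

When many such figures are drawn close together, records with similar limb angles produce the repeating texture that the text describes.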
[Figure 2.18 axes: age, income]
Figure 2.18 Census data represented using stick figures. Source: Professor G. Grinstein, Department of
Computer Science, University of Massachusetts at Lowell.
Figure 2.20 Newsmap: Use of tree-maps to visualize Google news headline stories. Source:
www.cs.umd.edu/class/spring2005/cmsc838s/viz4all/ss/newsmap.png.