Week 5 Lecture 14
Computing
5: Visualization
Content
• Panel Displays
• Surface Plots and 3D Scatter Plots
• Contour Plots
• Other 2D Representations of Data
• Other Approaches to Data Visualization
Introduction
• Visualization of multivariate data is related to exploratory data analysis
(EDA).
• The term ‘exploratory’ is in contrast to ‘confirmatory’, which could describe
hypothesis testing.
• It is important to do the exploratory work before hypothesis testing, to learn which
questions are appropriate to ask and which methods are most appropriate to answer
them.
• With multivariate data, we may also be interested in dimension reduction or finding
structure or groups in the data.
• In this chapter, we focus on methods for visualizing multivariate data.
Several graphics functions are used, including functions in the R graphics
package, in lattice and MASS, in the rggobi interface to GGobi, and in the rgl
package for interactive 3D visualization. Table 1.4 lists some basic graphics
functions. Table 4.1 lists more.
Panel displays
• Panel display: an array of two-dimensional graphical summaries of
pairs of variables in a multivariate dataset. For example, a scatterplot
matrix displays the scatterplots for all pairs of variables in an array.
pairs: produces a scatterplot matrix, as shown in Figures 4.1 and 4.2 in
Example 4.1, and Figure 3.7. An example of a panel display of
three-dimensional plots is Figure 4.5.
Example 4.1 (Scatterplot matrix)
• Compare the four variables in the iris data for the species virginica, in
a scatterplot matrix.
# virginica data in first 4 columns of the last 50 obs.
pairs(iris[101:150, 1:4])
• By default, the variable names appear along the diagonal. The pairs
function takes an optional argument diag.panel, which is a function that
determines what is displayed along the diagonal.
To obtain a graph with estimated density curves along the diagonal,
supply the name of a function to plot the densities. The following
function, panel.d, plots the densities.
panel.d <- function(x, ...) {
    usr <- par("usr")
    on.exit(par(usr = usr))          # restore the user coordinates on exit
    par(usr = c(usr[1:2], 0, .5))    # keep x-range, set y-range to [0, 0.5]
    lines(density(x))
}
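As a usage sketch, the panel function can be passed to pairs through diag.panel. Standardizing the columns with scale and using common axis limits is an assumed preprocessing step here (the names x and r are illustrative); it keeps the density curves within the [0, 0.5] vertical range set inside panel.d.

```r
# density panel for the diagonal (same idea as panel.d above)
panel.d <- function(x, ...) {
    usr <- par("usr")
    on.exit(par(usr = usr))            # restore user coordinates on exit
    par(usr = c(usr[1:2], 0, 0.5))     # y-range suited to standardized data
    lines(density(x))
}

# standardize the virginica measurements (assumed preprocessing)
x <- scale(iris[101:150, 1:4])
r <- range(x)                          # common limits for all panels
pairs(x, diag.panel = panel.d, xlim = r, ylim = r)
```

Without the scaling step, the fixed upper limit of 0.5 may cut off or flatten some of the density curves.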
Fig.4.1: Scatterplot matrix (pairs) comparing four measurements of iris
library(lattice)
splom(iris[101:150, 1:4])    # plot 1

# for all 3 at once, in color, plot 2
splom(iris[, 1:4], groups = iris$Species)

# for all 3 at once, black and white, plot 3
splom(~iris[1:4], groups = Species, data = iris,
      col = 1, pch = c(1, 2, 3), cex = c(.5, .5, .5))
[Figure: splom scatterplot matrix of the iris data; diagonal panels Sepal.Length, Sepal.Width, Petal.Length, Petal.Width]
Fig.4.3: Perspective plot of the standard bivariate normal density in Example 4.2.
# store viewing transformation in M
M <- persp(x, y, z, theta = 45, phi = 30,
           expand = .4, box = FALSE)
The transformation returned by the persp function call is
              [,1]       [,2]       [,3]       [,4]
[1,]  2.357023e-01 -0.1178511  0.2041241 -0.2041241
[2,]  2.357023e-01  0.1178511 -0.2041241  0.2041241
[3,] -2.184757e-16  4.3700078  2.5230252 -2.5230252
[4,]  1.732284e-17 -0.3464960 -2.9321004  3.9321004
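The persp call assumes that x, y, and z already exist (they come from Example 4.2, which is not reproduced here). A minimal sketch, assuming a 50-point grid on [-3, 3] and a hypothetical helper f for the standard bivariate normal density, shows one way to build them and to reuse the returned matrix M with trans3d to add a point to the plot:

```r
# hypothetical helper: standard bivariate normal density
f <- function(x, y) exp(-(x^2 + y^2) / 2) / (2 * pi)

x <- y <- seq(-3, 3, length = 50)   # assumed grid
z <- outer(x, y, f)                 # 50 x 50 matrix of heights

# store the viewing transformation in M
M <- persp(x, y, z, theta = 45, phi = 30, expand = .4, box = FALSE)

# project the 3D point (0, 0, f(0, 0)) to 2D and mark it on the plot
p <- trans3d(0, 0, f(0, 0), pmat = M)
points(p$x, p$y, pch = 16)
```

The matrix M is what makes it possible to draw additional points, lines, or text in the coordinates of the perspective plot.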
library(lattice)
x <- y <- seq(-3, 3, length = 50)
xy <- expand.grid(x, y)
z <- (1/(2*pi)) * exp(-.5 * (xy[, 1]^2 + xy[, 2]^2))
wireframe(z ~ xy[, 1] * xy[, 2])
4.3.2 Three-dimensional scatterplot
The cloud (lattice) function produces 3D scatterplots, which can be used
to explore whether there are groups or clusters in the data. To apply
cloud, provide a formula z ∼ x∗y, where z is the variable plotted on the
vertical axis against x and y.
Example 4.5 (3D scatterplot)
Use cloud to display a 3D scatterplot of the iris data. There are
three species of iris and each is measured on four variables. The
following code produces a 3D scatterplot of sepal length, sepal
width, and petal length (similar to (3) in Figure 4.5).
library(lattice)
attach(iris)
# basic 3 color plot with arrows along axes
print(cloud(Petal.Length ~ Sepal.Length * Sepal.Width, dat
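The call above is cut off in the source. A minimal sketch of a complete call, assuming the missing arguments are data = iris and groups = Species (the same arguments used in the four-panel code later in this example), might look like this:

```r
library(lattice)

# 3D scatterplot of three iris measurements, one color per species
# (data = iris and groups = Species are assumed, not taken from the source)
p <- cloud(Petal.Length ~ Sepal.Length * Sepal.Width,
           data = iris, groups = Species)
print(p)
```

When data = iris is supplied, the variables are looked up in the data frame, so attaching iris is not strictly needed for this call.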
The iris data has four variables, so there are four subsets of three
variables to graph. To see all four plots on the screen, use the more
and split options. The split arguments determine the location of the
plot within the panel display.
print(cloud(Sepal.Length ~ Petal.Length * Petal.Width,
      data = iris, groups = Species, main = "1", pch = 1:3,
      scales = list(draw = FALSE), zlab = "SL",
      screen = list(z = 30, x = -75, y = 0)),
      split = c(1, 1, 2, 2), more = TRUE)
print(cloud(Sepal.Width ~ Petal.Length * Petal.Width,
      data = iris, groups = Species, main = "2", pch = 1:3,
      scales = list(draw = FALSE), zlab = "SW",
      screen = list(z = 30, x = -75, y = 0)),
      split = c(2, 1, 2, 2), more = TRUE)
print(cloud(Petal.Length ~ Sepal.Length * Sepal.Width,
      data = iris, groups = Species, main = "3", pch = 1:3,
      scales = list(draw = FALSE), zlab = "PL",
      screen = list(z = 30, x = -55, y = 0)),
      split = c(1, 2, 2, 2), more = TRUE)
print(cloud(Petal.Width ~ Sepal.Length * Sepal.Width,
      data = iris, groups = Species, main = "4", pch = 1:3,
      scales = list(draw = FALSE), zlab = "PW",
      screen = list(z = 30, x = -55, y = 0)),
      split = c(2, 2, 2, 2))
detach(iris)
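[Figure: panel display of four cloud plots of the iris data. (1) SL against Petal.Length and Petal.Width; (2) SW against Petal.Length and Petal.Width; (3) PL against Sepal.Length and Sepal.Width; (4) PW against Sepal.Length and Sepal.Width.]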
Fig.4.6: Contour plot and levelplot of volcano data in Examples 4.6 and 4.7.
Example 4.7 (Filled contour plots)
A contour plot with a 3D effect can be displayed in 2D by overlaying
the contour lines on a color map corresponding to the height.
The image function in the graphics package provides the color background
for the plot. The plot produced below is similar to Figure 4.6(a), with
the background of the plot in terrain colors.
image(volcano, col = terrain.colors(100), axes = FALSE)
contour(volcano, levels = seq(100, 200, by = 10), add = TRUE)
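Related displays are available elsewhere in R. The sketch below is not taken from the slides: it draws the same volcano data with levelplot from lattice and with filled.contour from the graphics package, and the argument choices (suppressed axis ticks, terrain colors, unit aspect ratio) are illustrative assumptions.

```r
library(lattice)

# levelplot of the volcano elevations; axis ticks suppressed
lp <- levelplot(volcano, scales = list(draw = FALSE),
                xlab = "", ylab = "")
print(lp)

# base-graphics alternative: filled contours in terrain colors
filled.contour(volcano, color.palette = terrain.colors, asp = 1)
```

Both functions add a color key to the side of the plot, which image does not provide by itself.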
Fig.4.7: Flat density histogram of bivariate normal data with hexagonal bins produced by hexbin in Example 4.8.
5. Other 2D Representations of Data
Other 2D representations of multivariate data include Andrews curves,
parallel coordinate plots, and various iconographic displays such as
segment plots and star plots.
1. Andrews Curves
The plot represents each observation in the dataset as a curve in two-
dimensional space. The values of the observation are used as the
coefficients of a finite Fourier series, which defines a function of a
single variable t. Each observation is then represented by the curve of
that function, and all the curves are plotted on the same graph.
The x-axis of the plot represents t, which ranges over [−π, π],
while the y-axis represents the value of the function at t.
Each curve can then be colored, or given a line type, according
to a categorical or continuous variable.
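For reference, the Andrews curve of an observation x_i = (x_i1, ..., x_id) is usually written as f_i(t) = x_i1/sqrt(2) + x_i2 sin(t) + x_i3 cos(t) + x_i4 sin(2t) + x_i5 cos(2t) + ..., for −π ≤ t ≤ π. A minimal sketch of this definition for a data vector of any length follows; the function name andrews and the example vector are hypothetical.

```r
# hypothetical helper: Andrews curve of data vector v evaluated at points t
andrews <- function(t, v) {
    d <- length(v)
    out <- rep(v[1] / sqrt(2), length(t))
    if (d > 1) for (j in 2:d) {
        k <- j %/% 2                            # frequency: 1, 1, 2, 2, 3, ...
        trig <- if (j %% 2 == 0) sin else cos   # even index -> sin, odd -> cos
        out <- out + v[j] * trig(k * t)
    }
    out
}

t <- seq(-pi, pi, length.out = 101)
y <- andrews(t, c(1, 0.5, -0.5))   # illustrative data vector in R^3
plot(t, y, type = "l", xlab = "t", ylab = "f(t)")
```

For a vector in R^3 this reduces to the three-term function used in Example 4.9.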
Example 4.9 (Andrews curves)
• Measurements of leaves for two types of leaf architecture are
represented by Andrews curves (leafshape17 in the DAAG package).
Three measurements (leaf length, petiole, and leaf width)
correspond to points in R^3.
• To plot the curves, define a function to compute f_i(t) for arbitrary
points x_i in R^3 and −π ≤ t ≤ π. Evaluate the function
along the interval [−π, π] for each sample point x_i.
library(DAAG)
attach(leafshape17)

f <- function(a, v) {
    # Andrews curve f(a) for a data vector v in R^3
    v[1]/sqrt(2) + v[2]*sin(a) + v[3]*cos(a)
}

# scale data to range [-1, 1]
x <- cbind(bladelen, petiole, bladewid)
n <- nrow(x)
mins <- apply(x, 2, min)    # column minimums
maxs <- apply(x, 2, max)    # column maximums
r <- maxs - mins            # column ranges
y <- sweep(x, 2, mins)      # subtract column mins
y <- sweep(y, 2, r, "/")    # divide by range
x <- 2 * y - 1              # now has range [-1, 1]

# set up plot window, but plot nothing yet
plot(0, 0, xlim = c(-pi, pi), ylim = c(-3, 3),
     xlab = "t", ylab = "Andrews Curves",
     main = "", type = "n")

# now add the Andrews curves for each observation
# line type corresponds to leaf architecture
# 0=orthotropic, 1=plagiotropic
a <- seq(-pi, pi, len = 101)
dim(a) <- length(a)
for (i in 1:n) {
    g <- arch[i] + 1
    y <- apply(a, MARGIN = 1, FUN = f, v = x[i, ])
    lines(a, y, lty = g)
}
legend(3, c("Orthotropic", "Plagiotropic"), lty = 1:2)
detach(leafshape17)
Fig.4.8: Andrews curves for leafshape17 (DAAG) data at latitude 17.1: leaf length, width, and petiole measurements in Example 4.9. Curves are identified by leaf architecture.