Week 5 Lecture 14

Uploaded by

HANJING QUAN

AA037014 Statistical Computing
5: Visualization
Content
• Panel Displays
• Surface Plots and 3D Scatter Plots
• Contour Plots
• Other 2D Representations of Data
• Other Approaches to Data Visualization
Introduction
• Visualization of multivariate data is related to exploratory data analysis
(EDA).
• The term ‘exploratory’ is in contrast to ‘confirmatory’, which could describe
hypothesis testing.
• It is important to do the exploratory work before hypothesis testing, to learn
which questions are appropriate to ask and which methods are most appropriate
to answer them.
• With multivariate data, we may also be interested in dimension reduction or finding
structure or groups in the data.
• In this chapter, we focus on methods for visualizing multivariate data.
Several graphics functions are used, from the R graphics package, the lattice
and MASS packages, the rggobi interface to GGobi, and the rgl package for
interactive 3D visualization. Table 1.4 lists some basic graphics functions.
Table 4.1 lists more.
Panel displays
• Panel display: an array of two-dimensional graphical summaries of
pairs of variables in a multivariate dataset. For example, a scatterplot
matrix displays the scatterplots for all pairs of variables in an array.
• pairs: produces a scatterplot matrix, as shown in Figures 4.1 and 4.2 in
Example 4.1, and Figure 3.7. An example of three-dimensional plots is
Figure 4.5.
Example 4.1 (Scatterplot matrix)
• Compare the four variables in the iris data for the species virginica, in
a scatterplot matrix.
# virginica data in first 4 columns of the last 50 obs.
pairs(iris[101:150, 1:4])

• The variable names will appear along the diagonal. The pairs function
takes an optional argument diag.panel, which is a function that
determines what is displayed along the diagonal.
To obtain a graph with estimated density curves along the diagonal,
supply the name of a function that plots the densities. The following
function panel.d plots the densities.
panel.d <- function(x, ...) {
    usr <- par("usr")               # save the current user coordinates
    on.exit(par(usr = usr))         # restore them by name when the panel is done
    par(usr = c(usr[1:2], 0, .5))   # keep the x limits, set y limits to (0, .5)
    lines(density(x))
}

In panel.d, the graphics parameter usr specifies the extremes of
the user coordinates of the plotting region. Before plotting, apply
scale to standardize each of the one-dimensional samples.
x <- scale(iris[101:150, 1:4])
r <- range(x)
pairs(x, diag.panel = panel.d, xlim = r, ylim = r)

The pairs plot is displayed in Figure 4.1.
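
As a quick check of what scale does in this example, the sketch below standardizes the same four virginica columns and confirms that each column ends up with mean 0 and standard deviation 1. The names xs, col_means, and col_sds are introduced only for this illustration.

```r
# Standardize the virginica measurements (rows 101-150 of iris)
xs <- scale(iris[101:150, 1:4])

# Each column is centered at 0 and rescaled to unit standard deviation
col_means <- colMeans(xs)
col_sds <- apply(xs, 2, sd)
print(round(col_means, 10))
print(col_sds)

# The common range that Example 4.1 passes to xlim and ylim
r <- range(xs)
print(r)
```

Because every variable is on the same standardized scale, a single range r can be shared by all panels of the scatterplot matrix.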


Fig.4.1: Scatterplot matrix (pairs) comparing four measurements of iris
virginica species in Example 4.1. Diagonal panels: Sepal.Length, Sepal.Width,
Petal.Length, Petal.Width; all axes run from about −2 to 2.

Observation: The length variables are positively correlated, and the
width variables appear to be positively correlated. Other structure
could be present in the data that is not revealed by the bivariate
marginal distributions.
The scatterplot matrix function splom in lattice is illustrated below.

library(lattice)
splom(iris[101:150, 1:4])    # plot 1
# for all 3 at once, in color, plot 2
splom(iris[, 1:4], groups = iris$Species)
# for all 3 at once, black and white, plot 3
splom(~iris[1:4], groups = Species, data = iris, col = 1,
      pch = c(1, 2, 3), cex = c(.5, .5, .5))

The last plot (plot 3) is displayed in Figure 4.2. It is displayed here in
black and white, but on screen the panel display is easier to interpret
when displayed in color (plot 2). Also see the 3D scatterplot of the
iris data in Figure 4.5.
Fig.4.2: Scatterplot matrix comparing four measurements of iris data:
setosa (circle), versicolor (triangle), virginica (cross) from Example 4.1.
Diagonal panels: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width.
Surface Plots and 3D Scatter Plots
• persp (graphics) draws perspective plots of surfaces over the
plane.
• demo(persp): try running the demo examples for persp.
• 3D methods are also available in the lattice graphics package and the rgl
package.

4.3.1 Surface plots

expand.grid: mesh a grid of regularly spaced points in the plane.
If we do not need to save the x, y values, and only need the
function values $\{z_{ij} = f(x_i, y_j)\}$, the outer function can be used.
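
To see how the two approaches relate, the following sketch evaluates one function on the same grid with outer and with expand.grid. The surface g and the small grid sizes are hypothetical choices made only for this illustration; note that outer needs a vectorized function.

```r
# A hypothetical surface used only for illustration
g <- function(x, y) x^2 - y^2

x <- seq(-1, 1, length.out = 5)
y <- seq(-2, 2, length.out = 4)

# outer: returns the length(x) by length(y) matrix with z[i, j] = g(x[i], y[j])
z_outer <- outer(x, y, g)

# expand.grid: keeps the (x, y) pairs as rows of a data frame, x varying fastest
xy <- expand.grid(x = x, y = y)
z_grid <- g(xy$x, xy$y)

# Filling a matrix column by column from z_grid reproduces the outer result
z_matrix <- matrix(z_grid, nrow = length(x), ncol = length(y))
print(all.equal(z_outer, z_matrix))
```

The matrix form is what persp and contour expect, while the long form from expand.grid suits formula interfaces such as those in lattice.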

Example 4.2 (Plot bivariate normal density)

Plot the standard bivariate normal density
$$f(x, y) = \frac{1}{2\pi} e^{-\frac{1}{2}(x^2 + y^2)}, \qquad (x, y) \in \mathbb{R}^2.$$

In this example, $z_{ij} = f(x_i, y_j)$ are computed by the outer function.

# the standard BVN density
f <- function(x, y) {
    z <- (1 / (2 * pi)) * exp(-.5 * (x^2 + y^2))
}
y <- x <- seq(-3, 3, length = 50)
z <- outer(x, y, f)    # compute density for all (x, y)
persp(x, y, z)         # the default plot
# a second version with a rotated view, shading, and labeled axes
persp(x, y, z, theta = 45, phi = 30, expand = 0.6, ltheta = 120,
      shade = 0.75, ticktype = "detailed", xlab = "X", ylab = "Y",
      zlab = "f(x, y)")

• The second version of the perspective plot is shown in Figure 4.3.

• R note 4.1
• outer(x, y, f) applies the third argument f to the grid of (x, y) values. The returned
value is a matrix of function values for every point $(x_i, y_j)$ in the grid.
• For a presentation, adding color (say, col = "lightblue") produces a more attractive plot.
The box can be suppressed by box = FALSE.
Example 4.3 (Add elements to perspective plot)
Use the viewing transformation returned by the perspective plot of
the standard bivariate normal density to add points, lines, and text.

Fig.4.3: Perspective plot of the standard bivariate normal density in
Example 4.2.

# store viewing transformation in M
M <- persp(x, y, z, theta = 45, phi = 30,
           expand = .4, box = FALSE)
The transformation returned by the persp function call is
              [,1]       [,2]       [,3]       [,4]
[1,]  2.357023e-01 -0.1178511  0.2041241 -0.2041241
[2,]  2.357023e-01  0.1178511 -0.2041241  0.2041241
[3,] -2.184757e-16  4.3700078  2.5230252 -2.5230252
[4,]  1.732284e-17 -0.3464960 -2.9321004  3.9321004

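This transformation M is applied to (x, y, z, t) to project points onto
the screen for display in the same coordinate system used to draw
the perspective plot. In the code below, each point is written as
(x, y, z, 1), multiplied by M, and the first two coordinates of the
result are divided by the fourth to give the screen position.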
# add some points along a circle
a <- seq(-pi, pi, pi / 16)
newpts <- cbind(cos(a), sin(a)) * 2
newpts <- cbind(newpts, 0, 1)    # z = 0, t = 1
N <- newpts %*% M
points(N[, 1] / N[, 4], N[, 2] / N[, 4], col = 2)
# add lines
x2 <- seq(-3, 3, .1)
y2 <- -x2^2 / 3
z2 <- dnorm(x2) * dnorm(y2)
N <- cbind(x2, y2, z2, 1) %*% M
lines(N[, 1] / N[, 4], N[, 2] / N[, 4], col = 4)
# add text
x3 <- c(0, 3.1)
y3 <- c(0, -3.1)
z3 <- dnorm(x3) * dnorm(y3) * 1.1
N <- cbind(x3, y3, z3, 1) %*% M
text(N[1, 1] / N[1, 4], N[1, 2] / N[1, 4], "f(x, y)")
text(N[2, 1] / N[2, 4], N[2, 2] / N[2, 4], bquote(y == -x^2 / 3))
Fig.4.4: Perspective plot of the standard bivariate normal density with
elements added using the viewing transformation returned by persp in
Example 4.3. The added text labels are f(x, y) and y = −x²/3.

The plot with added elements is shown in Fig.4.4. (Note: R provides a
function trans3d to compute the coordinates above. Here we have shown
the calculations.)
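
For comparison, the sketch below projects the circle of points both by the manual matrix calculation and with trans3d (grDevices), and checks that the two agree. The names px, py, pz, manual, and auto are introduced here for illustration; the density and the persp call repeat those of Examples 4.2 and 4.3 so that the block runs on its own.

```r
# Standard bivariate normal density on a grid, as in Example 4.2
f <- function(x, y) (1 / (2 * pi)) * exp(-.5 * (x^2 + y^2))
x <- y <- seq(-3, 3, length = 50)
z <- outer(x, y, f)

# Draw the surface and keep the viewing transformation
M <- persp(x, y, z, theta = 45, phi = 30, expand = .4, box = FALSE)

# Points on a circle of radius 2 in the plane z = 0
a <- seq(-pi, pi, pi / 16)
px <- 2 * cos(a)
py <- 2 * sin(a)
pz <- rep(0, length(a))

# Manual projection: multiply by M, then divide by the fourth coordinate
N <- cbind(px, py, pz, 1) %*% M
manual <- list(x = N[, 1] / N[, 4], y = N[, 2] / N[, 4])

# The same projection computed by trans3d
auto <- trans3d(px, py, pz, pmat = M)

points(auto$x, auto$y, col = 2)
print(all.equal(unname(manual$x), unname(auto$x)))
```

Using trans3d keeps the plotting code shorter, while the manual version makes the role of the fourth (homogeneous) coordinate visible.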
Other functions for graphing surfaces
Use wireframe (lattice) to display a surface plot of the bivariate
normal density similar to Figure 4.3.
Example 4.4 (Surface plot using wireframe(lattice))
wireframe requires a formula z ~ x * y, where z = f(x, y) is the
surface to be plotted. x, y and z must have the same number of
rows, one for each grid point. Generate the matrix of (x, y)
coordinates with expand.grid.

library(lattice)
x <- y <- seq(-3, 3, length = 50)
xy <- expand.grid(x, y)    # one row per grid point
z <- (1 / (2 * pi)) * exp(-.5 * (xy[, 1]^2 + xy[, 2]^2))
wireframe(z ~ xy[, 1] * xy[, 2])
4.3.2 Three-dimensional scatterplot
The cloud (lattice) function produces 3D scatterplots, which can be used to
explore whether there are groups or clusters in the data. To apply
cloud, provide a formula z ~ x * y, where z is the variable plotted on
the vertical axis against x and y.
Example 4.5 (3D scatterplot)
Use cloud to display a 3D scatterplot of the iris data. There are
three species of iris and each is measured on four variables. The
following code produces a 3D scatterplot of sepal length, sepal
width, and petal length (similar to (3) in Figure 4.5).
library(lattice)
attach(iris)
# basic 3 color plot with arrows along axes
# (the source line is cut off after "dat"; data = iris and groups = Species are assumed)
print(cloud(Petal.Length ~ Sepal.Length * Sepal.Width, data = iris, groups = Species))

The iris data has four variables, so there are four subsets of three
variables to graph. To see all four plots on the screen, use the more
and split options. The split arguments determine the location of the
plot within the panel display.
print(cloud(Sepal.Length ~ Petal.Length * Petal.Width,
    data = iris, groups = Species, main = "1", pch = 1:3,
    scales = list(draw = FALSE), zlab = "SL",
    screen = list(z = 30, x = -75, y = 0)),
    split = c(1, 1, 2, 2), more = TRUE)

print(cloud(Sepal.Width ~ Petal.Length * Petal.Width,
    data = iris, groups = Species, main = "2", pch = 1:3,
    scales = list(draw = FALSE), zlab = "SW",
    screen = list(z = 30, x = -75, y = 0)),
    split = c(2, 1, 2, 2), more = TRUE)

print(cloud(Petal.Length ~ Sepal.Length * Sepal.Width,
    data = iris, groups = Species, main = "3", pch = 1:3,
    scales = list(draw = FALSE), zlab = "PL",
    screen = list(z = 30, x = -55, y = 0)),
    split = c(1, 2, 2, 2), more = TRUE)

print(cloud(Petal.Width ~ Sepal.Length * Sepal.Width,
    data = iris, groups = Species, main = "4", pch = 1:3,
    scales = list(draw = FALSE), zlab = "PW",
    screen = list(z = 30, x = -55, y = 0)),
    split = c(2, 2, 2, 2))
detach(iris)
Fig.4.5: 3D scatterplots of iris data produced by cloud (lattice) in Example
4.5, with each species represented by a different plotting character.
Panels 1 and 2 (top row) plot SL and SW against Petal.Length and Petal.Width;
panels 3 and 4 (bottom row) plot PL and PW against Sepal.Length and Sepal.Width.
Observation: three species of iris are separated into groups or clusters,
which is evident in these plots. One might follow up with
cluster analysis or principal components analysis to analyze the
apparent structure in the data.
R note 4.2
• The screen option sets the orientation of the axes. Setting
draw = FALSE suppresses arrows and tick marks on the axes.
• To split the screen into n rows and m columns, and put the plot
in row r and column c, set split equal to the vector (c, r, m, n):
the column position and the number of columns come first. For example,
split = c(2, 1, 2, 2) places plot 2 in the top right of Figure 4.5.
• One unusual feature of cloud is that, unlike most graphics functions
in R, cloud does not plot a panel figure unless we print it.
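
The notes above can be tried on a smaller layout. The sketch below is a minimal illustration, assuming the convention in the lattice documentation for print.trellis that split = c(x, y, nx, ny) gives the column position, the row position, the number of columns, and the number of rows. The object names p1 and p2 and the panel titles are hypothetical.

```r
library(lattice)

# Build two trellis objects without drawing them
p1 <- cloud(Petal.Length ~ Sepal.Length * Sepal.Width, data = iris,
            groups = Species, main = "left")
p2 <- cloud(Petal.Width ~ Sepal.Length * Sepal.Width, data = iris,
            groups = Species, main = "right")

# Nothing has been drawn yet: p1 and p2 are objects of class "trellis"
print(class(p1))

# Draw them side by side: a layout with 2 columns and 1 row
print(p1, split = c(1, 1, 2, 1), more = TRUE)   # column 1, row 1
print(p2, split = c(2, 1, 2, 1))                # column 2, row 1
```

Storing the plots first and printing them afterwards separates building a figure from placing it on the page, which is convenient when the same plot is reused in several layouts.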
4.4 Contour Plots
• A contour plot represents a 3D surface (x, y, f (x, y)) in the plane by
projecting the level curves f (x, y) = c for selected constants c.
• The functions contour (graphics) and contourplot (lattice)
produce contour plots.
• The filled.contour function in the graphics package and the levelplot
function in the lattice package produce filled contour plots. Both
contour and contourplot label the contours by default.
• A variation of this type of plot is image (graphics), which uses color to
identify contour levels.
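
To make the idea of a level curve concrete, the sketch below uses contourLines (grDevices) on the standard bivariate normal density of Example 4.2. For that density, solving f(x, y) = c gives a circle of radius sqrt(-2 log(2 pi c)), so the computed contour can be compared with the formula. The level c0 = 0.05 and the grid size are arbitrary choices for illustration.

```r
# Standard bivariate normal density on a fine grid
f <- function(x, y) (1 / (2 * pi)) * exp(-.5 * (x^2 + y^2))
x <- y <- seq(-3, 3, length = 201)
z <- outer(x, y, f)

# Coordinates of the level curve f(x, y) = c0
c0 <- 0.05
cl <- contourLines(x, y, z, levels = c0)

# Distance of each contour point from the origin, and the theoretical radius
radius <- sqrt(cl[[1]]$x^2 + cl[[1]]$y^2)
expected <- sqrt(-2 * log(2 * pi * c0))
print(range(radius))
print(expected)

# Draw the same level curve
contour(x, y, z, levels = c0, asp = 1)
```

The small spread in radius comes from the linear interpolation on the grid; a finer grid brings the computed curve closer to the exact circle.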
Example 4.6 (Contour plot)
volcano data: an 87 by 61 matrix containing topographic information
for the Maunga Whau volcano.
# contour plot with labels
contour(volcano, asp = 1, labcex = 1)
# another version from lattice package
library(lattice)
contourplot(volcano)    # similar to above

A 3D view of the volcano surface is provided in the examples of
the persp function. Type example(persp).
An interactive 3D view of the volcano appears in the examples of the rgl
package.
library(rgl)
example(rgl)

For another 3D view of the volcano data, with shading to indicate
contour levels, see the first example in the wireframe help file.
Fig.4.6: (a) Contour plot and (b) levelplot of volcano data in Examples 4.6
and 4.7.
Example 4.7 (Filled contour plots)
A contour plot with a 3D effect could be displayed in 2D by overlaying
the contour lines on a color map corresponding to the height.
The image function in the graphics package provides the color background
for the plot. The plot produced below is similar to Figure
4.6(a), with the background of the plot in terrain colors.
image(volcano, col = terrain.colors(100), axes = FALSE)
contour(volcano, levels = seq(100, 200, by = 10), add = TRUE)

Using image without contour produces essentially the same type of
plot as filled.contour (graphics) and levelplot (lattice).
The contours of filled.contour and levelplot are identified by
a legend rather than by superimposed contour lines.
Compare the plot produced by image with the following
two plots.
filled.contour(volcano, color.palette = terrain.colors, asp = 1)
levelplot(volcano, scales = list(draw = FALSE),
          xlab = "", ylab = "")

The plot produced by levelplot is shown in Figure 4.6(b).


• A limitation of 2D scatterplots is that for large data sets, there
are often regions where data is very dense, and regions where
data is quite sparse. In this case, the 2D scatterplot does not
reveal much information about the bivariate density.
• Another approach is to produce a 2D or flat histogram, with
the density estimate in each bin represented by an appropriate
color.
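
The idea of a flat histogram can be sketched in base R with rectangular bins before turning to hexagonal ones. This is only an illustration under stated assumptions: the seed, the 20 by 20 grid, the grayscale palette, and the names lim, b, counts, and mid are arbitrary choices, not part of the examples in this chapter.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 2000
x <- matrix(rnorm(2 * n), n, 2)  # simulated bivariate normal data

# 20 equal-width bins on each axis, wide enough to contain every point
lim <- max(abs(x)) + 0.01
b <- seq(-lim, lim, length.out = 21)
ix <- cut(x[, 1], breaks = b)
iy <- cut(x[, 2], breaks = b)

# 20 x 20 table of bin counts
counts <- table(ix, iy)

# Bin midpoints and a grayscale image: darker cells hold more points
mid <- (b[-1] + b[-21]) / 2
image(mid, mid, unclass(counts), col = gray(seq(1, 0, length.out = 32)),
      xlab = "x[, 1]", ylab = "x[, 2]")
```

Dividing counts by n times the bin area would turn the counts into a density estimate on the same grid.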
Example 4.8 (2D histogram)
Simulated bivariate normal data is displayed in a flat histogram with
hexagonal bins. hexbin in package hexbin produces a basic version
of this plot in grayscale.
library(hexbin)
x <- matrix(rnorm(4000), 2000, 2)
plot(hexbin(x[, 1], x[, 2]))

Fig.4.7: Flat density histogram of bivariate normal data with hexagonal
bins produced by hexbin in Example 4.8. Axes: x[, 1] and x[, 2], each from
−3 to 3; the legend shows the bin counts.
5. Other 2D Representations of Data
Other 2D representations include Andrews curves, parallel coordinate
plots, and various iconographic displays such as segment plots and star plots.
1. Andrews Curves
Andrews curves represent each observation in the dataset as a curve in
two-dimensional space. The components of the observation are used as the
coefficients of a finite Fourier series (a constant term followed by
sine and cosine terms), so each observation x_i defines a function
f_i(t). Each observation is then represented by the graph of its
function, and all the curves are plotted on the same graph.

The x-axis of the plot is the argument t, with −π ≤ t ≤ π, and the
y-axis is the value f_i(t) of the curve. Each curve can then be
colored, or given a line type, according to a categorical or
continuous variable.
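
A minimal sketch of the curve for a data vector of any length is given below, assuming the usual ordering of terms: v[1]/sqrt(2), then sine and cosine pairs with frequencies 1, 2, and so on. The function name andrews, the argument name tt, and the test vector v are hypothetical; for a vector in R^3 the function reduces to a constant plus one sine and one cosine term.

```r
# Andrews curve of a data vector v, evaluated at the points in tt:
# f(t) = v[1]/sqrt(2) + v[2] sin(t) + v[3] cos(t) + v[4] sin(2t) + v[5] cos(2t) + ...
andrews <- function(tt, v) {
    out <- rep(v[1] / sqrt(2), length(tt))
    for (j in seq_along(v)[-1]) {
        k <- j %/% 2                          # frequency: 1, 1, 2, 2, 3, ...
        if (j %% 2 == 0) {
            out <- out + v[j] * sin(k * tt)   # even positions use sine
        } else {
            out <- out + v[j] * cos(k * tt)   # odd positions use cosine
        }
    }
    out
}

tt <- seq(-pi, pi, length.out = 101)
v <- c(1, 0.5, -0.5)                          # arbitrary point in R^3
y <- andrews(tt, v)
plot(tt, y, type = "l", xlab = "t", ylab = "f(t)")
```

Observations that are close together in the original space give curves that stay close over the whole interval, which is why groups of similar curves suggest clusters.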
Example 4.9 (Andrews curves)
• Measurements of leaves for two types of leaf architecture are
represented by Andrews curves (leafshape17 in the DAAG package).
Three measurements (leaf length, petiole, and leaf width)
correspond to points in $\mathbb{R}^3$.
• To plot the curves, define a function to compute $f_i(t)$ for arbitrary
points $x_i$ in $\mathbb{R}^3$ and $-\pi \le t \le \pi$. Evaluate the function
along the interval $[-\pi, \pi]$ for each sample point $x_i$.

library(DAAG)
attach(leafshape17)
f <- function(a, v) {
    # Andrews curve f(a) for a data vector v in R^3
    v[1] / sqrt(2) + v[2] * sin(a) + v[3] * cos(a)
}
# scale data to range [-1, 1]
x <- cbind(bladelen, petiole, bladewid)
n <- nrow(x)
mins <- apply(x, 2, min)    # column minimums
maxs <- apply(x, 2, max)    # column maximums
r <- maxs - mins            # column ranges
y <- sweep(x, 2, mins)      # subtract column mins
y <- sweep(y, 2, r, "/")    # divide by range
x <- 2 * y - 1              # now has range [-1, 1]
# set up plot window, but plot nothing yet
plot(0, 0, xlim = c(-pi, pi), ylim = c(-3, 3),
     xlab = "t", ylab = "Andrews Curves",
     main = "", type = "n")
# now add the Andrews curves for each observation
# line type corresponds to leaf architecture
# 0 = orthotropic, 1 = plagiotropic
a <- seq(-pi, pi, len = 101)
dim(a) <- length(a)
for (i in 1:n) {
    g <- arch[i] + 1
    y <- apply(a, MARGIN = 1, FUN = f, v = x[i, ])
    lines(a, y, lty = g)
}
legend(3, c("Orthotropic", "Plagiotropic"), lty = 1:2)
detach(leafshape17)
Fig.4.8: Andrews curves for leafshape17 (DAAG) data at latitude 17.1: leaf
length, width, and petiole measurements in Example 4.9. Curves are
identified by leaf architecture (legend: Orthotropic, Plagiotropic).
The plot reveals similarities within plagiotropic and orthotropic leaf
architecture groups, and differences between these groups. In general,
this type of plot may reveal possible clustering of data.
R note 4.4
To identify the curves by color, replace lty with col parameters in
the lines and legend statements.
Exercise
The random variables X and Y are independent and identically
distributed with normal mixture distributions. The components of the
mixture have N(0, 1) and N(3, 1) distributions with mixing probabilities
p1 and p2 = 1−p1 respectively. Generate a bivariate random sample
from the joint distribution of (X, Y ) and construct a contour plot. Adjust
the levels of the contours so that the contours of the second mode
are visible.
