Visualizing Complex Data Using R (2014, N.D.lewis)
N.D. Lewis
Copyright © 2013 N.D. Lewis
ISBN-13: 978-1490433547
ISBN-10: 1490433546
DEDICATION
If you were to end your analysis with this conclusion you would be making a
huge mistake. At best the mistake will make your boss look foolish; at worst it
will cost you your job, cost your company profits and result in a very
dissatisfied client. To see why, take a look at Figure 1.1, Figure 1.2, Figure 1.3
and Figure 1.4. All four diagrams show the scatterplot of x and y (with the
regression line in red) for each dataset. Clearly the relationship between x and
y differs greatly, yet this difference was unobservable in the traditional
summary statistics of mean, standard deviation and correlation. If you were to
report only the results of a standard statistical analysis, you would not find
much difference between the datasets. The real magic happens when these
datasets are represented visually. All of a sudden, they look very different
from each other, and you cannot help but see and understand much more about
the underlying relationships. This is the power of visualizing data. Scientific
data visualization is a collection of statistical methods, both quantitative and
qualitative, for identifying relationships in data and consolidating them into an
illustrative, informative summary graphic.
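The four datasets are not named in this excerpt, but the description appears to match Anscombe's quartet, which ships with R as the built-in anscombe data. The sketch below assumes that stand-in: it computes the traditional summary statistics for each dataset and then draws the four scatterplots with a red regression line.

```r
# anscombe is a built-in data frame with columns x1..x4 and y1..y4
summ <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    sd_x = sd(x), sd_y = sd(y), cor = cor(x, y))
})
colnames(summ) <- paste("dataset", 1:4)
print(round(summ, 2))  # the four columns are almost identical

# The scatterplots, by contrast, look very different
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, pch = 19, main = paste("Dataset", i))
  abline(lm(y ~ x), col = "red")  # least-squares regression line in red
}
par(op)
```

Any one of the summary columns alone would suggest four interchangeable datasets; only the plots reveal the curve, the outlier and the single high-leverage point.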
Figure 1.1 Plot of x and y for dataset 1
Tip: Copies of R and free tutorial guides for beginners can be downloaded at
http://www.r-project.org/
How to get the most from this book
There are at least three ways to use this book to boost your productivity and
creativity. First, you can dip into it as an efficient reference tool. Identify the
data visualization idea you need and see how to create it quickly. For best
results type in the examples given in the text, examine the results, and then adjust
for your own use. Second, browse through the numerous illustrations, tips and
examples to help stimulate your own creativity. Third, by typing the many
illustrations and experimenting, you will strengthen your knowledge and
understanding of both scientific data visualization and R.
You don’t have to wait until you have read the entire book to incorporate the
ideas into your own analysis and presentations. You can experience their
marvelous potency for yourself almost immediately. You can go straight to the
idea of interest and immediately test, create and exploit it in your own research,
analysis and presentations.
Applying the ideas in this book will transform your analysis. If you utilize even
one tip or idea from each chapter, you will be far better prepared to win when
faced with the challenges and opportunities of the ever expanding deluge of
exploitable data. Enough talking, let’s get started!
2. Easily illuminate associations in cross tabulations
In this plot, the area of each box is proportional to the difference between the
observed and expected frequencies in the data. The green rectangles above the
dashed lines indicate observed frequencies exceeding those expected. The red
rectangles below the dashed lines indicate observed frequencies below those
expected[iii]. In this example, the green boxes indicate positive association and
the red boxes negative association. The sizes and shapes of the boxes indicate
how strong the associations are. It is immediately evident that tall people tend
to marry tall people at a higher rate than one might expect. However, this is not
the case with short people: they tend to marry other short people at a lower
frequency than might be expected.
Figure 2.2 was created using the assocplot function contained in the graphics
package that ships with base R. It was built using the following three steps:
Step 1: Load the required package (graphics is part of base R, so nothing needs
to be downloaded).
require(graphics)
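Steps 2 and 3 for Figure 2.2 are not reproduced in this excerpt, and neither is the height data. As a sketch, the code below uses R's built-in HairEyeColor table as a stand-in: it computes the expected frequencies under independence and the Pearson residuals that set each box's height, then calls assocplot with green for positive and red for negative residuals.

```r
library(graphics)  # assocplot(); attached by default in a normal R session

# Stand-in data: hair colour by eye colour, summed over sex (built-in HairEyeColor)
tab <- margin.table(HairEyeColor, c(1, 2))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residuals set each box's height; its width is sqrt(expected),
# so the box area is proportional to observed - expected
pearson <- (tab - expected) / sqrt(expected)
print(round(pearson, 2))

assocplot(tab, col = c("green", "red"),
          main = "Hair and eye colour (HairEyeColor)")
```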
Figure 2.3 Association plot using the assoc function in the vcd package
Tip 2: Occasionally you may need to change the orientation of the text in the
image. This can be achieved by using the las argument. For example,
mosaicplot(data, color = 1:4, las = 1) produces the image in Figure 2.9.
Tip 3: It is always useful to know of different functions that produce similar
types of chart. A mosaic plot can also be produced using the mosaic function in
the vcd package.
Figure 2.9 Mosaic plot using mosaicplot(data, color = 1:4, las = 1)
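The data object in Tip 2 is not defined in this excerpt. The sketch below assumes the built-in HairEyeColor table instead and draws it twice, so the effect of the las argument is visible side by side.

```r
# Stand-in data: hair colour by eye colour (built-in HairEyeColor, summed over sex)
tab <- margin.table(HairEyeColor, c(1, 2))

# las sets label orientation: 0 = parallel to the axis, 1 = always horizontal,
# 2 = perpendicular to the axis, 3 = always vertical
op <- par(mfrow = c(1, 2))
mosaicplot(tab, color = 1:4, las = 0, main = "las = 0")
mosaicplot(tab, color = 1:4, las = 1, main = "las = 1")
par(op)
```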
Where $x_{ik}$ is the value of variable $x_k$ for entity $i$ and $x_{jk}$ is the value of the same
variable for entity $j$.
Figure 3.1 shows a dendrogram obtained from an average agglomeration method
using crime data[iv] from eleven US states. A scale that measures the
dissimilarity (distance) has been placed at the bottom of the chart. States that
are close in terms of distance have branches coded in similar colors. The
distance matrix for this data is shown in the image below. The two closest states
were Connecticut and Pennsylvania, which are a distance of 165 apart. The next
closest pair of states is Missouri and Alaska, which are 244 apart. Then come
New York and Vermont, which are 326 apart. The District of Columbia is the
furthest from the other states, that is, the least similar.
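The crime data behind Figure 3.1 is not included in this excerpt. As an assumed stand-in, the sketch below takes eleven states from R's built-in USArrests data (which has no District of Columbia row), builds a distance matrix with dist(), clusters it with average agglomeration via hclust(), and reports the closest pair. The Euclidean metric is our assumption, and the distances will not match the figures quoted above.

```r
# Stand-in data: eleven states from the built-in USArrests data (not the book's data)
states <- c("Connecticut", "Pennsylvania", "Missouri", "Alaska", "New York",
            "Vermont", "California", "Texas", "Florida", "Ohio", "Maine")
crime <- USArrests[states, ]

d  <- dist(crime)                    # pairwise Euclidean distances (assumed metric)
hc <- hclust(d, method = "average")  # average agglomeration

# Closest pair: smallest off-diagonal entry of the distance matrix
m <- as.matrix(d)
diag(m) <- Inf
pair <- which(m == min(m), arr.ind = TRUE)[1, ]
print(rownames(m)[pair])

plot(hc, main = "Average-linkage dendrogram (USArrests subset)",
     xlab = "", sub = "", ylab = "Distance")
```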
Innovative use of dendrograms
One of the major uses of dendrograms is to discern subsets of observations
whose members are close to one another, isolated from the rest, or both.
This ability has proved extremely useful in fields ranging from marketing and
fake software detection to understanding genetic diversity in agricultural crops.
dictyExpres: The dictyExpres[v] is an interactive, web-based exploratory data
analytics application providing access to over 1,000 Dictyostelium gene
expression experiments from Baylor College of Medicine. The dictyExpres
application can be used to find clusters of similarly expressed genes via
dendrogram visualization. A dendrogram taken directly from the web
application is shown in Figure 3.2.
Using the same method, the cladogram shown in Figure 3.5 was created by
entering:
library(ape)  # plot.phylo(); assumes data is a phylo tree created earlier
plot(data, edge.color = rainbow(length(data$edge)/2), tip.color = "brown",
     edge.width = 2, font = 2, type = "cladogram", no.margin = TRUE)
Figure 4.4 illustrates the treemap obtained using this data. It was created using
the following three steps:
Step 1: Download and install the required packages.
install.packages("portfolio")
install.packages("treemap")
The variable z is used to create a scale for the color of the rectangles. It is
based on the rank of population size, adjusted so that it lies in the range -104 to
+104. This adjustment is made to emphasize the visual contrast in colors.
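The exact code for z is not shown in this excerpt. One way to get a rank-based score that is symmetric about zero and spans -104 to +104 is to centre the ranks and rescale them. The sketch below does that on a hypothetical population vector (in the book the values would come from the GNI2010 data), so treat the formula as an assumption rather than the author's.

```r
# Hypothetical population figures (millions); stand-ins for the GNI2010 data
pop <- c(1340, 1210, 310, 240, 195, 10, 5, 0.5)

r <- rank(pop)               # 1 = smallest population
z <- r - mean(r)             # centre the ranks on zero
z <- 104 * z / max(abs(z))   # stretch to the range -104 to +104
print(z)
```

Centring first means the middle-ranked countries sit near zero, which a diverging color scale renders as a neutral shade.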
Step 3: The map.market function takes four arguments. The first is the unique
identifier for each data item; in this case, the unique country identifier
obtained from GNI2010$iso3. The second argument determines the area of the
rectangles; we used gross national income contained in GNI2010$GNI. The
variable z measures the relative rank of population size; it is used to calculate
the color of the rectangles. Finally, the title of the chart was created
using main="Gross National Income - Population".
Customizing map.market
Figure 4.5 Treemap of Dow-Jones Index with stock labels
Tip 1: The most common request is for the individual labels of data elements to
be shown on the treemap. This can be achieved by adding the argument
lab=TRUE to the map.market function. For example, to add labels to Figure 4.2
type:
library(portfolio)  # provides map.market() and the dow.jan.2005 data
data(dow.jan.2005)
map.market(id = dow.jan.2005$symbol,
           area = dow.jan.2005$price,
           group = dow.jan.2005$sector,
           color = 100 * dow.jan.2005$month.ret,
           lab = TRUE)
We will create a new function to draw our required treemap. This can be
achieved in four steps:
Step 1: Add the term map.market.color<- to the start of the code. Your edited code
should now look something like this:
map.market.color<-
function (id, area, group, color, scale = NULL, lab = c(TRUE,
FALSE), main = "Map of the Market", print = TRUE)
You have just created a new function called map.market.color.
Step 2: Look for the following lines of code (you should find them around line
165 in your text editor):
mapmarket <- gTree(name = "MAPMARKET", children = gList(rectGrob(gp = gpar(col = "dark grey",
fill = "dark grey"), name = "background"), top.tree,
map.tree))
Change the phrase "dark grey" to "white". The background color has been changed
from dark grey to white!
Step 3: Now let’s change the color of the text from white to blue. Change "white"
to "blue" in the statement (around line 124):
label = this.data$label[s], gp = gpar(col = "white")
Next, change "white" to "blue" in statement (around line 133):
name = "label", gp = gpar(col = "white")
Step 4: Find the two lines of code responsible for the colors of positive and
negative numbers (they define color.ramp.pos and color.ramp.neg). Change them
so that they state:
color.ramp.pos <- colorRamp(c("yellow", "darkgreen"))
color.ramp.neg <- colorRamp(c("orange", "darkred"))
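To see what the edited lines do, the sketch below evaluates the same two colorRamp() calls directly. colorRamp() returns a function that maps numbers in [0, 1] to RGB values, and rgb() turns those into hex colors. We have not reproduced how map.market scales the stock returns before calling these ramps, so the input values here are hypothetical.

```r
# The two ramps from the edited map.market.color function
color.ramp.pos <- colorRamp(c("yellow", "darkgreen"))
color.ramp.neg <- colorRamp(c("orange", "darkred"))

# Hypothetical scaled magnitudes: 0 = smallest change, 1 = largest
v <- c(0, 0.5, 1)

# Each ramp returns a 3-column matrix of RGB values on the 0-255 scale
pos.cols <- rgb(color.ramp.pos(v), maxColorValue = 255)
neg.cols <- rgb(color.ramp.neg(v), maxColorValue = 255)
print(pos.cols)  # from yellow to dark green
print(neg.cols)  # from orange to dark red
```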
The edited function is complete. Save it and then copy it into the R console.
Now you are ready to execute the code. Enter the following into the R console
and you will see the image shown in Figure 4.6:
data(dow.jan.2005)
map.market.color (id = dow.jan.2005$symbol,
area = dow.jan.2005$price,
group = dow.jan.2005$sector,
color = 100 * dow.jan.2005$month.ret,
lab=TRUE)
How to create a treemap using treemap
The treemap function in the package treemap provides a useful alternative for
creating a treemap in R. We illustrate the use of this function using data from the
London datastore. This is a free-to-access website created by the Greater
London Authority as a portal for London’s data. It is hosted at
http://data.london.gov.uk/ with a home page similar to that shown below.
Let's make a withdrawal from the store. Our goal is to create a treemap using
data on the political control, income and unemployment rate for each London
borough. To do this we will use the London boroughs dataset located at:
http://data.london.gov.uk/datastore/package/london-borough-profiles.
Figure 4.7 illustrates the treemap created using this dataset. It was created using the
following three steps:
Step 1: Download and install the required packages.
install.packages("treemap")
require("treemap")
The 2d-bc plot is an extremely useful exploratory data analysis tool, as the
following example illustrates.
Tipping in Restaurants: Waiting staff make a considerable proportion of
their income from tips. The better the service, the friendlier the waiter and the
tastier the food, the higher the expected tip. Bryant & Smith (1995) report on a
waiter who recorded information about each tip he received over a period of
several months. For each of the 244 tips the waiter recorded information on:
tip in dollars,
bill in dollars,
sex of the bill payer,
whether there were smokers in the party,
day of the week,
time of day,
size of the party.
Figure 5.4 Relationship between tips and total bill given group size
install.packages("extracat")
install.packages("reshape")
require(extracat)
require(reshape)  # supplies the tips data set
data(tips)
boxplot2g(tips$total_bill, tips$tip, tips$size)
Figure 5.5 Relationship between tips and total bill given day of week
install.packages("extracat")
install.packages("reshape")
require(extracat)
require(reshape)
data(tips)
boxplot2g(tips$total_bill, tips$tip, tips$day)
Further resources
Further reading on the extracat package can be found in:
Alexander Pilhoefer, Antony Unwin (2013). New Approaches in
Visualization of Categorical Data: R Package extracat. Journal of
Statistical Software, 53(7), 1-25. URL
http://www.jstatsoft.org/v53/i07/.
For further details on the datasets mentioned in this chapter see:
Anderson, Edgar (1935). The irises of the Gaspe Peninsula, Bulletin of the
American Iris Society, 59, 2–5.
Exploration and display of correlations form an integral part of the day-to-day
duties of the data scientist. Where many variables are concerned, the usual
approach is to display a correlation table, typically similar to that shown in
Figure 6.1. The figure shows the correlation of the daily returns of eleven
exchange traded funds using closing prices over the period August 23rd
2011 to September 11th 2013. Although the correlations are presented neatly, it
is often difficult to discern the true multivariate nature of the relationships. In
practice, a researcher might be interested in correlations above a certain
absolute value. A popular, although primitive, solution is to highlight such
correlations in the table. A much better idea is to use a correlation
network to display the relationships between variables visually.
Figure 6.2 illustrates a basic correlation network using the exchange traded
funds data. It is constructed so that green lines represent a positive correlation
between any two variables, and red lines a negative correlation. The thickness
of the lines increases with the absolute value of the correlation.
Figure 6.2 Basic correlation network of popular exchange traded funds
The image captures the multivariate nature of the correlation structure in a
more intuitive fashion than a traditional table. Further details on the correlation
structure can be revealed by examining only correlations above a specific
absolute value. For example, suppose the researcher is interested in absolute
correlations greater than 0.5. The resultant correlation network is displayed in
Figure 6.3. Notice how the multivariate interrelationships between the variables
are immediately evident. A positive core of four exchange traded funds (acwi,
gnr, vwo and ivw) is apparent, whilst igov and pcy appear uncorrelated with the
remaining exchange traded funds (at the cutoff threshold).
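The thresholding idea can be checked without qgraph. The sketch below uses four columns of the built-in mtcars data as a stand-in for the ETF returns, keeps each pair of variables once, and lists only the correlations whose absolute value exceeds 0.5, together with the edge color the network would use. This mirrors what the minimum argument in the qgraph code of this chapter hides or shows, though qgraph's own internals are not reproduced here.

```r
# Stand-in data: four numeric columns of the built-in mtcars data
vars <- mtcars[, c("mpg", "wt", "hp", "qsec")]
r <- cor(vars)
threshold <- 0.5

# Upper triangle only, so each pair appears once; keep |r| above the threshold
idx <- which(abs(r) > threshold & upper.tri(r), arr.ind = TRUE)
edges <- data.frame(
  from  = rownames(r)[idx[, "row"]],
  to    = colnames(r)[idx[, "col"]],
  r     = round(r[idx], 2),
  color = ifelse(r[idx] > 0, "green", "red")  # positive = green, negative = red
)
print(edges)
```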
Figure 6.3 Correlations greater than |0.5|
Innovative use of correlation networks
Physiological networks using drugs: Grossman, Adam et al (2013) built
correlation networks using physiologic data to investigate changes associated
with pressor use in intensive care. The researchers collected 29 physiological
variables at one-minute intervals from nineteen trauma patients. They grouped
each minute of data as receiving or not receiving pressors. For each group,
correlation networks of pairs of physiologic variables were built. To visualize
drug-associated changes the researchers divided the resultant networks into
three components: an unchanging correlation network, a correlation network of
connections with changing correlation sign, and a correlation network of
connections only present in one group.
Mindfulness training: Taylor, Véronique A., et al (2013) examine the effect
of mindfulness training on functional brain connectivity during a restful state.
Using data from 13 experienced meditators, 11 beginner meditators and
functional magnetic resonance imaging, pairwise correlations were computed
between default mode network seed regions’ time courses. A correlation
network diagram was used to visualize the resultant correlations. The
researchers observed that, relative to beginners, experienced meditators had
weaker functional connectivity between default mode network (DMN) brain
regions involved in self-referential processing and emotional appraisal.
However, they noted that experienced meditators had increased connectivity
between certain DMN regions compared to beginner meditators. The researchers
conclude that meditation training leads to functional connectivity changes
between core DMN regions, possibly reflecting strengthened present-moment awareness.
Chinese Prices: Gao and Zhong (2013) collected data on 195 Chinese price
indices from 2003 to 2011. The relationships between the indices are
investigated using a correlation network (see Figure 6.4). The researchers
chose a correlation of 0.82 as their threshold: if the correlation coefficient
between two price indices is greater than 0.82, there is an edge between them.
Figure 6.4 Correlation network of Gao and Zhong[vii]
Creating correlation networks in R
Figure 6.1 was created using the qgraph package. It was built using the following
three steps:
Step 1: Download and install the required packages.
install.packages("PerformanceAnalytics")
install.packages("tseries")
install.packages ("fImport")
install.packages ("qgraph")
library(tseries)
library(PerformanceAnalytics)
library(qgraph)
# acwi, vwo, ... hold closing prices downloaded earlier (download step not shown)
x <- merge(acwi,vwo,iwv,mcro,ief,igov,pcy,hdg,rem,fty,gnr)
x<- na.omit(x)
colnames(x)<- c("acwi","vwo", "ivw", "mcro", "ief", "igov", "pcy", "hdg", "rem", "fty", "gnr" )
Tip 2: The argument vsize changes node size. Try experimenting with various
values, for example set it to 20 and then 5. What do you observe?
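library(tseries)
library(PerformanceAnalytics)
library(qgraph)
# acwi, vwo, ... hold closing prices downloaded earlier (download step not shown)
x <- merge(acwi, vwo, iwv, mcro, ief, igov, pcy, hdg, rem, fty, gnr)
x <- na.omit(x)
colnames(x) <- c("acwi", "vwo", "ivw", "mcro", "ief", "igov", "pcy", "hdg", "rem", "fty", "gnr")
# assumed: daily simple returns from the closing prices (first row is NA, so drop it)
returns <- na.omit(Return.calculate(x))
cor <- cor(returns)
qgraph(cor, shape = "circle", posCol = "darkgreen", negCol = "darkred", layout = "spring", vsize = 10)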
Figure 6.3
As above but replace last line with:
qgraph(cor,shape="circle",posCol="darkgreen",negCol="darkred",layout= "spring",vsize=10,minimum=0.5)
Figure 6.5
As above but replace last line with:
qgraph(cor,shape="circle",posCol="darkgreen",negCol="darkred",layout= "circular",vsize=10)
Further resources
For further reading on the packages mentioned in this chapter, see:
Adrian Trapletti and Kurt Hornik (2013). tseries: Time Series Analysis and
Computational Finance. R package version 0.10-32.
Diethelm Wuertz and many others (2013). fImport: Rmetrics - Economic and
Financial Data Import. R package version 3000.82.
http://CRAN.R-project.org/package=fImport
Directional datasets were first analyzed in the Earth Sciences, Meteorology and
Medicine. With the growth in social computing and growing volumes of data
from online sources, such data will increasingly be observed in nontraditional
areas as well. The analysis of this type of data poses a number of challenges for
the data analyst. For example, the usual statistics such as the arithmetic mean or
standard deviation are not rotationally invariant: the average direction of
10 degrees and 350 degrees is 0 degrees (north), not the arithmetic mean of
180 degrees. Rose diagrams offer a powerful solution to the issue of presenting
circular or directional data. Figure 7.1 shows the foraging direction of three
beavers given in Lewis (2013). Each beaver is represented by a different color
(orange, red, blue). The foraging directional preferences of each beaver are
clearly discernible from the diagram.
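The 10 and 350 degree example can be checked directly. The sketch below computes the circular mean the standard way, by averaging unit vectors and converting the result back to an angle with atan2; circ_mean is a hypothetical helper name of our own, not a function from the circular package.

```r
# Mean direction of angles given in degrees: average the unit vectors
# (sin, cos), not the raw numbers, then convert back to an angle
circ_mean <- function(deg) {
  rad <- deg * pi / 180
  m <- atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi  # result in (-180, 180]
  round(m, 10) %% 360  # drop floating-point dust, then map onto [0, 360)
}

mean(c(10, 350))       # arithmetic mean: 180, i.e. due south
circ_mean(c(10, 350))  # circular mean: 0, i.e. north
circ_mean(c(350, 20))  # 5: the bisector of the two directions
```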
Figure 7.2 Pollution rose of carbon monoxide concentration at Marylebone
Fault patterns: Leonard, and Szynkaruk (2013) investigate the fault pattern in
areas of the Polish Outer Carpathians. Fault positions were established by
photointerpretation where zones of breccia and cataclasites extended on valley
slopes as narrow and often dry gullies. The researchers used rose diagrams to
visualize the distribution of the fault pattern.
Storm surges: The South China Sea (SCS) lies under the southwest-northeast
pathway of the seasonal monsoons which dominate the large scale sea level
dynamics of the region. Tkalich, Pavel, et al (2013) investigate wind-driven sea
level anomalies using tide gauge data and satellite altimetry. Climatological
wind rose diagrams using data over the period 1984 – 2007 were used as a
powerful visualization technique.
Creating rose diagrams in R
Figure 7.1 was created using the circular package. It was built using the following
three steps:
Step 1: Download and install the required packages.
install.packages("circular")
require(circular)
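The remaining steps for Figure 7.1 and the beaver data are not shown in this excerpt. To make clear what a rose diagram computes, the sketch below is a base-R version, not the circular package's implementation: it counts directions in 30-degree sectors and draws one wedge per sector with its radius proportional to the square root of the count, so wedge area tracks frequency. The helper names and the simulated directions are hypothetical.

```r
# Count directions (degrees clockwise from north) falling in equal-width sectors
rose_counts <- function(deg, width = 30) {
  breaks <- seq(0, 360, by = width)
  table(cut(deg %% 360, breaks = breaks, right = FALSE, include.lowest = TRUE))
}

# Draw one wedge per sector; radius ~ sqrt(count), so wedge area ~ count
draw_rose <- function(counts, width = 30, col = "orange") {
  r <- sqrt(as.numeric(counts) / max(counts))
  plot(NA, xlim = c(-1, 1), ylim = c(-1, 1), asp = 1, axes = FALSE,
       xlab = "", ylab = "")
  for (i in seq_along(r)) {
    a <- seq((i - 1) * width, i * width, length.out = 20) * pi / 180
    # x = sin(a), y = cos(a) puts 0 degrees at the top, increasing clockwise
    polygon(c(0, r[i] * sin(a)), c(0, r[i] * cos(a)), col = col, border = "grey30")
  }
  mtext("N", side = 3)
}

set.seed(42)
deg <- rnorm(200, mean = 45, sd = 25) %% 360  # hypothetical directions, clustered near NE
counts <- rose_counts(deg)
print(counts)
draw_rose(counts)
```

Scaling the radius by the square root rather than the raw count avoids exaggerating the dominant sectors, since a wedge's area grows with the square of its radius.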
The pollution rose of Figure 7.2 was created using the openair package. It was
constructed using the following three steps:
Step 1: Download and install the required packages.
install.packages("openair")
require(openair)
Tip 2: You can change the plot to a modern wind rose by setting the
argument paddle=TRUE .
Tip 3: When substituting your own data, note that the pollutionRose function
requires a data frame with a field "date" that can be in either POSIXct or Date
format but should be set to the Greenwich Mean Time (GMT) time zone. The data
should be at hourly or higher resolution.
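The sketch below builds a small hypothetical data frame in the shape Tip 3 describes: hourly timestamps in a POSIXct "date" column set to GMT, plus wind speed (ws), wind direction (wd) and a co column. The ws and wd names follow openair's example data mydata. All values are randomly generated, and the openair call is left as a comment so the block runs with base R alone.

```r
set.seed(7)
n <- 48  # two days of hourly records
met <- data.frame(
  date = seq(as.POSIXct("2013-01-01 00:00", tz = "GMT"), by = "hour", length.out = n),
  ws   = round(runif(n, 0, 10), 1),   # wind speed (made up)
  wd   = round(runif(n, 0, 360)),     # wind direction in degrees (made up)
  co   = round(rlnorm(n, 0, 0.5), 2)  # carbon monoxide concentration (made up)
)
str(met)
# With openair installed: openair::pollutionRose(met, pollutant = "co")
```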
Tip 4: The chart used the argument pollutant = "co" to access the carbon
monoxide observations in the mydata data frame. This data frame contains a
number of other air quality observations taken in central London.
Tip 2: The parameter norm = "internal" ensures each individual time series is
categorized into colors based on its own range of values. This implies the same
color in two different time series may have different interpretations. If "global"
is specified, then each time series will be categorized based on the range of
values for the entire collection of time series.
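To make the difference between the two settings concrete, the sketch below discretizes the same 90-day changes used for Figure 8.1 in two ways: bins computed from each series' own range (the "internal" idea) and bins computed from the range of all series together (the "global" idea). It uses equal-width bins from cut(), which is our assumption; mvtsplot's own binning rule may differ.

```r
x <- diff(EuStockMarkets, 90)  # 90-day price changes, one column per index
n_levels <- 3                  # number of colour categories

# "internal": each series is binned using its own range
internal <- apply(x, 2, function(col) cut(col, breaks = n_levels, labels = FALSE))

# "global": all series share one set of bins based on the overall range
global <- matrix(cut(as.vector(x), breaks = n_levels, labels = FALSE),
                 nrow = nrow(x), dimnames = list(NULL, colnames(x)))

# Share of observations in the top category, per index
print(round(colMeans(internal == n_levels), 2))
print(round(colMeans(global == n_levels), 2))
```

Under the global bins, a volatile index can claim most of the top category while a calm one rarely reaches it; under internal bins each index fills its own categories.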
Figure 8.3 Shaded timeseries plot of European stock markets with levels = 2
Summary
Multivariate shaded timeseries plots allow easy identification of relationships
between numerous variables and trends in those variables individually and
simultaneously across time. This is an extremely useful property for exploratory
data analysis of datasets that contain large numbers of time series. This
chart should be frequently deployed for exploratory analysis and is an important
technique in your data visualization and presentation toolkit.
Quick reference R code
Figure 8.1 Multivariate shaded timeseries plot of European stock markets
install.packages("mvtsplot")
# datasets is part of base R, so it does not need to be installed
require(datasets)
require(mvtsplot)
data <- diff(EuStockMarkets, 90)  # 90-day price changes
mvtsplot(data, norm = "internal", levels = 5, margin = FALSE)
The first line allows you to replicate exactly the image below. It basically starts
the random number generator at the same value every time. The second line
indicates five circles will be drawn (to draw more set n to a larger number).
The third line generates the random numbers. The fifth line sets the size of the
circles as a function of the random number value. The final line draws the
random circles image onto the R graphics device. You should now see on your
R graphics device an image that looks like the one below:
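The listing described in the paragraph above is not included in this excerpt. The following is a hypothetical reconstruction consistent with that description (it does not reproduce the original line numbering): a fixed seed, five circles, random values, a size derived from them, and a call to symbols(). The later snippets in this section reuse the objects x and size.

```r
set.seed(10)                # hypothetical seed: same random numbers, same picture, every run
n <- 5                      # number of circles; increase n to draw more
x <- runif(n)               # hypothetical: the random values (uniform on 0 to 1)
size <- x                   # hypothetical: each circle's radius equals its value
symbols(x, circles = size)  # no y given: circles sit at positions 1..n, at height x
```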
Not that impressive but at least you have drawn something! Let’s play a little
with it. First let’s make the circumference of the circles blue; to do this type:
symbols(x,circles=size,fg="blue")# turn circumference blue
You should now see an image that looks like this:
The argument fg changes the color of the circumference of the circle. R has a
rich array of color options. To get a list of the built-in color names, type
colors() into the R console and press return.
Back to our example. We would like to fill the circles with the color green. This
can be accomplished in one line:
symbols(x,circles=size,fg="blue",bg="green")#green circles
The resultant image is shown below:
Since the circles are rather large, it might be a good idea to limit their size. This
can be done by adding the inches argument to the symbols function. We will set
the maximum size to 0.5 of an inch and set the background color of the image to
cornsilk. This is achieved using the par() function.
par(bg="cornsilk")# set background color
symbols(x,circles=size,fg="blue",bg="green", inches=0.5)
The image below is the result:
Of course, we will need to label the x and y axes and give the image a title. This
can be achieved by typing one line into R:
symbols(x, circles = size, fg = "blue", bg = "green", inches = 0.2,
        main = "Random Circle Chart", xlab = "Five random bubbles",
        ylab = "Bubble size")  # random circles chart finished!
The diagram below is the final product:
In reality it only took two lines of code to produce this image:
par(bg = "cornsilk")  # set background color
symbols(x, circles = size, fg = "blue", bg = "green", inches = 0.2,
        main = "Random Circle Chart", xlab = "Five random bubbles", ylab = "Bubble size")  # draw chart
Not bad for a couple of lines! But we can do even better when we visualize real
data in subsequent sections.
Tip 1: To get help on a function, type help(function_name) or ?function_name.
For example to get help on the par function you can type:
help(par)
or
?par
A note on color in R
Traditionally, scientific charts have used color sparingly. This has begun to
change as more and more researchers become aware of both the power of color
and the ease with which it can be added to an image. R is particularly rich in
color. Let’s try some of the built-in palettes right now! We begin with the
rainbow palette:
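The palette code is not shown in this excerpt. A minimal sketch, using only palette functions that ship with R's grDevices package: rainbow() for a pie of twelve colors, followed by four other built-in palettes drawn as color strips. Each palette function takes the number of colors wanted.

```r
n <- 12
pal <- rainbow(n)  # n colours spread around the hue wheel, starting at red
pie(rep(1, n), col = pal, labels = seq_len(n), main = "rainbow(12)")

# Other palettes built into grDevices
op <- par(mfrow = c(2, 2))
for (p in c("heat.colors", "terrain.colors", "topo.colors", "cm.colors")) {
  barplot(rep(1, n), col = match.fun(p)(n), border = NA, axes = FALSE, main = p)
}
par(op)
```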
We have just scratched the surface. Creative ways to use these and other colors
will be explored in the next sections of this text.
Now let’s get started!
10. Bubble plot
The chart displays the relationship between personal saving (x-axis), growth
rate in disposable income (y-axis) and proportion of population less than 15
years old (size of bubble).
Tip 1: Change the background color of the plot by calling
par(bg="my color"). The color of the bubbles is controlled by the argument bg
in the symbols function; you can also change the border around the bubbles by
setting the argument fg="my color".
Tip 2: The radius of the bubbles is controlled by the circles argument in the
symbols function. The chart used circles = sqrt(data$pop15/pi). To ensure the
bubbles are not too large, the maximum size of each bubble was limited by
setting inches=0.25. For larger bubbles increase this number; for smaller
bubbles decrease it. The default is 1 inch.
Tip 3: The title function offers considerable flexibility in adding text to a chart.
The color of the text can be changed by setting col.main = "my color". The
argument cex.main controls the size of the title text. The bubble chart used a
title with cex.main = 2, and a subtitle with a smaller font size set by cex.sub = 1.
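The data behind the bubble chart is not named in this excerpt. Its three variables appear to match columns of R's built-in LifeCycleSavings data (sr, ddpi and pop15), so the sketch below assumes that data set and combines Tips 1 to 3. The colors and titles are illustrative choices, not the book's.

```r
# Assumed data: built-in LifeCycleSavings (50 countries)
data <- LifeCycleSavings
op <- par(bg = "cornsilk")                # Tip 1: background colour
symbols(data$sr, data$ddpi,
        circles = sqrt(data$pop15 / pi),  # Tip 2: radius chosen so bubble area ~ pop15
        inches = 0.25,                    # largest bubble scaled to 0.25 inch
        fg = "white", bg = "red",         # border and fill colours
        xlab = "Personal savings ratio (sr)",
        ylab = "Growth rate of disposable income (ddpi)")
title(main = "Savings, income growth and young populations", cex.main = 2,
      sub = "Bubble area: population under 15 (pop15)", cex.sub = 1)  # Tip 3
par(op)
```

Taking the square root of pop15 divided by pi makes the bubble area, rather than its radius, proportional to the variable, which keeps large values from looking exaggerated.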
Quick reference R code
Figure 11.1 The 90 day price change of four European stock market indices
The chart is constructed from daily data on the four major European stock
market indices (FTSE, CAC, SMI and DAX) recorded over the period 1991 to
1998. The panel shows the time series for the 90 day price changes for each
market index. The values have been discretized and assigned to five distinct
categories. The shading at the extremes represents whether a high 90 day change
(green) or low 90 day change (purple) in prices was observed. The deeper the
purple, the lower the 90 day price change. The deeper the green, the higher the
90 day price change.
Tip 1: The mvtsplot function requires data in a matrix format. When converting
your own data with matrix(), remember to set the argument ncol equal to the
number of columns in your data set.
Tip 2: You can supply your own column headings by setting
colnames(data) <- c("name.1","name.2","name.3",…). Again, make sure you have
sufficient column names for the number of columns in your data.
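As a quick illustration of Tip 2, the sketch below renames the four columns of the 90-day change matrix used for Figure 11.1. The country labels are our own illustrative choice.

```r
data <- diff(EuStockMarkets, 90)  # a multivariate ts, which is also a matrix
print(colnames(data))             # DAX, SMI, CAC, FTSE
colnames(data) <- c("Germany", "Switzerland", "France", "UK")  # one name per column
stopifnot(ncol(data) == length(colnames(data)))
```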
Tip 3: The argument norm in the mvtsplot function addresses whether the
categorization of the time series is based on within-series categories (norm =
"internal") or on all the data (norm = "global"). In the chart created in this
chapter the categories were the four stock market indices (FTSE, CAC, SMI and
DAX) with each categorization created using within-series data. Experiment
with both norm = "internal" and norm = "global" to see if trends in the overall
dataset appear more clearly.
Tip 4: The argument levels in the mvtsplot function controls how many colors
are used in the plot. The default is levels =3. In the chart created in this chapter
the values of each time series are divided into five categories ("very low",
"low", "medium", "high" and "very high") and plotted using deep purple for very
low values and deep green for very high values. As a rule of thumb, smooth
data can usually be plotted with a larger number of levels; very noisy or spiky
data typically needs to be plotted with fewer levels. Try experimenting with
different values to see if trends in the overall dataset appear more clearly.
install.packages("mvtsplot")
# datasets is part of base R, so it does not need to be installed
require(datasets)
require(mvtsplot)
data <- diff(EuStockMarkets, 90)  # 90-day price changes
mvtsplot(data, norm = "internal", levels = 5, margin = FALSE)
NOTES
The chart is constructed from data on the monthly number of sunspots recorded
over the period 1749 to 1997. The main panel shows the time series of sunspots
for each month of the year. The values of each time series are discretized and
assigned to distinct categories. The shading represents whether a high number
(green) or low number (purple) of sunspots were recorded. The right panel
contains box plots for each of the twelve months. The median value is
represented by a large dot. The lower panel presents a time series plot of the
median value of observations for each year.
Tip 1: The mvtsplot function requires data in a matrix format. When converting
your own data with matrix(), remember to set the argument ncol equal to the
number of columns in your data set.
Tip 2: You can supply your own column headings by setting
colnames(data) <- c("name.1","name.2","name.3",…). Again, make sure you have
sufficient column names for the number of columns in your data.
Tip 3: The argument norm in the mvtsplot function addresses whether the
categorization of the time series is based on within-series categories (norm =
"internal") or on all the data (norm = "global"). In the chart created in this
chapter the categories were months of the year using all the data. Experiment
with both norm = "internal" and norm = "global" to see if trends in the overall
dataset appear more clearly.
Tip 4: The argument levels in the mvtsplot function controls how many colors
are used in the plot. The default is levels =3. In this case the values of each time
series are divided into three categories ("low", "medium", and "high") and
plotted using purple for low values, grey for medium values, and green to
represent high values. As a rule of thumb, smooth data can usually be plotted
with a larger number of levels; very noisy or spiky data typically needs to be
plotted with fewer levels. Try experimenting with different values to see if
trends in the overall dataset appear more clearly.
install.packages("mvtsplot")
# datasets is part of base R, so it does not need to be installed
require(datasets)
require(mvtsplot)
# recent R versions extend sunspot.month beyond 1997, so keep 1749-1997 only
spots <- window(sunspot.month, end = c(1997, 12))
data <- matrix(data = spots, ncol = 12, byrow = TRUE)
colnames(data) <- c("Jan","Feb","Mar","Apr","May","Jun","Jul",
                    "Aug","Sep","Oct","Nov","Dec")
rownames(data) <- 1749:1997
time <- 1749:1997
mvtsplot(data, xtime = time, norm = "global", levels = 2)
NOTES
The chart is constructed from data on the monthly number of sunspots recorded
over the period 1749 to 1997. The main panel is divided into two sections by a
red line. The section above the red line shows the time series of sunspots for
each month of the year. The area below the red line presents simulated data of
sunspots from a lognormal distribution. The values of both the actual and
simulated time series are discretized and assigned to distinct categories. The
shading represents whether a high number (green) or low number (purple) of
sunspots were recorded. The right panel contains box plots for each series for
each month of the year. The median value is represented by a large dot. The
lower panel presents a time series plot of the median value of both sets of
observations by year.
Tip 1: The mvtsplot function requires data in a matrix format. When using your
own data, remember to set the ncol argument of matrix() equal to the number of
columns in your data set.
Tip 2: You can supply your own column headings with colnames(data) <-
c("name.1","name.2","name.3",…). Again, make sure you supply as many
column names as there are columns in your data.
Tip 3: The argument norm in the mvtsplot function determines whether the
categorization of each time series is based on that series alone (norm =
"internal") or on all the data (norm = "global"). Experiment with both norm
= "internal" and norm = "global" to see if trends in the overall dataset appear
more clearly.
Tip 4: The argument levels in the mvtsplot function controls how many colors
are used in the plot. The default is levels = 3, which was used in the chart in this
chapter. For the default value, purple is assigned to low values, grey to medium
values and green to high values. As a rule of thumb, smooth data can usually be
plotted with a larger number of levels; very noisy or spiky data typically needs
to be plotted with fewer levels. Try experimenting with different values to see if
trends in the overall dataset appear more clearly.
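The difference between norm = "internal" and norm = "global" in Tip 3 can be illustrated in base R. Under "internal" each column is cut at its own quantiles; under "global" the cut points come from all values pooled together. The sketch assumes quantile-based cut points and a hypothetical helper, categorize(); it is not the mvtsplot implementation.

```r
# Two made-up series on very different scales
x <- cbind(small = c(1, 2, 3, 4, 5, 6),
           large = c(10, 20, 30, 40, 50, 60))
probs <- seq(0, 1, length.out = 4)  # cut points for three levels

# Hypothetical helper: integer category codes from the given cut points
categorize <- function(v, breaks) {
  cut(v, breaks = unique(breaks), include.lowest = TRUE, labels = FALSE)
}

# "internal": each series is judged against its own quantiles
internal <- apply(x, 2, function(v) categorize(v, quantile(v, probs)))

# "global": every series is judged against quantiles of all the data
global.breaks <- quantile(x, probs)
global <- apply(x, 2, function(v) categorize(v, global.breaks))

internal  # both columns: 1 1 2 2 3 3
global    # small: 1 1 1 1 2 2, large: 2 2 3 3 3 3
```

Under "internal" the small series receives the full low-to-high range even though all of its values are below those of the large series; under "global" that contrast between the series is preserved.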
install.packages("mvtsplot")
require(mvtsplot) # datasets is a base package and is already loaded
# restrict the series to 1749-1997 so the matrix has exactly 249 rows
sunspots.1749.1997 <- window(sunspot.month, end = c(1997, 12))
data <- matrix(data = sunspots.1749.1997, ncol = 12, byrow = TRUE)
colnames(data) <- month.abb # "Jan", "Feb", ..., "Dec"
rownames(data) <- 1749:1997
time <- 1749:1997
set.seed(1234)
# simulated lognormal series with the same number of rows as the real data
random.data <- matrix(rlnorm(12 * nrow(data), meanlog = 3.9, sdlog = 1.2),
nrow(data), 12)
colnames(random.data) <- month.abb
# group 0 = the 12 simulated columns, group 1 = the 12 sunspot columns
group <- rep(c(0, 1), each = 12)
data <- cbind(random.data, data)
mvtsplot(data, xtime = time, norm = "internal", levels = 3, group = group,
gcol = "red")
Figure 14 1 United States State of the Union Addresses (2010 and 2011)
The image displays the most frequently used words by President Obama in his
State of the Union addresses during 2010 and 2011. Word frequency is
visualized by the size of the text; the more frequent the word, the larger it
appears in the image.
Tip 1: To create a word cloud your data needs to be in a corpus format. One
way to do this is to transform your text into a data frame and then to a corpus (a
collection of text documents), along the lines of:
library(tm)
# mytext: your own list of text records, each with a "text" field
textframe <- do.call("rbind", lapply(mytext, as.data.frame))
myCorpus <- Corpus(VectorSource(as.character(textframe$text)))
Note that VectorSource requires a character vector, hence as.character().
Tip 2: The argument min.freq controls the minimum word count displayed in the
image. Words that occur fewer than min.freq times are excluded from the word cloud.
Tip 3: To limit the maximum number of words in the image, add the max.words
argument to the wordcloud function, e.g. wordcloud(max.words=100,…).
Tip 4: To change the color of the printed text use the argument colors =
"my_favorite_color".
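The filtering that min.freq and max.words perform (Tips 2 and 3) can be sketched on a plain character string in base R. The sentence below is made up, and the counting is done by hand purely for illustration; wordcloud does its own counting when given text or a term matrix.

```r
speech <- "jobs jobs jobs economy economy energy education jobs economy"

# Split on white space and count each word, most frequent first
words <- strsplit(tolower(speech), "\\s+")[[1]]
freq <- sort(table(words), decreasing = TRUE)

min.freq <- 2   # drop words used fewer than 2 times
max.words <- 1  # then keep at most the single most frequent word

kept <- freq[freq >= min.freq]
kept <- kept[seq_len(min(max.words, length(kept)))]
names(kept)     # "jobs"
```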
15. Commonality word cloud
Figure 15 1 United States State of the Union Addresses (2010 and 2011)
Tip 2: To limit the maximum number of words in the image, add the max.words
argument to the commonality.cloud function, e.g.
commonality.cloud(max.words=100,…).
Tip 3: To see the list of stop words used type the following into the R console
window: stopwords("en")
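A commonality cloud keeps only words that appear in every document. As I read the wordcloud documentation, the default commonality measure is the minimum count across documents; the base-R sketch below applies that rule to a made-up term matrix laid out like term.matrix in the quick reference code (words in rows, addresses in columns).

```r
# Made-up term-document matrix: rows are words, columns are documents
term.matrix <- matrix(c(5, 3,
                        4, 0,
                        2, 6),
                      ncol = 2, byrow = TRUE,
                      dimnames = list(c("jobs", "deficit", "energy"),
                                      c("2010", "2011")))

# Commonality of a word: its smallest count across the documents
commonality <- apply(term.matrix, 1, min)
commonality[commonality > 0]  # jobs 3, energy 2: words used in both
```

"deficit" drops out because it never appears in the second document, however often it appears in the first.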
Quick reference R code
install.packages("tm") # text-mining tools used to clean the corpus
install.packages("wordcloud") # contains the SOTU data and generates the plot
require(tm)
require(wordcloud)
data(SOTU)
text <- SOTU
text <- tm_map(text, removePunctuation)
text <- tm_map(text, content_transformer(tolower)) # keeps documents intact
text <- tm_map(text, removeNumbers)
text <- tm_map(text, function(x) removeWords(x, stopwords()))
term.matrix <- TermDocumentMatrix(text)
term.matrix <- as.matrix(term.matrix)
par(bg = "yellow")
set.seed(123)
commonality.cloud(term.matrix, random.order = FALSE)
Figure 16 1 United States State of the Union Addresses (2010 and 2011)
The image displays the words used most frequently by President Obama in his State
of the Union addresses during 2010 and 2011. The top half of the diagram shows
frequently used words from the 2010 address (green); the bottom half shows high
frequency words from the 2011 address (burnt orange).
Tip 2: To limit the maximum number of words in the image, add the max.words argument
to the comparison.cloud function, e.g. comparison.cloud(max.words=100,…).
Tip 3: To see the list of stop words used type the following into the R console
window: stopwords("en")
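A comparison cloud contrasts the documents rather than pooling them. The wordcloud documentation describes the idea roughly as follows: compute each word's rate within each document, subtract the word's average rate across documents, size the word by its largest deviation, and place it in the sector of the document where that deviation occurs. The base-R sketch below follows that description on made-up counts; it is not the package's own code.

```r
term.matrix <- matrix(c(10,  0,
                         5,  5,
                         0, 10),
                      ncol = 2, byrow = TRUE,
                      dimnames = list(c("jobs", "tax", "energy"),
                                      c("2010", "2011")))

# Rate of each word within each document (each column sums to 1)
rates <- sweep(term.matrix, 2, colSums(term.matrix), "/")

# Deviation of each rate from the word's average rate across documents
deviation <- rates - rowMeans(rates)

size <- apply(deviation, 1, max)            # larger deviation, larger word
document <- apply(deviation, 1, which.max)  # document the word is placed under
```

"tax" is used equally in both addresses, so its deviation is zero and it would be drawn smallest; "jobs" and "energy" lean clearly towards one address each.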
Quick reference R code
install.packages("tm") # text-mining tools used to clean the corpus
install.packages("wordcloud") # contains the SOTU data and generates the plot
require(tm)
require(wordcloud)
data(SOTU)
text <- SOTU
text <- tm_map(text, removePunctuation)
text <- tm_map(text, content_transformer(tolower)) # keeps documents intact
text <- tm_map(text, removeNumbers)
text <- tm_map(text, function(x) removeWords(x, stopwords()))
term.matrix <- TermDocumentMatrix(text)
term.matrix <- as.matrix(term.matrix)
colnames(term.matrix) <- c("2010 Presidential Address", "2011 Presidential Address")
set.seed(1234)
par(bg = "gray90")
comparison.cloud(term.matrix, max.words = 300, random.order = FALSE)
The image displays the scale, cities and capital city of the US state of Texas.
Major cities are denoted by white dots. Austin, the capital city, is highlighted in
yellow.
Tip 1: The map function allows you to draw a wide variety of maps for US
states and other countries. For example map("state", "iowa") will draw an
outline of the state of Iowa. To fill in a state in a solid color use fill=TRUE.
Tip 2: Change the size of the scale by changing the cex argument in map.scale.
Values greater than 1 increase the size. To use metric distances rather than
imperial, set the argument metric = TRUE.
Tip 3: You can extend the length of the scale by using the relwidth argument in
the map.scale function. The larger the value passed to the argument, the larger
the distance measured by the scale.
The image displays the fish and chips condiment choices of consumers in
London, England; Cardiff, Wales; Glasgow, Scotland and Belfast, Northern
Ireland. The size of each pie chart reflects the relative consumption by consumers
in each city.
Tip 1: You can set a title directly in the basemap function by setting main="my
title". If you would like a longitude/latitude grid around the image, set
the argument axes = TRUE; for a plain frame without axes, use axes = FALSE,
frame.plot = TRUE. Labels for the x and y axes can be entered with
xlab="x axis title", ylab="y axis title".
Tip 2: The map used col <- rainbow(4) to generate four pie chart
colors with the rainbow function. A quick and easy alternative that often
looks great is col <- terrain.colors(n), where n is the number of colors you want
to generate.
Tip 3: The add.pie function is easy and straightforward to use. The first
argument takes the values for the pie chart, the next two arguments give the x and y
location of the pie chart, and the radius argument sets its size. The
label.dist argument is very useful as it controls how far the labels sit from
the pie. The image used label.dist=1.3. Try experimenting with larger and
smaller values until you find a distance that is visually appealing to you.
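The rainbow and terrain.colors functions mentioned in Tip 2 are part of base R (grDevices). Each returns a character vector of n hexadecimal color codes, one per pie slice, that can be passed straight to the col argument. A quick check:

```r
n <- 4
pie.col <- rainbow(n)             # four evenly spaced hues
pie.col.alt <- terrain.colors(n)  # four colors from the terrain palette

print(pie.col)      # hexadecimal color codes, one per pie slice
print(pie.col.alt)
```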
The image displays the level of state income tax by state in the United States
during 2013. California, the state with the highest income tax, is shaded
purple. States with no state income tax, such as Texas, are left uncolored.
Tip 2: Playing with colors is fun. The map used the argument color.scale with
color.spec="hcl". Good alternatives are "rgb" and "hsv". The function
color.scale calculates a sequence of colors by a linear transformation of the
numeric values supplied into the ranges for the three color parameters (the map
used c(0,300),35,50). Adjust the value of each parameter to change the visual
flavor of the map.
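The linear transformation described in Tip 2 can be imitated with base R's hcl(): rescale each value onto the hue range c(0, 300) and hold chroma and luminance fixed at 35 and 50. This is only a sketch under those assumptions with made-up tax rates; plotrix's color.scale handles more cases, such as NA values and ranges for every parameter.

```r
# Made-up state income tax rates for four states
tax <- c(0, 3, 6, 12)

# Map the smallest value to hue 0 and the largest to hue 300
hue.range <- c(0, 300)
hue <- hue.range[1] +
  (tax - min(tax)) / (max(tax) - min(tax)) * diff(hue.range)

# Fixed chroma (35) and luminance (50), as in the map's color.scale call
state.col <- hcl(h = hue, c = 35, l = 50)
hue  # 0 75 150 300
```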
Tip 3: You can adjust the location of the legend by specifying two arguments, x and
y, which determine its horizontal and vertical location. We used a faster
alternative, a position keyword: legend("bottomleft",…).
You can also specify "topright", "bottomright" or "topleft". However, greater
precision can be obtained by specifying the x and y arguments explicitly.
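Both ways of placing a legend from Tip 3 can be compared in base graphics. The sketch draws on a null PDF device so nothing is written to disk; the plotted points and legend labels are placeholders. With the default xjust = 0 and yjust = 1, explicit x and y give the top-left corner of the legend box.

```r
pdf(NULL)  # null device: nothing is written to disk
plot(1:10, 1:10)

# Keyword shortcut: R works out the coordinates from "bottomleft"
leg1 <- legend("bottomleft", inset = 0.05, legend = c("A", "B"),
               fill = c("blue", "red"))

# Explicit coordinates: (x, y) is the top-left corner of the legend box
leg2 <- legend(x = 2, y = 9, legend = c("A", "B"),
               fill = c("blue", "red"))

dev.off()
leg2$rect  # width, height, left and top of the second legend box
```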
The image displays the outcome of the 2012 US Presidential election. States
won by Obama are colored blue, states won by Romney are colored red.
Tip 1: For an exact match of state names, set exact = TRUE in the match.map
function.
Tip 2: Playing with colors is fun. The map used the argument color.scale with
color.spec="rgb" to generate the red and blue colors of the political parties.
Quick and easy alternatives are "hcl" and "hsv".
Tip 3: You can adjust the location of the legend by specifying two arguments, x and
y, which determine its horizontal and vertical location. We used a faster
alternative, a position keyword: legend("bottomleft",…).
You can also specify "topright", "bottomright" or "topleft". However, greater
precision can be obtained by specifying the x and y arguments explicitly.
install.packages(c("maps", "plotrix"))
require(maps) # map(), match.map() and the votes.repub data
require(plotrix) # color.scale()
map("usa")
data(votes.repub)
# 2012 result for each state in alphabetical order: 1 = Romney, 0 = Obama
result <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 1,
0, 1, 0, 1, 0, 1, 1, 1, 0, 0,
0, 0, 0, 1, 1, 1, 1, 0, 0, 0,
0, 0, 1, 1, 0, 1, 0, 0, 0, 1,
1, 1, 1, 1, 0, 0, 0, 1, 0, 1)
temp <- cbind(votes.repub, result)
data <- temp[, 32] # keep only the 2012 result column
data[data == 0] <- NA # first pass: Romney states only
state.to.map <- match.map("state", state.name, exact = FALSE)
x <- data[state.to.map]
state.col <- color.scale(x, 1, 0, 0, color.spec = "rgb") # red
par(bg = "slategray2")
map("state", fill = TRUE, col = state.col)
data[is.na(data)] <- 0.0 # second pass: Obama states only
data[data == 1] <- NA
x <- data[state.to.map]
state.col <- color.scale(x, 0, 0, 1, color.spec = "rgb") # blue
map("state", fill = TRUE, col = state.col, add = TRUE, boundary = FALSE)
map("state", fill = FALSE, add = TRUE, boundary = TRUE, col = "white")
title(main = "2012 Presidential Election")
legend(x = "bottomleft", inset = .05,
bty = "n", cex = 0.75, text.col = "darkred",
c("Obama (Democrat)", "Romney (Republican)"), fill = c("blue", "red"))
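The two-pass coloring in the code above (red states first, blue states added on top) can also be collapsed into a single color vector. The sketch below shows only the indexing step in base R on made-up inputs: result holds one 0/1 code per state, and the hypothetical state.to.map plays the role of the per-polygon index that match.map returns. With the maps package loaded, the resulting vector could then be passed to map("state", fill = TRUE, col = state.col).

```r
# Made-up inputs: 1 = won by Romney, 0 = won by Obama, one per state
result <- c(StateA = 1, StateB = 0, StateC = 0)  # hypothetical states

# Hypothetical polygon-to-state index: StateB is drawn as two polygons
state.to.map <- c(1, 2, 2, 3)

# One color per state, then one color per map polygon
state.col <- ifelse(result == 1, "red", "blue")[state.to.map]
state.col  # "red" "blue" "blue" "blue"
```

A polygon whose index is NA simply receives an NA color.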
Tip 1: For an exact match of state names, set exact = TRUE in the match.map
function.
Tip 2: Playing with colors is fun. The map used the argument color.scale with
color.spec="rgb" to generate the red and blue colors of the political parties.
Quick and easy alternatives are "hcl" and "hsv".
Tip 3: You can adjust the location of the legend by specifying two arguments, x and
y, which determine its horizontal and vertical location. We used a faster
alternative, a position keyword: legend("bottomleft",…).
You can also specify "topright", "bottomright" or "topleft". However, greater
precision can be obtained by specifying the x and y arguments explicitly.
temp <- cbind(votes.repub, result)
data <- temp[, 32]
data[data == 0] <- NA
state.to.map <- match.map("state", state.name, exact = FALSE)
x <- data[state.to.map]
state.col <- color.scale(x, 1, 0, 0, color.spec = "rgb")
par(bg = "cornsilk")
map('state.carto', fill = TRUE, col = state.col)
data[is.na(data)] <- 0.0
data[data == 1] <- NA
x <- data[state.to.map]
state.col <- color.scale(x, 0, 0, 1, color.spec = "rgb")
map('state.carto', fill = TRUE, col = state.col, add = TRUE, boundary = FALSE)
map('state.carto', fill = FALSE, add = TRUE, boundary = TRUE, col = "white")
title(main = "Party control of Governors' offices",
cex.main = 1, col.main = "black")
title(sub = "As at January 2013")
legend(x = "bottomleft", inset = .05,
bty = "n", cex = 0.75, text.col = "darkred",
c("Democratic", "Republican"), fill = c("blue", "red"))
22. Visibility map
The image displays economic conditions across the US based on the monthly
coincident index for each of the 50 states published by the Federal Reserve
Bank of Philadelphia. The image uses a visibility-based map of the US, which
provides simplified state shapes with enough area to allow annotations
even in the smaller states.
# we don't want to waste time typing in state names, so this is a neat trick
temp <- cbind(votes.repub, conditions)
# now drop all the other columns and just leave our economic conditions data
data <- temp[, 32]
state.to.map <- match.map("state", state.name, exact = FALSE)
x <- data[state.to.map]
Step 3: Plot the chart
#set colors to use on the image
state.col<-color.scale(x,c(0,300),40,60, color.spec="hcl")
par(bg="cornsilk")# set background color
map("state.vbm",fill=TRUE,col=state.col)
#create title and legend
title(main="Economic Health of the Nation",
sub=" Federal Reserve Bank of Philadelphia coincident index – April 2013")
legend(x="bottomright", inset=.05,
bty="n", cex=0.75, text.col ="darkred",
c("Weak", "Improving", "Strong"), fill= state.col)
Tip 2: The map used the argument color.scale with color.spec="hcl". Quick and
easy alternatives are "rgb" and "hsv".
Tip 3: You can adjust the location of the legend by specifying two arguments, x and
y, which determine its horizontal and vertical location. We used a faster
alternative, a position keyword: legend("bottomright",…).
You can also specify "topright", "bottomleft" or "topleft". However, greater
precision can be obtained by specifying the x and y arguments explicitly.
#bottlenose data
bottle.x=c( -9.76 ,-9.31,-9.10)
bottle.y=c( 54.98, 55.00,54.57)
#common dolphin data
common.x=c( -10.35,-10.55)
common.y=c( 53.50,53.70)
#habour dolphin data
habour.x=c( -10.00,-10.25)
habour.y=c(53.00,52.90)
#set the colors for the legend, blue for whales, brown for dolphins
colors=c("blue","blue","blue","blue","brown","brown","black")
shapes=c(21,22,23,24,25,21,22) # shapes used to plot observations
#write legend in bottom left corner of image
legend(x="bottomleft", inset=.05, bty="0", cex=0.75, text.col ="white",
pch=shapes,col=colors,legend=species, title="Key",title.col="black")
#add title to image
title(main="Marine Mammal observations around the Irish Coast",
sub=" 30th May – 15th April 2013: Irish Research Vessel")
Tip 1: The draw.bubble function was used to draw the observed marine
mammal sightings on the image. The bubble sizes can be changed through the
arguments z (the values that set each bubble's relative size) and maxradius
(the radius of the largest bubble).
Tip 2: To turn off the x and y axis use the argument axes = FALSE in the
basemap function.
Tip 3: You can adjust the location of the legend by specifying two arguments, x and
y, which determine its horizontal and vertical location. We used a faster
alternative, a position keyword: legend("bottomleft",…).
You can also specify "topright", "bottomright" or "topleft". However, greater
precision can be obtained by specifying the x and y arguments explicitly.
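To see how z and maxradius from Tip 1 interact, the base-R sketch below assumes the convention common in bubble plots, namely that bubble area is proportional to z, so each radius is maxradius * sqrt(z / max(z)). That scaling rule is an assumption (check ?draw.bubble in mapplots for the rule it actually applies); the sighting counts and positions are made up, and the circles are drawn with base R's symbols().

```r
counts <- c(1, 4, 16)  # made-up sighting counts (the z values)
maxradius <- 0.3       # radius of the largest bubble, in map units

# Area proportional to count: radius grows with the square root
radius <- maxradius * sqrt(counts / max(counts))

pdf(NULL)              # null device: nothing is written to disk
plot(c(-10, -9, -8), c(53, 54, 55), type = "n",
     xlab = "Longitude", ylab = "Latitude")
symbols(c(-10, -9, -8), c(53, 54, 55), circles = radius,
        inches = FALSE, add = TRUE, bg = "blue")
dev.off()
radius  # 0.075 0.150 0.300
```

A count 16 times larger produces a bubble only 4 times wider, which keeps large sightings from swamping the map.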
install.packages("mapplots")
require(mapplots)
data(coast)
species=c("Fin","Humpback","Minke","Killer","Bottlenose","Common","Harbour")
par(bg="orangered")
basemap(xlim=c(-13,-5.7), ylim=c(51.5,55.15), main = "", axes = TRUE)
draw.shape(coast, col='darkgreen')
colors=c("blue","blue","blue","blue","brown","brown","black")#legend colors
shapes=c(21,22,23,24,25,21,22)
legend(x="bottomleft", inset=.05, bty="o", cex=0.75, text.col ="white",
pch=shapes,col=colors,legend=species,title="Key",title.col="black")
title(main="Marine Mammal observations around the Irish Coast",
sub=" 30th May – 15th April 2013: Irish Research Vessel")
Figure 24 1 Bacteria count, weight over time of two samples of raw milk
The image displays the relationship between bacteria count, weight (mass) and
time of two samples of raw milk. Sample one is colored green and sample two
colored red.
Tip 1: You can set the range of the x, y and z axes with the arguments
xlim, ylim and zlim respectively. So zlim=c(0,2000) sets the range of the z axis
from 0 to 2000.
Tip 2: To set the size of the labels, axes and symbols (plotted points), use the
arguments cex.lab, cex.axis and cex.symbols respectively. Values greater than 1
produce larger labels, axes or symbols; values less than 1 produce smaller
ones.
Tip 3: Set the color of the grid and axes by using the arguments col.axis and
col.grid. You can turn the axis and grid off by setting the arguments axis=FALSE
and grid =FALSE.
Quick reference R code
install.packages("scatterplot3d")
require(scatterplot3d)
time=c(25,37,49,61,73,85,97,109,121,133,145,157,169,181,193,205,217)
count.1=c(0,7,9,12,7,30,8,12,33,16,16,30,29,19,28,29,11)
mass.1=c(220,110,210,340,330,560,635,80,605,150,688,422,1511,642,681,1961,1645)
count.2=c(6,13,15,18,13,36,14,18,39,22,22,36,35,25,34,35,17)
mass.2=c(239,197,213,481,466,591,695,140,746,203,782,494,1562,757,796,2055,1739)
par(col="green",bg="cornsilk")
# sample 1: x = bacteria count, y = time, z = mass; axes and grid hidden
scatterplot3d(ylim=c(0,1600),zlim=c(0,2000), count.1, time, mass.1, angle=22,
pch=20, cex.symbols=1,xlab="",ylab="", zlab="",grid=FALSE,axis=FALSE,cex.lab=1,
cex.axis=1)
par(new=TRUE,col="red")
# sample 2 drawn on top of sample 1, this time with axes, grid and labels
scatterplot3d(main="Two samples of raw milk",ylim=c(0,1600),zlim=c(0,2000),
count.2, time, mass.2, angle=22,
col.axis="grey5",col.grid="tan",xlab="Bacteria Count",ylab="Time (minutes)",
zlab="Mass/ weight", pch=20, cex.symbols=1,box=FALSE,
col.lab="brown",cex.lab=1, cex.axis=0.3)
legend("topright", inset=.05, bty="o", cex=1,
title="Key", c("Sample 1", "Sample 2"), pch=20, fill=c("green",
"red"),text.col="brown")
The image displays the relationship between bacteria count, weight (mass) and
time from a sample of raw milk. Sample points are red colored dots.
Tip 1: You can set the range of the x, y and z axes with the arguments
xlim, ylim and zlim respectively. So zlim=c(0,2000) sets the range of the z axis
from 0 to 2000.
Tip 2: To set the size of the labels, axes and symbols (plotted points), use the
arguments cex.lab, cex.axis and cex.symbols respectively. Values greater than 1
produce larger labels, axes or symbols; values less than 1 produce smaller
ones.
Tip 3: Set the color of the grid and axes by using the arguments col.axis and
col.grid. You can turn the axis and grid off by setting the arguments axis=FALSE
and grid =FALSE.
The image displays the values from three distinct samples from the normal
distribution. Each sample has a different mean, although all three samples have
the same variance. Clusters are color coded with increasing mean as you move
from the green cluster through the blue cluster, with the orange cluster having the
highest mean value.
Tip 1: The argument type determines the type of plot. Use type = "p" for points,
type = "l" for lines and type = "h" for vertical lines.
Tip 2: Change the angle of the image by using the angle argument. Values in the
range of 20 to 45 generally work well. Notice that angle=0 results in a 2-D plot.
Tip 3: Set the color of the grid and axes by using the arguments col.axis and
col.grid. You can turn the axis and grid off by setting the arguments axis=FALSE
and grid =FALSE.
Tip 4: Use a pcolor column in the data frame (for example data$pcolor) to set the color of each group.
sample2<-matrix(rnorm(300,2,1),100,3)
group2<- rep(2,nrow(sample2) )
sample2<-cbind(sample2,group2)
sample3<-matrix(rnorm(300,4,1),100,3)
group3<- rep(3,nrow(sample3) )
sample3<-cbind(sample3,group3)
data=rbind(sample1,sample2,sample3)
data<-data.frame(data)
colnames(data)=c("x","y","z","Group")
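The cluster data above can also be generated and colored in one self-contained step. This sketch follows the description: three samples of 100 points each with standard deviation 1 and means 2 and 4 for the second and third samples. The mean of 0 for the first sample is an assumption, since sample1 is not defined in the listing. Indexing a color vector by the group number replaces separate pcolor assignments per group.

```r
set.seed(1234)
means <- c(0, 2, 4)  # assumption: sample1 has mean 0

# 100 three-dimensional points per group, all with standard deviation 1
data <- do.call(rbind, lapply(seq_along(means), function(g) {
  data.frame(x = rnorm(100, means[g], 1),
             y = rnorm(100, means[g], 1),
             z = rnorm(100, means[g], 1),
             Group = g)
}))

# Color by indexing a palette with the group number
data$pcolor <- c("green", "blue", "orange")[data$Group]
```

With scatterplot3d loaded, the clusters could then be drawn with scatterplot3d(data$x, data$y, data$z, color = data$pcolor).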
The figure displays the path of inflation and interest rates in the US during 1990.
Each sample point is represented by a vertical dashed line and an open circle.
The solid purple line captures the trajectory over time of interest rates and
inflation.
par(new=TRUE,col="blue")
scatterplot3d(time,inflation,rates,type="h",box=FALSE, tick.marks=TRUE,
label.tick.marks=TRUE, x.ticklabs=label,angle=25,xlab="Date",
ylab="Interest Rate (%)", zlab="Inflation Rate", lty.hplot="dotted",
col.grid="red",main="US Inflation and Interest Rates - 1990",
col.main="firebrick4")
Visualization quick tips
Tip 1: The argument type determines the type of plot. Use type = "p" for points,
type = "l" for lines and type = "h" for vertical lines.
Tip 3: The argument lty.hplot="dotted" sets the line type of the vertical lines,
in this case dotted. Other values the argument can take include "blank", "solid",
"dashed", "dotdash", "longdash" and "twodash".
Tip 4: Set the color of the grid and axes by using the arguments col.axis and
col.grid. You can turn the axis and grid off by setting the arguments axis=FALSE
and grid =FALSE.
The image displays the savings rate, growth in disposable income and
population over 75 for six nations averaged over the decade 1960–1970. Each
nation is represented by a red vertical dashed line with an open circle.
Tip 1: The argument type determines the type of plot. Use type = "p" for points,
type = "l" for lines and type = "h" for vertical lines.
Tip 2: Change the angle of the image by using the angle argument. Values in the
range of 20 to 45 generally work well. Notice that angle=0 results in a 2-D plot.
Tip 3: Add the argument lty.hplot="dotted" to change the line type of the vertical
lines. The argument can take the values "blank", "solid", "dashed", "dotted", "dotdash",
"longdash" or "twodash".
Tip 4: Set the color of the grid and axes by using the arguments col.axis and
col.grid. You can turn the axis and grid off by setting the arguments axis=FALSE
and grid =FALSE.
Quick reference R code
install.packages("scatterplot3d")
require(scatterplot3d) # LifeCycleSavings ships with base R (datasets)
# rows 8, 11, 14, 15, 20 and 23 of LifeCycleSavings
nations <- LifeCycleSavings[c(8, 11, 14, 15, 20, 23), ]
x <- nations[, 1] # savings rate (sr)
y <- nations[, 3] # population over 75 (pop75)
z <- nations[, 5] # growth in disposable income (ddpi)
par(bg="ghostwhite",col="firebrick1")
image <- scatterplot3d(x,y,z,box=FALSE,type="h",angle=45,
xlab="Savings Rate",ylab="Population over 75 (%)",
zlab="Growth in disposable income (%)",main="Saving Characteristics",
sub="1960-1970",col.main="darkblue",col.grid="grey",color="firebrick1")
image.coords <- image$xyz.convert(x,y,z)
text(image.coords$x, image.coords$y, labels=row.names(nations),
pos=3, cex=1)
The image displays the expected asset class returns and risk for stocks (large
cap, small cap, international), bonds (government, corporate, high yield) and
real assets (real estate, commodities, gold) alongside the current portfolio asset
allocation. Each asset class is color coded: stocks (green), bonds (blue) and real
assets (orange). The name of each asset appears at the top of its colored bar.
data<-data.frame(data)
# data used for axis in plot
x<-data[,2]
y<-data[,3]
z<-data[,1]
Tip 1: The argument type determines the type of plot. Use type = "p" for points,
type = "l" for lines and type = "h" for vertical lines.
Tip 2: Change the angle of the image by using the angle argument. Values in the
range of 20 to 45 generally work well. Notice that angle=0 results in a 2-D plot.
Tip 3: Add the argument lty.hplot="dotted" to change the line type of the vertical
lines. The argument can take the values "blank", "solid", "dashed", "dotted", "dotdash",
"longdash" or "twodash".
Tip 4: Set the color of the grid and axes by using the arguments col.axis and
col.grid. You can turn the axis and grid off by setting the arguments axis=FALSE
and grid =FALSE.
Tip 5: Use a pcolor column in the data frame (data$pcolor) to set the color of each group of bars.
data<-data.frame(data)
x<-data[,2]
y<-data[,3]
z<-data[,1]
data$pcolor[data$Group==1] <- "green"
data$pcolor[data$Group ==2] <- "blue"
data$pcolor[data$Group ==3] <- "orange"
par(bg="ghostwhite")
with(data, {
image<-scatterplot3d(x,y,z, type="h", lwd=10, pch=" ", color=pcolor,zlim =
c(0, 12),box=FALSE,xlab="Risk (vol)",zlab="Expected Return",ylab="Current
Allocation",angle=22,main="Asset Class Expectation", col.main="darkblue",
col.grid="grey")
legend("topleft", inset=.05,
bty="n", cex=1, text.col ="darkred",
title="Asset Class",
c("Stocks", "Bonds", "Real Assets"), fill=c("green", "blue", "orange"))
})
The chart displays the simulated flight directions of three flies from a point of
rest. The colored dots represent the flight orientation of each fly (red for fly 1,
blue for fly 2 and green for fly 3).
Tip 1: Data should be in a circular format. This means your data needs to be
converted to circular class. To do this, use circular(your_data) to coerce
the data into a circular object.
Tip 2: You can change the color of the points by changing the col argument.
Tip 3: Set stack=FALSE to plot the points around the outer edge of the circle.
Tip 4: To remove the numbers around the edge of the circle add the argument
axes=FALSE to the plot function. To add tick marks to the edge of the circle add
the argument ticks =TRUE to the plot function.
install.packages("circular")
require(circular)
data<-circular(wind , zero = pi/2)
par(bg="lightyellow")
rose.diag(data, bins=18,axes=TRUE,ticks=TRUE,border="black",
col="darkred",tol=0,prop=1.5)
points(data,stack=TRUE,col="darkgreen",pch=20)
symbols(0, 0, circle = 0.2, inches = FALSE, add = TRUE, fg = 'black')
symbols(0, 0, circle = 0.4, inches = FALSE, add = TRUE, fg = 'black')
symbols(0, 0, circle = 0.6, inches = FALSE, add = TRUE, fg = 'black')
symbols(0, 0, circle = 0.8, inches = FALSE, add = TRUE, fg = 'black')
An arrows chart for circular data shows the orientation of each individual data
point by use of an arrow. The chart displays the orientation of three sets of long-
legged desert ants after one eye on each ant was 'trained' to learn the ant's home
direction, then covered and the other eye uncovered. The charts are orientated
so that zero degrees appears at the top.
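The zero = pi/2 argument in the code is what moves zero degrees to the top of the circle. The base-R sketch below shows the geometry with a hypothetical helper, position(): an angle theta lands at (cos(zero + theta), sin(zero + theta)) on a unit circle, assuming angles increase anticlockwise, which I understand to be the circular package's default rotation.

```r
# Hypothetical helper: page position of an angle `theta` (radians) on a
# unit circle whose zero direction is `zero`, with angles increasing
# anticlockwise (assumed to match the circular package's default)
position <- function(theta, zero = 0) {
  c(x = cos(zero + theta), y = sin(zero + theta))
}

round(position(0, zero = 0), 10)           # x = 1, y = 0: zero points right
round(position(0, zero = pi / 2), 10)      # x = 0, y = 1: zero points up
round(position(pi / 2, zero = pi / 2), 10) # x = -1, y = 0: quarter turn anticlockwise
```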
Tip 3: Change the background color by setting the argument bg="your chosen
color" in the par function.
Tip 4: To add tick marks around the edge of the circle add the argument ticks
=TRUE to the plot function.
install.packages("circular")
require(circular)
ant.1<-circular(fisherB10$set1,zero=pi/2)
ant.2<-circular(fisherB10$set2,zero=pi/2)
ant.3<-circular(fisherB10$set3,zero=pi/2)
par(bg="cornsilk")
par(mfrow=c(2,2))
plot(ant.1,col="darkred",main="Ant group 1",axes=FALSE)
arrows.circular (ant.1,col="darkred")
plot(ant.2,col="darkgreen",main="Ant group 2",axes=FALSE)
arrows.circular (ant.2,col="darkgreen")
plot(ant.3,col="darkblue",main="Ant group 3",axes=FALSE)
arrows.circular (ant.3,col="darkblue")
plot(ant.1,col="darkred",main="All three groups",axes=FALSE)
arrows.circular (ant.1,col="darkred")
par(new=TRUE)
plot(ant.2,col="darkgreen", axes=FALSE)
arrows.circular (ant.2,col="darkgreen")
par(new=TRUE)
plot(ant.3,col="darkblue",axes=FALSE)
arrows.circular (ant.3,col="darkblue")
32. Rose diagram
A rose diagram is essentially a histogram for circular data. The chart displays
wind direction measurements taken at 15 minute intervals at Col de la Roa in the
Italian Alps from January 29, 2001 to March 31, 2001. The dark red bars in the
center of the diagram are similar to the bars in a histogram. The image is
orientated so that zero degrees appears at the top of the image.
rose.diag(data, bins=18,axes=TRUE,ticks=TRUE,border="black",
col="darkred",tol=0,prop=1.5)
Tip 1: The rose.diag function requires data in a circular format. This means the
data needs to be converted to circular class. When using your own data or other
datasets, you can use circular(your_data) to coerce the data into a
circular format.
Tip 2: You can change the number of bins with the bins argument of
rose.diag. The argument takes a positive integer.
Tip 3: Change the argument col in rose.diag to choose different bin colors.
Tip 4: To remove the numbers around the edge of the circle use axes=FALSE.
To remove the tick marks from the edge of the circle use ticks =FALSE.
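A rose diagram counts the observations that fall into equal angular sectors, just as a histogram counts them in equal intervals. The base-R sketch below does that counting for bins sectors on made-up angles in radians; it illustrates the idea behind the bins argument and is not rose.diag's own code.

```r
angles <- c(0.1, 0.2, 1.7, 3.3, 3.4, 6.2)  # made-up directions in radians
bins <- 4                                   # four 90-degree sectors

# Sector boundaries from 0 to 2*pi; %% wraps any angle into that range
breaks <- seq(0, 2 * pi, length.out = bins + 1)
counts <- table(cut(angles %% (2 * pi), breaks = breaks,
                    include.lowest = TRUE))
counts  # 2 1 2 1
```

Doubling bins to 8 would split each of these sectors in two, giving narrower petals built from fewer observations each.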
install.packages("circular")
require(circular)
data<-circular(wind , zero = pi/2)
par(bg="lightyellow")
rose.diag(data, bins=18,axes=TRUE,ticks=TRUE,border="black",
col="darkred",tol=0,prop=1.5)
points(data,stack=TRUE,col="darkgreen",pch=20)
symbols(0, 0, circle = 0.2, inches = FALSE, add = TRUE, fg = 'black')
symbols(0, 0, circle = 0.4, inches = FALSE, add = TRUE, fg = 'black')
symbols(0, 0, circle = 0.6, inches = FALSE, add = TRUE, fg = 'black')
symbols(0, 0, circle = 0.8, inches = FALSE, add = TRUE, fg = 'black')
Tip 1: The rose.diag function requires data in a circular format. This means the
data needs to be converted to circular class. When using your own data or other
datasets, you can use circular(your_data) to coerce the data into a
circular format.
Tip 2: You can change the number of bins with the bins argument of
rose.diag. The argument takes a positive integer.
Tip 3: Change the argument col in rose.diag to choose different bin colors.
Tip 4: In the points function, set stack = FALSE if you want individual
observations plotted around the edge of the outer circle.
install.packages("circular")
require(circular)
data<-circular(wind , zero = pi/2)
par(bg="lightyellow")
rose.diag(data, bins=18,axes=TRUE,ticks=TRUE,border="black",
col="darkred",tol=0,prop=1.5)
points(data,stack=TRUE,col="darkgreen",pch=20)
34. Radial rose diagram
A radial rose diagram is essentially a histogram for circular data with radial
lines indicating the frequency of observations. The chart displays wind direction
measurements taken at 15 minute intervals at Col de la Roa in the Italian Alps
from January 29, 2001 to March 31, 2001. The gold petals in the center of the
diagram are similar to the bars in a histogram. The green dots around the edge of
the outer circle are the stacked individual observations. The diagram is
orientated so that zero degrees appears at the top of the image.
Tip 1: The rose.diag function requires data in a circular format. This means the
data needs to be converted to circular class. When using your own data or other
datasets, you can use circular(your_data) to coerce the data into a
circular format.
Tip 2: You can change the number of bins with the bins argument of
rose.diag. The argument takes a positive integer.
Tip 3: Change the argument col in rose.diag to choose different bin colors. In the
points function, set stack = FALSE if you want individual observations plotted
around the edge of the outer circle.
Tip 4: You can change the color of the radial lines by setting the argument
fg='your chosen color' in the symbols function.
Quick reference R code
require(circular)
data <- circular(wind, zero = pi/2) # wind directions at Col de la Roa
par(bg="lightblue")
rose.diag(data,
bins=18,axes=TRUE,ticks=TRUE,border="black",col="gold",tol=0,prop=1.5)
points(data,stack=TRUE,col="darkred",pch=20)
symbols(0, 0, circle = 0.2, inches = FALSE, add = TRUE, fg = 'black')
symbols(0, 0, circle = 0.4, inches = FALSE, add = TRUE, fg = 'black')
symbols(0, 0, circle = 0.6, inches = FALSE, add = TRUE, fg = 'black')
symbols(0, 0, circle = 0.8, inches = FALSE, add = TRUE, fg = 'black')
Tip 1: The rose.diag function requires data in a circular format. This means the
data needs to be converted to circular class. When using your own data or other
datasets, you can use circular(your_data) to coerce the data into a
circular format.
Tip 2: You can change the number of bins with the bins argument of
rose.diag. The argument takes a positive integer.
Tip 3: Change the argument col in rose.diag to choose different bin colors. In the
points function, set stack = FALSE if you want individual observations plotted
around the edge of the outer circle.
Tip 4: To create a regular multivariable rose diagram do not enter the last three
lines of code.
install.packages("circular")
require(circular)
set.seed(1234)
beaver.1 <- rvonmises(n=50, mu=circular(0), kappa=3)
beaver.2 <- rvonmises(n=50, mu=circular(pi), kappa=3)
beaver.3 <- rvonmises(n=50, mu=circular(pi/2), kappa=3)
par(bg="cornsilk")
rose.diag(beaver.1, bins=18,axes=TRUE,ticks=TRUE,
border="black",col="darkred",tol=0,prop=1.5)
par(new=TRUE)
rose.diag(beaver.2, bins=18,axes=TRUE,ticks=TRUE,
border="black",col="darkblue",tol=0,prop=1.5)
par(new=TRUE)
rose.diag(beaver.3, bins=18,axes=TRUE,ticks=TRUE,
border="black",col="darkorange",tol=0,prop=1.5)
points(beaver.1, col="darkred", stack=TRUE)
points(beaver.2, col="darkblue", stack=TRUE)
points(beaver.3, col="darkorange", stack=TRUE)
The image shows the frequency of counts by wind direction for Carbon
monoxide concentration (parts per million) measured at Marylebone, London.
The petals are similar to the bars in a histogram and represent the frequency of
observations. The petals are color coded to capture the extent of concentration.
Circular bands are drawn at 5%, 10%, 15% and 20%. A petal that reaches the
5% band represents 5% of the overall data.
pollutionRose(mydata,paddle=FALSE,pollutant= pollutant,annotate=FALSE,
key.footer = pollutant, key.position = "right", key = TRUE, breaks =
8,main=title, sub=sub,auto.text=FALSE)
Tip 1: The location of the key is determined by the argument key.position; it can
take values “left”, “right”, “top” and “bottom”. The number of colors (or breaks)
represented in the key is determined by the argument breaks; for many datasets
values between 6 and 8 work well.
Tip 2: You can draw the rose with paddles instead of wedges by setting the argument
paddle=TRUE.
Tip 3: When substituting your own data note the pollutionRose function requires
data frames with a field "date" that can be in either POSIXct or Date format but
should be set to the Greenwich Mean Time time zone. The data should be at
hourly or higher resolution.
Tip 4: The chart used the assignment pollutant <- c("co") to select the carbon
monoxide observations in the mydata data frame. This data frame also contains a
number of other air quality observations taken in central London:
1. "ws" - Wind speed.
2. "wd" - Wind direction, in degrees from North.
3. "nox" - Oxides of nitrogen concentration.
4. "no2" - Nitrogen dioxide concentration.
5. "o3" - Ozone concentration.
6. "pm10" - Particulate PM10 fraction measurement.
7. "so2" - Sulfur dioxide concentration.
8. "pm25" - Particulate PM2.5 fraction measurement.
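Tip 3 asks for a "date" column in POSIXct format set to GMT. The sketch below builds such a data frame, assuming made-up hourly values and reusing the column names listed above (ws, wd and co); my.obs and its contents are hypothetical, not openair data. It also shows how a time stamp stored as text can be converted with an explicit format and time zone.

```r
set.seed(42)
# 24 hourly time stamps starting at midnight GMT
hours <- seq(as.POSIXct("2005-01-01 00:00", tz = "GMT"), by = "hour", length.out = 24)

# Hypothetical observations using the column names from the list above
my.obs <- data.frame(
  date = hours,                 # POSIXct time stamps in GMT
  ws = runif(24, 0, 10),        # wind speed (made up)
  wd = runif(24, 0, 360),       # wind direction in degrees from North (made up)
  co = runif(24, 0, 3)          # carbon monoxide concentration (made up)
)
str(my.obs)

# Converting a text time stamp, stating both the format and the time zone
stamp <- as.POSIXct("2005-01-01 13:00", format = "%Y-%m-%d %H:%M", tz = "GMT")
```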
install.packages("openair")
require(openair)
data(mydata) # hourly air quality data frame, including carbon monoxide ("co") in ppm
pollutant <- c("co")# Carbon monoxide data
title=c("Carbon monoxide concentration in central London")
sub=c("1st January 1998 to 23rd June 2005")
pollutionRose(mydata,paddle=FALSE,pollutant= pollutant,annotate=F,
key.footer = pollutant, key.position = "right", key = TRUE, breaks =
8,main=title, sub=sub,auto.text=FALSE)
NOTES
Tip 1: The location of the key is determined by the argument key.position; it can
take values “left”, “right”, “top” and “bottom”.
Tip 2: The number of colors (or breaks) represented in the key is controlled by
the breaks argument in the polarFreq function. The image used
breaks = seq(0, 70, 10), which generates a series of breaks from 0 to 70 in
increments of 10. Try breaks = seq(0, 70, 5) and observe the difference.
Tip 3: When substituting your own data note the polarFreq function requires
data frames with a field "date" that can be in either POSIXct or standard R Date
format. It should be set to the Greenwich Mean Time time-zone. The data should
ideally be at hourly or higher resolution.
Tip 4: The argument grid.line=10 controls the spacing of the dotted grid lines.
The argument border.col="white" controls the color of the border around each
colored rectangle. Set border.col = "transparent" to have an invisible border
around each colored rectangle.
Tip 5: The images used the assignment pollutant <- c("o3") to select the ozone
observations in the mydata data frame. This data frame also contains a
number of other air quality observations taken in central London:
1. "ws" - Wind speed.
2. "wd" - Wind direction, in degrees from North.
3. "nox" - Oxides of nitrogen concentration.
4. "no2" - Nitrogen dioxide concentration.
5. "co" - Carbon monoxide concentration.
6. "pm10" - Particulate PM10 fraction measurement.
7. "so2" - Sulfur dioxide concentration.
8. "pm25" - Particulate PM2.5 fraction measurement.
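To see what Tip 2 changes, the sketch below prints the two breaks vectors and uses base R's cut() to sort a few made-up ozone values into the resulting intervals. It only illustrates the interval idea; polarFreq applies the breaks to its own binned statistics.

```r
breaks.10 <- seq(0, 70, 10)   # 0, 10, ..., 70: seven intervals
breaks.5  <- seq(0, 70, 5)    # 0, 5, ..., 70: fourteen intervals
o3 <- c(3, 12, 27, 41, 68)    # made-up ozone values

# include.lowest = TRUE keeps a value of exactly 0 in the first interval
coarse <- cut(o3, breaks.10, include.lowest = TRUE)
fine   <- cut(o3, breaks.5,  include.lowest = TRUE)
data.frame(o3, coarse, fine)
```

Halving the step doubles the number of color classes, so neighbouring values that shared a color under seq(0, 70, 10) can separate under seq(0, 70, 5).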
install.packages("openair")
require(openair)
data(mydata) # hourly air quality data frame, including ozone ("o3")
pollutant <- c("o3") # Ozone data
title=c("Variation in Ozone concentration in central London")
sub=c("1st January 1998 to 23rd June 2005")
polarFreq (mydata,paddle=FALSE,statistic="max",pollutant=
pollutant,annotate=F, key.footer = pollutant, key.position = "right", key =
TRUE,main=title,
sub=sub,auto.text=TRUE,grid.line=10,border.col="white",breaks = seq(0, 70,
10))
NOTES
The chart shows a traditional wind rose using data measured at Marylebone,
London from 1st January 1998 to 23rd June 2005. Wind directions are
divided into eight compass directions. The circles around the image represent
the percentage of time the wind blows from a given direction. For example, the
branch to the north west just reaches the 10% ring, implying that the wind blew
from that direction 10% of the time. Calm has no direction.
Tip 1: Data for the oz.windrose function needs to be in a format where the rows
represent speed ranges and the columns represent wind directions. Furthermore
the data should be in percentages such that the sum of all data in your table is
equal to 100. This is best achieved using the bin.wind.records function. This
function classifies wind direction and speed records into a matrix of
percentages of observations in speed and direction bins. Use the argument ndir
to change the number of direction bins to be used in the wind rose. The
traditional number of bins is ndir=8.
Tip 2: Use the argument speed.col in the oz.windrose function to change the
color of the branches of the wind rose. The default colors, shown in the above
chart are obtained by speed.col
=c("#dab286","#fe9a66","#ce6733","#986434"). You could also set alternative
colors such as speed.col =c("red","green","purple","yellow").
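Tip 1 describes the table bin.wind.records produces: rows for speed ranges, columns for direction sectors, entries as percentages summing to 100. wind_table below is a hypothetical base-R helper that builds a table of the same shape so the layout is easy to inspect. It is not plotrix's implementation, and details such as the default speed breaks and the handling of calm periods may differ.

```r
# Hypothetical helper: percentage of records in each speed band (rows)
# and direction sector (columns), with sector 1 centred on North.
wind_table <- function(wd, ws, ndir = 8, speed.breaks = c(0, 10, 20, 30)) {
  width <- 360 / ndir
  # shift by half a sector so that, e.g., 350 and 10 degrees both count as North
  sector <- floor(((wd + width / 2) %% 360) / width) + 1
  band <- cut(ws, c(speed.breaks, Inf), right = FALSE)   # [0,10), [10,20), ...
  counts <- table(band, factor(sector, levels = seq_len(ndir)))
  100 * counts / sum(counts)                              # percentages summing to 100
}

pct <- wind_table(wd = c(0, 350, 90, 180), ws = c(5, 15, 25, 35))
print(pct)
```

With ndir = 8 the columns run N, NE, E, SE, S, SW, W, NW, so the record at 90 degrees lands in column 3 and the one at 180 degrees in column 5.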
install.packages("openair")
install.packages("plotrix")
require(openair)
require(plotrix)
raw.data<-cbind(mydata$wd,mydata$ws)
data<-na.omit(raw.data)# remove missing observations
table<-bin.wind.records(data[,1],data[,2]*3.6,ndir=8) # create table; convert wind speed from m/s to km per hour
par(bg="cornsilk")
oz.windrose(table)
NOTES
The chart is constructed from statistics, in arrests per 100,000 residents for
assault, murder, and rape in each of the five Pacific US states in 1973. The
shading goes from dark red to very light yellow. The darker the shading the
higher the number of violent offenses. The variable UrbanPop on the horizontal
axis refers to the size of the urban population; the darker the shading, the higher
the urban population.
Tip 1: The heatmap function requires data in a matrix format. When using your
own data or other datasets you can use the as.matrix function to coerce data into
a matrix.
Tip 2: You can change the size of the text on the row and column labels by using
the arguments cexRow and cexCol respectively. The arguments take positive values.
The larger the value, the larger the text. As a rule of thumb set both equal to 1
and then experiment for best effect.
Tip 3: The heat.colors function, passed through the col argument, takes two
parameters. The first is the number of colors to use. The second relates to the
transparency of the colors. The chart in this chapter used 10 colors. Try
experimenting with a lower number. The chart in this chapter set transparency
equal to 1 (no transparency). It takes values between 0 and 1. Try experimenting
with different values; the visual impact can be quite stunning.
Tip 4: The heat.colors palette draws the heatmap using shades of red, orange and
yellow. Many other options are available. For example terrain.colors(n, alpha =
1), topo.colors(n, alpha = 1) and cm.colors(n, alpha = 1). The parameter n
refers to the number of colors and alpha the degree of transparency. As an
illustration you might set col= terrain.colors(10, 1). Experiment with color and
parameter values to see if trends in the dataset appear more pronounced.
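The code for this chart is not reproduced in this section, so the sketch below uses base R's stats::heatmap, which accepts the cexRow, cexCol and col arguments described in the tips, together with as.matrix() to coerce the five Pacific states of USArrests into a matrix (Tip 1). Treat it as an approximation of the chart rather than its original code; the scale = "column" choice is an assumption made so the four variables are comparable.

```r
# The five Pacific states, coerced from a data frame to a numeric matrix
pacific <- as.matrix(USArrests[c("Alaska", "California", "Hawaii",
                                 "Oregon", "Washington"), ])
pal <- heat.colors(10, alpha = 1)   # 10 opaque red-to-yellow colors

# scale = "column" standardizes each variable before coloring
heatmap(pacific, col = pal, scale = "column", cexRow = 1, cexCol = 1)
```

Swapping pal for terrain.colors(10, 1) or cm.colors(10, 1) tries the alternative palettes from Tip 4 without touching anything else.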
The chart is constructed from simulated circular data representing the foraging
direction of eight ants from a single colony at various times of the year. The
shading in the main panel goes from dark red (higher circular values) to very
light yellow (lower circular values). The column dendrogram (top) shows the
relationship between the various months of the year. The row dendrogram (on
left of diagram) shows the relationship between the various ants.
Tip 2: You can change the size of the text on the row and column labels by using
the arguments cexRow and cexCol respectively. The arguments take positive values.
The larger the value, the larger the text. As a rule of thumb set both equal to 1
and then experiment for best effect.
Tip 3: You can insert a title on to the chart by adding the argument main="my
title" to the heatmap.circular function. To suppress the row dendrogram add the
argument Rowv = NA to the heatmap.circular function. To suppress the column
dendrogram add the argument Colv = NA to the heatmap.circular function.
Tip 4: The heat.colors function, passed through the col argument, takes two
parameters. The first is the number of colors to use. The second relates to the
transparency of the colors. The chart in this chapter used 10 colors. Try
experimenting with a lower number of colors. The chart in this chapter set
transparency equal to 1 (no transparency). It takes values between 0 and 1. Try
experimenting with different values; the visual impact can be quite stunning.
Tip 5: The heat.colors function plots shades of red, orange and yellow. Many
other options are available. For example terrain.colors(n, alpha = 1),
topo.colors(n, alpha = 1) and cm.colors(n, alpha = 1). The parameter n refers to
the number of colors and alpha the degree of transparency. As an illustration you
might set col= terrain.colors(10, 1). Experiment with color and parameter
values to see if trends in the dataset appear more pronounced.
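Tip 3 describes suppressing dendrograms with Rowv = NA and Colv = NA. Those arguments are documented for base R's stats::heatmap, so the sketch below demonstrates them there on a made-up 8-by-12 matrix standing in for the ant data. It assumes heatmap.circular treats them the same way, as the tip indicates.

```r
set.seed(7)
# Made-up stand-in for the ant data: 8 ants (rows) by 12 months (columns),
# each cell an angle in radians
ants <- matrix(runif(96, 0, 2 * pi), nrow = 8,
               dimnames = list(paste("Ant", 1:8), month.abb))

# Rowv = NA and Colv = NA: no dendrograms, rows and columns kept in input order
hm <- heatmap(ants, Rowv = NA, Colv = NA, scale = "none",
              col = heat.colors(10, alpha = 1), main = "my title")
```

Because no clustering is done, the returned row and column orders are simply 1:8 and 1:12, which makes month-by-month comparisons easier to read.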
The chart is constructed from statistics, in arrests per 100,000 residents for
assault, murder, and rape in each of the five Pacific US states in 1973. The
shading goes from dark red to very light yellow. The darker the shading the
higher the number of violent offenses. The variable UrbanPop on the horizontal
axis refers to the size of the urban population; the darker the shading, the higher
the urban population. The trace (solid line) is drawn down the columns. The
distance of the line from the center of each color-cell is proportional to the size
of the measurement.
Tip 1: The heatmap.2 function requires data in a matrix format. When using your
own data or other datasets you can use the as.matrix function to coerce data into
a matrix.
Tip 2: You can change the size of the text on the row and column labels by using
the arguments cexRow and cexCol respectively. The arguments take positive values.
The larger the value, the larger the text. As a rule of thumb set both equal to 1
and then experiment for best effect.
Tip 3: You can insert a title by adding the argument main="my title" to the
heatmap.2 function. To activate the row dendrogram add the argument Rowv =
TRUE. To suppress the column dendrogram add the argument Colv = NA.
Tip 4: The heat.colors function, passed through the col argument, takes two
parameters. The first is the number of colors to use. The second relates to the
transparency of the colors. The chart in this chapter used 10 colors. Try
experimenting with a lower number of colors. The chart in this chapter set
transparency equal to 1 (no transparency). It takes values between 0 and 1.
Tip 5: The heat.colors palette uses shades of red, orange and yellow. Many other
options are available. For example terrain.colors(n, alpha = 1), topo.colors(n,
alpha = 1) and cm.colors(n, alpha = 1). The parameter n refers to the number of
colors and alpha the degree of transparency. As an illustration you might set
col= terrain.colors(10, 1). Experiment with color and parameter values to see
if trends in the dataset appear more pronounced.
Tip 6: To add a color key to the chart set the argument key=TRUE. You can set
the size of the color key by using the keysize argument. It takes positive values.
The larger the value, the larger the key. Try experimenting with different values.
Good starting points are keysize=2 and keysize=1.
The chart is constructed from statistics, in arrests per 100,000 residents for
assault, murder, and rape in each of the five Pacific US states in 1973. The
shading goes from dark red to very light yellow. The darker the shading the
higher the number of violent offenses. The variable UrbanPop on the horizontal
axis refers to the size of the urban population; the darker the shading, the higher
the urban population. The chart has been annotated with comments on those
areas where the researcher has sufficient information to recommend policy
action (marked “yes”) and those areas where further information is sought
(marked “no”).
Tip 1: The heatmap.2 function requires data in a matrix format. When using your
own data or other datasets you can use the as.matrix function to coerce data into
a matrix.
Tip 2: You can change the size of the text on the row and column labels by using
the arguments cexRow and cexCol respectively. The arguments take positive values.
The larger the value, the larger the text. As a rule of thumb set both equal to 1
and then experiment for best effect.
Tip 3: You can insert a title on to the chart by adding the argument main="my
title" to the heatmap.2 function. To activate the row dendrogram add the
argument Rowv = TRUE. To suppress the column dendrogram add the argument
Colv = NA.
Tip 4: The heat.colors function, passed through the col argument, takes two
parameters. The first is the number of colors to use. The second relates to the
transparency of the colors. The chart in this chapter used 10 colors. Try
experimenting with a lower number of colors. The chart in this chapter set
transparency equal to 1 (no transparency). It takes values between 0 and 1. Try
experimenting with different values; the visual impact can be quite stunning.
Tip 5: The heat.colors palette uses shades of red, orange and yellow. Many other
options are available. For example terrain.colors(n, alpha = 1), topo.colors(n,
alpha = 1) and cm.colors(n, alpha = 1). The parameter n refers to the number of
colors and alpha the degree of transparency. As an illustration you might set
col= terrain.colors(10, 1). Experiment with color and parameter values to see
if trends in the dataset appear more pronounced.
Tip 6: To add a color key to the chart set the argument key=TRUE. You can set
the size of the color key by using the keysize argument. It takes positive values;
larger values increase the size of the key. Try experimenting with different values.
Good starting points are keysize=2 and keysize=1.
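The “yes” and “no” annotations are supplied to heatmap.2 through its cellnote argument, a character matrix with one entry per cell of the data. The listing that follows fills that matrix one cell at a time; the sketch below builds the same matrix by indexing with two-column (row, column) matrices, using the cell positions from that listing.

```r
# Character matrix of annotations: one cell per state (row) and variable (column)
notes <- matrix("", nrow = 5, ncol = 4)

# (row, column) positions to label, one pair per row of each index matrix
yes.cells <- cbind(c(1, 2, 2, 1), c(1, 1, 3, 4))
no.cells  <- cbind(c(4, 3, 4, 3), c(1, 2, 3, 4))

notes[yes.cells] <- "yes"
notes[no.cells]  <- "no"
notes
```

Keeping the positions in two small matrices makes it easy to change which cells are flagged without editing eight separate assignments.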
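install.packages("gplots") # datasets ships with base R, so it does not need installing
require(datasets)
require(gplots)
# rows 2, 5, 11, 37 and 47 are Alaska, California, Hawaii, Oregon and Washington;
# negating the values makes higher arrest rates map to the darker end of heat.colors
data<- -as.matrix(USArrests[c(2,5,11,37,47),])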
text<-matrix("",5,4)
text[1,1]= "yes"
text[2,1]= "yes"
text[2,3]= "yes"
text[1,4]= "yes"
text[4,1]= "no"
text[3,2]= "no"
text[4,3]= "no"
text[3,4]= "no"
heatmap.2(data, scale="column", cexCol=1,cexRow=1 ,col= heat.colors(10,
alpha = 1), density.info='none', cellnote=text,
trace="none",notecol="darkblue",Rowv = FALSE,key=FALSE)
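The heatmap.2 call above uses scale = "column", which standardizes each variable so that UrbanPop (a percentage) and the arrest rates share a common scale before coloring. Base R's scale() applies a mean and standard deviation standardization of the same kind, which makes it easy to inspect the values being colored. The sketch works on the un-negated arrest data for readability.

```r
pacific <- as.matrix(USArrests[c(2, 5, 11, 37, 47), ])

# Center each column on its mean and divide by its standard deviation
z <- scale(pacific)
round(z, 2)
```

After scaling, every column has mean 0 and standard deviation 1, so a dark cell means "high relative to the other Pacific states" rather than "high in absolute terms".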
NOTES
The chart is constructed from the correlation between the eight air quality
variables collected at the air quality monitoring station in Marylebone, London.
The actual correlations are printed inside each square. The grayscale shading of
each square represents the strength of correlation. The scale runs from black
(negative 100% correlation) to white (positive 100% correlation).
Tip 1: You can replace the squares with hexagons by adding the argument
do.hex=TRUE to the plotting function color2D.matplot.
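The values shown in each square are correlations multiplied by 100. Because openair's mydata is not needed to show the arithmetic, the sketch below uses the built-in airquality data as a stand-in. It stores the result as cor.pct rather than reusing the name cor; R would still find the cor() function, but distinct names make the code easier to read.

```r
# Stand-in data: complete rows of four airquality measurements
x <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])

# Correlations expressed as percentages, rounded to one decimal place
cor.pct <- round(cor(x) * 100, 1)
cor.pct
```

The diagonal is always 100 (each variable with itself), which is why it appears as the lightest shade in the chart.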
install.packages("openair")
install.packages("plotrix")
require(openair)#contains the data
require(plotrix)# draws the plot
x<-na.omit(mydata[,2:9])#remove missing values
cor<- cor(x)*100
par(bg="cornsilk")
color2D.matplot(x=round(cor,1), show.values=TRUE,axes=FALSE,
xlab="",ylab="")
#create basic plot area
axis(side=1,at=(0.7:7.7), labels=colnames(cor),cex.axis=1, tick=FALSE)
axis(side=2,at=(0.7:7.7), labels=colnames(cor),cex.axis=1, tick=FALSE)
title("London Air Quality ",col.main="darkblue",line=2, cex.main =2)
title("January 1998- June 2005",col.main="black",line=1, cex.main =1)
title("*correlations calculated from hourly measurements",col.main="black",line=-29, cex.main =0.8)
NOTES
The chart is constructed from the correlation between eight air quality variables
collected at the air quality monitoring station in Marylebone, London. The size
of each square is proportional to the correlation, with the sign indicated by the
color of the square.
Tip 1: To draw a shaded correlation matrix instead of a Hinton diagram, set
Hinton=FALSE in the plotting function color2D.matplot. Replace the squares with
hexagons by adding the argument do.hex=TRUE to the plotting function color2D.matplot.
Tip 2: To see the actual variable values set the argument show.values=TRUE.
You can also supply your own x and y axis labels by setting xlab="my x axis
label",ylab="my y axis label".
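A Hinton diagram encodes magnitude by square size and sign by color. The base-R sketch below draws one from the correlations of the built-in airquality data, using symbols() so that each square's area tracks the absolute correlation (side proportional to its square root), with white for positive and black for negative values. This scaling is an assumption made for the sketch; color2D.matplot with Hinton=TRUE may size its squares differently.

```r
r <- cor(na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]))
k <- nrow(r)
side <- 0.9 * sqrt(abs(r))                  # square area tracks |correlation|
fill <- ifelse(r >= 0, "white", "black")    # color encodes the sign

plot(0, type = "n", xlim = c(0.5, k + 0.5), ylim = c(0.5, k + 0.5),
     axes = FALSE, xlab = "", ylab = "", asp = 1)
rect(0.5, 0.5, k + 0.5, k + 0.5, col = "grey60", border = NA)  # neutral background
symbols(as.vector(col(r)), as.vector(row(r)), squares = as.vector(side),
        inches = FALSE, bg = as.vector(fill), fg = "grey60", add = TRUE)
axis(1, at = 1:k, labels = colnames(r), tick = FALSE)
axis(2, at = 1:k, labels = rownames(r), tick = FALSE, las = 1)
```

The grey background matters: without it, white squares for positive correlations would vanish against a white page.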
install.packages("openair")
install.packages("plotrix")
require(openair)#contains the data
require(plotrix)# draws the plot
x<-na.omit(mydata[,2:9])#remove missing values
cor<- cor(x)*100
par(bg="tan")
color2D.matplot(x=round(cor,1), Hinton=TRUE, show.values=FALSE,axes=FALSE,
xlab="",ylab="",show.legend=TRUE,extremes=c(2,3))
axis(side=1,at=(0.7:7.7), labels=colnames(cor),cex.axis=1, tick=FALSE)
axis(side=2,at=(0.7:7.7), labels=colnames(cor),cex.axis=1, tick=FALSE)
title("Hinton Air Quality Map",col.main="purple",line=2, cex.main =2)
title("Central London",col.main="purple",line=1, cex.main =1)
title("*Hourly measurements taken between January 1998- June 2005",col.main="black",line=-27.9, cex.main =0.6)
NOTES
The chart is constructed from the correlation between the six socioeconomic
indicators collected in French-speaking provinces of Switzerland around 1888.
The size of the rectangular bars indicates the strength of the correlation. The
larger the bar, the higher the absolute correlation between two indicators. The
bars are color coded to help distinguish positively correlated indicators from
negatively correlated indicators.
#add legend
legend(x=5.5,y=1.3, inset=.05, bty="n", cex=0.75, text.col ="darkred",
c("Strong Pos", "Moderate Pos", "Moderate Neg","Strong Neg"), fill= c("#D97088FF",
"#A49100FF", "#00AA83FF", "#B678D5FF"))
Tip 2: You can change the size of the text on the x and y axis by using the
argument cex.labels. The default value is 1, larger values increase the size of the
x,y axis labels.
Tip 3: The function title is a simple, fast and flexible way to add a title to a
chart. Use the line argument to place the title precisely where you require it.
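Since bar size reflects the absolute correlation once the diagonal is zeroed, a quick numerical check of which bar should dominate the chart is to locate the largest absolute off-diagonal entry of cor(swiss), as sketched below.

```r
r <- cor(swiss)
diag(r) <- 0                                   # ignore each variable's self-correlation
strongest <- which(abs(r) == max(abs(r)), arr.ind = TRUE)

# Each pair appears twice because the correlation matrix is symmetric
data.frame(var1 = rownames(r)[strongest[, 1]],
           var2 = colnames(r)[strongest[, 2]],
           r = round(r[strongest], 2))
```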
require(plotrix) # provides battleship.plot and color.scale
data<-cor(swiss)
#set the diagonal elements (each variable's correlation with itself) to zero
diag(data) <- 0
par(bg="burlywood1")
battleship.plot(data,col=color.scale(-data,c(0,300),70,60, color.spec="hcl"),
cex.labels=0.8,border="burlywood1")
Figure 46 1 Three groups of desert ants –density plot with rose diagram
The panel displays the circular density and rose diagram for the orientation of
three sets of long-legged desert ants after one eye on each ant was 'trained' to
learn the ant's home direction, then covered and the other eye uncovered. The
top three images show the density fit for each group to the circular distribution.
The bottom three images present the rose diagrams for each group of ants. The
size of the “petals” indicates the primary orientation of each group of ants.
Tip 1: The rose.diag function requires data in a circular format. This means the
data needs to be converted to circular class. When using your own data or other
datasets you can use the statement circular(your_data) to coerce data into a
circular format.
Tip 2: You can exclude the axes on each chart by setting the argument
axes=FALSE in the plot function.
Tip 3: Change the background color by setting the argument bg="your chosen
color".
Tip 4: Change the plot region using par(mfrow=c(x,y)), where x is the number
of rows, and y is the number of columns. For example, par(mfrow=c(2,2))
would split the graphic output into four areas and then you could plot a chart to
each specific area.
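Tip 4 in practice: the sketch below splits the device into the 2-by-3 grid used for the six ant panels, draws a placeholder histogram of simulated data in each area, and then restores the previous settings. The histograms only stand in for the density and rose plots.

```r
set.seed(1)
# Split the device into 2 rows x 3 columns and keep the old settings
old.par <- par(mfrow = c(2, 3), bg = "cornsilk")
for (i in 1:6) {
  hist(rnorm(50), main = paste("Panel", i), xlab = "")  # one chart per area
}
par(old.par)   # restore the previous layout and background
```

Saving the value returned by par() and passing it back afterwards avoids later charts being squeezed into the last panel grid by accident.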
rose.diag(ant.1, bins=15,axes=TRUE,ticks=TRUE
,border="black",col="darkred",tol=0,prop=1.5, main="Ant group 1")
rose.diag(ant.2, bins=15,axes=TRUE,ticks=TRUE
,border="black",col="darkgreen",tol=0,prop=1.5, main="Ant group 2")
rose.diag(ant.3, bins=15,axes=TRUE,ticks=TRUE
,border="black",col="darkblue",tol=0,prop=1.5, main="Ant group 3")
NOTES
The chart displays the death rate per 1000 in Virginia during 1940. The data are
broken out by gender and by location (urban, rural). The cylindrical bars are
shaded by age group.
Tip 1: You can draw standard rectangular bars by setting the argument
cylindrical=FALSE.
Tip 2: It is possible to add colors to the legend plot by specifying the color
name rather than number, for example fill= c("red", "orange", "purple", "green",
"blue")
Tip 3: Change the size of the text in the legend by using the argument cex. It has
a default value of 1, larger values increase the text size.
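The barp call for this chart is cut short in the listing below. As a rough base-R alternative, barplot() with beside = TRUE draws flat (not cylindrical) grouped bars from VADeaths, and the legend uses color names as described in Tip 2. The colors are arbitrary choices, and this is not the chapter's own code.

```r
age.cols <- c("red", "orange", "purple", "green", "blue")  # one color per age group (row)
par(bg = "lightskyblue1")
barplot(VADeaths, beside = TRUE, col = age.cols,
        main = "Death rates per 1000 in Virginia in 1940")
legend("topleft", legend = rownames(VADeaths), fill = age.cols,
       bty = "n", cex = 0.75, title = "Age Range")
```

Because VADeaths has age groups as rows and population groups as columns, beside = TRUE puts the five age bars side by side within each of the four population groups.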
install.packages("plotrix")
require(plotrix)
require(datasets)
age<-c(54,59,64,69,74)
names=colnames(VADeaths)
main= "Death rates per 1000 in Virginia in 1940"
par(bg="lightskyblue1")
barp(VADeaths,col=color.scale(age,c(1.1,0),c(0,1.1),c(0,0.1)),cylindrical=TRUE,names
age.range=c("50-54", "55-59","60-64", "65-69", "70-74")
legend(x="topright", inset=.05, bty="n", cex=0.75, text.col ="darkred",
legend=age.range, fill= c("#FF0000FF", "#BF4006FF", "#80800DFF",
"#40BF13FF", "#00FF1AFF"), title="Age Range")
NOTES
The chart displays 48-hour variation in various air quality measurements taken
in central London from January 1st 1998 to June 23rd 2005. The length of the red
line captures the degree of variation of each environmental factor;
the blue square in the middle of each line represents the median or average
value.
Tip 1: Change the background color of the plot using par(bg="my color") .
Tip 2: The observations are sorted by average value, this is controlled by the
argument sort.segs=TRUE. Set to FALSE to plot in the order they appear in the
dataset.
Tip 4: The arguments pch, col, bg and col.main control the type of shape used to
represent the median (values 21-25 give filled shapes that use the bg color), the
color of the lines representing variation, the fill color of the shape used to
represent the median and the color used in the title.
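The red line and the square in each segment summarize the spread and center of one variable. summarise_segs below is a hypothetical base-R helper showing one such summary (mean plus or minus one standard error, after dropping missing values); plotrix's get.segs computes its own summary matrix, and its defaults may differ.

```r
# Hypothetical helper: center and spread for each element of a list,
# one column per variable (mean +/- one standard error)
summarise_segs <- function(lst) {
  sapply(lst, function(v) {
    v <- v[!is.na(v)]                      # drop missing values
    se <- sd(v) / sqrt(length(v))          # standard error of the mean
    c(center = mean(v), lower = mean(v) - se, upper = mean(v) + se, n = length(v))
  })
}

segs <- summarise_segs(list(ws = c(1, 2, 3), co = c(10, 20, NA)))
segs
```

Each column of segs corresponds to one segment of the plot: lower and upper give the ends of the line, center gives the position of the marker.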
install.packages("openair")
install.packages("plotrix")
require(openair)#contains the data
require(plotrix)# draws the plot
temp<-vector("list",7) # empty list with one slot for each of the seven variables
temp[[1]]<-x1
temp[[2]]<-x2
temp[[3]]<-x3
temp[[4]]<-x4
temp[[5]]<-x5
temp[[6]]<-x6
temp[[7]]<-x7
data<-get.segs(temp)
colnames(data)=c("ws", "wd", "nox", "no2" , "pm10", "so2", "co")
par(bg="cornsilk")
centipede.plot(data*10000,sort.segs=TRUE,right.labels=rep("",7),main="Air
Quality Data - London",xlab="(48 hour change/ maximum 48 hour
change)",pch=22,col="red",bg="blue",col.main="blue")
NOTES
Figure 49 1 Probability and empirical plots for three groups of desert ants
The chart displays probability and empirical probability plots for the
orientation of three sets of long-legged desert ants after one eye on each ant was
'trained' to learn the ant's home direction, then covered and the other eye
uncovered. The top three images show the probability plot for each group relative
to the von Mises distribution. The closer the dotted points are to the solid black
line, the better the observations fit the von Mises distribution. The bottom three
charts plot the empirical cumulative distribution function of each of the three
groups of ants.
Tip 1: The pp.plot and plot.edf functions require data in a circular format. This
means the data needs to be converted to circular class. When using your own
data or other datasets you can use the statement circular (your_data) to coerce
data into a circular format.
Tip 2: You can include axes on each chart by setting the argument axes=TRUE
in the plot function.
Tip 3: Change the background color by setting the argument bg="your chosen
color".
Tip 4: Change the plot region using par(mfrow=c(x,y)), where x is the number
of rows, and y is the number of columns. For example, par(mfrow=c(2,2))
would split the graphic output into four areas and then you could plot a chart to
each specific area.
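plot.edf draws the empirical distribution function of each group. For plain angles in radians, base R's ecdf() gives the same kind of step function once the values are wrapped into [0, 2π), as sketched below on made-up directions. Unlike the circular package, it ignores the zero and rotation settings of circular objects (such as zero=pi/2 in the listing).

```r
# Made-up directions in radians, wrapped into [0, 2*pi)
theta <- c(0.3, 1.2, 2.0, 2.0, 5.5) %% (2 * pi)

F.hat <- ecdf(theta)          # step function: share of angles <= a given value
F.hat(c(0, 2.0, 2 * pi))      # 0, 0.8 and 1
plot(F.hat, main = "Empirical distribution of directions", xlab = "radians")
```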
install.packages("circular")
require(circular)
ant.1<-circular(fisherB10$set1,zero=pi/2)
ant.2<-circular(fisherB10$set2,zero=pi/2)
ant.3<-circular(fisherB10$set3,zero=pi/2)
par(mfrow=c(2,3))
par(bg="cornsilk")
pp.plot(ant.1,col="darkred",pch=19,main="Ant group 1")
pp.plot(ant.2,col="darkgreen",pch=19,main="Ant group 2")
pp.plot(ant.3,col="darkblue",pch=19,main="Ant group 3")
plot.edf(ant.1,col="darkred",pch=19,main="Ant group 1", ylab="Probability",
xlab="value")
plot.edf (ant.2,col="darkgreen",pch=19,main="Ant group 2",
ylab="Probability", xlab="value")
plot.edf (ant.3,col="darkblue",pch=19,main="Ant group 3", ylab="Probability",
xlab="value")
NOTES
Tip 1: Change the color of the stacked bars by using the argument cols= "my
color". Quick and easy alternatives include “default”, “increment” and “heat”.
Tip 2: Set the argument normalise = FALSE to plot the actual frequencies in
place of the proportions.
Tip 3: The argument box.width determines the size of the gap between colored
bars. It takes a default value of 1 (no gap). Values less than one increase the
gap/space between the colored bars.
Tip 4: The argument key.position determines the location of the legend. It can
take values "right", "left", "top", "bottom".
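Tip 2's normalise setting switches between raw counts and proportions. The sketch below shows the arithmetic on a made-up table of counts by year and wind sector: prop.table() with margin = 1 divides each row by its total, so every year's bar would sum to 1. It illustrates the idea only and is not timeProp's internal code.

```r
# Made-up counts by year (rows) and wind sector (columns)
counts <- matrix(c(30, 10, 60,
                   20, 20, 60), nrow = 2, byrow = TRUE,
                 dimnames = list(year = c("1998", "1999"), wd = c("N", "E", "S")))

props <- prop.table(counts, margin = 1)   # each year now sums to 1
props
```

Normalizing hides differences in the total number of observations per year but makes the mix of wind directions directly comparable between years.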
install.packages("openair")
require(openair)
data(mydata)
title="Ozone concentration by wind direction in London"
timeProp(mydata, pollutant="o3", avg.time="year", proportion="wd",normalise
= TRUE ,cols="jet",box.width=.9,key.position="right", statistic =
"frequency",xlab= "Year",main=title,cex.main=1.3)
NOTES
The top panel displays ozone concentration (grey line) in central London over
the years 1998 to 2005. The cyan bar directly underneath the ozone time series
indicates the presence (cyan) or absence (black) of data at a particular point in
time. The box to the right of the top panel presents the density curve (red) of
ozone. The remaining four panels (nitrogen dioxide concentration, Oxides of
nitrogen concentration, wind direction and wind speed) have a similar
interpretation.
Tip 1: Change the colors used for the density curve, the trend of the pollution
values, and the present and missing data by setting the arguments col.hist,
col.trend, col.data and col.mis respectively to "my color".
Tip 2: For greater clarity plot fewer time series. This can be achieved by using
data <- temp[,c(1:number)], where column 1 holds the date and number is one more
than the number of time series you wish to observe. You can plot all of the data in
mydata by setting data <- mydata in the above R code.
Tip 3: To use default labels for the x-axis omit the argument xlab. To add
custom text to the y-axis use the argument ylab="my text".
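The presence/absence bar shows where data are missing. A quick numerical counterpart, sketched here on the built-in airquality data rather than mydata, is the percentage of missing values in each column.

```r
# Percentage of missing values in each column of the built-in airquality data
pct.missing <- round(100 * colMeans(is.na(airquality)), 1)
pct.missing
```

A column with a high percentage here would show long stretches of the "missing" color in the presence/absence bar.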
install.packages("openair")
require(openair)
data(mydata)#dataframe contains the data on ozone concentrations
temp<-mydata
data <- temp[,c(1:6)]
title="London Air Quality Data"
summaryPlot(data,type="density",col.hist="red",col.trend="grey",col.data="lightblue",co
NOTES
The chart displays the eight counties that took part in the first English cricket
county championship, which began in 1890. Surrey won the first championship
and are the first team listed under 1890. Sussex scored the lowest number of
points in that first season and are therefore placed last for 1890. The lines from
each team represent their relative ranking in both 1948 and 2012 using total
points earned.
Tip 1: The color of the lines was set using col=rainbow(8), you can also specify
the colors of your choice directly by using col=c("your first color","your
second color"…,"your final color"). For example, try
col=c("red","green","yellow","purple","blue","orange","darkred","white")
Tip 2: The argument rank=TRUE ranks the values before plotting. To turn this
option off use rank =FALSE.
Tip 3: Change the background color by setting the argument bg="your chosen
color".
Tip 4: The first two arguments in the legend function determine the horizontal
and vertical location of the legend. An alternative, which works in many
instances is to specify the location such as legend("topleft",…). You can also
specify "topright", "bottomright", "bottomleft".
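Tip 2 ranks the values before plotting. In base R, rank() assigns 1 to the smallest value, so points need to be negated to give the top team rank 1, as in the sketch below. The county points are made up for illustration.

```r
# Made-up season points for three counties
season.points <- c(Surrey = 22, Lancashire = 15, Sussex = 2)

# rank() gives 1 to the smallest value, so negate to give 1 to the most points
rank(-season.points)
```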
The chart displays the allocation between bonds (blue) and stocks (tan) for a
portfolio over the years 2006 to 2013. The left axis provides the measurements
for the allocation. Overlaid on the plot are the annual returns for private equity,
real estate and gold with return values measured on the right axis.
Tip 1: Both the ltype and rtype arguments can take various values. The most
frequently used are "p" for points, "l" for lines, "b" for both, "c" for the lines part
alone of "b", "o" for both ‘overplotted’, "h" for histogram-like vertical lines, "s" for
stair steps, "S" for other steps, and "n" for no plotting.
Tip 2: You can change the right hand axis color specifying it as the first value in
the rcol argument i.e. rcol =c("my color","next color"…). Similarly, you can
change the left hand axis color using the same approach.
Tip 3: Change the background color by setting the argument bg="your chosen
color".
Tip 4: The first two arguments in the legend function determine the horizontal
and vertical location of the legend. An alternative, which works in many
instances is to specify the location such as legend("topleft",…). You can also
specify "topright", "bottomright", "bottomleft".
install.packages("plotrix")
require(plotrix)
stock.percent =c(90,95,80,70,60,65,70,70)
bond.percent =100-stock.percent
private_equity=c(31.2,21.4,-25.4,10.5,17,12.8,8,11.9)
real_estate=c(5.6,18.1,2.3,-35.7,6.1,7,5,9.3)
gold=c(23.2,31.3,7.5,25.3,28.7,13,0,-10.7)
time<-0:7
allocation <-cbind(bond.percent,stock.percent)
axis(side=1,at=(0:7), labels=
c("2006","2007","2008","2009","2010","2011","2012","2013"))
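The plotting call that uses allocation is not shown in this listing. The underlying two-axis idea can be sketched in base R with par(new = TRUE) and axis(side = 4), reusing the stock allocation and gold returns from the listing. This is a simplified alternative, not the chapter's own code.

```r
years <- 2006:2013
stock.percent <- c(90, 95, 80, 70, 60, 65, 70, 70)
gold <- c(23.2, 31.3, 7.5, 25.3, 28.7, 13, 0, -10.7)

old.par <- par(mar = c(5, 4, 4, 5), bg = "cornsilk")   # room for a right-hand axis
plot(years, stock.percent, type = "h", lwd = 12, col = "tan",
     ylim = c(0, 100), xlab = "Year", ylab = "Stock allocation (%)")
par(new = TRUE)                      # draw the next plot on top of the first
plot(years, gold, type = "b", pch = 19, col = "darkgoldenrod",
     axes = FALSE, xlab = "", ylab = "")
axis(side = 4)                       # right-hand axis for the returns
mtext("Gold annual return (%)", side = 4, line = 3)
par(old.par)
```

The second plot gets its own y range, which is exactly what a second axis is for; the shared x values keep the two series aligned year by year.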
The chart displays wind (solid green) and air temperature (purple line) taken in
New York during 1973. The left axis provides the measurements for wind, and
the right axis the measurements for temperature.
Tip 1: Change the type argument to draw different lines. It takes a range of values.
The most frequently used are "p" for points, "l" for lines, "b" for both, "c" for the
lines part alone of "b", "o" for both ‘overplotted’, "h" for histogram-like vertical lines,
"s" for stair steps, "S" for other steps, and "n" for no plotting.
Tip 2: You can change the right hand axis color by using the argument rcol="my
color". Similarly, you can change the left hand axis color by lcol="my color".
Tip 3: Change the background color by setting the argument bg="your chosen
color".
install.packages("plotrix")
require(plotrix)
attach(airquality)
x=airquality[,3]#Wind measurement
y= airquality[,4]#temperature
seq<-1:153
labels<-c("May ","June","Aug ","Sept")
par(bg="burlywood1")
twoord.plot(seq,x,y,type=c("h","l"),rcol="purple",lcol="darkgreen",lwd=2,ylab=
"Wind measurement",rylab=" Temperature",main= "Atmospheric
Measurements",xticklab=labels,xtickpos=c(0,50,100,150),sub=" New York
1973")
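The listing hard-codes four month labels at positions 0, 50, 100 and 150 and skips July. Since airquality holds one row per day from May to September 1973, the first row of each month can be found from the Month column, as sketched below; the resulting vectors could then be passed as xtickpos and xticklab.

```r
# Index of the first day of each month in the 153 daily rows of airquality
month.start <- which(!duplicated(airquality$Month))
month.lab <- month.abb[unique(airquality$Month)]
data.frame(month.lab, month.start)
```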
NOTES
The image displays Chernoff faces calculated from four variables, consumption
of ice cream per head (in pints); average family income per week (in US
Dollars); price of ice cream (per pint) and the average temperature (in
Fahrenheit). The variables used to define the face were height of face, width of
face, structure of face, height of mouth, width of mouth, degree of smiling,
height of eyes, width of eyes, height of hair, width of hair, style of hair, height of
nose, width of nose, width of ear and height of ear. The value for the height of
face was determined from the consumption of ice cream, the value for the
width of face from average family income, and so on, with the four variables
recycled in turn across the fifteen facial characteristics that define a Chernoff face.
Tip 1: To plot outline faces without color set face.type=0. You can also draw
Santa Claus faces by setting face.type=2.
Tip 2: Drop the argument labels if you don’t want to have dates above the faces.
The dates will be replaced with numbers.
Tip 3: Change the background color by setting the argument bg="your chosen
color".
install.packages("aplpack")
require("aplpack")
install.packages("Ecdat")
require(Ecdat)
data(Icecream)
dates<-c("18 Mar","17 Apr","16 May","14 Jun","13 Jul","11 Aug","09 Sep",
"08 Oct","06 Nov","05 Dec","03 Jan","01 Feb","01 Mar","30 Mar","28 Apr",
"27 May","25 Jun","24 Jul","22 Aug","20 Sep","19 Oct","17 Nov","16 Dec",
"14 Jan","12 Feb","13 Mar","11 Apr","10 May","08 Jun","11 Jul")
par(bg="cornsilk")
faces(Icecream,main="March 1951- July 1953",labels=dates, face.type=1)
Figure 56 1 State of the Union address using words to define Chernoff faces
The image displays Chernoff faces calculated from the most frequent words in
President Obama’s 2011 and 2012 State of the Union addresses. The features
used to define each face were height of face, width of face, structure of face,
height of mouth, width of mouth, degree of smiling, height of eyes, width of
eyes, height of hair, width of hair, style of hair, height of nose, width of nose,
width of ear and height of ear. The height of face was determined by the most
common word (“America”), the width of face by the next most popular word
(“People”), and so on.
Tip 1: To plot outline faces without color, set face.type=0. You can also draw
Santa Claus faces by setting face.type=2.
Tip 2: Replace ncol.plot=1 with nrow.plot=1 if you wish to plot the faces side by
side in a single row rather than stacked in a single column.
Tip 3: Change the background color by setting the argument bg="your chosen
color".
install.packages("aplpack")   # only needed once
require(aplpack)
# counts of the fifteen words in each address
c.2011 <- c(70, 33, 29, 29, 20, 13, 12, 11, 13, 15, 13, 15, 6, 11, 10)
c.2012 <- c(57, 33, 31, 20, 14, 18, 16, 16, 12, 9, 10, 7, 15, 9, 10)
words <- rbind(c.2011, c.2012)
colnames(words) <- c("America", "People", "Jobs", "Business",
                     "Time", "Government", "Nation", "Tonight", "Country", "Energy",
                     "Taxes", "Economy", "Future", "Care", "Congress")
rownames(words) <- c("2011", "2012")
par(bg = "papayawhip")
faces(words, face.type = 1, ncol.plot = 1,
      main = "President Obama's State of the Union Address")
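With only two addresses, the scaling step matters more than the word counts themselves. Assuming that faces() with its default scale = TRUE rescales each column linearly to the range [-1, 1] (our reading of aplpack, to be treated as an assumption), the sketch below applies that rule by hand. The helper rescale_col is hypothetical and is not an aplpack function.

```r
# Sketch (assumption): min-max rescale each column to [-1, 1], as we
# understand faces(..., scale = TRUE) does before drawing.
rescale_col <- function(x) {
  x <- x - min(x)
  if (max(x) > 0) 2 * x / max(x) - 1 else x   # a constant column stays at 0
}

c.2011 <- c(70, 33, 29, 29, 20, 13, 12, 11, 13, 15, 13, 15, 6, 11, 10)
c.2012 <- c(57, 33, 31, 20, 14, 18, 16, 16, 12, 9, 10, 7, 15, 9, 10)
words <- rbind("2011" = c.2011, "2012" = c.2012)
colnames(words) <- c("America", "People", "Jobs", "Business", "Time",
                     "Government", "Nation", "Tonight", "Country", "Energy",
                     "Taxes", "Economy", "Future", "Care", "Congress")

# Apply the rescaling column by column; the result is again a 2 x 15 matrix
scaled <- apply(words, 2, rescale_col)
print(scaled)
```

Under this assumption, each word ends up at -1 or 1, or at 0 when both counts are equal (People, Congress). A difference of one mention (Country, 13 versus 12) therefore moves a feature as far as a difference of thirteen (America, 70 versus 57), which is worth keeping in mind when you read faces drawn from only two observations.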
The image displays ice cream consumption plotted against temperature for the
first six observations, using Chernoff faces as the plotting symbols. The faces
are calculated from four variables: consumption of ice cream per head (in
pints), average family income per week (in US dollars), price of ice cream (per
pint) and average temperature (in degrees Fahrenheit). The features used to
define each face were height of face, width of face, structure of face, height of
mouth, width of mouth, degree of smiling, height of eyes, width of eyes, height
of hair, width of hair, style of hair, height of nose, width of nose, width of ear
and height of ear. The height of face was determined by ice cream
consumption, the width of face by average family income, and so on, with the
four variables repeated in turn across the fifteen facial characteristics that
define a Chernoff face.
Tip 1: To plot outline faces without color, set face.type=0. You can also draw
Santa Claus faces by setting face.type=2.
Tip 2: Adjust the size of the faces with the width and height arguments of
plot.faces. It may take some experimentation to get the size right for your own
data.
Tip 3: Change the background color by setting the argument bg="your chosen
color".
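install.packages("aplpack")   # only needed once
require(aplpack)
install.packages("Ecdat")     # only needed once
require(Ecdat)
data(Icecream)
b <- Icecream[1:6, ]          # first six observations
rownames(b) <- c("18 Mar", "17 Apr", "16 May", "14 Jun", "13 Jul", "11 Aug")
consumption <- b[, 1] * 100   # pints per head x 100
temperature <- b[, 4]
par(bg = "lightblue")         # set the background before the plot is drawn
plot(consumption, temperature, bty = "n")
b.faces <- faces(b, plot.faces = FALSE)   # compute the faces without drawing them
plot(b.faces, x.pos = consumption, y.pos = temperature, width = 2, height = 3)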
Figure 58 1 Final position of top three and bottom three English football teams
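The image displays the Chernoff faces of the top three and bottom three English
Premier League football teams for the 2012-2013 season. The Chernoff faces are
calculated from the following variables: points, goal difference, home wins,
home goals for, away wins, away goals for, home draws, home losses, home
goals against, away draws, away losses and away goals against.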
Tip 1: To plot outline faces without color, set face.type=0. You can also draw
Santa Claus faces by setting face.type=2.
Tip 2: Adjust the size of the faces with the width and height arguments of
plot.faces. It may take some experimentation to get the size right for your own
data.
Tip 3: Change the background color by setting the argument bg="your chosen
color".
install.packages("aplpack")   # only needed once
require(aplpack)
# final 2012-2013 statistics for each team, in the column order given below
Man_Utd  <- c(89,  43, 16, 45, 12, 41, 0, 3, 19, 5,  2, 24)
Man_City <- c(78,  32, 14, 41,  9, 25, 3, 2, 15, 6,  4, 19)
Chelsea  <- c(75,  36, 12, 41, 10, 34, 5, 2, 16, 4,  5, 23)
Wigan    <- c(36, -26,  4, 26,  5, 21, 6, 9, 39, 3, 11, 34)
Reading  <- c(28, -30,  4, 23,  2, 20, 8, 7, 33, 2, 15, 40)
QPR      <- c(25, -30,  2, 13,  2, 17, 8, 9, 28, 5, 12, 32)
league <- rbind(Man_Utd, Man_City, Chelsea, Wigan, Reading, QPR)
colnames(league) <- c("Points", "Goal Difference", "Home wins", "Home for",
                      "Away wins", "Away for", "Home draw", "Home loss",
                      "Home against", "Away draw", "Away loss", "Away against")
teams <- faces(league, plot.faces = FALSE)   # compute the faces without drawing them
par(bg = "honeydew1", col = "purple")
# empty plot to hold the axis; faces are then placed at (1,6), (2,5), ..., (6,1)
plot(0:6, 0:6, type = "n", axes = FALSE, ylab = "",
     xlab = "Season 2012-2013 final league position",
     main = "English Premier League")
axis(side = 1, at = 1:6, labels = c("1st", "2nd", "3rd", "18th", "19th", "last"))
plot(teams, x.pos = 1:6, y.pos = 6:1)
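Two of the twelve columns are not independent of the others. Assuming the standard Premier League scoring of three points for a win and one for a draw, the sketch below rebuilds Points and Goal Difference from the ten home and away columns. The short column names (GoalDiff, HomeW and so on) are illustrative abbreviations of those used above.

```r
# Check that Points and Goal Difference follow from the other ten columns
# (assumption: 3 points per win, 1 per draw).
league <- rbind(
  Man_Utd  = c(89,  43, 16, 45, 12, 41, 0, 3, 19, 5,  2, 24),
  Man_City = c(78,  32, 14, 41,  9, 25, 3, 2, 15, 6,  4, 19),
  Chelsea  = c(75,  36, 12, 41, 10, 34, 5, 2, 16, 4,  5, 23),
  Wigan    = c(36, -26,  4, 26,  5, 21, 6, 9, 39, 3, 11, 34),
  Reading  = c(28, -30,  4, 23,  2, 20, 8, 7, 33, 2, 15, 40),
  QPR      = c(25, -30,  2, 13,  2, 17, 8, 9, 28, 5, 12, 32))
colnames(league) <- c("Points", "GoalDiff", "HomeW", "HomeF", "AwayW", "AwayF",
                      "HomeD", "HomeL", "HomeA", "AwayD", "AwayL", "AwayA")

wins   <- league[, "HomeW"] + league[, "AwayW"]
draws  <- league[, "HomeD"] + league[, "AwayD"]
points <- 3 * wins + draws
goal_diff <- (league[, "HomeF"] + league[, "AwayF"]) -
             (league[, "HomeA"] + league[, "AwayA"])

# Rebuilt values side by side with the recorded ones
print(cbind(points, Points = league[, "Points"],
            goal_diff, GoalDiff = league[, "GoalDiff"]))
```

Both checks hold for all six teams, so points and goal difference repeat information already carried by other features. You may prefer to drop them, or replace them with variables that add something new, so that each facial feature encodes distinct information.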
The image displays Chernoff faces, with most features held constant, calculated
from the points scored and the numbers of wins, draws and losses for teams
playing in the first English County Championship. The faces are ranked from the
champions (Surrey) to last place (Sussex). The Chernoff faces are calculated
from the following variables: points, wins, draws and losses.
The remaining eleven Chernoff face features are given a constant value.
Tip 1: To plot outline faces without color, set face.type=0. You can also draw
Santa Claus faces by setting face.type=2.
Tip 2: If you draw the faces with plot.faces, adjust their size with its width and
height arguments. It may take some experimentation to get the size right for
your own data.
Tip 3: Change the background color by setting the argument bg="your chosen
color".
install.packages("aplpack")   # only needed once
require(aplpack)
# points, wins, draws and losses for each county
Surrey          <- c(  6, 9, 2,  3)
Lancashire      <- c(  4, 7, 4,  3)
Kent            <- c(  3, 6, 5,  3)
Yorkshire       <- c(  3, 6, 5,  3)
Nottinghamshire <- c(  0, 5, 4,  5)
Gloucestershire <- c( -1, 5, 3,  6)
Middlesex       <- c( -5, 3, 1,  8)
Sussex          <- c(-10, 1, 0, 11)
counties <- rbind(Surrey, Lancashire, Kent, Yorkshire, Nottinghamshire,
                  Gloucestershire, Middlesex, Sussex)
colnames(counties) <- c("Points", "Wins", "Draws", "Losses")
# pad with eleven constant columns so that only the first four features vary
constant <- matrix(1, nrow = nrow(counties), ncol = 11)
counties <- cbind(counties, constant)
par(bg = "honeydew1")
faces(counties, main = "1890 County Championship", face.type = 1)
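The padding with eleven constant columns is what keeps the other features still. The sketch below is a base R check that does not call aplpack. It confirms that Points equals Wins minus Losses for every county in these data, and shows what the assignment of the four columns would look like without the padding, assuming faces() recycles columns in order to fill all fifteen features.

```r
# Base R check of the 1890 data; aplpack is not needed here.
counties <- rbind(
  Surrey          = c(  6, 9, 2,  3),
  Lancashire      = c(  4, 7, 4,  3),
  Kent            = c(  3, 6, 5,  3),
  Yorkshire       = c(  3, 6, 5,  3),
  Nottinghamshire = c(  0, 5, 4,  5),
  Gloucestershire = c( -1, 5, 3,  6),
  Middlesex       = c( -5, 3, 1,  8),
  Sussex          = c(-10, 1, 0, 11))
colnames(counties) <- c("Points", "Wins", "Draws", "Losses")

# In these data every county's points equal wins minus losses
print(counties[, "Points"] == counties[, "Wins"] - counties[, "Losses"])

# Without padding, the four columns would be reused across the 15 features
# (assumption: faces() recycles columns in order)
recycled <- rep(colnames(counties), length.out = 15)
print(recycled)

# With eleven constant columns, features 5 to 15 take the same value for every county
padded <- cbind(counties, matrix(1, nrow = nrow(counties), ncol = 11))
print(dim(padded))   # 8 rows, 15 columns
```

Since points are simply wins minus losses here, the height of face adds no information beyond the wins and losses features; it does, however, make the final ranking easy to read at a glance.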
[i] This dataset is rather famous and known as Anscombe's quartet. See Anscombe, F. J. (1973). "Graphs in
Statistical Analysis". American Statistician 27 (1): 17–21. JSTOR 2682899.
[ii] See page 206 in Galton, Francis. Natural inheritance. Vol. 42. Macmillan, 1889.
[iii] The height of each box is proportional to the Pearson residual, and the width is proportional to the
square root of the expected frequency.
[iv] The crime data consisted of the following crimes: Murder, Rape, Robbery,
Assault, Burglary, Larceny and Motor Theft.
[vii] See Gao X, An H, Zhong W (2013) Features of the Correlation Structure of Price Indices. PLoS ONE
8(4): e61091. doi:10.1371/journal.pone.0061091
[viii] Image taken from the website http://www.florence-nightingale-avenging-angel.co.uk/. Also see the
insightful book by Hugh Small entitled Florence Nightingale, Avenging Angel.