R Graphs Cookbook Second Edition Sample Chapter
Second Edition
Jaynal Abedin
Hrishi V. Mittal
Chapter No. 1
"R Graphics"
In this package, you will find:
The authors' biographies
A preview chapter from the book, Chapter No. 1 "R Graphics"
A synopsis of the book's content
Information on where to buy this book
About the Authors
Jaynal Abedin currently holds the position of Senior Statistician at the Centre for
Communicable Diseases (CCD) at icddr,b (www.icddrb.org). He attained his
Bachelor's and Master's degrees in Statistics from University of Rajshahi, Rajshahi,
Bangladesh. He has vast experience in R programming and Stata, and strong
leadership skills. He has written an R package named edeR: Email Data
Extraction Using R, which is available at CRAN
(http://cran.r-project.org/web/packages/edeR/index.html). He is currently leading
a team of statisticians. He has hands-on experience in developing training material
and facilitating training in R programming and Stata along with statistical aspects in
public health research. He has authored Data Manipulation with R, Packt Publishing,
which was well received. His primary research interests include causal
inference and machine learning. He is currently involved in several ongoing public
health research projects and is a co-author of seven peer-reviewed scientific papers.
He is also working on several manuscripts in progress, and he is a
reviewer for the following two journals:
Journal of Applied Statistics (JAS)
Journal of Health Population and Nutrition (JHPN)
For More Information:
www.packtpub.com/big-data-and-business-intelligence/r-graph-cookbook--second-edition
Hrishi V. Mittal has been working with R for a few years in different capacities. He
was introduced to the exciting world of data analysis with R when he was working as
a senior air quality scientist at King's College, London, where he used R extensively to
analyze large amounts of air pollution and traffic data for London's Mayor's Air Quality
Strategy. He has experience in various other programming languages but prefers R for
data analysis and visualization. He is also actively involved in various R mailing lists,
forums, and the development of some R packages.
R Graphs Cookbook
Second Edition
The open source statistical software, R, is one of the most popular choices among
researchers from various fields. It can produce high-quality graphics, and data
visualization is one of the most important tasks in data science. Through effective
visualization, we can often uncover the underlying patterns among variables without
any sophisticated statistical analysis. In this cookbook, we focus on graphical
analysis using R, presented simply through independent, self-contained examples.
We cover the default R functionality along with more advanced visualization
techniques such as lattice, ggplot2, and three-dimensional plots. Readers will not
only learn the code to produce each graph but also, through specific examples, why
the code is written the way it is.
What This Book Covers
Chapter 1, R Graphics, introduces the reader to the R graphic system, how R graphs
work with default libraries, and also to the very recent revolution of lattice and ggplot2.
Here, readers will get a flavor of what is going to be discussed in the subsequent chapters.
Chapter 2, Basic Graph Functions, introduces recipes for some basic types of graphs,
useful in almost any kind of data analysis. We will go through all the steps to get you
going from reading your data into R, making a first graph, tweaking it to suit your needs,
and then saving and exporting it for use in presentations and publications.
Chapter 3, Beyond the Basics – Adjusting Key Parameters, looks more closely at various
arguments to graph functions and their values, highlighting common pitfalls and
workarounds. The par() function is explained with some useful examples, showing how
to adjust colors, sizes, margins, and the styles of various graph elements such as points,
lines, bars, axes, and titles. The subsequent chapters, 4 to 10, cover the graph types
introduced in the first two chapters in more detail.
Chapter 4, Creating Scatter Plots, has over a dozen recipes that cover scatter plots, some
of the simplest and most commonly used types of graphs in data analysis. We will see
how we can make more enhanced plots by adjusting various arguments and using some
new functions.
Chapter 5, Creating Line Graphs and Time Series Charts, discusses some more
intermediate to advanced recipes on customizing line graphs, improving and speeding up
line graphs with multiple lines, processing dates to make time series charts, sparklines,
and stock charts.
Chapter 6, Creating Bar, Dot, and Pie Charts, will show you how you can create many
useful variations of bar graphs and dot plots by using only the base library functions. We
will also look at a few recipes that address common criticisms of pie charts with some
ways to make them more readable.
Chapter 7, Creating Histograms, enhances the basic histogram in R by changing the
plotting mode and bins, in addition to style adjustments. We will also look at some
advanced recipes that combine histograms with other types of graphs.
Chapter 8, Box and Whisker Plots, looks into various stylistic and structural adjustments
to box plots. We will start by looking at some basic arguments to change individual
aspects of a box plot and slowly move to more advanced recipes that involve the use
of multiple function calls.
Chapter 9, Creating Heat Maps and Contour Plots, discusses various types of heat
maps to visualize correlations, trends and multivariate data, and contour plots to show
topographical information in various two- and three-dimensional ways.
Chapter 10, Creating Maps, builds on the introduction to visualizing data on
geographical maps in the first chapter and covers recipes on plotting data from the World
Bank, World Health Organization (WHO), Google Maps API, and some Geographical
Information Systems (GIS).
Chapter 11, Data Visualization Using Lattice, contains various recipes to create the
most common graphs using the lattice library. Lattice is one of the most popular data
visualization libraries in R. This chapter contains 9 different recipes ranging from bar
charts to distributional plots and empirical cumulative distribution.
Chapter 12, Data Visualization Using ggplot2, shows how we can create very
high-quality data visualization using the concept of the Grammar of Graphics. There are 8
different recipes to create the most common graphics. This chapter contains 1 special
recipe where we discuss how to annotate a graph. The annotated graph contains an
enormous amount of information. The recipes range from very basic graphics to
advanced ones, where we show how we can incorporate layered graphs, such as
a scatter plot overlaid with LOWESS and least-squares fitted lines.
Chapter 13, Inspecting Large Datasets, contains recipes related to one of the newest
concepts of finding patterns in large data through visualization. The recipes of this
chapter show how to create nice graphs that tell us a story about the pattern of
relationship among variables.
Chapter 14, Three-dimensional Visualizations, has most of its recipes centered on
creating three-dimensional graphics ranging from scatter plots to density estimations.
In this chapter, we also include how to create three-dimensional scatter plots with an
estimated linear plane.
Chapter 15, Finalizing Graphs for Publications and Presentations, discusses some tricks
and tips to add some polish to our graphs so that they can be used for publication and
presentation. We will cover many important practical topics such as exported graph file
formats, high resolution formats, vector formats such as PDF, SVG, and PS,
mathematical and scientific notations, text descriptions, fonts, graph templates,
and themes.
1
R Graphics
R provides a number of well-known facilities that produce a variety of graphs to meaningfully
visualize data. It has low-level facilities, where we work with basic shapes to draw graphs,
and high-level facilities, whose functions produce complete, good-quality graphs and are
usually built from combinations of those basic shapes. Using R, we can produce traditional
plots, trellis plots, and very high-level graphs inspired by the Grammar of Graphics, as
implemented in the ggplot2 package. The default graphics package is useful for traditional
plots, lattice provides facilities to produce trellis graphs, and the ggplot2 package is the
most powerful high-level graphical tool in R. With the low-level facilities, arranging the
basic shapes in their relative positions is an important step in creating a meaningful data
visualization. In this chapter, we will introduce both low-level graphics (also known as base
graphics) and high-level graphics using different packages. In particular, the content of this
chapter will be as follows:
Base graphics using the default package
Trellis graphs using lattice
Graphs inspired by Grammar of Graphics
Base graphics using the default package
It is well known that R has very powerful data visualization capabilities. The primary reason
behind the powerful graphical utility of R is the low-level graphical environment. The grid
graphics system of R makes data visualization much more flexible and intuitive. With the
help of the grid package, we can draw very basic shapes that can be arranged to produce
interesting data visualizations. The grid graphics system has functions that draw the
basic shapes of a high-level data visualization, including lines, rectangles, circles, and
text, along with other functions that specify where each part of the visualization goes.
Using these basic functions, we can easily produce components of high-level graphs,
such as a rectangle, rounded rectangle, circle, line, and arrow. We will now see how we can
produce these basic shapes. The following code snippet draws all of its output in a single
visualization:
# Calling grid library
library(grid)
# Creating a rectangle
grid.rect(height=0.25,width=0.25)
# A rounded rectangle
grid.roundrect(height=0.2,width=0.2)
# A circle
grid.circle(r=0.1)
# Inserting text within the shape
grid.text("R Graphics")
# Drawing a polygon
grid.polygon()
Basic shapes using the grid package
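The placement functions mentioned above can be illustrated with grid viewports. The following is a minimal sketch rather than an example from the book: the viewport names left_vp and right_vp and their sizes are chosen here purely for illustration. It splits the page into two halves and draws a different shape in each.

```r
# Load the grid package (it ships with R)
library(grid)

# Start with a blank page
grid.newpage()

# Two hypothetical viewports, each covering one half of the page
left_vp  <- viewport(x = 0.25, y = 0.5, width = 0.5, height = 1)
right_vp <- viewport(x = 0.75, y = 0.5, width = 0.5, height = 1)

# Draw a labeled rectangle inside the left half
pushViewport(left_vp)
grid.rect(width = 0.6, height = 0.3)
grid.text("left")
popViewport()

# Draw a labeled circle inside the right half
pushViewport(right_vp)
grid.circle(r = 0.3)
grid.text("right")
popViewport()
```

Positions and sizes given after pushViewport() are interpreted relative to the current viewport, so the same drawing calls can be reused in any region of the page.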
For any high-level visualization, we can use the basic shapes and arrange them as required.
Now, we will list some of the functions for high-level data visualization where the basic
shapes have been used:
plot: This is a generic function that is used to plot any kind of objects. Most
commonly, we use this function for x-y plotting
barplot: This function is used to produce a horizontal or vertical bar plot
boxplot: This is used to produce a box-whisker plot
pie: This is used to produce a pie chart
hist: This is used to produce a histogram
dotchart: This is used to produce Cleveland dot plots
image, heatmap, contour, and persp: These functions are used to generate
image-like plots
qqnorm, qqline, and qqplot: These functions are used to produce plots in order to
compare distributions
We will provide specific recipes for each of these functions in the subsequent chapters.
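As a quick illustration of a few of the functions listed above, here is a minimal sketch that uses the mtcars dataset bundled with R. The choice of variables and the 2 x 2 layout are assumptions made for this example only.

```r
# Arrange four plots in a 2 x 2 grid and remember the old settings
op <- par(mfrow = c(2, 2))

# x-y scatter plot
plot(mtcars$disp, mtcars$mpg, xlab = "disp", ylab = "mpg")

# Bar plot of the number of cars for each number of gears
gear_counts <- table(mtcars$gear)
barplot(gear_counts, xlab = "gear")

# Box-and-whisker plot of mpg for each number of gears
boxplot(mpg ~ gear, data = mtcars)

# Histogram of mpg; hist() also returns the bin counts invisibly
mpg_hist <- hist(mtcars$mpg, main = "mpg")

# Restore the previous graphical parameters
par(op)
```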
Trellis graphs using lattice
Though grid graphics have much more flexibility than trellis graphs, they are a bit difficult
to use from the point of view of general users. The lattice package enhances the data
visualization capability of R by letting relatively simple code produce much more
complex graphs, including multivariate visualizations. The lattice
package can be considered a high-level data visualization tool that is able to produce
structured graphics with the flexibility to adjust the graphs as required.
The traditional R graphics system has much more flexibility to produce any kind of data
visualization with control over each and every component. However, it is still a difficult task
for an inexperienced R programmer to produce efficient graphs. In other words, we can say
that the traditional graphics system of R is not so user friendly. It would be good if the user
could have complete high-level graphics from minimal code. To address
this shortcoming, Trellis graphics were implemented in S. The lattice add-on package,
inspired by Trellis, provides similar capabilities for R users. One of
the important features of the lattice graphics system is the formula interface. During data
visualization, we can intuitively use the formula interface to produce conditional plots, which
is difficult in the traditional graphics system.
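To make the idea of a conditional plot concrete, here is a minimal sketch of the general formula shape y ~ x | g. It uses the built-in mtcars dataset, which is an assumption of this example and not the dataset used in the rest of the section.

```r
library(lattice)

# y ~ x | g: plot mpg against disp, with one panel per number of cylinders
cond_plot <- xyplot(mpg ~ disp | factor(cyl), data = mtcars)

# lattice functions return a "trellis" object; printing it draws the plot
print(cond_plot)
```

Changing only the variable after the vertical bar changes how the panels are split, without any manual subsetting of the data.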
For example, say we have a dataset with two variables: the incubation period and the
exposure category of a certain disease. The incubation period is a numeric variable, and
the exposure category is a discrete variable with four possible values: 1, 2, 3, or
4. We want to produce a histogram for each exposure category. The following code snippet
shows you the traditional code:
# data generation
# Set the seed to make the example reproducible
set.seed(1234)
incubation_period <- c(rnorm(100,mean=10),rnorm(100,mean=15),
                       rnorm(100,mean=5),rnorm(100,mean=20))
exposure_cat <- sort(rep(c(1:4),100))
dis_dat<-data.frame(incubation_period,exposure_cat)
# Producing a histogram for each of the exposure categories 1, 2, 3,
# and 4 using traditional visualization code. par(mfrow=c(2,2))
# splits the plotting device into a 2 x 2 matrix of panels, so the
# four hist() calls below draw one histogram per value of the
# variable exposure_cat.
op<-par(mfrow=c(2,2))
hist(dis_dat$incubation_period[dis_dat$exposure_cat==1])
hist(dis_dat$incubation_period[dis_dat$exposure_cat==2])
hist(dis_dat$incubation_period[dis_dat$exposure_cat==3])
hist(dis_dat$incubation_period[dis_dat$exposure_cat==4])
par(op)
The following code snippet shows you the lattice implementation for the same histogram:
library(lattice)
histogram(~incubation_period | factor(exposure_cat), data=dis_dat)
In this lattice version, the formula interface makes the code to produce the histograms
much more intuitive to write. The part of the formula that follows the ~ symbol contains the
name of the variable for which we want the histogram, and after it we specify
the grouping variable. The ~ symbol acts like the preposition "of", for example, the histogram
of the incubation period. The vertical bar introduces the panel variable over which
we are going to repeat the histogram. Notice that we have used the factor() command here
to specify the grouping variable. If we do not convert it to a factor, then we will not be able to
distinguish which plot corresponds to which category. The factor() command creates text
labels. If the variable were left as a numeric value, it would show low to high values as though
it were a continuous scale rather than discrete categories, as shown in the following figure:
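The same formula pattern carries over to other lattice functions. The following sketch regenerates the example data so that it runs on its own, and then reuses the conditioning formula with densityplot() and a two-sided formula with bwplot(). The object names dens and boxes are illustrative only.

```r
library(lattice)

# Recreate the example data so that this block is self-contained
set.seed(1234)
dis_dat <- data.frame(
  incubation_period = c(rnorm(100, mean = 10), rnorm(100, mean = 15),
                        rnorm(100, mean = 5), rnorm(100, mean = 20)),
  exposure_cat = rep(1:4, each = 100)
)

# Same conditioning formula, different display: one density curve per panel
dens <- densityplot(~ incubation_period | factor(exposure_cat), data = dis_dat)
print(dens)

# Two-sided formula: one box per exposure category in a single panel
boxes <- bwplot(factor(exposure_cat) ~ incubation_period, data = dis_dat)
print(boxes)
```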
Now, if we move the grouping variable to the right-hand side of the formula and use the
generic plot() function instead of histogram(), then the visualization will change as follows:
plot(incubation_period ~ factor(exposure_cat), data=dis_dat)
Downloading the example code
You can download the example code files for all Packt books you
have purchased from your account at http://www.packtpub.com.
If you purchased this book elsewhere, you can visit
http://www.packtpub.com/support and register to have the files
e-mailed directly to you.
If we change the code further and just omit the factor function, then the same visualization
will be turned into a scatter plot as follows:
plot(incubation_period ~ exposure_cat, data=dis_dat)
The plot() function is a generic function. If we put two numeric variables inside this
function, it produces a scatter plot. On the other hand, if we use one numeric variable and
another factor variable, then it produces a box plot of the numeric variable for each
unique value of the factor variable.
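The behavior described here can be checked side by side. The sketch below regenerates the example data so that it is self-contained. The helper names is_factor_before, is_factor_after, and box_stats are introduced only for this illustration.

```r
# Recreate the example data so that this block is self-contained
set.seed(1234)
dis_dat <- data.frame(
  incubation_period = c(rnorm(100, mean = 10), rnorm(100, mean = 15),
                        rnorm(100, mean = 5), rnorm(100, mean = 20)),
  exposure_cat = rep(1:4, each = 100)
)

# The grouping variable is numeric until it is wrapped in factor()
is_factor_before <- is.factor(dis_dat$exposure_cat)
is_factor_after  <- is.factor(factor(dis_dat$exposure_cat))

# Numeric on the right-hand side: plot() draws a scatter plot
plot(incubation_period ~ exposure_cat, data = dis_dat)

# Factor on the right-hand side: plot() draws one box per category
plot(incubation_period ~ factor(exposure_cat), data = dis_dat)

# boxplot() with plot = FALSE returns the per-category summary statistics
box_stats <- boxplot(incubation_period ~ factor(exposure_cat),
                     data = dis_dat, plot = FALSE)
```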
Graphs inspired by Grammar of Graphics
The ggplot2 R package is based on The Grammar of Graphics (Leland Wilkinson,
Springer). Using this package, we can produce a variety of traditional graphics, and users
can build their own customized graphs as well. The beauty of this package is in its layered
graphics facilities; by stacking layers, we can produce almost any
kind of data visualization. Recently, ggplot2 has become the most searched keyword
in the R community, including the most popular R blog (www.r-bloggers.com). The
comprehensive theme system allows the user to produce publication-quality graphs with a
variety of themes of their choice. If we want to explain this package in a single sentence, then
we can say that if the data behind whatever visualization we have in mind can be structured
in a data frame, producing the visualization is a matter of a few seconds.
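The layered approach can be sketched as follows. This is an illustrative example and not one of the book's recipes: it uses the built-in mtcars dataset, and the object name layered is an assumption. Each + adds one layer on top of the previous ones.

```r
library(ggplot2)

# Start from the data and the shared aesthetic mapping, then add layers
layered <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  # Layer 1: the raw observations
  geom_point() +
  # Layer 2: a least-squares straight line
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Layer 3: a local regression (loess) smoother
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, colour = "red")

# Printing the object draws all three layers in one plot
print(layered)
```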
In Chapter 12, Data Visualization Using ggplot2, we will see different examples
and use themes to produce publication-quality graphs. However, in this introductory chapter,
we will show you one of the important features of the ggplot2 package that produces
various types of graphs. The main function is ggplot(), but with the help of different
geom functions, we can easily produce different types of graphs, such as the following:
geom_point(): This will create a scatter plot
geom_line(): This will create a line chart
geom_bar(): This will create a bar chart
geom_boxplot(): This will create a box plot
geom_text(): This will write certain text inside the plot area
Now, we will see a simple example of the use of different geom functions with the default
mtcars dataset in R:
# loading ggplot2 library
library(ggplot2)
# creating a basic ggplot object
p <- ggplot(data=mtcars)
# Creating scatter plot of mpg and disp variable
p1 <- p+geom_point(aes(x=disp,y=mpg))
# creating line chart from the same ggplot object but different
# geom function
p2 <- p+geom_line(aes(x=disp,y=mpg))
# creating bar chart of mpg variable
p3 <- p+geom_bar(aes(x=mpg))
# creating boxplot of mpg over gear
p4 <- p+geom_boxplot(aes(x=factor(gear),y=mpg))
# writing certain text into the scatter plot
p5 <- p1+geom_text(x=200,y=25,label="Scatter plot")
The visualization of the preceding five plots will look like the following figure:
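The plot objects above are only created, not drawn. One way to display all five on a single page is a grid layout, as in the minimal sketch below. The 3 x 2 layout and the row and column arithmetic are assumptions of this example, and the book's figure may have been produced differently.

```r
library(ggplot2)
library(grid)

# Recreate the five plot objects so that this block is self-contained
p  <- ggplot(data = mtcars)
p1 <- p + geom_point(aes(x = disp, y = mpg))
p2 <- p + geom_line(aes(x = disp, y = mpg))
p3 <- p + geom_bar(aes(x = mpg))
p4 <- p + geom_boxplot(aes(x = factor(gear), y = mpg))
p5 <- p1 + geom_text(x = 200, y = 25, label = "Scatter plot")
plots <- list(p1, p2, p3, p4, p5)

# Divide a fresh page into a 3 x 2 layout of cells
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow = 3, ncol = 2)))

# Print each plot into its own cell, filling the layout row by row
for (i in seq_along(plots)) {
  cell_row <- (i - 1) %/% 2 + 1
  cell_col <- (i - 1) %% 2 + 1
  print(plots[[i]],
        vp = viewport(layout.pos.row = cell_row, layout.pos.col = cell_col))
}
popViewport()
```

For a single fixed label, annotate("text", x = 200, y = 25, label = "Scatter plot") is an alternative to geom_text() that draws the label once rather than once per row of the data.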
Where to buy this book
You can buy R Graphs Cookbook Second Edition from the Packt Publishing website:
https://www.packtpub.com/big-data-and-business-intelligence/r-graph-cookbook--second-edition
Free shipping to the US, UK, Europe and selected Asian countries. For more information, please
read our shipping policy.
Alternatively, you can buy the book from Amazon, BN.com, Computer Manuals and
most internet book retailers.
www.PacktPub.com