Using R for Numerical Analysis
Victor A. Bloomfield
University of Minnesota
Minneapolis, USA
Series Editors
John M. Chambers, Department of Statistics, Stanford University, Stanford, California, USA
Torsten Hothorn, Division of Biostatistics, University of Zurich, Switzerland
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the
validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the
copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to
publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let
us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted,
or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including
photocopying, microfilming, and recording, or in any information storage or retrieval system, without written
permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers,
MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety
of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment
has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
Preface xix
1 Introduction 1
1.1 Obtaining and installing R 1
1.2 Learning R 1
1.3 Learning numerical methods 1
1.4 Finding help 2
1.5 Augmenting R with packages 3
1.6 Learning more about R 5
1.6.1 Books 5
1.6.2 Online resources 5
2 Calculating 7
2.1 Basic operators and functions 7
2.2 Complex numbers 8
2.3 Numerical display, round-off error, and rounding 9
2.4 Assigning variables 11
2.4.1 Listing and removing variables 12
2.5 Relational operators 12
2.6 Vectors 13
2.6.1 Vector elements and indexes 13
2.6.2 Operations with vectors 14
2.6.3 Generating sequences 15
2.6.3.1 Regular sequences 15
2.6.3.2 Repeating values 16
2.6.3.3 Sequences of random numbers 16
2.6.4 Logical vectors 17
2.6.5 Speed of forming large vectors 18
2.6.6 Vector dot product and crossproduct 19
2.7 Matrices 21
2.7.1 Forming matrices 21
2.7.2 Operations on matrices 24
2.7.2.1 Arithmetic operations on matrices 24
2.7.2.2 Matrix multiplication 25
2.7.2.3 Transpose and determinant 26
2.7.2.4 Matrix crossproduct 26
2.7.2.5 Matrix exponential 27
2.7.2.6 Matrix inverse and solve 27
2.7.2.7 Eigenvalues and eigenvectors 29
2.7.2.8 Singular value decomposition 31
2.7.3 The Matrix package 33
2.7.4 Additional matrix functions and packages 34
2.8 Time and date calculations 34
3 Graphing 37
3.1 Scatter plots 37
3.2 Function plots 39
3.3 Other common plots 40
3.3.1 Bar charts 40
3.3.2 Histograms 42
3.3.3 Box-and-whisker plots 43
3.4 Customizing plots 44
3.4.1 Points and lines 44
3.4.2 Axes, ticks, and par() 44
3.4.3 Overlaying plots with graphic elements 46
3.5 Error bars 48
3.6 Superimposing vectors in a plot 49
3.7 Modifying axes 50
3.7.1 Logarithmic axes 51
3.7.2 Supplementary axes 51
3.7.3 Incomplete axis boxes 52
3.7.4 Broken axes 52
3.8 Adding text and math expressions 54
3.8.1 Making math annotations with expression() 55
3.9 Placing several plots in a figure 56
3.10 Two- and three-dimensional plots 58
3.11 The plotrix package 60
3.11.1 radial.plot and polar.plot 60
3.11.2 Triangle plot 61
3.11.3 Error bars in plotrix 62
3.12 Animation 63
3.13 Additional plotting packages 64
7 Optimization 159
7.1 One-dimensional optimization 159
7.2 Multi-dimensional optimization with optim() 162
7.2.1 optim() with Nelder-Mead default 163
7.2.2 optim() with BFGS method 165
7.2.3 optim() with CG method 167
7.2.4 optim() with L-BFGS-B method to find a local minimum 167
7.3 Other optimization packages 169
7.3.1 nlm() 169
7.3.2 ucminf package 171
7.3.3 BB package 171
7.3.4 optimx() wrapper 172
7.3.5 Derivative-free optimization algorithms 172
7.4 Optimization with constraints 173
7.4.1 constrOptim to optimize functions with linear constraints 173
7.4.2 External packages alabama and Rsolnp 175
7.5 Global optimization with many local minima 177
7.5.1 Simulated annealing 178
7.5.2 Genetic algorithms 181
7.5.2.1 DEoptim 181
7.5.2.2 rgenoud 183
7.5.2.3 GA 183
7.6 Linear and quadratic programming 183
7.6.1 Linear programming 183
7.6.2 Quadratic programming 186
7.7 Mixed-integer linear programming 189
7.7.1 Mixed-integer problems 189
7.7.2 Integer programming problems 190
7.7.2.1 Knapsack problems 191
7.7.2.2 Transportation problems 191
7.7.2.3 Assignment problems 192
7.7.2.4 Subset sum problems 193
7.8 Case study 194
7.8.1 Monte Carlo simulation of the 2D Ising model 194
Bibliography 329
Index 331
List of Figures
3.25 Using layout to create a scatter plot with accompanying box plots. 57
3.26 Left: Image plot; Right: Contour plot. 58
3.27 Left: Perspective plot of the outer product of sin(n) and cos(n)e^(-n/3);
Right: The same plot with shade applied. 59
3.28 scatterplot3d plots. Left: default (type = "p"); Right:
type = "h". 60
3.29 Radial (left) and polar (center, right) plots using (p)olygon,
(s)ymbol, and (r)adial line representations. 61
3.30 Triangle plot of alloy composition. 62
3.31 Result of Brownian motion animation after 100 steps with 10
particles. 63
5.1 The function f(x,a) with a = 0.5. Roots are located by the points
command once they have been calculated by uniroot.all. 95
5.2 Viscosity of water fit to a quadratic in temperature. 98
5.3 Plot of the lhs of Equation 5.4. 110
5.4 Simulated spectrum of 4-component mixture. 119
5.5 Plots of reduced pressure vs. reduced volume below (points) and
above (line) the critical temperature. 122
11.1 Linear fit (left) and residuals (right) for simulated data with random
error. 294
11.2 lm() fit to a quadratic polynomial with random error. 295
11.3 (left) Plot of misra1a data with abline of linear fit; (right) Residuals
of linear fit to misra1a data. 298
11.4 (left) nls() exponential fit to misra1a data; (right) Residuals of
nls() exponential fit to misra1a data. 299
11.5 Fit and residuals of nls() fit to the 3-exponential Lanczos function
11.1. 302
11.6 Concentration of product C of reversible reaction with points
reflecting measurement errors. 306
11.7 Approximations of ln(1+x): Solid line, true function; dashed line,
Taylor series; points, Padé approximation. 310
11.8 Approximation to ζ(2) by direct summation of 1/x^2. 311
11.9 Viscosity of 20% solutions of sucrose in water as a function of
temperature. 312
11.10 Examples of non-monotonic and monotonic fitting to a set of
points. 314
11.11 Fit of a spline function to a simulated spectrum, along with first
and second derivative curves. 315
11.12 Sampling and analysis of a sine signal. 316
11.13 Inverse fft of the signal in Figure 11.17. 317
11.14 Power spectrum of sine function. 318
11.15 fft of the sum of two sine functions. 319
11.16 Power spectrum of the sum of two sine functions. 320
11.17 Power spectrum (right) of the sum of two sine functions with
random noise and a sloping baseline (left). 320
11.18 Plot of the peaks derived from the power spectrum. 321
11.19 Frequency response of the Butterworth filter butter(4,0.1). 323
11.20 Use of butter(3,0.1) filter to extract a sinusoidal signal from
added normally distributed random noise. 323
11.21 Use of Savitzky-Golay filter to extract a sinusoidal signal from
added normally distributed random noise. 324
11.22 Use of fftfilt to extract a sinusoidal signal from added normally
distributed random noise. 325
11.23 (left) Plot of Hahn1 data and fitting function; (right) Plot of
residuals. 326
11.24 Atmospheric concentration of CO2 monthly from 1959 to 1997. 327
11.25 Decomposition of CO2 data into trend, seasonal, and random
components. 328
Preface
The complex mathematical problems faced by scientists and engineers rarely can be
solved by analytical approaches, so numerical methods are often necessary. There
are many books that deal with numerical methods for scientists and engineers; their
content is fairly standardized: Solution of systems of linear algebraic equations and
nonlinear equations, finding eigenvalues and eigenfunctions, interpolation and curve
fitting, numerical differentiation and integration, optimization, solution of ordinary
differential equations and partial differential equations, and Fourier analysis. Sometimes
statistical analysis of data is included, as it should be. As powerful personal
computers have become virtually universal on the desks of scientists and engineers,
computationally intensive Monte Carlo methods are joining the numerical analysis
armamentarium.
If there are many books on these well-established topics, why am I writing another
one? The answer is to propose and demonstrate the use of a language relatively
new to the field: R. My approach in this book is not to present the standard theoretical
treatments that underlie the various numerical methods used by scientists and engineers.
There are many fine books and online resources that do that, including one that
uses R: Owen Jones, Robert Maillardet, and Andrew Robinson. Introduction to Scientific
Programming and Simulation Using R. Chapman & Hall/CRC, Boca Raton,
FL, 2009.
Instead, I have tried to write a guide to the capabilities of R and its add-on packages
in the realm of numerical methods, with simple but useful examples of how the
most pertinent functions can be employed in practical situations. Perhaps, if it were
not for its cumbersomeness, a more accurately descriptive title for this book would
be How To Use R to Perform Numerical Analyses of Interest to Scientists and Engineers.
I believe that the approach I take is the most efficient way to introduce new
users to the powerful capabilities of R.
R, with more than two million users worldwide, is well known and widely used
among statisticians as a language and environment for statistical computing and
graphics which provides a wide variety of statistical and graphical techniques: linear
and nonlinear modeling, statistical tests, time series analysis, classification, clustering,
etc. It runs on essentially all common operating systems: Mac OS, Windows,
and Linux.
Less well known than R's statistical prowess is that it has capabilities in the
realm of numerical methods very similar to those of excellent but costly commercial
programs such as MATLAB®, MathCad, and the numerical parts of Mathematica
and Maple, with the considerable advantages that it is free and open source. (R and
its many add-on packages are available from the Comprehensive R Archive Network,
CRAN, at http://cran.r-project.org/.) The fact that R is free is important in
making its capabilities available to everyone, even if they live in poor countries, do
not work in companies or institutions that can afford expensive site licenses, or no
longer have student discounts.
R has excellent, publication-quality graphics. It has many useful built-in functions
and add-on packages, and can be readily extended with standard programming
techniques. For large, computationally demanding projects, R can interface
with speedier but more-difficult-to-program languages such as Fortran, C, or C++.
It has extensive online help and a large and growing library of books that illustrate
its many applications. R is a stable but evolving computational platform,
which undergoes continual (but not excessive) development and maintenance, so
that it can be relied on over the long term. To quote from the "What Is R?" page
(http://www.r-project.org/about.html) linked to the R Project home page at
http://www.r-project.org/:
R is an integrated suite of software facilities for data manipulation, calculation
and graphical display. It includes
- an effective data handling and storage facility,
- a suite of operators for calculations on arrays, in particular matrices,
- a large, coherent, integrated collection of intermediate tools for data analysis,
- graphical facilities for data analysis and display either on-screen or on hardcopy,
- a well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions and input and output facilities.
The term "environment" is intended to characterize it as a fully planned and
coherent system...
Acknowledgments
I am grateful to Hans Werner Borchers, author of the valuable packages pracma and
specfun and maintainer of the Numerical Mathematics Task View on the CRAN
website, for his many contributions to this book. In addition to his overall critiques,
he wrote the section on Numerical Integration and several sections in the Optimization
chapter. Daniel Beard made insightful comments on an earlier version of this
manuscript. My editor Rob Calvert, and his assistants Rachel Holt and Sarah Gelson,
kept things running smoothly. Karen Simon efficiently shepherded the production
process. My greatest thanks, however, go to the large community of R project
contributors, both the core group and the authors of the many packages, manuals,
and books, who have given so freely of their time and talent to craft a tool of such
immense value.
MATLAB® is a registered trademark of The MathWorks, Inc. and is used with permission.
The MathWorks does not warrant the accuracy of the text or exercises in
this book. This book's use or discussion of MATLAB® software or related products
does not constitute endorsement or sponsorship by The MathWorks of a particular
pedagogical approach or particular use of the MATLAB® software.
Introduction
1.2 Learning R
The next several chapters of this book are intended to provide a basic introduction
to R. The basic manual for learning R is the online An Introduction to R, found at
http://cran.r-project.org/ under Documentation. The section "Learning more
about R" at the end of this chapter lists numerous books and online resources.
1.4 Finding help
If you know the name of an R object, such as a function, and want to know what it
does, what its various arguments (including defaults) are, and what values it returns,
with examples, type help(function.name) or ?function.name. For example,
?solve tells us that "This generic function solves the equation a %*% x = b for x,
where b can be either a vector or a matrix." As one example, it gives inversion of a
Hilbert matrix:
hilbert <- function(n) {i <- 1:n; 1 / outer(i - 1, i, "+")}
h8 <- hilbert(8); h8
sh8 <- solve(h8)
round(sh8 %*% h8, 3)
(Don't worry if you don't understand the code at this time. We will discuss R programming
beginning in Chapter 4.)
Often, you may need to be reminded of the name of a function. A very useful
cheat sheet listing many of the more common R functions is "R Reference Card"
by Tom Short, available at
http://cran.r-project.org/doc/contrib/Short-refcard.pdf.
If you think that an object or function may be available, and can guess part
of its name, try apropos(). For example, if you're interested in spectral analysis,
apropos("spec") gives
[1] "plot.spec" "plot.spec.coherency" "plot.spec.phase" "spec.ar"
[5] "spec.pgram" "spec.taper" "spectrum"
However, this does not turn up Special, which yields special mathematical functions
related to the beta and gamma functions. apropos() allows searches using
regular expressions; enter ?apropos to see some examples.
help.search() allows for searching the help system for documentation
matching a given character string in the (file) name, alias, title, concept or keyword
entries (or any combination thereof), using either fuzzy matching or regular
expression matching. Note that the character string must be in quotes. For example,
help.search("spectral") turns up five topics, with descriptions:
eigen: Spectral Decomposition of a Matrix
plot.spec: Plotting Spectral Densities
spec.ar: Estimate Spectral Density of a Time Series from AR Fit
spec.pgram: Estimate Spectral Density of a Time Series by a Smoothed Periodogram
spectrum: Spectral Density Estimation
Clicking on any of these topics brings up its help page.
Using regular expressions, help.search("^spec") brings up those help pages
containing information about topics whose title, alias, or concept contain words that
begin with "spec": Special, specific, spectral, specification, etc.
help.start() opens your web browser to R's online documentation. The manual
An Introduction to R is the standard online reference to the language. Click on
"Search Engine & Keywords" to search for keywords, function and data names, and
concepts, and also for words or phrases within help-page titles. A list of keywords arranged
by topics (Basics; Graphics; MASS (the book); Mathematics; Programming,
Input/Output, and Miscellaneous; and Statistics) is provided to help target the search.
R has a large and helpful online community to which you can turn with
questions if you can't find answers through your own efforts. A very large
database (nearly 2700 pages as of the end of 2013) of topics is maintained at
http://r.789695.n4.nabble.com/. Searching this database can provide leads
to existing resources, or show how others have solved puzzling problems.
Two sites for doing Google-type searching of the R language are
http://www.dangoldstein.com/search_r.html and http://www.rseek.org/.
If all else fails, you can ask your own questions by going to
http://www.r-project.org/ > Mailing Lists. The third item down is R-help. (The
first two are R-announce, for major announcements about the development of R and
the availability of new code, and R-packages, for announcements ... on the availability
of new or enhanced contributed packages.) The posting guide gives important advice
about how to ask good questions that prompt useful answers. Follow that advice to
avoid grumpy responses from the experts.
A somewhat haphazard but occasionally enlightening way to learn about various
aspects of R is to look at R-bloggers (http://www.r-bloggers.com/), which
collects daily news and tutorials about R, contributed by over 450 bloggers.
Calculating
2.2 Complex numbers
R has the standard operations on complex numbers. To get more information, type
?complex.
> (1i)^2 # Complex unit i must be multiplied by a scalar
[1] -1+0i
> Mod(1+2i)*Mod(3+4i)
[1] 11.18034
> (-8+0i)^(1/3)
[1] 1+1.732051i
> options(digits=3)
> Arg(3+2i)
[1] 0.588
The round(number, digits) function rounds the number to the specified
number of decimal places. The default is digits = 0. It works with both positive
and negative numbers of digits.
> options(digits=7)
> round(1234.567,-2)
[1] 1200
> round(1234.567,2)
[1] 1234.57
The function signif(number, digits) rounds the number to the specified number
of significant digits (default = 6).
> signif(1234.567) # Default
[1] 1234.57
> signif(1234.567,2)
[1] 1200
The functions ceiling(x), floor(x), and trunc(x) take a single numeric
argument x and return the smallest integer not less than x, the largest integer not
greater than x, and the integer formed by truncating x toward zero, respectively. If x
is a vector (see below, Section 2.6), these rounding functions work on each element
of the vector.
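For example:
> ceiling(2.7); floor(2.7); trunc(-2.7)
[1] 3
[1] 2
[1] -2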
For presentation, it is often desirable to format numbers with a given number
of digits, commas or other marks separating intervals before the decimal point, in
decimal or scientific format, etc. One can do this using the formatC function.
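A brief illustration (formatC has many more options, described on its help page):
> formatC(123456.789, format = "f", big.mark = ",", digits = 1)
[1] "123,456.8"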
Display precision can hide the fractional parts of large numbers. For example (the
values in x here are illustrative):
> options(digits = 7)
> x = c(0.4567, 0.5678, 0.2345)
> (x = x + 123456)
[1] 123456.5 123456.6 123456.2
Round-off error can also produce surprising results:
> .7/.1 - 7
[1] -8.88e-16
> # but
> .7/.1
[1] 7
One can zap meaningless values close to zero with the zapsmall function:
> zapsmall(.7/.1) - 7
[1] 0
but, of course, one must be cautious in doing so.
R uses the IEEE standard in representing floating-point numbers in 64-bit double
precision. (For details see Jones et al., 2009.) The command .Machine tells us a
variety of things about this standard. The smallest non-zero floating point number
that can be represented is double.xmin, 2.225074e-308, and the largest floating
point number is double.xmax, 1.797693e+308. The smallest positive number x
such that 1 + x is not equal to 1 is double.eps, 2.220446e-16. The smallest
positive number such that 1 - x can be distinguished from 1 is double.neg.eps,
1.110223e-16. One must exercise care in testing for exact numerical equality if
differences are near double.eps. (See the section on relational operators, below.)
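For example:
> .Machine$double.eps
[1] 2.220446e-16
> 1 + .Machine$double.eps/2 == 1 # eps/2 is lost when added to 1
[1] TRUE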
If a number cannot be represented meaningfully, Inf (infinity) or NaN (not a
number) will generally be returned according to standard computational arithmetic
definitions:
> 1/0
[1] Inf
> log(0)
[1] -Inf
> Inf*Inf
[1] Inf
> Inf/Inf
[1] NaN
> 0/0
[1] NaN
A variable is assigned with = (or <-), and its value is displayed by typing its name:
> theta = pi/4
> theta
[1] 0.7853982
> st = sin(theta)
> asin(st)
[1] 0.7853982
Names of variables in R may consist of lowercase or capital letters, numbers, ".",
and "_". The name must begin with a letter or "."; and if it begins with "." the next
character cannot be a number. R is case sensitive, so x and X are different variables.
A simple but handy use for named variables is to convert between units. For example,
to convert between time units, we can use the definitions (separating multiple
assignments on the same line with semicolons):
> sec. = 1; min. = 60*sec.; hr. = 60*min.; day. = 24*hr.
> week. = 7*day.; yr. = 365.25*day.; century. = 100*yr.
> 3*century./sec.
[1] 9.47e+09
to calculate the (approximate) number of seconds in three centuries. Note that one
divides by the desired unit because the answer is a pure number without units. In
this example I have adopted the arbitrary but useful convention that unit names end
with a period, to avoid conflicts with other potential uses of these variable names.
Another example is to convert between degrees and radians when using trigonometric
functions.
> degree. = pi/180
> sin(30*degree.)
[1] 0.5
R has five sets of built-in constants:
pi
LETTERS (the 26 uppercase letters of the Roman alphabet)
letters (the 26 lowercase letters of the Roman alphabet)
month.abb (the 3-letter abbreviations of the month names in English)
month.name (the month names in English)
Although not strictly prohibited, it is not advisable to name variables c, t, T, or F,
since these are used in R to combine arguments to form a vector (c), take the
transpose of a vector or matrix (t), and stand as abbreviations for TRUE and FALSE
(T and F).
In this book we will deal with numeric, complex, or logical (TRUE/FALSE) variables,
but R can also deal with character data. We will mainly consider calculations
with vectors and matrices, and occasionally lists (the general form of vectors with
different types of elements); but R has other types of objects as well: data frames,
factors, and arrays (matrices with more than two dimensions).
Because of floating-point round-off, testing computed values for exact equality with
== can give surprising results:
> .3/.1 == 3
[1] FALSE
To avoid such situations with numerical or complex quantities, use all.equal, a
utility that tests near equality (by default within a tolerance of
.Machine$double.eps^0.5) of two R objects:
> all.equal(.3/.1,3)
[1] TRUE
If the operator == or != is applied to vectors (see the next section) with n elements,
it will generate a logical vector with n TRUE or FALSE values. If what is
wanted is instead a single answer to the question whether the vectors are identical,
use the identical function, which tests for exact equality, instead. (See the
subsection on logical vectors below.)
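For example:
> a = c(1,2,3); b = c(1,2,3)
> a == b
[1] TRUE TRUE TRUE
> identical(a,b)
[1] TRUE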
2.6 Vectors
For nearly all numerical calculations in R, one uses vectors and matrices. Vectors
are the simplest data structure, consisting of an ordered collection of numbers, characters,
or logical values that are separated by commas and bracketed by c(), which
stands for combine or concatenate. A typical numerical vector might be
> x = c(3.2, 1.7, -11.3, -0.67, 4, 0)
A scalar can be thought of as a vector of length 1.
> 3*x
[1] 9.60 5.10 -33.90 -2.01 12.00 0.00
> x^2
[1] 10.240 2.890 127.690 0.449 16.000 0.000
> cos(x/2)
[1] -0.0292 0.6600 0.8061 0.9444 -0.4161 1.0000
There is also a set of functions that return the length, mean, standard deviation,
minimum, maximum, range, etc., of a vector.
> length(x) # Number of elements in the vector
[1] 6
> mean(x)
[1] -0.512
> min(x)
[1] -11.3
> range(x)
[1] -11.3 4.0
> sum(x)
[1] -3.07
> prod(x)
[1] 0
When two vectors of different lengths are combined in an operation, the shorter one
is recycled; if the longer length is not a multiple of the shorter, R warns, as in this
illustrative example:
> x1 = 1:5; y1 = 1:2
> x1 * y1
[1] 1 4 3 8 5
Warning message:
In x1 * y1 :
  longer object length is not a multiple of shorter object length
> 5.7:-3.7
[1] 5.7 4.7 3.7 2.7 1.7 0.7 -0.3 -1.3 -2.3 -3.3
A common mistake is to forget that the colon has higher priority than other arithmetic
operations.
> n = 10
> 1:n-1
[1] 0 1 2 3 4 5 6 7 8 9
> 1:(n-1)
[1] 1 2 3 4 5 6 7 8 9
If an increment different from 1 is desired, use seq(from, to, by). If the
parameters are given in this order, their names may be omitted, but if a different
order is used, the names are required. (This is true of all functions in R.)
> seq(3,8,.5)
[1] 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
> seq(by=0.45,from=2.7,to=6.7)
[1] 2.70 3.15 3.60 4.05 4.50 4.95 5.40 5.85 6.30
The number of elements may be specified by length.out, which is often abbreviated
to length or simply len.
> seq(-pi,pi,length.out=12) # 12 values between -pi and pi
[1] -3.142 -2.570 -1.999 -1.428 -0.857 -0.286 0.286 0.857
[9] 1.428 1.999 2.570 3.142
A sequence of n random numbers uniformly distributed between 0 and 1 is generated
by runif(n):
> runif(6)
[1] 0.176 0.235 0.316 0.656 0.817 0.636
Likewise, a sequence of n random numbers drawn from a normal distribution
with mean mean and standard deviation sd is generated by rnorm(n, mean, sd).
If mean and sd are not specified, the defaults are 0 and 1.
> rnorm(6,9,1.5)
[1] 7.36 9.49 10.79 8.59 11.90 6.44
> rnorm(6)
[1] -1.103 -0.849 1.148 1.460 -0.831 0.919
In a common simulation scenario, one wants to generate a sequence of values
with normally distributed random error of fixed standard deviation. For example:
> x = 1:6
> err = rnorm(6,0,0.1) # mean of error = 0, sd = 0.1
> y = x + err # simulated values: trend plus random error
If v1 is a vector computed by floating-point arithmetic whose elements should equal
the integers 1 through 6, element-wise comparison with == may nonetheless give
FALSE, while all.equal tests equality to within round-off:
> w1 = 1:6; w1
[1] 1 2 3 4 5 6
> v1 == w1
[1] TRUE TRUE FALSE TRUE TRUE FALSE
> all.equal(v1,w1)
[1] TRUE
> v2 = 1:10000
> head(v2)
[1] 1 2 3 4 5 6
> tail(v2)
[1] 9995 9996 9997 9998 9999 10000
The head and tail functions are useful for checking the beginning and end of very
large vectors without printing the entire vector.
The dot (scalar) product and the Euclidean norm of vectors can be defined directly:
> u = c(1,2,3); v = c(4,5,6)
> dot = function(u,v) sum(u*v)
> vecnorm = function(v) sqrt(dot(v,v))
> dot(u,v)
[1] 32
> vecnorm(u) # sqrt(1^2 + 2^2 +3^2) = sqrt(14)
[1] 3.742
Note that the magnitude of v might also be called its length, but in R the length
of a vector is its number of elements, not its magnitude. R has a function norm to
calculate any one of several norms of a matrix. (Type ?norm for details.) We can
use it to calculate the Euclidean norm of a vector by converting the vector to a
(one-dimensional) matrix and choosing the Frobenius option "F" or "f":
> norm(as.matrix(u),"F")
[1] 3.742
The crossproduct of two three-dimensional vectors is calculated with the function
> cross = function(u,v) {c(u[2]*v[3]-u[3]*v[2],
+ u[3]*v[1]-u[1]*v[3], u[1]*v[2]-u[2]*v[1])}
> cross(u,v)
[1] -3 6 -3
Note that "+" is added automatically to the beginning of the next line when the
preceding line does not form a complete statement, in this case because it ends with
a comma.
R has a function crossprod that, when operating on vectors, behaves like dot
(but yields a 1 × 1 matrix). See below for its use in matrix multiplication, and
?crossprod for details. To add to the confusion, R already has a function dot
that, in the plotmath facility for annotating graphics, yields x with a dot over it. It's
important not to confuse these usages.
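For example, with u and v as defined above:
> crossprod(u,v) # t(u) %*% v, a 1 x 1 matrix
[,1]
[1,] 32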
More convenient, since we will be using it in a variety of contexts, may be to
install and load the add-on package pracma, which contains the expected vector dot
and crossproducts.
> install.packages("pracma") # If not already installed
> require(pracma)
Loading required package: pracma
Attaching package: pracma
The following objects are masked _by_ .GlobalEnv:
cross, dot
Note that, as the message indicates, our definitions of dot and cross in the global
environment mask the versions supplied by pracma; both give the same results.
> u = c(1,2,3)
> v = c(4,5,6)
> dot(u,v)
[1] 32
> cross(u,v)
[1] -3 6 -3
The reader is urged to type cross (without the ?) to see how the definition of this
function is implemented in pracma: essentially the same as above.
2.7 Matrices
Matrices, two-dimensional arrays of numbers, are ubiquitous in numerical analysis,
and R has an extensive set of functions for dealing with them.
Individual rows and columns are selected by leaving the other index blank. With the
matrix m = matrix(1:6, nrow=2, ncol=3, byrow=TRUE) used in the examples below:
> m[2,]
[1] 4 5 6
> m[,3]
[1] 3 6
A matrix may also be formed by binding together row (rbind) or column
(cbind) vectors. For example,
> x = 1:3; y = 1:3
> rbind(x,y)
[,1] [,2] [,3]
x 1 2 3
y 1 2 3
> cbind(x,y)
x y
[1,] 1 1
[2,] 2 2
[3,] 3 3
A diagonal matrix is constructed using diag:
> diag(c(4,6,5))
[,1] [,2] [,3]
[1,] 4 0 0
[2,] 0 6 0
[3,] 0 0 5
so an n × n unit matrix can be constructed by diag(rep(1,n)):
> diag(rep(1,3))
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
Similarly, a 3 × 3 matrix with all zeros is constructed by
> matrix(rep(0,9), nrow=3)
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
[3,] 0 0 0
The outer() operator forms an m × n matrix by combining two vectors of
lengths m and n according to a function specified by FUN. The default function is
"*".
> x = 1:3; y = 1:3
> outer(x,y)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 4 6
[3,] 3 6 9
> outer(x,y,FUN="+")
[,1] [,2] [,3]
[1,] 2 3 4
[2,] 3 4 5
[3,] 4 5 6
Here's an easy way to use outer() to make a table (actually a matrix) of powers
of integers:
> x = 1:9; y = 2:8
> names(x)=x; names(y)=y
> outer(y,x,"^")
1 2 3 4 5 6 7 8 9
2 2 4 8 16 32 64 128 256 512
3 3 9 27 81 243 729 2187 6561 19683
4 4 16 64 256 1024 4096 16384 65536 262144
5 5 25 125 625 3125 15625 78125 390625 1953125
6 6 36 216 1296 7776 46656 279936 1679616 10077696
7 7 49 343 2401 16807 117649 823543 5764801 40353607
8 8 64 512 4096 32768 262144 2097152 16777216 134217728
The kronecker function is useful for constructing block matrices. Given
two matrices M1 and M2, kronecker(M1,M2) returns a matrix with dimensions
dim(M1)*dim(M2).
> (M1 = matrix(1:4,2,2))
[,1] [,2]
[1,] 1 3
[2,] 2 4
> (M2 = diag(2)) # the 2 x 2 identity matrix
[,1] [,2]
[1,] 1 0
[2,] 0 1
> kronecker(M1,M2)
[,1] [,2] [,3] [,4]
[1,] 1 0 3 0
[2,] 0 1 0 3
[3,] 2 0 4 0
[4,] 0 2 0 4
> kronecker(M2,M1)
[,1] [,2] [,3] [,4]
[1,] 1 3 0 0
[2,] 2 4 0 0
[3,] 0 0 1 3
[4,] 0 0 2 4
A submatrix can be formed from a larger matrix by putting the desired row and
column indices in square brackets.
> (m3 = matrix(1:9,3,3,byrow=T))
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
> m3[1:2,c(1,3)]
[,1] [,2]
[1,] 1 3
[2,] 4 6
If the rows and columns of a matrix arise from a series of measurements, as in
a database where each row corresponds to a subject and each column to a particular
measurement, it may be convenient to give the rows and columns descriptive names.
(See the discussion of data.frame in Chapter 10.) rownames and colnames are
used for this purpose. For example, using the matrix m defined above:
> rownames(m) = c("A","B")
> colnames(m) = c("v1","v2","v3")
> m
v1 v2 v3
A 1 2 3
B 4 5 6
The rows or columns may then be queried individually by name, e.g.,
> m[,"v1"]
A B
1 4
> summary(m[,"v1"])
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 1.75 2.50 2.50 3.25 4.00
The names can also be assigned when the matrix is constructed:
> m = matrix(1:6, nrow=2, ncol=3, byrow=T,
+ dimnames = list(c("A","B"),c("v1","v2","v3")))
> m/5
v1 v2 v3
A 0.2 0.4 0.6
B 0.8 1.0 1.2
> options(digits=3)
> sqrt(m)
v1 v2 v3
A 1 1.41 1.73
B 2 2.24 2.45
> m^(-1)
v1 v2 v3
A 1.00 0.5 0.333
B 0.25 0.2 0.167
If two matrices are combined by simple operations, assuming their row and column
dimensions match, the operations are applied to the individual elements.
> m+m
v1 v2 v3
A 2 4 6
B 8 10 12
> m/m
v1 v2 v3
A 1 1 1
B 1 1 1
Matrix multiplication is defined by
$$(M_1 M_2)_{ij} = \sum_{k=1}^{n} (M_1)_{ik} (M_2)_{kj}$$
where n is the number of columns of M1, which must equal the number of rows of
M2.
The R operator for matrix multiplication is %*%. For example,
> M1 = matrix(runif(9),3,3); M1
[,1] [,2] [,3]
[1,] 0.261 0.338 0.176
[2,] 0.402 0.786 0.827
[3,] 0.899 0.765 0.340
> M2 = matrix(runif(9),3,3); M2
[,1] [,2] [,3]
[1,] 0.0844 0.00498 0.478
[2,] 0.8571 0.80753 0.505
[3,] 0.9829 0.18913 0.133
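The product is then computed as follows (the entries depend on the random draws
above, so the output is omitted):
> M1 %*% M2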
The function crossprod(A,B) computes t(A) %*% B, and tcrossprod(A,B)
computes A %*% t(B). For example, with A = matrix(1:4,2,2) and
B = matrix(5:8,2,2):
> crossprod(A,B)
[,1] [,2]
[1,] 17 23
[2,] 39 53
> tcrossprod(A,B)
[,1] [,2]
[1,] 26 30
[2,] 38 44
> (A = matrix(1:4,2,2))
[,1] [,2]
[1,] 1 3
[2,] 2 4
> require(Matrix) # supplies the matrix exponential expm()
> expm(A)
2 x 2 Matrix of class "dgeMatrix"
[,1] [,2]
[1,] 51.97 112.1
[2,] 74.74 164.1
dgeMatrix is the standard class for dense numeric matrices in the Matrix package.
Type help(package = "Matrix") for more details. The matrix exponential
function may be used to solve sets of first-order, linear differential equations, whose
formal solutions are often sums of exponential functions.
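For example, the solution of dx/dt = A x with initial condition x(0) = x0 is
x(t) = e^(At) x0. A minimal sketch (the matrix A and vector x0 here are arbitrary
illustrations, not taken from the text):
> require(Matrix)
> A = matrix(c(-1, 0, 1, -2), 2, 2) # coefficient matrix of the linear system
> x0 = c(1, 1) # initial condition
> t = 0.5 # time at which to evaluate the solution
> as.vector(expm(A*t) %*% x0) # x(t) = expm(A*t) %*% x0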
The matrix inverse, computed in R with solve(), can be used to solve sets of linear
equations. Consider, for example, the set
$$\begin{pmatrix} 1 & 1/2 & 1/3 \\ 1/2 & 1/3 & 1/4 \\ 1/3 & 1/4 & 1/5 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} =
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \qquad (2.5)$$
or in compact form as
Ax = b (2.6)
Premultiplying both sides by $A^{-1}$,
$$A^{-1}Ax = x = A^{-1}b \qquad (2.7)$$
> A = matrix(c(1, 1/2, 1/3, 1/2, 1/3, 1/4, 1/3, 1/4, 1/5), 3, 3)
> b = c(1,0,0)
> x = solve(A) %*% b # equivalently, solve(A, b)
The columns of the eigenvector matrix that eigen() returns for a symmetric matrix
(here called Hvec, as from Hvec = eigen(H)$vectors) are orthonormal:
> Hvec[,1]%*%Hvec[,1]
[,1]
[1,] 1
> Hvec[,2]%*%Hvec[,2]
[,1]
[1,] 1
> Hvec[,3]%*%Hvec[,3]
[,1]
[1,] 1
> Hvec[,1]%*%Hvec[,2]
[,1]
[1,] -2.91e-16
> Hvec[,1]%*%Hvec[,3]
[,1]
[1,] 5.55e-17
> Hvec[,2]%*%Hvec[,3]
[,1]
[1,] -2.15e-16
More generally, a square matrix with complex elements is Hermitian if the element
in the i-th row and j-th column is equal to the complex conjugate of the element
in the j-th row and i-th column, for all indices i and j. In this case, the eigenvalues
are real and the eigenvectors are orthonormal with their complex conjugates. For
example:
> Hi = matrix(c(1,2+7i,3,
+ 2-7i,5,-1,
+ 3,-1,7),3,3,byrow=T)
With Hivec = eigen(Hi)$vectors:
> Hivec[,1]%*%Conj(Hivec[,1])
[,1]
[1,] 1+0i
> Hivec[,1]%*%Conj(Hivec[,2])
[,1]
[1,] -5.55e-17-2.78e-17i
Of course, the fact that the imaginary parts that should be zero are not exactly so in
these calculations shows the limitations of representing decimal numbers in binary
arithmetic.
[Image plot of a 72 × 72 matrix, rows vs. columns, with a color scale running from
-0.4 to 1.0.]
Graphing
It is not traditional for a book on numerical methods to devote much space to graphing
of data and functions. Yet graphing is an essential part of scientific and engineering
work, and R has very strong graphics tools. Therefore, we include a chapter on
the topic here. Two recent books on graphing with R are
Paul Murrell, R Graphics, Second Edition, Chapman & Hall/CRC, 2011
Hrishi V. Mittal, R Graphs Cookbook, Packt Publishing, 2011
Figure 3.1: Left: Default data plot; Right: Refined data plot.
> par(mfrow=c(1,2))
> par(mar=c(4,4,1.5,1.5),mex=.8,mgp=c(2,.5,0),tcl=0.3)
> plot(x,y, type = "l")
> plot(x,y, type = "o")
> par(mfrow=c(1,1))
As we shall discuss in a few pages, R allows much greater customization of
graphs than the few elementary steps we have shown here. However, from now on we
will generally use the par(mar=c(4,4,1.5,1.5),mex=.8,mgp=c(2,.5,0),tcl=0.3)
parameter setting to make the graphs more compact and to place the ticks within the
graph, as is standard in most scientific work. Explanation of these parameters will be
found in Section 3.4.2.
Figure 3.2: Left: plot(x,y,type="l"); Right: plot(x,y,type="o").
Figure 3.3: Left: Function plot using curve; Right: Function plot superimposed on
data points.
Figure 3.4: The function sin plotted without specifying the independent variable.
Figure 3.5: curve plot of a polynomial with points added.
Figure 3.6: Stacked bar plots using beside = FALSE default option.
> A = c(3,4,4)
> B = c(6,8,10)
> feed = matrix(c(A,B), nrow=2, byrow=TRUE,
+ dimnames= list(c("A","B"), c("1","2","3")))
> feed # Check that we've set up the matrix correctly
1 2 3
A 3 4 4
B 6 8 10
Figure 3.7: Bar plots using beside = TRUE option.
Now plot stacked barplots using the default beside = FALSE option, emphasizing,
on the left, time as the independent variable:
> barplot(feed,xlab="Week",ylab="grams gained",
+ main = "Weight Gain by Week\n", legend.text=c("A","B"),
+ args.legend=list(title="Feed",x="topleft",bty="n"))
and, on the right, the feeds:
> barplot(t(feed),xlab="Feed",ylab="grams gained",
+ main = "Weight Gain by Feed\n",legend.text=c("1","2","3"),
+ args.legend=list(title="Week",x="topleft",bty="n"))
We can display the same data in a more expanded form using the beside =
TRUE option as in the following code:
> barplot(feed, beside=T,xlab="Week",ylab="grams gained",
+ main = "Weight Gain by Week\n", legend.text=c("A","B"),
+ args.legend=list(title="Feed",x="topleft",bty="n"))
and
> barplot(t(feed), beside=T,xlab="Feed",ylab="grams gained",
+ main = "Weight Gain by Feed\n",
+ legend.text=c("1","2","3"),
+ args.legend=list(title="Week",x="topleft",bty="n"))
In main for both of these plots, \n is the newline command, introducing an extra
line spacing after the main title. There are many options to the barplot command,
some of which we have used above. See the help page for more details.
3.3.2 Histograms
Histograms are commonly used to display the distribution of repeated measurements.
R does this with the hist function. If the fraction of measurements falling into each
range is desired instead, use plot(density(x)), where the density function gives
useful numerical data about the distribution. As an example, we generate 1000 normally
distributed random numbers with mean 10 and standard deviation 2. Results
are shown in Figure 3.8.
> set.seed(333)
> x = rnorm(1000,10,2)
> hist(x)
> plot(density(x))
> density(x) # Get information about distribution
Call:
density.default(x = x)

Data: x (1000 obs.); Bandwidth 'bw' = 0.4387

       x                 y
 Min.   : 2.333   Min.   :1.035e-05
 1st Qu.: 6.277   1st Qu.:3.456e-03
 Median :10.220   Median :2.737e-02
 Mean   :10.220   Mean   :6.333e-02
 3rd Qu.:14.164   3rd Qu.:1.183e-01
 Max.   :18.108   Max.   :2.078e-01

Figure 3.8: Distribution of 1000 normally distributed random variables with mean = 10
and standard deviation = 2. Left: Histogram; Right: Density plot.
The hist function tends to have a mind of its own when setting breaks between
classes. To control this, use the argument breaks = vecbreaks, where vecbreaks
is a vector that explicitly gives the breakpoints between histogram cells.
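For example, to specify cells of unit width spanning the data generated above:
> hist(x, breaks = seq(2, 19, by = 1))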
[Figure: "Box Plot" of the distribution.]
R has 26 point types, numbered from 0 to 25, which can be specified by pch:
[Figure: chart of the plotting symbols pch = 0 through 25.]
It also has six line types, numbered from 1 to 6, which can be specified by lty = n
(Figure 3.11):
[Figure 3.11: the six line types, lty = 1 through 6.]
As noted earlier, the size of points, relative to the default size, can be set by cex.
Similarly, the relative thickness of lines can be set by lwd.
Figure 3.12: Left: Default plot of 0.8e^(-(t-1)/4) + 0.05; Right: Plot modified as
described in the text.
Colors can be specified by number: col = 2 through 6 give
red, green, blue, cyan, and magenta, respectively. These colors can also be called by
name, e.g., col = "red". Typing ?palette gives much more information on graphic
color capabilities in R.
The axes, text placement, and other aspects of a graph can readily be customized
in R. For example, suppose we want a line plot of the function 0.8e^(-(t-1)/4) + 0.05
for t values between 1 and 8.5. The result, produced by the code
> time = seq(1,8.5,.5)
> signal = 0.8*exp(-(time-1)/4) + 0.05
> plot(time, signal, type="l")
is shown in Figure 3.12, left.
We can modify this default result in several ways. For example, suppose we want
the abscissa to run from 0 to 10, and the ordinate from 0 to 1. We want the ticks to be
inside the axes rather than outside and to be a bit shorter. We want the plot margins
to be somewhat smaller, and the axis labels to be closer to the axes, to tighten up the
white space. We also want to add some lines to the graph, to emphasize that the signal
starts at time = 1 and that it levels off at signal = 0.05. These modifications are
accomplished by the following code, giving the result shown in Figure 3.12, right.
> par(mar=c(4,4,1.5,1.5),mex=.8,mgp=c(2,.5,0),tcl=0.3)
> plot(time, signal, type="l",xlim = c(0,10), ylim = c(0,1))
> abline(h = 0.05, lty=3) # Horizontal line at y = 0.05
> abline(v=1,lty=3) # Vertical line at x = 1
The axis limits are specified with xlim = c(0,10) and ylim = c(0,1). The
tick direction and length are set with tcl = 0.3 (the default is -0.5, where the minus
sign specifies ticks outside the plot, and the number is the fraction of a line height).
The internal lines in the plot are drawn with the abline command.
Additional aspects of the plot are set with various arguments to the par function,
which specifies a broad range of graphical parameters. For example, figure margins
may be made tighter or looser with the mar argument to par (the default is mar =
c(5,4,4,2)+0.1), where the numbers are multiples of a line height and are in the
order bottom, left, top, right. mex determines coordinates in the margins of plots; values
less than 1 move the labels toward the margins, thus decreasing white space. The
location of the axis labels is set by mgp (the default is mgp = c(3,1,0)). The new
settings of the par parameters are retained until modified by future par() statements,
or until a new R session is started. Type ?par for a listing of the many parameters
that go into a graph.
Figure 3.13: Left: Graphic elements produced with base R; Right: Graphic elements produced
with plotrix package.
> draw.arc(20, 20, (1:4)*5, deg2 = 1:20*15)
> draw.circle(20, 80, (1:4)*5)
> draw.ellipse(80, 20, a = 20, b = 10, angle = 30, col=gray(.5))
> draw.radial.line(start=2, end = 15, center = c(80,80), angle=0)
> draw.radial.line(start=2, end = 15, center = c(80,80), angle=pi/2)
> draw.radial.line(start=2, end = 15, center = c(80,80), angle=pi)
> draw.radial.line(start=2, end = 15, center = c(80,80), angle=3*pi/2)
>
> par(mfrow=c(1,1))
It is evident that these functions could be used to draw simple or even not so
simple diagrams.
Finally, we show how to color defined areas of a plot with the polygon() function.
As an example, we distinguish the positive and negative regions of a function
with different shades of gray, which might be useful in a pedagogical presentation of
how to integrate the function.
Consider integrating the first order Bessel function besselJ(x,1) from x = 0
to its zero-crossing point near x = 10. We first compute the crossing points with the
uniroot function.
> x10 = uniroot(function(x) besselJ(x,1),c(9,11))$root
> x4 = uniroot(function(x) besselJ(x,1),c(3,5))$root
> x7 = uniroot(function(x) besselJ(x,1),c(6,8))$root
We compute the value of the function over the desired range, using many steps to
give a smooth polygon fill.
> x = seq(0,x10,len=100)
> y = besselJ(x,1)
Next we construct an empty plot with the desired x and y limits, and add the polygon
of the function with a medium gray fill.
> plot(c(0,x10),c(-0.5,0.8), type="n", xlab="x", ylab="J(x,1)")
> polygon(x,y,col="gray", border=NA)
We then paint over the negative region with white, and add a horizontal line at
x = 0.
> rect(0,-0.5,x10,0, col="white", border=NA)
> abline(h=0, col = "gray")
We calculate the value of the function in the negative region, again using many steps
for smoothness.
> xminus = seq(x4,x7,len=50)
> yminus = besselJ(xminus,1)
Finally, we cover the negative region with a polygon in a darker gray, and add back
the ticks that were painted over in an earlier step.
> polygon(xminus,yminus, col=gray(.2), border=NA)
> axis(1, tick=TRUE)
The result is seen in Figure 3.14.
Figure 3.14: The positive and negative regions of besselJ(x,1) distinguished with
different shades of gray using the polygon function.
Figure 3.15: Illustration of error bars using the arrows command. Left: y error bars
only; Right: Both x and y error bars.
[Figure: "Iris Measures": cbind(SW.s, PL.s, PW.s) plotted against SL.s (Sepal
Length), with the points of the three iris species labeled 1, 2, and 3.]
[Figure: y1, y2, y3 superimposed vs. x, where y1 = sin(x), y2 = sin(x+pi/6)+0.1,
y3 = sin(x+pi/3)+0.2.]
Figure 3.19: Adding supplementary axes to a graph.
> tC = seq(0,100,10)
> dH = 2000 + 10*(tC - 25)
> par(tcl=0.3, mar=c(3,3,4,4)+0.1, mgp = c(2,0.4,0))
> plot(tC, dH, xlim = c(0,100), ylim = c(1600,2800),
+ xlab="T, deg C",ylab="dH/kJ", tcl=0.3)
> axis(3, at = tC, labels = tC*9/5+32, tcl=0.3)
> mtext(side=3,"T, deg F", line = 2)
> axis(4, at = dH, labels = round(dH/4.18,0), tcl=0.3)
> mtext(side=4,"dH/kCal", line = 2)
[Figure: log10(x) plotted against x.]
Broken axes can be produced by suppressing the axes with axes=FALSE and imposing
customized labels with the axis() command. The following code gives an example.
Note also that, for illustrative purposes, we have given the x and y axes the two
different break styles, "zigzag" and "slash" (Figure 3.21).
> xstep = 0.1 # Scale factor for x axis
> x1 = seq(0.1,0.5,xstep)
> x2 = seq(5.0,5.3,xstep)
> x2red = x2-min(x2)+max(x1)+2*xstep
> x = c(x1,x2red)
> ystep = 1 # Scale factor for y axis
> y1 = 1:5
> y2 = c(51,53,52,54)
> y2red = y2-min(y2)+max(y1)+2*ystep
> y = c(y1,y2red)
>
> library(plotrix)
>
> plot(x,y,axes=F,xlab="x", ylab="y")
> box() # Draw axes without labels
> axis.break(1,max(x1)+xstep, style="zigzag", brw=0.04)
> axis.break(2,max(y1)+ystep, style="slash", brw=0.04)
>
> lx1 = length(x1); lx2 = length(x2)
> lx = lx1 + lx2
> ly1 = length(y1); ly2 = length(y2)
> ly = ly1 + ly2
>
> axis(1,at=(1:lx1)*xstep,labels=c(as.character(x1)))
> axis(1,at=((lx1+2):(lx+1))*xstep,labels=c(as.character(seq(min(x2),
+ max(x2),by=xstep))))
> axis(2,at=(1:ly1)*ystep,labels=c(as.character(y1)))
> axis(2,at=((ly1+2):(ly+1))*ystep,labels=c(as.character(seq(min(y2),
+ max(y2),by=ystep))))
[Figure 3.21: Plot with broken axes: the x axis breaks between 0.5 and 5, the y axis
between 5 and 51.]
[Figure: plot annotated with the text labels "Local Max", "Local Min", and "0,0".]
[Figure: "Convergence to Riemann zeta function": partial sums of 1/n^k for k = 2
and k = 4, converging to zeta(2) = pi^2/6 and zeta(4) = pi^4/90, annotated with
expression().]
[Figure: several plots of a variable y1 arranged in one figure, including a density
plot (N = 1000, bandwidth = 0.4426).]
Figure 3.25: Using layout to create a scatter plot with accompanying box plots.
For more complex layouts, in which the plots are of different sizes, use
layout(). As a simple example, we construct a figure in which the first plot uses all
of the top row, while the second and third plots share the bottom row.
> # Divide the device into two rows and two columns
> # Allocate figure 1 all of row 1
> m = matrix(c(1,1,2,3),ncol=2,byrow=TRUE)
> layout(m)
> x = rnorm(100,10,2)
> y = rnorm(100,5,2)
> par(mar=c(2,3,3,2))
> plot(x,y,main="y vs x")
> boxplot(x,main="x distribution")
> boxplot(y,main="y distribution")
which gives Figure 3.25.
Figure 3.26: Left: Image plot; Right: Contour plot.
To see a more complex illustration, run the code to create a scatter plot with
marginal histograms in the example file for help(layout).
Figure 3.27: Left: Perspective plot of the outer product of sin(n) and cos(n)e^(-n/3);
Right: The same plot with shade applied.
Figure 3.28: scatterplot3d plots. Left: default (type = "p"); Right: type = "h".
The help page for scatterplot3d() lays out the many optional arguments to
the function and shows some interesting examples of its use to draw 3D geometrical
figures as well as data points.
Figure 3.29: Radial (left) and polar (center, right) plots using (p)olygon, (s)ymbol,
and (r)adial line representations.
Figure 3.30: Triangle plot of alloy composition.
Figure 3.31: Result of Brownian motion animation after 100 steps with 10 particles.
3.12 Animation
Animation is often useful in scientific or engineering calculations, to visualize the
time course of complex events. The animation package in R presents "A Gallery
of Animations in Statistics and Utilities to Create Animations." The results can be
shown either in an R graphics window or in an HTML browser window. Here we
present an adaptation of the brownian.motion demo, with a figure (Figure 3.31)
showing the final state of 10 particles making 100 random steps in two dimensions.
Of course, a static book page cannot represent an animated graphics window, so the
reader should run the code.
> install.packages("animation")
> require(animation)
> brownian.motion = function(n = 10, xlim = c(-20, 20),
+ ylim = c(-20, 20), ...) {
+ x = rnorm(n)
+ y = rnorm(n)
+ interval = ani.options("interval")
+ for (i in seq_len(ani.options("nmax"))) {
+ plot(x, y, xlim = xlim, ylim = ylim, ...)
+ # text(x, y)
+ x = x + rnorm(n)
+ y = y + rnorm(n)
+ Sys.sleep(interval)
+ }
+ invisible(NULL)
+ }
> # Change options from default (interval = 1, nmax = 50)
> oopt = ani.options(interval = 0.05, nmax = 100)
> brownian.motion(pch = 16, cex = 1.5)
> ani.options(oopt) # Restore default options
3.13 Additional plotting packages
A great deal of effort has been put by developers into devising useful functions for
graphing almost any kind of data or functions. We have already mentioned plotrix,
scatterplot3d, rgl, and animation. The graphics task view (http://cran.r-
project.org/web/views/ Graphics.html) summarizes the many other packages avail-
able. In particular, the lattice package (included in the R installation) and the
ggplot2 package provide important facilities for producing more complex graphics
than we have considered here.
Programming and Functions
If you continue a command over more than one line, be sure to break it so as to yield
an incomplete command. R will then insert a + at the beginning of the next line.
In simple cases, the if-else construction can be written on a single line:
> x = -1
> if (x > 0) 1 else 0
[1] 0
Likewise for more than two choices, with else if:
> x = 0
> if (x < 0) 0 else if (x == 0) 0.5 else 1
[1] 0.5
A conditional construction may be used inside another construction, e.g.,
> x = if(y > 0) pi else pi/2
To apply conditional execution to each element of a vector, use the function
ifelse:
> set.seed(333)
> x = round(rnorm(10),2)
> y = ifelse(x>0, 1, -1)
> x
[1] -0.08 1.93 -2.05 0.28 -1.53 -0.27 1.23 0.63 0.35 -0.56
> y
[1] -1 1 -1 1 -1 -1 1 1 1 -1
R also has a switch function, switch(EXPR, cases), which evaluates EXPR and
accordingly chooses one of the cases.
> set.seed(123)
> x = rnorm(10, 2, 0.5)
> y = 3
> switch(y, mean(x), median(x), sd(x))
[1] 0.476892
4.2 Loops
Computations often must repeat certain steps either a given number of times, or until
some condition is met. R, like other programming languages, has looping functions
to deal with such situations, though it is often possible, and usually desirable,
to avoid loops by taking advantage of vectorization.
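As a simple illustration (not from the text), compare an explicit loop with its
vectorized equivalent:
> total = 0
> for (i in 1:10) total = total + i^2 # accumulate the sum of squares
> total
[1] 385
> sum((1:10)^2) # the vectorized equivalent
[1] 385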
Figure 4.1: Simulation of radioactive decay using Euler's method.
Figure 4.2: Overlay of gaussian(x,0,1) (solid line), dnorm (points), and
lorentzian(x,0,1) (dotted line) functions.
As an example of a while loop, the following function computes sqrt(a) by Newton's
iteration (the defaults shown here for tol and max.iter are assumed):
> sqrt_N = function(a, tol = 1e-10, max.iter = 50) {
+ x = a
+ iter = 0
+ xdiff = Inf
+ while (xdiff > tol) {
+ iter = iter + 1
+ xold = x
+ x = (x + a/x)/2
+ xdiff = abs(x-xold)/abs(x)
+ if (iter > max.iter) {print(paste("Not converged after",
+ iter,"cycles."))
+ break
+ }}
+ return(print(paste("sqrt(",a,")=",x,", xdiff=", xdiff,",
+ iterations=",iter))) }
> sqrt_N(52.3)
[1] "sqrt( 52.3 )= 7.23187389270582 , xdiff= 0 , iterations= 9"
The Gaussian and Lorentzian functions are frequently encountered in spectroscopy
and other scientific areas. gauss(x,0,1) is equivalent to dnorm, the probability
density function for the normal distribution built into R (Figure 4.2).
> gauss = function(x,x0,sig) {1/(sig*sqrt(2*pi))*exp(-(x-x0)^2/(2*sig^2))}
> lorentz = function(x,x0,w) {w/pi/((x-x0)^2 + w^2)}
>
> curve(gauss(x,0,1), -5,5, ylab = "f(x)", main = "Distributions")
> curve(dnorm,-5,5,type="p", add=T)
> curve(lorentz(x,0,1), xlim = c(-5,5), lty = 2, add=T)
We shall construct and use many functions throughout this book. Consider, for
example, this variation on the one-dimensional random walk theme. Define the function
randwalk(N), in which we start at 0 and generate N steps of unit length
taken randomly to either the left or the right, picking the direction of each step
with equal probability, as sketched below.
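A minimal sketch of such a function, using sample() to choose the steps (an
illustration of the idea; the details of the original implementation are assumed):
> randwalk = function(N) {
+ # N steps of +1 or -1, chosen with equal probability, starting from 0;
+ # return the final displacement
+ sum(sample(c(-1,1), N, replace=TRUE))
+ }
> multiwalks = replicate(100, randwalk(200)) # 100 walks; step count arbitrary
> hist(multiwalks)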
Figure 4.3: Histogram of displacements of 100 one-dimensional random walks.
4.4 Debugging
Things can often go wrong when writing functions or other code. Indeed, it may
frequently be the case that more time will be spent on debugging than on writing the
original code. Therefore, methods for tracking and correcting errors are important.
Perhaps the simplest method is to insert print() or cat() statements at intermediate
points in the program. For example, here is code for a function that takes a vector
of numbers x, squares each element, subtracts 4, and then takes the natural logarithm
of 4 less than that intermediate result. Of course, if the x value is too small, the final
operation will be taking the log of zero or a negative number, which gives -Inf or
NaN. We put a print statement in the function to see whether such numbers appear
before logs are taken.
> f1 = function(x) {
+ xsq = x^2
+ xsqminus4 = xsq - 4; print(xsqminus4)
+ log(xsqminus4-4)
+ }
> f1(6:1)
[1] 32 21 12 5 0 -3
[1] 3.332205 2.833213 2.079442 0.000000 NaN NaN
Warning message:
In log(xsqminus4 - 4) : NaNs produced
Alternatively, we can omit the print statement in the function, and use the
debug function to step through the program one instruction at a time, invoking print
from the Browser prompt to see intermediate results where we suspect a problem
may arise.
> debug(f1)
> f1(1:6)
debugging in: f1(1:6)
debug at #1: {
xsq = x^2
xsqminus4 = xsq - 4
log(xsqminus4 - 4)
}
Browse[2]>
debug at #2: xsq = x^2
Browse[2]>
debug at #3: xsqminus4 = xsq - 4
Browse[2]>
debug at #4: log(xsqminus4 - 4)
Browse[2]> print(xsqminus4-4)
[1] -7 -4 1 8 17 28
Browse[2]>
exiting from: f1(1:6)
[1] NaN NaN 0.000000 2.079442 2.833213 3.332205
Warning message:
In log(xsqminus4 - 4) : NaNs produced
To end debugging, type undebug(f1). Type ?debug for more information.
Figure 4.4: Bessel functions J(x,0) and J(x,1).
Figure 4.5: Laguerre polynomials.
Figure 4.6: Fresnel sine and cosine integrals.
> require(pracma)
> fS = function(x) fresnelS(x)
> fC = function(x) fresnelC(x)
> curve(fS,0,5,ylim=c(0,0.8),ylab="Fresnel integrals")
> curve(fC,0,5,add=T,lty=3)
> legend("bottomright",legend=c("fS","fC"),lty=c(1,3), bty="n")
There are several equivalent definitions of the Fresnel integrals, differing in nor-
malization and scaling: pracma uses
Z x Z x
2
S(x) = sin t dt, C(x) = cos t 2 dt (4.2)
0 2 0 2
Other definitions (Abramowitz and Stegun, 1965, p. 300) are
r Z r Z
2 x 2 2 x
S1 (x) = sin(t ) dt, C1 (x) = cos(t 2 ) dt (4.3)
0 0
and Z x Z x
1 sin(t) 1 cos(t)
S2 (x) = dt, C2 (x) = dt (4.4)
2 0 t 2 0 t
Relations between these definitions are
r
S(x) = S1 x = S2 x2 (4.5)
2 2
with similar equations for the cosine integral. We shall use S2 and C2 in a calculation
at the end of this chapter.
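Since S2 and C2 will be needed in that calculation, note that relation (4.5) lets us build them from pracma's fresnelS() and fresnelC(). A minimal sketch (our construction), checking S2 against a direct numerical integration of definition (4.4):
> require(pracma)
> S2 = function(x) fresnelS(sqrt(2*x/pi))  # from S(x) = S2(pi*x^2/2)
> S2(2)
> # compare with definition (4.4) integrated directly; the values should agree
> integrate(function(t) sin(t)/sqrt(t), 0, 2)$value/sqrt(2*pi)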
The specfun package is another package devoted to special functions in mathe-
matics and physics. Functions included are:
Gamma, Beta, Airy, and Psi functions
Legendre functions of first and second kind
Bessel and modified Bessel functions, spherical Bessel functions, and integrals of
Bessel functions
Kelvin, Struve, and Mathieu functions
Hypergeometric (and confluent hypergeometric) functions
Parabolic cylinder functions
Spheroidal wave functions
Error functions and Fresnel integrals
Elliptic integrals and Jacobian elliptic functions
Cosine and sine integrals
Exponential integrals
One aspect of this package is that all functions, where appropriate, accept com-
plex arguments, whereas most special functions in R and other packages only work
for real arguments.
Because names like beta or gamma are so often used in R, all function names in specfun are prefixed with sp. to avoid name clashes. For example, the Beta and Gamma functions are called sp.beta() and sp.gamma(), respectively.
Figure 4.7: Plot of a polynomial and its first derivative.
Figure 4.8: Fitting data to a polynomial with poly.calc.
We can use recursion relations to generate higher order orthogonal polynomi-
als from the zeroth and first terms. (Note that the nth item in the list below is
the (n-1)st order polynomial.) We use the Hermite polynomial example from the
PolynomF.pdf reference manual, denoting them as He(n) to differentiate them
from the differently scaled Hermite polynomials H(n) used by physicists (e.g.,
Abramowitz and Stegun, 1965):
> x = polynom()
> He = polylist(1, x)
> for(j in 2:10) He[[j+1]] = x*He[[j]] - (j-1)*He[[j-1]]
> He
List of polynomials:
[[1]]
1
[[2]]
x
[[3]]
-1 + x^2
[[4]]
-3*x + x^3
[[5]]
3 - 6*x^2 + x^4
[[6]]
15*x - 10*x^3 + x^5
[[7]]
-15 + 45*x^2 - 15*x^4 + x^6
[[8]]
-105*x + 105*x^3 - 21*x^5 + x^7
[[9]]
105 - 420*x^2 + 210*x^4 - 28*x^6 + x^8
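The plotting code for Figure 4.9 is not shown; a sketch that reproduces it, converting the polynomial to an ordinary function with as.function() (He[[6]] is the 5th-order polynomial):
> He5 = as.function(He[[6]])
> curve(He5(x), -2, 2, ylab = "He5(x)")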
Figure 4.9: Plot of 5th Hermite polynomial.
Figure 4.10: Normalized associated Laguerre polynomials used to calculate the electron den-
sities of the 2s and 2p orbitals of the hydrogen atom.
Figure 4.12: Comparison of S2 and S function definitions for Fresnel sine integral.
$$G_{lm} = 2\pi(-1)^{(m-l+2)/2}\,\frac{m^2}{l^2-m^2}\Big[l^{1/2}S_2(\pi l) - m^{1/2}S_2(\pi m)\Big], \quad l \neq 0, m, \qquad (4.7)$$
$$G_{kk} = \frac{\pi k^{1/2}}{2}\Big[2\pi k\,C_2(\pi k) - S_2(\pi k)\Big]. \qquad (4.8)$$
These equations are implemented in the R code below. Note that the factors of the form (−1)^(m/2+1) will give the result NaN if m is odd and an imaginary part not exactly 0 if m is even. Therefore we have used the formulation Re(round((-1+0i)^((m1/2)+1), 0)) to get the appropriate values of 0 if m is odd and ±1 if m is even. We follow Zimm et al. in calculating only the first eight rows and columns of the matrix, which is sufficient to calculate the lowest frequencies. However, since their matrix indices started at 0, while R's start at 1, we must subtract 1 from the indices before evaluating the elements.
> G = matrix(nrow=8,ncol=8)
> for (m in 1:8) { # First row
+ m1 = m-1
+ v = Re(round((-1+0i)^((m1/2)+1), 0))
+ G[1,m] = 2*pi*m1^(1/2)*v*S2(pi*m1)
+ }
>
> for (m in 1:8) {
+ m1 = m-1
+ for (l in 1:8) {
+ l1 = l-1
+ if (l1 == m1) next
+ v = Re(round((-1+0i)^((m1-l1+2)/2), 0))
+ vw = 2*pi*v*m1^2/(l1^2-m1^2)
+ G[l,m] = vw*(sqrt(l1)*S2(pi*l1)-sqrt(m1)*S2(pi*m1))
+ }
+ }
>
> for (k in 1:8) { # Diagonal
+ k1 = k-1
+ G[k,k] = pi*sqrt(k1)/2*(2*pi*k1*C2(pi*k1)-S2(pi*k1))
+ }
We display the matrix elements rounded to three decimals.
> round(G,3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0 0.000 3.052 0.000 -4.875 0.000 6.284 0.000
[2,] 0 4.098 0.000 2.653 0.000 -4.113 0.000 5.347
[3,] 0 0.000 12.867 0.000 2.432 0.000 -3.637 0.000
[4,] 0 0.295 0.000 24.271 0.000 2.485 0.000 -3.527
[5,] 0 0.000 0.608 0.000 37.914 0.000 2.536 0.000
[6,] 0 -0.165 0.000 0.895 0.000 53.413 0.000 2.632
[7,] 0 0.000 -0.404 0.000 1.127 0.000 70.606 0.000
[8,] 0 0.109 0.000 -0.648 0.000 1.343 0.000 89.314
Finally, we calculate the eigenvalues, round the results to three decimals, and sort
the values in ascending order.
> sort(round(eigen(G)$values,3))
[1] 0.000 4.035 12.779 24.200 37.892 53.412 70.715 89.449
The results agree with those of Zimm et al. to within 1 in the last decimal.
Chapter 5

Solving Systems of Algebraic Equations
Solving equations is central to numerical analysis, and R has numerous tools for
doing so. We begin by extending our consideration of polynomials from the previous
chapter.
There are certain pathological polynomials for which finding roots should be easy, but isn't. The prime example is Wilkinson's polynomial (http://en.wikipedia.org/wiki/Wilkinson's_polynomial; Acton, p. 201):
$$W(x) = \prod_{i=1}^{20}(x-i) = (x-1)(x-2)\cdots(x-20)$$
Obviously the roots are the integers 1 to 20 and they are well separated, but there are limitations in the precision available to the root-solving functions that undermine the process, both with polyroot, which uses the Jenkins-Traub algorithm (usually the most reliable), and with solve, which uses an eigenvalue computation. In this case neither works exactly.
> require(PolynomF)
> x = polynom()
> W=(x-1)
> for (j in 2:20) W = W*(x-j)
> solve(W)
[1] 1.000000 2.000000 3.000000 4.000000 5.000000 6.000000
[7] 6.999973 8.000284 8.998394 10.006060 10.984041 12.033449
[13] 12.949056 14.065273 14.935356 16.048275 16.971132 18.011222
[19] 18.997160 20.000325
Now try with polyroot, after getting the polynomial coefficients with coef(W).
> polyroot(coef(W))
[1] 1.000000+0.000000i 2.000000+0.000000i 3.000000-0.000000i
[4] 4.000000+0.000000i 5.000000-0.000000i 7.000005-0.000014i
[7] 5.999999+0.000002i 9.000227-0.000165i 7.999960+0.000059i
[10] 11.002737-0.000451i 11.993971+0.000451i 9.999080+0.000322i
[13] 13.986821+0.000200i 15.013075-0.000118i 13.010246-0.000335i
[16] 17.005442-0.000049i 17.997884+0.000022i 15.990103+0.000082i
[19] 19.000507-0.000005i 19.999943+0.000000i
Figure 5.1: The function f(x,a) with a = 0.5. Roots are located by the points command once
they have been calculated by uniroot.all.
$estim.prec
[1] 6.103516e-05
In this example, we started the root search in the region c(0.1,1) rather than c(0,1) because the function must be of opposite signs at the beginning and end of the interval. If not, an error message is generated.
> uniroot(f,c(.1,.5),a=0.5)
Error in uniroot(f, c(0.1, 0.5)) :
f() values at end points not of opposite sign
If the function has several zeros in the region of interest, the function
uniroot.all from the package rootSolve (which uses uniroot) should find all
of them, though success is not guaranteed in pathological cases.
> require(rootSolve)
Loading required package: rootSolve
> zpts=uniroot.all(f,c(0,5),a=0.5)
> zpts
[1] 0.00000000 0.06442212 0.53483060 1.36761623 1.76852629
[6] 2.63893168 3.01267402 3.90557382 4.26021380
> yz=rep(0,length(zpts))
> points(zpts,yz) # Locate roots on graph of function
Note that uniroot.all does not provide the information about convergence and
precision that uniroot does. Note also the differences in how to deal with the pa-
rameter a in the calls to curve and in uniroot or uniroot.all.
uniroot will not work if the function only touches, but does not cross, the x
axis, unless one end of the search range is exactly at the root. For example,
> ff = function(x) sin(x)+1
> uniroot(ff,c(-pi,0))
Error in uniroot(ff, c(-pi, 0)) :
f() values at end points not of opposite sign
# but
> uniroot(ff,c(-pi,-pi/2))
$root
[1] -1.570796
$f.root
[1] 0
$iter
[1] 0
$estim.prec
[1] 0
Of course, if the position of the root is already known, there is no need to do the cal-
culation. In general, however, it may be best to seek the minimum of such a function
by procedures discussed later in this book in the chapter on optimization.
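As a preview of that approach (a sketch of ours), optimize() from base R locates the touching point of ff as a minimum:
> optimize(ff, c(-pi, 0))  # minimum at -pi/2, where ff touches zero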
Figure 5.2: Viscosity of water fit to a quadratic in temperature.
[2,] 2 1 1 -1
[3,] 1 -1 2 1
[4,] 1 3 -2 -3
> B = c(-1,4,5,-3)
> solve(A.sing,B)
Error in solve.default(A.sing, B) :
LAPACK routine dgesv: system is exactly singular
> options(digits=3)
> # Solve
> soln = solve(m,d)
> soln
[1] 0.500 0.547 0.589 0.625 0.655 0.678 0.694 0.704
[9] 0.706 0.702 0.690
Now suppose that the set of equations to be solved gets 100 or 1000 times bigger.
On my 2012 laptop, for n = 1001,
user system elapsed
0.331 0.004 0.332
and for n = 10001
user system elapsed
341.84 8.24 371.07
This is close to expected, since the solve algorithm goes as n^3. Now try Solve.tridiag from the limSolve package, whose algorithm goes as n. We need to provide just the diagonal vectors, not the matrix m.
> require(limSolve)
> n = 1001
> set.seed(333)
> n = 500 # 500 x 500 matrix
> # Diagonal
> bb = runif(n)
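A sketch completing this example (the vectors aa, cc, and dd for the sub-diagonal, super-diagonal, and right-hand side are our assumed reconstruction):
> aa = runif(n-1)  # sub-diagonal
> cc = runif(n-1)  # super-diagonal
> dd = runif(n)    # right-hand side
> system.time(soln.tri <- Solve.tridiag(aa, bb, cc, dd))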
5.8.1 QR decomposition
The QR decomposition of an m × n (not necessarily square) matrix factors the matrix into an orthogonal m × m matrix Q and an upper triangular matrix R. It is invoked in the base R installation with qr() and used to solve overdetermined systems in a least-squares sense with qr.solve(), being therefore useful in computing regression coefficients and applying the Newton-Raphson algorithm. In the Matrix package, qr() applied to a sparse matrix of class "dgCMatrix" gives the QR decomposition of a general sparse double-precision matrix.
We give two examples, starting with an overdetermined system with 4 equations
and 3 unknowns.
> set.seed(321)
> A = matrix((1:12)+rnorm(12),nrow=4)
> b = 2:5
> qr.solve(A,b) # Solution in a least-squares sense
[1] 0.625 1.088 -0.504
The QR decomposition of A, itself, is simply obtained by
> qr(A)
$qr
[,1] [,2] [,3]
[1,] -5.607 -13.2403 -21.515
[2,] 0.230 -3.9049 -4.761
[3,] 0.485 0.4595 1.228
[4,] 0.692 -0.0574 0.515
$rank
[1] 3
$qraux
[1] 1.48 1.89 1.86
$pivot
[1] 1 2 3
attr(,"class")
[1] "qr"
If, on the other hand, there are 3 equations and 4 unknowns, we have an under-
determined system.
> set.seed(321)
> A = matrix((1:12)+rnorm(12),nrow=3)
> b = 3:5
$u
[,1] [,2] [,3] [,4] [,5]
[1,] -0.217 -0.4632 0.4614 0.164 0.675
[2,] -0.154 -0.5416 0.0168 -0.528 -0.444
[3,] 0.538 -0.1533 0.5983 -0.290 -0.124
[4,] 0.574 -0.5585 -0.5013 0.319 0.070
[5,] 0.547 0.3937 0.0449 -0.261 0.285
[6,] 0.104 0.0404 0.4190 0.664 -0.496
$v
[,1] [,2] [,3] [,4] [,5]
[1,] 0.459 -0.0047 0.712 -0.159 0.507
[2,] -0.115 -0.5192 -0.028 0.758 0.377
[3,] 0.279 0.7350 -0.355 0.352 0.363
[4,] 0.333 -0.4023 -0.604 -0.448 0.402
[5,] -0.766 0.1684 0.039 -0.275 0.554
An interesting and insightful article about the geometric interpretation of the
SVD in terms of linear transformations, its theory, and some applications, is A Sin-
gularly Valuable Decomposition: The SVD of a Matrix by Dan Kalman.1
5.8.3 Eigendecomposition
The familiar process of finding the eigenvalues and eigenvectors of a square matrix can be viewed as eigendecomposition. It factors the matrix into $VDV^{-1}$, where D is a diagonal matrix formed from the eigenvalues, and the columns of V are the corresponding eigenvectors.
A familiar example from physics textbooks is a system of 3 masses of mass m attached to parallel walls by 4 springs of force constant k. Analysis of this system (e.g., Garcia, 2000, pp. 164-5) leads to the matrix equation
$$\begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}\mathbf{a} = \lambda\,\mathbf{a} \qquad (5.2)$$
We wish to solve this equation for the eigenvalues $\lambda = m\omega^2/k$, leading to the characteristic frequencies $\omega$, and for the eigenvectors $\mathbf{a}$. The analytical solutions, readily obtained in this simple case, are $\lambda = 2,\ 2+\sqrt{2},\ 2-\sqrt{2}$, with eigenvectors
$$\mathbf{a}_0 = \begin{pmatrix} 1/\sqrt{2} \\ 0 \\ -1/\sqrt{2} \end{pmatrix}, \qquad \mathbf{a}_\pm = \begin{pmatrix} 1/2 \\ \mp 1/\sqrt{2} \\ 1/2 \end{pmatrix} \qquad (5.3)$$
These results agree with the numerical values obtained by the R code
> options(digits=3)
> M = matrix(c(2,-1,0,-1,2,-1,0,-1,2), nrow=3, byrow=TRUE)
> eigen(M)
$values
[1] 3.414 2.000 0.586
$vectors
[,1] [,2] [,3]
[1,] -0.500 -7.07e-01 0.500
[2,] 0.707 1.10e-15 0.707
[3,] -0.500 7.07e-01 0.500
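As a quick sketch of the factorization at work (our addition):
> e = eigen(M)
> V = e$vectors
> D = diag(e$values)
> V %*% D %*% solve(V)  # should reproduce M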
5.8.4 LU decomposition
The LU decomposition factors a square matrix into a lower triangular matrix L
and an upper triangular matrix U. It can be called from the Matrix package with
the function lu(). LU decomposition is commonly used to solve square systems
1 www.math.umn.edu/~lerman/math5467/svd.pdf
of linear equations, since it is about twice as fast as QR decomposition. Here is an
example from the LU (dense) Matrix Decomposition help page.
> options(digits=3)
> set.seed(1)
> require(Matrix)
> mm = Matrix(round(rnorm(9),2), nrow = 3)
> mm
3 x 3 Matrix of class "dgeMatrix"
[,1] [,2] [,3]
[1,] -0.63 1.60 0.49
[2,] 0.18 0.33 0.74
[3,] -0.84 -0.82 0.58
> lum = lu(mm)
> str(lum)
Formal class denseLU [package "Matrix"] with 3 slots
..@ x : num [1:9] -0.84 0.75 -0.214 -0.82 2.215 ...
..@ perm: int [1:3] 3 3 3
..@ Dim : int [1:2] 3 3
> elu = expand(lum)
> elu # three components: "L", "U", and "P", the permutation
$L
3 x 3 Matrix of class "dtrMatrix" (unitriangular)
[,1] [,2] [,3]
[1,] 1.0000 . .
[2,] 0.7500 1.0000 .
[3,] -0.2143 0.0697 1.0000
$U
3 x 3 Matrix of class "dtrMatrix"
[,1] [,2] [,3]
[1,] -0.840 -0.820 0.580
[2,] . 2.215 0.055
[3,] . . 0.860
$P
3 x 3 sparse Matrix of class "pMatrix"
[1,] . | .
[2,] . . |
[3,] | . .
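A one-line check (assuming the convention mm = P %*% L %*% U used on that help page) multiplies the factors back together:
> with(elu, P %*% L %*% U)  # should reproduce mm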
$$s^3 - 3s^2 + 4\rho = 0 \qquad (5.4)$$
which arises when Archimedes' principle is used to calculate the ratio s of the height submerged to the radius of a sphere in a fluid, where the ratio of sphere density to fluid density is ρ. Suppose we want to use this equation to calculate the fraction of the height of an iceberg (modeled as a sphere) that is submerged in water just above freezing. The density ratio of ice to water near 0 °C is about 0.96. We plot the equation and find that the physically sensible root is a little below 2. (The maximum ratio of depth to diameter is 1, so the maximum ratio of depth to radius is 2.)
Figure 5.3: Plot of the lhs of Equation 5.4.
> fs = function(s) s^3 - 3*s^2 + 4*rho
> rho = 0.96
> curve(fs(x),0,3); abline(h=0)
Thus we search for roots between 1.5 and 2.5. (See Figure 5.3)
> options(digits=3)
> multiroot(fs, c(1.5,2.5))
$root
[1] 1.76 2.22
$f.root
[1] 1.79e-09 6.45e-07
$iter
[1] 5
$estim.precis
[1] 3.24e-07
This confirms the common estimate that about 7/8 of the height of an iceberg is under
water.
Next we consider the set of two simultaneous equations
$$10x_1 + 3x_2^2 - 3 = 0$$
$$x_1^2 - e^{x_2} - 2 = 0 \qquad (5.5)$$
We first use multiroot without an explicit Jacobian, so that the function does
the Jacobian calculation internally.
> require(rootSolve)
> model = function(x) c(F1 = 10*x[1]+3*x[2]^2-3,
F2 = x[1]^2 -exp(x[2]) -2)
> (ss = multiroot(model,c(1,1)))
$root
[1] -1.445552 -2.412158
$f.root
F1 F2
5.117684e-12 -6.084022e-14
$iter
[1] 10
$estim.precis
[1] 2.589262e-12
Providing an analytical Jacobian may provide a more quickly converging solu-
tion, but not always, as seen here.
> model = function(x) c(F1 = 10*x[1]+3*x[2]^2-3,
F2 = x[1]^2 -exp(x[2]) -2)
> derivs = function(x) matrix(c(10,6*x[2],2*x[1],
-exp(x[2])),nrow=2,byrow=T)
> (ssJ = multiroot(model,c(0,0),jacfunc = derivs))
$root
[1] -1.445552 -2.412158
$f.root
1.166651e-09 -1.390243e-11
$iter
[1] 29
$estim.precis
[1] 5.902766e-10
The help page explains how various convergence tolerances may be adjusted if the
defaults are inadequate.
The rootSolve package has a variety of related functions, largely devoted to
obtaining steady-state solutions to systems of ordinary and partial differential equa-
tions. We shall return to it later in this book. The package vignette at http://cran.r-project.org/web/packages/rootSolve/vignettes/rootSolve.pdf is a valuable resource and should be consulted for more information.
5.9.2 nleqslv
Another nonlinear equation solver is nleqslv in the package of the same name. The package description states: "Solve a system of non linear equations using a Broyden or a Newton method with a choice of global strategies such as linesearch and trust region. There are options for using a numerical or an analytical Jacobian and fixed or automatic scaling of parameters."
After installing the package with install.packages("nleqslv"), we load it
and apply it to the same function we used with multiroot:
> install.packages("nleqslv")
> require(nleqslv)
> model = function(x) {
+ y = numeric(2)
+ y[1] = 10*x[1]+3*x[2]^2-3
+ y[2] = x[1]^2 -exp(x[2]) -2
+ y
+ }
> (ss = nleqslv(c(1,1), model))
$x
[1] -1.445552 -2.412158
$fvec
[1] 3.592504e-11 -1.544165e-11
$termcd
[1] 1
$message
[1] "Function criterion near zero"
$scalex
[1] 1 1
$nfcnt
[1] 22
$njcnt
[1] 1
$iter
[1] 18
Consult the help page for nleqslv to learn about its numerous options and to
see more examples.
$residual
[1] 2.15111e-08
$fn.reduction
[1] 10.66891
$feval
[1] 103
$iter
[1] 74
$convergence
[1] 0
$message
[1] "Successful convergence"
BBsolve() is a wrapper around dfsane() that automatically uses sequential strategies, detailed on its help page, in cases where there are difficulties with convergence. With the BBsolve() wrapper:
> ans = BBsolve(par=c(1,1), fn=model)
Successful convergence.
> ans
$par
F1 F2
-1.445552 -2.412158
$residual
[1] 7.036056e-08
$fn.reduction
[1] 0.0048782
$feval
[1] 174
$iter
[1] 60
$convergence
[1] 0
$message
[1] "Successful convergence"
$cpar
method M NM
2 50 1
Here is an example where dfsane() doesn't converge but BBsolve() does,
because it switches to a different method.
> froth = function(p){
+ f = rep(NA,length(p))
+ f[1] = -13 + p[1] + (p[2]*(5 - p[2]) - 2) * p[2]
+ f[2] = -29 + p[1] + (p[2]*(1 + p[2]) - 14) * p[2]
+ f
+ }
> p0 = c(3,2)
> BBsolve(par=p0, fn=froth)
Successful convergence.
$par
[1] 5 4
$residual
[1] 3.659749e-10
$fn.reduction
[1] 0.001827326
$feval
[1] 100
$iter
[1] 10
$convergence
[1] 0
$message
[1] "Successful convergence"
$cpar
method M NM
2 50 1
By contrast, dfsane() alone on this problem stops without converging:
$residual
[1] 11.63811
$fn.reduction
[1] 25.58882
$feval
[1] 137
$iter
[1] 114
$convergence
[1] 5
$message
[1] "Lack of improvement in objective function"
In other words, OD(x) is the dot product of the vector A(x) with the vector C, which
in R notation is written A%*%C.
If we know the concentrations (ultimately we will pretend we do not know them
and will solve for them) we can calculate and plot the spectrum of the mixture as
follows:
> x = 180:400 # Plot spectrum between 180 nm and 400 nm
> A = matrix(nrow = length(x), ncol = 4) # Initialize A matrix
> # Calculate Ais at each wavelength
> for (i in 1:length(x)) {
+ xi = x[i]
+ A[i,1] = A1(xi)
+ A[i,2] = A2(xi)
+ A[i,3] = A3(xi)
+ A[i,4] = A4(xi)
+ }
> conc = c(7,5,8,2)*1e-3 # Vector of concs (molar)
> OD = A%*%conc # Multiply A matrix into conc vector
> plot(x, OD, type="l")
Not knowing the concentrations and wishing to determine them, a chemist would
choose at least four wavelengths at which to measure the OD. For example,
> x.meas = c(220,250,280,310)
We calculate the extinction coefficient matrix for the four compounds at the four
wavelengths (Figure 5.4):
> A.meas = matrix(nrow = length(x.meas), ncol = 4)
> for (i in 1:length(x.meas)) {
+ A.meas[i,1] = A1(x.meas[i])
+ A.meas[i,2] = A2(x.meas[i])
+ A.meas[i,3] = A3(x.meas[i])
+ A.meas[i,4] = A4(x.meas[i])
+ }
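To complete the calculation, a sketch (our reconstruction of the final step): simulate the measured ODs from the known concentrations, then solve the resulting 4 × 4 linear system for the concentration vector:
> OD.meas = A.meas %*% conc  # simulated measurements at the four wavelengths
> solve(A.meas, OD.meas)     # should recover c(0.007, 0.005, 0.008, 0.002)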
Figure 5.4: Simulated spectrum of 4-component mixture.
an equation that holds for all gases when expressed in terms of reduced variables.
With some algebraic manipulation we can write Equation 5.10 as a cubic equation in the reduced volume:
$$V_r^3 - \frac{1}{3}\Big(1 + \frac{8T_r}{P_r}\Big)V_r^2 + \frac{3}{P_r}V_r - \frac{1}{P_r} = 0 \qquad (5.11)$$
We now use R's polyroot() function to solve for the real roots of this equation
to construct a plot of Vr as a function of Pr at a given Tr that shows the special
behavior of a van der Waals gas near its critical point. First we look at just a single
point to understand the nature of the roots.
> Tr = 0.95
> Pr = 1.5
> # Write expressions for the coefficients in the cubic
> c0 = -1/Pr
> c1 = 3/Pr
> c2 = -1/3*(1+8*Tr/Pr)
> c3 = 1
> (prc = polyroot(c(c0,c1,c2,c3)))
[1] 0.5677520-0.0000000i 0.7272351+0.8033372i 0.7272351-0.8033372i
We see that there are three roots, as there should be for a cubic equation. It's the one with imaginary part equal to zero that we want, so our code has to have a way to pick that root. Since the roots are calculated numerically, the logical test Im(prc) == 0 will likely fail. Also, it is found that the test all.equal(Im(prc),0) sometimes fails, suggesting that although the imaginary part of the root is displayed as 0 to seven decimal places, it may be larger than .Machine$double.eps^0.5, the tolerance that all.equal() uses. Therefore, we use abs(Im(prc)) <= 1e-12 as a heuristic test for a zero imaginary part, with the confidence that only one of the three roots will pass the test. These considerations lead to the following code.
> Tr = 0.95 # Temperature below the critical point
> pr = seq(0.5,3,by = 0.01) # From relatively dilute to compressed
> npr = length(pr)
> Vr = numeric(npr)
> for( i in 1:npr) {
+ Pr = pr[i]
+ c0 = -1/Pr
+ c1 = 3/Pr
+ c2 = -1/3*(1+8*Tr/Pr)
+ c3 = 1
+ prc = polyroot(c(c0,c1,c2,c3))
+ for (j in 1:3) if (abs(Im(prc[j])) <= 1e-12) Vr[i] = Re(prc[j])
+ }
Figure 5.5: Plots of reduced pressure vs. reduced volume below (points) and above (line) the
critical temperature.
We need to solve these equations for the ri , and use the results in Equations
5.12 to calculate the equilibrium mole fractions. As they stand, Equations 5.14 are
very difficult to solve numerically with any of the functions we have examined in
this chapter, unless the starting guesses are unrealistically close to the true values.
However, if the denominators are cleared, the difficulties largely disappear. Thus we
use the following code to solve for the reaction coordinate values at equilibrium,
choosing nleqslv as our solver.
> require(nleqslv)
> model = function(r) {
+ FX = numeric(4)
+ r1 = r[1]; r2 = r[2]; r3 = r[3]; r4 = r[4]
+ ntot = 4-2*r1+r3-4*r4
+ FX[1] = r1*(r1-r2+2*r4)*ntot^2-69.18*(3-3*r1+r2-5*r4)^3*
(1-r1-r2+2*r3-2*r4)
+ FX[2] = (r2-r3)*(3-3*r1+r2-5*r4)-4.68*(1-r1-r2+2*r3-2*r4)*
(r1-r2+2*r4)
+ FX[3] = (1-r1-r2+2*r3-2*r4)^2-0.0056*(r2-r3)*ntot
+ FX[4] = r4*(r1-r2+2*r4)^2*ntot^4-0.141*(3-3*r1+r2-5*r4)^5*
(1-r1-r2+2*r3-2*r4)^2
+ FX
+ }
> # For initial guess, set all r equal
> (ss = nleqslv(c(.25,.25,.25,.25), model))
$x
[1] 6.816039e-01 1.589615e-02 -1.287031e-01 1.409549e-05
$fvec
[1] 1.015055e-08 -3.922078e-10 6.236144e-12 -9.467267e-09
$termcd
[1] 2
$message
[1] "x-values within tolerance xtol"
$scalex
[1] 1 1 1 1
$nfcnt
[1] 78
$njcnt
[1] 2
$iter
[1] 63
Note that the equilibrium value for r3 is negative, as it should be since solid carbon
is consumed in the reaction.
We now set the r vector equal to the results ss$x and calculate the equilibrium
mole fractions.
> r = ss$x
> ntot = 4-2*r[1]+r[3]-4*r[4]
> X = numeric(6)
> X[1] = (3-3*r[1]+r[2]-5*r[4])/ntot
> X[2] = (1-r[1]-r[2]+2*r[3]-2*r[4])/ntot
> X[3] = r[1]/ntot
> X[4] = (r[1] - r[2] + 2*r[4])/ntot
> X[5] = (r[2] - r[3])/ntot
> X[6] = r[4]/ntot
> X
[1] 3.871616e-01 1.796845e-02 2.717684e-01 2.654415e-01
[5] 5.765447e-02 5.620138e-06
Chapter 6

Numerical Differentiation and Integration
which is also numerically more accurate, with error O(h^2) as h gets small:
> (f(x0+h) - f(x0-h))/(2*h)
[1] 0.1635973
It might be tempting, therefore, to make h as small as possible, e.g., as small as the machine accuracy eps = .Machine$double.eps of about 2.2 × 10^-16, to minimize the error in the derivative. However, as the following numerical experiment shows, this strategy fails. Choose h to be 10^-i for i in 1 . . . 16 and plot the error as a function of i (Figure 6.1).
Figure 6.1: log10 of the error in the numerical derivative as a function of −log10(h).
6.1.1.2 diff()
diff() in base R is not a derivative function, but rather a function that returns lagged
differences between entries in a vector. For central estimation use lag = 2. For ex-
ample
> xfun = function(x0,h) seq(x0-h,x0+h,h)
> diff(f(xfun(x0,h)), lag = 2)/(2*h)
[1] 0.163597
6.1.2 Numerical differentiation using the numDeriv package
The standard R package for calculating numerical approximations to derivatives is
numDeriv. It is used, for example, in all the standard optimization packages dis-
cussed in the next chapter. numDeriv contains functions to accurately calculate first
derivatives (gradients and Jacobians) and second derivatives (Hessians). In searching
for an optimum of a multivariate function, the gradient gives the direction of steepest
descent (or ascent) and the Hessian gives the local curvature of the surface.
The usage for each of these functions is
dfun(func, x, method, method.args, ...)
where dfun is one of grad, jacobian, or hessian, func is the function to be
differentiated, method is one of "Richardson" (the default), "simple" (not sup-
ported for hessian), or "complex", indicating the method to be used for the ap-
proximation. method.args are the argumentstolerances, number of repetitions,
etc.passed to the method (see the help page for grad for the most complete dis-
cussion of the details), and ... stands for any additional arguments to be passed to
func. In the examples following, we will use the defaults.
With method="simple", these functions calculate forward derivatives with step
size 104 , both choices that we already know are not optimal. Applying this method
to the function above yields
> require(numDeriv)
> options(digits=16)
> grad(f, 1, method = "simple")
[1] 0.1636540038633105 # error: 5.7e-5
With the default method="Richardson", Richardson's extrapolation scheme is applied, a method for accelerating a sequence of related computations. The help page says: "This method should be used if accuracy, as opposed to speed, is important."
> grad(f, 1, method = "Richardson")
[1] 0.1635973483989158 # error: 8.4e-13
Method "complex" refers to the quite recent complex-step derivative approach
and can be applied to complex-differentiable (i.e., analytic) functions that satisfy
the conditions that x0 and f(x0) are real. Then the complex step method computes
derivatives to the same accuracy as the function itself. Almost all special functions
available in R are complex-differentiable. Therefore, this method can be applied to
the function above, returning the derivative to 16 digits, and with no loss in speed
compared to method "simple":
> grad(f, 1, method = "complex")
[1] 0.1635973483980761 # error: < 1e-15
One has to be careful with self-defined functions. Normally, the complex-step
approach only works for functions composed of basic special functions defined in R.
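The mechanism is easy to demonstrate by hand: for an analytic function with real x0, f(x0 + ih) ≈ f(x0) + ih f'(x0), so Im(f(x0 + ih))/h estimates the derivative with no subtractive cancellation. A sketch (ours), assuming f is the test function x^3 sin(x/3) log(sqrt(x)) used above:
> f = function(x) x^3 * sin(x/3) * log(sqrt(x))  # assumed test function
> h = 1e-20
> Im(f(1 + 1i*h))/h  # derivative at x0 = 1, accurate to machine precision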
6.1.2.1 grad()
To illustrate the use of grad() for multivariate functions, we consider the scalar
function of three variables
$$f(x, y, z) = 2x + 3y^2 - \sin(z). \qquad (6.3)$$
The gradient, as commonly defined in Cartesian coordinates, is the vector
$$\nabla f = \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j} + \frac{\partial f}{\partial z}\,\mathbf{k} \qquad (6.4)$$
where i, j, k are unit vectors along the x, y, z axes. For the function f defined in Equation 6.3, the gradient is therefore
$$\nabla f = 2\,\mathbf{i} + 6y\,\mathbf{j} - \cos(z)\,\mathbf{k} \qquad (6.5)$$
We obtain the same result for particular numerical values of x, y, z = c(1,1,0) using
the grad() function as follows.
> require(numDeriv)
> f = function(u){
+ x = u[1]; y = u[2]; z = u[3]
+ return(2*x + 3*y^2 - sin(z))
+ }
> grad(f,c(1,1,0))
[1] 2 6 -1
6.1.2.2 jacobian()
The Jacobian matrix J of a vector function F(x) is the matrix of all first-order partial
derivatives of F with respect to the components of x. For a 2 2 system,
$$J = \begin{pmatrix} \dfrac{\partial F_1}{\partial x_1} & \dfrac{\partial F_1}{\partial x_2} \\[1ex] \dfrac{\partial F_2}{\partial x_1} & \dfrac{\partial F_2}{\partial x_2} \end{pmatrix} \qquad (6.6)$$
> zapsmall(hessian(f,c(1,1,0)))
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 6 0
[3,] 0 0 0
That is, as can be seen by inspection of Equation 6.5, all entries in the hessian matrix
for f at the given point are 0 except for the [2,2] entry.
6.1.3.1 fderiv()
The fderiv() function enables numerical differentiation of functions from first to higher orders. Note that numerical derivatives get less accurate the higher the order, but derivatives up to the eighth order seem to be possible without problems. To obtain
the nth derivative of a function f at a vector of points x, the usage with defaults is
fderiv(f, x, n = 1, h = 0, method="central", ...)
where h is the step size, set automatically if h = 0. Optimal step sizes for various
orders of derivative are given in the help page. The central method should be used
unless the function can be evaluated only on the right side (forward) or the left side
(backward). As usual, . . . stands for additional variables to be passed to f. An exam-
ple of usage:
> require(pracma)
> f = function(x) x^3 * sin(x/3) * log(sqrt(x))
> x = 1:4
> fderiv(f,x) # 1st derivative at 4 points
[1] 0.1635973 4.5347814 18.9378217 43.5914029
> fderiv(f,x,n=2,h=1e-5) # 2nd derivative at 4 points
[1] 1.132972 8.699867 20.207551 27.569698
6.1.3.4 jacobian()
As noted in the numDeriv section, the Jacobian matrix is the matrix of first derivatives of the components of one vector x with respect to the components of another vector y: ∂x_i/∂y_j. The determinant of this matrix is used as a multiplicative factor when changing variables from x to y when integrating a function over a region within its domain.
Figure 6.2: Electric field of a dipole, plotted using the quiver function.
Here is an example of the Jacobian in transforming from spherical polar to Cartesian coordinates:
> f = function(x) {
+ r = x[1]; theta = x[2]; phi = x[3];
+ return(c(r*sin(theta)*sin(phi), r*sin(theta)*cos(phi),
+ r*cos (theta)))
+ }
> x = c(2, 90*pi/180, 45*pi/180)
> options(digits=4)
> jacobian(f,x)
[,1] [,2] [,3]
[1,] 7.071e-01 0 1.414
[2,] 7.071e-01 0 -1.414
[3,] 6.123e-17 -2 0.000
This matrix accords with the analytical result.
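As a further check (our addition), the determinant of this Jacobian should equal the spherical-coordinate volume-element factor r^2 sin(theta) (here 2^2 sin(pi/2) = 4), up to a sign that depends on the ordering of the coordinates:
> det(jacobian(f, x))  # magnitude r^2*sin(theta) = 4, up to sign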
6.1.3.5 hessian
The hessian() function in pracma behaves just as it does in numDeriv. Here is an
example from the help page for hessian() in the pracma package.
> f = function(u) {
+ x = u[1]; y <- u[2]; z <- u[3]
+ return(x^3 + y^2 + z^2 +12*x*y + 2*z)
+ }
> x0 = c(1,1,1) # Point at which the hessian is calculated
> hessian(f, x0)
[,1] [,2] [,3]
[1,] 6 12 0
[2,] 12 2 0
[3,] 0 0 2
hessian() functions are provided in many R packages. For example, one is
included in the rootSolve package, where it is used in the context of solving differ-
ential equations. However, for stand-alone purposes, numDeriv or pracma are to be
preferred.
6.1.3.6 laplacian()
The Laplacian is a differential operator given by the divergence of the gradient of a function, often denoted by ∇² or Δ. In Cartesian coordinates, the Laplacian is given by the sum of second partial derivatives of the function with respect to x, y, and z:
$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}. \qquad (6.10)$$
pracma numerically calculates this quantity, in as many dimensions as desired, with the laplacian() function. For example, in two dimensions:
> f = function(x) 2/x[1] - 1/x[2]^2
> laplacian(f, c(1,1))
[1] -2
This agrees with the analytical result: ∂²f/∂x² = 4/x³ and ∂²f/∂y² = −6/y⁴, which sum to −2 at (1,1).
> f = function(x) exp(-x)*cos(x)
> q = integrate(f, 0, pi)
> str(q)
List of 5
$ value : num 0.522
$ abs.error : num 7.6e-15
$ subdivisions: int 1
$ message : chr "OK"
$ call : language integrate(f = f, lower = 0, upper = pi)
- attr(*, "class")= chr "integrate"
integrate() returns a list with entries $value for the approximate value of
the integral, and $abs.error the estimated absolute error. Because the known exact value of the integral is (1 + e^{-π})/2, the true absolute error is:
> v = 0.5*(1+exp(-pi))
> abs(q$value - v)
[1] 1.110223e-16
The integrand function needs to be vectorized, otherwise one will get an error
message, e.g., with the following nonnegative function:
> f1 = function(x) max(0, x)
> integrate(f1, -1, 1)
Error in integrate(f1, -1, 1) :
evaluation of function gave a result of wrong length
The reason is that f(c(x1, x2, ...)) is max(0, x1, ...) and not
c(max(0, x1), max(0, x2), ...) as would be expected from a vectorized
function. In this case, the behavior of the function can be remedied by using the
pmax() function, which returns a vector of the maxima of the input values:
> f2 = function(x) pmax(0, x)
> integrate(f2, -1, 1)
0.5 with absolute error < 5.6e-15
In general, the help page suggests to vectorize the function by applying the
Vectorize() function to it.
> f3 = Vectorize(f1)
> integrate(f3, -1, 1)
0.5 with absolute error < 5.6e-15
Sometimes, integrate() has difficulties with highly oscillating functions: one
then sees a message like maximum number of subdivisions reached. It may help to
increase the number of subdivisions, but that is not guaranteed to solve the problem.
It is sometimes recommended to set the number of subdivisions to 500 by default,
anyway.
Note that the true absolute error will not always be smaller than the estimated one; there may be situations where the estimated absolute error is misleadingly small. Consider for example the function
$$f(x) = \frac{x^{1/3}}{1+x}, \qquad (6.11)$$
which has ill-behaved derivatives at the origin.
> f = function(x) x^(1/3)/(1+x)
> curve(f,0,1)
> integrate(f,0,1)
0.4930535 with absolute error < 1.1e-09
Using integrate(), we get the same answer using x or the transformed variable u = x^{1/3} in the integration, but with a considerably smaller estimated absolute error.
> # Now with transformed variable
> fu = function(u) 3*u^3/(1+u^3)
> integrate(fu,0,1)
0.4930535 with absolute error < 1.5e-13
Example Consider the calculation of the mean-square radius of a sphere of
radius R and constant density:
$$\langle R^2\rangle = \frac{1}{R^2}\,\frac{\int_0^R r^4\,dr}{\int_0^R r^2\,dr} \qquad (6.12)$$
The integrals are trivial analytically, and lead to 3/5 as the answer. Numerically,
> f1 = function(r) r^2
> f2 = function(r) r^4
> f = function(R) integrate(f2,0,R)$value/
+ integrate(f1,0,R)$value/R^2
> f(1); f(10); f(100)
[1] 0.6
[1] 0.6
[1] 0.6
and we would get the same result for any value of R.
Example The following exercise displays a combination of numerical differ-
entiation and integration techniques.
Compute the surface area generated by rotating the curve sin(x) from 0 to 2 about the x-axis. The formula for the area of the surface obtained by revolving a curve f from a to b is
$$S_x = 2\pi \int_a^b f(x)\sqrt{1 + f'(x)^2}\, dx. \qquad (6.13)$$
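When a function is known only at a set of equally spaced points, Simpson's rule can be applied directly to the tabulated values. As a plausible setup for the results below (our assumption: the points sample the integrand exp(-x)*cos(x) on [0, pi] used earlier, and simpson() is a Simpson's-rule routine for equally spaced values):
> xs = seq(0, pi, length.out = 101)  # equally spaced sample points
> h = xs[2] - xs[1]                  # spacing
> ys = exp(-xs)*cos(xs)              # function values at the sample points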
> simpson(ys, h)
[1] 0.521607
One may attempt to reconstruct the original function through an approximation of
the discrete points, for example a polynomial or spline approximation. splinefun()
will generate such a function.
> fsp = splinefun(xs, ys)
> integrate(fsp, 0, pi)
0.521607 with absolute error < 6.7e-10
The absolute error concerns the spline approximation, not necessarily the error
compared to the initial, unknown function from which the discrete points are derived.
Still another approach could be to approximate the points with a polynomial
which has the advantage that polynomials can be integrated easily. With pracma
we can do this as follows:
> require(pracma)
> p = polyfit(xs, ys, 6) # fitting polynomial
> q = polyint(p) # anti-derivative
> polyval(q, pi) - polyval(q, 0) # evaluate at endpoints
[1] 0.5216072
Which approach to use depends on the application, e.g., on possible oscillations
or smoothness assumptions about the underlying function.
It can be shown that the optimal abscissas xi for a given n are the roots of the
orthogonal polynomial for the same integral and weighting function. The resulting
approximation to the integral is then exact for polynomials of degree 2n − 1 or less,
and highly accurate for functions that are well approximated by polynomials. In some
common cases, we have
W(x)              interval      polynomial
1                 (−1, 1)       Legendre Pn(x)
(1 − x^2)^(−1/2)  (−1, 1)       Chebyshev Tn(x)
(1 − x^2)^(1/2)   (−1, 1)       Chebyshev Un(x)
e^(−x)            (0, ∞)        Laguerre Ln(x)
e^(−x^2)          (−∞, ∞)       Hermite Hn(x)
If the interval in the first three cases is (a, b) rather than (−1, 1), the scaling transformation
$$\int_a^b f(x)\,dx = \frac{b-a}{2}\int_{-1}^{1} f\Big(\frac{b-a}{2}\,x + \frac{b+a}{2}\Big)\,dx \qquad (6.15)$$
accomplishes the change.
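pracma's gaussLegendre(), used below, performs this scaling automatically. As a sketch of Equation 6.15 in action (our example), we integrate exp(-x)*cos(x) on [0, pi], whose exact value (1 + exp(-pi))/2 appeared earlier:
> require(pracma)
> gl = gaussLegendre(16, 0, pi)       # nodes and weights scaled to [0, pi]
> sum(gl$w * exp(-gl$x) * cos(gl$x))  # compare with 0.5*(1 + exp(-pi))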
Package gaussquad encompasses a collection of functions for Gaussian quadrature. For example, function legendre.quadrature.rules() will return the nodes and weights for performing Gauss-Legendre quadrature on the interval [−1, 1].
> library(gaussquad)
> legendre.quadrature.rules(4)
[[1]]
x w
1 0 2
[[2]]
x w
1 0.5773503 1
2 -0.5773503 1
[[3]]
x w
1 7.745967e-01 0.5555556
2 7.771561e-16 0.8888889
3 -7.745967e-01 0.5555556
[[4]]
x w
1 0.8611363 0.3478548
2 0.3399810 0.6521452
3 -0.3399810 0.6521452
4 -0.8611363 0.3478548
Compute the integral of f(x) = x^6 on [−1, 1] with Legendre nodes and weights of order 4:
> f = function(x) x^6
> Lq = legendre.quadrature.rules(4)[[4]] # Legendre of order 4
> xi = Lq$x; wi = Lq$w # nodes and weights
> sum(wi * f(xi)) # quadrature
[1] 0.2857143
and this is exactly 2/7, the value of integrating x^6 from −1 to 1. One can also directly calculate this integral with legendre.quadrature():
> legendre.quadrature(f, Lq, lower = -1, upper = 1)
[1] 0.2857143
In pracma there is a gaussLegendre() function available. It takes as argu-
ments the number of nodes and the limits of integration, and returns the positions
and weights at the nodes. We illustrate with examples from the help page of the
functions.
> f = function(x) sin(x+cos(10*exp(x))/3)
> curve(f, -1, 1)
Let us examine convergence with increasing number of nodes.
> nnodes = c(17,29,51,65)
> # Set up initial matrix of zeros for nodes and weights
> gLresult = matrix(rep(0, 2*length(nnodes)),ncol=2)
> for (i in 1:length(nnodes)) {
+ cc = gaussLegendre(nnodes[i],-1,1)
+ gLresult[i,1] = nnodes[i]
+ gLresult[i,2] = sum(cc$w * f(cc$x))
+ }
> gLresult
[,1] [,2]
[1,] 17 0.03164279
[2,] 29 0.03249163
[3,] 51 0.03250365
[4,] 65 0.03250365
> # Compare with integrate()
> integrate(f,-1,1)
0.03250365 with absolute error < 6.7e-07
We see that 51 nodes are enough to get a very precise result.
The pracma package has a number of other integration functions that implement Gaussian quadrature or some variants of it, most notably quadgk() for adaptive Gauss-Kronrod quadrature, and quadgr(), a Gaussian quadrature with Richardson extrapolation.
In Gauss-Kronrod quadrature the evaluation points are chosen so that an accurate approximation can be computed by reusing the information produced by the computation of a less accurate approximation. n + 1 points are added to the n-point Gaussian rule to get a rule of order 2n + 1. The difference between these approximations leads to an estimate of the relative error.
The adaptive version applies this procedure recursively on refined subintervals
of the integration interval, splitting the subinterval into smaller pieces if the relative
error is greater than a tolerance level, and returning and adding up integral values on
subintervals otherwise. Normally, GaussKronrod works by comparing the n = 7 and
2n + 1 = 15 results.
GaussKronrod quadrature is the basic step in integrate as well, combined
with an adaptive interval subdivision and Wynns epsilon algorithm for extrapola-
tion.
quadgk, like all other functions in pracma, is written in R rather than, like integrate, in compiled C code. It therefore is slightly slower, but has the advantage of being more stable with oscillating functions while reaching a better level of accuracy. As an example, we will integrate the highly oscillating function f(x) = sin(1/x) on the interval [0, 1].
> require(pracma)
> f = function(x) sin(1/x)
> integrate(f, 0, 1)
Error in integrate(f, 0, 1) : maximum number of subdivisions reached
> integrate(f, 0, 1, subdivisions=500)
0.5041151 with absolute error < 9.7e-05
> quadgk(f, 0, 1)
[1] 0.5040670
with an absolute error of about 1 × 10^-7. This accuracy will not be reached with integrate(). There are more complicated examples, where integrate() does not return a value while quadgk() does.
Therefore, the quadgk() function might be most efficient for high accuracies
and oscillatory integrands. It can handle moderate singularities at the endpoints, but
does not support infinite intervals.
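The next examples apply quad() and related routines to a strongly oscillating function on [0, 4]. Its definition does not appear above; a definition consistent with the results below is the test function from pracma's help pages (an assumption on our part):
> f = function(x) x * cos(0.1*exp(x)) * sin(0.1*pi*exp(x))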
> quad(f, 0, 4)
[1] 1.282129075
quadl() uses adaptive Lobatto quadrature, which is similar to Gaussian quadrature, but includes the endpoints of the integration interval in the set of integration points. It is exact for polynomials up to degree 2n − 3, where n is the number of integration points.
> quadl(f,0,1)
[1] 1.282129074
The quad() function might be more efficient for low accuracies with nonsmooth
integrands, while the quadl() function might be more efficient than quad() at
higher accuracies with smooth integrands.
Another advantage of quad() and quadl() is that the integrand does not need
to be vectorized.
Function cotes() provides composite Newton-Cotes formulas of degrees 2 to 8. It takes as arguments the integrand, upper and lower limit, the number of subintervals to treat separately, and the number of nodes (the degree).
For the function above, because Newton-Cotes formulas are not adaptive, one needs a lot of intervals to get a good result.
> cotes(f, 0, 4, 500, 7)
[1] 1.282129074
No discussion of integration is complete without mentioning Romberg integration. Romberg's method approximates the integral by applying the trapezoidal rule (such as in trapz()), doubling the number of subintervals in each step, and accelerating convergence by Richardson extrapolation.
> romberg(f, 0, 4, tol=1e-10)
$value
[1] 1.282129074
$iter
[1] 9
$rel.error
[1] 1.880781318e-13
The advantages of Romberg integration are the small number of calls to the integrand function compared to other integration methods (an advantage that will be relevant for functions that are difficult or costly to compute) and the quite high accuracy that can be reached. The functions should not have singularities and should not be oscillatory.
The last approach to mention is adaptive Clenshaw-Curtis quadrature, an integration routine that has gained popularity and is now considered to be a rival to Gauss-Kronrod. Clenshaw-Curtis quadrature is based on an expansion of the integrand in terms of Chebyshev polynomials. Unlike Gauss quadrature, which is exact for polynomials up to order 2n − 1, Clenshaw-Curtis quadrature is only exact for polynomials up to order n. However, since it uses the fast Fourier transform algorithm, the weights and nodes are computed in linear time. Its speed is further enhanced by the fact that the Chebyshev polynomial expansion of many functions converges rapidly. The function cannot have singularities.
> quadcc(f, 0, 4)
[1] 1.282129074
The implementation of quadcc() in pracma at the moment is iterative, not adaptive. That is, it will halve all subintervals until the tolerance is reached. An adaptive version to come will be a strong competitor to integrate() and quadgk().
pracma provides a function integral() that acts as a wrapper for some of the
more important integration routines in this package. Some examples are given on the
help page. Here we test it on the dilogarithm function
$$\int_0^1 \frac{\log(1-t)}{t}\,dt = -\frac{\pi^2}{6} \qquad (6.16)$$
> flog = function(t) log(1-t)/t
> val = -pi^2/6
> for (m in c("Kron", "Rich", "Clen", "Simp", "Romb")) {
+ Q = integral(flog, 0, 1, reltol = 1e-12, method = m)
+ cat(m, Q, abs(Q-val), "\n")
+ }
Kron -1.644934067 9.858780459e-14 # Gauss-Kronrod
Rich -1.644934067 2.864375404e-14 # Gauss-Richardson
Clen -1.644934067 8.459899448e-14 # Clenshaw-Curtis
Simp -1.644934067 8.719469591e-12 # Simpson
Romb -1.645147726 0.0002136594219 # Romberg
boundary of the integral as simple bounds on the variables x and y is not obvious; but in polar coordinates the region can be described through θ = 0 … 2π and r = 3 … 5. Thus, use integral2 with sector = TRUE:
> require(pracma)
> f = function(x, y) log(x^2 + y^2)
> q = integral2(f, 0, 2*pi, 3, 5, sector = TRUE)
> q
$Q
[1] 140.4194
$error
[1] 2.271203e-07
There are many more two-dimensional integration routines in R, for instance
in package pracma, simpson2d() for a 2D variant of Simpson's rule, or a 2-
dimensional form of Gaussian quadrature in quad2d(). Readers are asked to look at
the help pages and try out some examples for themselves.
that in some form will often arise in statistical applications. Because the function
is the product of one-dimensional functions, the integral can be calculated as the
product of univariate integrals.
> f = function(x) prod(1/sqrt(2*pi)*exp(-x^2))
As this function is not vectorized(!), let's compute the one-dimensional integral
with quad(), and then the 10th power will be a good approximation of the integral
we are looking for.
> require(pracma)
> I1 = quad(f, 0, 1)
> I10 = I1^10
> I1; I10
[1] 0.2979397
[1] 5.511681e-06
adaptIntegrate() will not return a result for higher-dimensional integrals in
an acceptable time frame. We test integration routines cuhre() and vegas() on this
function in 10 dimensions:
> require(R2Cuba)
> ndim = 10; ncomp = 1
> cuhre(ndim, ncomp, f, lower=rep(0, 10), upper=rep(1, 10))
Iteration 1: 2605 integrand evaluations so far
[1] 5.51163e-06 +- 4.32043e-11 chisq 0 (0 df)
Iteration 2: 7815 integrand evaluations so far
[1] 5.51165e-06 +- 5.03113e-11 chisq 0.104658 (1 df)
integral: 5.511651e-06 (+-5e-11)
nregions: 2; number of evaluations: 7815; probability: 0.2536896
6.3.1 D()
As an example of how to use D(), consider applying it to the function
$$f(x) = \sin(x)\,e^{-ax} \qquad (6.27)$$
> # Define the expression and its function counterpart
> f = expression(sin(x)*exp(-a*x))
> ffun = function(x,a) sin(x)*exp(-a*x)
> # Take the first derivative
> (g = D(f,"x"))
cos(x) * exp(-a * x) - sin(x) * (exp(-a * x) * a)
> # Turn the result into a function
> gfun = function(x,a) eval(g)
> # Take the second derivative
> (g2 = D(g,"x"))
-(cos(x) * (exp(-a * x) * a) + sin(x) * exp(-a * x) + (cos(x) *
(exp(-a * x) * a) - sin(x) * (exp(-a * x) * a * a)))
> # Turn the result into a function
> g2fun = function(x,a) eval(g2)
> # Plot the function and its derivatives, with a = 1
> curve(ffun(x,1),0,4, ylim = c(-1,1), ylab=c("f(x,1) and
+ derivatives"))
> curve(gfun(x,1), add=T, lty=2)
> curve(g2fun(x,1), add=T, lty=3)
> legend("topright", legend = c("f(x,1)", "df/dx", "d2f/dx2"),
+ lty=1:3, bty="n")
6.3.2 deriv()
An equivalent result is obtained with the deriv() function, albeit at the cost of
greater complexity.
Chapter 7

Optimization
Scientists and engineers often have to solve for the maximum or minimum of a multi-
dimensional function, sometimes with constraints on the values of some or all of the
variables. This is known as optimization, and is a rich, highly developed, and often
difficult problem. Generally the problem is phrased as a minimization, which shows
its kinship to the least-squares data fitting procedures discussed in a subsequent chap-
ter. If a maximum is desired, one simply solves for the minimum of the negative of
the function. The greatest difficulties typically arise if the multi-dimensional surface
has local minima in addition to the global minimum, because there is generally no way to show that a given minimum is the global one except by trial and error.
R has three functions in the base installation: optimize() for one-dimensional
problems, optim() for multi-dimensional problems, and constrOptim() for opti-
mization with linear constraints. (Note that even unconstrained optimization nor-
mally is constrained by the limits on the search range.) We shall consider each of
these with suitable examples, and introduce several add-on packages that expand the
power of the basic functions. Optimization is a sufficiently large and important topic
to deserve its own task view in R, at http://cran.r-project.org/web/views/Optimization.html.
In addition to the packages considered in this chapter, the interested reader
should become acquainted with the nloptr package, which is considered one of
the strongest and most comprehensive optimization packages in R. According to its
synopsis,nloptr is an R interface to NLopt. NLopt is a free/open-source library for
nonlinear optimization, providing a common interface for a number of different free
optimization routines available online as well as original implementations of various
other algorithms.
Figure 7.1: Plot of function f (x) = x sin(4x) showing several maxima and minima.
$maximum
[1] 2.944935
$objective
[1] 420.1104
Consider next the use of optimize with the function
> f = function(x) x*sin(4*x)
which plotted looks like this (Figure 7.1):
> curve(f,0,3)
It has two minima in the range 0 ≤ x ≤ 3, with the global minimum near 2.8,
and two maxima, with the global maximum near 2.0. Applying optimize() in the
simplest way yields
> optimize(f,c(0,3))
$minimum
[1] 1.228297
$objective
[1] -1.203617
which gives the local minimum because it is the first minimum encountered by the
search algorithm (Brent's method, which combines root bracketing, bisection, and
inverse quadratic interpolation). Because we have a plot of the function, we can see
that we must exclude the local minimum from the lower and upper endpoints of the
search interval.
> optimize(f,c(1.5,3))
$minimum
[1] 2.771403
$objective
[1] -2.760177
To find the global maximum we enter
> optimize(f,c(1,3),maximum=TRUE)
$maximum
[1] 1.994684
$objective
[1] 1.979182
We could have obtained the same result by minimizing the negative of the function
> optimize(function(x) -f(x),c(1,3))
$minimum
[1] 1.994684
$objective
[1] -1.979182
which finds the maximum in the right place but, of course, yields the negative of the
function value at the maximum.
If necessary, the desired accuracy can be adjusted with the tol option in the
function call.
The pracma package contains the function findmins(), which finds the posi-
tions of all the minima in the search interval by dividing it n times (default n = 100)
and applying optimize in each interval. To find the values at those minima, evaluate
the function.
> require(pracma)
> f.mins = findmins(f,0,3)
> f.mins # x values at the minima
[1] 1.228312 2.771382
> f(f.mins[1:2]) # function evaluated at the minima
[1] -1.203617 -2.760177
The Examples section of the help page for optimize() shows how to include
a parameter in the function call. It also shows how, for a function with a very flat
minimum, the wrong solution can be obtained if the search interval is not properly
chosen. Unfortunately, there is no clear way to choose the search interval in such
cases, so if the results are not as expected from inspecting the graph of the function,
other intervals should be explored.
The function f(x) = |x^2 − 8| yields some interesting behavior (Figure 7.2).
> f = function(x) abs(x^2-8)
> curve(f,-4,4)
Straightforward solving for the maximum over the entire range yields the result at
x = 0.
> optimize(f,c(-4,4),maximum=T)
$maximum
[1] -1.110223e-16
$objective
[1] 8
Excluding the middle from the search interval finds the maxima at the extremes.
Figure 7.2: Plot of function f(x) = |x^2 − 8| showing several maxima and minima.
> optimize(f,c(-4,-2),maximum=T)
$maximum
[1] -3.999959
$objective
[1] 7.999672
> optimize(f,c(2,4),maximum=T)
$maximum
[1] 3.999959
$objective
[1] 7.999672
However, the endpoints of the interval will never be considered to be local min-
ima in findmins, because the function applies optimize() to two adjacent subin-
tervals, and the endpoints have only one.
> findmins(function(x) -f(x),-4,4)
[1] -1.040834e-17
> findmins(function(x) -f(x),-4,-3)
NULL
+ (1-x2)/(x2*(1-x1)) + 1/((1-x1)*(1-x2)))
> persp(x1,x2,z,theta=45,phi=0)
The diversity and difficulty of optimization problems has led to the development
of many packages and functions in R, each with its own strengths and weaknesses. It
can therefore be somewhat bewildering to know which to try. According to Borchers
(personal communication),
Whenever one can reasonably assume that the objective function is smooth or at least differentiable, apply BFGS or L-BFGS-B. If worried about memory requirements with high-dimensional problems, try also CG. Apply Nelder-Mead only in other cases, and only for low-dimensional tasks. [All of these are methods of the optim() function.] If the objective function is truly non-smooth, none of these approaches may be successful.
If you are looking for global optima, first try a global solver (GenSA, DEoptim, psoptim, CMA-ES, ...) or a kind of multi-start approach (in low dimensions). Try specialized solvers for least-squares problems as they are convex and therefore have only one global minimum.
In addition to these optimization solvers, and others which will be discussed be-
low, there is also the package nloptwrap that is simply a wrapper for the nloptr
package. This, in turn, is a wrapper for the free and powerful optimization library
NLOPT. Consult the R package library for details.
used in the Examples section of the optim() help page (Figure 7.4).
> x1 = x2 = seq(-1.2,1,.1)
> z = outer(x1,x2,FUN=function(x1,x2) {100 *
(x2 - x1 * x1)^2 + (1 -x1)^2})
> persp(x1,x2,z,theta=150)
It appears as if the minimum is somewhere near (1,1). Proceeding as in the previous
example,
Figure 7.4: Perspective plot of the Rosenbrock banana function defined by Equation 7.2.
Figure 7.5: Least squares fit of a spline function to data.
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
Now plot the smoothed spline approximation (Figure 7.5):
> fsp = splinefun(xp, c(0, opt$par, 0))
> yy = fsp(x)
> lines(x, yy)
With fewer nodes a more sine-like shape will result.
In comparison, the default "Nelder-Mead" method finds a less satisfactory min-
imum:
> opt <- optim(rep(0.5, 10), F)
> opt
$par
[1] 0.2914511 0.3881710 0.8630475 0.9825521 0.9513082 1.0322379
[7] 0.8621980 0.7084492 0.5955802 0.2432722
$value
[1] 0.3030502
Since ucminf() finds minima, we apply it directly to f1. We also note that, since each term is squared and thus always positive or zero, −f1 will be a maximum, or f1 a minimum, when each term equals zero, a condition that can be solved by inspection: x = c(5/2, 3, 2/5). A calculation agrees within numerical precision:
> install.packages("ucminf")
> library(ucminf)
> f1 = function(x) (2*x[1]-5)^2 + (x[2]-3)^2 + (5*x[3]-2)^2
> ucminf(c(1,1,1), f1)
$par
[1] 2.4999992 2.9999994 0.4000001
$value
[1] 3.485199e-12
$convergence
[1] 4
$message
[1] "Stopped by zero step from line search"
$invhessian.lt
[1] 0.1257734298 -0.0001776904 -0.0001415078 0.5022988116
-0.0001955878 0.0200489174
$info
maxgradient laststep stepmax neval
1.745566e-05 0.000000e+00 3.500000e-01 1.000000e+01
7.3.3 BB package
The BB package has the optimization functions spg(), BBoptim, and multiStart
with action = "optimize". These functions all have essentially the same calling
structure, and seem to have about the same success as the various methods of optim
in finding the global minimum of f, but may have various advantages of speed or
alternative strategies depending on the problem. The reader is directed to the package
vignette and help pages for details. As an example, let us find the values of (x1 , x2 )
for which the function
This gives the values for ui and ci in the function call below.
> constrOptim(c(-1.2,0.9), fr, grr, ui=rbind(c(-1,0),c(0,-1)),
ci=c(-1,-1))
$par
[1] 0.9999761 0.9999521
$value
[1] 5.734115e-10
$counts
function gradient
297 94
$convergence
[1] 0
$message
NULL
$outer.iterations
[1] 12
$barrier.value
[1] -0.0001999195
We can use constraints to find the optimum at locations away from the global minimum. For example, if we wish to find the minimum subject to x1 ≤ 0.9 and x1 − x2 ≥ 0.1, a process similar to the above yields
$$\begin{pmatrix} -1 & 0 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - \begin{pmatrix} -0.9 \\ 0.1 \end{pmatrix} \geq \begin{pmatrix} 0 \\ 0 \end{pmatrix} \qquad (7.5)$$
which leads to
> constrOptim(c(.5,0), fr, grr, ui=rbind(c(-1,0),c(1,-1)),
ci=c(-0.9,0.1))
$par
[1] 0.8891335 0.7891335
$value
[1] 0.01249441
$counts
function gradient
254 48
$convergence
[1] 0
$message
NULL
$outer.iterations
[1] 4
$barrier.value
[1] -7.399944e-05
Outer iteration: 2
Min(hin): 0.1767693 Max(abs(heq)): 0.005111882
par: 3.21925 1.16012 0.983351
fval = -1
...
Outer iteration: 6
Min(hin): 0.1758973 Max(abs(heq)): 1.515614e-06
par: 3.21885 1.15866 0.982768
fval = -1
Outer iteration: 7
Min(hin): 0.1758972 Max(abs(heq)): 3.233966e-07
par: 3.21885 1.15866 0.982768
fval = -1
If we had written p0 = c(1,2,3) we would have received an error message that
initial value violates inequality constraints. This message would not have appeared
with auglag(). There are, of course, many (x1, x2, x3) triplets that satisfy this minimization problem; which set is found depends on the initial guesses.
We now solve the same problem with the solnp() function in the Rsolnp pack-
age.
> f = function(x) -sin(x[1]*x[2]+x[3])
> heq = function(x) -x[1]*x[2]^3 + x[1]^2*x[3]^2 -5
> hin = function(x) {
+ h = rep(NA,2)
+ h[1] = x[1]-x[2]
+ h[2] = x[2] -x[3]
+ h
+ }
We load the Rsolnp package, which also loads the required truncnorm and
parallel packages. solnp() asks for upper and lower bounds for the variables and
the inequalities. It returns considerably more information about the solution than do
the functions in alabama.
> require(Rsolnp)
> upper = rep(5,3)
> lower = rep(0,3)
> p0 = c(3,2,1)
> ans = solnp(pars=p0, fun = f, eqfun=heq, ineqfun = hin, LB=lower,
+ UB=upper, ineqLB = c(0,0), ineqUB = c(5,5))
$objective
[1] -0.2172336
so the minimum of the sum of two sinc functions should be near (1.43, 1.43). Putting
the function in vector form, pretending we don't know the answer, and using a
plausible starting point with the default options of optim(), the function converges to a
local minimum, but not the global minimum.
> f = function(x) sinc(x[1]) + sinc(x[2])
> optim(c(3,3),f)
$par
[1] 3.47089 3.47089
$value
[1] -0.1826504
$counts
function gradient
107 NA
$convergence
[1] 0
$message
NULL
Two of the major approaches for optimizing functions with many local minima
are simulated annealing and differential evolution.
[1] 53285
The same results are obtained with max.time = 0.1, but with 10,257 counts.
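A minimal sketch of such a simulated annealing run with the GenSA package (naming GenSA is our assumption for illustration; its control list does include a max.time setting), applied to the sinc-sum function f defined above:
> require(GenSA)
> out = GenSA(par = c(3, 3), fn = f, lower = c(-10, -10),
+             upper = c(10, 10), control = list(max.time = 0.1))
> out$par; out$value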
As another example, consider the function (Figure 7.7)
7.5.2.1 DEoptim
The DE in DEoptim stands for differential evolution. The function, with the
same name as the package, is called by the familiar set of arguments: DEoptim(fn,
lower, upper, control = DEoptim.control(), ..., fnMap=NULL). fn is
the function to be minimized. Its first argument should be the vector of parameters
to optimize, and it should return a real scalar result. lower and upper are vectors
that specify the bounds on each parameter, with the ith component of each vector
corresponding to the ith parameter. These vectors should encompass the full range of
allowable values of the parameters.
control is a list of control parameters, discussed in the next paragraph. If
control is not specified, its defaults are used. The ... argument signifies further arguments, if
any, to be passed to fn. fnMap is an optional function that will be run after each
population is created, but before the population is passed to the objective function.
This allows the user to impose integer/cardinality constraints.
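For instance, a minimal sketch of an integer constraint imposed this way (our own illustration):
> # Round every parameter to the nearest integer after each generation
> intMap = function(x) round(x)
> # ...then supply fnMap = intMap in the call to DEoptim()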
DEoptim will often run fine with the default control settings, but sometimes
tweaks in those settings will lead to better results. Settings are modified with the
DEoptim.control function, whose usage is
DEoptim.control(VTR = -Inf, strategy = 2, bs = FALSE, NP = NA,
itermax = 200, CR = 0.5, F = 0.8, trace = TRUE, initialpop = NULL,
storepopfrom = itermax + 1, storepopfreq = 1, p = 0.2, c = 0, reltol,
steptol, parallelType = 0, packages = c(), parVar = c(),
foreachArgs = list())
Some of the more commonly adjusted arguments of this function (see the help
page for the full list and details) are:
strategy: an integer between 1 and 6 (default = 2), specifying the way in which
successive generations of candidate parameter vectors are chosen. We describe
strategy = 2 in more detail below.
NP: Number of population members. The default is 10*length(lower). Setting NP
above 40 has been found empirically to not substantially improve convergence,
regardless of the number of parameters.
itermax: The maximum number of iterations allowed. The default is 200.
CR: The crossover probability in the interval (0,1). Default is 0.5. However, it
has been suggested that CR = 0.2 is a better choice if the parameters are substan-
tially independent, while CR = 0.9 may be better if there is significant parameter
dependence.
F: Differential weighting factor (see strategy discussion below), with default = 0.8.
However, according to the Differential Evolution Homepage
(http://www1.icsi.berkeley.edu/~storn/code.html), "It has been found recently that
selecting F from the interval [0.5, 1.0] randomly for each generation or for each
difference vector, a technique called dither, improves convergence behaviour
significantly, especially for noisy objective functions."
The default strategy = 2 replaces the classical random mutation of strategy = 1 with
the expression
$$v_{i,g+1} = \mathrm{old}_{i,g} + F\,(\mathrm{best}_g - \mathrm{old}_{i,g}) + F\,(x_{r1,g} - x_{r2,g})$$
where old_{i,g} and best_g are the ith member and best member, respectively, of the
previous population, and x_{r1,g}, x_{r2,g} are two randomly chosen members of that
population. (i stands for the ith parameter, g for the gth generation.) See the
Details section of the DEoptim.control help page for more information on avail-
able strategies.
As an example, we use the Rastrigin function, which is a common test function
for multi-dimensional optimization. Inspection shows that its minimum is 0 at (0,0),
with which the DEoptim calculation adequately agrees.
> fras = function(x) 10*length(x)+sum(x^2-10*cos(2*pi*x))
> require(DEoptim)
> optras = DEoptim(fras,lower=c(-5,-5),upper=c(5,5),
+ control=list(storepopfrom=1, trace=FALSE))
> optras$optim
$bestmem
par1 par2
1.161563e-06 -1.098620e-06
$bestval
[1] 5.071286e-10
$nfeval
[1] 402
$iter
[1] 200
7.5.2.2 rgenoud
The genoud() function in the rgenoud package combines genetic algorithm meth-
ods with a derivative-based, Newton-like method. The rationale is that over much
of the landscape, a problem may be strongly nonlinear or even discontinuous: the
sort of situation for which evolutionary methods were developed. On the other hand,
near a solution many problems are regular and derivative information is useful. The
approach and implementation are discussed at length, with examples, in the paper
by Mebane and Sekhon (2011) and in the vignette and help pages for the package.
In this as in the other genetic algorithm methods, the population size and maximum
number of generations are key variables in reaching an optimal solution, though of
course there are trade-offs with computation time.
7.5.2.3 GA
The GA package developed by Scrucca (2012) is intended as "a flexible, general-purpose
R package for implementing genetic algorithms search in both the continuous
and discrete case, whether constrained or not." Scrucca continues, "Users can
easily define their own objective function depending on the problem at hand. Several
genetic operators are available and can be combined to explore the best settings
for the current task. Furthermore, users can define new genetic operators and easily
evaluate their performances." Interested readers should consult the package and its
documentation for details and examples.
and the profit is 500m1 + 400m2, where naturally m1, m2 ≥ 0. For solving the problem,
these equations and inequalities have to be defined in R as vectors and matrices.
> obj = c(500, 400) # objective function 500*m1 + 400*m2
> mat = matrix(c(20, 20, # constraint matrix
+ 5, 30,
+ 15, 7), nrow=3, ncol=2, byrow=TRUE)
> dir = rep("<=", 3) # direction of inequalities
> rhs = c(100, 50, 60) # right hand side of inequalities
The fact that the variables have to be greater than 0 is always implicitly assumed
and does not need to be stated. Now we can easily solve this linear programming
problem.
> require(lpSolve)
> soln = lp("max", obj, mat, dir, rhs)
> soln
Success: the objective function is 2180.723
> soln$solution
[1] 3.493976 1.084337
Of the first product about 3.494 units should be produced, and of the second about
1.084 units. To see that the constraints are respected, multiply the constraint matrix
with the solution vector:
> mat %*% soln$solution
[,1]
[1,] 91.56627
[2,] 50.00000
[3,] 60.00000
Therefore, feedstocks 2 and 3 have been completely used up.
For smaller linear programming problems there is another package, linprog,
that is not nearly as powerful as lpSolve but returns more information about the
problem. Applied to the problem above we see
> require(linprog)
> solveLP(obj, rhs, mat, maximum = TRUE)
Iterations in phase 1: 0
Iterations in phase 2: 2
Solution
opt
1 3.49398
2 1.08434
Basic Variables
opt
1 3.49398
2 1.08434
S 1 8.43373
Constraints
actual dir bvec free dual dual.reg
1 91.5663 <= 100 8.43373 0.0000 8.43373
2 50.0000 <= 50 0.00000 6.0241 30.00000
3 60.0000 <= 60 0.00000 31.3253 48.33333
All Variables (including slack variables)
opt cvec min.c max.c marg marg.reg
1 3.49398 500 66.6667 857.1429 NA NA
2 1.08434 400 233.3333 3000.0000 NA NA
S 1 8.43373 0 NA 15.6250 0.0000 NA
S 2 0.00000 0 -Inf 6.0241 -6.0241 30.0000
S 3 0.00000 0 -Inf 31.3253 -31.3253 48.3333
Of interest here are the values of the dual variables (under dual) for the con-
straints. These values indicate how much the objective will change if the constraints
are changed by one unit. We see that increasing the stock in feedstock 3 is the most
profitable action one can take:
> rhs = c(100, 50, 61)
> lp("max", obj, mat, dir, rhs)
Success: the objective function is 2212.048
and the increase 31.325 in profit is indeed exactly what is predicted by the dual
variable shown in the output above.
$$\begin{aligned} x_1 + x_2 &\le 2 \\ -x_1 + 2x_2 &\le 2 \\ 2x_1 + x_2 &\le 3 \\ 0 \le x_1, &\quad 0 \le x_2. \end{aligned}$$
Recall that solve.QP() solves min(-d^T b + (1/2) b^T D b) subject to the constraints A^T b >= b_0. These equations and inequalities lead to the following formulation and solution.
> require(quadprog)
> Dmat = matrix(c(1,-1,-1,2),2,2)
> dvec = c(2,6)
> Amat = matrix(c(-1,-1,1,-2,-2,-1),2,3)
> bvec = c(-2,-2,-3)
>
> solve.QP(Dmat, dvec, Amat, bvec)
$solution
[1] 0.6666667 1.3333333
$value
[1] -8.222222
$unconstrained.solution
[1] 10 8
$iterations
[1] 3 0
$Lagrangian
[1] 3.1111111 0.4444444 0.0000000
$iact
[1] 1 2
Quadratic programming plays an important role in geometric optimization prob-
lems, such as the enclosing ball problem and polytope distance problem, or the
separating hyperplane problem that is essential for the technique of support vector
machines in machine learning.
Here we will look at finding the smallest circle in R² enclosing ten given points
p1, . . . , p10 (Figure 7.8). Let the points be p1 = (0.30, 0.21), etc., represented as
columns in the following matrix.
> C = matrix(
+ c(0.30, 0.08, 0.30, 0.99, 0.31, 0.77, 0.23, 0.29, 0.92, 0.14,
+ 0.21, 0.93, 0.48, 0.83, 0.69, 0.91, 0.35, 0.05, 0.03, 0.19),
+ nrow = 2, ncol = 10, byrow = TRUE)
The smallest enclosing circle is found by minimizing the quadratic form
$$x^T C^T C x - \sum_i p_i^T p_i\, x_i$$
with the constraints ∑ᵢ xᵢ = 1 and all xᵢ ≥ 0. Then the point p = ∑ᵢ pᵢxᵢ is the center of
the ball, and the negative of the minimum value is the square of the radius.
Unfortunately, the matrix CᵀC is not positive definite. (Look at matrix D below
and compute its eigenvalues with eigen(D): eight of the ten eigenvalues are zero.)
Thus the function solve.QP() cannot be applied. We will instead turn to ipop() in
package kernlab, an often-used package and function.
For solving with ipop() we need to define D, d, A, and b.
> D = 2 * t(C) %*% C # so that (1/2) x^T D x = x^T C^T C x
> d = apply(C^2, 2, sum) # d = (p1^T p1, ..., p10^T p10)
> A = matrix(rep(1, 10), 1, 10) # sum xi = 1
> b = 1; r = 0 # b <= A x <= b + r
> l = rep(0, 10); u = rep(1, 10) # l <= x <= u
ipop() requires explicit lower and upper bounds for the variables as well as for the
inequalities. We can safely assume xᵢ ≤ 1, because the xᵢ are nonnegative and their
sum is 1. Now everything is in place to compute a solution.
> require(kernlab)
> sol = ipop(-d, D, A, b, l, u, r)
> x = sol@primal
> sum(x)
[1] 1
The center of the ball will be at p0 = ∑ᵢ xᵢ pᵢ, that is
> p0 = C %*% x; p0
[,1]
[1,] 0.4495151
[2,] 0.4029843
To see the distance of all points to the proposed center, do the following:
# Euclidean distance between p0 and all pi
> e = sqrt(colSums((C - c(p0))^2)); e
[1] 0.3360061 0.6155485 0.2000001 0.6021627 0.2831960
[6] 0.5077400 0.2996666 0.4785395 0.6155485 0.4622771
> r0 = max(e); r0
[1] 0.784569
To be convinced, we lay out the whole situation in a scatterplot.
> plot(C[1, ], C[2, ], xlim = c(-0.2, 1.2), ylim = c(-0.2, 1.2),
+ xlab = "x", ylab = "", asp = 1)
> grid()
# Draw the center of the circle
> points(p0[1], p0[2], pch = "+", cex = 2)
# Draw a circle with radius r0
> th = seq(0, 2*pi, length.out = 100)
> xc = p0[1] + r0 * cos(th)
> yc = p0[2] + r0 * sin(th)
> lines(xc, yc)
The circle cannot be made smaller because two points lie on the boundary on
opposite sides.
The problem is to maximize
$$y = 500x_1 + 400x_2$$
subject to
$$6x_1 + 5x_2 \le 60, \qquad 10x_1 + 20x_2 \le 150, \qquad x_1 \le 8.$$
Making x₂ semi-continuous (it must be either 0 or between 3 and 20) with the help of a binary variable b adds the constraints
$$-x_2 + 3b \le 0, \qquad x_2 - 20b \le 0,$$
and thus
> obj = c(500, 400, 0)
> A = matrix(c( 6, 5, 0,
+ 10, 20, 0,
+ 1, 0, 0,
+ 0, -1, 3,
+ 0, 1, -20), ncol = 3, byrow = TRUE)
> b = c(60, 150, 8, 0, 0)
> int.vec = c(3)
> soln = lp("max", obj, A, rep("<=", 5), b, int.vec = int.vec)
> soln; soln$solution
Success: the objective function is 4950
[1] 7.5 3.0 1.0
Semi-continuous variables and the big-M trick are often-used elements of modeling and solving linear programming tasks.
the first one saying that we want exactly four items, the second that the price for these
four items shall be at most 200.10 euros (20010 cents), but as close to it as possible.
The reason is that a linear programming solver will maximize a linear function, not
exactly reach a certain value. But if the maximal value found by the solver is less than
20010, we know for sure that there are no four item prices summing exactly to this value.
The objective function is the same sum ∑ᵢ Pᵢbᵢ over the binary selection variables bᵢ;
that is, we use the same linear function as objective and as inequality constraint.
Putting all the pieces together:
> obj = P
> M = rbind(rep(1, 30), P)
> dir = c("==", "<=")
> rhs = c(4, 20010)
> binary.vec = 1:30
> require(lpSolve)
> (L = lp("max", obj, M, dir, rhs, binary.vec = binary.vec))
Success: the objective function is 20010
This shows that there are four items whose prices add up to 200.10 euros, identi-
fying their indices and single prices with
> inds = which(L$solution == 1)
> inds; P[inds]/100; sum(P[inds])/100
[1] 7 9 20 26
[1] 23.54 93.29 73.03 10.24
[1] 200.1
Note that there may be more than one combination of four prices that yields this
total: an optimization solver stops when it has found a solution; it does not try
to produce all of the possible solutions. To make sure, remove one of those indices,
for example 26, and try to find another combination of four prices with this same
sum. In the following, we set P[26] to some high value, e.g., 21000, so it will not be
a candidate for the sum. Then repeat the procedure from above.
> i = 26
> Q = P
> Q[i] = 21000
> N = rbind(rep(1, 30), Q)
> LL = lp("max", obj, N, dir, rhs, binary.vec = binary.vec)
> inds = which(LL$solution == 1)
> inds; Q[inds]/100; sum(Q[inds])/100
[1] 6 16 24 30
[1] 30.77 76.40 89.07 3.85
[1] 200.09
Without item 26 there is no subset of four prices adding up to 200.10. We try indices
20 and 9 in the same way (7 then need not be checked), with the same negative result. Therefore we
can safely conclude that exactly the goods with prices 23.54, 93.29, 73.03, and 10.24
are covered on this receipt!
We use the fluctuation relations
$$C_V = \frac{1}{T^2}\left[\langle E^2\rangle - \langle E\rangle^2\right] \qquad (7.8)$$
and
$$\chi = \frac{1}{T}\left[\langle M^2\rangle - \langle M\rangle^2\right] \qquad (7.9)$$
to calculate the heat capacity C_V and magnetic susceptibility χ from the variances of
E and M.
We begin by setting the parameters of the simulation. Set up a square lattice (matrix)
of spins A with nr rows and nc columns. The calculation runs rather slowly in R
(for greater speed use Fortran or C), so we use a small 12 × 12 lattice. An even number
of rows and columns assures consistency with the periodic boundary conditions. At
each temperature we make npass random choices of a spin, flipping each and testing
the energy of the resulting configuration against the Metropolis criterion. After the
first nequil of these passes it is assumed that the system has reached equilibrium, so
the remaining passes are used to obtain averages of the energy, magnetization, and
their squares.
> nr = 12; nc = 12 # Number of rows and columns
> A = matrix(nrow = nr, ncol = nc)
> npass = 2e5 # Number of passes for each temperature
> nequil = 1e5 # Number of equilibration steps for each T
Set the upper and lower temperatures to be scanned, and the interval between
them, thus determining the number of temperatures to be scanned. It is known that
the phase transition temperature (critical temperature) for this model is near T_C ≈ 2.3,
so we choose limits that bracket this value.
> T_hi = 3 # Temperature to start scan at
> T_lo = 1.5 # Temperature to finish scan at
> dT = 0.1 # Temperature scanning interval
> nscans = as.integer((T_hi - T_lo)/dT) + 1
Set up a table (matrix M) to accept the results at the end of each temperature
scan.
> # Initialize results table
> M = matrix(nrow = nscans, ncol = 5, byrow=TRUE,
+ dimnames=list(rep("",nscans),c("T","E_av","Cv","Mag_av",
+ "Mag_sus")))
Construct a function Ann(A,m,n) that defines the nearest neighbors of the (m, n)
spin in A, with special provision for edges to accommodate periodic boundary con-
ditions.
> Ann = function(A, m, n) {
+ if (m == nr) Ann1 = A[1,n] else Ann1 = A[m+1,n] # bottom
+ if (n == 1) Ann2 = A[m,nc] else Ann2 = A[m,n-1] # left
+ if (m == 1) Ann3 = A[nr,n] else Ann3 = A[m-1,n] # top
+ if (n == nc) Ann4 = A[m,1] else Ann4 = A[m,n+1] # right
+ return(Ann1 + Ann2 + Ann3 + Ann4)
+ }
At each temperature we start the calculation anew: Initialize the variables in units
chosen so that Boltzmann's constant equals one, as does the spin interaction energy
J (which therefore doesn't appear explicitly in the calculation). Set the energy and
magnetization to zero.
> for (isc in 1:nscans) { # T scan loop
+ temp = T_hi - dT*(isc - 1)
+ # Initialize variables
+ beta = 1/temp
+ oc = 0 # output count
+ E_av = 0
+ E2_av = 0
+ mag_av = 0
+ mag2_av = 0
Set up the lattice in a checkerboard configuration with alternating spins pointing
up and down. (Other initial configurations are possible, but yield the same equilib-
rium results.)
+ # Set up initial checkerboard spin configuration
+ A[1,1] = 1
+ for (i in 1:(nr - 1)) A[i+1,1] = -A[i,1]
+ for (j in 1:(nc - 1)) A[,j+1] = -A[,j]
Begin passes at each temperature, using the Metropolis algorithm to accept or
reject a trial. The first nequil passes are equilibration steps. The remainder are used
to accumulate statistics on energy and magnetization.
+ for (ipass in 0:npass) { # Monte Carlo passes at T
+ if (ipass > nequil) {
+ oc = oc + 1 # output count
+ mag = sum(A)/(nr*nc)
+ mag_av = mag_av + mag
+ mag2_av = mag2_av + mag^2
+ E = 0
+ for (m in 1:nr) {
+ for (n in 1:nc) {
+ E = E - A[m,n]*Ann(A,m,n)
+ }
+ }
+ E = E/(2*nr*nc)
+ E_av = E_av + E
+ E2_av = E2_av + E^2
+ }
+ # Choose a random spin to change
+ m = sample(nr,1,replace=TRUE)
+ n = sample(nc,1,replace=TRUE)
+ ts = -A[m,n] # Flip sign of spin
+ dU = -2*ts*Ann(A,m,n)
+ log_eta = log(runif(1))
+ if(-beta*dU > log_eta) A[m,n] = ts
+ } # end MC passes at T
Fill a row with the temperature, energy, and magnetization results at that temper-
ature.
+ M[isc,1] = temp
+ M[isc,2] = E_av/oc
+ M[isc,3] = beta^2*(E2_av/oc - (E_av/oc)^2)
+ M[isc,4] = abs(mag_av/oc)
+ M[isc,5] = beta*(mag2_av/oc - (mag_av/oc)^2)
+ cat(c(temp, mag_av,mag2_av,E_av,E2_av),"\n") # not shown
+ } # end T scans
Print and plot the results.
> M # print result (deleted from output)
> # plot results
> par(mar=c(4,4,1.5,1.5),mex=.8,mgp=c(2,.5,0),tcl=0.3)
> par(mfrow=c(2,2))
> plot(M[,1], M[,2], xlab="T", ylab="<E>")
> plot(M[,1], M[,3], xlab="T", ylab="<Cv>")
> plot(M[,1], M[,4], xlab="T", ylab="<M>")
> plot(M[,1], M[,5], xlab="T", ylab="<chi>")
Figure 7.9: Plots of thermodynamic and magnetic functions for 2D Ising model.
Differential equations are ubiquitous in science and engineering, since they describe
the rate of change of a system with time, position, or some other independent vari-
able. It is conventional to classify differential equations according to certain charac-
teristics.
Ordinary differential equations (ODEs) depend on a single independent variable,
such as time; while partial differential equations (PDEs) depend on several indepen-
dent variables, such as spatial coordinates as well as, perhaps, time. In this chapter
we treat ODEs; PDEs are the subject of the next chapter.
First order differential equations involve only first derivatives of the dependent
variables, while second and higher order differential equations involve second and
higher order derivatives. In numerical solution of differential equations, all equations
are reduced to first order by the expedient of defining, e.g., dy/dt = y1 and then
d²y/dt² = dy1/dt.
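For example, the second-order oscillator equation d²y/dt² = −y becomes the first-order pair dy/dt = y1, dy1/dt = −y, which can be coded directly (a minimal sketch using the deSolve package described below):
> require(deSolve)
> osc = function(t, y, parms) list(c(y[2], -y[1])) # state is (y, y1)
> out = ode(y = c(1, 0), times = seq(0, 10, 0.1), func = osc, parms = NULL)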
Initial value problems define the starting values of the variables, while boundary
value problems specify the beginning and ending values of the variables.
Linear differential equations are linear in the dependent variables, while non-
linear differential equations involve higher or nonintegral powers. Most analytically
soluble differential equations are linear, but numerical solutions can cope equally
well with nonlinear equations.
R, through contributed packages, has powerful tools to numerically solve differ-
ential equations. The DifferentialEquations task view at cran.r-project.org/
web/views/DifferentialEquations.html provides a useful overview. These
are some of the most important packages:
deSolve provides functions that solve initial value problems of a system of
first-order ordinary differential equations (ODE), of partial differential equations
(PDE), of differential algebraic equations (DAE), and of delay differential equa-
tions. The functions provide an interface to the FORTRAN functions lsoda,
lsodar, lsode, lsodes of the ODEPACK collection, to the FORTRAN func-
tions dvode and daspk and a C-implementation of solvers of the Runge–Kutta
family with fixed or variable time steps. The package contains routines designed
for solving ODEs resulting from 1-D, 2-D and 3-D partial differential equations
(PDE) that have been converted to ODEs by numerical differencing. The vignette
Package deSolve: Solving Initial Value Differential Equations in R is available
as a pdf file on the CRAN > Packages site, and should be consulted for orienta-
tion and examples. The help pages for the individual functions provide details and
more examples.
bvpSolve provides functions that solve boundary value problems (BVP) of sys-
tems of ordinary differential equations (ODE). The functions provide an interface
to the FORTRAN functions twpbvpC and colnew and an R-implementation of
the shooting method.
ReacTran provides routines for developing models that describe reaction and
advective-diffusive transport in one, two or three dimensions. Includes transport
routines in porous media, in estuaries, and in bodies with variable shape.
rootSolve contains routines to find the root of nonlinear functions, and to
perform steady-state and equilibrium analysis of ordinary differential equations
(ODE). Includes routines that: (1) generate gradient and Jacobian matrices (full
and banded), (2) find roots of non-linear equations by the Newton–Raphson
method, (3) estimate steady-state conditions of a system of (differential) equations
in full, banded or sparse form, using the Newton–Raphson method, or by dynamically
running, (4) solve the steady-state conditions for uni- and multicomponent
1-D, 2-D, and 3-D partial differential equations, that have been converted to ODEs
by numerical differencing (using the method-of-lines approach). Includes fortran
code.
PBSddesolve solves systems of delay differential equations. (This capability
also exists in deSolve.)
FME (A Flexible Modelling Environment for Inverse Modelling, Sensitivity, Iden-
tifiability, Monte Carlo Analysis) provides functions to help in fitting models to
data, to perform Monte Carlo, sensitivity and identifiability analysis. It is intended
to work with models written as a set of differential equations that are solved either
by an integration routine from package deSolve, or a steady-state solver from
package rootSolve. However, the methods can also be used with other types of
functions.
These packages, and the functions in them, are explained with useful exam-
ples in their help pages and in the book Solving Differential Equations in R by
Soetaert, Cash and Mazzia, Springer, 2012. Additional information can be found
on the website of the special interest group about dynamic modeling with R,
https://stat.ethz.ch/pipermail/r-sig-dynamic-models/. This site can
also be reached from www.r-project.org under Mailing Lists.
$$m\frac{d^2x}{dt^2} = -\frac{1}{2}\rho A C_{drag}\, v\, v_x \qquad (8.1)$$
$$m\frac{d^2y}{dt^2} = -mg - \frac{1}{2}\rho A C_{drag}\, v\, v_y \qquad (8.2)$$
Figure 8.1: Exponentially decaying population calculated by the improved Euler method.
We load the deSolve package (first installing it if that has not already been done),
then go through the initial steps of defining parameters, initializing position and ve-
locity variables, and specifying the time sequence for the calculation.
> install.packages("deSolve")
> require(deSolve)
>
> # Compute parameter
> G = 3600^2*6.673e-20
> Msun = 1.989e30
> GM = G*Msun
> parms = GM
>
> # Initialize variables
> x0 = 149.6e6; vx0 = 0
> y0 = 0; vy0 = 29.786*3600
>
> # Set time sequence
> tmin = 0; tmax = 8800; dt = 400
> hrs = seq(tmin, tmax, dt)
We next define the function that computes the desired derivatives and returns
them in a list.
> orbit = function(t,y,GM) {
+ # State vector y = (x, vx, y, vy); return the derivatives (vx, ax, vy, ay)
+ dy1 = y[2]
+ dy2 = -GM*y[1]/(y[1]^2+y[3]^2)^1.5
+ dy3 = y[4]
+ dy4 = -GM*y[3]/(y[1]^2+y[3]^2)^1.5
+ return(list(c(dy1,dy2,dy3,dy4)))
+}
We then call the ode function, using the default lsoda method, to solve the sys-
tem of differential equations. The arguments to the function are specified by position,
and thus not explicitly named.
> out = ode(c(x0, vx0, y0, vy0), hrs, orbit, parms)
Finally, we display the results as before.
> options(digits=5)
> hrs = out[,1]; x = out[,2]; vx = out[,3]
> y = out[,4]; vy = out[,5]
> r = round(sqrt(x^2 + y^2)*1e-8,3)
> v = round(sqrt(vx^2 + vy^2)/3600,3)
> mat = cbind(hrs,x,y,r,v)
> colnames(mat) = c("hrs", "x km", "y km",
"r/1e8 km", "v km/s")
> mat
hrs x km y km r/1e8 km v km/s
[1,] 0 149600000 0 1.496 29.786
[2,] 400 143493179 42306604 1.496 29.786
[3,] 800 125671404 81159285 1.496 29.786
[4,] 1200 97589604 113386031 1.496 29.786
[5,] 1600 61540456 136355691 1.496 29.786
[6,] 2000 20467145 148193044 1.496 29.786
[7,] 2400 -22277150 147931692 1.496 29.786
[8,] 2800 -63202726 135592932 1.496 29.786
[9,] 3200 -98968215 112184158 1.496 29.786
[10,] 3600 -126653781 79616413 1.496 29.786
[11,] 4000 -143999011 40548639 1.496 29.786
[12,] 4400 -149587807 -1829627 1.496 29.786
[13,] 4800 -142963858 -44058504 1.496 29.786
[14,] 5200 -124667908 -82690312 1.496 29.786
[15,] 5600 -96193662 -114571009 1.496 29.786
[16,] 6000 -59865843 -137097729 1.496 29.786
[17,] 6400 -18650363 -148431290 1.496 29.786
[18,] 6800 24087794 -147646368 1.496 29.786
[19,] 7200 64859326 -134807039 1.496 29.786
[20,] 7600 100335543 -110961522 1.496 29.786
[21,] 8000 127619992 -78056635 1.496 29.786
[22,] 8400 144484949 -38778991 1.496 29.786
[23,] 8800 149553679 3664689 1.496 29.786
With the ode function of deSolve, the radius and velocity of the orbit remain satis-
factorily constant over the course of the year.
$$x^2\frac{d^2J}{dx^2} + x\frac{dJ}{dx} + \left(x^2 - \nu^2\right)J = 0 \qquad (8.5)$$
> require(deSolve)
>
Figure 8.2: Numerical solution of the Bessel equation of order 1.
> # Function to feed to lsoda
> diffeqs = function(x, y, parms) {
+ J=y[1]
+ dJdx = y[2]
+ with(as.list(parms), {
+ dJ = dJdx
+ ddJdx = -1/x^2*(x*dJdx + (x^2-nu^2)*J)
+ res = c(dJ, ddJdx)
+ list(res)
+ })
+}
>
> # Abscissa steps
> xmin = 1e-15 # Don't start exactly at zero, to avoid infinity
> xmax = 15
> dx = 0.1
> xx = seq(xmin, xmax, dx)
>
> # Parameters
> parms = c(nu = 1) # Bessel equation of order 1
>
> # Initial values
> y0 = c(J = 0, dJdx = 1)
>
> # Solve with lsoda
> out = lsoda(y0, xx, diffeqs, parms)
>
> # Plot results and compare with built-in besselJ
> xx = out[,1]; J = out[,2]; dJdx = out[,3]
> plot(xx, J, type="l"); curve(besselJ(x,1),0,15,add=TRUE)
> abline(0,0)
The lsode() function is very similar to lsoda(), but requires that the user spec-
ify whether the problem is stiff. LSODE (Livermore Solver for Ordinary Differen-
tial Equations) is the basic solver of the ODEPACK collection on which deSolve is
based. In turn, the solver vode is very similar to lsode, but uses a variable-coefficient
method rather than the fixed-step interpolation methods in lsode. Also, in vode it
is possible to choose whether or not a copy of the Jacobian is saved for reuse in the
corrector iteration algorithm; in lsode, a copy is not kept. The solver zvode is like
vode, but should be used when the dependent variables and derivatives, y and dy/dt,
are complex. See the help pages for these solver functions for more details.
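As a sketch, the Bessel system above can be handed to lsode() with the same arguments:
> out2 = lsode(y0, xx, diffeqs, parms)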
Figure 8.3: Concentration changes with time in an oscillating chemical system (Equation 8.6).
$$\frac{dX}{dt} = k_1 A - k_2 X - k_3 X Y^2 \qquad (8.6)$$
$$\frac{dY}{dt} = k_2 X + k_3 X Y^2 - k_4 Y$$
> require(deSolve)
> # Reaction mechanism
> diffeqs = function(t,x,parms) {
+ X=x[1]
+ Y=x[2]
+ with(as.list(parms), {
+ dX = k1*A - k2*X - k3*X*Y^2
+ dY = k2*X + k3*X*Y^2 - k4*Y
+ list(c(dX, dY))
+ })}
>
> # Time steps
> tmin = 0; tmax = 200; dt = 0.01
> times = seq(tmin, tmax, dt)
>
> # Parameters: Rate constants and A concentration
> parms = c(k1 = 0.01, k2 = 0.01, k3 = 1e6, k4 = 1, A = 0.05)
>
> # Initial values
> x0 = c(X = 0, Y = 0)
>
> # Solve with adams method
> out = ode(x0, times, diffeqs, parms, method = "adams")
>
> # Plot results
> time = out[,1]; X = out[,2]; Y = out[,3]
>
> par(mfrow = c(1,3))
> plot(time, X, type="l") # Time variation of X
> plot(time, Y, type="l") # Time variation of Y
> plot(X,Y, type="l") # Phase plot of X vs Y
> par(mfrow = c(1,1))
$$\frac{dV}{dt} = \mathrm{ForcingFunction} - \frac{V}{RC}. \qquad (8.8)$$
We begin in the more-or-less standard way, loading deSolve and setting up the
time sequence over which the solution will be calculated.
> require(deSolve)
> # Time sequence
> tmin=0; tmax=4; dt=.01 #millisec
> times = seq(tmin, tmax, dt)
We next define the forcing voltage: a sequence of (1,0,1,0) pulses, each extending
for 1/dt time steps. The time and voltage pulses are bound together as a matrix
representing a rectangular wave.
> # Forcing behavior
> pulse = c(rep(1,1/dt), rep(0,1/dt), rep(1,1/dt), rep(0,1/dt+1))
> sqw = cbind(times, pulse)
The pulse is now converted into a function with approxfun(). The interpolation
method is linear and rule = 2 avoids NaNs while interpolating.
> SqWave = approxfun(x = sqw[,1], y = sqw[,2],
+ method = "linear", rule = 2)
We write the RC circuit equation for dV/dt as a function to be fed to lsoda()
> voltage = function( t, V, RC) list (c(SqWave(t) - V/RC))
and define the RC value as a parameter, and the initial value of V
> parms = c(RC = 0.6) # millisec
> V0 = 0 # Initial condition
Figure 8.4: RC circuit with periodic pulse as example of an event-driven ODE.
Figure 8.5: Drug delivery protocol illustrating root-triggered event: when B falls below 1, A
is added to bring it to 2.
[Figures 8.6 and 8.7: predator and prey populations vs. time, with phase plots of predator vs. prey.]
Figure 8.8: Graphs of three population groups (1: ages 0–12, 2: 13–40, 3: greater than 40).
generally (though not necessarily) called tau. The value of the dependent variable(s)
at the lagged time is defined by the function lagvalue. If there is more than one
variable, lagvalue becomes a vector with variables specified by indices [1], [2],
etc. If necessary, the lagged derivative is defined by the function lagderiv.
Our first example is the Hutchinson equation of population dynamics
$$\frac{dy}{dt} = r\,y\,\Bigl(1 - \frac{y(t-\tau)}{K}\Bigr) \qquad (8.11)$$
as presented by Y. Kuang in math.la.asu.edu/~kuang/paper/STE034KuangDDEs.pdf.
The derivative function for the Hutchinson model, which is to be fed to dede, is
> func = function(t, y, parms) {
+ tlag=t-tau
+ if (tlag < 0) dy = 1 else dy = r*y*(1 - lagvalue(tlag)/K)
+ return(list(c(dy))) }
Figure 8.9: Solutions to Hutchinson Equation 8.11 using dede with time lags τ = 1 and 3.
We enter the initial value of the population, the desired time sequence, and the
parameters of the model:
> yinit = 0
> times = 0:100
> r = 1; K = 1; tau = 1
To get a solution, we load deSolve and put the arguments into dede.
> require(deSolve)
> yout1 = dede(y = yinit, times = times, func = func,
+ parms = c(r,K, tau))
We test the sensitivity of the model to the delay time τ by choosing another value
of τ, running the calculation again ...
> tau = 3
> yout2 = dede(y = yinit, times = times, func = func,
+ parms = c(r,K, tau))
... and plotting both results on the same plot (Figure 8.9).
> plot(yout1,yout2, type = "l", main="Comparison of lag times",
+ col = rep(1,2), ylim = c(0,10), ylab = "Population")
> legend("topleft", legend = c("1","3"), lty = c(1,3), bty="n")
As Kuang points out, τ has no clear physical meaning, and the fact that modest
variation in τ leads to such different results leads one to be skeptical of the utility of
the equation. However, it does serve as a useful example of how a delay differential
equation can be formulated and solved.
The second example, adapted from the PBSddesolve reference, shows how to
treat a system of DDEs with two dependent variables (Figure 8.10).
> require(deSolve)
>
> # Create a function to return derivatives
[Figure 8.10: y1 and y2 vs. t for the two-variable delay differential equation system.]
[Figure 8.11: y1 and y2 vs. time.]
with initial conditions y1(π) = y2(π) = 1/2. daspk requires that the equations be
written in residual form, as in eq1 and eq2 in the code below, with results shown in
Figure 8.11.
> require(deSolve)
>
> # Function defining the system
> Res_DAE = function (t, y, dy, pars){
+ y1=y[1]; y2=y[2]; dy1=dy[1]; dy2=dy[2]
+ eq1 = dy1 - y2 - sin(t)
+ eq2 = y1 + y2 - 1
+ return(list(c(eq1, eq2), c(y1,y2)))
+ }
>
> # Time sequence and initial values
> times = seq(pi,8,.1)
> y = c(y1 = 0.5, y2 = 0.5)
> dy = c(dy1 = 0.5, dy2 = -0.5)
>
> # Solution with daspk
> DAE = daspk(y = y, dy = dy, times = times,
+ res = Res_DAE, parms = NULL, atol = 1e-10, rtol = 1e-10)
>
> # Output and plotting
> time = DAE[,1]; y1 = DAE[,2]; y2 = DAE[,3]
> matplot(time, cbind(y1, y2), xlab = "time", ylab = "y1, y2",
+ type = "l", lty=c(1,3), col = 1)
>
> legend("topleft", legend = c("y1", "y2"), lty = c(1,3),
+ col = 1, bty = "n")
>
As another example, this time with a second derivative, we consider the system
of equations
$$\frac{d^2x}{dt^2} = y(t)$$
$$x(t) + 4y(t) = \sin(t) \qquad (8.13)$$
[Figure: solutions x(t) and y(t) of Equation 8.13.]
where Vf and Vr are maximum velocities in the forward and reverse directions and Kf
and Kr are the Michaelis constants (dissociation constants of the enzyme–substrate
or enzyme–product complexes). A flux Fin of S is supplied to the system, and a
flux Fout of P is removed from the system. The kinetic behavior is modeled by the
user-defined function enzyme(). Solving the system of differential equations for
the concentrations of S and P for a particular set of parameters gives the following
behavior.
> require(deSolve)
> enzyme = function(t, state, pars) {
+ with (as.list(c(state,pars)), {
[Figure: concentrations of S and P vs. time.]
attr(,"precis")
[1] 0.2
attr(,"steady")
[1] FALSE
attr(,"class")
[1] "steady" "rootSolve" "list"
attr(,"nspec")
[1] 2
attr(,"ynames")
[1] "S" "P"
Warning messages:
1: In stode(y, time, func, parms = parms, ...) :
error during factorisation of matrix (dgefa); singular matrix
2: In stode(y, time, func, parms = parms, ...) : steady-state not
reached
Apparently this difficulty arises because the system has several steady states de-
pending on initial conditions, and the Newton–Raphson solution method flips unpre-
dictably between them. This example serves as a warning that numerical methods, in
R or any other language, are not foolproof.
8.10 bvpSolve package for boundary value ODE problems
Boundary value problems (BVPs) are systems of ODEs with values and derivatives
specified at more than one point, commonly two points at the boundaries. The
bvpSolve package contains three functions for solving boundary value problems:
bvpshoot(), bvptwp(), and bvpcol().
8.10.1 bvpshoot()
Perhaps the most common approach to the solution of one-dimensional BVPs is the
shooting method, which gains its name from the analogy with the method of training
artillery on a distant target: guess an initial direction and velocity for the projectile,
try to improve the initial guesses, and repeat until the target is hit. The code for
methods such as bvpshoot() contains systematic procedures (usually Newtonian
iteration) for using the results of previous trials to iterate subsequent guesses. The
method can fail if the ODE is highly nonlinear or unstable, because the guesses may
need to be unrealistically close to the true value.
For an example of a case where the shooting method does work, we consider the
equation (in reduced units) for the height y of a liquid droplet on a flat surface as a
function of surface distance x (from Higham and Higham, p. 163):
" #3/2
2
d2y dy
+ (1 y) 1 + = 0, (8.15)
dx2 dx
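A sketch of how this problem can be set up for bvpshoot(), assuming (as Figure 8.14 suggests) that the drop meets the surface at x = ±1, so that y(−1) = y(1) = 0, with the unknown initial slope found by shooting (our guess = 0.5):
> require(bvpSolve)
> fdrop = function(x, y, parms)
+   list(c(y[2], -(1 - y[1])*(1 + y[2]^2)^(3/2)))
> sol = bvpshoot(yini = c(0, NA), x = seq(-1, 1, 0.01),
+                func = fdrop, yend = c(0, NA), guess = 0.5)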
Figure 8.14: Solution to Equation 8.15 for the shape of a liquid drop on a flat surface, by the
shooting method.
8.10.2 bvptwp()
We next consider an example of a strongly nonlinear problem with which
bvpshoot() does not deal well, but which the function bvptwp() (twp stands for
two-point) handles nicely. The differential equation is
$$\frac{d^2y}{dx^2} = 100\,y^2 \qquad (8.17)$$
with the boundary conditions
$$y(0) = 1, \qquad \left.\frac{dy}{dx}\right|_{x=1} = 0. \qquad (8.18)$$
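A sketch of the corresponding call (our own coding of Equations 8.17 and 8.18; NA marks the boundary values that are left free):
> require(bvpSolve)
> fstiff = function(x, y, parms) list(c(y[2], 100*y[1]^2))
> sol = bvptwp(yini = c(1, NA), x = seq(0, 1, 0.01),
+              func = fstiff, yend = c(NA, 0))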
[Figure 8.15: solution y(x) of Equations 8.17 and 8.18.]
8.10.3 bvpcol()
The third function in the bvpSolve package, bvpcol(), is based on FORTRAN
code developed for solving multi-point boundary value problems of mixed order.
col stands for collocation. "The idea of the collocation method is to choose a finite-
dimensional space of candidate solutions (usually, polynomials [often splines] up to
a certain degree) and a number of points in the domain (called collocation points),
and to select that solution which satisfies the given equation at the collocation points"
(Wikipedia).
Here is a simple example from Acton, Numerical Methods that Work, p. 157:
$$y'' + y = 0 \qquad (8.19)$$
$$y(0) = 1, \qquad y(1) = 2 \qquad (8.20)$$
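A sketch of the setup for bvpcol() (our coding of Equations 8.19 and 8.20):
> require(bvpSolve)
> fosc = function(x, y, parms) list(c(y[2], -y[1]))
> sol = bvpcol(yini = c(1, NA), x = seq(0, 1, 0.01),
+              func = fosc, yend = c(2, NA))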
Figure 8.16: Solution to Equations 8.19 and 8.20 by the collocation method.
[Figure 8.17: fractional occupancy fractOcc vs. time.]
Figure 8.18: Time dependence of the binding reaction S + P = SP treated as a stochastic
process.
We do so while comparing the direct (exact) Gillespie method with the three ap-
proximate tau-leap methods included as optional methods in ssa. These are intended
to speed up the calculation by skipping some time steps according to their underlying
algorithms. We plot the results (Figure 8.19) and include in the title of each plot the
elapsed time as obtained from out$stats$elapsedWallTime.
> par(mfrow = c(2,2)) # Prepare for four plots
Direct method:
> set.seed(1)
> out = ssa(yini,a,nu,pars,tf,method="D",simName,
+ verbose=FALSE,consoleInterval=1)
> et = as.character(round(out$stats$elapsedWallTime,4)) #elapsed time
> time = out$data[,1]
> fractOcc = out$data[,4]/(out$data[,2] + out$data[,4])
> plot(time, fractOcc, pch = 16, cex = 0.5, main = paste("D ",et, " s"))
Explicit tau-leap method:
> set.seed(1)
> out = ssa(yini,a,nu,pars,tf,method="ETL",simName,
+ tau=0.003,verbose=FALSE,consoleInterval=1)
> et = as.character(round(out$stats$elapsedWallTime,4)) #elapsed time
> time = out$data[,1]
> fractOcc = out$data[,4]/(out$data[,2] + out$data[,4])
> plot(time, fractOcc, pch = 16, cex = 0.5,
+ main = paste("ETL ",et, "s"))
Binomial tau-leap method:
> set.seed(1)
> out = ssa(yini,a,nu,pars,tf,method="BTL",simName,
+ verbose=FALSE,consoleInterval=1)
> et = as.character(round(out$stats$elapsedWallTime,4)) #elapsed time
> time = out$data[,1]
> fractOcc = out$data[,4]/(out$data[,2] + out$data[,4])
> plot(time, fractOcc, pch = 16, cex = 0.5,
+ main = paste("BTL ",et, "s"))
Optimized tau-leap method:
> set.seed(1)
> out = ssa(yini,a,nu,pars,tf,method="OTL",simName,
+ verbose=FALSE,consoleInterval=1)
Warning messages:
1: In FUN(newX[, i], ...) : coercing argument of type double to logical
2: In FUN(newX[, i], ...) : coercing argument of type double to logical
3: In FUN(newX[, i], ...) : coercing argument of type double to logical
[Figure 8.19 panels: fractOcc vs. time for each method; the panel titles give elapsed times, e.g., "D 0.008 s" and "ETL 0.548 s".]
Figure 8.19: Fractional occupancy of binding sites calculated according to the direct and
three tau-leap methods of the Gillespie algorithm.
and
$$m\frac{d^2x}{dt^2} = (\mathrm{thrust} - \mathrm{drag})\sin(\epsilon t) \qquad (8.23)$$
with the initial conditions x(0) = y(0) = 0 and vx(0) = vy(0) = 0. The mass decreases as fuel is burned,
$$m = m_0 - (\mathrm{burn.rate})\,t, \qquad (8.25)$$
and the air density declines with altitude as
$$\rho = \rho_0\, e^{-y/8000}. \qquad (8.27)$$
x and y are measured in meters while the other lengths are measured in km.
Values for the various factors are quantified in the parameters listed in the code
below.
> # Parameters
> m0 = 2.04e6 # Initial mass, kg
> burn.rate = 9800 # kg/s
> R = 6371 # Radius of earth, km
> thrust = 28.6e6 # Newtons
> dens0 = 1.2 # kg/m^3 Density of air at earth surface
> A = 100 # m^2, cross-section of launch vehicle
> Cdrag = 0.3 # Drag coefficient
> eps = 0.007 # radians/s, rate of angular change
> g = 9.8 # m/s^2, gravitational acceleration at the earth's surface
The equations of motion are expressed in terms of a vector y whose components
are the x- and y- positions and velocities.
> # Equations of motion
> launch = function(t, y,parms) {
+ xpos = y[1]
+ xvel = y[2]
+ ypos = y[3]
+ yvel = y[4]
+ airdens = dens0*exp(-ypos/8000)
+ drag = 0.5*airdens*A*Cdrag*(xvel^2 + yvel^2)
+ m = m0-burn.rate*t
+ angle = eps*t
+ grav = g*(R/(R+ypos/1000))^2
+ xaccel = (thrust - drag)/m*sin(angle)
+ yaccel = (thrust - drag)/m*cos(angle) - grav
+ list(c(xvel, xaccel, yvel, yaccel))
+ }
We next specify the initial values of the positions and velocities, and the times
over which the solution is to be calculated (every second for two minutes).
> # Initial values
> init = c(0,0,0,0)
>
> # Times
> times = 0:120
We load the deSolve package and, since the differential equations are not stiff,
use the "adams" method of solution.
> # Solve with Adams method
> require(deSolve)
> out = ode(init, times, launch, parms=NULL, method="adams")
Finally, we plot the x-y coordinates, expressed in km, at each second of the launch
(Figure 8.20).
> # Plot results
> time = out[,1]; x = out[,2]; y = out[,4]
> plot(x/1000,y/1000, cex=0.5, xlab="x/km", ylab="y/km")
Figure 8.20: Height vs. horizontal distance for the first 120 seconds of the space shuttle
launch.
a double helical backbone of radius 1 nm. Therefore, it interacts very strongly with
other charged molecules, including other DNA molecules. To understand how DNA
is tightly coiled and packaged in small volumes such as virus capsids, one needs to
calculate the electrostatic repulsions between nearby DNA segments. The strength
of electrostatic interactions is modulated by the concentration of small ions, such as
salt, in the surrounding solution. The influence of ions on the electrostatic potential
is given by the Debye–Hückel equation
$$\nabla^2\phi = -\frac{\kappa^2}{2I}\sum_i Z_i c_i\, e^{-Z_i\phi} \qquad (8.28)$$
where κ is the inverse Debye length (nm⁻¹), I is the ionic strength (molar), and Zᵢ and
cᵢ are the charge and molar concentration of the ith ionic species:
$$I = \frac{1}{2}\sum_i c_i Z_i^2 \qquad (8.29)$$
$$\kappa^{-1} = \frac{0.304}{\sqrt{I}} \qquad (8.30)$$
We model DNA as a cylindrical rod with charge distributed uniformly on its
surface. In cylindrical coordinates where there is no dependence on height or angle,
the Laplacian operator can be written in terms of r, the distance from the rod axis to
a point in solution, as
$$\nabla^2 = \frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} \qquad (8.31)$$
Defining the dimensionless variable x = κr and z = ln x, and confining our calculation
to a uni-univalent salt such as NaCl at molar concentration c, Equation 8.28 can be
written
$$\frac{\partial^2\phi}{\partial z^2} = -\frac{c\,e^{2z}}{2I}\left(e^{-\phi} - e^{\phi}\right) = \frac{e^{2z}}{2}\left(e^{\phi} - e^{-\phi}\right). \qquad (8.32)$$
Since this is a second-order differential equation, it needs two boundary condi-
tions for a complete solution. One is the gradient of the potential at the helical rod
surface, which can be written
$$\left.\frac{\partial\phi}{\partial z}\right|_{z=\ln \kappa a} = 4\pi\sigma/\epsilon \qquad (8.33)$$
where σ is the surface charge density and ε is the dielectric constant. For double-stranded
DNA in the units we are using, 4πσ/ε = −0.84.
The second boundary condition depends on the environment in which the DNA
finds itself. If it is effectively alone in dilute solution, then φ → 0 as z → ∞. But if the
DNA is in relatively concentrated solution, a different consideration holds. As stated
by Bloomfield et al. (1980), "In an extensive array of parallel, equally spaced rods,
a different boundary condition applies. Halfway between any two rods the potential
will be a minimum, corresponding to equally balanced electrical forces perpendicular
to the normal plane between the two rods. We then assume that we can approximate
the polygonally shaped minimum potential surface surrounding any rod by a circular
one with radius R/2, where R is the center-to-center distance between nearest
neighbor rods." At that distance,
$$\left.\frac{\partial\phi}{\partial z}\right|_{z=\ln(\kappa R/2)} = 0 \qquad (8.34)$$
We are now in a position to solve the boundary value problem for the potential
as a function of distance from the surface of the DNA helix modeled as a cylindrical
rod. We can first try the shooting method, but find that it fails. However, the functions
bvptwp() and bvpcol() succeed, as shown in Figure 8.21. Note that to change
from bvptwp() to bvpcol(), all that need be done is change the function name in
the code.
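A sketch of the two-point setup (the sinh form follows from Equation 8.32; the values of κa and κR/2 are assumptions chosen only for illustration):
> require(bvpSolve)
> fDH = function(z, y, parms) list(c(y[2], exp(2*z)*sinh(y[1])))
> za = log(1)   # z at the rod surface, assuming kappa*a = 1
> zR = log(2)   # z at the midpoint, assuming kappa*R/2 = 2
> sol = bvptwp(yini = c(NA, -0.84), x = seq(za, zR, length.out = 101),
+              func = fDH, yend = c(NA, 0))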
[Figure 8.21: the computed potential φ as a function of distance from the rod surface.]
Figure 8.22: Time course of the three-population model of resource u, consumer v, and predator w, illustrating uniform phase but chaotic amplitude behavior.
Figure 8.23: Bifurcation diagram for the three-population model, with the predator-independent
herbivore loss rate b as the control parameter. Bifurcations occur at the extrema
of the predator variable w.
Partial differential equations (PDEs) arise in all fields of science and engineering. In
contrast to ordinary differential equations, they involve more than one independent
variable, often time and one or more position variables, or several spatial variables.
The most common approach to solving PDEs numerically is the method of lines:
one discretizes the spatial derivatives and leaves the time variable continuous. This
leads to a system of ordinary differential equations to which one of the methods
discussed in the previous chapter for initial value ODEs can be applied.
R has three packages, ReacTran, deSolve, and rootSolve, that together con-
tain most of the tools needed to solve most commonly encountered PDEs. The task
view DifferentialEquations lists resources for PDEs as well as for the various types
of ODEs discussed in the previous chapter.
PDEs are commonly classified into three types: parabolic (time-dependent
and diffusive), hyperbolic (time-dependent and wavelike), and elliptic (time-
independent). We shall give examples of how each of these may be solved with
explicit R code, before showing how the functions in ReacTran, deSolve, and
rootSolve can be used to solve such problems concisely and efficiently.
In preparing the first part of this chapter I have drawn heavily on Garcia, Numerical
Methods for Physics, Chs. 6–9. The latter part of the chapter, focusing on the
ReacTran package, is based on the work of Soetaert and coworkers, Solving Differential
Equations in R and A Practical Guide to Ecological Modelling: Using R
as a Simulation Platform, which, along with the help pages and vignettes for the
packages, should be consulted for more details and interesting examples.
The one-dimensional diffusion equation
$$\frac{\partial C}{\partial t} = D\,\frac{\partial^2 C}{\partial x^2}, \qquad (9.1)$$
is, like the heat conduction equation, a parabolic differential equation. (In the heat
conduction equation, the concentration C is replaced by the temperature T, and the
diffusion coefficient D is replaced by the thermal diffusion coefficient κ.)
To solve the diffusion equation numerically, a common procedure is to discretize
the time derivative using the Euler approximation
where
$$A = \frac{D\,\Delta t}{\Delta x^2}. \qquad (9.5)$$
This is the equation, along with suitable boundary conditions, that we shall use to
compute the time-evolution of the concentration profile.
The analytic solution to the one-dimensional diffusion equation, in which the
concentration is initially a spike of magnitude C0 at the origin x0 and zero everywhere
else, is well known to be
$$C(t,x) = \frac{C_0}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(x-x_0)^2}{2\sigma^2}\right], \qquad \sigma^2 = 2Dt. \qquad (9.6)$$
In other words, the initially very sharp peak broadens with the square root of the
elapsed time. It is this behavior that we shall demonstrate in R. In the code below,
note that the initialization and updating of C maintains the boundary conditions of
C = 0 at the boundaries.
Set the parameters of the diffusion process. An important consideration in choosing
the time and distance increments is that the coefficient A = DΔt/Δx² must be
less than 1/2 for the computation to be stable.
> dt = 3 # Time step, s
> dx = .1 # Distance step, cm
> D = 1e-4 # Diffusion coeff, cm^2/s
> (A = D*dt/dx^2) # Coefficient should be < 0.5 for stability
[1] 0.03
Discretize the spatial grid and set the number of time iterations.
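A sketch of one possible discretization (the region length, step count, and spike placement are our choices):
> L = 1                     # length of the region, cm (assumed)
> n = as.integer(L/dx)      # number of grid points
> x = (1:n - 0.5)*dx - L/2  # locations of the grid points
> steps = 300               # number of time iterations (assumed)
> C = matrix(0, nrow = steps + 1, ncol = n)
> C[1, n %/% 2] = 1/dx      # initial concentration spike at the center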
Figure 9.1: Perspective plot of the evolution of a sharp concentration spike due to diffusion.
The wave equation
$$\frac{\partial^2 W}{\partial t^2} = c^2\,\frac{\partial^2 W}{\partial x^2}, \qquad (9.8)$$
where W is the displacement and c the wave speed, is a typical example of a hyper-
bolic PDE. A simplified version (see Garcia, p. 216) is the advection equation
$$\frac{\partial y}{\partial t} = -c\,\frac{\partial y}{\partial x}, \qquad (9.9)$$
which describes the evolution of the scalar field y(t, x) carried along by a flow of
constant speed c, moving to the right if c > 0. The advection equation is the simplest
example of a flux conservation equation.
The analytical solution of the advection equation, with initial condition y(0, x) =
y0(x), is simply y(t, x) = y0(x − ct). However, the numerical solution is by no means
trivial, and in fact the forward-in-t, centered-in-x approach that worked for parabolic
equations does not work for the advection equation.
As in the previous section, we replace the time derivative by its forward Euler
approximation
$$\frac{\partial y}{\partial t} \approx \frac{y(t_i + \Delta t,\, x_j) - y(t_i,\, x_j)}{\Delta t} \qquad (9.10)$$
and the space derivative by the centered discretized approximation
$$\frac{\partial y}{\partial x} \approx \frac{y(t_i,\, x_j + \Delta x) - y(t_i,\, x_j - \Delta x)}{2\Delta x} \qquad (9.11)$$
Combining and rearranging leads to the equation for y at timepoint i + 1,
$$y(i+1,\, j) = y(i,\, j) - \frac{c\,\Delta t}{2\Delta x}\bigl[y(i,\, j+1) - y(i,\, j-1)\bigr] \qquad (9.12)$$
once we provide the initial condition and boundary conditions. We use as initial con-
dition a Gaussian pulse, and impose cyclic boundary conditions, so that grid points
xn and x1 are adjacent.
Figure 9.2: Advection of a Gaussian pulse calculated according to the FTCS method.
Set the locations of the grid points and initialize the space-time matrix of con-
centration values.
> x = (1:n - 0.5)*dx - L/2 # Location of grid points
> sig = 0.1 # Standard deviation of initial Gaussian wave
> amp0 = exp(-x^2/(2*sig^2)) # Initial Gaussian amplitude
> C = matrix(rep(0, (steps+1)*n), nrow = steps+1, ncol = n)
> C[1,] = amp0 # Initial concentration distribution
Establish periodic boundary conditions.
> jplus1 = c(2:n,1)
> jminus1 = c(n,1:(n-1))
For the body of the calculation, loop over the desired number of time steps and
compute the new concentration profile at each time.
> for(i in 1:steps) { # Loop over desired number of steps
+ for(j in 1:n) { # Compute new C profile at each time
+ C[i+1,j] = C[i,j] + A*( C[i,jplus1[j]] - C[i,jminus1[j]] )
+ }
+ }
Finally, plot the initial and final concentration profiles (Figure 9.2).
> plot(x, C[1,], type = "l", ylab = "C", ylim = c(min(C), max(C)))
> lines(x, C[steps, ], lty = 3)
If the advection equation were properly solved by this method, the two waveforms
should be superimposable. Instead, distortion occurs as the wave propagates.
It can be shown that in fact there is no stable solution for any value of the characteristic
time dx/v.
In the Lax method, the time step must be neither too large (the calculation becomes
unstable) nor too small (the pulse decays as it progresses). It can be shown
that the optimum time step is dt = dx/v. The code is
exactly the same as for the FTCS method, except for the body of the calculation,
where the looping over the desired number of time steps and computation of the new
concentration profile at each time takes place. The result is shown in Figure 9.3.
> # Loop over desired number of steps #
> for(i in 1:steps) {
+ # Compute new concentration profile at each time #
+ for(j in 1:n) {
+ C[i+1,j] = 0.5*(C[i,jplus1[j]] + C[i, jminus1[j]]) +
+ A*(C[i,jplus1[j]] - C[i,jminus1[j]])
+ }
+ }
A still better approach, as explained by Garcia (pp. 222–224), is the Lax–Wendroff
method, which uses a second-order finite difference scheme to treat the time derivative.
This yields Equation 9.13 for the updating of the advection equation:
$$C_j^{i+1} = C_j^i - A\left(C_{j+1}^i - C_{j-1}^i\right) + 2A^2\left(C_{j+1}^i + C_{j-1}^i - 2C_j^i\right), \qquad A = \frac{c\,\Delta t}{2\Delta x}. \qquad (9.13)$$
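A direct coding of this update, reusing the grid and the periodic indexing vectors jplus1 and jminus1 from the FTCS code (a sketch):
> for(i in 1:steps) {
+   for(j in 1:n) {
+     C[i+1,j] = C[i,j] - A*(C[i,jplus1[j]] - C[i,jminus1[j]]) +
+       2*A^2*(C[i,jplus1[j]] + C[i,jminus1[j]] - 2*C[i,j])
+   }
+ }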
The Laplace equation
$$\frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} = 0 \qquad (9.14)$$
is an example of the third type of PDE, an elliptic equation. It arises frequently in
electrostatics, gravitation, and other fields in which the potential V is to be calcu-
lated as a function of position. If there are charges or masses in the space, and if we
generalize to three dimensions, the equation becomes the Poisson equation
$$\frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} + \frac{\partial^2 V}{\partial z^2} = f(x, y, z) \qquad (9.15)$$
Depending on the geometry of the problem, the equation may also be written in
spherical, cylindrical, or other coordinates.
To solve an elliptic equation of this type, one must be given the boundary condi-
tions. Typically, these specify that certain points, lines, or surfaces are held at con-
stant values of the potential. Then the potentials at other points are adjusted until the
equation is satisfied to some desired approximation. (In rare cases, the equation with
boundary conditions can be solved exactly analytically; but usually an approximate
solution must suffice.)
There are many approaches to numerical solution of the Laplace equation. Per-
haps the simplest is that due to Jacobi, in which the interior points are successively
approximated by the mean of their surrounding points, while the boundary points
are held at their fixed, specified values. We consider as an example a square plane,
bounded by (0,1) in the x and y directions, in which the edge at y = 1 is held at V = 1
and the other three edges are held at V = 0. We make a rather arbitrary initial guess
for the potentials at the interior points, but these will be evened out as the solution
converges.
In the following code we solve the Laplace equation on a square lattice using the
Jacobi method. We begin by setting the parameters
> n = 30 # Number of grid points per side
> L=1 # Length of a side
> dx = L/(n-1) # Grid spacing
> x = y = 0:(n-1)*dx # x and y coordinates
and making a rather arbitrary initial guess for the voltage profile.
> V0 = 1
> V = matrix(V0/2*sin(2*pi*x/L)*sin(2*pi*y/L),
+ nrow = n, ncol = n, byrow = TRUE)
We set the boundary conditions (V = 0 on three edges of the plate, V = 1 on the
fourth edge):
> V[1,] = 0
> V[n,] = 0
> V[,1] = 0
> V[,n] = V0*rep(1,n)
We make a perspective plot of the initial guess,
> par(mfrow = c(1,2))
> persp(x,y,V, theta = -45, phi = 15)
then proceed with the Jacobi-method calculation.
> ## Loop until desired tolerance is obtained
> newV = V
> itmax = n^2 # Hope that solution converges within n^2 iterations
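The iteration itself can be coded as follows (a sketch; the tolerance is our choice):
> tol = 1e-4
> for (it in 1:itmax) {
+   for (i in 2:(n-1)) {
+     for (j in 2:(n-1)) {
+       newV[i,j] = 0.25*(V[i-1,j] + V[i+1,j] + V[i,j-1] + V[i,j+1])
+     }
+   }
+   dVmax = max(abs(newV - V)) # largest change in this sweep
+   V = newV
+   if (dVmax < tol) break
+ }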
Figure 9.4: Solution to the Laplace equation with the Jacobi method.
9.4.1 setup.grid.1D
Use of ReacTran generally proceeds in three or four steps. First, the function
setup.grid.1D is used to establish the grid. In the simplest case, this function sub-
divides the one-dimensional space of length L, between x.up and x.down, into N
grid cells of size dx.1. The calling usage is
setup.grid.1D(x.up=0, x.down=NULL, L=NULL, N=NULL, dx.1=NULL, p.dx.1=
rep(1,length(L)), max.dx.1=L, dx.N=NULL, p.dx.N=rep(1,length(L)),
max.dx.N=L)
where
x.up is the position of the upstream interface
x.down is the position of the downstream interface
L = x.down - x.up
N is the number of grid cells = L/dx.1
In more complex situations, the size of the cells can vary, or there may be more
than one zone. These situations are described in the help page for setup.grid.1D.
The values returned by setup.grid.1D include x.mid, a vector of length N,
which specifies the positions of the midpoints of the grid cells at which the con-
centrations are measured, and x.int, a vector of length (N+1), which specifies the
positions of the interfaces between grid cells, at which the fluxes are measured.
The plot function for grid.1D plots both the positions of the cells and the box
thicknesses, showing both x.mid and x.int. The examples on the help page demon-
strate this behavior.
setup.grid.1D serves as the starting point for setup.grid.2D, which creates
a grid over a rectangular domain defined by two orthogonal 1D grids.
9.4.2 setup.prop.1D
Many transport models will involve grids with constant properties. But if some prop-
erty that affects diffusion or advection varies with position in the grid, the variation
can be incorporated with the function setup.prop.1D (or setup.prop.2D in two
dimensions).
Given either a mathematical function or a data matrix, the setup.prop.1D func-
tion calculates the value of the property of interest at the middle of the grid cells and
at the interfaces between cells. The function is called with
setup.prop.1D(func=NULL, value=NULL, xy=NULL, interpolate="spline",
grid, ...)
where
func is a function that governs the spatial dependency of the property
value is the constant value given to the property if there is no spatial dependency
xy is a data matrix in which the first column gives the position, and the second
column gives the values which are interpolated over the grid
interpolate is the interpolation method (spline or linear)
grid is the object defined with setup.grid.1D
... are additional arguments to be passed to func
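As a brief illustration (the grid and the linear property function are assumptions, not taken from the text), a property rising linearly from 0.5 to 0.9 across the domain could be set up as:
> require(ReacTran)
> grid = setup.grid.1D(x.up = 0, L = 1, N = 50)
> # Values are computed at both cell midpoints and interfaces
> prop = setup.prop.1D(func = function(x) 0.5 + 0.4*x, grid = grid)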
9.4.3 tran.1D
This function calculates the transport terms (the rate of change of concentration due
to diffusion and advection) in a 1D model of a liquid (volume fraction = 1) or a
porous solid (volume fraction may be variable and < 1).
tran.1D is also used for problems in spherical or cylindrical geometries, though
in these cases the grid cell interfaces will have variable areas.
The calling usage for tran.1D is
tran.1D(C, C.up = C[1], C.down = C[length(C)], flux.up = NULL, flux.down
= NULL, a.bl.up = NULL, a.bl.down = NULL, D = 0, v = 0, AFDW = 1, VF = 1,
A = 1, dx, full.check = FALSE, full.output = FALSE)
where
C is a vector of concentrations at the midpoints of the grid cells.
C.up and C.down are the concentrations at the upstream and downstream bound-
aries.
flux.up and flux.down are the fluxes into and out of the system at the upstream
and downstream boundaries.
If there is convective transfer across the upstream and downstream boundary lay-
ers, a.bl.up and a.bl.down are the coefficients.
D is the diffusion coefficient, and v is the advective velocity.
AFDW is the weight used in the finite difference scheme for advection.
VF and A are the volume fraction and area at the grid cell interfaces.
dx is the thickness of the grid cells, either a constant value or a vector.
full.check and full.output are logical flags to check consistency and regu-
late output of the calculation. Both are FALSE by default.
See the help page for details on these inputs.
When full.output = FALSE, the values returned by tran.1D are dC, the
rate of change of C at the center of each grid cell due to transport, and flux.up and
flux.down, the fluxes into and out of the model at the upstream and downstream
boundaries.
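As a minimal illustration (the grid, diffusion coefficient, and initial spike are assumptions, not taken from the text), pure diffusion of a concentration spike can be examined as follows:
> require(ReacTran)
> grid = setup.grid.1D(x.up = 0, L = 1, N = 100)
> Conc = rep(0, 100); Conc[50] = 1 # Spike in the middle cell
> tr = tran.1D(C = Conc, C.up = 0, C.down = 0, D = 0.1, dx = grid)
> tr$dC[49:51] # Rate of change of C near the spike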
ReacTran also has functions for estimating the diffusion and advection terms
in two- and three-dimensional models, and in cylindrical and polar coordinates. The
number of inputs grows with dimension, but the inputs are essentially the same as in
the 1D case. See the help pages for tran.2D, tran.3D, tran.cylindrical, and
tran.polar.
Yet another refinement is the function tran.volume.1D, which estimates the
volumetric transport term in a 1D model. In contrast to tran.1D, which uses fluxes
(mass per unit area per unit time), tran.volume.1D uses flows (mass per unit time).
It is useful for modeling channels for which the cross-sectional area changes, when
the change in area need not be explicitly modeled. It also allows lateral input from
side channels.
Figure 9.5: Advection and diffusion of an initially sharp concentration layer.
We next use ReacTran to solve the Poisson equation

$$\frac{\partial^2 w}{\partial x^2} + \frac{\partial^2 w}{\partial y^2} = -\frac{\rho}{\varepsilon_0} \qquad (9.16)$$
for a dipole located in the middle of a square sheet otherwise at 0 potential. For
simplicity, we set all scale factors equal to one. In the definition of the poisson
function, the values in the Nx × Ny matrix w are input through the data vector U. As
in the Laplace equation above, we set the initial values of w at the grid cells equal to
uniformly distributed random numbers.
Load ReacTran and establish the grid.
> require(ReacTran)
> Nx = 100
> Ny = 100
> xgrid = setup.grid.1D(x.up = 0, x.down = 1, N = Nx)
> ygrid = setup.grid.1D(x.up = 0, x.down = 1, N = Ny)
> x = xgrid$x.mid
> y = ygrid$x.mid
Find the x and y grid points closest to (0.4, 0.5) for the positive charge, and the
(x,y) grid points closest to (0.6, 0.5) for the negative charge.
> # x and y coordinates of positive and negative charges
> ipos = which.min(abs(x - 0.4))
> jpos = which.min(abs(y - 0.50))
>
> ineg = which.min(abs(x - 0.6))
> jneg = which.min(abs(y - 0.50))
Define the poisson function for the potential and its derivatives.
> poisson = function(t, U, parms) {
+ w = matrix(nrow = Nx, ncol = Ny, data = U)
+ dw = tran.2D(C = w, C.x.up = 0, C.y.down = 0,
+ flux.y.up = 0,
+ flux.y.down = 0,
+ D.x = 1, D.y = 1,
+ dx = xgrid, dy = ygrid)$dC
+ dw[ipos,jpos] = dw[ipos,jpos] + 1
+ dw[ineg,jneg] = dw[ineg,jneg] - 1
+ list(dw) }
Solve for the steady-state potential distribution, and make a contour plot of the
result (Figure 9.8).
> out = steady.2D(y = runif(Nx*Ny), func = poisson, parms = NULL,
+ nspec = 1, dimens = c(Nx, Ny), lrw = 1e7)
>
> z <- matrix(nr = Nx, nc = Ny, data = out$y)
> contour(z, nlevels = 30)
Figure 9.8: Contour plot of the steady-state dipole potential.
The diffusion coefficient D of a particle of radius R is given by the Stokes-Einstein equation

$$D = \frac{k_B T}{6 \pi \eta R} \qquad (9.17)$$

where $k_B$ is the Boltzmann constant, T the Kelvin temperature, and $\eta$ the viscosity.
We use the functions in the ReacTran package to show how the viscosity gradient
leads to an asymmetry in the concentration profile of a diffusing molecule in one
dimension.
> require(ReacTran)
We set up a grid in the x-direction with N = 100 cells and 101 interfaces including
the left and right (or up and down) boundaries.
> N=100
> xgrid = setup.grid.1D(x.up=0,x.down=1,N=N)
> x = xgrid$x.mid # Coordinates of cell midpoints
> xint = xgrid$x.int # Coordinates of interfaces
We set the average value of the diffusion coefficient equal to an arbitrary value of
1, and specify a linear viscosity gradient so that the diffusion coefficients at the left
and right sides are 1/4 and 4 times the average value:
> Davg = 1
> D.coeff = Davg*(0.25 +3.75*xint)
A similar linear dependence could be imposed with the ReacTran function p.lin(),
and exponentially or sigmoidally decreasing dependence with p.exp() or p.sig().
See the help pages for details.
We set the initial concentration to a band, roughly 10 cells wide, in the middle of
the solution, with concentration 0.1 there and concentration 0 elsewhere.
> Yini = rep(0,N); Yini[45:55] = 0.1
We set the time scale using the result, established by Einstein in his theory of
Brownian motion, that the mean-square distance diffused by a Brownian particle in
time t is
$$\langle x^2 \rangle = 2Dt. \qquad (9.18)$$
In our case, the mean-square distance from the middle to either end of the solution
is 1/4, so we set the maximum time for the simulation as tmax = 1/8. We then divide
the simulation into 100 time steps.
> tmin = 0; tmax = 1/(8*Davg)
> times = seq(tmin, tmax,len=100)
We now define the function, Diffusion(), that gives the time-derivatives of the
concentration (the fluxes):
> Diffusion = function(t,Y,parms){
+ tran = tran.1D(C=Y,D=D.coeff, dx=xgrid)
+ list(dY = tran$dC, flux.up = tran$flux.up,
+ flux.down = tran$flux.down)
+ }
Having made all the necessary preparations, we invoke the differential equation
solver ode.1D(), which uses its default method, lsoda.
> out = ode.1D(y=Yini, times=times, func=Diffusion, parms=NULL,
+ dimens=N)
The result, out, is a matrix in which column 1 gives the time and columns 2
to N + 1 the concentrations at the midpoints of the N cells. We first plot the initial
concentration profile in row 1 of out. We then use lines() to plot the concentration
profiles at subsequent times spaced to give roughly equal diffusion distances, con-
sidering the square-root dependence of average diffusion distance on time (Figure
9.9).
> plot(x, out[1,2:(N+1)], type="l", xlab="x", ylab="C",
+ ylim=c(0,0.1))
> for (i in c(2,4,8,16,32)) lines(x,out[i,2:(N+1)])
Note the asymmetry in the concentration profile, with more material accumulating to
the right, where the viscosity is lower and the diffusion coefficient higher.
9.6.2 Evolution of a Gaussian wave packet
The familiar time-dependent Schrödinger equation in one dimension,

$$i\hbar \frac{\partial \psi(x,t)}{\partial t} = H\psi = \left[-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} + V(x)\right]\psi \qquad (9.19)$$

is an example of a diffusion-advection equation. H is the Hamiltonian. With the
potential V(x) = 0, Equation 9.19 has the form of Fick's second law of diffusion,
with the diffusion coefficient $i\hbar/2m$.
We show how this equation can be solved numerically using the ReacTran pack-
age to calculate the evolution of probability density of a Gaussian wave packet in free
space. Part of the interest in this calculation is in showing how complex numbers are
handled in R. Our treatment is adapted from Garcia (2000), pp. 287–293.
We begin by loading ReacTran and defining the constants and the lattice on
which the calculation will be carried out.
> hbar = 1; m = 1
> D = 1i*hbar/(2*m)
> require(ReacTran)
> N = 131
> L = N-1
> xgrid = setup.grid.1D(-30,100,N=N)
> x = xgrid$x.mid
Next we define the function, Schrodinger, by which the derivative will be cal-
culated and updated.
> Schrodinger = function(t,u,parms) {
+ du = tran.1D(C = u, D = D, dx = xgrid)$dC
+ list(du)
+ }
For the simplest calculation, we choose a Gaussian wave packet
$$\psi(x, t=0) = \left(\sigma_0 \sqrt{\pi}\right)^{-1/2} e^{i k_0 x}\, e^{-(x-x_0)^2/2\sigma_0^2} \qquad (9.20)$$

initially centered at $x_0$, moving in the positive direction with wave number $k_0 = mv/\hbar$,
and with packet width standard deviation $\sigma_0$. The wave function is appropriately
normalized. We give values for these parameters in arbitrary units:
> # Initialize wave function
> x0 = 0 # Center of wave packet
> vel = 0.5 # Mean velocity
> k0 = m*vel/hbar # Mean wave number
> sig0 = L/10 # Std of wave function
We then calculate the normalization and the initial magnitude of the wave func-
tion as a function of x, and plot the result, showing both real and imaginary parts
(Figure 9.10).
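That code is cut off at the page break; a sketch consistent with Equation 9.20 and Figure 9.10 (the plotting details are assumptions):
> psi = exp(1i*k0*x - (x-x0)^2/(2*sig0^2))/sqrt(sig0*sqrt(pi))
> plot(x, Re(psi), type="l", ylab="psi(x)")
> lines(x, Im(psi), lty=2)
> legend("bottomright", legend=c("Re","Im"), lty=1:2, bty="n")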
Figure 9.10: Real and imaginary parts of the initial wave function ψ(x).
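The time-evolution code is likewise cut off; a sketch of the missing loop (the output times, and the use of deSolve's complex-ODE solver zvode, are assumptions):
> require(deSolve) # zvode integrates complex-valued ODEs
> times = seq(0, 60, by = 4) # Assumed output times
> out = zvode(y = psi, times = times, func = Schrodinger, parms = NULL)
> pdens = abs(out[1, 2:(N+1)])^2 # Probability density |psi|^2 at t = 0
> plot(x, pdens, type="l", xaxs="i", yaxs="i",
+ ylim = c(0, 1.05*max(pdens)), ylab="P(x,t)")
> for (i in 2:length(times)) {
+ pdens = abs(out[i, 2:(N+1)])^2
+ lines(x, pdens)
+ }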
Figure 9.11: Time evolution of the probability density of a Gaussian wave packet.
Note that the xaxs="i" and yaxs="i" options set the limits of the plot (Figure
9.11) equal to the numerical limits, rather than leaving a little space at each margin.
However, we set the upper y-axis limit as slightly larger than the amplitude of the
zero-time probability density.
Figure 9.12: Solution of the Burgers Equation 9.21 with ReacTran (left) and exact solution for L → ∞ (right).
where pnorm in R is the distribution function for the normal distribution. We load
pracma, define the functions in equations 9.22 and 9.23,
> require(pracma)
> Fn = function(t,x) 1/2*exp(t-x)*(1-erf((x-2*t)/(2*sqrt(t))))
> u = function(t,x) (Fn(t,x)-Fn(t,-x))/(Fn(t,x)+Fn(t,-x))
set up the time and space array as above,
> t = seq(0,1,.01)
> L = 10
> x = seq(-L/2,L/2,len=100)
initialize the matrix M to hold the results,
> M = matrix(rep(0,length(t)*length(x)),nrow=length(t))
perform the calculations,
> for (i in 1:length(t)) {
+ for (j in (1:length(x))) {
+ M[i,j] = u(t[i],x[j])
+ }
+ }
and plot the results in the right panel of Figure 9.12.
> plot(x, M[1,], type = "l", ylab="u")
> for (i in c(10,20,50,80)) lines(x, M[i,])
Agreement between the two modes of calculation is excellent at first, but the
results diverge slightly as time proceeds. This may be both because of accumulating
numerical imprecision in the ReacTran calculation, and because Equation 9.22 is no
longer exact as the initial discontinuity spreads toward the limits.
Chapter 10
Analyzing data
In the final two chapters we focus on data analysis, a topic for which R is particularly
well-suited; indeed, it is what R was initially developed for, and what most of the
literature on R is concerned with. However, rather than refer the reader to other resources,
it seems reasonable to present here at least a brief survey of some of the major topics,
recognizing that scientists and engineers generally spend much of their time dealing
with real data, not just developing numerical simulations.
We begin in this chapter by showing how to get data into R from external files,
and how to structure data in data frames. We then turn to standard statistical topics
of characterizing a univariate dataset, comparing two datasets, determining goodness
of fit to a theoretical model, and determining the correlation of two variables. Fi-
nally, we introduce two methods of exploratory data analysis, principal component
analysis and cluster analysis, which are crucial in making sense of large datasets.
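The command sequence being described is cut off at the page break; it presumably resembled this sketch:
> setwd("~/Desktop") # Make the desktop the working directory
> getwd() # Verify the change
> setwd("~") # Return to the home directory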
This sequence sets the new working directory to my desktop, verifies the change, and changes back
to the home directory.
To maintain the current working directory, but to access a file in an-
other directory, give the path to the file from the working directory, e.g.,
~/Desktop/NIST/lanczos3.txt if the desired file lanczos3.txt is located in
the NIST folder on my desktop.
If the entries in the file are in tabular form separated by spaces, and the columns
have headers, then the file can be read into R as a data frame (see later in this chapter)
by the command
> lan = read.table("~/Desktop/NIST/lanczos3.txt", header=TRUE)
The default is header = FALSE, with entries separated by spaces. If the entries
were separated by tabs or commas, include the option sep = "\t" or sep = ","
in read.table(). Alternatively, since comma-separated (csv) files are a common
format of files exported from spreadsheets, one may use read.csv() for those files.
Consult the help file ?read.table for a complete description of the usage of these
commands.
Conversely, if we have calculated a vector, matrix, or other array of data called
my.data, and wish to save it in the file my_file on the desktop, we do so with the
function
> write.table(my.data, file="~/Desktop/my_file")
Such a file can be imported by a spreadsheet.
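The example being described is cut off above; it presumably began with the built-in chickwts data frame:
> data(chickwts)
> head(chickwts)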
In this example, the head() function displays just the first six rows of the data
frame. In general, head(x,n) displays the first n (default = 6) rows of the object x,
which may be a vector, matrix, or data frame. Likewise, the tail() function displays
the last rows of the object.
The columns of a data frame may be specified with the $ operator:
> class(chickwts$feed)
[1] "factor"
> class(chickwts$weight)
[1] "numeric"
A handy function to summarize measurements grouped by factor is tapply, in
which the first argument is the measurement to be summarized, the second is the
factor on which grouping is to be done, and the third is the function to be applied
(mean, summary, sum, etc.).
> options(digits=1)
> tapply(chickwts$weight, chickwts$feed, mean)
casein horsebean linseed meatmeal soybean sunflower
324 160 219 277 246 329
The boxplot function provides a handy graphical overview of the distribution
of measurements grouped by factor (Figure 10.1).
> boxplot(chickwts$weight ~ chickwts$feed)
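The intervening discussion is cut off; speed here is presumably the Speed column of the built-in morley speed-of-light data:
> speed = morley$Speed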
> qqnorm(speed)
> qqline(speed)
mean of x mean of y
909.0 831.5
The test indicates that the means of the two experimental sets are significantly dif-
ferent at the p = 0.0065 level; that is, if the two sets had the same true mean, a
difference this large would arise by chance with probability only 0.0065.
Several variants of the t test should be noted. The example above is a two-sample,
unpaired, two-sided test. A one-sample t test compares a single sample against a
hypothetical mean mu, e.g. t.test(morley$Speed, mu = 850). In a paired t test,
the individuals in each sample are related in some way (e.g., IQ of identical twins,
Young's modulus of several steel bars before and after heat treatment, etc.). In such
a case, the argument paired = TRUE should be specified. A two-sided test is one in
which the mean of one sample can be either greater or less than that of the other. If it
is desired to test whether the mean of sample 1 is greater than that of sample 2, use
alternative = "greater", and similarly for "less". See ?t.test for details.
The t test applies rigorously only if the variation in the vectors is normally dis-
tributed. We saw that was essentially the case with the morley data, but not all data
behave so nicely. Consider, for example, the airquality dataset in the base R in-
stallation (Figure 10.4).
> boxplot(Ozone ~ Month, data = airquality)
Suppose we want to test the hypothesis that the mean ozone levels in months 5
and 8 are equal. A histogram and qqnorm plot of the month 5 data show a distinctly
non-normal distribution of ozone level occurrences (Figure 10.5); the same is true
for month 8.
> airq5 = subset(airquality, Month == 5)
> par(mfrow=c(1,2))
> hist(airq5$Ozone)
> qqnorm(airq5$Ozone)
Figure 10.5: Histogram (left) and normal quantile plot (right) of the month 5 ozone data.
In this case, the Wilcoxon (also known as Mann-Whitney) rank-sum test is more
appropriate than the t test. Executing the example in the help page for wilcox.test,
we obtain
> wilcox.test(Ozone ~ Month, data = airquality,
+ subset = Month %in% c(5, 8))
Wilcoxon rank sum test with continuity correction
data: Ozone by Month
W = 127.5, p-value = 0.0001208
alternative hypothesis: true location shift is not equal to 0
Warning message:
In wilcox.test.default(x = c(41L, 36L, 12L, 18L, 28L, 23L, 19L, :
cannot compute exact p-value with ties
so there is only a probability of one part in 10^4 that the means are equal.
Figure 10.6: Linear and log-log plots of brain weight vs. body weight, from MASS dataset
Animals.
10.6 Correlation
We are often interested in whether, and to what extent, two sets of data are correlated
with one another. Correlation may, but need not, imply a causal relation between
the variables. There are three standard measures of correlation: Pearson's product-
moment coefficient, and rank correlation coefficients due to Spearman and Kendall.
R gives access to all of these via the cor.test function, with Pearson's as the default.
We demonstrate the use of the cor.test function via the Animals dataset in the
MASS package. It is almost always useful to first graph the data (Figure 10.6).
> require(MASS)
> par(mfrow=c(1,2))
> plot(Animals$body, Animals$brain)
> plot(Animals$body, Animals$brain, log="xy")
We see that because of a few outliers (elephants, humans), the linear plot is not
very informative, but the log-log plot shows a strong correlation between body weight
and brain weight. However, when we use the linear data with the default (Pearson)
cor.test, we find virtually no correlation because of the strong influence of the out-
liers.
> cor.test(Animals$body, Animals$brain)
> cor.test(Animals$body, Animals$brain, method="spearman")
Warning message:
In cor.test.default(Animals$body, Animals$brain, method = "spearman") :
Cannot compute exact p-values with ties
> cor.test(Animals$body, Animals$brain, method="kendall")
Warning message:
In cor.test.default(Animals$body, Animals$brain, method = "kendall") :
Cannot compute exact p-value with ties
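The setup of the principal component analysis is cut off here; the output below is consistent with a scaled prcomp decomposition of the four numeric columns of the iris data (the exact call is an assumption):
> iris1 = iris[, 1:4] # The four numeric measurements
> iris1_pca = prcomp(iris1, scale. = TRUE)
> iris1_pca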
Rotation:
PC1 PC2 PC3 PC4
Sepal.Length 0.5211 -0.37742 0.7196 0.2613
Sepal.Width -0.2693 -0.92330 -0.2444 -0.1235
Petal.Length 0.5804 -0.02449 -0.1421 -0.8014
Petal.Width 0.5649 -0.06694 -0.6343 0.5236
The summary function gives the proportion of the total variance attributable to
each of the principal components, and the cumulative proportion as each component
is added in. We see that the first two components account for more than 95% of the
total variance.
> summary(iris1_pca)
Importance of components:
PC1 PC2 PC3 PC4
Standard deviation 1.71 0.956 0.3831 0.14393
Proportion of Variance 0.73 0.229 0.0367 0.00518
Cumulative Proportion 0.73 0.958 0.9948 1.00000
The histogram (the result of plot in a prcomp analysis) graphically recapitu-
lates the proportions of the variance contributed by each principal component, while
the biplot shows how the initial variables are projected on the first two principal
components (Figure 10.7). It also shows (albeit illegibly at the printed scale) the co-
ordinates of each sample in the (PC1, PC2) space. One species of iris (which turns
out to be setosa from the cluster analysis below) is distinctly separated from the other
two species in this coordinate space.
> par(mfrow=c(1,2))
> plot(iris1_pca)
> biplot(iris1_pca, col = c("gray", "black"))
> par(mfrow=c(1,1))
See the Multivariate Statistics task view in CRAN for more information and op-
tions.
Figure 10.7: Screeplot (left) and biplot (right) of the principal components analysis of the iris data.
Figure 10.8: Hierarchical cluster analysis of iris data using hclust.
The function hclust performs a hierarchical cluster analysis using a set of dissimilarities for the n objects being
clustered. Initially, each object is assigned to its own cluster and then the algorithm
proceeds iteratively, at each stage joining the two most similar clusters, continuing
until there is just a single cluster. At each stage distances between clusters are recom-
puted by the Lance-Williams dissimilarity update formula according to the particu-
lar clustering method being used. There are seven agglomeration methods available,
with "complete" (which searches for compact, spherical clusters) as the default. See
help(hclust) for details.
> iris1_dist = dist(iris1) # Uses default method
> plot(hclust(iris1_dist))
Figure 10.9: Divisive hierarchical cluster analysis of iris data using diana (divisive coefficient = 0.95).
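The k-means call producing the cluster-size table below is cut off; it was presumably of the form:
> iris1_kmeans3 = kmeans(iris1, centers = 3)
> table(iris1_kmeans3$cluster)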
1 2 3
96 21 33
> ccent = function(cl) {
+ f = function(i) colMeans(iris1[cl==i,])
+ x = sapply(sort(unique(cl)), f)
+ colnames(x) = sort(unique(cl))
+ return(x)
+ }
> ccent(iris1_kmeans3$cluster)
1 2 3
Sepal.Length 6.315 4.7381 5.1758
Sepal.Width 2.896 2.9048 3.6242
Petal.Length 4.974 1.7905 1.4727
Petal.Width 1.703 0.3524 0.2727
Figure 10.10: pam (partitioning around medoids) analysis of iris data.
$$f(k) = \frac{\lambda^k e^{-\lambda}}{k!} \qquad (10.1)$$

and the expected number of occurrences of k emissions is this probability multiplied
by the total number of intervals.
> k = emissions
> expCounts = totIntervals*lambda^k*exp(-lambda)/factorial(k)
> expCounts = round(expCounts,2)
The chi-squared test for goodness of fit demands an expected count of at least
five in each interval. Therefore, the first three intervals are combined into one, as are
the last three. We can then display the observed (O) and expected (E) counts as a
table.
> O = c(sum(obsCounts[1:3]),obsCounts[4:17],sum(obsCounts[18:20]))
> E = c(sum(expCounts[1:3]),expCounts[4:17],sum(expCounts[18:20]))
> cbind(O,E)
O E
[1,] 18 12.45
[2,] 28 27.39
[3,] 56 57.28
[4,] 105 95.86
[5,] 126 133.67
[6,] 146 159.78
[7,] 164 167.11
[8,] 161 155.36
[9,] 123 129.99
[10,] 101 98.87
[11,] 74 68.94
[12,] 53 44.37
[13,] 23 26.52
[14,] 15 14.79
[15,] 9 7.74
[16,] 5 6.36
The Pearson chi-square test statistic is
$$\chi^2 = \sum_{k=1}^{n} (O_k - E_k)^2 / E_k. \qquad (10.2)$$
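The statistic and its p-value then follow directly (a sketch; the degrees of freedom, 16 bins less one constraint and one estimated parameter, are an assumption about the intended analysis):
> chisq = sum((O - E)^2/E) # Equation 10.2
> df = length(O) - 2 # 16 bins - 1 - 1 estimated parameter (lambda)
> pchisq(chisq, df, lower.tail = FALSE) # Probability of so large a chi-squared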
> screeplot(pc)
The first eight principal components contribute most of the variance. This is made
visually apparent with screeplot, which plots the variances against the number of
the principal component (Figure 10.11).
We learn which properties contribute most to the major principal components by
calling the rotation element of the prcomp list. (princomp calls this the loadings
element.)
> round(pc$rotation[,1:8],2)
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
z 0.16 0.17 -0.38 0.37 -0.05 -0.05 0.17 0.28
u_mag 0.23 0.30 -0.25 -0.04 0.08 0.03 0.17 -0.25
sig_u 0.12 0.31 -0.23 0.17 0.22 0.18 0.19 -0.63
g_mag 0.27 0.26 -0.12 -0.20 0.02 0.01 -0.07 0.05
sig_g 0.08 0.23 -0.14 0.06 0.33 0.35 -0.77 0.22
r_mag 0.29 0.20 -0.02 -0.28 -0.09 -0.07 0.02 0.18
sig_r 0.05 0.25 0.44 0.32 -0.01 0.00 -0.02 -0.03
i_mag 0.29 0.17 0.01 -0.29 -0.13 -0.09 0.08 0.16
sig_i 0.06 0.26 0.45 0.29 -0.01 -0.01 -0.02 0.01
z_mag 0.29 0.14 0.02 -0.26 -0.15 -0.10 0.09 0.19
sig_z 0.14 0.29 0.40 0.16 -0.07 -0.06 0.08 0.08
Radio -0.03 0.04 -0.01 -0.01 0.58 -0.80 -0.10 -0.02
X.ray -0.11 0.01 0.12 -0.12 0.62 0.39 0.51 0.40
J_mag -0.31 0.23 -0.05 -0.06 -0.03 -0.02 0.01 0.02
sig_J -0.29 0.26 -0.05 -0.11 -0.11 -0.04 0.01 0.05
H_mag -0.31 0.23 -0.05 -0.06 -0.04 -0.02 0.01 0.02
sig_H -0.29 0.25 -0.06 -0.09 -0.11 -0.04 0.01 0.06
K_mag -0.31 0.23 -0.05 -0.06 -0.04 -0.02 0.01 0.02
sig_K -0.29 0.25 -0.07 -0.07 -0.13 -0.06 0.01 0.08
M_i -0.03 0.01 0.34 -0.55 0.12 0.12 -0.11 -0.37
Chapter 11

Fitting models to data
A large part of scientific computation involves using data to determine the parame-
ters in theoretical or empirical model equations. Not surprisingly, given its statistical
roots, R has powerful tools for fitting functions to data. In this chapter we discuss the
most important of these tools: linear and nonlinear least-squares fitting, and poly-
nomial and spline interpolation. We also show how these methods can be used to
accelerate the convergence of slowly convergent series with Padé and Shanks ap-
proximations. We then consider the related topics of time series, Fourier analysis of
periodic data, spectrum analysis, and signal processing, with a focus on extracting
signal from noise.
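The code simulating the data and performing the fit is cut off at the page break; a sketch consistent with the output below (true intercept 4 and slope 3; the seed and noise level are assumptions):
> set.seed(333)
> x = seq(0, 10, 0.5)
> y = 4 + 3*x + rnorm(length(x), 0, 0.5) # Line plus Gaussian noise
> yfit = lm(y ~ x)
> yfit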
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
4.029 2.988
The intercept and slope are recovered within a few percent of the original.
Figure 11.1: Linear fit (left) and residuals (right) for simulated data with random error.
Note that lm() enables one to draw the fitted line with the abline() function,
which takes the intercept and slope from the fitted coefficients. The lm() function
also calculates the residuals, convenient for visual inspection of the quality of the
fit (Figure 11.1).
> par(mfrow=c(1,2))
> plot(x,y)
> abline(yfit)
> plot(x,residuals(yfit))
> abline(0,0)
If appropriate, the measurements may be accompanied by a vector of weights, in
which case weighted least squares is used. See ?lm for further details.
Figure 11.2: lm() fit to a quadratic polynomial with random error.
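The code generating the quadratic data is also cut off; a sketch consistent with the summary below (the true coefficients, seed, and noise level are assumptions):
> set.seed(66)
> x = 0:20
> y = 1.6 - 0.06*x + 0.017*x^2 + rnorm(21, 0, 0.3)
> y2fit = lm(y ~ 1 + x + I(x^2))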
> summary(y2fit)
Call:
lm(formula = y ~ 1 + x + I(x^2))
Residuals:
Min 1Q Median 3Q Max
-0.34951 -0.25683 -0.08032 0.15884 0.80823
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.583913 0.197273 8.029 2.33e-07 ***
x -0.061677 0.045711 -1.349 0.194
I(x^2) 0.017214 0.002207 7.801 3.50e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Parameters:
Estimate Std. Error t value Pr(>|t|)
b1 2.389e+02 2.707e+00 88.27 <2e-16 ***
b2 5.502e-04 7.267e-06 75.71 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Figure 11.5: Fit and residuals of nls() fit to the 3-exponential Lanczos function 11.1.
Parameters:
Estimate Std. Error t value Pr(>|t|)
b1 0.08682 0.01720 5.048 8.37e-05 ***
b2 0.95498 0.09704 9.841 1.14e-08 ***
b3 0.84401 0.04149 20.343 7.18e-14 ***
b4 2.95160 0.10766 27.416 3.93e-16 ***
b5 1.58257 0.05837 27.112 4.77e-16 ***
b6 4.98636 0.03444 144.801 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> summary(nlsLM_lan3)
Parameters:
Estimate Std. Error t value Pr(>|t|)
b1 0.10963 0.01939 5.656 2.30e-05 ***
b2 1.06938 0.08704 12.286 3.45e-10 ***
b3 0.90322 0.05645 16.001 4.35e-12 ***
b4 3.09411 0.12182 25.399 1.50e-15 ***
b5 1.50055 0.07550 19.874 1.07e-13 ***
b6 5.03437 0.04458 112.930 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Parameters:
Estimate Std. Error t value Pr(>|t|)
b1 0.08682 0.01720 5.048 8.36e-05 ***
b2 0.95499 0.09703 9.842 1.14e-08 ***
b3 0.84401 0.04149 20.344 7.17e-14 ***
b4 2.95161 0.10766 27.417 3.92e-16 ***
b5 1.58256 0.05837 27.113 4.77e-16 ***
b6 4.98636 0.03443 144.807 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Figure 11.6: Concentration of product C of reversible reaction with points reflecting measure-
ment errors.
Given a solution of a model and observed data, modCost estimates the residuals,
and the variable and model costs (sum of squared residuals). The function is called
with
modCost(model, obs, x = "time", y = NULL, err = NULL,
weight = "none", scaleVar = FALSE, cost = NULL, ...)
where the arguments are (see the help page for details):
model model output, as generated by the integration routine or the steady-state
solver, a matrix or a data.frame, with one column per dependent and independent
variable.
obs the observed data, either in long (database) format (name, x, y), a data.frame, or
in wide (crosstable, or matrix) format.
x the name of the independent variable; it should be a name occurring both in the
obs and model data structures.
y either NULL, the name of the column with the dependent variable values, or an in-
dex to the dependent variable values; if NULL then the observations are assumed
to be in crosstable (matrix) format, and the names of the independent variables
are given by the column names of this matrix.
cost if not NULL, the output of a previous call to modCost; in this case, the new
output will combine both.
weight only if err = NULL: how to weigh the residuals; one of "none", "std", or "mean".
scaleVar if TRUE, then the residuals of one observed variable are scaled respec-
tively to the number of observations.
... additional arguments passed to R-function approx.
In our case, model is the data frame out, and obs is the data frame dataC. x
and y are picked up from the names in the data frames, and the other arguments are
handled as defaults.
> require(FME)
> rxnCost = function(pars) {
+ out = rxn(pars)
+ cost = modCost(model = out, obs = dataC)
+ }
modFit performs constrained fitting of a model to data, in many ways like the
other nonlinear optimization routines we have considered, and is called as follows:
modFit(f, p, ..., lower = -Inf, upper = Inf,
method = c("Marq", "Port", "Newton",
"Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN",
"Pseudo"), jac = NULL,
control = list(), hessian = TRUE)
Its arguments are
f a function to be minimized, with first argument the vector of parameters over
which minimization is to take place. It should return either a vector of residuals
(of model versus data) or an element of class modCost (as returned by a call to
modCost).
p initial values for the parameters to be optimized over.
... additional arguments passed to function f (modFit) or passed to the methods.
lower, upper lower and upper bounds on the parameters; if unbounded set equal to
Inf.
method the method to be used, one of "Marq", "Port", "Newton", "Nelder-Mead",
"BFGS", "CG", "L-BFGS-B", "SANN", or "Pseudo"; see the help page for details.
Note that the Levenberg-Marquardt method ("Marq") is the default.
jac a function that calculates the Jacobian; it should be called as jac(x, ...) and return
the matrix with derivatives of the model residuals as a function of the parameters.
Supplying the Jacobian can substantially improve performance; see last example.
hessian TRUE if Hessian is to be estimated. Note that, if set to FALSE, then a sum-
mary cannot be estimated.
control additional control arguments passed to the optimization routine.
Applying modFit to our fitting problem with guesses for the parameters that are
not too far from the real values, and using rxnCost as the function to be minimized,
we obtain
> Fit = modFit(p = c(kf=.5, kr=.5), f = rxnCost)
> summary(Fit)
Parameters:
Estimate Std. Error t value Pr(>|t|)
kf 0.196391 0.007781 25.24 <2e-16 ***
kr 0.293774 0.013954 21.05 <2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Parameter correlation:
kf kr
kf 1.0000 0.9618
kr 0.9618 1.0000
If the guesses for the parameters are too far from the correct values, an error
message may be returned stating effectively that the calculation did not converge
after the maximum number of iterations, but that "Results are accurate, as far as they
go." In that case, those results may be used to start a new calculation, or the maximum
number of iterations may be increased.
Proper inverse modeling involves a number of subtleties and complexities be-
yond just nonlinear optimization. The FME vignette uses a relatively simple model of
HIV infection to demonstrate these points. We urge the reader to work through the
vignette, and summarize its contents as follows.
1. The model is formulated as a function containing a set of ODEs with given pa-
rameters, and solved using the deSolve function ode, with output going to a
data frame. The function is initially coded in R (HIV_R) but then in Fortran for
speed (HIV), since at later stages the calculation must be repeated thousands of
times. The process for writing code in Fortran, C, or C++ is described in the
vignette deSolve: Writing Code in Compiled Languages, available at
http://cran.r-project.org/web/packages/deSolve/vignettes/compiledCode.pdf.
2. The output is compared with the data, actually artificial data to which random
noise has been applied. The weighted and scaled residuals are converted into a
cost (HIVcost) using the modCost function of FME.
3. Local sensitivity (sensitivity to the specific parameters in the model) is then cal-
culated using the sensFun function of FME, which takes as input HIVcost and the
parameters. This process identifies parameters that have little effect on the cost
when they are varied, and parameters that have strongly similar effects, indicat-
ing that they may not be independent. Parameters that are not strongly pairwise
correlated are termed identifiable, and are the ones that are important to the
model.
4. Further examination of identifiability comes from multivariate parameter analysis
using the collin function of FME. This yields a set of collinearity indices, indicat-
ing the extent to which a change in one parameter can be undone by appropriate
changes in the other parameters. sensFun and collin together enable selection of
the set of parameters with the smallest collinearities for subsequent fitting.
5. To find the best values of the remaining parameters, nonlinear data fitting is car-
ried out with the modFit function of FME. This is a wrapper for the optimization
functions in optim, nls, and nlminb from the R base packages, with the addi-
tion of the Levenberg-Marquardt algorithm from the minpack.lm package and a
pseudo-random search algorithm implemented in FME.
6. The steps up to this point have provided values for the identifiable parameters that
are optimal in the least squares sense. However, it is important to estimate the
effect of uncertainties in the parameters on the fit between the model and the data.
This is done with the modMCMC function in FME, using a Markov chain Monte Carlo
method with probabilities drawn from the target distribution, as described in the
vignette. It is at this stage that the large number of runs of the model are carried
out, making it desirable to code the HIV function in a fast, low-level language.
7. The function sensRange of FME is then used to generate graphs and summary
data on the effect of parameter uncertainty on the output of the model.
8. An extension of this approach to global parameter sensitivity is made by the func-
tion modCRL which tests the effect of parameter variation on a single output vari-
able (such as mean viral load), rather than on a time series.
FME is applicable not just to simulations of dynamic processes, but also
to steady states (vignette FMEsteady) and nonlinear equilibrium models (vi-
gnette FMEother). The vignettes FMEdyna and FMEmcmc demonstrate ad-
ditional aspects of the FME package. All of these vignettes are available at
http://CRAN.R-project.org/package=FME.
Figure 11.7: Approximations of ln(1+x): solid line, true function; dashed line, Taylor series;
points, Padé approximation.
Figure 11.8: Approximation to ζ(2) by direct summation of 1/x^2.
The R program below, which applies the Shanks transformation three times in
succession, comes within 1.6% using only the first seven terms of the series, while
direct summation takes 37 terms to get that close.
> S = function(w,n) {
+ lam = (w[n+1]-w[n])/(w[n]-w[n-1])
+ return(w[n+1]+lam/(1-lam)*(w[n+1]-w[n]))
+ }
> # Use terms (1,2,3) to get S(csy,2), ...
> # (5,6,7) to get S(csy,6)
> S1 = c(S(csy,2),S(csy,3),S(csy,4),S(csy,5),S(csy,6))
> S1
[1] 1.450000 1.503968 1.534722 1.554520 1.568312
> # Now use the previous five values to get three new values
> S2 = c(S(S1,2),S(S1,3),S(S1,4))
> S2
[1] 1.575465 1.590296 1.599981
> # Use those three values to get one new value
> S3 = S(S2,2);
> S3
[1] 1.618209
> pi^2/6
[1] 1.644934
11.5 Interpolation
Often one has tabulated values of a property as a function of some condition, but
wants the value at some other conditions than those tabulated. If the desired con-
dition lies within the tabulated range, the value can be estimated by interpolation.
(Extrapolating beyond the tabulated range is a much riskier business.) R has several
functions for doing such interpolation.
Figure 11.9: Viscosity of 20% solutions of sucrose in water as a function of temperature.
For example, biochemists often sediment proteins and nucleic acids through
aqueous sucrose solutions. Tables of the viscosity of such solutions are available
at 5 deg C temperature increments (0, 5, 10, 15, etc.). But suppose sedimentation
measurements are to be done at other temperatures, e.g., 4, 7, and 12 deg C. See
Figure 11.9.
> # Known values
> tC = c(0,5,10,15)
> visc = c(3.774, 3.135, 2.642, 2.255)
> plot(tC,visc, type="o")
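The interpolation call itself is cut off here; linear interpolation at the desired temperatures, consistent with the $y values below, would be:
> approx(tC, visc, xout = c(4, 7, 12))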
$y
[1] 3.2628 2.9378 2.4872
Figure 11.10: Examples of non-monotonic and monotonic fitting to a set of points.
Figure 11.11: Fit of a spline function to a simulated spectrum, along with first and second
derivative curves.
Second and higher order derivatives can also be calculated (Figure 11.11).
> curve(10*fsp(x, deriv=2),add=T, lty="dotted")
> abline(0,0)
Figure 11.12: Sampling and analysis of a sine signal.
Figure 11.14: Power spectrum of sine function.
Figure 11.15: Two sine functions (left) and their Fourier components (right).
If we apply the same analysis to the sum of two sine functions, with different
frequencies and amplitudes, we recover the original frequencies with approximately
proportionate amplitudes with spectrum(). The fft() results, however, are not
easy to interpret by inspection (Figure 11.15).
> par(mfrow=c(1,2))
>
> N = 50; tau = 1
> f1 = 1/5; A1 = 1; f2 =1/3; A2 = 2
> curve(A1*sin(2*pi*f1*x) + A2*sin(2*pi*f2*x),0,N-1,
+ xlab="time", main="Two Sine Functions")
>
> j=0:(N-1)
> y = A1*sin(2*pi*f1*j*tau) + A2*sin(2*pi*f2*j*tau)
>
> ry = Re(fft(y)); iy = Im(fft(y))
> zry = zapsmall(ry)
> ziy = zapsmall(iy)
>
> plot(j/(tau*N),zry,type="h",ylim=c(min(c(zry,ziy)),
+ max(c(zry,ziy))),xlab = "freq",
+ ylab ="Re(y),Im(y)", main="Fourier Components")
> points(j/(tau*N),ziy,type="h",lty=2)
The power spectrum (Figure 11.16) is computed and plotted from
> par(mfrow = c(1,1))
> sp = spectrum(y, xlab="frequency", ylab="power",
+ main="Power Spectrum 2 Sines")
> grid()
Figure 11.16: Power spectrum of the sum of two sine functions.
A more realistic case would be a signal consisting of two sine functions with a
sloping baseline and a significant amount of random noise (Figure 11.17).
> par(mfrow=c(1,2))
> set.seed(123)
> N = 50; tau = 1
> f1 = 1/5; A1 = 1; f2 =1/3; A2 = 2
> j=0:(N-1)
> y = A1*sin(2*pi*f1*j*tau) + A2*sin(2*pi*f2*j*tau)
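The lines adding the sloping baseline and the noise are cut off here; one plausible form (the slope and noise level are assumptions):
> y = y + 0.1*j + rnorm(N, 0, 0.5) # Sloping baseline plus Gaussian noise
> plot(j, y, type="l", xlab="time")
> spectrum(y)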
Figure 11.17: Power spectrum (right) of the sum of two sine functions with random noise and
a sloping baseline (left).
Figure 11.18: Plot of the peaks derived from the power spectrum.
Figure 11.19: Frequency response (magnitude and phase) of the butter(3,0.1) Butterworth filter.
Figure 11.20: Use of butter(3,0.1) filter to extract a sinusoidal signal from added nor-
mally distributed random noise.
> lines(t, z, lty=2, lwd = 1.5)
> legend("bottomleft", legend = c("data", "filtfilt", "filter"),
+ lty=c(3,1,2), lwd=rep(1.5,3), bty = "n")
Figure 11.21: Use of Savitzky-Golay filter to extract a sinusoidal signal from added normally
distributed random noise.
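A minimal sketch of such Savitzky-Golay smoothing with the signal package (the signal form, noise level, and filter settings are assumptions):
> require(signal)
> tt = seq(0, 1, len = 200)
> xx = sin(2*pi*tt) + rnorm(200, 0, 0.3) # Noisy sinusoid
> xs = sgolayfilt(xx, p = 3, n = 11) # Cubic polynomial, 11-point window
> plot(tt, xx, type="l", lty=3)
> lines(tt, xs)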
Figure 11.22: Use of fftfilt to extract a sinusoidal signal from added normally distributed
random noise.
Chan (2008); Chapter 14 in Venables and Ripley (2002); and the Time Series Analysis
Task View on CRAN. Shorter but useful online treatments have been written by
Coghlan and Kabacoff, among others.
Figure 11.23: (left) Plot of Hahn1 data and fitting function; (right) Plot of residuals.
We extract the x and y variables, and plot the data to get a sense of its behavior.
Anticipating the need to overlay the fitting function and plot the residuals, we set up
a 1 × 2 graphics array (Figure 11.23).
> x = hahn1$x; y = hahn1$y
> par(mfrow=c(1,2))
> plot(x,y,cex=0.5)
We use the Levenberg-Marquardt approach to find the estimated best values for the
coefficients in the rational function with the nlsLM() function in the minpack.lm
package. As starting values we use those on the NIST website.
> require(minpack.lm)
> nlsLM_Hahn1 = nlsLM(y ~ (b1+b2*x+b3*x^2+b4*x^3)/
+ (1+b5*x+b6*x^2+b7*x^3),
+ start=list(b1=10, b2=-1, b3=.05, b4=-1e-5,
+ b5=-5e-2, b6=.001, b7=-1e-6))
We get the estimated values, their standard errors, and the associated p-values
(infinitesimal in all cases) with the summary() function.
> summary(nlsLM_Hahn1)
Parameters:
Estimate Std. Error t value Pr(>|t|)
b1 1.078e+00 1.707e-01 6.313 1.40e-09 ***
b2 -1.227e-01 1.200e-02 -10.224 < 2e-16 ***
b3 4.086e-03 2.251e-04 18.155 < 2e-16 ***
b4 -1.426e-06 2.758e-07 -5.172 5.06e-07 ***
b5 -5.761e-03 2.471e-04 -23.312 < 2e-16 ***
Figure 11.24: The co2 time series: monthly atmospheric CO2 concentrations.
Figure 11.25: Decomposition of CO2 data into trend, seasonal, and random components.
330 BIBLIOGRAPHY
[JMR09] Owen Jones, Robert Maillardet, and Andrew Robinson. Introduction to
Scientific Programming and Simulation Using R. CRC Press, Boca Ra-
ton, 2009.
[Kab11] Robert I. Kabacoff. R in Action: Data Analysis and Graphics with R.
Manning, Shelter Island, N.Y., 2011.
[Mat11] Norman Matloff. The Art of R Programming: A Tour of Statistical Soft-
ware Design. No Starch Press, San Francisco, 2011.
[Mit11] Hrishi V. Mittal. R Graphs Cookbook. Packt, Birmingham, U.K., 2011.
[MJS11] Walter R. Mebane, Jr. and Jasjeet S. Sekhon. Genetic optimization using
derivatives: the rgenoud package for R. Journal of Statistical Software,
URL http://www.jstatsoft.org, 2011.
[Mur11] Paul Murrell. R Graphics. CRC Press, Boca Raton, second edition, 2011.
[Pet03] Thomas Petzoldt. R as a simulation platform in ecological modelling. R
News, 3(3):8–16, 2003.
[PTVF07] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P.
Flannery. Numerical Recipes: The Art of Scientific Computing. Cam-
bridge University Press, New York, third edition, 2007.
[Ric95] J.A. Rice. Mathematical Statistics and Data Analysis. Duxbury Press,
Pacific Grove, CA, second edition, 1995.
[SCM12] Karline Soetaert, Jeff Cash, and Francesca Mazzia. Solving Differential
Equations in R. Springer, New York, 2012.
[Scr12] Luca Scrucca. GA: A package for genetic algorithms in R. Journal of
Statistical Software, 53:1–37, 2012.
[SH10] Karline Soetaert and Peter M.J. Herman. A Practical Guide to Ecological
Modelling: Using R as a Simulation Platform. Springer, New York, 2010.
[SN87] J.M. Smith and H.C. Van Ness. Introduction to Chemical Engineering
Thermodynamics. McGraw-Hill, New York, 1987.
[Ste09] M. Henry Stevens. A Primer of Ecology with R. Springer, New York,
2009.
[Tee11] Paul Teetor. R Cookbook. O'Reilly, Sebastopol, CA, 2011.
[Van08] Steve VanWyk. Computer Solutions in Physics with Applications in As-
trophysics, Biophysics, Differential Equations, and Engineering. World
Scientific, Singapore, 2008.
[Ver04] John Verzani. Using R for Introductory Statistics. CRC Press, Boca
Raton, 2004.
[VR02] W.N. Venables and B.D. Ripley. Modern Applied Statistics with S.
Springer, New York, fourth edition, 2002.
[ZRE56] B.H. Zimm, G.M. Roe, and L.F. Epstein. Solution of a characteristic
value problem from the theory of chain molecules. J. Chem. Phys.,
24:279–280, 1956.