SSLib
The Statistical Seismology Library
David Harte
URL: www.statsresearch.co.nz
Email: david'at'statsresearch.co.nz
Copyright © 2007 by David Harte. This document may be reproduced and distributed
in any medium so long as the entire document, including this copyright notice and the
version date above, remains intact and unchanged on all copies. Commercial redistribution is permitted, but you may not redistribute it, in whole or in part, under terms more restrictive than those under which you received it.
The document should be cited in the usual scientific manner, and should contain the
following information:
Contents

Preface

I The R Language

1 Introduction to R
  1.1 Starting R
  1.2 Quitting R
  1.3 Vectors
  1.4 Matrices
  1.5 Mode of an Object
  1.6 Function Objects
  1.7 Help Documentation
  1.8 Writing Functions
  1.9 Graphs

2 Input-Output Methods
  2.1 Reading Data from a Text File
  2.2 Including an R Program Source File
  2.3 Writing R Objects to a Text File
  2.4 Writing Program Output to a Text File
  2.5 Saving R Objects for Use in a Subsequent Session
  2.6 Retrieving R Objects from a Previous Session
  2.7 Executing FORTRAN and C++ from within R
  2.8 Running Jobs in Batch Mode

8 M8 Algorithm (ssM8)

11 Software Installation
  11.1 Installation or Updating of the R Software
    11.1.1 Linux
    11.1.2 Microsoft Windows
  11.2 Installation or Updating of SSLib
    11.2.1 Unix/Linux
    11.2.2 Microsoft Windows

IV Appendices

B Common R Functions
  B.1 Data Objects
    B.1.1 Checking and Creating Different Data Types
    B.1.2 Data Attributes

C Mathematical Detail
  C.1 Point Process Log-Likelihood Function
  C.2 Self-Exciting and ETAS Models
    C.2.1 Self-Exciting Models
    C.2.2 The ETAS Model
    C.2.3 Utsu & Ogata's Parameterisation
    C.2.4 SSLib Parameterisation
  C.3 Stress Release Models
    C.3.1 Simple Stress Release Model
    C.3.2 Linked Stress Release Model
  C.4 Simulation Using the Thinning Method
    C.4.1 Algorithm
    C.4.2 Simulation Notes

References
Preface
Part I
The R Language
Chapter 1
Introduction to R
1.1 Starting R
Start R by entering
R
on the xterm or console command line. Your window will look something like:
david> R
>
1.2 Quitting R
When you have completed your session within R, quit by entering q(). When you quit
R, it will ask whether you want to save any of the R objects that you may have created
during the session. You can list them by entering ls() on the command line. So far we
have not created anything, so there should be nothing listed. If you choose to save any
objects when quitting R they will be written to the disk into the directory from which
you started R, and into a file called .RData. Next time you start R from within this
subdirectory, the objects that have been saved into the file .RData will be automatically
loaded into the R session. This is discussed further in §2.5.
1.3 Vectors
1. Vectors are constructed using the c function, which stands for combine or concatenate. For example, within the R window, enter
a <- c(1, 2, 3, 4, 5)
b <- c(2, 4, 6, 8, 10)
d <- c(3, 9, 27)
The <- means assign the value of the object on the right to an object with the
given name on the left. Thus we have three vectors a, b and d.
To save possible confusion, we prefer not to use c as a vector name, since it is the name of the system function c (combine or concatenate). We will look at functions more carefully in §1.6.
3. Individual elements of a vector can be indexed using square brackets. There are
two methods of selecting the required elements from a vector:
(a) To select, for example, the 2nd element from vector b, enter b[2] on the command line. To select the 2nd element twice and the 4th element, enter b[c(2,2,4)] on the command line. Notice that the indices are contained within a vector, i.e. c(2, 2, 4).
(b) Alternatively, one can use a logical (Boolean) vector of the same length as b.
To select those elements in b that are greater than 6, create a logical vector
by entering e <- (b > 6). Now enter print(e), and you will have a vector
of the same length as b with a sequence of TRUE’s and FALSE’s depending on
whether the expression is true for each element in the vector. Those elements
can now be selected by entering b[e]. Alternatively, you could simply enter
b[b>6].
4. Often when data are collected, there are situations where some values are missing.
For example, the depths of some historical earthquakes are often missing. In the R
language, missing values (in numeric objects) are coded as NA without quotes. For
example, say we have four earthquakes, the first three have depths (km): 31, 150,
and 2, but the fourth is missing. These data would be assigned to the variable
depth as:
depth <- c(31, 150, 2, NA)
Any arithmetic operations that are performed on a missing value will give a missing
value, for example, try 2*depth.
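Missing values can also be detected and excluded in computations; a minimal sketch using the standard R function is.na and the na.rm argument:

```r
depth <- c(31, 150, 2, NA)
print(2*depth)                  # arithmetic propagates NA: 62 300 4 NA
print(is.na(depth))             # identify the missing values
print(mean(depth, na.rm=TRUE))  # exclude NAs: mean of 31, 150 and 2 is 61
```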
1.4 Matrices
1. One way to construct matrices (there are many ways) is as follows:
x <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), byrow=FALSE, ncol=2)
print(x)
This gives
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
Like c, matrix is also a function. The statement above says to make a matrix with 2 columns, called x, and with elements 1, 2, ..., 10. The argument byrow=FALSE means that the matrix is loaded column by column.
Similarly,
y <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), byrow=FALSE, ncol=5)
print(y)
gives

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10

2. Element-by-element multiplication is performed with the * symbol; for example, y*y squares each element of y. The same result would be given by entering y^2 or y**2. However, x*y will produce an error, since the two matrices have different dimensions.
3. Individual elements of the matrix can be selected, for example, entering x[4, 2]
gives 9. As for vectors, one could also index using a logical matrix of the same
dimensions. For example, enter:
z <- (x < 6)
The matrix z is a logical matrix with the same dimensions as x, containing either
TRUE’s or FALSE’s depending on whether xij < 6.
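A logical matrix of this kind can itself be used as an index; a brief sketch:

```r
x <- matrix(1:10, byrow=FALSE, ncol=2)
z <- (x < 6)
print(x[z])   # extracts the elements of x that are less than 6: 1 2 3 4 5
x[z] <- 0     # logical indexing can also be used for replacement
print(x)
```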
4. Matrix multiplication is achieved with the symbol %*%; hence x %*% y gives a 5 × 5 matrix (x is 5 × 2 and y is 2 × 5).
5. Character matrices (or vectors) can also be defined in the same manner, for example:

colours <- matrix(c("red", "blue", "green", "cyan", "yellow", "magenta"),
                  byrow=TRUE, nrow=2)

This gives:

     [,1]   [,2]     [,3]
[1,] "red"  "blue"   "green"
[2,] "cyan" "yellow" "magenta"
1.5 Mode of an Object

4. Recall that c is the concatenation function, thus entering mode(c) gives "function".
1.6 Function Objects

3. We can view the internal commands within a function by entering its name (without brackets) on the command line; for example, enter matrix on the command line.
5. Trivial examples are the list function (ls) and the quit function (q). Recall that one quits the R session by executing the quit function, i.e. q() (see §1.2). One could also view the function code by simply entering q.
6. There are in fact many functions within the R system: for performing mathematical operations, fitting statistical models, and drawing graphs. For example, the object y %*% x represents a 2 × 2 matrix. The function solve(y %*% x) will invert the matrix, and eigen(y %*% x) will calculate the eigenvalues and eigenvectors of the 2 × 2 matrix y %*% x. Note that the output from the eigen function is a list object (see §3.1), containing a vector of eigenvalues and a matrix of eigenvectors.
1.7 Help Documentation

2. Many help pages have a set of examples at the bottom that can be easily run. For example, select the help information for eigen. The first example is:
eigen(cbind(c(1,-1),c(-1,1)))
which will calculate the eigenvalues and eigenvectors of the 2 by 2 matrix with ones on the main diagonal and negative ones on the minor diagonal. The code can be executed by highlighting it in the web browser, and pasting it onto the R command line.
3. Documentation for each of the functions cited in this document can be found in
the web browser help window.
1.8 Writing Functions

1. First check that the proposed function name is not already in use: enter test on the command line. It will probably say that the object does not exist, and hence we can use it as our function name.
2. Say we have a vector whose elements we want to multiply by 2 and add 1. We want to pass this vector into the function, and pass the resulting vector out. This is done as follows:
test <- function(invector){
outvector <- 2*invector + 1
return(outvector)
}
To execute the function using vector a in §1.3, enter test(a) on the command line.
Complicated functions can be written using both logical (Boolean) and looping constructions. If arithmetic operations are applied to a logical vector, the elements will be treated as zeros (FALSE) and ones (TRUE).
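This treatment of logical values is convenient for counting; a sketch using the vector b from §1.3:

```r
b <- c(2, 4, 6, 8, 10)
print(sum(b > 6))    # number of elements greater than 6: 2
print(mean(b > 6))   # proportion of such elements: 0.4
```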
3. It is possible to call both FORTRAN and C++ code from within a function (see
§2.7).
1.9 Graphs
In this section, a very brief outline of graphical methods in R is given.
2. Having opened an appropriate graphics device, one often wants to change various
parameters, for example: the number of graphs on the page, the axis layout,
available colours, font types, etc. Various options can be selected by using the par
function.
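For example, a sketch placing two graphs on one page (mfrow is a standard par argument):

```r
par(mfrow=c(2, 1))               # 2 rows and 1 column of plots per page
hist(rnorm(100))                 # first graph: histogram of random normals
plot(1:10, (1:10)^2, type="l")   # second graph, drawn below the first
par(mfrow=c(1, 1))               # restore one plot per page
```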
3. There are many functions to do various types of graphs. The most common are
plot, hist, curve, and barplot. For example, say we wanted to plot the cubic function f(x) = x(x − 3)(x + 1) on the interval (−1.5, 3.5). This can be done by entering:
x <- seq(-1.5, 3.5, 0.01)
f <- x*(x-3)*(x+1)
plot(x, f, type="l")
If no graphics device is open, then in Linux (or UNIX) R will usually open an X11 window automatically, and in Microsoft Windows a windows graphics window.
Chapter 2

Input-Output Methods

2.1 Reading Data from a Text File

1. Assume that the data are stored in a file called "events.dat" in the format below.
2. These can be read into a list object by using the scan function:

NZ1 <- scan("events.dat", what=list(latitude=0, longitude=0,
            event="", depth=0, magnitude=0, day=0, month=0,
            year=0, hour=0, minute=0, second=0))

Note that in the above use of scan, the blank character denotes the break between fields. This is why the underscore has been used where a blank would normally occur in the event name. An alternative method is to use commas, or some other character, as the delimiter.
3. The object NZ1 will be a list object, try the function is.list(NZ1). When we
print the object, i.e. print(NZ1), it prints as a list. Lists are discussed further in
§3.1.
5. Now assume that the data stored in the file "events.dat" do not contain the underscores, and some values are missing (unfortunately, often indicated by blanks), e.g. the depth for Westport.
6. The use of a separator as above will not work here. In this situation, one reads each complete line (record) into one character variable; the argument sep="\n" indicates that each value is terminated by the end of a record. One then picks off the substrings relating to the individual variables, and the numeric variables must be "coerced" from character to numeric:
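The statements themselves are not reproduced above; a sketch of the approach, assuming illustrative (hypothetical) column positions for each field:

```r
# read each complete line (record) into one character string
lines <- scan("events.dat", what="", sep="\n")
# pick off substrings by position (the positions used here are hypothetical)
latitude <- as.numeric(substring(lines, 1, 6))
event    <- substring(lines, 15, 31)
depth    <- as.numeric(substring(lines, 32, 34))  # blank fields coerce to NA
```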
2.2 Including an R Program Source File

1. The text file can be written with any text editor, e.g. emacs, gedit, or notepad. Create a file with the name "test.R".
2. Enter the required programming code into the file. For example:
a <- c(1, 2, 3, 4, 5)
b <- c(2, 4, 6, 8, 10)
# print the product of a times b
print(a*b)
Note that the hash character (#) starts a comment. The comment remains in effect until the next hard return (line feed), i.e. to the end of the line.
4. The commands within the text file can now be executed in R by typing
source("test.R")
2.3 Writing R Objects to a Text File

1. By entering

dump("matrix", file="temp.R")

the code for the function matrix will be written into the text file "temp.R". This code could be edited and included back into R by using the source function (see §2.2).
2.4 Writing Program Output to a Text File

Data objects that are required for use in programs outside of R will need to be written to a text file. Some possibly useful functions are sink, print, and cat.
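A sketch of diverting output with sink (the file name results.txt is illustrative):

```r
sink("results.txt")              # divert subsequent output to a text file
a <- c(1, 2, 3, 4, 5)
print(a*2)
cat("sum of a:", sum(a), "\n")   # cat gives finer control over formatting
sink()                           # restore output to the console
```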
2.5 Saving R Objects for Use in a Subsequent Session

For example, entering

a <- 10
b <- 15
d <- 21
save(a, b, file="temp.Rda")
will save the objects a and b only in an R format in the file “temp.Rda”. To save all
current objects, run:
save(list=ls(), file="temp.Rda")
2.8 Running Jobs in Batch Mode

A program can be run in batch mode by entering, on the shell command line, R CMD BATCH infile outfile. The output that would normally be written to the VDU (screen) in an interactive session will now be written to outfile. For more information, enter
R CMD BATCH --help
Chapter 3

More Advanced Data Structures

3.1 Lists

1. The objects a, x, and colours created in Chapter 1 can be combined into a single list object by entering data <- list(a, x, colours). Printing the object, i.e. print(data), gives (first component):

[[1]]:
[1] 1 2 3 4 5
[[2]]:
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
[[3]]:
[,1] [,2] [,3]
[1,] "red" "blue" "green"
[2,] "cyan" "yellow" "magenta"
Notice that the matrix x is referred to as [[2]] within the list object data. Enter
data[[2]] on the command line to get x. The element in the 5th row and 2nd
column of x can be extracted as data[[2]][5,2].
3. If we want to retain their original names (or allocate new names), enter

data <- list(a=a, x=x, colours=colours)
Now x can be retrieved from the object data by entering either data[[2]] or
data$x.
4. Enter mode(data) to see that R recognises the object data as a list object (§1.5).
Enter names(data) to see the variable names within the list object called data.
5. Each part of the list will have its own mode (§1.5), e.g. enter mode(data$colours) and mode(data$x).
3.2 Factors
A factor is essentially a coded variable, usually a character variable.
1. Consider the dataset in §2.1. For simplicity, assume that the file “events.dat”
contains the data in the following format:
-41.76,172.04,Westport,12,6.7,23,05,1968,17,24,17.4
-34.94,179.30,Kermadec Trench,297,6.8,08,01,1970,17,12,36.6
-39.13,175.18,National Park,173,7.0,05,01,1973,13,54,27.6
-41.61,173.65,Marlborough,84,6.7,27,05,1992,22,30,36.1
-45.21,166.71,Secretary Island,5,6.7,10,08,1993,00,51,51.6
-43.01,171.46,Arthurs Pass,11,6.7,18,06,1994,03,25,15.2
-37.65,179.49,East Cape,12,7.0,05,02,1995,22,51,02.3
2. These data can be read into a list object by using the following statement:
NZ1 <- scan("events.dat", what=list(latitude=0, longitude=0,
event="", depth=0, magnitude=0, day=0, month=0,
year=0, hour=0, minute=0, second=0), sep=",")
3. Now assume that we want to divide the depth into two categories, "deep" and
"shallow". This can be achieved as:
NZ1$depth.cat <- c("deep", "shallow")[(NZ1$depth<40)+1]
print(NZ1$depth.cat)
print(is.character(NZ1$depth.cat))
Note that the "data" are stored as the character strings "deep" and "shallow", which would be inefficient for a large dataset.
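The indexing trick used above can be seen in isolation; a sketch with a small depth vector:

```r
depth <- c(12, 297, 173, 84)
# (depth < 40) gives TRUE/FALSE; adding 1 gives indices 2 and 1, respectively
print((depth < 40) + 1)                        # 2 1 1 1
print(c("deep", "shallow")[(depth < 40) + 1])  # "shallow" "deep" "deep" "deep"
```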
4. A factor stores this information by recording the character "levels", here being "deep" and "shallow"; the data vector is then a numeric variable of ones and twos depending on whether the particular value is "deep" or "shallow", respectively. The variable NZ1$depth.cat can be transformed into a factor as follows:
NZ1$depth.cat <- as.factor(NZ1$depth.cat)
5. Observe that the variable is printed with the values of deep and shallow. The
levels can be extracted using the levels function.
print(NZ1$depth.cat)
print(levels(NZ1$depth.cat))
print(as.numeric(NZ1$depth.cat))
3.3 Attributes
1. A variable can have a number of attributes. These are characteristics of the
variable, and some determine the manner in which the variable is printed, and so
on.
For example, enter attributes(NZ1$depth.cat). Note that there are two attributes, levels and class, as follows:
$levels
[1] "deep" "shallow"
$class
[1] "factor"
The levels were discussed in §3.2. The “class” attribute is a central concept in
the R language, and determines the manner in which other functions interact with
this variable. This is discussed further in §3.4.
The colours object has only one attribute, dim, being the dimensions of the matrix. Note that row and column names, when a matrix has them, are also stored as attributes.
5. We can also attach our own attributes using the attr function. Again, consider the data in §3.2. The magnitudes are on a "local" scale. This information could be attached to the variable NZ1$magnitude with a statement of the following form (the attribute name magtype and value "local" are illustrative; any name not expected by other functions may be used):

attr(NZ1$magnitude, "magtype") <- "local"
2. Note that whenever an object name is entered on the command line followed by
Enter, for example NZ1$depth.cat, it is interpreted as print(NZ1$depth.cat).
4. Since class(NZ1$depth.cat) is "factor", the print function looks for another function called print.factor, and since there is such a function, it issues the command print.factor(NZ1$depth.cat). It is this function that causes NZ1$depth.cat to be printed using the values deep and shallow rather than ones and twos.
6. The function print.default will print the stored data without reference to the
class of the object. For example,
print.default(NZ1$depth.cat)
will simply give a vector of ones and twos, because this is how the data are stored.
7. It is possible to define our own generic functions and the associated methods.
These ideas are quite important and are used in the Statistical Seismology Library,
particularly to store the earthquake catalogues. One can also add methods for
system supplied generic functions.
8. Objects can have multiple classes. For example, the earthquake catalogues, which
will be discussed in §5.1, have classes "catalogue" and "data.frame". Hence a
generic function will first search for the appropriate method for "catalogue". If
this is not found, it will search for an appropriate method for "data.frame". If
there is no method found here either, it will simply use the default method for the
generic operation.
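Adding one's own method for a system generic can be sketched as follows (the class name quake and its components are hypothetical, not part of SSLib):

```r
# a print method that is dispatched for objects of class "quake"
print.quake <- function(x, ...) {
    cat("Event of magnitude", x$magnitude, "at depth", x$depth, "km\n")
    invisible(x)
}
ev <- list(magnitude=6.7, depth=12)
class(ev) <- "quake"
print(ev)   # dispatches to print.quake; with no method it would print as a list
```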
3.5 Data Frame

2. Now force the data into a matrix by using the column bind function:
NZ2 <- NULL
for (i in 1:length(NZ1)) NZ2 <- cbind(NZ2, NZ1[[i]])
print(NZ2)
This has part of the desired effect, in that each earthquake event is now represented as a row in the matrix, and each variable as a column. However, since a matrix has the same mode (see §1.5) for all elements, and one variable in NZ1 is character, all elements are transformed to character values.
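The loop above can be written more compactly by passing all list components to cbind at once; a sketch with a small list:

```r
NZ1 <- list(latitude=c(-41.76, -34.94), depth=c(12, 297),
            event=c("Westport", "Kermadec Trench"))
NZ2 <- do.call(cbind, NZ1)
print(NZ2)   # a character matrix, since one component is character
```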
3. The standard way to turn the object NZ1 into a data frame is with the following statement: NZ3 <- as.data.frame(NZ1). The class of NZ3 is now "data.frame"; hence, when we print NZ3, the print function uses the function print.data.frame. This function causes the object to be printed like a matrix, even though it is a list.
6. As for a list, each variable can have different modes and attributes:
print(mode(NZ3$latitude))
print(mode(NZ3$longitude))
print(mode(NZ3$event))
Notice that NZ3$event is also stored as a numeric variable! In fact, since it was character in the original data, as.data.frame turns it into a factor (see §3.2), hence it still prints "like" a character variable.
7. In the following, we again read the data from the text file and turn it into a
data frame. However, the I function attaches a class of "AsIs", which tells the
as.data.frame function not to turn NZ1$event into a factor. We also create a
depth category factor, and attach an attribute to the magnitude. All of these
characteristics can be stored within a data frame object.
NZ1 <- scan("events.dat", what=list(latitude=0, longitude=0,
event="", depth=0, magnitude=0, day=0, month=0,
year=0, hour=0, minute=0, second=0), sep=",")
NZ1$event <- I(NZ1$event)
NZ1 <- as.data.frame(NZ1)
print(NZ1)
print(mode(NZ1$event))
print(attributes(NZ1$event))
print(attributes(NZ1$magnitude))
Chapter 4

Statistical Seismology Library (SSLib)
4.1 Introduction
The Statistical Seismology Library is a collection of R packages or libraries for the
analysis of seismological data. It consists of the individual packages: ssBase, ssEDA,
PtProcess, ssM8, Fractal, and various packages containing earthquake catalogues.
The packages can be attached individually, or collectively as shown below.
The ssBase package contains common functions utilised by more than one of the
SSLib packages.
The word “Library” in the name of SSLib is historical. SSLib was originally writ-
ten for S-PLUS, in which such add-on software was known as a library, and used the
library function to attach it. Now R distinguishes between a “library”, which is a
directory containing installed packages, and a package, which is a named component of
a library. However, the library function is still used to make a package available. In
R nomenclature, SSLib should be a “bundle”, although as yet it is not distributed as
such.
Lay & Wallace (1995) is a good general seismology text and will provide descriptions
of much of the seismological terminology used.
1. To attach SSLib, enter

library(sslib)

on the command line. Note that R is case dependent. Something like the following should appear on your screen.
> library(sslib)
Loading required package: ssBase
Loading required package: chron
Loading required package: ssEDA
Loading required package: maps
Loading required package: ssNZ
Loading required package: Fractal
Loading required package: ssM8
Loading required package: PtProcess
>
2. Note that SSLib is made up of a number of packages, as above. The specific package sslib, loaded by calling library(sslib) as above, contains nothing itself, but requires the collection of packages that make up SSLib. Thus the package sslib simply provides a convenient method to load all parts of SSLib with one command. We refer to the above collection of packages as the Statistical Seismology Library (SSLib).
3. SSLib largely contains two types of R objects: earthquake catalogues and functions. Documentation for each can be found by entering help.start(). At the end of most function documentation, there is a set of examples that can be executed.
5. If one always requires a particular catalogue to be loaded, one can modify the
sslib package to do this (see §12.1).
Chapter 5

The SSLib Base Package (ssBase)
A listing of the main functions in the ssBase package can be found in Appendix A.2,
and detailed documentation for all functions can be found in Harte (2003c).
latitude (numeric) is the number of degrees north of the equator (positive) or south of
the equator (negative).
longitude (numeric) is the number of degrees east of Greenwich (i.e. between 0° and 360°). Note that events in the hemisphere west of the meridian through Greenwich are not represented with negative longitudes. This ensures that the discontinuity occurs at a longitude of 0° rather than at 180°.
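Longitudes supplied in the −180 to 180 convention can be mapped into this 0 to 360 convention with modular arithmetic; a sketch:

```r
longitude <- c(174.8, -77.0, -179.5)
print(longitude %% 360)   # gives 174.8 283.0 180.5
```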
time (numeric) with class "datetimes" being the number of days (and fractions) from
midnight on 1 January 1970. While the data are stored as the number of days
since 1 January 1970, the class of "datetimes" causes the data to be printed
in the format "ddmmmyyyy hh:mm:ss.s", where the number of decimal places for
seconds is defined as an attribute of the time variable.
You can add any other variables to the catalogue in addition to those listed above.
Catalogues are stored as list objects (see §3.1). They have two classes (see §3.4):
"catalogue" and "data.frame". This causes generic functions to first search for a
method (see §3.4) for "catalogue", and if one is not found, then use the method for
"data.frame". The catalogue also has an attribute called catname, being a character
string containing the name of the catalogue. This ensures that the object does not
“forget” its origin when passed in and out of other functions.
2. The datetimes function can be used to create the time variable. This function
is contained in the SSLib package ssBase, which needs to be loaded first if not
already done. For example, enter
library(sslib)
x <- datetimes(NZ1$year, NZ1$month, NZ1$day, NZ1$hour, NZ1$minute,
NZ1$second, dp.second=1)
print(x)
The argument dp.second=1 ensures that seconds are printed to one decimal place, matching the number of decimal places read from the text file. More information about the datetimes function can be found in the help documentation.
3. The function as.catalogue will not only calculate the time variable, but also
attach the necessary attributes to the catalogue object. This is run as follows:
as.catalogue(NZ1, catname="NZ1", dp.second=1)
print(NZ1)
Note that there is no assignment arrow to the left of the function call. The
assignment is done internally within the function.
4. Now print these various characteristics of the catalogue NZ1:
print(names(NZ1))
print(attributes(NZ1))
print(class(NZ1))
Note that an extra variable called missing.time has been added, while the variables year, month, day, hour, minute, and second have been deleted (see §5.3 for more explanation). Note that both classes have been added, and the rows have been sequentially numbered. Lastly, note that the catalogue name has been added as an attribute.
5. You can also add your own attributes to the catalogue objects and variables within
the object (e.g. magnitude), but do not use the same attribute names expected
by other functions. For example,
attr(NZ1, "note") <- "Event solutions determined using velocity model A"

5.3 The Time Variable

1. Continuing the example in §5.2, enter print(NZ1). Notice that the time variable
is formatted with a date and time component. As noted above, the six original
date and time variables have been deleted. These data can easily be recalculated
from the time variable as follows:
print(years1(NZ1$time))
print(months1(NZ1$time))
print(days1(NZ1$time))
print(hrs.mins.secs(NZ1$time))
3. The time variable in a catalogue has class "datetimes", however, the variable is
numeric, being the number of days from a defined origin, by default 1 Jan 1970.
For example, enter:
print(attributes(NZ1$time))
print(is.numeric(NZ1$time))
Even though the dates are stored as the number of days from 1 Jan 1970, they are
printed in the format DDMMMYYYY hh:mm:ss.s because of the class "datetimes"
(see §3.4). The function print.default will print the data in the manner in which
it is stored, for example, try:
print(NZ1$time)
print.default(NZ1$time)
4. In modelling data, one often needs the flexibility to change the time origin, but
still requires the dates to be printed correctly. For example, we may want the
origin to be 1 Jan 1990. This can be done as follows:
NZ1$time <- NZ1$time - julian(1, 1, 1990, origin=attr(NZ1$time, "origin"))
attr(NZ1$time, "origin") <- c(month=1, day=1, year=1990)
print(NZ1$time)
The first statement is subtracting the number of days between the original origin
and 1 Jan 1990. The second statement updates the origin attribute so that the
dates are calculated (displayed) correctly.
1. The object a (the output of the summary function applied to the catalogue, e.g. a <- summary(NZ1)) is a list object. Enter names(a) to see the variables that it contains. You can extract the individual components in the usual way, for example a$missing.time.
2. The object a also contains information about the spatial and temporal range of
the catalogue, for example, enter print(a$ranges) and print(a$time.range).
3. Since the catalogue is also a data frame, it can be treated like a matrix. In
particular, see the discussion in §3.5(8).
5.5 Subsetting Catalogues and Subcatalogues

2. Usually one wants to analyse fairly small parts of earthquake catalogues. There are four functions provided to subset catalogues: subsetcircle, subsetpolygon, subsetsphere and subsetrect. Further information can be found about each within the web browser help window.
3. Select the help page for subsetcircle, and highlight the following statements
that can be found in the Examples section:
data(NZ55)
a <- subsetcircle(NZ55, centrelat=-41.3, centrelong=174.8,
maxradius=100)
print(summary(a))
The new catalogue called NZ7 will have the standard catalogue format, and so can
be treated like any other catalogue. For example, to select from the NZ7 catalogue
all events between 12:30:00 hrs on 2 Jan 1990 until 00:00:00 hrs on 31 July 1990
enter:
b <- subsetrect(NZ7, minday=julian(1,2,1990)+12/24+30/(24*60),
maxday=julian(7,31,1990))
summary(b)
Note that the julian function unfortunately uses the American ordering for the date, i.e. month, day, year. The parameter ordering can be overridden by explicitly stating the parameter names, e.g. julian(d=1, x=1, y=1970), with x meaning month.
Chapter 6

Exploratory Data Analysis (ssEDA)
In this section we use catalogue data to demonstrate some of the graphical routines. A
listing of the main functions in the ssEDA package can be found in Appendix A.3, and
detailed documentation for all functions can be found in Harte (2003d).
1. The required events are found in the New Zealand catalogue. We must also attach the ssEDA package, if it has not already been attached, as follows:
library(ssNZ)
library(ssEDA)
2. Initially select the required events from the New Zealand catalogue. We then make
a new temporary catalogue called “EastCape”. This catalogue can be treated like
any other earthquake catalogue in SSLib, though will be deleted at the end of this
R session if not saved.
as.catalogue(a, catname="EastCape")
3. We next use the epicentres function to draw the epicentral plot. However, this function requires an object of class "subset" (i.e. output from either subsetcircle, subsetpolygon, subsetrect, or subsetsphere). Since we want all events in the "EastCape" catalogue, we include no restrictions in the subsetrect function call below:
a <- subsetrect(EastCape)
epicentres(a, criteria=FALSE,
magnitude=c(4,5,6,6.9,7.1), cex=c(0.5,1,3,5),
usr=c(177.5, 181, -39, -36.5))
4. The magnitude and cex arguments tell the function to represent larger earthquake events with larger symbols (see the help documentation for epicentres for more details).
5. Notice that the map of the East Cape of New Zealand in the plot is terrible. This
is because, by default, it is using a world map of low resolution ("world2"). Also within the maps package is a low resolution map of NZ ("nz"), which will nevertheless provide higher resolution in the East Cape area. The required map is specified with the mapname argument. Redo the plot as follows:
epicentres(a, criteria=FALSE,
magnitude=c(4,5,6,6.9,7.1), cex=c(0.5,1,3,5),
usr=c(177.5, 181, -39, -36.5), mapname="nz")
6. The mapdata package contains high resolution maps. If this is installed on your
system, try:
library(mapdata)
epicentres(a, criteria=FALSE,
magnitude=c(4,5,6,6.9,7.1), cex=c(0.5,1,3,5),
usr=c(177.5, 181, -39, -36.5), mapname="nzHires")
7. We can easily enhance the plot by adding a title and various place names:
epicentres(a, criteria=FALSE,
magnitude=c(4,5,6,6.9,7.1), cex=c(0.5,1,3,5),
usr=c(177.5, 181, -39, -36.5), mapname="nz")
title(main="East Cape (NZ) Event", cex.main=1.8, font.main=1)
1. Some significant NZ events are listed below. Copy these into a file called “events.dat”.
-41.76,172.04,Westport,12,6.7,23,05,1968,17,24,17.4
-34.94,179.30,Kermadec Trench,297,6.8,08,01,1970,17,12,36.6
-39.13,175.18,National Park,173,7.0,05,01,1973,13,54,27.6
-44.67,167.38,,12,6.5,04,05,1976,13,56,29.2
-46.70,166.03,,12,6.5,12,10,1979,10,25,22.1
-37.89,176.80,Edgecumbe,10,6.1,02,03,1987,01,42,35.0
-40.43,176.47,Weber,30,6.2,13,05,1990,04,23,10.2
-41.61,173.65,Marlborough,84,6.7,27,05,1992,22,30,36.1
-45.21,166.71,Secretary Island,5,6.7,10,08,1993,00,51,51.6
-43.01,171.46,Arthurs Pass,11,6.7,18,06,1994,03,25,15.2
-37.65,179.49,East Cape,12,7.0,05,02,1995,22,51,02.3
2. Attach the ssEDA library, read the events, and create a catalogue (discussed in
§5.2) called “NZ1”:
library(ssEDA)
NZ1 <- scan("events.dat", what=list(latitude=0, longitude=0,
event="", depth=0, magnitude=0, day=0, month=0,
year=0, hour=0, minute=0, second=0), sep=",")
as.catalogue(NZ1, catname="NZ1", dp.second=1)
print(NZ1)
The map data is from the low resolution version of the world map (the default).
4. A better-looking map can be drawn by using the low resolution version of the NZ
map, which is also contained in the maps package.
epicentres(a, cex=1, usr=c(165, 180, -48, -34), mapname="nz")
6. Now remove subsetting criteria shown at the bottom of the plot, use a symbol
colour that represents the depth, and a symbol size that represents the magnitude
of the event. Also annotate with some event names:
1. The required events can be found in the PDE catalogue. Load the library containing
this catalogue, along with ssEDA:
library(ssPDE)
library(ssEDA)
2. Extract the events with magnitude ≥ 5 from the PDE catalogue and plot:
b <- subsetrect(PDE, minlong=90, maxlong=300, minlat=-80,
maxlat=80, minmag=5, minday=julian(1,1,1990),
maxday=julian(1,1,1993))
2. Extract events with ML ≥ 2, with a maximum depth of 200 km, between 1 January
1978 and 31 December 1991 from the Wellington Catalogue:
b <- subsetrect(Wellington, minmag=2, minday=julian(1,1,1978),
maxday=julian(1,1,1992), maxdepth=200)
3. To view the subduction boundary, enter threeD(b). An XGobi window will appear;
maximise it to fill the screen and also move the lines to enlarge the plotting
area. The depth will be on the vertical axis, with either longitude or latitude on
the horizontal axis. Pull down the menu at the top called View: XYPlot, and select
Rotation (r). Then click Pause so that it rotates (if it is not already rotating).
You can stop it by clicking Pause again. The speed of the rotation can be changed
by dragging the bar in the slider window beneath the File and View menus.
The displayed viewing perspective has events with zero depth at the top of the
picture, and deepest events at the bottom. The “clocks” on the right describe
what is happening to each of the three spatial variables. One can easily see the
subducting plate slab. The lines at depth values of 5 km and 12 km are from poorly
determined events.
You can also move the points in whatever direction you like. Stop the rotation
by clicking Pause. Then put the mouse cursor onto the plot and, holding down
the first mouse button, drag the points. If you lose your orientation and get lost,
click the Reinit button; this will reinitialise the picture to its original orientation.
Quit the XGobi window by pulling down the File menu, then selecting Quit.
4. A high resolution plot can be produced using the function rotation. Viewing
roughly towards the north-east (essentially along the direction of the main mountain
range) brings the plate boundary approximately into alignment with the viewing
direction. In particular, a rotation of −40◦ (from north) is specified as follows:
rotation(b, theta=-40)
title(main="Plate Subduction in Wellington Region")
5. An epicentral plot can also display the subduction process. This is achieved by
plotting deeper events in colours at the blue end of the spectrum, through to
shallow events at the red end. The size of the plotting symbol represents the
magnitude of the event. Note that the usr argument specifies the axis limits of the plot.
epicentres(b, usr=c(173.55, 176.05, -42.13, -40.47),
depth=c(0, 30, 50, 70, 100, Inf), criteria=FALSE,
magnitude=c(2, 3, 4, 5, 6, Inf), mapname="nz")
title(main="Plate Subduction in Wellington Region", font.main=1)
6. A high resolution map can be drawn if the package mapdata is installed on your
system, as follows:
library(mapdata)
epicentres(b, usr=c(173.55, 176.05, -42.13, -40.47),
depth=c(0, 30, 50, 70, 100, Inf), criteria=FALSE,
magnitude=c(2, 3, 4, 5, 6, Inf), mapname="nzHires")
par(mfrow=c(2,1))
a <- subsetrect(NZ, minday=julian(1,1,1965), maxday=julian(1,1,1995),
mindepth=0, maxdepth=39.99, minmag=4)
depth.hist(a)
title(main="Depth Distribution: Shallow Events")
Note the large peaks at 5 km, 12 km and 33 km. These are generally events with
poorly determined locations.
2. If one requires more control over the way the histogram is drawn, it is easier
to create a temporary subcatalogue as follows, and then call the hist function
directly.
par(mfrow=c(1,1))
a <- subsetrect(NZ, minday=julian(1,1,1965), maxday=julian(1,1,1995),
mindepth=0, maxdepth=39.99, minmag=4)
as.catalogue(a, catname="temp")
par(mfrow=c(1,1))
b <- subsetrect(NZ, minday=julian(1,1,1964), maxday=julian(1,1,1993),
mindepth=40, maxdepth=120, minmag=4)
freq.magnitude(b)
Note that the vertical axis label includes a subscript. Other mathematical symbols
(including Greek characters) can easily be included on plots; see the topic
plotmath in the help documentation.
2. The freq.magnitude function also calculates the b-value (i.e. the slope of the
line), and returns this value if it is assigned to another object as follows:
bvalue <- freq.magnitude(b)
print(bvalue)
Note that, by default, the b-value is estimated using the maximum likelihood
estimator. See the help documentation for freq.magnitude for further details.
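The maximum likelihood estimator referred to is presumably the classical Aki (1965) form, b̂ = log10(e)/(M̄ − Mmin); the helper below (bvalue_ml, a hypothetical name) sketches that estimator and is not the ssEDA implementation:

```r
# Aki (1965) maximum likelihood b-value estimator (sketch)
bvalue_ml <- function(magnitudes, m_min) {
  log10(exp(1)) / (mean(magnitudes) - m_min)
}

# Check against simulated magnitudes: exponential excesses above m_min with
# rate log(10) correspond to a true b-value of 1
set.seed(1)
m <- 4 + rexp(10000, rate = log(10))
bhat <- bvalue_ml(m, m_min = 4)
print(bhat)
```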
par(mfrow=c(1,1))
b <- subsetrect(NZ, minday=julian(1,1,1961), maxday=julian(1,1,2001),
minmag=4)
timeplot(b)
title(main=expression(paste("Events in NZ Catalogue with", M[L] >= 4)))
Note that the title includes a subscript and a “≥” sign. Other mathematical symbols
(including Greek characters) can easily be included on plots; see the topic plotmath in
the help documentation.
Chapter 7
Point Process Modelling (PtProcess)
A listing of the main functions in the PtProcess package can be found in Appendix
A.4, and detailed documentation for all functions can be found in Harte (2003b). Some
further mathematical details about the model formulation used by the PtProcess
package, and various relationships, can be found in Appendix C. A general text on point
process modelling is provided by Daley & Vere-Jones (2003).
as.catalogue(a, catname="x")
2. The magnitudes would be transformed so that they represent the number of
magnitude units above M0 . Assuming that M0 = 5, then:
x$magnitude <- x$magnitude - 5
40 CHAPTER 7. POINT PROCESS MODELLING (PTPROCESS)
3. The times are stored as the number of days (and fractions thereof) from some origin,
usually 1 January 1970, though not necessarily. The object x$time has class "datetimes",
which causes the dates to be printed in the usual way, i.e. day, month, year, etc.
To print the dates in the standard format, use print(x$time); the attributes
are listed using print(attributes(x$time)); note the origin. Further, the
number of days from the origin is printed with print(as.vector(x$time)).
4. The times are transformed to the origin required by the point process model.
Assume that this is midnight on 1 January 1965. We want to ensure that the
julian function uses the same origin as that used in x$time. The julian function
below calculates the number of days from the current origin to 1 January 1965.
x$time <- x$time - julian(1,1,1965, origin=attr(x$time, "origin"))
attr(x$time, "origin") <- c(month=1, day=1, year=1965)
By resetting the origin attribute on the time variable, the times will still be printed
correctly; for example, check by printing the first one hundred events, i.e.
print(x[(1:100),])
Alternatively, the attributes can be stripped from x$time so that it will always
be printed as the number of days from 1 January 1965. This is done as:
x$time <- as.vector(x$time).
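The origin arithmetic can be mimicked with base R's Date class (this is an analogy only; SSLib's "datetimes" class and its julian function are not used here):

```r
# Event time held as days since 1 Jan 1970, re-expressed as days since
# 1 Jan 1965 by subtracting the (negative) offset between the two origins
days_since_1970 <- as.numeric(as.Date("1987-03-02") - as.Date("1970-01-01"))
offset <- as.numeric(as.Date("1965-01-01") - as.Date("1970-01-01"))
days_since_1965 <- days_since_1970 - offset
print(c(offset = offset, days_since_1965 = days_since_1965))
```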
2. Below we fit the stress release model, using the conditional intensity function
srm.cif, to events in the NthChina dataset. This conditional intensity function
is defined in the PtProcess manual (see Harte, 2003b, for further details). These
data are stored as part of the PtProcess package. Read the events by entering:
library(PtProcess)
data(NthChina)
on the R command line. The PtProcess package manual (Harte, 2003b) contains
more information about the North China data. Note that here the time
variable has already been scaled to represent the number of years from 1480 AD,
and the magnitude is the number of magnitude units greater than 6. Hence there
is no need for the adjustments discussed in §7.1 above.
3. The data span 517 years from 1480 AD. Set up a time interval variable TT as
follows:
TT <- c(0, 517)
7.3 Unconstrained Maximum Likelihood Estimation
Six has been added to the magnitude so that the correct unadjusted value is
plotted.
5. Now we plot the conditional intensity on the interval TT. The stress release model
contains parameters a, b, and c, which are specified in the params vector below in
that order. Initially we simply use the values below, which turn out to be a
good approximation to the required values:
params <- c(-2, 0.01, 1)
ti <- seq(TT[1], TT[2], 0.5)
y <- srm.cif(NthChina, ti, params)
plot(ti, y, type="l", xlab="Years Since 1480 AD",
ylab=expression(lambda(t)))
6. The integral of λ(t) on the interval (0, 517) can be calculated and placed into y by
entering
y <- srm.cif(NthChina, NULL, params, TT=TT)
7. For a given matrix containing the history of the process, a specific conditional
intensity function, the interval of evaluation TT, and the corresponding vector of
parameter values params, we can calculate the log-likelihood as
y <- pp.LL(NthChina, srm.cif, params, TT)
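The quantity that pp.LL computes is the point process log-likelihood, the sum of log λ(ti) over events in [T1, T2] minus the integral of λ over [T1, T2] (see Appendix C). A minimal sketch for a homogeneous Poisson process, where the integral has a closed form (pp_loglik_poisson is a hypothetical helper, not part of PtProcess):

```r
# Log-likelihood of a homogeneous Poisson process with constant rate lambda:
# sum of log(lambda) over events inside [T1, T2], minus lambda * (T2 - T1)
pp_loglik_poisson <- function(times, lambda, TT) {
  inside <- times[times >= TT[1] & times <= TT[2]]
  length(inside) * log(lambda) - lambda * (TT[2] - TT[1])
}
ll <- pp_loglik_poisson(times = c(10, 250, 400), lambda = 0.01, TT = c(0, 500))
print(ll)
```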
1. Begin by attaching the North China dataset and creating a time interval variable
TT:
library(PtProcess)
data(NthChina)
TT <- c(0, 517)
2. To find the values of the parameters at which the log-likelihood function is maximised,
we minimise the negative of the log-likelihood using the minimiser nlm. However,
nlm requires a function that has only the free parameters as arguments. Further,
it may also be the case that we want to constrain the parameters in some manner.
We do this by adding priors to the log-likelihood function to create a posterior
likelihood function. This will be discussed in §7.5. By default the make.posterior
function creates an unconstrained likelihood function. Now enter:
posterior <- make.posterior(NthChina, srm.cif, TT)
The object posterior is in fact a function, with only the parameter vector as an
argument. Enter posterior on the command line to view the function.
3. Since no priors were specified, the posterior function is simply the log-likelihood.
Enter
params <- c(-2, 0.01, 1)
posterior(params)
The print.level argument is set so that values of the parameters are printed at
each iteration. The iterations start at the initial parameter values given by the
params argument. The object z is a list object, containing the maximum likelihood
parameter estimates (z$estimate), and convergence messages. One should also
check that the derivatives, contained in the output object z, are sufficiently small.
5. Minimisation (or optimisation) is not straightforward. For the process to work
properly, the function nlm needs to know the relative scale of each of the parameters.
In the stress release model, the b parameter has a much finer range of possible
values than the other two parameters. The typsize argument gives the
relative order of magnitude of the step sizes that the minimiser should initially use in
its search for the minimum, e.g.
z <- nlm(neg.posterior, params, typsize=c(1, 0.01, 1), iterlim=1000,
print.level=2)
converges now in 14 iterations. Often the iteration procedure will get “lost” if
poor values are chosen for either params or typsize.
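The effect of typsize can be seen on a toy objective function (the function and scales below are invented for illustration; only base R's nlm is used):

```r
# Second parameter lives on a scale roughly 100 times finer than the first,
# mimicking the b parameter of the stress release model
f <- function(p) (p[1] - 2)^2 + ((p[2] - 0.01) / 0.01)^2
z <- nlm(f, p = c(0, 0), typsize = c(1, 0.01))
print(z$estimate)
```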
6. Now plot the conditional intensity function using the maximum likelihood param-
eter estimates, with the magnitude-time graph as follows:
par(mfrow=c(2,1))
ti <- seq(TT[1], TT[2], 0.5)
y <- srm.cif(NthChina, ti, z$estimate)
ylab=expression(lambda(t)),
main="North China Historical Catalogue")
8. We may want to restrict the events over which we maximise the likelihood. For
example, say we wanted to maximise the likelihood over those events for which
100 ≤ ti ≤ 517. That is, the log-likelihood is
\[ \sum_{i:\, 100 \le t_i \le 517} \log \lambda(t_i) \;-\; \int_{100}^{517} \lambda(t)\,dt. \]
However, note that λ(t) = λ(t|Ht ), and so λ(t) will be calculated using the complete
history supplied in the matrix NthChina, but the parameters a, b and
c (in this case) will be determined by maximising the log-likelihood as specified
above. Thus, events in the interval (0, 100) will be used to calculate the stress
function S(t); effectively, they are being used to bring the process to a steady
state. This can be done by entering:
posterior <- make.posterior(NthChina, srm.cif, c(100,517))
neg.posterior <- function(params) (-posterior(params))
z <- nlm(neg.posterior, params, typsize=c(1, 0.01, 1), iterlim=1000)
However, be warned that the Hessian calculations are often very sensitive to the
value of the differencing step used by nlm, and also to the number of iterations
that were required to achieve convergence. If the minimisation converged within a very
small number of iterations, the estimate of the Hessian may be very poor.
3. The standard errors can be extracted from the covariance matrix as follows:
stderr <- sqrt(diag(covariance))
4. The correlation matrix of the parameters can be calculated by pre- and post-multiplying
the covariance matrix by a diagonal matrix containing the inverses of the
standard errors. This is done as follows:
correlation <- diag(1/stderr) %*% covariance %*% diag(1/stderr)
5. To make interpretation easier, we can name the rows and columns of the matrices
as follows:
param.names <- c("a", "b", "c")
dimnames(correlation) <- list(param.names, param.names)
dimnames(covariance) <- list(param.names, param.names)
param.est <- cbind(z$estimate, stderr)
dimnames(param.est) <- list(param.names, c("Estimate", "StdErr"))
print(covariance)
print(correlation)
print(param.est)
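As a cross-check, this pre- and post-multiplication is exactly what base R's cov2cor function does; a toy covariance matrix (invented for illustration) demonstrates the equivalence:

```r
covariance <- matrix(c(4, 1.2, 1.2, 1), nrow = 2)   # toy covariance matrix
stderr <- sqrt(diag(covariance))
correlation <- diag(1/stderr) %*% covariance %*% diag(1/stderr)
print(all.equal(correlation, cov2cor(covariance)))
```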
2. Reset the time origin to be 1 Jan 1990 (the start of the observation period). By
default, the time origin is 1 Jan 1970.
Palliser$time <- Palliser$time -
julian(1,1,1990, origin=attr(Palliser$time, "origin"))
attr(Palliser$time, "origin") <- c(month=1, day=1, year=1990)
3. Fit the unconstrained model. This is done in the same way as for the stress release
model, but with etas.cif instead.
Palliser$magnitude <- Palliser$magnitude - 2.5
TT <- c(0, julian(1,1,1992)-julian(1,1,1990))
initial <- c(0.025, 15, 1.3, 0.006, 1.21)
print(param.est)
print(correlation)
# mu A alpha CC P
# mu 1.00000000 -0.1967434 0.05942277 0.2907219 0.45475125
# A -0.19674336 1.0000000 -0.54625117 -0.8955979 -0.56881814
# alpha 0.05942277 -0.5462512 1.00000000 0.2270125 0.07102283
# CC 0.29072192 -0.8955979 0.22701248 1.0000000 0.82064150
# P 0.45475125 -0.5688181 0.07102283 0.8206415 1.00000000
5. Draw a contour plot of the likelihood surface as a function of the c and p parameters
by copying the following statements:
w <- pp.contours(Palliser, z$estimate, etas.cif, TT=TT, param.index=c(4, 5),
steps.x=seq(0.0055, 0.0065, 0.0001),
steps.y=seq(1.17, 1.25, 0.005))
8. We will also fix the value of c to be 0.006, achieved with the Dirac prior.
9. The required matrix can be formed by using the function prior.info. Select the
help documentation for this function from the web browser help window. The
required statements are:
Print the matrix y to the screen. Note that there is one row for each parameter.
The first 3 parameters have been restricted to take positive values, c has been fixed
to 0.006, and the p parameter has been given a Cauchy prior with parameters 1.2
and 0.1.
10. Now we must make an R function to calculate the log-likelihood together with
the prior distributions (i.e. weights or penalty functions). This is achieved by
entering:
Type posterior on the R command line to view the function. Notice that it is
only a function of the free parameters. It not only calls the log-likelihood function,
but also the specified prior distributions.
11. We now maximise the log-likelihood subject to these constraints. Note that we
now only estimate four parameters, hence the vector initial is only of length 4.
This is done by entering:
initial <- c(0.015, 15, 1.3, 1.2)
z1 <- nlm(neg.posterior, initial, iterlim=1000,
typsize=c(1, 100, 1, 1), print.level=2)
13. Now calculate the log-likelihood function and AICs using both the unconstrained
and constrained solutions by entering:
LL <- pp.LL(Palliser, etas.cif, z$estimate, TT=TT)
LL1 <- pp.LL(Palliser, etas.cif, z1$full, TT=TT)
AIC <- -2*LL + 2*5
AIC1 <- -2*LL1 + 2*4
Note that the two likelihoods and AICs are very similar.
2. The function cif.reformat can be used to achieve the same as above, and more.
It maps the parameter space of a given conditional intensity function to a lower
number of dimensions. See the help documentation (Harte, 2003b) for examples
using cif.reformat.
3. A linear trend component could be added to the standard ETAS model by defining
a new function as follows:
etas.plus.trend <- function(data, eval.pts, params, TT = NA){
    # conditional intensity for ETAS plus a linear trend
    # params <- c(mu, A, alpha, CC, P, trend.slope)
    ci <- etas.cif(data, eval.pts, params[1:5], TT = TT) +
          poly.cif(data, eval.pts, c(0, params[6]), TT = TT)
    # the constant term in the polynomial is zero, as it would duplicate mu
    return(ci)
}
This new function can then be used in the standard manner. Any additive combination
of the conditional intensity functions can be made. In general, multiplicative
combinations will not work, as the integral terms will not be correct.
3. Now use the estimated model parameters to simulate data for 1992 and 1993.
Recall that our date origin is 1 January 1990, hence the simulation interval bounds
are calculated as:
T1 <- julian(1, 1, 1992, origin=c(month=1, day=1, year=1990))
T2 <- julian(1, 1, 1994, origin=c(month=1, day=1, year=1990))
print(T1)
print(T2)
The seed argument, an integer, causes the random number generator to start at
the same place each time; it is quite useful when one wants to regenerate the same
random numbers. As well as the simulated events being written to the object sim,
they are also written to the screen.
5. A magnitude-time plot of the observed events in 1990 and 1991, and the simulated
events for 1992 and 1993 can be plotted as below:
plot(sim[,"time"], 2.5 + sim[,"magnitude"], type="h",
xlab="Time", ylab="Magnitude", xlim=c(0, T2))
axis(3, at=c(365, 1070), tck=0,
labels=c("Observed Events", "Simulated Events"))
abline(v=T1, lty=3)
title(main=paste(c("mu", "A", "alpha", "c", "p"),
round(z$estimate, digits=3), sep="=", collapse=" "), line=3)
Note that 2.5 is being added to the magnitude, as this threshold value was subtracted
from Palliser$magnitude prior to the estimation stage above.
6. Sometimes simulations explode when using the ETAS model. This explosion occurs
because a stability requirement is not satisfied. Let k be the expected number
of “offspring” from a single “ancestor”; then the stability requirement is that k < 1.
The parameters satisfy the following relationship:
\[ k = A \, \frac{c}{p-1} \, \frac{\beta}{\beta - \alpha}. \]
The value of k for the above simulations can be calculated as:
params <- z$estimate
beta <- bvalue*log(10)
k <- params[2]*params[4]/(params[5]-1)*beta/(beta-params[3])
print(k)
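The relationship above can be wrapped in a small helper (etas_k is a hypothetical name, not an SSLib function). With the initial ETAS values used earlier in this section (A = 15, c = 0.006, p = 1.21, α = 1.3) and β corresponding to a b-value of 1, the process is just subcritical:

```r
# k = A * c/(p - 1) * beta/(beta - alpha)
etas_k <- function(A, c, p, alpha, beta) A * c / (p - 1) * beta / (beta - alpha)
beta <- 1 * log(10)                  # b-value of 1 implies beta = b * log(10)
k <- etas_k(A = 15, c = 0.006, p = 1.21, alpha = 1.3, beta = beta)
print(k)                             # just below 1 for these values
```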
7. By decreasing the b-value sufficiently, k will become large enough that the
process has a high chance of exploding. For example, try:
If you need to stop it, hold down the “control” key and press “c”. Try with a
number of different values for the seed. When b = 0.9 the process should explode
less frequently than when b = 0.8.
8. The observed and simulated events can be plotted in the same manner as above.
9. Other model parameters can be adjusted, and the effect on the simulations observed.
However, if you adjust other parameters, be careful that the criticality
conditions are satisfied.
In the simulations above we have used the thinning method. This method of simulation
can be used for a wide class of point process models, though it will not necessarily be
the most efficient method for a given model. For example, a more efficient method
to simulate the ETAS model is to generate the mainshock event (ancestor), then its
offspring, then the offspring produced by the first generation, and so on; the process is
repeated until extinction of each particular family line, at which time the aftershock
sequence is finished. This method of simulation exploits a characteristic of the ETAS
model. For large simulations, one also needs to use an appropriate computer language
and program structure. For example, R is slow when using explicit loops; it works much
faster if the algorithm can be written in the form of matrix algebra.
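The thinning method just described can be sketched for a simple self-exciting (Hawkes) process with an exponential decay kernel. This is an illustration of the general technique under assumed parameter values, not the SSLib simulation code:

```r
# Simulate a Hawkes process lambda(t) = mu + sum(eta * exp(-beta * (t - t_i)))
# on [0, t_end] by thinning: between events the intensity is non-increasing,
# so its current value is a valid upper bound for candidate generation.
simulate_hawkes <- function(mu, eta, beta, t_end) {
  stopifnot(eta / beta < 1)            # branching ratio (criticality) k < 1
  times <- numeric(0)
  t <- 0
  repeat {
    lam_bar <- mu + sum(eta * exp(-beta * (t - times)))
    t <- t + rexp(1, rate = lam_bar)   # candidate point
    if (t > t_end) break
    lam_t <- mu + sum(eta * exp(-beta * (t - times)))
    if (runif(1) <= lam_t / lam_bar)   # accept with probability lam_t/lam_bar
      times <- c(times, t)
  }
  times
}

set.seed(42)
sim <- simulate_hawkes(mu = 0.5, eta = 0.5, beta = 1, t_end = 100)
print(length(sim))
```

With branching ratio k = η/β = 0.5 the expected number of events is roughly µ·t_end/(1 − k), i.e. about 100 here.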
Chapter 8
M8 Algorithm (ssM8)
Examples using the M8 functions can be found within the help documentation for the
functions M8, M8.series, and M8.TIP.
A listing of the main functions in the ssM8 package can be found in Appendix A.5,
and detailed documentation for all functions can be found in the package manual, see
Harte (2003e).
Chapter 9
Hidden Markov Models (HiddenMarkov)
The package HiddenMarkov contains functions for the simulation and fitting of discrete
time hidden Markov models, where the hidden Markov process has m discrete
states.
Some of the code is quite inefficient; the emphasis so far has been on ensuring that
the code gives the correct answers.
Detailed documentation and examples for all functions in the HiddenMarkov package
can be found in the package manual, see Harte (2005).
54 CHAPTER 9. HIDDEN MARKOV MODELS (HIDDENMARKOV)
Chapter 10
Fractal Dimension Estimation (Fractal)
A listing of the main functions in the Fractal package can be found in Appendix A.6,
and detailed documentation for all functions can be found in the package manual, see
Harte (2003a).
An example using the Hill estimator to calculate the Rényi dimensions of the Cantor
measure can be found in the help documentation for the function hill. Further
theoretical details can be found in Harte (2001).
56 CHAPTER 10. FRACTAL DIMENSION ESTIMATION (FRACTAL)
Part III
System Administration
Chapter 11
Software Installation
The Statistical Seismology Library cannot be installed until the R language has been
installed onto the computer.
11.1.1 Linux
R is available for five different “flavours” of Linux, namely Debian, Mandrake, RedHat,
SuSE and VineLinux. Within each flavour, only some versions have binary distributions
of R available. In general, a particular version of R is available for those versions of
Linux that were current when that version of R was released (sometimes more than one
version of a particular Linux release might be considered to be current).
The following describes installing R for RedHat distributions of Linux using the
binary distribution. This is done by downloading the appropriate rpm file for your
operating system from CRAN. The downloaded rpm file can be placed into any directory.
The R software will generally be installed into the system directories. Hence one will
need to login as “root” to ensure sufficient privileges to write into these directories.
The software is installed by issuing the command
rpm -ivh filename
in an XTERM window that is within the subdirectory containing the downloaded rpm
file. This will run the installation process, and cause the R software to be installed into
the appropriate system directories. At the completion of the job, the rpm file can be
deleted.
Software can subsequently be updated with a new version (i.e. new rpm file) as
rpm -Uvh filename
For more details about the use of the rpm command, enter
man rpm
11.2.1 unix/Linux
The SSLib software will generally be installed into the system directories. Hence one
will need to login as “root” to ensure sufficient privileges to write into these directories.
SSLib consists of individual R packages. An individual R package can be installed
by giving the following command at the XTERM prompt:
R CMD INSTALL packagename_*.*-*.tar.gz
where * denotes the version numbers. The XTERM window should be within the
directory that contains the package source file (usually a *.tar.gz) or the source directory.
Similarly, an individual package can be removed by issuing the following command:
R CMD REMOVE packagename
This removal process can be done from any directory, but only by the “root” user.
Under MS Windows, packages are generally installed from pre-compiled binary
distributions (rather than source code). These are recognised by their “.zip” filename
extension, rather than the “.tar.gz” extension used for the source distribution. Then
use the “Install package(s) from local zip files...” option in the “Packages” menu. Some
of these zip files are available from the SSLib web page; see the “Windows binaries”
hyperlink.
Chapter 12
Modifications and Additions to SSLib
where * denotes the version numbers. Within the directory /sslib/R/, edit the file
“zzz.R”, in particular, add
require(CataloguePackageName)
below the other “require” statements. Then reinstall the package onto the operating
system.
in the unix xterm window, where packagename is the name of the package.
Note that the directory containing the package source code will have the name
packagename too. For further details, enter
R CMD check --help
64 CHAPTER 12. MODIFICATIONS AND ADDITIONS TO SSLIB
3. The tape archive file (*.tar.gz) can be created when the development of the source
code in the directory packagename is complete. This file is created by entering:
R CMD build --force packagename
in the unix xterm window. The version numbers will be included in the tar.gz
file name (e.g. ssBase_1.2-5.tar.gz); they are read from the DESCRIPTION
file within the source directory. The command also performs a few other checks.
For more information, enter:
R CMD build --help
Appendices
Appendix A
Main SSLib Functions
The main functions in SSLib are listed below under their respective package name.
68 APPENDIX A. MAIN SSLIB FUNCTIONS
Base Package

[Figure A.1: Flow chart showing the relationship between objects in the ssBase package. The diagram did not survive text extraction; the legible node labels include as.catalogue.]
A.3 EDA Package (ssEDA)
EDA Package

[Figure A.2: Flow chart showing the relationship between objects in the ssEDA package. The diagram did not survive text extraction; the legible node labels include multigraph.]
A.4 Point Process Package (PtProcess)
[Figure A.3: Flow chart showing the relationships between the required functions employed to find the maximum likelihood estimates of a point process model. Legible node labels: Catalogue; Subsetting Characteristics; Specification of Conditional Intensity Functions; Prior Distribution for Each Model Parameter; Calculate Log-likelihood in Specified Time Interval (pp.LL); Optimiser; with cross references to A.3.1 and A.3.2.]
[Figure A.4: Flow chart showing the relationship between the required functions employed to evaluate the goodness of fit and simulate a point process model. The diagram did not survive text extraction; the legible node labels include as.catalogue.]
M8 Package

[Figure A.5: Flow chart showing the relationship between objects in the M8 package. Legible node labels: as.catalogue, decluster.M8, M8.series, M8, M8.TIP, plot.M8.]
Appendix B
Common R Functions
This Appendix provides a list of some of the many R functions which are available.
Further information can be found by using the browser help facility.
B.1.4 Lists
$ Extract or Replace Parts of an Object – Generic operator
[[ Extract or Replace Parts of an Object – Generic operator
as.list List Objects
c Combine Values into a Vector or List
is.list List Objects
lapply Apply a Function to Components of a List
length Length of a Vector or List
list List Objects
names Names Attribute of an Object
rev Reverse the Order of a Vector or List
sapply Apply a Function to Components of a List
split Split Data by Groups
unlist Simplify the Structure of a List
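A quick illustration of two entries from the table above: lapply returns a list, while sapply simplifies the result to a vector where possible.

```r
x <- list(a = 1:3, b = 4:10)
len_list <- lapply(x, length)   # a list of lengths
len_vec  <- sapply(x, length)   # simplified to a named integer vector
print(len_vec)
```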
B.2 Graphs
B.2.1 Add to Existing Plot
abline Plot Line in Intercept-Slope Form
arrows Plot Disconnected Line Segments or Arrows
axis Add an Axis to the Current Plot
box Add a Box Around a Plot
boxplot Box Plot
contour Contour Plot
identify Identify Points on Plot – Generic function
labels Labels for Printing or Plotting – Generic function
legend Put a Legend on a Plot
lines Add Lines or Points to Current Plot
mtext Text in the Margins of a Plot
points Add Lines or Points to Current Plot
B.4.5 Input/Output
cat General Printing
count.fields Count the Number of Fields per Line
history Display, Edit, Re-evaluate and Save Past R Expressions
list.files List the Files in a Directory/Folder
page Page Through Data
print Print Data – Generic function
readline Read a Line from the Terminal
scan Input Data from a File
scan.fixed Input Data from a Fixed Format File
sink Send R Output to a File
source Parse and Evaluate R Expressions from a File
system Execute a system (unix) Command
write Write Data to ASCII File
B.4.7 Miscellaneous
all.names Find All Names in an Expression
amatch Argument Matching
Appendix C
Mathematical Detail
The conditional intensity function is defined as
\[ \lambda(t \mid \mathcal{H}_t) = \lim_{\delta \to 0} \frac{1}{\delta} \Pr\{ N_\delta(t) > 0 \mid \mathcal{H}_t \}. \]
Let τ be the time of the last event before time t, hence t > τ. Also let ∅(τ,t) be the
null outcome, i.e. no events in the interval (τ, t). The conditional intensity can then be
expressed as the hazard function of the time of the next event,
\[ \lambda(t \mid \mathcal{H}_\tau \cap \emptyset_{(\tau,t)}) = \frac{f(t \mid \mathcal{H}_\tau \cap \emptyset_{(\tau,t)})}{1 - F(t \mid \mathcal{H}_\tau \cap \emptyset_{(\tau,t)})}, \]
where f and F denote the conditional density and distribution functions of the time of
the next event. Rearranging gives the density function as
\[ f(t \mid \mathcal{H}_\tau \cap \emptyset_{(\tau,t)}) = \lambda(t \mid \mathcal{H}_\tau \cap \emptyset_{(\tau,t)}) \exp\left( -\int_\tau^t \lambda(u \mid \mathcal{H}_\tau \cap \emptyset_{(\tau,u)})\,du \right). \]
Say · · · < t−2 < t−1 < t0 < T1 < t1 < t2 < · · · < tn < T2 < tn+1 < tn+2 < · · ·,
where the ti, i ∈ Z, are event times, and [T1, T2] represents the interval over which we
want to evaluate the likelihood. Then the log-likelihood is
\[
\begin{aligned}
\log L(T_1, T_2)
&= \log f(t_1 \mid \mathcal{H}_{T_1} \cap \emptyset_{(T_1,t_1)}) + \sum_{i=2}^{n} \log f(t_i \mid \mathcal{H}_{t_{i-1}} \cap \emptyset_{(t_{i-1},t_i)}) \\
&\qquad + \log\left[ 1 - F(T_2 \mid \mathcal{H}_{t_n} \cap \emptyset_{(t_n,T_2)}) \right] \\
&= \sum_{i=1}^{n} \log \lambda(t_i \mid \mathcal{H}_{t_i}) - \int_{T_1}^{t_1} \lambda(u \mid \mathcal{H}_{T_1} \cap \emptyset_{(T_1,u)})\,du \\
&\qquad - \sum_{i=2}^{n} \int_{t_{i-1}}^{t_i} \lambda(u \mid \mathcal{H}_{t_{i-1}} \cap \emptyset_{(t_{i-1},u)})\,du - \int_{t_n}^{T_2} \lambda(u \mid \mathcal{H}_{t_n} \cap \emptyset_{(t_n,u)})\,du \\
&= \sum_{i:\, T_1 \le t_i \le T_2} \log \lambda(t_i \mid \mathcal{H}_{t_i}) - \int_{T_1}^{T_2} \lambda(t \mid \mathcal{H}_t)\,dt.
\end{aligned}
\]
Often “|Ht ” is omitted, and throughout this document, λ(t) is understood to mean
λ(t|Ht ).
C.2 Self-Exciting and ETAS Models

A self-exciting point process has conditional intensity of the form
\[ \lambda(t) = \mu + \sum_{i:\, t_i < t} \gamma(t - t_i), \]
where µ ≥ 0 is a background rate, and γ(u) ≥ 0 (0 < u < ∞), with γ(u) = 0 (u < 0), is
a “reproduction” function which describes the rate at which a “parent” at u = 0
produces “offspring” at time u. For stability we require ∫ γ(u) du < 1. Thus we may
also write
\[ \lambda(t) = \mu + k \sum_{i:\, t_i < t} g(t - t_i), \]
where g has been normalised to form a probability density and k is the expected number
of direct “offspring” from a single “ancestor”. The stability requirement becomes k < 1;
k is sometimes called the “criticality” parameter. If k ≥ 1, or if ∫ γ(u) du = ∞, the
process “explodes”, i.e. if simulated, the overall rate of occurrence of offspring events
would grow larger and larger.
In principle, the model extends easily to situations where the conditional intensity
depends on space and magnitude or other additional variables. For example,
\[ \lambda(t, x, M) = \mu(x, M) + \sum_{i:\, t_i < t} \gamma(t - t_i,\; x - x_i,\; M \mid M_i), \]
where the background rate depends on the location and on the magnitude class being considered, and γ(u, x, M|M₀) gives the rate of production of offspring of magnitude M at location x and time u after a parent event of magnitude M₀ at the origin of space and time. In practice we use only forms with a simplified structure. If magnitudes are assumed to follow a density f(M) independently of all other features, and the background rate is homogeneous in space, the conditional intensity can be written in normalised form as
\[
\lambda(t, x, M) = f(M) \left[ \mu + k \sum_{i:\, t_i < t} a(M_i) \, g(t - t_i, x - x_i) \right],
\]
with
\[
f(M) = \beta e^{-\beta M}, \qquad
g(u) = \frac{p-1}{c} \left( 1 + \frac{u}{c} \right)^{-p}, \qquad
a(M) = \frac{\beta - \alpha}{\beta} \, e^{\alpha M}.
\]
Stability conditions for the model are p > 1, k < 1 and β > α. If these conditions are not satisfied, this form of parameterisation fails (densities may become negative), so care must be taken to ensure that the constraints are observed.
The temporal ETAS intensity is often written as
\[
\lambda(t) = \mu + \frac{K}{c^p} \sum_{i:\, t_i < t} e^{\alpha(M_i - M_0)} \left( 1 + \frac{t - t_i}{c} \right)^{-p}.
\]
In this parameterisation the requirements p > 1 and α < β are not explicitly enforced, so that if the likelihood is maximised freely, the optimum may occur at a point where one or other of these constraints is broken. This tends to happen in situations where there is an increasing trend or other form of departure from the type of behaviour postulated by the model, for example in modelling intermediate-depth earthquakes.
Equivalently, absorbing the constants into a single parameter A,
\[
\lambda(t) = \mu + A \sum_{i:\, t_i < t} e^{\alpha(M_i - M_0)} \left( 1 + \frac{t - t_i}{c} \right)^{-p}.
\]
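Evaluating this intensity in R is a direct transcription of the sum; the following sketch (illustrative only, not the PtProcess implementation) takes the event history as vectors of times and magnitudes:

```r
# Temporal ETAS conditional intensity:
#   lambda(t) = mu + A * sum_{t_i < t} exp(alpha * (M_i - M0)) *
#               (1 + (t - t_i) / c)^(-p)
etas.lambda <- function(t, times, mags, mu, A, alpha, c, p, M0) {
  use <- times < t
  mu + A * sum(exp(alpha * (mags[use] - M0)) *
               (1 + (t - times[use]) / c)^(-p))
}

# A single event of magnitude M0 at time 0 gives mu + A * (1 + t/c)^(-p):
etas.lambda(1, times = 0, mags = 5, mu = 0.1, A = 0.5,
            alpha = 1, c = 0.1, p = 1.2, M0 = 5)
```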
If p ≠ 1, then
\[
\begin{aligned}
\int_{t_{i-1}}^{t_i} \lambda(t)\,dt
&= \mu(t_i - t_{i-1}) + A \int_{t_{i-1}}^{t_i} \sum_{j:\, t_j < t} e^{\alpha(M_j - M_0)} \left( 1 + \frac{t - t_j}{c} \right)^{-p} dt \\
&= \mu(t_i - t_{i-1}) + \frac{Ac}{1-p} \left[ \sum_{j:\, t_j < t} e^{\alpha(M_j - M_0)} \left( 1 + \frac{t - t_j}{c} \right)^{1-p} \right]_{t_{i-1}}^{t_i} \\
&= \mu(t_i - t_{i-1}) + \frac{Ac}{1-p} \sum_{j:\, t_j < t_i} e^{\alpha(M_j - M_0)} \left( 1 + \frac{t_i - t_j}{c} \right)^{1-p} \\
&\quad - \frac{Ac}{1-p} \sum_{j:\, t_j \le t_{i-1}} e^{\alpha(M_j - M_0)} \left( 1 + \frac{t_{i-1} - t_j}{c} \right)^{1-p}
\end{aligned}
\]
\[
\begin{aligned}
\int_{t_{i-k}}^{t_i} \lambda(t)\,dt
&= \sum_{j=1}^{k} \int_{t_{i-k+j-1}}^{t_{i-k+j}} \lambda(t)\,dt \\
&= \mu(t_i - t_{i-k}) + \frac{Ac}{1-p} \sum_{j:\, t_j < t_i} e^{\alpha(M_j - M_0)} \left( 1 + \frac{t_i - t_j}{c} \right)^{1-p} \\
&\quad - \frac{Ac}{1-p} \sum_{j:\, t_j < t_{i-k}} e^{\alpha(M_j - M_0)} \left( 1 + \frac{t_{i-k} - t_j}{c} \right)^{1-p}
   - \frac{Ac}{1-p} \sum_{j=i-k}^{i-1} e^{\alpha(M_j - M_0)} \\
&= \mu(t_i - t_{i-k}) + \frac{Ac}{1-p} \sum_{j:\, t_j < t_i} e^{\alpha(M_j - M_0)} \left[ \left( 1 + \frac{t_i - t_j}{c} \right)^{1-p} - 1 \right] \\
&\quad - \frac{Ac}{1-p} \sum_{j:\, t_j < t_{i-k}} e^{\alpha(M_j - M_0)} \left[ \left( 1 + \frac{t_{i-k} - t_j}{c} \right)^{1-p} - 1 \right]
\end{aligned}
\]
" #
T2
T2 − tj 1−p
Z
Ac X α(Mj −M0 )
λ(t)dt = µ(T2 − T1 ) + e 1+ −1
T1 1−p c
j:tj <T2
" #
T1 − tj 1−p
Ac X α(Mj −M0 )
− e 1+ −1
1−p c
j:tj <T1
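The closed form avoids numerical quadrature when computing the likelihood. A sketch in R (function and argument names are ours):

```r
# Closed-form integral of the ETAS intensity over [T1, T2], for p != 1:
#   mu*(T2 - T1) + (A*c/(1-p)) * (S(T2) - S(T1)),  where
#   S(T) = sum_{t_j < T} exp(alpha*(M_j - M0)) * ((1 + (T - t_j)/c)^(1-p) - 1)
etas.integral <- function(T1, T2, times, mags, mu, A, alpha, c, p, M0) {
  S <- function(T.end) {
    use <- times < T.end
    sum(exp(alpha * (mags[use] - M0)) *
        ((1 + (T.end - times[use]) / c)^(1 - p) - 1))
  }
  mu * (T2 - T1) + (A * c / (1 - p)) * (S(T2) - S(T1))
}
```

For a single event of magnitude M0 at time 0 this agrees with numerically integrating µ + A(1 + t/c)^{−p}.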
Similarly, for p = 1,
\[
\begin{aligned}
\int_{T_1}^{T_2} \lambda(t)\,dt
&= \mu(T_2 - T_1) + Ac \sum_{j:\, t_j < T_2} e^{\alpha(M_j - M_0)} \log\left( 1 + \frac{T_2 - t_j}{c} \right) \\
&\quad - Ac \sum_{j:\, t_j < T_1} e^{\alpha(M_j - M_0)} \log\left( 1 + \frac{T_1 - t_j}{c} \right)
\end{aligned}
\]

C.3 Stress Release Models
For the stress release model, with λ(t) = e^a e^{b[t − cS(t)]} and S(t) constant between successive events,
\[
\int_{t_i}^{t_{i+1}} \lambda(t)\,dt = \int_{t_i}^{t_{i+1}} e^{a} e^{b[t - cS(t)]}\,dt
= \frac{1}{b} \, e^{a - bcS(t_{i+1})} \left[ e^{b t_{i+1}} - e^{b t_i} \right].
\]
If there are no events in the interval [T1, T2], then
\[
\int_{T_1}^{T_2} \lambda(t)\,dt = \frac{1}{b} \, e^{a - bcS(T_2)} \left[ e^{b T_2} - e^{b T_1} \right].
\]
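Since S(t) is constant between events, these integrals are elementary; a sketch in R (names are ours):

```r
# Integrated stress release intensity over an interval (t1, t2) that
# contains no events, so that S(t) = S is constant on it:
#   integral = exp(a - b*c*S) * (exp(b*t2) - exp(b*t1)) / b
srm.integral <- function(t1, t2, S, a, b, c) {
  exp(a - b * c * S) * (exp(b * t2) - exp(b * t1)) / b
}

# Agrees with numerically integrating exp(a + b*(t - c*S)):
srm.integral(0, 1, S = 2, a = 0.5, b = 0.8, c = 0.3)
```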
Say \( \cdots < t_{-2} < t_{-1} < t_0 < T_1 < t_1 < t_2 < \cdots < t_n < T_2 < t_{n+1} < t_{n+2} < \cdots \), where the \( t_i \), \( i \in \mathbb{Z} \), are event times, and [T1, T2] represents the interval over which we want to maximise the likelihood. Thus, the interval [T1, T2] contains n events, and
\[
\int_{T_1}^{T_2} \lambda(t)\,dt = \int_{T_1}^{t_1} \lambda(t)\,dt + \sum_{i=1}^{n-1} \int_{t_i}^{t_{i+1}} \lambda(t)\,dt + \int_{t_n}^{T_2} \lambda(t)\,dt,
\]
The linked stress release model gives region r the conditional intensity
\[
\lambda(t, r) = \exp\left\{ a_r + b_r \left[ t - \sum_{j=1}^{k} c_{rj} S_j(t) \right] \right\},
\]
where Sj(t) is as in Equation C.1, but the summation is only taken over those events in region j. The structure of the matrix C determines the possible types of stress transfer between regions.
Say \( \cdots < t_{-2} < t_{-1} < t_0 < T_1 < t_1 < t_2 < \cdots < t_n < T_2 < t_{n+1} < t_{n+2} < \cdots \), where the \( t_i \), \( i \in \mathbb{Z} \), are event times (over the union of all regions), and [T1, T2] represents the interval over which we want to maximise the likelihood. Also let R be the set of region labels, i.e. r ∈ R = {1, 2, · · · , k}. Then
\[
\int_{t_i}^{t_{i+1}} \lambda(t, r)\,dt = \exp\left( a_r - b_r \sum_{j=1}^{k} c_{rj} S_j(t_{i+1}) \right) \frac{\exp(b_r t_{i+1}) - \exp(b_r t_i)}{b_r},
\]
and hence
\[
\int_R \int_{T_1}^{T_2} \lambda(t, r)\,dt\,dr = \sum_{r \in R} \left[ \int_{T_1}^{t_1} \lambda(t, r)\,dt + \sum_{i=1}^{n-1} \int_{t_i}^{t_{i+1}} \lambda(t, r)\,dt + \int_{t_n}^{T_2} \lambda(t, r)\,dt \right].
\]
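Each region's contribution can be computed with the same constant-S trick used for the single-region model; a sketch (names are ours, with `Cmat` standing in for the k × k matrix C):

```r
# Integrated intensity of region r in the linked stress release model
# over an event-free interval (t1, t2), with S_j(t) = S[j] constant:
#   exp(a_r - b_r * sum_j c_rj * S_j) * (exp(b_r*t2) - exp(b_r*t1)) / b_r
linked.srm.integral <- function(t1, t2, r, S, a, b, Cmat) {
  exp(a[r] - b[r] * sum(Cmat[r, ] * S)) *
    (exp(b[r] * t2) - exp(b[r] * t1)) / b[r]
}

# With k = 1 and C = [c] this reduces to the single-region formula:
linked.srm.integral(0, 1, r = 1, S = 2, a = 0.5, b = 0.8,
                    Cmat = matrix(0.3, 1, 1))
```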
C.4.1 Algorithm
1. Let T be the start of a small simulation interval.

5. If \( \lambda(T + \tau)/\lambda_{\max} < 1 \), then go to (6). Else no events occur in (T, T + δ); hence T ← T + δ, and return to (2).

7. If \( U \le \lambda(T + \tau)/\lambda_{\max} \), then a new "event" occurs at tᵢ = T + τ.
   T ← T + τ.

9. Return to (2).
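The steps above describe a thinning (accept-reject) simulation. A compact, self-contained sketch of the standard thinning loop follows, assuming λ is monotone between events so that its maximum over (T, T + δ] is attained at an endpoint; the function and argument names are ours, not the SSLib simulation code:

```r
# Simulate event times on [0, t.end] by thinning.
# `lambda(t, times)` is the conditional intensity given past event times;
# it is assumed monotone between events, so the bound over (T, T + delta]
# is attained at one of the endpoints.
simulate.thinning <- function(lambda, t.end, delta) {
  times <- numeric(0)
  T.now <- 0
  while (T.now < t.end) {
    lambda.max <- max(lambda(T.now, times), lambda(T.now + delta, times))
    tau <- rexp(1, rate = lambda.max)     # candidate waiting time
    if (tau > delta) {
      T.now <- T.now + delta              # no candidate in this window
    } else {
      T.now <- T.now + tau
      if (runif(1) <= lambda(T.now, times) / lambda.max)
        times <- c(times, T.now)          # accept the candidate event
    }
  }
  times[times <= t.end]
}

# A constant intensity reduces to a homogeneous Poisson process:
set.seed(1)
x <- simulate.thinning(function(t, past) 5, t.end = 10, delta = 0.5)
```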
2. When λ(t) is monotonically increasing (except at event times), e.g. in the stress release model, there are two extreme situations that could cause the procedure to be inefficient.

   (a) If δ is too small, λmax will be relatively small, hence τ quite large, possibly greater than δ, so that T + τ falls beyond T + δ. Many small intervals will then be considered, each with a very low likelihood of containing an event.

   (b) If δ is too large, λmax will be relatively large, hence τ will be quite small. This can be inefficient because many candidate "events" will be "thinned".

   Hence, for best efficiency, δ should be neither too small nor too large.
3. Extreme b-values can also cause problems.

   (a) Too small a b-value may cause aftershock sequences in the ETAS model never to die out.

   (b) Too small a b-value in the stress release model may cause an extremely long period of quiescence. An alternative is to use the truncated Pareto distribution rather than the exponential distribution for the magnitudes.
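In magnitude terms, a (truncated) Pareto law for seismic moment corresponds to a (truncated) exponential law for magnitude, which is easy to sample by inversion; a sketch (names are ours):

```r
# Sample magnitudes (above the reference magnitude, here taken as 0)
# from an exponential density truncated at m.max, by inverting the
# truncated CDF F(m) = (1 - exp(-beta*m)) / (1 - exp(-beta*m.max)).
# Truncation prevents the occasional unrealistically large magnitude.
rmag.trunc <- function(n, beta, m.max) {
  u <- runif(n)
  -log(1 - u * (1 - exp(-beta * m.max))) / beta
}

set.seed(1)
m <- rmag.trunc(1000, beta = log(10), m.max = 3)
range(m)   # all values fall within [0, 3]
```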