
Users Guide

for the
Statistical Seismology Library

David Harte

Statistics Research Associates Limited


PO Box 12 649
Wellington NZ
www.statsresearch.co.nz

This Version: 20 November 2007

Postal Address: Statistics Research Associates


PO Box 12 649
Thorndon
Wellington
New Zealand

URL: www.statsresearch.co.nz

Email: david'at'statsresearch.co.nz

Copyright © 2007 by David Harte. This document may be reproduced and distributed
in any medium so long as the entire document, including this copyright notice and the
version date above, remains intact and unchanged on all copies. Commercial
redistribution is permitted, but you may not redistribute it, in whole or in part,
under terms more restrictive than those under which you received it.

The document should be cited in the usual scientific manner, and should contain the
following information:

Harte, D. (2007). Users Guide for the Statistical Seismology Library.


Statistics Research Associates, Wellington.
URL: www.statsresearch.co.nz/software.html
Contents

Preface vii

I The R Language 1

1 Introduction to R 3
1.1 Starting R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Quitting R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Mode of an Object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 Function Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.7 Help Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.8 Writing Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.9 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Input-Output Methods 11
2.1 Reading Data from a Text File . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Including an R Program Source File . . . . . . . . . . . . . . . . . . . . 12
2.3 Writing R Objects to a Text File . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Writing Program Output to a Text File . . . . . . . . . . . . . . . . . . 13
2.5 Saving R Objects for Use in a Subsequent Session . . . . . . . . . . . . . 13
2.6 Retrieving R Objects from a Previous Session . . . . . . . . . . . . . . . 14
2.7 Executing FORTRAN and C++ from within R . . . . . . . . . . . . . . 14
2.8 Running Jobs in Batch Mode . . . . . . . . . . . . . . . . . . . . . . . . 14

3 More Advanced Data Structures 15


3.1 List Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3 Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.4 Class Attribute, Generic and Method Functions . . . . . . . . . . . . . . 18
3.5 Data Frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19


II The Statistical Seismology Library 21

4 Statistical Seismology Library (sslib) 23


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2 Attaching the Statistical Seismology Library . . . . . . . . . . . . . . . . 23

5 The SSLib Base Package (ssBase) 25


5.1 Structure of Earthquake Catalogues in SSLib . . . . . . . . . . . . . . . 25
5.2 Creating a Simple Catalogue . . . . . . . . . . . . . . . . . . . . . . . . 26
5.3 The Time Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5.4 More About Earthquake Catalogues . . . . . . . . . . . . . . . . . . . . 28
5.5 Subsetting Catalogues and Subcatalogues . . . . . . . . . . . . . . . . . 28

6 Exploratory Data Analysis (ssEDA) 31


6.1 Epicentral Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.1.1 NZ East Cape Event . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.1.2 Significant New Zealand Events . . . . . . . . . . . . . . . . . . . 33
6.1.3 Circum-Pacific Events . . . . . . . . . . . . . . . . . . . . . . . . 34
6.2 Plate Subduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
6.2.1 Wellington Local Area . . . . . . . . . . . . . . . . . . . . . . . . 34
6.2.2 New Zealand Region . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.3 General Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.3.1 Depth Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.3.2 Frequency-Magnitude Distribution . . . . . . . . . . . . . . . . . 37
6.3.3 Time-Series of Event Counts . . . . . . . . . . . . . . . . . . . . 37

7 Point Process Modelling (PtProcess) 39


7.1 Using Catalogue Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
7.2 Conditional Intensity and Log-Likelihood Functions . . . . . . . . . . . 40
7.3 Unconstrained Maximum Likelihood Estimation . . . . . . . . . . . . . 41
7.4 Calculation of Standard Errors . . . . . . . . . . . . . . . . . . . . . . . 43
7.5 Constrained Maximum Likelihood Estimation . . . . . . . . . . . . . . . 44
7.6 Modifying Conditional Intensity Functions . . . . . . . . . . . . . . . . . 47
7.7 Simulating Point Process Models . . . . . . . . . . . . . . . . . . . . . . 48

8 M8 Algorithm (ssM8) 51

9 Hidden Markov Models (HiddenMarkov) 53

10 Fractal Dimension Estimation (Fractal) 55



III System Administration 57

11 Software Installation 59
11.1 Installation or Updating of the R Software . . . . . . . . . . . . . . . . . 59
11.1.1 Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
11.1.2 Microsoft Windows . . . . . . . . . . . . . . . . . . . . . . . . . . 60
11.2 Installation or Updating of SSLib . . . . . . . . . . . . . . . . . . . . . . 60
11.2.1 unix/Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
11.2.2 Microsoft Windows . . . . . . . . . . . . . . . . . . . . . . . . . . 60

12 Modifications and Additions to SSLib 63


12.1 Modification of “Required” Packages by sslib . . . . . . . . . . . . . . . 63
12.2 Writing a Package for Inclusion into SSLib . . . . . . . . . . . . . . . . . 63
12.3 Creating an Earthquake Catalogue Package . . . . . . . . . . . . . . . . 64

IV Appendices 65

A Main SSLib Functions 67


A.1 Earthquake Catalogue Packages . . . . . . . . . . . . . . . . . . . . . . . 67
A.2 Base Package (ssBase) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
A.2.1 Basic Catalogue Characteristics . . . . . . . . . . . . . . . . . . . 67
A.2.2 Catalogue Subsetting Functions . . . . . . . . . . . . . . . . . . . 67
A.2.3 Date Related Functions . . . . . . . . . . . . . . . . . . . . . . . 69
A.2.4 Miscellaneous Functions . . . . . . . . . . . . . . . . . . . . . . . 69
A.3 EDA Package (ssEDA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
A.3.1 Graphical Summaries . . . . . . . . . . . . . . . . . . . . . . . . 69
A.3.2 Probability Distributions Originating in Seismology . . . . . . . 69
A.3.3 Miscellaneous Functions . . . . . . . . . . . . . . . . . . . . . . . 71
A.4 Point Process Package (PtProcess) . . . . . . . . . . . . . . . . . . . . . 71
A.4.1 Conditional Intensity Functions . . . . . . . . . . . . . . . . . . . 71
A.4.2 Prior Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
A.4.3 Fitting and Evaluating Point Process Models . . . . . . . . . . . 71
A.4.4 Miscellaneous Functions . . . . . . . . . . . . . . . . . . . . . . . 74
A.5 M8 Package (ssM8) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
A.5.1 Main M8 Functions . . . . . . . . . . . . . . . . . . . . . . . . . 74
A.6 Fractal Package (Fractal) . . . . . . . . . . . . . . . . . . . . . . . . . . 74
A.6.1 Simulate Fractal Processes . . . . . . . . . . . . . . . . . . . . . . 74
A.6.2 Estimate Rényi Dimensions . . . . . . . . . . . . . . . . . . . . . 74

B Common R Functions 77
B.1 Data Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
B.1.1 Checking and Creating Different Data Types . . . . . . . . . . . 77
B.1.2 Data Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

B.1.3 Data Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78


B.1.4 Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
B.1.5 Matrices and Arrays . . . . . . . . . . . . . . . . . . . . . . . . . 79
B.2 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
B.2.1 Add to Existing Plot . . . . . . . . . . . . . . . . . . . . . . . . . 79
B.2.2 Graphical Devices . . . . . . . . . . . . . . . . . . . . . . . . . . 80
B.2.3 High-Level Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
B.2.4 Interacting with Plots . . . . . . . . . . . . . . . . . . . . . . . . 80
B.3 Help Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
B.4 Programming Constructs . . . . . . . . . . . . . . . . . . . . . . . . . . 81
B.4.1 Arithmetic Operators . . . . . . . . . . . . . . . . . . . . . . . . 81
B.4.2 Character Data Operations . . . . . . . . . . . . . . . . . . . . . 81
B.4.3 Control Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
B.4.4 Data Manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . 82
B.4.5 Input/Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
B.4.6 Logical Operators . . . . . . . . . . . . . . . . . . . . . . . . . . 83
B.4.7 Miscellaneous . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
B.4.8 Pseudo Looping Functions (Apply) . . . . . . . . . . . . . . . . . 84
B.5 Technical Computations . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
B.5.1 Categorical Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
B.5.2 Miscellaneous Mathematical Functions . . . . . . . . . . . . . . . 84
B.5.3 Probability Distributions and Random Numbers . . . . . . . . . 85
B.5.4 Rounding Functions . . . . . . . . . . . . . . . . . . . . . . . . . 85
B.5.5 Statistical Functions . . . . . . . . . . . . . . . . . . . . . . . . . 86
B.5.6 Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
B.5.7 Trigonometric Functions . . . . . . . . . . . . . . . . . . . . . . . 86

C Mathematical Detail 89
C.1 Point Process Log-Likelihood Function . . . . . . . . . . . . . . . . . . . 89
C.2 Self-Exciting and ETAS Models . . . . . . . . . . . . . . . . . . . . . . . 90
C.2.1 Self-Exciting Models . . . . . . . . . . . . . . . . . . . . . . . . . 90
C.2.2 The ETAS Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
C.2.3 Utsu & Ogata’s Parameterisation . . . . . . . . . . . . . . . . . . 91
C.2.4 SSLib Parameterisation . . . . . . . . . . . . . . . . . . . . . . . 92
C.3 Stress Release Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
C.3.1 Simple Stress Release Model . . . . . . . . . . . . . . . . . . . . 93
C.3.2 Linked Stress Release Model . . . . . . . . . . . . . . . . . . . . 94
C.4 Simulation Using the Thinning Method . . . . . . . . . . . . . . . . . . 94
C.4.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
C.4.2 Simulation Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

References 97
Preface

The Statistical Seismology Library (SSLib) is a collection of earthquake hypocentral
catalogues and R functions to analyse those catalogues. The analyses include graphical
data displays, fitting of point process models, estimation of fractal dimensions, and
routines to apply the M8 Algorithm to given datasets. The Statistical Seismology Library
is written in the R language, and consists of a number of R packages. Each package has
its own Users Manual containing documentation for all functions within that package.

This Users Guide contains a sequence of examples showing how the functions can
be used and how they are related to each other. The guide is divided into three
parts. The first part gives an introduction to the R language, emphasising those features
that are important for an understanding of SSLib. In the second part, examples are given
for each package in SSLib, showing how the functions are related to each other. The
third part is more technical, and relates to system administration: installation
of software, software modification, and inclusion of local earthquake catalogues.

Contributions to SSLib have been made by: Ray Brownrigg, Edwin Choi, Robert
Davies, Michael Eglinton, David Harte, Dongfeng Li, Alistair Merrifield, Andrew Tokeley,
David Vere-Jones, Wenzheng Yang, Leon Young, Irina Zhdanova and Jiancang
Zhuang. Ray Brownrigg translated the original S-PLUS code (Harte, 1998), where
necessary, into R and packaged the various library parts into R packages.

Part I

The R Language

Chapter 1

Introduction to R

Like S-PLUS (Statistical Sciences Inc., 1992), R is a statistical programming language
(R Development Core Team, 2003) based on the S language (see Chambers & Hastie, 1991).
R originated at the University of Auckland in New Zealand, the original authors being
Ihaka & Gentleman (1996). An introduction to R is provided by the R Development
Core Team (2000), and another introduction has been written by Maindonald & Braun
(2003).

R refers to data structures, and functions containing programming code, as objects.
There are four object types in which we are particularly interested: vectors, matrices,
lists, and functions. In this chapter we discuss vector, matrix, and function objects, and
introduce simple graphics. More complicated data objects are introduced in Chapter 3.

Much programming code is included in this guide, generally without the R response
to the commands. It is intended that the reader has an R session running while viewing
this document with a PDF reader. One can then easily highlight the programming
statements in the PDF reader, paste them into the R window, and observe the response.

1.1 Starting R
Start R by entering
R

on the xterm or console command line. Your window will look something like:
david> R

R : Copyright 2004, The R Foundation for Statistical Computing


Version 2.0.1 (2004-11-15), ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.


You are welcome to redistribute it under certain conditions.
Type ’license()’ or ’licence()’ for distribution details.


R is a collaborative project with many contributors.


Type ’contributors()’ for more information and
’citation()’ on how to cite R or R packages in publications.

Type ’demo()’ for some demos, ’help()’ for on-line help, or


’help.start()’ for a HTML browser interface to help.
Type ’q()’ to quit R.

>

1.2 Quitting R
When you have completed your session within R, quit by entering q(). When you quit
R, it will ask whether you want to save any of the R objects that you may have created
during the session. You can list them by entering ls() on the command line. So far we
have not created anything, so there should be nothing listed. If you choose to save any
objects when quitting R they will be written to the disk into the directory from which
you started R, and into a file called .RData. Next time you start R from within this
subdirectory, the objects that have been saved into the file .RData will be automatically
loaded into the R session. This is discussed further in §2.5.

1.3 Vectors
1. Vectors are constructed using the c function, which stands for combine or
concatenate. For example, within the R window, enter
a <- c(1, 2, 3, 4, 5)
b <- c(2, 4, 6, 8, 10)
d <- c(3, 9, 27)

The <- means assign the value of the object on the right to an object with the
given name on the left. Thus we have three vectors a, b and d.
To save possible confusion, we prefer not to use c as a vector name, since it is the
name of a system function c (combine or concatenate). We will look at functions
more carefully in §1.6.

2. Basic arithmetic operators (^, *, /, +, -) act on an element by element basis.
Entering a*b on the command line gives c(2, 8, 18, 32, 50). Similarly, 2*b is
not ambiguous, and hence will give c(4, 8, 12, 16, 20). Similarly with the
other operators: a-b gives c(-1, -2, -3, -4, -5).
Now enter b*d. This is ambiguous, since b and d have different lengths, and will
produce a warning message. Often R will attempt to interpret the meaning, and
hence the given answer could be quite different to that intended.

3. Individual elements of a vector can be indexed using square brackets. There are
two methods of selecting the required elements from a vector:

(a) To select, for example, the 2nd element from vector b, enter b[2] on the
command line. To select the 2nd element twice and then the 4th element, enter
b[c(2,2,4)] on the command line. Notice that the indices are contained
within a vector, i.e. c(2, 2, 4).
(b) Alternatively, one can use a logical (Boolean) vector of the same length as b.
To select those elements in b that are greater than 6, create a logical vector
by entering e <- (b > 6). Now enter print(e), and you will have a vector
of the same length as b containing TRUE or FALSE depending on whether the
expression is true for each element in the vector. Those elements can now be
selected by entering b[e]. Alternatively, you could simply enter b[b > 6].

4. Often when data are collected, there are situations where some values are missing.
For example, the depths of some historical earthquakes are often missing. In the R
language, missing values (in numeric objects) are coded as NA without quotes. For
example, say we have four earthquakes, the first three have depths (km): 31, 150,
and 2, but the fourth is missing. These data would be assigned to the variable
depth as:
depth <- c(31, 150, 2, NA)

Any arithmetic operations that are performed on a missing value will give a missing
value, for example, try 2*depth.
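A brief sketch of how NA values behave follows, using the depth vector above; the is.na function locates missing values, and many summary functions accept an na.rm argument to skip them.

```r
# Depths (km) for four earthquakes; the fourth depth is missing
depth <- c(31, 150, 2, NA)

2*depth                   # arithmetic propagates NA: 62 300 4 NA
is.na(depth)              # locate missing values: FALSE FALSE FALSE TRUE
mean(depth)               # gives NA, since one value is missing
mean(depth, na.rm=TRUE)   # mean of the three observed depths: 61
```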

1.4 Matrices
1. One way to construct matrices (there are many ways) is as follows:
x <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), byrow=FALSE, ncol=2)
print(x)

This gives
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10

Like c, matrix is also a function. The statement above says to make a matrix
with 2 columns, called x, with elements 1, 2, ..., 10. The byrow=FALSE means
the matrix is filled column by column.
Similarly,
y <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), byrow=FALSE, ncol=5)
print(y)

gives

[,1] [,2] [,3] [,4] [,5]


[1,] 1 3 5 7 9
[2,] 2 4 6 8 10

2. As for vectors, the arithmetic operators (^, *, /, +, -) act on matrices element by
element; for example, y*y gives

[,1] [,2] [,3] [,4] [,5]


[1,] 1 9 25 49 81
[2,] 4 16 36 64 100

The same result would be given by entering y^2 or y**2. However, x*y will
produce an error.

3. Individual elements of the matrix can be selected, for example, entering x[4, 2]
gives 9. As for vectors, one could also index using a logical matrix of the same
dimensions. For example, enter:

z <- (x < 6)

The matrix z is a logical matrix with the same dimensions as x, containing
TRUE or FALSE depending on whether the corresponding element of x is less than 6.

4. Matrix multiplication is achieved with the symbol %*%, hence x %*% y gives

[,1] [,2] [,3] [,4] [,5]


[1,] 13 27 41 55 69
[2,] 16 34 52 70 88
[3,] 19 41 63 85 107
[4,] 22 48 74 100 126
[5,] 25 55 85 115 145

5. Character matrices (or vectors) can also be defined in the same manner, for
example:

colours <- matrix(c("red", "blue", "green", "cyan", "yellow", "magenta"),


byrow=TRUE, nrow=2)
print(colours)

This gives:

[,1] [,2] [,3]


[1,] "red" "blue" "green"
[2,] "cyan" "yellow" "magenta"

Names can be added to the rows and columns (see §3.3(4)).



1.5 Mode of an Object


In the subsections above, we looked at two types of data structures: vectors and matrices.
However, the contents of a were numeric, of e were logical and of colours were character.
This is referred to as the mode. The mode of an object can be determined by using the
mode function. Possible modes are: "logical", "numeric", "complex", "character"
and "function".
Recall from above that

a <- c(1, 2, 3, 4, 5)
b <- c(2, 4, 6, 8, 10)
x <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), byrow=FALSE, ncol=2)
e <- (b > 6)
colours <- matrix(c("red", "blue", "green", "cyan", "yellow", "magenta"),
byrow=TRUE, nrow=2)

1. Entering mode(a) or mode(x) gives "numeric".

2. Entering mode(e) gives "logical".

3. Entering mode(colours) gives "character".

4. Recall that c is the concatenation function, thus entering mode(c) gives "function".

5. Entering mode(matrix) gives "function".

1.6 Function Objects


1. A function is a sequence of commands or operations that are stored within an
object. We have already used a number of functions, provided by the R system,
in our discussion above, e.g. c and matrix. The functions in R are very similar in
nature to FORTRAN functions. The functions may have one or more objects as
input. They manipulate the objects in some specified manner, and one object is
passed out of the function at the end.

2. Function objects have mode equal to "function". By entering mode(matrix) on


the command line, R informs us that the object matrix is a function.

3. We can view the internal commands within a function by entering its name
(without brackets) on the command line; for example, enter matrix on the command
line.

4. Attaching brackets to a function name causes the function to be executed.
If no arguments are supplied within the brackets, the default values are used.
For example, entering matrix() will produce a matrix with one element, which is
assigned a missing value.

5. Two simple examples are the list function (ls) and the quit function (q). Recall
that one quits the R session by executing the quit function, i.e. q() (see §1.2).
One could view the function body by simply entering q.

6. There are in fact many functions within the R system: for performing mathematical
operations, fitting statistical models, and producing graphics. For example,
the object y %*% x represents a 2 × 2 matrix. The function solve(y %*% x) will
invert the matrix, and eigen(y %*% x) will calculate the eigenvalues and eigenvectors
of the 2 × 2 matrix y %*% x. Note that the output from the eigen function is
a list object (see §3.1), containing a vector of eigenvalues and a matrix of
eigenvectors.
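Continuing with the matrices x and y defined in §1.4, a brief sketch of these two functions:

```r
x <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), byrow=FALSE, ncol=2)
y <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), byrow=FALSE, ncol=5)

m <- y %*% x       # a 2 x 2 matrix
minv <- solve(m)   # its matrix inverse
m %*% minv         # approximately the 2 x 2 identity matrix

ev <- eigen(m)     # a list with components $values and $vectors
ev$values          # vector of eigenvalues
ev$vectors         # matrix whose columns are the eigenvectors
```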

1.7 Help Documentation


1. Help documentation can be found by entering help.start() on the command line.
When the web browser window appears, click on Packages, then find the entry for
“base” and click on it. This contains documentation for each function in the base
R package. Note that there are required arguments and optional arguments. The
optional arguments will always have default settings. Those arguments with no
default settings are required arguments and must be specified. The structure of
the object that is passed out of the function is described under “Value”. “Details”
contains information about the algorithm being used. All documentation for R
functions is set out in a similar manner.

2. Many help pages have a set of examples at the bottom that can be easily run. For
example, select the help information for eigen. The first example is:
eigen(cbind(c(1,-1),c(-1,1)))

which will calculate the eigenvalues and eigenvectors of the 2 × 2 matrix with ones
on the main diagonal and minus ones on the off-diagonal. The code can be
executed by highlighting it in the web browser, and pasting it onto the R command
line.

3. Documentation for each of the functions cited in this document can be found in
the web browser help window.

4. A summary of some commonly used R functions is given in Appendix B of this


document.

1.8 Writing Functions


1. The R system will never contain all of the functions that you will require, and
you will need to be able to write your own. One needs to be careful in selecting
function names that are meaningful and have not been used before. For example,

enter test on the command line. It will probably say that the object does not
exist, and hence we can use it as our function name.

2. Say we have a vector, and we want to multiply each element by 2 and add 1. We
want to pass this vector into the function, and pass the resulting vector out. This
is done as follows:
test <- function(invector){
outvector <- 2*invector + 1
return(outvector)
}

To execute the function using vector a in §1.3, enter test(a) on the command line.
Complicated functions can be written using both logical (Boolean) and looping
constructions. If arithmetic operations are applied to a logical vector, the elements
will be treated as zeros (FALSE) and ones (TRUE).

3. It is possible to call both FORTRAN and C++ code from within a function (see
§2.7).
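The zero/one coercion described in item 2 gives a compact way to count or average over a condition. As a sketch, countAbove below is a hypothetical function of our own that does the same count using explicit looping and logical constructions:

```r
b <- c(2, 4, 6, 8, 10)
sum(b > 6)    # number of elements greater than 6, i.e. 2
mean(b > 6)   # proportion of elements greater than 6, i.e. 0.4

# the same count written with a loop and a logical test
countAbove <- function(invector, threshold){
  count <- 0
  for (i in 1:length(invector)){
    if (invector[i] > threshold)
      count <- count + 1
  }
  return(count)
}
countAbove(b, 6)   # gives 2
```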

1.9 Graphs
In this subsection, a very brief outline of graphical methods in R is given.

1. Graphs (including maps) are written to a graphics device. R has a number of


these (see the topic “Devices” in the help documentation, §1.7), and the most
appropriate one to use is determined by what one wants to do with the graph. If
one wants to include the graph into an external document, then the postscript
or pdf functions are useful. They open a postscript or pdf file, respectively, into
which the graph is placed. If one wants to annotate the graph, then xfig is quite
useful. This will create a file that XFIG (rather like MacDraw) can read. Each
device is closed by dev.off(). If no device has been opened prior to running
graphics functions, a default device on the VDU will be opened.

2. Having opened an appropriate graphics device, one often wants to change various
parameters, for example: the number of graphs on the page, the axis layout,
available colours, font types, etc. Various options can be selected by using the par
function.

3. There are many functions to do various types of graphs. The most common are
plot, hist, curve, and barplot. For example, say we wanted to plot the cubic
function f (x) = x(x − 3)(x + 1) on the interval (−1.5, 3.5). This can be done by
entering:
x <- seq(-1.5, 3.5, 0.01)
f <- x*(x-3)*(x+1)
plot(x, f, type="l")

If no graphics device is open, then in Linux (or UNIX) R will usually open an X11
window automatically, and in Microsoft Windows a windows window.

4. Maps of coastlines can also be drawn, for example enter:


library(maps)
map("usa")

5. Many of the graphics functions contained in the Statistical Seismology Library
(SSLib) make extensive use of the above R-provided functions. These SSLib
functions will be discussed further in Chapter 6.
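As a sketch of the device workflow described in item 1, the cubic plot above can be written to a PDF file (the file name cubic.pdf is just an example):

```r
pdf("cubic.pdf", width=6, height=4)   # open the pdf graphics device
x <- seq(-1.5, 3.5, 0.01)
f <- x*(x-3)*(x+1)
plot(x, f, type="l")
dev.off()                             # close the device, completing the file
```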
Chapter 2

Input-Output Methods

2.1 Reading Data from a Text File


Listed below are earthquake events with magnitude ≥ 6.7 between 1965 and 1995 in or
near New Zealand.

Latitude Longitude Event Name Depth Magn Date Time


-41.76 172.04 Westport 12 6.7 23 May 1968 17:24:17.4
-34.94 179.30 Kermadec Trench 297 6.8 08 Jan 1970 17:12:36.6
-39.13 175.18 National Park 173 7.0 05 Jan 1973 13:54:27.6
-41.61 173.65 Marlborough 84 6.7 27 May 1992 22:30:36.1
-45.21 166.71 Secretary Island 5 6.7 10 Aug 1993 00:51:51.6
-43.01 171.46 Arthurs Pass 11 6.7 18 Jun 1994 03:25:15.2
-37.65 179.49 East Cape 12 7.0 05 Feb 1995 22:51:02.3

1. Assume that the data are stored in a file called “events.dat” in the format below.

-41.76 172.04 Westport 12 6.7 23 05 1968 17 24 17.4


-34.94 179.30 Kermadec_Trench 297 6.8 08 01 1970 17 12 36.6
-39.13 175.18 National_Park 173 7.0 05 01 1973 13 54 27.6
-41.61 173.65 Marlborough 84 6.7 27 05 1992 22 30 36.1
-45.21 166.71 Secretary_Island 5 6.7 10 08 1993 00 51 51.6
-43.01 171.46 Arthurs_Pass 11 6.7 18 06 1994 03 25 15.2
-37.65 179.49 East_Cape 12 7.0 05 02 1995 22 51 02.3

2. These can be read into a list object by using the scan function:

NZ1 <- scan("events.dat", what=list(latitude=0, longitude=0,


event="", depth=0, magnitude=0, day=0, month=0,
year=0, hour=0, minute=0, second=0))

Note that in the above use of scan, the blank character denotes the break between
fields. This is why the underscore has been used where a blank would normally
occur in the event name. An alternative method is to use commas, or some other
character as the delimiter.


3. The object NZ1 will be a list object, try the function is.list(NZ1). When we
print the object, i.e. print(NZ1), it prints as a list. Lists are discussed further in
§3.1.

4. Note that mode(NZ1$latitude) is "numeric" and mode(NZ1$event) is "character",
as required.

5. Now assume that the data stored in the file “events.dat” do not contain the
underscores, and some values are missing (unfortunately, often indicated by blanks),
e.g. the depth for Westport.

-41.76 172.04 Westport 6.7 23 05 1968 17 24 17.4


-34.94 179.30 Kermadec Trench 297 6.8 08 01 1970 17 12 36.6
-39.13 175.18 National Park 173 7.0 05 01 1973 13 54 27.6
-41.61 173.65 Marlborough 84 6.7 27 05 1992 22 30 36.1
-45.21 166.71 Secretary Island 5 6.7 10 08 1993 00 51 51.6
-43.01 171.46 Arthurs Pass 11 6.7 18 06 1994 03 25 15.2
-37.65 179.49 East Cape 12 7.0 05 02 1995 22 51 02.3

6. The use of a separator as above will not work here. In this situation, one reads each
complete line (record) into one character variable. The use of sep="\n" indicates
that the end of record denotes the next value. One then picks off the substrings
relating to the individual variables, and the numeric variables must be “coerced”
from character to numeric:

a <- scan("events.dat", what=character(), sep="\n")


NZ1 <- NULL
NZ1$latitude <- as.numeric(substr(a, 1, 6))
NZ1$longitude <- as.numeric(substr(a, 8, 13))
NZ1$event <- substr(a, 15, 30)
NZ1$depth <- as.numeric(substr(a, 32, 34))
NZ1$magnitude <- as.numeric(substr(a, 36, 38))
NZ1$day <- as.numeric(substr(a, 40, 41))
NZ1$month <- as.numeric(substr(a, 43, 44))
NZ1$year <- as.numeric(substr(a, 46, 49))
NZ1$hour <- as.numeric(substr(a, 51, 52))
NZ1$minute <- as.numeric(substr(a, 54, 55))
NZ1$second <- as.numeric(substr(a, 57, 60))
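An alternative sketch, assuming the same column positions as the substr calls above, uses read.fwf from the standard utils package; negative widths skip the one-character separator columns, and strip.white=TRUE lets the blank depth field be read as NA. Two sample records are written out first so the example is self-contained:

```r
# Two records padded to the fixed-width layout; Westport's depth is blank
writeLines(c(
"-41.76 172.04 Westport             6.7 23 05 1968 17 24 17.4",
"-34.94 179.30 Kermadec Trench  297 6.8 08 01 1970 17 12 36.6"),
"events.dat")

NZ2 <- read.fwf("events.dat",
                widths=c(6,-1,6,-1,16,-1,3,-1,3,-1,2,-1,2,-1,4,-1,2,-1,2,-1,4),
                col.names=c("latitude", "longitude", "event", "depth",
                            "magnitude", "day", "month", "year",
                            "hour", "minute", "second"),
                strip.white=TRUE)
print(NZ2$depth)   # NA 297: the blank field has become a missing value
```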

2.2 Including an R Program Source File


Usually programming commands will be contained within a text file, and this file is then
called from within R. This can be done by using the source function:

1. The text file can be written with any text editor, e.g. emacs, gedit, or notepad.
Create a file with the name “test.R”

2. Enter the required programming code into the file. For example:

a <- c(1, 2, 3, 4, 5)
b <- c(2, 4, 6, 8, 10)
# print the product of a times b
print(a*b)

Note that the hash character (#) starts a comment, which remains in effect until
the end of the line.

3. Save the file.

4. The commands within the text file can now be executed in R by typing

source("test.R")

on the R command line.

2.3 Writing R Objects to a Text File


We need to distinguish between function objects and data objects. In the case of function
objects, one may want to put the function code into a text file so that it can be modified,
then included back into R and executed. For example, by entering

dump("matrix", file="temp.R")

the code for the function matrix will be written into the text file “temp.R”. This code
could be edited and included back into R by using the source function (see §2.2).
Data objects that are required for use in programs outside of R will need to be
written to a text file. Some possibly useful functions are sink, print, and cat.
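As a small sketch of these functions (the file name depths.txt and the depth data are just for illustration):

```r
depth <- c(31, 150, 2)

sink("depths.txt")               # redirect printed output to the file
cat("Depths (km):", depth, "\n")
print(summary(depth))
sink()                           # close the file; output returns to the screen

# the file can now be inspected, e.g. with readLines("depths.txt")
```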

2.4 Writing Program Output to a Text File


R output can be written to a text file by using the sink function.
For example, enter sink("test.out") on the R command line. This tells R to direct
all output that would have normally gone to the screen to go to the file “test.out”. Now
enter source("test.R") again (created in §2.2 above). To close the file “test.out”,
enter sink(). Now, there should be a file called “test.out”, which can be viewed by
using a text editor.
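The sequence just described can be condensed as follows (here printing a small summary directly, rather than sourcing the file “test.R”):

```r
sink("test.out")                  # divert output from the screen to the file
print(summary(c(1, 2, 3, 4, 5)))
sink()                            # close the file; output returns to the screen
```

After running this, the file “test.out” contains the printed summary.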

2.5 Saving R Objects for Use in a Subsequent Session


When quitting from R one is asked whether the “workspace” should be saved (i.e. the
objects displayed when one runs ls()), see §1.2.
One can be more selective about the objects to save, together with the location and
name of the “Rda” file (i.e. the R binary file) by using the save function. For example,

a <- 10
b <- 15
d <- 21

save(a, b, file="temp.Rda")

will save the objects a and b only in an R format in the file “temp.Rda”. To save all
current objects, run:
save(list=ls(), file="temp.Rda")

2.6 Retrieving R Objects from a Previous Session


Saved objects can be reloaded into R by using the load function. For example, the file
“temp.Rda” created in §2.5 is loaded by executing the following command:
load("temp.Rda")

2.7 Executing FORTRAN and C++ from within R


Compiled FORTRAN and C++ code can be executed and linked to internal R objects.
This is done by using the functions .C and .Fortran (refer to their documentation).

2.8 Running Jobs in Batch Mode


It is often required to run an R program as part of a sequence of other jobs not being
executed in the R language. These jobs may be initiated by commands contained in an
executable unix file. To execute R source code contained in infile from outside R, the
appropriate command is
R CMD BATCH infile outfile

The output that would be normally written to the VDU in an interactive session will
now be written to outfile. For more information, enter
R CMD BATCH --help
Chapter 3

More Advanced Data Structures

3.1 List Objects


1. More complicated data structures can be constructed as list objects. For example,
say we wanted to put vector a, matrix x and the matrix colours into one object
called data (these objects were created in §1.3 and §1.4). This is done as follows:
a <- c(1, 2, 3, 4, 5)
x <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), byrow=FALSE, ncol=2)
colours <- matrix(c("red", "blue", "green", "cyan", "yellow", "magenta"),
byrow=TRUE, nrow=2)

data <- list(a, x, colours)

2. Now enter print(data) on the command line, to get


[[1]]:
[1] 1 2 3 4 5

[[2]]:
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10

[[3]]:
[,1] [,2] [,3]
[1,] "red" "blue" "green"
[2,] "cyan" "yellow" "magenta"

Notice that the matrix x is referred to as [[2]] within the list object data. Enter
data[[2]] on the command line to get x. The element in the 5th row and 2nd
column of x can be extracted as data[[2]][5,2].

3. If we want to retain their original names (or allocate new names), enter


data <- list(a=a, x=x, colours=colours)

Now x can be retrieved from the object data by entering either data[[2]] or
data$x.

4. Enter mode(data) to see that R recognises the object data as a list object (§1.5).
Enter names(data) to see the variable names within the list object called data.

5. Each part of the list will have its own mode (§1.5), e.g. enter mode(data$colours)
and mode(data$x).

3.2 Factors
A factor is essentially a coded variable, usually a character variable.
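As a minimal self-contained illustration (using invented values rather than the earthquake data below):

```r
f <- as.factor(c("deep", "shallow", "shallow", "deep"))
print(levels(f))       # the stored character levels
print(as.numeric(f))   # the underlying codes: 1 2 2 1
```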

1. Consider the dataset in §2.1. For simplicity, assume that the file “events.dat”
contains the data in the following format:
-41.76,172.04,Westport,12,6.7,23,05,1968,17,24,17.4
-34.94,179.30,Kermadec Trench,297,6.8,08,01,1970,17,12,36.6
-39.13,175.18,National Park,173,7.0,05,01,1973,13,54,27.6
-41.61,173.65,Marlborough,84,6.7,27,05,1992,22,30,36.1
-45.21,166.71,Secretary Island,5,6.7,10,08,1993,00,51,51.6
-43.01,171.46,Arthurs Pass,11,6.7,18,06,1994,03,25,15.2
-37.65,179.49,East Cape,12,7.0,05,02,1995,22,51,02.3

2. These data can be read into a list object by using the following statement:
NZ1 <- scan("events.dat", what=list(latitude=0, longitude=0,
event="", depth=0, magnitude=0, day=0, month=0,
year=0, hour=0, minute=0, second=0), sep=",")

3. Now assume that we want to divide the depth into two categories, "deep" and
"shallow". This can be achieved as:
NZ1$depth.cat <- c("deep", "shallow")[(NZ1$depth<40)+1]
print(NZ1$depth.cat)
print(is.character(NZ1$depth.cat))

Note that the “data” are stored as the character strings "deep" and "shallow",
which would be inefficient for a large dataset.

4. A factor stores this information by recording the character “levels”, here being
"deep" and "shallow", and then the data vector is a numeric variable of ones
and twos depending on whether the particular value is "deep" or "shallow",
respectively. The variable NZ1$depth.cat can be transformed into a factor as
follows:
NZ1$depth.cat <- as.factor(NZ1$depth.cat)

5. Observe that the variable is printed with the values of deep and shallow. The
levels can be extracted using the levels function.
print(NZ1$depth.cat)
print(levels(NZ1$depth.cat))
print(as.numeric(NZ1$depth.cat))
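The storage saving of a factor over the equivalent character vector can be checked directly with object.size (a sketch using artificial data):

```r
x <- rep(c("deep", "shallow"), 5000)   # 10000 character values
print(object.size(x))                  # the character vector
print(object.size(as.factor(x)))       # integer codes plus the two levels
```

For a long vector with few distinct values, the factor occupies noticeably less memory.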

3.3 Attributes
1. A variable can have a number of attributes. These are characteristics of the
variable, and some determine the manner in which the variable is printed, and so
on.

2. Consider the example in §3.2, in particular, the variable NZ1$depth.cat. The
attributes can be printed as:
print(attributes(NZ1$depth.cat))

Note that there are two attributes, levels and class as follows:
$levels
[1] "deep" "shallow"

$class
[1] "factor"

The levels were discussed in §3.2. The “class” attribute is a central concept in
the R language, and determines the manner in which other functions interact with
this variable. This is discussed further in §3.4.

3. Consider the matrix colours from §1.4:


colours <- matrix(c("red", "blue", "green", "cyan", "yellow", "magenta"),
byrow=TRUE, nrow=2)
print(attributes(colours))

The colours object only has one attribute, dim, being the dimensions of the matrix.

4. Column and row names can be added to the matrix as:


dimnames(colours) <- list(c("Primary", "Secondary"), c("R", "B", "G"))
print(colours)
print(attributes(colours))

Note that the row and column names are actually stored as attributes.

5. We can also attach our own attributes using the attr function. Again, consider
the data in §3.2. The magnitudes are on a “local” scale. This information could
be attached to the variable NZ1$magnitude as follows:

attr(NZ1$magnitude, "magn.type") <- "local"


print(NZ1$magnitude)
print(attributes(NZ1$magnitude))

Hence, we have attached an attribute called "magn.type" which has a value of
"local".

3.4 Class Attribute, Generic and Method Functions


When analysing data, we often want to print, plot or create a summary of the dataset.
These can be thought of as generic operations. However, the most appropriate way to
print, plot or summarise the data will often depend on the type of data. This is achieved
by giving the data a class, which then determines the method that is used to print, plot
or summarise the data.

1. We again refer to the dataset in §3.2. Enter class(NZ1$depth.cat); it will be
"factor".

2. Note that whenever an object name is entered on the command line followed by
Enter, for example NZ1$depth.cat, it is interpreted as print(NZ1$depth.cat).

3. The function print is generic. When print(object) is entered on the command
line, the print function checks to see what the class of object is. If object has
no set class, the function print issues the command
print.default(object)

4. Since class(NZ1$depth.cat) is "factor", the print function looks for another
function called print.factor, and since there is such a function, it issues
the command print.factor(NZ1$depth.cat). It is this function that causes
NZ1$depth.cat to be printed using the values deep and shallow rather than
ones and twos.

5. The function print.factor is referred to as the method to be used on objects of
class "factor" by the generic function print.

6. The function print.default will print the stored data without reference to the
class of the object. For example,
print.default(NZ1$depth.cat)

will simply give a vector of ones and twos, because this is how the data are stored.

7. It is possible to define our own generic functions and the associated methods.
These ideas are quite important and are used in the Statistical Seismology Library,
particularly to store the earthquake catalogues. One can also add methods for
system supplied generic functions.

8. Objects can have multiple classes. For example, the earthquake catalogues, which
will be discussed in §5.1, have classes "catalogue" and "data.frame". Hence a
generic function will first search for the appropriate method for "catalogue". If
this is not found, it will search for an appropriate method for "data.frame". If
there is no method found here either, it will simply use the default method for the
generic operation.
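These ideas can be sketched with a user-defined method for the generic function print (the class name "mycat" below is purely illustrative, not an SSLib class):

```r
x <- c(5.1, 6.3, 7.0)
class(x) <- "mycat"

# the method used by the generic print for objects of class "mycat"
print.mycat <- function(x, ...) {
    cat("mycat object containing", length(unclass(x)), "values\n")
    invisible(x)
}

print(x)                   # dispatches to print.mycat
print.default(unclass(x))  # prints the stored numbers directly
```

Here print(x) is dispatched to print.mycat because of the class attribute, exactly as print(NZ1$depth.cat) is dispatched to print.factor.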

3.5 Data Frame


Many datasets are matrix-like objects, where the rows represent observations, and the
columns represent variables. A data frame has these characteristics but with much more
flexibility. It is basically a list in which all of the variables must be of the same length,
but it can also be treated and displayed like a matrix. Further, the modes (and other
attributes) of the individual variables can be different.

1. Consider the earthquake data in §3.2. We read these data as in §3.2:


NZ1 <- scan("events.dat", what=list(latitude=0, longitude=0,
event="", depth=0, magnitude=0, day=0, month=0,
year=0, hour=0, minute=0, second=0), sep=",")

2. Now force the data into a matrix by using the column bind function:
NZ2 <- NULL
for (i in 1:length(NZ1)) NZ2 <- cbind(NZ2, NZ1[[i]])
print(NZ2)

This has part of the desired effect, in that each earthquake event is now represented
as a row in the matrix, and each variable as a column. However, since a matrix
has the same mode (see §1.5) for all elements, and one variable in NZ2 is character,
then all elements are transformed to character values.

3. The standard way to turn the object NZ1 into a data frame is with the following
statement: NZ3 <- as.data.frame(NZ1).

4. Now print NZ3, it should look like a matrix.

5. Note that the class of NZ3 is "data.frame", but it is still a list.


print(class(NZ3))
print(is.list(NZ3))

Hence, when we print NZ3, the print function uses the function print.data.frame.
This function causes the object to be printed like a matrix, even though it is a
list.

6. As for a list, each variable can have different modes and attributes:

print(mode(NZ3$latitude))
print(mode(NZ3$longitude))
print(mode(NZ3$event))

Notice that NZ3$event is also stored as a numeric variable! In fact, since it was
character in the original data, the as.data.frame function turns it into a factor
(see §3.2), hence it still prints “like” a character variable.

7. In the following, we again read the data from the text file and turn it into a
data frame. However, the I function attaches a class of "AsIs", which tells the
as.data.frame function not to turn NZ1$event into a factor. We also create a
depth category factor, and attach an attribute to the magnitude. All of these
characteristics can be stored within a data frame object.
NZ1 <- scan("events.dat", what=list(latitude=0, longitude=0,
event="", depth=0, magnitude=0, day=0, month=0,
year=0, hour=0, minute=0, second=0), sep=",")
NZ1$event <- I(NZ1$event)
NZ1 <- as.data.frame(NZ1)

NZ1$depth.cat <- c("deep", "shallow")[(NZ1$depth<40)+1]


NZ1$depth.cat <- as.factor(NZ1$depth.cat)
attr(NZ1$magnitude, "magn.type") <- "local"

print(NZ1)
print(mode(NZ1$event))
print(attributes(NZ1$event))
print(attributes(NZ1$magnitude))

8. Note that NZ1 is not a matrix (try print(is.matrix(NZ1))), though it can be
treated like a matrix. Thus the latitudes could be extracted as if it is a list, i.e.
NZ1$latitude or NZ1[[1]], or as if it was a matrix NZ1[,1]. The first event
occupies the first row, so can be extracted as NZ1[1,]. Further, concatenation
works in the same way as for matrices, for example, try cbind(NZ1, NZ1) and
rbind(NZ1, NZ1).
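The equivalences in item 8 can be verified on a small artificial data frame:

```r
df <- data.frame(latitude=c(-41.76, -34.94, -39.13),
                 magnitude=c(6.7, 6.8, 7.0))

print(identical(df$latitude, df[[1]]))  # list-style extraction
print(identical(df$latitude, df[, 1]))  # matrix-style extraction
print(df[1, ])                          # the first row (first event)
print(nrow(rbind(df, df)))              # concatenation works as for matrices
```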
Part II

The Statistical Seismology Library

Chapter 4

Statistical Seismology Library (sslib)

4.1 Introduction
The Statistical Seismology Library is a collection of R packages or libraries for the
analysis of seismological data. It consists of the individual packages: ssBase, ssEDA,
PtProcess, ssM8, Fractal, and various packages containing earthquake catalogues.
The packages can be attached individually, or collectively as shown below.
The ssBase package contains common functions utilised by more than one of the
SSLib packages.
The word “Library” in the name of SSLib is historical. SSLib was originally writ-
ten for S-PLUS, in which such add-on software was known as a library, and used the
library function to attach it. Now R distinguishes between a “library”, which is a
directory containing installed packages, and a package, which is a named component of
a library. However, the library function is still used to make a package available. In
R nomenclature, SSLib should be a “bundle”, although as yet it is not distributed as
such.
Lay & Wallace (1995) is a good general seismology text and will provide descriptions
of much of the seismological terminology used.

4.2 Attaching the Statistical Seismology Library


1. Start R if it is not already running (see §1.1).

2. The Statistical Seismology Library is attached by entering the command

library(sslib)

Note that R is case dependent. Something like the following should appear on
your screen.


> library(sslib)
Loading required package: ssBase
Loading required package: chron
Loading required package: ssEDA
Loading required package: maps
Loading required package: ssNZ
Loading required package: Fractal
Loading required package: ssM8
Loading required package: PtProcess
>

Note that SSLib is made up of a number of packages, as above. The specific package
sslib, loaded by calling library(sslib) as above, contains nothing itself, but
requires the collection of packages that make up SSLib. Thus the package
sslib simply provides a convenient method to load all parts of SSLib with one
command. We refer to the above collection of packages as the Statistical Seismology
Library (SSLib).

3. SSLib largely contains two types of R objects: earthquake catalogues and func-
tions. Documentation for each can be found by entering help.start(). At the
end of most function documentation, there is a set of examples that can be exe-
cuted.

4. The New Zealand earthquake catalogue will be loaded when library(sslib) is


executed, but only if the ssNZ package is installed on the local system. By default,
other earthquake catalogue packages will not be loaded. This is because differ-
ent users will require different catalogues, and may not even want all catalogue
packages loaded onto their computer. The PDE catalogue package, for example,
is loaded by running the command:
library(ssPDE)

5. If one always requires a particular catalogue to be loaded, one can modify the
sslib package to do this (see §12.1).
Chapter 5

The SSLib Base Package (ssBase)

A listing of the main functions in the ssBase package can be found in Appendix A.2,
and detailed documentation for all functions can be found in Harte (2003c).

5.1 Structure of Earthquake Catalogues in SSLib


All earthquake catalogues in SSLib are formatted in the same manner. This allows the
user to immediately analyse and fit models to data without the necessity of reformatting
the data. All catalogues in SSLib contain the following variables:

latitude (numeric) is the number of degrees north of the equator (positive) or south of
the equator (negative).

longitude (numeric) is the number of degrees east of Greenwich (i.e. between 0° and
360°). Note that events in the hemisphere west of the meridian through Greenwich
are not represented with negative longitudes. This ensures that the discontinuity
occurs at a longitude of 0°.

depth (numeric) is a positive number representing the event depth in kilometres.

magnitude (numeric) is the magnitude. It may have an attribute "magn.type" which
records the magnitude type.

time (numeric) with class "datetimes" being the number of days (and fractions) from
midnight on 1 January 1970. While the data are stored as the number of days
since 1 January 1970, the class of "datetimes" causes the data to be printed
in the format "ddmmmyyyy hh:mm:ss.s", where the number of decimal places for
seconds is defined as an attribute of the time variable.
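For example, a catalogue supplied with negative western longitudes can be converted to the convention above using the modulo operator (a sketch, not an SSLib function):

```r
longitude <- c(-75.00, 172.04, -120.50)  # mixed-sign longitudes
longitude <- longitude %% 360            # map into the interval [0, 360)
print(longitude)                         # 285.00 172.04 239.50
```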

You can add any other variables to the catalogue in addition to those listed above.
Catalogues are stored as list objects (see §3.1). They have two classes (see §3.4):
"catalogue" and "data.frame". This causes generic functions to first search for a
method (see §3.4) for "catalogue", and if one is not found, then use the method for


"data.frame". The catalogue also has an attribute called catname, being a character
string containing the name of the catalogue. This ensures that the object does not
“forget” its origin when passed in and out of other functions.

5.2 Creating a Simple Catalogue


Consider the following example.
1. Firstly read the data as in §3.2, i.e.
NZ1 <- scan("events.dat", what=list(latitude=0, longitude=0,
event="", depth=0, magnitude=0, day=0, month=0,
year=0, hour=0, minute=0, second=0), sep=",")

2. The datetimes function can be used to create the time variable. This function
is contained in the SSLib package ssBase, which needs to be loaded first if not
already done. For example, enter
library(sslib)
x <- datetimes(NZ1$year, NZ1$month, NZ1$day, NZ1$hour, NZ1$minute,
NZ1$second, dp.second=1)
print(x)

The argument dp.second=1 ensures that seconds are printed to one decimal place,
the number read from the text file. More information about the datetimes func-
tion can be found in the help documentation.
3. The function as.catalogue will not only calculate the time variable, but also
attach the necessary attributes to the catalogue object. This is run as follows:
as.catalogue(NZ1, catname="NZ1", dp.second=1)
print(NZ1)

Note that there is no assignment arrow to the left of the function call. The
assignment is done internally within the function.
4. Now print these various characteristics of the catalogue NZ1:
print(names(NZ1))
print(attributes(NZ1))
print(class(NZ1))

Note that an extra variable called missing.time has been added, while the vari-
ables year, month, day, hour, minute, and second have been deleted (see §5.3 for
more explanation). Note that both classes have been added, and the rows have
been sequentially numbered. Lastly note that the catalogue name has been added
as an attribute.
5. You can also add your own attributes to the catalogue objects and variables within
the object (e.g. magnitude), but do not use the same attribute names expected
by other functions. For example,

attr(NZ1, "note") <- "Event solutions determined using velocity model A"

5.3 The Time Variable


Catalogues are often distributed with the year, month, day, hour, minute and seconds
included as separate variables. As well as occupying considerably more space, and
hence memory when fitting models, this information is also redundant. Time is a one
dimensional continuous variable, and the coarser intervals of days, months and years
can easily be derived from this single variable. Hence, we only store the variable time
in the catalogue. However, this does not restrict the user from adding these variables
back if required.

1. Continuing the example in §5.2, enter print(NZ1). Notice that the time variable
is formatted with a date and time component. As noted above, the six original
date and time variables have been deleted. These data can easily be recalculated
from the time variable as follows:
print(years1(NZ1$time))
print(months1(NZ1$time))
print(days1(NZ1$time))
print(hrs.mins.secs(NZ1$time))

2. Now enter print(NZ1$missing.time). In this particular example, they are all
empty character strings. Recall that the time component consisted of the six
original variables: year, month, day, hour, minute, and second. Some of the great
historical events are only recorded to the closest year, and the remaining five
date/time variables are missing (unknown).
month, and so on. For the calculation of the time variable, missing hour, minute
or second values are set to zero; and missing month or day variables are set to
1. Hence an event that was recorded as occurring in Sept 1745, with the other
variables missing, would be calculated as 01Sep1745 00:00:00 (i.e. midnight).
We would then know that the time is approximate, because the missing.time
variable in the catalogue would be set to "D", the biggest component that is
missing. Values that missing.time can take are: "M" (month), "D" (day), "h"
(hour), "m" (minute), and "s" (second).

3. The time variable in a catalogue has class "datetimes", however, the variable is
numeric, being the number of days from a defined origin, by default 1 Jan 1970.
For example, enter:
print(attributes(NZ1$time))
print(is.numeric(NZ1$time))

Even though the dates are stored as the number of days from 1 Jan 1970, they are
printed in the format DDMMMYYYY hh:mm:ss.s because of the class "datetimes"
(see §3.4). The function print.default will print the data in the manner in which
it is stored, for example, try:

print(NZ1$time)
print.default(NZ1$time)

4. In modelling data, one often needs the flexibility to change the time origin, but
still requires the dates to be printed correctly. For example, we may want the
origin to be 1 Jan 1990. This can be done as follows:
NZ1$time <- NZ1$time - julian(1, 1, 1990, origin=attr(NZ1$time, "origin"))
attr(NZ1$time, "origin") <- c(month=1, day=1, year=1990)
print(NZ1$time)

The first statement is subtracting the number of days between the original origin
and 1 Jan 1990. The second statement updates the origin attribute so that the
dates are calculated (displayed) correctly.

5.4 More About Earthquake Catalogues


1. A summary of the catalogue can be produced by using the summary function,
which is generic. A method (see §3.4) for this generic function is provided by the
function summary.catalogue. Create a summary of the NZ1 catalogue by entering
a <- summary(NZ1)
print(a)

The object a is a list object. Enter names(a) to see the variables that it con-
tains. You can extract the individual components in the usual way, for example
a$missing.time.

2. The object a also contains information about the spatial and temporal range of
the catalogue, for example, enter print(a$ranges) and print(a$time.range).

3. Since the catalogue is also a data frame, it can be treated like a matrix. In
particular, see the discussion in §3.5(8).

5.5 Subsetting Catalogues and Subcatalogues


1. Note that all catalogues must have the variables: latitude, longitude, depth, time,
and magnitude. If these variables are not contained within the catalogue, then
the subsetting functions will not work. However, some catalogues do not have
all of these variables, for example, depth may not be available in some synthetic
catalogues. In this situation it is easiest to add a variable called depth, of the
same length as the other variables, but with all values being zero.

2. Usually one wants to analyse fairly small parts of earthquake catalogues. There
are four functions provided to subset catalogues: subsetcircle, subsetpolygon,
subsetsphere and subsetrect. Further information can be found about each
within the web browser help window.

3. Select the help page for subsetcircle, and highlight the following statements
that can be found in the Examples section:
data(NZ55)
a <- subsetcircle(NZ55, centrelat=-41.3, centrelong=174.8,
maxradius=100)
print(summary(a))

Paste these into the R command line.


4. The subset of events is described within the object a. By typing mode(a) we see
that a is a list, and names(a) tells us the names of the variables within the list
object a. The locations and times of the earthquake events are not stored within a,
only the indices of the required events and information about how the subsetting
was done. For example, type a$indices to display the indices. Objects created
by subsetrect and subsetpolygon are very similar.
5. An object that has been created by any one of the functions of subsetcircle,
subsetpolygon, or subsetrect has class "subset". Such an object is a list
object, but the print.subset function will not print out all of the indices, as
there are usually too many. For example, enter print(a). This is the same as
print.subset(a). If you want it set out just like a list object, with all of the
event indices too, then enter print.default(a).
6. Entering summary(a) is the same as summary.subset(a), and will produce a
character string giving the subset criteria. This description often appears as a
footnote on some graphs (see Chapter 6).
7. If one wants a relatively small amount of data from a larger catalogue, it may be
more efficient to create a subcatalogue. For example, we may want to select all
events in the NZ55 Catalogue with magnitude ≥ 7 and create a catalogue called
NZ7. This is done as follows:
b <- subsetcircle(NZ55, minmag=7)
as.catalogue(b, "NZ7")

The new catalogue called NZ7 will have the standard catalogue format, and so can
be treated like any other catalogue. For example, to select from the NZ7 catalogue
all events between 12:30:00 hrs on 2 Jan 1990 until 00:00:00 hrs on 31 July 1990
enter:
b <- subsetrect(NZ7, minday=julian(1,2,1990)+12/24+30/(24*60),
maxday=julian(7,31,1990))
summary(b)

Note that the julian function unfortunately uses the American ordering for the
date, i.e. month, day, year. The parameter ordering can be overridden by explicitly
stating the parameter names, e.g. julian(d=1, x=1, y=1970), with x meaning
month.

Chapter 6

Exploratory Data Analysis (ssEDA)

In this section we use catalogue data to demonstrate some of the graphical routines. A
listing of the main functions in the ssEDA package can be found in Appendix A.3, and
detailed documentation for all functions can be found in Harte (2003d).

6.1 Epicentral Plots


An earthquake hypocentre is the three dimensional coordinates of the point in the earth
where the rupture begins. In a large earthquake, the actual rupture may extend over
a number of kilometres, and the hypocentre is not necessarily in the middle of the
rupture. The epicentre of an earthquake event is the projection of the hypocentre onto
the surface of the globe.
An epicentral plot is a map overlaid with the epicentral locations of the required events.

6.1.1 NZ East Cape Event


This event (ML = 7.0) occurred off the East Cape of New Zealand (37.65°S, 179.49°E)
on 5 February 1995. It was followed by many aftershocks. In this example, we plot the
epicentral locations of the main event and some of its aftershocks.

1. The required events are found in the New Zealand catalogue. We must also attach
the ssEDA package if it has not already been attached as follows:
library(ssNZ)
library(ssEDA)

2. Initially select the required events from the New Zealand catalogue. We then make
a new temporary catalogue called “EastCape”. This catalogue can be treated like
any other earthquake catalogue in SSLib, though will be deleted at the end of this
R session if not saved.


a <- subsetcircle(NZ, minday=julian(1,1,1995),
     maxday=julian(1,1,1996), minmag=4.0, maxradius=150,
     centrelat=-37.65, centrelong=179.49)

as.catalogue(a, catname="EastCape")

3. We next use the epicentres function to draw the epicentral plot. However, this
function requires an object of class "subset", (i.e. output from either subsetcircle,
subsetpolygon, subsetrect, or subsetsphere). Since we want all events in the
“EastCape” catalogue, we include no restrictions in the subsetrect function call
below:
a <- subsetrect(EastCape)

epicentres(a, criteria=FALSE,
magnitude=c(4,5,6,6.9,7.1), cex=c(0.5,1,3,5),
usr=c(177.5, 181, -39, -36.5))

4. The magnitude and cex arguments tell the function to represent larger earth-
quake events with larger symbols (see help documentation for epicentres for
more details).

5. Notice that the map of the East Cape of New Zealand in the plot is terrible. This
is because, by default, it is using a world map of low resolution ("world2"). Also
within the maps package is a low resolution map of NZ ("nz"), which will provide
higher resolution in the East Cape area. The required map is specified in the
"mapname" argument. Redo the plot as follows:
epicentres(a, criteria=FALSE,
magnitude=c(4,5,6,6.9,7.1), cex=c(0.5,1,3,5),
usr=c(177.5, 181, -39, -36.5), mapname="nz")

6. The mapdata package contains high resolution maps. If this is installed on your
system, try:
library(mapdata)

epicentres(a, criteria=FALSE,
magnitude=c(4,5,6,6.9,7.1), cex=c(0.5,1,3,5),
usr=c(177.5, 181, -39, -36.5), mapname="nzHires")

7. We can easily enhance the plot by adding a title and various place names:
epicentres(a, criteria=FALSE,
magnitude=c(4,5,6,6.9,7.1), cex=c(0.5,1,3,5),
usr=c(177.5, 181, -39, -36.5), mapname="nz")
title(main="East Cape (NZ) Event", cex.main=1.8, font.main=1)

# Add some place names to the plot


text(x=178.1, y=-38.65, labels="Gisborne", adj=c(0,1), cex=1.2)
points(178, -38.65, pch=16, cex=2)
text(x=180.3, y=-38.25, labels="PACIFIC OCEAN", cex=1.2, font=3)

6.1.2 Significant New Zealand Events


Here we plot significant NZ events that occurred between 1 January 1960 and 31 De-
cember 1998.

1. Some significant NZ events are listed below. Copy these into a file called “events.dat”.
-41.76,172.04,Westport,12,6.7,23,05,1968,17,24,17.4
-34.94,179.30,Kermadec Trench,297,6.8,08,01,1970,17,12,36.6
-39.13,175.18,National Park,173,7.0,05,01,1973,13,54,27.6
-44.67,167.38,,12,6.5,04,05,1976,13,56,29.2
-46.70,166.03,,12,6.5,12,10,1979,10,25,22.1
-37.89,176.80,Edgecumbe,10,6.1,02,03,1987,01,42,35.0
-40.43,176.47,Weber,30,6.2,13,05,1990,04,23,10.2
-41.61,173.65,Marlborough,84,6.7,27,05,1992,22,30,36.1
-45.21,166.71,Secretary Island,5,6.7,10,08,1993,00,51,51.6
-43.01,171.46,Arthurs Pass,11,6.7,18,06,1994,03,25,15.2
-37.65,179.49,East Cape,12,7.0,05,02,1995,22,51,02.3

2. Attach the ssEDA library, read the events, and create a catalogue (discussed in
§5.2) called “NZ1”:
library(ssEDA)
NZ1 <- scan("events.dat", what=list(latitude=0, longitude=0,
event="", depth=0, magnitude=0, day=0, month=0,
year=0, hour=0, minute=0, second=0), sep=",")
as.catalogue(NZ1, catname="NZ1", dp.second=1)
print(NZ1)

3. Draw a plot containing a very low resolution map of NZ:


a <- subsetrect(NZ1)
epicentres(a, cex=1, usr=c(165, 180, -48, -34))

The map data is that from a low resolution version of the world map (default).

4. A better looking map can be drawn by using the low resolution version of the NZ
map, which is also contained in the maps package.
epicentres(a, cex=1, usr=c(165, 180, -48, -34), mapname="nz")

5. The mapdata package contains high resolution maps. If it is installed on your
system, attach and rerun as below:
library(mapdata)
epicentres(a, cex=1, usr=c(165, 180, -48, -34), mapname="nzHires")

Notice that the subsetting information is included as a footnote at the bottom of
the plot.

6. Now remove subsetting criteria shown at the bottom of the plot, use a symbol
colour that represents the depth, and a symbol size that represents the magnitude
of the event. Also annotate with some event names:

epicentres(a, mapname="nz", criteria=FALSE,
    usr=c(165, 180, -48, -34), depth=c(0, 20, 40, 100, 200, Inf),
    magnitude=c(6.0, 6.8, 7.0, 7.2, Inf), cex=c(1, 2, 3, 4))
title(main="Significant NZ Events Between 1960 and 1998", font.main=1)

# Add event names to the plot


text(x=166.8, y=-44.5, labels="Secretary\nIsland", adj=c(1,1), cex=1)
text(x=171.7, y=-42.8, labels="Arthurs\nPass", adj=c(0,1), cex=1)
text(x=170.3, y=-41.6, labels="Westport", adj=c(0,0), cex=1)
text(x=175, y=-38.2, labels="Edgecumbe", adj=c(0,0), cex=1)
text(x=179, y=-38, labels="East\nCape", adj=c(0,1), cex=1)
text(x=176.8, y=-40.5, labels="Weber", adj=c(0,1), cex=1)
text(x=177.4, y=-35, labels="Kermadec", adj=c(0,1), cex=1)
text(x=175.6, y=-39, labels="National\nPark", adj=c(0,1), cex=1)
text(x=176, y=-45, labels="PACIFIC OCEAN", cex=1.2, font=3)
text(x=169, y=-38, labels="TASMAN SEA", cex=1.2, font=3)

6.1.3 Circum-Pacific Events


In this example, we plot large events in the Pacific region.

1. The required events can be found in the PDE catalogue. Load the library
containing this catalogue and ssEDA:
library(ssPDE)
library(ssEDA)

2. Extract the events with magnitude ≥ 5 from the PDE catalogue and plot:
b <- subsetrect(PDE, minlong=90, maxlong=300, minlat=-80,
maxlat=80, minmag=5, minday=julian(1,1,1990),
maxday=julian(1,1,1993))

epicentres(b, usr=c(b$minlong, b$maxlong, b$minlat, b$maxlat),
    cex=1, horiz=TRUE, mapname="world2")
title(main="Events in Pacific Region", font.main=1)

6.2 Plate Subduction


6.2.1 Wellington Local Area
In this example, we plot the Pacific Plate subducting beneath the Australian Plate in
the Wellington Region of New Zealand. Events from the Wellington local network are
now included in the New Zealand Catalogue.

1. Initially create the Wellington Catalogue as follows:


library(ssEDA)
library(ssNZ)
as.catalogue(subsetrect(NZ, minlat=-42.1, maxlat=-40.5,
minlong=173.6, maxlong=176.0, minday=julian(1,1,1978)),
catname="Wellington")

2. Extract events with ML ≥ 2, with a maximum depth of 200 km, between 1 January
1978 and 31 December 1991 from the Wellington Catalogue:
b <- subsetrect(Wellington, minmag=2, minday=julian(1,1,1978),
maxday=julian(1,1,1992), maxdepth=200)

3. To view the subduction boundary, enter threeD(b). An XGobi window will
appear; maximise to fill the screen and also move the lines to enlarge the plotting
area. The depth will be on the vertical axis, with either longitude or latitude on
the horizontal axis. Pull down the menu at the top called View: XYPlot, and
select Rotation (r). Then click Pause so that it rotates (if it is not already
rotating). You can stop it by clicking Pause. The speed of the rotation can be
changed by dragging the bar in the slider window beneath the File and View menus.
The displayed viewing perspective has events with zero depth at the top of the
picture, and deepest events at the bottom. The “clocks” on the right describe
what is happening to each of the three spatial variables. One can easily see the
subducting plate slab. The lines at depth values of 5 km and 12 km are from poorly
determined events.
You can also move the points in whatever direction that you like. Stop the rotation
by clicking Pause. Then put the mouse cursor onto the plot, and holding down
the first mouse button, drag the points. If you lose your orientation,
click the button Reinit. This will reinitialise the picture to its original orientation.
Quit the XGobi window by pulling down the File menu, then Quit.

4. A high resolution plot can be done by using the function rotation. By viewing
roughly towards the north-east (essentially along the direction of the main
mountain range), the plate boundaries are approximately aligned. In particular, a
rotation of −40◦ (from north) is specified as follows:
rotation(b, theta=-40)
title(main="Plate Subduction in Wellington Region")

5. An epicentral plot can also display the subduction process. This is achieved by
plotting deeper events with a colour at the blue end of the spectrum up to shallow
events at the red end. The size of the plotting symbol represents the magnitude
of the event. Note that the usr argument specifies the axis limits of the plot.
epicentres(b, usr=c(173.55, 176.05, -42.13, -40.47),
depth=c(0, 30, 50, 70, 100, Inf), criteria=FALSE,
magnitude=c(2, 3, 4, 5, 6, Inf), mapname="nz")
title(main="Plate Subduction in Wellington Region", font.main=1)

6. A high resolution map can be drawn if the package mapdata is installed on your
system, as follows:
library(mapdata)
epicentres(b, usr=c(173.55, 176.05, -42.13, -40.47),
    depth=c(0, 30, 50, 70, 100, Inf), criteria=FALSE,
    magnitude=c(2, 3, 4, 5, 6, Inf), mapname="nzHires")
title(main="Plate Subduction in Wellington Region", font.main=1)

6.2.2 New Zealand Region


Plate subduction in the overall NZ region can be observed by viewing in a roughly
north-east direction, essentially in the direction of the main mountain ranges (see
discussion in §6.2.1).
1. Select the required events from the NZ Catalogue:
library(ssEDA)
library(ssNZ)
b <- subsetrect(NZ, minlong=170, maxlong=180, minlat=-43,
maxlat=-35, minmag=3, minday=julian(1,1,1970),
maxday=julian(1,1,1993))

2. Plot the subducting plate as follows:


rotation(b, theta=-47)
title(main="Plate Subduction in NZ Region")

3. The plate subduction can also be represented by an epicentral plot:


epicentres(b, depth=c(0, 30, 100, 150, 200, Inf),
usr=c(b$minlong, b$maxlong, b$minlat, b$maxlat),
mapname="nz", criteria=FALSE)
title(main="Plate Subduction in NZ Region", font.main=1)

6.3 General Statistics


In this section, we briefly display some other graphical data summaries.

6.3.1 Depth Distribution


1. The depth distribution of New Zealand events can be displayed using a histogram
as follows. We split the graphics page into two, for shallow and deep events, using
the par function:
library(ssEDA)
library(ssNZ)

par(mfrow=c(2,1))
a <- subsetrect(NZ, minday=julian(1,1,1965), maxday=julian(1,1,1995),
mindepth=0, maxdepth=39.99, minmag=4)
depth.hist(a)
title(main="Depth Distribution: Shallow Events")

a <- subsetrect(NZ, minday=julian(1,1,1965), maxday=julian(1,1,1995),
    mindepth=40, maxdepth=500, minmag=4)
depth.hist(a)
title(main="Depth Distribution: Deep Events")

Note the large peaks at 5 km, 12 km and 33 km. These are generally events with
poorly determined locations.
2. If one requires more control in the way that the histogram is drawn, it is easier
to create a temporary subcatalogue as follows, and then directly call the hist
function.
par(mfrow=c(1,1))
a <- subsetrect(NZ, minday=julian(1,1,1965), maxday=julian(1,1,1995),
mindepth=0, maxdepth=39.99, minmag=4)
as.catalogue(a, catname="temp")

hist(temp$depth, main="", xlab="Depth", breaks=seq(0, 40, 1))

6.3.2 Frequency-Magnitude Distribution


1. The Gutenberg-Richter relationship can be observed by doing a frequency-magnitude plot.
library(ssEDA)
library(ssNZ)

par(mfrow=c(1,1))
b <- subsetrect(NZ, minday=julian(1,1,1964), maxday=julian(1,1,1993),
mindepth=40, maxdepth=120, minmag=4)

freq.magnitude(b)

Note that the vertical axis label includes a subscript. Other mathematical
symbols (including Greek characters) can easily be included on plots, see the
topic plotmath in the help documentation.
2. The freq.magnitude function also calculates the b-value (i.e. the slope of the
line), and returns this value if it is assigned to another object as follows:
bvalue <- freq.magnitude(b)
print(bvalue)

Note that, by default, the b-value is estimated using the maximum likelihood
estimator. See the help documentation for freq.magnitude for further details.
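The maximum likelihood estimator can also be computed directly. The sketch below uses the classical Aki (1965) form, b = log10(e)/(mean(M) − M0); this is an illustration only, and the estimator actually used by freq.magnitude (including any refinement such as a correction for magnitude rounding) is described on its help page. The function name bvalue.mle is hypothetical, not part of the library:

```r
# Aki (1965) maximum likelihood estimate of the b-value (sketch only;
# freq.magnitude may apply refinements such as a binning correction)
bvalue.mle <- function(magnitude, M0) {
    log10(exp(1)) / (mean(magnitude) - M0)
}

# Check with simulated magnitudes: the Gutenberg-Richter law with
# b = 1 corresponds to exponential excesses with rate b*log(10)
m <- 4 + rexp(10000, rate = log(10))
print(bvalue.mle(m, M0 = 4))    # should be close to 1
```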

6.3.3 Time-Series of Event Counts


Annual counts of events can be plotted using the timeplot function:
library(ssEDA)
library(ssNZ)

par(mfrow=c(1,1))
b <- subsetrect(NZ, minday=julian(1,1,1961), maxday=julian(1,1,2001),
minmag=4)
timeplot(b)
title(main=expression(paste("Events in NZ Catalogue with ", M[L] >= 4)))

Note that the title includes a subscript and a “≥” sign. Other mathematical symbols
(including Greek characters) can easily be included on plots, see the topic plotmath in
the help documentation.
Chapter 7

Point Process Modelling (PtProcess)

A listing of the main functions in the PtProcess package can be found in Appendix
A.4, and detailed documentation for all functions can be found in Harte (2003b). Some
further mathematical details about the model formulation used by the package
PtProcess and various relationships can be found in Appendix C. A general text on point
process modelling is provided by Daley & Vere-Jones (2003).

7.1 Using Catalogue Data


Often one will want to extract a subset of data from an earthquake catalogue and fit a
point process model to these data. The earthquake catalogues store the time variable as
the number of days (and fractions of) from some arbitrary origin (see §5.3). By default,
this is 1 January 1970. Hence midnight on 1 January 1970 is “time zero”. When fitting
a point process model, “time zero” is generally the beginning of the observation period
of the event sequence of interest. Further, the magnitudes are generally shifted to be
the number of magnitude units above some threshold value M0 .
Say we want to fit a point process model to events in the New Zealand Catalogue
with magnitude ≥ 5 between 1 January 1965 and 31 December 2000.
1. These events would be extracted from the NZ Catalogue to make a new catalogue,
called x for example, as follows:
library(ssEDA)
library(ssNZ)
a <- subsetrect(NZ, minday=julian(1,1,1965),
    maxday=julian(1,1,2001), minmag=5)

as.catalogue(a, catname="x")

2. The magnitudes would be transformed so that they represent the number of
magnitude units above M0. Assuming that M0 = 5, then:
x$magnitude <- x$magnitude - 5


3. The times are stored as the number of days (and fractions) from some origin,
usually 1 Jan 1970, though not necessarily. The object x$time has class "datetimes"
which causes the dates to be printed in the usual way, i.e. day, month, year, etc.
To print the dates in the standard format, use print(x$time); the attributes
are listed by using print(attributes(x$time)); note the origin. Further, the
number of days from the origin is printed as print(as.vector(x$time)).

4. The times are transformed to the origin required by the point process model.
Assume that this is midnight on 1 January 1965. We want to ensure that the
julian function uses the same origin as that used in x$time. The julian function
below calculates the number of days from the current origin to 1 January 1965.
x$time <- x$time - julian(1,1,1965, origin=attr(x$time, "origin"))
attr(x$time, "origin") <- c(month=1, day=1, year=1965)

By resetting the origin attribute on the time variable, the times will still be printed
correctly, for example, check by printing the first one hundred events, i.e.
print(x[(1:100),])

Alternatively, the attributes can be stripped from x$time so that it will
always be printed as the number of days from 1 Jan 1965. This is done as:
x$time <- as.vector(x$time).

7.2 Conditional Intensity and Log-Likelihood Functions


1. Our numerical routines use the conditional intensity function as the basic building
block. All conditional intensity functions have the suffix “.cif”. These functions
not only evaluate λ(t), but also evaluate the integral of λ(t).

2. Below we fit the stress release model, using the conditional intensity function
srm.cif, to events in the NthChina dataset. This conditional intensity function
is defined in the PtProcess manual (see Harte, 2003b, for further details). These
data are stored as part of the PtProcess package. Read the events by entering:
library(PtProcess)
data(NthChina)

onto the R command line. The PtProcess package manual (Harte, 2003b)
contains more information about the North China data. Note that here the time
variable has already been scaled to represent the number of years from 1480 AD,
and the magnitude is the number of magnitude units greater than 6. Hence there
is no need for the adjustments discussed in §7.1 above.

3. The data span 517 years from 1480 AD. Set up a time interval variable TT as
follows:
TT <- c(0, 517)

4. Plot the event times by magnitude:


plot(NthChina$time, NthChina$magnitude+6, type="h",
xlab="Years Since 1480 AD", ylab="Magnitude", xlim=TT,
main="North China Historical Catalogue")

Six has been added to the magnitude so that the correct unadjusted value is
plotted.

5. Now we plot the conditional intensity on the interval TT. The stress release model
contains parameters a, b, and c which are specified in the params vector below, in
that order. Initially, we simply use the values as below, which turn out to be a
good approximation to the required values:
params <- c(-2, 0.01, 1)
ti <- seq(TT[1], TT[2], 0.5)
y <- srm.cif(NthChina, ti, params)
plot(ti, y, type="l", xlab="Years Since 1480 AD",
ylab=expression(lambda(t)))

The values of λ(t) for t = 0, 0.5, ..., 517 will be contained in y.

6. The integral of λ(t) on the interval (0, 517) can be calculated and placed into y by
entering
y <- srm.cif(NthChina, NULL, params, TT=TT)

7. For a given matrix containing the history of the process, a specific conditional
intensity function, the interval of evaluation TT, and the corresponding vector of
parameter values params, we can calculate the log-likelihood as
y <- pp.LL(NthChina, srm.cif, params, TT)

The object y now contains the log-likelihood.

7.3 Unconstrained Maximum Likelihood Estimation


Below we estimate the parameters for the stress release model when fitted to the North
China dataset.

1. Begin by attaching the North China dataset and creating a time interval variable
TT:
library(PtProcess)
data(NthChina)
TT <- c(0, 517)

2. To find the values of the parameters where the log-likelihood function is maximised,
we minimise the negative of the log-likelihood using the minimiser nlm. However,
nlm requires a function that only has the free arguments as parameters. Further,

it may also be the case that we want to constrain the parameters in some manner.
We do this by adding priors to the log-likelihood function to create a posterior
likelihood function. This will be discussed in §7.5. By default the make.posterior
function creates an unconstrained likelihood function. Now enter:
posterior <- make.posterior(NthChina, srm.cif, TT)

The object posterior is in fact a function, with only the parameter vector as an
argument. Enter posterior on the command line to view the function.

3. Since no priors were specified, the posterior function is simply the log-likelihood.
Enter
params <- c(-2, 0.01, 1)
posterior(params)

on the command line to again get the log-likelihood evaluated at params.

4. The maximum likelihood parameter estimates can be calculated by minimising
minus the log-likelihood, using the minimiser nlm. For example, enter:
neg.posterior <- function(params) (-posterior(params))
z <- nlm(neg.posterior, params, iterlim=1000, print.level=2)

The print.level argument is set so that values of the parameters are printed at
each iteration. The iterations start at the initial parameter values given by the
params argument. The object z is a list object, containing the maximum likelihood
parameter estimates (z$estimate), and convergence messages. One should also
check that the derivatives, contained in the output object z, are sufficiently small.
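For example, the convergence diagnostics returned by nlm can be inspected as follows (component names as documented for nlm):

```r
print(z$code)        # 1 or 2 indicates probable convergence
print(z$gradient)    # should all be close to zero at the minimum
print(z$minimum)     # the minimised negative log-likelihood
print(z$iterations)  # number of iterations performed
```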

5. Minimisation (or optimisation) is not straightforward. For the process to work
properly, the function nlm needs to know the relative scale of each of the
parameters. In the stress release model, the b parameter has a much finer possible
range of values than both of the other parameters. The typsize argument gives the
relative order of magnitude of step sizes that the minimiser should initially use in
its search for the minimum, e.g.
z <- nlm(neg.posterior, params, typsize=c(1, 0.01, 1), iterlim=1000,
print.level=2)

converges now in 14 iterations. Often the iteration procedure will get “lost” if
poor values are chosen for either of params or typsize.

6. Now plot the conditional intensity function using the maximum likelihood
parameter estimates, with the magnitude-time graph as follows:
par(mfrow=c(2,1))
ti <- seq(TT[1], TT[2], 0.5)
y <- srm.cif(NthChina, ti, z$estimate)

plot(ti, y, type="l", xlab="Years Since 1480 AD",
    ylab=expression(lambda(t)),
    main="North China Historical Catalogue")

plot(NthChina$time, NthChina$magnitude+6, type="h",
    xlab="Years Since 1480 AD", ylab="Magnitude", xlim=TT)

7. The residual point process can be calculated and plotted by entering


par(mfrow=c(1,1))
y <- pp.resid(NthChina, z$estimate, srm.cif)

plot(y, 1:nrow(NthChina), type="b", xlab="Transformed Time",
    ylab="Event Number")
abline(a=0, b=1, lty=3)

8. We may want to restrict the events over which we maximise the likelihood. For
example, say we wanted to maximise the likelihood over those events such that
100 < ti < 517. That is, the log-likelihood is
    Σ_{i: 100 ≤ t_i ≤ 517} log λ(t_i) − ∫_{100}^{517} λ(t) dt.

However, note that λ(t) = λ(t|Ht), and so λ(t) will be calculated using the
complete history that is supplied in the matrix NthChina, but the parameters a, b and
c (in this case) will be determined by maximising the log-likelihood as specified
above. Thus, events in the interval (0, 100) will be used to calculate the stress
function S(t). Effectively, the events in the interval (0, 100) are being used to get
the process to a steady state. This can be done by entering:
posterior <- make.posterior(NthChina, srm.cif, c(100,517))
neg.posterior <- function(params) (-posterior(params))
z <- nlm(neg.posterior, params, typsize=c(1, 0.01, 1), iterlim=1000)

7.4 Calculation of Standard Errors


Here we will calculate the standard errors of the parameter estimates for the stress
release model fitted to the North China data and maximised over the interval (0, 517).
1. The standard errors are calculated using the Hessian matrix. The function nlm
uses this matrix in its calculations, but we need to set the argument hessian to
TRUE so that it is included in the object z that is output from the function nlm.
Refit the model with the following statements:
library(PtProcess)
data(NthChina)
TT <- c(0, 517)
params <- c(-2, 0.01, 1)
posterior <- make.posterior(NthChina, srm.cif, TT)
neg.posterior <- function(params) (-posterior(params))
z <- nlm(neg.posterior, params, typsize=c(1, 0.01, 1), iterlim=1000,
print.level=2, hessian=TRUE)

Enter print(z$hessian) to see the Hessian matrix.

2. The covariance matrix of the parameter estimates can be estimated by calculating
the inverse of the Hessian matrix. This inverse is calculated by using the function
solve. This can be done as follows:
covariance <- solve(z$hessian)

However, be warned that often the Hessian calculations are very sensitive to the
value of the differencing step used by nlm and also the number of iterations that
were required to achieve convergence. If the minimisation converged within a very
small number of iterations, the estimate of the Hessian may be very poor.
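One simple check, sketched below, is to inspect the eigenvalues of the Hessian before inverting it; at a proper (local) minimum they should all be strictly positive, and a near-zero eigenvalue warns that some combination of the parameters is poorly determined:

```r
# Eigenvalues of the Hessian of the negative log-likelihood; all
# should be positive before the inverse is taken as a covariance
ev <- eigen(z$hessian, symmetric=TRUE, only.values=TRUE)$values
print(ev)
if (any(ev <= 0)) warning("Hessian is not positive definite")
```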

3. The standard errors can be extracted from the covariance matrix as follows:
stderr <- sqrt(diag(covariance))

4. The correlation matrix of the parameters can be calculated by pre- and
post-multiplying the covariance matrix by a diagonal matrix containing the inverse
of the standard errors. This is done as follows:
correlation <- diag(1/stderr) %*% covariance %*% diag(1/stderr)

5. To make interpretation easier, we can name the rows and columns of the matrices
as follows:
param.names <- c("a", "b", "c")
dimnames(correlation) <- list(param.names, param.names)
dimnames(covariance) <- list(param.names, param.names)
param.est <- cbind(z$estimate, stderr)
dimnames(param.est) <- list(param.names, c("Estimate", "StdErr"))

print(covariance)
print(correlation)
print(param.est)

7.5 Constrained Maximum Likelihood Estimation


Here we fit the ETAS model to events near Cape Palliser at the south of the North Island
of New Zealand. The required events are contained in the Palliser Catalogue which is in
the ssBase package. The Palliser Catalogue contains events from the original Wellington
Catalogue with magnitude ≥ 2.5, depth ≤ 40 km, between 1 Jan 1990 and 31 Dec 1991,
and within a 36 km radius of 41.684◦ S and 175.503◦ E.

1. Attach the Palliser Catalogue as follows:


library(PtProcess)
library(ssBase)
data(Palliser)

2. Reset the time origin to be 1 Jan 1990 (the start of the observation period). By
default, the time origin is 1 Jan 1970.
Palliser$time <- Palliser$time -
julian(1,1,1990, origin=attr(Palliser$time, "origin"))
attr(Palliser$time, "origin") <- c(month=1, day=1, year=1990)

3. Fit the unconstrained model. This is done in the same way as for the stress release
model, but with etas.cif instead.
Palliser$magnitude <- Palliser$magnitude - 2.5
TT <- c(0, julian(1,1,1992)-julian(1,1,1990))
initial <- c(0.025, 15, 1.3, 0.006, 1.21)

posterior <- make.posterior(Palliser, etas.cif, TT=TT)
neg.posterior <- function(params) (-posterior(params))
z <- nlm(neg.posterior, initial, iterlim=1000, hessian=TRUE,
typsize=c(1, 100, 1, 0.1, 1), print.level=2)

covariance <- solve(z$hessian)
stderr <- sqrt(diag(covariance))
correlation <- diag(1/stderr) %*% covariance %*% diag(1/stderr)
param.est <- cbind(z$estimate, stderr)

param.names <- c("mu", "A", "alpha", "CC", "P")
dimnames(correlation) <- list(param.names, param.names)
dimnames(param.est) <- list(param.names,
c("Estimate", "StdErr"))

print(param.est)
print(correlation)

4. Values of the parameter estimates and correlations should be as follows:


# Estimate StdErr
# mu 0.013826703 0.007349847
# A 17.078050145 8.284288353
# alpha 1.304035044 0.149650579
# CC 0.005754268 0.003330309
# P 1.213908647 0.059975967

# mu A alpha CC P
# mu 1.00000000 -0.1967434 0.05942277 0.2907219 0.45475125
# A -0.19674336 1.0000000 -0.54625117 -0.8955979 -0.56881814
# alpha 0.05942277 -0.5462512 1.00000000 0.2270125 0.07102283
# CC 0.29072192 -0.8955979 0.22701248 1.0000000 0.82064150
# P 0.45475125 -0.5688181 0.07102283 0.8206415 1.00000000

5. Draw a contour plot of the likelihood surface as a function of the c and p parameters
by copying the following statements:
w <- pp.contours(Palliser, z$estimate, etas.cif, TT=TT, param.index=c(4, 5),
steps.x=seq(0.0055, 0.0065, 0.0001),
steps.y=seq(1.17, 1.25, 0.005))
contour(w$x, w$y, w$LL, levels=seq(170, 213, 0.1))
title(main="ETAS Model: Likelihood Function Surface", xlab="c", ylab="p")
abline(v=z$estimate[4], lty=3)
abline(h=z$estimate[5], lty=3)

The argument levels=seq(170, 213, 0.1) instructs the contour function to
draw contours of the log-likelihood at 170.0, 170.1, ..., 213.0. Note that the main
axis of the ellipse is not parallel to either the x or y axes, which is consistent
with the high correlation between the c and p parameters.

6. If we want to constrain one or more model parameters, we describe these
constraints within a matrix. The matrix has one row for each model parameter
(including those that may be unconstrained). This is done by assigning a prior
distribution to each parameter. The matrix has three columns: the first column
is the name of the prior density, and the remaining two are parameter values
pertaining to the selected prior. Within the web browser help window, select
NIprior. The window that appears lists the possible prior distributions.

7. Here we will impose a Cauchy prior probability distribution on the parameter p,
with location parameter 1.2 and scale parameter 0.1. The shape of the
corresponding Cauchy density function can be plotted as follows:

plot(seq(0.8,1.8,0.01), dcauchy(seq(0.8,1.8,0.01), 1.2, 0.1), type="l")

8. We will also fix the value of c to be 0.006, achieved with the Dirac prior.

9. The required matrix can be formed by using the function prior.info. Select the
help documentation for this function from the web browser help window. The
required statements are:

y <- prior.info(density=c("NIprior", "NIprior", "NIprior",
        "Dirac", "Cprior"),
par1=c(0, 0, 0, 0.006, 1.2),
par2=c(Inf, Inf, Inf, 0.006, 0.1))

Print the matrix y to the screen. Note that there is one row for each parameter.
The first 3 parameters have been restricted to take positive values, c has been fixed
to 0.006, and the p parameter has been given a Cauchy prior with parameters 1.2
and 0.1.

10. Now we must make an R function to calculate the log-likelihood together with
the prior distributions (i.e. weights or penalty functions). This is achieved by
entering:

posterior <- make.posterior(Palliser, etas.cif, TT=TT, prior.info=y)
neg.posterior <- function(params) (-posterior(params))

Type posterior on the R command line to view the function. Notice that it is
only a function of the free parameters. It not only calls the log-likelihood function,
but also the specified prior distributions.

11. We now maximise the log-likelihood subject to these constraints. Note that we
now only estimate four parameters, hence the vector initial is only of length 4.
This is done by entering:
initial <- c(0.015, 15, 1.3, 1.2)
z1 <- nlm(neg.posterior, initial, iterlim=1000,
typsize=c(1, 100, 1, 1), print.level=2)

When the calculations stop, print out the object z1.

12. The full parameter vector can be reconstructed as
z1$full <- c(z1$estimate[1:3], 0.006, z1$estimate[4])

13. Now calculate the log-likelihood function and AICs using both the unconstrained
and constrained solutions by entering:
LL <- pp.LL(Palliser, etas.cif, z$estimate, TT=TT)
LL1 <- pp.LL(Palliser, etas.cif, z1$full, TT=TT)
AIC <- -2*LL + 2*5
AIC1 <- -2*LL1 + 2*4

Note that the two likelihoods and AICs are very similar.

7.6 Modifying Conditional Intensity Functions


1. The conditional intensity function for the linked stress release model for two
regions can be modified so that the transfer matrix is symmetric as follows:
symmetric <- function(data, eval.pts, params, TT = NA){
# conditional intensity for symmetric coupled SRM
# params <- c(a1, a2, b1, b2, c11, c12, c22)
# where c12 = c21
ci <- linksrm.cif(data, eval.pts, params[c(1:5,6,6,7)], TT = TT)
return(ci)
}

Note that the function symmetric is a function of 7 parameters, and internally,
simply calls the function linksrm.cif. This new function can then be used in the
standard manner.

2. The function cif.reformat can be used to achieve the same as above, and more.
It maps the parameter space of a given conditional intensity function to a lower
number of dimensions. See the help documentation (Harte, 2003b) for examples
using cif.reformat.

3. A linear trend component could be added to the standard ETAS model by defining
a new function as follows:
etas.plus.trend <- function(data, eval.pts, params, TT = NA){
# conditional intensity for ETAS plus linear trend
# params <- c(mu, A, alpha, CC, P, trend.slope)
ci <- etas.cif(data, eval.pts, params[1:5], TT = TT) +
poly.cif(data, eval.pts, c(0,params[6]), TT = TT)
# the constant in the polynomial is zero as it is the same as mu
return(ci)
}

This new function could also be used in the standard manner. Any additive
combination of the conditional intensity functions can be made. In general,
multiplicative combinations will not work, as the integral terms will not be correct.

7.7 Simulating Point Process Models


We again use the Cape Palliser data as in §7.5. We start by fitting the ETAS model as
in §7.5, then simulating data for the two years after the observation period.
1. Fit again the unconstrained ETAS model as in §7.5:
library(PtProcess)
library(ssBase)
library(ssEDA)
data(Palliser)

Palliser$time <- Palliser$time -
    julian(1,1,1990, origin=attr(Palliser$time, "origin"))
attr(Palliser$time, "origin") <- c(month=1, day=1, year=1990)

Palliser$magnitude <- Palliser$magnitude - 2.5
TT <- c(0, julian(1,1,1992)-julian(1,1,1990))
initial <- c(0.025, 15, 1.3, 0.006, 1.21)

posterior <- make.posterior(Palliser, etas.cif, TT=TT)
neg.posterior <- function(params) (-posterior(params))
z <- nlm(neg.posterior, initial, iterlim=1000, hessian=TRUE,
    typsize=c(1, 100, 1, 0.1, 1), print.level=2)

The estimated parameters are contained in the vector z$estimate.

2. The b-value can be estimated by using the freq.magnitude function. It requires
input from a "subset" function. Since we want all data in the Palliser Catalogue,
this is achieved by including no subsetting restrictions.
a <- subsetrect(Palliser)
bvalue <- freq.magnitude(a)
print(bvalue)

The estimated b-value should be 1.068027.



3. Now use the estimated model parameters to simulate data for 1992 and 1993.
Recall that our date origin is 1 January 1990, hence the simulation interval bounds
are calculated as:
T1 <- julian(1, 1, 1992, origin=c(month=1, day=1, year=1990))
T2 <- julian(1, 1, 1994, origin=c(month=1, day=1, year=1990))
print(T1)
print(T2)

4. The simulation is performed as:


sim <- pp.sim(Palliser, z$estimate, etas.cif, c(T1, T2),
output=TRUE, seed=100, magn.sim=bvalue)

The “seed” argument, an integer, causes the random number generator to start at
the same place each time. It is quite useful when one wants to regenerate the same
random numbers. As well as the simulated events being written to the object sim,
they are also written to the screen.

5. A magnitude-time plot of the observed events in 1990 and 1991, and the simulated
events for 1992 and 1993 can be plotted as below:
plot(sim[,"time"], 2.5 + sim[,"magnitude"], type="h",
xlab="Time", ylab="Magnitude", xlim=c(0, T2))
axis(3, at=c(365, 1070), tck=0,
labels=c("Observed Events", "Simulated Events"))
abline(v=T1, lty=3)
title(main=paste(c("mu", "A", "alpha", "c", "p"),
round(z$estimate, digits=3), sep="=", collapse=" "), line=3)

Note that 2.5 is being added to the magnitude, as this threshold value was
subtracted from Palliser$magnitude prior to the estimation stage above.

6. Sometimes simulations explode when using the ETAS model. This explosion
occurs because a stability requirement is not satisfied. Let k be the expected
number of "offspring" from a single "ancestor"; then the stability requirement is
that k < 1. The parameters satisfy the following relationship:
    k = A · c/(p − 1) · β/(β − α)
The value of k for the above simulations can be calculated as:
params <- z$estimate
beta <- bvalue*log(10)
k <- params[2]*params[4]/(params[5]-1)*beta/(beta-params[3])
print(k)

For the current example, k = 0.978015.

7. By decreasing the b-value sufficiently, k will become large enough that the
process has a high chance of exploding. For example, try:

bvalue <- 0.8
beta <- bvalue*log(10)
k <- params[2]*params[4]/(params[5]-1)*beta/(beta-params[3])
print(k)

sim <- pp.sim(Palliser, z$estimate, etas.cif, c(T1, T2),
    output=TRUE, seed=100, magn.sim=bvalue)

If you need to stop it, hold down the “control” key and press “c”. Try with a
number of different values for the seed. When b = 0.9 the process should explode
less frequently than when b = 0.8.

8. The observed and simulated events can be plotted in the same manner as above.

9. Other model parameters can be adjusted, and the effect on the simulations ob-
served. However, if you adjust other parameters, be careful that the criticality
conditions are satisfied.
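Since k depends on β = b log(10) through the factor β/(β − α), the critical b-value at which k = 1 can be solved for numerically. A minimal sketch, using illustrative parameter values in place of the fitted z$estimate:

```r
# Hypothetical ETAS parameters in the order (mu, A, alpha, c, p); in practice
# these would come from z$estimate after the fitting stage.
params <- c(0.05, 0.5, 1.5, 0.01, 1.2)
A <- params[2]; alpha <- params[3]; cc <- params[4]; p <- params[5]

# k as a function of beta:  k = A * c/(p-1) * beta/(beta-alpha)
k.of.beta <- function(beta) A*cc/(p - 1)*beta/(beta - alpha)

# Critical beta where k = 1 (the root must satisfy beta > alpha); the critical
# b-value is beta/log(10).  Simulations with smaller b tend to explode.
crit.beta <- uniroot(function(beta) k.of.beta(beta) - 1,
                     interval=c(alpha + 1e-6, 100))$root
print(crit.beta/log(10))
```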

In the simulations above, we have used the thinning method. This method of simu-
lation can be used for a wide class of point process models. This is not to say that it will
be the most efficient method for a given model. For example, a more efficient method
to simulate the ETAS model is to generate the mainshock event (ancestor) then its off-
spring, then the offspring produced by the first generation, etc. This process is repeated
until extinction for each particular family line. At this time, the aftershock sequence is
finished. This method of simulation exploits a characteristic of the ETAS model.
For large simulations, one also needs to use an appropriate computer language and
program structure. For example, R is slow when executing explicit loops; the same
algorithm runs much faster if it can be written in the form of matrix algebra.
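A minimal sketch of this branching (cluster) approach, not the SSLib pp.sim implementation, assuming the same parameterisation as above with illustrative values; each event produces a Poisson number of offspring, and the recursion continues until extinction:

```r
set.seed(100)
mu <- 0.05; A <- 5; alpha <- 1.5; cc <- 0.01; p <- 1.2
beta <- 0.9*log(10); T2 <- 365
# Subcritical here: k = A*cc/(p-1)*beta/(beta-alpha) is about 0.91 < 1.

# Background ("ancestor") events: homogeneous Poisson with rate mu on [0, T2];
# magnitudes are exceedances of the threshold, Exp(beta) distributed.
times <- sort(runif(rpois(1, mu*T2), 0, T2))
magns <- rexp(length(times), rate=beta)

gen.times <- times; gen.magns <- magns
while (length(gen.times) > 0) {
  new.times <- numeric(0); new.magns <- numeric(0)
  for (j in seq_along(gen.times)) {
    # Expected number of direct offspring of event j
    kj <- A*exp(alpha*gen.magns[j])*cc/(p - 1)
    n <- rpois(1, kj)
    if (n > 0) {
      # Offspring delays by inverse-CDF sampling of the hyperbolic density
      u <- cc*((1 - runif(n))^(-1/(p - 1)) - 1)
      tt <- gen.times[j] + u
      keep <- tt <= T2
      new.times <- c(new.times, tt[keep])
      new.magns <- c(new.magns, rexp(sum(keep), rate=beta))
    }
  }
  times <- c(times, new.times); magns <- c(magns, new.magns)
  gen.times <- new.times; gen.magns <- new.magns
}
ord <- order(times)
sim <- cbind(time=times[ord], magnitude=magns[ord])
```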
Chapter 8

M8 Algorithm (ssM8)

Examples using the M8 functions can be found within the help documentation for the
functions M8, M8.series, and M8.TIP.
A listing of the main functions in the ssM8 package can be found in Appendix A.5,
and detailed documentation for all functions can be found in the package manual, see
Harte (2003e).

Chapter 9

Hidden Markov Models (HiddenMarkov)

The package HiddenMarkov contains functions for the simulation and fitting of dis-
crete time hidden Markov models, where the hidden Markov process has m discrete
states.
Some of the code is quite inefficient; the emphasis so far has been on ensuring that
the code gives correct answers.
Detailed documentation and examples for all functions in the HiddenMarkov pack-
age can be found in the package manual, see Harte (2005).

Chapter 10

Fractal Dimension Estimation (Fractal)

A listing of the main functions in the Fractal package can be found in Appendix A.6,
and detailed documentation for all functions can be found in the package manual, see
Harte (2003a).
An example using the Hill estimator to calculate the Rényi dimensions of the Can-
tor measure can be found in the help documentation for the function hill. Further
theoretical details can be found in Harte (2001).

Part III

System Administration

Chapter 11

Software Installation

The Statistical Seismology Library cannot be installed until the R language has been
installed onto the computer.

11.1 Installation or Updating of the R Software


R is available in two forms: as source code, or as a binary distribution. The binary distri-
butions are “compiled” versions of the source code. Source and binary distributions can
be downloaded from the Comprehensive R Archive Network (CRAN), at
www.cran.r-project.org/mirrors.html. It is much more difficult to install R from the
source code, so we describe here only the installation from the binary distribution.
R is available in binary form for a variety of systems, including Windows, Linux,
and Macintosh (OS X 10.2 and above). We will describe the installation for Windows
and RedHat Linux here, but other installation instructions are available from CRAN.

11.1.1 Linux
R is available for 5 different “flavours” of Linux, namely Debian, Mandrake, RedHat,
Suse and Vinelinux. Within each flavour, only some versions have binary versions of R
available. In general a particular version of R is available for those versions of Linux
that were current when that version of R was released (sometimes more than one version
of a particular Linux release might be considered to be current).
The following describes installing R for RedHat distributions of Linux using the
binary distribution. This is done by downloading the appropriate rpm file for your
operating system from CRAN. The downloaded rpm file can be placed into any directory.
The R software will generally be installed into the system directories. Hence one will
need to log in as “root” to ensure sufficient privileges to write into these directories.
The software is installed by issuing the command
rpm -ivh filename

in an XTERM window that is within the subdirectory containing the downloaded rpm
file. This will run the installation process, and cause the R software to be installed into


the appropriate system directories. At the completion of the job, the rpm file can be
deleted.
Software can subsequently be updated with a new version (i.e. new rpm file) as
rpm -Uvh filename

For more details about the use of the rpm command, enter
man rpm

on the XTERM command line.

11.1.2 Microsoft Windows


A single MS Windows binary will, in general, suit all versions of Windows, from
Windows 95 on. Download the Windows binary from CRAN. Install by double-clicking the
downloaded “.exe” file. In general you can accept all the default options during the
installation.

11.2 Installation or Updating of SSLib


The SSLib packages supplied on the SSLib web page are source distributions.

11.2.1 unix/Linux
The SSLib software will generally be installed into the system directories. Hence one
will need to log in as “root” to ensure sufficient privileges to write into these directories.
SSLib consists of individual R packages. An individual R package can be installed
by giving the following command at the XTERM prompt:
R CMD INSTALL packagename_*.*-*.tar.gz

where * denotes the version numbers. The XTERM window should be within the
directory that contains the package source file (usually a *.tar.gz) or the source directory.
Similarly, an individual package can be removed by issuing the following command:
R CMD REMOVE packagename

This removal process can be done from any directory, but only by the “root” user.

11.2.2 Microsoft Windows


The above method (§11.2.1) can be used to install packages on a MS operating system,
although a lot more support software needs to be installed first, and it is fairly
complicated.
The easiest way to install a SSLib package on a MS Windows system is to use a
Windows binary form of the package (i.e. pre-compiled for Windows, rather than the
source code). These are recognised by their “.zip” filename extension, rather than the
“.tar.gz” extension used for the source distribution. Then use the “Install package(s)
from local zip files...” option in the “Packages” menu. Some of these zip files are
available from the SSLib web page; see the “windows binaries” hyperlink.
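Alternatively, a downloaded binary package can be installed from within the R console itself; the filename below is illustrative:

```r
# repos=NULL tells install.packages() to install from a local file rather
# than download from a repository.
install.packages("ssBase.zip", repos=NULL)
```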
Chapter 12

Modifications and Additions to SSLib

12.1 Modification of “Required” Packages by sslib


The sslib package exists simply to provide an easy means of attaching all SSLib
packages at the beginning of an R session. If one wants to add one or more earthquake
catalogue packages to this collection, then uncompress (gunzip) and unpack the tar
archive containing the sslib package source code as follows:
tar -zxvf sslib_*.*-*.tar.gz

where * denotes the version numbers. Within the directory /sslib/R/, edit the file
“zzz.R”, in particular, add
require(CataloguePackageName)

below the other “require” statements. Then reinstall the package onto the operating
system.
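A hypothetical “zzz.R” might then contain require statements such as the following (the exact list of packages in the distributed file may differ):

```r
require(ssBase)
require(ssEDA)
require(PtProcess)
require(ssNZ)    # newly added earthquake catalogue package
```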

12.2 Writing a Package for Inclusion into SSLib


1. For the necessary style and layout, see the document by the R Development Core
Team (2002).

2. Checks of syntax, documentation, validity of examples, and so on, can be performed
by running:
R CMD check packagename

in the unix xterm window, where packagename is the name of the package.
Note that the directory containing the package source code will have the name
packagename too. For further details, enter
R CMD check --help


in the unix xterm window.

3. The tape archive file (*.tar.gz) can be created when the development of the source
code in the directory packagename is complete. This file is created by entering:
R CMD build --force packagename

in the unix xterm window. It will include the version numbers in the tar.gz
file name (e.g. ssBase_1.2-5.tar.gz), which are read from the DESCRIPTION
file within the source directory. It also performs a few other checks. For more
information, enter:
R CMD build --help

12.3 Creating an Earthquake Catalogue Package


The package must conform to the rules outlined in §12.2. The easiest way to construct a
catalogue package is to mimic the style of other catalogue packages, for example ssNZ.
Part IV

Appendices

Appendix A

Main SSLib Functions

The main functions in SSLib are listed below under their respective package name.

A.1 Earthquake Catalogue Packages


Italy Italian Instrumental Catalogue (package ssItaly)
CPTI Catalogo Parametrico Dei Terremoti Italiani (Historical Italian Events, package ssItalyHist)
NZ NZ Catalogue (package ssNZ)
PDE PDE Catalogue (package ssPDE)
SCEC Southern California Earthquake Center Catalogue (package ssSCEC)
Taiwan Taiwan Catalogue (package ssTaiwan)

A.2 Base Package (ssBase)


For an overview of the ssBase package, see Figure A.1. Detailed documentation for
each function in the ssBase package can be found in Harte (2003c).

A.2.1 Basic Catalogue Characteristics


names Names Attribute of Object
print.catalogue Print Earthquake Catalogue
summary.catalogue Summary of Earthquake Catalogue

A.2.2 Catalogue Subsetting Functions


print.subset Method for Generic Function Print
subsetcircle Circular Subset of Events
subsetpolygon Polygon Subset of Events
subsetrect Rectangular Subset of Events
subsetsphere Spherical Subset of Events


Base Package

[Figure A.1 could not be recovered from the text extraction; visible node: as.catalogue.]

Figure A.1: Flow chart showing the relationship between objects in the ssBase package.

summary.subset Summary of Subset

A.2.3 Date Related Functions


datetimes Calculate Dates and Times
days1 Days of Datetime Value
format.datetimes Format a Datetime Object
hrs.mins.secs Calculates Hours, Minutes and Seconds
julian Calculate Julian Date
months1 Months of Datetime Value
print.datetimes Method for Generic Function Print
years1 Years of Datetime Value

A.2.4 Miscellaneous Functions


epi.circle Calculate Epicentral Coordinates of a Circle
lattice Lattice of Evenly Spaced Points on Sphere
lattice.retrieve Determine Lattice Indices for Given Circle
projection Transforms Spherical Coordinates to Cartesian

A.3 EDA Package (ssEDA)


For an overview of the ssEDA package, see Figure A.2. Detailed documentation for
each function in the ssEDA package can be found in Harte (2003d).

A.3.1 Graphical Summaries


bvalue.contour b-Value Contours at Specified Depth
depth.hist Depth Histogram
epicentres Epicentral Plot of Selected Events
freq.cusum Frequency Cusum Plot
freq.magnitude Frequency Magnitude Plot
magnitude.contour Mean Magnitude Contours at Specified Depth
magnitude.cusum Cusum Magnitude Plot
magnitude.time Magnitude Time Plot
rotation Rotates Events to View Plate Boundary
threeD Dynamic 3D Plot of Earthquake Hypocentres
time.plot Plots Event Frequencies by Time

A.3.2 Probability Distributions Originating in Seismology


dkagan Density of the Kagan Distribution
est.kagan Estimate Parameters of Kagan Distribution
pkagan Cumulative Probability of the Kagan Distribution
qkagan Quantiles of the Kagan Distribution

EDA Package

[Figure A.2 could not be recovered from the text extraction; visible node: multigraph.]

Figure A.2: Flow chart showing the relationship between objects in the ssEDA package.

rkagan Simulate the Kagan Distribution

A.3.3 Miscellaneous Functions


hemisphere Map of the Hemisphere
multigraph Plot Multiple Graphs
plot.subset Method for Generic Function Plot
projection Transforms Spherical Coordinates to Cartesian

A.4 Point Process Package (PtProcess)


For an overview of the PtProcess package, see Figures A.3 and A.4. Detailed documentation
for each function in the PtProcess package can be found in Harte (2003b).

A.4.1 Conditional Intensity Functions


etas.cif Conditional Intensity for ETAS Model
expfourier.cif Exponential Fourier Intensity Function
exppoly.cif Exponential Polynomial Intensity Function
fourier.cif Fourier Series Intensity Function
linksrm.cif Conditional Intensity for Linked SRM Model
poly.cif Polynomial Intensity Function
simple.cif Intensity Function for Simple Poisson Model
srm.cif Conditional Intensity for SRM Model

A.4.2 Prior Functions


prior.info Prior Distributions for Point Process Models
Cprior Cauchy Prior Distribution
Gprior Gamma Prior Distribution
Lprior Log-Normal Prior Distribution
NIprior Noninformative Prior Distribution
Nprior Normal Prior Distribution
Uprior Uniform Prior Distribution

A.4.3 Fitting and Evaluating Point Process Models


make.posterior Make Posterior Function for Point Process
pp.contours Contours of Log-Likelihood Surface
pp.data Data Matrix for Point Process Model
pp.eval Evaluate Predictive Power of Point Process
pp.infogain Point Process Information Gain
pp.LL Point Process Log-likelihood
pp.resid Residual Point Process Analysis

Point Process Package

Maximum Likelihood Parameter Estimation

[Figure A.3 flow chart; recoverable nodes: catalogue subsetting characteristics; specification of conditional intensity functions; prior distribution for each model parameter; put data into appropriate format (as.catalogue); calculate log-likelihood in specified time interval (pp.LL); write posterior function in appropriate format for optimiser (make.posterior); optimiser; maximum likelihood parameter estimates.]

Figure A.3: Flow chart showing the relationships between the required functions em-
ployed to find the maximum likelihood estimates of a point process model.
Point Process Package

Model Evaluation and Simulation

[Figure A.4 could not be recovered from the text extraction; visible node: as.catalogue.]

Figure A.4: Flow chart showing the relationship between the required functions em-
ployed to evaluate the goodness of fit and simulate a point process model.

pp.sim Simulate Point Process Model

A.4.4 Miscellaneous Functions


convert.reduced Parameter Mapping Between Models
emp.magn Magnitude Empirical Probability Distribution
linksrm.convert Parameter Conversion for Linked SR Model

A.5 M8 Package (ssM8)


See Figure A.5 for an overview of the ssM8 package. Detailed documentation for each
function in the ssM8 package can be found in Harte (2003e).

A.5.1 Main M8 Functions


decluster.M8 Decluster Catalogue Using M8 Method
M8 M8 Algorithm
M8.series M8 Algorithm
M8.TIP M8 Algorithm
plot.M8 Plot M8 Series

A.6 Fractal Package (Fractal)


Detailed documentation for each function in the Fractal package can be found in Harte
(2003a).

A.6.1 Simulate Fractal Processes


cantor Simulations from Cantor Measure
fgn Simulate Fractional Gaussian Noise
lorenz Simulations from Lorenz Attractor

A.6.2 Estimate Rényi Dimensions


dimension.plot Plot of Rényi Dimension Estimates
hill Hill Estimation of Rényi Dimensions Dq
phasecon Reconstruct Phase Space

M8 Package

[Figure A.5 could not be recovered from the text extraction; visible nodes: as.catalogue, decluster.M8, M8.series, M8, M8.TIP, plot.M8.]

Figure A.5: Flow chart showing the relationship between objects in the M8 package.
Appendix B

Common R Functions

This Appendix provides a list of some of the many R functions which are available.
Further information can be found by using the browser help facility.

B.1 Data Objects


B.1.1 Checking and Creating Different Data Types
as.character Character Objects
as.list List Objects
as.logical Logical Objects
as.matrix Matrix Objects
as.vector Vector Objects
character Character Objects
complex Complex Value Objects
integer Integer Objects
is.character Character Objects
is.list List Objects
is.logical Logical Objects
is.matrix Matrix Objects
is.vector Vector Objects
list List Objects
logical Logical Objects
matrix Matrix Objects
NULL Null Objects
numeric Numeric Objects
vector Vectors (Simple Objects)

B.1.2 Data Attributes


attr Attribute of an Object
attributes All Attributes of an Object


class Class Attribute of an Object


col Column and Row Identification in a Matrix
dim Dim Attribute of an Object
dimnames Dimnames Attribute of an Object
length Length of a Vector or List
levels Levels Attribute
mode Data Mode of the Values in a Vector
names Names Attribute of an Object
ncol Extents of a Matrix
nrow Extents of a Matrix
row Column and Row Identification in a Matrix

B.1.3 Data Storage


assign Assign Object to Database or Frame
attach Add to or View the Search List
data.matrix Convert a Data Frame into a Numeric Matrix
detach Detach Data from Search List
exists Search for an R Object
get Search for an R Object
library Shared Functions and Datasets
ls List of Datasets in Data Directory
objects Find R Object Names
remove Remove Objects from a Database
rm Remove by Name
scan Scan Data in Text File
search Add to or View the Search List

B.1.4 Lists
$ Extract or Replace Parts of an Object – Generic operator
[[ Extract or Replace Parts of an Object – Generic operator
as.list List Objects
c Combine Values into a Vector or List
is.list List Objects
lapply Apply a Function to Components of a List
length Length of a Vector or List
list List Objects
names Names Attribute of an Object
rev Reverse the Order of a Vector or List
sapply Apply a Function to Components of a List
split Split Data by Groups
unlist Simplify the Structure of a List

B.1.5 Matrices and Arrays


%*% Matrix Multiplication Operator
[ Extract or Replace Parts of an Object – Generic operator
apply Apply a Function to Sections of an Array
cbind Build Matrix from Columns
col Column and Row Identification in a Matrix
crossprod Matrix Cross Product
diag Diagonal Matrices
dim Dim Attribute of an Object
dimnames Dimnames Attribute of an Object
drop Drop Length-One Dimensions of an Array
is.array Multi-Way Arrays
is.matrix Matrix Objects
matlines Plot Columns of Matrices
matplot Plot Columns of Matrices
matpoints Plot Columns of Matrices
matrix Matrix Objects
ncol Extents of a Matrix
nrow Extents of a Matrix
outer Outer Product of Arrays
rbind Build Matrix from Rows
row Column and Row Identification in a Matrix
solve Solve Linear Equations and Invert Matrices – Generic function
t Matrix Transpose
tapply Apply a Function to a Ragged Array

B.2 Graphs
B.2.1 Add to Existing Plot
abline Plot Line in Intercept-Slope Form
arrows Plot Disconnected Line Segments or Arrows
axis Add an Axis to the Current Plot
box Add a Box Around a Plot
boxplot Box Plot
contour Contour Plot
identify Identify Points on Plot – Generic function
labels Labels for Printing or Plotting – Generic function
legend Put a Legend on a Plot
lines Add Lines or Points to Current Plot
mtext Text in the Margins of a Plot
points Add Lines or Points to Current Plot

polygon Shade in a Polygonal Figure


qqline Produce a Line Through a Normal QQ-Plot
segments Plot Disconnected Line Segments or Arrows
symbols Draw Symbols on a Plot
text Plot Text
title Plot Titling Information and/or Axis Labels

B.2.2 Graphical Devices


dev.off Turn Off Current Graphics Device
dev.print Copy Graphics Between Multiple Devices
graphics.off Turn Off All Graphics Devices
jpeg JPEG Bitmap Device
par Set Graphical Parameters
pdf PDF Device
pictex LaTeX/PicTeX Device
png PNG Bitmap Device
postscript Postscript Device
x11 X11 Device
xfig XFIG Device

B.2.3 High-Level Plots


barplot Bar Graph
boxplot Boxplots
bxp Boxplots From Processed Data
contour Contour Plot
dotchart Draw a Dot Chart
hist Plot a Histogram
pairs Produce All Pair-Wise Scatter Plots – Generic function
par Graphical Parameters
pie Pie Charts
plot Plots – Generic function
ppoints Plotting Points for QQ-Plots
qqnorm Quantile-Quantile Plots – Generic function
qqplot Quantile-Quantile Plots – Generic function
stem Stem and Leaf Display
symbols Draw Symbols on a Plot

B.2.4 Interacting with Plots


identify Identify Points on Plot – Generic function
locator Get Coordinates from Plot
menu Menu Interaction Function

B.3 Help Documentation


? On-line Information on Functions, Objects, and Calls
args Display the Argument List of a Function
help Online Documentation
help.start Help Window System
history Display and Re-Execute R Expressions
prompt Construct Documentation for Function or Data

B.4 Programming Constructs


B.4.1 Arithmetic Operators
%% Mod Operation
%/% Integer Division
* Multiplication
+ Addition
- Subtraction
/ Division
^ Exponentiation

B.4.2 Character Data Operations


abbreviate Abbreviate Character Strings
as.character Character Objects
character Character Objects
charmatch Partial Matching of Character Strings
grep Pattern Search in Character Strings
gsub Substitute Characters in Character Strings
make.names Make Character Strings into Legal R Names
match Match Items in Vector – Generic function
nchar Lengths of Character Strings
order Indices Representing a Sorted Vector
paste Glue Data Together to Make Character Data
sort Sort into Ascending Numeric or Alphabetic Order
sub(" +$", "", x) Remove trailing blanks from x
sub("^ +", "", x) Remove preceding blanks from x
substr Extract Substring
tolower Change to Lower Case
toupper Change to Upper Case

B.4.3 Control Flow


break Break from a Loop
else Conditional Expressions and Operators

for The Structure of R Expressions


if Conditional Expressions and Operators
ifelse Conditional Data Selection
next Break from a Loop
while Conditional Expressions and Operators

B.4.4 Data Manipulation


$ Extract or Replace Parts of an Object – Generic operator
-> Assign a Name to an Object
->> Assign a Name to an Object
: Sequences of Numbers
<- Assign a Name to an Object
<<- Assign a Name to an Object
[[ Extract or Replace Parts of an Object – Generic operator
[ Extract or Replace Parts of an Object – Generic operator
append Data Merging
c Combine Values into a Vector or List
cbind Build Matrix from Columns
charmatch Partial Matching of Character Strings
duplicated Unique or Duplicated Values in a Vector
edit Edits and replaces functions or datasets
fix Fix a function
grep Search for Pattern in Text
is.na Test For Missing Values – Generic Function
length Length of a Vector or List
make.names Make Character Strings into Legal R Names
match Match Items in Vector – Generic function
order Ordering to Create Sorted Data
paste Glue Data Together to Make Character Data
pmatch Partial Matching of Character Items in a Vector
rbind Build Matrix from Rows
rep Replicate Data Values
replace Data Merging
rev Reverse the Order of a Vector or List
seq Sequences of Numbers
sort Sort into Ascending Numeric or Alphabetic Order
sort.list Vector of Indices that Sort Data
split Split Data by Groups
structure An Object with Given Attributes
substr Get Portions of Character Strings
unique Unique or Duplicated Values in a Vector
unlist Simplify the Structure of a List

B.4.5 Input/Output
cat General Printing
count.fields Count the Number of Fields per Line
history Display, Edit, Re-evaluate and Save Past R Expressions
list.files List the Files in a Directory/Folder
page Page Through Data
print Print Data – Generic function
readline Read a Line from the Terminal
scan Input Data from a File
scan.fixed Input Data from a Fixed Format File
sink Send R Output to a File
source Parse and Evaluate R Expressions from a File
system Execute a system (unix) Command
write Write Data to ASCII File

B.4.6 Logical Operators


!= Comparison Operators
! Logical Operators
&& Conditional Expressions and Operators
& Logical Operators
<= Comparison Operators
< Comparison Operators
== Comparison Operators
>= Comparison Operators
> Comparison Operators
| Logical Operators
|| Conditional Expressions and Operators
all Logical Sum and Product
any Logical Sum and Product
is.finite Check IEEE Arithmetic Values
is.infinite Check IEEE Arithmetic Values
is.logical Logical Objects
is.na Test For Missing Values – Generic function
is.nan Check IEEE Arithmetic Values
is.numeric Check IEEE Arithmetic Values
logical Logical Objects
sign Signum Function and Comparison
xor Logical Operators

B.4.7 Miscellaneous
all.names Find All Names in an Expression
amatch Argument Matching

assign Assign Object to Database or Frame


browser Browse an Object – Generic function
charmatch Partial Matching of Character Strings
date Today’s Date and Time
eval Evaluate an Expression
history Display, Edit, Re-evaluate and Save Past R Expressions
integer Integer Objects
invisible Mark Function as Non-Printing
missing Check for Missing Arguments
mode Data Mode of the Values in a Vector
NULL Null Objects
parse Parse Expressions
readline Read a Line from the Terminal
return The Structure of R Expressions
substitute Substitute in an Expression
switch Evaluate One of Several Expressions
trace Trace Calls to Functions
traceback Print Call Stack
warning Error and Warning Messages

B.4.8 Pseudo Looping Functions (Apply)


apply Apply a Function to Sections of an Array
lapply Apply a Function to Components of a List
sapply Apply a Function to Components of a List
tapply Apply a Function to a Ragged Array

B.5 Technical Computations


B.5.1 Categorical Data
aggregate Compute Summary Statistics of Data Subsets
as.factor Factor
factor Factor
is.factor Factor
levels Levels Attribute
split Split Data by Groups
table Create Contingency Table from Categories
tabulate Count Entries in Bins

B.5.2 Miscellaneous Mathematical Functions


approx Linear Interpolation of Points
exp Exponential Function (base e)

gamma Gamma Function


lgamma Natural Logarithm of Gamma Function
log Logarithm
log10 Logarithm to Base 10
polyroot Find the Roots of a Polynomial
prod Product
rank Ranks of Data
sqrt Square Root
sum Sums

B.5.3 Probability Distributions and Random Numbers


In the probability distributions below ‘*’ should be replaced by ‘d’ for the density, ‘p’
for the cumulative distribution, ‘q’ for the quantiles, and ‘r’ for a random sample from
the distribution as appropriate.

*beta Beta Distribution


*binom Binomial Distribution
*cauchy Cauchy Distribution
*chisq Chi-Square Distribution
*exp Exponential Distribution
*f F Distribution
*gamma Gamma Distribution
*geom Geometric Distribution
*hyper Hypergeometric Distribution
*lnorm Lognormal Distribution
*logis Logistic Distribution
*nbinom Negative Binomial Distribution
*norm Normal (Gaussian) Distribution
*nrange Distribution of the Range of Standard Normals
*pois Poisson Distribution
*t Student’s t-Distribution
*unif Uniform Distribution
*weibull Weibull Distribution
*wilcox Distribution of Wilcoxon Rank Sum Statistic
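As an example of this naming scheme, for the normal distribution:

```r
dnorm(0)       # density at 0 (approximately 0.3989)
pnorm(1.96)    # cumulative probability (approximately 0.975)
qnorm(0.975)   # quantile (approximately 1.96)
rnorm(5)       # random sample of size 5
```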

B.5.4 Rounding Functions


abs Absolute Value
ceiling Smallest Integer Not Less Than
floor Largest Integer Not Greater Than
round Rounding Function
signif Rounding Function
trunc Round to Integer Towards to Zero

B.5.5 Statistical Functions


anova Analysis of Variance
cor Variance, Covariance, and Correlation
cumprod Cumulative Sums and Products
cumsum Cumulative Sums and Products
cummax Cumulative Maxima
cummin Cumulative Minima
density Estimate Probability Density Function
diff Create an Object of Differences
gamma Gamma Function (and its Natural Logarithm)
glm Generalised Linear Models
lm Fit Linear Model
max Extremes
mean Mean Value (Arithmetic Average)
median Median
min Extremes
ppoints Plotting Points for QQ-Plots
prod Sums and Products
quantile Empirical Quantiles
range Range of Data
sample Generate Random Samples or Permutations of Data
stem Stem and Leaf Display
sum Sums and Products
var Variance, Covariance, and Correlation

B.5.6 Time Series


diff Create an Object of Differences
time Create Time Vector
tsp Tsp Attribute of an Object
ts Time Series Objects
ts.plot Time Series Plot

B.5.7 Trigonometric Functions


acos Inverse Cosine Function
acosh Inverse Hyperbolic Cosine Function
asin Inverse Sine Function
asinh Inverse Hyperbolic Sine Function
atan Inverse Tangent Function
atanh Inverse Hyperbolic Tangent Function
cos Cosine Function
cosh Hyperbolic Cosine Function
sin Sine Function

sinh Hyperbolic Sine Function


tan Tangent Function
tanh Hyperbolic Tangent Function
Appendix C

Mathematical Detail

C.1 Point Process Log-Likelihood Function


Let Nδ (t) be the number of events in [t, t + δ), and Ht be the history of the process up
to but not including t. The conditional intensity function is defined as

λ(t|Ht) = lim(δ→0) (1/δ) Pr{Nδ(t) > 0 | Ht}.

Let τ be the time of the last event before time t, hence t > τ . Also let ∅(τ,t) be the
null outcome, i.e. no events in the interval (τ, t). Denote the conditional distribution of
the time of the next event as

F(t|Hτ ∩ ∅(τ,t)) = Pr{T ≤ t | Hτ ∩ ∅(τ,t)},

and f (t|Hτ ∩ ∅(τ,t) ) as the corresponding conditional density function. Then

λ(t|Hτ ∩ ∅(τ,t)) = f(t|Hτ ∩ ∅(τ,t)) / [1 − F(t|Hτ ∩ ∅(τ,t))].

Solving the differential equation gives


F(t|Hτ ∩ ∅(τ,t)) = 1 − exp( −∫(τ..t) λ(u|Hτ ∩ ∅(τ,u)) du ),

where τ is the time of the last event occurring before t. Rearranging gives the density
function as
f(t|Hτ ∩ ∅(τ,t)) = λ(t|Hτ ∩ ∅(τ,t)) exp( −∫(τ..t) λ(u|Hτ ∩ ∅(τ,u)) du ).

Say · · · < t−2 < t−1 < t0 < T1 < t1 < t2 < · · · < tn < T2 < tn+1 < tn+2 < · · ·,
where the ti (i ∈ Z) are event times, and [T1, T2] represents the interval over which we want


to maximise the likelihood. Then

log L(T1, T2)
  = log f(t1 | HT1 ∩ ∅(T1,t1)) + Σ(i=2..n) log f(ti | Ht(i−1) ∩ ∅(t(i−1),ti))
      + log[1 − F(T2 | Htn ∩ ∅(tn,T2))]
  = Σ(i=1..n) log λ(ti | Hti) − ∫(T1..t1) λ(u | HT1 ∩ ∅(T1,u)) du
      − Σ(i=2..n) ∫(t(i−1)..ti) λ(u | Ht(i−1) ∩ ∅(t(i−1),u)) du
      − ∫(tn..T2) λ(u | Htn ∩ ∅(tn,u)) du
  = Σ(i: T1 ≤ ti ≤ T2) log λ(ti | Hti) − ∫(T1..T2) λ(t | Ht) dt

Often “|Ht ” is omitted, and throughout this document, λ(t) is understood to mean
λ(t|Ht ).
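As a simple check of the final formula, for a homogeneous Poisson process with λ(t) = µ the log-likelihood reduces to n log µ − µ(T2 − T1); a minimal sketch:

```r
# Log-likelihood of a homogeneous Poisson process over [T1, T2] given the
# event times ti, from the final formula above.
pois.LL <- function(mu, ti, T1, T2) {
  length(ti)*log(mu) - mu*(T2 - T1)
}
pois.LL(0.5, c(1.2, 3.7, 8.4), 0, 10)   # 3*log(0.5) - 5
```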

C.2 Self-Exciting and ETAS Models


C.2.1 Self-Exciting Models
The general class of self-exciting models was introduced by A.G. Hawkes in 1970. For
a time-only model, it has conditional intensity of the form
λ(t) = µ + Σ(i: ti < t) γ(t − ti),

where µ ≥ 0 is a background rate, and γ(u) ≥ 0 (0 < u < ∞), γ(u) = 0 (u < 0), is
a “reproduction” function which describes the rate at which a “parent” at u = 0
produces “offspring” at time u. For stability we require ∫ γ(u) du < 1. Thus we may
also write

λ(t) = µ + k Σ(i: ti < t) g(t − ti),

where g has been normalized to form a probability density and k is the expected number
of direct “offspring” from a single “ancestor”. The stability requirement becomes k < 1;
k is sometimes called the “criticality” parameter. If k ≥ 1, or if ∫ γ(u) du = ∞, the
process “explodes”, i.e. if simulated, the overall rate of occurrence of offspring events
would grow larger and larger.
In principle, the model extends easily to situations where the conditional intensity
depends on space and magnitude or other additional variables. For example,
λ(t, x, M) = µ(x, M) + Σ(i: ti < t) γ(t − ti, x − xi, M | Mi),

where the background rate depends on the location and on the magnitude class being
considered, and γ(u, x, M |M0 ) gives the rate of production of offspring of magnitude M
at location x and time u after a parent event of magnitude M0 at the origin of space
and time. In practice we use only forms with a simplified structure. If magnitudes
are assumed to follow a density f (M ) independently of all other features, and the
background rate is homogeneous in space, the conditional intensity can be written in
normalized form as
\[
\lambda(t, x, M) = f(M) \left[ \mu + k \sum_{i:\, t_i < t} a(M_i) \, g(t - t_i,\; x - x_i) \right],
\]
where g(u, x) is a bivariate probability density; a(M) is a magnitude weighting factor such that a(M1)/a(M2) is the ratio of the expected numbers of direct offspring from parent events of magnitudes M1 and M2, normalized to satisfy $\int a(M) f(M) \, dM = 1$; and k can again be interpreted as a "criticality parameter".

C.2.2 The ETAS Model


This is a special form of a time-magnitude self-exciting model with independent magni-
tudes, homogeneous background rate, and hyperbolic form for the reproduction function.
Its conditional intensity can be written as
\[
\lambda(t, M) = f(M) \left[ \mu + k \sum_{i:\, t_i < t} a(M_i) \, g(t - t_i) \right]
\]
with
\[
f(M) = \beta e^{-\beta M}, \qquad
g(u) = \frac{p - 1}{c} \left( 1 + \frac{u}{c} \right)^{-p}, \qquad
a(M) = \frac{\beta - \alpha}{\beta} \, e^{\alpha M}.
\]
Stability conditions for the model are p > 1, k < 1, β > α. If these conditions are not satisfied, this form of parameterisation fails (densities may become negative), so care must be taken to ensure that the constraints are observed.
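For concreteness, the normalised ETAS intensity above can be transcribed directly. The following is an illustrative Python sketch only (SSLib's own implementation is in R, and all parameter values in the checks are invented):

```python
import math

def etas_intensity(t, M, events, mu, k, alpha, beta, c, p):
    # lambda(t, M) = f(M) * [mu + k * sum a(M_i) g(t - t_i)] over events t_i < t;
    # `events` is a list of (t_i, M_i) pairs
    f = beta * math.exp(-beta * M)                            # magnitude density
    s = 0.0
    for ti, Mi in events:
        if ti < t:
            a = (beta - alpha) / beta * math.exp(alpha * Mi)  # magnitude weight
            g = (p - 1) / c * (1 + (t - ti) / c) ** (-p)      # normalised time density
            s += a * g
    return f * (mu + k * s)
```

With an empty history the intensity is just f(M)µ, and each past event adds a contribution that decays hyperbolically in t, consistent with the stability conditions p > 1, k < 1, β > α.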

C.2.3 Utsu & Ogata’s Parameterisation


An alternative parameterisation is used by Utsu & Ogata (1997); it omits the magnitude density, which is just a multiplying factor. Their parameterisation is
\begin{align*}
\lambda(t) &= \mu + K \sum_{i:\, t_i < t} e^{\alpha(M_i - M_0)} (t - t_i + c)^{-p} \\
&= \mu + \frac{K}{c^p} \sum_{i:\, t_i < t} e^{\alpha(M_i - M_0)} \left( 1 + \frac{t - t_i}{c} \right)^{-p}.
\end{align*}

In this parameterisation the requirements p > 1, α < β are not explicitly enforced, so
that if the likelihood is maximized freely, the optimum may occur at a point where one
or other of these constraints is broken. This tends to happen in situations where there
is an increasing trend or other form of departure from the type of behaviour postulated
by the model, for example in modelling intermediate-depth earthquakes.
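The two displayed forms are algebraically identical, since K(t − ti + c)^{−p} = (K/c^p)(1 + (t − ti)/c)^{−p}. A quick numerical check (arbitrary illustrative parameter values, not fitted ones):

```python
K, c, p = 0.3, 0.02, 1.3            # arbitrary illustrative values
for u in (0.0, 0.5, 5.0):           # u = t - t_i, elapsed time since an event
    form1 = K * (u + c) ** (-p)
    form2 = (K / c ** p) * (1 + u / c) ** (-p)
    assert abs(form1 - form2) < 1e-9 * form1
```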

C.2.4 SSLib Parameterisation
\[
\lambda(t) = \mu + A \sum_{i:\, t_i < t} e^{\alpha(M_i - M_0)} \left( 1 + \frac{t - t_i}{c} \right)^{-p}
\]

If p ≠ 1, then:
\begin{align*}
\int_{t_{i-1}}^{t_i} \lambda(t) \, dt
&= \mu(t_i - t_{i-1}) + A \int_{t_{i-1}}^{t_i} \sum_{j:\, t_j < t} e^{\alpha(M_j - M_0)} \left( 1 + \frac{t - t_j}{c} \right)^{-p} dt \\
&= \mu(t_i - t_{i-1}) + \frac{Ac}{1 - p} \left[ \sum_{j:\, t_j < t} e^{\alpha(M_j - M_0)} \left( 1 + \frac{t - t_j}{c} \right)^{1-p} \right]_{t_{i-1}}^{t_i} \\
&= \mu(t_i - t_{i-1}) + \frac{Ac}{1 - p} \sum_{j:\, t_j < t_i} e^{\alpha(M_j - M_0)} \left( 1 + \frac{t_i - t_j}{c} \right)^{1-p} \\
&\qquad - \frac{Ac}{1 - p} \sum_{j:\, t_j \le t_{i-1}} e^{\alpha(M_j - M_0)} \left( 1 + \frac{t_{i-1} - t_j}{c} \right)^{1-p}
\end{align*}
\begin{align*}
\int_{t_{i-k}}^{t_i} \lambda(t) \, dt
&= \sum_{j=1}^{k} \int_{t_{i-k+j-1}}^{t_{i-k+j}} \lambda(t) \, dt \\
&= \mu(t_i - t_{i-k}) + \frac{Ac}{1 - p} \sum_{j:\, t_j < t_i} e^{\alpha(M_j - M_0)} \left( 1 + \frac{t_i - t_j}{c} \right)^{1-p} \\
&\qquad - \frac{Ac}{1 - p} \sum_{j:\, t_j < t_{i-k}} e^{\alpha(M_j - M_0)} \left( 1 + \frac{t_{i-k} - t_j}{c} \right)^{1-p} - \frac{Ac}{1 - p} \sum_{j=i-k}^{i-1} e^{\alpha(M_j - M_0)} \\
&= \mu(t_i - t_{i-k}) + \frac{Ac}{1 - p} \sum_{j:\, t_j < t_i} e^{\alpha(M_j - M_0)} \left[ \left( 1 + \frac{t_i - t_j}{c} \right)^{1-p} - 1 \right] \\
&\qquad - \frac{Ac}{1 - p} \sum_{j:\, t_j < t_{i-k}} e^{\alpha(M_j - M_0)} \left[ \left( 1 + \frac{t_{i-k} - t_j}{c} \right)^{1-p} - 1 \right]
\end{align*}
\begin{align*}
\int_{T_1}^{T_2} \lambda(t) \, dt
&= \mu(T_2 - T_1) + \frac{Ac}{1 - p} \sum_{j:\, t_j < T_2} e^{\alpha(M_j - M_0)} \left[ \left( 1 + \frac{T_2 - t_j}{c} \right)^{1-p} - 1 \right] \\
&\qquad - \frac{Ac}{1 - p} \sum_{j:\, t_j < T_1} e^{\alpha(M_j - M_0)} \left[ \left( 1 + \frac{T_1 - t_j}{c} \right)^{1-p} - 1 \right]
\end{align*}
Similarly, for p = 1,
\begin{align*}
\int_{T_1}^{T_2} \lambda(t) \, dt
&= \mu(T_2 - T_1) + Ac \sum_{j:\, t_j < T_2} e^{\alpha(M_j - M_0)} \log\left( 1 + \frac{T_2 - t_j}{c} \right) \\
&\qquad - Ac \sum_{j:\, t_j < T_1} e^{\alpha(M_j - M_0)} \log\left( 1 + \frac{T_1 - t_j}{c} \right)
\end{align*}
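These closed forms are easy to check numerically. The sketch below is illustrative Python, not SSLib code; the small catalogue and parameter values are invented, and the events are placed before T1 so that λ is smooth on the integration interval. It compares the p ≠ 1 expression with a trapezoidal quadrature:

```python
import math

def lam(t, events, mu, A, alpha, M0, c, p):
    # SSLib-parameterised ETAS intensity (magnitude density omitted)
    return mu + A * sum(math.exp(alpha * (Mj - M0)) * (1 + (t - tj) / c) ** (-p)
                        for tj, Mj in events if tj < t)

def integral_closed(T1, T2, events, mu, A, alpha, M0, c, p):
    # closed form for p != 1 derived above
    def tail(T):
        return sum(math.exp(alpha * (Mj - M0)) *
                   ((1 + (T - tj) / c) ** (1 - p) - 1)
                   for tj, Mj in events if tj < T)
    return mu * (T2 - T1) + A * c / (1 - p) * (tail(T2) - tail(T1))

events = [(-1.0, 5.5), (-0.3, 6.0)]            # invented events, all before T1
pars = dict(mu=0.05, A=0.8, alpha=1.0, M0=5.0, c=0.02, p=1.3)
T1, T2, n = 0.0, 2.0, 20000

# trapezoidal quadrature on a fine grid
h = (T2 - T1) / n
quad = h * (0.5 * lam(T1, events, **pars) + 0.5 * lam(T2, events, **pars) +
            sum(lam(T1 + i * h, events, **pars) for i in range(1, n)))
```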

C.3 Stress Release Models


C.3.1 Simple Stress Release Model
\[
S(t) = \sum_{i:\, t_i < t} 10^{0.75(M_i - M_0)} \tag{C.1}
\]
For example, say $S(t_1) = 0$; then $S(t_2) = 10^{0.75(M_1 - M_0)}$, and
\[
S(t_{i+1}) = S(t_i) + 10^{0.75(M_i - M_0)}, \qquad i = 1, 2, \cdots
\]
\[
\lambda(t) = \exp\{ a + b[t - cS(t)] \}
\]
\begin{align*}
\int_{t_i}^{t_{i+1}} \lambda(t) \, dt
&= e^a \int_{t_i}^{t_{i+1}} e^{b[t - cS(t)]} \, dt \\
&= \frac{1}{b} \, e^{a - bcS(t_{i+1})} \left[ e^{bt_{i+1}} - e^{bt_i} \right]
\end{align*}
If there are no events in the interval [T1, T2], then
\[
\int_{T_1}^{T_2} \lambda(t) \, dt = \frac{1}{b} \, e^{a - bcS(T_2)} \left[ e^{bT_2} - e^{bT_1} \right]
\]

Say · · · < t−2 < t−1 < t0 < T1 < t1 < t2 < · · · < tn < T2 < tn+1 < tn+2 < · · ·, where ti, i ∈ Z, are event times, and [T1, T2] represents the interval over which we want to maximise the likelihood. Thus, the interval [T1, T2] contains n events, and
\begin{align*}
\int_{T_1}^{T_2} \lambda(t) \, dt
&= \int_{T_1}^{t_1} \lambda(t) \, dt + \sum_{i=1}^{n-1} \int_{t_i}^{t_{i+1}} \lambda(t) \, dt + \int_{t_n}^{T_2} \lambda(t) \, dt \\
&= \frac{1}{b} \, e^{a - bcS(t_1)} \left[ e^{bt_1} - e^{bT_1} \right] + \sum_{i=1}^{n-1} \frac{1}{b} \, e^{a - bcS(t_{i+1})} \left[ e^{bt_{i+1}} - e^{bt_i} \right] + \frac{1}{b} \, e^{a - bcS(T_2)} \left[ e^{bT_2} - e^{bt_n} \right]
\end{align*}
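The recursion for S(t) and the event-free interval integral can both be checked directly. The following illustrative Python sketch (SSLib's implementation is in R; parameter values are invented) compares the closed form with a trapezoidal quadrature over an interval containing no events, where S(t) is constant:

```python
import math

def stress(mags, M0=0.0):
    # running values S(t_1), S(t_2), ... via S(t_{i+1}) = S(t_i) + 10^{0.75(M_i - M_0)}
    S, out = 0.0, []
    for M in mags:
        out.append(S)
        S += 10 ** (0.75 * (M - M0))
    return out

def interval_integral(a, b, c, S_const, t_lo, t_hi):
    # integral of exp(a + b*(t - c*S)) over an event-free interval, S constant
    return math.exp(a - b * c * S_const) * (math.exp(b * t_hi) - math.exp(b * t_lo)) / b

a, b, c, S_const = -2.0, 0.5, 0.1, 3.0       # invented parameter values
t_lo, t_hi, n = 1.0, 2.5, 10000
h = (t_hi - t_lo) / n
f = lambda t: math.exp(a + b * (t - c * S_const))
quad = h * (0.5 * f(t_lo) + 0.5 * f(t_hi) + sum(f(t_lo + i * h) for i in range(1, n)))
```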

C.3.2 Linked Stress Release Model


We have k "linked" regions where the stress transfer between regions is given by the k × k matrix C = (crj). The conditional intensity function for the rth region is
\[
\lambda(t, r) = \exp\left\{ a_r + b_r \left[ t - \sum_{j=1}^{k} c_{rj} S_j(t) \right] \right\},
\]
where Sj(t) is as in Equation C.1, but the summation is only taken over those events in region j. The structure of the matrix C determines the possible types of stress transfer between regions.

Say · · · < t−2 < t−1 < t0 < T1 < t1 < t2 < · · · < tn < T2 < tn+1 < tn+2 < · · ·, where ti, i ∈ Z, are event times (over the union of all regions), and [T1, T2] represents the interval over which we want to maximise the likelihood. Also let R be a set containing the region labels, i.e. r ∈ R = {1, 2, · · · , k}. Then
\[
\int_{t_i}^{t_{i+1}} \lambda(t, r) \, dt = \frac{\exp(b_r t_{i+1}) - \exp(b_r t_i)}{b_r} \exp\left\{ a_r - b_r \sum_{j=1}^{k} c_{rj} S_j(t_{i+1}) \right\}
\]
and hence
\[
\int_R \int_{T_1}^{T_2} \lambda(t, r) \, dt \, dr = \sum_{r \in R} \left[ \int_{T_1}^{t_1} \lambda(t, r) \, dt + \sum_{i=1}^{n-1} \int_{t_i}^{t_{i+1}} \lambda(t, r) \, dt + \int_{t_n}^{T_2} \lambda(t, r) \, dt \right].
\]
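For concreteness, the regional intensity can be transcribed directly. The following is an illustrative Python sketch with an invented two-region example (not SSLib code); when C is the identity matrix the model reduces to k independent simple stress release models:

```python
import math

def linked_srm_intensity(t, r, a, b, C, S):
    # lambda(t, r) = exp(a_r + b_r * (t - sum_j c_{rj} S_j(t)));
    # S holds the accumulated stresses S_j(t) for each region at time t
    transfer = sum(C[r][j] * S[j] for j in range(len(S)))
    return math.exp(a[r] + b[r] * (t - transfer))

a, b = [-2.0, -1.5], [0.4, 0.6]              # invented regional parameters
S = [1.2, 0.7]                               # current stress levels S_1(t), S_2(t)
identity = [[1.0, 0.0], [0.0, 1.0]]          # no stress transfer between regions
```

With the identity transfer matrix, region r feels only its own accumulated stress.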

C.4 Simulation Using the Thinning Method


Essentially one takes a small sub-interval at the beginning of the required simulation period. Then one finds the maximum of λ(t) on the sub-interval, and simulates an inter-event time according to this maximum rate. Because this is done using λmax, the inter-event time will, on average, be too small. Hence, potential events are generated too frequently (too many on average), and are then "thinned". One then moves along the simulation period and considers the next sub-interval.

C.4.1 Algorithm
1. Let T be the start of a small simulation interval.

2. Take a small interval (T, T + δ).

3. Calculate the maximum of λ(t|HT) in the interval as
\[
\lambda_{\max} = \max_{t \in (T, T+\delta)} \lambda(t \,|\, H_T).
\]

4. Simulate an exponential random number τ with rate λmax .



5. If
\[
\frac{\lambda(T + \tau)}{\lambda_{\max}} < 1,
\]
then go to (6).
Else no events occur in (T, T + δ); hence T ← T + δ, and return to (2).

6. Simulate a uniform random number U on the interval (0, 1).

7. If
\[
U \le \frac{\lambda(T + \tau)}{\lambda_{\max}},
\]
then a new "event" occurs at ti = T + τ.

8. Increment T for the next event simulation:

T ←T +τ.

9. Return to (2).
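Steps (1)–(9) can be sketched as follows. This is illustrative Python, not SSLib's simulator; the check below uses a constant intensity bounded strictly by λmax, so the history argument is ignored there:

```python
import random

def simulate_thinning(lam, lam_max_fn, T_start, T_end, delta, rng=None):
    # lam(t, events) is the conditional intensity given the simulated history;
    # lam_max_fn(T, delta, events) bounds lam on (T, T + delta)
    rng = rng or random.Random()
    events, T = [], T_start
    while T < T_end:
        lam_max = lam_max_fn(T, delta, events)              # step 3
        tau = rng.expovariate(lam_max)                      # step 4
        if lam(T + tau, events) / lam_max >= 1:             # step 5: bound not valid,
            T += delta                                      #   treat as no event in (T, T+delta)
            continue
        if rng.random() <= lam(T + tau, events) / lam_max:  # steps 6-7: thinning
            events.append(T + tau)
        T += tau                                            # step 8
    return [t for t in events if t <= T_end]
```

For a homogeneous rate of 1.5 bounded by λmax = 2.0, the thinned process again has rate 1.5.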

C.4.2 Simulation Notes


1. If λ(t) is monotonically decreasing (except at event times), as in the ETAS model,
then the selection of δ has no effect because λmax = λ(T ).

2. When λ(t) is monotonically increasing (except at event times), e.g. the stress
release model, there are two extreme situations that could cause the process to be
inefficient.

(a) If δ is too small, λmax will be relatively small, hence τ quite large, so that T + τ may fall beyond T + δ. Here many small intervals will be considered, but each with a very low likelihood of an event.
(b) If δ is too large, λmax will be relatively large, hence τ will be quite small.
This could be inefficient as many possible “events” will be “thinned”.

Hence, for best efficiency, one requires δ to be not too small, but also not too large.

3. Simulation is sometimes critically dependent on the magnitude distribution.

(a) Too small a b-value may cause aftershock sequences to never die out in the ETAS model.
(b) Too small a b-value in the stress release model may cause an extremely long period of quiescence. An alternative is to use the truncated Pareto distribution rather than the exponential distribution.
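Note 3(b) can be illustrated with a truncated magnitude sampler. Since an exponential magnitude density corresponds to a Pareto law for the stress quantity 10^{0.75(M − M0)}, truncating that Pareto at a maximum magnitude Mmax is equivalent to drawing M from a doubly truncated exponential on [M0, Mmax]. A sketch using inverse-CDF sampling (invented parameter values; not SSLib code):

```python
import math, random

def rmag_truncated(beta, M0, Mmax, rng):
    # inverse-CDF draw from the exponential density truncated to [M0, Mmax]
    u = rng.random()
    D = 1 - math.exp(-beta * (Mmax - M0))
    return M0 - math.log(1 - u * D) / beta

rng = random.Random(7)
beta = math.log(10.0)                 # b-value of 1 on the magnitude scale
mags = [rmag_truncated(beta, 4.0, 8.0, rng) for _ in range(1000)]
```

Unlike the untruncated exponential, no simulated magnitude can exceed Mmax, which prevents the extreme stress drops that lead to very long quiescent periods.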
References

Chambers, J.M. & Hastie, T. (1991). Statistical Models in S. Wadsworth and Brooks/Cole, Pacific Grove CA.
Daley, D.J. & Vere-Jones, D. (2003). An Introduction to the Theory of Point Processes.
Volume I: Elementary Theory and Methods. Second Edition. Springer-Verlag,
New York. ISBN: 0-387-95541-0.
Harte, D.S. (1998). Documentation for the Statistical Seismology Library. School
of Mathematical and Computing Sciences Research Report No. 98-10. Victoria
University of Wellington, Wellington.
Harte, D.S. (2001). Multifractals: Theory and Applications. Chapman and Hall/CRC,
Boca Raton. ISBN: 1-58488-154-2.
Harte, D.S. (2003a). Package Fractal: Fractal Analysis. Manual of Function Docu-
mentation. Statistics Research Associates, Wellington. URL: http://homepages.
paradise.net.nz/david.harte/SSLib/Manuals/fractal.pdf.
Harte, D.S. (2003b). Package PtProcess: Time Dependent Point Process Modelling.
Manual of Function Documentation. Statistics Research Associates, Wellington.
URL: http://homepages.paradise.net.nz/david.harte/SSLib/Manuals/pp.pdf.
Harte, D.S. (2003c). Package ssBase: Base Functions for SSLib. Manual of Func-
tion Documentation. Statistics Research Associates, Wellington. URL: http:
//homepages.paradise.net.nz/david.harte/SSLib/Manuals/base.pdf.
Harte, D.S. (2003d). Package ssEDA: Exploratory Data Analysis for Earthquake Data.
Manual of Function Documentation. Statistics Research Associates, Wellington.
URL: http://homepages.paradise.net.nz/david.harte/SSLib/Manuals/eda.
pdf.
Harte, D.S. (2003e). Package ssM8: M8 Earthquake Forecasting Algorithm. Manual
of Function Documentation. Statistics Research Associates, Wellington. URL:
http://homepages.paradise.net.nz/david.harte/SSLib/Manuals/m8.pdf.
Harte, D.S. (2003f). Package ssNZ: Catalogue of NZ Earthquake Events. Manual
of Function Documentation. Statistics Research Associates, Wellington. URL:
http://homepages.paradise.net.nz/david.harte/SSLib/Manuals/nz.pdf.


Harte, D.S. (2003g). Package ssPDE: PDE Catalogue of World Wide Earthquake
Events. Manual of Function Documentation. Statistics Research Associates,
Wellington. URL: http://homepages.paradise.net.nz/david.harte/SSLib/
Manuals/pde.pdf.

Harte, D.S. (2005). Package HiddenMarkov: Hidden Markov Models. Manual of Function Documentation. Statistics Research Associates, Wellington. URL: http://homepages.paradise.net.nz/david.harte/SSLib/Manuals/hmm.pdf.

Ihaka, R. & Gentleman, R. (1996). R: A language for data analysis and graphics.
Journal of Computational and Graphical Statistics 5(3), 299–314.

Lay, T. & Wallace, T.C. (1995). Modern Global Seismology. Academic Press, San
Diego. ISBN: 0-12-732870-X.

Maindonald, J. & Braun, J. (2003). Data Analysis and Graphics Using R - An Example-Based Approach. Cambridge University Press, Cambridge. ISBN: 0521813360.

R Development Core Team. (2000). An Introduction to R. R Foundation for Statistical Computing, Vienna. URL: http://cran.r-project.org/doc/manuals/R-intro.pdf.

R Development Core Team. (2002). Writing R Extensions. R Foundation for Statistical Computing, Vienna. URL: http://cran.r-project.org/doc/manuals/R-exts.pdf.

R Development Core Team. (2003). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna. ISBN 3-900051-07-0. URL: http://www.r-project.org.

Statistical Sciences Inc. (1992). S-PLUS Programmer's Manual, Version 3.0. Statistical Sciences Inc., Seattle.

Utsu, T. & Ogata, Y. (1997). Statistical analysis of seismicity. In: Algorithms for
Earthquake Statistics and Prediction (Edited by: J.H. Healy, V.I. Keilis-Borok &
W.H.K. Lee), pp 13–94. IASPEI, Menlo Park CA.
