PAST Manual
Øyvind Hammer, D.A.T. Harper and P.D. Ryan
June 2, 2006
1 Introduction
Welcome to the PAST! This program is designed as a follow-up to PALSTAT, an
extensive package written by P.D. Ryan, D.A.T. Harper and J.S. Whalley (Ryan et
al. 1995). It includes many of the functions which are commonly used in palaeon-
tology and palaeoecology.
These days, a number of large and very good statistical systems exist, including
SPSS, SAS and extensions to Excel. Why yet another statistics program?
• PAST is free.
• PAST is easy to use, and therefore well suited for introductory courses in
quantitative palaeontology.
• PAST comes with a number of example data sets, case studies and exercises,
making it a complete educational package.
http://folk.uio.no/ohammer/past
2 Installation
The basic installation of PAST is easy: Just download the file ’Past.exe’ and put
it anywhere on your hard disk. Double-clicking the file will start the program.
The data files for the case studies can be downloaded separately, or together in
the packed file ’casefiles.zip’. This file must be unpacked with a program such as
WinZip.
We suggest you make a folder called ’past’ anywhere on your hard disk, and
put all the files in this folder.
Please note: Problems have been reported for some combinations of screen
resolution and default font size in Windows - the layout becomes ugly and it may
be necessary for the user to increase the sizes of windows in order to see all the text
and buttons. If this happens, please set the font size to ’Small fonts’ in the Screen
control panel in Windows. We are working on solving this problem.
PAST also seems to have problems with some printers. Postscript printers work
fine.
When you exit PAST, a file called ’pastsetup’ will be automatically placed in your personal folder (for example ’My Documents’ in Windows 95/98), containing the last used file directories.
3 Entering and manipulating data
PAST has a spreadsheet-like user interface. Data are entered as an array of cells,
organized in rows (horizontally) and columns (vertically).
Entering data
To input data in a cell, click on the cell with the mouse and type in the data. This
can only be done when the program is in the ’Edit mode’. To select edit mode, tick
the box above the array. When edit mode is off, the array is locked and the data
cannot be changed. The cells can also be navigated using the arrow keys.
Any text can be entered in the cells, but almost all functions will expect num-
bers. Both comma (,) and decimal point (.) are accepted as decimal separators.
Absence/presence data are coded as 0 or 1, respectively. Any other positive
number will be interpreted as presence. Absence/presence-matrices can be shown
with black squares for presences by ticking the ’Square mode’ box above the array.
Missing data are coded with question marks (’?’) or the value -1. Unless
support for missing data is specifically stated in the documentation for a function,
the function will not handle missing data correctly, so be careful.
The convention in PAST is that items occupy rows, and variables columns.
Three brachiopod individuals might therefore occupy rows 1, 2 and 3, with their
lengths and widths in columns A and B. Cluster analysis will always cluster items,
that is rows. For Q-mode analysis of associations, samples (sites) should there-
fore be entered in rows, while taxa (species) are in columns. For switching be-
tween Q-mode and R-mode, rows and columns can easily be interchanged using
the Transpose operation.
Selecting areas
Most operations in PAST are carried out only on the area of the array which you have selected (marked). If you try to run a function which expects data, and no area has been selected, you will get an error message.
• Multiple rows are selected by selecting the first row label, then shift-clicking (clicking with the Shift key down) on the additional row labels. Note that you cannot ’drag out’ multiple rows - this will instead move the first row (see below).
• The whole array can be selected by clicking the upper left corner of the array
(the empty grey cell) or by choosing ’Select all’ in the Edit menu.
• Smaller areas within the array can be selected by ’dragging out’ the area, but
this only works when ’Edit mode’ is off.
Remove
The remove function (Edit menu) allows you to remove selected row(s) or col-
umn(s) from the spreadsheet. The removed area is not copied to the paste buffer.
Transpose
The Transpose function, in the Edit menu, will interchange rows and columns. This
is used for switching between R mode and Q mode in cluster analysis, principal
components analysis and seriation.
Samples to events (UA to RASC)
Given a data matrix of occurrences of taxa in a number of samples in a number
of sections, as used by the Unitary Associations module, this function will convert
each section to a single row with orders of events (FADs, LADs or both) as ex-
pected by the Ranking-Scaling module. Tied events (in the same sample) will be
given equal ranking.
File format

Empty cells (like the top left cell) are coded with a full stop (.). Cells are separated by white space, which means that you must never use spaces in row or column labels. ’Oxford Clay’ is thus an illegal column label which would confuse the program.
If any rows have been assigned a colour other than black, the row labels in
the file will start with an underscore, a number from 0 to 8 identifying the colour
(symbol), and another underscore.
In addition to this format, PAST can also detect and open files in the following
formats:
• TPS format developed by Rohlf (only the landmark, id and scale fields are
supported, other fields are ignored).
• RASC format for biostratigraphy. You must open the .DAT file, and the
program expects corresponding .DIC and .DEP files in the same directory.
The decimal depths format is not supported.
The ’Insert from file’ function is useful for concatenating data sets. The loaded
file will be inserted into your existing spreadsheet at the selected position (upper
left). Other data sets can thus be inserted both to the right of and below your
existing data.
Data from Excel
Data from Excel can be imported in two ways:
• Copy from Excel and paste into PAST. Note that if you want the first row
and column to be copied into the label cells in PAST, you need to switch on
the "Edit labels" option.
• Make sure that the top left cell in Excel contains a single dot (.) and save as
tab-separated text in Excel. The resulting text file can be opened directly in
PAST.
4 Transforming your data
These routines subject your data to mathematical operations. This can be useful for
bringing out features in your data, or as a necessary preprocessing step for some
types of analysis.
Logarithm
The Log function in the Transform menu log-transforms your data using the base-
10 logarithm. A constant is added if necessary, that is if any values are non-
positive:
y = log(x + 1)
Subtract mean
This function subtracts the column mean from each of the selected columns. The
means cannot be computed row-wise.
Remove trend
This function removes any linear trend from a data set (two columns with X-Y
pairs). This is done by subtraction of a linear regression line from the Y values.
Removing the trend can sometimes be a useful operation prior to spectral analysis.
Column difference
Simply subtracts two selected columns, and places the result in the next column.
Evaluate expression

This powerful feature allows flexible mathematical operations on the selected array of data. Each selected cell is evaluated, and the result replaces the previous contents. A mathematical expression must be entered, which can include any of the operators +, -, *, /, ^ (power), and mod (modulo). Also supported are brackets (), and the functions abs, atan, cos, sin, exp, ln, sqrt, sqr, round and trunc.
The following variables can also be used: x (the content of the current cell), u and d (the cells above and below), mean, min, max and stdev (statistics for the current column), i (the row number), n (the number of rows in the column), and normal (a random, normally distributed number with mean 0 and standard deviation 1).

Examples:

sqrt(x)             Replaces all numbers with their square roots
(x-mean)/stdev      Mean and standard deviation normalization, column-wise
x-0.5*(max+min)     Centers the values around zero
(u+x+d)/3           Three-point moving average smoothing
x-u                 First-order difference
i                   Fills the column with the row numbers (requires non-empty cells, such as all zeros)
sin(2*3.14159*i/n)  Generates one period of a sine function down a column (requires non-empty cells)
5*normal+10         Normally distributed random numbers, mean 10 and standard deviation 5
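For illustration, here are rough numpy equivalents of some of these expressions (a sketch only; numpy is an assumption, the column is a plain array, and the moving-average edge handling may differ from PAST's):

    import numpy as np

    x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])              # one spreadsheet column

    sqrt_x  = np.sqrt(x)                                   # sqrt(x)
    zscore  = (x - x.mean()) / x.std(ddof=1)               # (x-mean)/stdev
    centred = x - 0.5 * (x.max() + x.min())                # x-0.5*(max+min)
    smooth  = np.convolve(x, np.ones(3) / 3, 'same')       # (u+x+d)/3, edges differ
    diff1   = x[1:] - x[:-1]                               # x-u (one element shorter)
    noise   = 5 * np.random.standard_normal(len(x)) + 10   # 5*normal+10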
5 Plotting functions
Graph
Plots one or more columns as separate graphs. The x coordinates are set automatically to 1,2,3,... There are three plot styles available: Graph (lines), bars and points. The ’X labels’ option sets the x axis labels to the appropriate row names.
XY graph
Plots one or more pairs of columns containing x/y coordinate pairs. The ’log Y’
option log-transforms your Y values (if necessary, a constant is added to make the
minimum log value equal to 0). The curve can also be smoothed using 3-point
moving average.
95 percent confidence ellipses can be plotted in most scatter plots in PAST,
such as scores for PCA, CA, DCA, PCO, NMDS, and relative and partial warps.
The calculation of these ellipses assumes a bivariate normal distribution.
Convex hulls can also be drawn in the scatter plots, in order to show the areas
occupied by points of different ’colours’. The convex hull is the smallest convex
polygon containing all points.
The minimal spanning tree is the set of lines with minimal total length, con-
necting all points. In the XY graph module, Euclidean lengths in 2D are used.
Histogram
Plots histograms (frequency distributions) for one or more columns. The number
of bins is 10 by default, but can be changed by the user. The "Fit normal" option
draws a graph with a fitted normal distribution (Parametric estimation, not Least
Squares).
Box plot
Box plot for one or several columns (samples) of univariate data. For each sample,
the 25-75 percent quartiles are drawn using a box. The median is shown with a
horizontal line inside the box. The minimal and maximal values are shown with
short horizontal lines (’whiskers’).
Ternary
Ternary plot for three columns of data, normally containing proportions of compo-
sitions.
Survivorship
Survivorship curves for one or more columns of data. The data will normally con-
sist of age or size values. A survivorship plot shows the number of individuals
which survived to different ages. Assuming exponential growth (highly question-
able!), size should be log-transformed to age. This can be done either in the Trans-
form menu, or directly in the Survivorship dialogue.
Landmark plot
This function is very similar to the ’XY graph’, the only difference being that all
XY pairs on each row are plotted with the appropriate row colour and symbol. It is
well suited for plotting landmark data.
Landmarks 3D
Plotting of points in 3D (XYZ triples). Especially suited for 3D landmark data, but
can also be used e.g. for PCA scatter plots along three principal components. The
point cloud can be rotated around the x and the y axes (note: left-handed coordinate
system). The ’Perspective’ slider is normally not used. The ’Stems’ option draws
a line from each point down or up to a plane centered along the y axis, which
can sometimes enhance 3D information. ’Lines’ draws lines between consecutive
landmarks within each separate specimen (row). ’Axes’ shows the three coordinate
axes with the centroid of the points as the origin.
Matrix
Two-dimensional plot of the data matrix, using a grayscale with white for lowest
value, black for highest. Can be useful to get an overview over a large data matrix.
6 Basic statistics
Univariate statistics
Typical application: Quick statistical description of a univariate sample.
Assumptions: None, but variance and standard deviation are most meaningful for normally distributed data.
Data needed: One or more columns of measured or counted data.
Displays the following statistics: Number of entries (N), smallest value (Min),
largest value (Max), mean value (Mean), standard error of the estimate of the mean
(Std. error), population variance (that is, the variance of the population estimated
from the sample), population standard deviation (square root of variance), median,
skewness (positive for a tail to the right) and kurtosis (positive for a peaked distri-
bution).
Missing data (?) are supported.
Shapiro-Wilk test

Tests whether a single distribution (one selected column) is normal. This test is designed for populations with 3 ≤ N ≤ 5000.
Missing data (?) are supported.
F and t tests (two samples)

Two columns must be selected. The F test compares the variances of two
distributions, while the t test compares their means. The F and t statistics, and the
probabilities that the variances and means of the parent populations are the same,
are given. The F and t tests should only be used if you have reason to believe that
the parent populations are close to normally distributed. The Shapiro-Wilk test for
one distribution against a normal distribution can give you an idea about this.
Also, the t test is really only applicable when the variances are the same. So
if the F test says otherwise, you should be cautious about the t test. An unequal
variance t statistic (Welch test) is also given, which should be used in this case.
The permutation t test compares the observed t statistic (normalized difference
between means) with the t statistics from 1000 (can be changed by the user) ran-
dom pairs of replicates from the pooled data set. This test will be more accurate
than the normal t test for non-normal distributions and small samples.
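To make the procedure concrete, here is a minimal Python sketch of a permutation version of this test (an illustration only, with hypothetical function names; PAST's own implementation details, such as how replicates are drawn, may differ):

    import numpy as np

    def permutation_t_test(a, b, n_perm=1000, seed=None):
        # t statistic: normalized difference between means
        def t_stat(u, v):
            return (u.mean() - v.mean()) / np.sqrt(
                u.var(ddof=1) / len(u) + v.var(ddof=1) / len(v))
        rng = np.random.default_rng(seed)
        t_obs = t_stat(a, b)
        pooled = np.concatenate([a, b])
        count = 0
        for _ in range(n_perm):
            rng.shuffle(pooled)                  # random re-partition of the pool
            if abs(t_stat(pooled[:len(a)], pooled[len(a):])) >= abs(t_obs):
                count += 1
        return t_obs, count / n_perm             # two-sided p estimate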
Sometimes publications give not the data, but values for sample size, mean and
variance for two populations. These can be entered manually using the ’F and t
from parameters’ option in the menu.
See Brown & Rothery (1993) or Davis (1986) for details.
Missing data (?) are supported.
One-sample and paired t tests

The one-sample t test is used to investigate whether the sample is likely to have
been taken from a population with a given (theoretical) mean.
Paired t test. Say that a measurement such as length of claw has been taken
on the left and right side of a number of crab specimens, and we want to test for
directed asymmetry (difference between left and right). A two-sample t test is not
appropriate, because the values are not independent. Instead, we can perform a
one-sample t test of left minus right against the value zero.
Missing data (?) are supported.
Chi-square test

The Chi-square test is the one to use if your data consist of the numbers of
elements in different bins (compartments). For example, this test can be used to
compare two associations (columns) with the number of individuals in each taxon
organized in the rows. You should be a little cautious about such comparisons if
any of the bins contain less than five individuals.
There are two options that you should select or not for correct results. ’Sample
vs. expected’ should be ticked if your second column consists of values from a
theoretical distribution (expected values) with zero error bars. If your data are
from two counted samples each with error bars, leave this box open. This is not a
small-sample correction.
’One constraint’ should be ticked if your expected values have been normal-
ized in order to fit the total observed number of events, or if two counted samples
necessarily have the same totals (for example because they are percentages). This
will reduce the number of degrees of freedom by one. When "one constraint" is selected, a permutation test is available, with 1000 randomly permuted replicates (row and column sums kept constant).
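As an illustration of the ’Sample vs. expected’ case with one constraint, a short Python sketch (the counts are hypothetical, and scipy is an assumption; the case of two counted samples with error bars uses a different statistic):

    import numpy as np
    from scipy.stats import chi2

    observed = np.array([35, 28, 12, 5])    # counted sample (hypothetical)
    expected = np.array([30, 30, 10, 10])   # theoretical expectation, same total

    chi2_stat = ((observed - expected) ** 2 / expected).sum()
    df = len(observed) - 1                  # 'One constraint': df reduced by one
    p = chi2.sf(chi2_stat, df)              # probability of equal distributions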
See Brown & Rothery (1993) or Davis (1986) for details.
Missing data (?) are supported.
Mann-Whitney U test

Two columns must be selected. The Mann-Whitney U test can be used to test whether the medians of two independent distributions are different. This test is non-parametric, which means that the distributions can be of any shape. PAST uses an approximation based on a z test, which is only valid for N > 7. It includes a continuity correction.
See Brown & Rothery (1993) or Davis (1986) for details.
Missing data (?) are supported.
Kolmogorov-Smirnov test

Two columns must be selected. The K-S test can be used to test whether two independent distributions of continuous, unbinned numerical data are different. The
K-S test is non-parametric, which means that the distributions can be of any shape.
If you want to test just the locations of the distribution (medians), you should use
instead the Mann-Whitney U test.
See Davis (1986) for details.
Missing data (?) are supported.
Spearman’s rho and Kendall’s tau

These non-parametric rank-order tests are used to test for correlation between two variables.
Missing data (?) are supported.
Correlation matrix
Typical application: Quantifying correlation between two or more variables.
Assumptions: Normal distribution.
Data needed: Two or more columns of measured or counted variables.
A matrix is presented with the correlations between all pairs of columns. Cor-
relation values (Pearson’s r) are given in the lower triangle of the matrix, and the
probabilities that the columns are uncorrelated are given in the upper.
Variance/covariance matrix
Typical application: Quantifying covariance between two or more variables.
Assumptions: None.
Data needed: Two or more columns of measured or counted variables.
Contingency table

A contingency table is input to this routine. Rows represent the different states
of one nominal variable, columns represent the states of another nominal variable,
and cells contain the counts of occurrences of that specific state (row, column) of
the two variables. A measure and probability of association of the two variables
(based on Chi-square) is then given.
For example, rows may represent taxa and columns samples as usual (with
specimen counts in the cells). The contingency table analysis then gives informa-
tion on whether the two nominal variables "taxon" and "locality" are associated. If
not, the data matrix is not very informative. For details, see Press et al. (1992).
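A minimal sketch of such an association test in Python (hypothetical counts; scipy's chi2_contingency is used here as a stand-in for PAST's internal calculation):

    import numpy as np
    from scipy.stats import chi2_contingency

    # Rows: states of 'taxon'; columns: states of 'locality'; cells: counts.
    table = np.array([[12,  5,  9],
                      [ 3, 14,  8],
                      [ 7,  6, 15]])

    chi2_stat, p, dof, expected = chi2_contingency(table)
    print(p)   # a low p value suggests the two nominal variables are associated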
One-way ANOVA
Typical application: Testing for equality of the means of several univariate samples.
Assumptions: Normal distribution and similar variances and sample sizes.
Data needed: Two or more columns of measured or counted data.
Post-hoc pairwise comparisons of the means can be made with Tukey’s HSD test. The Studentized Range Statistic Q is given in the lower left triangle of the array, and the probabilities p(equal) in the upper right. Sample sizes do not have to be equal for the version of Tukey’s test used.
Two-way ANOVA
Typical application: Testing for equality of the means of several univariate samples, taken across two sets of factors (e.g. species and soil type), and for interaction between the factors.
Assumptions: Normal distribution and similar variances and sample sizes.
Data needed: A matrix of measured or counted data, with one factor in columns and the other marked with grouping (colouring) of rows. All elements in the matrix must be filled.
Kruskal-Wallis test
Typical application: Testing for equality of the medians of several univariate samples.
Assumptions: None.
Data needed: Two or more columns of measured or counted data.
Similarity/distance indices
Typical application: Comparing two or more samples.
Assumptions: Equal sampling conditions.
Data needed: Two or more columns of presence/absence (1/0) or abundance data with taxa down the rows.
14 similarity and distance measures, as described under Cluster Analysis, are available. Note that some of these are similarity indices, while others are distance indices (in cluster analysis, these are all converted to similarities). All pairs of rows are compared, and the results given in a matrix.
Missing data are supported as described under Cluster Analysis.
Mixture analysis
Typical application: Fitting a univariate data set to a mixture of two or more Gaussian (normal) distributions.
Assumptions: Sampling from a mixture of two or more normally distributed populations.
Data needed: One column of measured data.
7 Multivariate statistics
Principal components analysis
Typical application: Reduction and interpretation of large multivariate data sets with some underlying linear structure.
Assumptions: Debated.
Data needed: Two or more rows of measured data with three or more variables.

PCA finds hypothetical variables (components) which account for as much of the variance in the multivariate data as possible. These new variables are linear combinations of the original variables. Two typical applications are:
• Simple reduction of the data set to only two variables (the two most impor-
tant components), for plotting and clustering purposes.
• More interestingly, you might try to hypothesize that the most important
components are correlated with some other underlying variables. For mor-
phometric data, this might be simply age, while for associations it might be
a physical or chemical gradient (e.g. latitude or position across the shelf).
The PCA routine finds the eigenvalues and eigenvectors of the variance-covariance
matrix or the correlation matrix. Choose var-covar if all your variables are mea-
sured in the same unit (e.g. centimetres). Choose correlation (normalized var-
covar) if your variables are measured in different units; this implies normalizing
all variables using division by their standard deviations. The eigenvalues, giving
a measure of the variance accounted for by the corresponding eigenvectors (com-
ponents) are given for all components. The percentages of variance accounted for
by these components are also given. If most of the variance is accounted for by
the first one or two components, you have scored a success, but if the variance is
spread more or less evenly among the components, the PCA has in a sense not been
very successful.
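The following Python sketch illustrates the classical eigenanalysis approach described above (a simplified stand-in, not PAST's code):

    import numpy as np

    def pca(data, use_correlation=False):
        X = data - data.mean(axis=0)            # center the variables
        if use_correlation:
            X = X / X.std(axis=0, ddof=1)       # normalize when units differ
        C = np.cov(X, rowvar=False)             # var-covar (or correlation) matrix
        eigval, eigvec = np.linalg.eigh(C)
        order = np.argsort(eigval)[::-1]        # most important component first
        eigval, eigvec = eigval[order], eigvec[:, order]
        percent = 100 * eigval / eigval.sum()   # percent variance per component
        scores = X @ eigvec                     # row scores for scatter plots
        return eigval, percent, eigvec, scores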
The Jolliffe cut-off value gives an informal indication of how many principal
components should be considered significant (Jolliffe, 1986). Components with
eigenvalues smaller than the Jolliffe cut-off may be considered insignificant, but
too much weight should not be put on this criterion.
The ’Scree plot’ (simple plot of eigenvalues) can also be used to informally
indicate the number of significant components. After this curve starts to flatten
out, the corresponding components may be regarded as insignificant.
The ’View scatter’ option allows you to see all your data points (rows) plotted
in the coordinate system given by the two most important components. If you have
tagged (grouped) rows, the different groups will be shown using different symbols
and colours. You can also plot the Minimal Spanning Tree, which is the shortest
possible set of connected lines connecting all points. This may be used as a visual
aid in grouping close points. The MST is based on a Euclidean distance measure of the original data points, so it is most meaningful when all your variables use the same unit. The ’Biplot’ option will show a projection of the original axes (variables) onto the scattergram. This is another visualisation of the component loadings (coefficients) - see below. Note that the lengths of these axes are arbitrarily scaled, all by the same factor, to give a clear diagram.
The ’View loadings’ option shows to what degree your different original vari-
ables (given in the original order along the x axis) enter into the different compo-
nents (as chosen in the radio button panel). These component loadings are impor-
tant when you try to interpret the ’meaning’ of the components. The ’Coefficients’
option gives the PC coefficients, while ’Correlation’ gives the correlation between
a variable and the PC scores. Do not use the latter if you are doing PCA on the
correlation matrix.
The ’SVD’ option will enforce use of the supposedly superior Singular Value
Decomposition algorithm instead of "classical" eigenanalysis. The two algorithms
will normally give almost identical results, except that SVD will center on zero.
Also, the eigenvalues will have different absolute values (their relative values re-
main the same), and axes may be flipped.
For the ’Shape PCA’ and ’Shape deform’ options, see the section on Geomet-
rical Analysis.
Bruton & Owen (1988) describe a typical morphometrical application of PCA.
Missing data are supported by column average substitution.
Principal coordinates
Typical application: Reduction and interpretation of large multivariate data sets with some underlying linear structure.
Assumptions: Unknown.
Data needed: Two or more rows of measured, counted or presence/absence data with three or more variables, or a symmetric similarity or distance matrix.
The eigenvalues, and the percentages of variance accounted for by the corresponding components (coordinates), are given.
The similarity/distance values are raised to the power of c (the "Transformation
exponent") before eigenanalysis. The standard value is c = 2. Higher values (4 or
6) may decrease the "horseshoe" effect (Podani & Miklos 2002).
The ’View scatter’ option allows you to see all your data points (rows) plotted
in the coordinate system given by the PCO. If you have tagged (grouped) rows, the
different groups will be shown using different symbols and colours. The "Eigen-
value scaling" option scales each axis using the square root of the eigenvalue (rec-
ommended). The minimal spanning tree option is based on the selected similarity
or distance index in the original space.
Missing data are supported by pairwise deletion (not for the Raup-Crick, rho
or user-defined indices).
Correspondence analysis
Typical application: Reduction and interpretation of large multivariate ecological data sets with environmental or other gradients.
Assumptions: Unknown.
Data needed: Two or more rows of counted data in three or more compartments.
Detrended correspondence analysis
Typical application: Reduction and interpretation of large multivariate ecological data sets with environmental or other gradients.
Assumptions: Unknown.
Data needed: Two or more rows of counted data in three or more compartments.
Cluster analysis
Typical application: Finding hierarchical groupings in multivariate data sets.
Assumptions: None.
Data needed: Two or more rows of counted, measured or presence/absence data in one or more variables or categories, or a symmetric similarity or distance matrix.

Three different clustering algorithms are available:

• Unweighted pair-group average (UPGMA). Clusters are joined based on the average distance between all members in the two groups.
• Single linkage (nearest neighbour). Clusters are joined based on the smallest
distance between the two groups.
• Ward’s method. Clusters are joined such that increase in within-group vari-
ance is minimized.
One method is not necessarily better than the other, though single linkage is not
recommended by some. It can be useful to compare the dendrograms given by the
different algorithms in order to informally assess the robustness of the groupings. If
a grouping is changed when trying another algorithm, that grouping should perhaps
not be trusted.
For Ward’s method, a Euclidean distance measure is inherent to the algorithm.
For UPGMA and single linkage, the distance matrix can be computed using 13
different measures:
• Correlation (of the variables along rows) using Pearson’s r. A little mean-
ingless if you have only two variables.
• Correlation using Spearman’s rho (basically the r value of the ranks). Will
often give the same result as correlation using r.
• Cosine distance for abundance data - one minus the inner product of abun-
dances each normalised to unit norm.
• Morisita’s index for abundance data (see the code sketch after this list):

\lambda_1 = \frac{\sum_{i=1}^{s} x_{ij}(x_{ij}-1)}{\sum_{i=1}^{s} x_{ij} \left( \sum_{i=1}^{s} x_{ij} - 1 \right)}   (1)

\lambda_2 = \frac{\sum_{i=1}^{s} x_{ik}(x_{ik}-1)}{\sum_{i=1}^{s} x_{ik} \left( \sum_{i=1}^{s} x_{ik} - 1 \right)}

Morisita_{jk} = \frac{2 \sum_{i=1}^{s} x_{ij} x_{ik}}{(\lambda_1 + \lambda_2) \sum_{i=1}^{s} x_{ij} \sum_{i=1}^{s} x_{ik}}
• Raup-Crick index for absence-presence data. Recommended! This index (Raup & Crick 1979) uses a randomization ("Monte Carlo") procedure, comparing the observed number of species occurring in both associations with the distribution of co-occurrences from 200 random replicates.
• Horn’s overlap index for abundance data:

N_j = \sum_{i=1}^{s} x_{ij}   (2)

N_k = \sum_{i=1}^{s} x_{ik}

Ro_{jk} = \frac{\sum_{i=1}^{s} \left[ (x_{ij}+x_{ik}) \ln(x_{ij}+x_{ik}) \right] - \sum_{i=1}^{s} \left[ x_{ij} \ln x_{ij} \right] - \sum_{i=1}^{s} \left[ x_{ik} \ln x_{ik} \right]}{(N_j+N_k) \ln(N_j+N_k) - N_j \ln N_j - N_k \ln N_k}
• Hamming distance for categorical data as coded with integers. The Ham-
ming distance is the number of differences (mismatches), so that the distance
between (3,5,1,2) and (3,7,0,2) equals 2. In PAST, this is normalised to the
range (0,1).
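As a concrete illustration of two of these measures, here is a small Python sketch of the Morisita index (equation 1 above) and the normalised Hamming distance (an illustration only; PAST's implementations may differ in details):

    import numpy as np

    def morisita(xj, xk):
        # Morisita similarity between two abundance columns (equation 1)
        l1 = (xj * (xj - 1)).sum() / (xj.sum() * (xj.sum() - 1))
        l2 = (xk * (xk - 1)).sum() / (xk.sum() * (xk.sum() - 1))
        return 2 * (xj * xk).sum() / ((l1 + l2) * xj.sum() * xk.sum())

    def hamming(a, b):
        # Proportion of mismatching positions, normalised to the range (0,1)
        a, b = np.asarray(a), np.asarray(b)
        return (a != b).mean()

    print(hamming([3, 5, 1, 2], [3, 7, 0, 2]))   # 2 mismatches / 4 = 0.5
    print(morisita(np.array([10., 2., 0.]), np.array([8., 3., 1.])))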
Missing data: The cluster analysis algorithm can handle missing data, coded
as -1 or question mark (?). This is done using pairwise deletion, meaning that
when distance is calculated between two points, any variables that are missing are
ignored in the calculation. Missing data are not supported for Ward’s method, nor
for the Rho or the Raup-Crick similarity measures.
Two-way clustering: The two-way option allows simultaneous clustering in
R-mode and Q-mode. The graphics only support relatively small data sets.
Stratigraphically constrained clustering: This option will allow only adjacent
rows (or groups of rows) to be joined during the agglomerative clustering proce-
dure. May produce strange-looking (but correct) dendrograms.
Bootstrapping: If a number of bootstrap replicates is given (e.g. 100), the
columns are subjected to resampling. The percentage of replicates where each
node is still supported is given on the dendrogram.
K-means clustering
Typical application: Non-hierarchical clustering of multivariate data into a specified number of groups.
Assumptions: None.
Data needed: Two or more rows of counted or measured data in one or more variables.
Seriation
Typical application: Stratigraphical or environmental ordering of taxa and localities.
Assumptions: None.
Data needed: Presence/absence (1/0) matrix with taxa in rows.
In the unconstrained mode, both rows and columns are free to move.
Paired Hotelling’s T²

Typical application: Testing for equal means of a paired multivariate data set.
Assumptions: Multivariate normality.
Data needed: A multivariate data set of paired measured data, marked with different colors.
The paired Hotelling’s test expects two groups of multivariate data, marked
with different colours. Rows within each group must be consecutive. The first row
of the first group is paired with the first row of the second group, the second row is
paired with the second, etc.
Missing data are supported by column average substitution.
Two-group permutation test

This module expects the rows in the two data sets to be grouped into two sets by colouring the rows, e.g. with black (dots) and red (crosses). Rows within each group must be consecutive.
Equality of the means of the two groups is tested using permutation with 2000
replicates (can be changed by the user), and the Mahalanobis squared distance
measure. The permutation test is an alternative to Hotelling’s test when the as-
sumptions of multivariate normal distributions and equal covariance matrices do
not hold.
Missing data are supported by column average substitution.
The sample size should be reasonably large (>50), although a small-sample correction is also attempted.
Box’s M test
Typical application: Testing for equivalence of the covariance matrices for two data sets.
Assumptions: Multivariate normality.
Data needed: Two multivariate data sets of measured data, or two (square) variance-covariance matrices, marked with different colors.
This test is rather specialized, testing for the equivalence of the covariance
matrices for two multivariate data sets. You can use either two original multivariate
data sets from which the covariance matrices are automatically computed, or two
specified variance-covariance matrices. In the latter case, you must also specify the
sizes (number of individuals) of the two samples.
The Box’s M statistic is given, together with a significance value based on a
chi-square approximation. Note that this test is supposedly very sensitive. This
means that a high p value will be a good, although informal, indicator of equality,
while a highly significant result (low p value) may in practical terms be a somewhat
too sensitive indicator of inequality.
In PAST, the post-hoc analysis is quite simple, using pairwise Hotelling’s tests. In the post-hoc table, groups are named according to the row label of the first item in the group. Hotelling’s p values are given above the diagonal, while Bonferroni corrected values (multiplied by the number of pairwise comparisons) are given below the diagonal. This Bonferroni corrected test has very little power.
One-way ANOSIM
Typical application: Testing for difference between two or more multivariate groups, based on any distance measure.
Assumptions: Ranked dissimilarities within groups have similar median and range.
Data needed: Two or more groups of multivariate data, marked with different colours, or a symmetric similarity or distance matrix with similar groups.
One-way NPMANOVA
Typical application: Testing for difference between two or more multivariate groups, based on any distance measure.
Assumptions: The groups have similar distributions (similar variances).
Data needed: Two or more groups of multivariate data, marked with different colors, or a symmetric similarity or distance matrix with similar groups.
8 Fitting data to functions
Linear
Typical application: Fitting data to a straight line, or exponential or power function.
Assumptions: None.
Data needed: One or two columns of counted or measured data.
If two columns are selected, they represent x and y values, respectively. If one
column is selected, it represents y values, and x values are taken to be the sequence
of positive integers (1,2,...). A straight line y = ax+b is fitted to the data. There are
two different algorithms available: Standard regression and Reduced Major Axis
(the latter is selected by ticking the box). Standard regression keeps the x values
fixed, and finds the line which minimizes the squared errors in the y values. Use
this if your x values have very small errors associated with them. Reduced Major Axis tries to minimize both the x and the y errors. RMA fitting and standard error estimation are according to Miller & Kahn (1962), not Davis (1986)!
Also, both x and y values can be log-transformed (base 10), in effect fitting your data to the ’allometric’ function y = 10^b x^a. An a value around 1 indicates that a straight-line (’isometric’) fit may be more applicable.
The values for a and b, their errors, a Chi-square correlation value (not for
RMA), Pearson’s r correlation, and the probability that the columns are not corre-
lated are given.
The calculation of standard errors for slope and intercept assumes normal dis-
tribution of residuals and independence between the variables and the variance of
residuals. If these assumptions are strongly broken, it is preferable to use the boot-
strapped 95 percent confidence intervals (2000 replicates). The number of random
points selected for each replicate should normally be kept as N , but may be reduced
for special applications.
In Standard regression (not RMA), a 95 percent "Working-Hotelling" confi-
dence band for the fitted line (not for the data points!) is available.
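For illustration, a common textbook formulation of the RMA fit in Python (a sketch; PAST follows Miller & Kahn (1962), which may differ in the standard error details):

    import numpy as np

    def rma_fit(x, y):
        # RMA slope: sign of r times the ratio of standard deviations
        r = np.corrcoef(x, y)[0, 1]
        a = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
        b = y.mean() - a * x.mean()          # line passes through the centroid
        return a, b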
Residuals
The Residuals window reports the distances from each data point to the regression
line, in the x and y directions. Only the latter is of interest when using ordinary
linear regression rather than RMA. The residuals can be copied back to the spread-
sheet and inspected for normal distribution and independence between independent
variable and residual variance (homoskedasticity).
Exponential functions

Your data can be fitted to an exponential function y = e^b e^(ax) by first log-transforming just your y column (in the Transform menu) and then performing a straight-line fit.
Sinusoidal
Typical application: Fitting data to a set of periodic, sinusoidal functions.
Assumptions: None.
Data needed: Two columns of counted or measured data.
Logistic
Typical application: Fitting data to a logistic or von Bertalanffy growth model.
Assumptions: None.
Data needed: Two columns of counted or measured data.
Attempts to fit the data to the logistic equation y = a/(1 + b e^(-cx)). For numerical reasons, the x axis is normalized. The algorithm is a little complicated. The value of a is first estimated to be the maximal value of y. The values of b and c are then estimated using a straight-line fit to a linearized model.
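A minimal Python sketch of this initial estimation (illustrative only; it assumes positive y values, and PAST's normalization and other details may differ):

    import numpy as np

    def logistic_init(x, y):
        # Estimate a as the maximal y (slightly inflated to avoid log(0)),
        # then fit ln(a/y - 1) = ln(b) - c*x by least squares.
        a = y.max() * 1.01
        z = np.log(a / y - 1)
        slope, intercept = np.polyfit(x, z, 1)
        return a, np.exp(intercept), -slope    # a, b, c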
Though acceptable, this estimate can optionally be improved by using the esti-
mated values as an initial guess for a Levenberg-Marquardt nonlinear optimization
(tick the box). This procedure can sometimes improve the fit, but due to the nu-
merical instability of the logistic model it often fails with an error message.
The logistic equation can model growth with saturation, and was used by Sep-
koski (1984) to describe the proposed stabilization of marine diversity in the late
Palaeozoic.
The 95 percent confidence intervals are based on 2000 bootstrap replicates, not
using the Levenberg-Marquardt optimization step.
Von Bertalanffy
An option in the ’Logistic fit’ window. Uses the same algorithm as above, but fits to the von Bertalanffy equation y = a(1 - b e^(-cx)). This equation is used for modelling growth of multi-celled animals (in units of length or width, not volume).
B-splines
Typical application: Smoothing noisy data.
Assumptions: None.
Data needed: Two columns of counted or measured data.
Two columns must be selected (x and y values). The data are fitted with a
least-squares criterion to a B-spline, which is a sequence of third-order polyno-
mials, continuous up to the second derivative. A typical application of this is the
construction of a smooth curve going through a noisy data set.
A decimation factor is set by the user, and controls how many data points con-
tribute to each polynomial section. Larger decimation gives a smoother curve.
Note that sharp jumps in your data can give rise to oscillations in the curve, and
that you can also get large excursions in regions with few data points.
Abundance models
Typical application: Fitting taxon abundance distribution to one of three models.
Assumptions: None.
Data needed: One column of abundance counts for a number of taxa in a sample.
This module can be used for plotting logarithms of taxon abundances in de-
scending rank order (Whittaker plot), or number of species in abundance octave
classes (as shown when fitting to log-normal distribution). It can also fit the data to
one of three different standard abundance models:
• Geometric, where the 2nd most abundant species should have a taxon count
of k<1 times the most abundant, the 3rd most abundant a taxon count of k
times the 2nd most abundant etc. for a constant k. This will give a straight
descending line in the Whittaker plot. Fitting is by simple linear regression
of the log abundances.
• Log-series, with two parameters α and x. The fitting algorithm is from Krebs (1989).

• Log-normal, fitted to the number of species in abundance octave classes (see the table below).
Octave Abundance
1 1
2 2-3
3 4-7
4 8-15
5 16-31
6 32-63
7 64-127
... ...
9 Diversity
Diversity statistics
Typical application: Quantifying taxonomical diversity in samples.
Assumptions: Representative samples.
Data needed: One or more columns, each containing counts of individuals of different taxa down the rows.
Most of these indices are explained in Harper (1999).
Approximate confidence intervals for all the indices can be computed with a
bootstrap procedure. 1000 random samples are produced (200 prior to version
0.87b), each with the same total number of individuals as in the original sam-
ple. The random samples are taken from the total, pooled data set (all columns).
For each individual in the random sample, the taxon is chosen with probabilities
according to the original abundances. A 95 percent confidence interval is then cal-
culated. Note that the diversity in the replicates will often be less than, and never
larger than, the pooled diversity in the total data set.
Since these confidence intervals are all computed with respect to the pooled
data set, they do not represent confidence intervals for the individual samples. They
are mainly useful for identifying samples where the given diversity index falls out-
side the confidence interval. Bootstrapped comparison of diversity indices in two
samples is provided in the "Compare diversities" module.
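A sketch of this bootstrap in Python, using the Shannon index as an example (illustrative only; PAST computes the intervals for all of its indices in one pass):

    import numpy as np

    def shannon(counts):
        p = counts / counts.sum()
        p = p[p > 0]
        return -(p * np.log(p)).sum()

    def bootstrap_ci(sample_total, pooled_counts, n_boot=1000, seed=None):
        # Replicates have the sample's total size, but taxon probabilities
        # come from the pooled data set (all columns), as described above.
        rng = np.random.default_rng(seed)
        probs = pooled_counts / pooled_counts.sum()
        reps = [shannon(rng.multinomial(sample_total, probs))
                for _ in range(n_boot)]
        return np.percentile(reps, [2.5, 97.5])   # 95 percent interval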
Quadrat richness
Typical application: Estimating species richness from several quadrat samples.
Assumptions: Representative, random quadrats of equal size.
Data needed: Two or more columns, each containing presence/absence (1/0) of different taxa down the rows.
Taxonomic distinctness
Typical application: Quantifying taxonomical distinctness in samples.
Assumptions: Representative samples.
Data needed: One or more columns, each containing counts of individuals of different taxa down the rows. In addition, the leftmost column(s) must contain names of genera/families etc. (see below).
The ’global list’ of Clarke & Warwick is not entered directly, but is calculated internally by pooling (summing) the given samples.
These indices depend on taxonomic information also above the species level,
which has to be entered for each species as follows. Species names go in the name
column (leftmost, fixed column), genus names in column 1, family in column 2
etc. Species counts follow in the columns thereafter. The program will ask for the
number of columns containing taxonomic information above the species level.
For presence-absence data, taxonomic diversity and distinctness will be valid
but equal to each other.
Compare diversities
Typical application: Comparing diversities in two samples of abundance data.
Assumptions: Equal sampling conditions.
Data needed: Two columns of abundance data with taxa down the rows.
This module computes a number of diversity indices for two samples, and then
compares the diversities using two different randomization procedures as follows.
Bootstrapping
The two samples A and B are pooled. 1000 random pairs of samples (Ai , Bi ) are
then taken from this pool (200 prior to version 0.87b), with the same numbers of
individuals as in the original two samples. For each replicate pair, the diversity in-
dices div(Ai ) and div(Bi ) are computed. The number of times |div(Ai )−div(Bi )|
exceeds or equals |div(A) − div(B)| indicates the probability that the observed
difference could have occurred by random sampling from one parent population as
estimated by the pooled sample.
A small probability value p(equal) then indicates a significant difference in
diversity index between the two samples.
Permutation
1000 random matrices with two columns (samples) are generated, each with the same row and column totals as in the original data matrix. The p value is computed as for the bootstrap test.
Diversity t test
Typical application: Comparing Shannon diversities in two samples of abundance data.
Assumptions: Equal sampling conditions.
Data needed: Two columns of abundance data with taxa down the rows.
Comparison of the Shannon diversities (entropies) in two samples, using a t
test described by Poole (1974). This is an alternative to the randomization test
available in the Compare diversities module.
Note that the Shannon indices here include a bias correction term (Poole 1974), and may diverge slightly from the uncorrected estimates calculated elsewhere in PAST, at least for small samples.
Diversity profiles
Typical application: Comparing diversities in two samples of abundance data.
Assumptions: Equal sampling conditions.
Data needed: Two columns of abundance data with taxa down the rows.
Rarefaction
Typical application: Comparing taxonomical diversity in samples of different sizes.
Assumptions: When comparing samples, they should be taxonomically similar, obtained using standardised sampling and taken from a similar ’habitat’.
Data needed: Single column of counts of individuals of different taxa.
Given a column of abundance data for a number of taxa, this module estimates
how many taxa you would expect to find in a sample with a smaller total number of
individuals. With this method, you can compare the number of taxa in samples of
different size. Using rarefaction analysis on your largest sample, you can read out
the number of expected taxa for any smaller sample size. The algorithm is from
Krebs (1989). An example application in palaeontology can be found in Adrain et
al. (2000).
Let N be the total number of individuals in the sample, s the total number
of species, and Ni the number of individuals of species number i. The expected
number of species E(Sn ) in a sample of size n and the variance V (Sn ) are then
given by
E(S_n) = \sum_{i=1}^{s} \left[ 1 - \frac{\binom{N-N_i}{n}}{\binom{N}{n}} \right]

V(S_n) = \sum_{i=1}^{s} \left[ \frac{\binom{N-N_i}{n}}{\binom{N}{n}} \left( 1 - \frac{\binom{N-N_i}{n}}{\binom{N}{n}} \right) \right] + 2 \sum_{j=2}^{s} \sum_{i=1}^{j-1} \left[ \frac{\binom{N-N_i-N_j}{n}}{\binom{N}{n}} - \frac{\binom{N-N_i}{n}}{\binom{N}{n}} \frac{\binom{N-N_j}{n}}{\binom{N}{n}} \right]   (3)
Standard errors (square roots of variances) are given by the program. In the
graphical plot, these standard errors are converted to 95 percent confidence inter-
vals.
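A Python sketch of E(S_n), evaluating the binomial coefficients through log-gamma to avoid overflow (illustrative only; the variance term follows the same pattern):

    import numpy as np
    from scipy.special import gammaln

    def expected_species(counts, n):
        counts = np.asarray(counts)
        N = counts.sum()
        def log_binom(a, b):
            return gammaln(a + 1) - gammaln(b + 1) - gammaln(a - b + 1)
        ratio = np.zeros(len(counts))
        ok = (N - counts) >= n                 # otherwise the binomial is zero
        ratio[ok] = np.exp(log_binom(N - counts[ok], n) - log_binom(N, n))
        return (1.0 - ratio).sum()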
Diversity curves
Typical application: Plotting diversity curves from occurrence data.
Assumptions: None.
Data needed: Abundance or presence/absence matrix with samples in rows (lowest sample at bottom) and taxa in columns.
Found in the ’Strat’ menu, this simple tool allows plotting of diversity curves
from occurrence data in a stratigraphical column. Note that samples should be
in stratigraphical order, with the uppermost (youngest) sample in the uppermost
row. Data are subjected to the range-through assumption (absences between first
and last appearance are treated as presences). Originations and extinctions are in
absolute numbers, not percentages.
The ’Endpoint correction’ option counts a FAD or LAD in a sample as 0.5 instead of 1 in that sample. Both FAD and LAD in the same sample count as 0.33.
10 Time series analysis
Spectral analysis
Typical application: Finding periodicities in counted or measured data.
Assumptions: Time series long enough to contain at least four cycles.
Data needed: One or two columns of counted or measured data.
Autocorrelation
Typical application: Finding periodicities in counted or measured data.
Assumptions: Time series long enough to contain at least two cycles. Even spacing of data points.
Data needed: One column of counted or measured data.
Cross-correlation
Typical application: Finding an optimal alignment of two time series.
Assumptions: Even spacing of data points.
Data needed: Two columns of counted or measured data.
Wavelet transform
Typical application: Inspection of time series at different scales.
Assumptions: Even spacing of data points.
Data needed: One column of counted or measured data.
Walsh transform
Typical application: Spectral analysis (finding periodicities) of binary or ordinal data.
Assumptions: Even spacing of data points.
Data needed: One column of binary (0/1) or ordinal (integer) data.
The normal methods for spectral analysis are perhaps not optimal for binary
data, because they decompose the time series into sinusoids rather than "square
waves". The Walsh transform may then be a better choice, using basis functions
that flip between -1 and +1. These basis functions have different "frequencies"
(number of transitions divided by two), known as sequencies. In PAST, each pair
of even ("cal") and odd ("sal") basis functions (one pair for each integer-valued
sequency) is combined into a power value using cal2 + sal2 , producing a "power
spectrum" that is comparable to the Lomb periodogram.
Note that the Walsh transform is slightly "exotic" compared with the Fourier
transform, and its interpretation must be done cautiously. For example, the ef-
fects of the duty cycle (percentage of ones versus zeros) are somewhat difficult to
understand.
In PAST, the data values are pre-processed by multiplying with two and sub-
tracting one, bringing 0/1 binary values into the -1/+1 range optimal for the Walsh
transform.
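The following Python sketch illustrates the idea of a sequency power spectrum (an illustration built on a naive Hadamard transform; PAST's algorithm and normalization may differ, and the series length must be a power of two):

    import numpy as np
    from scipy.linalg import hadamard

    def walsh_power(x):
        x = 2.0 * np.asarray(x) - 1.0               # map 0/1 data to -1/+1
        n = len(x)                                  # must be a power of two
        H = hadamard(n)
        coeffs = H @ x / n                          # Walsh coefficients
        changes = (np.diff(H, axis=1) != 0).sum(axis=1)
        seq = (changes + 1) // 2                    # cal: 2s changes, sal: 2s-1
        power = np.zeros(n // 2 + 1)
        np.add.at(power, seq, coeffs ** 2)          # cal^2 + sal^2 per sequency
        return power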
Runs test
Typical application: Testing for randomness in a time series.
Assumptions: None.
Data needed: One column containing a time series. The values are converted to 0 (x ≤ 0) or 1 (x > 0).
The Mantel periodogram is a power spectrum of the multivariate time series,
computed from the Mantel correlogram (Hammer, unpublished).
11 Geometrical analysis
Directional analysis
Typical application: Displaying and testing for random distribution of directional data.
Assumptions: See below.
Data needed: One column of directional data in degrees (0-360).
The mean resultant length R̄ is further tested against a random distribution using Rayleigh’s test for directional data (Davis 1986). Note that this procedure assumes evenly or unimodally distributed data - the test is not appropriate for bidirectional data. Also, the test is not accurate for N > 200; it will then report too high a p value.
A four-bin Chi-square test is also available, giving the probability that the di-
rections are randomly and evenly distributed.
The ’Orientations’ option allows analysis of linear orientations (0-180 degrees).
The Rayleigh test is then carried out by a directional test on doubled angles (this
trick is described by Davis 1986). The Chi-square uses four bins from 0-180 de-
grees. The rose diagram mirrors the histogram around the origin.
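A Python sketch of the mean resultant length and a simple first-order approximation to the Rayleigh p value (p ≈ exp(-N R̄²)); PAST may use a more exact formula, so treat this as an illustration:

    import numpy as np

    def rayleigh_test(degrees):
        theta = np.radians(np.asarray(degrees, dtype=float))
        N = len(theta)
        rbar = np.hypot(np.cos(theta).sum(), np.sin(theta).sum()) / N
        p = np.exp(-N * rbar ** 2)       # first-order approximation
        return rbar, p

    # For orientations (0-180 degrees), test the doubled angles:
    # rbar, p = rayleigh_test(2 * orientations)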
Point distribution
Typical application: Testing for clustering or overdispersion of two-dimensional position values.
Assumptions: Elements small compared to their distances, mainly convex domain, N > 50.
Data needed: Two columns of x/y positions.
Both are inappropriate for points in very concave domains. Two different edge ef-
fect adjustment methods are available: wrap-around (’torus’) and Donnelly’s cor-
rection.
The probability that the distribution is random (Poisson process, giving an exponential nearest neighbour distribution) is presented, together with the R value:

R = \frac{2\bar{d}}{\sqrt{A/N}},

where \bar{d} is the observed mean distance between nearest neighbours, A is the area of the convex hull, and N is the number of points. Clustered points give R < 1, Poisson patterns give R ≈ 1, while overdispersed points give R > 1.
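For illustration, a Python sketch of the R statistic using the convex hull area and no edge correction (scipy is an assumption; PAST additionally offers the edge adjustments described above):

    import numpy as np
    from scipy.spatial import ConvexHull, cKDTree

    def nearest_neighbour_r(points):
        points = np.asarray(points, dtype=float)
        d, _ = cKDTree(points).query(points, k=2)  # first hit is the point itself
        dbar = d[:, 1].mean()                      # mean nearest-neighbour distance
        A = ConvexHull(points).volume              # 'volume' is the area in 2D
        N = len(points)
        return 2.0 * dbar / np.sqrt(A / N)         # R < 1 clustered, R > 1 overdispersed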
The orientations (0-180 degrees) and lengths of lines between nearest neighbours are also included. The orientations can be subjected to directional analysis to test whether the points are organised along lineaments.
Applications of this module include spatial ecology (are in-situ brachiopods clustered?) and morphology (are trilobite tubercles overdispersed?).
Multivariate allometry
Typical application: Finding and testing for allometry in a multivariate morphometric data set.
Assumptions: None.
Data needed: A multivariate data set with variables (distance measurements) in columns, specimens in rows.
Fourier shape analysis

Accepts x-y coordinates digitized around an outline. More than one shape (row) can be simultaneously analyzed. Points do not need to be totally evenly spaced. The shape must be expressible as a unique function in polar co-ordinates, that is, any straight line radiating from the centre of the shape must cross the outline only once.
The origin for the polar coordinate system is found by numerical approximation
to the centroid. 128 points are then produced at equal angular increments around
the outline, through linear interpolation. The centroid is then re-computed, and
the radii normalized (size is thus removed from the analysis). The cosine and sine
components are given for the first ten harmonics, but note that only N/2 harmonics
are ’valid’, where N is the number of digitized points. The coefficients can be
copied to the main spreadsheet for further analysis (e.g. by PCA).
The ’Shape view’ window allows graphical viewing of the Fourier shape ap-
proximation(s).
Eigenshape analysis
Typical application: Analysis of fossil outline shape.
Assumptions: Sufficient number of digitized points to capture features.
Data needed: Digitized x/y coordinates around several outlines. Specimens in rows, coordinates of alternating x and y values in columns (see Procrustes fitting below).
Procrustes fitting

The Procrustes option in the Transform menu will transform your measured coordinates to Procrustes coordinates. There is also a menu choice for Bookstein coordinates. Specimens go in different rows and landmarks along each row. If you have three specimens with four landmarks, your data should look as follows:
x1 y1 x2 y2 x3 y3 x4 y4
x1 y1 x2 y2 x3 y3 x4 y4
x1 y1 x2 y2 x3 y3 x4 y4
For 3D the data will be similar, but with additional columns for z.
Landmark data in this format could be analyzed directly with the multivariate
methods in PAST, but it is recommended to standardize to so-called Procrustes co-
ordinates by removing position, size and rotation. A further transformation to Pro-
crustes residuals (approximate tangent space coordinates) is achieved by selecting
’Subtract mean’ in the Edit menu. Note: You must always convert to Procrustes
coordinates first, then to Procrustes residuals.
A typical sequence of operations for landmark analysis is thus: digitize the landmarks, convert to Procrustes coordinates in the Transform menu, optionally convert to Procrustes residuals using ’Subtract mean’, and then apply multivariate methods such as PCA.
Shape PCA
This is an option in the Principal Components module (Multivar menu). PCA on
landmark data can be carried out as normal PCA analysis on Procrustes residu-
als for 2D or 3D (see above), but for 2D landmark data some extra functionality
is available in the PCA module by choosing Shape PCA. The conversion to Pro-
crustes residuals is then done automatically, so your data must be Procrustes fitted,
but not with subtracted mean. The var-covar option is enforced, and the ’Shape
deform (2D)’ button enabled. This allows you to view the displacement of land-
marks from the mean shape (plotted as points or symbols) in the direction of the
different principal components, allowing interpretation of the components. The
displacements are plotted as lines (vectors).
Another implementation of Shape PCA is available under Relative Warps (see
below), by setting the parameter alpha to zero.
Thin-plate spline transformation grids

The first specimen (first row) is taken as a reference, with an associated square grid. The warps from this to all other specimens can be viewed. You can also choose the mean shape as the reference.
The ’Expansion factors’ option will display the area expansion (or contraction)
factor around each landmark in yellow numbers, indicating the degree of local
growth. This is computed using the Jacobian of the warp. Also, the expansions
are colour-coded for all grid elements, with green for expansion and purple for
contraction.
At each landmark, the principal strains can also be shown, with the major strain
in black and minor strain in brown. These vectors indicate directional stretching.
A description of thin-plate spline transformation grids is given by Dryden &
Mardia (1998).
Partial warps
From the thin-plate spline window, you can choose to see the partial warps for a
particular spline deformation. The first partial warp will represent some long-range
(large scale) deformation of the grid, while higher-order warps will normally be
connected with more local deformations. The affine component of the warp (also
known as zeroth warp) represents linear translation, scaling, rotation and shearing.
In the present version of PAST you cannot view the principal warps.
When you increase the magnification factor from zero, the original landmark
configuration and a grid will be progressively deformed according to the selected
partial warp.
Relative warps
Typical application: Ordination of a set of shapes.
Assumptions: None.
Data needed: Digitized x/y landmark coordinates. Specimens in rows, coordinates of alternating x and y values in columns. Procrustes standardization recommended.
The relative warps can be viewed as the principal components of the set of
thin-plate transformations from the mean shape to each of the shapes under study.
It provides an alternative to direct PCA of the landmarks (see Shape PCA above).
The parameter alpha can be set to one of three values:

• alpha=1 emphasizes large-scale variation.

• alpha=0 weights variation equally at all scales (equivalent to Shape PCA, see above).

• alpha=-1 emphasizes small-scale variation.
The relative warps are ordered according to importance, and the first and sec-
ond warps are usually the most informative. Note that the percentage values of the
eigenvalues are relative to the total non-affine part of the transformation - the affine
part is not included.
The relative warps are visualized with thin-plate spline transformation grids.
When you increase or decrease the amplitude factor away from zero, the original
landmark configuration and grid will be progressively deformed according to the
selected relative warp.
The relative warp scores of pairs of consecutive relative warps can be shown in scatter plots, and all scores can be shown in a numerical matrix.
The algorithm for computing the relative warps is taken from Dryden & Mardia
(1998).
Centroid size

Calculates the centroid size for each specimen (Euclidean norm of the distances from all landmarks to the centroid).
The values in the ’Normalized’ column are centroid sizes divided by the square
root of the number of landmarks - this might be useful for comparing specimens
with different numbers of landmarks.
Normalize size
The ’Normalize size’ option in the Transform menu allows you to remove size
by dividing all coordinate values by the centroid size for each specimen. For 2D
data you may instead use Procrustes coordinates, which are also normalized with
respect to size.
See Dryden & Mardia (1998), p. 23-26.
Distance from landmarks (2D or 3D)
Typical application: Calculating distances between two landmarks.
Assumptions: None.
Data needed: Digitized x/y or x/y/z landmark coordinates. Specimens in rows, coordinates with alternating x and y (and z for 3D) values in columns. May or may not be Procrustes fitted or normalized for size.
Calculates the Euclidean distances between two fixed landmarks for one or
many specimens. You must choose two landmarks - these are named according to
the name of the first column for the landmark (x value).
All distances from landmarks
This function will replace the landmark data in the data matrix with a data
set consisting of distances between all pairs of landmarks, with one specimen per
row. The number of pairs is N(N-1)/2 for N landmarks. This transformation will
allow multivariate analysis of distance data, which are not sensitive to rotation or
translation of the original specimens, so a Procrustes fitting is not mandatory before
such analysis. Using distance data also allows log-transformation, and analysis of
fit to the allometric equation for pairs of distances.
Missing data are supported by column average substitution.
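A sketch of this transformation in Python (NumPy; hypothetical data) may clarify the N(N-1)/2 count and the invariance to rotation and translation:

    import numpy as np
    from itertools import combinations

    def all_pair_distances(spec, dim=2):
        # N landmarks give N*(N-1)/2 pairwise Euclidean distances,
        # which do not change if the specimen is rotated or translated.
        pts = spec.reshape(-1, dim)
        return np.array([np.linalg.norm(pts[i] - pts[j])
                         for i, j in combinations(range(len(pts)), 2)])

    spec = np.array([0.0, 0.0, 3.0, 0.0, 0.0, 4.0])  # three 2D landmarks
    print(all_pair_distances(spec))                  # -> [3. 4. 5.]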
Landmark linking
This function in the Geomet menu allows the selection of any pairs of landmarks
to be linked with lines in the morphometric plots (thin-plate splines, partial and
relative warps, etc.), to improve readability. The landmarks must be present in the
main spreadsheet before links can be defined.
Pairs of landmarks are selected or deselected by clicking in the symmetric ma-
trix. The set of links can also be saved in a text file. Note that there is little error
checking in this module.
Burnaby size removal
This function in the Transform menu will project your multivariate data set of mea-
sured distances onto a space orthogonal to the first principal component. Burnaby’s
method may (or may not!) remove isometric size from the data, for further "size-
free" data analysis. The "Allometric" option will log-transform the data prior to
projection, thus conceivably removing also allometric size-dependent shape varia-
tion from the data. Note that the implementation in PAST does not center the data
within groups - it assumes that all specimens (rows) belong to one group.
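A minimal sketch of Burnaby's projection in Python (NumPy); the function name and details are illustrative, not PAST's internals:

    import numpy as np

    def burnaby(X, allometric=False):
        # Project the rows of X onto the space orthogonal to the
        # first principal component (assumes one group, as in PAST).
        if allometric:
            X = np.log(X)                # log-transform first
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        v = Vt[0]                        # unit vector along the first PC
        P = np.eye(X.shape[1]) - np.outer(v, v)   # orthogonal projector
        return Xc @ P                    # 'size-free' data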
Gridding
The following three methods interpolate irregularly spaced data points (x and y positions, each with an associated data value) onto a regular grid.
Moving average
The value at a grid node is simply the average of the N closest data points, as
specified by the user (the default is to use all data points). The points are given
weight in inverse proportion to distance. This algorithm is simple and will not
always give good (smooth) results. One advantage is that the interpolated values
will never go outside the range of the data points.
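A sketch of the interpolation at a single grid node, in Python (NumPy; parameter names are illustrative):

    import numpy as np

    def moving_average(px, py, pz, gx, gy, n_nearest=None):
        # Inverse-distance-weighted average of the n nearest data
        # points; being a weighted mean, the result can never fall
        # outside the range of the data values.
        d = np.hypot(px - gx, py - gy)
        order = np.argsort(d)
        if n_nearest is not None:        # default: use all points
            order = order[:n_nearest]
        w = 1.0 / (d[order] + 1e-12)     # guard against zero distance
        return np.sum(w * pz[order]) / np.sum(w)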
Thin-plate spline
Maximally smooth interpolator. Can overshoot in the presence of sharp bends in
the surface.
Kriging
This advanced method is implemented in a simple version in PAST. The user is re-
quired to specify a model for the semivariogram, by choosing one of three models
(spherical, exponential or Gaussian) and corresponding parameters to fit the empir-
ical semivariances as well as possible. See e.g. Davis (1986) for more information.
The kriging procedure also provides an estimate of standard errors across the map
(this depends on the semivariogram model being accurate). Kriging in PAST does
not provide for anisotropic semivariance.
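For reference, the three semivariogram model families in a common textbook parameterization (see e.g. Davis 1986); PAST's exact parameterization may differ, so treat this Python sketch as illustrative:

    import numpy as np

    def spherical(h, nugget, sill, rang):
        h = np.asarray(h, dtype=float)
        g = nugget + (sill - nugget) * (1.5 * h / rang - 0.5 * (h / rang) ** 3)
        return np.where(h < rang, g, sill)   # flat at the sill beyond the range

    def exponential(h, nugget, sill, rang):
        # Factor 3 gives a 'practical range'; conventions vary.
        return nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / rang))

    def gaussian(h, nugget, sill, rang):
        return nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * (h / rang) ** 2))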
12 Cladistics
Typical application: Semi-objective analysis of relationships between taxa from morphological or genetic evidence.
Assumptions: Many! See Kitching et al. (1998).
Data needed: Character matrix with taxa in rows, outgroup in first row. For calculation of stratigraphic congruence indices, first and last appearance datums must be given in the first two columns.
Parsimony analysis
Character states should be coded using integers in the range 0 to 255. The first
taxon is treated as the outgroup, and will be placed at the root of the tree.
Missing values are coded with a question mark (?) or the value -1. Please note
that PAST does not collapse zero-length branches. Because of this, missing values
can lead to a proliferation of equally shortest trees ad nauseam, many of which are
in fact equivalent.
There are four algorithms available for finding short trees:
Branch-and-bound
The branch-and-bound algorithm is guaranteed to find all shortest trees. The total
number of shortest trees is reported, but a maximum of 1000 trees are saved. You
should not use the branch-and-bound algorithm for data sets with more than 12
taxa.
Exhaustive
The exhaustive algorithm evaluates all possible trees. Like the branch-and-bound
algorithm it will necessarily find all shortest trees, but it is very slow. For 12 taxa,
more than 600 million trees are evaluated! The only advantage over branch-and-
bound is the plotting of tree length distribution. This histogram may indicate the
’quality’ of your matrix, in the sense that there should be a tail to the left such that
few short trees are ’isolated’ from the greater mass of longer trees (but see Kitching
et al. 1998 for critical comments on this). For more than 8 taxa, the histogram is
based on a subset of tree lengths and may not be accurate.
Heuristic, nearest neighbour interchange
This heuristic algorithm adds taxa sequentially in the order they are given in the
matrix, to the branch where they give the least increase in tree length. After each
taxon is added, all trees obtainable by nearest neighbour interchanges are evaluated
in an attempt to find an even shorter tree.
Like all heuristic searches, this one is much faster than the algorithms above
and can be used for large numbers of taxa, but is not guaranteed to find all or any of
the most parsimonious trees. To decrease the likelihood of ending up on a subopti-
mal local minimum, a number of reorderings can be specified. For each reordering,
the order of input taxa will be randomly permuted and another heuristic search
attempted.
Please note: Because of the random reordering, the trees found by the heuristic
searches will normally be different each time. To reproduce a search exactly, you
will have to start the parsimony module again from the menu, using the same value
for "Random seed". This will reset the random number generator to the seed value.
Wagner
Characters are reversible and ordered, meaning that 0->2 costs more than 0->1, but
has the same cost as 2->0.
Fitch
Characters are reversible and unordered, meaning that all changes have equal cost.
This is the criterion with fewest assumptions, and is therefore generally preferable.
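The Fitch criterion can be illustrated with a small Python sketch that counts state changes for one character on a fixed tree (the tree and states are hypothetical; PAST's tree search itself is not shown):

    def fitch_length(tree, states):
        # A tree is a taxon name (leaf) or a (left, right) tuple.
        # Each intersection failure at an internal node costs one step.
        changes = 0
        def state_sets(node):
            nonlocal changes
            if isinstance(node, str):
                return {states[node]}
            a, b = map(state_sets, node)
            if a & b:
                return a & b             # compatible: no change needed
            changes += 1                 # incompatible: one extra step
            return a | b
        state_sets(tree)
        return changes

    tree = ((("A", "B"), "C"), "D")      # hypothetical 4-taxon tree
    print(fitch_length(tree, {"A": 0, "B": 0, "C": 1, "D": 1}))  # -> 1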
Dollo
Characters are ordered, but acquisition of a character state (from lower to higher
value) can happen only once. All homoplasy is accounted for by secondary rever-
sals. Hence, 0->1 can only happen once, normally relatively close to the root of
the tree, but 1->0 can happen any number of times further up in the tree. (This
definition has been debated on the PAST mailing list, especially whether Dollo
characters need to be ordered).
Bootstrap
Bootstrapping is performed when the ’Bootstrap replicates’ value is set to non-zero.
The specified number of replicates (typically 100 or even 1000) of your character
matrix are made, each with randomly weighted characters. The bootstrap value for
a group is the percentage of replicates supporting that group. A replicate supports
the group if the group exists in the majority rule consensus tree of the shortest trees
made from the replicate.
Warning: Specifying 1000 bootstrap replicates will clearly give a thousand
times longer computation time than no bootstrap! Exhaustive search with boot-
strapping is unrealistic and is not allowed.
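A sketch of how one bootstrap replicate could be generated in Python (NumPy); resampling character columns with replacement is equivalent to the random integer reweighting of characters described above:

    import numpy as np

    rng = np.random.default_rng(seed=1)  # fixed seed, cf. 'Random seed'

    def bootstrap_replicate(chars):
        # chars: taxa in rows, characters in columns.
        n = chars.shape[1]
        cols = rng.integers(0, n, size=n)    # sample columns with replacement
        return chars[:, cols]

The shortest trees are then found for each replicate, and the bootstrap value of a group is the percentage of replicates whose majority rule consensus contains the group.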
Cladogram plotting
All shortest (most parsimonious) trees can be viewed, up to a maximum of 1000
trees. If bootstrapping has been performed, a bootstrap value in percent is given
at the root of the subtree specifying each group.
Character states can also be plotted onto the tree, as selected by the ’Character’
buttons. This character reconstruction is unique only in the absence of homoplasy.
In case of homoplasy, character changes are placed as close to the root as possible,
favouring one-time acquisition and later reversal of a character state over several
independent gains (so-called accelerated transformation).
Consistency index
The per-character consistency index (ci) is defined as m/s, where m is the mini-
mum possible number of character changes (steps) on any tree, and s is the actual
number of steps on the current tree. This index hence varies from one (no homoplasy)
down towards zero (much homoplasy). The ensemble consistency
index CI is a similar index summed over all characters.
Retention index
The per-character retention index (ri) is defined as (g − s)/(g − m), where m
and s are as for the consistency index, while g is the maximal number of steps for
the character on any cladogram (Farris 1989). The retention index measures the
amount of synapomorphy on the tree, and varies from 0 to 1.
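Both indices are simple ratios; a worked Python example with hypothetical step counts:

    def ci(m, s):
        # Per-character consistency index: minimum steps / actual steps.
        return m / s

    def ri(g, s, m):
        # Per-character retention index (Farris 1989).
        return (g - s) / (g - m)

    # A binary character with minimum m = 1 step, s = 2 steps on the
    # current tree, and at most g = 3 steps on any tree:
    print(ci(1, 2))      # -> 0.5
    print(ri(3, 2, 1))   # -> 0.5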
Consensus tree
The consensus tree of all shortest (most parsimonious) trees can also be viewed.
Two consensus rules are implemented: Strict (groups must be supported by all
trees) and majority (groups must be supported by more than 50 percent of the
trees).
Bremer support
The Bremer support (decay index) of a clade is the number of extra tree-length steps that must be accepted before the clade disappears from the strict consensus. It can be computed ’manually’ in PAST as follows:
• Perform a parsimony search and note the clades in the strict consensus tree and the length N of the shortest trees (42 in our example).
• In the box for ’Longest tree kept’, enter the number N + 1 (43 in our example) and perform a new search.
• Additional clades which are no longer found in the strict consensus tree have
a Bremer support value of 1.
• For ’Longest tree kept’, enter the number N + 2 (44) and perform a new
search. Clades which now disappear in the consensus tree have a Bremer
support value of 2.
Stratigraphic congruence indices
For calculation of the stratigraphic congruence indices, the first and last appearance datums must be given in the first two columns, coded so that values increase upwards (younger levels receive higher values); you may have to use negative values to achieve this (e.g. 400 million years before present is coded as -400.0). The box "FADs/LADs in first columns" in the Parsimony dialogue must be ticked.
The Stratigraphic Congruence Index (SCI) of Huelsenbeck (1994) is defined as
the proportion of stratigraphically consistent nodes on the cladogram, and varies
from 0 to 1. A node is stratigraphically consistent when the oldest first occurrence
above the node is the same age or younger than the first occurrence of its sister
taxon (node).
The Relative Completeness Index (RCI) of Benton & Storrs (1994) is defined
as RCI = (1 − MIG/SRL) × 100 percent, where MIG (Minimum Implied Gap) is the sum
of the durations of ghost ranges and SRL is the sum of the durations of observed
ranges. The RCI can become negative, but will normally vary from 0 to 100.
The Gap Excess Ratio (GER) of Wills (1999) is defined as GER = 1 − (MIG −
Gmin)/(Gmax − Gmin), where Gmin is the minimum possible sum of ghost ranges
on any tree (that is, the sum of distances between successive FADs), and Gmax is
the maximum (that is, the sum of distances from the first FAD to all other FADs).
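A worked Python example with hypothetical durations (in millions of years), just to fix the arithmetic of the two definitions:

    def rci(mig, srl):
        # Relative Completeness Index (Benton & Storrs 1994), in percent.
        return (1.0 - mig / srl) * 100.0

    def ger(mig, gmin, gmax):
        # Gap Excess Ratio (Wills 1999).
        return 1.0 - (mig - gmin) / (gmax - gmin)

    print(rci(12.0, 40.0))       # 12 Myr of ghost range, 40 observed -> 70.0
    print(ger(12.0, 8.0, 30.0))  # -> 0.8181...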
These indices are further subjected to a permutation test, where all dates are
randomly redistributed across the different taxa 1000 times. The proportion of
permutations where the recalculated index exceeds the original index is given. If
small (e.g. p<0.05), this indicates a statistically significant departure from the null
hypothesis of no congruency between cladogram and stratigraphy (in other words,
you have significant congruency). The permutation probabilities of RCI and GER
are equal for any given set of permutations, because they are based on the same
value for MIG.
13 Biostratigraphy
Unitary associations
Typical application: Quantitative biostratigraphical correlation.
Assumptions: None.
Data needed: Presence/absence (1/0) matrix with horizons in rows, taxa in columns.
2. Superposition and co-occurrence of taxa
The superpositions and co-occurrences of taxa can be viewed in the biostrati-
graphic graph. In this graph, taxa are coded as numbers. Co-occurrences between
pairs of taxa are shown as solid blue lines. Superpositions are shown as dashed red
lines, with long dashes from the above-occurring taxon and short dashes from the
below-occurring taxon.
Some taxa may occur in so-called forbidden sub-graphs, which indicate incon-
sistencies in their superpositional relationships. Two of the several types of such
sub-graphs can be plotted in PAST: Cn cycles, which are superpositional cycles (A-
>B->C->A), and S3 circuits, which are inconsistencies of the type ’A co-occurring
with B, C above A, and C below B’. Interpretation of such forbidden sub-graphs is
described by Guex (1991).
3. Maximal cliques
Maximal cliques are groups of co-occurring taxa not contained in any larger
group of co-occurring taxa. The maximal cliques are candidates for the status of
unitary associations, but will be further processed below. In PAST, maximal cliques
receive a number and are also named after a maximal horizon in the original data
set which is identical to, or contained in (marked with asterisk), the maximal clique.
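The clique-finding step can be sketched in Python with the networkx library (an assumption for illustration; PAST's own implementation is not based on it), using a hypothetical co-occurrence graph in which an edge joins two taxa found together in at least one horizon:

    import networkx as nx

    # Taxa are coded as numbers, as in the biostratigraphic graph.
    G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)])

    # Maximal cliques: groups of co-occurring taxa not contained
    # in any larger such group (Bron-Kerbosch algorithm).
    print(sorted(map(sorted, nx.find_cliques(G))))
    # -> [[1, 2, 3], [3, 4], [4, 5]]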
4. Superposition of maximal cliques
The superpositional relationships between maximal cliques are decided by in-
specting the superpositional relationships between their constituent taxa, as com-
puted in step 2. Contradictions (some taxa in clique A occur below some taxa in
clique B, and vice versa) are resolved by a ’majority vote’. The contradictions
between cliques can be viewed in PAST.
The superpositions and co-occurrences of cliques can be viewed in the maximal
clique graph. In this graph, cliques are coded as numbers. Co-occurrences between
pairs of cliques are shown as solid blue lines. Superpositions are shown as dashed
red lines, with long dashes from the above-occurring clique and short dashes from
the below-occurring clique. Also, cycles between maximal cliques (see below) can
be viewed as green lines.
5. Resolving cycles
It will sometimes be the case that maximal cliques are now ordered in cycles: A
is below B, which is below C, which is below A again. This is clearly contradictory.
The ’weakest link’ (superpositional relationship supported by fewest taxa) in such
cycles is destroyed.
6. Reduction to unique path
At this stage, we should ideally have a single path (chain) of superpositional
relationships between maximal cliques, from bottom to top. This is however often
not the case, for example if A and B are below C, which is below D, or if we have
isolated paths without any relationships (A below B and C below D). To produce a
single path, it is necessary to merge cliques according to special rules.
7. Post-processing of maximal cliques
Finally, a number of minor manipulations are carried out to ’polish’ the result:
Generation of the ’consecutive ones’ property, reinsertion of residual virtual co-
occurrences and superpositions, and compaction to remove any generated non-
maximal cliques. For details on these procedures, see Guex (1991). The result is
the set of Unitary Associations, which can be viewed in PAST.
The unitary associations have associated with them an index of similarity from
one UA to the next, called D.
Special functionality
The implementation of the Unitary Associations method in PAST includes a num-
ber of options and functions which have not yet been described in the literature.
For questions about these, please contact us.
Ranking-scaling
Ranking-scaling (RASC; see Agterberg & Gradstein 1999) is a method for quantitative biostratigraphy. Each value in the data matrix gives the depth or level of a biostratigraphic event in a well, increasing upwards (you may want to use negative values to achieve this).
Absences are coded as zero. If only the order of events is known, this can be coded
as increasing whole numbers (ranks, with possible ties for co-occurring events)
within each well.
The implementation of ranking-scaling in PAST is not comprehensive, and
advanced users are referred to the RASC and CASC programs of Agterberg and
Gradstein.
RASC in PAST
Parameters
Well threshold: The minimum number of wells in which an event must occur
in order to be included in the analysis
Pair threshold: The minimum number of times a relationship between events
A and B must be observed in order for the pair (A,B) to be included in the ranking
step
Scaling threshold: Pair threshold for the scaling step
Tolerance: Used in the ranking step (see Agterberg & Gradstein)
Ranking
The ordering of events after the ranking step is given, with the first event at the
bottom of the list. The "Range" column indicates uncertainty in the position.
Scaling
The ordering of the events after the scaling step is given, with the first event
at the bottom of the list. For an explanation of all the columns, see Agterberg &
Gradstein (1999).
Event distribution
A plot showing the number of events in each well, with the wells ordered ac-
cording to number of events.
Scattergrams
For each well, the depth of each event in the well is plotted against the optimum
sequence (after scaling). Ideally, the events should plot in an ascending sequence.
Dendrogram
Plot of the distances between events in the scaled sequence, including a den-
drogram which may aid in zonation.
Appearance event ordination
Appearance Event Ordination (Alroy 1994, 2000) is a method for biostrati-
graphical seriation and correlation. The data input is in the same format as for Uni-
tary Associations, consisting of a presence/absence matrix with samples in rows
and taxa in columns. Samples belonging to the same section (locality) must be as-
signed the same colour, and ordered stratigraphically within each section such that
the lowermost sample enters in the lowest row. Colours can be re-used in data sets
with large numbers of sections (see alveolinid.dat for an example).
The implementation in PAST is based on code provided by John Alroy. It
includes Maximum Likelihood AEO (Alroy 2000).
Confidence intervals on stratigraphic ranges
This method (Marshall 1994) does not assume random distribution of fossil-
iferous horizons. It requires that the levels or dates of all horizons containing the
taxon are given.
The program outputs upper and lower bounds on the lengths of the confidence
intervals, using a 95 percent confidence probability, for confidence levels of 50, 80
and 95 percent. Values which cannot be calculated are marked with an asterisk
(see Marshall 1994).
14 Acknowledgments
PAST was inspired by and includes many functions found in PALSTAT, which was
programmed by P.D. Ryan with assistance from J.S. Whalley. Harper thanks the
Danish Natural Science Research Council (SNF) for support. Frits Agterberg and
Felix Gradstein allowed OH access to source code for RASC, and Peter Sadler pro-
vided source code for CONOP. Jean Guex provided a series of ideas for improve-
ment and extension of the Unitary Associations module, and tested it intensively.
John Alroy provided source code for AEO.
Many users of PAST have given us ideas for improvement and reported bugs.
Among these are Charles Galea Bonavia, Hans Arne Nakrem, Mikael Fortelius,
Knut Rognes, Julian Overnell, Kirsty Brown, Paolo Tomassetti, Jose Luis Navarrete-
Heredia, Wally Woolfenden, Erik Telie, Fernando Archuby, Ian J. Slipper, James
Gallagher, Marcio Pie, Hugo Bucher, Alexey Tesakov, Craig Macfarlane, José
Camilo Hurtado Guerrero, Wolfgang Kiessling and Bastien Wauthoz.
15 References
Adrain, J.M., S.R. Westrop & D.E. Chatterton 2000. Silurian trilobite alpha diver-
sity and the end-Ordovician mass extinction. Paleobiology 26:625-646.
Agterberg, F.P. & F.M. Gradstein. 1999. The RASC method for Ranking and
Scaling of Biostratigraphic Events. In: Proceedings Conference 75th Birthday
C.W. Drooger, Utrecht, November 1997. Earth Science Review 46(1-4):1-25.
Alroy, J. 1994. Appearance event ordination: a new biochronologic method. Pale-
obiology 20:191-207.
Alroy, J. 2000. New methods for quantifying macroevolutionary patterns and pro-
cesses. Paleobiology 26:707-733.
Anderson, M.J. 2001. A new method for non-parametric multivariate analysis of
variance. Austral Ecology 26:32-46.
Angiolini, L. & H. Bucher 1999. Taxonomy and quantitative biochronology of
Guadalupian brachiopods from the Khuff Formation, Southeastern Oman. Geobios
32:665-699.
Benton, M.J. & G.W. Storrs. 1994. Testing the quality of the fossil record: paleon-
tological knowledge is improving. Geology 22:111-114.
Bow, S.-T. 1984. Pattern recognition. Marcel Dekker, New York.
Brower, J.C. & K.M. Kyle 1988. Seriation of an original data matrix as applied to
palaeoecology. Lethaia 21:79-93.
Brown, D. & P. Rothery 1993. Models in biology: mathematics, statistics and
computing. John Wiley & Sons, New York.
Bruton, D.L. & A.W. Owen 1988. The Norwegian Upper Ordovician illaenid trilo-
bites. Norsk Geologisk Tidsskrift 68:241-258.
Clarke, K.R. 1993. Non-parametric multivariate analysis of changes in community
structure. Australian Journal of Ecology 18:117-143.
Clarke, K.R. & Warwick, R.M. 1998. A taxonomic distinctness index and its sta-
tistical properties. Journal of Applied Ecology 35:523-531.
Colwell, R.K. & J.A. Coddington. 1994. Estimating terrestrial biodiversity through
extrapolation. Philosophical Transactions of the Royal Society (Series B) 345:101-
118.
Davis, J.C. 1986. Statistics and Data Analysis in Geology. John Wiley & Sons,
New York.
Dryden, I.L. & K.V. Mardia 1998. Statistical Shape Analysis. Wiley.
Farris, J.S. 1989. The retention index and the rescaled consistency index. Cladis-
tics 5:417-419.
Ferson, S.F., F.J. Rohlf & R.K. Koehn 1985. Measuring shape variation of two-
dimensional outlines. Systematic Zoology 34:59-68.
Guex, J. 1991. Biochronological Correlations. Springer Verlag, Berlin.
Harper, D.A.T. (ed.). 1999. Numerical Palaeobiology. John Wiley & Sons, Chich-
ester.
Hennebert, M. & A. Lees. 1991. Environmental gradients in carbonate sediments
and rocks detected by correspondence analysis: examples from the Recent of Nor-
way and the Dinantian of southwest England. Sedimentology 38:623-642.
Hill, M.O. & H.G. Gauch Jr. 1980. Detrended Correspondence analysis: an im-
proved ordination technique. Vegetatio 42:47-58.
Horn, H.S. 1966. Measurement of overlap in comparative ecological studies. Amer-
ican Naturalist 100:419-424.
Huelsenbeck, J.P. 1994. Comparing the stratigraphic record to estimates of phylogeny.
Paleobiology 20:470-483.
Jolicoeur, P. 1963. The multivariate generalization of the allometry equation. Bio-
metrics 19:497-499.
Jolliffe, I.T. 1986. Principal Component Analysis. Springer-Verlag, Berlin.
Kemple, W.G., P.M. Sadler & D.J. Strauss. 1989. A prototype constrained op-
timization solution to the time correlation problem. In Agterberg, F.P. & G.F.
Bonham-Carter (eds), Statistical Applications in the Earth Sciences. Geological
Survey of Canada Paper 89-9:417-425.
Kitching, I.J., P.L. Forey, C.J. Humphries & D.M. Williams 1998. Cladistics. Oxford
University Press, Oxford.
Kowalewski, M., E. Dyreson, J.D. Marcot, J.A. Vargas, K.W. Flessa & D.P. Hall-
mann. 1997. Phenetic discrimination of biometric simpletons: paleobiological
implications of morphospecies in the lingulide brachiopod Glottidia. Paleobiology
23:444-469.
Krebs, C.J. 1989. Ecological Methodology. Harper & Row, New York.
Legendre, P. & L. Legendre. 1998. Numerical Ecology, 2nd English ed. Elsevier,
853 pp.
MacLeod, N. 1999. Generalizing and extending the eigenshape method of shape
space visualization and analysis. Paleobiology 25:107-138.
Marshall, C.R. 1990. Confidence intervals on stratigraphic ranges. Paleobiology
16:1-10.
Marshall, C.R. 1994. Confidence intervals on stratigraphic ranges: partial re-
laxation of the assumption of randomly distributed fossil horizons. Paleobiology
20:459-469.
Miller, R.L. & Kahn, J.S. 1962. Statistical Analysis in the Geological Sciences.
John Wiley & Sons, New York.
Oksanen, J. & P.R. Minchin. 1997. Instability of ordination results under changes in
input data order: explanations and remedies. Journal of Vegetation Science 8:447-
454.
Podani, J. & I. Miklos. 2002. Resemblance coefficients and the horseshoe effect in
principal coordinates analysis. Ecology 83:3331-3343.
Poole, R.W. 1974. An introduction to quantitative ecology. McGraw-Hill, New
York.
Press, W.H., S.A. Teukolsky, W.T. Vetterling & B.P. Flannery 1992. Numerical
Recipes in C. Cambridge University Press, Cambridge.
Prokoph, A., A.D. Fowler & R.T. Patterson. 2000. Evidence for periodicity and
nonlinearity in a high-resolution fossil record of long-term evolution. Geology
28:867-870.
Raup, D. & R.E. Crick. 1979. Measurement of faunal similarity in paleontology.
Journal of Paleontology 53:1213-1227.
Ryan, P.D., Harper, D.A.T. & Whalley, J.S. 1995. PALSTAT, Statistics for palaeon-
tologists. Chapman & Hall (now Kluwer Academic Publishers).
Savary, J. & J. Guex. 1999. Discrete Biochronological Scales and Unitary Asso-
ciations: Description of the BioGraph Computer Program. Memoires de Geologie
(Lausanne) 34.
Sepkoski, J.J. 1984. A kinetic model of Phanerozoic taxonomic diversity. Paleobi-
ology 10:246-267.
Strauss, D. & P.M. Sadler. 1989. Classical confidence intervals and Bayesian prob-
ability estimates for ends of local taxon ranges. Mathematical Geology 21:411-
427.
Taguchi, Y-H. & Oono, Y. In press. Novel non-metric MDS algorithm with confi-
dence level test.
Tothmeresz, B. 1995. Comparison of different methods for diversity ordering.
Journal of Vegetation Science 6:283-290.
Wills, M.A. 1999. The gap excess ratio, randomization tests, and the goodness of
fit of trees to stratigraphy. Systematic Biology 48:559-580.
Zar, J.H. 1996. Biostatistical Analysis. 3rd ed. Prentice Hall, New York.