Manual
Xuebin Zhang, Yang Feng and Rodney Chan
Contents
1 Introduction
2 Installation of R
2.1 Brief introduction to R
2.2 Installing R
2.3 Running R
3 Installation of RClimDex
4 RClimDex
4.1 Getting started on RClimDex
4.2 Quality control
4.2.1 Description of RClimDex quality control procedures
4.2.2 Loading datasets for quality control
4.2.3 Quality control main menu and parameters
4.2.4 Quality control results and plots
4.3 Indices calculation
4.3.1 Loading data files for indices calculation
4.3.2 Indices calculation main menu and parameters
Acknowledgements to version 1
The initial version of RClimDex was developed by Xuebin Zhang and Yang
Feng at the Climate Research Branch of Meteorological Service of Canada. Its
initial development was funded by the Canadian International Development
Agency through the Canada China Climate Change Cooperation (C5) Project.
Lisa Alexander, Francis Zwiers, Byron Gleason, David Stephenson, Albert Klein
Tank, Mark New, Lucie Vincent, and Tom Peterson made important contributions
to the development and testing of the package. Jose Luis Santos at CIIFEN
helped to translate this document into Spanish. Earlier versions of RClimDex
were used during CCl/CLIVAR ETCCDI workshops in Cape Town, South
Africa, May 31-June 4, 2004, and in Maceio, Brazil, August 9-14, 2004. The
lecturers and attendees of the workshops provided very valuable suggestions for
the improvement of RClimDex.
Acknowledgements to version 2
Version 2 of RClimDex was developed by Xuebin Zhang, Yang Feng and
Rodney Chan at the Climate Research Division of Environment Canada. The
new version was made possible by the Pacific Climate Impacts Consortium
and their climdex.pcic package. All indices calculations now depend on the
fast and well-tested implementation of ClimDex in the climdex.pcic package.
Release notes
Changelog
2.0:
Release version.
Updated manual.
Removed outlier checking for precipitation upper limit in quality control.
1.9-1:
Fix bug where indices calculation crashed when only one variable is present.
(the others are all NAs)
1.9:
If the base period contains years at the start or end with all missing values, the base
period will be shortened accordingly.
Longitude and Latitude are replaced by Northern Hemisphere and Southern
Hemisphere to better reflect the parameter.
Fix bug where selecting a different output directory than the input data file will
sometimes result in an error. (’/’ syntax issue)
Fix bug where outliers are set against tmax or tmin even if they are purposely
not provided. (all missing values)
NA mask rules are implemented in climdex.pcic instead of RClimDex.
Updated manual to reflect UX change for station selection.
1.8-2:
Renamed FD0, SU25, TR20, ID0 to FD, SU, TR, ID for consistency.
1.8-1:
Made some minor edits to vignettes in terms of spelling, grammar and structure.
1.8:
Change the version numbering from 0.1-8 -> 1.8.
Remove support for LS trend and plots for indices calculation.
Updated dependency from climdex.pcic version 1.1-1 to 1.1-6
Updated function calls from RClimDex to climdex.pcic for version 1.1-6 from
version 0.7-2
Fix bug where repeated indices calculation with multiple files will skip the first
file.
Fix bug where "indcal" will appear in the names of output files from the first
dataset when processing multiple files.
0.1-7:
Vignettes are added to the package.
Manual has been updated with more details.
Startup message has been added for clearer instruction.
Check for precipitation exceeding upper limit.
Output indices plot as pdf instead of jpg.
Change quality control rules to allow some datasets to be flagged without
interrupting processing.
Log messages are updated to be more natural.
Fix bug with multiple datasets: when parameters are changed because of data quality,
the change now reverts after that file instead of remaining for the rest of
the files.
Fix bug where leap days were NA instead of repeating 28th February.
0.1-6:
Datasets are padded with NA to ensure each dataset fills up a whole year.
0.1-5:
Change NA mask rules to 15 NA threshold for annual and 3 NA threshold for
monthly.
Fix bug where any NA in values resulted in invalid dates.
Fix bug where NA mask was not applied to TMAXmean and TMINmean.
Updated RX5day, RX1day. There were issues in the old RClimDex NA mask.
Updated missing marker to accept character values.
Updated thresholds such that some indices calculations are skipped when there
are too many NA.
0.1-4:
Initial package version - UX overhaul, batch processing support, indices calculation
now done by climdex.pcic instead of RClimDex.
Updated GSL. There were issues in counting one less day in old RClimDex.
(missing +1)
Updated TNn. There were issues with January, where even one NA would trigger NA
for that month. (missing na.rm=T)
1 Introduction
ClimDex is a Microsoft Excel based program that provides an easy-to-use software
package for the calculation of indices of climate extremes for monitoring
and detecting climate change. It was developed by Byron Gleason at
the National Climatic Data Center (NCDC) of NOAA, and has been used in
CCl/CLIVAR workshops on climate indices since 2001.
The original objective was to port ClimDex into an environment that does not
depend on a particular operating system. It was very natural to use R as our
platform, since R is a free and yet very robust and powerful software for statisti-
cal analysis and graphics. It runs under both Windows and Unix environments.
In 2003 it was discovered that the method used for computing percentile-based
temperature indices in ClimDex and other programs resulted in inhomogeneity
in the indices series. A fix to the problem requires a bootstrap procedure that
makes it almost impossible to implement in an Excel environment. This has
made it more urgent to develop the R based package.
2 Installation of R
2.1 Brief introduction to R
R is a language and environment for statistical computing and graphics. It is
a GNU implementation of the S language developed by John Chambers and
colleagues at Bell Laboratories (formerly AT&T, now Lucent Technologies).
S-plus provides a commercial implementation of the S language.
2.2 Installing R
RClimDex requires the base package of R (Version 2.15.2 or later). The instal-
lation of R involves a very simple procedure. First, connect to the R project
website at www.r-project.org, then follow the links to download the most recent
version of R for your computer operating system from any mirror site of CRAN.
For Microsoft Windows XP and later, download the base R Windows installer.
Run that installer and R will be automatically installed on your computer, with
a shortcut to R on your desktop. The Tcl/Tk library is included in the default
installation of R.
For Linux, download the proper precompiled binaries and follow the instructions
to install R. For other UNIX systems, you may need to download the source
code and compile it yourself.
For Mac OSX 10.9 (Mavericks) and above, download the latest version of the
R signed package. Validate the signature using pkgutil, for example by typing
pkgutil --check-signature R-3.2.1.pkg in the Terminal. Run the R signed package
to install. Select custom install during installation to enable the Tcl/Tk library.
Connect to the XQuartz website at xquartz.macosforge.org to get X11, which is
required for the graphical user interface of RClimDex. Download the latest
XQuartz image and install it. There is now an R app in Launchpad.
For Mac OSX 10.5 (Leopard) to 10.8 (Mountain Lion), download the last supported
legacy version of the R signed package for the corresponding Mac OSX
version. Validate the package by comparing its MD5 checksum with the one on the
website. For example, to compute the MD5 checksum, type md5 R-3.2.1.pkg in the
Terminal application. Run the R signed package to install and, during installation,
select custom install to enable the Tcl/Tk library. R is now installed in the
Applications folder. For Mac OSX 10.8 (Mountain Lion), connect to the XQuartz
website at xquartz.macosforge.org to get X11, which is required for the graphical
user interface of RClimDex. Download the latest XQuartz image and install it.
2.3 Running R
For Windows, double click the R icon on your desktop, or launch it through
Windows Start Menu. This usually gets you into the R user interface. R 64bit
will also be installed if your system supports 64bit. It is recommended to use
R 64bit if your system is 64bit. You may quit the program by clicking on the
top menu under File then Exit.
Under Linux, just run the command R to give you the R console. You may quit
by typing in the command q().
Under OSX, click on the R app in Launchpad, or double click the R icon in your
Applications folder. This usually gets you into the R user interface. You may
quit the program by clicking on the top menu under R then Quit R.
3 Installation of RClimDex
RClimDex is now an R package. The most recent version of RClimDex is available
from the ETCCDI website at http://etccdi.pacificclimate.org/software.shtml,
where registration is required. Please install RClimDex as a local package in
R. RClimDex now depends on the climdex.pcic package. With an internet connection,
launch R in the same directory as the RClimDex package. Then run the following
commands:
install.packages("climdex.pcic")
RClimDex has been developed under R 2.15.2. This version of RClimDex depends
on the R libraries climdex.pcic (Version 1.1-6) and PCICt (Version
0.5-4) for computing the 27 core indices, as well as the R Tcl/Tk library
(Version 2.15.2) for the graphical user interface. The required R libraries will be
downloaded and installed automatically during the installation of the RClimDex
package.
For Windows, you may change the current R working directory by clicking on
the top menu under File then Change dir..., afterwards select the directory
where you store the RClimDex package before installing.
Under OSX, please click on the top menu under Misc then Change Working Di-
rectory... to change the current R working directory. Please select the directory
where RClimDex package is stored before installing.
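Before installing, it can be useful to see which of the libraries named above are already present. The following is only a sketch in base R: the helper name missing_packages is made up for illustration and is not part of RClimDex, and calling requireNamespace loads the namespace of every package it finds.

```r
# Return the subset of package names that are not installed.
# `missing_packages` is a hypothetical helper, not an RClimDex function.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Libraries this manual lists as requirements.
needed <- c("climdex.pcic", "PCICt", "tcltk")
to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
}
```

Anything reported here that is on CRAN can then be passed to install.packages().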
4 RClimDex
4.1 Getting started on RClimDex
RClimDex can be loaded like any other R package. All dependent libraries will
also be loaded. Please type the following into the R console:
library("RClimDex")
To launch the RClimDex user interface and begin using RClimDex, type the
following into the R console:
rclimdex.start()
You may type the command into the R console again to relaunch the user
interface. RClimDex does not support concurrent sessions within the same
R console, so please launch only one user interface at a time. If you need an
additional RClimDex session, launch a separate R console in another window.
(c) Daily temperature values greater than 70 degree Celsius or less than
-70 degree Celsius.
(d) Leap days. (i.e. 29th February)
(e) All values corresponding to an impossible date. (e.g. 32nd March
2013, 12th June 20AA, etc.)
(f) Any non-numeric values.
3. Identifies outliers in daily temperature, that is, values outside a
user-defined range around the mean for that calendar day. By default the
range is the mean plus or minus 3 standard deviations. Users can choose a
different multiple of the standard deviation. Note that the outliers are
most often valid values.
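The outlier rule in item 3 can be sketched in a few lines of base R. This is an illustration under assumptions, not the RClimDex source: the function name flag_outliers and its arguments are hypothetical, and the mean and standard deviation are taken over all years available for each calendar day.

```r
# Flag daily values further than `n_sd` standard deviations from the
# mean of the same calendar day (month + day) across all years.
# `flag_outliers` is a hypothetical name used for illustration only.
flag_outliers <- function(month, day, value, n_sd = 3) {
  key   <- sprintf("%02d-%02d", month, day)
  mu    <- ave(value, key, FUN = function(v) mean(v, na.rm = TRUE))
  sigma <- ave(value, key, FUN = function(v) sd(v, na.rm = TRUE))
  lower <- mu - n_sd * sigma
  upper <- mu + n_sd * sigma
  data.frame(lower, value, upper,
             outlier = !is.na(value) & !is.na(sigma) &
               (value < lower | value > upper))
}

# Thirty years of 1st January values, the last one far from the rest.
month <- rep(1, 30)
day   <- rep(1, 30)
value <- c(rep(c(-1, 0, 1), length.out = 29), 15)
res <- flag_outliers(month, day, value)
which(res$outlier)  # 30
```

Because the suspect value itself enters the mean and standard deviation, a short record can hide an outlier; with only a handful of years no value can be more than 3 standard deviations from the mean.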
Windows and Linux users may press and hold the Shift or Ctrl key (OSX users
may use the Cmd key) to select multiple datasets. When multiple datasets are
selected, RClimDex performs quality control on all of them in sequence, using
the same parameters selected on the quality control menu. The output
filenames for multiple datasets are derived from the input filenames and
cannot be changed.
1. Output file location: The local directory where RClimDex will store the
output files from quality control.
2. Station name or code: Output filename prefix. (Only available when
processing only one dataset)
3. Number of standard deviations for temperature: The threshold for outliers
in the daily record. Any daily temperature value that differs from the mean
for that calendar day by more than this number of standard deviations will
be flagged as an outlier in the log file.
4. Missing marker: Character string as the indicator for missing values in
input dataset.
The output file location is where all outputs from RClimDex will be stored.
After a successful quality control run, you will find a csv file with the
suffix .indcal.csv, which is the post quality controlled dataset. Please select
this dataset instead of your original dataset for the indices calculation. In
addition, there is a log file in txt format with details about the quality control,
and a log subdirectory that contains plots and statistics about the quality control.
Lastly, there is the indices subdirectory, where the indices calculation
outputs will be stored. If you are running quality control on only one dataset, you
may rename the output filename.
Figure 2: File selection for quality control.
The other parameters relate to the quality control itself. They are the outlier
setting, expressed as the number of standard deviations away from the mean value,
an upper limit for precipitation values in millimeters, and the missing marker.
Please note that the default missing marker of -99.9 is always used in addition
to your user-defined missing marker.
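The effect of having two missing markers at once can be reproduced in base R with the na.strings argument of read.csv. This is a sketch of the idea, not the code RClimDex runs; the marker "M" and the column names are arbitrary choices for the example.

```r
# Treat both the default marker (-99.9) and a user-defined marker as
# missing. "M" is an arbitrary example marker, not an RClimDex default.
raw <- "1950,2,3,-99.9,-3.1,-6.8
1950,2,4,0,M,-3.6
1950,2,5,0,-0.5,-7.9"
user_marker <- "M"
obs <- read.csv(text = raw, header = FALSE,
                col.names = c("year", "month", "day", "prcp", "tmax", "tmin"),
                na.strings = c("-99.9", user_marker))
obs$prcp[1]  # NA, from the default marker
obs$tmax[2]  # NA, from the user-defined marker
```

With the character marker converted to NA on input, the tmax column is read as numeric rather than as text.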
Press the Perform Quality Control button to begin quality control and press
the Quit button to exit to main menu. Also note that quality control will not
automatically quit after completion. If you desire to perform indices calculation
on the post quality controlled dataset, simply press the Quit button to go back
to the main menu then press the Run Indices Calculation button on the main
menu.
Under the buttons there is a log message area through which RClimDex communicates
with the user. It reports the current status, including but not limited
to which file is loaded, which process is being executed, whether quality control
was performed successfully and the number of files remaining to process.
Figure 4: A sample plot from quality control. Red circles indicate values that
were flagged by quality control.
Please note that while unreasonable values are removed, other problematic data
like outliers are simply flagged in a log file and are not changed. They simply
alert the user to pay more attention and make correction only if needed.
Besides the main results of post quality controlled dataset, there are a few more
outputs from quality control of RClimDex. All of these outputs are in the log
subdirectory. Please review these log files for a better understanding of the
changes quality control made to the dataset.
(e) log/filename tminPLOT.pdf; Plot of daily minimum temperature values.
(f) log/filename dtrPLOT.pdf; Plot of daily diurnal temperature range val-
ues.
(g) log/filename prcpQC.csv; Table of all daily precipitation values that were
flagged and removed.
(h) log/filename tmaxQC.csv; Table of all daily maximum temperature values
that were flagged and removed.
(i) log/filename tminQC.csv; Table of all daily minimum temperature values
that were flagged and removed.
(j) log/filename tepstdQC.csv; Table of all outlier temperature values that
were only flagged.
(k) log/filename nastatistic.csv; Summary table of the annual and monthly
sums of values flagged and removed.
For the outlier tables, there are twelve columns: three columns each for date,
daily maximum temperature, daily minimum temperature and diurnal temperature
range. For each temperature variable, the columns are the lower bound of accepted
values, the temperature value of that day and the upper bound of accepted
values. The bounds are set by the user-defined number of standard
deviations away from the mean. A record is added to the outlier table if any
temperature value is flagged as an outlier for that day.
All plots in quality control provide a quick visual understanding of where values
are flagged. As seen in Figure 4, the red circles in the plots are where values
are flagged. For daily precipitation values, a histogram is provided as well.
Users may want to check and review the values flagged by the RClimDex quality
control prior to indices calculation. You may edit the post quality controlled
dataset (.indcal.csv file) with a spreadsheet editor such as Excel or
Numbers on Windows and OSX, or with any text editor on Linux.
Windows and Linux users may press and hold the Shift or Ctrl key (OSX users
may use the Cmd key) to select multiple files. When multiple data files are selected,
RClimDex performs indices calculation on all selected datasets in sequence, using
the same parameters and indices selected on the indices calculation menu.
The output filenames for multiple files are derived from their original
filenames and cannot be changed.
1. Output file location: The local directory where RClimDex will store the
output files from indices calculation.
2. Station name or code: Output filename prefix. (Only available when
processing only one dataset)
3. Missing marker: Character string as the indicator for missing values in
input dataset.
4. Base period: The base period should be long enough, preferably
more than 10 years, and one in which the climatology does not change much
for a given station.
(a) First year: Starting year of base period. (Always begin on 1st of
January)
(b) Last year: Ending year of base period. (Always end on 31st of De-
cember)
5. Station location: This affects indices such as CSDI and GSL, which depend on
the hemisphere in which the station is located. Please select datasets from the
same hemisphere when processing multiple datasets. The only options are
Northern Hemisphere and Southern Hemisphere.
6. Threshold of daily maximum temperature:
(a) Upper: User defined threshold for summer days, similar to SU.
(b) Lower: User defined threshold for ice days, similar to ID.
7. Threshold of daily minimum temperature:
(a) Upper: User defined threshold for tropical nights, similar to TR.
(b) Lower: User defined threshold for frost days, similar to FD.
8. Threshold of precipitation (mm): User-defined threshold for the count of
days with daily precipitation at or above it, similar to R10 and R20.
9. Indices selections: Please refer to Appendix A for more information;
full definitions can be found in Appendix C.
The output file location is where all outputs from RClimDex will be stored.
There is a log file in txt format about the indices calculation. There is also
the indices subdirectory, where the indices calculation outputs will
be stored. You may rename the output filename only if you are not processing
multiple datasets. The current version of RClimDex does not support a different
selection of indices for each individual dataset when processing multiple
datasets. In other words, only the indices selected are calculated for multiple
datasets.
Figure 5: File selection for indices calculation.
The base period always begins on the 1st of January and ends on the 31st of
December of the user-defined years. RClimDex takes the hemisphere from
the station location radio button (Northern or Southern Hemisphere).
Keep all datasets from the same hemisphere when processing
multiple datasets, because RClimDex applies the same parameters, including the
hemisphere, to every dataset. The temperature thresholds are for the
user-defined frost days, summer days, ice days and tropical
nights, while the precipitation threshold is for the user-defined number of days with
daily precipitation at or above the threshold.
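As an illustration of what a user-defined threshold does, the sketch below counts the days per year on which a variable passes a threshold. It is not the climdex.pcic implementation: count_days is a hypothetical helper, and days with missing values are simply not counted, whereas RClimDex applies its own missing-data rules.

```r
# Count, per year, the days on which a variable passes a threshold.
# `count_days` is a hypothetical helper for illustration.
count_days <- function(year, value, threshold, op = `>`) {
  hits <- op(value, threshold)
  tapply(hits, year, sum, na.rm = TRUE)
}

year <- c(2000, 2000, 2000, 2001, 2001, 2001)
tmax <- c(24.0, 26.5, 31.0, 25.0, NA, 30.2)
prcp <- c(0.0, 12.4, 30.1, 5.0, 25.0, 24.9)

count_days(year, tmax, 25)        # 2000: 2, 2001: 1 (days with TMAX > 25)
count_days(year, prcp, 25, `>=`)  # 2000: 1, 2001: 1 (days with PRCP >= 25 mm)
```

Passing the comparison as an argument lets the same helper cover both the "above" style used for temperature thresholds and the "at or above" style used for precipitation.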
RClimDex communicates with the user in the log message box below the but-
tons. This is where details about the current status, such as which dataset is
loaded, what process is being performed, number of datasets remaining, etc. are
displayed.
A List of ETCCDI core Climate Indices
ID | Indicator name | Definition | Units
GSL | Growing season length | Annual (1st Jan to 31st Dec in NH, 1st July to 30th June in SH) count between first span of at least 6 days with TG > 5°C and first span after 1st July (1st January in SH) of 6 days with TG < 5°C | days
WSDI | Warm spell duration indicator | Annual count of days with at least 6 consecutive days when TX > 90th percentile | days
CSDI | Cold spell duration indicator | Annual count of days with at least 6 consecutive days when TN < 10th percentile | days
SDII | Simple daily intensity index | Annual total precipitation divided by the number of wet days (defined as PRCP >= 1.0mm) in the year | mm/day
R10 | Number of heavy precipitation days | Annual count of days when PRCP >= 10mm | days
R20 | Number of very heavy precipitation days | Annual count of days when PRCP >= 20mm | days
Rnn | Number of days above nn mm | Annual count of days when PRCP >= nn mm, nn is user defined threshold | days
R95p | Very wet days | Annual total PRCP when RR > 95th percentile | mm
R99p | Extremely wet days | Annual total PRCP when RR > 99th percentile | mm
PRCPTOT | Annual total wet-day precipitation | Annual total PRCP in wet days (RR >= 1mm) | mm
B Data format
All indices calculation results are written as comma-separated values (CSV) files
for tables and portable document format (PDF) files for plots. RClimDex likewise
accepts CSV files as input data. It also accepts space-delimited
ASCII text files, in which each element is separated by one or more spaces.
The input dataset must satisfy the following
requirements.
1. The input dataset must have the file extension .csv or .txt
2. Columns must be YEAR MONTH DAY PRCP TMAX TMIN in that
order
3. The records must be in calendar date order. Missing dates are allowed.
1950,2,3,-99.9,-3.1,-6.8
1950,2,4,0,-1.3,-3.6
1950,2,5,0,-0.5,-7.9
1950,2,6,11.4,-1,-9.1
1950,2,9,0,-1.8,-8.4
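The sample rows above can be checked against the requirements with a few lines of base R. This is a sketch for illustration: the variable names are arbitrary and RClimDex performs its own checks during quality control.

```r
# Read the sample rows and check that records are in calendar order.
sample_rows <- "1950,2,3,-99.9,-3.1,-6.8
1950,2,4,0,-1.3,-3.6
1950,2,5,0,-0.5,-7.9
1950,2,6,11.4,-1,-9.1
1950,2,9,0,-1.8,-8.4"
obs <- read.csv(text = sample_rows, header = FALSE,
                col.names = c("YEAR", "MONTH", "DAY", "PRCP", "TMAX", "TMIN"))

# An explicit format makes impossible dates come back as NA.
dates <- as.Date(sprintf("%d-%02d-%02d", obs$YEAR, obs$MONTH, obs$DAY),
                 format = "%Y-%m-%d")
steps <- as.numeric(diff(dates))
in_order <- all(steps > 0)  # TRUE: strictly increasing dates
gaps <- which(steps > 1)    # 4: dates are missing after the 4th record
```

The gap between 6th and 9th February is allowed by requirement 3; only the ordering is mandatory.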
C Indices definition
Definitions for indicators are listed in Appendix A. For practical reasons, in
this version of the software, not all indices are calculated on a monthly basis.
Monthly indices are calculated if no more than 3 days are missing in a month,
while annual values are calculated if no more than 15 days are missing in a year.
No annual values will be calculated if any one month’s data are missing. For
threshold indices, a threshold is calculated if at least 75% of data are present.
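The missing-data rule above can be sketched as follows. The function names are hypothetical, and reading "any one month's data are missing" as "any month fails the monthly rule" is an assumption made for this example rather than something stated in this manual.

```r
# A month is usable if at most 3 days are missing. A year is usable if
# at most 15 days are missing and (assumption) every month is usable.
month_ok <- function(values) sum(is.na(values)) <= 3
year_ok <- function(values, month) {
  sum(is.na(values)) <= 15 && all(tapply(values, month, month_ok))
}

dates  <- seq(as.Date("2001-01-01"), as.Date("2001-12-31"), by = "day")
month  <- as.integer(format(dates, "%m"))
values <- rep(10, length(dates))

spread <- values
spread[c(1, 40, 70, 100, 130)] <- NA  # one missing day in each of five months
clumped <- values
clumped[1:4] <- NA                    # four missing days in January

year_ok(spread, month)   # TRUE
year_ok(clumped, month)  # FALSE: January exceeds the monthly limit
```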
For spell duration indicators (marked with a *), a spell can continue into the
next year and is counted against the year in which the spell ends.
A cold spell (CSDI) for example in the Northern Hemisphere beginning on 31st
December 2000 and ending on 6th January 2001 is counted towards the total
number of cold spells in 2001.
1. FD
5. GSL
Let $T_{ij}$ be the daily mean temperature on day i in period j. Count the
number of days between the first occurrence of at least 6 consecutive days
with:
$$T_{ij} > 5^{\circ}\mathrm{C}$$
and the first occurrence after 1st July (1st January in SH) of at least 6
consecutive days with:
$$T_{ij} < 5^{\circ}\mathrm{C}$$
10. Tn10p
Let $Tn_{ij}$ be the daily minimum temperature on day i, period j and let
$Tn_{in}10$ be the calendar day 10th percentile centered on a 5-day window
(calculated using method in Appendix D). The percentage of time is
determined where:
$$Tn_{ij} < Tn_{in}10$$
11. Tx10p
Let $Tx_{ij}$ be the daily maximum temperature on day i, period j and let
$Tx_{in}10$ be the calendar day 10th percentile centered on a 5-day window
(calculated using method in Appendix D). The percentage of time is
determined where:
$$Tx_{ij} < Tx_{in}10$$
12. Tn90p
Let $Tn_{ij}$ be the daily minimum temperature on day i, period j and let
$Tn_{in}90$ be the calendar day 90th percentile centered on a 5-day window
(calculated using method in Appendix D). The percentage of time is
determined where:
$$Tn_{ij} > Tn_{in}90$$
13. Tx90p
Let $Tx_{ij}$ be the daily maximum temperature on day i, period j and let
$Tx_{in}90$ be the calendar day 90th percentile centered on a 5-day window
(calculated using method in Appendix D). The percentage of time is
determined where:
$$Tx_{ij} > Tx_{in}90$$
14. WSDI*
Let $Tx_{ij}$ be the daily maximum temperature on day i, period j and let
$Tx_{in}90$ be the calendar day 90th percentile centered on a 5-day window
(calculated using method in Appendix D). Then the number of days per
period is summed where, in intervals of at least 6 consecutive days:
$$Tx_{ij} > Tx_{in}90$$
15. CSDI*
Let $Tn_{ij}$ be the daily minimum temperature on day i, period j and let
$Tn_{in}10$ be the calendar day 10th percentile centered on a 5-day window
(calculated using method in Appendix D). Then the number of days per
period is summed where, in intervals of at least 6 consecutive days:
$$Tn_{ij} < Tn_{in}10$$
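Counting the days that fall inside runs of at least 6 consecutive days is the core of both spell duration indicators. A minimal sketch in base R follows. It assumes the exceedance test has already been done and is passed in as a logical vector, uses the hypothetical name spell_days, treats missing days as breaking a spell, and does not handle spells that continue across a year boundary.

```r
# Count days that belong to runs of at least `min_len` consecutive TRUE
# values. `spell_days` is a hypothetical helper for illustration.
spell_days <- function(exceeds, min_len = 6) {
  exceeds[is.na(exceeds)] <- FALSE  # a missing day breaks the spell
  runs <- rle(exceeds)
  sum(runs$lengths[runs$values & runs$lengths >= min_len])
}

# TRUE marks days on which the daily value passes its percentile threshold.
warm <- c(rep(TRUE, 7), FALSE, rep(TRUE, 3), FALSE, rep(TRUE, 6))
spell_days(warm)  # 13: the 7-day and 6-day runs count, the 3-day run does not
```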
16. DTR
Let $Tx_{ij}$ and $Tn_{ij}$ be the daily maximum and minimum temperature
respectively on day i, period j. If I represents the number of days in j, then:
$$DTR_j = \frac{\sum_{i=1}^{I} (Tx_{ij} - Tn_{ij})}{I}$$
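17. RX1day
Let $RR_{ij}$ be the daily precipitation amount on day i in period j. Then
the maximum 1-day value for period j is:
$$RX1day_j = \max(RR_{ij})$$
18. RX5day
Let $RR_{kj}$ be the precipitation amount for the 5-day interval ending
on day k in period j. Then the maximum 5-day value for period j is:
$$RX5day_j = \max(RR_{kj})$$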
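The two maxima can be written directly in base R. This sketch is not the climdex.pcic code: the function names are chosen for the example, and a 5-day window that contains a missing value is simply dropped.

```r
# Maximum 1-day and 5-day precipitation for one period.
rx1day <- function(prcp) max(prcp, na.rm = TRUE)
rx5day <- function(prcp) {
  # Sum over the 5-day window ending on each day k. The first four
  # positions have no complete window and come back as NA.
  rolling <- stats::filter(prcp, rep(1, 5), sides = 1)
  max(rolling, na.rm = TRUE)
}

prcp <- c(0, 2, 10, 0, 5, 20, 1, 0, 0, 3)
rx1day(prcp)  # 20
rx5day(prcp)  # 37, from the window 2 + 10 + 0 + 5 + 20
```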
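19. SDII
Let $RR_{wj}$ be the daily precipitation amount on wet days w
($RR \geq 1\,\mathrm{mm}$) in period j. If W represents the number of wet
days in j, then:
$$SDII_j = \frac{\sum_{w=1}^{W} RR_{wj}}{W}$$
20. R10
Let $RR_{ij}$ be the daily precipitation amount on day i in period j. Count
the number of days where:
$$RR_{ij} \geq 10\,\mathrm{mm}$$
21. R20
Let $RR_{ij}$ be the daily precipitation amount on day i in period j. Count
the number of days where:
$$RR_{ij} \geq 20\,\mathrm{mm}$$
22. Rnn
Let $RR_{ij}$ be the daily precipitation amount on day i in period j. If nn
is the user-defined threshold in mm, count the number of days where:
$$RR_{ij} \geq nn$$
23. CDD*
Let $RR_{ij}$ be the daily precipitation amount on day i in period j. Count
the largest number of consecutive days where:
$$RR_{ij} < 1\,\mathrm{mm}$$
24. CWD*
Let $RR_{ij}$ be the daily precipitation amount on day i in period j. Count
the largest number of consecutive days where:
$$RR_{ij} \geq 1\,\mathrm{mm}$$
25. R95p
Annual total precipitation on days when RR is above the 95th percentile (see
Appendix A).
26. R99p
Annual total precipitation on days when RR is above the 99th percentile (see
Appendix A).
27. PRCPTOT
Let $RR_{wj}$ be the daily precipitation amount on wet days w
($RR \geq 1\,\mathrm{mm}$) in period j. If W represents the number of wet
days in j, then:
$$PRCPTOT_j = \sum_{w=1}^{W} RR_{wj}$$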
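Three of the precipitation indices listed here share the same wet-day test and can be sketched together for a single period. The function wet_day_stats is hypothetical and follows the Appendix A wording (wet day means at least 1 mm); it is not the climdex.pcic implementation and ignores spells that cross a year boundary.

```r
# PRCPTOT, SDII and CWD for one period, with wet days defined as
# precipitation >= `wet` mm. Hypothetical helper for illustration.
wet_day_stats <- function(prcp, wet = 1) {
  is_wet   <- !is.na(prcp) & prcp >= wet
  prcptot  <- sum(prcp[is_wet])
  runs     <- rle(is_wet)
  wet_runs <- runs$lengths[runs$values]
  list(PRCPTOT = prcptot,
       SDII    = if (any(is_wet)) prcptot / sum(is_wet) else NA_real_,
       CWD     = if (length(wet_runs) > 0) max(wet_runs) else 0L)
}

prcp <- c(0, 0.5, 4, 12, 1, 0, 0, 25, 3, 0)
s <- wet_day_stats(prcp)
s$PRCPTOT  # 45
s$SDII     # 9
s$CWD      # 3
```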
D Threshold estimation and base period temperature indices calculation
D.1 Empirical quantile estimation
The quantile of a distribution is defined as
$$Q(p) = F^{-1}(p) = \inf\{x : F(x) \geq p\}, \quad 0 < p < 1$$
where F(x) is the distribution function. Let $\{X_{(1)}, ..., X_{(n)}\}$ denote the
order statistics of $\{X_1, ..., X_n\}$ (i.e. sorted values of {X}), and let
$\hat{Q}_i(p)$ denote the ith sample quantile definition. The sample quantiles
can be generally written as
$$\hat{Q}_i(p) = (1 - \gamma) X_{(j)} + \gamma X_{(j+1)}$$
Hyndman and Fan (1996) suggest a formula to obtain a median-unbiased
estimate of the quantile by letting j = int(p × n + (1 + p)/3) and letting γ =
p × n + (1 + p)/3 − j, where int(u) is the largest integer not greater than u.
The empirical quantile is set to the smallest or largest value in the sample
when j < 1 or j > n respectively. That is, quantile estimates corresponding to
p < 1/(n+1) are set to the smallest value in the sample, and those corresponding
to p > n/(n + 1) are set to the largest value in the sample.
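The estimator above is short enough to write out and compare with base R, where quantile() with type = 8 is the median-unbiased definition of Hyndman and Fan (1996). The function hf_quantile is a sketch for illustration, not code from RClimDex or climdex.pcic.

```r
# Median-unbiased empirical quantile for a single probability p:
# j = int(p*n + (1+p)/3) and gamma = p*n + (1+p)/3 - j.
hf_quantile <- function(x, p) {
  x <- sort(x)  # order statistics (sort drops NA by default)
  n <- length(x)
  h <- p * n + (1 + p) / 3
  j <- floor(h)
  g <- h - j
  if (j < 1) return(x[1])
  # X_(n+1) does not exist, so j == n is also mapped to the largest value.
  if (j >= n) return(x[n])
  (1 - g) * x[j] + g * x[j + 1]
}

x <- c(2.1, 3.5, 1.0, 7.2, 5.4, 4.8, 6.3, 0.9, 8.8, 3.3)
hf_quantile(x, 0.9)
quantile(x, 0.9, type = 8)  # same estimate from base R
```

Very small or very large p fall back to the sample minimum or maximum, as described above.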
(a) The base period is divided into one out-of-base year, the year for which
exceedance is to be estimated, and a base-period consisting of the remaining
years, from which the thresholds are estimated.
(b) An n-year block of data is constructed by using the n − 1 year base-period
data set and adding an additional year of data from the base-period (i.e.
one of the years in the base-period is repeated). This constructed n-year
block is used to estimate thresholds.
(c) The out-of-base year is then compared with these thresholds and the
exceedance rate for the out-of-base year is obtained.
(d) Steps (b) and (c) are repeated an additional n − 2 times, by repeating
each of the remaining n − 2 in-base years in turn to construct the n-year
block.
(e) The final index for the out-of-base year is obtained by averaging the n − 1
estimates obtained from steps (b), (c) and (d).
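Steps (a) to (e) can be sketched in base R as below. This is a simplified illustration, not the climdex.pcic implementation: bootstrap_exceedance is a hypothetical name, the threshold for each calendar day is estimated from that day alone (the 5-day window is left out for brevity), and the data are random numbers.

```r
# `temps` has one row per base-period year and one column per calendar
# day. Returns, for each year, the percentage of days above the
# threshold, averaged over the n - 1 constructed blocks.
bootstrap_exceedance <- function(temps, p = 0.9) {
  n <- nrow(temps)
  sapply(seq_len(n), function(out) {
    base <- temps[-out, , drop = FALSE]             # step (a)
    rates <- sapply(seq_len(n - 1), function(rep_year) {
      block <- rbind(base, base[rep_year, ])        # step (b)
      thr <- apply(block, 2, quantile, probs = p, type = 8, na.rm = TRUE)
      mean(temps[out, ] > thr, na.rm = TRUE) * 100  # step (c)
    })                                              # step (d)
    mean(rates)                                     # step (e)
  })
}

set.seed(1)
temps <- matrix(rnorm(10 * 60, mean = 15, sd = 5), nrow = 10)
rates <- bootstrap_exceedance(temps)
length(rates)  # 10, one averaged exceedance rate per base-period year
```

The out-of-base year never contributes to its own thresholds, which is the point of the procedure.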
References
[1] Hyndman, R.J., and Y. Fan, 1996: Sample quantiles in statistical packages.
The American Statistician, 50, 361-367.
[2] Zhang, X., G. Hegerl, F.W. Zwiers, and J. Kenyon, 2005: Avoiding inhomo-
geneity in percentile-based indices of temperature extremes. J. Climate, 18,
1647-1648.