
Data Analysis with CDO 239

The option -b is needed when working with packed netCDF files (see
Section 5.3).

9.3.2 CDO Operator Categories


CDO has 791 operators (v1.7.0). Each of them performs a specific operation when
applied to an input file. Only some of the most frequently used operators are
discussed here. However, the syntax of CDO commands follows fairly simple
rules (see Section 9.3). Once these are understood, it becomes straightforward
to switch from one operator to another, as long as it is clear what the operator does
to the file.
Each operator is documented in detail in the CDO User’s Guide. CDO operators may
be categorised as shown in Table 9.3.2.1.

Table 9.3.2.1: CDO operator categories.

Category          Description

File information  Print file information
File operations   Copy, split or merge files
Selection         Select (extract) parts of a dataset
Comparison        Compare datasets
Modification      Modify datasets
Arithmetic        Arithmetic operations on datasets
Statistics        Spatial, vertical, temporal and ensemble statistics
Correlation       Correlation statistics
Regression        Regression and trend analysis
Interpolation     Spatial, vertical and time interpolation
Transformation    Spectral, divergence and vorticity analysis

When using an operator it is critical to know exactly what computation or
manipulation the operator performs when applied to a file. If in doubt,
consult the CDO User’s Guide (see Section 9.2).

9.3.3 Using Multiple CDO Operators


One feature that makes CDO extremely powerful is that multiple operators can be
used in a single command. This allows single commands to be constructed that
perform quite complex file operations.
If multiple operators are used, a hyphen (-) must be added before each operator
except the leftmost one, which is executed last.
Operators are executed in sequence from right to left. This is important to remember
because the order in which the operators manipulate the data often matters. For instance,
the following example first crops the input file to a smaller spatial domain defined
by longitude and latitude boundaries using the sellonlatbox operator and then
calculates the spatial mean for each timestep using the fldmean operator. The
operators are applied to the input file ifile.nc and the result of the file operations is
saved in the output file ofile.nc.

cdo fldmean -sellonlatbox,12,25,-10,10 ifile.nc ofile.nc

Note that parameter values (longitude and latitude boundaries for the Congo Basin)
are passed to the sellonlatbox operator (see Section 9.3.4).
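The right-to-left execution order can be thought of as function composition. The following Python sketch illustrates the idea with hypothetical stand-in functions (these are not CDO's internals; the data values are invented for illustration):

```python
# Conceptual sketch of CDO operator chaining: the rightmost operator
# on the command line is applied to the data first.
def fldmean(field):
    # Hypothetical stand-in: average all grid points for each timestep.
    return [sum(step) / len(step) for step in field]

def sellonlatbox(field, idx):
    # Hypothetical stand-in: keep a subset of grid points per timestep.
    return [[v for i, v in enumerate(step) if i in idx] for step in field]

# A command of the form "cdo fldmean -sellonlatbox,... ifile.nc ofile.nc"
# corresponds to fldmean(sellonlatbox(data)): the inner (rightmost) call runs first.
data = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]  # 2 timesteps, 4 grid points
result = fldmean(sellonlatbox(data, {0, 1}))
print(result)  # spatial mean of the selected points, one value per timestep
```

Reversing the composition would average the whole field first and then try to crop a single point, which is why the operator order matters.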

9.3.4 CDO Operator Parameters


Some operators require additional information, which is passed to them via parameters.
Parameters are added after their corresponding operator, separated by
commas (,) and without any white space. A parameter can be a string, a float
or an integer value, as demonstrated in the following three examples.
In the first example, the parameter JJA (a value of type string) is passed to the
operator selseason (select seasons) in order to extract the June, July and August
(JJA) season from the input file ifile.nc.

cdo selseason,JJA ifile.nc ofile.nc

In the second example, the value 273.15 (a value of type float) is passed to the operator
subc (subtract constant), which subtracts it from each data value in the input
file (i.e., converting temperature values from Kelvin to degrees Celsius).

cdo subc,273.15 ifile.nc ofile.nc

In the third example, the integer values 5 to 9 are passed to the operator
selmonth (select months) in order to extract the corresponding months from the
input file.

cdo selmonth,5,6,7,8,9 ifile.nc ofile.nc

No white space is allowed between an operator and its associated parameter(s).
Only commas are used to separate the operator and parameter(s).

A shortcut for passing sequential lists of integer values, such as months or years, to
operators is to separate the first and last value of the list with a forward slash (/), as in
the following example. The values between the first and last list element are filled in
automatically when the operator is applied. This is especially useful for long value
lists. The following example produces the same output file as the previous
example.

cdo selmonth,5/9 ifile.nc ofile.nc
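The slash notation simply expands to the inclusive integer sequence. A small Python sketch of the expansion (a hypothetical helper for illustration, not CDO code):

```python
def expand_list(params):
    """Expand a CDO-style parameter list: '5/9' becomes 5,6,7,8,9 (inclusive)."""
    values = []
    for item in params.split(","):
        if "/" in item:
            first, last = (int(x) for x in item.split("/"))
            values.extend(range(first, last + 1))  # last value is included
        else:
            values.append(int(item))
    return values

print(expand_list("5/9"))        # [5, 6, 7, 8, 9]
print(expand_list("5,6,7,8,9"))  # the explicit form gives the same list
```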

9.3.5 CDO Command Input and Output Files


Most CDO operators expect an input filename and most of the time also an output
filename. Within the CDO command syntax the input filename comes after the
option(s) and before the output filename.
Input and output filenames can include paths that point to different locations on the
Unix server from which input files are read or to which output files are saved
(see Section 3.4.4 for a description of full and relative paths).
Most operators expect only a single input filename and a single output filename. In
some cases multiple input files are expected, for instance, when merging netCDF
files. Information operators such as info, sinfo or griddes expect only a single input
filename.

9.4 Merging Files


Quite often datasets are split over many files, typically containing either
different variables or different time periods such as months or
years. Table 9.4.1 lists operators that can be used to merge or concatenate such split
datasets when required. Each of these operators accepts more than one input filename.

Table 9.4.1: CDO operators for merging files.

Operator    Description                                 Conditions on input files

merge       Merges time series of different fields      Same number of timesteps;
            from several input datasets.                different variables OR different
                                                        levels of the same variable
mergetime   Merges all timesteps of all input files     Same structure with the same
            sorted by date and time.                    variables on different timesteps
copy        Copies all input datasets to an outfile.    Same structure with the same
                                                        variables on different timesteps
cat         Concatenates all input datasets and         Same structure with the same
            appends the result to the end of outfile.   variables on different timesteps
The operator merge can be used when merging files that have the same number of
timesteps and have either different variables on the same vertical levels or the
same variables on different vertical levels. Mixing both conditions is not possible.
The operator mergetime allows files to be merged that have different timesteps but
otherwise the same structure and the same variables. The data fields in the
output file will be sorted by date and time. Assume the following list of 12 input
files. Each file contains hourly 2m dewpoint temperature data for one month of the
year 2015.

era5_hourly_d2m_201501.nc
era5_hourly_d2m_201502.nc
era5_hourly_d2m_201503.nc
era5_hourly_d2m_201504.nc
era5_hourly_d2m_201505.nc
era5_hourly_d2m_201506.nc
era5_hourly_d2m_201507.nc
era5_hourly_d2m_201508.nc
era5_hourly_d2m_201509.nc
era5_hourly_d2m_201510.nc
era5_hourly_d2m_201511.nc
era5_hourly_d2m_201512.nc

To merge these files into a single file in which all timesteps are sorted correctly, the
following command can be used. Note that the special character question mark (?)
is used here to generate the list of input files (see Section 3.4.5 for special characters).

cdo mergetime era5_hourly_d2m_2015??.nc era5_hourly_d2m_2015.nc

The operators copy and cat can be used to achieve results similar to those of
mergetime. The exact differences, however, are not well documented in the CDO
User’s Guide. The conditions for the input files are the same as those for mergetime
(see Table 9.4.1).
While the operator copy can be used for single-file operations, such as converting to
netCDF or adding a relative time axis, it can also be used to concatenate datasets that
have different timesteps but otherwise the same structure and the same variables.
One of the differences is that the operators copy and cat will not automatically sort
the data fields by date and time. Therefore, the order of the list of input files matters
here.

Care should be taken if the operators copy and cat are used to merge files
because the data fields are not automatically sorted by date and time as
done by the operator mergetime.
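The distinction can be pictured by treating each input file as a list of (timestamp, field) records. The following Python sketch is conceptual only (not CDO internals; filenames and values are invented): a mergetime-like merge sorts the combined records, while a cat-like merge keeps the input order as given.

```python
# Conceptual sketch: mergetime sorts merged timesteps by date/time,
# whereas copy/cat simply concatenate the inputs in the order given.
file_feb = [("2015-02-01", 2.0)]  # hypothetical one-timestep "files"
file_jan = [("2015-01-01", 1.0)]

def mergetime(*files):
    records = [rec for f in files for rec in f]
    return sorted(records)  # sorted by timestamp

def cat(*files):
    return [rec for f in files for rec in f]  # input order preserved

print(mergetime(file_feb, file_jan))  # January comes first despite file order
print(cat(file_feb, file_jan))        # February comes first: order matters!
```

This is why the order of the input file list is irrelevant for mergetime but critical for copy and cat.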

The following command will achieve the same result as the mergetime example
above only because the special characters ?? produce a list of input
files in which the data fields across all files are already sorted by date and time.

cdo copy era5_hourly_d2m_2015??.nc era5_hourly_d2m_2015.nc

The operator cat works in a similar way to mergetime and copy, but it can also be used
to append data fields to an existing file. If the file does not exist it will be created.
Therefore, the following command will achieve the same result as the mergetime
and copy examples above, as long as the output file does not already exist and the list
of input files is sorted by date and time.

cdo cat era5_hourly_d2m_2015??.nc era5_hourly_d2m_2015.nc

If the number of input files is very large then the use of the cat operator is
recommended for merging files.

9.5 Selections

Selection operators are frequently used to select parts or specific features of a
netCDF file, such as temporal and spatial subsets or subsets of variables. Selection
operator names start with sel followed by the feature to be selected (e.g., selmonth). If
the selected feature is saved into a new file without any further processing, the process
may also be referred to as extraction. Some of the more frequently used selection
operators are discussed in the following sub-sections (Section 9.5.1 to Section 9.5.4).

9.5.1 Selecting Variables


In some cases a single netCDF file may contain multiple data variables, and it may
be desirable to extract one or more variables of interest. The variable(s) of interest
can be referred to by the parameter identifier using the operator selparam, by the
code number using the operator selcode, or by the variable name using the operator
selname. In the following example the variables t2m and d2m are extracted from the
input file ifile.nc and saved in the output file ofile.nc.

cdo selname,t2m,d2m ifile.nc ofile.nc

9.5.2 Selecting Spatial Subsets (Geographical Regions)


Climate studies often focus on a specific region of the world such as continents or
even smaller sub-regions such as the Congo Basin. Extracting a spatial subset from
a global dataset can in some cases significantly reduce file size and processing time.
Spatial subsetting is also necessary when calculating a time series based on averages
for a specific region. A rectangular spatial subset can be selected using the operator
sellonlatbox or selindexbox, depending on whether the region boundaries are defined
by longitude and latitude values or by their corresponding index values (see Section
4.3).
Both operators expect four parameters that define the sub-region
boundaries. For the sellonlatbox operator the boundaries are defined
using longitude and latitude coordinate values in the form lon1,lon2,lat1,lat2,
where lon1 and lon2 are the western and eastern boundaries, respectively, and
lat1 and lat2 are the southern and northern boundaries, respectively. Similarly,
the corresponding index values can be used in the same order with the selindexbox
operator (idx1,idx2,idy1,idy2).
In the following example the region of northern Africa covering an area from the
Equator to 40°N and from 20°W to 40°E is extracted from a netCDF file.

cdo sellonlatbox,-20,40,0,40 ifile.nc ofile.nc

Longitude values between -180° and 180° can be used with the
sellonlatbox operator even if the longitudes in the file range from 0° to
360° (Pacific-centred maps); CDO converts them internally. Similarly,
longitude values between 0° and 360° can be used with Africa-centred
files (longitudes ranging from -180° to 180°).
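The internal conversion mentioned above amounts to mapping longitudes between the two conventions. A minimal sketch of such a mapping (my own helper functions, not CDO's implementation):

```python
def lon_to_0_360(lon):
    """Map a longitude from the -180..180 convention to 0..360."""
    return lon % 360.0

def lon_to_pm180(lon):
    """Map a longitude from the 0..360 convention to -180..180."""
    return ((lon + 180.0) % 360.0) - 180.0

print(lon_to_0_360(-20.0))  # 340.0 (20W expressed on a Pacific-centred grid)
print(lon_to_pm180(340.0))  # -20.0 (back to the Africa-centred convention)
```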

9.5.3 Selecting Vertical Levels


As described in Section 4, climate models generate data output as 2-dimensional
(surface) fields (e.g., 2m air temperature) or 3-dimensional (upper air) fields (e.g.,
air temperature or specific humidity). The types of vertical levels used for the latter are
discussed in more detail in Section 4.7. One or more vertical levels can be selected
from such files using the sellevel or sellevidx operator.
For the sellevel operator, the level or levels of interest must be specified
using the values stored in the vertical dimension variable. To find out which values
are stored in the variable associated with the vertical dimension, the file information
operator showlevel can be used.
In the following example the data field associated with the 925 hPa pressure level is
extracted from a file.

cdo sellevel,925 ifile.nc ofile.nc

When using the sellevidx operator the levels to be selected are specified using the
corresponding index values.

9.5.4 Selecting Time Subsets


Various operators are available to select subsets in the temporal domain. A list of
common operators used for temporal selections is given in Table 9.5.4.1. More details
on how to use these operators can be found in the CDO User’s Guide.

Table 9.5.4.1: Time selection operators commonly used with CDO.

Operator     Parameter                          Input type   Selects all timesteps with a …

seltimestep  Comma-separated list of timesteps  Integer      timestep in a user-defined list
seltime      Comma-separated list of times      String       time in a user-defined list
selhour      Comma-separated list of hours      Integer      hour in a user-defined list
selday       Comma-separated list of days       Integer      day in a user-defined list
selmonth     Comma-separated list of months     Integer      month in a user-defined list
selyear      Comma-separated list of years      Integer      year in a user-defined list
selseason    Comma-separated list of seasons    String       month of a season in a user-defined list
seldate      One date or two comma-separated    String       date in a user-defined range
             dates

In the following example all timesteps with an hour that matches either 12 or 18 UTC
are selected.

cdo selhour,12,18 ifile.nc ofile.nc

In the following example all timesteps with a month that matches June are selected.

cdo selmonth,6 ifile.nc ofile.nc

In the following example all timesteps that fall within the years 1980 to 1989
inclusive are selected. Note the shortcut for creating a list of years: a forward
slash (/) between the first and last year.

cdo selyear,1980/1989 ifile.nc ofile.nc

The parameters for selecting times, dates and seasons are of type String, whereas
all other parameters are of type Integer (Table 9.5.4.1). String parameters are
passed to the operator using the specific formats listed in Table 9.5.4.2.

Table 9.5.4.2: Specific formats of time, date and season parameters used with selection operators.

Parameter  Format                              Examples

times      hh:mm:ss                            06:00:00
dates      YYYY-MM-DDThh:mm:ss                 2000-09-01T12:00:00
seasons    Substring of DJFMAMJJASOND, or ANN  DJF, MAMJJA, ANN

A specific time range can be selected using the operator seldate. The operator
expects parameters corresponding to the start date and time (date1) and the
end date and time (date2). All timesteps that fall inclusively between these two
date/time parameters will be selected. The format of the date/time parameters is
YYYY-MM-DDThh:mm:ss, where YYYY is the year, MM the month, DD the day, hh the
hour, mm the minute and ss the second. The letter T separates the date and
time parts of the parameter. The time details hh:mm:ss may be omitted where they
are not required (e.g., monthly mean timesteps).
In the following example all timesteps that fall between 1 Jan 1975 and 31 Dec 2012
inclusive are selected.

cdo seldate,1975-01-01,2012-12-31 ifile.nc ofile.nc

In the following example all timesteps that fall between 1 Jan 2000 at 12 UTC and 3
Jan 2000 at 12 UTC inclusive are selected.

cdo seldate,2000-01-01T12:00:00,2000-01-03T12:00:00 ifile.nc ofile.nc
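The inclusive date/time filtering performed by seldate can be mimicked with Python's datetime module. This is a conceptual sketch (the timestep list is invented for illustration), not CDO code:

```python
from datetime import datetime

def seldate(timesteps, date1, date2):
    """Keep timesteps falling inclusively between date1 and date2,
    both given in the YYYY-MM-DDThh:mm:ss format used by CDO."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    start = datetime.strptime(date1, fmt)
    end = datetime.strptime(date2, fmt)
    return [t for t in timesteps if start <= datetime.strptime(t, fmt) <= end]

steps = ["2000-01-01T00:00:00", "2000-01-01T12:00:00",
         "2000-01-03T12:00:00", "2000-01-04T00:00:00"]
print(seldate(steps, "2000-01-01T12:00:00", "2000-01-03T12:00:00"))
# both boundary timesteps are included in the selection
```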

In the following example all timesteps that fall within the months of June, July and
August, corresponding to the season JJA, are selected.

cdo selseason,JJA ifile.nc ofile.nc

9.6 Basic Statistics


Calculating statistical values is a major part of climate data analysis. The operator
names for statistical computations consist of two parts: the domain over which
statistical values are computed and the statistic of interest.
The first part of the operator name defines the domain over which the statistic is
to be computed. Possible domains include time (tim), geographical space (fld), zonal
(zon), meridional (mer), vertical (vert) and ensemble (ens).
The second part of the operator name defines the statistic to be computed. Possible
statistics include minimum (min), maximum (max), range (range), sum (sum), mean
(mean), average (avg), variance (var), standard deviation (std) and percentile (pctl).
When referring to any of these statistics in the following subsections, the
alias <stat> is used.
By combining the domain part and statistics part the complete operator name can be
constructed. Examples of complete operator names are timmean, fldmean, zonmin and
ensstd. For a complete list of statistical operators consult the CDO User’s Guide.

The difference between the statistics mean (mean) and average (avg) is subtle and
relates to how missing values are treated in the computation. The operator mean
ignores missing values. For instance, the mean of an array containing the four
elements 1, 2, missing value and 3 is

(1 + 2 + 3) / 3
= 2

whereas the average operator (avg) applied to the same four element array is

(1 + 2 + missing_value + 3) / 4
= missing_value / 4
= missing_value

If the array does not contain any missing values then the statistics mean and avg
will produce the same result.
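The worked example above can be reproduced in a few lines of Python, using NaN to stand in for CDO's missing value. This is a conceptual sketch of the two behaviours, not CDO's implementation:

```python
import math

def cdo_mean(values):
    """Like CDO's 'mean' statistic: missing values (NaN here) are skipped."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid)

def cdo_avg(values):
    """Like CDO's 'avg' statistic: a missing value propagates into the result."""
    return sum(values) / len(values)

data = [1.0, 2.0, math.nan, 3.0]
print(cdo_mean(data))  # 2.0 - the missing value is ignored
print(cdo_avg(data))   # nan - the missing value propagates
```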

The use of the statistic mean for calculating statistical averages is recommended
over the use of avg.

Some of the more frequently used statistical computations are discussed in the
following Section 9.6.1 to Section 9.6.5.

9.6.1 Statistics over the Time Domain


For statistical computations over the temporal domain the CDO operator prefix tim
combined with the statistic of interest is used (tim<stat>).
Consider a netCDF file named cru_ts4.02.1979.2015.tmp.dat.nc that contains
global gridded annual mean near-surface air temperature data over land from the
CRU TS4.02³ dataset for the period 1979 to 2015. The file header information obtained
from using the command

ncdump -h cru_ts4.02.1979.2015.tmp.dat.nc

may look similar to the following.

³ http://data.ceda.ac.uk/badc/cru/data/cru_ts/cru_ts_4.02/data/tmp

netcdf cru_ts4.02.1979.2015.tmp.dat {
dimensions:
lon = 720 ;
lat = 360 ;
time = UNLIMITED ; // (37 currently)
variables:
float lon(lon) ;
lon:standard_name = "longitude" ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
float lat(lat) ;
lat:standard_name = "latitude" ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
double time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:units = "days since 1900-1-1 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
float tmp(time, lat, lon) ;
tmp:long_name = "near-surface temperature" ;
tmp:units = "degrees Celsius" ;
tmp:_FillValue = 9.96921e+36f ;
tmp:missing_value = 9.96921e+36f ;
tmp:correlation_decay_distance = 1200.f ;
...
}

The file contains near-surface temperature data on a global 720 by 360 spatial grid
(0.5° resolution) and has 37 timesteps (one for each year).
The following CDO command computes the long-term mean near-surface tempera-
ture field over all timesteps using the timmean operator. The output is saved in a file
named tmp_ltm.nc.

cdo timmean cru_ts4.02.1979.2015.tmp.dat.nc tmp_ltm.nc

The file header information of the output file tmp_ltm.nc, obtained with the
command ncdump -h tmp_ltm.nc, looks like the following.

netcdf tmp_ltm {
dimensions:
lon = 720 ;
lat = 360 ;
time = UNLIMITED ; // (1 currently)
bnds = 2 ;
variables:
float lon(lon) ;
lon:standard_name = "longitude" ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
float lat(lat) ;
lat:standard_name = "latitude" ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
double time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:bounds = "time_bnds" ;
time:units = "days since 1900-1-1 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
double time_bnds(time, bnds) ;
float tmp(time, lat, lon) ;
tmp:long_name = "near-surface temperature" ;
tmp:units = "degrees Celsius" ;
tmp:_FillValue = 9.96921e+36f ;
tmp:missing_value = 9.96921e+36f ;
tmp:correlation_decay_distance = 1200.f ;
...
}

The time dimension (time) now holds only 1 timestep, whereas the spatial dimensions
(lon and lat) remain unchanged.
Statistics calculated over the temporal domain using the tim<stat> operator result in
the collapse of the time dimension to 1, as shown in Figure 9.6.1.1 for a 3D field.

Statistics calculated over the time domain using the tim<stat> operator will
result in the collapse of the time dimension to 1.

Figure 9.6.1.1: Schematic showing the collapse of the time dimension when the tim<stat> operator
is applied to a 3D (longitude, latitude and time) data structure, resulting in a 2D (longitude and
latitude) data structure.

Applying the tim<stat> operator to a 4D field (longitude, latitude, level and time)
will also collapse the time dimension to 1, as illustrated in Figure 9.6.1.2.
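The collapse of one dimension while the others are preserved is simply a reduction along a single axis. The following Python sketch demonstrates this for a small (time, lat, lon) field using nested lists (the values are invented for illustration; this is not CDO code):

```python
# Conceptual sketch: a timmean-like reduction collapses the time axis
# of a (time, lat, lon) field to length 1; lat and lon are unchanged.
field = [  # 2 timesteps, 2 latitudes, 3 longitudes
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
]

ntime = len(field)
timmean = [[[sum(field[t][j][i] for t in range(ntime)) / ntime
             for i in range(3)] for j in range(2)]]  # keep a time axis of length 1

print(timmean)  # one averaged 2 x 3 field, wrapped in a time axis of length 1
print(len(timmean), len(timmean[0]), len(timmean[0][0]))  # shape (1, 2, 3)
```

The same pattern applies to the other domains: fld<stat> reduces the lat and lon axes, vert<stat> the level axis, and so on.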

Figure 9.6.1.2: Schematic showing the collapse of the time dimension when the tim<stat> operator
is applied to a 4D (longitude, latitude, levels and time) data structure, resulting in a 3D (longitude,
latitude and levels) data structure.

In addition, the CDO command added a new dimension named bnds and a
new data variable named time_bnds, which holds the two timesteps associated with
the boundaries of the time range over which the mean was computed.

9.6.2 Statistics over the Spatial Domain


For statistical computations over the spatial domain the CDO operator prefix fld combined
with the statistic of interest is used (fld<stat>). The spatial domain is represented
by the longitude and latitude range of the data field. In the following example
the same input file as in the previous section is used (see the output of the ncdump -h
cru_ts4.02.1979.2015.tmp.dat.nc command in Section 9.6.1). The following CDO
command calculates a time series of annual global mean near-surface temperature
using the operator fldmean. The output is saved in a file named tmp_timeseries.nc.

cdo fldmean cru_ts4.02.1979.2015.tmp.dat.nc tmp_timeseries.nc

The output of the command ncdump -h tmp_timeseries.nc looks like the following.

netcdf tmp_timeseries {
dimensions:
lon = 1 ;
lat = 1 ;
time = UNLIMITED ; // (37 currently)
variables:
double lon(lon) ;
lon:standard_name = "longitude" ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
double lat(lat) ;
lat:standard_name = "latitude" ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
double time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:units = "days since 1900-1-1 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
float tmp(time, lat, lon) ;
tmp:long_name = "near-surface temperature" ;
tmp:units = "degrees Celsius" ;
tmp:_FillValue = 9.96921e+36f ;
tmp:missing_value = 9.96921e+36f ;
tmp:correlation_decay_distance = 1200.f ;
...
}

Note that the longitude and latitude dimensions have been collapsed to 1, while
the time dimension (time) remains unchanged. Each year
is now associated with a single near-surface temperature value, thereby creating a
time series. Opening the output file with the command ncview tmp_timeseries.nc
will show the time series in a simple plot (Figure 9.6.2.1).

Figure 9.6.2.1: Screenshot of the ncview window when opening the output file tmp_timeseries.nc,
showing annual global mean near-surface temperatures from 1979 to 2015.

The collapse of the spatial dimensions longitude and latitude is demonstrated visually
for a 3D data structure (as used in the example above) in Figure 9.6.2.2 and for a 4D
data structure in Figure 9.6.2.3.

Statistics calculated over the spatial domain using the fld<stat> operator
will result in the collapse of the longitude and latitude dimensions to 1.

Figure 9.6.2.2: Schematic showing the collapse of the spatial dimensions longitude and latitude when
the fld<stat> operator is applied to a 3D (longitude, latitude and time) data structure, resulting in a
1D (time only) data structure.

Figure 9.6.2.3: Schematic showing the collapse of the spatial dimensions longitude and latitude when
the fld<stat> operator is applied to a 4D (longitude, latitude, levels and time) data structure, resulting
in a 2D (levels and time) data structure.

9.6.3 Statistics over the Vertical Domain


For statistical computations over the vertical domain the CDO operator prefix vert combined
with the statistic of interest is used (vert<stat>). The vertical domain (e.g.,
the atmosphere) tends to be represented as levels, and often the corresponding
variable in a netCDF file is named level or lev.
Considering a netCDF file named erai_q_ltm.nc that contains global long-term mean
specific humidity data from the ERA-Interim reanalysis, the file header information
obtained from using the command

ncdump -h erai_q_ltm.nc

may look similar to the following.

netcdf erai_q_ltm {
dimensions:
longitude = 480 ;
latitude = 241 ;
level = 37 ;
time = UNLIMITED ; // (1 currently)
bnds = 2 ;
variables:
float longitude(longitude) ;
longitude:standard_name = "longitude" ;
longitude:long_name = "longitude" ;
longitude:units = "degrees_east" ;
longitude:axis = "X" ;
float latitude(latitude) ;
latitude:standard_name = "latitude" ;
latitude:long_name = "latitude" ;
latitude:units = "degrees_north" ;
latitude:axis = "Y" ;
double level(level) ;
level:standard_name = "air_pressure" ;
level:long_name = "pressure_level" ;
level:units = "millibars" ;
level:positive = "down" ;
level:axis = "Z" ;
double time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:bounds = "time_bnds" ;
time:units = "hours since 1900-1-1 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
double time_bnds(time, bnds) ;
double q(time, level, latitude, longitude) ;
q:standard_name = "specific_humidity" ;
q:long_name = "Specific humidity" ;
q:units = "kg kg**-1" ;
q:_FillValue = -32767. ;
q:missing_value = -32767. ;
...
}

The specific humidity data are saved for 37 vertical levels on a 480 by 241 global
grid. The vertical levels represent atmospheric pressure levels. The following CDO
command calculates the vertically integrated (summed up) specific humidity using
the CDO operator vertsum. The output is saved in a file named erai_ltm_q_vertsum.nc.

cdo vertsum erai_q_ltm.nc erai_ltm_q_vertsum.nc

The output of the command ncdump -h erai_ltm_q_vertsum.nc looks like the following.

netcdf erai_ltm_q_vertsum {
dimensions:
longitude = 480 ;
latitude = 241 ;
time = UNLIMITED ; // (1 currently)
bnds = 2 ;
variables:
float longitude(longitude) ;
longitude:standard_name = "longitude" ;
longitude:long_name = "longitude" ;
longitude:units = "degrees_east" ;
longitude:axis = "X" ;
float latitude(latitude) ;
latitude:standard_name = "latitude" ;
latitude:long_name = "latitude" ;
latitude:units = "degrees_north" ;
latitude:axis = "Y" ;
double time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:bounds = "time_bnds" ;
time:units = "hours since 1900-1-1 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
double time_bnds(time, bnds) ;
double q(time, latitude, longitude) ;
q:standard_name = "specific_humidity" ;
q:long_name = "Specific humidity" ;
q:units = "kg kg**-1" ;
q:_FillValue = -32767. ;
q:missing_value = -32767. ;
...
}

Note that CDO, in this case, not only collapsed the vertical dimension but also
removed it completely from the netCDF file. The longitude, latitude and time
dimensions remain unchanged. The single level output field represents the global
long-term mean vertically integrated specific humidity.
The collapse of the vertical dimension level is demonstrated visually for a 4D data
structure (as used in the example above) in Figure 9.6.3.1 and for a 2D data structure
in Figure 9.6.3.2.

Statistics calculated over the vertical domain using the vert<stat> operator
will result in the collapse of the vertical dimension to 1.

Figure 9.6.3.1: Schematic showing the collapse of the vertical dimension level when the vert<stat>
operator is applied to a 4D (longitude, latitude, level and time) data structure, resulting in a 3D
(longitude, latitude and time) data structure.

Figure 9.6.3.2: Schematic showing the collapse of the vertical dimension level when the vert<stat>
operator is applied to a 2D (levels and time) data structure, resulting in a 1D (time only) data
structure (time series).

9.6.4 Statistics over the Zonal Domain


The zonal domain refers to the east-west direction of the data field (along latitude
circles). For statistical computations over the zonal domain the CDO operator prefix zon
combined with the statistic of interest is used (zon<stat>).
The following command applies the zonmean operator to the same input file used in
Section 9.6.3 to calculate global long-term zonal mean specific humidity values. The
output is saved in a file named erai_ltm_q_zonmean.nc.

cdo zonmean erai_q_ltm.nc erai_ltm_q_zonmean.nc

The output of the command ncdump -h erai_ltm_q_zonmean.nc looks like the following.

netcdf erai_ltm_q_zonmean {
dimensions:
lon = 1 ;
lat = 241 ;
level = 37 ;
time = UNLIMITED ; // (1 currently)
bnds = 2 ;
variables:
double lon(lon) ;
lon:standard_name = "longitude" ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
double lat(lat) ;
lat:standard_name = "latitude" ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
double level(level) ;
level:standard_name = "air_pressure" ;
level:long_name = "pressure_level" ;
level:units = "millibars" ;
level:positive = "down" ;
level:axis = "Z" ;
double time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:bounds = "time_bnds" ;
time:units = "hours since 1900-1-1 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
double time_bnds(time, bnds) ;
double q(time, level, lat, lon) ;
q:standard_name = "specific_humidity" ;
q:long_name = "Specific humidity" ;
q:units = "kg kg**-1" ;
q:_FillValue = -32767. ;
q:missing_value = -32767. ;
...
}

Note that the longitude dimension has collapsed to 1 whereas the latitude, level and
time dimensions remain unchanged. The output field represents a latitude by height
cross section of the atmosphere.
The collapse of the longitude dimension is demonstrated visually for a 4D
data structure (as used in the example above) in Figure 8.5.4.1 and for a 3D data
structure in Figure 8.5.4.2.

Statistics calculated over the zonal domain using the zon<stat> operator
will result in the collapse of the longitude dimension to 1.

Figure 8.5.4.1: Schematic showing the collapse of the zonal dimension longitude when the zon<stat>
operator is applied to a 4D (longitude, latitude, levels and time) data structure resulting in a 3D
(latitude, levels and time) data structure.

Figure 8.5.4.2: Schematic showing the collapse of the zonal dimension longitude when the zon<stat>
operator is applied to a 3D (longitude, levels and time) data structure resulting in a 2D (levels and
time) data structure.
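The dimension collapse described above can be sketched with numpy. This is purely illustrative: the array below is a stand-in for the q(time, level, lat, lon) variable shown in the ncdump output, and CDO performs the equivalent reduction internally.

```python
import numpy as np

# Hypothetical 4D field with dimensions (time, level, lat, lon),
# shaped like the q variable in the ncdump output above.
q = np.zeros((1, 37, 241, 480))

# A zonal mean averages along the longitude axis (the last axis);
# keepdims=True keeps the collapsed axis with length 1, as CDO does.
q_zonmean = q.mean(axis=-1, keepdims=True)

print(q_zonmean.shape)  # (1, 37, 241, 1)
```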

9.6.5 Statistics over the Meridional Domain


The meridional domain refers to the north-south direction of the data field (along
the meridians). For statistical computations over the meridional domain the CDO
operator mer combined with the statistic of interest is used (mer<stat>).
The following command applies the mermean operator to the same input file used in
Section 8.5.3 to calculate global long-term meridional mean specific humidity values.
The output is saved in a file named erai_ltm_q_mermean.nc.

cdo mermean erai_q_ltm.nc erai_ltm_q_mermean.nc

The output of the command ncdump -h erai_ltm_q_mermean.nc looks like the following.

netcdf erai_ltm_q_mermean {
dimensions:
lon = 480 ;
lat = 1 ;
level = 37 ;
time = UNLIMITED ; // (1 currently)
bnds = 2 ;
variables:
double lon(lon) ;
lon:standard_name = "longitude" ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
double lat(lat) ;
lat:standard_name = "latitude" ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
double level(level) ;
level:standard_name = "air_pressure" ;
level:long_name = "pressure_level" ;
level:units = "millibars" ;
level:positive = "down" ;
level:axis = "Z" ;
double time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:bounds = "time_bnds" ;
time:units = "hours since 1900-1-1 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
double time_bnds(time, bnds) ;
double q(time, level, lat, lon) ;
q:standard_name = "specific_humidity" ;
q:long_name = "Specific humidity" ;
q:units = "kg kg**-1" ;
q:_FillValue = -32767. ;
q:missing_value = -32767. ;
...
}

Note that the latitude dimension has collapsed to 1 whereas the longitude, level and
time dimensions remain unchanged. The output field represents a longitude by height
cross section of the atmosphere.
The collapse of the latitude dimension is demonstrated visually for a 4D data
structure (as used in the example above) in Figure 8.5.5.1 and for a 3D data structure
in Figure 8.5.5.2.

Statistics calculated over the meridional domain using the mer<stat> operator
will result in the collapse of the latitude dimension to 1.

Figure 8.5.5.1: Schematic showing the collapse of the meridional dimension latitude when the
mer<stat> operator is applied to a 4D (longitude, latitude, levels and time) data structure resulting
in a 3D (longitude, levels and time) data structure.

Figure 8.5.5.2: Schematic showing the collapse of the meridional dimension latitude when the
mer<stat> operator is applied to a 3D (latitude, levels and time) data structure resulting in a 2D
(levels and time) data structure.
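The meridional reduction can likewise be sketched with numpy. Note that CDO weights grid cells by their true area; on a regular latitude-longitude grid a cosine-of-latitude weighting, used below, is a common approximation of that behaviour, not CDO's exact algorithm.

```python
import numpy as np

# Hypothetical field on a regular grid with dimensions (lat, lon).
lat = np.linspace(89.5, -89.5, 180)
field = np.ones((180, 360))

# A meridional mean collapses the latitude axis. CDO uses grid-cell
# area weights; on a regular grid this is roughly cos(latitude).
weights = np.cos(np.deg2rad(lat))
mermean = np.average(field, axis=0, weights=weights)

print(mermean.shape)  # (360,)
```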

9.6.6 Statistics over Ensembles


A climate model may be used to simulate future climate conditions over and over
again, each time using slightly different initial conditions, resulting in perhaps
thousands of model runs (files). This approach is often used to establish the natural
variability of the climate system. Similarly, projects such as the Coupled Model
Intercomparison Project Phase 5 (CMIP5) have been set up whereby the models from
the main climate modelling centres in the world run climate experiments using the
same input fields and conditions. The output from the different models is then
analysed in order to get an idea about the spread of outcomes given different models.
Such collections of model runs are called ensembles. Ensembles can be considered as
related datasets which hold the same variables but are saved in separate files.
Climate scientists are often interested in performing statistical analysis across these
different model runs (the ensemble), which can be achieved by using the CDO operator
ens combined with the statistic of interest (ens<stat>).

In contrast to the temporal, spatial, vertical, zonal and meridional statistic operators
discussed in the previous sections, the ensemble operator ens expects multiple input
files. The data variable name or names need to be the same in all files.
In addition, the data in the different files must have the same spatial resolution, which
is often not the case when comparing output from different models. If the files do
not share the same resolution then files have to be interpolated spatially (known as
remapping) to a common resolution before ensemble statistics can be calculated (see
Section 8.6.1 for remapping options).
In the following example the 95th percentile is calculated over relative humidity (hurs)
fields from six ensemble members (r1i1p1 to r6i1p1) of historical experiment runs of
the CCSM4 global climate model. Note that the asterisk (*) is used here as a wildcard
in order to generate the list of input files.

cdo enspctl,95 hurs_Amon_CCSM4_historical_r*i1p1_185001-200512.nc enspctl95.nc
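The per-grid-cell reduction that ens<stat> operators perform can be sketched with numpy: the members are stacked along a new leading "ensemble" axis and the statistic is taken along it. The member fields below are randomly generated stand-ins.

```python
import numpy as np

# Six hypothetical ensemble members, each a (lat, lon) field.
rng = np.random.default_rng(0)
members = [rng.random((180, 360)) for _ in range(6)]

# Stacking along a new leading "ensemble" axis and reducing along it
# mirrors what enspctl,95 does for every grid cell.
stack = np.stack(members)              # shape (6, 180, 360)
pctl95 = np.percentile(stack, 95, axis=0)

print(pctl95.shape)  # (180, 360)
```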

9.7 Interpolations

9.7.1 Interpolation to a new horizontal grid (remapping)


Interpolating a gridded data file in the horizontal domain will generally result in a
new horizontal resolution in the output file. This is also referred to as remapping.
The general syntax for remapping a file to a new horizontal resolution is as follows.

cdo remap<InterpMethod>,<NewGrid> ifile ofile

The CDO operator name is constructed from two parts: the first part is remap, and
the second part is the interpolation method (<InterpMethod>) to be used.
Table 8.7.1.1 lists the available interpolation methods. Combining both parts, operator
names such as remapbil (for bilinear remapping) or remapnn (for nearest neighbour
remapping) can be constructed.

Table 8.7.1.1: CDO remapping interpolation methods.

<InterpMethod> Description
bil Bilinear remapping
bic Bicubic remapping
nn Nearest neighbour remapping
dis Distance-weighted average remapping
ycon First order conservative remapping
con First order conservative remapping
con2 Second order conservative remapping
laf Largest area fraction remapping

The operator requires a parameter that holds the information about the desired new
grid (<NewGrid>). Table 8.7.1.2 lists possible parameter options for the new grid.

Table 8.7.1.2: CDO grid definition types used for remapping.

<NewGrid> parameter   Description                                   Example

r<NX>x<NY>            Regular grid with the number of grid boxes    r360x180
                      in the longitude (<NX>) and latitude (<NY>)
                      direction
n<N>                  Global Gaussian grid, see Section 4.6.2       n80
                      for a description of the N value
lon=<LON>/lat=<LAT>   Single point defined by the longitude         lon=20.5/lat=30.3
                      (<LON>) and latitude (<LAT>) values
<file_name>           Fetch grid information from another file      file.nc

In the following example the horizontal grid of the input file is remapped to a regular
1° by 1° grid using nearest neighbour interpolation.

cdo remapnn,r360x180 ifile.nc ofile.nc

For horizontal interpolation to a point location such as the location of an instrument
site or a town the lon=<LON>/lat=<LAT> parameter can be used. In the following
example the bilinear interpolation method is used to interpolate data to the geographic
location specified by the longitude -1.255987 and latitude 51.758571 (the location of the
School of Geography and the Environment, University of Oxford).

cdo remapbil,lon=-1.255987/lat=51.758571 ifile.nc ofile.nc

To make comparison of data between different models possible, a shared horizontal
resolution is needed. The easiest way to achieve this is to remap one file to the
resolution of the other. Assuming that the two files are named model_a.nc and
model_b.nc then we can remap model_a.nc to the resolution of model_b.nc by using the
command below. The desired grid information for the output file model_a_remapped.nc
is passed on to the operator remapnn (nearest neighbour interpolation used here) by
providing the filename model_b.nc.

cdo remapnn,model_b.nc model_a.nc model_a_remapped.nc
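The idea behind nearest-neighbour remapping can be sketched in numpy for the simple case of one-dimensional regular axes. This is a toy sketch with invented grids; CDO's remapnn handles full two-dimensional and curvilinear grids.

```python
import numpy as np

# Hypothetical 0.5-degree source and 1-degree target longitude axes.
src_lon = np.arange(0.25, 360, 0.5)    # 720 source points
dst_lon = np.arange(0.5, 360, 1.0)     # 360 target points
data = np.sin(np.deg2rad(src_lon))

# For each target point, take the value at the closest source point.
idx = np.abs(src_lon[None, :] - dst_lon[:, None]).argmin(axis=1)
remapped = data[idx]

print(remapped.shape)  # (360,)
```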

9.7.2 Interpolation in the Vertical Domain


Various operators are available for interpolation in the vertical domain. Before
attempting vertical interpolation it is important to understand the vertical level type
of the input data field. Vertical level types are discussed in more detail in Section
4.7. Depending on the vertical level type, different vertical interpolation operators
are available, as listed in Table 8.7.2.1.

Table 8.7.2.1: CDO operators used for interpolations in the vertical domain.

Operator Description
intlevel Linear vertical interpolation of non-hybrid 3D variables
intlevel3d Linear vertical interpolation of 3D variables fields with given 3D
vertical coordinates
ml2pl Interpolation of 3D variables on hybrid sigma pressure level to
pressure levels
ml2hl Interpolation of 3D variables on hybrid sigma pressure level to
height levels
ap2pl Interpolation of 3D variables on hybrid sigma height coordinates to
pressure levels
ap2hl Interpolation of 3D variables on hybrid sigma height coordinates to
height levels

For 3D fields on non-hybrid levels the operator intlevel can be used to perform linear
interpolation to a new set of vertical levels. For instance, the following command
interpolates data on pressure levels to a new set of target pressure levels. The target
pressure levels are passed to the operator as parameters. Make sure that the units of
the level variable in the input file match those of the target levels (e.g., hPa or Pa).

cdo intlevel,1010,1020,1030,1040,1050 ifile.nc ofile.nc
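The linear interpolation performed by intlevel can be illustrated for a single column with numpy. The levels and values below are hypothetical, and np.interp is just a stand-in for the per-column operation CDO applies to every grid point.

```python
import numpy as np

# One column of data on pressure levels (hPa), hypothetical values.
src_levels = np.array([1000., 925., 850., 700., 500.])
values = np.array([290., 285., 281., 272., 258.])   # e.g. temperature in K

# Target levels in the same units as the source levels.
tgt_levels = np.array([950., 800., 600.])

# np.interp needs increasing x values, so flip the descending pressure axis.
interp = np.interp(tgt_levels, src_levels[::-1], values[::-1])

print(interp)  # each value lies between its two bracketing source values
```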

For a more detailed description of the other vertical interpolation operators see the
CDO User’s Guide⁴.

9.7.3 Interpolation in the Time Domain


For interpolations in the time domain the operators inttime and intntime can be used.
Linear interpolation is performed between the timesteps. The general syntax for the
operator inttime is as follows.

cdo inttime,date,time[,inc] ifile.nc ofile.nc

The format for the date and time information is YYYY-MM-DD and hh:mm:ss, respectively.
An optional increment parameter (inc) can be passed to the operator. Possible
increments include seconds, minutes, hours, days, months and years (the default is 0hour).
For example, the following command will interpolate a file with 3-hourly temporal
resolution starting at 1 June 1980 at 18 UTC to a 1-hourly time series.

cdo inttime,1980-06-01,18:00:00,1hour ifile.nc ofile.nc

Alternatively, the intntime operator can be used for time interpolations. Instead of the
temporal resolution as used with inttime the intntime operator expects the number of
timesteps from one timestep to the next in the original file to be passed as a parameter.
The following command will produce the same output as the inttime example above.

cdo intntime,3 ifile.nc ofile.nc

⁴https://code.zmaw.de/projects/cdo/wiki/Cdo#Documentation
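The linear time interpolation behind inttime and intntime can be sketched for a single grid point with numpy (the timesteps and temperature values are invented):

```python
import numpy as np

# Linear interpolation from 3-hourly to hourly values at one grid
# point; inttime/intntime apply this to every grid cell.
hours = np.array([18., 21., 24.])        # original 3-hourly timesteps
values = np.array([290., 287., 284.])    # hypothetical temperatures

new_hours = np.arange(18., 25., 1.)      # hourly target timesteps
hourly = np.interp(new_hours, hours, values)

print(hourly)  # [290. 289. 288. 287. 286. 285. 284.]
```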

9.8 Basic Arithmetic


CDO allows for some basic arithmetic calculations such as addition, subtraction,
multiplication and division. In principle, there are two ways in which arithmetic
operators are used: arithmetic between data fields saved in two different files and
arithmetic by applying a constant to a data field.

9.8.1 Arithmetic Between Two Files


In order to add, subtract, multiply or divide two files the operators add, sub, mul and
div can be used, respectively. For instance, to create an anomaly field one might want
to subtract the long-term mean field (ltm.nc) from a specific year (1997.nc) and save
the output (anomaly.nc) as done in the following example.

cdo sub 1997.nc ltm.nc anomaly.nc

The operator (in the above example sub) is followed by the two input file names
separated by white spaces, followed by the output filename. To visualise which file
is subtracted from which, one might imagine the arithmetic operator located
between the two file names.

The two file names are passed to the arithmetic operator as arguments
separated by white spaces and not commas as done with other operator
arguments.
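The element-wise subtraction that sub performs can be sketched with numpy arrays standing in for the two data fields. The values below are invented for illustration.

```python
import numpy as np

# Toy SST fields standing in for 1997.nc and ltm.nc (values invented).
field_1997 = np.array([[301.2, 299.8],
                       [298.5, 300.1]])
ltm = np.array([[300.0, 300.0],
                [299.0, 299.5]])

# cdo sub 1997.nc ltm.nc anomaly.nc subtracts the second input from
# the first, element by element.
anomaly = field_1997 - ltm
```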

9.8.2 Arithmetic Using a Constant Value


Arithmetic can also be done using a constant whereby the computation is applied
to every data point of a data field. A constant may be added to or subtracted from
each data point by using the operators addc and subc, respectively. Similarly, each
data point in a data field may be multiplied with or divided by a constant using the
mulc or divc operator, respectively.

The constant is passed to the operator as a parameter in the usual way (see Section
8.4.3). In the following example the constant 273.15 is added to each data point in the
input file ifile.nc in order to convert temperature data from °C to Kelvin.

cdo addc,273.15 ifile.nc ofile.nc

Note that the value of the units attribute in the netCDF file (e.g., Degree
Celsius) remains unchanged. Only the values in the data variable change.

9.9 Applying CDO in Climate Computations


One feature that makes CDO an incredibly powerful tool is that multiple operators
can be included in a single command. This allows the creation of quite complex data
operations. As outlined in Section 8.3.3 a couple of rules need to be followed when
using multiple operators in a CDO command.
The order in which operators are executed matters in some cases while in other cases
it does not. It all depends on what the operators do to the input file or the field that
is passed on to them following completion of the previous operator. Operators are
executed in sequence from right to left meaning the operator located furthest on the
right in the CDO command is executed first. The output of that execution is then
passed on to the next operator on the left. The output from the individual operator
operations is not saved but is passed on to the next operator in memory.

Operators are executed in sequence from right to left.

As the fields from the intermediate steps are passed on in memory it is important to
think through what the structure of the data field would look like after the execution
of each individual operator. Dimensions or temporal resolution of the field may
change. If in doubt then execute each operator individually, save the output and
check the file using ncdump, cdo [s]info[n] and ncview.
Remember that hyphens (-) are required for all operators apart from the leftmost one
(executed last).
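The right-to-left execution order can be mimicked with nested function calls on toy data. This is a conceptual sketch only; the helper functions below are hypothetical stand-ins for the CDO operators, operating on invented (year, month, value) records.

```python
# Toy records of (year, month, value) standing in for timesteps.
data = [(1979, 11, 1.0), (1980, 11, 2.0), (1980, 12, 9.0), (1981, 11, 4.0)]

def selyear(records, years):
    # keep only records whose year falls in the requested range
    return [r for r in records if r[0] in years]

def selmon(records, months):
    # keep only records whose month is in the requested set
    return [r for r in records if r[1] in months]

def timmean(records):
    # average the values over the remaining timesteps
    return sum(r[2] for r in records) / len(records)

# cdo timmean -selmon,11 -selyear,1980/2010 ifile.nc ofile.nc
# corresponds conceptually to the innermost call running first:
result = timmean(selmon(selyear(data, range(1980, 2011)), {11}))
print(result)  # 3.0
```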
The following subsections will cover some examples of how multiple operators may
be applied and what a CDO workflow might look like.

9.9.1 Indian Ocean Dipole Example


In November 1997, the Indian Ocean Dipole (IOD) was in an extreme positive state.
Calculate the spatial distribution of sea surface temperature (SST) anomalies for
November 1997 (reference period 1980-2010) using the Hadley Centre Sea Ice and
Sea Surface Temperature (HadISST) observational dataset.
Step 1: Explore the file headers of the input file HadISST_sst.nc using any of the
available tools. The following output from the command cdo sinfo HadISST_sst.nc
shows that the file contains monthly global data on a 1° by 1° grid between January
1870 and March 2016.

File format : netCDF


-1 : Institut Source Ttype Levels Num Points Num Dtype : Parameter \
ID
1 : unknown HadISST instant 1 1 2 1 F32 : -1
2 : unknown HadISST instant 1 1 64800 2 F32 : -2
Grid coordinates :
1 : generic : points=2
2 : lonlat : points=64800 (360x180)
longitude : -179.5 to 179.5 by 1 degrees_east circular
latitude : 89.5 to -89.5 by -1 degrees_north
Vertical coordinates :
1 : surface : levels=1
Time coordinate : 1755 steps
RefTime = 1870-01-01 00:00:00 Units = days Calendar = standard
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:\
mm:ss
1870-01-16 12:00:00 1870-02-14 23:59:59 1870-03-16 11:59:59 1870-04-15 23:\
59:59
1870-05-16 12:00:00 1870-06-16 00:00:00 1870-07-16 12:00:00 1870-08-16 12:\
00:00
1870-09-16 00:00:00 1870-10-16 12:00:00 1870-11-16 00:00:00 1870-12-16 12:\
00:00
1871-01-16 12:00:00 1871-02-15 00:00:00 1871-03-16 12:00:00 1871-04-16 00:\
00:00
1871-05-16 12:00:00 1871-06-16 00:00:00 1871-07-16 12:00:00 1871-08-16 12:\
00:00
............................................................................\
....
............................................................................\
....
........
2014-05-16 12:00:00 2014-06-16 12:00:00 2014-07-16 12:00:00 2014-08-16 12:\
00:00
2014-09-16 12:00:00 2014-10-16 12:00:00 2014-11-16 12:00:00 2014-12-16 12:\
00:00
2015-01-16 12:00:00 2015-02-16 12:00:00 2015-03-16 12:00:00 2015-04-16 12:\
00:00
2015-05-16 12:00:00 2015-06-16 12:00:00 2015-07-16 12:00:00 2015-08-16 12:\
00:00
2015-09-16 12:00:00 2015-10-16 12:00:00 2015-11-16 12:00:00 2015-12-16 12:\
00:00
2016-01-16 12:00:00 2016-02-16 12:00:00 2016-03-16 12:00:00

Step 2: The following CDO command calculates the long-term mean November SSTs
for the period 1980 to 2010 and saves the output in a file named
HadISST_sst_Nov_ltm.nc. First, the selection operator selyear,1980/2010 is executed,
which selects all timesteps between 1980 and 2010 inclusive. The resulting data field
should now have 31 times 12 (= 372) timesteps. The data field is passed on to the
selection operator selmon,11, which selects all timesteps corresponding to the month
of November. The resulting data field from this operation should have 31 timesteps.
It is then passed on to the statistics operator timmean which calculates the mean over
the temporal domain. The final data saved in the file HadISST_sst_Nov_ltm.nc represents
a single timestep (long-term mean) field on a 1° by 1° global grid.

cdo timmean -selmon,11 -selyear,1980/2010 HadISST_sst.nc HadISST_sst_Nov_ltm.nc


The position of operators selmon,11 and selyear,1980/2010 could be swapped. However,
the operator timmean needs to be executed last.
Step 3: The following command extracts the November 1997 timestep from the
original input file and saves it in the output file HadISST_sst_Nov1997.nc. The order
of the two operators selmon,11 and selyear,1997 does not matter in this case. The
output field will represent a single timestep (November 1997) on a 1° by 1° global
grid.

cdo selmon,11 -selyear,1997 HadISST_sst.nc HadISST_sst_Nov1997.nc

Step 4: In order to calculate the anomaly the 1980-2010 long-term mean field needs
to be subtracted from the November 1997 field. The following command does exactly
that by using the arithmetic operator sub which subtracts the SST data in the file
HadISST_sst_Nov_ltm.nc from the SST data in the file HadISST_sst_Nov1997.nc. The resulting SST
anomaly field is saved in the file HadISST_sst_Nov1997_anom.nc.

cdo sub HadISST_sst_Nov1997.nc HadISST_sst_Nov_ltm.nc HadISST_sst_Nov1997_anom.nc

The CDO command may output a warning such as cdo sub (Warning):
Input streams have different parameters!. This is the result of the file
HadISST_sst_Nov_ltm.nc having an additional variable named ‘time_bnds_2’
that was created as a result of the cdo command in Step 2.

Step 5: The output file HadISST_sst_Nov1997_anom.nc created in the previous step
contains the November 1997 SST anomaly field. The Python programming language
can now be used to read in and plot the data (Chapter 7). The output file created in
this example can be read in and plotted using Code 7.x.x.1. The code creates a map
showing SST anomalies within the Indian Ocean basin domain (Figure 8.8.1.1).
During November 1997 the sea surface temperatures were below the long-term
average by more than 1.5°C in the eastern part of the Indian Ocean basin whereas
they were above the long-term average by more than 1°C in the western part of the
basin.

9.9.2 Sahel Rainfall Variability Example


The Sahel region is a climatic zone at the southern fringes of the Sahara. Each summer
the African monsoon moves northwards and brings rainfall to the Sahel. The amount
of rainfall has varied significantly over the years. In order to identify drought years
calculate observed total rainfall departures from the long-term mean (1950-2018) for
the rainy season July to September (JAS) averaged over the Sahel domain (20°W to
30°E and 10° to 18°N) and visualise the output.
Step 1: Explore the file headers of the input file cru_ts4.03.1901.2018.pre.dat.nc
using any of the available tools. The following output from the command cdo sinfo
cru_ts4.03.1901.2018.pre.dat.nc shows that the input file contains monthly fields on
a global 0.5° by 0.5° grid covering the period January 1901 to December 2018 (1416
timesteps).

File format : netCDF


-1 : Institut Source Ttype Levels Num Points Num Dtype : Parameter \
ID
1 : unknown Run instant 1 1 259200 1 F32 : -1
2 : unknown Run instant 1 1 259200 1 I32 : -2
Grid coordinates :
1 : lonlat : points=259200 (720x360)
lon : -179.75 to 179.75 by 0.5 degrees_east circ\
ular
lat : -89.75 to 89.75 by 0.5 degrees_north
Vertical coordinates :
1 : surface : levels=1
Time coordinate : 1416 steps
RefTime = 1900-01-01 00:00:00 Units = days Calendar = standard
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:\
mm:ss
1901-01-16 00:00:00 1901-02-15 00:00:00 1901-03-16 00:00:00 1901-04-16 00:\
00:00
1901-05-16 00:00:00 1901-06-16 00:00:00 1901-07-16 00:00:00 1901-08-16 00:\
00:00
1901-09-16 00:00:00 1901-10-16 00:00:00 1901-11-16 00:00:00 1901-12-16 00:\
00:00
1902-01-16 00:00:00 1902-02-15 00:00:00 1902-03-16 00:00:00 1902-04-16 00:\
00:00
1902-05-16 00:00:00 1902-06-16 00:00:00 1902-07-16 00:00:00 1902-08-16 00:\
00:00
............................................................................\
....
............................................................................\
....
.....
2017-05-16 00:00:00 2017-06-16 00:00:00 2017-07-16 00:00:00 2017-08-16 00:\
00:00
2017-09-16 00:00:00 2017-10-16 00:00:00 2017-11-16 00:00:00 2017-12-16 00:\
00:00
2018-01-16 00:00:00 2018-02-15 00:00:00 2018-03-16 00:00:00 2018-04-16 00:\
00:00
2018-05-16 00:00:00 2018-06-16 00:00:00 2018-07-16 00:00:00 2018-08-16 00:\
00:00
2018-09-16 00:00:00 2018-10-16 00:00:00 2018-11-16 00:00:00 2018-12-16 00:\
00:00

The following output from the command ncdump -h cru_ts4.03.1901.2018.pre.dat.nc
shows that the input file contains precipitation amounts given in mm/month and that
the precipitation variable name is pre. It also shows that a second data variable named
stn is part of the file representing the number of stations contributing to each datum.
This variable could be used to address questions of uncertainties in the data.

netcdf cru_ts4.03.1901.2018.pre.dat {
dimensions:
lon = 720 ;
lat = 360 ;
time = UNLIMITED ; // (1416 currently)
variables:
float lon(lon) ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
float lat(lat) ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
float time(time) ;
time:long_name = "time" ;
time:units = "days since 1900-1-1" ;
time:calendar = "gregorian" ;
float pre(time, lat, lon) ;
pre:long_name = "precipitation" ;
pre:units = "mm/month" ;
pre:correlation_decay_distance = 450.f ;
pre:_FillValue = 9.96921e+36f ;
pre:missing_value = 9.96921e+36f ;
int stn(time, lat, lon) ;
stn:description = "number of stations contributing to each datum" ;
stn:_FillValue = -999 ;
stn:missing_value = -999 ;
...
}

Step 2: The following CDO command calculates the JAS rainfall amounts for each
year between 1950 and 2018 averaged over the Sahel domain. First, the data variable
pre is selected using the operator selvar,pre. Second, the period 1950 to 2018 is
selected using the operator selyear,1950/2018 followed by selecting the months July
to September using the operator selmon,7,8,9. Next the spatial domain is cropped
to the Sahel region using the operator sellonlatbox,-20,30,10,18. Next, the rainfall
values are summed up for each year. This will give one value per year per grid box
representing JAS totals. Lastly, the spatial average is calculated for the Sahel domain
using the statistics operator fldmean. The output in the form of a time series is saved
in the file Sahel_JAS_pre.nc.

cdo fldmean -yearsum -sellonlatbox,-20,30,10,18 -selmon,7,8,9 -selyear,1950/2018 \
-selvar,pre cru_ts4.03.1901.2018.pre.dat.nc Sahel_JAS_pre.nc

Step 3: In order to calculate the anomalies the long-term mean value needs to be
subtracted from each data value in the time series. The long-term mean of the
time series can be calculated using the following command employing the statistics
operator timmean. The output is saved in the temporary file foo.nc.

cdo timmean Sahel_JAS_pre.nc foo.nc

The file ‘foo.nc’ will contain only a single data point representing the long-term mean
value of the time series. To find out what that value is the command ncdump foo.nc
(without -h) can be used. It reveals that pre = 401.9308. This is the long-term mean
JAS Sahel rainfall between 1950 and 2018.
Step 4: Now we can subtract the long-term mean value from each value in the time
series using the operator subc,401.9308 in the command below. The output is saved
in the file Sahel_JAS_pre_anom.nc.

cdo subc,401.9308 Sahel_JAS_pre.nc Sahel_JAS_pre_anom.nc

Subtracting the mean value from the time series values is a fairly simple
procedure. This step could also be done in one line within the Python code
that does the plotting of the time series meaning that Step 3 and Step 4
would not be needed.

Step 5: The output file Sahel_JAS_pre_anom.nc created in the previous step contains
the rainfall anomaly time series for the Sahel region. Python can now be used to
read in and plot the data. The output file created in this example can be read in and
plotted using Code 7.x.x.2. The code creates a time series plot (Figure 7.x.x.x).

9.9.3 Creating a Land-Sea Mask File


A land-sea mask is a gridded data file that allows users to make a distinction between
water and land grid cells (and in some cases grid cells covering both). Different types
of land-sea mask are used within the climate community. A land-sea mask file could
contain land area fractions for each grid cell whereby values of 0 represent water grid
cells, values of 1 represent land grid cells and any value between 0 and 1 represents
grid cells that cover both land and sea areas. Other versions of land-sea masks may
consist of just binary values of 0 (water) and 1 (land) or may have either land or sea
grid cells set to a missing value.
The quality of the land-sea mask will depend on the quality and resolution of the
topography field used to create it. The more accurate the underlying topography
field is the better the quality of the land-sea mask.
The following example shows how CDO can be used to create a land-sea mask file
that matches the resolution of a specific data file.

cdo -f nc setctomiss,0 -gtc,0 -remapnn,data.nc -topo land-sea-mask.nc

The -f option followed by nc is used here to make sure that the output file
land-sea-mask.nc is saved in netCDF format (see Section 8.3.1 for CDO options).
The topo operator creates a topography (elevation) field based on a high-resolution
(30 arc-seconds; approximately 1 km) global digital elevation model developed by
the United States Geological Survey (USGS) named GTOPO30⁵.
The operator remapnn is used to perform nearest-neighbour interpolation of the
topography field to the resolution of the data file data.nc (see Section 8.7.1 for
remapping of files).
The gtc operator compares a field with a constant returning a field containing values
of 1 if the comparison is true and 0 if the comparison is not true. Therefore, applying
the gtc,0 operator to the topography field will return 1 for all grid cells where the
elevation is greater than 0 metres (sea level) and 0 for all other grid cells (ocean).
The field now already represents a land-sea mask with grid cell values of 0 or 1.
In addition, the setctomiss (set constant to missing) operator with the parameter 0
is used to set all grid cells with a value of 0 (ocean grid cells) to missing values.
The resulting field will then contain only grid cell values of 1 or missing values.
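The gtc,0 and setctomiss,0 steps can be sketched with numpy on a toy elevation field (the values are invented):

```python
import numpy as np

# Toy elevation field in metres: negative = ocean floor, positive = land.
elev = np.array([[-4200.0, 12.0],
                 [0.0, 835.0]])

# gtc,0: 1.0 where elevation is greater than 0, else 0.0.
mask = (elev > 0).astype(float)

# setctomiss,0: turn the remaining zeros (ocean cells) into missing values.
mask[mask == 0] = np.nan

print(mask)  # ocean cells are NaN, land cells are 1.0
```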

Ocean islands smaller than approximately 1 km² are not represented in
GTOPO30 and lowland coastal areas have an elevation of at least 1 metre
(GTOPO30 Readme⁶). It follows that any land area larger than about 1 km²
within a model-sized grid cell will likely increase the average elevation of
the grid cell above 0 metres, resulting in the grid cell being marked as land
by the CDO command above.

The output of the command ncdump -h land-sea-mask.nc may look similar to the
following.

⁵https://www.usgs.gov/centers/eros/science/usgs-eros-archive-digital-elevation-global-30-arc-second-elevation-gtopo30?qt-science_center_objects=0#qt-science_center_objects
⁶https://prd-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/s3fs-public/atoms/files/GTOPO30_Readme.pdf

netcdf land-sea-mask {
dimensions:
lon = 96 ;
bnds = 2 ;
lat = 73 ;
variables:
double lon(lon) ;
lon:standard_name = "longitude" ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
lon:bounds = "lon_bnds" ;
double lon_bnds(lon, bnds) ;
double lat(lat) ;
lat:standard_name = "latitude" ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
lat:bounds = "lat_bnds" ;
double lat_bnds(lat, bnds) ;
float topo(lat, lon) ;
topo:units = "m" ;
topo:_FillValue = -9.e+33f ;
topo:missing_value = -9.e+33f ;
...
}

9.10 Using CDO with Python


The Python programming language introduced in Chapter 7 can be used to execute
any Unix system command. This means that any command one would normally
execute on the Linux command line can also be executed from within Python.
Therefore, it is also possible to execute CDO commands from within Python (Section
7.2.5). This can be useful when batch processing large numbers of netCDF files. For
instance, a loop written in a Python script can be used to iterate over a list of files
(see Section 7.2.2 for looping through files). A CDO command can then be executed
within the loop, processing one file at a time with each iteration. An example of this
scenario is outlined in the following paragraphs.

A list of netCDF files containing daily values of rainfall generated as part of the
Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS⁷) dataset
are saved in a directory named /home/data/chirps_v20/ as follows.

/home/data/chirps_v20/0.5x0.5/chirps-v2.0.1981.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.1982.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.1983.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.1984.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.1985.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.1986.days_p05.nc
...
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.2013.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.2014.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.2015.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.2016.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.2017.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.2018.days_p05.nc

Each file contains one year of rainfall data. The period covered is 1981 to 2018,
yielding 38 files. Python can now be used to loop over the files, computing monthly
rainfall totals for each one with CDO, as shown in the following Python code example.

1 import numpy as np
2 from os.path import basename
3 import subprocess
4
5 # define paths
6 datain = '/home/data/chirps_v20/0.5x0.5/'
7 dataout = 'output/'
8
9 # loop through years
10 for yyyy in np.arange(1981, 2019):
11 # construct input filename
12 ifile = datain+'chirps-v2.0.'+str(yyyy)+'.days_p05.nc'
13 print('Processing', ifile)
14
15 # construct output filename
⁷https://www.chc.ucsb.edu/data/chirps

16 bname = basename(ifile)
17 ofile = dataout+bname.replace('days', 'monsum')
18
19 # execute CDO command
20 cmd = 'cdo monsum '+ifile+' '+ofile
21 process = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
22 process.communicate()

In lines 1 to 3 the required packages and functions are imported, followed in lines
6 and 7 by defining variables that hold the data input and output directory paths.
In line 10 the loop is set up with the variable yyyy iterating over a sequence of
years from 1981 to 2018 (np.arange excludes the stop value 2019).
Inside the loop the filename is constructed in line 12 and saved in the variable ifile.
Note that the input directory path (datain) is joined with the filename. The year in
the middle of the filename is the part that changes with each iteration and is
added here by converting the year number to a string (str(yyyy)).
In lines 16 and 17 the output filename is constructed. The basename() function is
used to strip the directory path from the input filename (line 16). The output
filename is then built by replacing the string days in the input filename with
monsum (the CDO operator for monthly totals), prepending the output directory
path and saving the result in the variable ofile.
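The filename handling in lines 12, 16 and 17 can be tried in isolation. The sketch below uses 1985 as an example year and the directory from the file listing above; the printed result is the output filename the loop would produce for that year.

```python
from os.path import basename

datain = '/home/data/chirps_v20/0.5x0.5/'  # directory from the listing above
dataout = 'output/'

# input filename for one example year (as in line 12)
yyyy = 1985
ifile = datain + 'chirps-v2.0.' + str(yyyy) + '.days_p05.nc'

# output filename (as in lines 16 and 17)
bname = basename(ifile)                    # strip the directory path
ofile = dataout + bname.replace('days', 'monsum')
print(ofile)
# → output/chirps-v2.0.1985.monsum_p05.nc
```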
Next, the CDO command is constructed in line 20 using the CDO operator monsum to
calculate monthly rainfall totals. Lines 21 and 22 execute the command saved in
the variable cmd using the subprocess module (see Section 7.2.5 for more details).
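From Python 3.5 onwards, subprocess.run() offers a simpler interface than Popen() for this run-and-wait pattern (the capture_output argument requires Python 3.7). In the sketch below, echo stands in for cdo so it can be executed without CDO installed; in the script above the same call would receive the cmd string built in line 20.

```python
import subprocess

# echo stands in for the real command ('cdo monsum <ifile> <ofile>')
# so this sketch runs even where CDO is not installed
cmd = 'echo cdo monsum ifile.nc ofile.nc'
result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
print(result.returncode)       # 0 indicates the command succeeded
print(result.stdout.strip())   # → cdo monsum ifile.nc ofile.nc
```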
As an alternative to the method described above, the Python package cdo⁸ may
be used to integrate CDO functionality into a Python script. The usefulness of this
module depends on the task at hand. It is worth noting that the Python cdo package
does not install CDO itself. It just acts as a wrapper around the CDO binaries. A
description of the module including installation instructions and how to use it can
be found on the MPI for Meteorology webpage⁹.
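With the cdo package, CDO operators become methods of a Cdo object, so no command strings need to be assembled by hand. The following is a minimal sketch, assuming both the package and the CDO binaries are installed; in.nc and out_monsum.nc are placeholder filenames.

```python
# the import and call are guarded so the sketch degrades gracefully
# where the cdo package (pip install cdo) or the CDO binaries are missing
try:
    from cdo import Cdo

    cdo = Cdo()
    # equivalent to the shell command: cdo monsum in.nc out_monsum.nc
    cdo.monsum(input='in.nc', output='out_monsum.nc')
except Exception as exc:
    print('cdo not usable here:', exc)
```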

⁸https://pypi.org/project/cdo
⁹https://code.mpimet.mpg.de/projects/cdo/wiki/Cdo%7Brbpy%7D
Appendix
Appendix 285


List of Acronyms

3D 3-dimensional
ACL Access Control List
AGCM Atmospheric General Circulation Model
ASCII American Standard Code for Information Interchange
CD-ROM Compact Disc Read-Only Memory
CFSR Climate Forecast System Reanalysis
CLI Command Line Interface
CRU Climate Research Unit
CSS Cascading Style Sheets
CSV Comma-Separated Values
DOE Department of Energy
DVD Digital Versatile Disc
ECMWF European Centre for Medium-Range Weather Forecasts
ERA ECMWF ReAnalysis
FTP File Transfer Protocol
GDAL Geospatial Data Abstraction Library
GNOME GNU Network Object Model Environment
GNU GNU’s Not Unix! (recursive acronym)
GPCP Global Precipitation Climatology Project
GRIB Gridded Binary
GTOPO30 Global Topography at 30 arc-second resolution
GUI Graphical User Interface
HadISST Hadley Centre Sea Ice and Sea Surface Temperature
HPC High Performance Computing
IDE Integrated Development Environment
JMA Japan Meteorological Agency
JRA JMA ReAnalysis
LAN Local Area Network
LLJ Low Level Jet
MERRA Modern-Era Retrospective Analysis for Research and
Applications
MPI Max-Planck-Institute
MPI-BGC MPI for Biogeochemistry
MSL Mean Sea Level
MSLP Mean Sea Level Pressure
NASA National Aeronautics and Space Administration
NCAR National Center for Atmospheric Research
NCEP National Centers for Environmental Prediction
NCL NCAR Command Language
netCDF Network Common Data Form
NWP Numerical Weather Prediction
PC Personal Computer
Pibal Pilot Balloon
Pip Pip Installs Packages (recursive acronym)
PP Post Processed
PSF Python Software Foundation
PyPI Python Package Index
SYNOP Surface Synoptic Observations
TAR Tape ARchive
TIFF Tagged Image File Format
UEA University of East Anglia
UI User Interface
URL Uniform Resource Locator (web address)
USB Universal Serial Bus
USGS United States Geological Survey
VPN Virtual Private Network
ZIP Not an acronym; means move at high speed
