9.3.2 CDO Operator Categories
The option -b is needed when working with packed netCDF files (see
Section 5.3).
Category Description
File information Print file information
File operations Copy, split or merge files
Selection Select (extract) parts of a dataset
Comparison Compare datasets
Modification Modify datasets
Arithmetic Arithmetic operations
Statistics Spatial, vertical, temporal and ensemble statistics
Correlation Correlation statistics
Regression Regression and trend analysis
Interpolation Spatial, vertical and time interpolation
Transformation Spectral, divergence and vorticity analysis
Note that parameter values (longitude and latitude boundaries for the Congo Basin)
are passed to the sellonlatbox operator (see Section 8.3.4).
In the second example, the value 273.15 (a value of type float) is passed to the operator
subc (subtract constant) in order to be subtracted from each data value in the input
file (i.e., to convert temperature values from Kelvin to °C).
Data Analysis with CDO 241
In the third example, the integer values 5 to 9 (values of the type integer) are passed
to the operator selmonth (select months) in order to extract the corresponding months
from the input file.
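A command of the kind described might look as follows (a sketch; the file names ifile.nc and ofile.nc are placeholders):

```shell
# Select the months May to September (months 5 to 9) from the input file
cdo selmonth,5,6,7,8,9 ifile.nc ofile.nc
```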
No white spaces are allowed between an operator and its associated parameter(s).
Only commas are used to separate the operator and parameter(s).
A short-cut for passing sequential lists of integer values such as months or years to
operators is to separate the first and last value of the list by a forward slash (/) as in
the following example. The values between the first and last list element are filled in
automatically when the operator is applied. This is especially useful for long value
lists. The following example will result in the same output file as in the previous
example.
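A sketch of such a command, again with placeholder file names:

```shell
# Equivalent to selmonth,5,6,7,8,9: the slash expands the sequence 5..9
cdo selmonth,5/9 ifile.nc ofile.nc
```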
era5_hourly_d2m_201501.nc
era5_hourly_d2m_201502.nc
era5_hourly_d2m_201503.nc
era5_hourly_d2m_201504.nc
era5_hourly_d2m_201505.nc
era5_hourly_d2m_201506.nc
era5_hourly_d2m_201507.nc
era5_hourly_d2m_201508.nc
era5_hourly_d2m_201509.nc
era5_hourly_d2m_201510.nc
era5_hourly_d2m_201511.nc
era5_hourly_d2m_201512.nc
To merge these files into a single file in which all the timesteps are sorted correctly the
following command can be used. Note that the special character question mark (?)
is used here to generate a list of input files (see Section 3.4.5 for special characters).
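The command might take the following form (the output file name is illustrative; each ? matches a single character of the month part of the file names):

```shell
# Merge the twelve monthly files; mergetime sorts all timesteps by date and time
cdo mergetime era5_hourly_d2m_2015??.nc era5_hourly_d2m_2015.nc
```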
The operators copy and cat can be used to achieve similar results to the use of
mergetime. The exact differences, however, are not well documented in the CDO
User’s Guide. The conditions for the input files are the same as those for mergetime
(see Table 8.4.1).
While the operator copy can be used for single file operations such as converting to
netCDF or adding a relative time axis it can also be used to concatenate datasets that
have different timesteps but have otherwise the same structure and same variables.
One of the differences is that the operators copy and cat will not automatically sort
the data fields by date and time. Therefore, the order of the list of input files matters
here.
Care should be taken if the operators copy and cat are used to merge files
because the data fields are not automatically sorted by date and time as
done by the operator mergetime.
The following command will achieve the same results as in the mergetime example
above only because the use of the special characters ?? will result in a list of input
files where the data fields across all files are sorted by date and time.
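A sketch of such a command (the output file name is illustrative):

```shell
# copy concatenates the files in the given order; the ?? wildcard expands to an
# alphabetically sorted list, which here is also sorted by date and time
cdo copy era5_hourly_d2m_2015??.nc era5_hourly_d2m_2015.nc
```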
The operator cat works in a similar way to mergetime and copy but it can be used to
append data fields to an existing file. If the file does not exist then it will be created.
Therefore, the following command will achieve the same result as in the mergetime
and copy examples above as long as the output file does not already exist and the list
of input files is sorted by date and time.
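A sketch of such a command (the output file name is illustrative):

```shell
# cat appends the input files to the output file; here the output does not yet exist
cdo cat era5_hourly_d2m_2015??.nc era5_hourly_d2m_2015.nc
```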
If the number of input files is very large then the use of the cat operator is
recommended for merging files.
9.5 Selections
Selection operators are frequently used to select parts or specific features from a
netCDF file such as temporal and spatial subsets and subsets of variables. Selection
operator names start with sel followed by the feature to be selected (e.g., selmonth). If
the selected feature is saved into a new file without any further processing then one
may also refer to the process as extraction. Some of the more frequently used selection
operators are discussed in the following sub-sections (Section 8.4.1 to Section 8.4.4).
Longitude values ranging between -180° and 180° can be used with the
sellonlatbox operator even if the longitudes in the file range from 0° to
360° (Pacific centred maps). CDO will convert them internally. Similarly,
longitude values ranging between 0° to 360° can be used with Africa centred
files (longitudes ranging from -180° and 180°).
air temperature or specific humidity). Types of vertical levels used for the latter are
discussed in more detail in Section 4.7. One or more vertical levels can be selected
from such files using the sellevel or sellevidx operator.
For the use of the sellevel operator the level or levels of interest must be specified
using values saved in the vertical dimension variable. To find out which values
are saved in the variable associated with the vertical dimension, the file information
operator showlevel can be used.
In the following example the data field associated with the 925 hPa pressure level is
extracted from a file.
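A sketch of such a command (placeholder file names; the value 925 assumes the level variable is stored in hPa/millibars):

```shell
# Extract the 925 hPa pressure level
cdo sellevel,925 ifile.nc ofile.nc
```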
When using the sellevidx operator the levels to be selected are specified using the
corresponding index values.
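For instance (a sketch with placeholder file names):

```shell
# Extract the first vertical level by its index
cdo sellevidx,1 ifile.nc ofile.nc
```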
In the following example all timesteps with an hour that matches either 12 or 18 UTC
are selected.
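A sketch with placeholder file names:

```shell
# Select all timesteps at 12 and 18 UTC
cdo selhour,12,18 ifile.nc ofile.nc
```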
In the following example all timesteps with a month that matches June are selected.
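A sketch with placeholder file names:

```shell
# Select all June timesteps (month 6)
cdo selmonth,6 ifile.nc ofile.nc
```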
In the following example all timesteps that fall between the years 1980 to 1989
inclusive are selected. Note the shortcut for creating a list of years by using a forward
slash (/) between the first and last year.
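A sketch with placeholder file names:

```shell
# Select the years 1980 to 1989 inclusive
cdo selyear,1980/1989 ifile.nc ofile.nc
```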
The parameters for selecting times, dates and seasons are of the type String whereas
all other parameters are of the type Integer (Table 8.5.4.1). String parameters are
passed on to the operator using specific formats which are listed in Table 8.5.4.2.
Table 8.5.4.2: Specific formats of time, date and season parameters used with selection operators.
A specific time range can be defined by using the operator seldate. The operator
expects parameters corresponding to the start date and time (date1) as well as the
end date and time (date2). All timesteps that fall between these two date/time
parameters inclusively will be selected. The format of the date/time parameter is
YYYY-MM-DDThh:mm:ss whereby YYYY is the year, MM is the month, DD is the day, hh is the
hour, mm is the minute and ss is the second. The letter T separates the date and the
time part of the parameter. Time details hh:mm:ss may be omitted in cases where they
are not required (e.g., monthly mean timesteps).
In the following example all timesteps that fall between 1 Jan 1975 and 31 Dec 2012
inclusive are selected.
In the following example all timesteps that fall between 1 Jan 2000 at 12 UTC and 3
Jan 2000 at 12 UTC inclusive are selected.
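A sketch with placeholder file names:

```shell
# Select all timesteps from 1 Jan 2000 12 UTC to 3 Jan 2000 12 UTC inclusive
cdo seldate,2000-01-01T12:00:00,2000-01-03T12:00:00 ifile.nc ofile.nc
```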
In the following example all timesteps that fall within the months of June, July and
August are selected, corresponding to the season JJA.
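Using the season-selection operator selseas, a sketch of such a command (placeholder file names):

```shell
# Select all timesteps in the season June-July-August
cdo selseas,JJA ifile.nc ofile.nc
```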
The difference between the statistic mean (mean) and average (avg) is artificial and
relates to how missing values are treated in the computation. The operator mean
ignores missing values. For instance, the mean of an array containing the four
elements 1, 2, missing value and 3 is
(1 + 2 + 3) / 3
= 2
whereas the average operator (avg) applied to the same four element array is
(1 + 2 + missing_value + 3) / 4
= missing_value / 4
= missing_value
If the array does not include any missing values then the use of statistic mean and avg
will produce the same result.
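The distinction can be reproduced outside CDO with a few lines of plain shell (an illustrative sketch using awk; NA stands in for the missing value):

```shell
# mean skips the missing value: (1 + 2 + 3) / 3 = 2
printf '1\n2\nNA\n3\n' | awk '$1 != "NA" {s += $1; n++} END {print s/n}'
# avg lets the missing value propagate, so the result is itself missing
printf '1\n2\nNA\n3\n' | awk '{if ($1 == "NA") m = 1; else s += $1} END {print m ? "missing" : s/NR}'
```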
Some of the more frequently used statistical computations are discussed in the
following sub-sections (Section 8.5.1 to Section 8.5.4).
ncdump -h cru_ts4.02.1979.2015.tmp.dat.nc
³http://data.ceda.ac.uk/badc/cru/data/cru_ts/cru_ts_4.02/data/tmp
netcdf cru_ts4.02.1979.2015.tmp.dat {
dimensions:
lon = 720 ;
lat = 360 ;
time = UNLIMITED ; // (37 currently)
variables:
float lon(lon) ;
lon:standard_name = "longitude" ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
float lat(lat) ;
lat:standard_name = "latitude" ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
double time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:units = "days since 1900-1-1 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
float tmp(time, lat, lon) ;
tmp:long_name = "near-surface temperature" ;
tmp:units = "degrees Celsius" ;
tmp:_FillValue = 9.96921e+36f ;
tmp:missing_value = 9.96921e+36f ;
tmp:correlation_decay_distance = 1200.f ;
...
}
The file contains near-surface temperature data on a global 720 by 360 spatial grid
(0.5° resolution) and has 37 timesteps (one for each year).
The following CDO command computes the long-term mean near-surface tempera-
ture field over all timesteps using the timmean operator. The output is saved in a file
named tmp_ltm.nc.
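The command takes the following form (based on the file names used above):

```shell
# Long-term mean over all 37 timesteps
cdo timmean cru_ts4.02.1979.2015.tmp.dat.nc tmp_ltm.nc
```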
The header information of the output file tmp_ltm.nc, printed by the command
ncdump -h tmp_ltm.nc, looks like the following.
netcdf tmp_ltm {
dimensions:
lon = 720 ;
lat = 360 ;
time = UNLIMITED ; // (1 currently)
bnds = 2 ;
variables:
float lon(lon) ;
lon:standard_name = "longitude" ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
float lat(lat) ;
lat:standard_name = "latitude" ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
double time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:bounds = "time_bnds" ;
time:units = "days since 1900-1-1 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
double time_bnds(time, bnds) ;
float tmp(time, lat, lon) ;
tmp:long_name = "near-surface temperature" ;
tmp:units = "degrees Celsius" ;
tmp:_FillValue = 9.96921e+36f ;
tmp:missing_value = 9.96921e+36f ;
tmp:correlation_decay_distance = 1200.f ;
...
}
The time dimension (time) now only holds 1 timestep whereas the spatial dimensions
(lon and lat) remain unchanged.
Statistics calculated over the temporal domain using the tim<stat> operator result in
the collapse of the time dimension to 1 as shown in Figure 8.5.1.1 for a 3D field.
Statistics calculated over the time domain using the tim<stat> operator will
result in the collapse of the time dimension to 1.
Figure 8.5.1.1: Schematic showing the collapse of the time dimension when the tim<stat> operator
is applied to a 3D (longitude, latitude and time) data structure resulting in a 2D (longitude and
latitude) data structure.
Applying the tim<stat> operator to a 4D field (longitude, latitude, level and time)
will also collapse the time dimension to 1 as illustrated in Figure 8.5.1.2.
Figure 8.5.1.2: Schematic showing the collapse of the time dimension when the tim<stat> operator
is applied to a 4D (longitude, latitude, levels and time) data structure resulting in a 3D (longitude,
latitude and levels) data structure.
In addition, the CDO command added a new dimension variable named bnds and a
new data variable named time_bnds which holds the two timesteps associated with
the boundaries of the time range over which the mean was computed.
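The time series file examined below can be produced along these lines (a sketch; fldmean computes the area-weighted spatial mean of each timestep):

```shell
# Collapse the spatial dimensions, keeping all 37 timesteps
cdo fldmean cru_ts4.02.1979.2015.tmp.dat.nc tmp_timeseries.nc
```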
The output of the command ncdump -h tmp_timeseries.nc looks like the following.
netcdf tmp_timeseries {
dimensions:
lon = 1 ;
lat = 1 ;
time = UNLIMITED ; // (37 currently)
variables:
double lon(lon) ;
lon:standard_name = "longitude" ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
double lat(lat) ;
lat:standard_name = "latitude" ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
double time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:units = "days since 1900-1-1 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
float tmp(time, lat, lon) ;
tmp:long_name = "near-surface temperature" ;
tmp:units = "degrees Celsius" ;
tmp:_FillValue = 9.96921e+36f ;
tmp:missing_value = 9.96921e+36f ;
tmp:correlation_decay_distance = 1200.f ;
...
}
Note that the number of elements of the dimensions longitude and latitude has
been collapsed to 1 while the time dimension (time) remains unchanged. Each year
is now associated with a single near-surface temperature value thereby creating a
time series. Opening the output file using the command ncview tmp_timeseries.nc
will show the time series in a simple plot (Figure 8.5.2.1).
Figure 8.5.2.1: Screenshot of the ncview window when opening the output file tmp_timeseries.nc
showing annual global mean near-surface temperatures from 1979 to 2015.
The collapse of the spatial dimensions longitude and latitude is demonstrated visually
for a 3D data structure (as used in the example above) in Figure 8.5.2.2 and for a 4D
data structure in Figure 8.5.2.3.
Statistics calculated over the spatial domain using the fld<stat> operator
will result in the collapse of the longitude and latitude dimension to 1.
Figure 8.5.2.2: Schematic showing the collapse of the spatial dimensions longitude and latitude when
the fld<stat> operator is applied to a 3D (longitude, latitude and time) data structure resulting in a
1D (time only) data structure.
Figure 8.5.2.3: Schematic showing the collapse of the spatial dimensions longitude and latitude when
the fld<stat> operator is applied to a 4D (longitude, latitude, levels and time) data structure resulting
in a 2D (levels and time) data structure.
ncdump -h erai_q_ltm.nc
netcdf erai_q_ltm {
dimensions:
longitude = 480 ;
latitude = 241 ;
level = 37 ;
time = UNLIMITED ; // (1 currently)
bnds = 2 ;
variables:
float longitude(longitude) ;
longitude:standard_name = "longitude" ;
longitude:long_name = "longitude" ;
longitude:units = "degrees_east" ;
longitude:axis = "X" ;
float latitude(latitude) ;
latitude:standard_name = "latitude" ;
latitude:long_name = "latitude" ;
latitude:units = "degrees_north" ;
latitude:axis = "Y" ;
double level(level) ;
level:standard_name = "air_pressure" ;
level:long_name = "pressure_level" ;
level:units = "millibars" ;
level:positive = "down" ;
level:axis = "Z" ;
double time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:bounds = "time_bnds" ;
time:units = "hours since 1900-1-1 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
double time_bnds(time, bnds) ;
double q(time, level, latitude, longitude) ;
q:standard_name = "specific_humidity" ;
q:long_name = "Specific humidity" ;
The specific humidity data are saved for 37 vertical levels on a 480 by 241 global
grid. The vertical levels represent atmospheric pressure levels. The following CDO
command calculates the vertically integrated (summed up) specific humidity using
the CDO operator vertsum. The output is saved in a file named erai_ltm_q_vertsum.nc.
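The command takes the following form (based on the file names used above):

```shell
# Sum the specific humidity field over all 37 pressure levels
cdo vertsum erai_q_ltm.nc erai_ltm_q_vertsum.nc
```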
The output of the command ncdump -h erai_ltm_q_vertsum.nc looks like the following.
netcdf erai_ltm_q_vertsum {
dimensions:
longitude = 480 ;
latitude = 241 ;
time = UNLIMITED ; // (1 currently)
bnds = 2 ;
variables:
float longitude(longitude) ;
longitude:standard_name = "longitude" ;
longitude:long_name = "longitude" ;
longitude:units = "degrees_east" ;
longitude:axis = "X" ;
float latitude(latitude) ;
latitude:standard_name = "latitude" ;
latitude:long_name = "latitude" ;
latitude:units = "degrees_north" ;
latitude:axis = "Y" ;
double time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:bounds = "time_bnds" ;
time:units = "hours since 1900-1-1 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
double time_bnds(time, bnds) ;
double q(time, latitude, longitude) ;
q:standard_name = "specific_humidity" ;
q:long_name = "Specific humidity" ;
q:units = "kg kg**-1" ;
q:_FillValue = -32767. ;
q:missing_value = -32767. ;
...
}
Note that CDO, in this case, not only collapsed the vertical dimension but also
removed it completely from the netCDF file. The longitude, latitude and time
dimensions remain unchanged. The single level output field represents the global
long-term mean vertically integrated specific humidity.
The collapse of the vertical dimension level is demonstrated visually for a 4D data
structure (as used in the example above) in Figure 8.5.3.1 and for a 2D data structure
in Figure 8.5.3.2.
Statistics calculated over the vertical domain using the vert<stat> operator
will result in the collapse of the vertical dimension to 1.
Figure 8.5.3.1: Schematic showing the collapse of the vertical dimension level when the vert<stat>
operator is applied to a 4D (longitude, latitude, level and time) data structure resulting in a 3D
(longitude, latitude and time) data structure.
Figure 8.5.3.2: Schematic showing the collapse of the vertical dimension level when the vert<stat>
operator is applied to a 2D (levels and time) data structure resulting in a 1D (time only) data
structure (time series).
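The zonal mean file examined next can be produced along these lines (a sketch, assuming the long-term mean file from above as input):

```shell
# Average over the longitude dimension at every latitude and level
cdo zonmean erai_q_ltm.nc erai_ltm_q_zonmean.nc
```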
The output of the command ncdump -h erai_ltm_q_zonmean.nc looks like the following.
netcdf erai_q_ltm_zonmean {
dimensions:
lon = 1 ;
lat = 241 ;
level = 37 ;
time = UNLIMITED ; // (1 currently)
bnds = 2 ;
variables:
double lon(lon) ;
lon:standard_name = "longitude" ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
double lat(lat) ;
lat:standard_name = "latitude" ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
double level(level) ;
level:standard_name = "air_pressure" ;
level:long_name = "pressure_level" ;
level:units = "millibars" ;
level:positive = "down" ;
level:axis = "Z" ;
double time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:bounds = "time_bnds" ;
time:units = "hours since 1900-1-1 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
double time_bnds(time, bnds) ;
double q(time, level, lat, lon) ;
q:standard_name = "specific_humidity" ;
q:long_name = "Specific humidity" ;
q:units = "kg kg**-1" ;
q:_FillValue = -32767. ;
q:missing_value = -32767. ;
...
}
Note that the longitude dimension collapsed to 1 whereas the number of elements
of the latitude, level and time dimensions remain unchanged. The output field
represents a latitude by height cross section of the atmosphere.
The collapse of the zonal dimension longitude is demonstrated visually for a 4D
data structure (as used in the example above) in Figure 8.5.4.1 and for a 3D data
structure in Figure 8.5.4.2.
Statistics calculated over the zonal domain using the zon<stat> operator
will result in the collapse of the longitude dimension to 1.
Figure 8.5.4.1: Schematic showing the collapse of the zonal dimension longitude when the zon<stat>
operator is applied to a 4D (longitude, latitude, levels and time) data structure resulting in a 3D
(latitude, levels and time) data structure.
Figure 8.5.4.2: Schematic showing the collapse of the zonal dimension longitude when the zon<stat>
operator is applied to a 3D (longitude, levels and time) data structure resulting in a 2D (levels and
time) data structure.
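The meridional mean file examined next can be produced along these lines (a sketch, assuming the long-term mean file from above as input):

```shell
# Average over the latitude dimension at every longitude and level
cdo mermean erai_q_ltm.nc erai_ltm_q_mermean.nc
```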
The output of the command ncdump -h erai_ltm_q_mermean.nc looks like the following.
netcdf erai_ltm_q_mermean {
dimensions:
lon = 480 ;
lat = 1 ;
level = 37 ;
time = UNLIMITED ; // (1 currently)
bnds = 2 ;
variables:
double lon(lon) ;
lon:standard_name = "longitude" ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
double lat(lat) ;
lat:standard_name = "latitude" ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
double level(level) ;
level:standard_name = "air_pressure" ;
level:long_name = "pressure_level" ;
level:units = "millibars" ;
level:positive = "down" ;
level:axis = "Z" ;
double time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:bounds = "time_bnds" ;
time:units = "hours since 1900-1-1 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
double time_bnds(time, bnds) ;
double q(time, level, lat, lon) ;
q:standard_name = "specific_humidity" ;
q:long_name = "Specific humidity" ;
q:units = "kg kg**-1" ;
q:_FillValue = -32767. ;
q:missing_value = -32767. ;
...
}
Note that the latitude dimension collapsed to 1 whereas the number of elements
of the longitude, level and time dimensions remain unchanged. The output field
represents a longitude by height cross section of the atmosphere.
The collapse of the meridional dimension latitude is demonstrated visually for a 4D data
structure (as used in the example above) in Figure 8.5.5.1 and for a 3D data structure
in Figure 8.5.5.2.
Statistics calculated over the meridional domain using the mer<stat> oper-
ator will result in the collapse of the latitude dimension to 1.
Figure 8.5.5.1: Schematic showing the collapse of the meridional dimension latitude when the
mer<stat> operator is applied to a 4D (longitude, latitude, levels and time) data structure resulting
in a 3D (longitude, levels and time) data structure.
Figure 8.5.5.2: Schematic showing the collapse of the meridional dimension latitude when the
mer<stat> operator is applied to a 3D (latitude, levels and time) data structure resulting in a 2D
(levels and time) data structure.
is often not the case when comparing output from different models. If the files do
not share the same resolution then files have to be interpolated spatially (known as
remapping) to a common resolution before ensemble statistics can be calculated (see
Section 8.6.1 for remapping options).
In the following example the 95th percentile is calculated over relative humidity (hurs)
fields from six ensemble members (r1i1p1 to r6i1p1) of historical experiment CCSM4
global climate model runs. Note that the asterisk (*) is used here as a wildcard in
order to generate a list of input files.
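A sketch of such a command (the CMIP5-style input file names and the output file name are illustrative):

```shell
# 95th percentile across the six ensemble members, computed field by field
cdo enspctl,95 hurs_*_CCSM4_historical_r*i1p1_*.nc hurs_CCSM4_historical_enspctl95.nc
```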
9.7 Interpolations
The CDO operator name is constructed from two parts. The first part is remap; the
second part is the interpolation method (<InterpMethod>) to be used.
Table 8.7.1.1 lists the available interpolation methods. Using both parts, operator
names such as remapbil (for bilinear remapping) or remapnn (for nearest neighbour
remapping) can be constructed.
<InterpMethod> Description
bil Bilinear remapping
bic Bicubic remapping
nn Nearest neighbour remapping
dis Distance-weighted average remapping
ycon First order conservative remapping (YAC implementation)
con First order conservative remapping (SCRIP implementation)
con2 Second order conservative remapping
laf Largest area fraction remapping
The operator requires a parameter that holds the information about the desired new
grid (<NewGrid>). Table 8.7.1.2 lists possible parameter options for the new grid.
In the following example the bilinear interpolation method is used to interpolate data to the geographic
location specified by the longitude -1.255987 and latitude 51.758571 (location of the
School of Geography and the Environment, University of Oxford).
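A sketch of such a command (placeholder file names; the lon=/lat= pair describes a single-point target grid):

```shell
# Bilinearly interpolate to one geographic point
cdo remapbil,lon=-1.255987/lat=51.758571 ifile.nc ofile.nc
```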
Table 8.7.2.1: CDO operators used for interpolations in the vertical domain.
Operator Description
intlevel Linear vertical interpolation of non-hybrid 3D variables
intlevel3d Linear vertical interpolation of 3D variable fields with given 3D
vertical coordinates
ml2pl Interpolation of 3D variables on hybrid sigma pressure level to
pressure levels
ml2hl Interpolation of 3D variables on hybrid sigma pressure level to
height levels
ap2pl Interpolation of 3D variables on hybrid sigma height coordinates to
pressure levels
ap2hl Interpolation of 3D variables on hybrid sigma height coordinates to
height levels
For 3D fields on non-hybrid levels the operator intlevel can be used to perform linear
interpolation to a new set of vertical levels. For instance, the following command
interpolates data on pressure levels to a new set of target pressure levels. The target
pressure levels are passed to the operator as parameters. Make sure that the units of
the level variable in the input file match those of the target levels (e.g., hPa or Pa).
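For instance (a sketch; the target pressure levels and file names are illustrative):

```shell
# Linearly interpolate to the listed target pressure levels (here in hPa)
cdo intlevel,1000,925,850,700,500 ifile.nc ofile.nc
```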
For a more detailed description of the other vertical interpolation operators see the
CDO User’s Guide⁴.
The format for the date and time information is YYYY-MM-DD and hh:mm:ss, respectively.
An optional increment parameter (inc) can be passed to the operator. Possible
increments include seconds, minutes, hours, days, months and years (the default is 0hour).
For example, the following command will interpolate a file with 3-hourly temporal
resolution starting at 1 June 1980 at 18 UTC to a 1-hourly time series.
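A sketch with placeholder file names:

```shell
# Interpolate to a 1-hourly time axis starting 1 June 1980 at 18 UTC
cdo inttime,1980-06-01,18:00:00,1hour ifile.nc ofile.nc
```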
Alternatively, the intntime operator can be used for time interpolations. Instead of the
temporal resolution as used with inttime the intntime operator expects the number of
timesteps from one timestep to the next in the original file to be passed as a parameter.
The following command will produce the same output as the inttime example above.
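A sketch with placeholder file names:

```shell
# Three timesteps from one original timestep to the next: 3-hourly becomes 1-hourly
cdo intntime,3 ifile.nc ofile.nc
```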
⁴https://code.zmaw.de/projects/cdo/wiki/Cdo#Documentation
The operator (in the above example sub) is followed by the two input file names
separated by white spaces, followed by the output filename. To mentally visualise
which file is subtracted from which, one might imagine the arithmetic operator to be
located between the two file names.
The two file names are passed to the arithmetic operator as arguments
separated by white spaces and not commas as done with other operator
arguments.
data point in a data field may be multiplied with or divided by a constant using the
mulc or divc operator, respectively.
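For instance (a sketch; the file names and the particular conversion are illustrative):

```shell
# Multiply each data point by 86400, e.g. to convert mm/s to mm/day
cdo mulc,86400 ifile.nc ofile.nc
```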
The constant is passed to the operator as a parameter in the usual way (see Section
8.4.3). In the following example the constant 273.15 is added to each data point in the
input file ifile.nc in order to convert temperature data from °C to Kelvin.
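The command takes a form like (the output file name is illustrative):

```shell
# Add 273.15 to every data point to convert from °C to Kelvin
cdo addc,273.15 ifile.nc ofile.nc
```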
Note that the value of the units attribute in the netCDF file (e.g., Degree
Celsius) remains unchanged. Only the values in the data variable change.
As the fields from the intermediate steps are passed on in memory it is important to
think through what the structure of the data field would look like after the execution
of each individual operator. Dimensions or temporal resolution of the field may
change. If in doubt then execute each operator individually, save the output and
check the file using ncdump, cdo [s]info[n] and ncview.
Remember that hyphens (-) are required for all operators apart from the leftmost one
(executed last).
The following subsections will cover some examples of how multiple operators may
be applied and what a CDO workflow might look like.
Step 2: The following CDO command calculates the long-term mean November SSTs
for the period 1980 to 2010 and saves the output in a file named HadISST_sst_Nov_ltm.nc.
First, the selection operator selyear,1980/2010 is executed, which selects all timesteps
between 1980 and 2010 inclusive. The resulting data field should now have 31 times 12
(=372) timesteps. The data field is passed on to the selection operator selmon,11 which
now selects all timesteps corresponding to the month November. The resulting data
field from this operation should now have 31 timesteps. The resulting data field is
now passed on to the statistics operator timmean which calculates the mean over the
temporal domain. The final data saved in the file HadISST_sst_Nov_ltm.nc represents
a single timestep (long-term mean) field on a 1° by 1° global grid.
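A sketch of this chained command (the input file name HadISST_sst.nc is an assumption; the output file name is given above):

```shell
# Rightmost operator runs first: select years, then November, then average in time
cdo timmean -selmon,11 -selyear,1980/2010 HadISST_sst.nc HadISST_sst_Nov_ltm.nc
```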
Step 4: In order to calculate the anomaly the 1980-2010 long-term mean field needs
to be subtracted from the November 1997 field. The following command does exactly
that by using the arithmetic operator sub which subtracts SST data in file HadISST_-
sst_Nov_ltm.nc from the SST data in file HadISST_sst_Nov1997.nc. The resulting SST
anomaly field is saved in the file HadISST_sst_Nov1997_anom.nc.
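A sketch of this command, using the file names given above:

```shell
# Subtract the second input (the long-term mean) from the first (November 1997)
cdo sub HadISST_sst_Nov1997.nc HadISST_sst_Nov_ltm.nc HadISST_sst_Nov1997_anom.nc
```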
The CDO command may output a warning such as cdo sub (Warning):
Input streams have different parameters!. This is the result of the file
HadISST_sst_Nov_ltm.nc having an additional variable named ‘time_bnds_2’
that was created as a result of the cdo command in Step 2.
netcdf cru_ts4.03.1901.2018.pre.dat {
dimensions:
lon = 720 ;
lat = 360 ;
time = UNLIMITED ; // (1416 currently)
variables:
float lon(lon) ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
float lat(lat) ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
float time(time) ;
time:long_name = "time" ;
time:units = "days since 1900-1-1" ;
time:calendar = "gregorian" ;
float pre(time, lat, lon) ;
pre:long_name = "precipitation" ;
pre:units = "mm/month" ;
pre:correlation_decay_distance = 450.f ;
pre:_FillValue = 9.96921e+36f ;
pre:missing_value = 9.96921e+36f ;
int stn(time, lat, lon) ;
stn:description = "number of stations contributing to each datum" ;
stn:_FillValue = -999 ;
stn:missing_value = -999 ;
...
}
Step 2: The following CDO command calculates the JAS rainfall amounts for each
year between 1950 and 2018 averaged over the Sahel domain. First, the data variable
pre is selected using the operator selvar,pre. Second, the period 1950 to 2018 is
selected using the operator selyear,1950/2018 followed by selecting the months July
to September using the operator selmon,7,8,9. Next the spatial domain is cropped
to the Sahel region using the operator sellonlatbox,-20,30,10,18. Next, the rainfall
values are summed up for each year. This will give one value per year per grid box
representing JAS totals. Lastly, the spatial average is calculated for the Sahel domain
using the statistics operator fldmean. The output in the form of a time series is saved
in the file Sahel_JAS_pre.nc.
cdo fldmean -yearsum -sellonlatbox,-20,30,10,18 -selmon,7,8,9 \
    -selyear,1950/2018 -selvar,pre cru_ts4.03.1901.2018.pre.dat.nc Sahel_JAS_pre.nc
Step 3: In order to calculate the anomalies the long-term mean value needs to be
subtracted from each data value in the time series. The long-term mean of the
time series can be calculated using the following command employing the statistics
operator timmean. The output is saved in the temporary file foo.nc.
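A sketch of this command, using the file names given above:

```shell
# Long-term mean of the JAS Sahel time series
cdo timmean Sahel_JAS_pre.nc foo.nc
```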
The file ‘foo.nc’ will contain only a single data point representing the long-term mean
value of the time series. To find out what that value is the command ncdump foo.nc
(without -h) can be used. It reveals that pre = 401.9308. This is the long-term mean
JAS Sahel rainfall between 1950 and 2018.
Step 4: Now we can subtract the long-term mean value from each value in the time
series using the operator subc,401.9308 in the command below. The output is saved
in the file Sahel_JAS_pre_anom.nc.
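A sketch of this command, using the file names given above:

```shell
# Subtract the long-term mean from every value in the time series
cdo subc,401.9308 Sahel_JAS_pre.nc Sahel_JAS_pre_anom.nc
```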
Subtracting the mean value from the time series values is a fairly simple
procedure. This step could also be done in one line within the Python code
that does the plotting of the time series meaning that Step 3 and Step 4
would not be needed.
Step 5: The output file Sahel_JAS_pre_anom.nc created in the previous step contains
the rainfall anomaly time series for the Sahel region. Python can now be used to
read in and plot the data. The output file created in this example can be read in and
plotted using Code 7.x.x.2. The code creates a time series plot (Figure 7.x.x.x).
The -f option followed by nc is used here to make sure that the output file
land-sea-mask.nc is saved in netCDF format (see Section 8.3.1 for CDO options).
The topo operator creates a topography (elevation) field based on a high-resolution
(30 arc-seconds; approximately 1 km) global digital elevation model developed by
the United States Geological Survey (USGS) named GTOPO30⁵.
The operator remapnn is used to perform nearest-neighbour interpolation of the
topography field to the resolution of the data file data.nc (see Section 8.7.1 for
remapping of files).
The gtc operator compares a field with a constant, returning a field that contains the
value 1 where the comparison is true and 0 where it is not. Therefore, applying
the gtc,0 operator to the topography field will return 1 for all grid cells where the
elevation is greater than 0 metres (sea level) and 0 for all other grid cells (ocean).
While the field now already represents a land-sea mask with grid cell values of 0 or
1, the setctomiss (set constant to missing) operator with the parameter 0 is used here
in addition to set all grid cell values that are 0 (ocean grid cells) to missing values.
The resulting field will then only have grid cell values of 1 or missing values.
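The combined effect of gtc,0 and setctomiss,0 can be sketched with NumPy (the elevation values are made up for illustration; NaN stands in for CDO's missing value):

```python
import numpy as np

# hypothetical elevation values (m): <= 0 represents ocean, > 0 land
topo = np.array([-120.0, 0.0, 35.0, 1200.0])

# gtc,0: compare with the constant 0, giving 1 (true) or 0 (false)
mask = (topo > 0).astype(float)

# setctomiss,0: set all grid cells with value 0 to missing (NaN here)
mask[mask == 0] = np.nan
```

The result contains only the value 1 for land cells and missing values for ocean cells, mirroring the field described above.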
The output of the command ncdump -h land-sea-mask.nc may look similar to the
following.
⁵https://www.usgs.gov/centers/eros/science/usgs-eros-archive-digital-elevation-global-30-arc-second-elevation-
gtopo30?qt-science_center_objects=0#qt-science_center_objects
⁶https://prd-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/s3fs-public/atoms/files/GTOPO30_
Readme.pdf
netcdf land-sea-mask {
dimensions:
lon = 96 ;
bnds = 2 ;
lat = 73 ;
variables:
double lon(lon) ;
lon:standard_name = "longitude" ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
lon:bounds = "lon_bnds" ;
double lon_bnds(lon, bnds) ;
double lat(lat) ;
lat:standard_name = "latitude" ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
lat:bounds = "lat_bnds" ;
double lat_bnds(lat, bnds) ;
float topo(lat, lon) ;
topo:units = "m" ;
topo:_FillValue = -9.e+33f ;
topo:missing_value = -9.e+33f ;
...
}
A list of netCDF files containing daily values of rainfall generated as part of the
Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS⁷) dataset
is saved in a directory named /home/data/chirps_v20/ as follows.
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.1981.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.1982.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.1983.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.1984.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.1985.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.1986.days_p05.nc
...
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.2013.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.2014.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.2015.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.2016.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.2017.days_p05.nc
/home/data/chirps_v20/0.5x0.5/chirps-v2.0.2018.days_p05.nc
Each file contains one year of rainfall data. The period covered is 1981 to 2018, yielding
38 files. Python can now be used to loop over the files, with CDO computing monthly
rainfall totals for each one, as shown in the following Python code example.
1 import numpy as np
2 from os.path import basename
3 import subprocess
4
5 # define paths
6 datain = '/home/data/chirps_v20/'
7 dataout = 'output/'
8
9 # loop through years
10 for yyyy in np.arange(1981, 2019):
11 # construct input filename
12 ifile = datain+'chirps-v2.0.'+str(yyyy)+'.days_p05.nc'
13 print('Processing', ifile)
14
15 # construct output filename
⁷https://www.chc.ucsb.edu/data/chirps
16 bname = basename(ifile)
17 ofile = dataout+bname.replace('days', 'monsum')
18
19 # execute CDO command
20 cmd = 'cdo monsum '+ifile+' '+ofile
21 process = subprocess.Popen([cmd], shell=True, stdout=subprocess.PIPE)
22 process.communicate()
In lines 1 to 3 the required packages and functions are imported, followed by defining
variables that hold the data input and output directory paths in lines 6 and 7. In
line 10 the loop is set up, with the variable yyyy iterating over the sequence of years
from 1981 to 2018 (note that np.arange(1981, 2019) excludes the stop value).
Inside the loop the input filename is constructed in line 12 and saved in the variable ifile.
Note that the input directory path (datain) is joined with the filename. The year in
the middle of the filename is the part that changes with each iteration and is
added here by converting the year number to a string (str(yyyy)).
In lines 16 and 17 the output filename is constructed. The basename() function is used
here to extract the filename from the input file path, which includes the full directory
(line 16). The output filename is then constructed by replacing the string days in the
input filename with monsum (the CDO operator for monthly totals), adding the output
directory path to the beginning, and saving the result in the variable ofile.
Next, the CDO command is constructed in line 20 using the CDO operator monsum to
calculate monthly rainfall totals. Lines 21 and 22 execute the CDO command
saved in the variable cmd using the subprocess module (see Section 7.2.5 for more details).
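As a variant of the command execution in lines 20 to 22, the call could also be made with subprocess.run and the arguments passed as a list, which avoids shell=True. The filenames below are for illustration only, and the run() call is commented out because it requires the CDO binaries to be installed:

```python
import subprocess

# hypothetical input and output filenames for illustration
ifile = 'chirps-v2.0.1981.days_p05.nc'
ofile = 'chirps-v2.0.1981.monsum_p05.nc'

# passing the arguments as a list avoids shell=True
cmd = ['cdo', 'monsum', ifile, ofile]
# subprocess.run(cmd, check=True)  # executes CDO; raises on failure
print(' '.join(cmd))
```

With check=True, a non-zero exit status from CDO raises an exception instead of failing silently, which is useful when looping over many files.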
As an alternative to the method described above, the Python package cdo⁸ may
be used to integrate CDO functionality into a Python script. The usefulness of this
module depends on the task at hand. It is worth noting that the Python cdo package
does not install CDO itself; it acts as a wrapper around the CDO binaries. A
description of the module, including installation instructions and usage, can
be found on the MPI for Meteorology webpage⁹.
⁸https://pypi.org/project/cdo
⁹https://code.mpimet.mpg.de/projects/cdo/wiki/Cdo%7Brbpy%7D
Appendix
List of Acronyms
3D 3-dimensional
ACL Access Control List
AGCM Atmospheric General Circulation Model
ASCII American Standard Code for Information Interchange
CD-ROM Compact Disc Read-Only Memory
CLI Command Line Interface
CFSR Climate Forecast System Reanalysis
CRU Climatic Research Unit
CSS Cascading Style Sheets
CSV Comma-Separated Values
DOE Department of Energy
DVD Digital Versatile Disc
ECMWF European Centre for Medium-Range Weather Forecasts
ERA ECMWF ReAnalysis
FTP File Transfer Protocol
GDAL Geospatial Data Abstraction Library
GNOME GNU Network Object Model Environment
GNU GNU’s Not Unix! (recursive acronym)
GPCP Global Precipitation Climatology Project
GRIB Gridded Binary
GTOPO30 Global Topography at 30 arc-second resolution
GUI Graphical User Interface
HadISST Hadley Centre Sea Ice and Sea Surface Temperature
HPC High Performance Computing
IDE Integrated Development Environment
JMA Japan Meteorological Agency
JRA JMA ReAnalysis
LAN Local Area Network
LLJ Low Level Jet
MERRA Modern Era Retrospective-Analysis for Research and
Applications
MPI Max-Planck-Institute
MPI-BGC MPI for Biogeochemistry
MSL Mean Sea Level
MSLP Mean Sea Level Pressure
NASA National Aeronautics and Space Administration