
xCDAT Documentation

Release 0.6.0

Tom Vo

Oct 10, 2023


FOR USERS

1 Project Motivation

2 Getting Started

3 Community

4 Contributing

5 Features

6 Things We Are Striving For

7 Releases

8 Useful Resources

9 Acknowledgement

10 License
    10.1 Getting Started
    10.2 xCDAT on Jupyter and HPC Machines
    10.3 Gallery
    10.4 Presentations and Demos
    10.5 API Reference
    10.6 History
    10.7 Frequently Asked Questions
    10.8 xCDAT Community Code of Conduct
    10.9 Contributing
    10.10 Project Maintenance
    10.11 The Team

Index


xCDAT is an extension of xarray for climate data analysis on structured grids. It serves as a modern successor to the
Community Data Analysis Tools (CDAT) library.
Useful links: Documentation | Code Repository | Issues | Discussions | Releases | Mailing List

CHAPTER

ONE

PROJECT MOTIVATION

The goal of xCDAT is to provide generalizable features and utilities for simple and robust analysis of climate data.
xCDAT’s design philosophy is focused on reducing the overhead required to accomplish certain tasks in xarray. xCDAT
aims to be compatible with structured grids that are CF-compliant (e.g., CMIP6). Some key xCDAT features are
inspired by or ported from the core CDAT library, while others leverage powerful libraries in the xarray ecosystem
(e.g., xESMF, xgcm, cf_xarray) to deliver robust APIs.
The xCDAT core team’s mission is to provide a maintainable and extensible package that serves the needs of the climate
community in the long-term. We are excited to be working on this project and hope to have you onboard!



CHAPTER

TWO

GETTING STARTED

The best resource for getting started is the xCDAT documentation website. Our documentation provides general guidance for setting up xCDAT in an Anaconda environment on your local computer or on an HPC/Jupyter environment. We also include an API Overview and Gallery to highlight xCDAT functionality.



CHAPTER

THREE

COMMUNITY

xCDAT is a community-driven open source project. We encourage discussion on topics such as version releases, feature
suggestions, and architecture design on the GitHub Discussions page.
Subscribe to our mailing list for news and announcements related to xCDAT, such as software version releases or future
roadmap plans.
Please note that xCDAT has a Code of Conduct. By participating in the xCDAT community, you agree to abide by its
rules.

CHAPTER

FOUR

CONTRIBUTING

We welcome and appreciate contributions to xCDAT. Users and contributors can view and open issues on our GitHub
Issue Tracker.
For more instructions on how to contribute, please check out our Contributing Guide.

CHAPTER

FIVE

FEATURES

• Extension of xarray’s open_dataset() and open_mfdataset() with post-processing options
  – Generate bounds for axes supported by xcdat if they don’t exist in the Dataset
  – Optional selection of a single data variable to keep in the Dataset (bounds are also kept if they exist)
  – Optional decoding of time coordinates
    ∗ In addition to CF time units, also decodes common non-CF time units (“months since ...”, “years since ...”)
  – Optional centering of time coordinates using time bounds
  – Optional conversion of longitudinal axis orientation between [0, 360) and [-180, 180)
• Temporal averaging
– Time series averages (single snapshot and grouped), climatologies, and departures
– Weighted or unweighted
– Optional seasonal configuration (e.g., DJF vs. JFD, custom seasons)
• Geospatial weighted averaging
– Supports rectilinear grids
– Optional specification of regional domain
• Horizontal structured regridding
– Supports rectilinear and curvilinear grids
– Extends the xESMF horizontal regridding API
– Python implementation of regrid2 for handling Cartesian latitude-longitude grids
• Vertical structured regridding
– Supports rectilinear and curvilinear grids
– Extends the xgcm vertical regridding API
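The longitude-orientation conversion listed above boils down to simple modular arithmetic. A standalone plain-Python sketch of the idea (not xCDAT's implementation, which operates on whole xarray datasets and their bounds):

```python
def to_minus180_180(lon):
    """Map a longitude in degrees from [0, 360) to [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


def to_0_360(lon):
    """Map a longitude in degrees from [-180, 180) to [0, 360)."""
    return lon % 360.0


print(to_minus180_180(270.0))  # -90.0
print(to_0_360(-90.0))  # 270.0
```

xCDAT applies this kind of conversion to the longitude coordinates (and their bounds) when requested, rather than one scalar at a time.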

CHAPTER

SIX

THINGS WE ARE STRIVING FOR

• xCDAT supports CF compliant datasets, but will also strive to support datasets with common non-CF compliant metadata (e.g., time units in “months since ...” or “years since ...”)
– xCDAT leverages cf_xarray to interpret CF attributes on xarray objects
– Refer to CF Convention for more information on CF attributes
• Robust handling of dimensions and their coordinates and coordinate bounds
– Coordinate variables are retrieved with cf_xarray using CF axis names or coordinate names found in
xarray object attributes. Refer to Metadata Interpretation for more information.
– Bounds are retrieved with cf_xarray using the "bounds" attribute
– Ability to operate on both longitudinal axis orientations, [0, 360) and [-180, 180)
• Support for parallelism using dask where it is both possible and makes sense
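The non-CF time decoding mentioned above (“months since ...”) cannot use fixed-length offsets, because calendar months vary in length; it amounts to whole-month arithmetic. A simplified standard-library sketch of the idea (xCDAT's actual decoding also handles calendars, fractional offsets, and cftime objects):

```python
from datetime import datetime


def decode_months_since(offsets, reference):
    """Decode integer "months since <reference>" offsets by stepping
    whole calendar months from the reference date (simplified sketch)."""
    decoded = []
    for n in offsets:
        month_index = (reference.month - 1) + n
        year = reference.year + month_index // 12
        month = month_index % 12 + 1
        decoded.append(datetime(year, month, reference.day))
    return decoded


ref = datetime(2000, 1, 1)
print(decode_months_since([0, 1, 13], ref))
# three decoded timestamps: Jan 2000, Feb 2000, Feb 2001
```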



CHAPTER

SEVEN

RELEASES

xCDAT (released as xcdat) follows a feedback-driven release cycle using continuous integration/continuous deployment. Software releases are performed based on the bandwidth of the development team, the needs of the community, and the priority of bug fixes or feature updates.
After releases are performed on GitHub Releases, the corresponding xcdat package version will be available to download through Anaconda conda-forge, usually within a day.
Subscribe to our mailing list to stay notified of new releases.

CHAPTER

EIGHT

USEFUL RESOURCES

We highly encourage you to check out the awesome resources below to learn more about Xarray and Xarray usage in climate science!
• Official Xarray Tutorials
• Xarray GitHub Discussion Forum
• Pangeo Forum
• Project Pythia



CHAPTER

NINE

ACKNOWLEDGEMENT

Huge thank you to all of the xCDAT contributors!


xCDAT is jointly developed by scientists and developers from the Energy Exascale Earth System Model (E3SM)
Project and Program for Climate Model Diagnosis and Intercomparison (PCMDI). The work is performed for the
E3SM project, which is sponsored by Earth System Model Development (ESMD) program, and the Simplifying ESM
Analysis Through Standards (SEATS) project, which is sponsored by the Regional and Global Model Analysis (RGMA)
program. ESMD and RGMA are programs for the Earth and Environmental Systems Sciences Division (EESSD) in
the Office of Biological and Environmental Research (BER) within the Department of Energy’s Office of Science.

CHAPTER

TEN

LICENSE

xCDAT is licensed under the terms of the Apache License (Version 2.0 with LLVM exception).
All new contributions must be made under the Apache-2.0 with LLVM exception license.
See LICENSE and NOTICE for details.
SPDX-License-Identifier: Apache-2.0
LLNL-CODE-846944

10.1 Getting Started

10.1.1 Prerequisites

1. Familiarity with xarray, since this package is an extension of it


• We highly recommend visiting the xarray tutorial and xarray documentation pages if you aren’t familiar
with xarray.
2. xCDAT is distributed through the conda-forge channel of Anaconda. We recommend using Mamba (via Miniforge), a drop-in replacement for Conda that is faster and more reliable. Miniforge ships with conda-forge set as the prioritized channel. Mamba also uses the same commands and configurations as Conda, and you can swap commands between both tools.
Follow these steps to install Miniforge (Mac OS & Linux):

curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"

bash Miniforge3-$(uname)-$(uname -m).sh

Then follow the instructions for installation. We recommend you type yes in response to "Do you wish the installer to initialize Miniforge by running conda init?" to add conda and mamba to your path. Note that this will modify your shell profile (e.g., ~/.bashrc).
Note: After installation completes, you may need to type ``bash`` to restart your shell (if you use bash). Alternatively, you can log out and log back in.


10.1.2 Installation

1. Create a Mamba environment from scratch with xcdat (mamba create)


We recommend using the Mamba environment creation procedure to install xcdat. The advantage of following this approach is that Mamba will attempt to resolve dependencies (e.g., python >= 3.8) for compatibility.
To create an xcdat Mamba environment with xesmf (a recommended dependency), run:

>>> mamba create -n <ENV_NAME> -c conda-forge xcdat xesmf


>>> mamba activate <ENV_NAME>

Note that xesmf is an optional dependency, which is required for using xesmf based horizontal regridding APIs
in xcdat. xesmf is not currently supported on Windows because it depends on esmpy, which also does not
support Windows. Windows users can try WSL2 as a workaround.
2. Install xcdat in an existing Mamba environment (mamba install)
You can also install xcdat in an existing Mamba environment, granted that Mamba is able to resolve the compatible dependencies.

>>> mamba activate <ENV_NAME>


>>> mamba install -c conda-forge xcdat xesmf

Note: As above, xesmf is an optional dependency.


3. [Optional] Some packages that are commonly used with xcdat can be installed either in step 1 or step 2 above:
• jupyterlab: a web-based interactive development environment for notebooks, code, and data. This pack-
age also includes ipykernel.
• matplotlib: a library for creating visualizations in Python.
• cartopy: an add-on package for matplotlib and specialized for geospatial data processing.

10.1.3 Updating

New versions of xcdat will be released periodically. We recommend you use the latest stable version of xcdat for the
latest features and bug fixes.

>>> mamba activate <ENV_NAME>


>>> mamba update xcdat

To update to a specific version of xcdat:

>>> mamba activate <ENV_NAME>


>>> mamba update xcdat=<MAJOR.MINOR.PATCH>


10.2 xCDAT on Jupyter and HPC Machines

xCDAT should be compatible with most high performance computing (HPC) platforms. In general, xCDAT is available
on Anaconda via the conda-forge channel. xCDAT follows the same convention as other conda-based packages by
being installable via conda. The conda installation instructions in this guide are based on the instructions provided by
NERSC.
Setup can vary depending on the exact HPC environment you are working in, so please consult your HPC documentation and/or HPC support resources. Some HPC environments might have security settings that restrict user-managed conda installations and environments.

10.2.1 Setting up your xCDAT environment

Ensure conda is installed

Generally, the instructions from the getting started guide can also be followed for HPC machines. This guide covers
installing Miniconda3 and creating a conda environment with the xcdat package.
Before installing Miniconda3, you should consult your HPC documentation to see if conda is already available; in
some cases, python and conda may be pre-installed on an HPC machine. You can check to see whether they are
available by entering which conda and/or which python in the command line (which will return their path if they
are available).
In other cases, python and conda are available via modules on an HPC machine. For example, some machines make
both available via:

module load python

Once conda is active, you can create and activate a new xcdat environment with xesmf (a recommended dependency):

conda create -n <ENV_NAME> -c conda-forge xcdat xesmf


conda activate <ENV_NAME>

Note that xesmf is an optional dependency, which is required for using xesmf based horizontal regridding APIs in
xcdat. xesmf is not currently supported on osx-arm64 or windows because esmpy is not yet available on these
platforms. Windows users can try WSL2 as a workaround.
You may also want to use xcdat with some additional packages. For example, you can install xcdat with matplotlib,
ipython, and ipykernel (see the next section for more about ipykernel):

conda create -n <ENV_NAME> -c conda-forge xcdat xesmf matplotlib ipython ipykernel


conda activate <ENV_NAME>

The advantage of following this approach is that conda will attempt to resolve dependencies (e.g., python >= 3.8) for compatibility.
If you prefer, you can also add packages later with conda install (granted that conda is able to resolve the compatible
dependencies).


10.2.2 Adding an xcdat kernel for use with Jupyter

HPC systems frequently include a web interface to Jupyter, a popular web application used to perform analyses in Python. In order to use xcdat with Jupyter, you will need to create a kernel in your xcdat conda environment using ipykernel. These instructions follow those from NERSC, but setup can vary depending on the exact HPC environment you are working in, so please consult your HPC documentation. If you have not already installed ipykernel, you can install it in your xcdat environment (created above) with:

conda activate <ENV_NAME>


conda install -c conda-forge ipykernel

Once ipykernel is added to your xcdat environment, you can create an xcdat kernel with:

python -m ipykernel install --user --name <ENV_NAME> --display-name <ENV_NAME>

After the kernel is installed, log in to the Jupyter instance on your HPC. Your xcdat kernel may be available on the home launch page (to open a new notebook or command line instance). This launcher is sometimes accessed by clicking the blue plus symbol. Alternatively, you may need to open a new Notebook, click “Kernel” on the top bar, then click “Change Kernel...” and select your xcdat kernel. You should then be able to use your xcdat environment on Jupyter.

10.3 Gallery

This gallery demonstrates how to use some of the features in xcdat. Contributions are highly welcomed and appreciated. Please check out the Contributing Guide.


10.3.1 A Gentle Introduction to xCDAT (Xarray Climate Data Analysis Tools)

“A Python package for simple and robust climate data analysis.”


Core Developers: Tom Vo, Stephen Po-Chedley, Jason Boutte, Jill Zhang, Jiwoo Lee
With thanks to Peter Gleckler, Paul Durack, Karl Taylor, and Chris Golaz

This work is performed under the auspices of the U.S. DOE by Lawrence Livermore National Laboratory under contract No. DE-AC52-07NA27344.

Notebook Setup

Create an Anaconda environment for this notebook using the command below:

conda create -n xcdat -c conda-forge xarray xcdat xesmf matplotlib nc-time-axis jupyter

• xesmf is required for horizontal regridding with xESMF


• matplotlib is an optional dependency required for plotting with xarray
• nc-time-axis is an optional dependency required for matplotlib to plot cftime coordinates

Presentation Overview

Intended audience: Some or no familiarity with xarray and/or xcdat


1. Driving force behind xCDAT
2. Goals and milestones of CDAT’s successor
3. Introducing xCDAT
4. Understanding the basics of Xarray
5. How xCDAT extends Xarray for climate data analysis
6. Technical design philosophy and APIs
7. Demo of capabilities


8. How to get involved

The Driving Force Behind xCDAT

• The CDAT (Community Data Analysis Tools) library has provided a suite of robust and comprehensive open-source climate data analysis and visualization packages for over 20 years
• A driving need for a modern successor
– Focus on a maintainable and extensible library
– Serve the needs of the climate community in the long-term

Goals and Milestones for CDAT’s Successor

1. Offer similar core capabilities
   1. For example, geospatial averaging, temporal averaging, and regridding
2. Use modern technologies in the library’s stack
   1. Support parallelism and lazy operations
3. Be maintainable, extensible, and easy-to-use
   1. Follow Python Enhancement Proposals (PEPs)
   2. Automate DevOps processes (unit testing, code coverage)
   3. Actively maintain documentation
4. Cultivate an open-source community that can sustain the project
   1. Encourage GitHub contributions
   2. Community engagement efforts (e.g., Pangeo, ESGF)

Introducing xCDAT

• xCDAT is an extension of xarray for climate data analysis on structured grids


• Goal of providing features and utilities for simple and robust analysis of climate data
• Jointly developed by scientists and developers from:
– E3SM Project (Energy Exascale Earth System Model Project)
– PCMDI (Program for Climate Model Diagnosis and Intercomparison)
– SEATS Project (Simplifying ESM Analysis Through Standards Project)
– Users around the world via GitHub


Before We Dive Deeper, Let’s Talk About Xarray

• Xarray is an evolution of an internal tool developed at The Climate Corporation


• Released as open source in May 2014
• NumFOCUS fiscally sponsored project since August 2018


Key Features and Capabilities in Xarray

• “N-D labeled arrays and datasets in Python”


– Built upon and extends NumPy and pandas
• Interoperable with scientific Python ecosystem including NumPy, Dask, Pandas, and Matplotlib
• Supports file I/O, indexing and selecting, interpolating, grouping, aggregating, parallelism (Dask), and plotting (matplotlib wrapper)
– Supported formats include: netCDF, Iris, OPeNDAP, Zarr, and GRIB
Source: https://xarray.dev/#features

Why use Xarray?

“Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like multidimensional arrays, which allows for a more intuitive, more concise, and less error-prone developer experience.”
—https://xarray.pydata.org/en/v2022.10.0/getting-started-guide/why-xarray.html
• Apply operations over dimensions by name
– x.sum('time')
• Select values by label (or logical location) instead of integer location


– x.loc['2014-01-01'] or x.sel(time='2014-01-01')
• Mathematical operations vectorize across multiple dimensions (array broadcasting) based on dimension
names, not shape
– x - y
• Easily use the split-apply-combine paradigm with groupby
– x.groupby('time.dayofyear').mean()
• Database-like alignment based on coordinate labels that smoothly handles missing values
– x, y = xr.align(x, y, join='outer')
• Keep track of arbitrary metadata in the form of a Python dictionary
– x.attrs
Source: https://docs.xarray.dev/en/v2022.10.0/getting-started-guide/why-xarray.html#what-labels-enable
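The labeled operations above can be tried on a tiny hand-built DataArray. A minimal sketch (the array contents and coordinate labels are illustrative only):

```python
import numpy as np
import xarray as xr

# 3 time steps x 2 locations, with labels on both dimensions.
x = xr.DataArray(
    np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]),
    dims=["time", "loc"],
    coords={"time": ["2014-01-01", "2014-01-02", "2014-01-03"], "loc": ["a", "b"]},
)

# Reduce over a dimension by name instead of by axis number.
print(x.sum("time").values)  # [ 9. 12.]

# Select by coordinate label instead of integer position.
print(x.sel(time="2014-01-01").values)  # [1. 2.]
```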

The Xarray Data Models

“Xarray data models are borrowed from netCDF file format, which provides xarray with a natural and
portable serialization format.”
—https://docs.xarray.dev/en/v2022.10.0/getting-started-guide/why-xarray.html
1. ``xarray.Dataset``
• A dictionary-like container of DataArray objects with aligned dimensions
– DataArray objects are classified as “coordinate variables” or “data variables”
– All data variables have a shared union of coordinates
• Serves a similar purpose to a pandas.DataFrame

2. ``xarray.DataArray``
• A class that attaches dimension names, coordinates, and attributes to multi-dimensional arrays (aka
“labeled arrays”)
• An N-D generalization of a pandas.Series

Exploring the Xarray Data Models

Example dataset: tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc


• Open up this real dataset from ESGF using xarray’s OPeNDAP support.
– Contains the tas variable (near-surface air temperature) recorded on a monthly frequency
• It is not downloaded until calculations/computations are performed on the Dataset object
– Example of an xarray lazy operation


[1]: # This style import is necessary to properly render Xarray's HTML output with
# the Jupyter RISE extension.
# GitHub Issue: https://github.com/damianavila/RISE/issues/594
# Source: https://github.com/smartass101/xarray-pydata-prague-2020/blob/main/rise.css

from IPython.core.display import HTML

style = """
<style>
.reveal pre.xr-text-repr-fallback {
display: none;
}
.reveal ul.xr-sections {
display: grid
}

.reveal ul ul.xr-var-list {
display: contents
}
</style>
"""

HTML(style)
[1]: <IPython.core.display.HTML object>

[2]: import xarray as xr

filepath = "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"

ds = xr.open_dataset(filepath)

The Dataset Model

[3]: ds
[3]: <xarray.Dataset>
Dimensions: (time: 1980, bnds: 2, lat: 145, lon: 192)
Coordinates:
* time (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
height float64 ...
Dimensions without coordinates: bnds
Data variables:
time_bnds (time, bnds) datetime64[ns] ...
lat_bnds (lat, bnds) float64 ...
lon_bnds (lon, bnds) float64 ...
tas (time, lat, lon) float32 ...

Attributes: (12/48)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 87658.0
creation_date: 2020-06-05T04:06:11Z
... ...
variant_label: r10i1p1f1
version: v20200605
license: CMIP6 model data produced by CSIRO is li...
cmor_version: 3.4.0
tracking_id: hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
DODS_EXTRA.Unlimited_Dimension: time

A dictionary-like container of labeled arrays (DataArray objects) with aligned dimensions.


Key properties:
• dims: a dictionary mapping from dimension names to the fixed length of each dimension (e.g., {'x': 6, 'y': 6, 'time': 8})
• coords: another dict-like container of DataArrays intended to label points used in data_vars (e.g., arrays of
numbers, datetime objects or strings)
• data_vars: a dict-like container of DataArrays corresponding to variables
• attrs: dict to hold arbitrary metadata
Source: https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataset

The DataArray Model

A class that attaches dimension names, coordinates, and attributes to multi-dimensional arrays (aka
“labeled arrays”)

Key properties:
• values: a numpy.ndarray holding the array’s values
• dims: dimension names for each axis (e.g., ('x', 'y', 'z'))
• coords: a dict-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of numbers,
datetime objects or strings)
• attrs: dict to hold arbitrary metadata (attributes)
Source: https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataarray

[4]: ds.tas
[4]: <xarray.DataArray 'tas' (time: 1980, lat: 145, lon: 192)>
[55123200 values with dtype=float32]
Coordinates:
* time (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 ...
Attributes:
standard_name: air_temperature
long_name: Near-Surface Air Temperature
comment: near-surface (usually, 2 meter) air temperature
units: K
cell_methods: area: time: mean
cell_measures: area: areacella
history: 2020-06-05T04:06:10Z altered by CMOR: Treated scalar dime...
_ChunkSizes: [ 1 145 192]

Resources for Learning Xarray

• Here are some highly recommended resources:


– Xarray Tutorial
– “Xarray in 45 minutes”
– Xarray Documentation
– Xarray API Reference

xCDAT Extends Xarray for Climate Data Analysis

• Some key xCDAT features are inspired by or ported from the core CDAT library
– e.g., spatial averaging, temporal averaging, regrid2 for horizontal regridding
• Other features leverage powerful libraries in the xarray ecosystem
– xESMF for horizontal regridding
– xgcm for vertical interpolation
– cf_xarray for CF convention metadata interpretation
• xCDAT strives to support datasets with CF-compliant and common non-CF-compliant metadata (e.g., time units in “months since ...” or “years since ...”)
• Inherent support for lazy operations and parallelism through xarray + dask


The Technical Design Philosophy

• Streamline the user experience of developing code to analyze climate data


• Reduce the complexity and overhead for implementing certain features with xarray (e.g., temporal averaging,
spatial averaging)
• Encourage reusable functionalities through a single library

Leveraging the APIs

xCDAT provides public APIs in two ways:


1. Top-level API functions
• e.g., xcdat.open_dataset(), xcdat.center_times()
• Usually for opening datasets and performing dataset level operations
2. Accessor classes
• xcdat provides Dataset accessors, which are implicit namespaces for custom functionality.
• Accessor namespaces clearly identify separation from built-in xarray methods.
• Operate on variables within the xr.Dataset
• e.g., ds.spatial, ds.temporal, ds.regridder

xcdat spatial functionality is exposed by chaining the .spatial accessor attribute to the xr.Dataset object.
Source: https://xcdat.readthedocs.io/en/latest/api.html
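Accessors like ds.spatial are built on xarray's public extension mechanism, xr.register_dataset_accessor. A toy accessor (the name "demo" and its method are hypothetical, purely to show the pattern):

```python
import numpy as np
import xarray as xr


@xr.register_dataset_accessor("demo")  # hypothetical accessor name
class DemoAccessor:
    """A toy accessor namespace, analogous in spirit to ds.spatial or ds.temporal."""

    def __init__(self, ds):
        self._ds = ds

    def double(self, var):
        """Return a copy of the dataset with one data variable doubled."""
        out = self._ds.copy()
        out[var] = out[var] * 2
        return out


ds = xr.Dataset({"tas": ("x", np.array([1.0, 2.0]))})
print(ds.demo.double("tas")["tas"].values)  # [2. 4.]
```

Registering the class once makes the namespace available on every xr.Dataset, which is how xcdat exposes its functionality without subclassing xarray objects.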

Key Features in xCDAT

• Extend xr.open_dataset() and xr.open_mfdataset()
  – APIs: open_dataset(), open_mfdataset()
  – Bounds generation
  – Time decoding (CF and select non-CF time units)
  – Centering of time coordinates
  – Conversion of longitudinal axis orientation
• Temporal averaging
  – APIs: ds.temporal.average(), ds.temporal.group_average(), ds.temporal.climatology(), ds.temporal.departures()
  – Single snapshot and group average
  – Climatology and departure
  – Weighted or unweighted
  – Optional seasonal configuration (e.g., custom seasons)
• Geospatial averaging
  – API: ds.spatial.average()
  – Rectilinear grids
  – Weighted
  – Optional specification of regional domain
• Horizontal regridding
  – API: ds.regridder.horizontal()
  – Rectilinear and curvilinear grids
  – Extends xESMF horizontal regridding
  – Python implementation of regrid2
• Vertical regridding
  – API: ds.regridder.vertical()
  – Transform vertical coordinates
  – Extends xgcm vertical interpolation
  – Linear, logarithmic, and conservative interpolation
  – Decode parametric vertical coordinates if required

A Demo of xCDAT Capabilities

• Prerequisites
– Installing xcdat
– Import xcdat
– Open a dataset and apply postprocessing operations
• Scenario 1 - Calculate the spatial averages over the tropical region
• Scenario 2 - Calculate the annual anomalies
• Scenario 3 - Horizontal regridding (bilinear, gaussian grid)


Installing xcdat

xCDAT is available on Anaconda under the conda-forge channel (https://anaconda.org/conda-forge/xcdat)


Two ways to install xcdat with recommended dependencies (xesmf):
1. Create a conda environment from scratch (conda create)

conda create -n <ENV_NAME> -c conda-forge xcdat xesmf


conda activate <ENV_NAME>

2. Install xcdat in an existing conda environment (conda install)

conda activate <ENV_NAME>


conda install -c conda-forge xcdat xesmf

3. If you’re working on HPC, we have a guide for that too!


• https://xcdat.readthedocs.io/en/latest/getting-started-hpc-jupyter.html
Source: https://xcdat.readthedocs.io/en/latest/getting-started.html

Opening a dataset

Example dataset: tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc (same as before)

[5]: # This gives access to all xcdat public top-level APIs and accessor classes.
import xcdat as xc

# We import these packages specifically for plotting. It is not required to use xcdat.
import matplotlib.pyplot as plt
import pandas as pd

filepath = "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"

ds = xc.open_dataset(
filepath,
add_bounds=True,
decode_times=True,
center_times=True
)

# Unit adjustment from Kelvin to Celsius.


ds["tas"] = ds.tas - 273.15
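The center_times=True option above replaces each time coordinate with the midpoint of its time bounds. The idea in plain Python (a standalone sketch, not xCDAT's implementation):

```python
from datetime import datetime


def center_time(lower, upper):
    """Midpoint of a [lower, upper) time-bounds pair."""
    return lower + (upper - lower) / 2


# January 1850 bounds -> a centered coordinate of 1850-01-16 12:00:00,
# matching the time values shown in the dataset repr for this file.
print(center_time(datetime(1850, 1, 1), datetime(1850, 2, 1)))
```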

[6]: ds
[6]: <xarray.Dataset>
Dimensions: (time: 1980, bnds: 2, lat: 145, lon: 192)
Coordinates:
* time (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
height float64 2.0
Dimensions without coordinates: bnds
Data variables:
time_bnds (time, bnds) object ...
lat_bnds (lat, bnds) float64 ...
lon_bnds (lon, bnds) float64 ...
tas (time, lat, lon) float32 -27.19 -27.19 -27.19 ... -25.29 -25.29
Attributes: (12/48)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 87658.0
creation_date: 2020-06-05T04:06:11Z
... ...
variant_label: r10i1p1f1
version: v20200605
license: CMIP6 model data produced by CSIRO is li...
cmor_version: 3.4.0
tracking_id: hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
DODS_EXTRA.Unlimited_Dimension: time

Scenario 1: Spatial Averaging

Related accessor: ds.spatial


In this example, we calculate the spatial average of tas over the tropical region and plot the first 100 time steps.

[7]: ds_trop_avg = ds.spatial.average("tas", axis=["X","Y"], lat_bounds=(-25,25))


ds_trop_avg.tas
[7]: <xarray.DataArray 'tas' (time: 1980)>
array([25.24722608, 25.61795924, 25.96516235, ..., 26.79536823,
26.67771602, 26.27182383])
Coordinates:
* time (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
height float64 2.0

[8]: ds_trop_avg.tas.isel(time=slice(1, 100)).plot()


[8]: [<matplotlib.lines.Line2D at 0x14bdf2170>]


Scenario 2: Calculate temporal average

Related accessor: ds.temporal


In this example, we calculate the temporal average of tas as a single snapshot (the time dimension is collapsed).

[9]: ds_avg = ds.temporal.average("tas", weighted=True)


ds_avg.tas
[9]: <xarray.DataArray 'tas' (lat: 145, lon: 192)>
array([[-48.01481628, -48.01481628, -48.01481628, ..., -48.01481628,
-48.01481628, -48.01481628],
[-44.94085363, -44.97948214, -45.01815398, ..., -44.82408252,
-44.86273067, -44.9009281 ],
[-44.11875274, -44.23060624, -44.33960158, ..., -43.76766492,
-43.88593717, -44.00303006],
...,
[-18.21076615, -18.17513373, -18.13957458, ..., -18.32720478,
-18.28428828, -18.2486193 ],
[-18.50778243, -18.49301854, -18.47902819, ..., -18.55410851,
-18.5406963 , -18.52413098],
[-19.07366375, -19.07366375, -19.07366375, ..., -19.07366375,
-19.07366375, -19.07366375]])
Coordinates:
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 2.0
Attributes:
operation: temporal_avg
mode: average
freq: month
weighted: True

[10]: ds_avg.tas.plot(label="weighted")
[10]: <matplotlib.collections.QuadMesh at 0x10cc641f0>


Scenario 3: Horizontal Regridding

Related accessor: ds.regridder


In this example, we will generate a gaussian grid with 32 latitudes to regrid our input data to.

Create the output grid

[11]: output_grid = xc.create_gaussian_grid(32)


output_grid
[11]: <xarray.Dataset>
Dimensions: (lat: 32, bnds: 2, lon: 65)
Coordinates:
* lat (lat) float64 85.76 80.27 74.74 69.21 ... -74.74 -80.27 -85.76
* lon (lon) float64 0.0 5.625 11.25 16.88 ... 343.1 348.8 354.4 360.0
Dimensions without coordinates: bnds
Data variables:
lat_bnds (lat, bnds) float64 90.0 83.21 83.21 77.61 ... -83.21 -83.21 -90.0
lon_bnds (lon, bnds) float64 -2.812 2.812 2.812 8.438 ... 357.2 357.2 362.8

Plot the Input vs. Output Grid

[12]: fig, axes = plt.subplots(ncols=2, figsize=(16, 6))

input_grid = ds.regridder.grid
input_grid.plot.scatter(x='lon', y='lat', s=5, ax=axes[0], add_colorbar=False, cmap=plt.cm.RdBu)
axes[0].set_title('Input Grid')

output_grid.plot.scatter(x='lon', y='lat', s=5, ax=axes[1], add_colorbar=False, cmap=plt.cm.RdBu)
axes[1].set_title('Output Grid')

plt.tight_layout()


Regrid the data

xCDAT offers horizontal regridding with xESMF (default) and a Python port of regrid2. We will be using xESMF to
regrid.

[13]: # xesmf supports "bilinear", "conservative", "nearest_s2d", "nearest_d2s", and "patch"
output = ds.regridder.horizontal('tas', output_grid, tool='xesmf', method='bilinear')

[14]: fig, axes = plt.subplots(ncols=2, figsize=(16, 4))

ds.tas.isel(time=0).plot(ax=axes[0])
axes[0].set_title('Input data')

output.tas.isel(time=0).plot(ax=axes[1])
axes[1].set_title('Output data')

plt.tight_layout()

Parallelism with Dask

Nearly all existing xarray methods have been extended to work automatically with Dask arrays for parallelism
—https://docs.xarray.dev/en/stable/user-guide/dask.html#using-dask-with-xarray
• Parallelized xarray methods include indexing, computation, concatenating and grouped operations
• xCDAT APIs that build upon xarray methods inherently support Dask parallelism
– Dask arrays are loaded into memory only when absolutely required (e.g., decoding time, handling bounds)

High-level Overview of Dask Mechanics

• Dask divides arrays into many small pieces, called “chunks” (each presumed to be small enough to fit into
memory)
• Dask array operations are lazy
– Operations queue up a series of tasks mapped over blocks
– No computation is performed until values need to be computed (lazy)
– Data is loaded into memory and computation is performed in streaming fashion, block-by-block


• Computation is controlled by multi-processing or thread pool


Source: https://docs.xarray.dev/en/stable/user-guide/dask.html
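The lazy, chunked model described above can be sketched in plain Python. This is a toy illustration only — the `LazySum` class and its names are invented for this sketch and are not Dask's API:

```python
# Toy model of lazy, chunked computation (illustrative only, not Dask).
class LazySum:
    def __init__(self, data, chunk_size):
        # Divide the "array" into small chunks, each presumed to fit in memory.
        self.chunks = [
            data[i : i + chunk_size] for i in range(0, len(data), chunk_size)
        ]

    def compute(self):
        # Stream over chunks block-by-block, combining partial results;
        # no work happens until this method is called.
        return sum(sum(chunk) for chunk in self.chunks)

lazy = LazySum(list(range(1_000_000)), chunk_size=100_000)  # nothing computed yet
total = lazy.compute()  # computation happens here, one chunk at a time
```

Dask applies the same idea to NumPy-like arrays: it builds a task graph over blocks and executes it with a thread or process pool when .compute() is called.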

How do I activate Dask with Xarray/xCDAT?

• The usual way to create a Dataset filled with Dask arrays is to load the data from a netCDF file or files
• You can do this by supplying a chunks argument to open_dataset() or using the open_mfdataset() function
– By default, open_mfdataset() will chunk each netCDF file into a single Dask array
– Supply the chunks argument to control the size of the resulting Dask arrays
– Xarray maintains a Dask array until it is not possible (raises an exception instead of loading into memory)
Source: https://docs.xarray.dev/en/stable/user-guide/dask.html#reading-and-writing-data

[15]: filepath = "http://esgf.nci.org.au/thredds/dodsC/master/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"

# Supplying a `chunks` argument activates Dask arrays.
# NOTE: `open_mfdataset()` automatically chunks by the number of files, which
# might not be optimal.
ds = xc.open_dataset(
    filepath,
    chunks={"time": "auto"}
)

[16]: ds
[16]: <xarray.Dataset>
Dimensions: (time: 1980, bnds: 2, lat: 145, lon: 192)
Coordinates:
* time (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
height float64 ...
Dimensions without coordinates: bnds
Data variables:
time_bnds (time, bnds) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
lat_bnds (lat, bnds) float64 dask.array<chunksize=(145, 2), meta=np.ndarray>
lon_bnds (lon, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
tas (time, lat, lon) float32 dask.array<chunksize=(1205, 145, 192), meta=np.
˓→ndarray>

Attributes: (12/49)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 87658.0
creation_date: 2020-06-05T04:06:11Z
... ...
version: v20200605
license: CMIP6 model data produced by CSIRO is li...
cmor_version: 3.4.0
_NCProperties: version=2,netcdf=4.6.2,hdf5=1.10.5
tracking_id: hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
DODS_EXTRA.Unlimited_Dimension: time

Example of Parallelism in xCDAT’s Spatial Averager

This is a demonstration that chunked Dataset objects work with xCDAT APIs.
• The generation of weights is serial
• The weighted average operation should be parallelized (it uses xarray's .weighted() API)
• We intend to do performance benchmarking and give guidance on when to chunk
• For now, visit https://github.com/xCDAT/xcdat/discussions/376 for best practices

[17]: tas_global = ds.spatial.average("tas", axis=["X", "Y"], weights="generate")["tas"]


tas_global
[17]: <xarray.DataArray 'tas' (time: 1980)>
dask.array<truediv, shape=(1980,), dtype=float64, chunksize=(1205,), chunktype=numpy.
˓→ndarray>

Coordinates:
* time (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
height float64 ...
Attributes:
standard_name: air_temperature
long_name: Near-Surface Air Temperature
comment: near-surface (usually, 2 meter) air temperature
units: K
cell_methods: area: time: mean
cell_measures: area: areacella
history: 2020-06-05T04:06:10Z altered by CMOR: Treated scalar dime...
_ChunkSizes: [ 1 145 192]

Further Dask Guidance

Visit these pages for more guidance (e.g., when to parallelize):


• Parallel computing with Dask: https://docs.xarray.dev/en/stable/user-guide/dask.html
• Xarray with Dask Arrays: https://examples.dask.org/xarray.html

Key Takeaways

• A driving need for a modern successor to CDAT


• Serves the climate community in the long-term
• xCDAT is an extension of xarray for climate data analysis on structured grids
• Goal of providing features and utilities for simple and robust analysis of climate data


Get Involved on GitHub!

• Code contributions are welcome and appreciated


– GitHub Repository: https://github.com/xCDAT/xcdat
– Contributing Guide: https://xcdat.readthedocs.io/en/latest/contributing.html
• Submit and/or address tickets for feature suggestions, bugs, and documentation updates
– GitHub Issues: https://github.com/xCDAT/xcdat/issues
• Participate in forum discussions on version releases, architecture, feature suggestions, etc.
– GitHub Discussions: https://github.com/xCDAT/xcdat/discussions


10.3.2 General Dataset Utilities

Authors:
• Tom Vo
• Stephen Po-Chedley
Date: 05/26/22

Overview

This notebook demonstrates the use of general utility methods available in xcdat, including the reorientation of the
longitude axis, centering of time coordinates using time bounds, and adding and getting bounds.

[1]: import xcdat

Open a dataset

Datasets can be opened and read using open_dataset() or open_mfdataset() (multi-file).


Related APIs:
• xcdat.open_dataset()
• xcdat.open_mfdataset()

[2]: dataset_links = [
    "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/E3SM/1_0/amip_1850_aeroF/1deg_atm_60-30km_ocean/atmos/180x360/time-series/mon/ens2/v3/TS_187001_189412.nc",
    "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/E3SM/1_0/amip_1850_aeroF/1deg_atm_60-30km_ocean/atmos/180x360/time-series/mon/ens2/v3/TS_189501_191912.nc",
]

[3]: # NOTE: Opening a multi-file dataset results in the data variables being
# Dask arrays.
ds = xcdat.open_mfdataset(dataset_links)

[4]: ds
[4]: <xarray.Dataset>
Dimensions: (lat: 180, lon: 360, nbnd: 2, time: 600)
Coordinates:
* lat (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
* lon (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
* time (time) object 1870-02-01 00:00:00 ... 1920-01-01 00:00:00
Dimensions without coordinates: nbnd
Data variables:
lat_bnds (lat, nbnd) float64 dask.array<chunksize=(180, 2), meta=np.ndarray>
lon_bnds (lon, nbnd) float64 dask.array<chunksize=(360, 2), meta=np.ndarray>
gw (lat) float64 dask.array<chunksize=(180,), meta=np.ndarray>
time_bnds (time, nbnd) object dask.array<chunksize=(300, 2), meta=np.ndarray>
area (lat, lon) float64 dask.array<chunksize=(180, 360), meta=np.ndarray>
TS (time, lat, lon) float32 dask.array<chunksize=(300, 180, 360), meta=np.
˓→ndarray>

Attributes: (12/21)
ne: 30
np: 4
Conventions: CF-1.0
source: CAM
case: 20180622.DECKv1b_A2_1850aeroF.ne30_oEC.e...
title: UNSET
... ...
remap_script: ncremap
remap_hostname: acme1
remap_version: 4.9.6
map_file: /export/zender1/data/maps/map_ne30np4_to...
input_file: /p/user_pub/e3sm/baldwin32/workshop/amip...
DODS_EXTRA.Unlimited_Dimension: time

Reorient the longitude axis

Longitude can be represented from 0 to 360 E or from 180 W to 180 E. xcdat allows you to convert between these
axis systems.
• Related API: xcdat.swap_lon_axis()
• Alternative solution: xcdat.open_mfdataset(dataset_links, lon_orient=(-180, 180))
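The mapping between the two conventions boils down to modular arithmetic. The helper below is a hypothetical sketch of that mapping for a single value; xcdat's actual implementation also reorders the coordinates and adjusts the bounds:

```python
def to_180(lon):
    # Map a longitude in [0, 360) to the equivalent value in [-180, 180).
    return (lon + 180.0) % 360.0 - 180.0

print(to_180(359.5))  # -0.5
print(to_180(179.5))  # 179.5
```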

[5]: ds.lon
[5]: <xarray.DataArray 'lon' (lon: 360)>
array([ 0.5, 1.5, 2.5, ..., 357.5, 358.5, 359.5])
Coordinates:
* lon (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 355.5 356.5 357.5 358.5 359.5
Attributes:
long_name: Longitude of Grid Cell Centers
standard_name: longitude
units: degrees_east
axis: X
valid_min: 0.0
valid_max: 360.0
bounds: lon_bnds

[6]: ds2 = xcdat.swap_lon_axis(ds, to=(-180, 180))

[7]: ds2.lon
[7]: <xarray.DataArray 'lon' (lon: 360)>
array([-179.5, -178.5, -177.5, ..., 177.5, 178.5, 179.5])
Coordinates:
* lon (lon) float64 -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
Attributes:
long_name: Longitude of Grid Cell Centers
standard_name: longitude
units: degrees_east
axis: X
valid_min: 0.0
valid_max: 360.0
bounds: lon_bnds

Center the time coordinates

A given point of time often represents some time period (e.g., a monthly average). In this situation, data providers
sometimes record the time as the beginning, middle, or end of the period. center_times() places the time coordinate
in the center of the time interval (using time bounds to determine the center of the period).
• Related API: xcdat.center_times()
• Alternative solution: xcdat.open_mfdataset(dataset_links, center_times=True)
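Conceptually, centering places each time coordinate at the midpoint of its bounds interval. Below is a simplified sketch using the standard-library datetime (xcdat operates on cftime/numpy datetime objects instead):

```python
from datetime import datetime

def center(lower, upper):
    # Midpoint of a time interval: lower + (upper - lower) / 2.
    return lower + (upper - lower) / 2

# A January 1870 monthly bound centers on January 16 at 12:00.
print(center(datetime(1870, 1, 1), datetime(1870, 2, 1)))  # 1870-01-16 12:00:00
```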
The time bounds used for centering time coordinates:

[8]: # We access the values with .values because it is a dask array.


ds.time_bnds.values
[8]: array([[cftime.DatetimeNoLeap(1870, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1870, 2, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeNoLeap(1870, 2, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1870, 3, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeNoLeap(1870, 3, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1870, 4, 1, 0, 0, 0, 0, has_year_zero=True)],
...,
[cftime.DatetimeNoLeap(1919, 10, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1919, 11, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeNoLeap(1919, 11, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1919, 12, 1, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeNoLeap(1919, 12, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1920, 1, 1, 0, 0, 0, 0, has_year_zero=True)]],
dtype=object)

Before centering time coordinates:

[9]: ds.time
[9]: <xarray.DataArray 'time' (time: 600)>
array([cftime.DatetimeNoLeap(1870, 2, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1870, 3, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1870, 4, 1, 0, 0, 0, 0, has_year_zero=True), ...,
cftime.DatetimeNoLeap(1919, 11, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1919, 12, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1920, 1, 1, 0, 0, 0, 0, has_year_zero=True)],
dtype=object)
Coordinates:
* time (time) object 1870-02-01 00:00:00 ... 1920-01-01 00:00:00
Attributes:
long_name: time
bounds: time_bnds
cell_methods: time: mean


[10]: ds3 = xcdat.center_times(ds)

After centering time coordinates:

[11]: ds3.time
[11]: <xarray.DataArray 'time' (time: 600)>
array([cftime.DatetimeNoLeap(1870, 1, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1870, 2, 15, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1870, 3, 16, 12, 0, 0, 0, has_year_zero=True),
...,
cftime.DatetimeNoLeap(1919, 10, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1919, 11, 16, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1919, 12, 16, 12, 0, 0, 0, has_year_zero=True)],
dtype=object)
Coordinates:
* time (time) object 1870-01-16 12:00:00 ... 1919-12-16 12:00:00
Attributes:
long_name: time
bounds: time_bnds
cell_methods: time: mean

Add bounds

Bounds are critical to many xcdat operations. For example, they are used in determining the weights in spatial or
temporal averages and in regridding operations. add_bounds() will attempt to produce bounds if they do not exist in
the original dataset.
• Related API: xarray.Dataset.bounds.add_bounds()
• Alternative solution: xcdat.open_mfdataset(dataset_links, add_bounds=True)
– (Assuming the file doesn’t already have bounds for your desired axis/axes)
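The idea behind generating bounds can be sketched as a midpoint rule scaled by the width factor. This is a hypothetical illustration of the concept, not necessarily the exact algorithm xcdat implements:

```python
def make_bounds(coords, width=0.5):
    # Place each bound `width` times the distance to the neighboring
    # coordinate; edge cells reuse the spacing of their single neighbor.
    # (Assumes at least two coordinates.)
    bounds = []
    for i, c in enumerate(coords):
        d_lo = coords[i] - coords[i - 1] if i > 0 else coords[i + 1] - coords[i]
        d_hi = coords[i + 1] - coords[i] if i < len(coords) - 1 else d_lo
        bounds.append((c - width * d_lo, c + width * d_hi))
    return bounds

# Cell centers at 0.5, 1.5, 2.5 get bounds [0, 1], [1, 2], [2, 3].
print(make_bounds([0.5, 1.5, 2.5]))  # [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
```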

[12]: # We are dropping the existing bounds to demonstrate adding bounds.


ds4 = ds.drop_vars("time_bnds")

[13]: try:
    ds4.bounds.get_bounds("T")
except KeyError as e:
    print(e)

"Bounds were not found for the coordinate variable 'time'. They must be added (Dataset.bounds.add_bounds)."

[14]: # A `width` kwarg can be specified, which is the width of the bounds relative
# to the position of the nearest points. The default value is 0.5.
ds4 = ds4.bounds.add_bounds("T", width=0.5)

[15]: ds4.bounds.get_bounds("T")
[15]: <xarray.DataArray 'time_bnds' (time: 600, bnds: 2)>
array([[cftime.DatetimeNoLeap(1870, 1, 18, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1870, 2, 15, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeNoLeap(1870, 2, 15, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1870, 3, 16, 12, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeNoLeap(1870, 3, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1870, 4, 16, 0, 0, 0, 0, has_year_zero=True)],
...,
[cftime.DatetimeNoLeap(1919, 10, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1919, 11, 16, 0, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeNoLeap(1919, 11, 16, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1919, 12, 16, 12, 0, 0, 0, has_year_zero=True)],
[cftime.DatetimeNoLeap(1919, 12, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1920, 1, 16, 12, 0, 0, 0, has_year_zero=True)]],
dtype=object)
Coordinates:
* time (time) object 1870-02-01 00:00:00 ... 1920-01-01 00:00:00
Dimensions without coordinates: bnds
Attributes:
xcdat_bounds: True

Add missing bounds for all axes supported by xcdat (X, Y, T, Z)

• Related API: xarray.Dataset.bounds.add_missing_bounds()

[16]: # We drop the dataset axes bounds to demonstrate generating missing bounds.
ds5 = ds.drop_vars(["time_bnds", "lat_bnds", "lon_bnds"])

[17]: ds5
[17]: <xarray.Dataset>
Dimensions: (lat: 180, lon: 360, time: 600)
Coordinates:
* lat (lat) float64 -89.5 -88.5 -87.5 -86.5 -85.5 ... 86.5 87.5 88.5 89.5
* lon (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 355.5 356.5 357.5 358.5 359.5
* time (time) object 1870-02-01 00:00:00 ... 1920-01-01 00:00:00
Data variables:
gw (lat) float64 dask.array<chunksize=(180,), meta=np.ndarray>
area (lat, lon) float64 dask.array<chunksize=(180, 360), meta=np.ndarray>
TS (time, lat, lon) float32 dask.array<chunksize=(300, 180, 360), meta=np.
˓→ndarray>

Attributes: (12/21)
ne: 30
np: 4
Conventions: CF-1.0
source: CAM
case: 20180622.DECKv1b_A2_1850aeroF.ne30_oEC.e...
title: UNSET
... ...
remap_script: ncremap
remap_hostname: acme1
remap_version: 4.9.6
map_file: /export/zender1/data/maps/map_ne30np4_to...
input_file: /p/user_pub/e3sm/baldwin32/workshop/amip...
DODS_EXTRA.Unlimited_Dimension: time


[18]: ds5 = ds5.bounds.add_missing_bounds(width=0.5)

[19]: ds5
[19]: <xarray.Dataset>
Dimensions: (lat: 180, lon: 360, time: 600, bnds: 2)
Coordinates:
* lat (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
* lon (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
* time (time) object 1870-02-01 00:00:00 ... 1920-01-01 00:00:00
Dimensions without coordinates: bnds
Data variables:
gw (lat) float64 dask.array<chunksize=(180,), meta=np.ndarray>
area (lat, lon) float64 dask.array<chunksize=(180, 360), meta=np.ndarray>
TS (time, lat, lon) float32 dask.array<chunksize=(300, 180, 360), meta=np.
˓→ndarray>

lon_bnds (lon, bnds) float64 0.0 1.0 1.0 2.0 ... 358.0 359.0 359.0 360.0
lat_bnds (lat, bnds) float64 -90.0 -89.0 -89.0 -88.0 ... 89.0 89.0 90.0
time_bnds (time, bnds) object 1870-01-18 00:00:00 ... 1920-01-16 12:00:00
Attributes: (12/21)
ne: 30
np: 4
Conventions: CF-1.0
source: CAM
case: 20180622.DECKv1b_A2_1850aeroF.ne30_oEC.e...
title: UNSET
... ...
remap_script: ncremap
remap_hostname: acme1
remap_version: 4.9.6
map_file: /export/zender1/data/maps/map_ne30np4_to...
input_file: /p/user_pub/e3sm/baldwin32/workshop/amip...
DODS_EXTRA.Unlimited_Dimension: time

Get the dimension coordinates for an axis.

In xarray, you can get a dimension coordinate by directly referencing its name (e.g., ds.lat). xcdat provides an
alternative way to get dimension coordinates agnostically by simply passing the CF axis key to applicable APIs.
• Related API: xcdat.get_dim_coords()
Helpful knowledge:
• This API uses cf_xarray to interpret CF axis names and coordinate names in the xarray object attributes. Refer
to Metadata Interpretation for more information.
Xarray documentation on coordinates (source):
• There are two types of coordinates in xarray:
– dimension coordinates are one dimensional coordinates with a name equal to their sole dimension (marked
by * when printing a dataset or data array). They are used for label based indexing and alignment, like the
index found on a pandas DataFrame or Series. Indeed, these “dimension” coordinates use a pandas.Index
internally to store their values.


– non-dimension coordinates are variables that contain coordinate data, but are not a dimension coordinate.
They can be multidimensional (see Working with Multidimensional Coordinates), and there is no relation-
ship between the name of a non-dimension coordinate and the name(s) of its dimension(s). Non-dimension
coordinates can be useful for indexing or plotting; otherwise, xarray does not make any direct use of the
values associated with them. They are not used for alignment or automatic indexing, nor are they required
to match when doing arithmetic (see Coordinates).
• Xarray’s terminology differs from the CF terminology, where the “dimension coordinates” are called “coordinate
variables”, and the “non-dimension coordinates” are called “auxiliary coordinate variables” (see GH1295 for
more details).

1. axis attr

[20]: ds.lat.attrs["axis"]
[20]: 'Y'

2. standard_name attr

[21]: ds.lat.attrs["standard_name"]
[21]: 'latitude'

[22]: "lat" in ds.dims


[22]: True

[24]: xcdat.get_axis_coord(ds, axis="Y")


[24]: <xarray.DataArray 'lat' (lat: 180)>
array([-89.5, -88.5, -87.5, -86.5, -85.5, -84.5, -83.5, -82.5, -81.5, -80.5,
-79.5, -78.5, -77.5, -76.5, -75.5, -74.5, -73.5, -72.5, -71.5, -70.5,
-69.5, -68.5, -67.5, -66.5, -65.5, -64.5, -63.5, -62.5, -61.5, -60.5,
-59.5, -58.5, -57.5, -56.5, -55.5, -54.5, -53.5, -52.5, -51.5, -50.5,
-49.5, -48.5, -47.5, -46.5, -45.5, -44.5, -43.5, -42.5, -41.5, -40.5,
-39.5, -38.5, -37.5, -36.5, -35.5, -34.5, -33.5, -32.5, -31.5, -30.5,
-29.5, -28.5, -27.5, -26.5, -25.5, -24.5, -23.5, -22.5, -21.5, -20.5,
-19.5, -18.5, -17.5, -16.5, -15.5, -14.5, -13.5, -12.5, -11.5, -10.5,
-9.5, -8.5, -7.5, -6.5, -5.5, -4.5, -3.5, -2.5, -1.5, -0.5,
0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5,
10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5,
20.5, 21.5, 22.5, 23.5, 24.5, 25.5, 26.5, 27.5, 28.5, 29.5,
30.5, 31.5, 32.5, 33.5, 34.5, 35.5, 36.5, 37.5, 38.5, 39.5,
40.5, 41.5, 42.5, 43.5, 44.5, 45.5, 46.5, 47.5, 48.5, 49.5,
50.5, 51.5, 52.5, 53.5, 54.5, 55.5, 56.5, 57.5, 58.5, 59.5,
60.5, 61.5, 62.5, 63.5, 64.5, 65.5, 66.5, 67.5, 68.5, 69.5,
70.5, 71.5, 72.5, 73.5, 74.5, 75.5, 76.5, 77.5, 78.5, 79.5,
80.5, 81.5, 82.5, 83.5, 84.5, 85.5, 86.5, 87.5, 88.5, 89.5])
Coordinates:
* lat (lat) float64 -89.5 -88.5 -87.5 -86.5 -85.5 ... 86.5 87.5 88.5 89.5
Attributes:
long_name: Latitude of Grid Cell Centers
standard_name: latitude
units: degrees_north
axis: Y
valid_min: -90.0
valid_max: 90.0
bounds: lat_bnds

10.3.3 Calculate Geospatial Weighted Averages from Monthly Time Series

Authors:
• Tom Vo
• Stephen Po-Chedley
Date: 05/27/22
Related APIs:
• xarray.Dataset.spatial.average()
The data used in this example can be found through the Earth System Grid Federation (ESGF) search portal.

Overview

A common data reduction in geophysical sciences is to produce spatial averages. Spatial averaging functionality in
xcdat allows users to quickly produce area-weighted spatial averages for selected regions (or full dataset domains).
In the example below, we demonstrate the opening of a (remote) dataset and spatial averaging over the global, tropical,
and Niño 3.4 domains.
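The core of area weighting can be shown with a toy latitude-weighted mean, where each latitude band is weighted proportionally to the cosine of its latitude. This is a simplification for illustration only; xcdat derives its weights from the coordinate bounds:

```python
import math

def weighted_lat_mean(values, lats):
    # Weight each latitude band by cos(latitude): polar rows cover less
    # area than equatorial rows, so they contribute less to the mean.
    weights = [math.cos(math.radians(lat)) for lat in lats]
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Equal values are unchanged by weighting (result is ~10.0).
print(weighted_lat_mean([10.0, 10.0, 10.0], [-60.0, 0.0, 60.0]))
```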

[1]: %matplotlib inline

import matplotlib.pyplot as plt


import xcdat

1. Open the Dataset

We are using xarray’s OPeNDAP support to read a netCDF4 dataset file directly from its source. The data is not loaded
over the network until we perform operations on it (e.g., temperature unit adjustment).
More information on xarray's OPeNDAP support can be found here.

[2]: filepath = "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"

ds = xcdat.open_dataset(filepath)

# Unit adjust (-273.15, K to C)
ds["tas"] = ds.tas - 273.15

ds


[2]: <xarray.Dataset>
Dimensions: (time: 1980, bnds: 2, lat: 145, lon: 192)
Coordinates:
* time (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
height float64 2.0
Dimensions without coordinates: bnds
Data variables:
time_bnds (time, bnds) datetime64[ns] ...
lat_bnds (lat, bnds) float64 ...
lon_bnds (lon, bnds) float64 ...
tas (time, lat, lon) float32 -27.19 -27.19 -27.19 ... -25.29 -25.29
Attributes: (12/48)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 87658.0
creation_date: 2020-06-05T04:06:11Z
... ...
variant_label: r10i1p1f1
version: v20200605
license: CMIP6 model data produced by CSIRO is li...
cmor_version: 3.4.0
tracking_id: hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
DODS_EXTRA.Unlimited_Dimension: time

2. Global

[3]: ds_global_avg = ds.spatial.average("tas")

[4]: ds_global_avg.tas
[4]: <xarray.DataArray 'tas' (time: 1980)>
array([12.52127071, 13.09115223, 13.60703132, ..., 15.5767848 ,
14.65664621, 13.84951678])
Coordinates:
* time (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
height float64 2.0

[5]: # Plot the first 100 time steps


ds_global_avg.tas.isel(time=slice(0, 100)).plot()
[5]: [<matplotlib.lines.Line2D at 0x7fac9d6aee80>]


3. Tropical Region

[6]: ds_trop_avg = ds.spatial.average("tas", lat_bounds=(-25, 25))

[7]: ds_trop_avg.tas
[7]: <xarray.DataArray 'tas' (time: 1980)>
array([25.24722608, 25.61795924, 25.96516235, ..., 26.79536823,
26.67771602, 26.27182383])
Coordinates:
* time (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
height float64 2.0

[8]: # Plot the first 100 time steps


ds_trop_avg.tas.isel(time=slice(0, 100)).plot()
[8]: [<matplotlib.lines.Line2D at 0x7fac9d58ff70>]


4. Nino 3.4 Region

Niño 3.4 (5N-5S, 170W-120W): The Niño 3.4 anomalies may be thought of as representing the average
equatorial SSTs across the Pacific from about the dateline to the South American coast. The Niño 3.4
index typically uses a 5-month running mean, and El Niño or La Niña events are defined when the Niño
3.4 SSTs exceed +/- 0.4C for a period of six months or more.
—https://climatedataguide.ucar.edu/climate-data/nino-sst-indices-nino-12-3-34-4-oni-and-tni

[9]: ds_nino_avg = ds.spatial.average("tas", lat_bounds=(-5, 5), lon_bounds=(190, 240))

[10]: ds_nino_avg.tas
[10]: <xarray.DataArray 'tas' (time: 1980)>
array([27.00284678, 27.06796429, 26.18095324, ..., 27.17515272,
27.30917002, 27.38399379])
Coordinates:
* time (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
height float64 2.0

[11]: # Plot the first 100 time steps


ds_nino_avg.tas.isel(time=slice(0, 100)).plot()
[11]: [<matplotlib.lines.Line2D at 0x7fac9d515eb0>]


10.3.4 Calculate Time Averages from Time Series Data

Author: Tom Vo
Date: 05/27/22
Last Edited: 08/17/22 (v0.3.1)
Related APIs:
• xarray.Dataset.temporal.average()
• xarray.Dataset.temporal.group_average()
The data used in this example can be found through the Earth System Grid Federation (ESGF) search portal.

Overview

Suppose we have netCDF4 files for air temperature data (tas) with monthly, daily, and 3hr frequencies.
We want to calculate averages using these files with the time dimension removed (a single time snapshot), and averages
by time group (yearly, seasonal, and daily).

[1]: %matplotlib inline

import pandas as pd
import matplotlib.pyplot as plt
import xcdat


1. Calculate averages with the time dimension removed (single snapshot)

Related API: xarray.Dataset.temporal.average()


Helpful knowledge:
• The frequency for the time interval is inferred before calculating weights.
– The frequency is inferred by calculating the minimum delta between time coordinates and using the
conditional logic below. This frequency is used to calculate weights.

if min_delta < pd.Timedelta(days=1):
    return "hour"
elif min_delta >= pd.Timedelta(days=1) and min_delta < pd.Timedelta(days=28):
    return "day"
elif min_delta >= pd.Timedelta(days=28) and min_delta < pd.Timedelta(days=365):
    return "month"
else:
    return "year"
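The same conditional logic, rewritten as a runnable function using the standard-library timedelta in place of pd.Timedelta (an equivalent sketch, since the comparisons only depend on the length of the delta):

```python
from datetime import timedelta

def infer_freq(min_delta):
    # Map the smallest gap between time coordinates to a frequency label.
    if min_delta < timedelta(days=1):
        return "hour"
    elif min_delta < timedelta(days=28):
        return "day"
    elif min_delta < timedelta(days=365):
        return "month"
    return "year"

print(infer_freq(timedelta(days=30)))  # month
```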

• Masked (missing) data is automatically handled.
– The weights of masked (missing) data are excluded when averages are calculated. This is the same as
giving them a weight of 0.
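Excluding missing values from an average is equivalent to assigning them a weight of 0, as this small sketch shows:

```python
import math

def masked_weighted_mean(values, weights):
    # Pairs where the value is missing (NaN) are dropped, which is the
    # same as giving them a weight of 0 in numerator and denominator.
    pairs = [(v, w) for v, w in zip(values, weights) if not math.isnan(v)]
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)

print(masked_weighted_mean([1.0, float("nan"), 3.0], [1.0, 1.0, 1.0]))  # 2.0
```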

Open the Dataset

In this example, we will be calculating the time-weighted average with the time dimension removed (a single
snapshot) for monthly tas data.
We are using xarray’s OPeNDAP support to read a netCDF4 dataset file directly from its source. The data is not loaded
over the network until we perform operations on it (e.g., temperature unit adjustment).
More information on xarray's OPeNDAP support can be found here.

[2]: filepath = "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"

ds = xcdat.open_dataset(filepath)

# Unit adjust (-273.15, K to C)
ds["tas"] = ds.tas - 273.15

ds
[2]: <xarray.Dataset>
Dimensions: (time: 1980, bnds: 2, lat: 145, lon: 192)
Coordinates:
* time (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
height float64 2.0
Dimensions without coordinates: bnds
Data variables:
time_bnds (time, bnds) datetime64[ns] ...
lat_bnds (lat, bnds) float64 ...
lon_bnds (lon, bnds) float64 ...
tas (time, lat, lon) float32 -27.19 -27.19 -27.19 ... -25.29 -25.29
Attributes: (12/48)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 87658.0
creation_date: 2020-06-05T04:06:11Z
... ...
variant_label: r10i1p1f1
version: v20200605
license: CMIP6 model data produced by CSIRO is li...
cmor_version: 3.4.0
tracking_id: hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
DODS_EXTRA.Unlimited_Dimension: time

[3]: ds_avg = ds.temporal.average("tas", weighted=True)

[4]: ds_avg.tas
[4]: <xarray.DataArray 'tas' (lat: 145, lon: 192)>
array([[-48.01481628, -48.01481628, -48.01481628, ..., -48.01481628,
-48.01481628, -48.01481628],
[-44.94085363, -44.97948214, -45.01815398, ..., -44.82408252,
-44.86273067, -44.9009281 ],
[-44.11875274, -44.23060624, -44.33960158, ..., -43.76766492,
-43.88593717, -44.00303006],
...,
[-18.21076615, -18.17513373, -18.13957458, ..., -18.32720478,
-18.28428828, -18.2486193 ],
[-18.50778243, -18.49301854, -18.47902819, ..., -18.55410851,
-18.5406963 , -18.52413098],
[-19.07366375, -19.07366375, -19.07366375, ..., -19.07366375,
-19.07366375, -19.07366375]])
Coordinates:
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 2.0
Attributes:
operation: temporal_avg
mode: average
freq: month
weighted: True

[5]: ds_avg.tas.plot(label="weighted")
[5]: <matplotlib.collections.QuadMesh at 0x7f120aa2a9a0>


2. Calculate grouped averages

Related API: xarray.Dataset.temporal.group_average()


Helpful knowledge:
• Each specified frequency has predefined groups for grouping time coordinates.
  – The table below maps each type of average to its API frequency and grouping convention.

    Type of Average    API Frequency                                    Group By
    Yearly             freq="year"                                      year
    Monthly            freq="month"                                     year, month
    Seasonal           freq="season"                                    year, season
    Custom seasonal    freq="season" and season_config=                 year, season
                       {"custom_seasons": <2D ARRAY>}
    Daily              freq="day"                                       year, month, day
    Hourly             freq="hour"                                      year, month, day, hour

  – The grouping conventions are based on CDAT/cdutil, except for daily and hourly means, which aren't
    implemented in CDAT/cdutil.
• Masked (missing) data is automatically handled.
  – The weight of masked (missing) data is excluded when averages are calculated. This is the same as giving
    them a weight of 0.
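The (year, month) grouping convention for monthly averages can be illustrated with plain pandas on synthetic data. This sketches the grouping idea only, not xCDAT's internal (weighted, bounds-aware) implementation:

```python
import numpy as np
import pandas as pd

# Two years of daily values; group_average(freq="month") groups by
# (year, month), so each calendar month of each year yields one average.
times = pd.date_range("2000-01-01", "2001-12-31", freq="D")
series = pd.Series(np.arange(len(times), dtype=float), index=times)

monthly = series.groupby([series.index.year, series.index.month]).mean()
print(len(monthly))  # 24 groups: 12 months x 2 years
```

Grouping by (year, month) rather than month alone is what distinguishes grouped averages from climatologies, where all Januaries collapse into one group.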


Open the Dataset

In this example, we will calculate weighted grouped time averages for tas data.
We are using xarray's OPeNDAP support to read a netCDF4 dataset file directly from its source. The data is not loaded
over the network until we perform operations on it (e.g., temperature unit adjustment).
More information on xarray's OPeNDAP support can be found here.

[6]: filepath = "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"

ds = xcdat.open_dataset(filepath)

# Unit adjust (-273.15, K to C)
ds["tas"] = ds.tas - 273.15

ds
[6]: <xarray.Dataset>
Dimensions: (time: 1980, bnds: 2, lat: 145, lon: 192)
Coordinates:
* time (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
height float64 2.0
Dimensions without coordinates: bnds
Data variables:
time_bnds (time, bnds) datetime64[ns] ...
lat_bnds (lat, bnds) float64 ...
lon_bnds (lon, bnds) float64 ...
tas (time, lat, lon) float32 -27.19 -27.19 -27.19 ... -25.29 -25.29
Attributes: (12/48)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 87658.0
creation_date: 2020-06-05T04:06:11Z
... ...
variant_label: r10i1p1f1
version: v20200605
license: CMIP6 model data produced by CSIRO is li...
cmor_version: 3.4.0
tracking_id: hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
DODS_EXTRA.Unlimited_Dimension: time


Yearly Averages

Group time coordinates by year

[7]: ds_yearly = ds.temporal.group_average("tas", freq="year", weighted=True)

[8]: ds_yearly.tas
[8]: <xarray.DataArray 'tas' (time: 165, lat: 145, lon: 192)>
array([[[-48.75573349, -48.75573349, -48.75573349, ..., -48.75573349,
-48.75573349, -48.75573349],
[-45.65206528, -45.69302368, -45.73506165, ..., -45.52127838,
-45.56386566, -45.60668945],
[-44.77523422, -44.90583801, -45.03297043, ..., -44.37118149,
-44.50630951, -44.64050293],
...,
[-20.50597572, -20.48132133, -20.45456505, ..., -20.58895874,
-20.55752182, -20.53087234],
[-20.79759216, -20.78425217, -20.77545547, ..., -20.83267975,
-20.82335663, -20.80768394],
[-21.20114899, -21.20114899, -21.20114899, ..., -21.20114899,
-21.20114899, -21.20114899]],

[[-48.95254898, -48.95254898, -48.95254898, ..., -48.95254898,


-48.95254898, -48.95254898],
[-45.83190918, -45.8649025 , -45.89875031, ..., -45.7321701 ,
-45.76544189, -45.79859543],
[-44.93536758, -45.03795624, -45.13800812, ..., -44.61143112,
-44.71986008, -44.82937241],
...
[-14.91627121, -14.89926147, -14.88381004, ..., -14.99542999,
-14.96513653, -14.93853188],
[-15.40592194, -15.39668083, -15.38595486, ..., -15.43246269,
-15.42605591, -15.41356754],
[-15.94499969, -15.94499969, -15.94499969, ..., -15.94499969,
-15.94499969, -15.94499969]],

[[-47.59732056, -47.59732056, -47.59732056, ..., -47.59732056,


-47.59732056, -47.59732056],
[-44.72136688, -44.76342773, -44.80350494, ..., -44.59239197,
-44.63444519, -44.67822647],
[-43.85031891, -43.96956253, -44.08713913, ..., -43.47090149,
-43.59676361, -43.72407913],
...,
[-14.52023029, -14.47407913, -14.43230724, ..., -14.67551422,
-14.62093163, -14.56736755],
[-14.91123581, -14.89230919, -14.86901569, ..., -14.9820118 ,
-14.96266842, -14.93872261],
[-15.6184063 , -15.6184063 , -15.6184063 , ..., -15.6184063 ,
-15.6184063 , -15.6184063 ]]])
Coordinates:
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 2.0
* time (time) object 1850-01-01 00:00:00 ... 2014-01-01 00:00:00
Attributes:
operation: temporal_avg
mode: group_average
freq: year
weighted: True

This GIF was created using xmovie.

Sample xmovie code:

import xmovie

mov = xmovie.Movie(ds_yearly.tas)
mov.save("temporal-average-yearly.gif")

Seasonal Averages

Group time coordinates by year and season

[9]: ds_season = ds.temporal.group_average("tas", freq="season", weighted=True)

[10]: ds_season.tas
[10]: <xarray.DataArray 'tas' (time: 661, lat: 145, lon: 192)>
array([[[-32.70588303, -32.70588303, -32.70588303, ..., -32.70588303,
-32.70588303, -32.70588303],
[-30.99376678, -31.03758621, -31.08932686, ..., -30.84562302,
-30.89412689, -30.94400978],
[-30.0251503 , -30.14543724, -30.26419067, ..., -29.66037178,
-29.78108025, -29.90287781],
...,
[-37.72314072, -37.68549347, -37.65416718, ..., -37.82619858,
-37.79034424, -37.75682831],
[-38.27464676, -38.26372528, -38.25014496, ..., -38.29218292,
-38.29063797, -38.28456116],
[-38.74358749, -38.74358749, -38.74358749, ..., -38.74358749,
-38.74358749, -38.74358749]],

[[-54.29086304, -54.29086304, -54.29086304, ..., -54.29086304,


-54.29086304, -54.29086304],
[-51.11771393, -51.17523575, -51.23055267, ..., -50.93516541,
-50.99657059, -51.05614471],
[-50.31804657, -50.48666382, -50.64956665, ..., -49.79003143,
-49.97007751, -50.14521027],
...
[-12.34277439, -12.2246685 , -12.10663223, ..., -12.74492168,
-12.60908794, -12.47839165],
[-13.12640381, -13.0661087 , -13.00387573, ..., -13.306077 ,
-13.25871468, -13.19972038],
[-14.28846931, -14.28846931, -14.28846931, ..., -14.28846931,
-14.28846931, -14.28846931]],

[[-28.99049377, -28.99049377, -28.99049377, ..., -28.99049377,


-28.99049377, -28.99049377],
[-28.19291687, -28.22457886, -28.26130676, ..., -28.09593201,
-28.12599182, -28.15802002],
[-27.60740662, -27.7056427 , -27.80511475, ..., -27.31161499,
-27.41082764, -27.50836182],
...,
[-24.25627136, -24.14059448, -24.03753662, ..., -24.61853027,
-24.48849487, -24.36643982],
[-24.62901306, -24.61338806, -24.54986572, ..., -24.75204468,
-24.72160339, -24.66641235],
[-25.28923035, -25.28923035, -25.28923035, ..., -25.28923035,
-25.28923035, -25.28923035]]])
Coordinates:
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 2.0
* time (time) object 1850-01-01 00:00:00 ... 2015-01-01 00:00:00
Attributes:
operation: temporal_avg
mode: group_average
freq: season
weighted: True
dec_mode: DJF
drop_incomplete_djf: False

Notice that the season of each time coordinate is represented by its middle month:
• "DJF" is represented by month 1 ("J"/January)
• "MAM" is represented by month 4 ("A"/April)
• "JJA" is represented by month 7 ("J"/July)
• "SON" is represented by month 10 ("O"/October)
This implementation design was used because datetime objects do not distinguish seasons, so the middle month is
used instead.

[11]: ds_season.time
[11]: <xarray.DataArray 'time' (time: 661)>
array([cftime.DatetimeProlepticGregorian(1850, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1850, 4, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1850, 7, 1, 0, 0, 0, 0, has_year_zero=True),
...,
cftime.DatetimeProlepticGregorian(2014, 7, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2014, 10, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2015, 1, 1, 0, 0, 0, 0, has_year_zero=True)],
dtype=object)
Coordinates:
height float64 2.0
* time (time) object 1850-01-01 00:00:00 ... 2015-01-01 00:00:00
Attributes:
bounds: time_bnds
axis: T
long_name: time
standard_name: time
_ChunkSizes: 1

Monthly Averages

Group time coordinates by year and month


For this example, we will be loading a subset of 3hr time series data for tas using OPeNDAP.

NOTE:
For OPeNDAP servers, the default file size request limit is 500MB in the TDS server configuration. Opening up a
dataset over OPeNDAP also introduces an overhead compared to direct file access.
The workaround is to use Dask to request the data in manageable chunks, which overcomes file size limitations
and can improve performance.
We have a few ways to chunk our request:
1. Specify chunks with "auto" to let Dask determine the chunk size.
2. Specify the file size to chunk on (e.g., "100MB") or the number of chunks as an integer (e.g., 100 for 100 chunks).
Visit this page to learn more about chunking and performance: https://docs.xarray.dev/en/stable/user-guide/dask.html#chunking-and-performance

[12]: # The size of this file is approximately 1.45 GB, so we will be chunking our
# request using Dask to avoid hitting the OPeNDAP file size request limit for
# this ESGF node.
ds2 = xcdat.open_dataset(
    "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/3hr/tas/gn/v20200605/tas_3hr_ACCESS-ESM1-5_historical_r10i1p1f1_gn_201001010300-201501010000.nc",
    chunks={"time": "auto"},
)

# Unit adjust (-273.15, K to C)
ds2["tas"] = ds2.tas - 273.15

ds2
[12]: <xarray.Dataset>
Dimensions: (time: 14608, lat: 145, bnds: 2, lon: 192)
Coordinates:
* time (time) datetime64[ns] 2010-01-01T03:00:00 ... 2015-01-01
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
height float64 ...
Dimensions without coordinates: bnds
Data variables:
lat_bnds (lat, bnds) float64 dask.array<chunksize=(145, 2), meta=np.ndarray>
lon_bnds (lon, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
tas (time, lat, lon) float32 dask.array<chunksize=(913, 145, 192), meta=np.ndarray>
time_bnds (time, bnds) datetime64[ns] 2010-01-01T01:30:00 ... 2015-01-01...


Attributes: (12/48)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 87658.0
creation_date: 2020-06-05T04:54:56Z
... ...
variant_label: r10i1p1f1
version: v20200605
license: CMIP6 model data produced by CSIRO is li...
cmor_version: 3.4.0
tracking_id: hdl:21.14100/b79e6a05-c482-46cf-b3b8-83b...
DODS_EXTRA.Unlimited_Dimension: time

[13]: ds2_monthly_avg = ds2.temporal.group_average("tas", freq="month", weighted=True)

[14]: ds2_monthly_avg.tas
[14]: <xarray.DataArray 'tas' (time: 61, lat: 145, lon: 192)>
dask.array<truediv, shape=(61, 145, 192), dtype=float64, chunksize=(1, 145, 192), chunktype=numpy.ndarray>

Coordinates:
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 ...
* time (time) object 2010-01-01 00:00:00 ... 2015-01-01 00:00:00
Attributes:
operation: temporal_avg
mode: group_average
freq: month
weighted: True

Daily Averages

Group time coordinates by year, month, and day


For this example, we will be opening a subset of 3hr time series data for tas using OPeNDAP.

[15]: # The size of this file is approximately 1.17 GB, so we will be chunking our
# request using Dask to avoid hitting the OPeNDAP file size request limit for
# this ESGF node.
ds3 = xcdat.open_dataset(
    "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/3hr/tas/gn/v20200605/tas_3hr_ACCESS-ESM1-5_historical_r10i1p1f1_gn_201001010300-201501010000.nc",
    chunks={"time": "auto"},
)

# Unit adjust (-273.15, K to C)
ds3["tas"] = ds3.tas - 273.15

[16]: ds3.tas
[16]: <xarray.DataArray 'tas' (time: 14608, lat: 145, lon: 192)>
dask.array<sub, shape=(14608, 145, 192), dtype=float32, chunksize=(913, 145, 192), chunktype=numpy.ndarray>

Coordinates:
* time (time) datetime64[ns] 2010-01-01T03:00:00 ... 2015-01-01
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 ...

[17]: ds3_day_avg = ds3.temporal.group_average("tas", freq="day", weighted=True)

[18]: ds3_day_avg.tas
[18]: <xarray.DataArray 'tas' (time: 1827, lat: 145, lon: 192)>
dask.array<truediv, shape=(1827, 145, 192), dtype=float64, chunksize=(1, 145, 192), chunktype=numpy.ndarray>

Coordinates:
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 ...
* time (time) object 2010-01-01 00:00:00 ... 2015-01-01 00:00:00
Attributes:
operation: temporal_avg
mode: group_average
freq: day
weighted: True

10.3.5 Calculating Climatology and Departures from Time Series Data

Author: Tom Vo
Date: 05/27/22
Last Updated: 2/27/23
Related APIs:
• xarray.Dataset.temporal.climatology()
• xarray.Dataset.temporal.departures()
The data used in this example can be found through the Earth System Grid Federation (ESGF) search portal.


Overview

Suppose we have two netCDF4 files for air temperature data (tas).
• File 1: Monthly frequency from 1850-01-16 to 2014-12-16
– We want to calculate the annual and seasonal cycle climatologies and departures using this file.
• File 2: Hourly frequency from 2010-01-01 to 2015-01-01 (subset).
– We want to calculate the daily cycle climatologies and departures using this file.

[1]: %matplotlib inline

import matplotlib.pyplot as plt
import pandas as pd
import xcdat

1. Open Sample Datasets

We are using xarray's OPeNDAP support to read netCDF4 dataset files directly from their source. The data is not
loaded over the network until we perform operations on it (e.g., temperature unit adjustment).
More information on xarray's OPeNDAP support can be found here.

File 1: Monthly Frequency

[2]: filepath1 = "http://esgf.nci.org.au/thredds/dodsC/master/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"

ds_monthly = xcdat.open_dataset(filepath1)

# Unit adjust (-273.15, K to C)
ds_monthly["tas"] = ds_monthly.tas - 273.15
ds_monthly
[2]: <xarray.Dataset>
Dimensions: (time: 1980, bnds: 2, lat: 145, lon: 192)
Coordinates:
* time (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
height float64 2.0
Dimensions without coordinates: bnds
Data variables:
time_bnds (time, bnds) datetime64[ns] ...
lat_bnds (lat, bnds) float64 ...
lon_bnds (lon, bnds) float64 ...
tas (time, lat, lon) float32 -27.19 -27.19 -27.19 ... -25.29 -25.29
Attributes: (12/49)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard

branch_time_in_child: 0.0
branch_time_in_parent: 87658.0
creation_date: 2020-06-05T04:06:11Z
... ...
version: v20200605
license: CMIP6 model data produced by CSIRO is li...
cmor_version: 3.4.0
_NCProperties: version=2,netcdf=4.6.2,hdf5=1.10.5
tracking_id: hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
DODS_EXTRA.Unlimited_Dimension: time

File 2: Hourly Frequency

The size of this file is approximately 1.17 GB, so we will be chunking our request using Dask to avoid hitting the
OPeNDAP file size request limit for this ESGF node.

[3]: filepath2 = "http://esgf.nci.org.au/thredds/dodsC/master/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/3hr/tas/gn/v20200605/tas_3hr_ACCESS-ESM1-5_historical_r10i1p1f1_gn_201001010300-201501010000.nc"

ds_hourly = xcdat.open_dataset(filepath2, chunks={"time": "auto"})

# Unit adjust (-273.15, K to C)
ds_hourly["tas"] = ds_hourly.tas - 273.15
ds_hourly
[3]: <xarray.Dataset>
Dimensions: (time: 14608, lat: 145, bnds: 2, lon: 192)
Coordinates:
* time (time) datetime64[ns] 2010-01-01T03:00:00 ... 2015-01-01
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
height float64 ...
Dimensions without coordinates: bnds
Data variables:
lat_bnds (lat, bnds) float64 dask.array<chunksize=(145, 2), meta=np.ndarray>
lon_bnds (lon, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
tas (time, lat, lon) float32 dask.array<chunksize=(913, 145, 192), meta=np.ndarray>
time_bnds (time, bnds) datetime64[ns] 2010-01-01T01:30:00 ... 2015-01-01...


Attributes: (12/49)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 87658.0
creation_date: 2020-06-05T04:54:56Z
... ...
version: v20200605
license: CMIP6 model data produced by CSIRO is li...
cmor_version: 3.4.0
_NCProperties: version=2,netcdf=4.6.2,hdf5=1.10.5
tracking_id: hdl:21.14100/b79e6a05-c482-46cf-b3b8-83b...
DODS_EXTRA.Unlimited_Dimension: time

2. Calculate Climatology

Related API: xarray.Dataset.temporal.climatology()


In this example, we will be calculating the weighted climatology of the tas variable for its seasonal, annual, and daily
cycles.
Helpful knowledge:
• Masked (missing) data is automatically handled.
– The weight of masked (missing) data is excluded when averages are calculated. This is the same as giving
them a weight of 0.
• If desired, use the reference_period argument to calculate a climatology based on a climatological reference
period (a subset of the entire time series). If no value is provided, the climatological reference period will be the
full period covered by the dataset.
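What reference_period does can be sketched with plain pandas: the climatology is computed only from the subset of the series that falls inside the reference window. The synthetic data below is ours for illustration; xCDAT handles the subsetting internally when you pass reference_period:

```python
import numpy as np
import pandas as pd

# Four years of monthly values; use only the first two years (the
# "reference period") to compute the monthly climatology.
times = pd.date_range("2000-01-01", periods=48, freq="MS")
series = pd.Series(np.arange(48, dtype=float), index=times)

reference = series.loc["2000":"2001"]
climatology = reference.groupby(reference.index.month).mean()
print(climatology.loc[1])  # mean of Jan 2000 (0.0) and Jan 2001 (12.0) -> 6.0
```

Values from 2002-2003 never enter the averages, which is exactly the effect of restricting the climatological reference period.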

Seasonal Climatology

Groups time coordinates by season


The season_config dictionary keyword argument can be passed to .climatology() for more granular
configuration. We will be sticking with the default settings.

[4]: season_climo = ds_monthly.temporal.climatology(
    "tas",
    freq="season",
    weighted=True,
    season_config={"dec_mode": "DJF", "drop_incomplete_djf": True},
)

[5]: season_climo.tas
[5]: <xarray.DataArray 'tas' (time: 4, lat: 145, lon: 192)>
array([[[-31.00774765, -31.00774765, -31.00774765, ..., -31.00774765,
-31.00774765, -31.00774765],
[-29.65324402, -29.685215 , -29.71771049, ..., -29.55809784,
-29.58923149, -29.62030983],
[-28.88215446, -28.98016167, -29.07778549, ..., -28.58658791,
-28.68405914, -28.78241539],
...,
[-31.36740303, -31.31291962, -31.25907516, ..., -31.54325676,
-31.47868538, -31.42434502],
[-31.88631248, -31.86421967, -31.84326553, ..., -31.95551682,
-31.93475533, -31.91006279],
[-32.83132172, -32.83132172, -32.83132172, ..., -32.83132172,
-32.83132172, -32.83132172]],

[[-53.70133972, -53.70133972, -53.70133972, ..., -53.70133972,


-53.70133972, -53.70133972],
[-50.02594376, -50.07233047, -50.11901093, ..., -49.88347626,
-49.93112564, -49.97804642],
[-49.16661835, -49.29807281, -49.42589951, ..., -48.75580978,
-48.89396286, -49.03115463],
...
[ -1.05963409, -1.05649328, -1.05370045, ..., -1.06824732,
-1.06510675, -1.06242192],
[ -1.06418574, -1.06315029, -1.06234932, ..., -1.06742334,
-1.06604695, -1.06509995],
[ -1.12615526, -1.12615526, -1.12615526, ..., -1.12615526,
-1.12615526, -1.12615526]],

[[-48.71931076, -48.71931076, -48.71931076, ..., -48.71931076,


-48.71931076, -48.71931076],
[-45.70309448, -45.74006271, -45.77688599, ..., -45.59179306,
-45.62841034, -45.664814 ],
[-44.89496231, -44.9999733 , -45.10230255, ..., -44.5642662 ,
-44.67589569, -44.78623962],
...,
[-18.21715736, -18.16695976, -18.11590195, ..., -18.38574219,
-18.32255554, -18.27195358],
[-18.61506462, -18.59276581, -18.57180786, ..., -18.68408012,
-18.66509819, -18.64017296],
[-19.34391594, -19.34391594, -19.34391594, ..., -19.34391594,
-19.34391594, -19.34391594]]])
Coordinates:
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 2.0
* time (time) object 0001-01-01 00:00:00 ... 0001-10-01 00:00:00
Attributes:
operation: temporal_avg
mode: climatology
freq: season
weighted: True
dec_mode: DJF
drop_incomplete_djf: True

[6]: notnull = pd.notnull(season_climo["tas"][0])

tas_season = season_climo.tas

fig, axes = plt.subplots(nrows=4, ncols=1, figsize=(14, 12))

for i, season in enumerate(("DJF", "MAM", "JJA", "SON")):
    tas_season.isel(time=i).where(notnull).plot.pcolormesh(
        ax=axes[i],
        vmin=-30,
        vmax=30,
        cmap="Spectral_r",
        add_colorbar=True,
        extend="both",
    )
    axes[i].set_ylabel(season)

for ax in axes.flat:
    ax.axes.get_xaxis().set_ticklabels([])
    ax.axes.get_yaxis().set_ticklabels([])
    ax.axes.axis("tight")
    ax.set_xlabel("")

plt.tight_layout()
fig.suptitle("Seasonal Surface Air Temperature", fontsize=16, y=1.02)
[6]: Text(0.5, 1.02, 'Seasonal Surface Air Temperature')

Notice that the time coordinates are cftime objects, with each season ("DJF", "MAM", "JJA", and "SON") represented
by its middle month.


cftime objects are used because the time coordinates are outside the Timestamp-valid range (approximately between
years 1678 and 2262).
• More info here: https://xarray.pydata.org/en/v2022.03.0/user-guide/weather-climate.html#non-standard-calendars-and-dates-outside-the-timestamp-valid-range

[7]: season_climo.time
[7]: <xarray.DataArray 'time' (time: 4)>
array([cftime.DatetimeProlepticGregorian(1, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1, 4, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1, 7, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(1, 10, 1, 0, 0, 0, 0, has_year_zero=True)],
dtype=object)
Coordinates:
height float64 2.0
* time (time) object 0001-01-01 00:00:00 ... 0001-10-01 00:00:00
Attributes:
bounds: time_bnds
axis: T
long_name: time
standard_name: time
_ChunkSizes: 1

Custom Seasonal Climatology

Groups time coordinates by season


To calculate custom seasonal cycle climatology, we must first define our custom seasons using the season_config
dictionary and the "custom_seasons" key.
"custom_seasons" must be a list of sublists containing month strings, with each sublist representing a custom season.
• Month strings must be in the three-letter format (e.g., "Jan")
• Each month must be included once in a custom season
• Order of the months in each custom season does not matter
• Custom seasons can vary in length

[8]: custom_seasons = [
    ["Jan", "Feb", "Mar"],  # "JanFebMar"
    ["Apr", "May", "Jun"],  # "AprMayJun"
    ["Jul", "Aug", "Sep"],  # "JulAugSep"
    ["Oct", "Nov", "Dec"],  # "OctNovDec"
]

c_season_climo = ds_monthly.temporal.climatology(
    "tas",
    freq="season",
    weighted=True,
    season_config={"custom_seasons": custom_seasons},
)


[9]: c_season_climo.tas
[9]: <xarray.DataArray 'tas' (time: 4, lat: 145, lon: 192)>
array([[[-38.74568939, -38.74568939, -38.74568939, ..., -38.74568939,
-38.74568939, -38.74568939],
[-36.58245468, -36.61849976, -36.65530777, ..., -36.47352982,
-36.50952148, -36.54521942],
[-35.74017334, -35.84892654, -35.95645142, ..., -35.40914154,
-35.51865387, -35.62909698],
...,
[-32.0694809 , -32.01528931, -31.96115875, ..., -32.24432373,
-32.18037796, -32.1263504 ],
[-32.59425354, -32.57166672, -32.55008316, ..., -32.66543961,
-32.64432526, -32.61899185],
[-33.51273727, -33.51273727, -33.51273727, ..., -33.51273727,
-33.51273727, -33.51273727]],

[[-56.2096405 , -56.2096405 , -56.2096405 , ..., -56.2096405 ,


-56.2096405 , -56.2096405 ],
[-52.31330872, -52.36031723, -52.40692902, ..., -52.16948318,
-52.21759415, -52.26473999],
[-51.48299408, -51.61407852, -51.74102783, ..., -51.06825256,
-51.20875549, -51.34703445],
...
[ -4.15014648, -4.13455486, -4.11836147, ..., -4.20478487,
-4.18330002, -4.16762114],
[ -4.25911999, -4.25162458, -4.24448299, ..., -4.28251314,
-4.2763319 , -4.26777029],
[ -4.44926548, -4.44926548, -4.44926548, ..., -4.44926548,
-4.44926548, -4.44926548]],

[[-38.29449081, -38.29449081, -38.29449081, ..., -38.29449081,


-38.29449081, -38.29449081],
[-36.35746002, -36.39224243, -36.42734528, ..., -36.25349045,
-36.28746796, -36.32162476],
[-35.58590698, -35.68638992, -35.78586197, ..., -35.27788162,
-35.38057709, -35.48319244],
...,
[-24.59911537, -24.54461861, -24.49049377, ..., -24.7778244 ,
-24.71206665, -24.65720177],
[-25.07014275, -25.04795647, -25.027174 , ..., -25.13942909,
-25.1194191 , -25.09457588],
[-25.95426178, -25.95426178, -25.95426178, ..., -25.95426178,
-25.95426178, -25.95426178]]])
Coordinates:
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 2.0
* time (time) object 0001-02-01 00:00:00 ... 0001-11-01 00:00:00
Attributes:
operation: temporal_avg
mode: climatology
freq: season
weighted: True
custom_seasons: ['JanFebMar', 'AprMayJun', 'JulAugSep', 'OctNovDec']

[10]: notnull = pd.notnull(c_season_climo["tas"][0])

tas_c_season = c_season_climo.tas

fig, axes = plt.subplots(nrows=4, ncols=1, figsize=(14, 12))

for i, season in enumerate(tas_c_season.attrs["custom_seasons"]):
    tas_c_season.isel(time=i).where(notnull).plot.pcolormesh(
        ax=axes[i],
        vmin=-30,
        vmax=30,
        cmap="Spectral_r",
        add_colorbar=True,
        extend="both",
    )
    axes[i].set_ylabel(season)

for ax in axes.flat:
    ax.axes.get_xaxis().set_ticklabels([])
    ax.axes.get_yaxis().set_ticklabels([])
    ax.axes.axis("tight")
    ax.set_xlabel("")

plt.tight_layout()
fig.suptitle("Seasonal Surface Air Temperature", fontsize=16, y=1.02)
[10]: Text(0.5, 1.02, 'Seasonal Surface Air Temperature')


Annual Climatology

Groups time coordinates by month

[11]: annual_climo = ds_monthly.temporal.climatology("tas", freq="month", weighted=True)

[12]: annual_climo.tas
[12]: <xarray.DataArray 'tas' (time: 12, lat: 145, lon: 192)>
array([[[-28.21442795, -28.21442795, -28.21442795, ..., -28.21442795,
-28.21442795, -28.21442795],
[-27.14847946, -27.17834282, -27.20867348, ..., -27.06005478,
-27.08879089, -27.11763954],
[-26.4435463 , -26.53694916, -26.62967873, ..., -26.1612587 ,
-26.25445938, -26.34812355],
...,
[-31.93053436, -31.87295341, -31.81675529, ..., -32.11352158,
-32.04728317, -31.99032974],
[-32.46694946, -32.44190598, -32.41777802, ..., -32.543293 ,
-32.52000809, -32.4929657 ],
[-33.39895248, -33.39895248, -33.39895248, ..., -33.39895248,
-33.39895248, -33.39895248]],

[[-37.97247314, -37.97247314, -37.97247314, ..., -37.97247314,


-37.97247314, -37.97247314],
[-35.9917984 , -36.02771759, -36.06422424, ..., -35.88243866,
-35.91858673, -35.9542923 ],
[-35.13148499, -35.23939133, -35.34668732, ..., -34.80545044,
-34.91286087, -35.02162933],
...
[-24.97770691, -24.91987419, -24.86286926, ..., -25.1666317 ,
-25.09855843, -25.03938293],
[-25.40159607, -25.37743568, -25.3551445 , ..., -25.47573471,
-25.45422173, -25.42788887],
[-26.30274582, -26.30274582, -26.30274582, ..., -26.30274582,
-26.30274582, -26.30274582]],

[[-27.46326065, -27.46326065, -27.46326065, ..., -27.46326065,


-27.46326065, -27.46326065],
[-26.38991165, -26.42050743, -26.45174217, ..., -26.30039787,
-26.32954788, -26.35881042],
[-25.6327858 , -25.72666359, -25.82057571, ..., -25.35087395,
-25.44388771, -25.53768921],
...,
[-29.80114746, -29.74719429, -29.69542122, ..., -29.973629 ,
-29.90939331, -29.85676384],
[-30.31214523, -30.29234123, -30.27323341, ..., -30.3752346 ,
-30.35642242, -30.3335495 ],
[-31.22587013, -31.22587013, -31.22587013, ..., -31.22587013,
-31.22587013, -31.22587013]]])
Coordinates:
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 2.0
* time (time) object 0001-01-01 00:00:00 ... 0001-12-01 00:00:00
Attributes:
operation: temporal_avg
mode: climatology
freq: month
weighted: True


Daily Climatology

Groups time coordinates by month and day.


Leap days (if present) are dropped if the CF calendar type is "gregorian", "proleptic_gregorian", or "standard".
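Dropping leap days before grouping by (month, day) can be sketched with pandas on a synthetic daily index. This is an illustration of the idea, not xCDAT's internal code:

```python
import pandas as pd

# Daily coordinates spanning a leap year; drop Feb 29 before grouping by
# (month, day) so every year contributes the same 365 groups.
times = pd.date_range("2000-01-01", "2000-12-31", freq="D")  # 366 days (leap year)
no_leap = times[~((times.month == 2) & (times.day == 29))]
print(len(times), len(no_leap))  # 366 365
```

Without this step, the (2, 29) group would be averaged over only the leap years in the record, giving it a different sample size than every other day of the year.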

[13]: daily_climo = ds_hourly.temporal.climatology("tas", freq="day", weighted=True)

[14]: daily_climo.tas
[14]: <xarray.DataArray 'tas' (time: 365, lat: 145, lon: 192)>
dask.array<truediv, shape=(365, 145, 192), dtype=float64, chunksize=(1, 145, 192), chunktype=numpy.ndarray>
Coordinates:
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 ...
* time (time) object 0001-01-01 00:00:00 ... 0001-12-31 00:00:00
Attributes:
operation: temporal_avg
mode: climatology
freq: day
weighted: True
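The leap-day handling described above can be illustrated with a small standalone sketch (plain Python, not xCDAT's internal implementation): for the listed calendar types, any time coordinate falling on February 29 is simply excluded before grouping by (month, day).

```python
import datetime

def drop_leap_days(times, calendar="standard"):
    """Drop Feb 29 entries for calendars that contain leap days."""
    if calendar in ("gregorian", "proleptic_gregorian", "standard"):
        return [t for t in times if not (t.month == 2 and t.day == 29)]
    # Calendars such as "noleap" have no Feb 29 to drop.
    return list(times)

# 2012 is a leap year, so one timestamp is removed.
times = [datetime.date(2012, 2, 27) + datetime.timedelta(days=i) for i in range(4)]
kept = drop_leap_days(times)
print(len(times), len(kept))  # 4 3
```

With 365 remaining (month, day) groups per year, the daily climatology above has exactly 365 time steps.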

3. Calculate Departures (Anomalies)

Related API: xarray.Dataset.temporal.departures()


In this example, we will be calculating the weighted departures of the tas variable for its seasonal, annual, and daily
cycles.
Helpful knowledge:
• What are anomalies?
– In climatology, “anomalies” refer to the difference between the value during a given time interval (e.g., the
January average surface air temperature) and the long-term average value for that time interval (e.g., the
average surface temperature over the last 30 Januaries).
• How is the climatology calculated?
– In the departures API, the reference climatology is calculated internally so there is no need to pass one to
this method.
– You can still calculate the reference climatology using the climatology API.
– If desired, use the reference_period argument to calculate anomalies relative to a climatological reference period (a subset of the entire time series). If no value is provided, the climatological reference period will be the full period covered by the dataset.
• Masked (missing) data is automatically handled.
– The weight of masked (missing) data is excluded when averages are calculated. This is the same as giving
them a weight of 0.
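The climatology/departure relationship described above can be sketched with plain NumPy on synthetic data (xCDAT additionally applies time-bound weighting and handles masked values, which this sketch omits):

```python
import numpy as np

rng = np.random.default_rng(0)
tas = rng.normal(15.0, 5.0, size=120)   # 10 years of monthly values
months = np.tile(np.arange(12), 10)     # month index (0-11) for each value

# Unweighted climatology: the long-term mean for each month.
climatology = np.array([tas[months == m].mean() for m in range(12)])

# Departures: each value minus its month's climatological mean.
departures = tas - climatology[months]

# By construction, departures average to ~0 within each month group.
print(np.allclose([departures[months == m].mean() for m in range(12)], 0.0))  # True
```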


Seasonal Anomalies

The season_config dictionary keyword argument can be passed to .departures() for more granular configuration.
We will be sticking with the default settings.

[15]: season_departures = ds_monthly.temporal.departures(
          "tas",
          freq="season",
          weighted=True,
          season_config={"dec_mode": "DJF", "drop_incomplete_djf": True},
      )

[16]: season_departures.tas
[16]: <xarray.DataArray 'tas' (time: 1977, lat: 145, lon: 192)>
array([[[ 4.34326172, 4.34326172, 4.34326172, ..., 4.34326172,
4.34326172, 4.34326172],
[ 3.86720657, 3.8577919 , 3.85103607, ..., 3.88219452,
3.87863541, 3.87652969],
[ 4.2077713 , 4.16518402, 4.11819077, ..., 4.32532883,
4.28820419, 4.24903488],
...,
[-12.20487785, -12.16464233, -12.10724449, ..., -12.2838459 ,
-12.24758339, -12.22140503],
[-12.55296707, -12.54230881, -12.52475929, ..., -12.58355713,
-12.5769062 , -12.56380463],
[-12.51847076, -12.51847076, -12.51847076, ..., -12.51847076,
-12.51847076, -12.51847076]],

[[ -3.24290466, -3.24290466, -3.24290466, ..., -3.24290466,
-3.24290466, -3.24290466],
[ -3.34371567, -3.36968613, -3.38976288, ..., -3.2491684 ,
-3.28220749, -3.31663132],
[ -3.67527008, -3.71800995, -3.75824356, ..., -3.5437355 ,
-3.59187698, -3.63333511],
...
[ 0.38942909, 0.4767437 , 0.56199265, ..., 0.10713196,
0.19812012, 0.29603195],
[ -0.33409309, -0.26799774, -0.2053833 , ..., -0.50570869,
-0.47422981, -0.41047096],
[ -0.21318245, -0.21318245, -0.21318245, ..., -0.21318245,
-0.21318245, -0.21318245]],

[[ 10.28124619, 10.28124619, 10.28124619, ..., 10.28124619,
10.28124619, 10.28124619],
[ 9.61903381, 9.61942673, 9.61753845, ..., 9.61211777,
9.61251068, 9.61755753],
[ 9.96759415, 9.97954178, 9.98497772, ..., 9.94213486,
9.95424652, 9.96336365],
...,
[ 5.24849892, 5.38485527, 5.50936508, ..., 4.77218628,
4.9283905 , 5.09534836],
[ 4.4073925 , 4.46779633, 4.53607178, ..., 4.21307182,
4.27819633, 4.33395958],
[ 2.43467522, 2.43467522, 2.43467522, ..., 2.43467522,
2.43467522, 2.43467522]]])
Coordinates:
* time (time) datetime64[ns] 1850-03-16T12:00:00 1850-04-16 ... 2014-11-16
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 2.0
Attributes:
operation: temporal_avg
mode: departures
freq: season
weighted: True
dec_mode: DJF
drop_incomplete_djf: True

Custom Seasonal Anomalies

To calculate custom seasonal cycle anomalies, we must first define our custom seasons using the season_config
dictionary and the "custom_seasons" key.
"custom_seasons" must be a list of sublists containing month strings, with each sublist representing a custom season.
* Month strings must be in the three letter format (e.g., ‘Jan’) * Each month must be included once in a custom season
* Order of the months in each custom season does not matter * Custom seasons can vary in length
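The constraints above can be expressed as a small validation sketch (a hypothetical helper for illustration, not part of the xCDAT API):

```python
MONTHS = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}

def validate_custom_seasons(custom_seasons):
    """Check that every month appears exactly once across all seasons."""
    flat = [m for season in custom_seasons for m in season]
    if len(flat) != len(set(flat)):
        raise ValueError("Each month may only appear in one custom season.")
    if set(flat) != MONTHS:
        raise ValueError("All 12 months must be included exactly once.")
    return True

# Seasons may vary in length and month order does not matter.
print(validate_custom_seasons([
    ["Feb", "Jan"],                        # 2-month season
    ["Mar", "Apr", "May", "Jun", "Jul"],   # 5-month season
    ["Aug", "Sep", "Oct", "Nov", "Dec"],   # 5-month season
]))  # True
```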

[17]: custom_seasons = [
["Jan", "Feb", "Mar"], # "JanFebMar"
["Apr", "May", "Jun"], # "AprMayJun"
["Jul", "Aug", "Sep"], # "JulAugSep"
["Oct", "Nov", "Dec"], # "OctNovDec"
]
c_season_departs = ds_monthly.temporal.departures(
"tas",
freq="season",
weighted=True,
season_config={"custom_seasons": custom_seasons},
)

[18]: c_season_departs.tas
[18]: <xarray.DataArray 'tas' (time: 1980, lat: 145, lon: 192)>
array([[[ 1.15587234e+01, 1.15587234e+01, 1.15587234e+01, ...,
1.15587234e+01, 1.15587234e+01, 1.15587234e+01],
[ 1.06294823e+01, 1.06283112e+01, 1.06131477e+01, ...,
1.06636086e+01, 1.06529236e+01, 1.06392441e+01],
[ 1.07633209e+01, 1.07548561e+01, 1.07489014e+01, ...,
1.07870865e+01, 1.07799606e+01, 1.07722702e+01],
...,
[-3.13597870e+00, -3.16473389e+00, -3.20427704e+00, ...,
-3.06773376e+00, -3.09281540e+00, -3.11287689e+00],
[-3.20626831e+00, -3.22188187e+00, -3.23925400e+00, ...,
-3.15018845e+00, -3.16690826e+00, -3.19141769e+00],
[-2.88719559e+00, -2.88719559e+00, -2.88719559e+00, ...,
-2.88719559e+00, -2.88719559e+00, -2.88719559e+00]],

[[-7.04269409e-02, -7.04269409e-02, -7.04269409e-02, ...,
-7.04269409e-02, -7.04269409e-02, -7.04269409e-02],
[ 7.80868530e-03, -7.27844238e-03, -2.19535828e-02, ...,
5.26618958e-02, 3.81317139e-02, 2.33840942e-02],
[ 1.25839233e-01, 1.10904694e-01, 9.38415527e-02, ...,
1.70631409e-01, 1.54930115e-01, 1.39518738e-01],
...
[ 1.16304569e+01, 1.17625141e+01, 1.18839569e+01, ...,
1.11642685e+01, 1.13179016e+01, 1.14805965e+01],
[ 1.08624706e+01, 1.09229870e+01, 1.09914379e+01, ...,
1.06684208e+01, 1.07325172e+01, 1.07883625e+01],
[ 9.04502106e+00, 9.04502106e+00, 9.04502106e+00, ...,
9.04502106e+00, 9.04502106e+00, 9.04502106e+00]],

[[ 9.30399704e+00, 9.30399704e+00, 9.30399704e+00, ...,
9.30399704e+00, 9.30399704e+00, 9.30399704e+00],
[ 8.16454315e+00, 8.16766357e+00, 8.16603851e+00, ...,
8.15755844e+00, 8.16147614e+00, 8.16360474e+00],
[ 7.97850037e+00, 7.98074722e+00, 7.98074722e+00, ...,
7.96626663e+00, 7.96974945e+00, 7.97483063e+00],
...,
[ 3.42844009e-01, 4.04024124e-01, 4.52957153e-01, ...,
1.59294128e-01, 2.23571777e-01, 2.90761948e-01],
[ 4.41129684e-01, 4.34568405e-01, 4.77308273e-01, ...,
3.87384415e-01, 3.97815704e-01, 4.28163528e-01],
[ 6.65031433e-01, 6.65031433e-01, 6.65031433e-01, ...,
6.65031433e-01, 6.65031433e-01, 6.65031433e-01]]])
Coordinates:
* time (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 2.0
Attributes:
operation: temporal_avg
mode: departures
freq: season
weighted: True
custom_seasons: ['JanFebMar', 'AprMayJun', 'JulAugSep', 'OctNovDec']


Annual Anomalies

[19]: annual_departures = ds_monthly.temporal.departures("tas", freq="month", weighted=True)

[20]: annual_departures.tas
[20]: <xarray.DataArray 'tas' (time: 1980, lat: 145, lon: 192)>
array([[[ 1.02746201, 1.02746201, 1.02746201, ..., 1.02746201,
1.02746201, 1.02746201],
[ 1.19550705, 1.18815422, 1.16651344, ..., 1.25013351,
1.23219299, 1.2116642 ],
[ 1.46669388, 1.44287872, 1.42212868, ..., 1.53920364,
1.51576614, 1.49129677],
...,
[-3.27492523, -3.30706978, -3.3486805 , ..., -3.19853592,
-3.22591019, -3.24889755],
[-3.33357239, -3.35164261, -3.37155914, ..., -3.27233505,
-3.29122543, -3.31744385],
[-3.00098038, -3.00098038, -3.00098038, ..., -3.00098038,
-3.00098038, -3.00098038]],

[[-0.84364319, -0.84364319, -0.84364319, ..., -0.84364319,
-0.84364319, -0.84364319],
[-0.5828476 , -0.59806061, -0.61303711, ..., -0.53842926,
-0.55280304, -0.56754303],
[-0.48284912, -0.49863052, -0.51592255, ..., -0.43305969,
-0.45086288, -0.46794891],
...
[12.00904846, 12.1377697 , 12.2563324 , ..., 11.55307579,
11.70439339, 11.86277771],
[11.19392395, 11.2524662 , 11.31940842, ..., 11.00472641,
11.06731987, 11.12167549],
[ 9.3935051 , 9.3935051 , 9.3935051 , ..., 9.3935051 ,
9.3935051 , 9.3935051 ]],

[[-1.52723312, -1.52723312, -1.52723312, ..., -1.52723312,
-1.52723312, -1.52723312],
[-1.80300522, -1.80407143, -1.80956459, ..., -1.79553413,
-1.79644394, -1.79920959],
[-1.97462082, -1.97897911, -1.98453903, ..., -1.96074104,
-1.96693993, -1.97067261],
...,
[ 5.5448761 , 5.60659981, 5.6578846 , ..., 5.35509872,
5.42089844, 5.49032402],
[ 5.68313217, 5.67895317, 5.72336769, ..., 5.62318993,
5.63481903, 5.66713715],
[ 5.93663979, 5.93663979, 5.93663979, ..., 5.93663979,
5.93663979, 5.93663979]]])
Coordinates:
* time (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 2.0
Attributes:
operation: temporal_avg
mode: departures
freq: month
weighted: True

Daily Anomalies

Leap days (if present) are dropped if the CF calendar type is "gregorian", "proleptic_gregorian", or "standard".

[21]: daily_departures = ds_hourly.temporal.departures("tas", freq="day", weighted=True)

[22]: daily_departures.tas
[22]: <xarray.DataArray 'tas' (time: 14600, lat: 145, lon: 192)>
dask.array<getitem, shape=(14600, 145, 192), dtype=float64, chunksize=(8, 145, 192), chunktype=numpy.ndarray>
Coordinates:
* time (time) datetime64[ns] 2010-01-01T03:00:00 ... 2015-01-01
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 ...
Attributes:
operation: temporal_avg
mode: departures
freq: day
weighted: True

10.3.6 Horizontal Regridding

Author: Jason Boutte


Date: 09/26/22
Related APIs:
• xarray.Dataset.regridder.horizontal
The data used in this example can be found through the Earth System Grid Federation (ESGF) search portal.

Overview

We’ll cover horizontal regridding using the xESMF and Regrid2 tools as well as various methods supported by xESMF.
It should be noted that Regrid2 treats the grid cells as being flat.

[1]: %matplotlib inline

     import os
     import sys

     os.environ["ESMFMKFILE"] = sys.prefix + "/lib/esmf.mk"  # TODO remove after esmf>=8.5
     import xesmf

[2]: import matplotlib.pyplot as plt


import xarray as xr
import xcdat

1. Open the Dataset

We are using xarray’s OPeNDAP support to read a netCDF4 dataset file directly from its source. The data is not loaded
over the network until we perform operations on it (e.g., temperature unit adjustment).
More information on xarray's OPeNDAP support can be found here.

[3]: filepath = "http://aims3.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CCCma/CanESM5/historical/r13i1p1f1/Amon/tas/gn/v20190429/tas_Amon_CanESM5_historical_r13i1p1f1_gn_185001-201412.nc"

     ds = xcdat.open_dataset(filepath)

     # Unit adjust (-273.15, K to C)
     ds["tas"] = ds["tas"] - 273.15

     ds

[3]: <xarray.Dataset>
Dimensions: (time: 1980, bnds: 2, lat: 64, lon: 128)
Coordinates:
* time (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
* lat (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 82.31 85.1 87.86
* lon (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
height float64 2.0
Dimensions without coordinates: bnds
Data variables:
time_bnds (time, bnds) object ...
lat_bnds (lat, bnds) float64 ...
lon_bnds (lon, bnds) float64 ...
tas (time, lat, lon) float32 -25.04 -25.28 -25.49 ... -25.93 -25.73
Attributes: (12/54)
CCCma_model_hash: 7e8e715f3f2ce47e1bab830db971c362ca329419
CCCma_parent_runid: rc3.1-pictrl
CCCma_pycmor_hash: 33c30511acc319a98240633965a04ca99c26427e
CCCma_runid: rc3.1-his13
Conventions: CF-1.7 CMIP-6.2
YMDH_branch_time_in_child: 1850:01:01:00
... ...
variable_id: tas
variant_label: r13i1p1f1
version: v20190429
license: CMIP6 model data produced by The Governm...
cmor_version: 3.4.0
DODS_EXTRA.Unlimited_Dimension: time

2. Create the output grid

Related API: xcdat.create_gaussian_grid()


In this example, we will generate a Gaussian grid with 32 latitudes to regrid our input data to.
Alternatively a grid can be loaded from a file, e.g.

grid_urlpath = "http://aims3.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/abrupt-4xCO2/r1i1p1f1/day/tas/gr2/v20180701/tas_day_GFDL-CM4_abrupt-4xCO2_r1i1p1f1_gr2_00010101-00201231.nc"

grid_ds = xcdat.open_dataset(grid_urlpath)

output_grid = grid_ds.regridder.grid

Other related APIs available for creating grids: xcdat.create_grid() and xcdat.create_uniform_grid()
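Conceptually, a uniform grid is just evenly spaced cell centers with contiguous bounds; the NumPy sketch below illustrates that idea (illustrative only — use xcdat.create_uniform_grid for real grids, and note this is not its actual implementation):

```python
import numpy as np

def uniform_axis(start, stop, delta):
    """Cell centers from start to stop (inclusive) plus contiguous bounds."""
    centers = np.arange(start, stop + delta / 2, delta)
    edges = np.concatenate([[centers[0] - delta / 2], centers + delta / 2])
    bounds = np.column_stack([edges[:-1], edges[1:]])
    return centers, bounds

lat, lat_bnds = uniform_axis(-88.0, 88.0, 4.0)   # 45 latitude cells
lon, lon_bnds = uniform_axis(0.0, 356.0, 4.0)    # 90 longitude cells
print(lat.size, lon.size)  # 45 90
```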

[4]: output_grid = xcdat.create_gaussian_grid(32)

     fig, axes = plt.subplots(ncols=2, figsize=(16, 4))

     ds.regridder.grid.plot.scatter(x="lon", y="lat", s=4, ax=axes[0])
     axes[0].set_title("Input Grid")

     output_grid.plot.scatter(x="lon", y="lat", s=4, ax=axes[1])
     axes[1].set_title("Output Grid")

     plt.tight_layout()


3. Regrid the data

Related API: xarray.Dataset.regridder.horizontal()


Here we will regrid the input data to the output grid using the xESMF tool and the bilinear method.

[5]: output = ds.regridder.horizontal("tas", output_grid, tool="xesmf", method="bilinear")

     fig, axes = plt.subplots(ncols=2, figsize=(16, 4))

     ds.tas.isel(time=0).plot(ax=axes[0], vmin=-40, vmax=40, extend="both", cmap="RdBu_r")
     axes[0].set_title("Input data")

     output.tas.isel(time=0).plot(ax=axes[1], vmin=-40, vmax=40, extend="both", cmap="RdBu_r")
     axes[1].set_title("Output data")

     plt.tight_layout()

4. Regridding algorithms

Related API: xarray.Dataset.regridder.horizontal()


In this example, we will compare the different regridding methods supported by xESMF.
You can find a more in depth comparison on xESMF’s documentation.
Methods:
• bilinear
• conservative
• nearest_s2d
• nearest_d2s
• patch

[6]: methods = ["bilinear", "conservative", "nearest_s2d", "nearest_d2s", "patch"]

     fig, axes = plt.subplots(3, 2, figsize=(16, 12))
     axes = axes.flatten()

     for i, method in enumerate(methods):
         output = ds.regridder.horizontal("tas", output_grid, tool="xesmf", method=method)
         output.tas.isel(time=0).plot(ax=axes[i], vmin=-40, vmax=40, extend="both", cmap="RdBu_r")
         axes[i].set_title(method)

     axes[-1].set_visible(False)

     plt.tight_layout()

5. Masking

Related API: xarray.Dataset.regridder.horizontal()


xESMF supports masking by simply adding a data variable named mask to the Dataset.
See the xESMF documentation for additional details.

[7]: ds["mask"] = xr.where(ds.tas.isel(time=0) < -10, 1, 0)

(continues on next page)

86 Chapter 10. License


xCDAT Documentation, Release 0.6.0

(continued from previous page)


masked_output = ds.regridder.horizontal(
"tas", output_grid, tool="xesmf", method="bilinear"
)

fig, axes = plt.subplots(ncols=2, figsize=(18, 4))

ds["mask"].plot(ax=axes[0], cmap="binary_r")
axes[0].set_title("Mask")

masked_output.tas.isel(time=0).plot(ax=axes[1], vmin=-40, vmax=40, extend='both', cmap=


˓→'RdBu_r')

axes[1].set_title("Masked output")

plt.tight_layout()

6. Regridding using regrid2

Related API: xarray.Dataset.regridder.horizontal()


Regrid2 is a conservative regridder for rectilinear (lat/lon) grids originally from the cdutil package from CDAT.
This regridder assumes constant latitude lines when generating weights.
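A simplified 1-D sketch of the conservative idea behind Regrid2 — weights proportional to the overlap between source and destination cell bounds (illustrative only; the real regridder works on 2-D lat/lon cells and accounts for cell area):

```python
import numpy as np

def overlap_weights(src_bnds, dst_bnds):
    """Weight matrix W[i, j] = overlap length of src cell j within dst cell i."""
    lo = np.maximum(dst_bnds[:, 0, None], src_bnds[None, :, 0])
    hi = np.minimum(dst_bnds[:, 1, None], src_bnds[None, :, 1])
    return np.clip(hi - lo, 0.0, None)

src_bnds = np.column_stack([np.arange(0, 4), np.arange(1, 5)])  # 4 unit-width cells
dst_bnds = np.array([[0.0, 2.0], [2.0, 4.0]])                   # 2 double-width cells
w = overlap_weights(src_bnds, dst_bnds)

data = np.array([1.0, 3.0, 5.0, 7.0])
remapped = (w @ data) / w.sum(axis=1)   # overlap-weighted means
print(remapped)  # [2. 6.]
```

Because the weights are built from bounds overlap, the integral of the field (value times cell width) is conserved across the remap.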

[8]: output = ds.regridder.horizontal("tas", output_grid, tool="regrid2")

     fig, axes = plt.subplots(ncols=2, figsize=(16, 4))

     ds.tas.isel(time=0).plot(ax=axes[0], vmin=-40, vmax=40, extend="both", cmap="RdBu_r")
     output.tas.isel(time=0).plot(ax=axes[1], vmin=-40, vmax=40, extend="both", cmap="RdBu_r")
[8]: <matplotlib.collections.QuadMesh at 0x7ff3d84aa890>


10.3.7 Vertical Regridding

Authors: Jason Boutte and Jill Zhang


Date: 9/26/23
Related APIs:
• xarray.Dataset.regridder.vertical
The data used in this example can be found through the Earth System Grid Federation (ESGF) search portal. We are
using xarray’s OPeNDAP support to read a netCDF4 dataset file directly from its source. The data is not loaded over
the network until we perform operations on it (e.g., temperature unit adjustment). More information on xarray's
OPeNDAP support can be found here.
We’ll cover vertical regridding using xgcm. Two examples are outlined here to apply vertical regridding/remapping
using ocean variables and atmosphere variables, respectively.

Example 1: Remapping Ocean Variables

[1]: %matplotlib inline

import matplotlib.pyplot as plt


import xarray as xr
import xcdat
import numpy as np
# gsw_xarray is a wrapper for GSW-Python:
# the Python implementation of the Gibbs SeaWater (GSW) Oceanographic Toolbox of TEOS-10
import gsw_xarray as gsw

import warnings

warnings.filterwarnings("ignore")


1. Open dataset

[2]: # URLs for sea water potential temperature (thetao) and salinity (so) from
     # the NCAR model in CMIP6
     urls = [
         "http://aims3.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Omon/so/gn/v20190308/so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc",
         "http://aims3.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Omon/thetao/gn/v20190308/thetao_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc",
     ]

     ds = xr.merge([xcdat.open_dataset(x, chunks={"time": 4}) for x in urls])

     # The lev coordinate is in cm while its bounds are in m; convert lev to m.
     with xr.set_options(keep_attrs=True):
         ds.lev.load()
         ds["lev"] = ds.lev / 100
         ds.lev.attrs["units"] = "meters"

     ds
2023-09-26 15:45:48,860 [WARNING]: bounds.py(add_missing_bounds:186) >> The nlat coord variable has a 'units' attribute that is not in degrees.
2023-09-26 15:45:48,860 [WARNING]: bounds.py(add_missing_bounds:186) >> The nlat coord variable has a 'units' attribute that is not in degrees.
2023-09-26 15:45:48,966 [WARNING]: bounds.py(add_missing_bounds:186) >> The nlat coord variable has a 'units' attribute that is not in degrees.
2023-09-26 15:45:48,966 [WARNING]: bounds.py(add_missing_bounds:186) >> The nlat coord variable has a 'units' attribute that is not in degrees.
[2]: <xarray.Dataset>
Dimensions:    (lev: 60, nlat: 384, nlon: 320, time: 1980, d2: 2, vertices: 4, bnds: 2)
Coordinates:
  * lev        (lev) float64 5.0 15.0 25.0 ... 4.875e+03 5.125e+03 5.375e+03
  * nlat       (nlat) int32 1 2 3 4 5 6 7 8 ... 377 378 379 380 381 382 383 384
  * nlon       (nlon) int32 1 2 3 4 5 6 7 8 ... 313 314 315 316 317 318 319 320
  * time       (time) object 1850-01-15 13:00:00.000007 ... 2014-12-15 12:00:00
    lat        (nlat, nlon) float64 dask.array<chunksize=(384, 320), meta=np.ndarray>
    lon        (nlat, nlon) float64 dask.array<chunksize=(384, 320), meta=np.ndarray>
Dimensions without coordinates: d2, vertices, bnds
Data variables:
    time_bnds  (time, d2) object dask.array<chunksize=(4, 2), meta=np.ndarray>
    lat_bnds   (nlat, nlon, vertices) float32 dask.array<chunksize=(384, 320, 4), meta=np.ndarray>
    lon_bnds   (nlat, nlon, vertices) float32 dask.array<chunksize=(384, 320, 4), meta=np.ndarray>
    lev_bnds   (lev, d2) float32 dask.array<chunksize=(60, 2), meta=np.ndarray>
    so         (time, lev, nlat, nlon) float32 dask.array<chunksize=(4, 60, 384, 320), meta=np.ndarray>
    nlon_bnds  (nlon, bnds) float64 0.5 1.5 1.5 2.5 ... 318.5 319.5 319.5 320.5
    thetao     (time, lev, nlat, nlon) float32 dask.array<chunksize=(4, 60, 384, 320), meta=np.ndarray>


Attributes: (12/46)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
case_id: 15
cesm_casename: b.e21.BHIST.f09_g17.CMIP6-historical.001
contact: [email protected]
creation_date: 2019-01-16T23:15:40Z
... ...
sub_experiment_id: none
branch_time_in_parent: 219000.0
branch_time_in_child: 674885.0
branch_method: standard
further_info_url: https://furtherinfo.es-doc.org/CMIP6.NCA...
DODS_EXTRA.Unlimited_Dimension: time

2. Create the output grid

Related API: xcdat.create_grid()


In this example, we will generate a vertical grid with a linearly spaced level coordinate using xcdat.create_grid().
Alternatively, a grid can be loaded from a file, e.g.

grid_urlpath = "http://aims3.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/abrupt-4xCO2/r1i1p1f1/day/tas/gr2/v20180701/tas_day_GFDL-CM4_abrupt-4xCO2_r1i1p1f1_gr2_00010101-00201231.nc"

grid_ds = xcdat.open_dataset(grid_urlpath)

output_grid = grid_ds.regridder.grid

[3]: output_grid = xcdat.create_grid(
         z=xcdat.create_axis("lev", np.linspace(5, 537, 10))
     )

     output_grid
[3]: <xarray.Dataset>
Dimensions: (lev: 10, bnds: 2)
Coordinates:
* lev (lev) float64 5.0 64.11 123.2 182.3 ... 359.7 418.8 477.9 537.0
Dimensions without coordinates: bnds
Data variables:
lev_bnds (lev, bnds) float64 -24.56 34.56 34.56 93.67 ... 507.4 507.4 566.6


3. Regridding using the linear method

Related API: xarray.Dataset.regridder.vertical()


Here we will regrid the input data to the output grid using the xgcm tool and the linear method.
We’ll interpolate salinity onto the new vertical grid.

[4]: output = ds.regridder.vertical("so", output_grid, tool="xgcm", method="linear")

     output.so.isel(time=0).mean(dim="nlon").plot()
     plt.gca().invert_yaxis()
2023-09-26 15:45:50,454 [WARNING]: bounds.py(add_missing_bounds:186) >> The nlat coord variable has a 'units' attribute that is not in degrees.
2023-09-26 15:45:50,454 [WARNING]: bounds.py(add_missing_bounds:186) >> The nlat coord variable has a 'units' attribute that is not in degrees.


4. Regridding from depth to density space

Related API: xarray.Dataset.regridder.vertical()


Here we will regrid the input data to the output grid using the xgcm tool and the linear method.
We’ll remap salinity into density space.

[5]: # Apply the gsw function to calculate potential density from potential
     # temperature (thetao) and salinity (so).
     ds["dens"] = gsw.sigma0(ds.so, ds.thetao)

     ds.dens.isel(time=0).mean(dim="nlon").plot()
     plt.gca().invert_yaxis()

[6]: density_grid = xcdat.create_grid(
         z=xcdat.create_axis("lev", np.linspace(6, 26, 40))
     )

     output = ds.regridder.vertical("so", density_grid, tool="xgcm", method="linear", target_data="dens")

     output.so.isel(time=0).mean(dim="nlon").plot()
     plt.gca().invert_yaxis()


2023-09-26 15:46:53,807 [WARNING]: bounds.py(add_missing_bounds:186) >> The nlat coord variable has a 'units' attribute that is not in degrees.
2023-09-26 15:46:53,807 [WARNING]: bounds.py(add_missing_bounds:186) >> The nlat coord variable has a 'units' attribute that is not in degrees.

5. Regridding using the conservative method

Related API: xarray.Dataset.regridder.vertical()


Here we will regrid the input data to the output grid using the xgcm tool and the conservative method.
We'll transform model levels using conservative regridding. In order to perform the regridding we'll need two grid
positions; the lev coordinate is the center, and we'll create the outer points using cf_xarray's bounds_to_vertices.

[7]: ds_olev = ds.cf.bounds_to_vertices("lev").rename({"lev_vertices": "olev"})

     output = ds_olev.regridder.vertical("so", output_grid, tool="xgcm", method="conservative", grid_positions={"center": "lev", "outer": "olev"})

     output.so.isel(time=0).sel(lev=0, method="nearest").plot()
2023-09-26 15:47:50,128 [WARNING]: bounds.py(add_missing_bounds:186) >> The nlat coord variable has a 'units' attribute that is not in degrees.
2023-09-26 15:47:50,128 [WARNING]: bounds.py(add_missing_bounds:186) >> The nlat coord variable has a 'units' attribute that is not in degrees.
[7]: <matplotlib.collections.QuadMesh at 0x1875e1a10>
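The center/outer relationship that bounds_to_vertices provides can be sketched in NumPy: for contiguous bounds of shape (N, 2), the N+1 outer vertices are the left edges plus the final right edge (illustrative sketch only — use cf_xarray on real data):

```python
import numpy as np

def bounds_to_vertices_1d(bnds):
    """(N, 2) contiguous cell bounds -> (N + 1,) outer vertices."""
    return np.concatenate([bnds[:, 0], bnds[-1:, 1]])

lev_bnds = np.array([[0.0, 10.0], [10.0, 25.0], [25.0, 50.0]])
olev = bounds_to_vertices_1d(lev_bnds)
print(olev)  # [ 0. 10. 25. 50.]
```

Conservative regridding needs both positions because it integrates each source cell between its outer vertices before redistributing to the destination cells.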

Example 2: Remapping Atmosphere Variables

1. Open dataset

[8]: # URLs of data from the E3SM model in CMIP6
     url_ta = "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/CMIP6/CMIP/E3SM-Project/E3SM-2-0/historical/r1i1p1f1/Amon/ta/gr/v20220830/ta_Amon_E3SM-2-0_historical_r1i1p1f1_gr_185001-189912.nc"
     url_cl = "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/CMIP6/CMIP/E3SM-Project/E3SM-2-0/historical/r1i1p1f1/Amon/cl/gr/v20220830/cl_Amon_E3SM-2-0_historical_r1i1p1f1_gr_185001-189912.nc"

     ds_ta = xcdat.open_dataset(url_ta, chunks={"time": 4}, add_bounds=["Z"])
     ds_cl = xcdat.open_dataset(url_cl, chunks={"time": 4})


2. Create the output grid

Related API: xcdat.create_grid()


In this example, we will generate a grid with a linearly spaced level coordinate.

[9]: output_grid = xcdat.create_grid(
         z=xcdat.create_axis("lev", np.linspace(100000, 1, 13))
     )

     output_grid
[9]: <xarray.Dataset>
Dimensions: (lev: 13, bnds: 2)
Coordinates:
* lev (lev) float64 1e+05 9.167e+04 8.333e+04 ... 8.334e+03 1.0
Dimensions without coordinates: bnds
Data variables:
lev_bnds (lev, bnds) float64 1.042e+05 9.583e+04 ... 4.168e+03 -4.166e+03

3. Remapping air temperature on pressure levels to a set of target pressure levels.

Related API: xarray.Dataset.regridder.vertical()


Here we will regrid the input data to the output grid using the xgcm tool and the log method.
We’ll remap pressure levels.

[10]: # Remap from the original pressure levels to the target pressure levels
      # using logarithmic interpolation.
      # Note: output grids can be either ascending or descending.
      output_ta = ds_ta.regridder.vertical("ta", output_grid, method="log")

      output_ta.ta.isel(time=0, lev=0).plot()
[10]: <matplotlib.collections.QuadMesh at 0x188277b10>
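The log method interpolates in log-pressure rather than pressure, which suits the roughly exponential decrease of pressure with height; a minimal NumPy sketch of the idea (not xgcm's actual implementation):

```python
import numpy as np

def log_interp(p_new, p_src, values):
    """Interpolate values defined at pressures p_src onto p_new in log(p)."""
    # np.interp needs ascending x, so sort by log-pressure first.
    order = np.argsort(np.log(p_src))
    return np.interp(np.log(p_new), np.log(p_src)[order], values[order])

p_src = np.array([100000.0, 50000.0, 10000.0])   # Pa (hypothetical levels)
ta = np.array([290.0, 250.0, 210.0])             # K (hypothetical values)
print(log_interp(np.array([70000.0]), p_src, ta))
```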


4. Remap cloud fraction from model hybrid coordinate to pressure levels

Related API: xarray.Dataset.regridder.vertical()


Here we will regrid the input data to the output grid using the xgcm tool and the linear method.
We’ll remap cloud fraction into pressure space.

[11]: # Build the hybrid pressure coordinate.
      def hybrid_coordinate(p0, a, b, ps, **kwargs):
          return a*p0 + b*ps

      pressure = hybrid_coordinate(**ds_cl.data_vars)

      pressure
[11]: <xarray.DataArray (lev: 72, time: 600, lat: 180, lon: 360)>
dask.array<add, shape=(72, 600, 180, 360), dtype=float64, chunksize=(72, 4, 180, 360), chunktype=numpy.ndarray>

Coordinates:
* lev (lev) float64 0.9985 0.9938 0.9862 ... 0.0001828 0.0001238
* time (time) object 1850-01-16 12:00:00 ... 1899-12-16 12:00:00
* lat (lat) float64 -89.5 -88.5 -87.5 -86.5 -85.5 ... 86.5 87.5 88.5 89.5
* lon (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 355.5 356.5 357.5 358.5 359.5
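A quick numeric sanity check of the hybrid formula p = a·p0 + b·ps: near the surface b ≈ 1 and a ≈ 0, so the level pressure reduces to the surface pressure, while near the model top the a·p0 term dominates (the coefficient and pressure values below are purely illustrative):

```python
p0 = 100000.0  # reference pressure, Pa
ps = 98500.0   # surface pressure, Pa (hypothetical value)

# Near the surface: b ~ 1, a ~ 0 -> p ~ ps.
p_surface = 0.0 * p0 + 1.0 * ps
# Near the model top: a dominates and b ~ 0.
p_top = 0.002 * p0 + 0.0 * ps

print(p_surface, p_top)  # 98500.0 200.0
```

This is why the hybrid coordinate follows the terrain near the ground but becomes a pure pressure coordinate aloft.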


[12]: new_pressure_grid = xcdat.create_grid(
          z=xcdat.create_axis("lev", np.linspace(100000, 1, 13))
      )

      output_cl = ds_cl.regridder.vertical("cl", new_pressure_grid, method="linear", target_data=pressure)

      output_cl.cl.isel(time=0, lev=0).plot()
[12]: <matplotlib.collections.QuadMesh at 0x19974eed0>

[13]: output_cl.cl.isel(time=0).mean(dim='lon').plot()
plt.gca().invert_yaxis()



10.4 Presentations and Demos

This page includes relevant xCDAT presentations, demos, and papers.


10.4.1 LLNL Climate and Weather Seminar Series (01/25/2023) - A Gentle Introduction to xCDAT

“A Python package for simple and robust climate data analysis.”


Core Developers: Tom Vo, Stephen Po-Chedley, Jason Boutte, Jill Zhang, Jiwoo Lee
With thanks to Peter Gleckler, Paul Durack, Karl Taylor, and Chris Golaz

This work is performed under the auspices of the U. S. DOE by Lawrence Livermore National Laboratory under contract
No. DE-AC52-07NA27344.

Presentation Overview

Intended audience: Some or no familiarity with xarray and/or xcdat


1. Driving force behind xCDAT
2. Goals and milestones of CDAT’s successor
3. Introducing xCDAT
4. Understanding the basics of Xarray
5. How xCDAT extends Xarray for climate data analysis
6. Technical design philosophy and APIs
7. Demo of capabilities
8. How to get involved

Notebook Setup

Create an Anaconda environment for this notebook using the command below:

conda create -n xcdat -c conda-forge xarray xcdat xesmf matplotlib nc-time-axis jupyter

• xesmf is required for horizontal regridding with xESMF


• matplotlib is an optional dependency required for plotting with xarray
• nc-time-axis is an optional dependency required for matplotlib to plot cftime coordinates

The Driving Force Behind xCDAT

• The CDAT (Community Data Analysis Tools) library has provided a suite of robust and comprehensive open-source climate data analysis and visualization packages for over 20 years
• A driving need for a modern successor
– Focus on a maintainable and extensible library
– Serve the needs of the climate community in the long-term


Goals and Milestones for CDAT’s Successor

1. Offer similar core capabilities
   1. For example, geospatial averaging, temporal averaging, and regridding
2. Use modern technologies in the library’s stack
   1. Support parallelism and lazy operations
3. Be maintainable, extensible, and easy-to-use
   1. Python Enhancement Proposals (PEPs)
   2. Automate DevOps processes (unit testing, code coverage)
   3. Actively maintain documentation
4. Cultivate an open-source community that can sustain the project
   1. Encourage GitHub contributions
   2. Community engagement efforts (e.g., Pangeo, ESGF)


Introducing xCDAT

• xCDAT is an extension of xarray for climate data analysis on structured grids


• Goal of providing features and utilities for simple and robust analysis of climate data
• Jointly developed by scientists and developers from:
– E3SM Project (Energy Exascale Earth System Model Project)
– PCMDI (Program for Climate Model Diagnosis and Intercomparison)
– SEATS Project (Simplifying ESM Analysis Through Standards Project)
– Users around the world via GitHub


Before We Dive Deeper, Let’s Talk About Xarray

• Xarray is an evolution of an internal tool developed at The Climate Corporation


• Released as open source in May 2014
• NumFocus fiscally sponsored project since August 2018


Key Features and Capabilities in Xarray

• “N-D labeled arrays and datasets in Python”


– Built upon and extends NumPy and pandas
• Interoperable with scientific Python ecosystem including NumPy, Dask, Pandas, and Matplotlib
• Supports file I/O, indexing and selecting, interpolating, grouping, aggregating, parallelism (Dask), and plotting (matplotlib wrapper)
– Supported formats include: netCDF, Iris, OPeNDAP, Zarr, and GRIB
Source: https://xarray.dev/#features

Why use Xarray?

“Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-
like multidimensional arrays, which allows for a more intuitive, more concise, and less error-prone
developer experience.”
—https://xarray.pydata.org/en/v2022.10.0/getting-started-guide/why-xarray.html
• Apply operations over dimensions by name
– x.sum('time')
• Select values by label (or logical location) instead of integer location
– x.loc['2014-01-01'] or x.sel(time='2014-01-01')
• Mathematical operations vectorize across multiple dimensions (array broadcasting) based on dimension
names, not shape
– x - y
• Easily use the split-apply-combine paradigm with groupby
– x.groupby('time.dayofyear').mean()
• Database-like alignment based on coordinate labels that smoothly handles missing values
– x, y = xr.align(x, y, join='outer')
• Keep track of arbitrary metadata in the form of a Python dictionary
– x.attrs
Source: https://docs.xarray.dev/en/v2022.10.0/getting-started-guide/why-xarray.html#what-labels-enable
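These labeled operations can be sketched end-to-end with a small, made-up array (the values, coordinate names, and dates below are invented for illustration):

```python
import pandas as pd
import xarray as xr

# A made-up 3-day, 2-site temperature array with labeled dimensions.
times = pd.date_range("2014-01-01", periods=3)
x = xr.DataArray(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    coords={"time": times, "site": ["a", "b"]},
    dims=["time", "site"],
)

# Apply operations over dimensions by name.
total = x.sum("time")

# Select values by label instead of integer location.
first_day = x.sel(time="2014-01-01")

# Split-apply-combine with groupby.
daily_mean = x.groupby("time.dayofyear").mean()

print(total.values.tolist())      # [9.0, 12.0]
print(first_day.values.tolist())  # [1.0, 2.0]
```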

The Xarray Data Models

“Xarray data models are borrowed from netCDF file format, which provides xarray with a natural and
portable serialization format.”
—https://docs.xarray.dev/en/v2022.10.0/getting-started-guide/why-xarray.html
1. ``xarray.Dataset``
• A dictionary-like container of DataArray objects with aligned dimensions
– DataArray objects are classified as “coordinate variables” or “data variables”
– All data variables have a shared union of coordinates


• Serves a similar purpose to a pandas.DataFrame


2. ``xarray.DataArray``
• A class that attaches dimension names, coordinates, and attributes to multi-dimensional arrays (aka
“labeled arrays”)
• An N-D generalization of a pandas.Series

Exploring the Xarray Data Models

Example dataset: tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc


• Open up this real dataset from ESGF using xarray’s OPeNDAP support.
– Contains the tas variable (near-surface air temperature) recorded on a monthly frequency
• It is not downloaded until calculations/computations are performed on the Dataset object
– Example of an xarray lazy operation

[1]: # This style import is necessary to properly render Xarray's HTML output with
# the Jupyter RISE extension.
# GitHub Issue: https://github.com/damianavila/RISE/issues/594
# Source: https://github.com/smartass101/xarray-pydata-prague-2020/blob/main/rise.css

from IPython.core.display import HTML

style = """
<style>
.reveal pre.xr-text-repr-fallback {
display: none;
}
.reveal ul.xr-sections {
display: grid
}

.reveal ul ul.xr-var-list {
display: contents
}
</style>
"""

HTML(style)
[1]: <IPython.core.display.HTML object>

[2]: import xarray as xr

filepath = "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-
˓→ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_

˓→r10i1p1f1_gn_185001-201412.nc"

ds = xr.open_dataset(filepath)


The Dataset Model

[3]: ds
[3]: <xarray.Dataset>
Dimensions: (time: 1980, bnds: 2, lat: 145, lon: 192)
Coordinates:
* time (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
height float64 ...
Dimensions without coordinates: bnds
Data variables:
time_bnds (time, bnds) datetime64[ns] ...
lat_bnds (lat, bnds) float64 ...
lon_bnds (lon, bnds) float64 ...
tas (time, lat, lon) float32 ...
Attributes: (12/48)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 87658.0
creation_date: 2020-06-05T04:06:11Z
... ...
variant_label: r10i1p1f1
version: v20200605
license: CMIP6 model data produced by CSIRO is li...
cmor_version: 3.4.0
tracking_id: hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
DODS_EXTRA.Unlimited_Dimension: time

A dictionary-like container of labeled arrays (DataArray objects) with aligned dimensions.


Key properties:
• dims: a dictionary mapping from dimension names to the fixed length of each dimension (e.g., {‘x’: 6, ‘y’:
6, ‘time’: 8})
• coords: a dict-like container of DataArrays intended to label points used in ``data_vars`` (e.g., arrays of
numbers, datetime objects or strings)
• data_vars: a dict-like container of DataArrays corresponding to variables
• attrs: dict to hold arbitrary metadata
Source: https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataset
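As a minimal sketch, these properties can be inspected on a small, made-up Dataset (the variable and coordinate names below are invented):

```python
import numpy as np
import xarray as xr

# A tiny, made-up Dataset illustrating the four key properties.
ds = xr.Dataset(
    data_vars={"tas": (("time", "lat"), np.zeros((2, 3)))},
    coords={"time": [0, 1], "lat": [-45.0, 0.0, 45.0]},
    attrs={"title": "toy example"},
)

print(dict(ds.sizes))      # {'time': 2, 'lat': 3} (dimension name -> length)
print(list(ds.coords))     # ['time', 'lat']
print(list(ds.data_vars))  # ['tas']
print(ds.attrs["title"])   # toy example
```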


The DataArray Model

[4]: ds.tas
[4]: <xarray.DataArray 'tas' (time: 1980, lat: 145, lon: 192)>
[55123200 values with dtype=float32]
Coordinates:
* time (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 ...
Attributes:
standard_name: air_temperature
long_name: Near-Surface Air Temperature
comment: near-surface (usually, 2 meter) air temperature
units: K
cell_methods: area: time: mean
cell_measures: area: areacella
history: 2020-06-05T04:06:10Z altered by CMOR: Treated scalar dime...
_ChunkSizes: [ 1 145 192]

A class that attaches dimension names, coordinates, and attributes to multi-dimensional arrays (aka “labeled
arrays”)
Key properties:
• values: a numpy.ndarray holding the array’s values
• dims: dimension names for each axis (e.g., (‘x’, ‘y’, ‘z’))
• coords: a dict-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of
numbers, datetime objects or strings)
• attrs: dict to hold arbitrary metadata (attributes)
Source: https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataarray

Resources for Learning Xarray

• Here are some highly recommended resources:


– Xarray Tutorial
– “Xarray in 45 minutes”
– Xarray Documentation
– Xarray API Reference


xCDAT Extends Xarray for Climate Data Analysis

• Some key xCDAT features are inspired by or ported from the core CDAT library
– e.g., spatial averaging, temporal averaging, regrid2 for horizontal regridding
• Other features leverage powerful libraries in the xarray ecosystem
– xESMF for horizontal regridding
– xgcm for vertical interpolation
– CF-xarray for CF convention metadata interpretation
• xCDAT strives to support CF compliant datasets as well as common non-CF compliant metadata (e.g., time units in
“months since . . . ” or “years since . . . ”)
• Inherent support for lazy operations and parallelism through xarray + dask

The Technical Design Philosophy

• Streamline the user experience of developing code to analyze climate data


• Reduce the complexity and overhead for implementing certain features with xarray (e.g., temporal averaging,
spatial averaging)
• Encourage reusable functionalities through a single library


Leveraging the APIs

xCDAT provides public APIs in two ways:


1. Top-level API functions
• e.g., xcdat.open_dataset(), xcdat.center_times()
• Usually for opening datasets and performing dataset level operations
2. Accessor classes
• xcdat provides Dataset accessors, which are implicit namespaces for custom functionality.
• Accessor namespaces clearly identify separation from built-in xarray methods.
• Operate on variables within the xr.Dataset
• e.g., ds.spatial, ds.temporal, ds.regridder

xcdat spatial functionality is exposed by chaining the .spatial accessor attribute to the xr.Dataset object.
Source: https://xcdat.readthedocs.io/en/latest/api.html

Key Features in xCDAT

• Extend xr.open_dataset() and xr.open_mfdataset()
– APIs: open_dataset(), open_mfdataset()
– Bounds generation
– Time decoding (CF and select non-CF time units)
– Centering of time coordinates
– Conversion of longitudinal axis orientation
• Temporal averaging
– APIs: ds.temporal.average(), ds.temporal.group_average(), ds.temporal.climatology(), ds.temporal.departures()
– Single snapshot and group average
– Climatology and departure
– Weighted or unweighted
– Optional seasonal configuration (e.g., custom seasons)
• Geospatial averaging
– API: ds.spatial.average()
– Rectilinear grids
– Weighted
– Optional specification of region domain
• Horizontal regridding
– API: ds.regridder.horizontal()
– Rectilinear and curvilinear grids
– Extends xESMF horizontal regridding
– Python implementation of regrid2
• Vertical regridding
– API: ds.regridder.vertical()
– Transforms vertical coordinates
– Extends xgcm vertical interpolation
– Linear, logarithmic, and conservative interpolation
– Decodes parametric vertical coordinates if required

A Demo of xCDAT Capabilities

• Prerequisites
– Installing xcdat
– Import xcdat
– Open a dataset and apply postprocessing operations
• Scenario 1 - Calculate the spatial averages over the tropical region
• Scenario 2 - Calculate temporal average
• Scenario 3 - Horizontal regridding (bilinear, gaussian grid)

Installing xcdat

xCDAT is available on Anaconda under the conda-forge channel (https://anaconda.org/conda-forge/xcdat)


Two ways to install xcdat with recommended dependencies (xesmf):
1. Create a conda environment from scratch (conda create)

conda create -n <ENV_NAME> -c conda-forge xcdat xesmf


conda activate <ENV_NAME>

2. Install xcdat in an existing conda environment (conda install)

conda activate <ENV_NAME>


conda install -c conda-forge xcdat xesmf

Source: https://xcdat.readthedocs.io/en/latest/getting-started.html


Opening a dataset

Example dataset: tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc (same as before)

[5]: # This gives access to all xcdat public top-level APIs and accessor classes.
import xcdat as xc

# We import these packages specifically for plotting. It is not required to use xcdat.
import matplotlib.pyplot as plt
import pandas as pd

filepath = "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-
˓→ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_

˓→r10i1p1f1_gn_185001-201412.nc"

ds = xc.open_dataset(
filepath,
add_bounds=True,
decode_times=True,
center_times=True
)

# Unit adjustment from Kelvin to Celsius.
ds["tas"] = ds.tas - 273.15

[6]: ds
[6]: <xarray.Dataset>
Dimensions: (time: 1980, bnds: 2, lat: 145, lon: 192)
Coordinates:
* time (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
height float64 2.0
Dimensions without coordinates: bnds
Data variables:
time_bnds (time, bnds) object ...
lat_bnds (lat, bnds) float64 ...
lon_bnds (lon, bnds) float64 ...
tas (time, lat, lon) float32 -27.19 -27.19 -27.19 ... -25.29 -25.29
Attributes: (12/48)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 87658.0
creation_date: 2020-06-05T04:06:11Z
... ...
variant_label: r10i1p1f1
version: v20200605
license: CMIP6 model data produced by CSIRO is li...
cmor_version: 3.4.0
tracking_id: hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
DODS_EXTRA.Unlimited_Dimension: time


Scenario 1: Spatial Averaging

Related accessor: ds.spatial


In this example, we calculate the spatial average of tas over the tropical region and plot the first 100 time steps.

[7]: ds_trop_avg = ds.spatial.average("tas", axis=["X","Y"], lat_bounds=(-25,25))


ds_trop_avg.tas
[7]: <xarray.DataArray 'tas' (time: 1980)>
array([25.24722608, 25.61795924, 25.96516235, ..., 26.79536823,
26.67771602, 26.27182383])
Coordinates:
* time (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
height float64 2.0

[8]: ds_trop_avg.tas.isel(time=slice(1, 100)).plot()


[8]: [<matplotlib.lines.Line2D at 0x1441e60b0>]


Scenario 2: Calculate temporal average

Related accessor: ds.temporal


In this example, we calculate the temporal average of tas as a single snapshot (the time dimension is collapsed).

[9]: ds_avg = ds.temporal.average("tas", weighted=True)


ds_avg.tas
[9]: <xarray.DataArray 'tas' (lat: 145, lon: 192)>
array([[-48.01481628, -48.01481628, -48.01481628, ..., -48.01481628,
-48.01481628, -48.01481628],
[-44.94085363, -44.97948214, -45.01815398, ..., -44.82408252,
-44.86273067, -44.9009281 ],
[-44.11875274, -44.23060624, -44.33960158, ..., -43.76766492,
-43.88593717, -44.00303006],
...,
[-18.21076615, -18.17513373, -18.13957458, ..., -18.32720478,
-18.28428828, -18.2486193 ],
[-18.50778243, -18.49301854, -18.47902819, ..., -18.55410851,
-18.5406963 , -18.52413098],
[-19.07366375, -19.07366375, -19.07366375, ..., -19.07366375,
-19.07366375, -19.07366375]])
Coordinates:
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
height float64 2.0
Attributes:
operation: temporal_avg
mode: average
freq: month
weighted: True

[10]: ds_avg.tas.plot(label="weighted")
[10]: <matplotlib.collections.QuadMesh at 0x1443d9f30>


Scenario 3: Horizontal Regridding

Related accessor: ds.regridder


In this example, we will generate a Gaussian grid with 32 latitudes to regrid our input data to.

Create the output grid

[11]: output_grid = xc.create_gaussian_grid(32)


output_grid
[11]: <xarray.Dataset>
Dimensions: (lat: 32, bnds: 2, lon: 65)
Coordinates:
* lat (lat) float64 85.76 80.27 74.74 69.21 ... -74.74 -80.27 -85.76
* lon (lon) float64 0.0 5.625 11.25 16.88 ... 343.1 348.8 354.4 360.0
Dimensions without coordinates: bnds
Data variables:
lat_bnds (lat, bnds) float64 90.0 83.21 83.21 77.61 ... -83.21 -83.21 -90.0
lon_bnds (lon, bnds) float64 -2.812 2.812 2.812 8.438 ... 357.2 357.2 362.8


Plot the Input vs. Output Grid

[12]: fig, axes = plt.subplots(ncols=2, figsize=(16, 6))

input_grid = ds.regridder.grid
input_grid.plot.scatter(x='lon', y='lat', s=5, ax=axes[0], add_colorbar=False, cmap=plt.cm.RdBu)
axes[0].set_title('Input Grid')

output_grid.plot.scatter(x='lon', y='lat', s=5, ax=axes[1], add_colorbar=False, cmap=plt.cm.RdBu)
axes[1].set_title('Output Grid')

plt.tight_layout()

Regrid the data

xCDAT offers horizontal regridding with xESMF (default) and a Python port of regrid2. We will be using xESMF to
regrid.

[13]: # xesmf supports "bilinear", "conservative", "nearest_s2d", "nearest_d2s", and "patch"


output = ds.regridder.horizontal('tas', output_grid, tool='xesmf', method='bilinear')

[14]: fig, axes = plt.subplots(ncols=2, figsize=(16, 4))

ds.tas.isel(time=0).plot(ax=axes[0])
axes[0].set_title('Input data')

output.tas.isel(time=0).plot(ax=axes[1])
axes[1].set_title('Output data')

plt.tight_layout()


Parallelism with Dask

Nearly all existing xarray methods have been extended to work automatically with Dask arrays for parallelism
—https://docs.xarray.dev/en/stable/user-guide/dask.html#using-dask-with-xarray
• Parallelized xarray methods include indexing, computation, concatenating and grouped operations
• xCDAT APIs that build upon xarray methods inherently support Dask parallelism
– Dask arrays are loaded into memory only when absolutely required (e.g., generating weights for averaging)

[15]: filepath = "http://esgf.nci.org.au/thredds/dodsC/master/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/


˓→historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_

˓→gn_185001-201412.nc"

# Use the `chunks` argument to activate Dask arrays
# NOTE: `open_mfdataset()` automatically chunks by the number of files, which
# might not be optimal.
ds = xc.open_dataset(
filepath,
chunks={"time": "auto"}
)
ds
[15]: <xarray.Dataset>
Dimensions: (time: 1980, bnds: 2, lat: 145, lon: 192)
Coordinates:
* time (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
* lon (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
height float64 ...
Dimensions without coordinates: bnds
Data variables:
time_bnds (time, bnds) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
lat_bnds (lat, bnds) float64 dask.array<chunksize=(145, 2), meta=np.ndarray>
lon_bnds (lon, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
tas (time, lat, lon) float32 dask.array<chunksize=(1205, 145, 192), meta=np.ndarray>

Attributes: (12/49)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 87658.0
creation_date: 2020-06-05T04:06:11Z
... ...
version: v20200605
license: CMIP6 model data produced by CSIRO is li...
cmor_version: 3.4.0
_NCProperties: version=2,netcdf=4.6.2,hdf5=1.10.5
tracking_id: hdl:21.14100/af78ae5e-f3a6-4e99-8cfe-5f2...
DODS_EXTRA.Unlimited_Dimension: time

Further Dask Guidance

Visit these pages for more guidance (e.g., when to parallelize):


• Ongoing xCDAT Dask Investigation: https://github.com/xCDAT/xcdat/discussions/376
– Performance metrics, best practices, and possibly a guide
• Parallel computing with Dask: https://docs.xarray.dev/en/stable/user-guide/dask.html
• Xarray with Dask Arrays: https://examples.dask.org/xarray.html

Key Takeaways

• A driving need for a modern successor to CDAT


• Serves the climate community in the long-term
• xCDAT is an extension of xarray for climate data analysis on structured grids
• Goal of providing features and utilities for simple and robust analysis of climate data


Where to Find xCDAT

• xCDAT is available for installation through Anaconda


– Install command: ``conda install -c conda-forge xcdat xesmf``
• Check out xCDAT’s Read the Docs, which we strive to keep up-to-date
– https://xcdat.readthedocs.io/en/stable/


Get Involved on GitHub!

• Code contributions are welcome and appreciated


– GitHub Repository: https://github.com/xCDAT/xcdat
– Contributing Guide: https://xcdat.readthedocs.io/en/latest/contributing.html
• Submit and/or address tickets for feature suggestions, bugs, and documentation updates
– GitHub Issues: https://github.com/xCDAT/xcdat/issues
• Participate in forum discussions on version releases, architecture, feature suggestions, etc.
– GitHub Discussions: https://github.com/xCDAT/xcdat/discussions

10.5 API Reference

10.5.1 Overview

Most public xcdat APIs operate on xarray.Dataset objects. xcdat follows this design pattern because coordinate
variable bounds are often required to perform robust calculations. Currently, coordinate variable bounds can only be
stored on Dataset objects and not DataArray objects. Refer to this issue for more information.

10.5.2 Top-level API Functions

Below is a list of top-level API functions that are available in xcdat.


open_dataset(path[, data_var, add_bounds, ...]): Wraps xarray.open_dataset() with post-processing options.
open_mfdataset(paths[, data_var, ...]): Wraps xarray.open_mfdataset() with post-processing options.
center_times(dataset): Centers time coordinates using the midpoint between time bounds.
decode_time(dataset): Decodes CF and non-CF time coordinates and time bounds using cftime.
swap_lon_axis(dataset, to[, sort_ascending]): Swaps the orientation of a dataset's longitude axis.
compare_datasets(ds1, ds2): Compares the keys and values of two datasets.
get_dim_coords(obj, axis): Gets the dimension coordinates for an axis.
get_dim_keys(obj, axis): Gets the dimension key(s) for an axis.
create_axis(name, data[, bounds, ...]): Creates an axis and optional bounds.
create_gaussian_grid(nlats): Creates a grid with Gaussian latitudes and uniform longitudes.
create_global_mean_grid(grid): Creates a global mean grid.
create_grid([x, y, z, attrs]): Creates a grid dataset using the specified axes.
create_uniform_grid(lat_start, lat_stop, ...): Creates a uniform rectilinear grid and sets the appropriate attributes for the lat/lon axes.
create_zonal_grid(grid): Creates a zonal grid.

xcdat.open_dataset

xcdat.open_dataset(path, data_var=None, add_bounds=['X', 'Y'], decode_times=True, center_times=False, lon_orient=None, **kwargs)
Wraps xarray.open_dataset() with post-processing options.
Deprecated since version v0.6.0: add_bounds boolean arguments (True/False) are being deprecated. Please use
either a list (e.g., [“X”, “Y”]) to specify axes or None.
Parameters
• path (str, Path, file-like or DataStore) – Strings and Path objects are interpreted as
a path to a netCDF file or an OpenDAP URL and opened with python-netCDF4, unless the
filename ends with .gz, in which case the file is gunzipped and opened with scipy.io.netcdf
(only netCDF3 supported). Byte-strings or file-like objects are opened by scipy.io.netcdf
(netCDF3) or h5py (netCDF4/HDF).
• data_var (Optional[str], optional) – The key of the non-bounds data variable to keep in
the Dataset, alongside any existing bounds data variables, by default None.
• add_bounds (List[CFAxisKey] | None | bool) – List of CF axes to try to add bounds
for (if missing), by default [“X”, “Y”]. Set to None to not add any missing bounds. Please
note that bounds are required for many xCDAT features.
– This parameter calls xarray.Dataset.bounds.add_missing_bounds()
– Supported CF axes include “X”, “Y”, “Z”, and “T”
– By default, missing “T” bounds are generated using the time frequency of the coordinates.
If desired, refer to xarray.Dataset.bounds.add_time_bounds() if you require more
granular configuration for how “T” bounds are generated.
• decode_times (bool, optional) – If True, attempt to decode times encoded in the standard
NetCDF datetime format into cftime.datetime objects. Otherwise, leave them encoded as
numbers. This keyword may not be supported by all the backends, by default True.


• center_times (bool, optional) – If True, attempt to center time coordinates using the midpoint
between their upper and lower bounds. Otherwise, use the provided time coordinates, by
default False.
• lon_orient (Optional[Tuple[float, float]], optional) – The orientation to use for the
Dataset’s longitude axis (if it exists). Either (-180, 180) or (0, 360), by default None. Sup-
ported options include:
– None: use the current orientation (if the longitude axis exists)
– (-180, 180): represents [-180, 180) in math notation
– (0, 360): represents [0, 360) in math notation
• kwargs (Dict[str, Any]) – Additional arguments passed on to xarray.open_dataset.
Refer to the xarray docs [1] for accepted keyword arguments.
Returns
xr.Dataset – Dataset after applying operations.

Notes

xarray.open_dataset opens the file with read-only access. When you modify values of a Dataset, even one
linked to files on disk, only the in-memory copy you are manipulating in xarray is modified: the original file on
disk is never touched.

References

xcdat.open_mfdataset

xcdat.open_mfdataset(paths, data_var=None, add_bounds=['X', 'Y'], decode_times=True, center_times=False, lon_orient=None, data_vars='minimal', preprocess=None, **kwargs)
Wraps xarray.open_mfdataset() with post-processing options.
Deprecated since version v0.6.0: add_bounds boolean arguments (True/False) are being deprecated. Please use
either a list (e.g., [“X”, “Y”]) to specify axes or None.
Parameters
• paths (str | NestedSequence[str | os.PathLike]) – Paths to dataset files. Paths
can be given as strings or as pathlib.Path objects. Supported options include:
– Directory path (e.g., "path/to/files"), which is converted to a string glob of *.nc files
– String glob (e.g., "path/to/files/*.nc"), which is expanded to a 1-dimensional list
of file paths
– File path to dataset (e.g., "path/to/files/file1.nc")
– List of file paths (e.g., ["path/to/files/file1.nc", ...]). If concatenation along
more than one dimension is desired, then paths must be a nested list-of-lists (see
xarray.combine_nested [2] for details).
– File path to an XML file with a directory attribute (e.g., "path/to/files"). If
directory is set to a blank string (“”), then the current directory is substituted (“.”).
This option is intended to support the CDAT CDML dialect of XML files, but it can work
[1] https://xarray.pydata.org/en/stable/generated/xarray.open_dataset.html
[2] https://docs.xarray.dev/en/stable/generated/xarray.combine_nested.html


with any XML file that has the directory attribute. Refer to [4] for more information on
CDML. NOTE: This feature is deprecated in v0.6.0 and will be removed in the subsequent
release. CDAT (including cdms2/CDML) is in maintenance only mode and marked for
end-of-life by the end of 2023.
• add_bounds (List[CFAxisKey] | None | bool) – List of CF axes to try to add bounds
for (if missing), by default [“X”, “Y”]. Set to None to not add any missing bounds. Please
note that bounds are required for many xCDAT features.
– This parameter calls xarray.Dataset.bounds.add_missing_bounds()
– Supported CF axes include “X”, “Y”, “Z”, and “T”
– By default, missing “T” bounds are generated using the time frequency of the coordinates.
If desired, refer to xarray.Dataset.bounds.add_time_bounds() if you require more
granular configuration for how “T” bounds are generated.
• data_var (Optional[str], optional) – The key of the data variable to keep in the Dataset,
by default None.
• decode_times (bool, optional) – If True, attempt to decode times encoded in the standard
NetCDF datetime format into cftime.datetime objects. Otherwise, leave them encoded as
numbers. This keyword may not be supported by all the backends, by default True.
• center_times (bool, optional) – If True, attempt to center time coordinates using the midpoint
between their upper and lower bounds. Otherwise, use the provided time coordinates, by
default False.
• lon_orient (Optional[Tuple[float, float]], optional) – The orientation to use for the
Dataset’s longitude axis (if it exists), by default None. Supported options include:
– None: use the current orientation (if the longitude axis exists)
– (-180, 180): represents [-180, 180) in math notation
– (0, 360): represents [0, 360) in math notation
• data_vars ({"minimal", "different", "all" or list of str}, optional) –
These data variables will be concatenated together:
– “minimal”: Only data variables in which the dimension already appears are included,
the default value.
– “different”: Data variables which are not equal (ignoring attributes) across all datasets
are also concatenated (as well as all for which dimension already appears). Beware: this
option may load the data payload of data variables into memory if they are not already
loaded.
– “all”: All data variables will be concatenated.
– list of str: The listed data variables will be concatenated, in addition to the “minimal”
data variables.
The data_vars kwarg defaults to "minimal", which concatenates data variables in a manner
where only data variables in which the dimension already appears are included. For
example, the time dimension will not be concatenated to the dimensions of non-time data
variables such as “lat_bnds” or “lon_bnds”. data_vars="minimal" is required for some xCDAT
functions, including spatial averaging where a reduction is performed using the lat/lon bounds.
[4] https://cdms.readthedocs.io/en/latest/manual/cdms_6.html


• preprocess (Optional[Callable], optional) – If provided, call this function on each
dataset prior to concatenation. You can find the file-name from which each dataset was
loaded in ds.encoding["source"].
• kwargs (Dict[str, Any]) – Additional arguments passed on to xarray.open_mfdataset.
Refer to the xarray docs [3] for accepted keyword arguments.
Returns
xr.Dataset – The Dataset.

Notes

xarray.open_mfdataset opens the file with read-only access. When you modify values of a Dataset, even
one linked to files on disk, only the in-memory copy you are manipulating in xarray is modified: the original file
on disk is never touched.
The CDAT “Climate Data Markup Language” (CDML) is a deprecated dialect of XML with a defined set of
attributes. CDML is still used by current and former users of CDAT. To enable CDML users to adopt xCDAT
more easily in their workflows, xCDAT can parse XML/CDML files for the directory to generate a glob or list
of file paths. Refer to [4] for more information on CDML. NOTE: This feature is deprecated in v0.6.0 and
will be removed in the subsequent release. CDAT (including cdms2/CDML) is in maintenance only mode and
marked for end-of-life by the end of 2023.

References

xcdat.center_times

xcdat.center_times(dataset)
Centers time coordinates using the midpoint between time bounds.
Time coordinates can be recorded using different intervals, including the beginning, middle, or end of the interval.
Centering time coordinates ensures calculations using these values are performed reliably regardless of the
recorded interval.
This method attempts to get bounds for each time variable using the CF “bounds” attribute. Coordinate variables
that cannot be mapped to bounds will be skipped.
Parameters
dataset (xr.Dataset) – The Dataset with original time coordinates.
Returns
xr.Dataset – The Dataset with centered time coordinates.

xcdat.decode_time

xcdat.decode_time(dataset)
Decodes CF and non-CF time coordinates and time bounds using cftime.
By default, xarray only supports decoding time with CF compliant units [5]. This function also enables
decoding time with non-CF compliant units. It skips decoding time coordinates that have already been decoded
as "datetime64[ns]" or cftime.datetime.
[3] https://xarray.pydata.org/en/stable/generated/xarray.open_mfdataset.html
[5] https://cfconventions.org/cf-conventions/cf-conventions.html#time-coordinate


For time coordinates to be decodable, they must have a “calendar” attribute set to a CF calendar type supported
by cftime. CF calendar types include “noleap”, “360_day”, “365_day”, “366_day”, “gregorian”,
“proleptic_gregorian”, “julian”, “all_leap”, or “standard”. They must also have a “units” attribute set to a format
supported by xCDAT (“months since . . . ” or “years since . . . ”).
Parameters
dataset (xr.Dataset) – Dataset with numerically encoded time coordinates and time bounds
(if they exist). If the time coordinates cannot be decoded then the original dataset is returned.
Returns
xr.Dataset – Dataset with decoded time coordinates and time bounds (if they exist) as cftime
objects.
Raises
KeyError – If time coordinates were not detected in the dataset, either because they don’t exist
at all or their CF attributes (e.g., ‘axis’ or ‘standard_name’) are not set.

Notes

Time coordinates are represented by cftime.datetime objects because cftime is not restricted by the pandas.
Timestamp range (years 1678 through 2262). Refer to [6] and [7] for more information on this limitation.

References

Examples

Decode the time coordinates in a Dataset:

>>> from xcdat.dataset import decode_time


>>>
>>> ds.time
<xarray.DataArray 'time' (time: 3)>
array([0, 1, 2])
Coordinates:
* time (time) int64 0 1 2
Attributes:
units: years since 2000-01-01
bounds: time_bnds
axis: T
long_name: time
standard_name: time
calendar: noleap
>>>
>>> ds_decoded = decode_time(ds)
>>> ds_decoded.time
<xarray.DataArray 'time' (time: 3)>
array([cftime.DatetimeNoLeap(2000, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(2001, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(2002, 1, 1, 0, 0, 0, 0, has_year_zero=True)],
dtype='object')
Coordinates:
[6] https://docs.xarray.dev/en/stable/user-guide/weather-climate.html#non-standard-calendars-and-dates-outside-the-timestamp-valid-range
[7] https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations

10.5. API Reference 123



* time (time) object 2000-01-01 00:00:00 2001-01-01 00:00:00 2002-01-01 00:00:00
Attributes:
units: years since 2000-01-01
bounds: time_bnds
axis: T
long_name: time
standard_name: time
calendar: noleap

View time encoding information:

>>> ds_decoded.time.encoding
{'source': None,
'dtype': dtype('int64'),
'original_shape': (3,),
'units': 'years since 2000-01-01',
'calendar': 'noleap'}
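For intuition, decoding the non-CF “years since” units in this example amounts to offsetting the reference year by each encoded value. A toy sketch (not xCDAT's cftime-based implementation; it ignores calendars and sub-year offsets):

```python
def decode_years_since(offsets, reference_year):
    """Toy decoding of 'years since <reference>' values into calendar years."""
    return [reference_year + int(offset) for offset in offsets]

# Matches the example above: values [0, 1, 2] with units "years since 2000-01-01".
print(decode_years_since([0, 1, 2], 2000))  # [2000, 2001, 2002]
```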

xcdat.swap_lon_axis

xcdat.swap_lon_axis(dataset, to, sort_ascending=True)


Swaps the orientation of a dataset’s longitude axis.
This method also swaps the axis orientation of the longitude bounds if it exists. Afterwards, it sorts longitude
and longitude bounds values in ascending order.
Note, based on how datasets are chunked, swapping the longitude dimension and sorting might raise
PerformanceWarning: Slicing is producing a large chunk. To accept the large chunk
and silence this warning, set the option.... This function uses xarray’s arithmetic to swap
orientations, so this warning seems potentially unavoidable.
Parameters
• dataset (xr.Dataset) – The Dataset containing a longitude axis.
• to (Tuple[float, float]) – The orientation to swap the Dataset’s longitude axis to.
Supported orientations:
– (-180, 180): represents [-180, 180) in math notation
– (0, 360): represents [0, 360) in math notation
• sort_ascending (bool) – After swapping, sort in ascending order (True), or keep existing
order (False).
Returns
xr.Dataset – The Dataset with swapped lon axes orientation.
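The swap itself can be pictured with plain numbers. A minimal sketch of the (0, 360) to (-180, 180) case (illustrative only; the real function also swaps the bounds and operates on xarray objects):

```python
def swap_to_180(lons):
    """Map [0, 360) longitudes to [-180, 180) and sort ascending."""
    swapped = [lon - 360 if lon >= 180 else lon for lon in lons]
    return sorted(swapped)

print(swap_to_180([0.0, 90.0, 180.0, 270.0]))  # [-180.0, -90.0, 0.0, 90.0]
```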


xcdat.compare_datasets

xcdat.compare_datasets(ds1, ds2)
Compares the keys and values of two datasets.
This utility function is especially useful for debugging tests that involve comparing two Dataset objects for being
identical or equal.
Checks include:
• Unique keys – keys that exist in only one of the two datasets.
• Non-identical keys – keys whose values differ in dimensions, coordinates, values, name, attributes, or
attributes on any coordinates (i.e., they fail xarray’s identical check).
• Non-equal keys – keys whose values differ in dimensions, coordinates, or values, ignoring attributes (i.e.,
they fail xarray’s equals check). Keys that are non-equal are also non-identical.

Parameters
• ds1 (xr.Dataset) – The first Dataset.
• ds2 (xr.Dataset) – The second Dataset.
Returns
Dict[str, List[str]] – A dictionary mapping the unique, non-identical, and non-equal keys
found in the two Datasets.
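The key bookkeeping can be sketched without xarray; here plain key lists stand in for dataset variables (illustrative only):

```python
def classify_keys(keys1, keys2):
    """Split two key collections into shared keys and keys unique to either side."""
    s1, s2 = set(keys1), set(keys2)
    return {"shared": sorted(s1 & s2), "unique": sorted(s1 ^ s2)}

result = classify_keys(["ts", "lat", "lon"], ["ts", "lat", "time"])
print(result)  # {'shared': ['lat', 'ts'], 'unique': ['lon', 'time']}
```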

xcdat.get_dim_coords

xcdat.get_dim_coords(obj, axis)
Gets the dimension coordinates for an axis.
This function uses cf_xarray to attempt to map the axis to its dimension coordinates by interpreting the CF
axis and coordinate names found in the coordinate attributes. Refer to [1] for a list of CF axis and coordinate names
that can be interpreted by cf_xarray.
If obj is an xr.Dataset, this function can return a single dimension coordinate variable as an xr.DataArray
or multiple dimension coordinate variables in an xr.Dataset. If obj is an xr.DataArray, this function returns
a single dimension coordinate variable as an xr.DataArray.
Parameters
• obj (Union[xr.Dataset, xr.DataArray]) – The Dataset or DataArray object.
• axis (CFAxisKey) – The CF axis key (“X”, “Y”, “T”, “Z”).
Returns
Union[xr.Dataset, xr.DataArray] – A Dataset of dimension coordinate variables or a
DataArray for the single dimension coordinate variable.
Raises
• ValueError – If the obj is an xr.DataArray and more than one dimension is mapped to
the same axis.
• KeyError – If no dimension coordinate variables were found for the axis.
[1] https://cf-xarray.readthedocs.io/en/latest/coord_axes.html#axes-and-coordinates


Notes

Multidimensional coordinates are ignored.

References

xcdat.get_dim_keys

xcdat.get_dim_keys(obj, axis)
Gets the dimension key(s) for an axis.
Each dimension should have a corresponding dimension coordinate variable, which has a 1:1 mapping of keys and is
denoted by the * symbol when the xarray object is printed.
Parameters
• obj (Union[xr.Dataset, xr.DataArray]) – The Dataset or DataArray object.
• axis (CFAxisKey) – The CF axis key (“X”, “Y”, “T”, or “Z”)
Returns
Union[str, List[str]] – The dimension string or a list of dimension strings for an axis.
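The axis-to-dimension lookup can be pictured as a search over coordinate attributes. A toy sketch with a plain dict standing in for coordinate metadata (the real lookup goes through cf_xarray):

```python
# Toy stand-in for coordinate metadata on an xarray object.
coords = {
    "time": {"axis": "T"},
    "lat": {"axis": "Y"},
    "lon": {"axis": "X"},
}

def get_dim_keys(coords, axis):
    """Return the single matching dimension key, or a list otherwise."""
    keys = [name for name, attrs in coords.items() if attrs.get("axis") == axis]
    return keys[0] if len(keys) == 1 else keys

print(get_dim_keys(coords, "T"))  # time
```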

xcdat.create_axis

xcdat.create_axis(name, data, bounds=None, generate_bounds=True, attrs=None)


Creates an axis and optional bounds.
Parameters
• name (str) – The CF standard name for the axis (e.g., “longitude”, “latitude”, “height”).
xCDAT also accepts additional names such as “lon”, “lat”, and “lev”. Refer to xcdat.axis.
VAR_NAME_MAP for accepted names.
• data (Union[List[Union[int, float]], np.ndarray]) – 1-D axis data consisting of
integers or floats.
• bounds (Optional[Union[List[List[Union[int, float]]], np.ndarray]]) – 2-D
axis bounds data consisting of integers or floats, defaults to None. Must have a shape of n x
2, where n is the length of data.
• generate_bounds (Optional[bool]) – Generate bounds for the axis if bounds is None, by
default True.
• attrs (Optional[Dict[str, str]]) – Custom attributes to be added to the generated
xr.DataArray axis, by default None.
User provided attrs will be merged with a set of default attributes. Default attributes
(“axis”, “coordinate”, “bnds”) cannot be overwritten. The default “units” attribute is the
only default that can be overwritten.
Returns
Tuple[xr.DataArray, Optional[xr.DataArray]] – A DataArray containing the axis data
and optional bounds.
Raises
ValueError – If name is not a valid CF axis name.


Examples

Create axis and generate bounds (by default):

>>> lat, bnds = create_axis("lat", np.array([-45, 0, 45]))

Create axis and bounds from list of floats:

>>> lat, bnds = create_axis("lat", [-45, 0, 45], bounds=[[-67.5, -22.5], [-22.5, 22.5], [22.5, 67.5]])

Create axis and disable generating bounds:

>>> lat, _ = create_axis("lat", np.array([-45, 0, 45]), generate_bounds=False)

Provide additional attributes and overwrite units:

>>> lat, _ = create_axis(
>>> "lat",
>>> np.array([-45, 0, 45]),
>>> attrs={"generated": str(datetime.date.today()), "units": "degrees_south"},
>>> )
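One simple bounds-generation scheme consistent with the bounds shown in the examples above puts interior bound edges at the midpoints between adjacent points and mirrors the edge width at the endpoints. A stdlib sketch (illustrative only; whether it matches xCDAT's generator in every case is an assumption):

```python
def generate_bounds(points):
    """Generate (lower, upper) bounds whose edges sit at coordinate midpoints."""
    mids = [(a + b) / 2 for a, b in zip(points, points[1:])]
    # Mirror the distance to the first/last midpoint at the endpoints.
    lowers = [points[0] - (mids[0] - points[0])] + mids
    uppers = mids + [points[-1] + (points[-1] - mids[-1])]
    return list(zip(lowers, uppers))

print(generate_bounds([-45, 0, 45]))
# [(-67.5, -22.5), (-22.5, 22.5), (22.5, 67.5)]
```

Note that these are exactly the bounds passed explicitly in the list-of-floats example above.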

xcdat.create_gaussian_grid

xcdat.create_gaussian_grid(nlats)
Creates a grid with Gaussian latitudes and uniform longitudes.
Parameters
nlats (int) – Number of latitudes.
Returns
xr.Dataset – Dataset with new grid, containing Gaussian latitudes.

Examples

Create grid with 32 latitudes:

>>> xcdat.regridder.grid.create_gaussian_grid(32)

xcdat.create_global_mean_grid

xcdat.create_global_mean_grid(grid)
Creates a global mean grid.
Bounds are expected to be present in grid.
Parameters
grid (xr.Dataset) – Source grid.
Returns
xr.Dataset – A dataset containing the global mean grid.


xcdat.create_grid

xcdat.create_grid(x=None, y=None, z=None, attrs=None, **kwargs)


Creates a grid dataset using the specified axes.
Deprecated since version v0.6.0: the **kwargs argument is deprecated; please migrate to the x, y, or z arguments
to create grids.
Parameters
• x (Optional[Union[xr.DataArray, Tuple[xr.DataArray]]]) – Data with optional
bounds to use for the “X” axis, by default None.
• y (Optional[Union[xr.DataArray, Tuple[xr.DataArray]]]) – Data with optional
bounds to use for the “Y” axis, by default None.
• z (Optional[Union[xr.DataArray, Tuple[xr.DataArray]]]) – Data with optional
bounds to use for the “Z” axis, by default None.
• attrs (Optional[Dict[str, str]]) – Custom attributes to be added to the generated
xr.Dataset.
Returns
xr.Dataset – Dataset with grid axes.

Examples

Create uniform 2.5 x 2.5 degree grid using create_axis:

>>> # NOTE: `create_axis` returns (axis, bnds)
>>> lat_axis = create_axis("lat", np.arange(-90, 90, 2.5))
>>> lon_axis = create_axis("lon", np.arange(1.25, 360, 2.5))
>>>
>>> grid = create_grid(x=lon_axis, y=lat_axis)

With custom attributes:

>>> grid = create_grid(
>>> x=lon_axis, y=lat_axis, attrs={"created": str(datetime.date.today())}
>>> )

Create grid using existing xr.DataArray’s:

>>> lat = xr.DataArray(...)
>>> lon = xr.DataArray(...)
>>>
>>> grid = create_grid(x=lon, y=lat)

With existing bounds:

>>> lat_bnds = xr.DataArray(...)
>>> lon_bnds = xr.DataArray(...)
>>>
>>> grid = create_grid(x=(lon, lon_bnds), y=(lat, lat_bnds))

Create vertical grid:


>>> z = create_axis(
>>> "lev", np.linspace(1000, 1, 20), attrs={"units": "meters", "positive": "down"}
>>> )
>>> grid = create_grid(z=z)

xcdat.create_uniform_grid

xcdat.create_uniform_grid(lat_start, lat_stop, lat_delta, lon_start, lon_stop, lon_delta)


Creates a uniform rectilinear grid and sets the appropriate attributes for the lat/lon axis.
Parameters
• lat_start (float) – First latitude.
• lat_stop (float) – Last latitude.
• lat_delta (float) – Difference between two points of axis.
• lon_start (float) – First longitude.
• lon_stop (float) – Last longitude.
• lon_delta (float) – Difference between two points of axis.
Returns
xr.Dataset – Dataset with uniform lat/lon grid.

Examples

Create 4x5 uniform grid:

>>> xcdat.regridder.grid.create_uniform_grid(-90, 90, 4.0, -180, 180, 5.0)
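The axis-point arithmetic can be sketched with plain lists (illustrative only; this assumes, as in the 4x5 example above, that the stop point is included):

```python
def uniform_points(start, stop, delta):
    """Axis points from start to stop (inclusive) in steps of delta."""
    count = int(round((stop - start) / delta)) + 1
    return [start + i * delta for i in range(count)]

lats = uniform_points(-90, 90, 4.0)    # 4-degree latitudes
lons = uniform_points(-180, 180, 5.0)  # 5-degree longitudes
print(len(lats), len(lons))  # 46 73
```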

xcdat.create_zonal_grid

xcdat.create_zonal_grid(grid)
Creates a zonal grid.
Bounds are expected to be present in grid.
Parameters
grid (xr.Dataset) – Source grid.
Returns
xr.Dataset – A dataset containing a zonal grid.


10.5.3 Accessors

What are accessors?

xcdat provides Dataset accessors, which are implicit namespaces for custom functionality that clearly identifies it
as separate from built-in xarray methods. xcdat implements accessors to extend xarray with custom functionality
because it is the officially recommended and most common practice (over sub-classing).
In the example below, custom spatial functionality is exposed by chaining the spatial accessor attribute to the
Dataset object. This chaining enables access to the underlying spatial average() method.

How do I use xcdat accessors?

First, import the package:

>>> import xcdat

Then open up a dataset file as a Dataset object:

>>> ds = xcdat.open_dataset("path/to/file", data_var="ts")

Now chain the accessor attribute to the Dataset to expose the accessor class attributes, methods, or properties:

>>> ds = ds.spatial.average("ts", axis=["X", "Y"])

Note: Accessors are created once per Dataset instance. New instances, like those created from arithmetic operations,
will have new accessors created.

Classes

xcdat.bounds.BoundsAccessor(dataset)
    An accessor class that provides bounds attributes and methods on xarray Datasets through the .bounds attribute.
xcdat.spatial.SpatialAccessor(dataset)
    An accessor class that provides spatial attributes and methods on xarray Datasets through the .spatial attribute.
xcdat.temporal.TemporalAccessor(dataset)
    An accessor class that provides temporal attributes and methods on xarray Datasets through the .temporal attribute.
xcdat.regridder.accessor.RegridderAccessor(dataset)
    An accessor class that provides regridding attributes and methods for xarray Datasets through the .regridder attribute.
xcdat.regridder.regrid2.Regrid2Regridder(...)
xcdat.regridder.xesmf.XESMFRegridder(...[, ...])
xcdat.regridder.xgcm.XGCMRegridder(...[, ...])


xcdat.bounds.BoundsAccessor

class xcdat.bounds.BoundsAccessor(dataset)
An accessor class that provides bounds attributes and methods on xarray Datasets through the .bounds attribute.

Examples

Import BoundsAccessor class:

>>> import xcdat # or from xcdat import bounds

Use BoundsAccessor class:

>>> ds = xcdat.open_dataset("/path/to/file")
>>>
>>> ds.bounds.<attribute>
>>> ds.bounds.<method>
>>> ds.bounds.<property>

Parameters
dataset (xr.Dataset) – A Dataset object.

Examples

Import:

>>> from xcdat import bounds

Return dictionary of axis and coordinate keys mapped to bounds:

>>> ds.bounds.map

Return list of keys for bounds data variables:

>>> ds.bounds.keys

Add missing coordinate bounds for supported axes in the Dataset:

>>> ds = ds.bounds.add_missing_bounds(axes=["X", "Y", "T"])

Get coordinate bounds if they exist:

>>> lat_bounds = ds.bounds.get_bounds("Y")
>>> lon_bounds = ds.bounds.get_bounds("X")
>>> time_bounds = ds.bounds.get_bounds("T")

Add coordinate bounds for a specific axis if they don’t exist:

>>> ds = ds.bounds.add_bounds("Y")

__init__(dataset)


Methods

__init__(dataset)
add_bounds(axis)
    Add bounds for an axis using its coordinates as midpoints.
add_missing_bounds(axes)
    Adds missing coordinate bounds for supported axes in the Dataset.
add_time_bounds(method[, freq, ...])
    Add bounds for an axis using its coordinate points.
get_bounds(axis[, var_key])
    Gets coordinate bounds.

Attributes

keys
    Returns a list of keys for the bounds data variables in the Dataset.
map
    Returns a map of axis and coordinate keys to their bounds.
_dataset

property map
Returns a map of axis and coordinates keys to their bounds.
The dictionary provides all valid CF compliant keys for axis and coordinates. For example, latitude will
include keys for “lat”, “latitude”, and “Y”.
Returns
Dict[str, Optional[xr.DataArray]] – Dictionary mapping axis and coordinate keys to
their bounds.
property keys
Returns a list of keys for the bounds data variables in the Dataset.
Returns
List[str] – A list of sorted bounds data variable keys.
add_missing_bounds(axes)
Adds missing coordinate bounds for supported axes in the Dataset.
This function loops through the Dataset’s axes and attempts to add bounds to their coordinates if they don’t
exist. “X”, “Y”, and “Z” axes bounds are the midpoints between coordinates. “T” axis bounds are based
on the time frequency of the coordinates.
An axis must meet the following criteria to add bounds for it, otherwise they are ignored:
1. Axis is either “X”, “Y”, “T”, or “Z”
2. Coordinates are a single dimension, not multidimensional
3. Coordinates are a length > 1 (not singleton)
4. Bounds must not already exist
• Coordinates are mapped to bounds using the “bounds” attr. For example, bounds exist if ds.
time.attrs["bounds"] is set to "time_bnds" and ds.time_bnds is present in the dataset.
5. For the “T” axis, its coordinates must be composed of datetime-like objects (np.datetime64 or cftime).


Parameters
axes (List[str]) – List of CF axes that function should operate on. Options include “X”,
“Y”, “T”, or “Z”.
Returns
xr.Dataset

get_bounds(axis, var_key=None)
Gets coordinate bounds.
Parameters
• axis (CFAxisKey) – The CF axis key (“X”, “Y”, “T”, “Z”).
• var_key (Optional[str]) – The key of the coordinate or data variable to get axis bounds
for. This parameter is useful if you only want the single bounds DataArray related to the
axis on the variable (e.g., “tas” has a “lat” dimension and you want “lat_bnds”).
Returns
Union[xr.Dataset, xr.DataArray] – A Dataset of N bounds variables, or a single bounds
variable DataArray.
Raises
• ValueError – If an incorrect axis argument is passed.
• KeyError – If bounds were not found for the specific axis.
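The CF “bounds”-attribute lookup used by this method can be pictured with plain dicts standing in for dataset metadata (illustrative only):

```python
# Toy metadata: the CF "bounds" attribute names a bounds variable in the dataset.
coord_attrs = {"time": {"bounds": "time_bnds"}, "lat": {"bounds": "lat_bnds"}}
data_vars = {"time_bnds", "lat_bnds", "tas"}

def bounds_key_for(coord):
    """Resolve a coordinate's bounds variable via its "bounds" attribute."""
    key = coord_attrs.get(coord, {}).get("bounds")
    if key is None or key not in data_vars:
        raise KeyError(f"No bounds found for coordinate {coord!r}")
    return key

print(bounds_key_for("time"))  # time_bnds
```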
add_bounds(axis)
Add bounds for an axis using its coordinates as midpoints.
This method loops over the axis’s coordinate variables and attempts to add bounds for each of them if they
don’t exist. Each coordinate point is the midpoint between their lower and upper bounds.
To add bounds for an axis its coordinates must meet the following criteria, otherwise an error is thrown:
1. Axis is either “X”, “Y”, “T”, or “Z”
2. Coordinates are single dimensional, not multidimensional
3. Coordinates are a length > 1 (not singleton)
4. Bounds must not already exist
• Coordinates are mapped to bounds using the “bounds” attr. For example, bounds exist if ds.
time.attrs["bounds"] is set to "time_bnds" and ds.time_bnds is present in the dataset.

Parameters
axis (CFAxisKey) – The CF axis key (“X”, “Y”, “T”, “Z”).
Returns
xr.Dataset – The dataset with bounds added.
Raises
ValueError – If the axis coordinates do not meet the criteria above.

add_time_bounds(method, freq=None, daily_subfreq=None, end_of_month=False)


Add bounds for an axis using its coordinate points.
This method loops over the time axis coordinate variables and attempts to add bounds for each of them if
they don’t exist. To add time bounds for the time axis, its coordinates must meet the following criteria:
1. Coordinates are single dimensional, not multidimensional


2. Coordinates are a length > 1 (not singleton)


3. Bounds must not already exist
• Coordinates are mapped to bounds using the “bounds” attr. For example, bounds exist if ds.
time.attrs["bounds"] is set to "time_bnds" and ds.time_bnds is present in the dataset.
4. If method="freq", coordinates must be composed of datetime-like objects (np.datetime64 or
cftime)

Parameters
• method ({"freq", "midpoint"}) – The method for creating time bounds for time coordinates,
either “freq” or “midpoint”.
– “freq”: Create time bounds as the start and end of each timestep’s period using either the
inferred or specified time frequency (freq parameter). For example, the time bounds
will be the start and end of each month for each monthly coordinate point.
– “midpoint”: Create time bounds using time coordinates as the midpoint between their
upper and lower bounds.
• freq ({"year", "month", "day", "hour"}, optional) – If method="freq", this pa-
rameter specifies the time frequency for creating time bounds. By default None, which
infers the frequency using the time coordinates.
• daily_subfreq ({1, 2, 3, 4, 6, 8, 12, 24}, optional) – If freq=="hour", this pa-
rameter sets the number of timepoints per day for time bounds, by default None.
– daily_subfreq=None infers the daily time frequency from the time coordinates.
– daily_subfreq=1 is daily
– daily_subfreq=2 is twice daily
– daily_subfreq=4 is 6-hourly
– daily_subfreq=8 is 3-hourly
– daily_subfreq=12 is 2-hourly
– daily_subfreq=24 is hourly
• end_of_month (bool, optional) – If freq=="month", this flag notes that the timepoint is
saved at the end of the monthly interval (see Note), by default False.
– Some timepoints are saved at the end of the interval, e.g., Feb. 1 00:00 for the time
interval Jan. 1 00:00 - Feb. 1 00:00. Since this method determines the month and year
from the time vector, the bounds will be set incorrectly if the timepoint is set to the end
of the time interval. For these cases, set end_of_month=True.
Returns
xr.Dataset – The dataset with time bounds added.
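The method="freq" behavior for monthly data can be sketched with the stdlib datetime module (illustrative only; the real method works with cftime/pandas objects and supports several frequencies):

```python
import datetime

def monthly_bounds(timestep):
    """Bounds spanning the first of the timestep's month to the first of the next."""
    lower = timestep.replace(day=1)
    if timestep.month == 12:
        upper = datetime.date(timestep.year + 1, 1, 1)
    else:
        upper = datetime.date(timestep.year, timestep.month + 1, 1)
    return lower, upper

print(monthly_bounds(datetime.date(2000, 2, 15)))
# (datetime.date(2000, 2, 1), datetime.date(2000, 3, 1))
```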

_drop_ancillary_singleton_coords(coord_vars)
Drop ancillary singleton coordinates from dimension coordinates.
Xarray coordinate variables retain all coordinates from the parent object. This means if singleton coordi-
nates exist, they are attached to dimension coordinates as ancillary coordinates. For example, the “height”
singleton coordinate will be attached to “time” coordinates even though “height” is related to the “Z” axis,
not the “T” axis. Refer to [1] for more info on this Xarray behavior.
[1] https://github.com/pydata/xarray/issues/6196


This is an undesirable behavior in xCDAT because the add bounds methods loop over coordinates related
to an axis and attempt to add bounds if they don’t exist. If ancillary coordinates are present, “ValueError:
Cannot generate bounds for coordinate variable ‘height’ which has a length <= 1 (singleton)” is raised. For
the purpose of adding bounds, we temporarily drop any ancillary singletons from dimension coordinates
before looping over those coordinates. Ancillary singletons will still be present in the final Dataset object
to maintain the Dataset’s integrity.
Parameters
coord_vars (Union[xr.Dataset, xr.DataArray]) – The dimension coordinate variables
with ancillary coordinates (if they exist).
Returns
Union[xr.Dataset, xr.DataArray] – The dimension coordinate variables with ancillary
coordinates dropped (if they exist).

References

_get_bounds_keys(axis)
Get bounds keys for an axis’s coordinate variables in the dataset.
This function attempts to map bounds to an axis using cf_xarray and its interpretation of the CF “bounds”
attribute.
Parameters
axis (CFAxisKey) – The CF axis key (“X”, “Y”, “T”, or “Z”).
Returns
List[str] – The axis bounds key(s).
_create_time_bounds(time, freq=None, daily_subfreq=None, end_of_month=False)
Creates time bounds for each timestep of the time coordinate axis.
This method creates time bounds as the start and end of each timestep’s period using either the inferred or
specified time frequency (freq parameter). For example, the time bounds will be the start and end of each
month for each monthly coordinate point.
Parameters
• time (xr.DataArray) – The temporal coordinate variable for the axis.
• freq ({"year", "month", "day", "hour"}, optional) – The time frequency for creat-
ing time bounds, by default None (infer the frequency).
• daily_subfreq ({1, 2, 3, 4, 6, 8, 12, 24}, optional) – If freq=="hour", this pa-
rameter sets the number of timepoints per day for bounds, by default None. If greater than
1, sub-daily bounds are created.
– daily_subfreq=None infers the freq from the time coords (default)
– daily_subfreq=1 is daily
– daily_subfreq=2 is twice daily
– daily_subfreq=4 is 6-hourly
– daily_subfreq=8 is 3-hourly
– daily_subfreq=12 is 2-hourly
– daily_subfreq=24 is hourly


• end_of_month (bool, optional) – If freq=="month", this flag notes that the timepoint is
saved at the end of the monthly interval (see Note), by default False.
Returns
xr.DataArray – A DataArray storing bounds for the time axis.
Raises
• ValueError – If coordinates are a singleton.
• TypeError – If time coordinates are not composed of datetime-like objects.

Note: Some timepoints are saved at the end of the interval, e.g., Feb. 1 00:00 for the time interval Jan.
1 00:00 - Feb. 1 00:00. Since this function determines the month and year from the time vector, the
bounds will be set incorrectly if the timepoint is set to the end of the time interval. For these cases, set
end_of_month=True.

_create_yearly_time_bounds(timesteps, obj_type)
Creates time bounds for each timestep with the start and end of the year.
Bounds for each timestep correspond to Jan. 1 00:00:00 of the year of the timestep and Jan. 1 00:00:00 of
the subsequent year.
Parameters
• timesteps (np.ndarray) – An array of timesteps, represented as either cftime.datetime or
pd.Timestamp (casted from np.datetime64[ns] to support pandas time/date components).
• obj_type (Union[cftime.datetime, pd.Timestamp]) – The object type for time
bounds based on the dtype of time_values.
Returns
List[Union[cftime.datetime, pd.Timestamp]] – A list of time bound values.
_create_monthly_time_bounds(timesteps, obj_type, end_of_month=False)
Creates time bounds for each timestep with the start and end of the month.
Bounds for each timestep correspond to 00:00:00 on the first of the month and 00:00:00 on the first of the
subsequent month.
Parameters
• timesteps (np.ndarray) – An array of timesteps, represented as either cftime.datetime or
pd.Timestamp (casted from np.datetime64[ns] to support pandas time/date components).
• obj_type (Union[cftime.datetime, pd.Timestamp]) – The object type for time
bounds based on the dtype of time_values.
• end_of_month (bool, optional) – Flag to note that the timepoint is saved at the end of the
monthly interval (see Note), by default False.
Returns
List[Union[cftime.datetime, pd.Timestamp]] – A list of time bound values.

Note: Some timepoints are saved at the end of the interval, e.g., Feb. 1 00:00 for the time interval Jan.
1 00:00 - Feb. 1 00:00. Since this function determines the month and year from the time vector, the
bounds will be set incorrectly if the timepoint is set to the end of the time interval. For these cases, set
end_of_month=True.


_add_months_to_timestep(timestep, obj_type, delta)


Adds delta month(s) to a timestep.
The delta value can be positive or negative (for subtraction). Refer to [4] for the logic.
Parameters
• timestep (Union[cftime.datetime, pd.Timestamp]) – A timestep represented as
cftime.datetime or pd.Timestamp.
• obj_type (Union[cftime.datetime, pd.Timestamp]) – The object type for time
bounds based on the dtype of timestep.
• delta (int) – Integer months to be added to times (can be positive or negative)
Returns
Union[cftime.datetime, pd.Timestamp]
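The month arithmetic referenced above can be sketched on (year, month) pairs (illustrative only; the real method constructs cftime/pandas objects):

```python
def add_months(year, month, delta):
    """Add delta months (possibly negative) to a (year, month) pair."""
    total = month - 1 + delta          # zero-based month count
    return year + total // 12, total % 12 + 1

print(add_months(2000, 12, 1))   # (2001, 1)
print(add_months(2000, 1, -1))   # (1999, 12)
```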

References

_create_daily_time_bounds(timesteps, obj_type, freq=1)


Creates time bounds for each timestep with the start and end of the day.
Bounds for each timestep corresponds to 00:00:00 timepoint on the current day and 00:00:00 on the sub-
sequent day.
If time steps are sub-daily, then the bounds will begin at 00:00 and end at 00:00 of the following day. For
example, for 3-hourly data, the bounds would be:

[
["01/01/2000 00:00", "01/01/2000 03:00"],
["01/01/2000 03:00", "01/01/2000 06:00"],
...
["01/01/2000 21:00", "02/01/2000 00:00"],
]

Parameters
• timesteps (np.ndarray) – An array of timesteps, represented as either cftime.datetime or
pd.Timestamp (casted from np.datetime64[ns] to support pandas time/date components).
• obj_type (Union[cftime.datetime, pd.Timestamp]) – The object type for time
bounds based on the dtype of time_values.
• freq ({1, 2, 3, 4, 6, 8, 12, 24}, optional) – Number of timepoints per day, by de-
fault 1. If greater than 1, sub-daily bounds are created.
– freq=1 is daily (default)
– freq=2 is twice daily
– freq=4 is 6-hourly
– freq=8 is 3-hourly
– freq=12 is 2-hourly
– freq=24 is hourly
[4] https://stackoverflow.com/a/4131114


Returns
List[Union[cftime.datetime, pd.Timestamp]] – A list of time bound values.
Raises
ValueError – If an incorrect freq argument is passed. Should be 1, 2, 3, 4, 6, 8, 12, or 24.

Notes

This function is intended to reproduce CDAT’s setAxisTimeBoundsDaily method [5].
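The sub-daily splitting can be sketched in terms of hours of the day (illustrative only):

```python
def daily_bound_hours(freq=1):
    """Hour-of-day bound pairs for `freq` timepoints per day."""
    step = 24 // freq
    return [(hour, hour + step) for hour in range(0, 24, step)]

print(daily_bound_hours(8)[:2])   # 3-hourly: [(0, 3), (3, 6)]
print(daily_bound_hours(1))       # daily: [(0, 24)]
```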

References

_validate_axis_arg(axis)

xcdat.spatial.SpatialAccessor

class xcdat.spatial.SpatialAccessor(dataset)
An accessor class that provides spatial attributes and methods on xarray Datasets through the .spatial attribute.

Examples

Import SpatialAccessor class:

>>> import xcdat # or from xcdat import spatial

Use SpatialAccessor class:

>>> ds = xcdat.open_dataset("/path/to/file")
>>>
>>> ds.spatial.<attribute>
>>> ds.spatial.<method>
>>> ds.spatial.<property>

Parameters
dataset (xr.Dataset) – A Dataset object.

__init__(dataset)

Methods

__init__(dataset)
average(data_var[, axis, weights, ...])
    Calculates the spatial average for a rectilinear grid over an optionally specified regional domain.
get_weights(axis[, lat_bounds, lon_bounds, ...])
    Get area weights for specified axis keys and an optional target domain.

[5] https://github.com/CDAT/cdutil/blob/master/cdutil/times.py#L1093


average(data_var, axis=['X', 'Y'], weights='generate', keep_weights=False, lat_bounds=None, lon_bounds=None)
Calculates the spatial average for a rectilinear grid over an optionally specified regional domain.
Operations include:
• If a regional boundary is specified, check to ensure it is within the data variable’s domain boundary.
• If axis weights are not provided, get axis weights for standard axis domains specified in axis.
• Adjust weights to conform to the specified regional boundary.
• Compute spatial weighted average.
This method requires that the dataset’s coordinates have the ‘axis’ attribute set to the keys in axis. For
example, the latitude coordinates should have its ‘axis’ attribute set to ‘Y’ (which is also CF-compliant).
This ‘axis’ attribute is used to retrieve the related coordinates via cf_xarray. Refer to this method’s examples
for more information.
Parameters
• data_var (str) – The name of the data variable inside the dataset to spatially average.
• axis (List[SpatialAxis]) – List of axis dimensions to average over, by default [“X”,
“Y”]. Valid axis keys include “X” and “Y”.
• weights ({"generate", xr.DataArray}, optional) – If “generate”, then weights are
generated. Otherwise, pass a DataArray containing the regional weights used for weighted
averaging. weights must include the same spatial axis dimensions and have the same
dimensional sizes as the data variable, by default “generate”.
• keep_weights (bool, optional) – If calculating averages using weights, keep the weights
in the final dataset output, by default False.
• lat_bounds (Optional[RegionAxisBounds], optional) – A tuple of floats/ints for the re-
gional latitude lower and upper boundaries. This arg is used when calculating axis weights,
but is ignored if weights are supplied. The lower bound cannot be larger than the upper
bound, by default None.
• lon_bounds (Optional[RegionAxisBounds], optional) – A tuple of floats/ints for the
regional longitude lower and upper boundaries. This arg is used when calculating axis
weights, but is ignored if weights are supplied. The lower bound can be larger than the
upper bound (e.g., across the prime meridian, dateline), by default None.
Returns
xr.Dataset – Dataset with the spatially averaged variable.
Raises
KeyError – If data variable does not exist in the Dataset.

Examples

Check the ‘axis’ attribute is set on the required coordinates:

>>> ds.lat.attrs["axis"]
>>> Y
>>>
>>> ds.lon.attrs["axis"]
>>> X

Set the ‘axis’ attribute for the required coordinates if it isn’t:


>>> ds.lat.attrs["axis"] = "Y"
>>> ds.lon.attrs["axis"] = "X"

Call spatial averaging method:

>>> ds.spatial.average(...)

Get global average time series:

>>> ts_global = ds.spatial.average("tas", axis=["X", "Y"])["tas"]

Get time series in Nino 3.4 domain:

>>> ts_n34 = ds.spatial.average("ts", axis=["X", "Y"],
>>>                             lat_bounds=(-5, 5),
>>>                             lon_bounds=(-170, -120))["ts"]

Get zonal mean time series:

>>> ts_zonal = ds.spatial.average("tas", axis=["X"])["tas"]

Using custom weights for averaging:

>>> # The shape of the weights must align with the data var.
>>> weights = xr.DataArray(
...     data=np.ones((4, 4)),
...     coords={"lat": ds.lat, "lon": ds.lon},
...     dims=["lat", "lon"],
... )
>>>
>>> ts_global = ds.spatial.average("tas", axis=["X", "Y"],
...     weights=weights)["tas"]

get_weights(axis, lat_bounds=None, lon_bounds=None, data_var=None)
Get area weights for specified axis keys and an optional target domain.
This method first determines the weights for an individual axis based on the difference between the upper
and lower bound. For latitude the weight is determined by the difference of sine(latitude). All axis weights
are then combined to form a DataArray of weights that can be used to perform a weighted (spatial) average.
If lat_bounds or lon_bounds are supplied, then grid cells outside this selected regional domain are given
zero weight. Grid cells that are partially in this domain are given partial weight.
Parameters
• axis (List[SpatialAxis]) – List of axis dimensions to average over.
• lat_bounds (Optional[RegionAxisBounds]) – Tuple of latitude boundaries for regional
selection, by default None.
• lon_bounds (Optional[RegionAxisBounds]) – Tuple of longitude boundaries for re-
gional selection, by default None.
• data_var (Optional[str]) – The key of the data variable, by default None. Pass this
argument when the dataset has more than one bounds per axis (e.g., “lon” and “zlon_bnds”
for the “X” axis), or you want weights for a specific data variable.

140 Chapter 10. License



Returns
xr.DataArray – A DataArray containing the region weights to use during averaging.
weights are 1-D and correspond to the specified axes (axis) in the region.

Notes

This method was developed for rectilinear grids only. get_weights() recognizes and operates on latitude
and longitude, but could be extended to work with other standard geophysical dimensions (e.g., time, depth,
and pressure).
_validate_axis_arg(axis)
Validates that the axis dimension(s) exists in the dataset.
Parameters
axis (List[SpatialAxis]) – List of axis dimensions to average over.
Raises
• ValueError – If a key in axis is not a supported value.
• KeyError – If the dataset does not have coordinates for the axis dimension, or the axis
attribute is not set for those coordinates.
_validate_region_bounds(axis, bounds)
Validates the bounds arg based on a set of criteria.
Parameters
• axis (SpatialAxis) – The axis related to the bounds.
• bounds (RegionAxisBounds) – The axis bounds.
Raises
• TypeError – If bounds is not a tuple.
• ValueError – If the bounds has 0 elements or greater than 2 elements.
• TypeError – If the bounds lower bound is not a float or integer.
• TypeError – If the bounds upper bound is not a float or integer.
• ValueError – If the axis is “Y” and the bounds lower value is larger than the upper
value.
_get_longitude_weights(domain_bounds, region_bounds)
Gets weights for the longitude axis.
This method performs longitudinal processing including (in order):
1. Align the axis orientations of the domain and region bounds to (0, 360) to ensure compatibility in the
proceeding steps.
2. Handle grid cells that cross the prime meridian (e.g., [-1, 1]) by breaking such grid cells into two (e.g.,
[0, 1] and [359, 360]) to ensure alignment with the (0, 360) axis orientation. This results in a bounds
axis of length(nlon)+1. The index of the grid cell that crosses the prime meridian is returned in order
to reduce the length of weights to nlon.
3. Scale the domain down to a region (if selected).
4. Calculate weights using the domain bounds.


5. If the prime meridian grid cell exists, use this cell’s index to handle the weights vector’s increased
length as a result of the two additional grid cells. The extra weights are added to the prime meridian
grid cell and removed from the weights vector to ensure the lengths of the weights and its corresponding
domain remain in alignment.

Parameters
• domain_bounds (xr.DataArray) – The array of bounds for the longitude domain.
• region_bounds (Optional[np.ndarray]) – The array of bounds for longitude regional
selection.
Returns
xr.DataArray – The longitude axis weights.
Raises
ValueError – If there are multiple instances where domain_bounds[:, 0] > domain_bounds[:, 1].
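The prime meridian handling in step 2 can be sketched with NumPy. This is a simplified illustration, not xcdat's implementation; the helper name and the append-at-the-end placement of the split cell are assumptions:

```python
import numpy as np

def split_prime_meridian_cell(bounds):
    """Split a grid cell that crosses the prime meridian on a (0, 360)
    axis, e.g., a cell recorded as [359, 1] becomes [0, 1] and
    [359, 360]. Returns the expanded (nlon + 1, 2) bounds and the
    crossing cell's index, or the original bounds and None."""
    crossing = np.where(bounds[:, 0] > bounds[:, 1])[0]
    if crossing.size == 0:
        return bounds, None

    idx = int(crossing[0])
    lower, upper = bounds[idx]

    # Replace the crossing cell with its eastern half and append the
    # western half so both pieces lie inside [0, 360).
    fixed = bounds.astype(float).copy()
    fixed[idx] = [0.0, upper]
    fixed = np.vstack([fixed, [lower, 360.0]])
    return fixed, idx

out, idx = split_prime_meridian_cell(np.array([[359.0, 1.0], [1.0, 90.0]]))
```

The returned index is what the weights routine would later use to merge the two partial weights back into a single grid cell.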

_get_latitude_weights(domain_bounds, region_bounds)
Gets weights for the latitude axis.
This method scales the domain to a region (if selected). It also scales the area between two lines of latitude
as the difference of the sine of latitude bounds.
Parameters
• domain_bounds (xr.DataArray) – The array of bounds for the latitude domain.
• region_bounds (Optional[np.ndarray]) – The array of bounds for latitude regional
selection.
Returns
xr.DataArray – The latitude axis weights.
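As a numerical illustration of the sine-of-latitude weighting described above, here is a NumPy sketch (the function name is an assumption, not xcdat's API):

```python
import numpy as np

def latitude_weights(lat_bounds_deg):
    # Weight of each latitude band = |sin(upper) - sin(lower)|, which
    # is proportional to the band's surface area on a sphere.
    rad = np.deg2rad(np.asarray(lat_bounds_deg, dtype=float))
    return np.abs(np.sin(rad[:, 1]) - np.sin(rad[:, 0]))

# Four equal-angle bands from pole to pole: the two bands nearest the
# equator cover more area (and so get more weight) than the two polar
# bands, even though all four span 45 degrees of latitude.
w = latitude_weights([[-90, -45], [-45, 0], [0, 45], [45, 90]])
```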
_calculate_weights(domain_bounds)
Calculate weights for the domain.
This method takes the absolute difference between the upper and lower bound values to calculate weights.
Parameters
domain_bounds (xr.DataArray) – The array of bounds for a domain.
Returns
xr.DataArray – The weights for an axes.
_swap_lon_axis(lon, to)
Swap the longitude axis orientation.
Parameters
• lon (Union[xr.DataArray, np.ndarray]) – Longitude values to convert.
• to (Literal[180, 360]) – Axis orientation to convert to, either 180 [-180, 180) or 360 [0,
360).
Returns
Union[xr.DataArray, np.ndarray] – Converted longitude values.


Notes

This does not reorder the values in any way; it only converts the values in place between the longitude conventions [-180, 180) or [0, 360).
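A minimal NumPy sketch of converting values between the two conventions (illustrative only; the accessor's helper also handles DataArrays and bounds):

```python
import numpy as np

def swap_lon_axis(lon, to):
    # Convert longitudes without reordering them, matching the
    # documented conventions: 360 -> [0, 360), 180 -> [-180, 180).
    lon = np.asarray(lon, dtype=float)
    if to == 360:
        return lon % 360
    if to == 180:
        return ((lon + 180) % 360) - 180
    raise ValueError("`to` must be 180 or 360")
```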
_scale_domain_to_region(domain_bounds, region_bounds)
Scale domain bounds to conform to a regional selection in order to calculate spatial weights.
Axis weights are determined by the difference between the upper and lower boundary. If a region is selected,
the grid cell bounds outside the selected region are adjusted using this method so that the grid cell bounds
match the selected region bounds. The effect of this adjustment is to give partial weight to grid cells that
are partially in the selected regional domain and zero weight to grid cells outside the selected domain.
Parameters
• domain_bounds (xr.DataArray) – The domain’s bounds.
• region_bounds (np.ndarray) – The region bounds that the domain bounds are scaled
down to.
Returns
xr.DataArray – Scaled dimension bounds based on regional selection.

Notes

If a lower regional selection bound exceeds the upper selection bound, this algorithm assumes that the axis
is longitude and the user is specifying a region that includes the prime meridian. The lower selection bound
should not exceed the upper bound for latitude.
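For a latitude region that does not wrap, the adjustment amounts to clipping each cell's bounds to the region. A NumPy sketch (xcdat's longitude handling is more involved because regions may cross the prime meridian):

```python
import numpy as np

# Clip each grid cell's bounds to the region: cells fully outside
# collapse to zero width (zero weight), and cells partially inside are
# narrowed to the overlapping portion (partial weight).
domain_bounds = np.array([[-90.0, -30.0], [-30.0, 30.0], [30.0, 90.0]])
region = (-5.0, 5.0)

scaled = np.clip(domain_bounds, region[0], region[1])
widths = np.abs(scaled[:, 1] - scaled[:, 0])
```

Here the first and last cells fall entirely outside (-5, 5) and end up with zero width, while the middle cell is reduced to the 10-degree overlap.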
_combine_weights(axis_weights)
Generically rescales axis weights for a given region.
This method creates an n-dimensional weighting array by performing matrix multiplication for a list of
specified axis keys using a dictionary of axis weights.
Parameters
axis_weights (AxisWeights) – Dictionary of axis weights, where key is axis and value is
the corresponding DataArray of weights.
Returns
xr.DataArray – A DataArray containing the region weights to use during averaging.
weights are 1-D and correspond to the specified axis keys (axis) in the region.
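The matrix multiplication described above amounts to an outer product of the per-axis weight vectors. A NumPy sketch (the dictionary keys are assumed axis names):

```python
import numpy as np

def combine_weights(axis_weights):
    # Build an n-D weight array by taking the outer product of the
    # 1-D weight vectors, one per axis.
    combined = None
    for w in axis_weights.values():
        combined = w if combined is None else np.multiply.outer(combined, w)
    return combined

w2d = combine_weights({"Y": np.array([0.5, 1.0]), "X": np.array([1.0, 2.0, 3.0])})
```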
_validate_weights(data_var, axis)
Validates the weights arg based on a set of criteria.
This method checks for the dimensional alignment between the weights and data_var. It assumes that
data_var has the same keys that are specified in axis, which have already been validated using self.
_validate_axis() in self.average().
Parameters
• data_var (xr.DataArray) – The data variable used for validation with user supplied
weights.
• axis (List[SpatialAxis]) – List of axis dimensions to average over.
• weights (xr.DataArray) – A DataArray containing the region area weights for averaging.
weights must include the same spatial axis dimensions found in axis and data_var, and
the same axis dims sizes as data_var.


Raises
• KeyError – If weights does not include the latitude dimension.
• KeyError – If weights does not include the longitude dimension.
• ValueError – If the axis dimension sizes between weights and data_var are mis-
aligned.
_averager(data_var, axis)
Perform a weighted average of a data variable.
This method assumes all specified keys in axis exists in the data variable. Validation for this criteria is
performed in _validate_weights().
Operations include:
• Masked (missing) data receives zero weight.
• Perform weighted average over user-specified axes/axis.

Parameters
• data_var (xr.DataArray) – Data variable inside a Dataset.
• axis (List[SpatialAxis]) – List of axis dimensions to average over.
Returns
xr.DataArray – Variable that has been reduced via a weighted average.

Notes

weights must be a DataArray and cannot contain missing values. Missing values are replaced with 0 using
weights.fillna(0).

xcdat.temporal.TemporalAccessor

class xcdat.temporal.TemporalAccessor(dataset)
An accessor class that provides temporal attributes and methods on xarray Datasets through the .temporal
attribute.
This accessor class requires the dataset's time coordinates to be decoded as np.datetime64 or cftime.datetime objects. The dataset must also have time bounds to generate weights for weighted calculations and to infer the grouping time frequency in average() (single snapshot average).

Examples

Import TemporalAccessor class:

>>> import xcdat # or from xcdat import temporal

Use TemporalAccessor class:

>>> ds = xcdat.open_dataset("/path/to/file")
>>>
>>> ds.temporal.<attribute>

>>> ds.temporal.<method>
>>> ds.temporal.<property>

Check the ‘axis’ attribute is set on the time coordinates:

>>> ds.time.attrs["axis"]
'T'

Set the ‘axis’ attribute for the time coordinates if it isn’t set:

>>> ds.time.attrs["axis"] = "T"

Parameters
dataset (xr.Dataset) – A Dataset object.

__init__(dataset)

Methods

__init__(dataset)

average(data_var[, weighted, keep_weights])
    Returns a Dataset with the average of a data variable and the time dimension removed.
climatology(data_var, freq[, weighted, ...])
    Returns a Dataset with the climatology of a data variable.
departures(data_var, freq[, weighted, ...])
    Returns a Dataset with the climatological departures (anomalies) for a data variable.
group_average(data_var, freq[, weighted, ...])
    Returns a Dataset with the average of a data variable by time group.

average(data_var, weighted=True, keep_weights=False)
Returns a Dataset with the average of a data variable and the time dimension removed.
This method infers the time grouping frequency by checking the distance between a set of upper and lower
time bounds. This method is particularly useful for calculating the weighted averages of monthly or yearly
time series data because the number of days per month/year can vary based on the calendar type, which can
affect weighting. For other frequencies, the distribution of weights will be equal so weighted=True is the
same as weighted=False.
Time bounds are used for inferring the time series frequency and for generating weights (refer to the
weighted parameter documentation below).
Parameters
• data_var (str) – The key of the data variable for calculating averages.
• weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for each coordinate point
using the difference of its upper and lower bounds. The time lengths are grouped, then
each time length is divided by the total sum of the time lengths to get the weight of each
coordinate point.


The weight of masked (missing) data is excluded when averages are taken. This is the same
as giving them a weight of 0.
• keep_weights (bool, optional) – If calculating averages using weights, keep the weights
in the final dataset output, by default False.
Returns
xr.Dataset – Dataset with the average of the data variable and the time dimension removed.

Examples

Get weighted averages for a monthly time series data variable:

>>> ds_month = ds.temporal.average("ts")
>>> ds_month.ts
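To see why weighting matters for monthly data, here is a plain NumPy sketch of a weighted annual mean where each month's weight is its length in days divided by the year's total (toy values, not xcdat code):

```python
import numpy as np

# Days per month for a non-leap year; each month's weight is its share
# of the year's total number of days.
days = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31], dtype=float)
weights = days / days.sum()

tas = np.arange(1.0, 13.0)  # toy monthly means

weighted_mean = np.sum(tas * weights)
unweighted_mean = tas.mean()
```

Because months differ in length, the weighted mean differs slightly from the plain mean of the twelve monthly values.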

group_average(data_var, freq, weighted=True, keep_weights=False, season_config={'custom_seasons': None, 'dec_mode': 'DJF', 'drop_incomplete_djf': False})
Returns a Dataset with the average of a data variable by time group.
Time bounds are used for generating weights to calculate weighted group averages (refer to the weighted
parameter documentation below).
Parameters
• data_var (str) – The key of the data variable for calculating time series averages.
• freq (Frequency) – The time frequency to group by.
– “year”: groups by year for yearly averages.
– “season”: groups by (year, season) for seasonal averages.
– “month”: groups by (year, month) for monthly averages.
– “day”: groups by (year, month, day) for daily averages.
– “hour”: groups by (year, month, day, hour) for hourly averages.
• weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for each coordinate point
using the difference of its upper and lower bounds. The time lengths are grouped, then
each time length is divided by the total sum of the time lengths to get the weight of each
coordinate point.
The weight of masked (missing) data is excluded when averages are calculated. This is the
same as giving them a weight of 0.
• keep_weights (bool, optional) – If calculating averages using weights, keep the weights
in the final dataset output, by default False.
• season_config (SeasonConfigInput, optional) – A dictionary for “season” frequency
configurations. If configs for predefined seasons are passed, configs for custom seasons
are ignored and vice versa.
Configs for predefined seasons:
– “dec_mode” (Literal[“DJF”, “JFD”], by default “DJF”)
The mode for the season that includes December.
∗ “DJF”: season includes the previous year December.


∗ “JFD”: season includes the same year December.


Xarray labels the season with December as “DJF”, but it is actually “JFD”.
– “drop_incomplete_djf” (bool, by default False)
If the “dec_mode” is “DJF”, this flag drops (True) or keeps (False) time coordinates
that fall under incomplete DJF seasons. Incomplete DJF seasons include the start
year Jan/Feb and the end year Dec.
Configs for custom seasons:
– “custom_seasons” ([List[List[str]]], by default None)
List of sublists containing month strings, with each sublist representing a custom
season.
∗ Month strings must be in the three letter format (e.g., ‘Jan’)
∗ Each month must be included once in a custom season
∗ Order of the months in each custom season does not matter
∗ Custom seasons can vary in length

>>> # Example of custom seasons in a three month format:
>>> custom_seasons = [
...     ["Jan", "Feb", "Mar"],  # "JanFebMar"
...     ["Apr", "May", "Jun"],  # "AprMayJun"
...     ["Jul", "Aug", "Sep"],  # "JulAugSep"
...     ["Oct", "Nov", "Dec"],  # "OctNovDec"
... ]

Returns
xr.Dataset – Dataset with the average of a data variable by time group.

Examples

Get seasonal averages for a data variable:

>>> ds_season = ds.temporal.group_average(
...     "ts",
...     "season",
...     season_config={
...         "dec_mode": "DJF",
...         "drop_incomplete_djf": True
...     }
... )
>>> ds_season.ts
>>>
>>> ds_season_with_jfd = ds.temporal.group_average(
...     "ts",
...     "season",
...     season_config={"dec_mode": "JFD"}
... )
>>> ds_season_with_jfd.ts

Get seasonal averages with custom seasons for a data variable:


>>> custom_seasons = [
...     ["Jan", "Feb", "Mar"],  # "JanFebMar"
...     ["Apr", "May", "Jun"],  # "AprMayJun"
...     ["Jul", "Aug", "Sep"],  # "JulAugSep"
...     ["Oct", "Nov", "Dec"],  # "OctNovDec"
... ]
>>>
>>> ds_season_custom = ds.temporal.group_average(
...     "ts",
...     "season",
...     season_config={"custom_seasons": custom_seasons}
... )

Get the group_average() operation attributes:

>>> ds_season_with_djf.ts.attrs
{
'operation': 'temporal_avg',
'mode': 'average',
'freq': 'season',
'weighted': 'True',
'dec_mode': 'DJF',
'drop_incomplete_djf': 'False'
}

climatology(data_var, freq, weighted=True, keep_weights=False, reference_period=None, season_config={'custom_seasons': None, 'dec_mode': 'DJF', 'drop_incomplete_djf': False})
Returns a Dataset with the climatology of a data variable.
Time bounds are used for generating weights to calculate weighted climatology (refer to the weighted
parameter documentation below).
Parameters
• data_var (str) – The key of the data variable for calculating climatology.
• freq (Frequency) – The time frequency to group by.
– “season”: groups by season for the seasonal cycle climatology.
– “month”: groups by month for the annual cycle climatology.
– “day”: groups by (month, day) for the daily cycle climatology. If the CF
calendar type is "gregorian", "proleptic_gregorian", or "standard",
leap days (if present) are dropped to avoid inconsistencies when calculating
climatologies. Refer to [1] for more details on this implementation decision.
• weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for each coordinate
point using the difference of its upper and lower bounds. The time lengths are
grouped, then each time length is divided by the total sum of the time lengths to
get the weight of each coordinate point.
The weight of masked (missing) data is excluded when averages are taken. This
is the same as giving them a weight of 0.
[1] https://github.com/xCDAT/xcdat/discussions/332


• keep_weights (bool, optional) – If calculating averages using weights, keep the
weights in the final dataset output, by default False.
• reference_period (Optional[Tuple[str, str]], optional) – The climatolog-
ical reference period, which is a subset of the entire time series. This pa-
rameter accepts a tuple of strings in the format ‘yyyy-mm-dd’. For example,
('1850-01-01', '1899-12-31'). If no value is provided, the climatological
reference period will be the full period covered by the dataset.
• season_config (SeasonConfigInput, optional) – A dictionary for “season”
frequency configurations. If configs for predefined seasons are passed, configs
for custom seasons are ignored and vice versa.
Configs for predefined seasons:
– “dec_mode” (Literal[“DJF”, “JFD”], by default “DJF”)
The mode for the season that includes December.
∗ “DJF”: season includes the previous year December.
∗ “JFD”: season includes the same year December.
Xarray labels the season with December as “DJF”, but it is actually
“JFD”.
– “drop_incomplete_djf” (bool, by default False)
If the “dec_mode” is “DJF”, this flag drops (True) or keeps (False) time
coordinates that fall under incomplete DJF seasons. Incomplete DJF seasons
include the start year Jan/Feb and the end year Dec.
Configs for custom seasons:
– “custom_seasons” ([List[List[str]]], by default None)
List of sublists containing month strings, with each sublist representing
a custom season.
∗ Month strings must be in the three letter format (e.g., ‘Jan’)
∗ Each month must be included once in a custom season
∗ Order of the months in each custom season does not matter
∗ Custom seasons can vary in length

>>> # Example of custom seasons in a three month format:
>>> custom_seasons = [
...     ["Jan", "Feb", "Mar"],  # "JanFebMar"
...     ["Apr", "May", "Jun"],  # "AprMayJun"
...     ["Jul", "Aug", "Sep"],  # "JulAugSep"
...     ["Oct", "Nov", "Dec"],  # "OctNovDec"
... ]

Returns
xr.Dataset – Dataset with the climatology of a data variable.


References

Examples

Get a data variable’s seasonal climatology:

>>> ds_season = ds.temporal.climatology(
...     "ts",
...     "season",
...     season_config={
...         "dec_mode": "DJF",
...         "drop_incomplete_djf": True
...     }
... )
>>> ds_season.ts
>>>
>>> ds_season = ds.temporal.climatology(
...     "ts",
...     "season",
...     season_config={"dec_mode": "JFD"}
... )
>>> ds_season.ts

Get a data variable’s seasonal climatology with custom seasons:

>>> custom_seasons = [
...     ["Jan", "Feb", "Mar"],  # "JanFebMar"
...     ["Apr", "May", "Jun"],  # "AprMayJun"
...     ["Jul", "Aug", "Sep"],  # "JulAugSep"
...     ["Oct", "Nov", "Dec"],  # "OctNovDec"
... ]
>>>
>>> ds_season_custom = ds.temporal.climatology(
...     "ts",
...     "season",
...     season_config={"custom_seasons": custom_seasons}
... )

Get climatology() operation attributes:

>>> ds_season_with_djf.ts.attrs
{
'operation': 'temporal_avg',
'mode': 'climatology',
'freq': 'season',
'weighted': 'True',
'dec_mode': 'DJF',
'drop_incomplete_djf': 'False'
}

departures(data_var, freq, weighted=True, keep_weights=False, reference_period=None, season_config={'custom_seasons': None, 'dec_mode': 'DJF', 'drop_incomplete_djf': False})
Returns a Dataset with the climatological departures (anomalies) for a data variable.
Returns a Dataset with the climatological departures (anomalies) for a data variable.
In climatology, “anomalies” refer to the difference between the value during a given time interval (e.g.,

the January average surface air temperature) and the long-term average value for that time interval (e.g.,
the average surface temperature over the last 30 Januaries).
Time bounds are used for generating weights to calculate weighted climatology (refer to the weighted
parameter documentation below).
Parameters
• data_var (str) – The key of the data variable for calculating departures.
• freq (Frequency) – The frequency of time to group by.
– “season”: groups by season for the seasonal cycle departures.
– “month”: groups by month for the annual cycle departures.
– “day”: groups by (month, day) for the daily cycle departures. If the CF
calendar type is "gregorian", "proleptic_gregorian", or "standard",
leap days (if present) are dropped to avoid inconsistencies when calculating
climatologies. Refer to [2] for more details on this implementation decision.
• weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for each coordinate
point using the difference of its upper and lower bounds. The time lengths are
grouped, then each time length is divided by the total sum of the time lengths to
get the weight of each coordinate point.
The weight of masked (missing) data is excluded when averages are taken. This
is the same as giving them a weight of 0.
• keep_weights (bool, optional) – If calculating averages using weights, keep the
weights in the final dataset output, by default False.
• reference_period (Optional[Tuple[str, str]], optional) – The climatolog-
ical reference period, which is a subset of the entire time series and used for
calculating departures. This parameter accepts a tuple of strings in the format
‘yyyy-mm-dd’. For example, ('1850-01-01', '1899-12-31'). If no value is
provided, the climatological reference period will be the full period covered by
the dataset.
• season_config (SeasonConfigInput, optional) – A dictionary for “season”
frequency configurations. If configs for predefined seasons are passed, configs
for custom seasons are ignored and vice versa.
Configs for predefined seasons:
– “dec_mode” (Literal[“DJF”, “JFD”], by default “DJF”)
The mode for the season that includes December.
∗ “DJF”: season includes the previous year December.
∗ “JFD”: season includes the same year December.
Xarray labels the season with December as “DJF”, but it is actually
“JFD”.
– “drop_incomplete_djf” (bool, by default False)
If the “dec_mode” is “DJF”, this flag drops (True) or keeps (False) time
coordinates that fall under incomplete DJF seasons. Incomplete DJF seasons
include the start year Jan/Feb and the end year Dec.
Configs for custom seasons:
[2] https://github.com/xCDAT/xcdat/discussions/332


– “custom_seasons” ([List[List[str]]], by default None)
List of sublists containing month strings, with each sublist representing
a custom season.
∗ Month strings must be in the three letter format (e.g., ‘Jan’)
∗ Each month must be included once in a custom season
∗ Order of the months in each custom season does not matter
∗ Custom seasons can vary in length

>>> # Example of custom seasons in a three month format:
>>> custom_seasons = [
...     ["Jan", "Feb", "Mar"],  # "JanFebMar"
...     ["Apr", "May", "Jun"],  # "AprMayJun"
...     ["Jul", "Aug", "Sep"],  # "JulAugSep"
...     ["Oct", "Nov", "Dec"],  # "OctNovDec"
... ]

Returns
xr.Dataset – The Dataset containing the departures for a data var’s climatology.

Notes

This method uses xarray’s grouped arithmetic as a shortcut for mapping over all unique labels. Grouped
arithmetic works by assigning a grouping label to each time coordinate of the observation data based
on the averaging mode and frequency. Afterwards, the corresponding climatology is removed from the
observation data at each time coordinate based on the matching labels.
Refer to [3] to learn more about how xarray's grouped arithmetic works.

References

Examples

Get a data variable’s annual cycle departures:

>>> ds_depart = ds_climo.temporal.departures("ts", "month")

Get the departures() operation attributes:

>>> ds_depart.ts.attrs
{
'operation': 'departures',
'frequency': 'season',
'weighted': 'True',
'dec_mode': 'DJF',
'drop_incomplete_djf': 'False'
}
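The grouped-arithmetic idea behind departures can be sketched without xarray (a toy NumPy illustration with hypothetical month labels and values):

```python
import numpy as np

# Label each time step by month, compute the per-month climatology,
# then subtract the matching climatology from every time step.
months = np.array([1, 2, 1, 2, 1, 2])
values = np.array([10.0, 20.0, 12.0, 22.0, 14.0, 24.0])

climatology = {m: values[months == m].mean() for m in np.unique(months)}
departures = values - np.array([climatology[m] for m in months])
```

Each departure is the observation minus the long-term mean for its group, so the departures within any one group sum to zero.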

[3] https://xarray.pydata.org/en/stable/user-guide/groupby.html#grouped-arithmetic


_averager(data_var, mode, freq, weighted=True, keep_weights=False, reference_period=None, season_config={'custom_seasons': None, 'dec_mode': 'DJF', 'drop_incomplete_djf': False})
Averages a data variable based on the averaging mode and frequency.
_set_data_var_attrs(data_var)
Sets data variable metadata as object attributes and checks whether the time axis is decoded.
This includes the name of the data variable, the time axis dimension name, the calendar type and its
corresponding cftime object (date type).
Parameters
data_var (str) – The key of the data variable.
Raises
• TypeError – If the data variable’s time coordinates are not encoded as datetime-
like objects.
• KeyError – If the data variable does not have a “calendar” encoding attribute.
_set_arg_attrs(mode, freq, weighted, reference_period=None, season_config={'custom_seasons': None,
'dec_mode': 'DJF', 'drop_incomplete_djf': False})
Validates method arguments and sets them as object attributes.
Parameters
• mode (Mode) – The mode for temporal averaging.
• freq (Frequency) – The frequency of time to group by.
• weighted (bool) – Calculate averages using weights.
• season_config (Optional[SeasonConfigInput]) – A dictionary for “season”
frequency configurations. If configs for predefined seasons are passed,
configs for custom seasons are ignored and vice versa, by default
DEFAULT_SEASON_CONFIG.
Raises
• KeyError – If the Dataset does not have a time dimension.
• ValueError – If an incorrect freq arg was passed.
• ValueError – If an incorrect dec_mode arg was passed.
_is_valid_reference_period(reference_period)

_form_seasons(custom_seasons)
Forms custom seasons from a nested list of months.
This method concatenates the strings in each sublist to form a flat list of custom season strings.
Parameters
custom_seasons (List[List[str]]) – List of sublists containing month strings, with
each sublist representing a custom season.
Returns
Dict[str, List[str]] – A dictionary with the keys being the custom season and the
values being the corresponding list of months.
Raises
• ValueError – If exactly 12 months are not passed in the list of custom seasons.
• ValueError – If a duplicate month(s) were found in the list of custom seasons.
• ValueError – If a month string(s) is not supported.
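The concatenation step can be sketched as follows (validation of the 12-month and no-duplicates rules is omitted; the function name mirrors the private method but is an assumption):

```python
def form_seasons(custom_seasons):
    # Join the month abbreviations in each sublist into the custom
    # season's name, e.g., ["Jan", "Feb", "Mar"] -> "JanFebMar", and
    # map that name back to its list of months.
    return {"".join(months): months for months in custom_seasons}

seasons = form_seasons([["Jan", "Feb", "Mar"], ["Apr", "May", "Jun"]])
```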


_preprocess_dataset(ds)
Preprocess the dataset based on averaging settings.
Preprocessing operations include:
• Drop incomplete DJF seasons (leading/trailing)
• Drop leap days
Parameters
ds (xr.Dataset) – The dataset.
Returns
xr.Dataset
_drop_incomplete_djf(dataset)
Drops incomplete DJF seasons within a continuous time series.
This method assumes that the time series is continuous and removes the leading and trailing incomplete
seasons (e.g., the first January and February of a time series that are not complete, because the December
of the previous year is missing). This method does not account for or remove missing time steps anywhere
else.
Parameters
dataset (xr.Dataset) – The dataset with some possibly incomplete DJF seasons.
Returns
xr.Dataset – The dataset with only complete DJF seasons.
_drop_leap_days(ds)
Drop leap days from time coordinates.
This method is used to drop 2/29 from leap years (if present) before calculating climatology/departures
for high frequency time series data to avoid cftime breaking (e.g., ValueError: invalid day number provided in
cftime.DatetimeProlepticGregorian(1, 2, 29, 0, 0, 0, 0, has_year_zero=True)).
Parameters
ds (xr.Dataset) – The dataset.
Returns
xr.Dataset
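A NumPy sketch of filtering Feb 29 from a daily time axis (illustrative only; the accessor operates on the dataset's decoded time coordinates instead):

```python
import numpy as np

times = np.array(
    ["2000-02-28", "2000-02-29", "2000-03-01"], dtype="datetime64[D]"
)

# Derive the month and day-of-month of each timestamp, then mask out
# any Feb 29 entries.
month = times.astype("datetime64[M]").astype(int) % 12 + 1
day = (times - times.astype("datetime64[M]")).astype(int) + 1
filtered = times[~((month == 2) & (day == 29))]
```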
_average(data_var, time_bounds)
Averages a data variable with the time dimension removed.
Parameters
• data_var (xr.DataArray) – The data variable.
• time_bounds (xr.DataArray) – The time bounds.
Returns
xr.DataArray – The averages for a data variable with the time dimension removed.
_group_average(data_var, time_bounds)
Averages a data variable by time group.
Parameters
• data_var (xr.DataArray) – The data variable.
• time_bounds (xr.DataArray) – The time bounds.
Returns
xr.DataArray – The data variable averaged by time group.


_get_weights(time_bounds)
Calculates weights for a data variable using time bounds.
This method gets the length of time for each coordinate point by using the difference in the upper and lower
time bounds. This approach ensures that the correct time lengths are calculated regardless of how time
coordinates are recorded (e.g., monthly, daily, hourly) and the calendar type used.
The time lengths are labeled and grouped, then each time length is divided by the total sum of the time
lengths in its group to get its corresponding weight.
The sum of the weights for each group is validated to ensure it equals 1.0.
Parameters
time_bounds (xr.DataArray) – The time bounds.
Returns
xr.DataArray – The weights based on a specified frequency.
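The grouping and normalization can be sketched in NumPy (toy lengths and hypothetical year labels; the real method derives lengths from the time bounds):

```python
import numpy as np

years = np.array([2000, 2000, 2001, 2001, 2001])    # group labels
lengths = np.array([31.0, 29.0, 31.0, 28.0, 31.0])  # days per time step

# Divide each time length by the total length of its group so the
# weights within every group sum to exactly 1.0.
totals = {y: lengths[years == y].sum() for y in np.unique(years)}
weights = lengths / np.array([totals[y] for y in years])
```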

Notes

Refer to [4] for the supported CF convention calendar types.

References

_group_data(data_var)
Groups a data variable.
This method groups a data variable by a single datetime component for the “average” mode or labeled time
coordinates for all other modes.
Parameters
data_var (xr.DataArray) – A data variable.
Returns
DataArrayGroupBy – A data variable grouped by label.
_label_time_coords(time_coords)
Labels time coordinates with a group for grouping.
This method labels time coordinates for grouping by first extracting specific xarray datetime components
from time coordinates and storing them in a pandas DataFrame. After processing (if necessary) is per-
formed on the DataFrame, it is converted to a numpy array of datetime objects. This numpy array serves
as the data source for the final DataArray of labeled time coordinates.
Parameters
time_coords (xr.DataArray) – The time coordinates.
Returns
xr.DataArray – The DataArray of labeled time coordinates for grouping.
[4] https://cfconventions.org/cf-conventions/cf-conventions.html#calendar


Examples

Original daily time coordinates:

>>> <xarray.DataArray 'time' (time: 4)>


>>> array(['2000-01-01T12:00:00.000000000',
>>> '2000-01-31T21:00:00.000000000',
>>> '2000-03-01T21:00:00.000000000',
>>> '2000-04-01T03:00:00.000000000'],
>>> dtype='datetime64[ns]')
>>> Coordinates:
>>> * time (time) datetime64[ns] 2000-01-01T12:00:00 ... 2000-04-01T03:00:00

Daily time coordinates labeled by year and month:

>>> <xarray.DataArray 'time' (time: 3)>


>>> array(['2000-01-01T00:00:00.000000000',
>>> '2000-03-01T00:00:00.000000000',
>>> '2000-04-01T00:00:00.000000000'],
>>> dtype='datetime64[ns]')
>>> Coordinates:
>>> * time (time) datetime64[ns] 2000-01-01T00:00:00 ... 2000-04-01T00:00:00
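The year/month labeling shown above can be mimicked with plain datetime objects (a simplified sketch, not xCDAT's implementation):

```python
from datetime import datetime

# The four daily coordinates from the example above.
coords = [
    datetime(2000, 1, 1, 12),
    datetime(2000, 1, 31, 21),
    datetime(2000, 3, 1, 21),
    datetime(2000, 4, 1, 3),
]

# Label each coordinate by its (year, month) components; the remaining
# components are reset to defaults. Duplicate labels collapse, so the two
# January coordinates share one label, leaving three groups.
labels = sorted({datetime(c.year, c.month, 1) for c in coords})
print(labels)  # 2000-01-01, 2000-03-01, 2000-04-01
```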

_get_df_dt_components(time_coords)
Returns a DataFrame of xarray datetime components.
This method extracts the applicable xarray datetime components from each time coordinate based on the
averaging mode and frequency, and stores them in a DataFrame.
Additional processing is performed for the seasonal frequency, including:
• If custom seasons are used, map them to each time coordinate based on the middle month of the
custom season.
• If the season with December is “DJF”, shift Decembers over to the next year so DJF seasons are correctly
grouped using the previous year's December.
• Drop obsolete columns after processing is done.
Parameters
time_coords (xr.DataArray) – The time coordinates.
Returns
pd.DataFrame – A DataFrame of datetime components.

Notes

Refer to [5] for information on xarray datetime accessor components.


[5] https://xarray.pydata.org/en/stable/user-guide/time-series.html#datetime-components

156 Chapter 10. License



References

_process_season_df(df )
Processes a DataFrame of datetime components for the season frequency.
Parameters
df (pd.DataFrame) – A DataFrame of xarray datetime components.
Returns
pd.DataFrame – A DataFrame of processed xarray datetime components.
_map_months_to_custom_seasons(df )
Maps the month column in the DataFrame to a custom season.
This method maps each integer value in the “month” column to its string representation, which then maps
to a custom season that is stored in the “season” column. For example, the month of 1 maps to “Jan” and
“Jan” maps to the “JanFebMar” custom season.
Parameters
df (pd.DataFrame) – The DataFrame of xarray datetime components.
Returns
pd.DataFrame – The DataFrame of xarray datetime coordinates, with each row
mapped to a custom season.
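The two-step mapping (month integer to abbreviation, abbreviation to custom season) can be sketched as follows; the season names and month set are hypothetical, not xCDAT's internals:

```python
# Month integers map to string abbreviations, which map to a custom season.
MONTH_ABBR = {1: "Jan", 2: "Feb", 3: "Mar", 4: "Apr", 5: "May", 6: "Jun"}
CUSTOM_SEASONS = {  # hypothetical custom seasons
    "Jan": "JanFebMar", "Feb": "JanFebMar", "Mar": "JanFebMar",
    "Apr": "AprMayJun", "May": "AprMayJun", "Jun": "AprMayJun",
}

# Rows standing in for the DataFrame of datetime components.
rows = [{"year": 2000, "month": m} for m in (1, 2, 4)]
for row in rows:
    # Store the custom season derived from the month in a "season" column.
    row["season"] = CUSTOM_SEASONS[MONTH_ABBR[row["month"]]]

print([r["season"] for r in rows])  # ['JanFebMar', 'JanFebMar', 'AprMayJun']
```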
_shift_decembers(df_season)
Shifts Decembers over to the next year for “DJF” seasons in-place.
For “DJF” seasons, Decembers must be shifted over to the next year in order for the xarray groupby oper-
ation to correctly label and group the corresponding time coordinates. If they aren't shifted over, grouping
is incorrectly performed with the native xarray “DJF” season (which is actually “JFD”).
Parameters
df_season (pd.DataFrame) – The DataFrame of xarray datetime components pro-
duced using the “season” frequency.
Returns
pd.DataFrame – The DataFrame of xarray datetime components with Decembers
shifted over to the next year.

Examples

Comparison of “JFD” and “DJF” seasons:

>>> # "JFD" (native xarray behavior)


>>> [(2000, "DJF", 1), (2000, "DJF", 2), (2000, "DJF", 12),
>>> (2001, "DJF", 1), (2001, "DJF", 2)]

>>> # "DJF" (shifted Decembers)


>>> [(2000, "DJF", 1), (2000, "DJF", 2), (2001, "DJF", 12),
>>> (2001, "DJF", 1), (2001, "DJF", 2)]
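The shift in the comparison above amounts to incrementing the year of every December row, which can be sketched as (column layout is illustrative):

```python
# (year, season, month) rows mirroring the "JFD" example above.
rows = [(2000, "DJF", 1), (2000, "DJF", 2), (2000, "DJF", 12),
        (2001, "DJF", 1), (2001, "DJF", 2)]

# Shift each December's year forward so it groups with the following
# January/February rather than forming the native xarray "JFD" season.
shifted = [(year + 1 if month == 12 else year, season, month)
           for year, season, month in rows]
print(shifted)
# [(2000, 'DJF', 1), (2000, 'DJF', 2), (2001, 'DJF', 12),
#  (2001, 'DJF', 1), (2001, 'DJF', 2)]
```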

_map_seasons_to_mid_months(df )
Maps the season column values to the integer of its middle month.
DateTime objects don’t support storing seasons as strings, so the middle months are used to represent the
season. For example, for the season “DJF”, the middle month “J” is mapped to the integer value 1.


The middle month of a custom season is extracted using the ceiling of the middle index from its list of
months. For example, for the custom season “FebMarAprMay” with the list of months [“Feb”, “Mar”,
“Apr”, “May”], the (one-based) index 3 is used to get the month “Apr”. “Apr” is then mapped to the integer value 4.
After mapping the season to its month, the “season” column is renamed to “month”.
Parameters
df (pd.DataFrame) – The dataframe of datetime components, including a “season”
column.
Returns
pd.DataFrame – The dataframe of datetime components, including a “month” column.
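The middle-month rule can be sketched for both custom and predefined seasons (a simplified illustration; the ceiling of the one-based middle position is the element at index `len(months) // 2` zero-based):

```python
MONTH_NUM = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
             "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

def season_mid_month(months):
    # Ceiling of the middle position: for 3 months this picks the 2nd,
    # for 4 months the 3rd (e.g., "Apr" in ["Feb", "Mar", "Apr", "May"]).
    return MONTH_NUM[months[len(months) // 2]]

print(season_mid_month(["Dec", "Jan", "Feb"]))         # 1 ("Jan")
print(season_mid_month(["Feb", "Mar", "Apr", "May"]))  # 4 ("Apr")
```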
_drop_obsolete_columns(df_season)
Drops obsolete columns from the DataFrame of xarray datetime components.
For the “season” frequency, processing is required on the DataFrame of xarray datetime components,
such as mapping custom seasons based on the month. Additional datetime component values must be
included as DataFrame columns, which become obsolete after processing is done. The obsolete columns
are dropped from the DataFrame before grouping time coordinates.
Parameters
df_season (pd.DataFrame) – The DataFrame of time coordinates for the “season”
frequency with obsolete columns.
Returns
pd.DataFrame – The DataFrame of time coordinates for the “season” frequency with
obsolete columns dropped.
_convert_df_to_dt(df )
Converts a DataFrame of datetime components to cftime datetime objects.
datetime objects require at least a year, month, and day value. However, some modes and time frequencies
don't require a year, month, and/or day for grouping. For these cases, default values of 1 are used in order
to meet this datetime requirement.
Parameters
df (pd.DataFrame) – The DataFrame of xarray datetime components.
Returns
np.ndarray – A numpy ndarray of cftime.datetime objects.

Notes

Refer to [6] and [7] for more information on the Timestamp-valid range. We use cftime.datetime objects to avoid
these time range issues.

References

_keep_weights(ds)
Keep the weights in the dataset.
Parameters
ds (xr.Dataset) – The dataset.
Returns
xr.Dataset – The dataset with the weights used for averaging.

[6] https://docs.xarray.dev/en/stable/user-guide/weather-climate.html#non-standard-calendars-and-dates-outside-the-timestamp-valid-range
[7] https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations


_add_operation_attrs(data_var)
Adds attributes to the data variable describing the operation. These attributes distinguish a data variable
that has been operated on from its original state. The attributes in netCDF4 files do not support booleans
or nested dictionaries, so booleans are converted to strings and nested dictionaries are unpacked.
Parameters
data_var (xr.DataArray) – The data variable.
Returns
xr.DataArray – The data variable with temporal averaging attributes.

xcdat.regridder.accessor.RegridderAccessor

class xcdat.regridder.accessor.RegridderAccessor(dataset)
An accessor class that provides regridding attributes and methods for xarray Datasets through the .regridder
attribute.

Examples

Import xCDAT:

>>> import xcdat

Use RegridderAccessor class:

>>> ds = xcdat.open_dataset("...")
>>>
>>> ds.regridder.<attribute>
>>> ds.regridder.<method>
>>> ds.regridder.<property>

Parameters
dataset (xr.Dataset) – The Dataset to attach this accessor.
__init__(dataset)

Methods

__init__(dataset)
horizontal(data_var, output_grid[, tool]) – Transform data_var to output_grid.
horizontal_regrid2(data_var, output_grid, ...) – Deprecated, will be removed with the 0.7.0 release.
horizontal_xesmf(data_var, output_grid, ...) – Deprecated, will be removed with the 0.7.0 release.
vertical(data_var, output_grid[, tool]) – Transform data_var to output_grid.


Attributes

grid – Extract the X, Y, and Z axes from the Dataset and return a new xr.Dataset.

_ds

property grid
Extract the X, Y, and Z axes from the Dataset and return a new xr.Dataset.
Returns
xr.Dataset – Containing grid axes.
Raises
• ValueError – If axis dimension coordinate variable is not correctly identified.
• ValueError – If axis has multiple dimensions (only one is expected).

Examples

Import xCDAT:

>>> import xcdat

Open a dataset:

>>> ds = xcdat.open_dataset("...")

Extract grid from dataset:

>>> grid = ds.regridder.grid

_get_axis_data(name)

horizontal_xesmf(data_var, output_grid, **options)


Deprecated, will be removed with 0.7.0 release.
Extends the xESMF library for horizontal regridding between structured rectilinear and curvilinear grids.
This method extends xESMF by automatically constructing the xe.XESMFRegridder object, preserving
source bounds, and generating missing bounds. It regrids data_var in the dataset to output_grid.
Options documentation: xcdat.regridder.xesmf.XESMFRegridder()
Parameters
• data_var (str) – Name of the variable in the xr.Dataset to regrid.
• output_grid (xr.Dataset) – Dataset containing output grid.
• options (Dict[str, Any]) – Dictionary with extra parameters for the regridder.
Returns
xr.Dataset – With the data_var variable on the grid defined in output_grid.
Raises
ValueError – If tool is not supported.


Examples

Generate output grid:

>>> output_grid = xcdat.create_gaussian_grid(32)

Regrid data to output grid using xesmf:

>>> ds.regridder.horizontal_xesmf("ts", output_grid)

horizontal_regrid2(data_var, output_grid, **options)


Deprecated, will be removed with 0.7.0 release.
Pure python implementation of CDAT’s regrid2 horizontal regridder.
Regrids data_var in dataset to output_grid using regrid2’s algorithm.
Options documentation: xcdat.regridder.regrid2.Regrid2Regridder()
Parameters
• data_var (str) – Name of the variable in the xr.Dataset to regrid.
• output_grid (xr.Dataset) – Dataset containing output grid.
• options (Dict[str, Any]) – Dictionary with extra parameters for the regridder.
Returns
xr.Dataset – With the data_var variable on the grid defined in output_grid.
Raises
ValueError – If tool is not supported.

Examples

Generate output grid:

>>> output_grid = xcdat.create_gaussian_grid(32)

Regrid data to output grid using regrid2:

>>> ds.regridder.horizontal_regrid2("ts", output_grid)

horizontal(data_var, output_grid, tool='xesmf', **options)


Transform data_var to output_grid.
When might Regrid2 be preferred over xESMF?
If performing conservative regridding from a high/medium resolution lat/lon grid to a coarse lat/lon target,
Regrid2 may provide better results as it assumes grid cells with constant latitudes and longitudes, while
xESMF assumes the cells are connected by Great Circles [1].
Supported tools, methods and grids:
• xESMF (https://pangeo-xesmf.readthedocs.io/en/latest/)
– Methods: Bilinear, Conservative, Conservative Normed, Patch, Nearest s2d, or Nearest
d2s.
– Grids: Rectilinear, or Curvilinear.
[1] https://earthsystemmodeling.org/docs/release/ESMF_8_1_0/ESMF_refdoc/node5.html#SECTION05012900000000000000


– Find options at xcdat.regridder.xesmf.XESMFRegridder()


• Regrid2
– Methods: Conservative
– Grids: Rectilinear
– Find options at xcdat.regridder.regrid2.Regrid2Regridder()
Parameters
• data_var (str) – Name of the variable to transform.
• output_grid (xr.Dataset) – Grid to transform data_var to.
• tool (str) – Name of the tool to use.
• **options (Any) – These options are passed directly to the tool. See specific
regridder for available options.
Returns
xr.Dataset – With the data_var transformed to the output_grid.
Raises
ValueError – If tool is not supported.

References

Examples

Import xCDAT:

>>> import xcdat

Open a dataset:

>>> ds = xcdat.open_dataset("...")

Create output grid:

>>> output_grid = xcdat.create_uniform_grid(-90, 90, 4.0, -180, 180, 5.0)

Regrid variable using “xesmf”:

>>> output_data = ds.regridder.horizontal("ts", output_grid, tool="xesmf", method="bilinear")

Regrid variable using “regrid2”:

>>> output_data = ds.regridder.horizontal("ts", output_grid, tool="regrid2")

vertical(data_var, output_grid, tool='xgcm', **options)


Transform data_var to output_grid.
Supported tools:
• xgcm (https://xgcm.readthedocs.io/en/latest/index.html)
– Methods: Linear, Conservative, Log
– Find options at xcdat.regridder.xgcm.XGCMRegridder()


Parameters
• data_var (str) – Name of the variable to transform.
• output_grid (xr.Dataset) – Grid to transform data_var to.
• tool (str) – Name of the tool to use.
• **options (Any) – These options are passed directly to the tool. See specific
regridder for available options.
Returns
xr.Dataset – With the data_var transformed to the output_grid.
Raises
ValueError – If tool is not supported.

Examples

Import xCDAT:

>>> import xcdat

Open a dataset:

>>> ds = xcdat.open_dataset("...")

Create output grid:

>>> output_grid = xcdat.create_grid(lev=np.linspace(1000, 1, 20))

Regrid variable using “xgcm”:

>>> output_data = ds.regridder.vertical("so", output_grid, method="linear")

xcdat.regridder.regrid2.Regrid2Regridder

class xcdat.regridder.regrid2.Regrid2Regridder(input_grid, output_grid, **options)

__init__(input_grid, output_grid, **options)


Pure python implementation of the regrid2 horizontal regridder from CDMS2’s regrid2 module.
Regrid data from input_grid to output_grid.
Available options: None
Parameters
• input_grid (xr.Dataset) – Dataset containing the source grid.
• output_grid (xr.Dataset) – Dataset containing the destination grid.
• options (Any) – Dictionary with extra parameters for the regridder.


Examples

Import xCDAT:

>>> import xcdat

Open a dataset:

>>> ds = xcdat.open_dataset("...")

Create output grid:

>>> output_grid = xcdat.create_gaussian_grid(32)

Regrid data:

>>> output_data = ds.regridder.horizontal("ts", output_grid)

Methods

__init__(input_grid, output_grid, **options) – Pure python implementation of the regrid2 horizontal regridder from CDMS2's regrid2 module.
horizontal(data_var, ds) – See documentation in xcdat.regridder.regrid2.Regrid2Regridder()
vertical(data_var, ds) – Placeholder for base class.

vertical(data_var, ds)
Placeholder for base class.
horizontal(data_var, ds)
See documentation in xcdat.regridder.regrid2.Regrid2Regridder()
_output_axis_sizes(da)
Maps axes to output array sizes.
Parameters
da (xr.DataArray) – Data array containing variable to be regridded.
Returns
Dict – Mapping of axis name e.g. (“X”, “Y”, etc) to output sizes.
_regrid(input_data, axis_sizes, ordered_axis_names)
Applies regridding to input data.
Parameters
• input_data (np.ndarray) – Input multi-dimensional array on source grid.
• axis_sizes (Dict[str, int]) – Mapping of axis name e.g. (“X”, “Y”, etc) to
output sizes.
• ordered_axis_names (List[str]) – List of axis name in order of dimensions of
input_data.
Returns
np.ndarray – Multi-dimensional array on destination grid.


_base_put_indexes(axis_sizes)
Calculates the base indexes to place cell (0, 0).
Example: For a 3D array (time, lat, lon) with the shape (2, 2, 2), the offsets to place cell (0, 0) in each time
step would be [0, 4].
For a 4D array (time, plev, lat, lon) with shape (2, 2, 2, 2), the offsets to place cell (0, 0) in each (time, plev)
slice would be [0, 4, 8, 12].
Parameters
axis_sizes (Dict[str, int]) – Mapping of axis name e.g. (“X”, “Y”, etc) to output
sizes.
Returns
np.ndarray – Array containing the base indexes to be used in np.put operations.
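The offset arithmetic is straightforward to sketch: each lat/lon slice holds lat * lon cells, so cell (0, 0) of slice i sits at flat index i * lat * lon (a simplified illustration, not xCDAT's implementation):

```python
from math import prod

def base_put_indexes(shape):
    # shape = (leading axes..., lat, lon). The number of lat/lon slices is
    # the product of the leading axis sizes; each slice holds lat * lon cells.
    n_slices = prod(shape[:-2])
    slice_size = shape[-2] * shape[-1]
    return [i * slice_size for i in range(n_slices)]

print(base_put_indexes((2, 2, 2)))     # [0, 4]
print(base_put_indexes((2, 2, 2, 2)))  # [0, 4, 8, 12]
```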
_create_output_dataset(input_ds, data_var, output_data, axis_variable_name_map,
ordered_axis_names)
Creates the output Dataset containing the new variable on the destination grid.
Parameters
• input_ds (xr.Dataset) – Input dataset containing coordinates and bounds for
unmodified axes.
• data_var (str) – The name of the regridded variable.
• output_data (np.ndarray) – Output data array.
• axis_variable_name_map (Dict[str, str]) – Map of axis name e.g. (“X”,
“Y”, etc) to variable name e.g. (“lon”, “lat”, etc).
• ordered_axis_names (List[str]) – List of axis names in the order observed for
output_data.
Returns
xr.Dataset – Dataset containing the variable on the destination grid.

xcdat.regridder.xesmf.XESMFRegridder

class xcdat.regridder.xesmf.XESMFRegridder(input_grid, output_grid, method, periodic=False,


extrap_method=None, extrap_dist_exponent=None,
extrap_num_src_pnts=None, ignore_degenerate=True,
**options)

__init__(input_grid, output_grid, method, periodic=False, extrap_method=None,


extrap_dist_exponent=None, extrap_num_src_pnts=None, ignore_degenerate=True, **options)
Extension of xESMF regridder.
This class extends xESMF by automatically constructing the xesmf.XESMFRegridder object and ensuring
bounds and metadata are preserved in the output dataset.
The method argument can take any of the following values: bilinear, conservative, conservative_normed,
patch, nearest_s2d, or nearest_d2s. You can find a comparison of the methods in the xESMF documentation.
The extrap_method argument can take any of the following values: inverse_dist or nearest_s2d. This
argument along with extrap_dist_exponent and extrap_num_src_pnts can be used to configure
how extrapolation is applied.


The **options arguments are additional values passed to the xesmf.XESMFRegridder constructor. A
description of these arguments can be found on xESMF’s documentation.
Parameters
• input_grid (xr.Dataset) – Contains source grid coordinates.
• output_grid (xr.Dataset) – Contains destination grid coordinates.
• method (str) – The regridding method to apply, defaults to “bilinear”.
• periodic (bool) – Treat longitude as periodic, used for global grids.
• extrap_method (Optional[str]) – Extrapolation method, useful when moving
from a fine to coarse grid.
• extrap_dist_exponent (Optional[float]) – The exponent to raise the distance
to when calculating weights for the extrapolation method.
• extrap_num_src_pnts (Optional[int]) – The number of source points to use
for the extrapolation methods that use more than one source point.
• ignore_degenerate (bool) – Ignore degenerate cells when checking the in-
put_grid for errors. If set False, a degenerate cell produces an error.
This only applies to “conservative” and “conservative_normed” regridding meth-
ods.
• **options (Any) – Additional arguments passed to the underlying xesmf.
XESMFRegridder constructor.
Raises
• KeyError – If data variable does not exist in the Dataset.
• ValueError – If method is not valid.
• ValueError – If extrap_method is not valid.

Examples

Import xCDAT:

>>> import xcdat

Open a dataset:

>>> ds = xcdat.open_dataset("...")

Create output grid:

>>> output_grid = xcdat.create_gaussian_grid(32)

Regrid the “ts” variable using the “bilinear” method:

>>> output_data = ds.regridder.horizontal(
>>>     "ts", output_grid, tool="xesmf", method="bilinear"
>>> )

Passing additional values to xesmf.XESMFRegridder:


>>> output_data = ds.regridder.horizontal(
>>>     "ts", output_grid, tool="xesmf", method="bilinear", unmapped_to_nan=True
>>> )

Methods

__init__(input_grid, output_grid, method[, ...]) – Extension of xESMF regridder.
horizontal(data_var, ds) – See documentation in xcdat.regridder.xesmf.XESMFRegridder()
vertical(data_var, ds) – Placeholder for base class.

vertical(data_var, ds)
Placeholder for base class.
horizontal(data_var, ds)
See documentation in xcdat.regridder.xesmf.XESMFRegridder()

xcdat.regridder.xgcm.XGCMRegridder

class xcdat.regridder.xgcm.XGCMRegridder(input_grid, output_grid, method='linear', target_data=None,


grid_positions=None, periodic=False,
extra_init_options=None, **options)

__init__(input_grid, output_grid, method='linear', target_data=None, grid_positions=None,


periodic=False, extra_init_options=None, **options)
Extension of xgcm regridder.
The XGCMRegridder extends xgcm by automatically constructing the Grid object, transposing the output
data to match the dimensional order of the input data, and ensuring bounds and metadata are preserved in
the output dataset.
Linear and log methods require a single dimension position, which can usually be automatically derived.
A custom position can be specified using the grid_positions argument.
Conservative regridding requires multiple dimension positions, e.g., {“center”: “xc”, “left”: “xg”} which
can be passed using the grid_positions argument.
xgcm.Grid can be passed additional arguments using extra_init_options. These arguments can be
found on XGCM’s Grid documentation.
xgcm.Grid.transform can be passed additional arguments using options. These arguments can be
found on XGCM’s Grid.transform documentation.
Parameters
• input_grid (xr.Dataset) – Contains source grid coordinates.
• output_grid (xr.Dataset) – Contains destination grid coordinates.
• method (XGCMVerticalMethods) –
Regridding method, by default “linear”. Options are


– linear (default)
– log
– conservative
• target_data (Optional[Union[str, xr.DataArray]]) – Data to transform tar-
get data onto, either the key of a variable in the input dataset or an xr.DataArray,
by default None.
• grid_positions (Optional[Dict[str, str]]) – Mapping of dimension posi-
tions, by default None. If None then an attempt is made to derive this argument.
• periodic (Optional[bool]) – Whether the grid is periodic, by default False.
• extra_init_options (Optional[Dict[str, Any]]) – Extra options passed to the
xgcm.Grid constructor, by default None.
• options (Optional[Dict[str, Any]]) – Extra options passed to the xgcm.
Grid.transform method.
Raises
• KeyError – If data variable does not exist in the Dataset.
• ValueError – If method is not valid.

Examples

Import xCDAT:

>>> import xcdat

Open a dataset:

>>> ds = xcdat.open_dataset("...")

Create output grid:

>>> output_grid = xcdat.create_grid(lev=np.linspace(1000, 1, 5))

Regrid data to output_grid:

>>> output_data = ds.regridder.vertical(
>>>     "so", output_grid, tool="xgcm", method="linear"
>>> )

Create pressure variable:

>>> ds["pressure"] = (ds["hyam"] * ds["P0"] + ds["hybm"] * ds["PS"]).transpose(
>>>     *ds["T"].dims
>>> )

Regrid data to output_grid in pressure space:

>>> output_data = ds.regridder.vertical(
>>>     "so", output_grid, tool="xgcm", method="linear", target_data="pressure"
>>> )


Passing additional arguments to xgcm.Grid and xgcm.Grid.transform:

>>> regridder = xgcm.XGCMRegridder(
>>>     ds,
>>>     output_grid,
>>>     method="linear",
>>>     extra_init_options={"boundary": "fill", "fill_value": 1e27},
>>>     mask_edges=True,
>>> )

Methods

__init__(input_grid, output_grid[, method, ...]) – Extension of xgcm regridder.
horizontal(data_var, ds) – Placeholder for base class.
vertical(data_var, ds) – See documentation in xcdat.regridder.xgcm.XGCMRegridder()

horizontal(data_var, ds)
Placeholder for base class.
vertical(data_var, ds)
See documentation in xcdat.regridder.xgcm.XGCMRegridder()
_get_grid_positions()


Attributes

Dataset.bounds.map – Returns a map of axis and coordinates keys to their bounds.
Dataset.bounds.keys – Returns a list of keys for the bounds data variables in the Dataset.
Dataset.regridder.grid – Extract the X, Y, and Z axes from the Dataset and return a new xr.Dataset.

xarray.Dataset.bounds.map

Dataset.bounds.map
Returns a map of axis and coordinates keys to their bounds.
The dictionary provides all valid CF compliant keys for axis and coordinates. For example, latitude will include
keys for “lat”, “latitude”, and “Y”.
Returns
Dict[str, Optional[xr.DataArray]] – Dictionary mapping axis and coordinate keys to
their bounds.


xarray.Dataset.bounds.keys

Dataset.bounds.keys
Returns a list of keys for the bounds data variables in the Dataset.
Returns
List[str] – A list of sorted bounds data variable keys.

xarray.Dataset.regridder.grid

Dataset.regridder.grid
Extract the X, Y, and Z axes from the Dataset and return a new xr.Dataset.
Returns
xr.Dataset – Containing grid axes.
Raises
• ValueError – If axis dimension coordinate variable is not correctly identified.
• ValueError – If axis has multiple dimensions (only one is expected).

Examples

Import xCDAT:

>>> import xcdat

Open a dataset:

>>> ds = xcdat.open_dataset("...")

Extract grid from dataset:

>>> grid = ds.regridder.grid


Methods

Dataset.bounds.add_bounds(axis) – Add bounds for an axis using its coordinates as midpoints.
Dataset.bounds.add_time_bounds(method[, ...]) – Add bounds for an axis using its coordinate points.
Dataset.bounds.get_bounds(axis[, var_key]) – Gets coordinate bounds.
Dataset.bounds.add_missing_bounds(axes) – Adds missing coordinate bounds for supported axes in the Dataset.
Dataset.spatial.average(data_var[, axis, ...]) – Calculates the spatial average for a rectilinear grid over an optionally specified regional domain.
Dataset.temporal.average(data_var[, ...]) – Returns a Dataset with the average of a data variable and the time dimension removed.
Dataset.temporal.group_average(data_var, freq) – Returns a Dataset with the average of a data variable by time group.
Dataset.temporal.climatology(data_var, freq) – Returns a Dataset with the climatology of a data variable.
Dataset.temporal.departures(data_var, freq) – Returns a Dataset with the climatological departures (anomalies) for a data variable.
Dataset.regridder.horizontal(data_var, ...) – Transform data_var to output_grid.
Dataset.regridder.vertical(data_var, output_grid) – Transform data_var to output_grid.

xarray.Dataset.bounds.add_bounds

Dataset.bounds.add_bounds(axis)
Add bounds for an axis using its coordinates as midpoints.
This method loops over the axis’s coordinate variables and attempts to add bounds for each of them if they don’t
exist. Each coordinate point is the midpoint between their lower and upper bounds.
To add bounds for an axis its coordinates must meet the following criteria, otherwise an error is thrown:
1. Axis is either “X”, “Y”, “T”, or “Z”
2. Coordinates are single dimensional, not multidimensional
3. Coordinates are a length > 1 (not singleton)
4. Bounds must not already exist
• Coordinates are mapped to bounds using the “bounds” attr. For example, bounds exist if ds.time.
attrs["bounds"] is set to "time_bnds" and ds.time_bnds is present in the dataset.
Parameters
axis (CFAxisKey) – The CF axis key (“X”, “Y”, “T”, “Z”).
Returns
• xr.Dataset – The dataset with bounds added.
• Raises
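The midpoint rule can be sketched for a 1-D coordinate (a simplified illustration, not xCDAT's implementation):

```python
# Midpoint bounds for a 1-D coordinate: interior bounds fall halfway
# between neighboring points; edge bounds extrapolate so that each point
# remains the midpoint of its own bound pair.
def midpoint_bounds(points):
    mids = [(a + b) / 2 for a, b in zip(points, points[1:])]
    lower = [2 * points[0] - mids[0]] + mids
    upper = mids + [2 * points[-1] - mids[-1]]
    return list(zip(lower, upper))

print(midpoint_bounds([0.0, 2.0, 4.0]))
# [(-1.0, 1.0), (1.0, 3.0), (3.0, 5.0)]
```

Each coordinate is the midpoint of its bounds: e.g., 2.0 sits halfway between 1.0 and 3.0.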


xarray.Dataset.bounds.add_time_bounds

Dataset.bounds.add_time_bounds(method, freq=None, daily_subfreq=None, end_of_month=False)


Add bounds for an axis using its coordinate points.
This method loops over the time axis coordinate variables and attempts to add bounds for each of them if they
don't exist. To add time bounds for the time axis, its coordinates must meet the following criteria:
1. Coordinates are single dimensional, not multidimensional
2. Coordinates are a length > 1 (not singleton)
3. Bounds must not already exist
• Coordinates are mapped to bounds using the “bounds” attr. For example, bounds exist if ds.time.
attrs["bounds"] is set to "time_bnds" and ds.time_bnds is present in the dataset.
4. If method=freq, coordinates must be composed of datetime-like objects (np.datetime64 or cftime)
Parameters
• method ({"freq", "midpoint"}) – The method for creating time bounds for time
coordinates, either “freq” or “midpoint”.
– “freq”: Create time bounds as the start and end of each timestep’s period using
either the inferred or specified time frequency (freq parameter). For example, the
time bounds will be the start and end of each month for each monthly coordinate
point.
– “midpoint”: Create time bounds using time coordinates as the midpoint between
their upper and lower bounds.
• freq ({"year", "month", "day", "hour"}, optional) – If method="freq", this
parameter specifies the time frequency for creating time bounds. By default None,
which infers the frequency using the time coordinates.
• daily_subfreq ({1, 2, 3, 4, 6, 8, 12, 24}, optional) – If freq=="hour", this
parameter sets the number of timepoints per day for time bounds, by default None.
– daily_subfreq=None infers the daily time frequency from the time coordinates.
– daily_subfreq=1 is daily
– daily_subfreq=2 is twice daily
– daily_subfreq=4 is 6-hourly
– daily_subfreq=8 is 3-hourly
– daily_subfreq=12 is 2-hourly
– daily_subfreq=24 is hourly
• end_of_month (bool, optional) – If freq=="month", this flag notes that the timepoint
is saved at the end of the monthly interval (see Note), by default False.
– Some timepoints are saved at the end of the interval, e.g., Feb. 1 00:00 for the time
interval Jan. 1 00:00 - Feb. 1 00:00. Since this method determines the month and
year from the time vector, the bounds will be set incorrectly if the timepoint is set
to the end of the time interval. For these cases, set end_of_month=True.
Returns
xr.Dataset – The dataset with time bounds added.
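The "freq" method for monthly coordinates can be sketched with plain datetimes: each bound pair spans from the first of the coordinate's month to the first of the following month (a simplified illustration, not xCDAT's implementation):

```python
from datetime import datetime

def month_bounds(coord):
    # Start of the coordinate's month.
    start = datetime(coord.year, coord.month, 1)
    # Start of the following month, rolling December into the next year.
    end = (datetime(coord.year + 1, 1, 1) if coord.month == 12
           else datetime(coord.year, coord.month + 1, 1))
    return (start, end)

print(month_bounds(datetime(2000, 1, 16, 12)))
# (datetime(2000, 1, 1), datetime(2000, 2, 1))
```

This is why a mid-month coordinate like 2000-01-16 gets bounds covering all of January, regardless of where within the month the timepoint sits.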


xarray.Dataset.bounds.get_bounds

Dataset.bounds.get_bounds(axis, var_key=None)
Gets coordinate bounds.
Parameters
• axis (CFAxisKey) – The CF axis key (“X”, “Y”, “T”, “Z”).
• var_key (Optional[str]) – The key of the coordinate or data variable to get axis
bounds for. This parameter is useful if you only want the single bounds DataArray
related to the axis on the variable (e.g., “tas” has a “lat” dimension and you want
“lat_bnds”).
Returns
Union[xr.Dataset, xr.DataArray] – A Dataset of N bounds variables, or a single bounds
variable DataArray.
Raises
• ValueError – If an incorrect axis argument is passed.
• KeyError: – If bounds were not found for the specific axis.

xarray.Dataset.bounds.add_missing_bounds

Dataset.bounds.add_missing_bounds(axes)
Adds missing coordinate bounds for supported axes in the Dataset.
This function loops through the Dataset's axes and attempts to add bounds to its coordinates if they don't exist.
“X”, “Y”, and “Z” axes bounds are the midpoints between coordinates. “T” axis bounds are based on the time
frequency of the coordinates.
An axis must meet the following criteria to add bounds for it, otherwise they are ignored:
1. Axis is either “X”, “Y”, “T”, or “Z”
2. Coordinates are a single dimension, not multidimensional
3. Coordinates are a length > 1 (not singleton)
4. Bounds must not already exist
• Coordinates are mapped to bounds using the “bounds” attr. For example, bounds exist if ds.time.
attrs["bounds"] is set to "time_bnds" and ds.time_bnds is present in the dataset.
5. For the “T” axis, its coordinates must be composed of datetime-like objects (np.datetime64 or cftime).
Parameters
axes (List[str]) – List of CF axes that function should operate on. Options include “X”,
“Y”, “T”, or “Z”.
Returns
xr.Dataset

xarray.Dataset.spatial.average

Dataset.spatial.average(data_var, axis=['X', 'Y'], weights='generate', keep_weights=False, lat_bounds=None,


lon_bounds=None)
Calculates the spatial average for a rectilinear grid over an optionally specified regional domain.
Operations include:
• If a regional boundary is specified, check to ensure it is within the data variable’s domain boundary.
• If axis weights are not provided, get axis weights for standard axis domains specified in axis.
• Adjust weights to conform to the specified regional boundary.


• Compute spatial weighted average.


This method requires that the dataset’s coordinates have the ‘axis’ attribute set to the keys in axis. For example,
the latitude coordinates should have its ‘axis’ attribute set to ‘Y’ (which is also CF-compliant). This ‘axis’
attribute is used to retrieve the related coordinates via cf_xarray. Refer to this method’s examples for more
information.
Parameters
• data_var (str) – The name of the data variable inside the dataset to spatially average.
• axis (List[SpatialAxis]) – List of axis dimensions to average over, by default [“X”,
“Y”]. Valid axis keys include “X” and “Y”.
• weights ({"generate", xr.DataArray}, optional) – If “generate”, then weights
are generated. Otherwise, pass a DataArray containing the regional weights used for
weighted averaging. weights must include the same spatial axis dimensions and have
the same dimensional sizes as the data variable, by default “generate”.
• keep_weights (bool, optional) – If calculating averages using weights, keep the
weights in the final dataset output, by default False.
• lat_bounds (Optional[RegionAxisBounds], optional) – A tuple of floats/ints for
the regional latitude lower and upper boundaries. This arg is used when calculating
axis weights, but is ignored if weights are supplied. The lower bound cannot be larger
than the upper bound, by default None.
• lon_bounds (Optional[RegionAxisBounds], optional) – A tuple of floats/ints for
the regional longitude lower and upper boundaries. This arg is used when calculating
axis weights, but is ignored if weights are supplied. The lower bound can be larger
than the upper bound (e.g., across the prime meridian, dateline), by default None.
Returns
xr.Dataset – Dataset with the spatially averaged variable.
Raises
KeyError – If data variable does not exist in the Dataset.

Examples

Check the ‘axis’ attribute is set on the required coordinates:

>>> ds.lat.attrs["axis"]
'Y'
>>>
>>> ds.lon.attrs["axis"]
'X'

Set the ‘axis’ attribute for the required coordinates if it isn’t:

>>> ds.lat.attrs["axis"] = "Y"
>>> ds.lon.attrs["axis"] = "X"

Call spatial averaging method:

>>> ds.spatial.average(...)

Get global average time series:

>>> ts_global = ds.spatial.average("tas", axis=["X", "Y"])["tas"]
Get time series in Nino 3.4 domain:

>>> ts_n34 = ds.spatial.average("ts", axis=["X", "Y"],
>>>                             lat_bounds=(-5, 5),
>>>                             lon_bounds=(-170, -120))["ts"]

Get zonal mean time series:

>>> ts_zonal = ds.spatial.average("tas", axis=["X"])["tas"]

Using custom weights for averaging:

>>> # The shape of the weights must align with the data var.
>>> weights = xr.DataArray(
>>>     data=np.ones((4, 4)),
>>>     coords={"lat": ds.lat, "lon": ds.lon},
>>>     dims=["lat", "lon"],
>>> )
>>>
>>> ts_global = ds.spatial.average("tas", axis=["X", "Y"],
>>>                                weights=weights)["tas"]
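For intuition, latitude weights proportional to the surface area between bounds can be sketched with the standard difference-of-sines formula. This is an illustrative assumption about the weighting scheme, not a copy of xCDAT's internal code:

```python
import math

def lat_weights(lat_bounds):
    """Weight each latitude band by the difference of sin(latitude)
    at its bounds, which is proportional to the band's surface area."""
    return [
        math.sin(math.radians(hi)) - math.sin(math.radians(lo))
        for lo, hi in lat_bounds
    ]

w = lat_weights([(-90, -45), (-45, 0), (0, 45), (45, 90)])
total = sum(w)  # a full pole-to-pole axis sums to 2.0
```
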
xarray.Dataset.temporal.average

Dataset.temporal.average(data_var, weighted=True, keep_weights=False)
Returns a Dataset with the average of a data variable and the time dimension removed.
This method infers the time grouping frequency by checking the distance between a set of upper and lower
time bounds. This method is particularly useful for calculating the weighted averages of monthly or yearly time
series data because the number of days per month/year can vary based on the calendar type, which can affect
weighting. For other frequencies, the distribution of weights will be equal so weighted=True is the same as
weighted=False.
Time bounds are used for inferring the time series frequency and for generating weights (refer to the weighted
parameter documentation below).
Parameters
• data_var (str) – The key of the data variable for calculating averages
• weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for each coordinate point
using the difference of its upper and lower bounds. The time lengths are grouped, then
each time length is divided by the total sum of the time lengths to get the weight of each
coordinate point.
The weight of masked (missing) data is excluded when averages are taken. This is the
same as giving them a weight of 0.
• keep_weights (bool, optional) – If calculating averages using weights, keep the
weights in the final dataset output, by default False.
Returns
xr.Dataset – Dataset with the average of the data variable and the time dimension removed.
Examples

Get weighted averages for a monthly time series data variable:

>>> ds_month = ds.temporal.average("ts")
>>> ds_month.ts
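The weighting described above, where each time step is weighted by the length of its bounds interval, can be sketched with the stdlib. The bounds and values below are hypothetical, for illustration only:

```python
from datetime import date

# Monthly time bounds for Jan-Mar of a non-leap year (hypothetical).
bounds = [
    (date(2001, 1, 1), date(2001, 2, 1)),
    (date(2001, 2, 1), date(2001, 3, 1)),
    (date(2001, 3, 1), date(2001, 4, 1)),
]
# Length of each interval in days: [31, 28, 31]
lengths = [(hi - lo).days for lo, hi in bounds]
# Each weight is the interval length divided by the total length.
weights = [n / sum(lengths) for n in lengths]

values = [10.0, 20.0, 30.0]
weighted_avg = sum(v * w for v, w in zip(values, weights))
```

Note how February's shorter length pulls the weighted average slightly toward January and March relative to an unweighted mean.
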

xarray.Dataset.temporal.group_average

Dataset.temporal.group_average(data_var, freq, weighted=True, keep_weights=False, season_config={'custom_seasons': None, 'dec_mode': 'DJF', 'drop_incomplete_djf': False})
Returns a Dataset with the average of a data variable by time group.
Time bounds are used for generating weights to calculate weighted group averages (refer to the weighted parameter documentation below).
Parameters
• data_var (str) – The key of the data variable for calculating time series averages.
• freq (Frequency) – The time frequency to group by.
– “year”: groups by year for yearly averages.
– “season”: groups by (year, season) for seasonal averages.
– “month”: groups by (year, month) for monthly averages.
– “day”: groups by (year, month, day) for daily averages.
– “hour”: groups by (year, month, day, hour) for hourly averages.
• weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for each coordinate point
using the difference of its upper and lower bounds. The time lengths are grouped, then
each time length is divided by the total sum of the time lengths to get the weight of each
coordinate point.
The weight of masked (missing) data is excluded when averages are calculated. This is
the same as giving them a weight of 0.
• keep_weights (bool, optional) – If calculating averages using weights, keep the
weights in the final dataset output, by default False.
• season_config (SeasonConfigInput, optional) – A dictionary for “season” frequency
configurations. If configs for predefined seasons are passed, configs for custom seasons
are ignored and vice versa.
Configs for predefined seasons:
– “dec_mode” (Literal[“DJF”, “JFD”], by default “DJF”)
The mode for the season that includes December.
∗ “DJF”: season includes the previous year December.
∗ “JFD”: season includes the same year December.
Xarray labels the season with December as “DJF”, but it is actually
“JFD”.
– “drop_incomplete_djf” (bool, by default False)
If the “dec_mode” is “DJF”, this flag drops (True) or keeps (False) time coordinates that fall under incomplete DJF seasons. Incomplete DJF seasons include the start year Jan/Feb and the end year Dec.
Configs for custom seasons:
– “custom_seasons” ([List[List[str]]], by default None)
List of sublists containing month strings, with each sublist representing a
custom season.
∗ Month strings must be in the three letter format (e.g., ‘Jan’)
∗ Each month must be included once in a custom season
∗ Order of the months in each custom season does not matter
∗ Custom seasons can vary in length
>>> # Example of custom seasons in a three month format:
>>> custom_seasons = [
>>> ["Jan", "Feb", "Mar"], # "JanFebMar"
>>> ["Apr", "May", "Jun"], # "AprMayJun"
>>> ["Jul", "Aug", "Sep"], # "JulAugSep"
>>> ["Oct", "Nov", "Dec"], # "OctNovDec"
>>> ]
Returns
xr.Dataset – Dataset with the average of a data variable by time group.

Examples

Get seasonal averages for a data variable:

>>> ds_season_with_djf = ds.temporal.group_average(
>>>     "ts",
>>>     "season",
>>>     season_config={
>>>         "dec_mode": "DJF",
>>>         "drop_incomplete_djf": True
>>>     }
>>> )
>>> ds_season_with_djf.ts
>>>
>>> ds_season_with_jfd = ds.temporal.group_average(
>>> "ts",
>>> "season",
>>> season_config={"dec_mode": "JFD"}
>>> )
>>> ds_season_with_jfd.ts

Get seasonal averages with custom seasons for a data variable:

>>> custom_seasons = [
>>>     ["Jan", "Feb", "Mar"],  # "JanFebMar"
>>>     ["Apr", "May", "Jun"],  # "AprMayJun"
>>>     ["Jul", "Aug", "Sep"],  # "JulAugSep"
>>>     ["Oct", "Nov", "Dec"],  # "OctNovDec"
>>> ]
>>>
>>> ds_season_custom = ds.temporal.group_average(
>>>     "ts",
>>>     "season",
>>>     season_config={"custom_seasons": custom_seasons}
>>> )

Get the average() operation attributes:

>>> ds_season_with_djf.ts.attrs
{
'operation': 'temporal_avg',
'mode': 'average',
'freq': 'season',
'weighted': 'True',
'dec_mode': 'DJF',
'drop_incomplete_djf': 'False'
}
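The (year, season) grouping used for freq="season" can be sketched in plain Python. `season_group` is a hypothetical helper; for simplicity it keeps the season label "DJF" in both December modes, and it is unweighted:

```python
def season_group(year, month, dec_mode="DJF"):
    """Return the (year, season) group a monthly time step falls into.

    In "DJF" mode, December is grouped with the *following* year's
    Jan/Feb; in "JFD" mode it stays with its own year. The season
    label itself is kept as "DJF" in both modes for simplicity.
    """
    seasons = {12: "DJF", 1: "DJF", 2: "DJF",
               3: "MAM", 4: "MAM", 5: "MAM",
               6: "JJA", 7: "JJA", 8: "JJA",
               9: "SON", 10: "SON", 11: "SON"}
    season = seasons[month]
    if month == 12 and dec_mode == "DJF":
        year += 1  # December joins the next year's DJF season
    return (year, season)
```

Once every time step carries such a label, the group average is simply the (weighted) mean of all values sharing the same label.
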

xarray.Dataset.temporal.climatology

Dataset.temporal.climatology(data_var, freq, weighted=True, keep_weights=False, reference_period=None, season_config={'custom_seasons': None, 'dec_mode': 'DJF', 'drop_incomplete_djf': False})
Returns a Dataset with the climatology of a data variable.
Time bounds are used for generating weights to calculate weighted climatology (refer to the weighted parameter
documentation below).
Parameters
• data_var (str) – The key of the data variable for calculating climatology.
• freq (Frequency) – The time frequency to group by.
– “season”: groups by season for the seasonal cycle climatology.
– “month”: groups by month for the annual cycle climatology.
– “day”: groups by (month, day) for the daily cycle climatology. If the CF calendar type is "gregorian", "proleptic_gregorian", or "standard", leap days (if present) are dropped to avoid inconsistencies when calculating climatologies. Refer to [1] for more details on this implementation decision.
• weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for each coordinate point
using the difference of its upper and lower bounds. The time lengths are grouped, then
each time length is divided by the total sum of the time lengths to get the weight of each
coordinate point.
The weight of masked (missing) data is excluded when averages are taken. This is the
same as giving them a weight of 0.
[1] https://github.com/xCDAT/xcdat/discussions/332
• keep_weights (bool, optional) – If calculating averages using weights, keep the weights in the final dataset output, by default False.
• reference_period (Optional[Tuple[str, str]], optional) – The climatological reference period, which is a subset of the entire time series. This parameter accepts a tuple of strings in the format ‘yyyy-mm-dd’. For example, ('1850-01-01', '1899-12-31'). If no value is provided, the climatological reference period will be the full period covered by the dataset.
• season_config (SeasonConfigInput, optional) – A dictionary for “season” frequency
configurations. If configs for predefined seasons are passed, configs for custom seasons
are ignored and vice versa.
Configs for predefined seasons:
– “dec_mode” (Literal[“DJF”, “JFD”], by default “DJF”)
The mode for the season that includes December.
∗ “DJF”: season includes the previous year December.
∗ “JFD”: season includes the same year December.
Xarray labels the season with December as “DJF”, but it is actually
“JFD”.
– “drop_incomplete_djf” (bool, by default False)
If the “dec_mode” is “DJF”, this flag drops (True) or keeps (False) time coordinates that fall under incomplete DJF seasons. Incomplete DJF seasons include the start year Jan/Feb and the end year Dec.
Configs for custom seasons:
– “custom_seasons” ([List[List[str]]], by default None)
List of sublists containing month strings, with each sublist representing a
custom season.
∗ Month strings must be in the three letter format (e.g., ‘Jan’)
∗ Each month must be included once in a custom season
∗ Order of the months in each custom season does not matter
∗ Custom seasons can vary in length

>>> # Example of custom seasons in a three month format:
>>> custom_seasons = [
>>> ["Jan", "Feb", "Mar"], # "JanFebMar"
>>> ["Apr", "May", "Jun"], # "AprMayJun"
>>> ["Jul", "Aug", "Sep"], # "JulAugSep"
>>> ["Oct", "Nov", "Dec"], # "OctNovDec"
>>> ]
Returns
xr.Dataset – Dataset with the climatology of a data variable.
References

Examples

Get a data variable’s seasonal climatology:

>>> ds_season_with_djf = ds.temporal.climatology(
>>>     "ts",
>>>     "season",
>>>     season_config={
>>>         "dec_mode": "DJF",
>>>         "drop_incomplete_djf": True
>>>     }
>>> )
>>> ds_season_with_djf.ts
>>>
>>> ds_season_with_jfd = ds.temporal.climatology(
>>>     "ts",
>>>     "season",
>>>     season_config={"dec_mode": "JFD"}
>>> )
>>> ds_season_with_jfd.ts

Get a data variable’s seasonal climatology with custom seasons:

>>> custom_seasons = [
>>> ["Jan", "Feb", "Mar"], # "JanFebMar"
>>> ["Apr", "May", "Jun"], # "AprMayJun"
>>> ["Jul", "Aug", "Sep"], # "JulAugSep"
>>> ["Oct", "Nov", "Dec"], # "OctNovDec"
>>> ]
>>>
>>> ds_season_custom = ds.temporal.climatology(
>>> "ts",
>>> "season",
>>> season_config={"custom_seasons": custom_seasons}
>>> )

Get climatology() operation attributes:

>>> ds_season_with_djf.ts.attrs
{
'operation': 'temporal_avg',
'mode': 'climatology',
'freq': 'season',
'weighted': 'True',
'dec_mode': 'DJF',
'drop_incomplete_djf': 'False'
}
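The annual-cycle climatology described above, grouping by month across years, can be sketched as follows. `monthly_climatology` is a hypothetical helper with made-up values, and it is unweighted for brevity:

```python
from collections import defaultdict

def monthly_climatology(records):
    """records: iterable of ((year, month), value) pairs.

    Returns {month: mean over all years}, i.e. the unweighted
    annual cycle climatology."""
    groups = defaultdict(list)
    for (year, month), value in records:
        groups[month].append(value)
    return {m: sum(v) / len(v) for m, v in groups.items()}

clim = monthly_climatology([
    ((2000, 1), 10.0), ((2001, 1), 14.0),
    ((2000, 2), 20.0), ((2001, 2), 22.0),
])
```
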
xarray.Dataset.temporal.departures

Dataset.temporal.departures(data_var, freq, weighted=True, keep_weights=False, reference_period=None, season_config={'custom_seasons': None, 'dec_mode': 'DJF', 'drop_incomplete_djf': False})
Returns a Dataset with the climatological departures (anomalies) for a data variable.
In climatology, “anomalies” refer to the difference between the value during a given time interval (e.g., the
January average surface air temperature) and the long-term average value for that time interval (e.g., the average
surface temperature over the last 30 Januaries).
Time bounds are used for generating weights to calculate weighted climatology (refer to the weighted parameter
documentation below).
Parameters
• data_var (str) – The key of the data variable for calculating departures.
• freq (Frequency) – The frequency of time to group by.
– “season”: groups by season for the seasonal cycle departures.
– “month”: groups by month for the annual cycle departures.
– “day”: groups by (month, day) for the daily cycle departures. If the CF calendar type is "gregorian", "proleptic_gregorian", or "standard", leap days (if present) are dropped to avoid inconsistencies when calculating climatologies. Refer to [2] for more details on this implementation decision.
• weighted (bool, optional) – Calculate averages using weights, by default True.
Weights are calculated by first determining the length of time for each coordinate point
using the difference of its upper and lower bounds. The time lengths are grouped, then
each time length is divided by the total sum of the time lengths to get the weight of each
coordinate point.
The weight of masked (missing) data is excluded when averages are taken. This is the
same as giving them a weight of 0.
• keep_weights (bool, optional) – If calculating averages using weights, keep the
weights in the final dataset output, by default False.
• reference_period (Optional[Tuple[str, str]], optional) – The climatological reference period, which is a subset of the entire time series and used for calculating departures. This parameter accepts a tuple of strings in the format ‘yyyy-mm-dd’. For example, ('1850-01-01', '1899-12-31'). If no value is provided, the climatological reference period will be the full period covered by the dataset.
• season_config (SeasonConfigInput, optional) – A dictionary for “season” frequency
configurations. If configs for predefined seasons are passed, configs for custom seasons
are ignored and vice versa.
Configs for predefined seasons:
– “dec_mode” (Literal[“DJF”, “JFD”], by default “DJF”)
The mode for the season that includes December.
∗ “DJF”: season includes the previous year December.
[2] https://github.com/xCDAT/xcdat/discussions/332
∗ “JFD”: season includes the same year December.
Xarray labels the season with December as “DJF”, but it is actually “JFD”.
– “drop_incomplete_djf” (bool, by default False)
If the “dec_mode” is “DJF”, this flag drops (True) or keeps (False) time coordinates that fall under incomplete DJF seasons. Incomplete DJF seasons include the start year Jan/Feb and the end year Dec.
Configs for custom seasons:
– “custom_seasons” ([List[List[str]]], by default None)
List of sublists containing month strings, with each sublist representing a
custom season.
∗ Month strings must be in the three letter format (e.g., ‘Jan’)
∗ Each month must be included once in a custom season
∗ Order of the months in each custom season does not matter
∗ Custom seasons can vary in length

>>> # Example of custom seasons in a three month format:
>>> custom_seasons = [
>>> ["Jan", "Feb", "Mar"], # "JanFebMar"
>>> ["Apr", "May", "Jun"], # "AprMayJun"
>>> ["Jul", "Aug", "Sep"], # "JulAugSep"
>>> ["Oct", "Nov", "Dec"], # "OctNovDec"
>>> ]
Returns
xr.Dataset – The Dataset containing the departures for a data var’s climatology.

Notes

This method uses xarray’s grouped arithmetic as a shortcut for mapping over all unique labels. Grouped arithmetic works by assigning a grouping label to each time coordinate of the observation data based on the averaging mode and frequency. Afterwards, the corresponding climatology is removed from the observation data at each time coordinate based on the matching labels.
Refer to [3] to learn more about how xarray’s grouped arithmetic works.

References

Examples

Get a data variable’s annual cycle departures:

>>> ds_depart = ds_climo.temporal.departures("ts", "month")

Get the departures() operation attributes:

[3] https://xarray.pydata.org/en/stable/user-guide/groupby.html#grouped-arithmetic
>>> ds_depart.ts.attrs
{
'operation': 'departures',
'frequency': 'season',
'weighted': 'True',
'dec_mode': 'DJF',
'drop_incomplete_djf': 'False'
}
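The grouped-arithmetic step, subtracting the climatology from each time step by matching group labels, can be sketched as follows. `departures` is a hypothetical helper with made-up values:

```python
def departures(series, climatology):
    """series: list of ((year, month), value) pairs;
    climatology: {month: climatological mean}.

    Each value has its month's climatological mean removed,
    leaving the anomaly relative to the annual cycle."""
    return [((y, m), v - climatology[m]) for (y, m), v in series]

clim = {1: 12.0}
anoms = departures([((2000, 1), 10.0), ((2001, 1), 14.0)], clim)
```

By construction, the anomalies of a group average to zero when the climatology was computed from the same series.
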

xarray.Dataset.regridder.horizontal

Dataset.regridder.horizontal(data_var, output_grid, tool='xesmf', **options)

Transform data_var to output_grid.
When might Regrid2 be preferred over xESMF?
If performing conservative regridding from a high/medium resolution lat/lon grid to a coarse lat/lon target, Regrid2 may provide better results as it assumes grid cells with constant latitudes and longitudes while xESMF assumes the cells are connected by Great Circles [1].
Supported tools, methods and grids:
• xESMF (https://pangeo-xesmf.readthedocs.io/en/latest/)
– Methods: Bilinear, Conservative, Conservative Normed, Patch, Nearest s2d, or Nearest d2s.
– Grids: Rectilinear, or Curvilinear.
– Find options at xcdat.regridder.xesmf.XESMFRegridder()
• Regrid2
– Methods: Conservative
– Grids: Rectilinear
– Find options at xcdat.regridder.regrid2.Regrid2Regridder()
Parameters
• data_var (str) – Name of the variable to transform.
• output_grid (xr.Dataset) – Grid to transform data_var to.
• tool (str) – Name of the tool to use.
• **options (Any) – These options are passed directly to the tool. See specific regridder
for available options.
Returns
xr.Dataset – With the data_var transformed to the output_grid.
Raises
ValueError – If tool is not supported.
[1] https://earthsystemmodeling.org/docs/release/ESMF_8_1_0/ESMF_refdoc/node5.html#SECTION05012900000000000000
References

Examples

Import xCDAT:

>>> import xcdat

Open a dataset:

>>> ds = xcdat.open_dataset("...")

Create output grid:

>>> output_grid = xcdat.create_uniform_grid(-90, 90, 4.0, -180, 180, 5.0)

Regrid variable using “xesmf”:

>>> output_data = ds.regridder.horizontal("ts", output_grid, tool="xesmf", method="bilinear")

Regrid variable using “regrid2”:

>>> output_data = ds.regridder.horizontal("ts", output_grid, tool="regrid2")
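Conservative regridding, the method Regrid2 supports, averages source cells into each target cell weighted by the overlap of their bounds. A 1-D sketch of that idea in plain Python (illustrative only; Regrid2 operates on 2-D lat/lon cells, and `conservative_remap_1d` is a hypothetical helper):

```python
def conservative_remap_1d(src_bounds, src_vals, dst_bounds):
    """Overlap-weighted average of source cells onto destination cells."""
    out = []
    for d_lo, d_hi in dst_bounds:
        num = den = 0.0
        for (s_lo, s_hi), v in zip(src_bounds, src_vals):
            # Length of the intersection between source and target cell.
            overlap = max(0.0, min(d_hi, s_hi) - max(d_lo, s_lo))
            num += v * overlap
            den += overlap
        out.append(num / den if den else float("nan"))
    return out

# Two 1-degree source cells averaged into one 2-degree target cell:
vals = conservative_remap_1d([(0, 1), (1, 2)], [10.0, 30.0], [(0, 2)])
```

Because each target value is an overlap-weighted mean, the area integral of the field is preserved, which is the defining property of a conservative method.
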

xarray.Dataset.regridder.vertical

Dataset.regridder.vertical(data_var, output_grid, tool='xgcm', **options)

Transform data_var to output_grid.
Supported tools:
• xgcm (https://xgcm.readthedocs.io/en/latest/index.html)
– Methods: Linear, Conservative, Log
– Find options at xcdat.regridder.xgcm.XGCMRegridder()
Parameters
• data_var (str) – Name of the variable to transform.
• output_grid (xr.Dataset) – Grid to transform data_var to.
• tool (str) – Name of the tool to use.
• **options (Any) – These options are passed directly to the tool. See specific regridder
for available options.
Returns
xr.Dataset – With the data_var transformed to the output_grid.
Raises
ValueError – If tool is not supported.
Examples

Import xCDAT:

>>> import xcdat

Open a dataset:

>>> ds = xcdat.open_dataset("...")

Create output grid:

>>> output_grid = xcdat.create_grid(lev=np.linspace(1000, 1, 20))

Regrid variable using “xgcm”:

>>> output_data = ds.regridder.vertical("so", output_grid, method="linear")
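Linear interpolation onto new levels, the "linear" method above, can be sketched with a plain Python helper. `interp_linear` is a hypothetical 1-D illustration (it assumes monotonically increasing levels and does no extrapolation), not xgcm's actual implementation:

```python
def interp_linear(levels, values, new_levels):
    """Linearly interpolate values from levels onto new_levels.

    Levels must be monotonically increasing; targets outside the
    source range are silently skipped (no extrapolation)."""
    out = []
    pairs = list(zip(levels, values))
    for x in new_levels:
        for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
            if x0 <= x <= x1:
                t = (x - x0) / (x1 - x0)
                out.append(y0 + t * (y1 - y0))
                break
    return out

vals = interp_linear([0.0, 10.0], [100.0, 200.0], [5.0])
```
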

10.5.4 CDAT Mapping Table

The table below maps the supported xCDAT operations to the equivalent CDAT and xCDAT APIs. It is especially
useful for those who are transitioning over from CDAT to xarray/xCDAT.
• Open dataset files?
– xCDAT: xcdat.open_dataset() and xcdat.open_mfdataset()
– CDAT: cdms2.open()
• Get coordinate bounds?
– xCDAT: Dataset.bounds.get_bounds()
– CDAT: cdms2.tvariable.getBounds()
• Set coordinate bounds for a single axis?
– xCDAT: Dataset.bounds.add_bounds()
– CDAT: cdms2.tvariable.setBounds()
• Set coordinate bounds for all axes with missing bounds?
– xCDAT: Dataset.bounds.add_missing_bounds()
– CDAT: N/A
• Center time coordinates using time bounds?
– xCDAT: xcdat.center_times()
– CDAT: N/A
• Swap the longitude axis orientation between (-180 to 180) and (0 to 360)?
– xCDAT: xcdat.swap_lon_axis()
– CDAT: N/A
• Spatially average over an optionally specified rectilinear grid?
– xCDAT: Dataset.spatial.average("VAR_KEY", axis=["X", "Y"]), optionally specifying lat_bounds and lon_bounds
– CDAT: cdutil.averager(TransientVariable, axis="xy"), optionally subset TransientVariable with cdutil.region.domain()
• Decode time coordinates with CF/Non-CF units?
– xCDAT: xr.decode_cf() specifying decode_times=True, or xcdat.decode_time()
– CDAT: cdms2.axis.Axis.asComponentTime()
• Temporally average with a single time-averaged snapshot and time coordinates removed?
– xCDAT: Dataset.temporal.average("VAR_KEY")
– CDAT: cdutil.averager(TransientVariable, axis="t")
• Temporally average by time group?
– xCDAT: Dataset.temporal.group_average("VAR_KEY", freq=<"season"|"month"|"day"|"hour">), subset results for individual seasons, months, or hours
– CDAT: cdutil.SEASONALCYCLE(), cdutil.ANNUALCYCLE(), cdutil.<DJF|MAM|JJA|SON>(), cdutil.<JAN|FEB|...|DEC>()
• Calculate climatologies?
– xCDAT: Dataset.temporal.climatology("VAR_KEY", freq=<"season"|"month"|"day">), subset results for individual seasons, months, or days
– CDAT: cdutil.SEASONALCYCLE.climatology(), cdutil.ANNUALCYCLE.climatology(), cdutil.<DJF|MAM|JJA|SON>.climatology(), cdutil.<JAN|FEB|...|DEC>.climatology()
• Calculate climatological departures?
– xCDAT: Dataset.temporal.departures("VAR_KEY", freq=<"season"|"month"|"day">), subset results for individual seasons, months, or days
– CDAT: cdutil.SEASONALCYCLE.departures(), cdutil.ANNUALCYCLE.departures(), cdutil.<DJF|MAM|JJA|SON>.departures(), cdutil.<JAN|FEB|...|DEC>.departures()
• Regrid horizontally?
– xCDAT: Dataset.regridder.horizontal(tool="regrid2")
– CDAT: cdms2.regrid2()
10.6 History

10.6.1 v0.6.0 (10 October 2023)

This minor version update consists of new features including vertical regridding (extension of xgcm), functions for
producing accurate time bounds, and improving the usability of the create_grid API. It also includes bug fixes to
preserve attributes when using regrid2 horizontal regridder and fixing multi-file datasets spatial average orientation
and weights when lon bounds span prime meridian.

10.6.2 Features

• Functions to produce accurate time bounds by Stephen Po-Chedley in https://github.com/xCDAT/xcdat/pull/418
• Add API extending xgcm vertical regridding by Jason Boutte in https://github.com/xCDAT/xcdat/pull/388, https://github.com/xCDAT/xcdat/pull/535, https://github.com/xCDAT/xcdat/pull/525
• Update create_grid args to improve usability by Jason Boutte in https://github.com/xCDAT/xcdat/pull/507, https://github.com/xCDAT/xcdat/pull/539

10.6.3 Deprecation

• Add deprecation warnings for add_bounds boolean args by Tom Vo in https://github.com/xCDAT/xcdat/pull/548
• Add deprecation warning for CDML/XML support in open_mfdataset() by Tom Vo in https://github.com/xCDAT/xcdat/pull/503, https://github.com/xCDAT/xcdat/pull/504

10.6.4 Bug Fixes

Horizontal Regridding

• Improves error when axis is missing/incorrect attributes with regrid2 by Jason Boutte in https://github.com/xCDAT/xcdat/pull/481
• Fixes preserving ds/da attributes in the regrid2 module by Jason Boutte in https://github.com/xCDAT/xcdat/pull/468
• Fixes duplicate parameter in regrid2 docs by Jason Boutte in https://github.com/xCDAT/xcdat/pull/532

Spatial Averaging

• Fix multi-file dataset spatial average orientation and weights when lon bounds span prime meridian by Stephen Po-Chedley in https://github.com/xCDAT/xcdat/pull/495
10.6.5 Documentation

• Typo fix for climatology code example in docs by Jiwoo Lee in https://github.com/xCDAT/xcdat/pull/491
• Update documentation in regrid2.py by Jiwoo Lee in https://github.com/xCDAT/xcdat/pull/509
• Add more fields to GH Discussions question form by Tom Vo in https://github.com/xCDAT/xcdat/pull/480
• Add Q&A GH discussions template by Tom Vo in https://github.com/xCDAT/xcdat/pull/479
• Update FAQs question covering datasets with conflicting bounds by Tom Vo in https://github.com/xCDAT/xcdat/pull/474
• Add Google Groups mailing list to docs by Tom Vo in https://github.com/xCDAT/xcdat/pull/452
• Fix README link to CODE-OF-CONDUCT.rst by Tom Vo in https://github.com/xCDAT/xcdat/pull/444
• Replace LLNL E3SM License with xCDAT License by Tom Vo in https://github.com/xCDAT/xcdat/pull/443
• Update getting started and HPC documentation by Tom Vo in https://github.com/xCDAT/xcdat/pull/553

10.6.6 DevOps

• Fix Python deprecation comment in conda env yml files by Tom Vo in https://github.com/xCDAT/xcdat/pull/514
• Simplify conda environments and move configs to pyproject.toml by Tom Vo in https://github.com/xCDAT/xcdat/pull/512
• Update DevOps to cache conda and fix attributes not being preserved with xarray > 2023.3.0 by Tom Vo in https://github.com/xCDAT/xcdat/pull/465
• Update GH Actions to use mamba by Tom Vo in https://github.com/xCDAT/xcdat/pull/450
• Update constraint cf_xarray >=0.7.3 to workaround xarray import issue by Tom Vo in https://github.com/xCDAT/xcdat/pull/547
Full Changelog: https://github.com/xCDAT/xcdat/compare/v0.5.0...v0.6.0

10.6.7 v0.5.0 (27 March 2023)

This long-awaited minor release includes feature updates to support an optional user-specified climatology reference
period when calculating climatologies and departures, support for opening datasets using the directory key of the
legacy CDAT Climate Data Markup Language (CDML) format (an XML dialect), and improved support for using
custom time coordinates in temporal APIs.
This release also includes a bug fix for singleton coordinates breaking the swap_lon_axis() function. Additionally,
Jupyter Notebooks for presentations and demos have been added to the documentation.

Features

• Update departures and climatology APIs with reference period by Tom Vo in https://github.com/xCDAT/xcdat/pull/417
• Wrap open_dataset and open_mfdataset to flexibly open datasets by Stephen Po-Chedley in https://github.com/xCDAT/xcdat/pull/385
• Add better support for using custom time coordinates in temporal APIs by Tom Vo in https://github.com/xCDAT/xcdat/pull/415
Bug Fixes

• Raise warning if no time coords found with decode_times by Tom Vo in https://github.com/xCDAT/xcdat/pull/409
• Bump conda env dependencies by Tom Vo in https://github.com/xCDAT/xcdat/pull/408
• Fix swap_lon_axis() breaking when sorting with singleton coords by Tom Vo in https://github.com/xCDAT/xcdat/pull/392

Documentation

• Update xsearch-xcdat-example.ipynb by Stephen Po-Chedley in https://github.com/xCDAT/xcdat/pull/425
• Updates xesmf docs by Jason Boutte in https://github.com/xCDAT/xcdat/pull/432
• Add presentations and demos to sphinx toctree by Tom Vo in https://github.com/xCDAT/xcdat/pull/422
• Update temporal .average and .departures docstrings by Tom Vo in https://github.com/xCDAT/xcdat/pull/407

DevOps

• Bump conda env dependencies by Tom Vo in https://github.com/xCDAT/xcdat/pull/408

Full Changelog: https://github.com/xCDAT/xcdat/compare/v0.4.0...v0.5.0

10.6.8 v0.4.0 (9 November 2022)

This minor release includes a feature update to support datasets that have N dimensions mapped to N coordinates to
represent an axis. This means xcdat APIs are able to intelligently select which axis’s coordinates and bounds to work
with if multiple are present within the dataset. Decoding time is now a lazy operation, leading to significant upfront
runtime improvements when opening datasets with decode_times=True.
A new notebook called “A Gentle Introduction to xCDAT” was added to the documentation gallery to help guide new
xarray/xcdat users. xCDAT is now hosted on Zenodo with a DOI for citations.
There are various bug fixes for bounds, naming of spatial weights, and a missing flag for xesmf that broke curvilinear
regridding.

Features

• Support for N axis dimensions mapped to N coordinates by Tom Vo and Stephen Po-Chedley in https://github.com/xCDAT/xcdat/pull/343
– Rename get_axis_coord() to get_dim_coords() and get_axis_dim() to get_dim_keys()
– Update spatial and temporal accessor class methods to refer to the dimension coordinate variable on the data_var being operated on, rather than the parent dataset
• Decoding times (decode_time()) is now a lazy operation, which results in significant runtime improvements by Tom Vo in https://github.com/xCDAT/xcdat/pull/343
Bug Fixes

• Fix add_bounds() not ignoring 0-dim singleton coords by Tom Vo and Stephen Po-Chedley in https://github.com/xCDAT/xcdat/pull/343
• Fix name of spatial weights with singleton coord by Tom Vo in https://github.com/xCDAT/xcdat/pull/379
• Fixes xesmf flag that was missing which broke curvilinear regridding by Jason Boutte and Stephen Po-Chedley in https://github.com/xCDAT/xcdat/pull/374

Documentation

• Add FAQs section for temporal metadata by Tom Vo in https://github.com/xCDAT/xcdat/pull/383
• Add gentle introduction notebook by Tom Vo in https://github.com/xCDAT/xcdat/pull/373
• Link repo to Zenodo and upload GitHub releases by Tom Vo in https://github.com/xCDAT/xcdat/pull/367
• Update project overview, FAQs, and add a link to xarray tutorials by Tom Vo in https://github.com/xCDAT/xcdat/pull/365
• Update feature list, add metadata interpretation to FAQs, and add ipython syntax highlighting for notebooks by Tom Vo in https://github.com/xCDAT/xcdat/pull/362

DevOps

• Update release-drafter template by Tom Vo in https://github.com/xCDAT/xcdat/pull/371 and https://github.com/xCDAT/xcdat/pull/370
• Automate release notes generation by Tom Vo in https://github.com/xCDAT/xcdat/pull/368
Full Changelog: https://github.com/xCDAT/xcdat/compare/v0.3.3...v0.4.0

10.6.9 v0.3.3 (12 October 2022)

This patch release fixes a bug where calculating daily climatologies/departures for specific CF calendar types that have
leap days breaks when using cftime. It also includes documentation updates.

Bug Fixes

• Drop leap days based on CF calendar type to calculate daily climatologies and departures by Tom Vo and Jiwoo
Lee in https://github.com/xCDAT/xcdat/pull/350
– Affected CF calendar types include gregorian, proleptic_gregorian, and standard
– Since a solution implementation for handling leap days is generally opinionated, we decided to go with
the route of least complexity and overhead (drop the leap days before performing calculations). We may
revisit adding more options for the user to determine how they want to handle leap days (based on how
valuable/desired it is).


Documentation

• Add horizontal regridding gallery notebook by Jason Boutte in https://github.com/xCDAT/xcdat/pull/328


• Add doc for staying up to date with releases by Tom Vo in https://github.com/xCDAT/xcdat/pull/355
Full Changelog: https://github.com/xCDAT/xcdat/compare/v0.3.2...v0.3.3

10.6.10 v0.3.2 (16 September 2022)

This patch release focuses on bug fixes related to temporal averaging, spatial averaging, and regridding. xesmf is
now an optional dependency because it is not supported on osx-arm64 and windows at this time. There is a new
documentation page for HPC/Jupyter guidance.

Bug Fixes

Temporal Average

• Fix multiple temporal avg calls on same dataset breaking by Tom Vo in https://github.com/xCDAT/xcdat/pull/
329
• Fix incorrect results for group averaging with missing data by Stephen Po-Chedley in https://github.com/xCDAT/
xcdat/pull/320

Spatial Average

• Fix spatial bugs: handle datasets with domain bounds out of order and zonal averaging by Stephen Po-Chedley
in https://github.com/xCDAT/xcdat/pull/340

Horizontal Regridding

• Fix regridder storing NaNs for bounds by Stephen Po-Chedley in https://github.com/xCDAT/xcdat/pull/344

Documentation

• Update README and add HPC/Jupyter Guidance by Stephen Po-Chedley in https://github.com/xCDAT/xcdat/pull/331

Dependencies

• Make xesmf an optional dependency by Paul Durack in https://github.com/xCDAT/xcdat/pull/334
– This is required because xesmf (and its dependency esmpy) are not supported on osx-arm64 and windows at this time.
– Once these platforms are supported, xesmf can become a direct dependency of xcdat.
Full Changelog: https://github.com/xCDAT/xcdat/compare/v0.3.1...v0.3.2


10.6.11 v0.3.1 (18 August 2022)

This patch release focuses on bug fixes including handling bounds generation with singleton coordinates and the use of cftime to represent temporal averaging outputs and non-CF compliant time coordinates (to avoid the pandas Timestamp limitations).

Bug Fixes

Bounds

• Ignore singleton coordinates without dims when attempting to generate bounds by Stephen Po-Chedley in https://github.com/xCDAT/xcdat/pull/281
• Modify logic to not throw error for singleton coordinates (with no bounds) by Stephen Po-Chedley in https://github.com/xCDAT/xcdat/pull/313

Time Axis and Coordinates

• Fix TypeError with Dask Arrays from multifile datasets in temporal averaging by Stephen Po-Chedley in https://github.com/xCDAT/xcdat/pull/291
• Use cftime to avoid out of bounds datetime when decoding non-CF time coordinates by Stephen Po-Chedley and Tom Vo in https://github.com/xCDAT/xcdat/pull/283
• Use cftime for temporal averaging operations to avoid out of bounds datetime by Stephen Po-Chedley and
Tom Vo in https://github.com/xCDAT/xcdat/pull/302
• Fix open_mfdataset() dropping time encoding attrs by Tom Vo in https://github.com/xCDAT/xcdat/pull/309
• Replace “time” references with self._dim in class TemporalAccessor by Tom Vo in https://github.com/
xCDAT/xcdat/pull/312

Internal Changes

• Filters safe warnings by Jason Boutte in https://github.com/xCDAT/xcdat/pull/276

Documentation

• update conda install to conda create by Paul Durack in https://github.com/xCDAT/xcdat/pull/294


• Update project overview and planned features list by Tom Vo in https://github.com/xCDAT/xcdat/pull/298
• Fix bullet formatting in README.rst and index.rst by Tom Vo in https://github.com/xCDAT/xcdat/pull/299
• Fix Jupyter headings not rendering with pandoc by Tom Vo in https://github.com/xCDAT/xcdat/pull/318


DevOps

• Unify workspace settings with settings.json by Tom Vo in https://github.com/xCDAT/xcdat/pull/297


• Run CI/CD on “push” and “workflow_dispatch” by Tom Vo in https://github.com/xCDAT/xcdat/pull/287 and
https://github.com/xCDAT/xcdat/pull/288
• Pin numba=0.55.2 in dev env and constrain numba>=0.55.2 in ci env by Tom Vo in https://github.com/xCDAT/
xcdat/pull/280
• Update conda env yml files and add missing dependencies by Tom Vo in https://github.com/xCDAT/xcdat/pull/
307

New Contributors

• Paul Durack made their first contribution in https://github.com/xCDAT/xcdat/pull/294


Full Changelog: https://github.com/xCDAT/xcdat/compare/v0.3.0...v0.3.1

10.6.12 v0.3.0 (27 June 2022)

New Features

• Add horizontal regridding by Jason Boutte in https://github.com/xCDAT/xcdat/pull/164


• Add averages with time dimension removed by Tom Vo in https://github.com/xCDAT/xcdat/pull/236
• Update _get_weights() method in class SpatialAccessor and class TemporalAccessor by Tom Vo
in https://github.com/xCDAT/xcdat/pull/252
– Add keep_weights keyword attr to reduction methods
– Make _get_weights() public in class SpatialAccessor
• Update get_axis_coord() to interpret more keys by Tom Vo in https://github.com/xCDAT/xcdat/pull/262
– Along with the axis attr, it also now interprets standard_name and the dimension name

Bug Fixes

• Fix add_bounds() breaking when time coords are cftime objects by Tom Vo in https://github.com/xCDAT/
xcdat/pull/241
• Fix parsing of custom seasons for departures by Tom Vo in https://github.com/xCDAT/xcdat/pull/246
• Update swap_lon_axis to ignore same systems, which was causing odd behaviors for (0, 360) by Tom Vo in
https://github.com/xCDAT/xcdat/pull/257


Breaking Changes

• Remove class XCDATAccessor by Tom Vo in https://github.com/xCDAT/xcdat/pull/222


• Update spatial axis arg supported type and keys by Tom Vo in https://github.com/xCDAT/xcdat/pull/226
– Now only supports CF-compliant axis names (e.g., “X”, “Y”)
• Remove center_times kwarg from temporal averaging methods by Tom Vo in https://github.com/xCDAT/
xcdat/pull/254

Documentation

• Revert official project name from “XCDAT” to “xCDAT” by Tom Vo in https://github.com/xCDAT/xcdat/pull/231
• [DOC] Add CDAT API mapping table and gallery examples by Tom Vo in https://github.com/xCDAT/xcdat/
pull/239

Internal Changes

• Update time coordinates object type from MultiIndex to datetime/cftime for TemporalAccessor reduction
methods and add convenience methods by Tom Vo in https://github.com/xCDAT/xcdat/pull/221
• Extract method _postprocess_dataset() and make bounds generation optional by Tom Vo in https://github.
com/xCDAT/xcdat/pull/223
• Update add_bounds kwarg default value to True by Tom Vo in https://github.com/xCDAT/xcdat/pull/230
• Update decode_non_cf_time to return input dataset if the time “units” attr can’t be split into unit and reference
date by Stephen Po-Chedley in https://github.com/xCDAT/xcdat/pull/263
Full Changelog: https://github.com/xCDAT/xcdat/compare/v0.2.0...v0.3.0

10.6.13 v0.2.0 (24 March 2022)

New Features

• Add support for spatial averaging parallelism via Dask by Stephen Po-Chedley in https://github.com/xCDAT/
xcdat/pull/132
• Refactor spatial averaging with more robust handling of longitude spanning prime meridian by Stephen Po-
Chedley in https://github.com/xCDAT/xcdat/pull/152
• Update xcdat.open_mfdataset time decoding logic by Stephen Po-Chedley in https://github.com/xCDAT/xcdat/
pull/161
• Add function to swap dataset longitude axis orientation by Tom Vo in https://github.com/xCDAT/xcdat/pull/145
• Add utility functions by Tom Vo in https://github.com/xCDAT/xcdat/pull/205
• Add temporal utilities and averaging functionalities by Tom Vo in https://github.com/xCDAT/xcdat/pull/107


Bug Fixes

• Add exception for coords of len <= 1 or multidimensional coords in fill_missing_bounds() by Tom Vo in
https://github.com/xCDAT/xcdat/pull/141
• Update open_mfdataset() to avoid data vars dim concatenation by Tom Vo in https://github.com/xCDAT/
xcdat/pull/143
• Fix indexing on axis keys using generic map (related to spatial averaging) by Tom Vo in https://github.com/
xCDAT/xcdat/pull/172

Breaking Changes

• Rename accessor classes and methods for API consistency by Tom Vo in https://github.com/xCDAT/xcdat/pull/
142
• Rename fill_missing_bounds() to add_missing_bounds() by Tom Vo in https://github.com/xCDAT/
xcdat/pull/157
• Remove data variable inference API by Tom Vo in https://github.com/xCDAT/xcdat/pull/196
• Rename spatial file and class by Tom Vo in https://github.com/xCDAT/xcdat/pull/207

Documentation

• update README by Jill Chengzhu Zhang in https://github.com/xCDAT/xcdat/pull/127


• Update readme by Jiwoo Lee in https://github.com/xCDAT/xcdat/pull/129
• Update HISTORY.rst and fix docstrings by Tom Vo in https://github.com/xCDAT/xcdat/pull/139
• Update README.rst content and add logo by Tom Vo in https://github.com/xCDAT/xcdat/pull/153
• Update API Reference docs to list all APIs by Tom Vo in https://github.com/xCDAT/xcdat/pull/155
• Add config.yml for issue templates with link to discussions by Tom Vo in https://github.com/xCDAT/xcdat/
pull/176
• Add FAQs page to docs by Tom Vo in https://github.com/xCDAT/xcdat/pull/181
• Fix syntax of code examples from PR #181 by Tom Vo in https://github.com/xCDAT/xcdat/pull/182
• Replace markdown issue templates with GitHub yml forms by Tom Vo in https://github.com/xCDAT/xcdat/pull/
186
• Update README.rst, index.rst, and project_maintenance.rst by Tom Vo in https://github.com/xCDAT/
xcdat/pull/211

Deprecations

Internal Changes

• Update logger levels to debug by Tom Vo in https://github.com/xCDAT/xcdat/pull/148


• Update and remove logger debug messages by Tom Vo in https://github.com/xCDAT/xcdat/pull/193


DevOps

• Add requires_dask decorator for tests by Tom Vo in https://github.com/xCDAT/xcdat/pull/177


• Update dependencies in setup.py and dev.yml by Tom Vo in https://github.com/xCDAT/xcdat/pull/174
• Add matrix testing and ci specific conda env by Tom Vo in https://github.com/xCDAT/xcdat/pull/178
• Suppress xarray warning in test suite by Tom Vo in https://github.com/xCDAT/xcdat/pull/179
• Drop support for Python 3.7 by Tom Vo in https://github.com/xCDAT/xcdat/pull/187
• Update conda env dependencies by Tom Vo in https://github.com/xCDAT/xcdat/pull/189
• Add deps to pre-commit mypy and fix issues by Tom Vo in https://github.com/xCDAT/xcdat/pull/191
• Add matplotlib to dev env, update ci.yml and add Python 3.10 to build workflow by Tom Vo in https://github.
com/xCDAT/xcdat/pull/203
• Replace conda with mamba in rtd build by Tom Vo in https://github.com/xCDAT/xcdat/pull/209

New Contributors

• Jill Chengzhu Zhang made their first contribution in https://github.com/xCDAT/xcdat/pull/127


• Jiwoo Lee made their first contribution in https://github.com/xCDAT/xcdat/pull/129
• Stephen Po-Chedley made their first contribution in https://github.com/xCDAT/xcdat/pull/132
Full Changelog: https://github.com/xCDAT/xcdat/compare/v0.1.0...v0.2.0

10.6.14 v0.1.0 (7 October 2021)

New Features

• Add geospatial averaging API through DatasetSpatialAverageAccessor class by Stephen Po-Chedley and
Tom Vo in #87
– Does not support parallelism with Dask yet
• Add wrappers for xarray’s open_dataset and open_mfdataset to apply common operations such as:
– If the dataset has a time dimension, decode both CF and non-CF time units
– Generate bounds for supported coordinates if they don’t exist
– Option to limit the Dataset to a single regular (non-bounds) data variable while retaining any bounds data
variables
• Add DatasetBoundsAccessor class for filling missing bounds, returning mapping of bounds, returning names
of bounds keys
• Add BoundsAccessor class for accessing xcdat public methods from other accessor classes
– This will probably be the API endpoint for most users, unless they prefer importing the individual accessor classes
• Add ability to infer data variables in xcdat APIs based on the “xcdat_infer” Dataset attr
– This attr is set in xcdat.open_dataset(), xcdat.open_mfdataset(), or manually
• Utilizes cf_xarray package (https://github.com/xarray-contrib/cf-xarray)


Documentation

• Visit the docs here: https://xcdat.readthedocs.io/en/latest/index.html

DevOps

• 100% code coverage (https://app.codecov.io/gh/xCDAT/xcdat)


• GH Actions for CI/CD build (https://github.com/xCDAT/xcdat/actions)
• Pytest and pytest-cov for test suite
Full Changelog: https://github.com/xCDAT/xcdat/commits/v0.1.0

10.7 Frequently Asked Questions

10.7.1 Metadata Interpretation

What types of datasets does xcdat primarily focus on?

xcdat supports datasets with structured grids that follow the CF convention, but will also strive to support datasets with common non-CF compliant metadata (e.g., time units in “months since ...” or “years since ...”).

What structured grids does xcdat support?

xCDAT aims to be a generalizable package that is compatible with structured grids that are CF-compliant (e.g.,
CMIP6). xCDAT’s horizontal regridder supports grids that are supported by Regrid2 and xESMF (curvilinear and
rectilinear).

How does xcdat interpret dataset metadata?

xcdat leverages cf_xarray to interpret CF attributes on xarray objects. xcdat methods and functions usually accept an axis argument (e.g., ds.spatial.average("ts", axis=["X", "Y"])). This argument is internally mapped to cf_xarray mapping tables that interpret the CF attributes.

What CF attributes are interpreted using cf_xarray mapping tables?

• Axis names – used to map to dimension coordinates


– For example, any xr.DataArray that has axis: "Y" in its attrs will be identified as the “latitude” coordinate variable by cf_xarray.
– Refer to the cf_xarray Axis Names table for more information.
• Coordinate names – used to map to dimension coordinates
– For example, any xr.DataArray that has standard_name: "latitude" or _CoordinateAxisType:
"Lat" or "units": "degrees_north" in its attrs will be identified as the “latitude” coordinate variable
by cf_xarray.
– Refer to the cf_xarray Coordinate Names table for more information.
• Bounds attribute – used to map to bounds data variables


– For example, the latitude coordinate variable has bounds: "lat_bnds", which maps its bounds to the
lat_bnds data variable.
– Refer to cf_xarray Bounds Variables page for more information.

10.7.2 Handling Bounds

How are bounds generated in xCDAT?

xCDAT generates bounds by treating each coordinate point as the midpoint between its lower and upper bounds, so interior bound edges fall halfway between adjacent coordinate points.
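As an illustration of this midpoint scheme, here is a minimal NumPy sketch (not xcdat's actual implementation) that builds bounds for a 1-D coordinate, extrapolating the two end bounds by half a step:

```python
import numpy as np

def midpoint_bounds(points: np.ndarray) -> np.ndarray:
    """Illustrative midpoint bounds for a 1-D, monotonically increasing coordinate.

    Interior bound edges fall halfway between adjacent points; the first and
    last bounds are extrapolated by the same half-step distance.
    """
    mids = (points[:-1] + points[1:]) / 2.0
    lower = np.concatenate([[points[0] - (mids[0] - points[0])], mids])
    upper = np.concatenate([mids, [points[-1] + (points[-1] - mids[-1])]])
    return np.column_stack([lower, upper])

lat = np.array([-60.0, 0.0, 60.0])
print(midpoint_bounds(lat))
# [[-90. -30.]
#  [-30.  30.]
#  [ 30.  90.]]
```

Note how each coordinate point sits exactly in the middle of its generated bounds row.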

Does xCDAT support generating bounds for multiple axis coordinate systems in the same dataset?

For example, there are two sets of coordinates called “lat” and “latitude” in the dataset.
Yes, xCDAT can generate bounds for axis coordinates if they are “dimension coordinates” (coordinate variables in CF
terminology) and have the required CF metadata. “Non-dimension coordinates” (auxiliary coordinate variables in CF
terminology) are ignored.
Visit Xarray’s documentation page on Coordinates for more info on “dimension coordinates” vs. “non-dimension
coordinates”.

10.7.3 Temporal Metadata

What type of time units are supported?

The units attribute must be in the CF compliant format "<units> since <reference_date>". For example, "days
since 1990-01-01".
Supported CF compliant units include day, hour, minute, second, which is inherited from xarray and cftime.
Supported non-CF compliant units include year and month, which xcdat is able to parse. Note that the plural forms of these units are also accepted.
References:
• https://cfconventions.org/cf-conventions/cf-conventions#time-coordinate
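The "&lt;units&gt; since &lt;reference_date&gt;" format can be split with a few lines of plain Python. The helper below is a hypothetical sketch for illustration, not xcdat's internal parser:

```python
def split_time_units(units: str):
    """Split a CF-style time 'units' attribute into (unit, reference_date).

    Hypothetical helper illustrating the "<units> since <reference_date>"
    format; plural forms ("days" -> "day") are normalized.
    """
    unit, _, reference_date = units.partition(" since ")
    if not reference_date:
        raise ValueError(f"Invalid time units attribute: {units!r}")
    return unit.rstrip("s"), reference_date

print(split_time_units("days since 1990-01-01"))    # ('day', '1990-01-01')
print(split_time_units("months since 1850-01-01"))  # ('month', '1850-01-01')
```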

What type of calendars are supported?

xcdat supports the same CF convention calendars as xarray (based on the cftime and netCDF4-python packages).
Supported calendars include:
• 'standard'
• 'gregorian'
• 'proleptic_gregorian'
• 'noleap'
• '365_day'
• '360_day'
• 'julian'
• 'all_leap'

• '366_day'
References:
• https://cfconventions.org/cf-conventions/cf-conventions#calendar

Why does xcdat decode time coordinates as cftime objects instead of datetime64[ns]?

One unfortunate limitation of using datetime64[ns] is that it limits the native representation of dates to those that
fall between the years 1678 and 2262. This affects climate modeling datasets that have time coordinates outside of this
range.
As a workaround, xarray uses the cftime library when decoding/encoding datetimes for non-standard calendars or
for dates before year 1678 or after year 2262.
xcdat opted to decode time coordinates exclusively with cftime because it has no timestamp range limitations, simplifies implementation, and the output object type is deterministic.
References:
• https://github.com/pydata/xarray/issues/789
• https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
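The 1678–2262 window follows from the underlying representation: datetime64[ns] stores a signed 64-bit count of nanoseconds relative to the Unix epoch (1970), which spans roughly ±292 years. A quick arithmetic check:

```python
# datetime64[ns] packs nanoseconds since 1970 into a signed 64-bit integer,
# so the representable window is 1970 +/- 2**63 nanoseconds (~292 years).
NS_PER_YEAR = 1e9 * 60 * 60 * 24 * 365.25
span_years = 2**63 / NS_PER_YEAR
print(round(1970 - span_years), round(1970 + span_years))  # 1678 2262
```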

10.7.4 Data Wrangling

xcdat aims to implement generalized functionality. This means that functionality intended to handle data quality issues
is out of scope, especially for limited cases.
If data quality issues are present, xarray and xcdat might not be able to open the datasets. Examples of data quality
issues include conflicting floating point values between files or non-CF compliant attributes that are not common.
A few workarounds include:
1. Configuring open_dataset() or open_mfdataset() keyword arguments based on your needs.
2. Writing a custom preprocess() function to feed into open_mfdataset(). This function preprocesses each
dataset file individually before joining them into a single Dataset object.
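Workaround 2 can be sketched as follows. This is a hypothetical example (the variable name "lon_bnds" and the file path are placeholders), not a fix prescribed by xcdat:

```python
# A minimal preprocess() sketch that drops a problematic bounds variable from
# each file before xarray concatenates them into one Dataset.
def preprocess(ds):
    # errors="ignore" skips files that do not contain the variable.
    return ds.drop_vars(["lon_bnds"], errors="ignore")

# Hypothetical usage (assuming xcdat is installed and the files exist):
# ds = xcdat.open_mfdataset("path/to/files/*.nc", preprocess=preprocess)
```

Because preprocess() runs on each file's Dataset individually, it can also rename variables or fix attributes before the merge.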

How do I open a multi-file dataset with bounds values that conflict?

In xarray, the default setting for checking compatibility across a multi-file dataset is compat='no_conflicts'.
In cases where variable values conflict between files, xarray raises MergeError: conflicting values for
variable <VARIABLE NAME> on objects to be combined. You can skip this check by specifying
compat="override".
If you still intend on working with these datasets and recognize the source of the issue (e.g., minor floating point diffs),
follow the workarounds below. Please proceed with caution. You should understand the potential implications of
these workarounds.
1. Pick the first bounds variable and keep dimensions the same as the input files
• This option is recommended if you know bounds values should be the same across all files, but one or
more files has inconsistent bounds values which breaks the concatenation of files into a single xr.Dataset
object.


>>> ds = xcdat.open_mfdataset(
...     "path/to/files/*.nc",
...     compat="override",
...     data_vars="minimal",
...     coords="minimal",
...     join="override",
... )

• compat="override": skip comparing and pick the variable from the first dataset
– xarray defaults to compat="no_conflicts"
• data_vars="minimal": Only data variables in which the dimension already appears are included.
– xcdat defaults to data_vars="minimal"
– xarray defaults to data_vars="all"
• coords="minimal": Only coordinates in which the dimension already appears are included.
– xarray defaults to coords="different"
• join="override": if indexes are of the same size, rewrite indexes to be those of the first object with that dimension. Indexes for the same dimension must have the same size in all objects.
– Alternatively, join="left": use indexes from the first object with each dimension
– xarray defaults to join="outer". This can cause issues where data variable values conflict, because additional coordinate points are concatenated at the point of conflict, which can produce nan values.
2. Drop the conflicting bounds variable(s)
• This option is recommended if you don’t mind dropping the bounds variable(s). xcdat will generate and replace the dropped bounds if add_bounds includes the axis for the dropped variable (by default, add_bounds=["X", "Y"]).

>>> # Drop a single variable
>>> xcdat.open_mfdataset("path/to/files/*.nc", drop_variables="lon_bnds")
>>> # Drop multiple variables
>>> xcdat.open_mfdataset("path/to/files/*.nc", drop_variables=["lon_bnds", "lat_bnds"])

For more information on these options, visit the xarray.open_mfdataset documentation.

10.7.5 Regridding

xcdat extends and provides a uniform interface to xESMF and xgcm. In addition, xcdat provides a port of the CDAT
regrid2 package.
Structured rectilinear and curvilinear grids are supported.


How can I retrieve the grid from a dataset?

The xcdat.regridder.accessor.RegridderAccessor.grid property is provided to extract the grid information from a dataset.

ds = xcdat.open_dataset(...)
grid = ds.regridder.grid

How do I perform horizontal regridding?

The xcdat.regridder.accessor.RegridderAccessor.horizontal() method provides access to the xESMF and Regrid2 packages.
The arguments for each regridder can be found:
• xcdat.regridder.xesmf.XESMFRegridder()
• xcdat.regridder.regrid2.Regrid2Regridder()
An example of horizontal regridding can be found in the gallery.

How do I perform vertical regridding?

The xcdat.regridder.accessor.RegridderAccessor.vertical() method provides access to the xgcm package.
The arguments for each regridder can be found:
• xcdat.regridder.xgcm.XGCMRegridder()
An example of vertical regridding can be found in the gallery.

Can xcdat automatically derive Parametric Vertical Coordinates in a dataset?

Automatically deriving Parametric Vertical Coordinates is a planned feature for xcdat.

Can I regrid data on unstructured grids?

Regridding data on unstructured grids is a feature we are exploring for xcdat.

10.8 xCDAT Community Code of Conduct

10.8.1 Our Pledge

We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience
for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity
and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste,
color, religion, or sexual identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.


10.8.2 Our Standards

Examples of behavior that contributes to a positive environment for our community include:
• Demonstrating empathy and kindness toward other people
• Being respectful of differing opinions, viewpoints, and experiences
• Giving and gracefully accepting constructive feedback
• Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
• Focusing on what is best not just for us as individuals, but for the overall community
Examples of unacceptable behavior include:
• The use of sexualized language or imagery, and sexual attention or advances of any kind
• Trolling, insulting or derogatory comments, and personal or political attacks
• Public or private harassment
• Publishing others’ private information, such as a physical or email address, without their explicit permission
• Other conduct which could reasonably be considered inappropriate in a professional setting

10.8.3 Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take
appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.

10.8.4 Scope

This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing
the community in public spaces. Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed representative at an online or offline event.

10.8.5 Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at [email protected]. All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the reporter of any incident.


10.8.6 Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining the consequences for any action
they deem in violation of this Code of Conduct:

1. Correction

Community Impact: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the
community.
Consequence: A private, written warning from community leaders, providing clarity around the nature of the violation
and an explanation of why the behavior was inappropriate. A public apology may be requested.

2. Warning

Community Impact: A violation through a single incident or series of actions.


Consequence: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.

3. Temporary Ban

Community Impact: A serious violation of community standards, including sustained inappropriate behavior.
Consequence: A temporary ban from any sort of interaction or public communication with the community for a
specified period of time. No public or private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent
ban.

4. Permanent Ban

Community Impact: Demonstrating a pattern of violation of community standards, including sustained inappropriate
behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.
Consequence: A permanent ban from any sort of public interaction within the community.

10.8.7 Attribution

This Code of Conduct is adapted from the Contributor Covenant, version 2.1, available at https://www.
contributor-covenant.org/version/2/1/code_of_conduct.html.
Community Impact Guidelines were inspired by Mozilla’s code of conduct enforcement ladder.
For answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq.
Translations are available at https://www.contributor-covenant.org/translations.


10.9 Contributing

Contributions are welcome and greatly appreciated! Every little bit helps, and credit will always be given.

10.9.1 Types of Contributions

xCDAT includes issue templates based on the contribution type: https://github.com/xCDAT/xcdat/issues/new/choose.


Note that new contributions must be made under the Apache-2.0 with LLVM exception license.

Bug Report

Look through the GitHub Issues for bugs to fix. Any unassigned issue tagged with “Type: Bug” is open for implementation.

Feature Request

Look through the GitHub Issues for feature suggestions. Any unassigned issue tagged with “Type: Enhancement” is open for implementation.
If you are proposing a feature:
• Explain in detail how it would work.
• Keep the scope as narrow as possible, to make it easier to implement.
• Remember that this is an open-source project, and that contributions are welcome :)
Features must meet the following criteria before they are considered for implementation:
1. Feature is not implemented by xarray
2. Feature is not implemented in another actively developed xarray-based package
• For example, cf_xarray already handles interpretation of CF convention attributes on xarray objects
3. Feature is not limited to specific use cases (e.g., data quality issues)
4. Feature is generally reusable
5. Feature is relatively simple and lightweight to implement and use

Documentation Update

Help improve xCDAT’s documentation, whether that be the Sphinx documentation or the API docstrings.

Community Discussion

Take a look at the GitHub Discussions page to get involved, share ideas, or ask questions.


10.9.2 Version Control

The repository uses branch-based (core team) and fork-based (external collaborators) Git workflows with tagged software releases.

Guidelines

1. main must always be deployable
2. All changes are made through support branches
3. Rebase with the latest main to avoid/resolve conflicts
4. Make sure pre-commit quality assurance checks pass when committing (enforced in CI/CD build)
5. Open a pull request early for discussion
6. Once the CI/CD build passes and pull request is approved, squash and rebase your commits
7. Merge pull request into main and delete the branch

Things to Avoid

1. Don’t merge in broken or commented out code
2. Don’t commit directly to main
• There are branch-protection rules for main
3. Don’t merge with conflicts. Instead, handle conflicts upon rebasing
Source: https://gist.github.com/jbenet/ee6c9ac48068889b0912

Pre-commit

The repository uses the pre-commit package to manage pre-commit hooks. These hooks help enforce quality assurance
standards by identifying simple issues at the commit level, before code is submitted for review.

Fig. 1: pre-commit Flow
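The hooks live in a .pre-commit-config.yaml file at the repository root. A minimal sketch of such a file is shown below; the hook set mirrors the checks listed later in this guide, but the repository pins (rev values) are illustrative, not the project's actual versions:

```yaml
# Sketch of a .pre-commit-config.yaml (rev pins are illustrative)
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace  # "Trim Trailing Whitespace" check
      - id: end-of-file-fixer    # "Fix End of Files" check
      - id: check-yaml           # "Check Yaml" check
  - repo: https://github.com/psf/black
    rev: 23.9.1
    hooks:
      - id: black                # code formatting
```

Running pre-commit install (covered in the setup steps below) registers these hooks so they run automatically on every git commit.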

10.9.3 Get Started

Ready to contribute? Here’s how to set up xCDAT for local development.


VS Code, the editor of choice

We recommend using VS Code as your IDE because it is open-source and has great Python development support.
Get VS Code here: https://code.visualstudio.com

VS Code Setup

xCDAT includes a VS Code workspace file (.vscode/xcdat.code-settings). This file automatically configures your IDE
with the quality assurance tools, code line-length rulers, and more.
Make sure to follow the Local Development section below.

Recommended VS Code Extensions

• Python
• Pylance
• Python Docstring Generator
• Python Type Hint
• Better Comments
• Jupyter
• Visual Studio Intellicode

Local Development

1. Download and install Conda


Linux

$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash ./Miniconda3-latest-Linux-x86_64.sh
Do you wish the installer to initialize Miniconda3 by running conda init? [yes|no] yes

MacOS

$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
$ bash ./Miniconda3-latest-MacOSX-x86_64.sh
Do you wish the installer to initialize Miniconda3 by running conda init? [yes|no] yes

2. Fork the xcdat repo on GitHub.
   • If you are a maintainer, you can clone and branch directly from the root repository here: https://github.com/xCDAT/xcdat
3. Clone your fork locally:


$ git clone git@github.com:your_name_here/xcdat.git

4. <OPTIONAL> Open .vscode/xcdat.code-settings in VS Code


5. Create and activate Conda development environment:

$ cd xcdat
$ conda env create -f conda-env/dev.yml
$ conda activate xcdat_dev

6. <OPTIONAL> Set VS Code Python interpreter to xcdat_dev


7. Install pre-commit:

$ pre-commit install
pre-commit installed at .git/hooks/pre-commit

8. Create a branch for local development and make changes:

$ git checkout -b <BRANCH-NAME>

9. <OPTIONAL> During or after making changes, check for formatting or linting issues using pre-commit:

$ # Step 11 (committing) runs these checks automatically on staged files
$ pre-commit run --all-files

Trim Trailing Whitespace.................................................Passed
Fix End of Files.........................................................Passed
Check Yaml...............................................................Passed
black....................................................................Passed
isort....................................................................Passed
flake8...................................................................Passed
mypy.....................................................................Passed

10. Generate code coverage report and check unit tests pass:

$ make test  # Automatically opens HTML report in your browser
$ pytest     # Does not automatically open HTML report in your browser

================================= test session starts =================================
platform darwin -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: <your-local-dir/xcdat>, configfile: setup.cfg
plugins: anyio-2.2.0, cov-2.11.1
collected 3 items

tests/test_dataset.py ..
tests/test_xcdat.py .

---------- coverage: platform darwin, python 3.8.8-final-0 -----------
Name                Stmts   Miss  Cover
---------------------------------------
xcdat/__init__.py       3      0   100%
xcdat/dataset.py       18      0   100%
xcdat/xcdat.py          0      0   100%
---------------------------------------
TOTAL                  21      0   100%
Coverage HTML written to dir tests_coverage_reports/htmlcov
Coverage XML written to file tests_coverage_reports/coverage.xml

• The Coverage HTML report is much more detailed (e.g., exact lines of tested/untested code)
11. Commit your changes:

$ git add .
$ git commit -m "Your detailed description of your changes"

Trim Trailing Whitespace.................................................Passed
Fix End of Files.........................................................Passed
Check Yaml...............................................................Passed
black....................................................................Passed
isort....................................................................Passed
flake8...................................................................Passed
mypy.....................................................................Passed

12. Make sure pre-commit QA checks pass. Otherwise, fix any caught issues.
• Most of the tools fix issues automatically so you just need to re-stage the files.
• flake8 and mypy issues must be fixed manually.
13. Push changes:

$ git push origin <BRANCH-NAME>

14. Submit a pull request through the GitHub website.

10.9.4 Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:
1. The pull request should include tests for new or modified code.
2. Link issues to pull requests.
3. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with
a docstring, and add the feature to the list in README.rst.
4. Squash and rebase commits for a clean and navigable Git history.
When you open a pull request on GitHub, there is a template available for use.


10.9.5 Style Guide

xCDAT integrates the Black code formatter for code styling. If you want to learn more, please read about it here.
xCDAT also leverages Python Type Annotations to help the project scale. mypy performs optional static type checking
through pre-commit.
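As an illustration, the snippet below (a made-up function, not from the xCDAT codebase) shows Black-compatible formatting with type annotations that mypy can verify statically:

```python
from typing import List, Optional


def mean(values: List[float], weights: Optional[List[float]] = None) -> float:
    """Compute an (optionally weighted) arithmetic mean."""
    if weights is None:
        # Unweighted case: every value contributes equally.
        weights = [1.0] * len(values)

    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight
```

With these annotations in place, mypy would flag a call such as mean("abc") as a type error before the code ever runs.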

10.9.6 Testing

Testing your local changes is important to ensure the long-term maintainability and extensibility of the project. Since
xCDAT is an open-source library, we aim to prevent as many bugs as possible from reaching the end-user.
To get started, here are guides on how to write tests using pytest:
• https://docs.pytest.org/en/latest/
• https://docs.python-guide.org/writing/tests/#py-test
In most cases, if a function is hard to test, it is usually a symptom of being too complex (high cyclomatic complexity).
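For example, a minimal pytest-style test module (the function under test is hypothetical) groups related assertions into a test class:

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


class TestFahrenheitToCelsius:
    def test_freezing_point(self):
        assert fahrenheit_to_celsius(32.0) == 0.0

    def test_boiling_point(self):
        assert fahrenheit_to_celsius(212.0) == 100.0

    def test_symmetry_point(self):
        # -40 is the same temperature on both scales (an edge case worth covering).
        assert fahrenheit_to_celsius(-40.0) == -40.0
```

Running pytest discovers any test_*.py module under tests/ and executes each test_* method automatically.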

DOs for Testing

• DO write tests for new or refactored code
• DO try to follow test-driven development
• DO use the Coverage reports to see lines of code that need to be tested
• DO focus on simplistic, small, reusable modules for unit testing
• DO cover as many edge cases as possible when testing

DON’Ts for Testing

• DON’T push or merge untested code
• DON’T introduce tests that fail or produce warnings

10.9.7 Documenting Code

If you are using VS Code, the Python Docstring Generator extension can be used to auto-generate a docstring snippet
once a function/class has been written. If you want the extension to generate docstrings in Sphinx format, you must set
the "autoDocstring.docstringFormat": "sphinx" setting, under File > Preferences > Settings.
Note that it is best to write the docstrings once you have fully defined the function/class, as then the extension will
generate the full docstring. If you make any changes to the code once a docstring is generated, you will have to
manually go and update the affected docstrings.
More info on docstrings here: https://sphinx-rtd-tutorial.readthedocs.io/en/latest/docstrings.html
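For reference, a completed Sphinx-format docstring on a made-up function looks like this:

```python
def scale(value: float, factor: float = 2.0) -> float:
    """Multiply a value by a scaling factor.

    :param value: The value to scale
    :type value: float
    :param factor: The scaling factor, defaults to 2.0
    :type factor: float, optional
    :return: The scaled value
    :rtype: float
    """
    return value * factor
```

The :param:, :type:, :return:, and :rtype: fields are what the extension generates once the signature is complete, which is why writing the docstring last saves manual edits.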


DOs for Documenting Code

• DO explain why something is done, its purpose, and its goal. The code shows how it is done, so commenting on
this can be redundant.
• DO explain ambiguity or complexities to avoid confusion
• DO embrace documentation as an integral part of the overall development process
• DO treat documenting as code and follow principles such as Don’t Repeat Yourself and Easier to Change

DON’Ts for Documenting Code

• DON’T write comments as a crutch for poor code
• DON’T comment every function, data structure, or type declaration

10.9.8 Developer Tips

• flake8 will warn you if the cyclomatic complexity of a function is too high.
  – https://github.com/PyCQA/mccabe
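One common way to bring the McCabe score down, sketched here with hypothetical names (not xCDAT code), is to replace a long if/elif chain with a lookup table:

```python
# Higher complexity: each branch adds to the McCabe score.
def get_season_old(month: int) -> str:
    if month in (12, 1, 2):
        return "DJF"
    elif month in (3, 4, 5):
        return "MAM"
    elif month in (6, 7, 8):
        return "JJA"
    else:
        return "SON"


# Lower complexity: a single dict lookup replaces the branching.
SEASONS = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def get_season(month: int) -> str:
    return SEASONS[month]
```

For valid months the two functions behave identically, but the dict-based version keeps a cyclomatic complexity of 1 no matter how many cases are added (note it raises KeyError for invalid months instead of falling through).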

10.9.9 Helpful Commands

Note: Run make help in the root of the project for a list of useful commands

To run a subset of tests:

$ pytest tests/test_xcdat.py

10.9.10 FAQs

Why squash and rebase commits?

Before you merge a support branch back into main, the branch is typically squashed down to a single buildable commit,
and then rebased on top of the main repo’s main branch.
Why?
• Ensures build passes from the commit
• Cleans up Git history for easy navigation
• Makes collaboration and review process more efficient
• Makes handling conflicts from rebasing simple since you only have to deal with conflicted commits


How do I squash and rebase commits?

• Use GitHub’s Squash and Merge feature in the pull request
  – You still need to rebase on the latest main if main is ahead of your branch.
• Manually squash and rebase
1. <OPTIONAL if you are forking> Sync your fork of main (aka origin) with the root main (aka upstream)

git checkout main
git rebase upstream/main
git push -f origin main

2. Get the SHA of the commit OR number of commits to rebase to

git checkout <branch-name>
git log --graph --decorate --pretty=oneline --abbrev-commit

3. Squash commits:

git rebase -i [SHA]

# OR

git rebase -i HEAD~[NUMBER OF COMMITS]

4. Rebase branch onto main

git rebase main

5. Make sure your squashed commit messages are refined
6. Force push to remote branch

git push -f origin <BRANCH-NAME>

10.10 Project Maintenance

This page covers tips for project maintenance.

10.10.1 Releasing a New Version

1. Checkout the latest main branch.
2. Checkout a branch with the name of the version.

# For release candidates, append "rc" to <version>
git checkout -b <version>
git push --set-upstream origin <version>

3. Add updates to HISTORY.rst and commit.
4. Bump version using tbump.


# <version> should match step 2
# --no-tag is required since tagging is handled by the GitHub release (step 6)
tbump <version> --no-tag

5. Create a pull request to the main repo and merge it.
6. Create a GitHub release.
7. Open a PR to the xcdat conda-forge feedstock with the latest changes.
• https://github.com/conda-forge/xcdat-feedstock

10.10.2 Continuous Integration / Continuous Delivery (CI/CD)

This project uses GitHub Actions to run the CI/CD build workflow.
This workflow is triggered by Git pull_request and push (merging PRs) events to the main repo’s main branch.
Jobs:
1. Run pre-commit for formatting, linting, and type checking
2. Build conda CI/CD environment with different Python versions, install package, and run test suite
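A simplified sketch of what such a GitHub Actions workflow file can look like (the file name, action versions, and Python versions are illustrative, not the project's actual configuration):

```yaml
# Sketch of .github/workflows/build_workflow.yml (hypothetical)
name: CI/CD Build Workflow
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
      - run: pip install pre-commit && pre-commit run --all-files

  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]
    steps:
      - uses: actions/checkout@v3
      - uses: conda-incubator/setup-miniconda@v2
        with:
          environment-file: conda-env/dev.yml
          python-version: ${{ matrix.python-version }}
      - run: pytest
```

The matrix strategy is what runs the test suite against multiple Python versions in parallel, matching job 2 above.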

10.11 The Team

• Tom Vo
• Jason Boutte
• Stephen Po-Chedley
• Jill Chengzhu Zhang
• Jiwoo Lee

INDEX

Symbols

__init__() (xcdat.bounds.BoundsAccessor method), 131
__init__() (xcdat.regridder.accessor.RegridderAccessor method), 159
__init__() (xcdat.regridder.regrid2.Regrid2Regridder method), 163
__init__() (xcdat.regridder.xesmf.XESMFRegridder method), 165
__init__() (xcdat.regridder.xgcm.XGCMRegridder method), 167
__init__() (xcdat.spatial.SpatialAccessor method), 138
__init__() (xcdat.temporal.TemporalAccessor method), 145
_abc_impl (xcdat.regridder.regrid2.Regrid2Regridder attribute), 165
_abc_impl (xcdat.regridder.xesmf.XESMFRegridder attribute), 167
_abc_impl (xcdat.regridder.xgcm.XGCMRegridder attribute), 169
_add_months_to_timestep() (xcdat.bounds.BoundsAccessor method), 136
_add_operation_attrs() (xcdat.temporal.TemporalAccessor method), 158
_average() (xcdat.temporal.TemporalAccessor method), 154
_averager() (xcdat.spatial.SpatialAccessor method), 144
_averager() (xcdat.temporal.TemporalAccessor method), 152
_base_put_indexes() (xcdat.regridder.regrid2.Regrid2Regridder method), 164
_calculate_weights() (xcdat.spatial.SpatialAccessor method), 142
_combine_weights() (xcdat.spatial.SpatialAccessor method), 143
_convert_df_to_dt() (xcdat.temporal.TemporalAccessor method), 158
_create_daily_time_bounds() (xcdat.bounds.BoundsAccessor method), 137
_create_monthly_time_bounds() (xcdat.bounds.BoundsAccessor method), 136
_create_output_dataset() (xcdat.regridder.regrid2.Regrid2Regridder method), 165
_create_time_bounds() (xcdat.bounds.BoundsAccessor method), 135
_create_yearly_time_bounds() (xcdat.bounds.BoundsAccessor method), 136
_dataset (xcdat.bounds.BoundsAccessor attribute), 132
_drop_ancillary_singleton_coords() (xcdat.bounds.BoundsAccessor method), 134
_drop_incomplete_djf() (xcdat.temporal.TemporalAccessor method), 154
_drop_leap_days() (xcdat.temporal.TemporalAccessor method), 154
_drop_obsolete_columns() (xcdat.temporal.TemporalAccessor method), 158
_ds (xcdat.regridder.accessor.RegridderAccessor attribute), 160
_form_seasons() (xcdat.temporal.TemporalAccessor method), 153
_get_axis_data() (xcdat.regridder.accessor.RegridderAccessor method), 160
_get_bounds_keys() (xcdat.bounds.BoundsAccessor method), 135
_get_df_dt_components() (xcdat.temporal.TemporalAccessor method), 156
_get_grid_positions() (xcdat.regridder.xgcm.XGCMRegridder method), 169
_get_latitude_weights() (xcdat.spatial.SpatialAccessor method), 142
_get_longitude_weights() (xcdat.spatial.SpatialAccessor method), 141
_get_weights() (xcdat.temporal.TemporalAccessor method), 154
_group_average() (xcdat.temporal.TemporalAccessor method), 154
_group_data() (xcdat.temporal.TemporalAccessor method), 155
_is_valid_reference_period() (xcdat.temporal.TemporalAccessor method), 153
_keep_weights() (xcdat.temporal.TemporalAccessor method), 158
_label_time_coords() (xcdat.temporal.TemporalAccessor method), 155
_map_months_to_custom_seasons() (xcdat.temporal.TemporalAccessor method), 157
_map_seasons_to_mid_months() (xcdat.temporal.TemporalAccessor method), 157
_output_axis_sizes() (xcdat.regridder.regrid2.Regrid2Regridder method), 164
_preprocess_dataset() (xcdat.temporal.TemporalAccessor method), 154
_process_season_df() (xcdat.temporal.TemporalAccessor method), 157
_regrid() (xcdat.regridder.regrid2.Regrid2Regridder method), 164
_scale_domain_to_region() (xcdat.spatial.SpatialAccessor method), 143
_set_arg_attrs() (xcdat.temporal.TemporalAccessor method), 153
_set_data_var_attrs() (xcdat.temporal.TemporalAccessor method), 153
_shift_decembers() (xcdat.temporal.TemporalAccessor method), 157
_swap_lon_axis() (xcdat.spatial.SpatialAccessor method), 142
_validate_axis_arg() (xcdat.bounds.BoundsAccessor method), 138
_validate_axis_arg() (xcdat.spatial.SpatialAccessor method), 141
_validate_region_bounds() (xcdat.spatial.SpatialAccessor method), 141
_validate_weights() (xcdat.spatial.SpatialAccessor method), 143

A

add_bounds() (xarray.Dataset.bounds method), 171
add_bounds() (xcdat.bounds.BoundsAccessor method), 133
add_missing_bounds() (xarray.Dataset.bounds method), 173
add_missing_bounds() (xcdat.bounds.BoundsAccessor method), 132
add_time_bounds() (xarray.Dataset.bounds method), 172
add_time_bounds() (xcdat.bounds.BoundsAccessor method), 133
average() (xarray.Dataset.spatial method), 173
average() (xarray.Dataset.temporal method), 175
average() (xcdat.spatial.SpatialAccessor method), 138
average() (xcdat.temporal.TemporalAccessor method), 145

B

BoundsAccessor (class in xcdat.bounds), 131

C

center_times() (in module xcdat), 122
climatology() (xarray.Dataset.temporal method), 178
climatology() (xcdat.temporal.TemporalAccessor method), 148
compare_datasets() (in module xcdat), 125
create_axis() (in module xcdat), 126
create_gaussian_grid() (in module xcdat), 127
create_global_mean_grid() (in module xcdat), 127
create_grid() (in module xcdat), 128
create_uniform_grid() (in module xcdat), 129
create_zonal_grid() (in module xcdat), 129

D

decode_time() (in module xcdat), 122
departures() (xarray.Dataset.temporal method), 181
departures() (xcdat.temporal.TemporalAccessor method), 150

G

get_bounds() (xarray.Dataset.bounds method), 173
get_bounds() (xcdat.bounds.BoundsAccessor method), 133
get_dim_coords() (in module xcdat), 125
get_dim_keys() (in module xcdat), 126
get_weights() (xcdat.spatial.SpatialAccessor method), 140
grid (xarray.Dataset.regridder attribute), 170
grid (xcdat.regridder.accessor.RegridderAccessor property), 160
group_average() (xarray.Dataset.temporal method), 176
group_average() (xcdat.temporal.TemporalAccessor method), 146

H

horizontal() (xarray.Dataset.regridder method), 183
horizontal() (xcdat.regridder.accessor.RegridderAccessor method), 161
horizontal() (xcdat.regridder.regrid2.Regrid2Regridder method), 164
horizontal() (xcdat.regridder.xesmf.XESMFRegridder method), 167
horizontal() (xcdat.regridder.xgcm.XGCMRegridder method), 169
horizontal_regrid2() (xcdat.regridder.accessor.RegridderAccessor method), 161
horizontal_xesmf() (xcdat.regridder.accessor.RegridderAccessor method), 160

K

keys (xarray.Dataset.bounds attribute), 170
keys (xcdat.bounds.BoundsAccessor property), 132

M

map (xarray.Dataset.bounds attribute), 169
map (xcdat.bounds.BoundsAccessor property), 132

O

open_dataset() (in module xcdat), 119
open_mfdataset() (in module xcdat), 120

R

Regrid2Regridder (class in xcdat.regridder.regrid2), 163
RegridderAccessor (class in xcdat.regridder.accessor), 159

S

SpatialAccessor (class in xcdat.spatial), 138
swap_lon_axis() (in module xcdat), 124

T

TemporalAccessor (class in xcdat.temporal), 144

V

vertical() (xarray.Dataset.regridder method), 184
vertical() (xcdat.regridder.accessor.RegridderAccessor method), 162
vertical() (xcdat.regridder.regrid2.Regrid2Regridder method), 164
vertical() (xcdat.regridder.xesmf.XESMFRegridder method), 167
vertical() (xcdat.regridder.xgcm.XGCMRegridder method), 169

X

XESMFRegridder (class in xcdat.regridder.xesmf), 165
XGCMRegridder (class in xcdat.regridder.xgcm), 167
