Open Data Cube
A High-Level Overview
Architecture and Ecosystem | 2
Summary
The Open Data Cube (ODC) is an open source solution for accessing, managing, and analyzing large quantities of Geographic
Information System (GIS) data, namely Earth observation (EO) data. It presents a common analytical framework composed
of a series of data structures and tools which facilitate the organization and analysis of large gridded data collections. The
Open Data Cube was developed for the analysis of temporally rich Earth observation data; however, the flexibility of the
platform also allows other gridded data collections to be included and analyzed. Such data may include elevation models,
geophysical grids, interpolated surfaces and model outputs. A key characteristic of the Open Data Cube is that every unique
observation is kept, which contrasts with many other methods used to handle large gridded data collections. Some of the
major advantages of ODC are the following:
• Flexible framework
• User maintains control and ownership over their data
• Paradigm shift from scene-based analysis to pixel-based analysis
• Lower barrier to entry for remote sensing data analysis
In this document, we briefly describe and illustrate the high-level architecture and ecosystem of the ODC framework in order
to provide a better understanding to those who are new to ODC. This document only covers major components of the ODC
and the relationships between them.
CONTENTS
Summary
Load Data
Acronyms
Several international space agencies provide data and make provisions to supply this data in an Analysis Ready Data (ARD)
format for immediate application. Figure 1 illustrates a diverse set of data being managed by an ODC core system. The ODC
core system is then used as a simplified basis on which end users conduct analysis using ODC compatible analysis tools.
1 https://github.com/opendatacube/datacube-core
• Command Line Tools: Tools used by programmers/developers to interface with the ODC.
• Open Data Cube Explorer: A visual and interactive web application that lets users explore their inventory of available
data.
• Open Data Cube Stats: An optimized means of defining and executing advanced analysis on an ODC system. This tool
is oriented towards scientists.
• Web User Interface (UI): A web application that allows developers to interactively showcase and visualize the output
of algorithms.
• Jupyter Notebooks: Research documents centered around techniques in EO sciences. A notebook contains
executable code detailing examples of how the data cube is used in a research setting, and is therefore invaluable
reference material for new users.
• Open Geospatial Consortium (OGC) Web Services: Adapters that can connect non-ODC applications to the ODC.
1. As shown in Figure 3, the first step in this process is to describe the source of the imagery. We include basic details
about which sensor the data comes from, what format to expect the data in, as well as its measurements, e.g. bands.
This is done by drafting a document called a product definition for each data type. This product definition is then
added to the system. Adding a product definition enables the system to accept that product.
2. The second step in the process is about extracting details from an individual satellite image. This is called the data
preparation step. Scripts are available to extract information or metadata from many types of images.
3. The data extracted in step 2 typically includes date and time of acquisition, spatial bounds, etc. as metadata. In the
third step, called indexing, metadata (documents) are indexed into the ODC’s database. Most importantly, the
process stores the location of the data within a local system.
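The three steps above can be sketched with a toy in-memory model. This is only an illustration of the flow (define a product, prepare a metadata document, index it), not the datacube-core implementation: the ToyIndex class, the prepare_metadata helper, the product name, the band names and the file path are all hypothetical, and real ODC product definitions and dataset documents carry more fields than shown here.

```python
from datetime import datetime, timezone

# Step 1: a product definition describes one kind of data.
# Every name and value below is a hypothetical placeholder.
PRODUCT_DEFINITION = {
    "name": "example_surface_reflectance",
    "description": "Hypothetical product used only for illustration",
    "measurements": [
        {"name": "red", "dtype": "int16", "nodata": -9999, "units": "1"},
        {"name": "nir", "dtype": "int16", "nodata": -9999, "units": "1"},
    ],
}


class ToyIndex:
    """In-memory stand-in for the ODC metadata database (assumption)."""

    def __init__(self):
        self.products = {}
        self.datasets = []

    def add_product(self, definition):
        # Registering a product lets the index accept datasets of that product.
        self.products[definition["name"]] = definition

    def add_dataset(self, document):
        # Step 3: indexing stores metadata and the data location, not the pixels.
        if document["product"] not in self.products:
            raise ValueError(f"unknown product: {document['product']}")
        self.datasets.append(document)


def prepare_metadata(product_name, location, acquired, bounds):
    """Step 2: build a metadata document for a single image."""
    return {
        "product": product_name,
        "location": location,
        "datetime": acquired.isoformat(),
        "bounds": bounds,
    }


index = ToyIndex()
index.add_product(PRODUCT_DEFINITION)

doc = prepare_metadata(
    "example_surface_reflectance",
    "file:///data/example/scene_001.tif",  # hypothetical path
    datetime(2020, 1, 15, 10, 30, tzinfo=timezone.utc),
    {"left": 148.0, "bottom": -36.0, "right": 149.0, "top": -35.0},
)
index.add_dataset(doc)
```

The toy index rejects a dataset whose product has not been added first, which mirrors the point made in step 1: adding a product definition is what enables the system to accept that product.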
Load Data
User-supplied query parameters are used as a lookup into the metadata database in order to determine which datasets hold
data requested by the user. Those datasets are then grouped and ordered, and the actual data is loaded from the file system.
The resulting data is organized into an Xarray Dataset with appropriate temporal-spatial dimensions and separate data
variables for each band.
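A minimal sketch of such a query using the Python API of datacube-core is shown below. It assumes a configured ODC installation with an indexed product; the product name, band names, coordinates and dates are hypothetical placeholders, and build_query and load_data are illustrative helpers rather than part of ODC. Depending on how the product is defined, additional arguments such as output_crs and resolution may be required.

```python
def build_query(product, lon_range, lat_range, time_range, bands):
    """Collect query parameters in the keyword form accepted by Datacube.load."""
    return {
        "product": product,
        "x": lon_range,          # (min, max) longitude
        "y": lat_range,          # (min, max) latitude
        "time": time_range,      # (start, end) dates
        "measurements": list(bands),
    }


def load_data(query):
    """Run the query against a configured ODC instance.

    Requires datacube-core and a populated index, so the import is kept
    local to this function.
    """
    import datacube

    dc = datacube.Datacube(app="load-data-example")
    return dc.load(**query)


query = build_query(
    product="example_surface_reflectance",  # hypothetical product name
    lon_range=(148.0, 149.0),
    lat_range=(-36.0, -35.0),
    time_range=("2020-01-01", "2020-03-31"),
    bands=["red", "nir"],                   # hypothetical band names
)

# With a working ODC installation:
# dataset = load_data(query)
# dataset["red"]  # one data variable per band, with time and spatial dimensions
```

Keeping the query as a plain dictionary makes it easy to inspect or reuse before any data is read from the file system.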
Acronyms
ARD Analysis Ready Data
EO Earth Observation
DB Database
GA Geoscience Australia