ODC_Cheatsheet
Loading and analysing Earth Observation data with the Open Data Cube

Datacube docs: https://datacube-core.readthedocs.io/en/latest/
odc-geo docs: https://odc-geo.readthedocs.io/en/latest/

Getting started

Import Python packages and connect to database:

    import datacube  # used for querying and loading data
    import odc.geo.xr  # enables additional geospatial tools

    dc = datacube.Datacube()

List available products in the datacube:

    dc.list_products()

List the measurements (e.g. bands or variables) available for each datacube product:

    dc.list_measurements()

Loading data

Load a specific product and measurements:

    ds = dc.load(
        product="ga_ls8c_ard_3",
        measurements=["nbart_red", "nbart_blue", "fmask"], ...)

Load data for a specific spatial extent:

Degrees lat/lon coordinates (WGS84/EPSG:4326):

    dc.load(...
        y=(-32.2, -32.5),
        x=(142.2, 142.5))

Custom coordinate reference system (e.g. Australian Albers):

    dc.load(...
        x=(948280, 981840),
        y=(-3546480, -3584720),
        crs="EPSG:3577")

Loading data by time:

From a specific date:

    dc.load(...
        time="2020-01-01")

From an entire year:

    dc.load(...
        time="2020")

All data from 2020 to 2022 (inclusive of start and end):

    dc.load(...
        time=("2020", "2022"))

All data from 2020 onward (inclusive of start):

    dc.load(...
        time=("2020", None))

Group sequential images captured along each satellite path into daily timesteps:
(only required for products with daily acquisitions, e.g. Landsat or Sentinel-2;
not required for summary products like annual or monthly datasets)

    dc.load(..., group_by="solar_day")

Load and reproject data into a custom coordinate reference system and resolution grid, e.g. UTM Zone 55 S, 200 metre resolution:
(for most CRSs, the first value is negative by convention)

    dc.load(...
        output_crs="EPSG:32755",
        resolution=(-200, 200))  # -y, x

Apply custom resampling when reprojecting (default is “nearest”):

Use “average” resampling for all bands:

    dc.load(...
        resampling="average")

Use “nearest” resampling for the “fmask” band, “average” for all others:

    dc.load(...
        resampling={
            "fmask": "nearest",
            "*": "average"})

Lazily load data using Dask:
(used for parallelization and managing memory; chunk sizes will depend on data)

    dc.load(..., dask_chunks={"y": 2048, "x": 2048})

Preparing data for analysis

Inspect nodata attributes and cloud masking band flags:

    ds.nbart_red.odc.nodata
    ds.fmask.attrs["flags_definition"]

Setting nodata pixels (e.g. -999) to NaN:

    ds_masked = datacube.utils.masking.mask_invalid_data(ds)

Convert a cloud masking band into a boolean mask and apply to a dataset (setting cloud pixels to NaN):

    cloud_mask = datacube.utils.masking.make_mask(
        ds.fmask, fmask="cloud")
    ds_masked = ds.where(~cloud_mask)

Basic analysis with xarray

Selecting a subset of data:

Use “.isel()” for “index selection”, e.g. select first 5 values along the data’s y and x dimensions:

    ds.isel(
        y=slice(0, 5),
        x=slice(0, 5))

Use “.sel()” for “coordinate selection”, e.g. select all pixels between specific y and x coordinates:

    ds.sel(
        y=slice(-3867375, -3867350),
        x=slice(1516200, 1541300))

Aggregating data (e.g. min, max, mean, median, std):

Calculate means for every pixel across time, producing a 2D image:

    ds.mean(dim="time")

Calculate means across all pixels in each timestep, producing a 1D timeseries:

    ds.mean(dim=["y", "x"])

Plotting and exporting data

Plot on an interactive map for rapid data exploration:

    ds.isel(time=0).odc.explore()  # also works for single bands

Plotting single bands as a static plot:

Plot a single timestep:

    ds.fmask.isel(time=0).plot()

Plot multiple timesteps:

    ds.fmask.plot(
        col="time", col_wrap=4)

Plotting multiple bands as an RGB image:
(will auto-guess red, green and blue bands if they exist in the data)

    ds.isel(time=0).odc.to_rgba().plot.imshow()

Export data as a cloud optimised GeoTIFF raster file:

    ds.isel(time=0).fmask.odc.write_cog("output_filename.tif")

GeoBox and geospatial tools

View a dataset’s “GeoBox” defining its spatial pixel grid:

    ds.odc.geobox
    ds.odc.geobox.crs  # coordinate reference system (CRS)
    ds.odc.geobox.resolution  # spatial pixel resolution
    ds.odc.geobox.boundingbox  # spatial extent of data

Reproject a loaded dataset:

Reproject to a different CRS:

    ds.odc.reproject(
        how="EPSG:32755")

Reproject to another dataset’s GeoBox:

    ds_wgs84  # data in another CRS
    ds.odc.reproject(
        how=ds_wgs84.odc.geobox)

Mask or crop a dataset to the extent of a polygon:

    from odc.geo.geom import Geometry
    geopolygon = Geometry(<shapely_polygon>, crs="EPSG:4326")

    # Mask data to set pixels outside polygon to NaN
    ds_masked = ds.odc.mask(poly=geopolygon)

    # Crop data to extent of polygon (and optionally mask)
    ds_cropped = ds.odc.crop(poly=geopolygon, apply_mask=True)
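
A note on the resolution=(-200, 200) argument (ordered -y, x): the sketch below illustrates why the y value is usually negative. Raster rows are stored from the top (north) edge downward, so y coordinates decrease as the row index increases. This is a standard-library illustration only, not how datacube or odc-geo build their grids; the helper pixel_centres and the tile extents are hypothetical.

```python
def pixel_centres(start, stop, step):
    """Return pixel-centre coordinates walking from edge `start` toward
    edge `stop` in increments of `step` (hypothetical helper)."""
    n = round((stop - start) / step)  # number of pixels along this axis
    return [start + (i + 0.5) * step for i in range(n)]


# Hypothetical 1 km x 1 km tile in a projected CRS (units: metres)
left, right = 500000, 501000
top, bottom = 6000000, 5999000

# Same ordering as the cheatsheet: (-y, x)
res_y, res_x = -200, 200

ys = pixel_centres(top, bottom, res_y)  # starts at the top edge, descends
xs = pixel_centres(left, right, res_x)  # starts at the left edge, ascends

print(ys)
print(xs)
```

With a negative y step, the first row of the grid sits at the northern edge and later rows move south, while x increases from west to east.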
Designed: Robbi Bishop-Taylor (@SatelliteSci), Geoscience Australia. Modified: Feb 2024 (datacube==1.8.17, odc-geo==0.4.2)
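
As a supplementary sketch of the time queries listed under "Loading data by time", the code below expands partial date strings such as "2020" into explicit start and end timestamps that are inclusive at both ends. It is an illustration of the documented behaviour using only the standard library, not datacube's own date parser; the functions expand and query_range are hypothetical names.

```python
import calendar
from datetime import datetime, timedelta


def expand(date_str):
    """Expand 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into (first, last) instants
    covering that whole period (hypothetical helper)."""
    parts = [int(p) for p in date_str.split("-")]
    year = parts[0]
    if len(parts) == 1:  # whole year
        first, last_day = datetime(year, 1, 1), datetime(year, 12, 31)
    elif len(parts) == 2:  # whole month
        month = parts[1]
        first = datetime(year, month, 1)
        last_day = datetime(year, month, calendar.monthrange(year, month)[1])
    else:  # single day
        first = last_day = datetime(year, parts[1], parts[2])
    # End of the final day, one microsecond before midnight
    last = last_day + timedelta(days=1) - timedelta(microseconds=1)
    return first, last


def query_range(time):
    """Mimic the time= forms in the cheatsheet: a single string,
    a (start, end) tuple, or an open-ended (start, None) tuple."""
    if isinstance(time, str):
        return expand(time)
    start, end = time
    lower = expand(start)[0]
    upper = expand(end)[1] if end is not None else None
    return lower, upper


print(query_range("2020-01-01"))
print(query_range("2020"))
print(query_range(("2020", "2022")))
print(query_range(("2020", None)))
```

An upper bound of None here stands for "no end date", matching the open-ended time=("2020", None) query.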