mcs308_data_format_lecture

The document outlines various data formats used in climate science, including GRIB, netCDF, HDF, and GeoTIFF, which are all machine-independent and self-describing. It details the characteristics and versions of these formats, emphasizing their metadata capabilities and applications in storing and manipulating climate data. Additionally, it introduces the Climate Data Operator (CDO) for processing and analyzing climate data, providing examples of basic commands and operations.

Uploaded by

jadonamite

DATA FORMATS IN CLIMATE SCIENCE
• Data formats commonly encountered in climate research fall into three generic
categories: GRIB, netCDF and HDF.
• All of these formats are machine-independent and self-describing. Self-describing
files can be examined and read by the appropriate software without the user
knowing the file's structural details.
• Metadata describing the file's contents are always included.
• Typical metadata include textual information about each variable's contents
and units (e.g., "specific humidity" and "g/kg") and numerical information describing the
coordinates (e.g., time, level, latitude, longitude) that apply to the variables in the
file.

DIFFERENT FILES AND VARIATIONS
• GRIB1: GRIdded Binary (Edition 1), World Meteorological Organization
• GRIB2: GRIdded Binary (Edition 2), World Meteorological Organization
• netCDF3: Network Common Data Form, (Version 3.x), Unidata (UCAR/NCAR)
• netCDF4: Network Common Data Form, (Version 4.x), Unidata (UCAR/NCAR)
• HDF4: Hierarchical Data Format, (Version 4.x), NCSA/NASA
• HDF4-EOS2: HDF4-Earth Observing System, (Version 2; georeferenced data)
• HDF5: Hierarchical Data Format, (Version 5.x), NCSA/NASA
• HDF5-EOS5: HDF5-Earth Observing System, (Version 5; georeferenced data)
• GeoTIFF: Georeferenced raster imagery
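One practical consequence of these formats being self-describing is that a file announces what it is in its first few bytes. The sketch below is illustrative only: the function names are hypothetical, and it assumes the signature sits at the very start of the file, which is the usual case but not guaranteed (a GRIB message may be preceded by other bytes, and an HDF5 signature may sit at a later offset). It identifies the container only, not conventions layered on top such as HDF-EOS or GeoTIFF tags.

```python
def sniff_format(header: bytes) -> str:
    """Guess a container format from the first 8 bytes of a file."""
    if header[:4] == b"GRIB":
        return "GRIB"
    if len(header) >= 4 and header[:3] == b"CDF":
        # The byte after "CDF" distinguishes the classic netCDF variants.
        variants = {1: "netCDF classic", 2: "netCDF 64-bit offset", 5: "netCDF CDF-5"}
        return variants.get(header[3], "netCDF (unknown variant)")
    if header[:8] == b"\x89HDF\r\n\x1a\n":
        # netCDF-4 files are stored as HDF5, so they share this signature.
        return "HDF5 (also used by netCDF-4)"
    if header[:4] == b"\x0e\x03\x13\x01":
        return "HDF4"
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        # A plain TIFF signature; GeoTIFF adds georeferencing tags inside.
        return "TIFF (possibly GeoTIFF)"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of `path` and guess its format."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```

For example, calling sniff_file("myfile.nc") on a netCDF-4 file would be expected to report the HDF5 signature, since netCDF-4 is built on HDF5.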
GRIB - GRIDDED BINARY
• GRIB is a file format for the storage and transport of gridded meteorological data, such as
Numerical Weather Prediction model output. It is designed to be self-describing, compact and
portable across computer architectures. The GRIB standard was designed and is maintained by the
World Meteorological Organization.
• Over the years, the WMO has issued three editions of the GRIB standard:
• GRIB Edition 0: now obsolete, unsupported, and rarely used.
• GRIB 1: no longer the most current WMO GRIB edition; the format of Edition 1 has been frozen
against future enhancements. However, due to its use in the World Area Forecast System of the ICAO,
it is still recognized by the WMO. In the medium term, the CMC will no longer produce data in this
format.
• GRIB 2 (GRIB2): a significant modernization and broadening of the GRIB standard. It is being
phased in by the ECMWF and some national Numerical Weather Prediction institutions, notably in
the US and Europe. Edition 2 is not backward-compatible with Edition 1.
• A GRIB file contains one or more data records, arranged as a
sequential bit stream. Each record begins with a header, followed
by packed binary data.
• The header contains information about:
• the qualitative nature of the data (field, level, date of production, forecast
valid time, etc),
• the header itself (meta-information on header length, header byte usage,
presence of optional sub-headers),
• the method and parameters to be used to decode the packed data,
• the layout and geographical characteristics of the grid the data is to be
plotted on.
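To make the idea of a header that describes itself concrete, here is a minimal sketch that decodes only the opening indicator section (Section 0) of a GRIB message. It assumes the standard WMO layouts: 8 octets in Edition 1 (the "GRIB" marker, a 3-octet message length, the edition number) and 16 octets in Edition 2 (the "GRIB" marker, 2 reserved octets, discipline, edition number, an 8-octet message length). The function name is hypothetical, and real decoding should be left to a GRIB library.

```python
import struct


def parse_indicator(section0: bytes) -> dict:
    """Decode the GRIB indicator section from the first bytes of a message.

    Pass at least 8 bytes for Edition 1 and at least 16 for Edition 2.
    """
    if len(section0) < 8 or section0[:4] != b"GRIB":
        raise ValueError("not a GRIB message")
    edition = section0[7]  # octet 8 holds the edition in both layouts
    if edition == 1:
        # Octets 5-7: total message length, 3-byte big-endian integer.
        length = int.from_bytes(section0[4:7], "big")
        return {"edition": 1, "length": length}
    if edition == 2:
        if len(section0) < 16:
            raise ValueError("GRIB2 indicator section needs 16 bytes")
        discipline = section0[6]  # octet 7
        # Octets 9-16: total message length, 8-byte big-endian integer.
        (length,) = struct.unpack(">Q", section0[8:16])
        return {"edition": 2, "discipline": discipline, "length": length}
    raise ValueError(f"unsupported GRIB edition {edition}")
```

Because the edition number sits in the same octet in both layouts, a reader can decide how to interpret the rest of the record before decoding anything else.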
HDF - HIERARCHICAL DATA FORMAT
• Hierarchical Data Format (HDF) is a data file format designed by the National Center
for Supercomputing Applications (NCSA) to assist users in the storage and manipulation
of scientific data across diverse operating systems and machines.
• There are two distinct varieties of HDF, known as HDF (version 4 and earlier) and the
newer HDF5.
• HDF files are also self-describing. For each data object in an HDF file, there are
predefined tags that identify such information as the type of data, the amount of data,
its dimensions, and its location in the file.
• The self-describing capability of HDF files has important implications for
processing scientific data. It makes it possible to fully understand the
structure and contents of a file just from the information stored in the file
itself.
• A program that has been written to interpret certain tag types can scan a
file containing those tag types and process the corresponding data.
• Self-description also means that many types of data can be bundled in an
HDF file. For example, it is possible to accommodate symbolic, numerical,
and graphical data in one HDF file.
GEOTIFF
• GeoTIFF is a public domain metadata standard that enables
georeferencing information to be embedded within an image file.
• The GeoTIFF format embeds geospatial metadata into image files
such as aerial photography, satellite imagery, and digitized maps
so that they can be used in GIS applications.
• A GeoTIFF file contains geographic metadata that
describes the actual location in space that each pixel in an image
represents.
• In creating a GeoTIFF file, spatial information is included in the .tif file as
embedded tags, which can include raster image metadata such as:
• horizontal and vertical datums
• spatial extent, i.e. the area that the dataset covers
• the coordinate reference system (CRS) used to store the data
• spatial resolution, measured in the number of independent pixel values per unit
length
• the number of layers in the .tif file
• ellipsoids and geoids - estimated models of the Earth’s shape
• mathematical rules for map projection to transform data from a three-dimensional
space into a two-dimensional display
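The link between a pixel and its location on the ground can be illustrated with a little arithmetic. The sketch below is a simplification under stated assumptions: a north-up image with no rotation, described only by the coordinates of its upper-left corner and a pixel size. The function names and the 0.5-degree global grid in the usage note are hypothetical, not taken from any GeoTIFF library.

```python
def pixel_to_map(col, row, origin_x, origin_y, pixel_width, pixel_height):
    """Return map coordinates of the upper-left corner of pixel (col, row).

    Assumes a north-up image: x grows with the column index and y shrinks
    with the row index, so pixel_height is given as a positive size.
    """
    x = origin_x + col * pixel_width
    y = origin_y - row * pixel_height
    return x, y


def extent(n_cols, n_rows, origin_x, origin_y, pixel_width, pixel_height):
    """Return (min_x, min_y, max_x, max_y) covered by the whole raster."""
    # The far corner is the upper-left corner of the pixel one past the end.
    max_x, min_y = pixel_to_map(
        n_cols, n_rows, origin_x, origin_y, pixel_width, pixel_height
    )
    return origin_x, min_y, max_x, origin_y
```

For a hypothetical global grid of 720 x 360 pixels at 0.5 degrees with its origin at (-180, 90), extent(720, 360, -180.0, 90.0, 0.5, 0.5) gives (-180.0, -90.0, 180.0, 90.0), and pixel (360, 180) maps to (0.0, 0.0).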
NETCDF - NETWORK COMMON DATA FORM
• NetCDF (Network Common Data Form) is a set of software libraries and machine-independent
data formats that support the creation, access, and sharing of array-oriented scientific data. It
is also a community standard for sharing scientific data.
• NetCDF data is:
• Self-Describing. A netCDF file includes information about the data it contains.
• Portable. A netCDF file can be accessed by computers with different ways of storing integers, characters,
and floating-point numbers.
• Scalable. A small subset of a large dataset may be accessed efficiently.
• Appendable. Data may be appended to a properly structured netCDF file without copying the dataset or
redefining its structure.
• Shareable. One writer and multiple readers may simultaneously access the same netCDF file.
• Archivable. Access to all earlier forms of netCDF data will be supported by current and future versions of
the software.
DATA PROCESSING AND MANIPULATIONS
CLIMATE DATA OPERATOR (CDO)
• CDO is a collection of command-line operators to manipulate and
analyze climate and Numerical Weather Prediction (NWP) model
outputs.

• Supported data formats: GRIB1, GRIB2 and NetCDF3, NetCDF4

• More detailed information: https://code.zmaw.de/projects/cdo


BASIC EXAMPLES
• To print short information about a file:
• cdo sinfo myfile.nc
• To select a single year (e.g., 1950):
• cdo selyear,1950 ifile ofile
• To select a period (e.g., 1961-1990):
• cdo selyear,1961/1990 ifile ofile
• To subtract two datasets:
• cdo sub ifile1 ifile2 ofile
• To convert from GRIB to netCDF:
• cdo -f nc copy file.grb file.nc
• To add two datasets:
• cdo add ifile1 ifile2 ofile
PIPING IN CDO
• The use of pipes reduces unnecessary disk usage, because no intermediate files are written:
• e.g., calculation of the 1961-1990 October-November-December (OND) mean
• Step by step:
• bash$ cdo selyear,1961/1990 ifile.nc ofile1.nc
• bash$ cdo selmon,10,11,12 ofile1.nc ofile2.nc
• bash$ cdo timmean ofile2.nc ofile3.nc
• Piping:
• bash$ cdo timmean -selmon,10,11,12 -selyear,1961/1990 ifile.nc ofile3.nc
• Syntax: cdo operation3 -operation2 -operation1 ifile.nc ofile.nc (operators are applied from right to left)
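The right-to-left ordering is easy to get wrong when a chain is written by hand. As a minimal sketch, the hypothetical helper below takes the operators in the order they should be applied and assembles the chained command line; it only builds the argument list and does not call CDO itself.

```python
def cdo_chain(operators, infile, outfile):
    """Build the argument list for a chained CDO call.

    `operators` lists the steps in the order they are applied to the data.
    On the command line the last step comes first, and every operator
    after it is prefixed with a hyphen.
    """
    if not operators:
        raise ValueError("at least one operator is required")
    reversed_ops = list(reversed(operators))
    args = ["cdo", reversed_ops[0]]
    args += ["-" + op for op in reversed_ops[1:]]
    args += [infile, outfile]
    return args


# The OND mean for 1961-1990: select years, then months, then average.
steps = ["selyear,1961/1990", "selmon,10,11,12", "timmean"]
cmd = cdo_chain(steps, "ifile.nc", "ofile3.nc")
print(" ".join(cmd))
# prints: cdo timmean -selmon,10,11,12 -selyear,1961/1990 ifile.nc ofile3.nc
```

On a machine where CDO is installed, the list could then be passed to subprocess.run(cmd, check=True) from the Python standard library to execute it.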


CLASS TASK

• Perform operations in CDO: arithmetic, conversion, gridding

• Calculate extreme events in CDO
