Open Science Data Catalogue
KEY WORDS: Open Science, STAC, FAIR, Cloud Platform, Earth Observation, Collaboration.
ABSTRACT:
Open Science is a catalyst for innovation. Across the Earth Observation value chain, from R&D to prototyping new products and
development of commercial applications, openness can play an important role by promoting long-term sustainable, community-contributed
science and technology. The FAIR principles provide essential support for implementing Open Science by offering
guidelines for how researchers can adapt their EO and Earth Science practice to ensure that their work (taking place increasingly in the
cloud) and results can be discovered, accessed, used, and reproduced by others. The Open Science Data Catalogue (OSC)
(https://opensciencedata.esa.int) is an ESA Open Science activity aiming to enhance the discoverability and use of the various scientific
and value-added results (i.e. data, code, documentation) achieved in Earth System Science research activities funded by ESA EO. The
OSC provides open access for the scientific community to geoscience products (based on EO data from ESA and non-ESA missions
and other geospatial information and models) across the whole spectrum of Earth Science domains. The OSC adheres to FAIR
principles and promotes reproducibility of scientific studies. The OSC makes use of various Open-Source geospatial technologies such
as pycsw, PySTAC, and OpenLayers and tries to contribute back to these projects in terms of software and standardisation. This paper
reviews the OSC architecture and technology stack, and illustrates how this tool can be used to discover and publish Earth System
Science products from ESA activities. It also looks at future evolutions of the product and how it contributes to ESA’s EO Open Science
and Innovation goals.
Figure 2. The OSC Metrics Page, providing an overview of the available geospatial products, and their temporal and spatial coverage.

The Open Science Data Catalogue has the capability to hold product metadata for assets that are stored externally or internally if required. The Open Science Data Catalogue is also developing the capability to discover processes that can be deployed to and executed on remote EOEPCA platforms, using assets discovered by the Open Science Data Catalogue.

The Open Science Data Catalogue (OSC) is based upon the EO Exploitation Platform Common Architecture (EOEPCA) and shares its basic Open Source components, but extends it with additional functionalities, including:

1. The Static Catalogue – which is a hosted STAC Catalogue, composed of static Catalogues, Collections, and Items that represent the Themes, Variables, Projects, and Products.
2. The Open Science Data Catalogue Frontend – which is a Vue.js based client application that allows the efficient browsing of the Open Science Data Catalogue.
3. The Backend API – which allows users to make submissions to create, update, and delete Themes, Variables, Projects, Products and EO-Missions. These submissions are then handled as GitHub Pull Requests, where they can be further reviewed, discussed, and finally accepted or denied.
4. The process execution – Using the Application Deployment and Execution Service (ADES) building block of EOEPCA, it is possible to run the scientific workflows on platforms

• Resource Management:
  ◦ Resource Catalogue - provides a standards-based EO metadata catalogue that includes support for OGC CSW / API Records, STAC and OpenSearch.
  ◦ Harvester and Registrar - The Data Access component provides standards-based services for access to platform hosted data, including OGC WMS/WMTS for visualisation and OGC WCS for data retrieval. This component also includes Harvester and Registrar services to discover/watch the existing data holding of the infrastructure data layer and populate/maintain the data access and resource catalogue services accordingly.
• User Management:
  ◦ Login-Service
  ◦ Policy Decision Point
  ◦ User Profile

The components supplementing the EOEPCA components are as follows:
• Frontend
• Metadata Proxy
• Backend API
• Metadata Repository
• Static Catalogue

Figure 4 shows the interaction of the reused and supplementary components of the Open Science Data Catalogue.
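
The submission workflow of the Backend API described above, in which each contribution becomes a GitHub Pull Request that is then reviewed and accepted or denied, can be illustrated with a minimal sketch. It only builds the JSON body expected by GitHub's "create a pull request" endpoint (`POST /repos/{owner}/{repo}/pulls`) and sends nothing. The `Submission` class, the branch naming scheme, and the field names are hypothetical and are not taken from the OSC code base.

```python
import json
from dataclasses import dataclass, field

# Hypothetical names throughout: the entity kinds mirror the text, but the
# submission fields and branch scheme are illustrative assumptions only.
ENTITY_KINDS = {"themes", "variables", "projects", "products", "eo-missions"}
ACTIONS = {"create", "update", "delete"}


@dataclass
class Submission:
    kind: str        # which part of the catalogue is affected
    action: str      # create, update, or delete
    identifier: str  # identifier of the affected entry
    metadata: dict = field(default_factory=dict)


def to_pull_request(sub: Submission, base: str = "main") -> dict:
    """Build the JSON body for GitHub's 'create a pull request' endpoint.

    Nothing is sent over the network; the caller decides how to submit it.
    """
    if sub.kind not in ENTITY_KINDS:
        raise ValueError(f"unknown entity kind: {sub.kind}")
    if sub.action not in ACTIONS:
        raise ValueError(f"unknown action: {sub.action}")
    return {
        "title": f"{sub.action.capitalize()} {sub.kind}/{sub.identifier}",
        # Assumed branch scheme; the branch must already hold the change.
        "head": f"submission/{sub.kind}/{sub.identifier}",
        "base": base,
        "body": "Proposed metadata:\n\n"
        + json.dumps(sub.metadata, indent=2, sort_keys=True),
    }


if __name__ == "__main__":
    example = Submission("products", "create", "example-product",
                         {"title": "Example product"})
    print(json.dumps(to_pull_request(example), indent=2))
```

Committing the changed metadata file to the head branch is a separate step that is not shown here. Keeping the payload construction free of network calls makes the review-oriented part of the workflow easy to test in isolation.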
2.1 Components

2.1.1 Frontend: The OSC frontend is the main user interface component that allows scientists and other parties to interact with the system and consume the contents. It allows users to search for scientific products, as well as to contribute to the contents of the catalogue by ingesting new products or submitting requests for updates of already existing content.

The Frontend is based upon the Open Source application STAC Browser (STAC Browser, 2023a) extended with functionality from the Vue framework. The main elements of the frontend are presented in the following paragraphs. On the landing page users are presented with the current OSC themes and can access the main functionalities of the OSC (Search, Catalogue, Metrics, API).

The OSC Catalogue page allows users to discover the available Collections and Items. The STAC specification describing all the components and their properties is available at (STAC, 2023f).

Since all the scientific Products in the OSC are outcomes of specific research projects, the OSC provides the possibility to discover the specific activities to ensure traceability and facilitate scientific exchanges between the data owners (i.e. the data producers and distributors) and the community accessing and using the data.

2.1.2 Metadata Proxy: This is a small reverse proxy to enable browser access to certain web services that do not provide the necessary CORS headers.

2.1.3 Backend API: This REST API service allows contributions to the contents of the catalogue to be submitted from either the frontend or any other compliant component. It translates all contributions to GitHub Pull Requests. The Backend API is built using the FastAPI software framework.

2.1.4 Metadata Repository: The metadata of the Open Science Data Catalogue items are stored in a git repository. This allows the convenient management of the contents, with the possibility to add changes in the form of branches/Pull Requests that can be reviewed by metadata administrators and finally merged into the main branch of the repository. The metadata repository is hosted and managed by GitHub.

2.1.5 Static Catalogue: This is an export of the contents of the source of truth of the Open Science Data Catalogue. Upon any change to the metadata repository, the static catalogue is rebuilt and exported for the consumption of the frontend or any other compliant component. The static catalogue itself is a structure of STAC objects in various inter-linked JSON files.

2.1.6 Resource Catalogue: This component loads all records from the static catalogue and provides convenient ways to search across the contents using various metadata filters. It is considered to be a cached version of the static catalogue and must be regularly synchronized. The resource catalogue is realized by the Open Source software pycsw. Internally, the records are stored in a PostgreSQL database with PostGIS extensions enabled.

2.1.7 Harvester: The harvester runs at regular intervals and reads the contents of the static catalogue. The harvested data items are then pushed forward for registration.

2.1.8 Registrar: This component is responsible for accepting all harvested items and pushing them to the resource catalogue, either adding, altering, or removing elements as necessary.

2.2 Data Model

2.2.1 STAC Catalogue: The contents of the metadata repository are kept as a static STAC Catalogue, a collection of inter-linked JSON files and supplementary metadata. It is a graph structure, with a single root STAC Catalogue as an entry point which has the following branches:
• Themes
• Variables
• EO-Missions
• Projects
• Products

Each branch in turn is a listing of a number of elements of that type, which are in turn represented as a STAC Catalogue or STAC Collection. These objects use the OSC STAC extension to reference elements of other groups they are associated with. For example, a Product has an “osc:variables” field that lists the measurement variables this product is comprised of.

To allow easier management of the catalogue, the linking is done simply by using identifier fields. Actual STAC links are added when the catalogue is exported.

The Static Catalogue is built on export, that is, whenever a change is merged into the main branch of the metadata repository. Here, the contents of the metadata repository are taken, and STAC link objects are introduced to link the related files. Additionally, search keywords are added to allow later retrieval.

The STAC Catalogue makes use of various STAC extensions to best describe the contents, most notably the scientific (sci), subjects, projection (proj), and datacube extensions.

2.2.2 STAC API: Once the records are harvested into the resource management layer, the STAC API of the Resource Catalogue allows efficient searching using text, geospatial, temporal and other metadata attributes.
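
A query combining the text, geospatial, and temporal criteria mentioned for the STAC API can be sketched as below. The sketch builds a `POST /search` request body with the `bbox`, `datetime`, and `limit` fields defined by the STAC API item search specification. The endpoint URL is a placeholder, and the free-text `q` parameter is assumed to be supported by the deployed service; neither is confirmed by this paper.

```python
import json
from typing import Optional, Sequence, Tuple
from urllib.request import Request

# Placeholder endpoint: the real Resource Catalogue URL is not stated here.
SEARCH_URL = "https://catalogue.example.org/stac/search"


def build_search(
    text: Optional[str] = None,
    bbox: Optional[Sequence[float]] = None,
    datetime_range: Optional[Tuple[str, str]] = None,
    limit: int = 10,
) -> Request:
    """Build (but do not send) a STAC API item search request."""
    body: dict = {"limit": limit}
    if text:
        # Free-text search; assumed to be enabled on the target service.
        body["q"] = text
    if bbox is not None:
        if len(bbox) != 4:
            raise ValueError("bbox must be [west, south, east, north]")
        body["bbox"] = list(bbox)
    if datetime_range is not None:
        start, end = datetime_range
        # STAC expresses a closed interval as 'start/end' in RFC 3339 form.
        body["datetime"] = f"{start}/{end}"
    return Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_search(
        text="sea ice",
        bbox=(-10.0, 50.0, 5.0, 60.0),
        datetime_range=("2020-01-01T00:00:00Z", "2020-12-31T23:59:59Z"),
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Against a real endpoint, the request could be sent with `urllib.request.urlopen(req)`, and the response would be a GeoJSON FeatureCollection of matching STAC Items.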
Figure 6. An example OSC Project Page. The Project metadata and the associated Products (i.e., Project outcomes) are visible on
this page.
Figure 7. An OSC Product Page. A Product is a STAC item, while the various datasets included in the Product are Assets of the
respective Item.
Figure 8. The OSC Search Page. In addition to searching by free text or keywords, users can search by Theme, Variable, Project, EO
Mission and Region.