Open Science Data Catalogue


The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-1/W2-2023

ISPRS Geospatial Week 2023, 2–7 September 2023, Cairo, Egypt

OPEN SCIENCE DATA CATALOGUE

F. Schindler1, S. Pari1, S. Meissl1, G. Smith2, E. Dobrowolska3, A. Anghelea4


1EOX IT Services GmbH, Vienna, Austria, [email protected]
2Telespazio UK, Luton, UK, [email protected]
3Serco Italy S.p.A., Frascati, Italy, [email protected]
4European Space Agency, Frascati, Italy, [email protected]

KEY WORDS: Open Science, STAC, FAIR, Cloud Platform, Earth Observation, Collaboration.

ABSTRACT:

Open Science is a catalyst for innovation. Across the Earth Observation value chain, from R&D to prototyping new products and
development of commercial applications, openness can play an important role by promoting long-term sustainable, community-
contributed science and technology. The FAIR principles provide essential support to implementing Open Science, by offering
guidelines for how researchers can adapt their EO and Earth Science practice to ensure that their work (taking place increasingly in the
cloud) and results can be discovered, accessed, used, and reproduced by others. The Open Science Data Catalogue (OSC)
(https://opensciencedata.esa.int) is an ESA Open Science activity aiming to enhance the discoverability and use of the various scientific
and value-added results (i.e. data, code, documentation) achieved in Earth System Science research activities funded by ESA EO. The
OSC provides open access for the scientific community to geoscience products (based on EO data from ESA and non-ESA missions
and other geospatial information and models) across the whole spectrum of Earth Science domains. The OSC adheres to FAIR
principles and promotes reproducibility of scientific studies. The OSC makes use of various Open-Source geospatial technologies such
as pycsw, PySTAC, and OpenLayers and tries to contribute back to these projects in terms of software and standardisation. This paper
reviews the OSC architecture and technology stack, and illustrates how this tool can be used to discover and publish Earth System
Science products from ESA activities. It also looks at future evolutions of the product and how it contributes to ESA’s EO Open Science
and Innovation goals.

1. INTRODUCTION

Open Science is increasingly recognized as a catalyst for
innovation. In 2016, the EC's DG-RTD laid out a vision for European
R&D (EC, 2016) which acknowledged that “the way that science
works is fundamentally changing, and an equally important
transformation is taking place in how companies and societies
innovate. The advent of digital technologies is making science
and innovation more open, collaborative, and global”. The EC
has since expanded its views and interest in Open Science and
most recently has published a renewed Open Science policy (EU,
2020).

The concept of Open Science and Innovation is embraced by the
European Space Agency in its Agenda 2025 (ESA, 2022),
recognizing the value that such principles of innovation can bring
for the space sector in terms of optimizing development cycles,
accelerating time to market, and reducing cost. Adhering to
principles of openness in EO and Earth System Science from the
earliest stage of the value chain, i.e., the scientific research phase,
can contribute to a sustainable creation of value, potentially
resulting in more innovation.

The Open Science Data Catalogue is one of the elements
contributing to an Open Science framework and infrastructure,
with the scope to enhance the discoverability and use of products,
data and knowledge resulting from Scientific Earth Observation
exploitation studies.

The Open Science Data Catalogue is a publicly available
platform (available at https://opensciencedata.esa.int/) that
contributes to ESA’s Earth Observation (EO) Open Science
framework. It stores geoscience products, datasets and resources
developed in the frame of scientific research projects funded by
ESA EO, grouped by themes (or scientific domains) associated with
the ESA Science Clusters, through which ESA aims at
contributing to the establishment of European research areas in
close collaboration with the European Commission Directorate
General for Research and Innovation and other European and
international partners. Details about ESA’s Science Clusters are
available at https://eo4society.esa.int/communities/scientists/.

The Open Science Data Catalogue brings new functionalities and
provides discovery and open access for geospatial products and
documentation (and/or code) to a scientific community of users.
With a common dictionary and unified metadata across
heterogeneous sources, product discovery is facilitated.
Published items are also open to community contribution and
curation, with all activity tracked on a public GitHub project.

Figure 1. Open Science Data Catalogue Landing Page. The
main themes of the OSC are visible on this page.

This contribution has been peer-reviewed.


https://doi.org/10.5194/isprs-archives-XLVIII-1-W2-2023-997-2023 | © Author(s) 2023. CC BY 4.0 License. 997

Finally, it allows for a synoptic view for Earth Observation and
Earth System Science gap analysis, by providing a dashboard
view with statistics on the geophysical variables available in the
catalogue, the EO missions providing the underlying data for the
respective products, and the geographical coverage of the data.
Currently, in most cases, the actual data and its associated
documentation published on the Open Science Data Catalogue are
maintained and made accessible by the data providers, outside of
esa.int. The catalogue provides the metadata and links to the data
as it exists in those other locations. Work in progress looks
at improving the long-term availability by facilitating publication
of products in community-maintained and curated repositories.

Figure 2. The OSC Metrics Page, providing an overview of the
available geospatial products, and their temporal and spatial
coverage.

The Open Science Data Catalogue has the capability to hold
product metadata for assets that are stored externally or, if
required, internally. The Open Science Data Catalogue is also
developing the capability to discover processes that can be
deployed to and executed on remote EOEPCA platforms, using
assets discovered by the Open Science Data Catalogue.

The Open Science Data Catalogue (OSC) is based upon the EO
Exploitation Platform Common Architecture (EOEPCA) and
shares its basic Open Source components, but extends it with
additional functionalities, including:

1. The Static Catalogue – which is a hosted STAC Catalogue,
comprised of static Catalogues, Collections, and Items that
represent the Themes, Variables, Projects, and Products.
2. The Open Science Data Catalogue Frontend – which is a
Vue.js based client application that allows efficient
browsing of the Open Science Data Catalogue.
3. The Backend API – which allows users to make submissions
to create, update, and delete Themes, Variables, Projects,
Products and EO-Missions. These submissions are then
handled as GitHub Pull Requests, where they can be further
reviewed, discussed, and finally accepted or denied.
4. The process execution – using the Application Deployment
and Execution Service (ADES) building block of EOEPCA,
it is possible to run the scientific workflows on platforms
implementing ADES, to re-generate Products from the
Catalogue, enabling reproducibility and contributing to more
transparent research. The processing is done in a federated
fashion, allowing processing close to the input data.

Adhering by design to the “FAIR” (findable, accessible,
interoperable, reproducible/reusable) principles, the Open
Science Data Catalogue aims to support better knowledge
discovery and innovation. It facilitates data and knowledge
integration and reuse by the scientific community.

2. OPEN SCIENCE DATA CATALOGUE ARCHITECTURE

The Open Science Data Catalogue is a deployment of the
EOEPCA (EOEPCA, 2023a) components in conjunction with
additional components to facilitate open access to a catalogue of
science projects and products.

EOEPCA is an ESA activity aiming to provide a blueprint for an
EO exploitation platform that attempts to facilitate
interoperability by tackling some key problems. The latest
software building blocks are freely available as source code
on GitHub and as Docker images on Docker Hub.
The reused components from EOEPCA are as follows
(EOEPCA, 2023b):

• Resource Management:
◦ Resource Catalogue - provides a standards-based
EO metadata catalogue that includes support for
OGC CSW / API Records, STAC and
OpenSearch.
◦ Harvester and Registrar - the Data Access building
block provides standards-based services for access to
the platform hosted data, including OGC
WMS/WMTS for visualisation and OGC WCS
for data retrieval. This component also includes
Harvester and Registrar services to
discover/watch the existing data holding of the
infrastructure data layer and populate/maintain
the data access and resource catalogue services
accordingly.
• User Management:
◦ Login-Service
◦ Policy Decision Point
◦ User Profile

The components supplementing the EOEPCA components are as
follows:
• Frontend
• Metadata Proxy
• Backend API
• Metadata Repository
• Static Catalogue

Figure 4 shows the interaction of the reused and
supplementary components of the Open Science Data Catalogue.


Figure 3. Open Science Data Catalogue design schema.

Figure 4. Interaction diagram of the Open Science Data Catalogue components.
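
As an illustration of one path through this interaction, the following Python sketch shows how a harvest-and-register cycle could keep the resource catalogue in step with the static catalogue by adding, altering, or removing records. It is only a sketch under stated assumptions: the function names `plan_registration` and `apply_plan`, the dictionary-based record stores, and the sample identifiers are hypothetical and do not reflect the actual EOEPCA Harvester or Registrar interfaces.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def plan_registration(
    harvested: Dict[str, Record],
    registered: Dict[str, Record],
) -> Tuple[List[str], List[str], List[str]]:
    """Compare harvested records with registered ones, keyed by identifier.

    Returns three sorted identifier lists: (to_add, to_update, to_remove).
    """
    to_add = sorted(i for i in harvested if i not in registered)
    to_update = sorted(
        i for i in harvested if i in registered and harvested[i] != registered[i]
    )
    to_remove = sorted(i for i in registered if i not in harvested)
    return to_add, to_update, to_remove


def apply_plan(
    harvested: Dict[str, Record],
    registered: Dict[str, Record],
) -> Dict[str, Record]:
    """Return a synchronised copy of the registered records (hypothetical store)."""
    to_add, to_update, to_remove = plan_registration(harvested, registered)
    result = dict(registered)
    for identifier in to_add + to_update:
        result[identifier] = harvested[identifier]
    for identifier in to_remove:
        del result[identifier]
    return result


# Hypothetical sample data, for illustration only.
static_catalogue = {
    "product-a": {"title": "Product A", "version": 2},
    "product-b": {"title": "Product B", "version": 1},
}
resource_catalogue = {
    "product-a": {"title": "Product A", "version": 1},
    "product-c": {"title": "Product C", "version": 1},
}

plan = plan_registration(static_catalogue, resource_catalogue)
synced = apply_plan(static_catalogue, resource_catalogue)
```

Computing the plan separately from applying it keeps the comparison free of side effects, which makes a periodic harvest safe to repeat: running the cycle again on already synchronised stores yields three empty lists.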


2.1 Components

2.1.1 Frontend: The OSC frontend is the main user interface
component that allows scientists and other parties to interact with
the system and consume the contents. It allows users to search for
scientific products, as well as to contribute to the contents of the
catalogue by ingesting new products in the catalogue or
submitting requests for updates of already existing content.

The Frontend is based upon the Open Source application STAC
Browser (STAC Community, 2023a) extended with functionality
from the Vue framework. The main elements of the frontend are
presented in the following paragraphs. On the landing page users
are presented with the current OSC themes and can access the
main functionalities of the OSC (Search, Catalogue, Metrics,
API).

The OSC Catalogue page allows users to discover the available
Collections and Items. The STAC specification describing all the
components and their properties is available at (STAC Community, 2023f).

Since all the scientific Products in the OSC are outcomes of
specific research projects, the OSC provides the possibility
to discover the specific activities to ensure traceability and
facilitate scientific exchanges between the data owners (i.e. the
data producers and distributors) and the community accessing
and using the data.

2.1.2 Metadata Proxy: This is a small reverse proxy to enable
browser access to certain web services that do not provide the
necessary CORS headers.

2.1.3 Backend API: This REST API service allows contributions
to the contents of the catalogue to be submitted from either the
frontend or any other compliant component. It translates all
contributions to GitHub Pull Requests. The Backend API is built
using the FastAPI software framework.

2.1.4 Metadata Repository: The metadata of the Open
Science Data Catalogue items are stored in a git repository. This
allows the convenient management of the contents, with the
possibility to add changes in the form of branches/Pull Requests
that can be reviewed by metadata administrators and finally
merged into the main branch of the repository. The metadata
repository is hosted and managed by GitHub.

2.1.5 Static Catalogue: This is an export of the contents of
the metadata repository, which is the source of truth of the Open
Science Data Catalogue. Upon any change to the metadata
repository, the static catalogue is rebuilt and exported for the
consumption of the frontend or any other compliant component.
The static catalogue itself is a structure of STAC objects in
various inter-linked JSON files.

2.1.6 Resource Catalogue: This component loads all records
from the static catalogue and provides convenient ways to search
across the contents using various metadata filters. It is considered
to be a cached version of the static catalogue and must be
regularly synchronized. The resource catalogue is realized by the
Open Source software pycsw. Internally, the records are stored in
a PostgreSQL database with PostGIS extensions enabled.

2.1.7 Harvester: The harvester runs at regular intervals and
reads the contents of the static catalogue. The harvested data
items are then pushed forward for registration.

2.1.8 Registrar: This component is responsible for accepting all
harvested items and pushing them to the resource catalogue, either
adding, altering, or removing elements as necessary.

2.2 Data Model

2.2.1 STAC Catalogue: The contents of the metadata
repository are kept as a static STAC Catalogue, a collection of
inter-linked JSON files and supplementary metadata. It is a graph
structure, with a single root STAC Catalogue as an entry point
which has the following branches:
• Themes
• Variables
• EO-Missions
• Projects
• Products

Each branch in turn is a listing of a number of elements of that
type, which are in turn represented as a STAC Catalogue or
STAC Collection. These objects use the OSC STAC extension to
reference elements of other groups they are associated with. For
example, a Product has an “osc:variables” field that lists the
measurement variables this product comprises.

To allow easier management of the catalogue, the linking is
done simply by using identifier fields. Actual STAC links are
added when the catalogue is exported.

On export, which is triggered when a change is merged into the
main branch of the metadata repository, the Static Catalogue is built.
Here, the contents of the metadata repository are taken, and
STAC link objects are introduced to link the related files.
Additionally, search keywords are added to allow later
retrieval.

The STAC Catalogue makes use of various STAC extensions to
best describe the contents. Most notable are the scientific (sci),
subjects, projection (proj), and datacube extensions.

2.2.2 STAC API: Once the catalogue contents are harvested into
Resource Management, the STAC API of the Resource Catalogue allows
efficient searching using text, geospatial, temporal and other
metadata attributes.
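
To make the geospatial and temporal part of such a search concrete, the sketch below assembles the JSON body of a STAC API item search. The `bbox`, `datetime`, `collections`, and `limit` fields follow the STAC API Item Search specification; free-text search depends on which API extensions a deployment enables and is therefore left out. The helper name `build_search_body`, the collection identifier, and the sample values are assumptions made for illustration, and no request is sent to any real endpoint.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Optional, Sequence


def _rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    if moment.tzinfo is None:
        raise ValueError("datetimes must be timezone-aware")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_search_body(
    bbox: Optional[Sequence[float]] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    collections: Optional[Sequence[str]] = None,
    limit: int = 10,
) -> Dict[str, object]:
    """Build the body of a STAC API item search (POST to the search endpoint)."""
    body: Dict[str, object] = {"limit": limit}
    if bbox is not None:
        if len(bbox) != 4:
            raise ValueError("bbox must be [west, south, east, north]")
        if bbox[1] > bbox[3]:
            raise ValueError("south must not exceed north")
        body["bbox"] = list(bbox)
    if start is not None or end is not None:
        # An open end of the interval is written as ".." in STAC API.
        body["datetime"] = "/".join(
            _rfc3339(m) if m is not None else ".." for m in (start, end)
        )
    if collections:
        body["collections"] = list(collections)
    return body


# Hypothetical query: items over Europe from 2020 onwards.
body = build_search_body(
    bbox=(-10.0, 35.0, 30.0, 60.0),
    start=datetime(2020, 1, 1, tzinfo=timezone.utc),
    collections=["hypothetical-product"],
    limit=5,
)
payload = json.dumps(body)
```

The resulting `payload` string is what a client would send as the request body; building it separately from the HTTP call makes the query easy to inspect and to reuse against any STAC API that implements item search.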


Figure 5. Open Science Data Catalogue Page

Figure 6. An example OSC Project Page. The Project metadata and the associated Products (i.e., Project outcomes) are visible on
this page.


Figure 7. An OSC Product Page. A Product is a STAC item, while the various datasets included in the Product are Assets of the
respective Item.
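
The structure described in this caption, together with the identifier-based linking of Section 2.2.1, can be illustrated with a small Python sketch. It represents a Product as a plain STAC Item dictionary with one Asset and an “osc:variables” field, then derives link objects from the identifiers, in the spirit of the export step. The item identifier, variable identifiers, asset URL, relative path layout, placement of the field under `properties`, and the choice of the `related` relation type are all hypothetical; only the “osc:variables” field name is taken from the text.

```python
from typing import Dict, List

# A hypothetical Product expressed as a minimal STAC Item.
product: Dict[str, object] = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-product",
    "geometry": None,
    "properties": {
        "datetime": "2021-01-01T00:00:00Z",
        # Identifier references to Variables; no STAC links yet.
        "osc:variables": ["sea-surface-temperature", "sea-ice-extent"],
    },
    "assets": {
        # Each dataset of the Product is an Asset of the Item.
        "data": {
            "href": "https://example.org/data/example-product.nc",
            "type": "application/x-netcdf",
            "roles": ["data"],
        },
    },
    "links": [],
}


def add_variable_links(
    item: Dict[str, object],
    path_template: str = "../../variables/{identifier}/catalog.json",
) -> Dict[str, object]:
    """Return a copy of the item with one link per referenced variable.

    The input item is left unchanged, mirroring an export that reads the
    metadata repository and writes a separate static catalogue.
    """
    exported = dict(item)
    links: List[Dict[str, str]] = list(item.get("links", []))
    for identifier in item["properties"].get("osc:variables", []):
        links.append(
            {
                "rel": "related",
                "href": path_template.format(identifier=identifier),
                "type": "application/json",
            }
        )
    exported["links"] = links
    return exported


exported = add_variable_links(product)
```

Keeping only identifiers in the editable metadata and generating links at export time means a contributor never has to maintain relative paths by hand; the paths are derived in one place.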

Figure 8. The OSC Search Page. In addition to searching by free text or keywords, users can search by Theme, Variable, Project, EO
Mission and Region.


2.3 Deployment

The whole deployment of the Open Science Data Catalogue
system is orchestrated using Flux CD, which is connected to a
git repository containing the manifests and metadata of the
components to be deployed. Every change to that repository
is reflected as a change to the Kubernetes cluster on which the
Open Science Data Catalogue is deployed. This enables a
convenient and reliable way to configure the cluster, with a full
history of configuration changes.

3. CONCLUSIONS

The Open Science Data Catalogue is implemented using
EOEPCA building blocks. It provides a catalogue of publicly
available EO and Earth Science products, datasets, workflows,
and other resources developed in the frame of scientific
research Projects funded by ESA EO, and is one of the elements
of the Open Science framework and infrastructure supporting
reproducible Earth System Science research with Earth
Observation Data.

ACKNOWLEDGEMENTS

This work has been carried out under the EO Science for
Society programme of and funded by the European Space
Agency: EOEPCA (Earth Observation Exploitation Platform
Common Architecture) https://eoepca.org/.

REFERENCES

EC, 2016. Open innovation, open science, open to the world.
https://digital-strategy.ec.europa.eu/en/library/open-innovation-open-science-open-world (14 July 2023).

EOEPCA, 2021. Master System Design Document:
EOEPCA.SDD.001. https://eoepca.github.io/master-system-design/current/ (14 July 2023).

EOEPCA, 2023a. Earth Observation Exploitation Platform
Common Architecture. https://eoepca.org/ (14 July 2023).

EOEPCA, 2023b. EOEPCA Deployment Guide.
https://deployment-guide.docs.eoepca.org/current/quickstart/userman-deployment/ (14 July 2023).

ESA, 2022. ESA Agenda 2025.
https://www.esa.int/About_Us/ESA_Publications/Agenda_2025 (14 July 2023).

EU, 2020. EU Open Science Policy.
https://research-and-innovation.ec.europa.eu/strategy/strategy-2020-2024/our-digital-future/open-science_en (14 July 2023).

Evan You, 2023. Vue.js, The Progressive JavaScript
Framework. https://vuejs.org/ (14 July 2023).

Kubernetes, 2023. Kubernetes - Production-Grade Container
Orchestration. https://kubernetes.io/ (14 July 2023).

OSGeo, 2023. Pycsw. https://pycsw.org/ (14 July 2023).

PostGIS PSC & OSGeo, 2023. PostGIS. http://postgis.net/ (14 July 2023).

PostgreSQL Global Development Group, 2023. PostgreSQL:
The World's Most Advanced Open Source Relational
Database. https://www.postgresql.org/ (14 July 2023).

STAC Community, 2023a. Spatio Temporal Asset Catalogue
(STAC) Browser repository.
https://github.com/radiantearth/stac-browser (14 July 2023).

STAC Community, 2023b. Spatio Temporal Asset Catalogue
(STAC) Datacube extension repository.
https://github.com/stac-extensions/datacube (14 July 2023).

STAC Community, 2023c. Spatio Temporal Asset Catalogue
(STAC) extension repository.
https://github.com/stac-extensions/ (14 July 2023).

STAC Community, 2023d. Spatio Temporal Asset Catalogue
(STAC) Open Science Catalogue extension repository.
https://github.com/stac-extensions/osc (14 July 2023).

STAC Community, 2023e. Spatio Temporal Asset Catalogue
(STAC) Projection extension repository.
https://github.com/stac-extensions/projection (14 July 2023).

STAC Community, 2023f. Spatio Temporal Asset Catalogue
(STAC) Specification.
https://github.com/radiantearth/stac-spec (14 July 2023).

STAC Community, 2023g. Spatio Temporal Asset Catalogue
(STAC) Themes extension repository.
https://github.com/stac-extensions/themes (14 July 2023).

The Flux, 2023. Flux - the GitOps family of projects.
https://fluxcd.io/ (14 July 2023).

tiangolo, 2023. FastAPI. https://fastapi.tiangolo.com/ (14 July 2023).
