Wis2box Datasets Copie

Uploaded by manga
Copyright © All Rights Reserved
Datasets in wis2box: metadata and data mappings


The Dataset driven approach

The wis2box workflow relies on Datasets to publish WIS2 notifications:

• A Dataset is described by Discovery Metadata for publication to the Global Discovery Catalogue
• A Dataset connects data and metadata using the metadata identifier
• A Dataset defines the topic hierarchy on which the WIS2 data notifications are published
• A Dataset contains the data plugins used to transform and publish the data

Datasets are stored in the pygeoapi 'discovery-metadata' collection


see: <wis2box-url>/oapi/collections/discovery-metadata/items?f=json
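For scripted access, the collection can be read with the Python standard library alone. This is a minimal sketch, assuming the endpoint returns a GeoJSON FeatureCollection whose features each carry an `id` (the dataset's metadata identifier); the helper names are hypothetical and not part of wis2box.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def discovery_metadata_url(wis2box_url: str) -> str:
    """Build the items URL of the pygeoapi 'discovery-metadata' collection."""
    base = wis2box_url.rstrip("/")
    return f"{base}/oapi/collections/discovery-metadata/items?{urlencode({'f': 'json'})}"


def dataset_ids(feature_collection: dict) -> list[str]:
    """Return the identifier of every dataset (feature) in a FeatureCollection."""
    return [feature["id"] for feature in feature_collection.get("features", [])]


def fetch_dataset_ids(wis2box_url: str) -> list[str]:
    """Fetch the collection over HTTP and list its dataset identifiers."""
    with urlopen(discovery_metadata_url(wis2box_url)) as response:
        return dataset_ids(json.load(response))
```

For example, `fetch_dataset_ids("http://localhost")` would list the datasets of an instance reachable at that (assumed) address. The sketch reads a single response and does not follow pagination links.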
Dataset driven approach

[Figure: data is uploaded to the 'incoming' storage bucket; wis2box-management listens to the incoming bucket, transforms the input data, creates WIS2 notifications and WCMP2 records, publishes output to the 'public' bucket (served through an HTTP proxy) and publishes WIS2 notifications to the MQTT broker; it retrieves datasets (metadata and data mappings) from, and stores GeoJSON in, the API backend (wis2box-api).]

Datasets determine the actions taken whenever a file is uploaded to the 'incoming' storage bucket (see next slides)
Dataset driven approach: step-by-step

1. Data arrives in the incoming bucket: CSV data is uploaded to the 'incoming' storage bucket while wis2box-management waits ("zzz…").


2. MinIO informs wis2box-management that new data has arrived ("There is some new CSV data!"), and wis2box-management responds: "OK, let's handle the new data!"


3. wis2box-management matches the incoming data to a dataset ("Matching data with dataset…") and retrieves the dataset from wis2box-api, defining the metadata identifier (metadata_id), WIS2 topic hierarchy and data plugins.

Files are matched with a dataset if:

• the filepath contains the identifier for the dataset, or
• the filepath contains the topic for the data
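The matching rule of step 3 can be sketched as a substring test over the configured datasets. This is a minimal illustration, not wis2box's implementation: the `Dataset` class and `match_dataset` function are hypothetical, and returning the first match is an assumption, since the slides do not say how wis2box resolves a file that matches several datasets.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical, simplified view of a dataset."""
    metadata_id: str
    topic: str
    plugins: dict = field(default_factory=dict)  # file extension -> plugin definitions


def match_dataset(filepath: str, datasets: list[Dataset]) -> Optional[Dataset]:
    """Return the first dataset whose identifier or topic appears in filepath."""
    for dataset in datasets:
        if dataset.metadata_id in filepath or dataset.topic in filepath:
            return dataset
    return None  # no dataset configured: no plugins run, no notification
```

In practice this means the upload path itself (for example a directory named after the identifier or topic) is what selects the dataset.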
4. A data plugin is loaded to transform the data and publish the output to the public bucket. In this example wis2box-management transforms the data using the csv2bufr data plugin ("new BUFR produced"), requesting wis2box-api to use its BUFR tools for the transformation, and publishes the BUFR output to the public bucket.
5. wis2box-management publishes a data notification on the associated topic, asking the MQTT broker to publish the WIS2 notification for the new BUFR data ("Hi broker, publish this WIS2 notification for new BUFR data").
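To make step 5 concrete, here is a simplified sketch of the kind of message that is published: a GeoJSON Feature carrying the data identifier, timestamps, a checksum of the data and a link to download it. The field set reflects our reading of the WIS2 Notification Message specification and should be checked against it; `build_notification` and its parameters are hypothetical, and sending the message to the broker (which needs an MQTT client) is not shown.

```python
import base64
import hashlib
import uuid
from datetime import datetime, timezone


def build_notification(data: bytes, data_id: str, metadata_id: str,
                       href: str, media_type: str, data_datetime: str) -> dict:
    """Build a simplified WIS2 notification message (GeoJSON Feature)."""
    digest = hashlib.sha512(data).digest()
    return {
        "id": str(uuid.uuid4()),  # unique message identifier
        "conformsTo": ["http://wis.wmo.int/spec/wnm/1/conf/core"],
        "type": "Feature",
        "geometry": None,  # a Point could be given for station data
        "properties": {
            "data_id": data_id,
            "metadata_id": metadata_id,  # links the data to its dataset
            "datetime": data_datetime,
            "pubtime": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "integrity": {  # lets consumers verify the downloaded file
                "method": "sha512",
                "value": base64.b64encode(digest).decode("ascii"),
            },
        },
        "links": [
            {"rel": "canonical", "type": media_type, "href": href, "length": len(data)},
        ],
    }
```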


6. Optionally, the data is also stored as GeoJSON in the backend: when new BUFR data arrives, the bufr2geojson plugin stores GeoJSON in the API backend (wis2box-api).


Datasets in the wis2box
There are two ways to configure a new dataset in wis2box:
• Use the dataset editor in the wis2box-webapp
• … or share an MCF file with the wis2box-management container and execute ‘wis2box dataset publish <file-path>’

wis2box publishes a WCMP2 notification on origin/a/wis2/<centre-id>/metadata:
• new: for every new dataset published
• update: for every updated dataset published
• delete: whenever a dataset is unpublished

[Figure: new/update/delete WCMP2 JSON records are sent to the Global Discovery Catalogue.]
“centre identifier” (centre-id)
To define a dataset you have to provide a centre-id for your WIS centre

“The centre identifier (centre-id) is an acronym as proposed by the Member and endorsed by the WMO
Secretariat. It is a single identifier comprised of a Top Level Domain (TLD) and centre-name, and represents
the Data Publisher, distributor or issuing centre of a given Dataset or data product/granule”

Examples: uk-metoffice, br-inmet, cn-cma, id-bmkg
• token 1: the TLD (lowercase)
• token 2: a descriptive name for the centre (lowercase), may include dashes
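The two-token rule can be checked before a centre-id is used, and the centre-id then determines the metadata topic shown earlier. This is a sketch under stated assumptions: the regular expression only encodes the shape described on this slide (a lowercase TLD, a dash, then a lowercase name that may contain further dashes) and allowing digits in the name is our assumption; the helper names are hypothetical.

```python
import re

# Assumption: token 1 is a lowercase TLD (letters only); token 2 is a lowercase
# centre name of letters/digits that may contain further dashes.
CENTRE_ID_PATTERN = re.compile(r"[a-z]+-[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_centre_id(centre_id: str) -> bool:
    """Check the '<tld>-<centre-name>' shape of a centre-id."""
    return CENTRE_ID_PATTERN.fullmatch(centre_id) is not None


def metadata_topic(centre_id: str) -> str:
    """Topic on which WCMP2 notifications for this centre are published."""
    if not is_valid_centre_id(centre_id):
        raise ValueError(f"invalid centre-id: {centre_id!r}")
    return f"origin/a/wis2/{centre_id}/metadata"
```

For example, `metadata_topic("cn-cma")` gives `origin/a/wis2/cn-cma/metadata`, while an uppercase value such as `UK-metoffice` is rejected.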
Dataset Editor in wis2box-webapp
When using the dataset editor, you will be asked to provide a “Centre ID”, which will be used to define the identifier and the corresponding topic for the dataset.

You will also be asked to provide a “Data Type”:
• Choose a template to initialize the dataset with a fixed topic and other pre-defined values
• Choose ‘other’ if your data type is not included as a template
Dataset Editor in wis2box-webapp
Data Type = weather/surface-based-observations/synop

The Topic Hierarchy is fixed to …/weather/surface-based-observations/synop


Dataset Editor in wis2box-webapp
Data Type = other

The Topic Hierarchy field needs to be updated by the user


Dataset Editor in wis2box-webapp
Step 1. Define metadata and validate form
Step 2. Define data plugins
Step 3. Submit the dataset for publication

(Slides demonstrating the Dataset Editor)


Datasets using YAML and the command line
Records in wis2box can also be defined by completing a YAML configuration file in the pygeometa project's metadata control file (MCF) format.

(Slides demonstrating the YAML content and command-line functions)

An MCF file can be published using the ‘wis2box dataset publish’ command available in the wis2box-management container:

    wis2box dataset publish /data/wis2box/metadata-MCF.yml

When using MCF, the user is responsible for ensuring that all required fields are present and contain valid entries
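Because the user is responsible for completeness, a quick pre-check before running the publish command can save a round trip. This is a minimal sketch, assuming the MCF has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`); the `REQUIRED_FIELDS` list is a hypothetical subset chosen for illustration, not wis2box's actual list of required fields.

```python
# Hypothetical subset of required MCF fields, written as dotted key paths.
REQUIRED_FIELDS = [
    "metadata.identifier",
    "identification.title",
    "identification.abstract",
    "wis2box.topic_hierarchy",
    "wis2box.data_mappings",
]


def missing_fields(mcf: dict, required: list[str] = REQUIRED_FIELDS) -> list[str]:
    """Return the dotted paths from `required` that are absent or empty in mcf."""
    missing = []
    for path in required:
        node = mcf
        for key in path.split("."):  # walk down one nesting level per key
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if node in (None, "", [], {}):  # treat empty values as missing too
            missing.append(path)
    return missing
```

A non-empty result points to fields to fill in; it says nothing about whether the values present are valid.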
wis2box data plugins

Data Mappings map a specific dataset to a set of data plugins
Data plugins use an abstract model/approach to enable extensibility and reuse
A data plugin defines the actions taken to transform and publish the data
See github.com/wmo-im/wis2box/tree/main/wis2box-management/wis2box/data
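The abstract-model idea can be illustrated with a toy base class: every plugin shares the same interface (transform, then publish), so new formats only need a new subclass. This is not wis2box's real base class, which lives in the repository directory linked above; `BaseDataPlugin`, `UppercasePassthrough` and their method signatures are hypothetical.

```python
from abc import ABC, abstractmethod


class BaseDataPlugin(ABC):
    """Hypothetical base class: one subclass per data format."""

    def __init__(self, defs: dict):
        # defs is one plugin entry from the data mappings (plugin, notify, file-pattern, ...)
        self.defs = defs
        self.output_data: dict[str, bytes] = {}

    @abstractmethod
    def transform(self, input_data: bytes, filename: str) -> bool:
        """Convert input_data and store the results in self.output_data."""

    def publish(self) -> list[str]:
        """Return the names of the outputs that would be published."""
        # A real plugin would upload to the public bucket and, if notify is true,
        # publish a WIS2 notification; here we only list the outputs.
        return sorted(self.output_data)


class UppercasePassthrough(BaseDataPlugin):
    """Toy plugin showing the pattern: its 'transform' upper-cases the input."""

    def transform(self, input_data: bytes, filename: str) -> bool:
        self.output_data[filename] = input_data.upper()
        return True
```

The shared `publish` step is what makes reuse possible: only the format-specific `transform` differs between plugins.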

WMO Support
wis2box data plugins

wis2box contains the following built-in data plugins:


• wis2box.data.universal.UniversalData
• wis2box.data.cap_message.CAPMessageData
• wis2box.data.bufr4.ObservationDataBUFR
• wis2box.data.synop2bufr.ObservationDataSYNOP2BUFR
• wis2box.data.csv2bufr.ObservationDataCSV2BUFR
• wis2box.data.bufr2geojson.ObservationDataBUFR2GeoJSON
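The entries above are dotted Python paths to classes. A common way to turn such a string into a class at runtime is `importlib`, sketched below; whether wis2box resolves its plugins exactly this way is an assumption, and `load_plugin_class` is a hypothetical helper.

```python
import importlib


def load_plugin_class(dotted_path: str):
    """Resolve 'package.module.ClassName' to the object it names."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)  # imports the module on demand
    return getattr(module, class_name)
```

Inside a container where the wis2box package is importable, `load_plugin_class("wis2box.data.csv2bufr.ObservationDataCSV2BUFR")` would return that class, which is what lets a contributed plugin be used by naming it in a data mapping.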

Developers are encouraged to contribute new data plugins to wis2box!


wis2box data plugins: synop2bufr
[Figure: a file containing SYNOP messages (FM-12) and the station list are transformed by synop2bufr into one or more BUFR files, which are published to WIS 2.0 and converted by bufr2geojson into GeoJSON.]
plugins:
  txt:
    - plugin: wis2box.data.synop2bufr.ObservationDataSYNOP2BUFR
      notify: true
      file-pattern: '^.*_(\d{4})(\d{2}).*\.txt$'
  bufr4:
    - plugin: wis2box.data.bufr2geojson.ObservationDataBUFR2GeoJSON
      file-pattern: '^WIGOS_(\d-\d+-\d+-\w+)_.*\.bufr4$'
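The file-patterns do more than filter files: their capture groups carry information. An FM-12 SYNOP report only states the day and hour, which is presumably why the txt pattern captures a year and month from the filename, while the bufr4 pattern captures the WIGOS station identifier. The sketch below uses Python's `re` with `^.*_` at the start, since a bare `^*` does not compile with `re`; the helper names are hypothetical.

```python
import re
from typing import Optional

SYNOP_PATTERN = re.compile(r"^.*_(\d{4})(\d{2}).*\.txt$")
BUFR_PATTERN = re.compile(r"^WIGOS_(\d-\d+-\d+-\w+)_.*\.bufr4$")


def synop_year_month(filename: str) -> Optional[tuple[int, int]]:
    """Return (year, month) captured from a SYNOP filename, or None."""
    match = SYNOP_PATTERN.match(filename)
    return (int(match.group(1)), int(match.group(2))) if match else None


def wigos_station_id(filename: str) -> Optional[str]:
    """Return the WIGOS station identifier captured from a BUFR filename, or None."""
    match = BUFR_PATTERN.match(filename)
    return match.group(1) if match else None
```

Note that `\w` also matches underscores, so a BUFR filename with several underscores after the station identifier can widen the captured group; testing patterns against real filenames before deployment is worthwhile.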
wis2box data plugins: csv2bufr
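[Figure: tabulated CSV data from an observing station (including its location), the station list and a mapping template are transformed by csv2bufr into one or more BUFR files, which are published to WIS 2.0 and converted by bufr2geojson into GeoJSON.]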
plugins:
  csv:
    - plugin: wis2box.data.csv2bufr.ObservationDataCSV2BUFR
      template: aws-template.json
      notify: true
      file-pattern: '^.*\.csv$'
  bufr4:
    - plugin: wis2box.data.bufr2geojson.ObservationDataBUFR2GeoJSON
      file-pattern: '^WIGOS_(\d-\d+-\d+-\w+)_.*\.bufr4$'
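Once parsed, a data mapping like this one acts as a lookup table from a file to the plugins that handle it. The sketch assumes that each key (csv, bufr4, txt, ...) is the file extension and that the file-pattern is matched against the filename; both are inferences from the examples, and `select_plugins` is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath

# The mapping above, as it might look after YAML parsing.
PLUGINS = {
    "csv": [{"plugin": "wis2box.data.csv2bufr.ObservationDataCSV2BUFR",
             "template": "aws-template.json", "notify": True,
             "file-pattern": r"^.*\.csv$"}],
    "bufr4": [{"plugin": "wis2box.data.bufr2geojson.ObservationDataBUFR2GeoJSON",
               "file-pattern": r"^WIGOS_(\d-\d+-\d+-\w+)_.*\.bufr4$"}],
}


def select_plugins(filepath: str, plugins: dict = PLUGINS) -> list[str]:
    """Return the plugin paths whose extension key and file-pattern match filepath."""
    path = PurePosixPath(filepath)
    extension = path.suffix.lstrip(".")
    return [
        entry["plugin"]
        for entry in plugins.get(extension, [])
        if re.match(entry["file-pattern"], path.name)
    ]
```

Under these assumptions one upload triggers a chain: the CSV selects csv2bufr, and the BUFR file it writes to the public bucket then selects bufr2geojson.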
wis2box data plugins: bufr2bufr

[Figure: a file containing one or more BUFR subsets goes through BUFR processing (extraction of subsets, using the station list) into one or more BUFR files, which are published to WIS 2.0 and converted by bufr2geojson into GeoJSON.]
plugins:
  bin:
    - plugin: wis2box.data.bufr4.ObservationDataBUFR
      notify: true
      file-pattern: '^.*\.bin$'
  bufr4:
    - plugin: wis2box.data.bufr2geojson.ObservationDataBUFR2GeoJSON
      file-pattern: '^WIGOS_(\d-\d+-\d+-\w+)_.*\.bufr4$'
wis2box data plugins: universal/passthrough

[Figure: the data-timestamp is extracted from the filename pattern and the file is published unchanged to WIS 2.0.]
plugins:
  grib2:
    - plugin: wis2box.data.universal.UniversalData
      notify: true
      buckets:
        - ${WIS2BOX_STORAGE_INCOMING}
      file-pattern: '^.*_(\d{8})\d{2}.*\.grib2$'
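For the universal plugin the data-timestamp comes from the filename. The sketch shows how the first capture group of the grib2 pattern can be read as a date, assuming that group holds YYYYMMDD; how wis2box interprets the captured groups internally is not shown on this slide, and `data_date` is a hypothetical helper.

```python
import re
from datetime import date, datetime
from typing import Optional

GRIB2_PATTERN = re.compile(r"^.*_(\d{8})\d{2}.*\.grib2$")


def data_date(filename: str) -> Optional[date]:
    """Parse the YYYYMMDD captured by the first group of the file-pattern."""
    match = GRIB2_PATTERN.match(filename)
    if match is None:
        return None  # file does not follow the naming convention
    return datetime.strptime(match.group(1), "%Y%m%d").date()
```

The two digits after the date (`\d{2}`, typically an hour) must be present for a match but are not captured by this pattern.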
wis2box data plugins: Common Alerting Protocol (CAP)

[Figure: a CAP message is validated against the CAP v1.2 schema, its digital signature is verified and the data-timestamp is extracted from its content before the CAP message is published to WIS 2.0.]
plugins:
  xml:
    - plugin: wis2box.data.cap_message.CAPMessageData
      notify: true
      buckets:
        - ${WIS2BOX_STORAGE_INCOMING}
      file-pattern: '^.*\.xml$'
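For CAP the timestamp is taken from the message content rather than the filename. The sketch reads the `<sent>` element of a CAP v1.2 alert with the standard library; that wis2box uses `<sent>` specifically is our assumption (the slide only says "extract data-timestamp from content"), and schema validation and signature verification are not covered here. `cap_sent_time` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

CAP_NS = {"cap": "urn:oasis:names:tc:emergency:cap:1.2"}


def cap_sent_time(xml_text: str) -> datetime:
    """Return the <sent> timestamp of a CAP v1.2 alert as an aware datetime."""
    root = ET.fromstring(xml_text)
    sent = root.find("cap:sent", CAP_NS)  # direct child of <alert>
    if sent is None or not sent.text:
        raise ValueError("CAP alert has no <sent> element")
    return datetime.fromisoformat(sent.text.strip())
```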
Summary

Dataset-driven workflow in wis2box-management: a corresponding dataset needs to be configured to publish WIS2 notifications for incoming data

Datasets in wis2box consist of two components:
• “Discovery Metadata” to publish a WCMP2 record to the Global Discovery Catalogue
• “Data Mappings” that define the plugins used to transform and publish the data

Datasets can be configured using the dataset editor in wis2box-webapp, or using YAML files and the command line in wis2box-management

Thank you
Merci
Gracias
谢谢
