
M4M FAQs

Here we assemble common questions that ID&AMR M4M participants ask. These FAQs will be
used in our future communications and training events (likely to become webpages on the GO
FAIR Foundation and Health-RI websites). Please feel free to add questions or comments in
this document that you consider to be relevant for others.

This document is maintained by Erik Schultes: [email protected]


See also: https://www.go-fair.org/today/making-fair-metadata/

Overview

Q1: What is the goal of the M4M workshops?


The overall goal of the M4M workshop is to assist the domain researcher (or, in general, a data
producer) in creating high-quality FAIR metadata describing research assets. The M4M
workshop was created in recognition of the fundamental role played by metadata in achieving
overall FAIRness, but also that the skills to build good metadata are not always readily
available. Hence, in the M4M the domain researcher works with FAIR metadata experts to
create metadata that are 1) reusable, 2) domain-specific and 3) FAIR (machine-actionable). To
make metadata creation more scalable, the M4M workshop format was designed to be as
lightweight as possible. In the M4M, the domain researcher ensures that the metadata are relevant
and adhere to community standards, while the FAIR metadata expert guarantees
adherence to the FAIR Principles. Once created, there are many ways the metadata can be
deployed by both humans and machines. Furthermore, when using CEDAR it is possible to
track the creation and use of metadata components (templates and vocabularies) created in one
workshop for their efficient reuse in another. Not only does this save time (prevents the
reinvention of the wheel) but it also drives interoperability and convergence. In short, the M4M
workshop makes it easier for humans to create metadata for machines, so that machines can
better help people.

There are a number of different flavours of M4M workshops specialized for different audiences
and different purposes. In any case, M4M workshops are offered following two broad models:
1. M4M workshop as a service: FAIR metadata experts facilitate a controlled ‘brainstorm’
process with domain experts to document the domain-relevant community standards
that describe their research assets. The FAIR metadata experts then work offline to
craft the needed vocabularies and templates that fit and serve the community. In the
service model, we assume the community is willing to fill out metadata forms, but not
build them. The service model is expedient, but the researcher/data stewards remain
dependent on the M4M facilitators to build and maintain metadata [The ID&AMR
programme follows the service model].
2. M4M workshops as a training event: FAIR metadata experts facilitate accelerated
training for data stewards (and researchers who may also be interested) to create,
adapt, and extend metadata vocabularies and templates, and embed these skills locally.
The training model delivers FAIR metadata, but the process takes longer. In the end, the
training model will be more sustainable.

Q2: We already create metadata at the repository where we store our research data. Why do I need to create metadata again in this workshop?

At many repositories you will be asked to fill out metadata fields when you deposit your data.
Some fields are mandatory but these are usually only the general ones. Domain-specific fields
are usually not mandatory but are nonetheless extremely important for data reusability. In many
cases, the repository does not enforce the use of controlled vocabularies (indeed, in many
cases they allow free-text responses). This often results in non-interoperable metadata and
an effective loss of the data from scientific discourse.

For example, in EOSC-Nordic WP4 ("Shaping up the Nordics for EOSC"), a semi-automated
FAIR-assessment approach was used to sample datasets from 100 data repositories and assess
their levels of FAIRness. A highlight of this assessment is that the majority of repositories are
not very FAIR, primarily because they do not support machine-actionable metadata. An
accompanying summary report concludes that 30% of the repositories had no support for
machine-actionable metadata whatsoever, a few repositories supported some degree of
machine-actionable metadata or had some metadata standards in place, and only a handful of
repositories scored more than 50% on machine-actionable metadata. There is still a lot of work
to do. See also The variable quality of metadata about biological samples used in biomedical
experiments and the recent presentation Making COVID Data Accessible.

In contrast, the CEDAR form we use in the M4M uses controlled vocabularies to ensure
automated interoperability, and it can be extended as needed by a domain community to cover
a range of metadata descriptors that the community finds relevant and important for Findability
and Reuse. When exposed on platforms like the Health-RI Portal or FAIR Data Points these
metadata can be used directly to locate the data, even when they are located in a repository
that does not support machine-actionable metadata.
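The difference that controlled vocabularies make can be sketched in a few lines. The vocabulary and term IRIs below are hypothetical placeholders for illustration only, not actual CEDAR or BioPortal content:

```python
# Illustrative sketch: why controlled vocabularies make metadata interoperable.
# The vocabulary below is a hypothetical placeholder, not a real BioPortal ontology.

# A controlled vocabulary maps preferred labels to stable term IRIs.
SPECIMEN_VOCAB = {
    "blood": "https://example.org/vocab/specimen/0001",
    "saliva": "https://example.org/vocab/specimen/0002",
    "nasal swab": "https://example.org/vocab/specimen/0003",
}

def to_machine_actionable(field_value: str) -> dict:
    """Replace a free-text value with its vocabulary IRI, or flag it."""
    key = field_value.strip().lower()
    if key in SPECIMEN_VOCAB:
        return {"@id": SPECIMEN_VOCAB[key], "label": key}
    # Free text outside the vocabulary cannot be reliably matched by machines.
    return {"@id": None, "label": field_value, "warning": "uncontrolled term"}

print(to_machine_actionable("Blood"))         # resolves to a stable IRI
print(to_machine_actionable("blood sample"))  # flagged: not interoperable
```

Two researchers who both pick "blood" from a drop-down end up with the same term IRI and their datasets can be found together; two free-text answers like "Blood" and "blood sample" cannot.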

Q3: What is the direct benefit for me and my research when I work with M4M templates?
Data volumes and complexity now exceed human comprehension. For example, since January
2020, more than 138 thousand research articles related to COVID-19 have been published
(some 250 articles per day; see LitCovid), many with associated datasets. The FAIR metadata templates
and vocabularies you help to create in the M4M workshop, when filled in by you and your
colleagues, will increase the Findability and Reusability of your data by your community, by
those who do not know of your work, by automated systems, and most importantly by you and
your lab. FAIR metadata will also better ensure (via licensing and other FAIR attributes) that you
get credit (citations) when your data are reused.

Q4: How do I find machine-readable metadata that are already provided by other researchers?
When a user completes a CEDAR form, the captured metadata (called metadata instances) are
stored in the individual work space of the user on the CEDAR Workbench. Currently, only
CEDAR admins can see these metadata instances, but they use a script to pool all the
metadata instances from our M4M into a common folder. In this way the CEDAR template
instances are prevented from being manipulated by others. From here, Health-RI pulls the
metadata (via the CEDAR API) into the COVID-19 Portal. The Portal is automatically updated
every 15 minutes, so the metadata of completed forms can be displayed in near real time, and
provides a dashboard for the metadata created in the COVID-19 program (130 projects). The
Portal is still in prototype, but can already provide detailed search services at the project, and
soon the dataset level. Additional plans are underway to deploy the metadata instances on
so-called FAIR Data Points, which allow the metadata to be more fully machine-actionable and
allow machines to automatically access the datasets themselves via well-described access
protocols. It is important to note that the use of CEDAR, the Health-RI Portal and FAIR Data Points
makes it possible to expose the same metadata in multiple locations, linked to the datasets
irrespective of the repository where they are stored.
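The pooling step described above can be sketched as follows. The instance structure ("@id", "project") and the folder-merging logic are assumptions for illustration, not the actual CEDAR Workbench data model or the admins' script:

```python
# Illustrative sketch of pooling metadata instances from individual user
# workspaces into one common pool, deduplicated by instance identifier.
# The instance fields are assumed for illustration, not CEDAR's real schema.

def pool_instances(workspaces: list[list[dict]]) -> dict[str, dict]:
    """Merge metadata instances from many workspaces into one pool,
    keyed by their persistent identifier so duplicates collapse."""
    pool: dict[str, dict] = {}
    for workspace in workspaces:
        for instance in workspace:
            pool[instance["@id"]] = instance  # later copies overwrite earlier
    return pool

alice = [{"@id": "inst-001", "project": "COVID-A"}]
bob = [{"@id": "inst-002", "project": "COVID-B"},
       {"@id": "inst-001", "project": "COVID-A"}]  # duplicate of Alice's

pooled = pool_instances([alice, bob])
print(len(pooled))  # 2 unique instances
```

Keying the pool on a persistent identifier is what lets a downstream consumer such as a portal poll the common folder periodically without ever ingesting the same instance twice.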

Q5: Is there a human-readable counterpart for the machine-readable metadata?
Yes there is. You can see and read the metadata directly in CEDAR if you have a CEDAR
account. Alternatively, you can view metadata instances without a CEDAR account on
OpenView. Lastly, the Health-RI Portal provides a way for anyone to see and search the
metadata instances created in your (and other) M4M workshops.
Q6: What is the role of the researcher in the creation, adaptation,
or extension of M4M templates? And what is the role of the data
steward?
These roles are still evolving. However, roughly speaking, the researcher brings the knowledge
and experience of the domain and current best practices, whereas the data steward has the
technical skills to create and use metadata vocabularies and templates. In the creation of high
quality FAIR data and services, both kinds of expertise are essential, and the researcher and
data stewards form a team. Including the researchers in the creation process has the advantage
that the template structure and the vocabulary can be adapted to their requirements, ensuring
domain-relevant community standards are followed. In general we expect data stewards will
create/adapt/maintain the vocabularies and templates, while normally the researchers will more
routinely fill in the metadata.

Q7: How can the metadata we are going to produce be further used?
For an example of what can be done with the metadata you are producing, see the
COVID-19 Portal <https://covid19initiatives-test.health-ri.nl/p/ProjectOverview> being built by
Health-RI as part of the FAIR Data Stewardship Support of the COVID programme. The
metadata in the Portal is based on the CEDAR templates filled in by the ZonMw COVID-19
projects involved in the COVID M4Ms. Irrespective of the repository where each project's data is
stored, you will be able to find it via the Portal. Because the metadata is machine-readable
(well structured and using controlled vocabularies), the Portal can ingest it automatically, and
all of the projects and their data will be findable and reusable. Data can then be requested
according to the associated license and the governance defined per dataset.
QX: What kind of communities are M4Ms targeting?

QX: How much time do M4Ms require from the participants?

Practical Elements

Q8: Why do I have to create an account on so many platforms?


The GO FAIR M4M workshops aim to create metadata that stick close to the FAIR Principles.
Although the M4M workshops are agnostic about technology platforms, the combination of
CEDAR, BioPortal and GitHub is extremely well suited to the M4M format. Furthermore,
following an M4M workshop, researchers and data stewards will want, from time to time, to
create new templates or adapt or extend existing ones. To save valuable time in the workshop,
we suggest creating these accounts before the workshop, and have the login information ready
at hand. The following platforms are essential for the creation, adaptation, or extension of M4M
templates and for running the M4M workshop itself:
a. CEDAR Workbench - our primary tool for authoring metadata webforms that can automatically
produce domain-relevant machine-actionable metadata following community standards.
b. BioPortal - a repository of over 800 controlled vocabularies that powers the auto-complete and
drop-down menus in CEDAR webforms.
c. GitHub - the FAIR Data Collective has created a service running in GitHub that automatically
registers your vocabulary in BioPortal from a Google Sheet.
d. Google - we like to use Google Docs and Sheets for group activities.
e. ORCID - or a similar globally unique, persistent and resolvable identifier for yourself; a FAIR
identifier for yourself is essential if you are ever to author FAIR data and metadata.
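As a rough illustration of the sheet-to-vocabulary flow in (c), the transformation from Google Sheet rows (exported as CSV) to registrable terms might look like the sketch below. The column names, IRI scheme, and output shape are assumptions for illustration; the real FAIR Data Collective service will differ:

```python
import csv
import io

# Illustrative sketch: turning Google Sheet rows (exported as CSV) into
# simple SKOS-like term records ready for registration in a vocabulary
# repository. Column names ("label", "definition") are assumed.

SHEET_CSV = """label,definition
antimicrobial resistance,Ability of a microbe to withstand an antimicrobial drug
metadata template,A reusable structure of metadata fields
"""

def sheet_to_terms(csv_text: str, base_iri: str) -> list[dict]:
    """Give each sheet row a stable IRI and SKOS-style properties."""
    terms = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        terms.append({
            "@id": f"{base_iri}/term/{i:04d}",
            "skos:prefLabel": row["label"],
            "skos:definition": row["definition"],
        })
    return terms

terms = sheet_to_terms(SHEET_CSV, "https://example.org/vocab")
print(terms[0]["skos:prefLabel"])  # antimicrobial resistance
```

The point of the automation is exactly this kind of mechanical step: domain experts only maintain a familiar spreadsheet, while the service assigns stable identifiers and produces the structured vocabulary that CEDAR forms can then reference.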
