@openaire_eu
A user journey in
OpenAIRE services
through the lens of repository managers
Pedro Príncipe, University of Minho; Alessia Bardi, CNR-ISTI; André Vieira, University of Minho; Jochen Schirrwagen, Bielefeld University
OR2019 Workshop – 1st part
WORKSHOP MAIN TOPICS – OpenAIRE SERVICES
Interoperability guidelines
OpenAIRE Research Graph
Content acquisition policy
Literature guidelines V4
Explore service (demo)
Infrastructure novelties
Services for Repo Managers
Dashboard for Content Providers
Broker & usage statistics
Dashboard demo
Service test drive
User feedback
09:00 – Welcome and introduction, Pedro Príncipe
09:20 – OpenAIRE graph expansion: an academic graph aggregating all information required to deliver
monitoring tools
09:50 – OpenAIRE content acquisition policy and the new terms of agreement for content providers.
10:05 – Explore service demo (and beta test drive) + Showcase metadata quality issues
10:30-11:00 – Coffee break
11:00 – OpenAIRE interoperability guidelines overview
11:10 – Guidelines for Literature Repositories: implementation and early adopters
11:30 – RCAAP use case & HAPLO use case
11:50 – OpenAIRE Validator demo - testing the compliance against version 4
12:00 – Breakout groups - discussion
12:20 – Wrap-up
AGENDA
14th International Open Repositories Conference, June 10th, Hamburg, Germany
SLIDES HERE:
bit.ly/openaire_or2019
http://box.openaire.eu/index.php/s/V9xSCykE5oklxMC
Workshop Topics – 1st part
1) OpenAIRE graph expansion: an academic graph aggregating all information required to deliver monitoring tools
2) OpenAIRE content acquisition policy and the new terms of agreement for content providers.
>>> Explore service demo (and beta test drive)
>>> Showcase metadata quality issues
3) OpenAIRE interoperability guidelines overview
4) Guidelines for Literature Repositories: implementation and early adopters: RCAAP use case & HAPLO use case
>>> Validator demo - testing the compliance against guidelines V4
>>> Breakout groups
Building the OpenAIRE research graph and the Dashboard services
[Diagram: content providers (publication, data, hybrid and software repositories, OA journals, registries, research infrastructures) reach the infrastructure through harvesting, uploading and brokering, under the Guidelines and the Terms of Use. The infrastructure applies validation, cleaning, de-duplication and inference to a graph of Product (Publication, Data, Software, ORP), Project, Community, Funder, Funding, Organization and Source. Target groups: research communities, researchers (all), content providers, innovators, research managers, funders.]
OpenAIRE Services
From basic infrastructure level to value added
https://explore.openaire.eu
https://beta.explore.openaire.eu
www.menti.com
192517
1. OpenAIRE graph expansion
An academic graph aggregating all information required to deliver monitoring tools
Slides by Paolo Manghi
Presented by Alessia Bardi
Institute of Information Science and Technologies - CNR
Science publishing, evaluation, and monitoring
Open
Science
Providing an open metadata
research graph of interlinked
scientific products, with access
rights information, linked to
funding information and research
communities
The OpenAIRE research graph
Open
Complete
De-duplicated
Transparent
Participatory
Decentralized
Trusted
OpenAIRE e-infrastructure
Materializing the Open Science graph
[Diagram: graph entities Project, Community, Funder, Funding, Product (Publication, Research Data, Software, Other research products), Organization and Source, populated by mining, harvesting and de-duplication.]
• Harvested data sources: 10K+
• Harvested records: 500Mi+
• Publication full-texts: 7.5Mi (soon 10.5Mi+)
• Harvested/mined links: 200Mi+
• Repositories and publishers
Download from URLs in harvested metadata: 6.8Mi
Machine-learning on OA URLs from large aggregators (DOAJ, CrossRef/Unpaywall): 3.3Mi (downloaded, under integration in BETA)
• Publishers metadata/PDFs via CORE-UK
ResourceSync
Springer Open Access, etc.: 750K
Open Access article sources
Mining results: links
[Diagram: links inferred by mining between the graph entities (Project, Community, Funder, Funding, Product, Organization, Source), with relation types fundedBy, affiliated, refersTo, similarTo and relatedWith. Counts shown on the slide: 5.17M, 1.96M, 1.24M, 218M, 75k, 40k.]
• Document classification: 3.86M of pubs with at least one
class assigned:
arxiv: 2.35Mi, meshEuroPmc: 3.64Mi, acm: 832k
• Document properties
New abstracts: 1.3Mi
• Document references
168.44M bibliographic references for 5.33M pubs
• Document external links
PDB reference extraction: 320k references (68k of unique pubs)
Mining results: properties
Context Propagation
[Diagram: propagation of context across Product, Project, Source and Community nodes via the relations supplementedBy, fundedBy, hostedBy and ofInterest.]
[Diagram (BETA): Product, Source, Country, Project, Organization and Funder nodes linked by hostedBy (institutional repository), funds (national funder), fundedBy, jurisdiction, located and operates.]
Open metadata
[Diagram: sources of open metadata, including Springer, Microsoft, repositories and archives.]
Complete aggregation coverage
[Diagram: the Academic Graph with entities Project, Community, Funder, Funding, Product (Publication, Research Data, Software, Other research products), Organization and Source, each annotated "… and more".]
De-duplicated
Entity type | # Collected records | # Records after cleaning and de-duplication | # Identified duplicates
Publications | ~343M | ~94M | ~249M
Data | ~5.2M | ~4.6M | ~600K
Software | ~150K | ~134K | ~20K
Other | ~5M | ~4.5M | ~500K
Organisations | ~380K | ~220K | ~160K
More information about the de-duplication framework used by OpenAIRE can be found by searching Zenodo for:
• “De-duplicating the OpenAIRE Scholarly Communication Big Graph” (poster)
• “GDup: De-Duplication of Scholarly Communication Big Graphs”
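As a rough worked check of the de-duplication figures, the share of collected records identified as duplicates can be computed per entity type. The numbers below are the approximate values from the slide; the function name and the rounding to one decimal are illustrative choices, not part of any OpenAIRE tool.

```python
# Approximate figures copied from the slide:
# entity type -> (collected records, identified duplicates)
FIGURES = {
    "Publications": (343_000_000, 249_000_000),
    "Data": (5_200_000, 600_000),
    "Software": (150_000, 20_000),
    "Other": (5_000_000, 500_000),
    "Organisations": (380_000, 160_000),
}


def duplicate_share(collected: int, duplicates: int) -> float:
    """Return the percentage of collected records identified as duplicates."""
    return round(100 * duplicates / collected, 1)


if __name__ == "__main__":
    for entity, (collected, duplicates) in FIGURES.items():
        print(f"{entity}: {duplicate_share(collected, duplicates)}% duplicates")
```

On these approximate figures, about 73% of collected publication records are duplicates, against roughly 10% to 13% for data, software and other products.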
• Rely on quality scholarly communication sources of different kinds
Participatory
• Include solutions and content from any interested and known content provider in scholarly communication
Institutional repositories
Aggregators
Data archives
Software repositories
Research infrastructure sources
Funder grant databases
Authors & Orgs entity registries
Publishers & journals
• Metadata in the graph includes provenance when harvested and reliability indicators when obtained from mining
Transparent
• Preservation and ownership beyond OpenAIRE
Exchanged with other graph initiatives
Redistributed via subscription and notification to contributing data sources (provide.openaire.eu)
• Openly accessible via APIs (develop.openaire.eu)
Decentralized
• Authors in the loop to enrich their ORCID record
• Validation of end-user “claims”
Trusted (in progress)
Monitoring Tools
OpenAIRE Open Science Monitoring
Research communities
Research admins
Funders
Research Organization
Open Access/Science trends
Research impact
Research Community
Dashboard for Research
Infrastructures (Initiatives)
For building community-specific
gateways to Open Science
A virtual environment to implement Open Science publishing practices
Report to funders
Uptake of Open Science
publishing practices
Research Impact
All the relevant
research products
DEPOSIT ANYTHING
… linked
On demand publishing on services of
Research Infrastructures
Ongoing collaborations
Research Infrastructures/initiatives; Disciplinary research communities
OpenAIRE - EOSC Hub - EC meeting | Amsterdam | 15th Dec 2017
• Sustainable Development Solutions
Network (Greece)
• Agricultural and Food Science
• Fisheries and Aquaculture Management
• European Marine Science
• Neuroinformatics
• Digital Humanities and Cultural Heritage
Aims for Research Infrastructures
Funding impact: publications, research data, software published thanks to the existence of the RI
Open Access/Science impact: monitoring of Open Science impact (data/software FAIRness, reproducibility trends)
• Open Science indicators
Added value functionalities
Funder Dashboard
(Project dashboard)
Aims
Funding impact: publications, research data, software published thanks to grants awarded by the funder
Open Access/Science impact: monitoring of Open Science impact (data/software FAIRness, reproducibility trends)
• Funders
• Trends in research fields: new (multidisciplinary) disciplines
• Projects
• Interconnections, possible liaisons
• Institutions
• OA/OS behavior, ability to attract cross-funder grants
Added value functionalities
Institutional Dashboard
(under development)
Aims
Research impact: ability of researchers affiliated with the institution to produce innovative and quality scientific products
Service capacity impact: ability of services maintained and operated by the institution to support researchers at producing or storing scientific products
Funding impact: ability of institution to reach funding from different funders and disciplines
• Funders
• Recent and past EC and other funders’ activities (representing various
funding levels)
• Checking compliance to funder mandates
• Institutions
• Collaboration network (by institution) via projects and products
• Projects
• Compare project portfolio against that of other similar institutions
(anonymized?)
Added value functionalities
Tell us!
Thought about an added value functionality we did not mention?
Explore the BETA graph
and tell us how to improve!
https://beta.explore.openaire.eu
Questions?
2. OpenAIRE content acquisition policy
New terms of agreement for content providers
ALL Literature, Research data,
Software, Other research products
www.openaire.eu/policies
Open Access & non-Open Access material
OpenAIRE Content Acquisition Policy + complete
Respecting the OpenAIRE guidelines (DataCite metadata)
Using PIDs with resolvers
released 05-Oct-2018,
https://doi.org/10.5281/zenodo.1446408
www.openaire.eu/content-aquisition-policy
ALL SCIENTIFIC RESEARCH
PRODUCTS
literature, dataset, software,
other research products
METADATA QUALITY
with minimal quality conditions under which metadata can be accepted
OF ALL ACCESS LEVELS
open, closed, metadata only
what data/metadata
we collect
It’s important that the access
level of a record is made clear
Each record must contain a
PID (or URL) that resolves to a
splash page
It is vital that the access level of a record is clear (by an access level statement on record level, or alternatively by the use of specific OAI sets)
how we process
Metadata describing Open Access and non-Open Access material will be included and links to other products will be resolved where this is possible (i.e. the provided PIDs have a resolver).
as stated in the Content Acquisition Policy, published Oct. 2018
https://doi.org/10.5281/zenodo.1446408
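To illustrate what "the provided PIDs have a resolver" can mean in practice, here is a minimal sketch that turns a bare DOI or Handle into a resolvable URL. The function name and the regular expressions are simplifying assumptions for illustration; they are not taken from the OpenAIRE policy or code base. The resolver hosts doi.org and hdl.handle.net are the public DOI and Handle proxies.

```python
import re
from typing import Optional

# Simplified patterns (assumptions): a DOI starts with "10.<registrant>/",
# a Handle is "<numeric prefix>/<suffix>".
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
HANDLE_PATTERN = re.compile(r"^\d+(\.\d+)*/\S+$")


def resolver_url(pid: str) -> Optional[str]:
    """Return a resolvable URL for a PID, or None if no resolver is known."""
    pid = pid.strip()
    if pid.lower().startswith(("http://", "https://")):
        return pid  # already a URL, e.g. a splash page
    if pid.lower().startswith("doi:"):
        pid = pid[4:]
    # DOIs are checked first because every DOI is also a valid Handle.
    if DOI_PATTERN.match(pid):
        return "https://doi.org/" + pid
    if HANDLE_PATTERN.match(pid):
        return "https://hdl.handle.net/" + pid
    return None


if __name__ == "__main__":
    print(resolver_url("10.5281/zenodo.1446408"))
```

An identifier for which the function returns None has no known resolver in this sketch, so links to it could not be resolved.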
Objectives of OpenAIRE’s Aggregation Policy
CAP supported by the set of Guidelines for
Open Science Content Providers
https://guidelines.openaire.eu
https://explore.openaire.eu
https://beta.explore.openaire.eu
How and where is my repository represented in OpenAIRE
- Repository landing page, content, figures, metrics, projects/funders and organizations…
- last index update/collection monitor
https://beta.explore.openaire.eu
Novelties: orcid, collected from, communities…
EGI Application database
OMICS DI
Kaggle
ReactToMe
DOECode
Unpaywall
New data sources
Metadata Quality Challenges
Issue | Affects | Proposed Solutions
Missing values | Indexing, discovery, reuse | Curation by repository team; use OpenAIRE Validator, Broker service
Missing links and identifiers | Interlinking with other research products; contextualisation | ScholeXplorer, Broker service
Lack of controlled values | Discovery | Use agreed controlled vocabularies according to OpenAIRE Guidelines
Mandatory values only | Discovery and reuse | Broker service
• Open Access version coming from one of the sources:
https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::0ea9b3d0d7300315854e7f25e499d2b9
• Document classification:
https://explore.openaire.eu/search/publication?articleId=od______1874::6331f80a2b9758f56609a874e9addc26
• For more: look at the Content Provider Dashboard
Records enhanced by de-duplication
• Link to software (re-use):
https://explore.openaire.eu/search/publication?articleId=od________18::4405ffb18cc37d73d0daff3650e48f82
• Link from a software to its «main» publication:
https://explore.openaire.eu/search/software?softwareId=openaire____::949d7264f0efb7a27e521fee9c59209b
• Software not available on GitHub, but on SoftwareHeritage only:
https://explore.openaire.eu/search/software?softwareId=openaire____::8bf2fbf6cb1f0c9552ca0a6fd0aecfbc
Records enhanced by full-text mining (1)
• Reference to a Research infrastructure:
https://explore.openaire.eu/search/publication?articleId=nora_uio__no::3197de1949480eb9f3fc82ba26ad2e25
• Link to project:
https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::4be652d611c4bbcf897118bdb564c557
Records enhanced by full-text mining
• Take care of your PIDs:
https://explore.openaire.eu/search/dataset?datasetId=dedup_wf_001::69a0263a2925140e015c4470779f79c1
Quality issues
(Aaltodoc Publication Archive, DSpace)
comparison of OAI-PMH OpenAIRE endpoint/set and standard endpoint/set
• different number of records (due to former OpenAIRE Content Acqu. Policy)
completeListSize="12413" vs. completeListSize="36886"
• non-normalised resource types
<dc:type>info:eu-repo/semantics/article</dc:type> vs.
<dc:type>A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä</dc:type>
• non-normalised or missing access levels
<dc:rights>info:eu-repo/semantics/openAccess</dc:rights> vs.
<dc:rights>openAccess</dc:rights>
A not so rare example
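The comparison above can be sketched in a few lines: read completeListSize from an OAI-PMH response and normalise a free-text access level to the info:eu-repo/semantics vocabulary. This is a sketch under assumptions: the function names and the sample response are hypothetical, and completeListSize is an optional attribute of resumptionToken, so a repository may not expose it.

```python
import xml.etree.ElementTree as ET
from typing import Optional

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
RIGHTS_PREFIX = "info:eu-repo/semantics/"
ACCESS_LEVELS = ("openAccess", "embargoedAccess", "restrictedAccess", "closedAccess")

# Hypothetical, shortened ListRecords response used only for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken completeListSize="12413" cursor="0">token-1</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def complete_list_size(response_xml: str) -> Optional[int]:
    """Return the completeListSize of the resumptionToken, if present."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    if token is None or token.get("completeListSize") is None:
        return None
    return int(token.get("completeListSize"))


def normalize_rights(value: str) -> Optional[str]:
    """Map a dc:rights value to the info:eu-repo access level, or None."""
    term = value.strip()
    if term.startswith(RIGHTS_PREFIX):
        term = term[len(RIGHTS_PREFIX):]
    for level in ACCESS_LEVELS:
        if term.lower() == level.lower():
            return RIGHTS_PREFIX + level
    return None  # not an access level statement, needs manual curation


if __name__ == "__main__":
    print(complete_list_size(SAMPLE_RESPONSE))
    print(normalize_rights("openAccess"))
```

Running the same size check against the OpenAIRE set and the standard set of a repository makes a gap such as 12413 vs. 36886 records visible before harvesting starts.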
3. OpenAIRE interoperability guidelines overview
Guidelines for Literature Repositories: implementation and early adopters
@openaire_eu
OpenAIRE Interoperability Guidelines
Jochen Schirrwagen
Bielefeld University Library, Germany
Workshop A User Journey in OpenAIRE Services | Hamburg | 10 Jun 2019
Evolution of OpenAIRE-Guidelines
2010: Literature Guidelines v1
2012: Literature Guidelines v2; Data Guidelines v1
2013: Literature Guidelines v3
2014: Data Guidelines v2
2015: CRIS-CERIF Guidelines v1
2018: Guidelines for institutional and thematic repos. v4.0; CRIS-CERIF v1.1
2018: Guidelines for Software Repositories; Guidelines for Other Research Products
Diversity of Research Results from Different Types of Sources
Publications
• Article
• Preprint
• Report
• …
Datasets
• Dataset
• Collection
• Clinical Trials
• …
Software
• Research Software
• …
Other Research Products
• Service
• Workflow
• Interactive Resource
• …
Sources: institutional/publication repositories; journals/publishers; data repositories; other products repositories; software repositories
Metadata Goals in OpenAIRE
Goal | Metadata Groups
Discovery and Citability | Descriptive metadata
Accessibility and Reuse | Access Rights, License Conditions
Contextualization | Research Project, Linked Research Artefacts
Interoperability | Identifier for Entities, Controlled Vocabularies
Reporting | Funding Reference
TDM | File Location, License Conditions
OpenAIRE‘s Guidelines for
Open Science Content Providers
https://guidelines.openaire.eu
Metadata describing Open Access and non-Open Access material will be included and links to other products will be resolved where this is possible (i.e. the provided PIDs have a resolver), as stated in the Content Acquisition Policy.
Role of PIDs in OpenAIRE
OpenAIRE Guidelines for Literature Repository Managers v4.0
http://dx.doi.org/10.5281/zenodo.1299203
(Released Nov-2018)
• Established standards: Dublin Core and DataCite metadata scheme
• To describe different kinds of scholarly works
• Defines Application Profile
• Controlled Vocabularies and Persistent Identifiers for different entities
required
● PIDs for scholarly publications (with versioning)
● Deposition of content with LTP programme (e.g. CLOCKSS)
● Article level metadata in an interoperable non-proprietary format, under a CC0 public domain dedication, incl. funding information
● Machine-readable information on Open Access status and the license
Plan S - Requirements and Recommendations
recommended (strongly)
● PIDs for authors (e.g., ORCID), funders, funding programmes and grants, institutions, and other relevant entities.
● Registering self-archiving policy of the venue in SHERPA/RoMEO.
● Availability for download of full text for all publications (including supplementary text and data), e.g. JATS XML.
● Direct deposition of publications by the publisher into … Open Access repositories that fulfil the Plan S criteria.
● OpenAIRE compliance of the metadata.
● Linking to data, code, and other research outputs.
● Openly accessible data on citations according to the standards by the Initiative for Open Citations (I4OC).
Plan S - Requirements and Recommendations
Implementation in Repositories
Software | Supported Version | Status | Comments
DSpace | 7 (in prep.); 5 & 6 (in test) | In preparation - DSpace OpenAIRE 4.0 WG; implementations by PT repos RCAAP for v.5 | 70 days effort (WG timeline plans); documentation will be available ASAP
EPrints | All | Contacted | May need funding via Jisc or OpenAIRE
Invenio / Zenodo | | On their roadmap |
Islandora | | Contacted |
Librecat | | Contacted |
OPUS | 4 (in prod.) | Contacted |
MyCoRe | | Contacted |
HAL | | Contacted | May have very limited resources
Fedora | | Will contact |
Haplo | | Implemented |
Implementation Examples
● Guidelines at https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/
● Schema and examples on GitHub: https://github.com/openaire/guidelines-literature-repositories
References
Questions?
OpenAIRE Guidelines 4
@ RCAAP Project
José Carvalho
jcarvalho@sdum.uminho.pt
Why?
• Need for a specific format for scientific publications
• More specific metadata fields
• Support for hierarchical information (concept of entities)
• Metadata alignment with other services (DataCite, OpenAIRE, …)
What changes?
• Repositories
• Journals
• Search Portal
• Policies
What Changes? On Repositories…
- Submission forms
- Expose in OpenAIRE 4 metadata schema (OAI-PMH)
Transformation of project ID into project entity
info:eu-repo/grantAgreement/EC/FP7/612425/EU
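The transformation of a project ID into a project entity can be sketched as splitting the info:eu-repo/grantAgreement string into its parts. The sketch assumes the segment order funder / funding programme / project ID / jurisdiction suggested by the example above; the class and function names are hypothetical and do not come from the RCAAP or DSpace code.

```python
from dataclasses import dataclass
from typing import Optional

PREFIX = "info:eu-repo/grantAgreement/"


@dataclass
class ProjectEntity:
    """Illustrative project entity; field names are assumptions."""
    funder: str
    funding_programme: str
    project_id: str
    jurisdiction: Optional[str] = None


def parse_grant_agreement(identifier: str) -> ProjectEntity:
    """Split a grantAgreement identifier into a project entity."""
    if not identifier.startswith(PREFIX):
        raise ValueError(f"not a grantAgreement identifier: {identifier!r}")
    parts = identifier[len(PREFIX):].split("/")
    # Funder, programme and project ID are treated as mandatory here.
    if len(parts) < 3 or not all(parts[:3]):
        raise ValueError(f"incomplete grantAgreement identifier: {identifier!r}")
    jurisdiction = parts[3] if len(parts) > 3 and parts[3] else None
    return ProjectEntity(parts[0], parts[1], parts[2], jurisdiction)


if __name__ == "__main__":
    print(parse_grant_agreement("info:eu-repo/grantAgreement/EC/FP7/612425/EU"))
```

Keeping the parts as separate fields is what lets a repository expose funder, programme and award number as distinct elements instead of one opaque string.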
Submission Forms (authors)
• Author IDs
• COAR Taxonomies
Expose OAI-PMH with oai_openaire
What Changes? On Journals (OJS)…
• OAI-PMH (not yet developed)
What Changes? On Search Portal…
• New harvesting process
• Support for multi metadata schema (oai_dc; xoai; oai_openaire)
• Support for type of resource (repository, journal, …) and type of metadata schema
• New transformation and validation processes
• DRIVER Types → COAR Types
• New ways to present the information
• On the user interface
• On OAI-PMH
• On REST API
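Harvesting the same repository in several metadata schemas comes down to changing the metadataPrefix argument of the OAI-PMH request. A minimal sketch, assuming a hypothetical base URL; the three prefixes are the ones named on the slide, and verb, metadataPrefix and set are standard OAI-PMH request arguments.

```python
from typing import Optional
from urllib.parse import urlencode

# Metadata prefixes named on the slide.
METADATA_PREFIXES = ("oai_dc", "xoai", "oai_openaire")


def list_records_url(base_url: str, metadata_prefix: str,
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for one metadata schema."""
    if metadata_prefix not in METADATA_PREFIXES:
        raise ValueError(f"unsupported metadataPrefix: {metadata_prefix!r}")
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint, used only for illustration.
    base = "https://repository.example.org/oai/request"
    for prefix in METADATA_PREFIXES:
        print(list_records_url(base, prefix))
```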
What Changes? On Policies level…
• New harvesting policy for the National Harvester
• Important for content regulation
• Development of a new profile for OpenAIRE 4 on the RCAAP Validator
• Important to align with national and international services and developments
Final Considerations
• Pilot DSpace 5 instance with OpenAIRE Guidelines 4 implemented
• All the information needed to implement it in the DSpace software will be made available
Participation in the DSpace OpenAIRE Working Group
• New harvesting rules and a pilot with OpenAIRE 4 soon
• Mappings of information
• Still some information gaps between services (information may be lost)
• Already some suggestions for OpenAIRE Guidelines 4.1
Thanks!
José Carvalho – jcarvalho@sdum.uminho.pt
• How to optimize the information exchange (metadata
and fulltext) between repositories and OpenAIRE? How
to reduce the burden to repository managers?
• How could we help - what kind of support would you like to have from OpenAIRE?
• What are the major metadata quality issues and how to
solve them?
Breakout groups – questions:
Thank you!
Pedro Príncipe, University of Minho, pedroprincipe@sdum.uminho.pt
Alessia Bardi, CNR-ISTI, alessia.bardi@isti.cnr.it
André Vieira, University of Minho, andrevieira@sdum.uminho.pt
Jochen Schirrwagen, Bielefeld University, jochen.schirrwagen@uni-bielefeld.de
info@openaire.eu