Information Processing and Management 58 (2021) 102480

Evaluation and analysis of Data Management Plan tools: A parametric approach
Sagar Bhimrao Gajbe 1, Amit Tiwari 1, Gopalji ∗,1, Ranjeet Kumar Singh 1
Research Scholar at DRTC, Indian Statistical Institute, 8th Mile Mysore Road, Bangalore-560059, India

ARTICLE INFO

Keywords: Data Management Plan; Data Management Plan tools; Data Life Cycle; Research Data Management

ABSTRACT

This paper explores the openly available DMP tools and presents a comparative analysis aimed at assisting researchers and
data managers in formulating effective data management plans. Based on a literature review, 14 DMP tools were selected and
evaluated using 45 parameters. The study lists and explains the features of the DMP tools, identifies several gaps in DMP
practices, and provides a few recommendations that can improve the existing tools and DMP practices. Compared to related
works, the present work sheds additional light on the percentage coverage of parameters by each tool and the percentage
coverage of tools by each parameter. The selected tools cover 50%–84% of the parameters, whereas 78% of the parameters are
covered by half of the selected DMP tools. Moreover, 28% of the tools cover 60% of the DMP assisting parameters.
Additionally, the co-occurrence of parameters and the correlation among the tools are illustrated using matrices. The
co-occurrence of the data description/summary/collection, documentation and metadata, findability, and accessibility
parameters was found to be relatively higher, and all the selected tools are positively correlated with each other. The
study can help researchers, librarians, data managers, and funding agencies select an appropriate DMP tool as per their
requirements.

1. Introduction

Research requires a huge amount of funding, infrastructure, resources, support, hard work, and passion. Moreover, it generates
enormous amounts of crucial data. Managing such data requires an explicit approach that facilitates Research Data
Management (RDM). However, until recently, data management received little attention from its stakeholders, such as
funders, researchers, data managers, libraries, and research organizations, although it is now being included as a condition for
the grant of funds (e.g., the National Science Foundation and the Wellcome Trust have made data management plan submission
mandatory for availing research funds). The stakeholders need to establish a systematic data management plan (DMP) that can
facilitate the exchange and reuse of research data.
DMP has three components: data, management, and plan (Smale et al., 2018). Data is a raw fact that can be further processed,
analyzed, and arranged for gaining meaningful insights. Management refers to the strategies that effectively organize anything to
minimize time and effort while maximizing productivity. The plan is a systematic course of action to achieve a pre-defined goal.
Data management includes necessary steps, measures, and strategies to manage the entire data life cycle. It involves identification,
collection, preparation, organization, classification, processing, analysis, storage, publishing, curation, and reuse of data (Gupta &
Müller-Birn, 2018). The concept of data management is often confused with data repositories; however, data management is

∗ Corresponding author.
E-mail address: [email protected] (Gopalji).
1 Equal contribution.

https://doi.org/10.1016/j.ipm.2020.102480
Received 30 July 2020; Received in revised form 17 December 2020; Accepted 20 December 2020
Available online 5 February 2021
0306-4573/© 2021 Elsevier Ltd. All rights reserved.

the umbrella term that might require data repositories as a storage unit for data preservation and access (Antonio et al., 2019). Data
management is meant for optimum use of data.
The DMP is a document that explicitly outlines the strategies for data management at each stage of the data life cycle (Ball,
2012). A DMP is crucial at various stages of a project: for example, a minimal plan is prepared for a project proposal, a core plan is
presented during the project, and a full plan is formulated once the project is completed (Donnelly & Jones, 2011). The first phase
is important from the funder’s perspective as it explains their funding criteria and helps the researchers to get funding. The second
phase is critical from the researchers’ perspective and helps them to handle their data throughout the ongoing research process.
The third phase is significant from a general perspective. It is useful for the funders, researchers, librarians, users, data curators,
etc., since it provides comprehensive metadata and documentation regarding the entire data life cycle. DMPs, all together in three
phases, facilitate data sharing, data curation, and optimum reuse of data. DMP preparation helps to make data FAIR i.e., findable,
accessible, interoperable, and reusable (Wilkinson et al., 2016).
A data management plan can be prepared directly with word templates or with the aid of tools. Direct manual preparation can
result in many versions for upstream and becomes tedious owing to a lack of training and awareness, whereas the guidance and
support offered by a tool make the DMP process less tedious. Moreover, tool assistance reduces effort and time. Hence, DMPs can
be effectively designed and managed using DMP tools. Considering this, the Digital Curation Centre (DCC) developed the first tool
(DMPOnline) to design a DMP (Donnelly et al., 2010). Over time, DMPs and DMP tools have been gaining popularity. Several DMP
tools and templates are discussed in the relevant section of this article. Selecting an appropriate, purpose-specific DMP tool is
challenging and requires a strong comparative basis. Therefore, to overcome this challenge, the present study helps DMP
stakeholders to decide on a suitable DMP tool based on their requirements.

1.1. Objective and contribution

The research objectives of this work are as follows:

• To identify the parameters that can be used to define and evaluate the DMP tools.
• To find out the openly available DMP tools to design a DMP.
• To explore the comprehensiveness of DMP tools based on their parameter coverage and vice versa.
• To find out the relatedness between the DMP tools.

The major contributions of this paper include:

• The work identifies the existing literature on DMP tools.
• The study found there are significantly fewer reviews on DMP and DMP tools. Moreover, reviews are less comprehensive as
most of them include fewer tools and parameters for the evaluation.
• We have formed a set of parameters to review the selected DMP tools.
• The study identifies the correlation among the DMP tools.
• The evaluation based on parameters helps the researchers, data managers, research funders, and library and information
science professionals in their data management and curation.

The rest of this paper is organized as follows: Section 2 presents a literature review of DMP tools and discusses the related works.
This section also identifies gaps in the literature and justifies the relevance of the present study. Section 3 illustrates the research
methodology. Section 4 lists all the tools available online, describes the parameters identified for evaluation, and addresses the
findings. An extended discussion is carried out in Section 5. Section 6 highlights the theoretical and practical implications of the
work. Finally, Section 7 concludes the paper and provides directions for future developments.

2. Literature review

2.1. Data management plan

According to Wittman and Aukema (2020), data management facilitates the reusability of data and hence saves time and
resources. A systematic plan improves the data management process; therefore, several funders have made it mandatory to submit
a detailed data management plan for research grants (Stodden et al., 2019). In support of the above statement, Kennan (2018)
examines the need for data management in research and lists a total of ten basic steps required to design a DMP. Further,
the author discusses two different data life cycles, i.e., Digital Curation Centre (DCC) and Data Documentation Initiative (DDI),
based on a data management plan, process, legal, ethical, and policy requirements.
DMP provides step-by-step guidance to manage the data. It is developed using several components such as metadata standards,
access policies, archiving policies of the data, etc. (Baykoucheva, 2015; Nightingale, 2020). Additionally, DMP development
depends upon various factors such as the significance of the project, geographical scope, funding source, complexity, duration,
number of participating organizations, etc. (Sutter et al., 2015). Lefebvre et al. (2020) addressed organizational and technological
challenges towards RDM. The study also explored the RDM stakeholders and practices. Developing a standard operating procedure
(SOP) for a data management plan is critically significant for an institute. An SOP comprises a set of DMP components. It includes
the selection of a database for data management, defining the management task, data entry, file management, data cleaning, coding
and reconciliation, data processing, risk monitoring and preservation, and archiving (Brand et al., 2015).

The concept of data management plan is new, and there is a lack of awareness about it among the scientific community (Parham
et al., 2016; Vitale & Moulaison Sandy, 2019). In furtherance of the above, Bishop et al. (2020) interviewed 14 Belmont Forum
members and found a void in existing knowledge of data management theory and practice. To fill this void, extensive
training programs are needed at the institute level to raise awareness of DMPs among researchers. Moreover, Holles and
Schmidt (2018) advocated the need for teaching DMP in RDM courses for graduate students, co-taught by librarians
and faculty members. Library professionals and administrative personnel can effectively help researchers in designing a DMP.
DMP assistance is one of the major research data services that libraries provide nowadays (Claibourn, 2015; de León & de Ferrer,
2018). Libraries can facilitate all four layers of data management, namely curation, preservation, archiving, and storage.
Hence, library professionals are in a better position to assist and train researchers in designing a DMP and, furthermore, in data
management. In support of the above statement, Exner (2018) discussed a case where agricultural librarians support and assist
faculties of USDA NIFA in writing the DMP.
Recently, several surveys and case studies have been conducted on RDM and DMPs. Aleixandre-Benavent et al. (2020) surveyed
Spanish researchers in the health sciences to identify their habits and current experiences in managing and sharing raw research
data. The authors found that 54.9% of researchers do not have DMPs for their research data. They also learned that 81% of
researchers use personal computers for storage because of the fear of misuse or misinterpretation of data and loss of authorship.
Similarly, Melero and Navarro-Molina (2020) identified several concerns and concluded that researchers are unclear about the
concept of working plans and data management plans. In another study, Bunkar and Bhatt (2020) found that 88% of the researchers
of Parul University supported data sharing and reuse but were concerned about intellectual property rights while sharing the
data for public use. The questionnaire-based findings of Dogan et al. (2020) indicate that Turkish scholars have little experience
in creating data management plans. Kaari (2020), in a survey that includes three Arab University faculties, encountered a very
low response rate, i.e., 8%. Based on the responses, it was found that 97% of respondents use personal computers for storing data.
With the help of three cases of clinical research, Bowman and Maxwell (2018) demonstrated how a DMP can be a good starting point
for managing protected health information (PHI). They argued that an effective DMP can handle the challenge of safe and effective
use of PHI. Recent literature reflects that there is significantly less awareness of RDM and DMP among researchers; however,
these topics are getting more attention (Redkina, 2019).
Miksa, Cardoso et al. (2019) summarize the work of the RDA working group on DMP Common Standard. The study identifies
DMP stakeholders, narrows down the scope of the common data model for machine-actionable DMP (maDMP), and investigates
the necessity of automating DMP tasks. The authors concluded that data management is essential not only to safeguard the data
but also to ensure its correct interpretation and reuse. Later, Miksa, Simms et al. (2019) presented ten principles that can be
applied to put maDMPs into practice. The principles are derived from both existing DMP practices and future requirements. In
another work, Cardoso et al. (2020) showcase the usage of semantic technologies and envisage that they can be used to express and
exploit the maDMP features. Additionally, Bakos et al. (2018) present a prototype of an automatically generated maDMP. In a
similar attempt, Romanos et al. (2019) developed a novel methodology for data documentation that can be used as a starting
point for scientific data management. They developed an ontology for data harnessing and selected the material science and
engineering domain as a proof of concept.
Since several funders made it mandatory to prepare a data management plan for getting funds (Stodden et al., 2019), it
led to the development of data management plan tools, e.g., DMPOnline, DMPTool, ezDMP, etc. These tools contain a number
of funders’ guidelines and domain-specific templates. The templates enlist numerous questions and guidelines for planning data
management (Reilly & Dryden, 2013). These guidelines are provided by institutions including the Digital Curation Centre
(DCC), MIT libraries, the Australian National University, the National Institutes of Health, and the Rural Economy Land Usage
Programme (Swauger, 2015). There are plenty of product reviews and descriptive articles on various data management plan
tools. Donnelly et al. (2010) advised creating a web-based tool named DMPOnline, which can assist researchers in creating and
exporting customizable DMPs according to UK research funders’ requirements. DMPOnline is one of the earliest data management
plan tools. In another study, Mallery (2014) highlighted features and identified the strengths and weaknesses of a data management
plan tool, namely, DMPTool. Giorgio and Ronzino (2018) provide an overview of the PARTHENOS DMP template which was
developed for the archeology researchers. It is based on the Open Science initiative and FAIR principles. In addition to that, Black
(2018) provided a critical product review of a data management plan tool DMP Assistant. In another article, Pergl et al. (2019)
describe and establish the need for a novel DMP tool Data Stewardship Wizard (DSW). The tool is based on FAIR principles. The
authors claimed that DSW can be a step towards a machine-actionable DMP. Stodden et al. (2019) used controlled vocabulary and
semantic descriptors for developing a DMP. They selected ezDMP as a DMP tool for implementing their ideas. The workflow of
their work accommodates ezDMP such that it acts as a connecting link between the researcher and funders. The study communicates
to DMP stakeholders regarding the policies of artifact availability and explains to them about the artifact creation, archiving, and
their reuse. A web tool TUB-DMP was developed based on the recommendations of project Horizon 2020 that aimed to maximize
research data reuse. TUB-DMP has a Horizon 2020 template that includes the questions on data summary, FAIR data, allocation of
resources, data security, ethical aspects, and other relevant issues (Kamocki et al., 2019; Kuberek, 2018).

2.2. Related work

A plethora of tools and templates have been developed for data management planning. However, there have been significantly fewer
efforts to review and evaluate them. In a review, Cauchick-Miguel et al. (2020) highlight some traits associated with research data
management. Based on existing literature, the study provides a set of checklists and a step-by-step guide for designing a
DMP. The work of Cauchick-Miguel et al. (2020) is still ongoing, and the complete results are yet to be presented and discussed.
Sallans and Donnelly (2012) discussed some strategies in the implementation of DMPOnline and DMPTool. In the study, tools were
distinguished on their cultural and philosophical basis. Moreover, it was suggested to have the DMPs based on country, institute,
domain, and funder. However, they included only a few tools and evaluation parameters. In another study, Cope (2013) considered a
generic template of DMPOnline for comparing three other templates, viz. Research360 DMP template, twenty questions about research,
and DataTrain Postgraduate Data Management Plan Template. It provides a set of recommendations such as (a) researchers should work
in a group, (b) regular workshops are essential to guide the researchers, and (c) the competency questions of templates should be
precise and relevant.
Jones et al. (2019) presented a parameter based evaluation of ten data management plan tools. These tools include DMPOnline,
DMPTool, tools built on DMPRoadmap codebase, easy.DMP, Data Stewardship Wizard (DSW), Research Data Management
Organizer (RDMO), Research Data Manager (UQRDM), DataWiz, ezDMP, and OpenDMP. In the study, ten parameters were chosen
for the evaluation of the tools. The parameters are name of the tool, operator organization, production release year, functions of the
tool, user support, tool adopting organization, delivery mechanism, funding support, tool codebase, and API support. Although the
study contains a significant number of tools, it has considered only a few parameters for evaluation. Moreover, the parameters are
about the technical specifications of the tools. Furthermore, the study, which is an opinion article, does not present its methodology
explicitly.
Williams et al. (2017) have identified and reviewed 43 DMP requirements (topics) for research funders. The authors found a high
variation in requirements and suggestions for DMP topics among funders, which results in inconsistency while writing the DMP.
They also found that most of the funders emphasize post-publication DMP requirements. In the paper, the authors divided DMP
preparation into two parts, namely, upstream (i.e., pre-publication) and downstream (i.e., post-publication). The authors suggested
there is a significant requirement of upstream DMPs as well since it provides traceability and reproducibility to data. Their work
compares DMP topics based on the funders’ perspective whereas our work compares DMP tools based on the researchers’ perspective.
For the comparison, our work includes not only DMP requirements (named as DMP templates) but also other DMP tool aspects such
as technical specifications and tool features.
It is evident that Jones et al. (2019), Sallans and Donnelly (2012), and Williams et al. (2017) have examined either the DMP tools
or the parameters. The present work comprehensively extends their works, particularly their findings; however, it differs from
them in approach and scope.

3. Methodology

This study is based on a literature review and a comprehensive exploration of DMP tools. The review was carried out in four
steps: (i) literature search strategy, (ii) literature identification, (iii) tool identification, and (iv) data reporting and analysis.

3.1. Literature search strategy

An advanced search with the terms ‘‘data management plan’’, ‘‘DMP’’, ‘‘DMP tools’’, and ‘‘data management plan tools’’ was performed
individually as well as in the combination ‘‘data management plan’’ OR ‘‘DMP’’ AND ‘‘data management plan tools’’ OR ‘‘DMP tools’’ in
the Scopus, SpringerLink, Emerald Insight, Taylor & Francis Online, ACM Digital Library, EBSCOhost Research, and IEEE Xplore
databases. We also searched for the above key terms in the Google Scholar search engine. The search was limited to the document
type article, the English language, and the period before January 2020.
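The combined query described above can be assembled programmatically. The following is only a minimal sketch, not the authors' tooling: the helper names `or_block` and `build_query` are hypothetical, and the parenthesised grouping is an assumption, since the text does not state how each database resolved the precedence of OR and AND.

```python
# Hypothetical helpers; grouping the terms into two OR-blocks joined by AND
# is an assumption about how the combined query was intended.
PLAN_TERMS = ["data management plan", "DMP"]
TOOL_TERMS = ["data management plan tools", "DMP tools"]


def or_block(terms):
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(plan_terms=PLAN_TERMS, tool_terms=TOOL_TERMS):
    """Return the combined boolean search string."""
    return f"{or_block(plan_terms)} AND {or_block(tool_terms)}"


print(build_query())
```

Making the grouping explicit in this way avoids relying on database-specific operator precedence, which can differ between search interfaces.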

3.2. Literature identification

Following the advanced search strategy, we obtained a total of 13,894 documents. Further, within the obtained documents, an
advanced search (in the abstract field, using the reference manager JabRef) for the term ‘‘data management plan’’ was performed,
which fetched a total of 637 documents. The titles and abstracts of these documents were read manually, and 48 relevant documents
on data management plan (DMP) tools were identified.
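The abstract-field screening step can be mirrored in a few lines of code. This is a sketch under stated assumptions: the study performed this step interactively in JabRef, and the record structure, the function names, and the toy records below are all invented for illustration.

```python
# Hypothetical record structure; this mirrors only the filtering logic,
# not the authors' actual JabRef workflow.
PHRASE = "data management plan"


def abstract_matches(record, phrase=PHRASE):
    """True when the phrase occurs in the record's abstract, ignoring case."""
    return phrase.lower() in record.get("abstract", "").lower()


def screen(records, phrase=PHRASE):
    """Keep only the records whose abstract mentions the phrase."""
    return [r for r in records if abstract_matches(r, phrase)]


# Toy records, invented for illustration only.
records = [
    {"title": "A", "abstract": "We review Data Management Plan tools."},
    {"title": "B", "abstract": "A study of database indexing."},
    {"title": "C"},  # record without an abstract field
]
shortlist = screen(records)
print([r["title"] for r in shortlist])
```

The manual reading of titles and abstracts that follows in the study is a judgment step and is deliberately not automated here.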

3.3. Tool identification

Based on the full-text reading of the 48 identified documents, we prepared a list of data management plan tools. For further
evaluation, only those tools were included which were (i) openly available and (ii) not built on any other DMP tool (for example,
the DMP Assistant tool is powered by DMPOnline, so DMPOnline was listed but DMP Assistant was not). Hence, considering the
above inclusion and exclusion criteria, a total of 14 tools (see Table 1) were identified for the evaluation.

Table 1
List of DMP tools.
Tool name | Description
DMPOnline | One of the oldest data management planning tools, developed by the Digital Curation Centre (DCC) in 2010 for UK based funders (DMPOnline, 2010).
DMPTool | Based on DCC’s DMPOnline; developed and maintained by the University of California Curation Center of the California Digital Library in 2011 for USA based funders (DMPTool, 2011).
IEDA DMP | Launched by the Interdisciplinary Earth Data Alliance (IEDA) in 2011 for making proposals to the National Science Foundation (NSF) and other funders engaged in Earth Sciences and GeoSciences research in the USA (IEDA DMP, 2011).
ezDMP | Released in 2018, based on the IEDA DMP tool, and funded by an NSF EAGER grant to K. Lehnert and V. Ferrini (Columbia University), H.M. Berman (Rutgers University), and V.C. Stodden (University of Illinois) in the USA (ezDMP, 2018).
DSW | A pioneer wizard created in cooperation of Dutch Techcentre for Life Sciences (DTL, ELIXIR NL) & Czech Technical University in Prague Faculty of Information Technology (CTU, ELIXIR CZ), Institute of Organic Chemistry and Biochemistry of the CAS, Centre for Conceptual Modelling and Implementation in 2015 in Europe (DSW, 2015).
TUB-DMP | Built by the Service Center for Research Data and Publications (SZF) at the Technische Universität Berlin in 2015. This tool has restricted access and is available only in the German language (TUB-DMP, 2015).
OpenDMP | Developed in 2017 by OpenAIRE and EUDAT in Greece (OpenDMP, 2017).
UWA DMP | A web-based tool developed by the University of Western Australia (UWA DMP, n.d.).
RDMO | Stands for Research Data Management Organizer; created by the Leibniz-Institute for Astrophysics Potsdam (AIP) in 2017 in Germany (RDMO, 2017).
DataWiz | The DataWiz knowledge base was developed by the Leibniz Institute for Psychology Information and Documentation (ZPID) in Germany in 2017 (DataWiz, 2017).
DMPTY | Developed by Clarin-D, Germany, in 2015 (DMPTY, 2015).
easy.DMP | Developed by Sigma2 in collaboration with EUDAT2020 in Norway, Europe (easy.DMP, 2015).
ResData RDMP | Developed by the University of New South Wales, Australia, with restricted access (ResData, n.d.).
PARTHENOS DMP | Developed for archeology research by the PARTHENOS project, a Horizon 2020 project funded by the European Commission (PARTHENOS DMP, n.d.).

Table 2
Description of technical specifications parameters.
Parameter | Description
Framework | Web application structure upon which the tools are built and deployed.
Server OS | Specifies the name of the server-side operating system on which the tools are hosted.
Web server | Refers to the software required for hosting the DMP tool's interface on the web.
Database | Denotes the name of the database system that has been used in the backend for storage.
Web-based | Indicates whether the tools can be used via web browsers.
Source code availability | Indicates whether the source code of the tools is available for downloading and installing on local machines.
Access | Determines whether the tools are open to everyone or are available only to a specific group of people such as researchers, staff, and students of a specific organization.
Single sign-on (SSO) | Shows whether the tools allow the user to authenticate to many applications by signing in with a single ID.
First release | Year in which the tool was first published for the general public.
Latest release | Year of the latest release as of 20th July 2020.
Latest version | The latest version available for the tool as of 20th July 2020.
Hosted by | Names the institutions or organizations that host the tool.
Region covered | Shows the geographical areas where the tools are hosted and used more frequently.
License | Determines the license under which the tools are registered.

3.4. Data reporting and analysis

This study follows a parametric approach to review DMP tools. The tools are reviewed based on 14 technical specification
parameters (see Table 2), 13 feature parameters (see Table 4), and 18 data life cycle support parameters (based on DMP template
questions; see Table 6). The technical specifications and features were identified from existing literature and tool exploration,
whereas the data life cycle support parameters are based on the DMP templates of the selected DMP tools. The literature, tools,
and parameters were reported and tabulated using a spreadsheet. The analysis is illustrated with a donut chart, a bar chart, a
co-occurrence matrix, and a correlation matrix. The donut chart is used for Tables 3 and 5, while the bar chart, co-occurrence,
and correlation matrices are used for discussing the data of Table 7. Additionally, the co-occurrence of parameters is used to
determine a correlation graph among the tools. The study first calculates a weight for each parameter from the co-occurrence of
parameters, so that the effect of all other parameters is taken into account; the correlation among tools is then calculated on
top of the weighted parameters. The findings are analyzed in detail in the discussion section of this article.
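The quantities named above (percentage coverage, parameter co-occurrence, and correlation among tools) can be illustrated on a small binary tool-by-parameter matrix. The following is a sketch under explicit assumptions, not the authors' procedure: the matrix values are invented, the weighting scheme (each parameter weighted by the row sum of the co-occurrence matrix) is a guess at one plausible reading of "weighted parameter", and Pearson correlation is assumed because the text does not name the coefficient used.

```python
from math import sqrt

# Toy tool-by-parameter matrix (1 = tool covers the parameter). The values
# are invented for illustration and are NOT the paper's Table 7.
M = [
    [1, 1, 0, 1],  # hypothetical ToolA
    [1, 0, 0, 1],  # hypothetical ToolB
    [1, 1, 1, 0],  # hypothetical ToolC
]


def tool_coverage(matrix):
    """Percentage of parameters covered by each tool (row-wise)."""
    return [100 * sum(row) / len(row) for row in matrix]


def param_coverage(matrix):
    """Percentage of tools covering each parameter (column-wise)."""
    return [100 * sum(col) / len(matrix) for col in zip(*matrix)]


def cooccurrence(matrix):
    """C[i][j] = number of tools that cover both parameter i and parameter j."""
    cols = list(zip(*matrix))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx, dy = [a - mx for a in x], [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def weighted_rows(matrix):
    """ASSUMED weighting: scale each covered parameter by the row sum of the
    co-occurrence matrix for that parameter."""
    weights = [sum(r) for r in cooccurrence(matrix)]
    return [[v * w for v, w in zip(row, weights)] for row in matrix]


def tool_correlation(matrix):
    """Tool-by-tool Pearson correlation over the weighted parameter vectors."""
    rows = weighted_rows(matrix)
    return [[pearson(a, b) for b in rows] for a in rows]


print(tool_coverage(M))
print(cooccurrence(M))
print(tool_correlation(M))
```

The diagonal of the co-occurrence matrix gives the number of tools covering each parameter, so the column-wise coverage can be read from it directly.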

4. Findings

The comparative analysis is based on parameters that can be broadly classified as technical specifications, features, and data
life cycle supported DMP template questions. For DMP stakeholders, the comparison of tools based on classified parameters is
crucial. It is relevant for both tool users and tool developers. The comparison helps users to choose a suitable tool according to
their purpose and requirements, while developers can identify the lacunae and the requirements for filling them. In addition,
this study considers the parameters related to the data life cycle supported DMP template questions. Here, DMP template questions

Table 3
Technical specifications for each DMP tool.
Parameter | DMPOnline | DMPTool | IEDA DMP | ezDMP | DSW | TUB-DMP | OpenDMP | UWA DMP | RDMO | DataWiz | DMPTY | easy.DMP | UNSW ResData RDMP | PARTHENOS DMP
Framework | Ruby ≥ 2.0.0, Rails ≥ 4.0 | Ruby ≥ 2.4.4, Rails ≥ 4.2 | NA | Node.js object relational mapping (ORM), Angular.js with fully responsive Bootstrap UI | Haskell Tool Stack ≥ 1.9.3, RabbitMQ ≥ 3.7.8, Docker ≥ 17.09.0-ce, Python | Laravel 5.5 | REST backend services, Java/Spring | NA | Django | NA | Javascript framework AngularJS | Python ≥ 3.5, pip, graphviz, Django | NA | NA
Server OS | Ubuntu GNU/Linux | Ubuntu GNU/Linux | NA | NA | NA | Linux | NA | NA | Linux, macOS, Windows | NA | NA | NA | NA | NA
Web server | Apache | Apache | NA | NA | YAML | Apache2, PHP 7.1 | Apache | NA | Django, Nginx, Apache | NA | NA | NA | NA | NA
Database | MySQL ≥ 5.0 | MySQL ≥ 5.0 | NA | PostgreSQL | MongoDB ≥ 4.0.12 | PostgreSQL, MySQL, etc. | PostgreSQL | NA | PostgreSQL, MySQL, & SQLite | NA | NA | Sqlite, PostgreSQL | NA | NA
Web-based | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | NA
Source code available | Yes | Yes | No | Yes | Yes | Yes | Yes | No | Yes | Yes | No | Yes | No | No
Access (Open/Restricted) | Open | Open | Open | Open | Open | Restricted | Open | Open | Open | Open | Open | Open | Restricted | NA
Single sign-on | No | Yes | No | Yes | No | Yes | No | No | Yes | No | No | Yes | No | NA
First release year | 2010 | 2011 | 2011 | 2018 | 2015 | 2015 | 2017 | NA | 2017 | 2017 | 2015 | 2015 | NA | NA
Latest release year | 2014 | 2019 | NA | 2018 | 2019 | 2017 | 2019 | NA | NA | 2017 | NA | 2019 | NA | NA
Latest version | v4.0 | v2.1.1 | NA | NA | v1.10.1 | v2.1 | NA | NA | v0.9 | NA | NA | v0.21.3 | NA | NA
Hosted by | University of Edinburgh | University of California | Lamont–Doherty Earth Observatory of Columbia University | Interdisciplinary Earth Data Alliance (IEDA) | DTL, ELIXIR NL & CTU, ELIXIR CZ, Prague | SZF at Technische Universität Berlin | OpenAIRE and EUDAT | Qualtrics | Leibniz-Institute for Astrophysics Potsdam (AIP) | Leibniz Institute for Psychology Information and Documentation (ZPID) | CLARIN-D | Sigma2 in collaboration with EUDAT2020 & UNINETT | UNSW Library, Sydney | Parthenos Project, a Horizon 2020 project funded by European Commission
Regions covered | UK | USA | USA | USA | Europe | Germany | Greece | Australia | Germany | Germany | Germany | Norway | Australia | Europe
License | MIT license | MIT license | CC BY-NC-SA | MIT license | Apache 2.0 license | MIT license | Apache 2.0 license | NA | Apache 2.0 license | GNU General Public license v3.0 | NA | MIT license | NA | NA

Table 4
Description of features parameters.
Parameter | Description
Language Support | Lists the languages in which the tools allow a plan to be designed.
Domain Coverage | The scope of the tools in terms of their domains of study.
Guideline Support (GS) | Indicates whether the host institution provides help in answering DMP questions.
Number of Guidelines | Gives the count of guidelines that the DMP tools include.
Community Support (CS) | Examines whether the tools provide a platform to discuss and resolve issues relevant to the application.
Collaboration Option (CO) | Investigates whether the tools provide an option for collaboration among researchers and organizations.
Publicly Available DMP Example (PADE) | Indicates whether the tools incorporate some example use cases and make them publicly available to help users in designing their DMP.
Closed or Open-Ended | Indicates whether the default template of the tools has closed or open-ended questions.
User Guide (UG) | Identifies whether the tools have a user support guide.
Available Funder’s Template | Provides the count of publicly available funder’s templates (other than the default template).
Export Format | Examines the flexibility of the tools to export the plan in various file formats.
DMP Export Customization (DEC) | Concerns the flexibility of the tools in exporting a plan with different fonts, sizes, and page margins.
Plan Analysis (PA) | Indicates whether the tools provide an overview of the entire plan.

Fig. 1. Proportion of tools having properties (WB, SCA, and OA) and service (SSO) specific to technical specifications.

are obtained from the default template of DMP tools, which include a set of questions related to the data management planning
for each stage of the data life cycle. The usefulness of these templates is determined by the quality, range, and completeness of the
questions asked pertaining to the data life cycle.
The parameter tables include descriptive, numerical, date, categorical, and NA data values. Descriptive values hold factual
information; numerical values hold counts; date values express the timeline; categorical values are ‘Y’, meaning the feature is
known to be available, and ‘N’, meaning no such feature is available; and ‘NA’ means the authors could not find information
related to the particular parameter.

4.1. Technical specifications

The study adopted a total of 14 technical specification parameters (see Table 2) to evaluate the selected DMP tools. As per
Table 3, the DMP tools use a variety of frameworks. Among the tools for which information was available, all are
built on a dedicated frontend or backend framework. Ubuntu/Linux is used as the server OS by four DMP tools, namely DMPOnline,
DMPTool, TUB-DMP, and RDMO. Moreover, RDMO supports macOS and Windows. Four tools, DMPOnline,
DMPTool, TUB-DMP, and OpenDMP, use Apache; DSW uses YAML; and RDMO uses Apache, Nginx, or Django as a web server. DSW
uses the NoSQL database MongoDB, whereas the remaining tools use relational databases such as MySQL, PostgreSQL, and SQLite.
NoSQL databases can be faster and can handle large storage efficiently (Han et al., 2011), which may give DSW an advantage over
the other tools. For designing a DMP tool, the selection of operating systems, web browsers, and databases depends upon the
availability of resources. However, open-source software is the top priority among hosts. Source code availability is useful
because it allows a developer to modify a tool or develop a new one. The source code of DMPOnline, DMPTool, ezDMP, DSW, TUB-DMP,
OpenDMP, RDMO, DataWiz, and easy.DMP is available under open licenses. Two tools, TUB-DMP and UNSW ResData RDMP, have restricted
access, whereas the other tools are fully accessible to everyone. Five tools, namely DMPOnline, DMPTool, ezDMP, TUB-DMP, and
easy.DMP, are released under the MIT license. DSW, RDMO, and OpenDMP follow the Apache license, whereas DataWiz has a GNU GPL
license. A single sign-on (SSO) facility is provided by five tools, namely DMPTool, ezDMP, TUB-DMP, RDMO, and easy.DMP. The SSO
makes it possible to use credentials from institutions, Google, ORCID, or other social networks. The newly released version of the
Data Stewardship Wizard (DSW) platform makes the tool available on Docker, a container-based application that supports YAML.
Table 3 shows that only 29% of the tools (see Fig. 1) provide their source code, offer a single sign-on option, and are
publicly available. Another 29% of the tools are publicly available and provide their source code, whereas 21% of the tools are
open access, and 14% are web-based tools that are neither publicly available nor provide their source code or a single sign-on
option.

Table 5
Features of DMP tools.
Parameter | DMPOnline | DMPTool | IEDA DMP | ezDMP | DSW | TUB-DMP | OpenDMP | UWA DMP | RDMO | DataWiz | DMPTY | easy.DMP | UNSW ResData RDMP | PARTHENOS DMP
Languages Supported (LS) | Deutsch, Español, English (GB), English (US) & Français | English & Portuguese | English | English | English | German | English | English | English and German | English and German | English and German | English | English | English
Domain Coverage (DC) | Domain Independent | Domain Independent | Earth Sciences, GeoSciences | Earth Sciences, GeoSciences | Domain Independent | Domain Independent | Domain Independent | Domain Independent | Domain Independent | Domain Independent | Humanities and Social Science | Domain Independent | Domain Independent | Archeology
Guidelines Support (GS) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes | Yes
No. of guidelines | 57 | 93 | 11 | 22 | 1 | NA | 4 | 1 | 0 | 3 | 0 | 2 | 1 | 1
Community Support (CS) | Yes | Yes | NA | NA | NA | NA | NA | Yes | No | No | No | Yes | No | NA
Collaboration Option (CO) | Yes | Yes | No | Yes | No | Yes | No | Yes | Yes | Yes | No | Yes | Yes | NA
Publicly Available DMP Examples (PADE) | Yes | Yes | NA | No | Yes | NA | Yes | No | No | Yes | No | No | No | No
Close or Open-Ended DMP Questions | Open | Open | Both | Both | Both | Open | Both | Both | Both | Both | Open | Open | Both | Both
User Guide (UG) | Yes | Yes | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Available Funder’s Templates | 18 | 19 | 0 | 0 | 0 | NA | 18 | 0 | 6 | 3 | 0 | 1 | 0 | 0
Export Formats | pdf, csv, html, docx, txt | csv, pdf, docx, txt, html | pdf | pdf | pdf, tex, html, odt, docx, json, md | pdf | xml, pdf | pdf | pdf, rtf, odt, docx, html, md, mediawiki, tex | odf | word, rtf, tex | html | pdf | pdf
DMP Export Customization (DEC) | Yes | Yes | No | No | No | NA | No | No | No | No | No | No | No | NA
Plan Analysis (PA) | No | No | No | No | Yes | No | No | No | No | No | No | No | No | No

Table 6
Description of parameters for DMP tools template questions.
Parameter | Description
‘‘Data description/summary/collection’’ | Inquires whether the DMP template includes questions about data collection, data description, and summary.
‘‘Documentation and metadata’’ | Shows whether the template of the tool has metadata-related questions.
‘‘Ethics and legal compliance’’ | Explores whether moral, professional, and legal issues are covered in the template.
‘‘Storage and backup’’ | Examines whether the template has questions related to data storage and backup.
‘‘Selection and preservation’’ | Checks whether the template includes questions on the data selection process and the preservation of data.
‘‘Roles and responsibilities’’ | Indicates whether the template has questions regarding the roles involved in data handling and the responsibilities assigned to data handlers.
‘‘Resource allocation’’ | Checks whether the DMP template includes information related to the resources and infrastructure used in the project.
‘‘Data privacy’’ | Probes whether the template has questions related to any personally identifiable information.
‘‘Data security and integrity’’ | Finds out whether safety-related questions and precautions pertaining to the data are included.
‘‘Relationship to other datasets’’ | Checks whether the template has questions that inquire about the datasets reused by the project.
‘‘Time frame’’ | Examines whether questions on the time and date associated with the data have been included.
‘‘Data sharing’’ | Checks whether the template has questions about keeping the data accessible for reuse.
‘‘Findability’’ | Checks whether the template has questions related to making the data easier to find, for example, metadata and persistent identifiers.
‘‘Accessibility’’ | Checks whether the template has questions related to accessibility, such as authentication and authorization.
‘‘Interoperability’’ | Checks for questions related to interoperability within the template.
‘‘Reusability’’ | Checks whether the template has questions that can help with reuse of the data.
‘‘Quality assurance of data/data quality’’ | Investigates whether the DMP template has questions related to the appraisal of data (Houston et al., 2018; Levitin & Redman, 1995).
‘‘Details about the data product’’ | Shows whether the DMP template has questions that inquire about the final data product of the project, such as observational, analytical, processed, experimental, model/theoretical, or interpretive products.

4.2. Features

Table 5 includes a total of 13 parameters, described in Table 4, related to the general, advanced, and user-support features.
According to Table 5, TUB-DMP is the only tool restricted to German; all other tools support English, and some support
additional languages. In addition to English, DMPTool supports Portuguese, while RDMO, DataWiz, and DMPTY support German.
Moreover, DMPOnline offers five languages for preparing a DMP: Deutsch (German), Español, English (GB), English (US), and Français.
Ten DMP tools are found to be domain-independent, whereas IEDA DMP and ezDMP are devoted to Earth and GeoSciences, DMPTY is
dedicated to Humanities and Social Sciences, and PARTHENOS DMP is designed for Archeology. Guideline support makes it easy for
a user to answer the DMP questions in line with institutional requirements. All DMP tools other than RDMO and DMPTY
provide guideline support. DMPTool (93) and DMPOnline (57) provide the most institutional guidelines. General queries
and suggestions can be addressed effectively on a community support platform; hence, community support is crucial for both users
and developers. DMPOnline, DMPTool, UWA DMP, and easy.DMP provide community support. The collaboration option enables a
team of researchers to prepare a DMP as a group. Moreover, a researcher might not be able to answer every template question,
so it is advantageous to involve the administration or librarian of the institute through the collaboration option (Delserone, 2008).
We found nine tools that offer a collaboration option. DMPOnline, DMPTool, DSW, OpenDMP, and DataWiz have publicly available
DMP examples. These examples are useful for understanding the workflow of preparing a DMP. Five tools ask only open-ended questions
in their default template, whereas the rest incorporate both open and closed-ended questions. DMP questions are broad and can
mean different things in different domains; therefore, most of the DMP tools provide open-ended questions. A user guide can help
a novice because it includes step-by-step instructions for using the tool. All tools except IEDA DMP and ezDMP provide user
guides. DMPTool, DMPOnline, OpenDMP, RDMO, DataWiz, and easy.DMP include 19, 18, 18, 6, 3, and 1 funder’s templates (other
than the default) respectively, which shows that DMPOnline, DMPTool, and OpenDMP are popular among funding agencies. DMPOnline
and DMPTool allow plan export customization, where users can customize the fonts, sizes, and margins of their plans while exporting
them. DMPOnline, DMPTool, DSW, and RDMO provide more flexibility to export the plan in different file formats. Other than DSW,
none of the selected tools provides an analysis of the plan. Based on the answers to the DMP questions, DSW provides a summary
of the plan in terms of FAIR support and evaluates how good the DMP is.

4.3. DMP templates

Table 7 contains 18 parameters, described in Table 6, for the appraisal of DMP tools. The parameters were determined by
analyzing the competency questions asked in the default templates of DMP tools. All the DMP templates incorporate questions
related to data description/summary/collection and documentation and metadata, which are important from a retrieval point of
view. Except for IEDA DMP and ezDMP, all tools include questions on legal and ethical compliance. Storage and backup enable
the reusability of data; 11 tools pose questions on storage and backup, whereas IEDA DMP, TUB-DMP, and easy.DMP do
not ask such questions in their templates. OpenDMP, easy.DMP, and UNSW ResData RDMP omit questions
on selection and preservation. Questions on roles and responsibilities are covered by 12 tools, the exceptions being UWA DMP and
PARTHENOS DMP. Questions on resources (infrastructure) allocation are implemented by 11 tools, the exceptions being IEDA DMP,
DMPTY, and UNSW ResData RDMP.


Table 7
Data Life Cycle coverage based on default templates of DMP tools.
Parameter | DMPOnline | DMPTool | IEDA DMP | ezDMP | DSW | TUB-DMP | OpenDMP | UWA DMP | RDMO | DataWiz | DMPTY | easy.DMP | UNSW ResData RDMP | PARTHENOS DMP
Data description/summary/collection | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Documentation and metadata | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Ethics & legal compliance | Yes | Yes | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Storage & backup | Yes | Yes | No | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes
Selection & preservation | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | No | No | Yes
Roles and responsibilities | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | No
Resources (Infrastructure) allocation | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | No | Yes
Data privacy | No | No | No | No | No | No | No | No | Yes | Yes | No | No | Yes | No
Data security & integrity | Yes | Yes | No | No | Yes | Yes | Yes | Yes | No | No | No | Yes | Yes | Yes
Relationship to other datasets | No | No | Yes | Yes | Yes | Yes | No | No | Yes | No | No | Yes | No | No
Time frames | No | No | Yes | Yes | No | No | No | Yes | Yes | Yes | Yes | No | Yes | No
Data sharing | Yes | Yes | No | No | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes
Findability | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Accessibility | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Interoperability | No | No | No | No | Yes | Yes | Yes | No | Yes | No | No | Yes | No | Yes
Reusability | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Quality assurance of data/data quality | No | No | No | No | Yes | Yes | Yes | No | No | Yes | Yes | Yes | No | Yes
Details about data product | No | No | Yes | Yes | No | No | No | No | No | No | No | No | No | No

The complex nature of scientific research requires collaborators and external resources or infrastructure. It is ethical to
acknowledge the resources used in the research or project, such as infrastructure support by the institute/organization, software,
and funding information. Hence, many of the tools incorporate questions related to collaborators’ roles & responsibilities and
external resources. RDMO, DataWiz, and UNSW ResData RDMP incorporate questions on data privacy; the remaining 11 DMP tools
need to include questions on privacy in their templates. Out of 14 tools, nine include questions related to data security and
integrity, whereas the remaining five do not. It is a useful practice to look for related datasets, as it avoids duplication of
work and gives credit to previous researchers. According to Table 7, questions on relationships to other datasets are asked by
six tools, whereas the remaining eight neglect them. Time frame questions describe the life story of data. Seven DMP tools contain
questions on time frames, and seven contain questions on data quality. Except for IEDA DMP, ezDMP, and RDMO, all tools cover
questions on data sharing.
Research communities are constantly urged to make data FAIR. The FAIR principles promote findability, accessibility,
interoperability, and reusability of resources. This study found that all the DMP tools pose questions related to findability and
accessibility. Moreover, six of the tools provide questions on interoperability and, except for IEDA DMP, the other 13 tools include
questions on reusability. DSW, OpenDMP, PARTHENOS DMP, RDMO, TUB-DMP, and easy.DMP fully support the FAIR principles. In
this regard, FAIR support does not rely on the 13 principles (Wilkinson et al., 2016) but broadly focuses on the results of four
parameters: findability, accessibility, interoperability, and reusability. A total of seven tools, namely DMPOnline, DMPTY, DMPTool,
DataWiz, UWA DMP, ezDMP, and UNSW ResData RDMP, need to incorporate questions on interoperability. Furthermore, IEDA DMP includes
only the findability and accessibility aspects of FAIR. Only IEDA DMP and ezDMP incorporate questions related to data products. It is
the least practiced aspect among the DMP tools; however, it can help a user to identify the worth of the data collection at a glance.
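The FAIR reading of Table 7 described above can be reproduced with a short Python sketch. The Y/N strings are transcribed from the four FAIR rows of Table 7 (one character per tool, in column order). The names `FAIR_ROWS`, `missing_facets`, and `fully_fair_tools` are illustrative assumptions and are not part of the study.

```python
# Tool order follows the columns of Table 7.
TOOLS = [
    "DMPOnline", "DMPTool", "IEDA DMP", "ezDMP", "DSW", "TUB-DMP", "OpenDMP",
    "UWA DMP", "RDMO", "DataWiz", "DMPTY", "easy.DMP", "UNSW ResData RDMP",
    "PARTHENOS DMP",
]

# One character per tool: "Y" = the template asks such questions, "N" = it does not.
FAIR_ROWS = {
    "Findability":      "YYYYYYYYYYYYYY",
    "Accessibility":    "YYYYYYYYYYYYYY",
    "Interoperability": "NNNNYYYNYNNYNY",
    "Reusability":      "YYNYYYYYYYYYYY",
}


def missing_facets(tool):
    """Return the FAIR facets that the tool's default template does not cover."""
    col = TOOLS.index(tool)
    return [facet for facet, flags in FAIR_ROWS.items() if flags[col] != "Y"]


def fully_fair_tools():
    """Return the tools whose default template covers all four FAIR facets."""
    return [tool for tool in TOOLS if not missing_facets(tool)]


if __name__ == "__main__":
    print(fully_fair_tools())
    print(missing_facets("IEDA DMP"))
```

Under this reading, "full FAIR support" only means that all four rows are marked Yes; it says nothing about how deeply a template treats each facet.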
Fig. 2 illustrates that 22.2% of the parameters (data description/summary/collection, documentation & metadata, findability, and
accessibility) are covered by all 14 tools, whereas an equal number of parameters (storage & backup, selection & preservation,
resources allocation, and data sharing) are covered by 78.57% of the tools. In addition, 42.86% of the tools include 11.1% of the
parameters (interoperability and relationship to other datasets). We generated a co-occurrence matrix (Fig. 3) which shows the
occurrence of any two parameters together in a tool. The purpose of the matrix was to count the number of tools (i.e., frequency)
in which a pair of parameters occur together. The co-occurrence of one parameter with any other parameter ranges from 0 to 14,
the value being the total number of tools in which the pair occurs together. Moreover, the diagonal elements
of the co-occurrence matrix represent the total number of tools having a particular parameter. Further, a correlation matrix was
derived from the co-occurrence matrix. Using the co-occurrence matrix, a weightage was calculated for each parameter as the ratio
of its average co-occurrence with all other parameters to the total number of tools. This weightage was
superimposed on the categorical values (Yes (i.e., 1) and No (i.e., 0)) of Table 7, generating a weighted matrix of parameters
against tools. We then calculated Pearson’s coefficient of correlation between every pair of tools (shown in Fig. 4). This
correlation expresses the percentage of relatedness of each tool with every other tool.
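The pipeline described above (co-occurrence matrix, per-parameter weightage, weighted matrix, Pearson correlation between tools) can be sketched in plain Python as follows. This is one possible reading of the description, not the authors' code: the weight of a parameter is assumed to be its mean co-occurrence with the other 17 parameters divided by the 14 tools, so the resulting coefficients need not match the published Fig. 4 values exactly. The Y/N strings are transcribed from Table 7, and all function names are illustrative.

```python
from math import sqrt

TOOLS = [
    "DMPOnline", "DMPTool", "IEDA DMP", "ezDMP", "DSW", "TUB-DMP", "OpenDMP",
    "UWA DMP", "RDMO", "DataWiz", "DMPTY", "easy.DMP", "UNSW ResData RDMP",
    "PARTHENOS DMP",
]

# Table 7, one character per tool in column order ("Y" = Yes, "N" = No).
TABLE7 = {
    "Data description/summary/collection":    "YYYYYYYYYYYYYY",
    "Documentation and metadata":             "YYYYYYYYYYYYYY",
    "Ethics & legal compliance":              "YYNNYYYYYYYYYY",
    "Storage & backup":                       "YYNYYNYYYYYNYY",
    "Selection & preservation":               "YYYYYYNYYYYNNY",
    "Roles and responsibilities":             "YYYYYYYNYYYYYN",
    "Resources (infrastructure) allocation":  "YYNYYYYYYYNYNY",
    "Data privacy":                           "NNNNNNNNYYNNYN",
    "Data security & integrity":              "YYNNYYYYNNNYYY",
    "Relationship to other datasets":         "NNYYYYNNYNNYNN",
    "Time frames":                            "NNYYNNNYYYYNYN",
    "Data sharing":                           "YYNNYYYYNYYYYY",
    "Findability":                            "YYYYYYYYYYYYYY",
    "Accessibility":                          "YYYYYYYYYYYYYY",
    "Interoperability":                       "NNNNYYYNYNNYNY",
    "Reusability":                            "YYNYYYYYYYYYYY",
    "Quality assurance of data/data quality": "NNNNYYYNNYYYNY",
    "Details about data product":             "NNYYNNNNNNNNNN",
}

# Binary matrix: one row per parameter, one column per tool (Yes -> 1, No -> 0).
X = [[1 if c == "Y" else 0 for c in flags] for flags in TABLE7.values()]


def cooccurrence(x):
    """C[i][j] = number of tools covering both parameter i and parameter j."""
    n = len(x)
    return [[sum(a & b for a, b in zip(x[i], x[j])) for j in range(n)]
            for i in range(n)]


def weights(cooc, n_tools):
    """Assumed weightage: mean off-diagonal co-occurrence divided by tool count."""
    n = len(cooc)
    return [sum(cooc[i][j] for j in range(n) if j != i) / (n - 1) / n_tools
            for i in range(n)]


def pearson(u, v):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


C = cooccurrence(X)
W = weights(C, len(TOOLS))
# Weighted matrix: each Yes is replaced by the weight of its parameter.
WEIGHTED = [[W[i] * X[i][t] for t in range(len(TOOLS))] for i in range(len(X))]


def tool_correlation(tool_a, tool_b):
    """Correlate the weighted parameter profiles (columns) of two tools."""
    a, b = TOOLS.index(tool_a), TOOLS.index(tool_b)
    return pearson([row[a] for row in WEIGHTED], [row[b] for row in WEIGHTED])


if __name__ == "__main__":
    print(round(tool_correlation("DMPOnline", "DMPTool"), 3))
    print(round(tool_correlation("IEDA DMP", "PARTHENOS DMP"), 3))
```

The diagonal of `C` reproduces the per-parameter coverage quoted above: for example, storage & backup is covered by 11 of 14 tools, which is 78.57%.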


Fig. 2. Set of parameters covered by the proportion of tools (in %).

Fig. 3. Co-occurrence matrix plot of DMP template parameters.

5. Discussion

A data management plan enhances the reusability of data and also inherently facilitates data curation. This is the
reason why major funding agencies such as NSF and Wellcome Trust have made DMP submission mandatory for availing
research funds. It cannot be ignored that a DMP template, through its questions, motivates and reminds researchers to include
data management aspects they may have missed in their project. Additionally, DMPs on research projects enable
funders/research communities to channel resources (time and money) through coordinated research. This means the data is reusable
and retrievable in order to encourage further studies using the same or related datasets.


Fig. 4. Correlation matrix plot of DMP tools.

Since universities consist of many departments, they prefer to host domain-independent DMP tools. It was found that
most of the DMP tools originated in native English-speaking countries; hence they prefer English as a working language.
German was found to be the next preferred language. Earlier-established DMP tools regularly release new versions and
support a larger number of guidelines. Over time, such tools have spread among a large number of institutions
and users. Further, based on feedback from institutions and users, they are motivated to incorporate more DMP guidelines. According
to the findings, all the tools are web-based, so no local setup is required. The only requirement is to have credentials to sign in.
The selected tools originated in developed countries, and none of the tools belongs to Asia, Africa, or South America.
Seven parameters of Table 5, namely guideline support, community support, collaboration option, publicly available DMP
example, user guide, DMP export customization, and plan analysis, are categorized as user assistance parameters. Among these,
the first five parameters (first group) assist users throughout the DMP design, whereas the last two (second
group) are relevant at the completion of a DMP. The first group is present in a larger number of tools: 28.6%,
21.4%, 21.4%, 14.3%, and 14.3% of the tools cover 60%, 80%, 40%, 20%, and 100% of the parameters of this group respectively (see
Fig. 5). The second group, in contrast, is absent from most of the tools. Plan analysis is provided only by DSW, while DMP export
customization is allowed only by DMPOnline and DMPTool.
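The first-group distribution can be recomputed from Table 5 with a short Python sketch. The strings below are transcribed from the GS, CS, CO, PADE, and UG rows of Table 5 (one character per tool, in column order); treating NA as "not covered" is an assumption of this sketch, and the names `GROUP1` and `distribution` are illustrative. With that assumption, the 60% bucket holds 4 of the 14 tools (28.6%), in line with the 28% quoted in the abstract.

```python
from collections import Counter

N_TOOLS = 14  # columns of Table 5, DMPOnline ... PARTHENOS DMP

# "Y" = Yes, "N" = No, "-" = NA (information not found by the authors).
GROUP1 = {
    "GS":   "YYYYYYYYNYNYYY",
    "CS":   "YY-----YNNNYN-",
    "CO":   "YYNYNYNYYYNYY-",
    "PADE": "YY-NY-YNNYNNNN",
    "UG":   "YYNNYYYYYYYYYY",
}


def covered_per_tool():
    """Number of first-group parameters marked Yes for each tool."""
    return [sum(flags[t] == "Y" for flags in GROUP1.values())
            for t in range(N_TOOLS)]


def distribution():
    """Map 'percentage of group parameters covered' to 'percentage of tools'."""
    counts = Counter(covered_per_tool())
    return {covered * 100 // len(GROUP1): round(100 * n / N_TOOLS, 1)
            for covered, n in counts.items()}


if __name__ == "__main__":
    for pct_params, pct_tools in sorted(distribution().items()):
        print(f"{pct_tools}% of the tools cover {pct_params}% of the parameters")
```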
‘‘Sharing is caring’’: if we have strategies to share the data, we are actually caring for the research community. Data sharing is
the key to reusability and facilitates data validation (Kim, 2017; Wiley, 2018). Privacy is one of the major concerns in data sharing
and reuse. Surprisingly, the data privacy parameter is covered by only three tools (21.43%). Many of the DMP templates pose
questions related to findability, accessibility, and reusability of data, but interoperability still needs to be addressed by them.
Further, for easy retrieval and efficient reuse, it is advisable that DMP templates incorporate questions that can reveal the
relationship of the data with other datasets. Of particular note (from Fig. 2), the DMP parameters associated with data privacy and
details about data products tend to co-occur at lower coverage among all the DMP tools, whereas data description/summary/collection,
documentation and metadata, findability, and accessibility co-occur at higher coverage.
All the tools are positively correlated on the basis of the aforesaid weighted co-occurrence of parameters. DMPOnline and
DMPTool are 100% correlated, whereas IEDA DMP and PARTHENOS DMP have the least correlation (20.8%). Based on Pearson’s
coefficient of correlation (see Fig. 4), the relatedness results have been categorized into three classes of correlation. The classes
comprise tools that are highly correlated (more than 66.6%), moderately correlated (between 33.3% and 66.6%), and loosely
correlated (less than 33.3%) with the corresponding tools. From the results, it is clear that the domain-independent tools are more
correlated with other tools, whereas domain-specific tools are less correlated with other tools.
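The three-class split can be written as a small helper. This is a minimal sketch: the thresholds come from the text, while the treatment of values that fall exactly on a boundary (assigned to the moderate class here) is an assumption, since the paper does not specify it. The function name `correlation_class` is illustrative.

```python
def correlation_class(r):
    """Classify a Pearson coefficient (0..1) into the three classes of the study."""
    pct = r * 100
    if pct > 66.6:
        return "highly correlated"
    if pct >= 33.3:  # boundary values are assumed to count as moderate
        return "moderately correlated"
    return "loosely correlated"


if __name__ == "__main__":
    # The two extreme pairs reported in the text.
    print("DMPOnline vs DMPTool:", correlation_class(1.0))
    print("IEDA DMP vs PARTHENOS DMP:", correlation_class(0.208))
```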
Fig. 5. (a) Proportion of tools having parameters (such as GS, UG, CO, CS, PADE, DEC, and PA) that assist throughout DMP. (b) Proportion of tools covering
a proportionate set of parameters that assist in the initial stage of DMP (such as GS, UG, CO, CS, PADE).

Fig. 6. Set of tools having parameter (in %).

DMPOnline and DMPTool are similar, as they cover exactly the same set of data life cycle parameters. Fig. 6, derived from Table 7,
illustrates the parameter coverage by tools. It is evident that DSW is the most comprehensive tool, covering a total of 15 parameters
(83.33%), whereas IEDA DMP includes the fewest parameters (50%). Except for IEDA DMP, all the tools
cover at least 12 parameters (≥ 66.67%).
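The per-tool coverage figures quoted above follow directly from Table 7. A minimal Python sketch, with the Y/N strings transcribed from Table 7 (one character per tool, in column order) and illustrative function names, is:

```python
TOOLS = [
    "DMPOnline", "DMPTool", "IEDA DMP", "ezDMP", "DSW", "TUB-DMP", "OpenDMP",
    "UWA DMP", "RDMO", "DataWiz", "DMPTY", "easy.DMP", "UNSW ResData RDMP",
    "PARTHENOS DMP",
]

# Table 7, one character per tool ("Y" = Yes, "N" = No).
TABLE7 = {
    "Data description/summary/collection":    "YYYYYYYYYYYYYY",
    "Documentation and metadata":             "YYYYYYYYYYYYYY",
    "Ethics & legal compliance":              "YYNNYYYYYYYYYY",
    "Storage & backup":                       "YYNYYNYYYYYNYY",
    "Selection & preservation":               "YYYYYYNYYYYNNY",
    "Roles and responsibilities":             "YYYYYYYNYYYYYN",
    "Resources (infrastructure) allocation":  "YYNYYYYYYYNYNY",
    "Data privacy":                           "NNNNNNNNYYNNYN",
    "Data security & integrity":              "YYNNYYYYNNNYYY",
    "Relationship to other datasets":         "NNYYYYNNYNNYNN",
    "Time frames":                            "NNYYNNNYYYYNYN",
    "Data sharing":                           "YYNNYYYYNYYYYY",
    "Findability":                            "YYYYYYYYYYYYYY",
    "Accessibility":                          "YYYYYYYYYYYYYY",
    "Interoperability":                       "NNNNYYYNYNNYNY",
    "Reusability":                            "YYNYYYYYYYYYYY",
    "Quality assurance of data/data quality": "NNNNYYYNNYYYNY",
    "Details about data product":             "NNYYNNNNNNNNNN",
}


def coverage(tool):
    """Return (number of parameters covered, percentage of the 18 parameters)."""
    col = TOOLS.index(tool)
    covered = sum(flags[col] == "Y" for flags in TABLE7.values())
    return covered, round(100 * covered / len(TABLE7), 2)


def ranking():
    """Tools sorted from most to least comprehensive default template."""
    return sorted(TOOLS, key=lambda tool: coverage(tool)[0], reverse=True)


if __name__ == "__main__":
    for tool in ranking():
        count, pct = coverage(tool)
        print(f"{tool}: {count} parameters ({pct}%)")
```

Because Python's sort is stable, tools with equal coverage keep their Table 7 column order in the ranking.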
Depending on the needs of users, there could be several scenarios for trading off between the parameters and tools. For example,
if a user wants a DMP tool whose default template is FAIR compliant and has questions on ethics, data protection, roles and
responsibilities, and storage & backup, the user can choose DSW or OpenDMP. Similarly, numerous scenarios
can be derived from the findings of the present work to choose tools that suit the user’s perspective. Based
on the features of DMP tools, it is inferred that DMPOnline and DMPTool have more features and provisions to accommodate a variety
of users, whereas the default template of DSW is more comprehensive. DSW incorporates provisions like NoSQL databases, plan
analysis, and evaluation options. Though DSW satisfies most of the parameters, it can be overwhelming for someone
who just wants to create a DMP and proceed. Some amount of training might be necessary for a novice.
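The scenario above can be expressed as a simple filter over Table 7. In this sketch, "data protection" is assumed to correspond to the ‘‘Data security & integrity’’ row, because that reading reproduces the DSW/OpenDMP outcome stated in the text; the row strings are transcribed from Table 7, and the names `ROWS` and `select_tools` are illustrative.

```python
TOOLS = [
    "DMPOnline", "DMPTool", "IEDA DMP", "ezDMP", "DSW", "TUB-DMP", "OpenDMP",
    "UWA DMP", "RDMO", "DataWiz", "DMPTY", "easy.DMP", "UNSW ResData RDMP",
    "PARTHENOS DMP",
]

# Rows of Table 7 relevant to the scenario ("Y" = Yes, "N" = No).
ROWS = {
    "Ethics & legal compliance":  "YYNNYYYYYYYYYY",
    "Storage & backup":           "YYNYYNYYYYYNYY",
    "Roles and responsibilities": "YYYYYYYNYYYYYN",
    "Data security & integrity":  "YYNNYYYYNNNYYY",  # assumed "data protection"
    "Findability":                "YYYYYYYYYYYYYY",
    "Accessibility":              "YYYYYYYYYYYYYY",
    "Interoperability":           "NNNNYYYNYNNYNY",
    "Reusability":                "YYNYYYYYYYYYYY",
}


def select_tools(required):
    """Return the tools whose default template covers every required parameter."""
    return [tool for col, tool in enumerate(TOOLS)
            if all(ROWS[param][col] == "Y" for param in required)]


if __name__ == "__main__":
    # FAIR compliant + ethics + data protection + roles + storage & backup.
    print(select_tools(list(ROWS)))
```

Relaxing the requirement list, for example by dropping storage & backup, widens the candidate set, which is the kind of trade-off the paragraph describes.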

6. Implications

The present work has several theoretical and practical implications. The first theoretical implication is inherent in the
comprehensive nature of the review. The work is novel in both its extension and intension. No existing study appears
to have considered 14 tools and 45 parameters for evaluation.
Additionally, the present work is in alignment with the results of related works. However, contrary to similar studies (particularly
those based on the funders’ perspective), this work is based on the researchers’ perspective. Sallans and Donnelly (2012) emphasised
purpose-specific DMPs; this study carries their work forward and provides a method to select purpose-specific DMP tools using
matrices. The results of this study have several practical impacts on its stakeholders, such as

• Academicians: to determine the appropriate DMP tool for their research data management. For example, an academician may
work on remote sensing and population census data simultaneously; to cater to the remote sensing need, domain-specific
tools can be the choice, while for the population census data, domain-independent tools would be preferable. To cater to
both domains together, however, the choice of DMP tool would be a trade-off between the tools, which Table 8
can facilitate. Let us say the first choice is ezDMP/IEDA DMP, which is domain specific but has a strong correlation
with DataWiz and RDMO (both domain independent). Of these, DataWiz would be the preferred tool, as it is highly
correlated with 64% of the total tools;
• Tool developers: to understand the limitations & benefits of the tools and to use this knowledge to harness or improve a
DMP tool;
• Librarians or data managers: to use the DMP for RDM and to help the library choose an appropriate DMP tool that assists its
researchers with the specifics of the project and data;
• Funders: to suggest that researchers adopt a certain DMP tool when applying for research/project funds, or to enhance
the funder’s template.

The results of the work are handy for ranking the DMP tools based on their parameter coverage. This work can serve as groundwork
for further study of the usability of DMP tools and of researchers’ behavior towards research data reuse and DMPs.

7. Conclusion and future work

DMP is crucial for data management and curation. It makes the RDM aspects explicit, which helps funders decide
which research should be funded, why, and how. This study attempts to review the current status and future prospects
of DMPs and DMP tools. A comprehensive parametric approach was adopted to identify the gaps in DMP practices and prevailing
DMP tools. The study is novel in its nature and approach: its comprehensive scope and parametric approach are
unique among DMP studies. The study found that there is a significant lack of DMP awareness and practice. DMP is a less
recognized practice in Asian and African countries than in American and European ones. The nature of data varies
across domains; therefore, domain-specific DMPs are essential. However, it was observed that domain-specific DMPs or DMP
tools are limited in practice. It is pointed out that the earlier released DMP tools lack FAIR support. Researchers can enhance the
effectiveness of DMPs by involving other planning stakeholders, for instance librarians, institute administration, and collaborators.
It was noticed that DMP tools are positively correlated and that domain-specific tools are loosely correlated with the other tools.
Selecting the best DMP tool is tedious, as it depends upon the user’s needs; however, based on the current parameters and certain
limitations, the present study found that DMPOnline, DMPTool, and DSW are relatively comprehensive.
The parameters of this study can be used as metadata elements for describing a DMP tool, and they can be extended to design a
metadata schema for DMPs. In future, we will extend the present work to a more detailed FAIR-principle-based evaluation and
design the metadata schema for DMPs.

CRediT authorship contribution statement

Sagar Bhimrao Gajbe: Conceptualization, Methodology, Formal analysis, Data curation, Writing - original draft, Writing -
Review & Editing, Visualization. Amit Tiwari: Conceptualization, Methodology, Formal analysis, Data curation, Writing - original
draft, Writing - Review & Editing, Visualization. Gopalji: Conceptualization, Methodology, Formal analysis, Data curation, Writing
- original draft, Writing - Review & Editing, Visualization. Ranjeet Kumar Singh: Conceptualization, Methodology, Formal analysis,
Data curation, Writing - original draft, Writing - Review & Editing, Visualization.

Acknowledgments

All authors equally contributed to this work. The authors wish to express their gratitude to Prof./Dr. Devika P. Madalli and retd.
Prof./Dr. A.R.D. Prasad for their constructive comments. We also extend our thanks to the editor and anonymous reviewers for their
valuable comments.

Table 8
Categorization based on the correlation among the tools.
Tool | Highly correlated (above 66.6%) | Moderately correlated (between 33.3%–66.6%) | Loosely correlated (less than 33.3%)
DMPOnline | DMPTool, DSW, TUB-DMP, OpenDMP, UWA DMP, RDMO, DataWiz, DMPTY, easy.DMP, UNSW ResData RDMP, and PARTHENOS DMP | IEDA DMP, ezDMP | None
DMPTool | DMPOnline, DSW, TUB-DMP, OpenDMP, UWA DMP, RDMO, DataWiz, DMPTY, easy.DMP, UNSW ResData RDMP, and PARTHENOS DMP | IEDA DMP, ezDMP | None
IEDA DMP | ezDMP | DMPOnline, DMPTool, DSW, TUB-DMP, RDMO, DataWiz, DMPTY, easy.DMP, and UNSW ResData RDMP | OpenDMP, UWA DMP, and PARTHENOS DMP
ezDMP | IEDA DMP, RDMO, and DataWiz | DMPOnline, DMPTool, DSW, TUB-DMP, OpenDMP, UWA DMP, DMPTY, UNSW ResData RDMP, easy.DMP, and PARTHENOS DMP | None
DSW | DMPOnline, DMPTool, TUB-DMP, OpenDMP, UWA DMP, RDMO, DataWiz, DMPTY, easy.DMP, and PARTHENOS DMP | IEDA DMP, ezDMP, and UNSW ResData RDMP | None
TUB-DMP | DMPOnline, DMPTool, DSW, OpenDMP, easy.DMP, and PARTHENOS DMP | IEDA DMP, ezDMP, UWA DMP, RDMO, DataWiz, DMPTY, and UNSW ResData RDMP | None
OpenDMP | DMPOnline, DMPTool, DSW, TUB-DMP, DataWiz, easy.DMP, UNSW ResData RDMP, and PARTHENOS DMP | ezDMP, UWA DMP, RDMO, and DMPTY | IEDA DMP
UWA DMP | DMPOnline, DMPTool, DSW, DataWiz, UNSW ResData RDMP, and PARTHENOS DMP | ezDMP, TUB-DMP, OpenDMP, RDMO, DMPTY, and easy.DMP | IEDA DMP
RDMO | DMPOnline, DMPTool, ezDMP, DSW, and DataWiz | IEDA DMP, TUB-DMP, OpenDMP, UWA DMP, DMPTY, easy.DMP, UNSW ResData RDMP, and PARTHENOS DMP | None
DataWiz | DMPOnline, DMPTool, ezDMP, DSW, OpenDMP, UWA DMP, RDMO, DMPTY, UNSW ResData RDMP, and PARTHENOS DMP | IEDA DMP, TUB-DMP, and easy.DMP | None
DMPTY | DMPOnline, DMPTool, DSW, DataWiz, and UNSW ResData RDMP | IEDA DMP, ezDMP, TUB-DMP, OpenDMP, UWA DMP, RDMO, easy.DMP, and PARTHENOS DMP | None
easy.DMP | DMPOnline, DMPTool, DSW, TUB-DMP, and OpenDMP | IEDA DMP, ezDMP, UWA DMP, RDMO, DataWiz, DMPTY, UNSW ResData RDMP, and PARTHENOS DMP | None
UNSW ResData RDMP | DMPOnline, DMPTool, OpenDMP, UWA DMP, DataWiz, and DMPTY | IEDA DMP, ezDMP, DSW, TUB-DMP, RDMO, easy.DMP, and PARTHENOS DMP | None
PARTHENOS DMP | DMPOnline, DMPTool, DSW, TUB-DMP, OpenDMP, UWA DMP, and DataWiz | ezDMP, RDMO, DMPTY, easy.DMP, and UNSW ResData RDMP | IEDA DMP

References

Aleixandre-Benavent, R., Vidal-Infer, A., Alonso-Arroyo, A., Peset, F., & Sapena, A. (2020). Research data sharing in Spain: Exploring determinants, practices,
and perceptions. Data, 5(2).
Antonio, M., Schick-Makaroff, K., Doiron, J., Sheilds, L., White, L., & Molzahn, A. (2019). Qualitative data management and analysis within a data repository.
Western Journal of Nursing Research.
Bakos, A., Miksa, T., & Rauber, A. (2018). Research data preservation using process engines and machine-actionable data management plans. In E. Mendez,
C. Ribeiro, G. David, J. Lopes, & F. Crestani (Eds.), LNCS: Vol. 11057, Lecture notes in computer science (including subseries lecture notes in artificial intelligence
and lecture notes in bioinformatics) (pp. 69–80).
Ball, A. (2012). Review of data management lifecycle models. University of Bath, IDMRC.
Baykoucheva, S. (2015). Coping with ‘‘big data’’: escience. In S. Baykoucheva (Ed.), Managing scientific information and research data (pp. 71–84). Chandos
Publishing, (chapter 8).
Bishop, B., Gunderman, H., Davis, R., Lee, T., Howard, R., Samors, R., Murphy, F., & Ungvari, J. (2020). Data curation profiling to assess data management
training needs and practices to inform a toolkit. Data Science Journal, 19(1).
Black, E. (2018). DMP Assistant. Journal of Librarianship and Scholarly Communication, 6(1).
Bowman, M., & Maxwell, R. (2018). A beginner’s guide to avoiding protected health information (PHI) issues in clinical research – with how-to’s in REDCap data
management software. Journal of Biomedical Informatics, 85, 49–55.
Brand, S., Bartlett, D., Farley, M., Fogelson, M., Hak, J. B., Hu, G., Montana, O. D., Pierre, J. H., Proeve, J., Qureshi, S., Shen, A., Stockman, P., Chamberlain, R.,
& Neff, K. (2015). A model data management plan standard operating procedure. Therapeutic Innovation & Regulatory Science, 49(5), 720–729.
Bunkar, A., & Bhatt, D. (2020). Perception of researchers & academicians of Parul University towards research data management system & role of library: A
study. DESIDOC Journal of Library and Information Technology, 40(3), 139–146.
Cardoso, J., Proença, D., & Borbinha, J. (2020). Machine-actionable data management plans: A knowledge retrieval approach to automate the assessment of
funders’ requirements. In J. Jose, E. Yilmaz, J. Magalhaes, F. Martins, P. Castells, N. Ferro, & M. Silva (Eds.), LNCS: vol. 12036, Lecture notes in computer
science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics) (pp. 118–125).
Cauchick-Miguel, P., Moro, S., Rivera, R., & Amorim, M. (2020). Data management plan in research: characteristics and development. In LNICST: Vol. 319,
Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering (pp. 3–14).
Claibourn, M. P. (2015). Bigger on the inside: building research data services at the University of Virginia. Insights, 28(2), 100–106.
Cope, J. (2013). Using DMPonline with postgraduate research students. University of Bath.
DataWiz (2017). Welcome to DataWiz knowledge base. Retrieved from https://datawizkb.leibniz-psychology.org/index.php/before-my-project-starts/data-management-plans/.
Delserone, L. M. (2008). At the watershed: Preparing for research data management and stewardship at the University of Minnesota Libraries. Library Trends,
57(2), 202–210.
DMPOnline (2010). Welcome. Retrieved from https://dmponline.dcc.ac.uk/.
DMPTool (2011). Welcome to the DMPTool: Create data management plans that meet institutional and funder requirements. Retrieved from https://dmptool.org/.
DMPTY (2015). Wizard for data management plan creation (experimental). Retrieved from https://www.clarin-d.net/en/preparation/data-management-plan.
Dogan, G., Taskin, Z., & Aydinoglu, A. (2020). Research data management in Turkey: A survey to build an effective national data repository. IFLA Journal.
Donnelly, M., & Jones, S. (2011). Checklist for a data management plan. Digital Curation Centre, 3, 03–17.
Donnelly, M., Jones, S., & Pattenden-Fail, J. W. (2010). DMP Online: the Digital Curation Centre’s web-based tool for creating, maintaining and exporting data
management plans. International Journal of Digital Curation, 5(1), 187–193.
DSW (2015). Data Stewardship Wizard: Create smart data management plans for FAIR open science. Retrieved from https://ds-wizard.org/.
easy.DMP (2015). Easy.DMP: Data management plan generator. Retrieved from https://easydmp.sigma2.no/.
Exner, N. (2018). Data management support for faculty facing new funding mandates: The case of the U.S. Department of Agriculture’s National Institute of Food
and Agriculture. New Review of Academic Librarianship, 24(1), 90–104.
ezDMP (2018). ezDMP: Data management plans made easy. Retrieved from https://ezdmp.org.
Giorgio, S., & Ronzino, P. (2018). Parthenos data management plan template for open research in archaeology. In A. Addison, & H. Thwaites (Eds.), 2018 3rd
digital heritage international congress (DigitalHERITAGE) held jointly with 2018 24th international conference on virtual systems multimedia (VSMM 2018) (pp.
1–4). Institute of Electrical and Electronics Engineers Inc.
Gupta, S., & Müller-Birn, C. (2018). A study of e-research and its relation with research data life cycle: A literature perspective. Benchmarking: An International
Journal, 25(6), 1656–1680.
Han, J., Haihong, E., Le, G., & Du, J. (2011). Survey on NoSQL database. In 2011 6th international conference on pervasive computing and applications (pp. 363–366).
IEEE.
Holles, J., & Schmidt, L. (2018). Graduate research data management course content: Teaching the data management plan (DMP). In ASEE annual conference and
exposition, conference proceedings, Vol. 2018-June. American Society for Engineering Education.
Houston, L., Probst, Y., Yu, P., & Martin, A. (2018). Exploring data quality management within clinical trials. Applied Clinical Informatics, 9(1), 72–81.
IEDA DMP (2011). Data management plan (DMP) tool. Retrieved from https://www.iedadata.org/dmp/.
Jones, S., Pergl, R., Hooft, R., Miksa, T., Samors, R., Ungvari, J., Davis, R. I., & Lee, T. (2019). Data management planning: How requirements and solutions
are beginning to converge. Data Intelligence, 208–219.
Kaari, J. (2020). Researchers at Arab universities hold positive views on research data management and data sharing. Evidence Based Library and Information
Practice, 15(2), 168–170.
Kamocki, P., Mapelli, V., & Choukri, K. (2019). Data Management Plan (DMP) for language data under the new General Data Protection Regulation (GDPR) (pp.
135–139). European Language Resources Association (ELRA).
Kennan, M. A. (2018). Managing research data. In K. Williamson, & G. Johanson (Eds.), Research methods: Information, systems, and contexts (2nd ed.). (pp.
505–515). Chandos Publishing, (chapter 21).
Kim, Y. (2017). Fostering scientists’ data sharing behaviors via data repositories, journal supplements, and personal communication methods. Information Processing
& Management, 53(4), 871–885.
Kuberek, M. (2018). Guidance for creating a data management plan in Horizon 2020 projects.
Lefebvre, A., Bakhtiari, B., & Spruit, M. (2020). Exploring research data management planning challenges in practice. IT - Information Technology, 62(1), 29–37.
de León, M., & de Ferrer, L. (2018). From open access to open data: collaborative work in the university libraries of Catalonia. LIBER Quarterly, 28(1).
Levitin, A., & Redman, T. (1995). Quality dimensions of a conceptual view. Information Processing & Management, 31(1), 81–88.
Mallery, M. (2014). DMPTool: Guidance and resources for your data management plan. Technical Services Quarterly, 31(2), 197–199.
Melero, R., & Navarro-Molina, C. (2020). Researchers’ attitudes and perceptions towards data sharing and data reuse in the field of food science and technology.
Learned Publishing, 33(2), 163–179.
Miksa, T., Cardoso, J., & Borbinha, J. (2019). Framing the scope of the common data model for machine-actionable data management plans (pp. 2733–2742). Institute
of Electrical and Electronics Engineers Inc.

Miksa, T., Simms, S., Mietchen, D., & Jones, S. (2019). Ten principles for machine-actionable data management plans. PLoS Computational Biology, 15(3).
Nightingale, A. (2020). Data management plans: Time wasting or time saving? Biochemist, 42(3), 38–39.
OpenDMP (2017). Welcome to Argos: Create, link, share data management plans. Retrieved from https://opendmp.eu/home.
Parham, S., Carlson, J., Hswe, P., Westra, B., & Whitmire, A. (2016). Using data management plans to explore variability in research data management practices
across domains. International Journal of Digital Curation, 11, 53–67.
PARTHENOS DMP (n.d.). Data management plan: Parthenos project. Retrieved from http://www.parthenos-project.eu/portal/dmp.
Pergl, R., Hooft, R., Suchánek, M., Knaisl, V., & Slifka, J. (2019). “Data Stewardship Wizard”: A tool bringing together researchers, data stewards, and data experts
around data management planning. Data Science Journal, 18(1).
RDMO (2017). RDMO: Research data management organiser. Retrieved from https://rdmorganiser.github.io/en/.
Redkina, N. (2019). Current trends in research data management. Scientific and Technical Information Processing, 46(2), 53–58.
Reilly, M., & Dryden, A. (2013). Building an online data management plan tool. Journal of Librarianship & Scholarly Communication, 1(3), 1–11.
ResData (n.d.). ResData. Retrieved from https://resdata.unsw.edu.au/pages/authenticate.faces.
Romanos, N., Kalogerini, M., Koumoulos, E., Morozinis, A., Sebastiani, M., & Charitidis, C. (2019). Innovative data management in advanced characterization:
Implications for materials design. Materials Today Communications, 20.
Sallans, A., & Donnelly, M. (2012). DMP Online and DMPTool: Different strategies towards a shared goal. International Journal of Digital Curation, 7(2), 123–129.
Smale, N., Unsworth, K. J., Denyer, G., & Barr, D. P. (2018). The history, advocacy and efficacy of data management plans. bioRxiv, Article 443499.
Stodden, V., Ferrini, V., Gabanyi, M., Lehnert, K., Morton, J., & Berman, H. (2019). Open access to research artifacts: Implementing the next generation data
management plan. Proceedings of the Association for Information Science and Technology, 56(1), 481–485.
Sutter, R. D., Wainscott, S. B., Boetsch, J. R., Palmer, C. J., & Rugg, D. J. (2015). Practical guidance for integrating data management into long-term ecological
monitoring projects. Wildlife Society Bulletin (2011-), 39(3), 451–463.
Swauger, S. (2015). DMPTool. Charleston Advisor, 16(3), 12–15.
TUB-DMP (2015). TUB-DMP. Retrieved from https://dmp.tu-berlin.de/.
UWA DMP (n.d.). Research Data Management Plan submission form. Retrieved from https://uwa.qualtrics.com/jfe/form/SV_elYkWKCFjW3oXMV.
Vitale, C., & Moulaison Sandy, H. (2019). Data management plans: A review. DESIDOC Journal of Library and Information Technology, 39(6), 322–328.
Wiley, C. (2018). Data sharing and engineering faculty: An analysis of selected publications. Science and Technology Libraries, 37(4), 409–419.
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E.,
Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., ... Mons, B. (2016). The FAIR guiding
principles for scientific data management and stewardship. Scientific Data, 3(1), Article 160018.
Williams, M., Bagwell, J., & Nahm Zozus, M. (2017). Data management plans: the missing perspective. Journal of Biomedical Informatics, 71, 130–142.
Wittman, J., & Aukema, B. (2020). A guide and toolbox to replicability and open science in entomology. Journal of Insect Science, 20(3).
