2021RDMandSharing AwarnessAttitudeBehaviour
Information Development
1–15
© The Author(s) 2021
Research data management and sharing awareness, attitude, and behavior of academic researchers
Muhammad Rafiq
University of the Punjab
Kanwal Ameen
University of Home Economics, Lahore
Abstract
This study assesses the research data management (RDM) awareness, attitudes, practices, and behaviors of Pakistan's academic researchers. A quantitative survey research method was adopted to meet the research objectives, using an internationally designed structured questionnaire as the data collection instrument. Data were collected from academics and researchers at four premier universities in Pakistan. By examining respondents' self-reported opinions on an extensive set of structured questionnaire items, the study reveals the data file formats they use and produce, their data acquisition sources, data storage patterns, metadata and tagging practices, data sharing patterns, and their RDM awareness, attitudes, and behavior. It offers a comprehensive assessment of the phenomenon from the perspective of a developing country where research data management policies are absent at both the national and institutional levels. The findings have theoretical implications for researchers and practical implications for policymakers, university administrators, university library administrators, and educational trainers.
Keywords
data literacy, data management practices, data sharing behaviors, data management skills, data management
training, higher education, metadata behaviors, tagging behaviors, Pakistan
mandatory for all US federally funded agencies and research bodies with budgets above a threshold of $100 million, to ensure the OA availability of research data and products. UK Research and Innovation's Data Policy, Common Principles on Data Policy (UKRI, n.d.), and Concordat on Open Research Data (UKRI, 2016) established the framework for effective research data management and sharing by researchers. Similar developments have taken place in other regions such as Australia (National Health and Medical Research Council, 2018), Canada (Government of Canada, 2016), and the European Union (Donnelly, 2017). However, in the context of developing countries, policies on data management and sharing do not exist.

Research data may be defined as "Data that are used as primary sources to support technical or scientific enquiry, research, scholarship, or artistic activity, and that are used as evidence in the research process and/or are commonly accepted in the research community as necessary to validate research findings and results. All other digital and non-digital content has the potential of becoming research data. Research data may be experimental data, observational data, operational data, third-party data, public sector data, monitoring data, processed data, or repurposed data" (CASRAI, n.d.).

Data may be conceptualized in different ways and from different perspectives. Researchers and scholars work with many kinds of data and sources. Humanities scholars might talk about their primary sources or texts. Social scientists think in terms of survey results, interviews, observations, and tests. Natural science data come from experiments and observations. Research data can also be qualitative, quantitative, or both, and available in print, analogue, or digital formats. It may be further divided into numeric, image, audio, video, text, tabular, modeling, spatial, and instrumentation data, among others. A single type of digital data can also come in various formats; image data, for example, may be in JPEG, BMP, GIF, TIFF, JFIF, Exif, PNG, BAT, BPG, or WebP format. Such dynamics add complexity to Research Data Management (RDM).

CASRAI (n.d.) defines RDM as "the storage, access, and preservation of data produced from a given investigation. Data management practices cover the entire lifecycle of the data, from planning the investigation to conducting it and from backing up data as it is created and used to the long-term preservation of data deliverables after the research investigation has concluded. Specific activities and issues that fall within the category of data management include: file naming (the proper way to name computer files); data quality control and quality assurance; data access; data documentation (including levels of uncertainty); metadata creation and controlled vocabularies; data storage; data archiving and preservation; data sharing and re-use; data integrity; data security; data privacy; data rights; notebook protocols (lab or field)."

Research data is a valuable resource and requires a great deal of time and other resources to create, preserve, manage, and re-use. Researchers play a vital role in RDM and sharing. They are the primary actors in collecting, organizing, and sharing data; they hold the rights to their data, and sharing data with others largely depends on their will and attitude. The policies of major funding bodies around the world put the burden of sharing research data on researchers, and researchers' awareness, attitude, and behavior are therefore central to successful RDM and sharing practices. Thus, considering the importance of RDM and the role of researchers, it seemed appropriate to conduct a study to assess the RDM practices, awareness, and attitude of Pakistani researchers. The literature search revealed a dearth of studies on the topic. This study is based on primary data collected from Pakistani researchers in connection with an international project on RDM literacy skills, and it is the first study from Pakistan to address RDM in the context of individual researchers.

Literature review
Major databases, including ScienceDirect, Emerald Insight, LISTA, LISA, and Google Scholar, were searched by formulating and applying different search strategies to identify the related literature on the topic. A review of the identified literature is presented in this section.

In terms of current RDM practices, the literature showed that an individual faculty member's research data falls within the gigabyte range (Akers and Doty, 2013; Chen and Wu, 2017). To store data, particularly during an active research project, researchers rely heavily on personal computers (Chen and Wu, 2017; Wolff-Eisenberg et al., 2016), local computers (Aydinoglu, Dogan, and Taskin, 2017), and, most often, storage at their institutions (Berghmans et al., 2017). Cloud-based storage such as Google Drive and Dropbox is used by a smaller proportion of researchers (Wolff-Eisenberg et al., 2016). For long term storage and back up, basic
Rafiq and Ameen: Research data management and sharing awareness, attitude, and behavior of academic researchers 3
science researchers used university-based servers and relied heavily on specialized instruments such as hard drives (Akers and Doty, 2013; Burnette, Williams, and Imker, 2016), whereas arts and humanities researchers depend heavily on personal computers and external hard drives as well as Internet-based storage (Akers and Doty, 2013). After a research project is completed, data sharing, that is, depositing data in data repositories, is primarily driven by the requirements of funding bodies. A large number of repositories are available (Data Repositories, 2018; Registry of Research Data Repositories, 2018; Scientific Data, 2018; University of Minnesota, 2018) for a variety of disciplines, including the humanities, social sciences, science and technology, health sciences, and other allied and multidisciplinary subjects. However, at present, data sharing in most scientific disciplines (with notable exceptions in genomics, astronomy, and physics) is still nominal (Piwowar, 2011; Waard, Rotman, and Lauruhn, 2014), and research data sharing practices vary at the discipline level (Mallasvik and Martins, 2020). Berghmans et al. (2017) also noted the relationship between data sharing practices and the field of research. In academic disciplines such as Soil Science, Human Genetics, and Digital Humanities, where data sharing is well established and researchers work collaboratively, data sharing is integral. Open data practices are less uniform in other fields, where data remains with the researcher in personal, departmental, or institutional archives. On the other hand, Mallasvik and Martins (2020) indicated that "research data sharing behaviors are heavily mediated by institutional rules and rationalities that inform researchers' attitudes".

Complexities in data formats, standards, infrastructure, etc., pose challenges for researchers in meeting RDM requirements. The ease with which researchers collect large and complex data sets is outpacing their knowledge and skill to properly manage them (Whitmire et al., 2015). Issues with the accuracy, completeness, and timeliness of data, disparities in metadata standards, and incompatibility between commercial products and institutional databases and online systems further complicate RDM. In many studies, researchers showed willingness to share their data (Aydinoglu et al., 2017; Berghmans et al., 2017; Burnette et al., 2016; Chen and Wu, 2017). However, the extent to which individual researchers actually make their own data available to others is lower (Fecher et al., 2015). Certain fears, concerns, issues, and barriers hinder researchers from sharing their data. A number of studies reported the problems faced by researchers in meeting RDM requirements. Tenopir et al. (2011) identified common barriers encountered by researchers in RDM: insufficient time, lack of funding, and lack of standards for managing research data. Researchers also expressed some degree of concern or anxiety regarding their ability to effectively meet the challenges of RDM (Bardyn et al., 2012). Several other studies highlighted similar issues faced by researchers in meeting RDM requirements, such as insufficient time to handle data management (Federer et al., 2016); data storage, integrity, and backup options (McLure et al., 2014); lack of the necessary technical skills and knowledge (Aydinoglu et al., 2017); and lack of metadata knowledge (Akers and Doty, 2013; Aydinoglu et al., 2017; Burnette et al., 2016).

Van Panhuis et al. (2014) conducted a systematic review of the literature on barriers to data sharing in public health. The study identified several researcher concerns hindering data sharing and divided them into technical, motivational, economic, political, legal, and ethical categories. Van den Eynden and Bishop (2014) also highlighted numerous researcher concerns about data sharing, such as fear of competition, fear of being scooped, the cost in both time and money of preparing data and documentation for sharing, absence of funding, absence of professional rewards for data sharing, lack of standards and data infrastructure, and ethical and legal concerns. Similar findings were reported in a qualitative study by Cheah et al. (2015) and in a questionnaire survey by Schmidt et al. (2016).

In a recent study of researchers at the 25 most productive Turkish universities, a deficiency in the technical skills and expertise needed to meet RDM requirements was observed, along with the absence of RDM policies, procedures, and guidelines; lack of organizational support and training opportunities; lack of finances; and lack of the necessary tools and technical support for researchers to meet RDM requirements (Aydinoglu et al., 2017). A more recent study (Houtkoop et al., 2018) of 600 psychology researchers (identified from Web of Science) also established that researchers have certain concerns about data sharing, such as the perception that sharing data requires extra work, lack of training on sharing data, and fears that data might be misinterpreted and that they might be scooped.

Researchers feel the need for enhanced skills and support in RDM. Researchers showed interest in faculty workshops on data management practices and assistance in preparing data management plans
for grant applications (Akers and Doty, 2013). Chen and Wu (2017) identified researchers' requirements for RDM services, which include: tools for data recording and processing; an introduction to the policies of research funding agencies and the data requirements of academic journals; methods and standards for collecting data; and ways of publishing and submitting data papers. The researchers expressed their intention to access research data services (RDS) through special lectures, social media, online courses, phone, email, instant messengers, training, a platform for knowledge exchange and sharing, workshops, library microblogging, etc. In Pakistan, Piracha and Ameen (2018) conducted a small-scale qualitative study on the RDM practices of university faculty members. The data were collected through semi-structured interviews with ten purposively selected faculty members of the University of the Punjab, Lahore. The study reported that faculty members store their research data on personal computers and devices but prefer a central repository on their university premises for long-term data storage. The study also reported the need for metadata training and RDM guidance for faculty members.

The literature review established that both quantitative and qualitative studies exist on RDM in the context of the developed world, where data management and sharing policies of governments and research funders have been devised and implemented. However, in developing countries, policies on data management and sharing do not exist. Thus, developing countries such as Pakistan present a special context to be studied. There is a scarcity of studies on the RDM awareness, attitudes, and behaviors of researchers in developing countries, particularly Pakistan. This study fills this gap in the literature and comprehensively addresses the phenomenon by unfolding the perceived awareness, attitude, and typical behaviors of academic researchers with regard to the types of data used and produced, sources of data acquisition, data storage patterns and preferences, assigning metadata to research data, and collaboration and data sharing.

Research objectives
The objectives of the study were to:

1. identify the patterns of research data used and produced by the academic researchers.
2. determine the research data collaboration and sharing practices.
3. explore the state of awareness, practices, and attitude of academic researchers regarding RDM.
4. assess the status of RDM training needs of academic researchers.

Research design and methodology
The study adopted a quantitative research design and conducted a questionnaire-based survey to collect data. The study was part of an international multicultural research project that aimed to collect data about the data literacy and RDM skills of academics and researchers in higher education institutions of different countries and to compare the findings by discipline and participating country. The questionnaire was developed by a team of academics from information schools in the UK and Turkey. We participated in this survey from Pakistan and were granted permission to use the data for a separate publication, which we are submitting here.

This study's sample included academic researchers from four premier institutions of Pakistan: University of the Punjab, Lahore; GC University, Lahore; University of Engineering and Technology, Lahore; and National University of Sciences and Technology, Islamabad. The first two universities are the oldest institutions of general education, while the other two are top national universities of specialized disciplines in the country. The survey was launched online, and respondents were invited to participate through emails, listservs, and university teachers' social media groups. After multiple follow-ups, the study received 271 responses. However, 11 responses were incomplete and were discarded, leaving 260 responses that were analyzed to present the findings. The data were analyzed by applying appropriate statistics using SPSS, Ver. 20.

Data analysis
Demographic characteristics of respondents
Data on demographic variables show that the majority of the respondents were male (55%), research students (57%), aged 26–35 years (70%), and had less than five years of research experience. Half of the respondents belonged to science and technology disciplines, followed by social sciences (45.4%) and humanities (4.6%).
Re-use of data
File types used by the respondents
Researchers were predominantly (81.2%) using standard office documents such as text files, spreadsheets, and presentations. In contrast, Internet and web-based data (webpages, emails, blogs, social network data, etc.), images (JPEG, GIF, TIFF, PNG, etc.), and structured scientific and statistical data were each used by around 50% of respondents (Figure 1).

Figure 1. File types of data used (N = 260).

Sources of data acquisition
The majority of the respondents (151; 58%) acquire data from multiple known sources, and half (132; 51%) of the respondents create new data (Figure 2). A little less than half of the respondents acquire data from their own research team/group at the university (111; 43%) and/or from their own research networks, such as personal and professional networks (107; 41%).

Use of data acquired from others/outside resources
Almost half of the respondents (126; 49%) use acquired data for their research only after spending a lot of time and effort to make it usable for the project. A similar percentage (123; 47%) reported using such data after a bit of effort for some cleaning and modifications, while 44 (17%) respondents mentioned that they do not use data from outside sources. Only 13% of respondents mentioned that they use data as it is, without any problem, for their research (Figure 3). Overall, most of the respondents have to modify or clean the data before using it, and such work requires significant time and effort.

Data produced
File types of data produced
Most of the respondents (189; 73%) produced data in standard office formats during their research projects (Figure 3). Structured scientific and statistical data formats (e.g. SPSS, GIS, etc.) were produced by about 50%, followed by data in image formats (JPEG, GIF, TIFF, PNG, etc.), produced by 44%. Only 21 (8%) respondents mentioned producing audio files and source code as data during their research work. Thus, in most cases, respondents produce data in
Figure 10. Preferred location of data storage for long term access (N = 260).
publications, and contributing data sets do not help in this regard. It is necessary to incorporate policies that recognize data contributors and to introduce a reward mechanism that counts data set contributions alongside academic publications for academic promotions and incentives.

RDM awareness, practices, and attitude
More than half of the respondents had never used a data management plan (DMP) for their research projects. Almost 80% were either uncertain whether DMPs were available in their institutions or considered that their institutions did not have any DMP. These two findings portray the primitive stage of RDM in Pakistani universities: from an institutional perspective, RDM does not exist in the Pakistani higher education sector. The main funding agencies in Pakistan are the Higher Education Commission of Pakistan (HEC) and the Provincial Higher Education Commissions (PHECs). Neither has an RDM policy or strategy, and neither requires an RDM plan from the researchers it funds. The situation is similar at the universities, which do not have formal policies, procedures, software, infrastructure, staff, or services to support researchers in RDM activities. For example, the University of the Punjab, Lahore (the oldest and largest institution in the country) funds research projects every year, but no RDM policy exists. This situation is similar to the finding of Aydinoglu et al. (2017) in Turkey. It is recommended that the HEC and PHECs devise a formal policy and set up a mechanism (procedures, software, hardware, training, etc.) to establish RDM in research activities in Pakistan. This is essential to meet the new norms and requirements of scholarly communication and the scientific community. As mentioned in this paper's introductory section, funding agencies worldwide have already made data management plans mandatory for every research project funded by public funds.

Respondents cite research data often. Thus, it may be inferred that Pakistan's academic researchers have better skills in citing research data, while possessing a moderate level of skill in working with open data, following file naming conventions or standards, working with restricted data, controlling file versions, and using their own/in-house tags and metadata. However, they lack the skills to use metadata standards to tag their data and to use datasets tagged with standard metadata, as the respondents rarely practiced these two activities. It is interesting to note that almost half of the respondents do not know or are uncertain about the term metadata. This finding is similar to that of Aydinoglu et al. (2017), who found a lack of metadata knowledge among Turkish researchers.

The respondents have a positive attitude towards RDM and see a major role for their universities in recommending standard file naming systems and metadata sets to manage their research data. However, they are uncertain about sharing their data. These uncertainties stem from the absence of an established RDM mechanism (policies, procedures, hardware, software, training, services, etc.) in universities, and they may be reduced by putting an RDM mechanism in place at the national and institutional levels.

Training attained and needs
The data revealed that RDM training is largely absent. Even the most commonly attended training was attended by less than one-third of the respondents, and, interestingly, that training was about data citation styles. More than 80% of the respondents had never attended any training on metadata or DMPs. The situation regarding training on consistent file naming and version control presents a bleak picture too. It is encouraging that the respondents showed great interest in getting training on DMPs, metadata, data citation styles, file naming, and file version control. Two-thirds of the respondents mentioned that formal training on metadata would be useful for managing research data. This is the area where institutions need to plan and intervene by actively organizing seminars, workshops, and training programs. Such programs may be instrumental in addressing the challenges of RDM by enhancing researchers' knowledge and skills. With enhanced knowledge of data formats, metadata standards, the RDM framework, and the requirements of policy institutions, funding bodies, journals, and data repositories, along with enhanced RDM skills and technical capabilities, researchers may be more inclined to manage and share their data as an imperative.

Originality
The study offers a comprehensive assessment of the phenomenon from the perspective of a developing country where research data management policies are absent at the national and institutional levels. The study fills a gap in the literature and is the first of its kind in the context of Pakistani researchers.
Registry of Research Data Repositories (2018) Available at: https://www.re3data.org/.
Rice R and Haywood J (2011) Research data management initiatives at University of Edinburgh. International Journal of Digital Curation 6(2): 232–244.
Schmidt B, Gemeinholzer B and Treloar A (2016) Open data in global environmental research: The Belmont Forum's Open data survey. PLoS One 11(1): e0146695. https://doi.org/10.1371/journal.pone.0146695.
Scientific Data (2018) Recommended data repositories. Available at: https://www.nature.com/sdata/policies/repositories#close (accessed 25 July 2020).
Tenopir C, Allard S, Douglass K, et al. (2011) Data sharing by scientists: Practices and perceptions. PLoS One 6(6): e21101. https://doi.org/10.1371/journal.pone.0021101.
UK Research and Innovation (UKRI) (2016) Concordat on open research data. Available at: https://www.ukri.org/files/legacy/documents/concordatonopenresearchdata-pdf/ (accessed 16 June 2020).
UK Research and Innovation (UKRI) (n.d.) Data policy - UK Research and Innovation. Available at: https://www.ukri.org/funding/information-for-award-holders/data-policy/ (accessed 16 June 2020).
University of Minnesota (2018) Discipline-based data archives. Available at: https://www.lib.umn.edu/data-management/datacenters (accessed 18 June 2020).
Van den Eynden V and Bishop L (2014) Sowing the seed: incentives and motivations for sharing research data, a researcher's perspective. Available at: https://repository.jisc.ac.uk/5662/1/KE_report-incentives-for-sharing-researchdata.pdf (accessed 5 July 2020).
Van den Eynden V and Corti L (2020) The importance of managing and sharing research data. In: Corti L, Van den Eynden V, Bishop L and Woollard M (eds) Managing and Sharing Research Data. London: Sage Publications, 1–32.
Van Panhuis WG, Paul P, Emerson C, et al. (2014) A systematic review of barriers to data sharing in public health. BMC Public Health 14(1): 1–9.
Waard AD, Rotman D and Lauruhn M (2014) Research data management at institutions: visions, bottlenecks and ways forward. Library Connect Digest: 18–21. Available at: https://www.elsevier.com/__data/assets/pdf_file/0020/1046603/Library-Connect-Digest-2014.pdf (accessed 15 March 2021).
Whitmire AL, Boock M and Sutton SC (2015) Variability in academic research data management practices: Implications for data services development from a faculty survey. Program: Electronic Library and Information Systems 49(4): 382–407.
Wolff-Eisenberg C, Rod AB and Schonfeld RC (2016) UK Survey of Academics 2015: Ithaka S+R | Jisc | RLUK. Available at: https://doi.org/10.18665/sr.282736 (accessed 15 March 2021).

About the authors
Muhammad Rafiq is currently serving as Associate Professor at the Department of Information Management, University of the Punjab, Lahore, Pakistan. Dr Rafiq has over 20 years of experience in teaching, conducting and supervising research, and administering libraries and archives of government and non-government organizations. He has published several research articles, book chapters, and conference papers in esteemed journals and book series. Dr Rafiq has also received the Jay Jordan IFLA/OCLC Early Career Development Fellowship 2005, the ASIS&T Best Paper Award 2009, and a Fulbright Fellowship, USA, for his post-doctoral studies at the State University of New York at Buffalo, NY, USA. He has also served as the Editor of the Pakistan Journal of Information Management and Libraries (PJIM&L), a Scopus journal, since 2014. His research interests are open access, research data management, social media, ICT applications in information settings, and digital libraries.
Contact: Institute of Information Management, University of the Punjab, Lahore [Pakistan]. Email: rafi[email protected].pk | drrafi[email protected] Cell: +92(0)333-3110909

Kanwal Ameen was a Professor in Information Management at the University of the Punjab, Lahore [Pakistan]. She has served as chairperson (2009–2018) of the Department of Information Management (University of the Punjab), Chairperson of the Doctoral Program Coordination Committee (DPCC), and Director of the Directorate of External Linkages. Since 2018, she has been serving as Vice-Chancellor of the University of Home Economics, Lahore. Ameen is one of the most prolific authors in her discipline in Pakistan, has published more than 150 scholarly publications, and remained the chief editor of the Pakistan Journal of Information Management and Libraries until 2018. She has also served ASIS&T, ALISE, and IFLA in various capacities. Contact: Institute of Information Management, University of the Punjab, Lahore [Pakistan]. Email: [email protected]