Enhancing The Reuse of Scientific Experiments For Agricultural Software Ecosystems

J Grid Computing (2021) 19: 44
https://doi.org/10.1007/s10723-021-09583-x
Lenita Ambrósio · Heitor Linhares · José Maria N. David · Regina Braga · Wagner Arbex · Mariana Magalhães Campos · Rafael Capilla
Received: 24 March 2021 / Accepted: 29 August 2021 / Published online: 26 October 2021
© The Author(s), under exclusive licence to Springer Nature B.V. 2021
L. Ambrósio · H. Linhares · J. M. N. David · R. Braga (*) · W. Arbex
Graduate Program in Computer Science, Federal University of Juiz de Fora, Juiz de Fora, MG, Brazil

W. Arbex · M. M. Campos
Embrapa – Dairy Cattle, Juiz de Fora, MG, Brazil

R. Capilla
Universidad Rey Juan Carlos, c/ Tulipán s/n, 28933 Madrid, Spain

Abstract Scientific experiments involve complex interactions between geographically distributed researchers, who act as units and require a substantial volume of data and services. This scenario characterizes a Scientific Software Ecosystem, in which researchers and scientists work together, using scientific software and related services through scientific workflows. However, to develop scientific workflows, users need suitable platforms that meet experiment requirements. This paper describes two services designed on top of an open-source E-science Software Ecosystem (E-SECO) platform to support researchers’ activities during the scientific workflow life cycle. These services focus on data integration and provenance data to support experiment reuse. We conducted two case studies at the Brazilian Agricultural Research Corporation (Embrapa). Our results show that the proposed platform facilitates the reuse of scientific experiments through data integration in an agricultural context.

Keywords Software ecosystem · Experiment reuse · Integration · Workflow · Scientific experiments · E-Science

1 Introduction

Scientific workflows [1, 2] are widely used in e-Science to support experiments. These experiments involve interactions between researchers, including aspects such as the use of large amounts of data and the need for distributed computing resources and services. Furthermore, they require intense relationships between the resources and applications that support workflows, and between geographically distributed researchers. In this context, research institutions had to open their frontiers to collaborate with external developers, similarly to what software companies did [3], giving rise
to a new concept of development in which several software solutions, companies, and developers adhere to a common platform. This new approach is called a Software Ecosystem.

A software ecosystem (SECO) is a set of actors who collaborate and interact with a common market. These interactions are often underpinned by a common technological platform that operates through the exchange of information, resources, and artifacts [4].

Scientific workflow specification is not a trivial task. It requires specialized knowledge, often interdisciplinary understanding, and some computing skills from scientists. Besides, researchers often interact across geographically distributed sites to reuse experiments and integrate applications that can support the execution of these experiments. In many cases, the data used and generated by scientific experiments, and parts thereof, need to be reused. These complexities create barriers to developing and reusing workflows designed by different scientists. As a result, experiments are frequently reworked, which adds to the already high cost of specification and execution.

Scientific experimentation is usually a collaborative activity. It goes through a life cycle that begins with investigating the problem, followed by prototyping and executing the experiment [5]. There are several reasons for sharing scientific data, including: (i) from an ethical point of view, results of publicly funded research should be made available to the public; (ii) open science makes research more transparent by allowing research results to be reproduced; and (iii) open science fosters a more collaborative and efficient science, thus maximizing the social and economic benefits of research [6].

When reusing experiments, it is also necessary to support the integration of the data [7] associated with the applications used by the experiments being reused. Supporting workflow reuse alone is not enough: integrating applications and their data to support the execution of computational experiments on the ecosystem platform is also important. In this scenario, context and provenance information play an important role. Storing and retrieving contextual information during the experimentation process can be critical, since without it the activities performed may not be reproducible or reusable [8]. Contextual information includes, for instance, information about the domain of the experiment and the technologies used in each activity.

In scientific experimentation, provenance data refers “to a kind of contextual element that describes information in the past” [9]. Therefore, provenance information is essential for researchers to understand, reproduce, examine, and audit experiment results. Provenance data can be considered information about the parts involved in the production of an object [10]. Provenance information is critical for scientific workflows to support the reuse of computational experiments, interpret results, and diagnose problems, as proposed by [11–14]. However, these solutions do not support the reuse and integration of such data in scientific experiments on SECO platforms.

Therefore, the research problem addressed in this paper is how to support data integration and experiment reuse across third-party applications to enhance the collaborative experimentation process.

To tackle this problem, this paper presents two services aimed at supporting the development of scientific experiments with respect to experiment reuse, data integration, and provenance data. We designed the services on top of a Scientific Software ECOsystem named E-SECO [15]. We aimed to create a shared environment that allows the reuse of experiments through data integration support. The services were evaluated following the Design Science Research (DSR) methodology [16–18] through two case studies that investigated integration and reuse issues in the use of E-SECO in scientific projects. Under the DSR methodology, we carried out two cycles. In the first cycle, we constructed the FeedEfficiencyService artifact and carried out a case study to evaluate this service. In the second cycle, we developed the ContextProv artifact and performed the second case study. The case studies refer to a scenario where E-SECO provided an environment to support integration, as well as data reuse from several experiments related to dairy cattle feed efficiency. This scenario considers that feed is the main issue in rearing livestock for either milk or meat production. Animals that use feed more efficiently consume less to achieve the same production level, making them more profitable and producing more food per unit area.

Our main contributions in this research are: (i) the specification and implementation of a service, named FeedEfficiencyService, on top of the E-SECO ecosystem platform; and (ii) the specification and implementation of ContextProv, a service that enhances experiment reuse, considering the context and provenance information lifecycle.

This paper is organized as follows. Section 2 presents related work and a discussion of the main contributions of our research. Section 3 describes the methods and materials employed. Section 4 presents the results of the study. Finally, concluding remarks are presented in Section 5.

2 Related Work

In the SECO context, Manikas [19] and Jansen [20] note that the literature focuses on issues related to open-source software repositories. Such repositories allow, for example, data mining and provide information about the technical and social perspectives of a project, such as code, dependencies, and changes. Manikas [19] highlights that the existing literature concentrates on integration and includes studies on integration-based approaches comprising strategies for integration, software integration, integration of knowledge resources, integration of platforms, or business integration. Additionally, Manikas [19] argues that a SECO should allow the integration and existence of multiple functionalities in a safe and adequate way, emphasizing the value of modularization and the importance of reuse support.

Parrot, Lacroix and Wade [21] developed a multi-agent collaborative architecture to support dairy industry decisions. In their approach, the authors used an ontology to map meanings from different domains and establish communication between agents. However, they did not use the ontology to process new knowledge based on inference engines, nor did they use it as an integration model. Thus, this multi-agent collaborative architecture does not produce new knowledge through the ontology. Nor does it propose integrating experiments, unlike our work, which uses direct relationships to integrate data and inference algorithms (Pellet) to discover new knowledge. These relationships and the new knowledge help improve the integration process, including new connections between information sources.

Janssen et al. [22] described the SEAMLESS (System for Environmental and Agricultural Modeling) architecture, which integrates databases from different domains, such as climatic conditions, soil, and cropping patterns. The authors developed a collaborative ontology to support the interdisciplinary nature of the research, focusing on mapping these databases. Compared to SEAMLESS, our work enables discovering new knowledge as well as integrating and sharing such knowledge among the different research centers.

Hulsegge et al. [23] described a global program aimed at developing an Animal Trait Ontology (ATO). The authors developed two livestock ontologies, i.e. REPO (Reproductive Trait and Phenotype Ontology) and HPIO (Host-Pathogen Interactions Ontology). REPO focuses on female fertility in cattle, while HPIO focuses on the interactions between pigs and Salmonella. Jonquet et al. [24] present a platform called AgroPortal that receives and hosts ontologies, aligning them and enabling data reuse in agricultural software applications. However, these papers neither use these ontologies as integration models nor use provenance data to support reuse in the context of experiments.

We also analyzed other solutions to identify how they support data reuse in scientific experiments on SECO platforms. Karma [11, 14] is a framework that captures provenance data for scientific workflows. It provides guidelines for storing and querying provenance data in a relational database. However, this solution does not infer implicit knowledge, nor does it provide information to enhance data reuse. Besides, it does not provide adequate visualizations for users unfamiliar with provenance data, and it does not support the execution of scientific experiments on SECO platforms.

SciCumulus [13] is a middleware created to orchestrate scientific workflows through Workflow Management Systems in distributed and parallel environments. This approach offers a real-time provenance capture service. The provenance data is stored with adequate granularity, at the activity level and at execution time. As a result, it is possible to monitor the workflow status and evaluate the results available during execution. This solution is based on a provenance model that considers both the data descriptors related to the cloud environment and the data related to the workflows’ structure and execution (prospective and retrospective provenance). SciCumulus focuses on the provenance of workflows, so it does not manage context explicitly, which could hinder data reuse. Its visualizations are not aimed at reusing scientific experiments on SECO platforms, which can make users’ interpretation during reuse more difficult.

Da Cruz and Do Nascimento [25] resort to data provenance and reuse to support the reproducibility of agronomic experiments. The authors present a
framework that manages the data of such experiments. As in our proposed solution, the authors capture the prospective and retrospective provenance of experiments, but they reuse scripts to support agronomic experiments. In contrast, we propose services integrated into a SECO platform to enhance reuse and integration in the scope of agricultural research experiments.

ProM [26] is a framework that uses comparison algorithms for mining both imperative and declarative processes. The purpose of this tool is to assist specialists in planning scientific experiments by discovering workflow models. It uses the provenance data generated by workflows that had good results in the past. It also helps domain experts visualize and understand the similarities between instances of the workflow. Consequently, experiments can be reused, shared, or planned to obtain better results. However, the framework does not offer visualizations that support the reuse of scientific experiments on SECO platforms, which would facilitate data interpretation by users who do not know how to interpret declarative models. Also, it does not consider the contextual information associated with the scientific experiment.

E-SECO ProVersion [27] is an approach that supports configuration management on the E-SECO platform. This approach allows researchers to capture workflow data in different Workflow Management Systems through a web service included in the workflow model. These data are instantiated in the PROV-OEXT ontology, which, through domain-specific rules, detects the evolution and maintenance of workflows. This approach allows the extraction of implicit knowledge through ontologies. On the other hand, the captured contextual and provenance information focuses on the execution of the workflow, so this approach cannot manage provenance during the entire experiment cycle. Besides, it does not consider the experiment’s contextual elements and does not offer visualization components to support the reuse of scientific experiments.

Solutions to support the integration of services in a multi-cloud environment have been discussed considering different vendors [28, 29]. However, in our research, we deal with a domain of scientific experimentation supported by a software ecosystem platform. In this context, we propose a solution that uses integration support to enhance the reuse of scientific experiments.

Therefore, as far as we can ascertain, the studies available in the literature do not propose solutions to support data reuse in the context of a “scientific ecosystem platform”. Moreover, they do not discuss aspects of reuse associated with integration and contextual information to support scientific experiments.

3 Methods and Materials

This study was conducted based on the Design Science Research (DSR) methodology [16–18]. In our study, the FeedEfficiencyService and ContextProv services, developed as our solution for a scientific software ecosystem platform, correspond to the artifact. In each cycle of the case studies, conducting the evaluation generated scientific knowledge. This knowledge helps build new versions of the services that compose the SECO platform solution. To apply DSR, we followed these steps: problem definition, literature review and discussion of existing solutions, artifact development, evaluation, and discussion of results.

We first established the relevance of the problem, namely “to support the development of scientific experiments emphasizing data integration and reuse issues” on Scientific SECO platforms. We investigated related works to find proposals in the literature that deal with integration and reuse support (Section 2).

Hence, we detail our research method in the following sections, considering a project related to feed efficiency for dairy cattle at EMBRAPA. Researchers need to conduct field experiments¹ to calculate different Feed Efficiency (FE) indices and analyze how these indices relate to different characteristics of the animals. Given that researchers need months to complete the experiments, and due to time constraints, it would be unworkable to follow an experiment throughout its execution and only then perform the evaluation. Therefore, we chose to perform case studies using information from experiments already completed.

¹ An experiment is an empirical procedure that arbitrates competing models or hypotheses. Researchers also use experimentation to test existing theories or new hypotheses to support or disprove them. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1635141/. In this vein, an agricultural experiment is usually associated with a scientific method for testing certain agricultural phenomena.

3.1 E-SECO: Scientific Software Ecosystem

E-SECO [15] is a Scientific Software Ecosystem (SSECO) developed in an E-Science project context at
the Federal University of Juiz de Fora, Brazil. It provides a platform that supports carrying out the experiment steps. A scientific SECO usually manages multiple relationships with external research, considering that the scientific context requires specialized and interdisciplinary knowledge, as in E-SECO’s case.

An experimentation process on E-SECO usually follows these steps. During the problem investigation step, scientists look for similar experiments, interact with other researchers using the E-SECO platform, define their goals, and break down the experiment into smaller steps. In the experiment prototyping step, scientists build a prototype by designing workflows and reusing available assets, also accessing artifacts persisted in E-SECO-related repositories. Therefore, researchers can explore the assets and reuse their components to produce new products and provide new artifacts during the experiment prototyping step. As a final step, researchers analyze and publish their results and contributions, using the collaboration support provided by the E-SECO platform.

Figure 1 presents the E-SECO platform’s main modules. The E-SECO Development Environment module comprises the Execution Environment, which aims to support the execution of the experiment. The Service Development Process module supports the services constructed on the E-SECO platform, and the Collaborative Services module supports the collaborative activities conducted during experimentation processes. The Development Environment module interacts with additional support layers for the experimentation process. Among these layers, we highlight (i) the Group Memory layer, which encompasses E-SECO databases, (ii) the Interoperability layer, and (iii) the Provenance and Context layer and the Integration layer, which are detailed further in this paper.

Fig. 1 Overview of the E-SECO ecosystem platform highlighting the solution (dashed box)

The Provenance and Context layer (ContextProv) is responsible for capturing, storing, performing inferences over, and sharing information from scientific experiments. In the View layer, the information is organized in the user interface, providing a graphical representation of the entire experimentation process.

The Interoperability layer comprises the PRIME (PRagmatic Interoperability to MEaningful collaboration) [30] service, which supports interoperability on the E-SECO platform. PRIME receives a data set, processes the data, and searches for relevant services capable of interoperating. The PRIME service comprises syntactic, semantic, and pragmatic modules. These modules are mainly responsible for indicating whether two or more services can interoperate at a certain interoperability level. Considering that activities on the E-SECO platform demand the support of different interoperability types, PRIME contributes to supporting the integration of different solutions on the platform.

At the syntactic interoperability level, it is necessary to establish the syntax of exchanged data as a standard format. Semantic interoperability is concerned with ensuring that the meaning of data is the same across the applications. In dealing with pragmatic interoperability, we ensure that the message sender and receiver share the same expectations about the outcome of the messages [31].

The Integration layer (FeedEfficiencyService) supports the integration with external applications through
independent components developed for this purpose. Currently, the E-SECO platform connects with the following external platforms: Mendeley², Parsifal³, MyExperiment⁴, BioCatalogue⁵ and Kepler⁶. We implemented these integrations independently since each platform has specific characteristics.

² https://www.mendeley.com/newsfeed.
³ https://parsif.al/.
⁴ https://www.myexperiment.org/home.
⁵ https://www.biocatalogue.org/.
⁶ https://kepler-project.org/.

It is not the aim of this paper to detail the E-SECO platform. Our main goal is to present how experiment reuse can be enhanced by E-SECO through integration support, considering two specific services that support (i) the integration of feed efficiency experiments, and (ii) context and provenance information in feed efficiency experiments.

3.2 Feed Efficiency Service

The primary motivation for specifying FeedEfficiencyService on top of the E-SECO platform was the need to integrate data from experiments carried out by a group of scientists from Embrapa – Dairy Cattle, namely the Animal Nutrition Group, which, like other research groups in different institutions, works with heterogeneous and non-integrated data. Thus, the group needs to integrate and analyze these data together to ensure experiment reuse.

The research group proposed using the E-SECO platform, together with FeedEfficiencyService, to support the execution of the experiments, the cross-analysis of the data, and reuse. It is important to emphasize that these Embrapa researchers had not used the E-SECO platform before. They had been using multiple computational tools to store, sort, and analyze animal data in their experiments. For example, before the adoption of the E-SECO platform and FeedEfficiencyService, the data collected in situ and the calculation bases for indices, results, and classifications were stored in a spreadsheet.

The lack of data integration and the absence of a history tracing how data were derived up to the experiment results hinder experiment reuse. Therefore, we believe that the E-SECO platform could help integrate the heterogeneous databases and reuse experiments through FeedEfficiencyService.

In terms of specific requirements that guided the development of the approach, we highlight: (i) integrated storage of the experiments, considering the heterogeneity of the databases; (ii) definition of a generic integrator model that can also be used by a research center outside the context of feed efficiency; and (iii) creation of visualization tools to support the analysis of the integrated data.

FeedEfficiencyService was specified following a layered architectural model to provide better modularization and decoupling between the services, facilitating integration with other applications. The architecture is composed of the following layers: Wrapper layer, Service layer, Front-End layer, and Ontology layer.

The Wrapper layer was specified to translate the data from multiple experiments stored in heterogeneous databases into an integrator model, named FEO, provided by the Ontology layer. The Wrapper layer considers specific data from the analyzed experiments. However, if new experiments related to the feed efficiency domain need to be included, they can be easily integrated into the architecture. It should be noted that the Wrapper layer will not be necessary for experiments conducted after the adoption of the architecture.

The Service layer provides a RESTful web service for applications that need to consume data from, and provide information to, FeedEfficiencyService and E-SECO. It provides services for the storage, management, and querying of data, as well as interoperability with other applications and services. The Service layer enables a researcher to use data and services related to dairy cattle feed efficiency and to share information with other researchers, while also providing remote access from multiple devices. The Ontology layer aims to support the integration and analysis of data from experiments. This layer encompasses the FEO⁷ (Fig. 2). Finally, the Front-End layer provides researchers with an interface for using FeedEfficiencyService through a web application⁸.

⁷ https://github.com/mateusgon/FeedEfficiencyServiceBase/tree/master/data
⁸ https://github.com/mateusgon/FeedEfficiencyServiceBase
⁹ In computation, an ontology is a formal and explicit specification of a shared conceptualization [36].

3.2.1 FEO Ontology

We developed the integrator model based on an ontology⁹ specification for dairy cattle feed and
nutrition, called Feed Efficiency Ontology (FEO)¹⁰ (Fig. 2). This ontology allows the semantic integration of related experiments.

The FEO ontology supports researchers in animal classification and data interoperability, so that they can perform cross-analyses and discover new connections between experiments. Besides, because the animals need to be classified according to efficiency indices, specific classes and rules were created for efficiency classification.

The ontology was implemented in OWL 2, a W3C recommendation. The FEO structure is composed of three main classes: Cattle, Classification, and Evaluation. The Classification class has three subclasses for feed efficiency classification: efficient, intermediary, and inefficient. There are four other subclasses for the feed efficiency indices: residual feed intake (CAR), residual body weight gain (GPR), residual intake and body weight gain (CGPR), and feed conversion efficiency (ECA)¹¹.

To discover new associations between experiments and animals and to enable the processing of feed efficiency classifications, we created SWRL (Semantic Web Rule Language) rules in the FEO. These rules classify animals as efficient, intermediary, or inefficient for each feed efficiency index and enable the discovery of new associations between animals and experiments. For example, rule S1 (CAR index classification) classifies efficient animals according to the CAR index. To do so, it uses previously known information (explicit knowledge): the animal is an instance of the Cattle ontological class, it has an associated instance of the Evaluation ontological class, and that evaluation has an Experiment_CAR data property whose value is less than −X¹². Thus, an animal instance that satisfies all these conditions is classified in the Efficient_CAR ontological class (implicit knowledge), i.e. as efficient with respect to CAR.

S1 Rule – CAR index classification

    EMBRAPA:Cattle(?cattle) ^ EMBRAPA:isEvaluationOf(?cattle, ?evaluation)
      ^ EMBRAPA:Experiment_CAR(?evaluation, ?EvaluationCAR)
      ^ swrlb:lessThan(?EvaluationCAR, (-1)*X)
      -> EMBRAPA:Efficient_CAR(?cattle)

As a result, through its declared model (explicit knowledge) combined with specific SWRL rules and an inference engine, the FEO infers the classification of the animal instances under the feed efficiency indices (implicit knowledge). The animal classification is considered new knowledge, produced by processing SWRL rules with inference engines over the ontological instances.

3.3 Context and Provenance Service

ContextProv aims to support researchers in understanding the experiments already executed and thus enhance knowledge reuse from these experiments. Its secondary aim is to discover the implicit context and historical data through ontologies and inference mechanisms.

To fulfill these functional requirements, we implemented supporting tools for each stage of the provenance and context lifecycle. Figure 3 provides an overview of the proposed solution.

Information is captured (Capturing – Fig. 3) through the user interface. The user interface also shows information on researchers’ scientific profiles, including their research interests, educational institutions, and research groups. Provenance and context information related to the execution of workflows in E-SECO is captured through integration with Workflow Management Systems, such as Kepler¹³.

We stored (Storage – Fig. 3) the captured data in distributed repositories. Given the large volume of data, the E-SECO platform manages these data with the support of a peer-to-peer network [15]. Each network node has an E-SECO data repository, and the nodes store data in a decentralized manner.
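The Capturing and Storage stages just described can be illustrated with a minimal Python sketch. It is not the ContextProv or E-SECO implementation: the names `ProvRecord` and `PeerNetwork`, the record fields (loosely modeled on the W3C PROV relations used, wasGeneratedBy, and wasAssociatedWith), and the hash-modulo rule that assigns each record to a peer node are all hypothetical, since the text does not specify how records are structured or distributed across the network.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProvRecord:
    """Hypothetical PROV-inspired record of one captured workflow activity."""
    activity: str              # activity executed in a WfMS (e.g. a Kepler run)
    agent: str                 # researcher associated with the activity (prov:wasAssociatedWith)
    used: tuple = ()           # entities the activity consumed (prov:used)
    generated: tuple = ()      # entities that prov:wasGeneratedBy this activity

    @property
    def key(self) -> str:
        # Identifier used to decide which peer node stores the record.
        return f"{self.activity}|{self.agent}"


@dataclass
class PeerNetwork:
    """Hypothetical peer-to-peer network: each node keeps its own local repository."""
    nodes: list
    repos: dict = field(default_factory=dict)

    def node_for(self, key: str) -> str:
        # Assumed placement rule (plain hash-modulo, not E-SECO's documented scheme):
        # the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def store(self, record: ProvRecord) -> str:
        # Append the record to the repository of the node chosen for its key.
        node = self.node_for(record.key)
        self.repos.setdefault(node, []).append(record)
        return node

    def generators_of(self, entity: str) -> list:
        # Retrospective query over all node repositories:
        # which captured activities generated the given entity?
        return [r.activity for repo in self.repos.values() for r in repo
                if entity in r.generated]


net = PeerNetwork(nodes=["node-a", "node-b", "node-c"])
run = ProvRecord(
    activity="compute_fe_indices",
    agent="researcher-1",
    used=("intake_measurements.csv",),
    generated=("fe_indices.csv",),
)
stored_on = net.store(run)
print(stored_on in net.nodes)                  # True
print(net.generators_of("fe_indices.csv"))     # ['compute_fe_indices']
```

Plain hash-modulo placement is used only to keep the sketch short; a real peer-to-peer store would more likely rely on consistent hashing or a distributed hash table, so that adding or removing a node does not remap most records.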

¹⁰ https://github.com/mateusgon/FeedEfficiencyServiceBase/blob/master/data/Embrapa_Experimento1.owl
¹¹ Henceforth, the acronyms for these indices appear in Portuguese, as local researchers routinely use them and as they are referred to in the proposed architecture.
¹² X is the value obtained by computing the standard deviation of the indices obtained, varying according to the classification as efficient, intermediary, or inefficient.
¹³ https://kepler-project.org/.
¹⁴ https://github.com/mateusgon/ProvSe-Service/blob/master/src/main/resources/ontologies/prov-se-o-load-1107.owl

We support data enrichment using an ontology named Prov-SE-O¹⁴. In this stage (Enrichment –
Fig. 3), the data captured and stored in the database are inserted into the Prov-SE-O ontology. Using inference algorithms, we can derive implicit knowledge such as similar experiments and workflows, and the Workflow Management Systems used to execute workflows (when the platform does not inform them explicitly), among other related scientific data.

Fig. 2 Feed Efficiency Ontology (FEO) main classes (in the Protégé tool interface)

The Prov-SE-O ontology models scientific experiment workflows and considers contextual information related to the collaborative and distributed nature of the experiment. Prov-SE-O includes new classes, properties, property chains, and rules in SWRL [32]. SWRL rules identify similar experiments or workflows and reused documents. We consider experiments similar when they use the same workflow or versions of the same workflow, while workflows that include the same program are considered similar. We consider a document to be reused when it is used to execute a program that does not belong to the workflow that generated it. Through inference algorithms, the Prov-SE-O ontology allows the derivation of implicit knowledge, such as (i) experiments and workflows similar to each other or derived from each other; (ii) researchers, institutions, and research groups involved
Fig. 3 Overview of the context and provenance service (ContextProv) steps

or that influenced the experiment; (iii) workflows or services used in the experimentation process; and (iv) data consumed and generated by the experiment, and activities or entities produced in an experiment and reused in future experiments.

We share information in the E-SECO platform (Sharing – Fig. 3) through the peer-to-peer network and ContextProv. The information refers to the provenance and context of scientific experiments and their workflows, executions and data. This information is accessed (Visualization – Fig. 3) through the user interface, allowing researchers to interact with the platform to register, view, list and delete provenance and context information from scientific experiments (Interpretation – Fig. 3). Figure 4 shows a screenshot of the ContextProv service.

This interface interacts with the information control services of the E-SECO platform. These services control provenance and context information for the following objects: Researchers; Research Groups; Institutions; Experiments; Workflows; Activities; Entities; WfMS; Change of experiment stage; Execution of workflows; Execution of activities; and Details of the experiment, as illustrated in Fig. 3.

Fig. 4 Example of the ContextProv GUI highlighting experiment and workflow details

3.4 Evaluation

Our research proposes two services designed on top of a Scientific Software ECOsystem platform to support the reuse of experiments. We opted to conduct case studies, as they are the most appropriate strategy for our research, considering that we studied specific artifacts (the FeedEfficiencyService and ContextProv services) in a business domain (scientific experiments) [33, 34].

3.4.1 Context Definition

A research project focused on feed efficiency for dairy cattle has been conducted at the Brazilian Agricultural Research Corporation (Embrapa – Dairy Cattle). The experiments were carried out at the Embrapa Experimental Farm, in the Multi-Compound Farming and Sustainability Complex. Such field experiments are time-consuming, as animals are observed for months until the necessary data can be collected. They also have a high financial cost, because Embrapa maintains the animals and all the infrastructure during the trial period. Thus, the reuse of experiments is
essential to maximize the results obtained from each experiment.

Given that researchers need months to complete the experiments, and due to time constraints, it would be unworkable to follow an experiment throughout its execution and only then perform the present work’s evaluation. Therefore, we chose to perform case studies using information from experiments already completed.

The GQM (Goal, Question, Metrics) method [35] allows defining the scope of this evaluation as: “Analyze the use of E-SECO’s integration and reuse services to enable the integration of scientific data, from the researchers’ point of view, in the context of the Feed Efficiency Research Group of Embrapa – Dairy Cattle”. From this scope definition, the following Research Question (RQ) was formulated: “How do E-SECO and its related services support data integration to enhance the reusability of experiments?”

To answer this RQ, we performed two DSR cycles. In the first cycle, we carried out Case Study 1 (CS1) using heterogeneous data from 6 experiments conducted by researchers from Embrapa – Dairy Cattle between 2014 and 2017. Participants used a web interface, accessible through personal computers, to carry out the experiment, and each of these sessions lasted approximately 30 min. To facilitate the reproducibility of this case study, the collection instruments are available in Appendix 1 (Questionnaire CR, Questionnaire 1 and Questionnaire 2), and the E-SECO and FeedEfficiencyService versions used to conduct CS1 are available on GitHub15.

15 https://github.com/mateusgon/FeedEfficiencyServiceBase.

Data were collected using questionnaires and interviews. The research allowed direct contact with the subjects involved, so data were also collected in real time. Semi-structured interviews were conducted, keeping in mind that the main focus was to analyze E-SECO and the use of its related services.

In the second cycle, we used the ContextProv service to conduct Case Study 2 (CS2) using 3 experiments. The collection instruments are available in Appendix 2.

To conduct the case studies, the following instruments were selected: (i) consent from the subjects to enable the publication of the collected data in this work; (ii) a profile background questionnaire; (iii) material to support the training session offered to the participants before the study began; (iv) a document explaining the task assigned to the participants; and (v) a follow-up questionnaire to collect data after the participants had accomplished the task. E-SECO and its related services were made available for the participants to accomplish the task.

3.4.2 Selection of Subjects and Training

A total of 35 individuals (Embrapa researchers) participated in both cycles, i.e., CS1 and CS2. Of them, 16 work in the animal nutrition/feed efficiency area, and 19 work in areas not related to that context. The group of 16 individuals held degrees in zootechnics, veterinary science, and computing. In the group of 19 individuals, 4 are researchers at Embrapa – Dairy Cattle from other contexts; although they do not act directly in feed efficiency research, they hold degrees in veterinary science and zootechnics and carry out statistics and research on genetic engineering of milk, precision farming, and genetic enhancement. The remaining 15 individuals of the latter group have a degree in computer science and usually work with scientific experiments.

All participants were trained to use the E-SECO platform and its related services. However, because one group of participants had no relation to the animal nutrition/feed efficiency context, a brief explanation about the domain was necessary, presenting the researchers’ daily routine and explaining how data analyses used to be carried out before adopting the platform and its related services. Then, E-SECO, FeedEfficiencyService and ContextProv were presented to the participants.

For the training sessions, we used two scenarios. The first scenario focused on access to experiments and animal data analyses, communication between the services, and the animals’ classification under the efficiency indices. The second scenario presented the classification visualization feature as a support for analyzing the animals’ evolution throughout the experiments. This scenario used data from the six experiments, which were conducted at different stages of the animals’ lives.

For the evaluation, we created two groups: the participants related to animal nutrition/feed efficiency (Group A, who answered Questionnaires 1 and 2) and the participants not related to this context (Group B, who answered Questionnaire 2). The second group was created so that we could observe whether researchers from other contexts, using E-SECO/FeedEfficiencyService,
could analyze and reuse the experiments, thus encouraging interaction between different Embrapa research centers and other external researchers. For the selection of the individuals in each group, a characterization form16 (Appendix 1 – Questionnaire CR) was completed.

We specified two different sets of questions17. The first set addresses questions related to the context before the adoption of E-SECO/FeedEfficiencyService (Questionnaire 1). In contrast, the second set encompasses questions regarding the context after the adoption of the service. Both groups had access to E-SECO/FeedEfficiencyService and answered Questionnaire 2.

In the second DSR cycle, as described above, Case Study 2 (CS2) was used to evaluate the ContextProv service from the researchers’ point of view. The 35 participants who contributed to this study are familiar, at various levels, with scientific experimentation, provenance data, and context analysis.

The aim of CS2 was to evaluate whether the proposed approach presented benefits regarding the reuse of experiments. We asked the participants to analyze the experiments registered on the platform, together with their provenance and context information, in order to reuse them in future experiments. During this analysis, we collected data through direct observation of the researchers’ interaction with the platform. Finally, we conducted interviews and applied Questionnaire 3 (Appendix 2). This questionnaire also addressed the experimentation process, so that we could obtain more information about experimentation and reuse without the platform’s support.

3.4.3 Data Collection Sources

A case study can rely on different data sources, and these data can be obtained through direct, indirect, and independent methods.

For CS1, we chose to collect data through the direct method, considering direct interaction with the participants and adopting a questionnaire. For Questionnaires 1 and 2, we adopted the semi-structured category, as it encompasses open-ended and closed-ended questions.

16 https://www.dropbox.com/s/kasyqklt7hxgg5k/QUESTIONNAIRE%20AND%20CHARACTERIZATION%20OF%20SUBJECTS.docx?dl=0 – see also Appendix 1.
17 https://www.dropbox.com/s/wilbaxwkz3f5975/analise%20dos%20dados.xlsx?dl=0.

Open-ended questions aimed to detect relevant aspects not considered in the closed-ended questions. The closed-ended questions were answered on a scale ranging from 1 to 5, where 1 means that the respondent disagrees with the statement and 5 means that the respondent totally agrees with it.

Before conducting CS2, it was necessary to analyze the scientific experimentation process conducted without the proposed approach, focusing mainly on the experiments that reused parts of previous experiments, and to select the studied cases. It was also necessary to collect the provenance and context data from the selected experiments. For this purpose, we used the following collection sources: documents, archived records, and interviews.

3.4.4 Populating the Platform with Data

The Preparation Stage of CS1
Heterogeneous databases store several experiments related to Feed Efficiency at Embrapa – Dairy Cattle. For both case studies, we developed a translator layer (wrapper) that translates heterogeneous data to an integrator data model related to the feed efficiency domain. We specified this translator layer to consider specific data from the analyzed experiments. However, if there is a need to analyze new, previously conducted experiments related to the feed efficiency domain, they can be easily integrated into the architecture.

For the use of data from multiple experiments with different data models, an integrator data model was specified based on the information from the heterogeneous databases and the information provided by the researchers.

The application allows researchers to import data through CSV (Comma Separated Values) files. Researchers used this format in most experiments carried out at Embrapa – Dairy Cattle, relying on tables with a large volume of data produced daily, for instance regarding the waterers and feeders. At each animal’s visit to the respective waterer or feeder, a line is created and inserted in the FeedEfficiencyService.

The Preparation Stage of CS2
We used the following data sources: documents, archived records, and interviews. Such documents and records, analyzed in CS2, were contained in the master’s theses of the Embrapa researchers who conducted experiments and produced spreadsheets with the data collected. We conducted the
interviews informally during meetings with the researchers and during visits to the Embrapa Experimental Farm. This allowed us to better understand the experiments’ context and identify reuse opportunities in the process adopted.

Based on the document analysis, we observed that the researchers use an automated system to collect animal data during the experiments. However, they do not use any tool to support scientific experimentation that would allow them to manage all the stages of the process. Thus, information about the planning, prototyping, and execution stages of the experiments was the researchers’ exclusive responsibility. At the end of the experiments, the results are published in articles, theses and dissertations written by Embrapa researchers. Any researcher who intended to reuse something would have to read all the publications about a given experiment and look for contextual information. The reliability of such data ends up being impaired, as much of the contextual information necessary for experiment reuse may not appear in the publications.

After analyzing all the available material, we identified cases where parts of other experiments could be reused, which is of interest for the current case study. In this sense, three experiments were included in the study.

In Experiment 1, the researcher aimed to study divergences in residual feed intake (CAR) and residual body weight gain (GPR) in F1 Girolando heifers and their relationships with intake, performance, feeding and water intake behavior, body surface temperature, and morphological characteristics. In Experiment 2, the aim was to evaluate phenotypic divergence effects on feed efficiency (CAR, GPR, and ECA), including digestibility, enteric methane emission (CH4), energy partition, heat production, blood parameters, nitrogen metabolism, and temperatures of different body regions in Girolando heifers under tropical climate conditions. In Experiment 3, the aim was to address reproductive parameters in F1 HG (Girolando) heifers from puberty to first pregnancy and to correlate nutritional efficiency with some of these parameters.

We collected the provenance and context information of the selected experiments from the documents and interviews, based on the data modeled by ContextProv. In this way, we gathered the following information: (i) FE indices and the number of groups used for classification; (ii) length of the animals’ adaptation period; (iii) number of animals included in the experiment; (iv) animals’ age and initial weight; and (v) information on the animals’ diet composition. This information, among other data collected, was registered in the platform and used to conduct the case study presented below.

3.4.5 Case Study 1 (First DSR Cycle)

We conducted CS1 in three stages due to the availability of participants. The first stage took place at the headquarters of Embrapa – Dairy Cattle and was attended by seven participants: six researchers from the institution and one from an external research center. The second stage took place at the Federal University of Juiz de Fora (UFJF), Brazil, and had 14 external researchers. Finally, the third stage took place in Embrapa – Dairy Cattle’s experimental field and had 14 participants.

E-SECO/FeedEfficiencyService was presented in every stage, allowing access to feed efficiency data from experiments carried out at Embrapa – Dairy Cattle. Furthermore, for the group not related to the context of feed efficiency, we gave a presentation on the concept of efficient animals, the possible classifications adopted in Embrapa – Dairy Cattle, and the research motivations.

We did not use a script or interfere in the participants’ choices. All the data analyzed were obtained through the questionnaires and direct observation of the participants. We grouped and organized these data in tabular form (Fig. 5), and these data were presented to the participants according to the visualization presented in Fig. 6.

Fig. 5 CS2 data of FeedEfficiencyService
Fig. 6 Screenshot of FeedEfficiencyService visualization – (a) Classification and (b) Clustering
Fig. 7 Prior knowledge of the Case Study 2 participants

3.4.6 Case Study 2 (Second DSR Cycle)

After conducting Case Study 1 and based on the scientific knowledge obtained, we used the ContextProv service to conduct Case Study 2.

For CS2, we sought participants with some knowledge about scientific experimentation and with varying levels of knowledge about provenance data and context analysis of scientific experiments. Three researchers participated in CS2. As shown in Fig. 7, the participant characterization questionnaire found that participants had moderate or good knowledge of scientific experimentation, provenance data and context analysis.

We presented the E-SECO platform to the participants, as well as its provenance and context management capabilities. The researchers then
had access to the platform to interact with the visualizations and analyze the provenance and context information of the experimentation process in Experiment 1. We explained the research problem addressed in Experiment 2 to the participants and asked them to evaluate the information presented on the platform, considering the possibility of reusing parts of the experiments. After this analysis, the participants answered the solution evaluation questionnaire. To clarify further questions arising during the evaluation, we interviewed the participants after they answered the questionnaire.

4 Results

In the following sections, we detail how E-SECO with FeedEfficiencyService and ContextProv can support the experiments at Embrapa.

4.1 Observations from Case Study 1

For the analysis of Case Study 1, we adopted the qualitative method. Questions regarding the animal and experiment evaluation method before the adoption of E-SECO/FeedEfficiencyService by Embrapa were covered in Questionnaire 1, completed by Group A. This group had members with experience in the previous scenario, i.e., without E-SECO/FeedEfficiencyService.

Group A is the only group that answered Questionnaire 1. The answers related to questions 1, 2 and 5 revealed problems such as the need for knowledge about statistical tools and the absence of storage and standards. They also revealed the difficulty of associating the same animals in different experiments, i.e., of integrating data, therefore hampering experiment data reuse. Questions 3, 7, 8 and 9 addressed aspects of the analysis of the experiments. In these questions, the participants agreed about the difficulty of classifying the animals, monitoring the animals’ performance, obtaining the animals’ indices, and comparing the experiments, attributable to the heterogeneity of the databases and the absence of a standard procedure to conduct experiments. Regarding the data and their storage, the answers to questions 4 and 6 revealed the participants’ dissatisfaction with the previous storage mechanisms and with their access to experiment data.

In addition, to evaluate the questions related to the contents and the participants’ comprehension of data integration, classifications, visualizations and analyses, we considered the data obtained from both Groups A and B. To this end, both groups answered Questionnaire 2.

The aim of the analysis with Group B was to evaluate whether researchers from other contexts, using E-SECO/FeedEfficiencyService, could analyze the experiments and therefore encourage the reuse of experiment data between different research groups.

We used data triangulation18 to analyze the answers of both groups (A and B) to Questionnaire 2. Participants agreed that E-SECO/FeedEfficiencyService brought agility to the experiments’ analysis and facilitated the reuse of experiment data.

18 Runeson et al. [34] argue that triangulation is essential to increase accuracy and strengthen the validity of empirical research and is of great importance in qualitative data analysis. The authors define triangulation as an analysis from multiple perspectives, providing a broader view of the study’s object. Data (source) triangulation uses more than one data source or collection at different times; observation triangulation uses more than one observer in the study; methodological triangulation combines different types of data collection methods (qualitative and quantitative); and theory triangulation uses alternative theories or points of view.

Question 3 evaluated the consistency of the experiments (Questionnaire 2). We perceived that the evaluated service provided a better procedure for storing the animal and experiment data; both groups agreed with the statement.

Questions 4, 5, 6, 7, 8 and 9 evaluated the clustering and classification visualizations. Regarding questions 4, 5, 6 and 7, there was no divergence of opinion between the groups. Therefore, we can consider that E-SECO/FeedEfficiencyService can facilitate the observation and comparison of integrated animal data in the experiments, as well as the analysis and subsequent reuse of such data in related experiments. Regarding question 8, Group A was unanimous in rejecting the statement that the service did not improve animal analysis and experiment data reuse. Considering the integration of heterogeneous data, most participants rejected the statement. As previously described, Group B contained individuals outside the context of feed efficiency, with limited knowledge of any previous methodology and efficiency analysis processes. The absence of such knowledge may have influenced their evaluation.

Through the FEO ontology, the SWRL rules and the inference engines, it was possible to obtain the animals’ classification under four feed efficiency indices (CAR, ECA, GPR, CGPR) and classify them under the
labels efficient, intermediary, and inefficient, which were evaluated in questions 10 and 11. In question 10, most participants of Group A and Group B agreed that the labels improved the analyses. In turn, in question 11, the majority of Group A and Group B agreed that the ontology can encourage interaction with other researchers, considering that FEO provides a comprehensive view of the experiments. Therefore, this ontology can aid the reuse of experiment data by researchers from other contexts.

By collecting the study data and revisiting our research question (“How do E-SECO and its related services support data integration to enhance the reusability of experiments?”), we were able to understand how E-SECO and FeedEfficiencyService support Embrapa – Dairy Cattle’s daily routine.

Despite these results on data integration, we believe that it is essential to conduct additional evaluations to collect more evidence regarding the integration support of applications in the scientific software ecosystem and the use of context information to enhance the reuse of experiments in the ecosystem platform. In this regard, we conducted an additional evaluation (CS2) to revisit our research question.

4.2 Observations from Case Study 2

In Case Study 2 (CS2), through direct observation, we could verify that the researchers were able to find the provenance and context information needed to evaluate the possibility of reusing parts of the experiments. All researchers chose to use the shortcuts to access the information, which indicates that the shortcut component can make it easier for novice users to use the platform. However, we observed some difficulties in using the platform. For example, in the provenance visualization graph, none of the participants were able to access the graph details and navigate to other items.

The analysis of the results obtained with Questionnaire 3 (Appendix 2) was divided according to the aspects of the solution being evaluated. The first part aimed to better understand the experimentation process without the use of the E-SECO platform (questions 1, 2 and 3). The answers indicated that the researchers used to conduct experiments without any supporting platform. As a result, according to one of the participants, “the process was complex, inefficient and with many chances of error”. It was also possible to observe that the researchers in this domain consider reuse particularly important. Nevertheless, the results show that reuse in this domain still remains a challenge.

Questions 4, 5, 6 and 7 aimed to evaluate whether the context information presented in the service is relevant and sufficient to support the reuse of scientific experiments. The answers obtained for questions 4 and 5 indicated that, for most items, the information is sufficient for the reuse of these experiments. Questions 6 and 7 aimed to identify information that is not relevant to the reuse of scientific experiments, since too much information can overwhelm the workspace, impairing understanding and making it difficult for users to conduct analyses. According to the answers obtained, the researchers considered the information presented relevant.

Questions 8 to 13 aimed to evaluate whether the information inferred by the ontology can support the reuse of scientific experiments. Regarding the presentation of similar experiments, the majority of the researchers considered it relevant for reusing the experiments. Regarding the presentation of activities that served to create a document, all researchers considered it relevant. This presentation comprises (i) activities that had already reused a document, (ii) activities that were reused from another experiment and (iii) activities reused by other experiments.

Questions 17 to 21 evaluated how the platform’s visualizations can contribute to the reuse of scientific experiments. The majority of researchers agreed that the visualization of the researchers involved with the experiments and the visualization of the collaboration between them contribute to the reuse of scientific experiments. Regarding the visualizations of the activities that generated a document and of the activities that had already reused such a document, most of the participants agreed that this visualization feature contributes to the reuse of experiments. Regarding the visualization of the experiments’ flow of activities, all the researchers agreed that it contributes to experiment reuse.

Questions 22 and 23 show how the researchers evaluate the platform’s support for scientific experimentation. Their answers indicate that using the platform is faster, easier, safer and more reliable. Some reasons cited were: (i) the platform brings relevant data and information to the researcher in the experimentation process automatically; (ii) visualizations contribute to the understanding of the data; (iii) the connection with other researchers enables contact for various purposes; (iv) the interface is organized and simple to use; (v) the platform provides information to support reuse
decisions from provenance data. Regarding the support offered by the platform for the reuse of experiments, the researchers’ answers indicated that the proposed approach can facilitate activities involving the reuse of experiments in the E-SECO platform.

4.3 Performance evaluation

In scientific scenarios, researchers often store their data or simultaneously query these data, either to monitor them or to plan future actions. Thus, in this context, efficient mechanisms are required for both data storage and query. We also evaluated our approach in terms of average throughput, varying the workload from 10 to 10,000 transactions (write/invoke and query requests) issued simultaneously by a set of four peers. For this purpose, we used an infrastructure of VM instances on Amazon Elastic Compute Cloud (EC2) with an Intel(R) Xeon(R) E5-2690 CPU at 2.60 GHz (24 cores) and 24 GB of RAM, running Ubuntu 16.04. Table 1 shows the average throughput per second for different numbers of transactions, at a fixed rate of 5 TPS. After the analysis, we had evidence that the system can operate with low latency. This result provides initial evidence that we can enhance efficiency in distributed environments of scientific experimentation.

Table 1 Transactions average throughput
Average throughput per second
Transactions    10          100         1000        10,000
Invoke          1.54 (s)    2.65 (s)    3.52 (s)    4.43 (s)
Query           0.01 (s)    0.01 (s)    0.08 (s)    1.39 (s)

4.4 Discussion

As a final discussion, considering the observed evidence, we can state that E-SECO/FeedEfficiencyService, in the E-SECO platform, reduced the complexity of experiment data integration. Also, according to the answers given in the questionnaires, the service offers agility to evaluate the experiments and provides organization and standardization in the conduct of experiments. The classification and clustering visualizations also enable comparing heterogeneous data captured in the experiments, facilitating animal and experiment data analyses and the reuse thereof in other experiments.

The service classifies and labels heterogeneous animal data through the FEO ontology, the SWRL rules and the inference engines, providing the researcher with an integrated view that allows identifying efficient animals and sharing and reusing data. Besides, there is evidence of improvement in the interaction between researchers from different contexts, facilitating cooperation and data reuse.

The classification visualization of integrated data from various experiments allows researchers to analyze several experiments simultaneously in the same chart. It also provides researchers with the information necessary to analyze and reuse data, as evinced by the questionnaire answers. Furthermore, the clustering visualization shows the indices’ analyses in one experiment or across several experiments (integrated data).

The results we obtained from Case Study 2 indicate that the context information supported the reuse of the scientific experiments registered on the platform. In the questionnaire, all items evaluated were rated as sufficient. All information inferred by the ontology was evaluated by most researchers as relevant or very relevant for reuse. This evaluation shows that the Prov-SE-O ontology can enhance the reuse of experiments in the E-SECO platform. The information about the researchers’ profiles was also considered relevant, or very relevant, by most participants, providing evidence that this integration may benefit experiment reuse.

Our analysis reveals that all the ContextProv service information analyzed in Case Study 2 received a positive evaluation regarding the reuse of scientific experiments in the E-SECO platform. Thus, we have evidence that the platform’s provenance and context information can support the reuse of scientific experiments on an SSECO platform. As a result, there is also evidence that adding the ContextProv service to the platform enriched the analyses of our research question: “How do E-SECO and its related services support data integration to enhance the reusability of experiments?”

Additionally, in the evaluations, most of the participants reported that the conduct of the experiments was streamlined when the E-SECO platform supports reuse. Our results provide evidence about how the proposed ontology contributed to the reuse of experiments. Such ontology was considered relevant by the
researchers, who stressed the importance of the information reused. The questionnaires showed that all the information generated by the ontology was considered relevant or very relevant by most researchers. In addition, during the interviews, the researchers pointed out the importance of having at hand information on (i) similar experiments, (ii) the source of the documents and (iii) previous cases of reused documents and activities.

4.5 Threats to Validity

The instruments used in the case studies were chosen using E-SECO’s resources with FeedEfficiencyService and ContextProv. Also, all participants performed only a single evaluation of the platform. The number of participants in a case study is a threat to validity, as more participants could influence the results. To mitigate this threat, we could carry out additional case studies considering another set of participants. It is worth mentioning that the E-SECO platform’s collaborative activities need to be explored in depth.

In this evaluation, we used several data sources to evaluate the proposed solution. However, some participants did not understand the exact meaning and purpose of each question in the questionnaire or interview, which posed a threat to the solution’s validity. To mitigate this threat, we trained the participants on the tools and resources available on the E-SECO platform. However, considering the volume of data used in this work, we believe that our ontology met the participants’ expectations. The ontology has restrictions for processing large volumes of data, which may cause slowness in creating visualizations and in the services’ output. To mitigate this threat, we are investigating how high-performance computational resources could support ontology processing.

Furthermore, creating new ontologies from the services added to the architecture could result in new knowledge. Likewise, the addition of computational intelligence techniques could also be relevant for knowledge discovery.

When evaluating the impact of E-SECO/FeedEfficiencyService on the production and dissemination of knowledge, we identified that support for decision making and for interaction between researchers are requirements to be studied in other Embrapa contexts. Interaction issues may interfere with the evaluation of the proposed solution. In this vein, it is necessary to conduct new experiments to evaluate the impact of using the proposed service in these new contexts and to evaluate the E-SECO platform’s interaction support.

For Case Studies 1 and 2, only researchers with a degree in the areas of the experiments contributed. Therefore, the results obtained from the case studies are limited to the domain of SECO platforms and to the environment in which we conducted our study. These results cannot be generalized but can be transferred to other similar contexts.
ever, some participants had to do remote training We developed the FeedEfficiencyService and
through a video. It is possible that some participants ContextProv services and integrated them into the E-
did not understand some parts of the training video SECO platform to meet the researchers’ needs related to
and that, during interaction with the platform, they did dairy cattle feed efficiency. New studies should be
not know about the existence or usefulness of a partic- conducted in new contexts to measure the platform’s
ular resource. In this case, there is a threat to the study’s impact on the interaction and reuse of applications
internal validity. and data.
The researchers evaluated the FeedEfficiencyService In terms of integrating our solution with the
service and emphasized its importance regarding Mendeley and Kepler platforms, the fact is that Embrapa
animal data analysis, the safety of the data analysis, does not use these platforms. To address that, we indi-
and the standardization and organization of the data rectly evaluated the relevance of the data obtained
in the experiments. However, we built the through such integrations using local databases. The
FeedEfficiencyService service to meet the needs of data obtained through them was added to the platform
the Feed Efficiency Research Group of Embrapa – manually.
Dairy Cattle, and its functionalities are restricted to As future work, we plan to conduct further experi-
the particularities of this research group. Thus, its ments to explore other situations involving the reuse of
use cannot be generalized, although we could use it scientific experiments on the E-SECO platform, for
in other contexts. example, by conducting additional case studies
comprising other scientific databases and exploring other agricultural domains, such as poultry farming.

Regarding the visualizations implemented by our solution, most researchers considered all of them relevant for the reuse of scientific experiments. The visualization components were appropriate for presenting the context and provenance information of scientific experiments. To display information in a standardized way, through abstractions that ease its comprehension, we developed and evaluated visualization components representing (i) the workflow used and the reuse of workflow activities; (ii) the relationships between researchers; (iii) the relationships between researchers and experiments; and (iv) the provenance of the entities produced and reused by the experiments. The results show that the difficulties observed during the researchers' interaction with the provenance graph did not compromise the relevance of this visualization for experiment reuse. They also indicate that the implemented visualizations contribute to the reuse of scientific experiments on the E-SECO platform.

5 Conclusions

This work presented two services developed to support the reuse of experiments in a scientific software ecosystem. For this purpose, we aimed to enhance reuse by supporting data integration. These services were developed and evaluated according to the Design Science Research (DSR) methodology. Based on the scientific knowledge acquired from the construction and evaluation of the first service, named FeedEfficiencyService, the second service (ContextProv) was developed and evaluated. The first service focused on data integration and visualization to support scientific experimentation. The second service focused on managing context and provenance data on platforms of scientific software ecosystems. It captures, stores, enriches, shares, and provides visualization of context and provenance data throughout the life cycle of scientific experimentation.

We conducted two case studies to evaluate the proposed solution in the context of the DSR methodology. Through the service evaluations, we found evidence of the use of resources to support the reuse of knowledge in a SECO context. The DSR methodology supported the incremental construction of the services integrated into the E-SECO platform. First, the first service was built and evaluated; the scientific knowledge generated in that cycle was then used to build and improve the second service, enhancing support for integration and reuse.

The two case studies go beyond support for reuse and also support research advances on scientific software ecosystem platforms, such as (i) the modeling of a provenance and context information life cycle on SECO platforms; (ii) the development of an ontology capable of modeling and extracting implicit knowledge about the provenance and context of scientific experiments; (iii) the integration between the E-SECO platform and the Mendeley and Kepler platforms; (iv) the implementation of provenance management on the E-SECO platform, enhancing the reuse of the knowledge constructed during the experimentation process; and (v) the implementation of visualizations for provenance data in scientific experiments. In addition, the proposed architecture facilitates the integration of the experiments, reducing their heterogeneity. It provides a blueprint that different research groups can reuse to analyze feed efficiency.

Future work will also include integrating our solution with other scientific workflow management systems, sensors, and external platforms to enable other ways of capturing provenance and context information. In addition, new rules can be defined to support data enrichment in the ontology, and the ontology can be integrated with other domain-specific ontologies to increase its knowledge extraction capacity. Finally, we believe it would be useful to conduct new case studies to evaluate the support offered by FeedEfficiencyService and ContextProv in different application domains.

Acknowledgements We would like to thank the researchers who participated in the evaluation of this proposal, as well as the Brazilian Agricultural Research Corporation (Embrapa/Brazil).

Funding This work was partially funded by UFJF/Brazil, CAPES/Brazil, CNPq/Brazil (grant: 311595/2019-7), and FAPEMIG/Brazil (grants: APQ-02685-17 and APQ-02194-18).

Data Availability The datasets generated and/or analyzed during the current study are available on GitHub (https://github.com/mateusgon/FeedEfficiencyServiceBase and https://github.com/mateusgon/ProvSe-Service) and also at https://www.dropbox.com/s/wilbaxwkz3f5975/analise%20dos%20dados.xlsx?dl=0. Appendices 1 and 2 present the questionnaires from CS1 and CS2.
Appendix 1 Characterization of Participants in Both Case Studies and Case Study 1 Questionnaires
Questionnaire CR - Characteristics of the Respondent

Academic Background in the Area: ____________________________________
Do you work, or have you worked, with research in the area of animal nutrition/feed efficiency?
[ ] Yes
[ ] No. In which research area do you work?
To complete Questionnaire 1 and Questionnaire 2, the following scale will be used:
Totally disagree 1
Partially disagree 2
Indifferent 3
Partially agree 4
Totally agree 5
QUESTIONNAIRE 1 (Answered by GROUP A)

Questions regarding the procedures carried out BEFORE the architecture use:
1 2 3 4 5
1 – Is knowledge of statistical tools indispensable for obtaining the CAR, GPR, and CGPR indexes?
2 – Were the form of storage, the numbering of the animals, the table labels, and the date and number formats defined by the researchers?
3 – Is the classification of animals under the efficient, intermediate, and inefficient labels obtained easily?
4 – Does the way the experiment is stored and analyzed today meet the researcher's needs?
5 – Does an animal have more than one code identifying it throughout the experiments?
6 – Are the experiments' data accessible, and is access centralized?
7 – Does the monitoring of the animals' performance occur easily and satisfactorily?
8 – For each feed efficiency index, are the animals processed individually, and are their data filled in manually for later classification?
9 – Considering 'N' experiments, is the comparison of the animals' performance satisfactory?
10 – Is the classification of animals precise and understood by researchers outside the context of animal nutrition?
11 – How do you evaluate the methodology currently used for the storage and analysis of the experiments?
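Questionnaires 1 and 2 refer to classifying animals as efficient, intermediate, or inefficient for each feed efficiency index (CAR, GPR, CGPR). This section does not state the cut-offs FeedEfficiencyService uses, so the Python sketch below is hypothetical: it assumes the convention, often seen in feed efficiency studies, of labelling animals within 0.5 standard deviations of the group mean as intermediate. It also assumes that the "better" direction depends on the index: if CAR denotes residual feed intake, as its Portuguese acronym suggests, lower values are more efficient, whereas for residual gain measures higher values are. The function `classify` and the animal identifiers are illustrative only.

```python
from statistics import mean, stdev


def classify(index_by_animal, higher_is_better, k=0.5):
    """Label animals as efficient, intermediate or inefficient.

    Hypothetical rule: animals within mean +/- k * SD are "intermediate";
    the rest are efficient or inefficient depending on whether higher
    index values are better. Needs at least two animals (for stdev).
    """
    values = list(index_by_animal.values())
    mu, sd = mean(values), stdev(values)
    low, high = mu - k * sd, mu + k * sd
    labels = {}
    for animal, value in index_by_animal.items():
        if low <= value <= high:
            labels[animal] = "intermediate"
        elif (value > high) == higher_is_better:
            labels[animal] = "efficient"
        else:
            labels[animal] = "inefficient"
    return labels


# Illustrative CAR-like values (lower = more efficient, by assumption).
car = {"A1": -0.8, "A2": 0.1, "A3": 0.7, "A4": 0.0}
print(classify(car, higher_is_better=False))
# A1 -> efficient, A3 -> inefficient, A2 and A4 -> intermediate
```

Passing `higher_is_better` explicitly keeps one function usable for all three indexes, so the same labels can feed a classification or clustering view without per-index code.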
QUESTIONNAIRE 2 (Answered by GROUPS A and B)
Questions about procedures AFTER architecture adoption:
1 2 3 4 5
1 – Did the architecture reduce the complexity and bring agility to the analysis of the experiments?
2 – Does knowledge of statistical tools remain indispensable for obtaining the CAR, GPR, and CGPR indices?
3 – Does the architecture not allow experiments to have variations in the form of storage, numbering of animals, column labels, and date and number formats?
4 – Does the architecture facilitate the supervision of the animals' performance across several experiments?
5 – Did the colors adopted in the views contribute to identifying the performance of the animals?
6 – Does the architecture contribute to the analysis and comparison of feed efficiency indexes in the experiments?
7 – Through interactivity, can the researcher construct the views with the desired data?
8 – Does the architecture not improve the performance analysis of the animal throughout the experiments?
9 – In cases of imbalance in the distribution among the efficient, intermediate, and inefficient categories, does the grouping visualization allow identifying the feed efficiency index least suited to the experiment?
10 – Does the classification of animals under the efficient, intermediate, and inefficient labels improve the analyses compared to the analysis of the numerical results obtained?
11 – Does the classification of animals under the efficient, intermediate, and inefficient labels improve interaction with researchers outside the context of animal nutrition?
12 – What points observed in the architecture can contribute to advances in research on dairy cattle feed efficiency?
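The Likert items above can be summarized per question. The paper does not describe how the responses were aggregated, so the Python sketch below rests on an assumption: it treats items 3 and 8 of Questionnaire 2, which are phrased negatively ("Does the architecture not ..."), as reverse-coded on the 1-5 scale, and it reports the median score per item. The names `score`, `item_medians`, and `REVERSED_Q2`, and the sample answers, are hypothetical.

```python
from statistics import median

SCALE_MAX = 5         # 1 = totally disagree ... 5 = totally agree
REVERSED_Q2 = {3, 8}  # assumption: the negatively phrased items


def score(item, answer, reversed_items=REVERSED_Q2):
    """Return the answer, flipped (1<->5, 2<->4) for reverse-coded items."""
    if not 1 <= answer <= SCALE_MAX:
        raise ValueError(f"answer {answer} is outside 1..{SCALE_MAX}")
    return SCALE_MAX + 1 - answer if item in reversed_items else answer


def item_medians(responses, reversed_items=REVERSED_Q2):
    """Median score per item; responses is a list of {item: answer} dicts."""
    items = sorted({item for r in responses for item in r})
    return {
        item: median(score(item, r[item], reversed_items)
                     for r in responses if item in r)
        for item in items
    }


# Three hypothetical respondents answering items 1, 3 and 8.
answers = [{1: 4, 3: 2, 8: 1}, {1: 5, 3: 1, 8: 2}, {1: 3, 3: 2, 8: 1}]
print(item_medians(answers))  # {1: 4, 3: 4, 8: 5}
```

The median is used instead of the mean because Likert answers are ordinal; after reverse coding, a higher score on every item points in the same favourable direction.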
Appendix 2 - Case Study 2 Questionnaire
QUESTIONNAIRE 3
1- How do you evaluate the currently adopted scientific experimentation process (without the
support of the E-SECO platform)?
2- Do you consider the practice of reuse significant in scientific experimentation? Why?
3- How do you evaluate the process of reusing scientific experiments currently adopted (without
the support of the E-SECO platform)?
4- Is the information presented on the platform related to each of the following items sufficient
for the experiment's reuse?
() Researchers; () Research groups; () Institutions; () Experiments; () Workflows; () Activities;
() Execution of workflows; () Execution of activities; () Protocol used; () Provenance
5- Would you add any information? If so, which?
6- Is the information related to each item below relevant to the reuse of the experiment?
() Researchers; () Research groups; () Institutions; () Experiments; () Workflows; () Activities;
() Execution of workflows; () Execution of activities; () Protocol used; () Provenance
7- Would you remove any information? If so, which?
8- Is the presentation of similar experiments relevant to the reuse of the experiment?

() very irrelevant () irrelevant () indifferent () relevant () very relevant
9- Is the presentation of the activities responsible for creating a document (for example, a
spreadsheet) relevant to this document's reuse in other experiments?
10- Is the presentation of the activities that have already reused a document relevant to the
document's new reuse?
11- Is the presentation of the experiment activities reused FROM another experiment relevant to
this experiment's reuse?
12- Is the presentation of the experiment's activities that were reused BY other experiments
relevant to the reuse of this experiment?
13- Is the presentation of the institutions involved in carrying out the experiments relevant to the
experiment's reuse?
14- Is presenting details about the scientific profile of the researchers involved in the experiment relevant to the reuse of the experiment?
15- Is presenting the start and end date and time of each activity performed relevant to the
experiment's reuse?
16- Is presenting the inputs and outputs (documents used and documents produced) of each activity performed relevant to the experiment's reuse?
17- Does the visualization that relates the researchers involved with the experiments, which is
presented below, contribute to the experiment's reuse?
() strongly disagree () disagree () indifferent () agree () strongly agree
18- Does the visualization that presents the collaboration between the researchers, presented
below, contribute to the reuse of the experiment?
19- Does the visualization that presents the activity that generated a document and the activities
that have already reused this document, presented below, contribute to the experiment's reuse?
20- Does the visualization that presents the experiment's activity flow contribute to the
experiment's reuse?
21- Does the visualization that presents the provenance information of the experiment,
presented below, contribute to the reuse of the experiment?
22- How do you evaluate the scientific experimentation process supported by the E-SECO
platform?
23- What are the points observed on the E-SECO platform that can contribute to the
experiment's reuse?
24- Would you like to leave any suggestions that can improve the scientific experimentation
process through the E-SECO platform?
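Questions 9 to 12 and 19 of Questionnaire 3 refer to provenance queries: which activity generated a document, which activities reused it, and which activities an experiment reused from, or had reused by, other experiments. The Python sketch below shows one way to answer them over records shaped like the W3C PROV relations `wasGeneratedBy` and `used`. It is a plain in-memory illustration under assumed names (`ProvStore`, and all activity, document, and experiment identifiers); ContextProv relies on an ontology, and this is not its implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ProvStore:
    """In-memory PROV-style records (all names are hypothetical)."""
    generated_by: dict = field(default_factory=dict)   # entity -> activity
    used: list = field(default_factory=list)           # (activity, entity)
    experiment_of: dict = field(default_factory=dict)  # activity -> experiment

    def add_activity(self, activity, experiment):
        self.experiment_of[activity] = experiment

    def add_generation(self, entity, activity):  # like prov:wasGeneratedBy
        self.generated_by[entity] = activity

    def add_usage(self, activity, entity):  # like prov:used
        self.used.append((activity, entity))

    def users_of(self, entity):
        """Activities that have already used (reused) an entity."""
        return sorted({a for a, e in self.used if e == entity})

    def _cross_experiment_reuse(self):
        """Yield (producer, consumer) activity pairs from different experiments."""
        for consumer, entity in self.used:
            producer = self.generated_by.get(entity)
            if producer is None:
                continue
            if self.experiment_of.get(producer) != self.experiment_of.get(consumer):
                yield producer, consumer

    def reused_from(self, experiment):
        """Activities of other experiments whose outputs this experiment used."""
        return sorted({p for p, c in self._cross_experiment_reuse()
                       if self.experiment_of.get(c) == experiment})

    def reused_by(self, experiment):
        """Activities of other experiments that used this experiment's outputs."""
        return sorted({c for p, c in self._cross_experiment_reuse()
                       if self.experiment_of.get(p) == experiment})


store = ProvStore()
store.add_activity("clean_herd_data", "EXP1")
store.add_activity("compute_indexes", "EXP2")
store.add_generation("herd.xlsx", "clean_herd_data")
store.add_usage("compute_indexes", "herd.xlsx")

print(store.generated_by["herd.xlsx"])  # clean_herd_data
print(store.users_of("herd.xlsx"))      # ['compute_indexes']
print(store.reused_from("EXP2"))        # ['clean_herd_data']
print(store.reused_by("EXP1"))          # ['compute_indexes']
```

Keeping generation and usage as separate relations, as PROV does, lets cross-experiment reuse be derived instead of stored; in an ontology the same derivation could be expressed as an inference rule.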
References

1. Roure, D., Goble, C., Stevens, R.: The design and realization of the myExperiment virtual research environment for social sharing of workflow. Futur. Gener. Comput. Syst. 25(5), 561–567 (2009). https://doi.org/10.1016/j.future.2008.06.010
2. Hine, C.M.: New infrastructures for knowledge production: understanding E-science, 1st edn. Information Science Publishing (2006)
3. Bosch, J.: From software product lines to software ecosystems. Proceedings of the 13th International Software Product Line Conference, pp. 111–119 (2009). https://doi.org/10.1145/1753235.1753251
4. Jansen, S., Finkelstein, A., Brinkkemper, S.: A sense of community: A research agenda for software ecosystems. 31st International Conference on Software Engineering - Companion Volume, ICSE-Companion 2009, pp. 187–190 (2009). https://doi.org/10.1109/ICSE-COMPANION.2009.5070978
5. Belloum, A., Inda, M.A., Vasunin, D., Korkhov, V., Zhao, Z., Rauwerda, H., Breit, T.M., Bubak, M., Hertzberger, L.O.: Collaborative e-Science Experiments and Scientific Workflows, pp. 39–47. IEEE Computer Society, Washington, DC (2011). https://doi.org/10.1109/MIC.2011.87
6. Michel, F.: Integrating heterogeneous data sources in the Web of data. Université Côte d'Azur, Nice (2017)
7. Doan, A., Halevy, A., Ives, Z.: Principles of data integration. Elsevier, San Francisco (2012)
8. Mayer, R., Miksa, T., Rauber, A.: Ontologies for describing the context of scientific experiment processes. 10th International Conference on e-Science, pp. 153–160 (2014). https://doi.org/10.1109/eScience.2014.47
9. Buneman, P., Tan, W.: Data Provenance: What next? ACM SIGMOD Record 47(3), 5–16 (2018). https://doi.org/10.1145/3316416.3316418
10. Moreau, L., Groth, P.: Provenance: an introduction to PROV. Synthesis Lectures on the Semantic Web: Theory and Technology 3(4), 129 p. (2013). https://doi.org/10.2200/S00528ED1V01Y201308WBE007
11. Simmhan, Y.L., Plale, B., Gannon, D., Marru, S.: Performance Evaluation of the Karma Provenance Framework for Scientific Workflows. Provenance and Annotation of Data: International Provenance and Annotation Workshop, IPAW, Chicago, IL, USA, pp. 222–236 (2006). https://doi.org/10.1007/11890850_23
12. Gil, Y., Deelman, E., Ellisman, M., Fahringer, T., Fox, G., Gannon, D., Goble, C., Livny, M., Moreau, L., Myers, J.: Examining the challenges of scientific workflows. Computer 40(12), 24–32 (2007). https://doi.org/10.1109/MC.2007.421
13. De Oliveira, D., Ogasawara, E., Baião, F., Mattoso, M.: SciCumulus: A lightweight cloud middleware to explore many task computing paradigm in scientific workflows. 2010 IEEE 3rd International Conference on Cloud Computing (CLOUD), Miami, FL, USA, pp. 378–385 (2010). https://doi.org/10.1109/CLOUD.2010.64
14. Cao, B., Plale, B., Subramanian, G., Robertson, E., Simmhan, Y.: Provenance Information Model of Karma Version 3. Proceedings of the 2009 Congress on Services-I, Los Angeles, CA, USA, pp. 348–351 (2009). https://doi.org/10.1109/SERVICES-I.2009.54
15. Classe, T., Braga, R.M., David, J.M.N., Campos, F., Arbex, A.: A distributed infrastructure to support scientific experiments. J. Grid Comput. 15, 475–500 (2017). https://doi.org/10.1007/s10723-017-9401-7
16. Simon, H.A.: The sciences of the artificial. Cambridge, MA (1969)
17. Hevner, A.R., March, S.T., Park, J., Ram, S.: Design science in information systems research. MIS Q. 28(1), 75–105 (2004)
18. Hevner, A.: A Three Cycle View of Design Science Research. Scand. J. Inf. Syst. 19(2), 87–92 (2007)
19. Manikas, K.: Revisiting software ecosystems research: A longitudinal literature study. J. Syst. Softw. 117, 84–103 (2016). https://doi.org/10.1016/j.jss.2016.02.003
20. Jansen, S.: A focus area maturity model for software ecosystem governance. Inf. Softw. Technol. 118, 106219, ISSN 0950-5849 (2020). https://doi.org/10.1016/j.infsof.2019.106219
21. Parrot, L., Lacroix, R., Wade, K.M.: Design considerations for the implementation of multi-agent systems in the dairy industry. Comput. Electron. Agric. 38(2), 79–98 (2003). https://doi.org/10.1016/S0168-1699(02)00139-4
22. Janssen, S., Andersen, E., Athanasiadis, I.N., Van Ittersum, M.K.: A database for integrated assessment of European agricultural systems. Environ. Sci. Policy 12(5), 573–587 (2009). https://doi.org/10.1016/j.envsci.2009.01.007
23. Hulsegge, B., Smits, M.A., te Pas, M.F.W., Woelders, H.: Contributions to an animal trait ontology. J. Anim. Sci. 90(6), 2061–2066 (2012). https://doi.org/10.2527/jas.2011-4251
24. Jonquet, C., Toulet, A., Arnaud, E., Aubin, S., Dzale, E., Emonet, V., Graybeal, J., Laporte, M., Musen, M., Larmand, V.: AgroPortal: A vocabulary and ontology repository for agronomy. Comput. Electron. Agric. 144, 126–143 (2018). https://doi.org/10.1016/j.compag.2017.10.012
25. Da Cruz, S.M.S., Do Nascimento, J.A.P.: Towards integration of data-driven agronomic experiments with data provenance. Comput. Electron. Agric. 161, 14–28 (2019). https://doi.org/10.1016/j.compag.2019.01.044
26. Silva, M.F., Baião, F.A., Revoredo, K.: Towards Planning Scientific Experiments through Declarative Model Discovery in Provenance Data. 10th International Conference on e-Science, São Paulo, Brasil, pp. 95–98 (2014). https://doi.org/10.1109/eScience.2014.60
27. Sirqueira, T.F.M., Dalpra, H.L.O., Braga, R., Araújo, M.A.P., David, J.M.N., Campos, F.: E-SECO ProVersion: An approach for scientific workflows maintenance and evolution. Procedia Comput. Sci. 100, 547–556 (2016). https://doi.org/10.1016/j.procs.2016.09.194
28. Park, J., Kim, U., Yun, D., Yeom, K.: Approach for selecting and integrating cloud services to construct hybrid cloud. J. Grid Comput. 18, 441–4698 (2020). https://doi.org/10.1007/s10723-020-09519-x
29. Markoska, E., Ackovsak, N., Ristov, S., Gusev, M.: Software design patterns to develop an interoperable cloud environment. 23rd IEEE Telecommunications Forum Telfor (TELFOR), pp. 986–989 (2015). https://doi.org/10.1109/TELFOR.2015.7377630
30. Neiva, F.W., David, J.M.N., Braga, R., Campos, F., Freitas, V.: PRIME: Pragmatic interoperability architecture to support collaborative development of scientific workflows. Brazilian Symposium on Components, Architectures and Reuse Software, pp. 50–59 (2015). https://doi.org/10.1109/SBCARS.2015.16
31. Neiva, F.W., David, J.M.N., Braga, R., Borges, M.R.S., Campos, F.: SM2PIA: A Model to Support the Development of Pragmatic Interoperability Requirements. In: 2016 IEEE 11th International Conference on Global Software Engineering (ICGSE), pp. 119–128 (2016). https://doi.org/10.1109/ICGSE.2016.15
32. Horrocks, I., Patel-Schneider, P.F., Boley, H., Tabet, S., Grosof, B., Dean, M.: SWRL: A semantic web rule language combining OWL and RuleML. Available at: http://www.w3.org/Submission/SWRL/. Accessed 10/18/2021
33. Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B., Wesslén, A.: Experimentation in software engineering. Springer, Berlin (2012)
34. Runeson, P., Host, M., Rainer, A.: Case study research in software engineering: guidelines and examples, 1st edn. Wiley Publishing, Hoboken (2012)
35. Basili, V.R., Weiss, D.M.: A methodology for collecting valid software engineering data. IEEE Trans. Software Eng. 10(6), 728–738 (1984). https://doi.org/10.1109/TSE.1984.5010301
36. Guarino, N., Oberle, D., Staab, S.: What is an ontology? Handbook on ontologies, pp. 1–17. Springer, Berlin (2009)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

