
PEtab—Interoperable specification of parameter estimation problems in systems biology

Abstract

Reproducibility and reusability of the results of data-based modeling studies are essential. Yet, there has been—so far—no broadly supported format for the specification of parameter estimation problems in systems biology. Here, we introduce PEtab, a format which facilitates the specification of parameter estimation problems using Systems Biology Markup Language (SBML) models and a set of tab-separated value files describing the observation model and experimental data as well as parameters to be estimated. We already implemented PEtab support into eight well-established model simulation and parameter estimation toolboxes with hundreds of users in total. We provide a Python library for validation and modification of a PEtab problem and currently 20 example parameter estimation problems based on recent studies.

Author summary

Parameter estimation is a common and crucial task in modeling, as many models depend on unknown parameters which need to be inferred from data. There exist various tools for tasks like model development, model simulation, optimization, or uncertainty analysis, each with different capabilities and strengths. In order to be able to easily combine tools in an interoperable manner, but also to make results accessible and reusable for other researchers, it is valuable to define parameter estimation problems in a standardized form. Here, we introduce PEtab, a parameter estimation problem definition format which integrates with established systems biology standards for model and data specification. As the novel format is already supported by eight software tools with hundreds of users in total, we expect it to be of great use and impact in the community, both for modeling and algorithm development.

Introduction

Dynamical modeling is central to systems biology, providing insights into the underlying mechanisms of complex phenomena [1]. It enables the integration of heterogeneous data, the testing and generation of hypotheses, and experimental design. However, to achieve this, the unknown model parameters commonly need to be inferred from experimental observations.

Various software tools exist for simulating models and inferring parameters [2–10], which implement various methods and algorithms. Many of these tools support community standards for model specification to facilitate reproducibility, interoperability and reusability. In particular the Systems Biology Markup Language (SBML) [11], CellML [12] and the BioNetGen Language (BNGL) [13] are widely used.

The Simulation Experiment Description Markup Language (SED-ML) builds on top of such model definitions and allows for a machine-readable description of simulation experiments based on XML [14]. Also more complex simulation experiments like parameter scans can be encoded, and a human-readable adaptation is provided by the phraSED-ML format [15]. Similarly, the XML-based Systems Biology Results Markup Language (SBRML) was designed to associate models with experimental data and share simulation experiment results in a machine-readable way [16]. Like SED-ML, SBRML can also be used for parameter scans. Complementing these formats, SBtab is a set of table-based conventions for the definition of experimental data and models designed for human-readability and -writability [17].

However, parameter estimation is so far not in the scope of any of the available formats, and important information for it, like the definition of a noise model, is missing. Parameter estimation toolboxes usually use their own specific input formats, making it difficult for the user to switch between tools to benefit from their complementary functionalities and hindering reusability and reproducibility.

Based on our experience with parameter estimation and tool development for systems biology, we developed PEtab, a tabular format for specifying parameter estimation problems. This includes the specification of biological models, observation and noise models, experimental data and their mapping to the observation model, as well as parameters in an unambiguous way.

Design and implementation

Scope

The scope of PEtab is the full specification of parameter estimation problems in typical systems biology applications. In our experience, a typical setup of data-based modeling starts either with (i) the model of a biological system that is to be calibrated, or with (ii) experimental data that are to be integrated and analyzed using a computational model. Measurements are linked to the biological model by an observation and noise model. Often, measurements are taken after some perturbations have been applied, which are modeled as derivations from a generic model (Fig 1A). Therefore, one goal was to specify such a setup in the least redundant way. Furthermore, we wanted to establish an intuitive, modular, machine- and human-readable and -writable format that makes use of existing standards.

Fig 1. Specifying parameter estimation problems in PEtab.

(A) Example of a typical setup for data-based modeling. Usually, a model of a biological system is developed and calibrated based on measurements from perturbation experiments, which are linked to the biological model by an observation model. Different instances of a generic model are used to account for different perturbations or measurement setups. (B) Simplified illustration of how different entities from (A) map to different PEtab files (not all table columns are shown).

https://doi.org/10.1371/journal.pcbi.1008646.g001

PEtab problem specification format

PEtab defines parameter estimation problems using a set of files that are outlined in Fig 2. A detailed specification of PEtab version 1 is provided in supplementary file S1 File, as well as at https://github.com/PEtab-dev/PEtab. Additionally, we created a tutorial illustrating how to set up a PEtab problem, covering the most common features (supplementary file S2 File). Further example problems can be found at https://github.com/Benchmarking-Initiative/Benchmark-Models-PEtab. The different files specify the biological model, the observation model, experimental conditions, measurements, parameters and visualizations (Fig 1B). These files are described in more detail in the following.

Fig 2. Overview of PEtab files and the most important features.

PEtab consists of a model in the SBML format and several tab-separated value (TSV) files to specify measurements and link them to the model. A visualization file can be provided optionally. A YAML file can be used to group the aforementioned files unambiguously.

https://doi.org/10.1371/journal.pcbi.1008646.g002

Model (SBML): File specifying the biological process using the established and well-supported SBML format [11]. Any existing SBML model can be used without modification. All versions of SBML are supported by PEtab and can be used if the specific toolbox supports them.

Experimental conditions (TSV): File specifying the condition(s), such as drug stimuli or genetic backgrounds, under which the experimental data were collected. These experimental conditions specify model properties that are altered between conditions, and allow for a hierarchical specification of model properties (Fig 3A). If simulation conditions are used for pre-equilibration—meaning that some experiment started from the equilibrium reached for another condition—specific model states can be marked for re-initialization (Fig 3B).

Fig 3. Parameter hierarchy and pre-equilibration in PEtab.

(A) Illustration of possibilities and precedence of parameter overriding at different stages. The generic model parameter vector, as specified in the SBML model, can be overridden via the observable, measurement, condition and parameter tables, differentially for conditions and measurement points to account for different model inputs or observational model parameters. The parameters that are overridden in each step are indicated with thicker cell borders. Individual parameters can be set to specific values or marked to be estimated (as here p1). (B) In an often encountered experimental setup, a biological system is under some “baseline” condition and assumed to be in equilibrium (e.g., here depicted for after 24h incubation) before a perturbation is applied. If the equilibrium state of the system is not known a priori, such a setup can be modeled by simulating the system until an apparent steady-state is reached (pre-equilibration). To simulate the perturbation, a subset of model states are reinitialized.

https://doi.org/10.1371/journal.pcbi.1008646.g003
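
The condition-specific overriding described above can be sketched in a few lines of Python. This is only an illustration, not the PEtab library: the parameter names, values and the helper functions are hypothetical, and the conditionId column name follows our reading of the PEtab version 1 specification, which should be consulted for the authoritative layout.

```python
import csv
import io

# Hypothetical default values, standing in for those defined in an SBML model.
MODEL_DEFAULTS = {"k_on": 1.0, "k_off": 0.5, "drug": 0.0}

# Hypothetical condition table. Columns after conditionId name model entities;
# entries are either numbers or IDs of parameters defined elsewhere.
CONDITIONS_TSV = (
    "conditionId\tdrug\tk_off\n"
    "control\t0\t0.5\n"
    "treated\t10\t0.5\n"
    "mutant\t0\tk_off_mutant\n"
)


def load_conditions(text):
    """Parse TSV text into {conditionId: {entity: raw string value}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    conditions = {}
    for row in reader:
        condition_id = row.pop("conditionId")
        conditions[condition_id] = dict(row)
    return conditions


def number_or_id(value):
    """Return a float for numeric entries, else keep the parameter ID string."""
    try:
        return float(value)
    except ValueError:
        return value


def parameters_for(condition_id, conditions, defaults):
    """Start from the model defaults and apply the overrides of one condition."""
    merged = dict(defaults)
    for entity, value in conditions[condition_id].items():
        merged[entity] = number_or_id(value)
    return merged


conditions = load_conditions(CONDITIONS_TSV)
treated = parameters_for("treated", conditions, MODEL_DEFAULTS)
mutant = parameters_for("mutant", conditions, MODEL_DEFAULTS)
print(treated)
print(mutant)
```

In this sketch the "mutant" condition maps k_off to the ID k_off_mutant rather than to a number, which is how a condition-specific parameter could be handed on to the parameter table for estimation.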

Observables (TSV): File linking model properties such as state variables and parameter values to measurement data via observation functions and noise models. Various noise models including normal and Laplace distributions are supported, and noise model parameters can be estimated. Observables can be on linear or logarithmic scale.
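
To make the role of the noise model concrete, the following Python sketch writes out the per-data-point negative log-likelihood for normal and Laplace noise, together with a simple scaled-and-shifted observation function. The function names and the example numbers are our own illustrative assumptions; the formulas are the standard densities of the respective distributions and are not taken from the PEtab library.

```python
import math


def observable(state, scaling=1.0, offset=0.0):
    """Hypothetical observation function: a scaled and shifted model state."""
    return scaling * state + offset


def nll_normal(measurement, simulation, sigma):
    """Negative log-density of a normal distribution with std. deviation sigma."""
    residual = (measurement - simulation) / sigma
    return 0.5 * math.log(2.0 * math.pi * sigma**2) + 0.5 * residual**2


def nll_laplace(measurement, simulation, scale):
    """Negative log-density of a Laplace distribution with the given scale."""
    return math.log(2.0 * scale) + abs(measurement - simulation) / scale


def nll_log_normal(measurement, simulation, sigma):
    """Normal noise on the natural-log scale.

    The additional log(measurement) term is the Jacobian of the log transform.
    """
    return (
        nll_normal(math.log(measurement), math.log(simulation), sigma)
        + math.log(measurement)
    )


simulated = observable(state=2.0, scaling=1.5, offset=0.1)
print(nll_normal(3.0, simulated, sigma=0.2))
print(nll_laplace(3.0, simulated, scale=0.2))
```

Because sigma and scale enter these expressions explicitly, they can be treated as unknowns and estimated alongside the dynamic parameters, as described above.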

Measurements (TSV): File specifying and linking experimental data to the experimental conditions and the observables via the respective identifiers. Optionally, simulation conditions for pre-equilibration can be defined (Fig 3B). Parameters that are relevant for the observation process of a given measurement, such as offsets or scaling parameters, can be provided along with the measured values. This allows for overriding generic output parameters in a measurement-specific manner (Fig 3A).
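
As a rough illustration of measurement-specific overrides, the sketch below resolves placeholders in an observable formula from a semicolon-separated override entry. The placeholder naming scheme (observableParameter1_<observableId>, ...) and the semicolon-separated format reflect our reading of the PEtab version 1 specification; the observable ID, the formula and both helper functions are hypothetical.

```python
import re


def placeholder_values(observable_id, overrides):
    """Map numbered placeholders of one observable to the given override values."""
    values = [v.strip() for v in overrides.split(";") if v.strip()]
    return {
        f"observableParameter{i}_{observable_id}": value
        for i, value in enumerate(values, start=1)
    }


def substitute(formula, replacements):
    """Replace whole identifiers in the formula; leave all other tokens as they are."""

    def replace(match):
        name = match.group(0)
        return replacements.get(name, name)

    return re.sub(r"[A-Za-z_]\w*", replace, formula)


# Hypothetical observable formula with two placeholders.
formula = "observableParameter1_obs_a * A + observableParameter2_obs_a"

# Hypothetical override entry of one measurement row: a fixed scaling factor
# and an offset parameter that would be defined in the parameter table.
mapping = placeholder_values("obs_a", "2.0;offset_exp1")
resolved = substitute(formula, mapping)
print(resolved)
```

Substituting on whole identifiers rather than on raw substrings avoids accidentally rewriting a longer name that merely starts with a placeholder name.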

Parameters (TSV): File defining the parameters to be estimated, including lower and upper bounds as well as transformations (e.g., linear or logarithmic) to be used in parameter estimation. Furthermore, prior information on the parameters can be specified to inform starting points for parameter estimation, or to perform Bayesian inference.
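
The interplay of bounds and parameter transformations can be sketched as follows. The column names (parameterId, parameterScale, lowerBound, upperBound, nominalValue, estimate) and the convention that bounds are given on the linear scale follow our understanding of the PEtab version 1 specification; the example row and the helper functions are hypothetical.

```python
import math


def to_scale(value, scale):
    """Transform a linear-scale value to the scale used during estimation."""
    if scale == "lin":
        return value
    if scale == "log":
        return math.log(value)
    if scale == "log10":
        return math.log10(value)
    raise ValueError(f"unknown parameter scale: {scale}")


def from_scale(value, scale):
    """Transform a value on the estimation scale back to the linear scale."""
    if scale == "lin":
        return value
    if scale == "log":
        return math.exp(value)
    if scale == "log10":
        return 10.0**value
    raise ValueError(f"unknown parameter scale: {scale}")


# Hypothetical row of a parameter table, with values as read from a TSV file.
PARAMETER_ROW = {
    "parameterId": "k_on",
    "parameterScale": "log10",
    "lowerBound": "1e-3",
    "upperBound": "1e3",
    "nominalValue": "0.1",
    "estimate": "1",
}


def optimizer_bounds(row):
    """Bounds as an optimizer working on the transformed scale would see them."""
    scale = row["parameterScale"]
    return (
        to_scale(float(row["lowerBound"]), scale),
        to_scale(float(row["upperBound"]), scale),
    )


lower, upper = optimizer_bounds(PARAMETER_ROW)
print(lower, upper)
```

With a log10 scale, the six orders of magnitude between the bounds become a symmetric interval around zero, which is one common reason for estimating rate constants on a logarithmic scale.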

Visualization (TSV): Optional visualization file specifying how to combine data and simulations for plotting. Different plots such as time-course or dose-response curves can be automatically created based on this file using the PEtab Python library described below. This makes it possible, for example, to quickly create visualizations to inspect parameter estimation results. A default visualization file can be automatically generated.

PEtab problem file (YAML): File linking all of the above-mentioned PEtab files together. This allows combinations of, e.g., multiple models or measurement files into a single parameter estimation problem, as well as easy reuse of various files in different parameter estimation problems (e.g., for model selection). The current YAML version 1.2 is used here.
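
The grouping role of the problem file can be illustrated with a plain Python dictionary that mirrors the structure such a YAML file might have. The key names (format_version, parameter_file, problems, sbml_files, condition_files, observable_files, measurement_files) are given as we recall them from the PEtab version 1 specification and should be checked against it; all file names and the helper function are hypothetical.

```python
# Hypothetical content of a problem file after parsing the YAML into Python
# objects. One parameter file is shared; each entry in "problems" groups a
# model with its condition, observable and measurement files.
problem = {
    "format_version": 1,
    "parameter_file": "parameters.tsv",
    "problems": [
        {
            "sbml_files": ["model.xml"],
            "condition_files": ["conditions.tsv"],
            "observable_files": ["observables.tsv"],
            "measurement_files": [
                "measurements_exp1.tsv",
                "measurements_exp2.tsv",
            ],
        }
    ],
}


def referenced_files(problem_config):
    """Collect every file name that the problem description refers to."""
    files = [problem_config["parameter_file"]]
    for sub_problem in problem_config["problems"]:
        for key, value in sub_problem.items():
            # Only keys ending in "_files" hold lists of file names.
            if key.endswith("_files"):
                files.extend(value)
    return sorted(files)


all_files = referenced_files(problem)
print(all_files)
```

Listing the referenced files in one place is what allows, for example, the same model and parameter files to be combined with a different set of measurement files in a second problem description.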

We designed PEtab to cover common features needed for parameter estimation. The TSV files comprise different mandatory columns. These provide all necessary information to define an objective function like the χ² or likelihood function. However, some methods tailored to specific problems require additional information to estimate the unknown parameters. To acknowledge this, we allow for optional application-specific extensions in addition to the required columns in the PEtab files, e.g., if some parameters can be calculated analytically using hierarchical optimization approaches [18].

PEtab library

To facilitate easy usability, PEtab (https://github.com/PEtab-dev/PEtab) comes with detailed documentation describing the specific format of each of the different files in a concise yet comprehensive manner. Additionally, we provide a Python-based library that can be used to read, modify, write, and validate existing PEtab problems. Furthermore, the PEtab library provides functionality to package PEtab files into COMBINE archives [19]. After parameter estimation, the modeler usually investigates how well the model fits the experimental data. To support this, the PEtab library provides various visualization routines to analyze data and parameter estimation results.

Results

PEtab support in established tools

We implemented support for PEtab in currently eight systems biology toolboxes, namely COPASI [2], AMICI [6], pyPESTO [20], pyABC [21], Data2Dynamics [5], dMod [10], parPE [18], and MEIGO [4]. These toolboxes provide a broad range of distinct features for model creation, model simulation, parameter inference, and uncertainty quantification (Table 1). Combining different tools with complementary features is often desirable. However, in practice this was hitherto hampered by the substantial overhead of tedious and error-prone re-implementation of the parameter estimation problem in the specific format required by the respective tool. With all of these tools now supporting PEtab, a user can more easily combine different tools and make use of their specific strengths. For example, one can use COPASI for model creation and testing, AMICI for efficient simulation of large models, pyPESTO for multi-start local optimization and sampling, or MEIGO for global scatter searches, and Data2Dynamics or dMod for profiling. The ease of switching between tools also provides the opportunity to easily reproduce and verify results, e.g., whether different tools yield similar results. Additionally, developers can compare the performance of newly developed methods with existing algorithms implemented in different toolboxes, independent of the programming language, to select the most appropriate one for a given setting.

Table 1. Non-exhaustive overview of the functionality offered by the different toolboxes currently supporting the PEtab format.

The list of supporting tools and functionality covered by the respective tools may increase over time. An updated overview is available on the PEtab website. Darker colors indicate more accurate, scalable, or broader functionality compared to basic implementations.

https://doi.org/10.1371/journal.pcbi.1008646.t001

PEtab test suite and examples

Along with introducing PEtab support to different tools, we have set up a test suite with various toy problems and reference values that can be used by other tool developers to assess and verify PEtab support in their software packages. The specific status of the PEtab support of the different tools is provided in Table 2 and continuously updated on the PEtab GitHub webpage. The test cases are based on SBML level 2 version 4 which is supported by all considered toolboxes.

Table 2. Overview of supported PEtab features in different tools, based on passed test cases of the PEtab test suite.

The first character indicates whether computing simulated data is supported and simulations are correct (✓) or not (-). The second character indicates whether computing χ² values of residuals is supported and correct (✓) or not (-). The third character indicates whether computing likelihoods is supported and correct (✓) or not (-). An up-to-date overview of supported features is maintained on the PEtab GitHub page.

https://doi.org/10.1371/journal.pcbi.1008646.t002

To demonstrate the various features and the broad applicability of PEtab, we provide a growing collection of currently 20 example parameter estimation problems in the PEtab format largely based on a previously published benchmark collection [22]. These models can be used as templates for creating new PEtab problems and for method development and testing.

Availability and future directions

PEtab complements existing standards for model definition by facilitating the specification of complex estimation problems using tabular text files, defining experimental measurements and linking model entities and measurements via observables and a noise model.

The specification of the PEtab format, the PEtab Python library, as well as links to examples, a web-based validation tool, and all supporting software are available at https://github.com/PEtab-dev/PEtab. A snapshot is available at https://doi.org/10.5281/zenodo.3732958. PEtab and all original content presented here are available under permissive licences. For any questions or requests related to PEtab, we encourage interested users to approach us via the Issues function in the aforementioned GitHub repository, or the respective tool repositories for more specific queries.

We developed PEtab to cover the most common features needed for parameter estimation in the context of dynamic modeling. However, as multiple model formats as well as a multitude of tailored parameter estimation methods exist, which require different information, we could not cover every aspect. While at the time of writing, PEtab only allows for models defined in the SBML format, the PEtab format is general enough to be integrated with other model specification formats like CellML and rule-based formats [13] in the future. Additionally, other formats like SBtab [17] or Antimony [23] provide converters to SBML and can therefore also indirectly be used together with PEtab. Recently, new methods have been developed to estimate parameters in a hierarchical manner [18], including from qualitative data [24, 25]. PEtab could be extended to also allow for these types of measurements. To cover the most important needs, we invite users and developers to suggest new features to be supported by PEtab. We formed a maintainer team comprising developers of all supporting toolboxes to facilitate long-term support and improvement of PEtab. We encourage additional toolbox developers to implement support for PEtab. As an example, since the preprint publication of this manuscript, PEtab has already been adopted as the input format for a newly developed tool, SBML2Julia [26].

As PEtab is already supported by software tools with hundreds of users in total, we envisage that it will facilitate reusability, reproducibility and interoperability. We expect that a common specification format will prove helpful for users as well as developers of parameter estimation tools and methods in systems biology.

Supporting information

S1 File. PEtab specification.

Detailed format description of PEtab version 1.

https://doi.org/10.1371/journal.pcbi.1008646.s001

(PDF)

S2 File. PEtab tutorial.

Step-by-step instructions for creating PEtab files for an application example.

https://doi.org/10.1371/journal.pcbi.1008646.s002

(PDF)

Acknowledgments

We thank Dagmar Waltemath for helpful discussions.

References

  1. Kitano H. Computational Systems Biology. Nature. 2002;420(6912):206–210.
  2. Hoops S, Sahle S, Gauges R, Lee C, Pahle J, Simus N, et al. COPASI—a COmplex PAthway SImulator. Bioinformatics. 2006;22(24):3067–3074. pmid:17032683
  3. Balsa-Canto E, Banga JR. AMIGO, a toolbox for advanced model identification in systems biology using global optimization. Bioinformatics. 2011;27(16):2311–2313.
  4. Egea JA, Henriques D, Cokelaer T, Villaverde AF, MacNamara A, Danciu DP, et al. MEIGO: An open-source software suite based on metaheuristics for global optimization in systems biology and bioinformatics. BMC Bioinf. 2014;15(136). pmid:24885957
  5. Raue A, Steiert B, Schelker M, Kreutz C, Maiwald T, Hass H, et al. Data2Dynamics: a modeling environment tailored to parameter estimation in dynamical systems. Bioinformatics. 2015;31(21):3558–3560. pmid:26142188
  6. Fröhlich F, Kaltenbacher B, Theis FJ, Hasenauer J. Scalable Parameter Estimation for Genome-Scale Biochemical Reaction Networks. PLoS Comput Biol. 2017;13(1):e1005331.
  7. Choi K, Medley JK, König M, Stocking K, Smith L, Gu S, et al. Tellurium: An extensible python-based modeling environment for systems and synthetic biology. Bio Systems. 2018;171:74–79. pmid:30053414
  8. Stapor P, Weindl D, Ballnus B, Hug S, Loos C, Fiedler A, et al. PESTO: Parameter EStimation TOolbox. Bioinformatics. 2018;34(4):705–707. pmid:29069312
  9. Mitra ED, Suderman R, Colvin J, Ionkov A, Hu A, Sauro HM, et al. PyBioNetFit and the Biological Property Specification Language. iScience. 2019;19:1012–1036. pmid:31522114
  10. Kaschek D, Mader W, Fehling-Kaschek M, Rosenblatt M, Timmer J. Dynamic Modeling, Parameter Estimation, and Uncertainty Analysis in R. J Stat Softw. 2019;88(10).
  11. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, et al. The systems biology markup language (SBML): A medium for representation and exchange of biochemical network models. Bioinformatics. 2003;19(4):524–531. pmid:12611808
  12. Cuellar AA, Lloyd CM, Nielsen PF, Bullivant DP, Nickerson DP, Hunter PJ. An Overview of CellML 1.1, a Biological Model Description Language. Simulation. 2003;79(12):740–747.
  13. Harris LA, Hogg JS, Tapia JJ, Sekar JAP, Gupta S, Korsunsky I, et al. BioNetGen 2.2: advances in rule-based modeling. Bioinformatics. 2016;32(21):3366–3368. pmid:27402907
  14. Waltemath D, Adams R, Bergmann FT, Hucka M, Miller FKAK, Moraru II, et al. Reproducible computational biology experiments with SED-ML—The Simulation Experiment Description Markup Language. BMC Syst Biol. 2011;5(198). pmid:22172142
  15. Choi K, Smith LP, Medley JK, Sauro HM. phraSED-ML: A paraphrased, human-readable adaptation of SED-ML. Journal of bioinformatics and computational biology. 2016;14(06):1650035.
  16. Dada JO, Spasić I, Paton NW, Mendes P. SBRML: a markup language for associating systems biology data with models. Bioinformatics. 2010;26:932–938.
  17. Lubitz T, Hahn J, Bergmann FT, Noor E, Klipp E, Liebermeister W. SBtab: a flexible table format for data exchange in systems biology. Bioinformatics. 2016;32(16):2559–2561.
  18. Schmiester L, Schälte Y, Fröhlich F, Hasenauer J, Weindl D. Efficient parameterization of large-scale dynamic models based on relative measurements. Bioinformatics. 2019;36(2):594–602.
  19. Bergmann FT, Adams R, Moodie S, Cooper J, Glont M, Golebiewski M, et al. COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC bioinformatics. 2014;15:369. pmid:25494900
  20. Schälte Y, Fröhlich F, Stapor P, Wang D, Weindl D, Schmiester L, et al. ICB-DCM/pyPESTO: pyPESTO 0.0.11; 2020. Available from: https://doi.org/10.5281/zenodo.3715448.
  21. Klinger E, Rickert D, Hasenauer J. pyABC: distributed, likelihood-free inference. Bioinformatics. 2018;34(20):3591–3593.
  22. Hass H, Loos C, Raimúndez-Álvarez E, Timmer J, Hasenauer J, Kreutz C. Benchmark problems for dynamic modeling of intracellular processes. Bioinformatics. 2019;35(17):3073–3082.
  23. Smith LP, Bergmann FT, Chandran D, Sauro HM. Antimony: a modular model definition language. Bioinformatics. 2009;25(18):2452–2454.
  24. Mitra ED, Dias R, Posner RG, Hlavacek WS. Using both qualitative and quantitative data in parameter identification for systems biology models. Nature communications. 2018;9(1):3901.
  25. Schmiester L, Weindl D, Hasenauer J. Parameterization of mechanistic models from qualitative data using an efficient optimal scaling approach. J Math Biol. 2020;81(2):603–623. pmid:32696085
  26. Lang PF, Shin S, Zavala VM. SBML2Julia: interfacing SBML with efficient nonlinear Julia modelling and solution tools for parameter optimization. arXiv preprint arXiv:2011.02597. 2020.