Cobra Toolbox
https://doi.org/10.1038/s41596-018-0098-2
Nathan E. Lewis5,21, Thomas Sauter2, Bernhard Ø. Palsson16,22, Ines Thiele1 and Ronan M. T. Fleming1,23*
Constraint-based reconstruction and analysis (COBRA) provides a molecular mechanistic framework for integrative
analysis of experimental molecular systems biology data and quantitative prediction of physicochemically and
biochemically feasible phenotypic states. The COBRA Toolbox is a comprehensive desktop software suite of interoperable
COBRA methods. It has found widespread application in biology, biomedicine, and biotechnology because its functions can
be flexibly combined to implement tailored COBRA protocols for any biochemical network. This protocol is an update to
the COBRA Toolbox v.1.0 and v.2.0. Version 3.0 includes new methods for quality-controlled reconstruction, modeling,
topological analysis, strain and experimental design, and network visualization, as well as network integration of
chemoinformatic, metabolomic, transcriptomic, proteomic, and thermochemical data. New multi-lingual code integration
also enables an expansion in COBRA application scope via high-precision, high-performance, and nonlinear numerical
optimization solvers for multi-scale, multi-cellular, and reaction kinetic modeling, respectively. This protocol provides an
overview of all these new features and can be adapted to generate and analyze constraint-based models in a wide variety
of scenarios. The COBRA Toolbox v.3.0 provides an unparalleled depth of COBRA methods.
This protocol is an update to Nat. Protoc. 2, 727–738 (2007): https://doi.org/10.1038/nprot.2007.99 and Nat. Protoc. 6,
1290–1307 (2011): https://doi.org/10.1038/protex.2011.234
1Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg. 2Life Sciences Research Unit, University of Luxembourg,
Belvaux, Luxembourg. 3Center for Genome Regulation (Fondap 15090007), University of Chile, Santiago, Chile. 4Mathomics, Center for Mathematical
Modeling, University of Chile, Santiago, Chile. 5Department of Pediatrics, University of California, San Diego, School of Medicine, La Jolla, CA, USA.
6European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK. 7Department of Chemical Engineering,
The Pennsylvania State University, State College, PA, USA. 8Department of Physics, and Bioinformatics and Systems Biology Program, University of
California, San Diego, La Jolla, CA, USA. 9Sinopia Biosciences, San Diego, CA, USA. 10Algorithms and Randomness Center, School of Computer Science,
Georgia Institute of Technology, Atlanta, GA, USA. 11Biomedical Engineering and Sciences Department, TECNUN, University of Navarra, San Sebastián,
Spain. 12Institute of Microbiology and Biotechnology, University of Latvia, Riga, Latvia. 13Institut Curie, PSL Research University, Mines Paris Tech, Inserm,
U900, Paris, France. 14Department of Management Science and Engineering, Stanford University, Stanford, CA, USA. 15Department of Statistics,
University of Michigan, Ann Arbor, MI, USA. 16Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA. 17Utah State
University Research Foundation, North Logan, UT, USA. 18Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences,
Imperial College London, London, UK. 19Department of Mathematics, University of Alicante, Alicante, Spain. 20Department of Computing and
Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA. 21Novo Nordisk Foundation Center for Biosustainability, University of
California, San Diego, La Jolla, CA, USA. 22Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Lyngby,
Denmark. 23Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University,
Leiden, The Netherlands. 24These authors contributed equally: Laurent Heirendt, Sylvain Arreckx. *e-mail: ronan.mt.fl[email protected]
Introduction
[Fig. 1 graphic: a, genome-scale metabolic reconstruction — a genome sequence is annotated with metabolic enzymes and their reactions to generate an automatic draft reconstruction, which is then gap-filled and manually refined against the literature and new knowledge; b, the reconstruction is converted into a model whose solution space of flux vectors (v1, v2, …) can be interrogated; c, flux balance analysis — maximize/minimize an objective function ψ = c1v1 + c2v2 + … + c5v5 over reactions R1–R5 subject to the steady-state constraint Sv = 0 on the metabolites.]
Fig. 1 | Overview of key constraint-based reconstruction and analysis concepts. a, A genome-scale metabolic reconstruction is a structured knowledge
base that abstracts pertinent information on the biochemical transformations taking place within a chosen biochemical system, e.g., the human gut
microbiome38. Genome-scale metabolic reconstructions are built in two steps. First, a draft metabolic reconstruction based on genome annotations is
generated using one of several platforms. Second, the draft reconstruction is refined on the basis of known experimental and biochemical data from the
literature6. Novel experiments can be performed on the organism and the reconstruction can be refined accordingly. b, A phenotypically feasible
solution space is defined by specifying certain assumptions, e.g., a steady-state assumption, and then converting the reconstruction into a computational
model that eliminates physicochemically or biochemically infeasible network states. Various methods are used to interrogate the solution space. For
example, optimization for a biologically motivated objective function (e.g., biomass production) identifies a single optimal flux vector (v), whereas
uniform sampling provides an unbiased characterization via flux vectors uniformly distributed in the solution space. c, Flux balance analysis is an optimization method that maximizes a linear objective function, ψ(v) = cᵀv, formed by multiplying each reaction flux vj by a predetermined coefficient, cj, subject to a steady-state assumption, Sv = 0, as well as lower and upper bounds on each reaction flux (lbj and ubj, respectively).
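The optimization in Fig. 1c can be made concrete with a small linear program. The toy network, bounds, and objective below are illustrative assumptions, and the sketch uses Python's scipy.optimize.linprog rather than the Toolbox's own MATLAB interface; it shows only the mathematics of FBA — maximize cᵀv subject to Sv = 0 and lb ≤ v ≤ ub:

```python
# Flux balance analysis as a linear program (cf. Fig. 1c).
# Toy network (illustrative, not from any published reconstruction):
#   R1: -> A,  R2: A -> B,  R3: B ->
import numpy as np
from scipy.optimize import linprog

S = np.array([[1.0, -1.0, 0.0],    # metabolite A
              [0.0, 1.0, -1.0]])   # metabolite B
c = np.array([0.0, 0.0, 1.0])      # objective: maximize flux through R3
lb = np.zeros(3)                   # lower bounds lb_j
ub = np.full(3, 10.0)              # upper bounds ub_j

# linprog minimizes, so negate c; A_eq v = b_eq encodes Sv = 0.
res = linprog(-c, A_eq=S, b_eq=np.zeros(2), bounds=list(zip(lb, ub)))
v = res.x
print(v)  # optimal flux vector: [10. 10. 10.]
```

In the COBRA Toolbox itself the equivalent step is a single MATLAB call, solution = optimizeCbModel(model), with the optimization solver selected beforehand via changeCobraSolver.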
Category | Protocol section | Novelty in the COBRA Toolbox v.3.0
B | Initialize and verify the installation | Software dependency audit, e.g., solvers, binaries, and git
R | Input and output of reconstructions and models | Support for latest standards, e.g., SBML flux balance constraints64
R | Reconstruction: rBioNet | New software for quality-controlled reconstruction48
R | Reconstruction: create a functional generic subnetwork | New methods for selecting different types of subnetworks
R | Reconstruction exploration | New methods, e.g., find adjacent reactions
R | Reconstruction refinement | Maintenance of internal model consistency, e.g., upon subnetwork generation29
R | Numerical reconstruction properties | Flag a reconstruction requiring a multi-scale solver54
R | Convert a reconstruction into a flux balance analysis model | Identification of a maximal flux- and stoichiometrically consistent subset69
I | Atomically resolve a metabolic reconstruction | New algorithms and methods for working with molecular structures, atom mapping, and identification of conserved moieties110,122
I | Integration of metabolomic data | New methods for analysis of metabolomic data in a network context65,142
I | Integration of transcriptomic and proteomic data | New algorithms for generation of context-specific models89
A | Flux balance analysis and its variants | New flux balance methods, multi-scale model rescaling and multi-scale solvers, additional solver interfaces, thermodynamically feasible methods42,60,128,132,143,144
A | Variation on reaction rate bounds in flux balance analysis | Increased computational efficiency
A | Parsimonious flux balance analysis | New method for parsimonious flux balance analysis145
A | Sparse flux balance analysis | New method for sparse flux balance analysis
A | Gap filling | Increased computational efficiency82
A | Adding biological constraints to a flux balance model | New methods for coupling reaction rates38,146
A | Testing biochemical fidelity | Human metabolic function test suite17
A | Testing basic properties of a metabolic model (sanity checks) | New methods to minimize occurrence of modeling artifacts66
A | Minimal spanning pathway vectors | New method for determining minimal spanning pathway vectors100
A | Elementary modes and pathway vectors | Extended functionality by integration with CellNetAnalyzer55
A | Minimal cut sets | Extended functionality by integration with CellNetAnalyzer147,148, and new algorithms for genetic MCS57
A | Flux variability analysis | Increased computational efficiency101
A | Uniform sampling of steady-state fluxes | New algorithm, guaranteed convergence to uniform distribution102
I | Thermodynamically constrain reaction directionality | New algorithms and methods for estimation of thermochemical parameters in multi-compartmental, genome-scale metabolic models126,127
A | Variational kinetic modeling | New algorithms and methods for genome-scale kinetic modeling68,135–137
D | Metabolic engineering and strain design | New methods, e.g., OptForce, interpretation of new strain designs. New modeling language interface to GAMS59
V | Human metabolic network visualization: ReconMap | New method for genome-scale metabolic network visualization50,51,149
V | Variable scope visualization with automatic layout generation | New method for automatic visualization of network parts141
B | Contributing to the COBRA Toolbox with MATLAB.devTools | New software application enabling contributions by those unfamiliar with version-control software
B | Engaging with the COBRA Toolbox Forum | >800 posted questions with supportive replies connecting problems and solutions
Each method available in the COBRA Toolbox v.3.0 is made accessible with a narrative tutorial that illustrates how the corresponding function(s) are combined to implement each COBRA method
in the respective src/ directories (https://github.com/opencobra/cobratoolbox/tree/master/src): base (B), reconstruction (R), dataIntegration (I), analysis (A), design (D), and visualization (V).
(ref. 4), which offered an enhanced range of methods to simulate, analyze, and predict a variety of
phenotypes using genome-scale metabolic reconstructions. Since then, the increasing functional scope
and size of biochemical network reconstructions, as well as the increasing breadth of physicochemical
and biological constraints that are represented within constraint-based models, naturally resulted in
the development of a broad arbor of new COBRA methods5.
The present protocol provides an overview of the main novel developments within v.3.0 of the
COBRA Toolbox (Table 1), especially the expansion of functionality to cover new biochemical network
reconstruction and modeling methods. In particular, this protocol includes the input and output of new
standards for sharing reconstructions and models, an extended suite of supported general-purpose
optimization solvers, new optimization solvers developed especially for constraint-based modeling
problems, enhanced functionality in the areas of computational efficiency and high-precision com-
puting, numerical characterization of reconstructions, conversion of reconstructions into various forms
of constraint-based models, comprehensive support for flux balance analysis (FBA) and its variants,
integration with omics data, uniform sampling of high-dimensional models, atomic resolution of
metabolic reconstructions via molecular structures, estimation, and application of thermodynamic
constraints, visualization of metabolic networks, and genome-scale kinetic modeling.
This protocol consists of a set of methods that are introduced in sequence but can be combined in
a multitude of ways. The overall purpose is to enable the user to generate a biologically relevant, high-
quality model that enables novel predictions and hypothesis generation. Therefore, we implement and
enforce standards in reconstruction and simulation that have been developed by the COBRA com-
munity over the past two decades. All explanations of a method are also accompanied by explicit
computational commands.
First, we explain how to initialize and verify the installation of the COBRA Toolbox in MATLAB
(MathWorks). The main options for importing and exploring the content of a biochemical network
reconstruction are introduced. For completeness, a brief summary of methods for manual and
algorithmic reconstruction refinement is provided, with reference to the established reconstruction
protocol6. We also explain how to characterize the numerical properties of a reconstruction, especially
with respect to detection of a reconstruction requiring a multi-scale numerical optimization solver.
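As a minimal sketch of the kind of numerical characterization meant here (the matrix and the cutoff are illustrative assumptions, not the Toolbox's actual criterion), one can inspect the range of magnitudes spanned by the nonzero stoichiometric coefficients; a very wide range flags a potentially multi-scale problem:

```python
# Range of magnitudes of nonzero stoichiometric coefficients.
# A wide range suggests a multi-scale reconstruction that may need a
# higher-precision solver. Matrix and cutoff are illustrative only.
import numpy as np

S = np.array([[-1.0, 1e-5, 0.0],
              [1.0, -1.0, 1e4]])

nz = np.abs(S[S != 0.0])                        # nonzero coefficients
magnitude_range = np.log10(nz.max() / nz.min()) # orders of magnitude
needs_multiscale = magnitude_range > 6.0        # illustrative cutoff
print(magnitude_range, needs_multiscale)
```

Here the coefficients span nine orders of magnitude, so this hypothetical reconstruction would be flagged for a multi-scale solver.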
We explain how to semi-automatically convert a reconstruction into a constraint-based model
suitable for FBA. This is followed by an extensive explanation of how to carry out FBA and its
variants. The procedure to fill gaps in a reconstruction, due to missing reactions, is also explained.
We provide an overview of the main methods for integration of metabolomic, transcriptomic, pro-
teomic, and thermochemical data to generate context-specific, constraint-based models. Various methods
are explained for the addition of biological constraints to a constraint-based model. We then explain how
to test the chemical and biochemical fidelity of the model. After a high-quality model has been generated,
we explain how to interrogate the discrete geometry of its stoichiometric subspaces, how to efficiently
measure the variability associated with the prediction of steady-state reaction rates using flux variability analysis, and how to uniformly sample steady-state fluxes. We introduce various approaches for prospective use of a constraint-based model, including strain design and experimental design.
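The flux variability computation just mentioned reduces to a pair of linear programs per reaction: fix the objective at its FBA optimum, then minimize and maximize each flux in turn. The toy network below is an illustrative assumption, and the sketch again uses scipy rather than the Toolbox's MATLAB function fluxVariability:

```python
# Flux variability analysis (FVA) sketch: fix the objective at its FBA
# optimum, then minimize and maximize each flux in turn. Toy network
# with two parallel routes between A and B (illustrative only).
import numpy as np
from scipy.optimize import linprog

#   R1: -> A,  R2a: A -> B,  R2b: A -> B,  R3: B ->
S = np.array([[1.0, -1.0, -1.0, 0.0],    # metabolite A
              [0.0, 1.0, 1.0, -1.0]])    # metabolite B
c = np.array([0.0, 0.0, 0.0, 1.0])       # maximize flux through R3
bounds = [(0.0, 10.0)] * 4

# Step 1: FBA gives the optimal objective value.
opt = -linprog(-c, A_eq=S, b_eq=np.zeros(2), bounds=bounds).fun

# Step 2: min/max each v_j subject to Sv = 0 and c^T v = opt.
A_eq = np.vstack([S, c])
b_eq = np.append(np.zeros(2), opt)
fva = []
for j in range(4):
    e = np.zeros(4)
    e[j] = 1.0
    vmin = linprog(e, A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun
    vmax = -linprog(-e, A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun
    fva.append((vmin, vmax))

# R1 and R3 are pinned at 10 by the optimum, while each of the two
# parallel routes R2a/R2b can individually carry anything in [0, 10].
print(fva)
```

The wide individual ranges of R2a and R2b, despite a unique optimal objective value, illustrate why FVA is needed to report the uncertainty behind any single FBA flux vector.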
We explain how to atomically resolve a metabolic reconstruction by connecting it with molecular
species structures and how to use cheminformatic algorithms for atom mapping and identification of
conserved moieties. Using molecular structures for each metabolite, and established thermochemical
data, we estimate the transformed Gibbs energy of each subcellular compartment-specific reaction in
a model of human metabolism in order to thermodynamically constrain reaction directionality and
constrain the set of feasible kinetic parameters. Sampled kinetic parameters are then used for var-
iational kinetic modeling in an illustration of the utility of recently published algorithms for genome-
scale kinetic modeling. We also explain how to visualize predicted phenotypic states using a recently
developed approach for metabolic network visualization. We conclude with an explanation of how to
engage with the community of COBRA developers, as well as contribute code to the COBRA Toolbox
with MATLAB.devTools, a newly developed piece of software for community contribution of
COBRA methods to the COBRA Toolbox.
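The principle behind thermodynamically constraining reaction directionality can be illustrated with a short calculation: a reaction may be fixed as irreversible only if its transformed reaction Gibbs energy, ΔrG′ = ΔrG′° + RT Σj sj ln cj, keeps one sign over the whole allowed range of metabolite concentrations. The reaction, standard transformed Gibbs energy, and concentration bounds below are hypothetical, not outputs of the estimation methods used by the Toolbox:

```python
# Can reaction directionality be fixed thermodynamically?
# DrG' = DrG'o + R*T * sum_j s_j * ln(c_j); the reaction is forward-only
# if DrG' < 0 over the whole allowed concentration range. All numbers
# below are hypothetical, not Toolbox estimates.
import math

R = 8.314e-3   # gas constant, kJ/(mol*K)
T = 310.15     # physiological temperature, K

def drg_range(drg0, stoich, conc_bounds):
    """Return (min, max) of DrG' given concentration bounds in M."""
    lo = hi = drg0
    for s, (cmin, cmax) in zip(stoich, conc_bounds):
        terms = (s * R * T * math.log(cmin), s * R * T * math.log(cmax))
        lo += min(terms)
        hi += max(terms)
    return lo, hi

# Hypothetical reaction A -> B with DrG'o = -5 kJ/mol; both metabolite
# concentrations allowed to vary between 1 uM and 10 mM.
lo, hi = drg_range(-5.0, [-1.0, 1.0], [(1e-6, 1e-2), (1e-6, 1e-2)])
forward_only = hi < 0.0
print(lo, hi, forward_only)  # the sign of DrG' is not fixed here
```

Because ΔrG′ can take either sign over these concentration ranges, this hypothetical reaction would remain reversible in the model; only reactions whose ΔrG′ range excludes zero are constrained to one direction.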
All documentation and code are released as part of the openCOBRA project (https://github.com/opencobra/cobratoolbox). Where reading the extensive documentation associated with the COBRA
Toolbox does not suffice, we describe the procedure for effectively engaging with the community via a
dedicated online forum (https://groups.google.com/forum/#!forum/cobra-toolbox). Taken together,
the COBRA Toolbox v.3.0 provides an unparalleled depth of interoperable COBRA methods and a
proof of concept that knowledge integration and collaboration by large numbers of scientists can lead
to cooperative advances impossible to achieve by a single scientist or research group alone7.
Software | Language | Interface | Source | Distribution | OS
COBRA Toolbox | MATLAB (and others) | Script/narrative | Open sourcea | git | All
RAVEN150 | MATLAB | Script | Open sourcea | git | All
CellNetAnalyzer55 | MATLAB (and others) | Script/GUI | Closed sourcea | zip | All
FBA-SimVis151 | Java + MATLAB | GUI | Closed sourceb | zip | Windows
OptFlux152 | Java | Script | Open sourcea | svn | All
COBRA.jl42 | Julia | Script/narrative | Open sourcea | git | All
Sybil53 | R package | Script | Open sourcea | zip | All
COBRApy40 | Python | Script/narrative | Open sourcea | git | All
CBMPy52 | Python | Script | Open sourcea | zip | All
ScrumPy153 | Python | Script | Open sourcea | tar | All
SurreyFBA47 | C++ | Script/GUI | Open sourcea | zip | All
FASIMU154 | C | Script | Open sourceb | zip | Linux
FAME155 | Web-based | GUI | Open sourceb | zip | All
PathwayTools43 | Web-based | GUI/script | Closed sourcea | zip | All
KBase41 | Web-based | Script/narrative | Open sourcea | git | All
GUI, graphical user interface; NA, not applicable. The COBRA Toolbox: https://opencobra.github.io/cobratoolbox; RAVEN: https://github.com/
SysBioChalmers/RAVEN; CellNetAnalyzer: https://www2.mpi-magdeburg.mpg.de/projects/cna/cna.html; FBA-SimVis: https://immersive-analytics.
infotech.monash.edu/fbasimvis; OptFlux: http://www.optflux.org; COBRA.jl: https://opencobra.github.io/COBRA.jl; Sybil: https://rdrr.io/cran/sybil;
COBRApy: http://opencobra.github.io/cobrapy; CBMPy: http://cbmpy.sourceforge.net; SurreyFBA: http://sysbio.sbs.surrey.ac.uk/sfba; FASIMU:
http://www.bioinformatics.org/fasimu; FAME: http://f-a-m-e.org; Pathway Tools: http://bioinformatics.ai.sri.com/ptools; KBase: https://kbase.us.
Software may be distributed by version-controlled repositories (git, svn) or as compressed files (zip, tar). The ‘All’ label in the OS column means that
the application is compatible with Windows, Linux and Mac operating systems. aActive project. bInactive project.
constraint-based modeling method. By adapting the input data and interpreting the output results in
a different way, the same method can be used to address a different research question.
Biotechnological applications of constraint-based modeling include the development of sustainable
approaches for chemical9 and biopharmaceutical production10,11. Among these applications is the
computational design of new microbial strains for production of bioenergy feedstocks from non-food
plants, such as microbes capable of deconstructing biomass into their sugar subunits and synthesizing
biofuels, either from cellulosic biomass or through direct photosynthetic capture of carbon dioxide.
Another prominent biotechnological application is the analysis of interactions between organisms
that form biological communities and their surrounding environments, with a view toward utilization
of such communities for bioremediation12 or nutritional support of non-food plants for bioenergy
feedstocks. Biomedical applications of constraint-based modeling include the prediction of the
phenotypic consequences of single-nucleotide polymorphisms13, drug targets14, and enzyme defi-
ciencies15–18, as well as side and off-target effects of drugs19–21. COBRA has also been applied to
generate and analyze normal and diseased models of human metabolism17,22–25, including organ-
specific models26–28, multi-organ models29,30, and personalized models31–33. Constraint-based mod-
eling has also been applied to understanding of the biochemical pathways that interlink diet, gut
microbial composition, and human health34–38.
Each software tool for constraint-based modeling has varying degrees of dependency on other
software. Web-based applications exist for the implementation of a limited number of standard
constraint-based modeling methods. Their only local dependency is on a web browser. The COBRA
Toolbox depends on MATLAB (MathWorks), a commercially distributed, general-purpose computa-
tional tool. MATLAB is a multi-paradigm programming language and numerical computing envir-
onment that allows matrix manipulations, plotting of functions and data, implementation of algorithms,
creation of user interfaces, and interfacing with programs written in other languages, including C, C++,
C#, Java, Fortran, and Python. All software tools for constraint-based modeling also depend on at least
one numerical optimization solver. The most robust and efficient numerical optimization solvers
for standard problems are distributed commercially, but often offer free licenses for academic use,
e.g., Gurobi Optimizer (http://www.gurobi.com). Stand-alone constraint-based modeling software tools
also exist, and their dependency on a numerical optimization solver is typically satisfied by GLPK
(https://gnu.org/software/glpk), an open-source linear optimization solver.
Some perceive a commercial advantage to depending only on open-source software. However, dependency on open-source software also carries commercial costs, in the form of increased computation times as well as the time required to install, maintain, and upgrade open-source software dependencies. This is an important consideration for any research
group whose primary focus is on biological, biomedical, or biotechnological applications, rather than
on software development. The COBRA Toolbox v.3.0 strikes a balance by depending on closed-source, general-purpose, commercial computational tools, yet all COBRA code is distributed and developed in an open-source environment (https://github.com/opencobra/cobratoolbox).
The availability of comprehensive documentation is an important feature in the usability of any
modeling software. Therefore, a dedicated effort has been made to ensure that all functions in the
COBRA Toolbox v.3.0 are comprehensively and consistently documented. Moreover, we also provide
a new suite of >35 tutorials (https://opencobra.github.io/cobratoolbox/latest/tutorials) to enable beginners, as well as intermediate and advanced users, to practice a wide variety of COBRA methods.
Each tutorial is presented in a variety of formats, including as a MATLAB live script, which is an
interactive document, or narrative (https://mathworks.com/help/matlab/matlab_prog/what-is-a-live-
script.html), that combines MATLAB code with embedded output, formatted text, equations, and
images in a single environment viewable with the MATLAB Live Editor (v.R2016a or later).
MATLAB live scripts are similar in functionality to Mathematica Notebooks (Wolfram) and Jupyter
Notebooks (https://jupyter.org). The latter support interactive data science and scientific computing
for >40 programming languages. To date, only the COBRA Toolbox v.3.0, COBRApy40, KBase41, and
COBRA.jl42 offer access to constraint-based modeling algorithms via narratives.
KBase is a collaborative, open environment for systems biology of plants, microbes, and their
communities41. It also has a suite of analysis tools and data that support the reconstruction, pre-
diction, and design of metabolic models in microbes and plants. These tools are tailored toward the
optimization of microbial biofuel production and the identification of minimal media conditions
under which that fuel is generated, and predict soil amendments that improve the productivity of
plant bioenergy feedstocks. In our view, KBase is currently the tool of choice for the automatic
generation of draft microbial metabolic networks, which can then be imported into the COBRA Toolbox for further semi-automated refinement, as has recently been done for a suite of gut microbial organisms38. However, KBase41 currently offers a modest depth of constraint-based modeling algorithms.
MetaFlux43 is a web-based tool that generates network reconstructions directly from pathway and genome databases, proposes network refinements to turn reconstructions into functional flux balance models, predicts steady-state reaction rates with FBA, and interprets predictions in a graphical network visualization. MetaFlux is tightly integrated within the PathwayTools44 environment, which
provides a broad selection of genome, metabolic, and regulatory informatics tools. As such,
PathwayTools provides breadth in bioinformatics and computational biology, whereas the COBRA
Toolbox v.3.0 provides depth in constraint-based modeling, without providing, for example, any
genome informatics tools. Although an expert can locally install a PathwayTools environment, the
functionality is closed source and only accessible via an application programming interface. This
approach does not permit the level of repurposing possible with open-source software. As recognized in
the computational biology community45, open-source development and distribution is scientifically
important for tractable reproducibility of results, as well as reuse and repurposing of code46.
Lakshmanan et al.39 consider the availability of a graphical user interface to be an important
feature in the usability of modeling software. For example, SurreyFBA47 provides a command-line
Experimental design
The COBRA Toolbox v.3.0 is designed for flexible adaptation into customized pipelines for COBRA
in a wide range of biological, biochemical, or biotechnological scenarios, from single organisms to
communities of organisms. To become proficient in adapting the COBRA Toolbox to generate a
protocol specific to one’s situation, it is wise to first familiarize oneself with the principles of
constraint-based modeling. This can best be achieved by studying the educational material already
available. The textbook Systems Biology: Constraint-based Reconstruction and Analysis1 is an ideal
place to start. It is accompanied by a set of lecture videos that accompany the various chapters
(http://systemsbiology.ucsd.edu/Publications/Books/SB1-2LectureSlides). The textbook Optimization
Methods in Metabolic Networks58 provides the fundamentals of mathematical optimization and its
application in the context of metabolic network analysis. A study of this educational material will
accelerate one’s ability to utilize any software application dedicated to COBRA.
Once one is cognizant of the conceptual basis of COBRA, one can then proceed with this protocol,
which summarizes a subset of the key methods that are available within the COBRA Toolbox. To
adapt this protocol to one’s situation, users can combine the COBRA methods implemented within
the COBRA Toolbox in numerous ways. The adaptation of this protocol to one’s situation may require
the development of new customized MATLAB scripts that combine existing methods in a new way.
Owing to the aforementioned benefits of narratives, the first choice should be to implement these
customized scripts in the form of MATLAB live scripts. To get started, the existing tutorial narratives,
described in Table 1, can be repurposed as templates for new analysis pipelines. Narrative figures and
tables can then be generated from raw data and used within the main text of scientific articles and
converted into supplementary material to enable full reproducibility of computational results. The
narratives specific to individual scientific articles can be shared with peers within a dedicated
repository (https://github.com/opencobra/COBRA.papers).
New tutorials can be shared with the COBRA community (https://github.com/opencobra/COBRA.
tutorials). Depending on one’s level of experience, or the novelty of an analysis, the adaptation of this
protocol to a particular situation may require the adaptation of existing COBRA methods, development of new COBRA methods, or both.
[Fig. 2 graphic: source code in the /src folder and tests in the /test folder are run by continuous-integration software (Jenkins); documentation is generated with Documenter.py, code coverage is reported on codecov.io, and the overall architecture is termed ARTENOLIS.]
Fig. 2 | Continuous integration of newly developed code is performed on a dedicated server running Jenkins. The
main code is located in the src folder and test functions are located in the test folder. A test not only runs a function
(first-degree testing), but also tests the output of that function (second-degree testing). The continuous-integration
setup relies on end-of-year releases of MATLAB only. Soon after the latest stable version of MATLAB is released, full support for it is provided in the COBRA Toolbox. After a successful run of tests on the three latest end-of-year
releases of MATLAB using various solver packages, the documentation based on the headers of the functions
(docstrings) is extracted, generated, and automatically deployed. Immediate feedback through code coverage
reports (https://codecov.io/gh/opencobra/cobratoolbox) and build statuses is reported on GitHub. With this setup,
the impact of local changes in the code base is promptly revealed. This newly developed software architecture is
termed ARTENOLIS (Automated Reproducibility and Testing Environment for Licensed Software).
second, reviewed manually by at least one domain expert, before integration with the development
branch. Third, each new contribution to the development branch is evaluated in practice by active
COBRA researchers before it becomes part of the master branch.
Until recently, the code-quality checks of the COBRA Toolbox were primarily static: the code was
reviewed by experienced users and developers, and occasional code inspections led to discoveries of
bugs. The continuous-integration setup defined in Fig. 2 aims at dynamic testing with automated builds,
code evaluation, and documentation deployment. Often a function runs properly on its own and yields the desired output(s), but logical errors arise when it is called from a different part of the code. The unique advantage of continuous integration is that such logical errors are mostly avoided.
In addition to automatic testing, manual usability testing is performed regularly by users and is key
to providing a tested and usable code base to the end user. These users provide feedback on the usability
of the code base, as well as the documentation, and report any issues online (https://github.com/opencobra/cobratoolbox/issues). The documentation is automatically deployed to the COBRA Toolbox
Controls
COBRA is part of an iterative systems biology cycle1. As such, it can be used as a framework for
integrative analysis of experimental data in the context of prior information on the biochemical
network underlying one or many complementary experimental datasets. Moreover, it can be used to
predict the outcome of new experiments, or it can be used in both of these scenarios at once.
Assuming all of the computational steps are errorless, the appropriate control for any prediction
derived from a computational model is the comparison with independent experimental data, that is,
experimental data that were not used for the model-generated predictions. It is also important to
introduce quality controls to check that the computational steps are free from certain errors that may
arise during adaptation of existing COBRA protocols or development of new ones.
There are various strategies for the implementation of computational quality controls. Within the
COBRA Toolbox v.3.0, substantial effort has been devoted to automatically testing the functionality of
existing COBRA methods. We have also embedded a large number of sanity checks, which evaluate
whether the input data could possibly be appropriate for use with a function. These sanity checks have
been accumulated over more than a decade of continuous development of the COBRA Toolbox. Their
objective is to rule out certain known classes of obviously false predictions that might result from an inappropriate use of a COBRA method, but they do not (and are not intended to) catch every such error, as it is impossible to anticipate all of the erroneous inputs that may be presented to a
COBRA Toolbox function. It is advisable for users to create their own narratives with additional sanity
checks, which will depend heavily on the modeling scenario. Examples of such narratives can be found
within the COBRA Toolbox website (https://opencobra.github.io/cobratoolbox/stable/tutorials).
Required expertise
Most of this protocol can be implemented by anyone with a basic familiarity with the principles of
constraint-based modeling. Some methods are only for advanced users. If one is a beginner with
respect to MATLAB, Supplementary Manual 1 provides pointers to get started. MATLAB is a rela-
tively simple programming language to learn, but it is also a powerful language for an expert because of
the large number of software libraries for numerical and symbolic computing that it provides access to.
Certain specialized methods within this protocol, such as that for thermodynamically constraining
reaction directionality, depend on the installation of other programming languages and software,
which may be too challenging for a beginner with a non-standard operating system.
If the documentation and tutorials provided within the COBRA Toolbox are not sufficient, then
Step 103 guides the user toward sources of COBRA community support. The computational demands
associated with the implementation of this protocol for one’s reconstruction or model of choice are
dependent on the size of the network concerned. For a genome-scale model of metabolism, a desktop
computer is usually sufficient. However, for certain models, such as a microbial community of
genome-scale metabolic networks, a multi-scale model of metabolism and macromolecular synthesis,
or a multi-tissue model, more powerful processors and extensive memory capacity are required,
ranging from a workstation to a dedicated computational cluster. Embarrassingly parallel, high-
performance computing is feasible for most model analysis methods implemented in the COBRA
Toolbox, as they run in isolation when invoked from a distributed computing engine. It is
currently an ongoing topic of research, beyond the scope of this protocol, to fully exploit high-
performance computing environments with software developed within the wider open COBRA
environment, although some examples42 are already available for interested researchers to consult.
Limitations
A protocol for the generation of a high-quality, genome-scale reconstruction, using various software
applications, including the COBRA Toolbox, has previously been disseminated6; therefore, this pro-
tocol focuses more on modeling than reconstruction. The COBRA Toolbox is not meant to be a
general-purpose computational biology tool, as it is focused on COBRA. For example, although
various forms of generic data analysis methods are available within MATLAB, the input data for
integration with reconstructions and models within the COBRA Toolbox are envisaged to have already
been preprocessed by other tools. Within its scope, the COBRA Toolbox aims for complete coverage of
COBRA methods. The first comprehensive overview of the COBRA methods available for microbial
metabolic networks5 requires an update to encompass many additional methods that have been
reported to date, in addition to the COBRA methods targeted toward other biochemical networks. The
COBRA Toolbox v.3.0 provides the most extensive coverage of published COBRA methods. However,
there are certainly some methods that have yet to be incorporated directly as MATLAB imple-
mentations, or indirectly via a MATLAB interface to a software dependency. Although, in principle,
any COBRA method could be implemented entirely within MATLAB, it may be more efficient to
leverage the core strength of another programming language that could provide intermediate results
that can be incorporated into the COBRA Toolbox via various forms of MATLAB interfaces. Such a
setup would enable one to overcome any current limitation in coverage of existing methods.
Materials
Equipment
Input data
The COBRA Toolbox offers support for several commonly used data formats for describing models,
including models in Systems Biology Markup Language (SBML) and Excel sheets (.xls). The COBRA
Toolbox fully supports the standard format documented in the SBML Level 3 v.1 with the Flux
Balance Constraints (fbc) package v.2 specifications (http://www.sbml.org/specifications/sbml-level-3
/version-1/fbc/sbml-fbc-version-2-release-1.pdf).
Required hardware
● A computer with any 64-bit Intel or AMD processor and at least 8 GB of RAM. CRITICAL Depending
on the size of the reconstruction or model, more processing power and more memory may be
needed, especially if it is also desired to store the results of analysis procedures within the MATLAB
workspace.
● A hard drive with free storage of at least 10 GB
● A working and stable Internet connection is required during installation and while contributing to the
COBRA Toolbox.
Required software
● A Linux, macOS, or Windows operating system that is MATLAB qualified (https://mathworks.com/
support/sysreq.html). CRITICAL Make sure that the operating system is compatible with the
version of MATLAB used; the COBRA Toolbox is not compatible with MATLAB
versions older than R2014b. MATLAB is released on a twice-yearly schedule. After the latest release
(version b), it may be a couple of months before certain methods with dependencies on other software
become compatible. For example, the latest releases of MATLAB may not be compatible with the
existing solver interfaces, necessitating an update of the MATLAB interface provided by the solver
developers, an update of the COBRA Toolbox, or both.
● The COBRA Toolbox (https://github.com/opencobra/cobratoolbox), v.3.0 or above. Install the COBRA
Toolbox as described in the Procedure; installation requires
connectivity between the COBRA Toolbox and the remote GitHub server. The version control software
git v.1.8 or above is required to be installed and accessible through system commands. On Linux and
macOS, a bash terminal with git and curl is readily available. Supplementary Manual 2 provides a brief
guide to the basics of using a terminal. CRITICAL On Windows, the shell integration included with the Git
Bash (https://git-for-windows.github.io) utilities must be installed, so that command-line tools such as git
and curl are accessible through system commands.
Optional software
● The ability to read and write models in SBML format requires the MATLAB interface from the
libSBML application programming interface, v.5.15.0 or above. The COBRA Toolbox v.3.0 supports
the latest SBML Level 3 Flux Balance Constraints version 2 package (http://sbml.org/Documents/
Specifications/SBML_Level_3/Packages/fbc). The libSBML package, v.5.15.0 or above, is already
packaged with the COBRA Toolbox via the COBRA.binary submodule for all common operating
systems. Alternatively, binaries can be downloaded separately and installed by following the procedure
at http://sbml.org/Software/libSBML. The COBRA Toolbox developers work closely with the SBML
team to ensure that the COBRA Toolbox supports the latest standards, and moreover that standard
development is also focused on meeting the evolving requirements of the constraint-based modeling
community. After the latest release of MATLAB, there may be a short time lag before input and output
become fully compatible. For example, the input and output of .xml files in the SBML standard formats
rely on platform-dependent binaries that we maintain (https://github.com/opencobra/COBRA.binary)
for each major platform, but the responsibility for maintenance of the source code64 lies with
the SBML team (http://sbml.org), which has a specific forum for raising interoperability issues
(https://groups.google.com/forum/#!forum/sbml-interoperability).
● The MATLAB Image Processing Toolbox, the Parallel Computing Toolbox, the Statistics and Machine
Learning Toolbox, the Optimization Toolbox, and the Bioinformatics Toolbox (https://mathworks.
com/products) must be licensed and installed to ensure certain model analysis functionality, such
as topology-based algorithms, flux variability analysis, or sampling algorithms. The individual
MATLAB toolboxes can be installed during the MATLAB installation process. If MATLAB is already
installed, the toolboxes can be managed using the built-in MATLAB add-on manager as described at
https://mathworks.com/help/matlab/matlab_env/manage-your-add-ons.html.
● The ChemAxon Calculator Plugins (ChemAxon, https://chemaxon.com/products/calculator-plugins).
● Java, v.8 or above, a programming language that enables platform-independent applications. Java can
be installed by following the procedures given at https://java.com/en/download/help/index_installing.xml.
● Python (https://python.org/downloads), v.2.7, is a high-level programming language for general-purpose
programming and is required to run NumPy or to generate the documentation locally (relevant when
contributing). Python v.2.7 is already installed on Linux and macOS. On Windows, the instructions at
https://wiki.python.org/moin/BeginnersGuide/Download will guide you to install Python.
● NumPy (http://numpy.org), version 1.11.1 or above, is a fundamental package for scientific computing
with Python. NumPy can be installed by following the procedures accessible via https://scipy.org.
● OpenBabel (https://openbabel.org), v.2.3 or above, is a chemical toolbox designed to speak the
many languages of chemical data. OpenBabel can be installed by following the installation instructions
at http://openbabel.org/wiki/Category:Installation.
● Reaction Decoder Tool (RDT; https://github.com/asad/ReactionDecoder/releases), v.1.5.0 or above, is a
Java-based, open-source atom-mapping software tool. The latest version of the RDT can be installed by
following the procedures at https://github.com/asad/ReactionDecoder#installation.
Solvers
●Table 4 provides an overview of supported optimization solvers. At least one linear programming (LP)
solver is required for basic constraint-based modeling methods. Therefore, by default, the COBRA
Toolbox installs certain open-source solvers, including the LP and mixed-integer LP (MILP) solver
GLPK (https://gnu.org/software/glpk). However, for more efficient and robust linear optimization, we
recommend that an industrial numerical optimization solver be installed. On Windows, the OPTI
solver suite (https://inverseproblem.co.nz/OPTI) must be installed separately in order to use the OPTI
interface.
Table 4 | An overview of the types of optimization problems solved by each optimization solver
CRITICAL Depending on the type of optimization problem underlying a COBRA method, the
installation of a particular solver may require the setting of environment variables. Detailed
instructions and links to the official installation guidelines for installing Gurobi, Mosek, Tomlab,
and IBM Cplex can be found at https://opencobra.github.io/cobratoolbox/docs/solvers.html.
CRITICAL Make sure that environment variables are
properly set in order for the solvers to be correctly recognized by the COBRA Toolbox.
Application-specific software
CRITICAL Certain solvers have additional software requirements, and some binaries provided in the
COBRA Toolbox require the C-shell csh (NetBSD/bin/csh). On Linux or macOS, the C-shell csh can be
installed by following the instructions at https://en.wikibooks.org/wiki/C_Shell_Scripting/Setup.
● The GNU C compiler gcc v.7.0 or above (https://gcc.gnu.org). The library of the gcc compiler is
required for generating new binaries of fastFVA with a different version of the CPLEX solver from that
officially supplied. The gcc compiler can be installed by following the link given at https://opencobra.
github.io/cobratoolbox/docs/compilers.html.
● The GNU Fortran compiler gfortran v.4.1 or above (https://gcc.gnu.org/fortran). The library of
the gfortran compiler is required for running dqqMinos. Most Linux distributions come with this
compiler preinstalled. Alternatively, the gfortran compiler can be installed by following the link given
at https://opencobra.github.io/cobratoolbox/docs/compilers.html.
Contributing software
● MATLAB.devTools (https://github.com/opencobra/MATLAB.devTools) is highly recommended for
contributing code to the COBRA Toolbox in a user-friendly and convenient way, even for those without
basic knowledge of git. The MATLAB.devTools can be installed by following the instructions given at
https://github.com/opencobra/MATLAB.devTools#installation. Alternatively, if the COBRA Toolbox is
already installed, then the MATLAB.devTools can be installed directly from within MATLAB by typing:
>> installDevTools()
To update an existing installation of the COBRA Toolbox from within MATLAB, run:
>> updateCobraToolbox
Alternatively, update from the terminal (or shell) from within the cobratoolbox directory.
In the case that the update of the COBRA Toolbox fails or cannot be completed, clone the
repository again. CRITICAL The COBRA Toolbox can be updated as described above only if no
c
changes have been made to the code in one’s local clone of the official online repository. If one intends
to edit the code in one’s local clone of the COBRA Toolbox, then the official repository should be cloned
as explained in Steps 97–102. These steps also explain how to contribute any local edits to the COBRA
Toolbox code into the official online repository.
Procedure
Initialization of the COBRA Toolbox ● Timing 5–30 s
1 At the start of each MATLAB session, the COBRA Toolbox must be initialized. The initialization
can be done either automatically (option A) or manually (option B). For a regular user
who primarily uses the official openCOBRA repository, automatic initialization of the COBRA
Toolbox is recommended. It is highly recommended to manually initialize when contributing
(Steps 97–102), especially when the official version and a clone of the fork are present locally.
(A) Automatically initializing the COBRA Toolbox
(i) Edit the MATLAB startup.m file and add a line with initCobraToolbox so that the
COBRA Toolbox is initialized each time that MATLAB is started.
During initialization, the status of each software dependency is checked
and reported to the command window. It is not necessary that all possible dependencies be
satisfied before beginning to use the toolbox; e.g., satisfaction of a dependency on a multi-
scale linear optimization solver is not necessary for modeling with a mono-scale metabolic
model. However, it is essential to satisfy other software dependencies; e.g., dependency on a
linear optimization solver must be satisfied for any method that uses FBA.
? TROUBLESHOOTING
(B) Manually initializing the COBRA Toolbox
(i) Navigate to the directory where you installed the COBRA Toolbox and initialize by
running:
>> initCobraToolbox;
During initialization, the status of each software dependency is checked
and reported to the command window. It is not necessary that all possible dependencies be
satisfied before beginning to use the toolbox; e.g., satisfaction of a dependency on a multi-
scale linear optimization solver is not necessary for modeling with a mono-scale metabolic
model. However, it is essential to satisfy other software dependencies; e.g., dependency on a
linear optimization solver must be satisfied for any method that uses FBA.
? TROUBLESHOOTING
2 At initialization, one from a set of available optimization solvers will be selected as the default
solver. If Gurobi is installed, it is used as the default solver for LP problems, quadratic problems
(QPs), and MILP problems. Otherwise, the GLPK solver is selected for LPs and MILP problems. It
is important to check whether the solvers installed are satisfactory. A table stating the solver
compatibility and availability is printed for the user during initialization. Check the currently
selected solvers with the following command:
>> changeCobraSolver;
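To select a specific solver for a particular problem class, changeCobraSolver also accepts a solver name and a problem type. A minimal sketch, assuming Gurobi is installed (otherwise the bundled open-source GLPK solver can be used):

```matlab
% Select Gurobi for linear programming, if installed.
solverOK = changeCobraSolver('gurobi', 'LP');

% Fall back to the bundled open-source GLPK solver otherwise.
if ~solverOK
    changeCobraSolver('glpk', 'LP');
end
```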
CRITICAL STEP A dependency on at least one linear optimization solver must be satisfied
c
for FBA.
Check that the installation works properly by running the test suite:
>> testAll
? TROUBLESHOOTING
When fileName is left blank, a file-selection dialog window is opened. If no file extension is
provided, the code automatically determines the appropriate format from the given file name.
The readCbModel function also supports reading of normal MATLAB files for convenience, and
checks whether those files contain valid COBRA models. Legacy model structures saved in a .mat
file are loaded and converted. The fields are also checked for consistency with the current
definitions.
CRITICAL STEP We advise that readCbModel() be used to load new models. This is also valid
c
for models provided in .mat files, as readCbModel checks the model for consistency with the
COBRA Toolbox v.3.0 field definitions and automatically performs necessary conversions for
models with legacy field definitions or field names. To develop future-proof code, it is good practice
to use readCbModel() instead of the built-in function load.
? TROUBLESHOOTING
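For illustration, a minimal usage sketch of readCbModel (the file names here are hypothetical; the parser is inferred from the extension):

```matlab
% Load a model from an SBML file; the .xml extension selects the SBML parser.
model = readCbModel('ecoli_core_model.xml');

% Legacy .mat models are converted to the current field definitions on load.
model = readCbModel('ecoli_core_model.mat');
```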
The extension of the fileName provided is used to identify the type of output requested. The
model will consequently be converted and saved in the respective format. When exporting a
reconstruction or model, it is necessary that the model adhere to the model structure in Table 3,
and that fields contain valid data. For example, all cells of the rxnNames field should contain only
data of type char and not data of type double.
? TROUBLESHOOTING
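A hedged sketch of exporting a model with writeCbModel (the file name is hypothetical, and the exact calling convention may differ between Toolbox versions):

```matlab
% Export the model as SBML; the .xml extension selects the SBML writer.
writeCbModel(model, 'myModel.xml');

% Export the model as an Excel sheet instead.
writeCbModel(model, 'myModel.xls');
```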
CRITICAL It is recommended to use rBioNet (a reconstruction tool) for the addition or removal of
reactions and of gene–reaction associations.
CRITICAL A stoichiometric representation of a reconstructed biochemical network is contained within the
c
model.S matrix. This is a stoichiometric matrix with m rows and n columns. The entry model.S(i,j)
corresponds to the stoichiometric coefficient of the ith molecular species in the jth reaction. The coefficient is
negative when the molecular species is consumed in the reaction and positive if it is produced in the reaction.
If model.S(i,j)== 0, then the molecular species does not participate in the reaction. To manipulate an
existing reconstruction in the COBRA Toolbox, one can use rBioNet, use a spreadsheet, or generate scripts
with reconstruction functions. Each approach has its advantages and disadvantages. When adding a new
reaction or gene–protein–reaction association, rBioNet ensures that reconstruction standards are satisfied, but
it may make the changes less tractable when many reactions are added. A spreadsheet-based approach is
tractable, but allows only for the addition, and not the removal, of reactions. By contrast, using reconstruction
functions provides an exact specification for all the refinements made to a reconstruction. One can also
combine these approaches by first formulating the reactions and gene–protein–reaction associations with
rBioNet and then adding sets of reactions using reconstruction functions.
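The script-based approach with reconstruction functions can be sketched as follows (reaction and metabolite abbreviations are hypothetical):

```matlab
% Add a new reaction by formula; metabolites not yet present in the
% model are created automatically by addReaction.
model = addReaction(model, 'newRxn', 'reactionFormula', 'metA[c] + metB[c] <=> metC[c]');
```

Because each such call is recorded in a script, this approach provides an exact, repeatable specification of every refinement made to the reconstruction.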
CRITICAL If you do not have existing rBioNet metabolite, reaction, and compartment databases, the first
c
step is to create these files. Refer to the rBioNet tutorial provided in the COBRA Toolbox for instructions on
how to add new metabolites and reactions to an rBioNet database. Make sure that all the relevant
metabolites and reactions that you wish to add to your reconstruction are present in your rBioNet databases.
6 There are two options for using rBioNet functionality to add reactions to a reconstruction: using the
rBioNet graphical interface (option A) or not using the interface (option B). If you wish to add the
reactions only to the rBioNet database, hence benefiting from the included quality control and
assurance measures, but then afterward decide to use the COBRA Toolbox commands to add
reactions to the reconstruction, use option B.
(A) Adding reactions from an rBioNet database to a reconstruction using the rBioNet graphical
user interface
(i) Verify your rBioNet settings. First, make sure the paths to your rBioNet reaction,
metabolite, and compartment databases are set correctly.
>> rBioNetSettings;
(ii) Load the .mat files that hold your reaction, metabolite, and compartment databases.
(iii) To add reactions from an rBioNet database to a reconstruction, invoke the rBioNet
graphical user interface with the following command:
>> ReconstructionTool;
(B) Adding reactions from the rBioNet database without using the rBioNet interface
(i) Load (or create) a list of reaction abbreviations ReactionList to be added from the
rBioNet reaction database:
>> load('Reactions.mat');
>> load('rxnDB.mat');
8 Merge the existing reconstruction model with the new model structure modelNewR to obtain
a reconstruction with expanded content, modelNew:
>> modelNew = mergeTwoModels(model, modelNewR, 1);
then maximize ('max') and minimize ('min') the flux through this reaction.
If the reaction should have a negative flux value (e.g., a reversible metabolic reaction or
an uptake exchange reaction), then the minimization should result in a negative objective value
FBA.f < 0. If both maximization and minimization return an optimal flux value of zero
(i.e., FBA.f == 0), then this newly added reaction cannot carry a non-zero flux value under the
given simulation condition and the cause for this must be identified. If the reaction(s) can carry
non-zero fluxes, make sure to carry out Steps 17 and 47 to ensure stoichiometric consistency as
a check for chemical and biochemical fidelity.
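The check described above can be scripted as follows (the reaction abbreviation is hypothetical):

```matlab
% Set the objective to the newly added reaction.
model = changeObjective(model, 'newRxn');

% Maximize and minimize the flux through it.
FBAmax = optimizeCbModel(model, 'max');
FBAmin = optimizeCbModel(model, 'min');

% If both optima are zero, the reaction cannot carry a non-zero flux
% under the given simulation condition.
if FBAmax.f == 0 && FBAmin.f == 0
    fprintf('Reaction cannot carry a non-zero flux.\n');
end
```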
11 Remove reactions. To remove reactions from a reconstruction, use:
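A sketch using the removeRxns function (reaction abbreviations hypothetical):

```matlab
% Remove a list of reactions by abbreviation; metabolites left unused
% can be pruned in a separate step.
model = removeRxns(model, {'rxnAbbr1', 'rxnAbbr2'});
```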
Note that the removal of one or more metabolites makes sense only if they do not appear in
any reactions or if one wishes to remove all reactions associated with one or more metabolites.
For example, if a network contains reactions A + B ↔ C and A ↔ C, removing metabolite C
will remove both reactions, as each involves C.
13 Remove trivial stoichiometry. If metabolites with zero rows, or reactions with zero columns are
present in a stoichiometric matrix, they can be removed with:
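A sketch, assuming the function name removeTrivialStoichiometry from the COBRA Toolbox v.3.0 code base:

```matlab
% Remove metabolites corresponding to all-zero rows and reactions
% corresponding to all-zero columns of model.S (function name assumed).
model = removeTrivialStoichiometry(model);
```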
After removing one or more reactions (or metabolites) from the reconstruction, please repeat
Steps 9–13 in order to check that these modifications did not alter existing metabolic functions of
the reconstruction-derived models.
A positive solverStatus also indicates that the COBRA Toolbox will use Gurobi as
the default linear optimization solver.
CRITICAL STEP A dependency on at least one linear optimization solver must be
c
satisfied for FBA. If any numerical issues arise while using a double-precision solver, then a
higher-precision solver should be tested. For instance, a double-precision solver may
incorrectly report that a poorly scaled optimization problem is infeasible although it
actually might be feasible for a higher-precision solver. The checkScaling function can
be used on all operating systems, but the dqqMinos or quadMinos interfaces are
available only on UNIX operating systems.
? TROUBLESHOOTING
(B) Quadruple-precision solver
(i) If the recommendation shows that a higher-precision solver is required, then, for example,
select the quadruple-precision optimization solver dqqMinos for solving linear
optimization problems with the following command:
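A sketch, assuming the dqqMinos interface is installed (available only on UNIX operating systems):

```matlab
% Select the quadruple-precision solver for linear optimization problems.
solverOK = changeCobraSolver('dqqMinos', 'LP');
```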
CRITICAL STEP A dependency on at least one linear optimization solver must be
satisfied for FBA. If any numerical issues arise while using a double-precision solver, then a
higher-precision solver should be tested. For instance, a double-precision solver may
incorrectly report that a poorly scaled optimization problem is infeasible although it
actually might be feasible for a higher-precision solver. The checkScaling function can
be used on all operating systems, but the dqqMinos or quadMinos interfaces are
available only on UNIX operating systems.
? TROUBLESHOOTING
inconsistent with the remainder of a reconstruction should be omitted from a model that is
intended to be subjected to FBA; otherwise erroneous predictions may result due to
inadvertent violation of the steady-state mass conservation constraint.
? TROUBLESHOOTING
(C) Identifying the largest set of reactions that are stoichiometrically consistent
(i) Given stoichiometry alone, a non-convex optimization problem can be used to approxi-
mately identify the largest set of reactions in a reconstruction that are stoichiometrically
consistent.
inconsistent with the remainder of a reconstruction should be omitted from a model that is
intended to be subjected to FBA; otherwise erroneous predictions may result due to
inadvertent violation of the steady-state mass conservation constraint.
? TROUBLESHOOTING
particular context, the more likely network states that are specific to that context are to be predicted,
as opposed to those predicted from a generic model. All else being equal, a model derived from a
comprehensive, yet generic, reconstruction will be less constrained than a model derived from a less
comprehensive, yet generic, reconstruction. That is, in general, the more comprehensive a
reconstruction is, the greater attention must be paid to setting simulation constraints.
Identification of molecular species that leak, or siphon, across the boundary of the model
● Timing 1 s–17 min
19 Identification of internal and external reactions using findSExRxnInd in Step 16A is the fastest
option, but it may not always be accurate. It is therefore wise to check whether there exist molecular
species that can be produced from nothing (leak) or consumed, giving nothing (siphon) in a
reconstruction, with all external reactions blocked. If modelBoundsFlag == 1, then the leak
testing uses the model bounds on internal reactions, and if modelBoundsFlag == 0, then all
internal reactions are assumed to be reversible. To do this, type the following commands:
>> modelBoundsFlag = 1;
>> [leakMetBool, leakRxnBool, siphonMetBool, siphonRxnBool] = ...
findMassLeaksAndSiphons(model, model.SIntMetBool, model.SIntRxnBool, modelBoundsFlag);
corresponding molecular species can be produced from nothing or consumed, giving nothing, and
may invalidate any FBA prediction.
max_{v∈R^n}  ρ(v) := c^T v
s.t.  Sv = 0,                               (1)
      l ≤ v ≤ u,
where c ∈ R^n is a parameter vector that linearly combines one or more reaction fluxes to form the
objective function, denoted as ρ(v). In the COBRA Toolbox, model.c contains the objective coefficients.
S ∈ R^{m×n} is the stoichiometric matrix stored in model.S, and the lower and upper bounds on reaction
rates, l, u ∈ R^n, are stored in model.lb and model.ub, respectively. The equality constraint represents
a steady-state constraint (production = consumption) on internal metabolites and a mass balance
constraint on external metabolites (production + input = consumption + output). The solution to
optimization problem (1) can be obtained using a variety of LP solvers that have been interfaced with the
COBRA Toolbox. Table 4 gives the various options. A typical application of FBA is to predict an optimal
steady-state flux vector that optimizes a microbial biomass production rate71, subject to literature-derived
bounds on certain reaction rates. Deciphering the most appropriate objective function for a particular
context is an important open research question. The objective function in problem (1) can be modified by
changing model.c directly or by using the convenient function as follows:
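The convenient function referred to is presumably changeObjective; a sketch with a hypothetical reaction abbreviation:

```matlab
% Set the objective to maximize flux through the biomass reaction.
model = changeObjective(model, 'biomass_reaction');

% Solve the FBA problem (1).
solution = optimizeCbModel(model, 'max');
```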
CRITICAL STEP Assuming the constraints are feasible, the optimal objective value solution.f and the
optimal flux vector solution.v are unique. Setting minNorm to 10^−6 is equivalent to
maximizing the function ψ(v) := c^T v − (σ/2) v^T v with σ = 10^−6, where θ(v) := (σ/2) v^T v is a
regularization function. With high-dimensional models, it is wise to ensure that the
optimal value of the regularization function is smaller than the optimal value of the original
linear objective in problem (1), that is, ρ(v*) ≫ θ(v*). A pragmatic approach is to specify
minNorm = 1e-6;, and then reduce it if necessary.
? TROUBLESHOOTING
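The regularized problem can be invoked by passing minNorm as the third argument to optimizeCbModel:

```matlab
% Regularized FBA: the small quadratic penalty makes the optimal flux
% vector unique while perturbing the linear optimum only slightly.
solution = optimizeCbModel(model, 'max', 1e-6);
```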
specified reaction bound. To resolve the infeasibility, one can use relaxed FBA, which is an
optimization problem that minimizes the number of bounds to relax in order to render an FBA
problem feasible. The relaxed flux balance analysis problem is

s.t.  Sv + r = b,                           (2)
      l − p ≤ v ≤ u + q,
      p, q, r ≥ 0,

where S ∈ R^{m×n} denotes a stoichiometric matrix, p, q ∈ R^n denote the relaxations of the lower and
upper bounds (l and u) on reaction rates of the flux vector v, and r ∈ R^m denotes a relaxation of the
mass balance constraint. The latter is useful when there are non-zero boundary constraints forcing
secretion or uptake of biochemical species from the environment; that is, b ∈ R^m, b ≠ 0. Non-negative
parameters λ ∈ R^n_{≥0} and α ∈ R^n_{≥0} can be used to trade off between relaxation of mass balance or
bound constraints, e.g., relaxation of bounds on exchange reactions rather than internal reactions or
mass balance constraints. The optimal choice of parameters depends heavily on the biochemical context.
A relaxation of the minimum number of constraints is desirable because, ideally, one should be able to
justify the relaxation of each bound with reference to the literature. The scale of this task is proportional
to the number of bounds proposed to be relaxed, motivating the sparse optimization problem to
minimize the number of relaxed bounds. Implement relaxed FBA with the following command:
The structure relaxOption can be used to prioritize the relaxation of one type of bound over
another. For example, in order to disallow relaxation of bounds on all internal reactions, set the field
.internalRelax to 0, and to allow the relaxation of bounds on all exchange reactions, set the field
.exchangeRelax to 2. If there are certain reaction bounds that should not be relaxed, then this can
be specified using the Boolean vector field .excludedReactions. The first application of
relaxFBA to a model may predict bounds to relax that are not supported by the literature or other
experimental evidence. In this case, the field .excludedReactions can be used to disallow the
relaxation of bounds on certain reactions.
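Based on the fields described above, a relaxOption structure might be assembled as follows; the driver function name, relaxedFBA, and the protected reaction abbreviation are assumptions:

```matlab
% Disallow relaxation of bounds on all internal reactions, allow relaxation
% of bounds on all exchange reactions, and protect one reaction's bounds.
relaxOption.internalRelax = 0;
relaxOption.exchangeRelax = 2;
relaxOption.excludedReactions = false(size(model.rxns));
relaxOption.excludedReactions(ismember(model.rxns, {'biomass_reaction'})) = true;

% Solve the relaxed FBA problem (2); function name assumed.
solution = relaxedFBA(model, relaxOption);
```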
min_{v}  ‖v‖_0
s.t.  Sv = b,
      l ≤ v ≤ u,                            (3)
      c^T v = ρ*,
where the last constraint is optional and represents the requirement to satisfy an optimal objective
value ρ* derived from any solution to problem (1). The optimal flux vector can be considered as a
steady-state biochemical pathway with minimal support, subject to the bounds on reaction rates
and satisfaction of the optimal objective of problem (1). There are many possible applications of
such an approach; here, we consider one example.
Sparse FBA is used to find the smallest active stoichiometrically balanced cycle that can produce
ATP at a maximal rate using the ATP synthase reaction (https://www.vmh.life/#reaction/ATPS4m).
We use the Recon3Dmodel.mat66 (naming subject to change), which does not have such a cycle
active due to bound constraints, but does contain such an active cycle with all internal reactions set
to be irreversible. First the model is loaded, then the internal reactions are identified and blocked,
and finally the objective is set to maximize the ATP synthase reaction rate. Thereafter, the sparse
FBA solution is computed. To do this, use the following command:
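A hedged sketch of the steps described, interpreting "blocked" as closing all external reactions so that ATP production must come from an internal cycle (the model location and the exact sparseFBA arguments are assumptions):

```matlab
% Load the model (file name as given in the text; location assumed).
load('Recon3Dmodel.mat');

% Identify internal and external reactions, then block the external ones.
model = findSExRxnInd(model);
model.lb(~model.SIntRxnBool) = 0;
model.ub(~model.SIntRxnBool) = 0;

% Maximize the ATP synthase reaction rate with a sparse flux vector.
model = changeObjective(model, 'ATPS4m');
[vSparse, sparseRxnBool] = sparseFBA(model, 'max');
```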
Prioritize reaction types in the reference database to use for filling gaps using a weights
parameter structure. The parameters weights.MetabolicRxns, weights.ExchangeRxns,
and weights.TransportRxns allow different priorities to be set for internal
metabolic reactions, exchange reactions, and transport reactions, respectively. Transport reactions
include intracellular and extracellular transport reactions. The lower the weight for a reaction type,
the higher is its priority. Generally, a metabolic reaction should be prioritized in a solution over
transport and exchange reactions with, for example, the following commands:
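For example, a weights structure favoring internal metabolic reactions could be set up as follows (the numerical values are illustrative, not prescribed):

```matlab
% Lower weight = higher priority: prefer internal metabolic reactions,
% then exchange reactions, then transport reactions in gap-filling solutions.
weights.MetabolicRxns = 0.1;
weights.ExchangeRxns  = 0.5;
weights.TransportRxns = 10;
```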
must be manually curated before being added to a reconstruction. This step is critical for obtaining
a high-quality metabolic reconstruction. Adding the least number of reactions to fill gaps may not
be the most appropriate assumption from a biological viewpoint. Consequently, the reactions
proposed to be added to reconstruction require further manual assessment. Proposed gap-filling
solutions must be rejected if they are biologically incorrect.
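The weighting scheme described above can be sketched as follows; the field names mirror the text, while the candidate reactions and weight values are hypothetical.

```python
# Sketch of how per-type weights bias candidate selection during gap filling:
# the gap-filling objective penalizes each added reaction by its type weight,
# so lower-weight (internal metabolic) reactions are preferred.
weights = {"MetabolicRxns": 0.1, "ExchangeRxns": 0.5, "TransportRxns": 10.0}

candidates = [  # hypothetical candidate reactions from a universal database
    {"id": "R_transport", "type": "TransportRxns"},
    {"id": "R_internal",  "type": "MetabolicRxns"},
    {"id": "R_exchange",  "type": "ExchangeRxns"},
]
# Rank candidates by weight: ties between otherwise equivalent gap-filling
# solutions are broken in favour of internal metabolic reactions.
ranked = sorted(candidates, key=lambda r: weights[r["type"]])
```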
CRITICAL The mapping between the metabolite abbreviations in the universal database (e.g.,
KEGG) and the reconstruction metabolite abbreviations in model.mets will ultimately limit how
many blocked reactions might be resolved with fastGapFill. The larger the number of
metabolites that map between these different namespaces, the larger the pool of metabolic reactions
from the universal database that can be proposed to fill gaps. The mapping between the
reconstruction and universal metabolite database can be customized using the dictionaryFile,
which lists the universal database identifiers and their counterparts in the reconstruction.
(A) Gap filling without return of additional metadata
(i) To fill gaps without returning additional metadata, run the following command:
The starting_model is the model before addition of fresh medium constraints. The
current_inf input argument allows one to specify a value for the large-magnitude finite
number that is currently used to represent an effectively infinite reaction rate bound, then
harmonize it to a new value specified by set_inf. When no information on the bounds of a
reaction is known, the ideal way to set reaction bounds is by use of the commands model.lb(j)
= -inf; and model.ub(j)= inf;. However, depending on the optimization solver, an
infinite lower or upper bound may or may not be accepted. Therefore, when no information on the
bounds of a reaction is known, except perhaps the directionality of the reaction, then the upper or
lower bound may be a large-magnitude finite number, e.g., model.ub(j)= 1000;. The fresh
medium composition must be specified with a vector of exchange reaction abbreviations for
metabolites in the cell medium (medium_composition) and the corresponding millimolar
concentration of each medium component (met_Conc_mM). The density of the culture
(cellConc, cells per mL), the time between the beginning and the end of the experiment
(t, hours), and the measured cellular dry weight (cellWeight, gDW) must also be specified.
Basic medium components (mediumCompound), such as protons, water, and bicarbonate, and the
corresponding lower bounds on exchange reactions (mediumCompounds_lb), must also be
specified. Even though they are present, they are not usually listed in the specification of a
commercially defined medium, but they are needed for cells and the generic human metabolic
model in order to support the synthesis of biomass. The modelMedium is a new model with
external reaction bounds set according to the defined fresh medium.
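The unit conversion underlying such medium constraints can be sketched as follows. The formula below, and treating cellWeight as gDW per cell, are assumptions for illustration only; the Toolbox function described above may use a different convention.

```python
# Hypothetical unit-conversion sketch: a medium concentration becomes a
# lower (uptake) bound on the exchange flux, in mmol/gDW/h, under the
# assumption that the whole medium amount may be consumed during t hours.
def uptake_lower_bound(conc_mM, cellConc_per_mL, cellWeight_gDW, t_h):
    """Most-negative exchange flux if the entire medium amount is consumed."""
    mmol_per_mL = conc_mM / 1000.0            # mM = mmol/L -> mmol per mL
    gDW_per_mL = cellConc_per_mL * cellWeight_gDW
    return -mmol_per_mL / (gDW_per_mL * t_h)  # uptake is negative by convention

# 5 mM glucose, 1e6 cells/mL, 1e-9 gDW per cell, 24 h experiment.
lb_glc = uptake_lower_bound(5.0, 1e6, 1e-9, 24.0)
```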
32 Next, prepare the quantitative exometabolomic data using the prepIntegrationQuant function:
The fluxes for each metabolite are given as uptake (negative) and secretion (positive) flux values
in a metabolomic data matrix metData, in which each column represents a sample in
sampleNames and each row in exchanges represents an exchanged metabolite. The units used
for fluxes must be consistent within a model. For the input model in modelMedium, the
prepIntegrationQuant function tests whether the qualitative uptake (test_max, e.g., ±500)
and secretion (test_min, e.g., 10^−5) values of the metabolites are possible for each sample defined
in the metabolomic data matrix metData. If a metabolite cannot be secreted or taken up, it will be
removed from the data matrix for that particular sample. Possible reasons for this could be missing
production or degradation pathways or blocked reactions. For each sample, the uptake and
secretion profile compatible with the input model in modelMedium is saved to the location
specified in outputPath using the unique sample name.
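The qualitative feasibility test described above can be sketched on a toy network: for each measured exchange, check whether the model can carry uptake or secretion flux at all. The three-reaction network and thresholds are hypothetical, and scipy's linprog stands in for the Toolbox's LP solver.

```python
# Toy sketch of the per-metabolite feasibility test: compute the steady-state
# flux range of each exchange reaction and compare it against a threshold.
import numpy as np
from scipy.optimize import linprog

# Columns: EX_a (uptake negative), R: a -> b, EX_b (secretion positive).
S = np.array([[-1, -1,  0],
              [ 0,  1, -1]], dtype=float)
lb = np.array([-5.0, 0.0, 0.0])
ub = np.array([ 0.0, 5.0, 5.0])

def flux_range(S, lb, ub, j):
    """Min and max steady-state flux through reaction j."""
    bounds = list(zip(lb, ub))
    e = np.zeros(S.shape[1]); e[j] = 1.0
    zero = np.zeros(S.shape[0])
    lo = linprog(e, A_eq=S, b_eq=zero, bounds=bounds).x[j]
    hi = linprog(-e, A_eq=S, b_eq=zero, bounds=bounds).x[j]
    return lo, hi

tol = 1e-5                                          # analogous to test_min
can_uptake_a = flux_range(S, lb, ub, 0)[0] < -tol   # uptake of a is feasible
can_secrete_a = flux_range(S, lb, ub, 0)[1] > tol   # False: a is only consumed
```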
33 Use the model constrained by the defined fresh medium composition modelMedium and the
output of the prepIntegrationQuant function to generate a set of functional, contextualized,
condition-specific models using the following command:
A subset of samples can be specified with samples. All fluxes smaller than tol will be treated
as zero. A lower bound (minGrowth, e.g., 0.008 per h) on a specified objective function (e.g., obj
= biomass_reaction2;) needs to be defined, along with metabolites that should not be
secreted (e.g., no_secretion = ‘EX_o2[e]’) or taken up (no_uptake = ‘EX_o2s’). The
function returns a ResultsAllCellLines structure containing the context-specific models, as
well as an overview of model statistics in OverViewResults. For each sample, a condition-
specific model is created, in which the constraints have been set in accordance with the medium
specification and the measured extracellular metabolomic data. This set of condition-specific
models can then be phenotypically analyzed using the various additional functions present in the
COBRA Toolbox as detailed in the MetaboTools protocol65.
Fig. 3 | Unsteady-state flux balance analysis. Conceptual overview of the main steps involved in the unsteady-state
flux balance analysis (uFBA) method. Principal-component analysis of time-course metabolite concentration data
determines the metabolic stages (Step 35); linear regression on each stage yields a rate of change of concentration,
with a 95% confidence interval bounded by b1 and b2 (Step 38); the metabolite rates of change are integrated into
the model as the constraints S × v ≥ b1, S × v ≤ b2, and lb ≤ v ≤ ub; and the feasible cellular flux states are then
determined (Step 39).
The uFBAvariables structure must contain the following fields: .metNames is a list of
measured metabolites, .changeSlopes provides the rate of change of concentration with respect
to time for each measured metabolite, .changeIntervals yields the difference between the
mean rate of change of concentration with respect to time and the lower bound of the 95%
confidence interval. The list ignoreSlopes contains metabolites whose measurements should be
ignored because of an unsubstantial rate of change.
The output is a uFBAoutput structure that contains the following fields: .model, a COBRA
model structure with constraints on the rate of change of metabolite concentrations;
.metsToUse, containing a list of metabolites with metabolomic data integrated into the model;
and .relaxedNodes, containing a list of metabolites that deviate from steady state along with
the direction (i.e., accumulation or depletion) and magnitude (i.e., reaction bound) of deviation.
The uFBA algorithm automatically determines sink or demand reactions needed to return a model
with at least one feasible flux balance solution, by automatically reconciling potentially incomplete
or inaccurate metabolomic data with the model structure. The added sink or demand reactions
allow the corresponding metabolites, defined by .relaxedNodes, to deviate from a steady state
to ensure model feasibility. The default approach is to minimize the number of metabolites that
deviate from steady state.
The buildUFBAmodel function integrates quantitative time course metabolomic data with
a model by setting rates of change with respect to time for a set of measured intracellular
and extracellular metabolites. A set of sink reactions, demand reactions, or both, may have been added
to certain nodes in the network to ensure that the model admits at least one feasible mass balanced flux.
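The core relaxation behind uFBA can be sketched in a few lines (the Toolbox function is MATLAB's buildUFBAmodel): the steady-state constraint S v = 0 is relaxed to b1 ≤ S v ≤ b2, where b1 and b2 come from the measured rates of change. The network and slope bounds below are hypothetical.

```python
# Toy sketch of the uFBA constraint relaxation solved with scipy's linprog.
import numpy as np
from scipy.optimize import linprog

S = np.array([[1, -1,  0],    # measured metabolite m1
              [0,  1, -1]],   # measured metabolite m2 (held at steady state)
             dtype=float)
b1 = np.array([-0.2, 0.0])    # lower bounds on d[conc]/dt (mmol/gDW/h)
b2 = np.array([ 0.1, 0.0])    # upper bounds (95% confidence interval)
c = np.array([0.0, 0.0, 1.0]) # objective: maximize flux through r3

# Encode b1 <= S v <= b2 as A_ub v <= b_ub for linprog.
A_ub = np.vstack([S, -S])
b_ub = np.concatenate([b2, -b1])
res = linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 1.0)] * 3)
v = res.x
```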
39 Use optimizeCbModel to minimize the obtained model:
be used to determine the set of reactions that must be part of a context-specific model, including
transcriptomic, proteomic, and metabolomic data, as well as complementary experimental data
from the literature. Several model extraction methods have been developed, with different
underlying assumptions, and each has been the subject of multiple comparative evaluations86–88.
The selection of a model extraction method and its parameterization, as well as the methods chosen
to preprocess and integrate the aforementioned omics data, substantially influences the size,
functionality, and accuracy of the resulting context-specific model. Currently, there is insufficient
evidence to assert that one model extraction method universally provides the most physiologically
accurate models. Therefore, a pragmatic approach is to test the biochemical fidelity of context-
specific models generated using a variety of model extraction methods. The COBRA Toolbox offers
six different model extraction methods; to access these, use this common interface:
The different methods and associated parameters are selected via the options structure. The
.solver field indicates which method will be used. The other fields of the options structure
vary depending on the method and often depend on bioinformatic preprocessing of input omics
data. There are additional optional parameters for all algorithms, with the default being the values
indicated in the respective papers. Please refer to the original papers reporting each algorithm for
details on the requirements for preprocessing of input data.
● The FASTCORE89 algorithm. One set of core reactions that is guaranteed to be active in the
extracted model is identified by FASTCORE. Then, the algorithm finds the minimum number of
additional reactions needed to support the core; the .core field provides the core reactions, which
have to be able to carry flux in the resulting model.
● The GIMME90 algorithm. With this algorithm, the usage of low-expression reactions is minimized
while keeping the objective (e.g., biomass) above a certain value; the .expressionRxns field
provides the expression level mapped to each reaction, with -1 for unknown reactions or reactions
not linked to genes; the .threshold field sets the threshold above which a reaction is assumed to
be active.
● The iMAT91 algorithm. iMAT finds the optimal trade-off between including high-expression
reactions and removing low-expression reactions; the .expressionRxns field is defined as above;
the .threshold_lb field is the threshold below which reactions are assumed to be inactive; the
.threshold_ub field is the threshold above which reactions are assumed to be active.
● The INIT92 algorithm. The optimal trade-off between including and removing reactions on the basis
of their given weights is determined by this algorithm; the .weights field provides the weights wi
for each reaction in the INIT objective, max Σi∈R wi yi + Σj∈M xj. Commonly, high
expression leads to higher positive values, and low or no detection leads to negative values.
● The MBA93 algorithm. MBA defines high-confidence reactions to ensure activity in the extracted
model. Medium-confidence reactions are kept only when a certain parsimony trade-off is met. The
.medium_set field provides the set of reactions that have a medium incidence, and the
.high_set field provides the set of reactions that have to be in the final model. Any reaction not
in the medium or high set is assumed to be inactive and preferably not present in the final model.
● The mCADRE94 algorithm. A set of core reactions is first found, and all other reactions are then
pruned on the basis of their expression, connectivity to the core, and confidence score. Reactions that
are not necessary to support the core or defined functionalities are thus removed. Core reactions are
removed if they are supported by a certain number of zero-expression reactions. The
.confidenceScores field provides the reliability of each reaction, generally based on the literature,
and the .ubiquityScore field provides the ubiquity score of each reaction in multiple replicates,
i.e., the number of times the reaction was detected as active in experimental data under the
investigated condition.
CRITICAL STEP When integrating omics data, parameter selection is critical, especially selection
of the threshold for binary classification, e.g., the threshold for genes to be placed into active or
inactive sets. Algorithmic performance often strongly depends on parameter choices and on the
choice of data preprocessing method87. createTissueSpecificModel does not offer data
preprocessing tools, because the selection of the discretization method and its parameters depends
on the origin of the data. However, the COBRA Toolbox offers functionality to map preprocessed
expression data to reactions via the function mapExpressionToReactions
(model, expression).
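One common convention for mapping gene expression onto reactions through gene-protein-reaction (GPR) rules takes the minimum over AND (enzyme complexes) and the maximum over OR (isozymes). This is an illustrative assumption; mapExpressionToReactions in the Toolbox may implement the details differently.

```python
# Sketch of the min/max GPR evaluation convention on a hand-built GPR tree.
def gpr_expression(gpr, levels):
    """Evaluate a parsed GPR tree: a gene id, or ('and'|'or', [children])."""
    if isinstance(gpr, str):
        return levels.get(gpr, -1.0)      # -1 marks genes without data
    op, children = gpr
    vals = [gpr_expression(child, levels) for child in children]
    return min(vals) if op == "and" else max(vals)

levels = {"g1": 5.0, "g2": 1.0, "g3": 8.0}   # hypothetical expression values
# (g1 AND g2) OR g3: a two-subunit complex with an isozyme.
rxn_level = gpr_expression(("or", [("and", ["g1", "g2"]), "g3"]), levels)
```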
(PCHOLP_hs, https://www.vmh.life/#reaction/PCHOLP_hs).
● Phospholipase A2 acts on the bond between the fatty acid and the hydroxyl group of PC to form a
fatty acid (e.g., arachidonic acid or docosahexaenoic acid) and lysophosphatidylcholine (PLA2_2,
https://vmh.life/#reaction/PLA2_2).
● Ceramide and PC can also be converted to sphingomyelin by sphingomyelin synthetase (SMS,
https://vmh.life/#reaction/SMS).
Load a COBRA model and define the set of reactions that will represent degradation of the
metabolite in question; for this example type:
CRITICAL STEP Correctly converting the literature data into bound constraints with the same
units used for the model fluxes may be a challenge. Indeed, the curation of biochemical literature
to abstract the information required to quantitatively bound turnover rates can take between 4
and 8 weeks when the target is to retrieve the biomass composition and the turnover rates of each
of the different biomass precursors. Once all the constraints are available, imposing the corre-
sponding reaction bounds takes <5 min.
42 Verify that all the reactions are irreversible (the lower and upper bounds should be ≥0).
46 Check the values of the added fluxes. The sum of fluxes should be greater than or equal to the
value of d:
>> solution.v(rxnInd)
>> sum(c*solution.v(rxnInd))
50 Allow the model to uptake oxygen and water, and then provide 1 mol/gDW/h of a carbon source,
e.g., glucose (Virtual Metabolic Human database (VMH; http://vmh.life) ID: glc_D[e]):
51 Compute an FBA solution with maximum flux through the DM_atp[c] reaction:
If you want to change key parameters, do this using the params structure. Among others, the
main parameters include the amount of time .timeLimit for each iterative solve in seconds and
the number of threads for the MILP solver to use. The output Z ∈ ℝn×(n−rank(S)) is a sparse set of n
− rank(S) linearly independent flux modes, each corresponding to a MinSpan pathway.
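The dimension count quoted above can be verified directly: a stoichiometric matrix has n − rank(S) linearly independent steady-state flux modes. The sketch below uses NumPy's SVD to obtain an explicit (dense, not MinSpan-sparse) nullspace basis Z for a hypothetical toy matrix.

```python
# Sketch: count and construct steady-state flux modes for a toy network.
import numpy as np

S = np.array([[1, -1,  0,  0],
              [0,  1, -1, -1]], dtype=float)  # hypothetical 2 x 4 network
n = S.shape[1]
r = np.linalg.matrix_rank(S)
dim_nullspace = n - r                          # number of flux modes

U, sv, Vt = np.linalg.svd(S)
Z = Vt[r:].T                                   # each column z satisfies S @ z = 0
```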
max / min  vj
s.t.  Sv = 0,        (4)
      l ≤ v ≤ u,
      cTv = cTv*.
Just as there are many possible variations on FBA, there are many possible variations on flux
variability analysis. The COBRA Toolbox offers a straightforward interface to implement standard
flux variability analysis and a wide variety of options to implement variations on FBA.
(A) Standard flux variability analysis
(i) Use the following command to compute standard flux variability analysis:
The result is a pair of n-dimensional column vectors, minFlux and maxFlux, with
the minimum and maximum flux values satisfying problem (4).
(B) Advanced flux variability analysis
(i) Access the full spectrum of flux variability analysis using the command:
The optPercentage parameter allows one to choose whether to consider only solutions that
achieve at least a certain percentage of the optimal objective. For instance, optPercentage = 0
would simply find the flux range of each reaction, without any requirement to satisfy
optimality with respect to FBA. Setting the parameter osenseStr = 'min' or
osenseStr = 'max' determines whether the FBA problem is first solved as a minimization
or maximization.
The rxnNameList accepts a cell array list of reactions upon which to selectively perform
flux variability. This is useful for high-dimensional models, for which the computation of a flux
variability for all reactions is more time consuming. The additional n × k output matrices
Vmin and Vmax return the flux vector for each of the k ≤ n fluxes selected for flux variability.
The verbFlag input determines how much output will be printed. Setting the parameter
allowLoops = 0 invokes an MILP implementation of thermodynamically constrained
flux variability analysis for each minimization or maximization of a reaction rate. The method
input argument determines whether the output flux vectors also minimize the 0-norm,
1-norm, or 2-norm while maximizing or minimizing the flux through one reaction.
The default result is a pair of maximum and minimum flux values for each reaction.
Optional parameters can be set. For instance, parameters can be set to control which subset
of k ≤ n reactions of interest will be obtained, or to determine the characteristics of each of the
2 × k flux vectors.
The output argument optsol returns the optimal solution of the initial FBA.
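The flux variability computation of problem (4) can be sketched on a hypothetical toy network; the Toolbox's fluxVariability (MATLAB) provides the same computation with the many options described above, and scipy's linprog stands in for the LP solver here.

```python
# Minimal FVA sketch: fix the FBA optimum, then scan min/max of each flux.
import numpy as np
from scipy.optimize import linprog

S = np.array([[1, -1, -1,  0],    # m1: r1 in; r2, r3 are parallel paths out
              [0,  1,  1, -1]],   # m2: drained by r4
             dtype=float)
n = S.shape[1]
bnds = [(0.0, 10.0)] * n
c = np.zeros(n); c[3] = 1.0       # FBA objective: flux through r4

fba = linprog(-c, A_eq=S, b_eq=np.zeros(2), bounds=bnds)
rho = c @ fba.x                   # optimal FBA objective value

# Fix cTv = rho (optPercentage = 100) and compute each reaction's range.
A_eq = np.vstack([S, c]); b_eq = np.append(np.zeros(2), rho)
minFlux, maxFlux = np.zeros(n), np.zeros(n)
for j in range(n):
    e = np.zeros(n); e[j] = 1.0
    minFlux[j] = linprog(e, A_eq=A_eq, b_eq=b_eq, bounds=bnds).x[j]
    maxFlux[j] = linprog(-e, A_eq=A_eq, b_eq=b_eq, bounds=bnds).x[j]
```

The two parallel reactions r2 and r3 are individually dispensable, so their ranges span [0, 10] while r1 and r4 are fixed at the optimum.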
(B) Use distributedFBA.jl with Julia
(i) Alternatively, solve the 2 × k linear optimization problems using multiple threads running
on parallel processors or a cluster using distributedFBA.jl, an openCOBRA extension that
permits the solution of an FBA problem, a distributed set of flux balance problems, or a flux
variability analysis using a common solver interface (GLPK, CPLEX, Clp, Gurobi, or Mosek). Assuming
that distributedFBA.jl has been correctly installed and configured, the commands to go
back and forth between a model or results in MATLAB and the computations in Julia are
as follows:
Here, nWorkers = 128 will distribute the flux variability analysis problem among
128 Julia processes on one or more computing nodes in a computational cluster.
The samples output is an n × p matrix of sampled flux vectors, where p is the number
of samples. To accelerate any future rounds of sampling, use the modelSampling
output. This is a model storing extra variables acquired from preprocessing the model for
sampling (Fig. 4).
[Fig. 4 panels: histograms of the number of samples versus flux (mmol/gDW/h) for two models
(Model 1, Model 2), under the ACHR (left) and CHRR (right) sampling algorithms.]
Fig. 4 | Solution spaces from steady-state fluxes are anisotropic, that is, long in some directions and short in others. This impedes the ability of any
sampling algorithm taking a random direction to evenly explore the full feasible set (artificial centering hit-and-run (ACHR) algorithm). The CHRR
(coordinate hit-and-run with rounding) algorithm first rounds the solution space based on the maximum-volume ellipsoid. Then, the rounded solution
space is uniformly sampled using a provably efficient coordinate hit-and-run random walk. Finally, the samples are projected back onto the anisotropic
feasible set. This leads to a more distributed uniform sampling, so that the converged sampling distributions for the selected reactions become
smoother.
The variable sampleFile contains the name of a .mat file used to save the sample
vectors to disk. A string passed to samplerName can be used to sample with non-default
solvers. The options structure contains fields that control the number of sampling
steps taken per sample point saved (.nStepsPerPoint) and the number of sample
points saved (.nPointsReturned). Reasonable parameter values are nPointsReturned =
8 × (n − rank(S))^2 and nStepsPerPoint = 200. The output modelSampling is a model
that can be used in subsequent rounds of sampling. Although rounding of large models is
computationally demanding, the results can be reused when sampling the same model
more than once. The CHRR algorithm provably converges to a uniform stationary
sampling distribution, if enough samples are obtained, and has been tested with mono-
scale metabolic models with up to 10,000 reactions. The default parameters are set using
heuristic rules to estimate a sufficiently large number of samples, which balances this
requirement against the desire to complete the sampling procedure in a practically useful
period of time.
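The coordinate hit-and-run idea at the heart of CHRR can be sketched on a box-shaped toy flux space; the Toolbox's CHRR additionally rounds a general polytope first, which is what makes sampling anisotropic spaces practical (Fig. 4). The bounds, step count, and thinning interval below are hypothetical.

```python
# Minimal coordinate hit-and-run sketch on a 2D box-constrained toy space.
import numpy as np

rng = np.random.default_rng(0)
lb = np.array([0.0, 0.0])
ub = np.array([2.0, 3.0])
x = (lb + ub) / 2.0                      # start at the centre
samples = []
for step in range(2000):
    j = rng.integers(2)                  # pick a random coordinate direction
    x[j] = rng.uniform(lb[j], ub[j])     # resample uniformly along that chord
    if step % 10 == 0:                   # thinning: nStepsPerPoint analogue
        samples.append(x.copy())
samples = np.asarray(samples)
```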
? TROUBLESHOOTING
57 Load an illustrative model that comprises only 90 reactions, describing the central metabolism
in E. coli109.
>> model = readCbModel('AntCore.mat');
58 Set the objective function to maximize the biomass reaction (R75). Change the lower bounds such
that the E. coli model will be able to consume glucose, oxygen, sulfate, ammonium, citrate, and
glycerol.
CRITICAL STEP In this example, we provide the constraints for both wild-type and mutant
strains, but in a typical scenario, the definition of differential constraints on wild-type and mutant
strains requires additional research. This step could take a few days or weeks, depending on the
information available for the species of interest. Flux bounds (i.e., uptake rate and minimum
biomass yield target) are required inputs. New experiments might be required to be performed in
addition to the literature curation task in order to obtain such data. Assumptions may also be made
when describing the phenotypes of both strains, which will reduce the dependency on literature
curation. It is important that the two strains be sufficiently different in order to be able to anticipate
differences in reaction ranges.
60 Perform flux variability analysis for both wild-type and mutant strains with the following
commands:
[Fig. 5 panels: flux-range comparisons for the MUST sets — MUSTU and MUSTL for a single flux v1
(top); MUSTUU and MUSTLL for the sum v1 + v2 (center); MUSTUL and MUSTLU for the difference
v1 − v2 (bottom).]
Fig. 5 | In the OptForce procedure, the MUST sets are determined by contrasting the flux ranges obtained using
flux variability analysis (FVA) of a wild-type (blue bars) and an overproducing strain (red bars). The first order
MUST sets (top panel) are denoted MUSTL and MUSTU. For instance, a reaction belongs to the MUSTU set if the
upper bound of the flux range in the wild-type is less than the lower bound of the flux range of the overproducing
strain. The center and bottom panels show all possible second-order MUST sets.
61 The MUST sets are the sets of reactions that must increase or decrease their flux in order to achieve the
desired phenotype in the mutant strain. As shown in Fig. 5, the first-order MUST sets are MustU and
MustL, and second-order MUST sets are denoted as MustUU, MustLL, and MustUL. After
parameters and constraints are defined, the functions findMustL and findMustU are run to determine
the mustL and mustU sets, respectively. Define an ID for the run with the following command:
Each time the MUST sets are determined, folders are generated to read inputs and store outputs,
i.e., reports. These folders are located in the directory defined by the uniquely defined runID.
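The first-order classification rule illustrated in Fig. 5 can be sketched directly from flux variability ranges; the wild-type and mutant ranges below are hypothetical.

```python
# Sketch of first-order MUST classification from FVA ranges of two strains.
def must_sets(fva_wt, fva_mut):
    """Return (mustU, mustL) from dicts rxn -> (minFlux, maxFlux)."""
    mustU, mustL = set(), set()
    for rxn in fva_wt:
        wt_min, wt_max = fva_wt[rxn]
        mu_min, mu_max = fva_mut[rxn]
        if wt_max < mu_min:        # flux must increase in the mutant
            mustU.add(rxn)
        elif wt_min > mu_max:      # flux must decrease in the mutant
            mustL.add(rxn)
    return mustU, mustL

wt  = {"R1": (0, 2), "R2": (5, 8), "R3": (1, 4)}   # hypothetical ranges
mut = {"R1": (3, 6), "R2": (1, 3), "R3": (2, 5)}
mustU, mustL = must_sets(wt, mut)
```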
62 To find the first-order MUST sets, define the constraints:
63 Determine the first-order MUST set MustL by running the following command:
>> disp(mustLSet)
65 Determine the first-order MUST set MustU by running the following command:
67 Determine the second-order MUST set MustUU by running the following command:
The results are stored and available in a format analogous to that of the mustL set. The
reactions of the mustUU set can be displayed using the disp function.
68 Repeat the above steps to determine the second-order MUST sets MustLL and MustUL by using
the functions findMustLL and findMustUL, respectively. The results are stored and available in a
format analogous to that of the mustL set. In the present example, mustLL and mustUL are
empty sets.
? TROUBLESHOOTING
69 To find the interventions needed to ensure an increased production of the target of interest, define
the mustU set as the union of the reactions that must be upregulated in the first- and second-order
MUST sets. Similarly, mustL can be defined with the following commands:
70 Define the number of interventions k allowed, the maximum number of sets to find (nSets), the
reaction producing the metabolite of interest (targetRxn (in this case, succinate)), and the
constraints on the mutant strain (constrOpt) with the following commands:
71 Run the OptForce algorithm and display the reactions identified by optForce with the following
command:
72 To find non-intuitive solutions, increase the number of interventions k and exclude the SUCt
reaction from upregulations. Increase nSets to find the 20 best sets. Change the runID to save
this second result in a separate folder from that of the previous result, and then run optForce
again as in Step 71.
target, so check the minimum production rate, e.g., by using the function testoptForceSol.
This function computes atom-mapping data for the balanced and unbalanced reactions in the
metabolic network and saves them in the outputDir directory. The optional maxTime
parameter sets a runtime limit for atom mapping of a reaction. If standardiseRxn == 1, then
atom mappings are also canonicalized, which is necessary in order to obtain a consistent
interoperable set of atom mappings for certain applications, e.g., computation of conserved moieties
in Step 77. The output balancedRxns contains the balanced atom-mapped metabolic reactions.
? TROUBLESHOOTING
77 With a set of canonicalized atom mappings for a metabolic network, the set of linearly independent
conserved moieties for a metabolic network can be identified122. Each of these conserved moieties
corresponds to a molecular substructure (set of atoms in a subset of a molecule) whose structure
remains invariant despite all the chemical transformations in a given network. A conserved moiety is
a group of atoms that follow identical paths through metabolites in a metabolic network. Similarly to
a vector in the (right) nullspace of a stoichiometric matrix that corresponds to a pathway (Step 52), a
conserved moiety corresponds to a vector in the left nullspace of a stoichiometric matrix. Metabolic
networks are hypergraphs123, whereas most moiety subnetworks are graphs. Therefore, conserved
moieties have both biochemical and mathematical significance and, once computed, can be used for a
wide variety of applications. Given a metabolic network of exclusively mass-balanced reactions, one
can identify conserved moieties by a graph theory analysis of its atom transition network122.
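The left-nullspace view of conserved moieties can be sketched on a hypothetical 4-metabolite, 2-reaction network (A + X → B; B → A + Y): any vector l with lTS = 0 defines a conserved pool. Here an orthonormal basis is computed via SVD with NumPy; the Toolbox's graph-theoretical method additionally yields structurally identifiable (nonnegative) moiety vectors.

```python
# Sketch: conserved moieties as a basis of the left nullspace of S.
import numpy as np

S = np.array([[-1,  1],   # A: consumed by r1, regenerated by r2
              [ 1, -1],   # B: the A-X adduct
              [-1,  0],   # X: incorporated into B by r1
              [ 0,  1]],  # Y: carries the X-derived moiety out via r2
             dtype=float)
r = np.linalg.matrix_rank(S)
U, sv, Vt = np.linalg.svd(S)
L = U[:, r:].T            # rows form an orthonormal basis: L @ S = 0
```

For this network the left nullspace is two dimensional; for example, the pool A + B (the "A" moiety) satisfies lTS = 0 and lies in the span of the rows of L.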
First compute an atom transition network for a metabolic network using the following
command:
>> ATN = buildAtomTransitionNetwork(model, rxnfileDir);
where rxnfileDir is a directory containing only atom-mapped files from balanced reactions,
which can be obtained as explained in Step 76. The output ATN is a structure with several fields: .A
is a p × q sparse incidence matrix for the atom transition network, where p is the number of atoms
and q is the number of atom transitions, .mets is a p × 1 cell array of metabolite identifiers to link
each atom to its corresponding metabolites, .rxns is a q × 1 cell array of reaction identifiers to link
atom transitions to their corresponding reactions, and .elements is a p × 1 cell array of element
symbols for atoms in .A.
CRITICAL STEP All the RXN files needed to compute the atom transition network must be in a
method for thermodynamically constraining a genome-scale metabolic model is beyond the scope
of this protocol. Therefore, only several key steps are highlighted.
Given a set of experimentally derived training_data on standard transformed Gibbs
energies of formation, a state-of-the-art quantitative estimation of the standard Gibbs energy of
formation for metabolites with similar chemical substructures can be obtained using an
implementation of the component contribution method127. We assume that the input model
has been atomically resolved as described in Steps 74–78. Access to a compendium of
stoichiometrically consistent metabolite structures110,122 is a prerequisite. When these are in place,
invoke the component contribution method as follows:
The model.DfG0 field gives the estimated standard Gibbs energy of formation for each
metabolite in the model with the model.DfG0_Uncertainty field expressing the uncertainty
in these estimates, which is smaller for metabolites structurally related to metabolites in the training
set. All thermodynamic estimates are given in units of kJ/mol.
80 Transform the standard Gibbs energy of formation for each metabolite according to the
environment of each compartment of the model126, i.e., the temperature, pH, ionic strength, and
electrical potential specific to each compartment. Estimate the thermodynamic properties of
reactions, given model.concMin and model.concMax, where one can supply lower and upper
bounds on compartment-specific metabolite concentrations (mol/L), using the following command:
In the output, field .DfGt0 of model gives the estimated standard transformed Gibbs energy
of formation for each metabolite and .DrGt0 gives the estimated standard transformed Gibbs
energy for each reaction. Subject to a confidenceLevel specified as an input, the upper and
lower bounds on standard transformed Gibbs energy for each reaction are provided in .DrGtMin
and .DrGtMax, respectively.
CRITICAL STEP In a multi-compartmental model, this step must be done for an entire network at
once in order to ensure that thermodynamic potential differences, arising from differences in the
environment between compartments, are properly taken into account. See ref. 126 for a theoretical
justification for this assertion.
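One plausible directionality rule based on the bounds .DrGtMin and .DrGtMax described above can be sketched as follows; the exact rule used by the Toolbox, and any cutoffs, are assumptions here.

```python
# Sketch of a sign-based directionality assignment from Gibbs energy bounds.
def assign_directionality(DrGtMin, DrGtMax):
    """Classify a reaction from its transformed reaction Gibbs energy bounds (kJ/mol)."""
    if DrGtMax < 0:
        return "forward"       # even the upper bound is exergonic forward
    if DrGtMin > 0:
        return "reverse"       # even the lower bound is exergonic in reverse
    return "reversible"        # zero lies within the uncertainty interval
```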
81 Quantitatively assign reaction directionality based on the aforementioned thermodynamic estimates
using the following command:
>> allowLoops = 0;
>> solution = optimizeCbModel(model, [], [], allowLoops);
The solution structure is the same as for FBA (see problem (1)), except that this solution
satisfies additional constraints that ensure the predicted steady-state flux vector is thermo-
dynamically feasible129. The solution satisfies energy conservation and the second law of
thermodynamics130.
molecular species in the jth reverse reaction. We assume that the network of reactions is
stoichiometrically consistent69, that is, that there exists at least one strictly positive vector l ∈ ℝm
satisfying (R − F)Tl = 0. Equivalently, we require that each reaction conserves mass. The matrix N := R − F
represents net reaction stoichiometry and can be viewed as the incidence matrix of a directed
hypergraph123. We assume that there are fewer molecular species than there are net reactions, that
is m < n. We assume the cardinality of each row of F and R is at least 1, and the cardinality of each
column of R − F is at least 2. The matrices F and R are sparse, and the particular sparsity pattern
depends on the particular biochemical network being modeled. Moreover, we assume that rank([F,
R]) = m, which is a requirement for kinetic consistency134.
A vector c* is a steady state if and only if it satisfies f(c*) = 0, leading to the nonlinear system of
equations

f(x) = 0.
There are many algorithms that can handle this nonlinear system by minimizing a nonlinear least-
squares problem; however, particular features of this mapping, such as sparsity of stoichiometric
matrices F and R and non-unique local zeros of mapping f, motivate the quest to develop several
algorithms for efficiently dealing with this nonlinear system. A particular class of such mappings,
called duplomonotone mappings, was studied for biochemical networks136, and three derivative-free
algorithms for finding zeros of strongly duplomonotone mappings were introduced. Further, it was
shown that the function ||f (x)||2 can be rewritten as a difference of two convex functions that
is suitable for minimization with DC programming methods135. Therefore, a DC algorithm and
its acceleration by adding a line-search technique were proposed for finding a stationary point
of ||f (x)||2. Because the mapping f has locally non-unique solutions, it does not satisfy classic
assumptions (e.g., nonsingularity of the Jacobian) required by standard convergence theory. Instead, it was proven
that the mapping satisfies the so-called Hölder metric subregularity assumption137, and an adaptive
Levenberg–Marquardt method was proposed to find a solution for this nonlinear system if the
starting point is close enough to a solution. To guarantee the convergence of the
Levenberg–Marquardt method with an arbitrary starting point, it is combined with globalization
techniques such as line search or trust region, which leads to computationally efficient algorithms.
Note that a stationary point of ||f (x)||2 may not correspond to a solution x, such that f(x) = 0, when
∇f(x)f(x) = 0 does not imply f(x) = 0.
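The nonlinear least-squares view of f(x) = 0 can be sketched on a hypothetical 2-species system in log concentrations x = log(c); scipy's built-in Levenberg-Marquardt solver stands in for the Toolbox's LM variants here, and the mapping itself is made up for illustration.

```python
# Sketch: kinetic steady state as a zero of f, found via Levenberg-Marquardt.
import numpy as np
from scipy.optimize import least_squares

def f(x):
    # Hypothetical net flux mapping: zero exactly at the kinetic steady state.
    return np.array([np.exp(x[0]) - 2.0 * np.exp(x[1]),
                     np.exp(x[0]) + np.exp(x[1]) - 3.0])

sol = least_squares(f, x0=np.zeros(2), method="lm")  # x* = (log 2, 0)
```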
Compute a non-equilibrium kinetic steady state by running the function optimizeVKmodel.
The mandatory inputs for computing steady states are a model vKModel containing F and R, the
name of a solver for the nonlinear system, an initial point x0, and parameters for the chosen
solver. For example, to specify a solver, write solver = 'LMR';. Optional parameters for
the selected algorithm can be given to optimizeVKmodel via the params struct as follows:
Otherwise, the selected algorithm will be run with the default parameters assigned for each
algorithm. Run the function optimizeVKmodel by typing the following command:
The output struct contains information related to the execution of the solver.
where [ , ] denotes the horizontal concatenation operator. Let L ∈ ℝ^((m−r)×m) denote a basis for
the left nullspace of N, which implies LN = 0. We have rank(L) = m − r. We say that the system
satisfies moiety conservation if, for any initial concentration c0 ∈ ℝ^m with c0 > 0,

Lc = L exp(x) = l0,

where l0 := Lc0 ∈ ℝ^(m−r). It is possible to compute L such that each row corresponds to a
structurally identifiable conserved moiety in a biochemical network122. The problem of finding the
moiety-conserved steady state of a biochemical reaction network is equivalent to solving the
nonlinear system of equations

h(x) := [f(x); L exp(x) − l0] = 0.  (7)
Among the algorithms mentioned in the previous section, the local and global Levenberg–Marquardt
methods137 are designed to compute either a solution of the nonlinear system (7) or a stationary
point of the merit function ½‖h(x)‖². To compute the moiety-conserved non-equilibrium kinetic
steady state, run the optimizeVKmodel function in the same way as in the previous section. Then
pass a model vKModel containing F, R, L and l0 to optimizeVKmodel, together with the
name of one of the Levenberg–Marquardt solvers.
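To make the augmented system (7) concrete, the following illustrative Python sketch builds L from the left nullspace of the net stoichiometric matrix, assumed here to be N = R − F, and solves h(x) = 0 for a hypothetical two-species toy network with mass-action kinetics. It is not the optimizeVKmodel interface.

```python
import numpy as np

# Hypothetical toy network: reversible isomerization A <=> B, modeled as two
# irreversible mass-action steps with forward (F) and reverse (R) stoichiometry.
F = np.array([[1.0, 0.0],    # A consumed by step 1
              [0.0, 1.0]])   # B consumed by step 2
R = np.array([[0.0, 1.0],    # A produced by step 2
              [1.0, 0.0]])   # B produced by step 1
N = R - F                    # net stoichiometric matrix (assumed convention)
k = np.array([1.0, 2.0])     # hypothetical mass-action rate constants

# Rows of L span the left nullspace of N (LN = 0): the conserved moieties.
_, s, vt = np.linalg.svd(N.T)
L = vt[np.sum(s > 1e-12):]           # here, a multiple of [1, 1]
l0 = L @ np.array([2.5, 0.5])        # moiety total fixed by c0 = (2.5, 0.5)

def h(x):
    """Augmented residual of equation (7): steady state plus moiety conservation."""
    v = k * np.exp(F.T @ x)          # mass-action fluxes with c = exp(x)
    return np.concatenate([N @ v, L @ np.exp(x) - l0])

# Gauss-Newton iteration on h(x) = 0 with a finite-difference Jacobian.
x = np.zeros(2)
for _ in range(200):
    r = h(x)
    J = np.array([(h(x + 1e-7 * e) - r) / 1e-7 for e in np.eye(2)]).T
    x -= np.linalg.lstsq(J, r, rcond=None)[0]
c = np.exp(x)   # steady state: k[0]*c[0] = k[1]*c[1] and c[0] + c[1] = 3
```

For this toy isomerization, the computed steady state satisfies both the kinetic balance k1·cA = k2·cB and the moiety constraint that cA + cB remains fixed by the initial concentrations.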
>> load('minerva.mat');
>> minerva.login = 'username';
>> minerva.password = 'password';
>> minerva.map = 'ReconMap-2.01';
88 Load a human metabolic model into MATLAB with the following command:
89 Change the objective function to maximize ATP production through complex V (ATP synthase,
'ATPS4m') in the electron transport chain with the following command:
90 Although the optimal objective value of the FBA problem (1) is unique, the optimal flux vector
itself is generally not. When visualizing a flux vector, it is important to display a unique solution
to a well-defined optimization problem. For example, we can predict a unique network flux by
regularizing the FBA problem (1), redefining ρ(v) := cᵀv − (σ/2)vᵀv with σ = 10⁻⁶ (Step 21). To
obtain a unique optimal flux vector, run the following command:
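As an aside, the effect of this regularization can be illustrated with a small quadratic program in Python. This is an illustrative sketch with a hypothetical three-reaction toy network, not the Toolbox command, and it uses a larger σ than Step 90's 10⁻⁶ purely so the toy problem is well conditioned: maximizing cᵀv − (σ/2)vᵀv subject to Sv = 0 and bounds selects one flux vector among the degenerate optima of the linear program.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy network: two parallel routes v1, v2 feed metabolite A,
# which is drained by the objective reaction v3, so S v = v1 + v2 - v3 = 0.
S = np.array([[1.0, 1.0, -1.0]])
c = np.array([0.0, 0.0, 1.0])        # objective coefficients: maximize v3
bounds = [(0.0, 10.0)] * 3           # v1 + v2 = 10 has many optimal splits
sigma = 1e-2                         # regularization weight (larger than the
                                     # 1e-6 of Step 90, for conditioning)

# Maximize c'v - (sigma/2) v'v  <=>  minimize -c'v + (sigma/2) v'v;
# the strictly convex quadratic term makes the optimal flux vector unique.
res = minimize(lambda v: -c @ v + 0.5 * sigma * (v @ v),
               x0=np.zeros(3),
               jac=lambda v: -c + sigma * v,
               method='SLSQP',
               bounds=bounds,
               constraints=[{'type': 'eq', 'fun': lambda v: S @ v}],
               options={'ftol': 1e-12, 'maxiter': 200})
v_opt = res.x
```

Without the quadratic term, any split v1 + v2 = 10 is optimal; with it, the symmetric split v1 = v2 = 5 with v3 = 10 is the unique optimum, which is the kind of reproducible flux vector one wants to overlay on a map.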
91 Build the context-specific overlay of a flux vector on ReconMap by instructing the COBRA Toolbox
to communicate with the remote MINERVA server using the following command:
The only new input variable is the text string identifier, which enables you to name each
overlay with a unique title. The response status will be set to 1 if the overlay was
successfully received by the MINERVA server.
? TROUBLESHOOTING
92 Visualize context-specific ReconMaps using a web browser. Navigate to http://vmh.life/#reconmap,
log in with your credentials above, and then select ‘OVERLAYS’. The list of USER-PROVIDED
OVERLAYS will appear. To see the map from Step 91, check the box adjacent to the unique text
string provided by identifier.
93 To export context-specific ReconMaps as publishable graphics, at least two options are available:
portable document format (.pdf) or portable network graphics (.png). The former is useful for
external editing, whereas the latter essentially produces a snapshot of the visual part of the map.
95 Visualize a selected network fragment around a list of reactions in a model, contextualized using
a flux vector flux, by running the following commands:
The rxns input provides a selection of reactions of interest. The remaining inputs are optional
and control the appearance of the automatic layout. For example, excludeMets provides a list
of metabolites that can be excluded from the network visualization, e.g., cofactors such as NAD
and NADP.
96 To visualize a model fragment with a specified radius around a specified metabolite of interest, such
as 'etoh[c]', run the following command:
[Fig. 6 schematic: on the GitHub server, the openCOBRA cobratoolbox repository (online read access; clone/pull) and the contributor's fork-cobratoolbox repository (online read and write access; clone/pull/push); on the local computer, the cobratoolbox clone (local read and write access; no upload/push) and the fork-cobratoolbox clone (local read and write access; upload/push), with branches master, develop, myBranch1 and myBranch2.]
Fig. 6 | Development branching model of the COBRA Toolbox. The openCOBRA repository and the fork of a
contributor located on the GitHub server can be cloned to the local computer as cobratoolbox and fork-cobratoolbox
folders, respectively. Each repository might contain different branches, but each repository contains the master and
develop branches. Note that contributors only have read access to the openCOBRA repository. The stable branch is
the master branch (black branch), and the development of code is done on the develop branch (green branch). The
master branch will be checked out when using the cobratoolbox repository, whereas contributors can create new
branches originating from the develop branch (local fork-cobratoolbox directory and online <username>/cobratoolbox
repository). In the present example, myBranch1 (blue branch) has already been pushed to the forked repository on
the GitHub server, whereas myBranch2 (pink branch) is present only locally. The branch myBranch1 can be merged
into the develop branch of the openCOBRA repository by opening a pull request. To submit the contributions
(commits) on myBranch2, the contributor must first push the commits to the forked repository (https://github.com/
<username>/cobratoolbox) before opening a pull request. Any commit made on the develop branch (red square) will
be merged with the master branch if the develop branch is stable overall (orange square).
>> installDevTools
With this command, the directory MATLAB.devTools is created next to the cobratoolbox
installation directory. The MATLAB.devTools can also be installed from the terminal (or shell) with
the following command:
After initialization of the MATLAB.devTools, the user and developer may have two folders: a
cobratoolbox folder with the stable master branch checked out, and a fork-cobratoolbox folder with
the develop branch checked out. Detailed instructions for troubleshooting and/or contributing to
the COBRA Toolbox using the terminal (or shell) are provided in Supplementary Manual 3.
CRITICAL STEP A working Internet connection is required, and git and curl must be installed.
Installation instructions are provided on the main repository page of the MATLAB.devTools. A
valid passphrase-less SSH key must be set in the GitHub account settings in order to contribute
without entering a password while securely communicating with the GitHub server.
? TROUBLESHOOTING
98 The MATLAB.devTools are configured on the fly or whenever the configuration details are
not present. The first time a user runs contribute, the personal repository (fork) is
downloaded (cloned) into a new folder named fork-cobratoolbox at the location specified
by the user. In this local folder, both master and develop branches exist, but it is the develop branch
that is automatically selected (checked out). Any new contributions are derived from the develop
branch.
Initializing a contribution using the MATLAB.devTools is straightforward. In MATLAB, type
the following command:
If the MATLAB.devTools are already configured, procedure [1] updates the fork (if necessary)
and initializes a new branch with a name requested during the process. Once the contribution is
initialized, files can be added, modified or deleted in the fork-cobratoolbox folder. A contribution is
successfully initialized when the user is presented with a brief summary of configuration details.
Instructions on how to proceed are also provided.
CRITICAL STEP The location of the fork must be specified as the root directory. There will be a
Procedure [2] pulls all changes from the openCOBRA repository and rebases the existing
contribution. In other words, existing commits are shifted forward and placed after all commits
made on the develop branch of the openCOBRA repository.
CRITICAL STEP Before attempting to continue working on an existing feature, make sure that
When running procedure [3], choose between ‘simple contribution’ and ‘publishing and
opening a pull request’.
● Simple contribution without opening a pull request. All changes to the code are individually listed
and the user is asked explicitly which changes should be added to the commit. Once all changes
have been added, a commit message must be entered. Upon confirmation, the changes are pushed
to the online fork automatically.
●
Publishing and opening a pull request. The procedure for submitting a pull request is the same as
for the simple contribution, with the difference that when selecting to open a pull request, a link is
procedure [3] before stopping work on that contribution. When following procedure [3], the
incremental changes are uploaded to the remote server. We advise publishing often and making
small, incremental changes to the code. There is no need for opening a pull request immediately if
there are more changes to be made. A pull request can be opened at any time, even manually and
directly from the GitHub website. Unless the pull request is accepted and merged, the changes
submitted are not available on the develop or master branches of the openCOBRA version of the
COBRA Toolbox.
? TROUBLESHOOTING
101 If a contribution has been merged into the develop branch of the openCOBRA repository (accepted
pull request), the contribution (feature or branch) can be safely deleted both locally and remotely
on the fork by running contribute and selecting procedure [4]. Note that deleting a
contribution deletes all the changes that have been made on that feature (branch). It is not possible
to selectively delete a commit using the MATLAB.devTools. Instead, create a new branch by
following procedure [1] (Step 98), and follow the instructions to ‘cherry-pick’ in the Supplementary
Manual 3.
CRITICAL STEP Make sure that your changes are either merged or saved locally if you need them.
Once procedure [4] is concluded, all changes on the deleted branch are removed, both locally and
remotely. No commits can be recovered.
? TROUBLESHOOTING
102 It is sometimes useful to simply update the fork without starting a new contribution. The local fork
can be updated using procedure [5] of the contribute menu.
CRITICAL STEP Before updating your fork, make sure that no changes are present in the local
>> checkStatus
If there are changes listed, publish them by selecting procedure [3] of the contribute menu as
explained in Step 100.
? TROUBLESHOOTING
Troubleshooting
Troubleshooting advice can be found in Table 5.
Table 5 | Troubleshooting

Step 1
Problem: The initCobraToolbox function displays warnings or error messages during initialization
Possible reason: Incompatible third-party software or an improperly configured system
Solution: First, read the output of the initialization script in the command window. Any warning or error messages, although often brief, may point toward the source of the problem if read literally. Second, verify that all software versions are supported and have been correctly installed as described in the ‘Materials’ section. Third, ensure that you are using the latest version of the COBRA Toolbox, cf. Steps 97–102. Fourth, verify and test the COBRA Toolbox as described in Step 3. Finally, if nothing else works, consult the COBRA Toolbox forum, as described in Step 103

Step 3
Problem: Some tests are listed as failed when running testAll
Possible reason: Some third-party dependencies are not properly installed or the system is improperly configured
Solution: Verify that all required software has been correctly installed as described in the ‘Materials’ section. The specific test can then be run individually to determine the exact cause of the error. If the error can be fixed, try to use the MATLAB.devTools and contribute a fix. Further details on how to approach submitting a contribution are given in Steps 97–102. If the error cannot be determined, reach out to the community as explained in Step 103

Step 4
Problem: The readCbModel function fails to import a model
Possible reason: The input file is not correctly formatted or the SBML file format is not supported
Solution: Specifications for Excel sheets accepted by the COBRA Toolbox can be found on GitHub (http://opencobra.github.io/cobratoolbox/docs/COBRAModelFields.html). An Excel template is available at https://github.com/opencobra/cobratoolbox/blob/master/docs/source/notes/COBRA_structure_fields.xlsx. Files with legacy SBML formats can be imported, but some information from the SBML file might be lost. In addition to constraint-based information encoded by fields of the fbc package, the COBRA-style annotations introduced in the COBRA Toolbox v.2.0 (ref. 4) are supported for backward compatibility. Some information is still stored in this type of annotation. The data specified with the latest version of the fbc package is used in preference to other fields, e.g., legacy COBRA-style notes, which may contain similar data

Problem: The readCbModel function fails to import a model saved as a .mat file
Possible reason: The model may contain deprecated fields or fields that have invalid values
Solution: Old MATLAB models saved as .mat files sometimes contain deprecated fields or fields that have invalid values. Some of these instances are checked and corrected when readCbModel is run, but there might be instances when readCbModel fails. If this happens, it is advisable to load the .mat file, run the verifyModel function on the loaded model, and manually adjust all indicated inconsistent fields. After this procedure, we suggest saving the model again and using readCbModel to load the model

Problem: The readCbModel function fails to import an SBML file
Possible reason: The model might be invalid
Solution: If an SBML file produces an error during input–output, check that the file is valid SBML by using the SBML Validator (http://sbml.org/Facilities/Validator)

Step 5
Problem: The writeCbModel function fails to export a model
Possible reason: Some of the required fields of the model structure are missing or the model contains invalid data
Solution: Before a reconstruction or model is exported, a summary of the invalid data in the model can be obtained by running verifyModel(model). A list of required fields for the model structure is presented in Table 3

Step 15
Problem: The dqqMinos or quadMinos interfaces are not working as intended
Possible reason: The binaries might not be compatible with your operating system
Solution: Make sure that all relevant system requirements described in the ‘Materials’ section are satisfied. If you are still unable to use the respective interfaces, reach out to the community as explained in Step 103

Step 16(A)
Problem: The findSExRxnInd function fails to identify some exchange, demand and sink reactions
Possible reason: Some exchange, demand and sink reactions do not start with any of the anticipated prefixes
Solution: Try an alternative approach, e.g., findStoichConsistentSubset, which finds the subset of a stoichiometric matrix that is stoichiometrically consistent

Step 16(B)
Problem: The function checkMassChargeBalance returns incorrect results
Possible reason: Some formulae are missing or a formula is incorrectly specified, leading one or more reactions to be incorrectly identified as being elementally balanced
Solution: Try an alternative approach, e.g., findStoichConsistentSubset, which finds the subset of a stoichiometric matrix that is stoichiometrically consistent

Step 16(C)
Problem: Erroneous predictions
Possible reason: Inadvertent violation of the steady-state mass conservation constraint
Solution: Manually inspect the reaction formulae for each reaction to identify any obviously mass-imbalanced reactions; omit them from the reconstruction and run findStoichConsistentSubset again

Step 22
Problem: The solution status given by FBAsolution.stat is −1
Possible reason: A too-short runtime limit has been set or numerical issues occurred during the optimization procedure
Solution: Check the value of FBAsolution.origStat and compare it with the documentation provided by the solver in use for further information. If one is using a double-precision solver to solve a model that could be multi-scale but is not yet recognized as such, then FBAsolution.stat == -1 can be symptomatic of this situation. In that case, refer to Steps 14 and 15 to learn how to numerically characterize a reconstruction model

Step 55
Problem: The sampling distribution is not uniform (revealed by a non-uniform marginal flux distribution)
Possible reason: The values of the sampling parameters options.nSkip and options.nSamples are set too low
Solution: Increase the values of the options.nSkip and options.nSamples parameters until smooth and unimodal marginal flux distributions are obtained

Step 68
Problem: No reaction is found in the MUST sets
Possible reason: The wild-type or mutant strain may not be sufficiently constrained
Solution: A solution is to add more constraints to the strains until differences in the reaction ranges are shown. If no differences are found, another algorithm might be better suited. If there is an error when running the findMust* functions, a possible reason is that the inputs are not well defined or a solver may not be set. Verify the inputs and use changeCobraSolver to change to a commercial-grade optimization solver (see Table 4 for a list of supported solvers)

Step 76
Problem: Some reactions could not be mapped
Possible reason: A too-short runtime limit, or a reaction that the algorithm could not atom-map
Solution: Increase the runtime limit of the algorithm

Step 91
Problem: The remote MINERVA server refuses to build a new overlay
Possible reason: The text string in the identifier input variable is not uniquely defined in your account
Solution: Change the identifier text string of your overlay

Step 97
Problem: An error message claims permission is denied
Possible reason: The SSH key of the computer is not configured properly
Solution: The installation of the MATLAB.devTools is dependent on a correctly configured GitHub account. The SSH key of the computer must be set in the GitHub account settings or else errors will be thrown. If the git clone command works, the SSH key is properly set. Otherwise, delete the SSH key locally (generally located in the .ssh folder in the home directory) and remotely on GitHub, and generate a new SSH key

Step 98
Problem: When running contribute, an error message is generated claiming that the fork cannot be reached or that the local fork cannot be found
Possible reason: The local forked folder cannot be found or has been moved, or the remote fork cannot be reached
Solution: It may occur that the configuration of the MATLAB.devTools is faulty or has been mistyped. In that case, try to reset the configuration by typing: >> resetDevTools

Problem: Procedure [1] fails when running contribute
Possible reason: The local fork-cobratoolbox folder is too old or has not been updated for a while
Solution: In that case, and if no local changes are present, back up and remove the local fork-cobratoolbox folder and run the contribute command again. Alternatively, try to delete the forked repository online and re-fork the openCOBRA repository. When one is sure that everything is fine, the backup can be safely deleted, but it is wise to store it for some time, in case one later realizes that some updates to the code have gone missing
Possible reason: There are changes in the local fork-cobratoolbox folder
Solution: Contribute the changes manually as described in Supplementary Manual 3
Possible reason: The forked repository cannot be reached online or the SSH key is not configured properly
Solution: Set the SSH key in your GitHub account and make sure that the forked repository can be reached. This can easily be checked by re-cloning the MATLAB.devTools in the terminal as explained in Step 97 and by browsing to the forked repository online

Step 99
Problem: Procedure [2] fails when running contribute
Possible reason: Your contribution has been deleted online or is no longer available locally
Solution: When the rebase process fails, the user is asked to reset the contribution, which will reset the contribution to the online version of the branch in the fork. In general, when the rebase fails, there have been changes made on the openCOBRA repository that are in conflict with the local changes. You can check the status of the local repository by typing: >> checkStatus. If there are conflicts that you do not know how to resolve, check the official repository or ping the developers at https://groups.google.com/forum/#!forum/cobra-toolbox as explained in Step 103. If you have already published changes, try to submit a pull request as explained in Step 100 to help the developers understand the situation. Alternatively, you can try to resolve the conflicts manually. More information on how to resolve conflicts is given in Supplementary Manual 3

Step 100
Problem: Procedure [3] fails when running contribute
Possible reason: The forked repository cannot be reached online or the SSH key is not configured properly
Solution: Check that the SSH key in your GitHub account is set properly and make sure that the forked repository can be reached

Problem: When opening a pull request, GitHub cannot automatically merge
Possible reason: There have been changes made on the openCOBRA repository and on your local fork
Solution: Submit the pull request anyway; another developer will help you rebase your contribution manually

Step 101
Problem: Procedure [4] fails when running contribute
Possible reason: Your local changes are not yet published (committed)
Solution: Follow procedure [3] of the contribute menu in order to publish your changes first, as explained in Step 100

Step 102
Problem: Procedure [5] fails when running contribute
Possible reason: There are some local changes that have not yet been published (committed)
Solution: Back up eventual modifications, remove the fork-cobratoolbox folder, and run the contribute command again
Possible reason: Too many changes have been made in the openCOBRA repository
Solution: Back up your modified files to a separate location, and reset your branch manually by typing the following into the terminal (be careful; this will delete all your changes locally, but not remotely): $ git reset --hard origin/<yourBranch>. Then copy your files back into the fork-cobratoolbox folder and contribute normally
Timing
Steps 1 and 2, initialization of the COBRA Toolbox: 5–30 s
Step 3, verification and testing of the COBRA Toolbox: ~17 min
Step 4, importation of a reconstruction or model: 10 s–2 min
Step 5, exportation of a reconstruction or model: 10 s–2 min
Step 6, use of rBioNet to add reactions to a reconstruction: 1 s–17 min
Steps 7 and 8, use of a spreadsheet to add reactions to a reconstruction: 1 s–17 min
Steps 9–13, use of scripts with reconstruction functions: 1 s–2 min
Step 14, checking the scaling of a reconstruction: 1 s–2 min
Step 15, selection of a double- or quadruple-precision optimization solver: 1–5 s
Step 16, identification of stoichiometrically consistent and inconsistent reactions: 1 s–28 h
Step 17, identification of stoichiometrically consistent and inconsistent molecular species: 1 s–17 min
Step 18, setting of simulation constraints: 1 s–17 min
Step 19, identification of molecular species that leak, or siphon, across the boundary of the model:
1–17 min
Step 20, identification of flux-inconsistent reactions: 1 s–17 min
Steps 21 and 22, flux balance analysis: 1 s–2 min
Step 23, relaxed flux balance analysis: 1 s–17 min
Step 24, sparse flux balance analysis: 1 s–17 min
Steps 25–27, identification of dead-end metabolites and blocked reactions: ~2 min
Steps 28–30, gap-filling a metabolic network: 2 min–28 h
Steps 31–33, integration of extracellular metabolomic data: 17 min–28 h
Steps 34–39, integration of intracellular metabolomic data: 2 min–2.8 h
Step 40, integration of transcriptomic and proteomic data: 2 min–2.8 h
Steps 41–46, adding biological constraints to a flux balance model: ~2 min
Steps 47 and 48, qualitative chemical and biochemical fidelity testing: 2–17 min
Steps 49–51, quantitative biochemical fidelity testing: 2–7 min
Step 52, MinSpan pathways: a sparse basis of the nullspace of a stoichiometric matrix: 2 min–2.8 h
Step 53, low-dimensional flux variability analysis: 1 s–17 min
Step 54, high-dimensional flux variability analysis: 1 s–28 h
Step 55, uniform sampling of steady-state fluxes: 1 s–17 min
Steps 56–73, identification of all genetic manipulations leading to targeted overproductions: 10 s–28 h
Steps 74–78, atomically resolving a metabolic reconstruction: 10 s–28 h
Steps 79–82, thermodynamically constraining a metabolic model: 1 s–17 min
Step 83, conversion of a flux balance model into a kinetic model: 1 s–17 min
Step 84, computation of a non-equilibrium kinetic steady state: 1 s–17 min
Step 85, computation of a moiety-conserved non-equilibrium kinetic steady state: 1 s–17 min
Steps 86–93, human metabolic network visualization with ReconMap: 1 s–2 min
Steps 94–96, variable scope visualization of a network with Paint4Net: 1 s–17 min
Steps 97–102, contributing to the COBRA Toolbox with MATLAB.devTools: 1–30 s
Step 103, engaging with the COBRA Toolbox forum: 1 s–2 min
Step 2
A list of the solvers assigned to each class of optimization problem is returned:
Defined solvers are:
CBT_LP_SOLVER: gurobi
CBT_MILP_SOLVER: gurobi
CBT_QP_SOLVER: qpng
CBT_MIQP_SOLVER: gurobi
CBT_NLP_SOLVER: matlab
Step 3
The test suite starts by initializing the COBRA Toolbox and thereafter all the tests are run. At the end
of the test run, a comprehensive summary table is presented in which the respective tests and their
outcomes are shown. On a fully configured system that is compatible with the most recent version of
the COBRA Toolbox, all tests should pass. It may not be necessary to have a fully configured system
to use one’s particular subset of methods.
Step 5
A file containing the model information is exported to the location, and in the format,
specified by the fileName variable.
Step 20
Any non-zero entry in fluxInConsistentRxnBool indicates a flux-inconsistent reaction, i.e., a
reaction that does not admit a non-zero flux. Blocked reactions can be resolved by manual
reconstruction6, algorithmic reconstruction82, or a combination of the two.
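A flux-inconsistency check of this kind can be sketched with a pair of linear programs per reaction, in the spirit of flux variability analysis. This is an illustrative Python sketch on a hypothetical toy network; the Toolbox's own implementation may differ.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: R1: -> A, R2: A -> B, R3: B ->, R4: B -> C.
# Metabolite C has no consuming reaction, so R4 can carry no steady-state flux.
S = np.array([[1.0, -1.0,  0.0,  0.0],    # A
              [0.0,  1.0, -1.0, -1.0],    # B
              [0.0,  0.0,  0.0,  1.0]])   # C (dead end)
bounds = [(0.0, 10.0)] * 4

def flux_span(j):
    """Return (min, max) of v_j subject to S v = 0 and the flux bounds."""
    c = np.zeros(4)
    c[j] = 1.0
    lo = linprog(c, A_eq=S, b_eq=np.zeros(3), bounds=bounds).fun
    hi = -linprog(-c, A_eq=S, b_eq=np.zeros(3), bounds=bounds).fun
    return lo, hi

# A reaction is flux inconsistent (blocked) if its flux span is {0}.
blocked = [j for j in range(4) if max(abs(v) for v in flux_span(j)) < 1e-9]
```

Here, reaction R4 produces the dead-end metabolite C, so its flux span collapses to {0} and only R4 is reported as blocked.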
[Fig. 8 graphic: a hypergraph of mitochondrial reactions and metabolites, with directed hyperedges colored by relative reaction flux magnitude.]
Fig. 8 | An energy-generating stoichiometrically balanced cycle. The smallest stoichiometrically balanced cycle that produces ATP at a maximal rate
using the ATP synthase reaction, in Recon3D, with all internal reactions. Relative reaction flux magnitudes are indicated by the color of each directed
hyperedge shown in the legend. All metabolite and reaction abbreviations are primary keys in the Virtual Metabolic Human database (https://vmh.
life): reaction abbreviation, reaction name: ADK1m, adenylate kinase, mitochondrial; G5SDym, glutamate-5-semialdehyde dehydrogenase,
mitochondrial; GLU5Km, glutamate 5-kinase, mitochondrial; P45027A15m, 5-beta-cytochrome P450, family 27, subfamily A, polypeptide 1; PPAm,
inorganic diphosphatase; r0074, L-glutamate 5-semialdehyde:NAD+ oxidoreductase; HMR_3966, nucleoside-triphosphate diphosphatase; ATPS4mi,
ATP synthase (four protons for one ATP); CYOR_u10mi, ubiquinol-6 cytochrome c reductase, complex III; NADH2_u10mi, NADH dehydrogenase,
mitochondrial; CYOOm2i, cytochrome c oxidase, mitochondrial complex IV.
[Fig. 9 graphic: a simplified metabolic map of E. coli central metabolism (pyruvate dehydrogenase and the TCA cycle), with upregulated and knocked-out reactions highlighted.]
Fig. 9 | The interventions predicted by the OptForce method for succinate overproduction in E. coli (AntCore
model) under aerobic conditions. Reactions that need to be upregulated (green arrows and labels) and knocked out
(red arrows and labels) are shown in this simplified metabolic map. The strategies include upregulation of reactions
generating succinate such as isocitrate dehydrogenase, α-ketoglutarate dehydrogenase, or succinyl-CoA synthetase,
along with knockout of reactions draining succinate, such as those for succinate dehydrogenase or fumarate
hydratase. Note that each of these reactions may be associated with one or more genes in E. coli. R before a number
denotes the index of the corresponding reaction in the stoichiometric matrix. Each metabolite and reaction is
abbreviated: accoa, acetyl-CoA; ACONT, aconitase; akg, oxoglutaric acid; AKGDH, 2-oxoglutarate dehydrogenase;
atp, adenosine triphosphate; cit, citric acid; CS, citrate synthase; fadh2, reduced form of flavin adenine dinucleotide;
fum, fumaric acid; FUM, fumarase; ICDHyr, isocitrate dehydrogenase; icit, isocitric acid; gluc, glucose; mal, malate;
MDH, malate dehydrogenase; nadh, reduced nicotinamide–adenine dinucleotide; oac, oxaloacetate; pyr, pyruvate;
suc, succinate; SUCDi, succinate dehydrogenase (irreversible); suc(ext), succinate (extracellular); succoa, succinyl-
CoA; SUCOAS, succinyl-CoA synthetase (ADP-forming); SUCt, succinate transport.
[Fig. 10 graphic: ΔrG′m (blue) or ΔrG′ (red) (kJ/mol) on the vertical axis, plotted against reactions sorted by ΔrG′m or P(ΔrG′m < 0) on the horizontal axis.]
Fig. 10 | Qualitatively forward, quantitatively reverse reactions in a multi-compartmental, genome-scale model. In Recon3D, the transformed
reaction Gibbs energy could be estimated for 7,215 reactions. Of these reactions, 2,868 reactions were qualitatively assigned to be forward in the
reconstruction, but were quantitatively assigned to be reversible using subcellular compartment–specific thermodynamic parameters, the component
contribution method, and broad bounds on metabolite concentrations (10⁻⁵–0.02 mol/L), except for certain cofactors. The geometric mean (green)
and feasible range (between maximum and minimum) of estimated millimolar standard transformed reaction Gibbs energy (ΔrG′m, blue) and
transformed reaction Gibbs energy (ΔrG′, red) are illustrated. The relative uncertainty in metabolite concentrations versus uncertainty in
thermochemical estimates is reflected by the relative breadth of the red and blue bars for each reaction, respectively. The reactions are rank ordered
by the cumulative probability that millimolar standard transformed reaction Gibbs energy is less than zero, P(ΔrG′m < 0), (black descending line from
left to right). This assumes that all metabolites are at a millimolar concentration (1 mM) and a Gaussian error is assumed in component contribution
estimates. In this ordering, forward transport reactions have P(ΔrG′m < 0) = 1 (far left) and reverse transport reactions have P(ΔrG′m < 0) = 0 (far
right). In between, from left to right are biochemical reactions with decreasing cumulative probability of being forward in direction, subject to the
stated assumptions. Alternative rankings are possible. The key point is to observe that the COBRA Toolbox is primed for quantitative integration of
metabolomic data as the uncertainty in transformed reaction Gibbs energy associated with thermochemical estimates using the component
contribution method is now substantially lower than the uncertainty associated with the assumption of broad concentration range.
Fig. 11 | Human metabolic network visualization. Overlay of the flux vector for maximum ATP synthase flux, using flux balance analysis with
regularization of the flux vector. Active fluxes are highlighted (blue). The full image can be accessed as the Supplementary Data file.
[Fig. 12 graphic: a Paint4Net network diagram of part of the E. coli core model, with reaction nodes annotated by flux rates.]
Fig. 12 | Selective scope visualization of the E. coli core model by Paint4Net. Rectangles represent reactions with rates of fluxes in parentheses; the
red rectangles represent reactions with only one metabolite; ellipses represent metabolites; the red ellipses represent dead-end metabolites; gray
arrows represent zero-rate fluxes; green arrows represent positive-rate (forward) fluxes; and blue arrows represent negative-rate (backward) fluxes.
The network visualization also supports zooming in on specific regions.
References
1. Palsson, B. Ø. Systems Biology: Constraint-Based Reconstruction and Analysis (Cambridge University Press,
Cambridge, 2015).
2. O’Brien, E. J., Monk, J. M. & Palsson, B. O. Using genome-scale models to predict biological capabilities.
Cell 161, 971–987 (2015).
3. Becker, S. A. et al. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA
Toolbox. Nat. Protoc. 2, 727–738 (2007).
4. Schellenberger, J. et al. Quantitative prediction of cellular metabolism with constraint-based models: the
COBRA Toolbox v2.0. Nat. Protoc. 6, 1290–1307 (2011).
5. Lewis, N. E., Nagarajan, H. & Palsson, B. O. Constraining the metabolic genotype–phenotype relationship
using a phylogeny of in silico methods. Nat. Rev. Microbiol. 10, 291–305 (2012).
6. Thiele, I. & Palsson, B. Ø. A protocol for generating a high-quality genome-scale metabolic reconstruction.
Nat. Protoc. 5, 93–121 (2010).
7. Kitano, H., Ghosh, S. & Matsuoka, Y. Social engineering for virtual ‘big science’ in systems biology.
Nat. Chem. Biol. 7, 323–326 (2011).
8. Bordbar, A., Monk, J. M., King, Z. A. & Palsson, B. O. Constraint-based models predict metabolic and
associated cellular functions. Nat. Rev. Genet. 15, 107–120 (2014).
9. Maia, P., Rocha, M. & Rocha, I. In silico constraint-based strain optimization methods: the quest for optimal
cell factories. Microbiol. Mol. Biol. Rev. 80, 45–67 (2016).
10. Hefzi, H. et al. A consensus genome-scale reconstruction of Chinese hamster ovary cell metabolism. Cell
Syst. 3, 434–443.e8 (2016).
11. Yusufi, F. N. K. et al. Mammalian systems biotechnology reveals global cellular adaptations in a recombinant CHO cell line. Cell Syst. 4, 530–542.e6 (2017).
12. Zhuang, K. et al. Genome-scale dynamic modeling of the competition between Rhodoferax and Geobacter in anoxic subsurface environments. ISME J. 5, 305–316 (2011).
13. Jamshidi, N. & Palsson, B. Ø. Systems biology of the human red blood cell. Blood Cells Mol. Dis. 36, 239–247
(2006).
14. Yizhak, K., Gabay, O., Cohen, H. & Ruppin, E. Model-based identification of drug targets that revert
disrupted metabolism and its application to ageing. Nat. Commun. 4, 2632 (2013).
15. Shlomi, T., Cabili, M. N. & Ruppin, E. Predicting metabolic biomarkers of human inborn errors of
metabolism. Mol. Syst. Biol. 5, 263 (2009).
16. Sahoo, S., Franzson, L., Jonsson, J. J. & Thiele, I. A compendium of inborn errors of metabolism mapped
onto the human metabolic network. Mol. Biosyst. 8, 2545–2558 (2012).
17. Thiele, I. et al. A community-driven global reconstruction of human metabolism. Nat. Biotechnol. 31,
419–425 (2013).
18. Pagliarini, R. & di Bernardo, D. A genome-scale modeling approach to study inborn errors of liver
metabolism: toward an in silico patient. J. Comput. Biol. 20, 383–397 (2013).
19. Shaked, I., Oberhardt, M. A., Atias, N., Sharan, R. & Ruppin, E. Metabolic network prediction of drug side
effects. Cell Syst. 2, 209–213 (2016).
20. Chang, R. L., Xie, L., Xie, L., Bourne, P. E. & Palsson, B. Drug off-target effects predicted using structural
analysis in the context of a metabolic network model. PLoS Comput. Biol. 6, e1000938 (2010).
21. Kell, D. B. Systems biology, metabolic modelling and metabolomics in drug discovery and development.
Drug Discov. Today 11, 1085–1092 (2006).
22. Duarte, N. C. et al. Global reconstruction of the human metabolic network based on genomic and bibliomic
data. Proc. Natl. Acad. Sci. USA 104, 1777–1782 (2007).
23. Swainston, N. et al. Recon 2.2: from reconstruction to model of human metabolism. Metabolomics 12, 109
(2016).
24. Pornputtapong, N., Nookaew, I. & Nielsen, J. Human metabolic atlas: an online resource for human
metabolism. Database 2015, bav068 (2015).
25. Zielinski, D. C. et al. Systems biology analysis of drivers underlying hallmarks of cancer cell metabolism.
Sci. Rep. 7, 41241 (2017).
26. Mardinoglu, A. et al. Genome-scale metabolic modelling of hepatocytes reveals serine deficiency in patients
with non-alcoholic fatty liver disease. Nat. Commun. 5, 3083 (2014).
27. Karlstädt, A. et al. CardioNet: a human metabolic network suited for the study of cardiomyocyte metabolism. BMC Syst. Biol. 6, 114 (2012).
28. Gille, C. et al. HepatoNet1: a comprehensive metabolic reconstruction of the human hepatocyte for the
analysis of liver physiology. Mol. Syst. Biol. 6, 411 (2010).
29. Martins Conde Pdo, R., Sauter, T. & Pfau, T. Constraint based modeling going multicellular. Front. Mol.
Biosci. 3, 3 (2016).
30. Bordbar, A. et al. A multi-tissue type genome-scale metabolic network for analysis of whole-body systems
physiology. BMC Syst. Biol. 5, 180 (2011).
31. Yizhak, K. et al. Phenotype-based cell-specific metabolic modeling reveals metabolic liabilities of cancer.
Elife 3, e03641 (2014).
32. Mardinoglu, A. et al. Integration of clinical data with a genome-scale metabolic model of the human
adipocyte. Mol. Syst. Biol. 9, 649 (2013).
33. Bordbar, A. et al. Personalized whole-cell kinetic models of metabolism for discovery in genomics and
pharmacodynamics. Cell Syst. 1, 283–292 (2015).
34. Shoaie, S. et al. Quantifying diet-induced metabolic changes of the human gut microbiome. Cell Metab. 22,
320–331 (2015).
35. Nogiec, C. D. & Kasif, S. To supplement or not to supplement: a metabolic network framework for human
nutritional supplements. PLoS ONE 8, e68751 (2013).
36. Heinken, A., Sahoo, S., Fleming, R. M. T. & Thiele, I. Systems-level characterization of a host-microbe
metabolic symbiosis in the mammalian gut. Gut Microbes 4, 28–40 (2013).
37. Heinken, A. et al. Functional metabolic map of Faecalibacterium prausnitzii, a beneficial human gut
microbe. J. Bacteriol. 196, 3289–3302 (2014).
38. Magnúsdóttir, S. et al. Generation of genome-scale metabolic reconstructions for 773 members of the
human gut microbiota. Nat. Biotechnol. 35, 81–89 (2017).
39. Lakshmanan, M., Koh, G., Chung, B. K. S. & Lee, D.-Y. Software applications for flux balance analysis. Brief
Bioinform. 15, 108–122 (2014).
40. Ebrahim, A., Lerman, J. A., Palsson, B. O. & Hyduke, D. R. COBRApy: constraints-based reconstruction and
analysis for Python. BMC Syst. Biol. 7, 74 (2013).
41. Arkin, A. P. et al. The United States Department of Energy Systems Biology Knowledgebase. Nat. Biotechnol. 36, 566–569 (2018).
42. Heirendt, L., Thiele, I. & Fleming, R. M. T. DistributedFBA.jl: high-level, high-performance flux balance
analysis in Julia. Bioinformatics 33, 1421–1423 (2017).
43. Latendresse, M., Krummenacker, M., Trupp, M. & Karp, P. D. Construction and completion of flux balance
models from pathway databases. Bioinformatics 28, 388–396 (2012).
44. Karp, P. D. et al. Pathway Tools version 19.0 update: software for pathway/genome informatics and systems
biology. Brief Bioinform. 17, 877–890 (2016).
45. Sandve, G. K., Nekrutenko, A., Taylor, J. & Hovig, E. Ten simple rules for reproducible computational
research. PLoS Comput. Biol. 9, e1003285 (2013).
46. Ince, D. C., Hatton, L. & Graham-Cumming, J. The case for open computer programs. Nature 482, 485–488
(2012).
47. Gevorgyan, A., Bushell, M. E., Avignone-Rossa, C. & Kierzek, A. M. SurreyFBA: a command line tool and
graphics user interface for constraint-based modeling of genome-scale metabolic reaction networks.
Bioinformatics 27, 433–434 (2011).
48. Thorleifsson, S. G. & Thiele, I. rBioNet: a COBRA toolbox extension for reconstructing high-quality
biochemical networks. Bioinformatics 27, 2009–2010 (2011).
49. Sauls, J. T. & Buescher, J. M. Assimilating genome-scale metabolic reconstructions with modelBorgifier.
Bioinformatics 30, 1036–1038 (2014).
50. Noronha, A. et al. ReconMap: an interactive visualization of human metabolism. Bioinformatics 33, 605–607
(2017).
51. Gawron, P. et al. MINERVA—a platform for visualization and curation of molecular interaction networks.
npj Syst. Biol. Appl. 2, 16020 (2016).
52. Olivier, B. G., Rohwer, J. M. & Hofmeyr, J.-H. S. Modelling cellular systems with PySCeS. Bioinformatics 21,
560–561 (2005).
53. Gelius-Dietrich, G., Desouki, A. A., Fritzemeier, C. J. & Lercher, M. J. Sybil—efficient constraint-based
modelling in R. BMC Syst. Biol. 7, 125 (2013).
54. Ma, D. et al. Reliable and efficient solution of genome-scale models of metabolism and macromolecular
expression. Sci. Rep. 7, 40863 (2017).
55. Klamt, S., Saez-Rodriguez, J. & Gilles, E. D. Structural and functional analysis of cellular networks with
CellNetAnalyzer. BMC Syst. Biol. 1, 2 (2007).
56. Klamt, S. & von Kamp, A. An application programming interface for CellNetAnalyzer. Biosystems 105,
162–168 (2011).
57. Apaolaza, I. et al. An in-silico approach to predict and exploit synthetic lethality in cancer metabolism.
Nat. Commun. 8, 459 (2017).
58. Maranas, C. D. & Zomorrodi, A. R. Optimization Methods in Metabolic Networks (Wiley, New York,
2016).
59. Chowdhury, A., Zomorrodi, A. R. & Maranas, C. D. Bilevel optimization techniques in computational strain
design. Comp. Chem. Eng. 72, 363–372 (2015).
60. Thiele, I. et al. Multiscale modeling of metabolism and macromolecular synthesis in E. coli and its application to the evolution of codon usage. PLoS ONE 7, e45635 (2012).
61. Feist, A. M. et al. A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts
for 1260 ORFs and thermodynamic information. Mol. Syst. Biol. 3, 121 (2007).
62. Thiele, I., Jamshidi, N., Fleming, R. M. T. & Palsson, B. Ø. Genome-scale reconstruction of Escherichia coli’s
transcriptional and translational machinery: a knowledge base, its mathematical formulation, and its
functional characterization. PLoS Comput. Biol. 5, e1000312 (2009).
96. Lajtha, A. & Sylvester, V. Handbook of Neurochemistry and Molecular Neurobiology (Springer, Boston,
2008).
97. Schuster, S. & Hilgetag, C. On elementary flux modes in biochemical reaction systems at steady state. J. Biol. Syst. 2, 165–182 (1994).
98. Schilling, C. H., Letscher, D. & Palsson, B. Ø. Theory for the systemic definition of metabolic pathways and
their use in interpreting metabolic function from a pathway-oriented perspective. J. Theor. Biol. 203,
229–248 (2000).
99. Klamt, S. et al. From elementary flux modes to elementary flux vectors: metabolic pathway analysis with
arbitrary linear flux constraints. PLoS Comput. Biol. 13, e1005409 (2017).
100. Bordbar, A. et al. Minimal metabolic pathway structure is consistent with associated biomolecular interactions. Mol. Syst. Biol. 10, 737 (2014).
101. Gudmundsson, S. & Thiele, I. Computationally efficient flux variability analysis. BMC Bioinformatics 11,
489 (2010).
102. Haraldsdóttir, H. S., Cousins, B., Thiele, I., Fleming, R. M. T. & Vempala, S. CHRR: coordinate hit-and-run
with rounding for uniform sampling of constraint-based models. Bioinformatics 33, 1741–1743 (2017).
103. Cousins, B. & Vempala, S. Gaussian cooling and algorithms for volume and Gaussian volume. SIAM J.
Comput. 47, 1237–1273 (2018).
104. Cousins, B. & Vempala, S. A practical volume algorithm. Math. Prog. Comp. 8, 1–28 (2015).
105. Burgard, A. P., Pharkya, P. & Maranas, C. D. Optknock: a bilevel programming framework for identifying
gene knockout strategies for microbial strain optimization. Biotechnol. Bioeng. 84, 647–657 (2003).
106. Patil, K. R., Rocha, I., Förster, J. & Nielsen, J. Evolutionary programming as a platform for in silico metabolic
engineering. BMC Bioinformatics 6, 308 (2005).
107. Lun, D. S. et al. Large-scale identification of genetic design strategies using local search. Mol. Syst. Biol. 5,
296 (2009).
108. Ranganathan, S., Suthers, P. F. & Maranas, C. D. OptForce: an optimization procedure for identifying all
genetic manipulations leading to targeted overproductions. PLoS Comput. Biol. 6, e1000744 (2010).
109. Antoniewicz, M. R. et al. Metabolic flux analysis in a nonstationary system: fed-batch fermentation of a high
yielding strain of E. coli producing 1,3-propanediol. Metab. Eng. 9, 277–292 (2007).
110. Haraldsdóttir, H. S., Thiele, I. & Fleming, R. M. T. Comparative evaluation of open source software for
mapping between metabolite identifiers in metabolic network reconstructions: application to Recon 2.
J. Cheminform. 6, 2 (2014).
111. Preciat Gonzalez, G. A. et al. Comparative evaluation of atom mapping algorithms for balanced metabolic
reactions: application to Recon 3D. J. Cheminform. 9, 39 (2017).
112. Kim, S. et al. PubChem substance and compound databases. Nucleic Acids Res. 44, D1202–D1213 (2016).
113. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30
(2000).
114. Hastings, J. et al. The ChEBI reference database and ontology for biologically relevant chemistry:
enhancements for 2013. Nucleic Acids Res. 41, D456–D463 (2013).
115. Sud, M. et al. LMSD: LIPID MAPS structure database. Nucleic Acids Res. 35, D527–D532 (2007).
116. Forster, M., Pick, A., Raitner, M., Schreiber, F. & Brandenburg, F. J. The system architecture of the BioPath
system. In Silico Biol. 2, 415–426 (2002).
117. Williams, A. J., Tkachenko, V., Golotvin, S., Kidd, R. & McCann, G. ChemSpider—building a foundation for
the semantic web by hosting a crowd sourced databasing platform for chemistry. J. Cheminform. 2, O16
(2010).
118. Wishart, D. S. et al. HMDB: the Human Metabolome Database. Nucleic Acids Res. 35, D521–D526 (2007).
119. Rahman, S. A. et al. Reaction Decoder Tool (RDT): extracting features from chemical reactions. Bioinformatics 32, 2065–2066 (2016).
120. Kumar, A. & Maranas, C. D. CLCA: maximum common molecular substructure queries within the MetRxn
Database. J. Chem. Inf. Model. 54, 3417–3438 (2014).
121. Shimizu, Y., Hattori, M., Goto, S. & Kanehisa, M. Generalized reaction patterns for prediction of unknown
enzymatic reactions. Genome Inform. 20, 149–158 (2008).
122. Haraldsdóttir, H. S. & Fleming, R. M. T. Identification of conserved moieties in metabolic networks by
graph theoretical analysis of atom transition networks. PLoS Comput. Biol. 12, e1004999 (2016).
123. Klamt, S., Haus, U.-U. & Theis, F. Hypergraphs and cellular networks. PLoS Comput. Biol. 5, e1000385
(2009).
124. Fleming, R. M. T. & Thiele, I. von Bertalanffy 1.0: a COBRA toolbox extension to thermodynamically
constrain metabolic models. Bioinformatics 27, 142–143 (2011).
125. Fleming, R. M. T., Thiele, I. & Nasheuer, H. P. Quantitative assignment of reaction directionality in
constraint-based models of metabolism: application to Escherichia coli. Biophys. Chem. 145, 47–56 (2009).
126. Haraldsdóttir, H. S., Thiele, I. & Fleming, R. M. T. Quantitative assignment of reaction directionality in a
multicompartmental human metabolic reconstruction. Biophys. J. 102, 1703–1711 (2012).
127. Noor, E., Haraldsdóttir, H. S., Milo, R. & Fleming, R. M. T. Consistent estimation of Gibbs energy using
component contributions. PLoS Comput. Biol. 9, e1003098 (2013).
128. Fleming, R. M. T., Maes, C. M., Saunders, M. A., Ye, Y. & Palsson, B. Ø. A variational principle for
computing nonequilibrium fluxes and potentials in genome-scale biochemical networks. J. Theor. Biol. 292,
71–77 (2012).
Acknowledgements
The Reproducible Research Results (R3) team, in particular, C. Trefois and Y. Jarosz, of the Luxembourg Centre for Systems Biomedicine,
is acknowledged for their help in setting up the virtual machine and the Jenkins server. This study was funded by the National Centre of
Excellence in Research (NCER) on Parkinson’s disease, the U.S. Department of Energy, Offices of Advanced Scientific Computing
Research and the Biological and Environmental Research as part of the Scientific Discovery Through Advanced Computing program,
grant no. DE-SC0010429. This project also received funding from the European Union’s HORIZON 2020 Research and Innovation
Programme under grant agreement no. 668738 and the Luxembourg National Research Fund (FNR) ATTRACT program (FNR/A12/01)
and OPEN (FNR/O16/11402054) grants. N.E.L. was supported by NIGMS (R35 GM119850) and the Novo Nordisk Foundation
(NNF10CC1016517). M.A.P.O. was supported by the Luxembourg National Research Fund (FNR) grant AFR/6669348. A.R. was
supported by the Lilly Innovation Fellows Award. F.J.P. was supported by the Minister of Economy and Competitiveness of Spain
(BIO2016-77998-R) and the ELKARTEK Programme of the Basque Government (KK-2016/00026). I.A. was supported by a Basque
Author contributions
S.A.: continuous integration, code review, opencobra.github.io/cobratoolbox, Jenkins, Documenter.py, changeCobraSolver, pull request
support, tutorials, tests, coordination, manuscript, and initCobraToolbox. L.H.: continuous integration, code review, fastFVA (new
version, test, and integration), MATLAB.devTools, opencobra.github.io, tutorials, tests, pull request support, coordination, manuscript,
initCobraToolbox, and forum support. T.P.: input–output and transcriptomic integration, tutorials, tutorial reviews, input–output and
transcriptomic integration sections of manuscript, forum support, pull request support, and code review. S.N.M.: development and update
of strain design algorithms, GAMS and MATLAB integration, and tutorials. A.R.: transcriptomic data integration methods, tutorials,
transcriptomic integration section of manuscript, RuMBA, pFBA, metabolic tasks, and tutorial review. A.H.: multispecies modeling code
contribution, tutorial review, and testing. H.S. Haraldsdóttir: thermodynamics, conserved moiety, and sampling methods. J.W.: documentation. S.M.K.: SBML input–output support. V.V.: tutorials. S.M.: multispecies modeling, tutorial review, and testing. C.Y.N.: strain
design code review, tutorial review, and manuscript (OptForce/biotech introduction). G.P.: tutorials and chemoinformatics for metabolite
structures and atom mapping data. A.Ž.: metabolic cartography. S.H.J.C.: solution navigation, multispecies modeling code, and tutorial
review. M.K.A.: metabolomic data integration. C.M.C.: tutorials and testing. J.M.: metabolic cartography and human metabolic network
visualization tutorials. J.T.S.: modelBorgifier code and tutorial. A.N.: virtual metabolic human interoperability. A.B.: MinSpan method and
tutorial, supervision on uFBA method and tutorial. B.C.: CHRR uniform sampling. D.C.E.A.: tutorials. L.V.V.: tutorials and genetic MCS
implementation. I.A.: tutorials and genetic MCS implementation. S.G.: interoperability with CellNetAnalyzer. M.A.: adaptive
Levenberg–Marquardt solver. M.B.G.: tutorial reviews. A.K.: Paint4Net code and tutorial. N.S.: development of metabolomic cartography
tool and tutorial. H.M.L.: cardinality optimization solver. D.M.: quadruple-precision solvers. Y.S.: multiscale FBA reformulation. L.W.:
strain design code review, tutorial review, and manuscript (OptForce). J.T.Y.: uFBA method and tutorial. M.A.P.O.: tutorial. P.T.V.:
adaptive Levenberg–Marquardt solvers and boosted difference of convex optimization solver. L.P.E.A.: chemoinformatic data integration
and documentation. I.K.: development of metabolomic cartography tool and tutorial. A.Z.: development of metabolomic cartography tool
and tutorial. H.S. Hinton: E. coli core tutorials. W.A.B.: code refinement. F.J.A.A.: duplomonotone equation solver, boosted difference of
convex optimization solver, and adaptive Levenberg–Marquardt solvers. F.J.P.: academic supervision, tutorials, and genetic MCS
implementation. E.S.: academic supervision, Paint4Net, and tutorial. A.M.: academic supervision. S.V.: academic supervision and CHRR
uniform sampling algorithm. M.H.: academic supervision and SBML input–output support. M.A.S.: academic supervision, quadruple-
precision solvers, nullspace computation, and convex optimization. C.D.M.: academic supervision and strain design algorithms. N.E.L.:
academic supervision and coding, and transcriptomic data integration, RuMBA, pFBA, metabolic tasks, and tutorial review. T.S.:
academic supervision and FASTCORE algorithm. B.Ø.P.: academic supervision and openCOBRA stewardship. I.T.: academic supervision,
tutorials, code contribution, and manuscript. R.M.T.F.: conceptualization, lead developer, academic supervision, software architecture,
code review, sparse optimization, nullspace computation, thermodynamics, variational kinetics, fastGapFill, sampling, conserved moieties,
network visualization, forum support, tutorials, and manuscript.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41596-018-0098-2.
Reprints and permissions information is available at www.nature.com/reprints.
Correspondence and requests for materials should be addressed to R.M.T.F.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Key references using this protocol
Schellenberger, J. et al. Nat. Protocols 6, 1290–1307 (2011): https://www.nature.com/articles/nprot.2011.308
Becker, S. A. et al. Nat. Protocols 2, 727–738 (2007): https://www.nature.com/articles/nprot.2007.99