Microbial Systems Biology - Methods and Protocols (2021)
Microbial
Systems Biology
Methods and Protocols
Second Edition
METHODS IN MOLECULAR BIOLOGY
Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, UK
Edited by
Ali Navid
Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA, USA
This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature.
The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
Dedication
Preface
Although we have advanced the field of microbial systems biology significantly during the last decade, our foray into analyzing exciting and ever more complex systems, such as the gut microbiome or autotroph-bacterial interactions, requires examining all of the data at our disposal. Systems biology analyses today usually draw on multiple types of omics data for each study. This, of course, is a pleasant byproduct of the revolutionary advances in technologies that allow rapid and relatively inexpensive collection of system-level data.
Combining and analyzing these data requires increasingly complex computational models and state-of-the-art bioinformatics tools. Fortunately, the field of computational biology has kept up with technological advances. New and exciting tools are being developed that allow detailed analyses of disparate types of omics data. The results of these studies can then be used to quickly generate system-level models and to conduct analyses that at times can only be achieved in silico.
In this edition of the book, I have tried to introduce the reader to powerful and easy-to-use computational biology databases and tools (e.g., MetaFlux, KBase, COBRA Toolbox) that can significantly help researchers examine their system-level data. There are also chapters that deal with annotating genomes and using the collected information to conduct network analyses of the system. The book also introduces the reader to a number of specialized analytic tools (e.g., NanoSIP and PAMMS) that can significantly improve and add to the data available for systems biology studies. It is my hope that the information provided by the authors will inspire researchers to explore innovative analytical methods, examine their problems from multiple angles, and generate novel hypotheses that will lead to new scientific discoveries.
Unlike the first edition of this book, which I edited 8 years ago, this one took significantly longer than expected. This was due to a large extent to the last-minute withdrawal of contributions because of uncontrollable events. I want to express my gratitude to Dr. John Walker for his guidance and support during this process. I want to thank the authors for their contributions and patience during this prolonged process, especially Drs. Benjamin Stewart, Peter Karp, and their coauthors for having waited the longest for publication of their chapters. Finally, I want to acknowledge current and former staff from DOE's national laboratories, especially my colleagues at Lawrence Livermore National Laboratory, for their support and contributions.
Contents
Dedication
Preface
Contributors
Index
Contributors
BENJAMIN H. ALLEN • Oak Ridge National Laboratory, Oak Ridge, TN, USA
EIVIND ALMAAS • Department of Biotechnology, Norwegian University of Science & Technology, NTNU, Trondheim, Norway
ANDRE B. CANELAS • Department of Biotechnology, Delft University of Technology, Delft, The Netherlands
DAVID P. CHIMENTO • Rockland Immunochemicals Inc., Limerick, PA, USA
XIAO CONG • Center for Infectious and Inflammatory Diseases, Institute of Biosciences and Technology, Texas A&M Health Science Center, Houston, TX, USA
LODEWIJK P. DE JONGE • Department of Biotechnology, Delft University of Technology, Delft, The Netherlands
JASPREET KAUR DHANJAL • Department of Biochemical Engineering and Biotechnology, DBT-AIST International Laboratory for Advanced Biomedicine (DAILAB), Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
RUTGER D. DOUMA • Department of Biotechnology, Delft University of Technology, Delft, The Netherlands
JANAKA N. EDIRISINGHE • Argonne National Laboratory, Lemont, IL, USA
JOSÉ P. FARIA • Argonne National Laboratory, Lemont, IL, USA
MARC GRIESEMER • Controls and Data Systems Division, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
NIDHI GUPTA • Argonne National Laboratory, Lemont, IL, USA
JOSEPH J. HEIJNEN • Department of Biotechnology, Delft University of Technology, Delft, The Netherlands
CHRISTOPHER S. HENRY • Argonne National Laboratory, Lemont, IL, USA
BRENDAN M. JEFFREY • Bioinformatics and Computational Biosciences Branch, Rocky Mountain Laboratories, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT, USA
PETER D. KARP • SRI International, Menlo Park, CA, USA
HAROLD D. KIM • School of Physics, Georgia Institute of Technology, Atlanta, GA, USA
JEFFREY A. KIMBREL • Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA, USA
ARTHUR LAGANOWSKY • Center for Infectious and Inflammatory Diseases, Institute of Biosciences and Technology, Texas A&M Health Science Center, Houston, TX, USA; Department of Chemistry, Texas A&M University, College Station, TX, USA
MARIO LATENDRESSE • SRI International, Menlo Park, CA, USA
XIAOWEN LIANG • Center for Infectious and Inflammatory Diseases, Institute of Biosciences and Technology, Texas A&M Health Science Center, Houston, TX, USA
WEN LIU • Center for Infectious and Inflammatory Diseases, Institute of Biosciences and Technology, Texas A&M Health Science Center, Houston, TX, USA
YANG LIU • Center for Infectious and Inflammatory Diseases, Institute of Biosciences and Technology, Texas A&M Health Science Center, Houston, TX, USA; Department of Chemistry, Texas A&M University, College Station, TX, USA
ANITA MÄKI • Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland
Abstract
Parallel accelerator and molecular mass spectrometry (PAMMS) is a powerful analytical technique capable of simultaneous quantitation of a carbon-14 tracer and structural characterization of 14C-labeled biomolecules. Here we describe the use of PAMMS for the analysis of biological molecules separated by high-performance liquid chromatography. This protocol is intended to serve as a guide for researchers who need to perform PAMMS experiments using instrumentation available at resource centers such as the National User Resource for Biological Accelerator Mass Spectrometry at Lawrence Livermore National Laboratory.
Key words Accelerator mass spectrometry, Radiocarbon, Liquid sample interface, Isotope ratio mass spectrometry
1 Introduction
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349, https://doi.org/10.1007/978-1-0716-1585-0_1, © Springer Science+Business Media, LLC, part of Springer Nature 2022
Benjamin J. Stewart and Ted J. Ognibene
2.1 Deciding to Use PAMMS
PAMMS has unique capabilities and limitations that must be understood before deciding on the most suitable analytical method for a desired experiment. For many researchers, analysis cost is an important consideration. Access to PAMMS through the National User Resource for Biological Accelerator Mass Spectrometry is free for nearly all investigators. With this in mind, the central questions a researcher needs to answer are:
Parallel Accelerator and Molecular Mass Spectrometry Measurement. . .
Fig. 1 Decision flowchart for choosing between PAMMS and other analytical methods for sample measurement.
2.2 Designing an Experiment
Although each PAMMS experiment is unique and must be tailored to the specific scientific question of interest, some basic principles can be applied to all PAMMS experiments to ensure that results are correct and useful. Investigators should answer the following questions:
2.5 Mobile Phase Preparation
In general, analytes and mobile phases that are amenable to measurement by LC-MS electrospray ionization are also suitable for PAMMS analysis. All reagents should be LC-MS grade if available. Mobile phases should be prepared fresh in bottles not used for other laboratory procedures. When cleaning dedicated LC-MS glassware, avoid detergents, as they can contaminate the mass spectrometer. Avoid reagents containing nonvolatile salts such as sodium chloride and phosphates: these salts do not evaporate in the drying oven and will precipitate in the combustion oven, clogging the aperture and breaking the wire.
3 Methods
3.1 Making PAMMS Measurements
After samples have been prepared for PAMMS measurement and it has been determined that the activity is within the range of AMS detection, samples can be measured by PAMMS as follows:
1. Set up the molecular mass spectrometer. This step includes
calibration, lockmass setup, ionization mode selection, and
other instrument-specific parameters.
2. Set up the HPLC system and load the appropriate separation
method. Prepare fresh mobile phase(s) as needed for the
3.2 Interpreting the Data
PAMMS data consist of data collected from three measurement instruments: the HPLC, the molecular mass spectrometer, and the AMS. The HPLC and molecular mass spectrometer data sets are collected together and can be analyzed using the same data analysis software package, whereas the AMS results must be analyzed separately and then linked to the HPLC and molecular mass spectrometry data. The best way to do this is to plot the time and intensity of each signal together using a plotting package. To overlay peaks, the time scale of every instrument from which data have been collected must be corrected. Data merging and analysis can be performed as follows:
1. Determine the time delay between instruments. Each analyte has a specific column retention time but reaches the molecular mass spectrometer and the AMS detector at different times because of differences in flow path lengths. If the analyte of interest is detected by the HPLC detector, align the chromatograms by subtracting the time difference between detection by the HPLC detector and detection by the molecular mass spectrometer from the molecular mass spectrometer time scale. Perform the same procedure to align the AMS chromatogram. If the analyte of interest is invisible to the HPLC detector but visible to the molecular mass spectrometer, it is only necessary to align the molecular mass spectrometer chromatogram with the AMS detector chromatogram. Depending on the complexity of the molecular mass spectrometer total ion chromatogram, it may be more useful to use the extracted ion chromatogram rather than the total ion chromatogram for alignment.
2. Quantify HPLC peaks using standard analysis methods.
3. Verify the identity of analytes using molecular mass spectra and
MS/MS spectra. Quantify analytes using standard analysis
methods for molecular mass spectrometry.
4. Using liquid sample AMS analysis software, bin and integrate
carbon-14 and stable carbon peaks for each analyte of interest
to calculate carbon-14 and stable carbon content.
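Steps 1 and 4 can be sketched in a few lines of Python. This is a minimal illustration, not the liquid sample AMS analysis software: it assumes each chromatogram has been exported as parallel lists of times (min) and intensities, estimates the inter-instrument delay as the difference between the apex times of the same analyte peak, and integrates a peak with the trapezoidal rule. The function names and the synthetic Gaussian peaks are hypothetical.

```python
import math


def apex_time(times, signal):
    """Return the time at which a chromatogram reaches its maximum intensity."""
    i = max(range(len(signal)), key=signal.__getitem__)
    return times[i]


def instrument_delay(ref_times, ref_signal, times, signal):
    """Delay of one detector relative to a reference detector, estimated
    from the apex of the same analyte peak in both chromatograms."""
    return apex_time(times, signal) - apex_time(ref_times, ref_signal)


def shift(times, delay):
    """Subtract a constant delay from a time axis (step 1)."""
    return [t - delay for t in times]


def integrate(times, signal, start, stop):
    """Trapezoidal peak area between start and stop (step 4)."""
    pts = [(t, y) for t, y in zip(times, signal) if start <= t <= stop]
    return sum((t1 - t0) * (y0 + y1) / 2
               for (t0, y0), (t1, y1) in zip(pts, pts[1:]))


def gaussian(times, center, height, width):
    """Synthetic peak used only for this illustration."""
    return [height * math.exp(-0.5 * ((t - center) / width) ** 2) for t in times]


# Hypothetical data: 0-20 min at 0.01 min resolution; the AMS sees the
# analyte 0.6 min after the HPLC detector because of a longer flow path.
times = [i * 0.01 for i in range(2001)]
hplc = gaussian(times, center=10.0, height=1.0, width=0.1)
ams = gaussian(times, center=10.6, height=1.0, width=0.1)

delay = instrument_delay(times, hplc, times, ams)
ams_times = shift(times, delay)              # AMS trace on the HPLC time scale
area = integrate(ams_times, ams, 9.5, 10.5)  # window of +/-5 peak widths

print(f"delay = {delay:.2f} min, aligned apex = {apex_time(ams_times, ams):.2f} min")
```

For peaks that are not baseline-resolved, or whose shapes differ between detectors, an apex-based delay may be less robust than a centroid or cross-correlation estimate; which to use is a judgment for the analyst.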
Acknowledgments
References
1. Sporty J, Lin SJ, Kato M, Ognibene T, 7. Brown K, Dingley KH, Turteltaub KW (2005)
Stewart B, Turteltaub K et al (2009) Quantita- Accelerator mass spectrometry for biomedical
tion of NAD+ biosynthesis from the salvage research. Methods Enzymol 402:423–443
pathway in Saccharomyces cerevisiae. Yeast 8. Brown K, Tompkins EM, White IN (2006)
26:363–369 Applications of accelerator mass spectrometry
2. Sporty JL, Kabir MM, Turteltaub KW, for pharmacological and toxicological research.
Ognibene T, Lin SJ, Bench G (2008) Single Mass Spectrom Rev 25:127–145
sample extraction protocol for the quantifica- 9. Turteltaub KW, Vogel JS (2000) Bioanalytical
tion of NAD and NADH redox states in Sac- applications of accelerator mass spectrometry
charomyces cerevisiae. J Sep Sci 31:3202–3211 for pharmaceutical research. Curr Pharm
3. Stewart BJ, Navid A, Turteltaub KW, Bench G Design 6:991–1007
(2010) Yeast dynamic metabolic flux measure- 10. Thomas AT, Ognibene T, Daley P,
ment in nutrient-rich media by HPLC and Turteltaub K, Radousky H, Bench G (2011)
accelerator mass spectrometry. Anal Chem Ultrahigh efficiency moving wire combustion
82:9812–9817 interface for online coupling of high-
4. Stewart BJ, Navid A, Kulp KS, Knaack JL, performance liquid chromatography (HPLC).
Bench G (2013) D-Lactate production as a Anal Chem 83:9413–9417
function of glucose metabolism in Saccharo- 11. Thomas AT, Stewart BJ, Ognibene TJ, Turtel-
myces cerevisiae. Yeast 30:81–91 taub KW, Bench G (2013) Directly coupled
5. Koarashi J, Iida T, Moriizumi J, Asano T high-performance liquid chromatography-
(2004) Evaluation of 14C abundance in soil accelerator mass spectrometry measurement
respiration using acclerator mass spectrometry. of chemically modified protein and peptides.
J Environ Radioact 75:117–132 Anal Chem 85:3644–3650
6. Kunioka M, Ninomiya F, Funabashi M (2009) 12. Sacks GL, Derry LA, Brenna JT (2006) Ele-
Biodegradation of poly(butylene succinate) mental speciation by parallel elemental and
powder in a controlled compost at 58 degrees molecular mass spectrometry and peak profile
C evaluated by naturally-occurring carbon matching. Anal Chem 78:8445–8455
14 amounts in evolved CO(2) based on the 13. Buchholz BA, Freeman SPHT, Haack KW,
ISO 14855-2 method. Int J Mol Sci Vogel JS (2000) Tips and traps in the 14C
10:4267–4283 bio-AMS preparation laboratory. Nucl Instrum
Meth Phys Res B 172:404–408
Chapter 2
Abstract
Obtaining meaningful snapshots of the metabolome of microorganisms requires rapid sampling and immediate quenching of all metabolic activity to prevent any changes in metabolite levels after sampling. Furthermore, a suitable extraction method is required that ensures complete extraction of metabolites from the cells and inactivation of enzymatic activity, with minimal degradation of labile compounds. Finally, a sensitive, high-throughput analysis platform is needed to quantify a large number of metabolites in a small amount of sample. An issue that has often been overlooked in microbial metabolomics is that many intracellular metabolites are also present in significant amounts outside the cells and may interfere with the quantification of the endometabolome. Attempts to remove the extracellular metabolites with dedicated quenching methods often induce release of intracellular metabolites into the quenching solution. For eukaryotic microorganisms, this release can be minimized by adapting the quenching method. For prokaryotic cells, this has not yet been accomplished, so the most suitable approach appears to be a differential method in which metabolites are measured both in the culture supernatant and in total broth samples and the intracellular levels are calculated by subtraction. Here we present an overview of the sampling, quenching, and extraction methods for microbial metabolomics that have been described in the literature. Detailed protocols are provided for rapid sampling, quenching, and extraction, and for measurement of metabolites in total broth samples, washed cell samples, and supernatant, to be applied in quantitative metabolomics of both eukaryotic and prokaryotic microorganisms.
1 Introduction
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349, https://doi.org/10.1007/978-1-0716-1585-0_2, © Springer Science+Business Media, LLC, part of Springer Nature 2022
Walter M. van Gulik et al.
1.1 Method Development
1.1.1 Methods for Rapid Sampling and Quenching
It is well known that many metabolites, especially the intermediates of the central metabolic pathways and connected cofactors such as ATP and NADH, have turnover times on the order of seconds, as can be calculated from their in vivo pool sizes and conversion rates. This implies that a proper snapshot of the intracellular metabolite levels can only be obtained if sampling and subsequent arrest of metabolic activity are sufficiently fast, that is, significantly faster than the turnover times of the metabolite pools.
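The turnover time referred to above is the pool size divided by the flux through the pool. The minimal Python sketch below uses hypothetical numbers (not measured values from this chapter) to show how a turnover time translates into a required quenching speed; the tenfold safety margin used to interpret "significantly faster" is likewise an assumption.

```python
def turnover_time_s(pool_umol_per_gdw: float, flux_umol_per_gdw_per_s: float) -> float:
    """Turnover time (s) = in vivo pool size / conversion rate through the pool."""
    if flux_umol_per_gdw_per_s <= 0:
        raise ValueError("flux must be positive")
    return pool_umol_per_gdw / flux_umol_per_gdw_per_s


# Hypothetical pool size and flux for a glycolytic intermediate.
tau = turnover_time_s(pool_umol_per_gdw=2.0, flux_umol_per_gdw_per_s=1.0)

SAFETY_FACTOR = 10  # assumed interpretation of "significantly faster"
max_quench_time_s = tau / SAFETY_FACTOR

print(f"turnover time: {tau:.1f} s; quench within ~{max_quench_time_s:.2f} s")
```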
Biochemists have been aware of this for many decades, as can be inferred from publications from the early 1960s and 1970s, e.g., on the quenching and extraction of rat liver tissue [5–8]. Some of the early work on metabolite measurements in microbial cells already emphasized the importance of arresting all metabolic activity as fast as possible. To measure the ATP levels in fermentor cultures of E. coli under different growth conditions, Cole et al. [9] took 2 mL broth samples from the fermentor directly into ice-cold perchloric acid, achieving simultaneous quenching and extraction. Although this procedure diluted the sample, the dilution was not a problem in this case because the authors used a sensitive luciferase-based assay to measure the ATP level. Another key disadvantage of combining quenching with extraction of complete culture samples is that no distinction can be made between the metabolites present in the cells and those in the supernatant. Partly for this reason, but also because most metabolite assays (in the past mainly enzyme-based) were relatively insensitive, a separation step, i.e., filtration or centrifugation, has often been applied, followed by resuspension in a small volume of medium prior to quenching, for example, in cold
Fast Sampling of the Cellular Metabolome
Fig. 1 Schematic overview of two different sampling procedures, left panel: rapid sampling and conventional
cold methanol quenching combined with cold centrifugation and centrifugation-based washing; right panel:
rapid sampling and cold methanol quenching combined with cold filtration and filtration-based washing
(Figure from Douma et al. [12])
Fig. 2 Measurements carried out in different sample fractions to enable a mass balance-based approach for
quantification of metabolite leakage during quenching (Figure from Canelas et al. [16])
1.1.2 Fast Sampling Devices
Probably the first attempt at rapid sampling from a laboratory-scale bioreactor was reported in 1969 by Harrison and Maitra [17]. Sampling was performed via a port in the base plate of the reactor. To remove the broth from the dead volume of the sampling port prior to withdrawal of the sample, 5 mL of culture was allowed to flow to waste shortly before sampling. The authors measured the sampling time and the subsequent time required to fully mix the sample with the quenching solution by sampling 9 mL of a 10 M alkali solution into 1 mL of concentrated HCl in a test tube to which a Thymol Blue indicator had been added. The sampling procedure was recorded with a cine-camera at 67 frames/s. In this way, they determined that the maximum time interval between removal of the sample from the culture vessel and its contact with the quenching solution in the sample tube was approximately 0.1 s. Subsequent mixing with the quenching solution took about 0.08 s. This rapid sampling method was applied to measure the levels of the adenine nucleotides and some intermediates of central metabolism in chemostat cultures of Klebsiella aerogenes under different oxygen supply conditions and in response to substrate pulses.
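The camera frame rate sets the time resolution of these measurements. The Python arithmetic below uses only the values quoted above and shows that the reported 0.1 s and 0.08 s intervals span roughly 7 and 5 frames, so each estimate is resolved to about one frame (about 15 ms). Treating one frame as the resolution limit is an assumption of this sketch.

```python
FRAME_RATE_FPS = 67.0   # cine-camera frame rate reported above
TRANSFER_TIME_S = 0.1   # sample removal -> contact with quenching solution
MIXING_TIME_S = 0.08    # subsequent mixing with the quenching solution

frame_interval_ms = 1000.0 / FRAME_RATE_FPS          # time between frames
frames_transfer = TRANSFER_TIME_S * FRAME_RATE_FPS   # frames spanning the transfer
frames_mixing = MIXING_TIME_S * FRAME_RATE_FPS       # frames spanning the mixing

print(f"{frame_interval_ms:.1f} ms/frame; transfer ~{frames_transfer:.1f} frames; "
      f"mixing ~{frames_mixing:.1f} frames")
```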
To avoid contaminating the sample with the contents of the dead volume of the sampling valve, Iversen [18] constructed a rapid sampling valve in which the remaining broth was removed from the dead space of the valve after each sampling
Fig. 3 Examples of results of the balancing approach for quantification of metabolite leakage during the cold methanol quenching procedure: (F2) amount measured in the filtrate, ICcal (= B − F2) calculated amount in the cell pellet, (WS) measured amount in the washing solution, (QS) measured amount in the quenching solution, (IC) measured amount in the biomass pellet. Bars represent the averages, with their standard errors, of four replicate samples taken from two independent chemostat experiments, analyzed in duplicate (Figure from Taymaz-Nikerel et al. [15])
Fig. 4 Workflow of the differential method for intracellular metabolite quantification (Figure from Taymaz-Nikerel et al. [15])
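The differential method reduces to the subtraction in Fig. 3, ICcal = B − F2: the amount in the total broth sample minus the amount in the culture filtrate taken from the same culture. A minimal Python sketch follows. Combining the standard errors in quadrature assumes that the two measurements are independent, and the numbers are hypothetical.

```python
import math


def intracellular_by_difference(broth, filtrate, se_broth=0.0, se_filtrate=0.0):
    """IC_cal = B - F2, with the standard error of the difference.

    broth, filtrate: metabolite amounts in the total broth sample (B) and in
    the culture filtrate (F2), expressed on the same basis (e.g., umol/gDW).
    Errors are combined in quadrature, assuming B and F2 are independent.
    """
    ic = broth - filtrate
    se = math.sqrt(se_broth ** 2 + se_filtrate ** 2)
    return ic, se


# Hypothetical values for one metabolite (umol/gDW).
ic, se = intracellular_by_difference(5.0, 3.0, se_broth=0.3, se_filtrate=0.4)
print(f"IC_cal = {ic:.1f} +/- {se:.1f} umol/gDW")
```

Because the absolute error of the difference does not shrink as ICcal gets smaller, the relative error grows when F2 approaches B, that is, when most of a metabolite is extracellular.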
Even with this system, however, the sampling frequency could not be increased much above one sample per 5 s because of the many manual handling steps involved. Therefore, Schaefer et al. [22] developed a completely automated sampling device in which the sampling tubes were fixed in transport racks that were moved by a stepper motor underneath a continuous jet of sample, with a flow rate of 3.3 mL/s, from a stirred-tank bioreactor. In this way, the sampling tubes containing the quenching solution could be filled within 220 ms, resulting in a sampling rate of approximately 4.5 samples per second. This automated rapid sampling device was applied to investigate the intracellular metabolite dynamics of glycolysis in Escherichia coli after rapid glucose addition to a glucose-limited steady-state culture.
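The figures quoted for the device of Schaefer et al. are mutually consistent, as a short Python check shows: at 3.3 mL/s, a 220 ms fill delivers about 0.73 mL of broth per tube, and back-to-back fills give about 4.5 samples per second. The sketch assumes that no time is lost between tubes, which is an idealization.

```python
FLOW_ML_PER_S = 3.3   # continuous jet of broth from the bioreactor
FILL_TIME_S = 0.220   # time to fill one tube containing quenching solution

sample_volume_ml = FLOW_ML_PER_S * FILL_TIME_S  # broth delivered per tube
samples_per_s = 1.0 / FILL_TIME_S               # assumes no gap between tubes

print(f"{sample_volume_ml:.2f} mL per tube, {samples_per_s:.2f} samples/s")
```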
A completely different approach to increasing the sampling frequency was developed by Weuster-Botz et al. [23]. The basic idea was to perform sampling, inactivation of metabolic activity, and extraction of intracellular metabolites continuously in a tube connected to a well-controlled bioreactor. In this way, the highly dynamic metabolite patterns resulting from a sudden disturbance of the culture in the reactor were fixed at a certain position in the sampling tube. The system consisted of a custom-made sampling probe with an inlet of 4 mm diameter, which contained a second inlet for continuous supply of quenching/extraction solution, and an outlet of 8 mm diameter connected to the sampling tube. Cold (−40 °C) perchloric acid was used as the quenching/extraction solution, which was mixed with the sample 3 mm from the entrance of the sampling probe. The sampling tube was made of polyethylene with an internal diameter of 8 mm and a total length of 100 m and was wound into a coil with a diameter of 0.5 m.
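Because sample and quenching solution travel through the tube together, distance along the tube encodes the time elapsed since sampling. The Python sketch below computes the tube volume from the stated dimensions and converts a position into a time under two assumptions not given in the text: ideal plug flow (no axial dispersion) and a hypothetical combined flow rate of 10 mL/s.

```python
import math

TUBE_ID_CM = 0.8            # internal diameter, 8 mm
TUBE_LENGTH_CM = 100 * 100  # total length, 100 m


def tube_volume_ml(length_cm, inner_diameter_cm=TUBE_ID_CM):
    """Internal volume (mL = cm^3) of a cylindrical tube segment."""
    return math.pi * (inner_diameter_cm / 2) ** 2 * length_cm


def time_at_position_s(position_cm, total_flow_ml_per_s):
    """Time since mixing for material at position_cm, assuming plug flow."""
    return tube_volume_ml(position_cm) / total_flow_ml_per_s


FLOW_ML_PER_S = 10.0  # hypothetical combined flow of broth + quenching solution

total_volume_ml = tube_volume_ml(TUBE_LENGTH_CM)            # about 5.0 L
seconds_per_metre = time_at_position_s(100, FLOW_ML_PER_S)  # about 5 s per metre
time_window_s = time_at_position_s(TUBE_LENGTH_CM, FLOW_ML_PER_S)

print(f"tube volume {total_volume_ml / 1000:.2f} L; "
      f"{seconds_per_metre:.2f} s per metre; window {time_window_s:.0f} s")
```

The sketch ignores the 3 mm offset between the probe entrance and the mixing point, which is negligible at this scale.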
Fig. 6 Operation principle of a rapid sampling device for fungal cultivations (Figure from Lameiras et al. [31])
1.1.3 Methods for Fast Quenching of Metabolism
All of the fast sampling methods discussed above have been applied in combination with particular quenching procedures. It should be realized, however, that many different combinations of
1.1.4 Extraction of Metabolites from Quenched Cell Samples
The next step in the procedure is the extraction of the metabolites from the quenched sample. Ideally, the extraction procedure should result in unbiased and complete extraction of all metabolites from the cells, should not lead to conversion and/or degradation of metabolites during extraction and subsequent sample processing, and should be compatible with the analysis methods to be applied. Extraction can be achieved using high temperature, extreme pH, organic solvents, mechanical stress, or combinations of these. Well-known methods that have been employed since the 1950s are extraction in perchloric acid [36, 37], hot water [38, 39], and boiling ethanol/water [40, 41]. More recently, the tendency has been to apply milder extraction methods to prevent degradation of metabolites as much as possible. In these methods, extraction is carried out at low temperatures, sometimes combined with repeated freezing and thawing to disintegrate the cells. Examples are cold chloroform/methanol extraction [11], freeze-thawing in methanol [42], and cold acetonitrile-methanol extraction [43]. A quantitative evaluation of different extraction methods for metabolome analysis of yeast has been published by Canelas et al. [44]. In this study, 13C-labeled internal standards were added at different stages of sample processing to determine the metabolite recoveries. Canelas et al. concluded that the boiling ethanol/water and chloroform/methanol extraction methods performed best in terms of efficacy and metabolite recoveries. Methods that do not ensure complete enzyme inactivation, e.g., freeze-thawing in methanol, significantly affected the outcome of the metabolome measurements because of enzymatic conversion of metabolites in the samples. Metabolite recoveries upon extraction of yeast cells with acidic acetonitrile-methanol appeared low for larger and more polar metabolites (see Fig. 7).
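The recovery calculation behind such an evaluation can be sketched as follows: a known amount of a 13C-labeled standard is spiked at a chosen stage, the fraction found at the end is its recovery, and, to the extent that labeled and unlabeled forms behave identically, the same fraction applies to the natural metabolite. This is a generic Python illustration with hypothetical numbers and function names, not the exact procedure of Canelas et al. [44].

```python
def recovery(measured_label, added_label):
    """Fraction of a spiked 13C-labeled standard recovered after processing."""
    if added_label <= 0:
        raise ValueError("added amount must be positive")
    return measured_label / added_label


def recovery_corrected(measured_analyte, measured_label, added_label):
    """Correct an unlabeled metabolite amount for losses during processing,
    assuming labeled and unlabeled forms are lost to the same extent."""
    return measured_analyte / recovery(measured_label, added_label)


# Hypothetical example (nmol): 1.0 spiked before extraction, 0.8 recovered.
r = recovery(measured_label=0.8, added_label=1.0)
corrected = recovery_corrected(measured_analyte=4.0, measured_label=0.8, added_label=1.0)
print(f"recovery {r:.0%}; corrected amount {corrected:.2f} nmol")
```

Comparing recoveries for standards spiked at different stages (e.g., before quenching versus after extraction) indicates where in the workflow the losses occur.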
Fig. 7 Overall process recoveries for 44 metabolites analyzed in yeast, in order of increasing molecular weight,
for each of the extraction methods, under two growth conditions, chemostat and batch cultivation. Data are
averages and standard deviations of duplicate samples each analyzed twice. Legend: ∇, chemostat; Δ, batch
(Figure from Canelas et al. [44])
1.1.5 Analytical Procedures
Finally, high-throughput analysis methods are required for selective and precise quantification of a large variety of metabolites. In the past, almost exclusively enzyme-based methods were used [45]. These have the advantage of being very specific for a particular metabolite, but the disadvantages that a different assay is required for each metabolite and that some of the required enzymes might not be commercially available. With the improvement of GC and HPLC techniques, these have therefore increasingly been used. During the last decade, sensitive high-throughput mass spectrometry-based methods (mainly GC-MS and LC-MS/MS) have enabled the measurement of large numbers of different metabolites in a small amount of sample. Especially through the use of U-13C-labeled internal standards, which enable isotope dilution mass spectrometry (IDMS), the precision of MS-based metabolome measurements has increased significantly [46, 48].
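In IDMS, every sample and calibration standard receives the same amount of U-13C-labeled extract, and quantification is based on the ratio of the unlabeled (12C) to the labeled (13C) peak area. Matrix effects and instrument drift that affect both isotopologues equally therefore cancel out. A minimal Python sketch of a linear ratio calibration follows; the calibration points and peak areas are hypothetical, and a straight-line calibration is an assumption that should be checked for each metabolite.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def idms_concentration(area_12c, area_13c, slope, intercept):
    """Convert a 12C/13C peak-area ratio into a concentration via the calibration."""
    ratio = area_12c / area_13c
    return (ratio - intercept) / slope


# Hypothetical calibration: standards (uM), each spiked with the same 13C extract.
std_conc_um = [0.5, 1.0, 2.0, 5.0]
std_ratio = [0.25, 0.50, 1.00, 2.50]
slope, intercept = fit_line(std_conc_um, std_ratio)

# Hypothetical sample peak areas for the unlabeled and labeled isotopologues.
conc = idms_concentration(area_12c=3000.0, area_13c=2000.0,
                          slope=slope, intercept=intercept)
print(f"sample concentration: {conc:.2f} uM")
```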
2 Materials
2.1 Cold Methanol Quenching Combined with Cold Centrifugation
1. Rapid sampling setup (see Note 1), e.g., the system published by Lange et al. (for a complete description, see ref. 21).
2. Cryostat, filled with a suitable cryo liquid (e.g., ethylene glycol) and capable of reaching a temperature of −40 °C.
3. 60% (v/v) methanol/water mixture.
4. Appropriate test tubes (e.g., polypropylene (PP) tubes of 14 mL, 17 mm diameter) with caps.
5. Cooled laboratory centrifuge capable of reaching a temperature of at least −20 °C.
6. A −40 °C freezer to pre-cool the centrifuge rotor.
Precautions:
Methanol and ethylene glycol (the most commonly used cool-
ing fluid) are toxic substances. Always wear (impermeable) gloves
and safety goggles when manipulating the samples, and avoid con-
taminating surfaces and equipment.
2.2 Additional Materials for Cold Methanol Quenching Combined with Cold Filtration
1. For fungal cultures: glass fiber filters (e.g., type A/E, Pall Corporation, East Hills, NY, USA, 47 mm diameter, 1 μm pore size). For yeast and bacterial cultures: hydrophilic polyethersulfone (PES) membrane filter with a pore size of 0.2–0.45 μm (e.g., Supor, Pall, USA).
2. Peristaltic pump capable of reaching a flow rate of at least 300 mL/min.
3. Filtration setup with vacuum pump.
4. Balance.
5. Water bath at 70 °C.
2.3 Rapid Sampling of Culture Filtrate
1. Plastic syringes with a volume of 10, 30, or 60 mL (depending on the sample volume required).
2. Stainless steel beads with a diameter of 4 mm.
3. Syringe filters with a pore size of 0.45 μm, e.g., Millex HV (Millipore, Cork, Ireland).
3 Methods
3.1 Rapid Sampling for Endometabolome Analysis: Cold Centrifugation Method
This protocol is typically suited for rapid sampling of microorganisms that show negligible leakage of metabolites into the quenching solution. Because this method includes a centrifugation and a washing step, the metabolites present in the cultivation medium are removed. This allows proper quantification of the intracellular metabolites without interference from the exometabolome. It should be noted that although the concentrations of metabolites in the medium are usually much lower than within the cells, the amount of extracellular metabolites in a broth sample can still be significant
3.1.1 Preparation
It is advisable to carry out the following preparatory steps the day before sampling:
1. For n samples, prepare:
– n test tubes containing 5 mL of 60% v/v MeOH for sampling. Number and weigh them. Close all tubes with caps and store at −40 °C.
– n test tubes containing 5 mL of 60% v/v MeOH for the washing step. Close the tubes and also store them at −40 °C.
– n test tubes containing 5 mL of 75% v/v EtOH (68% m/m) for the extraction step. Close the tubes and store them in the fridge.
2. Set the temperature of the centrifuge to −20 °C and put the appropriate centrifuge rotor in a −40 °C freezer. Turn on the cryostat and set the temperature to −40 °C.
3. Connect the rapid sampling setup to the bioreactor to be sampled.
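The reagent volumes implied by step 1 scale linearly with the number of samples. The hypothetical Python helper below totals them. The 10% pipetting surplus is an assumption, and because the solutions are defined as v/v, the stated fraction of pure solvent is measured out and then made up to the final volume with water (mixing contraction means the water added is not simply the remainder).

```python
def reagent_plan(n_samples, tube_volume_ml=5.0, surplus=0.10):
    """Volumes (mL) to prepare for the cold centrifugation protocol.

    Each sample needs two tubes of 60% v/v methanol (sampling and washing)
    and one tube of 75% v/v ethanol (extraction).
    """
    factor = 1.0 + surplus
    meoh_60_total = 2 * n_samples * tube_volume_ml * factor
    etoh_75_total = n_samples * tube_volume_ml * factor
    return {
        "meoh_60_total_ml": meoh_60_total,
        "pure_meoh_ml": 0.60 * meoh_60_total,  # then make up to volume with water
        "etoh_75_total_ml": etoh_75_total,
        "pure_etoh_ml": 0.75 * etoh_75_total,  # then make up to volume with water
    }


plan = reagent_plan(n_samples=10, surplus=0.0)
for name, volume in plan.items():
    print(f"{name}: {volume:.1f}")
```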
The next steps are best performed on the day of sampling:
1. If IDMS analysis is used (see Note 2): Let the frozen 13C-labeled extract thaw in the fridge. Make sure that you use the same uniform solution for all samples and standards. Keep the vial containing the 13C extract closed and cold, e.g., on ice.
2. Place the tubes containing 60% (v/v) methanol, required for sampling and washing of the cell pellet, in the cryostat at −40 °C.
3. Adjust the timer controlling the electronic valve(s) of the rapid sampling system such that the weight of the sample taken equals 1.00 ± 0.05 g.
4. Calibrate the pipette required for 13C extract additions (typically 100 μL).
5. Switch on a suitable water bath and let it reach a temperature of 95 °C before sampling is started.
6. Place the tubes containing the 75% ethanol next to the water bath and allow them to warm up to room temperature.
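Step 3 can be approached by taking a few test samples at different valve opening times, weighing each tube before and after sampling (the tubes were numbered and weighed during preparation), and interpolating the opening time that delivers 1.00 g. The Python sketch below assumes that the delivered mass is linear in the opening time; the calibration data and function names are hypothetical and not part of the published rapid sampling system.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def sample_mass_g(tube_after_g, tube_before_g):
    """Mass of broth taken = filled tube minus pre-weighed tube."""
    return tube_after_g - tube_before_g


def valve_time_for_mass(open_times_s, masses_g, target_g=1.00):
    """Opening time predicted to deliver target_g, from a linear calibration."""
    slope, intercept = fit_line(open_times_s, masses_g)
    return (target_g - intercept) / slope


def within_spec(mass_g, target_g=1.00, tol_g=0.05):
    """True if a sample meets the 1.00 +/- 0.05 g specification."""
    return abs(mass_g - target_g) <= tol_g


# Hypothetical test shots: valve opening time (s) vs. delivered broth mass (g).
times_s = [0.2, 0.4, 0.6]
masses_g = [0.55, 1.05, 1.55]
t_open = valve_time_for_mass(times_s, masses_g)

check = sample_mass_g(tube_after_g=13.21, tube_before_g=12.20)
print(f"set valve to {t_open:.3f} s; check sample {check:.2f} g, ok={within_spec(check)}")
```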
3.1.4 Further Sample In the protocol below, it is assumed that a Labconco RapidVap is
Processing used for the sample drying.
1. Turn on the cold trap of the RapidVap. Make sure the cold trap
is empty. It will take 10–20 min to be ready.
2. Evaporate the ethanol/water mixture until the samples are dry.
Set the speed of the RapidVap to 90%, and apply full vacuum.
3. 5 min after the start, switch on the heating and set to 30 C.
4. 25 min after the start, decrease the vacuum to 5 mbar.
5. Stop the RapidVap 110 min after the start and check if the
samples are completely dry. If not, continue until dry.
6. Resuspend the dried sediment in 500μL MilliQ water.
7. Mix thoroughly by vortexing and transfer to Eppendorf tubes.
8. Centrifuge at 15,000 × g for 5 min at 1 °C. (If the supernatant is still turbid, transfer it to clean Eppendorf tubes and centrifuge again.)
9. Transfer the supernatants to (labeled) 0.2-μm Durapore PVDF
centrifuge filters.
10. Filter by centrifuging again at 15,000 × g for 5 min at 1 °C.
11. Transfer the filtrates to screw-cap sample vials and store at −80 °C until analysis.
3.2 Rapid Sampling for Endometabolome Analysis: Cold Filtration Method
For quantification of intracellular metabolites that are present in the cells in very low amounts compared with their levels in the cultivation medium, the washing efficiency of the cold centrifugation method may not be sufficient. Therefore, a method was developed in which cold methanol quenching is combined with a cold filtration step for virtually complete removal of the exometabolome [12]. See Fig. 1 (right panel) for a schematic overview of the method. This procedure is especially useful for quantifying intracellular amounts of substrates and secreted (by)products. In the following protocol, it is assumed that 60% aqueous methanol is a suitable quenching
3.2.1 Preparation
For n samples, prepare the following the day before sampling is carried out:
1. 3 × n tubes with 50 mL of 60% methanol. Cap them and cool down to −40 °C.
2. n tubes with 30 mL of 75% ethanol. Cap them and heat them in a 70 °C water bath before sampling starts. (70 °C is just below the boiling point of this mixture.)
The following steps are best performed on the day of sampling:
1. Place the vacuum filtration unit on the balance (see Fig. 1, right
panel). Connect the tubing to the vacuum pump without
strain, such that it does not affect the weight of the filtration
unit during sampling.
2. Calibrate the pipette required for 13C extract additions (typically 100 μL).
3. If IDMS is applied for metabolite quantification (see Note 2):
Let the frozen 13C-labeled extract thaw in the fridge. Make
sure that you use the same uniform solution for all samples and
standards. Keep the vial containing the 13C extract closed and
cold on ice.
3.2.2 Sampling
1. Place a filter on the filter support disc and clamp the filtration beaker.
2. Open a tube with 75% ethanol at 70 °C (required for extraction in a few minutes) and keep it in the 70 °C water bath.
3. Take three tubes with 50 mL of 60% methanol at −40 °C from the freezer/cryostat. Pour one into the filtration beaker and leave the other two next to the sampling setup, ready to grab for washing the cell cake.
4. Tare the balance.
Fast Sampling of the Cellular Metabolome 29
5. Switch on the peristaltic pump and flush the dead volume of the
sampling tubing into a waste tube. Without switching off the
pump, direct the flow/spray into the cold 60% methanol in the
filtration beaker. The spray must directly contact the cold 60%
methanol, so avoid hitting the wall of the filtration beaker.
Switch off the pump after approximately 10 g (≈10 mL) of broth has been sampled.
6. Read the exact sample weight from the balance. (The second
experimenter has time to write down the weight.)
7. Start the vacuum pump. Open the second 60% methanol tube
while the broth/methanol suspension is filtered and pour it out
into the beaker only after the filter cake falls dry. Repeat with
the third 60% methanol tube and turn off the vacuum pump
after the filter cake falls dry.
3.2.3 Extraction of the Cell Cakes
1. Remove the filtration beaker, lift up the filter with the cell cake using tweezers, pipette 100 μL of 13C extract (0 °C) on top of the washed cell cake, and immediately submerge the cell cake in the 75% ethanol tube at 70 °C.
2. Cap the tube and shake it vigorously by hand for 5 s (glass fiber filters will disintegrate at this point), then place it in a 95 °C water bath for 3 min (open the cap slightly to prevent pressurization).
3. Remove the tube from the water bath and cool it on ice. Recap
the tube.
4. If desired, the sample can now be stored at −80 °C until further processing. If not, continue with Subheading 3.2.4, step 1.
5. Clean the filtration setup for the next sample.
3.2.4 Further Sample Processing
In the protocol below, it is assumed that a Labconco RapidVap is used for the sample drying.
1. Centrifuge the extracted samples for 8 min at 4 °C and 4400 × g.
2. Filter the supernatant using a 0.2-μm filter to remove glass
fibers from the solution.
3. Evaporate the thus obtained extract to dryness using the
RapidVap. Alternatively, if problems occur with resuspension
of the dry residue, the extract can be concentrated instead of
complete evaporation to dryness, e.g., to a final volume of
300–500 μL. The drying/concentration step requires about 2 h (depending on the number of tubes processed at the same time). See Subheading 3.1.4 for the steps preparing the RapidVap for use. Start at a slow speed (30%) and increase it progressively as the water and ethanol evaporate. Set the heat to
3.3 Rapid Sampling for Exometabolome Analysis
With this procedure, samples from a culture of microorganisms are quickly cooled to a temperature close to 0 °C. The purpose is to minimize metabolic activity as much as possible while avoiding freezing of the sample, as this may lead to cell damage. The sample is cooled by direct contact with precooled stainless steel beads placed in a syringe. Immediately thereafter, the sample is pressed through a filter to obtain a supernatant sample. The amount of beads needed to cool the sample to a temperature slightly above 0 °C can be calculated from the heat capacities of stainless steel and water, the required sample volume, the initial sample temperature, and the initial temperature of the stainless steel beads (see ref. 49). Note that if the cells are susceptible to cold shock (i.e., sudden cooling results in release of metabolites from the cells), the cooling step should be omitted. The protocol below is designed for the withdrawal of 2 mL of sample with an initial temperature of 30 °C.
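The bead mass can be estimated from a heat balance in which the heat released by the sample equals the heat absorbed by the beads. The Python sketch below illustrates this under stated assumptions and is not the calculation of ref. 49. It uses approximate specific heat capacities (water 4.18 J g⁻¹ K⁻¹, stainless steel 0.50 J g⁻¹ K⁻¹), treats 2 mL of broth as 2 g of water, and ignores heat exchange with the syringe and surroundings. The helper name `bead_mass` is hypothetical.

```python
"""Heat-balance estimate of the stainless steel bead mass needed to cool a broth sample.

Assumptions (illustrative, not from the protocol): approximate specific heat
capacities, broth treated as water, no heat exchange with syringe or surroundings.
"""

C_WATER = 4.18  # J g^-1 K^-1, approximate
C_STEEL = 0.50  # J g^-1 K^-1, approximate value for stainless steel


def bead_mass(sample_g: float, t_sample: float, t_beads: float, t_final: float,
              c_sample: float = C_WATER, c_beads: float = C_STEEL) -> float:
    """Return the bead mass (g) that brings the sample to t_final (deg C).

    Energy balance: sample_g * c_sample * (t_sample - t_final)
                    = m_beads * c_beads * (t_final - t_beads)
    """
    if not t_beads < t_final < t_sample:
        raise ValueError("expected t_beads < t_final < t_sample")
    heat_released = sample_g * c_sample * (t_sample - t_final)  # J
    return heat_released / (c_beads * (t_final - t_beads))


if __name__ == "__main__":
    # 2 mL broth (~2 g) at 30 deg C, beads precooled to -20 deg C, target 1 deg C
    print(f"{bead_mass(2.0, 30.0, -20.0, 1.0):.1f} g of beads")
```

Under these assumptions, the estimate is about 23 g. This is of the same order as the 25 g per syringe used below, and the difference would leave some margin for heat picked up during handling. Lowering the target temperature or starting from warmer beads increases the required mass.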
3.3.1 Preparation
1. Fill the required number of syringes with 25 g of stainless steel beads each. Close the syringes with their plungers and the syringe outlets with parafilm, and put them overnight in a freezer at −20 °C.
3.3.2 Sampling
1. Take the required number of syringes filled with cold beads from the freezer, remove the parafilm, and connect the filters to the syringes. Keep them in a Styrofoam box filled with cooling elements at −20 °C until sampling, to prevent them from warming up.
2. Sample 2 mL of broth from the bioreactor into a syringe and
filter immediately, while collecting the supernatant in a
sample vial.
3.4.1 Preparation
It is most convenient to carry out the following preparatory steps the day before sampling:
1. For n samples, prepare:
– n tubes containing 5 mL of 60% v/v MeOH for sampling. Number and weigh them. Store at −40 °C.
Only if rapid cooling of the sample is required and the
microorganisms are not susceptible to cold shock (see refs. 13,
15):
– n syringes filled with the proper amount of stainless steel beads (see protocol for exometabolome sampling). Close the syringes with their plungers, cover the syringe outlets with a layer of parafilm (to prevent formation of ice), and leave them overnight in a freezer at −20 °C.
– n tubes containing 5 mL of 75% v/v EtOH for the extraction step. Store in the fridge.
2. Turn on the cryostat and set the temperature to −40 °C.
3.4.3 Extraction of the Quenched Total Broth and Filtrate Samples
In this protocol, not only the total broth sample but also the filtrate sample is extracted in hot ethanol, to denature any enzymes present (see Note 5). Even minimal amounts of active enzymes would distort the metabolite profiles later during sample processing, which must be avoided.
1. Transfer 500 μL of each quenched broth sample into an empty tube and keep the tubes in the cryostat at −40 °C until extraction. Be sure to mix the quenched samples thoroughly by vortexing before the transfer.
2. Repeat this procedure for the quenched filtrate samples.
3. Add the U-13C internal standard mix (typically 100μL).
4. Apply the same procedure as described for extraction of the cell
pellets (see Subheading 3.1.3).
3.4.4 Further Processing of the Total Broth and Filtrate Samples
Apply the same procedure for sample drying and cleanup as described for the cell pellets (see Subheading 3.1.4).
3.4.5 Determination of the Intracellular Metabolite Levels for the Differential Method
After quantification of the metabolites in the total broth and filtrate samples, the intracellular amounts can be calculated by subtraction. Accurate determination of the actual amounts of sample taken (by weighing for the total broth samples, and by accurate pipetting, optionally supplemented with weighing, for the filtrate samples) increases the accuracy of the final result. The most convenient way of expressing the metabolite levels, both in total broth and in the filtrate, is per amount of biomass present in the bioreactor, e.g., in μmol per gram of biomass dry weight. Subtracting the filtrate levels from the total broth levels then directly yields the intracellular levels (see Note 7).
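As a minimal sketch of the subtraction, the Python code below assumes that the measured amounts have already been converted to concentrations per liter of broth (total broth) and per liter of filtrate. This conversion uses the weighed or pipetted sample amounts and corrects for dilution by the quenching solution. The sketch also neglects the cell volume, so that the filtrate concentration stands in for the extracellular level of the broth. Propagating standard deviations in quadrature assumes independent errors. This, together with the helper name `intracellular_level`, is an illustrative addition rather than part of the published method.

```python
"""Intracellular metabolite level by difference (total broth minus filtrate).

Illustrative sketch: inputs are assumed to be already corrected for sample
amount and dilution by the quenching solution; cell volume is neglected.
"""
import math


def intracellular_level(c_total: float, c_filtrate: float, biomass: float,
                        sd_total: float = 0.0,
                        sd_filtrate: float = 0.0) -> tuple[float, float]:
    """Return (level, sd) in umol per g biomass dry weight.

    c_total    -- metabolite in total broth, umol per L of broth
    c_filtrate -- metabolite in filtrate, umol per L
    biomass    -- biomass concentration, g dry weight per L of broth
    sd_*       -- standard deviations of c_total and c_filtrate (same units),
                  combined in quadrature assuming independent errors
    """
    if biomass <= 0:
        raise ValueError("biomass concentration must be positive")
    level = (c_total - c_filtrate) / biomass
    sd = math.sqrt(sd_total ** 2 + sd_filtrate ** 2) / biomass
    return level, sd


if __name__ == "__main__":
    level, sd = intracellular_level(50.0, 30.0, 5.0, sd_total=3.0, sd_filtrate=4.0)
    print(f"{level:.2f} +/- {sd:.2f} umol/gDW")
```

In this invented example, a level of 4.0 μmol/gDW carries an uncertainty of 1.0 μmol/gDW. The example shows how errors in the two large measured quantities dominate the small difference when the filtrate level is high.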
3.5 Principles of Metabolite Quantification Using Isotope Dilution Mass Spectrometry
A detailed description of how to apply isotope dilution mass spectrometry will not be given here. Different methods for the analysis of different groups of metabolites have been published previously [47, 48]. The principle of the method is that metabolites are quantified by mass spectrometry, whereby for each individual metabolite a chemically identical, fully 13C-labeled analog is added as internal standard. Each metabolite is then quantified relative to the amount of its fully 13C-labeled analog present. This procedure effectively corrects for non-idealities in the subsequent MS-based quantification, such as sample matrix effects, nonlinearity resulting from competition in the ESI interface (in the case of LC-MS analysis), incomplete derivatization (in the case of GC-MS analysis), machine drift, etc.
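In practice, each metabolite's ratio of unlabeled (12C) to 13C-labeled peak area is related to concentration through standards that receive the same 13C extract as the samples. For this reason, a single uniform 13C solution must be used. The Python sketch below fits a linear calibration line by ordinary least squares and inverts it for an unknown sample. The linear model, the helper names `fit_line` and `idms_concentration`, and all numbers are illustrative assumptions rather than the procedures of refs. 47, 48.

```python
"""Isotope dilution quantification with a 12C/13C peak-area ratio calibration line.

Illustrative sketch: a linear calibration is assumed; real assays may need
weighting or a different model, and the example numbers are invented.
"""


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("calibration concentrations must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def idms_concentration(area_12c: float, area_13c: float,
                       slope: float, intercept: float) -> float:
    """Concentration of an unknown from its 12C/13C area ratio via the calibration line."""
    # Valid only if the 13C analog was spiked in the same amount as in the standards.
    ratio = area_12c / area_13c
    return (ratio - intercept) / slope


if __name__ == "__main__":
    standards_um = [0.0, 5.0, 10.0, 20.0]  # unlabeled standard concentrations (uM)
    ratios = [0.02, 0.52, 1.02, 2.02]       # measured 12C/13C area ratios (invented)
    slope, intercept = fit_line(standards_um, ratios)
    print(f"{idms_concentration(6100.0, 10000.0, slope, intercept):.2f} uM")
```

Because the labeled analog experiences the same matrix effects and instrument drift as the analyte, these effects largely cancel in the ratio. This is why the ratio, rather than the raw 12C peak area, is regressed.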
4 Notes
References
1. Tang YJ, Martin HG, Myers S, Rodriguez S, Baidoo EE, Keasling JD (2009) Advances in analysis of microbial metabolic fluxes via 13C isotopic labeling. Mass Spectrom Rev 28:362–375
2. Gstaiger M, Aebersold R (2009) Applying mass spectrometry-based proteomics to genetics, genomics and network biology. Nat Rev Genet 10:617–627
3. Gupta N, Tanner S, Jaitly N, Adkins JN, Lipton M, Edwards R, Romine M, Osterman A, Bafna V, Smith RD, Pevzner PA (2007) Whole proteome analysis of post-translational modifications: applications of mass-spectrometry for proteogenomic annotation. Genome Res 17(9):1362–1377
4. Gancedo JM, Gancedo C (1973) Concentrations of intermediary metabolites in yeast. Biochimie 55:205–211
5. Wollenberger A, Ristau O, Schoffa G (1960) Eine einfache Technik der extrem schnellen Abkühlung größerer Gewebestücke. Pflugers Arch 270:399–412
6. Williams DH, Lund P, Krebs HA (1967) Redox state of free nicotinamide-adenine dinucleotide in cytoplasm and mitochondria of rat liver. Biochem J 103:514–527
7. Veech RL, Eggleston LV, Krebs HA (1969) Redox state of free nicotinamide-adenine dinucleotide phosphate in cytoplasm of rat liver. Biochem J 115:609–619
8. Faupel RP, Seitz HJ, Tarnowsk W, Thiemann V, Weiss C (1972) Problem of tissue sampling from experimental-animals with respect to freezing technique, anoxia, stress and narcosis – new method for sampling rat-liver tissue and physiological values of glycolytic intermediates and related compounds. Arch Biochem Biophys 148:509–522
9. Cole HA, Wimpenny JW, Hughes DE (1967) ATP pool in Escherichia coli. I. Measurement of pool using a modified luciferase assay. Biochim Biophys Acta 143:445–453
10. Saez MJ, Lagunas R (1976) Determination of intermediary metabolites in yeast – critical examination of effect of sampling conditions and recommendations for obtaining true levels. Mol Cell Biochem 13:73–78
11. De Koning W, Van Dam K (1992) A method for the determination of changes of glycolytic metabolites in yeast on a subsecond time scale using extraction at neutral pH. Anal Biochem 204:118–123
12. Douma RD, de Jonge LP, Jonker CT, Seifar RM, Heijnen JJ, Van Gulik WM (2010) Intracellular metabolite determination in the presence of extracellular abundance: application to the penicillin biosynthesis pathway in Penicillium chrysogenum. Biotechnol Bioeng 107(1):105–115
13. Wittmann C, Kromer JO, Kiefer P, Binz T, Heinzle E (2004) Impact of the cold shock phenomenon on quantification of intracellular metabolites in bacteria. Anal Biochem 327:135–139
14. Bolten CJ, Kiefer P, Letisse F, Portais JC, Wittmann C (2007) Sampling for metabolome analysis of microorganisms. Anal Chem 79:3843–3849
15. Taymaz-Nikerel H, De Mey M, Ras C, Ten Pierick A, Seifar RM, Van Dam JC, Heijnen JJ, Van Gulik WM (2009) Development and application of a differential method for reliable metabolome analysis in Escherichia coli. Anal Biochem 386:9–19
16. Canelas AB, Ras C, Ten Pierick A, Van Dam JC, Heijnen JJ, Van Gulik WM (2008) Leakage-free rapid quenching technique for yeast metabolomics. Metabolomics 4:226–239
17. Harrison DE, Maitra PK (1969) Control of respiration and metabolism in growing Klebsiella aerogenes – role of adenine nucleotides. Biochem J 112:647–656
18. Iversen JJL (1981) A rapid sampling valve with minimal dead space for laboratory scale fermentors. Biotechnol Bioeng 23:437–440
19. Theobald U, Mailinger W, Reuss M, Rizzi M (1993) In-vivo analysis of glucose-induced fast changes in yeast adenine nucleotide pool applying a rapid sampling technique. Anal Biochem 214:31–37
20. Larsson G, Törnkvist M (1996) Rapid sampling, cell inactivation and evaluation of low extracellular glucose concentrations during fed-batch cultivation. J Biotechnol 49(1–3):69–82
21. Lange HC, Eman M, Van Zuijlen G, Visser D, Van Dam JC, Frank J, Teixeira de Mattos MJ, Heijnen JJ (2001) Improved rapid sampling for in vivo kinetics of intracellular metabolites in Saccharomyces cerevisiae. Biotechnol Bioeng 75(4):406–415
22. Schaefer U, Boos W, Takors R, Weuster-Botz D (1999) Automated sampling device for monitoring intracellular metabolite dynamics. Anal Biochem 270(1):88–96
23. Weuster-Botz D (1997) Sampling tube device for monitoring intracellular metabolite dynamics. Anal Biochem 246(2):225–233
24. Schaub J, Schiesling C, Reuss M, Dauner M (2006) Integrated sampling procedure for metabolome analysis. Biotechnol Prog 22(5):1434–1442
25. Visser D, Van Zuijlen GA, Van Dam JC, Oudshoorn A, Eman MR, Ras C, Van Gulik WM, Frank J, Van Dedem GWK, Heijnen JJ (2002) Rapid sampling for analysis of in vivo kinetics using the bioscope: a system for continuous pulse experiments. Biotechnol Bioeng 79:674–681
26. Mashego MR, Van Gulik WM, Vinke JL, Visser D, Heijnen JJ (2006) In vivo kinetics with rapid perturbation experiments in Saccharomyces cerevisiae using a second-generation bioscope. Metab Eng 8:370–383
27. Nasution U, Van Gulik WM, Pröll A, Van Winden WA, Heijnen JJ (2006) Generating short term kinetic responses of primary metabolism of Penicillium chrysogenum through glucose perturbation in the bioscope mini reactor. Metab Eng 8:395–405
28. Visser D, Van Zuylen GA, Van Dam JC, Eman MR, Pröll A, Ras C, Wu L, Van Gulik WM, Heijnen JJ (2004) Analysis of in vivo kinetics of glycolysis in aerobic Saccharomyces cerevisiae by application of glucose and ethanol pulses. Biotechnol Bioeng 88:157–167
29. Mashego MR, Van Gulik WM, Heijnen JJ (2007) Metabolome dynamic responses of Saccharomyces cerevisiae to simultaneous rapid perturbations in external electron acceptor and electron donor. FEMS Yeast Res 7:48–66
30. De Mey M, Taymaz-Nikerel H, Baart G, Waegeman H, Maertens J, Heijnen JJ, Van Gulik WM (2010) Catching prompt metabolite dynamics in Escherichia coli with the bioscope at oxygen rich conditions. Metab Eng 12:477–487
31. Lameiras F, Heijnen JJ, Van Gulik WM (2015) Development of tools for quantitative intracellular metabolomics of Aspergillus niger chemostat cultures. Metabolomics 11:1253–1264
32. Villas-Bôas SG, Bruheim P (2007) Cold glycerol–saline: the promising quenching solution for accurate intracellular metabolite analysis of microbial cells. Anal Biochem 370:87–97
33. Bolten CJ, Wittmann C (2008) Appropriate sampling for intracellular amino acid analysis in five phylogenetically different yeasts. Biotechnol Lett 30:1993–2000
34. Oldiges M, Takors R (2005) Applying metabolic profiling techniques for stimulus-response experiments: chances and pitfalls. Technol Transfer Biotechnol 92:173–196
35. Mashego MR, Rumbold K, De Mey M, Vandamme E, Soetaert W, Heijnen JJ (2007) Microbial metabolomics: past, present and future methodologies. Biotechnol Lett 29:1–16
36. Hancock R (1958) The intracellular amino acids of Staphylococcus aureus: release and analysis. Biochim Biophys Acta 28:402–412
37. Hommes FA (1964) Oscillatory reductions of pyridine nucleotides during anaerobic glycolysis in brewers yeast. Arch Biochem Biophys 108:36–46
38. Gale EF (1947) J Gen Microbiol 1:53–76
39. Work E (1949) Biochim Biophys Acta 3:400–411
40. Fuerst R, Wagner RP (1957) An analysis of the free intracellular amino acids of certain strains of Neurospora. Arch Biochem Biophys 70:311–326
41. Bent KJ, Morton AG (1964) Amino acid composition of fungi during development in submerged culture. Biochem J 92:260–269
42. Maharjan RP, Ferenci T (2003) Global metabolite analysis: the influence of extraction methodology on metabolome profiles of Escherichia coli. Anal Biochem 313:145–154
43. Rabinowitz JD, Kimball E (2007) Acidic acetonitrile for cellular metabolome extraction from Escherichia coli. Anal Chem 79:6167–6173
44. Canelas AB, Ten Pierick A, Ras C, Seifar RM, Van Dam JC, Van Gulik WM, Heijnen JJ (2009) Quantitative evaluation of intracellular metabolite extraction techniques for yeast metabolomics. Anal Chem 81:7379–7389
45. Bergmeyer HU (1983) Methods of enzymatic analysis. VCH Publishers, Weinheim
46. Seifar RM, Ras C, Van Dam JC, Van Gulik WM, Heijnen JJ, Van Winden WA (2009) Simultaneous quantification of free nucleotides in complex biological samples using ion pair reversed phase liquid chromatography isotope dilution tandem mass spectrometry. Anal Biochem 388:213–219
47. Wu L, Mashego MR, Van Dam JC, Pröll A, Vinke JL, Ras C, Van Winden WA, Van Gulik WM, Heijnen JJ (2005) Quantitative analysis of the microbial metabolome by isotope dilution mass spectrometry using uniformly 13C-labeled cell extracts as internal standards. Anal Biochem 336:164–171
48. Maleki Seifar R, Zhao Z, Van Dam JC, Van Winden WA, Van Gulik WM, Heijnen JJ (2008) Quantitative analysis of metabolites in complex biological samples using ion-pair reversed-phase liquid chromatography-isotope dilution tandem mass spectrometry. J Chromatogr A 1187:103–110
49. Mashego MR, Van Gulik WM, Vinke JL, Heijnen JJ (2003) Critical evaluation of sampling techniques for residual glucose determination in carbon-limited chemostat culture of Saccharomyces cerevisiae. Biotechnol Bioeng 83:395–399
50. Van Dam JC, Eman MR, Frank J, Lange HC, Van Dedem GWK, Heijnen JJ (2002) Analysis of glycolytic intermediates in Saccharomyces cerevisiae using anion exchange chromatography and electrospray ionization with tandem mass spectrometric detection. Anal Chim Acta 460:209–218
Chapter 3
Abstract
Integral membrane proteins are embedded in biological membranes, where various lipids modulate their structure and function. There is a critical need to elucidate how these lipids participate in the physiological and pathological processes associated with membrane protein dysfunction. Native mass spectrometry (MS), combined with ion mobility spectrometry (IM), is emerging as a powerful tool to probe
membrane protein complexes and their interactions with ligands, lipids, and other small molecules. Unlike
other biophysical approaches, native IM-MS can resolve individual ligand/lipid binding events. We have
developed a novel method using native MS, coupled with a temperature-control apparatus, to determine
the thermodynamic parameters of individual ligand or lipid binding events to proteins. This approach has
been validated using several soluble protein–ligand systems wherein MS results are compared with those
acquired from conventional biophysical techniques, such as isothermal titration calorimetry (ITC) and
surface plasmon resonance (SPR). Using these principles, it is possible to elucidate the thermodynamics of
individual lipid binding to integral membrane proteins. Herein, we use the ammonia channel (AmtB) from
Escherichia coli as a model membrane protein. Remarkably, distinct thermodynamic signatures for AmtB
binding to lipids with different headgroups and acyl chain configurations are observed. Additionally, using a
mutant form of AmtB that abolishes a specific lipid binding site, distinct changes have been discovered in
the thermodynamic signatures compared with the wild-type, implying that these signatures can identify key
residues involved in specific lipid binding and potentially differentiate between specific lipid binding sites.
This chapter provides procedures and findings associated with temperature-controlled native MS as a novel
approach to interrogate membrane proteins and their interactions with lipids and other molecules.
Key words Membrane protein, Lipid, Native mass spectrometry, Ion mobility-mass spectrometry,
Temperature control, Binding thermodynamics
1 Introduction
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349,
https://doi.org/10.1007/978-1-0716-1585-0_3, © Springer Science+Business Media, LLC, part of Springer Nature 2022
42 Xiao Cong et al.
2 Materials
2.1 Plasmid Construction, Protein Expression, and Purification
1. Plasmid for expression of target soluble and membrane proteins. Here, we use a pET15-based vector that expresses the E. coli ammonium channel (AmtB) as a TEV protease-cleavable N-terminal fusion to pelB (signal sequence), 10× His-tag, and maltose binding protein (MBP) (see ref. 3). The soluble nitrogen regulatory protein GlnK from E. coli was expressed as a TEV-cleavable N-terminal fusion to Strep-tag II (see ref. 21). MBP was expressed from pET28b with a thrombin-cleavable N-terminal 6× His-tag (see ref. 21).
2. Tobacco etch virus (TEV) protease purchased or produced
in-house (see ref. 22).
3. Thrombin.
4. E. coli Rosetta 2(DE3) pLysS.
5. E. coli ArcticExpress (DE3) RIL.
6. E. coli BL21 (DE3) Gold.
7. E. coli OverExpress C43(DE3).
8. Terrific broth (TB) medium.
9. Lysogeny broth (LB) Miller medium: 5 g yeast extract, 10 g
peptone from casein, and 10 g sodium chloride per liter.
10. Chloramphenicol.
11. Kanamycin.
12. Isopropyl 1-thio-β-D-galactopyranoside (IPTG).
13. BioSpectrometer® basic.
14. M-110P microfluidizer.
15. Optima™ XPN ultracentrifuge.
16. TBS buffer: 20 mM tris(hydroxymethyl)methylamine (Tris), pH = 7.4 at room temperature, 150 mM sodium chloride.
17. Resuspension buffer: 50 mM Tris, pH = 7.4 at room temperature, 300 mM sodium chloride.
(b) Maltotriose.
(c) N,N′,N″-Triacetylchitotriose (NAG3).
(d) Adenosine 5′-diphosphate (ADP) (Sigma-Aldrich).
10. Lipid solutions.
(a) 1-Palmitoyl-2-oleoyl-sn-glycero-3-phospho-(1′-rac-glycerol) (POPG).
(b) 1-Palmitoyl-2-oleoyl-sn-glycero-3-phosphoethanolamine
(POPE).
(c) 1-Palmitoyl-2-oleoyl-sn-glycero-3-phosphate (POPA).
(d) 1-Palmitoyl-2-oleoyl-sn-glycero-3-phospho-L-serine
(POPS).
Investigation of Protein–Lipid Interactions Using Native Mass Spectrometry 47
3 Methods
3.1 Soluble Protein Expression
1. Transform the MBP and GlnK expression plasmids into E. coli Rosetta 2(DE3) pLysS and E. coli ArcticExpress (DE3) RIL, respectively.
2. Grow colonies overnight at 37 °C in 50 mL of LB medium supplemented with chloramphenicol (45 μg/mL) and kanamycin (50 μg/mL) for MBP, or with kanamycin (50 μg/mL) for GlnK.
3. Inoculate 1 L of LB with 7 mL of overnight culture and grow at 37 °C until the OD600 reaches 0.8.
4. Chill the cell cultures on ice prior to adding isopropyl
1-thio-β-D-galactopyranoside (IPTG) to a final concentration
of 1 mM and 0.1 mM for MBP and GlnK, respectively.
5. Grow the cells for 24 h at 20 °C.
6. Harvest the cells by centrifugation at 5000 × g for 10 min, wash once with TBS buffer, re-pellet, and store the pellets at −80 °C.
3.2 Soluble Protein Purification
1. Thaw the cell pellets and resuspend them in NHA buffer. Unless otherwise stated, all purification steps are carried out at 4 °C.
2. Lyse the cells with 1–3 passes through an M-110P microflui-
dizer at 20,000 psi.
3. Clarify the cell lysate by centrifugation (30 min at 30,000 × g).
4. Collect the supernatant containing recombinant 6× His-tagged MBP or Strep-tag II-tagged GlnK.
5. Filter the supernatant with a syringe filtration device (0.45μm).
6. Load 6× His-tagged MBP onto a 5 mL HisTrap column and elute with NHB buffer. Load Strep-tag II-tagged GlnK onto a 5 mL StrepTrap HP column and elute it with TBS buffer containing 2.5 mM D-desthiobiotin. Collect the peak fractions of 6× His-tagged MBP and Strep-tag II-tagged GlnK separately.
7. Inject the 6× His-tagged MBP peak fraction onto a HiPrep 26/10 desalting column equilibrated in NHA buffer with imidazole omitted.
8. Digest the purified 6× His-tagged MBP with thrombin (3 units of thrombin per mg of protein) overnight at room temperature.
9. Pass the digested MBP over a 5 mL HisTrap HP column to remove tagged (undigested) protein. Concentrate the untagged protein using a centrifugal concentrator.
10. Load the concentrated protein onto a HiLoad 16/600 Super-
dex 75 pg column equilibrated in GF buffer.
11. Pool and concentrate the peak fractions containing MBP or
Strep-tag II-tagged GlnK.
12. Aliquot and flash-freeze the protein. Store at −80 °C.
13. For hen egg white lysozyme, directly dissolve the protein in
TBS buffer on ice prior to use.
3.3 Membrane Protein Expression
1. Transform the MBP–AmtB plasmid into E. coli BL21 (DE3) Gold cells and the MBP–AmtB N72A/N79A plasmid into E. coli OverExpress C43(DE3) cells.
2. Inoculate several colonies into 50 mL LB Miller medium and grow overnight at 37 °C.
3. Inoculate each liter of LB in 2 L shaker flasks with 7 mL of overnight culture and grow at 37 °C until the OD600 reaches 0.6–0.8.
4. Add IPTG to the culture at a final concentration of 0.5 mM and grow for 3 h at 37 °C.
5. Collect the cells by centrifugation at 5000 × g for 10 min at 4 °C, wash them once with TBS buffer, re-pellet them, and store the pellets at −80 °C.
3.4 Membrane Protein Purification
1. Thaw the MBP–AmtB cell pellets and resuspend them in resuspension buffer supplemented with a cOmplete™ Protease Inhibitor Cocktail tablet and 5 mM BME. The protocol for AmtB has been optimized for native MS studies; other membrane proteins may require a detergent screening approach to remove co-purified lipids (see Note 1).
2. Lyse the cells with 4–5 passes through an M-110P microflui-
dizer at 20,000 psi.
3. Clarify the cell lysate by centrifugation at 20,000 × g for 25 min at 4 °C.
4. Pellet the membranes by centrifugation at 100,000 × g for 2 h at 4 °C and resuspend them in 20 mM Tris buffer (pH = 7.4 at room temperature) containing 100 mM sodium chloride, 10% glycerol, and 5 mM BME.
5. Homogenize the membrane suspension using a Potter-Elvehjem Teflon pestle and glass tube, or a similar glass tissue grinder/homogenizer.
6. Add 200 mM OG to the membrane suspension for membrane protein extraction and incubate with gentle agitation overnight at 4 °C.
7. Clarify the extract by centrifugation at 20,000 × g for 25 min at 4 °C.
8. Filter the supernatant before loading onto a 5 mL HisTrap-HP
column equilibrated in NHA buffer supplemented with
0.025% DDM.
9. After loading, wash the column initially with 40–50 mL of
NHA buffer supplemented with 1% OG before exchanging
the proteins into several column volumes of NHA buffer sup-
plemented with 0.025% DDM until a steady baseline is
reached.
10. Elute the membrane protein fusions with a linear gradient to 100% NHB buffer supplemented with 0.025% DDM over two column volumes.
11. Pool the peak fractions and add BME to a final concentration of 5 mM and His-tagged TEV protease (10 units of TEV protease per mg of protein). Incubate overnight at 4 °C.
12. Filter the sample and pass over a 5 mL HisTrap-HP column
equilibrated in NHA buffer supplemented with 0.025% DDM.
13. Collect the flow-through containing the untagged membrane
protein and concentrate it using a 100 kDa MWCO centrifugal
concentrator.
14. Use the purified protein immediately, or flash-freeze it in liquid nitrogen and store at −80 °C.
3.5 Ligand or Lipid Solution Preparation
1. Prepare stock solutions by dissolving each ligand (powder form) in AA buffer to a final concentration of 0.1–1 mM.
2. For each ligand, perform a serial dilution to make sets of
solutions containing different concentrations of ligands for
titration experiments.
3. Lipids, received as concentrated stocks in chloroform, can be
dried under a stream of nitrogen gas, followed by vacuum
desiccation overnight at room temperature to remove residual
solvent.
4. Prepare stock solutions of lipids by resuspending dried lipid
films in water supplemented with 0.5% C8E4 and 5 mM BME
to a final concentration of 10 mM.
5. Determine the concentration of stock solutions using a phos-
phorus assay [25, 26].
6. Perform a serial dilution of each lipid stock in AA buffer sup-
plemented with 0.5% C8E4 to make sets of solutions containing
different concentrations of lipids for titration experiments.
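Stepwise concentrations and pipetting volumes for the serial dilutions in steps 2 and 6 follow from C1V1 = C2V2. The Python sketch below assumes a constant dilution factor and plans the volumes backwards so that every tube keeps the same volume after the next tube has been made. The factor, the retained volume, the example numbers, and the helper name `serial_dilution` are hypothetical, since the protocol does not specify them. For lipids, the diluent is AA buffer supplemented with 0.5% C8E4, as in step 6.

```python
"""Plan a serial dilution so that every tube keeps the same final volume.

Illustrative sketch: the constant dilution factor, the retained volume and the
example numbers are assumptions; the protocol does not specify them.
"""


def serial_dilution(stock: float, factor: float, steps: int,
                    keep_volume: float) -> list[dict]:
    """Return one dict per tube, most concentrated first.

    Tube k is made from transfer_in of tube k-1 (or of the stock for tube 1)
    plus diluent, so its concentration is the previous one divided by factor
    (C1*V1 = C2*V2). Volumes are planned backwards so that keep_volume remains
    in every tube after the next tube has been prepared.
    """
    if factor <= 1 or steps < 1 or keep_volume <= 0:
        raise ValueError("need factor > 1, steps >= 1 and keep_volume > 0")
    plan = []
    next_transfer = 0.0
    for k in range(steps, 0, -1):
        total = keep_volume + next_transfer  # volume tube k must hold before it is sampled
        transfer = total / factor            # volume taken from tube k-1 (or the stock)
        plan.append({"tube": k,
                     "concentration": stock / factor ** k,
                     "transfer_in": transfer,
                     "diluent": total - transfer})
        next_transfer = transfer
    plan.reverse()
    return plan


if __name__ == "__main__":
    # 1000 uM stock, 2-fold steps, 3 tubes, 20 uL left in each tube
    for row in serial_dilution(1000.0, 2.0, 3, 20.0):
        print(row)
```

Planning backwards avoids the common problem that the most concentrated tubes end up with less volume than the last one once aliquots have been withdrawn for the next dilution.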
3.6 Protein–Ligand/Lipid Titration Experiments Using Native MS
1. Buffer exchange the purified soluble protein and the membrane protein into AA buffer and AA buffer supplemented with 0.5% C8E4, respectively, using a Micro Bio-Spin 6 centrifugal buffer exchange device.
2. Determine the soluble and membrane protein concentrations with the DC Protein Assay kit using bovine serum albumin as the standard, and then dilute the protein sample to a concentration of 0.1–1 μM.
3. Load 3 μL of the diluted protein solution into a gold-coated capillary tip for MS analysis.
4. Tune the mass spectrometer parameters to achieve native-like conditions for soluble or membrane proteins (see refs. 27, 28, Note 2, and Subheading 4 for further discussion). For the soluble proteins used in this chapter, the instrument is set to a source pressure of 6.2–6.4 mbar, capillary voltage of 1.4 kV, sampling cone voltage of 20 V, extractor cone voltage of 3.0 V, trap collision voltage of 20 V, collision gas (argon) flow rate of 4 mL/min (3.6 × 10⁻² mbar), and T-wave settings (velocity/height) for trap, IMS, and transfer of 100 m/s/0.5 V, 100 m/s/4.0 V, and 100 m/s/3.0 V, respectively. The source temperature (50 °C) and trap bias (22 V) are optimized for soluble proteins. For model membrane protein–lipid binding systems, instrument parameters are tuned to maximize ion intensity while preserving the native-like state of AmtB, which is monitored by the drift time of protein ions using IM coupled with MS. The instrument is set to a capillary voltage of 1.7 kV, sampling cone voltage of 200 V,
Fig. 2 Evaluating the time for samples to reach equilibrium. Mole fraction of membrane protein–lipid complexes incubated for 5 min or 24 h. Shown are data for 1.5 μM AmtB mixed with 10 μM POPG at Tsample = 25 °C for 5 min in the mass spectrometer or 24 h in a thermal cycler set at 25 °C. Reported are the mean and standard deviation from repeated measurements using different gold-coated capillary tips (n = 3). Figure adapted with permission from ref. 21 (Copyright 2016 American Chemical Society)
52 Xiao Cong et al.
3.7 Native MS Data Analysis
1. Import the MS raw file series into PULSAR or UniDec software to convert the RAW files to text format containing the MS and IM-MS data [23].
2. Open each MS text file in UniDec and deconvolve each mass spectrum [24]. Enter the "charge range" and "mass range" based on the charge states and mass of the sample. Optimize fitting parameters such as "sample mass every (Da)" and "Peak FWHM (full width at half maximum)" until R² is maximized. Export the intensities of the protein (P) and protein–ligand/lipid (PLn, where n = number of ligands/lipids bound) species to a text file.
3. Divide the intensity of each species by the summed intensity of all species to obtain the mole fraction of each species.
4. Because the mole fractions of P and PLn depend on the apparent equilibrium association constants (KA), the following sequential binding model can be applied to soluble protein–ligand systems to deduce KA [21]. For protein binding to one ligand:
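$$\mathrm{P} + \mathrm{L} \overset{K_A}{\rightleftharpoons} \mathrm{PL}, \qquad K_A = \frac{[\mathrm{PL}]}{[\mathrm{P}][\mathrm{L}]} \tag{1}$$
and binding to multiple ligands:
$$\mathrm{PL}_{n-1} + \mathrm{L} \overset{K_{An}}{\rightleftharpoons} \mathrm{PL}_n, \qquad K_{An} = \frac{[\mathrm{PL}_n]}{[\mathrm{PL}_{n-1}][\mathrm{L}]} \tag{2}$$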
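$$F_n = \frac{[\mathrm{PL}_n]}{[\mathrm{P}]_{\mathrm{total}}} = \frac{[\mathrm{L}]^n \prod_{j=1}^{n} K_{Aj}}{1 + \sum_{i=1}^{N} [\mathrm{L}]^i \prod_{j=1}^{i} K_{Aj}} \tag{4}$$
where N is the maximum number of ligands bound.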
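Steps 3 and 4 can be sketched in Python as follows: mole fractions from deconvolved intensities, and the fractions predicted by the sequential binding model (Eq. 4) for a given ligand concentration and set of stepwise constants. This is a minimal illustration, not the authors' fitting code; the function names are hypothetical, [L] is assumed to be the free (not total) ligand concentration in M, and the KA values in the example are made up.

```python
from math import prod


def mole_fractions(intensities):
    """Step 3: intensity of each species divided by the summed intensity."""
    total = sum(intensities)
    return [value / total for value in intensities]


def sequential_fractions(free_ligand, k_assoc):
    """Eq. 4: fraction F_n of protein carrying n ligands, for n = 0..N.

    free_ligand -- free ligand concentration [L] (M)
    k_assoc     -- stepwise association constants [K_A1, ..., K_AN] (M^-1)
    """
    # Unnormalized weights [L]^n * prod_{j=1..n} K_Aj; the apo (n = 0) term is 1.
    weights = [1.0]
    for n in range(1, len(k_assoc) + 1):
        weights.append(free_ligand ** n * prod(k_assoc[:n]))
    # Denominator of Eq. 4: 1 + sum_{i=1..N} [L]^i prod_{j=1..i} K_Aj
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: three binding events at 10 uM free ligand.
print(sequential_fractions(10e-6, [1e5, 5e4, 1e4]))
```

Fitting measured mole fractions across a titration series to this model (for example, by nonlinear least squares) is what yields the KA values.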
Fig. 4 Soluble protein–ligand binding thermodynamics determined by native MS. Shown are (a) stacked representative mass spectra of GlnK titrated with serial ADP concentrations, (b) stacked plots of mole fraction of apo GlnK and GlnK(ADP)1–3 as a function of free ADP concentration collected at 25, 29, 33, and 37 °C, (c) van't Hoff plots, and (d) binding thermodynamic parameters for binding of one to three ADP molecules. Shown in (c) and (d) are the mean and standard deviation (n = 3) (Figure adapted with permission from ref. 21. Copyright 2016 American Chemical Society)
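The van't Hoff analysis in panels (c) and (d) can be illustrated with a short sketch. Assuming ΔH is temperature independent over the range studied (negligible heat capacity change), ln KA = −ΔH/(RT) + ΔS/R, so a linear fit of ln KA against 1/T gives ΔH from the slope and ΔS from the intercept, and ΔG = ΔH − TΔS. The function names are illustrative and the constants in the example are synthetic.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def vant_hoff_fit(temps_c, k_assoc):
    """Least-squares fit of ln K vs 1/T; returns (dH in J/mol, dS in J/(mol K))."""
    x = [1.0 / (t + 273.15) for t in temps_c]
    y = [math.log(k) for k in k_assoc]
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # slope = -dH / R
    intercept = y_mean - slope * x_mean  # intercept = dS / R
    return -slope * R, intercept * R


def gibbs_energy(d_h, d_s, temp_c):
    """dG = dH - T dS (J/mol) at the given temperature."""
    return d_h - (temp_c + 273.15) * d_s


# Synthetic K_A values generated from dH = -40 kJ/mol and dS = -50 J/(mol K).
temps = [25, 29, 33, 37]
ks = [math.exp(40000 / (R * (t + 273.15)) - 50 / R) for t in temps]
d_h, d_s = vant_hoff_fit(temps, ks)
print(round(d_h), round(d_s, 3), round(gibbs_energy(d_h, d_s, 25)))
```

If the van't Hoff plot is visibly curved, the constant-ΔH assumption fails and a form that includes a heat capacity term would be needed instead.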
Fig. 5 Membrane protein–lipid binding data fit to different binding models. (a) Representative mass spectrum showing POPE self-assembly and POPE–detergent complexes. (b, c) Plots of mole fraction of AmtB(POPE)0–5 determined from the titration series of POPE (dots) and the resulting fit (solid lines) using (b) a sequential binding model (R² = 0.96, χ² = 0.112) or (c) a modified sequential lipid-binding model (R² = 0.99, χ² = 0.019). (d) Plot of available POPE concentration calculated by Eq. 11 as a function of the total POPE concentration titrated. The KAGG value applied was extracted from the fit in (c) using the modified sequential lipid-binding model; in this example it equals 7.26 μM (Figure adapted with permission from ref. 21. Copyright 2016 American Chemical Society)
4 Notes
2. MS Instrument Tuning.
It has long been known that instrument tuning can have dramatic effects on the mass spectra obtained for protein ions [27, 28]. Acquiring data for the same protein on different types of mass spectrometers often yields markedly different mass spectra. It is therefore imperative to tune the parameters of each instrument to obtain resolved spectra with good signal-to-noise ratios. Adequate tuning ensures that proteins retain their folded, native-like state, even after removal of detergent and solvent within the mass spectrometer. These parameters include, but are not limited to, source temperature, source pressure, capillary voltage, sampling cone voltage, trap and transfer cell collision voltages, trap and IMS cell bias potentials, and other parameters specific to each mass spectrometer [32].
To assess the extent of protein folding/unfolding in the gas phase, the collision cross section (CCS) or arrival time distribution of protein ions obtained from IM coupled with MS can be plotted as a function of collision voltage (or internal energy applied). This reveals the unfolding pathway of a protein ion, from a native-like conformation at low collision energy, through intermediates, to a partially unfolded structure (Fig. 6a, b) [3]. Additionally, the mole fractions of apo and ligand/lipid-bound protein species can be plotted against the applied collision voltage/energy to determine the energy range within which the complex does not dissociate in the mass spectrometer (Fig. 6c).
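The plot in Fig. 6c is inspected for collision voltages at which the apo and bound populations do not change. One way to make that selection explicit is sketched below: starting at the lowest voltage, it extends the window while the apo mole fraction stays within a tolerance of its low-voltage value. The tolerance, the function name, and the example data are hypothetical choices for illustration, not values from this protocol.

```python
def stable_voltage_window(voltages, apo_fraction, tol=0.05):
    """Return (v_low, v_high): the contiguous range, starting at the lowest
    collision voltage, over which the apo mole fraction stays within `tol`
    of its value at the lowest voltage (no detectable dissociation)."""
    points = sorted(zip(voltages, apo_fraction))
    v_low, baseline = points[0]
    v_high = v_low
    for v, f in points:
        if abs(f - baseline) > tol:
            break  # bound ligands/lipids start to dissociate beyond this point
        v_high = v
    return v_low, v_high


# Hypothetical collision-voltage ramp (V) and apo mole fractions.
volts = [20, 40, 60, 80, 100, 120]
apo = [0.30, 0.30, 0.33, 0.34, 0.45, 0.62]
print(stable_voltage_window(volts, apo))
```

The voltage finally chosen should also fall within the native-like region of the unfolding plot (Fig. 6a, b), so both criteria are checked together.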
Fig. 6 Optimization of native mass spectrometry (MS) settings for membrane proteins. (a) Experimental and (b) modeled gas-phase unfolding plots of the 15+ ion of AmtB (fitting χ² = 2.96). Extraction regions for the native-like (green) and the first (blue), second (gray), and third (purple) partially unfolded states are shown. (c) Plots of the mole fraction of apo and lipid-bound AmtB as a function of collision voltage for 1.5 μM AmtB mixed with 16.7 μM POPE. Based on these results, a collision voltage of 60 V, at which the native-like state of AmtB is preserved, is selected for native MS experiments (Figure adapted with permission from ref. 21. Copyright 2016 American Chemical Society)
Table 1
Thermodynamic parameters for soluble protein–ligand binding
Fig. 7 Native MS reveals thermodynamic signatures of individual lipid binding events to AmtB. (a) Representative mass spectrum in the series of AmtB titrated with POPA. (b) Plots of mole fraction for AmtB and AmtB(POPA)1–5 determined from a titration series of POPA (dots) and the resulting fit (R² = 0.99) from a sequential lipid-binding model (solid lines) collected at 29 °C. (c) Binding thermodynamics for POPE, POPG, POPS, POPA, and TOCDL to AmtB determined through van't Hoff analysis for binding of the first, second, and third lipid (labeled as 1–3). Shown above are the headgroup structures (p < 0.01 for ΔH and –TΔS; p > 0.9 for ΔG, one-way ANOVA, n = 3). (d) Thermodynamics of AmtB binding PG lipids with increasing acyl chain length: DLPG, DMPG, and DPPG. Trend lines are plotted for binding the first, second, and third lipid (p < 0.01 for ΔH and –TΔS; p > 0.9 for ΔG, one-way ANOVA, n = 3). (e) Thermodynamics of the AmtB double mutant (N72A/N79A) binding the first, second, and third POPG or POPE molecule. Reported in (c–e) are the mean and standard deviation (n = 3) (Figure adapted with permission from ref. 21. Copyright 2016 American Chemical Society)
Acknowledgments
References
1. Yildirim MA, Goh KI, Cusick ME, Barabasi AL, Vidal M (2007) Drug-target network. Nat Biotechnol 25(10):1119–1126
2. Overington JP, Al-Lazikani B, Hopkins AL (2006) How many drug targets are there? Nat Rev Drug Discov 5(12):993–996
3. Laganowsky A, Reading E, Allison TM, Ulmschneider MB, Degiacomi MT, Baldwin AJ, Robinson CV (2014) Membrane proteins bind lipids selectively to modulate their structure and function. Nature 510(7503):172–175
4. Lee AG (2011) Biological membranes: the importance of molecular detail. Trends Biochem Sci 36(9):493–500
5. Contreras FX, Ernst AM, Wieland F, Brugger B (2011) Specificity of intramembrane protein-lipid interactions. Cold Spring Harb Perspect Biol 3(6):a004705
6. Bogdanov M, Dowhan W, Vitrac H (2014) Lipids and topological rules governing membrane protein assembly. Biochim Biophys Acta 1843(8):1475–1488
7. Singer SJ, Nicolson GL (1972) The fluid mosaic model of the structure of cell membranes. Science 175(4023):720–731
8. Poveda JA, Giudici AM, Renart ML, Molina ML, Montoya E, Fernandez-Carvajal A, Fernandez-Ballester G, Encinar JA, Gonzalez-Ros JM (2014) Lipid modulation of ion channels through specific binding sites. Biochim Biophys Acta 1838(6):1560–1567
9. Jiang QX, Gonen T (2012) The influence of lipids on voltage-gated ion channels. Curr Opin Struct Biol 22(4):529–536
10. Eisenberg DS, Crothers DM (1979) Physical chemistry: with applications to the life sciences. Benjamin/Cummings Publishing Company, Menlo Park, CA
11. Gilson MK, Zhou HX (2007) Calculation of protein-ligand binding affinities. Annu Rev Biophys Biomol Struct 36:21–42
12. Gibbs JW (1873) A method of geometrical representation of the thermodynamic properties of substances by means of surfaces. Trans Conn Acad Arts Sci 2:23
13. Keserü G, Swinney DC, Mannhold R, Kubinyi H, Folkers G (2015) Thermodynamics and kinetics of drug binding. Wiley-VCH, Weinheim
14. Raffa RB (2001) Drug-receptor thermodynamics: introduction and applications. Wiley, Chichester
15. Klebe G (2015) Applying thermodynamic profiling in lead finding and optimization. Nat Rev Drug Discov 14(2):95–110
16. Freire E (2006) Overcoming HIV-1 resistance to protease inhibitors. Drug Discov Today Dis Mech 3(2):281–286
17. Vuignier K, Schappler J, Veuthey JL, Carrupt PA, Martel S (2010) Drug-protein binding: a critical review of analytical tools. Anal Bioanal Chem 398(1):53–66
18. Loo JA (1997) Studying noncovalent protein complexes by electrospray ionization mass spectrometry. Mass Spectrom Rev 16(1):1–23
19. Gulbakan B, Barylyuk K, Zenobi R (2015) Determination of thermodynamic and kinetic properties of biomolecules by mass spectrometry. Curr Opin Biotechnol 31:65–72
20. Pacholarz KJ, Garlish RA, Taylor RJ, Barran PE (2012) Mass spectrometry based tools to investigate protein-ligand interactions for drug discovery. Chem Soc Rev 41(11):4335–4355
21. Cong X, Liu Y, Liu W, Liang X, Russell DH, Laganowsky A (2016) Determining membrane protein-lipid binding thermodynamics using native mass spectrometry. J Am Chem Soc 138(13):4346–4349
22. Laganowsky A, Benesch JL, Landau M, Ding L, Sawaya MR, Cascio D, Huang Q, Robinson CV, Horwitz J, Eisenberg D (2010) Crystal structures of truncated alphaA and alphaB crystallins reveal structural mechanisms of polydispersity important for eye lens function. Protein Sci 19(5):1031–1043
23. Allison TM, Reading E, Liko I, Baldwin AJ, Laganowsky A, Robinson CV (2015) Quantifying the stabilizing effects of protein-ligand interactions in the gas phase. Nat Commun 6:8551
24. Marty MT, Baldwin AJ, Marklund EG, Hochberg GK, Benesch JL, Robinson CV (2015) Bayesian deconvolution of mass and ion mobility spectra: from binary interactions to polydisperse ensembles. Anal Chem 87(8):4370–4376
25. Subbarow CHF (1925) The colorimetric determination of phosphorus. J Biol Chem 66:26
26. Chen PS, Toribara TY, Warner H (1956) Microdetermination of phosphorus. Anal Chem 28(11):3
27. Ruotolo BT, Benesch JL, Sandercock AM, Hyung SJ, Robinson CV (2008) Ion mobility-mass spectrometry analysis of large protein complexes. Nat Protoc 3(7):1139–1152
28. Laganowsky A, Reading E, Hopper JT, Robinson CV (2013) Mass spectrometry of intact membrane protein complexes. Nat Protoc 8(4):639–651
29. van't Hoff MJH (1884) Etudes de dynamique chimique. Recl Trav Chim Pays-Bas 3(10):333–336
30. Prabhu NV, Sharp KA (2005) Heat capacity in proteins. Annu Rev Phys Chem 56:521–548
31. Reading E, Liko I, Allison TM, Benesch JL, Laganowsky A, Robinson CV (2015) The role of the detergent micelle in preserving the structure of membrane proteins in the gas phase. Angew Chem Int Ed Engl 54(15):4577–4581
32. Liu Y, Cong X, Liu W, Laganowsky A (2017) Characterization of membrane protein-lipid interactions by mass spectrometry ion mobility mass spectrometry. J Am Soc Mass Spectrom 28(4):579–586
33. Daneshfar R, Kitova EN, Klassen JS (2004) Determination of protein-ligand association thermochemistry using variable-temperature nanoelectrospray mass spectrometry. J Am Chem Soc 126(15):4786–4787
34. Radchenko MV, Thornton J, Merrick M (2010) Control of AmtB-GlnK complex formation by intracellular levels of ATP, ADP, and 2-oxoglutarate. J Biol Chem 285(40):31037–31045
35. Telmer PG, Shilton BH (2003) Insights into the conformational equilibria of maltose-binding protein by analysis of high affinity mutants. J Biol Chem 278(36):34555–34567
36. Maple HJ, Garlish RA, Rigau-Roca L, Porter J, Whitcombe I, Prosser CE, Kennedy J, Henry AJ, Taylor RJ, Crump MP, Crosby J (2012) Automated protein-ligand interaction screening by mass spectrometry. J Med Chem 55(2):837–851
37. Maple HJ, Scheibner O, Baumert M, Allen M, Taylor RJ, Garlish RA, Bromirski M, Burnley RJ (2014) Application of the Exactive Plus EMR for automated protein-ligand screening by non-covalent mass spectrometry. Rapid Commun Mass Spectrom 28(13):1561–1568
38. Dunitz JD (1995) Win some, lose some: enthalpy-entropy compensation in weak intermolecular interactions. Chem Biol 2(11):709–712
39. Marsh D (2013) Handbook of lipid bilayers, 2nd edn. Taylor & Francis Group, Boca Raton, FL, p 387
Chapter 4
Abstract
Western blot processing is a well-established procedure that includes protein extraction from tissues and cells, separation by gel electrophoresis, transfer to a membrane, and immunodetection with specific antibodies. Here, we show that optimizing the washing steps helps to maximize specific antigen–antibody interactions. Performing all washing steps at 4 °C maximizes the signal-to-noise ratio and reduces nonspecific signals.
1 Introduction
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349,
https://doi.org/10.1007/978-1-0716-1585-0_4, © Springer Science+Business Media, LLC, part of Springer Nature 2022
66 Russ Yukhananov et al.
2 Materials
All solutions have been prepared using purified water and analytical
grade reagents (see Note 1).
NP-40 lysis buffer:
Western Blot Processing Optimization: The Perfect Blot 67
50 mM Tris, pH 8.0.
0.1% SDS (sodium dodecyl sulfate).
0.5% sodium deoxycholate
The 10% sodium deoxycholate stock solution (5 g in 50 ml) must be protected from light.
Laemmli 2× buffer (loading buffer):
4% SDS.
10% 2-mercaptoethanol.
20% glycerol.
0.004% bromophenol blue.
0.125 M Tris–HCl.
Adjust the pH to 6.8 using HCl (see Note 2).
Transfer buffer:
25 mM Tris, pH 7.5.
200 mM glycine.
20% methanol.
Adjust the pH to 8.3.
1× running buffer (Tris–glycine/SDS):
25 mM Tris–HCl.
200 mM glycine.
0.1% (w/v) SDS.
pH is adjusted to 8.3.
Tris–glycine 10× or 1× stock solution is commercially available from multiple vendors. For proteins larger than 80 kDa, we recommend including SDS at a final concentration of 0.1%.
PBST buffer:
100 mM sodium phosphate dibasic.
20 mM potassium phosphate monobasic.
27 mM KCl.
1.37 M NaCl.
0.1–0.5% Tween-20.
Check the pH and adjust to pH 7.4 ± 0.2 (see Note 2).
Blocking buffer:
3% milk or 3% goat serum is added to PBST buffer, mixed well, and filtered. Failure to filter can lead to spotting, in which tiny dark grains contaminate the blot during detection.
3 Methods
3.1 Sample Preparation
The sample preparation protocol described here is for HeLa cells or the clear cell renal carcinoma (ccRCC) cell line. However, similar procedures can be used for a variety of cell types.
Cell culture dishes were placed on ice and washed with ice-cold PBS. The PBS was removed by aspiration, and ice-cold lysis buffer (about 1 ml per ten million cells) was added. Adherent cells were scraped off the dish using a cold plastic cell scraper, gently transferred into a pre-cooled microcentrifuge tube, and agitated for 15 min at 4 °C. Cells were centrifuged for 10 min at 3000 × g (12,000 rpm) in a refrigerated centrifuge; the supernatant was then gently removed and the pellet discarded. A small volume of lysate was removed to measure protein concentration using a Bradford assay (see Note 2).
The lysate was aliquoted in 0.5 ml volumes and stored at −80 °C (see Note 1).
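A g-force and an rpm value correspond to each other only for a particular rotor radius, so it is safest to set the centrifuge by relative centrifugal force (RCF). The standard conversion is RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in cm. A small helper is sketched below; the 8 cm radius in the example is an assumption, so substitute the radius given in your rotor's specifications.

```python
import math

RCF_CONSTANT = 1.118e-5  # RCF = 1.118e-5 * r(cm) * rpm^2


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a rotor speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given RCF at this radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Example with an assumed 8 cm rotor radius (check your rotor's specification).
print(round(rcf_from_rpm(12000, 8.0)))  # g-force at 12,000 rpm
print(round(rpm_from_rcf(3000, 8.0)))   # rpm needed for 3000 x g
```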
with the anionic denaturing detergent SDS is used, and the mixture is boiled at 95–100 °C for 5 min. Heating at 70 °C for 5–10 min is also acceptable and may be preferable when studying membrane proteins, which tend to aggregate when boiled; the aggregates may not enter the gel efficiently. With SDS, all of the proteins become negatively charged through binding of SDS anions. SDS confers a negative charge on the polypeptide in proportion to its length; that is, the denatured proteins acquire nearly equal charge densities per unit length. In denaturing SDS-PAGE separations, therefore, migration is determined not by a protein's intrinsic electrical charge but by its molecular weight.
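Because SDS-PAGE migration depends on molecular weight, the size of an unknown band is commonly estimated from a standard curve: over the gel's linear range, log10(MW) of the marker proteins is approximately linear in their relative mobility (Rf, band migration distance divided by dye-front distance). The sketch below fits that line by least squares; the marker values in the example are invented, and on gradient gels the relationship may be linear only over part of the range.

```python
import math


def relative_mobility(band_mm, dye_front_mm):
    """Rf = distance migrated by the band / distance migrated by the dye front."""
    return band_mm / dye_front_mm


def fit_standard_curve(rf_markers, mw_markers_kda):
    """Least-squares fit of log10(MW) = intercept + slope * Rf."""
    y = [math.log10(mw) for mw in mw_markers_kda]
    n = len(rf_markers)
    x_mean, y_mean = sum(rf_markers) / n, sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in rf_markers)
    sxy = sum((x - x_mean) * (yi - y_mean) for x, yi in zip(rf_markers, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def estimate_mw(rf, slope, intercept):
    """Interpolate an unknown band's MW (kDa); avoid extrapolating past the markers."""
    return 10 ** (intercept + slope * rf)


# Invented marker ladder (Rf, kDa) and one unknown band at 32 mm with a 60 mm dye front.
slope, intercept = fit_standard_curve([0.1, 0.3, 0.5, 0.7, 0.9], [150, 75, 37, 20, 10])
print(round(estimate_mw(relative_mobility(32, 60), slope, intercept), 1))
```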
It might be necessary to reduce disulfide bridges in proteins using β-mercaptoethanol or dithiothreitol (DTT). This allows proteins to adopt a conformation suitable for separation by size. Glycerol is added to the loading buffer to increase the density of the sample so that the loaded sample sinks to the bottom of the well, reducing uneven gel loading. To enable visualization of protein migration, a small anionic dye molecule (often bromophenol blue) is included in the loading buffer. Because the dye is anionic and small, it migrates faster than any protein and serves as an indicator of the progress of the separation. During sample treatment, the sample should be mixed by vortexing before and after the heating step for best resolution, and briefly centrifuged before loading into the gel.
The standard loading buffer, called 2× Laemmli buffer, was first described by Laemmli in 1970 [9]. It can also be made at higher concentrations (4× and 6×) to minimize dilution of the samples. The 2× buffer is mixed 1:1 with the sample. In this example, 10 μl of the sample (about 15–20 μg of protein) was diluted 1:1 with 2× loading buffer, boiled at 95 °C for 5 min, cooled on ice, centrifuged, and loaded on a precast gel. Equal amounts of protein were loaded into each well of the SDS-PAGE gel, along with molecular weight markers. HeLa cell extract containing 15 μg of total protein was loaded, along with 40 ng of purified protein (hIL6). A precast 4–12% gradient gel was used. Electrophoresis was run for 1–2 h at 100 V, until the blue dye reached the bottom of the gel (see Note 3).
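The loading arithmetic can be written out explicitly: the lysate volume is the target protein mass divided by the protein concentration (from the Bradford assay), and an N× buffer must end up diluted N-fold. The helper below is a hypothetical sketch; the 1.5 μg/μl concentration in the example is an assumption chosen to match the 10 μl/15 μg figures above.

```python
def loading_volumes(target_ug, conc_ug_per_ul, buffer_fold=2, final_ul=None):
    """Return (sample_ul, buffer_ul, water_ul) for loading `target_ug` of protein.

    buffer_fold -- concentration of the Laemmli stock (2 for 2x, 4 for 4x, ...)
    final_ul    -- optional final volume; if omitted, no water is added and the
                   buffer volume is chosen so the buffer ends up at 1x.
    """
    sample_ul = target_ug / conc_ug_per_ul
    if final_ul is None:
        buffer_ul = sample_ul / (buffer_fold - 1)  # e.g. 2x buffer -> 1:1 mix
        return sample_ul, buffer_ul, 0.0
    buffer_ul = final_ul / buffer_fold
    water_ul = final_ul - sample_ul - buffer_ul
    if water_ul < 0:
        raise ValueError("sample too dilute for this final volume")
    return sample_ul, buffer_ul, water_ul


print(loading_volumes(15, 1.5))                 # 2x buffer, mixed 1:1
print(loading_volumes(15, 1.5, buffer_fold=4))  # 4x buffer, less dilution
```

The second call shows why 4× and 6× buffers are useful: the same protein mass is loaded in a smaller total volume.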
3.3 Transferring the Protein from the Gel to the Membrane
There are two main types of protein transfer: wet transfer and semi-dry transfer. Traditional wet transfer is still the most common method for transferring proteins from gel to membrane and offers high efficiency, but at a cost of time and effort. Semi-dry transfer is faster than wet transfer [10]; semi-dry systems can efficiently transfer proteins to membranes in 7 min. In our hands, the quality of transfer was similar between the new semi-dry blotters and the traditional wet transfer method. Transfer can be done through a stack of capillary paper or by electroblotting. Electroblotting is more consistent and faster, but otherwise there is no
3.5 Incubation with Primary Antibodies
The membranes are incubated with the primary antibodies, rabbit anti-IL2 and mouse anti-α-tubulin, for 2 h at RT or for 10 h at 4 °C with constant shaking.
Western blots with a ccRCC cell extract were incubated with the rabbit polyclonal anti-PARP antibody for 8 h at 4 °C with constant shaking.
3.6 Washing the Membrane After Primary Antibody Incubation
The membranes were washed with PBST or TBST buffer three times at RT or five times at 4 °C, each wash using 20 ml of washing buffer for 5–15 min with constant shaking.
3.7 Secondary Antibody Incubation
The membranes were incubated with fluorescently labeled or horseradish peroxidase (HRP)-conjugated anti-rabbit IgG secondary antibodies for 1 h at RT or for 3 h at 4 °C, with constant shaking.
3.8 Washing After Secondary Antibodies
Membranes were washed with PBST or TBST four times at RT or six times at 4 °C.
3.9 Detection and Data Analysis
The fluorescent and chemiluminescent signals were measured using an imager. Data for each DyLight™ fluorophore were collected independently at the following excitation/emission wavelengths: 530/605 nm for DyLight™ 549 and 625/695 nm for DyLight™ 649. Fluorescent intensities were normalized between blots using a molecular weight standard. Raw data were collected as arbitrary light units, averaged, and quantified by microdensitometry using ImageJ 1.40g, which is freely provided by the National Institutes of Health (Bethesda, MD).
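The protocol states only that intensities were normalized between blots using a molecular weight standard. One plausible way to do that, sketched below as an assumption rather than the authors' exact procedure, is to divide each band's integrated density (for example, as measured in ImageJ) by the density of a chosen marker band on the same blot, and then express the result as a percentage of the same band on a reference blot. Band names and numbers are hypothetical.

```python
def normalize_to_marker(band_density, marker_density):
    """Divide each band's integrated density by the MW-marker band on the same blot."""
    return {band: value / marker_density for band, value in band_density.items()}


def percent_of_reference(normalized, reference):
    """Express normalized densities as % of the same band on a reference blot."""
    return {band: 100.0 * normalized[band] / reference[band] for band in normalized}


# Hypothetical densities (arbitrary light units) from two blots.
blot_4c = normalize_to_marker({"IL-2": 5000.0, "tubulin": 8000.0}, marker_density=2000.0)
blot_rt = normalize_to_marker({"IL-2": 3000.0, "tubulin": 6000.0}, marker_density=2500.0)
print(percent_of_reference(blot_rt, blot_4c))
```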
4 Notes
1. Sample preparation.
The lysis procedure disrupts the cell membrane and solu-
bilizes proteins, so they can migrate individually through the
separating gel. There are many recipes for lysis buffers; the
Table 1
Protease inhibitors
Inhibitor | Protease/phosphatase inhibited | Final concentration in lysis buffer | Stock
Aprotinin | Trypsin, chymotrypsin, plasmin | 2 μg/ml, dilute in water | 10 mg/ml; do not re-use aliquots
Leupeptin | Lysosomal proteases | 5–10 μg/ml, dilute in water | 5–10 μg/ml; dilute in water; do not re-use once defrosted
Pepstatin A | Aspartic proteases | 1 μg/ml | 1 mM, dilute in methanol
PMSF (phenylmethylsulfonyl fluoride) | Serine, cysteine proteases | 1 mM, dilute in ethanol | Aliquot can be re-used
EDTA | Metalloproteases that require Mg++ and Mn++ | 5 mM, dilute in water | 0.5 M; adjust pH to 8.0
EGTA | Metalloproteases that require Ca++ | 1 mM, dilute in water | 0.5 M; adjust pH to 8.0
Sodium fluoride (NaF) | Serine/threonine phosphatases | 5–10 mM, dilute in water | Do not re-use once defrosted
Sodium orthovanadate | Tyrosine phosphatases | 1 mM, dilute in water | Do not re-use once defrosted
Gradient gels can also be used if there are both low molec-
ular weight (LMW) and high molecular weight (HMW) pro-
teins that need to be measured in the same sample.
The SDS-PAGE system should include a tank, a lid with
power cables, electrode assembly, cell buffer dam, casting
stands, casting frames, combs, and glass plates. The SDS-
PAGE consists of a stacking gel and separating gel. The stack-
ing gel (acrylamide 5%) is poured on top of the separating gel
(after solidification) and a gel comb is inserted in the
stacking gel.
4. Immunodetection.
The reaction between antibodies in solution and a protein attached to a solid support can introduce bias and errors if not performed properly. There are many variations in how these procedures are performed, and they can produce dramatically different results. In general, the procedure should be optimized for each combination of antibody and antigen. Each step of immunodetection is important for achieving reliable and reproducible results, in particular the washing and antibody incubation (hybridization) steps. If it is impossible to optimize each step for the particular antibodies, a general guideline that works well with most antibody–antigen combinations can be followed.
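Adding the inhibitors listed in Table 1 to lysis buffer is a C1V1 = C2V2 calculation. The sketch below computes the volume of stock to add for a given final volume; it refuses to mix mass (mg/ml) and molar units, since converting between them needs the inhibitor's molecular weight. The unit table and function name are illustrative assumptions.

```python
# Concentration units mapped to (dimension, factor); mass units expressed in g/L.
UNITS = {
    "M": ("molar", 1.0),
    "mM": ("molar", 1e-3),
    "uM": ("molar", 1e-6),
    "mg/ml": ("mass", 1.0),
    "ug/ml": ("mass", 1e-3),
}


def stock_volume_ul(final, final_unit, stock, stock_unit, final_volume_ml):
    """Volume of stock (ul) giving `final` concentration in `final_volume_ml`
    (the total volume after the stock has been added)."""
    final_dim, final_factor = UNITS[final_unit]
    stock_dim, stock_factor = UNITS[stock_unit]
    if final_dim != stock_dim:
        raise ValueError("convert between mass and molar units using the MW first")
    # C1 * V1 = C2 * V2  ->  V1 = C2 * V2 / C1  (V2 converted from ml to ul)
    return final * final_factor * final_volume_ml * 1000.0 / (stock * stock_factor)


# Per 10 ml of lysis buffer: aprotinin (10 mg/ml -> 2 ug/ml) and EDTA (0.5 M -> 5 mM).
print(stock_volume_ul(2, "ug/ml", 10, "mg/ml", 10))
print(stock_volume_ul(5, "mM", 0.5, "M", 10))
```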
Table 2
Electrophoresis conditions
Protein state | Gel condition | Loading buffer | Migration buffer
Reduced, denatured | Reducing and denaturing | With β-mercaptoethanol or DTT, with SDS | With SDS
Reduced, native | Reducing and non-denaturing | With β-mercaptoethanol or DTT, no SDS | No SDS
Oxidized, denatured | Non-reducing and denaturing | No β-mercaptoethanol or DTT, with SDS | With SDS
Oxidized, native | Non-reducing and native | No β-mercaptoethanol or DTT, no SDS | No SDS
[Fig. 3 image: panels (a) and (b) show the blots; panel (c) is a bar chart of relative intensity for human IL-2 and tubulin at 4 °C and RT]
Fig. 3 Western blot processing at low temperature increased staining intensity. The blots were processed as described in Subheading 3. In each blot, the first lane contained a HeLa whole-cell lysate, the second lane recombinant human IL-2, and the third lane a mixture of HeLa whole-cell lysate and h-IL2. Following SDS-PAGE, proteins were transferred from the gel onto a nitrocellulose membrane as described in Methods. Membranes were blocked for 30 min at RT or for 90 min at 4 °C using blocking reagent and then incubated with the rabbit anti-IL2 and mouse anti-α-tubulin primary antibodies for 2 h at RT (a) or for 10 h at 4 °C (b), followed by incubation with DyLight™ 549-conjugated anti-mouse IgG and DyLight™ 649-conjugated anti-rabbit IgG secondary antibodies for 1 h at RT (a) or 3 h at 4 °C (b). Data for each DyLight™ fluorophore were collected independently at excitation/emission wavelengths of 530/605 nm for DyLight™ 549 and 625/695 nm for DyLight™ 649. (a) All steps of immunodetection (blocking, primary and secondary antibody incubation, all washing steps) were done at RT. (b) All steps of immunodetection (blocking, primary and secondary antibody incubation, all washing steps) were done at 4 °C. (c) Comparison of fluorescence intensity between blots processed at RT and at 4 °C. The fluorescent intensities of different blots were normalized using the MW standard
Fig. 4 Western blots with cell extract from the clear cell renal carcinoma (ccRCC) cell line were incubated with the rabbit polyclonal anti-PARP antibody using the traditional manual procedure with all washing at RT (a) or with all processing done at 4 °C (b). The membranes were prepared as described in Methods. The bands were visualized using a chemiluminescence substrate. (a) Incubation with the primary antibody was carried out at 4 °C, while the rest of the procedure was performed at room temperature. (b) The entire blot processing was done at 4 °C. When the blot was processed manually at RT (a), multiple nonspecific bands were observed. In contrast, the blot processed at 4 °C (b) displayed only the bands representing the uncleaved PARP protein (113 kDa) and the C-terminal and N-terminal cleaved PARP fragments (89 and 24 kDa, respectively). The results suggest the importance of performing the blocking and washing steps at 4 °C, especially for monoclonal antibodies
Acknowledgments
References
1. Wu WC, Walaas SI, Nairn AC, Greengard P (1982) Calcium/phospholipid regulates phosphorylation of a Mr "87k" substrate protein in brain synaptosomes. Proc Natl Acad Sci U S A 79:5249–5253
2. Conte A, Sigismund S (2017) Methods to investigate EGFR ubiquitination. Methods Mol Biol 1652:81–100. https://doi.org/10.1007/978-1-4939-7219-7_5
3. Pere-Brissaud A, Blanchet X, Delourme D et al (2015) Expression of SERPINA3s in cattle: focus on bovSERPINA3-7 reveals specific involvement in skeletal muscle. Open Biol 5:150071
4. Voelkel T, Andresen C, Unger A et al (2013) Lysine methyltransferase Smyd2 regulates Hsp90-mediated protection of the sarcomeric titin springs and cardiac function. Biochim Biophys Acta, Mol Cell Res 1833:812–822
5. Sarge KD, Park-Sarge O-K (2009) Detection of proteins sumoylated in vivo and in vitro. Methods Mol Biol 590:265–277
6. Yalow RS, Glick SM, Roth J, Berson SA (1964) Radioimmunoassay of human plasma ACTH. J Clin Endocrinol Metab 24:1219–1225
7. Towbin H, Staehelin T, Gordon J (1979) Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets: procedure and some applications. Proc Natl Acad Sci 76:4350–4354. https://doi.org/10.1073/pnas.76.9.4350
8. Burnette WN (1981) "Western blotting": electrophoretic transfer of proteins from sodium dodecyl sulfate-polyacrylamide gels to unmodified nitrocellulose and radiographic detection with antibody and radioiodinated protein A. Anal Biochem 112:195–203
9. Laemmli UK (1970) Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227:680–685
10. Wisdom GB (1994) Protein blotting. Methods Mol Biol 32:207–213
Chapter 5
FISHing on a Budget
Gable M. Wadsworth and Harold D. Kim
Abstract
Sensitive quantification of RNA transcripts via fluorescence in situ hybridization (FISH) is a ubiquitous part
of understanding quantitative gene expression in single cells. Many techniques exist to identify and localize
transcripts inside the cell, but often they are costly and labor intensive. Here we present a method to use a
singly labeled short DNA oligo probe to perform FISH in yeast cells. This method is effective for highly
constrained FISH applications where the target length is limited (<200 nucleotides). This method can
quantify different RNA isoforms or enable the use of fluorescence resonance energy transfer (FRET) to
detect co-transcription of neighboring sequence blocks. Since this method relies on a single probe, it is also
more cost-effective than a multiple probe labeling strategy.
Key words Single molecule, RNA FISH, In situ hybridization, Isoform profiling, FRET
1 Introduction
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349,
https://doi.org/10.1007/978-1-0716-1585-0_5, © Springer Science+Business Media, LLC, part of Springer Nature 2022
82 Gable M. Wadsworth and Harold D. Kim
2 Materials
Table 1
Cost of FISH probes
2.2.3 Washing, Hybridization, and Mounting
1. Wash Buffer: 10% formamide (v/v), 2× SSC, and RNase-free water.
2. Hybridization Buffer: 10% dextran sulfate (w/v), 1 mg/mL Escherichia coli tRNA, 2 mM vanadyl ribonucleoside complex, 0.2 mg/mL BSA, 2× SSC, and RNase-free water.
3 Methods
3.1 Choosing a FISH Method
In addition to cost, there are three important considerations when choosing which FISH method to use.
1. Is absolute copy number important?
2. How long is the sequence of interest?
3. How sensitive is the equipment used for data acquisition?
In many experiments, the absolute number of transcripts is less important than the change in the number of transcripts caused by some perturbation. In these cases, a systematic error, such as that introduced by the inactive fluorophores inherently present in a population of labeled DNA oligos, does not change the relationship between transcript levels when two strains are compared. In cases where absolute copy number is important, it is necessary to use a multiple-probe labeling strategy (Fig. 1a). The single-fluorophore strategy (Fig. 1b) can be extended to this case by incorporating a hybridization chain reaction (HCR) design [8] (Fig. 1c), which provides signal amplification to reduce false negatives. For short sequences, or for discriminating between short features of a long sequence, the limited length of available sequence requires a short probe, and specificity can be enhanced using a hairpin probe design (Fig. 1d). In cases such as alternative sites of initiation, a pair of FRET probes is another option (Fig. 1e). Performing these experiments requires sensitive equipment. In contrast to multiple-probe FISH experiments, which are typically performed with a widefield microscope, a cooled CCD camera, and epi-fluorescence geometry, the single-probe design requires a system capable of sensitive single-molecule detection. The minimum equipment needed for single-fluorophore detection is a widefield microscope capable of TIR/HILO illumination and equipped with an EMCCD camera. We expect that light-sheet or spinning-disk confocal microscopy combined with an sCMOS camera would work at least equally well. However, because of photobleaching and long acquisition times, most point-scanning confocal systems are not suitable.
Fig. 1 Schematic of FISH probe design. (a) In conventional FISH methods, 20–50 unique end-labeled probes are used. These are typically labeled as a pooled set of oligos in a single reaction, so they cannot be used independently. (b) This design uses a singly labeled DNA oligo probe to quantify RNA. (c) Two hairpin probes are designed to be metastable in the absence of the RNA input. When the RNA is present, HCR (hybridization chain reaction) provides enzyme-free signal amplification. (d) A hairpin probe is designed to be metastable under the hybridization conditions. This probe targets 26 nucleotides of the RNA, but 16 of them are masked to improve specificity. (e) To use fluorescence resonance energy transfer (FRET) with FISH, the probes should be positioned so that the labeled 5′ and 3′ ends are approximately six nucleotides apart. (f) The RNA is detected with an unlabeled probe, which is then hybridized by a labeled secondary probe. This allows one labeled oligo to be used for many detection schemes, reducing cost
3.2 Probe Design
The power of a single-fluorophore probe method is that it enables
specific detection of short regions of RNA transcripts. If the
sequence of interest is unconstrained, then the choice of probe
target location should depend on a combination of homology
with other genomic sequences, RNA-DNA melting temperature,
probe secondary structure, and target secondary structure. Ideally,
the melting temperature should be around 60 °C, and the sequence
should have less than 60% identity to any other genomic
sequence according to BLAST. Secondary structure can be
predicted by using Mfold [8] (see Note 1). If secondary structure
is unavoidable, the probe and the target should have at least
seven consecutive nucleotides that are fully accessible for
toehold-mediated strand displacement [9].
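The probe-selection criteria above can be pre-screened in a few lines of code before ordering oligos. The sketch below is illustrative only and rests on stated assumptions: it estimates Tm with the simple GC-content rule of thumb (Tm = 64.9 + 41(GC − 16.4)/N), which ignores salt, probe concentration, and RNA-DNA hybrid thermodynamics; it uses ungapped identity against a few off-target sequences as a stand-in for BLAST; and it reads target accessibility from a dot-bracket structure string (converting folding-tool output into this notation is assumed). The function names, the 55–65 °C window, and the example sequences are hypothetical.

```python
"""Rough pre-screen of a single-fluorophore FISH probe (illustrative sketch)."""


def gc_tm(probe: str) -> float:
    """Approximate Tm (deg C) from GC content: 64.9 + 41 * (GC - 16.4) / N.

    A DNA-duplex rule of thumb; RNA-DNA hybrids and salt are not modeled.
    """
    seq = probe.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def max_ungapped_identity(query: str, subject: str) -> float:
    """Best fraction of identical bases over all ungapped placements of the
    query fully inside the subject (a crude BLAST stand-in).

    Returns 0.0 if the subject is shorter than the query.
    """
    q, s = query.upper(), subject.upper()
    best = 0.0
    for start in range(len(s) - len(q) + 1):
        window = s[start:start + len(q)]
        matches = sum(a == b for a, b in zip(q, window))
        best = max(best, matches / len(q))
    return best


def longest_accessible_run(dot_bracket: str) -> int:
    """Longest stretch of unpaired ('.') positions in a dot-bracket string."""
    longest = current = 0
    for ch in dot_bracket:
        current = current + 1 if ch == "." else 0
        longest = max(longest, current)
    return longest


def passes_screen(probe, off_targets, target_structure,
                  tm_window=(55.0, 65.0), max_identity=0.60, min_toehold=7):
    """Apply the ~60 C Tm, <60% identity, and >=7-nt accessibility rules."""
    tm_ok = tm_window[0] <= gc_tm(probe) <= tm_window[1]
    identity_ok = all(max_ungapped_identity(probe, s) < max_identity
                      for s in off_targets)
    access_ok = longest_accessible_run(target_structure) >= min_toehold
    return tm_ok and identity_ok and access_ok


if __name__ == "__main__":
    candidate = "GCGCGCGCGCGCGATATATA"  # hypothetical 20-mer
    print(round(gc_tm(candidate), 2),
          passes_screen(candidate, ["T" * 24], ".........((((...))))"))
```

Candidates that pass this screen would still need to be checked with BLAST and with a proper RNA-DNA Tm calculation, as the text describes.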
If a FRET-based design is desired, then the two probes should
target sequences separated by two nucleotides, and the labeling
should be done so that the pair of dye molecules is on the
proximate 3′ and 5′ ends of the two probes.
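The requirement that the dyes sit on the proximate ends of the two probes follows from the steep distance dependence of Förster transfer, E = 1/(1 + (r/R0)^6). The sketch below illustrates that dependence. The Förster radius R0 depends on the dye pair and must be supplied (the 5.0 nm in the example is a placeholder), and converting a nucleotide gap into a distance with a fixed 0.34 nm rise per base is a crude idealization that ignores dye linkers, helix radius, and the A-like geometry of RNA-DNA hybrids.

```python
def fret_efficiency(distance_nm: float, r0_nm: float) -> float:
    """Foerster transfer efficiency E = 1 / (1 + (r / R0)**6)."""
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


def separation_to_distance(n_nucleotides: int, rise_nm: float = 0.34) -> float:
    """Idealized dye separation: number of nucleotides times helical rise.

    0.34 nm is the B-form rise and is only an assumption here; linkers,
    helix radius, and dye orientation are ignored.
    """
    return n_nucleotides * rise_nm


if __name__ == "__main__":
    r0 = 5.0  # hypothetical Foerster radius (nm); look up the value for your dyes
    for gap in (2, 6, 15, 30):
        r = separation_to_distance(gap)
        print(f"{gap:>2} nt -> {r:.2f} nm, E = {fret_efficiency(r, r0):.3f}")
```

Because E falls with the sixth power of distance, efficiency drops sharply once the dye separation approaches R0, which is why the labeled ends must be adjacent across a short gap.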
A single-fluorophore probe can also be designed as a hairpin
with a 10-nt toehold and a sequestered 10-nt loop, which can
function as a toehold for a secondary probe to provide signal
amplification via HCR. These probes are designed to be metastable
86 Gable M. Wadsworth and Harold D. Kim
3.3 Cell Culture, Fixation, and Spheroplasting
1. At the end of the day, inoculate SD liquid medium in a culture
flask with cells actively growing on an SD complete plate.
2. After overnight growth, measure the optical density of cells
using a spectrophotometer at 600 nm by pipetting 1 mL of
cell culture into a cuvette.
3. Upon reaching an OD600 of 0.6, decant the cell culture into 50-mL
Falcon tubes and pellet at 671 × g.
4. Resuspend the pellet in 10 mL of ice-cold (4 °C) methanol for
10 min.
5. Pellet the cells and aspirate the supernatant.
6. Wash the cells twice with ice-cold Buffer B (resuspend, pellet,
and aspirate).
7. Prepare fresh spheroplasting buffer.
8. Resuspend cells in 1 mL of spheroplasting buffer containing
2 μL of zymolyase and pipette to mix gently.
9. Measure the initial cell OD600 by adding 100 μL of cell solution to
900 μL of deionized water and letting it stand for 1 min.
Spheroplasted cells should undergo lysis in this condition. Incubate
the remaining cell solution for 30 min and check the cell OD600
once more.
10. Once the cell OD600 drops by at least 30% from the initial
measurement, pellet the cells and aspirate. Treat cells gently in all
further steps, centrifuging at no more than 268 × g (see Note 2).
11. Wash the cells twice with ice-cold Buffer B (resuspend, pellet,
and aspirate).
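The protocol specifies centrifugation as relative centrifugal force (671 × g, 268 × g), while many benchtop centrifuges are set in rpm, and step 10 relies on a relative OD600 drop. The sketch below shows both calculations under stated assumptions: it uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm² with the rotor radius r in cm (the radius is instrument specific, and the 15 cm in the example is hypothetical), and it assumes both OD600 readings are taken at the same 1:10 dilution so that the dilution factor cancels.

```python
import math

# RCF = 1.118e-5 * r_cm * rpm**2 (standard relation; r is rotor radius in cm)
RCF_CONSTANT = 1.118e-5


def rpm_for_rcf(rcf_g: float, radius_cm: float) -> float:
    """Rotor speed (rpm) that produces the requested RCF (x g)."""
    return math.sqrt(rcf_g / (RCF_CONSTANT * radius_cm))


def fractional_od_drop(od_initial: float, od_now: float) -> float:
    """Fractional decrease in OD600; both readings at the same dilution."""
    return (od_initial - od_now) / od_initial


def spheroplasting_done(od_initial: float, od_now: float,
                        min_drop: float = 0.30) -> bool:
    """True once OD600 has fallen by at least min_drop (30% in step 10)."""
    return fractional_od_drop(od_initial, od_now) >= min_drop


if __name__ == "__main__":
    radius_cm = 15.0  # hypothetical rotor radius; check the centrifuge manual
    for rcf in (671, 268):
        print(f"{rcf} x g -> {rpm_for_rcf(rcf, radius_cm):.0f} rpm")
    print(spheroplasting_done(0.50, 0.33))  # hypothetical OD600 readings
```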
3.5 Mounting
1. Prepare clean slides and coverslips (see Note 5).
2. Mix 3 μL of concentrated cells with 3 μL of Imaging Buffer on
the slide by gentle pipetting.
3. Place a coverslip on the cells by wetting the edge with the cell
solution and gently lowering to avoid trapping any bubbles
underneath.
4. Use a tissue to gently remove any excess liquid from the edges
of the coverslip and create a monolayer of cells. Do not apply
pressure to the cells through the coverslip.
5. Seal the edges of the coverslip using 5 min epoxy (see Note 6).
4 Notes
Acknowledgments
References
1. Patel AP, Tirosh I, Trombetta JJ, Shalek AK, Gillespie SM, Wakimoto H, Cahill DP, Nahed BV, Curry WT, Martuza RL (2014) Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344(6190):1396–1401
2. Femino AM, Fay FS, Fogarty K, Singer RH (1998) Visualization of single RNA transcripts in situ. Science 280(5363):585–590
3. Raj A, Van Den Bogaard P, Rifkin SA, Van Oudenaarden A, Tyagi S (2008) Imaging individual mRNA molecules using multiple singly labeled probes. Nat Methods 5(10):877
4. Wadsworth GM, Parikh RY, Choy JS, Kim HD (2017) mRNA detection in budding yeast with single fluorophores. Nucleic Acids Res 45(15):e141–e141
5. Choi HM, Chang JY, Trinh LA, Padilla JE, Fraser SE, Pierce NA (2010) Programmable in situ amplification for multiplexed imaging of mRNA expression. Nat Biotechnol 28(11):1208
6. Moffitt JR, Zhuang X (2016) RNA imaging with multiplexed error-robust fluorescence in situ hybridization (MERFISH). Methods Enzymol 572:1–49
7. Cold Spring Harbor Protocols (2015) Synthetic defined (SD) medium. Cold Spring Harb Protoc 2015:pdb.rec085639. https://doi.org/10.1101/pdb.rec085639
8. Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31(13):3406–3415
9. Broadwater DB, Altman RB, Blanchard SC, Kim HD (2018) ERASE: a novel surface reconditioning strategy for single-molecule experiments. Nucleic Acids Res 47(3):e14–e14
10. Broadwater DB Jr, Kim HD (2016) The effect of basepair mismatch on DNA strand displacement. Biophys J 110(7):1476–1484
11. Mueller F, Senecal A, Tantale K, Marie-Nelly H, Ly N, Collin O, Basyuk E, Bertrand E, Darzacq X, Zimmer C (2013) FISH-quant: automatic counting of transcripts in 3D FISH images. Nat Methods 10(4):277
Chapter 6
Abstract
High-resolution imaging with secondary ion mass spectrometry (nanoSIMS) has become a standard
method in systems biology and environmental biogeochemistry and is broadly used to decipher ecophysio-
logical traits of environmental microorganisms, metabolic processes in plant and animal tissues, and cross-
kingdom symbioses. When combined with stable isotope-labeling—an approach we refer to as nanoSIP—
nanoSIMS imaging offers a distinctive means to quantify net assimilation rates and stoichiometry of
individual cell-sized particles in both low- and high-complexity environments. While the majority of
nanoSIP studies in environmental and microbial biology have focused on nitrogen and carbon metabolism
(using 15N and 13C tracers), multiple advances have pushed the capabilities of this approach in the past
decade. The development of a high-brightness oxygen ion source has enabled high-resolution metal
analyses that are easier to perform, allowing quantification of metal distribution in cells and environmental
particles. New preparation methods, tools for automated data extraction from large data sets, and analytical
approaches that push the limits of sensitivity and spatial resolution have allowed for more robust characteri-
zation of populations ranging from marine archaea to fungi and viruses. NanoSIMS studies continue to be
enhanced by correlation with orthogonal imaging and ‘omics approaches; when linked to molecular
visualization methods, such as in situ hybridization and antibody labeling, these techniques enable in situ
function to be linked to microbial identity and gene expression. Here we present an updated description of
the primary materials, methods, and calculations used for nanoSIP, with an emphasis on recent advances in
nanoSIMS applications, key methodological steps, and potential pitfalls.
Key words NanoSIMS, Isotope assimilation, Metal imaging, Single-cell biology, Sample preparation,
SEM, TEM, FIB, FISH, O− ion source
1 Introduction
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349,
https://doi.org/10.1007/978-1-0716-1585-0_6, © Springer Science+Business Media, LLC, part of Springer Nature 2022
92 Jennifer Pett-Ridge and Peter K. Weber
These data have also enabled insights into the genomic potential of
uncultured organisms that exist in complex systems; however,
quantitative measures of metabolic functions of these organisms
and within-population variability remain largely untested. Isotope
tracing techniques are unique in their ability to identify in situ
ecophysiology of microorganisms and biogeochemical exchanges,
making them some of the most powerful techniques in microbial
ecology [1–6]. Among these approaches, the development of high-
spatial resolution secondary ion mass spectrometry [7], specifically
with a CAMECA NanoSIMS 50 and later the 50L, has opened up
new capabilities for taking on the challenge of single-cell scale
isotope imaging and has become a standard method for assessing
in situ metabolic activity.
Nanoscale secondary ion mass spectrometry (nanoSIMS) [7] is
a quantitative imaging technique in which a high-energy primary ion
beam is used to sputter small volumes of sample surface material,
generating secondary ions that are used to create atomic or
molecular ion maps. Its high lateral resolution (~50 nm) and
detection limits ranging from parts per million to high parts per
billion enable in situ characterization of isotope enrichment and
elemental composition at the single-cell level. The NanoSIMS 50 and
50L (CAMECA, Gennevilliers, France) can image 5–7 elements or
isotopes simultaneously; additional species can be imaged using a
magnetic peak switching approach [8]. These characteristics enable
mapping of trace element and isotopic variations with submicron-scale
resolution, including in subregions of individual cells (Figs. 1, 2, and 3).
Measurement precision in submicron regions is typically 1% for
isotope ratios; higher precision can be achieved in larger volumes.
Because natural isotopic variations are often comparable to or smaller
than this precision, nanoSIP studies typically involve isotope or rare
element labeling, although microscale imaging of naturally occurring
elemental or isotope fractionation patterns is possible [9, 10].
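The statement that precision improves in larger volumes follows from counting statistics: more sputtered material yields more counts. Assuming Poisson-distributed counts and no other error sources (an idealization; topography and instrument drift add to this), the relative standard error of a minor/major isotope count ratio is √(1/N_minor + 1/N_major). The sketch below computes that error and inverts it to estimate how many minor-isotope counts a target precision requires; the function names are illustrative.

```python
import math


def ratio_relative_error(n_minor: float, n_major: float) -> float:
    """Poisson-limited relative standard error of n_minor / n_major."""
    return math.sqrt(1.0 / n_minor + 1.0 / n_major)


def minor_counts_needed(target_rel_error: float, ratio: float) -> int:
    """Minor-isotope counts needed to reach a target relative error.

    With n_major = n_minor / ratio, the condition
    1/n_minor + ratio/n_minor = e**2 gives n_minor = (1 + ratio) / e**2.
    """
    return math.ceil((1.0 + ratio) / target_rel_error ** 2)


if __name__ == "__main__":
    # Roughly natural-abundance 13C/12C (~0.011) at 1% relative precision
    n13 = minor_counts_needed(0.01, 0.011)
    print(n13, ratio_relative_error(n13, n13 / 0.011))
```

For a ratio near natural-abundance 13C/12C, 1% precision requires on the order of 10⁴ minor-isotope counts, which a single submicron region may or may not supply within a practical analysis time.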
NanoSIMS was first intensively applied to meteoritic material
[11], and in the early 2000s to biological materials ranging from
cell membranes to bacteria, eukaryote symbionts, archaea, cyano-
bacteria, spores, biominerals, and soils [12–28]. Interest in nano-
SIMS applications for microbial ecology, cell biology, and
environmental science has grown quickly between that period and
the present, with multiple CAMECA nanoSIMS instruments in use
specifically for these applications. Today, nanoSIMS analysis is a
well-accepted technique and has been discussed in over 1000 pub-
lications from many disciplines. Multiple literature reviews have
been published that focus on applications including soils [28, 29],
biofilms [30], marine ecology [31], cell metabolism [32], plant
elemental distribution [33], the combination of nanoSIP with
fluorescent in situ hybridization (FISH) [34], general biological
applications [35–37], and cell membranes [38]. An updated list of
nanoSIMS literature in environmental biology and cell biology may
be found at https://www.cameca.com/products/sims/nanosims
NanoSIP: NanoSIMS Applications for Microbial Biology 93
[Fig. 1 image panels: δ15N, δ15N, TEM]
Fig. 1 Correlated nanoSIMS nitrogen isotopic composition and TEM images of a Trichodesmium thin section
incubated for 8 h with 13C-HCO3− and 15N-N2. The cyanobacterial filament was resin embedded, ultramicrotomed
into 200-nm-thick sections, imaged by TEM, and then analyzed by nanoSIMS. The nitrogen isotope data
are shown as deviations from the natural abundance value in parts per thousand, as indicated in the legend
(δ15N). Areas of 15N enrichment indicate localization of newly fixed nitrogen, which accumulates in
cyanophycin granules (arrows) apparent in the TEM image (Finzi-Hart, Pett-Ridge et al. PNAS 2008)
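The δ notation in Fig. 1 expresses a measured isotope ratio as a per-mil deviation from a reference ratio: δ = (R_sample/R_standard − 1) × 1000. For nitrogen measured by nanoSIMS, R is typically taken from the 12C15N− and 12C14N− signals, which share the 12C and therefore reflect 15N/14N. The sketch below assumes the commonly cited atmospheric N2 value of 0.0036765 as the default reference; many studies instead use the ratio measured on unlabeled control cells in the same session, which also absorbs instrumental mass fractionation.

```python
# Commonly cited 15N/14N ratio of atmospheric N2 (assumed reference value)
AIR_15N_14N = 0.0036765


def delta_permil(ratio_sample: float, ratio_standard: float = AIR_15N_14N) -> float:
    """delta (per mil) = (R_sample / R_standard - 1) * 1000."""
    return (ratio_sample / ratio_standard - 1.0) * 1000.0


def ratio_from_delta(delta: float, ratio_standard: float = AIR_15N_14N) -> float:
    """Invert the delta definition to recover the isotope ratio."""
    return ratio_standard * (delta / 1000.0 + 1.0)


if __name__ == "__main__":
    # Hypothetical 12C15N-/12C14N- ratio from a labeled cell region
    r = 0.0072
    print(f"delta15N = {delta_permil(r):.0f} permil")
```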
Fig. 2 Thin-section isotope imaging illustrates how newly acquired C and N are allocated to regions of active growth
or maintenance. Correlated TEM and nanoSIMS images of a filamentous cyanobacterium, Anabaena sp. SSM-00
(larger cells), infected by an epibiont (Rhizobium sp. WH2K) that attaches to the Anabaena heterocyst, the site of N
fixation. The δ13C and δ15N images show that 13C and 15N newly fixed by Anabaena are used by the
epibiont, in addition to being allocated for active growth or maintenance in the Anabaena. Scale bar is 2 μm (Image
by Jennifer Pett-Ridge, in collaboration with A. Spormann and W.O. Ng, Stanford University)
Fig. 3 TEM and nanoSIMS images illustrating the potential for analysis of subcellular elemental distribution in
resin-embedded and microtome-sectioned cells. Top row, left to right: TEM of ultramicrotome section of
mouse brain tissue, (a) a glial cell nucleus, (b) a blood vessel, and (c) myelinated axons are indicated; 12C14N
ion image; 31P ion image of the same region (In collaboration with B. Anderson, SUNY Stony Brook). Bottom
row, left to right: nanoSIMS secondary ion images showing the distributions of N (measured as CN) and P in
sectioned non-Hodgkin’s lymphoma cells (Raji) (Image by Brenda Anderson and Peter Weber, in collaboration
with G. L. DeNardo, University of California, Davis)
1.1 Recent Developments in NanoSIMS Systems Biology Research
With nanoSIP, metabolic activities of single microbial and eukaryotic
cells and their symbionts can be tracked by imaging natural
isotopic and elemental composition or isotope distribution after
a stable isotope tracing experiment [40]. Most nanoSIP
environmental microbiology studies have targeted nitrogen and carbon
metabolism (using 15N- and 13C-enriched tracers) (e.g., [31, 41–43]),
but a growing number discuss patterns of sulfur, phosphorus,
and metals (e.g., [44–49]) or use D2O as a means to track
active cells [50, 51]. While many of the earliest nanoSIP
microbiology studies focused on aquatic bacterial and archaeal
communities [18–20, 52] and on 13C and 15N fixation in diazotroph
cultures such as Trichodesmium spp. [23] (Fig. 1) and Anabaena
oscillarioides (Fig. 2) [25], recent years have brought a large expansion in
the types of microbial study systems. These include methane
producers and consumers in aquatic and industrial waste treatment
systems [18, 53–58], many types of symbionts [43, 59–61], and
taxa found in the human gut microbiome [62] and insect gut
[63, 64]. NanoSIMS imaging has proved particularly useful for
studies of elemental exchanges between symbionts and has been
applied in sponges and corals [65–70], algal–bacterial interactions
[71–73], ant–plant–fungus interactions [74], and microbial mat
studies of multifunctional group interactions [75].
In the past decade, nanoSIP approaches have been used to
support a systems-level understanding in a substantially expanded
pool of study systems, including plants, fungi, soils, and viruses. In
plants, elemental distributions of Zn, Cd, Fe, Mg, K, Cu, As, Si,
and U have been mapped at the cellular and subcellular scale as a
means to understand patterns of hyperaccumulation, toxicity, and
metabolism (reviewed in Nunez et al. [37]). Transfers of carbon,
nutrients, and water between plant roots and mycorrhizal fungi,
first imaged by Nuccio et al. in 2013 [76], are particularly well
suited to nanoSIMS analyses, as these exchanges occur across a
microscale interface [76–84]. In soil, nanoSIMS imaging has the
potential to measure biogeochemical exchanges between diverse
phases, including bacteria, fungi, minerals, organic matter, and
phages, although the extreme spatial complexity demands a large
number of analyses to provide statistically robust conclusions. Since
Hermann et al.’s early perspective article [28], many dozens of
nanoSIP studies have explored soils, including the fate of isotopi-
cally enriched plant amendments to soil [29, 85–88], so-called
rock-eating microbes that weather primary minerals [85], the
incorporation of microbial necromass into soil organic matter
[87], and soil clay minerals that exhibit antibacterial properties
[89, 90]. Creative applications, including nanoSIMS analysis of μl
quantities of soil porewater [88] and cells separated from the soil
matrix via Nycodenz gradients [46, 91] can help to deconvolve the
isotope enrichment or elemental stoichiometry of distinct soil
pools. Viral and phage particles are a frontier for nanoSIMS imag-
ing, since their size is at the current outer limit of technical feasibil-
ity [92–94]. Novel approaches, such as low-energy ion
implantation (see below), may help to increase sensitivity for such
tiny particles, which are so thin that much of the sample could be
sputtered away in the initial moments of an analysis, when the
sputter rate can be 100 times higher than the equilibrium rate [94].
Environmental systems biology studies using nanoSIP have
also expanded in breadth in the past decade, and now reach far
1.2 Recent NanoSIMS Instrumentation Innovations
The most notable technical advance for nanoSIMS in the past
decade was the development of a high-brightness negative oxygen
ion source, which enables positive secondary ion imaging with
50 nm resolution. In the CAMECA nanoSIMS instruments, imaging
resolution is determined by the analysis ion beam spot size, and
originally, only the micro Cs+ ion source had sufficient brightness to
achieve a 50 nm spot size; the lower brightness of the duoplasmatron
source (used to generate O− ions) allowed 100 nm resolution
at best, with lower stability and reliability. As such, many researchers
prioritized Cs+ analyses for electronegative elements like C and N
over O− analyses for metals.
In 2013, Oregon Physics (Hillsboro, OR, USA) produced a
high-brightness O− source called the Hyperion II, which generates
ions from oxygen gas using a radiofrequency (RF) inductively coupled
plasma (ICP) [111, 112]. The Oregon Physics system substantially
suppresses electron extraction while producing a high-brightness
O− beam. Based on tests with Lawrence Livermore
National Laboratory’s NanoSIMS 50, the Hyperion II was
Fig. 4 (i) Zebrafish embryo retina sections for wild-type (A, C, E, & G) and copper-deficient Calgw71 embryos
(B, D, F, H). Left to right orientation is from inner to outer retina. A, B: Anatomical nuclear staining for
reference. NanoSIMS images include: C, D: copper (Cu); E, F: phosphorus (P); G, H: overlay of copper and
phosphorus images. GCL: ganglion cell layer; IPL: inner plexiform layer; INL: inner nuclear layer; OPL: outer
plexiform layer; ONL: outer nuclear layer; RPE: retinal pigmented epithelium. Scale bar 25 μm. The NanoSIMS
copper ion image (D) for the copper-deficient embryos shows reduced copper in megamitochondria relative to
the wild-type in ONL, but elevated relative to other organs (not shown). These images provide evidence for
copper prioritization for vision. (ii) Standard curve for copper generated by nanoSIMS analysis of matrix-
matched standards plotted against bulk copper concentrations determined by liquid-phase ICP-MS. N 3
measurements per point. Error bars represent standard deviations (Akerman et al. Metallomics 2018)
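Fig. 4(ii) illustrates the general quantification strategy: analyze matrix-matched standards of known concentration, fit a calibration line, and read unknowns off that line. The sketch below assumes a linear response over the calibrated range and fits an unweighted ordinary least-squares line; it ignores the per-point standard deviations (a weighted fit would use them), and the standard values in the example are hypothetical, not the data from the figure.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def concentration_from_signal(signal, slope, intercept):
    """Invert the calibration line; only valid inside the calibrated range."""
    return (signal - intercept) / slope


if __name__ == "__main__":
    # Hypothetical standards: bulk concentration (ug/g) vs normalized ion signal
    conc = [0.0, 5.0, 10.0, 20.0]
    signal = [0.02, 0.51, 1.03, 1.98]
    m, b = fit_line(conc, signal)
    print(f"slope = {m:.4f}, intercept = {b:.4f}")
    print(f"unknown: {concentration_from_signal(0.75, m, b):.2f} ug/g")
```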
1.3 Moving Toward Standardized Methods
While nanoSIP has become widely used in the fields of systems
biology and microbial ecology, the application of high-spatial-resolution
SIMS to biology is still limited to a couple of dozen labs and
user facilities, each with its own protocols for analysis,
standardization, and data treatment. A number of important issues are
still not codified in the literature and not widely reported in
nanoSIMS-based publications:
1. Standards to demonstrate proper operation and tuning of the
instrument and for quantification of isotopic ratios and ele-
mental concentrations.
2. Effective mass resolving power (see Subheading 3.4.1) and
demonstration of negligible collection of isobaric interferences.
3. Pre-analysis ion implantation and sputtering equilibrium.
4. Demonstration of sample performance (charging, flatness,
orientation).
5. Data extraction protocols, including defining regions of
interest.
6. Effects of sample preparation.
As the systems biology community continues to develop nanoSIP,
it will benefit from more standardized methods. In the method
description that follows, we describe a series of protocols that
could serve as a basis for standardization.
2 Materials
2.1 Sample Selection and Experimental Design
1. Cultures, co-cultures, natural communities from soil, water, or
sediment, and tissues.
2. Treatments and controls, and harvests from a temporal series
(if desired).
3. The final preparation must fit within a 50 mm circle and be vacuum
compatible.
2.2 Incubations for Stable Isotope Tracing in Cultures and Microbial Communities
1. Substrates labeled with stable isotopes. These can be purchased
from companies such as Cambridge Isotopes, Isotec-Sigma, or
JPT Peptide Technologies. Substrates may also be grown (e.g.,
13C and 15N plant litter) or purified in house [96].
2. To label cultures with gases or gas-exchangeable compounds:
sealed vials, gas bags, or environmental chambers. For gas
injection: gas-tight syringe, gas tank regulator, and
extraction port.
3. Any inert container can be used for labeling experiments with
nongaseous compounds. Field labeling is also possible if a
portion of the system can be at least partially sealed off.
2.4 High Spatial Resolution SIMS
1. The NanoSIMS 50 and NanoSIMS 50L (CAMECA) are state-of-the-art
instruments for isotope and elemental imaging.
These nanoSIMS instruments are a form of magnetic sector
SIMS with high spatial resolution (down to 50 nm), high mass
resolving power and transmission, and simultaneous detection.
They use a high-energy primary ion beam to interrogate the
sample (sputtering). In this process, a small volume of the
sample is impacted by the primary ion beam, breaking bonds
and ejecting atoms and small molecules. A fraction of the
sputtered material spontaneously ionizes in proportion to the
element-specific ionization probability. The ions are extracted
by an electric field into a secondary ion mass spectrometer. The
sensitivity ranges from detecting 1 in 20 nitrogen atoms to 1 in
1,000,000 helium atoms, and mass resolving power (specificity)
can be up to 15,000 M/ΔM in corrected units (see Subheading 3.4.1)
[7, 11]. Imaging is achieved by scanning the
primary beam over the sample (in a region <50μm2) and
reconstructing the ion images digitally.
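Whether two isobaric ions can be separated depends on the mass resolving power, M/ΔM. The sketch below computes the simple M/ΔM needed to separate two species from standard monoisotopic masses. It ignores the electron mass (which cancels in ΔM) and uses the textbook definition rather than the instrument-specific "corrected" convention mentioned above, so the numbers are indicative only. The example uses the mass-26 species 12C14N−, H12C13C−, and 13C2−, the case shown in Fig. 8.

```python
# Monoisotopic masses in unified atomic mass units (u)
MASS = {
    "1H": 1.00782503,
    "12C": 12.0,
    "13C": 13.00335484,
    "14N": 14.00307401,
}


def ion_mass(*isotopes: str) -> float:
    """Sum of atomic masses; the electron mass is ignored."""
    return sum(MASS[iso] for iso in isotopes)


def required_mrp(mass_a: float, mass_b: float) -> float:
    """Simple M/dM needed to separate two isobars (textbook definition)."""
    return min(mass_a, mass_b) / abs(mass_a - mass_b)


if __name__ == "__main__":
    cn = ion_mass("12C", "14N")
    for label, other in [("H12C13C", ion_mass("1H", "12C", "13C")),
                         ("13C2", ion_mass("13C", "13C"))]:
        print(f"12C14N vs {label}: dM = {other - cn:.4f} u, "
              f"M/dM = {required_mrp(cn, other):.0f}")
```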
2.5 Data Analysis
1. NanoSIMS image analysis software (L’image, L. Nittler, Carnegie
Institution of Washington).
2. WinImage, Cameca.
3. OpenMIMS (https://nano.bwh.harvard.edu/openmims), an
add-on for ImageJ, a freeware program available from the
U.S. National Institutes of Health (http://rsb.info.nih.gov/ij/download.html).
4. Look@NanoSIMS [118], a freeware program developed for
MATLAB (MathWorks).
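The packages listed above all support the same core operation: defining regions of interest (ROIs) and computing isotope ratios from the counts inside them. The minimal sketch below shows that operation on images stored as nested lists. It assumes the images are already dead-time corrected and drift aligned, uses a hypothetical intensity threshold on a major-ion image as a crude ROI definition, and computes the ratio of summed counts within the ROI rather than averaging per-pixel ratios, which tends to be biased and unstable when pixel counts are low. It is not a reimplementation of any of these programs.

```python
import math
from typing import List, Tuple

Image = List[List[float]]
Mask = List[List[bool]]


def threshold_mask(image: Image, threshold: float) -> Mask:
    """Crude ROI: pixels whose counts exceed a threshold (e.g., on 12C14N-)."""
    return [[px > threshold for px in row] for row in image]


def roi_ratio(minor: Image, major: Image, mask: Mask) -> Tuple[float, float]:
    """Return (ratio of summed counts, Poisson relative error) inside the ROI.

    Assumes both summed counts inside the ROI are non-zero.
    """
    n_minor = n_major = 0.0
    for row_minor, row_major, row_mask in zip(minor, major, mask):
        for c_minor, c_major, inside in zip(row_minor, row_major, row_mask):
            if inside:
                n_minor += c_minor
                n_major += c_major
    return n_minor / n_major, math.sqrt(1.0 / n_minor + 1.0 / n_major)


if __name__ == "__main__":
    # Tiny hypothetical 3x3 count images: 12C15N- (minor) and 12C14N- (major)
    c15n = [[0, 1, 0], [2, 9, 3], [0, 2, 0]]
    c14n = [[5, 80, 4], [90, 400, 110], [6, 70, 3]]
    roi = threshold_mask(c14n, 50)
    ratio, rel_err = roi_ratio(c15n, c14n, roi)
    print(f"ROI 15N/14N = {ratio:.5f} +/- {100 * rel_err:.1f}%")
```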
3 Methods
3.1 Sample Selection and Experimental Design
A wide range of biological samples can be analyzed by nanoSIMS if
properly prepared (see Subheading 3.3). Experimental effects
should be large enough to stand out against sample heterogeneity
and analytical precision: ideally, isotope enrichment >1 atom %
or trace element concentration differences >2-fold. Typically,
treatment samples are referenced to control samples. For
nanoSIP experiments, useful controls include the initial material,
no-heavy-isotope addition controls, and time-zero isotope
addition controls. If an isotopically labeled solid substrate has
been used as an amendment (e.g., 13C plant material, necromass,
EPS [87, 96, 119–121]), it is essential to analyze some of the same
material “neat” to understand its microscale heterogeneity. For
trace element studies, no-treatment controls are likely sufficient.
For many experiments, time course analyses aid data interpretation
[23, 95, 100–102]. Finally, while nanoSIMS analysis time is frequently
costly and limited, biological replicates are essential for
each timepoint and treatment and will substantially improve
statistical power.
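Because cells from the same culture or incubation are not independent, one common way to follow the replication advice above is to collapse per-cell values to one value per biological replicate and compare treatments at the replicate level; pooling all cells as if they were independent (pseudoreplication) overstates significance. The sketch below assumes that design and computes Welch's t statistic and degrees of freedom on replicate means using only the standard library. Converting t to a p-value requires a t distribution (for example, scipy.stats.t.sf), and a mixed-effects model is a reasonable alternative when per-cell variation is also of interest. The data in the example are hypothetical.

```python
import math
from statistics import mean, variance


def replicate_means(cells_by_replicate):
    """Collapse per-cell values (e.g., APE) to one mean per biological replicate."""
    return [mean(cells) for cells in cells_by_replicate]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical per-cell 15N APE values: three replicates per condition
    treated = replicate_means([[2.1, 2.4, 1.9], [2.8, 2.5], [2.2, 2.0, 2.6]])
    control = replicate_means([[0.1, 0.0, 0.2], [0.1, 0.1], [0.0, 0.2, 0.1]])
    t, df = welch_t(treated, control)
    print(f"t = {t:.2f}, df = {df:.1f}")
```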
3.2 Isotopic Labeling of Cultures and Microbial Communities
If an isotope label is to be tracked, the labeled substrate will depend
on the experimental goals, but can range from dinitrogen gas to
amino acids to complex biomolecules such as cellulose. Typically,
13C and/or 15N are added as tracers in nanoSIP studies because
they can be used without altering cellular function (Figs. 1 and 2).
Fig. 5 NanoSIMS ion images showing co-localization of bromine (77Br) with phosphorus (31P) in a HeLa cell,
indicating the incorporation of BrdU into DNA. The high P signal shows the location of the DNA in the nucleus.
The lack of correlation between bromine and chlorine (35Cl) indicates that the distribution of bromine is not
the result of being a trace constituent in the major halide-bearing molecules. Therefore, these results showed
that the Br accumulates in the nucleus, suggesting that the DNA-RNA hybrid was being degraded. The cells
were grown on a Si wafer, treated with BrdU, fixed and dried, and analyzed in a nanoSIMS by sputtering with
high beam current until the nucleus was reached (Image by Peter Weber, in collaboration with L. Dugan, LLNL)
Other options include 18O2- and 2H-labeled substrates and water.
Elemental labels such as F, Br, and I can also be used as tracers
[22, 122]. For example, bromodeoxyuridine (BrdU), a bromine-labeled
thymidine analog, may be used as a DNA tag to track cellular division
[24, 123, 124] and to track the fate of Br-labeled nucleic acids
(Fig. 5). Methods for introducing isotopically labeled substrates
can follow the pattern established by stable isotope probing (SIP)
[125, 126], a set of widely accepted techniques used in microbial
ecology. As a general principle, incubation experiments must last
significantly longer than the time required for the label to diffuse
into the sample; however, longer incubations increase the risk of
cross-feeding (transfer of label from the organisms that take up the
substrate to other community members), so a balance must be struck.
Depending upon the research goal, each labeling experiment will
necessarily have minor differences, though many may resemble the
protocol below, which was used to label a freshwater cyanobacterial
culture with 13C and 15N [25] (Fig. 6).
Example Isotope Labeling Protocol: A. oscillarioides was grown in
liquid culture with standard conditions, nutrients, buffer, and trace
element amended media [25]. Exponential-phase cultures were
transferred to sealed serum vials with no gas phase. Thereafter, a 24 h
incubation occurred with a 12 h light:12 h dark illumination regime.
At the outset of the pulse labeling, 0.07 mL of NaH13CO3 (~99 atom %
13C, 0.047 M; final enrichment of 1.7 atom % 13C in dissolved inorganic
carbon) and 0.3 mL of 99 atom % 15N2 (0.57 mM; final enrichment of
13.6 atom % 15N2) were injected into each vial. Basic environmental
factors (irradiance, temperature, pH, starting inorganic N and C
pools) were measured during the incubation period. At multiple time
points (0 min, 15 min, 30 min, 1 h, 2 h, 4 h, 8 h, and 24 h), a vial was
destructively sampled and cells were fixed with 2% glutaraldehyde in
order to determine net uptake rates over the diel cycle.
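The final enrichments quoted in the example protocol depend on how much tracer is added relative to the unlabeled pool already present. Assuming complete mixing and a simple two-pool mass balance, the heavy-isotope atom fraction after addition is (n_tracer·f_tracer + n_pool·f_pool)/(n_tracer + n_pool). The sketch below evaluates this mixing relation and solves it for the tracer amount needed to reach a target enrichment. The DIC pool size in the example is hypothetical, and the ~1.1 atom % natural 13C abundance is an approximate value.

```python
def mixed_atom_fraction(n_tracer, f_tracer, n_pool, f_pool):
    """Heavy-isotope atom fraction after adding tracer to an existing pool.

    n_* are amounts of the element (e.g., umol C); f_* are atom fractions (0-1).
    """
    return (n_tracer * f_tracer + n_pool * f_pool) / (n_tracer + n_pool)


def tracer_needed(n_pool, f_pool, f_tracer, f_target):
    """Amount of tracer that brings the pool to f_target (inverse of the mix)."""
    return n_pool * (f_target - f_pool) / (f_tracer - f_target)


if __name__ == "__main__":
    # Tracer from the example: 0.07 mL of 0.047 M NaH13CO3 at ~99 atom % 13C
    n_tracer = 0.07e-3 * 0.047 * 1e6  # umol
    n_dic = 500.0  # hypothetical unlabeled DIC pool in the vial (umol)
    f = mixed_atom_fraction(n_tracer, 0.99, n_dic, 0.011)
    print(f"tracer = {n_tracer:.2f} umol, final 13C = {100 * f:.2f} atom %")
```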
[Fig. 6 panels: (a) cells 19–23 of the filament (cell 22 = Het. B), 13C and 15N enrichment images, scale bar 5 μm; (b) filament with heterocysts Het. A, Het. B, and Het. C; (c) 13C APE and 15N APE plotted against cell number 1–50]
Fig. 6 (a) Chain of five cells from a filament of A. oscillarioides analyzed by nanoSIMS after 4 h of
incubation with H13CO3− and 15N2. Het, heterocyst. Individual cells are numbered to correspond with the
numbering in part c. (a.1) Image reconstruction based on secondary electrons. (a.2) The distribution of 13C
enrichment. (a.3) The distribution of 15N enrichment. Enrichment is expressed as atom percent enrichment
(APE). (b) Post-analysis nanoSIMS secondary electron image of a filament of 50 cells of A. oscillarioides
showing three heterocysts after 4 h of incubation with H13CO3− and 15N2. The white box
indicates the area shown in images a.1, a.2, and a.3. (c) The cell-to-cell variation in 13C (diamonds) and
15N enrichment (squares) along the same 50-cell filament. There are 1–6 independent replicate
measurements per cell. Error bars represent two standard errors (Popa et al. ISME Journal 2007)
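Fig. 6 reports enrichment as APE, the excess atom percent of the heavy isotope relative to an unlabeled reference. For a two-isotope element, atom % heavy = 100·R/(1 + R), where R is the heavy/light ratio, and APE is the sample's atom % minus that of the reference. The sketch below implements those definitions under the assumption that the reference ratio comes from unlabeled control cells measured under the same conditions; the numbers in the example are hypothetical.

```python
def atom_percent(ratio: float) -> float:
    """Heavy-isotope atom % from a heavy/light ratio R (two-isotope system)."""
    return 100.0 * ratio / (1.0 + ratio)


def ape(ratio_sample: float, ratio_reference: float) -> float:
    """Atom percent excess: sample atom % minus unlabeled-reference atom %."""
    return atom_percent(ratio_sample) - atom_percent(ratio_reference)


if __name__ == "__main__":
    # Hypothetical 13C/12C ratios: labeled cell vs unlabeled control
    print(f"APE = {ape(0.0135, 0.0110):.3f} atom %")
```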
3.3 Sample Preparation and Pre-analysis Characterization
Sample preparation is critical to the success of any nanoSIP experiment
and in some cases is the most challenging step. SIMS is an
ultra-high vacuum (~10−10 Torr) technique, and samples must be
prepared for the vacuum chamber in a way that preserves the
molecular and elemental distribution of interest. NanoSIMS imaging
cannot be used for in vivo studies, and samples cannot be
analyzed in an aqueous phase without a cryogenic stage [127]. To
prepare samples, it is often necessary to stabilize biological
components (fixation); remove water (dehydration) and salts (derived
from growth media or sea or sediment water); mount samples on
a conductive support (Si wafer, TEM grid); and then either proceed
to an intact sample analysis or follow with embedding and
sectioning. For some nonaqueous sample types (soils, fungal hyphae), we
have found it workable to analyze unfixed samples [28, 87]. For
other samples, it is ideal to separate cells or particles from a matrix
prior to nanoSIMS analysis; in these cases, a Nycodenz gradient,
flow cytometry, or microfluidic approach can be used [46, 91, 103,
128, 129].
3.3.1 Sample Flatness and Conductivity
While ideal samples are flat with no more than nanometer-scale
variations in surface topography, in our experience, it is possible to
work with non-flat samples. The primary problem topography
introduces is increased error in isotopic measurements, which results
from spot-to-spot variations in ion extraction conditions that can
detune the mass spectrometer. On a perfectly flat sample (e.g.,
individual spores), ~1 permil (‰) precision is possible when imaging
with electron multipliers. However, with large cells, soil particles,
or other sources of surface irregularity, often only percent-level
precision is possible. For a given sample type, it is necessary
to establish the precision of the measurement conditions by using
samples comparable to the samples of interest. In most cases,
control samples that were not exposed to isotopically labeled
substrates are the best option. In many nanoSIP studies, the goal is to
achieve isotopic enrichment of >100 permil; at these
enrichment levels, even many micrometers of surface topography
can be tolerated [63, 110].
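Establishing precision with unlabeled controls, as recommended above, can be turned into a simple decision rule for calling individual cells enriched. One common convention, assumed here rather than prescribed by the text, is to set the threshold at the control mean plus three standard deviations of the per-cell δ values; because that spread includes both counting statistics and topography-related error, the threshold automatically becomes more conservative for rough samples. The function names, the factor of 3, and the example values are illustrative.

```python
from statistics import mean, stdev


def enrichment_threshold(control_deltas, n_sd=3.0):
    """Control mean plus n_sd standard deviations of per-cell delta values."""
    return mean(control_deltas) + n_sd * stdev(control_deltas)


def flag_enriched(sample_deltas, control_deltas, n_sd=3.0):
    """True for each sample cell whose delta exceeds the control-based cutoff."""
    cutoff = enrichment_threshold(control_deltas, n_sd)
    return [d > cutoff for d in sample_deltas]


if __name__ == "__main__":
    # Hypothetical per-cell delta15N values (permil) on a rough sample
    controls = [-12.0, 5.0, 20.0, -8.0, 3.0, 15.0]
    labeled = [40.0, 150.0, 900.0, 10.0]
    print(flag_enriched(labeled, controls))
```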
Because SIMS instruments use an ion beam to interrogate the
sample and extract ions and electrons, sample charging is a critical
3.3.3 Embedding and Sectioning
In cases where the goal is to target intracellular elemental or isotopic
distribution (e.g., Figs. 1, 2, 3, and 7), embedding and sectioning
will likely be needed prior to nanoSIMS analysis. As with other
aspects of sample preparation, the embedding and sectioning
Fig. 7 Correlated SEM and nanoSIMS micrographs showing the localization of Rubisco, labeled with 5 nm
immunogold, in thin sections of the cyanobacterium Trichodesmium IMS 101. The immunogold can be
imaged by nanoSIMS, allowing stable isotope tracing to be combined with immunolocalization. Note that the
gold enhances the production of CN− ions (In collaboration with G. Sandh & B. Bergman, Stockholm University)
3.3.4 Sample Mapping
Sample mapping is the final critical step prior to nanoSIMS analyses;
it can greatly enhance operator efficiency and is often essential
to interpretation of results. Most nanoSIMS instruments have the
3.4 NanoSIMS Analyses
High spatial resolution SIMS (better than 0.5 μm lateral resolution)
is necessary to characterize the isotopic and elemental composition
of individual microbial cells. The CAMECA NanoSIMS 50 and
50L are the state-of-the-art instruments for combining high lateral
resolution, high mass resolution, and high transmission and may be
used for both stable isotope and trace element analyses of microbial
samples (outlined below). These instruments have two modes of
analysis: a Cs+ primary beam to generate negative secondary ions,
or an O− primary beam to generate positive secondary ions. As a
general rule, electronegative elements (e.g., halides) are detected as
negative secondary ions, and electropositive elements (e.g., metals)
are detected as positive secondary ions. Manufacturer manuals and
standard references on SIMS can provide additional guidance on
the choice of detection polarity [117]. In some cases, an experiment
requires both electronegative and electropositive elements to be
mapped in the same sample. This is possible, but changing
polarities is a multiple-hour effort. Alternatively, at high enough
concentrations, some elements can be imaged with sufficient sensitivity
in their non-typical polarity (e.g., FeO− instead of Fe+; C+ instead
of C−; P+ instead of P−; Fig. 4) [26, 47, 117].
For any analysis, it is useful to have standard samples that are
routinely used for tuning. This allows session-to-session comparison
of transmission, mass resolving power (MRP), and elemental or
isotopic ratios. Standards are also important for finding the correct
species, which can be particularly challenging for higher masses.
Simple reference materials (e.g., iron) are easier to work with than
multi-element standards like the National Institute of Standards and
Technology's NBS610, which has 500 μg/g of most elements.
NanoSIP: NanoSIMS Applications for Microbial Biology 109
[Fig. 8 image: (A) log and linear plots of the mass-26 scan (12C14N, H12C13C; flat-top peaks); (B) measurement precision (ΔM) for spores and the 12C14N standard]
Fig. 8 Flat-top peaks and ultimate precision. (a) Logarithmic and linear plots of a mass scan at mass 26.
12C14N− is readily resolved from H12C13C−, which is 0.007 amu heavier. 13C2− is only 0.004 amu heavier than
12C14N− and could be resolved, but it is typically 4–5 orders of magnitude less abundant and is therefore
negligible. Note that the 12C14N− peak is flat-topped, which means that a range of mass lines from the top of
the peak can be aligned with the detector and precise measurements can still be achieved. (b) Measurement
precision is affected by instrument tuning and stability and by sample characteristics, but the ultimate limit on
measurement precision is the number of ions collected for the minor species. Therefore, in this example, the
precision of the measurements of bacterial spores is lower than the precision for the graphite standard
because the spores have less mass, and therefore fewer 13C counts
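The counting-statistics limit described in the caption can be made concrete. The sketch below assumes pure Poisson counting for both isotopes and ignores the tuning and stability contributions the caption also mentions; the count values are hypothetical and only illustrate why a small spore ROI is less precise than a large standard area.

```python
import math


def ratio_relative_sigma(n_minor: int, n_major: int) -> float:
    """Poisson-limited relative standard error (1 sigma) of an isotope ratio.

    Treats minor- and major-isotope counts as independent Poisson variables,
    so their relative variances add:
        (sigma_R / R)**2 = 1/n_minor + 1/n_major
    Instrument tuning and stability contributions are ignored.
    """
    if n_minor <= 0 or n_major <= 0:
        raise ValueError("counts must be positive")
    return math.sqrt(1.0 / n_minor + 1.0 / n_major)


# Hypothetical counts: a small spore ROI versus a standard area that
# yields 100x more counts of both species.
spore = ratio_relative_sigma(n_minor=2_000, n_major=180_000)
standard = ratio_relative_sigma(n_minor=200_000, n_major=18_000_000)
print(f"spore ROI: {spore:.2%} (1 sigma), standard: {standard:.2%} (1 sigma)")
```

Because the uncertainty scales as 1/√N, collecting 100 times more minor-isotope counts improves precision only tenfold; the minor-isotope term dominates whenever it is much smaller than the major-isotope count.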
3.4.1 NanoSIMS Tuning and Estimating Mass Resolving Power
Tuning a SIMS instrument requires expert knowledge. The central aspects of SIMS instrument tuning are primary ion beam alignment, peak shape, mass selection, and resolving isobaric interferences,
all of which are important variables to report in a
nanoSIP article's methods description. Here we present the basic
issues.
The alignment and focus of the primary ion beam (analysis
beam) set the location of the ion source for the secondary mass
spectrometer and determine the quality of the ion images. Grid
samples are typically used to identify and correct for sources
3.4.2 Cs+ Analysis for Electronegative Elements and Isotope Ratios
The vast majority of systems biology studies requiring nanoSIMS
analysis are focused on electronegative elements such as H, C, O,
N, P, and S [39]. All of these elements (and their corresponding
isotopes) are analyzed with a Cs+ primary beam. Of these, combined C and N isotope measurements are the most common and
stringent analyses at the low end of the periodic table; we discuss
their analysis in detail below.
For both carbon-13 and nitrogen-15, higher sensitivity is
achieved using a Cs+ primary beam and extracting negative secondary ions. The rare and major isotopes are both mapped in the
sample, and the ratio of the two reveals the distribution of the
incorporated label in the sample (Figs. 1, 2, 6, and 9). Nitrogen is
typically detected as the molecular ion CN− because of the poor
yield of N− and N+ [145, 146]. Carbon isotopes can be measured
using the monomers (C−), the hydrides (CH−), the dimers (C2−),
or the CN− species (mass resolving power requirements increase with
mass). The CN− species typically have the highest ion count rate in
biological samples, but because ~12,000 MRP (~18,000 based on
the CAMECA software) is required to resolve 13C14N− from
11B16O− at mass 27, these species are typically only used when
the highest surface sensitivity is required [147].
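The MRP figures quoted above follow from exact isotope masses. The sketch below computes the simple peak-separation value M/ΔM, which reproduces the ~12,000 figure; it does not reproduce the CAMECA software convention that gives ~18,000. The atomic masses are standard monoisotopic values, and the choice of species pairs is illustrative.

```python
# Monoisotopic atomic masses in unified atomic mass units (u).
ATOMIC_MASS = {
    "1H": 1.00782503,
    "11B": 11.00930536,
    "12C": 12.0,
    "13C": 13.00335484,
    "14N": 14.00307401,
    "16O": 15.99491462,
}


def species_mass(isotopes):
    """Mass of a molecular species from its constituent isotopes.

    The extra electron of a singly charged negative ion is ignored; it shifts
    both species equally, so it cancels in the mass difference.
    """
    return sum(ATOMIC_MASS[iso] for iso in isotopes)


def required_mrp(target, interference):
    """Mass resolving power M/dM needed to separate `target` from `interference`."""
    m_target = species_mass(target)
    m_interf = species_mass(interference)
    return m_target / abs(m_target - m_interf)


for target, interference in [
    (("13C", "14N"), ("11B", "16O")),        # mass 27
    (("12C", "14N"), ("1H", "12C", "13C")),  # mass 26
    (("12C", "14N"), ("13C", "13C")),        # mass 26
]:
    print("+".join(target), "vs", "+".join(interference),
          f"M/dM ~ {required_mrp(target, interference):,.0f}")
```

The same function can be pointed at any candidate isobar before a session, which is a quick way to decide whether a given species pair is practical at the MRP the instrument is tuned for.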
Fig. 9 NanoSIMS images of a filamentous cyanobacterium, Anabaena sp. SSM-00 (larger cells), infected by an
epibiont (Rhizobium sp. WH2K) that attaches to the Anabaena heterocyst, the site of N fixation. (a) and (b) are
replicate filaments from the same culture, illustrating that cell-to-cell variation in isotopic enrichment may be
extremely large, even while relative enrichment patterns remain consistent (Image by Jennifer Pett-Ridge, in
collaboration with A. Spormann & W.O. Ng, Stanford University)
We have found that the C2− dimers measured at masses 24 and
25 are more compatible with the CN− species (e.g., 12C2−,
13C12C−, 12C14N−, 12C15N−) because of similar secondary ion
focusing (Fig. 10). Simply put, the maximum transmission for the
carbon dimers is better aligned with the maximum transmission for
CN− than that of the carbon monomers. Physically, this means that the
optimal focusing voltage for the lens used to focus the secondary
ion beam onto the entrance slit of the mass spectrometer is more
similar for C2− and CN− than for C− and CN−. Because the ions
are all detected simultaneously, only a single E0S focusing voltage
can be used; therefore, if C− and CN− are measured together, the E0S
focusing voltage has to be a compromise for one or both sets of
species. This compromise not only reduces transmission
but may also lower the reproducibility of isotope ratio measurements, because maintaining optimal focus at the entrance slit is important
to isotope ratio measurement reproducibility. The difference in E0S
focusing voltage for these species is likely due to differences in their
energy spectra: C2− and CN− come primarily from
molecule decomposition during flight, while C− is generated at the
sample [148]. We have observed that the offset between C− and
CN− varies, but we have not succeeded in making this offset
[Fig. 10 plot: detected counts (0–350,000; 12C14N− trace labeled) versus secondary ion beam focus (V), −7,100 to −6,900 V]
Fig. 10 Scan of the secondary ion beam focus voltage for lens E0S, showing the
relative change in detected counts. The maximum transmission for 12C14N− and
12C2− coincides here, whereas the maximum transmission for 12C− is offset.
While the 12C14N− and 12C2− scans are not always this well aligned, C− is
typically offset, resulting in reduced transmission for either C− or CN− if the two
are detected simultaneously. The difference in count rate among these species
varies from sample to sample, but in biological samples, CN− typically has a
higher count rate, and C− and C2− are similar
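The E0S compromise can be illustrated numerically. This is a toy model, not the authors' tuning procedure: it assumes Gaussian transmission curves of equal width, hypothetical peak voltages (C− offset by 40 V from CN−), and a "maximize the worst-case transmission" criterion for choosing the single shared voltage. In practice the voltage is set from a real scan such as Fig. 10.

```python
import math


def relative_transmission(voltage, peak_voltage, width):
    """Hypothetical Gaussian transmission curve, equal to 1.0 at its peak."""
    return math.exp(-0.5 * ((voltage - peak_voltage) / width) ** 2)


def best_shared_focus(peaks, width, v_min, v_max, step=0.5):
    """Grid-search one focus voltage for all simultaneously detected species.

    Picks the voltage that maximizes the worst (lowest) relative transmission
    among the species in `peaks` (a dict of name -> peak voltage).
    """
    best_v, best_score = v_min, -1.0
    n_steps = int(round((v_max - v_min) / step))
    for k in range(n_steps + 1):
        v = v_min + k * step
        score = min(relative_transmission(v, p, width) for p in peaks.values())
        if score > best_score:
            best_v, best_score = v, score
    return best_v, best_score


# Hypothetical peak positions (V) and a common 40 V width.
dimers = {"12C2-": -7000.0, "12C14N-": -7000.0}
monomers = {"12C-": -6960.0, "12C14N-": -7000.0}
print("C2 with CN:", best_shared_focus(dimers, 40.0, -7100.0, -6900.0))
print("C with CN: ", best_shared_focus(monomers, 40.0, -7100.0, -6900.0))
```

When the peaks coincide (the dimer case), one voltage gives full transmission for every species; with an offset monomer peak, the best shared voltage sits between the peaks and both species lose transmission, which is the trade-off the text describes.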
[Fig. 11 image: top, 98Mo/12C ion ratio map (color scale 0–0.02; scale bar 5 µm; heterocysts marked); bottom, 98Mo/C (0–0.016) versus cell number (0–60)]
Fig. 11 Molybdenum distribution in Anabaena oscillarioides. Filaments were fixed in glutaraldehyde and
sputtered with an O− beam to a depth of 1 μm on a Si planchette (wafer). Data for multiple Mo isotopes were
collected to check for isobaric interferences. Top: ion ratio map of 98Mo normalized to 12C for quantification. A thin white line outlines each individual cell. Grey triangles indicate heterocyst cells. Bottom: data
summary for two replicate filaments. Heterocyst cells are consistently enriched in Mo, a critical nitrogenase
cofactor, suggesting active N fixation. Mean Mo concentrations, estimated based on published relative
sensitivity factors [117], are 64 (4) μg/g in heterocysts (n = 5) and 18 (0.9) μg/g in vegetative cells
(n = 46) (Image by Jennifer Pett-Ridge)
Fig. 12 Representative nanoSIP images demonstrating high-throughput metabolic screening of cells filtered
from Pacifica, California seawater incubated with 13C-bicarbonate and 15N-amino acids for 6 days. 14N12C
ion counts reflect all carbon- and nitrogen-containing particles, 13C atom percent indicates cells enriched in
13C, and 15N atom percent indicates cells enriched in 15N. The same four cells are indicated with arrows in
each panel, with letters in the first panel indicating putative metabolism: I (no enrichment; inactive cell), C1
(enrichment in only 13C; chemoautotroph), H (enrichment in only 15N; heterotroph), and C2 (enrichment in 13C,
minor enrichment in 15N; chemoautotroph). Scale bar is 11 μm (Dekas et al. Frontiers in Microbiology 2019)
Fig. 13 Comparison of nanoSIMS-based characterization of sectioned versus whole Bacillus thuringiensis (Bti)
spores. (a) TEM image of a sectioned Bti spore showing its layered architecture and overall dimensions. Scale
bar is 200 nm. (b) Lateral profile across the surface of a sectioned Bti spore showing the distribution of 12C,
31P, and 35Cl. The dashed lines identify the core region based on the 31P profile. The whole spore is defined
based on the 12C profile and identified by solid lines. Profile: length 1200 nm; width 200 nm. (c) Model
representation of a sectioned spore with the highlighted rectangular region representing the location of profile
data. (d–f) NanoSIMS secondary ion images showing the distribution of 12C, 31P, and 35Cl across the sectioned
spore surface. Scale bar is 200 nm. (g) SEM image of a whole Bti spore. Scale bar is 200 nm. (h) Depth profile
of whole spore showing the distribution of 12C, 31P, and 35Cl as a function of depth in the spore. (i) Model
representation of a whole spore with the highlighted column representing the location of the profile data.
Profile diameter is 200 nm. (j–l) NanoSIMS secondary ion images showing the spatial distribution of 12C, 31P,
and 35Cl in the spore. Scale bar is 500 nm. Both profiles were acquired with the Cs+ primary ion beam
(Reprinted with permission from: Ghosal et al. Analytical Chemistry 2008. Copyright 2008 American Chemical
Society)
3.4.3 NanoSIMS Trace Element Analysis
Trace element analysis in biological samples is often used to determine the concentration and distribution of metal cofactors and
labels. With the invention of the Hyperion II RF inductively coupled plasma ion source, trace metal analysis with nanoSIMS has
become significantly easier and more attractive. The method of
analysis is similar to the stable isotope analysis method outlined
above, except that the trace elements of interest are typically metals,
which are imaged with higher sensitivity as positive secondary ions
with an O− primary beam [117]; elements such as Na, K, Al, Mg,
and Ca ionize extremely well in this mode. Whether metals such as
Mn, Fe, Cu, Mo, Cr, V, and Ni (and, in the right circumstances, Zn
and As) can be detected in a given system with subcellular resolution depends on their concentration in the sample and their relative
sensitivity factor (a.k.a. relative useful yield; see Subheading 3.5.2
and [117]). At LLNL, we have imaged a range of trace elements in
cells, including Mo (as a proxy for nitrogenase; Fig. 11), Mg, Si, P,
Mn, Fe, Cu, Zn, and As [47, 48, 104, 151]. The highest spatial
resolution achieved with the Hyperion II on a CAMECA NanoSIMS in this mode is ~50 nm with a ~0.5 pA O− primary beam
[47, 113]. For very-low-concentration elements (ppb to low
ppm), a >100 pA primary beam is necessary to acquire enough
counts for imaging, with spatial resolution >250 nm (Fig. 4). The
sputter rate for biological materials with an O− primary beam is
~0.2 nm·μm²/(pA·s) [150]. For many metals, low-ppm-level cellular concentrations can be imaged, but great care must be taken to
ensure that detectors collect only the isotope or element of interest, as
opposed to isobaric interferences.
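The quoted sputter rate turns into a useful planning estimate. The sketch assumes the constant means depth rate = k · I / A (k ≈ 0.2 nm·μm²/(pA·s), I the primary current, A the raster area), which is the dimensionally consistent reading of the value above; the raster size and currents are hypothetical, and real rates depend on the matrix.

```python
def depth_rate_nm_per_s(current_pa, raster_area_um2, k=0.2):
    """Sputter depth rate (nm/s), assuming rate = k * I / A.

    k ~0.2 nm·µm²/(pA·s) is the approximate value quoted for biological
    material under an O- primary beam; the real value depends on the matrix.
    """
    if current_pa <= 0 or raster_area_um2 <= 0:
        raise ValueError("current and raster area must be positive")
    return k * current_pa / raster_area_um2


def time_to_depth_s(depth_nm, current_pa, raster_area_um2, k=0.2):
    """Time (s) needed to sputter to `depth_nm` at constant current."""
    return depth_nm / depth_rate_nm_per_s(current_pa, raster_area_um2, k)


# Hypothetical cases over a 10 µm x 10 µm raster (100 µm²):
for current in (0.5, 100.0):  # high-resolution vs. high-current O- beam, pA
    t = time_to_depth_s(1000.0, current, 100.0)  # 1 µm depth
    print(f"{current:>6} pA: {depth_rate_nm_per_s(current, 100.0):.3f} nm/s, "
          f"{t / 3600:.1f} h to sputter 1 µm")
```

The contrast between the two cases shows why the ~0.5 pA high-resolution mode suits thin surface layers, while the >100 pA mode needed for low-concentration elements consumes a micrometer of material in under two hours.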
3.4.4 Standards and Controls
Standards and controls have distinct but related roles that are
important to obtaining reliable results. Standards are used to
check instrument operation, quantify absolute composition, and
provide a reference for experiments. For high-precision isotope
measurements or trace element measurements, at least two
matrix-matched standards with distinct known compositions are
necessary to ensure accurate and meaningful results
[47, 152]. Experimental controls are used to test for experimental
artifacts and the statistical significance of treatments.
Standards are not readily available for biological SIMS because
certified biological reference materials have not been suitable for it. As a result,
standards typically need to be produced and characterized "in-house"
or borrowed from other laboratories. In cases of large
effects relative to analytical uncertainty, no-treatment controls can
sometimes take the place of standards. For elemental analyses without concentration standards, it may be necessary for the measured
ratios of interest to be on the order of 10 times higher than background
to be confident the effects are real [48, 139]. Furthermore, without
standards, correct instrument operation is hard to verify. One stopgap
option is to always analyze the same sample at every session,
even if its absolute composition is uncertain or it is not relevant to
the biological sample (e.g., NBS610) [48, 139].
For C and N isotopic measurements, we at LLNL originally
used a well-characterized Bacillus subtilis spore preparation as a
reference standard [23]. Measurement precision, σ(internal), for
this standard is 0.4–1.4% (2σ for individual 13C/12C and 15N/14N
measurements), and replicate analyses yielded an analytical precision, σ(std), of 2.1% (2σ for an individual measurement) (Fig. 8).
More recently, we have used an in-house characterized culture of Pseudomonas stutzeri deposited on a Si wafer because these cells provide a
better matrix match for our typical experiments.
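An external precision such as σ(std) comes from the scatter of replicate analyses of one standard. A minimal sketch, assuming σ(std) is the sample standard deviation of individual replicate ratios expressed as a relative 2σ percentage; the replicate values below are hypothetical.

```python
import statistics


def relative_two_sigma_percent(ratios):
    """Relative 2-sigma spread (percent) of replicate ratio measurements.

    Uses the sample standard deviation (n - 1), i.e. the scatter of
    individual measurements, not the standard error of their mean.
    """
    if len(ratios) < 2:
        raise ValueError("need at least two replicate measurements")
    return 200.0 * statistics.stdev(ratios) / statistics.mean(ratios)


# Hypothetical replicate 15N/14N ratios measured on one standard.
replicates = [0.003702, 0.003668, 0.003695, 0.003721, 0.003680]
print(f"external precision: {relative_two_sigma_percent(replicates):.1f}% (2 sigma)")
```

Comparing this external value with the counting-statistics (internal) precision of each analysis indicates whether session-to-session instrument variation, rather than ion counts, limits reproducibility.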
For high spatial resolution elemental analyses of biological
samples, absolute concentration standards are more difficult to
establish for multiple reasons. First, concentrations are typically
low and therefore prone to contamination. Second, elemental con-
centrations can vary spatially, making it difficult to relate high-
resolution analyses with bulk composition. Third, the composition
of the elemental concentration standards needs to closely match the
unknowns. Beyond these constraints, it is ideal to have standards at multiple
concentrations in the relevant range to establish a calibration curve
and to control for potential isobaric interferences.
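A calibration curve from several standards can be checked with an ordinary least-squares line. The sketch assumes the measured X+/C+ ion ratio is linear in the [X]/[C] concentration ratio, as the text states for SIMS. The reading of a clearly nonzero intercept as a sign of background or an unresolved isobaric contribution is a heuristic, not a rule. All standard values are hypothetical.

```python
def fit_calibration(conc_ratios, ion_ratios):
    """Ordinary least-squares line: ion_ratio = slope * conc_ratio + intercept."""
    n = len(conc_ratios)
    if n < 2 or n != len(ion_ratios):
        raise ValueError("need >= 2 paired standards")
    mx = sum(conc_ratios) / n
    my = sum(ion_ratios) / n
    sxx = sum((x - mx) ** 2 for x in conc_ratios)
    if sxx == 0:
        raise ValueError("standards must have distinct compositions")
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc_ratios, ion_ratios))
    slope = sxy / sxx
    return slope, my - slope * mx


def invert_calibration(ion_ratio, slope, intercept):
    """Concentration ratio of an unknown from its measured ion ratio."""
    return (ion_ratio - intercept) / slope


# Hypothetical standards: [X]/[C] (mass ratio) vs. measured X+/C+ ion ratio.
conc = [1e-5, 5e-5, 1e-4]
ions = [0.0030, 0.0110, 0.0210]
slope, intercept = fit_calibration(conc, ions)
print(f"slope = {slope:.1f}, intercept = {intercept:.4f}")
print(f"unknown with X+/C+ = 0.009 -> [X]/[C] = {invert_calibration(0.009, slope, intercept):.2e}")
```

A single-standard proportionality correction implicitly forces the line through the origin; fitting several standards makes any offset visible instead of folding it into the unknowns.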
3.5 Data Processing and Image Analysis
NanoSIMS researchers have developed multiple programs that
allow nanoSIMS ion images to be displayed and processed to
extract quantitative data (see Subheading 2.5). Data processing
should include corrections for detector dead-time and image shift
and should enable regions of interest (ROIs) to be defined. The
isotopic composition for each ROI is calculated by averaging over
all of the replicate scans. ROI definition algorithms can be used to
identify cells, partition images into uniform subregions, or define
threshold cutoffs for extracting data automatically. Notably,
Arandia-Gorostidi et al. and Dekas et al. both used auto-identification to select many hundreds of putative cells in their analyses
[41, 109], far more than in many early nanoSIMS studies.
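The processing chain above (dead-time correction, then combining replicate scans within an ROI) can be sketched in a few lines. The sketch makes several assumptions: it uses a non-paralyzable dead-time model applied per pixel, a hypothetical 44 ns dead time and 1 ms dwell time, planes that are already aligned (image-shift correction is not shown), and it combines scans by summing counts before taking the ratio. Summing is one reasonable way to average, not necessarily the one any specific program uses.

```python
def deadtime_correct(counts, dwell_s, tau_s):
    """Non-paralyzable dead-time correction for one pixel.

    n_true = n_meas / (1 - r_meas * tau), where r_meas = n_meas / dwell.
    """
    loss = (counts / dwell_s) * tau_s
    if loss >= 1.0:
        raise ValueError("count rate too high for this dead-time model")
    return counts / (1.0 - loss)


def roi_ratio(minor_planes, major_planes, mask, dwell_s, tau_s):
    """Minor/major ratio for one ROI, summed over all replicate planes.

    Each plane is a 2D list of counts per pixel; `mask` is a 2D list of
    booleans with the same shape. Planes are assumed already aligned.
    """
    minor_total = 0.0
    major_total = 0.0
    for minor, major in zip(minor_planes, major_planes):
        for r, row in enumerate(mask):
            for c, inside in enumerate(row):
                if inside:
                    minor_total += deadtime_correct(minor[r][c], dwell_s, tau_s)
                    major_total += deadtime_correct(major[r][c], dwell_s, tau_s)
    if major_total == 0:
        raise ValueError("no major-isotope counts in ROI")
    return minor_total / major_total


# Two hypothetical 2x2 planes; the ROI is the top row (one "cell").
minor_planes = [[[10, 12], [0, 0]], [[11, 9], [0, 0]]]
major_planes = [[[900, 1000], [5, 5]], [[950, 1050], [4, 6]]]
mask = [[True, True], [False, False]]
print(f"ROI ratio: {roi_ratio(minor_planes, major_planes, mask, 1e-3, 44e-9):.5f}")
```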
$$R_{\mathrm{UNK,est}} = \frac{R_{\mathrm{UNK,meas}}}{\mathrm{IMF}} \qquad (5)$$
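Eq. 5 applies an instrumental mass fractionation (IMF) factor derived from a standard. The sketch assumes IMF is the measured ratio of a matrix-matched standard divided by its accepted ratio. This is the usual definition, but it is an assumption here because the defining equation is not shown in this section. All ratio values are hypothetical.

```python
def instrumental_mass_fractionation(r_std_measured, r_std_accepted):
    """IMF as measured / accepted ratio for a matrix-matched standard (assumed definition)."""
    return r_std_measured / r_std_accepted


def corrected_ratio(r_unk_measured, imf):
    """Eq. 5: estimated true ratio of the unknown, R_UNK,est = R_UNK,meas / IMF."""
    return r_unk_measured / imf


# Hypothetical 13C/12C values for a standard and one unknown ROI.
imf = instrumental_mass_fractionation(r_std_measured=0.010850, r_std_accepted=0.011180)
print(f"IMF = {imf:.4f}; corrected unknown = {corrected_ratio(0.025000, imf):.6f}")
```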
The resulting isotopic data can be presented as ratios, delta
values, and atom percent excess (APE) (e.g., Fig. 6). For tracer
experiments, APE provides the clearest indication of the uptake of a
stable isotope tracer. APE is calculated based on the initial isotopic
ratio of the sample (or organism) at T = 0 (Ri) and the final isotopic
ratio in the sample, Rf [23]:
$$\mathrm{APE} = \left( \frac{R_f}{R_f + 1} - \frac{R_i}{R_i + 1} \right) \times 100\% \qquad (6)$$
Note that R is the ratio of the rare isotope to the abundant
isotope (e.g., 13C/12C) and that R/(R + 1) is the fraction, f, of the
rare isotope of element X, which can be written fX.
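Eq. 6 and the fraction f = R/(R + 1) translate directly into code. The delta function uses the standard per-mil definition. Here it is computed against the T = 0 ratio for illustration, whereas published delta values are normally referenced to an international standard. The ratios are hypothetical.

```python
def rare_fraction(r):
    """Fraction of the rare isotope, f = R / (R + 1), for a two-isotope system."""
    return r / (r + 1.0)


def atom_percent_excess(r_final, r_initial):
    """Eq. 6: APE in percent from final and initial rare/abundant ratios."""
    return 100.0 * (rare_fraction(r_final) - rare_fraction(r_initial))


def delta_permil(r_sample, r_reference):
    """Delta notation (per mil) relative to a reference ratio."""
    return (r_sample / r_reference - 1.0) * 1000.0


# Hypothetical 13C/12C ratios: unlabeled start and an enriched cell.
r_i, r_f = 0.0110, 0.0500
print(f"APE = {atom_percent_excess(r_f, r_i):.2f} at.%, "
      f"delta = {delta_permil(r_f, r_i):.0f} permil vs. T = 0")
```

The example shows why APE is preferred for tracer work: a large enrichment that reads as several thousand permil corresponds to a few atom percent of added label, which relates directly to how much tracer was incorporated.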
Data can also be presented as net incorporation of the labeled
element in the substrate if its isotopic composition and amount are
well constrained and it is uniformly available to the sampled organ-
isms. In Popa et al., we defined the term Fxnet as the net incorpora-
tion of an element (e.g., net carbon incorporation is “FCnet”)
[25]. Assuming a two-isotope system, we derived Fxnet based on a
two-component mixing model that accounts for the minor (Eq. 7)
and major isotopes (Eq. 8) of element X incorporated from the
initial biomass and the spiked pool:
$$f_{Xf} = F_i\, f_{Xi} + F_s\, f_{Xs} \qquad (7)$$
and
$$1 - f_{Xf} = F_i\,(1 - f_{Xi}) + F_s\,(1 - f_{Xs}), \qquad (8)$$
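Eqs. 7 and 8 can be solved in closed form. Adding them gives F_i + F_s = 1. Substituting F_i = 1 − F_s into Eq. 7 then yields F_s = (f_Xf − f_Xi)/(f_Xs − f_Xi). The sketch below implements that solution. Reading F_s as the Fxnet quantity defined above, that is, the fraction of element X contributed by the spiked pool, is our interpretation of the mixing model. The isotope fractions are hypothetical.

```python
def net_incorporation(f_final, f_initial, f_spike):
    """Fraction F_s of element X derived from the spiked pool (Eqs. 7-8).

    Adding Eqs. 7 and 8 gives F_i + F_s = 1; substituting F_i = 1 - F_s
    into Eq. 7 gives F_s = (f_final - f_initial) / (f_spike - f_initial).
    """
    if f_spike == f_initial:
        raise ValueError("spike must differ isotopically from initial biomass")
    return (f_final - f_initial) / (f_spike - f_initial)


# Hypothetical 13C fractions: natural biomass, 99% 13C spike, measured cell.
f_i, f_s = 0.0110, 0.99
f_f = 0.75 * f_i + 0.25 * f_s  # a cell that took 25% of its C from the spike
print(f"F_s = {net_incorporation(f_f, f_i, f_s):.3f}")
```

As the text notes, this only holds when the substrate's isotopic composition is well constrained and the substrate is uniformly available to the organisms sampled.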
3.5.2 Quantifying and Reporting Elemental Data
For biological samples, relative and absolute elemental concentrations are typically determined based on the relative ion count rates
for the element of interest, X, compared to a uniformly distributed
major element, typically C. This
approach may not be valid if the element of interest is in a structure
that is low in C relative to the average matrix concentration (e.g., if a
metal is sequestered in a vacuole). In rare cases, implantation of a
reference ion has been used to enable direct quantification in
biological samples [156]. To the extent SIMS is used to quantify
trace elements in biological samples, researchers tend to use matrix-matched elemental standards.
If a matrix-matched standard for element X is available, the
concentration of element X in an unknown, [X]UNK, can readily be determined
by proportionality using a parameter known as the relative
useful yield (RUY) [157]. This approach works because SIMS
typically yields a linear change in relative ion count rates as the
concentration of a species increases in the sample. Ideally, linearity is demonstrated in the relevant range using a set of standards.
Resolving isobaric interferences is an important aspect of getting a
reliable, linear response. The ion yield for the element of interest is
normalized to a reference ion. The RUY is defined as the ratio of the
concentrations of element X and the reference element (here C)
to the corresponding ion ratio measured for a standard:
$$\mathrm{RUY}_{X:C} = \frac{[X]_{\mathrm{STD}}}{[C]_{\mathrm{STD}}} \left( \frac{X^{+}}{C^{+}} \right)_{\mathrm{STD}}^{-1} \qquad (11)$$
where [X]STD and [C]STD are the concentrations in the standard of
element X and carbon, respectively, and (X+/C+)STD is the measured
ion ratio, here shown as positive ions. Note that the concentrations
can be in any units, and the ion ratio can be for the measured
species (e.g., 56Fe+ and 12C+) or it can be corrected for the isotope
abundances, as long as these choices and the measured species are
consistent for the standard and the unknowns. The RUY is then
used to calculate the concentration of element X in the unknown
using:
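$$[X]_{\mathrm{UNK}} = [C]_{\mathrm{UNK}} \cdot \mathrm{RUY}_{X:C} \cdot \left( \frac{X^{+}}{C^{+}} \right)_{\mathrm{UNK}} \qquad (12)$$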
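Eqs. 11 and 12 amount to a two-step proportionality. In the sketch below, all concentrations and ion ratios are hypothetical (a 100 μg/g Fe standard in a matrix with 450,000 μg/g C). Any units work as long as the standard and the unknown use the same ones, as the text requires.

```python
def relative_useful_yield(x_std, c_std, ion_ratio_std):
    """Eq. 11: RUY_X:C = ([X]_STD / [C]_STD) / (X+/C+)_STD."""
    return (x_std / c_std) / ion_ratio_std


def concentration_unknown(c_unk, ruy, ion_ratio_unk):
    """Eq. 12: [X]_UNK = [C]_UNK * RUY_X:C * (X+/C+)_UNK."""
    return c_unk * ruy * ion_ratio_unk


# Hypothetical standard: 100 ug/g Fe in a matrix with 450,000 ug/g C,
# measured 56Fe+/12C+ = 0.002. Unknown: same C content, 56Fe+/12C+ = 0.006.
ruy = relative_useful_yield(x_std=100.0, c_std=450_000.0, ion_ratio_std=0.002)
fe_unk = concentration_unknown(c_unk=450_000.0, ruy=ruy, ion_ratio_unk=0.006)
print(f"RUY = {ruy:.4f}; [Fe]_UNK = {fe_unk:.0f} ug/g")
```

Because the unknown's C content enters Eq. 12 directly, an error in [C]UNK propagates one-to-one into [X]UNK. This is one reason the approach is questionable for low-C structures such as metal-rich vacuoles.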
4 Future Directions
Acknowledgments
References
1. Mayali X, Weber PK, Brodie EL, Mabery S, Hoeprich PD, Pett-Ridge J (2012) High-throughput isotopic analysis of RNA microarrays to quantify microbial resource use. ISME J 6(6):1210–1221
2. Adamczyk J, Hesselsoe M, Iversen N, Horn M, Lehner A, Nielsen PH, Schloter M, Roslev P, Wagner M (2003) The isotope array, a new tool that employs substrate-mediated labeling of rRNA for determination of microbial community structure and function. Appl Environ Microbiol 69(11):6875–6887
3. Ouverney CC, Fuhrman JA (1999) Combined microautoradiography-16S rRNA probe technique for determination of radioisotope uptake by specific microbial cell types in situ. Appl Environ Microbiol 65(4):1746–1752
4. Jehmlich N, Schmidt F, Taubert M, Seifert J, Bastida F, von Bergen M, Richnow H-H, Vogt C (2010) Protein-based stable isotope probing. Nat Protoc 5(12):1957–1966
5. Murrell JC, Whiteley AS (2011) Stable isotope probing and related technologies. ASM Press, Washington, DC, p 345
6. Koch BJ, McHugh TA, Hayer M, Schwartz E, Blazewicz SJ, Dijkstra P, van Gestel N, Marks JC, Mau RL, Morrissey EM, Pett-Ridge J, Hungate BA (2018) Estimating taxon-specific population dynamics in diverse microbial communities. Ecosphere 9(1):e02090–e02015
7. Hillion F, Daigne B, Girard F, Slodzian G (1993) A new high performance instrument: the CAMECA NanoSIMS 50. In: Benninghoven A et al (eds) Secondary ion mass spectrometry: SIMS IX, vol 254-257. John Wiley & Sons, Chichester
8. Ghosal S, Leighton TJ, Wheeler KE, Hutcheon ID, Weber PK (2010) Spatially resolved characterization of water and ion incorporation in Bacillus spores. Appl Environ Microbiol 76(10):3275–3282
9. Orphan VJ, House CH, Hinrichs K-U, McKeegan KD, DeLong EF (2001) Methane-consuming archaea revealed by directly coupled isotopic and phylogenetic analysis. Science 293(5529):484–487
10. Smart K, Kilburn M, Salter C, Smith J, Grovenor C (2007) NanoSIMS and EPMA analysis of nickel localisation in leaves of the hyperaccumulator plant Alyssum lesbiacum. Int J Mass Spectrom 260(2–3):107–114
11. Stadermann FJ, Walker RM, Zinner E (1999) Nanosims: the next generation ion probe for the microanalysis of extraterrestrial material. Meteorit Planet Sci 34:A111–A112
12. Guerquin-Kern J-L, Hillion F, Madelmont J-C, Labarre P, Papon J, Croisy A (2004) Ultra-structural cell distribution of the melanoma marker iodobenzamide: improved potentiality of SIMS imaging in life sciences. BioMed Eng. http://www.biomedical-engineering-online.com/content/3/1/10
13. Kraft ML, Fishel SF, Marxer CG, Weber PK, Hutcheon ID, Boxer SG (2006) Quantitative analysis of supported membrane composition using the NanoSIMS. Appl Surf Sci 252(19):6950–6956
14. Moreau JW, Weber PK, Martin MC, Gilbert B, Hutcheon ID, Banfield JF (2007) Extracellular proteins limit the dispersal of biogenic nanoparticles. Science 316:1600–1603
15. Peteranderl R, Lechene C (2004) Measure of carbon and nitrogen stable isotope ratios in cultured cells. J Am Soc Mass Spectrom 15(4):478–485
16. Wainwright M, Weber PK, Smith JB, Hutcheon ID, Klyce B, Wickramasinghe NC, Narlikar JV, Rajaratnam P (2004) Studies on bacteria-like particles sampled from the stratosphere. Aerobiologia 20:237–240
17. Galli Marxner C, Kraft ML, Weber PK, Hutcheon I, Boxer SG (2005) Supported membrane composition analysis by secondary ion mass spectrometry with high lateral resolution. Biophys J 88:2965–2975
18. Dekas AE, Poretsky RS, Orphan VJ (2009) Deep-sea archaea fix and share nitrogen in methane-consuming microbial consortia. Science 326(5951):422–426
19. Halm H, Musat N, Lam P, Langlois R, Musat F, Peduzzi S, Lavik G, Schubert CJ, Sinha B, LaRoche J, Kuypers MMM (2009) Co-occurrence of denitrification and nitrogen fixation in a meromictic lake, Lake Cadagno (Switzerland). Environ Microbiol 11(8):2190–2190
20. Musat N, Halm H, Winterholler B, Hoppe P, Peduzzi S, Hillion F, Horreard F, Amann R, Jørgensen BB, Kuypers MMM (2008) A single-cell view on the ecophysiology of anaerobic phototrophic bacteria. Proc Natl Acad Sci 105(46):17861–17866
21. Quintana C, Wu TD, Delatour B, Dhenain M, Guerquin-Kern JL, Croisy A (2007) Morphological and chemical studies of pathological human and mice brain at the subcellular level: correlation between light, electron, and NanoSIMS microscopies. Microsc Res Tech 70(4):281–295
22. Behrens S, Losekann T, Pett-Ridge J, Weber PK, Ng W, Stevenson BS, Hutcheon ID, Relman DA, Spormann AM (2008) Linking microbial phylogeny to metabolic activity at the single-cell level by using enhanced element labeling-catalyzed reporter deposition fluorescence in situ hybridization (EL-FISH) and NanoSIMS. Appl Environ Microbiol 74(10):3143
23. Finzi-Hart J, Pett-Ridge J, Weber P, Popa R, Fallon SJ, Gunderson T, Hutcheon I, Nealson K, Capone DG (2008) Fixation and fate of carbon and nitrogen in Trichodesmium IMS101 using nanometer resolution secondary ion mass spectrometry (NanoSIMS). PNAS 106:6345–6350
24. Lechene C, Hillion F, McMahon G, Benson D, Kleinfeld A, Kampf JP, Distel D, Luyten Y, Bonventre J, Hentschel D, Park K, Ito S, Schwartz M, Benichou G, Slodzian G (2006) High-resolution quantitative imaging of mammalian and bacterial cells using stable isotope mass spectrometry. J Biol 5(6):20
25. Popa R, Weber PK, Pett-Ridge J, Finzi JA, Fallon SJ, Hutcheon ID, Nealson KH, Capone DG (2007) Carbon and nitrogen fixation and metabolite exchange in and between individual cells of Anabaena oscillarioides. ISME J 1(4):354–360
26. Ghosal S, Fallon SJ, Leighton T, Wheeler KE, Hutcheon ID, Weber PK (2008) Imaging and 3D elemental characterization of intact bacterial spores with high-resolution secondary ion mass spectrometry (NanoSIMS) depth profile analysis. Anal Chem 80(15):5986–5992
27. Lechene CP, Luyten Y, McMahon G, Distel DL (2007) Quantitative imaging of nitrogen fixation by individual bacteria within animal cells. Science 317:1563–1566
28. Herrmann A, Ritz K, Nunan N, Clode P, Pett-Ridge J, Kilburn M, Murphy D, O'Donnell A, Stockdale E (2007) Nano-scale secondary ion mass spectrometry – a new analytical tool in biogeochemistry and soil ecology: a review article. Soil Biol Biochem 39:1835–1850
29. Mueller CW, Weber PK, Kilburn MR, Hoeschen C, Kleber M, Pett-Ridge J (2013) Advances in the analysis of biogeochemical interfaces: NanoSIMS to investigate soil microenvironments. In: Sparks D (ed) Advances in agronomy. Elsevier, Amsterdam
30. Renslow RS, Lindemann SR, Cole JK, Zhu Z, Anderton CR (2016) Quantifying element incorporation in multispecies biofilms using nanoscale secondary ion mass spectrometry image analysis. Biointerphases 11(2):02A322
31. Mayali X (2020) NanoSIMS: microscale quantification of biogeochemical activity with large-scale impacts. Annu Rev Mar Sci 12:449–467
32. Gao D, Huang X, Tao Y (2016) A critical review of NanoSIMS in analysis of microbial metabolic activities at single-cell level. Crit Rev Biotechnol 36(5):884–890
33. Zhao F-J, Moore KL, Lombi E, Zhu Y-G (2014) Imaging element distribution and speciation in plant cells. Trends Plant Sci 19(3):183–192
34. Musat N, Musat F, Weber PK, Pett-Ridge J (2016) Tracking microbial interactions with NanoSIMS. Curr Opin Biotechnol 41:114–121
35. Agüi-Gonzalez P, Jähne S, Phan NT (2019) SIMS imaging in neurobiology and cell biology. J Anal At Spectrom 34(7):1355–1368
36. Boxer SG, Kraft ML, Weber PK (2009) Advances in imaging secondary ion mass spectrometry for biological samples. Annu Rev Biophys 38:53–74
37. Nuñez J, Renslow R, Cliff JB III, Anderton CR (2018) NanoSIMS for biological applications: current practices and analyses. Biointerphases 13(3):03B301
38. Gorman BL, Kraft ML (2019) High-resolution secondary ion mass spectrometry analysis of cell membranes. ACS Publications, Washington, DC
39. CAMECA. NanoSIMS 50L: scientific publications. https://www.cameca.com/products/sims/nanosims
40. Pett-Ridge J, Weber PK (2012) NanoSIP: NanoSIMS applications for microbial biology. In: Navid A (ed) Microbial systems biology: methods and protocols. Humana, New York, NY
41. Dekas AE, Parada AE, Mayali X, Fuhrman JA, Wollard J, Weber PK, Pett-Ridge J (2019) Characterizing chemoautotrophy and heterotrophy in marine archaea and bacteria with single-cell multi-isotope nanoSIP. Front Microbiol 10:2682
42. Chadwick GL, Otero FJ, Gralnick JA, Bond DR, Orphan VJ (2019) NanoSIMS imaging reveals metabolic stratification within current-producing biofilms. Proc Natl Acad Sci 116(41):20716–20724
43. Volland J-M, Schintlmeister A, Zambalos H, Reipert S, Mozetič P, Espada-Hinojosa S, Turk V, Wagner M, Bright M (2018) NanoSIMS and tissue autoradiography reveal symbiont carbon fixation and organic carbon transfer to giant ciliate host. ISME J 12(3):714–727
44. Calabrese F, Voloshynovska I, Musat F, Thullner M, Schlömann M, Richnow HH, Lambrecht J, Müller S, Wick LY, Musat N (2019) Quantitation and comparison of phenotypic heterogeneity among single cells of monoclonal microbial populations. Front Microbiol 10:2814
45. Braun PD, Schulz-Vogt HN, Vogts A, Nausch M (2018) Differences in the accumulation of phosphorus between vegetative cells and heterocysts in the cyanobacterium Nodularia spumigena. Sci Rep 8(1):1–6
46. Gross A, Lin Y, Weber PK, Pett-Ridge J, Silver WL (2020) The role of soil redox conditions in microbial phosphorus cycling in humid tropical forests. Ecology 101(2):e02928
47. Ackerman CM, Weber PK, Xiao T, Thai B, Kuo TJ, Zhang E, Pett-Ridge J, Chang CJ (2018) Multimodal LA-ICP-MS and nanoSIMS imaging enables copper mapping within photoreceptor megamitochondria in a zebrafish model of Menkes disease. Metallomics 10(3):474–485
48. Hong-Hermesdorf A, Miethke M, Gallaher SD, Kropat J, Dodani SC, Barupala D, Chan J, Domaille DW, Shirasaki DI, Loo JA, Weber PK, Pett-Ridge J, Stemmler TL, Chang CJ, Merchant SS (2014) Selective sub-cellular visualization of trace metals identifies dynamic sites of Cu accumulation in Chlamydomonas. Nat Chem Biol 10:1034–1042
49. Dawson KS, Scheller S, Dillon JG, Orphan VJ (2016) Stable isotope phenotyping via cluster analysis of NanoSIMS data as a method for characterizing distinct microbial ecophysiologies and sulfur-cycling in the environment. Front Microbiol 7:774
50. Berry D, Mader E, Lee TK, Woebken D, Wang Y, Zhu D, Palatinszky M, Schintlmeister A, Schmid MC, Hanson BT (2015) Tracking heavy water (D2O) incorporation for identifying and sorting active microbial cells. Proc Natl Acad Sci 112(2):E194–E203
51. Kopf SH, McGlynn SE, Green-Saxena A, Guan Y, Newman DK, Orphan VJ (2015) Heavy water and 15N labelling with NanoSIMS analysis reveals growth rate-dependent metabolic heterogeneity in chemostats. Environ Microbiol 17(7):2542–2556
52. Ploug H, Musat N, Adam B, Moraru CL, Lavik G, Vagner T, Bergman B, Kuypers MMM (2010) Carbon and nitrogen fluxes associated with the cyanobacterium Aphanizomenon sp. in the Baltic Sea. ISME J 4(9):1215–1223
53. Scheller S, Yu H, Chadwick GL, McGlynn SE, Orphan VJ (2016) Artificial electron acceptors decouple archaeal methane oxidation from sulfate reduction. Science 351(6274):703–707
54. Dekas AE, Connon SA, Chadwick GL, Trembath-Reichert E, Orphan VJ (2016) Activity and interactions of methane seep microorganisms assessed by parallel transcription and FISH-NanoSIMS analyses. ISME J 10(3):678–692
55. Green-Saxena A, Dekas AE, Dalleska NF, Orphan VJ (2014) Nitrate-based niche differentiation by distinct sulfate-reducing bacteria involved in the anaerobic oxidation of methane. ISME J 8(1):150–163
56. Milucka J, Kirf M, Lu L, Krupke A, Lam P, Littmann S, Kuypers MMM, Schubert CJ (2015) Methane oxidation coupled to oxygenic photosynthesis in anoxic waters. ISME J 9(9):1991–2002
57. Marlow JJ, Steele JA, Ziebis W, Thurber AR, Levin LA, Orphan VJ (2014) Carbonate-hosted methanotrophy represents an unrecognized methane sink in the deep sea. Nat Commun 5(1):1–12
58. Oswald K, Graf JS, Littmann S, Tienken D, Brand A, Wehrli B, Albertsen M, Daims H, Wagner M, Kuypers MM (2017) Crenothrix are major methane consumers in stratified lakes. ISME J 11(9):2124–2140
59. Foster RA, Kuypers MM, Vagner T, Paerl RW, Musat N, Zehr JP (2011) Nitrogen fixation and transfer in open ocean diatom-cyanobacterial symbioses. ISME J 5(9):1484–1493
60. Thompson AW, Foster RA, Krupke A, Carter BJ, Musat N, Vaulot D, Kuypers MMM, Zehr JP (2012) Unicellular cyanobacterium symbiotic with a single-celled eukaryotic alga. Science 337(6101):1546–1550
61. Adam B, Klawonn I, Sveden JB, Bergkvist J, Nahar N, Walve J, Littmann S, Whitehouse MJ, Lavik G, Kuypers MMM, Ploug H (2016) N2-fixation, ammonium release and N-transfer to the microbial and classical food web within a plankton community. ISME J 10(2):450–459
62. Berry D, Stecher B, Schintlmeister A, Reichert J, Brugiroux S, Wild B, Wanek W,
63. Carpenter KJ, Weber PK, Davisson ML, Pett-Ridge J, Haverty MI, Keeling PJ (2013) Correlated SEM, FIB-SEM, TEM, and NanoSIMS imaging of microbes from the hindgut of a lower termite: methods for in situ functional and ecological studies of uncultivable microbes. Microsc Microanal 19(06):1490–1501
64. Tai V, Carpenter KJ, Weber PK, Nalepa CA, Perlman SJ, Keeling PJ (2016) Genome evolution and nitrogen-fixation in bacterial ectosymbionts of a protist inhabiting wood-feeding cockroaches. Appl Environ Microbiol. https://doi.org/10.1128/AEM.00611-16
65. Ceh J, Kilburn MR, Cliff JB, Raina JB, van Keulen M, Bourne DG (2013) Nutrient cycling in early coral life stages: Pocillopora damicornis larvae provide their algal symbiont (Symbiodinium) with nitrogen acquired from bacterial associates. Ecol Evol 3(8):2393–2400
66. Pernice M, Dunn SR, Tonk L, Dove S, Domart-Coulon I, Hoppe P, Schintlmeister A, Wagner M, Meibom A (2015) A nanoscale secondary ion mass spectrometry study of dinoflagellate functional diversity in reef-building corals. Environ Microbiol 17(10):3570–3580
67. Wangpraseurt D, Pernice M, Guagliardo P, Kilburn MR, Clode PL, Polerecky L, Kühl M (2016) Light microenvironment and single-cell gradients of carbon fixation in tissues of symbiont-bearing corals. ISME J 10(3):788–792
68. Lema KA, Clode PL, Kilburn MR, Thornton R, Willis BL, Bourne DG (2016) Imaging the uptake of nitrogen-fixing bacteria into larvae of the coral Acropora millepora. ISME J 10(7):1804–1808
69. Kopp C, Domart-Coulon I, Barthelemy D, Meibom A (2016) Nutritional input from dinoflagellate symbionts in reef-building corals is minimal during planula larval life stage. Sci Adv 2(3):e1500681
70. Yang S-H, Tandon K, Lu C-Y, Wada N, Shih C-J, Hsiao SS-Y, Jane W-N, Lee T-C, Yang C-M, Liu C-T (2019) Metagenomic, phylogenetic, and functional characterization of predominant endolithic green sulfur bacteria in the coral Isopora palifera. Microbiome 7(1):1–13
71. Samo TJ, Kimbrel JA, Nilson DJ, Pett-Ridge J, Weber PK, Mayali X (2018) Attachment between heterotrophic bacteria and microal-
Richter A, Rauch I, Decker T (2013) Host- gae influences symbiotic microscale interac-
compound foraging by intestinal microbiota tions. Environ Microbiol 20(2):4385–4400
revealed by single-cell stable isotope probing. 72. de-Bashan LE, Mayali X, Bebout BM, Weber
Proc Natl Acad Sci 110(12):4720–4725 PK, Detweiler AM, Hernandez J-P, Prufert-
NanoSIP: NanoSIMS Applications for Microbial Biology 131
Bebout L, Bashan Y (2016) Establishment of mediated transfer of water and nutrients sti-
stable synthetic mutualism without mulates bacterial activity in dry and oligotro-
co-evolution between microalgae and bacteria phic environments. Nat Commun 8(1):1–9
demonstrated by mutual transfer of metabo- 83. Bougoure J, Ludwig M, Brundrett M, Cliff J,
lites (NanoSIMS isotopic imaging) and per- Clode P, Kilburn M, Grierson P (2014) High-
sistent physical association (fluorescent in situ resolution secondary ion mass spectrometry
hybridization). Algal Res 15:179–186 analysis of carbon dynamics in mycorrhizas
73. Alonso C, Musat N, Adam B, Kuypers M, formed by an obligately myco-heterotrophic
Amann R (2012) HISH–SIMS analysis of orchid. Plant Cell Environ 37(5):1223–1230
bacterial uptake of algal-derived carbon in 84. Hill PW, Broughton R, Bougoure J,
the Rı́o de la Plata estuary. Syst Appl Micro- Havelange W, Newsham KK, Grant H, Mur-
biol 35(8):541–548 phy DV, Clode P, Ramayah S, Marsden KA
74. Leroy C, Jauneau A, Martinez Y, Cabin- (2019) Angiosperm symbioses with
Flaman A, Gibouin D, Orivel J, Séjalon-Del- non-mycorrhizal fungal partners enhance N
mas N (2017) Exploring fungus–plant N acquisition from ancient organic matter in a
transfer in a tripartite ant–plant–fungus mutu- warming maritime Antarctic. Ecol Lett 22
alism. Ann Bot 120(3):417–426 (12):2111–2119
75. Lee JZ, Burow LC, Woebken D, Everroad 85. Mergelov N, Mueller CW, Prater I,
RC, Kubo MD, Spormann AM, Weber PK, Shorkunov I, Dolgikh A, Zazovskaya E,
Pett-Ridge J, Bebout BM, Hoehler TM Shishkov V, Krupskaya V, Abrosimov K, Cher-
(2014) Fermentation couples chloroflexi and kinsky A (2018) Alteration of rocks by endo-
sulfate-reducing bacteria to cyanobacteria in lithic organisms is one of the pathways for the
hypersaline microbial mats. Front Microbiol beginning of soils on Earth. Sci Rep 8
5:61 (1):1–15
76. Nuccio EE, Hodge A, Pett-Ridge J, Herman 86. Kopittke PM, Dalal RC, Hoeschen C, Li C,
DJ, Weber PK, Firestone MK (2013) An Menzies NW, Mueller CW (2020) Soil
arbuscular mycorrhizal fungus significantly organic matter is stabilized by organo-mineral
modifies the soil bacterial community and associations through two key processes: the
nitrogen cycling during litter decomposition. role of the carbon to nitrogen ratio. Geo-
Environ Microbiol 15(6):1870–1881 derma 357:113974
77. Kaiser C, Kilburn MR, Clode PL, 87. Keiluweit M, Bougoure JJ, Zeglin LH, Myr-
Fuchslueger L, Koranda M, Cliff JB, Solaiman old DD, Weber PK, Pett-Ridge J, Kleber M,
ZM, Murphy DV (2015) Exploring the trans- Nico PS (2012) Nano-scale investigation of
fer of recent plant photosynthates to soil the association of microbial nitrogen residues
microbes: mycorrhizal pathway vs direct root with iron (hydr)oxides in a forest soil
exudation. New Phytol 205(4):1537–1551 O-horizon. Geochim Cosmochim Acta
78. Kuga Y, Sakamoto N, Yurimoto H (2014) 95:213–226
Stable isotope cellular imaging reveals that 88. Keiluweit M, Bougoure JJ, Nico PS, Pett-
both live and degenerating fungal pelotons Ridge J, Weber PK, Kleber M (2015) Mineral
transfer carbon and nitrogen to orchid proto- protection of soil carbon counteracted by root
corms. New Phytol 202(2):594–605 exudates. Nat Clim Chang 5(6):588–595
79. Pett-Ridge J, Firestone MK (2017) Using sta- 89. Morrison KD, Misra R, Williams LB (2016)
ble isotopes to explore root-microbe-mineral Unearthing the antibacterial mechanism of
interactions in soil. Rhizosphere 3:244–253 medicinal clay: a geochemical approach to
80. Hestrin R, Hammer EC, Mueller CW, Leh- combating antibiotic resistance. Sci Rep
mann J (2019) Synergies between mycorrhi- 6:19043
zal fungi and soil microbial communities 90. Londono SC, Hartnett HE, Williams LB
increase plant nitrogen acquisition. Commun (2017) Antibacterial activity of aluminum in
Biol 2(1):1–9 clay from the Colombian Amazon. Environ
81. Gorka S, Dietrich M, Mayerhofer W, Sci Technol 51(4):2401–2408
Gabriel R, Wiesenbauer J, Martin V, 91. Eichorst SA, Strasser F, Woyke T,
Zheng Q, Imai B, Prommer J, Weidinger M Schintlmeister A, Wagner M, Woebken D
(2019) Rapid transfer of plant photosynthates (2015) Advancements in the application of
to soil bacteria via ectomycorrhizal hyphae NanoSIMS and Raman microspectroscopy to
and its interaction with nitrogen availability. investigate the activity of microbial cells in
Front Microbiol 10:168 soils. FEMS Microbiol Ecol 91(10):fiv106
82. Worrich A, Stryhanyuk H, Musat N, König S, 92. Pasulka AL, Thamatrakoln K, Kopf SH,
Banitz T, Centler F, Frank K, Thullner M, Guan Y, Poulos B, Moradian A, Sweredoski
Harms H, Richnow H-H (2017) Mycelium- MJ, Hess S, Sullivan MB, Bidle KD (2018)
132 Jennifer Pett-Ridge and Peter K. Weber
in the plasma membranes of fibroblasts. Proc glycine-derived C and N retention with soil
Natl Acad Sci 110(8):E613–E622 organo-mineral associations. Biogeochem-
111. Smith NS, Boswell RW, Tesch PP, Martin NP istry 125(3):303–313
(2017) Rf system, magnetic filter, and high 122. Li T, Wu TD, Mazeas L, Toffin L, Guerquin-
voltage isolation for an inductively coupled Kern JL, Leblon G, Bouchez T (2008) Simul-
plasma ion source, Google Patents taneous analysis of microbial identity and
112. Smith N, Tesch P, Martin N, Kinion D function using NanoSIMS. Environ Micro-
(2008) A high brightness source for nano- biol 10(3):580–588
probe secondary ion mass spectrometry. 123. Benn PA, Perle MA (1992) Chromosome
Appl Surf Sci 255(4):1606–1609 staining and banding techniques. In: Rooney
113. Malherbe J, Penen F, Isaure M-P, Frank J, DE, Czepulkowski BH (eds) Human cytoge-
Hause G, Dobritzsch D, Gontier E, Horréard netics: volume I, constitutional analysis: a
FO, Hillion FO, Schaumlöffel D (2016) A practical approach. Oxford University Press,
new radio frequency plasma oxygen primary New York, NY
ion source on nano secondary ion mass spec- 124. Latt SA (1973) Microfluorometric detection
trometry for improved lateral resolution and of deoxyribonucleic acid replication in human
detection of electropositive elements at single metaphase chromosomes. Proc Natl Acad Sci
cell level. Anal Chem 88(14):7130–7136 U S A 49:3395–3399
114. Cabin-Flaman A, Monnier AFO, Coffinier Y, 125. Manefield M, Whiteley AS, Griffiths RI, Bai-
Audinot J-N, Gibouin D, Wirtz T, ley MJ (2002) RNA stable isotope probing, a
Boukherroub R, Migeon H-N, Bensimon A, novel means of linking microbial community
Jannière L (2011) Combed single DNA function to phylogeny. Appl Environ Micro-
molecules imaged by secondary ion mass biol 68:5367–5373
spectrometry. Anal Chem 83(18):6940–6947 126. Radajewski S, Ineson P, Parekh NR, Murrell J
115. Cabin-Flaman A, Monnier A-F, Coffinier Y, (2000) Stable-isotope probing as a tool in
Audinot J-N, Gibouin D, Wirtz T, microbial ecology. Nature 403(10):646–649
Boukherroub R, Migeon H-N, Bensimon A, 127. Jensen LHS, Cheng T, Plane FOV, Escrig S,
Jannière L (2016) Combining combing and Comment A, van den Brandt B, Humbel BM,
secondary ion mass spectrometry to study Meibom A (2016) En route to ion micro-
DNA on chips using 13C and 15N labeling. probe analysis of soluble compounds at the
F1000Res 5:27429742 single cell level: the CryoNanoSIMS. In:
116. Weber PK, Graham GA, Teslich NE, European microscopy congress 2016: pro-
MoberlyChan W, Ghosal S, Leighton TJ, ceedings. Wiley, New York, NY
Wheeler KE (2010) NanoSIMS imaging of 128. Lovrić J, Malmberg P, Johansson BR,
Bacillus spores sectioned by focused ion Fletcher JS, Ewing AG (2016) Multimodal
beam. J Microsc 238:189–199 imaging of chemically fixed cells in prepara-
117. Wilson RG, Stevie FA, Magee CW (1989) tion for NanoSIMS. Anal Chem 88
Secondary ion mass spectrometry: a practical (17):8841–8848
handbook for depth profiling and bulk impu- 129. Gibbin E, Gavish A, Domart-Coulon I,
rity analysis. Wiley, New York, NY Kramarsky-Winter E, Shapiro O, Meibom A,
118. Polerecky L, Adam B, Milucka J, Musat N, Vardi A (2018) Using NanoSIMS coupled
Vagner T, Kuypers MM (2012) Look@ Nano- with microfluidics to visualize the early stages
SIMS–a tool for the analysis of nanoSIMS of coral infection by Vibrio coralliilyticus.
data in environmental microbiology. Environ BMC Microbiol 18(1):1–10
Microbiol 14(4):1009–1023 130. Nunan N, Ritz K, Crabb D, Harris K, Wu K,
119. Huang W, Hammel KE, Hao J, Thompson A, Crawford JW, Young IM (2001) Quantifica-
Timokhin VI, Hall SJ (2019) Enrichment of tion of the in situ distribution of soil bacteria
lignin-derived carbon in mineral-associated by large-scale imaging of thin sections of
soil organic matter. Environ Sci Technol 53 undisturbed soil. FEMS Microbiol Ecol 37
(13):7522–7531 (1):67–77
120. Whitman T, Zhu Z, Lehmann J (2014) Car- 131. Kuo J (2007) Electron microscopy: methods
bon mineralizability determines interactive and protocols. In: Methods in molecular biol-
effects on mineralization of pyrogenic organic ogy, 2nd edn. Humana, Totowa, NJ
matter and soil organic carbon. Environ Sci 132. Tippkötter R, Ritz K (1996) Evaluation of
Technol 48(23):13727–13734 polyester, epoxy and acrylic resins for suitabil-
121. Hatton P-J, Remusat L, Zeller B, Brewer EA, ity in preparation of soil thin sections for in
Derrien D (2015) NanoSIMS investigation of
134 Jennifer Pett-Ridge and Peter K. Weber
154. Guerquin-Kern JL, Wu TD, Quintana C, 165. Pernthaler A, Pernthaler J, Amann R (2002)
Croisy A (2005) Progress in analytical imag- Fluorescence in situ hybridization and cata-
ing of the cell by dynamic secondary ion mass lyzed reporter deposition for the identifica-
spectrometry (SIMS microscopy). BBA-Gen tion of marine bacteria. Appl Environ
Subjects 1724(3):228–238 Microbiol 68(6):3094–3101
155. Burns MS, File DM, Deline V, Galle P (1986) 166. Woebken D, Burow LC, Prufert-Bebout L,
Matrix effects in secondary ion mass spectro- Bebout BM, Hoehler TM, Pett-Ridge J,
metric analysis of biological tissue. Scan Elec- Spormann AM, Weber PK, Singer SW
tron Microscopy 1986(4):1277–1290 (2012) Identification of a novel cyanobacter-
156. Harris WC, Chandra S, Morrison GH (1983) ial group as active diazotrophs in a coastal
Ion implantation for quantitative ion micros- microbial mat using NanoSIMS analysis.
copy of biological soft tissue. Anal Chem 55 ISME J 6(7):1427–1439
(12):1959–1963 167. Lemaire R, Webb RI, Yuan Z (2008) Micro-
157. Phinney D (2006) Quantitative analysis of scale observations of the structure of aerobic
microstructures by secondary ion mass spec- microbial granules used for the treatment of
trometry. Microsc Microanal 12(4):352 nutrient-rich industrial wastewater. ISME J 2
158. Decelle J, Veronesi G, Gallet B, (5):528–541
Stryhanyuk H, Benettoni P, Schmidt M, 168. Hatzenpichler R, Scheller S, Tavormina PL,
Tucoulou R, Passarelli M, Bohic S, Clode P Babin BM, Tirrell DA, Orphan VJ (2014) In
(2020) Subcellular chemical imaging: new situ visualization of newly synthesized pro-
avenues in cell biology. Trends Cell Biol 30 teins in environmental microbes using amino
(3):173–188 acid tagging and click chemistry. Environ
159. Penen F, Malherbe J, Isaure M-P, Microbiol 16(8):2568–2590
Dobritzsch D, Bertalan I, Gontier E, Le 169. Bradley JP, Dai ZR, Erni R, Browning ND,
Coustumer P, Schaumlöffel D (2016) Chem- Graham GA, Weber PK, Smith JB, Hutcheon
ical bioimaging for the subcellular localization ID, Ishii H, Bajt S, Floss C, Stadermann FJ,
of trace elements by high contrast TEM, Sandford S (2005) An astronomical 2175 Å
TEM/X-EDS, and NanoSIMS. J Trace Elem feature in interplanetary dust particles. Sci-
Med Biol 37:62–68 ence 307:244–247
160. Nomaki H, LeKieffre C, Escrig S, Meibom A, 170. Remusat L, Hatton P-J, Nico PS, Zeller B,
Yagyu S, Richardson EA, Matsuzaki T, Kleber M, Derrien D (2012) NanoSIMS
Murayama M, Geslin E, Bernhard JM study of organic matter associated with soil
(2018) Innovative TEM-coupled approaches aggregates: advantages, limitations, and com-
to study foraminiferal cells. Mar Micropaleon- bination with STXM. Environ Sci Technol 46
tol 138:90–104 (7):3943–3949
161. Kraft ML, Weber PK, Longo ML, Hutcheon 171. De Samber B, De Rycke R, De Bruyne M,
ID, Boxer SG (2006) Phase separation of lipid Kienhuis M, Sandblad L, Bohic S, Cloetens P,
membranes analyzed with high-resolution Urban C, Polerecky L, Vincze L (2020)
secondary ion mass spectrometry. Science Effect of sample preparation techniques
313:1948–1951 upon single cell chemical imaging: a practical
162. Wirtz T, Fleming Y, Gysin U, Glatzel T, comparison between synchrotron radiation
Wegmann U, Meyer E, Maier U, Rychen J based X-ray fluorescence (SR-XRF) and nano-
(2013) Combined SIMS-SPM instrument for scopic secondary ion mass spectrometry
high sensitivity and high-resolution elemental (nano-SIMS). Anal Chim Acta 1106:22–32
3D analysis. Surf Interface Anal 45 172. Lehmann J, Kinyangi J, Solomon D (2007)
(1):513–516 Organic matter stabilization in soil microag-
163. Orphan VJ, House CH, Hinrichs K-U, gregates: implications from spatial heteroge-
McKeegan KD, DeLong EF (2001) neity of organic carbon contents and carbon
Methane-consuming Archaea revealed by forms. Biogeochemistry 85(1):45–57
directly coupled isotopic and phylogenetic 173. Wan J, Tyliszczak T, Tokunaga TK (2007)
analysis. Science 293:484–487 Organic carbon distribution, speciation, and
164. Amann RI, Krumholz L, Stahl DA (1990) elemental correlations within soil microaggre-
Fluorescent-oligonucleotide probing of gates: applications of STXM and NEXAFS
whole cells for determinative, phylogenetic, spectroscopy. Geochim Cosmochim Acta 71
and environmental studies in microbiology. J (22):5439–5449
Bacteriol 172(2):762–770
136 Jennifer Pett-Ridge and Peter K. Weber
174. Kopp C, Wisztorski M, Revel J, Mehiri M, response to resource availability: amino acid
Dani V, Capron L, Carette D, Fournier I, incorporation in San Francisco Bay. PLoS
Massi L, Mouajjah D (2015) MALDI-MS One 9(4):e95842
and NanoSIMS imaging techniques to study 181. Mayali X, Weber PK, Pett-Ridge J (2013)
cnidarian–dinoflagellate symbioses. Zoology Taxon-specific C:N relative use efficiency for
118(2):125–131 amino acids in an estuarine community.
175. Schlüter S, Eickhorst T, Mueller CW (2018) FEMS Microbiol Ecol 83(2):402–412
Correlative imaging reveals holistic view of 182. Bryson S, Li Z, Chavez F, Weber PK, Pett-
soil microenvironments. Environ Sci Technol Ridge J, Hettich RL, Pan C, Mayali X, Muel-
53(2):829–837 ler RS (2017) Phylogenetically conserved
176. Lin S, Henze S, Lundgren P, Bergman B, resource partitioning in the coastal microbial
Carpenter EJ (1998) Whole-cell immunolo- loop. ISME J 11(12):2781–2792
calization of nitrogenase in marine diazo- 183. Smith DF, Kiss A, Leach FE, Robinson EW,
trophic cyanobacteria, Trichodesmium spp. Paša-Tolić L, Heeren RM (2013) High mass
Appl Environ Microbiol 64(8):3052–3058 accuracy and high mass resolving power
177. Levenson RM, Borowsky AD, Angelo M FT-ICR secondary ion mass spectrometry
(2015) Immunohistochemistry and mass for biological tissue imaging. Anal Bioanal
spectrometry for highly multiplexed cellular Chem 405(18):6069–6076
molecular imaging. Lab Investig 95 184. Steele AV, Schwarzkopf A, McClelland JJ,
(4):397–405 Knuffman B (2017) High-brightness Cs
178. Singer SW, Chan CS, Hwang MH, Zemla A, focused ion beam from a cold-atomic-beam
VerBerkmoes NC, Hettich RL, Banfield JF, ion source. Nano Fut 1(1):015005
Thelen MP (2008) Characterization of cyto- 185. Hayes JM (2004) An introduction to isotopic
chrome579, an unusual cytochrome isolated calculations. Woods Hole Oceanographic
from an iron-oxidizing microbial community. Institution, Woods Hole, MA, USA.
Appl Environ Microbiol 74:4454–4462 https://www.whoi.edu/cms/files/jhayes/
179. Gerard E, Guyot F, Philippot P, Lopez-Garcia 2005/9/IsoCalcs30Sept04_5183.pdf
P (2005) Fluorescence in situ hybridization 186. Legin AA, Schintlmeister A, Jakupec MA,
coupled to ultra small immunogold detection Galanski M, Lichtscheidl I, Wagner M, Kep-
to identify prokaryotic cells using transmis- pler BK (2014) NanoSIMS combined with
sion and scanning electron microscopy. J fluorescence microscopy as a tool for subcel-
Microbiol Methods 63:20–28 lular imaging of isotopically labeled platinum-
180. Mayali X, Weber PK, Mabery S, Pett-Ridge J based anticancer drugs. Chem Sci 5(8):3135-
(2014) Phylogenetic patterns in the microbial 3143
Chapter 7
Abstract
Metatranscriptomic sequencing enables the study of community-wide gene expression profiles of microbial
samples and provides functional insight into their up- or downregulated pathways. However, shotgun
sequencing is not the most efficient way to study the expression of ribosomal RNA genes or to compare large
numbers of samples in experimental setups. Here we describe an efficient primer-independent method for
processing and barcoding libraries for directional sequencing of the 5′ end region of the RNA. When combined
with size selection of the original RNA, the method offers an optimal solution for the simultaneous analysis of
bacterial, archaeal, and eukaryotic rRNA diversity.
1 Introduction
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349,
https://doi.org/10.1007/978-1-0716-1585-0_7, © Springer Science+Business Media, LLC, part of Springer Nature 2022
138 Marja Tiirola and Anita Mäki
2 Materials
2.1 Size Selection of the RNA (Optional, See Note 1)
1. Your RNA samples in water or buffer, typically 5–50 ng/μL.
2. Control RNA isolated from E. coli or other organisms.
3. Molecular marker, e.g., 100 bp size ladder.
4. Pre-cast 1% agarose gel system (for RNA).
5. Soft cooling block, pre-cooled in a refrigerator.
6. Blue light table (emission at around 470 nm), sterile scalpels, and light-filtering glasses to protect the eyes from the light.
7. Kit for recovery of RNA fragments from agarose gel.
Construction of Metatranscriptomic Libraries for 5′ End Sequencing. . . 139
Fig. 1 Library preparation includes ligation of an RNA oligo to the extracted RNA and reverse transcription of
the RNA into cDNA using a random hexamer primer with a P1 adapter overhang. Libraries are amplified using
barcoded primers, dual size-selected (which includes purification), quantified, and pooled for next-generation
sequencing. The procedure can be completed in 1 or 2 working days.
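Because the pooled libraries carry sample-specific barcodes introduced by the PCR primers, reads have to be assigned back to their samples after sequencing. The following is a minimal sketch, assuming the barcode sits at the very start of each read and is matched exactly; the `BARCODES` table is hypothetical, since the chapter does not list the primer barcodes.

```python
# Hypothetical barcodes for illustration only; substitute the ones on your primers.
BARCODES = {
    "ACGTACGT": "sample_1",
    "TGCATGCA": "sample_2",
}


def demultiplex(reads, barcodes):
    """Assign reads to samples by an exact barcode match at the start of the read.

    Returns a dict mapping sample name -> list of reads with the barcode trimmed off.
    Reads that match no barcode are collected under "unassigned".
    """
    lengths = sorted({len(bc) for bc in barcodes}, reverse=True)
    out = {name: [] for name in barcodes.values()}
    out["unassigned"] = []
    for read in reads:
        for n in lengths:  # try longer barcodes first
            sample = barcodes.get(read[:n])
            if sample is not None:
                out[sample].append(read[n:])
                break
        else:  # no barcode matched this read
            out["unassigned"].append(read)
    return out


reads = ["ACGTACGTGGAAT", "TGCATGCATTTT", "GGGGGGGGAAAA"]
print(demultiplex(reads, BARCODES))
```

Production pipelines usually also tolerate one or two sequencing errors in the barcode; exact matching keeps the sketch short.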
3 Methods
3.1 RNA Size Selection (Optional)
1. Load 20 μL of your RNA onto a precast 1% agarose gel. In extra lanes, load the 100 bp size marker and isolated E. coli RNA to help select the right size region if the sample RNA is not visible.
2. Run the gel using an appropriate program until the rRNA bands are clearly visible (8–10 min). During the run, keep a cooled ice block on top of the gel to prevent it from heating.
3. Separate the gel plates by prying at the four corners with a spatula. Cut out the 16S/18S rRNA region (see Note 5) with a sterile scalpel and transfer the gel slices into clean microcentrifuge tubes (Fig. 1).
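With samples at the typical 5–50 ng/μL listed in Subheading 2.1, a 20 μL load puts roughly 100–1000 ng of RNA into a lane. A trivial helper makes the arithmetic explicit; the function name is illustrative only.

```python
def rna_loaded_ng(conc_ng_per_ul, volume_ul=20.0):
    """Mass of RNA (ng) loaded per lane for a given concentration and volume."""
    return conc_ng_per_ul * volume_ul


# Typical sample range from the materials list: 5-50 ng/uL.
for conc in (5, 50):
    print(f"{conc} ng/uL x 20 uL = {rna_loaded_ng(conc):.0f} ng")
```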
3.2 Ligation of the M13-RNA Oligo
1. Prepare the RNA ligation reaction (see Note 7). Also prepare a negative control using water instead of RNA, and process it as one of the samples in all subsequent steps.
3.3 cDNA Synthesis Using P1-Tailed Primer
1. Prepare the cDNA synthesis reaction by mixing the following reagents for each sample (see Note 9).
3.4 Amplification of the cDNA
1. Prepare the PCR master mix, here Maxima SYBR Green/Fluorescein Master Mix (Thermo Scientific).
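Reagents for the cDNA synthesis and PCR steps are usually combined into a master mix covering all samples plus the water negative control from Subheading 3.2. The sketch below scales per-reaction volumes; the reagent names and volumes in the example are hypothetical placeholders (the chapter's reagent table is not reproduced here), and the 10% overage is an assumed pipetting allowance.

```python
def master_mix(per_reaction_ul, n_samples, negative_control=True, overage=0.10):
    """Scale per-reaction volumes (uL) up to a master mix.

    One extra reaction is added for the water negative control, and `overage`
    (an assumed pipetting allowance, 10% by default) is added on top.
    """
    n_reactions = n_samples + (1 if negative_control else 0)
    factor = n_reactions * (1.0 + overage)
    return {reagent: round(vol * factor, 2) for reagent, vol in per_reaction_ul.items()}


# Hypothetical per-reaction recipe; replace with the volumes from your protocol.
recipe = {"2x master mix": 10.0, "primer mix": 1.0, "water": 7.0}
print(master_mix(recipe, n_samples=8))
```

Keeping the negative control inside the master mix ensures it receives exactly the same reagents as the samples.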
4 Notes
Fig. 2 RNA samples in the E-Gel agarose electrophoresis gel before (a) and after (b) extraction of the 16S/18S
rRNA gel band
[Fig. 3 image: panels (a) and (b), Sample Intensity [FU] versus Size [bp], 25–1500 bp, with lower and upper markers]
Fig. 3 Size distribution of the PCR product before (a) and after (b) the dual size selection using ProNex (see
Note 13)
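Before pooling, each library's mass concentration is typically converted to molarity using the mean fragment length read from a size distribution such as Fig. 3b. This is a minimal sketch, assuming the commonly used average of about 660 g/mol per base pair of dsDNA; the library concentrations, sizes, and the 50 fmol target are hypothetical.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of dsDNA


def ng_per_ul_to_nM(conc_ng_per_ul, mean_length_bp):
    """Convert a dsDNA library concentration from ng/uL to nM."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_length_bp)


def equimolar_volumes(libraries, fmol_each):
    """Volume (uL) of each library that contributes `fmol_each` fmol to the pool.

    libraries: name -> (concentration in ng/uL, mean fragment length in bp).
    Uses 1 nM = 1 fmol/uL.
    """
    return {name: fmol_each / ng_per_ul_to_nM(conc, length)
            for name, (conc, length) in libraries.items()}


# Hypothetical measurements for two barcoded libraries.
libs = {"sample_1": (2.0, 300), "sample_2": (5.0, 350)}
print(equimolar_volumes(libs, fmol_each=50))
```

For example, 2 ng/μL at a mean size of 300 bp is about 10.1 nM, so about 4.95 μL supplies 50 fmol.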
Acknowledgments
References
1. Woese CR, Fox GE (1977) Phylogenetic struc- 5. Adiconis X, Haber AL, Simmons SK, Levy
ture of the prokaryotic domain: the primary Moonshine A, Ji Z, Busby MA, Shi X,
kingdoms. Proc Natl Acad Sci U S A Jacques J, Lancaster MA, Pan JQ, Regev A,
74:5088–5090 Levin JZ (2018) Comprehensive comparative
2. M€aki A, Salmi P, Mikkonen A, Kremp A, Tiirola analysis of 50 -end RNA-sequencing methods.
M (2017) Sample preservation, DNA or RNA Nat Methods 15(7):505–511
extraction and data analysis for high-throughput 6. Karst SM, Dueholm MS, McIlroy SJ, Kirkegaard
phytoplankton community sequencing. Front RH, Nielsen PH, Albertsen M (2018) Retrieval
Microbiol 8:1848 of a million high-quality, full-length microbial
3. Carini P, Marsden PJ, Leff JW, Morgan EE, 16S and 18S rRNA gene sequences without
Strickland MS, Fierer N (2016) Relic DNA is primer bias. Nat Biotechnol 36:190–195
abundant in soil and obscures estimates of soil 7. M€aki A, Rissanen AJ, Tiirola M (2016) A practi-
microbial diversity. Nat Microbiol 2:16242 cal method for barcoding and size-trimming
4. Machida RJ, Lin YY (2014) Four methods of PCR templates for amplicon sequencing. Bio-
preparing mRNA 50 end libraries using the Illu- Techniques 60:88–90
mina sequencing platform. PLoS One 9(7): 8. M€aki A, Tiirola M (2018) Directional high-
e101812 throughput sequencing of RNAs without gene-
specific primers. BioTechniques 60:219–223
Chapter 8
Abstract
The easily programmable CRISPR/Cas9 system has found applications in biomedical research as well as in
microbial and crop engineering, owing to its ability to create site-specific edits. This powerful and flexible
system has also been modified to enable inducible gene regulation, epigenome modification, and high-
throughput screens. Designing efficient and specific guides for the nuclease is a key step and also a major
challenge in its effective application. This chapter describes rules for sgRNA design and important features to
consider, while touching upon bioinformatics advances in predicting efficient guides. Computational tools
that suggest improved guides for a given application, or that predict off-targets, are also described
and compared.
1 Introduction
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349,
https://doi.org/10.1007/978-1-0716-1585-0_8, © Springer Science+Business Media, LLC, part of Springer Nature 2022
148 Jaspreet Kaur Dhanjal et al.
Fig. 2 Schematic representation of the mechanism of action of the CRISPR/Cas9 system. Once the CRISPR/Cas9
complex binds to the target DNA, the Cas endonuclease recognizes the PAM on the non-target strand, initiating
cleavage. The resulting double-strand break is repaired by either non-homologous end joining (NHEJ) or
homologous recombination (HR). NHEJ simply re-ligates the broken ends but can result in the loss or gain of
nucleotides (indels). HR requires a homologous template to repair the break by synthesizing new nucleotides
between the broken ends.
2.1 Gene Knockout
Gene knockout refers to disruption of the expression of a genetic locus. CRISPR/Cas9-based knockout is widely used in functional genomics and drug screening. Many research groups have reported successful CRISPR/Cas9 knockout screening in human cells [4, 12, 22, 23]. Beyond mammalian systems, the CRISPR/Cas9 system is also widely used to manipulate the genomes of microorganisms [24–27].
Successful CRISPR/Cas9-based gene knockouts have been reported in different cell types across many organisms [22, 28]. A gene is knocked out when NHEJ introduces indels after the DSB. Indels whose length is not a multiple of three shift the reading frame and usually disrupt the protein, whereas in-frame indels may leave it functional. Consequently, it is common in CRISPR/Cas9-based knockout experiments for the gene to remain functional because NHEJ has generated in-frame variants. As microhomology of the DNA sequence plays a significant role in the generation of in-frame variants, the success of a CRISPR/Cas9 knockout depends on the microhomology profile of the target gene [29], and sgRNAs should be designed accordingly for efficient gene knockout. The nucleotide sequence downstream of the protospacer also plays a key role in determining knockout efficiency [30].
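The frameshift argument above reduces to a modulo-3 test on the net indel length. This minimal sketch classifies indels and reports the fraction expected to shift the frame; the indel sizes in the example are hypothetical, and a frameshift does not guarantee loss of function in every gene.

```python
def is_frameshift(indel_bp):
    """True if a net indel (insertions positive, deletions negative) shifts the frame."""
    return indel_bp % 3 != 0


def frameshift_fraction(indels):
    """Fraction of observed indels predicted to shift the reading frame."""
    if not indels:
        raise ValueError("no indels given")
    return sum(is_frameshift(x) for x in indels) / len(indels)


# Hypothetical indel sizes from sequencing edited clones.
observed = [-1, +1, -3, -2, +6, -7]
print(frameshift_fraction(observed))
```

Python's `%` returns a non-negative remainder for a positive divisor, so deletions (negative lengths) are handled without special cases.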
2.3 Gene Expression Control
The CRISPR/Cas9 complex can also be used to specifically regulate the expression of desired genes. A modified CRISPR/Cas9 system called CRISPRi (CRISPR interference) has been proposed for gene silencing [38]. The CRISPRi system consists of an sgRNA complexed with a catalytically dead Cas9 (dCas9), which lacks endonuclease activity and therefore functions as a simple RNA-guided DNA-binding system. When bound to a gene, this complex inhibits transcription by interfering with transcription elongation and with the binding of RNA polymerase and transcription factors. Similarly, the CRISPRa (CRISPR activation) system, in which the sgRNA–dCas9 complex is coupled with transcriptional activation domains, upregulates the expression of target genes when targeted to their promoter regions [39, 40].
3.1 Sequence Profile of the Target DNA and sgRNA
The 3′ end of the sgRNA spacer sequence plays a major role in determining Cas9 loading: RNAs with purines as the last four nucleotides of the spacer are preferred for binding by Cas9 over those with pyrimidines. In human and murine cell lines, guanines were found to be preferred at the −1 and −2 positions proximal to the PAM in efficient sgRNAs (Figure 1 shows the numbering pattern used to specify the nucleotide positions in sgRNA). Cytosine is preferred at the −3 position, which is the DNA cleavage site of the CRISPR/Cas9 complex [41]. Thymines are disfavored at the +4/−4 positions closest to the PAM [42]. This bias against thymines might affect the efficiency of cleavage or the introduction of mutations upon DNA repair. In addition, adenines are preferred from positions −5 to −12, and guanines at positions −14 to −17 [42]. A bias against guanine and in favor of cytosine has also been reported at position −16.
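The position-specific preferences above can be checked mechanically for a candidate spacer. This is a minimal sketch, assuming a numbering convention in which the 20-nt spacer is written 5′→3′ and position −1 is the nucleotide immediately 5′ of the PAM (Figure 1 is not reproduced here, so this mapping is an assumption). The features are reported rather than combined into a score, because the text gives no weights.

```python
def positional_features(spacer):
    """Report the position-specific features described in the text for a 20-nt spacer.

    Assumed numbering: the spacer is written 5'->3' and position -1 is the nucleotide
    immediately 5' of the PAM, so position -k maps to Python index -k.
    """
    s = spacer.upper()
    if len(s) != 20 or set(s) - set("ACGT"):
        raise ValueError("expected a 20-nt DNA spacer")
    return {
        "purines_in_last_four": all(b in "AG" for b in s[-4:]),
        "G_at_-1_and_-2": s[-1] == "G" and s[-2] == "G",
        "C_at_-3": s[-3] == "C",
        "no_T_at_-4": s[-4] != "T",
        "A_count_-12_to_-5": s[-12:-4].count("A"),
        "G_count_-17_to_-14": s[-17:-13].count("G"),
        "C_at_-16": s[-16] == "C",
    }


features = positional_features("TTTGGGGTAAAAAAAAACGG")  # hypothetical spacer
for name, value in features.items():
    print(name, value)
```

Some preferences come from different studies and can pull in opposite directions (for example, purines in the last four positions versus cytosine at −3), so reporting them separately keeps that visible.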
Apart from position-specific propensities, the overall GC content of the guide RNA is also an important factor in determining the efficiency of the system. The GC content of a stretch of nucleic acid largely determines its melting temperature, and melting is an important process for DNA interrogation by the guide: this ATP-independent process of RNA invasion starts with PAM binding, which leads to duplex distortion that facilitates local DNA melting [43, 44].
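To illustrate how GC content drives melting temperature, the sketch below computes the GC fraction of a spacer and a Wallace-rule estimate, Tm ≈ 2(A+T) + 4(G+C) °C. The Wallace rule is a rough approximation for short DNA oligonucleotides in solution, not a model of R-loop formation by Cas9; the spacer is hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C in a DNA sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq):
    """Wallace-rule melting temperature (degrees C) for a short DNA oligo: 2(A+T) + 4(G+C)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


spacer = "TTTGGGGTAAAAAAAAACGG"  # hypothetical 20-nt spacer (DNA alphabet)
print(f"GC = {gc_fraction(spacer):.2f}, Wallace Tm = {wallace_tm(spacer)} C")
```

Each G/C contributes twice as much as an A/T in this rule, which is the sense in which higher GC content raises the melting temperature.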
Computational Approaches for Designing Highly Specific and Efficient sgRNAs 153
3.2 PAM and the Flanking Sequence
Recent studies have reported nucleotide preferences for active sgRNAs not only at positions across the 20-nucleotide target sequence but also in the regions flanking it. The position immediately adjacent to the PAM was found to favor specific nucleotides. Guanine was reported to be strongly disfavored at position 1 (the nucleotide immediately 3′ of the NGG PAM), suggesting that an extended PAM sequence of CGGH (where H is A, C, or T) is optimal when using S. pyogenes Cas9 to engineer genomic DNA in cells. The same study also reported a bias for cytosine, and against thymine, at the variable nucleotide of the PAM [30].
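Locating candidate target sites starts with finding NGG PAMs and taking the 20 nt immediately 5′ of each one as the protospacer. The following minimal sketch also flags the extended CGGH pattern described above. It scans only the strand given; scanning the reverse complement as well would be needed to find sites on the other strand.

```python
def find_spcas9_sites(seq, spacer_len=20):
    """Yield (start, protospacer, pam, extended_pam_ok) for NGG PAMs on the given strand.

    extended_pam_ok is True when the PAM's variable nucleotide is C and the base
    immediately 3' of the PAM is A, C, or T (the CGGH pattern).
    """
    s = seq.upper()
    for i in range(spacer_len, len(s) - 2):
        pam = s[i:i + 3]
        if pam[1:] == "GG":
            protospacer = s[i - spacer_len:i]
            nxt = s[i + 3] if i + 3 < len(s) else ""
            ext_ok = pam[0] == "C" and nxt in ("A", "C", "T")
            yield i - spacer_len, protospacer, pam, ext_ok


# Hypothetical target: 20 A's followed by a CGG PAM and an A.
for site in find_spcas9_sites("A" * 20 + "CGGA"):
    print(site)
```

A site at the very end of the sequence, with no base after the PAM, is reported with `extended_pam_ok` set to False because the flanking base is unknown.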
3.3 Location Targeted Within the Gene
The efficacy of a guide RNA also depends on the location of the cut site and of the target site within the gene. A study carried out in 2014 showed that the activity of sgRNAs targeting close to the C-terminus is diminished, as frameshift mutations near the end of a protein are comparatively less likely to disrupt the encoded protein. Certain gene-specific patterns were also observed; for instance, the N-terminus of CD15 was a less effective target site, probably reflecting local chromatin accessibility [30]. Targeting more than one site per gene would presumably help compensate for such limitations. As expected, activity decreased rapidly with distance from the nearest CDS, and sgRNAs targeting the 5′ and 3′ UTRs were ineffective [30].
Although sgRNAs are usually designed to target the CDS, designing sgRNAs whose expected cut site lies exactly at an exon–intron boundary, so that splicing is disrupted, can be efficacious. This may be particularly useful when the CDS must be reintroduced, as in phenotype rescue experiments, because the reintroduced cDNA lacks the targeted exon–intron junction and is therefore not cleaved. A recent study also showed that sgRNAs targeting the transcribed strand are less effective than those targeting the non-transcribed strand [45].
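One way to act on the C-terminus observation is to compute where in the CDS each guide is expected to cut and prefer earlier cuts. This is a minimal sketch, assuming forward-strand guides with 0-based CDS coordinates and an SpCas9 break 3 bp 5′ of the PAM; the guide names, positions, and CDS length are hypothetical, and ranking purely by position ignores gene-specific exceptions such as the CD15 N-terminus noted above.

```python
def relative_cut_position(protospacer_start, cds_length, spacer_len=20):
    """Fraction of the CDS lying 5' of the expected SpCas9 cut (forward-strand guide).

    Positions are 0-based in CDS coordinates. SpCas9 is assumed to cut 3 bp 5' of
    the PAM, i.e. after nucleotide `spacer_len - 3` of the protospacer.
    """
    cut = protospacer_start + spacer_len - 3
    if not 0 < cut < cds_length:
        raise ValueError("expected cut site lies outside the CDS")
    return cut / cds_length


# Hypothetical candidates in a 1,200-bp CDS: rank earlier cuts first.
candidates = {"guide_A": 100, "guide_B": 1100, "guide_C": 520}
ranked = sorted(candidates, key=lambda g: relative_cut_position(candidates[g], 1200))
print(ranked)
```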
3.4 Structural Structural accessibility of the RNA has been shown to play an
Accessibility important role in RNA-guided sequence recognition by siRNA
and miRNA, and hence may also play a role in CRISPR/Cas9
system. The important determinants being secondary structure,
self-folding free energy, and the accessibility of individual nucleo-
tides of the sgRNAs [46–48].
To form a functional RNA–protein complex, the crRNA needs to interact with the Cas9 protein, which is facilitated by the well-defined structural motifs of the tracrRNA region. It was also seen that sgRNAs with very high or very low G/C content tend to be less active. The important nucleotides lie at positions 21–50, which form a stable stem-loop structure. It has also been reported that nucleotides at positions 51–53 pair with nucleotides at positions 18–20 of the guide RNA, forming an extended stem loop that decreases the base accessibility of the “seed” region, i.e., the 10–12 PAM-proximal nucleotides that are critical for target binding and recognition.
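The G/C-content and seed-region considerations above are easy to compute for a candidate spacer. The following is a minimal sketch, assuming a 20-nt spacer written 5′→3′ so that the seed is its 3′ end (positions 9–20 for a 12-nt seed). The function names and the 20%/80% G/C cut-offs are hypothetical, illustrative choices, not thresholds taken from any particular design tool.

```python
def gc_fraction(spacer):
    """Fraction of G and C bases in a spacer sequence."""
    spacer = spacer.upper()
    return (spacer.count("G") + spacer.count("C")) / len(spacer)


def seed_region(spacer, seed_len=12):
    """PAM-proximal 'seed': the 3' end of a 5'->3' spacer, next to the PAM."""
    return spacer[-seed_len:]


def gc_out_of_range(spacer, low=0.2, high=0.8):
    """Flag spacers with extreme G/C content (cut-offs are illustrative assumptions)."""
    gc = gc_fraction(spacer)
    return gc < low or gc > high


spacer = "GACGTTACCGGATCGATCGA"  # hypothetical 20-nt spacer
print(gc_fraction(spacer), seed_region(spacer), gc_out_of_range(spacer))
```

A real design pipeline would combine such simple filters with structure predictions (e.g., self-folding free energy) rather than use them alone.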
154 Jaspreet Kaur Dhanjal et al.
3.5 Microhomology Profile
The occurrence of the same short sequence of bases in different genomic regions is called microhomology, and its pattern is related to microhomology-mediated end joining (MMEJ), otherwise called alternative NHEJ (Alt-NHEJ). MMEJ is one of the DSB repair pathways in the cell. Its distinguishing property is the use of 5–25 bp microhomologies to align the broken ends before joining, which causes deletion of the sequences flanking the break. Microhomology sequence profiles correlate with in-frame mutations and are therefore used to predict sgRNA efficacy: CRISPR-based knockouts often generate in-frame variants that retain functionality, which reduces knockout efficiency. A study was reported where microhomology was used for in
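To make the link between microhomology and in-frame deletions concrete, the sketch below enumerates every pair of identical k-mers, one lying wholly to the left of a cut site and one wholly to the right, and reports the deletion that joining at that pair would produce (the left copy plus the intervening bases, j − i nt). A deletion whose length is a multiple of 3 preserves the reading frame. This is a naive illustration under stated assumptions: the function name, the 0-based cut convention, and the default minimum length of 2 nt are hypothetical choices, and the scoring and weighting used by published microhomology predictors (e.g., Bae et al. [29]) are not reproduced.

```python
def microhomology_deletions(seq, cut, min_len=2):
    """List candidate MMEJ deletions around a cut site (illustrative sketch).

    seq: DNA sequence around the target.
    cut: 0-based index of the first base to the right of the double-strand break.
    A left copy seq[i:i+k] (ending at or before the cut) identical to a right copy
    seq[j:j+k] (starting at or after the cut) is treated as a microhomology;
    repair at it yields seq[:i] + seq[j:], i.e., a deletion of j - i bases.
    """
    seq = seq.upper()
    results = []
    for k in range(min_len, min(cut, len(seq) - cut) + 1):
        for i in range(cut - k + 1):
            motif = seq[i:i + k]
            for j in range(cut, len(seq) - k + 1):
                if seq[j:j + k] == motif:
                    deletion = j - i
                    results.append({
                        "microhomology": motif,
                        "left_start": i,
                        "right_start": j,
                        "deletion_length": deletion,
                        "in_frame": deletion % 3 == 0,  # frame preserved
                    })
    return results


target = "AAACGTTTTTACGTGGG"  # hypothetical sequence, cut between index 7 and 8
hits = microhomology_deletions(target, cut=8)
in_frame_fraction = sum(h["in_frame"] for h in hits) / len(hits)
print(len(hits), round(in_frame_fraction, 2))
```

The fraction of in-frame outcomes gives a crude feel for how likely a site is to yield functional in-frame variants; all candidate pairs are weighted equally here, which real predictors do not do.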
Computational Approaches for Designing Highly Specific and Efficient sgRNAs 155
3.7 Experimental Conditions
Various approaches have been reported to increase the specificity of the CRISPR/Cas9 system by minimizing off-target cleavage events. These mainly include truncations [55] or extensions [56] at the 5′ ends of sgRNAs, co-localizing paired nickase mutants of Cas9 [57], fusing catalytically inactive Cas9 (dCas9) to the dimerization-dependent nuclease FokI [58, 59], and engineering high-fidelity mutants of the Cas9 protein [60–62]. Other strategies control the duration of CRISPR activity in target cells, for example by delivering Cas9 and sgRNA as a ribonucleoprotein complex (sgRNP) by electroporation or with cationic lipids [63, 64], by spatiotemporal control [65, 66], or by adding a CRISPR–Cas9 inhibitor in a time-dependent manner [67]. Among these methods, transient delivery of the RNP complex has gained wide favor, while the other methods are yet to be widely adopted [68].
Many tools are available today for predicting optimal sgRNAs that target unique locations within the genomes of various organisms and for predicting their on-target efficiency. Table 1 lists some representative tools for the computational design of sgRNAs.
5.1 sgRNA Design Tools for Knockout and Knockin Experiments
As discussed in the previous section, the success of CRISPR-based gene knockout experiments largely depends on the nucleotide sequence of the target region, its base pairing with the designed sgRNA, the location of the cut site within the gene, the generation of indels by NHEJ, and the chromatin structure. The microhomology sequence profile further helps predict the chance of in-frame mutations, which may reduce the efficacy of a knockout experiment by retaining partial functionality of the gene. Apart from complete knockout of genes, the CRISPR/Cas system can also be used to regulate gene expression by fusing a catalytically inactive Cas nuclease to transcriptional activators or repressors. sgRNAs designed for transcriptional regulation target gene promoters rather than coding regions, and therefore the sequence features governing their specificity differ from those used to design sgRNAs for knockout experiments. Owing to insufficient data from CRISPRi/a experiments, the sequence context of such sgRNAs is poorly understood, and few tools exist for designing CRISPRi/a sgRNAs. CRISPR-ERA [70] and SSC [71] are two such tools.
The prediction tools can be broadly categorized into three groups: (1) sequence-alignment-based tools, (2) hypothesis-driven tools, and (3) machine learning-based tools [54]. Alignment-based tools work by simply locating each 20-nucleotide sequence followed by a PAM in the region of interest, and then searching the genome for other sites similar to each candidate to probe its possible off-targets. Hypothesis-driven approaches extend these alignment-based tools: after the probable sgRNA binding sites are found with alignment methods, the efficacy of each is scored and ranked according to parameters derived from design rules. Learning approaches let machines mine a given dataset for the factors that are important for estimating the response variable; the extracted features are then used to build models that can learn, grow, and change when exposed to new data.
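The first step of an alignment-based tool, locating every 20-nt protospacer followed by a PAM, can be sketched in a few lines. The sketch assumes SpCas9 with an NGG PAM and scans both strands; the function names are hypothetical, positions on the "-" strand are reported as 0-based coordinates on the reverse-complemented sequence, and ambiguous bases and alternative PAMs are simply ignored.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(seq, spacer_len=20):
    """Return (spacer, pam, strand, start) for every spacer_len-nt site followed by NGG.

    start is the 0-based index of the spacer's first base on the strand searched;
    for '-' hits it refers to the reverse-complemented sequence.
    """
    seq = seq.upper()
    # A zero-width lookahead lets overlapping candidate sites all be reported
    pattern = re.compile("(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            hits.append((m.group(1), m.group(2), strand, m.start()))
    return hits


spacer = "GACGTTACCGGATCGATCGA"            # hypothetical spacer
print(find_protospacers(spacer + "TGG" + "AAAA"))
```

Each candidate returned here would then be searched against the whole genome for similar sites, and, in hypothesis-driven tools, scored with design rules.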
Table 1
List of computational resources for designing sgRNAs

With so many tools available for designing highly specific and efficient sgRNAs, choosing the most appropriate tool is a major challenge (see Note 1). The majority of the existing tools fall into the category of alignment-based prediction tools, and most of them consider only up to three mismatches, which does not reflect the real scenario [72–74]. In addition, the alignment algorithms used to search for off-targets with mismatches, such as Bowtie, the BWA aligner, and BatMis, limit prediction efficiency [72–76]. These algorithms are in general meant to find near-exact matches of short sequences within a large genome; hence, potential off-targets with more mismatches are increasingly likely to be missed. Choosing an appropriate alignment algorithm is therefore important. Moreover, DNA and RNA bulges in the sgRNA–DNA duplex further increase this tolerance for mismatches. Tools like Cas-OFFinder [77] and COSMID [78] give an option to incorporate gaps (serving as bulges) while searching for off-targets. However, they have their own disadvantages: first, a suboptimal alignment algorithm and, second, only a one-nucleotide bulge is allowed, whereas bulges of up to four nucleotides have already been reported [79]. Although different tools are expected to perform differently depending on the objective of prediction, hypothesis-driven and learning-based approaches have, as expected, been shown to perform better than simple alignment-based methods [48]. This is attributed to their algorithms taking sequence preferences and chromatin organization into account. Machine learning methods also offer the advantage of in-depth sequence profiling, which would be a limiting factor in manual curation. sgRNA-designer, a machine learning-based tool, was reported to perform best for human and mouse cell lines among 36 different tools compared in one study [80]. It ranks and picks sgRNAs for the desired genomic locus using rule sets built to maximize on-target activity and minimize off-target activity. Tsai et al. in 2015 developed a protocol called GUIDE-seq for the unbiased genome-wide detection of off-targets [81]. The authors also compared their experimental results with two in silico tools, namely MIT CRISPR and E-CRISP. Yet another study compared the performance of COSMID with several other prediction tools [82]. These comparative studies show a wide disparity between predicted and experimentally identified off-target sites [81, 82]. This may be explained by the fact that sites predicted purely on the basis of mismatches fail to capture the intrinsic off-target mechanisms.
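The mismatch-only off-target search criticized above can be illustrated with a naive exhaustive scan: every 20-nt window followed by an NGG PAM on either strand is compared with the guide, and windows with at most a given number of mismatches are reported. This is a sketch for illustration only, assuming an SpCas9 NGG PAM with no PAM mismatches and no DNA/RNA bulges; the function names are hypothetical, and real tools use indexed aligners or optimized searches instead of a linear scan. The default of three mismatches mirrors the limit discussed in the text.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(a, b):
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def scan_off_targets(guide, genome, max_mismatches=3):
    """Report (strand, start, mismatches, site) for NGG-adjacent sites near the guide.

    Coordinates on the '-' strand refer to the reverse-complemented genome.
    Bulges (gaps) are not considered, which is one of the limitations noted above.
    """
    guide, genome = guide.upper(), genome.upper()
    n = len(guide)
    hits = []
    for strand, s in (("+", genome), ("-", revcomp(genome))):
        for i in range(len(s) - n - 3 + 1):
            if s[i + n + 1:i + n + 3] != "GG":  # PAM must be NGG
                continue
            site = s[i:i + n]
            mm = count_mismatches(guide, site)
            if mm <= max_mismatches:
                hits.append((strand, i, mm, site))
    return hits


guide = "GACGTTACCGGATCGATCGA"                  # hypothetical guide
off = "CTCGTTACCGGATCGATCGA"                    # same site with 2 mismatches
genome = guide + "TGG" + "AAAA" + off + "AGG" + "AAAA"
print(scan_off_targets(guide, genome))
```

Raising max_mismatches widens the search at a steep cost in false positives, which is one reason mismatch counts alone predict experimentally observed off-targets poorly.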
Furthermore, factors such as sgRNA structure and chromatin accessibility, which also influence Cas9 binding and cleavage, are poorly explored by these tools. Some progress has been made by CROP-IT, an off-target prediction and identification tool that predicts off-target binding and cleavage sites by mining the available experimental datasets for Cas9 binding and cleavage [83]. We have recently developed a tool called CRISPcut for predicting optimal sgRNA binding sites in humans [84]. Along with mismatch tolerance, the
5.2 Species-Specific Tools for Aiding in sgRNA Designing
Some sgRNA design tools have been developed for very specific genomes. For example, CRISPR-P has been developed for plants [85], flyCRISPR for Drosophila [86], and EuPaGDT for eukaryotic pathogens [87]. These tools should be given priority when working with genomes of the corresponding species. However, the performance of these on-target sgRNA design tools requires thorough assessment, because most of the design rules incorporated in them have been derived from experiments done in human and mouse cells.
A recent effort by MacPherson et al. to compare the features of different sgRNA design tools is now available as an Excel-based program, called CRISPR Software Matchmaker, which helps users select the most appropriate tool for their needs [88].
5.3 Tools for Posterior Analysis of Data Obtained from CRISPR/Cas9-Based Experiments
These tools mainly handle data obtained from high-throughput screens using the CRISPR/Cas system and from genome-wide indel profiling after the experiments. The CRISPR/Cas9 system is widely used across the globe in high-throughput screens for (1) detecting genes involved in a biological process, (2) correlating gene function with specific phenotypes, and (3) uncovering the genetic causes of many diseases. These experiments mainly
Fig. 3 Designing sgRNAs using CRISPcut. CRISPcut is an easy-to-use tool for designing sgRNAs for human
targets. The major steps involved in the process are illustrated here
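A common first step when analyzing the pooled screens described in Sect. 5.3 is to compare sgRNA read counts between a control and a treated sample. The sketch below is a deliberately simple illustration, not the method of MAGeCK [89] or caRpools [90], which use more elaborate statistical models: counts are scaled to counts per million, a pseudocount is added, a log2 fold change is computed per sgRNA, and each gene is summarized by the median over its sgRNAs. The function names, the pseudocount of 1, and the read counts are hypothetical.

```python
import math
from statistics import median


def counts_per_million(counts):
    """Scale raw sgRNA read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {guide: c * 1e6 / total for guide, c in counts.items()}


def sgrna_log2_fold_change(control, treated, pseudocount=1.0):
    """log2((treated + pc) / (control + pc)) per sgRNA after CPM scaling.

    Assumes both samples contain the same sgRNA identifiers.
    """
    ctrl, trt = counts_per_million(control), counts_per_million(treated)
    return {g: math.log2((trt[g] + pseudocount) / (ctrl[g] + pseudocount)) for g in ctrl}


def gene_level_score(lfc, guide_to_gene):
    """Summarize each gene by the median log2 fold change of its sgRNAs."""
    per_gene = {}
    for guide, value in lfc.items():
        per_gene.setdefault(guide_to_gene[guide], []).append(value)
    return {gene: median(values) for gene, values in per_gene.items()}


# Hypothetical read counts for four sgRNAs targeting two genes
control = {"sg1": 100, "sg2": 100, "sg3": 100, "sg4": 100}
treated = {"sg1": 10, "sg2": 10, "sg3": 190, "sg4": 190}
guide_to_gene = {"sg1": "GENE_A", "sg2": "GENE_A", "sg3": "GENE_B", "sg4": "GENE_B"}
scores = gene_level_score(sgrna_log2_fold_change(control, treated), guide_to_gene)
print(scores)  # GENE_A is depleted (negative score), GENE_B enriched
```

Using several sgRNAs per gene and a robust summary such as the median reduces the influence of any single poorly performing guide.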
5.4 CRISPR/Cas9 Genome Editing Databases
Some of the available resources catalog sgRNAs that can be used for editing specific genes and are accessible to users. Representative examples of this category are listed in Table 1.
6 Notes
References
1. Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N et al (2013) Multiplex genome engineering using CRISPR/Cas systems. Science 339(6121):819–823
2. Mali P, Yang L, Esvelt KM, Aach J, Guell M, DiCarlo JE et al (2013) RNA-guided human genome engineering via Cas9. Science 339(6121):823–826
3. Malina A, Mills JR, Cencic R, Yan Y, Fraser J, Schippers LM et al (2013) Repurposing CRISPR/Cas9 for in situ functional assays. Genes Dev 27(23):2602–2614
4. Zhou Y, Zhu S, Cai C, Yuan P, Li C, Huang Y et al (2014) High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells. Nature 509(7501):487–491
5. Dhanjal JK, Radhakrishnan N, Sundar D (2017) Identifying synthetic lethal targets using CRISPR/Cas9 system. Methods 131:66–73
6. Qiu XY, Zhu LY, Zhu CS, Ma JX, Hou T, Wu XM et al (2018) High-effective and low-cost microRNA detection with CRISPR-Cas9. ACS Synth Biol 7(3):807–813
7. Sergiu C, Diana G, Amin H, Ioana BN (2018) Restoring the p53 ‘guardian’ phenotype in p53-deficient tumor cells with CRISPR/Cas9. Trends Biotechnol 36(7):653–660
8. Rauscher B, Heigwer F, Henkel L, Hielscher T, Voloshanenko O, Boutros M (2018) Toward an integrated map of genetic interactions in cancer cells. Mol Syst Biol 14(2):e7656
9. Pham HT, Mesplede T (2018) The latest evidence for possible HIV-1 curative strategies. Drugs Context 7:212522
10. Uppada V, Gokara M, Rasineni GK (2018) Diagnosis and therapy with CRISPR advanced CRISPR based tools for point of care diagnostics and early therapies. Gene 656:22–22
11. Shi J, Wang E, Milazzo JP, Wang Z, Kinney JB, Vakoc CR (2015) Discovery of cancer drug targets by CRISPR-Cas9 screening of protein domains. Nat Biotechnol 33(6):661–667
12. Chen Y, Cao J, Xiong M, Petersen AJ, Dong Y, Tao Y et al (2015) Engineering human stem cell lines with inducible gene knockout using CRISPR/Cas9. Cell Stem Cell 17(2):233–244
13. Belhaj K, Chaparro-Garcia A, Kamoun S, Patron NJ, Nekrasov V (2015) Editing plant genomes with CRISPR/Cas9. Curr Opin Biotechnol 32:76–84
14. Lowder LG, Zhang D, Baltes NJ, Paul JW III, Tang X, Zheng X et al (2015) A CRISPR/Cas9 toolbox for multiplexed plant genome editing and transcriptional regulation. Plant Physiol 169(2):971–985
15. Khatodia S, Bhatotia K, Passricha N, Khurana SM, Tuteja N (2016) The CRISPR/Cas genome-editing tool: application in improvement of crops. Front Plant Sci 7:506
16. Choi KR, Lee SY (2016) CRISPR technologies for bacterial systems: current achievements and future directions. Biotechnol Adv 34(7):1180–1209
17. Estrela R, Cate JH (2016) Energy biotechnology in the CRISPR-Cas9 era. Curr Opin Biotechnol 38:79–84
18. Donohoue PD, Barrangou R, May AP (2018) Advances in industrial biotechnology using CRISPR-Cas systems. Trends Biotechnol 36(2):134–146
19. Deltcheva E, Chylinski K, Sharma CM, Gonzales K, Chao Y, Pirzada ZA et al (2011) CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III. Nature 471(7340):602–607
20. Hawkins JS, Wong S, Peters JM, Almeida R, Qi LS (2015) Targeted transcriptional repression in bacteria using CRISPR interference (CRISPRi). Methods Mol Biol 1311:349–362
21. Ran FA, Hsu PD, Wright J, Agarwala V, Scott DA, Zhang F (2013) Genome engineering using the CRISPR-Cas9 system. Nat Protoc 8(11):2281–2308
22. Shalem O, Sanjana NE, Hartenian E, Shi X, Scott DA, Mikkelson T et al (2014) Genome-scale CRISPR-Cas9 knockout screening in human cells. Science 343(6166):84–87
23. Chu HW, Rios C, Huang C, Wesolowska-Andersen A, Burchard EG, O’Connor BP et al (2015) CRISPR-Cas9-mediated gene knockout in primary human airway epithelial cells reveals a proinflammatory role for MUC18. Gene Ther 22(10):822–829
24. Feng X, Zhao D, Zhang X, Ding X, Bi C (2018) CRISPR/Cas9 assisted multiplex genome editing technique in Escherichia coli. Biotechnol J 13(9):1700604
25. de Vries ARG, de Groot PA, van den Broek M, Daran J-MG (2017) CRISPR-Cas9 mediated gene deletions in lager yeast Saccharomyces pastorianus. Microb Cell Fact 16(1):222
26. Serif M, Dubois G, Finoux A-L, Teste M-A, Jallet D, Daboussi F (2018) One-step generation of multiple gene knock-outs in the diatom Phaeodactylum tricornutum by DNA-free genome editing. Nat Commun 9(1):3924
27. Hegde S, Nilyanimit P, Kozlova E, Narra HP, Sahni SK, Hughes GL (2018) CRISPR/Cas9-mediated gene deletion of the ompA gene in an enterobacter gut symbiont impairs biofilm formation and reduces gut colonization of Aedes aegypti mosquitoes. bioRxiv 13(12):e0007883
28. Shen Z, Zhang X, Chai Y, Zhu Z, Yi P, Feng G et al (2014) Conditional knockouts generated by engineered CRISPR-Cas9 endonuclease reveal the roles of coronin in C. elegans neural development. Dev Cell 30(5):625–636
29. Bae S, Kweon J, Kim HS, Kim JS (2014) Microhomology-based choice of Cas9 nuclease target sites. Nat Methods 11(7):705–706
30. Doench JG, Hartenian E, Graham DB, Tothova Z, Hegde M, Smith I et al (2014) Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nat Biotechnol 32(12):1262–1267
31. Maruyama T, Dougan SK, Truttmann MC, Bilate AM, Ingram JR, Ploegh HL (2015) Increasing the efficiency of precise genome editing with CRISPR-Cas9 by inhibition of nonhomologous end joining. Nat Biotechnol 33(5):538–542
32. Li HL, Fujimoto N, Sasakawa N, Shirai S, Ohkame T, Sakuma T et al (2015) Precise correction of the dystrophin gene in duchenne muscular dystrophy patient induced pluripotent stem cells by TALEN and CRISPR-Cas9. Stem Cell Rep 4(1):143–154
33. Peng J, Wang Y, Jiang J, Zhou X, Song L, Wang L et al (2015) Production of human albumin in pigs through CRISPR/Cas9-mediated knockin of human cDNA into swine albumin locus in the zygotes. Sci Rep 5:16705
34. Platt RJ, Chen S, Zhou Y, Yim MJ, Swiech L, Kempton HR et al (2014) CRISPR-Cas9 knockin mice for genome editing and cancer modeling. Cell 159(2):440–455
35. Dow LE (2015) Modeling disease in vivo with CRISPR/Cas9. Trends Mol Med 21(10):609–621
36. Zhang MM, Wong FT, Wang Y, Luo S, Lim YH, Heng E et al (2017) CRISPR–Cas9 strategy for activation of silent Streptomyces biosynthetic gene clusters. Nat Chem Biol 13(6):607
37. Behler J, Vijay D, Hess WR, Akhtar MK (2018) CRISPR-based technologies for metabolic engineering in cyanobacteria. Trends Biotechnol 36(10):996–1010
38. Qi LS, Larson MH, Gilbert LA, Doudna JA, Weissman JS, Arkin AP et al (2013) Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152(5):1173–1183
39. Cheng AW, Wang H, Yang H, Shi L, Katz Y, Theunissen TW et al (2013) Multiplexed activation of endogenous genes by CRISPR-on, an RNA-guided transcriptional activator system. Cell Res 23(10):1163–1171
40. Chuai GH, Wang QL, Liu Q (2017) In silico meets in vivo: towards computational CRISPR-based sgRNA design. Trends Biotechnol 35(1):12–21
41. Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N et al (2013) Multiplex genome engineering using CRISPR/Cas systems. Science 339(6121):819–823
42. Xu H, Xiao T, Chen CH, Li W, Meyer CA, Wu Q et al (2015) Sequence determinants of improved CRISPR sgRNA design. Genome Res 25(8):1147–1157
43. Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, Charpentier E (2012) A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337(6096):816–821
44. Mekler V, Minakhin L, Severinov K (2017) Mechanism of duplex DNA destabilization by RNA-guided Cas9 nuclease during target interrogation. Proc Natl Acad Sci U S A 114(21):5443–5448
45. Tim Wang JJW, Sabatini DM, Lander ES (2014) Genetic screens in human cells using the CRISPR-Cas9 system. Science 343(6166):80–84
46. Wang X, Wang X, Varma RK, Beauchamp L, Magdaleno S, Sendera TJ (2009) Selection of hyperfunctional siRNAs with improved potency and specificity. Nucleic Acids Res 37(22):e152
47. Long D, Lee R, Williams P, Chan CY, Ambros V, Ding Y (2007) Potent effect of target structure on microRNA function. Nat Struct Mol Biol 14(4):287–294
48. Robins H, Li Y, Padgett RW (2005) Incorporating structure to predict microRNA targets. Proc Natl Acad Sci U S A 102(11):4006–4009
49. Wong N, Liu W, Wang X (2015) WU-CRISPR: characteristics of functional guide RNAs for the CRISPR/Cas9 system. Genome Biol 16:218
50. Wu X, Scott DA, Kriz AJ, Chiu AC, Hsu PD, Dadon DB et al (2014) Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells. Nat Biotechnol 32(7):670–676
51. Fu Y, Foden JA, Khayter C, Maeder ML, Reyon D, Joung JK et al (2013) High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nat Biotechnol 31(9):822–826
52. Hsu PD, Scott DA, Weinstein JA, Ran FA, Konermann S, Agarwala V et al (2013) DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31(9):827–832
53. Yu C, Liu Y, Ma T, Liu K, Xu S, Zhang Y et al (2015) Small molecules enhance CRISPR genome editing in pluripotent stem cells. Cell Stem Cell 16(2):142–147
54. G-h C, Wang Q-L, Liu Q (2017) In silico meets in vivo: towards computational CRISPR-based sgRNA design. Trends Biotechnol 35(1):12–21
55. Fu Y, Sander JD, Reyon D, Cascio VM, Joung JK (2014) Improving CRISPR-Cas nuclease specificity using truncated guide RNAs. Nat Biotechnol 32(3):279–284
56. Daesik Kim SB, Park J, Kim E, Kim S, Yu HR, Hwang J, Kim J-I, Kim J-S (2015) Digenome-seq: genome-wide profiling of CRISPR-Cas9 off-target effects in human cells. Nat Methods 12:237–242
57. Ran FA, Hsu PD, Lin CY, Gootenberg JS, Konermann S, Trevino AE et al (2013) Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity. Cell 154(6):1380–1389
58. Tsai SQ, Wyvekens N, Khayter C, Foden JA, Thapar V, Reyon D et al (2014) Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing. Nat Biotechnol 32(6):569–576
59. Guilinger JP, Thompson DB, Liu DR (2014) Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification. Nat Biotechnol 32(6):577–582
60. Kleinstiver BP, Pattanayak V, Prew MS, Tsai SQ, Nguyen NT, Zheng Z et al (2016) High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529(7587):490–495
61. Slaymaker IM, Gao L, Zetsche B, Scott DA, Yan WX, Zhang F (2016) Rationally engineered Cas9 nucleases with improved specificity. Science 351(6268):84–88
62. Chen JS, Dagdas YS, Kleinstiver BP, Welch MM, Sousa AA, Harrington LB et al (2017) Enhanced proofreading governs CRISPR-Cas9 targeting accuracy. Nature 550(7676):407–410
63. Zuris JA, Thompson DB, Shu Y, Guilinger JP, Bessen JL, Hu JH et al (2015) Cationic lipid-mediated delivery of proteins enables efficient protein-based genome editing in vitro and in vivo. Nat Biotechnol 33(1):73–80
64. Kim S, Kim D, Cho SW, Kim J, Kim JS (2014) Highly efficient RNA-guided genome editing in human cells via delivery of purified Cas9 ribonucleoproteins. Genome Res 24(6):1012–1019
65. Zetsche B, Volz SE, Zhang F (2015) A split-Cas9 architecture for inducible genome editing and transcription modulation. Nat Biotechnol 33(2):139–142
66. Petris G, Casini A, Montagna C, Lorenzin F, Prandi D, Romanel A et al (2017) Hit and go CAS9 delivered through a lentiviral based self-limiting circuit. Nat Commun 8:15334
67. Shin J, Jiang F, Liu J-J, Bray NL, Rauch BJ, Baik SH, Nogales E, Bondy-Denomy J, Corn JE, Doudna JA (2017) Disabling Cas9 by an anti-CRISPR DNA mimic. Sci Adv 3(7):e1701620
68. Ryan DE, Taussig D, Steinfeld I, Phadnis SM, Lunstad BD, Singh M et al (2017) Improving CRISPR-Cas specificity with chemical modifications in single-guide RNAs. Nucleic Acids Res 46(2):792–803
69. Cameron P, Fuller CK, Donohoue PD, Jones BN, Thompson MS, Carter MM et al (2017) Mapping the genomic landscape of CRISPR-Cas9 cleavage. Nat Methods 14(6):600–606
70. Liu H, Wei Z, Dominguez A, Li Y, Wang X, Qi LS (2015) CRISPR-ERA: a comprehensive design tool for CRISPR-mediated gene editing, repression and activation. Bioinformatics 31(22):3676–3678
71. Xu H, Xiao T, Chen C-H, Li W, Meyer CA, Wu Q et al (2015) Sequence determinants of improved CRISPR sgRNA design. Genome Res 25(8):1147–1157
72. Ma M, Ye AY, Zheng W, Kong L (2013) A guide RNA sequence design platform for the CRISPR/Cas9 system for model organism genomes. Biomed Res Int 2013:270805
73. Montague TG, Cruz JM, Gagnon JA, Church GM, Valen E (2014) CHOPCHOP: a CRISPR/Cas9 and TALEN web tool for genome editing. Nucleic Acids Res 42(Web Server issue):W401–W407
74. O’Brien A, Bailey TL (2014) GT-scan: identifying unique genomic targets. Bioinformatics 30(18):2673–2675
75. Heigwer F, Kerr G, Boutros M (2014) E-CRISP: fast CRISPR target site identification. Nat Methods 11(2):122
76. Lei Y, Lu L, Liu HY, Li S, Xing F, Chen LL (2014) CRISPR-P: a web tool for synthetic single-guide RNA design of CRISPR-system in plants. Mol Plant 7(9):1494–1496
77. Bae S, Park J, Kim JS (2014) Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30(10):1473–1475
78. Cradick TJ, Qiu P, Lee CM, Fine EJ, Bao G (2014) COSMID: a web-based tool for identifying and validating CRISPR/Cas off-target sites. Mol Ther Nucl Acids 3(12):e214
79. Lin Y, Cradick TJ, Brown MT, Deshmukh H, Ranjan P, Sarode N et al (2014) CRISPR/Cas9 systems have off-target activity with insertions or deletions between target DNA and guide RNA sequences. Nucleic Acids Res 42(11):7473–7485
80. Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF et al (2016) Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat Biotechnol 34(2):184
81. Tsai SQ, Zheng Z, Nguyen NT, Liebers M, Topkar VV, Thapar V et al (2015) GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat Biotechnol 33(2):187–197
82. Lee CM, Cradick TJ, Fine EJ, Bao G (2016) Nuclease target site selection for maximizing on-target activity and minimizing off-target effects in genome editing. Mol Ther 24(3):475–487
83. Singh R, Kuscu C, Quinlan A, Qi Y, Adli M (2015) Cas9-chromatin binding information enables more accurate CRISPR off-target prediction. Nucleic Acids Res 43(18):e118
84. Dhanjal JK, Radhakrishnan N, Sundar D (2018) CRISPcut: a novel tool for designing optimal sgRNAs for CRISPR/Cas9 based experiments in human cells. Genomics 111(4):560–566
85. Lei Y, Lu L, Liu HY, Li S, Xing F, Chen LL (2014) CRISPR-P: a web tool for synthetic single-guide RNA design of CRISPR-system in plants. Mol Plant 7(9):1494–1496
86. Gratz SJ, Ukken FP, Rubinstein CD, Thiede G, Donohue LK, Cummings AM et al (2014) Highly specific and efficient CRISPR/Cas9-catalyzed homology-directed repair in Drosophila. Genetics 196(4):961–971
87. Peng D, Tarleton R (2015) EuPaGDT: a web tool tailored to design CRISPR guide RNAs for eukaryotic pathogens. Microb Genomics 1(4):e000033
88. MacPherson CR, Scherf A (2015) Flexible guide-RNA design for CRISPR applications using protospacer workbench. Nat Biotechnol 33(8):805
89. Li W, Xu H, Xiao T, Cong L, Love MI, Zhang F et al (2014) MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol 15(12):554
90. Winter J, Breinig M, Heigwer F, Brügemann D, Leible S, Pelz O et al (2015) caRpools: an R package for exploratory data analysis and documentation of pooled CRISPR/Cas9 screens. Bioinformatics 32(4):632–634
91. Güell M, Yang L, Church GM (2014) Genome editing assessment using CRISPR genome analyzer (CRISPR-GA). Bioinformatics 30(20):2968–2970
92. Pinello L, Canver M, Hoban M (2015) Crispresso: sequencing analysis toolbox for crispr-cas9 genome editing. bioRxiv. https://doi.org/10.1101/031203
Chapter 9
Abstract
A central driver for the field of systems biology is to develop an understanding of how interactions between
components affect the functioning of a system as a whole. Network analysis is an approach that is uniquely
suited to uncover patterns and organizing principles in a wide variety of complex systems. In this chapter,
we will give a detailed description of basic concepts for characterizing empirical networks, frequently used
random network models, and how to compute properties of networks using Python packages. We will
demonstrate the application of network analysis by investigating several biological networks.
1 Introduction
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349,
https://doi.org/10.1007/978-1-0716-1585-0_9, © Springer Science+Business Media, LLC, part of Springer Nature 2022
168 André Voigt and Eivind Almaas
2 Tools
2.2 NetworkX
NetworkX [17], first released in 2005, is a free and open-source Python library for the generation, manipulation, and analysis of many different types of networks. The main network types supported are ordinary undirected graphs, directed graphs, and multigraphs (both directed and undirected), which differ from ordinary graphs in that a given pair of nodes may be connected by any number of edges, with potentially different edge attributes. NetworkX can generate networks according to a variety of predefined models, including the Erdős–Rényi, Barabási–Albert, and configuration models. It also contains standard functions to compute many different network attributes, such as clustering, shortest paths, centrality, and k-cores. As a library for the Python programming language, it does not provide a graphical user interface for user input. Instead, it is accessed through the Python/IPython command line or through user-written programs and scripts. This chapter provides several examples of how to perform different aspects of network analysis using the built-in functions in NetworkX. Note that, while user input is entirely through the command line, NetworkX also provides graphical visualization of networks through integration with Matplotlib, another free/open-source Python library for 2D plotting.
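As a quick illustration of the random network models mentioned above, the sketch below generates one network from each with NetworkX's built-in generators. The network size, edge probability p, attachment parameter m, random seed, and the small degree sequence used for the configuration model are arbitrary choices made here for illustration. Note that NetworkX returns the configuration model as a MultiGraph, which may contain self-loops and parallel edges; collapsing it to a simple graph, as done here, is a common but optional post-processing step.

```python
import networkx as nx

n = 100  # number of nodes (arbitrary choice for illustration)

# Erdős–Rényi G(n, p): each node pair is linked independently with probability p
G_er = nx.erdos_renyi_graph(n, p=0.05, seed=42)

# Barabási–Albert: each new node attaches to m existing nodes with
# probability proportional to their degree (preferential attachment)
G_ba = nx.barabasi_albert_graph(n, m=2, seed=42)

# Configuration model: random wiring that preserves a given degree sequence
degree_sequence = [3, 2, 1, 5, 5, 4, 3, 3]  # the sum of degrees must be even
G_cm_multi = nx.configuration_model(degree_sequence, seed=42)  # MultiGraph
G_cm = nx.Graph(G_cm_multi)                                # collapse parallel edges
G_cm.remove_edges_from(list(nx.selfloop_edges(G_cm)))      # drop self-loops

for name, G in [("ER", G_er), ("BA", G_ba), ("CM", G_cm)]:
    print(name, G.number_of_nodes(), G.number_of_edges())
```

Comparing a measured property of an empirical network with its value in such randomized networks is a common way to judge whether that property is unexpected.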
2.3 Cytoscape
Cytoscape is a general open-source platform for complex network analysis and visualization [15]. It was initially developed as a bioinformatics program for visualizing molecular interaction networks and biological pathways and for integrating these networks with annotations, gene expression profiles, and other state data. The Cytoscape core contains a basic set of features for data integration and visualization. Many plug-ins exist for network and molecular profiling analyses, network layouts, additional file format support, scripting, and connection with databases. Plug-ins are developed by users through the Cytoscape open API, which is based on Java™ (http://www.java.com) technology, and plug-in development by the community is encouraged. Most of the plug-ins are freely available at http://www.cytoscape.org/.
CoexpressionNetwork/Rpackages/WGCNA/Tutorials/index.
html) contains extensive documentation that provides examples of
use of WGCNA on various types of data.
3 Biological Networks
import networkx as nx

# Symmetric adjacency matrix of an undirected network: node 1 connects to
# nodes 2 and 3; nodes 2 and 3 are not connected to each other
adjMat = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
G = nx.Graph()
for i in range(len(adjMat) - 1):
    for j in range(i + 1, len(adjMat)):  # upper triangle only, since the matrix is symmetric
        if adjMat[i][j] == 1:
            G.add_edge(i + 1, j + 1)  # shift matrix indices to 1-based node labels
nx.degree(G, 1)  # Returns degree for node 1, equal to 2
nx.degree(G, 2)  # Returns degree for node 2, equal to 1
nx.degree(G, 3)  # Returns degree for node 3, equal to 1
dict(nx.degree(G))  # Returns a dict of node degrees: {1: 2, 2: 1, 3: 1}
import networkx as nx

G = nx.Graph()
G.add_edge('a', 'b')  # Creates nodes a and b, and links them by an edge
max_deg = max(d for _, d in G.degree())  # Largest degree in the network
deg_dist = [0] * (max_deg + 1)  # Initializes a list to contain the degree distribution
for node in G:
    deg_dist[G.degree(node)] += 1  # deg_dist[k] counts the nodes with degree k
Table 1
Degree distribution and cumulative degree distribution for the network in Fig. 1
k 0 1 2 3 4 5 6 7
P(k) 0.000 0.125 0.125 0.375 0.125 0.250 0.000 0.000
CumP(k) 1.000 1.000 0.875 0.750 0.375 0.250 0.000 0.000
The maximal degree of a single node is N − 1 when there are N nodes.
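Table 1 can be reproduced from the node degrees ki listed for the same Fig. 1 network in Table 2. The function below is a minimal sketch (degree_distribution is a hypothetical helper, not a NetworkX function): P(k) is the fraction of nodes with degree exactly k, and CumP(k) is the fraction with degree at least k. For a NetworkX graph G, the input list could be obtained as [d for _, d in G.degree()].

```python
from collections import Counter


def degree_distribution(degrees, k_max=None):
    """Return lists P and CumP indexed by k = 0..k_max.

    P[k]    = fraction of nodes with degree exactly k
    CumP[k] = fraction of nodes with degree >= k
    """
    n = len(degrees)
    k_max = max(degrees) if k_max is None else max(k_max, max(degrees))
    counts = Counter(degrees)
    P = [counts.get(k, 0) / n for k in range(k_max + 1)]
    CumP = [sum(P[k:]) for k in range(k_max + 1)]  # tail sum from k upwards
    return P, CumP


degrees = [3, 2, 1, 5, 5, 4, 3, 3]  # ki of nodes 1-8, from Table 2
P, CumP = degree_distribution(degrees, k_max=7)
print([round(p, 3) for p in P])
print([round(c, 3) for c in CumP])
```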
Complex Network Analysis in Microbial Systems: Theory and Examples 173
Table 2
Clustering coefficient, closeness centrality, and betweenness centrality of the example network in
Fig. 1
Node number, i 1 2 3 4 5 6 7 8
ki 3 2 1 5 5 4 3 3
Ci 0.667 1.000 0.000 0.300 0.500 0.667 1.000 1.000
Closenessi 0.714 0.595 0.524 0.857 0.857 0.786 0.667 0.667
Betweennessi 1.167 1.000 1.000 1.233 1.767 1.367 1.000 1.000
Fig. 2 Protein-protein interactions in the species Mus musculus from BioGrid-2.0.61 [20]. This
network contains a total of 1407 nodes and 1579 links and consists of 176 separate connected subnetworks,
of which the largest has 766 nodes
Fig. 3 Plot of the degree distribution for the protein-protein interaction network of the species Mus musculus
averaging the clustering coefficient over all the nodes in the net-
work, we obtain the global measure called the average clustering
coefficient [19]:
$$\langle C \rangle = \sum_{i} C_i / N \qquad (6)$$
import networkx as nx

G = nx.read_edgelist("example_network.txt")
cluster_coef_a = nx.clustering(G, 'a')  # Clustering coefficient of node a, equal to 0.1666...
cluster_coef = nx.clustering(G)  # {'a': 0.1666..., 'b': 1.0, 'c': 1.0, 'd': 0.0, 'e': 0.0}
avg_clustering = nx.average_clustering(G)  # 0.4333...
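The contents of example_network.txt are not given in this excerpt. The edge list below is an assumption: it is one five-node network (a–b, a–c, a–d, a–e, b–c) that reproduces exactly the values quoted in the comments above, and the sketch writes it to example_network.txt so that the examples can be run. It also computes Ci from the standard definition Ci = 2ei/(ki(ki − 1)), where ei is the number of links among the ki neighbors of node i, and averages the values as in Eq. 6, so the results can be compared with nx.clustering and nx.average_clustering. The helper clustering_from_definition is hypothetical.

```python
import networkx as nx
from itertools import combinations

# Assumed edge list; reproduces the clustering values quoted for example_network.txt
G = nx.Graph([("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("b", "c")])
nx.write_edgelist(G, "example_network.txt", data=False)  # lets the examples above run


def clustering_from_definition(G, i):
    """C_i = 2 e_i / (k_i (k_i - 1)); 0 for nodes with fewer than two neighbors."""
    neighbors = list(G.neighbors(i))
    k = len(neighbors)
    if k < 2:
        return 0.0  # same convention as nx.clustering
    e = sum(1 for u, v in combinations(neighbors, 2) if G.has_edge(u, v))
    return 2 * e / (k * (k - 1))


manual = {i: clustering_from_definition(G, i) for i in G}
avg = sum(manual.values()) / G.number_of_nodes()  # Eq. 6
nx_values = nx.clustering(G)
print(manual)
print(avg, nx.average_clustering(G))
```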
3.1 Protein–Protein Interaction (PPI) Networks
Protein–protein interactions (PPIs) are of great importance for a multitude of processes in a cell. Understanding the details of PPIs is relevant for understanding diseases and for identifying new therapeutic approaches. In PPI networks, nodes correspond to proteins and (undirected) links represent an interaction between a pair of proteins. An example of a PPI network is shown in Fig. 2, and the network's degree distribution is plotted in Fig. 3.
When studying the depiction of the network (Fig. 2), one
feature is particularly striking: It is not possible to start from one
node and reach all the other nodes by only hopping along the links.
The network is broken into many components [18]. Note that a
component consists of all the nodes that can be reached by only
following the links. The network in Fig. 2 contains one giant
component, containing 54.4% of all the nodes. The remaining
175 components contain significantly fewer nodes. The component
size distribution, CSD(n), is the chance that a randomly selected component contains n nodes and is defined as
$$\mathrm{CSD}(n) = \frac{\text{number of components with size } n}{\text{total number of components}} \tag{7}$$
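Eq. 7 needs only the list of component sizes. A minimal sketch follows. The function name is hypothetical and the sizes are made up; with NetworkX, the sizes could come from `[len(c) for c in nx.connected_components(G)]`, as in Example 5.

```python
from collections import Counter


def component_size_distribution(component_sizes):
    """Eq. 7: fraction of all components that contain exactly n nodes."""
    total = len(component_sizes)
    counts = Counter(component_sizes)
    return {n: counts[n] / total for n in sorted(counts)}


# Hypothetical component sizes for a small network with four components
sizes = [5, 3, 3, 1]
csd = component_size_distribution(sizes)
print(csd)  # {1: 0.25, 3: 0.5, 5: 0.25}
```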
We identify both the total number of components and their size
by using a network “burning” algorithm (see Example 5). This is
based on the simple idea of recursively following links to visit
(“burn”) nodes, until no new nodes are discoverable. Then all the
nodes in the current component have been detected.
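The burning idea can be written out in plain Python. The sketch below is an illustration, not the chapter's script. It grows each "fire" with an explicit queue instead of recursion. The result is the same, but large components cannot hit Python's recursion limit. The links of the example network (including the f–g–h triangle added in Example 5) are assumed, and the function name is hypothetical.

```python
from collections import deque


def burn_components(adjacency):
    """Return the connected components of an undirected network as a list of node sets.

    adjacency maps every node to the set of its neighbours.
    """
    burned = set()
    components = []
    for seed in adjacency:
        if seed in burned:
            continue
        # Ignite a new component at a node that has not burned yet
        burned.add(seed)
        component = {seed}
        front = deque([seed])
        while front:  # stop when no new node can be reached
            node = front.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in burned:
                    burned.add(neighbour)
                    component.add(neighbour)
                    front.append(neighbour)
        components.append(component)
    return components


# Assumed links: the example network plus the f-g-h triangle added in Example 5
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("b", "c"),
         ("f", "g"), ("f", "h"), ("g", "h")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

components = burn_components(adjacency)
print([len(c) for c in components])  # [5, 3]
```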
176 André Voigt and Eivind Almaas
import networkx as nx
G = nx.read_edgelist("example_network.txt")
G.add_edges_from([('f', 'g'), ('f', 'h'), ('g', 'h')])
components = list(nx.connected_components(G))  # list of node sets: [{'a', 'b', 'c', 'd', 'e'}, {'f', 'g', 'h'}]
comp_size = [len(i) for i in components]  # [5, 3]
node pairs in the network (there are N(N − 1)/2 different node pairs when N is the number of nodes), for all the nodes we keep track of how many of the shortest paths go through them. We describe a script for finding the betweenness centrality of each node in Example 6 [21]. The betweenness centrality b_k measures the importance of node k by the number of shortest paths between other nodes that pass through node k. To calculate the betweenness for all paths, we start from one node and give every node k a score for the shortest paths originating there that pass through it. This score is added to b_k, and the entire calculation is repeated with each of the N nodes as the starting point. The final b_k score is the betweenness of node k [21]. We normalize the b_k score of each node by dividing by 2(size of the component including k) − 1, the smallest possible number of shortest paths that may pass through a node: at every node N shortest paths originate (N − 1 to the other nodes plus the trivial path from the node to itself), and at every node N − 1 shortest paths terminate. Note that in a directed network, the shortest path from node A to node B and the shortest path from node B back to node A are often not the same.
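The counting described above can be made concrete with a small brute-force sketch. This is plain Python, not the Example 6 script and not the algorithm of ref. [21]. For every unordered pair (s, t), breadth-first searches give the number of shortest s–t paths that pass through each other node k. The score returned is raw and unnormalized: each pair is counted once and endpoints receive no credit. It therefore does not reproduce the normalized values of Table 2, though it should agree with `nx.betweenness_centrality(G, normalized=False)` for undirected networks. The function names and the example links are illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(adjacency, source):
    """BFS from source: distance to, and number of shortest paths to, every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in dist:  # first time nb is reached
                dist[nb] = dist[node] + 1
                sigma[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:  # node precedes nb on a shortest path
                sigma[nb] += sigma[node]
    return dist, sigma


def raw_betweenness(adjacency):
    """For every node k: sum over unordered pairs (s, t), s, t != k, of the
    fraction of shortest s-t paths passing through k (no normalization)."""
    bfs = {s: shortest_path_counts(adjacency, s) for s in adjacency}
    score = {k: 0.0 for k in adjacency}
    for s, t in combinations(adjacency, 2):
        dist_s, sigma_s = bfs[s]
        if t not in dist_s:
            continue  # s and t lie in different components
        for k in adjacency:
            if k in (s, t) or k not in dist_s:
                continue
            dist_k, sigma_k = bfs[k]
            # k is on a shortest s-t path exactly when d(s,k) + d(k,t) = d(s,t)
            if t in dist_k and dist_s[k] + dist_k[t] == dist_s[t]:
                score[k] += sigma_s[k] * sigma_k[t] / sigma_s[t]
    return score


# Assumed links of the example network
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("b", "c")]
adjacency = {}
for u, v in EDGES:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

score = raw_betweenness(adjacency)
print(score)  # {'a': 5.0, 'b': 0.0, 'c': 0.0, 'd': 0.0, 'e': 0.0}
```

Node a is the only intermediate node on the shortest paths b–d, b–e, c–d, c–e, and d–e, hence its score of 5.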
Example 6: Shortest Path, Closeness Centrality,
and Betweenness Centrality:
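import networkx as nx
G = nx.read_edgelist("example_network.txt")
nx.shortest_path_length(G, 'b', 'd')  # 2
distances_to_b = {}
for node in G:
    distances_to_b[node] = nx.shortest_path_length(G, 'b', node)
distances_to_b  # {'a': 1, 'b': 0, 'c': 1, 'd': 2, 'e': 2}
all_distances = dict(nx.shortest_path_length(G))  # dict() is needed in NetworkX 2.x and later
all_distances['a']['b']  # 1
all_distances['b']['e']  # 2
closeness = nx.closeness_centrality(G)  # per-node closeness; NetworkX's normalization may differ from Table 2
betweenness = nx.betweenness_centrality(G)  # per-node betweenness; normalized differently than in Table 2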
Table 3
Properties of the protein interaction network of Mus musculus compared with four random network
models
not imply that the majority of the nodes necessarily have their degree at this value. On visual inspection, we immediately determine that only nodes 1, 7, and 8 have a degree of 3. Table 1 shows the degree distribution and the cumulative degree distribution for this network. In Table 2, we show the degree, the clustering, the closeness centrality, and the betweenness centrality for each of the nodes. In this table, node 5 has the largest betweenness centrality and nodes 4 and 5 have the largest closeness centrality. This result should not be surprising, given the particular nature of the example network and the definitions of betweenness and closeness centrality.
We have applied the analysis tools just described to the PIN of
M. musculus [13], and we report some of the global network
characteristics in Table 3. This table also contains results for the
four random graph models, which we will discuss in detail in sect. 4,
with parameters chosen for these models to provide a relevant
comparison to the M. musculus PIN. For now, it suffices to observe
that while many of the models are able to capture aspects of the
statistical properties of the M. musculus PIN, none of the presented
models are fully able to explain the global properties of this PIN.
This marks an important issue in the study of complex networks;
Fig. 4 The metabolic network of Yersinia pestis [22]. This network has a total of 841 nodes and 2810 links and
consists of 4 connected components. The largest connected component has 835 nodes
Table 4
Comparing metabolic network in Yersinia pestis (YP) [22] with four random network models
3.2 Metabolic Networks
Metabolism is the set of chemical reactions in a cell that produces the building blocks and supplies the energy necessary for the cell to live and replicate. Here, we will consider cellular metabolism as a network consisting of metabolites (nodes) that are linked by reactions [13], and we will use the network concepts discussed above for its analysis. In order to represent the cellular metabolism as a network, it is necessary to decide what properties a node and a link should represent. The most common choice is to represent each metabolite by a node, with a link between two nodes signifying that the two metabolites are listed together as a substrate–product pair in a chemical reaction [5, 13]. For instance, using this scheme, the two reactions
A + B → C and B → D
are represented as the four nodes A, B, C, and D with the three connections A–C, B–C, and B–D. In this common representation, the directionality of the reactions is often not included, resulting in an undirected network. Other metabolic representations are discussed elsewhere [5]. As an example network for our analysis, we
use the genome-scale metabolic reconstruction of the bacterium
Yersinia pestis (YP) [22], the causative agent of bubonic plague.
Fig. 4 shows a representation of the structure of this metabolic
network, and Fig. 5 its degree distribution. In Table 4, we present
the global statistics for this network. Note that the average degree,
the average clustering, and the average closeness centrality scores
are much larger in the YP metabolic network compared to the
M. musculus PIN. In agreement with the visual representation of
the networks, this suggests that many more nodes in the YP meta-
bolic network are in highly clustered regions of the network.
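The substrate–product scheme described above can be sketched in a few lines of plain Python. The function name and the reaction encoding (lists of substrate names and product names) are assumptions made for illustration. Skipping a metabolite that appears on both sides of a reaction is a design choice of this sketch, not something the text specifies.

```python
from itertools import product


def substrate_product_links(reactions):
    """Undirected substrate-product links: every substrate is linked to every product.

    reactions: list of (substrates, products) pairs of metabolite names.
    """
    links = set()
    for substrates, products in reactions:
        for s, p in product(substrates, products):
            if s != p:  # skip a metabolite that appears on both sides
                links.add(frozenset((s, p)))  # frozenset: A-C and C-A are the same link
    return links


reactions = [(["A", "B"], ["C"]),  # A + B -> C
             (["B"], ["D"])]       # B -> D
links = substrate_product_links(reactions)
print(sorted(tuple(sorted(link)) for link in links))  # [('A', 'C'), ('B', 'C'), ('B', 'D')]
```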
3.3 Gene Co-expression Networks
Gene expression is the process by which the information contained in a gene manifests itself in a functional gene product. This process is critical in linking the emergence of phenotype (physical traits) to
genotype (hereditary information). The significance of this process
becomes readily apparent by looking at the development of com-
plex multicellular organisms from single stem cells. Such an organ-
ism grows through duplication of these stem cells—sequence
mutation in this process being the exception rather than the
norm—and as a result, the genetic information contained in each
cell remains the same. The genetic content of an organism’s cells
being the same, the differentiation of cells into widely different and
highly specialized tissue types is instead the consequence of differ-
ences in how this information is expressed. Studying how certain
genes may be silenced, or activated, depending on functional dif-
ferences in cells or changes in environmental conditions is therefore
key to understanding their role.
The importance of understanding gene expression has led to
the development of the field of gene co-expression networks, by
which we seek to understand gene roles by looking at how their
expression depends on each other. In such a network, nodes repre-
sent specific genes, while a link represents a similarity between the
expression profile of a pair of genes—typically, the correlation of
mRNA or protein expression values.
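A minimal sketch of the linking rule just described follows. It computes the Pearson correlation between every pair of expression profiles and keeps a link when |r| reaches a threshold. The gene names, expression values, and the threshold of 0.8 are hypothetical. Keeping strongly negative correlations as links is also a design choice; some studies keep only positive ones.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_links(profiles, threshold=0.8):
    """Link two genes when |r| between their profiles is at least threshold."""
    links = []
    for gene1, gene2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[gene1], profiles[gene2])
        if abs(r) >= threshold:
            links.append((gene1, gene2, r))
    return links


# Hypothetical mRNA levels of three genes across four conditions
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
for gene1, gene2, r in coexpression_links(profiles):
    print(gene1, gene2, round(r, 3))  # only geneA-geneB passes, with r close to 1
```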
While the base paradigm of systems biology is to provide a big
picture analysis of a given biological system and the interactions
between elements, biological networks often contain smaller
sub-networks that are significantly more interconnected between
each other than they are to the rest of the network. In network
terminology, these sub-networks are commonly referred to as com-
munities (or interchangeably, modules). As these communities often
reflect biologically important relationships, properly identifying
them is of great interest to systems biologists. Note that unlike
components, which, as mentioned earlier, are entirely distinct struc-
tures which do not connect to nodes outside them, different com-
munities in a given network may connect to each other through
intermediary edges. The term community structure denotes the
extent to which a network consists of communities of nodes
which are densely connected internally, but less so toward nodes
in other communities. A common measure to quantify the community structure of a network is the modularity [23],
$$Q = \frac{1}{2m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right)\delta(c_i, c_j) \tag{9}$$
where $A$ is the adjacency matrix, $k_i = \sum_j A_{ij}$ and $k_j = \sum_i A_{ij}$ are the degrees of nodes $i$ and $j$, respectively, $m = \frac{1}{2}\sum_{ij} A_{ij}$ is the number of links, and $\delta(c_i, c_j) = 1$ if $i$ and $j$ belong to the same community ($c_i = c_j$) and 0 otherwise ($c_i \neq c_j$).
As the modularity Q is defined from an existing community decom-
position, Eq. 9 merely tells us the quality of a given decomposition,
import networkx as nx
import community  # python-louvain package, https://github.com/taynaud/python-louvain/
G = nx.read_edgelist('example_network.txt')
G.edges()  # ('a', 'b'), ('a', 'c'), ('a', 'd'), ('a', 'e'), ('b', 'c')
G.add_edges_from([('e', 'f'), ('e', 'g'), ('e', 'h'), ('g', 'h')])
communities = community.best_partition(G)  # e.g., {'a': 0, 'b': 0, 'c': 0, 'd': 0, 'e': 1, 'f': 1, 'g': 1, 'h': 1}
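Because Eq. 9 only scores a given decomposition, it can be evaluated directly for the partition shown in the comment above. The sketch below is a literal double sum over node pairs, written in plain Python. It is not the Louvain implementation, and the function name is hypothetical. Recent NetworkX releases also provide `nx.community.modularity` for this purpose.

```python
def modularity(edges, partition):
    """Modularity Q of Eq. 9 for an undirected, unweighted network.

    edges: list of (u, v) links; partition: dict mapping node -> community label.
    """
    adjacency = {node: set() for node in partition}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}  # k_i
    two_m = sum(degree.values())  # sum_ij A_ij = 2m
    q = 0.0
    for i in partition:
        for j in partition:
            if partition[i] != partition[j]:
                continue  # delta(c_i, c_j) = 0
            a_ij = 1.0 if j in adjacency[i] else 0.0
            q += a_ij - degree[i] * degree[j] / two_m
    return q / two_m


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("b", "c"),
         ("e", "f"), ("e", "g"), ("e", "h"), ("g", "h")]
partition = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1, "f": 1, "g": 1, "h": 1}
print(modularity(edges, partition))  # about 0.389 (exactly 7/18)
```

Putting every node in one community gives Q = 0. Higher values indicate more links inside communities than expected at random.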
3.4 Sources of Datasets
We have downloaded the protein interaction network of Mus musculus from the Biological General Repository for Interaction Datasets (BioGRID) database [20] (http://www.thebiogrid.org). Currently, it has more than 330,000 protein and genetic interactions from major model organism species.
Fig. 7 Average degree distribution for 100 networks generated using the ER model (G(N,M)). N = 10,000 and M = 2,499,750
4.1 Erdös–Rényi (ER) Model
What is often called the ER model comes in two flavors with somewhat different properties. One approach is based on using a fixed number of links to connect randomly chosen nodes (the G(N,M) model [9]), and the other is based on randomly choosing to connect two nodes while going through all the possible node pairings (the G(N,p) model [25]). If the parameters are properly selected, the difference between the resulting network generation methods is small for large networks. By choosing $p = 2M/[N(N-1)]$, a network produced from the G(N,p) approach will on average have the same number of links as that from the G(N,M) approach [18]. Note that the degree distribution from the ER model is approximately a binomial distribution that approaches a Poisson distribution when the number of nodes N is large.
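The two flavors, and the choice of p that matches them, can be compared with a short plain-Python sketch. The function names are hypothetical. Listing all node pairs explicitly is fine for small N, but for networks the size of Fig. 7 (about 50 million pairs), NetworkX's `nx.gnm_random_graph` and `nx.gnp_random_graph` are the more practical route.

```python
import random
from itertools import combinations


def link_probability(n_nodes, n_links):
    """p = 2M / [N(N - 1)]: G(N, p) then has, on average, as many links as G(N, M)."""
    return 2 * n_links / (n_nodes * (n_nodes - 1))


def gnm_links(n_nodes, n_links, rng=random):
    """G(N, M): M distinct node pairs drawn uniformly from all N(N - 1)/2 pairs."""
    return rng.sample(list(combinations(range(n_nodes), 2)), n_links)


def gnp_links(n_nodes, p, rng=random):
    """G(N, p): every node pair is linked independently with probability p."""
    return [pair for pair in combinations(range(n_nodes), 2) if rng.random() < p]


print(link_probability(10_000, 2_499_750))  # 0.05, the setting of Fig. 7

rng = random.Random(42)
n_nodes, n_links = 200, 1000
p = link_probability(n_nodes, n_links)
trials = [len(gnp_links(n_nodes, p, rng)) for _ in range(20)]
print(len(gnm_links(n_nodes, n_links, rng)))  # always exactly 1000
print(sum(trials) / len(trials))              # fluctuates around 1000
```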
4.1.1 G(N,M)
Erdös and Rényi introduced the G(N,M) model [9, 26]. This model chooses, uniformly at random, one graph from the collection of all graphs with N nodes and M links. Starting with a collection of N nodes that contain no links, we add M links in a stepwise process. In each iteration, a pair of nodes among the N nodes that is not yet linked is selected to be connected by a link. This is repeated M times [18]. We have referred to this model as the ER model in
4.1.2 G(N,p)
The G(N,p) model was proposed by Edgar Gilbert in 1959 [25]. In this model, each pair of nodes, of which there are $\binom{N}{2}$ different combinations, is connected with probability p. Note that the decision to connect a particular pair is independent of the outcome of this decision for all other pairs. The resulting graph has N nodes, and the probability of obtaining any particular graph with M links is $p^{M}(1-p)^{\binom{N}{2}-M}$ [6]. This model is very similar to the ER model (and the terms are sometimes used interchangeably, for instance by NetworkX); the difference is that while an ER/G(N,M) network will always contain the specified number of links M, the number of links in a given G(N,p) network varies stochastically around the expected value.
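This statement can be made precise. There are $\binom{\binom{N}{2}}{M}$ graphs with exactly M links, each with probability $p^{M}(1-p)^{\binom{N}{2}-M}$. Summing over them gives a binomial distribution for the number of links, together with its mean:

```latex
P(M) = \binom{\binom{N}{2}}{M}\, p^{M} (1-p)^{\binom{N}{2}-M},
\qquad
\langle M \rangle = p \binom{N}{2} = p\,\frac{N(N-1)}{2}.
```

Setting ⟨M⟩ equal to the fixed M of a G(N,M) network recovers p = 2M/[N(N − 1)].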
import networkx as nx
import random
n = 10  # number of nodes
# Erdos-Renyi G(N, M) model
m = 20  # number of edges (must not exceed n*(n-1)/2)
G = nx.Graph()
G.add_nodes_from(range(n))
while G.number_of_edges() < m:  # a pair drawn twice is not counted twice
    node1 = random.randint(0, n-1)
    node2 = random.randint(0, n-1)
    while node2 == node1:
        node2 = random.randint(0, n-1)  # in case of self-loops, draw new node2
    G.add_edge(node1, node2)
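Fig. 9 Visual comparison of the three network models: (a) Erdös–Rényi (ER), (b) Barabási–Albert (BA), and (c) configuration (conf) using the degree distribution from Fig. 3. Individual node degrees are indicated with color code: black: degree 0–8, gray: 9–16, white: >16
# G(N, p) model
p = 2*m/(n*(n-1))  # on average the same number of edges as the G(N, M) example above
G = nx.Graph()
G.add_nodes_from(range(n))
for node1 in range(n-1):
    for node2 in range(node1+1, n):
        if random.random() < p:  # each pair is linked independently with probability p
            G.add_edge(node1, node2)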
4.2 Barabási–Albert (BA) Model
This model was introduced by Barabási and Albert [10] with the aim of generating random networks with a power-law degree distribution. This was motivated by the observation that many networks are characterized by a long-tailed, or scale-free, degree distribution [6, 10], in clear contrast to the properties of the canonical ER model. The BA model grows the network by preferential attachment [10]: the probability that a new node connects to an existing node i is proportional to the degree k_i of that node. Thus, nodes with many neighbors have an increased chance of getting new neighbors. We state this probability mathematically as
$$\Pi_i = \frac{k_i}{\sum_{j=1}^{N} k_j} \tag{10}$$
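import networkx as nx
import random
G = nx.Graph()
n0 = 10  # nodes in the initial, fully connected network
n = 100  # final number of nodes
k = 3  # links added with each new node
for i in range(n0):
    G.add_node(i)
for node1 in range(n0-1):  # connect every pair of the n0 initial nodes
    for node2 in range(node1+1, n0):
        G.add_edge(node1, node2)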
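The listing above builds only the fully connected seed of n0 nodes; the growth step breaks off in this copy. The plain-Python sketch below shows one way to grow the network with Eq. 10, using the same n0, n, and k. It is not the authors' code, and the function name is hypothetical. Redrawing a target that was already chosen keeps the k new links distinct. This is a common practical choice, and it slightly departs from applying Eq. 10 independently to each link.

```python
import random


def preferential_attachment(n0, n, k, rng=random):
    """Grow a network following Eq. 10 and return it as an adjacency dict.

    Starts from a complete graph on n0 nodes (requires k <= n0); every new node
    links to k distinct existing nodes chosen with probability proportional to degree.
    """
    adjacency = {i: set(range(n0)) - {i} for i in range(n0)}
    for new_node in range(n0, n):
        nodes = list(adjacency)
        weights = [len(adjacency[v]) for v in nodes]  # k_i in Eq. 10
        targets = set()
        while len(targets) < k:  # redraw duplicates so the k targets are distinct
            targets.add(rng.choices(nodes, weights=weights)[0])
        adjacency[new_node] = set()
        for target in targets:
            adjacency[new_node].add(target)
            adjacency[target].add(new_node)
    return adjacency


net = preferential_attachment(n0=10, n=100, k=3, rng=random.Random(1))
n_links = sum(len(nbrs) for nbrs in net.values()) // 2
print(len(net), n_links)  # 100 315
```

The link count is fixed by construction: 45 links in the seed plus 3 for each of the 90 added nodes.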
import networkx as nx
import random
G = nx.Graph()
n = 100
deg_dist_source_net = nx.barabasi_albert_graph(n, 3)
# Create valid degree sequence (taken from a BA network, so its sum is even)
degree_sequence = [d for _, d in deg_dist_source_net.degree()]
random.shuffle(degree_sequence)  # Not strictly necessary, simply randomizes node names
stubs = []  # node i appears once for every unit of its degree
for i in range(len(degree_sequence)):
    for j in range(degree_sequence[i]):
        stubs.append(i)
for i in range(len(degree_sequence)):
    G.add_node(i)
nb_of_edges = len(stubs)//2
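The listing above stops after counting the edges, before the stubs are paired. A minimal plain-Python sketch of the pairing step follows; the function name and the degree sequence are hypothetical. Shuffling the stub list and joining consecutive stubs pairs them uniformly at random. Self-loops and repeated links can occur, so the result is a multigraph. If the links are stored in a simple `nx.Graph`, repeated links collapse and some degrees can come out lower. NetworkX's own `nx.configuration_model` returns a MultiGraph for this reason.

```python
import random


def stub_matching(degree_sequence, rng=random):
    """Configuration model: shuffle the stubs and join consecutive pairs into links.

    Returns a list of links; self-loops and repeated links can occur (a multigraph).
    """
    stubs = [node for node, degree in enumerate(degree_sequence) for _ in range(degree)]
    if len(stubs) % 2:
        raise ValueError("the degree sequence must have an even sum")
    rng.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]


degree_sequence = [3, 2, 2, 1, 1, 1]  # hypothetical sequence with an even sum (10)
links = stub_matching(degree_sequence, random.Random(7))
print(len(links))  # 5
```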
References
1. Kitano H (2002) Systems biology: a brief overview. Science 295:1662–1664
2. Ideker T, Galitski T, Hood L (2001) A new approach to decoding life: systems biology. Annu Rev Genom Hum G 2:343–372
3. Bruggeman FJ, Westerhoff HV (2007) The nature of systems biology. Trends Microbiol 15:45–50
4. Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5:101–113
5. Almaas E (2007) Biological impacts and context of network theory. J Exp Biol 210:1548–1558
6. Albert R, Barabasi AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74:47–97
7. Dorogovtsev SN, Mendes JFF (2002) Evolution of networks. Adv Phys 51:1079–1187
8. Newman MEJ (2003) The structure and function of complex networks. Siam Rev 45:167–256
9. Erdős P, Rényi A (1959) On random graphs. I. Publ Math 6:290–297
10. Barabasi AL, Albert R (1999) Emergence of scaling in random networks. Science 286:509–512
11. Molloy M, Reed B (1995) A critical-point for random graphs with a given degree sequence. Random Struct Algor 6:161–179
12. Newman MEJ, Strogatz SH, Watts DJ (2001) Random graphs with arbitrary degree distributions and their applications. Phys Rev E 2001:6402
13. Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL (2000) The large-scale organization of metabolic networks. Nature 407:651–654
14. Hagberg AA, Schult DA, Swart PJ (2008) Exploring network structure, dynamics, and function using NetworkX. In: Varoquaux G, Vaught T, Millman J (eds) Proceedings of the 7th Python in science conference (SciPy2008). SciPy, Pasadena, CA, pp 11–15
15. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13:2498–2504
16. Batagelj V, Mrvar A (2002) Pajek – analysis and visualization of large networks. Graph Drawing 2265:477–478
17. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre R (2008) Fast unfolding of communities in large networks. J Stat Mech Theor Exp 10:P10008
18. Newman M (2010) Networks: an introduction. Oxford University Press, New York, NY
19. Watts DJ, Strogatz SH (1998) Collective dynamics of 'small-world' networks. Nature 393:440–442
20. Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M (2006) BioGRID: a general repository for interaction datasets. Nucleic Acids Res 34:D535–D539
21. Newman MEJ (2001) Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys Rev E 2001:6401
22. Navid A, Almaas E (2009) Genome-scale reconstruction of the metabolic network in Yersinia pestis, strain 91001. Mol Biosyst 5:368–375
23. Newman MEJ (2006) Modularity and community structure in networks. Proc Natl Acad Sci U S A 103(23):8577–8582
24. Nowick K, Gernat T, Almaas E, Stubbs L (2009) Differences in human and chimpanzee gene expression patterns define an evolving network of transcription factors in brain. Proc Natl Acad Sci 106(52):22358–22363
25. Gilbert EN (1959) Random graphs. Ann Math Stat 30:1141–1144
26. Erdos P, Renyi A (1960) On the evolution of random graphs. B Int Stat Inst 38:343–347
27. Krapivsky PL, Redner S, Leyvraz F (2000) Connectivity of growing random networks. Phys Rev Lett 85:4629–4632
28. Dorogovtsev SN, Mendes JFF, Samukhin AN (2000) Structure of growing networks with preferential linking. Phys Rev Lett 85:4633–4636
29. Albert R, Barabasi AL (2000) Topology of evolving networks: local events and universality. Phys Rev Lett 85:5234–5237
30. Amaral LAN, Scala A, Barthelemy M, Stanley HE (2000) Classes of small-world networks. Proc Natl Acad Sci U S A 97:11149–11152
31. Dorogovtsev SN, Mendes JFF (2000) Evolution of networks with aging of sites. Phys Rev E 62:1842–1845
Chapter 10
Abstract
In the last decade, the high-throughput and relatively low cost of short-read sequencing technologies have
revolutionized prokaryotic genomics. This has led to an exponential increase in the number of bacterial and
archaeal genome sequences available, as well as a corresponding increase in the number of genome assembly and annotation
tools. Together, these hardware and software technologies have given scientists unprecedented
options to study their chosen microbial systems without the need for large teams of bioinformaticists or
supercomputing facilities. While these analysis tools largely fall into only a few categories, each may have
different requirements, caveats and file formats, and some may be rarely updated or even abandoned. And
so, despite the apparent ease in sequencing and analyzing a prokaryotic genome, it is no wonder that the
budding genomicist may quickly feel overwhelmed. Here, we aim to provide the reader with an
overview of genome annotation and its most important considerations, as well as an easy-to-follow protocol
to get started with annotating a prokaryotic genome.
Key words Genome annotation, Prokaryote sequencing, Gene prediction, Structural annotation,
Functional annotation
1 Introduction
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349,
https://doi.org/10.1007/978-1-0716-1585-0_10, © Springer Science+Business Media, LLC, part of Springer Nature 2022
194 Jeffrey A. Kimbrel et al.
2 Materials
Table 1
Genome quality and taxonomy assessment software
2.2 Computational Requirements
The software detailed in this protocol utilizes web servers and thus can be run on common hardware such as a laptop. That said, many annotation programs and databases can also be run at the command line, often expanding the options and capabilities available.
Prokaryotic Genome Annotation 197
Table 2
Genome visualization software
3 Methods
3.1 All-in-One Tools
Automated, or “all-in-one,” annotation tools combine both structural and functional annotation in one convenient step, producing a
high-quality draft genome. There are several automated annotation
tools available either as online web servers or as downloadable,
command-line-driven software (Table 3). Some tools, such as the
Prokaryotic Genome Annotation Pipeline (PGAP) and Integrated
Microbial Genomes/Genomes OnLine Database (IMG/GOLD),
Table 3
Automated genome annotation software
Fig. 1 Rapid annotation using subsystem technology (RAST) genome upload interface. (a) Under “choose file,”
select a FASTA file of DNA contig sequences. (b) Complete the genome information by submitting the NCBI
taxonomy identifier, as well as the genus, species, and strain
3.2 Gene Prediction (Structural)
The first step in the typical annotation process (including RAST above) is identifying the structural components of a genome. These are features with defined genomic coordinates such as genes for protein coding sequences (CDSs) and transfer and ribosomal RNA (tRNA and rRNA, respectively). Several popular and highly cited structural annotation tools are available depending on the type of structural features identified (Table 4). Prodigal, Glimmer, and the GeneMark suite all detect open reading frames (ORFs) corresponding to a protein CDS (detailed in Subheading 3.2.1). Other structural annotation tools instead discover so-called non-coding RNA features including tRNA and rRNA (detailed in Subheading 3.2.2). As structural annotation is often included in automated pipelines, in lieu of a protocol, we give a review of the methods and challenges these annotation tools encounter.
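To make the notion of an ORF concrete, here is a deliberately naive Python sketch. It is not how Prodigal, Glimmer, or GeneMark work. Those tools score candidate genes with trained statistical models, consider alternative start codons such as GTG and TTG, and search both strands. This sketch scans only the three forward reading frames for an ATG followed in frame by a stop codon. The function name and the default minimum length of 30 codons are arbitrary, illustrative choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def forward_orfs(sequence, min_codons=30):
    """Naive ORF scan: ATG ... stop in the three forward reading frames.

    Returns (start, end, frame): 0-based start, end exclusive, stop codon included.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon after the previous stop
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


print(forward_orfs("CCATGAAATTTTAAGG", min_codons=2))  # [(2, 14, 2)]
```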
3.2.1 Protein Coding Sequences
Prokaryotic protein coding sequences (CDS) typically follow a common scheme which gene-calling algorithms utilize. The structure of a gene is tied to aspects of the central dogma, that is, DNA is transcribed into messenger RNA (mRNA), which is translated into amino acids/proteins. The molecular machinery catalyzing the central dogma relies on landmarks that are encoded in the genome.
Computational pipelines for structural annotation similarly work in
part by identifying these genomic landmarks, followed by further
refinement through algorithmic or heuristic approaches. Genomic
landmarks involved in transcription include promoter regions, tran-
scription factor binding sites, transcription termination sites, and
genes transcribed together in polycistronic operons. Although
transcription is the first step in producing a protein, gene calling
algorithms typically do not utilize these transcriptional landmarks
Fig. 2 Online results from the RAST automated annotation of Marinobacter sp. PT19DW. (a) Subsystem
coverage shows the ratio of coding sequences with a predicted functional annotation, and the ratio of those
annotations are displayed in the accompanying pie chart. (b) KEGG metabolic pathway detail of the TCA cycle.
Boxes represent functional categories that were identified (filled) or not identified (unfilled) in the Marinobacter
sp. PT19DW genome
Table 4
Structural annotation software
3.2.2 Non-coding RNA
Protein-coding sequences are not the only genomic feature that can be identified in the structural annotation process. There are many genes that are transcribed into RNA products but do not undergo translation into an amino acid sequence; instead, the RNA transcripts themselves are functionally active. These non-coding RNAs (ncRNAs) range from the well-conserved rRNAs and tRNAs to the more variable and less well-studied small regulatory RNAs
(sRNAs) and clustered regularly interspaced short palindromic
repeats (CRISPR) transcripts. Even sequences in the untranslated
Table 5
Structural annotation refinement software
3.3 Functional Prediction (General)
Although many automated tools can produce high-quality genome annotations, combining multiple functional annotation tools can significantly increase the completeness of the annotation [2]. Therefore, in this protocol, we will supplement the RAST annotation with another
“general” functional annotation tool. The goal of general func-
tional annotation is to classify well-conserved genes, or those
genes in well-studied classes, such as glycolysis, amino acid and
nucleotide metabolism, and nutrient cycling. One large collection
of these genes is the Kyoto Encyclopedia of Genes and Genomes
(KEGG [51]). The KEGG database is arranged into an ontology of
proteins, the reactions they can perform, and the pathways that
these reactions fit into. The KEGG Automated Annotation Server
(KAAS [52]) is a web service to annotate protein coding sequences.
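KAAS results can be downloaded as a text file of KEGG orthology (KO) assignments (Fig. 3). The small Python sketch below assumes a particular layout, which should be checked against your own download. Each line is taken to hold a query identifier, optionally followed by a tab and a KO identifier. Under that assumption, the sketch reports how many proteins received a KO and which KOs are missing from a second annotation. The query IDs, the KO lists, and the function name are hypothetical.

```python
from collections import Counter


def parse_ko_assignments(text):
    """Map each query to its KO identifier (None when no KO was assigned).

    Assumes one query per line, optionally followed by a tab and a KO such as K01647.
    """
    assignments = {}
    for line in text.splitlines():
        fields = line.strip().split("\t")
        if not fields[0] or fields[0].startswith("#"):
            continue  # skip blank and comment lines
        assignments[fields[0]] = fields[1] if len(fields) > 1 and fields[1] else None
    return assignments


# Hypothetical excerpt of a KAAS text download
kaas_text = "peg.1\tK01647\npeg.2\npeg.3\tK01681\npeg.4\tK01647\n"
kaas = parse_ko_assignments(kaas_text)
assigned = {q: ko for q, ko in kaas.items() if ko}
print(f"{len(assigned)}/{len(kaas)} proteins received a KO")  # 3/4 proteins received a KO
print(Counter(assigned.values()).most_common(1))  # [('K01647', 2)]

# KOs found by KAAS but missing from a second (hypothetical) annotation
other_annotation = {"K01647", "K00031"}
print(sorted(set(assigned.values()) - other_annotation))  # ['K01681']
```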
Fig. 3 Kyoto Encyclopedia of Genes and Genomes Automated Annotation Server (KEGG/KAAS) web server for
general functional annotation. (a) Upload the FASTA protein file obtained from the RAST annotation and select
“for Prokaryotes” for the representative set. Submit for annotation, replying to the confirmation e-mails. (b)
After receiving an e-mail informing that your KAAS run is complete, note the run information (by clicking the
Job ID link), and download the text file format with your KEGG orthology annotations
Fig. 4 KAAS annotation analysis in the online pathway browser for the TCA cycle. As in Fig. 2, boxes represent
functional categories that were identified (filled) or not identified (unfilled). Differences between the RAST and
KAAS annotations are circled
Table 6
List of some specific functional annotation software
3.4.2 antiSMASH
Some annotation tools require different input files, as they may use
other genomic information beyond the primary amino acid
sequence. The antibiotics and secondary metabolites (antiSMASH)
server searches for gene clusters involved in the production of
Fig. 5 Database for automated carbohydrate-active enzyme annotation (dbCAN)-specific functional annotation
for carbohydrate-active enzymes. (a) Tabular output giving the gene name and the CAZy functional class, with
confidence metrics including E-value and covered fraction. This data can be downloaded using the “Down-
load” links. (b) Graphical representation of the domain architecture
Fig. 6 Antibiotics and secondary metabolites (antiSMASH) server for specific functional annotation. (a) Upload the GenBank file obtained from the RAST annotation. The GenBank file preserves gene synteny, allowing antiSMASH to more accurately identify gene clusters. (b) Results are shown in the web browser, and each cluster can be observed in more detail by clicking the cluster name. (c) Further information on Cluster 2 for a type I polyketide synthase/non-ribosomal peptide synthetase cluster, and a predicted structure of the produced compound
6. After a few minutes, you will receive an e-mail with a link to the
results. These results will remain online for 1 month. On the
top right corner of the results page, a downward facing arrow
gives options for downloading the results locally, including in
Excel or GenBank format. “Download all results” will download all of the data, including the HTML output.
7. antiSMASH identified three clusters in the Marinobacter
genome, including ectoine production, a type I polyketide
synthase/non-ribosomal peptide synthetase cluster, and for a
4 Notes
Acknowledgments
References
1. Sorokina M, Stam M, Médigue C et al (2014) Profiling the orphan enzymes. Biol Direct 9:10
2. Griesemer M, Kimbrel JA, Zhou CE et al (2018) Combining multiple functional annotation tools increases coverage of metabolic annotation. BMC Genomics 19:948
3. Baric RS, Crosson S, Damania B et al (2016) Next-generation high-throughput functional annotation of microbial genomes. MBio 7:e01245-16
Abstract
With the nexus of supercomputing and the biotech revolution, it seems an era of predictive biology
through systems biology may be at hand. Modern omics capabilities enable examination of the state of
biological systems in exquisite detail. The genome, transcriptome, proteome, and metabolome may all be
largely knowable, at least for some model systems, providing a basis for modeling and simulation of
molecular mechanisms, or pathways, that could capture a biological system’s emergent properties. How-
ever, there are significant challenges remaining that impede the realization of this vision, perhaps the most
significant being the missing functional annotation of genes and gene products. For even the most well-studied organisms, as much as a third of called genes for a given genome are not annotated, and the annotations of more than half may be tenuous. Homology inferred from sequence similarity is the basis for much of genome
annotation. Homology inferred from structural similarity could be a powerful complement to sequence-
based annotation methods. Structural biology or structural informatics can be used to assign molecular
function and may have increasing utility with the rapid growth of gene sequence databases and emerging
methods for structure determination, like structure prediction based on coevolution. Here we describe
tools and provide example cases using structural similarity at the level of quaternary structure, domain
content, domain topology, and small 3D motifs to infer homology and posit function. Ultimately, annotation by similarity, be it 3D structure homology or more classically primary sequence homology, must be founded on accurate annotation of one ortholog in the group; understanding every function encoded by a
genome remains a major challenge to life science.
1 Introduction
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349,
https://doi.org/10.1007/978-1-0716-1585-0_11, © Springer Science+Business Media, LLC, part of Springer Nature 2022
216 Brent W. Segelke
cells; even for each of many cells within a complex tissue [7]. It is
intriguing to think we may soon be able to build system models
informed by this wealth of data and simulate complex cellular and
multicellular behavior of biological systems based on their complex
orchestration of molecular mechanisms. Unfortunately, for the
foreseeable future, attempts at system modeling at the molecular
level for cellular systems will be plagued with the annotation
problem [8].
To build detailed models of cellular systems, a comprehensive
accounting is needed for the molecules present, which may be
within reach, but a comprehensive accounting of molecular func-
tion is also needed. This means that a complete and accurate
annotation of genomes is essential. Unfortunately, genome anno-
tation, even for the most well-studied species, is far from complete
[9]. In fact, the annotation problem is a serious impediment in
many areas of the life sciences. Nearly all proteomic, transcriptomic,
or metagenomic surveys return lists of implicated genes for which a
significant portion have unknown function. Even the genome of
the minimal organism, engineered to have only genes essential for
replication, is missing functional annotation for ~30% of the genes
it encodes [10]. It is estimated that 40% of newly sequenced genes
from genome centers and metagenomics project have no known
function [9]. Furthermore, the rate of discovery of new genes, even
new gene sequence families, outstrips the rate of functional assign-
ment for genes by orders of magnitude [8]. To make matters worse,
existing annotations may be vague, incomplete, or even wrong.
There are protein families that have been studied for decades for
which the function of family members are still in dispute [11].
By necessity, since the rate of production of gene sequence data outstrips the rate of new sequence family functional assignments, the overwhelming majority of genes are currently annotated automatically, with annotations inferred from sequence homology [12, 13]. That is, newly sequenced genes (and their encoded products) are generally assigned function based on homology to genes with existing annotation. Homology is inferred from sequence similarity, so a gene lends its annotation to a newly discovered gene, which may in turn lend its annotation to the next newly sequenced gene with similarity, and so on. As a result, a gene annotation could derive from a series of functional inferences such that the newly annotated gene is not identifiably similar by sequence to the originally annotated gene.
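The drift described above can be illustrated with a toy calculation. The sketch below assumes ungapped sequences of equal length and a deliberately artificial mutation scheme (each hypothetical annotation-transfer step rewrites a different block of 30 positions); real sequence evolution also involves insertions, deletions, and repeated substitutions at the same site.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def identity(a, b):
    """Fraction of identical positions between two equal-length, ungapped sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def mutate_block(seq, start, stop):
    """Return a copy of seq with every residue in seq[start:stop] replaced by the
    next letter in AMINO_ACIDS, so each position in the block is guaranteed to change."""
    out = list(seq)
    for i in range(start, stop):
        out[i] = AMINO_ACIDS[(AMINO_ACIDS.index(out[i]) + 1) % len(AMINO_ACIDS)]
    return "".join(out)

# Hypothetical 100-residue "originally annotated" sequence.
original = "".join(AMINO_ACIDS[i % len(AMINO_ACIDS)] for i in range(100))

# Three successive annotation transfers, each to a sequence differing in a new block.
chain = [original]
for step in range(3):
    chain.append(mutate_block(chain[-1], 30 * step, 30 * step + 30))

for prev, curr in zip(chain, chain[1:]):
    print(f"neighbor identity: {identity(prev, curr):.2f}")
print(f"original vs. last: {identity(chain[0], chain[-1]):.2f}")
```

Each neighboring pair in this toy chain shares 70% identity, enough for a confident annotation transfer at every step, yet the first and last sequences share only 10%.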
Gene annotation by structural similarity could be a powerful complement to current annotation methods. Similarity can be deduced by structural comparison independent of sequence similarity. Furthermore, entries in the structural database are much more likely to have a direct link to primary literature, which gives greater certainty to the annotation. Identifying homology through structure could have an outsized impact on the annotation problem
Functional Annotation from Structural Homology 217
1.1 Inferring Homology from Structural Similarity
Structure has long been used for taxonomic classification and for inferring the relatedness of species. The gross anatomy of facial structure, for example, is indicative of the relatedness of evolutionarily connected species (Fig. 1). Just by appearance, we can intuit the similarity of primate faces: mirror symmetrical across the median plane of the face; two eyes above a nose with two nostrils; two ears, one on each side of the head; and a mouth below the nose (Fig. 1a). Knowing the function of our own eyes, nose, mouth, and ears allows us to reasonably infer the function of these anatomical structures in related species. Looking beyond the immediately apparent outward appearance, an interrelatedness of more distant species can be discovered. A close look at the skeletal anatomy of forelimbs reveals that nature has reused the same basic structure for various but related functions (Fig. 1b). From humans, to reptiles, to birds, the skeletal anatomy of the forelimb is strikingly similar: one large bone connecting the limb to the torso, connected in turn to two smaller bones, which are connected to a matrix of even smaller bones (carpal bones), and then to a fan of digits. Structural analysis makes it clear that there is more relatedness between distant species than first meets the eye.
Just as gross anatomical structure homology results from evo-
lutionary links between species, biomacromolecular structure
homology evolves from a common origin of molecules within
molecular families and can reveal homologous macromolecular
function. Macromolecular structure homology can reveal evolutionary links over much greater evolutionary distances. From the domain architecture of single-domain phospholipase A2 (PLA2), for example, a close relationship is apparent between vertebrates (human (pdb 6g5j, [14]), cow (pdb 1mkt, [15]), and cobra (pdb 1a3f, [16]), Fig. 1c). A close relationship between fungal (pdb 4aup, [18]) and bacterial (pdb 1lwb, [19]) PLA2 molecules is also readily apparent. There is some apparent relationship between insect and vertebrate PLA2, but no obvious relationship between vertebrate, insect, fungal, and bacterial PLA2 at the domain architecture level. Structure-based alignment, however, reveals a common subdomain.
Upon close inspection, facilitated by structure-based superpo-
sition, a core subdomain common to all of the extracellular
Fig. 1 (continued) from view and not labeled. The similarity of PLA2 domain
architecture is readily apparent between vertebrates, as is the similarity between
fungus and bacteria. Similarity between insect and vertebrate, or between insect
and fungus or bacteria, is not obvious. (d) All six PLA2 molecules are shown
superimposed by the common core anti-parallel helix subdomain. The core
subdomain is highlighted in gray; the remaining elements are colored white
and shown as semi-transparent (left panel). All of these PLA2 enzymes have an
antiparallel helix domain, with the upper helix in this depiction going from
N-terminal to C-terminal left to right and have at least three helical turns in
common among all of the molecules shown. Near the center of the antiparallel
helix domain, there is a nearly absolutely conserved motif, with a histidine-helical turn-tyrosine on the lower helix and, on the upper helix, an aspartic acid hydrogen bonded to both the histidine and the tyrosine (center panel). A simple
schematic can be derived that codifies the rules making up the hallmark for this
collection of PLA2 molecules and their structural family members (right panel).
These depictions of PLA2 molecules were generated with Chimera [20]
histidine terminates into a long loop region just after the conserved
aspartate and therefore does not have the highly conserved tyrosine. Instead, there is a compensatory mutation near the C-terminus of the third helix that provides the tyrosine that hydrogen bonds with the core aspartate residue. It is known from molecular biology and enzymology studies that these residues form the core motif involved in catalysis [21]. The conserved tyrosine and aspartate stabilize the histidine, which in turn deprotonates the water that hydrolyzes the ester bond linking the fatty acid at the sn-2 position to the glycerol backbone of the phospholipid [21].
From the observed structural homology, a simple hallmark motif can be derived: an HDXXY sequence on the lower helix and a D within hydrogen bonding distance on the upper helix. Notably, the PLA2 structure from bee venom does not exactly match the hallmark motif. The tyrosine residue in the HDXXY sequence is missing, but there has been a compensatory change such that a different tyrosine hydrogen bonds to the central motif aspartate (Fig. 1d). The subdomain and catalytic motif common to PLA2 enzymes from bacteria to humans reveal a deep evolutionary conservation between species that last shared a common ancestor billions of years ago.
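The PLA2 hallmark can be expressed as a two-part check: a sequence pattern plus a distance criterion. The sketch below assumes that the caller supplies coordinates (in Å) for the motif histidine and for a candidate upper-helix aspartate, and uses 3.5 Å as a typical hydrogen-bonding cutoff. The function names and the cutoff are illustrative assumptions, not part of the original analysis. A structure such as the bee venom PLA2, which carries its tyrosine elsewhere, would fail the sequence half of this check, which is why the structural context matters.

```python
import math
import re

# Sequence half of the hallmark: H, D, any two residues, Y.
# A lookahead is used so that overlapping occurrences are all reported.
HDXXY = re.compile(r"(?=(HD..Y))")

def find_hdxxy(sequence):
    """Return 1-based start positions of HDXXY matches in a protein sequence."""
    return [m.start() + 1 for m in HDXXY.finditer(sequence.upper())]

def within_hbond_distance(atom_a, atom_b, cutoff=3.5):
    """True if two (x, y, z) coordinates lie within a hydrogen-bonding cutoff (Å)."""
    return math.dist(atom_a, atom_b) <= cutoff

def matches_hallmark(sequence, his_coord, asp_coord, cutoff=3.5):
    """Hypothetical combined check: an HDXXY sequence is present AND the
    upper-helix Asp is within hydrogen-bonding distance of the motif His."""
    return bool(find_hdxxy(sequence)) and within_hbond_distance(his_coord, asp_coord, cutoff)
```

For example, `find_hdxxy("MKTAHDCCYGG")` reports a single match starting at position 5.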
1.2 The Many Tiers of Macromolecular Structural Homology
Macromolecular structures evolve over many different length and time scales, leading to many different tiers of structural homology. A number of tools provided by and to the structural biology community make it possible to explore structural similarity at several different tiers, all of which might be exploited to infer functional details. Macromolecular structures that form a biological unit to carry out a specific function range in size and complexity from small, single-chain, single-domain proteins, like PLA2, to massive megadalton heterocomplexes made up of many chains of various sizes and polymer types, like the ribosome [22]. As molecular structures evolve, parts get reused and altered. Domains may be added or removed, by addition or deletion of chains in a complex or by insertion or deletion of domains in existing chains. Chains can grow or shrink by insertions and deletions of segments of various lengths. And the sequences of the biopolymers that make up macromolecular structures can drift through random mutations that accumulate over evolutionary timescales.
A brief glimpse at the variations in the large family of ATP
Binding Cassette transporters (ABC transporters) provides a good
case study for the different tiers of structural similarity. ABC trans-
porters are multidomain proteins with widely divergent sequences
but identifiable by their highly conserved ATP binding domains
[23]. These macromolecular assemblies can have very simple arrangements (in the case of ABC transporters, as simple as a homodimer of two chains each composed of two domains [23]) or be as complex as heterocomplexes of many chains of different types
2 Materials
2.1 The Protein Data Bank (PDB)
The PDB is the information resource that provides the basis for structural homology searches and therefore for functional annotation via structure homology [29, 30]. The PDB is the repository of publicly released, experimentally determined biomacromolecular structures and complexes. Three partner PDB organizations and processing centers, PDBe (Protein Data Bank in Europe), RCSB PDB (Research Collaboratory for Structural Bioinformatics Protein Data Bank), and PDBj (Protein Data Bank Japan), form a consolidated structure repository, the worldwide PDB (wwPDB) [31, 32]. The Biological Magnetic Resonance Data Bank (BMRB) is a fourth member of the wwPDB and serves as a repository of NMR data. At the time of writing, there were nearly 160,000 biological macromolecular structures in the PDB [29].
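Structural homology tools ultimately operate on the atomic coordinates stored in PDB entries. As a minimal sketch, the function below reads ATOM and HETATM records from text in the legacy fixed-column PDB format. It ignores alternate locations, insertion codes, and multiple models, and the `Atom` class is a hypothetical container introduced for illustration.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    name: str      # atom name, e.g. "NE2"
    res_name: str  # residue name, e.g. "HIS"
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    x: float
    y: float
    z: float

def parse_atoms(pdb_text):
    """Parse ATOM/HETATM records from legacy PDB-format text using the
    format's fixed column ranges (written here as 0-based Python slices)."""
    atoms = []
    for line in pdb_text.splitlines():
        if line[:6] not in ("ATOM  ", "HETATM"):
            continue  # skip HEADER, REMARK, TER, and other record types
        atoms.append(Atom(
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms
```

Entry text can be downloaded, for example, from https://files.rcsb.org/download/2omo.pdb. Very large structures may be distributed only in the PDBx/mmCIF format, which this sketch does not handle.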
Fig. 2 The many tiers of protein structural homology. (a) Comparison of the
domain organization of two functionally homologous ATP binding cassette (ABC)
transporters. The left figure shows a schematic depiction of the five
non-covalently interacting domains of the vitamin B12 import ABC transporter
in complex with its vitamin B12 periplasmic binding protein (PDB: 2qi9, [25]).
The right figure shows a schematic of the eight-domain maltose import ABC
transporter in complex with its maltose periplasmic binding protein (PDB: 2r6g,
[24]). Domains depicted with the same shape and shade are homologous—they
are thought to have a common ancestor gene. The two larger circles at the
bottom of each schematic are the ATP binding motor domains, the vertically
oriented ovals are the transmembrane permease domains, and the apical circles
depict the periplasmic binding proteins. (b) The chain composition of the given ATP
importer complexes. The ovals with adjoining line segments depict the protein
chains corresponding to the ABC transporter directly above. The three chains on
the left are from the vitamin B12 transporter and the four chains on the right are
from the maltose transporter. Note the vitamin B12 transporter is a homodimer
of heterodimers plus the periplasmic binding protein; the maltose transporter
contains a homodimer of ATP binding cassettes, but the rest is a heterooligomer.
The maltose transporter permease domains are structurally homologous but
have only weak sequence similarity. (c) Topology of ATP binding cassette
domains (corresponding to the circles at the bottom of the schematics in part
a of the figure). The topology diagrams depict the secondary structural elements
and their interconnectivity corresponding to the ATP binding domains for the ABC
transporter directly above. The number, the constituents, the relative size, and the order of connectivity can be discerned from the topology
diagram. The order, the orientation, and interactions of β-strands making up
β-sheets can also be inferred. It is readily apparent that the ATP binding domains
from the vitamin B12 transporter and the maltose transporter are homologous.
β-Strands are depicted as rectangular arrows and α-helices are depicted as
rectangles. (d) Superposition of ATP binding domains from the vitamin B12 transporter and the maltose transporter. Shown are the ribbon diagrams of the vitamin B12 transporter and maltose transporter ATP binding domains, superimposed. The vitamin B12 transporter ATP binding domain is shown in a lighter shade of gray than the maltose transporter homolog. From the superposition of ribbon diagrams, it is apparent that these two domains are clearly homologous. (e) Superposition of the ATP binding region of vitamin B12 and
maltose transporters. The ATP binding motif is at the dimer interface between
ATP binding domains. The number, type, and conformations of the residues
interacting with ATP are nearly identical (Figure panel c was adapted from the
HERA diagrams [26] available from PDBSum [27]. Figure panels d and e were
composed with Chimera [20])
2.2 BLASTp: Basic Local Alignment Search Tool Protein-Protein
BLAST is a heuristic algorithm for similarity searches [33, 34] and is perhaps the most widely used tool for searching databases of sequence information. BLASTp is a useful tool for examining structural homology when used to search the protein sequences in the PDB. Like SAS (described below), BLASTp can be used to find PDB entries that have protein sequence similarity to a query protein sequence, providing an entrée to structural homology starting from protein sequence (see Note 4). As with SAS, BLASTp is most useful for finding the proteins in the PDB most closely related to the query sequence.
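When BLASTp is run locally against a PDB-derived protein database, for example with `blastp -query query.fasta -db pdbaa -outfmt 6 -evalue 1e-3`, the hits come back as tab-separated lines. The sketch below assumes the default 12 columns of BLAST+ tabular output. The database name `pdbaa`, the E-value threshold, and the helper names are assumptions for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def parse_outfmt6(text):
    """Parse BLAST+ -outfmt 6 text into a list of dicts with numeric fields converted."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        for key in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
            hit[key] = int(hit[key])
        hits.append(hit)
    return hits

def best_pdb_hits(hits, max_evalue=1e-3, top=5):
    """Keep hits at or below an E-value threshold, best bit score first."""
    kept = [h for h in hits if h["evalue"] <= max_evalue]
    return sorted(kept, key=lambda h: h["bitscore"], reverse=True)[:top]
```

If no hit passes the threshold, as was the case for REP34 (Section 3.2), structure-based searching is the natural next step.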
2.3 FATCAT: Flexible Structure Alignment by Chaining Aligned Fragment Pairs Allowing Twists
FATCAT is the algorithm used by RCSB (see Note 5) to precalculate the clusters of structural similarity for the entire PDB [35]. FATCAT is a structure-based alignment and search tool that scores pairwise matches based primarily on the root-mean-square deviation (RMSD) of aligned fragment pairs. A FATCAT search against all of the chains in the PDB is computationally expensive, so FATCAT searches are generally conducted against a representative set of chains selected by sequence clustering of the chain sequences in the PDB. FATCAT and other aligned fragment pair algorithms are inherently sequence-independent comparison methods and as such can identify very distant homologies. As a consequence of matching against representative chains of sequence clusters, FATCAT often returns a matched set of chains with very diverse sequences. This can provide the basis for a large multi-sequence structure-based alignment that has few highly or absolutely conserved amino acids. If the highly conserved residues cluster together in space to form a 3D motif, which can be determined by mapping the conserved residues onto 3D structures, these residues are often functionally important (see Note 6).
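Finding absolutely conserved columns in a structure-based multi-sequence alignment, and checking whether those residues cluster in space, can be sketched as follows. The alignment is assumed to be a dictionary mapping hypothetical labels (for example, PDB and chain IDs) to gapped sequences of equal length. The 8 Å all-pairs cutoff used as the "clustering" criterion is an illustrative assumption; in practice the conserved positions would be mapped onto real coordinates, such as Cα atoms.

```python
import itertools
import math

def absolutely_conserved_columns(alignment, skip_glycine=True):
    """Return (1-based column, residue) pairs where every aligned sequence has
    the same residue and no sequence has a gap."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col, residues in enumerate(zip(*alignment.values()), start=1):
        residues = {r.upper() for r in residues}
        if len(residues) != 1:
            continue  # column is not absolutely conserved
        residue = residues.pop()
        if residue in "-.":
            continue  # an all-gap column is not a conserved residue
        if skip_glycine and residue == "G":
            continue  # glycines are often conserved for structural, not functional, reasons
        conserved.append((col, residue))
    return conserved

def forms_spatial_cluster(coords, cutoff=8.0):
    """True if every pair of (x, y, z) points lies within cutoff Å of each other."""
    return all(math.dist(a, b) <= cutoff for a, b in itertools.combinations(coords, 2))
```

Excluding glycine mirrors the figure legend for Fig. 5, which highlights the absolutely conserved non-glycine residues.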
Users can obtain FATCAT results by submitting a query to the FATCAT web service. The FATCAT web service interface [36] requires only a PDB ID, a chain ID, and a choice of database of representative chains from precalculated sequence clusters as input. The output is emailed to the user-designated email address.
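FATCAT's scoring rests on the RMSD of aligned fragments after optimal superposition. The sketch below is not FATCAT: it neither chains fragment pairs nor introduces twists. It shows only the rigid-body building block, computing the RMSD between two matched sets of atoms after superposition with the Kabsch algorithm, and it assumes NumPy is available.

```python
import numpy as np

def superposed_rmsd(P, Q):
    """RMSD between two matched (N, 3) coordinate sets after optimal
    rigid-body superposition of P onto Q (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.ndim != 2 or P.shape != Q.shape or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both point sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix yields the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Here P and Q would be, for instance, the Cα coordinates of two aligned fragments. An RMSD near zero means the fragments superimpose almost exactly, whatever their starting orientation.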
2.4 PDBSum
PDBSum is a useful complement to the main PDB resource because it aggregates and displays the structure information contained within PDB entries; for example, topologies are much more accessible through PDBsum than through the RCSB website [27, 37]. PDBsum is a rich interactive website maintained by the
2.7 CATH: Class Architecture Topology Homologous Superfamily
CATH is a database of classified domain structures derived from analysis of the structural domains present in the individual chains in the PDB [41]. It is also a website that offers search and analysis tools, enabling the use of the domain classification database [42]. CATH classifies domains based on class, architecture, topology, and homologous superfamily, in that order. Each class, architecture, topology, and homologous superfamily is assigned a number, such that a superfamily is identifiable by a four-number ID. For example, 3.30.70.100 is the identifier for a superfamily in the alpha-beta class, with a two-layer sandwich architecture, an alpha-beta plaits topology, and 620 member domains identified from the PDB within 35% sequence similarity. Superfamilies are further subdivided into sequence similarity clusters at the sequence family (35% similarity), orthologous family (60% similarity), like domain (95% similarity), and identical domain (100% similarity) levels. Each similarity cluster is also assigned a number, so a set of domains with identical sequence from the PDB belonging to a CATH superfamily can be identified by a seven-number ID, a PDB ID, and a chain identifier.
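Because CATH identifiers are hierarchical, the depth at which two domains are classified together can be read directly from the shared prefix of their IDs. The sketch below labels the first four levels as class, architecture, topology, and homologous superfamily, and gives any further numbers a generic sequence-cluster label. The helper names and that generic labeling are assumptions, not part of CATH's own tooling.

```python
CATH_LEVELS = ["class", "architecture", "topology", "homologous superfamily"]

def parse_cath_id(node_id):
    """Split a dotted CATH node ID into (level name, number) pairs. Levels beyond
    the superfamily are labeled generically as sequence clusters."""
    parts = [int(p) for p in node_id.split(".")]
    names = CATH_LEVELS + [f"sequence cluster {i}" for i in range(1, len(parts) - 3)]
    return list(zip(names, parts))

def shared_depth(id_a, id_b):
    """Number of leading CATH levels that two node IDs have in common."""
    depth = 0
    for a, b in zip(id_a.split("."), id_b.split(".")):
        if a != b:
            break
        depth += 1
    return depth
```

For the superfamily 3.30.70.100 discussed above, a domain classified as 3.30.70.x with a different final number would share class, architecture, and topology (depth 3) but belong to a different homologous superfamily.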
CATH entries can be identified with the CATH website search utilities, and CATH is often conveniently linked from other structure analysis resources like PDBsum. The CATH website [42] provides searches based on CATH ID, PDB ID, other reference IDs such as UniProt ID, keyword, protein sequence, or structure. The structure search requires that a coordinate file be uploaded. A CATH
search will return a high-level report on the matching superfamilies,
domains, and PDB structures matched to the query with clickable
links to further examine the query results. The PDBsum link to
CATH links directly to the superfamily match for a given domain
within a PDB entry.
2.8 SCOPe: Structural Classification of Proteins-Extended
SCOPe [43], like its predecessor SCOP, is a database of protein domain classifications based on structure and evolutionary relationships [44]. The original versions of SCOP, initiated at the Laboratory of Molecular Biology of the UK Medical Research Council, were based entirely on manual curation. SCOP manual curation ended in 2009. SCOPe, developed and maintained at Lawrence Berkeley National Lab and UC Berkeley, resumed PDB domain curation based on the SCOP classification, now largely through automated curation.
SCOP classification, like CATH, is hierarchical, but it has seven levels of classification: SCOP release, Class, Fold, Superfamily, Family, Protein, and Species. SCOP superfamilies are used as the basis for identifying structural clusters, which in turn form the basis for identifying representative domains that can be searched more efficiently for structural homology. The SCOPe website provides a search utility that can be used to find the SCOP entries that correspond to a given PDB entry, which can then be compared to related SCOP entries at different levels in the SCOP hierarchy. Comparison to family members in the same superfamily can help to identify functional homologs.
2.9 SAS: Sequence Annotated by Structure
SAS is a tool for finding PDB entries that have sequence similarity to a given protein sequence, providing an entrée to structural homology starting with protein sequence [45]. PDBsum provides a convenient link to SAS for a given PDB entry so that other PDB
2.10 ConSurf
ConSurf is a program that categorizes amino acid positions within a sequence based on their degree of evolutionary conservation [47]. Each conservation category is given a corresponding numerical value that can be color coded on a protein sequence or multi-sequence alignment and/or mapped onto a 3D molecular surface. Given a protein sequence or a PDB coordinate set, from which the protein sequence can be extracted, ConSurf will generate a multi-sequence alignment, cluster sequences to remove highly redundant ones, and then calculate an estimated conservation rate for each position, which is then binned into categories 1–9. The categories are conveniently represented by discrete colors that can be mapped onto sequences or 3D structures. ConSurf also outputs the multi-sequence alignment, a dendrogram depicting the interrelatedness of the sequences in the multi-sequence alignment, script files for displaying ConSurf results in Chimera (and other popular molecular graphics programs), and text files. The text files contain the estimated conservation rate for each position, the category number, reliability statistics, and the frequency of appearance of each amino acid type at each position along the sequence.
Evolutionary conservation of amino acids at specific locations on a protein structure is closely related to their functional importance. To the degree that a multi-sequence alignment faithfully recapitulates the structural alignment of amino acid positions in three dimensions, sequence conservation can reveal functionally important sequence positions and the corresponding amino acid types. ConSurf is a convenient tool for sequence conservation analysis that can be mapped onto 3D structure, and it is linked from PDBSum on the “Protein” tab by clicking “Analysis of sequence’s residue conservation.”
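The idea of binning per-position conservation into nine grades can be illustrated with a simplified stand-in. ConSurf itself estimates phylogeny-aware evolutionary rates and uses its own grading scheme; the sketch below instead uses the Shannon entropy of each alignment column and equal-width bins, so its grades will not match ConSurf output. As in ConSurf, grade 9 marks the most conserved positions and grade 1 the most variable. The function names are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps.
    Returns None for an all-gap column."""
    residues = [r for r in column.upper() if r not in "-."]
    if not residues:
        return None
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in Counter(residues).values())

def conservation_grades(alignment, n_grades=9):
    """Grade each column of a list of equal-length aligned sequences from 1 (most
    variable) to n_grades (most conserved) by equal-width binning of entropies."""
    entropies = [column_entropy("".join(col)) for col in zip(*alignment)]
    valid = [e for e in entropies if e is not None]
    lo, hi = min(valid), max(valid)
    span = (hi - lo) or 1.0  # avoid division by zero when all columns are alike
    grades = []
    for e in entropies:
        if e is None:
            grades.append(None)
            continue
        # Low entropy maps to a high grade; the maximum entropy is clamped into the last bin.
        bin_index = min(int((e - lo) / span * n_grades), n_grades - 1)
        grades.append(n_grades - bin_index)
    return grades
```

For the toy alignment `["HAG", "HCG", "HDA", "HEA"]`, the invariant first column receives grade 9, the fully variable second column grade 1, and the two-residue third column an intermediate grade.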
3 Methods
3.2 Discovering Homology Through Domain Structure Similarity
By far the most common invocation of structural similarity is the use of domain similarity, both in domain topology and in three-dimensional geometry, to classify proteins and to identify homologs. Indeed, both CATH [41, 42] and SCOP [43] are databases of domains derived from classifying structures based on their similarities. In addition, precalculated structure similarity results, such as those presented on the structural similarity tab for a given PDB entry on the RCSB website [29, 30], tend to return top-scoring hits that have high levels of domain similarity, simply because of how proteins fold and how the matching algorithms work. Thanks to the foundational work of CATH, SCOP, and the all-vs-all precalculated structural similarities generated at the PDB (see Note 3), structural biologists can quickly identify related structures for a given structure and investigate homologies. For newly determined structures that have not yet been deposited in the PDB, structure-based database searches, such as the web service provided on the FATCAT website [36], can be performed by uploading the coordinate file for the new structure to the database search utility. Homology to a protein with well-annotated function can be used to infer details about the function of a protein with homologous structure.
As a demonstration case, the annotation of function for the F. tularensis Rapid Encystment Protein 34 kDa (REP34) is reviewed here. At the time the structure of REP34 was first determined, the REP34 protein was implicated in inducing encystment in amoebae [53], but the protein and the gene that encodes REP34 (orf FTN_0149) were annotated as a conserved protein of unknown function. FTN_0149 was annotated as such because there was no identifiable sequence match to a gene or protein with well-annotated function. There was also no identifiable sequence match to proteins in the PDB using BLASTp. Once the structure of REP34 was determined, homologs were readily identified by structural similarity, and a function was inferred and subsequently tested and confirmed [54].
Table 1
PDB entries in the structural neighborhood of 2omo
ID | PDB description | UniProt description | Annotation score (a) | Identity (%)
2gff | Yersinia pestis lsrg | lsrG; (4S)-4-hydroxy-5-phosphonooxypentane-2,3-dione isomerase | 3 | 47
3qmq | E. coli autoinducer-2 modifying protein lsrg | lsrG; (4S)-4-hydroxy-5-phosphonooxypentane-2,3-dione isomerase | 2 | 43
1y0h | Structural genomics, unknown function | Putative monooxygenase | 2 | 22
3f44 | Putative monooxygenase | Putative uncharacterized protein | 1 | 21
2omo | Putative antibiotic biosynthesis monooxygenase | Domain of unknown function (duf176) | 1 | N/A
3kkf | Putative antibiotic biosynthesis monooxygenase | Putative flavoredoxin | 1 | 17
3mcs | Putative monooxygenase | Hypothetical cytosolic protein | 1 | 17
3bm7 | Putative monooxygenase | Antibiotic monooxygenase domain-containing protein | 1 | 15
1r6y | Unknown function | Probable quinol monooxygenase YgiN | 3 | 14
1q8b | Unknown function | Uncharacterized protein Yjcs | 1 | 11
1x7v | Unknown function | Antibiotic monooxygenase domain-containing protein | 1 | 19
2fb0 | Conserved protein of unknown function | Antibiotic monooxygenase domain-containing protein | 1 | 22
2bbe | Unknown function | Antibiotic biosynthesis monooxygenase family protein | 1 | 19
4dpo | Conserved protein of unknown function | Conserved protein | 1 | 16
(a) Annotation scores are taken from the UniProt entry linked from the PDB entry.
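Table 1 can also be used to prioritize which structural neighbors to examine first when transferring annotation. The sketch below encodes the ID, annotation score, and identity columns of the table (omitting the query, 2omo) and ranks entries by annotation score, breaking ties by sequence identity. The ranking rule and the minimum score of 2 are illustrative choices, not criteria stated in the chapter.

```python
# (PDB ID, UniProt annotation score, % identity to 2omo), taken from Table 1.
NEIGHBORS = [
    ("2gff", 3, 47), ("3qmq", 2, 43), ("1y0h", 2, 22), ("3f44", 1, 21),
    ("3kkf", 1, 17), ("3mcs", 1, 17), ("3bm7", 1, 15), ("1r6y", 3, 14),
    ("1q8b", 1, 11), ("1x7v", 1, 19), ("2fb0", 1, 22), ("2bbe", 1, 19),
    ("4dpo", 1, 16),
]

def best_annotated(neighbors, min_score=2):
    """Neighbors with at least min_score, highest score first, ties broken by identity."""
    kept = [n for n in neighbors if n[1] >= min_score]
    return sorted(kept, key=lambda n: (n[1], n[2]), reverse=True)

print([pdb_id for pdb_id, _, _ in best_annotated(NEIGHBORS)])
```

With these rules the two LsrG structures, 2gff and 3qmq, both appear near the top of the list, consistent with 3qmq being singled out in Fig. 5 as one of the better annotated proteins in the collection.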
Fig. 5 (continued) 2omo [57]. The structure for 3qmq was identified by CATH
[41, 42] as belonging to the same sequence cluster as 2omo at the 35%
sequence similarity level. The structure for 3qmq is included in this structural
family alignment because it is one of the better annotated proteins in this
collection [59]. (b) Structure-based multi-sequence alignment from the multi-structure alignment shown in (a). Each row is labeled with the PDB and chain identifiers of the entries used in the alignment. The three non-glycine residues that are absolutely conserved across the whole multi-sequence alignment (down an entire column for a given position in the alignment) are indicated by capital letters in the consensus row above the individual sequences. (c) The three absolutely conserved non-glycine residues are displayed as sticks (carbon atoms shown in a lighter gray than His nitrogen atoms and Glu oxygen atoms) on the ribbon diagrams of the superimposed structures (left panel). A representative chain (3qmq, [59]) is displayed with the highly conserved residues and a solvent-accessible pocket (right panel). The surface is shaded to match the color of the atom corresponding to the surface vertices. The conserved glutamic acid at the base of the pocket is obscured by the section of molecular surface, but the oxygen atoms of its side chain contribute to the solvent-accessible surface of the pocket, as does the ε nitrogen of the conserved histidine. The ε nitrogen of the conserved histidine is also within hydrogen bonding distance of a water molecule. The absolute conservation, proximity in space, hydrogen bonding network (both glutamic acids are hydrogen bonded to the conserved histidine), and positions of two of the residues at the base of a pocket strongly imply that these residues are functionally important for this family of proteins. (d) The canonical monooxygenase (ActVA from Streptomyces coelicolor, [71]) is shown with three structural neighbors [72–74]. The structure-based multi-sequence alignment of this structural cluster reveals a highly conserved motif made up of residues proximal in space and at the bottom of a solvent-accessible pocket on the molecular surface (not shown). The conserved motif is made up of a tryptophan and a contiguous tripeptide, proline-glycine-phenylalanine. The conserved tryptophan is known to be involved in monooxygenase activity. This figure was composed using Chimera [20]
Table 2
PDB entries in the structural neighborhood of monooxygenase 3kg0
Annotation
ID PDB description Uniprot description scorea Identity (%)
1lq9 Monooxygenase ActVA 6 protein 1 16
1iuj Hypothetical protein TT1380 protein 1 18
3kg0 Monooxygenase Deoxynogalonate 5 10
SnoB monooxygenase snoaB
3hx9 Heme-degrader Heme oxygenase 5 14
MhuD (mycobilin-producing)
MhuD
a
UniProt annotation score. A score of 5 indicates the strongest experimental evidence supporting the given annotation
and a score of 1 indicates little or no evidence as to the annotated function.
3.4 Identifying Distant Homologs with Shared 3D Motifs
Simple, highly conserved 3D motifs can be used to identify distant homologs that do not necessarily belong to the same CATH or SCOP families. Searching by motif similarity is a powerful means of finding very distant homologs that do not share whole-domain architectures. These more distant homologies can still lead to functional insights. Motifs can be identified, as in the last section, by structure-based alignment of structurally similar proteins that are distantly related by sequence. Putative 3D motifs, for example, a conspicuous cluster of residues that form a hydrogen bonding network at the base of a pocket, can also provide the basis for a motif search. Matches to a motif search that extend structural similarity beyond the motif used for the query provide strong evidence of common ancestry. Proteins with metal binding sites provide good example cases for 3D motif-based searches, since the metal binding motif is conspicuous.
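A metal binding motif like the one in 4fca can be extracted from coordinates before being used as a motif query. The sketch below is not RASMOT-3D PRO; it simply collects residues whose side-chain donor atoms lie within a cutoff of a metal ion. The atom tuple layout, the donor-atom table, and the 2.8 Å cutoff are assumptions for illustration. Typical zinc-ligand bond lengths are around 2 Å, so the cutoff leaves some tolerance for coordinate error.

```python
import math

# Side-chain atoms (PDB atom names) that commonly coordinate metal ions.
DONOR_ATOMS = {"HIS": {"ND1", "NE2"}, "GLU": {"OE1", "OE2"},
               "ASP": {"OD1", "OD2"}, "CYS": {"SG"}}

def metal_site(atoms, metal_xyz, cutoff=2.8):
    """Return sorted (chain, res_name, res_seq) tuples for residues with a donor
    atom within `cutoff` Å of the metal. `atoms` holds hypothetical tuples of
    (chain, res_name, res_seq, atom_name, (x, y, z))."""
    site = set()
    for chain, res_name, res_seq, atom_name, xyz in atoms:
        if atom_name in DONOR_ATOMS.get(res_name, ()):
            if math.dist(xyz, metal_xyz) <= cutoff:
                site.add((chain, res_name, res_seq))
    return sorted(site)

def site_signature(site):
    """Residue composition of a site, e.g. 'GLU-HIS-HIS', usable as a simple motif key."""
    return "-".join(sorted(res_name for _, res_name, _ in site))
```

A two-histidine, one-glutamate site such as the one described for 4fca would yield the signature `GLU-HIS-HIS`; the residue numbers in the returned site could then be used to check, for example, whether the two histidines are four residues apart as in an HEXXH sequence.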
Fig. 6 Identifying homologs from motif similarity. (a) The protein represented in
PDB entry 4fca [75] is described by the depositors as a conserved protein with
unknown function. At the base of a cleft formed by two domains of the protein
(not shown), there is a conspicuous metal binding motif made up of two histidines and a glutamate, all coordinated to a metal ion (left panel). The two histidines reside on the same α-helix, one helical turn apart. The glutamic acid residue resides on a helical section packed next to the first helix in an antiparallel direction. Using the three-residue motif identified in the 4fca entry as the input for RASMOT-3D PRO [49, 50] and selecting “identical residues” in the protein selection returns a cluster of five matches, including the query structure, with low RMSD (0.2 Å). The right panel shows the superposition of the closest four matches: 4fgm [76], 1e1h [77], 3u9w [78], and 4fca [75]. The motifs match nearly identically in all torsions and in their relation to the coordinated metal ion, even though only the first torsion of each of the three residues of the motif is used in the query (an implementation detail of the
RASMOT-3D PRO algorithm). (b) The full ribbon diagram is shown for each of the
four PDB entries with matching motifs that were superimposed in part (a). Short
segments of antiparallel helices that contain the motif and three strands of a
β-sheet in close proximity to the metal binding motif are highlighted in gray. The
remainder of the ribbon diagram for each molecule is shown in white. (c) Given a
on the helix with the two conserved histidines and a highly conserved tyrosine nearby (Fig. 6c). The conservation of structural elements beyond the query motif and the conservation of the subdomain architecture are highly suggestive of homology between these molecules. A small schematic can be made to depict the “rules” for the core homologous element shared by these proteins (Fig. 6d).
From the known functions of the newly identified distant homologs of 4fca, we can provide some annotation for this unannotated protein. Each of the three identified homologs is an enzyme with the metal cofactor implicated in catalysis. The conserved HEXXH sequence on one helical section of the 3D motif is considered a hallmark of Zn proteases [79]. The catalytic domain of botulinum toxin is in fact known to be a Zn protease [77]. The protein represented by the 4fgm entry is also described as a Zn aminopeptidase, although its annotation score is low [76]. Despite containing the HEXXH motif, the protein in the 3u9w entry is annotated as human leukotriene A4 hydrolase [78], and its annotation score indicates high confidence in the annotation. The protein represented by 4fca could be annotated as a metalloenzyme and hydrolase with high confidence. It is reasonable to say that 4fca is a putative Zn metallopeptidase, given the hallmark features it has in common with known metallopeptidases.
4 Conclusion
Fig. 6 (continued) matched motif, a zonal region can be explored for other conserved elements among homologs. Shown here, in addition to the metal binding residues, is a second absolutely conserved glutamic acid between the two histidines, one amino acid toward the C-terminus from the first histidine. Also shown is a highly conserved tyrosine that is implicated in peptidase activity in the botulinum toxin zinc peptidase (pdb 1e1h, [77]). (d) A schematic is shown depicting the conserved elements of a larger motif discovered by examining the superposition of the metal binding motif matches
246 Brent W. Segelke
residue that might be quite distant along the sequence but proximal
in space. High sequence identity (>30%) as determined by struc-
tural alignment can also be a strong indicator of homology.
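To make the identity criterion concrete, the sketch below computes percent identity from a pair of already-aligned sequences, such as a structure-based alignment exported from a superposition tool. The helper name is hypothetical, and the denominator is an assumption: identity is counted here over columns where neither sequence has a gap. Other tools divide by the alignment length or by the length of the shorter sequence, which can shift the value noticeably near a 30% threshold.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences contribute a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x.upper() == y.upper())
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    print(percent_identity("AC-DE", "ACQDF"))  # 75.0
```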
One major advantage of deriving functional annotation from structural homology is that there is a much greater likelihood of having associated primary literature relevant to the given annotation. Historically, a structure determination effort followed extensive characterization of protein function. In the genomics era, structural genomics reversed this order: the discovery of genes encoding predicted proteins of unknown function motivates a structure determination effort, and the structure is expected to reveal function. Unfortunately, many database entries of structures determined by structural genomics centers either lack functional annotation or simply borrow the Pfam annotation, and Pfam descriptions can be misleading. The Pfam descriptions often sound like functional annotations, but they are really just the name given to the archetypal representative of a protein sequence family. For example, the Pfam name for the LsrG protein (pdb 2gff) is antibiotic biosynthesis monooxygenase, which has no obvious relationship to the known function of LsrG: recycling phospho-AI-2 in the AI-2 quorum sensing pathway. The lack of annotation, or the use of misleading Pfam annotation, in structural genomics center depositions can complicate or confound structural homology-based function analysis, but it also represents an opportunity to revisit and improve the annotations for many of these entries.
5 Notes
zero; chains are assigned the letter code provided in the input
file or entry; residues are assigned the number provided in an
entry; and atoms are assigned the name provided in the input
pdb file.
Graphical selection in combination with zone selection enables local spatial expansion of selected molecular features, which in turn is helpful for tailoring displayed features, identifying structurally homologous residues, or examining interactions. Graphical selection of an atom or bond is achieved by clicking while holding the Ctrl key; Shift+Ctrl click appends to or subtracts from the selection; Ctrl click in empty space clears the selection. If a ribbon is displayed for a chain, a residue within the chain can be graphically selected by Ctrl+click on a section of the ribbon. Graphical selection of an atom or atoms followed by zone selection with a small zone radius (e.g., <1.0 Å) conveniently expands atom selections into a residue selection when the “Select all atoms/bonds of any residue in the selected zone” option is chosen within the “Select Zone Parameters” dialog window. A small zone radius selection is also useful for identifying structurally homologous residues in superimposed, structurally homologous molecules. Graphical selection of atoms or residues in combination with zone selection with a 3–3.5 Å zone radius can be used to find residues that interact with the selected residues or atoms, because non-bonding interactions, such as salt bridges, hydrogen bonds, or van der Waals interactions, typically occur in the 2.5–3.5 Å range.
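The distance criterion behind zone selection can be sketched in a few lines. This is not Chimera's implementation; it is a brute-force illustration that assumes coordinates have already been parsed into `(chain, residue_number, atom_name, (x, y, z))` tuples, and `residues_in_zone` is a hypothetical helper. A residue is included if any of its atoms lies within the zone radius of any selected atom, which is why a small radius expands an atom selection to its own residue while a 3–3.5 Å radius also picks up interacting neighbors.

```python
import math


def residues_in_zone(atoms, selected, radius):
    """Return the (chain, residue_number) keys with any atom within `radius` Å
    of any selected atom.

    atoms: list of (chain, residue_number, atom_name, (x, y, z)) tuples.
    selected: indices into `atoms` for the currently selected atoms.
    """
    centers = [atoms[i][3] for i in selected]
    zone = set()
    for chain, resnum, _name, xyz in atoms:
        if any(math.dist(xyz, center) <= radius for center in centers):
            zone.add((chain, resnum))
    return zone


if __name__ == "__main__":
    # Toy coordinates for illustration only.
    atoms = [
        ("A", 10, "NE2", (0.0, 0.0, 0.0)),
        ("A", 10, "CA", (3.0, 0.0, 0.0)),
        ("A", 45, "OE1", (0.0, 2.8, 0.0)),
        ("B", 7, "O", (0.0, 0.0, 5.0)),
    ]
    print(sorted(residues_in_zone(atoms, [0], 3.5)))  # [('A', 10), ('A', 45)]
    print(sorted(residues_in_zone(atoms, [0], 0.5)))  # [('A', 10)]
```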
Chain selection, alone or in combination with “Invert” (either selected models or all models), helps to rapidly simplify the displayed content to focus on features of interest. PDB entries often contain redundant information, so when a model is first opened in the display window, there may be multiple copies of chains or molecular assemblies with identical or nearly identical sequences; this is a consequence of how crystallography and NMR results are deposited in the PDB. Redundant chains can be easily undisplayed by selecting them. Alternatively, after selecting a chain of interest, all other chains can be quickly undisplayed or deleted by inverting the selection. If multiple models are open, selecting the “Invert (selected models)” option after selecting a chain of interest selects all other chains within the same model, which can be used to eliminate the unwanted or redundant chains within that model. Choosing the “Invert (all models)” option instead selects all other chains in all open models, so that they can be eliminated together. Inverting the selection again returns the selection to the initially selected chain of interest.
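The two invert options differ only in which chains count as the “universe” being inverted against. The set-based sketch below illustrates that logic with `(model_id, chain_id)` pairs; the helper name and data layout are hypothetical, not Chimera's internal representation. It also shows why inverting twice returns the original chain of interest.

```python
def invert_selection(all_chains, selected, scope="selected models"):
    """Invert a chain selection.

    all_chains: iterable of (model_id, chain_id) pairs currently open.
    selected: iterable of (model_id, chain_id) pairs currently selected.
    scope: "selected models" inverts only within models that contain a selected
           chain; "all models" inverts over every open model.
    """
    chains = set(all_chains)
    chosen = set(selected)
    if scope == "selected models":
        models = {model for model, _chain in chosen}
        universe = {c for c in chains if c[0] in models}
    elif scope == "all models":
        universe = chains
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    return universe - chosen


if __name__ == "__main__":
    open_chains = {(0, "A"), (0, "B"), (1, "A"), (1, "B")}
    print(sorted(invert_selection(open_chains, {(0, "A")})))  # [(0, 'B')]
    inverted = invert_selection(open_chains, {(0, "A")}, "all models")
    print(sorted(inverted))  # [(0, 'B'), (1, 'A'), (1, 'B')]
    print(sorted(invert_selection(open_chains, inverted, "all models")))  # [(0, 'A')]
```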
Select mode options help to rapidly build up or pare down the selected molecular components. The available select mode options are replace, append, intersect, and subtract. Replace is the default mode. Other useful selection tools that are not
under the “Select” drop-down menu are provided via the model panel, the sequence tool, and the CASTp tool. The model panel and sequence tool can be launched from the “Favorites” pull-down menu. The model panel lists open models and displayed surfaces. Individual listed items can be highlighted by mouse click and then selected with the select button on the model panel. The sequence tool allows the user to choose from a list of all currently open chains and then opens a sequence window for each chain chosen. A sequence window displays the linear one-letter code sequence for a given chain. A residue, or segment of residues, can be selected by dragging the mouse over the given residue or segment. Other residues can be added to the existing selection by shift+drag over other parts of the sequence. Selections made in the sequence window are reflected in the display window and vice versa. When CASTp .poc files are opened through Chimera’s File → Open utility, the corresponding coordinate file is opened, chains are displayed as ribbons, and the CASTp tool is launched automatically. The CASTp tool provides a sortable tabular list of CASTp-identified pockets and several selectable options. Clicking on one of the listed pockets displays the surface of that pocket over the associated ribbon of the corresponding chain and selects all of the associated atoms, which can be easily expanded to the associated residues by zone selection with a small zone radius.
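The four select modes correspond to ordinary set operations between the current selection and the newly picked items. The sketch below makes that correspondence explicit; `apply_select_mode` is a hypothetical helper, and residues are represented as `(chain, residue_number)` tuples purely for illustration, not in Chimera's own specifier syntax.

```python
def apply_select_mode(current, picked, mode="replace"):
    """Combine the current selection with newly picked items according to a select mode."""
    current, picked = set(current), set(picked)
    if mode == "replace":    # default: the new pick becomes the selection
        return picked
    if mode == "append":     # union
        return current | picked
    if mode == "intersect":  # keep only items in both
        return current & picked
    if mode == "subtract":   # remove the picked items
        return current - picked
    raise ValueError(f"unknown select mode: {mode!r}")


if __name__ == "__main__":
    current = {("A", 10), ("A", 45)}
    picked = {("A", 45), ("B", 7)}
    for mode in ("replace", "append", "intersect", "subtract"):
        print(mode, sorted(apply_select_mode(current, picked, mode)))
```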
Once regions of interest are selected, a variety of actions can be applied to them with the utilities provided under the “Actions” drop-down menu. Actions → Atom/Bond → show (or hide), Actions → Ribbons → show (or hide), Actions → Surface → show (or hide), and Actions → color tend to be frequently used for highlighting and comparing regions of interest.
11. MatchMaker in Chimera. Chimera provides a structure comparison tool called MatchMaker, which can be used to superimpose structural homologs and to produce structure-based sequence alignments; these enable detailed analysis of local and global structural homology and investigation of the relationship between structural and sequence homologies. MatchMaker can produce pairwise superpositions and alignments or multi-structure superpositions and alignments. MatchMaker is launched from the Tools → Structure Comparison menu.
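MatchMaker pairs residues through a sequence alignment and then fits the paired atoms by least-squares superposition. As an illustration of that fitting step only, here is a minimal NumPy implementation of the Kabsch algorithm. It assumes the residue pairing is already known (for example, matched Cα coordinates) and that NumPy is available; `kabsch_superpose` is a hypothetical name, and MatchMaker's alignment scoring and iterative pruning of poorly fitting pairs are omitted.

```python
import numpy as np


def kabsch_superpose(mobile, reference):
    """Least-squares rigid-body fit of `mobile` onto `reference` (Kabsch algorithm).

    mobile, reference: (N, 3) arrays of paired coordinates.
    Returns (R, t, rmsd) such that mobile @ R.T + t best matches reference.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of paired coordinates")
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)  # 3x3 covariance of centered coordinates
    U, _S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    fitted = P @ R.T + t
    rmsd = float(np.sqrt(np.mean(np.sum((fitted - Q) ** 2, axis=1))))
    return R, t, rmsd


if __name__ == "__main__":
    ref = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], dtype=float)
    theta = 0.7
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta), np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])
    mob = ref @ rot.T + np.array([5.0, -2.0, 1.0])  # rotated and shifted copy
    _R, _t, rmsd = kabsch_superpose(mob, ref)
    print(f"RMSD after fit: {rmsd:.2e}")  # ~0 for an exact rigid-body copy
```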
To run MatchMaker, there must be at least two models
(or coordinate files) open. The MatchMaker tool window dis-
plays side-by-side lists of open models, one labeled Reference
Structure and one labeled Structure(s) to match. To run
MatchMaker, one reference structure should be selected and
Acknowledgments
References
1. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2007) GenBank. Nucleic Acids Res 36(Suppl_1):D25–D30
2. Lachmann A, Torre D, Keenan AB, Jagodnik KM, Lee HJ, Wang L, Silverstein MC, Ma’ayan A (2018) Massive mining of publicly available RNA-seq data from human and mouse. Nat Commun 9(1):1–10
3. Omenn GS, Lane L, Overall CM, Corrales FJ, Schwenk JM, Paik YK, Van Eyk JE, Liu S, Snyder M, Baker MS, Deutsch EW (2018) Progress on identifying and characterizing the human proteome: 2018 metrics from the HUPO human proteome project. J Proteome Res 17(12):4031–4041
4. McCool EN, Lubeckyj RA, Shen X, Chen D, Kou Q, Liu X, Sun L (2018) Deep top-down proteomics using capillary zone electrophoresis-tandem mass spectrometry: identification of 5700 proteoforms from the Escherichia coli proteome. Anal Chem 90(9):5529–5533
5. Feussner K, Feussner I (2019) Comprehensive LC-MS-based metabolite fingerprinting approach for plant and fungal-derived samples. In: High-throughput metabolomics. Humana, New York, NY, pp 167–185
6. Lake BB, Chen S, Sos BC, Fan J, Kaeser GE, Yung YC, Duong TE, Gao D, Chun J, Kharchenko PV, Zhang K (2018) Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain. Nat Biotechnol 36(1):70–80
7. Sandberg R (2014) Entering the era of single-cell transcriptomics in biology and medicine. Nat Methods 11(1):22–24
8. DOE US (2019) Breaking the bottleneck of genomes: understanding gene function across taxa workshop report, DOE/SC-0199. U.S. Department of Energy Office of Science, Washington, DC. https://genomicscience.energy.gov/genefunction/. Accessed 26 Feb 2020
9. Sivashankari S, Shanmughavel P (2006) Functional annotation of hypothetical proteins–a review. Bioinformation 1(8):335
10. Hutchison CA, Chuang RY, Noskov VN, Assad-Garcia N, Deerinck TJ, Ellisman MH, Gill J, Kannan K, Karas BJ, Ma L, Pelletier JF (2016) Design and synthesis of a minimal bacterial genome. Science 351:6280
11. Richarme G, Liu C, Mihoub M, Abdallah J, Leger T, Joly N, Liebart JC, Jurkunas UV, Nadal M, Bouloc P, Dairou J (2017) Guanine glycation repair by DJ-1/Park7 and its bacterial homologs. Science 357(6347):208–211
12. UniProt Consortium (2018) UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res 47(D1):D506–D515
13. UniProt consortium (2020) UniProt UniProtKB/Swiss-Prot UniProt release 2020_01. https://www.uniprot.org/statistics/Swiss-Prot. Accessed 26 Feb 2020
14. Giordanetto F, Knerr L, Nordberg P, Pettersen D, Selmi N, Beisel HG, de la Motte H, Månsson Å, Dahlstrom M, Broddefalk J, Saarinen G (2018) Design of Selective sPLA2-X inhibitor ()-2-{2-[carbamoyl-6-(trifluoromethoxy)-1H-indol-1-yl]pyridine-2-yl} propanoic acid. ACS Med Chem Lett 9(7):600–605
15. Sekar K, Sekharudu C, Tsai MD, Sundaralingam M (1998) 1.72 Å resolution refinement of the trigonal form of bovine pancreatic phospholipase A2. Acta Crystallogr D Biol Crystallogr 54(3):342–346
16. Segelke BW, Nguyen D, Chee R, Xuong NH, Dennis EA (1998) Structures of two novel crystal forms of Naja naja naja phospholipase A2 lacking Ca2+ reveal trimeric packing. J Mol Biol 279(1):223–232
17. Scott DL, Otwinowski Z, Gelb MH, Sigler PB (1990) Crystal structure of bee-venom phospholipase A2 in a complex with a transition-state analogue. Science 250(4987):1563–1566
18. Cavazzini D, Meschi F, Corsini R, Bolchi A, Rossi GL, Einsle O, Ottonello S (2013) Autoproteolytic activation of a symbiosis-regulated truffle phospholipase A2. J Biol Chem 288(3):1533–1547
19. Matoba Y, Sugiyama M (2003) Atomic resolution structure of prokaryotic phospholipase A2: analysis of internal motion and implication for a catalytic mechanism. Proteins 51(3):453–469
20. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE (2004) UCSF chimera—a visualization system for exploratory research and analysis. J Comput Chem 25(13):1605–1612. https://doi.org/10.1002/jcc.20084
21. Scott DL, Sigler PB (1994) Structure and catalytic mechanism of secretory phospholipases A2. Adv Protein Chem 45:53–88
22. Noeske J, Wasserman MR, Terry DS, Altman RB, Blanchard SC, Cate JH (2015) High-resolution structure of the Escherichia coli ribosome. Nat Struct Mol Biol 22(4):336–341
23. Locher KP (2016) Mechanistic diversity in ATP-binding cassette (ABC) transporters. Nat Struct Mol Biol 23(6):487
24. Oldham ML, Khare D, Quiocho FA, Davidson AL, Chen J (2007) Crystal structure of a catalytic intermediate of the maltose transporter. Nature 450(7169):515
25. Hvorup RN, Goetz BA, Niederer M, Hollenstein K, Perozo E, Locher KP (2007) Asymmetry in the structure of the ABC transporter-binding protein complex BtuCD-BtuF. Science 317(5843):1387–1390
26. Hutchinson EG, Thornton JM (1990) HERA—a program to draw schematic diagrams of protein secondary structures. Proteins 8(3):203–212
27. Laskowski RA, Jabłońska J, Pravda L, Vařeková RS, Thornton JM (2018) PDBsum: structural summaries of PDB entries. Protein Sci 27(1):129–134
28. Lewinson O, Livnat-Levanon N (2017) Mechanism of action of ABC importers: conservation, divergence, and physiological adaptations. J Mol Biol 429(5):606–619
29. RCSB (2000) Protein Data Bank. http://www.rcsb.org/. Accessed 26 Feb 2020
30. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28:235–242
31. wwPDB (2003) Worldwide Protein Data Bank. http://www.wwpdb.org/. Accessed 26 Feb 2020
32. Berman H, Henrick K, Nakamura H (2003) Announcing the worldwide protein data bank. Nat Struct Mol Biol 10(12):980
33. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410
34. NIH, National Center for Biotechnology Information, U.S. National Library of Medicine (1990) BLAST >> blastp suite. https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins. Accessed 26 Feb 2020
35. Ye Y, Godzik A (2003) Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics 19(Suppl 2):ii246–ii255
36. Godzik Lab (2020) FATCAT. http://fatcat.godziklab.org/fatcat-cgi/cgi/fatcat.pl?func=search. Accessed 26 Feb 2020
37. EMBL-EBI (2013) PDBsum pictorial database of 3D structures in the protein databank. https://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=index.html. Accessed 26 Feb 2020
38. El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ, Salazar GA, Smart A, Sonnhammer ELL (2019) The Pfam protein families database in 2019. Nucleic Acids Res 47(D1):D427–D432
39. EMBL-EBI (2018) Pfam 32.0. https://pfam.xfam.org/. Accessed 26 Feb 2020
40. Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD (2008) InterPro: the integrative protein signature database. Nucleic Acids Res 37(Suppl 1):D211–D215
41. Dawson NL, Lewis TE, Das S, Lees JG, Lee D, Ashford P, Orengo CA, Sillitoe I (2017) CATH: an expanded resource to predict protein function through structure and sequence. Nucleic Acids Res 45(D1):D289–D295
42. CATH (2020) CATH/Gene3D v4.2. https://www.cathdb.info/. Accessed 26 Feb 2020
43. Fox NK, Brenner SE, Chandonia JM (2014) SCOPe: structural classification of proteins—extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Res 42(D1):D304–D309
44. Murzin AG, Brenner SE, Hubbard TJP, Chothia C (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247:536–540
45. Milburn D, Laskowski RA, Thornton JM (1998) Sequences annotated by structure: a tool to facilitate the use of structural information in sequence analysis. Protein Eng 11(10):855–859
46. Lipman DJ, Pearson WR (1985) Rapid and sensitive protein similarity searches. Science 227(4693):1435–1441
47. Ashkenazy H, Abadi S, Martz E, Chay O, Mayrose I, Pupko T, Ben-Tal N (2016) ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res 44(W1):W344–W350
48. Tian W, Chen C, Lei X, Zhao J, Liang J (2018) CASTp 3.0: computed atlas of surface topography of proteins. Nucleic Acids Res 46(W1):W363–W367
49. RASMOT-3D PRO (2009) Recursive Automatic Search of MOTif in 3D structures of PROteins. http://biodev.cea.fr/rasmot3d/. Accessed 26 Feb 2020
50. Debret G, Martel A, Cuniasse P (2009) RASMOT-3D PRO: a 3D motif search webserver. Nucleic Acids Res 37(Suppl 2):W459–W464
51. Zeng ZH, Castano AR, Segelke BW, Stura EA, Peterson PA, Wilson IA (1997) Crystal structure of mouse CD1: an MHC-like fold with a large hydrophobic binding groove. Science 277(5324):339–345
52. Fremont DH, Matsumura M, Stura EA, Peterson PA, Wilson IA (1992) Crystal structures of two viral peptides in complex with murine MHC class I H-2Kb. Science 257(5072):919–927
53. El-Etr SH, Margolis JJ, Monack D, Robison RA, Cohen M, Moore E, Rasley A (2009) Francisella tularensis type A strains cause the rapid encystment of Acanthamoeba castellanii and survive in amoebal cysts for three weeks postinfection. Appl Environ Microbiol 75(23):7488–7500
54. Feld GK, El-Etr S, Corzett MH, Hunter MS, Belhocine K, Monack DM, Frank M, Segelke BW, Rasley A (2014) Structure and function of REP34 implicates carboxypeptidase activity in Francisella tularensis host cell invasion. J Biol Chem 289(44):30668–30679
55. PDB id: 3b2y, Joint Center for Structural Genomics (JCSG) (2007) Crystal structure of metallopeptidase containing co-catalytic metalloactive site (YP_563529.1) from Shewanella denitrificans OS217 at 1.74 Å resolution. https://doi.org/10.2210/pdb3B2Y/pdb
56. Otero A, Rodríguez de la Vega M, Tanco S, Lorenzo J, Avilés FX, Reverter D (2012) The novel structure of a cytosolic M14 metallocarboxypeptidase (CCP) from Pseudomonas aeruginosa: a model for mammalian CCPs. FASEB J 26(9):3754–3764
57. PDB id: 2omo, Osipiuk J, Evdokimova E, Kagan O, Savchenko A, Edwards A, Joachimiak A, Midwest Center for Structural Genomics (MCSG) (2007) Putative antibiotic biosynthesis monooxygenase from Nitrosomonas europaea. https://doi.org/10.2210/pdb2OMO/pdb
58. PDB id: 2gff, de Carvalho-Kavanagh M, Schafer J, Lekin T, Toppani D, Chain P, Lao V, Motin V, Garcia E, Segelke B (2007) Crystal structure of Yersinia pestis LsrG. https://doi.org/10.2210/pdb2GFF/pdb
59. Marques JC, Lamosa P, Russell C, Ventura R, Maycock C, Semmelhack MF, Miller ST, Xavier KB (2011) Processing the interspecies quorum-sensing signal autoinducer-2 (AI-2): characterization of phospho-(S)-4,5-dihydroxy-2,3-pentanedione isomerization by LsrG protein. J Biol Chem 286(20):18331–18343
60. Lemieux MJ, Ference C, Cherney MM, Wang M, Garen C, James MN (2005) The crystal structure of Rv0793, a hypothetical monooxygenase from M. tuberculosis. J Struct Funct Genom 6(4):245–257
61. PDB id: 3f44, Joint Center for Structural Genomics (JCSG) (2008) Crystal structure of putative monooxygenase (YP_193413.1) from Lactobacillus acidophilus NCFM at 1.55 Å resolution. https://doi.org/10.2210/pdb3F44/pdb
62. PDB id: 3kkf, Joint Center for Structural Genomics (JCSG) (2009) Crystal structure of putative antibiotic biosynthesis monooxygenase (NP_810307.1) from Bacteroides thetaiotaomicron VPI-5482 at 1.30 Å resolution. https://doi.org/10.2210/pdb3KKF/pdb
63. PDB id: 3mcs, Joint Center for Structural Genomics (JCSG) (2010) Crystal structure of putative monooxygenase (fn1347) from Fusobacterium nucleatum subsp. nucleatum ATCC 25586 at 2.55 Å resolution. https://doi.org/10.2210/pdb3MCS/pdb
64. PDB id: 3bm7, Joint Center for Structural Genomics (JCSG) (2007) Crystal structure of a putative antibiotic biosynthesis monooxygenase (cc_2132) from Caulobacter crescentus cb15 at 1.35 Å resolution. https://doi.org/10.2210/pdb3BM7/pdb
65. PDB id: 1r6y, Adams MA, Jia Z, Montreal-Kingston Bacterial Structural Genomics Initiative (BSGI) (2003) Crystal structure of YgiN from Escherichia coli. https://doi.org/10.2210/pdb1R6Y/pdb
66. PDB id: 1q8b, Zhang R, Joachimiak A, Edwards A, Savchenko A, Midwest Center for Structural Genomics (MCSG) (2003) Structural genomics, protein YJCS. https://doi.org/10.2210/pdb1Q8B/pdb
67. PDB id: 1x7v, Sanders DA, Walker JR, Skarina T, Gorodichtchenskaia E, Joachimiak A, Edwards A, Savchenko A, Midwest Center for Structural Genomics (MCSG) (2004) Crystal structure of PA3566 from Pseudomonas aeruginosa. https://doi.org/10.2210/pdb1X7V/pdb
68. PDB id: 2fb0, Nocek B, Hatzos C, Abdullah J, Collart F, Joachimiak A, Midwest Center for Structural Genomics (MCSG) (2006) Crystal structure of conserved protein of unknown function from Bacteroides thetaiotaomicron VPI-5482 at 2.10 Å resolution, possible oxidoreductase. https://doi.org/10.2210/pdb2FB0/pdb
69. PDB id: 2bbe, Chang C, Bigelow L, Joachimiak A, Midwest Center for Structural Genomics (MCSG) (2005) Crystal structure of protein SO0527 from Shewanella oneidensis. https://doi.org/10.2210/pdb2BBE/pdb
70. PDB id: 4dpo, Agarwal R, Chamala S, Evans R, Gizzi A, Hillerich B, Kar A, LaFleur J, Foti R, Siedel R, Zencheck W, Villigas G, Almo SC, Swaminathan S, New York Structural Genomics Research Consortium (NYSGRC) (2012) Crystal structure of a conserved protein MM_1583 from Methanosarcina mazei Go1. https://doi.org/10.2210/pdb4DPO/pdb
71. Sciara G, Kendrew SG, Miele AE, Marsh NG, Federici L, Malatesta F, Schimperna G, Savino C, Vallone B (2003) The structure of ActVA-Orf6, a novel type of monooxygenase involved in actinorhodin biosynthesis. EMBO J 22(2):205–215
72. Wada, Shirouzu T, Terada M, Kamewari T, Park Y, Tame SY, Kuramitsu JR, Yokoyama S (2004) Crystal structure of the conserved hypothetical protein TT1380 from Thermus thermophilus HB8. Proteins 55(3):778–780
73. Grocholski T, Koskiniemi H, Lindqvist Y, Mäntsälä P, Niemi J, Schneider G (2010) Crystal structure of the cofactor-independent monooxygenase SnoaB from Streptomyces nogalater: implications for the reaction mechanism. Biochemistry 49(5):934–944
74. Chim N, Iniguez A, Nguyen TQ, Goulding CW (2010) Unusual diheme conformation of the heme-degrading protein from Mycobacterium tuberculosis. J Mol Biol 395(3):595–608
75. PDB id: 4fca, Tan K, Zhou M, Kwon K, Anderson WF, Joachimiak A, Center for Structural Genomics of Infectious Diseases (CSGID) (2012) The crystal structure of a functionally unknown conserved protein from Bacillus anthracis str. Ames. https://doi.org/10.2210/pdb4FCA/pdb
76. PDB id: 4fgm, Vorobiev S, Su M, Tong T, Kohan E, Wang D, Everett JK, Acton TB, Montelione GT, Tong L, Hunt JF, Northeast Structural Genomics Consortium (NESGC) (2012) Crystal structure of the aminopeptidase n family protein q5qty1 from Idiomarina loihiensis, Northeast structural genomics consortium target ilr60. https://doi.org/10.2210/pdb4FGM/pdb
77. Segelke B, Knapp M, Kadkhodayan S, Balhorn R, Rupp B (2004) Crystal structure of Clostridium botulinum neurotoxin protease in a product-bound state: evidence for noncanonical zinc protease activity. Proc Natl Acad Sci 101(18):6888–6893
78. PDB id: 3u9w, Niegowski D, Thunnissen M, Tholander F, Rinaldo-Matthis A, Muroya A, Haeggstrom JZ (2012) Structure of human leukotriene A4 hydrolase in complex with inhibitor sc57461a. https://doi.org/10.2210/pdb3U9W/pdb
79. Rawlings ND, Barrett AJ (1995) Evolutionary families of metallopeptidases. Methods Enzymol 248:183–228
80. Guzenko D, Burley SK, Duarte JM (2020) Real time structural search of the Protein Data Bank. PLoS Comput Biol 16(7):e1007970
Chapter 12
Abstract
The MetaFlux software supports creating, executing, and solving quantitative metabolic flux models using
flux balance analysis (FBA). MetaFlux offers four modes of operation: (1) solving mode executes an FBA
model for an individual organism or for an organism community, (2) gene knockout mode executes an FBA
model with one or many gene knockouts, (3) development mode assists the user in creating and improving
FBA models, and (4) flux variability analysis mode generates a report of the robustness of an FBA model.
MetaFlux also solves dynamic FBA (dFBA) for both individual organisms and communities of organisms.
MetaFlux can be used in two different environments: on your local computer, which requires the installa-
tion of the Pathway Tools software, or through the web, which does not require installation of Pathway
Tools. On your local computer, MetaFlux offers all four modes of operation, whereas the web environment
provides only the solving mode.
Several visualization tools are available to analyze model solutions. The Cellular Overview tool graphi-
cally shows the reaction fluxes on an organism’s metabolic map once a model is solved. The Omics
Dashboard provides a hierarchical approach to visualizing reaction fluxes, organized by metabolic sub-
systems. For a community of organisms, plotting of accumulated biomasses and metabolites can be
performed using the Gnuplot tool.
In this chapter, we present eight methods using MetaFlux. Five solving mode methods illustrate execu-
tion of models for individual organisms and for organism communities. One method illustrates the gene
knockout mode. Two methods for the development mode illustrate steps for developing new metabolic
models.
Keywords Flux balance analysis, FBA, Solver, Gene knockout, Metabolic model, Steady-state,
Genome-scale model, Dynamic FBA, Community modeling, COBRA methods
1 Introduction
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349,
https://doi.org/10.1007/978-1-0716-1585-0_12, © Springer Science+Business Media, LLC, part of Springer Nature 2022
260 Mario Latendresse et al.
interfaces for those who desire them. MetaFlux models are also highly accessible to scientists because they are represented within PGDBs, which in turn can be queried, inspected, and visualized with an extensive set of web-based tools that support searching for genes, reactions, and metabolites and that provide extensive information pages for these entities (for example, the metabolite information page lists all reactions and pathways that produce and consume a given metabolite). In addition, the PGDB representation enriches a MetaFlux model with information that makes it easier to understand, such as chemical structures, reaction atom mappings (which identify, for each atom in each reactant compound, the corresponding atom in a product compound), regulatory information, and linkages to protein and nucleic acid sequences for the organism. Finally, a model’s outputs can be visualized through several different tools that enable more rapid understanding of model results. A highly accessible model is understood more quickly, is easier to reuse and modify, and is more effectively validated.
A second advantage of MetaFlux is its extensive set of software
tools for aiding the user during model development. One of the
most time-consuming aspects of model development is enabling
the model to produce all biomass metabolites from the nutrients.
Whereas most tools have a reaction gap-filler only, MetaFlux con-
tains a multiple gap-filler that proposes not only new reactions to
add to a model, but also new nutrients and secretions to add.
Further, the MetaFlux gap-filler identifies which biomass metabo-
lites cannot be produced by the model, so that the user knows
which biomass metabolites to focus their model-debugging efforts
on; other tools simply report that no solution can be found without
identifying the problematic biomass metabolites. We have studied
the accuracy of the MetaFlux gap-filler by randomly removing
reactions from an existing model and running the MetaFlux
gap-filler against those gaps [7], and by comparing manually
curated gap-filling results against MetaFlux results [8]. For the metabolites that are not produced by the model, MetaFlux produces a blocked-reaction/metabolite report that lists, for each non-produced metabolite, the reactions that could produce it but are blocked because a required reactant is absent, together with those blocking reactants. MetaFlux also identifies unbalanced reactions.
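A simplified way to see which biomass metabolites cannot be produced, and what blocks them, is forward network expansion: starting from the nutrients, repeatedly fire any reaction whose reactants are all available. This is not the MetaFlux gap-filler, which solves an optimization problem; the sketch ignores stoichiometry and reversibility, and cofactors that must already be present in order to be regenerated (such as ATP) would have to be added to the starting set. The function names are hypothetical.

```python
def producible_metabolites(reactions, nutrients):
    """Forward network expansion from the nutrient set.

    reactions: dict reaction_id -> (set of reactants, set of products), treated as
    irreversible. A reaction fires once all of its reactants are available, and
    its products then become available.
    """
    available = set(nutrients)
    fired = set()
    changed = True
    while changed:
        changed = False
        for rid, (reactants, products) in reactions.items():
            if rid not in fired and set(reactants) <= available:
                fired.add(rid)
                available |= set(products)
                changed = True
    return available


def non_produced_report(reactions, nutrients, biomass):
    """For each biomass metabolite that cannot be made, list the reactions that
    would produce it and the missing reactants that block each of them."""
    available = producible_metabolites(reactions, nutrients)
    report = {}
    for met in sorted(set(biomass) - available):
        report[met] = {
            rid: sorted(set(reactants) - available)
            for rid, (reactants, products) in sorted(reactions.items())
            if met in products
        }
    return report


if __name__ == "__main__":
    toy = {  # made-up toy network
        "R1": ({"glc"}, {"g6p"}),
        "R2": ({"g6p"}, {"f6p"}),
        "R3": ({"f6p", "x"}, {"trp"}),
        "R4": ({"g6p"}, {"ala"}),
    }
    print(non_produced_report(toy, {"glc"}, {"ala", "trp"}))  # {'trp': {'R3': ['x']}}
```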
Starting with Pathway Tools version 23.0, two new features
have been added to the installed version of MetaFlux. First, you can
now perform flux variability analysis (FVA) in MetaFlux once you
have a model that solves in FBA. FVA can be used to determine the
robustness of metabolic models under various simulated growth
conditions. This is done by solving a maximization and a minimi-
zation optimization problem on a set of reactions (all reactions are
Fig. 1 The graphical user interface of MetaFlux in Pathway Tools. The top pane contains buttons to create an
FBA template file or to select a specific input file from the local machine, a mode selector, and a button to
execute the run. The middle pane will contain a summary of the resulting run and buttons to access files
generated as a result of the run (e.g., log file and solution file). The bottom pane is used to display messages
and traces of execution
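For reference, the optimization problems behind FBA and FVA can be written as linear programs. The notation below is the standard one from the FBA literature rather than taken from MetaFlux documentation: $S$ is the stoichiometric matrix (metabolites × reactions), $v$ the vector of reaction fluxes with lower and upper bounds $v^{\mathrm{lb}}$ and $v^{\mathrm{ub}}$, $c$ the objective weights (for example, selecting the biomass reaction), and $J$ the set of reactions analyzed. The fraction $\gamma$ of the optimal objective that must be maintained is a common FVA convention and an assumption here, not a documented MetaFlux parameter.

$$
\begin{aligned}
Z^{*} = \max_{v}\;& c^{\top} v\\
\text{subject to}\;& S\,v = 0,\\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}
\end{aligned}
$$

$$
\begin{aligned}
v_j^{\min} = \min_{v}\; v_j, \quad v_j^{\max} = \max_{v}\; v_j \quad & \text{for each } j \in J,\\
\text{subject to}\;& S\,v = 0,\quad v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}},\\
& c^{\top} v \ge \gamma\, Z^{*}, \quad 0 \le \gamma \le 1
\end{aligned}
$$

FVA therefore solves $2|J|$ linear programs, one minimization and one maximization per reaction; a narrow interval $[v_j^{\min}, v_j^{\max}]$ indicates a flux that is tightly constrained under the simulated growth conditions.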
Metabolic Modeling with MetaFlux 263
2 Materials
3 Methods
Fig. 2 The Cellular Overview with reactions colored according to their fluxes once a model is solved. The
reactions are grouped in metabolic pathways, and the pathways are grouped into classes of pathways. Due to space limitations, we show only the pathways; reactions not assigned to pathways, which appear to the right of the pathways, are not shown. Based on the mapping from fluxes to colors shown in the legend, the cofactor biosynthesis pathways (purple) toward the upper left clearly carry less flux than glycolysis and the TCA cycle (yellow), which are shown in the center. Users can interact with the Cellular Overview by zooming in and out, by mousing over compounds and reactions to open tooltips describing them, and through other options
Fig. 3 The upper portion of the Omics Dashboard shows metabolite uptake and secretion rates, the cellular
growth rate, and ATP production rate. The appearance of each histogram (plot) can be customized using the
“Options” tab
Fig. 4 The lower portion of the Omics Dashboard summarizes flux through different metabolic systems
but rather details about the model before it was executed. You can
open the log file with your favorite text editor. In the first part of
the file is a list of reactions that were not included in the model
because these reactions were found to have some issues. For exam-
ple, a reaction that is not mass balanced is not included in the
model. Each of these reactions is preceded by a key-code (e.g.,
[unbalanced]) giving the reason for not including the reaction in
the model. The full list of key-codes and their meanings is given at the beginning of the log file.
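Because unbalanced reactions are excluded from the model, it can help to see what a balance check involves. The sketch below compares element counts on both sides of a reaction. It is an illustration only, with hypothetical helper names: it handles simple formulas without parentheses or charges and does not check charge balance, so it should not be read as the exact test MetaFlux applies. The hexokinase example uses neutral formulas for glucose, ATP, glucose-6-phosphate, and ADP.

```python
import re
from collections import Counter

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Count elements in a simple formula such as 'C6H12O6' (no parentheses or charges)."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = Counter()
    for element, n in _TOKEN.findall(formula):
        counts[element] += int(n) if n else 1
    return counts


def element_totals(side):
    """side: list of (stoichiometric coefficient, formula) pairs."""
    total = Counter()
    for coeff, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coeff * n
    return total


def imbalance(reactants, products):
    """Return {element: products - reactants} for every element that does not balance."""
    left, right = element_totals(reactants), element_totals(products)
    diff = {}
    for element in set(left) | set(right):
        delta = right[element] - left[element]
        if delta != 0:
            diff[element] = delta
    return diff


if __name__ == "__main__":
    # glucose + ATP -> glucose-6-phosphate + ADP (neutral formulas)
    print(imbalance([(1, "C6H12O6"), (1, "C10H16N5O13P3")],
                    [(1, "C6H13O9P"), (1, "C10H15N5O10P2")]))  # {}
    # ATP omitted: the reaction is flagged as unbalanced in H, O, and P.
    print(sorted(imbalance([(1, "C6H12O6")], [(1, "C6H13O9P")]).items()))
```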
After the problematic reactions, the log file lists the reactions
included in the model. That list is divided into three sections: one for non-generic, non-instantiated reactions, one for generic reactions, and one for instantiated reactions (that is, reactions instantiated from the generic reactions). A generic reaction is a reaction that has at least one substrate that is a compound class (e.g., “an amino acid”). A compound class contains several compound instances (e.g., L-tryptophan). An instantiated reaction is generated from a generic reaction by replacing each compound class with a specific compound instance of that class.
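The relationship between generic and instantiated reactions can be sketched with a small data model. Here a reaction is a dictionary from compound names to stoichiometric coefficients (negative for reactants), and each binding assigns one instance to every class in the reaction. Explicit bindings matter when a reaction contains more than one class, as in the transamination example below, where each 2-oxo acid must be paired with its corresponding amino acid; taking all combinations blindly would create nonsensical reactions. How MetaFlux itself pairs instances is not described in this chapter, so treat this as an assumption-laden sketch with hypothetical names.

```python
def is_generic(reaction, classes):
    """A reaction is generic if any of its compounds is a compound class."""
    return any(compound in classes for compound in reaction)


def instantiate(reaction, classes, bindings):
    """Create one instantiated reaction per binding.

    reaction: dict compound -> stoichiometric coefficient (negative = reactant).
    classes: dict class name -> set of instance names.
    bindings: list of dicts, each mapping every class in `reaction` to one instance.
    """
    instantiated = []
    for binding in bindings:
        new = {}
        for compound, coeff in reaction.items():
            if compound in classes:
                instance = binding[compound]  # KeyError if the binding misses a class
                if instance not in classes[compound]:
                    raise ValueError(f"{instance!r} is not an instance of {compound!r}")
                compound = instance
            new[compound] = new.get(compound, 0) + coeff
        # Drop compounds whose coefficients cancel (same instance on both sides).
        instantiated.append({c: n for c, n in new.items() if n != 0})
    return instantiated


if __name__ == "__main__":
    classes = {
        "a 2-oxo carboxylate": {"pyruvate", "oxaloacetate"},
        "an L-amino acid": {"L-alanine", "L-aspartate"},
    }
    transamination = {"a 2-oxo carboxylate": -1, "L-glutamate": -1,
                      "an L-amino acid": 1, "2-oxoglutarate": 1}
    bindings = [
        {"a 2-oxo carboxylate": "pyruvate", "an L-amino acid": "L-alanine"},
        {"a 2-oxo carboxylate": "oxaloacetate", "an L-amino acid": "L-aspartate"},
    ]
    for rxn in instantiate(transamination, classes, bindings):
        print(rxn)
```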
Finally, the last section of the log file lists blocked reactions (reactions that cannot proceed because one of their reactants is not present) and blocked metabolites (metabolites that cannot be produced because all the reactions that produce them are blocked). A blocked reaction must have zero flux, and blocked reactions are not included in the model. Blocked reactions are determined recursively, starting from a set of basic blocked reactions: reactions that either (a) have a reactant that is not produced by another reaction and is not included as a nutrient or (b) have a product that is not consumed by another reaction and is not a secretion or biomass metabolite. Removing these basic blocked reactions leaves further metabolites that can no longer be produced or consumed, and these in turn block more reactions.
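The recursive search for blocked reactions can be pictured as repeated pruning until nothing changes. The sketch below is a minimal illustration under stated assumptions, not the MetaFlux implementation: it uses a hypothetical toy format in which each reaction is a pair of reactant and product sets, treats every reaction as irreversible, and uses made-up reaction and metabolite names.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy format: reaction id -> (reactants, products); all irreversible.
Reaction = Tuple[Set[str], Set[str]]


def find_blocked(
    reactions: Dict[str, Reaction],
    nutrients: Set[str],
    sinks: Set[str],  # secretions plus biomass metabolites
) -> Tuple[Set[str], Set[str]]:
    """Repeatedly prune basic blocked reactions until nothing changes."""
    active = dict(reactions)
    blocked: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for rid, (reactants, products) in list(active.items()):
            others = [rxn for other_id, rxn in active.items() if other_id != rid]
            made_elsewhere = set().union(*(p for _, p in others))
            used_elsewhere = set().union(*(r for r, _ in others))
            # (a) a reactant that no other reaction produces and is not a nutrient
            no_input = any(m not in nutrients and m not in made_elsewhere for m in reactants)
            # (b) a product that no other reaction consumes and is not a sink
            dead_end = any(m not in sinks and m not in used_elsewhere for m in products)
            if no_input or dead_end:
                blocked.add(rid)
                del active[rid]
                changed = True
    # Metabolites that neither the medium nor any surviving reaction can supply.
    producible = set(nutrients).union(*(p for _, p in active.values()))
    all_metabolites = set().union(*(r | p for r, p in reactions.values()))
    return blocked, all_metabolites - producible


# Toy example: R3 needs "xyz", which nothing supplies; R4 then loses its input "abc".
TOY = {
    "R1": ({"glc"}, {"g6p"}),
    "R2": ({"g6p"}, {"biomass_x"}),
    "R3": ({"xyz"}, {"abc"}),
    "R4": ({"abc"}, {"def"}),
}
blocked_rxns, blocked_mets = find_blocked(TOY, nutrients={"glc"}, sinks={"biomass_x"})
print(sorted(blocked_rxns), sorted(blocked_mets))
```

Because this toy ignores reversibility, a real implementation would also need to consider both directions of reversible reactions before declaring a metabolite unusable.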
Metabolic Modeling with MetaFlux 269
Fig. 6 Fluxes through the two variants of the glycolysis pathway present in E. coli
2. Use the Quick Search box below the top menu to search for a
metabolite (example: “pyruvate”), a reaction (example:
“4.1.2.8”), a pathway (example: “glycolysis”), or a gene
(example: “trpa”). Click on an object in the Quick Search
results page to see its information page.
3. From an object information page, click on other related objects
to navigate to their information pages. For example, from a
pathway page, click on a metabolite or gene name to see their
pages; from a reaction page, click on a metabolite name to see
its page.
Method 1b: Executing a model with gene knockouts
We run a model with one or more gene knockouts to determine the effect of the knockouts on the growth rate of an organism. A knockout could reduce the growth rate or eliminate growth. We could specify a few genes to knock out or try all the genes of an organism. In this method, we will knock out all the genes, one by one, to find which genes are essential for E. coli based on the model described by the file ecocyc-23.0-gem-cs-glucose-tea-oxygen.fba.
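The one-gene-at-a-time screen can be illustrated on a toy network. This sketch is not MetaFlux: instead of solving an FBA linear program, it uses a simple producibility (network-expansion) test as a hypothetical stand-in for "the model still grows". It also assumes that the genes listed for a reaction are isozymes, so a reaction is lost only when none of its genes remain. All reaction and gene names are invented.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy format: reaction id -> (reactants, products); irreversible.
Reaction = Tuple[Set[str], Set[str]]


def producible(reactions: Dict[str, Reaction], nutrients: Set[str]) -> Set[str]:
    """Network expansion: keep firing reactions whose reactants are all available."""
    available = set(nutrients)
    grew = True
    while grew:
        grew = False
        for reactants, products in reactions.values():
            if reactants <= available and not products <= available:
                available |= products
                grew = True
    return available


def single_knockout_screen(
    reactions: Dict[str, Reaction],
    genes_of: Dict[str, Set[str]],  # reaction id -> isozyme genes (OR logic)
    nutrients: Set[str],
    biomass: Set[str],
) -> List[str]:
    """Knock out each gene alone and return the genes whose loss blocks biomass."""
    all_genes = sorted(set().union(*genes_of.values()))
    essential = []
    for gene in all_genes:
        # A reaction survives if it has no genes (spontaneous) or another gene encodes it.
        kept = {
            rid: rxn
            for rid, rxn in reactions.items()
            if not genes_of.get(rid) or genes_of[rid] - {gene}
        }
        if not biomass <= producible(kept, nutrients):
            essential.append(gene)
    return essential


REACTIONS = {
    "R_A": ({"glc"}, {"g6p"}),
    "R_B": ({"g6p"}, {"f6p"}),
}
GENES_OF = {"R_A": {"g1", "g2"}, "R_B": {"g3"}}
print(single_knockout_screen(REACTIONS, GENES_OF, {"glc"}, {"g6p", "f6p"}))
```

Genes g1 and g2 back each other up, so only g3 is reported as essential in this toy.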
Fig. 8 This popup window enables selecting metabolites and organisms for the 2D plots
Fig. 9 Two plots over 12 h for the entire grid: one for the accumulation of the biomasses of two organisms and one for the accumulation of four metabolites
Fig. 10 Two plots showing the accumulated production (positive values) and consumption (negative values) of four metabolites for the two organisms of a dynamic FBA over 12 h for the entire grid
Fig. 11 Accumulation of the biomass of the two E. coli models and the uptake of oxygen over the grid
Fig. 12 Accumulation in mmol of four metabolites (acetate, oxygen, beta-D-glucopyranose, CO2) over the 5 × 5 grid
Fig. 13 On the left are the accumulations of the biomasses of two organisms in a 5 × 5 grid at four different steps (beginning of steps 1, 5, and 10; end of step 12). On the right are the accumulations of two metabolites in the same grid at the same steps
Fig. 14 The main model-execution web page after the Tools menu command Metabolism → Run Metabolic Model has been selected
Fig. 15 The page that displays the description of the public model cs_glucose_tea_oxygen
reactions with their fluxes is directly shown on the web page. The unique identifiers of the reactions are clickable for further study. Clicking the button Show Solution File opens a new browser tab showing the same solution file as described for the earlier methods. Similarly, the button Show Log File shows the entire log file. Clicking the button Show Fluxes on the Cellular Overview opens a new tab showing the Cellular Overview with the reactions highlighted according to the fluxes of the solved model. The Cellular Overview functionality on the web is similar, but not identical, to that of the Cellular Overview on the desktop platform of Pathway Tools. The model can be analyzed, modified, and rerun, as addressed in the next method.
Fig. 16 The content of the Results tab after executing the model cs_glucose_tea_oxygen; additional reaction
fluxes appear as the user scrolls down the page
Fig. 17 The Nutrients tab of the newly copied model. The full list of nutrients is not shown due to space
limitations
3.2 Creating and Completing Models
The previous methods in Methods 1 used previously created metabolic models. In this section, we introduce the creation of new models.
Method 2a: creating a basic FBA model specification
This method will show how to create a basic FBA specification
file (or FBA file for short) for any PGDB; here our example organ-
ism will be E. coli. This initial FBA specification will contain a very
basic biomass specification (37 compounds out of the 85 com-
pounds produced by the full model). The following method uses
the EcoCyc PGDB for which we already have a model, but it will
illustrate a simple model-creation method that should work for any
PGDB (i.e., for any organism for which a PGDB has been created).
A template FBA file will include a suggested try-biomass section based on the taxonomic classification of the current PGDB. We have predefined a set of 27 biomass templates that are stored in MetaCyc. The metabolite composition of each template was defined by a curator after reading published biomass information found in the literature. The templates are intended to represent a starting point for the biomass at the phylum level. Given the current PGDB, a simple algorithm ascends the NCBI taxonomy tree until it reaches the phylum under which this PGDB resides. If this phylum taxon node points to a biomass template, that template is retrieved. If this search fails (for a phylum for which we have no information available), a very generic template is returned, which simply consists of the amino acids, nucleotides, and a few universal cofactors.
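The template lookup described above amounts to walking up parent links until a phylum is found. The sketch below assumes a hypothetical dictionary-based taxonomy (child to parent, taxon to rank) and a made-up template; it is an abbreviated toy fragment with intermediate ranks omitted, not the MetaCyc data or the Pathway Tools code.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the generic fallback (amino acids, nucleotides, cofactors).
GENERIC_TEMPLATE = ["amino acids", "nucleotides", "universal cofactors"]


def pick_biomass_template(
    taxon: str,
    parent: Dict[str, Optional[str]],  # child taxon -> parent taxon (None at the root)
    rank: Dict[str, str],              # taxon -> rank name
    templates: Dict[str, List[str]],   # phylum -> curated biomass template
) -> List[str]:
    """Walk up the taxonomy to the phylum; fall back to the generic template."""
    node: Optional[str] = taxon
    while node is not None:
        if rank.get(node) == "phylum":
            return templates.get(node, GENERIC_TEMPLATE)
        node = parent.get(node)
    return GENERIC_TEMPLATE  # no phylum found on the path to the root


# Abbreviated toy taxonomy and an invented template, for illustration only.
PARENT = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Gammaproteobacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Proteobacteria": "Bacteria",
    "Bacillus subtilis": "Bacillus",
    "Bacillus": "Firmicutes",
    "Firmicutes": "Bacteria",
    "Bacteria": None,
}
RANK = {"Proteobacteria": "phylum", "Firmicutes": "phylum"}
TEMPLATES = {"Proteobacteria": ["peptidoglycan", "lipopolysaccharide", "amino acids"]}
print(pick_biomass_template("Escherichia coli", PARENT, RANK, TEMPLATES))
```

In this toy, Firmicutes has no template, so B. subtilis falls back to the generic list, mirroring the fallback behavior described in the text.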
A functional FBA model will take up the specified nutrients and process them via the metabolic reactions provided in the PGDB to produce the biomass metabolites and the secretions. During model development, all four model components (nutrients, reactions, secretions, and biomass metabolites) will probably need adjustment. If all biomass metabolites are produced by the model, we say that the model solves. If even one biomass metabolite cannot be produced by the model, we say that the model does not solve. The model can fail to produce a biomass metabolite for fairly obvious reasons, such as a missing nutrient, a missing reaction, or a reaction whose specified directionality is the opposite of what it should be. It can also fail for less obvious reasons, such as a missing secretion (which blocks the reactions that produce the secreted compound, because that compound has nowhere to go).
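A rough first diagnosis of why a model does not solve is to ask which biomass metabolites are unreachable from the nutrients at all. The sketch below is a hypothetical, necessary-but-not-sufficient check, not part of MetaFlux: it ignores stoichiometry and mass balance (so it cannot detect the missing-secretion problem, which requires FBA), but it does show how a reaction written in the wrong direction hides a biomass metabolite.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy format: reaction id -> (left side, right side, reversible?)
Reaction = Tuple[Set[str], Set[str], bool]


def unproducible_biomass(
    reactions: Dict[str, Reaction], nutrients: Set[str], biomass: Set[str]
) -> List[str]:
    """Return the biomass metabolites that network expansion cannot reach."""
    steps = []
    for left, right, reversible in reactions.values():
        steps.append((left, right))
        if reversible:
            steps.append((right, left))  # reversible reactions may also run backwards
    available = set(nutrients)
    grew = True
    while grew:
        grew = False
        for inputs, outputs in steps:
            if inputs <= available and not outputs <= available:
                available |= outputs
                grew = True
    return sorted(biomass - available)


# "R2" is declared irreversible in the wrong direction, so alanine cannot be made.
MODEL = {
    "R1": ({"glc"}, {"pyr"}, False),
    "R2": ({"ala"}, {"pyr", "nh4"}, False),
}
print(unproducible_biomass(MODEL, {"glc", "nh4"}, {"pyr", "ala"}))  # ['ala']
```

Marking R2 as reversible (or reversing it) makes the list empty, which corresponds to the "wrong directionality" failure mode named above.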
a. GLC[cytosol].
b. OXYGEN-MOLECULE[cytosol].
c. AMMONIUM[cytosol].
d. SULFATE[cytosol].
e. Pi[cytosol].
f. FE+2[cytosol].
g. CA+2[cytosol].
h. MG+2[cytosol].
i. MN+2[cytosol].
j. NA+[cytosol].
the reactions that you are adding can indeed create growth, it is advisable to execute the model in solving mode (i.e., without the try-reactions section) once the new reactions are added, to confirm that growth is truly obtained.
Acknowledgments
References
1. Karp PD, Latendresse M, Paley SM, Krummenacker M, Ong QD, Billington R, Kothari A, Weaver D, Lee T, Subhraveti P, Spaulding A, Fulcher C, Keseler IM, Caspi R (2016) Pathway tools version 19.0 update: software for pathway/genome informatics and systems biology. Brief Bioinform 17:877–890
2. Mahadevan R, Edwards JS, Doyle FJ (2002) Dynamic flux balance analysis of diauxic growth in Escherichia coli. Biophys J 83:1331–1340
3. Mahadevan R, Schilling CH (2003) The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng 5:264–276
4. Dandekar T, Fieselmann A, Majeed S, Ahmed Z (2014) Software applications toward quantitative metabolic flux analysis and modeling. Brief Bioinform 15:91–107
5. Lakshmanan M, Koh G, Chung BK, Lee DY (2014) Software applications for flux balance analysis. Brief Bioinform 15:108–122
6. Heirendt L, Arreckx S, Pfau T, Mendoza SN, Richelle A, Heinken A, Haraldsdóttir HS, Wachowiak J, Keating SM, Vlasov V, Magnusdóttir S, Ng CY, Preciat G, Žagare A, Chan SHJ, Aurich MK, Clancy CM, Modamio J, Sauls JT, Noronha A, Bordbar A, Cousins B, Assal DCE, Valcarcel LV, Apaolaza I, Ghaderi S, Ahookhosh M, Guebila MB, Kostromins A, Sompairac N, Le HM, Ma D, Sun Y, Wang L, Yurkovich JT, Oliveira MAP, Vuong PT, Assal LPE, Kuperstein I, Zinovyev A, Hinton HS, Bryant WA, Artacho FJA, Planes FJ, Stalidzans E, Maass A, Vempala S, Hucka M, Saunders MA, Maranas CD, Lewis NE, Sauter T, Palsson BØ, Thiele I, Fleming RMT (2019) Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nat Protoc 14:639
7. Latendresse M, Karp P (2018) Evaluation of reaction gap-filling accuracy by randomization. BMC Bioinformatics 19:53
8. Karp PD, Weaver D, Latendresse M (2018) How accurate is automated gap filling of metabolic models? BMC Syst Biol 12:93
9. Caspi R, Billington R, Fulcher CA, Keseler IM, Kothari A, Krummenacker M, Latendresse M, Midford PE, Ong Q, Ong WK, Paley S, Subhraveti P, Karp PD (2018) The MetaCyc database of metabolic pathways and enzymes. Nucleic Acids Res 46:D633–D639
Chapter 13
Abstract
The DOE Systems Biology Knowledgebase (KBase) platform offers a range of powerful tools for the
reconstruction, refinement, and analysis of genome-scale metabolic models built from microbial isolate
genomes. In this chapter, we describe and demonstrate these tools in action with an analysis of isoprene
production in the Bacillus subtilis DSM genome. Two different methods are applied to build initial metabolic models for the DSM genome, and the models are then gapfilled in three different growth conditions. Next, flux balance analysis (FBA) and flux variability analysis (FVA) techniques are applied both to study the growth of these models in minimal media and to classify reactions within each model based on essentiality and functionality. The models are applied with the FBA method to predict essential genes, which are then compared to an updated list of essential genes obtained for B. subtilis 168, a strain very similar to the DSM isolate. The models are also applied to simulate Biolog growth conditions, and these results are compared with Biolog data collected for B. subtilis 168. Finally, the DSM metabolic models are applied to explore the pathways and genes responsible for producing isoprene in this strain. These studies demonstrate the accuracy and utility of models generated from the KBase pipelines and explore the tools available for analyzing these models.
Key words Metabolic models, Draft models, Genome-scale reconstruction, Flux balance analysis,
DOE knowledgebase
1 Introduction
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349,
https://doi.org/10.1007/978-1-0716-1585-0_13, © Springer Science+Business Media, LLC, part of Springer Nature 2022
292 Benjamin H. Allen et al.
1.1 Background Description of Metabolic Models
Before diving into our KBase workflow, it is useful to provide some background information on genome-scale metabolic models. These models are meant to represent all the metabolic pathways encoded by the genes of an organism. Thus, metabolic models always include a stoichiometric matrix of metabolic reactions and their associated biochemical compounds. Models also include data specifying how each reaction depends upon the genes encoding it, called gene-protein-reaction (GPR) associations. The GPR associations in a model allow it to differentiate between
Application of the Metabolic Modeling Pipeline in KBase to Categorize. . . 293
2 Materials
3 Methods
3.1 Model Reconstruction from Microbial Genomes
Before flux balance analysis can be used to study the metabolic capabilities of an organism (or microbiome), a metabolic model must first be constructed for that organism. The programs for building these models based on genome sequencing data are described in detail in previous publications [12, 13], so only a brief description is included here. KBase includes two apps for constructing new genome-scale metabolic models for a microbial genome. The first of these apps, called Build Metabolic Model, constructs a new model from scratch based solely on the SEED-based functional annotations found in the input genome. Only genomes annotated externally to KBase by RAST [14] or PATRIC [15] or annotated within KBase by the Annotate Microbial Assembly or Annotate Microbial Genome apps (see Fig. 2) will have SEED annotations [16].
If a genome comes from any other source (e.g., IMG, Gen-
Bank, RefSeq), it is essential that the genome be first annotated by
one of the aforementioned mechanisms prior to building a model
of the genome using the Build Metabolic Model app (Fig. 3). This
app maps reactions to genes in the genome based on the annotated
functions of those genes. These reactions include both intracellular
and transport reactions, and full GPR associations are constructed
for each reaction in the model. A biomass producing reaction is also
constructed for the organism, with a different biomass being used
for gram-negative and gram-positive bacterial genomes. Gapfilling
of the draft model is optional, although this feature is active by
default. The gapfilling step of this process is described later in this
chapter.
Fig. 2 Annotating a Bacillus subtilis DSM genome with RAST in KBase using the Annotate Microbial Genome
app; subsequent results table shown above
Fig. 3 Building a draft metabolic model with the Bacillus subtilis DSM genome using the Build Metabolic Model
app in KBase with Carbon-D-Glucose as the gapfilling media; subsequent results table shown above
One caveat to using a tool like the Build Metabolic Model app to
construct a new model directly from annotations is that such mod-
els often require significant manual curation and refinement before
they can be truly used in a predictive capacity. A second model
reconstruction app in KBase, called Propagate Model to New
Genome (see Fig. 4), seeks to overcome this drawback by propagat-
ing an already curated and refined model published for a taxonomi-
cally similar genome to a genome of a new organism of interest.
This propagated model will share any reactions from the original
model for which corresponding orthologous genes could be found
between the original and new genome. The gene orthology rela-
tionships used by this app are generated by another app called
Compare Two Proteomes (see Fig. 5), which uses BLAST to compute
bidirectional-best-hits between the proteins in two input genomes.
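The bidirectional-best-hit idea, and the way an ortholog inherits its partner's reaction associations, can be sketched as follows. This is an illustration under stated assumptions, not the Compare Two Proteomes or Propagate Model to New Genome code: it assumes BLAST bit scores have already been computed in both directions, and all gene and reaction identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query gene, subject gene) -> BLAST bit score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Best-scoring subject for each query (ties keep the first pair seen)."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), bits in scores.items():
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(new_vs_ref: Scores, ref_vs_new: Scores) -> Dict[str, str]:
    """Pairs (new gene -> reference gene) that are each other's best hit."""
    forward, reverse = best_hits(new_vs_ref), best_hits(ref_vs_new)
    return {new: ref for new, ref in forward.items() if reverse.get(ref) == new}


def propagate(
    ref_gene_reactions: Dict[str, List[str]], orthologs: Dict[str, str]
) -> Dict[str, List[str]]:
    """Each new gene inherits the reaction associations of its reference ortholog."""
    return {
        new: ref_gene_reactions[ref]
        for new, ref in orthologs.items()
        if ref in ref_gene_reactions
    }


NEW_VS_REF = {("dsm_1", "ref_A"): 500.0, ("dsm_1", "ref_B"): 120.0, ("dsm_2", "ref_B"): 300.0}
REF_VS_NEW = {("ref_A", "dsm_1"): 495.0, ("ref_B", "dsm_1"): 130.0, ("ref_B", "dsm_2"): 290.0}
pairs = bidirectional_best_hits(NEW_VS_REF, REF_VS_NEW)
print(pairs)  # {'dsm_1': 'ref_A', 'dsm_2': 'ref_B'}
print(propagate({"ref_A": ["rxn_a"], "ref_B": ["rxn_b1", "rxn_b2"]}, pairs))
```

Note that the annotated functions of the genes play no role in `propagate`; only the reciprocal-hit relationship does, which matches the behavior described for propagated models.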
If an orthologous gene pair is found, the annotated functions
assigned to the pair are ignored when propagating a model. Instead
the ortholog in the input genome inherits the reaction associations
of its homologous partner in the published model being propa-
gated. Optionally, this propagated model can also carry over any
gapfilled reactions from the original model. The propagated model
also shares reaction directionalities, full GPR rules (when
corresponding genes exist), and biomass reactions with the original
model. Refining these components of the model is the major target
for manual curation that requires significant effort, so propagating
Fig. 5 Comparing proteomes of the Bacillus subtilis DSM genome and the Bacillus subtilis 168 genome and analyzing corresponding gene pairs using the Compare Two Proteomes app in KBase; subsequent synteny map generated by the app shown above
page: https://narrative.kbase.us/#org/published-metabolic-models. Check this page for existing published models before attempting to import a published model from scratch. When a
published model is imported into KBase using the Import model
SBML from web and Integrate Imported Model into KBase Name-
space apps, a default media formulation for the model is automati-
cally generated based on the exchange flux constraints in the model
SBML file. This custom media is interoperable with any KBase
model and can be readily applied for gapfilling and flux analysis of
a propagated model. Whether or not the propagated model can
grow using this media can indicate if the propagated genome has
lost significant metabolic functionality.
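As a rough picture of how a default media formulation can be derived from exchange flux constraints, the sketch below keeps every exchange whose lower bound permits uptake. It assumes the common SBML/COBRA sign convention (negative exchange flux means uptake) and BiGG-style "EX_..._e" identifiers; both are assumptions for illustration, and the exact rules used by the KBase import apps may differ.

```python
from typing import Dict, Tuple


def compound_from_exchange(rxn_id: str) -> str:
    """Strip an assumed BiGG-style 'EX_..._e' wrapper from an exchange reaction id."""
    cpd = rxn_id[3:] if rxn_id.startswith("EX_") else rxn_id
    return cpd[:-2] if cpd.endswith("_e") else cpd


def media_from_exchanges(bounds: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Map compound -> maximum uptake for exchanges whose lower bound allows uptake.

    Assumes the SBML/COBRA convention that a negative exchange flux is uptake.
    """
    return {
        compound_from_exchange(rxn_id): -lower
        for rxn_id, (lower, _upper) in bounds.items()
        if lower < 0
    }


EXCHANGES = {
    "EX_glc__D_e": (-10.0, 1000.0),
    "EX_o2_e": (-20.0, 1000.0),
    "EX_ac_e": (0.0, 1000.0),  # secretion only, so not a media component
}
print(media_from_exchanges(EXCHANGES))  # {'glc__D': 10.0, 'o2': 20.0}
```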
Both of these model reconstruction methods were applied to
produce new genome-scale models for the exemplar genome
selected for this chapter, Bacillus subtilis DSM (see Note 2). The
subsequent models were tested using a wide range of FBA-based
methods, providing insights into how propagated and draft models
differ in performance. To build the propagated model, the Propa-
gate Model to New Genome app was applied to an existing published
model of B. subtilis 168 called the iBsu1103V2 [10]. This model
was selected because the DSM and 168 strains of B. subtilis are
extremely similar (4161 distinct genes overlapping). Indeed, these
genomes were so exceptionally close that this example represents
the best-case scenario for how propagated models will perform.
Biochemistry data that represent prokaryotic models divide reactions into intracellular (cytosol) and extracellular compartments. Unlike prokaryotic cells, eukaryotic cells host organelles that each play a unique physiological role in the cell. Therefore, eukaryotic models are constructed as multi-compartment models that represent the metabolic processes in each organelle. Additional transport reactions are added to facilitate the exchange of compounds between the subcellular compartments of the metabolic reconstruction, including the extracellular space and the cytosolic compartment.
The Build Metabolic Model app was also applied to build a
brand-new draft model of the DSM genome. The results from the
application of these two model reconstruction apps are shown in
Table 1.
Significant differences can be seen between the draft and pro-
pagated model. The propagated model had more reactions and
more associated genes. It also had more irreversible reactions,
reflecting its inheritance of more carefully curated reversibility con-
straints in the iBsu1103V2 model. The gapfilling differences will be
discussed later in the gapfilling section of this chapter.
3.2 Gapfilling a Metabolic Model
One of the aspects of flux balance analysis (FBA) that makes it challenging to apply to many biological and physical systems is that the approach is mathematically unforgiving. If a metabolic model contains a single gap in any critical pathway, this will render the pathway, and probably the entire model, nonfunctional. Identifying and eliminating gaps within genome-scale models can be particularly difficult, as they are composed of thousands of reactions, hundreds of pathways, and several dozen biomass components. Furthermore, the fidelity of genome-scale models is commensurate with the current state of sequencing, assembly, gene calling, and functional annotation technology. A genome-scale model will always contain gaps corresponding to those in the underlying technologies. Sometimes genes have incorrect sequences and translations due to sequencing errors; often genes will be missing due to errors in assembly or gene calling; and, most commonly, genes are incorrectly annotated or left entirely unannotated due to gaps in knowledge. All of these error sources lead to many tens of gaps in a typical genome-scale model. For some reactions, the responsible gene remains unknown, meaning the reaction will always be missing in initial draft models.
Table 1
Statistics from the reconstruction of models for the B. subtilis DSM genome
Fortunately, numerous tools and methods have emerged to
greatly simplify the process of finding and filling these gaps in a
genome-scale model. Optimization-based formulations were initi-
ally proposed to solve this problem by adding a minimal set of
reactions from a database such that the model would be capable
of producing biomass (or satisfying some other cellular objective
function) [17, 18]. These approaches have been improved over
time, primarily with respect to reducing the runtime of these algo-
rithms [19, 20]. Today, models can be gapfilled to eliminate the
missing reactions and enable the production of biomass in just a few
minutes. The Gapfill Metabolic Model app (see Fig. 6) in KBase
makes this process very easy. This app enables a user to select a
model for gapfilling, select a growth condition of interest, and
select a target reaction to enable flux through. By default, the target
reaction is the biomass objective function of the model, but the
method is equally applicable to any other objective function. The
growth condition is defined in a KBase Media object, which indi-
cates all the compounds that may be consumed or excreted within a
Fig. 6 Gapfilling missing reactions in a draft metabolic model of Bacillus subtilis DSM using the Gapfill Metabolic Model app in KBase; some of the resulting additional reactions are shown in the results table below
Fig. 6 (continued)
Table 2
Comparison of gapfilling required in a variety of growth conditions
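The core idea of gapfilling, adding the smallest set of database reactions that lets the model produce its targets, can be illustrated with a deliberately naive stand-in. This sketch is not the optimization formulation used by the published methods or by the KBase app: it replaces the FBA feasibility test with simple network expansion and searches exhaustively by increasing set size, which is exponential and only suitable for toy inputs. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Reaction = Tuple[Set[str], Set[str]]  # hypothetical toy format: (reactants, products)


def reachable(reactions: Dict[str, Reaction], nutrients: Set[str]) -> Set[str]:
    """Network expansion from the nutrients (a crude proxy for FBA feasibility)."""
    available = set(nutrients)
    grew = True
    while grew:
        grew = False
        for reactants, products in reactions.values():
            if reactants <= available and not products <= available:
                available |= products
                grew = True
    return available


def gapfill(
    model: Dict[str, Reaction],
    database: Dict[str, Reaction],
    nutrients: Set[str],
    targets: Set[str],
    max_added: int = 3,
) -> Optional[List[str]]:
    """Smallest set of database reactions that makes every target reachable."""
    for size in range(max_added + 1):
        for combo in combinations(sorted(database), size):
            trial = {**model, **{rid: database[rid] for rid in combo}}
            if targets <= reachable(trial, nutrients):
                return list(combo)
    return None  # no fix found within max_added reactions


MODEL = {"R1": ({"glc"}, {"g6p"}), "R3": ({"f6p"}, {"pyr"})}
DATABASE = {
    "PGI": ({"g6p"}, {"f6p"}),  # one reaction closes the gap
    "X1": ({"glc"}, {"xyl"}),   # this detour needs two reactions
    "X2": ({"xyl"}, {"f6p"}),
}
print(gapfill(MODEL, DATABASE, {"glc"}, {"pyr"}))  # ['PGI']
```

Preferring the one-reaction solution over the two-reaction detour is the "minimal set" criterion; real tools express it as an objective in a mixed-integer or linear program rather than by enumeration.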
3.3 Flux Balance Analysis and Media in KBase
Once the gapfilling step is complete, a model is ready to be analyzed using flux balance analysis (FBA). In flux balance analysis, a single flux profile for the selected model is predicted such that a specified cellular objective is optimized. The most common objective is the maximization of biomass flux. However, many other objectives are possible, including maximization of ATP production, maximization of entropy production, maximization of product biosynthesis, or maximization of carbon efficiency [5]. By default, the constraints that govern the fluxes in FBA include bounds on each individual reaction flux (typically defined based on the thermodynamic reversibility of a reaction), mass balance constraints on each metabolite, and uptake and excretion constraints on each extracellular metabolite (defined in KBase by the Media objects described in the gapfilling section). Many other constraints are possible, including constraints based on protein allocation [7], constraints enforcing thermodynamic feasibility of all reactions [10], and constraints defined from transcriptomic data [7, 8]. In turn, FBA produces numerous insights: predicting active pathways during growth in a particular environment, identifying essential genes and reactions, predicting auxotrophy, and testing the impact of strategies to improve strain productivity.
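For reference, the FBA problem described above can be written as a linear program. The notation here is the standard textbook form and is ours, not the chapter's: S is the m × n stoichiometric matrix (m metabolites, n reactions), v is the flux vector, and c weights the objective (for example, 1 for the biomass reaction and 0 elsewhere).

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0 \\
& v_j^{\min} \le v_j \le v_j^{\max}, \qquad j = 1, \dots, n
\end{aligned}
```

The equality S v = 0 is the mass balance constraint on each metabolite; the bounds encode reversibility (v_j^min = 0 for an irreversible reaction) and, on exchange reactions, the uptake and excretion limits that KBase takes from the Media object.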
In KBase, FBA is run using the Run Flux Balance Analysis app
(see Fig. 7). As with the KBase gapfilling app, the user must specify
the model of interest and provide a Media object that defines the
constraints on the uptake and excretion of nutrients allowed in the
model (see Note 4). It is important to understand the power and
utility of the Media object in KBase. Of course, one can use the
Media object to set limits on the uptake of nutrients (by setting
limited maximum flux on a media compound), but one can also use
the object to: (1) prevent the excretion of a compound by setting its
minimum flux to zero, (2) force the excretion of a compound by
setting its maximum flux to a negative value, and (3) force the
uptake of a compound by setting its minimum flux to a positive
value. This is particularly useful when applied in concert with the
gapfilling app, as these strategies enable a user to ensure that a
model is capable of consuming or excreting a set of compounds
of interest. The FBA app in KBase also has many advanced parameters, which allow users to specify sets of reactions and genes to knock out; specify supplemental compounds to add to the media; set upper limits on carbon, nitrogen, phosphate, and sulfate uptake; identify expression data to use in constraining fluxes; specify custom flux bounds on any model reaction; and select an objective to optimize.
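The Media object semantics above can be summarized by interpreting each (minimum, maximum) flux pair. The sign convention in this sketch (positive = uptake, negative = excretion) is inferred from the three uses described in the text; the helper function and compound names are hypothetical and are not a KBase API.

```python
def describe_media_bounds(min_flux: float, max_flux: float) -> str:
    """Interpret a media (min, max) pair; assumes positive flux = uptake."""
    if min_flux > max_flux:
        raise ValueError("minimum flux exceeds maximum flux")
    if min_flux > 0:
        return "uptake forced"
    if max_flux < 0:
        return "excretion forced"
    if min_flux == 0 and max_flux == 0:
        return "no exchange"
    if min_flux == 0:
        return "uptake allowed, excretion prevented"
    if max_flux == 0:
        return "excretion allowed, uptake prevented"
    return "uptake and excretion allowed"


MEDIA = {  # hypothetical compounds and bounds
    "glucose": (0.0, 10.0),
    "acetate": (-5.0, -1.0),
    "ammonia": (1.0, 10.0),
    "oxygen": (-100.0, 100.0),
}
for compound, (lo, hi) in MEDIA.items():
    print(compound, "->", describe_media_bounds(lo, hi))
```

Here acetate shows the "force excretion" case (negative maximum) and ammonia the "force uptake" case (positive minimum).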
Fig. 7 Running Flux Balance Analysis app in KBase on a draft metabolic model built from the Bacillus
subtilis DSM genome; (a) parameter for enabling Flux Variability Analysis; (b) parameter for simulating single
knockouts
Table 3
Results from FBA run in minimal and complete media, measured in mmol/(g cell dry weight·h)
Fig. 8 Setting maximum uptake values in the advanced parameters of Run Flux Balance Analysis
3.4 Flux Variability Analysis
One of the challenges in interpreting results from flux balance analysis is the possibility of multiple equivalently optimal solutions. In effect, this means that the flux profile reported by a particular flux balance analysis simulation is only one of many possible, equally optimal, solutions. As a result, one cannot assume that a reaction is essential for growth just because it has a nonzero flux in a single FBA simulation. Similarly, one cannot assume a reaction will be inactive in a specific condition just because it has a flux of zero in a single FBA simulation. Flux variability analysis (FVA) offers a mechanism for better characterizing the behavior of each reaction in a model in a particular growth condition, despite the existence of alternative equivalent optima [21]. This is accomplished by constraining the primary objective of the initial FBA simulation to be greater than or equal to some minimal value (KBase uses 10% of the optimal objective value as a default). Next, each individual reaction is independently minimized and maximized. In the FBA solution table (see Fig. 9), a reaction with a negative maximum value is essential in the reverse direction; a reaction with a positive minimum value is essential in the forward direction; a reaction with a maximum and minimum value both fixed at zero is said to be blocked; and all other reactions are “variable” or “optional”, meaning they are capable of functioning in a particular condition but are not essential. FVA is run by default in all KBase FBA simulations, and the maximum and minimum flux values and associated classifications are all reported in the FBA results table generated by the Run Flux Balance Analysis app (see Fig. 7a).
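The classification rules listed above translate directly into a small function over each reaction's FVA range. The numerical tolerance is our assumption (to absorb solver round-off) and is not stated in the chapter; the reaction names and ranges are made up.

```python
from typing import Dict, Tuple


def classify_fva(v_min: float, v_max: float, tol: float = 1e-9) -> str:
    """Classify a reaction from its FVA range, following the rules in the text."""
    if v_max < -tol:
        return "essential (reverse)"
    if v_min > tol:
        return "essential (forward)"
    if abs(v_min) <= tol and abs(v_max) <= tol:
        return "blocked"
    return "variable"


RANGES: Dict[str, Tuple[float, float]] = {  # made-up FVA ranges
    "bio1": (0.05, 0.52),
    "rxn_a": (-3.0, -1.2),
    "rxn_b": (0.0, 0.0),
    "rxn_c": (-2.0, 4.0),
}
for rxn, (lo, hi) in RANGES.items():
    print(rxn, classify_fva(lo, hi))
```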
To demonstrate FVA in action in KBase, we revisit the FBA solution tables from the previous analysis of the DSM genome described in the FBA section. Recall that FBA was performed in two media conditions: (1) complete media and (2) glucose minimal media (GMM). Also recall that these simulations were performed using both the propagated DSM model and the newly constructed draft model. The FVA results from these studies are shown in Table 4.
Note how models simulated in complete media always had
fewer essential reactions than when simulated in glucose minimal
media. This is because complete media contains many biologically
significant nutrients (e.g., amino acids, vitamins, nucleotides) that
the cell can simply import rather than having to synthesize these
compounds. In contrast, in minimal media, the cell is forced to
synthesize virtually all biomass components. Thus, the reactions
that are essential in minimal media but not in complete media are
primarily biosynthesis reactions. The complete media simulations
also had the fewest blocked reactions. This is because complete
Fig. 9 FBA solution table; the first reaction (bio1) is the biomass producing equation
Table 4
Results from essentiality classification of reactions using FVA
Table 5
Results from approximate functional classification of reactions using FVA
3.5 Predicting Essential Genes
Just as metabolic models can be used to classify model reactions using FVA, these models can also be applied to classify model genes as either essential or nonessential by simulating gene knockouts. As with FVA, the Run Flux Balance Analysis app in KBase makes this computation relatively easy to perform. Unlike FVA, however, this analysis is not performed by default. To classify a model’s genes, one must select the checkbox labeled Simulate All Single KO (see Fig. 7b). When this feature is active, the FBA will start by maximizing the selected primary objective function (typically biomass production) as usual. Then, a knockout is simulated for every gene in the model, one gene at a time. In these simulations, all the reactions that are exclusively associated with the knocked-out gene (or exclusively associated with a multi-enzyme complex involving the knocked-out gene) have their fluxes constrained to zero. The primary objective is then re-optimized, and the ratio of the knockout objective value to the wild-type objective value is reported. For completely essential genes, this ratio will be zero. However, many gene knockouts will result in a reduced rather than a completely eliminated objective (see Note 8).
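Which reactions lose their flux after a knockout follows from evaluating each reaction's GPR rule: an enzyme complex ("and") fails if any subunit is missing, while isozymes ("or") fail only if all are missing. The sketch below assumes a hypothetical nested-tuple encoding of GPR rules and a tolerance of our choosing; the rules are simplified illustrations, not taken from the DSM models.

```python
from typing import AbstractSet, Dict, List, Optional, Tuple, Union

# Hypothetical encoding: a GPR rule is a gene id or ("and" | "or", child, child, ...).
GPR = Union[str, Tuple]


def gpr_active(rule: GPR, knocked_out: AbstractSet[str]) -> bool:
    """Evaluate a GPR rule: "and" models an enzyme complex, "or" models isozymes."""
    if isinstance(rule, str):
        return rule not in knocked_out
    op, *children = rule
    results = [gpr_active(child, knocked_out) for child in children]
    return all(results) if op == "and" else any(results)


def reactions_forced_to_zero(rules: Dict[str, Optional[GPR]], gene: str) -> List[str]:
    """Reactions that lose all catalysts when `gene` alone is knocked out."""
    return sorted(
        rxn for rxn, rule in rules.items()
        if rule is not None and not gpr_active(rule, {gene})
    )


def knockout_call(ko_objective: float, wt_objective: float, tol: float = 1e-6) -> str:
    """Classify a knockout by the ratio of knockout to wild-type objective values."""
    if wt_objective <= 0:
        raise ValueError("wild-type objective must be positive")
    ratio = ko_objective / wt_objective
    if ratio <= tol:
        return "essential"
    return "reduced growth" if ratio < 1 - tol else "no effect"


RULES = {  # simplified, illustrative rules; None marks a spontaneous reaction
    "PFK": ("or", "pfkA", "pfkB"),
    "SDH": ("and", "sdhA", "sdhB"),
    "PGI": "pgi",
    "SPONT": None,
}
print(reactions_forced_to_zero(RULES, "pfkA"))  # []
print(reactions_forced_to_zero(RULES, "sdhA"))  # ['SDH']
```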
This gene essentiality prediction capability was demonstrated in
KBase using the same DSM models. Additionally, because the
B. subtilis DSM strain was shown to be extremely similar to
B. subtilis 168 strain, the known essential genes in B. subtilis 168
[22] may be translated to the corresponding genes in the DSM
genome and subsequently used to validate the DSM model predic-
tions. Next, the Run Flux Balance Analysis app was applied to
predict essential genes in defined rich LB media and in glucose
minimal media in each of our models (see Table 6).
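Validating predictions against another strain's essential-gene list involves two steps: mapping the reference genes onto the new genome through orthologs, then counting agreements. In the sketch below, the gene identifiers and ortholog map are hypothetical, and scoring only over genes present in the model is our design choice (genes outside the model cannot be predicted either way).

```python
from typing import Dict, Set


def translate_genes(genes: Set[str], ortholog: Dict[str, str]) -> Set[str]:
    """Map reference-strain genes onto the new genome; genes without an ortholog are dropped."""
    return {ortholog[g] for g in genes if g in ortholog}


def compare_essentiality(
    predicted: Set[str], known: Set[str], model_genes: Set[str]
) -> Dict[str, int]:
    """Confusion counts, scored only over genes that are present in the model."""
    predicted, known = predicted & model_genes, known & model_genes
    return {
        "TP": len(predicted & known),
        "FP": len(predicted - known),
        "FN": len(known - predicted),
        "TN": len(model_genes - predicted - known),
    }


KNOWN_REF = {"ref_a", "ref_b", "ref_c"}          # hypothetical essential genes in strain 168
ORTHOLOG = {"ref_a": "dsm_a", "ref_b": "dsm_b"}  # ref_c has no DSM counterpart
MODEL_GENES = {"dsm_a", "dsm_b", "dsm_d", "dsm_e"}
PREDICTED = {"dsm_a", "dsm_d"}
known_dsm = translate_genes(KNOWN_REF, ORTHOLOG)
print(compare_essentiality(PREDICTED, known_dsm, MODEL_GENES))
```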
Table 6
Evaluating the accuracy of DSM models in predicting essential genes in LB media and minimal media
3.6 Simulating Biolog Phenotype Profiles
In addition to predicting essential genes, models are also useful for predicting which catabolic degradation pathways are present in an organism, and, as with essentiality predictions, a simple and widespread experimental procedure exists to facilitate the validation of these predictions. Specifically, Biolog phenotype arrays [24] and similar technologies test the capacity of an organism to utilize a wide range of carbon, nitrogen, phosphate, and sulfate sources. This type of data may then be loaded into KBase as a Phenotype Set object (see Fig. 10). Phenotype Sets specify experimental growth values for a set of media formulations, typically in binary growth/no-growth format (although continuous growth rate measurements are also supported). Media formulations already exist in KBase for nearly all of the common compound conditions tested in Biolog phenotype arrays, and the Edit Media app in KBase can be applied as needed to create new formulations. Once a Phenotype Set is loaded, the Simulate Growth on Phenotype Data app (see Fig. 11) in KBase enables the rapid simulation of biomass production in a selected metabolic model in each growth condition included in the Phenotype Set. This app then reports the accuracy of the model in predicting biomass production in each growth condition, in terms of true positives and negatives and false positives and negatives.
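Scoring simulated growth against binary phenotype calls reduces to a confusion matrix. The sketch below is a minimal illustration, not the Simulate Growth on Phenotype Data app: the growth threshold is an assumed cutoff, and the conditions and values are invented (not B. subtilis data).

```python
from typing import Dict


def score_phenotypes(
    predicted_growth: Dict[str, float],  # simulated biomass flux per condition
    observed_growth: Dict[str, bool],    # binary growth/no-growth call per condition
    threshold: float = 1e-6,             # assumed cutoff for "model grows"
) -> Dict[str, float]:
    """Tally true/false positives/negatives and overall accuracy."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for condition, grew in observed_growth.items():
        grows_in_model = predicted_growth.get(condition, 0.0) > threshold
        if grows_in_model and grew:
            counts["TP"] += 1
        elif grows_in_model:
            counts["FP"] += 1
        elif grew:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    total = sum(counts.values())
    accuracy = (counts["TP"] + counts["TN"]) / total if total else 0.0
    return {**counts, "accuracy": accuracy}


# Invented values for illustration only.
PREDICTED = {"glucose": 0.8, "citrate": 0.0, "xylose": 0.3, "sucrose": 0.0}
OBSERVED = {"glucose": True, "citrate": True, "xylose": False, "sucrose": False}
print(score_phenotypes(PREDICTED, OBSERVED))
```

False negatives (here, citrate) typically point to missing transporters or degradation reactions, which is why the app offers to add transporters for the tested nutrients (Fig. 11a).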
Fig. 11 Simulating growth of FBA models based on phenotype data using the Simulate Growth on Phenotype
Data app in KBase; (a) Add transporters for primary nutrients associated with all phenotype conditions
Fig. 11 (continued)
Table 7
Evaluating the accuracy of our models in predicting phenotype array growth
3.7 Simulating Metabolite Biosynthesis
The Biolog analysis demonstrates how metabolic models can rapidly test for the existence and completeness of degradation pathways. Models can be used in much the same way to test for the existence and completeness of biosynthesis pathways. In this case, an export reaction is added to the model for the compound of interest, and the flux through this export reaction is maximized while the model is given a media formulation containing the desired starting points for the pathway of interest. If the resulting objective value is zero, the biosynthesis pathway is missing or incomplete. If the objective is greater than zero, the resulting FBA solution provides a list of mass-balanced reactions required to produce the compound of interest.
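The FBA test described above requires an LP solver. As a lighter, qualitative stand-in (a different technique, network expansion, and not what KBase runs), one can ask whether a compound is reachable from a set of seed metabolites at all. The sketch below is a minimal version under stated assumptions: reactions are irreversible, each is given as hypothetical sets of substrates and products, and a reaction fires only once all of its substrates are available. It ignores stoichiometric mass balance and treats cofactors as simply present or absent, so its answers can differ from FBA in either direction; treat it as a quick screen.

```python
def network_expansion(reactions, seeds):
    """Return every metabolite reachable from the seed metabolites.

    reactions: dict reaction_id -> (substrates, products), both sets of IDs;
               reactions are treated as irreversible (left to right).
    seeds:     metabolites assumed to be available (medium plus cofactors).
    A reaction fires only once ALL of its substrates are reachable, so
    pathways in which two branches merge are handled correctly.
    """
    reachable = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= reachable and not products <= reachable:
                reachable |= products
                changed = True
    return reachable


# Hypothetical toy network: two branches from "src" merge in r3.
reactions = {
    "r1": ({"src"}, {"m1"}),
    "r2": ({"src"}, {"m2"}),
    "r3": ({"m1", "m2"}, {"m3"}),
    "r4": ({"m3", "cof"}, {"target"}),
}
print("target" in network_expansion(reactions, {"src"}))         # False: cofactor missing
print("target" in network_expansion(reactions, {"src", "cof"}))  # True: cofactor supplied
```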
While this approach works and can be run in KBase using a combination of the Edit Model and Run Flux Balance Analysis apps, KBase has another app that performs the same steps more conveniently in a single step. This app, called Predict Metabolite Biosynthesis Pathway (see Fig. 12), lets the user select a model, a set of compounds of interest, and any additional starting molecules of interest, and then reports whether the compounds can be produced. The app also outputs a simple, streamlined pathway for each desired compound. Unlike graph-based pathway search algorithms, this approach readily identifies branched pathways.
We first gapfilled our models using the Gapfill Metabolic Model app to ensure that both models are capable of producing isoprene. This added two reactions to the draft model and the propagated model. Next, we applied the Predict Metabolite Biosynthesis Pathway app to propose a specific pathway for synthesizing isoprene from central carbon metabolites. We selected isoprene because the DSM strain of B. subtilis is known to produce isoprene and because isoprene is an important biofuel target. The results of this analysis are shown in Table 8.
Fig. 12 Predicting pathways for isoprene biosynthesis in the draft metabolic model built from the DSM genome using the Predict Metabolite Biosynthesis Pathway app in KBase; the resulting table is shown above
Application of the Metabolic Modeling Pipeline in KBase to Categorize. . . 315
Table 8
Isoprene biosynthesis pathway prediction using DSM models and FBA
4 Conclusion
5 Notes
Acknowledgments
References
1. Kumar VS, Maranas CD (2009) GrowMatch: an automated method for reconciling in silico/in vivo growth predictions. PLoS Comput Biol 5:e1000308. https://doi.org/10.1371/journal.pcbi.1000308
2. Goldford JE, Lu N, Bajić D et al (2018) Emergent simplicity in microbial community assembly. Science 361:469–474. https://doi.org/10.1126/science.aat1168
3. Pharkya P (2004) OptStrain: a computational framework for redesign of microbial production systems. Genome Res 14:2367–2376. https://doi.org/10.1101/gr.2872004
4. Monk JM, Koza A, Campodonico MA et al (2016) Multi-omics quantification of species variation of Escherichia coli links molecular features with strain phenotypes. Cell Syst 3:238–251.e12. https://doi.org/10.1016/j.cels.2016.08.013
5. Orth JD, Thiele I, Palsson BØ (2010) What is flux balance analysis? Nat Biotechnol 28:245–248. https://doi.org/10.1038/nbt.1614
6. Henry CS, Broadbelt LJ, Hatzimanikatis V (2007) Thermodynamics-based metabolic flux analysis. Biophys J 92:1792–1805. https://doi.org/10.1529/biophysj.106.093138
7. Tournier L, Goelzer A, Fromion V (2017) Optimal resource allocation enables mathematical exploration of microbial metabolic configurations. J Math Biol 75:1349–1380. https://doi.org/10.1007/s00285-017-1118-5
8. Covert MW, Palsson BØ (2002) Transcriptional regulation in constraints-based metabolic models of Escherichia coli. J Biol Chem 277:28058–28064. https://doi.org/10.1074/jbc.M201691200
9. Arkin AP, Cottingham RW, Henry CS et al (2018) KBase: the United States Department of Energy Systems Biology Knowledgebase. Nat Biotechnol 36:566–569. https://doi.org/10.1038/nbt.4163
10. Henry CS, Zinner JF, Cohoon MP, Stevens RL (2009) iBsu1103: a new genome-scale metabolic model of Bacillus subtilis based on SEED annotations. Genome Biol 10:R69. https://doi.org/10.1186/gb-2009-10-6-r69
11. Thiele I, Palsson BØ (2010) A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc 5:93–121. https://doi.org/10.1038/nprot.2009.203
12. Henry CS, DeJongh M, Best AA et al (2010) High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol 28:977–982. https://doi.org/10.1038/nbt.1672
13. Faria JP, Khazaei T, Edirisinghe JN et al (2016) Constructing and analyzing metabolic flux models of microbial communities. In: McGenity TJ, Timmis KN, Nogales B (eds) Hydrocarbon and lipid microbiology protocols. Springer, Berlin, pp 247–273
14. Aziz RK, Bartels D, Best AA et al (2008) The RAST server: rapid annotations using subsystems technology. BMC Genomics 9:75. https://doi.org/10.1186/1471-2164-9-75
15. Wattam AR, Brettin T, Davis JJ et al (2018) Assembly, annotation, and comparative genomics in PATRIC, the all bacterial bioinformatics resource center. In: Setubal JC, Stoye J, Stadler PF (eds) Comparative genomics. Springer, New York, NY, pp 79–101
16. Overbeek R, Olson R, Pusch GD et al (2014) The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res 42:D206–D214. https://doi.org/10.1093/nar/gkt1226
17. Satish Kumar V, Dasika MS, Maranas CD (2007) Optimization based automated curation of metabolic reconstructions. BMC Bioinformatics 8:212. https://doi.org/10.1186/1471-2105-8-212
18. Reed JL, Patel TR, Chen KH et al (2006) Systems approach to refining genome annotation. Proc Natl Acad Sci 103:17480–17484. https://doi.org/10.1073/pnas.0603364103
19. Dreyfuss JM, Zucker JD, Hood HM et al (2013) Reconstruction and validation of a genome-scale metabolic model for the filamentous fungus Neurospora crassa using FARM. PLoS Comput Biol 9:e1003126. https://doi.org/10.1371/journal.pcbi.1003126
20. Latendresse M (2014) Efficiently gap-filling reaction networks. BMC Bioinformatics 15:225. https://doi.org/10.1186/1471-2105-15-225
21. Mahadevan R, Schilling CH (2003) The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng 5:264–276
22. Koo B-M, Kritikos G, Farelli JD et al (2017) Construction and analysis of two genome-scale deletion libraries for Bacillus subtilis. Cell Syst 4:291–305.e7. https://doi.org/10.1016/j.cels.2016.12.013
23. Henry CS, Rotman E, Lathem WW et al (2017) Generation and Validation of the iKp1289 metabolic model for Klebsiella pneumoniae KPPR1. J Infect Dis 215:S37–S43. https://doi.org/10.1093/infdis/jiw465
24. Bochner BR (2001) Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res 11:1246–1255. https://doi.org/10.1101/gr.186501
25. Henry CS, Bernstein HC, Weisenhorn P et al (2016) Microbial community metabolic modeling: a community data-driven network reconstruction: community data-driven metabolic network modeling. J Cell Physiol 231:2339–2345. https://doi.org/10.1002/jcp.25428
26. Song H-S, Nelson WC, Lee J-Y et al (2018) Metabolic network modeling for computer-aided design of microbial interactions. In: Chang HN (ed) Emerging areas in bioengineering. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, pp 793–801
27. Heirendt L, Arreckx S, Pfau T et al (2019) Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nat Protoc 14:639–702. https://doi.org/10.1038/s41596-018-0098-2
Chapter 14
Abstract
Constraint-based reconstruction and analysis (COBRA) methods have been used for over 20 years to
generate genome-scale models of metabolism in biological systems. The COBRA models have been utilized
to gain new insights into the biochemical conversions that occur within organisms and allow their survival
and proliferation. Using these models, computational biologists can conduct a variety of different analyses
such as examining network structures, predicting metabolic capabilities, resolving unexplained experimen-
tal observations, generating and testing new hypotheses, assessing the nutritional requirements of a
biosystem and approximating its environmental niche, identifying missing enzymatic functions in the
annotated genomes, and engineering desired metabolic capabilities in model organisms. This chapter
details the protocol for developing curated system-level COBRA models of metabolism in microbes.
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349,
https://doi.org/10.1007/978-1-0716-1585-0_14, © Springer Science+Business Media, LLC, part of Springer Nature 2022
322 Ali Navid
1 Introduction
1.1 Constraint-Based Modeling
Readily available genomic information has led to a paradigm shift in microbiology, in which detailed analyses of isolated cellular processes have been replaced by system-level analyses of the organism as a whole. Typical COBRA models forgo some level of detail (such as the transient behavior of metabolites and enzyme-substrate affinities) to gain a broader understanding of the overall metabolic capabilities of a cell. The most successful of the COBRA approaches is Flux Balance Analysis (FBA) [28, 29]. FBA models are based on the stoichiometry of metabolic reactions, which can easily be extracted from annotated genomes. The stoichiometry is used to develop a mathematical reconstruction of the metabolic network. These models also require prior knowledge of an organism's growth phenotypes and related nutritional needs. These data are used to constrain cellular growth, the uptake of nutrients, and the export of waste products. The constraints also limit cellular energy metabolism to a narrow set of possible catabolic pathways. Finally, knowledge of reaction thermodynamics is used to constrain reaction directions.
Curating COBRA Models of Microbial Metabolism 323
2 Materials
2.1 Annotated Genome
The most important organism-specific data needed to develop a genome-scale model of metabolism is the annotated genome. Genome annotations are available from a number of different sources (see Table 1). Some annotated genomes can be found in databases dedicated to a specific model organism (e.g., EcoCyc [31] for E. coli). Most publicly available annotated genomes can be found in databases such as Integrated Microbial Genomes (IMG) [32] and EntrezGene [33], which contain large collections of annotated genomes. The annotated genome provides the modeler with a list of all proteins that can be translated from the genome of an organism, including the enzymes that catalyze metabolic reactions.

For novel organisms, or to reannotate a genome, the reader is encouraged to consult Chapter 10 in this book for a detailed protocol on genome annotation. The U.S. Department of Energy Systems Biology Knowledgebase (KBase) [34] has recently implemented a number of new apps that allow a user to import, compare, and consolidate genome annotations from multiple sources. KBase is an excellent starting point for developing genome-scale models (GSMs), since relying on a single annotation tool usually results in missing gene-function assignments and, ultimately, incomplete metabolic network reconstructions [35].
2.2 Software
A number of free and commercial software packages are available for reconstructing metabolic networks and developing draft FBA models.

2.2.1 Automated Metabolic Network Reconstruction and Model Development
Tools for automated reconstruction include:
• AutoKEGGRec [36].
• CarveMe [37].
• KBase [34]/Model SEED [38, 39] (see Chapter 13).
• Merlin [40].
• Pathway Tools (MetaFlux) [41] (see Chapter 12).
• RAVEN [42, 43].
• SuBliMinaL [44].
Table 1
Some of the databases commonly used for the development of genome-scale metabolic models
2.2.2 Model Editors
Once the constrained model has been generated, it can be edited with common text editors such as vim (www.vim.org) or GNU Emacs (www.gnu.org/software/emacs/). However, most programs developed for simulating FBA models (e.g., the COBRA toolbox [45–47]) include commands for easy manipulation of models.
2.2.3 Linear Programming Solver
Most COBRA methods, including FBA, require access to a linear programming (LP) solver. Commercial programs such as Cplex from IBM (www.ibm.com/analytics/cplex-optimizer), Gurobi (www.gurobi.com/products/gurobi-optimizer), and Matlab (www.mathworks.com/products/matlab.html), as well as free programs such as GLPK (GNU Linear Programming Kit) (www.gnu.org/software/glpk) and PCx (www.anl.gov/tcp/pcx-optimization-problem-solver), can be used to simulate FBA models.
2.2.4 Simulation Toolboxes
A number of useful tools have been developed for analyzing the models. For Matlab, a popular suite of programs called the COBRA toolbox (opencobra.github.io/cobratoolbox/stable) [45–47] has been developed (see Chapter 15). For those who do not have access to Matlab, Python (cobrapy, https://opencobra.github.io/cobrapy/) and Julia (COBRA.jl, https://opencobra.github.io/COBRA.jl/stable) versions are also available. Other tools, such as Pathway Tools and MetaFlux [41] (see Chapter 12) and RAVEN [42, 43], can also be used. A large number of system-level analyses can be conducted with these programs, such as in silico gene deletion analysis, flux variability analysis [48], and dynamic FBA [49].
3 Model Development
3.1 Draft Network Reconstruction
In the early days of COBRA model development, all network reconstructions and the subsequent mathematical formulations of the models were done manually. However, during the last decade, a number of useful tools have been developed that significantly accelerate model development by automating metabolic network reconstruction and the constraining of models (see Subheading 2.2.1). These initial drafts are not publication-quality models and require a significant amount of additional manual refinement. Nevertheless, most researchers today use such drafts as the starting point for their new models.
The first step in building a genome-scale model of metabolism
in an organism is to obtain its annotated genome. The process of
similarity-based genome annotation is built upon comparing the
gene sequences in a recently sequenced genome to all other anno-
tated genomes. If there is statistically significant similarity between
the sequence of a query gene and a gene with a known function in
another organism, then the former gene is assigned the same func-
tion as that of the latter gene. The process is obviously more
complicated than this simple description and interested readers
can see Chapter 10 in this book for a detailed description.
As explained in Subheading 2.1, annotated genomes can be obtained from a number of different databases (see Table 1). The annotated genome presents the modeler with a list of metabolic enzymes that may be present in the target organism (see Note 1). The function of each metabolic enzyme is usually denoted by an Enzyme Commission (EC) number or, for transporters, a Transporter Classification (TC) number.
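Turning EC annotations into a draft reaction list is essentially a table lookup. The sketch below assumes a hypothetical gene-to-EC annotation and a small EC-to-reaction table; real pipelines rely on curated mapping tables (e.g., from KEGG or ModelSEED). EC 2.7.1.1 (hexokinase) and 5.3.1.9 (glucose-6-phosphate isomerase) are real EC numbers, while the reaction IDs and gene IDs are illustrative only. EC numbers that match no reaction, such as incomplete ones, are collected for manual review.

```python
from collections import defaultdict


def draft_reactions(gene_to_ecs, ec_to_reactions):
    """Translate EC annotations into a draft reaction list.

    gene_to_ecs:     dict gene_id -> list of EC numbers from the annotation
    ec_to_reactions: dict EC number -> list of reaction IDs (a lookup table)
    Returns (dict reaction_id -> sorted supporting genes,
             sorted list of EC numbers that matched no reaction).
    """
    support = defaultdict(set)
    unmapped = set()
    for gene, ecs in gene_to_ecs.items():
        for ec in ecs:
            reactions = ec_to_reactions.get(ec, [])
            if not reactions:
                unmapped.add(ec)  # flag for manual review
            for rxn in reactions:
                support[rxn].add(gene)
    return {rxn: sorted(genes) for rxn, genes in support.items()}, sorted(unmapped)


# Hypothetical annotation and lookup table.
gene_to_ecs = {
    "g0001": ["2.7.1.1"],
    "g0002": ["5.3.1.9"],
    "g0003": ["2.7.1.1"],   # second gene with the same function (isozyme)
    "g0004": ["1.1.1.-"],   # incomplete EC number
}
ec_to_reactions = {"2.7.1.1": ["HEX1"], "5.3.1.9": ["PGI"]}
reactions, unmapped = draft_reactions(gene_to_ecs, ec_to_reactions)
print(reactions)   # {'HEX1': ['g0001', 'g0003'], 'PGI': ['g0002']}
print(unmapped)    # ['1.1.1.-']
```

Keeping the supporting genes for each reaction from the start makes it easier to build the gene-protein-reaction associations later in curation.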
Once the annotated genome has been obtained, one has to ensure that it matches the input requirements of the draft model generator. For example, AutoKEGGRec [36] only accepts annotated genomes in KEGG format. For a long time, all KBase-generated draft models were based on RAST [50] annotations. Efforts are currently underway to allow annotations from multiple sources to be used for generating models in KBase.
3.2 Manual Curation of the Draft Reconstruction
3.2.1 Wrong Annotations
The first step in refining the draft reconstruction is to re-examine the annotated genome and exclude dubious annotations (and their associated reactions). Some enzymatic functions will be erroneously included in the list of available functionalities because of poor annotations. It is therefore essential that the modeler uses the most complete annotation of the genome when curating the model. Combining annotations from multiple sources can provide a more complete reconstruction of metabolic pathways than a single source [35]. Therefore, the modeler should examine and include system-level information that is deposited in other metabolic databases such
3.2.2 Generic Reactions
Some draft metabolic network reconstructions might contain generic reactions, which include nonspecific terms such as peptide, electron acceptor, or protein. An example of such a reaction is R00056 from KEGG:

$$\text{Dinucleotide} + \text{H}_2\text{O} \rightleftharpoons 2\ \text{Mononucleotide}$$

These reactions have to be excised from the reconstruction because they are too general and do not refer to any specific component of the network.
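Screening a large draft for such reactions can start with a simple keyword filter. The sketch below is a hypothetical helper, not part of any reconstruction tool: it flags reactions whose participant names contain a nonspecific term from an assumed list. Keyword matching also catches legitimate, specific names (e.g., "nicotinamide adenine dinucleotide" or "acyl carrier protein"), so flagged reactions should be reviewed manually rather than deleted automatically.

```python
import re

# Hypothetical list of nonspecific names; adapt it to your namespace.
GENERIC_TERMS = ("dinucleotide", "mononucleotide", "peptide", "protein",
                 "electron acceptor")


def is_generic(name, terms=GENERIC_TERMS):
    """True if a metabolite name contains any nonspecific term as a whole word."""
    lowered = name.lower()
    return any(re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms)


def flag_generic_reactions(reactions):
    """Split reactions into (kept, flagged) by whether any participant is generic."""
    kept, flagged = {}, {}
    for rxn_id, participants in reactions.items():
        bucket = flagged if any(is_generic(p) for p in participants) else kept
        bucket[rxn_id] = participants
    return kept, flagged


reactions = {
    "R00056": ["Dinucleotide", "H2O", "Mononucleotide"],
    "rxn_example": ["ATP", "Pyruvate", "ADP", "Phosphoenolpyruvate"],  # hypothetical ID
}
kept, flagged = flag_generic_reactions(reactions)
print(sorted(kept), sorted(flagged))   # ['rxn_example'] ['R00056']
```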
In most metabolic reconstructions, some multistep processes, such as those catalyzed by large enzyme complexes (e.g., pyruvate dehydrogenase and α-ketoglutarate dehydrogenase) or linear pathways (e.g., fatty acid oxidation), have been combined into single reactions for the sake of simplicity. When the individual steps are also present in the network, these composite reactions are redundant and should be removed. However, before eliminating them, one must ensure that they do not serve a role in any other pathway.
3.2.3 Non-enzymatic Reactions
A number of important cellular metabolic reactions are not catalyzed by enzymes. Their activity might be critical because they can connect reactions that are essential for cellular growth and survival. However, one should add to the metabolic reconstruction only those non-enzymatic reactions that share at least one metabolite with the rest of the network. This ensures that the reconstruction is not cluttered with a large number of dead-end metabolites.
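The selection rule above amounts to a set intersection. The sketch below assumes reactions are represented simply as sets of metabolite IDs (all IDs are hypothetical) and applies the rule in a single pass; running it repeatedly would also admit chains of candidates that connect to the network only through each other, which goes beyond the rule as stated.

```python
def connected_candidates(network_reactions, candidates):
    """Keep candidate non-enzymatic reactions that touch the existing network.

    network_reactions: dict reaction_id -> set of metabolite IDs already in the model
    candidates:        dict reaction_id -> set of metabolite IDs
    A candidate is kept if it shares at least one metabolite with the network.
    """
    network_mets = set().union(*network_reactions.values())
    return {rid: mets for rid, mets in candidates.items() if mets & network_mets}


# Hypothetical IDs: spont1 shares metabolite C with the network, spont2 does not.
network = {"R1": {"A", "B"}, "R2": {"B", "C"}}
candidates = {"spont1": {"C", "D"}, "spont2": {"X", "Y"}}
print(sorted(connected_candidates(network, candidates)))   # ['spont1']
```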
3.2.5 Reaction Directionality
An important problem with some draft reconstructions is that most reactions are classified as reversible. For example, a model might predict that the exact same set of enzymes can catalyze both glycolysis and gluconeogenesis. Additionally, the energetics of the system will be drastically affected, since some metabolic cycles might allow for energy-free conversion of ADP to ATP. A recent study found
3.2.7 Intracellular Transport Reactions
If the modeled organism is eukaryotic, one must ensure that the proper set of intracellular exchange reactions between the various compartments is included in the reconstruction. Unfortunately, experimental data on these processes are not readily available. Therefore, only those transport reactions that are necessary for the proper function of a compartmentalized pathway should be included in the reconstruction. Adding excessive transport reactions can lead to the formation of futile cycles, which lowers the value of the model as a predictive tool for network and flux analyses (see Note 6).
3.2.8 Gene-Protein-Reaction (GPR) Association Table
When refining the network, one must keep a record of the protein(s) that catalyze each reaction. Often, a majority of the reactions in a network are catalyzed by single proteins encoded by single genes. However, a number of other scenarios are also possible. These include:
• One protein can catalyze more than one reaction. For example, a 5′-nucleotidase (EC 3.1.3.5) can catalyze the phosphatase reactions that convert various 5′-nucleotides into their respective nucleosides.
• The enzyme that catalyzes a reaction is a heteromeric complex, and hence the products of multiple genes are required for the reaction to proceed.
• Proteins encoded by different genes have the same functionality (isozymes) and can catalyze the same reaction.
GPR tables are usually included with draft reconstructions or can easily be extracted from models encoded in formats such as the Systems Biology Markup Language (SBML) [68, 69]. Developing an accurate GPR table is essential for all gene-based analyses (see Note 7).
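These scenarios are commonly written as Boolean GPR rules: "or" joins isozymes, "and" joins subunits of a complex, and a multifunctional protein's gene simply appears in the rules of several reactions. The hand-written parser below is a minimal sketch for illustration (not the evaluator of any particular toolbox), assuming the common lowercase "and"/"or" text syntax with parentheses. It answers the question behind in silico gene deletion: can the reaction still be catalyzed once a given set of genes is removed? Gene IDs are hypothetical.

```python
import re

# Tokens: parentheses, or any run of characters that is not whitespace/parenthesis.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def gpr_active(rule, knocked_out):
    """Return True if a reaction can still be catalyzed after gene deletions.

    rule:        GPR string such as "(g1 and g2) or g3"; 'and' joins subunits of
                 a complex, 'or' joins isozymes. An empty rule means no gene
                 association, so the reaction is never blocked by deletions.
    knocked_out: set of deleted gene IDs.
    """
    tokens = TOKEN.findall(rule)
    if not tokens:
        return True
    pos = 0

    def peek():
        return tokens[pos].lower() if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("incomplete GPR rule: " + rule)
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "or":
            take()
            rhs = parse_and()  # always parse the operand so its tokens are consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_atom()
        while peek() == "and":
            take()
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("unbalanced parentheses in GPR rule: " + rule)
            return value
        return token not in knocked_out  # a gene is active unless deleted

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected tokens in GPR rule: " + rule)
    return result


rule = "(g1 and g2) or g3"
print(gpr_active(rule, {"g1"}))        # True: isozyme g3 remains
print(gpr_active(rule, {"g1", "g3"}))  # False: complex broken and isozyme gone
```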
3.2.10 Growth-Associated Energy Consumption
A second ATP hydrolysis reaction should be included in the metabolic network reconstruction to account for the energy a cell uses to grow. This includes the energy used for DNA synthesis, as well as for gene transcription and mRNA translation. Since this energy consumption is proportional to the rate of cellular growth, it is generally included as part of the biomass reaction.
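Folding growth-associated energy into the biomass reaction means adding a multiple of the ATP hydrolysis stoichiometry (ATP + H2O → ADP + Pi + H+) to the biomass coefficients. In the sketch below, the metabolite IDs and the value of 50 mmol ATP per gram dry weight are hypothetical placeholders; in practice this value is typically estimated from experimental growth data.

```python
def add_growth_associated_atp(biomass, gam):
    """Return a copy of a biomass reaction with growth-associated ATP hydrolysis added.

    biomass: dict metabolite -> coefficient (consumed < 0, produced > 0)
    gam:     mmol ATP hydrolyzed per gram dry weight of biomass formed
    Adds gam * (ATP + H2O -> ADP + Pi + H+) to the existing coefficients.
    """
    hydrolysis = {"atp": -1.0, "h2o": -1.0, "adp": 1.0, "pi": 1.0, "h": 1.0}
    updated = dict(biomass)
    for met, coeff in hydrolysis.items():
        updated[met] = updated.get(met, 0.0) + gam * coeff
    return updated


# Hypothetical biomass fragment and GAM value, for illustration only.
biomass = {"atp": -1.0, "ala": -0.5, "biomass": 1.0}
print(add_growth_associated_atp(biomass, gam=50.0))
```

Because the hydrolysis terms scale with the biomass flux, the energy cost automatically grows with the predicted growth rate, which is the point made above.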
3.2.12 Nutritional Requirements
It is important to gather all available information about the unique nutritional requirements and preferences of an organism for different growth media. This information should be used to ensure that the appropriate nutrients and associated transport mechanisms are included in the model. Leaving essential nutrients out of the in silico medium could lead to erroneous editing of the metabolic network so that the model can synthesize a metabolite that the organism actually obtains from its environment. This mistake can introduce errors into the prediction of the optimal cellular growth rate by diverting needed metabolites toward incorrectly added anabolic pathways.
3.2.13 Extracellular Transport Reactions
The annotated genomes list proteins that are associated with different metabolite transport mechanisms. One must make sure that each of these proteins is associated with the correct mode of transport. It is particularly important to confirm that the amount of energy consumed by active transport processes is accurately formulated. Finally, it is also necessary to ensure that a means of transport is available for every essential nutrient. If
3.3 Translating the Refined Reconstruction into a Mathematical Model
3.3.1 Mathematical Representation
In order to use COBRA methods to analyze the properties of a system of interest, the curated biochemical reconstruction has to be translated into a mathematical format. The resulting stoichiometric matrix (S) incorporates all the information about the interconversion of metabolites and the structure of the metabolic network. Each column in S corresponds to a metabolic reaction in the network, while each row represents a metabolite. If there are m metabolites and n reactions in the metabolic network, then the dimensions of S are m × n. If metabolite i is a product of reaction j, then the value of $S_{ij}$ is positive; if the metabolite is a reactant, then $S_{ij}$ is negative (see Fig. 1). For each reaction (column), the entries for all metabolites that do not participate in it are zero. The reaction rates (or fluxes) of all reactions in the network are represented by a separate n × 1 vector v (see Note 10). Thus, the dynamic mass balance can be written as:

$$\frac{d\mathbf{X}}{dt} = \mathbf{S} \cdot \mathbf{v}$$

where X is the vector of metabolite concentrations.
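The construction of S and the mass balance above can be made concrete with a toy network. The sketch below is plain Python rather than any toolbox's API; the three reactions are hypothetical, and exchange reactions are written as "A →" so that a negative flux means uptake (a common COBRA convention, assumed here). It builds S with metabolites as rows and reactions as columns, then evaluates dX/dt = S · v for two flux vectors.

```python
def build_stoichiometric_matrix(reactions):
    """Build S (m x n) from reactions given as {reaction_id: {metabolite: coefficient}}.

    Products have positive coefficients and reactants negative ones; metabolites
    that do not take part in a reaction get 0 in that column.
    """
    reaction_ids = list(reactions)                                      # columns
    metabolites = sorted({m for st in reactions.values() for m in st})  # rows
    row_of = {m: i for i, m in enumerate(metabolites)}
    S = [[0.0] * len(reaction_ids) for _ in metabolites]
    for j, rxn in enumerate(reaction_ids):
        for met, coeff in reactions[rxn].items():
            S[row_of[met]][j] = float(coeff)
    return metabolites, reaction_ids, S


def dxdt(S, v):
    """Dynamic mass balance dX/dt = S . v for an n-element flux vector v."""
    return [sum(s * flux for s, flux in zip(row, v)) for row in S]


# Hypothetical toy network: uptake of A (negative exchange flux), A -> B, export of B.
reactions = {"EX_A": {"A": -1}, "R1": {"A": -1, "B": 1}, "EX_B": {"B": -1}}
mets, rxns, S = build_stoichiometric_matrix(reactions)
print(S)                      # [[-1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]
print(dxdt(S, [-2, 2, 2]))    # [0.0, 0.0]: steady state
print(dxdt(S, [-3, 2, 2]))    # [1.0, 0.0]: A accumulates
```

Genome-scale models have thousands of columns with very few nonzero entries each, which is why toolboxes store S as a sparse matrix; the dense list-of-lists here is only for readability.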
3.3.2 Characteristics of the Medium
At this stage of the model building process, one should identify the nutrients that are present in the growth medium of the cell. This is done by defining an additional compartment (E) that envelops the cell. This compartment is permeable to those metabolites that are either imported or exported by the cell. The environment surrounding the cell is characterized by the limits of the flux values for exchange reactions in and out of E.
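In practice, "the limits of the flux values for exchange reactions" are lower and upper bounds on each exchange reaction. The sketch below assumes that exchange reactions are written as "met →" (negative flux means uptake into the cell), uses 1000 as an effectively unbounded value (a common convention, assumed here), and uses hypothetical exchange IDs and uptake rates in mmol/gDW/h. Compounds absent from the medium may be secreted but not taken up.

```python
UNBOUNDED = 1000.0  # large finite bound, a common COBRA convention (assumed here)


def exchange_bounds(exchange_ids, medium):
    """Return {exchange_id: (lower, upper)} describing the medium compartment E.

    Exchange reactions are assumed to be written as 'met ->', so a negative flux
    is uptake into the cell and a positive flux is secretion.
    medium: dict exchange_id -> maximum uptake rate (mmol/gDW/h, positive).
    Compounds missing from `medium` may be secreted but not taken up.
    """
    bounds = {}
    for ex in exchange_ids:
        lower = -medium[ex] if ex in medium else 0.0
        bounds[ex] = (lower, UNBOUNDED)
    return bounds


# Hypothetical minimal medium: glucose and oxygen available, acetate only secreted.
bounds = exchange_bounds(["EX_glc", "EX_o2", "EX_ac"], {"EX_glc": 10.0, "EX_o2": 20.0})
print(bounds)  # {'EX_glc': (-10.0, 1000.0), 'EX_o2': (-20.0, 1000.0), 'EX_ac': (0.0, 1000.0)}
```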
3.3.4 System Constraints
The FBA method is based on three fundamental assumptions:
1. The system is in a metabolic quasi-steady state. This assump-
tion can be justified by the fact that changes in cellular metab-
olism are generally fast in comparison to overall cellular growth
rate and environmental changes.
2. Mass is conserved. Thus, all mass that is imported into the
system is either transformed into biomass or excreted as meta-
bolic byproducts.
3. The cell has an objective and fluxes through various metabolic
reactions are patterned to optimize this cellular goal.
The steady-state assumption means that we can set the mass balance equation so that:

$$\frac{d\mathbf{X}}{dt} = \mathbf{S} \cdot \mathbf{v} = \mathbf{0}$$
Because of the scarcity of experimental measurements, all FBA models have many more unknown reaction rates (columns of S) than linearly independent mass balance equations. Thus, the rank of the stoichiometric matrix is smaller than the number of reactions, and the problem is highly underdetermined. This obstacle can be overcome by using linear programming to find a feasible flux vector that optimizes an objective function. The most commonly maximized objective function for FBA models is cellular growth (i.e., production of biomass) (see Note 11).
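Written out, the linear program described above takes the standard form used in FBA (see, e.g., Orth et al. [29]). The objective weight vector c and the flux bounds are symbols introduced here for illustration; c typically has a 1 in the position of the biomass reaction and 0 elsewhere.

```latex
\begin{aligned}
\max_{\mathbf{v}} \quad & Z = \mathbf{c}^{\mathsf{T}} \mathbf{v} \\
\text{subject to} \quad & \mathbf{S}\,\mathbf{v} = \mathbf{0} \\
& v_j^{\min} \le v_j \le v_j^{\max}, \qquad j = 1, \ldots, n
\end{aligned}
```

The bounds are where the other constraints enter the problem: the medium composition is set through the bounds on exchange reactions into compartment E, and an irreversible reaction has $v_j^{\min} = 0$.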
3.3.5 Debugging the Network
Once the mathematical conversion has been completed, the model must be run to ensure that it can produce all biomass components and make predictions that agree with experimental observations. The most common cause of discrepancies is the presence of unresolved network gaps. As previously mentioned, annotated genomes generally lack some functionalities. Using annotations from multiple sources can provide a more complete network reconstruction [35]. Once modelers have ensured that all information that can be collected from a genome has been incorporated into the network reconstruction, they can use a variety of tools, such as the gap filling function in the COBRA toolbox or the pathway hole filler module of the Pathway Tools program [80], to ensure that the model predicts the correct phenotype.
To make sure that all necessary reactions are included in the model, it is crucial to examine all experimental data and confirm that the model has the metabolic capacity to mimic observed bacterial behavior under various conditions. For example, if experimental measurements show that an organism can consume succinate as its sole carbon source but the model cannot grow on succinate, the modeler can set succinate as the sole carbon source in the in silico medium and run the gap filler to add the missing reactions to the network.
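Before running a gap filler, it often helps to locate where the gaps are. The standalone sketch below (not a COBRA toolbox or Pathway Tools function) lists dead-end metabolites: those that are only produced, only consumed, or that appear in a single reaction. At steady state (S · v = 0), every reaction touching such a metabolite is forced to carry zero flux, so dead ends are a common symptom of missing reactions. The toy network is hypothetical, with exchange reactions written as "met →" and marked reversible.

```python
from collections import Counter


def dead_end_metabolites(reactions, reversible):
    """List metabolites that are only produced, only consumed, or in one reaction.

    reactions:  dict reaction_id -> {metabolite: coefficient}
    reversible: set of reaction IDs that may carry flux in both directions
    At steady state (S v = 0) every reaction touching such a metabolite is
    forced to zero flux, so these are common symptoms of network gaps.
    """
    produced, consumed, n_reactions = set(), set(), Counter()
    for rxn, stoich in reactions.items():
        for met, coeff in stoich.items():
            if coeff == 0:
                continue
            n_reactions[met] += 1
            if rxn in reversible or coeff > 0:
                produced.add(met)
            if rxn in reversible or coeff < 0:
                consumed.add(met)
    return sorted(m for m in n_reactions
                  if n_reactions[m] == 1 or m not in produced or m not in consumed)


# Hypothetical network: nothing consumes C, so R2 is blocked.
reactions = {
    "EX_A": {"A": -1},
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "C": 1},
    "EX_B": {"B": -1},
}
print(dead_end_metabolites(reactions, {"EX_A", "EX_B"}))   # ['C']
```

Here a gap filler would be expected to propose a reaction (or an exchange) that consumes C.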
3.3.6 MEMOTE
Genome-scale models of metabolism are large and complex knowledgebases that include thousands of metabolites, reactions, and gene associations. As the field of COBRA modeling grows, particularly toward the analysis of complex multi-organism systems, it is becoming clear that there is a need for a uniform description format for GSMs and for ensuring that published models pass a set of rigorous quality control measures. To this end, a community-maintained open-source software package called Memote (Metabolic Model Tests, https://github.com/opencobra/memote) [81] has been developed to test submitted models for a variety of common problems, ranging from annotation issues to stoichiometric inconsistencies and problems with biomass composition. Memote is a very helpful tool for curating COBRA models, and it can even be configured to test models automatically as curation progresses.

Memote offers the quality control needed for developing evermore complex models and simplifies the collaborations needed for such efforts to succeed. Most members of the COBRA community now agree that all models submitted for publication should include comprehensive Memote-generated reports verifying the quality of the submitted work.
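One kind of problem such quality reports cover is whether each reaction is mass-balanced. The sketch below is not Memote code; it reproduces only the elemental part of such a check for one reaction, under the assumption of simple formulas without parentheses or charges. The hexokinase example uses formulas assumed here for the ionized forms common in genome-scale models; dropping the proton leaves one hydrogen atom unbalanced.

```python
import re
from collections import Counter

# Element symbol followed by an optional count, e.g. "C6" or "H".
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses or charges)."""
    atoms = Counter()
    for element, count in FORMULA_TOKEN.findall(formula):
        atoms[element] += int(count) if count else 1
    return atoms


def element_imbalance(stoich, formulas):
    """Return {element: net atoms} for one reaction; an empty dict means balanced.

    stoich:   dict metabolite -> coefficient (reactants < 0, products > 0)
    formulas: dict metabolite -> chemical formula
    """
    net = Counter()
    for met, coeff in stoich.items():
        for element, count in parse_formula(formulas[met]).items():
            net[element] += coeff * count
    return {element: n for element, n in net.items() if n != 0}


# Hexokinase, glc + atp -> g6p + adp + h, with formulas of the charged species.
formulas = {"glc": "C6H12O6", "atp": "C10H12N5O13P3", "g6p": "C6H11O9P",
            "adp": "C10H12N5O10P2", "h": "H"}
hexokinase = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1}
print(element_imbalance(hexokinase, formulas))       # {}
missing_proton = {k: v for k, v in hexokinase.items() if k != "h"}
print(element_imbalance(missing_proton, formulas))   # {'H': -1}
```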
3.4 Applications of FBA Models
The availability of annotated genomes has led to a dramatic increase in the number of genome-scale metabolic models that are being developed. Concurrently, the number and scope of theoretical methods for investigating these reconstructions is also expanding. The applications of these models are too numerous to list here. However, a number of excellent reviews have been published that thoroughly categorize and detail the most prominent uses of constraint-based metabolic models. The interested reader can examine manuscripts by Price et al. [82], Oberhardt et al. [27], Feist and Palsson [83], Milne et al. [84], Liu et al. [85], and Bordbar et al. [86].
4 Notes
Acknowledgments
This work was funded in part by the DOE OBER Genomic Science
program and LLNL Laboratory Directed Research and Develop-
ment funding and performed under the auspices of the
U.S. Department of Energy at Lawrence Livermore National Lab-
oratory under Contract DE-AC52-07NA27344.
References
1. Almaas E, Kovacs B, Vicsek T, Oltvai ZN, Barabasi AL (2004) Global organization of metabolic fluxes in the bacterium Escherichia coli. Nature 427(6977):839–843
2. Almaas E (2007) Optimal flux patterns in cellular metabolic networks. Chaos 17(2):026107
3. Almaas E, Oltvai ZN, Barabasi AL (2005) The activity reaction core and plasticity of metabolic networks. PLoS Comput Biol 1(7):e68
4. Gagneur J, Jackson DB, Casari G (2003) Hierarchical analysis of dependency in metabolic networks. Bioinformatics 19(8):1027–1034
5. Li G, Cao H, Xu Y (2018) Structural and functional analyses of microbial metabolic networks reveal novel insights into genome-scale metabolic fluxes. Brief Bioinform 20(4):1590–1603
6. Segre D, Vitkup D, Church GM (2002) Analysis of optimality in natural and perturbed metabolic networks. Proc Natl Acad Sci U S A 99(23):15112–15117
7. Deutscher D, Meilijson I, Kupiec M, Ruppin E (2006) Multiple knockout analysis of genetic robustness in the yeast metabolic network. Nat Genet 38(9):993–998
8. Jamshidi N, Palsson BO (2006) Systems biology of SNPs. Mol Syst Biol 2:38
9. Edwards JS, Palsson BO (2000) Metabolic flux balance analysis and the in silico analysis of Escherichia coli K-12 gene deletions. BMC Bioinformatics 1:1
10. Ho W-C, Zhang J (2016) Adaptive genetic robustness of Escherichia coli metabolic fluxes. Mol Biol Evol 33(5):1164–1176
11. Reed JL, Famili I, Thiele I, Palsson BO (2006) Towards multidimensional genome annotation. Nat Rev Genet 7(2):130–141
12. Navid A, Almaas E (2009) Genome-scale reconstruction of the metabolic network in Yersinia pestis, strain 91001. Mol BioSyst 5(4):368–375
13. Pal C, Papp B, Lercher MJ (2005) Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nat Genet 37(12):1372–1375
14. Pal C, Papp B, Lercher MJ (2005) Horizontal gene transfer depends on gene content of the host. Bioinformatics 21(Suppl 2):222–223
15. Pal C, Papp B, Lercher MJ, Csermely P, Oliver SG, Hurst LD (2006) Chance and necessity in the evolution of minimal metabolic networks. Nature 440(7084):667–670
16. Großkopf T, Consuegra J, Gaffé J, Willison JC, Lenski RE, Soyer OS et al (2016) Metabolic modelling in a dynamic evolutionary framework predicts adaptive diversification of bacteria in a long-term evolution experiment. BMC Evol Biol 16(1):163
17. Pharkya P, Burgard AP, Maranas CD (2003) Exploring the overproduction of amino acids using the bilevel optimization framework OptKnock. Biotechnol Bioeng 84(7):887–899
18. Burgard AP, Pharkya P, Maranas CD (2003) Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol Bioeng 84(6):647–657
19. Pharkya P, Burgard AP, Maranas CD (2004) OptStrain: a computational framework for redesign of microbial production systems. Genome Res 14(11):2367–2376
20. Fong SS, Burgard AP, Herring CD, Knight EM, Blattner FR, Maranas CD et al (2005) In silico design and adaptive evolution of Escherichia coli for production of lactic acid. Biotechnol Bioeng 91(5):643–648
21. Park JH, Lee KH, Kim TY, Lee SY (2007)
22. Yoshikawa K, Toya Y, Shimizu H (2017) Metabolic engineering of Synechocystis sp. PCC 6803 for enhanced ethanol production based on flux balance analysis. Bioprocess Biosyst Eng 40(5):791–796
23. Schuetz R, Kuepfer L, Sauer U (2007) Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Mol Syst Biol 3:119
24. Schuetz R, Zamboni N, Zampieri M, Heinemann M, Sauer U (2012) Multidimensional optimality of microbial metabolism. Science 336(6081):601–604
25. Navid A, Jiao Y, Wong SE, Pett-Ridge J (2019) System-level analysis of metabolic trade-offs during anaerobic photoheterotrophic growth in Rhodopseudomonas palustris. BMC Bioinformatics 20(1):233
26. Peyraud R, Cottret L, Marmiesse L, Gouzy J, Genin S (2016) A resource allocation trade-off between virulence and proliferation drives metabolic versatility in the plant pathogen Ralstonia solanacearum. PLoS Pathogens 12(10):e1005939
27. Oberhardt MA, Palsson BO, Papin JA (2009) Applications of genome-scale metabolic reconstructions. Mol Syst Biol 5:320
28. Varma A, Palsson BO (1994) Metabolic flux balancing: basic concepts, scientific and practical use. Nat Biotechnol 12(10):994–998
29. Orth JD, Thiele I, Palsson BO (2010) What is flux balance analysis? Nat Biotechnol 28(3):245–248
30. Gottstein W, Olivier BG, Bruggeman FJ, Teusink B (2016) Constraint-based stoichiometric modelling from single organisms to microbial communities. J R Soc Interface 13(124):20160627
31. Keseler IM, Collado-Vides J, Santos-Zavaleta A, Peralta-Gil M, Gama-Castro S, Muniz-Rascado L et al (2011) EcoCyc: a comprehensive database of Escherichia coli biology. Nucleic Acids Res 39(Database issue):D583–D590
32. Markowitz VM, Chen IM, Palaniappan K, Chu K, Szeto E, Grechkin Y et al (2010) The integrated microbial genomes system: an expanding comparative analysis resource. Nucleic Acids Res 38(Database issue):D382–D390
33. Maglott D, Ostell J, Pruitt KD, Tatusova T (2007) Entrez gene: gene-centered informa-
Metabolic engineering of Escherichia coli for tion at NCBI. Nucleic Acids Res 35(Database
the production of L-valine based on transcrip- issue):D26–D31
tome analysis and in silico gene knockout sim- 34. Arkin AP, Cottingham RW, Henry CS, Harris
ulation. Proc Natl Acad Sci U S A 104 NL, Stevens RL, Maslov S et al (2018) KBase:
(19):7797–7802 The United States Department of Energy
336 Ali Navid
Systems Biology Knowledgebase. Nat Biotech- 46. Schellenberger J, Que R, Fleming RMT,
nol 36(7):566 Thiele I, Orth JD, Feist AM et al (2011) Quan-
35. Griesemer M, Kimbrel JA, Zhou CE, Navid A, titative prediction of cellular metabolism with
D’haeseleer P (2018) Combining multiple constraint-based models: the COBRA Toolbox
functional annotation tools increases coverage v2. 0. Nat Protoc 6(9):1290–1307
of metabolic annotation. BMC Genomics 19 47. Heirendt L, Arreckx S, Pfau T, Mendoza SN,
(1):948 Richelle A, Heinken A et al (2019) Creation
36. Karlsen E, Schulz C, Almaas E (2018) Auto- and analysis of biochemical constraint-based
mated generation of genome-scale metabolic models using the COBRA Toolbox v. 3.0. Nat
draft reconstructions based on KEGG. BMC Protoc 2019:1
Bioinformatics 19(1):467 48. Mahadevan R, Schilling CH (2003) The effects
37. Machado D, Andrejev S, Tramontano M, Patil of alternate optimal solutions in constraint-
KR (2018) Fast automated reconstruction of based genome-scale metabolic models. Metab
genome-scale metabolic models for microbial Eng 5(4):264–276
species and communities. Nucleic Acids Res 46 49. Mahadevan R, Edwards JS, Doyle FJ (2002)
(15):7542–7553 Dynamic flux balance analysis of diauxic
38. DeJongh M, Formsma K, Boillot P, Gould J, growth in Escherichia coli. Biophys J 83
Rycenga M, Best A (2007) Toward the auto- (3):1331
mated generation of genome-scale metabolic 50. Aziz RK, Bartels D, Best AA, DeJongh M,
networks in the SEED. BMC Bioinformatics Disz T, Edwards RA et al (2008) The RAST
8:139 Server: rapid annotations using subsystems
39. Henry CS, DeJongh M, Best AA, Frybarger technology. BMC Genomics 9(1):75
PM, Linsay B, Stevens RL (2010) High- 51. Schomburg I, Chang A, Hofmann O,
throughput generation, optimization and anal- Ebeling C, Ehrentreich F, Schomburg D
ysis of genome-scale metabolic models. Nat (2002) BRENDA: a resource for enzyme data
Biotechnol 28(9):977–982 and metabolic information. Trends Biochem
40. Dias O, Rocha M, Ferreira EC, Rocha I (2015) Sci 27(1):54–56
Reconstructing genome-scale metabolic mod- 52. Chang A, Scheer M, Grote A, Schomburg I,
els with merlin. Nucleic Acids Res 43 Schomburg D (2009) BRENDA, AMENDA
(8):3899–3910 and FRENDA the enzyme information system:
41. Karp PD, Latendresse M, Paley SM, new content and tools in 2009. Nucleic Acids
Krummenacker M, Ong QD, Billington R Res 37(Suppl 1):D588
et al (2015) Pathway Tools version 19.0 53. Kanehisa M, Goto S (2000) KEGG: Kyoto
update: software for pathway/genome infor- encyclopedia of genes and genomes. Nucleic
matics and systems biology. Brief Bioinform Acids Res 28(1):27
17(5):877–890 54. Kanehisa M, Goto S, Kawashima S, Okuno Y,
42. Agren R, Liu L, Shoaie S, Vongsangnak W, Hattori M (2004) The KEGG resource for
Nookaew I, Nielsen J (2013) The RAVEN deciphering the genome. Nucleic Acids Res
toolbox and its use for generating a genome- 32(suppl 1):D277
scale metabolic model for Penicillium chryso- 55. Caspi R, Foerster H, Fulcher CA, Kaipa P,
genum. PLoS Comput Biol 9(3):e1002980 Krummenacker M, Latendresse M et al
43. Wang H, Marcišauskas S, Sánchez BJ, (2008) The MetaCyc database of metabolic
Domenzain I, Hermansson D, Agren R et al pathways and enzymes and the BioCyc collec-
(2018) RAVEN 2.0: A versatile toolbox for tion of pathway/genome databases. Nucleic
metabolic network reconstruction and a case Acids Res 36(Database issue):D623–D631
study on Streptomyces coelicolor. PLoS Com- 56. Ren Q, Kang KH, Paulsen IT (2004) Trans-
put Biol 14(10):e1006541 portDB: a relational database of cellular mem-
44. Swainston N, Smallbone K, Mendes P, Kell brane transport systems. Nucleic Acids Res 32
DB, Paton NW (2011) The SuBliMinaL Tool- (suppl 1):D284
box: automating steps in the reconstruction of 57. Ren Q, Chen K, Paulsen IT (2006) Trans-
metabolic networks. J Integr Bioinform 8 portDB: a comprehensive database resource
(2):187–203 for cytoplasmic membrane transport systems
45. Becker SA, Feist AM, Mo ML, Hannum G, and outer membrane channels. Nucleic Acids
Palsson BO, Herrgard MJ (2007) Quantitative Res 35(suppl 1):D274
prediction of cellular metabolism with 58. Fritzemeier CJ, Hartleb D, Szappanos B,
constraint-based models: the COBRA Tool- Papp B, Lercher MJ (2017) Erroneous
box. Nat Protoc 2(3):727–738 energy-generating cycles in published genome
Curating COBRA Models of Microbial Metabolism 337
scale metabolic networks: Identification and growth and metabolic by-product secretion in
removal. PLoS Comput Biol 13(4):e1005494 wild-type Escherichia coli W3110. Appl Envi-
59. Alberty RA (1998) Calculation of standard ron Microbiol 60(10):3724–3731
transformed formation properties of biochem- 71. Neidhardt FC, Curtiss R III, Ingraham J,
ical reactants and standard apparent reduction Lin E, Low K, Magasanik B et al (1996) Escher-
potentials of half reactions. Arch Biochem Bio- ichia coli and salmonella: cellular and molecular
phys 358(1):25–39 biology. Sigma-Aldrich, Washington DC
60. Alberty RA (1998) Calculation of standard 72. Tekaia F, Yeramian E, Dujon B (2002) Amino
transformed Gibbs energies and standard trans- acid composition of genomes, lifestyles of
formed enthalpies of biochemical reactants. organisms, and evolutionary trends: a global
Arch Biochem Biophys 353(1):116–130 picture with correspondence analysis. Gene
61. Kummel A, Panke S, Heinemann M (2006) 297(1–2):51–60
Systematic assignment of thermodynamic con- 73. Dumontier M, Michalickova K, Hogue C
straints in metabolic network models. BMC (2002) Species-specific protein sequence and
Bioinformatics 7:512 fold optimizations. BMC Bioinformatics 3
62. Mavrovouniotis ML (1990) Group contribu- (1):39
tions for estimating standard gibbs energies of 74. Chan SHJ, Cai J, Wang L, Simons-Senftle MN,
formation of biochemical compounds in aque- Maranas CD (2017) Standardizing biomass
ous solution. Biotechnol Bioeng 36 reactions and ensuring complete mass balance
(10):1070–1082 in genome-scale metabolic models. Bioinfor-
63. Jankowski MD, Henry CS, Broadbelt LJ, Hat- matics 33(22):3603–3609
zimanikatis V (2008) Group contribution 75. Feist AM, Palsson BO (2010) The biomass
method for thermodynamic analysis of com- objective function. Curr Opin Microbiol 13
plex metabolic networks. Biophys J 95 (3):344–349
(3):1487–1499 76. Tang YJ, Martin HG, Myers S, Rodriguez S,
64. Henry CS, Jankowski MD, Broadbelt LJ, Hat- Baidoo EEK, Keasling JD (2009) Advances in
zimanikatis V (2006) Genome-scale thermody- analysis of microbial metabolic fluxes via 13C
namic analysis of Escherichia coli metabolism. isotopic labeling. Mass Spectrom Rev 28
Biophys J 90(4):1453–1461 (2):362–375
65. Henry CS, Broadbelt LJ, Hatzimanikatis V 77. Fischer E, Zamboni N, Sauer U (2004) High-
(2007) Thermodynamics-based metabolic flux throughput metabolic flux analysis based on
analysis. Biophys J 92(5):1792–1805 gas chromatography-mass spectrometry
66. Feist AM, Henry CS, Reed JL, derived 13C constraints. Anal Biochem 325
Krummenacker M, Joyce AR, Karp PD et al (2):308–316
(2007) A genome-scale metabolic reconstruc- 78. Sauer U (2006) Metabolic networks in motion:
tion for Escherichia coli K-12 MG1655 that 13C-based flux analysis. Mol Syst Biol 2:62
accounts for 1260 ORFs and thermodynamic 79. Stewart BJ, Navid A, Turteltaub KW, Bench G
information. Mol Syst Biol 3:121 (2010) Yeast dynamic metabolic flux measure-
67. Tanaka M, Okuno Y, Yamada T, Goto S, ment in nutrient-rich media by HPLC and
Uemura S, Kanehisa M (2003) Extraction of a accelerator mass spectrometry. Anal Chem 82
thermodynamic property for biochemical reac- (23):9812–9817
tions in the metabolic pathway. Genome Infor- 80. Green ML, Karp PD (2004) A Bayesian
matics Ser 14:370–371 method for identifying missing enzymes in pre-
68. Hucka M, Finney A, Sauro HM, Bolouri H, dicted metabolic pathway databases. BMC Bio-
Doyle JC, Kitano H et al (2003) The systems informatics 5:76
biology markup language (SBML): a medium 81. Lieven C, Beber ME, Olivier BG, Bergmann
for representation and exchange of biochemical FT, Ataman M, Babaei P et al (2020) MEM-
network models. Bioinformatics 19 OTE for standardized genome-scale metabolic
(4):524–531 model testing. Nat Biotechnol 38(3):272–276
69. Hucka M, Bergmann FT, Dr€ager A, Hoops S, 82. Price ND, Reed JL, Palsson BO (2004)
Keating SM, Le Novère N et al (2018) The Genome-scale models of microbial cells: evalu-
Systems Biology Markup Language (SBML): ating the consequences of constraints. Nat Rev
language specification for level 3 version Microbiol 2(11):886–897
2 core. J Integr Bioinformatics 15 83. Feist AM, Palsson BO (2008) The growing
(1):20170081 scope of applications of genome-scale meta-
70. Varma A, Palsson BO (1994) Stoichiometric bolic reconstructions using Escherichia coli.
flux balance models quantitatively predict Nat Biotechnol 26(6):659–667
338 Ali Navid
84. Milne CB, Kim PJ, Eddy JA, Price ND (2009) stationary fluxes in metabolic networks. Eur J
Accomplishments in genome-scale in silico Biochem 271(14):2905–2922
modeling for industrial and medical biotech- 89. Oliveira AP, Nielsen J, Förster J (2005) Mod-
nology. Biotechnol J 4(12):1653–1670 eling Lactococcus lactis using a genome-scale
85. Liu L, Agren R, Bordel S, Nielsen J (2010) Use flux model. BMC Microbiol 5(1):39
of genome-scale metabolic models for under- 90. Kauffman KJ, Prakash P, Edwards JS (2003)
standing microbial physiology. FEBS Lett 584 Advances in flux balance analysis. Curr Opin
(12):2556–2564 Biotechnol 14(5):491–496
86. Bordbar A, Monk JM, King ZA, Palsson BO 91. Krummenacker M, Paley S, Mueller L, Yan T,
(2014) Constraint-based models predict meta- Karp PD (2005) Querying and computing
bolic and associated cellular functions. Nat Rev with BioCyc databases. Bioinformatics 21
Genet 15(2):107–120 (16):3454–3455
87. Knorr AL, Jain R, Srivastava R (2007) 92. Sayers EW, Beck J, Brister JR, Bolton EE,
Bayesian-based selection of metabolic objective Canese K, Comeau DC et al (2019) Database
functions. Bioinformatics 23(3):351–357 resources of the National Center for Biotech-
88. Holzhutter HG (2004) The principle of flux nology Information. Nucleic Acids Res 48
minimization and its application to estimate (D1):D9–D16
Chapter 15
Abstract
The COBRA toolbox is one of the most popular tools for systems biology analyses using genome-scale
metabolic reconstructions. The toolbox permits the use of many constraint-based analytical methods for
examining characteristics of metabolism in biosystems ranging in complexity from single cells to
microbial communities and ultimately multicellular organisms. The toolbox has a number of different
variants that can be used depending on a user's choice of programming language. Here, I provide a basic
tutorial for beginners who plan to use the original MATLAB version of the toolbox.
Key words Systems biology, Genome-scale models, Constraint-based analysis, FBA, Metabolic networks,
COBRA toolbox
1 Introduction
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349,
https://doi.org/10.1007/978-1-0716-1585-0_15, © Springer Science+Business Media, LLC, part of Springer Nature 2022
2 Materials
3 Methods
3.1 Initializing the Toolbox
First, download the package from the COBRA website and install it. There are different ways
of installing the toolbox depending on your computer's operating system. For a complete set of
A Beginner’s Guide to the COBRA Toolbox 341
Fig. 1 After initializing, the COBRA toolbox lists the available solvers that can be used with the toolbox and
the types of mathematical problems that each can solve. A list of the models that are included with the
installation can be seen on the left side of the figure
initCobraToolbox()
If you do not enter any value inside the parentheses, the default
value of true will be used and the program will check for any new
updates available for the toolbox from the COBRA Toolbox GitHub
repository. If you are not interested in checking for new updates and want
to skip this step, enter false in the parentheses, i.e., initCobraToolbox(false).
During the initialization process, the program checks for the presence
of the software needed for various types of analysis. It provides a
report on the availability of different solvers and the types of
problems they can solve (see Fig. 1). The default solver for LP and
mixed-integer linear programming (MILP) problems is the open-source
GNU Linear Programming Kit (GLPK), unless you have the
Gurobi solver (free for academic users; others need to purchase a
license, https://www.gurobi.com/) installed on your computer. If
you have access to Gurobi, then that solver will be used as the default
for LP, MILP, and quadratic problems. The MATLAB solver can be
used for nonlinear programming problems. By using the command
changeCobraSolver, you can get a list of which solvers are being
used for each type of problem. If you want to change the solver
for a type of problem, then you can use the same command but
with inputs that identify your choice of solver and the problem type. For
example, if you want to use the pdco solver for solving LP problems, you can
change the solver with this command:
changeCobraSolver('pdco','LP');
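The following is a minimal start-up sketch. It assumes the toolbox is already on the MATLAB path and that GLPK is available. Passing false to initCobraToolbox skips the GitHub update check described above. The logical value returned by changeCobraSolver reports whether the requested solver could be set.

```matlab
% Initialize the toolbox without checking GitHub for updates
initCobraToolbox(false);

% Use GLPK for LP problems; solverOK is true if the change succeeded
solverOK = changeCobraSolver('glpk', 'LP');
if ~solverOK
    error('GLPK could not be set as the LP solver.');
end
```

Checking solverOK at the start of a script helps catch a missing or unlicensed solver before any analysis is run.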
3.2 Reading Models into MATLAB
In order to analyze any model, you will first need to import it into
MATLAB. Version 3 of the toolbox can accept a number of different
file types, including systems biology markup language (SBML)
(.xml), Excel sheets (.xls), and MATLAB files (.mat).
Although all these formats have been used for publishing
and sharing models, there is a strong push for standardizing the
modeling conventions in the COBRA field. Recently, a community-developed
suite of tests named MEMOTE [19] was introduced
to assess the quality of genome-scale models (GSMs),
ensure easy reusability, and enhance simulation reproducibility.
The developers of MEMOTE have advocated the adoption of the
SBML Level 3 Flux Balance Constraints (FBC) package [20] as the primary
format for future published GSMs.
To import a model into the COBRA toolbox, you need to
change your directory to the one where your model file resides.
To access the SBML version of a small model of core metabolic
reactions in E. coli that comes with the COBRA toolbox, you need to
go to the directory where that .xml file resides. To get there, from
your COBRA toolbox home directory, type:
cd test/models/xml
You can also go to this folder from the ‘Current Folder’ panel
on the left side of your MATLAB screen (see Fig. 1). Once in the
correct directory, you can choose to read the file into MATLAB
either directly using the name of the file (e.g., ecoli_core_model.xml)
or by making the name of the file a MATLAB variable. To load
the test model, one could use:
model=readCbModel('ecoli_core_model.xml');
or
modelfile='ecoli_core_model.xml';
model=readCbModel(modelfile);
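After loading, a quick look at the model's dimensions confirms that the file was read correctly. This sketch assumes the current folder is the test/models/xml folder described above. It uses the standard COBRA model fields S, mets, rxns, and genes.

```matlab
% Assumes the current folder is test/models/xml (see above)
model = readCbModel('ecoli_core_model.xml');

% S is a sparse metabolites-by-reactions stoichiometric matrix
[nMets, nRxns] = size(model.S);
fprintf('%d metabolites, %d reactions, %d genes\n', ...
    nMets, nRxns, numel(model.genes));

% The rows and columns of S are labeled by model.mets and model.rxns
disp(model.mets(1:5));
disp(model.rxns(1:5));
```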
Fig. 2 Double-clicking on the model name in the ‘Workspace’ panel (right side) will display a list of different
arrays and matrices that form the model knowledgebase
3.3 Model Manipulation
One of the main reasons for building mathematical models is to
manipulate them and see how various changes will alter the behavior
of a system. In the case of COBRA models, this could involve losing
or gaining functions (i.e., deleting/blocking or adding reactions,
respectively) as well as changing the constraints of the system, such
as altering the upper or lower flux bounds for a reaction.
3.3.1 Adding and Removing Metabolites
Metabolites are usually the nodes of a metabolic network. They are
the compounds that proteins either chemically alter or transport
between compartments. The addMetabolite command is used
to enter new metabolites into a model. For example, to add pyrophosphate
to the core E. coli model (henceforth the model) using
the ppi[c] identifier, one can use the command:
model=addMetabolite(model,'ppi[c]');
model=addMetabolite(model,'asp-L[c]','metName', ...
'L-Aspartate[c]','metFormula','C4H6NO4', ...
'ChEBIID','17053','KEGGId','C00049','Charge',-1);
model=addMultipleMetabolites(model,...
{'asn-L[c]','2oxosucc[c]'},'metNames',...
{'L-Asparagine[c]','2-oxosuccinamate[c]'}, ...
'metFormulas',{'C4H8N2O3','C4H4NO4'}, ...
'metChEBIID',{'17196','16237'},'metKEGGID', ...
{'C00152','C02362'},'metCharges',[0,-1]);
model=removeMetabolites(model,{'asp-L[c]','2oxosucc[c]'});
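You can confirm that these commands behaved as expected by checking model.mets before and after each step. The short sketch below starts from the unmodified core model, in which asp-L[c] is not present. It uses findMetIDs, which returns the index of a metabolite in model.mets.

```matlab
model = readCbModel('ecoli_core_model.xml');
nMetsBefore = numel(model.mets);

% Add L-aspartate with its formula and charge
model = addMetabolite(model, 'asp-L[c]', 'metName', 'L-Aspartate', ...
    'metFormula', 'C4H6NO4', 'Charge', -1);
isAdded = ismember('asp-L[c]', model.mets);
idx = findMetIDs(model, 'asp-L[c]');
fprintf('%s: %s, charge %d\n', model.mets{idx}, ...
    model.metFormulas{idx}, model.metCharges(idx));

% Remove it again; the metabolite count returns to its original value
model = removeMetabolites(model, {'asp-L[c]'});
isRemoved = ~ismember('asp-L[c]', model.mets);
```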
3.3.2 Adding, Removing, or Modifying Reactions
There are multiple types of reactions in each GSM, and accordingly,
there are multiple ways of adding a new reaction to a model. A
new reaction that deals with biochemical conversions can be added
using the addReaction command. There are two ways of using
this command, and your choice will dictate what information is
added within the parentheses following the command.
Addition Using Reaction Formula
When using this method, you simply type the chemical equation
into the addReaction command. For example, to add the
aspartate transaminase reaction, a biochemical process that involves
two metabolites of the Krebs cycle (2-oxoglutarate and oxaloacetate), one can use:
model=addReaction(model,'ASPTA','reactionFormula',...
'akg[c] + asp-L[c] <=> glu-L[c] + oaa[c]');
If the command proceeds without any issues, then you will see
that ASPTA (the name used for this reaction in GSMs from the
BiGG database [23, 24]) (see Note 4) is added to the end of the
rxns array in the model. Also, the number of reactions in your
model should increase by one.
If the added reaction contains metabolites that are not already
present in the model, then those metabolites will automatically be
added to the model. For example, L-aspartate (asp-L[c]) is not a
compound in the original model, and by adding the ASPTA reaction,
you add asp-L[c] as a new compound to the model.
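Both effects described above can be checked directly: a new entry appears at the end of model.rxns, and asp-L[c] is added automatically. This is a minimal sketch that starts from the unmodified core model.

```matlab
model = readCbModel('ecoli_core_model.xml');
nRxnsBefore = numel(model.rxns);
nMetsBefore = numel(model.mets);

model = addReaction(model, 'ASPTA', 'reactionFormula', ...
    'akg[c] + asp-L[c] <=> glu-L[c] + oaa[c]');

% ASPTA is appended to the end of the reaction list
fprintf('Reactions: %d -> %d (last: %s)\n', nRxnsBefore, ...
    numel(model.rxns), model.rxns{end});
% asp-L[c] was not in the model, so it was added automatically
fprintf('Metabolites: %d -> %d\n', nMetsBefore, numel(model.mets));
printRxnFormula(model, 'rxnAbbrList', {'ASPTA'});
```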
Addition Using Separate Lists Method
You can also add a reaction to your model by using a method in
which two separate lists, one of reaction metabolites and the other
of their stoichiometric coefficients, describe the reaction. For example,
to add the ASPTA reaction to the model one can use:
model=addReaction(model,'ASPTA','metaboliteList', ...
{'akg[c]','asp-L[c]','glu-L[c]','oaa[c]'},'stoichCoeffList', ...
[-1; -1; 1; 1],'reversible',true);
model=addReaction(model,'ASPTA','metaboliteList',...
{'akg[c]','glu-L[c]','asp-L[c]','oaa[c]'},'stoichCoeffList', ...
[-1; 1; -1; 1],'reversible',true);
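Each coefficient is paired with its metabolite, so the order of the lists does not matter. The list-based call produces the same column of S as the formula-based one. The sketch below checks this on two separate copies of the core model. The IDs ASPTA_f and ASPTA_l are hypothetical names used only for this comparison.

```matlab
base = readCbModel('ecoli_core_model.xml');

% Formula-based definition (hypothetical ID ASPTA_f)
mFormula = addReaction(base, 'ASPTA_f', 'reactionFormula', ...
    'akg[c] + asp-L[c] <=> glu-L[c] + oaa[c]');

% List-based definition with the metabolites in a different order
% (hypothetical ID ASPTA_l)
mList = addReaction(base, 'ASPTA_l', 'metaboliteList', ...
    {'akg[c]', 'glu-L[c]', 'asp-L[c]', 'oaa[c]'}, ...
    'stoichCoeffList', [-1; 1; -1; 1], 'reversible', true);

% Both copies gained asp-L[c] in the same row, so the new columns are comparable
sameColumn = isequal(mFormula.S(:, end), mList.S(:, end));
fprintf('Identical stoichiometry: %d\n', sameColumn);
```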
Adding Multiple Reactions
When adding multiple reactions to a model, the addMultipleReactions
command should be used to accelerate the process. For
example, the following command can be used to add two new
reactions (asparagine synthesis by L-aspartate:L-glutamine amido-ligase
and asparagine hydrolysis by L-asparagine amidohydrolase) to the E. coli model:
model=addMultipleReactions(model,{'ASNS3_1','ASNN'},...
{'asp-L[c]','atp[c]','nh4[c]','gln-L[c]','amp[c]', ...
'asn-L[c]','ppi[c]','glu-L[c]','h2o[c]'},[-1 1;-1 0; ...
0 1; -1 0;1 0; 1 -1; 1 0; 1 0; -1 -1],'rxnNames',...
{'L-aspartate:L-glutamine amido-ligase',...
'L-asparagine amidohydrolase'},'lb',[0,0],'ub', ...
[10,10],'subSystems',...
{'Alanine, aspartate and glutamate metabolism',...
'Alanine, aspartate and glutamate metabolism'},'grRules',...
{'b0677','b0828 or b1767 or b2957'});
printRxnFormula(model,'rxnAbbrList',{'ASNS3_1','ASNN'}, ...
'metNameFlag',false,'gprFlag',true);
Adding Non-metabolic Reactions
While you can add any type of reaction using the addReaction
and addMultipleReactions commands, the COBRA toolbox also contains
a number of commands for quick addition of non-metabolic
reactions. For example, demand reactions are irreversible reactions
that represent metabolite consumption processes when one is not
concerned with the subsequent fate of the compound. For such
cases, the addDemandReaction command can be used. To add a
demand reaction to the model for cytoplasmic oxaloacetate,
one can use:
model=addDemandReaction(model,'oaa[c]');
model=addExchangeRxn(model,{'oaa[e]'},0,0);
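addDemandReaction also returns the IDs of the reactions it creates. Those IDs are useful when you later want to change or remove the reactions. This short sketch assumes the second output is the list of new reaction IDs. The bound check reflects the definition of demand reactions as irreversible, given above.

```matlab
model = readCbModel('ecoli_core_model.xml');

% The second output lists the IDs of the new demand reactions
[model, dmRxns] = addDemandReaction(model, 'oaa[c]');
idx = findRxnIDs(model, dmRxns);

printRxnFormula(model, 'rxnAbbrList', dmRxns);
fprintf('%s: lb = %g, ub = %g\n', dmRxns{1}, model.lb(idx), model.ub(idx));
```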
Removing Reactions
The removeRxns command is used to delete one or more reactions
from a model. For example, to remove the demand reaction
that was added in the previous section (named DM_oaa[c] by addDemandReaction),
the command would be:
model=removeRxns(model,{'DM_oaa[c]'});
model=changeGeneAssociation(model,'ASNS3_1','b0674');
model=changeRxnBounds(model,{'RPI','ICL','ME1'},-10,'l');
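changeRxnBounds overwrites the existing bounds, so it is worth printing them after a change. This is a minimal sketch for the three reactions modified above.

```matlab
model = readCbModel('ecoli_core_model.xml');
rxnList = {'RPI'; 'ICL'; 'ME1'};

% Set the lower bound ('l') of all three reactions to -10
model = changeRxnBounds(model, rxnList, -10, 'l');

% Display the resulting bounds side by side
idx = findRxnIDs(model, rxnList);
disp(table(rxnList, model.lb(idx), model.ub(idx), ...
    'VariableNames', {'rxn', 'lb', 'ub'}));
```

A negative lower bound allows flux in the reverse direction. For a reaction that was previously irreversible, this change therefore also makes it reversible.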
3.3.3 Reordering/Cleaning a Model
Changing a model can sometimes make it unruly. Removing metabolites
or reactions could result in reactions, metabolites, or genes
that have become ‘orphans’. This means that they no longer have
any role in the model (for metabolites and reactions, all of their
entries in the S matrix are zero). These entities can be removed
using a number of different commands.
To remove metabolites and reactions that are no longer part of
the model, the removeTrivialStoichiometry command can
be used:
updatedmodel=removeTrivialStoichiometry(model);
updatedmodel=removeUnusedGenes(model);
3.4 Creating a COBRA Model
So far, all the commands that have been discussed involve reading
and modifying existing models. However, the COBRA toolbox can
also be used to develop new models. The easiest way to do this
is to start a model using the createModel command. For
example, to create a three-reaction model consisting of
these reactions with associated information:
reaction 1 (R1): C1 + E ⟺ C1E; associated gene: G1; lb: -100, ub: 50
reaction 2 (R2): C1E → C2E; associated genes: G2 and G3; lb: 0, ub: 75
reaction 3 (R3): C2E ⟺ C2 + E; associated genes: G4 or G5; lb: -10, ub: 100
the command would be:
newmodel=createModel({'R1','R2','R3'},{'reaction 1',...
'reaction 2','reaction 3'},{'C1 + E <=> C1E',...
'C1E -> C2E','C2E <=> C2 + E'},'lowerBoundList',...
[-100 0 -10],'upperBoundList',[50,75,100],...
'subSystemList',{'Omega subsystem','Omega subsystem',...
'Omega subsystem'},'grRuleList',{'G1','G2 and G3',...
'G4 or G5'});
Here newmodel is the MATLAB name for the model you are
developing. If you look in the Workspace panel, you will see that
newmodel has been added. After you enter this command, you will
also receive a series of messages that inform you about which metabolites
and reactions have been added to the model (see Fig. 3).
Fig. 3 After using the createModel command, the new model will be created as a series of arrays. The
name given to the new model will be displayed in the Workspace panel. The toolbox will also print the list of
metabolites and reactions that are added as part of the new model
newmodel=changeObjective(newmodel,'R2');
obf=checkObjective(newmodel)
obf (or any other name you choose) is a cell array that will contain
the IDs of all the reactions that contribute to the
model’s objective function.
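With the objective set, the new model can be simulated. However, the three-reaction model is closed. C1 and C2 each take part in only one reaction, so the steady-state constraint forces every flux to zero. The sketch below shows this, then opens the system with exchange reactions for C1 and C2. It assumes createModel stored the metabolite IDs exactly as written ('C1', 'C2'). With R1 capped at 50, the maximum flux through R2 should then be 50.

```matlab
% Rebuild the three-reaction model and make R2 the objective
newmodel = createModel({'R1','R2','R3'}, ...
    {'reaction 1','reaction 2','reaction 3'}, ...
    {'C1 + E <=> C1E', 'C1E -> C2E', 'C2E <=> C2 + E'}, ...
    'lowerBoundList', [-100 0 -10], 'upperBoundList', [50 75 100]);
newmodel = changeObjective(newmodel, 'R2');

% Closed model: C1 and C2 each occur in a single reaction,
% so S*v = 0 forces all fluxes to zero
solClosed = optimizeCbModel(newmodel, 'max');

% Open the model with exchange reactions for C1 and C2
% (assumes the metabolite IDs are exactly 'C1' and 'C2')
openmodel = addExchangeRxn(newmodel, {'C1'; 'C2'}, ...
    [-1000; -1000], [1000; 1000]);
solOpen = optimizeCbModel(openmodel, 'max');

fprintf('Closed model: f = %g\nOpen model:   f = %g\n', ...
    solClosed.f, solOpen.f);
```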
3.4.2 Writing Models in Different Formats
COBRA models can be saved and exported in a variety of different
formats using the writeCbModel command. As noted earlier,
SBML is the most common format for publishing GSMs.
To save the newly created model in SBML format, the command is:
writeCbModel(newmodel,'format','sbml');
Because the name of the output file is not specified in the above
command, a pop-up window will appear (see Fig. 4) that allows the
user to name the output file (see Note 7); this happens only when
saving a model in SBML. The models can also be saved in ‘text’,
Excel (xls), and MATLAB (mat) formats. However, for these
convertCobraLP2mps(newmodel,'NewModel');
writeLPProblem(newmodel,'fileName','NewModel.mps');
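To avoid the pop-up window when writing SBML, you can pass the file name directly. This sketch assumes the 'fileName' parameter of writeCbModel in version 3 of the toolbox. Reading the file back is a simple check that the export worked.

```matlab
newmodel = createModel({'R1','R2','R3'}, ...
    {'reaction 1','reaction 2','reaction 3'}, ...
    {'C1 + E <=> C1E', 'C1E -> C2E', 'C2E <=> C2 + E'}, ...
    'lowerBoundList', [-100 0 -10], 'upperBoundList', [50 75 100]);

% Supplying 'fileName' writes the SBML file without the file dialog
writeCbModel(newmodel, 'format', 'sbml', 'fileName', 'newmodel.xml');

% Read the file back to confirm that all reactions were exported
reloaded = readCbModel('newmodel.xml');
fprintf('Reactions written: %d, reactions read back: %d\n', ...
    numel(newmodel.rxns), numel(reloaded.rxns));
```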
3.5 System-Level COBRA Analyses
The COBRA toolbox can be used to conduct a large variety of system-level
analyses. Here, only the protocols for a few of the most commonly
used methods of analyzing the basic metabolic characteristics
of a system and its robustness are detailed. The reader is
encouraged to read reviews by Oberhardt et al. [25], Bordbar
et al. [26], and Fang et al. [1] to learn about some of the other
COBRA analytical methods. The manuscript introducing the third
version of the toolbox [10] is another great resource for learning
about COBRA toolbox analyses, as well as the latest updates to the
program.
3.5.1 Flux Balance Analysis
Flux balance analysis (FBA) is the most widely used COBRA
method for analyzing GSMs [11, 27]. FBA places a number of
constraints on genome-scale reconstructions of metabolism
and uses linear programming to solve for a metabolic flux pattern
that optimizes a chosen biosystem objective.
The objective most commonly optimized by FBA is biomass
production, i.e., growth. However, other biological activities,
such as maximizing a system's energy production, its metabolic
efficiency, or the production of a compound of interest, can also
be selected as the FBA objective function. The major constraints
placed on a system for FBA are limits on the rates of
import/export of metabolites and the assumption that the system is
operating at steady state, i.e., for every metabolite in the system
the rate of production equals the rate of consumption, so the net
change in its concentration is zero. For a compound to be taken up,
secreted, or accumulated by the system without violating this
steady-state constraint, a demand, sink, or exchange reaction must
be associated with that metabolite.
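In matrix form, the linear program solved by FBA can be written as shown below. The symbols are the conventional ones: S is the stoichiometric matrix (model.S), v is the flux vector, c holds the objective coefficients (model.c), and v_lb and v_ub are the flux bounds (model.lb, model.ub). The equality S v = 0 is the steady-state assumption described above.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v_{lb} \le v \le v_{ub}.
\end{aligned}
```

An exchange, demand, or sink reaction adds a column to S in which the metabolite has a single nonzero coefficient. That metabolite can therefore enter or leave the system while S v = 0 still holds.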
Fig. 4 If one is saving a model in the SBML (.xml) format using the writeCbModel command, and the name of
the new model file is not included in the command, then the toolbox will provide a pop-up window that will ask
for the name of the model file. This does not happen when saving in other file formats
FBAsol=optimizeCbModel(model,'max');
Fig. 5 The output of FBA simulation (FBAsol in this case) will provide: f (the optimum value of the objective
function), v (flux value for every reaction in the model), y (shadow prices, a measure of sensitivity of objective
function value to constraints [45]), w (reduced costs, a measure of sensitivity of the objective function to
changes in flux values [45]), s (slacks of the metabolites), solver (name of the solver used for FBA), stat (solver
status), origStat (the original status returned by the solver), and x (a legacy array similar to v)
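Before using f or v, it is good practice to confirm that the solver actually reached an optimum. In the toolbox, stat = 1 denotes an optimal solution. In this minimal sketch, the two printed checks show that f equals c'v and that the fluxes satisfy the steady-state constraint up to solver tolerance. The second check assumes model.b is zero, as in the core model.

```matlab
model = readCbModel('ecoli_core_model.xml');
FBAsol = optimizeCbModel(model, 'max');

% stat == 1 means the solver returned an optimal solution
if FBAsol.stat ~= 1
    error('FBA failed (stat = %d, solver status: %s).', ...
        FBAsol.stat, num2str(FBAsol.origStat));
end
fprintf('Optimal objective value f = %g\n', FBAsol.f);

% f is the objective evaluated at v, and v satisfies S*v = 0
% (assumes model.b is zero)
objCheck = abs(model.c' * FBAsol.v - FBAsol.f);
massBalance = norm(model.S * FBAsol.v, inf);
fprintf('|c''v - f| = %g, max |S v| = %g\n', objCheck, massBalance);
```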
Print the Reaction Flux Values
The v vector contains the reaction rates that result in the
optimum value of the objective function. The x vector is a legacy
vector and contains the same values as v. To access the v and x
vectors from the command line, FBAsol.v and FBAsol.x should be
used (see Note 8).
To print a specific subset of the calculated fluxes, the printFluxVector
command can be used. For example, to
examine the flux values of all the reactions in the model, this command
can be used:
printFluxVector(model, FBAsol.v)
nonZeroFlag = 1;
fluxes = FBAsol.v;
printFluxVector(model, fluxes, nonZeroFlag);
excFlag=1;
printFluxVector(model, fluxes, nonZeroFlag, excFlag);
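The same filtering can also be done directly on the flux vector, which is convenient when the selected reactions are needed for further calculations. This sketch assumes a zero-flux tolerance of 1e-9, which is an arbitrary choice. It uses findExcRxns, which returns a logical vector marking the model's exchange reactions.

```matlab
model = readCbModel('ecoli_core_model.xml');
FBAsol = optimizeCbModel(model, 'max');

tol = 1e-9;                        % assumed zero-flux tolerance
active = abs(FBAsol.v) > tol;      % reactions carrying flux
fprintf('%d of %d reactions carry flux\n', nnz(active), numel(model.rxns));

% Active exchange reactions; negative values are uptake, positive secretion
isExc = findExcRxns(model);
sel = active & isExc;
disp([model.rxns(sel), num2cell(FBAsol.v(sel))]);
```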
Examine the Activity of Individual Model Components
After an FBA simulation, it often becomes necessary to examine
how a specific component of the model behaves when the system
operates at the optimal value of the objective function. A very useful
command in the COBRA toolbox for this type of analysis is the
Fig. 6 Example output from surfNet command. This command can be used to examine the activity of
metabolites, reactions, and gene-associated reactions
surfNet(model,{'ACKr','6pgc[c]','b0720'},'metNameFlag',...
true,'flux',FBAsol.v,'nonzeroFluxFlag',false);
3.5.2 Flux Variability Analysis
The solution provided by FBA is generally not unique: other metabolic
flux patterns can also provide the same maximum value of a model's
objective function. Flux variability analysis (FVA) is a method for
examining the range of fluxes that each reaction can have while the
objective function is at its maximum value [12]. FVA allows for a
number of different system-level studies, including investigation of
alternative optimal flux patterns, analyses of flux distributions
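In the toolbox, FVA is run with the fluxVariability command. The sketch below uses 100% of the optimal objective value. It separates reactions whose flux is fixed at the optimum from those that can vary, using an assumed tolerance of 1e-6.

```matlab
model = readCbModel('ecoli_core_model.xml');

% Minimum and maximum flux of every reaction at 100% of the optimum
[minFlux, maxFlux] = fluxVariability(model, 100);

tol = 1e-6;                              % assumed numerical tolerance
variable = (maxFlux - minFlux) > tol;    % flux not fixed by the optimum
fprintf('%d reactions fixed, %d variable\n', nnz(~variable), nnz(variable));
disp([model.rxns(variable), ...
    num2cell([minFlux(variable), maxFlux(variable)])]);
```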
3.5.3 Parsimonious Enzyme Usage FBA
As noted above, the flux pattern calculated by FBA is not a unique
solution. It is one of many flux patterns that could result in the same
maximum value of the objective function. To reduce the variability
of the fluxes, i.e., shrink the multidimensional solution
space of the model, one has to constrain the model with additional
information or assumptions. In some cases, information
such as measured changes in gene expression or protein
levels has been used to constrain the models (e.g., [33–36]). In
other cases, thermodynamics [37] and metabolite levels [38] have
been used as constraints. Overall, it has been shown that integrating
even a handful of experimentally measured fluxes into FBA models
can drastically reduce the variability of the predicted fluxes [39].
Parsimonious enzyme usage FBA (pFBA) [14] is the most
widely used COBRA method for reducing the variability of the
flux predictions. To achieve this, pFBA uses a two-step optimization
to ensure that the model predicts the most enzymatically efficient
flux distribution for a metabolic objective. FBA is used first to find
The result of the pFBA can be used to classify all the reactions
and genes in a model based on their contribution to the goal of
reaching the optimum value of the objective function. The results
of the classifications are included in two sets of arrays that, in the
above example, are labeled GeneClasses and RxnClasses. The results
can be accessed by clicking on the chosen names in the workspace
panel (see Fig. 7). The genes are divided into six categories:
1. Essentials (pFBAEssential): these are genes that are necessary
for growth in the modeled media.
2. pFBA optima: these are non-essential genes that are needed for
achieving the maximum growth rate at minimum fluxes.
3. Enzymatically less efficient (ELEGenes): if the reactions asso-
ciated with these genes are used to achieve the fastest growth
rate, then the sum of fluxes through the system will be greater
than that predicted by pFBA; i.e., there are alternative pathways
that can achieve the same growth rate but with lower combined
fluxes.
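The two pFBA steps and the first three classes listed above can be sketched on a hypothetical toy network. In it, substrate A reaches biomass either directly (R1) or through an intermediate (R2, then R3). Two simplifications apply:

- Reactions are classified here rather than genes. In a genome-scale model, genes inherit these classes through their gene-protein-reaction rules.
- Brute-force enumeration stands in for the linear programs that pFBA actually solves.

```python
from itertools import product

V_UP_MAX = 10  # hypothetical uptake limit


def steady_state(v1, v2):
    # UP: -> A, R1: A -> B, R2: A -> C, R3: C -> B, BIO: B -> (objective)
    return {"UP": v1 + v2, "R1": v1, "R2": v2, "R3": v2, "BIO": v1 + v2}


def feasible(knockout=None):
    """Integer steady states; a knocked-out reaction is forced to zero flux."""
    for v1, v2 in product(range(V_UP_MAX + 1), repeat=2):
        v = steady_state(v1, v2)
        if v["UP"] <= V_UP_MAX and (knockout is None or v[knockout] == 0):
            yield v


def pfba():
    points = list(feasible())
    best = max(p["BIO"] for p in points)                 # step 1: FBA optimum
    optimal = [p for p in points if p["BIO"] == best]
    # step 2: among optimal states, minimise total flux (all fluxes are >= 0 here)
    return min(optimal, key=lambda p: sum(p.values())), optimal


def classify():
    solution, optimal = pfba()
    classes = {}
    for rxn in solution:
        ko_growth = max(p["BIO"] for p in feasible(knockout=rxn))
        if ko_growth == 0:
            classes[rxn] = "essential"
        elif solution[rxn] != 0:
            classes[rxn] = "pFBA optima"
        elif any(p[rxn] != 0 for p in optimal):
            classes[rxn] = "ELE"    # usable at optimal growth, but with more total flux
        else:
            classes[rxn] = "other"  # remaining classes are not modelled in this toy
    return solution, classes


print(classify())
```

The one-step route R1 is selected because, at the same growth rate, the two-step route adds one extra unit of flux per unit of growth. R2 and R3 therefore fall into the ELE class.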
3.5.4 Perturbation Analyses
GSMs can be analyzed with COBRA methods to assess a system’s robustness to genetic and environmental perturbations. This is achieved by varying flux through one or more reactions and using FBA to see how the change affects the optimum value of the model’s objective function.
Single-Reaction Knockouts
GSMs and FBA can be used to evaluate whether the activity of a single biochemical reaction is essential for cellular growth or some other biological objective. The command singleRxnDeletion() in the COBRA toolbox allows one to test the effect of single-reaction knockouts (SRKO). The command for SRKO on the E. coli model can be:
selGenes = {'b0008','b0114','b0451','b0720','b0729','b1136'};
[grRatioSGKO, grRateSGKO, grRateWT, hasEffectgenes, ...
    delRxnsSGKO, fluxSolutionSGKO] = singleGeneDeletion(model, ...
    'FBA', selGenes);
The output from this operation is very similar to that of the singleRxnDeletion command. The difference is that for SGKOs the results describe the inactivation of individual genes (a subset or all of the genes in the model) instead of individual reactions. By default, the command performs an SGKO of every gene in the model. To conduct SGKOs for only a select group of genes, it is necessary to input a list of genes (as shown above).
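The reactions listed in the delRxnsSGKO output for a given gene deletion are determined by each reaction's Boolean gene-protein-reaction (GPR) rule. The sketch below uses hypothetical gene and reaction names. It evaluates rules written with `and`/`or`, as in COBRA grRules strings, to find the reactions that a set of knocked-out genes inactivates. It assumes that gene IDs are valid Python identifiers, as E. coli b-numbers such as b0008 are.

```python
import ast

# Hypothetical GPR rules (reaction -> Boolean rule over gene IDs).
GPRS = {
    "RXN_A": "g1",                 # single gene
    "RXN_B": "g2 or g3",           # isozymes
    "RXN_C": "g4 and g5",          # enzyme complex
    "RXN_D": "(g1 and g2) or g6",  # complex with an alternative isozyme
    "RXN_E": "",                   # no gene association
}


def reaction_active(rule, knocked_out):
    """True if the reaction can still be catalysed after the given knockouts."""
    if not rule.strip():
        return True  # reactions without a GPR are not affected by gene deletions

    def evaluate(node):
        if isinstance(node, ast.BoolOp):
            values = [evaluate(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Name):
            return node.id not in knocked_out
        raise ValueError(f"unsupported GPR syntax: {ast.dump(node)}")

    return evaluate(ast.parse(rule, mode="eval").body)


def deleted_reactions(gprs, genes):
    """Reactions inactivated when all genes in `genes` are knocked out."""
    ko = set(genes)
    return sorted(r for r, rule in gprs.items() if not reaction_active(rule, ko))


for gene in ["g1", "g2", "g4", "g6"]:
    print(gene, deleted_reactions(GPRS, [gene]))
```

Deleting g2 inactivates nothing because its isozyme g3 still covers RXN_B. Deleting either subunit of the g4/g5 complex removes RXN_C.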
Synthetic Lethal Mutations
Many biological systems minimize deleterious outcomes that might result from environmental and genetic perturbations through functional redundancy and distributed robustness [41, 42]. Metabolic redundancies are manifested through the presence of isozymes and alternate pathways. Isozymes are the products of different genes that can catalyze the same biochemical reaction. Alternate pathways are composed of different biochemical reactions, but they share some of their metabolic products.
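A gene pair is synthetically lethal when each single deletion leaves the organism viable but the double deletion does not. The sketch below screens all gene pairs in a hypothetical four-reaction model. Its redundancy comes from one pair of isozymes (g1, g2) and one pair of alternate pathways (reactions R2 and R3). A simple Boolean viability rule stands in for checking that the FBA growth rate stays above a threshold.

```python
from itertools import combinations

# Hypothetical toy model: each reaction lists the genes whose products can
# catalyse it (OR rules only, i.e. isozymes).
ISOZYMES = {"UP": {"gU"}, "R1": {"g1", "g2"}, "R2": {"g3"}, "R3": {"g4"}}
GENES = sorted(set().union(*ISOZYMES.values()))


def active_reactions(knockouts):
    ko = set(knockouts)
    return {r for r, genes in ISOZYMES.items() if genes - ko}  # any isozyme left?


def grows(knockouts):
    """Stand-in for 'FBA growth rate above a threshold' in this toy model.

    Growth needs uptake (UP), reaction R1, and at least one of the
    alternate pathways R2 or R3.
    """
    act = active_reactions(knockouts)
    return "UP" in act and "R1" in act and ("R2" in act or "R3" in act)


def synthetic_lethal_pairs():
    essential = [g for g in GENES if not grows([g])]
    viable = [g for g in GENES if g not in essential]
    pairs = [(a, b) for a, b in combinations(viable, 2) if not grows([a, b])]
    return essential, pairs


print(synthetic_lethal_pairs())
```

Only pairs of individually non-essential genes are screened, since an essential gene is lethal on its own. The two lethal pairs found correspond to the two redundancy mechanisms described above.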
Robustness Analysis
SRKO, SGKO, and SLM examine how genetic mutations affect a system’s phenotype. Similarly, examining how changes in the flux through each reaction affect the maximum value of a model’s objective function provides information about phenotypic changes resulting from switching between different pathways and from variations in rates of import/export of metabolites. The robustnessAnalysis command is used for this type of analysis. For example, if one is interested in knowing how changing the rate of import/export of CO2 would affect the growth rate for the core E. coli model, the function can be:
The inputs for this function are model name, name of the
control reaction, number of points to be calculated/displayed,
choice of whether a graph of the results should be displayed, the
objective function to be examined, and choice of maximizing or
minimizing the objective function.
The robustnessAnalysis operation provides three outputs. The controlFlux array contains the set of flux values for the chosen biochemical reaction. The range of values is determined by the maximum and minimum amount of flux that the reaction can carry. The objFlux array contains the FBA-calculated maximum values of the objective function when the flux through the chosen reaction is set to the values reported in controlFlux. The function also produces a 2D plot of the values in these arrays (see Fig. 8).
Fig. 8 Graph displaying the result of robustnessAnalysis for import/export of CO2. The x-axis shows the fixed value for import/export flux of CO2 and the y-axis displays the optimum value of the objective function.
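The procedure behind robustnessAnalysis can be sketched as a parameter sweep. The control flux is fixed at evenly spaced values between its minimum and maximum, and the objective is re-optimized at each point. This is a hypothetical toy with the following assumptions:

- Substrate uptake is capped at 10.
- Route R1 turns A into B with yield 1.
- Route R2 turns A into 0.5 B plus a by-product P, which leaves through the control reaction EX_P.
- Growth equals the amount of B produced.

The inner FBA is solved in closed form rather than with an LP solver.

```python
V_UP_MAX = 10.0  # hypothetical cap on substrate uptake


def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive."""
    if n < 2:
        return [lo]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def max_growth(ex_p):
    """Toy FBA with the control flux EX_P fixed at ex_p.

    growth = v1 + 0.5 * v2, subject to v1 + v2 <= V_UP_MAX and v2 = ex_p.
    Growth increases with v1, so the optimum spends all remaining uptake on R1.
    """
    if not 0.0 <= ex_p <= V_UP_MAX:
        return None  # control value outside the feasible range
    v1 = V_UP_MAX - ex_p
    return v1 + 0.5 * ex_p


def robustness(n_points):
    # In this toy, the feasible range of EX_P is [0, V_UP_MAX].
    control_flux = linspace(0.0, V_UP_MAX, n_points)
    obj_flux = [max_growth(c) for c in control_flux]
    return control_flux, obj_flux


control_flux, obj_flux = robustness(5)
for c, g in zip(control_flux, obj_flux):
    print(f"EX_P = {c:5.2f}  ->  max growth = {g:5.2f}")
```

Each unit of flux forced through EX_P costs half a unit of growth, so this curve falls linearly. In a genome-scale model, the same sweep can reveal plateaus and breakpoints where the network switches between pathways.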
In order to examine how fluxes through two reactions affect
one another, as well as the maximum value of the objective func-
tion, the doubleRobustnessAnalysis command can be used.
This is a useful tool for examining trade-offs between different
biological processes. For example, one might test how the rates of
production of two metabolic byproducts or flux through two path-
ways affect each other as well as the rate of growth. If one wants to
examine how the anabolic gluconeogenesis pathway (represented
by fructose-bisphosphatase, FBP) and the Krebs cycle (represented
by α-ketoglutarate dehydrogenase, AKGDH) affect one another
and the rate of growth of E. coli, the command to use can be:
The inputs for this function are similar to the ones for the robustnessAnalysis function, with the exception that you need to list two control reactions instead of one. The output for this operation consists of three arrays containing the flux values for
4 Notes
1. For the COBRA toolbox commands, the courier font has been
used to distinguish them from the rest of the text.
2. When using the COBRA toolbox, there are some inputs that need
to be entered in a precise manner. However, users also have a
lot of freedom in naming models, variables, and arrays. In this
chapter many commands have been provided as examples
(using the E. coli core metabolism model). In the shown exam-
ple commands, those inputs that can be changed by the user are
in italics.
3. The compartment designation with a letter in brackets (e.g., [c]) is not universal.
Some models might use an underscore followed by the letter (e.g., _c) or
parentheses (e.g., (c)).
4. Newly entered reactions can be given any name; however, it is
good practice to make sure the name matches the naming style
of the other reactions in the model. For example, when adding
a reaction to a model from KBase [16], it would be a good
idea to check the ModelSEED [44] database (https://modelseed.org/biochem)
to find the reaction’s name in that format
(e.g., ASPTA = rxn00260).
5. If you do not explicitly set the lower and upper bounds for the
reaction, the default values of -1000 and 1000 will be assigned
to it. If you enter a value other than zero for the 'objectiveCoef'
input, then the added equation will be optimized along with
Acknowledgments
This work was funded in part by the DOE OBER Genomic Science
program and LLNL Laboratory Directed Research and Develop-
ment funding and performed under the auspices of the
U.S. Department of Energy at Lawrence Livermore National Lab-
oratory under Contract DE-AC52-07NA27344.
References
1. Fang X, Lloyd CJ, Palsson BO (2020) Reconstructing organisms in silico: genome-scale models and their emerging applications. Nat Rev Microbiol 18:731–743
2. Chaudhury S, Abdulhameed MDM, Singh N, Tawa GJ, D’haeseleer PM, Zemla AT et al (2013) Rapid countermeasure discovery against francisella tularensis based on a metabolic network reconstruction. PLoS One 8(5):e63369
3. Navid A, Almaas E (2009) Genome-scale reconstruction of the metabolic network in Yersinia pestis, strain 91001. Mol Biosyst 5(4):368–375
4. Navid A (2011) Applications of system-level models of metabolism for analysis of bacterial physiology and identification of new drug targets. Brief Funct Genomics 10(6):354–364
5. Burgard AP, Pharkya P, Maranas CD (2003) Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol Bioeng 84(6):647–657
6. Park JH, Lee KH, Kim TY, Lee SY (2007) Metabolic engineering of Escherichia coli for the production of L-valine based on transcriptome analysis and in silico gene knockout simulation. Proc Natl Acad Sci U S A 104(19):7797–7802
7. Yoshikawa K, Toya Y, Shimizu H (2017) Metabolic engineering of Synechocystis sp. PCC 6803 for enhanced ethanol production based on flux balance analysis. Bioprocess Biosyst Eng 40(5):791–796
8. Fouladiha H, Marashi S-A, Torkashvand F, Mahboudi F, Lewis NE, Vaziri B (2020) A metabolic network-based approach for developing feeding strategies for CHO cells to increase monoclonal antibody production. Bioprocess Biosyst Eng 43:1381–1389
9. Becker SA, Feist AM, Mo ML, Hannum G, Palsson BO, Herrgard MJ (2007) Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nat Protoc 2(3):727–738
10. Heirendt L, Arreckx S, Pfau T, Mendoza SN, Richelle A, Heinken A et al (2019) Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v. 3.0. Nat Protoc 14:639–702
11. Orth JD, Thiele I, Palsson BO (2010) What is flux balance analysis? Nat Biotechnol 28(3):245–248
12. Mahadevan R, Schilling CH (2003) The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng 5(4):264–276
13. Mahadevan R, Edwards JS, Doyle FJ (2002) Dynamic flux balance analysis of diauxic growth in Escherichia coli. Biophys J 83(3):1331
14. Lewis NE, Hixson KK, Conrad TM, Lerman JA, Charusanti P, Polpitiya AD et al (2010) Omic data from evolved E. coli are consistent with computed optimal growth from genome-scale models. Mol Syst Biol 6(1):390
15. Segre D, Vitkup D, Church GM (2002) Analysis of optimality in natural and perturbed metabolic networks. Proc Natl Acad Sci U S A 99(23):15112–15117
16. Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S et al (2018) KBase: the United States Department of Energy Systems Biology Knowledgebase. Nat Biotechnol 36(7):566
17. Schellenberger J, Que R, Fleming RMT, Thiele I, Orth JD, Feist AM et al (2011) Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat Protoc 6(9):1290–1307
18. Thiele I, Palsson BO (2010) A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc 5(1):93–121
19. Lieven C, Beber ME, Olivier BG, Bergmann FT, Ataman M, Babaei P et al (2020) MEMOTE for standardized genome-scale metabolic model testing. Nat Biotechnol 38(3):272–276
20. Olivier BG, Bergmann FT (2018) SBML level 3 package: flux balance constraints version 2. J Integr Bioinformatics 15(1):20170082
21. Kanehisa M, Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28(1):27
22. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M (2016) KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res 44(D1):D457–D462
23. Schellenberger J, Park JO, Conrad TM, Palsson BO (2010) BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinformatics 11:213
24. King ZA, Lu J, Dräger A, Miller P, Federowicz S, Lerman JA et al (2015) BiGG models: a platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res 44(D1):D515–D522
25. Oberhardt MA, Palsson BO, Papin JA (2009) Applications of genome-scale metabolic reconstructions. Mol Syst Biol 5:320
26. Bordbar A, Monk JM, King ZA, Palsson BO (2014) Constraint-based models predict metabolic and associated cellular functions. Nat Rev Genet 15(2):107–120
27. Varma A, Palsson BO (1994) Metabolic flux balancing: basic concepts, scientific and practical use. Nat Biotechnol 12(10):994–998
28. Reed JL, Palsson BØ (2004) Genome-scale in silico models of E. coli have multiple equivalent phenotypic states: assessment of correlated reaction subsets that comprise network states. Genome Res 14(9):1797–1805
29. Thiele I, Fleming RM, Bordbar A, Schellenberger J, Palsson BO (2010) Functional characterization of alternate optimal solutions of Escherichia coli’s transcriptional and translational machinery. Biophys J 98(10):2072–2081
30. Pharkya P, Maranas CD (2006) An optimization framework for identifying reaction activation/inhibition or elimination candidates for overproduction in microbial systems. Metab Eng 8(1):1–13
31. Bushell ME, Sequeira SIP, Khannapho C, Zhao H, Chater KF, Butler MJ et al (2006) The use of genome scale metabolic flux variability analysis for process feed formulation based on an investigation of the effects of the zwf mutation on antibiotic production in Streptomyces coelicolor. Enzyme Microb Technol 39(6):1347–1353
32. Feist AM, Zielinski DC, Orth JD, Schellenberger J, Herrgard MJ, Palsson BO (2010) Model-driven evaluation of the production potential for growth-coupled products of Escherichia coli. Metab Eng 12(3):173–186
33. Becker SA, Palsson BO (2008) Context-specific metabolic networks are consistent with experiments. PLoS Comput Biol 4(5):e1000082
34. Zur H, Ruppin E, Shlomi T (2010) iMAT: an integrative metabolic analysis tool. Bioinformatics 26(24):3140–3142
35. Chandrasekaran S, Price ND (2010) Probabilistic integrative modeling of genome-scale metabolic and regulatory networks in Escherichia coli and Mycobacterium tuberculosis. Proc Natl Acad Sci 107(41):17845
36. Navid A, Almaas E (2012) Genome-level transcription data of Yersinia pestis analyzed with a new metabolic constraint-based approach. BMC Syst Biol 6(1):150
37. Henry CS, Broadbelt LJ, Hatzimanikatis V (2007) Thermodynamics-based metabolic flux analysis. Biophys J 92(5):1792–1805
38. Soh KC, Hatzimanikatis V (2014) Constraining the flux space using thermodynamics and integration of metabolomics data. In: Krömer JO, Nielsen LK, Blank LM (eds) Metabolic flux analysis: methods and protocols. Springer, New York, NY, pp 49–63
39. Stewart B, Navid A, Turteltaub K, Bench G (2010) Yeast dynamic metabolic flux measurement in nutrient-rich media by HPLC and accelerator mass spectrometry. Anal Chem 82(23):9812–9817
40. Machado D, Herrgård M (2014) Systematic evaluation of methods for integration of transcriptomic data into constraint-based models of metabolism. PLoS Comput Biol 10(4):e1003580
41. Wagner A (2005) Distributed robustness versus redundancy as causes of mutational robustness. Bioessays 27(2):176–188
42. Whitacre J (2012) Biological robustness: paradigms, mechanisms, and systems principles. Front Genet 3:67
43. O’Neil NJ, Bailey ML, Hieter P (2017) Synthetic lethality and cancer. Nat Rev Genet 18(10):613–623
44. Henry CS, DeJongh M, Best AA, Frybarger PM, Linsay B, Stevens RL (2010) High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol 28(9):977–982
45. Maarleveld TR, Khandelwal RA, Olivier BG, Teusink B, Bruggeman FJ (2013) Basic concepts and principles of stoichiometric modeling of metabolic networks. Biotechnol J 8(9):997–1008
46. Griesemer M, Navid A (2021) MOFA: Multi-Objective Flux Analysis for the COBRA Toolbox. bioRxiv 2021.05.20.445041
Chapter 16
Abstract
Agent-based models (ABM), also called individual-based models, first appeared several decades ago with
the promise of nearly real-time simulations of active, autonomous individuals such as animals or objects.
The goal of ABMs is to represent a population of individuals (agents) interacting with one another and their
environment. Because of their flexible framework, ABMs have been widely applied to study systems in
engineering, economics, ecology, and biology. This chapter is intended to guide the users in the develop-
ment of an agent-based model by discussing conceptual issues, implementation, and pitfalls of ABMs from
first principles. As a case study, we consider an ABM of the multi-scale dynamics of cellular interactions in a
microbial community. We develop a lattice-free agent-based model of individual cells whose actions of
growth, movement, and division are influenced by both their individual processes (cell cycle) and their
contact with other cells (adhesion and contact inhibition).
1 Introduction
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349,
https://doi.org/10.1007/978-1-0716-1585-0_16, © Springer Science+Business Media, LLC, part of Springer Nature 2022
368 Marc Griesemer and Suzanne S. Sindi
Fig. 1 Different modeling frameworks for studying the spatial dynamics of evolving systems. (a) Reaction–diffusion systems are specified through partial differential equations: deterministic equations in which the densities of species evolve in time through reactions with themselves or other species (reaction) and move in space (diffusion). (b) Cellular automata are discrete-time simulations in which agents exist on a rectangular grid and occupy one of finitely many states (such as white = dead, black = alive). At each time point, agents decide whether to modify their state according to the states of neighboring grid cells. (c) Agent-based models consist of individual agents (in this case yeast cells) that grow and divide according to their own specified processes and by interacting with their environment and the other cells around them.
1.2 Cellular Automata (CA)
Cellular automata are a discrete-time modeling framework for considering evolving agents on a rectangular grid (lattice). Each element in the grid is assigned a value, usually discrete, to indicate the state of that grid cell [11, 12]. For example, the simplest case would be 0 (or white) for dead and 1 (or black) for alive, as illustrated in the example in Fig. 1b. The initial state of the CA is usually taken to be either a single cell with a given state or some random configuration of cellular states. The dynamics of the CA are encapsulated in the rules for updating. Given a particular state of the system at time t = n, the grid state at the next time point (t = n + 1) is determined by each cell updating based on the states of its neighboring cells. For example, in the CA formulation of the Game of Life, developed by Conway [13], each cell considers its eight neighboring cells and updates its state according to the following four rules:
1. Any live (black) cell with fewer than two live (black) neighbors
dies (turns white). This represents death by underpopulation.
2. Any live (black) cell with two or three live (black) neighbors
lives (stays black).
3. Any live (black) cell with more than three living (black) neigh-
bors dies (turns white). This represents death by
overpopulation.
4. Any dead (white) cell with exactly three live (black) neighbors
becomes a live cell (turns black). This represents reproduction.
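The four rules translate directly into a synchronous update. Every cell counts its eight live neighbors on the current grid before any cell changes state. The sketch stores only live cells, so by default the grid is unbounded rather than a fixed rectangle. The optional `shape` argument is an illustrative addition: it wraps coordinates so that the grid becomes a torus.

```python
from collections import Counter


def life_step(live, shape=None):
    """One synchronous Game of Life update.

    live: set of (row, col) tuples for live (black) cells.
    shape: optional (rows, cols); if given, the grid wraps around as a torus.
    """
    def wrap(cell):
        if shape is None:
            return cell
        return (cell[0] % shape[0], cell[1] % shape[1])

    # Count live neighbours of every cell adjacent to at least one live cell.
    counts = Counter(
        wrap((r + dr, c + dc))
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Rule 4: birth with exactly 3 neighbours; rules 1-3: a live cell survives
    # with 2 or 3 neighbours and dies otherwise.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


blinker = {(1, 0), (1, 1), (1, 2)}   # horizontal bar of three cells
print(sorted(life_step(blinker)))    # becomes a vertical bar
```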
Since CAs were first developed, there have been many generalizations of the underlying framework. For example, CAs have been developed to live on tori or on more general topological shapes. CA
2 Materials
2.1 The Computer System and ABM Software
If you are developing your own ABM, you will need to consider the computational resources and limitations you have at your disposal. If you are using an existing ABM software framework, the individual providers will detail the requirements. In addition, we note that some ABM software allows not only for the simulation of the population but also for the visualization of the resulting population configuration.
2.1.1 ABM Software Packages
While below we describe a protocol for your own ABM implementation, we note that there are many computational packages that allow users to specify their own ABMs (Table 1). Common applications and platforms for developing agent-based models include FLAME [34], NetLogo [25], AgentCell [28], and Repast [38]. There are also ready-made simulators for ABMs. BacSim [29], its update iDynoMiCS [35], and INDISIM [36] are applications that only need the user to define a model. For very large simulations that require high-performance computing (HPC), one can use Biocellion [30], which scales to a billion cells. The multi-agent software platform Swarm uses collections of agents (swarms) in hierarchical structures or populations [39]. Virtual Cell (VCell) [40, 41] provides a platform for simulating chemical reactions using both deterministic and stochastic methods, while including spatial effects. CompuCell3D [33] is an open-source simulation environment for multi-cell, single-cell-based modeling of tissues, organs, and organisms using the Cellular Potts Model. Chaste (Cancer, Heart, And Soft Tissue Environment) [31, 32] allows for the simulation of models developed for those domains and, more generally, in physiology and biology. MASON (Multi-Agent Simulator of Neighborhoods) [37] is a Java-based discrete-event simulation library with a set of 2D and 3D visualization tools. For large-scale simulations, FLAME [34] also targets high-performance computers and offers an additional GPU version.
2.1.2 Do It Yourself ABM Implementation
Grimm [42] set up a protocol, called ODD (Overview, Design concepts, and Details), to describe agent-based systems with a common set of standards. This approach was introduced to address an early criticism of ABMs: that their descriptions were neither easily understandable nor complete. It does so by encouraging users to explicitly document the purpose of the model, the scientific questions to be investigated, and the scales of the model. If you write your own ABM, we suggest you follow the ODD principles in your design and presentation.
Rules of Engagement: A Guide to Developing Agent-Based Models 373
Table 1
Agent-based modeling software
2.2 Biological and/or Biophysical Parameters
Regardless of whether you design your own code from the ground up or use a preexisting framework, the properties of your agents need to be specified. The parameters required for an ABM are highly problem specific. In what follows, we will go through the details of developing parameters for an agent-based model.
3 Methods
3.1 State of an Agent
In any implementation of an ABM, the quantities that define the state of an agent must be specified. For example, if the agent is a cell, its state might be specified by the concentrations of biochemical constituents (either inside or outside the cell), the size of the cell, its age, and information about the hierarchical lineage of cells it is related to (see Notes 1–3 and 9). These quantities will be updated according to the equations and parameters that define the underlying physical system. One can also encode different subtypes of the same agent, such as differences in phenotype or wild type versus mutants.
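One way to hold this state in code is a small record type per agent. The field names below are hypothetical. They cover only the examples mentioned above for a 2D off-lattice cell: position and size, age, lineage, subtype, and internal concentrations.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CellAgent:
    """State of one 2D off-lattice cell agent (hypothetical field names)."""
    x: float                      # centre coordinates
    y: float
    radius: float
    age: float = 0.0
    lineage: str = "C-0"          # lineage identifier of this cell
    cell_type: str = "wild_type"  # subtype, e.g. "wild_type" or "mutant"
    internal: dict = field(default_factory=dict)  # intracellular concentrations

    @property
    def area(self):
        return math.pi * self.radius ** 2


founder = CellAgent(0.0, 0.0, 1.0)
mutant = CellAgent(2.5, 0.0, 0.8, cell_type="mutant", lineage="C-1")
founder.internal["glucose"] = 0.2
```

`field(default_factory=dict)` gives every agent its own concentration table. A plain `{}` default would be shared between instances.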
3.2 Action of Individual Agents
Next, we must specify precisely what an agent can do, and how it decides what to do. For example, in our framework agents may move in a spatial domain, grow, divide, die (see Notes 2–4), and communicate with other agents (Fig. 2). In our context of a growing population of cells, we may allow cells to grow in volume but with growth rates that depend on availability of nutrients and the space not occupied by neighboring cells.
Pushing (or shoving) is another type of movement in ABM systems. Each cell moves individually and is not stopped by other cells blocking its path. Instead, these neighboring cells are displaced in a direction opposite to the cell’s movement, and the colony re-equilibrates to a new configuration. In this case, movement always occurs, and it is up to the other cells to move out of the way. Because each cell does this in turn, conflicting actions can arise. Thus, it may be more efficient to calculate the net movement of each cell once all cells have conducted their movement. The algorithm works to minimize overlap through a “shove radius.” If the cells are always accessed in the same sequence, that sequence introduces a bias; to correct it, the entire ordering can occasionally be reversed. Randomizing the order is another possibility (see Note 6).
Fig. 2 Detailed illustration of an agent-based model (ABM). In this illustration, we have an off-lattice model of a population of cells (circles). Each cell has a particular state (in this case, the state of an agent is its center and radius) and modifies its state by considering predefined cellular operations (such as division) and information from neighboring cells. In this framework, a red cell (left panel) stochastically decides to perform one of three modifications to its state: move, grow, or divide.
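The following is a minimal sketch of shoving for circular cells, with these assumptions:

- Cells are stored as [x, y, r] lists.
- An overlapping pair is pushed apart along the line joining their centres, and each cell takes half of the overlap.
- Displacements are accumulated for every pair and applied together at the end of each sweep. This is the "net movement after all cells have moved" strategy described above.
- `shove_margin` is a hypothetical stand-in for the shove radius.
- `tol` is an assumed convergence tolerance.

```python
import math
import random


def shove(cells, shove_margin=0.0, max_sweeps=200, tol=1e-9, rng=None):
    """Push overlapping circular cells apart until no pair overlaps.

    cells: list of [x, y, r] lists, updated in place.
    Returns the number of sweeps used.
    """
    rng = rng or random.Random(0)
    n = len(cells)
    for sweep in range(1, max_sweeps + 1):
        dx, dy = [0.0] * n, [0.0] * n
        overlap_found = False
        for i in range(n):
            for j in range(i + 1, n):
                xi, yi, ri = cells[i]
                xj, yj, rj = cells[j]
                dist = math.hypot(xj - xi, yj - yi)
                target = ri + rj + shove_margin
                if dist >= target - tol:
                    continue
                overlap_found = True
                if dist == 0.0:  # coincident centres: choose a random direction
                    angle = rng.uniform(0.0, 2.0 * math.pi)
                    ux, uy = math.cos(angle), math.sin(angle)
                else:
                    ux, uy = (xj - xi) / dist, (yj - yi) / dist
                push = 0.5 * (target - dist)  # each cell takes half the overlap
                dx[i] -= push * ux
                dy[i] -= push * uy
                dx[j] += push * ux
                dy[j] += push * uy
        if not overlap_found:
            return sweep
        # Net displacements are applied only after every pair was examined,
        # so the order in which pairs are visited does not bias the result.
        for k in range(n):
            cells[k][0] += dx[k]
            cells[k][1] += dy[k]
    return max_sweeps


colony = [[0.0, 0.0, 1.0], [0.5, 0.0, 1.0], [1.0, 0.0, 1.0]]
print(shove(colony), colony)
```

Because all displacements are applied together, shuffling the visiting order would not change this version. The ordering bias mentioned above arises only when each cell's move is applied immediately.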
Division or reproduction of agents depends on the problem in question. In our ABM framework, we assume an agent divides when it reaches twice its original size. Donachie [44] found that DNA replication was correlated with multiples of the initial cell size. Cell division can also differ in orientation: axial or polar. Axial division occurs when the daughter cell divides near the site where it separated from its mother, bunching up the cells in the process. Polar division occurs when division happens on the opposite side of the daughter, so that cells in the colony grow away from one another. Often both types of division occur at the same time within a colony, with an environmental trigger such as temperature or an internal trigger such as gene regulatory proteins causing a switch between the two modes. Cell division can be symmetric, asymmetric (e.g., a 60/40 split between mother and daughter), or, in the case of yeast, by budding. The life cycle of growth and division can also be specified.
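The division rule above (divide on reaching twice the birth size) can be sketched for 2D circular cells, under these assumptions:

- Size is measured as area, and total area is conserved.
- The mother keeps a fraction `mother_fraction` of the area: 0.5 for symmetric division, 0.6 for a 60/40 split.
- The two cells are placed touching along a random axis, with their area-weighted centre at the original position.

Orientation rules such as axial or polar division are not modelled.

```python
import math
import random


def maybe_divide(cell, mother_fraction=0.5, rng=random):
    """Split `cell` (dict with x, y, r, birth_area) once its area has doubled.

    Returns [cell] if no division occurs, else [mother, daughter].
    """
    area = math.pi * cell["r"] ** 2
    if area < 2 * cell["birth_area"]:
        return [cell]
    a_m = mother_fraction * area            # mother's share of the area
    a_d = area - a_m                        # daughter's share
    r_m, r_d = math.sqrt(a_m / math.pi), math.sqrt(a_d / math.pi)
    theta = rng.uniform(0.0, 2.0 * math.pi)  # random division axis
    ux, uy = math.cos(theta), math.sin(theta)
    sep = r_m + r_d                          # the two cells just touch
    off_m, off_d = sep * a_d / area, sep * a_m / area  # keeps area-weighted centre fixed
    mother = {"x": cell["x"] - off_m * ux, "y": cell["y"] - off_m * uy,
              "r": r_m, "birth_area": a_m}
    daughter = {"x": cell["x"] + off_d * ux, "y": cell["y"] + off_d * uy,
                "r": r_d, "birth_area": a_d}
    return [mother, daughter]


grown = {"x": 0.0, "y": 0.0, "r": 1.5, "birth_area": math.pi}
print(maybe_divide(grown, mother_fraction=0.6, rng=random.Random(3)))
```

The new pair extends beyond the mother's former outline, so a shoving step is usually needed afterward. Note 4 below describes the alternative of placing the new cells inside the old one.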
3.3 Population Evolution in Time
The ABM is updated in time according to the actions of individual agents and how those actions impact the decisions of other agents. As we mention below in Notes 5–7, implementation details such as data structure and organization choices may greatly influence the computational efficiency of an ABM.
A simulation can start with any number of cells and any configuration one can imagine within the geometry of the space. Mimicking experiments that inoculate organisms in the lab is one technique. If one seeks to follow the entire evolution of a cellular community, then starting with one cell is ideal. A target number of cells can also serve as the criterion for ending the simulation.
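The following hypothetical, non-spatial driver shows how these pieces fit together in time:

1. Start from a single founder.
2. Each turn, let every cell attempt a stochastic growth trial, in a fresh random order.
3. Divide a cell when it reaches twice its birth size.
4. Stop once a target number of cells is reached.

Growth increments, probabilities, and the cell record format are illustrative assumptions.

```python
import random


def run_until(target_cells, growth_prob=0.5, seed=0, max_turns=10_000):
    """Hypothetical non-spatial driver: one founder, stop at a target cell count.

    Each cell has size 1.0 at birth, gains 0.25 per successful growth trial,
    and divides into two cells of size 1.0 on reaching 2.0.
    Returns the final cells and a (turn, number_of_cells) history.
    """
    rng = random.Random(seed)
    cells = [{"id": "C-0", "size": 1.0, "n_daughters": 0}]
    history = [(0, 1)]
    for turn in range(1, max_turns + 1):
        rng.shuffle(cells)                  # new random order of trials each turn
        newborn = []
        for cell in cells:
            if rng.random() < growth_prob:  # stochastic growth trial
                cell["size"] += 0.25
            if cell["size"] >= 2.0:         # divide at twice the birth size
                cell["size"] = 1.0
                newborn.append({"id": f"{cell['id']}-{cell['n_daughters']}",
                                "size": 1.0, "n_daughters": 0})
                cell["n_daughters"] += 1
        cells.extend(newborn)               # newborns act from the next turn
        history.append((turn, len(cells)))
        if len(cells) >= target_cells:      # stopping criterion
            break
    return cells, history


cells, history = run_until(16)
print(history[-1])
```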
Once the colony has started to grow, external events can also influence its trajectory beyond cell–cell interactions. For example, one often introduces a nutrient field consisting of compounds to be consumed by the cells. This can occur on a reaction-diffusion-based lattice in which the nutrients can have a heterogeneous spatial concentration. One can also use bulk concentrations, for example for large pools of gaseous nutrients such as CO2, O2, or nitrogen. These values can be varied to simulate limiting or rich media and to explore the effect of these compounds on movement.
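A nutrient field on a reaction-diffusion lattice can be advanced with an explicit finite-difference step. This sketch makes three assumptions:

- The lattice is a 2D square grid whose edge sites are held at a fixed bulk concentration.
- Cells occupying each interior site take up nutrient with Monod-type kinetics.
- The update uses the forward-time, centred-space scheme, which in 2D is stable only when D·dt/dx² ≤ 1/4.

```python
def diffusion_uptake_step(S, cells, D=1.0, dt=0.1, dx=1.0,
                          vmax=0.5, km=0.1, boundary=1.0):
    """One explicit step of dS/dt = D * laplacian(S) - uptake on a 2D lattice.

    S: list of lists of concentrations (rows x columns).
    cells: dict {(i, j): number_of_cells} for occupied interior sites.
    Edge sites are reset to `boundary`, acting as a bulk reservoir.
    """
    if D * dt / dx ** 2 > 0.25:
        raise ValueError("time step too large for the explicit 2D scheme")
    ny, nx = len(S), len(S[0])
    new = [row[:] for row in S]
    for i in range(ny):
        for j in range(nx):
            if i in (0, ny - 1) or j in (0, nx - 1):
                new[i][j] = boundary
                continue
            s = S[i][j]
            lap = (S[i + 1][j] + S[i - 1][j] + S[i][j + 1] + S[i][j - 1] - 4 * s) / dx ** 2
            uptake = cells.get((i, j), 0) * vmax * s / (km + s)  # Monod-type uptake
            new[i][j] = max(0.0, s + dt * (D * lap - uptake))
    return new


field = [[1.0] * 5 for _ in range(5)]
for _ in range(50):
    field = diffusion_uptake_step(field, {(2, 2): 1})
print(round(field[2][2], 3))  # depleted at the occupied site
```

Rich versus limiting conditions can be explored by changing `boundary`. In a full ABM, the local concentration at each cell's site would feed back into its growth trials.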
Solid media such as agar inhibit cellular movement, while liquids allow for more movement. This can either be a parameter in the model, or one can simulate the movement effect more explicitly. There is therefore a link between a cell colony’s morphology and its surroundings. As the medium becomes thicker and impedes movement, translation events become less frequent, meaning that
3.4 Storage and Output of Information
In ABMs, there is usually interest both in how individuals behave within a population and in population-level characteristics. As such, ABMs typically track and output both types of quantities. As we mention below in Subheading 4, organizing file structures and output carefully can greatly facilitate post-processing and analyses of these simulations (see Notes 8 and 9).
The coordinates of each individual cell’s center and its radius are the primary data kept by the simulation, with one row per cell per time point. Other information such as the cell type, state, and birth date can be added as needed. Because cells are often implemented as object-oriented data structures, the amount of data stored per cell scales linearly with the number of attributes recorded.
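Here is a minimal sketch of the "one row per cell per time point" output, using Python's standard csv module. The column names are assumptions and can be extended with cell type, state, or birth date.

```python
import csv
import io

FIELDS = ["time", "cell_id", "x", "y", "radius"]


def write_snapshot(writer, time_point, cells):
    """Append one row per cell; cells maps a lineage ID to (x, y, radius)."""
    for cell_id, (x, y, r) in cells.items():
        writer.writerow({"time": time_point, "cell_id": cell_id,
                         "x": x, "y": y, "radius": r})


buffer = io.StringIO(newline="")  # stands in for an open output file
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
write_snapshot(writer, 0, {"C-0": (0.0, 0.0, 1.0)})
write_snapshot(writer, 1, {"C-0": (-0.5, 0.0, 1.0), "C-0-0": (0.5, 0.0, 1.0)})
print(buffer.getvalue())
```

This long format can be grouped by `time` to obtain population-level counts, or by `cell_id` to follow individual trajectories.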
The list of parameters needs to be recorded (see Note 11) along with the random seed to be able to reproduce individual simulations (see Note 10). The evolution of the cell community can be traced through a unique identifier that holds the lineage of each individual cell. Subcolonies of cells could display different behavior related to their surroundings.
4 Notes
1. Decide on a life cycle for the cell agents. Cells have checkpoints in their cycle, such as G1 and mitosis. More complicated life cycles have multiple stages, but the most important elements are division and the generation of new agents.
2. Decide whether cells can die in the simulation. They can lyse, providing nutrients for other species or freeing space for the remaining cells. One way cells can die is if their growth rate falls below a certain threshold. Another possibility is that a cell is trapped and cannot grow at all, which lengthens its cell cycle to the point where regulatory factors induce apoptosis.
3. Give cells multiple states and programmed actions for each
state. One can consider quiescent, hypoxic, apoptotic, necrotic,
or proliferative cells. The cells can transfer to different states
probabilistically or in response to different environmental
conditions.
4. Start with two cells and follow them through the process of growth, translation, and division to see if they behave correctly before scaling to many cells. A significant source of error is cells overlapping because of incorrect updating of cell coordinates and radii or mistakes in the potential function. Accounting for the energy values of the old and new configurations in a trial is important in this case. One must make sure that cell births do not cause immediate overlap, unless such events are dependent on having enough space. One solution is to have the new cells fit inside the old cell so that no overlap occurs. This causes the total cell area to diminish, but this may not have much effect with fast growth and is a good first step in getting the simulation working correctly.
5. Encapsulate agent instructions into functions and use the main
program for flow of the turn. For example, testing for a growth
event should be separate from the amount of growth done on a
successful trial.
6. Create a “process list” that has two members: cell number and a
variable containing a random number. For each cell, generate a
random number between 0 and 1 and place in a process list
vector element. Then sort the cells by their random number
and have that ordering govern the order in which the cells
perform trials. Upon the end of the turn, generate new random
numbers for each cell. Then sort the list again. When a new cell
is born, a random number is generated, and its entry should be
added to the “process list” vector.
7. Use get/set methods to modify and retrieve a cell variable. This
ties the cell attribute to an individual and protects from
unwanted editing and overwriting of values by other cell
objects.
8. Organize file I/O as separate files listing the cell data, such as cell size and location. The simulation can be printed out every turn when debugging or making a video, but often a snapshot of the output at regular intervals is sufficient to capture the dynamics of the simulation. Each cell’s identifying number is also important to record, so that the dynamics of individual cells can be traced. A third file keeps track of the time and number of cells to determine growth rates and colony evolution.
9. Represent the lineage of a cell as a string of numbers connected by dashes. This allows for easy parsing for analysis purposes. The length of the string gives the generation of the cell. For example, C-0-0-1-2-1 is a fifth-generation cell that is present in the subcolony of the first daughter born to the founder. This allows one to find the ancestors and progeny of a particular cell easily and quickly.
10. Record a seed value for the random number generator, which determines the series of random numbers that the program generates. This allows for reproducibility of individual simulations. In general, seed the random number generator with the current time so that no two simulations are exactly alike, but record that seed so that any individual run can be reproduced.
11. Record a parameter list in a file that gives values for the growth amount, move amount, percentages, etc. It is also important to record the type of potential and the stopping-point criteria.
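Note 5 can be illustrated with a minimal Python sketch. It is not the chapter's implementation. It assumes a hypothetical `Cell` with a single `size` attribute, a fixed growth probability `p_grow`, and growth proportional to size; none of these names or rules come from the chapter. The trial ("does growth happen?") and the amount ("how much?") live in separate functions, and `run_turn` only sequences them.

```python
import random
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical minimal agent with a single state variable."""
    size: float = 1.0


def growth_trial(rng: random.Random, p_grow: float) -> bool:
    """Decide whether a growth event happens this turn (a Bernoulli trial)."""
    return rng.random() < p_grow


def growth_amount(cell: Cell, rate: float) -> float:
    """Amount grown on a successful trial; proportional growth is an assumption."""
    return rate * cell.size


def run_turn(cells: list[Cell], rng: random.Random,
             p_grow: float, rate: float) -> int:
    """Main-program flow for one turn; returns the number of growth events."""
    events = 0
    for cell in cells:
        if growth_trial(rng, p_grow):
            cell.size += growth_amount(cell, rate)
            events += 1
    return events


if __name__ == "__main__":
    rng = random.Random(2021)
    colony = [Cell() for _ in range(10)]
    n = run_turn(colony, rng, p_grow=0.5, rate=0.1)
    print(f"{n} of {len(colony)} cells grew this turn")
```

The growth rule is isolated in `growth_amount`, so it can be replaced (for example, with a constant increment) without touching the trial logic or the turn loop.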
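The process list of Note 6 might be sketched in Python as follows, under stated assumptions. Each entry is a two-element list `[cell_number, key]`, and the function names are hypothetical. Sorting by independent uniform keys gives every ordering of the cells the same probability, so no cell is systematically favored by its position in memory.

```python
import random


def build_process_list(cell_ids, rng):
    """Return [cell_number, key] entries, each key drawn uniformly from [0, 1)."""
    return [[cid, rng.random()] for cid in cell_ids]


def processing_order(process_list):
    """Sort entries by key in place; the sorted cell numbers set this turn's order."""
    process_list.sort(key=lambda entry: entry[1])
    return [cid for cid, _ in process_list]


def redraw_keys(process_list, rng):
    """End of turn: give every cell a fresh random key."""
    for entry in process_list:
        entry[1] = rng.random()


def add_newborn(process_list, cell_id, rng):
    """A newborn cell draws its own key and joins the process list."""
    process_list.append([cell_id, rng.random()])


if __name__ == "__main__":
    rng = random.Random(7)
    plist = build_process_list(range(5), rng)
    for turn in range(3):
        order = processing_order(plist)
        print(f"turn {turn}: cells act in order {order}")
        # ... each cell in `order` performs its trials here ...
        if turn == 0:
            add_newborn(plist, 5, rng)  # e.g. a division produced cell 5
        redraw_keys(plist, rng)
```

Drawing keys and sorting costs O(n log n) per turn. Python's `random.shuffle` produces a uniformly random permutation directly and would be an equivalent alternative. The key-based list is used here because it mirrors the note.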
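Note 7 could look like this in Python. The sketch keeps a cell's state in underscore-prefixed attributes and exposes it only through get/set methods, with validation in the setter. The class and attribute names are hypothetical. Python does not enforce privacy (the leading underscore is only a convention), and idiomatic Python would usually use `@property`; explicit methods are used here to match the note.

```python
class Cell:
    """Agent whose state is read and written only through get/set methods."""

    def __init__(self, cell_id: int, size: float) -> None:
        self._id = cell_id  # no setter: the id is fixed once assigned
        self._size = 0.0
        self.set_size(size)

    def get_id(self) -> int:
        return self._id

    def get_size(self) -> float:
        return self._size

    def set_size(self, value: float) -> None:
        # All writes pass through one place, so invalid values are rejected early.
        if value < 0:
            raise ValueError(f"cell {self._id}: size must be non-negative, got {value}")
        self._size = value


if __name__ == "__main__":
    c = Cell(cell_id=3, size=1.0)
    c.set_size(c.get_size() * 1.5)
    print(c.get_id(), c.get_size())  # 3 1.5
```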
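One way to organize the output described in Note 8 is sketched below using Python's standard `csv` module. Several details are assumptions made for illustration: cells are represented as dictionaries with keys `id`, `size`, `x`, and `y`; the file names `cells_<turn>.csv` and `summary.csv` are invented; and so is the snapshot interval. Population size is logged every turn, while full per-cell snapshots are written only periodically.

```python
import csv
from pathlib import Path


def write_snapshot(out_dir: Path, turn: int, cells: list[dict]) -> Path:
    """Write one CSV per snapshot: cell number, size and (x, y) location."""
    path = out_dir / f"cells_{turn:06d}.csv"
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["cell_id", "size", "x", "y"])
        for cell in cells:
            writer.writerow([cell["id"], cell["size"], cell["x"], cell["y"]])
    return path


def append_summary(out_dir: Path, turn: int, n_cells: int) -> None:
    """Append (turn, cell count) to a running log used for growth rates."""
    path = out_dir / "summary.csv"
    is_new = not path.exists()
    with path.open("a", newline="") as fh:
        writer = csv.writer(fh)
        if is_new:
            writer.writerow(["turn", "n_cells"])
        writer.writerow([turn, n_cells])


SNAPSHOT_EVERY = 10  # hypothetical interval; set to 1 when debugging or making a video


def record_turn(out_dir: Path, turn: int, cells: list[dict]) -> None:
    """Log the population every turn, but cell-level snapshots only periodically."""
    append_summary(out_dir, turn, len(cells))
    if turn % SNAPSHOT_EVERY == 0:
        write_snapshot(out_dir, turn, cells)
```

Each snapshot file includes the cell number alongside size and location, so individual cells can be followed across snapshots.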
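The lineage strings of Note 9 can be parsed with plain string operations, as in the Python sketch below. Two conventions are assumptions consistent with the example in that note: the founder is labelled `C`, and daughters are numbered from 0. The helper names are hypothetical. Ancestors are simply the prefixes of a label, and the progeny of a cell are the labels that begin with its label followed by a dash.

```python
def daughter_label(parent: str, k: int) -> str:
    """Label of the parent's k-th daughter (0-based, as assumed here)."""
    return f"{parent}-{k}"


def generation(label: str) -> int:
    """Number of dash-separated entries after the founder label."""
    return len(label.split("-")) - 1


def ancestors(label: str) -> list[str]:
    """All ancestors of a cell, founder first (prefixes of its label)."""
    parts = label.split("-")
    return ["-".join(parts[:i]) for i in range(1, len(parts))]


def in_subcolony(label: str, founder: str) -> bool:
    """True if the cell descends from `founder`.

    The trailing dash keeps "C-10-0" out of the subcolony of "C-1".
    """
    return label.startswith(founder + "-")


if __name__ == "__main__":
    cell = "C-0-0-1-2-1"
    print(generation(cell))   # 5
    print(ancestors(cell))    # ['C', 'C-0', 'C-0-0', 'C-0-0-1', 'C-0-0-1-2']
    colony = ["C-0", "C-1", "C-0-0", "C-10", cell]
    print([c for c in colony if in_subcolony(c, "C-0")])  # ['C-0-0', 'C-0-0-1-2-1']
```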
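A minimal Python sketch of the seeding practice in Note 10 follows. The helper `make_rng` is hypothetical. By default it seeds with the current time so that runs differ, and it always returns the seed so that the seed can be recorded and any individual run replayed. The sketch uses a dedicated `random.Random` instance rather than the module-level functions, so other code that draws from `random` cannot perturb the simulation's sequence.

```python
import random
import time
from typing import Optional, Tuple


def make_rng(seed: Optional[int] = None) -> Tuple[random.Random, int]:
    """Return a dedicated generator plus the seed that initialized it.

    With no seed, the current time (in nanoseconds) is used, so runs differ;
    passing a recorded seed back in replays the same random sequence.
    """
    if seed is None:
        seed = time.time_ns()
    return random.Random(seed), seed


if __name__ == "__main__":
    rng, seed = make_rng()
    print(f"seed used for this run: {seed}")  # store this with the run's parameters
    first = [rng.random() for _ in range(3)]

    replay, _ = make_rng(seed)
    assert [replay.random() for _ in range(3)] == first
```

Two runs started within the same clock tick would receive the same time-based seed. Nanosecond timestamps make this unlikely, but the actual clock resolution depends on the platform.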
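The parameter record of Note 11 can be kept as a small JSON file, as sketched below. JSON is one reasonable choice because it is human-readable and round-trips through Python's standard `json` module. All keys and values here are hypothetical stand-ins for the growth amount, move amount, percentages, potential type, and stopping criteria named in the note. The random seed from Note 10 is stored alongside them.

```python
import json
from pathlib import Path

# Hypothetical parameter names and values, for illustration only.
PARAMS = {
    "seed": 20210601,
    "growth_amount": 0.05,
    "move_amount": 0.5,
    "p_growth": 0.8,
    "p_move": 0.3,
    "potential_type": "placeholder",  # record whichever potential the run uses
    "stopping": {"max_turns": 500, "max_cells": 10000},
}


def save_params(path: Path, params: dict) -> None:
    """Write the run's parameters as human-readable JSON next to its output."""
    path.write_text(json.dumps(params, indent=2, sort_keys=True))


def load_params(path: Path) -> dict:
    """Read parameters back, e.g. to rerun or annotate an analysis."""
    return json.loads(path.read_text())
```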
References
1. Brehm-Stecher BF, Johnson EA (2004) Single- dynamic systems, 2nd edn. Academic,
cell microbiology: tools, technologies, and New York, NY
applications. Microbiol Mol Biol Rev 15. North MJ (2014) A theoretical formalism for
68:538–559 analyzing agent-based models. Complex Adapt
2. Railsback SF, Lytinen SL, Jackson SK (2006) Syst Model 2(1):1–34
Agent-based simulation platforms: review and 16. Railsback SE (2001) Concepts from complex
development recommendations. SIMULA- adaptive systems as a framework for individual-
TION 82(9):609–623 based modeling. Ecol Model 139:47–62
3. Song H-S et al (2014) Mathematical Modeling 17. Zhang L, Athale CA, Deisboeck TS (2007)
of microbial community dynamics: a methodo- Development of a three-dimensional multiscale
logical review. PRO 2(4):711–752 agent-based tumor model: simulating gene-
4. Kolmogorov A, Petrovsky L, Piscounov N protein interaction profiles, cell phenotypes
(1937) Study of the diffusion equation with and multicellular patterns in brain cancer. J
growth of the quantity of matter and its appli- Theor Biol 244:96–107
cation to a biological problem. Moscow Univ 18. Mansury Y et al (2002) Emerging patterns in
Bull Math 1:1–25 tumor systems: simulating the dynamics of
5. Murray JD (1988) How the leopard gets its multicellular clusters with an agent-based spa-
spots. Sci Am 258(3):80–87 tial agglomeration model. J Theor Biol 219
6. Holmes EE et al (1994) Partial differential (3):343–370
equations in ecology: spatial interactions and 19. Galle J et al (2006) Individual cell-based mod-
population dynamics. Ecology 75(1):17–29 els of tumor-environment interactions. Am J
7. Baker RE, Gaffney E, Maini P (2008) Partial Pathol 169(5):1802–1811
differential equations for self-organization in 20. Drasdo D, Höhme S (2003) Individual-based
cellular and developmental biology. Nonlinear- approaches to birth and death in avascular
ity 21(11):R251 tumors. Math Comput Model 37:1163–1175
8. Ward JP, King J (1997) Mathematical model- 21. Drasdo D, Hohme S (2005) A single-cell-
ling of avascular-tumour growth. Math Med based model of tumor growth in vitro: mono-
Biol 14(1):39–69 layers and spheroids. Phys Biol 2:133–147
9. Cantrell RS, Cosner C (2004) Spatial ecology 22. Xavier JB et al (2007) Multi-scale individual
via reaction-diffusion equations. John Wiley & based model of microbial and bioconversion
Sons, Hoboken, NJ dynamics in aerobic granular sludge. Environ
10. Britton NF (1986) Reaction-diffusion equa- Sci Technol 41(18):6410–6417
tions and their applications to biology. Aca- 23. Picioreanu C, Kreft JU, Loosdrecht MCMV
demic, New York, NY (2004) Particle-based multidimensional multi-
11. Alber MS et al (2003) On cellular automaton species biofilm model. Appl Environ Microbiol
approaches to modeling biological cells. In: 70:3024–3040
Rosenthal J, Gilliam DS (eds) Mathematical 24. Modwal A, Rao S (2015) Agent-based model-
systems theory in biology, communications, ling of biofilm formation and inhibition in
computation, and finance, vol 1-39. Springer, Escherichia coli. Curr Sci 109(5):930–937
New York, NY 25. Tisue S, Wilensky U (2004) Netlogo: a simple
12. Lee Y et al (1995) A cellular automaton model environment for modeling complexity. in
for the proliferation of migrating contact- International conference on complex systems.
inhibited cells. Biophys J 69:1284 ICCS, Boston, MA
13. Conway J (1970) The game of life. Sci Am 223 26. Smaldino PE, Calanchini J, Pickett CL (2015)
(4):4 Theory development with agent-based models.
14. Zeigler BP, Praehofer H, Kim TG (2000) The- Organ Psychol Rev 5(4):300–317
ory of modeling and simulation: integrating 27. Bernard RN (1999) Using adaptive agent-
discrete event and continuous complex based simulation models to assist planners in
380 Marc Griesemer and Suzanne S. Sindi
policy development: the case of rent control, 36. Ginovart M, Lopez D, Valls J (2002) INDSIM:
Santa Fe Institute working paper an individual-based discrete simulation model
28. Emonet T et al (2005) AgentCell: a digital to study bacterial cultures. J Theor Biol
single-cell assay for bacterial chemotaxis. Bio- 214:305–319
informatics 21(11):2714–2721 37. Pérez-Rodrı́guez G et al (2015) Agent-based
29. Kreft J-U, Booth G, Wimpenny JWT (1998) spatiotemporal simulation of biomolecular sys-
BacSim: a simulator for individual-based mod- tems within the open source MASON frame-
elling of bacterial colony growth. Microbiology work. Biomed Res Int 2015:1–12
144:3275–3287 38. Collier NT, North M (2012) Parallel agent-
30. Kang S et al (2014) Biocellion: accelerating based programming with repast for high per-
computer simulation of multicellular biological formance computing. SIMULATION
models. Bioinformatics 30(21):3101–3108 2012:1–21
31. Osborne JM et al (2017) Comparing 39. Minar N et al (1996) The swarm simulation
individual-based approaches to modeling the system: a toolkit for building multi-agent simu-
self-organization of multi-cellular tissues. lations. Santa Fe Institute, Santa Fe, NM
PLoS Comput Biol 13(2):e1005387 40. Walker DC, Southgate J (2009) The virtual
32. Harvey DG et al (2015) A parallel implemen- cell—a candidate co-ordinator for ‘middle-
tation of an off-lattice individual-based model out’modelling of biological systems. Brief
of multicellular populations. Comput Phys Bioinform 10(4):450–461
Commun 192:130–137 41. Resasco DC et al (2012) Virtual Cell: compu-
33. Swat MH et al (2012) Multi-scale modeling of tational tools for modeling in cell biology.
tissues using CompuCell3D. Methods Cell Wiley Interdiscip Rev Syst Biol Med 4
Biol 110:325 (2):129–140
34. Kiran M et al (2010) FLAME: simulating large 42. Polhill JG et al (2008) Using the ODD proto-
populations of agents on parallel hardware col for describing three agent-based social sim-
architectures. In: Proceedings of the 9th inter- ulation models of land-use change. J Artif Soc
national conference on autonomous agents and Soc Simul 11(2):3
multiagent systems: volume 1. International 43. Friedman SH et al (2016) MultiCellDS: a stan-
Foundation for Autonomous Agents and Mul- dard and a community for sharing multicellular
tiagent Systems, Richland, SC data. bioRxiv 2016:090696
35. Lardon LA et al (2011) iDynoMiCS: next- 44. Donachie WD (1968) Relationship between
generation individual-based modeling of bio- cell size and time of initiation of DNA replica-
films. Environ Microbiol 13(9):2416–2434 tion. Nature 219(5158):1077
INDEX

A
Ab initio methods, 203, 204
Accelerator mass spectrometry (AMS), see Mass spectrometry (MS)
Actinobacteria, 202
Agent-based models (ABM), 367–379
Amplicons, 138
Anabaena oscillarioides, 95, 101, 102, 107, 114
Antibiotics and secondary metabolites (antiSMASH) server, 211
Antibody labeling, 106, 125
Atomic force microscopy (AFM), 123, 126, see Microscopy
ATP Binding Cassette transporters (ABC) transporters, 219, 221, 231

B
Bacillus anthracis, 243
Bacillus subtilis, 116, 292, 299, 300, 307, 308, 310, 312, 315, 316
Bacillus thuringiensis, 116
Basic local alignment search tool (BLAST)
  BLASTp, 224, 232, 234, 247, 252
Beads, 24, 30–32, 138, 140–144
BiGG models database, 344
Binding thermodynamics
  binding enthalpy, 42
  binding entropy, 42
  binding Gibbs free energy, 42, 56
BioCyc database
  ecoCyc, 260, 264, 266, 269–272, 279, 284, 285, 287, 288, 323, 324
  metaCyc, 284, 286, 288, 324, 326
Biofilms, 370, 373
Biolog phenotype microarrays, 309
Biological General Repository for Interaction Datasets (BioGRID) database, 183
BRENDA database, 324, 326

C
Carbohydrate-active enzymes (CAZy) database, 208–210
Catalyzed reporter deposition-fluorescence in situ hybridization (CARD-FISH), 105, 124
Cellular automata (CA), 322–334, 369–371, 374
Chimera, 219, 223, 229–233, 235, 237, 238, 241, 243, 248, 250, 251, 253
Chlamydomonas reinhardtii, 121
Chromatin immunoprecipitation-SIP (ChIP-SIP), 125
Chromatography
  gas chromatography (GC), 23, 33, 152
  high-performance liquid chromatography (HPLC), 2, 4, 5, 7, 9, 16, 23, 83
Class Architecture Topology Homologous superfamily (CATH) database, 223–228, 234, 237, 238, 241, 242, 252
Clustered regularly interspaced short palindromic repeats (CRISPR), 148–152, 155–160, 162, 163, 201, 204
Clustering, 137, 169, 173, 175, 178, 180, 189, 223, 239, 252
Comparative genomics, 205
Computed atlas of surface topography of proteins (CASTp), 230, 233, 234, 248, 250, 253
Constraint Based Reconstruction and Analysis (COBRA)
  COBRApy, 340
  models, 321–334
  toolbox, 339–364
ConSurf, 225, 229
CRISPcut, 158–161
CRISPR/Cas9 system, 147–153, 155, 158
CRISPR interference (CRISPRi), 152, 157
Cytoscape, 168, 169

D
Database for automated carbohydrate-active enzyme annotation (dbCAN), 208–210
Dead cas9 (dCas9), 152, 155
Deinococcus-Thermus, 202
Ali Navid (ed.), Microbial Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2349,
https://doi.org/10.1007/978-1-0716-1585-0, © Springer Science+Business Media, LLC, part of Springer Nature 2022
381
MICROBIAL SYSTEMS BIOLOGY: METHODS AND PROTOCOLS
382 Index
Department of energy systems biology knowledgebase (KBase), 291–318, 323–325, 340, 361
Detergent screening, 57, 59, 60, 62
Domains of unknown function, 194
Dynamic FBA (dFBA), 259, 260, 271–273, 275–277, 279, 340

E
Electroporation, 155
Element labeling-catalyzed reporter deposition fluorescence in situ hybridization (EL-FISH), see Fluorescent in situ hybridization
EntrezGene database, 323
Epitopes, 69, 75, 76
Escherichia coli, 12–14, 17, 20, 43, 46, 48, 83, 138, 194, 260, 263–266, 269–272, 277, 279, 323, 334, 340, 341, 343, 346, 352, 355, 356, 360, 361
EuGene, 205
Expression sequence tag (EST), 204

F
FASTA format, 196, 198
Fast sampling devices, 14–19
Flexible structure AlignmenT by Chaining Aligned fragment pairs allowing Twists (FATCAT), 223, 232, 240
Flow cytometry, 101, 194
Fluorescence spectroscopy, 42
Fluorescent in situ hybridization (FISH), 82
  catalyzed reporter deposition-fluorescence in situ hybridization (CARD-FISH), 105, 124
  element labeling-catalyzed reporter deposition fluorescence in situ hybridization (EL-FISH), 123
  multiplexed error-robust fluorescence in situ hybridization (MERFISH), 82
Flux balance analysis (FBA)
  objective function, 292, 308, 323, 351–355
Fluxome, 11
Flux variability analysis (FVA), 260–262, 304, 307, 308, 318, 339, 354, 355
Focused ion beam (FIB) sectioning, 107
Förster resonance energy transfer (FRET), 84, 85
Francisella tularensis, 232

G
Gap filling, 260, 261, 287, 288, 332
Gas chromatography (GC), see Chromatography
Gas chromatography–mass spectrometry (GC-MS), see Mass spectrometry (MS)
Gel electrophoresis (GE), 47, 75, 76, 144
Genbank, 198, 200, 294
Gene knockin, 150, 151
Gene knockouts
  double deletion, 360
  single knockouts, 304
Gene ontology (GO), 224, 228
Gene-protein-reaction (GPR) associations, 292
Genome annotation, 193–211, 216, 291, 292, 321, 323, 325
Github, 341
Gnu linear programming kit (GLPK), 324, 341
Gurobi solver, 341

H
Hidden Markov Models (HMMs), 204, 209
High-performance liquid chromatography (HPLC), see Chromatography
Homology
  sequences, 85, 204, 216, 217, 219, 221, 223, 228, 229, 232, 243, 246
  structural, 216, 219, 221, 223, 228, 232, 235, 243, 246
Homology directed repair (HDR), 149, 151, 155, 162
Hyperion II, 95, 97, 112, 116

I
Immuno-labeling, 115, 125
Inductively coupled plasma (ICP)
  radiofrequency (RF) inductively coupled plasma (ICP), 96
Ion mobility spectrometry, 43
Isothermal titration calorimetry (ITC), 42, 53, 57, 59, 60
Isotope labeling, 2
IUPAC International Chemical Identifier (InChI), 344

J
Joint Genome Institute (JGI)
  integrated microbial genomes (IMG) system, 196, 324

K
Klebsiella pneumoniae, 309, 313
Kyoto Encyclopedia of Genes and Genomes (KEGG)
  KEGG Automated Annotation Server (KAAS), 205–208
L
Linear programming (LP), 263, 324, 331, 340–342, 351
Links, 167, 168, 170, 173–175, 178–181, 184, 185, 188, 189, 196–200, 206–211, 216, 217, 223–225, 227, 228, 237, 238, 247, 252, 263, 264, 266, 316, 345, 376
Liquid chromatography (LC)
  high-performance liquid chromatography (HPLC), 2, 4–9, 16, 23, 83
Liquid chromatography-electrospray ionisation tandem mass spectrometry (LC-ESI-MS/MS), see Mass spectrometry (MS)
Liquid chromatography-mass spectrometry (LC-MS), see Mass spectrometry (MS)
Liquid chromatography-tandem mass spectrometry (LC-MS/MS), see Mass spectrometry (MS)
Liquid scintillation counting (LSC), 1, 3, 5, 8
Louvain method, see Network analysis

M
Marinobacter sp. PT19DW, 196, 201
Mass spectrometry (MS), 4, 45, 54, 61, 92, 122
  accelerator mass spectrometry (AMS), 1–10
  data analysis, 2, 51, 53, 54, 56
  Fourier transform-ion cyclotron resonance mass spectrometry (FT-ICR), 126
  gas chromatography–mass spectrometry (GC-MS), 23, 33
  inductively coupled plasma mass spectrometry (ICP-MS), 123
  ionization techniques
    matrix-assisted laser desorption ionization (MALDI), 125
    nanoscale electrospray ionization (nESI), 46
  isotope ratio mass spectrometry (IRMS), 1, 105, 122
  liquid chromatography-electrospray ionisation tandem mass spectrometry (LC-ESI-MS/MS), 36
  liquid chromatography-mass spectrometry (LC-MS), 7, 23, 33
  liquid chromatography-tandem mass spectrometry (LC-MS/MS), 23
  mass resolving power (MRP), 104, 110
  nanometer-scale SIMS (NanoSIMS), 91–127
    CAMECA instruments, 92, 96–100, 108, 110, 111, 113, 116, 126
    extreme low impact energy (EXLIE), 97, 98, 126
    instrumental mass fractionation (IMF), 118, 122
    mass resolving power (MRP), 98, 99, 104, 108–111
    nanometer-scale stable isotope probing (NanoSIP), 91–127
    relative useful yield (RUY), 116, 120, 121
    SIMS in situ hybridization (SIMSISH), 123
    time-of-flight SIMS (TOF-SIMS), 125, 126
  quadrupole time of flight (QTOF), 2
  secondary ion mass spectrometry (SIMS), 92, 96–100, 102, 103, 105–109, 116–118, 120, 122–127
Matlab, 82, 88, 100, 324, 340, 341, 349–351, 362, 364
Metabolic model testing (MEMOTE), 332, 341
Metabolic networks, 179, 180, 184, 259, 317, 322, 323, 325, 326, 328–330, 333, 343, 356, 357
Metabolome
  endometabolome, 24
  exometabolome, 24
  metabolite extraction, 36
MetaFlux, see Pathway tools software
Metagenome-assembled genomes (MAGs), 194, 195, 206, 210
Metagenomics, 194, 210, 216
Microarrays, 125
Microbiome, 65, 126, 137–146, 294
Microhomology-mediated end-joining (MMEJ), 154
Microscopy
  atomic force microscopy (AFM), 123
  confocal microscopy, 84
  scanning electron microscopy (SEM), 105, 123
  scanning electron microscopy and energy dispersive spectroscopy (SEM-EDS), 123
  scanning transmission electron microscopy (STEM), 108, 123
  scanning transmission X-ray microscopy (STXM), 124, 125
  transmission electron microscopy (TEM), 99, 101, 107, 108, 122, 123
  near-edge X-ray absorption fine structure (NEXAFS), 124, 125
Mitochondria, 73
Mixed-integer linear programming (MILP), 263, 341
Multiplexed error-robust fluorescence in situ hybridization (MERFISH), 82, see Fluorescence in situ hybridization
Mus musculus, 174, 178, 183
N
National Center for Biotechnology Information (NCBI) database, 324
Nanometer-scale SIMS (NanoSIMS), see Mass spectrometry (MS)
Nanometer-scale stable isotope probing (NanoSIP), see Mass spectrometry (MS)
NetLogo, 372, 373
Network analysis
  adjacency matrix, 170–172
  betweenness centrality, 173, 176–178
  closeness centrality, 173, 176–178
  community structure, 181, 182
  component size distribution (CSD), 175
  cumulative degree distribution, 171, 172, 178
  clustering coefficient, 173, 175, 178
  degree distribution, 171–175, 178–180, 183–189
  largest component size (LCS), 178
  Louvain method, 182
  weighted gene co-expression network analysis (WGCNA), 168–170
Networks
  communities/modules, 181
  components, 181
  directed, 168–170
  protein-protein interaction networks, 178, 183
  random, 168, 184, 187
NetworkX, 168, 169, 171, 172, 185
Next generation sequencing (NGS), 139, 162
Nitrosomonas europaea, 238
Nodes
  centrality
    betweenness centrality, 173, 176–178
    closeness centrality (CC), 173, 176–178
Non-homologous end joining (NHEJ), 149–151, 154, 155, 157
Nuclear magnetic resonance (NMR), 42, 221, 249
Nucleases
  meganucleases, 148
  transcription activator-like effector-based nucleases (TALENs), 148
  zinc finger nucleases (ZFNs), 148

O
Off-target prediction and identification tool (CROP-IT), 158, 159
Open reading frames (ORFs), 198, 199, 202–204

P
Pajek, 168, 169
Parallel accelerator and molecular mass spectrometry (PAMMS), 1–10
Parsimonious enzyme usage FBA (pFBA), 340, 355–357
Partial differential equations (PDE), 368, 369
Pathway tools software
  MetaFlux, 259
  PathoLogic program, 259
  pathway/genome databases (PGDBs), 259–261, 264, 284, 285, 287, 288
PATRIC, 294
PDBSum, 223, 225, 227–229, 247
Perl, 197
Phaeodactylum tricornutum, 196
Polymerase chain reaction (PCR)
  quantitative PCR (qPCR), 81, 142
Posttranslational modifications, 12, 65
Primal-Dual interior method for convex objectives (pdco) solver, 342
Prokaryotic genome annotation pipeline (PGAP), 197, 198
Protein coding sequence (CDS), 153, 198, 202, 203, 205
Protein databank (PDB)
  research collaboratory for structural bioinformatics (RCSB), 221, 223, 225, 231, 232, 240
  The biological magnetic resonance data bank (BMRB), 221
Protein–DNA complexes, 148
Protein expression
  membrane, 48
  soluble, 43
Protein FAMily database (PFAM), 224, 225, 227, 231, 246
Protein–ligand binding, 42, 47, 50–52, 54, 56, 57, 59, 60
Protein–protein interaction (PPI), 56, 65, 174–180, 225
Protein purification
  membrane, 49
  soluble, 43, 48
Proteome, 11, 12, 205, 297, 298, 329
Protospacer adjacent motif (PAM), 148–150, 152–157
Pseudomonas stutzeri, 116
PubChem, 344
Python, 46, 168, 169, 197, 260, 324, 373
Q
Quantitative PCR (QPCR), see Polymerase chain reaction (PCR)
Quenching
  cold methanol quenching, 13, 15, 20, 21, 27, 31, 36
  metabolism, 19, 20

R
Radioimmunoassay (RIA), 66
Random graphs, 178, 188
Random network models
  Barabási–Albert (BA) model, 187
  configuration model, 188–190
  Erdős–Rényi model, 185–187
Rapid annotation using subsystems technology (RAST), 198–201, 204–211, 294, 295, 325
RAVEN program, 323, 324
Reaction-diffusion models, 368–369, 376
Reaction knockouts, 260, 356
Recursive Automatic Search of MOTif in 3D structures of PROteins (RASMOT 3D PRO), 231, 251, 252
RefSeq, 294
Rfam database, 204
RNA
  crRNAs (CRISPR RNA), 149
  non-coding RNAs (ncRNAs), 201, 204
  ribosomal RNA (rRNA), 125, 137, 138, 140, 143, 144, 198
  RNA sequencing (RNA-Seq), 81, 138, 204
  single guide-RNA (sgRNA), 152, 155
  small regulatory RNAs (sRNAs), 201, 204
  tracrRNAs (trans-activating CRISPR RNA), 148, 153
RNA sequencing (RNA-seq), see Sequencing

S
Saccharomyces cerevisiae, 20
Scanning electron microscopy (SEM), see Microscopy
Scanning electron microscopy and energy dispersive spectroscopy (SEM-EDS), see Microscopy
Scanning transmission X-ray microscopy (STXM), see Microscopy
SDS-PAGE, 70, 76, 78
Sequence annotated by structure (SAS), 223, 225, 228, 229, 247
Sequencing
  genome, 11, 162, 193–196, 204, 294
  metatranscriptomic, 137–146
  RNA sequencing (RNA-seq), 81, 138, 204, 316
  transposon sequencing (TN-seq), 292, 309
Shewanella denitrificans, 235, 236
SIMS in situ hybridization (SIMSISH), see Spectrometry
Single amplified genomes, 194, 196
Soil, 1, 92, 95, 97, 101, 107, 123, 124, 127, 138
Spectrometry, 1–10, 23, 24, 33, 41–62, 92, 122, 123
Stable isotope probing (SIP), 101, 125
Streptococcus pyogenes, 148, 153
Structural classification of proteins (SCOP)
  structural classification of proteins-extended (SCOPe), 228
Surface plasmon resonance (SPR), 42, 53, 59, 60
SWISS-PROT protein database, 226
Synthetic lethal mutations, see Gene knockouts
Systems biology markup language (SBML), 262, 299, 328, 341, 350, 352, 364

T
The SEED
  ModelSEED, 316, 327, 362
Transcription activator-like effector-based nucleases (TALEN), see Nucleases
Transcriptome, 11, 12, 137, 204, 205
Transmission electron microscopy (TEM), see Microscopy
Translated EMBL nucleotide sequence data library (TrEMBL), 225
TransportDB, 208
Trichodesmium spp., 95

U
UniProt, 223, 225, 227, 238, 243, 252

V
Virus, 43

W
Weighted gene co-expression network analysis (WGCNA), 168–170
Western blots (WBs), 65–79
Whole genome amplification (WGA), 194

X
X-ray tomography, 125

Y
Yersinia pestis (YP), 179, 180, 184

Z
Zinc finger nucleases (ZFNs), see Nucleases