Synthetic Gene Circuits - Methods and Protocols (2021)


Methods in

Molecular Biology 2229

Filippo Menolascina
Editor

Synthetic
Gene Circuits
Methods and Protocols
METHODS IN MOLECULAR BIOLOGY

Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, UK

For further volumes:


http://www.springer.com/series/7651
For over 35 years, biological scientists have come to rely on the research protocols and
methodologies in the critically acclaimed Methods in Molecular Biology series. The series was
the first to introduce the step-by-step protocols approach that has become the standard in all
biomedical protocol publishing. Each protocol is provided in readily-reproducible step-by-
step fashion, opening with an introductory overview, a list of the materials and reagents
needed to complete the experiment, and followed by a detailed procedure that is supported
with a helpful notes section offering tips and tricks of the trade as well as troubleshooting
advice. These hallmark features were introduced by series editor Dr. John Walker and
constitute the key ingredient in each and every volume of the Methods in Molecular Biology
series. Tested and trusted, comprehensive and reliable, all protocols from the series are
indexed in PubMed.
Synthetic Gene Circuits

Methods and Protocols

Edited by

Filippo Menolascina
School of Engineering, Institute for Bioengineering, University of Edinburgh, Edinburgh, UK
Editor
Filippo Menolascina
School of Engineering
Institute for Bioengineering
University of Edinburgh
Edinburgh, UK

ISSN 1064-3745 ISSN 1940-6029 (electronic)


Methods in Molecular Biology
ISBN 978-1-0716-1031-2 ISBN 978-1-0716-1032-9 (eBook)
https://doi.org/10.1007/978-1-0716-1032-9

© Springer Science+Business Media, LLC, part of Springer Nature 2021


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction
on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations
and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to
be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty,
expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been
made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer
Nature.
The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
Preface

Synthetic Biology is an emerging engineering discipline with an ambitious goal: empowering
scientists with the ability to program new functions into cells, just like they would do
with computers. The field is predicated upon the “Design-Build-Test-Learn” (DBTL) cycle.
Borrowed from engineering, this framework guides human experts through the process of
translating requirements into synthetic circuit designs, building, characterizing, and
re-designing them by “learning” from “mistakes”. Systematic in nature, the DBTL cycle
could translate into lengthy and expensive iterations, until now.
The recent advent of laboratory automation in biology radically changed this landscape:
computer algorithms have emerged over the past few years that automate the design of
biocircuits. Robots that can be instructed to take such designs and physically assemble the
DNA constructs are becoming more and more affordable. Miniaturization, introduced by
microfluidics, makes it possible to increase throughput and cut reagent costs, ultimately enabling faster
and cheaper screening of candidate circuit designs. Mathematical models, once a prerogative
of quantitative scientists, can now be built automatically with minimal human effort.
Experiments themselves can now be designed by computer programs to save time and
money. Such advances, combined, can significantly speed up the process of biological
circuits engineering, yet only a fraction of this potential has been expressed so far.
This book aims to fill this knowledge gap. By bringing together some of the most
prominent scientists and engineers in synthetic biology, this volume aims at providing the
reader with clear, immediately actionable protocols to implement/exploit automated DBTL
in their research and development efforts. Following the natural evolution of a project in
synthetic biology, we first outline the techniques to model and simulate biological systems:
Chapters 1 and 2. Chapters 3 and 4 show how such models can be used automatically to
design and redesign biological systems. We then move onto laboratory automation: while
Chapter 5 guides the reader in the setup of an automated biolaboratory, Chapters 6 and 7
provide a step-by-step guide on how to perform Computer-Aided Design, Planning, and
Verification of DNA constructs using the rich toolbox of software developed at the Edin-
burgh Genome Foundry, one of the most automated public biofoundries worldwide. The
following three chapters delve into protocols for high-throughput gene circuits characteri-
zation: either through RNA sequencing (Chapter 8) or via microfluidics using bacterial cell-
free extracts, as in Chapter 9, or live mammalian cells, as in Chapter 10. Computational and
experimental procedures to automatically infer models, with minimal efforts, are outlined in
Chapters 11 and 12, respectively. Metabolic burden, a common source of divergence
between model predictions and experiments, is the focus of the following three chapters.
While in Chapters 13 and 14 we focus on computational techniques to predict such burden
from models, Chapter 15 illustrates how sensors can be designed and developed to
experimentally measure metabolic burden. Chapter 16 concludes the volume offering the
reader a broader, yet practical, perspective on how DNA parts can be engineered in
mammalian cells to sense, and respond to, intracellular signals in general.
Working on this volume we aimed at distilling our collective experience into a set of
steps and advice that will ideally help our readers jump-start their journey into
automating the design, construction, testing, and modeling of biocircuits. We hope the
result will be met with favor. I personally wish to thank all the authors for their contributions:
editing this book and venturing into their science was a tremendously enjoyable
learning experience for me.

Edinburgh, UK Filippo Menolascina


Contents

Preface
Contributors

1 Qualitative Modeling, Analysis and Control of Synthetic Regulatory Circuits
Madalena Chaves and Hidde de Jong
2 Stochastic Differential Equations for Practical Simulation of Gene Circuits
Jesús Picó, Alejandro Vignoni, and Yadira Boada
3 Using Models to (Re-)Design Synthetic Circuits
Giselle McCallum and Laurent Potvin-Trottier
4 Automated Biocircuit Design with SYNBADm
Irene Otero-Muras and Julio R. Banga
5 Setting Up an Automated Biomanufacturing Laboratory
Marilene Pavan
6 Computer-Aided Design and Pre-validation of Large Batches of DNA Assemblies
Valentin Zulkower
7 Computer-Aided Planning for the Verification of Large Batches of DNA Constructs
Valentin Zulkower
8 Characterizing Genetic Parts and Devices Using RNA Sequencing
Deepti Vipin, Zoya Ignatova, and Thomas E. Gorochowski
9 Steady-State Cell-Free Gene Expression with Microfluidic Chemostats
Nadanai Laohakunakorn, Barbora Lavickova, Zoe Swank,
Julie Laurent, and Sebastian J. Maerkl
10 A Microfluidic/Microscopy-Based Platform for on-Chip Controlled Gene Expression in Mammalian Cells
Mahmoud Khazim, Elisa Pedone, Lorena Postiglione,
Diego di Bernardo, and Lucia Marucci
11 Optimal Experimental Design for Systems and Synthetic Biology Using AMIGO2
Eva Balsa-Canto, Lucia Bandiera, and Filippo Menolascina
12 A Cyber-Physical Platform for Model Calibration
Lucia Bandiera, David Gomez-Cabeza, Eva Balsa-Canto,
and Filippo Menolascina
13 Prediction of Cellular Burden with Host–Circuit Models
Evangelos-Marios Nikolados, Andrea Y. Weiße, and Diego A. Oyarzún
14 A Practical Step-by-Step Guide for Quantifying Retroactivity in Gene Networks
Andras Gyorgy
15 Engineering Sensors for Gene Expression Burden
Alice Boo and Francesca Ceroni
16 Engineering Protein-Based Parts for Genetic Devices in Mammalian Cells
Giuliano Bonfá, Federica Cella, and Velia Siciliano

Index
Contributors

EVA BALSA-CANTO • (Bio)Process Engineering Group, IIM-CSIC (Spanish National
Research Council), Vigo, Spain
LUCIA BANDIERA • School of Engineering, Institute for Bioengineering, The University of
Edinburgh, Edinburgh, UK; SynthSys - Centre for Synthetic and Systems Biology, The
University of Edinburgh, Edinburgh, UK
JULIO R. BANGA • BioProcess Engineering Group, IIM-CSIC, Spanish National Research
Council, Vigo, Spain
YADIRA BOADA • Synthetic Biology and Biosystems Control Lab, I.U. de Automática
e Informática Industrial (ai2), Universitat Politècnica de Valencia, Valencia, Spain;
Centro Universitario EDEM, Escuela de Empresarios, La Marina de València, Valencia,
Spain
GIULIANO BONFÁ • Istituto Italiano di Tecnologia, Largo Barsanti e Matteucci, Naples, Italy
ALICE BOO • Department of Bioengineering, Imperial College London, London, UK;
Imperial College Centre for Synthetic Biology, Imperial College London, London, UK
FEDERICA CELLA • Istituto Italiano di Tecnologia, Largo Barsanti e Matteucci, Naples, Italy;
University of Genoa, Genoa, Italy
FRANCESCA CERONI • Imperial College Centre for Synthetic Biology, Imperial College London,
London, UK; Department of Chemical Engineering, Imperial College London, London,
UK
MADALENA CHAVES • Université Côte d’Azur, Inria, INRAE, CNRS, Sorbonne Université,
Biocore Team, Sophia Antipolis, France
HIDDE DE JONG • Université Grenoble Alpes, Inria, Grenoble, Montbonnot, Saint Ismier
Cedex, France
DIEGO DI BERNARDO • Telethon Institute of Genetics and Medicine, Pozzuoli (NA), Italy
DAVID GOMEZ-CABEZA • School of Engineering, Institute for Bioengineering, The University
of Edinburgh, Edinburgh, UK
THOMAS E. GOROCHOWSKI • School of Biological Sciences, University of Bristol, Bristol, UK
ANDRAS GYORGY • New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
ZOYA IGNATOVA • Institute for Biochemistry and Molecular Biology, Department of
Chemistry, University of Hamburg, Hamburg, Germany
MAHMOUD KHAZIM • Department of Engineering Mathematics, University of Bristol, Bristol,
UK; School of Cellular and Molecular Medicine, University of Bristol, Bristol, UK;
BrisSynBio, Bristol, UK
NADANAI LAOHAKUNAKORN • School of Biological Sciences, Institute of Quantitative Biology,
Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, UK
JULIE LAURENT • Institute of Bioengineering, School of Engineering, École Polytechnique
Fédérale de Lausanne, Lausanne, Switzerland
BARBORA LAVICKOVA • Institute of Bioengineering, School of Engineering, École Polytechnique
Fédérale de Lausanne, Lausanne, Switzerland
SEBASTIAN J. MAERKL • Institute of Bioengineering, School of Engineering, École
Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

LUCIA MARUCCI • Department of Engineering Mathematics, University of Bristol, Bristol,
UK; School of Cellular and Molecular Medicine, University of Bristol, Bristol, UK;
BrisSynBio, Bristol, UK
GISELLE MCCALLUM • Department of Biology, Concordia University, Montreal, QC,
Canada
FILIPPO MENOLASCINA • School of Engineering, Institute for Bioengineering, The University
of Edinburgh, Edinburgh, UK; SynthSys - Centre for Synthetic and Systems Biology, The
University of Edinburgh, Edinburgh, UK
EVANGELOS-MARIOS NIKOLADOS • School of Biological Sciences, University of Edinburgh,
Edinburgh, UK
IRENE OTERO-MURAS • BioProcess Engineering Group, IIM-CSIC, Spanish National
Research Council, Vigo, Spain
DIEGO A. OYARZÚN • School of Biological Sciences, University of Edinburgh, Edinburgh, UK;
School of Informatics, University of Edinburgh, Edinburgh, UK
MARILENE PAVAN • Lanzatech Inc., Skokie, IL, USA
ELISA PEDONE • Department of Engineering Mathematics, University of Bristol, Bristol, UK;
School of Cellular and Molecular Medicine, University of Bristol, Bristol, UK; BrisSynBio,
Bristol, UK
JESÚS PICÓ • Synthetic Biology and Biosystems Control Lab, I.U. de Automática
e Informática Industrial (ai2), Universitat Politècnica de Valencia, Valencia, Spain
LORENA POSTIGLIONE • Department of Engineering Mathematics, University of Bristol,
Bristol, UK; School of Cellular and Molecular Medicine, University of Bristol, Bristol, UK;
BrisSynBio, Bristol, UK
LAURENT POTVIN-TROTTIER • Department of Biology, Concordia University, Montreal, QC,
Canada; Center for Applied Synthetic Biology, Concordia University, Montreal, QC,
Canada; Department of Physics, Concordia University, Montreal, QC, Canada
VELIA SICILIANO • Istituto Italiano di Tecnologia, Largo Barsanti e Matteucci, Naples, Italy
ZOE SWANK • Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale
de Lausanne, Lausanne, Switzerland
ALEJANDRO VIGNONI • Synthetic Biology and Biosystems Control Lab, I.U. de Automática
e Informática Industrial (ai2), Universitat Politècnica de Valencia, Valencia, Spain
DEEPTI VIPIN • Institute for Biochemistry and Molecular Biology, Department of Chemistry,
University of Hamburg, Hamburg, Germany
ANDREA Y. WEIßE • School of Informatics, University of Edinburgh, Edinburgh, UK
VALENTIN ZULKOWER • Edinburgh Genome Foundry, SynthSys, School of Biological Sciences,
University of Edinburgh, Edinburgh, UK
Chapter 1

Qualitative Modeling, Analysis and Control of Synthetic
Regulatory Circuits
Madalena Chaves and Hidde de Jong

Abstract
Qualitative modeling approaches are promising and still underexploited tools for the analysis and design of
synthetic circuits. They can make predictions of circuit behavior in the absence of precise, quantitative
information. Moreover, they provide direct insight into the relation between the feedback structure and the
dynamical properties of a network. We review qualitative modeling approaches by focusing on two specific
formalisms, Boolean networks and piecewise-linear differential equations, and illustrate their application by
means of three well-known synthetic circuits. We describe various methods for the analysis of state
transition graphs, discrete representations of the network dynamics that are generated in both modeling
frameworks. We also briefly present the problem of controlling synthetic circuits, an emerging topic that
could profit from the capacity of qualitative modeling approaches to rapidly scan a space of design
alternatives.

Key words Qualitative modeling, Gene regulatory networks, Synthetic circuits, Boolean models,
Piecewise-linear differential equation models, Network control

1 Introduction

Over the past decade, the construction of synthetic circuits in living
cells has been facilitated by the development of increasingly powerful
techniques in molecular biology, from DNA synthesis to parts
libraries to genome editing [1–4]. Moreover, bioinformatics tools
supporting the in silico design of plasmids and genomes have
become standard in every laboratory. Notwithstanding these tech-
nological advances, the biological implementation of synthetic cir-
cuits remains a highly challenging task, because the interactions
between the circuit elements, and the circuit and the cellular chas-
sis, may have unforeseen dynamic consequences [5]. The difficul-
ties to predict and understand circuit dynamics become even more
compelling when the size and the scope of synthetic circuits
increase, as has been the case in recent years.

Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229,
https://doi.org/10.1007/978-1-0716-1032-9_1, © Springer Science+Business Media, LLC, part of Springer Nature 2021


Probably the most promising way to get a grasp on the relation
between the structure and behavior of synthetic circuits is the use of
mathematical models. The development of such models is impor-
tant for both the a-priori design and the a-posteriori analysis of
synthetic regulatory circuits. For example, libraries of circuit com-
ponents have been used to explore the design space and find con-
structions that are Pareto optimal, in the sense of having the best
trade-off between multiple, conflicting optimization criteria
[6]. Moreover, the analysis of mathematical models of synthetic
oscillators has shown that circuits combining negative with positive
feedback lead to increased stability of oscillations [7].
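The notion of Pareto optimality mentioned above can be made concrete with a minimal sketch. It assumes that each candidate design has already been scored on a set of criteria that are all to be minimized. The design names, the two criteria, and the scores are hypothetical and are not taken from [6].

```python
from typing import Dict, List, Tuple


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is at least as good as b on every criterion and strictly
    better on at least one (all criteria are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(designs: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return the sorted names of the designs that no other design dominates."""
    return sorted(
        name
        for name, score in designs.items()
        if not any(
            dominates(other, score)
            for other_name, other in designs.items()
            if other_name != name
        )
    )


# Hypothetical scores: (switching time, production cost), both minimized.
candidates = {
    "design_A": (1.0, 9.0),
    "design_B": (2.0, 4.0),
    "design_C": (3.0, 5.0),  # worse than design_B on both criteria
    "design_D": (6.0, 1.0),
}

front = pareto_front(candidates)
print(front)
```

In this toy data set, design_C is discarded because design_B is better on both criteria, while the remaining three designs represent different trade-offs.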
While mathematical modeling is a basic part of the toolbox of
engineers in other disciplines, its application to (synthetic) biology
encounters specific problems that cause it to be still underexploited
today. One important difficulty for modeling biological systems on
the molecular level is the absence of reliable in-vivo values of para-
meters. This difficulty is further amplified by the fact that the
modeling of synthetic circuits and their interactions with the cellu-
lar environment may require a large number of equations and
parameters. The size and nonlinearity of these models make quan-
titative estimation of the parameters from usually incomplete and
noisy time-series data a real challenge [8–10].
An advantage of qualitative models is that they do not need
quantitative parameter values for making predictions about the
network dynamics. This comes at a price, of course, namely that
the predictions are less precise. Usually, only qualitative patterns
can be predicted, such as that a protein concentration increases over
time instead of increasing from 1 to 2 μM. For many purposes this
is sufficient though and in some cases even desirable. Indeed, by
focusing on qualitative aspects of the system dynamics, one may
gain a better insight into key structural properties of the network
that give rise to a certain dynamical behavior. This might make it possible, for
example, to perform a preliminary screening of possible network
structures capable of displaying a certain desired property or to
understand structural causes for undesired side-effects when analyzing
data on circuit performance.
Qualitative models have been traditionally proposed for gene
regulatory networks [11–13], since the switching dynamics of gene
expression lend themselves particularly well to the approximations
underlying most qualitative models. As a matter of fact, the activity
levels of genes can be seen as consisting of distinct discrete states,
typically ON and OFF. Moreover, switches between these discrete
states can be seen as following a regulatory logic, due to the
combinatorial effect of transcription factors and other regulators.
In recent years, similar approximations have been shown to be useful for
other types of networks as well, notably signal transduction networks
[14, 15]. In the latter case, the activity levels of signaling
proteins like kinases and phosphatases are assumed to correspond
to distinct discrete states and their combinatorial effect on the
activity of other signaling proteins to follow a regulatory logic.
Through the examples in this review, however, we will focus on
gene regulatory networks.
We discuss two different qualitative modeling approaches:
Boolean or logical models vs. piecewise-linear differential equation
models. The first approach is based on discrete models, whereas
models in the second approach are continuous but have dynamics
that can be analyzed in a qualitative manner via discrete abstrac-
tions, that is, discrete descriptions of an underlying continuous
dynamics. While the two types of models draw upon different
mathematical concepts and methods, it is quite remarkable that
the discrete representations of the network dynamics that are even-
tually used are quite similar. In both cases, the dynamics are well
described by the so-called state transition graphs, consisting of
network states and transitions between these states. As a conse-
quence, the methods for analyzing the two types of models are
quite similar in practice, and are concerned with, for example,
finding attractors in the state transition graph, verifying properties
of paths in the state transition graphs, reducing large state transi-
tion graphs to smaller, more insightful graphs, and composing
properties of the state transition graph of the entire network from
the properties of the network modules. In our description of these
methods, we will emphasize their applicability to both types of
qualitative models.
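As an illustration of one of the analysis tasks listed above, the following sketch searches a state transition graph for its attractors. It assumes that an attractor is a terminal strongly connected component, that is, a set of mutually reachable states that no transition leaves. The graph, the state names, and the function names are hypothetical, and the brute-force reachability computation is only suitable for small graphs.

```python
from typing import Dict, FrozenSet, Set

Graph = Dict[str, Set[str]]


def reachable(graph: Graph, start: str) -> Set[str]:
    """All states reachable from start, including start itself."""
    seen = {start}
    stack = [start]
    while stack:
        state = stack.pop()
        for nxt in graph.get(state, set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def attractors(graph: Graph) -> Set[FrozenSet[str]]:
    """Terminal strongly connected components of a state transition graph."""
    # Collect every state, including those that only appear as successors.
    states = set(graph)
    for successors in graph.values():
        states |= successors
    reach = {s: reachable(graph, s) for s in states}
    found = set()
    for s in states:
        # s lies in an attractor iff every state reachable from s can return to s.
        if all(s in reach[t] for t in reach[s]):
            found.add(frozenset(reach[s]))
    return found


# Hypothetical graph: a transient state, a two-state cycle, and a steady state.
stg = {
    "a": {"b", "d"},
    "b": {"c"},
    "c": {"b"},
    "d": set(),  # no outgoing transition: a steady state
}

result = attractors(stg)
print(result)
```

The same routine applies whether the graph was generated from a Boolean model or from a discrete abstraction of a piecewise-linear model, since it only relies on states and transitions.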
Much work in systems and synthetic biology has been
concerned with analyzing the relation between the structure of
synthetic circuits and their dynamic properties. A complementary
question, which has started to emerge in recent years, is the control
of synthetic circuits through suitably chosen inputs, so as to steer
the network behavior towards a desired objective [16, 17]. The
question of control can be understood in a wider sense as well,
namely how a synthetic circuit can bring a naturally evolved net-
work in the cell into a certain desired state. Both control questions
raise issues that can be fruitfully addressed using the qualitative
modeling approaches discussed in this review, and we will discuss
some burgeoning approaches by means of recent examples from the
literature.
This chapter does not aim at covering the whole breadth of
qualitative modeling approaches proposed in systems and synthetic
biology, nor does it intend to provide much detail on the mathe-
matical bases of the formalisms and methods covered here. Previous
reviews of qualitative modeling approaches available in the litera-
ture have already done this (see Note 8.1). Our specific contribu-
tions are threefold. First, we focus on the use of qualitative
modeling in the design and analysis of synthetic networks as com-
pared to naturally evolved networks. Second, we provide an
integrated discussion of two different modeling frameworks,
having different strengths and weaknesses, structured around a
common discrete representation of the network dynamics, state
transition graphs. Third, we highlight an emerging topic in the
design and analysis of synthetic circuits, namely the integration of
considerations of network control in the design phase.

2 Examples of Synthetic Regulatory Circuits

The modeling frameworks and analysis methods discussed in this
chapter will be illustrated by means of three simple examples of
synthetic regulatory circuits: the toggle switch, a synthetic oscilla-
tor, and the IRMA network (Fig. 1).
The toggle switch is among the first synthetic networks
described in the literature [18] and is still much used for illustrating
design and control principles in synthetic biology [6, 19]. Moreover,
the network motif of the toggle switch has been found to play an
important role in a variety of natural processes, such as in the
development of the body plan in insect embryos [20].
The toggle switch implemented in Escherichia coli consists of
two genes, lacI and tetR (Fig. 1a). The two genes encode transcrip-
tion factors, LacI and TetR, that mutually repress each other by
binding to the promoter region of the tetR and lacI genes, respec-
tively. A gfp reporter gene is co-transcribed with tetR. The inhibi-
tory activity of LacI can be modulated by adding the
non-metabolizable inducer molecule isopropyl β-D-1-thiogalacto-
pyranoside (IPTG) to the medium. Similarly, repression by TetR
can be released by adding anhydrotetracycline (aTc).
The toggle switch can display bistable behavior, depending on
parameter values, due to the presence of a positive feedback loop
(see [21] for another example of a simple bistable synthetic network
with positive feedback). On the contrary, negative feedback loops
have been associated with oscillatory behaviors. The repressilator
[22], a circuit implementing a three-gene negative feedback loop, is
the first example of a synthetic oscillator. Just as toggle switches have
been shown to be useful for understanding developmental switches,
repressilators and other synthetic oscillators are interesting models
for analyzing temporally repetitive processes, such as the cell cycle
and circadian rhythms. The notions of positive and negative feed-
back, and their relation to circuit dynamics, are further developed in
Note 8.2, following the analysis in [23].
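The distinction between positive and negative feedback can be illustrated with a short sketch. It assumes the usual convention that the sign of a feedback loop is the product of the signs of its interactions, with +1 for activation and -1 for repression. The gene names g1, g2, g3 of the three-gene loop and the function name are placeholders chosen for this example.

```python
from math import prod
from typing import Dict, List, Tuple


def loop_sign(cycle: List[str], signs: Dict[Tuple[str, str], int]) -> int:
    """Product of the interaction signs along a closed loop of genes."""
    edges = zip(cycle, cycle[1:] + cycle[:1])
    return prod(signs[edge] for edge in edges)


# Toggle switch: two mutual repressions.
toggle_signs = {("lacI", "tetR"): -1, ("tetR", "lacI"): -1}
toggle = loop_sign(["lacI", "tetR"], toggle_signs)

# Three-gene ring of repressions, as in a repressilator-like circuit
# (g1, g2, g3 are placeholder names).
ring_signs = {("g1", "g2"): -1, ("g2", "g3"): -1, ("g3", "g1"): -1}
ring = loop_sign(["g1", "g2", "g3"], ring_signs)

print(toggle, ring)  # 1 -1
```

An even number of repressions thus yields a positive loop, as in the toggle switch, whereas an odd number yields a negative loop, as in the three-gene ring.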
In this section, we introduce another example of an oscillatory
circuit, combining a negative feedback loop with positive feedback
[24] (Fig. 1b). The circuit consists of two genes, lacI and glnG. The
latter encodes nitrogen regulator I (NRI), a transcription regulator
that is active when phosphorylated. The E. coli strain in which this
circuit has been implemented was modified in such a way as to make
phosphorylation of NRI constitutive, independent of the cellular
(nitrogen) state. Both lacI and glnG are activated by phosphorylated
NRI (NRIp). The negative feedback loop involving both
genes, lacI and glnG, is thus modulated by a positive (auto-regulatory)
feedback loop on glnG. Network topologies combining
negative with positive feedback have been argued to give rise to
more robust oscillations [7].

Fig. 1 Three examples of synthetic regulatory circuits. (a) Toggle switch in
E. coli. (b) Oscillator in E. coli. (c) IRMA network in S. cerevisiae. Genes (blue/
green blocks) are preceded by a promoter region (red block). Gene names are in
italic font, protein names in roman font. The regulation of gene expression by the
proteins encoded by genes is represented by solid lines ending by a symbol
indicating the type of regulation: activation of gene expression is indicated by the
▹ symbol and repression by ⊣. External metabolites and other small molecules
(IPTG, aTc, galactose), shown in orange, may affect the strength of the regulatory
interactions

The final circuit used as a running example in this chapter is the
IRMA network (Fig. 1c). The number of genes in this network is
larger as compared to the others considered above and it has been
implemented in a eukaryotic microorganism, the yeast
Saccharomyces cerevisiae [25]. The IRMA network consists of five
genes and includes both transcriptional regulation and protein–
protein interactions. In particular, the genes CBF1, GAL4, SWI5,
and ASH1 encode transcriptional regulators arranged in a negative
feedback loop with a super-imposed positive feedback loop. Swi5
also activates the gene GAL80, whose product Gal80 binds to and
inactivates Gal4, thus giving rise to an additional negative feedback
loop. The action of Gal80 on Gal4 is inhibited when the metabolite
galactose is present in the medium.
In what follows, the toggle switch and the oscillator circuit will
be used to explain the principles of the modeling approaches. The
IRMA network, which has the most complex dynamics, will illus-
trate the methods developed for the analysis of large state transition
graphs.

3 Boolean Models

Boolean models are a simplified but intuitive modeling framework,
using discrete variables and translating the topology of the graph of
interactions into a set of logical operations. To apply this framework
to biological networks, the concentration of each biological com-
ponent (protein, mRNA) is represented by 0 or 1, depending on
whether it is weakly or highly expressed, and its activity is repre-
sented by a logical rule describing the combination of interactions
influencing that component. The concentrations are defined in
continuous time, but their value is allowed to change only at a
discrete set of time instants. The time units are arbitrary and only
sequences of states have biological meaning. The general form of a
Boolean model is

$$X_i^{+} = f_i(X_1, \ldots, X_n), \tag{1}$$
where $n$ is the number of variables, $X_i \in \{0, 1\}$ denotes the Boolean
concentration of variable $i$ at the current time, and $X_i^{+}$ is the value
at the next evaluation time, to be computed from the current values
by applying the logical rule $f_i : \{0, 1\}^n \to \{0, 1\}$. The function $f_i$ may
depend only on a subset of the variables. The discrete concentra-
tions evolve according to an updating schedule, which defines the
order in which the variables change to their next values. Common
schedules include synchronous updates, where the rules for all
components are simultaneously applied, and asynchronous
updates, where at most one rule is applied at each time instant (see
Note 8.3).
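The difference between the two updating schedules can be seen in a minimal sketch. It assumes a two-variable model of mutual repression in the spirit of the toggle switch, with the effect of the inducers ignored. The rules and function names below are illustrative assumptions and not necessarily the model used later in this chapter.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]
Rule = Callable[[State], int]

# Assumed rules for mutual repression: index 0 is LacI, index 1 is TetR.
rules: List[Rule] = [
    lambda x: 1 - x[1],  # LacI is expressed when TetR is absent
    lambda x: 1 - x[0],  # TetR is expressed when LacI is absent
]


def synchronous_successor(x: State, rules: List[Rule]) -> State:
    """Apply the rules of all variables simultaneously."""
    return tuple(f(x) for f in rules)


def asynchronous_successors(x: State, rules: List[Rule]) -> List[State]:
    """Apply one rule at a time: one successor per variable that would change."""
    successors = []
    for i, f in enumerate(rules):
        target = f(x)
        if target != x[i]:
            successors.append(x[:i] + (target,) + x[i + 1:])
    return successors


sync_from_00 = synchronous_successor((0, 0), rules)     # (1, 1)
async_from_00 = asynchronous_successors((0, 0), rules)  # [(1, 0), (0, 1)]
async_from_10 = asynchronous_successors((1, 0), rules)  # [] : steady state

print(sync_from_00, async_from_00, async_from_10)
```

With these rules, the synchronous schedule alternates between (0, 0) and (1, 1), whereas the asynchronous schedule leads from (0, 0) to one of the two steady states (1, 0) or (0, 1). The choice of updating schedule can therefore change the predicted qualitative behavior.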
This concise but abstract formalism becomes quite useful when
studying complex biological networks, where the structure and
topology of interactions are well known but few quantitative details
are available; for instance, whenever the concentrations of most of
the components involved are not measured, and the reaction rates
and other parameters are unknown. In such cases, a Boolean model
of the network provides a global qualitative view of the dynamical
behavior of the network, using all the available information on the
network, but without introducing unknown parameters.
The first application of Boolean models to biological networks
was suggested by Stuart Kauffman in 1969 [11] and by René
Thomas around the same time [13]. Both used the Boolean repre-
sentation to describe genetic regulatory networks, where events
such as mRNA transcription and protein translation may be
thought of as being “turned on” or “turned off” (1 or 0).
The last 20 years have witnessed an increasing availability of
genomic and proteomic data, the discovery of new biological mole-
cules and pathways, and the multiplication of interactions among
biological components. Nevertheless, it is still difficult to obtain
detailed parameter sets to characterize each biological reaction or
interaction. On the mathematical side, several methods have been
proposed to better characterize Boolean models and introduce
quantitative elements: probabilistic and stochastic approaches
[26, 27], complex updating schedules [28–30], model reduction
[31, 32], attractor computation [33–35], characterization of state
transition graphs [36], network interconnections [37], and control
methods [38–40].
All these advances sparked a new wave of interest in Boolean
models for application to a wide range of biological networks, from
the cellular division cycle in various organisms [41–43], to signal
transduction networks [15, 44], cancer-related networks [45], or
pattern formation [46, 47]. A large collection of recent examples
can be found in a special issue of the journal Frontiers in
Physiology [48].
In addition to Boolean models, there are several approaches
using discrete and logical functions to describe biological networks.
Further work by René Thomas and collaborators extends Boolean
models in several ways [23], such as the inclusion of multiple
discrete levels, by assigning parameters to the transition graph
edges to indicate different concentration thresholds. Among
other formal methods, Petri nets have been successfully applied to
model biological systems [49]. A Petri net is defined through a
graph with two types of nodes (places and transitions), connected
by weighted directed edges. Places may be marked by a number of
tokens that enable transitions. Petri nets are especially suitable to
model biochemical and metabolic networks, as the incidence matrix
of the net reflects the stoichiometry matrix [50].
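As a small illustration of the last remark, the sketch below encodes a single hypothetical reaction, 2 A + B -> C, as a Petri net with one transition. The reaction, the variable names, and the initial marking are assumptions chosen for the example; the point is only that the incidence matrix (tokens produced minus tokens consumed) coincides with the stoichiometric column of the reaction.

```python
# Hypothetical reaction 2 A + B -> C, modeled as one transition.
places = ["A", "B", "C"]
pre = [[2], [1], [0]]    # tokens consumed: rows are places, columns are transitions
post = [[0], [0], [1]]   # tokens produced

# Incidence matrix = post - pre; here it equals the stoichiometry (-2, -1, +1).
incidence = [[post[p][t] - pre[p][t] for t in range(1)] for p in range(len(places))]

def enabled(marking, t):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking[p] >= pre[p][t] for p in range(len(places)))

def fire(marking, t):
    """Fire transition t and return the new marking."""
    if not enabled(marking, t):
        raise ValueError("transition not enabled")
    return [marking[p] + incidence[p][t] for p in range(len(places))]

marking = [4, 1, 0]
print(incidence)
print(fire(marking, 0))
```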

3.1 The Toggle Switch

As a first example, consider the toggle switch, a network with two
components L and T (for LacI and TetR protein expression, respectively),
and two inputs A and I (for aTc and IPTG concentration,
respectively). Both variables and inputs take values 0 or 1. To write
(a) Synchronous updating      (c) Asynchronous updating
L T    L+ T+                  L T    L+ T+
00     11                     00     10, 01
01     01                     01     01
10     10                     10     10
11     00                     11     10, 01

[Panels (b) and (d): state transition graphs on the states 00, 01, 10, 11; images not reproduced here.]

Fig. 2 Toggle switch without inputs (I = A = 0). (a and b) Synchronous updating
schedule and corresponding transition graph. (c and d) Asynchronous updating
schedule and corresponding transition graph

the logical rule for L activity, we translate the interactions into
logical operations, that is, protein LacI is produced only when its
repressor TetR is not present or when input aTc is present, since aTc
prevents inhibition by TetR. This can be written as $f_L(T, A) = \neg T \lor A$.
A similar argument can be used to write the logical rule for
TetR: $f_T(L, I) = \neg L \lor I$. The Boolean model of the toggle switch is
then:
$$L^{+} = f_L(T, A) = \neg T \lor A, \tag{2}$$
$$T^{+} = f_T(L, I) = \neg L \lor I, \tag{3}$$
where both A and I are constant inputs. A constant variable may be
defined by the rule $A^{+} = A$, but here we will simply analyze separately
the cases A = 0 or A = 1, and I = 0 or I = 1. From the logical
rules, we can construct the synchronous truth table for the model,
containing the successor of each state (Fig. 2a).
The overall behavior of the system can also be represented in
terms of a state transition graph, where each state is connected to its
successor by an arrow (Fig. 2b). The states LT that satisfy
$L = f_L(T, A)$ and $T = f_T(L, I)$ are called point attractors and correspond
to steady states of the system (see Subheadings 5.1 and 5.2
for more details).
The structure of the state transition graph depends on the
updating schedule. A short review of common schedules is given
in Note 8.3 below. With the synchronous updating schedule, all
variables are simultaneously updated, so there is exactly one succes-
sor for each state. In the corresponding state transition graph, there
are two point attractors, 01 and 10, representing two steady states:
either LacI is at high concentration thus inhibiting TetR which
must be at low concentration (state 10) or the opposite. There is
also a cyclic attractor, composed of two states {00, 11}. This cycle
does not have any valid biological interpretation, as it requires that
both LacI and TetR change concentrations simultaneously, but
appears as an artifact of the synchronous updating schedule. To
overcome this problem, a common approach is to add the hypoth-
esis that exactly one variable may change its value at each updating
step, thus yielding an asynchronous schedule, where each state can
have up to 2 n successors. Under this hypothesis, all point attractors
remain unchanged, but cyclic attractors are often resolved into
more realistic state trajectories. Applying this hypothesis to the
toggle switch yields Fig. 2c and the state transition graph in
panel d, where the states 00 and 11 can each cross to the two
attractors.
We conclude that, independently of its initial state and without
inputs, the toggle switch will eventually converge to a state where
only one of the proteins is strongly expressed. For other combinations
of the inputs A and I, it is easy to predict their effect: a
constant A = 1 implies $L^{+} = L = 1$, which eventually sets $T^{+} = T = 0$,
hence leading to the state 10. Conversely, a constant I = 1 eventually
leads to the state 01. Setting both inputs to 1 leads to a state
where both proteins are strongly expressed since no inhibition
remains.
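The analysis of the toggle switch above can be reproduced with a short script. The sketch below is one possible encoding of Eqs. 2 and 3 (states are written as pairs (L, T); the function names are ours, introduced for illustration); it lists the point attractors for each combination of the inputs and the asynchronous successors of a state.

```python
from itertools import product

def toggle_rules(state, A=0, I=0):
    """Eqs. 2-3: L+ = (not T) or A, T+ = (not L) or I."""
    L, T = state
    return (int((not T) or A), int((not L) or I))

def async_successors(state, A=0, I=0):
    """Asynchronous schedule: change one variable at a time."""
    target = toggle_rules(state, A, I)
    return [state[:i] + (target[i],) + state[i + 1:]
            for i in range(2) if state[i] != target[i]]

def point_attractors(A=0, I=0):
    """States that are their own successor, i.e., L = f_L(T, A) and T = f_T(L, I)."""
    return [s for s in product((0, 1), repeat=2) if toggle_rules(s, A, I) == s]

for A, I in product((0, 1), repeat=2):
    print(f"A={A}, I={I}: point attractors {point_attractors(A, I)}")

print("synchronous successor of 11:", toggle_rules((1, 1)))
print("asynchronous successors of 00:", async_successors((0, 0)))
```

Without inputs the script returns the two point attractors 01 and 10, and the synchronous cycle between 00 and 11 is replaced by asynchronous transitions from each of these states to 10 or 01, in agreement with Fig. 2.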

3.2 The Oscillator with Positive Feedback

This network is also composed of two genes, lacI and glnG, encoding
two proteins, LacI and NRI, both regulated by the phosphorylated
form of the transcription factor NRI. The protein LacI
represses transcription of glnG and, in turn, the input IPTG lifts
LacI repression. In general, the phosphorylated transcription factor
NRI will activate genes lacI and glnG at different concentrations or
activity thresholds, that is, whenever protein NRI is above a first
threshold concentration $\theta_N^1$, transcription of glnG is activated, and
when NRI becomes higher than a second threshold concentration
$\theta_N^2$, it induces activity of lacI. The experimental system [51] implies
that $\theta_N^1 < \theta_N^2$. These distinct thresholds of activation for NRI
require a variable with at least three discrete concentration levels,
while Boolean variables have only two levels. To resolve this problem,
a generalized logical model would consider a multi-leveled
variable N to describe the concentration of protein NRI (as in the
corresponding PLDE model, Subheading 4.2). Alternatively, Boolean
models can also be extended as suggested in [52], by creating
two different Boolean variables, N1 and N2, to represent N as
follows:
$$N_1 = \begin{cases} 0, & N < \theta_N^1, \\ 1, & N > \theta_N^1, \end{cases} \qquad N_2 = \begin{cases} 0, & N < \theta_N^2, \\ 1, & N > \theta_N^2. \end{cases}$$
These two variables will evolve according to different Boolean
rules, but should always satisfy $N_1 \geq N_2$, by definition of the thresholds.
More specifically, if N is a logical variable with three levels {0,
1, 2}, then N1 and N2 allow us to code for those three levels in a
Boolean notation, that is, “0 = 00,” “1 = 10,” and “2 = 11” so that
the higher concentration of N corresponds to both N1 and N2 ON,
while the intermediate concentration of N corresponds to N1 ON
and N2 OFF. Note that the Boolean state (N1, N2) = (0, 1) does
not encode for any level of variable N and does not take part in the
state transition graph of the Boolean model.
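The coding of a multi-level variable by several Boolean variables can be made explicit as follows. This is a minimal sketch with function names of our own choosing; the scheme itself is sometimes called a unary or thermometer code, a term not used in [52].

```python
from itertools import product

def encode_level(level, n_thresholds):
    """Bit k is 1 when the level lies above threshold k + 1."""
    return tuple(int(level > k) for k in range(n_thresholds))

def is_admissible(bits):
    """Admissible codes are non-increasing, e.g., N1 >= N2."""
    return all(a >= b for a, b in zip(bits, bits[1:]))

# Three levels of N coded by (N1, N2)
for level in range(3):
    print(level, encode_level(level, 2))

# (0, 1) is the only inadmissible pair
print([bits for bits in product((0, 1), repeat=2) if not is_admissible(bits)])
```

The same construction with three thresholds yields four admissible codes out of eight, which is the situation encountered for Swi5 in the IRMA circuit.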
Therefore, three variables will be considered: L for LacI and
N1, N2 for NRI protein expression. The input IPTG is denoted I.
To assign the rules for variables N1 and N2, we will consider that
NRI transcription is activated in a first stage by the positive feed-
back loop and in a second stage LacI repression comes into play.
Thus N1 is regulated by N2 only, while N2 is regulated both by N1
and L. The Boolean rules for the oscillator with positive feedback
become:
$$L^{+} = N_2, \tag{4}$$
$$N_1^{+} = N_2, \tag{5}$$
$$N_2^{+} = (\neg L \lor I) \land N_1. \tag{6}$$
The input I = 1 induces the expression of NRI, followed by its
phosphorylation, and subsequent expression of LacI, so the system
converges to state 111.
In the case I = 0, the synchronous and asynchronous updating
schedules lead to quite different state transition graphs, but both
contain only one attractor, consisting of the origin with all proteins
weakly expressed. In the synchronous case, however, the transition
010 → 001 is an artifact of the simultaneous updating of N1 and
N2. This problem is resolved in the asynchronous state transition
graph (Fig. 3b), where the states 001 and 101 are transient and do

(a) Truth table
L N1 N2    Synchronous L+ N1+ N2+    Asynchronous L+ N1+ N2+
000        000                       000
001        110                       101, 011, 000
010        001                       000, 011
011        111                       111
100        000                       000
101        110                       111, 100
110        000                       010, 100
111        110                       110

[Panel (b): asynchronous state transition graph on the states 000 to 111. Panel (c): hierarchical graph of the components C1 to C5. Images not reproduced here.]

Fig. 3 Oscillator with positive feedback and zero input (I = 0). (a) Truth table for
synchronous and asynchronous updating schedules. (b) Asynchronous state
transition graph. (c) Hierarchical state transition graph after decomposition into
strongly connected components (see Subheading 5.2 for the corresponding
analysis and definition of the components Ci)
not have any incoming arrows from other states. In this graph, the
effect of the negative feedback loop between LacI and NRI can be
observed in the cyclic orbit which is reached whenever NRI is above
its intermediate threshold concentration (N1 = 1):
111 → 110 → 010 → 011 → 111. However, this cyclic orbit is not
an attractor itself and the Boolean model predicts that all trajec-
tories eventually converge to the point attractor formed by the
origin (see the transitions from the states 010 and 110 to 000).
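The statements above about the asynchronous graph for I = 0 can be checked by brute force. The following sketch encodes Eqs. 4 to 6 with states written as (L, N1, N2) and finds attractors by a simple reachability test: a state belongs to an attractor if every state reachable from it can reach it back. The function names and this particular test are our own illustrative choices; dedicated tools use more efficient algorithms.

```python
from itertools import product

def rules(state, I=0):
    """Eqs. 4-6: L+ = N2, N1+ = N2, N2+ = ((not L) or I) and N1."""
    L, N1, N2 = state
    return (N2, N2, int(((not L) or I) and N1))

def async_successors(state, I=0):
    target = rules(state, I)
    return [state[:i] + (target[i],) + state[i + 1:]
            for i in range(3) if state[i] != target[i]]

def reachable(start, I=0):
    """All states reachable from start in one or more asynchronous steps."""
    seen, stack = set(), [start]
    while stack:
        for nxt in async_successors(stack.pop(), I):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def attractors(I=0):
    """Sets of states without outgoing transitions, found by mutual reachability."""
    states = list(product((0, 1), repeat=3))
    reach = {s: reachable(s, I) for s in states}
    found = set()
    for s in states:
        if all(s in reach[t] for t in reach[s]):
            found.add(frozenset(reach[s] | {s}))
    return found

print(attractors(I=0))
orbit = [(1, 1, 1), (1, 1, 0), (0, 1, 0), (0, 1, 1)]
print([orbit[(k + 1) % 4] in async_successors(orbit[k]) for k in range(4)])
```

For I = 0 the only attractor returned is the origin, while each transition of the cyclic orbit 111 → 110 → 010 → 011 → 111 is present in the graph; the orbit can be left from 110 and 010.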
In this example, the global behavior of the Boolean model
differs from that of the corresponding PLDE model in Subheading
4.2, even though both models have the same point attractor at the
origin and the cyclic orbit of the Boolean model corresponds
exactly to the orbit depicted in Fig. 6b. However, in the PLDE
model, the cyclic orbit is also an attractor, and there are trajectories
converging either to the origin or to a (damped) periodic orbit
depending on the initial conditions. In this case, the PLDE model
allows for a more detailed description of the continuous state space,
as discussed below (Subheading 4.2).

3.3 The IRMA Circuit

This circuit is composed of five genes encoding five proteins,
Ash1, Cbf1, Gal4, Gal80, and Swi5, and one input G (galactose).
One of the proteins (Swi5) is a transcription factor for three of the
genes. In this circuit, the different activity thresholds of Swi5
relative to the three genes will play an important role in determining
the dynamical properties of the system. These thresholds define
the (distinct) concentrations of Swi5 which trigger the transcription
of the three genes. If S denotes the (continuous) concentration
of protein Swi5, then transcription of gene ASH1 is activated
whenever $S > \theta_S^a$. Similarly, transcription of genes CBF1 and
GAL80 is initiated when $S > \theta_S^c$ and $S > \theta_S^g$, respectively. From
the analysis in [51], the activity threshold for CBF1 should be the
lowest, and in this section we will consider that $\theta_S^c < \theta_S^g < \theta_S^a$. These
distinct thresholds for S require a logical variable with at least four
discrete concentration levels so, as in the oscillator example, an
extended Boolean model will be constructed [52], by creating
three different Boolean variables to represent S as follows:
$$S_c = \begin{cases} 0, & S < \theta_S^c, \\ 1, & S > \theta_S^c, \end{cases} \qquad S_g = \begin{cases} 0, & S < \theta_S^g, \\ 1, & S > \theta_S^g, \end{cases} \qquad S_a = \begin{cases} 0, & S < \theta_S^a, \\ 1, & S > \theta_S^a. \end{cases}$$

These three variables will evolve according to different Boolean
rules, but should always satisfy $S_c \geq S_g \geq S_a$ since, by definition of
the thresholds, Sa = 1 implies Sg = 1 which implies Sc = 1. In particular,
this signifies that the Boolean states where Sc < Sg or Sg < Sa
do not have biological meaning for the IRMA circuit and are not a
part of the state transition graph.
The IRMA circuit can now be translated into Boolean rules as
follows, with A, C, G4, and G80 denoting the concentrations of
Ash1, Cbf1, Gal4, and Gal80, respectively:
$$G4^{+} = C, \tag{7}$$
$$A^{+} = S_a, \tag{8}$$
$$C^{+} = S_c \land \neg A, \tag{9}$$
$$G80^{+} = S_g, \tag{10}$$
$$S_c^{+} = G4 \lor S_g, \tag{11}$$
$$S_g^{+} = ((\neg G80 \lor G) \land S_c) \lor S_a, \tag{12}$$
$$S_a^{+} = G4 \land S_g, \tag{13}$$
where the rules for the three Swi5 variables indicate that, first, Swi5
is activated by protein Gal4 and then, in a second step, the inhibi-
tion by Gal80 comes into play. If Swi5 is already above its second
threshold (with Sg = 1), then continued activation by Gal4 further
increases Swi5 production. The dependence of the Swi5 variables
on each other guarantees that the “forbidden” states (i.e., those
satisfying Sc < Sg or Sg < Sa) do not enter into the dynamics: Sc
(respectively, Sg) should be in the ON state whenever Sg is (respec-
tively, Sa); conversely, Sa cannot become ON unless Sg is.
The IRMA Boolean model has 64 states; the state transition
graph corresponding to the asynchronous updating schedule with
G = 1 is shown in Fig. 4. In the absence of galactose (G = 0), there
is only one attractor, the origin, corresponding to the state in which
all proteins are weakly expressed. In the presence of galactose
(G = 1), the inhibition of SWI5 by Gal80 is inactive, and the system
has two attractors: the origin and a cyclic attractor with eight states,
$$0001110 \to 0011110 \to 1011110 \to 1011111 \to 1111111 \to 1101111 \to 0101111 \to 0101110 \to 0001110. \tag{14}$$
In this cyclic attractor, Swi5 is always expressed above its second
threshold (Sc = Sg = 1), hence Gal80 is also always expressed. The
sequence of activations is then Cbf1 ON, Gal4 ON, Swi5a ON,
Ash1 ON, followed by their repression in the same order.
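The cyclic attractor of Eq. 14 can be verified directly from Eqs. 7 to 13. The sketch below assumes that the seven bits of each state are ordered as (G4, A, C, G80, Sc, Sg, Sa), that is, in the order of the equations; this ordering is our reading of Eq. 14, chosen because it reproduces the sequence of activations described in the text. Function names are illustrative.

```python
from itertools import product

def rules(state, G):
    """Eqs. 7-13 for a state (G4, A, C, G80, Sc, Sg, Sa) and input G."""
    G4, A, C, G80, Sc, Sg, Sa = state
    return (
        C,                                      # Eq. 7
        Sa,                                     # Eq. 8
        int(Sc and not A),                      # Eq. 9
        Sg,                                     # Eq. 10
        int(G4 or Sg),                          # Eq. 11
        int((((not G80) or G) and Sc) or Sa),   # Eq. 12
        int(G4 and Sg),                         # Eq. 13
    )

def async_successors(state, G):
    target = rules(state, G)
    return [state[:i] + (target[i],) + state[i + 1:]
            for i in range(len(state)) if state[i] != target[i]]

def admissible(state):
    """Keep only states with Sc >= Sg >= Sa."""
    Sc, Sg, Sa = state[4:]
    return Sc >= Sg >= Sa

def parse(bits):
    return tuple(int(b) for b in bits)

states = [s for s in product((0, 1), repeat=7) if admissible(s)]
cycle = [parse(b) for b in ("0001110", "0011110", "1011110", "1011111",
                            "1111111", "1101111", "0101111", "0101110")]

print(len(states))
# With G = 1, each state of the cycle has the next one as its only successor.
print(all(async_successors(cycle[k], 1) == [cycle[(k + 1) % 8]] for k in range(8)))
# With G = 0, the cycle can be left, e.g., from 0001110 by switching Sg off.
print(async_successors(cycle[0], 0))
```

Since every state of the cycle has a single outgoing transition when G = 1, the cycle has no exit and is indeed an attractor; with G = 0 the repression by Gal80 opens an exit from the cycle.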
The IRMA circuit was implemented in yeast and experiments
report the response to input G (see Figure 3 in [25]): once galactose
is added (G = 1), the transcription of gene SWI5 is “switched-on,”
and all proteins rapidly increase their concentration before going
back to a (possibly new) steady state, and possibly showing an
oscillatory behavior. Conversely, once galactose is removed
(G = 0), all proteins are observed to decrease to a weakly expressed
state.
Fig. 4 State transition graph of the IRMA model, for the case G = 1. The yellow nodes represent the two
attractors. This graph was constructed in the software platform Cytoscape [53]

The Boolean model is consistent with these experiments: in the
absence of galactose the only attractor is the origin, while in the
presence of galactose the proteins generally increase their concen-
trations and enter into a cyclic attractor, which may correspond to
sustained oscillations (see Subheading 4 below), or damped oscilla-
tions and convergence to a new steady state. Even in the presence of
galactose, the origin remains an attractor so another possible behav-
ior is that, after a rapid onset, the system returns to a state where all
proteins are weakly expressed.

4 Piecewise-Linear Differential Equation Models

Piecewise-linear differential equation (PLDE) models retain the
logical functions that occur in the Boolean models discussed above,
but embed them in a system of differential equations. This gives rise to
the following general definition of the dynamics of the concentration
of the product of gene i (typically a protein) [12, 54, 55]:
$$\dot{x}_i = \sum_{l \in L_i} \kappa_i^l \, b_i^l(x) - \gamma_i x_i, \qquad 1 \leq i \leq n, \tag{15}$$
where $x \in \Omega$ denotes a vector of n protein concentrations, and $\Omega$ a
subset of $\mathbb{R}^n_{\geq 0}$. The synthesis rate is composed of a sum of positive
synthesis constants $\kappa_i^l$, each modulated by a regulation function
$b_i^l(x) : \Omega \to \{0, 1\}$, with l in an index set $L_i$. A regulation function
is an algebraic expression of step functions $s^{+}(x_j, \theta_j)$ or $s^{-}(x_j, \theta_j)$
which formalizes the regulatory logic of gene expression,
analogously to the Boolean functions in Subheading 3. $\theta_j$ is a
so-called threshold for the concentration $x_j$, such that $s^{+}(x_j, \theta_j)$
evaluates to 1 if $x_j > \theta_j$, and to 0 if $x_j < \theta_j$, while
$s^{-}(x_j, \theta_j) = 1 - s^{+}(x_j, \theta_j)$. The step functions thus capture the switch-like character of
gene regulation. The degradation of a gene product has first-order
kinetics, with a positive degradation constant $\gamma_i$.
For any value of x, the functions $b_i^l(x)$ evaluate to either 0 or
1. It can be shown that, when subdividing $\Omega$ into hyper-rectangular
regions by the threshold planes, $\sum_{l \in L_i} \kappa_i^l \, b_i^l(x)$ is constant in every
region. In other words, every region is associated with a system of
n decoupled, linear equations, thus making Eq. 15 a piecewise-linear
model. As a consequence, the local dynamics in the regions is
straightforward to analyze, in the sense that all solutions monoton-
ically converge to the steady state of the local linear system
[12]. However, before reaching this state, the solutions may leave
the region in which the linear system is defined and enter another.
When piecing together the local dynamics in all regions, the possi-
bly complex global dynamics of the network can be reconstructed.
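The local dynamics can be written out explicitly. Inside a region D, let $\mu_i^D$ denote the constant value taken by the synthesis term $\sum_{l \in L_i} \kappa_i^l \, b_i^l(x)$ (the symbol $\mu_i^D$ is introduced here only for this illustration). Eq. 15 then reduces to a scalar linear equation for each i, with the following solution as long as x(t) remains in D:

```latex
\begin{align*}
\dot{x}_i &= \mu_i^D - \gamma_i x_i, \\
x_i(t) &= \frac{\mu_i^D}{\gamma_i}
          + \left( x_i(0) - \frac{\mu_i^D}{\gamma_i} \right) e^{-\gamma_i t} .
\end{align*}
```

Each component therefore moves monotonically towards $\mu_i^D / \gamma_i$, and the solution leaves D in finite time whenever this target value lies on the other side of one of the thresholds bounding D.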
The solutions of the PLDE models are not well-defined on the
threshold planes, where the step functions switching from 0 to
1 (1 to 0) may cause a discontinuity in the right-hand side of
Eq. 15 for one or several i. Several ways to resolve this complication
have been proposed in the literature [51, 56–60], a subject briefly
summarized in Note 8.4.
The fact that in every region the system reduces to a simple
linear model suggests an intuitive, abstract description of the
dynamics of a regulatory circuit. Since in each region the system
behaves in a qualitatively uniform manner, the region can be asso-
ciated with a qualitative state, and the existence of solutions enter-
ing one region from another with transitions between qualitative
states. The sets of states and transitions between states form a
graph, the state transition graph [12]. The properties of this
graph, such as the occurrence of attractor states or cycles, can be
related to dynamical properties of the underlying PLDE model,
such as (stable or unstable) steady states and limit cycles [57, 61–
64]. The construction of a state transition graph from a PLDE
model follows simple rules, which do not need exact values for
the parameters but exploit qualitative orderings of parameters
[55]. The relation between the PLDE model and its state transition
graph can be formally grounded in discrete abstractions developed
in hybrid systems theory [65]. The state transition graphs thus
obtained are closely related to the graphs describing the dynamics
of multi-level logical models [66].
PLDE models of gene regulatory networks in the form of
Eq. 15 were first proposed by Leon Glass and Stuart Kauffman
[12] and are also known as Glass networks. They have been shown to be
powerful tools for exploring the possible dynamics of regulatory
circuits, such as the onset of chaotic dynamics [67, 68]. However,
they have also been used for modeling actual regulatory networks,
for example, in microbiology [69–71]. Computer tools allowing
the definition of PLDE models of regulatory networks and their
qualitative analysis are available, such as Genetic Network Analyzer
(GNA) [72, 73]. Recent publications present the (qualitative) anal-
ysis of more general classes of PLDE models [74], while other work
presents the related formalism of hybrid automata and their appli-
cation to circuit modeling [75].

4.1 The Toggle Switch

A PLDE model for the toggle switch can be developed, analogously
to the Boolean model in Subheading 3.1. We again use the variables
L (LacI), T (TetR), I (IPTG), and A (aTc), but now treat them as
concentrations taking their values in $\mathbb{R}_{\geq 0}$. Similarly, we introduce
for each of the variables a concentration threshold, labeled $\theta_L$, $\theta_T$,
$\theta_I$, and $\theta_A$, respectively. With these definitions, the step function
$s^{+}(L, \theta_L)$ evaluates to 1, if L is present at a high concentration, above
its threshold $\theta_L$, and to 0, if L is present at a low concentration,
below its threshold. Like in the Boolean model, we would like to
express that the gene encoding TetR is expressed when the concentration
of L is low or that of I high, in other words $\neg s^{+}(L, \theta_L) \lor s^{+}(I, \theta_I)$.
An equivalent formulation is obtained using de Morgan’s law:
$\neg(s^{+}(L, \theta_L) \land \neg s^{+}(I, \theta_I)) = \neg(s^{+}(L, \theta_L) \land s^{-}(I, \theta_I))$, which can be
interpreted as saying that TetR is not expressed when LacI is
present at a high concentration and not inhibited due to the presence
of IPTG. The latter expression can be rewritten in algebraic
form as $(1 - s^{+}(L, \theta_L)\, s^{-}(I, \theta_I))$. Similarly, the regulation of LacI by
TetR and aTc gives rise to the step function expression
$(1 - s^{+}(T, \theta_T)\, s^{-}(A, \theta_A))$. Boolean expressions of step functions can always be
translated into equivalent algebraic expressions [54].
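The equivalence between the logical and the algebraic form can be checked on the four possible combinations of step function values. In the sketch below, a and b stand for $s^{+}(L, \theta_L)$ and $s^{+}(I, \theta_I)$; the helper names are ours, and the value returned exactly at the threshold is an arbitrary choice, since the step functions are not defined there.

```python
from itertools import product

def step_plus(x, theta):
    """s+(x, theta): 1 above the threshold, 0 below.
    At x == theta the function is undefined in the text; 0 is returned here by convention."""
    return 1 if x > theta else 0

def step_minus(x, theta):
    """s-(x, theta) = 1 - s+(x, theta)."""
    return 1 - step_plus(x, theta)

# Compare (not a) or b with 1 - a * (1 - b) for all Boolean a, b.
table = []
for a, b in product((0, 1), repeat=2):
    logical = int((not a) or b)
    algebraic = 1 - a * (1 - b)
    table.append((a, b, logical, algebraic))
    print(a, b, logical, algebraic)
```

Negation becomes subtraction from 1 and conjunction becomes multiplication, which is why the expression remains a polynomial in the step functions.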
With the above considerations, the model for the toggle switch
reads as
$$\dot{L} = \kappa_L \, (1 - s^{+}(T, \theta_T)\, s^{-}(A, \theta_A)) - \gamma_L L, \tag{16}$$
$$\dot{T} = \kappa_T \, (1 - s^{+}(L, \theta_L)\, s^{-}(I, \theta_I)) - \gamma_T T, \tag{17}$$
where I and A are considered constant inputs. The dynamics of this
model can be analyzed in the plane, where we assume for the time
being that IPTG and aTc are absent from the medium, that is,
I = A = 0 and therefore $s^{-}(I, \theta_I) = s^{-}(A, \theta_A) = 1$. The thresholds
for T and L divide the phase space into four regions (Fig. 5a), in
each of which the model of Eqs. 16–17 reduces to a simple linear
model. For example, in the region S1, defined by the inequalities
$0 \leq L < \theta_L$ and $0 \leq T < \theta_T$, we have $\dot{L} = \kappa_L - \gamma_L L$ and
$\dot{T} = \kappa_T - \gamma_T T$. In this region all solution trajectories (monotonically)
converge to the asymptotically stable steady state of the linear
system given by $(\kappa_L/\gamma_L, \kappa_T/\gamma_T)'$. This so-called focal point is here
assumed to lie outside S1, in the region S3, which amounts to
assuming that $\kappa_L/\gamma_L > \theta_L$ and $\kappa_T/\gamma_T > \theta_T$ (Fig. 5a). In other
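[Fig. 5, panel (a): phase plane with L on the horizontal axis (marks 0, θL, κL/γL) and T on the vertical axis (marks θT, κT/γT), divided into the regions S1 to S4. Panel (b): state transition graph with states S1 = 00, S2 = 10, S3 = 11, S4 = 01. Images not reproduced here.]

Fig. 5 PLDE model of toggle switch in the absence of inputs (I = A = 0). (a) Phase
plane analysis. Some example solutions are shown (solid curves). (b) State
transition graph. The names of the states correspond to the names of the regions
and the states have been labeled with the values of $s^{+}(L, \theta_L)$ and $s^{+}(T, \theta_T)$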

words, the solution trajectories starting in S1 will leave the region
after some (finite) time and enter S2 or S4. Similarly, in region S2,
where $L > \theta_L$ and $0 \leq T < \theta_T$, the model becomes $\dot{L} = \kappa_L - \gamma_L L$
and $\dot{T} = -\gamma_T T$, and the solutions converge towards the focal
state $(\kappa_L/\gamma_L, 0)'$. Since this focal state is included in S2, the solutions
in S2 will never leave the region, and $(\kappa_L/\gamma_L, 0)'$ is a (stable) steady
state of the system.
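The behavior in regions S1 and S2 can be illustrated numerically. The sketch below integrates Eqs. 16 and 17 with a simple forward Euler scheme for I = A = 0. All parameter values are hypothetical and were chosen only to satisfy κL/γL > θL and κT/γT > θT; the function names, the step size, and the integration scheme are likewise our own choices, and a fixed-step scheme is not reliable for solutions that reach a threshold intersection.

```python
def step_plus(x, theta):
    """s+(x, theta); the value at x == theta is an arbitrary convention."""
    return 1.0 if x > theta else 0.0

def toggle_rhs(L, T, p, I=0.0, A=0.0):
    """Right-hand sides of Eqs. 16-17."""
    s_minus_A = 1.0 - step_plus(A, p["theta_A"])
    s_minus_I = 1.0 - step_plus(I, p["theta_I"])
    dL = p["kappa_L"] * (1.0 - step_plus(T, p["theta_T"]) * s_minus_A) - p["gamma_L"] * L
    dT = p["kappa_T"] * (1.0 - step_plus(L, p["theta_L"]) * s_minus_I) - p["gamma_T"] * T
    return dL, dT

def simulate(L0, T0, p, t_end=20.0, dt=0.01):
    """Forward Euler integration; returns the final point (L, T)."""
    L, T = L0, T0
    for _ in range(int(round(t_end / dt))):
        dL, dT = toggle_rhs(L, T, p)
        L, T = L + dt * dL, T + dt * dT
    return L, T

# Hypothetical parameter values (focal points at 2.0, thresholds at 1.0)
params = dict(kappa_L=2.0, kappa_T=2.0, gamma_L=1.0, gamma_T=1.0,
              theta_L=1.0, theta_T=1.0, theta_I=1.0, theta_A=1.0)

print(simulate(1.5, 0.2, params))   # starts in S2
print(simulate(0.2, 1.5, params))   # starts in S4
print(simulate(0.2, 0.1, params))   # starts in S1, L crosses its threshold first
```

With these values, solutions starting in S2 stay there and approach (κL/γL, 0), while the solution starting in S1 with L slightly ahead of T enters S2 and ends in the same steady state.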
The phase plane analysis, when carried out in all four regions,
indicates that the system has two stable steady states, in which
either L or T is above its threshold (and the other variable below
its threshold). When now associating each region with a qualitative
state of the same name, and the solutions entering one region from
another with transitions between qualitative states, we obtain the
state transition graph shown in Fig. 5b. For clarity, the states have
been labeled with the values of $s^{+}(L, \theta_L)$ and $s^{+}(T, \theta_T)$, indicating
whether the concentrations of L and T are above or below their
thresholds in that region. Notice that for generating the graph, we
did not need to specify quantitative values for the parameters, but
only needed to know how $\kappa_L/\gamma_L$ and $\kappa_T/\gamma_T$ were positioned with
respect to their respective thresholds, an observation that holds
more generally [55, 65].
As can be seen by comparing the graph in Fig. 5b with that in
Fig. 2d, the qualitative dynamics of the PLDE model of the toggle
switch and the Boolean asynchronous model are equivalent. Inde-
pendently of the initial state, the system will reach one of the two
stable equilibria (attractors) of the regulatory circuit.
The state transition graph provides a qualitative picture of the
dynamics of the network. The transitions in the graph correspond
to events of qualitative importance, such as the crossing of a thresh-
old that switches off a gene. In the transition from S1 to S2 in Fig. 5,
for example, LacI exceeds its threshold concentration and starts to
inhibit the expression of tetR.
Figure 5a does not show that some solutions in regions S1 and
S3 may reach the intersection of the thresholds $\theta_L$ and $\theta_T$. The
vector field in the region these solutions are about to enter, for
example S3 from S1, points in the opposite direction, which pre-
cludes straightforward continuation of the solutions. In order to
resolve this problem, the definition of the solutions of the PLDE
systems needs to be extended to the thresholds, an issue that is
further developed in Note 8.4. It suffices here to say that, when
doing this, the threshold intersection turns out to be another steady
state, but an unstable one, in contrast to the two stable steady states.

4.2 The Oscillator with Positive Feedback

In developing the PLDE model of the oscillator with positive
feedback (Fig. 1b), like in the Boolean model, we will not distinguish
between the phosphorylated and non-phosphorylated forms
of NRI, but rather build upon the fact that in the strain considered,
phosphorylation of NRI is constitutive. Contrary to the Boolean
model, however, we introduce only a single variable for the NRI
concentration (N), in addition to a variable for the LacI concentration
(L) and the input IPTG (I). N has two different threshold
concentrations, a first threshold for activation of the promoter
driving NRI expression and a second threshold for the promoter
driving LacI expression. These two thresholds will be referred to as
$\theta_N^1$ and $\theta_N^2$, respectively. The limitation to two state variables makes
it possible to display the dynamics of the model in the phase plane,
which will be convenient for illustrative purposes.
This results in the following PLDE model of the network:
$$\dot{L} = \kappa_L \, s^{+}(N, \theta_N^2) - \gamma_L L, \tag{18}$$
$$\dot{N} = \kappa_N \, (1 - s^{+}(L, \theta_L)\, s^{-}(I, \theta_I)) \, s^{+}(N, \theta_N^1) - \gamma_N N. \tag{19}$$
The regulatory logic embedded in the equation for N agrees
with the details of the molecular implementation of the regulatory
circuit, where for the gene to be expressed, NRI needs to be present
and LacI to be absent or inactive. Moreover, the choice of promoters
in the circuit guarantees that $\theta_N^1 < \theta_N^2$ [24].
Figure 6a shows the phase plane analysis of the oscillator
model, under the assumption that IPTG is absent (I = 0) and that
$\kappa_L/\gamma_L > \theta_L$ and $\kappa_N/\gamma_N > \theta_N^2$. Notice that any other choice of the
parameter inequalities would be inconsistent with the implementation
of the regulatory circuit, as it would imply that even when the
proteins were expressed, the concentrations of NRI and LacI would
never rise to a level where they can influence the expression of their
target genes. Interestingly, the analysis shows that the system has
the potential to generate oscillations in the regions where $N > \theta_N^1$.
Below this threshold, however, the system falls back to the trivial
stable steady state $(0, 0)'$. The oscillations and the steady state show
[Fig. 6, panel (a): phase plane with L on the horizontal axis (marks 0, θL, κL/γL) and N on the vertical axis (marks θN1, θN2, κN/γN), divided into the regions S1 to S6. Panel (b): state transition graph with states S1 = 00, S2 = 10, S3 = 01, S4 = 11, S5 = 02, S6 = 12. Panel (c): lower-left portion of the phase plane, with the regions Z1 to Z5 and the points (0, 0), (θL, 0), (0, θN1), (θL, θN1). Panel (d): state transition graph on these regions and points. Images not reproduced here.]

Fig. 6 PLDE model of oscillator with positive feedback in the absence of input
(I = 0). (a) Phase plane analysis. Some example solutions are shown (solid
curves). (b) State transition graph. The names of the states correspond to the
names of the regions and the states have been labeled with the values of
$s^{+}(L, \theta_L)$ and $s^{+}(N, \theta_N^1) + s^{+}(N, \theta_N^2)$. (c) Refined phase plane analysis of the lower-left
portion of the phase plane with example solutions. (d) State transition graph
corresponding to the analysis in c

up as two attractors in the state transition graph (Fig. 6b). The
states in the graph have been labeled with the values of $s^{+}(L, \theta_L)$ and
$s^{+}(N, \theta_N^1) + s^{+}(N, \theta_N^2)$.
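The two regimes of the oscillator model can be explored numerically with the following sketch, which integrates Eqs. 18 and 19 for I = 0 by forward Euler. The parameter values are hypothetical and only respect the orderings assumed in the text (κL/γL > θL, κN/γN > θN2, θN1 < θN2); function names and step size are our own choices. A fixed-step scheme gives only a rough picture near thresholds, so the script is meant for exploring trajectories inside the regions, not for settling the question of damped versus sustained oscillations.

```python
def step_plus(x, theta):
    """s+(x, theta); the value at x == theta is an arbitrary convention."""
    return 1.0 if x > theta else 0.0

def oscillator_rhs(L, N, p, I=0.0):
    """Right-hand sides of Eqs. 18-19."""
    s_minus_I = 1.0 - step_plus(I, p["theta_I"])
    dL = p["kappa_L"] * step_plus(N, p["theta_N2"]) - p["gamma_L"] * L
    dN = (p["kappa_N"] * (1.0 - step_plus(L, p["theta_L"]) * s_minus_I)
          * step_plus(N, p["theta_N1"]) - p["gamma_N"] * N)
    return dL, dN

def simulate(L0, N0, p, t_end=30.0, dt=0.001):
    """Forward Euler integration; returns the list of visited points (L, N)."""
    L, N = L0, N0
    path = [(L, N)]
    for _ in range(int(round(t_end / dt))):
        dL, dN = oscillator_rhs(L, N, p)
        L, N = L + dt * dL, N + dt * dN
        path.append((L, N))
    return path

# Hypothetical parameter values respecting the assumed orderings
params = dict(kappa_L=2.0, gamma_L=1.0, theta_L=1.0,
              kappa_N=3.0, gamma_N=1.0, theta_N1=1.0, theta_N2=2.0, theta_I=1.0)

below = simulate(0.5, 0.5, params)             # N starts below its first threshold
above = simulate(0.2, 1.5, params, t_end=5.0)  # N starts between its two thresholds
print(below[-1])
print(max(n for _, n in above))
```

Starting below the first threshold of N, both concentrations decay to the origin. Starting between the two thresholds with L low, N first rises above its second threshold, which switches on the expression of LacI and initiates the first turn of the oscillation.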
Generally speaking, the occurrence of cyclic attractors in a state
transition graph is not sufficient to conclude that the oscillations are
stable, that is, that the underlying PLDE system has a limit cycle
(Subheading 5.1). Indeed, numerical simulations suggest that the
oscillations are damped, consistent with the experimental data [24],
and converge to a steady state located on the intersection of the
threshold planes $L = \theta_L$ and $N = \theta_N^2$ [51]. In order to identify this
steady state, the analysis of the model needs to be extended to the
threshold planes, as explained in Note 8.4. The same extension is
necessary to show that solutions can slide on the threshold $N = \theta_N^1$
separating the two basins of attraction and reach an unstable steady
state at $(0, \theta_N^1)$. The results of such an extended analysis, focusing
on the lower-left portion of the phase plane, are shown in Fig. 6c,
d. The computer tool GNA has been used to generate these results.
Notice that the solutions sliding on the threshold $N = \theta_N^1$ appear
as the sequence of transitions $(\theta_L, \theta_N^1) \to Z_4 \to (0, \theta_N^1)$. While
these subtle aspects of the dynamics are absent from the Boolean
model of Subheading 3.2, both models agree in predicting oscillations
and a stable steady state.

4.3 The IRMA Circuit

Whereas the example networks in the previous two sections are
small and can be analyzed in the phase plane, this is not the case
for the IRMA network. The model has five genes, ASH1, CBF1,
GAL4, GAL80, and SWI5, and one input, galactose. The PLDE
model previously developed for this network [76] has five state
variables, one for each protein concentration (A, C, G4, G80
and S), and one input variable, representing the galactose concentration
(G):
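$$\dot{A} = \kappa_A^0 + \kappa_A \, s^{+}(S, \theta_S^a) - \gamma_A A, \tag{20}$$
$$\dot{C} = \kappa_C^1 \, s^{+}(S, \theta_S^c) + \kappa_C^2 \, s^{+}(S, \theta_S^c)\, s^{-}(A, \theta_A) - \gamma_C C, \tag{21}$$
$$\dot{G4} = \kappa_{G4}^0 + \kappa_{G4} \, s^{+}(C, \theta_C) - \gamma_{G4} \, G4, \tag{22}$$
$$\dot{G80} = \kappa_{G80}^0 + \kappa_{G80} \, s^{+}(S, \theta_S^g) - \gamma_{G80} \, G80, \tag{23}$$
$$\dot{S} = \kappa_S^0 + \kappa_S \, s^{+}(G4, \theta_{G4}) \, (1 - s^{+}(G80, \theta_{G80})\, s^{-}(G, \theta_G)) - \gamma_S S. \tag{24}$$
We also use the following inequalities on the parameters, which
were estimated from experimental data [76]:
$$0 < \kappa_A^0/\gamma_A < \theta_A < (\kappa_A^0 + \kappa_A)/\gamma_A, \tag{25}$$
$$0 < \kappa_C^1/\gamma_C < \theta_C < (\kappa_C^1 + \kappa_C^2)/\gamma_C, \tag{26}$$
$$0 < \kappa_{G4}^0/\gamma_{G4} < \theta_{G4} < (\kappa_{G4}^0 + \kappa_{G4})/\gamma_{G4}, \tag{27}$$
$$0 < \theta_{G80} < \kappa_{G80}^0/\gamma_{G80} < (\kappa_{G80}^0 + \kappa_{G80})/\gamma_{G80}, \tag{28}$$
$$0 < \kappa_S^0/\gamma_S < \theta_S^c < \theta_S^a < \theta_S^g < (\kappa_S^0 + \kappa_S)/\gamma_S. \tag{29}$$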
Despite the size of the network, the transition graph can be
generated and analyzed to find different attractors corresponding
to steady states or oscillations. The graph has 64 states when the
analysis is restricted to regions between thresholds, as in
Fig. 6a, b. The results correspond to those obtained with the
Boolean model (Subheading 3.3). However, a more
refined analysis as in Fig. 6c, d, which is necessary for identifying steady
states on threshold planes, quickly raises this number to above
7000 states. Without additional tools, this makes it practically
impossible to analyze the qualitative dynamics of the network in
detail. In the following section, we will zoom in on some of the
tools developed for coping with large networks and state transition
graphs, both in Boolean and PLDE models.
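
As an illustration of Eqs. 20–24, the following minimal Python sketch evaluates the right-hand side of the IRMA PLDE model away from the thresholds. All numerical parameter values are hypothetical and were chosen only so that inequalities (25)–(29) hold; the galactose threshold value and the function names are likewise assumptions made for this sketch, not values taken from [76].

```python
from itertools import product

def s_plus(x, theta):
    """Step function s+(x, theta): 1 above the threshold, 0 below it."""
    return 1.0 if x > theta else 0.0

def s_minus(x, theta):
    """Step function s-(x, theta) = 1 - s+(x, theta)."""
    return 1.0 - s_plus(x, theta)

# Hypothetical parameters, chosen only to satisfy inequalities (25)-(29).
P = {
    "k0A": 0.5, "kA": 2.0, "gA": 1.0, "thA": 1.0,
    "k1C": 0.5, "k2C": 2.0, "gC": 1.0, "thC": 1.0,
    "k0G4": 0.5, "kG4": 2.0, "gG4": 1.0, "thG4": 1.0,
    "k0G80": 0.5, "kG80": 2.0, "gG80": 1.0, "thG80": 0.25,
    "k0S": 0.5, "kS": 4.0, "gS": 1.0,
    "thSc": 1.0, "thSa": 2.0, "thSg": 3.0,
    "thG": 1.0,  # assumed galactose threshold
}

def inequalities_hold(p=P):
    """Check the parameter inequalities (25)-(29)."""
    return all([
        0 < p["k0A"] / p["gA"] < p["thA"] < (p["k0A"] + p["kA"]) / p["gA"],
        0 < p["k1C"] / p["gC"] < p["thC"] < (p["k1C"] + p["k2C"]) / p["gC"],
        0 < p["k0G4"] / p["gG4"] < p["thG4"] < (p["k0G4"] + p["kG4"]) / p["gG4"],
        0 < p["thG80"] < p["k0G80"] / p["gG80"] < (p["k0G80"] + p["kG80"]) / p["gG80"],
        0 < p["k0S"] / p["gS"] < p["thSc"] < p["thSa"] < p["thSg"]
          < (p["k0S"] + p["kS"]) / p["gS"],
    ])

def irma_rhs(A, C, G4, G80, S, G, p=P):
    """Right-hand side of Eqs. 20-24 (valid away from the thresholds)."""
    dA = p["k0A"] + p["kA"] * s_plus(S, p["thSa"]) - p["gA"] * A
    dC = (p["k1C"] * s_plus(S, p["thSc"])
          + p["k2C"] * s_plus(S, p["thSc"]) * s_minus(A, p["thA"])
          - p["gC"] * C)
    dG4 = p["k0G4"] + p["kG4"] * s_plus(C, p["thC"]) - p["gG4"] * G4
    dG80 = p["k0G80"] + p["kG80"] * s_plus(S, p["thSg"]) - p["gG80"] * G80
    dS = (p["k0S"]
          + p["kS"] * s_plus(G4, p["thG4"])
          * (1.0 - s_plus(G80, p["thG80"]) * s_minus(G, p["thG"]))
          - p["gS"] * S)
    return dA, dC, dG4, dG80, dS

# Regions between thresholds: A, C, G4, G80 have one threshold each (2 regions),
# S has three thresholds (4 regions).
n_regions = len(list(product(range(2), range(2), range(2), range(2), range(4))))
```

With one threshold for A, C, G4, and G80 and three thresholds for S, the number of regions between thresholds is 2·2·2·2·4 = 64, which agrees with the number of states quoted above.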

5 Analysis of Network Dynamics

To a large extent, the analysis of network dynamics reduces to the
analysis of state transition graphs. Various approaches have been
proposed for this purpose and will be briefly reviewed in this
section. As in the discussion of the modeling frameworks, the
different approaches will be illustrated by means of the examples in
Fig. 1.

5.1 Analysis of Attractors and Their Stability

Attractors in a state transition graph are (minimal) sets of states
which do not have any outgoing transitions, that is, transitions
from a state inside to a state outside the attractor. Usually, two
different types of attractors are distinguished: point attractors,
consisting of a single state, and cyclic attractors, consisting of a set
of states forming one or several cycles. The interest of attractors for
the study of network dynamics is that, starting from an initial state
in the graph, the system reaches an attractor after a finite number of
transitions and then indefinitely remains there. For this reason,
attractors have been associated with end-points of developmental
trajectories in higher organisms [46, 47] or possible responses of
microorganisms to a challenge from their environment [69]. In
synthetic biology, attractors may correspond to different functional
states and thus form an objective of circuit design [6]. Although
new measurement techniques have made it possible to follow the
transient dynamics of networks, for instance, by using fluorescent
reporter proteins, in many cases attractors remain the only reliably
observable states of the system.
Given a state transition graph, the identification of attractors is
straightforward. Point attractors can be found by inspecting all
individual states and cyclic attractors by looking for strongly
connected components (SCCs) of the graph. An SCC is a set of
states which are mutually connected, that is, there exists a directed
pathway from each state to any other in the SCC. An SCC is also a
maximal set, in the sense that it contains every state mutually
connected to any other state in the SCC. An SCC may have incom-
ing edges and outgoing transitions, and for it to be an attractor, it
needs to be a terminal SCC, that is, have no outgoing transitions.
Since the size of the state transition graphs grows exponentially
with the number of variables (genes), however, this enumeration
approach may not be feasible in many situations of practical inter-
est. Several approaches for identifying attractors that do not require
the prior generation of the state transition graph have been devel-
oped. These approaches are based, for example, on the solution of a
constraint satisfaction problem [77], a satisfiability problem
[78, 79], a problem formulated in the answer set programming
framework [80], or a temporal logic query [81].
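
The enumeration approach described above can be sketched in a few lines of Python for a small network. The example below assumes mutual-repression rules for the toggle switch without inducers (L+ = ¬T, T+ = ¬L) and asynchronous updating; these rules and all function names are assumptions made for illustration. Attractors are found as terminal SCCs using a simple reachability test, which is adequate only for small graphs.

```python
from itertools import product

def async_successors(state, rules):
    """Asynchronous updating: one variable at a time moves to its rule value."""
    succ = []
    for i, rule in enumerate(rules):
        target = int(rule(state))
        if target != state[i]:
            succ.append(state[:i] + (target,) + state[i + 1:])
    return succ

def reachable(start, graph):
    """All states reachable from start (including start itself)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def attractors(graph):
    """Terminal SCCs: closed sets of mutually reachable states."""
    reach = {s: reachable(s, graph) for s in graph}
    found = []
    for s in graph:
        # s lies in a terminal SCC iff every state reachable from s can reach s
        if all(s in reach[t] for t in reach[s]):
            component = frozenset(reach[s])
            if component not in found:
                found.append(component)
    return found

# Assumed toggle switch rules for state (L, T), no inducers.
toggle_rules = [lambda x: not x[1], lambda x: not x[0]]
toggle_graph = {s: async_successors(s, toggle_rules)
                for s in product((0, 1), repeat=2)}
toggle_attractors = attractors(toggle_graph)
```

Under these assumed rules, the two point attractors 01 and 10 are recovered. A point attractor appears as a terminal SCC with a single state and no outgoing transition.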
In the case of PLDE models, the attractors in the state transition
graph map to properties of the underlying differential equation
systems. In particular, point attractors in the graph correspond to
stable steady states, whereas cyclic attractors may represent (stable
or damped) oscillations (Subheading 4). However, other states in
the graph may be interesting as well for studying the dynamic
properties of the PLDE system. For instance, in the state transition
graph in Fig. 6d, $(0, \theta_N^1)'$ corresponds to an unstable steady state of
the PLDE system, just like the threshold intersection $(\theta_L, \theta_T)'$ in
Fig. 5. These equilibria are located on the separatrix between two
stable attractors. Methods have been developed for identifying all
equilibria of a regulatory circuit described by a PLDE system, and
determining their stability [57, 78]. In general, determining if a
cyclic attractor in a state transition graph corresponds to a limit
cycle or a damped oscillator is a much more complex problem
[63, 64]. The question cannot usually be decided using parameter
inequalities of the type introduced in Subheading 4 only, and
requires numerical analysis [51].
The above analysis methods have been implemented in a variety
of computer tools, such as GINsim for logical models [82] and
GNA for PLDE models [73]. Applying the latter tool to the IRMA
model indicates that, when cells are growing on galactose
($s^+(G, \theta_G) = 1$), the network has three steady states: one stable, one unstable,
and one whose stability cannot be determined from the local
structure of the state transition graph. The latter steady state lies in
a region where a cyclic attractor is present as well and numerical
simulations suggest that the cycle in the graph corresponds to a
stable limit cycle (and the point attractor to an unstable steady
state). Although the data are not entirely conclusive, due to the
fact that they are noisy and quantify mRNA concentrations on the
population level, they seem to indicate oscillatory patterns may
occur for at least some of the network components [25].
The analysis of attractors makes it possible to determine
which attractors can be reached from a given initial state. As
will be seen below, the attractor structure of a state transition graph
forms a suitable starting point for network reduction, but is also an
important consideration in circuit design. It notably makes it possible to test whether
certain desired functional states can be reached and undesired states
avoided.

5.2 Reduction of State Transition Graphs

For high-dimensional systems, state transition graphs are typically
handled through a square matrix of size $2^n$, which is the number of
states in the graph for a model with n Boolean variables. Numerical
operations on state transition graphs are thus limited by the memory
capacities of current computers, which cannot deal efficiently
in real time with networks of n > 25 (approximately). Methods that
enable the analysis of large networks are thus critical, for example,
when studying the interactions of a synthetic circuit with a host
network.
Hierarchical Graphs and Goal-Oriented Reduction: A classical
tool for state transition graph analysis is its decomposition into
SCCs. Once the states are partitioned into SCCs, the latter become
the nodes of a new directed graph with no cycles and its terminal
SCCs are the attractors of the original state transition graph. In the
oscillator example of Fig. 3b, there are five SCCs:
$$C_1 = \{001\}, \quad C_2 = \{101\}, \quad C_3 = \{100\}, \quad C_4 = \{010, 011, 111, 110\}, \quad C_5 = \{000\} \tag{30}$$
and the corresponding hierarchical graph is shown in Fig. 3c, where
C5 is the only terminal SCC. These decomposition techniques can
be found, for instance, in [83].
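
The decomposition in Eq. 30 can be reproduced with a short Python sketch. It assumes the Boolean rules L+ = N2, N1+ = N2, N2+ = ¬L ∧ N1 for the oscillator with I = 0 (an assumption consistent with the reduced model given later in this section) and asynchronous updating. SCCs are computed by mutual reachability rather than by an optimized algorithm, and all names are illustrative.

```python
from itertools import product

def rules(state):
    """Assumed oscillator rules for state (L, N1, N2) with I = 0."""
    L, N1, N2 = state
    return (N2, N2, int((not L) and N1))

def successors(state):
    """Asynchronous successors: states differing in one updated variable."""
    target = rules(state)
    return [state[:i] + (target[i],) + state[i + 1:]
            for i in range(3) if target[i] != state[i]]

graph = {s: successors(s) for s in product((0, 1), repeat=3)}

def reachable(start):
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

reach = {s: reachable(s) for s in graph}

# An SCC groups all states that are mutually reachable.
sccs = {frozenset(t for t in reach[s] if s in reach[t]) for s in graph}

# Hierarchical graph: an edge C -> D if some state in C has a successor in D.
dag = {c: {d for d in sccs if d != c
           and any(t in d for s in c for t in graph[s])}
       for c in sccs}

# Terminal SCCs (no outgoing edges) are the attractors.
terminal = [c for c in sccs if not dag[c]]

def label(component):
    """Render a set of states as sorted bit strings, e.g. ['010', '011']."""
    return sorted("".join(map(str, s)) for s in component)
```

Under the assumed rules, the five components of Eq. 30 are recovered and {000} is the only terminal one.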
Other methods for the analysis and reduction of the state
transition graph are goal-oriented and can be applied more gener-
ally, for asynchronous graphs, automata networks, and other multi-
level logical models [84].
Identifying PLDE Parameter Regions with the Same Qualitative
Dynamics: An original approach was recently developed to explore
the correspondence between the state transition graphs of Boolean
or multi-level logical models and the parameters in a class of PLDE
models (see [85] and references therein). The idea is to characterize
regions in the parameter space that lead to the same local dynamics
and hence to the same state transition graph. This approach iden-
tifies different parameter regions, each corresponding to a specific
state transition graph and a set of logical rules compatible with the
system dynamics. A Morse graph, whose nodes are the SCCs of the
state transition graph, is associated with each parameter region. In
addition, these parameter regions are related through a parameter
graph. A software tool DSGRN (Dynamic Signatures Generated by
Regulatory Networks) is available to perform these computations
[85]. Among other applications, this tool lists all possible dynami-
cal behaviors compatible with the regulatory network, suggests
minimal network rules, or rules that exhibit a specific dynamic
behavior in the most robust way. For synthetic biology circuits,
which are affected by stochastic perturbations in molecule concen-
trations, it is important to guarantee topological robustness, in the
sense that the qualitative dynamics is the same even though para-
meters may suffer perturbations.
Reduction of the Regulatory Network: A different class of model
reduction algorithms focuses on reducing the size n of the network
of regulatory interactions and is based on iteratively suppressing
variables, by linking the incoming edges of the variable to be
suppressed directly to its outgoing edges. Every occurrence of the
suppressed variable is substituted by its Boolean rule [31]. To
illustrate this methodology, consider the oscillator with positive
feedback example from Subheading 3.2. To eliminate variable N2,
we replace it by its Boolean rule, to obtain the following reduced
model:
$$L^+ = (\neg L \vee I) \wedge N_1, \qquad N_1^+ = (\neg L \vee I) \wedge N_1.$$
This method is shown to preserve all attractors. Indeed, for the
case $I = 0$, the asynchronous transition graph of this reduced model
is
$$01 \rightleftarrows 11 \to 10 \to 00$$
which contains the attractor 00 and also keeps a reduced form
($01 \rightleftarrows 11$) of the oscillatory behavior with $N_1 = 1$ (compare with
Fig. 3b). The above procedure can be applied iteratively to elimi-
nate variables from the network [86], excepting those variables
which contain self-loops, in which case the Boolean rule substitu-
tion does not apply.
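
The asynchronous transition graph of the reduced model can be checked with a small Python sketch. It assumes that, in the full model, both L and N1 are updated with the value of N2, so that after substitution both inherit the rule of N2; the function names are illustrative.

```python
from itertools import product

def n2_rule(L, N1, I):
    """Rule of the eliminated variable: N2+ = (not L or I) and N1."""
    return int(((not L) or I) and N1)

def reduced_graph(I):
    """Asynchronous transition graph of the reduced model over states (L, N1)."""
    graph = {}
    for L, N1 in product((0, 1), repeat=2):
        # After substitution, L+ and N1+ both equal the rule of N2.
        target = (n2_rule(L, N1, I), n2_rule(L, N1, I))
        state = (L, N1)
        succ = []
        for i in range(2):
            if target[i] != state[i]:
                nxt = list(state)
                nxt[i] = target[i]
                succ.append("".join(map(str, nxt)))
        graph[f"{L}{N1}"] = succ
    return graph

graph_I0 = reduced_graph(0)
```

For I = 0 the resulting edges are 01 → 11, 11 → 01, 11 → 10, and 10 → 00, with 00 having no outgoing transition.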
A novel procedure for reduction of networks was introduced in
[32], which allows for a very efficient attractor computation
method. It is based, first, on a network expansion that removes all
negative interactions and, second, on the identification of the
so-called stable motifs, which contain sets of variables which are
fixed in each attractor. The network expansion consists of adding a
composite node to represent each conjunction and, in networks
with negative feedback loops, a complementary node ($\bar{x}_i$) for each
variable, whose rule is the negated rule of the variable
($\bar{x}_i^+ = \neg f_i(x)$). For the oscillator example with $I = 0$, the expanded
network becomes:
$$L^+ = N_2, \quad N_1^+ = N_2, \quad N_2^+ = \bar{L}N_1, \quad (\bar{L}N_1)^+ = \bar{L} \wedge N_1,$$
$$\bar{L}^+ = \bar{N}_2, \quad \bar{N}_1^+ = \bar{N}_2, \quad \bar{N}_2^+ = L \vee \bar{N}_1.$$
A stable motif is a strongly connected set of nodes such that:
(1) it does not contain both a variable and its complementary node
and (2) if it contains a composite node, then it must also contain its two
inputs. In the current example, there is only one such stable motif,
$\bar{N}_1 \rightleftarrows \bar{N}_2$. This motif represents an attractor of the network and
indicates that both $N_1$ and $N_2$ stabilize at the value 0 in the
attractor. In addition, since $L^+ = N_2$, it follows that 000 is the
only attractor of the network. Note that, for larger stable motifs, an
iterative procedure involving network reduction as above [31] and
stable motif identification is applied.

5.3 Formal Verification of Network Properties Using Model Checking

Besides the detection and reachability of attractors, other dynamical
properties may be of interest for network analysis and design. For
example, in order to validate a model, it is important to know if
there exist paths in the graph in which the predicted qualitative
ordering of events, the temporal sequence of changes in gene
activity or protein concentrations, are consistent with experimental
observations.
Methods for model checking provide a formal framework for
testing a large variety of temporal properties of labeled state transition
graphs, that is, graphs in which the states and/or transitions
have been annotated with features such as the value of the variables
identifying a state. In order to unambiguously specify the proper-
ties to be verified, formal languages called temporal logics have
been proposed [87]. Properties stated in temporal logic can be
verified using algorithms that efficiently run through the graph to
check if a property is true or false. While model checking has found
widespread use in computer science and engineering, applications
in systems and synthetic biology have also emerged in the past
15 years (see [88, 89] for reviews). The application of model check-
ing for the analysis of qualitative models of biological regulatory
circuits is supported by a number of computer tools [73, 82, 90–
92].
Temporal Logic to Describe Circuit Properties: A large variety of
temporal logics have been used to formalize temporal properties of
state transition graphs, differing in such characteristics as the possi-
bility to express properties on a single path or on branching paths,
to add or ignore quantitative constraints on the timing of events,
etc. In this chapter, we will illustrate the description of circuit
properties using the classical Computation Tree Logic (CTL) [87].
The expressions in CTL are interpreted on so-called Kripke
structures, which are very close to the labeled state transition
graphs describing the dynamics of Boolean and PLDE models
(Subheadings 3 and 4). A Kripke structure consists of a set of states,
a set of transitions between states, a set of atomic propositions
describing features of states, a labeling function mapping a state
to the atomic propositions that are satisfied in the state, and an
initial state. A CTL formula can be recursively constructed from
atomic proposition by means of standard Boolean operators
describing a state and pairs of quantifiers ranging over paths.
For example, for the Boolean model of the toggle switch in
Fig. 2d, the temporal logic formula EF ($\neg L \wedge T$) means that, starting
from the initial state, there exists a path (E) such that in the
future (F) the formula $\neg L \wedge T$ holds. This property is true for the
initial state 00, but false when starting from 10. In the case of the
refined state transition graphs associated with PLDE models, the
CTL formula AG EF ($L = 0 \wedge N = 0$) means that from any state it is
always possible to get to the trivial state in which the concentrations
of both LacI and NRI are 0. This property is false when the initial
states comprise the entire phase space, as the concentrations are
predicted to oscillate when starting above the threshold $N = \theta_N^1$.
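
On an explicitly generated graph, the set of states satisfying EF φ can be computed as a backward reachability fixed point, and AG φ follows from the duality AG φ = ¬EF ¬φ. The sketch below applies this to an asynchronous toggle switch graph that is assumed here (rules L+ = ¬T, T+ = ¬L, states written as LT). It is a toy illustration of explicit-state checking, not the symbolic algorithms used by tools such as NuSMV.

```python
# Assumed asynchronous state transition graph of the toggle switch (states "LT").
graph = {"00": ["10", "01"], "01": [], "10": [], "11": ["01", "10"]}

def ef(graph, goal_states):
    """States from which some path eventually reaches a goal state (EF goal)."""
    sat = set(goal_states)
    changed = True
    while changed:
        changed = False
        for state, succ in graph.items():
            if state not in sat and any(t in sat for t in succ):
                sat.add(state)
                changed = True
    return sat

def ag(graph, good_states):
    """States from which all reachable states are good (AG = not EF not)."""
    all_states = set(graph)
    return all_states - ef(graph, all_states - set(good_states))

# EF (not L and T): the only state with L = 0 and T = 1 is "01".
ef_goal = ef(graph, {"01"})
# AG EF (not L and T): states from which "01" always remains reachable.
ag_ef_goal = ag(graph, ef_goal)
```

In agreement with the text, the property EF ($\neg L \wedge T$) holds for the initial state 00 but not for 10. The nested formula AG EF ($\neg L \wedge T$) holds only in state 01 itself, since 10 is reachable from both 00 and 11.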
Model Checkers to Verify Network Properties: The verification of
properties expressed in temporal logic requires highly efficient,
specialized computer tools, called model checkers. Model checking
algorithms can run on explicitly generated state transition graphs,
but as the number of states exponentially grows with the size of the
regulatory circuit, this approach quickly becomes infeasible.
Another approach is based on the symbolic encoding of the state
transition graph and the temporal logic formula to verify, reducing
the verification of the property to, for example, the solution of a
Boolean satisfiability problem or the reduction of a binary decision
diagram [87]. In the latter approach, the state transition graph is
not explicitly generated but implicitly contained in the symbolic
encoding of the problem. Powerful model checking tools exist and
most tools for biological network analysis and design call upon
these tools through a dedicated front-end or by generating an
appropriate input file.
Figure 7 shows the verification of the property AG EF
($L = 0 \wedge N = 0$) for the PLDE model of the oscillator with positive
feedback. The model and the property are entered in the tool GNA,
which calls the model checker NuSMV to test the property
[93]. The property is found to be false, as expected, since above
the threshold concentration $\theta_N^1$ oscillations can occur. A
counterexample is shown in the screenshot of the formal verification.

Fig. 7 Verifying reachability properties of the oscillator with positive feedback using model checking. The
property AG EF Zero, with Zero equal to ($L = 0 \wedge N = 0$), is tested for the PLDE model of the oscillator in the
absence of IPTG (Fig. 6). GNA and NuSMV show the property to be false and a counterexample is shown in the
form of oscillations in the concentrations of LacI and NRI

The IRMA model has been analyzed using the above formal
verification tools [76]. The objective of the study was to verify that
the network structure and the observed data are compatible by
(1) expressing the measured RT-qPCR expression patterns of the
genes as temporal logic formulae and (2) testing if there are com-
binations of parameter inequalities for which the model predictions
are compatible with the observations. Surprisingly, among the
almost 5000 possible combinations of parameter inequalities, only
a handful turned out to be consistent with the data. The ordering of
the different activation thresholds of Swi5 inferred from the data
was corroborated by independent measurements of the promoter
activities. This and other examples from the literature [94, 95]
illustrate the interest of using temporal logic and model checking
for supporting the analysis and design of synthetic circuits.

5.4 Modular Analysis of Network Dynamics

Networks in synthetic biology are often constructed by coupling
small networks, or modules, through known interactions so as to
obtain new dynamical behaviors [96]. To take advantage of this
modular approach, a recent method [37, 97] proposes to analyze a
Boolean network as the interconnection of two or more smaller
modules. In particular, this method calculates the attractors of the
full network from the attractors of the modules, thus avoiding the
calculation of the full state transition graph.
To illustrate this interconnection method, we will study a
hypothesized synthetic biology coupling between the toggle switch
(module $\Sigma_A$) and the IRMA circuit (module $\Sigma_B$).
Input/Output Characterization of the Modules’ Attractors: The
first step is to characterize the asymptotic input/output behavior
(or the attractors) of each module and identify the variable(s) of
each module which will influence some variable(s) in the other
module, in other words, identify the outputs and inputs for each
module. The full network will be obtained by interconnecting the
output of each module to the input of the other. For simplicity, we
assume that each module has a single input and a single output. The
inputs are as given above, with u denoting the aTc
concentration for the toggle switch (with I fixed at 0) and G for
the IRMA circuit. For the outputs we will consider LacI for the
toggle switch and Gal4 for IRMA (see Fig. 8). Next, the attractors
of each module are computed for each input and classified
in terms of their output values, so that $A_{u\alpha}$ denotes an attractor of
module $\Sigma_A$ subject to input u and whose output is α (both u and α
are Boolean values):
Attractors of $\Sigma_A$: $A_{01} = \{10\}$, $A_{00} = \{01\}$, $A_{11} = \{10\}$,
where 10 and 01 are the two attractors of the toggle switch when
the input is 0; for each of these, the output L takes the values 1 and
0, respectively. In the case of input $u = 1$, the toggle switch has only

Fig. 8 Input/output interconnected IRMA (module $\Sigma_B$) and toggle switch (module
$\Sigma_A$). The bold arrows denote the feedback interconnection: the output of one
network becomes the input to the other
[Diagram node labels: A, C, G4, G80, Sa, Sg, Sc (IRMA, input v); L, T (toggle switch, input u)]

one attractor 10, whose output is 1. With this input/output characterization
of the attractors, $A_{01}$ and $A_{11}$ are different objects,
even if they contain the same state. Similarly, for the IRMA circuit,
we obtain the attractors $B_{v\beta}$ of module $\Sigma_B$ corresponding to input
v and whose output is β:
Attractors of $\Sigma_B$: $B_{00} = \{0000000\}$, $B_{10} = \{0000000\}$,
$B^c_{10} = \{0001110, 0011110, 0101111, 0101110\}$,
$B^c_{11} = \{1011110, 1011111, 1111111, 1101111\}$,
where the cyclic attractor $B^c_{10} \cup B^c_{11}$ is separated into two sets,
according to the values of the output G4.
Asymptotic Graph and Attractors of the Interconnected Network:
The second step of the method is to construct the so-called asymptotic
graph, which is a directed graph where the nodes are all
possible pairs $A_{u\alpha} \times B_{v\beta}$ ($3 \times 4 = 12$ in this example) and the
edges are assigned through reachability properties. Consider, for
instance, node $A_{01} \times B^c_{11}$. The output 1 from $A_{01}$ implies that the
input to module $\Sigma_B$ is also 1, hence states in $B^c_{11}$ may evolve to $B^c_{10}$,
so an edge $A_{01} \times B^c_{11} \to A_{01} \times B^c_{10}$ is added to the asymptotic
graph. Conversely, the output 1 from $B^c_{11}$ forces module $\Sigma_A$ to
switch to input 1; the states in $A_{01}$ eventually converge to attractor
$A_{11}$, so an edge $A_{01} \times B^c_{11} \to A_{11} \times B^c_{11}$ is added to the asymptotic
graph.
The theoretical result [37] states that the asymptotic graph
contains (a representative of) all the attractors of the interconnec-
tion meaning that, in practice, the attractors of the asymptotic
graph are those of the full system. However, because the asymptotic
graph is a reduced version of the full interconnected system, it
contains fewer pathways and, in some cases, may generate other
spurious attractors (see [97, 98] for example).
In this example, observation of Fig. 9 shows that the
interconnected system has three attractors: two steady states

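Fig. 9 Asymptotic graph for the interconnection between the toggle switch and the IRMA circuit. Bold arrows
denote a cyclic attractor of the interconnected system. There are two other point attractors, $A_{00} \times B_{00}$ and
$A_{01} \times B_{10}$
[Graph nodes: $A_{01} \times B^c_{11}$, $A_{11} \times B^c_{11}$, $A_{00} \times B^c_{11}$, $A_{00} \times B_{10}$, $A_{11} \times B_{00}$, $A_{01} \times B_{00}$, $A_{01} \times B^c_{10}$, $A_{11} \times B^c_{10}$, $A_{00} \times B^c_{10}$, $A_{00} \times B_{00}$, $A_{11} \times B_{10}$, $A_{01} \times B_{10}$]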

$A_{00} \times B_{00} = \{010000000\}$ and $A_{01} \times B_{10} = \{100000000\}$ which,
unsurprisingly, correspond to the product of the two toggle switch
attractors with the origin attractor of the IRMA circuit. In addition,
the cyclic attractor of the IRMA module is also maintained, coupled
to the state 10 of the toggle switch. This analysis suggests that, if
the two modules are synthetically coupled through Gal4 and LacI,
as indicated, three types of asymptotic behavior may co-exist. Note
that different interconnection schemes may lead to different
behavior.
From a synthetic biology perspective, this appears to be a
promising tool for making useful predictions and testing the dynamics of
modular systems, without the need to compute the full state transition
graphs: in the example, the latter has $2^9 = 512$ nodes, while the
asymptotic graph has only 12 nodes.

5.5 Probabilistic Analysis of State Transition Graphs

Although state transition graphs provide a global characterization
of the trajectories of the system, they give no quantitative information
on the probability of the system following a given trajectory, or
the frequency of observing a given attractor. To introduce more
detailed state transition graphs, it is useful to consider their representation
as a square matrix M, where the entry $m_{ij}$ is 1 if there is a
transition from node i to node j. For synchronous updating schedules,
each row has only one non-zero entry, while asynchronous
updating allows for multiple non-zero entries. The first step
towards a more quantitative description is to suppose that each
transition has its own associated probability, $0 \leq m_{ij} \leq 1$, where
each row i adds up to 1, $\sum_{j=1}^{2^n} m_{ij} = 1$. This idea leads to a natural
generalization of state transition graphs as Markov chains, thereby
allowing the computation of several quantitative measures such as
expected convergence time, expected reachability in a fixed number
of steps, or the probability of convergence to a given attractor.
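
A minimal Python sketch of this Markov chain view is given below. It uses an assumed asynchronous toggle switch graph (states written as LT) and assigns equal probability to all outgoing transitions of a state, with a self-loop on states that have no outgoing transition. The uniform assignment is an assumption made for illustration only; in practice the probabilities would come from data or from the model parameters.

```python
# Assumed asynchronous state transition graph of the toggle switch (states "LT").
states = ["00", "01", "10", "11"]
graph = {"00": ["01", "10"], "01": [], "10": [], "11": ["01", "10"]}

def transition_matrix(graph, states):
    """Row-stochastic matrix M with uniform probabilities (an assumption)."""
    index = {s: i for i, s in enumerate(states)}
    M = [[0.0] * len(states) for _ in states]
    for s in states:
        succ = graph[s] or [s]  # states without transitions get a self-loop
        for t in succ:
            M[index[s]][index[t]] = 1.0 / len(succ)
    return M

def propagate(dist, M, steps):
    """Distribution over states after a fixed number of steps."""
    n = len(M)
    for _ in range(steps):
        dist = [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]
    return dist

M = transition_matrix(graph, states)
# Start in state 00 and follow the chain for 10 steps.
final = propagate([1.0, 0.0, 0.0, 0.0], M, steps=10)
```

Under the uniform assumption, the system started in 00 reaches each of the attractors 01 and 10 with probability 0.5.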
Asynchronous Updating Schedules: Two questions can be asked:
(1) how to assign the transition probabilities mij and (2) how to
compute a trajectory in the state transition graph. Answers to these
questions are given, for instance, in [99] which discusses state
transition graphs as Markov chains. The first question is related to
the biological knowledge and experimental data on the system,
which is often incomplete. For piecewise-linear models, [100]
suggests computing the (relative) size of the region of domain
i whose initial conditions lead to a trajectory into domain j. Transi-
tion probabilities are thus given in terms of the PLDE model’s
parameters. The tool MaBoSS [101] answers question (2) by apply-
ing a stochastic algorithm (Gillespie) to evolve within the state
transition graph and obtain different realizations. The transition
probabilities are either supplied by the user or assigned by default.
One of the outputs of MaBoSS is the probability of reaching a given
attractor.
The modular analysis described in Subheading 5.4 also allows
for a probabilistic extension [97]: it starts from the hypothesis that
each module attractor $A_{u\alpha}$ and $B_{v\beta}$ is observed with a certain
probability and propagates their products along the asymptotic
graph using conditional probabilities. The attractors' probabilities
may be obtained from biological data or other tools (for instance,
applying MaBoSS to each module).
As an example, for the IRMA/toggle switch interconnection,
assume that the probability of observing each module attractor is
defined as $P(A_{u\alpha}) = a_{u\alpha}$ and $P(B_{v\beta}) = b_{v\beta}$, with
$b_{11}/2 = P(B^c_{10}) = P(B^c_{11})$. To propagate these values throughout the asymptotic
graph, we need to specify the probability ρ that module $\Sigma_A$ is
updated. Then we assign transition probabilities as follows:
$P(A_{00} \times B^c_{11} \to A_{11} \times B^c_{11}) = \rho\, a_{00} b_{11}/2$ and
$P(A_{00} \times B^c_{11} \to A_{00} \times B_{00}) = (1 - \rho)\, a_{00} b_{11}/2$. The total probability of reaching
an attractor is given by the sum of all incoming transitions.
Synchronous Updating Schedules: In this case, each node has
exactly one outgoing edge and belongs to a single basin of attrac-
tion, so the Markov chain representation as above is of no avail. An
alternative method to generate probabilistic transitions for syn-
chronous Boolean networks is proposed in [26]. The idea is to
consider a family of Boolean rules for each variable i,
$\{f_i^k : k = 1, \ldots, K_i\}$, and associated probabilities $\{c_i^k : k = 1, \ldots, K_i\}$ with
$\sum_{k=1}^{K_i} c_i^k = 1$. At each updating time, there is a probability $c_i^k$ that
rule $f_i^k$ is selected to compute the next value of variable i. The
functions $f_i^k$ might represent the uncertainty in the characterization
of the dynamics of variable i, or some perturbation effect. In
this way, one can compute the probability $P_i(x)$ that variable i takes
value 0 at the next update, which depends only on the current state
$x = (x_1, x_2, \ldots, x_n)$ of the network. Therefore, taking all possible
combinations of the values $P_i(x)$ and $1 - P_i(x)$ for all i yields the
probability of transition from state x to any other state y. These
probabilities are time-independent and may also be written as a
Markov chain to represent the state transition matrix.
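
The construction can be sketched in Python for a hypothetical two-variable network. The rule families and their selection probabilities below are invented for illustration, and the sketch assumes that rules are selected independently for each variable.

```python
from itertools import product

# Hypothetical rule families: for each variable, a list of (rule, probability)
# pairs whose probabilities c_i^k sum to 1.
FAMILIES = [
    [(lambda x: 1 - x[1], 0.8), (lambda x: x[0], 0.2)],  # variable 1
    [(lambda x: 1 - x[0], 1.0)],                          # variable 2
]

def prob_zero(i, x):
    """P_i(x): probability that variable i takes value 0 at the next update."""
    return sum(c for rule, c in FAMILIES[i] if rule(x) == 0)

def transition_probs(x):
    """Probability of moving from state x to each state y (synchronous update)."""
    probs = {}
    for y in product((0, 1), repeat=len(FAMILIES)):
        p = 1.0
        for i, y_i in enumerate(y):
            p0 = prob_zero(i, x)
            # Independent rule selection per variable (assumption).
            p *= p0 if y_i == 0 else 1.0 - p0
        probs[y] = p
    return probs

row = transition_probs((0, 0))
```

Each call to `transition_probs` yields one row of the state transition matrix, and the entries of a row sum to 1.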
In synthetic biology approaches, a calibrated probabilistic state
transition graph can help to improve circuit design to increase the
probability of observing a desired behavior, quantitatively predict
the response of the circuit to different inputs, and also allow for a
better regulation and control of the system, for which some tech-
niques will be discussed in the next section.

6 Control of Network Dynamics

A dynamical system, in either of the qualitative forms studied
above (Eq. 1 or 15), may have one or more input variables which
act on the system in some way, for instance, to induce gene
transcription, repress an inhibitory interaction, or regulate the activity of a
transcription factor. In general, these input variables are easily
manipulated in the lab to obtain the desired degree of regulation.
In the toggle switch example, IPTG and aTc are input variables and
galactose is an input for the IRMA circuit.
More generally, a dynamical system with inputs may be studied
as a control system:
$$\dot{x} = f(x, u), \tag{31}$$
where $x = (x_1, \ldots, x_n)$, $f = (f_1, \ldots, f_n)$ is the vector field which
governs the dynamics, and u is called the control vector, of dimension
p, which takes values in some set $U \subseteq \mathbb{R}^p_{\geq 0}$. Note that $p = 2$ for
the toggle switch and $p = 1$ for both the oscillator and the IRMA
circuit. For control systems, the most frequently asked question is
of the form: given a target state, is there an input function U(t)
driving the system towards that state?

6.1 Control Strategies

There are different ways to answer this question, but a first distinction
can be made between open-loop and closed-loop control. In
open-loop control, the function U(t) is determined independently
of the dynamics of the system (31). As an example, consider the
toggle switch and suppose the target state is the one where both
LacI and TetR are strongly expressed, which is not a steady state of
the system without inputs. With the help of either the Boolean or
the PLDE models, we know that the following input will effectively
drive the toggle switch to the target state: $U(t) = (A(t), I(t)) = (A_{high}, I_{high})$,
i.e., both inputs should be at a constant but sufficiently
high concentration.
An attractive open-loop method for practical use is known as
“bang-bang” control. The idea is to use only two constant values
for the input, Ihigh and Ilow, and apply them sequentially, by intervals
of appropriate length. This strategy tends to accelerate convergence
to the target state. This is useful when only a limited number of
input values are available, as is often the case in synthetic biology
experiments.
In contrast, a closed-loop control strategy takes into account
the evolution of the system and uses the current state to “correct”
the input. If all variables are observable, the control function is
written in terms of the system variables, $U(t) = k(x(t))$, where the
function $k : \mathbb{R}^n_{\geq 0} \to \mathbb{R}^p_{\geq 0}$ is called a feedback control law. In the case
of linear systems, a common method is to measure the difference
between the current state $x(t)$ and the target state $x^*$ and let $U(t)$ be
proportional to this difference, with K a constant square matrix of size n:
$$\dot{x} = f(x(t)) + K(x^* - x(t)),$$
so the system tends to reduce the difference between its trajectory
and the target state. Alternatively, for a smoother response, the
integral of this difference can also be used, such as
$U(t) = \tilde{K} w(t) + K(x^* - x(t))$ with $\dot{w}(t) = x^* - x(t)$ and $\tilde{K}$ another constant square matrix of size
n. Application of both proportional and
integral feedback control leads to the classic PI controller [17].
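
The effect of the integral term can be illustrated with a scalar sketch in Python. The example assumes a hypothetical one-dimensional system $\dot{x} = -\gamma x + U(t)$ integrated with a forward Euler scheme; the gains, step size, and function name are arbitrary choices for illustration and are not taken from [17].

```python
def simulate_pi(x0, target, kp, ki, gamma=1.0, dt=0.01, t_end=50.0):
    """Euler simulation of dx/dt = -gamma*x + U with U = ki*w + kp*(target - x)."""
    x, w = x0, 0.0
    for _ in range(int(t_end / dt)):
        error = target - x
        u = ki * w + kp * error      # proportional plus integral feedback
        x += dt * (-gamma * x + u)   # system dynamics
        w += dt * error              # integral of the error, dw/dt = target - x
    return x

# Proportional and integral terms together remove the steady-state offset.
x_pi = simulate_pi(x0=0.0, target=2.0, kp=2.0, ki=1.0)
# Proportional feedback alone settles at kp*target/(gamma + kp) = 4/3.
x_p = simulate_pi(x0=0.0, target=2.0, kp=2.0, ki=0.0)
```

In this sketch the PI-controlled state converges to the target value 2, whereas proportional feedback alone leaves a residual offset.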

6.2 Control for Boolean Models

In Boolean models, a control system still has the form of Eq. 31
where u takes values in a discrete set $U \subseteq \{0, 1\}^p$. A control function
at time t corresponds to a discrete sequence of input values,
$U[t] = [u_1, \ldots, u_t]$. To construct a Boolean control function, there
are several approaches that take advantage of the discrete nature of
the system and are interpreted as a protocol for interventions. The
idea is to successively avoid pathways that lead away from the target
state.
In [102], two types of control actions are introduced, deletion
of a node or deletion of an edge in the regulatory network. The first
action corresponds to setting that node at a constant value, while
deletion of edge $x_i \to x_j$ is encoded in the logical rules by:
$$f_j(x, u_{i,j}) = f_j(x_1, \ldots, \neg u_{i,j} \wedge x_i, \ldots, x_n), \tag{32}$$
where $u_{i,j} = 0$ implies no control is exerted and $u_{i,j} = 1$ implies that
$x_i$ no longer influences $x_j$. In its general form, an input $u_{i,j}$ is added
for every edge in the network.
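
Eq. 32 can be read as a wrapper around an existing Boolean rule. The following Python sketch shows this for a hypothetical rule $x_j^+ = x_1 \vee x_2$; the rule and the helper name `with_edge_control` are assumptions made for illustration.

```python
def with_edge_control(f_j, i):
    """Wrap rule f_j so that u_ij = 1 removes the influence of x_i (Eq. 32)."""
    def controlled(x, u_ij):
        x_mod = list(x)
        x_mod[i] = int((not u_ij) and x[i])  # replaces x_i by (not u_ij) and x_i
        return f_j(tuple(x_mod))
    return controlled

# Hypothetical rule for the target variable: x_j+ = x_1 or x_2.
rule = lambda x: int(x[0] or x[1])
controlled_rule = with_edge_control(rule, 0)

without_control = controlled_rule((1, 0), 0)  # edge active: rule sees x_1 = 1
with_control = controlled_rule((1, 0), 1)     # edge deleted: rule sees x_1 = 0
```

With $u_{i,j} = 0$ the original rule is evaluated unchanged, while $u_{i,j} = 1$ makes the rule behave as if $x_i$ were 0.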
For probabilistic Boolean networks, algorithms were developed
that solve the problems of optimal finite-horizon [38] or infinite-
horizon [103] control. The goal is to drive the system from an
initial state $z_0$ to a desired target state $z_M$ in a finite (or infinite)
number of steps while minimizing the cost associated with each
state transition. Finite- (or infinite-) horizon corresponds to the
case of a fixed (or very large) time window available for application
of a given treatment. For the infinite-horizon problem [103], the
cost is of the form
$$J_\Pi(z_0) = \lim_{M \to \infty} \frac{1}{M}\, E\left[ \sum_{t=0}^{M-1} \tilde{g}(z_t, \mu_t, w_t) \right],$$
where $\tilde{g}(z_t, \mu_t, w_t)$ is the cost associated with the transition $z_t \to w_t$
applying control $\mu_t$ at state $z_t$ at time t. The algorithm calculates a
control sequence $\Pi^* = (\mu_0^*, \mu_1^*, \ldots, \mu_M^*, \ldots)$ that minimizes $J_\Pi(z_0)$.
Moreover, the authors find that a stationary policy, that is $\mu_k^* = \mu$
for all k, is easier to apply and gives good results in a melanoma
metastasis model.
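As an illustration of the average cost per stage, the sketch below evaluates a given stationary policy on a small controlled Markov chain by propagating the state distribution over a long but finite horizon M. It only evaluates a policy and is not the optimization algorithm of [103]; the two-state chain, the transition probabilities, and the cost function are hypothetical.

```python
def average_cost(P, g, policy, z0, M):
    """Finite-M approximation of J(z0) = lim (1/M) E[sum_t g(z_t, mu(z_t), w_t)]
    for a stationary policy, where P[u][z][w] is the probability of z -> w under u."""
    n = len(P[0])
    dist = [0.0] * n
    dist[z0] = 1.0
    total = 0.0
    for _ in range(M):
        new_dist = [0.0] * n
        for z in range(n):
            u = policy[z]                 # stationary policy: depends on z only
            for w in range(n):
                p = dist[z] * P[u][z][w]  # probability of the transition z -> w
                total += p * g(z, u, w)
                new_dist[w] += p
        dist = new_dist
    return total / M


# Hypothetical example: state 1 is undesirable, control 1 is a costly treatment
P = {
    0: [[0.9, 0.1], [0.2, 0.8]],   # no treatment
    1: [[0.9, 0.1], [0.8, 0.2]],   # treatment pushes state 1 back to state 0
}


def g(z, u, w):
    # Unit cost for landing in state 1, plus 0.5 for applying the treatment
    return (1.0 if w == 1 else 0.0) + 0.5 * u


J_never = average_cost(P, g, policy=[0, 0], z0=0, M=10000)
J_treat = average_cost(P, g, policy=[0, 1], z0=0, M=10000)
```

For these assumed numbers, treating only in the undesirable state roughly halves the average cost compared with never treating (about 1/6 versus 1/3).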

6.3 Control of Synthetic Circuits

In synthetic biology the main control question is often related to
the robustness of a circuit with respect to perturbations in the
environment, maintaining homeostasis [104, 105], or the reliability
and the predictability of circuit functioning [16, 17].
Applications of closed-loop control techniques to synthetic
biology circuits may involve a computer interface within the exper-
imental setup [17]. In this in silico approach, real-time measure-
ments are sent to the computer, where a calibrated mathematical
model of the circuit is used for online simulation of the PI control-
ler, which returns the updated input value. This was the methodol-
ogy used in [19] to control the toggle switch. The first objective
was to drive the system to the unstable steady state corresponding
to both LacI and TetR at their threshold concentrations. To do
this, the authors applied a PI controller through a computer inter-
face, computing aTc and IPTG separately, and succeeded in main-
taining the system near the unstable steady state. A second
experiment consisted of forcing the toggle switch with periodic
control, in an open-loop configuration. Independent pulses of
aTc and IPTG were applied to the synthetic circuit with different
periods. The toggle switch responded with periodic oscillations,
but only for carefully chosen periods of forcing.
In the experiments [19], both inputs were used to control the
system, and they were independently computed. However, a recent
theoretical result shows that a single input (in this case aTc) suffices
to control the toggle switch to the unstable steady state, $x^* = (\theta_L, \theta_T)$
[106]. The novelty is a feedback control law which is piecewise-constant
in regions of the state space: $U(L(t)) = u_{\min} < 1$ if
$L(t) < \theta_L$, that is, LacI is weakly expressed and the control law
decreases the influence of TetR on LacI; conversely, $U(L(t)) = u_{\max} > 1$
if $L(t) > \theta_L$. A similar approach on control of PLDE with
affine controls is discussed in [107], with the goal of either gen-
erating sustained oscillations in a system where they do not occur
naturally or, conversely, suppressing oscillations by damping, with
applications to a bacterial model.
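The piecewise-constant law itself is simple to write down. The sketch below encodes only the switching rule described above; how the input enters the toggle switch equations is specified in [106] and is not reproduced here. The function name and the value returned exactly on the threshold are assumptions, since the text leaves that case unspecified.

```python
def piecewise_constant_control(L, theta_L, u_min, u_max):
    """Return u_min (< 1) when LacI is below its threshold, u_max (> 1) above it."""
    if not (u_min < 1.0 < u_max):
        raise ValueError("expected u_min < 1 < u_max")
    if L < theta_L:
        return u_min
    if L > theta_L:
        return u_max
    # Assumption: neutral input on the threshold itself (not defined in the text)
    return 1.0


u_low = piecewise_constant_control(0.5, theta_L=1.0, u_min=0.5, u_max=2.0)
u_high = piecewise_constant_control(1.5, theta_L=1.0, u_min=0.5, u_max=2.0)
```

Only a comparison of the measured LacI level with its threshold is needed, which is what makes the law qualitative.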
Implementation of feedback control laws in a cellular environ-
ment remains one of the challenges in synthetic biology, even
though in silico techniques using PI controllers and optogenetic
devices (where gene transcription is controlled by light signals) are
increasingly used [16, 17].
Two main directions can be identified in current synthetic
biology approaches [16]: first, the design and implementation of
new circuits with biological components help to understand the
fundamental mechanisms guiding and regulating cellular behavior;
second, the design of controllers for natural regulatory and
metabolic networks, to improve a particular aspect of the system. In
this case, possible objectives in bioengineering include increasing
the production of specific biochemical products or metabolic com-
ponents, changing the cell’s resource allocation strategy, or even
controlling the distribution of a given product throughout a cell
population [108].

7 Concluding Remarks

In this chapter we have discussed the modeling and analysis of
synthetic circuits using qualitative formalisms. We focused on two
widely used formalisms, Boolean and other logical models and
piecewise-linear differential equation models. Although the two
formalisms are quite different at first sight, they are built upon
similar modeling assumptions. Moreover, they both allow a discrete
description of the network dynamics by means of state transition
graphs. This means that many of the methods developed for the
analysis of properties of state transition graphs are applicable to
models developed in the two formalisms. We illustrated this con-
vergence by modeling three prototypical synthetic circuits in both
the Boolean and the PLDE formalism and analyzing their proper-
ties. We also discussed one emerging trend in the engineering of
synthetic circuits, namely to view them from a control perspective.
This review, structured around three example circuits and avoiding
technical details, is certainly not exhaustive. In Note 8.1 some
pointers to further reviews are provided.
While Boolean and PLDE models are quite similar in many
respects, they also differ, and depending on the situation
one formalism may be more appropriate than the other. Boolean
models provide a natural encoding of regulatory functions and
they have been used to model very large networks
[14, 44]. PLDE models are more closely related to classical kinetic
models and they can account for subtle but potentially important
regulatory phenomena, such as the occurrence of steady states on
threshold (hyper)planes (Subheading 4). Other formalisms, like
multi-valued logical models and hybrid automata, borrow aspects
from both [66, 75].
A major advantage of the use of qualitative models of synthetic
circuits is that they allow the rapid exploration of dynamical con-
sequences of design choices, in particular the choice of the network
topology and the logic of gene regulation. Without going through
the lengthy and difficult process of parameter estimation in quanti-
tative models, key properties of the dynamics, such as the occur-
rence of bistability and oscillations, can be rapidly assessed by means
of the methods discussed in Subheading 5. Although some care
should be exercised in directly translating the results of a qualitative
analysis to quantitative properties of the network dynamics, the
initial screening enabled by qualitative approaches may speed up the
network design phase and focus attention on high-potential candidate
network designs before their actual biological implementation.
While the qualitative modeling and analysis of synthetic regulatory
circuits are thus valuable in themselves, they may also provide a
stepping stone towards a more detailed and precise, quantitative
analysis of the network dynamics. Computer tools have been developed
for converting logical models into ordinary differential equation
models, by transforming Boolean functions into expressions of
sigmoidal functions [109], in the spirit of the representation of the
regulatory logic in PLDE models (Subheading 4). Tools for the
numerical simulation of networks described by PLDE models have
also been developed, capable of taking into account the dynamics
on threshold planes [51]. More generally, the development of the
SBML qual format [110], for the representation and exchange of
qualitative models, has made it possible to analyze a model by
means of different tools and has facilitated passing back and forth
between qualitative and quantitative formalisms. The SBML qual
format has emerged from the Consortium for Logical Models and
Tools (CoLoMoTo) (www.colomoto.org), an active and dynamic
working group stimulating the use of qualitative modeling for a
variety of biological applications.

8 Notes

8.1 Reviews on Qualitative Modeling

Qualitative modeling approaches have been discussed as part of
general reviews of the modeling of regulatory networks [111–115].
Moreover, for most of the modeling formalisms discussed
in this chapter dedicated reviews are available: Boolean and other
logical models [116–118], piecewise-linear differential equation
models [119], Petri nets [82], and hybrid systems [89, 120].

8.2 Dynamic Properties of Positive and Negative Feedback Loops

One of the first studies spelling out at length the interest of positive
and negative feedback loops for the functioning of regulatory networks
is the book on logical modeling by René Thomas [23]. Here,
it was conjectured that positive feedback loops are a prerequisite for
multistability, that is, the co-occurrence of multiple stable steady
states (point attractors). On the other hand, negative feedback
loops were hypothesized to be necessary for stable oscillations.
Later work has confirmed the conjectures, both for the case of
positive and negative feedback loops [121–123]. Notice that the
criteria have been proposed for deterministic ODE models, and
that the existence of feedback loops provides necessary but not
sufficient conditions. For example, in Fig. 5, if we choose
$\kappa_L/\gamma_L < \theta_L$ and $\kappa_T/\gamma_T < \theta_T$, then the toggle switch has only a single
stable steady state. Corresponding proofs of the conjectures in the
discrete, logical context have also been developed [124, 125].
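The toggle switch example can be checked numerically with a minimal sketch. The model form used here, dL/dt = κ s⁻(T, θ) − γ L and dT/dt = κ s⁻(L, θ) − γ T with a negative step function s⁻, is an assumption about the PLDE toggle switch of Fig. 5, and the symmetric parameter values are purely illustrative.

```python
def step_neg(x, theta):
    """Negative step function s^-(x, theta): 1 below the threshold, 0 otherwise."""
    return 1.0 if x < theta else 0.0


def simulate_toggle(L, T, kappa, gamma, theta, dt=0.01, steps=3000):
    """Euler integration of the assumed symmetric PLDE toggle switch."""
    for _ in range(steps):
        dL = kappa * step_neg(T, theta) - gamma * L   # TetR represses LacI
        dT = kappa * step_neg(L, theta) - gamma * T   # LacI represses TetR
        L, T = L + dt * dL, T + dt * dT
    return L, T


starts = [(2.5, 0.5), (0.5, 2.5)]
# kappa/gamma = 1 < theta = 2: the positive loop is present but cannot be engaged
mono = [simulate_toggle(L0, T0, kappa=1.0, gamma=1.0, theta=2.0) for L0, T0 in starts]
# kappa/gamma = 3 > theta = 2: the end state depends on the initial condition
bi = [simulate_toggle(L0, T0, kappa=3.0, gamma=1.0, theta=2.0) for L0, T0 in starts]
```

With κ/γ below the thresholds both initial conditions end at the same point (κ/γ, κ/γ), whereas with κ/γ above the thresholds they end at two different steady states, even though the feedback loop is identical in both cases.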
8.3 Updating Schedules for Boolean Models

Boolean variables are defined in continuous time, but their state is
allowed to change only at a discrete set of time instants. An updating
schedule essentially determines the order in which the variables
change their state and it may be deterministic, where the same
order is applied at each iteration [30], or non-deterministic, where
the order is given by a random or stochastic process [126].
A deterministic updating schedule s may be defined as a function
$s : \{1, \ldots, n\} \to \{1, \ldots, m\}$, where $m \leq n$, $s(i) < s(j)$ means that
variable i is updated before variable j, and $s(i) = s(j)$ indicates that
variables i and j are updated simultaneously. The case $m = 1$ denotes
the synchronous updating schedule and the case $m = n$ denotes an
asynchronous sequential schedule. In the case of random schedules,
both $m = m_t$ and $s(i) = s_t(i)$ depend on the current iteration time t.
If $x_i[t]$ denotes the state of variable i at time t, then the state at the
next iteration, $x_i[t + 1]$, is given by:
$$x_i[t + 1] = f_i(x_1[t + \Delta t_{1i}], \ldots, x_n[t + \Delta t_{ni}]), \qquad (33)$$
where $\Delta t_{ji} = 0$ if $s_t(i) \leq s_t(j)$ and $\Delta t_{ji} = 1$ if $s_t(i) > s_t(j)$. In general,
each realization of an updating schedule leads to a different trajec-
tory. The dynamic properties of various updating schedules have
been studied in the literature (see [28, 30, 126] for some examples).
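A deterministic updating schedule in the sense of Eq. 33 can be sketched as follows. The function name update, the list encoding of the schedule (schedule[i] holds s(i) for the 0-based variable i), and the two-variable mutual-inhibition rules are illustrative assumptions.

```python
def update(x, rules, schedule):
    """One iteration of Eq. 33 for a deterministic updating schedule.

    x: tuple of Boolean states; rules[i](state) gives the next value of variable i;
    schedule[i] = s(i) is the block (1..m) in which variable i is updated.
    """
    new = list(x)
    for block in sorted(set(schedule)):
        # Variables in this block see the new values of all earlier blocks
        # and the old values of their own and later blocks.
        snapshot = tuple(new)
        for i, s_i in enumerate(schedule):
            if s_i == block:
                new[i] = rules[i](snapshot)
    return tuple(new)


# Hypothetical two-variable circuit with mutual inhibition
rules = [lambda x: not x[1], lambda x: not x[0]]
x0 = (False, False)

synchronous = update(x0, rules, [1, 1])    # m = 1: both variables at once
first_then_second = update(x0, rules, [1, 2])   # m = n: variable 0, then 1
second_then_first = update(x0, rules, [2, 1])   # m = n: variable 1, then 0
```

Starting from the same state, the three schedules give three different successors, which illustrates why the choice of schedule affects the trajectory.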

8.4 Discontinuities in Piecewise-Linear Differential Equation Models

The use of step functions results in PLDE models with favorable
mathematical properties, except at the thresholds where discontinuities
occur. As explained in Subheading 4, these discontinuities
arise from the fact that when a protein concentration crosses a
threshold, it may change the rate at which some genes are
expressed, and thus switch the local vector field in a region. While
the dynamics at the thresholds are often ignored, this is potentially
dangerous as it may cause steady states and other important dynamical
properties of the system to be missed. In order to deal with
the discontinuities in a mathematically rigorous manner, the PL
differential equations have been generalized to differential inclu-
sions [56]. Several different extensions have been proposed, such as
Filippov extensions [56, 57], Aizerman–Pyatnitskii extensions
[51, 60], and hyper-rectangular overapproximations of the former
[55]. The latter overapproximations can be computed with quali-
tative information only, i.e., the parameter inequalities mentioned
in Subheading 4, and have been implemented in the tool GNA
[73]. For relatively mild conditions on the types of regulatory
functions, the three extensions are equivalent in practice
[51]. Other approaches for dealing with discontinuities in
piecewise-linear models have been proposed [58, 59], but are less
amenable to the automated qualitative analysis of higher-
dimensional networks.

Acknowledgements

We would like to thank our friend and colleague Jean-Luc Gouzé
for a critical reading of the manuscript and many useful discussions.
This work has been supported by the ANR projects Maximic
(ANR-17-CE40-0024-01) and ICycle (ANR-16-CE33-0016-
01), and Inria IPL CoSy.

References

1. Kosuri S, Church G (2014) Large-scale de novo DNA synthesis: technologies and applications. Nat Methods 11(5):499–507
2. Csörgő B, Nyerges A, Pósfai G, Féher T (2016) System-level genome editing in microbes. Curr Opin Microbiol 33:113–122
3. Decoene T, Paepe BD, Maertens J, Coussement P, Peters G, Maeseneire SD, Mey MD (2018) Standardization in synthetic biology: an engineering discipline coming of age. Crit Rev Biotechnol 38(5):647–656
4. Nielsen A, Der B, Shin J, Vaidyanathan P, Paralanov V, Strychalski E, Ross D, Densmore D, Voigt C (2016) Genetic circuit design automation. Science 352(6281):aac7341
5. Kwok R (2010) Five hard truths for synthetic biology. Nature 463(7279):288–290
6. Otero-Muras I, Banga J (2017) Automated design framework for synthetic biology exploiting Pareto optimality. ACS Synth Biol 6(7):1180–1193
7. Purcell O, Savery N, Grierson C, di Bernardo M (2010) A comparative analysis of synthetic genetic oscillators. J R Soc Interface 7(52):1503–1524
8. Ashyraliyev M, Nanfack YF, Kaandorp J, Blom J (2009) Systems biology: parameter estimation for biochemical models. FEBS J 276(4):886–902
9. Berthoumieux S, Brilli M, de Jong H, Kahn D, Cinquemani E (2011) Identification of metabolic network models from incomplete high-throughput datasets. Bioinformatics 27(13):i186–i195
10. Villaverde A, Banga J (2013) Reverse engineering and identification in systems biology: strategies, perspectives and challenges. J R Soc Interface 11(91):20130505
11. Kauffman S (1969) Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol 22(3):437–467
12. Glass L, Kauffman S (1973) The logical analysis of continuous, nonlinear biochemical control networks. J Theor Biol 39:103–129
13. Thomas R (1973) Boolean formalization of genetic control circuits. J Theor Biol 42:563–585
14. Rodríguez-Jorge O, Kempis-Calanis L, Abou-Jaoudé W, Gutiérrez-Reyna D, Hernandez C, Ramirez-Pliego O, Thomas-Chollier M, Spicuglia S, Santana M, Thieffry D (2019) Cooperation between T cell receptor and Toll-like receptor 5 signaling for CD4+ T cell activation. Sci Signal 12(577):eaar3641
15. Saez-Rodriguez J, Simeoni L, Lindquist J, Hemenway R, Bommhardt U, Arndt B, Haus UU, Weismantel R, Gilles E, Klamt S, Schraven B (2007) A logical model provides insights into T cell receptor signaling. PLoS Comput Biol 3(8):e163
16. Hsiao V, Swaminathan A, Murray R (2018) Control theory for synthetic biology. IEEE Control Syst Mag 38:32–62
17. Del Vecchio D, Dy AJ, Qian Y (2016) Control theory meets synthetic biology. J R Soc Interface 13:20160380
18. Gardner T, Cantor C, Collins J (2000) Construction of a genetic toggle switch in Escherichia coli. Nature 403(6767):339–342
19. Lugagne JB, Carrillo S, Kirch M, Köhler A, Batt G, Hersen P (2017) Balancing a genetic toggle switch by real-time feedback control and periodic forcing. Nat Commun 8:1671
20. Jaeger J, Surkova S, Blagov M, Janssens H, Kosman D, Kozlov K, Manu, Myasnikova E, Vanario-Alonso C, Samsonova M, Sharp D, Reinitz J (2004) Dynamic control of positional information in the early Drosophila embryo. Nature 430(6997):368–371
21. Becskei A, Serrano L (2000) Engineering stability in gene networks by autoregulation. Nature 405(6786):590–591
22. Elowitz M, Leibler S (2000) A synthetic oscillatory network of transcriptional regulators. Nature 403(6767):335–338
23. Thomas R, D’Ari R (1990) Biological feedback. CRC Press, Boca Raton
24. Atkinson M, Savageau M, Myers J, Ninfa A (2003) Development of genetic circuitry exhibiting toggle switch or oscillatory behavior in Escherichia coli. Cell 113(5):597–608
25. Cantone I, Marucci L, Iorio F, Ricci M, Belcastro V, Bansal M, Santini S, di Bernardo M, di Bernardo D, Cosma M (2009) A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell 137:172–181
26. Shmulevich I, Dougherty E, Kim S, Zhang W (2002) Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics 18(2):261–274
27. Mori T, Flöttmann M, Krantz M, Akutsu T, Klipp E (2015) Stochastic simulation of Boolean rxncon models: towards quantitative analysis of large signaling networks. BMC Syst Biol 9(45):1–9
28. Chaves M, Albert R, Sontag E (2005) Robustness and fragility of Boolean models for genetic regulatory networks. J Theor Biol 235(3):431–449
29. Gonzalez A, Naldi A, Sánchez L, Thieffry D, Chaouiya C (2006) GINsim: a software suite for the qualitative modelling, simulation and analysis of regulatory networks. BioSystems 84(2):91–100
30. Aracena J, Goles E, Moreira A, Salinas L (2009) On the robustness of update schedules in Boolean networks. BioSystems 97(1):1–8
31. Naldi A, Rémy E, Thieffry D, Chaouiya C (2011) Dynamically consistent reduction of logical regulatory graphs. Theor Comput Sci 412(21):2207–2218
32. Zañudo J, Albert R (2013) An effective network reduction approach to find the dynamical repertoire of discrete dynamic networks. Chaos 23(2):025111
33. Irons D (2006) Improving the efficiency of attractor cycle identification in Boolean networks. Physica D 217:7–21
34. Akutsu T, Melkman A, Tamura T, Yamamoto M (2011) Determining a singleton attractor of a Boolean network with nested canalyzing functions. J Comput Biol 18(10):1275–1290
35. Veliz-Cuba A, Aguilar B, Hinkelmann F, Laubenbacher R (2014) Steady state analysis of Boolean molecular network models via model reduction and computational algebra. BMC Bioinform 15:221
36. Lorenz T, Siebert H, Bockmayr A (2013) Analysis and characterization of asynchronous state transition graphs using extremal states. Bull Math Biol 75(6):920–938
37. Tournier L, Chaves M (2013) Interconnection of asynchronous Boolean networks, asymptotic and transient dynamics. Automatica 49(4):884–893
38. Datta A, Choudhary A, Bittner ML, Dougherty ER (2003) External control in Markovian genetic regulatory networks. Mach Learn 52(1–2):169–181
39. Laschov D, Margaliot M (2012) Controllability of Boolean control networks via the Perron-Frobenius theory. Automatica 48(6):1218–1223
40. Yang JM, Lee CK, Cho KH (2018) Global stabilization of Boolean networks to control the heterogeneity of cellular responses. Front Physiol 9:774
41. Li F, Long T, Lu Y, Ouyang Q, Tang C (2004) The yeast cell-cycle network is robustly designed. Proc Natl Acad Sci USA 101(14):4781–4786
42. Fauré A, Naldi A, Chaouiya C, Thieffry D (2006) Dynamical analysis of a generic boolean model for the control of the mammalian cell cycle. Bioinformatics 22(14):124–131
43. Ortiz-Gutiérrez E, García-Cruz K, Azpeitia E, Castillo A, Sánchez M, Alvarez-Buylla E (2015) A dynamic gene regulatory network model that recovers the cyclic behavior of Arabidopsis thaliana cell cycle. PLoS Comput Biol 11(9):e1004486
44. Calzone L, Tournier L, Fourquet S, Thieffry D, Zhivotovsky B, Barillot E, Zinovyev A (2010) Mathematical modelling of cell-fate decision in response to death receptor engagement. PLoS Comput Biol 6(3):e1000702
45. Zhang R, Shah M, Yang J, Nyland S, Liu X, Yun J, Albert R, Loughran TP Jr (2008) Network model of survival signaling in large granular lymphocyte leukemia. Proc Natl Acad Sci USA 105(42):16308–16313
46. Sánchez L, Thieffry D (2001) A logical analysis of the Drosophila gap-gene system. J Theor Biol 211:115–141
47. Albert R, Othmer HG (2003) The topology of the regulatory interactions predicts the expression pattern of the Drosophila segment polarity genes. J Theor Biol 223:1–18
48. Barberis M, Helikar T (eds) (2019) Logical modeling of cellular processes: from software development to network dynamics. Lausanne: Frontiers Media
49. Chaouiya C (2007) Petri net modelling of biological networks. Brief Bioinform 8(4):210–219
50. Heiner M, Koch I (2004) Petri net based model validation in systems biology. In: Cortadella J, Reisig W (eds) Applications and theory of Petri nets 2004. Springer, Berlin, pp 216–237
51. Acary V, de Jong H, Brogliato B (2014) Numerical simulation of piecewise-linear models of gene regulatory networks using complementarity systems. Physica D 269:103–119
52. van Ham P (1979) How to deal with variables with more than two levels. In: Thomas R (ed) Kinetic logic: a Boolean approach to the analysis of complex regulatory systems. Lecture notes in biomathematics, vol 29. Springer, Berlin, pp 326–343
53. Shannon P, Markiel A, Ozier O, Baliga N, Wang J, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13(11):2498–2504
54. Mestl T, Plahte E, Omholt S (1995) A mathematical framework for describing and analysing gene regulatory networks. J Theor Biol 176(2):291–300
55. de Jong H, Gouzé JL, Hernandez C, Page M, Sari T, Geiselmann J (2004) Qualitative simulation of genetic regulatory networks using piecewise-linear models. Bull Math Biol 66(2):301–340
56. Gouzé JL, Sari T (2002) A class of piecewise linear differential equations arising in biological models. Dynam Syst 17(4):299–316
57. Casey R, de Jong H, Gouzé JL (2006) Piecewise-linear models of genetic regulatory networks: equilibria and their stability. J Math Biol 52(1):27–56
58. Ironi L, Panzeri L, Plahte E, Simoncini V (2011) Dynamics of actively regulated gene networks. Physica D 240(8):779–794
59. Plahte E, Kjøglum S (2005) Analysis and generic properties of gene regulatory networks with graded response functions. Physica D 201(1):150–176
60. Machina A, Edwards R, van den Driessche P (2013) Singular dynamics in gene network models. SIAM J Appl Math 12(1):95–125
61. Glass L (1975) Classification of biological networks by their qualitative dynamics. J Theor Biol 54(1):85–107
62. Glass L, Pasternack J (1978) Prediction of limit cycles in mathematical models of biological oscillations. Bull Math Biol 40(3):27–44
63. Edwards R (2000) Analysis of continuous-time switching networks. Physica D 146(1–4):165–199
64. Farcot E (2006) Geometric properties of a class of piecewise affine biological network models. J Math Biol 52(3):373–418
65. Batt G, de Jong H, Page M, Geiselmann J (2008) Symbolic reachability analysis of genetic regulatory networks using discrete abstractions. Automatica 44(4):982–989
66. Thomas R, Thieffry D, Kaufman M (1995) Dynamical behaviour of biological regulatory networks: I. Biological role of feedback loops and practical use of the concept of the loop-characteristic state. Bull Math Biol 57(2):247–276
67. Edwards R, Siegelmann H, Aziza K, Glass L (2001) Symbolic dynamics and computation in model gene networks. Chaos 11(1):160–169
68. Mestl T, Lemay C, Glass L (1996) Chaos in high-dimensional neural and gene networks. Physica D 98(1):33–52
69. de Jong H, Geiselmann J, Batt G, Hernandez C, Page M (2004) Qualitative simulation of the initiation of sporulation in B. subtilis. Bull Math Biol 66(2):261–299
70. Monteiro P, Dias P, Ropers D, Oliveira A, Sá-Correia I, Teixeira M, Freitas A (2011) Qualitative modelling and formal verification of the FLR1 gene mancozeb response in Saccharomyces cerevisiae. IET Syst Biol 5(5):308–316
71. Sepulchre JA, Reverchon S, Nasser W (2007) Modeling the onset of virulence in a pectinolytic bacterium. J Theor Biol 44(2):239–257
72. de Jong H, Geiselmann J, Hernandez C, Page M (2003) Genetic network analyzer: qualitative simulation of genetic regulatory networks. Bioinformatics 19(3):336–344
73. Batt G, Besson B, Ciron P, de Jong H, Dumas E, Geiselmann J, Monte R, Monteiro P, Page M, Rechenmann F, Ropers D (2012) Genetic network analyzer: a tool for the qualitative modeling and simulation of bacterial regulatory networks. Methods Mol Biol 804:439–462
74. Huttinga Z, Cummins B, Gedeon T, Mischaikow K (2018) Global dynamics for switching systems and their extensions by linear differential equations. Physica D 367:19–37
75. Ghosh R, Tomlin C (2004) Symbolic reachable set computation of piecewise affine hybrid automata and its application to biological modelling: Delta-Notch protein signalling. Syst Biol 1(1):170–183
76. Batt G, Page M, Cantone I, Goessler G, Monteiro P, de Jong H (2010) Efficient parameter search for qualitative models of regulatory networks using symbolic model checking. Bioinformatics 26(18):i603–i610
77. Devloo V, Hansen P, Labbé M (2003) Identification of all steady states in large networks by logical analysis. Bull Math Biol 65:1025–1051
78. de Jong H, Page M (2008) Search for steady states of piecewise-linear differential equation models of genetic regulatory networks. IEEE/ACM Trans Comput Biol Bioinform 5(2):208–222
79. Dubrova E, Teslenko M (2011) A SAT-based algorithm for finding attractors in synchronous Boolean networks. IEEE/ACM Trans Comput Biol Bioinform 8(5):1393–1399
80. Abdallah EB, Folschette M, Roux O, Magnin M (2017) ASP-based method for the enumeration of attractors in non-deterministic synchronous and asynchronous multi-valued networks. Algorithms Mol Biol 12:20
81. Klarner H, Siebert H (2015) Approximating attractors of Boolean networks by iterative CTL model checking. Front Bioeng Biotechnol 3:130
82. Chaouiya C, Naldi A, Thieffry D (2012) Logical modelling of gene regulatory networks with GINsim. Methods Mol Biol 804:463–479
83. Cormen T, Leiserson C, Rivest R, Stein C (2001) Introduction to algorithms. MIT Press and McGraw-Hill, Cambridge
84. Paulevé L (2018) Reduction of qualitative models of biological networks for transient dynamics analysis. IEEE/ACM Trans Comput Biol Bioinformatics 15(4):1167–1179
85. Cummins B, Gedeon T, Harker S, Mischaikow K (2018) DSGRN: examining the dynamics of families of logical models. Front Physiol 9:549
86. Veliz-Cuba A (2011) Reduction of Boolean network models. J Theor Biol 289:167–172
87. Clarke E, Grumberg O, Peled D (1999) Model checking. MIT Press, Boston
88. Carrillo M, Góngora P, Rosenblueth D (2012) An overview of existing modeling tools making use of model checking in the analysis of biochemical networks. Front Plant Sci 3:155
89. Bartocci E, Lió P (2016) Computational modeling, formal analysis, and tools for systems biology. PLoS Comput Biol 12(1):e1004591
90. Bernot G, Comet JP, Richard A, Guespin J (2004) Application of formal methods to biological regulatory networks: extending Thomas’ asynchronous logical approach with temporal logic. J Theor Biol 229(3):339–347
91. Calzone L, Fages F, Soliman S (2006) BIOCHAM: an environment for modeling biological systems and formalizing experimental knowledge. Bioinformatics 22(14):1805–1807
92. Kwiatkowska M, Norman G, Parker D (2011) PRISM 4.0: Verification of probabilistic real-time systems. In: Gopalakrishnan G, Qadeer S (eds) Proceedings of 23rd international conference computer aided verification (CAV’11). Lecture notes in computer science, vol 6806. Springer, Berlin, pp 585–591
93. Monteiro P, Dumas E, Besson B, Mateescu R, Page M, Freitas A, de Jong H (2009) A service-oriented architecture for integrating the modeling and formal verification of genetic regulatory networks. BMC Bioinform 10:450
94. Batt G, Belta C, Weiss R (2008) Temporal logic analysis of gene networks under parameter uncertainty. IEEE Trans Autom Control 53:215–229
95. Courbet A, Amar P, Fages F, Renard E, Molina F (2018) Computer-aided biochemical programming of synthetic microreactors as diagnostic devices. Mol Syst Biol 14(6):e7845
96. Perez-Carrasco R, Barnes C, Schaerli Y, Isalan M, Briscoe J, Page K (2018) Combining a toggle switch and a repressilator within the AC-DC circuit generates distinct dynamical behaviors. Cell Syst 6(4):521–530
97. Chaves M, Tournier L (2018) Analysis tools for interconnected Boolean networks with biological applications. Front Physiol 9:586
98. Chaves M, Carta A (2015) Attractor computation using interconnected Boolean networks: testing growth rate models in E. coli. Theor Comput Sci 599:47–63
99. Bourdon J, Eveillard D, Siegel A (2011) Integrating quantitative knowledge into a qualitative gene regulatory network. PLOS Comput Biol 7(9):1–11
100. Chaves M, Farcot E, Gouzé JL (2013) Probabilistic approach for predicting periodic orbits in piecewise affine differential models. Bull Math Biol 75(6):967–987
101. Stoll G, Viara E, Barillot E, Calzone L (2012) Continuous time Boolean modeling for biological signaling: application of Gillespie algorithm. BMC Syst Biol 6(1):116
102. Murrugarra D, Veliz-Cuba A, Aguilar B, Laubenbacher R (2016) Identification of control targets in Boolean molecular network models via computational algebra. BMC Syst Biol 10:94
103. Pal R, Datta A, Dougherty ER (2006) Optimal infinite-horizon control for probabilistic Boolean networks. IEEE Trans Signal Process 54(6):2375–2387
104. Miller M, Hafner M, Sontag E, Davidsohn N, Subramanian S, Purnick P, Lauffenburger D, Weiss R (2016) Modular design of artificial tissue homeostasis: robust control through synthetic cellular heterogeneity. PLoS Comput Biol 8:e1002579
105. Aoki S, Lillacci G, Gupta A, Baumschlager A, Schweingruber D, Khammash M (2019) A universal biomolecular integral feedback controller for robust perfect adaptation. Nature 570(7762):533–537
106. Chambon L, Gouzé JL (2019) A new qualitative control strategy for the genetic toggle switch. IFAC-PapersOnLine 52(1):532–537
107. Edwards R, Kim S, van den Driessche P (2011) Control design for sustained oscillation in a two-gene regulatory network. J Math Biol 62(4):453–478
108. Liu D, Mannan A, Han Y, Oyarzún D, Zhang F (2018) Dynamic metabolic control: towards precision engineering of metabolism. J Ind Microbiol Biotechnol 45(7):535–543
109. Wittmann D, Krumsiek J, Saez-Rodriguez J, Lauffenburger D, Klamt S, Theis F (2009) Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling. BMC Syst Biol 3:98
110. Chaouiya C, Bérenguier D, Keating S, Naldi A, Van Iersel M, Rodriguez N, Dräger A, Büchel F, Cokelaer T, Kowal B, Wicks B, Gonçalves E, Dorier J, Page M, Monteiro P, Von Kamp A, Xenarios I, de Jong H, Hucka M, Klamt S, Thieffry D, Le Novère N, Saez-Rodriguez J, Helikar T (2013) SBML qualitative models: a model representation format and infrastructure to foster interactions between qualitative modelling formalisms and tools. BMC Syst Biol 7(1):135
111. de Jong H (2002) Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol 9(1):67–103
112. Fisher J, Henzinger T (2007) Executable cell biology. Nat Biotechnol 25(11):1239–1250
113. Karlebach G, Shamir R (2008) Modelling and analysis of gene regulatory networks. Nat Rev Mol Cell Biol 9(10):770–780
114. Le Novère N (2015) Quantitative and logic modelling of molecular and gene networks. Nat Rev Genet 16(3):146–158
115. de Jong H, Ropers D (2006) Strategies for dealing with incomplete information in the modeling of molecular interaction networks. Brief Bioinform 7(4):354–63
116. Bornholdt S (2008) Boolean network models of cellular regulation: prospects and limitations. J R Soc Interface 5(Suppl 1):S85–S94
117. Wang RS, Saadatpour A, Albert R (2012) Boolean modeling in systems biology: an overview of methodology and applications. Phys Biol 9(5):055001
118. Abou-Jaoudé W, Traynard P, Monteiro P, Saez-Rodriguez J, Helikar T, Thieffry D, Chaouiya C (2016) Logical modeling and dynamical analysis of cellular networks. Front Genet 7:94
119. Glass L, Edwards R (2018) Hybrid models of genetic networks: mathematical challenges and biological relevance. J Theor Biol 458:111–118
120. Li X, Omotere O, Qian L, Dougherty E (2017) Review of stochastic hybrid systems with applications in biological systems modeling and analysis. EURASIP J Bioinform Syst Biol 2017(1):8
121. Gouzé JL (1998) Positive and negative circuits in dynamical systems. J Biol Syst 6(1):11–15
122. Soulé C (2003) Graphic requirements for multistationarity. ComPlexUs 1(3):123–133
123. Snoussi E (1998) Necessary conditions for multistationarity and stable periodicity. J Biol Syst 6(1):3–9
124. Remy E, Ruet P, Thieffry D (2008) Graphic requirement for multistability and attractive cycles in a Boolean dynamical framework. Adv Appl Math 41(3):335–350
125. Richard A, Comet JP (2007) Necessary conditions for multistationarity in discrete dynamical systems. Discr Appl Math 155(18):2403–2413
126. Deng X, Geng H, Matache M (2006) Dynamics of asynchronous random Boolean networks with asynchrony generated by stochastic processes. BioSystems 88(1–2):16–34

Chapter 2

Stochastic Differential Equations for Practical Simulation of Gene Circuits
Jesús Picó, Alejandro Vignoni, and Yadira Boada

Abstract
The Chemical Langevin Equation approach allows simple stochastic simulation of gene circuits under many
practical situations where the number of molecules of the species involved is not extremely low. Here, we
describe methods and a computational framework to simulate a population of cells containing gene circuits
of interest. These methods account for both intrinsic and extrinsic noise sources, and allow us to have both
individual cell-related species and population-related ones. The protocol covers the proper
description of the system and the setup of the software tools. It also addresses the optimization of data
storage and the trade-off between simulation precision and computational time. Finally, it gives practical tests to
assess the validity of the underlying technical assumptions.

Key words Synthetic biology, Gene circuits, Stochastic Modeling, Chemical Langevin equation

1 Introduction

Noise is pervasive in the cellular mechanisms underlying gene
expression [32]. As a consequence, a variation of protein expression
levels appears in every cell within a population of cells [28]. This
stochasticity in protein expression levels is often referred to as gene
expression noise [9, 12].
Gene expression noise cannot be avoided and generates pheno-
typic variability that may have a relevant impact on cellular func-
tions, including, e.g. the stress response, metabolism,
development, the cell cycle, circadian rhythms, and aging
[1, 30]. Indeed, noise propagates to downstream genes at the
single-cell level and eventually causes variations within an isogenic
population [24, 30]. These variations may determine the fate of
individual cells and that of a whole population, being beneficial in
some contexts and harmful in others [11].

Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229,
https://doi.org/10.1007/978-1-0716-1032-9_2, © Springer Science+Business Media, LLC, part of Springer Nature 2021

At the gene level, noise can be traced back to intrinsic sources,
due to stochastic fluctuations in the transcription and translation reaction
mechanisms, and extrinsic ones, corresponding to gene-independent
fluctuations in gene expression due to external factors
[9, 11, 19]. The former arise as a consequence of the discrete random
nature of the molecular reaction events involved in gene expression.
The latter are mainly caused by global fluctuations in the amounts
of biochemical resources (e.g. available number of RNA poly-
merases and ribosomes, plasmids copy number, etc.) and noise in
upstream genes. Both intrinsic and extrinsic noise should be taken
into account to perform stochastic simulations of gene circuits
[7, 17, 37, 40].
Besides the different mechanisms originating noise, the reac-
tions involved in gene circuits take place within the spatially
organized volume of the cell. In this chapter, algorithms that
simulate spatially distributed stochastic systems are not considered.
Therefore, we assume a well-mixed homogeneous system where
diffusion processes are much faster than reaction ones, so we can
ignore the spatial distribution of reactions within the cell and cells
within the culture. For spatial stochastic modeling, the reader is
referred to [3, 27].
The continuous deterministic approach to model and simulate
gene circuits provides a good approximation to the average behav-
ior of the relevant variables (i.e. biochemical species). Yet, it fails to
fully characterize the system behavior when stochasticity due to
noise plays a relevant role [11, 34] as, for instance, when the
number of molecules of the involved species is small and random
fluctuations are thus significant relative to the average gene expression
values. In these cases, the stochastic system may show
behaviors, such as bimodality and oscillations, that its
deterministic counterpart does not show. In addition, making use of
stochastic models and the additional information conveyed by noisy
experimental measurements may allow improved estimation of
model parameters [6, 33].
The direct approach to simulate a biochemical stochastic sys-
tem consists of accounting for the probability that each of the
biochemical species in the system has a given number of copies.
The Chemical Master Equation (CME) provides an accurate
account of the evolution of the probability distribution
of the number of molecules of each species in the system
[38, 39]. It yields an infinite-dimensional system of differential
equations, one for each possible state of the system. That is, the
CME has to be solved for all possible numbers of molecules of each
species. Therefore, an analytic solution of the CME is not possible
in the general case. Workarounds include numerical approaches
providing a solution in a truncated state-space like in [20, 26] or
simulations of its exact sample paths using Gillespie’s stochastic
simulation algorithm (SSA) [14]. The SSA computes single realiza-
tions of the underlying Markov jump process to obtain a numerical
estimation of the probability distribution of each species in the
system. The properties deduced about the probabilistic nature of
the process from multiple runs can be made arbitrarily accurate by
averaging over a sufficient number of runs to reduce the Monte
Carlo error associated with the estimates [39]. Thus, the SSA is
exact in the sense that the statistics from the CME are reproduced
precisely. However, it comes at a high computational cost even for a few
species, in particular if the number of molecules has large fluctuations
or if many reactions occur per unit time. In the first case a
large number of samples have to be simulated to obtain statistically
accurate results, whereas in the second case single simulations
become expensive since the time between reaction events becomes
small [35]. There are two main approaches to reduce the simulation
time of the SSA. The first one, still an exact method, is based on
factoring out reaction propensities and is known as the
partial-propensities method [29]. The second one essentially consists of
lumping together reactions and updating the state vector only after
many reactions have fired. This is the so-called tau-leaping
approximation; it and its variants introduce approximation
errors that remain small as long as the state vector updates are
relatively small [16]. Pushing these approximation
approaches further leads to continuous stochastic simulation methods.
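As an illustration of the SSA described above, the following is a minimal C++ sketch of Gillespie's direct method for a single-species birth-death process. The function name `ssa_birth_death`, the rate names `k` and `d`, and the use of `std::mt19937` are assumptions made for this example; they are not part of the protocol or of OpenFPM. Each iteration draws an exponential waiting time from the total propensity and then selects the reaction that fires, so the cost grows with the number of reaction events.

```cpp
#include <cassert>
#include <random>

// Gillespie direct method for a birth-death process (illustrative only):
//   0 --k--> X   (propensity k),    X --d--> 0   (propensity d * x).
// Returns the copy number at time t_end for one realization.
long ssa_birth_death(long x0, double k, double d, double t_end,
                     std::mt19937& rng) {
    assert(x0 >= 0 && k >= 0.0 && d >= 0.0);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    long x = x0;
    double t = 0.0;
    while (true) {
        const double a_birth = k;
        const double a_death = d * static_cast<double>(x);
        const double a_total = a_birth + a_death;
        if (a_total <= 0.0) break;  // no reaction can fire any more
        // The waiting time to the next event is exponential with rate a_total.
        std::exponential_distribution<double> waiting(a_total);
        t += waiting(rng);
        if (t > t_end) break;
        // Choose the reaction with probability proportional to its propensity.
        if (uniform(rng) * a_total < a_birth) {
            ++x;
        } else {
            --x;
        }
    }
    return x;
}
```

Averaging the returned copy number over many calls estimates the mean at `t_end`; for this particular process the stationary mean is k/d, which gives a quick sanity check.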
As an alternative to the numerical approaches above, the
Chemical Langevin Equation (CLE) approach is a continuous sto-
chastic simulation method that approximates the CME by a system
of stochastic differential equations (SDE). In contrast to the CME,
which leads to an infinite-dimensional system, the CLE gives a
system of SDEs of order equal to the number of species [16]. The
CLE approach is a practical way to model gene expression noise
when the number of molecules of the species is sufficiently large
[13, 14].
To apply the CLE approach and simulate genetic circuits, the
mathematical model of the circuit dynamics must first be expressed
in the proper form. Then, it is necessary to account for both
intrinsic and extrinsic noise sources, and to allow for
individual cell-related species, population-related ones, or both.
In addition, an efficient computational
framework for stochastic simulation must be set up. Aspects related
to optimization of data storage, simulation precision versus computational
time, and practical tests to assess the validity of the underlying
technical assumptions must also be considered.

2 Materials

To clarify the protocol steps, the synthetic gene circuit depicted in
Fig. 1 will be used as an example when needed. This circuit, hereafter
denoted as the QS/Fb circuit, integrates a cell-to-cell communication
mechanism and an intracellular feedback loop [5]. As a consequence,
there are both intracellular biochemical species at the
individual cell level and extracellular ones at the population level.
The aim of the circuit is to achieve a desired mean value of the expression
of a protein of interest while minimizing its variability in time and
across the population of cells. Therefore, analysis of the circuit
performance requires stochastic simulations.

Fig. 1 QS/Fb circuit. The gene circuit aims to regulate the mean expression of a
protein of interest while minimizing the noise strength. To this end, it relies on
the combination of cell-to-cell communication based on quorum sensing
(QS) via exchange of a diffusible molecule, and intracellular negative feedback
(Fb). The Fb subsystem regulates the expression of the protein of interest inside
each cell, minimizing its noise strength. The QS subsystem induces consensus
among the cells, thus achieving homogeneous expression across the population
of cells

2.1 Getting the Model in Proper Form

1. Define a vector containing the number of molecules of the
biochemical species for the population of cells. The dynamics
of the circuit will later be expressed using this vector, whose
entries (the numbers of molecules of the relevant biochemical
species) are the model state variables (see Note 1). Set the number
N of cells to be simulated. This protocol assumes N is
constant throughout the simulation. This is consistent with
experiments carried out under continuous operation in
turbidostats and microfluidic devices. Refer to Note 2 on how
to estimate a population size N that gives statistically
representative results at an acceptable computational
cost. Refer to Note 3 to relate the population size to the optical
density. All N cells are assumed to have the same set of relevant
intracellular biochemical species; refer to Note 4 on how to deal with
heterogeneous cells. For a system with c common intracellular
species for all N cells and e extracellular species, define the column
vector $n = [n^1, \ldots, n^N, n_{c+1}, \ldots, n_{c+e}]^T$ containing all vectors
$n^i = [n_1, \ldots, n_c]^i$ with the number of molecules of the
c intracellular species in the i-th cell, and the variables
$n_{c+1}, \ldots, n_{c+e}$ containing the number of molecules of each extracellular
species (see Example 1).

Example 1 For the QS/Fb circuit the relevant species are
the intracellular $n_1$ (PI), $n_2$ (R), $n_3$ ($(R \cdot A)_2$), the small
molecule $n_4$ (A) that can freely diffuse across the cellular
membrane, and $n_6$ ($R \cdot A$). $n_6$ can be obtained algebraically
as a function of the others, $n_6 = g(n_2, n_3, n_4)$ (see
Point 4 below to handle algebraically determined
species). Additionally, $n_5$ ($A_e$) represents the amount of
extracellular molecules of the diffusible species A. With
this set of species, $n = [n^1, \ldots, n^N, n_5]^T$, with
$n^i = [n_1^i, n_2^i, n_3^i, n_4^i, n_6^i]$ for the five intracellular species in
each i-th cell.
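The stacked state vector described in Point 1 can be stored as one flat array, with the species of each cell in a contiguous block followed by the extracellular species. The sketch below shows one possible index mapping; the type `StateLayout` and its member names are hypothetical and zero-based indices are an assumption of this example (the OpenFPM client stores per-cell data in a distributed grid instead).

```cpp
#include <cassert>
#include <cstddef>

// Flat layout of n = [n^1, ..., n^N, n_{c+1}, ..., n_{c+e}]^T.
// Hypothetical helper for illustration; indices are zero-based.
struct StateLayout {
    std::size_t n_cells;  // N, number of simulated cells
    std::size_t n_intra;  // c, intracellular species stored per cell
    std::size_t n_extra;  // e, extracellular species shared by all cells

    // Total length of the stacked state vector.
    std::size_t size() const { return n_cells * n_intra + n_extra; }

    // Position of intracellular species `s` of cell `cell`.
    std::size_t intracellular(std::size_t cell, std::size_t s) const {
        assert(cell < n_cells && s < n_intra);
        return cell * n_intra + s;
    }

    // Position of extracellular species `s`, stored after all cell blocks.
    std::size_t extracellular(std::size_t s) const {
        assert(s < n_extra);
        return n_cells * n_intra + s;
    }
};
```

For instance, with 240 cells, four stored intracellular species per cell and one extracellular species, the vector has 961 entries and the extracellular species sits at the last position.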

2. Define the vector of reaction propensities. For a system with
N cells where
– $r_c$ reactions are common to all cells and affect both the
dynamics of intracellular species and those of the extracellular
ones, and
– $r_e$ reactions only affect the dynamics of extracellular species,
define the column vector $a(n)$ (see Eq. 1) containing all vectors
$a(n)^i$ of propensities for the $r_c$ intracellular reactions in the i-th
cell, and the reaction propensities $a_{r_c+1}(n), \ldots, a_{r_c+r_e}(n)$ for
the extracellular reactions.

$$
a(n) = \begin{bmatrix} a(n)^1 \\ a(n)^2 \\ \vdots \\ a(n)^N \\ a_{r_c+1}(n) \\ \vdots \\ a_{r_c+r_e}(n) \end{bmatrix},
\qquad
a(n)^i = \begin{bmatrix} a_1(n) \\ \vdots \\ a_{r_c}(n) \end{bmatrix}
\tag{1}
$$

Refer to Note 4 on how to deal with heterogeneous cells
with different sets of intracellular reactions. Refer to Note 5 on
how to model the reaction propensities. Refer to Notes 6, 7
and 8 on how to deal with lumped propensities obtained from a
reduced order model (see Example 2) and validate them (see
Example Note 5). Refer to Note 7 on how to deal with
diffusion of molecules across the cell membrane.

Example 2 For the QS/Fb circuit, consider the set of reactions among the species listed
below. This set includes some pseudo-reactions: the first reaction, with the lumped
functional propensity $f_1(n_3^i)$ resulting from a previous model-order reduction (see
Notes 6 and 7), and the 9th reaction, accounting for the diffusion process (see Note 7).
The corresponding vector of propensities is shown after the reactions. For each cell, the last
propensity $D V_c n_5$ depends on the extracellular species $n_5$ ($A_e$) but it is included as an
intracellular reaction as it affects the dynamics of the intracellular species $n_4^i$ (A). On the
contrary, the propensity function $d_{A_e} n_5$ only affects directly the dynamics of the
extracellular species $n_5$. Therefore, it is included as an extracellular reaction in the vector
of propensities. Refer to Example 12 for the software code implementation.

$$
\begin{aligned}
(R \cdot A)_2^i &\xrightarrow{\,f_1(n_3^i)\,} PI^i + (R \cdot A)_2^i \\
PI^i &\xrightarrow{\,d_I\,} \emptyset \\
\emptyset &\xrightarrow{\,C_{LR}\,} R^i \\
R^i + A^i &\underset{k_1^-}{\overset{k_1^+}{\rightleftharpoons}} (R \cdot A)^i \\
R^i &\xrightarrow{\,d_R\,} \emptyset \\
(R \cdot A)^i + (R \cdot A)^i &\underset{k_2^-}{\overset{k_2^+}{\rightleftharpoons}} (R \cdot A)_2^i \\
(R \cdot A)_2^i &\xrightarrow{\,d_{RA2}\,} \emptyset \\
PI^i &\xrightarrow{\,k_A\,} PI^i + A^i \\
A^i &\underset{D V_c}{\overset{D}{\rightleftharpoons}} A_e \\
A^i &\xrightarrow{\,d_A\,} \emptyset \\
A_e &\xrightarrow{\,d_{A_e}\,} \emptyset
\end{aligned}
$$

$$
a(n) = \begin{bmatrix} a(n)^1 \\ a(n)^2 \\ \vdots \\ a(n)^N \\ d_{A_e} n_5 \end{bmatrix},
\qquad
a(n)^i = \begin{bmatrix}
f_1(n_3^i) \\ d_I n_1^i \\ C_{LR} \\ k_1^- n_6^i \\ k_1^+ n_2^i n_4^i \\ d_R n_2^i \\ k_2^+ (n_6^i)^2 \\ k_2^- n_3^i \\ d_{RA2} n_3^i \\ k_A n_1^i \\ d_A n_4^i \\ D n_4^i \\ D V_c n_5
\end{bmatrix}
$$

3. Define the extended stoichiometry matrix. Consider a system with
N cells where $r_c$ common intracellular reactions and $r_e$ extracellular
reactions take place among c common intracellular
non-algebraic species for all N cells and e extracellular
non-algebraic species. Non-algebraic species are those that cannot be
obtained algebraically as a function of others (see Point 4 below to consider
algebraic species). The extended stoichiometry matrix S
has a block structure:

$$
S = \begin{bmatrix} I_N \otimes S_{cc} & 0_{cN \times r_e} \\ 1_{1 \times N} \otimes S_{ec} & S_{ee} \end{bmatrix}
\tag{2}
$$

where
– $S_{cc}$ is a $c \times r_c$ matrix formed by the stoichiometric coefficients
for the c intracellular non-algebraic species accounting
only for the $r_c$ intracellular reactions
– $0_{cN \times r_e}$ is a $(c \cdot N) \times r_e$ null matrix
– $S_{ec}$ is an $e \times r_c$ matrix formed by the stoichiometric coefficients
for the e extracellular non-algebraic species accounting
for the interactions with the intracellular ones via the $r_c$
intracellular reactions affecting them
– $S_{ee}$ is an $e \times r_e$ matrix formed by the stoichiometric coefficients
for the e extracellular non-algebraic species accounting
only for the $r_e$ extracellular reactions
– $I_N$ is the $N \times N$ identity matrix
– $1_{1 \times N}$ is a $1 \times N$ row vector of ones
and $\otimes$ is the Kronecker product (see Note 9).

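Example 3 For the QS/Fb circuit, using the reactions
defined in Example 2, we have

$$
\begin{aligned}
S_{cc} &= \left[\begin{array}{rrrrrrrrrrrrr}
1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & -1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & 1 & -1 & -1 & 1
\end{array}\right] \\
S_{ec} &= \left[\begin{array}{rrrrrrrrrrrrr}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1
\end{array}\right] \\
S_{ee} &= [\,-1\,]
\end{aligned}
\tag{3}
$$

If we consider N = 2 cells, the extended stoichiometry
matrix takes the form:

$$
S = \begin{bmatrix}
[1 \cdot S_{cc}] & [0 \cdot S_{cc}] & [0_{4 \times 1}] \\
[0 \cdot S_{cc}] & [1 \cdot S_{cc}] & [0_{4 \times 1}] \\
[1 \cdot S_{ec}] & [1 \cdot S_{ec}] & [S_{ee}]
\end{bmatrix}
\tag{4}
$$

Refer to Example 14 for the software code
implementation.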
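The block structure of Eq. 2 can be assembled directly, without forming the Kronecker products explicitly. The following is a minimal sketch under the assumption of small dense integer matrices; the alias `Matrix` and the function `extended_stoichiometry` are hypothetical names chosen for this illustration and do not correspond to the code in the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major integer matrix; a hypothetical helper type for this sketch.
using Matrix = std::vector<std::vector<int>>;

// Assemble S = [ I_N (x) Scc , 0 ; 1_{1xN} (x) Sec , See ] for N identical cells.
Matrix extended_stoichiometry(const Matrix& Scc, const Matrix& Sec,
                              const Matrix& See, std::size_t N) {
    assert(!Scc.empty() && !Sec.empty() && !See.empty());
    const std::size_t c = Scc.size(), rc = Scc[0].size();
    const std::size_t e = See.size(), re = See[0].size();
    assert(Sec.size() == e && Sec[0].size() == rc);

    // Start from zeros, which already provides the 0_{cN x re} block.
    Matrix S(c * N + e, std::vector<int>(rc * N + re, 0));
    for (std::size_t cell = 0; cell < N; ++cell) {
        // Diagonal block: reactions of a cell change only its own species.
        for (std::size_t i = 0; i < c; ++i)
            for (std::size_t j = 0; j < rc; ++j)
                S[cell * c + i][cell * rc + j] = Scc[i][j];
        // Bottom block row: reactions of every cell act on the shared species.
        for (std::size_t i = 0; i < e; ++i)
            for (std::size_t j = 0; j < rc; ++j)
                S[c * N + i][cell * rc + j] = Sec[i][j];
    }
    // Bottom-right block: purely extracellular reactions.
    for (std::size_t i = 0; i < e; ++i)
        for (std::size_t j = 0; j < re; ++j)
            S[c * N + i][rc * N + j] = See[i][j];
    return S;
}
```

Because the matrix is mostly zeros for large N, a production code would probably exploit the block structure rather than store S densely; the dense form is used here only to make the layout visible.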
4. Consider algebraically determined species. Algebraic relationships
among the species in the circuit may arise as a product of
model-order reduction (see Note 6). Thus, for each i-th cell, the
vector of propensities $a(n)^i$ may depend on species $n^i_{a_1}, \ldots, n^i_{a_a}$
that in turn can be obtained as an algebraic function of the species
in the system. These algebraic species need not be considered in
the vector of species defined in Point 1 above. The algebraic
functions $n^i_{a_j} = g^i_j(n)$, $j = 1, \ldots, a$ will be used as constraints
during the simulation of the system dynamics (see Point 5 below).
For a system with $c_a$ common intracellular algebraic species for all
N cells and $e_a$ extracellular algebraic species:
(a) define the column vector $n_a = [n_a^1, \ldots, n_a^N, n_{c_a+1}, \ldots, n_{c_a+e_a}]^T$
containing all vectors $n_a^i = [n_{a_1}, \ldots, n_{a_{c_a}}]^i$ for
the number of molecules of the $c_a$ intracellular algebraic
species in the i-th cell, and the variables $n_{c_a+1}, \ldots, n_{c_a+e_a}$
containing the number of molecules of each algebraic extracellular
species (see Example 4).
(b) define the column vector $g(n) = [g_a^1(n), \ldots, g_a^N(n), g_{c_a+1}(n), \ldots, g_{c_a+e_a}(n)]^T$
containing all vectors of the algebraic
functions $g_a^i(n) = [g_{a_1}(n), \ldots, g_{a_{c_a}}(n)]^i$ for the $c_a$
intracellular algebraic species in the i-th cell, and the functions
$g_{c_a+1}(n), \ldots, g_{c_a+e_a}(n)$ for the algebraic extracellular
species (see Example 4).

Example 4 For the QS/Fb circuit, notice that the fourth component
of the i-th cell vector of propensities $a(n)^i$ depends
on the species $n_6^i$ (see Example 1). This one can be
obtained algebraically as a function of the others,
$n_6^i = g_6^i(n_2^i, n_3^i, n_4^i)$, so there was no need to explicitly consider
it as a component of the vector containing the number of
molecules of the biochemical species. To simulate N cells
we set the algebraic constraint $n_6 = g_6(n)$ with
$n_6 = [n_6^1, \ldots, n_6^N]^T$, and
$g_6(n) = [g_6(n_2^1, n_3^1, n_4^1), \ldots, g_6(n_2^N, n_3^N, n_4^N)]^T$.
Refer to Example 16 for the software code
implementation.

5. Define the general structure of the dynamic model. See Note
1 for the technical background. Split the column vector n such
that $n = [n_{na}, n_a]$, where $n_{na}$ contains the number of molecules
for the set of non-algebraic species and $n_a$ for the algebraic
ones. The temporal evolution of the number of molecules for
each biochemical species of interest will be expressed as:

$$
\begin{aligned}
n_{na}(t + \delta t) &= n_{na}(t) + S \cdot a(n(t))\,\delta t + S \cdot \mathcal{N}_t \cdot \sqrt{a(n(t))}\,\sqrt{\delta t} \\
n_a(t + \delta t) &= g(n(t + \delta t))
\end{aligned}
\tag{5}
$$

where the square root of the propensity vector is taken element-wise.
See Methods 3.1 for software coding instructions to simulate
the temporal evolution of the number of molecules of each
biochemical species of interest in the system. See Point 2.2.1
for the definition of $\mathcal{N}_t$.

Example 5 For the QS/Fb circuit, define
$n_{na} = [n_{na}^1, \ldots, n_{na}^N, n_5]^T$, with $n_{na}^i = [n_1^i, n_2^i, n_3^i, n_4^i]$ (see
Example 1) and $n_a = n_6$ as defined in Example 4. The
propensities vector $a(n(t))$ and the extended stoichiometry
matrix S were defined in Examples 2 and 3,
respectively.
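To make the update in Eq. 5 concrete, the sketch below performs one Euler–Maruyama step for the non-algebraic species. It is an illustration under stated assumptions, not the repository implementation: the function name `cle_step` is hypothetical, S is stored as a dense matrix, `std::mt19937` stands in for the distributed random number generator, and negative propensities are clamped to zero before taking the square root (a guard added here, not something the protocol prescribes).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// One Euler-Maruyama update of the CLE (illustrative sketch):
//   n(t+dt) = n(t) + S a dt + S (xi .* sqrt(a)) sqrt(dt),   xi_r ~ N(0,1).
// S is row-major (species x reactions); a holds the propensities at n(t).
std::vector<double> cle_step(const std::vector<double>& n,
                             const std::vector<std::vector<double>>& S,
                             const std::vector<double>& a,
                             double dt, std::mt19937& rng) {
    assert(S.size() == n.size());
    std::normal_distribution<double> standard_normal(0.0, 1.0);

    // Per-reaction increment: drift plus one independent Gaussian draw,
    // i.e. the diagonal noise matrix applied to sqrt(a).
    std::vector<double> firing(a.size());
    for (std::size_t r = 0; r < a.size(); ++r) {
        const double ar = std::max(a[r], 0.0);  // assumed guard: a >= 0
        firing[r] = ar * dt
                  + standard_normal(rng) * std::sqrt(ar) * std::sqrt(dt);
    }

    // Map reaction increments to species through the stoichiometry matrix.
    std::vector<double> next(n);
    for (std::size_t s = 0; s < n.size(); ++s) {
        assert(S[s].size() == a.size());
        for (std::size_t r = 0; r < a.size(); ++r)
            next[s] += S[s][r] * firing[r];
    }
    return next;
}
```

After each step the algebraic species would be recomputed from the updated state, as in the second line of Eq. 5. Note that the same noise draw is shared by all species a reaction touches, so a conversion reaction conserves the total number of molecules.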

2.2 Accounting for Noise and Computing Long-Term Statistics

1. Define the intrinsic noise matrix. The diffusion term in the
Euler–Maruyama discrete formulation of the CLE given by
Eq. 5 accounts for the intrinsic noise (see Note 1). For a system
with N cells, $r_c$ intracellular reactions, and $r_e$ extracellular ones,
define the $(N \cdot r_c + r_e) \times (N \cdot r_c + r_e)$ matrix $\mathcal{N}_t$ as a diagonal
matrix with $N \cdot r_c + r_e$ continuous independent normal random
variables with zero mean and unit variance
($\mathcal{N}_{ii}(\mu, \sigma^2) = \mathcal{N}(0, 1)$). Refer to Example 13 for the software
code implementation. See there how to skip some reactions so
they are not affected by intrinsic noise.
2. Define the extrinsic noise characteristics. Time-invariant
dynamics are assumed. That is, the system parameters may
take random values around a nominal one, but they remain
constant in time. The time-variant case, not covered here,
requires setting a stochastic differential equation for the temporal
evolution of each parameter for each cell in the population.
Consider extrinsic noise by randomizing the values of the
model parameters. Each model parameter θ has a nominal value
$\theta_n$. The value of the parameter assigned to the i-th cell is
$\theta_i = \theta_n (1 + CV_\theta \, \mathcal{N}(0, 1))$, where $\mathcal{N}(0, 1)$ is the standard normal
distribution, and $CV_\theta$ is a user-defined coefficient of variation
for the parameter θ. Refer to Example 10 for the software code
implementation of extrinsic noise.
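The parameter randomization in this step amounts to one Gaussian draw per cell and parameter, made once before the simulation starts. A minimal sketch follows; the function name `sample_cell_parameters` and the use of `std::mt19937` are assumptions for illustration only. The sketch does not guard against negative values, which can occur when the coefficient of variation is large.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw one value of a parameter per cell:
//   theta_i = theta_nominal * (1 + cv * N(0,1)).
// The values are drawn once and then kept fixed (time-invariant case).
std::vector<double> sample_cell_parameters(double theta_nominal, double cv,
                                           std::size_t n_cells,
                                           std::mt19937& rng) {
    assert(cv >= 0.0);
    std::normal_distribution<double> standard_normal(0.0, 1.0);
    std::vector<double> theta(n_cells);
    for (std::size_t i = 0; i < n_cells; ++i)
        theta[i] = theta_nominal * (1.0 + cv * standard_normal(rng));
    return theta;
}
```

Setting the coefficient of variation to zero reproduces the nominal value in every cell, which is a convenient way to switch extrinsic noise off when comparing against intrinsic noise alone.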
3. Compute the long-term first moments of the time evolution
of the species of interest in the population of cells. To
compute the long-term moments of the species of interest,
such as the mean (μ) and standard deviation (σ), and derived statistics
such as the noise strength $\eta^2$ (squared coefficient of variation,
$\eta^2 = \sigma^2/\mu^2$) for a population of N cells, follow these steps:
(a) For a population of N cells, run a simulation (see Subheading
3) of length T units of time, with discrete-time sampling
δt, ensuring T is large enough so that the steady state is
reached and maintained for $T_s$ units of time. Refer to Note
2 to estimate the appropriate value of N providing representative
statistics. Notice that only one realization of the
simulation (one run) is required to compute the long-term
first moments if the system is ergodic, that is, if
averaging over a sufficiently long time along one realization is
equivalent to computing statistics over many realizations at each
time instant. Refer to Note 10 on computational assessment
of ergodicity of the system.
(b) Using the data for the N cells at every discrete-time
instant $t_k$, calculate the mean $m_{n_j}(t_k)$ and the variance
$s^2_{n_j}(t_k)$ for each species of interest $n_j$ across the population
at each discrete-time instant $t_k$:

$$
m_{n_j}(t_k) = \frac{1}{N} \sum_{i=1}^{N} n_j^i(t_k), \qquad
s^2_{n_j}(t_k) = \frac{1}{N} \sum_{i=1}^{N} \left( n_j^i(t_k) - m_{n_j}(t_k) \right)^2
$$

where $n_j^i(t_k)$ is the value of species $n_j$ in the i-th cell at
time instant $t_k$. Refer to Example 17 to see parallel software
code implementing these expressions.
(c) For each species of interest $n_j$, calculate the long-term
total mean and variance using the law of total variance:
the total variance is the sum of the mean of the variance
plus the variance of the mean [4]:

$$
\mu_{n_j} = \frac{1}{T_s} \sum_{k=0}^{f} m_{n_j}(t_k), \qquad
\sigma^2_{n_j} = \frac{1}{T_s} \sum_{k=0}^{f} s^2_{n_j}(t_k) + \frac{1}{T_s} \sum_{k=0}^{f} \left( m_{n_j}(t_k) - \mu_{n_j} \right)^2
$$

(d) Obtain additional statistics. The noise strength is obtained as:

$$
\eta^2_{n_j} = \frac{\sigma^2_{n_j}}{\mu^2_{n_j}}
$$
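Steps (b) to (d) can be sketched for a single species as follows. The struct `LongTermStats` and the function `long_term_stats` are hypothetical names, the input is assumed to hold only samples from the steady-state window, and the time averages are normalized by the number of stored time samples (an assumption about how $T_s$ is counted). This is a serial sketch; the repository code computes the same quantities in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LongTermStats {
    double mean;            // mu
    double variance;        // sigma^2, total variance
    double noise_strength;  // eta^2 = sigma^2 / mu^2
};

// samples[k][i]: copies of one species in cell i at steady-state time t_k.
LongTermStats long_term_stats(const std::vector<std::vector<double>>& samples) {
    assert(!samples.empty() && !samples[0].empty());
    const double K = static_cast<double>(samples.size());

    // Step (b): population mean m(t_k) and variance s^2(t_k) per time instant.
    std::vector<double> m(samples.size()), s2(samples.size());
    for (std::size_t k = 0; k < samples.size(); ++k) {
        const double N = static_cast<double>(samples[k].size());
        double sum = 0.0;
        for (double x : samples[k]) sum += x;
        m[k] = sum / N;
        double sq = 0.0;
        for (double x : samples[k]) sq += (x - m[k]) * (x - m[k]);
        s2[k] = sq / N;
    }

    // Step (c): law of total variance,
    // mean of the variances plus variance of the means.
    double mu = 0.0, mean_of_var = 0.0, var_of_mean = 0.0;
    for (double v : m) mu += v;
    mu /= K;
    for (std::size_t k = 0; k < m.size(); ++k) {
        mean_of_var += s2[k];
        var_of_mean += (m[k] - mu) * (m[k] - mu);
    }
    const double sigma2 = mean_of_var / K + var_of_mean / K;

    // Step (d): noise strength.
    return {mu, sigma2, sigma2 / (mu * mu)};
}
```

Only the per-instant mean and variance need to be kept while the simulation runs, so the full history of every cell does not have to be stored to obtain the long-term statistics.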

2.3 Software

1. Select a software platform. To implement and run the simulation
algorithm of the CLE-based model, efficient software platforms
alleviate the computational cost. Here we consider the C++
version of the scalable Open Framework for Particle and
Particle-Mesh codes (OpenFPM, available at http://openfpm.mpi-cbg.de/).
OpenFPM allows efficient parallel computation
using fully parallel particle and mesh algorithms [18].
2. OpenFPM server installation. To install OpenFPM on Linux
or OSX, clone the repository and use the following lines to
install it in the default location (refer to Note 11 on how to
install other possible OpenFPM configurations and for
troubleshooting):

    git clone https://github.com/IBirdSoft/openfpm_pdata_2.0.0.git
    cd openfpm_pdata_2.0.0
    ./install
    sudo make install

After successful installation, either run the line

    source ~/openfpm_vars

before each compilation of your OpenFPM client code or
incorporate it into your .bashrc system file (or equivalent).
The following three files are required in the working
directory:
– main.cpp, the OpenFPM client program that implements
the algorithm (refer to Methods 3.1)
– langevin.mk, a file that includes the locations of the
OpenFPM library in the particular installation (see below),
and
– Makefile with the compiler configuration (see below).
The file langevin.mk holds the locations of the
OpenFPM library in the particular installation. To create
langevin.mk, copy into the working directory the file
example.mk located in /usr/local/openfpm_pdata/include/
(where /usr/local is the default installation
directory). If the installation directory is different, replace
/usr/local with the one where the OpenFPM library was
installed:

    /working_directory$ cp /usr/local/openfpm_pdata/include/example.mk langevin.mk

Finally, create the file Makefile shown below, which
specifies the compiler configuration:

    # Recipe lines must start with a tab character.
    include langevin.mk
    CC = mpic++
    LDIR =
    OBJ = main.o

    %.o: %.cpp
    	$(CC) -O3 -c --std=c++11 -o $@ $< $(INCLUDE_PATH)

    langevin: $(OBJ)
    	$(CC) -o $@ $^ $(CFLAGS) $(LIBS_PATH) $(LIBS)

    all: langevin

    .PHONY: clean all

    clean:
    	rm -f *.o *~ core langevin

3. OpenFPM client program. The OpenFPM client program is
the C++ program, called main.cpp, that implements the simulation
algorithm with the CLE model generated as in Materials
2.1.5 (refer to Methods 3.1). To obtain the source code,
clone the repository:

    git clone https://github.com/sc2cl/population_cle.git

The repository includes the full source code of the
OpenFPM client main.cpp, together with Matlab and
Python scripts that show different usage options.

3 Methods

3.1 Define the OpenFPM Client Program main.cpp

The dynamic CLE model as defined in Materials 2.1.5 and the
computation of the long-term statistics of the species of interest are
implemented in an OpenFPM client program called main.cpp. It
contains three functions: main() with the main algorithmic steps
to execute the simulation and generate the selected output,
input_data() to read the parameters of the model from a file,
and evolve_time() to compute the system states (that is, the
number of molecules of the species of interest) at each simulation
time step δt. The last two functions are called from main(). Their
main features are given next. For further details refer to the complete
code available at https://github.com/sc2cl/population_cle.
1. Function main(). The pseudo-code with the contents of the
main() function is given below:

     1 Initialize OpenFPM library
     2 Get Arguments from input:
     3     Number of cells to simulate
     4     Variance of parametric extrinsic noise
     5     Extracellular species initial condition
     6     Output selection:
     7         Histograms for every time,
     8         Long-term histogram,
     9         Population statistics for every time,
    10         or Long-term statistics
    11 Initialize distributed random number generator
    12 Initialize variables:
    13     Variables common to all processors
    14     Distributed domain
    15 Call to the function input_data()
    16 Initialize time
    17 Allocate memory for the statistics and histograms
    18 Loop from 0 to final time:
    19     Call to the function evolve_time()
    20     Obtain population statistics or histograms for each time step:
    21         Iterate over the domain
    22         Gather distributed variables into one processor
    23         Write the output files (one per state)
    24 end loop
    25 Obtain long-term population statistics or histograms:
    26     Compute long-term statistics
    27     Write the output files
    28 Finalize the OpenFPM library

Line 1 of the main() pseudo-code is standard for any
OpenFPM client (see Example 6).

Example 6 For the QS/Fb example, the code corresponding
to the first line of the pseudo-code is

    310 // Initialize the library
    311 openfpm_init(&argc, &argv);
    312 Vcluster<> & v_cl = create_vcluster();

Line 312 initializes v_cl, the Vcluster object that
manages the parallelization and the operations between
the processors. This will be used, for instance, in Example
18 to execute sums over all the processors.

Lines 2–10 of the main() pseudo-code are standard C++
code for obtaining arguments from the command line. Refer to
the main.cpp code (Lines 314–354) in the repository for
further details.
Line 11 sets the random number generator. Refer to Note
12 to see the implementation and code details.
Lines 12–14 initialize the variables and data structures to
be used. There are two types: normal variables and distributed
ones. Normal variables are common to all processors. For
example, normal variables can be used for the extracellular
species. Distributed variables are defined as nodes in a
distributed grid using the data structure type grid_dist_id.
Every node of the grid represents an individual cell. A node
may contain several variables, such as the state variables of that
cell (e.g. the intracellular species) and the specific values for its
parameters (Line 412 in Example 7). Once the code is com-
piled and executed over several processors, the grid is decom-
posed and each processor executes actions on the subset of
nodes of the grid assigned to it [18]. This process is transparent
for the user.

Example 7 The code to create the grid with 240 cells
reads like:

    378 // Creation of the distributed grid domain
    379 Box<2, double> domain({0.0, 0.0}, {1.0, 1.0});
    380 size_t a = 10;
    381 size_t b = 24;
    394 size_t sz[2] {a, b};

Line 394 defines the size of the grid, i.e. the number of
cells, as N = a × b.

    399 // Vector for x5 (as a normal variable, not a distributed one)
    400 openfpm::vector<double> x5;

Line 400 creates an openfpm::vector variable, a normal
variable which is not distributed across the processors, in
this case for the extracellular species x5.

    411 // Create a distributed grid in 2D
    412 grid_dist_id<2, double, aggregate<double, double, double, double, double, double[23]>> g1(sz, domain, g);

Line 412 creates the distributed grid g1. The first argument
sets the grid as a two-dimensional one. The second
one specifies double precision. The type
aggregate<> groups as many double
precision variables as states in the cell and an array of
doubles with size equal to the number of model parameters
of the cell. For the QS/Fb example, there are five
doubles, one for each of the five intracellular species (see
Example 1) and one double array of size 23 (double[23])
for the parameters of each cell. To make access to
these variables more user friendly, the following lines are
defined at the beginning of the code:

    14 const size_t I = 0;
    15 const size_t R = 1;
    16 const size_t RA2 = 2;
    17 const size_t A = 3;
    18 const size_t RA = 4;
    19
    20 const size_t Parameters = 5;

These lines (Lines 14–20) define names for the variables
in each cell. To access the variables inside each cell use
g1.template get<WHO>(key), where WHO is one of
the six previously defined names (I, R, RA2, A,
RA, Parameters), and key is the identity of the cell
to be accessed. Refer to Example 10, Line 273, to see how to
obtain the key.

413 grid_dist_id<2, double, aggregate<double, double, double, double, double, double[23]>> g2(g1.getDecomposition(), sz, g);

Line 413 creates a second grid g2 with the same
characteristics as g1, the one created in Line 412. Computing
the time evolution of the system makes use of two grids:
one for the values of the current discretized time instant,
and another one for the previous one. Alternating
between these two grids yields an efficient memory
usage, as the amount of memory used is independent of
the number of time steps. See Example 8 for details.

Line 15 of the main( ) pseudo-code calls the function
input_data() to obtain the parameters of the CLE model
from the file param.dat (see the pseudo-code in Methods 3.1).
Line 16 initializes the time variables and the time step δt (refer
to Note 13 on how to select the time step, and the source code
in Lines 422–461 for the implementation). Line 17 allocates
memory for long-term statistics and histograms. Lines 18–23
implement the integration loop by calling the function
evolve_time() to numerically solve the stochastic differential
equations as an iterative solution of the CLE model in
Materials 2.1.5 (see Example 8).

Example 8 The code to implement the time step integration
of the QS/Fb CLE model reads like:

463 // Now we start a loop in time to solve the differential equations
464 for (int j = 0; j < (N-1); j++)
465 {
466
467     // j increments by 1 in each time step. Always read the first argument and write in the second, but change the positions in memory, swapping between 1 and 2.
468
469     if (j%2 == 0)
470     {
471         evolve_time(g1, g2, T, sT, x5, engine, NDist, stats_vec, Ncells, j, dAee, val4);
472     }
473     else
474     {
475         evolve_time(g2, g1, T, sT, x5, engine, NDist, stats_vec, Ncells, j, dAee, val4);
476     }

where j increments by 1 at every time step. The first
argument of the function evolve_time() is the grid to
be read, and the second one is the grid to write into.
Thus, in the even time steps, the current information is
read from g1 and the updated one written into g2, and
the other way round in the odd time steps (read from g2
and write into g1). The other arguments of this function
are: the time steps (T and sT), the vector of the
extracellular species (x5), the engine of the random number
generator (engine), the normal distribution generator
(NDist), the statistics vector (stats_vec), the number
of cells (Ncells), the step counter (j), the degradation
of the extracellular species (dAee), and the initial
condition of the extracellular species (val4).
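The read/write alternation can be illustrated without OpenFPM. The following is a minimal sketch, assuming plain std::vector buffers in place of the distributed grids and a hypothetical update rule (first-order decay) in place of the CLE update. The names step and integrate are illustrative and do not appear in the QS/Fb source code.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical update rule standing in for evolve_time():
// reads every state from `read` and writes the updated state into `write`.
void step(const std::vector<double>& read, std::vector<double>& write,
          double dt, double decay)
{
    for (std::size_t i = 0; i < read.size(); ++i)
        write[i] = read[i] - dt * decay * read[i];
}

// Runs `steps` updates while alternating the roles of two buffers, so the
// memory footprint does not depend on the number of time steps.
std::vector<double> integrate(std::vector<double> initial, int steps,
                              double dt, double decay)
{
    std::vector<double> g1 = std::move(initial);
    std::vector<double> g2(g1.size());
    for (int j = 0; j < steps; ++j) {
        if (j % 2 == 0)
            step(g1, g2, dt, decay);  // even step: read g1, write g2
        else
            step(g2, g1, dt, decay);  // odd step: read g2, write g1
    }
    // After an odd number of steps the latest values are in g2.
    return (steps % 2 == 1) ? g2 : g1;
}
```

Only two buffers are ever allocated, whatever the number of steps, which is the property the two-grid scheme relies on.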

611     // Increment the time
612     t += T;
613 }

Lines 20–23 of the main( ) pseudo-code implement a
loop over the grid domain to obtain the values of the states per
cell, and gather into one processor all the information
distributed in the other processors to write the output file
(see Lines 478–555 in the source code for details). For output
analysis purposes, the temporal evolution of the i-th cell is
stored at each multiple D of the time step δt. The storage
decimation value D depends on the number of cells N in the
population, the size of the simulation time step δt, and the
desired maximum size of the output files (see Example 9).
Refer to Note 14 on how to select D.

Example 9 One realization of the QS/Fb CLE model
was run over a simulation time of T = 800 min using a
time step δt = 0.025 min. Keeping every calculated data
point generates a total of 32,000 time points for each of
240 cells and 4 states per cell, implying about 30 million
data points in a 250 MB file. However, using D = 10,
i.e. keeping one in ten data points, yields a total of about
three million data points stored in a 26 MB file.
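The arithmetic of this example can be reproduced with a few lines of code. The sketch below is illustrative only: the function names time_points and stored_values are hypothetical, and the functions count stored values rather than bytes, since the file size depends on the output format.

```cpp
#include <cmath>
#include <cstdint>

// Number of time points produced by a simulation of length t_end with step dt.
std::int64_t time_points(double t_end, double dt)
{
    return static_cast<std::int64_t>(std::llround(t_end / dt));
}

// Number of stored values when only one in every D time points is kept.
std::int64_t stored_values(double t_end, double dt, std::int64_t cells,
                           std::int64_t states, std::int64_t D)
{
    return (time_points(t_end, dt) / D) * cells * states;
}
```

With the values above, stored_values(800, 0.025, 240, 4, 1) gives 30,720,000 values and stored_values(800, 0.025, 240, 4, 10) gives 3,072,000, in line with the rounded figures quoted in the example.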

Lines 24–26 compute the long-term statistics and write the
output file. These are standard implementations of the equations in
Materials 2.2.4c. Refer to Lines 616–694 in the source code
for further details.
Finally, Line 27 finalizes the execution, closing the library
appropriately.

2. Function input_data( ) The pseudo-code with the contents of
the input_data( ) function is given below. This function sets
the values of the model parameters for all the cells in the
population. It reads the initial values from an external file
provided by the user and modifies them by adding extrinsic
noise if so specified.

1 Read file param.dat
2 Write into parameters array
3 Iterate over all the cells:
4     Initialize the states of cell i
5     Read parameters array and add parametric extrinsic noise if required
6     Write new parameters into the cell i
7     Increment i
8 Write parameters that are external to the cells
9 Return the distributed grid with the parameters

Example 10 below shows relevant parts of the code
implementing the input_data() function for the QS/Fb case.

Example 10

267 // Create iterator for the distributed grid
268 auto dom_init = g1.getDomainIterator();
269 // Iterate
270 while (dom_init.isNext())
271 {
272     // Get the actual position from the iterator in the subdomain
273     auto key = dom_init.get();
274     // Initialize the grid
275     g1.template get<I>(key) = 0.0;

Line 268 creates the domain iterator. In each iteration,
Line 273 obtains the identity key of the current cell.
Then, key is used to assign the initial conditions of the
states (Lines 275–279) and the parameters to the
current cell:

281 for (int l = 0; l < 19; l++)
282 {
283     double aux_param = (EXTRINSIC_NOISE * enoise_sigma * NDist(en) + 1) * parameters[l];
284
285     g1.template get<Parameters>(key)[l] = aux_param;
286     g2.template get<Parameters>(key)[l] = aux_param;
287
288 }

In Line 283, when EXTRINSIC_NOISE = 1, extrinsic
noise is added to each parameter l by multiplying it by
$(1 + \sigma_e N_l)$, where $\sigma_e$ is a user-defined coefficient of
variation previously introduced as an argument to the
program in Line 1 of the pseudo-code (see Materials
2.1.2), and NDist(en) is a random number generated
with the random number generator (see Note 12). Refer
to Lines 249–305 of the source code of the QS/Fb
implementation for further details.
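The multiplicative perturbation can be sketched independently of OpenFPM. The following is a minimal illustration, assuming the C++ standard library random number facilities. The function name perturb_parameters and its signature are hypothetical and are not part of the QS/Fb source code.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Returns a per-cell copy of the nominal parameters, each multiplied by
// (1 + sigma_e * N_l), with N_l drawn independently from N(0, 1).
// When `enabled` is false the nominal values are returned unchanged.
std::vector<double> perturb_parameters(const std::vector<double>& nominal,
                                       double sigma_e, bool enabled,
                                       std::mt19937& engine)
{
    std::normal_distribution<double> ndist(0.0, 1.0);
    std::vector<double> out(nominal.size());
    for (std::size_t l = 0; l < nominal.size(); ++l) {
        const double factor = enabled ? (1.0 + sigma_e * ndist(engine)) : 1.0;
        out[l] = factor * nominal[l];
    }
    return out;
}
```

Note that with a large coefficient of variation a normal draw can make the factor negative. This sketch does not guard against that case.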

3. Function evolve_time( ) The pseudo-code with the contents
of the evolve_time( ) function is given below. This function
updates the states of the system at each discrete-time step.

1 Initialize variables (states and statistics)
2 Create a domain iterator
3 Execute until domain is covered
4     Read parameters for the cell i
5     Calculate propensities of cell i using parameters and previous values of the states
6     Generate random numbers to include noise
7     Calculate the deterministic and stochastic terms
8     Compute the new values of the states of cell i
9     Compute the algebraic restrictions
10    Append the contribution of cell i to the extracellular species
11    Append the contribution of cell i to the population mean calculation
12    Increment the iterator
13 Compute the sum of the partial extracellular species and means over all the processors
14 Compute the new values of the extracellular species
15 Create a new domain iterator
16 Execute until domain is covered
17    Append the contribution of cell i to the population variance calculation
18    Increment the iterator
19 Compute the sum of the partial variances over all the processors
20 Return the updated statistics and extracellular species

Lines 1–3 of the pseudo-code are equal to the ones in
Example 10 (Lines 267–273). Line 4 saves into local variables
the parameters of the i-th cell obtained from the grid, as seen in
Example 11 for the QS/Fb case.

Example 11 Saving the value of the first parameter
(degradation rate of the species I) of the key cell in
the variable dI.

62 // Copy parameters from the distributed grid variable Parameters into the named variables
63
64 // I parameters
65 double dI = g_dist_read.template get<Parameters>(key)[0];

Line 5 calculates the propensities as shown in Example 12.

Example 12

94 // Propensities
95 // I
96 double X11 = dI * g_dist_read.template get<I>(key);
97 double c1 = (pI * kI * pN_I) / dmI;
98 double c2 = 1/(kdLux + g_dist_read.template get<RA2>(key));
99 double X12 = c1 * c2 * (kdLux + alphaI * g_dist_read.template get<RA2>(key));

Note that all these operations are performed on
g_dist_read, which is the grid containing the number
of molecules of the species in the previous discrete-time
instant.

Line 6 generates random numbers to incorporate the noise
terms (see Note 12 and Example 13).

Example 13 To generate random numbers, the code for
the QS/Fb example reads like:

118 // Noises
119 double x1_noise1 = NDist(en);
120 double x1_noise2 = NDist(en);

Each time the function NDist(en) is called, it
generates a new independent random number.

138 x4_noise_difu = 0;

Line 138 shows how to make a reaction unaffected by
intrinsic noise by simply setting the corresponding
noise variable to zero.

Line 7 calculates the deterministic and stochastic terms
taking into account the stoichiometry (see Example 14).

Example 14 The code in the QS/Fb CLE model for the
species I in the key cell reads like:

141 // Deterministic and stochastic terms with stoichiometry included
142 double x1_det = T*(-X11 + X12);
143 double x1_sto = sT*(-sqrt(std::abs(X11))*x1_noise1 + sqrt(std::abs(X12))*x1_noise2);

Line 8 computes the updated value of the number of
molecules of the species of interest by adding the deterministic
and stochastic terms to the number of molecules in
the previous discrete-time instant (see Example 15).

Example 15 The code in the QS/Fb CLE model for the
species I in the key cell reads like:

154 // Compute new value
155 g_dist_write.template get<I>(key) = g_dist_read.template get<I>(key) + x1_det + x1_sto;

Recall that between consecutive calls to the function
evolve_time() the grids g1 and g2 are alternately
assigned to g_dist_read and g_dist_write, as
mentioned in Example 8.

Line 9 calculates the algebraic constraint using the new
state values previously computed (see Example 16).

Example 16 The code that implements the algebraic
relationship giving the species R.A ($n_6^i$) as a function of
$n_2^i$, $n_3^i$, $n_4^i$ (R, (R.A)2, A) for the i = key cell in the
QS/Fb CLE model (see Example 4) is:

173 // Algebraic constraints
174 double c6 = 2*k_2*kd1*g_dist_write.template get<RA2>(key) + k_1*g_dist_write.template get<R>(key)*g_dist_write.template get<A>(key);
175 double c7 = 8*k_2*c6;
176 double c8 = k_1 + dRA;
177 double c9 = (kd1*kd2)*c8*c8;
178 double c10 = (kd2*c8)/(4*k_2);
179 double c11 = c7/c9 + 1;
180 g_dist_write.template get<RA>(key) = c10*(sqrt(std::abs(c11)) - 1); // This is R.A

Line 10 adds the contribution of the i-th cell to the value of the
extracellular species and Line 11 adds the contribution of the
i-th cell to the variable accounting for the mean number of
molecules of the intracellular species (see Example 17).

Example 17 The code in the QS/Fb model obtaining
the updated number of molecules for the extracellular
species Ae is:

182 // Write the partial value of Ae from the present cell.
183 tot_A_partial += T * (-1) * X44 + sT * (-1) * sqrt(std::abs(X44)) * x4_noise_difu;
184
185 // Add to the mean the term corresponding to the present cell.
186 x1mean += g_dist_write.template get<I>(key) / Ncells;

Line 186 implements a partial calculation of the mean of x1 in the following way:

$$x_{1\mathrm{mean}} = \frac{1}{N}\sum_{k=1}^{N} x_1^k$$

where N is the total number of cells and $x_1^k$ is the number of molecules of the species x1
(PI) in the k-th cell. Recall this code is actually executed in several processors at the same
time. For example, considering a hypothetical distribution of N cells over two
processors:

$$P_1 = \{\mathrm{cell}_i : i = 1, 2, \ldots, M\}, \qquad P_2 = \{\mathrm{cell}_i : i = M+1, M+2, \ldots, N\},$$

where $P_q$ is the q-th processor, and M < N. Then, the calculation of $x_{1\mathrm{mean}}$ becomes

$$x_{1\mathrm{mean}} = \overbrace{\frac{1}{N}\sum_{k=1}^{M} x_1^k}^{P_1} + \overbrace{\frac{1}{N}\sum_{k=M+1}^{N} x_1^k}^{P_2}$$

When the iterator is in the j-th cell, with j ≤ M, the calculation is executed in
processor $P_1$, so the partial calculation of $x_{1\mathrm{mean}}$ is

$$x_{1\mathrm{mean}}(P_1, \mathrm{cell}_j) = \frac{1}{N}\sum_{k=1}^{j} x_1^k = \frac{1}{N}\sum_{k=1}^{j-1} x_1^k + \frac{1}{N} x_1^j = x_{1\mathrm{mean}}(P_1, \mathrm{cell}_{j-1}) + \frac{1}{N} x_1^j$$

On the contrary, when the iterator is in the j-th cell, with M < j ≤ N, the calculation
is executed in processor $P_2$, and the partial calculation of $x_{1\mathrm{mean}}$ is

$$x_{1\mathrm{mean}}(P_2, \mathrm{cell}_j) = \frac{1}{N}\sum_{k=M+1}^{j} x_1^k = \frac{1}{N}\sum_{k=M+1}^{j-1} x_1^k + \frac{1}{N} x_1^j = x_{1\mathrm{mean}}(P_2, \mathrm{cell}_{j-1}) + \frac{1}{N} x_1^j$$

After the iterator covers the whole grid of N cells, each processor has finished its partial
calculation of $x_{1\mathrm{mean}}$:

$$x_{1\mathrm{mean}}(P_1) = \frac{1}{N}\sum_{k=1}^{M} x_1^k$$

$$x_{1\mathrm{mean}}(P_2) = \frac{1}{N}\sum_{k=M+1}^{N} x_1^k$$

The mean value corresponding to the whole population
of N cells is obtained as the sum of the distributed
partial values gathered from all the processors (two
processors in the example):

$$x_{1\mathrm{mean}} = x_{1\mathrm{mean}}(P_1) + x_{1\mathrm{mean}}(P_2)$$

The corresponding software code is shown in
Example 18.
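The decomposition of the mean into per-processor partial sums can be checked with a serial sketch. The code below is illustrative only: it emulates the processors with plain vectors instead of the parallel reduction used in the QS/Fb code, and the names partial_mean and population_mean are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Partial mean accumulated by one processor: each of its cells contributes
// x / N, where N is the size of the whole population (not of the subset).
double partial_mean(const std::vector<double>& local_cells, std::size_t N)
{
    double acc = 0.0;
    for (double x : local_cells)
        acc += x / static_cast<double>(N);
    return acc;
}

// Population mean recovered as the sum of the partial means of all processors.
double population_mean(const std::vector<std::vector<double>>& per_processor)
{
    std::size_t N = 0;
    for (const auto& cells : per_processor)
        N += cells.size();
    double mean = 0.0;
    for (const auto& cells : per_processor)
        mean += partial_mean(cells, N);
    return mean;
}
```

Dividing each contribution by the total N, rather than by the local number of cells, is what allows the partial values to be combined with a plain sum.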

Line 12 in the pseudo-code of evolve_time( ) increments
the iterator and the previous code is executed for all the
cells. Once the iterator reaches the end, Line 13 computes the
mean number of molecules of the intracellular species across the
population by summing the partial results obtained in each of the
computing processors. To this end, the functions sum and
execute are used as shown in Example 18 (Lines 197–201). Line
13 of the pseudo-code also computes the number of molecules
of the extracellular species as the sum of the values obtained in
each of the distributed processors (see Example 18, Line 196).

Example 18 Aggregating the partial mean values of the
species in the QS/Fb example:

195 // Execute the sum of the means and A_partial over all the processors
196 v_cl.sum(tot_A_partial);
197 v_cl.sum(x1mean);
198 v_cl.sum(x2mean);
199 v_cl.sum(x3mean);
200 v_cl.sum(x4mean);
201 v_cl.execute();

Line 14 computes the new values of the extracellular species (see
Example 19).

Example 19 Parts of the code computing the updated
value of the number of molecules of the extracellular species
Ae in the QS/Fb example:

235 // Noise for x5
236 double x5_noise = NDist(en);
237 // Stochastic part of x5, x5.last() is the previous value of x5
238 double x5_sto = sT*(-sqrt(std::abs(dAe*x5.last()))*x5_noise);
239 // Calculate the new x5 value
240
241 double x5to_add = x5.last() + T*(-dAe*x5.last()) + x5_sto + tot_A_partial;
242
243 // Add the last calculated value to the x5 vector
244 x5.add(x5to_add);

Lines 15–19 perform the same operations: creation of an
iterator, iteration over the domain, calculation of the partial
variance over the population, and sum of the partial variances once
the domain is covered, as in Examples 17–18. See Lines
203–223 in the source code for details about the
implementation.

3.2 Compilation Next, compile the OpenFPM client program main.cpp with the
command make. The result of the compilation is the executable
program langevin. For a successful compilation it is mandatory to
have both the langevin.mk and Makefile files in the working
directory with the compiler configuration mentioned in Materials
2.3.

3.3 Simulation Stochastic simulations of the dynamic model are obtained by
executing the program langevin (the compiled main.cpp). A
stand-alone parallel execution of the langevin program can be
run as follows:

1 mpirun -np 4 ./langevin param.dat 240 0.1 0 1 0 1 0

where -np 4 sets four processor cores to run the parallel simulation,
./langevin is the name of the executable program, and
param.dat is the input file, a CSV text file with the
nominal values of the parameters, ordered and separated by
commas. The first three numeric arguments correspond to the number
of cells to be simulated (240), the user-defined coefficient of
variation for the extrinsic noise (0.1), and the initial number of
molecules of the extracellular species (0). The last four remaining
arguments configure:
– Simulation with intrinsic noise (1) or deterministic simulations
(0),
– Long-term population histograms (1) or not (0). See Example
20, right part of the plot.
– Population statistics (mean and variance) at every time step
(1) or not (0). See Example 21.
– Temporal response of all cells at time step (1) or not (0). See
Example 20.
An execution of langevin returns by default the long-term
population statistics in the output file output.dat, see Example
22. Additionally, the file param.dat can be created and the
langevin program executed in different
ways. See Note 15 for a parametric sweep performed in
MATLAB®. See Note 16 for a Python script to start an execution.

Example 20 N = 240 cells were simulated to obtain trajectories
and histograms of the intracellular species for the QS/Fb system.
Each species has a normalized endpoint distribution computed
over one realization of the CLE model, including intrinsic and
extrinsic noise. All endpoint histograms show a well-shaped
normal distribution, and they differ only in their means and noise
strengths. The mean and standard deviation (μ ± σ) of each species
were computed using the last third of the data of each species in the
whole population to avoid the effect of the transient.

[Figure: for each intracellular species (PI, R, (R.A)2, and A), the trajectories of the number of molecules versus Time [min] (left) and the endpoint histogram versus Frequency (right).]

Example 21 Population statistics at each time step comparing stochastic and deterministic
results of the QS/Fb CLE model are depicted below. This is a single realization computed
over 800 min for the four intracellular species considering a population of N = 240 cells.
The stochastic (solid line) and deterministic (dashed line) results are two independent simulations
run under the same initial conditions. The average numbers of molecules of each species
obtained in both simulations closely match.

[Figure: number of molecules versus Time (min) for PoI/LuxI, LuxR, (LuxR.AHL)2, and AHL. Legend: Deterministic, Stochastic (Mean), Stochastic (Variance).]

Example 22 Long-term population statistics for the QS/Fb system output (n1
(PI)) using different sets of model parameters are illustrated below. Quorum sensing
(orange dots) in the QS/Fb system reduces the PoI/LuxI noise strength. In the case without
quorum sensing (purple dots), the diffusion of AHL molecules is inhibited and
the PoI/LuxI noise strength increases.

4 Notes

1. Chemical Langevin Equation The general form of the
stochastic differential Chemical Langevin Equation is

$$dn(t) = S \cdot a(n)\,dt + S \cdot \sqrt{a(n)}\,dW \qquad (6)$$

where n(t) is a vector containing the number of molecules of
each species in the model, $S \in \mathbb{R}^{K \times J}$ is the stoichiometry matrix,
where K is the number of species and J the number of biochemical
reactions, $a(n) \in \mathbb{R}^{J \times 1}$ is the vector of propensities containing
the reaction kinetics, and dW are scalar independent Brownian
motions associated to each reaction [16]. The first term on the
right-hand side of the equation corresponds to the deterministic
kinetics, also called the macroscopic drift term within this stochas-
tic context. The second term, accounting for intrinsic noise, is the
so-called diffusion term. Notice the deterministic drift term grows
as the size of the system (i.e. the number of molecules), while the
diffusion term grows as the square root of the size of the system.
Therefore, the relative weight of the stochastic term with respect
to the deterministic one scales as the inverse square root of the size
of the system. That is, as the number of molecules of the species
increases, the solution of the SDE (6) will approach that of the
deterministic model in the sense that the fluctuations around the
deterministic solution will have less relative size.
The Euler–Maruyama discretization method [15] can be
used for generating sample paths of the stochastic process driven
by the CLE. It describes the temporal evolution of the number
of molecules of each biochemical species in the system as:

$$n(t + \delta t) = n(t) + S \cdot a(n)\,\delta t + S \cdot \mathcal{N} \cdot \sqrt{a(n)}\,\sqrt{\delta t} \qquad (7)$$

where $\mathcal{N}(0, 1) \in \mathbb{R}^{J \times J}$ is a diagonal matrix containing J statistically
independent normal random variables, and δt is the
discretization time step.
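One Euler–Maruyama step of the CLE can be sketched for a generic stoichiometry matrix as follows. This is a minimal illustration, not the QS/Fb implementation: the function name euler_maruyama_step is hypothetical, and the normal samples are passed in as an argument so that the caller chooses the random number generator.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One Euler-Maruyama step of the CLE.
//   n  : current number of molecules of the K species
//   S  : K x J stoichiometry matrix, S[k][j]
//   a  : J propensities evaluated at n
//   xi : J independent N(0, 1) samples, one per reaction
//   dt : discretization time step
std::vector<double> euler_maruyama_step(const std::vector<double>& n,
                                        const std::vector<std::vector<double>>& S,
                                        const std::vector<double>& a,
                                        const std::vector<double>& xi,
                                        double dt)
{
    std::vector<double> next(n);
    const double sqrt_dt = std::sqrt(dt);
    for (std::size_t k = 0; k < n.size(); ++k) {
        for (std::size_t j = 0; j < a.size(); ++j) {
            const double drift = S[k][j] * a[j] * dt;
            // std::abs keeps the square root defined if a propensity turns
            // slightly negative, as in the QS/Fb listings.
            const double diffusion =
                S[k][j] * std::sqrt(std::abs(a[j])) * xi[j] * sqrt_dt;
            next[k] += drift + diffusion;
        }
    }
    return next;
}
```

For a birth-death process with S = [1, -1] and equal propensities, the drift cancels and only the noise terms move the state, which gives a quick sanity check of the implementation.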
2. Selecting the size N of the population of cells Recall this
protocol assumes N is constant throughout the simulation. To
estimate the population size N that gives statistically
correct results at minimum computational cost, run a set of
simulations changing the size of the population of cells and the
culture volume while keeping the cell density constant (see Note 3),
and evaluate the effect of these changes on the statistical information
of interest (e.g. noise strength). Simulations at different OD
values can assess the potential effect of cell density on the system behavior
(e.g. by affecting cell-to-cell communication mechanisms) (see
Example Note 1).

Example Note 1
The next figure shows the results obtained for the
QS/Fb circuit when comparing the noise strength of
protein n1 (PI) at the different OD600 values defined in
the table below. (A) Noise strength does not appreciably
change for OD ∈ [0.005, 5], obtained either by
changing the volume Vext and keeping the cell number
N = 240 (blue squares) or by changing both N and
Vext (green squares). (B) Noise strength for different
N and Vext keeping constant OD600 = 0.3.

N fixed
N (cells)    240     240     240     240     240     240
Vext (μL)    0.06    0.03    0.006   0.003   0.0006  0.0003
OD600        0.005   0.01    0.05    0.1     0.5     1

Variable N and Vext
N (cells)    240     240     1200    2400    4800    12000
Vext (μL)    0.03    0.006   0.015   0.006   0.006   0.003
OD600        0.01    0.05    0.1     0.5     1       5

OD fixed
N (cells)    240     1200    2400    4800    12000
Vext (μL)    0.001   0.005   0.01    0.02    0.05
OD600        0.3     0.3     0.3     0.3     0.3

3. Relating the cell population size N and optical density
Optical density (OD) is a dimensionless measurement
commonly used to estimate the concentration of bacterial or other
cells in a liquid culture [36]. Typically, the OD of a cell sample
is measured at a wavelength of 600 nm (OD600). Its value
depends on the number of cells N and the volume of the
culture Vext as:

$$\mathrm{OD} = N \cdot \frac{1}{V_{ext}} \cdot \frac{1}{N_{OD=1}}$$

where $N_{OD=1}$ is the quantity of cells contained in one
volumetric unit of culture when the optical density is OD = 1 (see
Example Note 2).

Example Note 2
Considering that $N_{OD=1} = 8 \times 10^5$ is the quantity of
cells contained in 1 μL of bacterial culture when the
OD is 1 (Source: Agilent, E. coli Cell Culture
Concentration from OD600 Calculator) and
$V_{ext} = 1 \times 10^{-3}$ μL as a typical culture volume in a
microfluidic device, we need N = 240 cells to simulate
a scenario corresponding to OD = 0.3.
4. Dealing with heterogeneous cells In case there are several
populations of cells, with different sets of intracellular reactions
and species but sharing the extracellular species and reactions,
extend the model by creating one grid per group of different
cells (see Example 7 in Methods 3.1).
5. Reaction propensities The reaction propensities a(n) in Eq. 6
can be obtained by applying the mass action kinetics formalism
[10]. The law of mass action states that the rate of a chemical
reaction is proportional to the product of the reactant
concentrations, each raised to a power given by the stoichiometry of
the reaction. If one of the required reactants is lacking, the
reaction will not take place. The reaction proceeds faster as the
concentration of the required reactants (also called substrates)
increases. The proportionality to the product of the reactant
concentrations accounts for the probability of encounter
(collision) among the reactants. Thus, the
rationale behind mass action kinetics is that the rate at which a
reaction proceeds is proportional to the probability that the
required reactants encounter each other. This probability, in turn, is
proportional to the product of their concentrations. Consider a
system with m species and a reaction Rj relating them:

$$R_j:\quad p_{j1} n_1 + \ldots + p_{jm} n_m \;\underset{k_j^-}{\overset{k_j^+}{\rightleftharpoons}}\; q_{j1} n_1 + \ldots + q_{jm} n_m$$

where $p_{ji}$, $q_{ji}$ are the consumption and production stoichiometry
coefficients for the i-th species, and $k_j^+$, $k_j^-$ the specific
reaction rates of the forward and reverse reactions, respectively.
Define the net stoichiometry coefficient $r_{ji} = q_{ji} - p_{ji}$. According
to the mass action formalism, reaction Rj will contribute to
the deterministic dynamics of the i-th species, ni, as:

$$\dot{n}_i(t) = r_{ji}\,k_j^+ \prod_{s=1}^{m} n_s^{p_{js}} - r_{ji}\,k_j^- \prod_{s=1}^{m} n_s^{q_{js}} + \ldots$$

where the dots indicate that other reactions will contribute with analogous
terms. The functions $a_j^+(n) = k_j^+ \prod_{s=1}^{m} n_s^{p_{js}}$ and
$a_j^-(n) = k_j^- \prod_{s=1}^{m} n_s^{q_{js}}$ are the propensity functions corresponding to the
forward and reverse j-th reactions.
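A mass-action propensity is a rate constant multiplied by a product of powers, so it can be evaluated with a short loop. The function below is a hypothetical sketch of that calculation and is not taken from the QS/Fb code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Mass-action propensity a(n) = k * prod_s n_s^{p_s}, where p_s is the
// stoichiometric coefficient with which species s is consumed.
double mass_action_propensity(double k, const std::vector<double>& n,
                              const std::vector<int>& p)
{
    double a = k;
    for (std::size_t s = 0; s < n.size(); ++s)
        a *= std::pow(n[s], p[s]);
    return a;
}
```

A species that does not take part in the reaction has coefficient 0 and contributes a factor of 1, while a required reactant with zero molecules makes the whole propensity vanish, as stated above.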
6. Model reduction Direct application of mass action kinetics to
the set of reactions may result in dynamic models with many
states (biochemical species) and model parameters. Model
reduction yields a model with fewer variables and, thus, fewer
first-order differential equations, i.e. a lower order. There are some
advantages in reducing a dynamic differential model:
– Large order models have many parameters (i.e. specific
reaction rates). The values of these parameters must be obtained
using experimental data related to the corresponding
reactions. The experimental difficulties and computational cost
of the parameter estimation process increase with the
number of parameters.
– In practice, there are reactions that proceed at much faster
rates than others. This means that there are very different
time scales associated to each reaction. The large differences
in the time scales among the different species in the reac-
tions network originate huge difficulties for simulating the
temporal evolution of the circuit species and for understand-
ing the basic principles of its operation.
The reduction process should yield a model more amena-
ble for computational analysis, but avoiding excessive reduc-
tion that would lead to lack of biological relevance. In
particular, the species in the reduced model must not be
lumped ones. The resulting lumped parameters in this reduced
model must be easy to associate to experimental tuning knobs.
Model reduction can be carried out by means of the Quasi
Steady-State Approximation (QSSA) of the fast chemical spe-
cies. In essence QSSA is a singular perturbation method
[21, 22] that considers the time-scale separation among the
different dynamics [25, 42]. In particular, a common assump-
tion is that binding reactions of transcription factors to gene
promoters occur very fast in comparison with those
corresponding to transcription, translation, and degradation.
Additional algebraic relationships among variables can be
obtained through system invariants. In the case of reaction
networks, it can be observed that some reactions are a linear
combination of other ones. Then, the linear combination of
the number of molecules (alternatively, concentrations) of the
species involved will remain constant in time. These linear
combinations, so-called moieties, can be understood as a kind of
quasi-species that remain invariant, i.e. keep a constant number of
molecules (see Example Note 3). The reduced order models
can be expressed as a reduced set of equivalent
pseudo-reactions with lumped functional propensities (see Example
Note 4.1).

Example Note 3
The set of reactions below represents the conversion
of the substrate X2 into the product X4 catalyzed by the
enzyme X1:

$$X_1 + X_2 \;\underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}}\; X_3$$

$$X_3 \xrightarrow{k_2} X_1 + X_4$$

$$X_4 \xrightarrow{d_4} \emptyset$$

where X3 is the intermediate substrate-enzyme complex.
Application of the mass action kinetics gives the dynamic
balances for the four species in the system:

$$\dot{x}_1 = k_{-1} x_3 - k_1 x_1 x_2 + k_2 x_3$$
$$\dot{x}_2 = k_{-1} x_3 - k_1 x_1 x_2$$
$$\dot{x}_3 = -k_{-1} x_3 + k_1 x_1 x_2 - k_2 x_3$$
$$\dot{x}_4 = k_2 x_3 - d_4 x_4$$

Assuming the association–dissociation reaction
between the enzyme X1 and the substrate X2 to produce
the intermediate complex X3 is much faster than the
other reactions, we can apply the quasi-steady-state
assumption:

$$\dot{x}_2 \approx 0 \;\Rightarrow\; x_3 = \frac{k_1}{k_{-1}}\, x_1 x_2$$

On the other hand, the sum of the first and third
equations in the dynamic balances is zero. That is, the
sum of free and ligated enzyme is constant, equal to the
total amount of enzyme in the system:

$$\dot{x}_1 + \dot{x}_3 = 0 \;\Rightarrow\; x_1 + x_3 = c$$

From the expressions above, one has the reduced
order dynamics:

$$\dot{x}_4 = \frac{k_2\, c\, x_2}{\dfrac{k_{-1}}{k_1} + x_2} - d_4 x_4$$
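The step from the quasi-steady-state relation and the conservation law to the reduced dynamics can be written out explicitly. The following is a sketch of that algebra using only the symbols of this example, and assuming that $k_1$ and $k_{-1}$ denote the association and dissociation rates of the enzyme-substrate complex:

```latex
\begin{aligned}
x_3 &= \frac{k_1}{k_{-1}}\,x_1 x_2, \qquad x_1 = c - x_3 \\
x_3 &= \frac{k_1}{k_{-1}}\,(c - x_3)\,x_2
\;\Rightarrow\;
x_3\left(1 + \frac{k_1}{k_{-1}}\,x_2\right) = \frac{k_1}{k_{-1}}\,c\,x_2 \\
x_3 &= \frac{c\,x_2}{\dfrac{k_{-1}}{k_1} + x_2}
\;\Rightarrow\;
\dot{x}_4 = k_2 x_3 - d_4 x_4
          = \frac{k_2\,c\,x_2}{\dfrac{k_{-1}}{k_1} + x_2} - d_4 x_4
\end{aligned}
```

The ratio $k_{-1}/k_1$ plays the role of the half-saturation constant: when $x_2$ equals it, the production term is half of its maximum value $k_2 c$.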

7. Pseudo-reactions and lumped propensities The reduced
order models can be expressed as a reduced set of equivalent
pseudo-reactions with associated lumped functional
propensities (see Example Note 4.1). Pseudo-reactions may also be
used to represent physical diffusion processes (see Example
Note 4.2).

Example Note 4.1 Consider the reduced order dynamics
of the product obtained in Example Note 3:

$$\dot{x}_4 = \frac{k_2\, c\, x_2}{\dfrac{k_{-1}}{k_1} + x_2} - d_4 x_4$$

One may consider an equivalent pseudo-reaction
such that application of mass action kinetics to it will
produce the dynamics above:

$$X_2 \xrightarrow{f(x_2)} X_2 + X_4$$

$$X_4 \xrightarrow{d_4} \emptyset$$

where $f(x_2) = \dfrac{k_2\, c\, x_2}{\dfrac{k_{-1}}{k_1} + x_2}$. Notice f(x2) can be considered as a
lumped propensity function.
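In code, a lumped propensity is just a function of the current state. The sketch below evaluates f(x2) for the enzymatic example. The function name lumped_propensity and its argument order are hypothetical.

```cpp
// Lumped propensity f(x2) = k2 * c * x2 / (k_minus1 / k1 + x2) of the
// pseudo-reaction X2 -> X2 + X4 obtained after model reduction.
double lumped_propensity(double x2, double k1, double k_minus1,
                         double k2, double c)
{
    const double K = k_minus1 / k1;  // ratio of dissociation to association rates
    return k2 * c * x2 / (K + x2);
}
```

The function is zero when no substrate is present and saturates at k2 * c for large x2, so it can replace the three original reactions wherever a single propensity value is needed.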

Example Note 4.2
Consider the diffusion process of the species Ai, Ae in
Example 2 across the cell membrane. Although it is a physical
process, it can be modeled by means of the
pseudo-reaction:

$$A_i \;\underset{D V_c}{\overset{D}{\rightleftharpoons}}\; A_e$$

where Vc is the ratio between the cell volume and the
extracellular one. The corresponding propensity terms
(see Example 2) will generate the appropriate terms in
the dynamic balances:

$$\dot{n}_4^i = D V_c\, n_5 - D\, n_4^i + \ldots$$
$$\dot{n}_5 = -N D V_c\, n_5 + D \sum_{i=1}^{N} n_4^i + \ldots$$

8. Statistical validation of the lumped propensities. In Note 4,
lumped propensity functions are obtained as a result of consider-
ing reaction invariants and the dependence of slow reactions on
fast ones. The use of higher-order terms in stochas-
tic simulation is justified from the point of view of the compu-
tational implementation [8, 31]. Usually, stochastic algorithms
treat all the reaction events alike, spending the great majority of
their time simulating the many relatively uninteresting fast
reaction events rather than explicitly simulating only the slow reactions.
Nevertheless, statistical validation of the high-order functional
propensities can be done by simulating the associated pseudo-
reaction using the CLE approach, and then comparing this
result with the one obtained by simulating the set of
corresponding original reactions using Gillespie's direct
method SSA [5, 41]. To this end, for the relevant species
involved in the associated pseudo-reaction, obtain Box-and-
Whisker plots of the SSA and CLE realizations. Perform a
Kruskal–Wallis test [23] to test whether there is any statistically signif-
icant difference between their medians (see Example Note 5).
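The comparison step above can be sketched in a few lines of Python. This is a minimal, standard-library implementation of the Kruskal–Wallis H statistic with tie correction, written for illustration only; the function name is an assumption, the p-value uses the chi-square approximation with closed forms for one or two degrees of freedom only, and the two samples are synthetic stand-ins for SSA and CLE realizations.

```python
import math
import random


def kruskal_wallis(*groups):
    """Return (H, p) for the Kruskal-Wallis test with tie correction.
    The p-value uses the chi-square approximation, implemented here
    only for 2 or 3 groups (1 or 2 degrees of freedom)."""
    data = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n = len(data)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and data[j][0] == data[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[data[k][1]] += avg_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    h = (12.0 / (n * (n + 1))
         * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
         - 3.0 * (n + 1))
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0.0:
        raise ValueError("all observations are identical")
    h = max(h / correction, 0.0)
    df = len(groups) - 1
    if df == 1:
        p = math.erfc(math.sqrt(h / 2.0))     # chi-square survival, 1 dof
    elif df == 2:
        p = math.exp(-h / 2.0)                # chi-square survival, 2 dof
    else:
        raise ValueError("p-value implemented only for 2 or 3 groups")
    return h, p


# Synthetic stand-ins for the SSA and CLE samples (illustrative values only).
rng = random.Random(0)
ssa_like = [rng.gauss(127.0, 11.0) for _ in range(200)]
cle_like = [rng.gauss(127.0, 11.0) for _ in range(200)]
h_stat, p_value = kruskal_wallis(ssa_like, cle_like)
print(f"H = {h_stat:.4f}, p = {p_value:.4f}")
```

In practice, a statistics package would normally be used instead; the hand-written version only makes the ranking and tie-correction steps explicit.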

Example Note 5
In the QS/Fb case, there is one lumped propensity:
the Hill-like function f(n3) modeling the repressible pro-
moter PI/(R.A)2. Transcription and degradation of PI
can be described (see Example 2) using the equivalent set
of pseudo-reactions:
$$(R \cdot A)_2^i \xrightarrow{f(n_3^i)} mP_I^i + (R \cdot A)_2^i$$
$$mP_I^i \xrightarrow{d_{mI}} \emptyset$$
The original set of reactions is
$$gP_I \xrightarrow{C_I} gP_I + mP_I$$
$$gP_I + (R \cdot A)_2 \underset{k_{lux}}{\overset{k_{lux}/k_{dlux}}{\rightleftharpoons}} gP_I \cdot (R \cdot A)_2$$
$$gP_I \cdot (R \cdot A)_2 \xrightarrow{\alpha_I C_I} gP_I \cdot (R \cdot A)_2 + mP_I$$
$$mP_I \xrightarrow{d_{mI}} \emptyset$$
where $f(n_3^i) = \frac{C_I p_I}{d_{mI}} \left( \frac{k_{dlux} + \alpha_I n_3^i}{k_{dlux} + n_3^i} \right)$.
To validate the propensity function f(n3), both sets of
reactions were simulated. For a single cell (i = 1) and
under the same conditions, the set of pseudo-reactions
was run using the CLE, and the original ones using the
Gillespie direct method (SSA).
For one realization, the figure below shows how the
CLE trajectory (A right) closely matches the SSA one
(A left) during the whole simulation period. Both SSA
and CLE trajectories have similar distributions with small
differences between their first statistical moments
($\mu_{SSA} \approx \mu_{CLE}$ and $\sigma_{SSA} \approx \sigma_{CLE}$) (see B). The noise
strength of mRNApoI/luxI for the SSA distribution
($\eta^2_{SSA} = 0.008$) closely matches that of the
CLE ($\eta^2_{CLE} = 0.0072$).
Subplot C shows the Box-and-Whisker plots for the
messenger RNA of poI/luxI in both SSA and CLE reali-
zations. Their medians (red line) are practically the same
($\tilde{n}_{1,SSA} = 127.7$ and $\tilde{n}_{1,CLE} = 126.1$ molecules). The Krus-
kal–Wallis test [23] reveals that there is no statistically
significant difference between their medians at the 95.0%
confidence level ([test statistic, p-value] =
[$2.09067 \times 10^{-6}$, 1.0]).

[Figure: (A) SSA (left) and CLE (right) trajectories versus Time [min]; (B) histograms of Normalized Counts; (C) Box-and-Whisker plots of the SSA and CLE realizations.]

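9. Kronecker product. If A is an $m \times n$ matrix and B is a $p \times q$
matrix, the Kronecker product $A \otimes B$ is the $mp \times nq$ matrix:
$$A \otimes B = \begin{bmatrix} a_{11} B & \cdots & a_{1n} B \\ \vdots & \ddots & \vdots \\ a_{m1} B & \cdots & a_{mn} B \end{bmatrix}$$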
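As a small worked instance of this definition, the Python sketch below builds the Kronecker product of two matrices stored as nested lists. The function name is an illustrative choice; numerical libraries provide an equivalent routine that would normally be preferred.

```python
def kron(a, b):
    """Kronecker product of an m x n matrix a and a p x q matrix b,
    both given as lists of rows; the result has m*p rows and n*q columns."""
    m, n = len(a), len(a[0])
    p, q = len(b), len(b[0])
    # Entry (i, j) of the result is a[i // p][j // q] * b[i % p][j % q].
    return [[a[i // p][j // q] * b[i % p][j % q] for j in range(n * q)]
            for i in range(m * p)]


A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]
for row in kron(A, B):
    print(row)
```

Each block of the result is a copy of B scaled by one entry of A, which is exactly the block structure of the matrix above.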
10. Number of realizations. Checking ergodicity. Stochastic
simulations have a high computational cost. Hence, it may be
very useful to check whether one realization of the CLE model for the
whole population is enough to characterize long-term
statistics such as the mean, variance, or noise strength of each
biochemical species. To this end, perform several realizations
of the CLE model. For each realization, select the values at the
last time point of each species for every i-th cell. Use a
MANOVA test ([d,p,stats] = manova1([A I R RA2], Realization)
in Matlab) to determine whether there are differ-
ences in the means of the species among the different
realizations. The results of the MANOVA test include d, an
estimate of the dimension of the space containing the group
means. If d = 0, there is no statistically significant evidence to reject the
hypothesis that the realizations have the same mean, with
p-value p and the test statistic Wilks' lambda (stats.
lambda). In addition, it is interesting to evaluate whether the Maha-
lanobis distance between the means of each realization is close
to zero (values of the elements of stats.gmdist). The MANOVA
test, however, assumes that the variables are normally distributed.
The non-parametric Kruskal–Wallis test can be performed for
each of the species separately to assess whether the data of each
realization come from the same distribution (see Example
Note 6).

Example Note 6
In the QS/Fb system, three realizations were per-
formed with the same set of parameters and conditions
for a population of N cells. The steady-state portion of
each species of interest was selected for every i-th cell.
The time average was obtained over this steady-state
time window, resulting in an averaged number of mole-
cules of each species per cell. The figure below depicts the
matrix scatter plot for the three realizations (using a
different color for each one). Notice that the distributions
for all four species are unimodal and well shaped. A
MANOVA analysis shows no statistically significant evidence to
reject the hypothesis that the three realizations have the
same mean and variance, with p-value = [0.5961,
0.6730] and Wilks' lambda λ = [0.9910, 0.9978]. In
addition, the Mahalanobis distance between the means
of each realization is close to zero (DM = [0.0132,
0.0320, 0.0363]). This analysis confirms that one reali-
zation of the simulation of a population with
N interconnected cells over a long enough simulation time
provides representative long-term moments of the
population.
Since the MANOVA test assumes normality, the
Kruskal–Wallis test was performed for the three realiza-
tions in each of the species. The results are, for A: [statis-
tic, p-value] = [0.610148, 0.737069]; I: [statistic, p-value]
= [0.427088, 0.807717]; R: [statistic, p-value]
= [2.22063, 0.309456]; and (R.A)2: [statistic, p-value]
= [0.344232, 0.841881]. Since all the p-values are
greater than or equal to 0.01, there is no statistically
significant difference between the medians of the species
from the different realizations (with a 99.0% confidence
level).

[Figure: Species distributions and matrix scatter plot for the species A, PI, R, and (R.A)2 in the three realizations.]

11. OpenFPM Server installation. Using the default installation
location will install OpenFPM in /usr/local. For this,
you need to have write privileges in that directory; otherwise
the installation fails. To choose another directory in which to
install the dependencies and the library, use:

    git clone https://github.com/IBirdSoft/openfpm_pdata_2.0.0.git
    cd openfpm_pdata_2.0.0
    ./install -i "/where/you/want/to/put/dependencies" \
              -c "--prefix=/where/you/want/to/install"
    make install

For more information on installation and troubleshooting,
the reader is referred to [18] and http://openfpm.mpi-cbg.de/troubleshoot.
12. Random Number Generator (RNG). Seeding a random
number generator is a highly nontrivial task, since one has to
eliminate all structure in the input seeds. For unbiased sam-
pling, we use a Mersenne Twister random number generator
(MTRNG). To ensure a unique seed on every processor, we mix
the processor ID and the user seed with the mixing procedure
of a hash-based random number generator, Saru [2], as shown
in Example Note 7.

Example Note 7
The code that implements the distributed random
number generators for the QS/Fb CLE model reads like:

    // Random number generator
    size_t seed1 = v_cl.getProcessUnitID();
    srand(time(NULL));
    size_t seed0 = rand();
    srand(seed0);
    size_t seed2 = getuid() + rand();

    seed2 += seed1 << 16;
    seed1 += seed2 << 11;
    seed2 += ((signed int)seed1) >> 7;
    seed1 ^= ((signed int)seed2) >> 3;
    seed2 *= 0xA5366B4D;
    seed2 ^= seed2 >> 10;
    seed2 ^= ((signed int)seed2) >> 19;
    seed1 += seed2 ^ 0x6d2d4e11;
    seed1 = 0x79dedea3 * (seed1 ^ (((signed int)seed1) >> 14));
    seed2 = (seed1 + seed2) ^ (((signed int)seed1) >> 8);
    size_t MTseed = 0xABCB96F7 + (seed2 >> 1);

    std::mt19937_64 engine(MTseed);               // Use the global key as a seed for the PRNG
    std::normal_distribution<double> NDist(0, 1); // Normal distribution with mean 0 and std 1



The above initial mixing does not create any correlation,
and MTseed is used to seed the MTRNG (the test harness ensures
that the behavior of this seeding is indeed chaotic). This seed-
ing ensures uncorrelated RNG streams on different
processors.
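The idea of deriving one seed per processor from a shared user seed can be illustrated with a short Python sketch. Note the substitution: instead of the Saru mixing procedure used in the C++ code above, this sketch hashes the pair (user seed, rank) with SHA-256, which is a different mixing function chosen only because it is available in the standard library. The function names are assumptions. Python's random.Random is itself a Mersenne Twister generator.

```python
import hashlib
import random


def rank_seed(user_seed, rank):
    """Derive a 64-bit seed from the user seed and the processor rank.
    SHA-256 is used here as the mixing function (an assumption of this
    sketch; the protocol itself uses the Saru mixing procedure)."""
    digest = hashlib.sha256(f"{user_seed}:{rank}".encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def make_rank_rng(user_seed, rank):
    """Return an independent Mersenne Twister generator for one rank."""
    return random.Random(rank_seed(user_seed, rank))


# One generator per (hypothetical) processor rank, all from the same user seed.
for rank in range(4):
    rng = make_rank_rng(12345, rank)
    print(rank, rng.gauss(0.0, 1.0))  # standard normal draw, as NDist(0, 1)
```

Hashing removes the obvious structure of consecutive ranks, so neighbouring processors do not start from nearly identical generator states, and the same user seed reproduces the same streams.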
13. Time step selection. A Poisson random variable with a large
enough mean is well approximated by a normal random vari-
able with the same mean and variance. If every reaction j is
expected to fire many times over [t, t + δt), the number of
firings of each reaction can therefore be approximated by a normal
instead of a Poisson random variable. Over a short time interval, the
number of firings of a reaction has a Poisson distribution with mean njδt.
Therefore, the CLE approxi-
mation implies two conditions: (1) δt is small enough to
assume a constant propensity during the interval [0, T]; and
(2) δt is large enough that the expected number of occur-
rences of each reaction in [t, t + δt) is much larger than
1 [13]. Even though the two conditions pull in opposite directions,
they can be satisfied simultaneously when molecular
population numbers are large. To select an appropriate δt, start from a
large initial value. Run the simulation for the whole system
with N = 1, or for a subset that is computationally affordable using
SSA. Check whether the CLE simulation converges and gives
positive values for the species. If so, compare the results with
those obtained with SSA, used as the gold standard. Keep halving
the time step until the CLE simulation converges
and good results are obtained.
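The halving procedure can be sketched on a toy system. The Python example below is an illustration under stated assumptions, not the protocol's implementation: it integrates the CLE of a birth-death process (0 -> X with propensity k, X -> 0 with propensity d*x) with the Euler-Maruyama scheme, and it uses the known stationary mean and variance of that process (both k/d) in place of an SSA reference run. Function names, tolerances, and parameter values are hypothetical.

```python
import math
import random


def cle_birth_death(k, d, dt, t_end, rng):
    """Euler-Maruyama integration of the CLE for 0 -> X (propensity k) and
    X -> 0 (propensity d*x). Returns (mean, variance) of the trajectory,
    or None if the state becomes negative or non-finite."""
    x = k / d                                   # start at the deterministic steady state
    n_steps = int(round(t_end / dt))
    total = total_sq = 0.0
    for _ in range(n_steps):
        a_birth, a_death = k, d * x
        x += ((a_birth - a_death) * dt
              + math.sqrt(a_birth * dt) * rng.gauss(0.0, 1.0)
              - math.sqrt(a_death * dt) * rng.gauss(0.0, 1.0))
        if not math.isfinite(x) or x < 0.0:
            return None                         # simulation failed for this dt
        total += x
        total_sq += x * x
    mean = total / n_steps
    return mean, total_sq / n_steps - mean * mean


def select_time_step(k, d, dt0, rel_tol=0.15, t_end=2000.0, seed=0, max_halvings=12):
    """Start from a large dt0 and halve it until the CLE run stays positive and
    its mean and variance agree with the reference within rel_tol."""
    ref_mean = ref_var = k / d                  # exact values, standing in for SSA
    dt = dt0
    for _ in range(max_halvings + 1):
        result = cle_birth_death(k, d, dt, t_end, random.Random(seed))
        if result is not None:
            mean, var = result
            if (abs(mean - ref_mean) <= rel_tol * ref_mean
                    and abs(var - ref_var) <= rel_tol * ref_var):
                return dt, mean, var
        dt /= 2.0
    raise RuntimeError("no acceptable time step found")


dt, mean, var = select_time_step(k=100.0, d=1.0, dt0=4.0)
print(f"selected dt = {dt}, mean = {mean:.1f}, variance = {var:.1f}")
```

For the full multicellular model, the reference statistics would come from an SSA run of an affordable subset, as described above, rather than from a closed-form expression.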
14. Storage memory (decimation). The implementation of the
algorithm makes efficient use of memory by saving only the
data points corresponding to the previous time and the present
time, as mentioned in Example 7. This way, the memory neces-
sary for an execution is independent of the length of the simula-
tion and scales as 2N with the number of cells N. When the user
needs to keep all the trajectories of all states for every cell, a
strategy for saving memory is needed. Storing data points every
D time steps is a feasible solution for this problem. The best
value for D is the largest possible one that does not deteriorate
the statistical properties. To find the best D value: For one
realization, compute the statistics of interest (e.g., mean and
noise strength, $\mu_k$ and $\eta_k^2$, respectively) of each species k for
each i-th cell. This is D = 1. Double the value of D (D = 2).
This is equivalent to taking a sample every other time step. Compute
the new values of the statistics of interest. Repeat this process
until the statistics of interest start to depart significantly from
their initial values (see Example Note 8).
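The doubling search for D can be sketched as follows. This Python example is a simplified illustration for a single trajectory of one species; the function names, the 5% tolerance, and the synthetic trajectory are assumptions, and the noise strength is taken as variance divided by squared mean.

```python
import math
import statistics


def mean_and_noise_strength(samples):
    """Return (mu, eta^2) with eta^2 = variance / mu^2."""
    mu = statistics.fmean(samples)
    return mu, statistics.pvariance(samples, mu) / mu ** 2


def largest_decimation(trajectory, rel_tol=0.05, max_factor=1024):
    """Double D until the mean or the noise strength of the decimated
    trajectory departs from the D = 1 values by more than rel_tol.
    Returns the last D that was still acceptable."""
    base_mu, base_eta2 = mean_and_noise_strength(trajectory)
    best, factor = 1, 2
    while factor <= max_factor and len(trajectory) // factor >= 2:
        mu, eta2 = mean_and_noise_strength(trajectory[::factor])  # keep every D-th point
        if (abs(mu - base_mu) > rel_tol * abs(base_mu)
                or abs(eta2 - base_eta2) > rel_tol * base_eta2):
            break
        best, factor = factor, factor * 2
    return best


# Synthetic, slowly varying trajectory used only to exercise the function.
trajectory = [100.0 + 10.0 * math.sin(2.0 * math.pi * i / 4096) for i in range(16384)]
D = largest_decimation(trajectory)
print(f"D = {D}, stored fraction = {1.0 / D:.4f}")
```

Storing every D-th point keeps a fraction 1/D of the data, which is consistent with the roughly 95% saving reported for D = 32 in Example Note 8.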

Example Note 8
For the QS/Fb case, the figure below shows how the
use of storage memory decreases when the number of
samples is reduced by the decimation process. Decima-
tion to D = 32 yields approximately a 95% reduction of
the memory space required to save the data, and keeps
the long-term statistics without significant changes
from the initial ones.

[Figure: left panel, Time Step (min) versus Iteration; right panel, Relative Memory Usage versus Iteration.]

15. Matlab Parameter Sweep

Example Note 9
For the QS/Fb case, a parameter sweep can be
carried out as follows.

    96  % Generate matrix X with all parameter combinations
    97  D_v = [2 0];
    98  kA_v = [0.04 0];
    99  pI_v = [0.2 0.4 2 4 10];
    100 kdLux_v = [10 100 200 500 1000 2000];
    101 alphaI_v = [0.01 0.1];
    102 dR_v = [0.02 0.07 0.2];
    103 pR_v = [0.2 0.4 2 4 10];
    104
    105 X = transpose(combvec(D_v, pI_v, kdLux_v, alphaI_v, dR_v, pR_v, kA_v));
    106 %%
    107 for xpop = 1:size(X,1)
    108     % With and without QS
    109     D = X(xpop,1);
    110     % LuxI
    111     pI = X(xpop,2);     % translation rate of LuxI mRNA [1/min]. b*dmI from Weber. [3.0928 - 6.1856]
    112     kdLux = X(xpop,3);  % dissociation cte (LuxR.A)2 to promoter [molecules], Bucler et al [1 1000] nM
    113     alphaI = X(xpop,4); % leakage of the repressor Plux 0.01-0.1
    114
    115     % LuxR
    116     dR = X(xpop,5);
    117     pR = X(xpop,6);
    118     kA = X(xpop,7);
    119
    120     % Writing parameters to a struct in the proper order to be read by the
    121     % langevin C++ program.
    122     param_out = struct('dI', dI, 'pI', pI, 'kI', kI, 'pN_luxI', pN_luxI, ...
                'dmI', dmI, 'kdLux', kdLux, 'alphaI', alphaI, 'dR', dR, 'pR', pR, ...
                'cR', cR, 'dmR', dmR, 'k_1', k_1, 'kd1', kd1, 'k_2', k_2, 'kd2', kd2, ...
                'dRA2', dRA2, 'kA', kA, 'dA', dA, 'D', D, 'Vcell', Vcell, ...
                'Vext', Vext, 'dRA', dRA, 'dAe', dAe);
    123
    124
    125     % Write a file named param.dat, with the struct param_out
    126     struct2csv_append(param_out, 'param.dat', 'W');
    127
    128
    129     % Executing the external C++ program langevin with 4 cores, and with the file param.dat as input
    130     command = ['mpirun -np 4 ./langevin param.dat ' num2str(Ncells) ' ' num2str(ruido) ' ' ...
                num2str(ahl_e_0) ' ' num2str(STO) ' ' num2str(HISTO) ' ' num2str(TEMPO) ' ' ...
                num2str(TEMPOT)];
    131     system(command);


In Lines 96–105, a matrix with all the combinations of
parameters is generated from the individual parameter vectors
using the combvec command. In Lines 107–118, a loop is
executed. The parameter combinations are recovered
one by one and saved into the param.dat file in Lines
122 and 126. The execution of the langevin program
with its corresponding arguments is performed in Lines
130–131. The remaining part of the code (Lines
133–173) reads from the out-
put.dat file and imports the obtained data into a Matlab
variable Data, which is saved into a Data.mat Matlab
binary file for further analysis.
To be able to execute this code in Linux, it is neces-
sary to declare an alias for Matlab:

    alias matlabc='env LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6 /matlab/installation/directory/R2014a/bin/matlab -nodesktop -nojvm -nosplash'

This way, Matlab has the location of the C++ library in its
environment.
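For readers who prefer to drive the sweep from Python, the combination matrix of Lines 96–105 can be reproduced with the standard library. The sketch below is an assumption-labeled equivalent, not part of the original code: the helper name combvec_rows is hypothetical, and it orders the rows so that the first vector varies fastest, as Matlab's combvec followed by a transpose does.

```python
from itertools import product


def combvec_rows(*vectors):
    """All combinations of the input vectors, one combination per row,
    with the first vector varying fastest."""
    # product() varies its last argument fastest, so reverse the inputs
    # and then reverse each combination back into the original order.
    return [tuple(reversed(combo)) for combo in product(*reversed(vectors))]


# Parameter vectors copied from the Matlab listing above.
D_v = [2, 0]
kA_v = [0.04, 0]
pI_v = [0.2, 0.4, 2, 4, 10]
kdLux_v = [10, 100, 200, 500, 1000, 2000]
alphaI_v = [0.01, 0.1]
dR_v = [0.02, 0.07, 0.2]
pR_v = [0.2, 0.4, 2, 4, 10]

X = combvec_rows(D_v, pI_v, kdLux_v, alphaI_v, dR_v, pR_v, kA_v)
print(len(X), "parameter combinations")
for D, pI, kdLux, alphaI, dR, pR, kA in X[:3]:
    print(D, pI, kdLux, alphaI, dR, pR, kA)
```

Each row can then be written to param.dat and passed to the langevin program in the same way as in the Matlab loop.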

16. Python execution. Another option to execute langevin is to
run it through a Python script. An example for the QS/Fb case
is shown below.

Example Note 10
For the QS/Fb case, the Python3 code
corresponding to Simulate_PCLE.py is:

    75 # Arguments of the langevin program.
    76 Ncells = 240
    77 ruido = 0.15
    78 ahl_e_0 = 0/Vc
    79 STO = 1     # STO = 0 is deterministic simulation, and STO = 1 is stochastic.
    80 HISTO = 0   # HISTO = 1 for obtaining the steady state histogram
    81 TEMPO = 1   # TEMPO = 1 for obtaining the temporal response of the means and std
    82 TEMPOT = 0  # TEMPOT = 1 for obtaining the temporal response of all cells
    83
    84 print('Saving params...')
    85 # Params list
    86 param_out = [dI, pI, kI, pN_luxI, dmI, kdLux, alphaI, dR,
                    pR, cR, dmR, k_1, kd1, k_2, kd2, dRA2, kA, dA, D,
                    Vcell, Vext, dRA, dAe]
    87
    88 # Write a file named param.dat, with the list param_out
    89 filename = 'param.dat'
    90 with open(filename, 'w') as myfile:
    91     wr = csv.writer(myfile, quoting=csv.QUOTE_NONE)
    92     wr.writerow(param_out)
    93
    94 print('Params saved in ' + filename)
    95 # Executing the external C++ program langevin with 4 cores, and with the file param.dat as input
    96 command = ('mpirun -np 4 ./langevin param.dat ' + str(Ncells) + ' ' + str(ruido) + ' ' + str(ahl_e_0) + ' '
                  + str(STO) + ' ' + str(HISTO) + ' ' + str(TEMPO) + ' '
                  + str(TEMPOT))
    97 print(command)
    98 os.system(command)


In Lines 1–71, the necessary parameters are assigned
to Python variables (code not shown; refer to the
repository for details). In Lines 75–82, the execution
parameters are set. In Line 86, the parameters are
gathered into a list, which is saved into the param.
dat file in Lines 89–92. The execution of the lan-
gevin program with its corresponding arguments is
performed in Lines 96–98 using the os.system
Python function.
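As a design alternative to concatenating a shell string, the command can be assembled as an argument list and passed to the standard subprocess module, which avoids shell quoting issues and reports a failing run. The sketch below is hypothetical: the helper names are not part of Simulate_PCLE.py, and the argument order is simply copied from the command string in Line 96.

```python
import subprocess


def build_langevin_command(param_file, ncells, ruido, ahl_e_0, sto, histo,
                           tempo, tempot, n_procs=4, executable="./langevin"):
    """Assemble the mpirun call as a list of strings, one entry per argument,
    in the same order as the command string used in the script above."""
    run_args = [ncells, ruido, ahl_e_0, sto, histo, tempo, tempot]
    return (["mpirun", "-np", str(n_procs), executable, param_file]
            + [str(arg) for arg in run_args])


def run_langevin(command):
    """Run the simulation; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(command, check=True)


command = build_langevin_command("param.dat", 240, 0.15, 0.0, 1, 0, 1, 0)
print(" ".join(command))
# run_langevin(command)  # uncomment where mpirun and ./langevin are available
```

Passing a list means that no shell is involved, so file names containing spaces or special characters are handed to the program unchanged.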

Acknowledgement

This work is partially supported by grant MINECO/AEI, EU
DPI2017-82896-C2-1-R.

References
1. Acar M, Mettetal JT, van Oudenaarden A (2008) Stochastic switching as a survival strategy in fluctuating environments. Nat Genet 40(4):471–475
2. Afshar Y, Schmid F, Pishevar A, Worley S (2013) Exploiting seeding of random number generators for efficient domain decomposition parallelization of dissipative particle dynamics. Comput Phys Commun 184(4):1119–1128
3. Andrews SS, Dinh T, Arkin AP (2009) Stochastic models of biological processes. Springer New York, New York, pp 8730–8749
4. Basak S, Chabakauri G (2010) Dynamic mean-variance asset allocation. Rev Financ Stud 23(8):2970–3016
5. Boada Y, Vignoni A, Picó J (2017) Engineered control of genetic variability reveals interplay among quorum sensing, feedback regulation, and biochemical noise. ACS Synth Biol 6(10):1903–1912
6. Boada Y, Vignoni A, Picó J (2019) Multiobjective identification of a feedback synthetic gene circuit. IEEE Trans Control Syst Technol 1–16
7. Cai L, Friedman N, Xie XS (2006) Stochastic protein expression in individual cells at the single molecule level. Nature 440(7082):358–362
8. Cao Y, Gillespie DT, Petzold LR (2005) The slow-scale stochastic simulation algorithm. J Chem Phys 122(1):014116
9. Chalancon G, Ravarani CN, Balaji S, Martinez-Arias A, Aravind L, Jothi R, Madan Babu M (2012) Interplay between gene expression noise and regulatory network architecture. Trends Genet 28(5):221–232
10. Chellaboina V, Bhat S, Haddad M, Bernstein D (2009) Modeling and analysis of mass-action kinetics. IEEE Control Syst 29(4):60–78
11. Eldar A, Elowitz MB (2010) Functional roles for noise in genetic circuits. Nature 467(7312):167–173
12. Elowitz MB, Levine AJ, Siggia ED, Swain PS (2002) Stochastic gene expression in a single cell. Science 297(5584):1183–1186
13. Gillespie DT (2000) The chemical Langevin equation. J Chem Phys 113:297–306
14. Gillespie DT (2007) Stochastic simulation of chemical kinetics. Annu Rev Phys Chem 58:35–55
15. Higham DJ (2001) An algorithmic introduction to numerical simulation of stochastic differential equations. SIAM Rev 43(3):525–546
16. Higham DJ (2008) Modeling and simulating chemical reactions. SIAM Rev 50(2):347–368
17. Hilfinger A, Paulsson J (2011) Separating intrinsic from extrinsic fluctuations in dynamic biological systems. Proc Natl Acad Sci 108(29):12167–12172
18. Incardona P, Leo A, Zaluzhnyi Y, Ramaswamy R, Sbalzarini IF (2019) Openfpm: a scalable open framework for particle and particle-mesh codes on parallel computers. Comput Phys Commun 241:155–177
19. Jones DL, Brewster RC, Phillips R (2014) Promoter architecture dictates cell-to-cell variability in gene expression. Science 346(6216):1533–1536
20. Kazeev V, Khammash M, Nip M, Schwab C (2014) Direct solution of the chemical master equation using quantized tensor trains. PLoS Comput Biol 10(3):e1003359
21. Khalil HK (1996) Nonlinear systems, 3rd edn. Prentice-Hall, New Jersey
22. Kokotovic P, Khalil H, O'Reilly J (1986) Singular perturbation methods in control: analysis and design. Academic Press, Orlando
23. Kruskal WH, Wallis WA (1952) Use of ranks in one-criterion variance analysis. J Am Stat Assoc 47(260):583–621
24. Labhsetwar P, Cole JA, Roberts E, Price ND, Luthey-Schulten ZA (2013) Heterogeneity in protein expression induces metabolic variability in a modeled Escherichia coli population. Proc Natl Acad Sci USA 110(34):14006–14011
25. Mélykúti B, Hespanha JaP, Khammash M (2014) Equilibrium distributions of simple biochemical reaction systems for time-scale separation in stochastic reaction networks. J R Soc Interface 11(97):20140054
26. Munsky B, Khammash M (2006) The finite state projection algorithm for the solution of the chemical master equation. J Chem Phys 124(4):044104
27. Murray JD (1989) Mathematical biology. Springer, Berlin
28. Novick A, Weiner M (1957) Enzyme induction as an all-or-none phenomenon. Proc Natl Acad Sci USA 43(7):553
29. Ostrenko O, Incardona P, Ramaswamy R, Brusch L, Sbalzarini IF (2017) pssalib: the partial-propensity stochastic chemical network simulator. PLoS Comput Biol 13(12):e1005865
30. Raj A, van Oudenaarden A (2008) Nature, nurture, or chance: stochastic gene expression and its consequences. Cell 135(2):216–226
31. Rao CV, Arkin AP (2003) Stochastic chemical kinetics and the quasi-steady-state assumption: application to the Gillespie algorithm. J Chem Phys 118(11):4999–5010
32. Raser JM, O'Shea EK (2005) Noise in gene expression: origins, consequences, and control. Science 309(5743):2010–2013
33. Ruess J, Lygeros J (2015) Moment-based methods for parameter inference and experiment design for stochastic biochemical reaction networks. ACM Trans Model Comput Simul 25(2):8
34. Samoilov M, Plyasunov S, Arkin AP (2005) Stochastic amplification and signaling in enzymatic futile cycles through noise-induced bistability with oscillations. Proc Natl Acad Sci USA 102(7):2310–2315
35. Schnoerr D, Sanguinetti G, Grima R (2017) Approximation and inference methods for stochastic biochemical kinetics-a tutorial review. J Phys A: Math Theor 50(9):093001
36. Sutton S (2006) Measurement of cell concentration in suspension by optical density. Microbiology 585:210-8336
37. Swain PS, Elowitz MB, Siggia ED (2002) Intrinsic and extrinsic contributions to stochasticity in gene expression. Proc Natl Acad Sci 99(20):12795–12800
38. Van Kampen N (2011) Stochastic processes in physics and chemistry. North-Holland Personal Library, Elsevier Science
39. Wilkinson DJ (2006) Stochastic modelling for systems biology. Mathematical and computational biology Series, 2nd edn. Chapman and Hall/CRC, London
40. Wilkinson DJ (2009) Stochastic modelling for quantitative description of heterogeneous biological systems. Nat Rev Genet 10(2):122–133
41. Woods ML, Leon M, Perez-Carrasco R, Barnes CP (2016) A statistical approach reveals designs for the most robust stochastic gene oscillators. ACS Synth Biol 5(6):459–470
42. Zagaris A, Kaper HG, Kaper TJ (2004) Analysis of the computational singular perturbation reduction method for chemical kinetics. J Nonlinear Sci 14(1):59–91

Chapter 3

Using Models to (Re-)Design Synthetic Circuits


Giselle McCallum and Laurent Potvin-Trottier

Abstract
Mathematical models play an important role in the design of synthetic gene circuits, by guiding the choice
of biological components and their assembly into novel gene networks. Here, we present a guide for
biologists to build and utilize models of gene networks (synthetic or natural) to analyze dynamical proper-
ties of these networks while considering the low numbers of molecules inside cells that results in stochastic
gene expression. We start by describing how to write down a model and discussing the level of details to
include. We then briefly demonstrate how to simulate a network’s dynamics using deterministic differential
equations that assume high numbers of molecules. To consider the role of stochastic gene expression in
single cells, we provide a detailed tutorial on running stochastic Gillespie simulations of a network,
including instructions on coding the Gillespie algorithm with example code. Finally, we illustrate how
using a combination of quantitative experimental characterization of a synthetic circuit and mathematical
modeling can guide the iterative redesign of a synthetic circuit to achieve the desired properties. This is
shown using a classic synthetic oscillator, the repressilator, which we recently redesigned into the most
precise and robust synthetic oscillator to date. We thus provide a toolkit for synthetic biologists to build
more precise and robust synthetic circuits, which should lead to a deeper understanding of the dynamics of
gene regulatory networks.

Key words Synthetic gene circuits, Mathematical modeling, Dynamical gene network, Stochastic
simulations, Gillespie algorithm, Synthetic oscillator, Synthetic biology, Biological oscillations

1 Introduction

Models are simplified representations of the world and a core
component of science. They help us understand how the world works,
for example, via simple mathematical equations that approximately
describe the movement of objects under a range of conditions.
They can also help us design and engineer systems, for example
by using mathematical models to ensure an electronic circuit will
function as intended. Models have been at the core of synthetic
biology since its beginnings in 2000 with the publication of two
gene circuits: an oscillator called the repressilator [1] and a bistable
toggle switch [2]. They have helped define synthetic biology as a
new field, in which biologists moved from modifying existing
biological systems to designing and engineering novel gene circuits.

Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229,
https://doi.org/10.1007/978-1-0716-1032-9_3, © Springer Science+Business Media, LLC, part of Springer Nature 2021

In these seminal papers, mathematical models were used to determine
the required parameters for the oscillatory or bistable behavior
of the systems and guided the choice of biological parts that met
these requirements. Even though this chapter will focus on the
“redesign” of synthetic circuits, we want to emphasize that model-
ing should be used throughout the design-build-test-learn cycle of
synthetic biology. Models should be created and analyzed before
building a circuit, as even with advances in DNA synthesis and
assembly, it is still much faster to model a circuit in silico than to
build it—and there is no point in building a circuit if it cannot work
under any conditions. After building and testing a circuit, models
can help to understand why the circuit might not behave as
intended and to learn about the dynamics of the assembled gene
networks. Measurements of a circuit’s properties need to be reli-
able, as it is necessary to distinguish and isolate variability coming
from either the measurements, the environment, or the synthetic
circuit itself. This is a key component of the redesign: experimental
data will be only useful if it actually represents a circuit’s behavior.
As a final warning, we would like to remind the reader that if the
model fits the data, it does not mean that it is “correct.” It simply
means that it is consistent with the data and cannot be ruled out,
but it is possible (and even likely) that many more models can fit the
data equally well. Nevertheless, these models can be useful to make
predictions about a system's function, informing processes such as
redesigning features of the circuit to improve its behavior.
Modeling of gene circuits expressed in vivo is complicated by
the fact that cells are not macroscopic-sized test tubes: their size
and the finite, small number of molecules they contain means that
the chemical reactions inside a cell will happen by chance when
molecules collide with each other. Thus, the number of molecules,
such as mRNAs and proteins, will vary over time, meandering
around the mean value. This is due to stochastic chemistry and is
referred to as stochastic gene expression. These random fluctuations
are important to take into account in the design and
modeling of synthetic circuits as they do not merely make the
circuit’s behavior “noisier,” but can have counterintuitive effects,
such as turning a non-oscillating system into an oscillating one
[3, 4]. In 2016, using a combination of careful single-cell micros-
copy experiments and stochastic modeling, we took the original
repressilator circuit that appeared to have rather poor oscillatory
properties and iteratively transformed it into by far the most precise
and robust synthetic oscillator to date [5]. A microfluidic device
nicknamed the “mother machine” [6–8] enabled us to track single
cells under carefully controlled growth conditions and separate
environmental noise from variability intrinsic to the circuit. Model-
ing of the circuit using these measurements while considering the
stochastic nature of the chemical reactions enabled us to build a
version of the circuit with improved precision and robustness.
Many synthetic biology projects are left at the proof-of-principle
stage, where the function of the circuit is approximately
what was originally intended. We believe that understanding why
these circuits do not function exactly as intended is as useful as
making them in the first place: it will provide a greater understand-
ing of gene regulatory networks while leading to precise and robust
systems that can be used in applications.
In this chapter, we aim to provide a guide to (re-)designing
gene circuits using modeling, using our previous work with the
repressilator as an example. Obviously, there is no universal step-by-
step protocol to redesign a synthetic circuit. However, we hope to
provide a useful guide to synthetic biologists from various scientific
backgrounds to help them incorporate modeling while building
and analyzing their circuits. To complement existing resources,
we will focus on dynamic systems and stochastic analysis [9–
13]. While this chapter focuses on synthetic circuits, the same
modeling approach is just as useful for natural gene circuits. We
will start by explaining how to write down a simple model and use it
to write down the differential equations representing a system’s
dynamics. Then, we will discuss how to model the effects of sto-
chastic gene expression on the circuit by performing stochastic
(Gillespie) simulations and understanding sources of noise in a
system. In the spirit of this series, we will provide a step-by-step
protocol (and code) to run Gillespie simulations, which provide an
exact representation of the stochastic system while being simple and
fast to run, making them tremendously useful. Finally, we will
describe an example of how these models can be used to guide
the re-design of synthetic circuits.

2 Materials

All models used in this chapter can be solved or simulated using
built-in or custom-coded functions in programming languages like
Matlab, Octave, Python (numpy), Mathematica, C, Fortran, and so
on. All code for models discussed in Subheading 3 can be found at
our source code repository
(https://github.com/potvinlab/MiMB_circuitmodeling.git) and can be easily implemented in
already software and packages to run stochastic simulations avail-
able in many languages, writing these algorithms from scratch is as
fast as understanding an existing code while being much more
pedagogical (see list of software at Gillespie Wikipedia page [14],
Wolfram Alpha Demonstrations Project [15]).

3 Methods

3.1 Writing Down a Model

3.1.1 Abstracting the Circuit: Sketching Its Diagram

The first step in writing down a model is to record all the interactions
between the molecules in the circuit or impacting it, and the
chemical reactions that create and eliminate them in a diagram
(sometimes called the network topology). Here, we must consider
the level of detail we want to include in the model. While it is
important to include enough details to accurately reflect the underlying
processes we wish to learn about, too much detail can weigh
down the model and distract from the effects that particular vari-
ables can have on the behavior of the system. For example, consid-
ering relativity while calculating the movement of a ball through
the air will obscure the understanding of the simple system while
adding futile precision. Consider the repressilator circuit: in this
network, three genes encode different repressor proteins (LacI,
TetR, and λ CI), each of which represses the expression of the
next gene in the circuit in a single feedback loop. Because of the
odd number of repressors, this effectively leads to autorepression
with a delay, producing out-of-phase oscillations of the three pro-
teins. The simplest model of this circuit contains only the repressors
as variables and considers that the proteins directly repress each
other’s production (Fig. 1a). While this model can still lead to
oscillations, it ignores many important biological parameters, such
as transcription rates and difference between mRNAs’ and proteins’
half-lives (see Note 1). We could also model fluctuations in gene
copies due to plasmid copy number, the switching of the promoter
between the active and inactive state, the number of RNA poly-
merases and ribosomes, multimerization of the repressors, and
enzymatic degradation of the repressor via proteases—all biological
parameters with an impact on our circuit (Fig. 1b). However, these
details will not necessarily provide valuable insight on the behavior
of our circuit and will add many new variables and parameters to the
model. While they may not be included in the equations, keeping
these details in mind is helpful when analyzing the model and
circuit’s behavior, as they might help explain unexpected results.
Our favorite approach is to start with the simplest model that can
lead to some understanding of the system. Then, complexity can be
progressively added if it is necessary to explain the observed behav-
ior or to test the effects of a particular component of the system. It
is important to consider that in biology there are still many
unknowns (and unknown unknowns), and that adding many para-
meters to the model will not make it a better representation of
reality. Powerful approaches have been developed to rigorously
model systems with many unknown interactions, but they are
beyond the scope of this chapter [16, 17]. In our example, we
will include mRNA (m) and proteins (P) as variables in our net-
work. The transcription of an mRNA is repressed by the previous
Using Models to (Re-)Design Synthetic Circuits 95

[Figure 1: network topology diagrams, panels a to c. Panel b labels: n plasmids, gene, RNAP, RNase, ribosomes, protease, dilution, mRNA, × 3 genes]

Fig. 1 Network topology diagram. Examples of possible network topology
diagrams displaying the interactions between molecular species in the repressilator.
Models can range from being very minimalistic (a) to including many
detailed processes and interactions between your circuit and the cell (b). How
much detail you should include depends on what you want to learn, but starting
simple is key. (c) We have chosen a simple model of the repressilator that
includes both mRNA and proteins as variables

protein in the network (i.e., P1 inhibits the transcription of m2),
and proteins are translated proportionally to the number of
mRNAs. Both mRNA and proteins are depleted from the cell by a
combination of dilution due to cell growth and active degradation
(Fig. 1c). We will account for the dimerization of repressors and
their affinity to their promoters in Subheading 3.1.2.

3.1.2 Mass Action Equations

The next step in building our model is to write an expression
describing each reaction in our system according to the law of
mass action, which states that the rate of a reaction is proportional
to the product of the concentrations of the reactants. For example, for the reaction
$x + y \to z$, the rate of production of z is calculated as $\frac{d[z]}{dt} = [x][y] \cdot k_1$,
where $k_1$ is a constant known as the mass action rate constant that
indicates the rate per reactant (or proportionality) of the reaction.
Intuitively, this means that if you have twice as many molecules of one
reactant, the reaction rate will be twice as high because collisions between
molecules are twice as likely to happen. The mass action equation is then
written as $x + y \xrightarrow{k_1} z$.

We can thus write the mass action equations for all the chemical
reactions included in our model. For the repressilator, the equations are:

$$\emptyset \xrightarrow{f(P_{i-1})} m_i$$
$$m_i \xrightarrow{\beta_m} \emptyset$$
$$m_i \xrightarrow{\lambda_P} P_i + m_i$$
$$P_i \xrightarrow{\beta_P} \emptyset$$

for each repressor ($i = 1, 2, 3$, and where $P_0 = P_3$ by definition). $\lambda_P$
is the rate of translation of repressor ($P_i$) per mRNA per unit time,
and $\beta_m$ and $\beta_P$ are the rates of elimination of mRNA and protein,
respectively (determined by dilution due to cell growth and active
degradation). The rate for the transcription of $m_i$ is the value of the
function $f(P_{i-1})$ describing repression of the promoter by the
previous protein ($P_{i-1}$) in the network. Here, we decide to use
the following function called a Hill function:

$$f(P_{i-1}) = \frac{\lambda_m \cdot K^h}{K^h + P_{i-1}^h}$$

The Hill equation is classically used to describe cooperative
binding of ligands to a receptor and is useful in describing many
biological processes, as it describes nonlinear switching of a system
between 0 and 1 (a fully “off” and fully “on” state). In our model,
the Hill function is used to approximate the (possibly partial)
cooperative binding of the repressor proteins to their promoters
(see Note 2). Here, h is the Hill coefficient representing this cooperative
binding. The parameter K in the Hill function is the repression threshold:
the repressor level at which the promoter is repressed half of the time, so
that transcription falls to half of its maximal rate. It accounts for the
affinity that a repressor has for its
binding site in a promoter. λm is the maximal transcription rate
when there is no repression of the gene encoding the mRNA.
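
As a quick numerical check of the Hill function, the following minimal Python sketch evaluates f for a few repressor levels. The function name `hill_repression` is ours (hypothetical), and the default values are the per-minute order-of-magnitude estimates listed in Table 1; they are starting points, not measured constants.

```python
def hill_repression(p_prev, lam_m=4.1, K=7.0, h=2.0):
    """Transcription rate f(P_{i-1}) = lam_m * K**h / (K**h + P_{i-1}**h).

    p_prev : copy number of the repressor acting on this promoter
    lam_m  : maximal transcription rate (mRNA per min, assumed from Table 1)
    K      : repression threshold (proteins)
    h      : Hill coefficient
    """
    return lam_m * K**h / (K**h + p_prev**h)


# No repressor gives the maximal rate; P = K gives half of it;
# P = 10 * K gives roughly 1% of it when h = 2.
for p in (0, 7, 70):
    print(p, hill_repression(p))
```

Raising h in this sketch makes the drop around P = K steeper, which is the nonlinearity that the model relies on.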
Using the mass action equations, we can now write the ordi-
nary differential equations (ODEs) that describe the dynamics of
our system and find a deterministic solution of the system (see
Subheading 3.2) assuming that the numbers of molecules are very
high. This may not be an accurate approximation in all situations,
but can provide an intuition about the system’s behavior. In order
to consider the effects of the finite number of molecules, we can
also simulate the reactions stochastically (see Subheading 3.3).

3.1.3 Parameter Estimation

Before proceeding, it is useful to obtain an order of magnitude
estimate for the biological parameters of our model. There are a few
resources that can be extremely helpful with this task, such as
BioNumbers [18], an online database containing molecular

Table 1
Estimated parameter values for the mRNA-protein repressilator model

Parameter  Description                               Units                          Est. value  Source
λmi        Max transcription rate                    mRNA min^-1                    4.1         [49]
                                                     mRNA τP^-1 (a)                 150
K          Threshold of repression (½ molecules      proteins                       7           [5]
           are bound to promoters)
h          Hill coefficient of cooperativity         Unitless                       2           [50]
βm         mRNA elimination rate (combination        min^-1                         0.1         [51] (b)
           degradation and dilution)                 τP^-1 (a)                      3.6
λPi        Translation rate                          proteins mRNA^-1 min^-1        1.8         [21, 51] (b)
                                                     proteins mRNA^-1 τP^-1 (a)     65
βP         Combination of dilution due to cell       min^-1                         0.027
           growth and active degradation of protein  τP^-1 (a)                      1

(a) For parameter scans discussed in Subheading 3.3.8, we have set all values for rate constant parameters to units of protein
lifetime τP^-1 (see Note 1) by scaling rate parameter values assuming a cell division rate (and protein half-life) of 25 min,
with no degradation of our proteins by proteases (making τP = 36 min)
(b) Values were found on BioNumbers [18]

biology numbers from many cell types and species (from peer-reviewed
sources), and Ron Milo and Rob Phillips’ book Cell Biology
by the Numbers [19]. For example, in our model (assuming that our
network is being expressed in Escherichia coli), several studies listed
on BioNumbers have shown translation rates of ~ 8 aa/mRNA/
s [20, 21], which we can adapt to the units required for our model
(see Note 3). Table 1 contains a complete list of parameters in the
example repressilator model, and an order of magnitude estimation
of these parameters that serve as a starting point for our simula-
tions. It is easy to vary these parameters by orders of magnitude in
silico, but it is important to remember what they physically repre-
sent: rates of chemical reactions in a cell. As such, they must remain
physically realistic. For example, binding constants cannot be so
high that molecules would need to diffuse faster than they
would in water. Note that although testing a range of parameters
computationally is easier than experimentally, the dimensionality
(and therefore computational load) expands as the number of
parameters increases (if you want to test a range of n values for
each of x parameters, you must run $n^x$ simulations). This provides further
motivation to keep the model as simple as possible.
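
To make the unit conversion and the cost of a parameter scan concrete, here is a minimal Python sketch. It assumes, as in footnote a of Table 1, a protein half-life of 25 min and takes the protein lifetime to be the half-life divided by ln 2; the dictionary keys and the helper `n_simulations` are hypothetical names chosen for illustration.

```python
import math

# Protein lifetime from the assumed 25 min half-life (about 36 min).
half_life_min = 25.0
tau_p = half_life_min / math.log(2)

# Rate constants in per-minute units (order-of-magnitude estimates, Table 1).
rates_per_min = {
    "lambda_m": 4.1,        # max transcription rate
    "beta_m": 0.1,          # mRNA elimination rate
    "lambda_P": 1.8,        # translation rate
    "beta_P": 1.0 / tau_p,  # protein elimination by dilution only
}

# Multiplying by tau_p expresses every rate per protein lifetime,
# so that beta_P becomes exactly 1.
rates_per_tau = {name: value * tau_p for name, value in rates_per_min.items()}


def n_simulations(n_values, n_params):
    """Number of runs needed to test n_values for each of n_params parameters."""
    return n_values ** n_params


for name, value in rates_per_tau.items():
    print(name, round(value, 1))
print(n_simulations(10, 4))  # 10 values for each of 4 parameters
```

The rescaled values agree with the rounded entries in Table 1, and the last line shows how quickly a full grid grows with the number of parameters.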

3.2 Deterministic Solution

3.2.1 Writing Ordinary Differential Equations (ODEs)

Let’s start by assuming that our system is evolving in a macroscopic
test tube in which all of our reactants are present in high numbers,
and the system is homogeneously mixed. Under this assumption, we
can write a set of deterministic ordinary differential equations
(ODEs) that describe the dynamics of our system. Here, we write
one equation for each species in the system, describing its rate of
change (i.e., its production rate minus its overall depletion rate).
For example, for a molecule x produced at a constant rate and
eliminated at a constant rate per molecule, $\frac{dx}{dt} = \lambda - \beta \cdot x(t)$, where λ
and β are the production and elimination rate constants, respectively.
It is often useful to know the concentration of your components
at steady state. To get this value, we can simply set $\frac{dx}{dt} = 0$
(at steady state, concentration does not change) and solve the
equation for x. Here, the steady-state value of x is (intuitively)
determined by its rate of production divided by its elimination
rate constant: $x_{ss} = \frac{\lambda}{\beta}$.
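
The steady-state result can be checked numerically. The sketch below integrates dx/dt = λ − β·x with a simple forward-Euler scheme (our choice for brevity, not a method used in the chapter) and compares the end point with λ/β. The rate values are illustrative and are not taken from Table 1.

```python
def birth_death_endpoint(lam, beta, x0=0.0, dt=0.001, t_end=10.0):
    """Forward-Euler integration of dx/dt = lam - beta * x, returning x(t_end)."""
    x = x0
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        x += dt * (lam - beta * x)
    return x


lam, beta = 50.0, 1.0              # illustrative production and elimination rates
x_end = birth_death_endpoint(lam, beta)
x_ss = lam / beta                  # analytical steady state
print(x_end, x_ss)
```

After ten elimination time constants (t = 10/β), the integrated value is within a small fraction of a molecule of λ/β, whatever the starting value x0.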
We can use the same strategy to build a set of ODEs describing
the repressilator. In the spirit of simplicity, let’s assume for now that
the parameters are roughly equal for each mRNA and repressor. We
can therefore use the same parameter values for all equations in our
symmetrical system:

$$\frac{dm_i}{dt} = \frac{\lambda_m \cdot K^h}{K^h + P_{i-1}^h} - m_i \cdot \beta_m$$
$$\frac{dP_i}{dt} = m_i \cdot \lambda_P - P_i \cdot \beta_P$$

where i is the gene index as defined above. In total, our model will
consist of six ODEs with two terms each, describing the rate of
change of three mRNAs and three repressors over time.

3.2.2 Solving the System of ODEs

A system of ODEs describing nonlinear biochemical networks
cannot usually be solved analytically. However, for a given set of
parameters and initial values for the components, it can be solved
relatively easily (for well-behaved systems) using the numerical solvers
that are built into most programming languages (e.g.,
Matlab (see code at https://github.com/potvinlab/MiMB_circuitmodeling.git),
Python). For example, using the ode23 function
built into Matlab, we can solve our system of equations with
the parameters in Table 1 over a specified time. In Fig. 2a, we can
see that for our estimated parameter values and chosen set of initial
conditions, our system exhibits sustained oscillations. For more
detailed information on ODE models of biological networks, please
refer to the chapter in this book titled Modelling frameworks:
Ordinary Differential Equations.
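
For readers without Matlab, the six ODEs can also be integrated in a few lines of plain Python. The sketch below is not the code from our repository: it swaps ode23 for a hand-written fixed-step fourth-order Runge-Kutta integrator so that it runs with the standard library alone. The function names, the step size, and the asymmetric initial condition are assumptions made for illustration; the default parameters are the Table 1 estimates in units of protein lifetime.

```python
def repressilator_rhs(y, lam_m=150.0, K=7.0, h=2.0,
                      beta_m=3.6, lam_P=65.0, beta_P=1.0):
    """Right-hand side of the six ODEs; y = [m1, m2, m3, P1, P2, P3]."""
    m, P = y[:3], y[3:]
    # m_i is repressed by P_(i-1); index -1 wraps around so that P_0 = P_3.
    dm = [lam_m * K**h / (K**h + P[i - 1]**h) - beta_m * m[i] for i in range(3)]
    dP = [lam_P * m[i] - beta_P * P[i] for i in range(3)]
    return dm + dP


def rk4_solve(rhs, y0, t_end, dt):
    """Fixed-step classical Runge-Kutta (RK4) integration from t = 0 to t_end."""
    n_steps = int(round(t_end / dt))
    y = list(y0)
    ts, ys = [0.0], [list(y)]
    for k in range(n_steps):
        k1 = rhs(y)
        k2 = rhs([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
        k3 = rhs([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
        k4 = rhs([yi + dt * ki for yi, ki in zip(y, k3)])
        y = [yi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        ts.append((k + 1) * dt)
        ys.append(list(y))
    return ts, ys


# Start away from the symmetric fixed point (10 copies of P1, nothing else).
y0 = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0]
ts, ys = rk4_solve(repressilator_rhs, y0, t_end=60.0, dt=0.01)
P1 = [y[3] for y in ys]
print(min(P1[2000:]), max(P1[2000:]))  # range of P1 after t = 20 protein lifetimes
```

An adaptive solver such as ode23 chooses its own step size; with a fixed step, it is worth halving dt once to confirm that the trajectory does not change noticeably.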

[Figure 2: panels a (Deterministic), b (Stochastic), and c (zoomed view). Axes: Copy Number versus Time (τp); traces: P1, P2, P3]

Fig. 2 Time traces for the proteins of the repressilator. (a) Deterministic
numerical solution to the system of ODEs, solved with the parameter set
shown in Table 1. This set of parameters leads to sustained
oscillations. (b) Time traces for three proteins, simulated stochastically with the
same parameter set as in a. The stochastic system still produces sustained
oscillations, with some noise in period and amplitude of peaks. (c) Zooming in,
we can see copy number changing in discrete steps, with one protein being
produced or degraded in time steps of various lengths

3.2.3 Parameter Space Analysis and Bifurcation Diagrams

Often, we are interested in understanding the behavior of the
system over a range of parameters. For example, you might be
interested in choosing components (and indirectly parameters) for
your circuit that will lead to a specific behavior. For a deterministic
system, it can be possible to analytically determine the parameter
boundary that will give oscillations (or other behaviors like damped
oscillations or stability of equilibria) using a method called linear
stability and bifurcation analysis. While the detailed process of
linear stability analysis is outside the scope of this chapter (see
Note 4), this approach has previously been used to analyze the
parameter space of the repressilator model and find the boundary of
the region of parameter space that gives rise to oscillations
[1, 22]. Sometimes, combinations of parameters (such as the ratio
between them), rather than individual parameters themselves,
determine a system’s behavior. Here, two combinations of parameters,
α and β (where $\alpha = \frac{\lambda_m \lambda_P}{\beta_m \beta_P K}$ and $\beta = \frac{\beta_P}{\beta_m}$), determine whether
the system oscillates and are used in linear stability analysis to
determine the boundary of the oscillation space. A plot of this
boundary, called the bifurcation diagram (Fig. 3), shows that
there are many sets of parameters for which the system can oscillate.
From this diagram, we can infer that increasing cooperativity/

[Figure 3: log-log plot of β = βP/βm versus α = λmλP/(βPβmK), with boundary curves for h = 1.35, 1.5, 2, and 3]

Fig. 3 Bifurcation diagram. Plot showing the boundary of the α-β parameter
space that gives rise to oscillations. Thick lines indicate the boundary at various
h values, as determined by linear stability analysis (the parameter combinations
that lead to oscillations lie to the right of each line). Increasing
cooperativity (h), increasing α, and having β = 1 all increase the parameter
space that supports oscillations

nonlinearity helps the oscillations (by broadening the region of
parameter space that sustains oscillations) and that ideally the
mRNAs’ half-lives should be similar to the proteins’ half-lives.
When interpreting this diagram, it is important to keep in mind
that not all parameter values in this space are biologically relevant,
physically possible, or easy to achieve experimentally. While these
differential equations ignore the stochastic noise inside cells, they
can provide useful insights into the behavior of dynamic systems
and networks by providing analytical relations for the parameters.
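
To locate a given design on the bifurcation diagram, the two parameter combinations can be computed directly. This short Python sketch uses the per-minute estimates from Table 1; the function name `dimensionless_groups` is hypothetical.

```python
def dimensionless_groups(lam_m, lam_P, beta_m, beta_P, K):
    """Return (alpha, beta) with alpha = lam_m*lam_P/(beta_m*beta_P*K), beta = beta_P/beta_m."""
    alpha = lam_m * lam_P / (beta_m * beta_P * K)
    beta = beta_P / beta_m
    return alpha, beta


alpha, beta = dimensionless_groups(lam_m=4.1, lam_P=1.8,
                                   beta_m=0.1, beta_P=0.027, K=7.0)
print(alpha, beta)  # alpha is close to 390 and beta is 0.27
```

Both quantities are ratios of rates, so they take the same values whether the rates are expressed per minute or per protein lifetime.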

3.3 Stochastic Simulations

So far, we have been operating under the assumption that molecules
in our system are present in high numbers and therefore
behave according to deterministic dynamics. In cells, this assump-
tion is not always correct: many molecules such as mRNAs and
proteins are present in low copy numbers [23–28]. Individual
chemical reactions will happen by chance when molecules collide
with each other, such that numbers of molecules will fluctuate over
time (or across cells in a population), making their respective cellu-
lar processes (like gene expression) stochastic in nature. Levels of
molecules can also fluctuate even if they are present in higher
numbers, as this noise can be transmitted from one molecule to
the next (for example, if proteins are translated from a noisy

mRNA). Therefore, we cannot predict or calculate how these numbers
will change over time (which chemical reaction will happen
and when); we can only calculate the probabilities that the system
will have a particular number of molecules at a given time. The time
evolution of this probability distribution is described by a set of
differential equations called the chemical master equation (CME,
[11, 29, 30]). Even for the simplest processes such as one molecule
being produced at a constant rate and eliminated at a constant rate
per molecule (i.e., a Poisson birth and death process), the CME is
represented by an infinite number of coupled differential equations.
While the CME is generally not solvable except in special cases, it is
possible to calculate the moments of the probability distribution
(e.g., average, variance, autocorrelation) by approximating the rates
of the chemical reactions as linear around the average. This is called
the linear noise approximation (or a first-order van Kampen expan-
sion in physical chemistry, and many other names in other fields)
and is exact when the rates are linear functions of the number of
molecules. This is outside the scope of this chapter, and we direct
the interested reader to the following references [10, 11, 28].
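
To illustrate why even the simplest case involves infinitely many coupled equations, the CME for the birth and death process above (production at rate λ, elimination at rate β·x) can be written out as follows. This is the standard textbook form, added here for illustration; $P(x, t)$ denotes the probability of having x molecules at time t, with $P(-1, t) = 0$.

$$\frac{dP(x,t)}{dt} = \lambda \, P(x-1,t) + \beta \, (x+1) \, P(x+1,t) - (\lambda + \beta x) \, P(x,t), \qquad x = 0, 1, 2, \ldots$$

There is one such equation for every possible value of x, each coupled to its neighbors x − 1 and x + 1. This is one of the special cases that can be solved: at steady state the distribution is Poisson with mean λ/β,

$$P_{ss}(x) = \frac{(\lambda/\beta)^x \, e^{-\lambda/\beta}}{x!},$$

so the average agrees with the deterministic steady state $x_{ss} = \lambda/\beta$.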
Here, rather than analytically solving the CME, we will focus
on simulating one realization (one sample path, or an example time
trace) of this continuous-time, discrete state Markov process using
an approach known as a Gillespie simulation [23, 31]. This algo-
rithm is very simple to implement and is exact in the sense that
simulated time traces converge to the correct probability distribu-
tion and its moments (average, variance, autocorrelation, etc.)
(Fig. 4a, b). Using this algorithm, we can simulate our genetic
circuits and measure the impact of the stochastic chemistry inside
cells on our circuits for different designs. Continuing with the
example of the repressilator, we will describe and demonstrate
how to implement the Gillespie algorithm to stochastically model
gene regulatory networks.

3.3.1 Stochastic Notation

Similar to the deterministic system, we can write a set of equations
describing the rates for each possible reaction in the system, using
the following notation (see Note 5):

$$x \xrightarrow{\lambda} x + n$$

where n is the change in x value resulting from a reaction and λ is
the average rate of the reaction. In our example of simple birth and
death of a molecule, n = 1 and −1, respectively, but can be 2 in
other cases, such as production in a burst or oligomerization of
molecules. In all cases, n should be an integer, as molecules can only
exist in integer numbers. For the repressilator, the reaction equa-
tions in stochastic notation are as follows:

[Figure 4: (a) time traces of x versus time, approaching x_ss; (b) frequency distribution of x around x_ss; (c) lattice of x values from 0 to 5, with transitions x → x + 1 at rate λ and x → x − 1 at rate β · x]

Fig. 4 Stochastic simulation of a birth and death process. (a) Single time traces
(thin gray lines) simulated using the Gillespie algorithm for a species x, which is
produced at rate λ and degraded at rate β · x. Although each trace is different,
their statistical properties eventually converge to the correct probability distribution
(colored lines) and its moments (e.g., mean and standard deviation). (b)
Once steady state is reached, the probability distribution does not change. (c)
Random walk on a lattice. Starting at a given x value, the system can move to a
value of x + 1 or x − 1 with probabilities that depend on the production and
degradation rate, respectively

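$$m_i \xrightarrow{\frac{\lambda_m K^h}{K^h + P_{i-1}^h}} m_i + 1$$
$$m_i \xrightarrow{\beta_m m_i} m_i - 1$$
$$P_i \xrightarrow{\lambda_P m_i} P_i + 1$$
$$P_i \xrightarrow{\beta_P P_i} P_i - 1$$
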
We will use the reaction rate expressions from these equations
in our Gillespie algorithm.

3.3.2 Simulating a Time-Trace: The Gillespie Algorithm

Instead of analytically solving the CME, we will simulate one realization
of the stochastic process. The idea behind the Gillespie
algorithm is quite simple: we initialize the system to an (arbitrary)
initial value (number of molecules at time t), and then let chemical
reactions happen randomly. To be exact, we need to pick these
reactions from their proper probability distribution, describing
both when the reaction is going to happen and which one will
happen. For example, consider our molecule x from the previous

section (produced with rate λ and degraded with rate β · x), and
imagine our cell has 3 molecules (x = 3) at a particular time point
(t0). The next chemical reaction will either be the production or
degradation of a molecule, either leading to x = 4 or x = 2. The
time until this next reaction happens is also stochastic and depends
on the current state of the system (Fig. 4c).

3.3.3 Gillespie Algorithm: Time to Next Reaction

After assigning an (arbitrary) initial value to all molecular species at
the time zero of our simulation (e.g., x = 3 in Fig. 4c), we will first
calculate the time to the next reaction, given the current state of our
system. We can imagine the system sitting in one state, simply
waiting for the next reaction to occur. We know that the probability
of that chemical reaction happening per time unit is constant over
time, regardless of how long we waited. As an analogy, imagine the
waiting time for rolling black while playing roulette. It does not
matter how long you wait or how many times you have already
rolled red, the probability of falling on black on a given roll is always
the same. Another example is radioactive decay, a stochastic process
where the probability of one nucleus decaying is constant over
time. This property is called memorylessness, because the stochastic
process does not have a “memory” of how long it waited in a state.
The only continuous probability distribution with this property is
the exponential distribution, where T is the time to the first
reaction, λ is the average rate of the reaction, and τ is a given time
interval (see Note 6):

$$P(T > \tau) = e^{-\lambda \tau}$$
$$P(T \le \tau) = 1 - P(T > \tau) = 1 - e^{-\lambda \tau} = F(\tau)$$

This is the cumulative distribution function (CDF) of the time
to the first reaction: if we take the derivative of this function, we get
the exponential probability density function, which gives the probability
density that the time to the first reaction is around τ:

$$p(T = \tau) = \frac{dF(\tau)}{d\tau} = \lambda \cdot e^{-\lambda \tau}$$

In our algorithm, we therefore want to sample from this expo-
nential distribution. Because most programming languages include
a function to generate random numbers uniformly distributed
between 0 and 1, we use the CDF to map the distribution of
interest to a uniform distribution (see Note 7). This process is
referred to as inverting the distribution (inverse transform sampling).
In our example, λ is the
total rate of all reactions in the system: because the reactions are
independent, the rate of any one reaction happening is the sum of
the rates of each reaction in our system ($\lambda_{tot} = \sum_{i=1}^{N} \lambda_i$), where N is
the number of reactions in the system. Because we know the
current number of molecules in the system, we can calculate these

[Figure 5: reaction rates λ1, λ2, λ3, ..., λN laid end to end from 0 to λtot, then normalized to λi/λtot on the interval from 0 to 1, with r2 marking the chosen reaction]

Fig. 5 Choosing a reaction. Rates of all reactions are calculated given the current
system state and normalized by the sum of all possible reactions (λtot), such that
their cumulative sum is 1. Rates are aligned, and a number r2 uniformly
distributed between 0 and 1 is generated, whose value will determine which
of these reactions will occur

rates at that particular timepoint and, using a randomly generated
number (here named r1) between 0 and 1, solve for the time to the
next reaction τ with the equation

$$\tau = \frac{-\ln(1 - r_1)}{\lambda_{tot}}, \quad \text{which we replace with} \quad \tau = \frac{-\ln(r_1)}{\lambda_{tot}}$$

Because r1 is uniformly distributed between 0 and 1, so is 1 − r1, and we
can replace 1 − r1 with r1 to simplify the expression (see Note 8).
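
The sampling step can be sketched in Python as follows. The function name `sample_waiting_time` is hypothetical. One practical detail is an assumption about the random number generator rather than part of the algorithm: Python's `random.random()` returns values in [0, 1), so the sketch keeps the 1 − r1 form to avoid taking the logarithm of zero.

```python
import math
import random


def sample_waiting_time(total_rate, rng=random):
    """Draw the time to the next reaction from an exponential distribution.

    total_rate : sum of all reaction rates (lambda_tot) in the current state
    rng        : any object with a random() method returning a float in [0, 1)
    """
    r1 = rng.random()
    # 1 - r1 lies in (0, 1], so the logarithm is always defined.
    return -math.log(1.0 - r1) / total_rate


# The average of many samples should approach 1 / lambda_tot.
rng = random.Random(0)
samples = [sample_waiting_time(5.0, rng) for _ in range(20000)]
print(sum(samples) / len(samples))
```

Higher total rates give shorter waiting times on average, which is why the simulation takes smaller time steps when many molecules are present.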

3.3.4 Gillespie: Choosing a Reaction

We now know that a reaction happened at time t0 + τ, but we still
don’t know which reaction occurred. Using the rate of each reaction
λi (again using the current state of the system), we first normalize
these rates by λtot, such that if we line them up, their
cumulative sum is 1, thus building a cumulative distribution func-
tion (see Note 9). We now generate a second random number
between 0 and 1 (r2), which will fall somewhere on this line of
normalized reaction rates, determining the reaction that will occur
(Fig. 5). Reactions with higher rates take up more of the space in
the vector, and are therefore chosen with higher probability.
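
A minimal Python sketch of this selection step is shown below. The name `choose_reaction` is hypothetical. Rather than dividing every rate by λtot, the sketch multiplies r2 by λtot, which is equivalent and saves one pass over the rates.

```python
import random


def choose_reaction(rates, rng=random):
    """Return the index of the next reaction, chosen with probability rate / total."""
    total = sum(rates)
    threshold = rng.random() * total   # same as comparing r2 with normalized rates
    cumulative = 0.0
    for index, rate in enumerate(rates):
        cumulative += rate
        if threshold < cumulative:
            return index
    # Floating-point round-off guard: fall back to the last reaction
    # that has a nonzero rate.
    return max(i for i, rate in enumerate(rates) if rate > 0)


# Reactions with rates 1, 3, 0, and 6 should be picked about
# 10%, 30%, 0%, and 60% of the time.
rng = random.Random(1)
counts = [0, 0, 0, 0]
for _ in range(20000):
    counts[choose_reaction([1.0, 3.0, 0.0, 6.0], rng)] += 1
print([c / 20000 for c in counts])
```

A reaction whose rate is zero occupies no space on the line and is therefore never selected, which is what keeps copy numbers from becoming negative.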

3.3.5 Gillespie: Updating the State of the System

Once we know which of the N reactions will occur and the time it
will take, we must update both the time of our simulation and
quantities of all the species in the system. To update the time,
simply add the randomly sampled τ value to the current time. To
update the quantity of reactants, we add or subtract the appropriate
value to each species involved in the randomly picked reaction (see
Note 10). For example, if we chose the transcription of m1 as our
next reaction, we would update the system by adding +1 to the
current value of m1.

3.3.6 Gillespie: Iterating the Algorithm

The steps of the algorithm above are iterated a chosen number of
times (n chemical reactions), with time and quantity of reactants
being updated at each iteration. The number of iterations should be
large enough to properly characterize the resulting time trace. For
example, if you are interested in the statistical properties of your
system around steady state, you should run your simulation far
enough past the time that steady state is reached that you have
sufficient points to sample to calculate these statistics. Note that
different species in your model may evolve at different time scales
and it might take many reactions until you can sufficiently sample
your slow species (this may be computationally challenging). After
running the simulation for n iterations at the parameters listed in
Table 1, our system shows regular oscillations, but with some noise
in the period and amplitude of the oscillations (Fig. 2b). Zooming
in, we can see the discrete production and depletion steps of our
proteins, and the different sized τ intervals in time (Fig. 2c).
The steps of the Gillespie algorithm can be summarized as
follows:
1. Initialize the system at t0 to a chosen set of reactant quantities
2. Calculate all reaction rates ($\lambda_i(x_1, x_2, \ldots)$) and their sum
($\lambda_{tot} = \sum_{i=1}^{N} \lambda_i$), using the current state of the system (quantity
of each reactant, $x_1, x_2, \ldots$, at t0)
3. Use λtot and a randomly generated r1 in (0,1) to calculate time τ
to next reaction using inverse sampling of the exponential
distribution: $\tau = \frac{-\ln(r_1)}{\lambda_{tot}}$
4. Normalize all reaction rates (λi) by λtot and align them. Randomly
generate r2 in (0,1), whose value determines
which reaction happens at time t0 + τ
5. Update system according to chosen reaction, adding or subtracting
the appropriate amount to or from the quantity of each
species involved in the chosen reaction
6. Repeat steps 2–5 n times, updating the state of the system at
each iteration
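
The six steps above can be put together in a short, self-contained Python sketch for the repressilator. This is an illustration written for this section and not the code from our repository: the function names, the initial state, and the seed are assumptions, and the default parameters are the Table 1 estimates in units of protein lifetime.

```python
import math
import random


def repressilator_rates(m, P, lam_m, K, h, beta_m, lam_P, beta_P):
    """Rates of the 12 reactions (4 per gene) for the current copy numbers."""
    rates = []
    for i in range(3):
        rates.append(lam_m * K**h / (K**h + P[i - 1]**h))  # m_i -> m_i + 1
        rates.append(beta_m * m[i])                          # m_i -> m_i - 1
        rates.append(lam_P * m[i])                           # P_i -> P_i + 1
        rates.append(beta_P * P[i])                          # P_i -> P_i - 1
    return rates


def gillespie_repressilator(n_reactions, seed=0, lam_m=150.0, K=7.0, h=2.0,
                            beta_m=3.6, lam_P=65.0, beta_P=1.0):
    rng = random.Random(seed)
    m, P, t = [0, 0, 0], [10, 0, 0], 0.0      # step 1: arbitrary initial state
    times, proteins = [t], [tuple(P)]
    for _ in range(n_reactions):              # step 6: iterate
        rates = repressilator_rates(m, P, lam_m, K, h, beta_m, lam_P, beta_P)
        total = sum(rates)                    # step 2: rates and their sum
        t += -math.log(1.0 - rng.random()) / total   # step 3: time to next reaction
        threshold, cumulative, chosen = rng.random() * total, 0.0, 0
        for j, rate in enumerate(rates):      # step 4: pick the reaction
            cumulative += rate
            if rate > 0.0:
                chosen = j                    # last reaction seen with nonzero rate
            if threshold < cumulative:
                break
        gene, kind = divmod(chosen, 4)        # step 5: update the state
        if kind == 0:
            m[gene] += 1
        elif kind == 1:
            m[gene] -= 1
        elif kind == 2:
            P[gene] += 1
        else:
            P[gene] -= 1
        times.append(t)
        proteins.append(tuple(P))
    return times, proteins


times, proteins = gillespie_repressilator(5000, seed=1)
print(times[-1], proteins[-1])
```

Each iteration is one chemical reaction, so covering several oscillation periods takes far more than the 5000 reactions used here; the short run only demonstrates the mechanics.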

3.3.7 Characterization of Results

After simulating the system for many steps, we can then characterize
its properties. For example, using a long time trace, we can
calculate the probability distribution of the number of molecules at
steady state (P(X = x), Fig. 4b), or moments of the distribution,
such as the mean number of molecules or the fluctuations around
the mean (i.e., variance). The specific measure by which a gene
circuit is characterized will depend on its desired behavior. For
oscillators, the autocorrelation of the protein copy number is a

convenient measure of the quality of the oscillations. The autocorrelation
function represents the correlation coefficient of a trace at
two time points separated by a time lag (ΔτP), and thus should be
equal to 1 after one period for a perfect oscillator. It includes both a
measure of phase drift (how quickly the oscillations become
de-synchronized), as well as amplitude noise (see Note 11).
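
The autocorrelation at a given lag can be estimated with a few lines of Python. The sketch below assumes that the trace has already been sampled at uniform time intervals (Gillespie output has uneven time steps and would need to be resampled first), and it uses one of several common estimators. The function name `autocorrelation` and the sinusoidal test trace are illustrative.

```python
import math


def autocorrelation(trace, lag):
    """Correlation coefficient between trace[k] and trace[k + lag].

    trace : list of values sampled at uniform time intervals
    lag   : time lag expressed as a number of samples
    """
    n = len(trace) - lag
    mean = sum(trace) / len(trace)
    variance = sum((x - mean) ** 2 for x in trace) / len(trace)
    covariance = sum((trace[k] - mean) * (trace[k + lag] - mean)
                     for k in range(n)) / n
    return covariance / variance


# A noiseless sinusoid with a period of 50 samples behaves as a perfect oscillator.
trace = [math.sin(2.0 * math.pi * k / 50.0) for k in range(1000)]
print(autocorrelation(trace, 50))   # one full period
print(autocorrelation(trace, 25))   # half a period
```

For a noisy oscillator, the value after one period falls below 1, and the further it falls, the faster the oscillations lose their phase or vary in amplitude.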

3.3.8 Parameter Scan

As with the deterministic solution to our model, we should assess
how the behavior of the system changes as a function of its parameters.
Here, it is useful to set one parameter equal to 1 and vary
the other parameter values relative to it, which minimizes the number
of parameters to scan. The behavior of the system can then be assessed
across this range of parameters. Typically, we set βP to
1, switching the time units of the simulation to protein lifetimes
(τp) and scaling the value of other rate values accordingly (Table 1,
see Note 1). Quantifying the autocorrelation for a range of para-
meters shows that the stochastic system oscillates over a broader
range of parameters than the deterministic system (Fig. 6a, b),
“smoothing” out the bifurcation transition (Fig. 6b). While the

[Figure 6: (a) Copy Number versus Time (τP) for the deterministic and stochastic systems, with Hill = 2 and Hill = 1.5; (b) heatmaps of the correlation after one period as a function of α and β]

Fig. 6 Scanning the parameter space of the stochastic system. (a) Comparison of deterministic solution and
stochastic simulation of the system with different Hill coefficients. With the chosen parameters, the system
still shows sustained oscillations with low cooperativity, whereas the deterministic solution shows damped
oscillations. (b) Heatmap of the autocorrelation of the time traces of the proteins after one period. Here,
$\alpha = \frac{\lambda_P \lambda_m}{\beta_P \beta_m K}$ and $\beta = \frac{\beta_P}{\beta_m}$. The thick black line indicates the deterministic bifurcation boundary, and the pink dot
corresponds to the parameter values used in simulations in a. As in the deterministic bifurcation calculation,
increasing cooperativity increases the size of the oscillation space. However, in the stochastic regime, it is
possible to maintain oscillations outside the predicted bifurcation boundary, with both high and low cooperativity
constants

recommendations from the deterministic analysis still hold (e.g.,
increasing cooperativity to expand oscillation space), the requirements
are much less stringent in the stochastic system. This tells us
that we have more leeway when choosing parts (and their
corresponding parameters) with which to build and redesign our
circuits than the deterministic analysis would indicate.

3.4 Using Models to Redesign the Circuit: An Example

Now that we know how to simulate our synthetic circuits, we can
incorporate data from the first design to help us understand and
improve their behavior. This process is obviously very specific to
particular circuits, so here we will use our previous experience
redesigning the repressilator as an example. Some recommendations
are general, such as reducing propagation of stochastic noise
(as this can be transmitted between molecules), and we will emphasize
these. It might also be necessary to iterate the design-build-test
loop multiple times, making small changes to the circuit, then
quantifying its properties and analyzing the results. Initially, the
repressilator was designed using the bifurcation diagram in Fig. 3.
The guidelines were therefore to have strong promoters
(to increase α) and high cooperativity, while ensuring that the
proteins’ half-lives were similar to the mRNAs’ (β = 1). Therefore,
repressors that multimerize and bind strongly were chosen, and
they were targeted for fast degradation to reduce their half-lives.
The assembled circuit did indeed oscillate, but its performance
appeared much lower than that of natural oscillators or other subsequently
published synthetic oscillators [32–39].
For the redesign of a circuit, it is crucial that the experimental
data accurately represent the circuit’s behavior. Therefore, for
single-cell dynamic properties such as oscillations, we evaluated
the performance of the circuit using a microfluidic device nick-
named the mother machine [6–8], which enables us to track
thousands of single cells under constant growth conditions for
hundreds of cell divisions. Comparing these data to the original
experiments performed on agar pads (where growth conditions
change rapidly as cells start to compete for nutrients) revealed
that the oscillatory properties appeared much improved, suggesting
that the circuit is sensitive to changes in growth conditions. This
illustrates how separating variability from the environment and
intrinsic noise of a circuit can aid in its redesign, as we can then
change or eliminate components that are highly sensitive to envi-
ronmental noise (e.g., growth conditions). We also observed high
amplitude noise between the peaks of the oscillations (Fig. 7), and
we thus decided to investigate the fluorescent read-out for the oscilla-
tions as a potential source of noise. The original design included
one plasmid for the repressors and another, noisy plasmid carrying
the fluorescent read-out to track oscillations. Therefore, this ampli-
tude noise could be simply an artifact of our measurements.
Indeed, transferring the reporter to the repressilator plasmid

[Figure 7: schematic of the redesign of the repressilator. Panels: original repressilator circuit (repressilator plasmid with pSC101 ori; GFP reporter plasmid with colE1 ori); 1. Precise characterization: analyze single cells under constant growth conditions in the "Mother Machine" (GFP concentration vs. time in units of τP); 2. Identify and eliminate sources of noise: a. amplitude noise, integrate reporter to remove noise from reporter plasmid (YFP concentration vs. time in units of τP); b. noisy decay, model decay to guide redesign: remove degradation to increase peak protein number, and add a TetR titration sponge to increase repression threshold and cooperativity (YFP concentration vs. time in units of τP)]

Fig. 7 Redesigning the repressilator. Outline of steps taken to redesign the repressilator circuit to achieve high
robustness and precision. In step 1, we characterized the circuit in single cells at constant growth rates, which
improved oscillations compared to the original experimental setup. In step 2, we identified and eliminated
intrinsic sources of noise in the circuit. These included variable copy numbers of reporter plasmids (a), low
peak amplitude due to degradation of repressors and apparent low K values of repressors (b). Integrating the
fluorescent reporter onto the repressilator plasmid, removing degradation, and adding a titration sponge
improved precision of the circuit, leading to the most precise performance of a synthetic oscillator to date
greatly reduced peak amplitude noise (Fig. 7). In doing so, we also
made a serendipitous discovery: the fluorescent reporter originally
targeted for degradation interfered with degradation of the repres-
sors, adding noise to the oscillations. This was an example of the
unknowns in biology and emphasizes the need for both experiment
and modeling.
We also observed that the shape of the oscillations was strongly
non-sinusoidal (Fig. 7), which mathematical modeling of our sys-
tem told us was characteristic of very low repression thresholds (K)
(as expected for these strong repressors). In such a regime, the
promoters operate in a switch-like fashion (they are either
completely on or off), and the period can be decomposed into three
subphases, in each of which a repressor decays from its peak value down to
its repression threshold while its production is completely off
(Fig. 7). After P1 decays below its threshold, production of P2 is
derepressed, which will immediately inhibit production of P3 and
initiate its decay phase. The length of the period is thus determined
by the sum of the three decay times, and we can analyze each decay
independently. While this analysis is specific to this circuit, such
pseudo-steady-state analysis, or time scale separation, is a general
technique that can be useful in analyzing many types of circuits. A
detailed analysis of the decay phase showed that two factors were
necessary for precise timing: (1) high peak amplitude, averaging
the timing of the decay over many steps, and (2) relatively high
repression thresholds, as the elimination time of the last few mole-
cules (to fall below a low K value) is very noisy (Fig. 2c), which in
turn causes large variation in period. These recommendations were
implemented by (1) removing protein degradation, thus letting
proteins accumulate to higher numbers (and also possibly remov-
ing a source of noise) and (2) adding decoy binding sites for the
repressors (called a “titration sponge”). These decoy sites (present
in much higher copy numbers than the actual sites) soak up free
repressors, effectively increasing the repression threshold (and
increasing effective cooperativity at the same time) (Fig. 7). This
linear molecular titration [40–44] is a very versatile tool that
enables the tuning of repression curves that would be experimentally
difficult to change otherwise and has been used in a variety of
applications, from timers in natural biological systems [7] to con-
trollers for perfect adaptation [45]. After implementing these
changes, the oscillations of the repressilator were extremely precise,
taking more than 13 periods before accumulating half a period of
phase drift.
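
The link between repression threshold and timing noise can be checked with a short calculation. The sketch below is not the analysis from the original redesign; it assumes a pure first-order death process (each of n molecules is eliminated independently at rate β, with no production), so that the time to fall from a peak of n_peak molecules to a threshold of k molecules is a sum of independent exponential waiting times with rates βn. The function name and the example numbers are our own illustrative choices.

```python
import math

def decay_time_stats(n_peak, k, beta=1.0):
    """Mean and CV of the time for a pure death process to fall from n_peak to k molecules.

    Assumption of this sketch: each molecule is eliminated independently at
    rate beta, so the waiting time at n molecules is exponential with rate beta * n.
    """
    mean = sum(1.0 / (beta * n) for n in range(k + 1, n_peak + 1))
    var = sum(1.0 / (beta * n) ** 2 for n in range(k + 1, n_peak + 1))
    return mean, math.sqrt(var) / mean

# Illustrative numbers only: the peak-to-threshold ratio is kept at 100-fold,
# so the mean decay time stays similar while the absolute threshold K increases.
for n_peak, k in [(100, 1), (1000, 10), (10000, 100)]:
    mean, cv = decay_time_stats(n_peak, k)
    print(f"peak={n_peak:6d}  K={k:4d}  mean decay time={mean:.2f}/beta  CV={cv:.3f}")
```

Under these assumptions, the relative variability (CV) of the decay time drops as the peak and the threshold are raised together, which is the qualitative behavior that motivated removing degradation and adding the titration sponge.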
As demonstrated, mathematical modeling and careful experi-
mental characterization are both critical components of designing
and redesigning synthetic genetic circuits. Models can provide
valuable insights into required parameter values and possible sys-
tem behaviors, and should guide the initial engineering of a circuit.
It is important to carefully characterize this initial circuit
experimentally, isolating variability originating from the growth
environment, reporters or measurements, and other sources of
extrinsic noise from the variability of the circuit itself. These data
can then be used together with continued mathematical modeling
to hypothesize changes to the circuit that could improve its behav-
ior. After testing these changes, the circuit’s behavior can be char-
acterized again and improved iteratively. Here, we provided a brief
guide to building deterministic models and running stochastic
simulations of simple circuits, as these methods are simple and
fast to execute, while being extremely informative. We propose
that incorporating modeling in the design-build-test loop of syn-
thetic biology while pursuing a precision and robustness that rivals
natural circuits will lead to novel insights into the design of natural
and synthetic gene networks.

4 Notes

1. It is often convenient to set one of the parameter values of the
system equal to 1. For example, in the parameter scan discussed
in Subheading 3.3.8, this method minimizes the number of
parameters that we need to scan through. Typically, we choose
the protein elimination rate, setting $\beta_P = 1$, and scaling all
other rate values accordingly. This expresses time in units of the
protein lifetime (and all rate parameters in units of inverse
protein lifetimes). The lifetime has an intuitive
connotation, in both the human realm and the molecular one.
It is a measure of how long molecules “live” in the system on
average before they are eliminated. To illustrate this important
parameter, we will consider a simple system where proteins are
eliminated at a constant rate per molecule (if there is degrada-
tion by protease, it is not saturated), and they are no longer
produced. Therefore,
$$\frac{dP(t)}{dt} = -\beta_P P(t)$$
$$P(t) = P_0 \, e^{-\beta_P t}$$
where $P_0$ is the initial amount of protein, and $\beta_P$ is the decay
rate constant. The number of proteins thus decays exponen-
tially (Fig. 8).
The lifetime is defined as $\tau_P = \beta_P^{-1}$, or the time it takes to decay
to a fraction $e^{-1}$ of the initial amount. It is often more intuitive to think about the half-life of the
protein ($t_{1/2}$), or the time at which half the initial quantity has
decayed. These time constants are simply related by a propor-
tionality constant:
$$e^{-t/\tau_P} = e^{-\ln(2)\, t / t_{1/2}} = 2^{-t/t_{1/2}}$$
$$t_{1/2} = \tau_P \ln(2)$$

[Figure 8: protein (P) versus time (t), showing $P(t) = P_0 e^{-\beta_P t}$ with $t_{1/2} = \ln(2)/\beta_P = \tau_P \ln(2)$ and $\tau_P$ marked on the time axis, and $P_0$, $P_0/2$, and $P_0 e^{-1}$ marked on the protein axis]

Fig. 8 Exponential protein decay. For proteins being eliminated at a constant rate
(with no production), the population will decay exponentially. The half-life (t1/2) is
the time at which the population has reached half of its initial value. The lifetime
(τP) of the protein is the average time that a protein will exist in the system. βP is
the elimination rate constant

The lifetime does not merely represent how quickly mole-
cules are eliminated but is also a natural timescale for the
system, indicating how quickly the system adopts a new steady
state. This is why we usually set the timescale of our simulation
to $\tau_P$. In the case of our example in which proteins are not
actively degraded, we can assume the protein half-life is 25 min,
meaning that $\tau_P = \frac{25\ \text{min}}{\ln(2)} \approx 36$ min.
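
The relation between half-life, lifetime, and the decay curve can be verified numerically. The following is a minimal sketch using the 25 min half-life from the example above; the function names and the initial amount of 1000 proteins are our own illustrative choices.

```python
import math

def lifetime_from_half_life(t_half):
    """Mean lifetime tau = t_half / ln(2) for first-order (exponential) decay."""
    return t_half / math.log(2)

def protein_remaining(p0, beta, t):
    """P(t) = P0 * exp(-beta * t) for elimination at a constant rate per molecule."""
    return p0 * math.exp(-beta * t)

t_half = 25.0                       # minutes, the half-life assumed in the text
tau = lifetime_from_half_life(t_half)
beta = 1.0 / tau                    # elimination rate constant, per minute

print(f"tau_P = {tau:.1f} min")                # about 36 min
print(protein_remaining(1000, beta, t_half))   # half of the initial amount
print(protein_remaining(1000, beta, tau))      # a fraction 1/e of the initial amount
```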
2. The Hill function is a common function in biology that
describes cooperative binding of ligands to a receptor. This
function was originally derived by Hill in 1910 to describe O2
molecules binding to hemoglobin [46]. It is typically written in
the following form:
$$\theta = \frac{x^h}{K^h + x^h}$$
where θ is the fraction of receptor sites bound by ligand, x is the concentra-
tion of free ligand, K is the concentration of ligand at which
half of the sites are bound, and h is the cooperativity
coefficient. h determines the nonlinearity of this function:
higher h values make the curve sharper (Fig. 9). The Hill
function has been used extensively in modeling of unknown
activation or repression functions, due to its flexibility in
describing nonlinear, sigmoidal functions (note that h can
be non-integer). In our model, we use 1 − θ to calculate the
transcription rate of a gene based on the fraction of unbound
promoters available for expression.
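
As a concrete illustration, the Hill function and the resulting repressed transcription rate can be written in a few lines. This is a minimal sketch: the function names and the numerical values of λm and K are placeholders of our own, and the h values are those shown in Fig. 9.

```python
def hill_bound(x, K, h):
    """Fraction bound, theta = x^h / (K^h + x^h)."""
    return x**h / (K**h + x**h)

def transcription_rate(repressor, lambda_m, K, h):
    """Transcription rate lambda_m * (1 - theta) for a gene repressed with Hill kinetics."""
    return lambda_m * (1.0 - hill_bound(repressor, K, h))

K = 10.0          # placeholder repression threshold (molecules)
lambda_m = 4.0    # placeholder maximal transcription rate
for h in (1, 1.5, 3, 6):
    # Below K the promoter is mostly on, above K mostly off; higher h sharpens the switch.
    rates = [transcription_rate(p, lambda_m, K, h) for p in (0.5 * K, K, 2 * K)]
    print(h, [round(r, 3) for r in rates])
```

At a repressor level equal to K the rate is half-maximal for every h, while the rates at 0.5K and 2K move toward fully on and fully off as h increases.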

[Figure 9: fraction of repressors bound versus [repressors], plotted for h = 1, h = 1.5, h = 3, and h = 6, with 1/2 marked on the vertical axis and K on the horizontal axis]

Fig. 9 Hill function. The Hill function is used in our model to describe binding of repressors
to their promoters. Transcription rate is calculated as a function of the number of
unbound promoters (available for expression). As the cooperativity coefficient
h increases, the transition between a gene being fully unbound (expressed with
rate λm) and fully bound (repressed) becomes sharper

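3. Many values provided on BioNumbers may not be in the exact
units required for your model but can easily be adapted for use.
For example, we can change translation rate from
aa mRNA$^{-1}$ s$^{-1}$ to proteins mRNA$^{-1}$ min$^{-1}$ as follows:
$$\frac{8\ \text{aa}}{1\ \text{mRNA} \cdot \text{s}} \times \frac{60\ \text{s}}{\text{min}} \times \frac{3\ \text{bp}}{\text{aa}} \times \frac{\text{protein}}{821\ \text{bp}} = \frac{1.75\ \text{protein}}{\text{mRNA} \cdot \text{min}}$$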
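
The same conversion can be scripted so that it is easy to repeat for other genes. This is a minimal sketch with the numbers from the example above; the variable names are our own.

```python
aa_per_mrna_per_s = 8     # translation rate from the example, amino acids per mRNA per second
bp_per_aa = 3             # one codon per amino acid
gene_length_bp = 821      # length of the coding sequence used in the example

# aa/(mRNA s) * s/min * bp/aa / (bp/protein) = proteins/(mRNA min)
proteins_per_mrna_per_min = aa_per_mrna_per_s * 60 * bp_per_aa / gene_length_bp
print(round(proteins_per_mrna_per_min, 2))  # 1.75
```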
4. While a detailed description of linear stability analysis is outside
the scope of this chapter, there are many helpful resources
available. Two helpful texts that cover this topic are Strogatz’s
Non-linear Dynamics and Chaos: With Applications to Physics,
Biology, Chemistry and Engineering [47], and Epstein and Poj-
man’s book An Introduction to Non-linear Chemical Dynam-
ics: Oscillations, Waves, Patterns and Chaos [48].
5. Note that in stochastic notation, the arrows in the reaction
equations have a different meaning than in the mass action
equations in Subheading 3.1.2. Here, rather than a transfor-
mation from one product to another, an arrow means a change
in quantity of a reactant. Note also that in this notation, the full
rate of this change is indicated above the arrow (without the
implicit multiplication by the left-hand side). Finally, note that
in a stochastic simulation, we measure numbers rather than
concentration of reactants. Here, if we wanted to measure
concentration, we would need to include cell growth and
division in our stochastic equations, tracking the cell’s volume
over time. Instead, we approximate this process by adding
constant dilution of our molecules, and tracking absolute num-
bers of molecules in a “typical” cell volume over time.
6. The memorylessness property is defined as:
$$P(T > t + s \mid T > t) = P(T > s)$$
where t and s are positive real numbers. This means that the
conditional probability that the time to the first event is greater
than t + s, knowing that it is greater than t, is equal to the
probability that the time is greater than s. In other words, it
does not matter how long you have already waited. We can
show that the exponential distribution satisfies this property
using the definition of conditional probability ($P(A \mid B)\,P(B) = P(A \text{ and } B)$):
$$P(T > t + s \mid T > t) = \frac{P(T > t + s \text{ and } T > t)}{P(T > t)} = \frac{P(T > t + s)}{P(T > t)} = P(T > s)$$
because if T > t + s, then T is necessarily greater than t.
Substituting the exponential distribution, for which $P(T > t) = e^{-\lambda t}$, satisfies this equality:
$$\frac{e^{-\lambda (t + s)}}{e^{-\lambda t}} = e^{-\lambda s}$$
Note that it is possible to show that the exponential is the
only continuous distribution to show this property, but we do not include
the proof here, as it is not particularly pedagogical.
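
For readers who prefer a numerical check, the equality above can be evaluated directly. This is a minimal sketch; the rate and the two times are arbitrary illustrative values.

```python
import math

def survival(t, lam):
    """P(T > t) for an exponential waiting time with rate lam."""
    return math.exp(-lam * t)

lam, t, s = 0.3, 2.0, 5.0   # arbitrary illustrative values

# P(T > t + s | T > t) = P(T > t + s) / P(T > t)
conditional = survival(t + s, lam) / survival(t, lam)

# Both numbers agree: having already waited t does not change the outlook.
print(conditional, survival(s, lam))
```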
7. Inverse transform sampling is easier to understand visually
(Fig. 10). Intuitively, the idea is to map the uniform probability
density to the CDF. If we compare regions where the slope of
the CDF (the probability density function, since $p(T = t) = \frac{dF(t)}{dt}$) is different, we see that the regions with higher slope
will take up a higher proportion of the uniform distribution
and will therefore have a higher probability. This can be proven
mathematically: if U is a uniform random variable between
0 and 1, then $F^{-1}(U)$ has F(x) as its CDF:
$$P\left(F^{-1}(U) \le x\right) = P(U \le F(x)) = F(x)$$
where we applied F on both sides of the inequality and then used the CDF of
the uniform distribution ($P(U \le x) = x$ for x between 0 and 1).
8. We realize it is a bit of a jump between these equations. For
those who like to see the steps, they are as follows:
$$r_1 = 1 - e^{-\lambda_{tot} \tau}$$

[Figure 10: plot of $P(T \le t)$ against time (t, from 0 to 100), with 0 and 0.5 marked on the probability axis; interval labels on the plot: 15%, 15%, 4.2%, 27.8%]

Fig. 10 Intuition for inverse transform sampling. Equal probabilities (15%) on
the uniform distribution are mapped to different probabilities on the CDF, where
the higher slopes (corresponding to the PDF) correspond to higher probabilities

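$$e^{\lambda_{tot} \tau} = \frac{1}{1 - r_1}$$
Remember that if $y = e^x$, then $x = \ln(y)$. Therefore:
$$\lambda_{tot} \, \tau = \ln\left(\frac{1}{1 - r_1}\right)$$
Also remember that $\ln(x^y) = y \cdot \ln(x)$:
$$\tau = \frac{-\ln(1 - r_1)}{\lambda_{tot}}$$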
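
The final expression is what is used to draw waiting times in practice. The sketch below, with an arbitrary illustrative value of λtot and our own function name, draws many waiting times by inverse transform sampling and compares their mean with the expected value 1/λtot.

```python
import math
import random

def sample_waiting_time(lambda_tot, rng):
    """Draw tau = -ln(1 - r1) / lambda_tot, with r1 uniform in [0, 1)."""
    r1 = rng.random()
    return -math.log(1.0 - r1) / lambda_tot

rng = random.Random(0)        # fixed seed so the run is repeatable
lambda_tot = 2.0              # arbitrary illustrative total rate
samples = [sample_waiting_time(lambda_tot, rng) for _ in range(100_000)]

mean = sum(samples) / len(samples)
print(mean)                   # should be close to 1 / lambda_tot = 0.5
```

Because `rng.random()` never returns exactly 1, the argument of the logarithm is always positive.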
9. Gillespie in practice: rate vector function. When setting up the
Gillespie algorithm, there are a few tricks that make things
more efficient and cleaner. For example, after writing the entire
set of equations, it is helpful to assemble the rate definitions
for each equation into a vector, and build this vector into a
function that accepts current reactant values as an input (called
the rate vector function, or rvf). This allows us to easily calcu-
late the individual rates of all reactions for a given system state
and to quickly sum the rates to calculate $\lambda_{tot}$. In the case of the
repressilator, the rate vector function is defined as:
$$\mathrm{rvf}(m, P) = \left[ \frac{\lambda_m K^h}{K^h + P_3^h},\ \beta_m m_1,\ \lambda_P m_1,\ \beta_P P_1,\ \frac{\lambda_m K^h}{K^h + P_1^h},\ \beta_m m_2,\ \lambda_P m_2,\ \beta_P P_2,\ \frac{\lambda_m K^h}{K^h + P_2^h},\ \beta_m m_3,\ \lambda_P m_3,\ \beta_P P_3 \right]$$
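
One possible way to write this function in code is sketched below. The state ordering (m1, P1, m2, P2, m3, P3) follows the columns of Table 2; the argument names and the numerical values in the usage example are placeholders of our own, not the parameters used in the chapter.

```python
def rvf(state, lambda_m, lambda_p, beta_m, beta_p, K, h):
    """Rate vector for the 12 repressilator reactions, in the order of Table 2."""
    m1, P1, m2, P2, m3, P3 = state

    def repressed_transcription(P):
        # lambda_m * K^h / (K^h + P^h): transcription repressed by protein P
        return lambda_m * K**h / (K**h + P**h)

    return [
        repressed_transcription(P3), beta_m * m1, lambda_p * m1, beta_p * P1,  # gene 1
        repressed_transcription(P1), beta_m * m2, lambda_p * m2, beta_p * P2,  # gene 2
        repressed_transcription(P2), beta_m * m3, lambda_p * m3, beta_p * P3,  # gene 3
    ]

# Placeholder state and parameters, for illustration only
state = [1, 0, 2, 0, 3, 10]
rates = rvf(state, lambda_m=4.0, lambda_p=2.0, beta_m=0.5, beta_p=1.0, K=10.0, h=2.0)
lambda_tot = sum(rates)       # total rate used to draw the next waiting time
print(rates, lambda_tot)
```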
10. Gillespie in practice: Stoichiometry matrix. To easily determine
the value to add or subtract to each species in the event that a
given reaction occurs, we build a stoichiometry matrix for our
system. This is an M by N matrix, where M is the number of
species/reactants in the system, and N is the number of possi-
ble reactions. Each row of the matrix corresponds to one
reactant, and each column gives the change in quantity of
each reactant, should a given reaction occur. As an example,
see the stoichiometry matrix of the repressilator (Table 2,
printed in transposed form, with one row per reaction and one
column per species).
Column i corresponding to the chosen reaction will give the
values to be added or subtracted from each reactant and can be
used to directly update a vector containing the current values of
each species.
Note that in the case of the repressilator, the matrix is quite
simple, as all reactions are independent and lead to an increase
or decrease of 1 molecule. Matrices can be more complex,
especially for systems with coupled reactions. For example,
consider a system in which monomers of molecule y are pro-
duced at rate $\lambda_y$ and dimerize irreversibly to form molecule Y at
rate $\lambda_Y$. For this system, our reaction equations would be
written as
$$y \xrightarrow{\lambda_y} y + 1$$
$$(y, Y) \xrightarrow{\lambda_Y y^2} (y - 2,\ Y + 1)$$

Table 2
Stoichiometry matrix for the repressilator model (one row per reaction, one column per species)

Reaction        m1   P1   m2   P2   m3   P3
m1 → m1 + 1      1    0    0    0    0    0
m1 → m1 − 1     −1    0    0    0    0    0
P1 → P1 + 1      0    1    0    0    0    0
P1 → P1 − 1      0   −1    0    0    0    0
m2 → m2 + 1      0    0    1    0    0    0
m2 → m2 − 1      0    0   −1    0    0    0
P2 → P2 + 1      0    0    0    1    0    0
P2 → P2 − 1      0    0    0   −1    0    0
m3 → m3 + 1      0    0    0    0    1    0
m3 → m3 − 1      0    0    0    0   −1    0
P3 → P3 + 1      0    0    0    0    0    1
P3 → P3 − 1      0    0    0    0    0   −1

Table 3
Example stoichiometry matrix for a system with coupled reactions

Species   Production                Dimerization
          y → y + 1 (rate λy)       (y, Y) → (y − 2, Y + 1) (rate λY y²)
y          1                        −2
Y          0                         1

The stoichiometry matrix for this system would be written
as in Table 3.
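
To show how the stoichiometry matrix is used inside the simulation loop, the sketch below runs the Gillespie algorithm on the small production and dimerization system of Table 3. It is a minimal illustration, not the code used for the repressilator: the rate constants are arbitrary, the function names are our own, and we add a guard (not in the text) that switches dimerization off when fewer than two monomers are present.

```python
import math
import random

# Rows are species (y, Y); columns are reactions (production, dimerization), as in Table 3.
STOICH = [
    [1, -2],  # change in monomer y
    [0, 1],   # change in dimer Y
]

def rates(state, lam_y, lam_Y):
    """Rate vector for the two reactions, using the rate expressions from the text."""
    y, _ = state
    # Guard added for this sketch: dimerization needs at least two monomers.
    dimerization = lam_Y * y**2 if y >= 2 else 0.0
    return [lam_y, dimerization]

def gillespie_step(state, t, lam_y, lam_Y, rng):
    """Advance the system by one reaction; return (new_state, new_time, reaction_index)."""
    r = rates(state, lam_y, lam_Y)
    lam_tot = sum(r)
    tau = -math.log(1.0 - rng.random()) / lam_tot  # waiting time to the next reaction
    # Choose reaction i with probability r[i] / lam_tot.
    threshold = rng.random() * lam_tot
    cumulative = 0.0
    chosen = len(r) - 1
    for i, rate in enumerate(r):
        cumulative += rate
        if threshold < cumulative:
            chosen = i
            break
    # Column `chosen` of the stoichiometry matrix updates every species at once.
    new_state = [n + STOICH[s][chosen] for s, n in enumerate(state)]
    return new_state, t + tau, chosen

rng = random.Random(1)      # fixed seed so the run is repeatable
state, t = [0, 0], 0.0      # start with no monomers and no dimers
for _step in range(1000):
    state, t, _reaction = gillespie_step(state, t, lam_y=10.0, lam_Y=0.1, rng=rng)
print(state, round(t, 2))
```

A quick sanity check on such a simulation is a conservation rule: every production adds one monomer and every dimerization converts two monomers into one dimer, so after n reactions y + 3Y must equal n.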
11. Gillespie in practice: Resampling time trace data. Because τ
varies between reactions, the time trace output by the Gillespie
algorithm will have points separated by non-uniform time
steps. For further analysis (for example, to calculate the auto-
correlation), it is often helpful to resample the data at regular
time intervals. To do this, we sample the number of reactants at
every tresample time interval of the output, such that the length
of our final resampled data matrix will be tmax/tresample, where
tmax is the time length of the simulation. tresample should be chosen
small enough that the resampled trace still captures the system’s behavior.
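
One simple way to implement this resampling is a zero-order hold: at each regular time point, take the state set by the most recent reaction. The sketch below assumes that the trace starts at time 0 and that tmax is a multiple of tresample; the function name and the toy trace are our own.

```python
from bisect import bisect_right

def resample(times, values, t_resample, t_max):
    """Resample an event-driven trace at regular intervals (zero-order hold).

    times  : sorted reaction times, starting at 0
    values : molecule number right after each reaction time
    Returns t_max / t_resample samples taken at 0, t_resample, 2 * t_resample, ...
    """
    n_points = int(round(t_max / t_resample))
    resampled = []
    for k in range(n_points):
        t = k * t_resample
        idx = bisect_right(times, t) - 1     # last reaction at or before t
        resampled.append(values[max(idx, 0)])
    return resampled

# Toy trace with non-uniform time steps, for illustration only
times = [0.0, 0.4, 1.1, 2.7, 3.0]
values = [5, 6, 5, 4, 5]
print(resample(times, values, t_resample=1.0, t_max=4.0))  # [5, 6, 5, 5]
```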

References
1. Elowitz MB, Leibler S (2000) A synthetic oscillatory network of transcriptional regulators. Nature 403:335–338. https://doi.org/10.1038/35002125
2. Gardner TS, Cantor CR, Collins JJ (2000) Construction of a genetic toggle. Nature 403:339–342. https://doi.org/10.1038/35002131
3. Vilar JMG, Kueh HY, Barkai N, Leibler S (2002) Mechanisms of noise-resistance in genetic oscillators. Proc Natl Acad Sci U S A 99:5988–5992. https://doi.org/10.1073/pnas.092133899
4. McKane AJ, Newman TJ (2005) Predator-prey cycles from resonant amplification of demographic stochasticity. Phys Rev Lett 94:218102. https://doi.org/10.1103/PhysRevLett.94.218102
5. Potvin-Trottier L, Lord ND, Vinnicombe G, Paulsson J (2016) Synchronous long-term oscillations in a synthetic gene circuit. Nature 538:514–517. https://doi.org/10.1038/nature19841
6. Wang P, Robert L, Pelletier J et al (2010) Robust growth of Escherichia coli. Curr Biol 20:1099–1103. https://doi.org/10.1016/j.cub.2010.04.045
7. Norman TM, Lord ND, Paulsson J, Losick R (2013) Memory and modularity in cell-fate decision making. Nature 503:481–486. https://doi.org/10.1038/nature12804
8. Potvin-Trottier L, Luro S, Paulsson J (2018) Microfluidics and single-cell microscopy to study stochastic processes in bacteria. Curr Opin Microbiol 43:186–192. https://doi.org/10.1016/j.mib.2017.12.004
9. Alon U (2007) An introduction to systems biology: design principles of biological circuits. Chapman & Hall/CRC, Boca Raton, FL
10. Phillips R, Kondev J, Theriot J et al (2013) Physical biology of the cell, 2nd edn. Garland Science, New York, NY
11. Munsky B, Hlavacek WS, Tsimring LS (2018) Quantitative biology: theory, computational methods, and models. MIT Press, Cambridge, MA
12. Ingalls B (2013) Mathematical modelling in systems biology: an introduction. MIT Press, Cambridge, MA
13. Bialek WS (2012) Biophysics: searching for principles. Princeton University Press, Princeton, NJ
14. Wikipedia (2019) Gillespie algorithm. https://en.wikipedia.org/wiki/Gillespie_algorithm
15. Kernst OK (2015) Gillespie’s stochastic simulation algorithm for chemical reactions. In: Wolfram Alpha Demonstr. https://demonstrations.wolfram.com/GillespiesStochasticSimulationAlgorithmForChemicalReactions/
16. Hilfinger A, Norman TM, Paulsson J (2016) Exploiting natural fluctuations to identify kinetic mechanisms in sparsely characterized systems. Cell Syst 2:251–259. https://doi.org/10.1016/j.cels.2016.04.002
17. Hilfinger A, Norman TM, Vinnicombe G, Paulsson J (2016) Constraints on fluctuations in sparsely characterized biological systems. Phys Rev Lett 116:058101. https://doi.org/10.1103/PhysRevLett.116.058101
18. Milo R, Jorgensen P, Moran U et al (2010) BioNumbers--the database of key numbers in molecular and cell biology. Nucleic Acids Res 38:D750–D753. https://doi.org/10.1093/nar/gkp889
19. Milo R, Phillips R (2016) Cell biology by the numbers. Garland Science, New York, NY
20. Guet CC, Bruneaux L, Min TL et al (2008) Minimally invasive determination of mRNA concentration in single living bacteria. Nucleic Acids Res 36:e73. https://doi.org/10.1093/nar/gkn329
21. Siwiak M, Zielenkiewicz P (2013) Transimulation - protein biosynthesis web service. PLoS One 8:e73943. https://doi.org/10.1371/journal.pone.0073943
22. Elowitz MB (1999) Transport, assembly, and dynamics in systems of interacting proteins. PhD Thesis. Princeton University, Princeton, NJ
23. El Samad H, Khammash M, Petzold L, Gillespie D (2005) Stochastic modelling of gene regulatory networks. Int J Robust Nonlinear Control 15:691–711. https://doi.org/10.1002/rnc.1018
24. Paulsson J (2005) Models of stochastic gene expression. Phys Life Rev 2:157–175. https://doi.org/10.1016/j.plrev.2005.03.003
25. Elowitz MB, Levine AJ, Siggia ED, Swain PS (2002) Stochastic gene expression in a single cell. Science 297:1183–1186. https://doi.org/10.1126/science.1070919
26. Ozbudak EM, Thattai M, Kurtser I et al (2002) Regulation of noise in the expression of a single gene. Nat Genet 31:69–73. https://doi.org/10.1038/ng869
27. Raj A, Van Oudenaarden A (2009) Single-molecule approaches to stochastic gene expression. Annu Rev Biophys 38:255–270. https://doi.org/10.1146/annurev.biophys.37.032807.125928
28. Paulsson J (2004) Summing up the noise in gene networks. Nature 427:415–418. https://doi.org/10.1038/nature02257
29. McQuarrie DA (1967) Stochastic approach to chemical kinetics. J Appl Probab 4:413–478. https://doi.org/10.2307/3212214
30. van Kampen NG (2007) Stochastic processes in physics and chemistry, 3rd edn. Elsevier, Amsterdam
31. Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. J Phys Chem 81:2340–2361. https://doi.org/10.1021/j100540a008
32. Mihalcescu I, Hsing W, Leibler S (2004) Resilient circadian oscillator revealed in individual cyanobacteria. Nature 430:81–85. https://doi.org/10.1038/nature02533
33. Teng S-W, Mukherji S, Moffitt JR et al (2013) Robust circadian oscillations in growing cyanobacteria require transcriptional feedback. Science 340:737–740. https://doi.org/10.1126/science.1230996
34. Chabot JR, Pedraza JM, Luitel P, van Oudenaarden A (2007) Stochastic gene expression out-of-steady-state in the cyanobacterial circadian clock. Nature 450:1249–1252. https://doi.org/10.1038/nature06395
35. Stricker J, Cookson S, Bennett MR et al (2008) A fast, robust and tunable synthetic gene oscillator. Nature 456:516–519. https://doi.org/10.1038/nature07389
36. Tigges M, Dénervaud N, Greber D et al (2010) A synthetic low-frequency mammalian oscillator. Nucleic Acids Res 38:2702–2711. https://doi.org/10.1093/nar/gkq121
37. Danino T, Mondragón-Palomino O, Tsimring L, Hasty J (2010) A synchronized quorum of genetic clocks. Nature 463:326–330. https://doi.org/10.1038/nature08753
38. Mondragón-Palomino O, Danino T, Selimkhanov J et al (2011) Entrainment of a population of synthetic genetic oscillators. Science 333:1315–1319. https://doi.org/10.1126/science.1205369
39. Prindle A, Selimkhanov J, Li H et al (2014) Rapid and tunable post-translational coupling of genetic circuits. Nature 508(7496):387–391. https://doi.org/10.1038/nature13238
40. Buchler NE, Louis M (2008) Molecular titration and ultrasensitivity in regulatory networks. J Mol Biol 384:1106–1119. https://doi.org/10.1016/j.jmb.2008.09.079
41. Buchler NE, Cross FR (2009) Protein sequestration generates a flexible ultrasensitive response in a genetic network. Mol Syst Biol 5:272. https://doi.org/10.1038/msb.2009.30
42. Genot AJ, Fujii T, Rondelez Y (2012) Computing with competition in biochemical networks. Phys Rev Lett 109:1–5. https://doi.org/10.1103/PhysRevLett.109.208102
43. Lee T-H, Maheshri N (2012) A regulatory role for repeated decoy transcription factor binding sites in target gene expression. Mol Syst Biol 8:576. https://doi.org/10.1038/msb.2012.7
44. Brewster RC, Weinert FM, Garcia HG et al (2014) The transcription factor titration effect dictates level of gene expression. Cell 156:1312–1323. https://doi.org/10.1016/j.cell.2014.02.022
45. Lillacci G, Aoki SK, Gupta A et al (2019) A universal rationally-designed biomolecular integral feedback controller for robust perfect adaptation. Nature 570:533–537. https://doi.org/10.1038/s41586-019-1321-1
46. Hill A (1910) The possible effects of the aggregation of the molecules of haemoglobin on its oxygen dissociation curve. J Physiol 40:4–7
47. Strogatz S (2015) Nonlinear dynamics and chaos: with applications to physics, biology, chemistry, and engineering, 2nd edn. Westview Press, Boulder, CO
48. Epstein IR, Pojman JA (1998) An introduction to nonlinear chemical dynamics: oscillations, waves, patterns, and chaos. Oxford University Press, New York, NY
49. Weiße AY, Oyarzún DA, Danos V et al (2015) Mechanistic links between cellular trade-offs, gene expression, and growth. Proc Natl Acad Sci U S A 112:E1038–E1047. https://doi.org/10.1073/pnas.1416533112
50. Niederholtmeyer H, Sun ZZ, Hori Y et al (2015) Rapid cell-free forward engineering of novel genetic ring oscillators. elife 4:1–18. https://doi.org/10.7554/elife.09771
51. Taniguchi Y, Choi PJ, Li G-W et al (2010) Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329:533–538. https://doi.org/10.1126/science.1188308

Chapter 4

Automated Biocircuit Design with SYNBADm


Irene Otero-Muras and Julio R. Banga

Abstract
SYNBADm is a Matlab toolbox for the automated design of biocircuits using a model-based optimization
approach. It enables the design of biocircuits with pre-defined functions starting from libraries of biological
parts. SYNBADm makes use of mixed integer global optimization and allows both single and multi-
objective design problems. Here we describe a basic protocol for the design of synthetic gene regulatory
circuits. We illustrate step-by-step how to solve two different problems: (1) the (single objective) design of a
synthetic oscillator and (2) the (multi-objective) design of a circuit with switch-like behavior upon
induction, with a good compromise between performance and protein production cost.

Key words Automated design, Biological parts, Global optimization, Mixed Integer Nonlinear
Programming, Multi-objective optimization, Trade-offs, Synthetic biology

1 Introduction

Synthetic biology aims to provide a framework for the rational
bottom-up engineering of biocircuits with a priori defined func-
tionalities. Computational tools can play a major role in the design
of these synthetic biosystems [1]. The challenge is to map
sequence to function in a predictable manner [2] such that, given a
function of interest, we obtain the DNA sequence encoding the
transcriptional circuit to be implemented in cells to execute it. One
approach to automated design is inspired by the design of electronic
circuits, and relies on truth tables and logic gates (see [3] and
references therein). A second main approach is based on continu-
ous dynamic models, usually with mechanistic meaning (see [4] and
references therein). Both approaches aim at meeting the principles
of modularity, orthogonality, predictability, and reliability (enum-
erated by Xiang et al. [5] as key principles of automated design).
SYNBADm [4] belongs to the second family of methods, and
combines continuous dynamic simulation with advanced mixed
integer optimization solvers. A key novelty of this toolbox is that
it can handle multi-objective design problems, i.e., those
with conflicting design criteria. For these situations,

Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229,
https://doi.org/10.1007/978-1-0716-1032-9_4, © Springer Science+Business Media, LLC, part of Springer Nature 2021

SYNBADm provides the set of best trade-offs between the objec-
tives (usually called the Pareto-optimal set).
In the next section we describe the formalism and methods
used in this software. In Subheading 3 we indicate how to install
and initialize SYNBADm. In Subheadings 4 and 5 we describe how
to design an oscillator (single objective design) and a circuit with
switch-like behavior (multi-objective design), respectively.

2 Methods

SYNBADm combines a dynamic modeling formalism with model-
based design methods using global optimization. These two main
components are described below.

2.1 The Modeling Framework

SYNBADm is based on continuous dynamic descriptions of the
behavior of biological circuits. In particular, it uses models based
on nonlinear deterministic ordinary differential equations (ODEs).
The dynamics of gene regulatory networks are internally encoded
by a superstructure of the form:
$$\frac{dc}{dt} = f(c, w, y, k), \qquad (1)$$
where c is the vector of species concentrations, w is a vector of
tunable (real) parameters, y is a vector of binary variables that
describes the topology of the circuit, and k is a vector of fixed
parameters.
Given a specific library of components and user-defined design
objective(s) and specifications, SYNBADm automatically generates
the corresponding superstructure(s) that optimize(s) the objectives
(s) and is compatible with the specifications. Currently, SYNBADm
allows us to use two classes of libraries, denoted as Mass Action and
Hill-type libraries, respectively. The user decides which type of
library is more convenient depending on the desired level of
model granularity. The superstructure in Eq. 1 corresponding to a
mass action library is a plain mass action kinetic model describing
the dynamics of genes, intermediates, mRNAs, and proteins. The
superstructure for a Hill library, also in the form of Eq. 1, is a
reduced model describing only the dynamics of the proteins with
Hill kinetics. In both cases, a library consists of the following
elements:
• a set of promoters (P),
• a set of protein Coding Sequences (CDS),
• a set of Inducers (I),
• CDS to P relations (which promoter is affected by each
transcript),
• I to CDS relations (which transcript is affected by each inducer).

Biocircuit Design with SYNBADm

[Figure panels: Library of components; Devices; System]
Fig. 1 Biological components: Promoters, Ribosome Binding Site (RBS), Protein Coding Sequences (CDS), and
Terminator (Ter); Devices and Systems


To illustrate how the vector y of binary variables encodes the
structure of a particular system (biocircuit), we use the example in
Fig. 1. In this case we have three different promoters and five
coding sequences. This gives a total of 15 different devices. These
15 devices are internally labeled from 1 to 15. In this way, the
structure of any circuit (which is a combination of devices) is
given by a vector of 15 binary variables. If a device j is part of the
circuit, then the corresponding component is y_j = 1 (being zero
otherwise). The system in the figure is given, in terms of
structure, by the vector y = [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1], or equivalently by a 5 × 3 matrix where the entry Y_{i,k}
corresponds to a device with promoter k and coding sequence i:

\[ Y = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{2} \]
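
The equivalence between the vector and the matrix form can be checked with a few lines of MATLAB. This is a minimal sketch which assumes that devices are numbered column by column (all coding sequences for promoter 1, then promoter 2, and so on), the ordering implied by the example above; the variable names are illustrative and are not part of SYNBADm.

```matlab
% Topology vector from the example: 3 promoters x 5 coding sequences = 15 devices
y = [0 1 1 0 0 0 0 0 0 0 0 0 0 0 1];
nCDS  = 5;                       % number of coding sequences (rows of Y)
nProm = 3;                       % number of promoters (columns of Y)

% Assumed labeling: device j = i + nCDS*(k - 1) for coding sequence i, promoter k
Y = reshape(y, nCDS, nProm);

% List the active devices as (promoter, coding sequence) pairs
[cds, prom] = find(Y);
for d = 1:numel(cds)
    fprintf('device %d: promoter P%d with coding sequence %d\n', ...
            cds(d) + nCDS*(prom(d) - 1), prom(d), cds(d));
end
```

For the example vector this lists devices 2, 3, and 15, which are the nonzero entries of y.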

2.2 Design as an Optimization Problem

In the framework described above, a biocircuit is completely characterized
by a vector of integer (and/or binary) variables (defining
the topology) and a vector of real variables (the tunable parameters
of the parts). The design problem is formulated as follows: given a
set (library) of components, a design objective, and a set of design
specifications, find the optimal biocircuit given by the connectivity
of a certain subset of parts and the value of their tunable parameters.
In other words, SYNBADm performs a simultaneous
global search in topology and parameter spaces by means of an
optimization algorithm and a mixed integer description of the
biocircuit dynamics.
The toolbox allows single and multi-objective formulations. In
the latter case, instead of a single solution, SYNBADm gives a
Pareto-optimal set of solutions (i.e. the best trade-offs between
the objectives). From a mathematical point of view, the design
problem is formulated as a mixed integer multi-objective dynamic
optimization problem. Technical details can be found in [6].
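
For orientation, one generic way of writing such a problem with the symbols of Eq. 1 is sketched below. The objective functions J_1 and J_2, the constraint function g, the bounds w^L and w^U, the initial state c_0, and the dimension n_y are placeholder names introduced here for illustration; the exact formulation used by SYNBADm is the one given in [6].

```latex
\[
\begin{aligned}
\min_{w,\,y}\quad & \bigl(J_1(c, w, y),\; J_2(c, w, y)\bigr) \\
\text{subject to}\quad & \frac{dc}{dt} = f(c, w, y, k), \qquad c(t_0) = c_0, \\
& g(c, w, y) \le 0, \\
& w^L \le w \le w^U, \qquad y \in \{0, 1\}^{n_y}.
\end{aligned}
\]
```

A single objective design problem corresponds to keeping only J_1.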
In addition to model-based optimal design, SYNBADm can
also run dynamic simulations, i.e. given a model of a biocircuit and
certain initial conditions, obtain its time-dependent behavior by
solving the corresponding initial value problem.
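
As an illustration of what solving an initial value problem involves, the following sketch integrates a one-species model with MATLAB's ode45. The model (constant production, first-order degradation) and its parameter values are hypothetical and were chosen only because the exact solution is known; SYNBADm itself integrates the models generated from its libraries.

```matlab
% Hypothetical one-species model: dc/dt = alpha - kdeg*c, c(0) = c0
alpha = 1.2;      % production rate (illustrative value)
kdeg  = 0.05;     % first-order degradation rate constant (illustrative value)
c0    = 0;        % initial concentration
tspan = 0:1:200;  % time points at which the solution is reported

rhs  = @(t, c) alpha - kdeg*c;
opts = odeset('RelTol', 1e-7, 'AbsTol', 1e-7);
[t, c] = ode45(rhs, tspan, c0, opts);

% Exact solution for comparison: c(t) = alpha/kdeg + (c0 - alpha/kdeg)*exp(-kdeg*t)
cExact = alpha/kdeg + (c0 - alpha/kdeg)*exp(-kdeg*t);
maxErr = max(abs(c - cExact));
fprintf('c(end) = %.4f, max error = %.2e\n', c(end), maxErr);
```

The numerical trajectory approaches the steady state alpha/kdeg = 24, and comparing it with the exact solution gives a quick check on the chosen tolerances.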

2.3 Optimization Solvers

The class of mixed integer dynamic optimization problems considered
above is NP-hard. Although these problems can be solved with
purely stochastic methods (such as simulated annealing, genetic
algorithms, etc.), that would be extremely costly computationally,
since these methods require a very large number of evaluations of
the cost function (and therefore many simulations of the explored
biocircuits). To avoid this, SYNBADm includes four global optimization
solvers which are based on metaheuristics, combining stochastic
global search with efficient local search methods. The main
advantage is that we keep the global character of the search (escaping
from local solutions) while increasing efficiency dramatically
thanks to the deterministic local solvers. Currently, SYNBADm
offers the following metaheuristics:
• eSS (enhanced Scatter Search by Egea et al. [7]): for Mixed
Integer Nonlinear Programming problems, it handles constraints
and incorporates the local solver MISQP (Mixed Integer
Sequential Quadratic Programming by Exler et al. [8]).
• MITS (Mixed Integer Tabu Search by Exler et al. [9]): for Mixed
Integer Nonlinear Programming problems, incorporates the
local solver MISQP.
• ACOmi (Ant Colony Optimization for mixed integer domain by
Schlueter et al. [10]): for Mixed Integer Nonlinear Programming
problems, incorporates the local solver MISQP.
• VNS (Variable Neighborhood Search by Mladenovic et al. [11]):
for Integer Nonlinear Programming problems, it does not handle
constraints. Use this solver for unconstrained single objective
problems with integer (or binary) variables only.
It should be noted that, due to their stochastic and heuristic
nature, these methods cannot guarantee global optimality. However,
it should also be recalled that deterministic global optimization
methods, which could in principle offer such guarantees, are in
practice too computationally costly to be applied to problems of
realistic size. In contrast, the metaheuristics considered in
SYNBADm usually provide near-globally optimal solutions in reasonable
computation times. We have obtained excellent results
considering benchmark problems of known solution, and we have
obtained similar or better solutions for a number of published case
studies [6]. Besides, for many problems we can formulate objective
functions that are bounded, so we can easily assess how good these
solutions are despite not being able to guarantee their global
optimality.
For problems with constraints (e.g. minimum and maximum
number of devices) and/or mixed integer-continuous variables,
eSS, MITS, or ACOmi can be used. All three are suitable
for single and multi-objective design, but their comparative performance
is problem dependent. For new problems with no prior
knowledge about the expected solution, it is good practice to use
all of them. In our experience, eSS usually performs well
in synthetic gene circuit design independently of the balance
between the number of real and integer variables. MITS and
ACOmi usually perform better for problems without (or with
few) real decision variables. However, due to their stochastic
nature, we recommend solving each new problem with these
three different solvers and cross-checking the results.
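
The general idea of combining a stochastic global phase with a deterministic local solver can be illustrated on a toy problem. The sketch below is a plain multi-start scheme (random starting points, each refined with MATLAB's fminsearch); it is not eSS, MITS, or ACOmi, and the test function, the number of starts, and all variable names are made up for the illustration.

```matlab
% Toy illustration: stochastic global sampling + deterministic local refinement.
% This is NOT one of the SYNBADm solvers; f and all settings are hypothetical.
rng(0);                                   % fix the seed so runs are repeatable
f = @(x) (x.^2 - 1).^2 + 0.3*x;           % two local minima, global one near x = -1.04
nStarts = 20;
starts  = -2 + 4*rand(nStarts, 1);        % random starting points in [-2, 2]

xLocal = zeros(nStarts, 1);
fLocal = zeros(nStarts, 1);
for s = 1:nStarts
    % fminsearch plays the role of the deterministic local solver in this sketch
    [xLocal(s), fLocal(s)] = fminsearch(f, starts(s));
end

[fBest, iBest] = min(fLocal);             % keep the best local solution found
xBest = xLocal(iBest);
fprintf('best x = %.4f, f = %.4f\n', xBest, fBest);
```

Starts that fall in the basin of the poorer local minimum end up there, which is why the best value over all starts is kept; repeating the run with other seeds is the toy analogue of cross-checking results between solvers.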

2.4 Practical Examples

Below we consider two practical examples, giving a step-by-step
description of their solution using SYNBADm. In these examples,
we consider design problems for two different target behaviors:
(1) an oscillator and (2) a circuit with a switch-like behavior upon
induction. For these case studies we will use no a priori information
about the configurations or architectures leading to the behaviors
of interest. We will make use of built-in SYNBADm libraries that we
will modify accordingly to adapt to the biological components
available in each case.

3 SYNBADm Installation and Initialization

SYNBADm is available under a GPLv3 license at
https://sites.google.com/site/synbadm, and runs under the MATLAB
numerical computing environment (www.mathworks.com).
1. First, make sure that a compatible C++ compiler is installed.
SYNBADm allows fast dynamic simulation by automatically
converting dynamic models to C code. This feature requires
the installation of a compatible C++ compiler. For more infor-
mation, go to:
http://es.mathworks.com/support/compilers
Alternatively, dynamic models can be integrated with
Matlab ODE solvers (without requiring a C++ compiler) but
the execution times will be much longer.
2. Unzip and copy the toolbox folder to a directory of your choice
(it is important to (re)name the main folder as SYNBAD).
3. (For Linux users only) The first time you use SYNBADm you
need to (1) in Matlab, change to the SYNBAD main folder
and run >>SYNBAD_install; (2) compile the default library
files (in order to use C++ integrators), changing to the folder
MA_library and executing:
>>SYNBAD_Makelibrary_MA_C('MA_input_library')
then change to the folder HL_library and execute:
>>SYNBAD_Makelibrary_HL_C('HL_input_library')
This step is needed only the first time you use SYNBADm,
before running >>SYNBAD_Startup.
4. From Matlab, change to the SYNBAD directory and run
>>SYNBAD_Startup, which adds all the relevant files to the
path. Remember to run >>SYNBAD_Startup in every new
Matlab session.
5. Test the installation by running any of the examples available in
the Examples folder. For example 1, execute:
>>Run_Example_1
If the installation is correct, the optimization will run and the
corresponding results will be stored in the
file RESULTS_DESIGN.mat.

4 Design of an Oscillator with SYNBADm

4.1 Definition of the Problem

The goal is to find an endogenous oscillator (i.e. the system oscillates
without the need for an external inducer) starting from a library
of available components. We consider mass action kinetics, because
we are interested in tracking the concentrations of proteins and
mRNAs (see Note 1).
We have the following components available: two different
promoters Pλ and Ptet that we denote, respectively, by P1, P2 and
four protein coding sequences (cI, tetR, lacI, luxI). In addition, we
consider an extra promoter P3, repressed by lacI with tunable
promoter strength. To generate the corresponding SYNBADm
library we will use as a template the built-in mass action library
and modify it accordingly.

4.2 Preparing the Library of Components

1. In the folder MA_Library (within USR_Libraries), we open the
script MA_input_library that we are going to use as a template.
Before doing any modification, we save the script with a
different name, MA_input_library_EX1.

library.name_of_function='MAex1';
library.promoters={'P1','P2','P3'}; % list of promoters
library.transcripts={'cI','tetR','lacI','luxI'}; % list of protein coding regions
library.prom_tf={'cI','tetR','lacI'}; % transcript affecting each promoter
library.inducers={}; % list of inducers
library.ind_tr={}; % transcript affected by each inducer

Fig. 2 Library input file for example 1: MA_input_library_EX1.m

2. Now, in the file MA_input_library_EX1 we fill in the fields
of the structure library, as indicated in Fig. 2, where
name_of_function contains a short name to identify the library
files, promoters is a row cell array of strings containing the
names of the promoters, transcripts is a row cell array of
strings containing the names of the protein coding regions,
prom_tf is the list of transcripts repressing each promoter,
inducers is a row cell array of strings containing the names
of the inducers (empty in this case), and ind_tr is a row cell
array of strings containing the names of the transcripts
bound by each inducer (empty in this case).
3. In order to generate the library files we call the SYNBADm
mass action library function:
>>SYNBAD_Makelibrary_MA_C('MA_input_library_EX1')
This generates the C++ library code, for efficient integration
with CVODES.
4. We can check the ordinary differential equations by opening the
generated odefile MAex1_odefile_c.
5. The values of default initial conditions are stored in
MAex1_default_states.m.

6. The values of the default parameters are stored in
MAex1_default_parameters.m. We are going to use the values in the
literature [12] for all the parameters. The default value for
kf_pt_3 is also taken from the literature (same source) but
note that this parameter is a decision variable in the optimization
problem. We use as a template the file
MAex1_default_parameters.m, and after the corresponding modifications,
we save it as MAex1_parameters_1.m. The values of the
parameters are shown in Fig. 3.
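
The constant NAV that appears in Fig. 3 relates concentrations in nM to numbers of molecules. The short sketch below reproduces that arithmetic, assuming that the cell volume V is expressed in litres (the units are not stated in the parameter file); the example concentration is made up.

```matlab
NA  = 6.0221415e23;   % Avogadro's number (1/mol), value used in Fig. 3
V   = 1e-14;          % cell volume, assumed here to be in litres
NAV = NA*V/1e9;       % molecules per cell corresponding to 1 nM

conc_nM   = 10;                 % hypothetical concentration in nM
molecules = conc_nM * NAV;      % equivalent number of molecules per cell
fprintf('NAV = %.4f molecules per nM; %g nM = %.1f molecules\n', ...
        NAV, conc_nM, molecules);
```

Under that assumption NAV is about 6.02, so a concentration of 10 nM corresponds to roughly 60 molecules per cell.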

function k=MAex1_parameters

NA = 6.0221415e23; % Avogadro
V = 1e-14; % Cell volume
NAV = NA*V/1e9; % For concentration in nM

kf_pt_1=NAV;
kf_pt_2=NAV;
kf_pt_3=NAV;
kb_pt_1=0.5;
kb_pt_2=0.5;
kb_pt_3=0.5;
kdeg_pt_1=0.075;
kdeg_pt_2=0.075;
kdeg_pt_3=0.075;
ktransc_1=0.00005;
ktransc_2=0.00005;
ktransc_3=0.00005;
kleak_1=0.12;
kleak_2=0.09;
kleak_3=0.01;
ktransl_1=0.1;
ktransl_2=0.1;
ktransl_3=0.1;
ktransl_4=0.1;
kdeg_m_1=0.001;
kdeg_m_2=0.001;
kdeg_m_3=0.001;
kdeg_m_4=0.001;
kdeg_1=0.001;
kdeg_2=0.001;
kdeg_3=0.001;
kdeg_4=0.001;

Fig. 3 Parameter values considered for example 1: MAex1_parameters_1.m

4.3 Defining the Objective Function

SYNBADm has a number of built-in objective functions included in
the folder USR_ObjFuns. The function OF_Oscil is the objective
function especially suited to designing oscillators. Therefore, we do
not need to define an ad hoc objective function in this case and can
use the built-in function OF_Oscil directly. We only
need to adapt the list of species to the library that we have defined
for our problem. In order to do this, we open OF_Oscil and
substitute the default list of species by the one we are currently
using, i.e.: trnsc = {'cI','tetR','lacI','luxI'}; The objective
function is based on the autocorrelation function and it has been
described in [13].
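
To give an idea of how an autocorrelation can be used to score oscillatory behavior, the sketch below computes the normalized autocorrelation of a synthetic signal and takes the height of its first peak at nonzero lag. It is only an illustration under stated assumptions (a sinusoidal test signal, the biased autocorrelation estimator, and a score defined as minus the peak height); it is not the implementation of OF_Oscil, whose definition is given in [13].

```matlab
% Synthetic test signal sampled on the same time grid as this example
t = (0:10:40000)';
x = sin(2*pi*t/5000);            % hypothetical oscillation with period 5000

x = x - mean(x);                 % remove the mean before correlating
n = numel(x);
r = conv(x, flipud(x));          % autocorrelation at lags -(n-1)..(n-1)
r = r(n:end) / r(n);             % keep lags 0..n-1 and normalize so that r(1) = 1

firstNeg = find(r < 0, 1);       % skip the initial decay from lag 0
[peak, rel] = max(r(firstNeg:end));
lagPeak = firstNeg + rel - 2;    % lag (in samples) of the first peak
period  = lagPeak * (t(2) - t(1));

score = -peak;                   % more negative = more strongly periodic
fprintf('estimated period = %g, score = %.3f\n', period, score);
```

A sustained oscillation produces a high first peak (a strongly negative score), whereas a signal that settles to a steady state produces no such peak. The convolution is used here so that the sketch runs in base MATLAB without additional toolboxes.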

4.4 Solving the Single Objective Optimization Problem

In the folder USR_inputs, we create the input file (a pre-existing
input file can be used as a template), and we save it as
Oscillator_MAex1.m (see Fig. 4). In this file we indicate:

1. The model options, including type of library, name of the
odefile to be used, number of variables of each type to be
considered for the design, and the names of the scripts containing
the values of parameters, initial conditions, etc.
2. The options for the design, including the name of the script
with the objective function to be used, the lower and upper
bounds for the decision variables, the minimum and maximum
number of devices allowed in the final system and the indices of
the parameters to be tuned, in this case
inputs.design.idx={3}, as it corresponds to the parameter kf_pt_3.
3. The options for simulation, including the time interval for the
integration inputs.simulate.tspan.
4. The options for the MINLP solvers (we choose the optimization
solver, and in case of ESS, MITS, or ACO we also choose
the local solver to be used). In the presence of integer or binary
variables, the local solver to be used is MISQP.

%================================
% MIXED INTEGER MODEL FRAMEWORK
%================================
inputs.model.lib_type = 'MA_Library'; % Choose 'MA_Library' | 'HL_Library'
inputs.model.ode_name = 'MAex1_odefile_c';
inputs.model.n_integer_var = 0;
inputs.model.n_real_var = 1;
inputs.model.n_binary_var = 12;
inputs.model.def_param_function = 'MAex1_parameters_1';
inputs.model.def_state_function = 'MAex1_default_states';
inputs.model.transc_promot_function = 'MAex1_transcripts_and_promoters';
inputs.model.u_values = [];
%============================
% DESIGN PROBLEM OPTIONS
%============================
inputs.design.objective = 'OF_Oscil';
inputs.design.idx = {3};
inputs.design.par_x = [];
inputs.design.var_L = zeros(1,13);
inputs.design.var_U = ones(1,13);
inputs.design.var_0 = zeros(1,13);
inputs.design.D_max = 3; % only applies in MITS, ESS, ACO
inputs.design.D_min = 3; % only applies in MITS, ESS, ACO
%====================================
% SIMULATE OPTIONS
%=====================================
inputs.simulate.var_circuit = [];
inputs.simulate.tspan = 0:10:40000;
inputs.simulate.objective = {'OF_Oscil'};
%==================================
% MINLP SOLVER OPTIONS
%==================================
inputs.opt_sol.opt_solver = 'ESS'; % Choose MINLP solver 'ESS' | 'MITS' | 'ACO' | 'VNS'
inputs.opt_sol.maxtime = 100;
inputs.opt_sol.maxeval = [];
% ess options
inputs.opt_sol.ess.local.solver = 'misqp';
%==================================
% IVP SOLVER OPTIONS
%==================================
inputs.ivp_sol.rtol = 1.0D-7; % [] IVP solver integration tolerances
inputs.ivp_sol.atol = 1.0D-7;

Fig. 4 SYNBAD design input file for example 1: Oscillator_MAex1.m

5. The options for the integration, mainly the tolerances for the
initial value problem (IVP) solver.
Once the input file is completed, we call (from the main directory)
the function to solve the single objective design problem:
>>SYNBAD_Design_SO('Oscillator_MAex1')
After the computation time (selected in the design input file, in this case
100 s), the optimal solution found is stored in the file
RESULTS_DESIGN.mat. Note that the design problem might have more than
one optimal solution and, because the global optimization solvers
are stochastic, the solution obtained by SYNBADm might
be different in every call to SYNBAD_Design_SO. Here we find the
following solution:
results.xbest = [0.013209502186800 0 0 1 1 0 0 0 1 0 0 0 0]
which corresponds to the circuit in Fig. 5. The value of the
objective function for the optimal circuit is
results.fbest = -0.739787373366868. We recommend saving the mat file
containing the results with a different name (RESULTS_DESIGN_EX1_T1)
in the USR_Results folder, to avoid overwriting
the results in further calls to the single objective design function.
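
The solution vector can be unpacked into the tuned parameter and a topology matrix as sketched below. The sketch assumes that the first entry is the real decision variable (kf_pt_3) and that the 12 binary entries are grouped by transcript, with the three promoters varying fastest. This ordering is inferred by comparing the vector above with the circuit in Fig. 5 rather than taken from the SYNBADm documentation, and the variable names are illustrative.

```matlab
xbest = [0.013209502186800 0 0 1 1 0 0 0 1 0 0 0 0];   % solution reported above
promoters   = {'P1', 'P2', 'P3'};
transcripts = {'cI', 'tetR', 'lacI', 'luxI'};

nReal   = 1;                       % one tunable parameter (kf_pt_3)
kf_pt_3 = xbest(1);
b = xbest(nReal+1:end);            % 12 binary topology entries

% Assumed ordering: entries grouped by transcript, promoters varying fastest.
% After the transpose, rows are transcripts and columns are promoters.
Y = reshape(b, numel(promoters), numel(transcripts)).';

[iT, iP] = find(Y);
for d = 1:numel(iT)
    fprintf('%s drives %s\n', promoters{iP(d)}, transcripts{iT(d)});
end
fprintf('kf_pt_3 = %.4f\n', kf_pt_3);
```

With this ordering the three active devices are P1 with tetR, P2 with lacI, and P3 with cI. Given the promoter-repressor relations of the library (cI represses P1, tetR represses P2, lacI represses P3), this is a ring of three repressors.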

[Figure: circuit superstructure matrix (active pairs in red) with rows t1 (cI), t2 (tetR), t3 (lacI), t4 (luxI) and columns P1, P2, P3, next to the circuit scheme (P1, tetR, P2, LacI, P3, cI); kf_pt_3 = 0.0132]
Fig. 5 Oscillator found by SYNBADm (matrix and circuit scheme)

[Figure: time courses of P1, P2, P3, P1cI, P2tetR, P3lacI, cI, tetR, lacI, cIm, tetRm and lacIm versus t (x10^4)]
Fig. 6 Dynamics of all the species involved in the oscillator obtained by SYNBADm

4.5 Simulating the Dynamics of a Circuit

If we want to simulate the dynamics of a circuit (the optimal circuit
found by SYNBADm or any other combination) we only have to fill in
the simulation options in the input file Oscillator_MAex1.m,
including the vector describing the circuit,
inputs.simulate.var_circuit. Importantly, if we choose a circuit other than the
solution found, the vector needs to preserve the same structure in
terms of number and class (binary or real) of the entries. The
simulation is carried out by running:
>>SYNBAD_Simulate('Oscillator_MAex1')
Using the solution found, the dynamics of all the species involved are
automatically plotted, as shown in Fig. 6.

5 Design of a Switch-Like Circuit with SYNBADm

5.1 Definition of the Problem

The second example consists of finding circuits that behave as
switches upon stimulation by different inducers (starting from a
library containing different promoters, protein coding sequences,
and inducers). By switch-like performance we mean, as in
[14], that the steady state level of LacI is high upon aTc and low
upon IPTG induction, whereas the steady state level of tetR is low
upon aTc and high upon IPTG induction.
In order to ensure an optimal use of the cell resources, we take
into account the protein production cost as a second optimization
objective. We consider in this case Hill kinetics (we are interested
only in the dynamics of the proteins involved, see Note 2).
We have the following components available: five different
promoters Plac1, Plac2, Pλ, Ptet, ParaC that we denote, respectively,
by P1 ... P5 and four protein coding sequences (tetR, lacI, cI,
araC). To generate the corresponding SYNBADm library we will
use as a template the built-in Hill kinetics library and modify it
accordingly. The first two promoters are both repressed by lacI but
with different affinities.

5.2 Preparing the Library of Components

1. In the folder HL_Library (within USR_Libraries), we open the
script HL_input_library that we are going to use as a template.
Before doing any modification, we save the script with a
different name, HL_input_library_EX2.
2. Now, in the file HL_input_library_EX2 we fill in the fields of
the structure library, as indicated in Fig. 7, where
name_of_function contains a short name to identify the library
files, promoters is a row cell array of strings containing the
names of the promoters, transcripts is a row cell array of
strings containing the names of the protein coding regions,
prom_tf is the list of transcripts repressing each promoter,
prom_nhill contains the Hill coefficients for each
repressor-promoter pair, inducers is a row cell array of strings
containing the names of the inducers, and ind_tr is a row cell array of
strings containing the names of the transcripts bound by
each inducer.
3. In order to generate the library files we call the SYNBADm Hill
kinetics library function:
>>SYNBAD_Makelibrary_HL_C('HL_input_library_EX2')
This generates the C++ library code, for efficient integration
with CVODES.
4. We can check the ordinary differential equations by opening the
generated odefile HLex2_odefile_c.
5. The values of default initial conditions are stored in
HLex2_default_states.m.
6. The values of the default parameters are stored in
HLex2_default_parameters.m. We are going to use the values in the
literature [14] for all the parameters. We use as a template the
file HLex2_default_parameters.m, and after making the
corresponding modifications (the values of the parameters are
shown in Fig. 8), we save the script as HLex2_parameters_1.m.

library.name_of_function='HLex2';
library.promoters={'Plac1','Plac2','Plambda','Ptet','ParaC'}; % list of promoters
library.transcripts={'tetR','lacI','cI','araC'}; % list of protein coding regions
library.prom_tf={'lacI','lacI','cI','tetR','araC'}; % transcript affecting each promoter
library.prom_nhill=[4,4,2,2,2]; % Hill coefficients of Repressor-Promoter pairs
library.inducers={'IPTG','aTc'}; % list of inducers
library.ind_tr={'lacI','tetR'}; % transcript affected by each inducer

Fig. 7 Library input file for example 2: HL_input_library_EX2.m

function k=HLex2_parameters_1

K_Plac1=10;
K_Plac2=0.01;
K_Plambda=0.33;
K_Ptet=0.014;
K_ParaC=2.5;
alpha_tetR=1.215;
alpha_lacI=1.215;
alpha_cI=2.92;
alpha_araC=1.215;
kdeg_tetR=0.0346;
kdeg_lacI=0.0346;
kdeg_cI=0.0693;
kdeg_araC=0.0115;
kf_lacIIPTG=0.05;
kf_tetRaTc=0.05;
kb_lacIIPTG=0.1;
kb_tetRaTc=0.1;
kdeg_lacIIPTG=0.0693;
kdeg_tetRaTc=0.0693;

Fig. 8 Parameter values considered for example 2: HLex2_parameters_1.m

5.3 Defining the Objective Functions

The first objective function must encode the desired switch-like
behavior: namely, the steady state level of LacI must be high upon
aTc and low upon IPTG induction, whereas the steady state level of
tetR must be low upon aTc and high upon IPTG induction, as
defined in [14]. To ensure that we achieve the desired functionality
at a minimal consumption of cell resources we consider as a
second objective a proxy of the protein production cost as defined
elsewhere [15]. Both objective functions can be defined by modifying
accordingly the templates denoted, respectively, as OF_Switch
and OF_Cost available in SYNBADm, in the folder USR_ObjFuns.
We save the corresponding functions as OF1_Switch and
OF2_Cost in USR_ObjFuns.

[Figure: objective space with Objective 1 (from O1L to O1U) on the x-axis and Objective 2 (from O2L to O2U) on the y-axis; sol1 and sol2 mark the extremes of the Pareto front]
Fig. 9 Scheme of the ε-constraint strategy as implemented in SYNBADm

5.4 Solving the Multi-Objective Optimization Problem

SYNBADm solves bi-objective optimization problems using the
epsilon-constraint strategy [16]. First, we choose our objective 1
and objective 2 and solve, for each of them, a single objective
optimization problem. In this way we obtain the extremes of the
Pareto front denoted in Fig. 9 by sol1 and sol2. In this example, we
choose the circuit performance as our first objective and the protein
production cost as the second objective.

In order to solve the first single objective optimization problem,
we create the corresponding input file Switch_HLex2_OBJ1.m
as indicated in Fig. 10 in the folder USR_inputs. Once
this file is created, we execute:
>>SYNBAD_Design_SO('Switch_HLex2_OBJ1')
obtaining as a result the first extreme of the Pareto front (sol1
in Fig. 9). We rename the mat file as sol1.mat for storage purposes.
We proceed in the same manner to solve the second single
objective optimization problem: we create the input file
Switch_HLex2_OBJ2.m as indicated in Fig. 10 in the folder
USR_inputs, modifying only the objective,
inputs.design.objective = {'OF2_Cost'}. Once this file is
created, we execute:
>>SYNBAD_Design_SO('Switch_HLex2_OBJ2')
to obtain the second extreme of the Pareto front (sol2 in Fig. 9), which we
store as sol2.mat.

Fig. 10 SYNBADm design input file for example 2, single objective search for the extreme of the Pareto:
Switch_HLex2_OBJ1.m

With both extremes available, we can set up the
bi-objective optimization problem. In the folder USR_inputs, we
create the input file Switch_HLex2.m (see Fig. 11). In this file we
indicate the two objectives to optimize, inputs.modesign.objective1
and inputs.modesign.objective2. We also
indicate the coordinates of the two extreme points of the Pareto
front obtained as solutions of the single objective optimization problems,
respectively, inputs_modesign_min_objective_1 and
inputs_modesign_min_objective_2. Finally we need to
indicate the number of intervals of the ε-constraint, i.e. the number
of intervals into which we divide the y-axis in the objective space (see
Fig. 9).
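
The idea behind the ε-constraint strategy can be illustrated on a small made-up problem in which the candidate designs and their objective values are simply listed in a table. In this sketch, objective 2 is bounded by a sequence of ε levels between the two extremes and objective 1 is minimized for each level. The data and variable names are hypothetical, and the exhaustive search over the table stands in for the mixed integer optimization that SYNBADm carries out at each level.

```matlab
% Hypothetical candidate designs; columns: [objective 1, objective 2], both minimized
F = [-1.00 2500;
     -0.95 2000;
     -0.90 1400;
     -0.85 1500;    % dominated by the third design
     -0.80 1100;
     -0.75 1000];

% Extremes of the Pareto front from the two single objective problems
[~, i1] = min(F(:, 1));          % sol1: best objective 1
[~, i2] = min(F(:, 2));          % sol2: best objective 2
O2U = F(i1, 2);                  % objective 2 at sol1 (upper end of the y-axis range)
O2L = F(i2, 2);                  % objective 2 at sol2 (lower end)

nIntervals = 5;                  % number of intervals along the objective 2 axis
epsLevels  = linspace(O2L, O2U, nIntervals + 1);

paretoIdx = zeros(size(epsLevels));
for e = 1:numel(epsLevels)
    feasible = find(F(:, 2) <= epsLevels(e));   % constraint: objective 2 <= epsilon
    [~, best] = min(F(feasible, 1));            % minimize objective 1 among feasible designs
    paretoIdx(e) = feasible(best);
end
paretoIdx = unique(paretoIdx);   % indices of the Pareto-optimal candidates found
disp(F(paretoIdx, :));
```

The dominated fourth design is never selected, and a finer division of the y-axis (more intervals) can only reveal additional points of the front at the price of more optimization runs.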
We are now in a position to solve the bi-objective optimization
problem, executing (from the main directory) the function:
>>SYNBAD_Design_MO('Switch_HLex2')
Before starting the
optimization, SYNBADm indicates the approximate computation
time (calculated as the maximum computation time selected by the
user times the number of intervals considered for the ε-constraint
strategy). Once the optimization has finished, the results are stored in
the file RESULTS_MO_DESIGN.mat. Note that the design problem
is bi-objective, and therefore we obtain a Pareto set of optimal
solutions. In order to plot the Pareto front obtained, we call:
>>SYNBAD_Plot_Pareto('Switch_HLex2', 'RESULTS_MO_DESIGN')
The Pareto optimal front obtained from the available
library of components is depicted in Fig. 12. The circuit P2 shows a
good compromise between both objectives.

Fig. 11 SYNBADm design input file for example 2: Switch_HLex2.m

6 Notes

[Figure: Pareto front plotted as Objective 2 (Cost) versus Objective 1 (Performance), with circuit schemes and promoter/repressor matrices shown for selected solutions]
Fig. 12 Pareto front of solutions obtained in Example 2

1. Reactions Associated with the Biological Devices in a SYNBADm
Library of Mass Action Type. The kinetic formalism is
adapted and further extended from the work by Pedersen and
Phillips [12]. Within this framework, all the reactions are
endowed with mass action kinetics. Next we enumerate the
reactions (and kinetic parameters) corresponding to parts,
combinations of parts and devices of interest:

1. A promoter P1 negatively regulated by a protein T1:

\[ P_1 + T_1 \underset{k_{bpt}}{\overset{k_{fpt}}{\rightleftharpoons}} P_1T_1 \xrightarrow{k_{transc}} P_1T_1 + mT_2 \tag{3} \]

where P1 is the promoter, T1 is the repressor protein, P1T1 is
the repressor–promoter complex, and mT2 is the mRNA of the
transcribed protein. The parameters kfpt, kbpt, and ktransc are the
protein-promoter binding rate constant, protein-promoter
unbinding rate constant, and the rate of transcription in the
bound state.
2. A promoter P1 that is not regulated by any transcription factor:

\[ P_1 \xrightarrow{k_{leak}} P_1 + mT_2, \qquad mT_2 \xrightarrow{k_{degpt}} \emptyset \tag{4} \]

where kleak is the constitutive rate of transcription in absence of
transcription factors and kdegpt is the degradation rate constant
for the mRNA degradation.
3. A protein coding sequence T2 introduces the reactions of
translation:

\[ mT_2 \xrightarrow{k_{transl}} mT_2 + T_2 \tag{5} \]

where ktransl is the rate constant corresponding to the translation
of mRNA, and degradation:

\[ T_2 \xrightarrow{k_{deg}} \emptyset, \qquad mT_2 \xrightarrow{k_{degm}} \emptyset \tag{6} \]

where kdeg and kdegm are the degradation rate constants of
protein and mRNA, respectively.
4. The complete set of reactions for a device P1-T2 in the presence
of the repressor protein T1:

\[ \begin{gathered}
P_1 + T_1 \underset{k_u}{\overset{k_b}{\rightleftharpoons}} P_1T_1 \xrightarrow{k_{tb}} P_1 + mT_2 \\
mT_2 \xrightarrow{k_r} mT_2 + T_2, \qquad mT_2 \xrightarrow{k_{dm}} \emptyset, \qquad T_2 \xrightarrow{k_d} \emptyset
\end{gathered} \tag{7} \]

5. The presence of an external inducer binding repressor T1 adds
the following reactions:

\[ I + T_1 \underset{k_{ui1}}{\overset{k_{bi1}}{\rightleftharpoons}} IT_1 \xrightarrow{k_{di}} \emptyset \]

where kbi1, kui1 and kdi are the constants of binding, unbinding,
and degradation of the inducer complex, respectively.
2. Reactions Associated with the Biological Devices in a SYNBADm
Library of Hill Type. The kinetic formalism is adapted from [14]
and further extended. Within this framework, the device
P1-T2, where P1 is a promoter negatively regulated by a protein
T1, has the following associated reactions:

\[ P_1 \xrightarrow{r_t} P_1 + T_2 \tag{8} \]

\[ T_2 \xrightarrow{k_d} \emptyset. \tag{9} \]

The first reaction has Hill-type kinetics, with the rate rt of
the lumped transcription and translation given by the
expression:

\[ r_t = \frac{\alpha_{t1}}{1 + K_{p1} T_1^{n}} \tag{10} \]

where αt1 and Kp1 are constants associated with the promoter and
repressor, respectively, and n is a Hill-like coefficient. The
second reaction corresponds to the degradation of the protein
T2 (with first order mass action kinetics).
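
The effect of the repressor level on the production rate in Eq. 10 can be explored numerically. The sketch below uses values that appear in Figs. 7 and 8, but pairing them in a single device is an assumption made here for illustration, and the repressor levels are made up. The steady state follows from balancing production against first order degradation, dT2/dt = rt - kd*T2 = 0.

```matlab
% Illustrative values taken from Figs. 7 and 8; pairing them in one device is an assumption
alpha = 1.215;     % alpha_tetR
K     = 10;        % K_Plac1
n     = 4;         % Hill coefficient of the first promoter in prom_nhill
kd    = 0.0346;    % kdeg_tetR

rt = @(T1) alpha ./ (1 + K .* T1.^n);   % production rate as a function of repressor level

T1    = [0 0.1 0.5 1 2];                % hypothetical repressor levels
rates = rt(T1);
T2ss  = rates ./ kd;                    % steady state of dT2/dt = rt - kd*T2 at fixed T1

for j = 1:numel(T1)
    fprintf('T1 = %4.1f  rt = %.4f  T2 steady state = %.3f\n', T1(j), rates(j), T2ss(j));
end
```

Without repressor the rate equals alpha, and it falls steeply once K*T1^n exceeds 1, which is the nonlinearity that the switch-like design exploits.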
The presence of an external inducer binding repressor T1
will add the following reactions (with mass action kinetics):

\[ I + T_1 \underset{k_{ui1}}{\overset{k_{bi1}}{\rightleftharpoons}} IT_1 \xrightarrow{k_{di}} \emptyset \]

where kbi1, kui1, and kdi are the constants of binding, unbinding,
and degradation of the inducer complex, respectively.
Funding: This research was funded by the Spanish Ministry of Science,
Innovation and Universities, project SYNBIOCONTROL (ref.
DPI2017-82896-C2-2-R).

References
1. Marchisio MA, Stelling J (2009) Computational design tools for synthetic biology. Curr Opin Biotechnol 20(4):479–485
2. Rodrigo G, Landrain TE, Shen S, Jaramillo A (2013) A new frontier in synthetic biology: automated design of small RNA devices in bacteria. Trends Genet 29(9):529–536
3. Nielsen AAK, Der BS, Shin J, Vaidyanathan P, Paralanov V, Strychalski EA, Ross D, Densmore D, Voigt CA (2016) Genetic circuit design automation. Science 352(6281):aac7341
4. Otero-Muras I, Henriques D, Banga JR (2016) Synbadm: a tool for optimization-based automated design of synthetic gene circuits. Bioinformatics 32(21):3360–3362
5. Xiang Y, Dalchau N, Wang B (2018) Scaling up genetic circuit design for cellular computing: advances and prospects. Nat Comput 17(4):833–853
6. Otero-Muras I, Banga JR (2017) Automated design framework for synthetic biology exploiting Pareto optimality. ACS Synth Biol 6(7):1180–1193
7. Egea JA, Marti R, Banga JR (2010) An evolutionary method for complex-process optimization. Comput Oper Res 37:315–324
8. Exler O, Schittkowski K (2007) A trust region SQP algorithm for mixed-integer nonlinear programming. Optim Lett 1(3):269–280
9. Exler O, Antelo LT, Egea JA, Alonso AA, Banga JR (2008) A Tabu search-based algorithm for mixed-integer nonlinear problems and its application to integrated process and control system design. Comput Chem Eng 32(8):1877–1891
10. Schlueter M, Egea JA, Banga JR (2009) Extended ant colony optimization for non-convex mixed integer nonlinear programming. Comput Oper Res 36(7):2217–2229
11. Hansen P, Mladenovic N, Moreno-Perez JA (2010) Variable neighbourhood search: methods and applications. Ann Oper Res 175(1):367–407
12. Pedersen M, Phillips A (2009) Towards programming languages for genetic engineering of living cells. J R Soc Interface 6:S437–S450
13. Otero-Muras I, Banga JR (2016) Design principles of biological oscillators through optimization: Forward and reverse analysis. PLoS ONE 11(12):e0166867
14. Dasika MS, Maranas CD (2008) Optcircuit: an optimization based method for computational design of genetic circuits. BMC Syst Biol 2:24
15. Szekely P, Sheftel H, Mayo A, Alon U (2013) Evolutionary tradeoffs between economy and effectiveness in biological homeostasis systems. PLoS Comput Biol 9(8):e1003163
16. Otero-Muras I, Banga JR (2014) Multicriteria global optimization for biocircuit design. BMC Syst Biol 8:113
Chapter 5

Setting Up an Automated Biomanufacturing Laboratory


Marilene Pavan

Abstract
Laboratory automation is a key enabling technology for genetic engineering that can lead to higher throughput, more efficient and accurate experiments, better data management and analysis, a shorter DBT (Design, Build, and Test) cycle turnaround, increased reproducibility, and savings in lab resources. Choosing the correct framework among the many options available in terms of software, hardware, and the skills needed to operate them is crucial for the success of any automation project. This chapter explores the multiple aspects to be considered for the solid development of a biofoundry project, including available software and hardware tools, resources, strategies, partnerships, and collaborations in the field needed to speed up the translation of research results into solutions for important societal problems.

Key words Laboratory automation, Synthetic biology, Hardware, Software, Throughput, Machine
learning, Liquid handling, Metabolic engineering, Standardization, Reproducibility

1 Introduction

Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229,
https://doi.org/10.1007/978-1-0716-1032-9_5, © Springer Science+Business Media, LLC, part of Springer Nature 2021

Synthetic biology provides the opportunity to produce thousands of complex molecules and to solve key challenges of modern society, ranging from agriculture to chemicals production and clean energy, in a sustainable way [1–3]. It brings together engineers and biologists to design and build synthetic genetic circuits that encode novel components, networks, and pathways, and to use these constructs to reprogram organisms. However, the execution of these experiments is still largely dependent on the artisanal work of skillful researchers. The low throughput of this approach limits the speed of the DBT (Design-Build-Test) cycle, and manual procedures involve human errors and a consequent lack of reproducibility. Almost 50% of the variance in human error can be explained by stress, repetition, fatigue, and work environment, all strongly related to traditional laboratory work [4]. Also, due to the complexity of biological networks and pathways, designing, building, testing, and replicating the large number of constructs needed to achieve optimal solutions for biological questions is virtually impossible when using manual techniques and procedures. As an example, it took over 10 years and more than 100 million dollars to develop the biosynthetic process for 1,3-propanediol [5–8]. The advancement of the manufacturing of chemicals using biological systems also depends on the development of new tools, techniques, and skills that can speed up work and provide new insights at the interface of synthetic biology, metabolic engineering, and automation, which will also be explored in this chapter [6].
Laboratory automation is a key enabling technology for synthetic biology. In fact, a recent survey showed that up to 89% of publications in the biomedical literature have some methods that are supported by existing commercial robotic labs [9]. Without the use of automated, high-throughput pipelines, it is virtually impossible to achieve an efficient and effective design-build-test (DBT) cycle [1, 5, 10–13]. The concept of laboratory automation is not new and had already become mainstream in the 1950s. In 1963 it was predicted that the market for clinical laboratory instruments would grow 20% annually, driven by then-recent advances in miniaturization and in the speed of biological sample analysis [14]. Modern automated systems involve intricate workflows, with different levels of integration of both software and hardware components. These could be off-the-shelf or customized, each with different specifications, costs, and skills needed to operate. Laboratory automation can increase the throughput of the laboratory, free researchers from repetitive tasks, and allow them to dedicate themselves to more intellectually relevant activities, while at the same time avoiding injuries caused by repetitive procedures. It also removes human errors, allows standardization of procedures, increases reproducibility and accuracy, saves money and time, and makes it possible to collect, store, retrieve, and analyze the large amount of scientific data being produced [15].
While so many available options can easily fill the gaps in different setups and environments, from a researcher’s standpoint it can be a burden to choose between different systems, vendors, and the skills needed to operate them. Failure to choose the appropriate system can lead to frustration and, consequently, become a barrier to adoption. Having a person fully dedicated to assessing the lab’s current processes, and consequently its automation needs, is key to ensuring the success of the project and the adoption of automation in the lab routine. This includes prospecting different solutions and partners based on those needs and educating themselves and others in the lab on the new, automated processes. Similarly, it is crucial that the vendor or provider, for both software and hardware, is open to working with the lab to develop features in collaboration and to adapt their tools to the lab’s needs [16]. Continuous team communication and training are also key to getting the team on board. One should be prepared to explain the benefits of adopting the new procedures and technologies, have a clear roadmap in hand, and have the team fully trained and supported during operations.

2 Identifying the Need for Automation

Biofoundries provide an integrated infrastructure to enable the rapid design, construction, and testing of genetically modified organisms [10]. As mentioned above, automation can lead to higher throughput, more efficient and accurate experiments, better data management and analysis, a shorter DBT turnaround, increased reproducibility, and savings in lab resources [12, 17, 18]. Poor recording of experiments, disparities in how research is conducted, and run-to-run differences, among others, lead to an inability to reproduce experiments and results, resulting in wasted time and money. An analysis of past studies indicates that the cumulative (total) prevalence of irreproducible preclinical research exceeds 50%, resulting in approximately US$28 billion/year spent on preclinical research that is not reproducible in the United States alone [19].

A typical automated lab workflow is presented below. Brainstorming how the lab’s scientific research fits into each one (or all) of these steps is key to identifying the need for automation in some or all of the steps. As a consequence, the whole process can be fully integrated, decreasing human intervention as much as possible, or semi-automated, which is usually more flexible and suitable for smaller labs and lower throughput [5].

2.1 Design

Software tools are available today to help with experimental design and with data management and analysis. They are fundamental in guiding the choice of the best strategy at both the abstract level (choosing the best genetic candidates to be tested) and the practical level (the DNA assembly strategy to be executed). Automated, well-informed design increases the number of designs that can be generated and the speed at which they are generated, and it helps to narrow down the design space by prioritizing the best candidates to be built and tested, saving lab resources [20–24].

2.2 Build

Worklist-based instructions for liquid handlers are the most common approach in automated facilities. Usually a .csv file is generated, containing instructions on where to aspirate liquids from (DNA parts, reagents, media, etc.) and where to dispense them. Ideally, the build instructions should be linked to the design and sample inventory pipelines. As an example of the advantage of automating the build process in the lab, DNA assembly can consume 33–50% of a scientist’s time and is generally inefficient [5, 12]. Also, the multiple iterations and repetitions needed to achieve the correct construct and to introduce variability into the design space make it, potentially, the most tedious and error-prone procedure in a molecular biology lab, in addition to increasing costs.
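
To make the worklist idea concrete, the sketch below generates a small .csv worklist from a set of assembly designs and a part inventory. The column names, plate and well labels, part names, and transfer volume are all hypothetical; each liquid handler vendor expects its own worklist layout, so this only illustrates the general source-to-destination structure.

```python
import csv
import io

# Hypothetical column names; real liquid handlers each expect their own layout.
FIELDS = ["source_plate", "source_well", "dest_plate", "dest_well", "volume_ul"]


def build_worklist(assemblies, part_locations, dest_plate="assembly_plate", volume_ul=2.0):
    """Return one transfer row per (destination well, DNA part) pair."""
    rows = []
    for dest_well, parts in assemblies.items():
        for part in parts:
            source_plate, source_well = part_locations[part]
            rows.append({
                "source_plate": source_plate,
                "source_well": source_well,
                "dest_plate": dest_plate,
                "dest_well": dest_well,
                "volume_ul": volume_ul,
            })
    return rows


def worklist_to_csv(rows):
    """Serialize the transfer rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical inventory: part name -> (plate, well)
part_locations = {
    "promoter_P1": ("parts_plate", "A1"),
    "rbs_1": ("parts_plate", "A2"),
    "cds_T2": ("parts_plate", "A3"),
    "backbone": ("parts_plate", "A4"),
}
# Hypothetical designs: destination well -> parts to combine
assemblies = {
    "A1": ["promoter_P1", "rbs_1", "cds_T2", "backbone"],
    "A2": ["promoter_P1", "cds_T2", "backbone"],
}

rows = build_worklist(assemblies, part_locations)
print(worklist_to_csv(rows))
```

Because the rows are derived from the design (`assemblies`) and the inventory (`part_locations`), the same data structures can serve as the link between the design and sample inventory pipelines mentioned above.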

2.3 Test

The test tools and equipment should link the experiments being performed in the lab with the respective constructs created by the design tool and assembled by the build pipeline. They also collect and store the data generated while testing the synthetic genetic circuits. Fragment analysis, sequencing, transcriptomics, fermentation, proteomics, and metabolomics can usually be automated, though analytical techniques remain among the most difficult steps to automate to date.

2.4 Learn

Machine learning (ML) [25, 26] is being used to interpret the data generated by the test cycle. The vast amount of data generated by automated experiments is better suited to computer algorithms than to human minds. These algorithms can help to understand the behavior and predictability of genetic circuits, narrowing down the space of genetic circuits to be constructed, speeding up the whole process, and saving money. In a study published in 2019, Opgenorth and colleagues [13] reported on the implementation of two DBT cycles to optimize 1-dodecanol production from glucose using 60 engineered Escherichia coli MG1655 strains. They used the data produced in the first DBT cycle to train several machine-learning algorithms and to suggest protein profiles for the second DBT cycle that would increase production. These strategies resulted in a 21% increase in dodecanol titer in Cycle 2.
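
The general idea of using data from one DBT cycle to prioritize designs for the next can be sketched with a very simple model. The example below uses a k-nearest-neighbor predictor written in plain Python; it is not the method used by Opgenorth and colleagues, and the feature vectors ("protein profiles"), titers, and function names are invented for illustration.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a titer as the mean titer of the k training profiles closest to `query`."""
    by_distance = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


def recommend(train_x, train_y, candidates, top=2, k=3):
    """Rank candidate profiles by predicted titer and return the best `top` of them."""
    scored = [(knn_predict(train_x, train_y, c, k), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top]]


# Hypothetical cycle-1 data: relative levels of two pathway proteins -> measured titer
cycle1_profiles = [(1.0, 0.2), (0.8, 0.5), (0.3, 0.9), (0.1, 0.1)]
cycle1_titers = [0.2, 0.5, 0.9, 0.1]

# Hypothetical candidate profiles for cycle 2
candidates = [(0.2, 0.95), (0.9, 0.1), (0.5, 0.6)]

print(recommend(cycle1_profiles, cycle1_titers, candidates, top=2, k=1))
```

In practice, the model choice, the features, and the way uncertainty is handled matter a great deal; the point here is only the loop structure of training on tested designs and ranking untested ones.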
Finally, together with the data presented above, the 10 questions below should be asked when considering the need for automation:
(1) Does the lab need more reproducible and accurate results?
(2) Would the laboratory’s research line benefit from increasing the throughput of experiments (more data generation leading to faster answers)?
(3) Does the lab staff have the time and resources to train the lab’s students and employees, continuously, on automation?
(4) Does the lab want, and have the time, to work with the vendors to co-develop and adapt the protocols and processes to an automated setup?
(5) Is it important for lab operations that students and employees no longer perform some repetitive tasks (benefiting from walk-away time and avoiding injuries caused by repetitive tasks)?
(6) How will the costs associated with automation (purchase, maintenance, consumables, training, co-development) impact the lab?
(7) Does the lab need better documentation of the protocols being performed and the results being generated?
(8) Are the lab protocols and experiments standard and frequent enough to benefit from an automated procedure?
(9) Would it be beneficial to achieve financial stability in lab operations?
(10) Does the lab have the required space for an automated system?

3 Strategy

When implementing automation for the first time, projects might fail to achieve the desired results. Failure and frustration can often be traced back to the following:
1. The scope of the work has been underestimated (by all levels of seniority involved). It usually requires considerable investment in time, training, and money to properly manage and implement any automation project. A written vision is useful and must be extensively communicated so that the purpose of the project is understood by everyone in the lab [27].
2. The selection of the appropriate approach to automation should be guided by a deep understanding of how the lab is currently working (protocols, personnel, time, costs, throughput, frequency of the protocols being performed, how flexible they are, how the workflow conditions may change over time) and of the vision for where the laboratory is heading with automation adoption (how automation will change the work) [28].
3. Unsuccessful choices in system flexibility, data management systems, and the quality of the partnership with the vendors also have a direct impact on the success of the automation project.
Some considerations that should be made by the person(s) responsible for the project, together with the lab group, are:
• Budget: Assess the available budget and communicate it clearly to all involved parties from the very beginning of the project. Take into consideration the costs of highly skilled scientists, potentially expensive reagents and labware, hardware and software, system maintenance, and necessary training. Vendors can help to analyze these costs, as can the appropriate internal departments in companies and universities (such as business development, finance, corporate relationships, and project management). Talking with experts and independent consultants in laboratory automation can also help to understand the costs involved and avoid future surprises. Finally, a range of solutions of different sizes, costs, throughput, and flexibility is available today [29, 30].
• Expertise: In general, researchers who can navigate the biology, hardware, and software fields tend to learn automation faster and teach others more easily. Still, the person(s) responsible for the project should ask for, and the vendors should provide, training and post-sale support to develop methods, and vendors should be ready to answer any questions in a timely manner. Knowing the skills and knowledge available in the lab is key to forecasting the amount of training and support that will be required to program and run the automated system as smoothly as possible.
• Protocols: The main aspect to consider here is how likely the conditions are to change for the protocols targeted for automation. The more they change, the more flexible and accommodating the system should be. This is important to consider when determining the system flexibility and integration (fully versus semi-automated workflow).
• Throughput: This is frequently the first question asked by all vendors. Having a clear throughput expectation per period (for example, the number of experiments, tests, strains, and compounds being handled and the amount of data being produced) is key to identifying the system that best suits the workflow.
• Software: Efficient collection, storage, and sharing of data and protocols are required to properly manage and interpret the large amount of data generated by an automated laboratory. Efficient and standard data capture and analysis help to keep track of the predictability of biological behavior and the reproducibility of results, ultimately leading to better experimental design [22]. Software is important to provide experimental strategies [20] and robotic instructions [21], manage repositories of parts and reagents [31], integrate equipment and schedule protocols (some developers include ThermoFisher, Biosero, HighRes, and Synthace), and store and interpret data and protocols [23, 32]. It is normal for an automated system to demand more than one software tool to cover all the items described above.
• Hardware: A thermocycler is a piece of automation equipment that is manually operated and is sufficient to perform a number of PCR (polymerase chain reaction) runs in most labs. However, as experimental throughput and complexity increase, a laboratory will increasingly need to acquire peripherals that can be integrated seamlessly with other similar equipment and with robotic arms, replacing human handling. Some of them (centrifuges, sealers, peelers, incubators, and thermocyclers, for example) might make sense if integrated with robotic arms to provide the walk-away component characteristic of many automated systems. Others, such as liquid handlers, can vary in footprint and cost (examples include the Opentrons OT2, Hamilton STAR, and Tecan Evo, among others) and in sample volume transferred (Labcyte Echo, iDOT). Automated freezers offer advantages such as better control and storage of biological samples, which can be achieved independently of the setup (integrated or standalone). To choose the correct individual piece of equipment, before deciding on capabilities, capacity, and integration, keep in mind: (1) which equipment the laboratory will use in standalone mode (manually operated), integrated with other equipment, or both; (2) which footprint, cost, and throughput the laboratory is aiming for; (3) the reasons for automation (walk-away component, avoiding repetitive tasks, increased throughput, cost per reaction). Figure 1 shows an example, at the University of Edinburgh, of a fully integrated automated system. Another integrated system, at Lanzatech Inc. (Fig. 2), is enclosed in an oxygen-free environment, which adds a different level of complexity, to manage anaerobic organisms.
• Building (infrastructure): Be sure to check with the person responsible for infrastructure in the department to ensure that the room where the system is going to sit meets the energy, water, and gas requirements, that the floor supports the weight of the system, and that the system follows all (bio)safety requirements.
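
The throughput consideration in the list above can be turned into a back-of-the-envelope estimate before talking to vendors. The sketch below converts a weekly sample count into a number of plates and an approximate instrument run time. The plate format, number of replicates, controls per plate, and minutes per plate are hypothetical placeholders that should be replaced with the lab's own figures.

```python
import math


def plates_needed(samples, replicates=3, controls_per_plate=8, wells_per_plate=96):
    """Number of plates required for `samples` samples, each run in `replicates` wells."""
    usable_wells = wells_per_plate - controls_per_plate
    if usable_wells <= 0:
        raise ValueError("controls_per_plate must be smaller than wells_per_plate")
    return math.ceil(samples * replicates / usable_wells)


def runtime_hours(plates, minutes_per_plate=45.0):
    """Approximate instrument time, assuming plates are processed one after another."""
    return plates * minutes_per_plate / 60.0


# Hypothetical weekly target of 200 strains
weekly_plates = plates_needed(200)
print(weekly_plates, runtime_hours(weekly_plates))
```

An estimate like this gives a per-period figure to bring to vendor discussions and helps decide between a standalone liquid handler and a more integrated setup.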

4 Business Plan

A business plan is an effective tool not only to provide a big, complete picture to justify the investment in automation but also to secure the capital and resources needed to run the facility [33]. Below are the key strategic elements that should be considered and presented in any plan, whether the biofoundry is located in a university or in a private company.

Fig. 1 Edinburgh Genome Foundry. The foundry has a complete, fully integrated automated system for the design, build, and test of hundreds of genetic variants per day

Fig. 2 Enclosed biofoundry at Lanzatech Inc., which allows anaerobic microorganisms to be modified and tested

4.1 Funding: Government

Federal investments in synthetic biology research contribute to foundational knowledge and technological development to facilitate commercial applications in platform organisms, process pathways, and related biotechnologies [34]. In the United States, some of the most important federal funders include the National Science Foundation (NSF), Department of Energy (DOE), Department of Health & Human Services (HHS, including the National Institutes of Health (NIH)), Department of Defense (DOD, including the Defense Advanced Research Projects Agency (DARPA)), the Navy, the Air Force, Department of Agriculture (USDA), and National Aeronautics & Space Administration (NASA), each with its own scientific focus [1]. Funded efforts in biomanufacturing automation for scientific research and partnerships in the United States include the Agile Biofoundry (DOE) and the Broad Institute (DARPA).

What to do: Review the programs available in your country that support new equipment acquisition, keep track of dates, participate in the agencies’ informational calls, and submit your project with a clear indication, supported by consistent and reasonable data, of how the new technology will support and advance your research.

4.2 Funding: Fee-for-Service Model and Project Partnerships

Maintaining an automated system can be expensive. A fee-for-service model might be a good option to provide funding (in exchange for providing services to others) and to diversify the business (keeping your biofoundry busy while attracting new scientific partnerships).

As an example, known foundries located inside universities and working under the fee-for-service model are the Edinburgh Genome Foundry (https://www.genomefoundry.org/) and the DAMP Lab (https://www.damplab.org). Private companies include Transcriptic (https://www.transcriptic.com/) and Emerald (https://www.emeraldcloudlab.com/), which offer “cloud-based” labs.

Another interesting and effective way to diversify and optimize your research pipeline is to establish partnerships with other companies, university groups, and national laboratories such as the iBioFab [35], the London DNA Foundry (https://www.londonbiofoundry.org/), Ginkgo Bioworks (https://www.ginkgobioworks.com/), Zymergen (https://www.zymergen.com/), Amyris (https://amyris.com/), and the Agile Biofoundry (https://agilebiofoundry.org/), among others.

What to do: Explore, inside your company or university, the possibilities for diversifying your business and obtaining more funding and scientific partnerships to help set up and support the biofoundry. Usually the finance department and departments related to Open Innovation, Industry Partnerships, Entrepreneurship, and Business Development are good candidates to start with. They can help not only with ideas but also with all the paperwork necessary to put together the business plan. Also, talk with existing groups operating under one of the models above to learn from them. No less important, the balance between the lab’s own research and experiments and those carried out for others must be very clearly defined.

4.3 Partnerships

Though more and more companies are offering lower-cost, more affordable, and accessible solutions in lab automation, partnerships with existing public or private biofoundries [10, 12] might provide a good solution for first-time users or for those already using automation in their lab but in need of different protocols and methodologies to be developed and implemented. Startup companies, while evaluating where to spend their limited funds, can also benefit from these partnerships for their proof-of-concept experiments before deciding on an in-house automated system. Companies like Transcriptic (https://www.transcriptic.com/), Emerald (https://www.emeraldcloudlab.com/) [12], and GenoFab (https://genofab.com/) are cloud-based laboratories that can provide services that enable scale and efficiency in the discovery process. On the software side, companies such as TeselaGen (https://teselagen.com/), Benchling (https://www.benchling.com/), and Riffyn (https://riffyn.com/) offer on-demand projects and ready-to-use software solutions for scientific design, automation, and data analysis. Public and private biofoundries and biofoundry consortiums such as the Global Biofoundry Alliance (https://www.biofoundries.org/) and the Agile Biofoundry (https://agilebiofoundry.org/) can be great partners as well, as they bring together a vast number of contacts, resources, and expertise from all over the world [10]. Finally, vendors such as Thermo, Opentrons, HighRes, Molecular Devices, and Biosero are valuable partners as well and provide guidance based specifically on the client’s needs and available resources. No matter the nature and goals of the partnership, communication is key. A lack of freedom and flexibility can be a barrier when setting up a partnership and contracting services, so it is very important to know the partner’s capabilities and flexibility to adapt to the researcher’s protocols.

Finally, another great source of knowledge and contacts is conferences (focused or not on biomanufacturing automation) such as SLAS (Society for Laboratory Automation and Screening), IWBMA (International Workshop in Biomanufacturing Automation), IWBDA (International Workshop in Biodesign Automation), and SynBioBeta.

What to do: Research and prospect among potential partners and map their capabilities; set up calls and visits with them; explore partnership options; have in mind a clear process or service you want to develop, the outcomes you want to achieve, the timeline, and the resources available (personnel, space, budget); and be open to participating in conferences, codeveloping protocols, and writing publications together with the partners.

4.4 Education

Sometimes, the gains of having an automated lab setup are not as obvious as scaling up and speeding up experiments, but they are equally important. Through research carried out in an automated setup, a new generation of scientists is being trained at the interface of systems biology, synthetic biology, molecular biology, bioinformatics, strain engineering, and metabolic engineering, along with hardware and software engineering. These skill sets are critical in basic and applied R&D but are rarely acquired in traditional research programs [1].

Companies like Opentrons, together with scientific competitions such as iGEM (https://igem.org), are actively promoting automation of engineering biology throughout the competition. Initiatives such as the STEM Pathways at Boston University (https://stempathways.org), JBEI educational programs (http://www.jbei.org/education/), and the Earlham Institute (https://www.earlham.ac.uk/learning) are also actively contributing to engaging a new generation of scientists in the biomanufacturing field. Outreach activities using an automated setup not only get a new generation of researchers interested in a specific scientific field, but also promote the lab’s research activities and capabilities to other groups, fostering collaborations and attracting talent. Finally, for some federal agencies, outreach programs are actually a requirement [36].

What to do: Look for inspiration in existing programs inside and outside your institution to set up your own science outreach and training programs. It might be very useful to develop a system based on qualitative and quantitative metrics to keep track of the efficiency of the program.

4.5 System Maintenance and Personnel

Oftentimes, the use of grant funds to pay for system maintenance is not permitted. In this case, check with the finance department in the company or university for alternatives. Usually, the equipment comes with a one-year warranty contract that can be negotiated to be extended. It is important to involve the department responsible for purchases and partnerships in those negotiations. Preventive maintenance and service contracts for after the warranty expires should also be negotiated. Finally, consider learning the basics of system maintenance, so you do not have to rely heavily on maintenance services. Be also very clear in the business plan about how you plan to implement daily procedures for cleaning and maintenance.

Personnel are often the most expensive part of an automation project because it requires highly specialized scientists in biotechnology and software-related fields. Usually a scientist working in automation projects learns a programming language, works well in a diverse team, is a good problem solver, builds solid, long-term relationships with other laboratories, vendors, and partners, can understand and adapt scientific protocols to automated systems, helps with grant writing, and has great discipline to maintain the system and document the research being carried out there.

5 Exciting Ventures in the Automation Field: Enabling Technologies

Achieving the ideal bioengineering scenario, in which commercially viable biochemicals and pharmaceutical compounds, effective and fast medical diagnosis and treatment, effective climate change solutions such as carbon recycling, and other solutions for societal problems can be translated faster from the laboratory to society, requires the development of powerful enabling tools and methodologies related to the biomanufacturing field, some of which are described below.

5.1 Machine The design of scientific experiments, the generation of robotics


Learning instructions, the tracking of samples, experimental procedures,
and results, and the analysis of the large data being produced by
an automated framework is simply not possible without the support
of a solid software structure. Not only private and public research
groups are working to build this framework [20, 21, 23] but also to
make it more flexible, user friendly, and smart as possible [37].
Machine learning algorithms also bring the promise to elimi-
nate the historically nonintegrated, trial-and-error approach to
construct and test synthetic pathways. Though with the improving
capabilities in DNA synthesis and automation to build and test
thousands of different genetic constructs, a more informed decision
to narrow down the design space and optimize the number of
variants being tested represents important savings in research
costs and time. The availability of large amounts of high-quality
data being produced by automated facilities would enable compu-
tational biologists to produce robust theories [38], and the theory
produced by these data sets would allow experimentalists to better
design experiments and tackle questions of general relevance.
In silico tools for the predictive design of microbial cell
factories, for example, allow the prediction of optimal genetic
combinations in multigene pathways for high producers. In the work
published by Jervis and colleagues [26], the development and
training of machine learning algorithms was shown to
boost monoterpenoid production titers by over
60% while screening under 3% of a library, using an automated
screening pipeline. For another good reference, see Opgenorth
et al., 2019 [13].
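
The idea of learning from a small screened subset in order to prioritize the rest of a library can be sketched in a few lines of Python. The example below uses a deliberately simple additive model (each part contributes an average effect on titer) applied to made-up data. It is not the method used by Jervis and colleagues [26]; the part names, titers, and function names are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import product


def fit_additive_model(screened):
    """Estimate a baseline titer and an average effect for each (slot, part).

    `screened` is a list of (design, titer) pairs, where a design is a tuple
    of part names, one per slot of the pathway (hypothetical representation).
    """
    mean = sum(titer for _, titer in screened) / len(screened)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for design, titer in screened:
        for slot, part in enumerate(design):
            sums[(slot, part)] += titer - mean
            counts[(slot, part)] += 1
    effects = {key: sums[key] / counts[key] for key in sums}
    return mean, effects


def predict(design, mean, effects):
    """Predict a titer as baseline plus the effects of the parts used.

    Parts never seen in the screened subset contribute an effect of zero.
    """
    return mean + sum(
        effects.get((slot, part), 0.0) for slot, part in enumerate(design)
    )


def rank_untested(library, screened, top_n=3):
    """Rank the designs that were not screened by predicted titer."""
    mean, effects = fit_additive_model(screened)
    tested = {design for design, _ in screened}
    candidates = [design for design in library if design not in tested]
    candidates.sort(key=lambda d: predict(d, mean, effects), reverse=True)
    return candidates[:top_n]


# Toy library: 2 promoters x 3 RBSs, with only three designs screened.
library = list(product(["weak", "strong"], ["r1", "r2", "r3"]))
screened = [
    (("weak", "r1"), 1.0),
    (("strong", "r1"), 3.0),
    (("weak", "r3"), 4.0),
]
print(rank_untested(library, screened, top_n=1))
```

In practice, a real pipeline would use a properly validated regression or classification model and far more data; the sketch only illustrates how a model trained on a small fraction of a library can direct which constructs to build next.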
Machine learning algorithms have also been used for drug
discovery [39], cell image analysis [40], and metabolic flux
analysis [41].
Sufficient resources should be dedicated to learning from the
results generated by the Design, Build, and Test phases and to
incorporating this new knowledge into subsequent cycles, in order
to avoid obstacles to increased synthetic biology
productivity [42].

5.2 Microfluidics and Cell-Free Systems

Engineering biological systems holds great potential to generate
high-value compounds. However, the Design, Build, and Test
(DBT) cycle involved in the process can be slow, laborious,
expensive, and hard to automate and scale. In vitro prototyping and
biomanufacturing offer a powerful technical solution for
automation and high-throughput screening, as they promise to be
faster than microbial cell factories and suitable for adaptation
to a miniaturized hardware footprint and reaction volume.
In addition, the crude cell extract is often easy and inexpensive to
produce in any molecular biology laboratory, can be produced at
scale [43], and can be stored for long periods [44]. Preparing
cell-free protein synthesis (CFPS)/transcription-translation (TX-TL)
systems usually involves cell growth, lysis, and DNA extraction from
the lysate, followed by the addition of NTPs, amino acids, an energy
source and, finally, the DNA of interest [45–47].
A number of studies have demonstrated the efficacy of these systems
for E. coli [48–50], yeast [51], and even nontraditional organisms
[52, 53]. Cell-free systems have also been used for educational
purposes [54], for studies on the construction of minimal cells, for
improving the understanding of biological processes with minimal
metabolic interference [55, 56], and for prototyping pathways [52].
CFPS systems are also suitable for use in microfluidic chips
[57]. Microfluidic platforms offer an alternative route to
an automated pipeline at a lower price and footprint than liquid
handling robotics, while maintaining equivalent throughput and
reproducibility. According to DeMello
et al. 2019, “microfluidics describes the investigation of analytical
systems that manipulate, process and control small volumes of
fluids (typically on the picolitre to nanoliter scale)” [57]. Droplet
microfluidics (in which very low volumes of reagents and biological
material are encapsulated into monodisperse droplets), for
example, has demonstrated exceptional promise for biological
experiments such as DNA assembly, transformation and transfection,
cell culturing and sorting, and genetic circuit prototyping
[58]. Microfluidic chips also offer fine control over experiments
through parameters such as temperature, evaporation, droplet
generation rate, size, and flow, addition of selection medium, oxygen
ratio, and throughput [59, 60].

5.3 DNA Assembly and Strain Development

Standardization and modularity have the potential to make
engineering biology more predictable, while enabling the
miniaturization and automation of DNA assembly methods. The first
attempt in synthetic biology to introduce some degree of
standardization was the implementation of the BioBrick standard,
adopted by the iGEM competition [61]. Today, a number of modular,
highly efficient, automation-friendly tools and methodologies are
available for DNA construction, the most widely used being
Gibson Assembly [62]. This one-step, isothermal, scarless, in vitro
recombination approach uses exonuclease, DNA polymerase, and DNA
ligase activities to join DNA fragments with appropriate overlaps.
Another widely adopted tool, Golden Gate assembly [63], and its
variants such as Modular Cloning (MoClo) [64] and BASIC [65], among
others [66–68], take advantage of the intrinsic ability of type IIS
restriction enzymes to cut DNA at a defined distance outside their
recognition site. In this way, short overhangs can be introduced in
the DNA sequence, adjacent to the enzyme recognition site and
complementary to the overhangs of the adjacent parts, which are
later ligated by T4 DNA ligase. Several vendors sell Gibson and
Golden Gate assembly enzyme mixes and offer online tools and
instructions for properly designing oligos and experiments in
general. Several reagents (oligos, gene fragments, competent cells)
can also be delivered in SBS-standard plates. Ellis and colleagues
have published a very complete review of a plethora of powerful DNA
assembly methods [69].
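
The way a type IIS enzyme defines a part's overhangs can be illustrated with a short Python sketch. It assumes BsaI as an example enzyme (recognition site GGTCTC, cutting one nucleotide downstream of the site on the top strand and leaving a four-nucleotide overhang); the toy sequence and the function names are hypothetical and only serve the illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def overhangs_on_strand(seq, site="GGTCTC", gap=1, overhang_len=4):
    """Overhangs generated by sites read in the forward direction of `seq`.

    The enzyme cuts `gap` nucleotides after the site, and the next
    `overhang_len` nucleotides form the single-stranded overhang.
    """
    seq = seq.upper()
    found = []
    start = seq.find(site)
    while start != -1:
        cut = start + len(site) + gap
        if cut + overhang_len <= len(seq):
            found.append(seq[cut:cut + overhang_len])
        start = seq.find(site, start + 1)
    return found


def part_overhangs(seq):
    """Left and right overhangs of a part flanked by two inward-facing sites.

    Both overhangs are reported as read on the top strand.
    """
    left = overhangs_on_strand(seq)
    right = overhangs_on_strand(reverse_complement(seq))
    if len(left) != 1 or len(right) != 1:
        raise ValueError("expected exactly one site in each orientation")
    return left[0], reverse_complement(right[0])


# Toy part: site + spacer + left overhang + insert + right overhang + spacer + site
toy_part = "GGTCTC" + "A" + "AATG" + "ACGTACGT" + "GCTT" + "T" + "GAGACC"
print(part_overhangs(toy_part))
```

Two parts can be ligated next to each other when the right overhang of the upstream part equals the left overhang of the downstream part, which is how the assembly standards fix the order of parts in a construct.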

5.4 Open Science

The more automation and throughput are introduced into the lab
routine, the more biological material exchange and collaboration
are likely to be required. Exchanging biological material saves
the time and money otherwise spent resynthesizing and retesting
existing, well-characterized DNA parts and strains. Material
Transfer Agreements (MTAs) set out the legal terms and conditions
under which researchers share biological
materials, ensuring and respecting the rights of the creators, and
promoting safe practices and responsible research [70]. However,
the process of obtaining an agreement can be very bureaucratic and
time-consuming. Fortunately, there are initiatives such as the OpenMTA
(https://www.openplant.org/openmta/), which relaxes
restrictions on the redistribution and commercial use of biomaterials,
while supporting the practical realities of technology transfer by
being flexible enough to accommodate the needs of different
groups worldwide [70]. Widespread adoption of this system is highly
desirable in order to accelerate and simplify the MTA
process.
Also, community labs such as BioBlaze (https://www.bioblaze.org/),
BioCurious (http://biocurious.org/), and GenSpace
(https://www.genspace.org/); material exchange initiatives such
as the OpenMTA and the Free Genes project
(https://biobricks.org/freegenes/); outreach initiatives such as the
Community Biotechnology Initiative (CBI) [71] and the iGEM
competition; and low-cost robots like the OpenTrons OT2
(https://opentrons.com/) are facilitating, promoting, and
democratizing access to automation.

5.5 Metrology and Standardization

Metrics, standards, and modularity are intrinsic characteristics of
synthetic biology and make it possible to track and evaluate
experimental reproducibility, cost, time, and efficiency. Metrics
and standards also allow genetic engineering automation to be
critically analyzed against the same parameters, and even help
determine when automation is warranted, based on factors such as
assembly methodology, protocol details, and number of samples
[72, 73]. A number of tools and steps can be used or taken into
consideration to evaluate an automated system and to improve its
reproducibility over time [74].
Evaluating these metrics collaboratively among different
laboratories also provides valuable information on the robustness
of the protocol, the methodology, and the system as a whole
[17, 72]. A study conducted during the 2016 iGEM competition
by 92 institutions around the world brilliantly showcases
collaborative efforts in the metrology and standardization fields
with the objective, in this case, of tackling the lack of comparable
units to measure fluorescence. The participating groups measured
fluorescence from E. coli transformed with three engineered test
plasmids, plus positive and negative controls, using simple,
low-cost unit calibration protocols designed for use with a plate
reader and/or flow cytometer. The results of this project provided
not only comparable units but also valuable information about data
collection and processing, precision, and instances of protocol
failure [75].
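
The principle behind unit calibration can be sketched as follows: readings of a calibrant dilution series are used to estimate how many arbitrary instrument units correspond to one unit of calibrant, and sample readings are then divided by that factor. The Python example below is a minimal illustration under assumed conditions (a linear instrument response, a single blank value, invented numbers); it does not reproduce the protocol of the study cited above [75].

```python
def fit_conversion_factor(calibrant_amounts, readings, blank=0.0):
    """Least-squares slope through the origin, in arbitrary units per unit
    of calibrant, after subtracting the blank from each reading."""
    corrected = [reading - blank for reading in readings]
    sum_xx = sum(x * x for x in calibrant_amounts)
    if sum_xx == 0:
        raise ValueError("calibrant amounts must not all be zero")
    sum_xy = sum(x * y for x, y in zip(calibrant_amounts, corrected))
    return sum_xy / sum_xx


def to_calibrated_units(reading, units_per_calibrant, blank=0.0):
    """Convert a raw instrument reading into calibrant-equivalent units."""
    return (reading - blank) / units_per_calibrant


# Hypothetical dilution series measured on one plate reader.
amounts = [1.0, 2.0, 4.0]          # known calibrant amounts
raw = [110.0, 210.0, 410.0]        # raw readings, arbitrary units
factor = fit_conversion_factor(amounts, raw, blank=10.0)
print(to_calibrated_units(310.0, factor, blank=10.0))
```

Because each laboratory derives its own conversion factor from the same calibrant, values reported in calibrant-equivalent units can be compared across instruments even though the raw readings cannot.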
Recently, synthetic biologists have developed open-source
software called SynBioHub that facilitates the sharing of
information about engineered biological systems. By connecting to
relevant repositories, the software allows users to browse, upload,
and download data in various standard formats, regardless of their
location or representation. SynBioHub also provides a central
reference point for other resources to link to, delivering design
information in a standardized format using the Synthetic Biology Open
Language (SBOL), a data exchange standard for descriptions of
genetic parts, devices, modules, and systems. The goals of this
standard are to allow scientists to exchange designs of biological
parts and systems, to facilitate the storage of genetic designs in
repositories, and to facilitate the description of genetic designs in
publications [76, 77].
The lab automation field itself developed a standard
that today guides the whole lab automation industry, facilitating
adoption and stimulating competition in the sector. It started
with the definition of the microplate standard by the American
National Standards Institute (ANSI) and the Society of
Biomolecular Screening (SBS), now named the Society for Laboratory
Automation and Screening (SLAS); this standard is known as the
SBS standard.

6 Conclusion

Automation of genetic circuit synthesis is a recurrent technical
theme in scientific publications and roadmaps [78, 79] due to
rapid advances in software, high-throughput analysis, and
DNA assembly. Using automation to conduct large numbers of
experiments in parallel is making it increasingly possible to address
biological system function from a digital rather than analogue
perspective. By applying engineering principles of characterization,
standardization, and modularization to biological systems, allied
to a range of innovations in miniaturization, automation, and
metrology, predictability and development speed can be increased
and costs reduced. Previously intractable challenges can be
readdressed, and the potential to commercialize useful applications
enhanced. By enabling concepts to be translated more rapidly and
reliably into commercially viable processes, the cost of market entry
may be reduced, competitiveness enhanced, and the delivery of
benefits accelerated [79].
The development and adoption of automation-friendly
enabling technologies and the collaboration between different
groups and biofoundries have enabled the integration of
software development, liquid handling robotics, and protocol
development, empowering the design, construct, test, and learn cycle
in order to develop scalable new biotechnology applications.
However, effective automation requires the proper skills, physical
space, dedication, training, and funding to achieve scalability and
reproducible results. Challenges might include the proper translation
of a scientific protocol to an automated framework, adequate system
flexibility, proper throughput forecasting, and an adequate software
ecosystem for data management and analysis.
Biomanufacturing automation brings the powerful capability
to accelerate scientific discovery, while developing and sharing
standardized protocols and techniques, promoting training and
education, developing and adopting metrology standards, decreasing
costs and time, and fostering partnerships to ultimately deliver
transformative technologies that address complex problems in a
sustainable manner.

Acknowledgments

This work was supported by the U.S. Department of Energy, Office
of Biological and Environmental Research in the DOE Office of
Science [Grant Number DE-SC0018249].

References
1. Si T, Zhao H (2016) A brief overview of synthetic biology research programs and roadmap studies in the United States. Synth Syst Biotechnol 1(4):258–264
2. Khalil AS, Collins JJ (2010) Synthetic biology: applications come of age. Nat Rev Genet 11(5):367–379
3. Smanski MJ, Zhou H, Claesen J et al (2016) Synthetic biology to access and expand nature’s chemical diversity. Nat Rev Microbiol 14(3):135–149
4. Yeow JA, Ng PK, Tan KS et al (2014) Effects of stress, repetition, fatigue and work environment on human error in manufacturing industries. J Appl Sci 14(24):3464–3471. https://doi.org/10.3923/jas.2014.3464.3471
5. Chao R, Mishra S, Si T, Zhao H (2017) Engineering biological systems using automated biofoundries. Metab Eng 42:98–108
6. Studies L (2015) Industrialization of biology: a roadmap to accelerate the advanced manufacturing of chemicals
7. Nielsen J, Keasling JD (2016) Engineering cellular metabolism. Cell 164(6):1185–1197
8. Karim AS, Dudley QM, Jewett MC (2016) Cell-free synthetic systems for metabolic engineering and biosynthetic pathway prototyping. In: Wittmann C, Liao JC (eds) Industrial biotechnology. Wiley, Weinheim
9. Groth P, Cox J (2017) Indicators for the use of robotic labs in basic biomedical research: a literature analysis. PeerJ 5:e3997. https://doi.org/10.7717/peerj.3997
10. Hillson N, Caddick M, Cai Y et al (2019) Building a global alliance of biofoundries. Nat Commun 10:2040
11. Carbonell P, Jervis AJ, Robinson CJ et al (2018) An automated design-build-test-learn pipeline for enhanced microbial production of fine chemicals. Commun Biol 1:66. https://doi.org/10.1038/s42003-018-0076-9
12. Hayden EC (2014) The automated lab. Nature 516(7529):131–132
13. Opgenorth P, Costello Z, Okada T et al (2019) Lessons from two design-build-test-learn cycles of Dodecanol production in Escherichia coli aided by machine learning. ACS Synth Biol 8(6):1337–1351. https://doi.org/10.1021/acssynbio.9b00020
14. Olsen K (2012) The first 110 years of laboratory automation: technologies, applications, and the creative scientist. J Lab Autom 17(6):469–480. https://doi.org/10.1177/2211068212455631
15. Chapman T (2003) Lab automation and robotics: automation on the move. Nature 421(6923):661, 663, 665–6. https://doi.org/10.1038/421661a
16. Lundberg K (2012) Increase user adoption rates and realize a higher rate of return on your LIMS investment. GenomeWeb 1–8
17. Phillips P, Lithgow GJ, Driscoll M (2017) A long journey to reproducible results. Nature 548:387–388
18. Teytelman L (2018) No more excuses for non-reproducible methods. Nature 560(7719):411
19. Freedman LP, Cockburn IM, Simcoe TS (2015) The economics of reproducibility in preclinical research. PLoS Biol 13(6):e1002165. https://doi.org/10.1371/journal.pbio.1002165
20. Densmore DM, Bhatia S (2014) Bio-design automation: software + biology + robots. Trends Biotechnol 32:111–113
21. Hillson NJ, Rosengarten RD, Keasling JD (2012) J5 DNA assembly design automation software. ACS Synth Biol 1(1):14–21. https://doi.org/10.1021/sb2000116
22. Morrell WC, Birkel GW, Forrer M et al (2017) The experiment data depot: a web-based software tool for biological experimental data storage, sharing, and visualization. ACS Synth Biol 6(12):2248–2259. https://doi.org/10.1021/acssynbio.7b00204
23. Nielsen AAK, Der BS, Shin J et al (2016) Genetic circuit design automation. Science 352(6281):aac7341. https://doi.org/10.1126/science.aac7341
24. Appleton E, Densmore D, Madsen C, Roehner N (2017) Needs and opportunities in bio-design automation: four areas for focus. Curr Opin Chem Biol 40:111–118
25. Costello Z, Martin HG (2018) A machine learning approach to predict metabolic pathway dynamics from time-series multiomics data. NPJ Syst Biol Appl 4:19. https://doi.org/10.1038/s41540-018-0054-3
26. Jervis AJ, Carbonell P, Vinaixa M et al (2019) Machine learning of designed translational control allows predictive pathway optimization in Escherichia coli. ACS Synth Biol 8(1):127–136. https://doi.org/10.1021/acssynbio.8b00398
27. Hale AN (1999) 5 Building realistic automated production lines for genetic analysis. In: Craig AG, Hoheisel JD (eds) Methods in microbiology. Academic Press, San Diego
28. O’Sullivan B (2019) Points to consider when planning for lab automation projects. In: HighRes Bio. https://highresbio.com/blog/points-to-consider-when-planning-for-lab-automation-projects/
29. Opentrons (2019) Guide to choosing a lab automation platform. In: Opentrons. https://insights.opentrons.com/the-automated-pipetting-revolution-is-here
30. Butler JM (2012) New technologies and automation. In: Advanced topics in forensic DNA typing. Elsevier Academic Press, San Diego
31. Ham TS, Dmytriv Z, Plahar H et al (2012) Design, implementation and practice of JBEI-ICE: an open source biological part registry platform and tools. Nucleic Acids Res 40(18):e141. https://doi.org/10.1093/nar/gks531
32. Oberortner E, Cheng JF, Hillson NJ, Deutsch S (2017) Streamlining the design-to-build transition with build-optimization software tools. ACS Synth Biol 6(3):485–496. https://doi.org/10.1021/acssynbio.6b00200
33. Cohen L (2019) Writing your business plan. Nat Biotechnol 20(Suppl):BE33–BE35. https://doi.org/10.1038/nbt0602supp-BE33
34. Clark DP, Pazdernik NJ (2016) Synthetic biology: report to congress 2013. Biotechnology 419–445. https://doi.org/10.1016/B978-0-12-385015-7.00013-2
35. Chao R, Liang J, Tasan I et al (2017) Fully automated one-step synthesis of single-transcript TALEN pairs using a biological foundry. ACS Synth Biol 6:678–685. https://doi.org/10.1021/acssynbio.6b00293
36. NSF Broader impacts review criterion. https://www.nsf.gov/pubs/2007/nsf07046/nsf07046.jsp
37. Segal M (2019) An operating system for the biology lab. Nature 573(7775):S112–S113
38. Carbonell P, Radivojevic T, García Martín H (2019) Opportunities at the intersection of synthetic biology, machine learning, and automation. ACS Synth Biol 8(7):1474–1477. https://doi.org/10.1021/acssynbio.8b00540
39. Lima AN, Philot EA, Trossini GHG et al (2016) Use of machine learning approaches for novel drug discovery. Expert Opin Drug Discov 11(3):225–239
40. Kan A (2017) Machine learning applications in cell image analysis. Immunol Cell Biol 95(6):525–530
41. Ghosh A, Ando D, Gin J et al (2016) 13C metabolic flux analysis for systematic metabolic engineering of S. cerevisiae for overproduction of fatty acids. Front Bioeng Biotechnol 4:76. https://doi.org/10.3389/fbioe.2016.00076
42. Lawson CE, Harcombe WR, Hatzenpichler R et al (2019) Common principles and best practices for engineering microbiomes. Nat Rev Microbiol 17(12):725–741. https://doi.org/10.1038/s41579-019-0255-9
43. Kwon YC, Jewett MC (2015) High-throughput preparation methods of crude extract for robust cell-free protein synthesis. Sci Rep 5:8663. https://doi.org/10.1038/srep08663
44. Karim AS, Jewett MC (2018) Cell-free synthetic biology for pathway prototyping. Methods Enzymol 608:31–57
45. Perez JG, Stark JC, Jewett MC (2016) Cell-free synthetic biology: engineering beyond the cell. Cold Spring Harb Perspect Biol 8(12):a023853. https://doi.org/10.1101/cshperspect.a023853
46. Garamella J, Marshall R, Rustad M, Noireaux V (2016) The all E. coli TX-TL toolbox 2.0: a platform for cell-free synthetic biology. ACS Synth Biol 5(4):344–355. https://doi.org/10.1021/acssynbio.5b00296
47. Gregorio NE, Levine MZ, Oza JP (2019) A user’s guide to cell-free protein synthesis. Methods Protoc 2:24. https://doi.org/10.3390/mps2010024
48. Kay JE, Jewett MC (2015) Lysate of engineered Escherichia coli supports high-level conversion of glucose to 2,3-butanediol. Metab Eng 32:133–142. https://doi.org/10.1016/j.ymben.2015.09.015
49. Rustad M, Eastlund A, Marshall R et al (2017) Synthesis of infectious bacteriophages in an E. coli-based cell-free expression system. J Vis Exp (126):56144. https://doi.org/10.3791/56144
50. Dudley QM, Nash CJ, Jewett MC (2019) Cell-free biosynthesis of limonene using enzyme-enriched Escherichia coli lysates. Synth Biol 4(1):ysz003. https://doi.org/10.1093/synbio/ysz003
51. Schoborg JA, Clark LG, Choudhury A et al (2016) Yeast knockout library allows for efficient testing of genomic mutations for cell-free protein synthesis. Synth Syst Biotechnol 1:2–6. https://doi.org/10.1016/j.synbio.2016.02.004
52. Karim AS, Dudley QM, Juminaga A et al (2019) In vitro prototyping and rapid optimization of biosynthetic enzymes for cellular design. bioRxiv. https://doi.org/10.1101/685768
53. Moore SJ, MacDonald JT, Wienecke S et al (2018) Rapid acquisition and model-based analysis of cell-free transcription–translation reactions from nonmodel bacteria. Proc Natl Acad Sci U S A 115(19):E4340–E4349. https://doi.org/10.1073/pnas.1715806115
54. Huang A, Nguyen PQ, Stark JC et al (2018) Biobits™ explorer: a modular synthetic biology education kit. Sci Adv 4(8):eaat5105. https://doi.org/10.1126/sciadv.aat5105
55. Jewett MC, Forster AC (2010) Update on designing and building minimal cells. Curr Opin Biotechnol 21(5):697–703
56. Caschera F, Noireaux V (2016) Compartmentalization of an all-E. coli cell-free expression system for the construction of a minimal cell. Artif Life 22(2):185–195
57. Gulati S, Rouilly V, Niu X et al (2009) Opportunities for microfluidic technologies in synthetic biology. J R Soc Interface 6:S493–S506
58. Gach PC, Iwai K, Kim PW et al (2017) Droplet microfluidics for synthetic biology. Lab Chip 17:3388–3400
59. Gach PC, Shih SCC, Sustarich J et al (2016) A droplet microfluidic platform for automating genetic engineering. ACS Synth Biol 5(5):426–433. https://doi.org/10.1021/acssynbio.6b00011
60. Lashkaripour A, Rodriguez C, Ortiz L, Densmore D (2019) Performance tuning of microfluidic flow-focusing droplet generators. Lab Chip 19(6):1041–1053. https://doi.org/10.1039/C8LC01253A
61. Shetty RP, Endy D, Knight TF (2008) Engineering BioBrick vectors from BioBrick parts. J Biol Eng 2:5. https://doi.org/10.1186/1754-1611-2-5
62. Gibson DG, Young L, Chuang RY et al (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods 6(5):343–345. https://doi.org/10.1038/nmeth.1318
63. Engler C, Kandzia R, Marillonnet S (2008) A one pot, one step, precision cloning method with high throughput capability. PLoS One 3(11):e3647. https://doi.org/10.1371/journal.pone.0003647
64. Weber E, Engler C, Gruetzner R et al (2011) A modular cloning system for standardized assembly of multigene constructs. PLoS One 6(2):e16765. https://doi.org/10.1371/journal.pone.0016765
65. Storch M, Casini A, Mackrow B et al (2015) BASIC: a new biopart assembly standard for idempotent cloning provides accurate, single-tier DNA assembly for synthetic biology. ACS Synth Biol 4(7):781–787. https://doi.org/10.1021/sb500356d
66. Lai HE, Moore S, Polizzi K, Freemont P (2018) EcoFlex: a multifunctional moclo kit for E. coli synthetic biology. Methods Mol Biol 1772:429–444
67. Iverson SV, Haddock TL, Beal J, Densmore DM (2016) CIDAR MoClo: improved MoClo assembly standard and new E. coli part library enable rapid combinatorial design for synthetic and traditional biology. ACS Synth Biol 5(1):99–103. https://doi.org/10.1021/acssynbio.5b00124
68. Sarrion-Perdigones A, Falconi EE, Zandalinas SI et al (2011) GoldenBraid: an iterative cloning system for standardized assembly of reusable genetic modules. PLoS One 6(7):e21622. https://doi.org/10.1371/journal.pone.0021622
69. Casini A, Storch M, Baldwin GS, Ellis T (2015) Bricks and blueprints: methods and standards for DNA assembly. Nat Rev Mol Cell Biol 16(9):568–576
70. Kahl L, Molloy J, Patron N et al (2018) Opening options for material transfer. Nat Biotechnol 36(10):923–927
71. Kong DS, Thorsen TA, Babb J et al (2017) Open-source, community-driven microfluidics with metafluidics. Nat Biotechnol 35(6):523–529
72. Walsh DI, Pavan M, Ortiz L et al (2019) Standardizing automated DNA assembly: best practices, metrics, and protocols using robots. SLAS Technol 24(3):282–290. https://doi.org/10.1177/2472630318825335
73. Ortiz L, Pavan M, McCarthy L et al (2017) Automated robotic liquid handling assembly of modular DNA devices. J Vis Exp (130):54703. https://doi.org/10.3791/54703
74. Jessop-Fabre MM, Sonnenschein N (2019) Improving reproducibility in synthetic biology. Front Bioeng Biotechnol 7:18. https://doi.org/10.3389/fbioe.2019.00018
75. Beal J, Haddock-Angelli T, Baldwin G et al (2018) Quantification of bacterial fluorescence using independent calibrants. PLoS One 13(6):e0199432. https://doi.org/10.1371/journal.pone.0199432
76. Madsen C, McLaughlin JA, Mısırlı G et al (2016) The SBOL stack: a platform for storing, publishing, and sharing synthetic biology designs. ACS Synth Biol 5(6):487–497. https://doi.org/10.1021/acssynbio.5b00210
77. McLaughlin JA, Myers CJ, Zundel Z et al (2018) SynBioHub: a standards-enabled design repository for synthetic biology. ACS Synth Biol 7(2):682–688. https://doi.org/10.1021/acssynbio.7b00403
78. Bioeconomy G (2019) A research roadmap for the next-generation bioeconomy
79. Clarke LJ, Kitney RI (2016) Synthetic biology in the UK – an outline of plans and progress. Synth Syst Biotechnol 1(4):243–257

Chapter 6

Computer-Aided Design and Pre-validation of Large Batches of DNA Assemblies

Valentin Zulkower

Abstract
Type-2S restriction enzymes allow the routine assembly of large batches of synthetic constructs from
individual genetic parts. However, design flaws in the part sequence can cause assembly failures, incurring
troubleshooting costs and project delays. As a result, the careful design and checking of the assembly plan is
often a bottleneck of large assembly projects, and may require computational support. This chapter
demonstrates the use of two free and open-source web applications accelerating this task by automating
genetic part design and simulating type-2S cloning to detect potential assembly issues.

Key words Computer-aided design, Computer-aided manufacturing, DNA assembly, Synthetic Biology

1 Introduction

Advances in DNA synthesis technologies and robotics over the past
two decades have significantly reduced the costs and completion
times of Synthetic Biology projects [1, 2]. In particular, various
methods relying on type-2S restriction enzymes (which can cleave
DNA outside of their recognition site) enable the assembly of
reusable genetic parts into DNA constructs ranging typically from
2000 to 20,000 nucleotides in size [3–5]. However, these methods
require the sequence of each genetic part to be standardized, which
may involve the removal of internal type-2S restriction sites and the
addition of flanking restriction sites determining the part’s relative
position in the assembled construct. Design flaws at this stage can
result in assembly failure or artifacts which are often long and costly
to troubleshoot.
This chapter presents software solutions to help suppress
human error in part standardization, and ensure that the final
DNA sequences conform to the researcher’s expectations. We
describe two web applications routinely used at the Edinburgh
Genome Foundry (EGF) to design large projects involving
hundreds of different genetic parts and constructs, and released as
part of the EGF’s collection of public web applications
(https://cuba.genomefoundry.org). The first application streamlines the
standardization of large sets of genetic parts with respect to a
user-selected assembly standard. The second application uses
cloning simulation to predict final assembly sequences and detect
flawed assembly plans. Both applications are free and open-source,
and rely on open-source computational libraries developed at the EGF
(https://edinburgh-genome-foundry.github.io/).

Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229,
https://doi.org/10.1007/978-1-0716-1032-9_6, © Springer Science+Business Media, LLC, part of Springer Nature 2021

2 Batch Part Standardization

Several assembly standards based on type-2S restriction enzymes
have been proposed over the last decade, including MoClo [6],
YeastFab [7], and EMMA [8]. These standards differ by their
choice of restriction enzyme(s) and assembly overhangs. As a con-
sequence, each standard enforces a specific set of sequence design
rules to ensure that its genetic parts can be properly assembled
together, and that the resulting constructs have the expected
biological function. We will first see an example of part standardiza-
tion “by hand,” before showing how this can be automated for
larger batches using a dedicated web application.

2.1 Manual Standardization of a Genetic Part (Outline)

Here, we detail the different steps involved in the standardization of
a Green Fluorescent Protein (GFP) sequence for use at position
“p9” of the EMMA assembly standard, which makes it possible to express
other proteins (placed at position “p7”) with a downstream GFP
fusion, via a peptide chain in position p8.
1. Obtain a GFP-encoding nucleotides sequence, e.g., from the
NCBI website (https://www.ncbi.nlm.nih.gov/nuccore/
L29345.1).
2. Open the sequence in the editor of your choice. Sequence files
in text or FASTA format can be open in any text editor, while
files in Genbank format require specialized software such as
Benchling (https://www.benchling.com/) or Snapgene
(https://www.snapgene.com).
3. Add the two-nucleotide sequence “CA” at the beginning of the
sequence in order to make the sequence compatible with posi-
tion p9 of the EMMA standard. This dimer will complete
position p9’s left overhang GCGT to form the assembly scar
GCGTCA, encoding a short Alanine–Serine peptide chain.
Omitting the addition of “CA” will result in an out-of-frame
GFP sequence and biologically dysfunctional protein.
4. Add GCGT and TGCT on the left and right sides of the
sequence, respectively. These will be the sequences of the
Computer-Aided Design and Pre-validation of Large Batches of DNA Assemblies 159

part’s “sticky ends” after cleaving of the DNA by BsmBI, and


will anneal with parts for position 8 on the left and position
10 on the right.
5. Ensure that neither the recognition sequence of the BsmBI
enzyme (CGTCTC) nor its reverse complement (GAGACG)
appears in the sequence. If the parts are intended to be used in
hierarchical (two-step) assemblies, then the BsaI recognition
site should also be removed from the sequence. Use synony-
mous codon juggling to preserve the protein’s sequence while
removing restriction sites [9].
6. Add the sequence CGTCTCN to the left of the sequence
(where CGTCTC is a BsmBI recognition site and N is a nucle-
otide of your choice), and add the reverse complement NGA-
GACG on the right.
7. If the sequence is to be ordered from a commercial provider as
linear DNA, add around 5 base pairs on each end of the
sequence (see https://international.neb.com/tools-and-resources/usage-guidelines/cleavage-close-to-the-end-of-dna-fragments).
8. Order a DNA preparation of the resulting sequence.
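The manual steps above can be sketched in a few lines of Python. This is only an illustration of the EMMA p9 example, not the code of the web application described in the next sections. The function names, the spacer nucleotide "A" (the N of step 6), and the 5-bp padding "ATATA" (step 7) are arbitrary choices made for this sketch.

```python
BSMBI_SITE = "CGTCTC"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def has_bsmbi_site(seq):
    """True if the BsmBI site appears on either strand (step 5)."""
    seq = seq.upper()
    return BSMBI_SITE in seq or reverse_complement(BSMBI_SITE) in seq


def standardize_for_p9(cds, spacer="A", padding="ATATA"):
    """Apply steps 3 to 7 to a coding sequence destined for position p9.

    `spacer` and `padding` are hypothetical defaults chosen for this sketch.
    """
    core = "CA" + cds.upper()            # step 3: keeps the CDS in frame
    insert = "GCGT" + core + "TGCT"      # step 4: p9 overhangs
    if has_bsmbi_site(insert):           # step 5: no internal BsmBI site
        raise ValueError("internal BsmBI site: remove it by codon juggling")
    left = BSMBI_SITE + spacer           # step 6: CGTCTCN on the left
    right = reverse_complement(left)     #         NGAGACG on the right
    return padding + left + insert + right + padding  # step 7: end padding


if __name__ == "__main__":
    # A short toy coding sequence, used here only to show the layout.
    print(standardize_for_p9("ATGGTGAGCAAGGGC"))
```

The sketch only detects internal BsmBI sites. Removing them (step 5) still requires synonymous codon changes, which are not attempted here.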
For projects involving a large number of parts, manual standardization
can be time-consuming and error-prone. The next sections
describe the usage of the web application developed at the EGF to
automate these steps and apply an assembly standard’s rules to large
batches of genetic parts at once.

2.2 Preparing the Necessary Data Files

1. Specify the assembly standard by creating a spreadsheet (using
Microsoft Excel or LibreOffice) on the model of Fig. 1a. Some
standards may not have dedicated names for the different part
positions, in which case arbitrary position names can be chosen
by the user, as long as the specified overhangs comply with the
standard. Save the spreadsheet in Excel format (.xls or .xlsx) or
CSV format (.csv). The resulting file will be referred to as the
Standard Definition File.
2. Make sure that each part to be processed is named after the
template “POSITION_part-name,” where the POSITION
attribute refers to the part position, as defined in the standard
definition file. For instance, a GFP sequence to be standardized
for EMMA’s p9 position should be named “p9_GFP.”
3. Gather the sequences of all parts to be processed in a single file,
which can be either (1) a Fasta file in the format shown in
Fig. 1c, or (2) a zip file containing the sequences as separated
files in the Genbank format, each file named after the part
sequence it provides, for instance “p9_GFP.gb.” This file will
be referred to as the Sequences File.
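For large batches, the naming rule of step 2 and the Fasta option of step 3 can be checked and produced by a short script. The sketch below is an assumption-laden illustration: the function names are hypothetical, the set of position names stands in for the content of the Standard Definition File, and the output is plain Fasta with the part name as the record header, which may differ in detail from the layout of Fig. 1c.

```python
def check_part_name(name, positions):
    """True if `name` follows the "POSITION_part-name" template.

    The position is taken as everything before the first underscore, so
    this sketch does not support position names containing underscores.
    """
    position, separator, part_name = name.partition("_")
    return separator == "_" and position in positions and part_name != ""


def build_sequences_fasta(parts, positions, line_width=70):
    """Return the text of a Fasta file with one record per part.

    `parts` maps part names to sequences; `positions` is the set of
    position names allowed by the assembly standard.
    """
    records = []
    for name, sequence in parts.items():
        if not check_part_name(name, positions):
            raise ValueError(f"bad part name: {name!r}")
        lines = [sequence[i:i + line_width]
                 for i in range(0, len(sequence), line_width)]
        records.append(">" + name + "\n" + "\n".join(lines))
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    print(build_sequences_fasta({"p9_GFP": "ATGGTGAGCAAGGGC"}, {"p8", "p9"}))
```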
Fig. 1 Input and output of the web-based part standardization application. (a) User-created spreadsheet
defining the assembly standard to follow. (b) Screenshot of the web application showing the web form in its
entirety. (c) Sample files from the part standardization report: PDF summary of the report (front) and Fasta file
listing all standardized sequences for ordering from a DNA synthesis company

If some genetic part regions must be protected against arbitrary
modifications (such as promoter regions or coding sequences),
then these parts should be provided in Genbank format, with
annotations indicating design constraints, as explained in the next
section.
2.3 Protecting some Part Regions against Modifications

1. Before adding the part’s corresponding Genbank file to the zip
archive, open the Genbank file in a sequence editor, for
instance, the free software Snapgene Viewer (snapgene.com)
or Benchling (benchling.com).
2. Locate a sequence region which should be protected against
modifications and add an annotation at this location (the
method for which may vary from one sequence editor to
another). The Genbank type of the annotation should be
“misc_feature,” and the label of the annotation should be either
“@keep” (to forbid any mutation in the region) or “@cds”
(to allow codon juggling only, i.e., mutations that do not
change the translated protein sequence). Note that many
more design constraints are available, as listed in the documentation
of the underlying sequence optimizer DNA Chisel
(https://edinburgh-genome-foundry.github.io/DnaChisel).
3. Save the resulting Genbank record to a file (e.g., “p9_GFP.gb”)
and add the file to the zip archive.
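The difference between the two labels can be made concrete with a small check. The sketch below is not DNA Chisel code: it is a stand-alone illustration, with hypothetical function names, of what each label permits, assuming that edits are substitutions only (so coordinates do not shift) and that the protected coding region lies on the forward strand and starts in frame. The codon table is the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a forward-strand DNA sequence, ignoring trailing bases."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def edit_is_allowed(original, edited, start, end, label):
    """Check an edit against a protected region [start, end), 0-based.

    "@keep" forbids any change; "@cds" accepts changes that leave the
    translated protein identical (codon juggling).
    """
    before = original[start:end].upper()
    after = edited[start:end].upper()
    if label == "@keep":
        return before == after
    if label == "@cds":
        return len(before) == len(after) and translate(before) == translate(after)
    raise ValueError(f"unsupported label: {label}")


if __name__ == "__main__":
    original = "ATGCGTCTCTAA"  # toy CDS containing the BsmBI site CGTCTC
    edited = "ATGCGCCTCTAA"    # CGT -> CGC, both codons encode arginine
    print(edit_is_allowed(original, edited, 0, 12, "@cds"))
    print(edit_is_allowed(original, edited, 0, 12, "@keep"))
```

In this toy case the synonymous change removes the BsmBI site, which a “@cds” region accepts and a “@keep” region rejects.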

2.4 Using the Web Application

1. With the web browser of your choice (we recommend a modern
version of Google Chrome or Firefox), connect to the
application at the following address:
https://cuba.genomefoundry.org/domesticate_part_batches.
2. The application consists of a one-page form shown in Fig. 1b.
In the rest of this protocol, letters in parentheses (a), (b), etc.
refer to annotations in this figure.
3. Enter the name of the assembly standard used in (a). This
information is optional and only used for reference in
the report produced by the application.
4. Drag and drop the Standard Definition File in the upload box
(b).
5. Drag and drop the Sequences File in the upload box (c).
6. When using Genbank records, if the part names are provided
in the file names (e.g., “p9_GFP.gb”) rather than in
the Genbank metadata, set the selection menu in (d) to the
“Use file names as parts IDs” option.
7. Tick the checkbox (e) to allow sequence edits. If the box is left
unticked and some of the provided sequences cannot be stan-
dardized without sequence modifications, the standardization
of these parts will fail, and the failures will be signaled in the
resulting report with an indication for troubleshooting. If the
box is ticked, make sure that sensitive elements have been
protected as explained in the previous section.
8. Click on the “Domesticate” button (f) to start the standardiza-
tion of the parts. This will take a few seconds to a few minutes
depending on the number of parts to process (a progress bar
will be displayed).
2.5 Output

1. As the automated standardization process ends, a button
marked “Download Report” appears below the form. Clicking
the button will download a multi-file zipped report (the
Standardization Report).
2. Open Report.pdf, located at the root of the Standardization
Report, and review, for each part of the batch, the domestication
variant used (to check that each part has indeed been
standardized for the intended position), as well as the number
of nucleotides added or modified during the standardization.
3. Parts for which standardization failed due to unsatisfiable con-
straints will be indicated by messages both in the web interface
and in the PDF report. Refer to the subfolder “error_reports”
of the Standardization Report for more information, in partic-
ular the location of the problematic regions of the parts.
4. The subfolder “sequences_to_order” in the Standardization
Report contains the sequences of all domesticated parts, in
three formats (CSV, Excel, and FASTA). These files can be
uploaded to the website of commercial providers (which will
generally accept one of these formats) to order DNA preparations
of the standardized parts. However, before any DNA
ordering, it is recommended to check the overall validity of
the assembly plan, as detailed in the next section.

3 Batch Type-2S Assembly Pre-validation Via Cloning Simulation

Type-2S assembly protocols typically consist of mixing standard
genetic parts together with a restriction enzyme (to produce linear
parts with single-stranded overhangs) and a ligase (to assemble
complementary parts into a circular plasmid). An assembly plan
can be defined as the set of parts that must be mixed together to
obtain each desired construct. Despite the apparent simplicity of
the task, errors can be introduced during the writing of an
assembly plan. Some advanced standards, such as EMMA, allow the
assembly of up to 25 parts at once, and assemblies of fewer than
25 parts can be created by inserting biologically neutral “connector
parts” to cover unused positions. While offering more freedom to the
designer, such a standard is also more complex to use. Simple
mistakes, such as a flawed part standardization, the omission of a
part in the assembly plan, or the omission of a connector, can lead
to the failure of multiple assemblies at once, yielding either
unexpected DNA construct sequences or a total absence of viable
clones at the end of the assembly protocol. In both cases, a
design flaw uncovered at this stage requires the design and ordering
of new genetic parts, delaying the project by months and adding
thousands of dollars to its budget.
This section describes a web-based software application relying
on cloning simulation, that is, the in-silico modeling of restriction
and ligation reactions [10], to predict the outcome of assembly
reactions and validate assembly plans prior to any DNA ordering or
cloning work. The application offers several advantages for assembly
planning: with minimal input, it can detect part standardization
issues, find omitted or redundant parts in an assembly plan, and
produce the final sequence of the assemblies, which
will be useful for quality control at the end of the manufacturing
process, as discussed in the next chapter.
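To make the idea of cloning simulation concrete, here is a deliberately simplified sketch. It is not the code behind the web application. It assumes that every part, backbone included, is written as an insert flanked by two inward-facing BsmBI sites (CGTCTC, one spacer nucleotide, then a 4-nt overhang), that no part carries an internal site, and that overhangs only ligate to perfectly identical overhangs. All names are hypothetical.

```python
import re

SITE, SITE_RC = "CGTCTC", "GAGACG"  # BsmBI site and its reverse complement
# Layout: site, 1 spacer nt, 4-nt overhang, body, 4-nt overhang, 1 spacer nt, site
PART_PATTERN = re.compile(
    SITE + "[ACGT]([ACGT]{4})([ACGT]*?)([ACGT]{4})[ACGT]" + SITE_RC
)


def digest_part(sequence):
    """Return (left_overhang, body, right_overhang) released by BsmBI."""
    match = PART_PATTERN.search(sequence.upper())
    if match is None:
        raise ValueError("no insert flanked by inward-facing BsmBI sites")
    return match.groups()


def simulate_assembly(parts):
    """Chain digested parts by matching overhangs; return (order, sequence).

    `parts` maps part names to sequences. A ValueError is raised when the
    parts do not form a single circle that uses every part exactly once.
    """
    if not parts:
        raise ValueError("no parts provided")
    fragments = {name: digest_part(seq) for name, seq in parts.items()}
    by_left = {}
    for name, (left, _, _) in fragments.items():
        if left in by_left:
            raise ValueError(f"{name} and {by_left[left]} compete for {left}")
        by_left[left] = name
    start = next(iter(fragments))
    order, current = [], start
    while current not in order:
        order.append(current)
        right = fragments[current][2]
        if right not in by_left:
            raise ValueError(f"nothing anneals after {current} ({right})")
        current = by_left[right]
    if current != start or len(order) != len(fragments):
        raise ValueError("parts do not form one circle using every part")
    # Each junction overhang is written once, as the left end of a fragment.
    return order, "".join(fragments[n][0] + fragments[n][1] for n in order)


if __name__ == "__main__":
    toy_parts = {
        "A": "CGTCTCA" + "GCGT" + "AAAA" + "TGCT" + "TGAGACG",
        "B": "CGTCTCA" + "TGCT" + "CCCC" + "GGAG" + "TGAGACG",
        "C": "CGTCTCA" + "GGAG" + "TTTT" + "GCGT" + "TGAGACG",
    }
    print(simulate_assembly(toy_parts))
```

Leaving part "C" out of the toy plan raises an error because nothing anneals after "B", which is the kind of omission the application is meant to catch before any DNA is ordered.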

3.1 Preparing the Necessary Data Files

1. Gather the sequences of all parts involved in the assembly. The
sequences could be spread across different Fasta and Genbank
files, but for practicality, we recommend a single Fasta file or a
zip file containing the sequences as separate files in the Genbank
format, each file named after the part sequence it
provides.
2. Specify the assembly plan by creating a spreadsheet on the
model of Fig. 2a. Save the spreadsheet in Excel format (.xls or
.xlsx) or CSV format (.csv). The resulting file will be referred to
as the Assembly Plan Spreadsheet in the rest of this section.
3. The web application offers the possibility to omit connector
parts in the Assembly Plan Spreadsheet, and instead have the
necessary connectors for each construct automatically selected.
This requires gathering the sequences of all available connector
parts, as a single Fasta file or a zip file containing the
sequences as separate files in the Genbank format, each file
named after the connector part it provides. These file(s) will be
referred to as Connector Sequences in this section.

3.2 Using the Web Application

1. Connect to the following application with the web browser of
your choice: https://cuba.genomefoundry.org/simulate_gg_assemblies.
2. The application consists of a one-page form shown in Fig. 2b.
In the rest of this protocol, letters in parentheses (a), (b), etc.
refer to annotations in this figure.
3. Select the enzyme to be used for the assembly (a). Options are
BsaI, BsmBI (this option is also suitable for other type-2S
BsmBI isoschizomers such as Esp3I), BbsI, or the default
option “Autoselect,” which will attempt to guess the intended
enzyme based on the presence of recognition sites in the part
sequences of each assembly.
4. Drag and drop all sequence files in the upload box (b).
5. Tick the checkbox “Provide a list of assemblies” and drag the
Assembly Plan Spreadsheet into the box that appears (c). Note that
this step can be skipped if the assembly plan consists of a single
assembly.
Fig. 2 Input and output of the web-based cloning simulation application. (a) Screenshot of the web application
showing the web form in its entirety. (b) User-created spreadsheet specifying the assembly plan. Each line
starts with the name of the construct to be assembled, followed by the list of parts in each assembly. (c)
Organization of the Cloning Simulation Report file. (d) Schema of the Genbank record of “Construct 1” as
predicted by the application from the assembly plan of panel B. (e) Part connection schema for “Construct 2.”
The circularity of the schema indicates that the parts will indeed assemble properly into a circular plasmid

6. When using Genbank records, if the name of the standard parts
is provided in the file names (e.g., “p9_GFP.gb”) rather than in
the Genbank’s internal ID field, choose option “Use file names
as parts IDs” in the selection menu in (d).
7. The checkboxes in (e) provide customization options for the
final report. Check “Ensure each line gives a single assembly”
to flag in the report the construct definitions which may lead to
several valid assemblies (i.e., combinatorial assemblies). Check
“Ensure that no part is forgotten in the assemblies” to flag
constructs for which only a subset of the parts will assemble
into a valid circular assembly, other parts being redundant.
8. If the assembly plan requires completion by connectors, tick
the “Autoselect connectors” box and drag the Connector
Sequences into the box that appears.
9. Click on the “Predict Final Constructs” button (f) to start the
cloning simulation. This will take a few seconds to a
few minutes depending on the number of parts to process
(a progress bar will be displayed).

3.3 Output

1. As the cloning simulation process ends, a “Download” button
appears below the form (g). Clicking on the button will download
a multi-file zipped report on the user’s computer, referred
to as the Cloning Simulation Report in the rest of this section
(Fig. 2c).
2. File “assembly_plan.csv” provides the assembly plan, possibly
complemented with auto-selected connectors to form valid
assemblies. Note that if no connector completion was neces-
sary, this file is identical to the input assembly plan, and is
attached in the Cloning Simulation Report for traceability.
3. File “all_parts.csv” provides the alphabetical list of all parts
involved in the assembly plan, including auto-selected connec-
tors, and can be used as a materials checklist when carrying out
the assembly plan.
4. The Cloning Simulation Report features one folder for each
assembly. A folder contains a Genbank record with the pre-
dicted assembly sequence, as well as a schema of the assembly
(Fig. 2d) and the construct’s “connections graph” showing
connections between the different parts of the assembly
(Fig. 2e).
5. Review the connections graph to detect design problems.
Any connections graph that is not perfectly circular indicates an
invalid assembly plan. A linear connections graph, for instance,
may indicate that parts are missing from the assembly plan. A
part appearing at several places in the connections graph indicates
that it was digested into more than one fragment, that is, the
part sequence contains internal restriction sites which should
be removed.
6. Carefully review the final Genbank sequences to ensure that the
final sequences are biologically viable. For instance, check that
all open reading frames spanning over several assembly parts are
not affected by unwanted base pair deletion or insertion due to
assembly scars. Also check that the final constructs feature a
replication origin and the adequate resistance marker.
7. For convenience, copies of each assembly’s Genbank record are
gathered in the “all_records” folder. This set of Genbank
records can be used later on as the input of other software
applications, for example, for automated quality control, as
will be discussed in the next chapter.
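The open reading frame check of step 6 lends itself to a quick script. The sketch below uses hypothetical names and assumes the ORF lies on the forward strand and that its coordinates (0-based, end exclusive) are known, for instance from the annotations of the predicted Genbank record. It only looks at frame, start codon, and stop codons. It does not replace a careful review of the sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_orf(sequence, start, end):
    """Return a list of problems found in the ORF sequence[start:end]."""
    orf = sequence[start:end].upper()
    problems = []
    if len(orf) % 3 != 0:
        problems.append("length is not a multiple of 3 (frameshift?)")
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("does not start with ATG")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("premature stop codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    return problems


if __name__ == "__main__":
    # Toy fusion across the GCGTCA scar: the ORF stays in frame.
    in_frame = "ATGAAA" + "GCGTCA" + "GTGAGCTAA"
    # Same fusion with the "CA" dinucleotide forgotten: frame is lost.
    shifted = "ATGAAA" + "GCGT" + "GTGAGCTAA"
    print(check_orf(in_frame, 0, len(in_frame)))
    print(check_orf(shifted, 0, len(shifted)))
```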
References
1. Kosuri S, Church GM (2014) Large-scale de novo DNA synthesis: technologies and applications. Nat Methods 11(5):499–507. https://doi.org/10.1038/nmeth.2918
2. Chao R, Mishra S, Si T, Zhao H (2017) Engineering biological systems using automated biofoundries. Metab Eng 42:98–108. https://doi.org/10.1016/j.ymben.2017.06.003
3. Engler C, Kandzia R, Marillonnet S (2008) A one pot, one step, precision cloning method with high throughput capability. PLoS One 3(11):e3647. https://doi.org/10.1371/journal.pone.0003647
4. Tsuge K, Sato Y, Kobayashi Y, Gondo M, Hasebe M, Togashi T et al (2015) Method of preparing an equimolar DNA mixture for one-step DNA assembly of over 50 fragments. Sci Rep 5:10655. https://doi.org/10.1038/srep10655
5. Lin D, O’Callaghan CA (2018) MetClo: Methylase-assisted hierarchical DNA assembly using a single type IIS restriction enzyme. Nucleic Acids Res 46:e113. https://doi.org/10.1093/nar/gky596
6. Weber E, Engler C, Gruetzner R, Werner S, Marillonnet S (2011) A modular cloning system for standardized assembly of multigene constructs. PLoS One 6(2):e16765. https://doi.org/10.1371/journal.pone.0016765
7. Guo Y, Dong J, Zhou T, Auxillos J, Li T, Zhang W et al (2015) YeastFab: the design and construction of standard biological parts for metabolic engineering in Saccharomyces cerevisiae. Nucleic Acids Res 43(13):e88. https://doi.org/10.1093/nar/gkv464
8. Martella A, Matjusaitis M, Auxillos J, Pollard SM, Cai Y (2017) EMMA: an extensible mammalian modular assembly toolkit for the rapid design and production of diverse expression vectors. ACS Synth Biol 6(7):1380–1392. https://doi.org/10.1021/acssynbio.7b00016
9. Richardson SM, Wheelan SJ, Yarrington RM, Boeke JD (2006) GeneDesign: rapid, automated design of multikilobase synthetic genes. Genome Res 16:550–556. https://doi.org/10.1101/gr.4431306
10. Pereira F, Azevedo F, Carvalho Â, Ribeiro GF, Budde MW, Johansson B (2015) Pydna: a simulation and documentation tool for DNA assembly strategies using python. BMC Bioinformatics 16(1):142. https://doi.org/10.1186/s12859-015-0544-x
Chapter 7

Computer-Aided Planning for the Verification of Large Batches of DNA Constructs

Valentin Zulkower

Abstract
Restriction digest analysis and Sanger sequencing are among the most commonly used techniques to check
the sequence of synthetic DNA constructs. However, both require careful preparation to select restriction
enzymes or DNA primers adapted to the expected construct sequences. In projects involving the
manufacturing of large batches of synthetic constructs, the task can be tedious and error-prone. This
chapter demonstrates the use of two free and open-source web applications providing fast and automated
selection of enzymes and sequencing primers for DNA construct verification.

Key words Computer-aided manufacturing, DNA assembly, DNA verification, Sanger sequencing,
Restriction digest analysis, Synthetic Biology

1 Introduction

The assembly of standard genetic parts into circular plasmids is one
of the most common operations in modern genetic engineering. In
a typical protocol, DNA parts are fused together via enzymatic
ligation or PCR [1] and the assembly product is transformed into
bacteria. The bacteria are then plated to obtain isolated colonies,
and each colony can be cultivated and lysed in order to obtain a
high-concentration preparation of the assembled plasmid. How-
ever, only a fraction of the colonies may carry plasmids with the
expected sequence, due to undesired phenomena such as homolo-
gous DNA recombination in the bacterial cells, or parts
mis-annealing during the assembly reaction [2]. As a consequence,
assemblies featuring either a large number of parts or impeding
sequence patterns (such as homologies and tandem repeats) may
require the verification of over 20 colonies to obtain a valid plasmid
preparation. Quality control can therefore account for a significant
proportion of the costs and planning time spent on a high-
throughput DNA assembly project.

Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229,
https://doi.org/10.1007/978-1-0716-1032-9_7, © Springer Science+Business Media, LLC, part of Springer Nature 2021

While advances in Next Generation Sequencing (NGS) have
significantly decreased DNA verification costs [3], current NGS
methods still require a complex processing of the DNA samples
to be analyzed, and are only price-competitive for large batches of
hundreds to thousands of assemblies. Pre-existing methods such as
restriction digests and Sanger sequencing [4] remain popular solu-
tions for routine DNA verification, although they require careful
planning, in particular in the selection of ad hoc restriction enzymes
and sequencing primers, in order to obtain decisive results.
This chapter presents two web applications routinely used at
the Edinburgh Genome Foundry (EGF) to plan the verification of
large assembly sets via restriction digests and Sanger sequencing.
The first application automates the selection of a minimal set of
restriction enzymes for the verification of an assembly batch. The
second application automates the selection of primers for Sanger
sequencing verification of assembly batches, with a focus on primer
reuse across constructs to reduce prices and protocol complexity.

2 Automated Enzyme Selection for Restriction Digest

DNA assembly verification by restriction digest analysis consists of
digesting a DNA construct preparation using selected restriction
enzymes, and separating the resulting DNA fragments into distinct
“migration bands” via gel or capillary electrophoresis. Comparing
band migrations to that of DNA ladders (containing calibrated
DNA fragments of known sizes) enables the estimation of the
different fragment sizes, also called the “band profile” of a construct.
Band profiles resulting from the digestion of a DNA
sequence by a given set of enzymes can easily be predicted using
pen and paper, and observations differing significantly from the
predictions indicate an invalid construct.
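The pen-and-paper prediction amounts to locating the cut positions and measuring the distances between them. The following sketch shows that arithmetic under simplifying assumptions: recognition sites are palindromic (so one strand is searched), fragment sizes are counted on one strand without accounting for overhang length, and the enzyme table is supplied by the caller. The function names and the `(site, offset)` representation are choices made for this example. The two enzymes shown, EcoRI (G^AATTC) and XhoI (C^TCGAG), both cut one nucleotide into their site.

```python
def cut_positions(sequence, site, offset, circular=True):
    """Positions (0-based) where an enzyme with a palindromic `site` cuts.

    `offset` is the number of nucleotides between the start of the site
    and the cut on the searched strand.
    """
    seq = sequence.upper()
    n = len(seq)
    # On a circular molecule, a site may span the origin of the record.
    searched = seq + seq[:len(site) - 1] if circular else seq
    positions = []
    index = searched.find(site)
    while index != -1:
        positions.append((index + offset) % n)
        index = searched.find(site, index + 1)
    return positions


def band_sizes(sequence, enzymes, circular=True):
    """Fragment sizes (bp, largest first) after digestion by all enzymes.

    `enzymes` maps enzyme names to (site, offset) pairs.
    """
    n = len(sequence)
    cuts = sorted({position
                   for site, offset in enzymes.values()
                   for position in cut_positions(sequence, site, offset, circular)})
    if not cuts:
        return [n]
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    if circular:
        sizes.append(n - cuts[-1] + cuts[0])  # fragment spanning the origin
    else:
        sizes = [cuts[0]] + sizes + [n - cuts[-1]]
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    plasmid = "GAATTC" + "A" * 94 + "CTCGAG" + "A" * 194  # 300-bp toy circle
    enzymes = {"EcoRI": ("GAATTC", 1), "XhoI": ("CTCGAG", 1)}
    print(band_sizes(plasmid, enzymes))
```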
While less informative than DNA sequencing (which will be
addressed in the next section), restriction digests provide a simple
method for first-pass screening of mis-assembled constructs in a few
hours for less than $1 per assembly, and laboratory automation
advances in recent years have increased the possible batch size from
a few dozen to a few thousand constructs [5].
However, the selection of enzymes adapted to the constructs to
be verified can be challenging. An ideal digest would consist of one
or two enzymes cutting the construct in several places in order to
produce a multi-band profile, while ensuring that the bands are well
spaced (to prevent bands from fusing together and becoming
indistinguishable) and that fragment sizes are in the same range as the
ladder’s bands (typically from a few dozen to a few thousand base
pairs) so that they can be measured with good precision. Additionally,
when assembling many constructs, one may want to find a
single enzyme (or enzyme mix) suitable for all constructs, which
would make it possible to prepare a single digestion mix for the
project, greatly simplifying the protocol and saving reagents. If
need be, the objective could be relaxed to finding a pair of digests
such that any construct in the batch can be digested using one of
the two options. This section describes a software solution
automating the search for such optimal enzymes.
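The selection criteria described above can be written down as a small search problem. The sketch below is an illustration, not the algorithm of the web application: it takes band sizes that have already been predicted for each candidate digest and construct, and looks by brute force for the smallest set of digests such that every construct gets an acceptable pattern from at least one of them. The thresholds (size range, minimal ratio between neighboring bands as a stand-in for "well spaced") and all names are assumptions of this example, and the band sizes in the usage example are invented.

```python
from itertools import combinations


def is_good_pattern(bands, min_bands=3, max_bands=8,
                    size_range=(100, 10000), min_ratio=1.15):
    """Check band count, ladder range, and spacing of one band profile."""
    bands = sorted(bands)
    if not bands or not min_bands <= len(bands) <= max_bands:
        return False
    if bands[0] < size_range[0] or bands[-1] > size_range[1]:
        return False
    # Neighboring bands must differ by at least `min_ratio` in size.
    return all(big / small >= min_ratio for small, big in zip(bands, bands[1:]))


def select_digests(profiles, max_digests=2, **criteria):
    """Return the smallest tuple of digests covering all constructs, or None.

    `profiles` maps digest names to {construct name: list of band sizes}.
    """
    constructs = set().union(*(p.keys() for p in profiles.values()))
    good = {digest: {name for name, bands in by_construct.items()
                     if is_good_pattern(bands, **criteria)}
            for digest, by_construct in profiles.items()}
    for size in range(1, max_digests + 1):
        for combo in combinations(sorted(good), size):
            if set().union(*(good[d] for d in combo)) == constructs:
                return combo
    return None


if __name__ == "__main__":
    toy_profiles = {  # invented band sizes, for illustration only
        "AseI+EcoRI": {"C1": [3000, 1500, 700, 300], "C2": [4000, 900, 500]},
        "SphI+XhoI": {"C1": [5000, 480], "C2": [2500, 1500, 800, 400]},
    }
    print(select_digests(toy_profiles, min_bands=3, max_bands=6))
    print(select_digests(toy_profiles, min_bands=4, max_bands=6))
```

With the toy data, asking for at least three bands is satisfied by a single digest, whereas asking for at least four bands requires both digests, mirroring the relaxed objective described above. An exhaustive search like this one is only practical for small numbers of candidate digests.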

2.1 Using the Web Application

1. With the web browser of your choice (we recommend Google
Chrome or Firefox), connect to the application at the following
address: https://cuba.genomefoundry.org/select_digestions.
The application consists of a simple one-page form
shown in Fig. 1a. In the rest of this section, letters in parentheses
(a), (b), etc. refer to annotations in this figure.
2. Make sure that the selection box in (a) indicates “Good pat-
terns for all constructs.”
3. Choose the ideal range for the number of bands in a band
profile (b). Fewer than three bands are generally considered too
generic for good screening, and more than 8 bands generally
result in crowded band patterns.
4. Drag and drop the sequences of all constructs, in Genbank or
Fasta format, in the upload box (c).
5. Tick the checkbox in (d) to indicate that the sequences are
circular plasmids.
6. Indicate which ladder will be used among the options proposed
in (e), or the ladder with the closest range if it does not appear
in the options.
7. Enter all available enzymes as a comma-separated list in the text
box (f). Note that a few pre-set enzyme lists are available in the
selection on top of the text box.
8. Choose the maximum number of enzymes in a given digest, as
well as the maximum number of digests accepted to verify the
batch (g). For instance, when asking for 3–6 bands via a single
digest with 2 enzymes suitable for a batch of 10 constructs, the
application will return a digestion plan as shown in Fig. 1b,
relying on a mix of enzymes AseI and EcoRI. When asking for
4–6 bands and 2 possible digests for the same constructs, the
application will return a digestion plan consisting of AseI
+EcoRI, completed this time by a SphI+XhoI digest
(as shown in Fig. 1c). Notice how this second digest provides
4-band patterns for constructs C5, C6, and C8, for which
digest AseI+EcoRI only produced three bands. As a consequence,
each construct in the batch can be verified using either
AseI+EcoRI or SphI+XhoI.
9. Optionally, tick the boxes in (h) for the application to return
detailed plots showing, for each digest, which regions of a
construct correspond to the different bands in the band profile
(Fig. 1d).
Fig. 1 Form and output of the web-based enzyme selection application. (a) Screenshot of the web application
showing the web form in its entirety. (b) Example plot returned by the application for a batch of 10 constructs.
The selected enzymes are indicated on the left. (c) Plot returned by the application in complement to the one in
panel B when the user requires two different digests. (d) Example of construct map returned by the application
(here construct C1, with construct features blurred as not relevant for this chapter). Yellow features indicate
the construct regions corresponding to the different bands (labeled a, b, c, d) of the digestion pattern

3 Automated Primer Selection and Design for Sanger Sequencing

Sanger sequencing [4] enables the determination of a DNA
molecule’s sequence by pairing the DNA preparation to be sequenced
with different primers. Each sequencing primer is a small
oligonucleotide (typically 20 nucleotides long) homologous to a specific
region of the DNA molecule, and produces a Sanger read
determining the sequence of the segment located between two
points ~100 bp and ~1000 bp downstream of the homologous
region, for a typical cost of $1–2 per read. To ensure the success
of Sanger sequencing, each primer should be specific to the
targeted region, have a melting temperature of 55–65 °C (to avoid
weak reads or high background), and avoid strong secondary
structure that could impair annealing. This design process can
be automated via software [6]. The Sanger validation of large
batches of assemblies may therefore require the design and pur-
chase of hundreds of primers, as well as hundreds of sequencing
reactions, adding significantly to the overall time and cost of the
DNA assembly process. However, for batches of constructs assem-
bled from standard genetic parts, the number of necessary primers
can be greatly reduced, as (1) constructs sharing common genetic
parts present homologies, making it possible to use a primer with
several constructs, and (2) some primers ordered to sequence a
given batch may be reusable in the next. From this perspective,
we describe here a web application automating the selection of
primers for a given batch of constructs, via a strategy minimizing
the number of reads and the number of new primers required.
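Two of the quantities mentioned above are easy to estimate with a short script: the melting temperature of a candidate primer and the region a read is expected to cover. The sketch below uses a common GC-content approximation for the melting temperature of primers longer than about 13 nucleotides. Dedicated primer design software relies on more accurate nearest-neighbor thermodynamic models and also checks specificity and secondary structure, none of which is attempted here. Function names, the half-open coordinate convention, and the default 100 to 1000 bp read window are choices made for this example.

```python
def estimate_tm(primer):
    """Rough melting temperature (°C) from GC content.

    Uses the approximation Tm = 64.9 + 41 * (GC - 16.4) / N, intended
    for primers longer than about 13 nucleotides.
    """
    primer = primer.upper()
    gc_count = primer.count("G") + primer.count("C")
    return 64.9 + 41 * (gc_count - 16.4) / len(primer)


def read_window(primer_end, construct_length, first=100, last=1000,
                circular=True):
    """Intervals (0-based, end exclusive) expected to be read.

    `primer_end` is the position just after the 3' end of a forward
    primer. On a circular construct the read may wrap past the origin,
    in which case two intervals are returned.
    """
    start, end = primer_end + first, primer_end + last
    if not circular:
        start, end = min(start, construct_length), min(end, construct_length)
        return [(start, end)] if start < end else []
    span = min(last - first, construct_length)
    start %= construct_length
    end = start + span
    if end <= construct_length:
        return [(start, end)]
    return [(start, construct_length), (0, end - construct_length)]


if __name__ == "__main__":
    primer = "GCGCGCGCGCGCATATATAT"  # toy 20-mer with 12 G/C
    print(round(estimate_tm(primer), 2), 55 <= estimate_tm(primer) <= 65)
    print(read_window(4500, 5000))
```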

3.1 Preparing the Necessary Data Files

1. Prepare a zip archive containing the expected sequence of all
constructs in the batch as separate Genbank files (referred to as
the Constructs Sequences Archive in the rest of this section).
The name of each Genbank file should reflect the construct’s
name. If only part of these sequences should be covered, refer
to the instructions below.
2. Optionally, prepare a Fasta file gathering the sequences of all
primers already available to you. This file will be referred to as
the Primers Sequences File in the rest of this section.

3.2 Indicating Regions to Cover and Primer-Free Regions

The full Sanger sequencing of a 10-kb plasmid typically requires
20 reads to ensure a 2× coverage (where each nucleotide is read
twice). Consequently, sequencing a hundred plasmids will require
two thousand reads, and possibly hundreds of different primers,
making it costly and logistically challenging. To reduce the
complexity of the sequencing plan, one may want to restrict sequencing
to some regions of interest. For instance, when assembling several
genetic parts into a receptor vector, the sequencing of the receptor
region may be deemed unnecessary. One may also decide to only
sequence regions at the junctions between successive genetic parts,
as these locations may be more prone to assembly artifacts. Moreover,
one may want to avoid using a primer annealing at these
junctions, as the primer may not be able to anneal in case of artifacts
at this location, leading to no read at all. The following steps show
how to specify regions to cover and prevent primers at certain
Fig. 2 Input and output of the web-based primer selection application. (a) Schematic representation of an
assembly’s Genbank record, with part junctions annotated to indicate that these regions in particular should
be covered by sequencing, and should not be an annealing location for primers. (b) Screenshot of the web
application showing the web form in its entirety. (c) Example output schema showing the sequencing plan for
2 constructs. Short red triangles indicate primer annealing locations, blue features indicate Sanger reads from
newly designed primers, and purple features indicate Sanger reads using available primers

locations, using Genbank annotations as shown in Fig. 2a. These
steps are optional and can be skipped if the whole construct
sequence should be covered and no location is unfit for primers.
1. Before adding the construct’s corresponding Genbank file to
the zip archive, open the Genbank file in a sequence editor, for
instance, the free software Snapgene Viewer (see snapgene.
com) or Benchling (benchling.com).
2. Find the location of a sequence region which should be covered
by sequencing.
3. Add an annotation at this location (the method for which may
vary from one sequence editor to another). The Genbank type
of the annotation should be “misc_feature,” and the label of
the annotation “cover.”
4. Likewise, find the location of a sequence region in which primers
should be avoided. Add an annotation at this location
with Genbank type “misc_feature,” and the label of the
annotation “no_cover.”
5. Save the resulting Genbank record to a file and add the file to
the Constructs Sequences Archive.
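The read counts quoted at the start of this section, and the savings obtained by annotating only some regions, follow from simple arithmetic. The sketch below reproduces it with hypothetical function names. The default read length of 1000 bp is an assumption chosen to match the figures given in the text (20 reads for a 10-kb plasmid at 2× coverage), and the estimate for annotated regions is an upper bound that assumes no read is shared between two regions.

```python
from math import ceil


def reads_for_full_coverage(length_bp, coverage=2, read_length=1000):
    """Approximate number of reads to cover a whole construct."""
    return ceil(length_bp * coverage / read_length)


def reads_upper_bound_for_regions(region_lengths, coverage=2,
                                  read_length=1000):
    """Upper bound on reads when only some regions must be covered.

    Each region gets its own reads, so reads spanning two nearby
    regions are not taken into account.
    """
    return sum(ceil(length / read_length) * coverage
               for length in region_lengths)


if __name__ == "__main__":
    per_plasmid = reads_for_full_coverage(10_000)
    print(per_plasmid, 100 * per_plasmid)
    # Four 200-bp junction regions instead of the full plasmid:
    print(reads_upper_bound_for_regions([200, 200, 200, 200]))
```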

3.3 Using the Web Application

1. With the web browser of your choice, connect to the application
at the following address: https://cuba.genomefoundry.org/select_primers.
2. The application consists of a simple one-page form shown in
Fig. 2b. In the rest of this section, letters in parentheses (a), (b),
etc. refer to annotations in this figure.
3. Make sure the validation type is set to “Sanger sequencing” (a).
4. In selection box (b), indicate whether the primers should produce
reads on the 3′–5′ strand, the 5′–3′ strand, or both, the latter
corresponding to a 2× coverage where each nucleotide is read
once from each direction.
5. Drag the Constructs Sequences Archive in the upload box (c).
6. Tick the box (d) to indicate that the constructs to validate are
circular.
7. Optionally, drag the Primers Sequences File in the upload box (e).
8. Specify the expected read size (f) and the target annealing
temperature of the primers (g). The default values provided
are typical, but these parameters may slightly vary depending
on protocol details, and must be checked with the sequencing
laboratory.
9. Specify the number of digits used in the name formatting for
new primers (h). For instance, a value of 3 will result in primer
names of the form P001, P002, etc. Name collisions with
existing primers specified in the Primers Sequences File will
be automatically avoided.
10. Click on “Select primers” (i) to launch the primer selection,
which may take a few minutes (progress bars will be displayed).

3.4 Output

1. As the automated primer selection process ends, a “Download”
button appears below the form. Clicking on the button will download
a multi-file zipped report on the user’s computer, referred to as
the Primer Selection Report in the rest of this section, which
describes an optimized Sanger sequencing plan for the batch of
constructs. It consists of the three files described below.
174 Valentin Zulkower

2. File “coverage_plots.pdf” features schemas indicating the loca-


tion of primer annealing and expected read regions, as shown in
Fig. 2c, and can be used to quickly review the assembly plan.
3. File “primers_list.csv” lists all primers used in the sequencing
plan, in spreadsheet format, and can be used as a checklist
before starting the preparation of the sequencing. All newly
designed primers appear at the top of the list, making it easy to
order their sequences from commercial primer providers.
4. File “primers_per_record.csv” is a spreadsheet associating each
construct of the batch with the list of primers necessary to
sequence it (or cover the regions of interest). It is meant to
be used either as a checklist if the sequencing reactions are
prepared manually, or as a data file for generating robotic
pick-lists if sample handling is automated.

References
1. Casini A, Storch M, Baldwin GS, Ellis T (2015) Bricks and blueprints: methods and standards for DNA assembly. Nat Rev Mol Cell Biol 16(9):568–576. https://doi.org/10.1038/nrm4014
2. Potapov V, Ong JL, Kucera RB, Langhorst BW, Bilotti K, Pryor JM et al (2018) Comprehensive profiling of four base overhang ligation fidelity by T4 DNA ligase and application to DNA assembly. ACS Synth Biol 7(11):2665–2674. https://doi.org/10.1021/acssynbio.8b00333
3. Shapland EB, Holmes V, Reeves CD, Sorokin E, Durot M, Platt D et al (2015) Low-cost, high-throughput sequencing of DNA assemblies using a highly multiplexed Nextera process. ACS Synth Biol 4(7):860–866. https://doi.org/10.1021/sb500362n
4. Sanger F, Coulson AR (1975) A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J Mol Biol 94(3):441–448. https://doi.org/10.1016/0022-2836(75)90213-2
5. Dharmadi Y, Patel K, Shapland E, Hollis D, Slaby T, Klinkner N et al (2014) High-throughput, cost-effective verification of structural DNA assembly. Nucleic Acids Res 42(4):e22. https://doi.org/10.1093/nar/gkt1088
6. Hancock JM, Zvelebil MJ, Hancock JM (2004) PRIMER3. In: Dictionary of bioinformatics and computational biology. https://doi.org/10.1002/9780471650126.dob0560.pub2
Chapter 8

Characterizing Genetic Parts and Devices Using RNA Sequencing
Deepti Vipin, Zoya Ignatova, and Thomas E. Gorochowski

Abstract
Synthetic genetic circuits are composed of many parts that must interact and function together to produce a
desired pattern of gene expression. A challenge when assembling circuits is that genetic parts often behave
differently within a circuit, potentially impacting the desired functionality. Existing debugging methods
based on fluorescent reporter proteins allow for only a few internal states to be monitored simultaneously,
making diagnosis of the root cause impossible for large systems. Here, we present a tool called the Genetic
Analyzer which uses RNA sequencing data to simultaneously characterize all transcriptional parts (e.g.,
promoters and terminators) and devices (e.g., sensors and logic gates) in complex genetic circuits. This
provides a complete picture of the inner workings of a genetic circuit enabling faults to be easily identified
and fixed. We construct a complete workflow to coordinate the execution of the various data processing and
analysis steps and explain the options available when adapting these for the characterization of new systems.

Key words Genetic circuits, Genetic parts, Characterization, Biometrology, RNA-seq, Synthetic
biology

1 Introduction

Synthetic genetic circuits allow us to reprogram the behavior of
living cells [1]. They consist of many genetic parts and devices that
must work together to regulate gene expression [2]. A challenge
often faced when constructing complex genetic circuits is that
individual parts behave differently when assembled with other
components. Such contextual effects can arise due to changes in
the local sequence composition [3, 4], uncharacterized interactions
between parts [5, 6], competition for shared cellular resources [7–
9], and many other factors [10]. As the size and complexity of
genetic circuits grow [11–13], these malfunctions make it increas-
ingly difficult to construct a working system. Furthermore, because
there are numerous potential points of failure, it is difficult to
exhaustively test every part and single out the root cause. What is
needed is a way of measuring the function of every component in
the context of the complete circuit.

Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229,
https://doi.org/10.1007/978-1-0716-1032-9_8, © Springer Science+Business Media, LLC, part of Springer Nature 2021

To date, methods to characterize the performance of genetic
parts and devices have mostly relied on the read-out of fluorescent
reporter proteins as proxies for gene expression levels [11, 12, 14,
15]. Genes of interest are tagged with a fluorescent protein or a
fluorescent protein is co-expressed with a gene of interest. The
major benefit of this approach is that fluorescence can be easily
monitored in real-time across entire populations using a plate
reader or even in single cells using flow cytometry. However, a
number of limitations also exist. First, only a limited number of
fluorescent reporters can be used concurrently due to spectral
overlap [16], and second, modifications must be made to the circuit,
which may themselves alter the behavior of the system [2].
Next-generation sequencing has revolutionized many areas of
biological research offering a holistic view of many cellular pro-
cesses [17]. For example, RNA sequencing (RNA-seq) [18] can be
used to assess transcriptional regulation [19], ribosome profiling
(Ribo-seq) [20] can offer insight into protein translation [21], and
numerous other methods exist allowing us to assess the binding
sites of transcriptional regulators (ChIP-seq) [22] and even the
secondary structure of RNA molecules within a cell (SHAPE-seq,
PARS-seq) [23, 24], to name but a few. Unlike fluorescent repor-
ters, sequencing methods do not require the modification of the
host cell or a synthetic genetic circuit and provide a complete, cell-
wide snapshot [17, 18, 25]. Although such methods would allow
for the simultaneous characterization of many types of genetic part,
sequencing has so far seen limited use in synthetic biology. This is
changing with sequencing methods being recently used to charac-
terize transcriptional and translational processes across entire
genetic circuits [19, 21] and the application of multiplexing tech-
niques to significantly decrease costs [26].
In this chapter, we demonstrate how RNA-seq data can be used
to characterize genetic parts, devices, and the host cell response to a
synthetic genetic circuit [19]. We assume that RNA-seq data have
already been collected for the range of possible conditions the
circuit can be exposed to and show how a computational tool called
the Genetic Analyzer can be used to process and analyze these data.
Step-by-step instructions are given on how to install and create new
analysis workflows and the customizations that are necessary to
study new systems of interest (Fig. 1). We also provide brief descrip-
tions of how various tools are used at each step and the principles
underpinning the characterization of transcriptional promoters and
terminators (Fig. 2) and genetic devices like small-molecule sensors
and logic gates (Fig. 3).
[Figure 1: flowchart running from START to END through the scripts 00_setup.sh to 09_clean_up.sh, grouped into DATA PRE-PROCESSING and CHARACTERIZATION stages, with the input files (data/bed/S.bed, data/fasta/S.fasta, data/fastq/S.fastq, data/gff/S.gff, data/settings.txt) and the output files written to tmp/ and results/ at each step.]

Fig. 1 Overview of the workflow. Major analyses shown in boxes with dependencies and flows between
analyses shown by arrows. Dashed arrows denote input and output files to each step with “S” denoting a
prefix that would be replaced with a specific sample name

[Figure 2: genetic design of a promoter and a terminator drawn above a transcription profile plotted against position (bp), with the TSS, the TTS, δJ, and Te marked.]

Fig. 2 Method for characterizing promoter and terminator parts. For both types of part, a small region of the
transcription profile before and after each part is used to estimate changes in RNA polymerase (RNAP) flux
[14, 19]. For promoters, a sharp increase in RNAP flux occurs at the transcription start site (TSS) and the
absolute change in RNAP flux from before to after δJ captures the promoter strength. For terminators, a drop in
RNAP flux occurs at the transcription termination site (TTS) as RNAP physically unbind from the DNA. As this is
a stochastic process, the fractional drop in RNAP flux across the part is related to the termination efficiency Te
(i.e., percentage of RNAP that terminate). Genetic design shown with Synthetic Biology Open Language Visual
(SBOL Visual) symbols [34] and produced using DNAplotlib [35, 36]

[Figure 3: two panels, (a) Sensor and (b) NOT-gate. Each panel shows transcription profiles around the promoters Pin and Pout with the measured RNAP fluxes (Jin, Jon, Joff, δJout) on the left, and the resulting response function on the right.]

Fig. 3 Method for quantifying response function of genetic devices. (a) Sensors are characterized by the
activity δJ of an output promoter Pout in the presence (+) and absence (−) of an inducer molecule, or another
environmental factor [19]. (b) Genetic gates, such as a NOT-gate, are characterized by the relationship
between the total RNAP flux acting as input to the gate Jin and the activity δJout of the output promoter Pout
[19]. Steady-state measurements across a range of input combinations of a circuit can then be used to fit a
response function (e.g., Hill equation) for the device [12]. The response functions of each device are shown on
the right of each panel, and the transcription profiles and the RNAP flux measurements used to calculate these
are shown to the left

2 Materials

2.1 Software Dependencies

The Genetic Analyzer requires that the following software tools and
packages are installed and accessible from a command prompt. In
most cases, newer versions of the software should be compatible.
However, if issues are encountered, we recommend using the precise
versions listed below.
(a) Python version 2.7.9 [27]. We recommend using a packaged
Python distribution such as Anaconda (www.continuum.io)
or Enthought (www.enthought.com).
(b) R version 3.2.1 [28].
(c) edgeR version 3.8.6 [29].
(d) BWA version 0.7.4 [30].
(e) SAMtools version 1.4 [31].
(f) HTSeq version 0.9.1 [32].
(g) Git version 2.21.0.

2.2 Installation of the Genetic Analyzer

1. The Genetic Analyzer forms part of a collection of tools for
analyzing sequencing data. A copy of the latest Genetic Analyzer
can be downloaded by running the following command:

git clone https://github.com/VoigtLab/MIT-BroadFoundry

2. A directory called “MIT-BroadFoundry” will have been created.
Within this, the “genetic-analyzer” directory contains
code related to the Genetic Analyzer workflow. All analysis
scripts can be found in the “bin” directory and an example
workflow is given in the “circuit example” directory. To ensure
that each stage can find the analysis scripts it calls, it is essential
that each stage of the analysis workflow is executed from within
a workflow directory (a description of each stage is provided in
Subheading 3 below). This will ensure that the relative paths
used in the scripts point to the correct code.

2.3 Sequencing Data

The Genetic Analyzer assumes that sequencing data will be
provided in a standardized form to allow for automated processing
[19]. In particular, it requires paired-end RNA-seq data, with
FASTQ files provided for read 1 and read 2 of each fragment. We
recommend preparing strand-specific RNA-seq libraries [26] to
allow for the multiplexing of multiple samples during a single run,
and sequencing these libraries using an Illumina sequencer (e.g.,
HiSeq 2500). It is essential that a sufficient number of reads are
generated per sample to allow for accurate quantification of
genetic parts and devices. Although the precise number is
dependent on the size of the host genome and synthetic genetic
constructs present, for Escherichia coli cells, we find that
approximately four million reads per sample is sufficient for
accurate measurements from large genetic circuits [19].
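
Before starting the workflow, the read depth of each sample can be checked with a few lines of Python. The sketch below is an illustration, not part of the Genetic Analyzer: the function names are hypothetical, each FASTQ record is assumed to span exactly four lines, and the four million threshold is applied to the number of read pairs, which is an assumption about how the figure above should be counted.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming four lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError("%s has %d lines, not a multiple of 4" % (path, n_lines))
    return n_lines // 4


def check_paired_depth(read1_path, read2_path, min_reads=4000000):
    """Return (read pairs, depth_ok) for the two files of a paired-end sample.

    min_reads defaults to the approximate E. coli figure quoted in the text;
    treating it as a number of read pairs is an assumption.
    """
    n1 = count_fastq_reads(read1_path)
    n2 = count_fastq_reads(read2_path)
    if n1 != n2:
        raise ValueError("read 1 and read 2 files contain different numbers of reads")
    return n1, n1 >= min_reads
```

A mismatch between the read 1 and read 2 counts usually indicates a truncated or mispaired file and is worth resolving before mapping.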

3 Methods

In the following sections, we explain the key steps and processes
required to characterize genetic parts and devices, and to understand
the transcriptional response of the host cell upon introduction
of a synthetic genetic circuit. It is assumed that a copy of the
entire workflow and all scripts are available in the current path and
that all commands are run from this location (see Note 1).

3.1 Initial Workflow Setup

1. The first step is the creation of a new workflow to store all
sequencing data, metadata about the host system and synthetic
genetic circuits being studied, and the generated results. To
create a new workflow, it is advised that a copy of the “circuit
example” directory is made and renamed as appropriate.
Because the workflow relies on the specific location of certain
files, keeping the same directory structure within a workflow is
essential. Once a new workflow directory has been created, a
number of key files must be edited and added within the “data”
directory. We recommend editing and renaming the examples
provided to ensure that correct file formats are maintained.
2. Within the “data/bed” directory, a BED file [33] (*.bed) must
be present that provides the genomic regions for which transcription
profiles will be created. The chromosome names used
in this file must be identical to the names used in the provided
reference sequences (see step 3 below). The format of this file is
a line containing the chromosome name, start location, and
end location (tab separated) for each region required. It is vital
that transcription profiles are generated for regions containing
every part that will be characterized and it is not necessary for
the entire chromosome of the host cell to be considered, unless
genetic parts within the host will be measured.
3. Within the “data/fasta” directory, a FASTA file (*.fa or *.fasta)
must be present containing reference sequences for the host
genome and any other genetic constructs present. A standard
multi-FASTA format is used and the names of the chromo-
somes must match their use in other files.
4. Within the “data/fastq” directory, all raw FASTQ files (*.fq or
*.fastq) from the sequencing must be present. There should be
a pair of FASTQ files for each state of the circuit (e.g., combi-
nation of inducer molecules) corresponding to the paired-end
reads produced by the sequencer.
5. Within the “data/gff” directory, a GFF file (*.gff) must be
present describing the location of all features that will be
needed by the analysis workflow. This file should provide a
detailed annotation of the reference sequences giving the start
and end location of each feature, the strand on which the feature
resides (+ or –), as well as information about the type and other
metadata related to the part itself (if relevant). Table 1 describes
the five different features that can be used, and the related
metadata needed to capture regulatory links between parts
and other crucial information for part characterization. It
should be noted that differential gene expression analysis (see
Subheading 3.4) will only consider features of the “gene” type.
6. Finally, the “data/settings.txt” file contains a tab-delimited
table where each row corresponds to a particular sample/state
of the circuit (e.g., combination of inducers). Existing rows for
the circuit example should be edited or removed as necessary.
In addition, the first row is a header and the second row should
remain untouched as it states a central location for storing
results collating data from all circuit samples. We recommend
using relative paths so that the entire workflow directory can be
easily moved without causing issues relating to absolute paths
changing.
7. Before any analysis can be performed, a number of additional
directories must be created to provide a location for temporary
files and analysis results. This process is performed by the
“00_setup.sh” script which must first be edited to create separate
directories for each sample in the “results” and “tmp”
directories. The names of these directories should match precisely
the sample names in the “data/settings.txt” file.
8. The required directories are then created by running the
command:
sh 00_setup.sh

Table 1
Custom workflow feature types and parameters for GFF files

Type            Parameter (a)              Description
promoter        Name                       Name of the promoter. Promoter features are not used for calculating promoter strengths
promoter_unit   Name                       Name of the promoter unit. This contains one or more promoters and is used as the main feature for calculating promoter strengths
                promoter_names (b)         Names of the individual promoters
                promoter_types (b)         The type of each promoter, either “repress” if repressed by a gene in the circuit or “induced” if an output promoter from a sensor
                promoter_ns (b)            n values of the Hill functions characterizing the response function of each individual promoter making up the promoter unit
                chrom_inputs (b)           Chromosome of the input promoter units
                promoter_unit_inputs (b)   Other promoter units that act as inputs, driving expression of any genes that have a regulatory effect on this promoter unit (e.g., repressors). For induced promoters this should be a colon-separated list of each circuit state and a 0 or 1, if inactive or active, respectively (e.g., “state1 > 0:state2 > 1:state3 > 0”)
gene            Name                       Name of the gene. Gene features should span the coding region of a protein and are used to calculate differential gene expression (see Subheading 3.4)
transcript      Name                       Name of the transcript. Transcript features are used to generate the transcription profiles
                start_site                 Nucleotide position within the transcript where cleavage by a ribozyme takes place. A position of ten would be the tenth nucleotide in the transcript, not the reference sequence
terminator      Name                       Name of the terminator. Terminator features should span the full length of the part

(a) Parameters are provided as key-value pairs in the form “key = value.” Multiple parameters should be semicolon separated, that is, “parameter1 = value1; parameter2 = value2”
(b) Where multiple promoters make up a promoter unit, the values for each promoter within the unit should be comma separated, corresponding to the promoters in sequence, e.g., “Name = P1,P2”
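
The parameter conventions in the footnotes of Table 1 can be made concrete with a small parser. This is a minimal sketch based only on the rules stated there (semicolon-separated “key = value” pairs, comma-separated per-promoter values, colon-separated circuit states for induced promoters). The function names are hypothetical and the Genetic Analyzer's own parsing code may differ.

```python
def parse_gff_attributes(attribute_field):
    """Split 'key = value; key2 = value2' into a dict mapping keys to value lists.

    Comma-separated values (one per promoter in a promoter unit) become
    list entries; surrounding whitespace is ignored.
    """
    attributes = {}
    for pair in attribute_field.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attributes[key.strip()] = [v.strip() for v in value.split(",")]
    return attributes


def parse_induced_states(value):
    """Turn 'state1 > 0:state2 > 1' into {'state1': False, 'state2': True}."""
    states = {}
    for item in value.split(":"):
        name, _, flag = item.partition(">")
        states[name.strip()] = flag.strip() == "1"
    return states


if __name__ == "__main__":
    # Illustrative attribute string; U1, P1 and P2 are made-up names.
    field = "Name = U1; promoter_names = P1,P2; promoter_types = induced,repress"
    print(parse_gff_attributes(field))
    print(parse_induced_states("state1 > 0:state2 > 1:state3 > 0"))
```

Running a check like this over a hand-edited GFF file is a quick way to catch a missing semicolon or a promoter unit whose comma-separated lists have different lengths.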

3.2 Data Preprocessing

1. Once a complete workflow is set up, the raw RNA sequencing
data for each sample need to be mapped to the reference
sequences. This is performed by the “01_map_reads.sh” script
which calls the “map_reads.py” script to coordinate the SAMtools
[31] and BWA [30] software for each sample. This script
should be edited to include entries for each sample present in
the “data/settings.txt” file.
2. The mapping of sequencing reads is then performed by run-
ning the command:
sh 01_map_reads.sh

3. This creates BAM files [31] for each sample in the “tmp”
directory.
4. The next step is to generate read counts for each gene feature in
the GFF file of the system being studied. This is used when
calculating differential gene expression in Subheading 3.4. This
process is performed by the “02_count_reads.sh” script which
calls the “count_reads.py” script for each sample. This script
should be edited to include entries for each sample present in
the “data/settings.txt” file.
5. Read counts for each gene are then calculated by running the
command:
sh 02_count_reads.sh

6. The script will create multiple output files in the “results”
directory for each sample: “SAMPLE.counts.txt” containing
read counts for each gene, “SAMPLE.gene.lengths.txt” con-
taining the length of each gene used for calculations of gene
expression in Fragments Per Kilobase of transcript per Million
mapped reads (FPKM) units, and “SAMPLE.mapped.reads.
txt” containing information about the total number of mapped
reads. In all cases, “SAMPLE” in the filename is replaced by the
full sample name.
7. The next step is to generate fragment length distributions that
are needed to allow for the correction of reduced read counts at
the ends of transcripts. This process is performed by the
“03_fragment_distributions.sh” script which calls the
“fragment_distributions.py” script for each sample. This script
should be edited to include entries for each sample present in
the “data/settings.txt” file.
8. The distribution of fragment lengths for each sample is then
produced by running the command:
sh 03_fragment_distributions.sh

9. The script will create output files in the “results” directory for
each sample containing the fragment length distributions.
These files will be named in the format
“SAMPLE.fragment.distribution.txt” where SAMPLE is replaced by
the sample name.
10. The final step of the preprocessing is to collate the data gener-
ated for each sample separately and calculate the trimmed mean
of M-values (TMM) normalization factors using the edgeR
package [29] needed to enable comparison of read counts
between samples. This process is performed by the
“04_read_analysis.sh” script which calls the “read_analysis.py” script.
These scripts should not need to be edited.
11. Collation of all read data is then performed by running the
command:
sh 04_read_analysis.sh

12. The script will create five output files in the “results” directory:
“norm.factors.matrix.txt” containing between-sample TMM
normalization factors, “mapped.reads.matrix.txt” containing
mapped read counts, “count.matrix.txt” containing
read counts for each gene, “gene.lengths.matrix.txt” containing
the length of each gene, and “fpkm.normed.matrix.txt”
containing normalized FPKM expression values for each gene.
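
The quantities collected in this section combine into an FPKM value through a standard formula. The sketch below shows that arithmetic for a single gene. Scaling the mapped read total by the TMM factor follows the usual edgeR convention of an effective library size, which is an assumption about how “fpkm.normed.matrix.txt” is computed rather than something stated above. The function name is hypothetical.

```python
def fpkm(read_count, gene_length_bp, total_mapped_reads, norm_factor=1.0):
    """Fragments Per Kilobase of transcript per Million mapped reads.

    FPKM = count * 1e9 / (gene length in bp * effective library size),
    where the effective library size is the mapped read total multiplied
    by a between-sample normalization factor (assumed edgeR convention).
    """
    effective_library = total_mapped_reads * norm_factor
    return read_count * 1e9 / (gene_length_bp * effective_library)


if __name__ == "__main__":
    # 500 fragments on a 1 kb gene in a sample with 4 million mapped reads.
    print(fpkm(500, 1000, 4000000))
    # The same gene in a sample with a TMM factor of 1.25.
    print(fpkm(500, 1000, 4000000, norm_factor=1.25))
```

In this example the unnormalized value is 125 FPKM, and a normalization factor of 1.25 lowers it to 100 FPKM.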

3.3 Generating Transcription Profiles

1. Once the RNA-seq data have been preprocessed, the next step
is to generate transcription profiles for specified regions of the
host genome, as well as any synthetic genetic constructs that
might be contained on plasmids. This process is performed by
the “05_transcription_profiles.sh” script which should be edi-
ted such that calls to the “transcription_profile.py” script are
made for each sample. Chromosomes for which profiles should
be created are specified with the “-chroms” option.
2. Transcription profiles are then created by running the
command:
sh 05_transcription_profiles.sh

3. The script will create pairs of output files in the “results”
directory for each sample. These files will be named using the
formats “SAMPLE.fwd.norm.profiles.txt” and “SAMPLE.rev.
norm.profiles.txt” where SAMPLE is replaced by the full name
of the sample. These files contain transcriptional profiles for the
forward and reverse strands of the regions specified in the user-
provided BED file (see Subheading 3.1).
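
Because profiles are only generated for regions listed in the BED file, and the chromosome names there must match the reference FASTA (Subheading 3.1), a quick consistency check can save a failed run. The following sketch assumes the three-column, tab-separated BED layout described in Subheading 3.1. The function names and the example chromosome names are hypothetical.

```python
def read_fasta_names(fasta_text):
    """Return the sequence names (first word of each '>' header line)."""
    return [line[1:].split()[0]
            for line in fasta_text.splitlines()
            if line.startswith(">") and len(line) > 1]


def parse_bed(bed_text):
    """Parse tab-separated BED lines into (chromosome, start, end) tuples."""
    regions = []
    for line in bed_text.splitlines():
        if not line.strip():
            continue
        chrom, start, end = line.split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def unknown_chromosomes(regions, fasta_names):
    """List BED chromosome names that have no matching FASTA sequence."""
    known = set(fasta_names)
    return sorted(set(chrom for chrom, _, _ in regions if chrom not in known))


if __name__ == "__main__":
    fasta = ">host_genome\nACGT\n>circuit_plasmid\nGGCC\n"
    bed = "circuit_plasmid\t0\t4\nplasmid2\t10\t20\n"
    print(unknown_chromosomes(parse_bed(bed), read_fasta_names(fasta)))
```

Here the second BED line names a chromosome that is absent from the FASTA file, so it is reported as unknown.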

3.4 Analyzing Differential Gene Expression to Understand the Host Response

1. Synthetic circuits can impart a significant burden on a host cell
which is often manifested by changes in gene expression. Differential
gene expression analysis allows for shifts in expression
to be quantified in a robust manner, correcting for potential
between-sample variations due to differences in sequencing
depth. This analysis is performed by the “06_de_analysis.sh”
script, which should be edited to call the “de_analysis.py” script
for user-specified sets of samples to compare. For example, a
user can use the options “-group1 1,2,3 -group2 4,5,6” to
compare differences in gene expression between samples {1,
2, 3} and {4, 5, 6}. The numbers correspond to the sample in
that row of the “data/settings.txt” file. The “-output_prefix”
option can be used to provide a filename prefix for the output
file containing the results. This enables multiple differential
gene expression analyses to be performed simultaneously.
2. Differential gene analysis is then performed by running the
command:
sh 06_de_analysis.sh

3. The script will create output files in the “results” directory for
each analysis performed. These will be named in the format
“PREFIX.de.analysis.txt” where PREFIX is replaced by the
user provided “-output_prefix” in the “06_de_analysis.sh”
script.

3.5 Characterizing Promoters and Terminators

1. To characterize the performance of promoter and terminator
parts, we analyze changes in a transcription profile (i.e., RNAP
flux) from the start to the end of the part (see Fig. 2) [19]. To
perform this task, the “07_part_analysis.sh” script makes calls
to the “part_profile_analysis.py” script. These scripts load and
analyze each promoter and terminator part in the workflow’s
GFF file for every sample in the “data/settings.txt” file. These
scripts should not need to be edited.
2. Genetic part characterization is performed by running the
command:
sh 07_part_analysis.sh

3. The script will create two output files in the “results” directory:
“promoter.profile.perf.txt” containing estimates of promoter
strengths and “terminator.profile.perf.txt” containing termina-
tor efficiencies calculated from the transcription profiles (see
Fig. 2).
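
The principle behind these two outputs (Fig. 2) can be sketched in a few lines. This is a simplified illustration, not the calculation in “part_profile_analysis.py”: the window size, the use of a plain mean, and the formula Te = 1 − J_after / J_before are assumptions, and no correction for edge effects is attempted. The profile is taken to be a list of per-nucleotide values.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    return sum(values) / float(len(values))


def promoter_strength(profile, tss, window=10):
    """delta J: mean profile height after the TSS minus the mean before it."""
    before = mean(profile[tss - window:tss])
    after = mean(profile[tss:tss + window])
    return after - before


def terminator_efficiency(profile, part_start, part_end, window=10):
    """Fraction of the upstream RNAP flux that is lost across the terminator."""
    before = mean(profile[part_start - window:part_start])
    after = mean(profile[part_end:part_end + window])
    if before <= 0:
        raise ValueError("no upstream flux, efficiency is undefined")
    return 1.0 - after / before


if __name__ == "__main__":
    # Made-up profile: background, transcribed region, terminator, read-through.
    profile = [1.0] * 10 + [11.0] * 20 + [5.0] * 5 + [1.1] * 10
    print(promoter_strength(profile, tss=10))
    print(terminator_efficiency(profile, part_start=30, part_end=35))
```

For this made-up profile the promoter raises the flux by 10 units and the terminator removes about 90% of it.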

3.6 Quantifying the Response Function of Genetic Devices

1. In addition to measuring the performance of genetic parts in
isolation, we can also infer the functional response of many
parts that work together in concert as a genetic device. Examples
include sensor modules and genetic logic gates in which
promoters act as inputs and outputs. To characterize these
types of genetic device, we fit a steady-state response function
to capture how the input and output promoter activities (calculated
in Subheading 3.5) vary together across all states of the
system. It should be noted that when characterizing genetic
devices, it is essential that the samples taken span the full range
of possible inputs the system may be exposed to. This ensures
that inputs vary over their full range and improves the fitting of a
response function. In this workflow, we allow for genetic
devices that have activating and repressing Hill-like response
functions. The fitting of the response function to experimental
data is performed by the “08_promoter_fitting.sh” script. This
calls the “promoter_fitting.py” script for each set of samples
corresponding to a particular condition. The script will need to
be updated for the samples to be processed. If, for example, you
have assayed a circuit in two separate types of growth media,
then the samples for one medium should be fitted separately from
those for the other. This will, therefore, require two calls to the
“promoter_fitting.py” script with the appropriate samples given as
arguments to the “-samples” option.
2. Genetic device characterization is performed by running the
command:
sh 08_promoter_fitting.sh

3. The script will create the “fitted.promoter.perf.txt” output file
in the “results” directory. This contains fitted response func-
tions for each genetic device (see Fig. 3).
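
To make the idea of a Hill-like response function concrete, the sketch below defines a repressing and an activating form and fits K and n by a coarse grid search. The parameterization (ymin, ymax, K, n) is a commonly used form and is assumed here, as is the grid search itself. The “promoter_fitting.py” script may use a different parameterization and optimizer, and all function names are hypothetical.

```python
def hill_repressor(x, ymin, ymax, K, n):
    """Output falls from ymax to ymin as the input flux x increases."""
    return ymin + (ymax - ymin) * K ** n / (K ** n + x ** n)


def hill_activator(x, ymin, ymax, K, n):
    """Output rises from ymin to ymax as the input flux x increases."""
    return ymin + (ymax - ymin) * x ** n / (K ** n + x ** n)


def sum_squared_error(params, xs, ys, model):
    """Sum of squared differences between the model and the measurements."""
    return sum((model(x, *params) - y) ** 2 for x, y in zip(xs, ys))


def grid_fit(xs, ys, model, Ks, ns):
    """Pick the (K, n) pair from the grids with the lowest squared error.

    ymin and ymax are simply taken from the extremes of the data, which is
    only reasonable when the samples span the full input range.
    """
    ymin, ymax = min(ys), max(ys)
    best = None
    for K in Ks:
        for n in ns:
            params = (ymin, ymax, K, n)
            err = sum_squared_error(params, xs, ys, model)
            if best is None or err < best[0]:
                best = (err, params)
    return best[1]


if __name__ == "__main__":
    # Synthetic NOT-gate data generated from known parameters.
    xs = [0.01, 0.5, 1.0, 2.0, 4.0, 8.0, 100.0]
    ys = [hill_repressor(x, 0.1, 10.0, 2.0, 2.0) for x in xs]
    print(grid_fit(xs, ys, hill_repressor, [0.5, 1.0, 2.0, 4.0], [1.0, 2.0, 3.0, 4.0]))
```

Taking ymin and ymax from the data extremes is exactly why the text stresses sampling the full input range: if the inputs never saturate the device, those two values are underestimated or overestimated and the fitted K and n shift accordingly.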

3.7 Removing Temporary Files and Logs

1. Once the complete workflow has been run and all required
analyses performed, a clean-up step can be used to remove all
temporary files and logs. This will ensure that any generated
results remain untouched but will not allow for intermediate
steps to be rerun out of order (some of the temporary files are
necessary for many of the analyses).
2. The clean-up step is performed by the “09_clean_up.sh” script.
Before running, this file must be updated to include entries to
delete all contents from the “tmp” and “logs” directories
(including any sub-directories). Once edited, the script can be
executed using:
sh 09_clean_up.sh
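
For readers who prefer to script the clean-up in Python rather than edit the shell script, the same effect can be sketched as follows. This is an illustrative alternative, not the contents of “09_clean_up.sh”: the function names are hypothetical, and the directory names “tmp” and “logs” are those given above.

```python
import os
import shutil


def empty_directory(path):
    """Delete everything inside path but keep the directory itself."""
    removed = []
    for name in os.listdir(path):
        target = os.path.join(path, name)
        if os.path.isdir(target) and not os.path.islink(target):
            shutil.rmtree(target)  # remove sub-directories recursively
        else:
            os.remove(target)      # remove files and symbolic links
        removed.append(name)
    return sorted(removed)


def clean_workflow(workflow_dir, subdirs=("tmp", "logs")):
    """Empty the temporary and log directories, leaving results untouched."""
    report = {}
    for sub in subdirs:
        path = os.path.join(workflow_dir, sub)
        if os.path.isdir(path):
            report[sub] = empty_directory(path)
    return report
```

Calling `clean_workflow(".")` from within a workflow directory returns a dictionary listing what was removed from each directory. As with the shell script, deleted temporary files cannot be recovered, so intermediate steps would need to be rerun from the start.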

4 Notes

1. This workflow has been tested on Linux and macOS operating
systems and assumes that a UNIX-compatible command
prompt running a standard shell (e.g., sh, bash, or zsh) is
available. For Windows users, we recommend installing the
Windows Subsystem for Linux (WSL), which provides the
required command prompt for executing the scripts in the
workflow. All prerequisite tools must be installed and working
within this subsystem (see Subheading 2 for details).

Acknowledgments

D.V. and Z.I. were supported by the EU H2020 SynCrop
European Training Network (grant 764591). T.E.G. was supported
by BrisSynBio, a BBSRC/EPSRC Synthetic Biology
Research Centre (grant BB/L01386X/1) and a Royal Society
University Research Fellowship (grant UF160357).

References

1. Greco FV, Tarnowski MJ, Gorochowski TE (2019) Living computers powered by biochemistry. Biochemist 41:14–18
2. Brophy JAN, Voigt CA (2014) Principles of genetic circuit design. Nat Methods 11:508
3. Kosuri S et al (2013) Composability of regulatory sequences controlling transcription and translation in Escherichia coli. Proc Natl Acad Sci U S A 110(34):14024
4. Mutalik VK et al (2013) Precise and reliable gene expression via standard transcription and translation initiation elements. Nat Methods 10:354
5. Schmidl SR et al (2019) Rewiring bacterial two-component systems by modular DNA-binding domain swapping. Nat Chem Biol 15(7):690–698
6. Scott SR, Hasty J (2016) Quorum sensing communication modules for microbial consortia. ACS Synth Biol 5(9):969–977
7. Gorochowski TE, Avcilar-Kucukgoze I, Bovenberg RAL, Roubos JA, Ignatova Z (2016) A minimal model of ribosome allocation dynamics captures trade-offs in expression between endogenous and synthetic genes. ACS Synth Biol 5(7):710–720
8. Gyorgy A et al (2015) Isocost lines describe the cellular economy of genetic circuits. Biophys J 109(3):639–646
9. Qian Y, Huang H-H, Jiménez JI, Del Vecchio D (2017) Resource competition shapes the response of genetic circuits. ACS Synth Biol 6(7):1263–1272
10. Cardinale S, Arkin AP (2012) Contextualizing context for synthetic biology – identifying causes of failure of synthetic biological systems. Biotechnol J 7(7):856–866
11. Nielsen AAK et al (2016) Genetic circuit design automation. Science 352(6281):aac7341
12. Stanton BC, Nielsen AAK, Tamsir A, Clancy K, Peterson T, Voigt CA (2014) Genomic mining of prokaryotic repressors for orthogonal logic gates. Nat Chem Biol 10(2):99–105
13. Woodruff LBA et al (2016) Registry in a tube: multiplexed pools of retrievable parts for genetic design space exploration. Nucleic Acids Res 45(3):1553–1565
14. Canton B, Labno A, Endy D (2008) Refinement and standardization of synthetic biological parts and devices. Nat Biotechnol 26:787
15. Kelly JR et al (2009) Measuring the activity of BioBrick promoters using an in vivo reference standard. J Biol Eng 3(1):4
16. Kleeman B et al (2018) A guide to choosing fluorescent protein combinations for flow cytometric analysis based on spectral overlap. Cytometry A 93(5):556–562
17. Goodwin S, McPherson JD, McCombie WR (2016) Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet 17:333
18. Stark R, Grzelak M, Hadfield J (2019) RNA sequencing: the teenage years. Nat Rev Genet 20(11):631–656
19. Gorochowski TE et al (2017) Genetic circuit characterization and debugging using RNA-seq. Mol Syst Biol 13(11):952
20. Ingolia NT (2014) Ribosome profiling: new views of translation, from single codons to genome scale. Nat Rev Genet 15:205
21. Gorochowski TE, Chelysheva I, Eriksen M, Nair P, Pedersen S, Ignatova Z (2019) Absolute quantification of translational regulation and burden using combined sequencing approaches. Mol Syst Biol 15(5):e8719
22. Park PJ (2009) ChIP–seq: advantages and challenges of a maturing technology. Nat Rev Genet 10(10):669–680
23. Del Campo C, Bartholomäus A, Fedyunin I, Ignatova Z (2015) Secondary structure across the bacterial transcriptome reveals versatile roles in mRNA regulation and function. PLoS Genet 11(10):e1005613
24. Strobel EJ, Yu AM, Lucks JB (2018) High-throughput determination of RNA structures. Nat Rev Genet 19(10):615–634
Genetic Analyzer Tool 187

25. Conway T et al (2014) Unprecedented high- 31. Li H et al (2009) The sequence alignment/
resolution view of bacterial operon architecture map format and SAMtools. Bioinformatics 25
revealed by RNA sequencing. MBio 5(4): (16):2078–2079
e01442–e01414 32. Anders S, Pyl PT, Huber W (2014) HTSeq—a
26. Shishkin AA et al (2015) Simultaneous genera- Python framework to work with high-
tion of many RNA-seq libraries in a single reac- throughput sequencing data. Bioinformatics
tion. Nat Methods 12:323 31(2):166–169
27. Sanner MF (1999) Python: a programming 33. Quinlan AR, Hall IM (2010) BEDTools: a
language for software integration and develop- flexible suite of utilities for comparing genomic
ment. J Mol Graph Model 17(1):57–61 features. Bioinformatics 26(6):841–842
28. R. C. Team (2013) R: a language and environ- 34. Beal J et al (2019) Communicating structure
ment for statistical computing. R Foundation and function in synthetic biology diagrams.
for Statistical Computing, Vienna ACS Synth Biol 8(8):1818–1825
29. Robinson MD, McCarthy DJ, Smyth GK 35. Der BS et al (2017) DNAplotlib: programma-
(2009) edgeR: a bioconductor package for dif- ble visualization of genetic designs and asso-
ferential expression analysis of digital gene ciated data. ACS Synth Biol 6(7):1115–1119
expression data. Bioinformatics 26 36. Bartoli V, Dixon DOR, Gorochowski TE
(1):139–140 (2018) Automated visualization of genetic
30. Li H, Durbin R (2009) Fast and accurate short designs using DNAplotlib. In: Braman JC
read alignment with Burrows–Wheeler trans- (ed) Synthetic biology: methods and protocols.
form. Bioinformatics 25(14):1754–1760 Springer New York, New York, NY, pp
399–409
Chapter 9

Steady-State Cell-Free Gene Expression with Microfluidic Chemostats
Nadanai Laohakunakorn, Barbora Lavickova, Zoe Swank, Julie Laurent,
and Sebastian J. Maerkl

Abstract
Cell-free synthetic biology offers an approach to building and testing gene
circuits in a simplified environment free from the complexity of a living cell.
Recent advances in microfluidic devices have allowed cell-free reactions to run
under nonequilibrium, steady-state conditions, enabling the implementation of
dynamic gene regulatory circuits in vitro. In this chapter, we present a detailed
protocol to fabricate a microfluidic chemostat device which enables such operation,
detailing essential steps in photolithography, soft lithography, and hardware setup.

Key words Microfluidics, Cell-free, Synthetic biology, Steady-state gene expression

1 Introduction

One of the enduring challenges in synthetic biology today is the
overwhelming difficulty of predictive forward-engineering, despite
major efforts to characterize, standardize, and mathematically
model synthetic biological parts and systems [1]. Even if parts
such as promoters and regulators are initially well-characterized,
combining them into larger subsystems typically changes
the context of the parts as well as the host cell, resulting in diminished
predictive accuracy and, in some cases, a loss of the original
function altogether. Functional designs are therefore usually developed
not in a purely rational manner, but through rounds of empirical
design-build-test cycles. While this approach can certainly yield
functional designs, it is preferable to ultimately develop more efficient
and rational ways of engineering gene circuits.
Within synthetic biology, the adoption of cell-free systems has
become increasingly widespread [2]. From an engineering perspective,
they behave as a very simplified “host cell,” providing a constant
and controllable environment in which to build synthetic
gene networks. Cell-free systems are thus well suited for rational,
bottom-up engineering of biomolecular systems [3, 4]. Furthermore,
the functionality of cell-free systems can be expanded by
inclusion of additional components [5], and they provide a system for
quantitative analysis, including of mRNA and protein concentrations
[6, 7]. A second key benefit is that their ease of preparation and
scalability accelerate design-build-test cycles, resulting in their
adoption as an efficient rapid prototyping platform. Both lysate
[8, 9] and recombinant [10] cell-free reaction systems can now be
readily generated using standard laboratory equipment at reasonably
low costs.

Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229,
https://doi.org/10.1007/978-1-0716-1032-9_9, © Springer Science+Business Media, LLC, part of Springer Nature 2021

Microfluidics has allowed these benefits of cell-free synthetic
biology to be more fully realized [11]. By increasing throughput,
lowering reagent consumption, and providing control and
quantitative monitoring of thousands of reactions in parallel,
microfluidic devices have enabled precise characterization of cell-free
gene circuits both in integrated chips [12, 23] and in encapsulated
droplets [13, 14].
Batch cell-free reactions typically run to chemical equilibrium
as substrates are exhausted, reaction products accumulate, and
enzymatic machinery degrades. To maintain a more life-like non-
equilibrium steady state, large-scale continuous exchange or con-
tinuous flow reactors have been used to feed the reaction with small
molecules and wash away products through ultrafiltration mem-
branes [15]. At the microfluidic level, microchemostat devices have
been developed which replenish not only substrates but also the
enzymatic machinery, while at the same time diluting away reaction
products [16, 17]. These microchemostats enable long-term
steady-state reactions, and also allow for the investigation of bio-
logically relevant dynamical behaviors such as oscillations [16, 17]
and pattern formation [18].
In this chapter, we describe the entire process of designing,
fabricating, and operating a microfluidic chemostat device. The
chip we chose as an example is a revised and simplified version of
the microchemostat presented in Niederholtmeyer et al. 2013 [16],
and is shown in Fig. 1.
The operation of the device first involves selecting an input
solution using the multiplexer unit, which is directed to one of
eight separate reactor rings. Each reactor contains four output
ports, located at specific positions around the ring. Opening these
ports exchanges a fixed fraction of the reactor volume, with the
exact fraction depending on the position of the port. The place-
ment of these ports allows the reactor to be loaded with a reaction
of fixed composition, and importantly also allows a dilution step to
occur which preserves this composition. In between dilution steps,
the reaction is mixed using a peristaltic pump. Full details are given
in Subheading 3.5.

[Figure 1 panel labels: (a) flow layer, 14 µm channel height (rounded), 100 µm width; control layer, 40 µm channel height, 100 or 40 µm width; glass slide. (b) Valve unpressurized and pressurized. (c) 9-input multiplexer; dual-function valve and peristaltic pump; individual microreactor; outlet; scale bar 2 mm.]

Fig. 1 (a) A two-layer microchemostat design consists of a thin control layer sandwiched between a glass slide
and a thicker flow layer. (b) Applying pressure to channels in the control layer pushes up valves which close off
channels in the flow layer. (c) The chip contains eight individual chemostat reactors. Four control lines serve
as dual-function valves and peristaltic pump. Actuating these lines sequentially mixes the liquid inside the
reactors

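The port arithmetic described above can be illustrated with a minimal sketch. It assumes, as an interpretation rather than something stated in the text, that the outlet ports define nested segments of the ring, so that flushing a solution through a port replaces everything between the inlet and that port, and that each later flush in a sequence uses a smaller segment than the previous one. The port fractions (60%, 20%, 12%, and 4%) and the target compositions (40%/40%/20% on loading, 8%/8%/4% on dilution) are taken from Figs. 3 and 4; the function name and data layout are illustrative only.

```python
def flush_sequence(steps, initial=None):
    """Return the ring composition after flushing solutions through nested ports.

    steps:   list of (solution_name, fraction) in the order they are flushed.
             Each flush is assumed to replace the first `fraction` of the ring,
             so fractions must not increase along the sequence.
    initial: optional dict of the well-mixed composition before the sequence.
    """
    fractions = [fraction for _, fraction in steps]
    if any(later > earlier for earlier, later in zip(fractions, fractions[1:])):
        raise ValueError("fractions must be non-increasing (nested segments)")

    # Whatever was in the ring before survives only outside the first segment.
    untouched = 1.0 - fractions[0]
    result = {name: untouched * part for name, part in (initial or {}).items()}

    # Each solution keeps the part of its segment not overwritten by the next flush.
    for i, (name, fraction) in enumerate(steps):
        next_fraction = fractions[i + 1] if i + 1 < len(steps) else 0.0
        result[name] = result.get(name, 0.0) + (fraction - next_fraction)
    return result


# Initial loading: fill the whole ring with A, then flush B and C through
# the 60% and 20% ports.
loaded = flush_sequence([("A", 1.00), ("B", 0.60), ("C", 0.20)])

# One dilution step: flush A, B, and C through the 20%, 12%, and 4% ports.
diluted = flush_sequence([("A", 0.20), ("B", 0.12), ("C", 0.04)], initial=loaded)
```

Under these assumptions, the loading sequence gives a 40%/40%/20% mixture, and the dilution sequence replaces 20% of the ring while leaving that composition unchanged, which is the property the port placement is designed to provide.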
We describe the photolithographic steps required to print the
chip design on a chrome mask, and subsequently transfer it onto
silicon wafers. Once fabricated, these silicon molds can be used for
multiple rounds of soft lithography, in which they are used to cast
polydimethylsiloxane (PDMS) devices. Finally, the hardware
required for operating the chip is described, and a standard experiment
is outlined. Related protocols are available in the literature
[19, 20].

2 Materials

The photolithography steps were carried out in a Class 100 clean
room at EPFL. Soft lithography was done in a dedicated space in a
standard wet lab. Specialized machines, consumables, and chemicals
are listed below.

2.1 Photolithography Machines

1. VPG200 photoresist laser writer (Heidelberg Instruments Mikrotechnik GmbH).
2. HMR900 mask processor (Hamatech APE GmbH).
3. Optispin SB20 spin coater and VB20 hotplate (ATMsse GmbH).
4. MJB4 mask aligner (Süss MicroTec AG).
5. Tepla 300 plasma stripper (PVA Tepla AG).
6. LSM250 spin coater and HP200 hotplate (Sawatec AG).
7. AccuPlate thermal accumulator and hot plate system (Detlef Gestigkeit).

2.2 Photolithography Consumables

1. AZ 9260 positive photoresist (MicroChemicals GmbH).
2. GM1070-SU8 negative photoresist (Gersteltec).
3. 1-methoxy-2-propyl-acetate (PGMEA) developer (Sigma).
4. AZ 400 K developer (Merck).
5. AZ 351B developer (Merck).
6. Cr01 chrome etchant (Technic).
7. Hexa-methyl-disilazane (HMDS) primer (Technic).
8. Silicon wafers, diameter 100 ± 0.5 mm, thickness 525 ± 25 μm, P-type
(boron-doped), and resistivity 0.1–100 Ω cm (Siegert).
9. SLM5 500 blank chrome mask (Nanofilm).

2.3 Soft Lithography Machines

1. ARE-250 centrifugal mixer (Thinky).
2. SCS G3P-8 spin coater (Specialty Coating Systems Inc.).
3. Schmidt Press manual hole puncher and 21-gauge (OD 0.04″) pins
(Technical Innovations, Inc.).
4. Diener Femto 40 kHz low pressure plasma oven with O2 supply
(Diener electronic GmbH + Co. KG).
5. Universal Oven UF110, 108 L (Memmert).
6. SZX10 dissection microscope with DF PLANO 1.25× objective and
KL 1500 LCD light source (Olympus).

2.4 Soft Lithography Consumables

1. Trimethylchlorosilane (Sigma).
2. Sylgard 184 polydimethylsiloxane (PDMS) elastomer and curing agent
(Dow Corning).
3. Glass slides, 76 × 26 × 1 mm, 631–1550 (VWR).

2.5 Microfluidic Hardware

1. 12-station aluminium pneumatic manifold with 24 V 3-way normally open
solenoid valves (S10MM-31-24-2/A, Pneumadyne).
2. Polycarbonate manual luer manifold (Cole-Parmer).
3. Custom relay circuit board (see Note 1).
4. Type 10, 2–60 psi and 2–25 psi pressure regulators (Marsh Bellofram).
5. 0.1–3 bar pressure gauge (Riegler & Co. KG).

2.6 Microfluidic Connectors

1. Synflex 1201-M06 polyethylene (PE) tubing, OD 6 mm, ID 4 mm (Eaton).
2. Low-density polyethylene (PE-LD) tubing, OD 1/8″, ID 1/16″ (Tuyau).
3. Tygon tubing, OD 0.06″, ID 0.02″ (Cole-Parmer).
4. Fluorinated ethylene-propylene (FEP) tubing, OD 1/16″, ID 1/32″ (Upchurch).
5. Polyetheretherketone (PEEK) tubing, OD 1/32″, ID 0.18 mm (Vici).
6. Luer stubs, 12 mm, 23 and 20 ga.
7. Male-to-male and 1/16″ barb to male luer adaptors.
8. Stainless steel connecting pins, OD 0.65 mm, ID 0.35 mm, 8 mm (Unimed).
9. Brass Series G pneumatic fittings (Serto AG).
10. Blue Series pneumatic fittings (Riegler & Co. KG).

2.7 Microscope Hardware

1. Ti2 Eclipse Inverted Microscope (Nikon).
2. Objectives: CFI Achro 4× NA 0.1 (Nikon); CFI S Plan Fluor 20× NA 0.45
ELWD DIC N1 (Nikon).
3. Filters: F36–504 mCherry HC filter set (Semrock); FITC (Nikon).
4. Microscope enclosure and heater (Okolab).
5. Sola SM II Light Engine (Lumencor).
6. Orca-Flash 4.0 V3 Digital CMOS Camera (Hamamatsu).

2.8 Software

1. AutoCAD 2019 (Autodesk).
2. CleWin (WieWeb).
3. LabView 2018 (National Instruments).
4. Matlab 2019 (Mathworks).

2.9 Experimental Reagents

1. TX-TL cell-free extract, ribosomes and energy solution, prepared as in [13].
2. DNA template, prepared as in [14].

3 Methods

3.1 Design of Microfluidic Devices

1. Design the device (see Note 2) in AutoCAD 2019 or other software with
similar functionality. A specific example is shown in Fig. 1, and other
designs are available on our webpage (see Note 3). Export the final design
as a .dxf file.
2. Using CleWin, convert the designs to a machine-compatible .cif file ready
for photomask fabrication.
3. During curing, the PDMS layers will differentially shrink, with the
thicker flow layer shrinking more than the thinner control layer, which
remains attached to the rigid mold. Thus, it is crucial to enlarge the
entire flow layer design by 1.5%. This can be done in CleWin during the
conversion.
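The 1.5% enlargement in step 3 is a uniform scaling of the whole flow layer. The sketch below shows the geometry only: it scales a list of (x, y) coordinates about the centre of their bounding box by a factor of 1.015. The function name, the example outline, and the choice of the bounding-box centre as the fixed point are assumptions made for illustration; in practice the scaling is applied inside the layout software, and the whole layer must share one common centre so that relative feature positions are preserved.

```python
def scale_about_center(points, factor=1.015):
    """Scale (x, y) points uniformly about the centre of their bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in points]


# Hypothetical 20 mm x 10 mm outline, coordinates in micrometres.
outline = [(0, 0), (20000, 0), (20000, 10000), (0, 10000)]
scaled = scale_about_center(outline)

# Width and height of the scaled outline, in micrometres.
scaled_width = scaled[1][0] - scaled[0][0]
scaled_height = scaled[2][1] - scaled[1][1]
```

For this example outline, a 20 mm wide design grows by 300 µm in total, which is several channel widths and shows why the correction matters for aligning the two layers.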

3.2 Photolithography for Mask and Wafer Fabrication

3.2.1 Mask Fabrication

1. Expose chrome masks with the VPG200 laser writer, using a 20 mm write
lens (see Note 4) and 48% intensity. Make sure that the polarity and
mirroring of the mask are correct (see Note 5).
2. Next, process the exposed masks using the HMR900 mask processor. This
involves the following automated steps:
3. First purge the machine with deionized (DI) water.
4. Then develop for 100 s with a diluted developer mixture (AZ 351B:DI water
in the ratio 1:3.75) and rinse with DI water.
5. Etch through the chrome layer for 60 s using the Cr01 etchant, and rinse.
6. Finally, strip the photoresist using the AZ 400 K developer for 35 s,
followed by a final rinse and drying with CO2. The completed masks should be
completely dry before use.
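If a developer mixture such as the 1:3.75 AZ 351B:DI water dilution in step 4 has to be prepared by hand, the component volumes follow directly from the parts ratio. The helper below is a minimal sketch; the function name and the 475 mL batch size are illustrative assumptions, not values from the protocol.

```python
def dilution_volumes(total_ml, parts_developer, parts_water):
    """Split a total volume into developer and water for a parts:parts ratio."""
    total_parts = parts_developer + parts_water
    developer_ml = total_ml * parts_developer / total_parts
    water_ml = total_ml - developer_ml
    return developer_ml, water_ml


# Hypothetical 475 mL batch at 1:3.75 (developer:DI water).
developer_ml, water_ml = dilution_volumes(475, 1, 3.75)
```

For the assumed batch size, this gives 100 mL of developer and 375 mL of DI water. The same helper applies to any other parts-to-parts dilution.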

3.2.2 Flow Mold Fabrication

1. Prime a clean Si wafer with hexamethyldisilazane (HMDS) (see Note 6) for
10 s in vacuum, using the VB20 hotplate.
2. Transfer the wafer onto the Optispin SB20 spin coater and dispense a few
ml of positive-resist AZ9260 onto the center of the wafer, taking care to
avoid bubbles (see Notes 7 and 8).
3. Spin coat at 920 rpm for 100 s, followed by 60 s relaxation at 0 rpm. This
deposits a 14-μm layer of photoresist on the surface of the wafer.
4. When the spin coating has finished, immediately transfer the wafer to a
preheated hotplate, and “softbake” for 6 min exactly at 115 °C.
5. Transfer the wafer to an opaque storage box and allow it to rehydrate for
a minimum of 1 h (see Note 9).
6. Load the appropriate chrome mask onto the MJB4 mask aligner, and expose
for 2 cycles at 18 s per cycle, with a waiting time of 10–15 s between each
cycle, using the Hg-i line (365 nm) at 20 mW/cm² (see Notes 10 and 11). Use
the following parameters: expose type = hard, alignment gap = 30,
WEC type = cont, N2 purge = NO, and WEC-offset = OFF.
7. Develop immediately (maximal waiting time is 1 h) by transferring the
wafer to a bath of diluted AZ 400 K developer (1:3 developer:DI water).
Develop face-up and gently agitate the wafer in the bath for 10 min (see
Note 12).
8. Rinse with DI water, then carefully but rapidly dry the wafer with N2,
and inspect features under a microscope. If photoresist residues remain,
develop further until all the residues are removed and repeat the cleaning
and drying.
9. Finally, transfer the wafer to the AccuPlate hotplate, and carry out a
“reflow” bake using the following program to round-off features (see
Note 13): 1 h ramp up to 170 °C, 2 h at 170 °C, and 1 h ramp down to room
temperature.
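The exposure and development settings in steps 6 and 7 can be related to the guideline values in Notes 11 and 12 with two lines of arithmetic: the delivered dose is the lamp intensity multiplied by the total exposure time, and the guideline development time scales with resist thickness. The sketch below uses only numbers quoted in this chapter; the function names are illustrative.

```python
def exposure_dose_mj_per_cm2(intensity_mw_per_cm2, seconds_per_cycle, cycles=1):
    """Delivered dose in mJ/cm2: intensity (mW/cm2) x total exposure time (s)."""
    return intensity_mw_per_cm2 * seconds_per_cycle * cycles


def development_time_s(thickness_um, seconds_per_um=45):
    """Guideline development time, using the ~45 s per micrometre from Note 12."""
    return thickness_um * seconds_per_um


# Flow mold: 2 cycles of 18 s at 20 mW/cm2 on a 14 um AZ9260 layer.
flow_dose = exposure_dose_mj_per_cm2(20, 18, cycles=2)
flow_development_min = development_time_s(14) / 60
```

With these inputs the delivered dose is 720 mJ/cm², compared with the 580 mJ/cm² recommended in Note 11 for 15 μm of AZ9260 under i-line exposure, and the guideline development time is 10.5 min, close to the 10 min given in step 7. Recomputing both values is a useful check whenever the lamp intensity or resist thickness differs from those used here.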

3.2.3 Control Mold Fabrication

1. Clean the Si wafer with 2.45 GHz O2 plasma in the Tepla 300 Plasma
Stripper, using 500 W for 7 min and 400 ml/min of O2.
2. Transfer the wafer onto the LSM250 spin coater and dispense a few ml of
negative resist GM1070-SU8 onto the center of the wafer, taking care to
avoid bubbles.
3. Spin coat a 40-μm layer of photoresist onto the wafer using the following
program: 5 s/0–500 rpm, 5 s/500 rpm, 21 s/500–1933 rpm, 40 s/1933 rpm,
1 s/1933–2933 rpm, 1 s/2933–1933 rpm, 5 s/1933 rpm, and 26 s/1933–0 rpm.
4. When the spin coating has finished, immediately transfer the wafer to the
hotplate and carry out an initial relaxation followed by a softbake using
the following program (see Note 14): 30 min at 30 °C, then 3000 s ramp 30 °C
to 130 °C, 300 s at 130 °C, and then 3000 s ramp 130 °C to 30 °C.
5. Load the appropriate chrome mask onto the MJB4 mask aligner and expose for
1 cycle at 16 s, using the Hg-i line (365 nm) at 20 mW/cm². Use the
following parameters: expose type = soft, alignment gap = 30,
WEC type = cont, N2 purge = NO, and WEC-offset = OFF.
6. Transfer the wafer to the HP200 hotplate for a postexposure bake using the
following program: 2400 s ramp 30 °C to 90 °C, 2400 s at 90 °C, 2700 s at
60 °C, and 2700 s at 30 °C.
7. Transfer the wafer to an opaque storage box and wait from 1 h to overnight
before development.
8. Develop by transferring the wafer to a bath of
propylene-glycol-methyl-ether-acetate (PGMEA) developer (see Note 15). Gently
agitate the wafer in the bath for 2 min before transferring to a bath of new
developer for a further 1 min.
9. Rinse with isopropanol. If a reaction is visible (white residues appear)
then return the wafer to PGMEA for 30–60 s before rinsing with isopropanol
again. Let it dry naturally.
10. Inspect features under a microscope and carefully develop further if
needed. Avoid overdevelopment, which can lead to breaking of features.
11. Finally, transfer to the hotplate and carry out a “hardbake” using the
following program: 30 min ramp to 135 °C, 2 h at 135 °C, and then 30 min ramp
down to room temperature.
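Multi-segment programs such as the spin-coating recipe in step 3 are easier to check, and to transfer to a different instrument, when written out as data. The sketch below transcribes that recipe as (duration, start speed, end speed) segments and computes its total duration and the speed at any time point. Linear ramps within each segment are an assumption, since the protocol lists only the end points; all names are illustrative.

```python
# Each segment is (duration_s, start_rpm, end_rpm), transcribed from step 3.
SPIN_PROGRAM = [
    (5, 0, 500),
    (5, 500, 500),
    (21, 500, 1933),
    (40, 1933, 1933),
    (1, 1933, 2933),
    (1, 2933, 1933),
    (5, 1933, 1933),
    (26, 1933, 0),
]


def total_seconds(program):
    """Total run time of a segment program in seconds."""
    return sum(duration for duration, _, _ in program)


def speed_at(program, t):
    """Spin speed (rpm) at time t, assuming linear ramps within each segment."""
    elapsed = 0.0
    for duration, start, end in program:
        if t <= elapsed + duration:
            return start + (end - start) * (t - elapsed) / duration
        elapsed += duration
    return float(program[-1][2])  # past the end of the program: final speed


spin_total_s = total_seconds(SPIN_PROGRAM)

# The softbake in step 4, expressed the same way: 30 min hold plus three timed stages.
softbake_total_min = (30 * 60 + 3000 + 300 + 3000) / 60
```

The spin program runs for 104 s in total, and the softbake in step 4 occupies the hotplate for 135 min, which is worth knowing when scheduling clean room time.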

3.3 Soft Lithography for Device Fabrication

3.3.1 Silanization of Wafers

1. Before first use, place wafers inside a sealed box with a few drops
(0.5 mL) of trimethylchlorosilane and incubate for at least 12 h. Repeat the
silanization for 10 min before each subsequent use.

3.3.2 Casting and Curing of PDMS Devices

1. In two plastic cups, weigh out and add PDMS elastomer and curing agent in
a ratio 5:1 (50 g: 10 g) for the flow layer and 20:1 (20 g: 1 g) for the
control layer.
2. Defoam the mixture using the ARE-250 centrifugal mixer, by mixing at
2000 rpm for 1 min followed by defoaming at 2200 rpm for 2 min. These values
correspond to machine settings specific for the ARE-250, which is not a
standard centrifuge but a “planetary” mixer, i.e., the samples spin on a
platform which itself revolves around a central axis.
3. Clean both flow and control wafers using pressurised N2.
4. Put the flow layer wafer on aluminium foil inside a glass petri dish. Make
sure the foil covers the dish and contains the PDMS fully. Pour all of the
5:1 PDMS mixture on top of the wafer and place the dish inside a vacuum
desiccator for 40 min to degas the mixture.
5. Put the control layer wafer in the SCS G3P-8 spin coater, and carefully
pour a few ml of the 20:1 PDMS onto the center of the wafer. To coat the
wafer, run the following program: Step 0, rpm = 0, disp = 2, ramp = 0.0,
dwell = 0; Step 1, rpm = 1420, disp = none, ramp = 20.0, dwell = 35; Step 2,
rpm = 100, disp = none, ramp = 20.0, dwell = 1; and Step 3, rpm = 100,
disp = none, ramp = 1.0, dwell = 0.
6. After coating, the PDMS layer will be uneven due to the high 40-μm
features. Place the wafer on aluminium foil in a second petri dish, cover to
protect from dust, and set aside on the bench for 40 min.
7. Then bake both flow and control wafers in an oven at 80 °C. The flow layer
is baked for 20 min, and the control layer for 25 min. Timings for this step
must be exact (see Note 16).
8. Remove the wafers from the oven. Using a sharp scalpel, cut out each
design from the flow layer, and immediately place on top of the corresponding
control layer region, roughly aligning the two layers.
9. Once all the devices have been roughly aligned in this way, transfer the
control wafer to a stereo dissection microscope, and align the two layers by
manually lifting off and carefully placing the top layer in its precise
position (see Note 17).
10. Put the aligned devices back into the oven at 80 °C and bake for a
minimum of 1 h 30 min.
11. Cut the multilayer devices off the wafer using a scalpel.
12. Using the hole puncher, punch through all the channel inlets.
13. Protect the PDMS surfaces from dust using Scotch tape. The completed PDMS
devices can now be stored in a clean petri dish until the next step.
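The elastomer-to-curing-agent ratios in step 1 can be converted to masses for any batch size with a short helper. This is a minimal sketch with an illustrative function name; the batch sizes of 60 g and 21 g are simply the totals implied by the masses quoted in step 1.

```python
def pdms_masses(total_g, ratio):
    """Split a total PDMS mass into elastomer and curing agent.

    ratio is the elastomer:curing agent ratio by mass, e.g. 5 for 5:1.
    """
    elastomer_g = total_g * ratio / (ratio + 1)
    curing_agent_g = total_g - elastomer_g
    return elastomer_g, curing_agent_g


# Flow layer, 5:1, 60 g total; control layer, 20:1, 21 g total.
flow_elastomer_g, flow_agent_g = pdms_masses(60, 5)
control_elastomer_g, control_agent_g = pdms_masses(21, 20)
```

These calls reproduce the 50 g: 10 g and 20 g: 1 g quantities of step 1, and the same function can be used to scale the batch for larger or smaller wafers.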

3.3.3 Bonding of PDMS Devices to a Glass Slide

1. Clean glass slides using pressurised N2.
2. Remove any residual dust from the slide and the feature surface of the
PDMS device using Scotch tape (see Note 18).
3. Switch on the Femto plasma oven and place the slide and PDMS device
inside, bonding side up.
4. Pump out the chamber for at least 15 min to ensure a clean vacuum
environment.
5. Switch on the O2 for 2 min at a flow rate of 25 sccm and 0.1 bar, then
apply 30 s of plasma at 100% power (corresponding to a 40 kHz, 100 W plasma;
see Note 19).
6. Immediately ventilate the plasma byproducts before opening the chamber.
Put the PDMS and glass together and manually apply even, moderate pressure
for a few seconds (see Note 20). Then, put the bonded device into an oven at
80 °C for 1 h to overnight.
7. The completed devices can be stored at room temperature until use (see
Note 21).

[Figure 2 labels: compressed air supply split into a control branch (regulator; electric manifold with solenoid valves, connected to the relay board and PC; PE tubing, OD 6 mm; male luer; luer stub, 23 ga; water-filled control line; connector pin, ID 0.35 mm; to chip) and a flow branch (regulator; manual manifold; luer stub, 23 ga, for buffers; male-to-male luer adaptor and luer stub, 20 ga, for TX-TL reagents; reagent line; PEEK tubing, ID 0.18 mm; connector pin, ID 0.35 mm; to chip).]

Fig. 2 Pneumatic connections for the setup. The compressed air supply is split into two independently
regulated branches. Pressure in the control branch is switched using electric valves while the flow branch is
controlled manually. Buffers and other input solutions are stored in Tygon tubing, while cell-free (TX-TL)
reagents are stored in FEP–PEEK tubing

3.4 Hardware Setup

Air pressure is supplied to the setup using polyethylene (PE) tubing
connected directly to the laboratory compressed air supply. A schematic
of the setup’s pneumatic connections is shown in Fig. 2.

3.4.1 Regulation of Control Layer Pressure

1. Connect one branch of the input air supply to a regulator, and direct the
regulated output supply to the aluminium electric manifold.
2. The electric manifold directs air pressure to the chip’s control lines.
Attach Tygon tubing (ID 0.02″) to the manifold using appropriate adaptors as
shown in Fig. 2. The tubing contains a 23 ga luer stub on one end (used for
filling and connecting to the manifold) and a stainless steel connector pin
on the other (used for connecting to the chip).
3. Plug the electric manifold into the relay board, which links via USB to a
PC running control software written in LabVIEW. An example of the code and
full documentation can be found online (see Note 22).

3.4.2 Regulation of Flow Layer Pressure

1. Connect the other branch of the input air supply to a regulator, and
connect the regulated supply to the manual luer manifold.
2. Adjust the pressure as required (typically ~0.3 bar).
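Pressures in this protocol are quoted in both bar and psi: the regulators are rated in psi and the gauge in bar, the flow layer is typically run at ~0.3 bar, and the control layer is filled at ~10 psi and operated at ~20–30 psi. A small conversion helper avoids mistakes when reading one instrument against the other. The function names are illustrative; the conversion factor is the standard definition of the psi in pascals.

```python
PA_PER_PSI = 6894.757   # pascals per pound-force per square inch
PA_PER_BAR = 100000.0   # pascals per bar


def psi_to_bar(psi):
    """Convert a pressure from psi to bar."""
    return psi * PA_PER_PSI / PA_PER_BAR


def bar_to_psi(bar):
    """Convert a pressure from bar to psi."""
    return bar * PA_PER_BAR / PA_PER_PSI


flow_psi = bar_to_psi(0.3)            # typical flow layer pressure
control_fill_bar = psi_to_bar(10)     # control line filling pressure
control_run_bar = (psi_to_bar(20), psi_to_bar(30))  # control operating range
```

In round numbers, 0.3 bar is about 4.4 psi, 10 psi is about 0.69 bar, and 20–30 psi is about 1.4–2.1 bar, so the control layer operating range falls within the 0.1–3 bar span of the gauge listed in Subheading 2.5.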

3.5 Device Operation

3.5.1 Filling Control Lines

1. Lower the control manifold pressure to ~10 psi.
2. Using the PC software, close all the control line valves.
3. Fill each Tygon line with deionised water (see Note 23) through the
connecting pin, using a syringe attached to a luer stub.
4. Connect each line to the appropriate control channel inlet.
5. Once all the lines are connected, open all valves. This pressurizes the
control channels, pushing air into the PDMS and allowing the channels to fill
with water. Wait until the channels are completely filled with water, which
can take up to 20 min. Slowly raise the pressure up to ~20–30 psi.
6. Visually inspect all the valves to check that they actuate fully.

3.5.2 Filling Flow Lines

1. Make sure the appropriate manual manifold valve is closed.
2. Basic reagents such as buffers and chemicals are held in ID 0.02″ Tygon
tubing. First, assemble the tubing, which consists of a length of Tygon, a
23 ga luer stub on one end and a connector pin on the other.
3. Attach a syringe to the luer stub and carefully draw up the required
reagent into the tubing. Make sure there are no bubbles.
4. Attach the connector pin to the appropriate flow inlet, before removing
the syringe and attaching the luer stub to the manual manifold.
5. Make sure valves are in the appropriate configuration on the chip before
opening the flow manifold valve, and allowing the reagent to fill into the
device. Typically, a pressure of ~0.3 bar is ideal for the flow lines.
6. For the cell-free extract, follow the previous steps, but instead draw up
the solution into the FEP coil through the PEEK tubing. Attach the PEEK
tubing directly into the chip.
7. An important requirement for long-term steady-state reactions is that the
cell-free extract is separated from energy and DNA solutions. If required,
cooling elements can be supplemented to further prevent degradation of the
solutions [16, 20].

3.5.3 Cell-Free Expression

1. The device can be characterized as shown in Fig. 3.
2. A typical experimental program is shown in Fig. 4. First switch on the
environmental chamber to 29 °C.
3. Load each reactor with cell-free extract, energy solution, and DNA in the
ratio 40%, 40%, and 20%, respectively.
4. The reactor contents are mixed by actuating the four multifunction valves
sequentially at a frequency of 20 Hz.
5. Dilution involves flowing cell-free extract, energy solution, and DNA into
the reactors in the ratio 8%, 8%, and 4%, respectively. This corresponds to a
20% dilution of the reactor which preserves the original reaction
composition.
6. The dilution rate can be varied by adjusting the interval between dilution
steps.
7. Image the resulting fluorescence using the microscope setup. Software for
the analysis, example images, and full documentation can be found online (see
Note 24).

[Figure 3 panel labels: (a) Loading with Solution A. (b) Dilution with Solution B through outlets 1–4, labeled 60%, 20%, 12%, and 4%. (c) YFP fluorescence [RFU] versus time [s]. (d) YFP fluorescence [RFU] versus cycle for rings 1–8. (e) Experimentally determined dilution % for rings 1–8. (f) Experimentally determined load % versus theoretical load % for chips 1–4; linear fit y = 1.01x − 0.26, R² = 0.9996.]

Fig. 3 Basic operations and characterization of the chip. (a) Initial loading is achieved by flowing an input
solution (solution A, green) first through one side of the reactor, then the other. (b) Dilution takes place by
flushing an input solution (solution B, yellow) through different outlets. The dilution fraction is controlled by the
geometric positioning of the outlets and is fixed for a given design. (c) After loading 20% of a reactor with YFP,
actuating the peristaltic pump at 20 Hz mixes the solution in ~100 s. (d) This shows the fluorescence from all
eight reactor rings, initially loaded with 20% YFP, and repeatedly diluted with buffer. (e) Experimentally
determined dilution fraction for each of the eight reactors. (f) Experimentally determined load fraction vs
theoretical load fraction for four different chips

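The relationship between dilution fraction, dilution interval, and steady-state level can be explored with a simple discrete model. The sketch below assumes a constant (zero-order) synthesis rate between dilution steps, which is a deliberate simplification and not a description of the actual cell-free kinetics; the 20% dilution fraction is taken from step 5 and the 15 min interval from the legend of Fig. 4. All names and the unit production rate are illustrative.

```python
import math


def simulate(production_per_min, dilution_fraction, interval_min, n_steps):
    """Concentration just before each dilution step, starting from zero.

    Assumes constant synthesis between dilutions and instantaneous dilution.
    """
    concentration = 0.0
    trace = []
    for _ in range(n_steps):
        concentration += production_per_min * interval_min  # synthesis phase
        trace.append(concentration)                          # pre-dilution value
        concentration *= 1.0 - dilution_fraction             # dilution step
    return trace


def effective_dilution_rate(dilution_fraction, interval_min):
    """Equivalent continuous dilution rate (per minute) of periodic dilution."""
    return -math.log(1.0 - dilution_fraction) / interval_min


# 20% dilution every 15 min, arbitrary production rate of 1 unit/min.
trace = simulate(1.0, 0.20, 15.0, 60)
plateau = 1.0 * 15.0 / 0.20               # analytical pre-dilution steady state
rate_per_min = effective_dilution_rate(0.20, 15.0)
residence_time_min = 1.0 / rate_per_min
```

In this model the pre-dilution level approaches production rate × interval / dilution fraction, so lengthening the interval raises the plateau proportionally. A 20% dilution every 15 min corresponds to an effective dilution rate of about 0.015 per minute, or a residence time of roughly 67 min.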

[Figure 4 panel labels: (a) Initial loading: Load A 40%, Load B 40%, Load C 20%. (b) Dilution step: Load A 8%, Load B 8%, Load C 4%. (c) Panels for Solutions A, B, and C; y-axis labels CFP [RFU], YFP [RFU], and DNA-cy5 [RFU]; x-axis Time [hours]; rings 1–8. (d) Tracer for Solution A (mCherry [RFU]) and steady-state expression (deGFP [RFU]); x-axis Time [hours]; rings 1–8.]

Fig. 4 Typical experimental operation of the chip. (a) The chip is initially loaded with three solutions A–C
(green, yellow, and blue) in the ratio 40%, 40%, and 20%, and (b) subsequently diluted with the same
solutions in the ratio 8%, 8%, and 4%. (c) Carrying out this process using aqueous solutions of three different
fluorescent tracers demonstrates that steady-state concentrations are maintained over many hours. (d)
Steady-state cell-free expression can be achieved by adding as the three solutions cell-free lysate (solution
A), energy solution (solution B), and DNA template (solution C). The lysate is labeled with an mCherry tracer to
assess its concentration (left), while the reaction produces deGFP, which reaches a steady-state concentration
when production and dilution rates are equal (right). Here, a dilution step was carried out every 15 min

4 Notes

1. A custom relay board is used to control the electric manifold actuation;
any appropriate controller can be used in its place, for instance the
24-channel USB24PRMx (EasyDAQ).
2. Excellent guidance is available, e.g., [21].
3. Designs for microfluidic devices are available online at
http://lbnc.epfl.ch/microfluidic_designs.html.
4. The 20 mm lens provides the highest write speed, taking ~4 min to write a
100 × 100 mm mask with 2 μm edge resolution, and 1 mm stripe width. Higher
resolutions are possible but not necessary for soft lithography.
5. This is the step most often done incorrectly. The flow layer uses
positive-resist AZ, and requires a DARK-mode mask. The control layer uses
negative-resist SU8 and requires a CLEAR-mode mask. Finally, as the exposure
is chrome-side-down, the masks must be MIRRORED AT Y.
6. HMDS priming enhances photoresist adhesion. Alternatively,
the wafer can also be treated with O2 plasma or thermally
dehydrated.
7. Pouring directly from the bottle introduces fewer bubbles than
using a plastic pipettor.
8. Opening the cap to the AZ9260 bottle to allow the release of
air bubbles a few minutes before use can also help minimize
bubbles.
9. Homogeneous rehydration is important for efficient exposure, and the
minimum rehydration time is a function of the photoresist thickness (5 μm,
8 min; 20 μm, 2 h).
10. The mercury lamp contains spectral lines at 365, 405, and 436 nm. On the
MJB4 machine, the i-line filter is installed which passes only the 365 nm
line. Without the filter, the exposure is broadband. The exposure mode must
be taken into account during exposure time calculations.
11. For 15 μm AZ9260, the recommended dose is 580 mJ/cm² for i-line exposure
and 660 mJ/cm² for broadband.
12. The recommended development time is around 45 s per μm of AZ9260.
13. Rounded features are crucial for the flow layer, as they allow valves to
close completely.
14. The variable here is the ramp time, which depends on the
specific type of SU8 used.
15. It is highly recommended to develop the wafer upside down.
Prepare two baths of PGMEA.
16. The precise timing here is important. The PDMS should set
sufficiently so that it is not too sticky, but not so much that the
resulting multilayer device does not bond together.
17. This step requires the most practice. Alignment should be
completed as quickly and precisely as possible to ensure optimal
bonding. Air bubbles are typically caused by buckling of the
PDMS layers, and can be removed by first ensuring the top
layer is completely flat, and then with gentle application of
pressure. Putting weights on top of the PDMS during
subsequent baking can also help.
18. This step is important as the presence of dust between the glass
and PDMS can compromise bonding or render the device
nonfunctional.
19. Plasma treatment converts methylsiloxane to siloxyl groups on
the PDMS surface, enabling its covalent cross-linking to silica-
containing glass. There is, however, an optimum amount of
treatment, as over-treating increases the surface roughness of
the PDMS and decreases the effective contact area [22].
20. The binding can be checked by putting the chip against a black
piece of paper. Regions which are not bound will show up as
bubble-like features.
21. In our experience, devices can still be functional after 6 months’
storage.
22. https://github.com/nadanai263/lbnc-cellfree2
23. Ideally, all Tygon control lines should have the same length and
the same amount of water. The larger the volume of water in
the line, the faster the pressure transfer and valve actuation, due
to the incompressibility of water; in practice, care must be
taken so the water does not get into the electric manifold, so
do not fill the lines fully. Finally, make certain that there are no
air bubbles where the line connects to the chip.
24. https://github.com/nadanai263/lbnc-cellfreeview
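The exposure and development guidance in Notes 10–12 reduces to simple arithmetic, which the following minimal Python sketch illustrates. It assumes that exposure time equals the recommended dose divided by the lamp intensity measured at the wafer; the function names and the 20 mW/cm² intensity in the usage example are hypothetical and must be replaced with values measured on the instrument actually used.

```python
# Recommended doses for 15 um AZ9260 (Note 11), in mJ/cm^2.
DOSE_MJ_PER_CM2 = {"i-line": 580.0, "broadband": 660.0}
# Recommended development time per micrometre of AZ9260 (Note 12), in seconds.
DEVELOPMENT_S_PER_UM = 45.0


def exposure_time_s(mode, intensity_mw_per_cm2):
    """Exposure time (s) = dose (mJ/cm^2) / lamp intensity (mW/cm^2)."""
    if intensity_mw_per_cm2 <= 0:
        raise ValueError("lamp intensity must be positive")
    return DOSE_MJ_PER_CM2[mode] / intensity_mw_per_cm2


def development_time_s(thickness_um):
    """Approximate development time (s) for a given resist thickness."""
    return DEVELOPMENT_S_PER_UM * thickness_um


# Hypothetical lamp intensity of 20 mW/cm^2 (an assumption, not a measured value).
print(exposure_time_s("i-line", 20.0))
print(development_time_s(15.0))
```

Keeping the exposure mode as an explicit argument makes it harder to mix up the i-line and broadband doses when the filter is changed.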
Steady-State Cell-Free Gene Expression 203

Acknowledgments

This work was supported by an HFSP Program Grant RGP0032/2015;
the European Research Council under the European
Union’s Horizon 2020 research and innovation program Grant
723106; and the École Polytechnique Fédérale de Lausanne.

References
1. Purnick P, Weiss R (2009) The second wave of synthetic biology: from modules to systems. Nat Rev Mol Cell Biol 10:410–422
2. Garenne D, Noireaux V (2019) Cell-free transcription-translation: engineering biology from the nanometer to the millimetre scale. Curr Opin Biotechnol 58:19–27
3. Takahashi MK et al (2015) Characterizing and prototyping genetic networks with cell-free transcription-translation reactions. Methods 86:60–72
4. Perez JG et al (2016) Cell-free synthetic biology: engineering beyond the cell. Cold Spring Harb Perspect Biol 8:a023853
5. de Maddalena LL et al (2016) GreA and GreB enhance expression of Escherichia coli RNA polymerase promoters in a reconstituted transcription-translation system. ACS Synth Biol 5:929–935
6. Niederholtmeyer H, Xu L, Maerkl SJ (2013) Real-time mRNA measurement during an in vitro transcription and translation using binary probes. ACS Synth Biol 2:411–417
7. Wick S et al (2019) PERSIA for direct fluorescence measurements of transcription, translation, and enzyme activity in cell-free systems. ACS Synth Biol 8:1010–1025
8. Kwon Y-C, Jewett MC (2015) High-throughput preparation methods of crude extract for robust cell-free protein synthesis. Sci Rep 5:8663
9. Sun ZZ et al (2013) Protocols for implementing an Escherichia coli based TX-TL cell-free expression system for synthetic biology. J Vis Exp 79:1–15
10. Lavickova B, Maerkl SJ (2019) A simple, robust, and low-cost method to produce the PURE cell-free system. ACS Synth Biol 8:455–462
11. Dubuc E et al (2019) Cell-free microcompartmentalised transcription-translation for the prototyping of synthetic communication networks. Curr Opin Biotechnol 58:72–80
12. Niederholtmeyer H et al (2015) Rapid cell-free forward engineering of novel genetic ring oscillators. elife 4:1–18
13. Hori Y et al (2017) Cell-free extract based optimization of biomolecular circuits with droplet microfluidics. Lab Chip 17:3037–3042
14. Chang J-C et al (2018) Microfluidic device for real-time formulation of reagents and their subsequent encapsulation into double emulsions. Sci Rep 8:8143
15. Spirin A et al (1988) A continuous cell-free translation system capable of producing polypeptides in high yield. Science 242:1162–1164
16. Niederholtmeyer H et al (2013) Implementation of cell-free biological networks at steady state. Proc Natl Acad Sci 110:15985–15990
17. Karzbrun E et al (2014) Programmable on-chip DNA compartments as artificial cells. Science 6198:829–832
18. Tayar A et al (2017) Synchrony and pattern formation of coupled genetic oscillators on a chip of artificial cells. Proc Natl Acad Sci 114:11609–11614
19. Rockel S, Geertz M, Maerkl SJ (2012) MITOMI: a microfluidic platform for in vitro characterization of transcription factor-DNA interaction. Methods Mol Biol 786:97–114
20. van der Linden A J et al (2019) A multilayer microfluidic platform for the conduction of prolonged cell-free gene expression. J Vis Exp 152:e59655
21. Ferry MS, Razinkov IA, Hasty J (2012) Microfluidics for synthetic biology: from design to execution. Methods Enzymol 497:295–372
22. Chau K et al (2011) Dependence of the quality of adhesion between poly(dimethylsiloxane) and glass surfaces on the composition of the oxidizing plasma. Microfluid Nanofluid 10:907–917
23. Swank Z, Laohakunakorn N, Maerkl SJ (2019) Cell-free gene-regulatory network engineering with synthetic transcription factors. Proc Natl Acad Sci U S A 116:5892–5901
Chapter 10

A Microfluidic/Microscopy-Based Platform for on-Chip
Controlled Gene Expression in Mammalian Cells
Mahmoud Khazim, Elisa Pedone, Lorena Postiglione, Diego di Bernardo,
and Lucia Marucci

Abstract
Applications of control engineering to mammalian cell biology have been recently implemented for precise
regulation of gene expression. In this chapter, we report the main experimental and computational
methodologies to implement automatic feedback control of gene expression in mammalian cells using a
microfluidics/microscopy platform.

Key words Feedback control, Mammalian cell, Microfluidics, Cell segmentation, PDMS, Control
algorithms

1 Introduction

In recent years, feedback control has been widely used for
controlling gene expression across cellular species and applications.
In-cell feedback is implemented within cells by means of gene
regulatory networks involving, for example, positive and negative
feedback loops. In contrast, in silico feedback control implements the
control action externally: cellular outputs are measured, usually by
microscopy using fluorescent proteins, and actuators deliver the
control inputs (e.g., inducer molecules) to the cells to minimize the
control error. Here, we report the main experimental and computational
methods we employed for external feedback control of gene
expression in mammalian cells using a microfluidics/microscopy
platform [1–3]. The PDMS microfluidic device we used [4] has
been optimized for long-term mammalian cell culturing, imaging,
and precise delivery of two media to cells; it consists of 33 individual
cuboid culture chambers adjoined to a main perfusion channel via a
wide opening on one side of each chamber. For a detailed description
of the device, please refer to [4]. The experimental protocols
described here would need to be adapted if using a different
microfluidics setup.

Mahmoud Khazim, Elisa Pedone, and Lorena Postiglione contributed equally to this work. Diego di Bernardo
and Lucia Marucci contributed equally to this work.

Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229,
https://doi.org/10.1007/978-1-0716-1032-9_10, © Springer Science+Business Media, LLC, part of Springer Nature 2021

2 Materials

2.1 Chip Fabrication

1. Master silicon wafer (Silicon Valley Microelectronics, USA).
2. Chlorotrimethylsilane (TMCS).
3. Aluminum foil.
4. Vacuum degassing chamber (Bel-Art).
5. Oven.
6. Sonicator (Camlab).
7. Acetone, methanol, isopropyl alcohol, and distilled water.
8. Pressurized nitrogen.
9. Polydimethylsiloxane (PDMS) Sylgard 184 Elastomer base
(Dow Corning).
10. 0.75-mm biopsy punch (World Precision Instruments;
504529).
11. Cover glasses (Hirschmann, 24 × 60 mm, thickness 0.13–0.17 mm).
12. O2 plasma asher (Diener Zepto).

2.2 Chip Loading

1. PDMS microfluidic chips.
2. Dissociation reagent.
3. Phosphate-buffered saline (PBS) solution.
4. 15-mL falcon tube.
5. Centrifuge.
6. 10-mL (Terumo, IVS10) and 2.5-mL syringes (Terumo,
IVS03).
7. 23-gauge needles (BD, 300800).
8. PTFE Tubing (06417-21_PTFE #24 AWG Thin Wall Tubing,
Cole-Parmer Inc.).
9. 22-gauge 90° bent metal pins (922050-90BTE, Metcal).
10. Vacuum aspirator.
11. Tissue culture microscope.
12. Loading buffer: knockout Dulbecco’s modified Eagle’s
medium (DMEM), 15% fetal bovine serum, 1× nonessential
amino acids, 1× GlutaMax, 1× 2-mercaptoethanol, 1000 U/mL
LIF.
Mammalian Cell Control 207

2.3 Microfluidic-Based Time-lapse

1. 10-mL (Terumo, IVS10) and 50-mL syringes (Terumo,
SS + 50 L1).
2. 23-gauge needles (BD, 300800).
3. 22-gauge metal pins (922050-90BTE, Metcal).
4. Y-junction (Ziggy’s tubes and wires, Inc. HSCY-23).
5. PTFE Tube (06417-21_PTFE #24 AWG Thin Wall Tubing,
Cole-Parmer Inc.).
6. Fluorescent dye (e.g., Atto488 or 467 dye from ThermoFisher,
Sulforhodamine from Sigma).

3 Methods

3.1 Fabrication of PDMS Replica Molding

For brevity, the protocol reported here assumes that a master mold
is available; therefore, only the steps to produce device replicas are
included. For master mold fabrication, please refer to the original
publication [4].

3.1.1 Silanization of Master Mold Wafer

Prior to replication of microfluidic devices, the master mold is
exposed to chlorotrimethylsilane vapors to passivate its surfaces
and prevent the PDMS from adhering to the master.
This process is not required for each replica fabrication, but it is
done prior to the first use of the master, and whenever it becomes
difficult to peel off the PDMS.
• Place the master silicon wafer in a vacuum degassing chamber
(Fig. 1a).
• Place 2–10 μL of the silanization agent in an open Eppendorf
tube, stood in an aluminum foil cap, adjacent to the master, and
apply the vacuum for 15–30 min to allow the silane to form a
monolayer on the surface of the master.

3.1.2 PDMS Microfluidic Device Preparation

PDMS base is mixed with curing agent and poured onto a silicon
master mold wafer. The mixture is degassed and then cured. The
cured PDMS is peeled off and autoclaved, and then ports are
punched using a biopsy punch.
• Prepare PDMS by mixing Sylgard 184 Elastomer base and curing
agent in a 10:1 ratio. Mix the base and curing agent well
using a lab spatula. The total amount of PDMS depends on the
dimensions of the mold and dish. For a 4-in. wafer and petri dish,
use 50 g of PDMS/curing agent in a 10:1 ratio (approximately
45.5 g of PDMS base and 4.5 g of curing agent).

Fig. 1 Steps and equipment for the fabrication of microfluidic devices by PDMS replica molding. (a)
Silanization and surface treatment of the master by placing it in a degassing chamber with TMCS. (b) The
master is placed in a glass petri dish covered with aluminum foil. (c) PDMS and curing agent are mixed,
poured over the master, and degassed. (d) After curing, the PDMS replicas are cut out and ports are made
using a reusable biopsy punch. (e) The punched devices are sonicated in isopropyl alcohol followed by
sonication in H2O. (f) The cleaned replica is bonded to glass coverslips using a plasma asher

• Place the master mold into a glass petri dish of similar area,
covered with aluminum foil (Fig. 1b).
• Pour the PDMS mix onto the master mold and degas in a vacuum
degassing chamber for 30 min or until all bubbles have been
removed (Fig. 1c).
• Place the master mold with PDMS in an oven to cure for 1 h at
80 °C.
• Gently peel the cured PDMS off, starting from one end of the
mold, and release it from the master mold.
• Autoclave the cured PDMS for 30 min at 121 °C in an autoclavable
paper bag to ensure long-term viability of cells in the device.
• Using a 0.75-mm biopsy punch, punch the fluidic ports for
access of cells and media (Fig. 1d) (see Note 1).
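The 10:1 mixing step can be worked out for any batch size with a short calculation. The Python sketch below assumes that the ratio is taken by weight, which the gram quantities in the protocol suggest; the function name is illustrative. For a 50 g batch, an exact 10:1 split comes to about 45.5 g of base and 4.5 g of curing agent.

```python
def pdms_masses(total_g, ratio=10.0):
    """Split a total PDMS mass into base and curing agent for a ratio:1 mix.

    Assumes the ratio is by weight (base : curing agent).
    """
    if total_g <= 0 or ratio <= 0:
        raise ValueError("total mass and ratio must be positive")
    curing_g = total_g / (ratio + 1.0)
    base_g = total_g - curing_g
    return base_g, curing_g


base_g, curing_g = pdms_masses(50.0)
print(f"base: {base_g:.1f} g, curing agent: {curing_g:.1f} g")
```

The same function can be reused when scaling the batch to a different wafer or dish size.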

3.1.3 Cleaning and Bonding of PDMS Chips to Glass Coverslips

Punched PDMS devices are sonicated to dislodge PDMS shavings
from the ports. Coverslips are cleaned and dried. Finally, the PDMS
devices and coverslips are treated in a plasma asher and bonded by
bringing the treated surfaces into contact; baking overnight is
optional and strengthens the bond between the device and coverslip.
• Place punched PDMS devices in isopropyl alcohol and sonicate
for 10 min (Fig. 1e).
• Sonicate in distilled water for 10 min (Fig. 1e).
• Dry using pressurized nitrogen.
• For each molded, punched PDMS device, clean a thin
24 × 60 mm cover glass in acetone, methanol, isopropyl alcohol,
and distilled water and then dry with pressurized nitrogen.
• Expose the PDMS devices, with layers facing up, and cover
glasses to oxygen plasma in an O2 plasma asher, at 50–70%
power for 2 min (Fig. 1f).
• Bring the PDMS device into contact with the cover glass with
layers facing down to form a strong irreversible bond between
surfaces.
• Use a microscope to check for any faults.
• Optional: bake the bonded devices at 90 °C overnight.

3.2 Chip Loading

3.2.1 Pins Preparation and Wetting of the Device

Prior to trapping the cells in the microfluidic device chambers via
on-chip vacuum, the device needs to be prewet such that device
channels are filled with fluid while the culture chambers remain
filled with air. Also, pins need to be prepared. To release the pins
from the syringe adaptors (Fig. 2a), incubate the pins with isopropyl
alcohol for 24–48 h (Fig. 2b). The pins can be used to connect
fluidic lines to the microfluidic chip (Fig. 3) for all future
experiments.

Fig. 2 Release of metallic pins from adaptors. Metallic pins before (a) and after
(b) 24/48-h isopropyl alcohol (isopropanol) incubation

Fig. 3 Microfluidic device wetting, cell loading, and preculture. (a) Microfluidic device, which has been bonded
to a glass coverslip. Media is flushed through the microfluidic channels starting from port 5 (b) followed by
filling through port 2 (c). (d) The wetted device is fastened onto a lab microscope, and a vacuum pump is
attached to ports 3 and 4. Cells are pushed through port 1 and loaded via the vacuum into the cell chambers.
(e) The loaded microfluidic device for preculture has ports 2, 6, and 7 plugged using a pin with a short length of
PTFE tubing tied at the end to stop fluidic flow through the ports (*). A 10-mL syringe with media is attached to
port 1 for overnight perfusion (blue arrow) and media flows out of port 5 (red arrow). (f) A 10-mL syringe with
media is fastened onto a makeshift rig and attached to port 1

• Connect a 2.5-mL syringe, via an attached 23-gauge sterile
needle, to a 10-cm section of PTFE #24 AWG tubing with a
90° bent metal pin connected at the end.
• Aspirate a 1-mL volume of media into the syringe and connect
the pin to port 5 of the microfluidic device (Fig. 3b). Apply gentle
pressure until fluid fills all main device ports to the top
(ports 6, 7, and 2).
• Remove pin and tubing from port 5 and wet connect (fluid
droplet from tubing is applied to the port prior to connecting
to ensure bubbles do not enter the device) to port 2 (Fig. 3c).
Apply very gentle pressure to the syringe until port 1 is filled to
the top.

3.2.2 Shear-Free Cell Loading Via on-Chip Vacuum

Once the chip has been wetted, the chip is attached to a vacuum
pump and cells are loaded into the chip. Cells are visually monitored
via laboratory microscope while being vacuum-loaded. Cells
that remain in the main device channels are flushed out using fresh
media, while cells in the culture chambers are shielded and resistant
to convective flow.
• Connect the on-chip vacuum to ports 3 and 4 and fasten the
chip to a laboratory microscope to enable monitoring of cell
loading (Fig. 3d).
• Wash mammalian cells (previously kept in complete media
and maintained in a tissue culture incubator at 37 °C and 5%
CO2) with PBS; detach cells from the culture dish and place into a
centrifuge tube.
• Centrifuge cells to form a pellet and resuspend in complete
media, at a density of 2 × 10⁶ cells per 100 μL of media. If
cells are too concentrated, dilute with extra media.
• Aspirate the cell suspension into a fresh 2.5-mL syringe via
needle-attached tubing and metal pin, and wet connect to port
1 of the device.
• Gently apply pressure to the syringe with cell suspension until cells
are visible in the main perfusion channel upon inspection via a
tissue culture microscope.
• Once the presence of cells in the main channel is confirmed, stop
the flow by releasing syringe pressure until cells are apparent at
the entrance of the chip chambers.
• To begin cell loading, turn the vacuum on and visually monitor
cells entering the chambers; gently tapping the tube near the port
with a finger can improve cell loading, as it prevents cells from
getting stuck on the walls of the device.
• Once loaded, turn vacuum off and disconnect the vacuum ports.
• Use a new syringe and tubing with fresh media to flush out
untrapped cells from the ports by wet connecting to port
1 and applying a gentle pressure through the main channel,
out of the remaining ports of the device. Care needs to be
taken so that the fluid flow through the device is not too strong,
as this can cause cells properly trapped in the chambers to be
washed out.
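The resuspension step above fixes a target density of 2 × 10⁶ cells per 100 μL, so the required volume follows directly from the cell count. The Python sketch below is a minimal illustration; the function names are hypothetical, and the cell count is assumed to come from a separate counting step that the protocol does not describe.

```python
# Target loading density from the protocol: 2 x 10^6 cells per 100 uL.
TARGET_CELLS_PER_UL = 2e6 / 100.0


def resuspension_volume_ul(total_cells):
    """Volume of complete media (uL) that gives the target density."""
    if total_cells < 0:
        raise ValueError("cell count cannot be negative")
    return total_cells / TARGET_CELLS_PER_UL


def extra_media_ul(total_cells, current_volume_ul):
    """Media to add (uL) if the suspension is too concentrated, else 0."""
    return max(0.0, resuspension_volume_ul(total_cells) - current_volume_ul)


# Example: 5 x 10^6 counted cells, currently resuspended in 100 uL.
print(resuspension_volume_ul(5e6))
print(extra_media_ul(5e6, 100.0))
```

If the suspension is already at or below the target density, the second function returns zero rather than a negative volume.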

3.2.3 Preculture of Cells in the Microfluidic Device

The device with cells in the culture chambers can now be precultured
in the incubator, overnight and up to 48 h, to allow cells to
attach and proliferate inside the device prior to undertaking control
experiments on the microscope.
• Plug ports 2, 6, and 7 by using 90° bent metal pins with a small
amount of tubing which can be tied at the end to stop the flow
(Fig. 3e).
• Fill a 10-mL syringe with up to 5 mL of culture media, mount it
on a makeshift rig, and attach it by needle, tubing, and pin to port
1 (Fig. 3f). The hydrostatic pressure difference between the
syringe fluid and the opening in port 5 allows media to flow
through the device, establishing a slow perfusion flow.
• Place rig with culture media and microfluidic device into an
incubator and culture overnight.

3.3 Microfluidics/Microscopy-Based Time-lapse

3.3.1 Tubes

• Measure and cut three sections of PTFE #24 AWG tubing for
collecting the waste media and two sections of tubing for
time-controlled delivery of culture media to cells. The length of
the tubing is around 120 cm for the output (waste) and 200 cm
for the input (delivered) media.
• Connect the Y-junction to a short tube of about 20 cm and two
of the 120-cm waste output tubes (Fig. 4).
• Connect the short section of tubing (Fig. 4a) to a 23-gauge
needle and the longer sections of tubing to metal pins
(Fig. 4b, c).
• Similarly, connect one side of the remaining tubes (one output
and two inputs) to a 23-gauge needle on one end and metal pins
on the other end.
• Attach each fluidic line to a 50-mL syringe and slowly fill them
with 12 mL of culture media. Note that this amount of media is
enough for experiments <72 h; for longer experiments, more
media is required.

Fig. 4 Junction and tube connections. Side a is connected to the 50-mL waste
syringe placed at 23 cm from the stage, whereas b and c are connected to two
120-cm output tubes. Two metallic pins are used to attach the tubes to the chip
and collect the waste media from ports 1 and 2

Fig. 5 Motor-controlled actuators; syringes containing media are attached to the
carriage of each actuator

• Place the two waste syringes at 23 and 46 cm above microscope
stage and secure them to the wall with adhesive tape. Place and
secure the syringes containing the media to be automatically
delivered to the chip on the motor-controlled tracks of the
actuator (Fig. 5) (see Note 2).
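Because flow in this setup is driven by the height of each syringe, it can help to convert heights into hydrostatic pressure using ΔP = ρgh. The Python sketch below is a rough estimate only: it assumes the media has the density of water (1000 kg/m³, an assumption) and ignores the fluidic resistance of tubing and channels, so it gives pressure heads and not flow rates. The 60 cm inlet height in the usage example is hypothetical.

```python
RHO_KG_PER_M3 = 1000.0  # assumed density of culture media (water-like)
G_M_PER_S2 = 9.81       # standard gravitational acceleration


def hydrostatic_pressure_pa(height_cm):
    """Pressure head (Pa) of a fluid column `height_cm` above the stage."""
    return RHO_KG_PER_M3 * G_M_PER_S2 * height_cm / 100.0


def driving_pressure_pa(inlet_height_cm, outlet_height_cm):
    """Net pressure (Pa) pushing media from an inlet syringe to a waste syringe."""
    return hydrostatic_pressure_pa(inlet_height_cm) - hydrostatic_pressure_pa(outlet_height_cm)


# Waste syringes at 23 cm and 46 cm above the stage, as in the protocol.
for h in (23.0, 46.0):
    print(f"{h:.0f} cm -> {hydrostatic_pressure_pa(h):.1f} Pa")

# Hypothetical inlet syringe at 60 cm feeding towards the 23 cm waste syringe.
print(f"{driving_pressure_pa(60.0, 23.0):.1f} Pa")
```

A positive driving pressure means media flows from the inlet towards that waste line; raising or lowering a syringe changes the sign and magnitude accordingly.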

3.3.2 Chip Positioning

Check the chip for the presence of obstructions that might impair
correct media flow through the microfluidic channels. If needed,
the chip can be flushed using a short section of #24 PTFE tubing
connected to a 10-mL syringe with fresh media.
Handle the following steps very carefully to avoid damaging the
chip and support (see Fig. 6).
• Connect each fluidic line to the chip and corresponding port,
starting from the waste ports. The tubes from the syringe placed
at 23 cm from the stage go to ports 1 and 2, while the one at
46 cm from the stage connects to port 5. Finally, connect the
syringes for media delivery to ports 6 and 7.
• Center the microfluidic chip, loaded with cells, on the microscope
stage and secure with adhesive tape.
• Check that the CO2 valve is open and fasten the atmospheric
chamber correctly over the chip.

Fig. 6 Positioning of microfluidic device on the microscope stage; CO2 chamber
and microfluidic line connections are indicated

3.3.3 Actuation System

The actuation system consists of two motor-controlled syringes
(containing media) mounted on linear actuators connected to
ports 6 and 7. Custom scripts in MATLAB need to be written to
implement online cell segmentation and control algorithms (see
Subheadings 3.4 and 3.5); the latter have to be coupled to the
software for automatic syringe movement and actuation.
• Launch the software for syringe calibration.
• Calibrate the actuation system using the dedicated software and
move the syringes to the desired positions. The syringe with
media to be delivered to the cells should be at the highest
position.

3.3.4 Microscope Specs

The settings in this section refer to the use [2, 3] of a Leica DMi8
inverted microscope equipped with an Andor iXon Ultra 897
back-illuminated EMCCD digital camera (512 × 512 pixels of 16 μm,
16 bit, 56 fps at full frame) and an environmental control chamber
(PeCon) for temperature and CO2 control. Equivalent microscopes
can be used, as long as the following are present:
• Digital camera for image acquisition.
• Environmental control chamber (PeCon) for long-term temperature
control and CO2 enrichment.
• Adaptive Focus Control (AFC) option to ensure that the focus is
maintained during the entire duration of the experiment.
• 20–40× objective.

3.3.5 Time-lapse Settings

• Turn on the microscope following this order: stage, fluorescent
lamp, and microscope.
• Launch the microscope software and set up the time-lapse by:
– Choosing an appropriate laser power and adequate exposure
settings to minimize phototoxicity.
– Selecting and saving positions to be imaged.
– Setting the duration of the time-lapse and the sampling time
(see Notes 3 and 4).

3.4 Computational Algorithms

3.4.1 Cell Segmentation

For mammalian cell segmentation, the property of cells exhibiting a
white halo in phase contrast images can be exploited [1–3]. The
main steps for cell segmentation and fluorescence quantification are
as follows:
1. Defining a threshold to generate a first binary image selecting
only pixels belonging to cell edges.
2. Obtaining a second binary image (mask) in which the cell area is
overestimated by using dilation and filling operators.
3. Subtracting from the mask obtained at point (2) the mask
obtained at point (1) in order to derive a binary image that
selects the portion of the original image covered by cells.
4. Applying the mask, obtained at point (3), to the fluorescent field
image. In order to calculate the average fluorescence intensity of
pixels belonging to cells, the summed intensity of the masked
pixels is divided by the area of the mask.
5. Subtracting the background signal (measured in a cell-free portion
of the chamber) from the value of cell fluorescence signal.
Other segmentation algorithms might be used, given different
cell morphologies and/or microscope used.
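The segmentation steps listed above can be sketched in a few lines of code. The example below is a simplified, pure-Python illustration (standard library only) and not the authors' MATLAB implementation: images are plain nested lists, dilation uses a fixed 3 × 3 neighborhood, and hole filling is done by flood-filling the background from the image border. The function names, the threshold, and the synthetic 7 × 7 test image are all assumptions made for illustration.

```python
from collections import deque


def threshold(image, level):
    """Step 1: binary image of pixels brighter than `level` (the halo)."""
    return [[1 if v > level else 0 for v in row] for row in image]


def dilate(mask):
    """3x3 binary dilation: grow every foreground pixel by one pixel."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                for rr in range(max(0, r - 1), min(h, r + 2)):
                    for cc in range(max(0, c - 1), min(w, c + 2)):
                        out[rr][cc] = 1
    return out


def fill_holes(mask):
    """Fill background regions that are not connected to the image border."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    for r in range(h):
        for c in range(w):
            on_border = r in (0, h - 1) or c in (0, w - 1)
            if on_border and not mask[r][c]:
                outside[r][c] = True
                queue.append((r, c))
    while queue:  # flood fill of the outside background (4-connectivity)
        r, c = queue.popleft()
        for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= rr < h and 0 <= cc < w and not mask[rr][cc] and not outside[rr][cc]:
                outside[rr][cc] = True
                queue.append((rr, cc))
    return [[0 if outside[r][c] else 1 for c in range(w)] for r in range(h)]


def mean_cell_fluorescence(phase, fluo, halo_level, background):
    """Background-corrected mean fluorescence of the segmented cell area."""
    edges = threshold(phase, halo_level)                      # step 1
    overestimate = fill_holes(dilate(edges))                  # step 2
    cells = [[o == 1 and e == 0 for o, e in zip(orow, erow)]  # step 3
             for orow, erow in zip(overestimate, edges)]
    values = [v for frow, crow in zip(fluo, cells)
              for v, inside in zip(frow, crow) if inside]
    if not values:
        raise ValueError("no cell pixels found")
    return sum(values) / len(values) - background             # steps 4 and 5


# Synthetic 7x7 example: one bright halo ring around the centre pixel.
phase = [[50] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        if (r, c) != (3, 3):
            phase[r][c] = 200
fluo = [[30 if 1 <= r <= 5 and 1 <= c <= 5 else 10 for c in range(7)]
        for r in range(7)]
signal = mean_cell_fluorescence(phase, fluo, halo_level=100, background=10)
print(signal)
```

For real images, an array library with morphological operators would be far faster; the nested-list version is kept here only so that each step of the procedure stays visible.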

3.5 Feedback Control Algorithms

3.5.1 Relay Controller

The Relay control strategy can be expressed as follows:

\[
u(t) =
\begin{cases}
u_{\max} & \text{if } e(t) > 0 \\
u_{\min} & \text{if } e(t) < 0
\end{cases}
\]

where the control error $e(t) = r(t) - y(t)$ is the difference between
the reference signal $r$ and the system output $y$; $u$ is the control
input. Usually, in controlling biological systems, the system output
is a fluorescent protein, while the input is represented by inducer
molecule(s), provided by the actuators (see Subheading 3.3.3). This
control strategy, although simple, succeeds in keeping the
system output close to the desired reference. Typically, the controlled
variable oscillates around the reference; this is acceptable if
the oscillations are sufficiently small [1, 5].
In the Relay control strategy, when the output value is very
close to the reference, the control error can rapidly change sign,
thus causing the control input to continuously switch (chattering
phenomenon [6]); this can be reduced by adding hysteresis $\varepsilon$ to the
controller, modifying the control law as follows:

\[
u(t) =
\begin{cases}
u_{\max} & \text{if } e(t) \geq \varepsilon \\
u_{\min} & \text{if } e(t) < -\varepsilon
\end{cases}
\]

so that the input switches only once the error leaves the band
$[-\varepsilon, \varepsilon)$. The drawback of this controller is that the
amplitude of the oscillations around the set-point increases. For
examples of Relay implementation in mammalian cells, see [1–3].
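The difference between the plain relay and the relay with hysteresis is easiest to see on a short error sequence that hovers around zero. The Python sketch below is illustrative only: it assumes a symmetric band in which the input switches to u_max when e ≥ ε, to u_min when e < −ε, and otherwise keeps its previous value, and it arbitrarily returns u_min when the error is exactly zero in the plain relay. Function names and numbers are hypothetical.

```python
def relay(error, u_min, u_max):
    """Plain relay: u_max for positive error, u_min otherwise."""
    return u_max if error > 0 else u_min


def relay_hysteresis(error, previous_u, u_min, u_max, eps):
    """Relay with a dead band: hold the previous input while |error| is small."""
    if error >= eps:
        return u_max
    if error < -eps:
        return u_min
    return previous_u


# An error signal that flips sign at every sample (chattering conditions).
errors = [0.05, -0.05, 0.05, -0.05]

plain = [relay(e, 0, 1) for e in errors]

with_band = []
u = 1  # assumed initial input
for e in errors:
    u = relay_hysteresis(e, u, 0, 1, eps=0.1)
    with_band.append(u)

print(plain)      # switches at every sample
print(with_band)  # input is held constant inside the band
```

Widening eps suppresses more switching but lets the output drift further from the reference before the input changes, which is the trade-off noted in the text.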

3.5.2 Proportional-Integral (PI) Controller

• The PI output $\hat{u}(t)$ is a function of the control error $e(t)$, and it is
defined as

\[
\hat{u}(t) = k_p\, e(t) + k_i \int_0^t e(\tau)\, d\tau .
\]

• For a control system with a wide range of operating conditions,
it may happen that the control action reaches the actuator limits;
if an integral action is used, the error will continue to be
integrated, meaning that the integral term and the control output
may become very large. The control signal will then remain
saturated even when the error changes, and it may take a long
time before the integrator and the controller output come inside
the saturation range (integrator windup [5]). An anti-windup
compensation scheme can be added to the PI controller to prevent
the windup phenomenon and keep the control input $\hat{u}$ from
becoming too large (refer to [5] for examples of anti-windup schemes).
• The proportional and integral gains of the PI controller ($k_p$ and
$k_i$, respectively) have to be tuned using a dynamical model of the
system to be controlled (e.g., following the Ziegler and Nichols
method [5]).
• If the biological system under investigation can be fed with or
without the inducer molecule in a mutually exclusive manner,
the continuous signal $\hat{u}(t)$ has to be decoded in a discrete way.
The control technique to satisfy the above constraint is to couple
the PI regulator with a PWM (pulse-width modulator). Specifically,
at each sampling time $kT$, the PWM algorithm calculates
the duty cycle of the input, $d_k = \hat{u}/T$, as the ratio between the
control input $\hat{u}$ and the sampling time $T$. The input $u(t)$ is
computed as follows:

\[
u(t) =
\begin{cases}
u_{\max} & \text{if } kT \leq t < (k + d_k)T \\
u_{\min} & \text{if } (k + d_k)T \leq t < (k + 1)T
\end{cases}
\]

The PI-PWM controller was used, for example, in [1] to control
gene expression from the tetracycline-inducible promoter in
CHO cells. The first advantage of the PI controller is that, besides its
very simple implementation, it guarantees zero steady-state error
for constant reference and the rejection of constant disturbances at
steady state. Moreover, the PI controller does not require a model
of the controlled system, although an idea of its dynamics is necessary
for gain tuning. On the other hand, a PI controller does not
achieve a satisfactory performance for tracking time-varying references
unless the reference dynamics are much slower than the
closed-loop system dynamics.
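A discrete-time version of the PI-PWM scheme can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in [1]: the integral is approximated by a rectangular sum over sampling periods, the PI output is interpreted as a pulse duration in the same time units as T, the duty cycle is clipped to [0, 1], and anti-windup is done by conditional integration (the integral is frozen while the output is saturated and the error would push it further into saturation), which is only one of several schemes discussed in the control literature. Class and function names are hypothetical.

```python
class PIPWMController:
    """PI controller whose output is converted into a PWM duty cycle."""

    def __init__(self, kp, ki, T):
        self.kp, self.ki, self.T = kp, ki, T
        self.integral = 0.0  # running (rectangular) integral of the error

    def duty_cycle(self, error):
        """Return d_k in [0, 1] for the next sampling period."""
        candidate = self.integral + error * self.T
        u_hat = self.kp * error + self.ki * candidate
        d = u_hat / self.T
        d_clipped = min(1.0, max(0.0, d))
        # Anti-windup by conditional integration: skip the integral update
        # when the output is saturated and the error pushes further out.
        winding_up = (d > 1.0 and error > 0) or (d < 0.0 and error < 0)
        if not winding_up:
            self.integral = candidate
        return d_clipped


def pwm_input(t, k, d_k, T, u_min, u_max):
    """Input during sampling period k: u_max for the first d_k*T, then u_min."""
    if not (k * T <= t < (k + 1) * T):
        raise ValueError("t is outside sampling period k")
    return u_max if t < (k + d_k) * T else u_min


controller = PIPWMController(kp=0.0, ki=1.0, T=1.0)
duties = [controller.duty_cycle(e) for e in (0.25, 0.25, 5.0, -0.25)]
print(duties)
```

In the example, the large error of 5.0 saturates the duty cycle at 1 without being added to the integral, so the controller recovers immediately once the error changes sign.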

3.5.3 Model Predictive Control (MPC)

Model predictive control (MPC) is a well-established technique for
controlling multivariable systems subject to constraints. Applications
of MPC to regulate gene expression and signaling pathway
activity in mammalian cells are reported in [2].
• Given a desired control reference, MPC aims at finding the
optimal control input to minimize the difference between the
target value and the measured value, by means of a dynamical
model of the system being controlled and a cost function.
• To speed up computation, a discretized version of the dynamical
models describing the biological system is used, assuming that
the input is piece-wise constant during the sampling period
$T$ (zero-order hold method):

\[
\begin{aligned}
x_{k+1} &= A x_k + B u_k \\
y_k &= C x_k
\end{aligned}
\]

where, for example, in the case of a three-state system with
one input, $x_k = \big(x_1(kT),\, x_2(kT),\, x_3(kT)\big)^{\top}$ are the system
states, $u_k = u(kT)$ is the control input, and $y_k = y(kT)$ is the
system output, with $k$ being a natural number ($k \in \{1, 2, \ldots\}$).
• Starting from the experimental data, at each sampling time $kT$, the
MPC controller uses the discrete model to predict the dynamic
behavior of the system to be controlled over a defined prediction
horizon and to determine the input such that an open-loop
objective function is minimized [7]. An example of cost function
to be minimized is the sum of squared control errors (SSE), defined as
follows:

\[
\mathrm{SSE}_k = \sum_{i=k+1}^{k+N} (N + 1 + k - i)\, \varepsilon_i^2
\]

where $N$ defines the length of the prediction horizon in terms of
sampling intervals, $\varepsilon_i$ is the control error at sample $i$, and
$(N + 1 + k - i)$ is a weighting factor that weights the control error
samples at the beginning of the prediction horizon more than those
at the end.
The MPC strategy requires a mathematical model of the process
being controlled to calculate the control input. Note that these
models are used only to synthesize the controllers and not to
estimate biological parameters quantitatively; thus, the uniqueness
of the identified parameters is not ensured but only the models’
ability to predict the system output given the input.
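To make the receding-horizon idea concrete, the Python sketch below predicts the output of a discrete linear model over a horizon of N samples, scores each candidate input sequence with the weighted SSE cost given above, and returns the first input of the best sequence. It is a toy illustration and not the optimizer used in [2]: it assumes the input can only take two levels (inducer off or on), searches all 2^N sequences by brute force, and uses a constant reference. Function names and the one-state model in the usage example are hypothetical.

```python
from itertools import product


def step(A, B, x, u):
    """One step of x_{k+1} = A x_k + B u_k (B is a column given as a list)."""
    return [sum(a * xj for a, xj in zip(row, x)) + b * u
            for row, b in zip(A, B)]


def output(C, x):
    """y_k = C x_k for a single output (C is a row given as a list)."""
    return sum(c * xj for c, xj in zip(C, x))


def sse(A, B, C, x0, inputs, reference):
    """Weighted sum of squared errors over the prediction horizon."""
    N = len(inputs)
    x, cost = list(x0), 0.0
    for j, u in enumerate(inputs, start=1):   # j = i - k
        x = step(A, B, x, u)
        err = reference - output(C, x)
        cost += (N + 1 - j) * err ** 2        # weight N + 1 + k - i
    return cost


def mpc_first_input(A, B, C, x0, reference, N, levels=(0.0, 1.0)):
    """Brute-force search over all input sequences; apply only the first input."""
    best = min(product(levels, repeat=N),
               key=lambda seq: sse(A, B, C, x0, seq, reference))
    return best[0]


# Hypothetical one-state model: x_{k+1} = 0.5 x_k + 0.5 u_k, y_k = x_k.
A, B, C = [[0.5]], [0.5], [1.0]
print(mpc_first_input(A, B, C, x0=[0.0], reference=1.0, N=3))
print(mpc_first_input(A, B, C, x0=[2.0], reference=0.0, N=3))
```

At every sampling time the state estimate is refreshed from the measurement and the optimization is repeated, so only the first element of each optimal sequence is ever applied.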

• Use the microfluidics platform in open-loop to measure input/output
time-series data. For example, deliver an input to the cells
in the microfluidic device as a series of pulses of inducer molecule
with variable duration but fixed amplitude (square waves)
and measure the mean fluorescence in the cell population, which
we considered as the output of the system [8].
• If the biological processes being controlled are fluorescent proteins
driven by inducible promoters, you can assume that their
dynamics can be well approximated by linear state-space
models.
• Derive the dynamical model from the input–output data by
using black-box or gray-box identification approaches [2, 9].
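As a minimal illustration of black-box identification from open-loop data, the Python sketch below fits a first-order discrete model y[k+1] = a·y[k] + b·u[k] by ordinary least squares. This is a deliberately simple stand-in for the identification approaches cited in [2, 9], which are not reproduced here; the model order, the function names, and the noise-free synthetic pulse data are all assumptions.

```python
def simulate(a, b, u, y0=0.0):
    """Simulate y[k+1] = a*y[k] + b*u[k]; returns len(u) + 1 output samples."""
    y = [y0]
    for uk in u:
        y.append(a * y[-1] + b * uk)
    return y


def fit_first_order(u, y):
    """Least-squares estimate of (a, b) in y[k+1] = a*y[k] + b*u[k]."""
    if len(y) != len(u) + 1:
        raise ValueError("expected one more output sample than input samples")
    y_now, y_next = y[:-1], y[1:]
    syy = sum(v * v for v in y_now)
    suu = sum(v * v for v in u)
    syu = sum(p * q for p, q in zip(y_now, u))
    s1 = sum(p * q for p, q in zip(y_next, y_now))
    s2 = sum(p * q for p, q in zip(y_next, u))
    det = syy * suu - syu * syu          # determinant of the normal equations
    if abs(det) < 1e-12:
        raise ValueError("input is not informative enough to fit the model")
    a = (s1 * suu - s2 * syu) / det
    b = (s2 * syy - s1 * syu) / det
    return a, b


# Pulses of fixed amplitude and variable duration, as suggested in the text.
u_pulses = [1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0]
y_measured = simulate(0.8, 0.2, u_pulses)   # synthetic, noise-free "data"
a_hat, b_hat = fit_first_order(u_pulses, y_measured)
print(round(a_hat, 6), round(b_hat, 6))
```

With real fluorescence data the fit will not be exact, and the quality of the model should be judged on its ability to predict held-out input/output data, in line with the remark that only predictive ability matters for controller synthesis.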

4 Notes

1. Chip fabrication procedure is best performed in a clean room
environment to avoid the inclusion of impurities in the chip.
2. To monitor correct media flow and measure which input is
delivered to the cells, add a fluorescent dye to one of the two
syringes with media.
3. To check that the chip is perfused correctly, we suggest also
imaging the DAW junction of the chip, where the media coming
from the two actuation syringes mix.
4. The microscope settings need to be adjusted depending on the
resolution of the camera and the brightness of each
fluorescent tag.

Acknowledgments

This work was funded by Medical Research Council grant
MR/N021444/1 to L.M., by the Engineering and Physical
Sciences Research Council grants EP/R041695/1 and
EP/S01876X/1 to L.M., and by BrisSynBio, a BBSRC/EPSRC
Synthetic Biology Research Centre (BB/L01386X/1) to L.M.

References
1. Fracassi C, Postiglione L, Fiore G, di Bernardo D (2016) Automatic control of gene expression in mammalian cells. ACS Synth Biol 5(4):296–302. https://doi.org/10.1021/acssynbio.5b00141
2. Postiglione L, Napolitano S, Pedone E, Rocca DL, Aulicino F, Santorelli M, Tumaini B, Marucci L, di Bernardo D (2018) Regulation of gene expression and signaling pathway activity in mammalian cells by automated microfluidics feedback control. ACS Synth Biol 7(11):2558–2565. https://doi.org/10.1021/acssynbio.8b00235
3. Pedone E, Postiglione L, Aulicino F, Rocca DL, Montes-Olivas S, Khazim M, di Bernardo D, Pia Cosma M, Marucci L (2019) A tunable dual-input system for on-demand dynamic gene expression regulation. Nat Commun 10(1):4481. https://doi.org/10.1038/s41467-019-12329-9
4. Kolnik M, Tsimring LS, Hasty J (2012) Vacuum-assisted cell loading enables shear-free mammalian microfluidic culture. Lab Chip 12(22):4732–4737. https://doi.org/10.1039/c2lc40569e
5. Astrom KJ, Murray RM (2010) Feedback systems: an introduction for scientists and engineers. Princeton University Press
6. Utkin V, Lee H (2006) Chattering problem in sliding mode control systems. Paper presented at the international workshop on variable structure systems, Alghero, Sardinia, Italy
7. Morari M, Lee JH (1999) Model predictive control: past, present and future. Comput Chem Eng 23(4):667–682. https://doi.org/10.1016/S0098-1354(98)00301-9
8. Fiore G, Menolascina F, di Bernardo M, di Bernardo D (2013) An experimental approach to identify dynamical models of transcriptional regulation in living cells. Chaos 23(2):025106. https://doi.org/10.1063/1.4808247
9. Menolascina F, Fiore G, Orabona E, De Stefano L, Ferry M, Hasty J, di Bernardo M, di Bernardo D (2014) In-vivo real-time control of protein expression from endogenous and synthetic gene networks. PLoS Comput Biol 10(5):e1003625–e1003625. https://doi.org/10.1371/journal.pcbi.1003625
Chapter 11

Optimal Experimental Design for Systems and Synthetic Biology Using AMIGO2
Eva Balsa-Canto, Lucia Bandiera, and Filippo Menolascina

Abstract
Dynamic modeling in systems and synthetic biology is still quite a challenge—the complex nature of the
interactions results in nonlinear models, which include unknown parameters (or functions). Ideally, time-
series data support the estimation of model unknowns through data fitting. Goodness-of-fit measures
would lead to the best model among a set of candidates. However, even when state-of-the-art measuring
techniques allow for an unprecedented amount of data, not all data suit dynamic modeling.
Model-based optimal experimental design (OED) is intended to improve model predictive capabilities.
OED can be used to define the set of experiments that would (a) identify the best model or (b) improve the
identifiability of unknown parameters. In this chapter, we present a detailed practical procedure to compute
optimal experiments using the AMIGO2 toolbox.

Key words Biological systems, Dynamic models, Optimal experimental design, Practical identifiability

1 Introduction

The ultimate aim of systems biology is the discovery of the design principles of life, while the ultimate aim of synthetic biology is to apply those design principles to synthesize novel biological systems. Both the discovery and the synthesis rely on a combination of data and mechanistic mathematical models that capture the most relevant features of the system. Ideally, models will offer the means for prediction and, thus, for design.
Model building is an iterative process which goes back and
forth between model refinement and data validation. The first
steps of the modeling process include (a) defining the question to
be addressed, (b) generating hypotheses, (c) selecting the modeling
framework, and (d) formulating one (or several) candidate model
(s). Steps (b)–(d) often rely on prior knowledge and observations.
Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229, https://doi.org/10.1007/978-1-0716-1032-9_11, © Springer Science+Business Media, LLC, part of Springer Nature 2021

This chapter focuses on modeling biological networks. The first steps of modeling (1 and 2) define the topology of the network: which are the biomolecules of interest and whether they interact with each other. Next steps (3 and 4) describe the kinetics and strengths of biomolecular interactions within the system. If we consider the cell as a well-stirred reactor, we can explain the behavior of the network using a set of ordinary differential equations which determine concentration changes as prescribed by kinetic laws. The model would read as follows:
\[
\frac{dx}{dt} = f(x, u, \theta, t); \qquad x(t_0) = x_0 \tag{1}
\]
where $x$, $u$, and $\theta$ denote the vectors of state variables, inputs, and model parameters, respectively.
The dynamics of the system (1) depends on the initial conditions ($x_0$) and the parameter values. Parameter estimation offers the means to reconcile models with data [1]. The underlying idea is to solve a nonlinear optimization problem that computes the unknown and nonmeasurable kinetic constants which maximize the likelihood of the data $\tilde{y}$.
The experimental data consist of a matrix of values corresponding to individual measurements obtained under the conditions specified by an experimental scheme $\varepsilon$. We encode the experimental data and the model predictions in the following vectors:
\[
\tilde{y} = \left[\tilde{y}_1, \tilde{y}_2, \ldots, \tilde{y}_d, \ldots, \tilde{y}_{n_d}\right]; \qquad y = \left[y_1, y_2, \ldots, y_d, \ldots, y_{n_d}\right] \tag{2}
\]
where $\tilde{y}$ represents the experimental data and $y = g(x, u, \theta, t)$ the corresponding model predictions; $d$ represents a specific experimental condition defined by the subindexes $\varepsilon$ (for the experiment), $o$ (for the observables in experiment $\varepsilon$), and $s$ (for the sampling times in experiment $\varepsilon$). $n_d$ is the total number of such conditions, that is, the number of data. Accordingly, the operators to be defined in the sequel can be easily condensed as follows:
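\[
\sum_{d=1}^{n_d} (\cdot) = \sum_{i=1}^{n_\varepsilon} \left( \sum_{j=1}^{n_o^{\varepsilon}} \left( \sum_{k=1}^{n_s^{o,\varepsilon}} (\cdot) \right) \right) \tag{3}
\]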

Output additive experimental noise is often assumed in such a way that:
\[
\tilde{y}_d = y_d + e_d, \tag{4}
\]
where $e_d$ belongs to a sequence of independent random variables with probability density $\Pi(e_d)$. In many practical examples, experimental noise is assumed to be Gaussian, and its variance $\sigma_d^2$ is known for all $d$'s (homoscedastic case) or unknown and dependent on $d$ (heteroscedastic case).
When information about the nature of the experimental noise is available, parameter estimation looks for the parameter values that give the highest probability to the measured data:
\[
J_{llk}(\theta) = \ln\left(\pi(\tilde{y} \mid \theta)\right) \tag{5}
\]
The probability density function will condition the exact formulation of the cost function. In practice, homoscedastic Gaussian additive noise is often assumed, and the resulting cost function is similar to the well-known generalized least squares, with weights set to the inverse of the variance of the experimental data. The parameter estimation problem is formulated as finding the parameter values that minimize:
\[
J_{llk}(\theta) = \sum_{d=1}^{n_d} \frac{\left(y_d(\theta) - \tilde{y}_d\right)^2}{\sigma_d^2} \tag{6}
\]
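The weighted least-squares cost of Eq. (6) can be sketched in a few lines. The snippet below is an illustration in plain Python, not AMIGO2 code; the function name `weighted_least_squares` and the numbers are hypothetical, and the noise standard deviations are assumed to be known.

```python
from typing import Sequence


def weighted_least_squares(y_model: Sequence[float],
                           y_data: Sequence[float],
                           sigma: Sequence[float]) -> float:
    """Eq. (6): squared residuals weighted by the inverse noise variance."""
    if not (len(y_model) == len(y_data) == len(sigma)):
        raise ValueError("y_model, y_data and sigma must all have length n_d")
    return sum(((ym - yd) / s) ** 2
               for ym, yd, s in zip(y_model, y_data, sigma))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to make the arithmetic easy to follow.
    y_model = [1.0, 2.0, 4.0]   # model predictions y_d(theta)
    y_data = [1.1, 1.8, 4.4]    # measurements
    sigma = [0.1, 0.2, 0.4]     # noise standard deviations sigma_d
    print(weighted_least_squares(y_model, y_data, sigma))
```

Each residual in this example equals one standard deviation, so every data point contributes about 1 to the cost regardless of its absolute magnitude; this is the effect of weighting by the inverse variance.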

Precise parameter estimates require informative data, that is, informative experimental schemes $\varepsilon$. The precision of the parameter estimates can be measured in terms of the volume and eccentricity of the confidence hyperellipsoid. In this regard, the confidence hyperellipsoid informs about the quality and quantity of information provided by the experimental scheme for parameter estimation. Related to this, the concept of practical identifiability refers to the (im)possibility of assigning precise values to model parameters.
Performing experiments to obtain a rich enough set of experi-
mental data for nonlinear dynamic modeling is costly and time-
consuming. Model-based optimal experiment design (OED, see,
e.g., [2, 3]) allows devising the minimum set of experiments for
model identification, that is, for either model selection or parame-
ter estimation. Mathematically the OED problem can be formu-
lated as a dynamic optimization problem where the objective is to
find the observables, time-varying inputs (stimuli), initial condi-
tions, sampling times, and experiment duration, to maximize or
minimize a performance index which is related to the experiment
information content. Figure 1 illustrates the concept of the experi-
mental scheme.
The definition of the information depends on the aim of optimal experimental design. For the case of model selection, the information typically relates to the differences in predictions between candidate models:
\[
J_{OED,MS}\left(u^{\varepsilon}(t), t_s^{\varepsilon}, n_s^{\varepsilon}, t_f^{\varepsilon}\right) = \Psi\left(y_A^{\varepsilon}, y_B^{\varepsilon}\right) \tag{7}
\]
where $y_A^{\varepsilon}$ and $y_B^{\varepsilon}$ correspond to the observables as predicted by models A and B given the experimental conditions $\varepsilon$. The functional $\Psi$ may correspond to, for example, the Euclidean distance between the predictions of the two models.

Fig. 1 Concept of the experimental scheme. It includes the number of experiments and/or replicates,
stimulation conditions, measured states, experiment duration, and sampling times

For the case of parameter estimation, the determinant or the eigenvalues of the Fisher information matrix provide a measure of the statistical quality of the parameter estimates, that is, a measure of the volume and eccentricity of the confidence hyperellipsoid [4]. The Fisher information matrix reads as follows:
\[
F = \mathop{E}_{\tilde{y} \mid \mu} \left\{ \left[\frac{dJ_{llk}(\theta)}{d\theta}\right] \left[\frac{dJ_{llk}(\theta)}{d\theta}\right]^{T} \right\} \tag{8}
\]
where $E$ is the expected value and $\mu$ a near-optimum value of the parameters. The Cramér–Rao inequality provides a lower bound on the covariance of the estimators (under given conditions):
\[
C \geq F^{-1}(\mu) \tag{9}
\]
The confidence interval for a given parameter $\mu_i$ is then given by the following:
\[
\pm\, t_{\alpha/2}^{\gamma} \sqrt{C_{ii}} \tag{10}
\]
where $t_{\alpha/2}^{\gamma}$ is given by the Student's t-distribution, $\gamma$ is the number of degrees of freedom, and $(1 - \alpha) \cdot 100\%$ is the confidence level.
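The chain from Eq. (8) to Eq. (10) can be illustrated on a toy problem. The sketch below is an assumption-laden example, not the AMIGO2 implementation: it uses a hypothetical two-parameter model $y = a\,e^{-kt}$, assumes Gaussian noise of constant standard deviation (in which case the Fisher information matrix reduces to the sum of outer products of the output sensitivities divided by the variance), takes the Cramér–Rao bound as the covariance, and replaces the Student's t quantile with the normal quantile as a large-sample approximation.

```python
import math
from statistics import NormalDist


def sensitivities(t, theta):
    """Partial derivatives of y = a*exp(-k*t) with respect to (a, k)."""
    a, k = theta
    decay = math.exp(-k * t)
    return (decay, -a * t * decay)


def fisher_information(times, theta, sigma):
    """F = sum_d s_d s_d^T / sigma^2 for Gaussian noise with constant std sigma."""
    F = [[0.0, 0.0], [0.0, 0.0]]
    for t in times:
        s = sensitivities(t, theta)
        for i in range(2):
            for j in range(2):
                F[i][j] += s[i] * s[j] / sigma ** 2
    return F


def invert_2x2(M):
    """Closed-form inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0.0:
        raise ValueError("singular matrix: parameters are not identifiable")
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def confidence_half_widths(F, alpha=0.05):
    """Half-widths z * sqrt(C_ii), with C taken as the Cramer-Rao bound F^-1."""
    C = invert_2x2(F)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # normal approximation to t
    return [z * math.sqrt(C[i][i]) for i in range(2)]


if __name__ == "__main__":
    theta = (2.0, 0.5)  # hypothetical near-optimum parameter values (a, k)
    sparse = fisher_information([1.0, 2.0, 3.0], theta, sigma=0.1)
    dense = fisher_information([1.0, 2.0, 3.0, 4.0, 5.0], theta, sigma=0.1)
    print(confidence_half_widths(sparse))
    print(confidence_half_widths(dense))
```

Adding sampling times can only add information, so the half-widths for the denser scheme are never larger than those for the sparser one. Optimal experimental design asks which samples, inputs, and durations shrink them the most.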
Fig. 2 Iterative solution of the parameter estimation and optimal experimental design problems. The iterative solution requires an NLP solver to generate candidate solutions at each iteration k, and an IVP solver to solve the model equations, plus the parametric sensitivities in the case of OED, to evaluate the cost function and constraints

Both the parameter estimation and the optimal experimental design problems can be cast as nonlinear programming problems (NLP) subject to dynamic and algebraic constraints. For the case of optimal experimental design, the stimuli are parametrized to transform the function $u(t)$ into a vector $w \in \mathbb{R}^{\rho}$, with $\rho$ the number of parameters required to characterize the stimuli profile (e.g., number of steps or pulses, stimuli switching times). Figure 2 shows the iterative procedure for their resolution. In an outer iteration, the NLP solver will generate candidate solutions, and in an inner iteration, the initial value problem (IVP) solver will compute model predictions and parametric sensitivities for the given candidate solution.
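The parameterization of $u(t)$ by a finite vector $w$ can be sketched for the simplest case of a piecewise-constant stimulus with equally spaced switching times. This is an illustration only; the helper names `switching_times` and `stepwise_input` are hypothetical and do not belong to the toolbox.

```python
from bisect import bisect_right


def switching_times(t_final, n_steps):
    """Equally spaced switching times from 0 to t_final (n_steps + 1 values)."""
    return [t_final * i / n_steps for i in range(n_steps + 1)]


def stepwise_input(t, t_con, w):
    """Value at time t of the piecewise-constant input encoded by the vector w.

    w[i] is applied on the interval [t_con[i], t_con[i + 1]); the last level
    is also returned at the final time.
    """
    if len(w) != len(t_con) - 1:
        raise ValueError("w needs one level per interval between switching times")
    if t < t_con[0] or t > t_con[-1]:
        raise ValueError("t lies outside the experiment")
    index = min(bisect_right(t_con, t) - 1, len(w) - 1)
    return w[index]


if __name__ == "__main__":
    t_con = switching_times(10.0, 5)        # [0, 2, 4, 6, 8, 10]
    w = [0.0, 1.0, 2.0, 3.0, 4.0]           # decision variables seen by the NLP solver
    print([stepwise_input(t, t_con, w) for t in (0.0, 1.9, 2.0, 9.9, 10.0)])
```

In this representation the NLP solver only manipulates the entries of `w` (and, if the step durations are also designed, the entries of `t_con`), while the IVP solver evaluates `stepwise_input` whenever it needs the stimulus at a given time.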
As for the selection of the initial value problem solver, possibly
the most popular are the Runge–Kutta in its implicit and explicit
versions, the Adams–Bashforth, and the backward differentiation
formula (BDF)-based methods. For an extensive review of meth-
ods, see [5].
NLP solvers are designed to generate, from one or several initial guesses, a sequence of solutions (iterates) that eventually converge to the minimum of the cost function (Eqs. 5 and 6). The way these iterates are computed allows the first classification of NLP solvers into three major groups: local, global, and hybrid optimizers (Fig. 3).
Fig. 3 Classification of NLP solvers and some popular examples

Local methods use information about the cost function and possibly its gradient and its Hessian to compute search directions. These methods guarantee convergence to a local optimum, which is the global optimum if the problem is convex. Interested readers are referred to, for example, the book by Fletcher (1987) [6] for extensive descriptions of local optimizers; further, Seber and Wild (1989) [7] and Schittkowski (2002) [8] describe Levenberg–Marquardt and Gauss–Newton-based methods for the specific case of least squares problems. The use of adjoint sensitivities may largely improve the efficiency of evaluating the gradient of the cost function [9], thereby improving the overall convergence rate of local indirect methods. The use of first- and projected second-order sensitivities also enhances the convergence rate of the solution of dynamic optimization problems such as those solved in optimal experimental design [10].
However, the nonlinear character of the dynamic biological
models often leads to multimodality, and thus, local methods may
end up in suboptimal solutions. Lin and Stadtherr (2006) [11] or
Polisetty et al. (2006) [12] suggested the use of global determin-
istic optimizers for parameter estimation. Although very promising
and powerful, there are still limitations to their application, mainly
due to the rapid increase of computational cost with the size of the
considered system and the number of its parameters.
Alternatively, stochastic global optimization algorithms make
use of pseudorandom sequences to determine search directions
toward the global optimum. The main advantage of these methods
is that they rapidly arrive at the proximity of the solution, which
makes them particularly attractive to implement hybrid global-local
optimizers suitable for dynamic optimization and optimal experi-
mental design [13] and parameter estimation [14]. Villaverde et al.
(2019) [15] conclude, in their benchmark of optimizers for param-
eter estimation, that the best performance is achieved with hybrid
approaches such as the enhanced Scatter Search method (eSS,
[16, 17]).
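The idea behind hybrid global-local optimization can be illustrated with a deliberately simple sketch: a stochastic phase samples the search space to locate a promising region, and a local phase refines the best sample. The code below is a toy under stated assumptions, not eSS or any AMIGO2 solver; the cost function, the uniform sampling, and the pattern-search refinement are all hypothetical choices made for brevity.

```python
import math
import random


def cost(x):
    """Toy multimodal cost with several local minima (hypothetical example)."""
    return x * x / 10.0 + math.sin(3.0 * x)


def global_phase(f, lower, upper, n_samples, seed):
    """Stochastic exploration: keep the best of n_samples uniform draws."""
    rng = random.Random(seed)
    candidates = (rng.uniform(lower, upper) for _ in range(n_samples))
    return min(candidates, key=f)


def local_phase(f, x0, lower, upper, step=0.5, tol=1e-8):
    """Derivative-free pattern search: move to a better neighbour or halve the step."""
    x, fx = x0, f(x0)
    while step > tol:
        improved = False
        for trial in (x - step, x + step):
            trial = min(max(trial, lower), upper)  # respect the bounds
            f_trial = f(trial)
            if f_trial < fx:
                x, fx, improved = trial, f_trial, True
                break
        if not improved:
            step /= 2.0
    return x, fx


def hybrid_minimize(f, lower, upper, n_samples=2000, seed=0):
    """Global sampling followed by local refinement of the best sample."""
    start = global_phase(f, lower, upper, n_samples, seed)
    return local_phase(f, start, lower, upper)


if __name__ == "__main__":
    print(hybrid_minimize(cost, -5.0, 5.0))
```

Started from an arbitrary point, the local phase alone would stop at whichever local minimum is nearest; the global phase is what places the starting point inside the basin of the best minimum. Practical solvers interleave the two phases far more economically than this sketch does.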
AMIGO2 [18] is a multiplatform MATLAB-based toolbox
designed to automate the solution of optimization problems
which are at the core of systems and synthetic biology: (a) the
iterative identification of dynamic models, (b) the use of optimality
principles for predicting biological behavior, and (c) the multiob-
jective optimal control of biological systems.
This chapter presents a protocol to iteratively build predictive
mathematical models using parameter estimation and optimal
experimental design as implemented in the AMIGO2 toolbox [18].

2 Materials

2.1 Toolbox Download and License

The AMIGO2 toolbox and the corresponding documentation are available at:
https://sites.google.com/site/amigo2toolbox/.
The toolbox is provided as a zip file with a password; it is free of charge for academic purposes under the Creative Commons license. For further details on license conditions, please visit http://creativecommons.org/licenses/by-nc-nd/3.0.

2.2 Toolbox Requirements and Installation Guide

AMIGO2 has been implemented in MATLAB and tested in several MATLAB versions. It can also interface with C code for model simulation and parameter estimation.
For full capabilities, the user will require the following additional software:
• Cytoscape is needed for network visualization.
• MATLAB optimization toolbox is required to use the local optimizers fmincon (SQP method for constrained problems, suitable for dynamic optimization) or lsqnonlin (a least-squares local NLP solver, suited for parameter estimation).
• MATLAB symbolic manipulation toolbox is used to evaluate exact Jacobians and for network visualization.
• C compiler (e.g., gcc) is required to use AMIGO2-enhanced modes with C.
The most computationally demanding step in all tasks in
AMIGO2 is the solution of the system dynamics, that is, the set
of ordinary differential equations (ODE). In this regard, the tool
offers the possibility of automatically generating C code, and this
will be automatically mexed to CVODES (Sundials) as included in
AMIGO2.
The toolbox does not require installation. Once unzipped,
open a MATLAB session and move to the AMIGO2 path. The
code is initialized by typing AMIGO_Startup. The Startup auto-
matically adds AMIGO2 to the path and generates mex options
files. From that moment on, users can access the Help from the
MATLAB help Supplemental Software section.

2.3 Code Structure

AMIGO2 is organized in four main modules: the preprocessor, the numerical kernel, the postprocessor, and the module of main tasks. Figure 4 presents the code structure once unzipped.
• Help folder keeps all toolbox-related documentation.
• Examples folder keeps several implemented examples that the user may consider as templates to address new problems.
• Inputs folder, initially empty, is devoted to keeping new inputs created by users.
• Kernel folder keeps mathematical functions, NLP solvers, IVP solvers, and auxiliary code.
• Postprocessor folder keeps all MATLAB functions to generate reports, structures, and figures.
• Preprocessor folder keeps all MATLAB functions to generate MATLAB or C code, to mex files when required, and to create necessary paths.
• Release_info folder contains the AMIGO_release_info.m with all details about the current release.
• Results folder, initially empty, is devoted, by default, to keeping all results. Users may create other results folders.

Fig. 4 Code structure. The code is organized in user-oriented folders (Examples, Inputs, Help, and Results), code folders (Preprocessor, Postprocessors, Add-ons, Release-info), and tasks (Startup, Prep, SModel, SObs, SData, LRank, GRank, ContourP, RIdent, PE, REG_PE, PE-PostAnalysis, OED, IOC, and DO)
Inputs to the code are kept in a MATLAB structure, inputs. Different tasks require different inputs. For the purpose of this chapter, we will use the following:
• inputs.model: To include all information about the model, that is, the number of states, parameters, and stimuli; their names; model equations; and a nominal value of the parameters.
• inputs.exps: To specify the experimental scheme, that is, the number of experiments and for each experiment, its initial and stimulation conditions, observables, sampling times, experiment duration, and available experimental data and experimental noise.
• inputs.DOsol: To specify the optimal experimental design for model selection, that is, the objective functional, the control vector parameterization approach, initial conditions for the experiment, experiment duration, and constraints.
• inputs.OEDsol: To specify the objective of the optimal experimental design for parameter estimation.
• inputs.ivpsol: To specify the IVP solver, the sensitivity solver, and the integration tolerances. Defaults are set for these inputs.
• inputs.nlpsol: To define the NLP solver and the corresponding parameters, for example, the maximum number of function evaluations or maximum computational time for optimization, to name a few. Defaults are set for these inputs.
• inputs.plotd: To specify the characteristics of the display of results. Defaults are set for these inputs.

2.4 Basics on the Use of AMIGO2

As described in the previous section, AMIGO2 is organized in tasks or tools. Every task is devoted to a specific problem in systems and synthetic biology. The following tasks are of interest for optimal experimental design:
• AMIGO_Prep interprets the “inputs” structure and creates the necessary files for other tasks.
• AMIGO_SModel, AMIGO_SObs, and AMIGO_SData are devoted to model simulation. Models or observables are evaluated, and results are plotted against experimental data. Pseudo-data can be generated for numerical tests or synthetic problems.
• AMIGO_OED solves the model-based optimal experimental design problem to improve parametric identifiability.
• AMIGO_DO solves multi- and single-objective dynamic optimization problems using the control vector parameterization (CVP) approach [19]. This tool can be used for optimization-based modeling or stimulation design, among others. It is, therefore, applicable to the OED for model selection.

3 Methods

3.1 Illustrative Example: Modeling of an Inducible Promoter

We consider the modeling of an inducible promoter in Saccharomyces cerevisiae to motivate the protocol. Promoters are defined as the DNA sequence surrounding the transcription start site, which is bound by the basal transcription machinery, thus allowing transcriptional initiation. Knowledge about the promoter architecture led to the development of inducible promoter systems. Here, we consider a chemically regulated system which is induced by the presence of IPTG and controls the expression of a fluorescent reporter, Citrine. The model reads as follows [20]:
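\[
\dot{L}_0 = k_{LacI} - \left(2 k_2\, \mathit{IPTG}_i + k_d\right) L_0 + k_{m2} L_1 - k_1 \left(2 G_{2,0} + G_{2,1}\right) L_0 + k_{m1} \left(G_{2,1} + 2 G_{2,2}\right) \tag{11}
\]
\[
\dot{L}_1 = k_2 \left(2 L_0 - L_1\right) \mathit{IPTG}_i - \left(k_{m2} + k_d\right) L_1 + 2 k_{m2} L_2 \tag{12}
\]
\[
\dot{L}_2 = k_2 L_1\, \mathit{IPTG}_i - \left(2 k_{m2} + k_d\right) L_2 \tag{13}
\]
\[
\dot{\mathit{Lac12}} = k_{Lac12} - \left(k_{TP1} + k_d\right) \mathit{Lac12} \tag{14}
\]
\[
\dot{\mathit{Lac12m}} = k_{TP1}\, \mathit{Lac12} - k_d\, \mathit{Lac12m} \tag{15}
\]
\[
\dot{G}_{2,0} = -2 k_1 L_0 G_{2,0} + \left(k_{m1} + k_d\right) G_{2,1} \tag{16}
\]
\[
\dot{G}_{2,1} = 2 k_1 L_0 G_{2,0} + 2 \left(k_{m1} + k_d\right) G_{2,2} - \left(k_{m1} + k_1 L_0 + k_d\right) G_{2,1} \tag{17}
\]
\[
\dot{G}_{2,2} = -2 \left(k_{m1} + k_d\right) G_{2,2} + k_1 L_0 G_{2,1} \tag{18}
\]
\[
\dot{\mathit{IPTG}}_i = \frac{k_{cat}\, \mathit{Lac12m}\, \mathit{IPTG}_e}{K_m + \mathit{IPTG}_e} - \left(k_{out} k_d + 2 k_2 L_0 + k_2 L_1\right) \mathit{IPTG}_i + \left(k_d + k_{m2}\right) L_1 + 2 \left(k_d + k_{m2}\right) L_2 \tag{19}
\]
\[
\dot{\mathit{Cit}}_m = k_C G_{2,0} + l_k k_C \left(G_{2,1} + G_{2,2}\right) - k_d\, \mathit{Cit}_m \tag{20}
\]
where $L_0$ denotes the LacI2 repressor; $L_1$ and $L_2$ correspond to the LacI2·IPTGi and LacI2·2IPTGi complexes; Lac12 and Lac12m represent the protein in the cytoplasm and membrane, respectively; $G_{2,0}$ corresponds to the gene that codes for Citrine; $G_{2,1}$ and $G_{2,2}$ denote the G1·IPTGi and G1·2IPTGi complexes. Unknown kinetic constants include $k_{LacI}$, $k_2$, $k_d$, $k_{m2}$, $k_1$, $k_{m1}$, $k_{Lac12}$, $k_{TP1}$, $k_{cat}$, $K_m$, $k_{out}$, $k_C$, and $l_k$.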
As a second candidate model, we consider the reduced model proposed by Bandiera et al. (2018). The model structure builds on the assumption of time-scale separation between the expression of the repressor LacI, its dimerization and subsequent binding to the operator sites and IPTG, considered at quasi-steady state, and Citrine expression. The model reads as follows:
\[
\dot{\mathit{Cit}}_{mrna} = \alpha_1 + V_{m1} \frac{\mathit{IPTG}_e^{h}}{K_{m1}^{h} + \mathit{IPTG}_e^{h}} - d_1\, \mathit{Cit}_{mrna} \tag{21}
\]
\[
\dot{\mathit{Cit}}_{foldedP} = \alpha_2\, \mathit{Cit}_{mrna} - \left(d_2 + K_f\right) \mathit{Cit}_{foldedP} \tag{22}
\]
\[
\dot{\mathit{Cit}}_{fluo} = K_f\, \mathit{Cit}_{foldedP} - d_2\, \mathit{Cit}_{fluo} \tag{23}
\]
where $\mathit{Cit}_{mrna}$, $\mathit{Cit}_{foldedP}$, and $\mathit{Cit}_{fluo}$ are the concentrations of Citrine mRNA, immature folded protein, and matured (fluorescent) protein. The model describes transcription, translation, and maturation of the fluorescent reporter. Transcription depends on the concentration of the inducer $\mathit{IPTG}_e$ through a Hill equation, where $V_{m1}$ is the maximal-induced transcriptional rate; $h$ is the Hill coefficient; $K_{m1}$ is the Michaelis–Menten coefficient. Translation occurs at a rate $\alpha_2$, and the folded protein matures at rate $K_f$. mRNA and protein are subject to linear degradation occurring at rates $d_1$ and $d_2$, respectively.
Both models, referred to here as MA and MB, respectively, were fitted to the time-series data available in [20]. Details on the fitting can be found in [21]. The best parameter values will be used here as the reference for optimal experimental design.
Here, we show the definition of the reduced model (MB) in AMIGO2:

inputs.model.input_model_type='charmodelC'; % Model type: charmodelC for C code
inputs.model.n_st=4; % Number of states
inputs.model.n_par=9; % Number of model parameters
inputs.model.n_stimulus=1; % Number of inputs
inputs.model.st_names=char('Cit_mrna','Cit_foldedP',...
    'Cit_fluo','Cit_AU'); % Names of states
inputs.model.par_names=char('alpha1','Vm1','h1','Km1','d1','alpha2',...
    'd2','Kf','sc_molec'); % Names of the parameters
inputs.model.stimulus_names=char('IPTGe'); % Name of the stimulus, as used in the equations
inputs.model.eqns=... % ODEs system dynamics.
    char('dCit_mrna=alpha1+Vm1*(IPTGe^h1/(Km1^h1+IPTGe^h1))-d1*Cit_mrna',...
    'dCit_foldedP=alpha2*Cit_mrna-(d2+Kf)*Cit_foldedP',...
    'dCit_fluo=Kf*Cit_foldedP-d2*Cit_fluo',...
    'dCit_AU=sc_molec*dCit_fluo'); % Fluorescence in arbitrary units
inputs.model.par=[0.000377125304442752, 0.00738924359598526, ...
    1.53333782244337, 5.01927275636639, 0.00118831480244382, ...
    0.0461264539194078, 0.000475563708997018, 0.000301803966012407, ...
    68.8669567134881]; % Reference value for the parameters
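As a sanity check that is independent of the toolbox, the reduced model of Eqs. (21)–(23) can be integrated with a few lines of plain Python. The sketch below is an illustration, not AMIGO2 code: it uses the reference parameter values listed above, a hand-written fixed-step fourth-order Runge–Kutta scheme (AMIGO2 itself relies on CVODES), a constant inducer level, and hypothetical function names. Time is expressed in the units of the rate constants, which are not restated here.

```python
PARAMS = {
    "alpha1": 0.000377125304442752, "Vm1": 0.00738924359598526,
    "h1": 1.53333782244337, "Km1": 5.01927275636639,
    "d1": 0.00118831480244382, "alpha2": 0.0461264539194078,
    "d2": 0.000475563708997018, "Kf": 0.000301803966012407,
}


def hill(iptg_e, p):
    """Hill activation term of Eq. (21)."""
    num = iptg_e ** p["h1"]
    return num / (p["Km1"] ** p["h1"] + num)


def rhs(state, iptg_e, p):
    """Right-hand side of Eqs. (21)-(23) for (mRNA, folded protein, fluorescent protein)."""
    mrna, folded, fluo = state
    d_mrna = p["alpha1"] + p["Vm1"] * hill(iptg_e, p) - p["d1"] * mrna
    d_folded = p["alpha2"] * mrna - (p["d2"] + p["Kf"]) * folded
    d_fluo = p["Kf"] * folded - p["d2"] * fluo
    return (d_mrna, d_folded, d_fluo)


def steady_state(iptg_e, p):
    """Analytical steady state for a constant inducer concentration."""
    mrna = (p["alpha1"] + p["Vm1"] * hill(iptg_e, p)) / p["d1"]
    folded = p["alpha2"] * mrna / (p["d2"] + p["Kf"])
    fluo = p["Kf"] * folded / p["d2"]
    return (mrna, folded, fluo)


def rk4_step(state, iptg_e, dt, p):
    """One classical Runge-Kutta step with the input held constant."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))
    k1 = rhs(state, iptg_e, p)
    k2 = rhs(shifted(k1, dt / 2.0), iptg_e, p)
    k3 = rhs(shifted(k2, dt / 2.0), iptg_e, p)
    k4 = rhs(shifted(k3, dt), iptg_e, p)
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate(state0, iptg_e, t_final, dt, p):
    """Integrate from state0 up to t_final under a constant inducer level."""
    state = tuple(state0)
    for _ in range(round(t_final / dt)):
        state = rk4_step(state, iptg_e, dt, p)
    return state


if __name__ == "__main__":
    print(simulate((0.0, 0.0, 0.0), 10.0, 40000.0, 10.0, PARAMS))
    print(steady_state(10.0, PARAMS))
```

For a sustained input the simulated state should approach the analytical steady state, which gives a quick check that the equations were transcribed consistently before they are passed to the toolbox.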

3.2 Optimal Experimental Design for Model Selection in AMIGO2

Currently, AMIGO2 does not offer a specific task to solve the problem of OED for model selection. Still, it is possible to use AMIGO_DO for that purpose. Note that the use of DO implies that it is possible to measure regularly over time. AMIGO_DO requires the definition of inputs.model, inputs.DOsol, inputs.ivpsol, inputs.nlpsol, and inputs.plotd.

3.2.1 Definition of the Objective Functional

The first step in the protocol corresponds to the definition of the objective functional that will characterize the differences between the models. Several possibilities exist. Here, we include a couple of examples:
1. The integral of the squared differences of the fluorescent protein:
\[
J_{OED,MS} = \left[ \int_{t=0}^{t=t_f} \left( \mathit{Cit}_{AU,A} - \mathit{Cit}_{AU,B} \right)^2 dt \right]^{1/2} \tag{24}
\]
where $\mathit{Cit}_{AU,A}$ and $\mathit{Cit}_{AU,B}$ correspond to the fluorescence in arbitrary units as predicted by models A and B, respectively. A scaling parameter transforms $\mathit{Cit}_m$ (in model A) or $\mathit{Cit}_{fluo}$ (in model B) into the A.U. of the experimental data.
2. The integral of the squared differences of the fluorescent protein over the experiment duration:
\[
J_{OED,MS} = \frac{1}{t_f} \left[ \int_{t=0}^{t=t_f} \left( \mathit{Cit}_{AU,A} - \mathit{Cit}_{AU,B} \right)^2 dt \right]^{1/2} \tag{25}
\]

Note that this second possibility will penalize longer experiments. Alternative formulations to penalize experiment duration are possible. Also, solving the problem as a multiobjective problem would allow finding the Pareto front of solutions offering the best compromise between model differences and experiment duration. Here, we focus on the single-objective case.
The definition of the cost function in AMIGO2 requires the
declaration of both models in sequence plus one or more additional
ODEs to account for the objective functional(s). In this way, mod-
els and objective functional will be solved simultaneously, reducing
the computational effort. For the particular case of functionals in
Eqs. (24) and (25), the additional ODEs read:

Equation 24:
'dJ_OED_MS=(CitAUA-CitAUB)^2'
Equation 25:
'dtfinal=1',...
'dJ_OED_MS=(1/tfinal)*(CitAUA-CitAUB)^2'
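The reason for appending the functional as an extra ODE is that one integration pass then yields both model trajectories and the accumulated squared difference. The sketch below illustrates this on two hypothetical first-order models that merely stand in for models A and B; the function names, the rate constants, and the fixed-step Runge–Kutta integrator are assumptions made for the example and are not part of AMIGO2.

```python
import math


def augmented_rhs(state, u):
    """Two toy models plus the running integral of their squared difference."""
    y_a, y_b, j = state
    dy_a = u - y_a                 # stand-in for model A: time constant 1
    dy_b = 2.0 * (u - y_b)         # stand-in for model B: time constant 1/2
    dj = (y_a - y_b) ** 2          # extra state, as in 'dJ_OED_MS=(...)^2'
    return (dy_a, dy_b, dj)


def integrate(state, u, t_final, n_steps):
    """Classical fixed-step Runge-Kutta integration with a constant input u."""
    dt = t_final / n_steps
    for _ in range(n_steps):
        k1 = augmented_rhs(state, u)
        k2 = augmented_rhs(tuple(s + dt / 2.0 * k for s, k in zip(state, k1)), u)
        k3 = augmented_rhs(tuple(s + dt / 2.0 * k for s, k in zip(state, k2)), u)
        k4 = augmented_rhs(tuple(s + dt * k for s, k in zip(state, k3)), u)
        state = tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state


def model_distance(u, t_final, n_steps=500):
    """Eq. (24): square root of the integrated squared difference."""
    _, _, j = integrate((0.0, 0.0, 0.0), u, t_final, n_steps)
    return math.sqrt(j)


if __name__ == "__main__":
    t_final = 5.0
    distance = model_distance(1.0, t_final)
    print(distance)             # Eq. (24)
    print(distance / t_final)   # Eq. (25): the same distance scaled by 1/t_f
```

For these toy models and a unit step input the integral has a closed form, which tends to 1/12 for long experiments; this makes the sketch easy to verify by hand before moving to the full models.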

3.2.2 Definition of the Optimization Problem

The definition of the dynamic optimization problem requires the following elements: the initial conditions for model simulation; the tentative experiment duration; the type of optimization problem (minimization or maximization); the definition of the objective; and the control vector parameterization (type of input interpolation, number of discretization elements, initial guess and bounds for the inputs, and bounds for the experiment duration).
Inputs are shown in the sequel. For illustrative purposes, we will assume that the experiment may last between 8 and 24 h, and the input profile corresponds to a step-wise profile with five elements of fixed duration (i.e., the experiment is split into five segments of equal duration). The input parameterization can be easily modified in the inputs structure to consider steps of varying duration, pulse-wise profiles, or linear-wise profiles. It should be noted that the use of steps or linear-wise profiles with elements of varying duration increases the multimodality of the optimization problem. In general, solving a case with ten constant duration steps is simpler than solving a case with five steps whose duration is also optimized, even if the NLP problem has the same number of decision variables.
inputs.DOsol.y0=[6.105e-05 0.32025 419.98 598.88 ...
    10251 0.99998 1.9501e-05 9.5071e-11 3.1415e+09 1247.8 ...
    17157 6.5338 387.69 246.04 16944 0 0];
% Initial conditions (state after overnight at maximum IPTGe)

inputs.DOsol.tf_type='od'; % Experiment duration: fixed or designed
inputs.DOsol.tf_guess=15*3600; % Tentative experiment duration

inputs.DOsol.DOcost_type='max'; % Type of problem: max/min
inputs.DOsol.DOcost='sqrt(J_OED_MS)'; % Objective functional

% CVP DETAILS
inputs.DOsol.u_interp='stepf'; % Stimuli definition: 'sustained'|'step'|'stepf'|'linear'
inputs.DOsol.n_steps=5;
inputs.DOsol.u_guess=500*ones(1,inputs.DOsol.n_steps); % Guess for the input
inputs.DOsol.u_min=zeros(1,inputs.DOsol.n_steps);
inputs.DOsol.u_max=1000*ones(1,inputs.DOsol.n_steps); % Min/max for the input
inputs.DOsol.t_con=linspace(0,inputs.DOsol.tf_guess,inputs.DOsol.n_steps+1); % Input switching times, from initial to final time

inputs.DOsol.tf_min=8*3600; % Min/max for the experiment duration
inputs.DOsol.tf_max=24*3600;

3.2.3 Definition of the Numerical Methods

The user selects the initial value problem solver plus the optimizer. AMIGO_DO allows for successive input refinements, and therefore, the user may activate that possibility.

% SIMULATION
%
inputs.ivpsol.ivpsolver='cvodes'; % IVP solver: 'cvodes' (default, C)|'ode15s' (default, MATLAB, sbml)|'ode113'|'ode45'
inputs.ivpsol.rtol=1.0D-7; % [] IVP solver integration tolerances
inputs.ivpsol.atol=1.0D-7;

% OPTIMIZATION
%
inputs.nlpsol.nlpsolver='local_fmincon'; % [] NLP solver:
% LOCAL: 'local_fmincon'|'local_n2fb'|'local_dn2fb'|'local_dhc'|
%        'local_ipopt'|'local_solnp'|'local_nomad'|
% MULTISTART: 'multi_fmincon'|'multi_n2fb'|'multi_dn2fb'|'multi_dhc'|
%        'multi_ipopt'|'multi_solnp'|'multi_nomad'|
% GLOBAL: 'de'|'sres'
% HYBRID: 'hyb_de_fmincon'|'hyb_de_n2fb'|'hyb_de_dn2fb'|'hyb_de_dhc'|
%        'hyp_de_ipopt'|'hyb_de_solnp'|'hyb_de_nomad'|
%        'hyb_sres_fmincon'|'hyb_sres_n2fb'|'hyb_sres_dn2fb'|'hyb_sres_dhc'|
%        'hyp_sres_ipopt'|'hyb_sres_solnp'|'hyb_sres_nomad'
% METAHEURISTICS:
%        'ess' or 'eSS' (default)
% Note that the corresponding defaults are in files:
% OPT_solvers\DE\de_options.m; OPT_solvers\SRES\sres_options.m;
% OPT_solvers\eSS_**\ess_options.m
%
inputs.nlpsol.reopt='off'; % Reoptimization
inputs.nlpsol.reopt_local_solver='fmincon'; % Optimiser for reoptimization
inputs.nlpsol.n_reOpts=2; % Number of reoptimizations

3.2.4 Running the Code

The first step is to preprocess the model to generate necessary scripts: C code for model simulation and the objective function. After preprocessing, the AMIGO_DO can be run.

>> IP_model_selection % Reads the inputs structure
>> AMIGO_Prep(inputs) % Runs preprocess
>> AMIGO_DO(inputs) % Solves the OED problem

The local optimizer was not able to converge: the final solution corresponds to the initial guess. At this point, we switch to a global optimizer. In this particular case, we have selected eSS [16]. As an alternative, it is recommended to check differential evolution [22] as it has been quite successful in solving dynamic optimization problems [13].
inputs.nlpsol.nlpsolver='ess';
inputs.nlpsol.eSS.maxeval = 100000; % Maximum number of function evaluations
inputs.nlpsol.eSS.maxtime = 120; % Maximum CPU time in seconds
inputs.nlpsol.eSS.local.solver = 'fmincon'; % Local solver - refinements
inputs.nlpsol.eSS.local.finish = 'fminsearch';

Since the optimizer is stochastic, it is advised to run the code several times to check for convergence. Note that if all runs provide different solutions, the maximum computation time should be increased. Ideally, all runs should end up in the same optimum.

Fig. 5 Optimal experimental design for model selection. (a) The optimal IPTG profile corresponds to a pulse-
wise profile, starting from the absence of stimulation. Response time of model B to IPTG pulses is shorter than
the corresponding response time for model A. (b) The optimal IPTG profile starts from no stimulation for more
than half the experiment, and after that, the IPTG value increases to a final value of 152. The experiment lasts
the minimum allowed of 8 h. In both cases, models respond differently, thus being distinguishable

The optimal IPTG profile, as obtained from the maximization
of the models’ distance defined in Eq. (24), corresponds to a
pulse-wise profile, shown in Fig. 5a. Since there is no penalty on the
final time, the optimum corresponds to the maximum of 24 h. The
optimal IPTG profile, as obtained from the maximization of the
models’ distance defined in Eq. (25), corresponds to the step-wise
profile shown in Fig. 5b. Note that the optimum is achieved in an
experiment lasting 8 h.

3.3 Optimal Experimental Design for Parameter Estimation in AMIGO2

AMIGO_OED offers the possibility of solving the optimal experimental
design problem for parameter estimation. The problem is
formulated as a dynamic optimization problem in which the objective
is to find the experimental scheme that minimizes a specific
functional of the Fisher information matrix subject to a given set of
constraints. The model is defined as in Subheading 3.1.

3.3.1 Definition of the Objective Functional

The toolbox predefines various OED problems. The most widely
used are as follows:
• D-optimum design corresponds to the maximization of the
determinant of the Fisher information matrix. This design is
particularly suited to cases in which parameters are not highly
correlated but are poorly identifiable.
• E-optimum design corresponds to the maximization of the
minimum eigenvalue of the Fisher information matrix. This
design is particularly suitable for cases in which parameters
are highly correlated, that is, the confidence hyperellipsoid
is highly eccentric.
Note that the evaluation of the Fisher information matrix
requires information about the expected (or typical) experimental
noise in the system. The definition of the objective functional
would read as follows:
%==================================
% OBJECTIVE FUNCTIONAL RELATED DATA
%==================================

inputs.PEsol.id_global_theta='all'; % Parameters to be considered for OED:
                                    % 'all' | user selected
inputs.PEsol.global_theta_guess=inputs.model.par; % Nominal value of the
                                    % parameters to compute the FIM
inputs.exps.noise_type='homo_var'; % Type of experimental noise:
                                    % 'homo' | 'homo_var' | 'hetero'
inputs.exps.std_dev{1}=0.1; % Standard deviation of the noise for
                                    % each experiment: Ex: 0.05 <=> 5%
inputs.OEDsol.OEDcost_type='Eopt'; % FIM based criterion:
                                    % 'Dopt' | 'Eopt' | 'Aopt' | 'Emod'
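
To make the two criteria concrete, the following minimal sketch evaluates them on a small Fisher information matrix. The matrix values are hypothetical and chosen only for illustration; the sketch uses base MATLAB functions and is not the AMIGO2 implementation of 'Dopt' or 'Eopt'.

```matlab
% Hypothetical 2x2 Fisher information matrix (symmetric, positive definite).
FIM = [4 1;
       1 3];

lambda = eig(FIM);     % eigenvalues of the FIM
Dopt   = det(FIM);     % D-criterion: determinant (to be maximized)
Eopt   = min(lambda);  % E-criterion: smallest eigenvalue (to be maximized)

% The inverse of the FIM bounds the parameter covariance (Cramer-Rao),
% so it also gives the correlation between the two parameter estimates.
C     = inv(FIM);
rho12 = C(1,2) / sqrt(C(1,1) * C(2,2));

fprintf('Dopt = %.4g, Eopt = %.4g, correlation = %.3f\n', Dopt, Eopt, rho12);
```

A strongly eccentric confidence ellipsoid shows up as a small minimum eigenvalue relative to the largest one, which is the situation the E-optimum design targets.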

3.3.2 Definition of the Optimization Problem

The user needs to define what is being designed: initial conditions,
stimuli condition, observation function, experiment duration, and
number and location of sampling times.
The inputs are shown below. In this particular example, we
assume that we design a single 24-h experiment, and the input
profile corresponds to a step-wise profile with five elements of fixed
duration.

%===========================================
% DEFINITION OF EXPERIMENT 1: TO BE DESIGNED
%===========================================
inputs.exps.exp_type{1}='od';
inputs.exps.n_obs{1}=1; % Number of observables
inputs.exps.obs_names{1}=char('CitAU'); % Name of observables
inputs.exps.obs{1}=char('CitAU=CitAUB'); % Observation function

% Initial conditions for the experiment
inputs.exps.exp_y0{1}=[6.5338 387.69 246.04 16944];

inputs.exps.t_f{1}=24*3600; % Experiment duration

inputs.exps.ts_type{1}='fixed'; % Type of sampling times:
                                % 'fixed' (default) | 'od' (to be designed)
inputs.exps.n_s{1}=244; % Number of sampling times, every 5 min

inputs.exps.u_type{1}='od'; % Type of stimulation:
                            % 'fixed' | 'od' (to be designed)
inputs.exps.u_interp{1}='stepf'; % Stimuli definition for experiment 1.
                                 % OPTIONS: 'sustained' | 'step' |
                                 % 'linear' (default) | 'pulse-up' | 'pulse-down'
inputs.exps.n_steps{1}=5; % Number of pulses _|-|_|-|_
inputs.exps.u_min{1}=0*ones(1,inputs.exps.n_steps{1});
inputs.exps.u_max{1}=1000*ones(1,inputs.exps.n_steps{1}); % Min/max u
inputs.exps.u_guess{1}=1000*rand(1,inputs.exps.n_steps{1}); % Guess u
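
As an illustration of what a step-wise profile with five elements of fixed duration looks like, the sketch below builds the switching times and evaluates a random guess on a time grid. It uses base MATLAB only; the variable names and the 10-min evaluation grid are assumptions made for this example and are not AMIGO2 fields or defaults.

```matlab
t_f     = 24 * 3600;    % experiment duration in seconds, as in the listing above
n_steps = 5;            % number of steps of fixed duration
u_min   = 0;            % lower bound on the input
u_max   = 1000;         % upper bound on the input

% Switching times of the piecewise-constant input, including 0 and t_f.
t_con    = linspace(0, t_f, n_steps + 1);
step_len = t_f / n_steps;              % duration of each step, in seconds

rng(0);                                % fixed seed so the sketch is repeatable
u = u_min + (u_max - u_min) * rand(1, n_steps);   % random initial guess

% Evaluate the profile on an arbitrary grid (every 10 min, an assumption).
t_grid = 0:600:t_f;
idx    = min(floor(t_grid / step_len) + 1, n_steps);  % step index per time point
u_of_t = u(idx);

stairs(t_con, [u, u(end)]);            % quick visual check of the profile
xlabel('time (s)'); ylabel('input u');
```

With fixed step durations, only the five step heights are decision variables for the optimizer, which keeps the design problem low-dimensional.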

3.3.3 Definition of the Numerical Methods

The evaluation of the Fisher information matrix requires the solution
of the model parametric sensitivities. The toolbox implements
several possibilities, including CVODES (for C code), a modification
of ode15s for sensitivity computation (for MATLAB models),
and a couple of finite-difference schemes, which may be used for C,
MATLAB, or blackbox models.
% SIMULATION
inputs.ivpsol.ivpsolver='cvodes'; % IVP solver: 'cvodes' (default, C) |
                                  % 'ode15s' (default, MATLAB, sbml) |
                                  % 'ode113' | 'ode45'
inputs.ivpsol.senssolver='cvodes'; % Sensitivities solver:
                                   % 'cvodes' (default, C) | 'sensmat' (matlab) |
                                   % finite differences: 'fdsens2' | 'fdsens5'
inputs.ivpsol.rtol=1.0D-7; % Solver integration tolerances
inputs.ivpsol.atol=1.0D-7;
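
The link between parametric sensitivities and the Fisher information matrix can be sketched on a toy model. The example below is an assumption-laden illustration, not the 'fdsens2' or 'fdsens5' code in AMIGO2: the model, parameter values, sampling times, and noise level are hypothetical, sensitivities are approximated by central finite differences, and the FIM is assembled for homoscedastic Gaussian noise.

```matlab
% Hypothetical output y(t; a, k) = a * (1 - exp(-k t)), with closed-form solution.
model = @(p, t) p(1) * (1 - exp(-p(2) * t));
p0    = [2.0, 0.5];        % nominal parameter values [a, k]
t     = (0:0.5:10)';       % sampling times
sigma = 0.1;               % standard deviation of the measurement noise

% Central finite-difference approximation of dy/dp_j at every sampling time.
n_p = numel(p0);
S   = zeros(numel(t), n_p);
for j = 1:n_p
    h     = 1e-6 * max(abs(p0(j)), 1);   % perturbation size
    dp    = zeros(size(p0));
    dp(j) = h;
    S(:, j) = (model(p0 + dp, t) - model(p0 - dp, t)) / (2 * h);
end

% FIM for independent Gaussian noise with constant variance sigma^2.
FIM = (S' * S) / sigma^2;

% Compare with the analytic sensitivities of this toy model.
S_exact = [1 - exp(-p0(2) * t), p0(1) * t .* exp(-p0(2) * t)];
max_err = max(abs(S(:) - S_exact(:)));
fprintf('max sensitivity error = %.2e, min eig(FIM) = %.3g\n', ...
        max_err, min(eig(FIM)));
```

For models without a closed-form solution, each call to model would be an ODE integration, which is why tight integration tolerances matter when finite differences are used.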

Fig. 6 Optimal input profiles for parameter estimation. (a) The optimal input to achieve maximum
information, that is, to maximize the determinant of the Fisher information matrix (Dopt). (b) The
optimal input to achieve minimum correlation, that is, to maximize the minimum eigenvalue of the Fisher
information matrix (Eopt). The profiles are completely different: for Dopt, the optimum corresponds to a
pulse-wise profile starting from full stimulation, whereas for Eopt it seems more convenient to use steps of
different magnitudes

As for the optimizers, the definition is similar to that in
Subheadings 3.2.3 and 3.2.4.

3.3.4 Running the Code

The first step is to preprocess the model to generate the necessary
scripts: C code for model simulation and the objective function.
After preprocessing, AMIGO_OED can be run.

>> IP_OED % Reads the inputs structure


>> AMIGO_Prep(inputs) % Runs preprocess
>> AMIGO_OED(inputs) % Solves the OED problem

As an initial guess, we considered a random input profile which
corresponds to a minimum eigenvalue (Eopt) of 5.29  1011 and
a volume (Dopt) of 9.23  109. After optimizing to maximize the
minimum eigenvalue, Eopt is 4.13  105, while the volume of the
corresponding Fisher information matrix is 1.14  1011. When
solving the problem to maximize the volume of the information,
Dopt is 2.54  1018, while the minimum eigenvalue would be
3.56  105. Figure 6 shows the optimal input profiles for both
cases.
The protocol can be complemented with new experiment
designs in an online optimal experimental design scheme such as the
one suggested in [21]. The underlying idea is to iteratively improve
the quality of the parameter estimates given previous results. For
this purpose, the iterative procedure could switch between
objectives and focus on the less accurately estimated parameters.

Acknowledgments

The authors acknowledge financial support from the Spanish
Ministry of Science, Innovation and Universities and the European
Union FEDER (project grant RTI2018-093744-B-C33). This
work was also supported by a Royal Society of Edinburgh-MoST
grant, EPSRC grants EP/R035350/1 and EP/S001921/1 to
Dr. Menolascina, and the EPSRC grant EP/P017134/1-CONDSYC
to Dr. Bandiera.

References
1. Jaqaman K, Danuser G (2006) Linking data to models: data regression. Nat Rev Mol Cell Biol 7(11):813–819
2. Balsa-Canto E, Alonso AA, Banga JR (2008) Computational procedures for optimal experimental design in biological systems. IET Syst Biol 2(4):163–172
3. Kreutz C, Timmer J (2009) Systems biology: experimental design. FEBS J 276(4):923–942
4. Walter E, Pronzato L (1997) Identification of parametric models from experimental data. Springer
5. Quarteroni A, Sacco R, Saleri F (2000) Numerical mathematics. Springer-Verlag, New York
6. Fletcher R (1987) Practical methods of optimization. Wiley, Chichester
7. Seber GAF, Wild CJ (1989) Nonlinear regression. Wiley series in probability and mathematical statistics. Wiley, New York
8. Schittkowski K (2002) Numerical data fitting in dynamical systems. Kluwer, Dordrecht
9. Fröhlich F, Kaltenbacher B, Theis FJ, Hasenauer J (2017) Scalable parameter estimation for genome-scale biochemical reaction networks. PLoS Comp Biol 13(1):e1005331
10. Balsa-Canto E, Banga JR, Alonso AA, Vassiliadis VS (2002) Restricted second order information for the solution of optimal control problems using control vector parameterization. J Proc Cont 12(2):243–255
11. Lin Y, Stadtherr MA (2006) Deterministic global optimization for parameter estimation of dynamic systems. Ind Eng Chem Res 45:8438–8448
12. Polisetty P, Voit E, Gatzke E (2006) Identification of metabolic system parameters using global optimization methods. Theor Biol Med Model 3:4
13. Balsa-Canto E, Vassiliadis VS, Banga JR (2005) Dynamic optimization of single- and multi-stage systems using a hybrid stochastic-deterministic method. Ind Eng Chem Res 44(5):1514–1523
14. Rodriguez-Fernandez M, Mendes P, Banga JR (2006) A hybrid approach for efficient and robust parameter estimation in biochemical pathways. Biosystems 83(2–3):248–265
15. Villaverde AF, Fröhlich F, Weindl D, Hasenauer J, Banga JR (2019) Benchmarking optimization methods for parameter estimation in large kinetic models. Bioinformatics 35(5):830–838
16. Egea JA, Balsa-Canto E, García M-SG, Banga JR (2009) Dynamic optimization of nonlinear processes with an enhanced scatter search method. Ind Eng Chem Res 48(9):4388–4401
17. Egea JA, Martí R, Banga JR (2010) An evolutionary method for complex-process optimization. Comp Oper Res 37(2):315–324
18. Balsa-Canto E, Henriques D, Gabor A, Banga JR (2016) AMIGO2, a toolbox for dynamic modeling, optimization and control in systems biology. Bioinformatics 32(21):3357–3359
19. Vassiliadis VS, Sargent RWH, Pantelides CC (1994) Solution of a class of multi-stage dynamic optimization problems: 1, problems without path constraints, 2, problems with path constraints. Ind Eng Chem Res 33(2111–2122):2123–2133
20. Gnugge R, Dharmarajan L, Lang M, Stelling J (2016) An orthogonal permease–inducer–repressor feedback loop shows bistability. ACS Synth Biol 5:1098–1107
21. Bandiera L, Hou Z, Kothamachu V, Balsa-Canto E, Swain P, Menolascina F (2018) On-line optimal input design increases the efficiency and accuracy of the modelling of an inducible synthetic promoter. Processes 6(9):148
22. Storn R, Price K (1997) Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11:341–359

Chapter 12

A Cyber-Physical Platform for Model Calibration


Lucia Bandiera, David Gomez-Cabeza, Eva Balsa-Canto, and
Filippo Menolascina

Abstract
Synthetic biology has so far made limited use of mathematical models, mostly because their inference has
been traditionally perceived as expensive and/or difficult. We have recently demonstrated how in silico
simulations and in vitro/vivo experiments can be integrated to develop a cyber-physical platform that
automates model calibration and leads to saving 60–80% of the effort. In this book chapter, we illustrate the
protocol used to attain such results. By providing a comprehensive list of steps and pointing the reader to
the code we use to operate our platform, we aim at providing synthetic biologists with an additional tool to
accelerate the pace at which the field progresses toward applications.

Key words Synthetic biology, Mathematical modeling, System identification, Optimal experimental
design, Microfluidics

1 Introduction

Despite a booming community and some notable successes of
synthetic biology, synthesizing new genetic circuits remains
extremely time-consuming. This is mostly due to the fact that
their building blocks, so-called parts, are rarely properly
characterized. Mathematical models are uniquely suited to address this
problem. However, despite being an engineering discipline,
Synthetic Biology has so far made limited use of them, mostly because
their inference has been traditionally perceived as expensive and/or
difficult.
Our group recently proposed [1] to combine Optimal Experi-
mental Design (OED) and microscopy/microfluidics in a cyber-
physical platform (Fig. 1a) that automates model calibration, i.e.,
the identification of parameters in a model.
Given a part of interest and an initial model for it, this system
iteratively identifies the most informative experiment to refine
parameter estimates and runs such experiments (off-line). In the
on-line configuration, the system periodically uses the newly
acquired experimental data to update the model and design an
optimal experiment for the new model, iterating until robust
estimates are reached. To achieve automatic model inference, the
platform we developed seamlessly integrates microfluidics,
fluorescence videomicroscopy, real-time optimization, and fluidic
actuation (Fig. 1a). In a computational study [1] we determined that,
compared to standard experiments used in systems and synthetic
biology, this platform allows the error of parametric estimates to be
reduced by 60% when used in the off-line configuration. The
on-line mode (Fig. 2) increases this figure to over 80%.

Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229,
https://doi.org/10.1007/978-1-0716-1032-9_12, © Springer Science+Business Media, LLC, part of Springer Nature 2021

Fig. 1 Cyber-physical platform and test case used to illustrate its implementation. (a) In the cyber-physical
platform, the computer where the OED algorithm is implemented quantifies gene expression and uses
Parameter Estimation/OED to stimulate the cells with inputs that maximize the amount of information
extracted per experiment. Such input is translated into a stimulus for the cells in the microfluidic device
using a Hydrostatic Pressure Modulation System (HPMS, see [7]). A microscope is used to observe cells and
close the OED loop. (b) An inducible promoter in engineered yeast cells, presented in [2], is considered in the
following. (c) Ordinary differential equation model used to mathematically formalize the behavior of the
inducible promoter

In the following, we consider the problem of identifying the
model of an inducible promoter in the yeast S. cerevisiae as an example
of the application of our platform. We readapted the model from
the original paper [2] and, as a first step, we run structural
identifiability, sensitivity, and practical identifiability analyses.
These analyses aim at determining the parameters the Optimal
Experimental Design should focus on. In fact, there is no reason for
optimizing the design of experiments for parameters that are not
theoretically (structurally) identifiable, parameters that do not
sufficiently affect the dynamical behavior of the model or that, in
practical terms, have little influence on the output. This step is
also crucial because, by allowing us to focus only on identifiable
parameters, it reduces the dimensionality of the Fisher Information
Matrix and the computational complexity of the Optimal Experimental
Design procedure. These steps usually take place well before the
experiments themselves.

Fig. 2 On-line vs off-line OED. In off-line OED (a) the input (red signal) is optimized before the beginning of the
experiment and then applied during the experiment while the output (green) is recorded. The experiment stops
at τ_H = τ_S, when the data are gathered for a potential new iteration. At this point, the off-line and on-line
modes differ: in on-line OED (b), at τ_S < τ_H a new parameter estimation routine is run on the input/output data
acquired up until then (0 < t < τ_S). The resulting model ℳ(p_1) is used to design a new optimal input u*_2 that
maximizes the information content of subexperiment 2 when it is administered to the cells

The day before the experiment, we fabricate the microfluidic
devices and inoculate the overnight (O/N) cultures. On the day of
the experiment, we start by mounting the microfluidic device on
the microscope, connecting the fluidic lines, and loading the cells
that carry the part of interest. Then, the experiment management
algorithm is started: this software triggers the acquisition of images
from the microscope, segments/tracks cells, quantifies the expres-
sion levels of the gene of interest and records the input/output
data. The experiment either stops after one iteration (off-line
OED) or the new data are used to (a) update the model and
(b) design a new input to refine it (Fig. 2). The procedure is

repeated until parametric convergence or external factors bring the
experiment to a halt.
In the following, we describe all wet- and dry-lab procedures to
automate model inference. The reader is provided with pointers to
additional protocols and code where necessary to maximize the
applicability of the outlined procedure.

2 Materials

Prepare and store all reagents at room temperature, unless
otherwise stated. Comply with laboratory health and safety and waste
disposal regulations.

2.1 Computational Tools

1. Structural identifiability analysis: MATLAB toolbox
STRIKE-GOLDD [3].
2. Practical identifiability and sensitivity analyses, parameter
estimation, optimal experimental design: MATLAB toolbox
AMIGO2 [4].
3. Image processing: cell segmentation and tracking are
performed using the ImageJ plugins U-Net [5] and Lineage
Mapper [6].

2.2 Microfluidic Device Fabrication

1. Soft lithography: wafer of the MFD005a microfluidic device
[7], disposable plastic cup and spatula, silicone elastomer kit
(Sylgard 184), desiccator, vacuum pump, and oven.
2. Polydimethylsiloxane (PDMS) processing: razor blades, sterile
disposable scalpel, 1 mm disposable biopsy puncher, magic
tape, Kimwipes, 5-mL disposable syringe, 25G needle, DI
water, fine bore polythene tubing, high-precision cover slips
24 × 60 mm No. 1.5H, light source, UV cleaner, and oven.

2.3 Microfluidic Experimental Setup

1. Supplement 60 mL of synthetic complete (SC) media with the
appropriate sugar, e.g., glucose: 20% w/V solution in water.
Weigh 20 g of glucose and transfer it to a 100-mL graduated
cylinder containing about 60 mL of water. Make it up to
100 mL with water and filter-sterilize through a 0.22 μm filter.
2. Prepare a chemical inducer solution, e.g., isopropyl
β-D-1-thiogalactopyranoside (IPTG): 0.1 M solution in double-distilled
water. Weigh 0.238 g of IPTG powder in a weighing tray
and transfer to a 25-mL beaker containing 8 mL of water. Make
it up to 10 mL with water and filter-sterilize through a 0.22 μm
filter. Prepare 1 mL aliquots and store them at −20 °C.
3. Prepare a fluorescent dye to track the inducer concentration,
e.g., sulforhodamine B: 0.1% w/V solution in water. Weigh
10 mg of sulforhodamine B powder in a 15-mL Falcon tube and
resuspend in 8 mL of water. Make it up to 10 mL with water
and filter-sterilize through a 0.22 μm filter. Store in the dark.
4. Colonies of the S. cerevisiae strain under investigation.
5. Microfluidic devices.
6. Seven 25G needles, six 50-mL and one 5-mL sterile, disposable
syringes.
7. Fine bore polythene tubing (SMME800/100/120), electric
tape, Kimwipes.
8. Hydrostatic pressure linear actuators system [7].
9. Nikon Eclipse TI fluorescence microscope.
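
The reagent quantities listed above follow from simple concentration arithmetic, sketched here so that the recipe can be rescaled to other volumes. The molar mass of IPTG (238.30 g/mol) is a reference value not stated in the text, and the helper name wv_mass_g is hypothetical.

```matlab
% Mass of IPTG for a 0.1 M stock in 10 mL: m = C * V * MW.
MW_IPTG   = 238.30;                 % g/mol, molar mass of IPTG (reference value)
conc_M    = 0.1;                    % target concentration, mol/L
vol_L     = 10e-3;                  % final volume, L
mass_IPTG = conc_M * vol_L * MW_IPTG;   % grams

% Mass for a percent weight/volume solution: x % w/V means x g per 100 mL.
wv_mass_g = @(percent, vol_mL) percent / 100 * vol_mL;   % hypothetical helper

mass_glucose = wv_mass_g(20, 100);  % 20% w/V glucose in 100 mL, grams
mass_dye     = wv_mass_g(0.1, 10);  % 0.1% w/V sulforhodamine B in 10 mL, grams

fprintf('IPTG: %.3f g, glucose: %.1f g, dye: %.0f mg\n', ...
        mass_IPTG, mass_glucose, mass_dye * 1000);
```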

3 Methods

3.1 Structural Identifiability

1. In Matlab, create a .mat file with information on your model:
list the symbolic variables (syms) and specify the model states
(x), the output variables (h), the unknown parameters (p), the
dynamic equations (f), the vector of initial conditions (ics), the
known initial conditions (known_ics), and the inputs (u) (see
Note 1).
2. Open the file options.m and specify as modelname the string
given to the generated .mat file. If it does not already exist, create a
directory called results where the Structural Identifiability
results will be saved. If the complexity of the model requires
decomposition for the analysis to be run, additionally specify
the directory path for the MEIGO software.
3. Select the desired identifiability options for the computation of
the generalized observability-identifiability matrix (see Note 2).
For the inducible promoter model example, the rank of the
matrix was computed symbolically (value set to 0); the
states were not replaced with known initial conditions
(value set to 0); identifiability of initial conditions and input
observability were checked (values set to 1); the search for
identifiable combinations, the checks for unidentifiability, and
model decomposition were not selected (values set to 0);
and the maximum time allowed for computing one Lie derivative
was set to 1000 s. To resolve structural identifiability issues,
the basal transcriptional rate α was fixed (see Note 3). Because
its value is fixed, this parameter is defined in the vector of
previously identified parameters (prev_ident_pars).
4. Start the Structural Identifiability analysis (see Note 4) by
running the script called STRIKE_GOLDD.m.
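
As a hedged illustration of step 1, the fragment below writes a model file for a toy one-state system using the variable names listed above (x, h, p, f, ics, known_ics, u). The model, the parameter names k_prod and k_deg, and the file name toy_model are hypothetical and are not the inducible promoter model; the convention that known_ics holds one 0/1 flag per state is an assumption to be checked against the STRIKE-GOLDD documentation. The fragment requires the Symbolic Math Toolbox.

```matlab
% Toy model (assumption): dx1/dt = k_prod*u1 - k_deg*x1, with x1 measured.
syms x1 k_prod k_deg u1

x = [x1];                       % model states
h = [x1];                       % output variables
u = [u1];                       % inputs
p = [k_prod; k_deg];            % unknown parameters
f = [k_prod*u1 - k_deg*x1];     % dynamic equations, dx/dt = f(x, u, p)

ics       = [];                 % vector of initial conditions (left empty here)
known_ics = [0];                % assumed flag per state: 0 = unknown, 1 = known

% Save everything to the .mat file whose name is later given as modelname.
save('toy_model', 'x', 'h', 'u', 'p', 'f', 'ics', 'known_ics');
```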

3.2 Sensitivity Analysis

1. In AMIGO2, define the model to be analyzed in the Matlab
structure inputs.model by setting the number of states (.n_st),
parameters (.n_par), and inputs (.n_stimulus) as integers and
specify their names (.st_names, .par_names, and
.stimulus_names). Add the differential model equations (.eqns) as
character vectors. Nominal parameter values can be defined in
inputs.model.par. In addition, indicate the directory where
the results will be saved in .pathd.results_folder and
.pathd.short_name.
2. Define the experimental conditions that will be used for the
sensitivity analysis. Multiple experimental conditions can be
specified in a cell array. These will be defined in the Matlab
structure inputs.exps, containing the number of experiments
(.n_exp), the model initial conditions (.exp_y0), the experiment
duration (.t_f), the name (.obs_names) and number of
observables (.n_obs) as well as their corresponding state variable in
the model (.obs). The experiment definition is complemented
by the type of stimuli applied (.u_interp), the input switching
times (.t_con), the values of the input (.u), and the number
(.n_s) and location (.t_s) of sampling times. In the inducible
promoter model, we hypothesize the use of three random step
profile experiments of 24 h. The piece-wise constant inputs are
composed of 180-min steps, while the inducer concentration is
randomly sampled in the ranges 0–10 μM, 0–30 μM, and
0–100 μM.
3. Provide information about the model parameters in the Matlab
structure inputs.PEsol. Define a character list of the parameters
to be considered in the analysis (.id_global_theta), their upper
(.global_theta_max) and lower (.global_theta_min) bounds,
and an initial estimate of their values (.global_theta_guess).
For a more exhaustive analysis, we selected 30 different initial
guesses for the parameter vector sampled through Latin-
hypercube sampling of the parameter boundaries on a logarith-
mic scale.
4. If desired, select the initial value problem (ivp) solver in the
Matlab structure inputs.ivpsol.ivpsolver and the sensitivities solver
in inputs.ivpsol.senssolver, as well as the absolute
(inputs.ivpsol.atol) and relative (inputs.ivpsol.rtol) tolerances. We
selected cvodes as both the ivp and sensitivities solver, with both
absolute and relative tolerances set to 10^-8.
5. Specify the number of samples, taken within the parameter
bounds, that will be used for the analysis
(inputs.rank.gr_samples). The default value, also used in our
example, is 10,000 samples.
6. To include all the necessary functions and tools into the Matlab
working directory, run the AMIGO2 script AMIGO_Startup.m
or manually add the AMIGO2 folder and subfolders to
the path.

Fig. 3 Results of global sensitivity analysis on the inducible promoter model considering a multiexperiment
scheme composed of three random dynamic inputs. (a) Importance factors computed by AMIGO2 to quantify
global sensitivity considering a random initial guess for the parameter vector. Note that, while the output
sensitivity values depend on parameter estimates, the ranking of the kinetic rates remains conserved across
multiple runs. (b) Box plots, overlaid with swarmplots, of the importance factor δ_p^msqr, computed for 30 random
initial guesses of the parameter vector, for each parameter of the model (p). Decreasing values of the
importance factor (from left to right) relate to a smaller sensitivity of the model output to the parameter

7. Run the AMIGO2 function AMIGO_Prep(), with the Matlab
structure inputs as an argument, for the preprocessing step.
This compiles and generates mex functions for the required
tasks.
8. Run the function AMIGO_GRank(), with the Matlab structure
inputs as an argument, to perform the Global Sensitivity analy-
sis (see Note 5) and obtain the list of model parameters, ranked
according to their decreasing ability to affect the output of the
model (Fig. 3).
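
The random step inputs of step 2 and the log-scale Latin hypercube sampling of step 3 can be sketched in base MATLAB as follows. This is an illustrative sketch under stated assumptions, not the code used by the authors: the parameter bounds theta_min and theta_max are hypothetical, and the Latin hypercube is built by hand so that no additional toolbox is needed.

```matlab
rng(1);                               % fixed seed so the sketch is repeatable

% --- Three random step-profile experiments of 24 h with 180-min steps ---
t_f_min  = 24 * 60;                   % experiment duration, min
step_min = 180;                       % step duration, min
t_con    = 0:step_min:t_f_min;        % input switching times, min
n_steps  = numel(t_con) - 1;          % 8 steps per experiment
u_upper  = [10 30 100];               % upper inducer bound per experiment, uM

u = cell(1, 3);
for iexp = 1:3
    u{iexp} = u_upper(iexp) * rand(1, n_steps);   % uniform in [0, upper bound]
end

% --- 30 initial guesses by Latin hypercube sampling on a log10 scale ---
theta_min = [1e-3 1e-2 1e-1];         % hypothetical lower bounds
theta_max = [1e1  1e2  1e3];          % hypothetical upper bounds
n_guess   = 30;
n_par     = numel(theta_min);
lo = log10(theta_min);
hi = log10(theta_max);

U       = zeros(n_guess, n_par);      % samples on the unit hypercube
guesses = zeros(n_guess, n_par);      % samples mapped to parameter space
for j = 1:n_par
    % One point in each of the n_guess equal-width strata, in random order.
    U(:, j)       = (randperm(n_guess)' - rand(n_guess, 1)) / n_guess;
    guesses(:, j) = 10 .^ (lo(j) + U(:, j) * (hi(j) - lo(j)));
end
```

Each row of guesses could then serve as one initial estimate of the parameter vector, with u{iexp} and t_con defining the input of experiment iexp.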

3.3 Practical Identifiability

1. Follow steps 1 and 2 from Subheading 3.2 to create the Matlab
structure inputs.exps, which will additionally be populated
with the experimental data. To define the type of data used,
specify real or pseudo as a string in .data_type. Then, introduce
the data (.exp_data) and their associated error (.error_data).
2. Follow steps 3 and 4 from Subheading 3.2.

Fig. 4 Results of the practical identifiability analysis. Example of joint plot of the
most (Kr) and least (kf) identifiable parameters in the inducible promoter model,
as selected from a comparison of the coefficient of variation of their estimates.
The marginal distributions were computed from the 95% confidence interval on
parameter estimates inferred on 600 in silico realizations of the experimental
data from a user-specified initial guess. The bivariate plot conveys information
about the correlation between the parameters (e.g., weak correlation in this
example)

3. If desired, select the cost function for the Parameter Estimation
problem in the Matlab structure inputs.PEsol.PEcost_type.
The cost function can be lsq (least squares, used in our
example), llk (log-likelihood), or user-defined. From the different
options in AMIGO2, select the desired type of weights from
inputs.PEsol.lsq_type or inputs.PEsol.llk_type. In our
example, this has been set to .lsq_type = 'Q_expmax'.
4. Define the optimization algorithm to be used in the Matlab
structure inputs.nlpsol.nlpsolver as well as the different
hyperparameters associated with it (see Note 6). In our example, the
selection has been .nlpsolver = 'eSS', and in the Matlab
structure inputs.nlpsol.eSS, .maxeval = 10,000, .maxtime = 100,
.local.solver = 'lsqnonlin', and .local.finish = 'lsqnonlin'.
5. Select the number of iterations where noisy simulated data will
be generated from the experimental profiles and the parameter
estimation problem will be solved (see Note 7).
6. Follow steps 6 and 7 from Subheading 3.2.
7. Run the function AMIGO_RIdent(), with the Matlab struc-
ture inputs as an argument, to perform the robust Practical
Identifiability analysis (see Note 8). An example of practical
identifiability results is shown in Fig. 4.
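
The idea behind the analysis in steps 5 and 7, repeatedly fitting noisy in silico data and inspecting the spread of the estimates, can be sketched on a toy problem. The example below is not AMIGO_RIdent: the model, parameter values, noise level, and the use of fminsearch are assumptions made for illustration, and only 50 realizations are used to keep the run short.

```matlab
rng(2);                                  % fixed seed so the sketch is repeatable

model  = @(p, t) p(1) * (1 - exp(-p(2) * t));   % hypothetical model y(t; a, k)
p_true = [2.0, 0.5];                     % parameters used to generate pseudo-data
t      = (0:0.5:10)';                    % sampling times
sigma  = 0.05;                           % standard deviation of the added noise
n_iter = 50;                             % number of noisy realizations

cost = @(p, y) sum((y - model(p, t)).^2);   % least-squares cost function
opts = optimset('Display', 'off');
P    = zeros(n_iter, numel(p_true));     % one row of estimates per realization

for i = 1:n_iter
    y       = model(p_true, t) + sigma * randn(size(t));   % noisy pseudo-data
    P(i, :) = fminsearch(@(p) cost(p, y), p_true, opts);   % refit the model
end

cv  = std(P) ./ abs(mean(P));            % coefficient of variation per parameter
rho = corrcoef(P);                       % correlation between the estimates

fprintf('CV(a) = %.3f, CV(k) = %.3f, corr(a,k) = %.2f\n', ...
        cv(1), cv(2), rho(1, 2));
```

A parameter with a large coefficient of variation, or a pair with a correlation close to ±1, would be flagged as poorly identifiable in practice.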

3.4 Microfluidic Experiments

3.4.1 Microfluidic Device Fabrication

To enhance the efficiency of microfluidic device fabrication,
perform all the steps described below on the same day. Soft lithography
(steps 1–4) should be executed in a clean or semiclean room to
prevent dust and debris accumulation in the devices. Perform all
procedures using nitrile gloves.
1. Mixing the PDMS. A PDMS prepolymer is prepared by mixing
the curing agent and the silicone monomer in a 1:9 ratio. To
compute the mass of the prepolymer required to obtain a
0.5 cm thick PDMS mold, measure the diameter d (in cm) of
the patterned region of the wafer. As the density of the
prepolymer is ρ = 1.1 g/cm^3, the mass of the mixture is
(11/80)πd^2 g. In a plastic cup, weigh (99/800)πd^2 g of silicone
elastomer and (11/800)πd^2 g of silicone curing agent. Stir the
mixture with a plastic spatula for approximately 3 min or until it
becomes white and foamy.
2. Degassing the PDMS. To remove the bubbles generated when
mixing, place the plastic cup into a desiccator. With the valve in
the open position, apply vacuum for 10–15 min: the decrease in
pressure will cause the bubbles to expand and migrate to the
PDMS surface. Close the valve, turn off the pump, and quickly
release the vacuum to pop the bubbles. Repeat this cycle until
the air bubbles are completely removed.
3. Pouring the PDMS. Slowly pour the PDMS on the master,
selecting a point without features. As this step will generate
new bubbles, cover the Pyrex petri dish containing the wafer
and repeat step 2 until no bubbles are visible.
4. Curing. While keeping the wafer as flat as possible, place it in
the oven at 60 °C for 3 h.
5. Removing the PDMS mold. Remove the wafer from the oven
and let it cool down to room temperature. Using a sterile
scalpel, excise the PDMS in a circle comfortably containing
the patterned region of the wafer. To prevent damaging the
wafer, insert the scalpel in the PDMS at the minimum depth
allowing air to appear at the cut site. Keeping at a distance from
the patterned area, slowly lift up the PDMS layer from one side
and allow it to peel off from the wafer. Cover the petri dish
containing the wafer and store it in a safe place.
6. Cutting and punching the PDMS devices. To aid visualization
of the features, cover both surfaces of the PDMS mold with
overlapping stripes of magic tape. Using a razor blade, cut the
patterned region of PDMS along the external perimeter.
Remove the tape in a single movement and place the PDMS
with feature-side up. Using a light source to enhance contrast,
place the biopsy puncher at the port location and orient it
perpendicularly to the PDMS. Apply a downward pressure
until the puncher breaks through the PDMS. Use the
puncher plunger to get rid of the PDMS core, lift the PDMS
layer, and carefully pull out the puncher from the hole while
rotating the puncher in a counterclockwise direction. Following
the steps above, punch all ports in all devices. Cover the
PDMS with magic tape again and, using a razor blade, isolate
single microfluidic chips following the grids.
7. Cleaning device ports. Insert a 25G needle in a short length of
tubing and connect the needle adapter to a 5-mL disposable
syringe filled with double distilled water. Insert the free extrem-
ity of the tubing in a port and apply pressure. Water should flow
through the port, removing PDMS debris. Repeat the outlined
procedure on all ports on both sides of the devices.
8. Bonding chips to coverslips. Warm up the plasma cleaner. After
15 min, run two cycles of vacuum (30 s), plasma (45 s), and
pressure release. The plasma cleaner is ready for use when a
bright pink plasma is visible in the chamber. Remove dust from
the device by covering it with magic tape and insert it in the
plasma cleaner, feature-side up. Using Kimwipes, gently wipe
both sides of a high-precision coverslip until it is completely free
of dust and insert it in the chamber of the plasma cleaner. Turn
on the vacuum (30 s) and apply the plasma (45 s). Turn off the
vacuum and gently release the pressure. Remove the device and
the coverslip from the plasma cleaner and quickly bond them
by letting the device fall on the coverslip from a 45° angle.
Avoid applying downward pressure, which could cause the
features to collapse. Transfer the bonded
chip to a 60 °C oven for 15 min. Repeat the above steps for
each chip. Place the devices in a petri dish and store them at
room temperature.

3.4.2 Overnight Culture

1. On the day before the experiment, under the fume hood, pick
an isolated, average-size colony from an SC plate supplemented
with the appropriate sugar (e.g., 2% w/v glucose) and inoculate
it in a 20-mL test tube containing 5 mL of SC media
supplemented with sugar (e.g., 2% w/v glucose) and the highest
concentration of the chemical inducer to be used (e.g.,
1000 μM IPTG).
2. Grow the cell culture overnight in a shaking incubator at 30 °C,
230 rpm.
3. On the day of the experiment, measure the Optical Density
(OD600) of the cell culture. Dilute the cell culture to an
OD600 ~ 0.1 in fresh media having the same composition as
the one used in the overnight culture.
4. Grow in a shaking incubator at 30 °C, 230 rpm for 2–3 h or
until the cell culture reaches the middle exponential phase
(OD600 ∈ [0.3, 0.5]). In the meantime, proceed with the
following steps.
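
The dilution in step 3 can be sketched as a simple C1·V1 = C2·V2 calculation. The snippet below is a minimal Python illustration, not part of the original protocol: the function name, the 5 mL final volume, and the measured OD600 of 4.0 are assumptions chosen for the example, and the calculation assumes that OD600 scales linearly with cell density.

```python
def dilution_volumes(od_measured, od_target=0.1, final_volume_ml=5.0):
    """Return (culture_ml, fresh_media_ml) for a C1*V1 = C2*V2 dilution.

    Assumes OD600 is proportional to cell density (hypothetical helper).
    """
    if od_measured <= od_target:
        raise ValueError("Culture is already at or below the target OD600.")
    culture_ml = od_target * final_volume_ml / od_measured
    return culture_ml, final_volume_ml - culture_ml


# Example with an assumed overnight OD600 of 4.0 and a 5 mL final volume.
culture_ml, media_ml = dilution_volumes(od_measured=4.0)
print(f"{culture_ml:.3f} mL culture + {media_ml:.3f} mL fresh media")
```

The fresh media used for the dilution should have the same composition as the overnight culture media, as stated in step 3.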

3.4.3 Syringe Preparation

1. Prepare seven lengths of tubing (4 × 70 cm, 2 × 150 cm,
1 × 20 cm), seven 25G needles, six 50-mL syringes, and one 5-mL syringe.
2. Under the fume hood, connect a 25G needle to a free end of
each length of tubing. Remove the plunger from the syringes
and connect a needle adapter to each of them (the shortest
length of tubing is for the 5-mL syringe).
3. Fill the 5-mL syringe, to be used for device wetting, with SC
media supplemented with sugar.
4. Using paper tape, label the 50-mL syringes with sequential
numbers 1–6 (1 and 2 are the input syringes and are connected
to the longer tubing). The volume and composition of the
solutions to add to each syringe are specified in Table 1. For
each syringe, using a P1000 pipette, make contact with the
inside of the Luer stub adapter, and load 1 mL of the appropriate
solution. This reduces the formation of bubbles that would
prevent the flow of solutions in the tubing. Using a 10-mL
stripette, load the residual volume by letting it run on the
syringe wall before reaching the bottom.
5. Raise the height of the syringe relative to the free end of the
tubing until a droplet appears at the end of the latter. Examine
the line to check for absence of air bubbles.
6. Cover the barrel flange with Parafilm and, using the tip of
scissors, create a hole in it.
7. In the microscope room, place syringes 1 and 2 on the linear
actuators at a height of 50 cm from the microscope stage.
Attach syringes 3, 4, and 5 to the microscope chamber, at
heights of 18, 23, and 23 cm above the stage, respectively (see Note 9).
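
Because flow in the device is driven by the height of each reservoir, it can help to translate the heights in step 7 into approximate hydrostatic pressures (P = ρ·g·h). The following Python sketch is an illustration only: it assumes a water-like media density of 1000 kg/m³, ignores the resistance of the tubing, and assumes that the heights 18, 23, and 23 cm refer to syringes 3, 4, and 5 in that order.

```python
RHO_MEDIA = 1000.0  # kg/m^3, assumed water-like density of SC media
G = 9.81            # m/s^2, standard gravity


def hydrostatic_pressure_pa(height_cm):
    """Hydrostatic pressure (Pa) of a media column of the given height."""
    return RHO_MEDIA * G * height_cm / 100.0


# Syringe identifier -> height above the microscope stage (cm), from step 7.
heights_cm = {1: 50, 2: 50, 3: 18, 4: 23, 5: 23}
pressures_pa = {sid: hydrostatic_pressure_pa(h) for sid, h in heights_cm.items()}

for sid, p in sorted(pressures_pa.items()):
    print(f"syringe {sid}: {heights_cm[sid]} cm -> {p:.1f} Pa")
```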

3.4.4 Wetting the Microfluidic Chip

1. Secure the microfluidic device to the lid of a petri dish, acting as
a chip holder, on one side of the cover slip using paper tape.
Examine the quality of the device features at 10× magnification
to verify correct punching of the ports and absence of debris
obstructing the channels.
2. Apply pressure to the 5-mL syringe containing media until the
short length of tubing is free from air bubbles and droplets
appear at its free end.
3. Insert the free end of the tubing in port 5 and apply a gentle
pressure to enable media flow while preventing the chip from
being lifted off the cover slip. When a media droplet appears at
port 4, detach the tubing from port 5 by applying a counter
pressure and connect it to port 4 (see Note 10). Repeat the
above procedure for ports 3, 1, and 2.
4. Under the microscope, verify the absence of air bubbles in the
chip. If air bubbles are present, repeat the procedure above.
5. Using kimwipes, remove excess media from the top of the
device.

Table 1
Content of the 50-mL syringes for the microfluidic experiment

Syringe identifier | Syringe content
1 | SC media complemented with the appropriate sugar, inducer, and fluorescent dye for a total volume of 10 mL (e.g., 8.73 mL SC media, 1 mL 20% glucose, 100 μL IPTG 0.1 M, and 170 μL Sulforhodamine B 1 mM)
2 | SC media complemented with the appropriate sugar for a total volume of 10 mL
3 | 10 mL of SC media
4 | 10 mL of SC media
5 | 5 mL of SC media
6 | 5 mL of cell culture
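
As a consistency check on the example recipe for syringe 1, the final concentrations can be computed from the stock concentrations and volumes listed in Table 1. The Python sketch below is illustrative; the helper name is hypothetical and the stock values are those quoted in the table.

```python
def final_concentration(stock_conc, stock_volume_ml, total_volume_ml):
    """Concentration after diluting a stock into the total volume."""
    return stock_conc * stock_volume_ml / total_volume_ml


# Volumes (mL) of the example recipe for syringe 1 in Table 1.
components_ml = {
    "SC media": 8.73,
    "20% glucose": 1.0,
    "0.1 M IPTG": 0.100,
    "1 mM Sulforhodamine B": 0.170,
}
total_ml = sum(components_ml.values())

glucose_percent = final_concentration(20.0, components_ml["20% glucose"], total_ml)
iptg_um = final_concentration(0.1e6, components_ml["0.1 M IPTG"], total_ml)  # 0.1 M = 1e5 uM
dye_um = final_concentration(1000.0, components_ml["1 mM Sulforhodamine B"], total_ml)  # 1 mM = 1e3 uM

print(f"total {total_ml:.2f} mL: glucose {glucose_percent:.2f}%, "
      f"IPTG {iptg_um:.0f} uM, dye {dye_um:.1f} uM")
```

The result (2% glucose and 1000 μM IPTG in 10 mL) matches the sugar and inducer concentrations used for the overnight culture in Subheading 3.4.2.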

3.4.5 Connecting Syringes to the Chip

1. Remove the device from the petri dish and secure it to the
sample holder using electrical tape. Clean the lower side of the
cover slip using 70% EtOH and kimwipes.
2. At 10× magnification, re-examine the chip for absence of air
bubbles and debris in the ports, the channels, and the chamber.
Using the wetting syringe, cover the ports of the chip with
media.
3. Check for the absence of air bubbles in the lines and, operating
at the height of the stage, connect syringe 5 to its port. Proceed
by connecting syringes 4, 1, 2, and 3.
4. Verify that no air bubbles were introduced in the microfluidic
device during the procedure and secure the tubing to the stage
using paper tape.

3.4.6 Calibration of the Microfluidic Device

1. Select the microscope channels to be used (DIC and the channel
for the fluorescent dye used to track the inducing media,
e.g., sulforhodamine) and set the field of view at the DAW
junction of the microfluidic device (Fig. 5).
2. Specify the minimum (hmin) and maximum (hMax) heights of
the actuators, which generate an approximate mixing ratio of
0% (i.e., absence of fluorescent signal in the channel feeding the
chamber) and 100% (i.e., fluorescent signal detected across the
entire width of the main channel). From these, the average
height (hmean, mixing ratio of approximately 50%) can be
retrieved.
3. To enable an accurate, a posteriori estimate of the 0% and 100%
mixing ratios, heights that correspond to pressures that will
slightly overshoot the central channel of the DAW junction
should be considered. To this aim, we specify a range for the
actuator's height equal to [hmean − 0.6 × (hMax − hmin),
hmean + 0.6 × (hMax − hmin)].

Fig. 5 Architecture of the MFD005a microfluidic device [7]. The microfluidic
chamber, the ports, the dial-a-wave (DAW) junction, and the mixing channel are
highlighted. Black arrows depict the direction of media flow in running conditions

4. Program the actuators to perform three triangle input waves,
centred on a mixing ratio of 50% and with period T = 6 min.
Specify the amplitude of the actuator steps and acquire images
from the two selected channels at each step. In our case, the
step length has been set to 2 s (see Notes 11 and 12).
5. Save all fluorescence and DIC images as matrices in a Matlab
cell array structure.
6. From the fluorescence images, extract the region of interest
(mixing channel) (see Note 13).
7. Applying an edge detection function (e.g., edge() in Matlab) to
one DIC image, detect the boundaries of the mixing channel
and make a binary mask with unitary entries for the pixels
inside the channel. Next, perform an element-wise multiplica-
tion between the mask and each fluorescence image. This will
allow the selection of the pixels inside the channel required for
the computation of the mixing ratio.
8. Extract all the pixel values within the region of interest and use
a two-component Gaussian Mixture Model to identify the
distributions corresponding to background and sulforhodamine
fluorescence (see Note 14). Then, define a threshold to
discriminate them as a multiple of the standard deviation associated
with one of the distributions, depending on the level of
noise of the fluorescent images (e.g., 3 standard deviations away
from the mean to recover 99.7% of the data) (see Note 15).
9. Compute the mixing ratio as the number of pixels above the
threshold divided by the total number of nonzero pixels in the
selected region of interest.
10. Use a curve-fitting algorithm (least squares curve fit in our
case) to map the input pressure of one of the actuators to the
computed output mixing ratio. This provides the refined
estimates of hmin, hMax, and hmean to be used [7] (see Note 16).
11. Save the results and verify the calibration accuracy. Visually, the
accuracy of calibration can be assessed through the input–
output relation plot (data and fit), the error percentage of the
fit versus the mixing ratio or a comparison between the fluo-
rescent images and a binary mask generated using the selected
threshold value.
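
Steps 7–10 can be sketched as follows. This is a minimal Python illustration on synthetic one-row images, not the Matlab code used by the authors: the background mean and standard deviation are assumed to be already known (e.g., from the Gaussian Mixture Model of step 8), a straight line is assumed for the input–output relation purely for illustration, and all function names and numerical values are hypothetical.

```python
import statistics


def mixing_ratio(fluo_image, mask, threshold):
    """Fraction of nonzero masked pixels whose intensity exceeds the threshold."""
    masked = [p * m for f_row, m_row in zip(fluo_image, mask)
              for p, m in zip(f_row, m_row)]      # element-wise product
    inside = [p for p in masked if p != 0]        # pixels inside the channel
    if not inside:
        return 0.0
    return sum(1 for p in inside if p > threshold) / len(inside)


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def toy_image(n_dye_pixels, width=10):
    """One image row: wall pixel, dye pixels, background pixels, wall pixel."""
    return [[50] + [1000] * n_dye_pixels + [100] * (width - n_dye_pixels) + [50]]


mask = [[0] + [1] * 10 + [0]]               # 1 inside the mixing channel
background_mean, background_sd = 100.0, 10.0  # assumed output of the GMM step
threshold = background_mean + 3 * background_sd

heights_cm = [42.0, 46.0, 50.0, 54.0, 58.0]   # hypothetical actuator heights
ratios = [mixing_ratio(toy_image(n), mask, threshold) for n in (1, 3, 5, 7, 9)]

slope, intercept = fit_line(heights_cm, ratios)
h_min = (0.0 - intercept) / slope             # height giving a 0% mixing ratio
h_max = (1.0 - intercept) / slope             # height giving a 100% mixing ratio
h_mean = (h_min + h_max) / 2
print(h_min, h_mean, h_max)
```

With real data, the fitted relation may not be linear, and the functional form used in the fit should be chosen accordingly.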

3.4.7 Loading the Cells

1. Using a spectrophotometer, measure the OD600 of the cell
culture to verify whether it has reached middle exponential phase.
2. Prepare the cell syringe, having ID 6 (see Subheading 3.4.3,
step 4) and attach it at a height above 23 cm from the micro-
scope stage.
3. Disconnect syringe 5 and connect syringe 6 to port 5. Move
syringe 4 to an upper position while keeping it below the cell
syringe.
4. Monitoring in live DIC, at 60× magnification, verify cell flow.
Flicking the tubing of syringe 6 might help to perturb the
flow. To prevent premature clogging of the device, the initial
number of cells in the trap should be low, ideally below 10.
5. When satisfied with the number of cells in the trap, adjust the
syringes to the running position: gently lower syringe 4 to
23 cm above the stage, bring the cell syringe to the same
height, disconnect the cell reservoir from port 5, and plug in
syringe 5.
6. Verify the absence of air bubbles in the ports, channels, and the
chamber.

3.4.8 Microscope Setup

1. Using a 70% EtOH-wet kimwipe, clean the 40× objective. Add
oil and set the focus.
2. With the help of the stage controller to navigate the device,
mark the position of the chamber and DAW junction.
3. Select the DIC and fluorescence channels (e.g., sulforhoda-
mine, citrine) to be acquired during the experiment. For each
of them, specify the exposure time.
4. Specify the sampling frequency and the number of acquired
images. These two fields determine the duration of the
experiment.
5. Load the text file containing the dynamic perturbation profile
to be administered to the cells.
6. Specify the path to the folder in which the images will be stored
and start the acquisition.
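
The relation between the two fields of step 4 and the duration of the experiment can be made explicit with a short calculation. The Python sketch below is illustrative: the 5 min sampling interval and the 289 images are hypothetical values, and the first image is assumed to be acquired at time zero.

```python
def experiment_duration_h(sampling_interval_min, n_images):
    """Duration in hours, assuming the first image is acquired at t = 0."""
    if n_images < 1:
        raise ValueError("At least one image is required.")
    return sampling_interval_min * (n_images - 1) / 60.0


# Hypothetical settings: one image every 5 min, 289 images in total.
duration_h = experiment_duration_h(sampling_interval_min=5, n_images=289)
print(f"Experiment duration: {duration_h:.1f} h")
```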

3.5 Image Processing

In this section, we describe image processing and extraction of
fluorescence time-series data at the single-cell and population
level. While the computational tools we employ were selected for
their flexibility toward alternative imaged cell-types, numerous
approaches are currently available for segmentation of yeast cells
from brightfield/DIC images [8–11].

3.5.1 Fine-Tuning of the Weights of the Convolutional Neural Network

1. Open manually annotated images in ImageJ (see Note 17).
2. Open the U-Net Job manager (Plugins → U-Net → Job Manager)
and select Fine-tuning.
3. Use pretrained weights, available in the U-Net example
2d_cell_net_v0.caffemodel.h5, as a starting point for transfer
learning (see Note 18).
4. Subdivide the set of annotated images into a training set (67%) and
a test set (33%). Both sets should contain representative samples
of the images to segment.
5. Specify the number of evaluations of the loss function used to
optimize the network weights. While 1.5 × 10⁵ iterations are
normally enough, the number should be significantly increased
when the network is trained from scratch. In addition, set the
learning rate (1 × 10⁻⁵) and the validation interval (150).
6. Specify the file name and path where the resultant weights will
be saved (see Note 19).
7. Untick the selection “labels are classes” (see Note 20).
8. Press OK to start the fine-tuning.
9. Statistics plots are generated in real time during training.
Among these, the Loss function and the Intersection Over
Union plots are the most informative (Fig. 6).
10. Once newly optimized weights are available, qualitatively check
the performance of the network on samples in the validation
set and manually annotated images not included in the tuning.
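
Two elements of the procedure above lend themselves to a short illustration: the 67%/33% split of step 4 and the Intersection Over Union metric mentioned in step 9. The Python sketch below is not the U-Net plugin implementation: the file names are hypothetical, and the metric is computed pixel-wise on two binary masks, a simplification of the object-level score reported by the plugin.

```python
import random


def split_annotated(files, train_fraction=0.67, seed=0):
    """Randomly split annotated images into training and test sets."""
    shuffled = sorted(files)
    random.Random(seed).shuffle(shuffled)      # reproducible shuffle
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def intersection_over_union(pred, truth):
    """Pixel-wise IoU between two binary masks given as lists of rows."""
    inter = union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0     # two empty masks agree fully


# Hypothetical file names for 12 manually annotated frames.
files = [f"annotated_{i:02d}.tif" for i in range(12)]
train_files, test_files = split_annotated(files)

predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]
iou = intersection_over_union(predicted, ground_truth)
print(len(train_files), len(test_files), iou)
```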

3.5.2 Image Segmentation

1. To isolate from the images the section corresponding to the cell
trap, crop all DIC (and fluorescence) images using a rectangular
region of interest located at the same coordinates.
2. In ImageJ, open the DIC image to be segmented.
3. Open the U-Net Job manager (Plugins → U-Net → Job Manager)
and select Segmentation.

Fig. 6 Statistics plots generated by U-Net [5] during fine-tuning. (a) Plot of the intersection over union metric
as a function of the number of iterations in fine-tuning. The metric, computed as the ratio between the
area of overlap of the cell objects predicted by the convolutional neural network and those identified in the ground truth (i.e.,
manually annotated images) and the cell area encompassed by both, quantifies the accuracy of cell detection.
A value above 0.5 suggests a good prediction. (b) Cross-entropy loss, computed on the training (gray line) and
validation (blue line) sets, as refinement of the network weights occurs. In this example, convergence to
optimal weights is achieved after a limited number of iterations

4. Select the caffemodel.h5 file containing the CNN weights
obtained in Subheading 3.5.1 and press OK (see Note 18).
5. Once a binary mask (Fig. 7, top central panels) has been
generated, verify that its size in pixels matches that of the
original DIC and fluorescence images and save them in TIFF
format (see Note 21).
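
The cropping of step 1 and the size check of step 5 can be sketched as below. This Python example is a simplified illustration on small synthetic images stored as lists of rows; the function names and the region of interest are hypothetical, and in practice the same operations would be carried out in ImageJ or Matlab.

```python
def crop(image, top, left, height, width):
    """Crop a rectangular region of interest from an image (list of rows)."""
    if top < 0 or left < 0 or top + height > len(image) or left + width > len(image[0]):
        raise ValueError("ROI falls outside the image.")
    return [row[left:left + width] for row in image[top:top + height]]


def same_shape(a, b):
    """True if two images have the same number of rows and columns."""
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))


# Synthetic 4 x 6 DIC and fluorescence frames.
dic = [[r * 10 + c for c in range(6)] for r in range(4)]
fluo = [[100 + r * 10 + c for c in range(6)] for r in range(4)]

# The same (hypothetical) ROI is applied to every channel.
roi = dict(top=1, left=2, height=2, width=3)
dic_trap = crop(dic, **roi)
fluo_trap = crop(fluo, **roi)
print(dic_trap, same_shape(dic_trap, fluo_trap))
```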

3.5.3 Cell-Tracking and Extraction of Fluorescence Time-Series

1. To identify single cells from the population in the binary image,
open each image as a matrix in Matlab and label connected
components by applying the bwlabel function. Then, save the
matrix as an image in TIFF format.
2. Open Lineage Mapper (Plugins → Tracking → Lineage Mapper).
3. Specify the path and file names for the images to be tracked as
well as the identifier of the directory and files in which the
results, i.e., masks with the tracking indexes, will be stored.
4. Populate the fields corresponding to the tracking parameters,
following the instructions provided by the plugin developers
[12] (see Note 22).
5. Press the tab “track.”

Fig. 7 Visual representation of the outcome of a microfluidic experiment in which the response of cells to a
random stepwise input (blue line) is measured in fluorescence microscopy. DIC images, together with the
associated binary mask, acquired at 0 and 24 h are shown (top panels). The mean fluorescence across the cell
population (black line) and its standard deviation (gray shaded area) are computed from single cell time series.
Representative single cell data are shown as yellow, pink, and purple lines. Note that the bottom panel reports
in silico data

6. In Matlab, import each mask in the time-series as a matrix.
7. For each cell-index in the mask, compute the average of the
fluorescence signal of the corresponding pixels at each time
point. This yields a vector of raw fluorescence, whose entries
are the average fluorescence of a given cell, at each time-point.
8. Correct the single-cell fluorescence vector by subtracting, at each
time point, the background signal (see Note 23). This can be
computed as the average of the nonzero entries of a matrix
obtained by the entry-wise product of the fluorescence image
and the image complement of the binary mask obtained at step
5 of Subheading 3.5.2.
9. Merge the single-cell fluorescence vectors in a matrix, with the
number of rows equal to the number of cells and the number of
columns equal to the number of time points in the experiment.
Perform column-wise computation of the mean and standard
deviation of the fluorescence signal across the cell population
(Fig. 7, bottom panel) (see Note 24).
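
Steps 7–9 can be sketched as follows. This is a minimal Python illustration on two tiny synthetic frames rather than the authors' Matlab code: the tracking masks are assumed to hold an integer cell index per pixel (0 for background), cells missing at a time point are stored as None, and all function names are hypothetical.

```python
import statistics


def cell_means(labels, fluo):
    """Average fluorescence of the pixels belonging to each cell index."""
    sums, counts = {}, {}
    for l_row, f_row in zip(labels, fluo):
        for lab, val in zip(l_row, f_row):
            if lab:
                sums[lab] = sums.get(lab, 0.0) + val
                counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}


def background_mean(labels, fluo):
    """Average of the nonzero fluorescence pixels outside every cell."""
    vals = [val for l_row, f_row in zip(labels, fluo)
            for lab, val in zip(l_row, f_row) if lab == 0 and val != 0]
    return statistics.fmean(vals) if vals else 0.0


def corrected_traces(label_series, fluo_series):
    """Background-subtracted fluorescence vector for each tracked cell."""
    n = len(label_series)
    traces = {}
    for t, (labels, fluo) in enumerate(zip(label_series, fluo_series)):
        bg = background_mean(labels, fluo)
        for lab, mean in cell_means(labels, fluo).items():
            traces.setdefault(lab, [None] * n)[t] = mean - bg
    return traces


def population_stats(traces, n_timepoints):
    """Mean and sample standard deviation across cells at each time point."""
    means, sds = [], []
    for t in range(n_timepoints):
        vals = [tr[t] for tr in traces.values() if tr[t] is not None]
        means.append(statistics.fmean(vals) if vals else None)
        sds.append(statistics.stdev(vals) if len(vals) > 1 else 0.0)
    return means, sds


# Two synthetic frames with two tracked cells (indices 1 and 2).
label_series = [[[1, 1, 0], [0, 2, 2]], [[1, 1, 0], [0, 2, 2]]]
fluo_series = [[[110, 130, 20], [20, 220, 240]],
               [[210, 230, 30], [30, 320, 340]]]

traces = corrected_traces(label_series, fluo_series)
means, sds = population_stats(traces, n_timepoints=2)
print(traces, means, sds)
```

Storing missing time points as None makes it straightforward to screen out cells imaged for too short a time before computing the population statistics.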

3.6 Parameter Estimation

1. Follow steps 1 and 2 from Subheading 3.2 and step 1 from
Subheading 3.3 to create the inputs.exps Matlab structure.
Populate it with the experimental data (Fig. 8a, b) obtained
from Subheadings 3.4 and 3.5.

Fig. 8 Comparison of pseudo-experiments in which random (a, orange line) or optimally designed (b, cyan line)
inputs are used to gather data for parameter estimation. While aimed at exemplifying the outcome of OED and
PE in the cyber-physical platform, pseudo-data were here obtained by sampling the model output, in response
to the shown input, and adding 5% Gaussian noise. The green line represents the calibrated model response
to the data. (c) Distributions of the estimate of parameter γf, inferred with the two input profiles when
assuming a uniform prior, are compared to the true parameter value. The higher informative content of the
optimally designed input is reflected in the location (i.e., centered on the true value) and width of the
distribution

2. Follow steps 3 and 4 from Subheading 3.2 and steps 3 and
4 from Subheading 3.3 to specify the options related to parameter
estimation, optimization, and ivp solvers. To ensure convergence
of the optimization algorithm, an adequate number
of evaluations (.maxeval) of the cost function and maximum
computation time (.maxtime) should be used. In our example,
we set them to 2 × 10⁵ and 5 × 10³, respectively.
3. Follow steps 6 and 7 from Subheading 3.2 to prepare the
inputs Matlab structure to perform Parameter Estimation.
4. Run the function AMIGO_PE(), passing the inputs Matlab
structure as an argument (see Note 25), to obtain parameter
estimates and their associated uncertainty (Fig. 8c). The good-
ness of fit can be assessed by measuring the distance between
the model output (Fig. 8a, b, green line), computed with the
inferred parameters, and the experimental data.
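
The distance between model output and experimental data mentioned in step 4 can be illustrated with a weighted least squares cost. The Python sketch below is a generic textbook form and is not claimed to reproduce the AMIGO2 implementation; the data, model output, and standard deviations are hypothetical values.

```python
def weighted_least_squares(data, model, std_dev):
    """Sum of squared residuals, each scaled by its measurement standard deviation."""
    if not (len(data) == len(model) == len(std_dev)):
        raise ValueError("All inputs must have the same length.")
    return sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, std_dev))


# Hypothetical fluorescence data, model output, and noise levels.
data = [1.0, 2.0, 3.0]
model_output = [1.1, 1.8, 3.0]
std_dev = [0.1, 0.1, 0.1]

cost = weighted_least_squares(data, model_output, std_dev)
print(f"cost = {cost:.3f}")
```

A lower cost indicates a model output closer to the data, given the assumed measurement noise.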

3.7 Optimal Experimental Design for Model Calibration

1. Follow step 1 in Subheading 3.2 to create the inputs Matlab
structure that contains the ODEs.
2. Create the inputs.exps Matlab structure as described in step
2 from Subheading 3.2. To specify the properties of the
experimental scheme that will be optimized, select the type of
experiment as optimally designed (.exp_type = “od”), the
allowed boundaries for the inducer (.u_min, .u_max) (see
Note 26), the noise type (.noise_type), and standard deviation
(.std_dev) associated with the experiment. In our example, we
constrained optimal experimental design (see Note 27) to the
identification of an optimal perturbation profile. This was defined
as a stepwise input with segments of fixed duration
(.u_interpret = “stepf”). The total number of steps is defined in .n_steps.
3. Follow steps 3 and 4 from Subheading 3.2 and steps 3 and
4 from Subheading 3.3 to set the options related to parameter
estimation, optimization, and ivp solvers. To improve the convergence
of the optimization algorithm, .maxeval and .maxtime
were set to 5 × 10⁴ and 3 × 10⁴, respectively.
4. In inputs.OEDsol.OEDcost_type, select the scalar measure of
the Fisher Information Matrix (FIM) to be used for optimal
experimental design (see Note 28).
5. Follow steps 6 and 7 from Subheading 3.2 to prepare the
inputs Matlab structure to perform Optimal Experimental
Design.
6. Run the function AMIGO_OED(), with the inputs Matlab
structure as an argument, to obtain the optimal input profile
(Fig. 8b, cyan line).
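
The stepwise input with segments of fixed duration described in step 2 can be pictured as a piecewise-constant function of time. The Python sketch below is an illustration of that input class only and does not use AMIGO2: the inducer levels, the bounds, and the 24 h duration are hypothetical, and the function name is an assumption.

```python
def step_input(levels, total_duration_min, u_min, u_max):
    """Return u(t), a piecewise-constant input with equal-length segments."""
    for u in levels:
        if not u_min <= u <= u_max:
            raise ValueError(f"Level {u} outside the bounds [{u_min}, {u_max}].")
    step = total_duration_min / len(levels)

    def u_of_t(t):
        if not 0 <= t <= total_duration_min:
            raise ValueError("Time outside the experiment.")
        # The final time point belongs to the last segment.
        idx = min(int(t // step), len(levels) - 1)
        return levels[idx]

    return u_of_t


# Hypothetical 4-step IPTG profile (uM) over 24 h, bounded in [0, 1000] uM.
u = step_input([0, 1000, 250, 750], total_duration_min=1440, u_min=0, u_max=1000)
print([u(t) for t in (0, 359.9, 360, 1000, 1440)])
```

In optimal experimental design, the optimizer searches over the values in the list of levels while the number and duration of the segments stay fixed.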

4 Notes

1. Follow the nomenclature specified within brackets when
assigning the name to each vector. If the initial conditions are
unknown, define the corresponding vector as empty. The vector
of initial conditions is binary and has the same length and
order as the state variables vector. Entries are set to 1 if the
initial condition for the corresponding state variable is known.
2. Structural identifiability analysis is computationally expensive
due to the high memory consumption. Limited computational
resources or high complexity of the model, as determined by
the number of parameters and states, pose challenges to
the analysis. In such cases, the rank of the observability-identifiability
matrix can be computed numerically (opts.numeric = 1).
To overcome the risk of obtaining an artificial
decrease of the matrix rank, the analysis should be re-run
several times to ensure convergence to the correct result. It is
worth noting that a parameter known to be identifiable (from
alternative analysis or because its value has been fixed) can be
specified as a symbolic variable in prev_ident_pars, with the
advantage of reducing the complexity of the problem or
structural identifiability issues. As an alternative, memory consumption
can be restricted by reducing the maximum time
allowed for the computation of Lie derivatives (opts.maxLietime),
although this might lead to uncertain identifiability
results for some parameters. Finally, models with a large number
of states can be decomposed (opts.forcedecomp, opts.decomp,
and opts.decomp_user) on the grounds that parameters
found to be identifiable in a submodel will be identifiable
in the whole model. Generation of submodels is an
optimized process performed by the software MEIGO
[13]. In our example, the computational time was approximately
1 min.
3. In some instances, an in-depth structural identifiability
assessment of models might require a multiexperiment analysis
(e.g., use of different inputs in one experiment), which might be
required. In these cases, we suggest the use of the software
GenSSI 2.0 [14], which presents a structure/syntax similar to
STRIKE-GOLDD and a clear user manual on how to set up
the required scripts in the associated GitHub repository.
4. The analysis intends to determine the possibility of assigning
unique values to model parameters from ideal output measure-
ments (i.e., continuous and noise-free). STRIKE-GOLDD
performs a structural identifiability analysis as an extension of
the observability concept (i.e., the possibility to infer the inter-
nal state of the model from time-finite output measurements)
where parameters are considered as states without dynamics. To
test the structural identifiability of the model, STRIKE-
GOLDD makes use of Lie derivatives of the output function
to develop a generalized observability-identifiability matrix. A
full rank of the observability-identifiability matrix denotes local
structural identifiability of the model, while a lower rank indi-
cates unidentifiability for a given set of parameters [3].
5. Global ranking of the parameters is performed to assess the
relative influence of each parameter on the model predictions,
as quantified by relative parametric sensitivities. The general
case of multiple observables and experiments is considered in
AMIGO2, making use of diverse importance factors [15]. For a
broader analysis of the parametric sensitivities, AMIGO2 uses
n samples from the parameter space obtained by Latin hyper-
cube sampling. Since the analysis considers the generic case, in
which the isolated effect of time points, observables, or experi-
ments cannot be explored, it is recommended to run a similar
analysis for these elements. This supports an improved under-
standing of which parameters exert a more relevant effect on a
particular observable in a particular experimental scheme.
6. AMIGO2 presents a set of local, global, and hybrid optimization
algorithms that can be found in the software’s theoretical
background documentation [16]. Due to the general noncon-
vexity of the problems usually encountered, we recommend the
use of a hybrid optimizer such as enhanced Scatter Search (eSS)
combined with an indirect solver, which uses control vector
parameterization to avoid issues with nonsmooth functions.
However, this comes at a higher computational cost (indirect
methods make use of gradient descent methods, which are
faster). If only proximity to the global optimum is required,
use of a global optimizer, such as Differential Evolution, can be
sufficient. Finally, local methods can be used if a multistart for
the initial cost function evaluation is used. Due to the nonlinear
character of biological models, the use of only local solvers
might lead to suboptimal solutions.
7. This cycle needs to be performed for a sufficient number of
iterations for more robust and reliable results. While the mini-
mum recommended number of trials is 500 (600 used in our
example), the value can be adjusted according to the available
time and computational resources.
8. Practical identifiability analysis is performed to quantify the
expected uncertainty of the parameter estimates in relation to
a specific experimental scheme. Monte-Carlo sampling of the
parameter space is used to generate noise-corrupted pseudo-
data subject to an experimental profile and parameter estima-
tion is performed for each time-series. Principal Component
Analysis (PCA) is then applied to the 0.95–0.05 interquartile
range of the hyper-ellipsoid approximated by the samples so the
uncertainty of the estimates or correlation between parameters
can be estimated [16].
9. The height of the syringe above the microscope stage is
measured from the bottom of the meniscus of media in the
syringe.
10. By merging the media droplet on the port with the one at the
free end of the tubing, you can reduce the risk of air bubbles
entering the device.
11. It is recommended to have previously defined a mask contain-
ing the region of interest of the DAW junction (i.e., the mixing
channel). In the first iteration step, verify that the mask overlaps
with the acquired DIC image. This simplifies the definition of
the region of interest at the DAW junction.
12. To improve the stability of the procedure, the periodic signal
should be preceded and followed by a constant input at 50%
mixing ratio for 30 s.
13. To prevent bias in the computation of the mixing ratio, due to
false positive/negative inclusions of pixels in the region of
interest, make sure that the latter covers an area that exceeds
the walls of the mixing channel.
14. Different Matlab functions can be used for this purpose, for example
mixGaussEm() or fitgmdist(). A sufficiently high number of
iterations prevents convergence to a local minimum, in which case
the function might not return two Gaussian distributions with
different means.
15. Noise filtering might be required to enhance the quality of the
binary mask computed with the fluorescence threshold. For
example, noise could be detected by the presence of outlier
pixels.
16. To automate the procedure, the mentioned functions can be
integrated into a script controlling the actuators within Matlab
or in a designed GUI/platform.
17. To perform the fine-tuning of the convolutional neural net-
work, manually annotated images, acting as ground-truth sam-
ples, are required. You can use either full images, if denoted by
a low cell count, or a selected region. The manually annotated
images should constitute a representative set of the experimen-
tal time-lapse. In our example, we used a combination of
12 images (frames with a low number of cells) and subsections
of images (frames with a higher number of cells). Open in
ImageJ the DIC image to be annotated and, using the elliptical
selection tab, draw an initial contour around each cell in the
image. Ctrl+T will add the region of interest (ROI) to the ROI
Manager. Select each ROI and, using the brush selection tool,
refine the initial contour (ROIs must not overlap, but they can
be adjacent) and update the ROI. Transfer the ROIs into the
original image as an Overlay (Image ! Overlay ! From ROI
Manager) and save the resulting image in TIFF format.
18. U-Net offers different options inherent to the memory and
computational allocation of the process (i.e., fine-tuning or
segmentation). Among these, the user is asked to execute the
computation in CPU or GPU. Since the process is not opti-
mized for CPUs, we recommend the use of a GPU: this scales
the computational time from days to hours. It is worth noting
that, due to the dependency of the developed patch for Caffe
(python deep learning framework), both CPU and GPU
implementations rely on a Linux operating system. The
U-Net plugin is implemented to support computation in a
remote machine (e.g., Amazon Web Services) that can be
accessed through an SSH connection.
19. The set of weights for the Convolutional Neural Network is
always saved on the remote host; an additional copy is stored
locally only if the user requests it.
20. The option “labels are classes” is of interest only when segmen-
tation or selection of different cell types is performed.
21. When segmenting a large number of images, the procedure can
be automated by running it in batch mode using ImageJ
macro code. We recommend recording the procedure for one
image (Plugins → Macros → Record...) to obtain the basic
macro code for the segmentation. Then, integrate the section
in a loop to iterate over all the images in a script that can be run
in the macro console (Plugins → Macros → Startup
Macros...).
22. In our example, we selected Minimum Object size = 4, Maximum
Centroid Displacement = 50, Enable Division = yes,
Enable Fusion = no, Weight Cell Overlap = 0, Weight Cell
Centroid Displacement = 100, Weight Cell Size = 0.75, Minimum
Division Overlap = 0, Daughter Size Similarity = 30,
Daughter Aspect ratio similarity = 75, Mother Circularity
Threshold = 50, Number of frames to check for circularity = 5,
Minimum Cell Lifespan = 30, Cell Death Delta Centroid
Threshold = 0, and allow cell density and border cells to affect
the confidence index.
23. As cells grow in a monolayer within the microfluidic device,
by the end of the experiment the number of pixels
corresponding to the background will be very low,
impeding an appropriate correction for the background fluorescence
level. Under the hypothesis of minimal variation in
time of the background fluorescence signal, we suggest sub-
tracting the average background computed over the previous
segment of the experiment whenever the number of back-
ground pixels falls below a user-specified threshold.
24. The computation of single cell fluorescence vectors enables the
screening of cells that have been imaged for a minimum
amount of time defined by the experimentalist and the exclu-
sion of abnormal cells that can populate the last frames.
25. Parameter estimation (i.e., model calibration) aims to estimate
unknown model parameters. Here, parameter estimation is
framed as a nonlinear optimization problem whose objective
is to minimize a predefined distance measure (cost-function)
between experimental data and model predictions
[15]. AMIGO2 implements both the weighted least squares
and the log-likelihood cost functions, to be selected
depending on the information available on the noise (homoscedastic
or heteroscedastic) corrupting the data. While these
scalar measures assume normally distributed noise, the
software allows the introduction of alternative, user-defined
cost functions.
26. Here, considering experimental constraints due to the use of a
microfluidic platform, we cast OED as a constrained optimiza-
tion problem that searches for the most informative stepwise
perturbation profile composed of segments of fixed duration.
This corresponds to optimizing the concentration of the
inducer administered to the cells at each step. It is worth
mentioning that AMIGO2 supports the optimization of
other control variables: number and location of sampling
times, observed species, initial conditions, and experimental
duration. In general, the selection of the most suitable strategy
will depend on the biological system, the complexity of its
mathematical description, and limitations of the experimental
platform used for data gathering.
27. Optimal experimental design (OED) is a branch of statistics
that searches for the most informative and least resource-intensive
experimental scheme (here for model calibration).
By increasing the informative content of the acquired data,
OED allows overcoming issues that affect parameter estima-
tion (e.g., practical identifiability). To quantify the amount of
information of an experiment, AMIGO2 makes use of the
Fisher Information Matrix to solve a general dynamic optimi-
zation problem minimizing or maximizing a scalar measure
that relates to the shape and size of the hyper-ellipsoid asso-
ciated to the FIM [15].
28. Multiple scalar measures of the FIM (i.e., optimality criteria)
are available in scientific literature. AMIGO2 implements
D-optimality (Determinant), E-optimality (Eigenvalue),
A-optimality (Average), and DoverE-optimality. D-optimality
seeks to minimize the determinant of the inverse of the FIM,
E-optimality to maximize the minimum eigenvalue of the FIM,
and A-optimality to minimize the trace of the inverse of the
FIM. In our examples, following its widespread adoption, we
selected D-optimality.
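
The optimality criteria in Note 28 can be illustrated on a small example. The sketch below evaluates the D-, E-, and A-optimality scores of a symmetric 2 × 2 FIM using closed-form expressions. It is independent of AMIGO2; the function name and the example matrix are hypothetical and chosen only for illustration.

```python
import math

def optimality_scores(fim):
    """Return D-, E- and A-optimality scores for a symmetric 2x2 FIM.

    fim is ((f11, f12), (f12, f22)) and must be positive definite.
    """
    (f11, f12), (_, f22) = fim
    det = f11 * f22 - f12 * f12
    if det <= 0 or f11 <= 0:
        raise ValueError("FIM must be positive definite")
    trace = f11 + f22
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    half_gap = math.sqrt((f11 - f22) ** 2 / 4.0 + f12 * f12)
    min_eig = trace / 2.0 - half_gap
    return {
        "D": 1.0 / det,    # det(FIM^-1), to be minimized
        "E": min_eig,      # smallest eigenvalue of the FIM, to be maximized
        "A": trace / det,  # trace(FIM^-1), to be minimized
    }

# Hypothetical FIM for a two-parameter model.
scores = optimality_scores(((4.0, 0.0), (0.0, 1.0)))
```

Comparing these scores across candidate input profiles is one way to rank experiments; larger models would need a general linear algebra routine instead of the 2 × 2 formulas used here.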

References
1. Bandiera L, Hou Z, Kothamachu V, Balsa-Canto E, Swain P, Menolascina F (2018)
On-line optimal input design increases the efficiency and accuracy of the
modelling of an inducible synthetic promoter. Processes 6(9):148
2. Gnügge R, Dharmarajan L, Lang M, Stelling J (2016) An orthogonal
Permease-inducer-repressor feedback loop shows bistability. ACS Synth Biol
5(10):1–29
3. Villaverde AF, Barreiro A, Papachristodoulou A (2016) Structural
identifiability of dynamic systems biology models. PLoS Comput Biol
12(10):1–22
4. Balsa-Canto E, Henriques D, Gábor A, Banga JR (2016) AMIGO2, a toolbox for
dynamic modeling, optimization and control in systems biology.
Bioinformatics 32(21):3357–3359
5. Falk T et al (2019) U-net: deep learning for cell counting, detection, and
morphometry. Nat Methods 16(1):67–70
6. Chalfoun J, Majurski M, Dima A, Halter M, Bhadriraju K, Brady M (2016)
Lineage mapper: a versatile cell and particle tracker. Sci Rep 6:1–9
7. Ferry MS, Razinkov IA, Hasty J (2011) Microfluidics for synthetic biology,
vol 497, 1st edn. Elsevier Inc., San Diego
8. Versari C et al (2017) Long-term tracking of budding yeast cells in
brightfield microscopy: CellStar and the Evaluation Platform. J R Soc
Interface 14:20160705
9. Dimopoulos S, Mayer CE, Rudolf F, Stelling J (2014) Accurate cell
segmentation in microscopy images using membrane patterns. Bioinformatics
30(18):2644–2651
10. Bredies K, Wolinski H (2011) An active-contour based algorithm for the
automated segmentation of dense yeast populations on transmission
microscopy images. Comput Vis Sci 14(7):341–352
11. Bakker E, Swain PS, Crane MM (2018) Morphologically constrained and data
informed cell segmentation of budding yeast. Bioinformatics 34(1):88–96
12. “Lineage Mapper User Guide.” [Online].
https://github.com/USNISTGOV/Lineage-Mapper/wiki/User-Guide
13. Egea JA, Henriques D, Cokelaer T, Villaverde AF, Julio R (2014) MEIGOR: a
software suite based on metaheuristics for global optimization in systems
biology and bioinformatics. Continuous and mixed-integer problems:
enhanced scatter search, pp. 1–33
14. Ligon TS, Fröhlich F, Chiş OT, Banga JR, Balsa-Canto E, Hasenauer J (2018)
GenSSI 2.0: multi-experiment structural identifiability analysis of SBML
models. Bioinformatics 34(8):1421–1423
15. Balsa-Canto E, Alonso AA, Banga JR (2010) An iterative identification
procedure for dynamic modeling of biochemical networks. BMC Syst Biol 4:11
16. “AMIGO2 Documentation.” [Online].
https://sites.google.com/site/amigo2toolbox/doc
Chapter 13

Prediction of Cellular Burden with Host–Circuit Models


Evangelos-Marios Nikolados, Andrea Y. Weiße, and Diego A. Oyarzún

Abstract
Heterologous gene expression draws resources from host cells. These resources include vital components to
sustain growth and replication, and the resulting cellular burden is a widely recognized bottleneck in the
design of robust circuits. In this tutorial we discuss the use of computational models that integrate gene
circuits and the physiology of host cells. Through various use cases, we illustrate the power of host–circuit
models to predict the impact of design parameters on both burden and circuit functionality. Our approach
relies on a new generation of computational models for microbial growth that can flexibly accommodate
resource bottlenecks encountered in gene circuit design. Adoption of this modeling paradigm can facilitate
fast and robust design cycles in synthetic biology.

Key words Cellular burden, Growth models, Whole-cell modeling, Gene circuit design, Synthetic
biology, Resource allocation

1 Introduction

The grand goal of Synthetic Biology is to engineer living systems
with novel functions. The approach relies on the combination of
biological knowledge with design strategies from engineering
sciences [1–4]. Engineering principles, such as modularity and
standardization, have led to gene circuits with a wide range of
functions such as cellular oscillators [5, 6], memory devices [7],
and biosensors [8, 9]. As synthetic biology matures into an engi-
neering discipline of its own, mathematical modeling is playing an
increasingly important role in the design of biological circuitry
[10]. Moreover, model-based design offers opportunities for
other fields such as computer-aided design [11], control theory
[12], and machine learning [13] to contribute with new methods
and protocols for gene circuit design.
The success of the celebrated “design–build–test–learn” cycle
[14] relies on the availability of good quality models for circuit
function. A major drawback of current modeling frameworks, however,
is the implicit assumption that biological circuits function in
isolation from their host. This simplification limits the predictive
power of circuit models and slows down the iterations between
system design, testing, and characterization. In reality, gene circuits
interact with their host in many ways, including the consumption of
molecular resources such as amino acids, nucleotides, or energy, as
well as using major components of the genetic machinery such as
polymerases and ribosomes.

Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229,
https://doi.org/10.1007/978-1-0716-1032-9_13, © Springer Science+Business Media, LLC, part of Springer Nature 2021

Competition for a limited pool of host resources produces a
two-way interplay between synthetic circuits and the native physi-
ology of the host [15]. This interplay is commonly known as
burden and perturbs the homeostatic balance of the host, resulting
in slowed growth, reduced biosynthesis, and the induction of stress
responses [16]. Since such effects can impact circuit behavior, they
create feedback effects that can potentially break down circuit
function [17–19]. As a result, individual modeling of circuit parts
and their connectivity is not sufficient to predict circuit function
accurately.
In a seminal study on host–circuit interactions, Tan and collea-
gues [20] studied a simple circuit consisting of T7 RNA polymerase
that activates its own expression in Escherichia coli. Contrary to
what standard mathematical models would predict, the circuit dis-
played bistable dynamics. The authors showed that synthesis of the
polymerase produced an indirect, growth-mediated, positive feed-
back loop, which when included in their model was able to repro-
duce the observed bistability. This study was the first empirical
demonstration that growth defects can drastically change circuit
function. A number of subsequent works have focused on the
sources and impact of burden on gene circuits. For example, Ceroni
et al. showed that genes with weaker ribosomal binding-strength
are less taxing on the host resources [21]. Other works have
focused on strategies to mitigate burden. An and Chin built a
gene expression system that combines orthogonal transcription by
T7 RNA polymerase and translation by orthogonal ribosomes
[22]. The system reported in [23] allows resources to be allocated
among competing genes, while the authors of [24] built libraries of promoters
that tune expression of burdensome proteins and decrease cellular
stress. The work by Shopera et al. showed that negative feedback
control can reduce the cross-talk between gene circuits
[25]. Another strategy for reducing burden was proposed in [26]
using an orthogonal ribosome for translation of heterologous
genes. A particularly attractive strategy is to exploit burden to
improve functionality. For example, Rugbjerg and colleagues
increased metabolite production by coupling pathway expression
to that of essential endogenous genes [27], while [28] employed
stress-response promoters to build a feedback system with increased
protein yield.
As a result of the increasing interest in cellular burden and
host–circuit interactions, the modeling community has devoted
substantial attention to improving models for gene circuits and
their interaction with a host. A key challenge is to find a suitable
level of model complexity with enough detail to describe tunable
circuit parts but without excessive granularity that makes models
impractical. At one end of the complexity spectrum, a number of
works have proposed simple resource allocation models for the
interplay between circuit and host genes [29–31]. Using different
modeling approaches and assumptions, these models generally pre-
dict a linear relation between expression of native and heterologous
genes. Increases in the expression of one gene cause a linear drop in
the expression of another gene, as a result of a limited abundance of
ribosomes for translation. At the other end of the spectrum, the
whole-cell model of Mycoplasma genitalium [32] was an ambitious
attempt to describe all layers of cellular organization under a single
computational model. A subsequent work demonstrated the use of
the whole-cell model in conjunction with gene circuits [33]. Yet to
date such whole-cell models have not been built for bacterial hosts
commonly employed in synthetic biology, and their high complex-
ity prevents their systematic use in circuit design and optimization.
A number of approaches have sought to find a middle ground
between model complexity and tractability. Inspired by the widely
established “bacterial growth laws” [19, 34], Weiße and colleagues
built a mechanistic growth model for Escherichia coli [35]. The
model uses a coarse-grained partition of the proteome to describe
how cells allocate their resources across various gene expression
tasks. It accurately predicts growth rate from the interplay between
metabolism and gene expression, and can be extended with a wide
range of genetic circuits. Applications of the mechanistic growth
model include the design of orthogonal ribosomes [26], the addi-
tion of extra layers of regulation [36], and its extension to single-
cell growth dynamics [37]. Most recently, Nikolados et al.
employed the model to study the impact of growth defects in
various exemplar circuits [38].
In this tutorial we describe how mechanistic growth models can
be employed to simulate gene circuits together with the host
physiology (Fig. 1). In Subsection 2 we first revisit the bacterial
growth laws and explain the core principles of the mechanistic
growth model. In Subsection 3 we present how to extend the
growth model with heterologous genes. We illustrate the method-
ology with a number of transcriptional logic gates in Subsection 4.
We conclude the chapter with a perspective for future research in
the field.

Fig. 1 Host–circuit modeling. Integrated host–circuit models provide a
quantitative basis to study the impact of design parameters on circuit
function and genetic burden on their host
[Figure panel labels: circuit parts (genes, promoters, RBS); cellular host;
host–circuit model (growth defects, resource usage); design space; circuit
function]

2 Coarse-Grained Models for Bacterial Growth

We begin by describing the bacterial growth laws that form the
basis for most current models for growth. Our focus is on coarse-
grained models that describe cell physiology using lumped variables
representing aggregates of molecular species. We deliberately
exclude whole-cell models [32] and genome-scale models [39],
both of which have been discussed extensively in the literature
[40–42] and so far have found relatively limited applications in
gene circuit design.

2.1 Bacterial Growth Laws

Bacterial growth has been an active topic of study for many decades.
The celebrated work of Nobel laureate Jacques Monod provided a
key quantitative description for growth [43], based on the observa-
tion that bacteria in batch cultures exhibit several phases of growth:
• Lag phase: cells do not immediately start to grow after nutrient
induction, as they first must adapt to the new environment;
RNA and proteins are produced as the cell prepares for division.
• Exponential phase: cells duplicate at a constant rate, so that
their number grows exponentially as $N(t) = N_0\, 2^{t/\tau}$, with τ being
the average doubling time. Equivalently, the number of cells can
be expressed as $N(t) = N_0\, e^{\lambda t}$, where $\lambda = \log 2/\tau$ is the
growth rate.
• Stationary phase: cell replication stops because an essential
nutrient has been depleted from the batch. The number of
cells remains constant during this phase.
• Death phase: cells begin to die, resulting in a decreasing cell
population.
The vast majority of studies on bacterial growth focus on the
exponential phase, and to date this remains the best characterized
growth phase. A widely used empirical model for exponential growth is
given by Monod’s law, which relates the instantaneous growth rate
and the substrate concentration:
\[ \lambda = \frac{\lambda_{\max}\, s}{s + K_s}, \tag{1} \]
where s is the concentration of the growth substrate, λmax is the maximum
growth rate possible in the substrate, and Ks is the substrate concentration
for which growth rate is half maximal. Eq. 1 describes the hyperbolic
dependence of the growth rate λ on the concentration of a growth-limiting
nutrient s in the medium.
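
The relations for exponential growth and Monod's law can be checked numerically. The following minimal sketch uses hypothetical function names and illustrative numbers (a 30 min doubling time, and a substrate concentration equal to Ks); none of these values are taken from the chapter.

```python
import math

def growth_rate_from_doubling_time(tau):
    """Growth rate lambda = ln(2) / tau."""
    return math.log(2.0) / tau

def cell_number(n0, growth_rate, t):
    """N(t) = N0 * exp(lambda * t) during exponential phase."""
    return n0 * math.exp(growth_rate * t)

def monod_growth_rate(s, lambda_max, k_s):
    """Monod's law (Eq. 1): hyperbolic dependence on substrate s."""
    return lambda_max * s / (s + k_s)

# Illustrative values only.
lam = growth_rate_from_doubling_time(30.0)   # per minute
doubled = cell_number(100.0, lam, 30.0)      # population one doubling time later
half_max = monod_growth_rate(s=5.0, lambda_max=0.02, k_s=5.0)
```

With s equal to Ks, the Monod expression returns half of the maximal growth rate, which is the defining property of Ks.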
Measurements of bacterial cells growing at different rates
[44, 45] have revealed a central role of ribosome synthesis in
maintaining exponential growth [46, 47]. In particular, the ribo-
somal mass fraction, ϕR, has been shown to increase linearly with
growth rate [44, 48]. This is the second growth law, described
mathematically as:
\[ \phi_R = \phi_R^{\min} + \frac{\lambda}{\kappa_t}, \tag{2} \]
where $\phi_R^{\min}$ is an offset term and $\kappa_t$ is a phenomenological parameter
related to protein synthesis.
The third growth law relates to growth inhibition. It has been
shown that sublethal antibiotic doses targeting ribosomal activity
produce a negative linear relation between growth rate and the
ribosomal mass fraction [19]. Mathematically, this growth law can
be described by:
\[ \phi_R = \phi_R^{\max} - \frac{\lambda}{\kappa_n}, \tag{3} \]
where the parameter $\kappa_n$ describes the nutrient capacity of the
growth medium and $\phi_R^{\max}$ is the maximum allocation to ribosomal
synthesis in the limit of complete translational inhibition.
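
As a worked step, Eqs. 2 and 3 can be combined. This is an algebraic sketch that assumes both linear relations hold for the same cell state; it is not a result stated in the chapter. Eliminating the ribosomal mass fraction gives the growth rate in terms of the two phenomenological parameters:

\[
\phi_R^{\min} + \frac{\lambda}{\kappa_t} = \phi_R^{\max} - \frac{\lambda}{\kappa_n}
\quad\Longrightarrow\quad
\lambda = \left(\phi_R^{\max} - \phi_R^{\min}\right)\frac{\kappa_t\,\kappa_n}{\kappa_t + \kappa_n}.
\]

In this form the growth rate depends hyperbolically on $\kappa_n$, which mirrors the shape of Monod's law in Eq. 1.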
Taken together, Eqs. 1–3 provide a remarkably simple descrip-
tion of exponential growth. Yet a common caveat of such descrip-
tions is their lack of explicit links between phenomenological
parameters and the molecular processes that drive growth. Some
works have indeed found quantitative descriptions of model para-
meters in terms of intracellular properties [19, 34]. However,
another strand of research has moved away from phenomenological
models toward mechanistic descriptions of cell physiology [35, 49,
50]. Notably, earlier work by Molenaar and colleagues [51] pro-
posed a model that integrates metabolism and protein biosynthesis
into a resource allocation model. A key assumption in that approach
is that microbes adjust their proteome composition to maximize
growth. This leads to growth predictions that rely on an optimality
principle, without the need of a mechanistic description of how
cellular constituents contribute to growth and replication.

2.2 A Mechanistic Model of Bacterial Growth

Here we describe a mechanistic model that predicts bacterial
growth rate from first principles [35]. The model, illustrated in
Fig. 2, reproduces the bacterial growth laws and, at the same time,
contains detailed mechanisms for nutrient metabolism, transcrip-
tion, and translation. It employs a partition of the proteome similar
to an earlier work [51], but it does not require the assumption of
growth maximization. The model is versatile and can predict how
cells reallocate their proteome composition under various types of
perturbations, including nutrient shifts, genetic modifications, and
antibiotic treatments.
The model combines nutrient import and its conversion to
cellular energy with the biosynthetic processes of transcription
and translation. In its basic form, the model includes 14 intracellular
variables: an internalized nutrient si; a generic form of energy,
denoted a, that models the total pool of intracellular molecules
required to fuel biosynthesis, such as ATP and amino acids; and
four types of proteins: ribosomes pr, transporter enzymes pt,
metabolic enzymes pm, and house-keeping proteins pq. The model
also contains the corresponding free and ribosome-bound mRNAs
for each protein type, denoted by mx and cx, respectively, with
x ∈ {r, t, m, q}. The model can be described by the chemical reactions
listed in Table 1. From these reactions we model the cell as a
system of ordinary differential equations, describing the rate of
change of the numbers of molecules per cell of a particular species.
Next we explain in detail how the model equations are built.

Fig. 2 Mechanistic model for bacterial growth. The model predicts growth
rate from the allocation of two cellular resources (energy and ribosomes)
among the various processes that fuel growth and replication [35]

The environment, or growth medium, of the cell contains a
single nutrient at a fixed concentration, described by the constant
parameter s. A transporter protein pt is responsible for the uptake
of the external nutrient, which once internalized (si) is catabolized by a
metabolic enzyme pm. The dynamics of the internalized nutrient
obey:
\[ \dot{s}_i = v_{\mathrm{imp}} - v_{\mathrm{cat}} - \lambda s_i. \tag{4} \]
Similarly to the bacterial growth laws described in Subsection 2.1,
the growth rate is denoted by λ. All intracellular species are assumed
to be diluted at a rate λ because of partitioning cellular content
between daughter cells at division. Nutrient import (vimp) and
catabolism (vcat) are assumed to follow Michaelis–Menten kinetics:
\[ v_{\mathrm{imp}} = p_t\, \frac{v_t\, s}{K_t + s}, \qquad v_{\mathrm{cat}} = p_m\, \frac{v_m\, s_i}{K_m + s_i}, \tag{5} \]
where vt and vm are maximal rates, while Kt and Km are Michaelis–
Menten constants. Since translation is known to dominate energy
consumption [48], the model neglects other energy-consuming
processes. Using cx to denote the complex between a ribosome
and the mRNA for a protein px, the translation rate for every
protein obeys
\[ v_x = c_x\, \frac{\gamma(a)}{n_x}. \tag{6} \]
The parameter nx in Eq. 6 is the length of the protein px in terms of
amino acids, and the term γ(a) represents the net rate of transla-
tional elongation. Assuming that each elongation step consumes a
fixed amount of energy [35], the net elongation rate depends on
the energy resource by:
\[ \gamma(a) = \frac{\gamma_{\max}\, a}{K_\gamma + a}, \tag{7} \]
where γ max is the maximal elongation rate and Kγ is the energy
required for a half-maximal rate. From Eq. 6 we can compute the
total energy consumption by translation of all proteins and get a
differential equation for the net turnover of energy:
\[ \dot{a} = n_s v_{\mathrm{cat}} - \sum_{x \in \{r, t, m, q\}} n_x v_x - \lambda a, \tag{8} \]
where the sum over x is over all types of protein in the cell. Overall,
energy is created by metabolizing si and lost through translation
and dilution by growth. The positive term in Eq. 8 determines
energy yield per molecule of internalized nutrient from Eq. 4.
The parameter ns describes the nutrient efficiency of the growth
medium.

Table 1
Chemical reactions in the mechanistic growth model [35]

Species                 Process               Reaction                         Rate
Ribosomes               Transcription         ∅ → m_r                          w_r
                        Dilution/degradation  m_r → ∅                          λ + d_m
                        Ribosome binding      p_r + m_r ⇌ c_r                  k_b (forward), k_u (reverse)
                        Dilution              c_r → ∅                          λ
                        Translation           n_r a + c_r → p_r + m_r + p_r    v_r
                        Dilution              p_r → ∅                          λ
Transporter enzyme      Transcription         ∅ → m_t                          w_t
                        Dilution/degradation  m_t → ∅                          λ + d_m
                        Ribosome binding      p_r + m_t ⇌ c_t                  k_b (forward), k_u (reverse)
                        Dilution              c_t → ∅                          λ
                        Translation           n_t a + c_t → p_r + m_t + p_t    v_t
                        Dilution              p_t → ∅                          λ
Metabolic enzyme        Transcription         ∅ → m_m                          w_m
                        Dilution/degradation  m_m → ∅                          λ + d_m
                        Ribosome binding      p_r + m_m ⇌ c_m                  k_b (forward), k_u (reverse)
                        Dilution              c_m → ∅                          λ
                        Translation           n_m a + c_m → p_r + m_m + p_m    v_m
                        Dilution              p_m → ∅                          λ
House-keeping proteins  Transcription         ∅ → m_q                          w_q
                        Dilution/degradation  m_q → ∅                          λ + d_m
                        Ribosome binding      p_r + m_q ⇌ c_q                  k_b (forward), k_u (reverse)
                        Dilution              c_q → ∅                          λ
                        Translation           n_q a + c_q → p_r + m_q + p_q    v_q
                        Dilution              p_q → ∅                          λ
Internal nutrient       Nutrient import       s → s_i                          v_imp
                        Dilution              s_i → ∅                          λ
Energy molecules        Metabolism            s_i → n_s a                      v_cat
                        Dilution              a → ∅                            λ

In rapidly growing E. coli, it is known that transcription has a
minor role in energy consumption [52]. We therefore model tran-
scription as an energy-dependent process, but with a negligible
impact on the overall energy pool. If wx,max denotes the maximal
transcription rate, the effective transcription rate has the form
\[ w_x = w_{x,\max}\, \frac{a}{\theta_x + a}, \tag{9} \]
for all proteins except house-keeping ones, i.e. x ∈{r, t, m}. We
assume that the transcription of housekeeping mRNAs is subject to
negative autoregulation so as to keep constant expression levels in
various growth conditions:
\[ w_q = w_{q,\max}\, \underbrace{\frac{a}{\theta_q + a}}_{\text{energy-dependent transcription}} \cdot \underbrace{\frac{1}{1 + (p_q/K_q)^{h_q}}}_{\text{negative autoregulation}}. \tag{10} \]
In Eqs. 9 and 10, the parameter θx denotes a transcriptional threshold,
while Kq and hq are regulatory parameters. The differential
equations for the number of mRNAs (mx) are therefore:
\[ \dot{m}_x = w_x - (\lambda + d_m)\, m_x + v_x - k_b\, p_r\, m_x + k_u\, c_x, \tag{11} \]
where x ∈{r, t, m, q}. In Eq. 11, mRNAs are produced through
transcription with rate wx, while mRNAs are lost through dilution λ
and degradation with rate dm. At the same time, mRNAs bind and
unbind with ribosomes, so that the ribosome–mRNA complexes
(cx) follow
\[ \dot{c}_x = -\lambda c_x - v_x + k_b\, p_r\, m_x - k_u\, c_x, \tag{12} \]
where kb and ku are the rate constants of binding and unbinding.
Translation contributes a positive term to Eq. 11 and a negative
term to Eq. 12. The differential equations for protein abundance
are therefore:
\[ \dot{p}_x = v_x - \lambda p_x, \qquad x \in \{t, m, q\}. \tag{13} \]
We note that Eq. 13 applies to all proteins except free ribosomes.
The equation for free ribosomes pr includes an additional term:
\[ \dot{p}_r = v_r - \lambda p_r + \sum_{x \in \{r, t, m, q\}} \left( v_x - k_b\, p_r\, m_x + k_u\, c_x \right). \tag{14} \]
Through Eq. 14 the model accounts for competition among
different mRNAs for free ribosomes, as well as ribosomal autoca-
talysis. Ribosomal transcripts sequester free ribosomes for their
own translation, and the pool of free ribosomes can increase as a
result of translation of new ribosomes and, at the same time, the
release of ribosomes engaged in translation of non-ribosomal
mRNAs.
Finally, it can be shown (details in [35]) that under the assump-
tion of constant average mass, the specific growth rate can be
computed in terms of the total number of ribosomes engaged in
translation:
\[ \lambda = \frac{\gamma(a)}{M} \sum_{x \in \{r, t, m, q\}} c_x, \tag{15} \]
where M is the constant cell mass.
Overall, Eqs. 4–15 constitute the core of the mechanistic
growth model. Equations 8 and 14, in particular, model the avail-
ability of energy and ribosomes, both regarded as cellular resources
shared between metabolism and protein biosynthesis. The model
contains 22 parameters. For E. coli, some parameter values were
mined directly from the literature and others were estimated with
Bayesian inference on published growth data [19, 35]. The parameter
values are shown in Table 2. We note that none of the proteome
components is assumed to be subject to active degradation.
As we shall see in the next sections, the core model can be
extended with gene circuits of varying complexity.

Table 2
Model parameters for an Escherichia coli host, taken from [35]

Parameter        Value                           Parameter           Value
s                10^4 (molecules)                M                   10^8 (aa)
n_r              7459 (aa/molecules)             θ_r                 427 (molecules)
γ_max            1260 (aa/min molecules)         K_γ                 7 (molecules)
v_t              726 (min^-1)                    K_t                 1000 (molecules)
v_m              5800 (min^-1)                   K_m                 1000 (molecules)
w_r,max          930 (molecules/min)             w_m,max, w_t,max    4.14 (molecules/min)
w_q,max          949 (molecules/min)             d_m                 0.1 (min^-1)
K_q              152,219 (molecules)             h_q                 4
θ_q, θ_t, θ_m    4.38 (molecules)                n_q, n_t, n_m       300 (aa/molecules)
k_b              0.0095 (min^-1 molecules^-1)    k_u                 1 (min^-1)
Units of aa correspond to number of amino acids per cell
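
To make the structure of Eqs. 4–15 concrete, the following sketch writes the right-hand side of the core model as a plain Python function. It is an illustration under stated assumptions, not the authors' implementation: the names `PARAMS`, `CLASSES`, and `growth_model_rhs` are hypothetical, parameter values are copied from Table 2, and the nutrient efficiency `n_s` is not listed in Table 2, so the value used below is an assumption. The state passed to the function at the end is arbitrary and serves only to exercise the code.

```python
# Parameter values from Table 2. "n_s" is NOT listed in Table 2: the value
# used here is an assumption made only so that the sketch is complete.
PARAMS = {
    "s": 1e4, "M": 1e8, "n_s": 0.5,
    "gamma_max": 1260.0, "K_gamma": 7.0,
    "v_t": 726.0, "K_t": 1000.0, "v_m": 5800.0, "K_m": 1000.0,
    "K_q": 152219.0, "h_q": 4.0, "d_m": 0.1, "k_b": 0.0095, "k_u": 1.0,
    "w_max": {"r": 930.0, "t": 4.14, "m": 4.14, "q": 949.0},
    "theta": {"r": 427.0, "t": 4.38, "m": 4.38, "q": 4.38},
    "n": {"r": 7459.0, "t": 300.0, "m": 300.0, "q": 300.0},
}
CLASSES = ("r", "t", "m", "q")

def growth_model_rhs(y, p=PARAMS):
    """Return (dy/dt, growth rate) for a state dict y of molecule numbers."""
    a, s_i, p_r = y["a"], y["s_i"], y["p_r"]
    gamma = p["gamma_max"] * a / (p["K_gamma"] + a)                    # Eq. 7
    lam = gamma / p["M"] * sum(y["c_" + x] for x in CLASSES)           # Eq. 15
    v_imp = y["p_t"] * p["v_t"] * p["s"] / (p["K_t"] + p["s"])         # Eq. 5
    v_cat = y["p_m"] * p["v_m"] * s_i / (p["K_m"] + s_i)               # Eq. 5
    v = {x: y["c_" + x] * gamma / p["n"][x] for x in CLASSES}          # Eq. 6
    w = {x: p["w_max"][x] * a / (p["theta"][x] + a) for x in CLASSES}  # Eq. 9
    w["q"] /= 1.0 + (y["p_q"] / p["K_q"]) ** p["h_q"]                  # Eq. 10
    dy = {
        "s_i": v_imp - v_cat - lam * s_i,                              # Eq. 4
        "a": p["n_s"] * v_cat
             - sum(p["n"][x] * v[x] for x in CLASSES) - lam * a,       # Eq. 8
        "p_r": v["r"] - lam * p_r,                                     # Eq. 14
    }
    for x in CLASSES:
        m, c = y["m_" + x], y["c_" + x]
        net_binding = p["k_b"] * p_r * m - p["k_u"] * c
        dy["m_" + x] = w[x] - (lam + p["d_m"]) * m + v[x] - net_binding  # Eq. 11
        dy["c_" + x] = -lam * c - v[x] + net_binding                   # Eq. 12
        dy["p_r"] += v[x] - net_binding                                # sum in Eq. 14
        if x != "r":
            dy["p_" + x] = v[x] - lam * y["p_" + x]                    # Eq. 13
    return dy, lam

# Arbitrary positive state, used only to exercise the function.
state = {"a": 10.0, "s_i": 100.0, "p_r": 50.0,
         "p_t": 1000.0, "p_m": 1000.0, "p_q": 1000.0}
state.update({"m_" + x: 20.0 for x in CLASSES})
state.update({"c_" + x: 30.0 for x in CLASSES})
derivs, lam = growth_model_rhs(state)
```

A function of this shape can be wrapped for a numerical ODE solver; because the rates span several orders of magnitude, a solver designed for stiff systems is likely to be the safer choice.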

3 Modeling Gene Circuits Coupled with Their Host

In this section we discuss how to extend the mechanistic growth
model with heterologous gene circuits. The extended model can be
employed for predicting the impact of genetic parameters, such as
promoter strengths or gene length, on the growth rate of the host
strain and the resulting heterologous expression levels. We first
describe the steps needed to extend the model, and then illustrate
the ideas with a simple model for an inducible gene. This is a simple
example that contains all the elements needed by more complex
circuits.

3.1 Extending the Model with Heterologous Genes

The extension of the model requires three steps:
Step 1: Add New Model Species: First, we include mass balance
equations for the expression of each heterologous gene. This
requires three additional species per gene: the transcript, the
mRNA–ribosomal complex, and the protein, all of which follow
dynamics similar to Eqs. 11–13:
\[
\begin{aligned}
\dot{p}_i^c &= v_i^c - (\lambda + d_p)\, p_i^c, \\
\dot{m}_i^c &= w_i^c - (\lambda + d_m)\, m_i^c + v_i^c - k_{b,i}^c\, p_r\, m_i^c + k_{u,i}^c\, c_i^c, \\
\dot{c}_i^c &= -\lambda c_i^c + k_{b,i}^c\, p_r\, m_i^c - k_{u,i}^c\, c_i^c - v_i^c,
\end{aligned} \tag{16}
\]
where the superscript c denotes heterologous species and the subscript
i denotes the ith heterologous gene. The ribosomal binding
parameters $k_{b,i}^c$ and $k_{u,i}^c$ are specific to each gene and can be used, for
example, to model different ribosomal binding sequences. The
translation rate $v_i^c$ is modeled similarly to that of native genes in
Eq. 6:
\[ v_i^c = \frac{c_i^c}{n_i^c}\, \frac{\gamma_{\max}\, a}{a + K_\gamma}, \tag{17} \]
with $n_i^c$ being the length of the ith circuit protein. Likewise, the
transcription rate is similar to Eq. 9:
\[ w_i^c = w_{\max,i}^c\, \frac{a}{\theta^c + a}\, R_i, \tag{18} \]
where $w_{\max,i}^c$ is the maximal transcription rate. Note that we have
included an additional term $R_i$ to model regulatory interactions by
other genes. Complex circuit connectivities can be modeled by
suitable choices of the function $R_i$. Later in Subheading 4 we
exemplify this with models for transcriptional logic gates.
Step 2: Modify Allocation of Resources: Second, we include the
additional consumption of energy and ribosomes in the model.
Starting from the resource equations in Eqs. 8 and 14, we write:
\[ \dot{a} = n_s v_{\mathrm{cat}} - \sum_{x} n_x v_x - \underbrace{\sum_{i} n_i^c v_i^c}_{\text{energy consumption by foreign genes}} - \lambda a, \tag{19} \]
\[ \dot{p}_r = v_r - \lambda p_r + \sum_{x} \left( v_x - k_b\, p_r\, m_x + k_u\, c_x \right) + \underbrace{\sum_{i} \left( v_i^c - k_{b,i}^c\, p_r\, m_i^c + k_{u,i}^c\, c_i^c \right)}_{\text{consumption of free ribosomes by foreign genes}}. \tag{20} \]
Step 3: Adjust Growth Rate Prediction: Third, we update the
prediction of growth rate in Eq. 15 to include translation of
heterologous genes:
\[ \lambda = \frac{\gamma(a)}{M}\, \underbrace{\Big( \sum_{x} c_x + \sum_{i} c_i^c \Big)}_{\text{ribosomal complexes}}. \tag{21} \]
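
A short consistency check on the extended model, sketched here from Eqs. 12, 16, and 20 rather than taken from the chapter, is to add up all ribosome-containing species. Writing $R_{\mathrm{tot}}$ (a symbol introduced only for this check) for the total number of ribosomes, the binding, unbinding, and translation terms cancel pairwise:

\[
R_{\mathrm{tot}} = p_r + \sum_{x} c_x + \sum_{i} c_i^c,
\qquad
\dot{R}_{\mathrm{tot}} = v_r - \lambda R_{\mathrm{tot}}.
\]

Under these equations, heterologous genes therefore redistribute ribosomes between the free and bound pools without adding a synthesis term; they change the total ribosome number only indirectly, through their effect on $v_r$ and λ.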

3.2 Simulation of an Inducible Gene

Inducible expression systems are widely employed as building
blocks of complex gene circuits. As an example, we consider a
reporter gene (rep) under the control of an inducible promoter,
modeled by the reactions in Table 3.
The model contains mRNAs of the heterologous gene, which
can reversibly bind to free ribosomes of the host, pr. Protein trans-
lation consumes energy (a) and, at the same time, proteins and
other model species are diluted by cell growth. In contrast to native
proteins of the host, however, we assume that heterologous pro-
teins are tagged for degradation by proteases, a strategy often
employed to accelerate protein turnover [53]. This active degrada-
tion is modeled by the parameter dp,rep in Table 3.
We do not explicitly model the molecular mechanism for
induction, as this will depend on the particular implementation of
choice. For example, in the tetR inducible system, the inducer
anhydrotetracycline (aTc) activates gene expression by reversible
binding to the tetracycline repressor tetR, whereas in the lac induc-
ible system, the inducer Isopropyl-β-D-thiogalactoside (IPTG)
binds to allosteric sites of the lac repressor lacR. Instead, we lump
the induction mechanism into an effective transcription rate,
denoted as wrep in Table 3.
Table 3
Reactions for an inducible reporter gene

Species  Process               Reaction                               Rate
REP      Transcription         ∅ → m_rep                              w_rep
         Dilution/degradation  m_rep → ∅                              λ + d_m,rep
         Ribosome binding      p_r + m_rep ⇌ c_rep                    k_b (forward), k_u (reverse)
         Dilution              c_rep → ∅                              λ
         Translation           n_rep a + c_rep → p_r + m_rep + p_rep  v_rep
         Dilution/degradation  p_rep → ∅                              λ + d_p,rep

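Using the general circuit equations (Eqs. 16–18) of Subsection 3.1,
Eq. 16 becomes, for the inducible gene:
\[
\begin{aligned}
\dot{p}_{\mathrm{rep}} &= v_{\mathrm{rep}} - (\lambda + d_{p,\mathrm{rep}})\, p_{\mathrm{rep}}, \\
\dot{m}_{\mathrm{rep}} &= w_{\mathrm{rep}} - (\lambda + d_{m,\mathrm{rep}})\, m_{\mathrm{rep}} + v_{\mathrm{rep}} - k_{b,\mathrm{rep}}\, p_r\, m_{\mathrm{rep}} + k_{u,\mathrm{rep}}\, c_{\mathrm{rep}}, \\
\dot{c}_{\mathrm{rep}} &= -\lambda c_{\mathrm{rep}} + k_{b,\mathrm{rep}}\, p_r\, m_{\mathrm{rep}} - k_{u,\mathrm{rep}}\, c_{\mathrm{rep}} - v_{\mathrm{rep}}.
\end{aligned} \tag{22}
\]
The rate of reporter translation follows from Eq. 17:
\[ v_{\mathrm{rep}} = \frac{c_{\mathrm{rep}}}{n_{\mathrm{rep}}}\, \frac{\gamma_{\max}\, a}{a + K_\gamma}, \tag{23} \]
where nrep is the length of the reporter in amino acids. Likewise, the
transcription rate in Eq. 18 becomes:
\[ w_{\mathrm{rep}} = w_{\max,\mathrm{rep}}\, \frac{a}{\theta^c + a}. \tag{24} \]
Note that in the transcription rate, the regulatory term is $R_i = 1$,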
because the inducible system does not contain any regulatory
interactions.
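
The reporter equations can be sketched as a small function that takes the relevant host quantities as inputs. This is an illustration only: the function name is hypothetical, the default binding constants and degradation rates follow the values quoted in the Fig. 3 caption as read here, `gamma_max` and `K_gamma` come from Table 2, and the defaults for `n_rep` and `theta_c` are assumptions because the text does not state them.

```python
import math

def reporter_rhs(m_rep, c_rep, p_rep, p_r, a, lam, w_max_rep,
                 k_b_rep=1e-2, k_u_rep=1e-2,
                 d_m_rep=math.log(2) / 2.0, d_p_rep=math.log(2) / 4.0,
                 n_rep=300.0, theta_c=4.38,        # assumed values
                 gamma_max=1260.0, K_gamma=7.0):
    """Eqs. 22-24 for one inducible reporter gene (regulatory term R = 1)."""
    w_rep = w_max_rep * a / (theta_c + a)                       # Eq. 24
    v_rep = c_rep / n_rep * gamma_max * a / (a + K_gamma)       # Eq. 23
    net_binding = k_b_rep * p_r * m_rep - k_u_rep * c_rep
    dm = w_rep - (lam + d_m_rep) * m_rep + v_rep - net_binding  # Eq. 22
    dc = -lam * c_rep + net_binding - v_rep                     # Eq. 22
    dp = v_rep - (lam + d_p_rep) * p_rep                        # Eq. 22
    return dm, dc, dp, v_rep, w_rep

# Arbitrary inputs, used only to exercise the function.
dm, dc, dp, v_rep, w_rep = reporter_rhs(
    m_rep=5.0, c_rep=2.0, p_rep=100.0, p_r=50.0, a=10.0, lam=0.01,
    w_max_rep=100.0)
```

To couple the reporter to the host, the same `v_rep` and binding terms would also enter the energy balance (Eq. 19), the free ribosome balance (Eq. 20), and the growth rate (Eq. 21).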
Before simulating the expression of the heterologous protein,
we first need to obtain an estimate for the proteome composition of
the wild-type. This is required to initialize the host–circuit simula-
tions with a physiologically realistic cellular composition. To this
end, we first simulate Eqs. 4–15 for the “wild-type model” until
steady state. The results, summarized in Fig. 3a, show that host
proteins are translated at different rates with most of the translating
ribosomes bound to mRNAs of house-keeping proteins. However,
a sizeable fraction is bound to ribosomal mRNA, highlighting how
the growth model accounts for ribosomal autocatalysis. A closer
look (Fig. 3a, bottom) reveals that translation-engaged ribosomes
account approximately for two-thirds of the total ribosomal frac-
tion in the form of mRNA–ribosomal complexes, with one-third
remaining free.
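
The idea of simulating until steady state can be sketched with a generic helper. The example below uses a fixed-step Runge–Kutta scheme and a one-variable toy system (constant synthesis with first-order dilution) as a stand-in for the host model; the function name, step size, and tolerance are hypothetical choices, and for the full growth model a stiff ODE solver would likely be more appropriate.

```python
def integrate_to_steady_state(rhs, y0, dt=0.01, tol=1e-8, max_steps=1_000_000):
    """Advance dy/dt = rhs(y) with classical RK4 until max|dy/dt| < tol."""
    y = list(y0)
    for _ in range(max_steps):
        k1 = rhs(y)
        if max(abs(k) for k in k1) < tol:
            return y
        k2 = rhs([yi + 0.5 * dt * k for yi, k in zip(y, k1)])
        k3 = rhs([yi + 0.5 * dt * k for yi, k in zip(y, k2)])
        k4 = rhs([yi + dt * k for yi, k in zip(y, k3)])
        y = [yi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    raise RuntimeError("no steady state reached")

# Toy stand-in for the host model: constant synthesis, first-order dilution.
synthesis, dilution = 2.0, 0.5
steady = integrate_to_steady_state(
    lambda y: [synthesis - dilution * y[0]], [0.0])
```

For the toy system the result approaches the ratio of synthesis to dilution; the steady state obtained for the wild-type model would then serve as the initial condition for the host–circuit simulation.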
Next, we simulate heterologous expression using the maximal
transcription rate wmax,rep in Eq. 24 to describe the effect of differ-
ent gene induction strengths. As shown in the dose–response curve
in Fig. 3b, the model predicts that increased induction causes an
increase in expression. We observe, however, that protein expres-
sion reaches a maximum at a critical induction strength and subse-
quently drops sharply for stronger induction. This reflects the
limitations that resource competition imposes on the expression
of a heterologous gene [38].

[Fig. 3 image: (a) pie charts of translation rates and of free vs. bound ribosomes for the wild-type; (b) expression (# of molecules) vs. gene induction (mRNAs/min), with growth rate (% of WT) as inset]

Fig. 3 Simulation of an inducible gene. (a) Steady-state translation rates and ribosomal abundance predicted
for the wild-type Escherichia coli model, parameterized as in Table 2. (b) Predicted steady-state expression of
a heterologous gene for increasing induction strength. The pie charts indicate translation rates and ribosomal
abundance as in the left panel. The inset shows the predicted growth rate, relative to the wild-type. The
induction strength was modeled with the parameter $w_{\max,\mathrm{rep}}$ in Eq. 24. The binding rate constant was set equal
to the dissociation rate constant, so that $k_{b,\mathrm{rep}} = 1 \times 10^{-2}$ min$^{-1}$ molecules$^{-1}$ and $k_{u,\mathrm{rep}} = 1 \times 10^{-2}$ min$^{-1}$.
Transcript and protein half-lives were set to two and four minutes, respectively [5], so that
$d_{m,\mathrm{rep}} = \ln 2 / 2$ min$^{-1}$ and $d_{p,\mathrm{rep}} = \ln 2 / 4$ min$^{-1}$

To understand the main source of the resource limitations, we
use the model to explore the synthesis rates of the various
components of the proteome. Because growth rate is linearly related to the
total rate of translation (Eq. 21), we can draw direct conclusions
for cellular growth as well. As shown in Fig. 3b (inset), the model
predicts a sigmoidal decrease in growth rate for stronger gene
induction. At low induction, expression of the foreign gene is
mostly at the expense of house-keeping proteins, while ribosomes,
transporters, and metabolic enzymes show little decrease. This
suggests that the host can compensate for this load through tran-
scriptional regulation and repartitioning of the proteome (Fig. 3b).
As the induction of the reporter gene increases, circuit mRNAs
dominate the mRNA population, hence increasing the competition
for free ribosomes. Finally, for sufficiently strong induction, ribo-
somal scarcity leads to reduction of all proteins, which in turn leads
to the drop in growth rate observed in Fig. 3b (inset). These results
are in agreement with the widespread conception that ribosomal
availability is a major control node for cellular physiology [19, 54,
55], with depletion of free ribosomes being the main source of
burden for translation of circuit genes [21, 31].

4 Simulation of Transcriptional Logic Gates

There has been substantial interest in gene constructs that mimic
digital electronic circuitry [6, 56, 57]. Cellular logic gates, in
particular, have been used to produce desired behaviors in response
to various inputs such as temperature, pH, and small molecules
[58–60]. Multiple logic gates can be combined to build larger
information-processing circuits with advanced cellular
functions [8].

[Fig. 4 image: circuit diagrams of (a) the NOT gate (input, gene 1, gene 2, output), (b) the AND gate (inputs 1 and 2, genes 1–3, output), and (c) the NAND gate (inputs 1 and 2, genes 1–4, output)]

Fig. 4 Logic gates based on transcriptional regulators. (a) The NOT gate contains two genes connected in
cascade. Repression of gene 2 inverts the input signal. (b) The AND gate contains three genes, in which two
transcriptional activators jointly trigger the expression of a third output gene. (c) The NAND gate contains four
genes and is the composition of an AND and a NOT gate. Circuit connectivities are based on the
implementation by Wang et al. [61]

To illustrate our simulation strategy in more complex circuitry,
here we build host–circuit models for cellular logic gates based on
transcriptional regulators [61]. We first build and simulate the
models for NOT, AND, and NAND gates shown in Fig. 4. To
highlight the power of our approach for circuit design, we then use
the host–circuit models to predict circuit function across the design
space, using combinations of RBS strength and growth media. As
discussed in Subsection 3.1, we model the circuits by adding extra
genes to the growth model and modifying the mass balance and
growth rate equations. We model the circuit connectivity by choosing
suitable regulatory terms $R_i$ in the transcription rates in Eq. 18,
and the gate inputs via the maximal transcription rate $w^c_{\max,i}$.
To compare our host–circuit simulations with those of tradi-
tional models, we built circuit-only models using mass balance
equations for mRNAs and proteins:
$$
\begin{aligned}
\dot{m}^c_i &= w^c_i R_i - (\lambda_{\mathrm{eff}} + d_m)\, m^c_i, \\
\dot{p}^c_i &= k^{\mathrm{eff}}_i m^c_i - (\lambda_{\mathrm{eff}} + d_p)\, p^c_i,
\end{aligned}
\tag{25}
$$
where the subscript $i$ denotes the $i$th circuit gene and we assume a
constant dilution rate, $\lambda_{\mathrm{eff}} = 0.022$ min$^{-1}$, which is equal to the
growth rate predicted by the model for the wild-type with a nutrient
efficiency of $n_s = 0.5$. The effective translation rates are fixed to
$k^{\mathrm{eff}}_1 = k^{\mathrm{eff}}_2 = 16.8$ min$^{-1}$ and $k^{\mathrm{eff}}_3 = 0.61$ min$^{-1}$ for the AND gate,
and $k^{\mathrm{eff}}_1 = k^{\mathrm{eff}}_2 = 13.86$ min$^{-1}$, $k^{\mathrm{eff}}_3 = 0.058$ min$^{-1}$, and
$k^{\mathrm{eff}}_4 = 347$ min$^{-1}$ for the NAND gate. In all cases, we assume that
mRNAs and proteins are actively degraded with rate constants
$d_m = \ln 2 / 2$ min$^{-1}$ and $d_p = \ln 2 / 4$ min$^{-1}$.
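
As a minimal sketch of how the circuit-only model in Eq. 25 behaves, the following Python snippet integrates the two mass balance equations for a single gene with a constant regulatory term and compares the result with the analytical steady state. The dilution and degradation rates are those quoted in the text; the transcription rate of 100 mRNAs/min, the choice of 16.8 min^-1 as translation rate for this single gene, the forward-Euler scheme, and all function names are illustrative assumptions rather than the authors' implementation.

```python
import math

# Rate constants quoted in the text for the circuit-only model (Eq. 25).
LAMBDA_EFF = 0.022        # dilution rate, 1/min
D_M = math.log(2) / 2     # mRNA degradation rate (2 min half-life), 1/min
D_P = math.log(2) / 4     # protein degradation rate (4 min half-life), 1/min


def steady_state(w, k_eff, R=1.0):
    """Analytical steady state of Eq. 25 for a constant regulatory term R."""
    m = w * R / (LAMBDA_EFF + D_M)
    p = k_eff * m / (LAMBDA_EFF + D_P)
    return m, p


def simulate(w, k_eff, R=1.0, t_end=200.0, dt=0.01):
    """Forward-Euler integration of Eq. 25 starting from m = p = 0.

    The integrator and step size are illustrative choices only.
    """
    m, p = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        dm = w * R - (LAMBDA_EFF + D_M) * m
        dp = k_eff * m - (LAMBDA_EFF + D_P) * p
        m, p = m + dt * dm, p + dt * dp
    return m, p


# Hypothetical constitutive gene (R = 1): w = 100 mRNAs/min, k_eff = 16.8 1/min.
m_ss, p_ss = steady_state(100.0, 16.8)
m_sim, p_sim = simulate(100.0, 16.8)
print(f"analytical: m = {m_ss:.1f}, p = {p_ss:.1f} molecules")
print(f"simulated:  m = {m_sim:.1f}, p = {p_sim:.1f} molecules")
```

Because Eq. 25 is linear for a fixed regulatory term, the closed-form steady state is usually enough when only the dose–response is of interest; time integration matters mainly for transients.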

4.1 Host-Aware NOT Gate

The NOT gate contains two genes in cascade, where gene 1 codes
for a transcriptional repressor that inhibits the expression of gene 2;
the circuit diagram is shown in Fig. 4a. We first model the NOT
gate in isolation using Eq. 25. We choose the regulatory functions
$R_i$ as
$$
R_1 = 1, \qquad R_2 = \frac{1}{1 + \left( p^c_1 / K^c_1 \right)^{h}}. \tag{26}
$$
The choice of $R_2$ models the inhibition of gene 2, and different
inhibitory strengths and cooperativity effects can be described
by suitable choices of the threshold $K^c_1$ and Hill coefficient $h$. We fix
$K^c_1 = 250$ molecules and $h = 2$.
As shown in Fig. 5a, the isolated model predicts the expected
circuit function: stronger induction of input gene 1 gradually
suppresses expression of the output protein ($p^c_2$), and strong
induction results in minimal output. In other words, the gate has a
high output only when the input signal is low, in
effect acting as an inverter of the input signal.
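
The inverter behavior of the isolated NOT gate can be sketched from the steady state of Eq. 25 combined with the repression term in Eq. 26. The threshold of 250 molecules and Hill coefficient of 2 are taken from the text; the translation rates and the transcription rate of gene 2 are not given for the NOT gate, so the values below are hypothetical placeholders, as are the function names.

```python
import math

LAMBDA_EFF = 0.022        # dilution rate, 1/min (from the text)
D_M = math.log(2) / 2     # mRNA degradation rate, 1/min
D_P = math.log(2) / 4     # protein degradation rate, 1/min
K1, H = 250.0, 2.0        # threshold (molecules) and Hill coefficient of Eq. 26

# Hypothetical values: the text does not list these for the NOT gate.
K_EFF_1 = 1.0             # effective translation rate of gene 1, 1/min
K_EFF_2 = 1.0             # effective translation rate of gene 2, 1/min
W_2 = 100.0               # transcription rate of gene 2, mRNAs/min


def repression(p1):
    """Regulatory term R2 of Eq. 26."""
    return 1.0 / (1.0 + (p1 / K1) ** H)


def not_gate_output(w1):
    """Steady-state output protein p2 for an input w1 (mRNAs/min)."""
    p1 = K_EFF_1 * w1 / ((LAMBDA_EFF + D_M) * (LAMBDA_EFF + D_P))
    m2 = W_2 * repression(p1) / (LAMBDA_EFF + D_M)
    return K_EFF_2 * m2 / (LAMBDA_EFF + D_P)


# Sweep the input over the range used in Fig. 5 (1 to 10^4 mRNAs/min).
inputs = [10 ** e for e in range(5)]
outputs = [not_gate_output(w) for w in inputs]
for w, p2 in zip(inputs, outputs):
    print(f"input {w:>6} mRNAs/min -> output {p2:12.3f} molecules")
```

Changing the hypothetical translation rate of gene 1 shifts the input level at which the output switches off, but not the monotonically decreasing shape of the response.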
To simulate the host-aware NOT gate, we follow the procedure
outlined in Subsection 3.1. The host-aware simulations shown in
Fig. 5b suggest that the function of the NOT gate remains largely
unaffected by host–circuit interactions. For intermediate input
levels, simulations predict an increase in growth rate of up to
50% with respect to a basal case. Such apparent growth benefit
is a consequence of the circuit architecture (Fig. 4a): an increase in
the input causes a stronger repression of gene 2 and thus relieves
the burden on the host. But since the expression of the repressor
coded by gene 1 also burdens the host, for high inputs the expres-
sion of gene 1 counteracts the growth advantages gained by repres-
sion of gene 2, resulting in an overall drop in growth rate.

[Fig. 5 image: (a) isolated model, output (# molecules) vs. input (mRNAs/min), with the NOT truth table as inset; (b) host-aware model, output (# molecules) and growth rate (% of basal) vs. input (mRNAs/min)]

Fig. 5 Host-aware simulation of a NOT gate. (a) Gate output predicted by a model isolated from the cellular
host. Inset shows the Boolean truth table for the NOT gate. (b) Output and growth rate predictions from the
host-aware model of the NOT gate. Growth rate is normalized to a basal case

4.2 Host-Aware AND Gate

The AND gate comprises two genes that co-activate a third output
gene (Fig. 4b). As built in the original implementation [61], the
promoter for gene 3 is activated only when both the co-dependent
enhancer-binding proteins, encoded by genes 1 and 2, are present
in a heteromeric complex. Consequently, the regulatory functions
for the AND gate are:
$$
R_1 = 1, \qquad R_2 = 1, \qquad
R_3 = \frac{\left( p^c_1 / K^c_1 \right)^{h_1}}{1 + \left( p^c_1 / K^c_1 \right)^{h_1}}
\cdot \frac{\left( p^c_2 / K^c_2 \right)^{h_2}}{1 + \left( p^c_2 / K^c_2 \right)^{h_2}}, \tag{27}
$$
with $K^c_1 = 200$ molecules and $h_1 = 2.381$ for the activation by
gene 1, and $K^c_2 = 3000$ molecules and $h_2 = 1.835$ for the
activation by gene 2; these values are similar to the parameter values
estimated in Wang et al. [61].
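
To make the AND logic of Eq. 27 concrete, the sketch below evaluates the regulatory term for gene 3 at low and high levels of the two activator proteins, using the thresholds and Hill coefficients stated above. The protein levels chosen as "low" and "high" and the function names are illustrative assumptions.

```python
# Thresholds (molecules) and Hill coefficients stated for Eq. 27.
K1, H1 = 200.0, 2.381
K2, H2 = 3000.0, 1.835


def hill_activation(p, K, h):
    """Activating Hill function (p/K)^h / (1 + (p/K)^h)."""
    x = (p / K) ** h
    return x / (1.0 + x)


def and_gate_r3(p1, p2):
    """Regulatory term R3 of Eq. 27: product of two Hill activations."""
    return hill_activation(p1, K1, H1) * hill_activation(p2, K2, H2)


# Illustrative protein levels (molecules), chosen far from both thresholds.
LOW, HIGH = 1.0, 1e6
for p1 in (LOW, HIGH):
    for p2 in (LOW, HIGH):
        print(f"p1 = {p1:9.0f}, p2 = {p2:9.0f} -> R3 = {and_gate_r3(p1, p2):.6f}")
```

Because the two activations enter as a product, either input alone is enough to switch the promoter off, which is the behavior summarized by the AND truth table.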
Simulations of the isolated model (Fig. 6a) show that, as
expected, the gate has a high output only when both input signals
are high. This agrees with the expected truth table of the AND gate,
shown in the inset of Fig. 6a. In contrast, simulations of the
host-aware model, shown in Fig. 6b, suggest a strong impact of
host–circuit interactions. The host-aware model predicts a bell-shaped
response surface, where the output reaches a maximal value for an
intermediate level of the inputs, beyond which the output drops
monotonically. This loss of function coincides with a drop in
growth rate observed for increased levels of either input, as seen
in the right panel of Fig. 6b, and thus suggests a link between
growth defects and poor circuit function.

[Fig. 6 image: heatmaps over input 1 vs. input 2 (mRNAs/min); (a) isolated model, output (# molecules ×10^4), with the AND truth table as inset; (b) host-aware model, output (# molecules ×10^2) and growth rate (% of basal)]

Fig. 6 Host-aware simulation of an AND gate. (a) Output predicted by a model isolated from the cellular host.
Inset shows the Boolean truth table for the AND gate. (b) Output and growth rate predictions from the host-aware
model of the AND gate across the input space. Growth rate is normalized to the basal case in the lower left corner
of the heatmap


4.3 Host-Aware NAND Gate

The NAND gate is the negation of an AND gate, and thus produces
a low output only when both inputs are high. As shown in
Fig. 4c, the gate has four genes connected as the composition of an
AND and a NOT gate. As with the previous two cases, we simulate
the isolated model using Eq. 25. The regulatory functions for the
NAND gate are:
$$
\begin{aligned}
R_1 &= 1, \\
R_2 &= 1, \\
R_3 &= \frac{\left( p^c_1 / K^c_1 \right)^{h_1}}{1 + \left( p^c_1 / K^c_1 \right)^{h_1}}
\cdot \frac{\left( p^c_2 / K^c_2 \right)^{h_2}}{1 + \left( p^c_2 / K^c_2 \right)^{h_2}}, \\
R_4 &= \frac{1}{1 + \left( p^c_3 / K^c_3 \right)^{h_3}},
\end{aligned}
\tag{28}
$$
with parameter values for $R_3$ equal to those for $R_3$ of the AND gate
in Eq. 27, and parameter values for $R_4$ equal to those of $R_2$ for the
NOT gate in Eq. 26.
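
The composition of the AND and NOT stages in Eq. 28 can be sketched by chaining the steady states of Eq. 25 from the inputs to the output. The translation rates, thresholds, and Hill coefficients below are those stated in the text; the transcription rates of genes 3 and 4 are borrowed from the host-aware settings listed in the caption of Fig. 8, and using them in the isolated model is an assumption, as are the function names. With these values the repression of gene 4 is only partial, so the sketch illustrates the qualitative ordering of the outputs and is not meant to reproduce Fig. 7a.

```python
import math

LAMBDA_EFF = 0.022        # dilution rate, 1/min (from the text)
D_M = math.log(2) / 2     # mRNA degradation rate, 1/min
D_P = math.log(2) / 4     # protein degradation rate, 1/min

# Effective translation rates stated in the text for the NAND gate (1/min).
K_EFF = {1: 13.86, 2: 13.86, 3: 0.058, 4: 347.0}
K1, H1 = 200.0, 2.381     # activation by gene 1 (as in Eq. 27)
K2, H2 = 3000.0, 1.835    # activation by gene 2 (as in Eq. 27)
K3, H3 = 250.0, 2.0       # repression by gene 3 (as R2 in Eq. 26)

# Assumption: transcription rates taken from the Fig. 8 caption (mRNAs/min).
W3, W4 = 375.0, 250.0


def protein_ss(gene, w, R=1.0):
    """Steady-state protein level of Eq. 25 for transcription rate w."""
    return K_EFF[gene] * w * R / ((LAMBDA_EFF + D_M) * (LAMBDA_EFF + D_P))


def activation(p, K, h):
    """Activating Hill function."""
    x = (p / K) ** h
    return x / (1.0 + x)


def nand_output(w1, w2):
    """Steady-state output p4 of the isolated NAND gate (Eqs. 25 and 28)."""
    p1, p2 = protein_ss(1, w1), protein_ss(2, w2)
    r3 = activation(p1, K1, H1) * activation(p2, K2, H2)
    p3 = protein_ss(3, W3, r3)
    r4 = 1.0 / (1.0 + (p3 / K3) ** H3)
    return protein_ss(4, W4, r4)


LOW, HIGH = 1.0, 1e4      # ends of the input range in Fig. 7 (mRNAs/min)
for w1 in (LOW, HIGH):
    for w2 in (LOW, HIGH):
        print(f"input 1 = {w1:7.0f}, input 2 = {w2:7.0f} -> p4 = {nand_output(w1, w2):.3e}")
```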
As shown in Fig. 7, simulations reveal substantially different
predictions between the isolated and host-aware models of the
NAND gate. The host-aware model predicts a complex relation
between inputs and output that differs from the ideal response
predicted by the isolated model. Host-aware simulations produce
the correct response across a range of the input space (Fig. 7b), but
display significant distortions possibly caused by the loss-of-func-
tion of the AND component shown in Fig. 6b. The impact of host–
circuit interactions can also be observed in the predicted growth
rate, which suggests a growth advantage for intermediate levels of
the inputs. This is a result of the architecture of the NOT gate, akin
to what we observed in Fig. 5b.

[Fig. 7 image: heatmaps over input 1 vs. input 2 (mRNAs/min); (a) isolated model, output (# molecules ×10^4), with the NAND truth table as inset; (b) host-aware model, output (# molecules ×10^4) and growth rate (% of basal)]

Fig. 7 Host-aware simulation of a NAND gate. (a) Output predicted by a model isolated from the cellular host.
Inset shows the Boolean truth table for the NAND gate. (b) Output and growth rate predictions from the host-aware
model of the NAND gate across the input space. Growth rate is normalized to the basal case in the lower left corner
of the heatmap


4.4 Impact of Design Parameters on Circuit Function

In this final section, we conduct a series of simulations that mimic
experiments commonly used in circuit design. These aim to explore
the impact of design parameters and growth media on circuit
function.

4.4.1 Ribosomal Binding Sites (RBS)

A number of studies have shown that RBS strength is a key
modulator of cellular burden [21, 29–31]. Here we examine the impact
of RBS strengths on the AND and NAND gates from the previous
section. Using the notation in our model, see e.g. Eq. 16, we define
the RBS strength as:
$$
\mathrm{RBS}_i = \frac{k^c_{b,i}}{k^c_{u,i}}, \tag{29}
$$
where $k^c_{b,i}$ is the mRNA–ribosome binding rate constant (in units of
min$^{-1}$ molecules$^{-1}$), and $k^c_{u,i}$ is the corresponding dissociation rate constant
(in units of min$^{-1}$).
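
As a small worked example of Eq. 29, the snippet below computes the RBS strengths for the three pairs of rate constants listed in the caption of Fig. 8. The exponents are read as negative, an assumption consistent with the caption's statement that the binding constant increases while the dissociation constant decreases across the three designs; the variable and function names are illustrative.

```python
# Exponents (base 10) of the rate constants in the Fig. 8 caption,
# read as negative values (assumption, see the lead-in above).
BINDING_EXPONENTS = (-2.0, -1.5, -1.155)        # k_b, 1/(min * molecule)
DISSOCIATION_EXPONENTS = (-2.0, -2.5, -2.855)   # k_u, 1/min


def rbs_strength(k_b, k_u):
    """RBS strength as defined in Eq. 29: binding over dissociation."""
    return k_b / k_u


strengths = [
    rbs_strength(10 ** b, 10 ** u)
    for b, u in zip(BINDING_EXPONENTS, DISSOCIATION_EXPONENTS)
]
basal = strengths[0]
for b, u, s in zip(BINDING_EXPONENTS, DISSOCIATION_EXPONENTS, strengths):
    print(f"k_b = 10^{b}, k_u = 10^{u} -> RBS = {s:.2f} ({s / basal:.1f}x basal)")
```

Under this reading, the three designs correspond to roughly 1-, 10- and 50-fold the basal RBS strength, obtained by moving both rate constants in opposite directions by the same factor.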
We simulated the AND and NAND gates with variable RBS
strengths and gene induction strengths. As shown in Fig. 8a (left),
the AND gate retains its function for increasing RBS strength. For
the same induction, designs with stronger RBS lead to increased
circuit yield. At the same time, the simulations
predict (Fig. 8a, left) a larger bell-shaped response surface,
suggesting that increasing RBS strength slightly enlarges the design
space and allows the output to reach a larger maximal value for the same
range of inputs. In all cases, however, after the output reaches a
maximal value, we find a monotonic drop in circuit yield. This loss
of function coincides with a drop in growth rate observed in all
designs (Fig. 8a, right), which becomes more pronounced with
stronger RBS.
As shown in Fig. 8b, the impact of RBS strength is more notable for the
NAND gate. For designs with stronger RBS (insets Fig. 8b, left)
but weak induction, the gate behaves much like the
basal case. For intermediate induction, increasing RBS strength has
more detrimental effects on the circuit’s function. Specifically, the
NOT component fails to fully repress the AND component, thus
distorting the region where the circuit is functional. A further
increase in RBS strength greatly impairs the system, leading to
near-total loss of function across the entire response surface (insets
Fig. 8b, left). Likewise, for stronger RBS and intermediate levels
of the input, we observe loss of the growth advantage gained by the
NOT gate component (Fig. 8b, right).

[Fig. 8 image: heatmaps over input 1 vs. input 2 (mRNAs/min) of output (# molecules ×10^2) and growth rate (% of basal) for (a) the AND gate and (b) the NAND gate, with insets for RBS ×10 and RBS ×50]

Fig. 8 Impact of ribosomal binding site (RBS) strength. (a) Output and growth rate predictions for the AND gate
in Fig. 4b and three RBS strengths. (b) Output and growth rate predictions for the NAND gate in Fig. 4c. RBS
strengths were computed from Eq. 29 by simultaneously increasing the binding rate constant
$k^c_{b,i} \in \{10^{-2}, 10^{-1.5}, 10^{-1.155}\}$ and decreasing the dissociation rate constant $k^c_{u,i} \in \{10^{-2}, 10^{-2.5}, 10^{-2.855}\}$
in a pairwise manner for $i = 3$ (AND gate) and $i = 4$ (NAND gate). Gene induction strengths were varied in the
range $10^0 \le w^c_{\max,i} \le 10^4$ mRNAs/min for $i = 1, 2$ in both gates, and fixed to $w^c_{\max,3} = 375$ mRNAs/min for the
AND gate, and $w^c_{\max,3} = 375$ mRNAs/min and $w^c_{\max,4} = 250$ mRNAs/min for the NAND gate

4.4.2 Nutrient Quality

Bacterial growth is known to depend critically on the quality of the
growth media. As a final illustration of our approach, we used the
host-aware models to explore the impact of media on the function
of the transcriptional logic gates. We model the quality of the media
via the nutrient efficiency parameter $n_s$ in Eqs. 4 and 19, which
determines the energy yield per molecule of internalized nutrient.
Our simulations suggest that nutrient quality affects the
quantity of output, but not the specific response of the AND gate
(Fig. 9a). As the quality of the growth medium improves, the
gene expression capacity of the host increases and, as a result, we
observe an increase in the operational range of the circuit.
However, this is not the case for the NAND gate, which displays a more
complex behavior for low nutrient quality. As seen in Fig. 9b, richer
media improve the function of the gate, compared to the basal case
(Fig. 7a). This is because an increase in nutrient quality improves
the output of the gate’s AND component, which in turn leads to a
stronger input for the NOT component, and hence stronger
repression. In contrast, poor nutrient quality leads to loss of
function for the circuit. As observed in Fig. 9a, poorer media
correspond to significantly decreased expression of the AND gate,
which is also true for the AND component of the NAND gate. This
translates to very weak input for the NOT component, which in
turn does not properly repress gene 4 (Fig. 4c), resulting in the loss
of gate functionality (Fig. 9b).

[Fig. 9 image: heatmaps over input 1 vs. input 2 (mRNAs/min) of output (# molecules ×10^2) for (a) the AND gate and (b) the NAND gate at different values of the nutrient quality parameter]

Fig. 9 Impact of growth media on circuit function. (a) Simulations of the AND gate in Fig. 4b in various growth
media. (b) Simulations of the NAND gate in Fig. 4c in various growth media. In both cases the nutrient quality
parameter was set to $n_s \in \{0.2, 0.6, 1.0\}$; all other model parameters are identical to the simulations in Figs. 6
and 7b

5 Discussion

In this chapter we discussed host-aware modeling in Synthetic
Biology. Starting from the three bacterial growth laws, we
presented a deterministic model to simulate the dynamics of a bacterial
host [35]. We showed how to incorporate synthetic gene circuits
into the host model, and used this methodology to simulate host-
aware versions of various gene circuits. Finally, we examined the
impact of host–circuit interactions on the gates, for combinations
of inputs, RBS strength, and growth media of different nutrient
quality.
While we focused on host–circuit competition for energy and
free ribosomes, in practice gene circuits also consume other com-
ponents that may become resource bottlenecks, such as RNA poly-
merases and σ-factors for transcription, or amino acids and tRNAs
for translation. Molecular species associated with these processes
can be readily incorporated into the growth model. For instance,
instead of a single energy resource $a$, the catabolism of the
internalized nutrient $s_i$ by the metabolic protein $p_m$ could also
produce a pool of amino acids, which would then participate in
the downstream transcription and translation processes. Explicit
models of amino acid pools could be employed to study amino
acid recycling after protein degradation, or global effects such as
upregulation of transcription triggered by nutrient starvation
[36, 62]. Such extensions, however, need to be handled with caution,
since they can increase model complexity and ultimately obscure
the relations between different sources of burden.
A grand goal of Synthetic Biology is to produce target pheno-
types through rational design of gene circuits. As with other engi-
neering disciplines, predictive models are an essential step to
accelerate the design cycle, yet current models in synthetic biology
are largely under-powered for this task. Integrated host–circuit
models can effectively bridge this gap and offer a flexible framework
to account for a wide range of resource bottlenecks. For example,
recent data [63, 64] suggest highly nonlinear relations between
growth rate and heterologous expression and a sizeable burden
caused by metabolic imbalances typically found in pathway engi-
neering [65]. Such findings raise compelling prospects for the
integration of mechanistic cell models with large-scale characteri-
zation data, ultimately paving the way for more robust and predict-
able Synthetic Biology.

References

1. Andrianantoandro E, Basu S, Karig DK, Weiss R (2006) Synthetic biology: new engineering rules for an emerging discipline. Mol Syst Biol 2(1):2006.0028
2. Canton B, Labno A, Endy D (2008) Refinement and standardization of synthetic biological parts and devices. Nat Biotechnol 26(7):787
3. Ninfa AJ, Selinsky S, Perry N, Atkins S, Song QX, Mayo A, Arps D, Woolf P, Atkinson MR (2007) Using two-component systems and other bacterial regulatory factors for the fabrication of synthetic genetic devices. Methods Enzymol 422:488–512
4. Teo JJ, Woo SS, Sarpeshkar R (2015) Synthetic biology: a unifying view and review using analog circuits. IEEE Trans Biomed Circ Syst 9(4):453–474
5. Elowitz MB, Leibler S (2000) A synthetic oscillatory network of transcriptional regulators. Nature 403(6767):335
6. Hasty J, McMillen D, Collins JJ (2002) Engineered gene circuits. Nature 420(6912):224
7. Gardner TS, Cantor CR, Collins JJ (2000) Construction of a genetic toggle switch in Escherichia coli. Nature 403(6767):339
8. Tabor JJ, Salis HM, Simpson ZB, Chevalier AA, Levskaya A, Marcotte EM, Voigt CA, Ellington AD (2009) A synthetic genetic edge detection program. Cell 137(7):1272–1281
9. Mannan AA, Liu D, Zhang F, Oyarzún DA (2017) Fundamental design principles for transcription-factor-based metabolite biosensors. ACS Synth Biol 6:1851–1859
10. Oyarzún DA, Stan G-BV (2013) Synthetic gene circuits for metabolic control: design trade-offs and constraints. J R Soc Interf 10:20120671
11. Nielsen AA, Der BS, Shin J, Vaidyanathan P, Paralanov V, Strychalski EA, Ross D, Densmore D, Voigt CA (2016) Genetic circuit design automation. Science 352(6281):aac7341
12. Chaves M, Oyarzún DA (2019) Dynamics of complex feedback architectures in metabolic pathways. Automatica 99:323–332
13. Carbonell P, Radivojevic T, García Martín H (2019) Opportunities at the intersection of synthetic biology, machine learning, and automation. ACS Synth Biol 8:1474–1477
14. Hughes RA, Ellington AD (2017) Synthetic DNA synthesis and assembly: putting the synthetic in synthetic biology. Cold Spring Harbor Perspect Biol 9:a023812
15. Rondelez Y (2012) Competition for catalytic resources alters biological network dynamics. Phys Rev Lett 108(1):018102
16. Cardinale S, Arkin AP (2012) Contextualizing context for synthetic biology–identifying causes of failure of synthetic biological systems. Biotechnol J 7(7):856–866
17. Gyorgy A, Del Vecchio D (2014) Limitations and trade-offs in gene expression due to competition for shared cellular resources. In: 2014 IEEE 53rd Annual Conference on Decision and Control (CDC), pp. 5431–5436. IEEE, New York (2014)
18. Mather WH, Hasty J, Tsimring LS, Williams RJ (2013) Translational cross talk in gene networks. Biophys J 104(11):2564–2572
19. Scott M, Gunderson CW, Mateescu EM, Zhang Z, Hwa T (2010) Interdependence of cell growth and gene expression: origins and consequences. Science 330(6007):1099–1102
20. Tan C, Marguet P, You L (2009) Emergent bistability by a growth-modulating positive feedback circuit. Nat Chem Biol 5(11):842
21. Ceroni F, Algar R, Stan G-B, Ellis T (2015) Quantifying cellular capacity identifies gene expression designs with reduced burden. Nat Methods 12(5):415
22. An W, Chin JW (2009) Synthesis of orthogonal transcription-translation networks. Proc Natl Acad Sci
23. Segall-Shapiro TH, Meyer AJ, Ellington AD, Sontag ED, Voigt CA (2014) A resource allocator for transcription based on a highly fragmented T7 RNA polymerase. Mol Syst Biol 10(7):742
24. Pasini M, Fernández-Castané A, Jaramillo A, de Mas C, Caminal G, Ferrer P (2016) Using promoter libraries to reduce metabolic burden due to plasmid-encoded proteins in recombinant Escherichia coli. New Biotechnol 33(1):78–90
25. Shopera T, He L, Oyetunde T, Tang YJ, Moon TS (2017) Decoupling resource-coupled gene expression in living cells. ACS Synth Biol 6(8):1596–1604
26. Darlington APS, Kim J, Jiménez JI, Bates DG (2018) Dynamic allocation of orthogonal ribosomes facilitates uncoupling of co-expressed genes. Nat Commun 9:695
27. Rugbjerg P, Sarup-Lytzen K, Nagy M, Sommer MOA (2018) Synthetic addiction extends the productive life time of engineered Escherichia coli populations. Proc Natl Acad Sci 115(10):2347–2352
28. Ceroni F, Boo A, Furini S, Gorochowski TE, Borkowski O, Ladak YN, Awan AR, Gilbert C, Stan G-B, Ellis T (2018) Burden-driven feedback control of gene expression. Nat Methods 15(5):387
29. Gyorgy A, Jiménez JI, Yazbek J, Huang H-H, Chung H, Weiss R, Del Vecchio D (2015) Isocost lines describe the cellular economy of genetic circuits. Biophys J 109(3):639–646
30. Carbonell-Ballestero M, Garcia-Ramallo E, Montañez R, Rodriguez-Caso C, Macía J (2015) Dealing with the genetic load in bacterial synthetic biology circuits: convergences with the Ohm’s law. Nucleic Acids Res 44(1):496–507
31. Gorochowski TE, Avcilar-Kucukgoze I, Bovenberg RA, Roubos JA, Ignatova Z (2016) A minimal model of ribosome allocation dynamics captures trade-offs in expression between endogenous and synthetic genes. ACS Synth Biol 5(7):710–720
32. Karr JR, Sanghvi JC, Macklin DN, Gutschow MV, Jacobs JM, Bolival Jr B, Assad-Garcia N, Glass JI, Covert MW (2012) A whole-cell computational model predicts phenotype from genotype. Cell 150(2):389–401
33. Purcell O, Jain B, Karr JR, Covert MW, Lu TK (2013) Towards a whole-cell modeling approach for synthetic biology. Chaos 23(2):025112
34. Klumpp S, Zhang Z, Hwa T (2009) Growth rate-dependent global effects on gene expression in bacteria. Cell 139:1366–1375
35. Weiße AY, Oyarzún DA, Danos V, Swain PS (2015) Mechanistic links between cellular trade-offs, gene expression, and growth. Proc Natl Acad Sci 112(9):E1038–E1047
36. Liao C, Blanchard AE, Lu T (2017) An integrative circuit–host modelling framework for predicting synthetic gene network behaviours. Nat Microbiol 2(12):1658
37. Thomas P, Terradot G, Danos V, Weiße AY (2018) Sources, propagation and consequences of stochasticity in cellular growth. Nat Commun 9(1):1–11
38. Nikolados E-M, Weiße AY, Ceroni F, Oyarzún DA (2019) Growth defects and loss-of-function in synthetic gene circuits. ACS Synth Biol 8(6):1231–1240
39. O’Brien EJ, Lerman JA, Chang RL, Hyduke DR, Palsson B (2013) Genome-scale models of metabolism and gene expression extend and refine growth phenotype prediction. Mol Syst Biol 9:693
40. Carrera J, Covert MW (2015) Why build whole-cell models? Trends Cell Biol 25(12):719–722
41. Karr JR, Takahashi K, Funahashi A (2015) The principles of whole-cell modeling. Curr Opin Microbiol 27:18–24
42. O’Brien EJ, Monk JM, Palsson BO (2015) Using genome-scale models to predict biological capabilities. Cell 161(5):971–987
43. Monod J (1949) The growth of bacterial cultures. Ann Rev Microbiol 3(1):371–394
44. Schaechter M, Maaløe O, Kjeldgaard NO (1958) Dependency on medium and temperature of cell size and chemical composition during balanced growth of Salmonella typhimurium. Microbiology 19(3):592–606
45. Neidhardt FC, Magasanik B (1960) Studies on the role of ribonucleic acid in the growth of bacteria. Biochim Biophys Acta 42:99–116
46. Dennis PP, Ehrenberg M, Bremer H (2004) Control of rRNA synthesis in Escherichia coli: a systems biology approach. Microbiol Mol Biol Rev 68(4):639–668
47. Maaløe O (1979) Regulation of the protein-synthesizing machinery—ribosomes, tRNA, factors, and so on. In: Biological Regulation and Development, pp. 487–542. Springer, New York (1979)
48. Bremer H, Dennis PP, et al (1996) Modulation of chemical composition and other parameters of the cell by growth rate. EcoSal Cell Mol Biol 2(2):1553–1569
49. Maitra A, Dill KA (2015) Bacterial growth laws reflect the evolutionary importance of energy efficiency. Proc Natl Acad Sci 112(2):406–411
50. Bosdriesz E, Molenaar D, Teusink B, Bruggeman FJ (2015) How fast-growing bacteria robustly tune their ribosome concentration to approximate growth-rate maximization. FEBS J 282(10):2029–2044
51. Molenaar D, Van Berlo R, De Ridder D, Teusink B (2009) Shifts in growth strategies reflect tradeoffs in cellular economics. Mol Syst Biol 5(1):323
52. Russell JB, Cook GM (1995) Energetics of bacterial growth: balance of anabolic and catabolic reactions. Microbiol Mol Biol Rev 59(1):48–62
53. McGinness KE, Baker TA, Sauer RT (2006) Engineering controllable protein degradation. Mol Cell 22(5):701–707
54. Vind J, Sørensen MA, Rasmussen MD, Pedersen S (1993) Synthesis of proteins in Escherichia coli is limited by the concentration of free ribosomes: expression from reporter genes does not always reflect functional mRNA levels. J Mol Biol 231(3):678–688
55. Dong H, Nilsson L, Kurland CG (1995) Gratuitous overexpression of genes in Escherichia coli leads to growth inhibition and ribosome destruction. J Bacteriol 177(6):1497–1504
56. Lim WA (2010) Designing customized cell signalling circuits. Nat Rev Mol Cell Biol 11(6):393
57. Khalil AS, Collins JJ (2010) Synthetic biology: applications come of age. Nat Rev Genet 11(5):367
58. Joshi N, Wang X, Montgomery L, Elfick A, French C (2009) Novel approaches to biosensors for detection of arsenic in drinking water. Desalination 248(1–3):517–523
59. Paitan Y, Biran I, Shechter N, Biran D, Rishpon J, Ron EZ (2004) Monitoring aromatic hydrocarbons by whole cell electrochemical biosensors. Anal Biochem 335(2):175–183
60. Saeidi N, Wong CK, Lo T-M, Nguyen HX, Ling H, Leong SSJ, Poh CL, Chang MW (2011) Engineering microbes to sense and eradicate Pseudomonas aeruginosa, a human pathogen. Mol Syst Biol 7(1):521
61. Wang B, Kitney RI, Joly N, Buck M (2011) Engineering modular and orthogonal genetic logic gates for robust digital-like synthetic biology. Nat Commun 2:508
62. Hartline CJ, Mannan AA, Liu D, Zhang F, Oyarzún DA (2020) Metabolite sequestration enables rapid recovery from fatty acid depletion in Escherichia coli. mBio 11:e03112–e03119
63. Cambray G, Guimaraes JC, Arkin AP (2018) Evaluation of 244,000 synthetic sequences reveals design principles to optimize translation in Escherichia coli. Nat Biotechnol 36(10):1005
64. Borkowski O, Bricio C, Murgiano M, Rothschild-Mancinelli B, Stan GB, Ellis T (2018) Cell-free prediction of protein expression costs for growing cells. Nat Commun 9(1):1457
65. Liu D, Mannan AA, Han Y, Oyarzún DA, Zhang F (2018) Dynamic metabolic control: towards precision engineering of metabolism. J Ind Microbiol Biotechnol 45:535–543

Chapter 14

A Practical Step-by-Step Guide for Quantifying Retroactivity in Gene Networks

Andras Gyorgy

Abstract
One of the fundamental properties of engineered large-scale complex systems is modularity. In synthetic
biology, genetic parts exhibit context-dependent behavior. Here, we describe and quantify a major source of
such behavior: retroactivity. In particular, we provide a step-by-step guide for characterizing retroactivity to
restore the modular description of genetic modules. Additionally, we also discuss how retroactivity can be
leveraged to quantify and maximize robustness to perturbations due to interconnection of genetic modules.

Key words Retroactivity, Gene transcription networks, Modularity, Synthetic biology, Context-
dependence, Model order reduction, Loading

1 Introduction

Modularity greatly simplifies the design and analysis of complex
systems. Although biological systems comprise motifs at the
structural level [2, 34, 42, 53, 57], these modules display
context-dependent behavior [8, 11, 44, 46, 59, 65], hindering the rational
design of large-scale synthetic genetic circuits [22, 38, 51]. As a
result, genetic modules currently need to be re-designed through a
lengthy and ad hoc process every time they are inserted into a
different system [11, 59]. The development of even simple
circuit components thus requires an iterative process in which slight
modifications are tested and then tuned [4, 60], and the (optimal)
characterization of each part is slow and costly [7].
Sources of context-dependence include interactions among
parts due to spatial co-localization [13, 18, 63], dependence on
the host organism and strain [6], growth-dependence [9, 56, 64],
environmental dependence [10, 20, 47, 48, 66], the limited avail-
ability of shared cellular resources [12, 23, 24, 26, 44, 52, 58], and
retroactivity due to the composition of modules [19, 28–30,
33]. Here, we focus on this last source of context-dependence, capturing how a downstream module perturbs the dynamic state of its upstream module in the process of receiving information from the latter [17, 54]. This is illustrated in Fig. 1: addition of the load module affects the upstream (input) module, and as a result, the output of the system as well.

Fig. 1 Experimental demonstration of retroactivity, adapted from [43]. Upon addition of DOX, rtTA binds to the promoter pTET, expressing SKN7m, which then triggers GFP production in the output module

Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229,
https://doi.org/10.1007/978-1-0716-1032-9_14, © Springer Science+Business Media, LLC, part of Springer Nature 2021
Perturbations due to retroactivity can have dramatic effects on
the upstream module’s behavior [5], for instance, by changing the
behavior of a toggle switch [41], one of the most widely used
genetic modules with applications ranging from clocks [50] to
frequency multipliers [14]. Additionally, retroactivity has profound
effects on the robustness of modules [45]. Accounting for it is therefore
essential for the predictable design of complex systems by
combining small modules (e.g., accurate and sensitive biosensors
[3]). Here, we provide a step-by-step quantitative framework to
accurately predict how protein expressions become coupled as a
result of retroactivity. In particular, we demonstrate that the
dynamic effects of loading due to interconnections can be fully
captured via appropriate retroactivity matrices. To this end, we
detail a workflow comprising five major steps:
step 1 derive mathematical model of modules;
step 2 compute internal retroactivity of modules;
step 3 compute external retroactivity of modules;
step 4 compute scaling and mixing retroactivity of modules;
step 5 bound the effects of retroactivity.

The results presented here summarize the main results of [25]; interested readers are encouraged to consult the original publication for more details.

2 Materials

The standard mechanistic model of gene transcription networks includes protein production, decay, and reversible binding reactions between transcription factors (TFs) and promoter sites, required for transcriptional regulation. A genetic module is thus a set of TFs, and modules communicate with each other by having TFs produced in one module regulate the expression of TFs produced in a different module. After introducing the reactions that govern the behavior of gene networks, we present the two main mathematical tools we later use to quantify retroactivity and its effects.

2.1 Biochemical Reactions

The production of TF $x_i$ is regulated by its parents $p_{i,1}, p_{i,2}, \ldots$ (where $p_{i,j} = x_k$ for some $k$): they bind to the promoter of $x_i$, and form complexes $c_{i,1}, c_{i,2}, \ldots$ with the promoter according to

$$c_{i,j} + p_{i,l} \underset{\beta_{i,k,j}}{\overset{\alpha_{i,j,k}}{\rightleftharpoons}} c_{i,k}, \tag{1}$$

where $\alpha_{i,j,k}$ and $\beta_{i,k,j}$ are the corresponding association/dissociation rates. Each of these complexes, in turn, produces $x_i$ with a different rate $\pi_{i,j}$ (incorporating features such as the RBS strength and the promoter strength) according to

$$c_{i,j} \xrightarrow{\pi_{i,j}} c_{i,j} + x_i, \tag{2}$$

where we use a one-step production process (see Note 1) encapsulating both transcription and translation [2]. Finally, we consider external induction and decay of $x_i$ modeled by

$$\emptyset \underset{\delta_i}{\overset{\zeta_i(t)}{\rightleftharpoons}} x_i, \tag{3}$$

where $\delta_i$ denotes the protein decay rate, whereas $\zeta_i(t)$ represents the production rate that may be due to external inputs or perturbations (inducer, noise or disturbance). We also assume that the total concentration of the promoter, denoted by $\eta_i$, for each transcription component is conserved, so that $\eta_i = \sum_{j=0}^{C_i} c_{i,j}$, where $C_i$ is the number of possible complexes formed with the promoter of $x_i$. The concentration $\eta_i$ is proportional to the copy number of plasmids from which the genes are expressed, which can be easily tuned [35].
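As a minimal numerical sketch of reaction 1 and of the promoter conservation law, the snippet below integrates the binding of a single parent to a single promoter with explicit Euler steps while the parent concentration is held fixed. The function name `simulate_binding` and all parameter values are illustrative assumptions, not values taken from the chapter.

```python
# Sketch: one promoter (total concentration eta) bound reversibly by a parent TF p.
# All numerical values below are made up for illustration.

def simulate_binding(p, eta=1.0, alpha=10.0, beta=5.0, dt=1e-3, steps=5000):
    """Euler integration of c0 + p <-> c1 with the parent concentration p held fixed."""
    c0, c1 = eta, 0.0                      # start with an empty promoter
    for _ in range(steps):
        flux = alpha * c0 * p - beta * c1  # association minus dissociation
        c0 -= dt * flux                    # free promoter is consumed by binding
        c1 += dt * flux                    # complex is produced by binding
    return c0, c1

p = 2.0
c0, c1 = simulate_binding(p)
k = 5.0 / 10.0                             # dissociation constant beta / alpha
steady_state = 1.0 * p / (k + p)           # eta * p / (k + p)
print("c0 + c1 =", c0 + c1)
print("c1 =", c1, "expected steady state =", steady_state)
```

Because every step moves the same amount from `c0` to `c1`, the sum `c0 + c1` stays at `eta`, which is the conservation law stated above for a promoter with one bound and one unbound state.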

2.2 Model Order Reduction via Time-Scale Separation

Consider the dynamics

$$\begin{aligned} \dot x &= f(t, x, z, \varepsilon), & x(t_0) &= \xi(\varepsilon),\\ \varepsilon \dot z &= g(t, x, z, \varepsilon), & z(t_0) &= \chi(\varepsilon), \end{aligned} \tag{4}$$

where $\xi(\varepsilon)$ and $\chi(\varepsilon)$ depend smoothly on $\varepsilon$ and $t_0 \in [0, t_1)$, and let $x(t, \varepsilon)$ and $z(t, \varepsilon)$ denote the solution of 4. Furthermore, let $z = h(t, x)$ denote an isolated root of $0 = g(t, x, z, 0)$. In addition to some smoothness properties (see Theorem 11.1 in [32] for technical details), assume that $\dot x = f(t, x, h(t, x), 0)$ with $x(t_0) = \xi(0)$ has a unique solution $\bar x(t)$, and that the origin is an exponentially stable equilibrium point of

$$\frac{dy}{d\tau} = g(t, x, y + h(t, x), 0)$$

with $y := z - h(t, x)$ and $\tau := (t - t_0)/\varepsilon$. Then, there exists a positive constant $\varepsilon^*$ such that for all $\chi(0) - h(t_0, \xi(0))$ and $0 < \varepsilon < \varepsilon^*$, the dynamics in 4 have a unique solution $x(t, \varepsilon)$, $z(t, \varepsilon)$ on $[t_0, t_1]$, such that $x(t, \varepsilon) - \bar x(t) = O(\varepsilon)$. Moreover, for any $t_b > t_0$, there is $\varepsilon^{**} \le \varepsilon^*$ such that $z(t, \varepsilon) - h(t, \bar x(t)) = O(\varepsilon)$ holds uniformly for $t \in [t_b, t_1]$ whenever $\varepsilon < \varepsilon^{**}$. For details, see Theorem 11.1 in [32].
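The O(ε) statement can be illustrated on a toy linear system. The sketch below is an assumption-laden example, not the model of the chapter: it uses $\dot x = -x + z$, $\varepsilon \dot z = -z + 0.5x$, whose quasi-steady state is $h(x) = 0.5x$ and whose reduced model is $\dot x = -0.5x$. Both models are integrated with explicit Euler steps; the function name `simulate` and the step size are hypothetical choices.

```python
# Toy singular perturbation example (illustrative system, not from the chapter).

def simulate(eps, t_end=2.0, dt=1e-4):
    """Return |x_full(t_end) - x_reduced(t_end)| for the toy system."""
    x, z = 1.0, 0.5            # start on the quasi-steady-state manifold z = 0.5 * x
    x_red = 1.0                # reduced model shares the same initial condition
    for _ in range(int(round(t_end / dt))):
        dx = -x + z                      # slow variable
        dz = (-z + 0.5 * x) / eps        # fast variable
        x, z = x + dt * dx, z + dt * dz
        x_red += dt * (-0.5 * x_red)     # reduced dynamics with z replaced by h(x)
    return abs(x - x_red)

err_large = simulate(eps=0.1)
err_small = simulate(eps=0.01)
print("error for eps = 0.1 :", err_large)
print("error for eps = 0.01:", err_small)
```

Shrinking ε by a factor of ten shrinks the approximation error by roughly the same factor, which is the behavior the theorem predicts for the slow variable.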

2.3 Contraction Theory

A system $\dot x = f(x, t)$ is called contracting [40] if there exists a square matrix $\Theta(x, t)$ with the following two properties: (1) $\Theta^T \Theta$ is uniformly positive definite and (2) the symmetric part of the generalized Jacobian

$$J(x, t) := \left(\dot\Theta + \Theta \frac{\partial f}{\partial x}\right)\Theta^{-1}$$

is uniformly negative definite. The absolute value of the largest eigenvalue of the symmetric part of $J$ is called the system’s contraction rate with respect to the metric $\Theta$.
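For intuition, the definition can be checked by hand on a small example. The sketch below assumes a hypothetical two-dimensional linear system $\dot x = Ax$ and the simplest metric $\Theta = I$, in which case the generalized Jacobian is just $A$. The helper `symmetric_part_eigs` and the matrix entries are illustrative assumptions.

```python
import math

def symmetric_part_eigs(J):
    """Eigenvalues of the symmetric part of a 2x2 matrix J, in closed form."""
    a = J[0][0]
    d = J[1][1]
    b = 0.5 * (J[0][1] + J[1][0])          # off-diagonal entry of (J + J^T) / 2
    mean = 0.5 * (a + d)
    radius = math.sqrt((0.5 * (a - d)) ** 2 + b ** 2)
    return mean - radius, mean + radius

# Hypothetical system x' = A x; with Theta = I the generalized Jacobian equals A.
A = [[-2.0, 1.0], [0.0, -3.0]]
lam_min, lam_max = symmetric_part_eigs(A)
is_contracting = lam_max < 0               # symmetric part negative definite
contraction_rate = abs(lam_max)            # absolute value of the largest eigenvalue
print(is_contracting, contraction_rate)
```

For state-dependent Jacobians the same test has to hold uniformly over the state space, and a non-identity $\Theta$ may be needed; this example only covers the constant case.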

3 Methods

To quantify the effect of retroactivity, we next detail a workflow comprising the five major steps outlined in Subheading 1.

3.1 Step 1: Mathematical Model of Modules

When a TF $x_i$ belongs to the module, we call it an internal TF, otherwise it is an external TF. Further, we identify external TFs that are parents to internal TFs as inputs to the module. Consider first a network of $n$ transcription factors and the reactions given in 1–3. Let $x$, $u$, and $c$ denote the concentration vector of internal TFs, inputs, and TF-promoter complexes, respectively.

1. Introduce the reaction flux vector $v$ containing all the reaction rates in the system such that $v$ is partitioned into $r$ and $r^*$, where $r$ is composed of the fast reactions in 1, whereas $r^*$ contains the slow processes in 2–3:

$$r = \begin{pmatrix} \vdots \\ \alpha_{i,j,k}\, c_{i,j}\, p_{i,l}^{m_{i,l}} \\ \beta_{i,k,j}\, c_{i,k} \\ \vdots \end{pmatrix}, \qquad r^* = \begin{pmatrix} \vdots \\ \zeta_i \\ \delta_i x_i \\ \pi_{i,j}\, c_{i,j} \\ \vdots \end{pmatrix}. \tag{5}$$

2. According to [36], write the dynamics of $x$ and $c$ as

$$\begin{pmatrix} \dot c \\ \dot x \end{pmatrix} = \underbrace{\begin{bmatrix} 0 & A \\ B^* & B \end{bmatrix}}_{N_{st}} \underbrace{\begin{pmatrix} r^*(x, c) \\ r(x, c, u) \end{pmatrix}}_{v(x, c, u)},$$

where $N_{st}$ is the stoichiometry matrix (the upper left block matrix is the zero matrix as DNA is not produced/degraded [1]).
3. Once the context of the module is present, represent all the quantities related to the context with an overbar. In this case, the dynamics of the species in the module ($c$ and $x$) and in the context ($\bar c$ and $\bar x$) can be written as

$$\begin{pmatrix} \dot c \\ \dot{\bar c} \\ \dot x \\ \dot{\bar x} \end{pmatrix} = \begin{bmatrix} 0 & 0 & A & 0 \\ 0 & 0 & 0 & \bar A \\ B^* & 0 & B & E \\ 0 & \bar B^* & \bar E & \bar B \end{bmatrix} \begin{pmatrix} r^* \\ \bar r^* \\ r \\ \bar r \end{pmatrix}. \tag{6}$$

Here, the upper left block matrix is zero as DNA is assumed to be a conserved species; the off-diagonal block matrices in the upper right block matrix are zero since $r$ and $\bar r$ encapsulate the binding/unbinding reactions in the module and in its context, respectively; and the off-diagonal block matrices in the lower left block matrix are zero as $r^*$ and $\bar r^*$ encapsulate the production/decay reactions in the module and its context, respectively.
4. Introduce $s := E\bar r$ to describe the effective rate of change of $x$ due to intermodular binding reactions (presence of context), as the stoichiometry matrix $E$ represents how internal TFs of the module participate in binding/unbinding reactions in the context of the module ($\bar E$ can be interpreted similarly).

Milestone 1: With $g(x, c) := B^* r^*(x, c)$ we obtain

$$\begin{aligned} \dot c &= A r(x, c, u),\\ \dot x &= g(x, c) + B r(x, c, u), \end{aligned} \tag{7}$$

which we call the isolated dynamics of a module. Conversely,

$$\begin{aligned} \dot c &= A r(x, c, u),\\ \dot x &= g(x, c) + B r(x, c, u) + s(\bar x, \bar c, \bar u), \end{aligned} \tag{8}$$

is called the connected dynamics of a module.

Insight from Milestone 1: We refer to $s$ as the retroactivity to the output of the module, encompassing retroactivity applied to the module due to the context of the module. Similarly, we call $r$ the retroactivity to the input of a module, representing retroactivity originating inside the module. The above description has two major drawbacks. First, it involves microscopic parameters that are hard to measure, for instance, association rate constants, which limits its practical usability. Second, it fails to provide insights into how retroactivity affects a module’s dynamics, and more importantly, how the dynamics and behavior change once the module is interconnected with other modules as part of a larger system (see Note 2).
3.2 Step 2: Internal Retroactivity

Here we derive the reduced order model of the isolated dynamics of a module when the module has no inputs.

1. The binary matrix $V_i$ has as many columns as the number of TFs in the module, and as many rows as the number of parents of $x_i$, such that its $(j, k)$ element is 1 if the $j$th parent of $x_i$ is $x_k$, otherwise the entry is zero. That is, an entry in the following matrix (rows indexed by the parents of $x_i$, columns by the TFs of the module)

$$V_i = \begin{array}{c|ccc} & x_1 & x_2 & \cdots \\ \hline p_{i,1} & \cdot & \cdot & \cdots \\ p_{i,2} & \cdot & \cdot & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{array}$$

is 1 if the species indexing the corresponding row and column are the same, otherwise the entry is zero, yielding $p_i = V_i x$. Furthermore, let $\Phi$ denote the set of TFs having parents from inside the module.
2. The binary matrix $\Psi_i$ has as many columns as the number of complexes formed with the promoter of $x_i$, and as many rows as the number of parents of $x_i$. That is, the $(j, k)$ element in the following matrix (rows indexed by the parents of $x_i$, columns by the complexes)

$$\Psi_i = \begin{array}{c|ccc} & c_{i,1} & c_{i,2} & \cdots \\ \hline p_{i,1} & \cdot & \cdot & \cdots \\ p_{i,2} & \cdot & \cdot & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{array}$$

is $m$ if the $j$th parent of $x_i$ is bound as an $m$-multimer in $c_{i,k}$ ($m = 0$ if the $j$th parent is not bound).
3. Since $A$ in 7 has a block diagonal structure [36] with blocks $A_i$, we can write $\dot c_i = A_i r_i(p_i, c_i)$ where $r_i(p_i, c_i)$ denotes the reaction flux vector corresponding to reversible binding reactions with the promoter of $x_i$. Let $c_i = \gamma_i(p_i)$ denote the vector of concentrations of complexes with the promoter of $x_i$ at the quasi-steady state, obtained by setting $0 = A_i r_i(p_i, c_i)$, and similarly, let $c = \gamma(x)$ be the locally unique solution of $0 = A r(x, c)$ from 7.
4. Let $\gamma_{i,j}(p_i)$ denote the $j$th entry in $\gamma_i(p_i)$ and $C_i$ the number of complexes with the promoter of $x_i$. Define

$$H_i(p_i) = \sum_{j=0}^{C_i} \pi_{i,j}\, \gamma_{i,j}(p_i) \tag{9}$$

(see Note 3) and introduce

$$h(x) = \begin{pmatrix} \zeta_1 + H_1(p_1) - \delta_1 x_1 \\ \zeta_2 + H_2(p_2) - \delta_2 x_2 \\ \vdots \\ \zeta_N + H_N(p_N) - \delta_N x_N \end{pmatrix}. \tag{10}$$

5. Define the retroactivity $R_i(p_i)$ of TF $x_i \in \Phi$ (see Note 3) as

$$R_i(p_i) = \Psi_i \frac{d\gamma_i(p_i)}{dp_i}. \tag{11}$$

6. Introduce the internal retroactivity of a module as

$$R(x) = \sum_{\{i \,\mid\, x_i \in \Phi\}} V_i^T R_i(p_i) V_i. \tag{12}$$
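The assembly in steps 1 to 6 can be sketched with plain nested lists. The example below assumes a hypothetical two-TF module in which $x_1$ has the parents $(x_1, x_2)$ and $x_2$ has the single parent $x_1$; the entries of `R1` and `R2` are made-up placeholders standing in for node retroactivities evaluated at some state. The helper names are illustrative.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def internal_retroactivity(V_list, R_list, n_tfs):
    """Eq. 12: R = sum_i V_i^T R_i V_i over the TFs with parents inside the module."""
    R = [[0.0] * n_tfs for _ in range(n_tfs)]
    for V, Ri in zip(V_list, R_list):
        R = add(R, matmul(matmul(transpose(V), Ri), V))
    return R

# Rows of V_i index the parents of x_i, columns index the TFs (x1, x2) of the module.
V1 = [[1, 0], [0, 1]]            # parents of x1: x1 and x2
V2 = [[1, 0]]                    # parent of x2: x1
R1 = [[1.0, 0.5], [0.5, 3.0]]    # placeholder 2x2 node retroactivity of x1
R2 = [[2.0]]                     # placeholder scalar node retroactivity of x2
R = internal_retroactivity([V1, V2], [R1, R2], n_tfs=2)
print(R)
```

The scalar `R2` lands in the top-left entry of `R` because the only parent of $x_2$ is $x_1$, so the load created by the promoter of $x_2$ is felt by $x_1$.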

Milestone 2: Let $(c(t), x(t))$ be the solution of the isolated module dynamics 7 with initial condition $(c_0, x_0)$. The solution $\hat x(t)$ of

$$\dot x = [I + R(x)]^{-1} h(x) \tag{13}$$

with initial condition $\hat x(0) = \hat x_0$ well approximates $x(t)$ when $\hat x_0 + x_B(\gamma(\hat x_0)) = x_0 + x_B(c_0)$ where

$$x_B(c) = \sum_{\{i \,\mid\, x_i \in \Phi\}} V_i^T \Psi_i c_i. \tag{14}$$

Insight from Milestone 2: The reduced order dynamics in 13 reveal how internal retroactivity $R$ of the module affects its dynamics (see Note 4). When $R = 0$, we have $\dot x = h(x)$, the commonly used Hill function-based model for gene transcription networks [2]. Moreover, 13 describes how changes in the total concentration of TFs $h(x)$ relate to changes $\dot x$ in the concentration of free TFs. Specifically, to change the concentration of free TFs by one unit, the module has to change the total concentration of TFs by $(I + R)$ units, as $R$ units are “spent on” changing the concentration of bound TFs. Having $R = 0$ implies that the module’s effort on affecting the total concentration of TFs is entirely spent on changing the concentration of free TFs. By contrast, $\|R\| \to \infty$ implies that no matter how much the total concentration of TFs changes, it is not possible to achieve any changes in the free concentration of some of the TFs. Therefore, the internal retroactivity $R$ describes how “stiff” the module is against changes in $x$ due to loading applied by internal connections. The retroactivity $R_i(p_i)$ of each TF can be interpreted similarly.
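A small simulation can make the comparison between 7, 13, and the Hill-type model concrete. The sketch below assumes the simplest possible module: a single TF that binds as a monomer to promoter sites inside the module and is produced only through the external rate ζ (so that $H = 0$). The function names and every parameter value are illustrative assumptions; explicit Euler integration is used for all three models.

```python
def retro(x, eta, k):
    """Single-parent retroactivity for a monomer: derivative of eta * x / (k + x)."""
    return eta * k / (k + x) ** 2

def compare(eta=5.0, alpha=100.0, beta=100.0, zeta=1.0, delta=1.0,
            t_end=5.0, dt=1e-4):
    k = beta / alpha                 # dissociation constant
    x, c = 0.0, 0.0                  # full mechanistic model: free TF and complex
    x_red = 0.0                      # reduced model that keeps retroactivity (Eq. 13)
    x_hill = 0.0                     # model that ignores retroactivity (R = 0)
    n = int(round(t_end / dt))
    max_err, snapshot = 0.0, None
    for step in range(n):
        bind = alpha * (eta - c) * x - beta * c      # net binding flux
        x, c = x + dt * (zeta - delta * x - bind), c + dt * bind
        x_red += dt * (zeta - delta * x_red) / (1.0 + retro(x_red, eta, k))
        x_hill += dt * (zeta - delta * x_hill)
        max_err = max(max_err, abs(x - x_red))
        if step + 1 == n // 10:                      # record the state at t_end / 10
            snapshot = (x, x_red, x_hill)
    return max_err, snapshot

max_err, (x_full, x_reduced, x_no_retro) = compare()
print("max |full - reduced| =", max_err)
print("early values (full, reduced, R = 0):", x_full, x_reduced, x_no_retro)
```

With these made-up numbers the reduced model stays close to the mechanistic one over the whole run, while the model with $R = 0$ rises noticeably faster early on because it does not account for the TF that is sequestered by the promoter sites.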

3.3 Step 3: External Retroactivity

Here, we extend the reduced order model in 13 to the case in which the module has external TFs as inputs.

1. Let $u = (u_1, u_2, \ldots, u_W)^T$ denote the concentration vector of TFs external to the module, and define $\Omega$ as the set of internal TFs having parents from outside the module (that is, external TFs among their parents).
2. The binary matrix $D_i$ has as many columns as the number of inputs of the module, and as many rows as the number of parents of $x_i$, such that its $(j, k)$ element is 1 if the $j$th parent of $x_i$ is $u_k$, otherwise the entry is zero. That is, an entry in the following matrix (rows indexed by the parents of $x_i$, columns by the inputs)

$$D_i = \begin{array}{c|ccc} & u_1 & u_2 & \cdots \\ \hline p_{i,1} & \cdot & \cdot & \cdots \\ p_{i,2} & \cdot & \cdot & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{array}$$

is 1 if the species indexing the corresponding row and column are the same, otherwise the entry is zero, yielding $p_i = [\,V_i \;\; D_i\,]\,(x^T \;\; u^T)^T$. Note that in the presence of input $u$, both $h(\cdot)$ and $R(\cdot)$ given in 10 and 12, respectively, depend on $x$ and $u$, as some of the parents of internal TFs are external TFs. Similarly, we now have $r(x, c, u)$ instead of $r(x, c)$.
3. In the presence of input $u$, $R(\cdot)$ given in 12 depends on both $x$ and $u$, as some of the parents of internal TFs are external TFs, so that

$$R(x, u) = \sum_{\{i \,\mid\, x_i \in \Phi\}} V_i^T R_i(p_i) V_i. \tag{15}$$

4. Define the external retroactivity as

$$Q(x, u) = \sum_{\{i \,\mid\, x_i \in \Phi \cap \Omega\}} V_i^T R_i(p_i) D_i. \tag{16}$$
Milestone 3: Let $(c(t), x(t))$ be the solution of the isolated module dynamics 7 with initial condition $(c_0, x_0)$ and with smooth input $u(t)$. The solution $\hat x(t)$ of

$$\dot x = [I + R(x, u)]^{-1}\,[h(x, u) - Q(x, u)\dot u] =: f(x, u, \dot u) \tag{17}$$

with initial condition $\hat x(0) = \hat x_0$ well approximates $x(t)$ when $\hat x_0 + x_B(\gamma(\hat x_0, u(0))) = x_0 + x_B(c_0)$ with $x_B(\cdot)$ defined in 14.

Insight from Milestone 3: The reduced order dynamics in 17 reveal the role that the external retroactivity $Q$ plays. Recall that $h(x, u) = 0$ implies that the total concentrations of internal TFs are constant from 10. In this case, 17 reduces to $\dot x = -(I + R)^{-1} Q \dot u$, where $x$ is the concentration vector of free internal TFs. This means that the concentrations of free internal TFs can still be changed subsequent to changes in the external TFs (input), despite the fact that the total concentration (free and bound) of internal TFs remains unaffected. Therefore, $Q$ captures the phenomenon by which external TFs force internal TFs to bind/unbind, for instance, by competing for the same binding sites.
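The competition scenario mentioned above can be worked out in the scalar case. The sketch below assumes one internal TF $x$ and one external TF $u$ that bind competitively, both as monomers, to the same promoter, so that the parent vector is $(x, u)$, $V_i = (1, 0)^T$ and $D_i = (0, 1)^T$, and $R$ and $Q$ are the two entries of the first row of the competitive $R_i$ in Table 1. Function names and parameter values are illustrative assumptions; the finite-difference check only confirms that the closed forms are the derivatives of the bound concentration.

```python
def bound_x(x, u, eta, kx, ku):
    """Quasi-steady-state concentration of x bound to the shared promoter (n = m = 1)."""
    return eta * (x / kx) / (1.0 + x / kx + u / ku)

def r_and_q(x, u, eta, kx, ku):
    """Scalar internal (R) and external (Q) retroactivity for competitive monomers."""
    pref = eta / (1.0 + x / kx + u / ku) ** 2
    R = pref * (1.0 / kx) * (ku + u) / ku      # derivative of bound_x with respect to x
    Q = -pref * (x / kx) * (1.0 / ku)          # derivative of bound_x with respect to u
    return R, Q

def free_tf_rate(x, u, du_dt, eta=10.0, kx=1.0, ku=1.0, h=0.0):
    """Scalar version of Eq. 17: dx/dt = (h - Q * du/dt) / (1 + R)."""
    R, Q = r_and_q(x, u, eta, kx, ku)
    return (h - Q * du_dt) / (1.0 + R)

x, u, eta, kx, ku = 1.0, 1.0, 10.0, 1.0, 1.0
R, Q = r_and_q(x, u, eta, kx, ku)
d = 1e-6
R_fd = (bound_x(x + d, u, eta, kx, ku) - bound_x(x - d, u, eta, kx, ku)) / (2 * d)
Q_fd = (bound_x(x, u + d, eta, kx, ku) - bound_x(x, u - d, eta, kx, ku)) / (2 * d)
rate = free_tf_rate(x, u, du_dt=1.0)
print(R, R_fd, Q, Q_fd, rate)
```

Here $Q$ is negative, so an increasing input ($\dot u > 0$) raises the free concentration of $x$ even though $h = 0$: the external TF displaces the internal TF from the shared sites.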

3.4 Step 4: Scaling and Mixing Retroactivity

Next, consider the interconnection of the module together with its context.

1. The binary matrix $U$ has as many rows as the number of inputs of the module, and as many columns as the number of TFs in the context, such that its $(j, k)$ element is 1 if the $j$th input of the module is the $k$th internal TF of the context ($u_j = \bar x_k$), otherwise the entry is zero. That is, an entry in the following matrix (rows indexed by the inputs of the module, columns by the TFs of the context)

$$U = \begin{array}{c|ccc} & \bar x_1 & \bar x_2 & \cdots \\ \hline u_1 & \cdot & \cdot & \cdots \\ u_2 & \cdot & \cdot & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{array}$$

is 1 if the species indexing the corresponding row and column are the same, otherwise the entry is zero, yielding $u = U \bar x$. Define $\bar U$ similarly for the context, yielding $\bar u = \bar U x$.
2. Define the scaling retroactivity of the module as

$$S(x, \bar x) = \sum_{\{i \,\mid\, x_i \in \Omega\}} [D_i U]^T R_i(p_i) D_i U. \tag{18}$$

3. Define the mixing retroactivity of the module as

$$M(x, \bar x) = \sum_{\{i \,\mid\, x_i \in (\Phi \cap \Omega)\}} [D_i U]^T R_i(p_i) V_i. \tag{19}$$

4. Define the scaling retroactivity of the context as

$$\bar S(x, \bar x) = \sum_{\{i \,\mid\, \bar x_i \in \bar\Omega\}} [\bar D_i \bar U]^T \bar R_i(\bar p_i) \bar D_i \bar U. \tag{20}$$

5. Define the mixing retroactivity of the context as

$$\bar M(x, \bar x) = \sum_{\{i \,\mid\, \bar x_i \in (\bar\Phi \cap \bar\Omega)\}} [\bar D_i \bar U]^T \bar R_i(\bar p_i) \bar V_i. \tag{21}$$

6. Introduce $x' := (x^T \;\; \bar x^T)^T$ and $c' := (c^T \;\; \bar c^T)^T$ together with

$$r'(x', c') = \begin{pmatrix} r(x, c, U \bar x) \\ \bar r(\bar x, \bar c, \bar U x) \end{pmatrix}, \qquad A' = \begin{bmatrix} A & 0 \\ 0 & \bar A \end{bmatrix}, \tag{22}$$

and let $c' = \gamma'(x')$ be an isolated root of $0 = A' r'(x', c')$.
Milestone 4: Let $(c'(t), x'(t))$ be the solution of the dynamics 6 with initial condition $(c'_0, x'_0)$. The solution $\hat x'(t)$ of

$$\begin{pmatrix} \dot x \\ \dot{\bar x} \end{pmatrix} = \begin{bmatrix} I + (I + R)^{-1} \bar S & (I + R)^{-1} \bar M \\ (I + \bar R)^{-1} M & I + (I + \bar R)^{-1} S \end{bmatrix}^{-1} \underbrace{\begin{pmatrix} f(x, U \bar x, U \dot{\bar x}) \\ \bar f(\bar x, \bar U x, \bar U \dot x) \end{pmatrix}}_{\text{isolated dynamics of the module and of its context}} \tag{23}$$

with initial condition $\hat x'(0) = \hat x'_0$ well approximates $x'(t)$ when $\hat x'_0 + x'_B(\gamma'(\hat x'_0)) = x'_0 + x'_B(c'_0)$ where

$$x'_B(c') = \begin{pmatrix} \sum_{\{i \,\mid\, x_i \in \Phi\}} V_i^T \Psi_i c_i \\ \sum_{\{i \,\mid\, \bar x_i \in \bar\Phi\}} \bar V_i^T \bar\Psi_i \bar c_i \end{pmatrix}.$$

Insight from Milestone 4: The reduced order dynamics in 23 describe how the dynamics of the module and that of the context change upon interconnection, as 23 relates the connected dynamics to the isolated dynamics through the internal, scaling, and mixing retroactivity matrices. First, zero matrices $S$, $M$, $\bar S$, and $\bar M$ lead to no alteration in the dynamics upon interconnection. When $\bar M = 0$, the dynamics of the module after interconnection become

$$\dot x = \left[I + (I + R)^{-1} \bar S\right]^{-1} \underbrace{f(x, U \bar x, U \dot{\bar x})}_{\text{isolated dynamics of the module}}, \tag{24}$$

that is, $\bar S$ determines how the isolated dynamics of the module get “scaled” upon interconnection. Complementing this effect, the dynamics of the context enter into the module’s dynamics through the mixing retroactivity $\bar M$ of the context, referring to the “mixing” of the dynamics of the module and that of its context. When $\bar M \neq 0$, a perturbation applied in the context can result in a response in the upstream module, even without TFs in the context regulating TFs in the module, leading to a counter-intuitive transmission of signals from downstream (context) to upstream (module).
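To see the scaling and mixing effects side by side, Eq. 23 can be solved explicitly when the module and the context each contain a single TF, so that every block is a scalar. The sketch below is only an illustration under that assumption; the function name `connected_rates` and all numerical values are hypothetical and are not meant to correspond to a specific circuit.

```python
def connected_rates(f, f_ctx, R, R_ctx, S, S_ctx, M, M_ctx):
    """Solve Eq. 23 for one module TF and one context TF (all blocks are scalars).

    f, f_ctx : isolated dynamics of the module and of the context
    R, R_ctx : internal retroactivities
    S, M     : scaling and mixing retroactivity of the module
    S_ctx, M_ctx : scaling and mixing retroactivity of the context
    """
    a11 = 1.0 + S_ctx / (1.0 + R)
    a12 = M_ctx / (1.0 + R)
    a21 = M / (1.0 + R_ctx)
    a22 = 1.0 + S / (1.0 + R_ctx)
    det = a11 * a22 - a12 * a21
    dx = (a22 * f - a12 * f_ctx) / det          # first row of the 2x2 inverse
    dx_ctx = (a11 * f_ctx - a21 * f) / det      # second row of the 2x2 inverse
    return dx, dx_ctx

# Pure scaling: with M_ctx = 0 the module dynamics are scaled as in Eq. 24.
dx_scaled, _ = connected_rates(f=1.0, f_ctx=0.0, R=1.0, R_ctx=0.0,
                               S=0.0, S_ctx=2.0, M=0.0, M_ctx=0.0)
# Mixing: with M_ctx != 0 the context dynamics leak into the module even if f = 0.
dx_mixed, _ = connected_rates(f=0.0, f_ctx=1.0, R=1.0, R_ctx=0.0,
                              S=0.0, S_ctx=2.0, M=0.0, M_ctx=0.5)
print(dx_scaled, dx_mixed)
```

In the first call the result equals $f\,(1 + R)/(1 + R + \bar S)$, the scalar form of 24; in the second call the module responds although its own isolated dynamics are zero.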
3.5 Step 5: Error Due to Retroactivity

Here, we provide three distinct ways to quantify the measure of disturbance on the module dynamics due to retroactivity from its context when parameter values are known (see Note 5). For simplicity, we focus on the case when $\bar M = 0$. Let

$$\dot x = f(x, u, \dot u) \tag{25}$$

denote the dynamics of the module in isolation from 17. Once the module is connected to its context, its dynamics change according to

$$\dot x = \left[I + (I + R)^{-1} \bar S\right]^{-1} f(x, u, \dot u) \tag{26}$$

from 24. Let $x(t)$ and $\tilde x(t)$ denote the solution of 25 and 26, respectively, with identical initial conditions.

1. Introduce

$$\mu(x, u) := \left\| \left[I + (I + R)^{-1} \bar S\right]^{-1} - I \right\|_2. \tag{27}$$

2. If they exist, define $\hat l$, $\hat f$, and $\hat\mu$ such that (i) $f(x, u, \dot u)$ has Lipschitz constant $\hat l$, (ii) $\|f(x, u, \dot u)\|_2 \le \hat f$, and (iii) $\mu(x, u) \le \hat\mu$.
3. Let $\sigma_{\min}(I + R)$ denote the smallest singular value of $(I + R)$, and similarly, let $\sigma_{\max}(\bar S)$ stand for the greatest singular value of $\bar S$, and define

$$\hat\mu = \max_{x, \bar x} \frac{\sigma_{\max}(\bar S)}{\sigma_{\min}(I + R) - \sigma_{\max}(\bar S)}$$

provided that $\sigma_{\max}(\bar S) < \sigma_{\min}(I + R)$.
4. If the system 25 is contracting [40] with rate $\lambda > 0$ and metric transformation $\Theta(x, t)$, then denote by $\kappa(x, t)$ the condition number of $\Theta(x, t)$, and let $\hat\kappa \ge 0$ such that $\hat\kappa \ge \kappa(x, t)$.

Milestone 5: The change in dynamics of a module due to retroactivity from its context is bounded according to

$$\frac{\| f(x, u, \dot u) - \tilde f(x, u, \dot u) \|_2}{\| f(x, u, \dot u) \|_2} \le \mu(x, u), \tag{28}$$

where $\tilde f$ denotes the right-hand side of 26. Similarly, the difference between trajectories of 25 and 26 is bounded as

$$\| x(t) - \tilde x(t) \|_2 \le \frac{\hat\mu \hat f}{\hat l} \left[ e^{\hat l t} - 1 \right],$$

and also by

$$\| x(t) - \tilde x(t) \|_2 \le \frac{\hat\mu \hat f \hat\kappa}{\lambda}.$$
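The quantities in this step are easy to evaluate numerically once $R$ and $\bar S$ are known. The sketch below does so for hypothetical 2×2 matrices using only closed-form 2×2 linear algebra; the matrix entries, the helper names, and the arguments passed to `trajectory_bounds` are illustrative assumptions.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inverse(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

def singular_values(A):
    """(smallest, largest) singular value of a 2x2 matrix via the eigenvalues of A^T A."""
    At = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]
    G = matmul(At, A)
    mean = 0.5 * (G[0][0] + G[1][1])
    radius = math.sqrt((0.5 * (G[0][0] - G[1][1])) ** 2 + G[0][1] ** 2)
    return math.sqrt(max(mean - radius, 0.0)), math.sqrt(mean + radius)

I2 = [[1.0, 0.0], [0.0, 1.0]]
R = [[3.0, 0.5], [0.5, 3.0]]         # made-up internal retroactivity of the module
S_ctx = [[1.0, 0.0], [0.0, 0.5]]     # made-up scaling retroactivity of the context

I_plus_R = [[I2[i][j] + R[i][j] for j in range(2)] for i in range(2)]
load = matmul(inverse(I_plus_R), S_ctx)
scaled = inverse([[I2[i][j] + load[i][j] for j in range(2)] for i in range(2)])
diff = [[scaled[i][j] - I2[i][j] for j in range(2)] for i in range(2)]

mu = singular_values(diff)[1]                    # Eq. 27 at this particular state
s_min = singular_values(I_plus_R)[0]
s_max = singular_values(S_ctx)[1]
mu_hat = s_max / (s_min - s_max)                 # requires s_max < s_min

def trajectory_bounds(mu_hat, f_hat, l_hat, t, kappa_hat, lam):
    """The Lipschitz-based and the contraction-based bounds on |x(t) - x_tilde(t)|."""
    lipschitz_bound = mu_hat * f_hat / l_hat * (math.exp(l_hat * t) - 1.0)
    contraction_bound = mu_hat * f_hat * kappa_hat / lam
    return lipschitz_bound, contraction_bound

print(mu, mu_hat, trajectory_bounds(mu_hat, 1.0, 1.0, 2.0, 1.0, 2.0))
```

The Lipschitz-based bound grows exponentially in time, whereas the contraction-based bound is constant, so the latter is the more informative one over long horizons whenever the isolated module is contracting.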
Insight from Milestone 5: The above results suggest that the module becomes more robust to interconnection as $\hat\mu$ decreases, for instance, by increasing $\min_{x, \bar x} \sigma_{\min}(I + R)$ or by decreasing $\max_{x, \bar x} \sigma_{\max}(\bar S)$. Such a metric can be used not only in the design of gene transcription networks (low values of $\hat\mu$ lead to modules that behave almost the same when connected or isolated), but also during their analysis, for instance, by enhancing existing partitioning methods based on other measures (e.g., edge betweenness [21], its extension to directed graphs with nonuniform weights [67], round trip distance [61] or retroactivity [55]) with respect to robustness to interconnection. The bounds on the difference in dynamics and trajectories upon interconnection can be used to specify the fan-out of a module [37]: the amount of “load” a module can tolerate while satisfying certain design specifications, such as switching time in the case of a toggle, or period and amplitude in the case of an oscillator.

3.6 Illustrating the Effects of Intermodular Connections

To illustrate both the steps detailed above and the effect of intermodular connections on the dynamics of interconnected modules, we consider first a natural recurring network motif, then a commonly used synthetic genetic module.

Fig. 2 (a) Single input motif. (b) The response time increases with the load. (c) High internal retroactivity counteracts the effect of loading

Example 1: Single-Input Motif: The single-input motif in Fig. 2a is a recurrent motif in gene transcription networks [31, 57]. Here, we show that the dynamic performance (speed) of the module and its robustness to interconnection with its context are not independent, and that this trade-off can be analyzed by focusing on the interplay between the internal retroactivity $R$ of the module and the scaling retroactivity $\bar S$ of the context. Let $\dot x_1 = f(x_1)$ denote the isolated dynamics of the module from 7. Furthermore, we have $\bar D_i = 1$ for $i = 1, 2, \ldots, l$ and $\bar U = 1$, so that $\bar S(x_1) = \sum_{i=1}^{l} \bar R_i(x_1)$ by 20, where $\bar R_i(x_1)$ is the retroactivity of TF $\bar x_i$ in the context. According to 24, the dynamics of the module upon interconnection modify to

$$\dot x_1 = \underbrace{\frac{1 + R(x_1)}{1 + R(x_1) + \bar S(x_1)}}_{\text{effect of the context}} f(x_1) = [1 - \mu(x_1)]\, f(x_1),$$

where $\mu(x_1) = \bar S(x_1) / [1 + R(x_1) + \bar S(x_1)]$. The smaller $\mu(x_1)$, the more robust the module to interconnection. From a design perspective, if speed is a priority, one should choose a strong RBS with a low-copy number plasmid, or alternatively, a promoter with high dissociation constant $k_1$. By contrast, if robustness to interconnection is central, a weak RBS with a high-copy number plasmid (or with low $k_1$) is a better choice. If both speed and robustness to interconnection are desired, other design approaches may be required, such as the incorporation of insulator devices, as proposed in other works [27].
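The trade-off between speed and robustness can be sketched numerically. The example below assumes that $x_1$ binds as a monomer both to promoter sites inside the module (giving the internal retroactivity $R$) and to the promoters of `n_loads` identical targets in the context (giving $\bar S$), and it uses the single-parent expression from Table 1 with $n = 1$. The function names and parameter values are illustrative assumptions.

```python
def node_retroactivity(x, eta, k):
    """Single-parent retroactivity from Table 1 for a monomer (n = 1)."""
    return (eta / k) / (1.0 + x / k) ** 2

def speed_and_mu(x1, eta_module, k_module, eta_load, k_load, n_loads):
    R = node_retroactivity(x1, eta_module, k_module)             # internal retroactivity
    S_ctx = n_loads * node_retroactivity(x1, eta_load, k_load)   # scaling retroactivity of the context
    isolated_speed = 1.0 / (1.0 + R)    # factor multiplying h(x) in the isolated dynamics (Eq. 13)
    mu = S_ctx / (1.0 + R + S_ctx)      # relative change of the dynamics upon interconnection
    return isolated_speed, mu

# Low versus high promoter copy number inside the module (made-up values).
speed_low, mu_low = speed_and_mu(x1=1.0, eta_module=1.0, k_module=1.0,
                                 eta_load=2.0, k_load=1.0, n_loads=3)
speed_high, mu_high = speed_and_mu(x1=1.0, eta_module=20.0, k_module=1.0,
                                   eta_load=2.0, k_load=1.0, n_loads=3)
print("low copy :", speed_low, mu_low)
print("high copy:", speed_high, mu_high)
```

Raising the internal copy number lowers $\mu$ (better robustness to the load) but also lowers the factor $1/(1 + R)$ that sets how fast the isolated module responds, which is the trade-off described above.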
Example 2: Oscillator: The common clock design in Fig. 3a is based on two TFs, one of which is an activator and the other is a repressor [5, 15, 62]. Here, we illustrate that while internal retroactivity acts against sustained oscillations (Fig. 3c), scaling retroactivity of the context promotes them. To see this, note that $V_1 = I$ and $V_2 = [\,1 \;\; 0\,]$, whereas $h(x)$ and $R(x)$ can be constructed by considering $R_1(x_1, x_2)$, $R_2(x_1)$, $H_1(x_1, x_2)$, and $H_2(x_1)$, respectively, in Tables 1 and 2. With this, we write $R_2 = a$ and

$$R_1 = \begin{bmatrix} b & c \\ d & e \end{bmatrix}.$$

Then, we obtain that 13 takes the form

$$\begin{pmatrix} \dot x_1 \\ \dot x_2 \end{pmatrix} = \underbrace{\begin{bmatrix} \dfrac{1 + e}{(1 + a + b)(1 + e) - cd} & -\dfrac{c}{(1 + a + b)(1 + e) - cd} \\[2ex] -\dfrac{d}{(1 + a + b)(1 + e) - cd} & \dfrac{1 + a + b}{(1 + a + b)(1 + e) - cd} \end{bmatrix}}_{[I + R(x)]^{-1}} \underbrace{\begin{pmatrix} H_1(x_1, x_2) - \delta_1 x_1 \\ H_2(x_1) - \delta_2 x_2 \end{pmatrix}}_{h(x)}.$$

Fig. 3 (a) AR-clock. (b) AR-clock with load. (c) Neglecting retroactivity, the isolated AR-clock displays sustained oscillations. (d) When internal retroactivity is accounted for, oscillations are quenched. (e) Oscillations can be restored by loading the repressor, thus increasing the scaling retroactivity

Therefore, the activator and repressor dynamics are slowed down asymmetrically (diagonal terms in $[I + R(x)]^{-1}$) due to internal retroactivity. In particular, in the case when $c, d \ll 1 + e \ll 1 + a + b$, the activator slows down compared to the repressor, quenching the oscillations (Fig. 3d) [16]. To restore sustained oscillations, we have to render the repressor dynamics slower with respect to the activator dynamics by adding extra loading for the repressor (Fig. 3a, right panel) [28]. In this case, we have $R_3(x_2) > 0$ given in Table 1, which, due to 13, will yield the following change in the above dynamics: instead of $e$, we will have $e + R_3 > e$, rendering the dynamics of the repressor slower with respect to the activator dynamics, restoring oscillations (Fig. 3e).
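The closed-form inverse and the effect of the extra load can be checked with a few lines of code. In the sketch below the values of $a$, $b$, $c$, $d$, $e$, and $R_3$ are made-up numbers chosen only to satisfy $c, d \ll 1 + e \ll 1 + a + b$; the function names are illustrative.

```python
def inverse_I_plus_R(a, b, c, d, e):
    """Closed-form inverse of I + R with R = [[a + b, c], [d, e]]."""
    det = (1.0 + a + b) * (1.0 + e) - c * d
    return [[(1.0 + e) / det, -c / det],
            [-d / det, (1.0 + a + b) / det]]

def repressor_to_activator_scaling(a, b, e, R3=0.0):
    """Ratio of the diagonal entries of [I + R]^{-1}: repressor entry over activator entry."""
    return (1.0 + a + b) / (1.0 + e + R3)

a, b, c, d, e = 6.0, 3.0, 0.05, 0.05, 0.5
inv = inverse_I_plus_R(a, b, c, d, e)
I_plus_R = [[1.0 + a + b, c], [d, 1.0 + e]]
# Multiply inv by I + R to confirm that the closed form is indeed the inverse.
product = [[sum(inv[i][k] * I_plus_R[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]

ratio_isolated = repressor_to_activator_scaling(a, b, e)
ratio_loaded = repressor_to_activator_scaling(a, b, e, R3=8.0)   # load on the repressor
print(product, ratio_isolated, ratio_loaded)
```

A ratio well above one means the repressor is slowed down much less than the activator; adding the load $R_3$ brings the ratio back toward one, which is the mechanism the text describes for restoring oscillations.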

4 Notes

1. We treat gene expression as a one-step process, neglecting


mRNA dynamics. This assumption is based on the fact that
mRNA dynamics occur on a time scale much faster than pro-
tein production/decay [2]. Additionally, including mRNA
dynamics is not relevant for the study of retroactivity, and
would yield only minor changes in our results (see [25] for
details).
2. While the most widely used modeling approach employing Hill
functions conceals the effects of retroactivity, the framework
presented here reveals and quantifies these effects. Further-
more, this framework only involves measurable macroscopic
parameters.
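3. For the most common binding types, we provide the expressions of $R_i(p_i)$ and $H_i(p_i)$ in Tables 1 and 2. In particular, if node $x_i$ has no parents, we have that $H_i = \pi_{i,0}\eta_i$ and its node retroactivity is not defined. In the single parent case, node $x_i$ has one parent, $y$, binding as an $n$-multimer with dissociation constant $k_y$. In the case of independent, competitive and cooperative binding, node $x_i$ has two parents, $y$ and $z$, binding as multimers with multimerization factors $n$ and $m$, respectively, together with dissociation constants $k_y$ and $k_z$, respectively. The total concentration of the promoter of $x_i$ is denoted by $\eta_i$. The production rates $\pi_{i,0}$, $\pi_{i,1}$, $\pi_{i,2}$, and $\pi_{i,3}$ correspond to the promoter complexes without parents, with $y$ only, with $z$ only, and with both $y$ and $z$, respectively.

Table 1
Retroactivity $R_i$ of a node for the most common binding types

Binding type: $R_i$

Single parent:
$$R_i = \eta_i \frac{n^2 \dfrac{y^{n-1}}{k_y}}{\left(1 + \dfrac{y^n}{k_y}\right)^2}$$

Independent:
$$R_i = \begin{bmatrix} \eta_i \dfrac{n^2 \frac{y^{n-1}}{k_y}}{\left(1 + \frac{y^n}{k_y}\right)^2} & 0 \\[2ex] 0 & \eta_i \dfrac{m^2 \frac{z^{m-1}}{k_z}}{\left(1 + \frac{z^m}{k_z}\right)^2} \end{bmatrix}$$

Competitive:
$$R_i = \frac{\eta_i}{\left(1 + \frac{y^n}{k_y} + \frac{z^m}{k_z}\right)^2} \begin{bmatrix} \dfrac{n^2 y^{n-1}}{k_y} \dfrac{k_z + z^m}{k_z} & -\dfrac{n y^n}{k_y} \dfrac{m z^{m-1}}{k_z} \\[2ex] -\dfrac{n y^{n-1}}{k_y} \dfrac{m z^m}{k_z} & \dfrac{m^2 z^{m-1}}{k_z} \dfrac{k_y + y^n}{k_y} \end{bmatrix}$$

Cooperative:
$$R_i = \frac{\eta_i}{\left(1 + \frac{y^n}{k_y} + \frac{y^n}{k_y}\frac{z^m}{k_z}\right)^2} \begin{bmatrix} \dfrac{n^2 y^{n-1}}{k_y} \dfrac{k_z + z^m}{k_z} & \dfrac{n y^n}{k_y} \dfrac{m z^{m-1}}{k_z} \\[2ex] \dfrac{n y^{n-1}}{k_y} \dfrac{m z^m}{k_z} & \dfrac{y^n}{k_y} \dfrac{m^2 z^{m-1}}{k_z} \dfrac{k_y + y^n}{k_y} \end{bmatrix}$$

Table 2
Hill function $H_i$ for the most common binding types

Binding type: $H_i$

Single parent:
$$H_i = \eta_i \frac{\pi_{i,0} + \pi_{i,1} \frac{y^n}{k_y}}{1 + \frac{y^n}{k_y}}$$

Independent:
$$H_i = \eta_i \frac{\pi_{i,0} + \pi_{i,1} \frac{y^n}{k_y} + \pi_{i,2} \frac{z^m}{k_z} + \pi_{i,3} \frac{y^n}{k_y}\frac{z^m}{k_z}}{1 + \frac{y^n}{k_y} + \frac{z^m}{k_z} + \frac{y^n}{k_y}\frac{z^m}{k_z}}$$

Competitive:
$$H_i = \eta_i \frac{\pi_{i,0} + \pi_{i,1} \frac{y^n}{k_y} + \pi_{i,2} \frac{z^m}{k_z}}{1 + \frac{y^n}{k_y} + \frac{z^m}{k_z}}$$

Cooperative:
$$H_i = \eta_i \frac{\pi_{i,0} + \pi_{i,1} \frac{y^n}{k_y} + \pi_{i,3} \frac{y^n}{k_y}\frac{z^m}{k_z}}{1 + \frac{y^n}{k_y} + \frac{y^n}{k_y}\frac{z^m}{k_z}}$$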
4. The main technical assumptions are that (a) there is a separa-
tion of time scale between production/degradation of proteins
and the reversible binding reactions between TFs and DNA,
and that (b) the corresponding quasi-steady state is locally
exponentially stable. Assumption (a) is justified by the fact
that gene expression is on the time scale of minutes to hours
while binding reactions are on the time scale of subsecond to
second [2]. Assumption (b) is implicitly made any time Hill
function-based models are used in gene regulatory networks.
5. Since cellular systems are highly stochastic and experience disturbances from many sources, parameter values are uncertain. To handle their effects on the behavior of interconnected components, one can use dissipativity analysis and SOSTOOLS [49], or study the robustness of low-copy and high-copy genetic circuits to noise [39].
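The single-parent entries of Tables 1 and 2 (see Note 3) translate directly into code. The sketch below implements them and checks numerically that $R_i$ equals the derivative of the bound parent concentration, as required by 11; the function names, the test point, and the parameter values are illustrative assumptions.

```python
def hill_single(y, eta, k, n, pi0, pi1):
    """Table 2, single parent: H_i = eta * (pi0 + pi1 * y^n / k) / (1 + y^n / k)."""
    Y = y ** n / k
    return eta * (pi0 + pi1 * Y) / (1.0 + Y)

def retro_single(y, eta, k, n):
    """Table 1, single parent: R_i = eta * n^2 * (y^(n-1) / k) / (1 + y^n / k)^2."""
    return eta * n ** 2 * (y ** (n - 1) / k) / (1.0 + y ** n / k) ** 2

def bound_parent(y, eta, k, n):
    """Parent molecules held by the promoter: n copies of y per occupied promoter."""
    Y = y ** n / k
    return n * eta * Y / (1.0 + Y)

y, eta, k, n = 1.5, 2.0, 3.0, 2
step = 1e-6
finite_diff = (bound_parent(y + step, eta, k, n) - bound_parent(y - step, eta, k, n)) / (2 * step)
R_single = retro_single(y, eta, k, n)
H_no_parent_bound = hill_single(0.0, eta, k, n, pi0=0.3, pi1=4.0)
print(R_single, finite_diff, H_no_parent_bound)
```

At $y = 0$ the Hill function reduces to $\pi_{i,0}\eta_i$, matching the expression given in Note 3 for a node without parents.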

References

1. Akerlund T, Nordstrom K, Bernander R (1995) Analysis of cell size and DNA content in exponentially growing and stationary-phase batch cultures of Escherichia coli. J Bacteriol 177:6791–6797
2. Alon U (2007) Network motifs: theory and experimental approaches. Nat Rev Genet 8(6):450–461
3. Aris H, Borhani S, Cahn D, O’Donnell C, Tan E, Xu P (2019) Modeling transcriptional factor cross-talk to understand parabolic kinetics, bimodal gene expression and retroactivity in biosensor design. Biochem Eng J 144:209–216. https://doi.org/10.1016/j.bej.2019.02.005. http://www.sciencedirect.com/science/article/pii/S1369703X19300452
4. Arpino JAJ, Hancock EJ, Anderson J, Barahona M, Stan GBV, Papachristodoulou A, Polizzi K (2013) Tuning the dials of Synthetic Biology. Microbiol 159(7):1236–1253. https://doi.org/10.1099/mic.0.067975-0
5. Atkinson MR, Savageau MA, Myers JT, Ninfa AJ (2003) Development of genetic circuitry exhibiting toggle switch or oscillatory behavior in Escherichia coli. Cell 113(5):597–607
6. Balagadde FK, You L, Hansen CL, Arnold FH, Quake SR (2005) Long-term monitoring of bacteria undergoing programmed population control in a microchemostat. Science 309(5731):137–140
7. Bandiera L, Hou Z, Kothamachu VB, Balsa-Canto E, Swain PS, Menolascina F (2018) On-line optimal input design increases the efficiency and accuracy of the modelling of an inducible synthetic promoter. Processes 6(9), https://doi.org/10.3390/pr6090148. http://www.mdpi.com/2227-9717/6/9/148
8. Borkowski O, Ceroni F, Stan G, Ellis T (2016) Overloaded and stressed: whole-cell considerations for bacterial synthetic biology. Curr Opin Microbiol 33:123–130. https://doi.org/10.1016/j.mib.2016.07.009
9. Bremer H, Dennis P (1996) Modulation of chemical composition and other parameters of the cell by growth rate in Escherichia coli and Salmonella: cellular and molecular biology. ASM Press, Washington
10. C CM, Nieto JM, S SP, Falconi M, Gualerzi CO, Juarez A (2002) Temperature- and H-NS-dependent regulation of a plasmid-encoded virulence operon expressing Escherichia coli hemolysin. J Bacteriol 184(18):5058–5066
11. Cardinale S, Arkin AP (2012) Contextualizing context for synthetic biology – identifying causes of failure of synthetic biological systems. Biotechnol J 7(7):856–866
12. Ceroni F, Algar R, Stan GB, Ellis T (2015) Quantifying cellular capacity identifies gene expression designs with reduced burden. Nat Methods 12(5):415–418
13. Cox RS, Surette MG, Elowitz MB (2007) Programming gene expression with combinatorial promoters. Mol Syst Biol 3:145
14. Cuba Samaniego C, Franco E (2018) A robust molecular network motif for period-doubling devices. ACS Synth Biol 7(1):75–85. pMID: 29227103. https://doi.org/10.1021/acssynbio.7b00222
15. Danino T, Mondragon-Palomino O, Tsimring L, Hasty J (2010) A synchronized quorum of genetic clocks. Nature 463(7279):326–330
16. Del Vecchio D (2007) Design and analysis of an activator-repressor clock in E. coli. In: Proceedings of the American Control Conference, pp 1589–1594
17. Del Vecchio D, Ninfa AJ, Sontag ED (2008) Modular cell biology: retroactivity and insulation. Nature/EMBO Mol Syst Biol 4:161
18. Du L, Villareal S, Forster AC (2012) Multigene expression in vivo: supremacy of large versus small terminators for T7 RNA polymerase. Biotechnol Bioeng 109(4):1043–1050
19. Franco E, Friedrichs E, Kim J, Jungmann R, Murray R, Winfree E, Simmel FC (2011) Timing molecular motion and production with a synthetic transcriptional clock. Proc Natl Acad Sci 108(40):E787
20. Giladi H, Goldenberg D, Koby S, Oppenheim AB (1995) Enhanced activity of the bacteriophage lambda PL promoter at low temperature. FEMS Microbiol Rev 17(1–2):135–140
21. Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci 99(12):7821–7826
22. Guido NJ, Wang X, Adalsteinsson D, McMillen D, Hasty J, Cantor CR, Elston TC, Collins JJ (2006) A bottom-up approach to gene regulation. Nature 439(7078):856–860
23. Gyorgy A (2018) Sharing resources can lead to monostability in a network of bistable toggle switches. IEEE Control Syst Lett 3(2):308–313. https://doi.org/10.1109/LCSYS.2018.2871128
24. Gyorgy A, Murray RM (2016) Quantifying resource competition and its effects in the TX-TL system. In: 55th IEEE Conference on Decision and Control (CDC), IEEE, pp 3363–3368. https://doi.org/10.1109/CDC.2016.7798775
25. Gyorgy A, Vecchio DD (2014) Modular composition of gene transcription networks. PLoS Comput Biol 10(3):e1003486
26. Gyorgy A, Jiménez JI, Yazbek J, Huang HH, Chung H, Weiss R, Del Vecchio D (2015) Isocost lines describe the cellular economy of genetic circuits. Biophys J 109(3):639–646. https://doi.org/10.1016/j.bpj.2015.06.034
27. Jayanthi S, Del Vecchio D (2011) Retroactivity attenuation in bio-molecular systems based on timescale separation. IEEE Trans Autom Control 56(4):748–761
28. Jayanthi S, Del Vecchio D (2012) Tuning genetic clocks employing DNA binding sites. PLoS One 7(7):e41019
310 Andras Gyorgy

29. Jayanthi S, Nilgiriwala KS, Del Vecchio D acquisition and model-based analysis of cell-
(2013) Retroactivity controls the temporal free transcription–translation reactions from
dynamics of gene transcription. ACS Synth nonmodel bacteria. Proc Natl Acad Sci
Biol 2(8):431–441 https://doi.org/10.1073/pnas.1715806115.
30. Jiang P, Ventura AC, Sontag ED, Merajver SD, http://www.pnas.org/content/early/2018/
Ninfa AJ, Del Vecchio D (2011) Load-induced 04/16/1715806115.full.pdf
modulation of signal transduction networks. 45. Mou S, Del Vecchio D (2015) How retroactiv-
Sci Signal 4(194):ra67 ity impacts the robustness of genetic networks.
31. Kalir S, McClure J, Pabbaraju K, Southward C, In: 2015 54th IEEE Conference on Decision
Ronen M, Leibler S, Surette MG, Alon U and Control (CDC), pp 1551–1556. https://
(2001) Ordering genes in a flagella pathway doi.org/10.1109/CDC.2015.7402431
by analysis of expression kinetics from living 46. Nagaraj VH, Greene JM, Sengupta AM, Son-
bacteria. Science 292(5524):2080–2083 tag ED (2017) Translation inhibition and
32. Khalil HK (2002) Nonlinear systems. Prentice resource balance in the TX-TL cell-free gene
Hall, Upper Saddle River expression system. Synt Biol 2(1):1–7. https://
33. Kim Y, Paroush Z, Nairz K, Hafen E, doi.org/10.1093/synbio/ysx005
Jiménez G, Shvartsman SY (2011) Substrate- 47. Neupert J, Karcher D, Bock R (2008) Design
dependent control of MAPK phosphorylation of simple synthetic RNA thermometers for
in vivo. Mol Syst Biol 7:467 temperature-controlled gene expression in
34. Kirschner MW, Gerhart JC (2006) The plausi- Escherichia coli. Nucleic Acids Res 36(19):e124
bility of life: Resolving Darwin’s dilemma. Yale 48. Perez-Martin J, Espinosa M (1994) Correla-
University Press, New Haven tion between DNA bending and transcriptional
35. Kittleson JT, Cheung S, Anderson JC (2011) activation at a plasmid promoter. J Mol Biol
Rapid optimization of gene dosage in Escheri- 241(1):7–17
chia coli using dial strains. J Biol Eng 5:10 49. Prescott TP, Gyorgy A (2015) Isocost lines
36. Klipp E, Liebermeister W, Wierling C, describe the cellular economy of genetic cir-
Kowald A, Lehrach H, Herwig R (2009) Sys- cuits. In: Proceedings of the IEEE Conference
tems biology: a textbook. Wiley, Hoboken on Decision and Control
37. Kyung KH, Sauro HM (2010) Fan-out in gene 50. Purcell O, di Bernardo M, Grierson CS, Savery
regulatory networks. J Biol Eng 4:16 NJ (2011) A multi-functional synthetic gene
network: a frequency multiplier, oscillator and
38. Lauffenburger DA (2000) Cell signaling path- switch. PLOS One 6(2):1–12. https://doi.
ways as control modules: complexity for sim- org/10.1371/journal.pone.0016140
plicity? Proc Natl Acad Sci 97(10):5031–5033
51. Purnick PEM, Weiss R (2009) The second
39. Lee JW, Gyorgy A, Cameron DE, et al. (2016) wave of synthetic biology: from modules to
Creating single-copy genetic circuits. Mol Cell systems. Nat Rev Mol Cell Biol 10(6):410–422
63(2):329–336. https://doi.org/10.1016/j.
molcel.2016.06.00 52. Qian Y, Huang HH, Jiménez JI, Del Vecchio
D (2017) Resource competition shapes the
40. Lohmiller W, Slotine JJE (1998) On contrac- response of genetic circuits. ACS Synth Biol 6
tion analysis for non-linear systems. Automa- (7):1263–1272. https://doi.org/10.1021/
tica 34(6):683–696 acssynbio.6b00361
41. Lyons SM, Xu W, Medford J, Prasad A (2014) 53. Ravasz E, Somera AL, Mongru DA, Oltvai ZN,
Loads bias genetic and signaling switches in Barabasi AL (2002) Hierarchical organization
synthetic and natural systems. PLoS Comput of modularity in metabolic networks. Science
Biol 10(3):e1003533 297(5586):1551–1555
42. Milo R, Shen-Orr SS, Kashtan N, Chlovskii 54. Saez-Rodriguez J, Kremling A, Gilles ED
DB, Alon U (2002) Network motifs: simple (2005) Dissecting the puzzle of life: modular-
building blocks of complex networks. Science ization of signal transduction networks. Com-
298(5594):824–827 put Chem Eng 29(3):619–629
43. Mishra D, Rivera PM, Lin A, Vecchio DD, 55. Saez-Rodriguez J, Gayer S, Ginkel M, Gilles
Weiss R (2014) A load driver device for engi- ED (2008) Automatic decomposition of
neering modularity in biological networks. Nat kinetic models of signaling networks minimiz-
Biotechnol 32(12):1268–1275 ing the retroactivity among modules. Bioinfor-
44. Moore SJ, MacDonald JT, Wienecke S, matics 24(16):213–219
Ishwarbhai A, Tsipa A, Aw R, Kylilis N, Bell 56. Scott M, Gunderson C, Mateescu E, Zhang Z,
DJ, McClymont DW, Jensen K, Polizzi KM, Hwa T (2010) Interdependence of cell growth
Biedendieck R, Freemont PS (2018) Rapid
A Step-by-Step Guide for Retroactivity in Gene Networks 311

and gene expression: origins and conse- robust and tunable synthetic gene oscillator.
quences. Science 330:1099–1102 Nature 456(7221):516–519
57. Shen-Orr SS, Milo R, Mangan S, Alon U 63. Tamsir A, Tabor JJ, Voigt CA (2011) Robust
(2002) Network motifs in the transcriptional multicellular computing using genetically
regulation network of Escherichia coli. Nat encoded nor gates and chemical ‘wires’. Nature
Genet 31(1):64–68 469(7329):212–215
58. Siegal-Gaskins D, Tuza ZA, Kim J, Noireaux V, 64. Tan C, Marguet P, You L (2009) Emergent
Murray RM (2014) Gene circuit performance bistability by a growth-modulating positive
characterization and resource usage in a cell- feedback circuit. Nat Chem Biol 5
free “Breadboard”. ACS Synth Biol (11):842–848
3:416–425. https://doi.org/10.1021/ 65. Weiße AY, Oyarzún DA, Danos V, Swain PS
sb400203p (2015) Mechanistic links between cellular
59. Slusarczyk AL, Lin A, Weiss R (2012) Founda- trade-offs, gene expression, and growth. Proc
tions for the design and implementation of Natl Acad Sci 112(9):E1038–E1047. https://
synthetic genetic circuits. Nat Rev Genet 13 doi.org/10.1073/pnas.1416533112
(6):406–420 66. Yates EA, Philipp B, Buckley C, Atkinson S,
60. Smanski MJ, Bhatia S, Zhao D, Park Y, Wood- Chhabra SR, Sockett RE, Goldner M,
ruff L BA, Giannoukos G, Ciulla D, Busby M, Dessaux Y, Camara M, Smith H, Williams P
Calderon J, Nicol R, Gordon DB, (2002) N-acylhomoserine lactones undergo
Densmore D, Voigt CA (2014) Functional lactonolysis in a pH-, temperature-, and acyl
optimization of gene clusters by combinatorial chain length-dependent manner during
design and assembly. Nat Biotechnol 32 growth of Yersinia pseudotuberculosis and Pseu-
(12):1241–1249 domonas aeruginosa. Infect Immun 70
61. Sridharan GV, Hassoun S, Lee K (2011) Iden- (10):5635–5646
tification of biochemical network modules 67. Yoon J, Blumer A, Lee K (2006) An algorithm
based on shortest retroactive distances. PLoS for modularity analysis of directed and
Comput Biol 7(11):e1002262 weighted biological networks based on edge-
62. Stricker J, Cookson S, Bennett MR, Mather betweenness centrality. Bioinformatics 22
WH, Tsimring LS, Hasty J (2008) A fast, (24):3106–3108
Chapter 15

Engineering Sensors for Gene Expression Burden


Alice Boo and Francesca Ceroni

Abstract
RNA-seq enables the analysis of gene expression profiles across different conditions and organisms. Gene
expression burden slows down growth, which results in poor predictability of gene constructs and product
yields. Here, we describe how we applied RNA-seq to study the transcriptional profiles of Escherichia coli
when burden is elicited during heterologous gene expression. We then present how we selected early
responsive promoters from our RNA-seq results to design sensors for gene expression burden. Finally, we
describe how we used one of these sensors to develop a burden-driven feedback regulator to improve
cellular fitness in engineered E. coli.

Key words Synthetic construct, Gene expression burden, RNA-seq, Sensor, Feedback

Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229,
https://doi.org/10.1007/978-1-0716-1032-9_15, © Springer Science+Business Media, LLC, part of Springer Nature 2021


1 Introduction

In cell engineering, cells are modified with synthetic constructs to
express molecules of interest. The expression of exogenous proteins
has been shown to cause detrimental physiological changes in the
host cells [1], usually leading to decreased growth and poor yields,
a phenomenon known as cellular burden [2]. Burden can stem from
the specific role of a protein and its interactions with the
intracellular environment [3, 4]. However, recent work in synthetic
biology has provided evidence that gene expression burden is
mainly caused by competition between the host cell and the
synthetic constructs for the intracellular resources needed for gene
expression, with ribosome uptake and energy consumption shown
to play a major role [1, 5, 6]. Burden is not only detrimental to the
cells; it is also the major cause of the poor predictability of the
behavior of synthetic constructs. Understanding the cell’s response
to burden is thus crucial for counteracting it and for identifying
strategies for a more robust design of gene expression devices.
We recently combined multiplex RNA-seq with an in vivo assay
to reveal the major transcriptional changes occurring in E. coli when
a set of inducible synthetic constructs are expressed. We identified
that native promoters related to the heat-shock response activate
rapidly in response to synthetic expression, regardless of the con-
struct in use. We termed these natural biosensors for burden as they
allow early detection of gene expression burden occurring in the
cell. Using these promoters, we built a CRISPR/dCas9-based
feedback regulation system that automatically adjusts synthetic
construct expression in response to burden. Cells equipped with
this general-use controller maintain capacity for native gene expres-
sion to ensure robust growth and outperform unregulated cells in
terms of protein yields in batch production.
The overall workflow is as follows. Cells are transformed with the
constructs of interest and grown in a plate reader, and gene
expression is induced. Cells are harvested 15 and 60 min after
induction. Total RNA is isolated and genomic DNA is removed. rRNA
is then depleted from the extract and the mRNAs are
reverse-transcribed to cDNA. The Illumina Nextera XT kit is used to
prepare the libraries from the cDNA. Identification of early
responsive promoters by analysis of the RNA-seq results leads to
burden sensor design and testing. The htpG1 promoter is selected to
design and build a burden-based biomolecular feedback system. The
feedback adjusts heterologous gene expression levels to mitigate
the effect of burden on the cells.
Burden Sensors for Synthetic Biology 315

2 Materials

2.1 Strains

We used bacterial strains MG1655 (K-12 F- λ- rph-1) and DH10B
(K-12 F- λ- araD139 Δ(araA-leu)7697 Δ(lac)X74 galE15 galK16
galU hsdR2 relA rpsL150(StrR) spoT1 deoR ϕ80dlacZΔM15
endA1 nupG recA1 e14- mcrA Δ(mrr hsdRMS mcrBC)), acquired
from the National BioResource Project Japan. Users should select
the strain of their own interest and apply the materials and methods
for the strain(s) of their choice.

2.2 Molecular Cloning

1. Plasmid DNA isolation, extraction of DNA from agarose gels,
and PCR purification were done using Qiagen kits.
2. All our PCR reactions were carried out using the NEB Phusion
High Fidelity Polymerase and oligonucleotide primers
synthesized by IDT.
3. The burden-responsive promoters were synthesized as gBlocks
by IDT and inserted into the destination vector described below
by restriction cloning using SfiI and PacI. All enzymes
were ordered from NEB.
4. The structure of the plasmids used to test the early
burden-responsive promoters on a plasmid is shown in Fig. 1. The
gene reporting the activity of the burden-responsive promoter,
here sfGFP, can be easily swapped by restriction cloning
using the PacI and BsaI restriction sites.
5. The structure of the burden-driven feedback plasmid used to
regulate heterologous protein production is shown in
Fig. 2. The actuator, here the sgRNA, can be swapped by
restriction cloning using the PacI and AscI restriction sites. The
target site on the sgRNA can also be swapped by inverse
PCR of the feedback plasmid with insertion-encoding
5′-phosphorylated primers, followed by DpnI digestion and religation
before transformation into the strain of interest. This allows
replacement of the target domain, which in our case was
designed to target the araBAD promoter.

Fig. 1 Schematic of the sensor plasmid designed to characterize the promoters
upregulated by burden out of their genomic context

Fig. 2 Feedback plasmid constitutively expressing dCas9. Expression of the
sgRNA is driven by the htpG1 burden promoter sensor, selected from the pool of
upregulated promoters identified by RNA-seq

Table 1
Protocol to make 400 mL of M9 0.4% fructose supplemented with
Casamino acids (M9 minimal media recipe)

Solution                              Volume to add to make 400 mL of M9 medium
Autoclaved distilled water            278 mL
M9 minimal salts (5×)                 80 mL
Thiamine hydrochloride (10 mg/mL)     10 mL
Fructose (10%)                        16 mL
Casamino acids (10%)                  8 mL
MgSO4 (0.1 M)                         8 mL
CaCl2 (0.1 M)                         400 μL

2.3 Medium

The M9 media used in our experiments (Table 1) consisted of M9
minimal salts (5×) supplemented with 0.4% casamino acids,
0.25 mg/mL thiamine hydrochloride, 2 mM MgSO4, 0.1 mM
CaCl2, 0.4% fructose, and the appropriate antibiotic (see Note 1).
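
The stock volumes in a recipe of this kind follow from the dilution relation C_stock × V_stock = C_final × V_total. The Python sketch below reproduces three of the stock volumes listed in Table 1 from the final concentrations quoted above. The function name `stock_volume_ml` is illustrative only and is not part of the original protocol.

```python
def stock_volume_ml(stock_conc, final_conc, total_volume_ml):
    """Volume of stock (mL) needed to reach final_conc in total_volume_ml.

    stock_conc and final_conc must be expressed in the same unit.
    """
    if stock_conc <= 0 or final_conc < 0:
        raise ValueError("stock must be positive and final must be non-negative")
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_conc * total_volume_ml / stock_conc


# Three rows of the recipe for 400 mL of medium
fructose_ml = stock_volume_ml(10.0, 0.4, 400.0)   # 10% stock, 0.4% final
mgso4_ml = stock_volume_ml(100.0, 2.0, 400.0)     # 0.1 M (100 mM) stock, 2 mM final
cacl2_ml = stock_volume_ml(100.0, 0.1, 400.0)     # 0.1 M (100 mM) stock, 0.1 mM final
```

The same function can be used to rescale the recipe to a different total volume by changing `total_volume_ml`.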

2.4 RNA-Seq Library Preparation

2.4.1 Consumables

1. Qiagen RNeasy mini kit (Qiagen 74104).
2. DNase I (Qiagen 79254).
3. Agilent RNA 6000 Nano Kit (5067-1511).
4. MicrobExpress rRNA removal kit (Thermo Scientific
AM1905) (see Note 2).
5. Tetro cDNA synthesis kit (Bioline BIO-65043).
6. Random hexamers (Bioline BIO-38028).
7. NEBNext second-strand synthesis buffer (NEB B6117S).
8. dNTPs (NEB N0446S).
9. RNase H (NEB M0297L).
10. DNA polymerase I (Thermo Scientific 18010025).
11. MinElute PCR purification kit (Qiagen 28004).
12. DEPC-treated free water.
13. Nextera XT kit (Illumina FC-131-1096).
14. AMPure beads (Beckman Coulter A63880).
15. Agilent high-sensitivity DNA analysis kit (5067-4626).

2.4.2 Equipment

1. Agilent 2100 Bioanalyzer.
2. Qubit fluorometer (Invitrogen).
3. Multi-well plate reader.

3 Methods

3.1 Identify How the Host Responds to Burden Using RNA-Seq

Here, we describe the workflow we followed to prepare our E. coli
strains for measuring the impact of expressing a synthetic construct
on the host. We also describe how the samples were prepared for
studying the impact of our burden-inducing construct on the host
transcriptome via RNA-seq. The workflow is represented in Fig. 3.

3.1.1 Transformation with Synthetic Construct Causing Burden

In our work, we were interested in expression burden triggered by
overexpressing a heterologous protein. We specifically looked at the
burden caused by the expression of LacZ, the Lux operon, and
VioB-mCherry. In this chapter, we discuss the case of VioB-mCherry
expression. This construct encodes VioB-mCherry, a 3.7 kb fusion of
VioB and mCherry, under the control of the araBAD promoter (Fig. 4).
Expression of VioB-mCherry was induced by adding arabinose to the
media.

Fig. 3 Workflow to measure the burden induced by the expression of a heterologous construct and extract the
RNA for RNA-seq

Fig. 4 Design of our burden inducing construct

1. Construct the plasmid carrying your gene of interest which you
want to test.
2. Transform the construct into your strain and verify that it is
being expressed through any method of your choice. (We
checked concentrations of VioB-mCherry by tracking red fluores-
cence in a plate-reader.)
3. Together with your construct, transform into the same strain
its empty plasmid to be used as negative control. You will
compare the gene expression profile of this negative control
to the one of the strains carrying the burden plasmid to inves-
tigate which native genes are up/downregulated in the pres-
ence of burden.

3.1.2 Time-Course Assay

1. Grow cultures of E. coli cells transformed with the
construct and control plasmids overnight at 37 °C with
aeration in a shaking incubator in 5 mL of M9 medium (see
Subheading 2.3).
2. In the morning, dilute 60 μL of each sample into 3 mL of fresh
M9 media supplemented with the appropriate antibiotics and
grow them at 37 °C with shaking for another hour
(outgrowth).
3. Then, transfer 200 μL of each sample into a 96-well plate (we
used clear 96-well Costar plates) at approximately
0.1 OD600.
4. Place the samples in a microplate reader (we used a Biotek
Synergy HT plate-reader) and incubate them at 37 °C with
orbital shaking at medium speed for 1 h. Take measurements of
VioB-mCherry (excitation, 590 nm; emission, 645 nm) and
OD600 every 15 min (if using GFP, use excitation,
485 nm; emission, 528 nm).
5. Sixty minutes into the incubation, briefly remove the plate to
add the inducer to the wells (our final concentrations of inducers
were: L-arabinose, 0.2%; L-rhamnose, 2%). Set this time point as
your “time 0.”
6. If you are doing a burden assay: grow the cells in the reader for
4.5 h, taking measurements of VioB-mCherry (excitation,
590 nm; emission, 645 nm) and OD600 every 15 min.
7. If you are performing RNA-seq analysis: remove the samples
from the wells at 15 and 60 min after induction for processing:
(a) Take 170 μL from each of four wells per time point and
dispense it into a fresh tube to which you have already added
1.360 mL of RNA protection buffer.
(b) Leave the samples for 5 min at room temperature and then
centrifuge them at 4 °C at maximum speed.
(c) Discard the supernatant and freeze the pellets at −20 °C.
(d) Repeat the experiment on three different days to obtain
three replicates (our three replicates were performed
independently on three different days, for a total of 90 samples used to
produce the final data set: 7 constructs × 2 strains ×
3 replicates × 2 time points = 84 samples, plus control strain
DH10B-GFP cells × 3 replicates × 2 time points).
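
The volumes and sample counts quoted in step 7 can be checked with a few lines of arithmetic. The Python sketch below only restates the numbers given above; the variable and function names are illustrative.

```python
WELL_SAMPLE_UL = 170        # volume taken from each well (step 7a)
WELLS_PER_TIME_POINT = 4    # wells pooled per time point (step 7a)
BUFFER_UL = 1360            # RNA protection buffer per tube (step 7a)

# Pooled culture volume per tube and the resulting buffer-to-culture ratio
pooled_culture_ul = WELL_SAMPLE_UL * WELLS_PER_TIME_POINT
buffer_to_culture_ratio = BUFFER_UL / pooled_culture_ul


def n_samples(constructs, strains, replicates, time_points):
    """Number of RNA samples for a full factorial design."""
    return constructs * strains * replicates * time_points


main_samples = n_samples(constructs=7, strains=2, replicates=3, time_points=2)
control_samples = n_samples(constructs=1, strains=1, replicates=3, time_points=2)
total_samples = main_samples + control_samples
```

With the values from the protocol, each tube receives two volumes of buffer per volume of pooled culture, and the design adds up to the 90 samples stated in step 7(d).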

3.1.3 RNA-Seq Sample Preparation

The library preparation uses a custom protocol adapted from
previous Nextera kit methods [7].
1. Extract the RNA from the samples collected in
Subheading 3.1.2 using the
Qiagen RNeasy mini kit (Qiagen 74104).
2. Remove possible traces of genomic DNA contamination by
treating 2 μg of each sample for a second time with DNase I
(Qiagen 79254).
3. Assess the total RNA quality and integrity with an Agilent 2100
Bioanalyzer and Agilent RNA 6000 Nano Kit (5067-1511).
The average RNA integrity number should be above 9.
4. Enrich the mRNA with the MicrobExpress rRNA removal kit
(Thermo Scientific AM1905).
5. Assess successful rRNA depletion on the Bioanalyzer.
6. Carry out reverse transcription starting from 50 ng of total
enriched mRNA with the Tetro cDNA synthesis kit (Bioline
BIO-65043) and 6 μL of random hexamers (Bioline
BIO-38028) per reaction.
7. For second-strand cDNA synthesis, add to the first-strand
synthesis mix 5 μL of NEBNext second-strand synthesis buffer
(NEB B6117S), 3 μL of dNTPs (NEB N0446S), 2 μL of
RNase H (NEB M0297L), 2 μL of DNA polymerase I (Thermo
Scientific 18010025), and 18 μL of water per reaction.
8. Incubate the samples at 16 °C for 2.5 h.
9. Purify the cDNA with the MinElute PCR purification kit
(Qiagen 28004) and elute in 10 μL of DEPC-treated free
water.
10. Quantify the amount of cDNA with a Qubit fluorometer
(Invitrogen).
11. For the library preparation, use the Nextera XT kit (Illumina
FC-131-1096) starting from 1 ng of total cDNA. Use 3 min of
tagmentation and 13 cycles of step-limited PCR.
12. Purify the library using AMPure beads (Beckman Coulter
A63880).
13. Assess the quality and quantity of the library with an Agilent
2100 Bioanalyzer and Agilent high-sensitivity DNA analysis kit
(5067-4626).
14. Pool all your samples in the same tube at a
final concentration of 1 nM.

3.1.4 RNA-Seq Library Sequencing

We performed the library sequencing at the Imperial College
London Genomic Facility. We used two lanes from the HiSeq
2500 sequencer for paired-end sequencing with read length of
100 bp.

3.1.5 Sequencing Quality Control and Alignments

1. Trim and assess the quality of your raw reads for all sequenced
samples using Trim Galore v0.4.1 with default settings. Look
for potential batch effects by pooling your technical replicates.
2. Obtain the genomic sequences of your organism, for example,
using Ensembl Genomes. (In our case, we created a FASTA
format sequence file corresponding to our DH10B-GFP and
MG1655-GFP strain by merging the composite of strain, plasmid,
and integrated GFP for each sample to use as a reference for read
alignment.)
3. Align the trimmed reads using the BWA mem algorithm
v0.7.12-r1039 with the default settings.
4. Create a sorted BAM file for each sample using SAMtools
v1.3.1 on the alignments obtained at the previous step.
5. Check that your biological replicates do not exhibit any batch
effects before you generate the raw counts with Bioconductor
Rsubread package v1.12.6.
6. Discard all reads identified as unremoved rRNA, and in the one
case where reads could align to either the plasmid or the strain
genome, assign the raw reads appropriately to match those of
flanking sequence.
7. Check the biological replicates to identify any outlier sample.
8. Generate the normalized FPKM counts with the Bioconductor
edgeR package version 3.4.2, accounting for gene length and
library size (by TMM normalization), which will be used for
downstream analysis.
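
To make the length and library-size normalization in step 8 concrete, the Python sketch below computes plain FPKM values from a vector of raw counts. It is a minimal illustration, not the edgeR implementation: the TMM scaling performed by edgeR is not reproduced here, and passing a TMM-adjusted effective library size through the `library_size` argument is an assumption about how one might approximate that step. The function name `fpkm` and the example numbers are made up.

```python
import numpy as np


def fpkm(counts, gene_lengths_bp, library_size=None):
    """Fragments per kilobase of gene per million mapped fragments.

    counts          : raw fragment counts per gene
    gene_lengths_bp : gene lengths in base pairs (same order as counts)
    library_size    : total mapped fragments; defaults to the sum of counts
    """
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(gene_lengths_bp, dtype=float)
    if counts.shape != lengths.shape:
        raise ValueError("counts and gene_lengths_bp must have the same shape")
    if np.any(lengths <= 0):
        raise ValueError("gene lengths must be positive")
    if library_size is None:
        library_size = counts.sum()
    if library_size <= 0:
        raise ValueError("library size must be positive")
    # 1e9 = 1e3 (per kilobase) x 1e6 (per million fragments)
    return counts * 1e9 / (lengths * library_size)


# Made-up counts for three genes, for illustration only
example_fpkm = fpkm(counts=[100, 300, 600], gene_lengths_bp=[1000, 1500, 3000])
```

In this toy example the second and third genes receive the same FPKM even though their raw counts differ twofold, because the third gene is twice as long.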

3.1.6 Transcription Profiles and Promoter Characterization

We adopted the method of Gorochowski et al. (2017) [8] to
generate the transcription profiles from RNA-seq data.

1. Map the raw reads from the sequencer, previously saved in
FASTQ format, to your appropriate host genome reference
sequence (which includes any genomically integrated
sequences and/or plasmid sequences) with BWA version
0.7.4 with default settings. You will obtain BAM files for each
of your samples.
2. Separately process each of these BAM files with custom Python
scripts [8] to extract the position of the mapped reads, count
read depths across the reference sequences, and apply
corrections to the profiles at the ends of transcription units.
3. Normalize the obtained profiles to be able to compare them
between samples.
4. Characterize the promoters with custom Python scripts as in
Gorochowski et al. (2017) [8], which take as input a GFF
reference of the construct defining the location of all parts.
5. Use DNAplotlib version 1.0 [9] to visualize your transcription
profiles together with the associated genetic design information,
rendered in SBOL Visual format [10] (all our analyses were
carried out with custom scripts run using Python version 2.7.12,
NumPy version 1.11.2, and matplotlib version 1.5.3).
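
The idea behind steps 2 and 3 (counting read depth along the reference and scaling it so that samples are comparable) can be sketched as follows. This is not the custom code of Gorochowski et al. [8]: it assumes reads are already available as 0-based half-open (start, end) intervals, it omits the end-of-transcript corrections, and the per-million scaling in `normalize_profile` is one possible normalization chosen here for illustration. It is written for current Python 3 and NumPy rather than the versions listed above.

```python
import numpy as np


def read_depth(read_intervals, reference_length):
    """Per-base read depth from 0-based, half-open (start, end) intervals."""
    diff = np.zeros(reference_length + 1, dtype=np.int64)
    for start, end in read_intervals:
        start = max(start, 0)
        end = min(end, reference_length)
        if start >= end:
            continue  # read lies outside the reference
        diff[start] += 1   # depth rises where the read begins
        diff[end] -= 1     # and falls just after it ends
    return np.cumsum(diff[:-1])


def normalize_profile(depth, total_mapped_reads):
    """Scale a depth profile to depth per million mapped reads (assumed scheme)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return np.asarray(depth, dtype=float) * 1e6 / total_mapped_reads


# Three made-up reads on an 8 bp reference, for illustration only
profile = read_depth([(0, 4), (2, 6), (5, 8)], reference_length=8)
scaled = normalize_profile(profile, total_mapped_reads=3)
```

The difference-array approach touches each read only twice, so the cost grows with the number of reads plus the reference length rather than with their product.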

3.1.7 Analyze the Plate-Reader Data to Evaluate Burden

To calculate the burden imposed by the constructs, refer to Ceroni
et al. [11]:

\[
\text{Growth rate}(t_2) = \frac{\ln(\mathrm{OD}(t_3)) - \ln(\mathrm{OD}(t_1))}{t_3 - t_1}
\]
\[
\text{GFP capacity}(t_2) = \frac{\text{Total GFP}(t_3) - \text{Total GFP}(t_1)}{\mathrm{OD}(t_2) \times (t_3 - t_1)}
\]
\[
\text{RFP production rate per cell}(t_2) = \frac{\text{Total RFP}(t_3) - \text{Total RFP}(t_1)}{\mathrm{OD}(t_2) \times (t_3 - t_1)}
\]

where $t_2$ is the time after induction, $t_1 = t_2 - 15$ min, and
$t_3 = t_2 + 15$ min.
Mean rates and their standard errors are calculated from three
biological replicates. To account for the background red fluorescence
of M9, we added 400 to all RFP output rates per cell, as we measured that
red fluorescence decreases at a rate of approximately 400 RFP h⁻¹
as it is consumed by cells during growth.
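
The three rate equations can be applied to a plate-reader time series with a central difference over neighbouring measurements. The Python sketch below assumes evenly or unevenly spaced readings stored as arrays, with time in hours; the function name and the example values are illustrative, and the background correction for M9 described above is not included.

```python
import numpy as np


def central_difference_rates(time_h, od, total_fluorescence):
    """Return (t2, growth_rate, production_rate_per_cell) at interior time points.

    For every interior time point t2, t1 and t3 are the measurements taken
    immediately before and after it (15 min apart in the protocol).
    """
    t = np.asarray(time_h, dtype=float)
    od = np.asarray(od, dtype=float)
    fluo = np.asarray(total_fluorescence, dtype=float)
    if not (t.shape == od.shape == fluo.shape) or t.size < 3:
        raise ValueError("need three or more matching measurements")
    if np.any(od <= 0):
        raise ValueError("OD values must be positive")

    dt = t[2:] - t[:-2]                                        # t3 - t1
    growth_rate = (np.log(od[2:]) - np.log(od[:-2])) / dt
    production_rate = (fluo[2:] - fluo[:-2]) / (od[1:-1] * dt)
    return t[1:-1], growth_rate, production_rate


# Made-up readings taken every 15 min (0.25 h), for illustration only
t2, mu, rate = central_difference_rates(
    time_h=[0.0, 0.25, 0.5],
    od=[0.100, 0.120, 0.144],
    total_fluorescence=[100.0, 160.0, 220.0],
)
```

The same function serves for GFP capacity and for the RFP production rate per cell: only the fluorescence channel passed as `total_fluorescence` changes.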

3.2 Select the Best Burden-Responsive Promoter to Build a Burden Biosensor

The next step is to identify which promoters are upregulated in the
presence of burden. We identified early responsive promoters using
RNA-seq, then isolated them and cloned them upstream of a fluorescent
reporter so as to characterize their response to burden when out of
their genomic context on a plasmid. This workflow is presented in
Fig. 5. This allowed us to select our burden sensor: the promoter
exhibiting the best fold activation when triggered by burden.

Fig. 5 Workflow to identify promoters that are upregulated by burden from RNA-seq results and test them out
of their genomic context in order to select the best candidate to use as a burden biosensor

3.2.1 Interpret the RNA-Seq Results to Identify Promoters Upregulated by Burden

Here, we describe how to interpret the RNA-seq results to identify
promoters with an early response to burden. We used DESeq2 for
our differential expression analyses [12].

1. Compare gene expression between cells transformed with
synthetic constructs and the analogous cells transformed with the
corresponding empty plasmid (we excluded the reads mapping
to ribosomal genes or to the synthetic constructs).
2. Annotate the differentially expressed genes with data extracted
from the EcoCyc database [13] using custom Python code.
3. Use a volcano plot to help visualize which genes were
upregulated or downregulated in the cells experiencing the
imposed burden compared to the control cells (Fig. 5). We
specifically looked at the differential gene expression at 15 min
and 1 h after induction.
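
The classification that underlies a volcano plot can be sketched in a few lines once a table of log2 fold changes and adjusted p-values has been exported from the differential expression analysis. The Python sketch below is an illustration only: the thresholds `lfc_threshold=1.0` and `alpha=0.05` are assumptions and are not values stated in this protocol, and the example numbers are made up.

```python
import numpy as np


def classify_genes(log2_fold_change, adjusted_p, lfc_threshold=1.0, alpha=0.05):
    """Label each gene 'up', 'down', or 'unchanged' and return -log10(padj).

    Genes with a missing adjusted p-value (NaN) are labelled 'unchanged'.
    """
    lfc = np.asarray(log2_fold_change, dtype=float)
    padj = np.asarray(adjusted_p, dtype=float)
    if lfc.shape != padj.shape:
        raise ValueError("inputs must have the same shape")

    significant = padj < alpha                 # NaN compares as False
    labels = np.full(lfc.shape, "unchanged", dtype=object)
    labels[significant & (lfc >= lfc_threshold)] = "up"
    labels[significant & (lfc <= -lfc_threshold)] = "down"

    # y-axis of the volcano plot; clip to avoid taking the log of zero
    neg_log10_p = -np.log10(np.clip(padj, 1e-300, None))
    return labels, neg_log10_p


# Made-up values for four hypothetical genes, for illustration only
labels, y = classify_genes(
    log2_fold_change=[2.5, -1.8, 0.3, 3.0],
    adjusted_p=[0.001, 0.01, 0.5, 0.2],
)
```

Plotting `log2_fold_change` against `y` and coloring points by `labels` gives the volcano plot. The genes labelled "up" at the early time point are the candidates whose promoters are worth testing as sensors.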

3.2.2 Test the Burden-Responsive Promoters Out of Their Genomic Context

Once we had identified through RNA-seq analysis which promoters
upregulate gene expression, we studied their behavior out of their
genomic context on a plasmid.

1. Order a gBlock for each candidate promoter that upregulated the
expression of a native gene under burden. Include
an SfiI restriction site upstream of the promoter sequence and
a PacI restriction site downstream of the promoter sequence for
easy insertion into the sensor plasmid (Fig. 1).
2. Insert each gBlock into the sensor plasmid via restriction
cloning.
3. The reporter, currently sfGFP, can be swapped for a different
reporter gene by restriction cloning using PacI and BsaI.
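
Before ordering a gBlock as in step 1, it can help to check in silico that each restriction site occurs exactly once in the designed fragment. The Python sketch below flanks a promoter with an SfiI site (GGCCNNNNNGGCC) upstream and a PacI site (TTAATTAA) downstream. The five-base SfiI spacer must match the destination vector, so the default used here is only a placeholder. Any extra padding bases outside the sites are left to the reader, and the function name is hypothetical.

```python
import re

SFI_I = re.compile(r"(?=GGCC[ACGT]{5}GGCC)")   # lookahead counts overlapping hits
PAC_I = re.compile(r"(?=TTAATTAA)")


def flank_promoter(promoter, sfi_spacer="AAAAA"):
    """Return SfiI site + promoter + PacI site, checking each site is unique.

    sfi_spacer is a PLACEHOLDER: use the five bases required by your vector.
    """
    promoter = promoter.upper()
    sfi_spacer = sfi_spacer.upper()
    if re.fullmatch(r"[ACGT]+", promoter) is None:
        raise ValueError("promoter must contain only A, C, G, T")
    if re.fullmatch(r"[ACGT]{5}", sfi_spacer) is None:
        raise ValueError("the SfiI spacer must be exactly five bases")

    fragment = "GGCC" + sfi_spacer + "GGCC" + promoter + "TTAATTAA"
    # Reject designs with internal or junction-created extra sites
    if len(SFI_I.findall(fragment)) != 1 or len(PAC_I.findall(fragment)) != 1:
        raise ValueError("fragment must contain exactly one SfiI and one PacI site")
    return fragment


# Made-up 10 bp sequence standing in for a promoter, for illustration only
fragment = flank_promoter("acgtacgtac")
```

A promoter that already contains one of the two sites raises a `ValueError`, which signals that the sequence needs to be redesigned or a different pair of enzymes chosen.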

3.2.3 Select the Best Promoter to Use as Burden Biosensor

Analyze the plate-reader data and select the sensor plasmid that
exhibits the best ON/OFF properties.

1. Analyze the plate-reader data according to Sect. 3.1.7. Plot bar
graphs of the GFP production rate per cell at 1 h post-induction
of burden.

Fig. 6 Workflow to build a burden-driven feedback for gene expression based on a burden biosensor
uncovered with RNA-seq

2. Select the promoter that is the most responsive to burden (the
highest GFP production rate per cell when there is burden) but
that also has the lowest OFF activity (the lowest GFP production
rate per cell when there is no burden). We looked for the
promoter with the best fold change in
GFP production rate per cell between the two conditions.
We constructed four sensor plasmids, with the htpG1, htpG2, groSL,
and ibpAB promoters driving the expression of sfGFP. We found
that htpG1 had the best fold activation of the four constructs.
Since the htpG regulon is driven by two overlapping promoters,
namely htpG1 and htpG2, both promoters were tested separately
on the sensor plasmid.
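
The selection criterion in step 2 amounts to computing, for each sensor plasmid, the ratio of the GFP production rate per cell with burden (ON) to that without burden (OFF), and keeping the largest ratio. The Python sketch below shows this with placeholder names and made-up rates; none of the numbers are measurements from this work.

```python
def fold_activation(on_rate, off_rate):
    """Ratio of ON (with burden) to OFF (without burden) production rate."""
    if off_rate <= 0:
        raise ValueError("OFF rate must be positive to compute a fold change")
    return on_rate / off_rate


def best_sensor(rates):
    """rates maps a sensor name to (on_rate, off_rate); returns (best, folds)."""
    folds = {name: fold_activation(on, off) for name, (on, off) in rates.items()}
    best = max(folds, key=folds.get)
    return best, folds


# Hypothetical candidates with made-up rates, for illustration only
example_rates = {
    "candidate_A": (900.0, 300.0),   # high ON but leaky OFF
    "candidate_B": (600.0, 100.0),   # lower ON, much tighter OFF
    "candidate_C": (200.0, 100.0),
}
best, folds = best_sensor(example_rates)
```

In this toy example the candidate with the highest ON rate is not the one selected, because its OFF activity is also high. This is why the fold change, rather than the ON rate alone, is used as the criterion.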

3.3 Build the Burden-Driven Feedback Loop

Once we had identified our burden sensor, we used it to drive the
expression of an actuator able to regulate gene expression in
response to burden. Our workflow for building a burden-driven
feedback loop is represented in Fig. 6. In the presence of burden,
the actuator should be triggered to decrease heterologous gene
expression, thus decreasing the burden imposed on the cell and
restoring some of its cellular capacity.
To measure cellular burden, we used the capacity monitor from
Ceroni et al. [1]. This assesses the burden of genetic constructs by
measuring the changes in GFP production from a “monitor
cassette” constitutively expressing GFP from the bacterial genome. A
detailed protocol of how to integrate the capacity monitor into a
strain of interest can be found in Note 3. GFP capacity, or the GFP
production rate per cell, should be maintained above a specific
threshold, which means that burden is kept below an
upper bound.

3.3.1 Build the Feedback Plasmid

Build the feedback plasmid (Fig. 2) by restriction cloning: the
promoter can be inserted using the previously synthesized gBlocks
carrying the SfiI and PacI restriction sites. The actuator can also be
synthesized with PacI and AscI restriction sites for insertion into
the feedback plasmid via restriction cloning. In our case, the sgRNA
was placed under the regulation of the htpG1 promoter to promote
fast dynamics of our system and such that the levels of sgRNA in the
cell will be directly related to the host cell capacity. dCas9 is
constitutively expressed and binds to sgRNA present in the cell to inhibit
the production of VioB-mCherry, which slows down cell growth
when its expression is triggered (Fig. 7).

Fig. 7 Architecture of the burden-driven feedback implemented with CRISPRi to
regulate the production of a heterologous protein

1. Transform the burden plasmid and the feedback plasmid into a
strain containing the sfGFP capacity monitor integrated into
the genome. Also transform an open-loop version of the
feedback, in which the sgRNA does not target anything in the cell.
2. Carry out a time-course assay in the plate-reader: take
measurements of VioB-mCherry (excitation, 590 nm; emission,
645 nm), sfGFP (excitation, 485 nm; emission, 528 nm), and
OD600 every 15 min.
3. Sixty minutes into the incubation, briefly remove the plate to
add the inducer to the wells (0.2% arabinose).
4. Grow the cells for 6 h.
5. Analyze the data by plotting the GFP capacity and the VioB-
mCherry production rate at 1 h post-induction.
Repression of the VioB-mCherry production is tunable by
controlling the intracellular concentration of dCas9 available to
form an inhibiting complex together with the guide RNA.
dCas9 expression sets the steady-state repression levels of the
heterologous VioB-mCherry protein, but its production rate
should be carefully chosen such that it does not itself impose a
large burden on the host cell. The capacity monitor can assess the burden of genetic constructs by calculating the changes in GFP production from a “monitor cassette” constitutively expressing GFP from the bacterial genome (see Note 3).
6. Create a library of feedback constructs with promoters of various strengths driving dCas9 expression to check whether increasing dCas9 levels strengthens repression by the feedback. Randomly
Burden Sensors for Synthetic Biology 325

mutate the J23100 Anderson promoter at specific positions, chosen by analyzing the variable positions in the constitutive Anderson promoter library. Order primers to insert your random mutations through inverse PCR.
7. After construction of the library, the promoter strength of the
constructs can be assessed by monitoring their GFP capacity via
a plate-reader characterization assay (Sect. 3.1.7). Higher GFP
capacity implies that dCas9 production has a lower impact on
the cellular burden; hence, the promoter in front of the dCas9
must be a weak constitutive promoter. Similarly, if GFP capac-
ity tends to zero, then the constitutive promoter driving dCas9
expression must be strong.
8. Sequence enough library constructs to obtain a diversified
range of GFP capacities in the above experiment.
9. Transform the selected sequenced constructs with the burden-
some plasmid and select the one for which the GFP capacity is
the best conserved when burden is induced.
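
The quantities plotted in step 5 can be derived from the raw plate-reader time series. The sketch below is only an illustration: it assumes that the per-cell production rate at a time point is estimated as the central difference of total fluorescence, divided by the time step and by the OD600 at that point. The function name and the example readings are hypothetical and are not taken from this protocol.

```python
def production_rate_per_cell(fluorescence, od600, dt_hours):
    """Estimate a per-cell production rate at each interior time point.

    Assumption: rate(t) = (F[t+1] - F[t-1]) / (2 * dt) / OD600[t],
    i.e., a central difference of bulk fluorescence normalized by cell density.
    """
    if len(fluorescence) != len(od600):
        raise ValueError("fluorescence and od600 must have the same length")
    rates = []
    for i in range(1, len(fluorescence) - 1):
        d_fluo = fluorescence[i + 1] - fluorescence[i - 1]
        rates.append(d_fluo / (2 * dt_hours) / od600[i])
    return rates


# Hypothetical readings taken every 15 min (0.25 h)
gfp = [100.0, 130.0, 170.0, 220.0, 280.0]
od = [0.10, 0.12, 0.15, 0.18, 0.22]
capacity = production_rate_per_cell(gfp, od, dt_hours=0.25)
print(capacity)
```

Applying the same function to the sfGFP and VioB-mCherry channels gives the capacity and the heterologous production rate on a common per-cell basis, so that the closed-loop and open-loop strains can be compared at 1 h post-induction.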

3.3.2 Tune the Burden-Driven Feedback

One of the advantages of using dCas9-gRNA-based regulation is that the sgRNA sequence can be easily and quickly mutated so as to bind the target with different affinity, thus providing a convenient way to tune the gain of the feedback (Fig. 8). The same result is achievable by changing the strength of the promoter driving dCas9 expression.
The library of promoters controlling the expression of dCas9 demonstrated the capacity of the feedback system to repress the production of VioB-mCherry and keep the cellular capacity close to that of the wild-type strain. The feedback system should maximize the heterologous protein production rate while keeping cellular capacity high. Introducing a mutation in the sgRNA decreases the binding affinity between the dCas9/sgRNA complex and the araBAD promoter, hence lowering the repression of VioB-mCherry and improving its rate of production.
Farasat et al. [14] described how to rationally introduce mismatches in the guide RNA to regulate the activity of the dCas9/sgRNA complex. One mismatch in the 6 bases closest to the PAM site is expected to reduce repression by three- to fourfold, while two mismatches lead to a 14-fold decrease in repression. We decided to introduce one mismatch in our sgRNA targeting the araBAD promoter, intuitively predicting that two mismatches would decrease the repression too much for the feedback to have a noticeable effect on keeping the cellular capacity high.
1. Construct a library of randomly point-mutated sgRNAs using inverse PCR: introduce random point mutations in the 6 bases closest to the PAM site.

Fig. 8 Tune the feedback gain by changing the expression level of dCas9
(promoter/RBS) or by varying the affinity of the sgRNA with its target promoter
(bp mutation)

2. Evaluate the performance of the point-mutated sgRNA feedback constructs by conducting a batch experiment and comparing the final yield of the different constructs:
(a) Inoculate 3 mL of M9 fructose media, supplemented with the appropriate antibiotics, in 15 mL culture tubes with the constructs carrying the different mutated sgRNAs.
(b) Grow the cultures in the 37 °C shaking incubator for 5 h, before diluting them to 0.015 OD600.
(c) Use 50 μL of the diluted culture (~150,000 cells) to inoculate batch cultures of 50 mL M9 supplemented with the inducer and the appropriate antibiotics in 500-mL baffled shake-flasks.
(d) Grow the cultures in the 37 °C shaking incubator for 16 h.
(e) Then, every hour from 16 h until 24 h, dispense 200 μL of each culture into individual wells of a 96-well plate (it will be used to read the cell density and bulk fluorescence of each construct in the plate reader). Also dilute 350 μL of culture into 650 μL of PBS and store in the fridge at 4 °C (it will be used to read the fluorescence of each construct with the flow cytometer).

(f) After sampling at each hour, place the 96-well plate in a plate reader preheated to 37 °C and start a kinetic run, performing OD measurements (OD600 and OD700), GFP measurements (excitation, 485 nm; emission, 528 nm), and RFP measurements (excitation, 590 nm; emission, 645 nm) every 2 min for 10 min. Average over the 5 points to obtain the OD, GFP, and RFP values at the specific sampling time point (this allows the sample to settle in the plate reader).
(g) Measure the GFP and RFP levels of individual cells from the cultures stored in PBS at 4 °C with a flow cytometer (we used the Fortessa X20).
(h) Select the sgRNA that produced the highest final yield. (We found that the strain producing the highest yield was the one without any mutation, as it grew faster than the other strains.)
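
The design space for the sgRNA library in step 1 can be enumerated before ordering primers. The sketch below is a minimal illustration, assuming a spacer written 5′ to 3′ with the PAM at its 3′ end, so that the 6 PAM-proximal bases are the last six of the spacer. The spacer sequence and the function name are hypothetical placeholders, not the sgRNA used in this protocol.

```python
BASES = "ACGT"


def single_mismatch_variants(spacer, window=6):
    """Return every spacer carrying exactly one substitution
    within the last `window` bases (assumed PAM-proximal)."""
    spacer = spacer.upper()
    variants = []
    for pos in range(len(spacer) - window, len(spacer)):
        for base in BASES:
            if base != spacer[pos]:
                variants.append(spacer[:pos] + base + spacer[pos + 1:])
    return variants


spacer = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt spacer
library = single_mismatch_variants(spacer)
print(len(library))  # 6 positions x 3 alternative bases
```

Six positions with three alternative bases each give 18 single-mismatch variants, which is a small enough library to screen exhaustively in the batch experiment of step 2.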

4 Notes

1. M9 Medium Recipe
(a) M9 Minimal Salts (5×) stock solution: dissolve 56.4 g
of M9 Minimal Salts into 1 L of distilled H2O. Stir to
suspend and sterilize by autoclaving. Store at room
temperature.
(b) Thiamine hydrochloride stock solution: dissolve 10 mg
of thiamine hydrochloride into 1 mL of water. Agitate to
suspend. Filter-sterilize. Cover the sterile container with
aluminum foil to protect it from the light. Store at room
temperature. (DH10B cannot produce thiamine
hydrochloride.)
(c) Fructose stock solution: dissolve 10 g of fructose into
100 mL of distilled H2O. Filter-sterilize. Store at 4 °C.
(We used fructose as the main carbon source to avoid the
strong catabolite repression of AraBAD and RhaBAD pro-
moters known to occur in glucose media.)
(d) Casamino acids stock solution: dissolve 10 g of Casa-
mino Acids into 100 mL of distilled H2O. Stir to suspend
and sterilize by autoclaving. Store at room temperature.
(We tried various Casamino acids brands and found that
Casamino acids from MP Biomedicals gave us consistent
growth for our DH10B and MG1655 cells.)
(e) 1 M Magnesium sulfate (MgSO4) stock solution: dissolve
246 g of MgSO4·7H2O into 1 L of distilled
H2O. Sterilize by autoclaving. Store at room temperature.

(f) 1 M Calcium chloride (CaCl2) stock solution: dissolve 44 g of CaCl2·6H2O into 200 mL of distilled H2O. Sterilize by autoclaving. Store at room temperature.
2. Ribodepletion
The ribodepletion step was carried out using the
MICROBExpress mRNA Enrichment Kit. We selected this
kit for its cost effectiveness, though better depletion of the
ribosomal RNA can be achieved using Illumina kits, especially
the RiboZero Kit [15]. During our analysis, we found that around 60% of the sequences came from ribosomal RNA, although the depletion efficiency varied from sample to sample.
3. Integration of the sfGFP capacity monitor
Our CRIM plasmid carrying the sfGFP constitutive cassette is available from Addgene (https://www.addgene.org/66073/) such that it can be inserted using the helper plasmid pINT-ts (https://www.addgene.org/66076/) in the user's strain(s) of interest.
The sfGFP capacity monitor [1] was integrated into the λ
site of E. coli using the CRIM [16] plasmid pAH63. The
following protocol for the insertion of the sfGFP capacity
monitor is adapted from Dr. Algar's PhD thesis [17].
“To insert the monitor into the genome we used the
CRIM system [16]. This system involves two separate plas-
mids, one of which contains the monitor and will be inserted
into the genome, the other being a ‘helper plasmid’ that facil-
itates the genomic integration.
The CRIM system works by placing the circuit you wish to
insert into the genome into the CRIM plasmid corresponding
to the integration site. CRIM plasmids have the γ replication
origin of R6K, which requires the trans-acting Π protein
(encoded by pir) for replication. This means that these plasmids can only be maintained in cells which have a pir+ genotype. In order to replicate the CRIM plasmid with monitor we transformed into pir+ cells. For this we used TransforMax™ EC100D™ pir-116 Electrocompetent E. coli cells.”
(a) Construct the CRIM integration plasmid pAH63 (kanamycin resistance) containing the sfGFP monitor in TransforMax™ EC100D™ pir-116 electrocompetent E. coli cells.
(b) In parallel, transform the pINT-ts helper plasmid (ampicillin resistance) into DH10B. Always grow these cells at 30 °C.
(c) Make the DH10B cells transformed with the pINT-ts helper plasmid electrocompetent. Always grow these cells at 30 °C.

(d) Transform the pAH63 plasmid containing the sfGFP monitor into your pINT-ts electrocompetent cells.
(e) Following electroporation, suspend the cells in SOC or SOB. Incubate at 37 °C for 1 h and then at 42 °C for 30 min. (The phage integrase (Int) enzyme is synthesized at elevated temperatures from the CRIM helper plasmid pINT-ts. The helper plasmid has a temperature-sensitive origin of replication such that resulting colonies are nearly always cured of the helper plasmid.)
(f) Spread onto selective agar (kanamycin in our case) and incubate at 37 °C.
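
The 1 M salt stocks in Note 1 can be sanity-checked from the hydrate masses. The short sketch below assumes molar masses of about 246.47 g/mol for MgSO4·7H2O and 219.08 g/mol for CaCl2·6H2O; these values are not stated in the protocol and should be confirmed against the supplier's label. The function name is illustrative.

```python
def molarity(mass_g, molar_mass_g_per_mol, volume_l):
    """Molar concentration (mol/L) of a solute dissolved to a given final volume."""
    return mass_g / molar_mass_g_per_mol / volume_l


# Assumed molar masses of the hydrated salts (g/mol)
mgso4_m = molarity(246.0, 246.47, 1.0)  # 246 g in 1 L
cacl2_m = molarity(44.0, 219.08, 0.2)   # 44 g in 200 mL
print(round(mgso4_m, 3), round(cacl2_m, 3))
```

Both stocks come out within about 1% of 1 M, which is consistent with the labels given in Note 1 (e) and (f).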

References
1. Ceroni F, Algar R, Stan G-B, Ellis T (2015) Quantifying cellular capacity identifies gene expression designs with reduced burden. Nat Methods 12:415–418. https://doi.org/10.1038/nmeth.3339
2. Borkowski O, Ceroni F, Stan GB, Ellis T (2016) Overloaded and stressed: whole-cell considerations for bacterial synthetic biology. Curr Opin Microbiol 33:123–130. https://doi.org/10.1016/j.mib.2016.07.009
3. Ellis T (2018) Predicting how evolution will beat us. Microb Biotechnol 12(1):41–43. https://doi.org/10.1111/1751-7915.13327
4. Martin VJJ, Pitera DJ, Withers ST et al (2003) Engineering a mevalonate pathway in Escherichia coli for production of terpenoids. Nat Biotechnol 21:796–802. https://doi.org/10.1038/nbt833
5. Gyorgy A, Jiménez JI, Yazbek J et al (2015) Isocost lines describe the cellular economy of genetic circuits. Biophys J 109:639–646. https://doi.org/10.1016/j.bpj.2015.06.034
6. Shachrai I, Zaslaver A, Alon U, Dekel E (2010) Cost of unneeded proteins in E. coli is reduced after several generations in exponential growth. Mol Cell 38:758–767. https://doi.org/10.1016/j.molcel.2010.04.015
7. Gertz J, Varley KE, Davis NS et al (2012) Transposase mediated construction of RNA-seq libraries. Genome Res 22:134–141. https://doi.org/10.1101/gr.127373.111.134
8. Gorochowski TE, Espah Borujeni A, Park Y et al (2017) Genetic circuit characterization and debugging using RNA-seq. Mol Syst Biol 13:952. https://doi.org/10.15252/msb.20167461
9. Der BS, Glassey E, Bartley BA et al (2017) DNAplotlib: programmable visualization of genetic designs and associated data. ACS Synth Biol 6:1115–1119. https://doi.org/10.1021/acssynbio.6b00252
10. Myers CJ, Beal J, Gorochowski TE et al (2017) A standard-enabled workflow for synthetic biology. Biochem Soc Trans 45:793–803. https://doi.org/10.1042/BST20160347
11. Ceroni F, Boo A, Furini S et al (2018) Burden-driven feedback control of gene expression. Nat Methods 15:387–393. https://doi.org/10.1038/nmeth.4635
12. Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15:1–21. https://doi.org/10.1186/s13059-014-0550-8
13. Keseler IM, Mackie A, Santos-Zavaleta A et al (2017) The EcoCyc database: reflecting new knowledge about Escherichia coli K-12. Nucleic Acids Res 45:D543–D550. https://doi.org/10.1093/nar/gkw1003
14. Farasat I, Salis HM (2016) A biophysical model of CRISPR/Cas9 activity for rational design of genome editing and gene regulation. PLoS Comput Biol 12:1–33. https://doi.org/10.1371/journal.pcbi.1004724
15. Petrova OE, Garcia-Alcalde F, Zampaloni C, Sauer K (2017) Comparative evaluation of rRNA depletion procedures for the improved analysis of bacterial biofilm and mixed pathogen culture transcriptomes. Sci Rep 7:1–15. https://doi.org/10.1038/srep41114
16. Haldimann A, Wanner BL (2001) Conditional-replication, integration, excision, and retrieval plasmid-host systems for gene structure-function studies of bacteria. J Bacteriol 183:6384–6393. https://doi.org/10.1128/JB.183.21.6384
17. Algar RJR (2013) Understanding, characterising and modelling the interactions between synthetic genetic circuits and their host chassis. Rhys James Richmond Algar, MA (Oxon), MRes. Submission for the degree of PhD. Imperial College London
Chapter 16

Engineering Protein-Based Parts for Genetic Devices in Mammalian Cells
Giuliano Bonfá, Federica Cella, and Velia Siciliano

Abstract
Synthetic biology has been advancing cellular and molecular biology studies through the design of synthetic circuits capable of examining diverse endogenously or exogenously driven regulatory pathways. While early genetic devices were engineered to be insulated from intracellular crosstalk, more recently the need to achieve dynamic control of cellular behavior has led to the development of smart interfaces that connect signal information (sensor) to desired output activation (actuator). Sensor-actuator circuits can respond to diverse inputs, including small molecules, exogenous and endogenous mRNA, noncoding RNA (e.g., miRNA), and proteins to regulate downstream events transcriptionally, posttranscriptionally, and translationally. These devices require attentive engineering to either create complex chimeric proteins or modify protein structures to be amenable to the specific circuit's architecture and/or purpose.
In this chapter, we describe how to implement two different protein-based devices in mammalian cells: (1) a modular platform that senses and responds to disease-associated proteins and (2) a protein-based system that allows simultaneous regulation of RNA translation and protein activity, via RNA–protein and newly engineered protein–protein interactions.

Key words Mammalian synthetic biology, Protein sensor-actuator, Synthetic smart interfaces, Pro-
tein–protein regulation, Protein–RNA regulation, RNA-binding protein

1 Introduction

1.1 Synthetic Devices that Sense Intracellular Protein and Regulate Cellular Fate

Programmable and model-aided synthetic circuits hold the potential to improve our understanding of the rules that govern biological processes [1–4] and to create new tools for biomedical purposes [22]. Genetic biosensors with medical applications focus on cell function rewiring by triggering a therapeutic output via transcriptional or translational regulation [5–9]. Most synthetic biosensors have been designed to respond to extracellular stimuli either by building input-specific devices or by creating a generalizable framework to adapt to different cues [7, 10, 23].
Here, we describe the first modular platform that can be repurposed to sense and respond to several intracellular proteins that function as disease biomarkers. This synthetic platform couples intracellular protein sensing to actuator modules to convert protein detection into programmed gene expression in a modular fashion [6].

Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229, https://doi.org/10.1007/978-1-0716-1032-9_16, © Springer Science+Business Media, LLC, part of Springer Nature 2021

Fig. 1 Schematics of intracellular protein sensors. The sensing modules are composed of two intrabodies that recognize and interact with an intracellular target protein (green star). One intrabody is anchored to the cell membrane (e.g., for the HCV protein biomarker NS3 we used scFv35) and is fused at the N-terminus to the mKate fluorescent tag and at the C-terminus to the TCS and to a GAL4-VP16 transcriptional activator. The second intrabody (e.g., for the HCV protein biomarker NS3 we used scFv162) is cytosolic and fused to the TEVp. Interaction of the intrabodies with the target protein brings the TEVp close to the TCS, and the cleavage reaction results in the release of GAL4-VP16, which translocates to the nucleus and induces the expression of the output gene (Actuator)

The architecture of the device is articulated and includes intrabodies that specifically bind the selected proteins, connected to the viral TEV protease (TEVp) system, to release a membrane-bound transcription factor when the protein is sensed (Fig. 1). Intrabodies guarantee the modularity of the framework, which was built to allow their easy interchange and quick rearrangement toward new proteins of interest.
We chose to detect disease proteins primarily expressed in the
cytoplasm, and for each target, we selected two intrabodies
(or interacting peptide domains) that bind to different epitopes.

The output was driven by the GAL4 cognate promoter and included (a) fluorescence for diagnostic purposes, (b) an apoptotic gene for cell killing, and (c) a chemokine for immunomodulatory purposes.
We demonstrate the functionality of engineered protein sensors by creating devices that sense three different proteins associated with the following diseases: hepatitis C virus (HCV) infection, human immunodeficiency virus (HIV) infection, and Huntington’s disease. The devices respond with either fluorescent reporter activation or biological activity (cell apoptosis and HLA-I downregulation).

1.2 A Protein-Based Strategy to Regulate RNA Translation and Protein Activity

RNA-encoded genetic circuits have the potential to limit the immunogenicity and mutagenicity issues of DNA-based systems and exhibit faster dynamics. They have thus become an appealing strategy for the regulation of synthetic systems, with a variety of applications mostly in the biomedical field [11, 12]. However, the ability to achieve fine control over gene expression at the posttranscriptional and translational level is limited by the small toolbox of regulatory devices available: ribozymes, aptamers, and riboswitches can modulate the translation of the associated output but cannot be interconnected to create modular and scalable circuits [13, 14].
Recently, RNA-binding proteins (RBPs) have been used to engineer RNA-encoded networks, enhancing the regulatory features of RNA expression [15]. We envisioned creating a multilayered system that adds further regulatory elements via protein–RNA and protein–protein interactions using a protein engineering strategy.
Proteases recognize specific amino acid sequences, leading to proteolytic events. In theory, these protease-responsive sequences could be transferred to other proteins, modifying their structure without impairing their function. This is potentially possible thanks to the availability of crystal structures for a large number of proteins, as well as of multiple software tools to study and infer protein structure and sequence [16, 17], after in silico modification or via homology analysis for native protein sequences. Thus, this system is potentially highly modular, and more levels of regulation can be multiplexed. This study uses TEVp and other proteases from the same family, but the framework could be extended to endogenous proteins that are activated following a specific cellular state. Here we report how to use protein engineering to create regulatory cascades that connect proteases to the RBPs L7Ae and Ms2-cNOT7, to tune output expression at the posttranscriptional and translational level. L7Ae binds kink-turn motifs in the 5′ UTR of the target mRNA and blocks output translation [15]. We modified its structure to be TEV protease dependent. Ms2-cNOT7 is a fusion protein that binds Ms2 binding motifs in the 3′ UTR and removes the poly(A) tail of the target mRNA, which is consequently degraded [15]. In the linker between Ms2 and cNOT7, we inserted the cleavage site for four different cognate proteases: TEV, TUMV, TVMV, and SuMMV. We chose four viral proteases that are orthogonal to the host organism, so that they do not interfere with endogenous pathways or with each other. Furthermore, we reengineered the protease structure to create protease–protease interactions. Finally, we created protease-based, multistage regulatory cascades, and here we report step by step how this was achieved.

2 Material

2.1 Intracellular Protein-Sensor Devices to Regulate Cell Fate

2.1.1 DNA Cloning and Plasmid Construction

1. Plasmid backbones: pEntry-promoters, pDonor-gene, pDestination.
2. For pDonor-gene design using Golden Gate: Type IIS restriction enzymes BsaI and BsmBI.
3. Gateway (Life Technologies), In-Fusion (Clontech), and Golden Gate [18] systems.
4. LB medium and a 37 °C shaking incubator for bacterial growth.
5. Kit for DNA extraction and gel purification.

2.1.2 Mammalian Cell Culture and Transfection/Electroporation

1. Required cell lines: HEK293FT (Invitrogen), HeLa-based TZM-bl (NIH AIDS reagent program), and Jurkat (ATCC).
2. To maintain HEK293FT and HeLa-based TZM-bl cells, use Dulbecco’s modified Eagle medium (DMEM, Cellgro) supplemented with 10% FBS (Atlanta BIO), 1% penicillin/streptomycin/L-Glutamine (Sigma-Aldrich), and 1% non-essential amino acids (HyClone). Culture them at 37 °C in a 5% CO2-humidified incubator. To maintain Jurkat cells, use RPMI-1640 (ATCC) supplemented with 10% FBS (Atlanta BIO) and 1% non-essential amino acids (HyClone). Culture them at 37 °C in a 5% CO2-humidified incubator.
3. Doxycycline (Clontech): prepare a stock solution by dissolving 5 g in double-distilled H2O to reach 1 mg/mL. Filter, aliquot, and store the solution at −20 °C in the dark.
4. Perform transfection of HEK293FT cells with Attractene (Qiagen) and electroporation of HeLa-based TZM-bl and Jurkat cells using the Neon Transfection System with 10 μL Neon Tip (Life Technologies).
5. For HIV-1 production and infection, use the HIV-1 infectious molecular clones of strains IIIB, JRCSF, LAI, and NL4.3 (NIH AIDS reagents program) and JetPRIME® reagent (Polyplus transfection #114-07).
6. Ultracentrifuge for virus concentration.
7. HIV-1 p24 ELISA Kit (PerkinElmer NEK050B001KT).
8. Cytofix/cytoperm solution (BD Biosciences #554722).
9. Anti-p24 antibody FITC-conjugated (KC57-FITC from Beckman Coulter #CO6604665).
10. 35 mm glass bottom dishes (Fluorodish), DMEM base medium (Cellgro) without supplements, Opti-MEM I reduced serum medium (Life Technologies), Trypsin (Invitrogen), PBS 1×, humidified incubator at 37 °C with 5% CO2, centrifuge.
11. Leica TCS SP5 II microscope equipped with an incubation chamber, using a 63× objective.
12. Evos Cell Imaging System (Life Technologies).

2.1.3 Flow Cytometry Staining, Acquisition, and Analyses

1. LSR Fortessa flow cytometer, equipped with 405, 488, and 561 nm lasers, and LSR-II system (BD Biosciences).
2. SpheroTech RCP-30-5A beads (SpheroTech).
3. To determine surface expression of HLA-I molecules, use AlexaFluor® 647 mouse anti-human HLA A, B, C antibody clone W6/32 (Biolegend® #311414).
4. For apoptosis assays, stain post-transfected and PBS-washed cells with Pacific Blue-conjugated Annexin V (Life Technologies) before flow cytometry analysis.
5. FACSDiva8 software.

2.1.4 RNA Extraction, cDNA Synthesis, and qPCR

1. RNeasy Mini Kit (Qiagen).
2. RNase-free water.
3. QuantiTect Reverse Transcription Kit (Qiagen).
4. Fast SYBR Green Master Mix (ThermoFisher Scientific).
5. MicroAmp™ Fast Optical 96-Well Reaction Plate (0.1 mL) (ThermoFisher Scientific).
6. StepOnePlus™ 7500 Fast Real Time PCR machine (ThermoFisher Scientific).
7. Primers:
GAPDH Forward: GAAGATGGTGATGGGATTTC
GAPDH Reverse: GAAGTTGAAGGTCGGAGT
XCL-1 Forward: CTTGGCATCTGCTCTCTCACT
XCL-1 Reverse: AGGCTCACACAGGTCCTCTTA

2.2 Protein-Based Devices to Regulate RNA and Protein Activity

2.2.1 Cell Culture, Transient Transfection, Cell Imaging, and Flow Cytometry

1. Required cell line: HEK293FT (Invitrogen).
2. To maintain the cells, use Dulbecco’s Modified Eagle’s Medium (DMEM) with phenol red (Cellgro) supplemented with 10% fetal bovine serum (FBS; Atlanta Bio), 1% penicillin/streptomycin (Sigma), 1% L-Glutamine (Sigma), and 1% MEM nonessential amino acids (HyClone). Culture them at 37 °C in a 5% CO2-humidified incubator.
3. DMEM phenol red medium (Cellgro).
4. DMEM medium (Cellgro) supplemented with 10% fetal bovine serum (FBS; Atlanta Bio).
5. Trypsin-EDTA 0.25% phenol red (Gibco).
6. DPBS, no calcium, no magnesium (Gibco).
7. Attractene transfection reagent (Qiagen).
8. Lipofectamine 3000 transfection reagent (Invitrogen).
9. Opti-MEM reduced serum medium (Gibco).
10. 24-well plates, flat bottom (Corning).
11. Countess™ II Automated Cell Counter and Countess® Cell Counting Chamber Slides (Invitrogen).
12. Trypan blue.
13. EVOS® Cell Imaging System (Life Technologies), using a 10× objective with EVOS® Light Cubes Texas Red, GFP, and DAPI.
14. LSR Fortessa flow cytometer equipped with 405, 488, and 561 nm lasers (BD Biosciences).
15. FlowJo software version 10.5 to perform data analysis.

2.2.2 PCR and Plasmid Cloning

1. Accuprime PFX Supermix (ThermoFisher Scientific).
2. BamHI-HF and PacI restriction enzymes (NEB), 5 U/μL, and CutSmart® Buffer to a final concentration of 1×.
3. In-Fusion HD cloning kit (Clontech), used at a final concentration of 1×.
4. E. coli Stellar™ Competent Cells (Takara).

2.2.3 In Silico Protein Engineering

1. Personal computer equipped with PyMOL [16].
2. SWISS-MODEL server for protein modeling.

3 Methods

The framework combines sensing and actuation modules. The sensing modules are based on intrabodies and the actuation modules are based on the Tango-TEV technology. The build-and-test process comprises the following steps:

3.1 Intracellular Protein-Sensor Devices to Regulate Cell Fate

1. Genetic circuit construction.
2. Transduction/transfection.
3. Protein expression.
4. Protein binding (detection).
5. Protease cleavage.
6. Protein nuclear translocation.
7. Transcriptional activation.

3.1.1 DNA Cloning and Plasmid Construction

One intrabody is fused at the N-terminus to a membrane-tethered fluorescent tag (mKate) and at the C-terminus to a Tobacco Etch Virus (TEV) cleavage site (TCS) and to a GAL4-VP16 transcriptional activator, forming a chimeric protein sequestered in the cytosol. A second intrabody is fused to the TEV protease (TEVp) that recognizes and cleaves the TCS. The presence of the target protein in the cytosol and its subsequent recognition by the two intrabodies result in TEVp cleavage of the TCS and release of GAL4-VP16, which translocates into the nucleus and converts protein detection into programmed gene expression.
Plasmids containing the sensing and actuation modules are built with the Gateway, In-Fusion, or Golden Gate systems described below. The membrane-tethered module is driven by the constitutive promoter hEF1, whereas the TEVp module is driven by either the hEF1 or the TET (responsive to doxycycline) promoter. Below we show examples of module combinations for the NS3, HTT, and HIV devices (see Notes 1–4):
1. NS3 device: FGFR-mKate-scFv35-LD0-TCS(L)-GAL4-VP16 and TEVp-LD0-scFv162 (see Notes 5 and 6).
2. HTT device: Happ1-LD0-TCS-(S) and DD-TEVp-LD0-Vl12.1.
3. Nef device: sdAb19-LD0-TCS(L)-GAL4-VP16 and TEVp-LD0-SH3.
Final plasmids can be obtained by a combined Golden Gate and Gateway strategy. First, chimeric proteins are generated by PCR amplification of each gene, adding restriction sites for the Type IIS enzymes BsaI or BsmBI to perform Golden Gate reactions in the donor vector. Next, Gateway recombinations are performed with a plasmid containing the promoter and a destination vector, following the manufacturer’s instructions.
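
Golden Gate assembly generally works best when the parts themselves carry no internal recognition sites for the Type IIS enzyme in use, so it can help to scan each amplified gene before designing primers. The sketch below is an illustration rather than part of the published protocol: it searches both strands for the BsaI (GGTCTC) and BsmBI (CGTCTC) recognition sequences. The function names and the example sequence are hypothetical.

```python
SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_type_iis_sites(seq):
    """Return (enzyme, strand, 0-based start) for every recognition site found."""
    seq = seq.upper()
    hits = []
    for enzyme, site in SITES.items():
        # The sites are not palindromic, so the bottom strand is searched too.
        for strand, motif in (("+", site), ("-", reverse_complement(site))):
            start = seq.find(motif)
            while start != -1:
                hits.append((enzyme, strand, start))
                start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[2])


example = "ATGGGTCTCAAAGAGACGTTT"  # hypothetical sequence with one site of each kind
hits = find_type_iis_sites(example)
print(hits)
```

Any hit inside a coding sequence would typically be removed with a synonymous mutation, or the part would be assembled with the other enzyme.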

3.1.2 Transfection of HEK 293FT Cells and Fluorescence Imaging

1. Carry out protein sensor transfections in 24-well plate format, transfecting HEK 293FT cells with Attractene.
2. Prepare a mix of 300 ng of total DNA in DMEM base medium without supplements to a final volume of 60 μL.
3. Add 1.5 μL of Attractene to the DNA mix and vortex the samples promptly to mix. Incubate the complexes for 20–25 min at room temperature (see Note 7).
4. During the incubation time, harvest the cells by trypsinization and seed 2 × 10^5 cells in 500 μL of complete culture medium per well.
5. Add the transfection complexes dropwise to the freshly seeded cells. Gently mix the plates and incubate at 37 °C in a 5% CO2-humidified incubator.
6. Supplement the cells with 1 mL of fresh growth medium 24 h post-transfection and analyze by flow cytometry after 48 h (see Note 9).
7. Perform confocal imaging with a Leica TCS SP5 II microscope equipped with an incubation chamber, using a 63× objective. Fluorescence and bright-field micrographs can be acquired with the Evos Cell Imaging System, using a 10× objective.
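
When several wells are transfected with the same plasmid mix, the per-well amounts in steps 2 and 3 can be scaled into a master mix. The helper below is a hypothetical sketch: the 300 ng, 60 μL, and 1.5 μL per-well values come from the steps above, whereas the DNA stock concentration and the pipetting overage are assumptions supplied by the user.

```python
def attractene_master_mix(n_wells, dna_ng_per_ul, overage=0.1):
    """Scale the per-well transfection mix to n_wells.

    Per well (from the protocol): 300 ng DNA brought to 60 uL with DMEM,
    then 1.5 uL Attractene. `overage` is an assumed pipetting excess.
    """
    scale = n_wells * (1 + overage)
    dna_ul = 300.0 / dna_ng_per_ul * scale
    total_ul = 60.0 * scale
    return {
        "dna_ul": dna_ul,
        "dmem_ul": total_ul - dna_ul,
        "attractene_ul": 1.5 * scale,
    }


# Hypothetical example: 4 wells, DNA stock at 100 ng/uL, no overage
mix = attractene_master_mix(n_wells=4, dna_ng_per_ul=100.0, overage=0.0)
print(mix)
```

After the incubation, 61.5 μL of the complexes would be added per well.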

3.1.3 Electroporation of TZM-bl and Jurkat Cells

1. Electroporate TZM-bl and Jurkat cells with the Neon Transfection System using a 10 μL Neon Tip.
2. For TZM-bl cells, prepare a total of 2 μg of DNA in a 1.5 mL tube.
3. Harvest 2 × 10^5 cells by trypsinization and centrifuge in PBS at 150 × g for 5 min at room temperature.
4. Remove the supernatant with a pipette, suspend the cells in buffer R, and then add the cells to the DNA tube, mixing gently.
5. Pick up the DNA and cell mix with the appropriate Neon Tip and transfer to the electroporator. Apply a pulse (pulse voltage: 1005 V, pulse width: 35 ms, pulse number: 2) and transfer all the cells to the well.
6. For Jurkat cells, prepare a total of 4 μg of DNA in a 1.5 mL tube.
7. Harvest 3 × 10^5 cells and centrifuge in PBS at 150 × g for 5 min at room temperature.
8. Remove the supernatant with a pipette, suspend the cells in buffer R, and then add the cells to the DNA tube, mixing gently.
9. Pick up the DNA and cell mix with the appropriate Neon Tip and transfer to the electroporator. Apply a pulse (pulse voltage: 1325 V, pulse width: 10 ms, pulse number: 3) and transfer all the cells to the well.
10. Infect the TZM-bl and Jurkat cells with HIV strains around 6–12 h post-transfection, allowing for recovery after electroporation.

3.1.4 Flow Cytometry 1. Acquire the cells with LSR Fortessa flow cytometer, equipped
and Data Analysis with 405, 488, and 561 nm lasers.
2. Collect 30,000–100,000 events per sample and acquire fluo-
rescence data with the following cytometer settings: 488 nm
laser and 530/30 nm bandpass filter for EYFP/EGFP, 561 nm
laser and 610/20 nm filter for mKate, and 405 nm laser,
450/50 filter for EBFP.
3. Convert flow cytometry data from arbitrary units to compen-
sated molecules of equivalent fluorescein (MEFL) using the
TASBE characterization method [19, 20]. The TASBE method
uses a strong constitutively expressed fluorophore, which
serves as both a transfection marker and an indicator of relative
circuit copy count.
4. An affine compensation matrix is computed from single posi-
tive and blank controls.
5. FITC measurements are calibrated to MEFL using SpheroTech
RCP-30-5-A beads.
6. Mappings from other channels to equivalent FITC are com-
puted from co-transfection of constitutively expressed EBFP,
EYFP, and mKate, each controlled by the hEF1a promoter on
its own otherwise identical plasmid.
7. Segment the MEFL data by constitutive fluorescent protein
expression into logarithmic bins at 10 bins/decade. Because
the data are log-normally distributed, compute the geometric
mean and variance of the data points in each bin.
8. Observe the constitutive fluorescence distributions. Select a
threshold for each data set, below which data are
excluded as being too close to the non-transfected population
(e.g., 1 × 10⁷ MEFL for NS3 and NEF HEK, 3 × 10⁷ MEFL
for HTT and TAT, 2 × 10⁵ MEFL for TZM-bl, and 10⁵ MEFL for
Jurkat data sets).
9. Remove high outliers by excluding all bins without at least
100 data points. Compute both population and per-bin geometric
statistics over this filtered set of data.
10. Include at least three biological replicates for all experiments
and indicate error bars using standard deviation. Variance for
all groups should be generally similar: any differences should be
reflected in the displayed standard deviation.
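
The binning and filtering described in steps 7–9 can be sketched as follows. This is a minimal illustration in plain Python and not the TASBE implementation; the function names, the input layout (pairs of constitutive and output MEFL values), and the example threshold are assumptions made for the sketch.

```python
import math
from collections import defaultdict


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def bin_statistics(cells, threshold, bins_per_decade=10, min_points=100):
    """Per-bin geometric mean of the output channel.

    cells: iterable of (constitutive_mefl, output_mefl) pairs
           (hypothetical layout chosen for this sketch).
    Cells below `threshold` in the constitutive channel are excluded,
    and bins holding fewer than `min_points` cells are dropped.
    Returns {bin_index: (n_cells, geometric_mean_output)}, where bin i
    spans 10**(i / bins_per_decade) to 10**((i + 1) / bins_per_decade).
    """
    bins = defaultdict(list)
    for constitutive, output in cells:
        if constitutive <= 0 or constitutive < threshold or output <= 0:
            continue  # too close to non-transfected, or not log-transformable
        index = math.floor(math.log10(constitutive) * bins_per_decade)
        bins[index].append(output)
    return {
        index: (len(outputs), geometric_mean(outputs))
        for index, outputs in sorted(bins.items())
        if len(outputs) >= min_points
    }
```

Calling `bin_statistics(cells, threshold=1e7)` would apply the NS3 example threshold from step 8. Geometric variance per bin could be added in the same loop; it is omitted here to keep the sketch short.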

3.1.5 Determination of HLA-I Surface Expression by Flow Cytometry
1. Determine surface expression of HLA-I molecules by staining
before fixation with AlexaFluor® 647 mouse anti-human
HLA A, B, C antibody clone W6/32 (dilution 1:20).
2. Quantify fluorescence signals with a BD LSR-II flow cytometer
and FACSDiva8 software with the following settings:
640 nm laser and 670/14 nm filter.
340 Giuliano Bonfá et al.

3. Convert the flow cytometry data from arbitrary units to compensated
molecules of equivalent soluble fluorochrome
(MESF) using Spherotech RCP-30-5A-2 beads.
4. Analyze the data with FlowJo software. Determine the mean
fluorescence intensity (MFI) and plot it for each condition. Include at
least three biological replicates for all experiments and indicate
error bars using standard deviation.

3.1.6 HIV Production and Infection
1. Produce HIV-1 strains by transfecting HEK-293T cells with
the corresponding infectious molecular clones (NIH AIDS
Reagent Program) using JetPRIME® reagent.
2. After 40 h, concentrate virus preparations by ultracentrifugation
for 1 h at 64,074 × g and 4 °C on 20% sucrose to exclude
proteins not associated with viral particles.
3. Titrate viral stocks by HIV-1 p24 ELISA.
4. For infection of TZM-bl cells, use a viral inoculum of 500 ng of
p24 for each strain.
5. Forty hours after infection, harvest, fix, and permeabilize the
cells with cytofix/cytoperm solution for 15 min at room
temperature.
6. Determine the percentage of infected cells by intracellular
staining of viral protein p24 with a FITC-conjugated antibody
(KC57-FITC, dilution 1:50) and flow cytometry.

3.1.7 Apoptosis Assays
1. Transfect the sensing-actuation devices together with the
pCMV-EGFP transfection marker.
2. Harvest sample cells 48 h post-transfection (including those in
the supernatant). Wash with PBS and stain with 2.5 μL of Pacific
Blue conjugated to Annexin V in 50 μL of binding buffer for
10 min at room temperature.
3. Analyze the cells by flow cytometry. Gate transfected cells
(EGFP+) and calculate apoptosis induction (cell death) within this
population, defined as the percentage of Pacific Blue-conjugated
Annexin V-positive cells.
4. Include at least three biological replicates for all experiments
and indicate error bars using standard deviation. Perform data
analysis for the apoptotic assays using FlowJo software.
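
The calculation in step 3 amounts to a nested gate followed by a replicate summary. The sketch below illustrates the arithmetic only; in practice the gating is done in FlowJo, and the event layout, gate values, and function names here are assumptions.

```python
from statistics import mean, stdev


def percent_apoptotic(events, egfp_gate, annexin_gate):
    """Percentage of Annexin V-positive cells among EGFP+ (transfected) cells.

    events: list of (egfp, annexin_v) intensities for one sample.
    Both gates are hypothetical thresholds, e.g. set from unstained
    and non-transfected controls.
    """
    transfected = [annexin for egfp, annexin in events if egfp >= egfp_gate]
    if not transfected:
        raise ValueError("no EGFP+ events in this sample")
    positive = sum(1 for annexin in transfected if annexin >= annexin_gate)
    return 100.0 * positive / len(transfected)


def summarize(replicate_percentages):
    """Mean and sample standard deviation across biological replicates."""
    return mean(replicate_percentages), stdev(replicate_percentages)
```

`summarize` expects at least two values; with the three biological replicates required in step 4 it returns the mean and the standard deviation used for the error bars.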

3.1.8 RNA Extraction, cDNA Synthesis, and qPCR
1. Perform RNA extraction with the RNeasy Mini Kit. Wash the cells
in PBS and add buffer RLT directly into the wells.
2. Incubate for 2 min at room temperature and collect the lysate with a
sterile scraper. Proceed with the RNA extraction according to the
manufacturer’s instructions.
3. Elute RNA in 30 μL of RNase-free water to maximize the yield.
4. Store RNA samples at −80 °C.
5. Synthesize cDNA using the QuantiTect Reverse Transcription Kit
according to the manufacturer’s instructions. Perform this protocol
on ice in an RNase-free environment to avoid RNA
degradation.
6. Always prepare a negative control without Quantiscript
Reverse Transcriptase to assess genomic DNA contamination
of the RNA preparation.
7. Dilute cDNA 1:10 and perform qPCR using Fast SYBR Green
Master Mix.
8. Load samples in MicroAmp™ Fast Optical 96-Well Reaction
Plate (0.1 mL) so that each well contains 20 μL of final volume
(10 μL SYBR Green Master Mix, 7 μL ddH2O, 1 μL of each
primer, and 1 μL of template).
9. Run the experimental plate in a StepOnePlus™ 7500 fast
machine.
10. Set a control without template (blank).
11. Perform analyses by calculating 2^(−ΔΔCt) to measure the fold
change of output expression (XCL-1) in the presence or absence of
the target protein (Nef), after normalization of Ct values to
endogenous housekeeping gene expression (GAPDH).
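
The fold-change calculation in step 11 can be written out as a short function. This is a sketch of the standard comparative Ct arithmetic, assuming roughly 100% amplification efficiency for both primer pairs; the function and argument names are illustrative.

```python
def fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the comparative Ct method.

    ct_target_*: Ct of the output gene (here XCL-1).
    ct_ref_*:    Ct of the housekeeping gene (here GAPDH).
    *_test:      sample with the target protein (Nef) present.
    *_ctrl:      sample without the target protein.
    """
    delta_test = ct_target_test - ct_ref_test  # normalize test sample
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalize control sample
    delta_delta = delta_test - delta_ctrl
    return 2.0 ** -delta_delta
```

For example, Ct values of 22 (XCL-1) and 18 (GAPDH) with Nef, against 25 and 18 without Nef, give ΔΔCt = −3 and therefore an 8-fold increase.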

3.2 Protein-Based Devices to Regulate RNA and Protein Activity
To build protease-responsive RBPs and protease-responsive proteases,
insert the cognate protease cleavage sites into their amino acid
sequence. The modification at the insertion site has to
(a) minimally affect the protein structure and activity and (b) ensure
protein disruption by proteolytic cleavage. To test the efficiency of
the devices, clone fluorescent reporters responsive to the RBPs’ activity.
All plasmids must be confirmed by sequencing.

3.2.1 Protein Structural Analysis and Plasmid Cloning
1. The L7Ae crystal structure is reported with the PDB ID 1RLG
[21]. Visualize its structure in PyMOL to identify possible insertion
loci for the TEV protease cleavage site (TCS): three loci can be
identified. Synthesize the three L7Ae-CS variants as gBlocks
and insert them into pL-A1 by In-Fusion between the BamHI and
PacI restriction sites with a backbone:gBlock ratio of 1:2.
2. A reporter plasmid encoding two kink-turn motifs
upstream of an EGFP fluorescent reporter
(pBoxCDGC_2xKMet_DD-EGFP) was designed in [15].
3. Insert cleavage sites for different proteases in the linker
between Ms2 and cNOT7 by PCR: amplify both Ms2 and
cNOT7 by PCR with AccuPrime Pfx DNA Polymerase from
pL-R1 and clone them by In-Fusion in pL-A1 between the BamHI
and PacI restriction sites with a backbone:PCR1:PCR2 ratio of
1:2:2.
4. A reporter plasmid encoding eight Ms2-binding motifs
downstream of an EGFP fluorescent reporter (pBoxCDGC-
mut_KMet-EGFP-8xMS2-pA) was designed in [15].
5. Visualize the TVMV protease crystal structure in PyMOL: alternative
insertion sites can be identified for the TUMVp cleavage site
(TUCS): (a) TVMVp-TUCS1 between amino acid residues
D26-G27, (b) TVMVp-TUCS2 between amino acid residues
Q119-K120, and (c) TVMVp-TUCS3 between amino acid
residues T173-N174. Insert the cleavage site by PCR into the
three loci. Link the PCR products to the backbone by In-Fusion.
6. Design TVMV-responsive TEV protease variants (TEV-TVCS)
by homology between TEV and TVMV: add the cleavage
sites in the same amino acid regions mentioned for TVMV.
7. As the TUMV crystal structure is not resolved, infer it by homology
with TEV using SWISS-MODEL.
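
When designing the insertion variants in silico, it helps to check that the residue numbering taken from the structure matches the sequence being edited. The sketch below inserts a cleavage site between two adjacent residues given in the "D26-G27" style used above. The helper names and the toy sequence are hypothetical, and the canonical TEV recognition sequence ENLYFQG is used purely as an illustration, since the site sequences are not listed in this section.

```python
import re


def _parse(label):
    """Split a residue label such as 'D26' into ('D', 26)."""
    match = re.fullmatch(r"([A-Z])(\d+)", label)
    if match is None:
        raise ValueError(f"cannot parse residue label {label!r}")
    return match.group(1), int(match.group(2))


def insert_between(protein, left, right, site):
    """Insert `site` between two adjacent residues (1-based numbering).

    The flanking residues are verified against the sequence so that a
    numbering offset (e.g. a tag or a missing Met) is caught early.
    """
    left_aa, left_pos = _parse(left)
    right_aa, right_pos = _parse(right)
    if right_pos != left_pos + 1:
        raise ValueError("residues must be adjacent")
    if left_pos < 1 or right_pos > len(protein):
        raise ValueError("residue number outside the sequence")
    if protein[left_pos - 1] != left_aa or protein[right_pos - 1] != right_aa:
        raise ValueError("sequence does not match the residue labels")
    return protein[:left_pos] + site + protein[left_pos:]


# Toy example: insert an illustrative site between D3 and G4.
variant = insert_between("MKDGAL", "D3", "G4", "ENLYFQG")
```

The same call with `"D26"` and `"G27"` on the real protease sequence would produce the first insertion variant, provided the sequence numbering agrees with the crystal structure.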

3.2.2 Protein–Protein Devices Testing
To test protein–protein interaction devices, measure fluorescent
reporter expression by flow cytometry at steady state 48 h post-
transfection (see Notes 7–14).
1. Perform transfections to test L7Ae-CS with Attractene transfection
reagent in HEK293FT cells in 24-well plate format.
Aliquot 60 μL of uncomplemented DMEM for each transfection
mix. Add a total of 400 ng of DNA per reaction mix (50 ng
of fluorescent reporter, 150 ng of L7Ae variant, 60 ng of wild-type
protease, 50 ng of transfection marker, empty plasmid to
400 ng) followed by 1.5 μL of Attractene transfection reagent.
Vortex all the reaction mixes and incubate for
15 min (Fig. 2).
2. Perform transfections to test Ms2-cNOT7 constructs and pro-
tease–protease circuits with Lipofectamine 3000 transfection
reagent in HEK293FT cells in 24-well plates. Prepare two
master mixes: (a) 25 μL of Opti-MEM and 1 μL of P3000
Reagent for each sample; (b) 25 μL of Opti-MEM and 0.75 μL
of Lipofectamine 3000 for each sample. Aliquot 26 μL of
master mix (a) per sample in separate Eppendorf tubes. Add
400 ng of DNA per reaction mix to the master mix (a) (25 ng
of fluorescent reporter, 50 ng of Ms2-CS-cNOT7 variant,
30 ng of engineered protease, 50 ng of wild-type protease,
50 ng of transfection marker, empty plasmid to 400 ng).
Then add 25.75 μL of master mix (b) to each sample and mix
them by vortexing. Incubate the reactions for 15 min (Fig. 3).
3. During the 15 min of incubation time, plate HEK293FT cells
in 24-well plates. First, remove the medium from the flask, then
gently wash the flask with PBS (10 mL for T75 and 5 mL for
T25) and add trypsin (1.5 mL for T75 and 0.5 mL for T25).
Keep the flasks for 2 min in the incubator. Add fresh
[Fig. 2 panel labels: (a) Ins1, Ins2, Ins3; (b) State 1 and State 2, with TEV, L7Ae_TCS, EGFP, and k-turns]

Fig. 2 (a) Crystal structure of L7Ae bound to its RNA target with the possible
insertion sites highlighted. (b) Graphical representation of the RNA-encoded
circuit regulated by a TEV-responsive L7Ae. State 1: In the absence of TEVp,
L7Ae_TCS represses EGFP translation. State 2: When TEVp is present, it
cleaves L7Ae, rendering it nonfunctional, and EGFP levels increase

[Fig. 3 panel labels, stages 0–3: (a) TUMV, TVMV_TUCS, TVCS; (b) TVMV, TEV_TVCS, TCS; (c) TEV, TUMV_TCS, TUCS; each acting on EGFP mRNA carrying Ms2 binding motifs]

Fig. 3 Graphical representation of protease-based cascades. In all cascade variants, at stage 0, EGFP is
expressed and at stage 1 it is downregulated by Ms2-cNOT7. (a) At stage 2, Ms2-TVCS-cNOT7 activity is
disrupted by TVMV-TUCS and EGFP expression is restored; at stage 3, EGFP expression is knocked down again
as TVMV-TUCS is repressed by TUMV. (b) At stage 2, Ms2-TCS-cNOT7 activity is disrupted by TEV-TVCS and
EGFP expression is restored; at stage 3, EGFP expression is knocked down again as TEV-TVCS is repressed by
TVMV. (c) At stage 2, Ms2-TUCS-cNOT7 activity is impaired by TUMV-TCS and EGFP expression increases; at
stage 3, TUMV-TCS is repressed by TEV, and EGFP is downregulated

complemented DMEM to the trypsinized cells (3.5 mL for
T25 and 5.5 mL for T75). Mix 10 μL of resuspended cells with
10 μL of Trypan Blue, load 10 μL of the mix in a Countess®
Cell Counting Chamber Slide, and insert the slide in the Countess®
Cell Counter II for cell counting. Seed a total of 140,000
cells/well in a final volume of 500 μL of complete DMEM.
4. After 24 h, add 1 mL of fresh complemented DMEM to
each well.
5. After 48 h, observe the cells with the EVOS® Cell Imaging System
and acquire images of the transfection in all the fluorescent
channels.
6. Then analyze the cells with a flow cytometer. First, remove
DMEM from the wells and add 50 μL of trypsin to each well.
Keep the plates in the incubator for 2 min. Add 300 μL of DMEM
supplemented with 10% FBS to each well. Transfer the cells
into FACS tubes and keep them on ice. Vortex each tube for a
few seconds before loading it into the flow cytometer. Record
20,000 events in the single-cell population.
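
The DNA amounts in steps 1 and 2 leave a remainder that is filled with empty plasmid, and the Lipofectamine master mixes scale with the number of samples. The sketch below works through that arithmetic; the function names, dictionary keys, and the optional `overage` parameter are assumptions added for illustration.

```python
def empty_plasmid_ng(components_ng, total_ng=400.0):
    """Amount of empty plasmid needed to reach the total DNA per reaction."""
    used = sum(components_ng.values())
    if used > total_ng:
        raise ValueError("components exceed the total DNA amount")
    return total_ng - used


def master_mix_ul(per_sample_ul, n_samples, overage=0.0):
    """Scale per-sample reagent volumes to a master mix.

    overage: optional extra fraction for pipetting losses
             (not specified in the protocol; 0 by default).
    """
    scale = n_samples * (1.0 + overage)
    return {reagent: volume * scale for reagent, volume in per_sample_ul.items()}


# DNA amounts (ng) taken from steps 1 and 2.
L7AE_MIX_NG = {"reporter": 50, "L7Ae variant": 150,
               "wild-type protease": 60, "transfection marker": 50}
MS2_MIX_NG = {"reporter": 25, "Ms2-CS-cNOT7 variant": 50,
              "engineered protease": 30, "wild-type protease": 50,
              "transfection marker": 50}

l7ae_filler = empty_plasmid_ng(L7AE_MIX_NG)   # 90 ng of empty plasmid
ms2_filler = empty_plasmid_ng(MS2_MIX_NG)     # 195 ng of empty plasmid
mix_a = master_mix_ul({"Opti-MEM": 25.0, "P3000": 1.0}, n_samples=8)
```

Keeping the total at 400 ng in every reaction means that differences in reporter output are not confounded by differences in the total DNA delivered.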

4 Notes

1. To optimize sensing-actuation performance, we
designed variants of the device by tuning several of its features.
TEVp was fused to the N-terminus or C-terminus of the intrabody
with a glycine-serine flexible linker sequence.
2. To obtain devices with a significant ON/OFF ratio for output
expression, we tested flexible linker sequences of different
lengths to maximize the likelihood of intrabody–target protein
and TEVp–TCS interactions, and tested TCS mutant–TEVp
complexes with different binding constants.
3. Constitutive TEVp expression induced significant activation of
the reporter gene (up to 100-fold ON/OFF induction), indicating
that the TEV cleavage site is accessible in the design configuration
and suggesting that careful tuning of protease
expression is critical to maximize the signal-to-noise ratio.
4. The selection of target proteins for testing our sensing-
actuation framework was based on: (a) partial or complete
cytosolic localization, and (b) existence of intrabodies binding
two different epitopes of the antigen. Following these criteria,
we engineered genetic devices that recognize NS3, HTT, and
Tat/Nef proteins, respectively, specific for HCV, Huntington’s
disease, and HIV.
5. We confirmed NS3-intrabody interaction by fusing a BFP tag
(Blue Fluorescent Protein) to the N-terminus of nNS3
(BFP-nNS3) in a colocalization assay. Colocalization imaging
was performed after transfecting 293FT cells in 35 mm glass
bottom dishes with NS3, BFP-nNS3 and intrabody against
NS3 constructs. Cells were transfected with Lipofectamine
LTX following manufacturer’s instructions.
6. We found that a low-affinity TEV cleavage site (TCS-L) and low
sensor concentration provide the best operating conditions, in
agreement with the conclusion from a predictive computational
model that we implemented.
7. Transfection efficiency improves if the mix is vortexed for 3–5 s
immediately after Attractene/Lipofectamine 3000 addition.
8. Transfection quality improves when plasmid DNA is pipetted directly
into the mix (a) and not onto the sidewall of the 1.5 mL tube.
9. FACS analysis quality improves when 2 mM EDTA is added to the
medium, as it reduces cell clump formation.
10. Washing cells with PBS before detaching them with trypsin
reduces clump formation.
11. Pour PBS towards the flask’s sidewall and not directly on the
cell layer, because HEK cells detach very easily from the flask
surface and could be lost in this step.
12. When aliquoting the cells for transfection, take extra care
to resuspend them several times with serological pipettes
to disrupt clumps and avoid sedimentation.
13. Filtering Trypan Blue increases the accuracy of cell counting.
14. When preparing cells to seed for transfection, prepare
a mix of media and cells at the correct
density, then aliquot 500 μL of the mix per well.
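
The seeding mix recommended in Note 14 can be planned from the cell count. The sketch below assumes that `stock_cells_per_ml` is the live-cell concentration of the undiluted suspension (whether the counter readout already corrects for the 1:1 Trypan Blue dilution depends on the instrument settings); the function name and defaults are illustrative, with the defaults taken from the seeding density given in Subheading 3.2.2.

```python
def seeding_mix(stock_cells_per_ml, n_wells,
                cells_per_well=140_000, well_volume_ml=0.5):
    """Volumes (mL) of cell suspension and medium for a seeding master mix.

    Returns (stock_volume_ml, medium_volume_ml) such that aliquoting
    `well_volume_ml` per well delivers `cells_per_well` cells.
    """
    total_volume = n_wells * well_volume_ml
    total_cells = n_wells * cells_per_well
    stock_volume = total_cells / stock_cells_per_ml
    if stock_volume > total_volume:
        raise ValueError("stock is too dilute for the requested density")
    return stock_volume, total_volume - stock_volume


# Example: a full 24-well plate from a stock at 1.4 million cells/mL.
stock_ml, medium_ml = seeding_mix(1.4e6, n_wells=24)
```

In this example the mix is about 2.4 mL of cell suspension plus 9.6 mL of complete DMEM. Preparing volume for one or two extra wells is a common precaution, although the protocol does not specify it.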

References
1. Ausländer D, Eggerschwiler B, Kemmer C, Geering B, Ausländer S, Fussenegger M (2014) A designer cell-based histamine-specific human allergy profiler. Nat Commun 5:4408
2. di Bernardo D, Marucci L, Menolascina F, Siciliano V (2012) Predicting synthetic gene networks. Methods Mol Biol 813:57–81
3. Tigges M, Marquez-Lago TT, Stelling J, Fussenegger M (2009) A tunable synthetic mammalian oscillator. Nature 457:309–312
4. Siciliano V, Garzilli I, Fracassi C, Criscuolo S, Ventre S, di Bernardo D (2013) MiRNAs confer phenotypic robustness to gene networks by suppressing biological noise. Nat Commun 4:2364
5. Kipniss NH, Dingal PCDP, Abbott TR, Gao Y, Wang H, Dominguez AA, Labanieh L, Qi LS (2017) Engineering cell sensing and responses using a GPCR-coupled CRISPR-Cas system. Nat Commun 8:2212
6. Siciliano V, DiAndreth B, Monel B, Beal J, Huh J, Clayton KL, Wroblewska L, McKeon A, Walker BD, Weiss R (2018) Engineering modular intracellular protein sensor-actuator devices. Nat Commun 9:1881
7. Scheller L, Strittmatter T, Fuchs D, Bojar D, Fussenegger M (2018) Generalized extracellular molecule sensor platform for programming cellular behavior. Nat Chem Biol 14:723–729
8. Courbet A, Endy D, Renard E, Molina F, Bonnet J (2015) Detection of pathological biomarkers in human clinical samples via amplifying genetic switches and logic gates. Sci Transl Med 7:289ra83
9. Sedlmayer F, Fussenegger M (2017) Synthetic biology: a probiotic probe for inflammation. Nat Biomed Eng 1:0097
10. Schwarz KA, Daringer NM, Dolberg TB, Leonard JN (2016) Rewiring human cellular input–output using modular extracellular sensors. Nat Chem Biol 13:202
11. McNamara MA, Nair SK, Holl EK (2015) RNA-based vaccines in cancer immunotherapy. J Immunol Res 2015:794528
12. Sahin U, Karikó K, Türeci Ö (2014) mRNA-based therapeutics: developing a new class of drugs. Nat Rev Drug Discov 13:759–780
13. Cella F, Wroblewska L, Weiss R, Siciliano V (2018) Engineering protein-protein devices for multilayered regulation of mRNA translation using orthogonal proteases in mammalian cells. Nat Commun 9:1–9
14. Culler SJ, Hoff KG, Smolke CD (2010) Reprogramming cellular behavior with RNA controllers responsive to endogenous proteins. Science 330:1251–1255
15. Wroblewska L, Kitada T, Endo K, Siciliano V, Stillo B, Saito H, Weiss R (2015) Mammalian synthetic circuits with RNA binding proteins for RNA-only delivery. Nat Biotechnol 33:839–841
16. PyMOL | pymol.org. https://pymol.org/2/. Accessed 30 Oct 2019
17. Waterhouse A, Bertoni M, Bienert S, Studer G, Tauriello G, Gumienny R, Heer FT, de Beer TAP, Rempfer C, Bordoli L, Lepore R, Schwede T (2018) SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res 46:W296–W303
18. Engler C, Marillonnet S (2014) Golden Gate cloning. Methods Mol Biol 1116:119–131
19. Beal J, Wagner TE, Kitada T, Azizgolshani O, Parker JM, Densmore D, Weiss R (2015) Model-driven engineering of gene expression from RNA replicons. ACS Synth Biol 4:48–56
20. Beal J, Weiss R, Yaman F, Davidsohn N, Adler A (2012) A method for fast, high-precision characterization of synthetic biology devices. MIT CSAIL Tech Report 2012-008
21. Moore T, Zhang Y, Fenley MO, Li H (2004) Molecular basis of box C/D RNA-protein interactions; cocrystal structure of archaeal L7Ae and a box C/D RNA. Structure 12:807–818
22. Caliendo F, Dukhinova M, Siciliano V (2019) Engineered cell-based therapeutics: synthetic biology meets immunology. Front Bioeng Biotechnol 7:43
23. Cella F, Siciliano V (2019) Protein-based parts and devices that respond to intracellular and extracellular signals in mammalian cells. Curr Opin Chem Biol 52:47–53
Filippo Menolascina (ed.), Synthetic Gene Circuits: Methods and Protocols, Methods in Molecular Biology, vol. 2229,
https://doi.org/10.1007/978-1-0716-1032-9, © Springer Science+Business Media, LLC, part of Springer Nature 2021

partnerships .............................................. 145–146 syringe preparation........................................... 251
system maintenance and personnel ................. 147 feedback control algorithms
design...................................................................... 139 MPC.......................................................... 217–218
ML .......................................................................... 140 PI controller ............................................. 216–217
strategy........................................................... 141–144 relay controller ......................................... 215–216
synthetic biology .................................................... 138 hardware ................................................................. 193
test........................................................................... 140 PDMS ..................................................................... 205
Liquid handling................................................... 149, 152 time-lapse................................................................ 207
SYNTHETIC GENE CIRCUITS : METHODS AND PROTOCOLS
350 Index
actuation system ............................................... 214 dynamics ................................................................. 222
chip positioning....................................... 213, 214 inducible promoter modeling ...................... 229–231
microscope specs .............................................. 214 local methods ......................................................... 225
settings ...................................................... 214–215 model
tubes.......................................................... 212–213 building............................................................. 221
and turbidostats........................................................ 44 calibration ................................................. 258–259
Mixed integer nonlinear programming ...................... 122 selection .................................................... 231–235
Model calibration parameter estimation ................... 224, 225, 235–238
computational tools ............................................... 244 probability density function................................... 223
image processing stochastic global optimization algorithms ............ 226
cell-tracking and extraction ..................... 256–257 toolbox
fine-tuning ........................................................ 255 download and license....................................... 227
segmentation ............................................ 255–256 requirements and installation guide................ 227
microfluidic Ordinary differential equations (ODEs)
device fabrication ............................................. 244 cyber-physical platform .......................................... 242
experimental setup ................................... 244–245 Matlab solvers......................................................... 123
optimal experimental design......................... 258–259 nonlinear deterministic .......................................... 120
parameter estimation .................................... 257–258 solving................................................................. 98–99
practical identifiability ................................... 247–248 writing....................................................................... 98
sensitivity analysis .......................................... 246–247
structural identifiability .......................................... 245 P
test case ................................................................... 242
Parameter space analysis ....................... 22, 99–100, 106,
Model order reduction .......................... 46, 48, 295–296 121, 260, 261
Model predictive control (MPC) ....................... 217–218 PDMS, see Polydimethylsiloxane (PDMS)
Modularity ........................ 119, 149, 150, 267, 293, 332
Photolithography
Moieties .......................................................................... 75 consumables ........................................................... 192
MPC, see Model predictive control (MPC) control mold fabrication............................... 195–196
Multi-objective optimization.............................. 131–133
flow mold fabrication.................................... 194–195
machines ................................................................. 192
N
mask fabrication ..................................................... 194
Network control............................................................... 4 PI controller, see Proportional-Integral (PI) controller
Boolean models .................................................. 31–32 Piecewise-linear differential equation (PLDE) models
strategies ............................................................. 30–31 cyclic orbit ................................................................ 11
synthetic circuits................................................. 32–33 discontinuities .......................................................... 35
Network dynamics IRMA circuit ............................................................ 19
analysis oscillator with positive feedback ....................... 17–19
attractors and their stability ......................... 20–21 toggle switch ...................................................... 15–17
formal verification of network Polydimethylsiloxane (PDMS)
properties ................................................ 23–26 bonding .................................................................. 197
modular analysis ........................................... 26–28 casting and curing ......................................... 196–197
state transition graphs ..................... 21–23, 28–30 degassing................................................................. 249
control elastomer................................................................. 193
Boolean models ............................................ 31–32 mixing ..................................................................... 249
strategies ....................................................... 30–31 replica molding
synthetic circuits........................................... 32–33 cleaning and bonding ...................................... 209
Next generation sequencing (NGS) .................. 168, 176 microfluidic device preparation ............... 207–208
silanization........................................................ 207
O Practical identifiability ................... 223, 242, 244, 247–248, 261, 264
ODEs, see Ordinary differential equations (ODEs) Proportional-integral (PI) controller .................... 31, 32,
216–217
Optimal experimental design
AMIGO2 ....................................................... 226, 229 Protein-based devices
candidate models.................................................... 223 cell culture .............................................................. 336
code structure................................................ 227–229 flow cytometry ....................................................... 336
PCR......................................................................... 336
plasmid cloning ..................................... 336, 341–342 quality control and alignments........................ 320
protein–protein devices testing .................... 342–344 sample preparation ................................... 319–320
protein structural analysis ............................. 341–342 time-course assay...................................... 318–319
in silico protein engineering .................................. 336 transcription profiles ........................................ 321
transient transfection cell imaging ........................ 336 transformation .......................................... 317–318
Protein-protein regulation......................... 333, 342–344 library preparation
Protein-RNA regulation .............................................. 333 consumables ............................................. 316–317
Protein sensor-actuator...................... 332–334, 337–341 equipment......................................................... 317
materials
Q genetic analyzer installation..................... 178–179
QSSA, see Quasi steady-state approximation (QSSA) sequencing data ................................................ 179
software dependencies ..................................... 178
Qualitative modeling
Boolean models .................................................... 6–13 methods
DNA synthesis............................................................ 1 data preprocessing.................................... 182–183
differential gene expression ..................... 183–184
dynamic properties............................................. 34–35
gene expression dynamics .......................................... 2 initial workflow setup............................... 179–181
network dynamics promoters and terminators .............................. 184
response function ..................................... 184–185
analysis .......................................................... 20–30
control .......................................................... 30–33 temporary files and logs removing .................. 185
PLDE models ................................................ 3, 15–19 transcription profiles ........................................ 183
reviews ...................................................................... 34 in vivo assay ............................................................ 314
synthetic regulatory circuits................................... 4–6
S
Quasi steady-state approximation (QSSA) ................... 75
Sanger sequencing
R necessary data files.................................................. 171
Relay controller ................................................... 215–216 output ............................................................ 173–174
primer-free regions........................................ 171–173
Reproducibility.......................... 137–139, 142, 149–151
Resource allocation ...................................... 33, 269, 272 web application ...................................................... 173
Restriction digest analysis Sensor
DNA assembly verification .................................... 168 gene expression burden ......................................... 314
genetic logic gates .................................................. 184
web application ............................................. 169–170
Retroactivity intracellular protein (see Intracellular protein-sensor)
biochemical reactions............................................. 295 small-molecule........................................................ 176
Soft lithography
contraction theory ................................................. 296
error ............................................................... 303–304 clean/semiclean room ........................................... 249
external .......................................................... 299–301 consumables .................................................. 192–193
device fabrication
intermodular connections............................. 304–307
internal........................................................... 298–299 bonding of PDMS ........................................... 197
mathematical model of modules .................. 296–297 casting and curing, PDMS....................... 196–197
silanization........................................................ 196
model order reduction.................................. 295–296
modularity .............................................................. 293 machines ................................................................. 192
perturbations .......................................................... 294 Software ........................................................................ 193
scaling and mixing......................................... 301–302 automatic syringe movement ................................ 214
code implementation ......................................... 46–49
time-scale separation ..................................... 295–296
RNA-binding protein .................................................. 333 dependencies .......................................................... 178
RNA sequencing (RNA-seq) DSGRN .................................................................... 22
FACSDiva8............................................................. 335
burden-driven feedback ......................................... 323
characterize genetic parts....................................... 176 FlowJo .................................................................... 340
computational tool................................................. 176 and hardware components..................................... 138
MEIGO .................................................................. 245
host responds
library sequencing ............................................ 320 Snapgene Viewer .................................................... 160
plate-reader data............................................... 321 spectrum companies............................................... 146
promoter characterization ............................... 321 SynBioHub ............................................................. 151
tools ........................................................................ 139
Software (cont.) modeling framework..................................... 120–121
web-based ............................................................... 162 optimization
SSA, see Stochastic simulation algorithm (SSA) problem design......................................... 121–122
Standardization .................................. 138, 149–152, 157 solvers ....................................................... 122–123
genetic part.................................................... 158–159 oscillator design
necessary data files......................................... 159, 160 library of components.............................. 124–125
output ............................................................ 161–162 objective function..................................... 125–126
part regions.................................................... 160–161 problem definition ........................................... 124
web application ...................................................... 161 simulating the dynamics, circuit...................... 128
State transition graphs ............................ 3, 8–11, 13, 16, single objective optimization problem ... 126–128
18, 20–23, 28–30 Pareto front of solutions........................................ 134
Steady-state gene expression practical examples................................................... 123
batch cell-free reactions ......................................... 190 switch-like circuit design
device operation definition .......................................................... 129
cell-free expression ................................... 199–201 library of components.............................. 129–130
filling control lines ................................... 198–199 multi-objective optimization problem .... 131–133
flow lines filling ................................................ 199 objective functions ........................................... 130
experimental reagents ............................................ 194 SynBioHub ................................................................... 151
hardware setup ....................................................... 198 Synthetic biology
host cell................................................................... 189 application .................................................................. 2
microfluidic.................................................... 193, 194 batch cell-free reactions ......................................... 190
microscope hardware ............................................. 193 biocircuits ............................................................... 119
photolithography cell-free systems...................................................... 189
consumables ..................................................... 192 cyber-physical platform .......................................... 242
control mold fabrication .......................... 195–196 DBT ........................................................................ 137
flow mold fabrication............................... 194–195 design-build-test-learn cycle........................... 92, 267
machines ........................................................... 192 federal investments................................................. 143
mask fabrication ............................................... 194 genetic design......................................................... 177
soft lithography homeostasis .............................................................. 32
consumables ............................................. 192–193 laboratory automation (see Laboratory automation)
device fabrication ..................................... 196–198 microfluidics ........................................................... 190
machines ........................................................... 192 OED .............................................................. 241, 242
software................................................................... 193 on-line vs. off-line .................................................. 243
Stochastic modeling ....................................................... 92 photolithographic steps ......................................... 191
CME ......................................................................... 42 promoters and regulators ...................................... 189
continuous deterministic approach ......................... 42 sequencing methods .............................................. 176
gene expression noise............................................... 41 stochastic perturbations ........................................... 22
materials.............................................................. 43–52 toggle switch .............................................................. 4
methods .............................................................. 52–71 two-layer microchemostat design ................ 190, 191
spatial ........................................................................ 42 Synthetic circuits
Stochastic simulation algorithm (SSA) ................. 42, 43, characterizing promoter and terminator ..... 176, 177
78, 79, 83 contextual effects.................................................... 175
Stochastic simulations control ................................................................ 32–33
characterization of results ............................. 105–106 dynamic properties..................................................... 3
circuit performance .................................................. 44 genetic parts and devices ....................................... 176
dynamic model ......................................................... 67 Hill function .................................................. 111, 112
gene circuits.............................................................. 42 interactions ............................................................... 21
Gillespie algorithm ........................................ 102–105 inverse transform sampling.................................... 113
parameter scan............................................... 106–107 materials
stochastic notation ........................................ 101–102 built-in/custom-coded functions ..................... 93
time-trace....................................................... 102–103 computing long-term statistics.................... 49–50
SYNBADm model in proper form .................................. 44–49
initialization ................................................... 123–124 noise .............................................................. 49–50
installation ..................................................... 123–124 software......................................................... 50–52
memorylessness property....................................... 113 T
methods
abstracting the circuit .................................. 94–95 Throughput ............................... 137–143, 149–152, 167
compilation......................................................... 67 Trade-offs ........................................ 2, 83, 120, 122, 304
deterministic solution ................................ 98–100 Transcriptional logic gates
mass action equations .................................. 95–96 circuit function
models to redesign ................................... 107–110 nutrient quality......................................... 286–287
OpenFPM client program ........................... 52–67 RBS ................................................................... 286
parameter estimation ................................... 96–97 host-aware gate
simulation ..................................................... 67–71 AND ................................................................. 284
stochastic simulations............................... 100–107 NAND .............................................................. 285
models....................................................................... 91 NOT ................................................................. 283
novel gene circuits.................................................... 92 Type-2S assembly pre-validation
parameter values..................................................... 110 necessary data files.................................................. 163
redesign..................................................................... 92 output ..................................................................... 165
response function .......................................... 176, 178 restriction sites........................................................ 157
structure and behavior ............................................... 2 web application ............................................. 163–165
workflow ........................................................ 176, 177
W
See also Qualitative modeling
Synthetic construct ............................ 314, 317–318, 322 Whole-cell modeling........................................... 269, 270
Synthetic gene circuits, see Synthetic circuits
