Abstract

Mathematical modeling plays an important and often indispensable role in synthetic biology because it serves as a crucial link between the concept and realization of a biological circuit. We review mathematical modeling concepts and methodologies as relevant to synthetic biology, including assumptions that underlie a model, types of modeling frameworks (deterministic and stochastic), and the importance of parameter estimation and optimization in modeling. Additionally we expound mathematical techniques used to analyze a model such as sensitivity analysis and bifurcation analysis, which enable the identification of the conditions that cause a synthetic circuit to behave in a desired manner. We also discuss the role of modeling in phenotype analysis such as metabolic and transcription network analysis and point out some available modeling standards and software. Following this, we present three case studies—a metabolic oscillator, a synthetic counter, and a bottom-up gene regulatory network—which have incorporated mathematical modeling as a central component of synthetic circuit design.

1. Introduction

Synthetic biology aims to design novel biological circuits for desired applications, implemented through the assembly of biological parts including natural components of cells and artificial molecules that emulate biological behavior [1, 2]. Because of its parts-to-whole approach, synthetic biology has a significant engineering component. Engineering endeavors typically involve the three classical engineering strategies: standardization (ensuring that components of a system are compatible and exchangeable), decoupling (dissecting a system into less complicated subsystems), and abstraction (streamlining a problem to focus only on the pertinent facets) [35]. It may appear that it should be possible to apply these strategies toward constructing a synthetic biological circuit in a manner similar to constructing an electric or electronic circuit. The attainment of this ideal goal is, however, impeded by the overwhelming complexity of biological systems with their myriad biomolecules and interconnections as well as sparse databases of gene function [3]. Consequently it is challenging to convert design concepts to predicted results.

This stumbling block in synthetic biology can be alleviated by the use of computer-aided mathematical modeling. Modeling is a powerful and often indispensable link between design and realization in engineering. It can predict the dynamics of a network under several different conditions and combinations thereof. Due to this, a user can search large parameter spaces in silico to identify the small regions of parameter space that produce the desired behavior or the most effective design or, alternatively, avoid parameter values that result in undesired responses. Modeling also provides the capability of using knowledge about the constituent parts of a system to predict the behavior of a system as a whole. Therefore, mathematical modeling serves as a bridge connecting a conceptual design idea to its biological realization (Figure 1).

In this review we present mathematical modeling concepts as relevant to synthetic biology and illustrate their application through the discussion of three case studies [68]. While the role of modeling in synthetic biology has been expertly reviewed before (e.g., [4, 9, 10]), this review aims to build upon the previous reviews by collecting several modeling methodologies into a single article and exemplifying them with in-depth case studies.

Figure 1 summarizes the important role played by mathematical modeling in synthetic biology and the key steps in the modeling process. Briefly, a model is formulated on the basis of certain assumptions about the system. Two broad types of modeling frameworks are available: deterministic modeling and stochastic modeling. Depending on the model type and the system, the estimation of parameters in the model could be a crucial step in obtaining a satisfactory model, and optimization may play a major role in this process. The predictions of a completed model are used, in conjunction with sensitivity analysis, bifurcation analysis, or, in certain cases, metabolic and regulatory network analyses to obtain insights toward an effective design. Below, we present a detailed discussion of the steps involved in modeling.

2. Assumptions Underlying a Model

Biological systems are difficult to model and simulate despite a wealth of data on the structure and function of biomolecules and on cellular mechanisms. This is because biological systems exhibit complexity on several scales. Firstly metabolites, metabolic fluxes, proteins, RNA, and genes network in a highly complex manner; furthermore, their interconnections could constitute feedback or feedforward loops that respond at various time scales [11]. Secondly, living systems can be highly sensitive to time-variant environmental conditions such as light, humidity, and supply of nutrients. These and other unknown causes of uncertainty result in “biological errors”, which are distinct and usually greater in magnitude than instrumentation or measurement errors. Therefore, it is difficult to exactly predict the output of a biological system as compared to a mechanical or an electrical system and one has to often reconcile with an approximate reproduction.

However, a biological system can often be simplified to a level that permits a user to obtain insights toward synthetic circuit construction [11]. For example, Ma’ayan et al. [12] demonstrated how simplifying the dynamics of single components could lead to valuable information on a system’s function. Simplification of a model requires the making of various assumptions. A commonly used assumption is homogeneity, both within the cell and within a cell population. Spatially homogeneous time-variant systems can be modeled by ordinary differential equations (ODEs) (equation (1), see Figure 2, e.g.). However, time-variant systems that feature compartmentation [13], spatial segregation, or intracellular gradients [14] may require the use of partial differential equations (PDEs). Although solving PDEs (and thereby non-homogenous models) is computationally much more intensive than solving ODEs, it can pay off well. For example, effects such as the spatial segregation of two enzymes that may generate intracellular gradients or the effect of protein diffusivity on enzyme activity can be simulated by using nonhomogenous models [14]. Closely related to spatial homogeneity is the assumption of cell population homogeneity, which is very frequently employed in models of biological systems. However, the modeling of heterogeneous populations in chemical reactors [15] has found application in the modeling of heterogeneous cell populations, and stochastic models frequently employ it. Besides the homogeneity assumption, most models involving enzyme kinetics or transcriptional regulations also assume equilibrium [8], steady state [7], or quasi-steady state [6]. Such an assumption can remove time-dependence from the model and converts ODEs to simpler algebraic equations. The task of formulating the assumptions underlying a model is a fine balancing act between trimming down the complexity of the system while retaining the features of the system that are crucial to making reliable predictions for the application at hand. If a model based on certain assumptions does not agree with experimentally observed behavior, then the assumptions have to be revised [9]. This makes mathematical modeling an iterative process (Figure 1).

3. Types of Model Frameworks

3.1. Deterministic Mathematical Models

Mathematical models of biological systems can be categorized into two major types: deterministic and stochastic [4]. A deterministic model emulates a real system with analytical equations (usually ODEs or PDEs) that include numerical parameters. These equations are usually mass balances on cellular species (Figure 2), and the state of the system that is predicted by such a model is reproducible [4]. In contrast, a stochastic model endeavors to represent a real system with randomly interacting particles or species. The rate of each reaction between the species follows a probabilistic equation [9]; additionally, the time between the reactions can also vary. Stochastic models usually incorporate the fluctuations and noise inherent in real biological systems and examine the effect of noise on system dynamics [4].

Figure 2 explains the construction of a deterministic model for a simple network. Deterministic models usually employ differential equations used to describe interactions or reactions between biomolecules: where X and N are vectors comprised of species concentrations (and may be identical), is the rate of change of X, is a vector of model parameters (see the following section on model parameters), and is a nonlinear vector function that relates rates of change to concentrations [4]. Dynamic simulations of a system modeled by equation (1) are quite straightforward and will reveal the time-dependent characteristics of the system by generating time series trajectories of the species concentrations. Furthermore, the simulations help to analyze the behavior of a network when feedforward or feedback regulations are integrated into it. Such analysis has shown that the dynamic properties of feedforward loops depend on their specific architecture [16, 17], that positive or double negative feedbacks often introduce ultrasensitive or bistable switches [18, 19], and that negative feedbacks may reduce instability of the system [20]. Furthermore, multiple steady states or oscillations may occur as the result of positive and negative feedbacks [21, 22].

To analyze steady states of a time-dependent biological system, the time derivatives in (1) are set to zero [4]: which represents the steady state(s) of the system, is usually combined with bifurcation analysis to obtain the range of parameters in which the system will exhibit certain desired behaviors such as oscillations [23] (see Section 5 below).

Numerous deterministic models have been developed for biological systems, including several for synthetic circuits. Case studies I and II discussed in this article [6, 7] employ deterministic models.

3.2. Stochastic Mathematical Models

In deterministic models, every interaction and every parameter value is certain. Therefore, such models predict identical system dynamics for the same set of parameter values and initial conditions. However, real systems are characterized by unexpected and irreproducible fluctuations. To capture these fluctuations and their consequences on the behavior of the system, an alternative type of model, the stochastic model, is used. Such models mimic a system as a collection of interacting particles, with the reaction rates being governed by probabilistic rate laws [9]. An example of such a rate law is the chemical Master equation [10]. Stochastic simulation algorithms (SSAs) [10] such as Gillespie’s algorithm are then used to simulate the state of the system.

One approach in stochastic modeling is to assume that a system is comprised of randomly interacting biomolecules, wherein the reactions between the molecules are modeled as Poisson processes with a probabilistically determined rate parameter [9]. Another approach is to perceive a time-variant system as a discrete time stochastic process. This approach uses a random variable or a vector Xn to indicate the discrete state of the system amongst several (finite or infinite) possible states [24]. The fewer the system states, the easier it is to construct a stochastic model. Guido et al. [8] have employed the latter approach with six system states to develop a stochastic model for a bottom-up gene regulatory network. In this type of stochastic model, the probability at time n of each system state is computed based on certain assumptions [24]: System responses or outputs such as the rate of synthesis of green fluorescent protein (GFP) are then described in terms of the state probabilities: where is the net output resulting from a combination of n states and is the synthesis rate contributed by each state . The probability of a system state could be estimated by taking into account noises from synthesis and degradation of mRNA, GFP, or transcription factors and physical properties of the cell or system [8]. Finally, equation-free stochastic models have also been developed [25].

Several stochastic models have been developed for synthetic biological circuits and related simple biological systems [8, 2529]. In case study III of this article, we discuss a stochastic model for a bottom-up gene regulatory network reported in Guido et al. [8].

4. Parameters in a Model

Any model contains several variables that do not represent the system state, but whose values govern the dynamics of the equations in the model. Such variables include reaction rate constants, equilibrium constants, diffusivities, and other physical properties. These are termed “parameters” of the model, as opposed to “state variables” such as species concentrations that represent the state of the system. To make useful predictions from a model, the parameters in the model have to be accurately estimated.

Mechanistic models, which are based on physical and chemical laws, include parameters that carry physical, chemical, or biological meaning. However, there could be many instances where not much information is available about a system, and constructing a “black box” model is the only option available. The parameters of such a model do not carry physical or biological meaning, but their estimation is nevertheless indispensable to the success of the model. Occasionally, information about a system could be so meager that even a black box model cannot be constructed. In such cases, a reverse engineering approach is employed to translate observable information to not only parameters but also model equations. This approach involves searching through (discrete) topological space instead of (continuous) numerical parameter space. Sometimes, combining the topological and numerical parameters for a system and simultaneously searching for both types of parameters has many advantages in understanding systems that are sparsely known [30]. Sometimes, parameter estimation can be largely bypassed. Recently, Tran et al. [31] described an approach called ensemble modeling, suited to modeling metabolic networks. This technique bypasses the requirement of obtaining accurate parameter values by reducing an initial set of models. The final set of models so obtained is capable of describing phenotypes of enzyme perturbations. Such approaches offer versatile ways to attack parameter problems in modeling.

4.1. Parameter Estimation

Parameter estimation is known as the “inverse problem” or “model calibration” and is both a key step and a limiting step of model construction [32, 33]. Parameter estimation typically leads to a first working model. If this initial model exhibits significant departures from experimental data (a frequent occurrence), further experiments may need to be performed to refine the parameter values. This process is repeated iteratively until a satisfactory model is constructed [34, 35]. Model parameters can occasionally be found from the literature or estimated manually, although this is usually feasible for small parameter spaces and simple systems. In general, parameter estimation from experimental observations requires sophisticated techniques such as described below. Amongst these, (global) optimization, a computationally intensive but powerful and robust method, is widely used.

4.2. Optimization

Parameter estimation is generally an optimization problem that involves locating the optimum (minimum or maximum) of an objective function that represents how well the model simulations agree with experimental data. This can be expressed as The function , which represents the goodness of fit between experiment and simulation, is a scalar function of the parameter vector . Its optimum is determined by iteratively adjusting the values of the components of and sometimes revising model assumptions [4]. The function is most frequently a weighted sum-of-squares error (a chi-squared statistic) between the experimental data points and the corresponding simulated points [36]. Other objective functions such as the Bayesian estimator and the maximum likelihood estimator also work well [33]. The process of adjusting and refining parameter values to reach the optimum of this objective function can be performed manually for linear or piecewise linear models [37]. However, many biological processes are not only nonlinear but also random, thus necessitating nonlinear models, stochastic models, or both [3739]. In such cases, several general methods of parameter estimation are available, including some that are particularly suited to nonlinear dynamic systems typical in biology [4043]. Even while using sophisticated algorithms, parameter estimation can involve unexpected complications such as the inability of a given optimization algorithm to effectively search a parameter space. In such cases, an exhaustive searching of parameter space can sometimes be accomplished well by stochastic Monte Carlo algorithms [44]. However, the computation involved in such exhaustive searching may often become prohibitively expensive when the number of parameters runs into hundreds or thousands. Additionally, unobserved or unknown interactions that were not accounted for in the formulation of the model can result in unsuccessful parameter searches and will require the model assumptions to be revisited.

The calculation of the first and second partial derivatives of the objective function is sometimes useful in optimization. Gradient search optimization algorithms depend on the partial derivatives of the objective function (or the partial derivative matrix, the Jacobian) for their success [4]. The Jacobian J is defined as Occasionally it is also useful to analyze how the combination of two parameters may affect the system dynamics; this is accomplished through the Hessian matrix H: which is a matrix containing the second partial derivatives of the objective function with respect to pairs of parameters.

In gradient search methods it is not always possible to reach the global optimum of the objective function, especially for nonlinear objective functions that may have several local optima far away from the global optimum. Therefore, it may become necessary to sacrifice the speed of the gradient search methods for the exhaustive searching abilities of probabilistic methods [45]. Such methods avoid the inferior solutions often found by gradient search methods [46]. Examples of probabilistic optimization methods are simulated annealing, genetic, and evolutionary algorithms [47, 48] or scatter searches [49]. Conversely, a local minimum may sometimes be sufficient to generate parameter values that are practical. In some particularly difficult combinatorial optimization problems the local minimum found near the global minimum may turn out to be a better choice [50]. On some rare occasions, perhaps many of the optima could be useful for the researcher.

5. Model Analysis Techniques

After a satisfactory model is constructed, the analysis of the model and its predictions provides crucial input toward designing synthetic circuits that exhibit a given behavior. Here, we discuss two analysis techniques: sensitivity analysis and bifurcation analysis.

5.1. Sensitivity Analysis

Sensitivity analysis, which analyzes how sensitive a system is with respect to changes in parameter values [51], is useful in quantifying the significance of a parameter or parameters on system performance [4]. Local sensitivity analysis analyzes the effects of small perturbations whereas global sensitivity analysis is used to analyze the effects of perturbing parameter values over the entire parameter ranges. Caution should be exercised while extrapolating the results of sensitivity analyses to an operating point far away in parameter space, as this may not accurately predict system behavior at the operating point.

For local sensitivity analysis a sensitivity s is defined as where N is a vector comprised of species concentrations, represents a system state or output, and is a vector of parameters. The magnitudes of the elements of the vector s are proportional to the effect of the corresponding parameters [4]. In global sensitivity analysis, the parameter space (constrained by physical limitations, mass fraction summations, etc.) is explored by methodologies such as random sampling-high dimensional model representation [52], multiparametric sensitivity analysis, or Monte Carlo simulation [53].

5.2. Bifurcation Analysis

Bifurcation analysis is crucial to understanding and analyzing steady states, oscillations, and other dynamic features of a system and has found use in numerous modeling studies [6, 54, 55]. In many nonlinear models, the parameter space can be divided into regions that lead to a stable system, an unstable system, or a periodic (oscillatory) system. Identifying the boundaries of these regions will enable the design of a synthetic circuit that exhibits desired behavior. Two popular methods of bifurcation analysis are saddle node bifurcation analysis and Hopf bifurcation analysis. Saddle node analysis endeavors to investigate the threshold where a system functions as a biological switch and thus separates the region of the parameter space that confers monostability [56]. Hopf bifurcation analysis is predominantly used to analyze oscillators and can characterize the critical parameter values that enable a system to transition from a stable steady-state solution to a periodic solution [57, 58]. In Hopf bifurcation analysis the eigenvalues of the Jacobian matrix, equation (6), are used to obtain the threshold parameter values at which the system’s behavior changes drastically. A phase diagram is then constructed by identifying the points where the real parts of a pair of complex conjugate eigenvalues crosses zero while all other eigenvalues have negative real parts (see [6], e.g.; this paper is discussed in case study I below). While there are numerous examples of bifurcation analysis of deterministic models, one work [27] has comprehensively treated the applications of bifurcation analysis to a stochastic model of an oscillatory system.

6. Modeling as a Part of Phenotype Analysis

Synthetic biology can also benefit from metabolic flux and transcription network analyses, which combine high-throughput experimental observations (such as metabolome, isotopomer, and gene expression profiles) with mathematical modeling to quantitatively describe the phenotype of a biological system. This type of analysis could be particularly useful for highly complex systems. For example, Noirel et al. [59] presented a probabilistic metabolic model that was useful in analyzing the systemic metabolic effects of inserting synthetic circuits into a cell. Of specific relevance to synthetic biology is an elegant work [60] that used optimization to identify how regulation could be superposed on a metabolic network to optimize the network.

6.1. Metabolic Pathway Analysis

Isotope-assisted [6163] metabolic flux analysis is a powerful tool to evaluate carbon flow in metabolic networks and could be relevant in synthetic biology. In this method, material balance models for cellular species are used together with measurements of extracellular metabolites or isotopomers (from nuclear magnetic resonance or mass spectrometry) to obtain metabolic flux maps of a system. While the mathematical modeling approaches discussed above are generally still valid in flux analysis, these models can be enormously complex due to the vast numbers of reactions, myriad carbon atom rearrangements, reaction reversibilities, and variations of intracellular fluxes in real time. This complexity is significantly reduced through the use of compartmental matrix techniques such as cumulative isotopomers (cumomers) [64], bond isomers (bondomers) [65, 66], or elementary metabolite units [63]. Despite this reduction in complexity, metabolic flux analysis of compartmented [61] or instationary systems [62, 67] requires tremendous amounts of computation.

Another set of powerful techniques for modeling metabolic networks includes flux balance analysis (FBA) [6870] or genome-scale metabolic modeling [7173] and ensemble modeling [31]. In FBA, a metabolic network is modeled with linear stoichiometric equations, constrained by factors such as extracellular flux measurements and reaction irreversibilities. This model is usually solved by linear optimization and results in a map of steady-state flux values. Through such a reconstruction of a metabolic network, FBA can offer important insights towards selecting gene deletion targets.

Another valuable technique, metabolic control analysis [7476], aims to elucidate the interdependence of various parts of a metabolic network. The outcomes of this technique are metrics such as flux control coefficients [77], which represent the amount of control exerted by one system component (such as a metabolite) on another system component (such as an enzyme). This method has much to offer toward the important problem of linking genome and phenotype [78].

6.2. Transcription Network Analysis

Determining how genes are controlled by regulatory motifs is an important problem in biology. Because synthetic circuits are composed of well-characterized components, they can be used to investigate and quantify transcription networks. Such an investigation would employ a combinatorial technique to construct a circuit comprised of numerous genes and a smaller number of regulatory motifs [79]. The high-dimensional output of such a network is gene expression data and is the end product of the low-dimensional regulatory signals (transcription factor activities) and the strengths of the connectivities between the transcriptional motifs and genes [80]. These transcription factor activities and connectivities are quantified by analyzing the measured gene expression data, using one or more of several available methods [81]. These methods include principal component analysis [82], singular value decomposition [83, 84], independent component analysis [85], network component analysis [80], or state-space models [86]. Network component analysis is a powerful method that uses a priori knowledge about connectivities between transcription factors and genes together with gene expression data to quantitatively infer transcription factor activities and the strengths of the transcription factor-gene connectivities. The a priori information is obtained from databases or experimental techniques such as ChIP-chip analysis [87].

7. Modeling Standards and Software

Several standards and software are now available to simplify the process of building mathematical models and thereby bridge the gap between model description and prediction of the system’s behavior. System biology markup language (SBML) [88] and synthetic biology open language (SBOL) (http://dspace.mit.edu/handle/1721.1/49523) are two examples of standards. Both are computer-readable formats for representing models and facilitate the sharing of models between researchers and between different software platforms. Several modeling software are available to practitioners of synthetic biology. These software, which are usually written in popular computer languages such as C++, feature a user interface, relatively simple ways to input information and graphical output of the modeling outcomes, thus relieving users of the burden of setting up and solving mathematical equations. A nonexhaustive listing of these software includes Athena (http://www.codeplex.com/athena), BioJade (http://web.mit.edu/jagoler/www/biojade), Gepasi (http://www.gepasi.org), SynBioSS (http://synbioss.sourceforge.net/), which reads and writes in SBML, and TinkerCell (http://www.tinkercell.com/Home) which enables users to incorporate new features through custom programs written in C or Python.

8. Case Studies

Below we present three case studies that illustrate several of the previously discussed modeling methodologies. The case studies feature two deterministic models [6, 7] and one stochastic model [8]. Table 1 summarizes the principal modeling techniques used in the case studies.

8.1. Case Study I: Deterministic Model of a Synthetic Oscillator

Fung et al. [6] reported the mathematical model-aided design of a gene-metabolic oscillatory circuit called the metabolator. Toward designing an oscillatory circuit the authors conceived a network with two interconverting metabolite pools wherein one metabolite differently regulates the enzymes that interconvert the two pools. Such a network is theoretically capable of oscillation (see Figure 3(a)). The circuit was implemented in Escherichia coli using the acetate pathway (see Figure 3(b)). Under certain circumstances, the sizes of the pools M1 (Acetyl-CoA) and M2 (lumped pool of Ack, AcP, , and HOAc) can oscillate. The readout of this network was a GFP, located downstream of the network such that the oscillations of the GFP readout reflect any oscillations occurring in the network. A mathematical model was developed to analyze the behavior of the metabolator and determine the conditions under which oscillations would occur. The model was deterministic and employed ODEs of the following form: where X represents any pool and , indicate its influx and outflux, respectively. Equation (9) was used to describe the metabolite pools M1 and M2 while equation (10) and Michaelis-Menten rate laws were used to describe the kinetics of the enzymes driving the M1-M2 interconversions. Using parameter values or ranges typical for this system, the authors implemented the model with a fourth-order Runge-Kutta algorithm. Parameter sensitivity analysis showed that increasing glycolytic flux increases the oscillatory capability of the system (Figure 4(a)). This prediction was tested by experimentally varying the glycolytic flux through the feeding of different carbon sources (glucose, fructose, mannose, and glycerol), each of which resulted in a different value of this flux. An explanation for this observation is that there existed a threshold value of the glycolytic flux beyond which the system would oscillate. Conversely, high external acetate was predicted to suppress oscillation (Figure 4(c)), which was also verified by experiments.

Hopf bifurcation was then used to characterize the dynamics of the model and determine the transition point at which the steady state would turn to a periodic state. Figure 4(b) depicts the phase diagrams constructed by mapping the locus of Hopf bifurcation. The oscillations approach a limit cycle and were stable, as determined through Floquet analysis. As previously inferred, oscillations occur above a threshold glycolytic flux value and are not sustained at a high acetate concentration. Furthermore, by comparing the modeling simulations with experimental data, the authors found that the inherent noise in gene expression was an important determinant of the amplitude of oscillation.

This work represents a universal approach to construct a synthetic biological circuit with interesting dynamics and beautifully demonstrates the key role played by mathematical modeling in realizing a design concept. Modeling offered valuable insights on the analysis of experiment data and made nontrivial predictions of the system dynamics. The use of bifurcation analysis was particularly useful as it facilitated the determination of the points at which the system transitions between stable and periodic states. We expect that as oscillator design develops [21], modeling will become ever more relevant in the design process.

8.2. Case Study II: Deterministic Model of a Synthetic Counter

Another example of deterministic modeling is that of Friedland et al. [7], who developed synthetic counters in E. coli that can count up to two or three induction events. The first of these, a riboregulated transcriptional cascade (RTC) two-counter, has two nodes and is able to count up to two arabinose pulses by expressing a different protein in response to each pulse. This was extended to the RTC three-counter (Figure 5) which has three nodes and can count up to three pulses as follows. The constitutive promoter drives transcription of T7 RNA polymerase (T7 RNAP), whose gene product binds the T7 promoter, which in turn drives the transcription of T3 RNAP. Similarly the protein of T3 RNAP binds the T3 promoter and ultimately drives the transcription of GFP. All genes are further downregulated and upregulated by cis and trans elements of riboregulators. A cis repressor (cr) interacts with the downstream ribosome binding site (RBS) in such a manner as to prevent translation. The arabinose promoter drives a transactivating, noncoding RNA (taRNA) that binds to cr in trans, thus relieving the inhibition of translation. Due to this design, protein synthesis at each node requires both transcription and translation to independently happen. Thus this cascade is able to count three arabinose pulses by expressing a different protein in response to each pulse.

The authors constructed and analyzed a mathematical model for the two- and three-counters. The model used equations of the form of (10) to describe the dynamics of the species in the circuit. The degradation terms in these equations were assumed to be simple exponential decays with a different rate constant for each species whereas the synthesis rates were rate laws that reflected how each species was synthesized. The Hill function was used to describe arabinose induction and the dynamics of GFP. Arabinose pulse dynamics were modeled with two differential equations as follows. Arabinose consumption from the medium was modeled as a zero order rate law: whereas the decay of intracellular arabinose in cells suspended in arabinose-free media was modeled as a first order rate law: The arabinose pulses were mimicked by alternately using equation (11) and equation (12). The authors used optimization (implemented by a MATLAB routine lsqcurvefit) to evaluate (fit) the parameters in the model so as to agree with experimental data (Figures 6(a) and 6(b)). The model with fitted parameters was used to examine the effects of arabinose pulse frequency and length on the performance of the RTC three-counter (Figures 6(c) and 6(d)). The simulations indicated the ranges of pulse intervals (10 to 40 minutes) and pulse length (20 to 30 minutes) that would result in maximal system response (GFP fluorescence); these predictions were later confirmed by experiments (Figures 6(c) and 6(d)). The simulations indicated that the system response increased significantly when two or three pulses were delivered. Also indicated by the simulations was a small amount of leakage that occurred in response to partial pulses; this was also verified experimentally (Figures 6(a) and 6(b)). The authors further used this approach to construct a DNA invertase cascade (DIC) counter and discussed extending this counter with the use of other unique polymerases or recombinases to record N sequenced events.

The RTC two- and three-counters constitute an elegant example showing how synthetic circuit elements can be combined to recognize sequential events. Here too, the mathematical model was important in investigating the system dynamics and identifying the pulse length and interval that yielded the most effective response. Parameter estimation (and thereby optimization) provided crucial insights toward improving counter performance. As these counters are expanded to become capable of counting larger numbers of events (and thereby increasing in complexity), the role played by mathematical modeling in design will become increasingly important.

8.3. Case Study III: Stochastic Model of a Bottom-Up Gene Regulatory Circuit

Guido et al. [8] constructed regulatory networks by assembling simpler modules, such that the behavior of the network was predictable from that of the components. The authors engineered the promoter such that it caused both repression and activation of gene expression in E. coli. For this, they combined an unregulated promoter, a repressor-only and an activator-only system (Figure 7). The unregulated system (Figure 7(a)) consists of an promoter driving GFP expression. In the repressor-only module (Figure 7(b)), the promoter controls the transcription of lacI, which binds to the site to repress the promoter. This binding can be tuned by with the lacI inhibitor isopropyl-b-D-thiogalactopyranoside (IPTG), which reduces the binding of lacI protein to the site. In the activator-only module (Figure 7(c)), the promoter controls the transcription of cI, which can bind sequentially or cooperatively to either the 1 or the 2 sites of the promoter to activate it. This effect is tunable with arabinose, which activates the promoter. The repressor-activator system is constructed by combining all three of these modules.

The authors first developed a baseline deterministic model for the repressor-activator system and then extended it to include stochastic effects. A quasi-equilibrium state was assumed for the promoter such that there were six possible chemical states of the promoter (Figure 8). The interconversions between the states were modeled with the equation: where is the transition rate from promoter state to , is the equilibrium constant for the interconversion, and [P], depending on the reaction, is the concentration of either lacI or cI. The probabilities of each state were functions of the transition rates and the average GFP synthesis rate was in turn a function of the state probabilities (see discussion around equation (4) and Figure 9 for details). The authors extended the model with stochastic effects so that it was able to simulate cell growth and the distribution of GFP within the cell population. For example cell growth was modeled by treating the cell volume as a random variable with an exponential distribution. The model assumed that at every instance of cell division, the volume halves and the cellular components (mRNA and GFP) are distributed amongst the daughter cells in a binomial distribution. The simulations of the deterministic and stochastic parts of the model are shown in Figure 10. The parameters in this model were determined by a gradient search local optimization algorithm and all stochastic modeling was implemented through the Gillespie Monte Carlo method realized by BioNetS software. The authors experimentally measured all mRNA synthesis rates and GFP fluorescence relative to the unregulated system and chose the parameter values that best fitted the fluorescence results shown in Figure 10. The model slightly underestimates the experimentally observed variability of fluorescence within the cell population which indicates that effects not considered in the model might be occurring.

To further verify the predictive power of the model and test its stochastic aspects, the authors expanded the circuit to include a positive feedback. This was accomplished adding the cI gene in front of GFP so that the repressor-activator system transcribes both cI and GFP simultaneously. This amounts to positive feedback because the product of the cI gene activates gene expression in the system. To effectively model this feedback loop, the model was extended to include synthesis, degradation, and multimerization of cI. This model with positive feedback exhibited very good agreement with experimental data. This agreement validated the bottom-up approach employed by the authors to study regulatory systems. Furthermore, the model was used to make the counterintuitive prediction that cessation of cell growth and division increases noise in protein expression levels, which was also verified experimentally.

Although other researchers similarly built larger regulatory systems from simple ones (e.g., [89]), the work by Guido et al. emphasized the important role played by mathematical modeling in the design and in making counterintuitive predictions about system dynamics. We expect that in the future, mathematical modeling along with transcription network analyses will become indispensable in the design of more complex regulatory networks.

9. Conclusion and Outlook

Mathematical modeling is an often indispensable tool in synthetic biology. The mathematical techniques of parameter estimation as well as sensitivity and bifurcation analyses can be crucial to the development of a model intended to mimic a complex system. Modeling also plays an important role in phenotypic analyses such as metabolic flux analysis or transcription network analysis.

A mathematical model is akin to a road map that provides a visualization of a geographical area. Although the map may not describe every detail of the landscape, it contains adequate information to enable users to plan a journey; a mathematical model is similar in scope [4]. A seminal review on mathematical modeling [90] stated that the purpose of models is to discern which parts and connections of a system are significant, to unravel new strategies, or sometimes to correct conventional wisdom. Examples of these uses abound in synthetic biology, where models have been employed to identify which regions in parameter space cause a system to behave in a desired manner (e.g., [6] and case study I) or what parameter values result in the most effective design (e.g., [7] and case study II). Furthermore, models have also been employed to understand the global dynamics of a system from known behaviors of its component units and have made counterintuitive predictions that were later verified experimentally (e.g., [8] and case study III). We anticipate that as experimental advances in synthetic biology produce increasingly complex circuits, mathematical modeling will play an ever more important function as a bridge between concept and realization.

Acknowledgments

This work was funded by the Department of Chemical and Biomolecular Engineering, University of Maryland (faculty startup grant to GS), the A. James Clark School of Engineering, University of Maryland (Minta Martin award to GS), and Maryland Industrial Partnerships (MIPS) (award no. 4426). Y. Zheng was cofunded by a Jan and Anneke Sengers fellowship.