Post-transcriptional bursting in genes regulated by small RNA molecules

Gene expression programs in living cells are highly dynamic due to spatiotemporal molecular signaling and inherent biochemical stochasticity. Here we study a mechanism based on molecule-to-molecule variability at the RNA level for the generation of bursts of protein production, which can lead to heterogeneity in a cell population. We develop a mathematical framework to show numerically and analytically that genes regulated post transcriptionally by small RNA molecules can exhibit such bursts due to different states of translation activity (onoroff),mostlyrevealedinaregimeoffewmolecules.Weexploitthisframeworktocomparetranscriptionaland post-transcriptionalburstingandalsotoillustratehowtotunetheresultingproteindistributionwithadditionalpost-transcriptional regulations. Moreover, because RNA-RNA interactions are predictable with an energy model, we deﬁne the kinetic constants of on-off switching as functions of the two characteristic free-energy differences of the system, activation and formation, with a nonequilibrium scheme. Overall, post-transcriptional bursting represents a distinctive principle linking gene regulation to gene expression noise, which highlights the importance of the RNA layer beyond the simple information transfer paradigm and signiﬁcantly contributes to the understanding of the intracellular processes from a ﬁrst-principles perspective.


I. INTRODUCTION
Living organisms of ranging complexity are continuously subjected to environmental changes that force them to make decisions. In this regard, they have evolved to base their behavior on intricate regulatory networks that allow expressing determinant genes according to dynamic signals [1]. However, the intracellular processes are inherently stochastic, which results in complex dynamic patterns [2,3]. In some cases, decisions mainly rely on deterministic processes, such as the expression of LacZ in the presence of lactose or IPTG in E. coli [4], whereas in others they rely on stochastic processes, such as competence (DNA uptake) in Bacillus [5]. Therefore, for a precise understanding of how such decisions are made and ultimately how living organisms perform, we need to characterize theoretically and experimentally the mechanisms that lead to dynamic gene expression as well as how such mechanisms can be shaped by evolution [6,7].
With the advent of experimental techniques with single-cell resolution, several mechanisms of stochastic nature have been recognized [8][9][10][11][12]. One of these mechanisms, which has been extensively studied in recent years, is transcriptional bursting [12][13][14][15][16][17]. In this mechanism the promoter of a gene (cisregulatory region at the DNA level) presents multiple states in terms of transcription activity (typically simplified to two, on and off), which ends in the generation of bursts of protein production and, in turn, of cell-to-cell variability. The promoter switches stochastically from one state to another, a process that is generally attributed to the action of a transcription factor (TF) or the chromatin [17]. Broadly, this switching between functional states might also occur at other regulatory * guillermo.rodrigo@csic.es layers, in particular at the post-transcriptional level. Indeed, recent experimental results with single-molecule resolution are starting to uncover time-dependent differences in the translation rates of equal messenger RNAs (mRNAs) [18,19]. However, it is still unknown to what extent mRNA-to-mRNA variability can have an impact on the cell (or the organism) and what theoretical models are useful to describe this biochemical process.
In this work we study the dynamic expression of a gene homogenously transcribed, but with a two-state cis-regulatory region at the RNA level (the leader region). This is a quite different regulatory model with respect to previous models, as the bursting process involves ribosomes rather than polymerases, a phenomenon previously overlooked. We develop a general mathematical framework applicable in both prokaryotic and eukaryotic contexts, where the translation rate (0 in the off state and k p in the on state) is approached by a random telegraph process ζ , characterized by the kinetic constants of activation and deactivation (k on and k off , respectively) [12]. As a case study, we apply this framework to a bacterial gene that is controlled by a small RNA (sRNA) [20]. Here the ribosomebinding site (RBS) is trapped in a strong RNA structure in the off state and it is released upon interaction between the sRNA and mRNA in the on state [ Fig. 1(a)]. Notably, through a comprehensive analysis with stochastic differential equations and molecular interaction thermodynamics, we are able to derive analytical noise expressions and provide a genotypephenotype map.
We expect the high application of this theoretical framework, which can be extended beyond riboregulation, to rationalize dynamic gene expression data derived from experiments with single-molecule resolution of translation, once a technology based on a protein-tagging system (called SunTag) able to recruit fluorescent proteins fused to an antibody (with great affinity for the tag) has been developed [21]. This has been recently proven effective to monitor with time and space the translational activity of single mRNAs in vivo [18,19], leading to the recognition of changes in activity due to the effect of signal molecules and/or intrinsic stochastic fluctuations. In addition, this study also serves to point out the role of regulatory RNAs in not only fine-tuning gene expression levels [22,23], but also gene expression noise [24,25].

A. Riboregulation of translation can lead to protein expression bursts
The concentration of the sRNA (s) that controls the RBS activity (i.e., the ability of the RBS to recruit ribosomes) is taken as an external variable, which may vary with time and fluctuate stochastically. As the sRNA binds reversibly to the mRNA, the system is in constant fluctuation between the on and off states. This way, translation is a time-dependent process, described by the telegraph process (s determines the binding kinetics, while the unbinding kinetics is constant). The system of stochastic differential equations that model the dynamics of the mRNA and protein amounts (denoted by x and y, respectively) reads where c is the gene copy number, k m and k p are the transcription and translation rates, respectively, and γ m and γ p are the effective mRNA and protein degradation rates, respectively. In addition, ξ m and ξ p are two independent Wiener processes that account for intrinsic noise in mRNA and protein production, respectively, and their statistics (mean and autocorrelation) are ξ i (t) = 0 and ξ i (0)ξ i (t) = δ(t) for i = m,p. In this work, we discard, for simplicity and without loss of generality, the effect of extrinsic noise [8]. Furthermore, the statistics for the telegraph process are ζ (t,x,s) = x p on and ζ (0,x,s)ζ (t,x,s) = x p on (1 − p on )δ(t/τ ), with p on = k on s /(k on s + k off ) and τ = 2/(k on s + k off ) (see Sec. 1 in the Appendix).
To investigate the phenomenon of post-transcriptional bursting, we first solve numerically the system with constant mRNA amount (see Sec. 2 in the Appendix). We use the values of k on and k off experimentally determined for the interaction between the sRNA SgrS and the mRNA ptsG in bacteria, leading to an asymmetric on-off switching (slow activation and fast deactivation) [23]. When there is only one mRNA in the cell (and ten sRNAs), we find a protein dynamics with bursts, resulting in irregular periods of accumulation and relaxation [ Fig. 1(b)]. In this scenario, bursts from heterogeneous translation occur at a frequency of k on s and they have a typical amplitude of k p /k off , although larger bursts can be observed due to the stochasticity of the system. Because the deactivation is produced in a few seconds, the RBS has to be strong so that bursts have a significant impact. This highlights the coupling between the sRNA-mRNA and ribosome-mRNA interactions. However, when there are many mRNAs in the cell, the fluctuations are less significant with respect to the mean value y [ Fig. 1(c)]. Indeed, bursts occurring at the singlemolecule level are hidden at the cellular level in the same way as single-cell variability is hidden when a cell population as a whole is analyzed [8].
Simulations with a full kinetic model (see Sec. 3 in the Appendix) give similar results (Fig. 2) [26]. We indeed observe bursts in protein expression when the average number of mRNAs in the cell is x < 10 mol. Nevertheless, Kolmogorov-Smirnov tests reveal significant differences between the distributions coming from the stochastic differential equations and the full kinetic model for different transcription rates [ Fig. 2(c)]. This reflects the approximation of the continuous model, which is instrumental to perform an analytical treatment.
We calculate analytically the amplitude of the stochastic fluctuations in expression under the mean-field approximation (x = x + x and y = y + y) [27]. The deterministic solution in the steady state is x = ck m /γ m and y = k p p on x /γ p . By applying the Wiener-Khinchin theorem, we obtain mathematical expressions for the noise in mRNA and protein amounts (η 2 x = x 2 / x 2 and η 2 y = y 2 / y 2 , respectively), given by While noise in the mRNA expression is simply Poissonian, noise in protein expression is contributed by three different sources. The first term in the equation accounts for the bursting process at the translation level, the second term for the transmission of noise from the mRNA and sRNA (having assumed the same degradation rate for these two molecules, γ m ), and the third term for the intrinsic noise in the processes of translation and protein degradation (Fig. 3). With the parametrization used here, the three terms have similar values. The bursting process increases by 1 the Fano factor (defined as y η 2 y ), representing a 37.5% of the noise.
These calculations provide noise-mean scaling laws, beyond the null model of a Poisson distribution, from which to confront experimental data, once technical variability has been corrected [28]. In this regard, the precise experimental determination of the Fano factor (greater than 1) of the effective noise term η 2 y − γ p γ m η 2 x could be a useful strategy to recognize scenarios of post-transcriptional bursting (if we assume γ p γ m and η 2 s = 0). Note that the Fano factor of η 2 y is already greater than 1 in the case of homogenous translation, because this process amplifies the noise in the mRNA amount [29].

B. Post-transcriptional vs transcriptional bursting
With the aim of performing a comparative analysis, we construct the system of stochastic differential equations that model the dynamics in the case of transcriptional on-off switching [as Eqs. (1)]. Now the telegraph process describes the transcription rate [k m → k m ζ (t,c,s)] and the translation rate is homogeneous [k p ζ (t,x,s) → k p ]. In this regard, s denotes the concentration of a TF that controls the promoter activity (i.e., the ability of the promoter to recruit RNA polymerases, leading to time-dependent transcription). Notably, this model might also be exploited to study the effect of genetic redundancy by playing with the parameter c [30]. We simulate the two systems by ensuring the same time scale in all biochemical processes and the same value of y . As expected, we find larger bursts of protein production in the case of transcriptional on-off switching [Figs. 4(a)-4(d)], as there is a process of signal amplification. We also obtain analytical expressions for the noise in mRNA and protein amounts (see Sec. 4 in the Appendix), already known [12].
A genetic implementation of a system having a high transcription rate and a low translation rate is widely accepted as a strategy to minimize the noise in protein expression, as a strong promoter recruits RNA polymerases constantly without bursting and a weak RBS does not amplify much the noise in mRNA expression [29]. Even though a low translation rate can certainly entail a phenomenon of post-transcriptional bursting, such a design principle is still valid, as the many mRNAs transcribed balance the intrinsic variability. Moreover, genes subjected to negative feedback could also present reduced noise in protein expression [31], but in a regime with a significant number of molecules (i.e., moderate feedback strength) [32]. Thus, in a regime of very few molecules (x ∼ 1 mol), where fluctuations occurring post transcriptionally can really be observed, negative feedback would not be useful to buffer them.
Subsequently, we analyze how information encoded on the signal molecule, an sRNA or a TF, is transmitted through these regulatory systems (by using mutual information as a metric, as previously done [30]). This is part of a longer information transmission chain, from the environment to the phenotype. Certainly, by increasing the number of signal molecules in the cell (s), the activation becomes faster and the dynamics more Poissonian. We find that the noise in protein expression is lower in the case of post-transcriptional on-off switching when the signal is weak (p on < 0.1), but higher when the signal is strong [ Fig. 4(e)]. We also find that only for small input dynamic ranges (s max /s min ) post-transcriptional regulation outperforms transcriptional regulation in terms of information transfer [ Fig. 4(f)]. This suggests that sRNAs may mainly work to fine-tune protein expression by sensing weak signals, while the gross expression level may be adjusted through transcriptional control.

C. Predictability and tunability of the protein expression distribution
According to previous work, the distribution of protein expression can be well approached by a Gamma distribution [13,33]. This form, can still be valid to describe a system with post-transcriptional bursting, provided we redefine its two characteristic parameters (shape and scale, denoted by a and b, respectively). Here b represents the total mean number of proteins produced per stochastic fluctuation (i.e., the Fano factor). Typically, k p /k off proteins are produced once the RBS is active and one mRNA and one protein are produced intrinsically. The mRNA molecule generates k p p on /γ m proteins before it is degraded. In total, b = k p /k off + k p p on /γ m + 1 proteins are produced on average per stochastic fluctuation. In addition, a represents the total mean number of stochastic fluctuations during the protein lifetime, which is the total mean number of fluctuations per mRNA in that period (n) times the mean number of mRNAs x (a = n x ). With τ on = 1/(k on s ), τ m = 1/γ m , τ y = 1/(k p p on ), and τ p = 1/γ p , we have n = τ p /(τ on + τ m + τ y ). Notably, in the limit p on 1 and γ p γ m , the Gamma distribution as constructed is compatible with Eqs. (1) and (2), as E[P (y)] = ab = y and Var[P (y)] = ab 2 = y 2 .
Because the bursting regime requires a low number of molecules, we then investigate two additional posttranscriptional regulations in bacteria that contribute to this end [ Fig. 5(a)]. On the one hand, the mRNA can be quickly degraded if an antisense small RNA (asRNA) binds to it [22,23], a mechanism that can also occur in eukaryotes to fine-tune protein expression and noise [24]. To model this, we modify Eqs. (1) as γ m → γ m + μ m , assuming a high constant amount of the asRNA and no binding interference with the sRNA. As a result of the additional regulation, the distribution is shifted to the left and becomes more skewed to the right [ Fig. 5(b)]. This shape comes from the slowly decaying autocorrelation of protein expression due to γ p 0 and it is similar to the distribution that would be obtained in the case of moderate transcriptional bursting. On the other hand, the protein sequence can contain a degradation tag [34]. In a regime of a few molecules due to a weak promoter (or the presence of an asRNA), the proteolytic machinery is not saturated and then such degradation can be considered to be of first order. We modify Eqs. (1) as γ p → γ p + μ p to model this updated scenario (typically, μ p γ p ). As a result, the distribution is again shifted to the left but with a radically different, monotonically decaying shape [ Fig. 5(c)]. In both scenarios, the distributions coming from numerical simulations are in agreement with those obtained analytically (see also Fig. 6).
Overall, these results highlight how post-transcriptional regulations are useful to modulate not only the magnitude but also the shape of the intrinsic variability. This can end in different dynamic protein expression programs of stochastic nature that can be switched by regulatory RNAs according to particular environmental signals [25].

D. Nonequilibrium thermodynamics of the sRNA-mRNA interaction
Finally, to further understand the process of posttranscriptional bursting, we inspect the dependence of k on and k off on the energetics of the sRNA-mRNA interaction. This characterizes the nonequilibrium thermodynamics of the system, which, however, satisfies detailed balance [p on k off = (1 − p on )k on s ]. Typically, the energy landscape is described by the free energy of activation ( G † > 0, the free energy required to get the two molecules together) and the free energy of formation [ G < 0, the net free-energy release upon interaction; see Fig. 7(a)]. According to the transition-state theory [35], a kinetic constant is defined by the free-energy barrier faced with the reaction progress. Here the transition state corresponds to the sRNA-mRNA complex in which only the seed regions of each molecule are annealed [20]. In the classical scenario of equilibrium, the ratio between k on and k off yields the equilibrium constant (K eq = k on /k off ∝ e −β G ), which determines the protein production and only depends on the free energy of formation (not of activation). However, extensive experimental data have shown that only G is not sufficient to explain the dynamic range in a scenario of riboregulation [36]. Instead, G # and G are the two main determinants. Even if formation is very favorable, the reaction does not progress in the absence of a suitable seed interaction [37].
We hypothesize a nonequilibrium scenario in which the system continuously fluctuates between the two states and the dependence of the average protein production on the energetics is different [38]. On the one hand, the off to on transition faces a free-energy barrier of G † . Detailed experiments with nucleic acids evidence an exponential acceleration of k on with the seed region length, as well as a saturation for a sufficiently large seed [39]. In addition, the larger the seed region, the lower the free energy associated with the seed region annealing ( G # < 0). Hence, prior to the saturation limit, we can state that G † scales with G # , leading to e −β G † ∝ e −β G # . After that, the formation of the final intermolecular complex takes place, with low probability and slow kinetics due to the formation of intermolecular base pairs at the cost of breaking intramolecular ones [37]. Note that the seed region annealing contributes in part to the total complex formation ( G # > G). On the other hand, the on to off transition faces a free-energy barrier that depends on G, due to the process of recomposition of intramolecular base pairs by breaking intermolecular ones. This is assumed to not depend on G † , as once the previous process has ended, most probably through different pathways, dissociation is immediate. Therefore, our coarse-grained energy model reads where k 0 on and k 0 off represent the asymptotic kinetic constants. The parameter β is empirically estimated to be a value lower than the theoretical value given by the Boltzmann constant [40,41]. Importantly, both G # and G can be directly calculated from the nucleotide sequences [42]. Figures 7(b) and 7(c) illustrate how the dynamics changes with such energetics.
This results in a computational tool [ y ∝ k on /k off ∝ e −β( G # + G) ] to analyze the dynamics of natural and mutant systems [36] or to guide the design of synthetic sequences to implement different functionalities in the cell [20]. Of relevance is that the experimental characterization of post-transcriptional bursting with different sRNAs would serve to infer the precise values of the kinetic constants and then verify such an energy model with the use of the predicted free energies [42].

III. CONCLUSION
We have presented a theoretical framework to study a distinctive mechanism at the post-transcriptional level for the generation of stochastic bursts in protein expression. This mechanism exploits mRNA-to-mRNA functional variation (in terms of translation activity), in contrast to DNA-to-DNA functional variation (in terms of transcription activity) [12][13][14][15][16][17]. We performed numerical and analytical calculations maintaining a simple mathematical treatment, but more complex analytics might be followed in the future to refine our predictions [43,44]. Certainly, our work opens avenues in biophysics to rationalize increasing single-cell and single-molecule imaging data [18,33]. In theory, post-transcriptional bursting arises when there are at least two clearly different translation modes (on and off). Moreover, the switching kinetics between the modes have to be highly asymmetric towards the off state (k on k off ) and the mean time in the on state sufficient to allow translation initiation (k p k off ). In prokaryotes, a scenario of positive riboregulation can favor this [20], but not a scenario of negative riboregulation (see Sec. 5 in the Appendix), not in the case of repression of an active RBS [36] or degradation of the mRNA [22,23]. In both latter cases, translation occurs in the absence of the asRNA, which results in a kinetic asymmetry towards the active translation state (Fig. 8). In eukaryotes, by contrast, the translation initiation machinery is more complex, involving the participation of several cofactors, which can result in spontaneous on-off switchings with slower kinetics (e.g., periods in the on state of some minutes) [18]. Consequently, bursts can be much larger.
All in all, our results highlight a fundamental contribution of the post-transcriptional control mechanisms to the construction of dynamic gene expression programs, stressing the utility of the RNA layer for the cell [45]. More broadly, they highlight how the nonequilibrium thermodynamics of the functional macromolecular complexes in the cell [46], in particular those involved in translation (here sRNA-mRNA), can lead to different phenotypes with time. This quantitative and testable description of translation initiation dynamics and protein expression variability, explicitly considering time and energy, is parallel to the development of single-molecule experimental approaches able to monitor the activity of those complexes in vivo [18,19], a combination that ultimately leads to an understanding of biology from first principles.

Simplification of the kinetics for the telegraph process
Because the on-off switching is very fast in comparison to other processes in the system (k off γ m γ p ), we approached a decay exponential by a Dirac delta e −2t/τ δ(t/τ ). Here 1/τ represents the mean switching rate. This simplified the analytical calculations to obtain Eqs. (2). Moreover, we used the first-order expansion ζ (t,x,s) ζ (t, x , s ) + k on k off ( s x + x s).

Numerical simulations of the stochastic differential equations
For numerical calculations of Eqs. (1), we followed