Ab initio scaling laws between noise and mean of gene expression

Gene expression is inherently noisy due to fluctuations occurring at the molecular level. From a top-down perspective, noise has been traditionally decomposed into an intrinsic component that scales inversely with the mean expression level and an extrinsic component that is constant in absence of regulatory changes. Here, we adopt a bottom-up approach to reveal that extrinsic noise, by itself, can follow the aforementioned decomposition, which entails that one component of it can be confounded with intrinsic noise. Analytical expressions of the noise-mean relationship were derived for different scenarios, which were in part supported by numerical simulations.


I. INTRODUCTION
Genes are the fundamental, information-encoding pieces into which the DNA of every living organism is divided, and their expression (e.g., the biochemical process by which RNAs and subsequently proteins are produced) is inherently stochastic [1]. Over the last years, much experimental and theoretical work has been conducted with the aim of unveiling and characterizing the mechanisms that underlie cell-to-cell variability in gene expression, what is called molecular noise [2][3][4][5][6][7][8]. By defining noise (denoted by η 2 ) as the square of the coefficient of variation of gene expression, the conventional model states that η 2 = b . + B [9][10][11], where the first term accounts for the intrinsic noise (i.e., the variability generated specifically in the gene of study) and the second for the extrinsic noise (i.e., the variability generated at a global scale, shared by different genes) [2]. In this simple but effective model, while the extrinsic noise is constant (B, this holds for nonregulated genes or invariant regulatory activity), the intrinsic noise scales inversely with the mean expression level (denoted by . ). In brief, b is the Fano factor and represents the number of molecules produced on average per stochastic fluctuation (when b = 1, the noise is Poissonian).
In this work, we challenge this conventional model. For that, we follow a bottom-up approach to derive a general model based on stochastic differential equations (SDEs) in which the extrinsic noise arises as a consequence of fluctuations in the different kinetic parameters that shape gene expression. For instance, the transcription rate will vary if the pool of available RNA polymerases does [12]. Similarly, the mRNA degradation rate will be susceptible to perturbations in ribonucleases [13]. The relationship between extrinsic noise and cellular factors has been previously tackled mathematically [14][15][16]. Yet, it has been difficult to end with a tractable model, integrating in a simple way the variations in all kinetic parameters as a result of fluctuations in cellular factors, from which to perform analytical calculations. Moreover, the * guillermo.rodrigo@csic.es growing generation of single-cell (and even single-molecule) data urges improved quantitative descriptions [17].

A. General framework
Let us denote by x i the concentration of the ith species at play, whose level depends on a series of biochemical reactions (of synthesis and degradation) occurring in the cell [1]. These reactions can be modeled if we know the different kinetic parameter values (α k denotes the nominal value of the kth parameter). Then, if f i j denotes the jth reaction affecting the expression of the ith species, by following a perturbative approach the general system of SDEs that model the dynamics of gene expression reads where ξ i j 's are independent Wiener processes that account for intrinsic noise in each biochemical reaction (white noise), and their statistics (mean and autocorrelation) are ξ i j (t ) = 0 and ξ i j (0)ξ i j (t ) = δ(t ) [18]. Moreover, α k is the uncertainty in the value of the kth parameter, which is associated with extrinsic noise. By assuming that α k is proportional to the concentration of a given cellular factor that fluctuates with time (z), typically a protein, we can write α 2 k ∝ z 2 . If we further assume that the noise in the cellular factor scales with its mean, as [9,10], we obtain α 2 k ∝ α k + ε k α 2 k , with ε k a constant balancing the scaling law of the extrinsic fluctuation. Therefore, given θ k , a constant of proportionality, it turns out α k (t ) = θ k √ α k + ε k α 2 k ζ k (t ). Here, ζ k 's are different Ornstein-Uhlenbeck processes that account for noise in each biochemical parameter (colored noise), and their statistics are ζ k (t ) = 0 and ζ k (0)ζ k (t ) = 1 2τ e −|t|/τ , where τ is the correlation time. It has been revealed that τ is given by the effective protein degradation rate (e.g., the growth rate in the case of bacteria) [4].
A sum of independent stochastic processes (of Wiener or Ornstein-Uhlenbeck) is equivalent to one process of the same nature, as j q i j ξ i j (t ) = √ j q 2 i j ξ i (t ). Hence, if the fluctuations in the different cellular factors are independent, and provided that each parameter introducing noise only affects a particular biochemical reaction in the system, Eq. (1) can be rewritten as where the same correlation time was considered for all Ornstein-Uhlenbeck processes. This seems reasonable if all cellular factors are proteins with the same half-life. However, if some factors were RNAs (typically more short-lived) or proteins with widely different stability, that consideration would introduce inaccuracies.

B. Expression model
Following Eq. (2), the system of SDEs that model the dynamics of the mRNA and protein concentrations of a given gene (denoted by m and p, respectively) reads where k m represents the transcription rate, k p is the translation rate, γ m is the mRNA degradation rate, and γ p is the effective protein degradation rate. Easily, the deterministic solution is m = k m γ m and p = k p m γ p . Note that here first-order kinetics are assumed, but nothing prevents the use of more complex kinetics (e.g., sensu Michaelis-Menten [19]) to model the dynamic behavior of the system. Only, the analytical expressions of noise will be more complex as a consequence of the derivatives with respect to the parameters and less accurate if we follow a perturbative approach [20].

Comparable Poissonian fluctuations in the kinetic parameters
The value of θ k might be different for each kinetic parameter, as each cellular factor would enter into the system with a characteristic susceptibility [14,15]. However, we can consider a scenario of comparable fluctuations in the parameters and then simplify the model with θ k = θ . Moreover, we can neglect the second-order contribution in the variance of the cellular factor and then further simplify the model with ε k = 0 (Poissonian regime). This assumption would hold if the cellular factor were in low or moderate abundance. Hence, under the mean-field approximation [21], the SDE for the fluctuation in the mRNA expression ( m) reads For convenience, the constant modulating of the extrinsic noise level can be redefined as θ = 2σ 2 γ p (also with τ = 1 γ p ). By applying the Fourier transform [22], it turns out that where S m is the power spectrum of m (ω represents the frequency variable). By applying the Wiener-Khinchin theorem, and in the limit γ m γ p (i.e., faster degradation of mRNA than protein), it turns out an expression for m 2 , the variance of the fluctuation. Finally, the noise in the mRNA which can be decomposed by defining b in = 1, b ex = σ 2 γ m , and B ex = σ 2 γ m , in intrinsic and extrinsic contributions ( This is illustrated in Fig. 1(a). In addition, the SDE for the fluctuation in protein expression ( p) reads By applying the Fourier transform, and noting that m, ξ p , and ζ p are independent fluctuations, it turns out that where S p is the power spectrum of p. Again, by applying the Wiener-Khinchin theorem, in the limit γ m γ p as before and further assuming k p γ p , it turns out an expression for p 2 . Finally, the noise in the protein expression (η 2 p = p 2 p 2 ) is 2γ p , in intrinsic and extrinsic con- This is illustrated in Fig. 1(b). It is worthwhile to note at this point that if the protein of interest were short-lived (τ 1 γ p ), we would have b ex = σ 2 k p γ m γ p and B ex = σ 2 γ p instead.

Comparable non-Poissonian fluctuations in the kinetic parameters
If the cellular factors were in high abundance, their expression noise would be mostly constant. We can now consider this scenario and simplify the model accordingly with ε k 1 α k , together with θ k = θ and ε k = ε for comparable fluctuations in the parameters. Hence, the SDE for m reads With θ = 2σ 2 γ p and γ m γ p , it turns out that Note that now b ex = 0 (i.e., the extrinsic noise is constant). Equation (7) stands with b in = 1 and B ex = 2σ 2 ε. In addition, the SDE for p reads Thus, we end with in which again b ex = 0. Equation (11) stands with b in = 1 + k p γ m and B ex = 2σ 2 ε.

Extrinsic noise dominated by fluctuations in the transcription rate
It has been suggested that transcription is the biochemical process that mostly contributes to noise in gene expression [23,24], especially in eukaryotes. We can take into account this scenario by setting θ 1 θ k , then leading to Again with θ 1 = 2σ 2 γ p and γ m γ p , it turns out that Thus, with b in = 1, b ex = σ 2 γ m , and B ex = σ 2 ε 1 , we recover Eq. (7).
In addition, the SDE for p is given by Eq. (8) with θ = 0, so It is important to note that the main difference with respect to the results obtained in Sec. II C 1 is that now, in terms of protein, B ex presents a lower value, as ε 1 1 γ p can be considered. This is illustrated in Fig. 2.

D. Numerical simulations
To validate, at least in part, our analytical calculations, we performed numerical simulations of the system of SDEs presented in Eq. (3). For that, we allowed propensities to vary with time, beyond the mean-field approximation (i.e., using m and p rather than m and p ). We considered different sets of kinetic parameters to reach different expression levels. The parameters were directly Ornstein-Uhlenbeck processes, avoiding a first-order expansion on them. Simulations were done for 10 4 h with an increment of 0.1 h by employing a practical numerical approach [25,26] implemented in MATLAB (MathWorks).
Specifically, we considered different nominal transcription rates (k m = 0.1-100 mol/h), keeping fixed the nominal mRNA degradation rate (γ m = 1 h −1 ), to achieve different values of m , covering several orders of magnitude. Moreover, we considered a single value for the nominal translation rate (k p = 1 h −1 ) and two values for the nominal protein degradation rate (γ p = 0.04 or 0.4 h −1 ), the greater to achieve low protein levels. Despite this, the frequency of the extrinsic fluctuations was always 1 τ = 0.04 h −1 . Accordingly, p also spanned over several orders of magnitude. Of note, these nominal values are compatible with experimental determinations in eukaryotes [27]. To model the fluctuations in the kinetic parameters, we set θ 1 = θ 2 = θ 3 = 1.41, θ 4 = 0.14, ε 1 = ε 2 = ε 3 = 1, and ε 4 = 25 (each in its appropriate units). Numerical simulations were then carried out by considering only intrinsic noise, only extrinsic noise, or both types of noise (Fig. 3).
Analytically, when k m ≈ 1 mol/h, the Poissonian regime can be invoked, then with σ 2 = θ 2 1 γ p (as ε 1 k m ≈ 1 and ε 2 γ m = 1). This leads to b ex = σ 2 γ m = 0.08, in terms of mRNA. However, this regime cannot be assumed to calculate B ex , because at high mRNA levels the transcriptional fluctuations are non-Poissonian. Instead, we shall write B ex = 2σ 2 ε 1 = 0.16. Of relevance, we found that Eqs. (7) with those values of b ex and B ex , together with b in = 1, explain well the data [ Fig. 3(a)].
In terms of protein, in addition, we can write b ex = σ 2 k p γ m γ p = 0.20, as for low p values the protein is short-lived (γ p = 0.4 h −1 ) and the Poissonian regime holds. However, as before, we need to write B ex = 2σ 2 ε 1 = 0.16. Of relevance, we also found that Eqs. (11) with those values of b ex and B ex , together with b in = 1 + k p γ m = 2, correctly explain the numerical data [ Fig. 3(b)].

III. DISCUSSION
In sum, we have relied on ab initio calculations to develop a simple theory that provides original noise-mean scaling laws in gene expression. Importantly, this noise description can be directly tested against experimental data (measuring expression levels for a large cell population, to then calculate means and variances). This can be done from expression data of genomic approaches [11,17] or externally regulable genes [28]. The slope (at low expression levels) and the floor (at high levels) of the noise-mean scaling law shall reveal the values of b in + b ex and B ex , respectively. To extract the value of b ex when b in is not known (>1), additional data will be required, such as from experiments with a second gene copy [15]. Moreover, we provide ab initio calculations for different scenarios (e.g., when all reactions equally contribute to the extrinsic noise or when transcription is the main determinant). Of note, b ex = 0 if the cellular resources used for gene expression are in great excess and a Poissonian regime cannot be invoked ( z 2 ∝ z 2 ). In principle, by analyzing the b ex B ex ratio, both in terms of mRNA and protein, we might get certain mechanistic insight about the underlying noise scheme.
Our calculations indeed highlight the importance of distilling with precision the different contributions to the Fano factor in order to appreciate the effect of the stochastic fluctuations of extrinsic nature, in contrast to the conventional wisdom [9][10][11]. In terms of mRNA, our model is more conceptual than practical, as b ex b in = 1 (i.e., σ 2 γ m , if we follow the Poissonian regime of extrinsic fluctuations). This is appreciated at low mRNA abundance, where η 2 m,in ≈ η 2 m [ Fig. 1(a)]. Indeed, by looking at the variability of highly expressed genes in mammalian cells, it results that B ex ≈ 0.06 [11]. Thus, assuming a half-life of about 1 h for transcripts, we have σ ≈ 0.24 h −1/2 (with b ex b in ≈ 0.06). In terms of protein, by contrast, the model predicts a higher b ex b in ratio (approximated by σ 2 2γ p ), due to time averaging of stochastic fluctuations from transcription to translation (γ p γ m ). Figure 1(b) shows how the noise-mean scaling is maintained for η 2 p,in and η 2 p , but with different Fano factors. By using that inferred value of σ and considering a half-life of about 1 d for mammalian proteins, we have b ex b in ≈ 0.72. This means that stochastic protein bursts are substantially contributed by changes from time to time in the transcription rate [24], and not only by intrinsic fluctuations in mRNA levels as previously thought [9]. This entails that such bursts might appear even in scenarios of high expression or might be correlated between different proteins [15].
So far, we considered a gene transcribed homogeneously at a given rate. However, increasing evidence suggests that many genes, mainly in eukaryotic cells, are transcribed by bursts [6,29]. In Appendix A, we provide a new noise calculation by considering this scenario. Now, as the bursting process typically dominates the amplitude of the stochastic fluctuations of intrinsic nature (k m k off in mammalian cells, then b in ≈ k m k off 1 for mRNA) [28], the impact of extrinsic noise on the Fano factor is reduced (b ex b in ). Yet, our results revealed that if transcription remains off for large periods, the bursting process also determines the fluctuations of extrinsic nature (γ m k on , then B ex ≈ σ 2 k on for mRNA; although this should be taken more in qualitative terms due to the use of an approximative model) [30]. Perhaps perturbations in chromatin remodelers that are propagated to the off-on switching kinetics can be explanatory [31].
Moreover, if the gene is regulated (e.g., by an upstream transcription factor), the major contribution to the extrinsic noise will be given by the uncertainty associated with the corresponding molecular binding (e.g., a protein-DNA binding). As observed experimentally [2,5], this leads to a peak-shape function by following a description based on Hill functions [32], which will only affect B ex .
We might also consider an alternative scenario by assuming that the different stochastic fluctuations of extrinsic nature are correlated (nonindependent), provided they are the result of global changes in the cell. For instance, a variation in the cell growth rate (in bacteria) could lead to systematic changes in the different resources used for gene expression [33]. In Appendix B, we provide a new noise calculation by considering the ideal case of common fluctuations, revealing that more complex noise-mean scaling laws can appear.
Finally, a limitation of a noise-mean scaling law is that its parametrization (the values of b in , b ex , and B ex ) can vary from gene to gene, especially when the translation rate changes. We would then expect a tighter genome-wide distribution in the case of mRNA than in the case of protein expression [17]. The parametrization can even change for a given gene if we are able to tune its expression level. Another drawback here is the fact that at low expression levels (<10 mol) a continuous model given by SDEs might be not accurate enough. Certainly, the transition from a master equation to a Langevin equation is typically justified when the number of molecules is high; yet, the first and second moments analytically calculated from those approaches always coincide when the system is linear ( i.e., based on first-order kinetics) [34]. In addition, more complex analytical treatments might be adopted for noise calculation, beyond mean-field theory; treatments that might lead to more accurate predictions [35]. Overall, the results presented in this work contribute to enlarge our knowledge on gene expression noise, providing a basic framework from which to assess from scratch the impact of extrinsic noise.

ACKNOWLEDGMENT
This work was supported by the Spanish Ministry of Science, Innovation, and Universities grant PGC2018-101410-B-I00 (cofinanced by the ERDF).

APPENDIX A: NONHOMOGENEOUS TRANSCRIPTION
If the gene is expressed through transcriptional bursts (i.e., pulses in the transcriptional activity of the promoter), the system of SDEs that model the dynamics reads (for comparable Poissonian fluctuations in the kinetic parameters) where ψ is a telegraph process [6], which is described by the kinetics k on (transcription activation) and k off (transcription deactivation). In the limit k on k off , its statistics are ψ (t ) = ρ and ψ (0)ψ (t ) = ρ(1 − ρ)e −k off |t| , with ρ = k on k off . The SDE for m then reads By applying the Wiener-Khinchin theorem, and in the limit k off γ m γ p (i.e., shorter period of active transcription than mRNA half-life and faster degradation of mRNA than