Merger rate of black hole binaries from globular clusters: theoretical error bars and comparison to gravitational wave data

Black hole binaries formed dynamically in globular clusters are believed to be one of the main sources of gravitational waves in the Universe. Here, we use our new population synthesis code, cBHBd, to determine the redshift evolution of the merger rate density and masses of black hole binaries formed in globular clusters. We simulate $\sim 20$ million models to explore the parameter space that is relevant to real clusters and over all mass scales. We show that when uncertainties on the initial cluster mass function and half-mass radius relation are properly taken into account, they become the two dominant factors in setting the theoretical error bars on merger rates. Other model parameters (e.g., natal kicks, black hole masses, metallicity) have virtually no effect on the local merger rate density, although they affect the masses of the merging black holes. Modelling the merger rate density as a function of redshift as $R(z)=R_0(1+z)^\lambda$ at $z<2$, and marginalizing over uncertainties, we find: $R_0=7.2^{+21.5}_{-5.5}{\rm Gpc^{-3}yr^{-1}}$ and $\lambda=1.6^{+0.4}_{-0.6}$ ($90\%$ credibility). The rate parameters for binaries that merge inside the clusters are ${R}_{\rm 0,in}=1.6^{+1.9}_{-1.0}{\rm Gpc^{-3}yr^{-1}}$ and $\lambda_{\rm in}=2.3^{+1.3}_{-1.0}$; $\sim 20\%$ of these form as the result of a gravitational-wave capture, implying that eccentric mergers from globular clusters contribute $\lesssim 0.4 \rm Gpc^{-3}yr^{-1}$ to the local rate. A comparison to the merger rate reported by LIGO-Virgo shows that a scenario in which most of the detected black hole mergers are formed in globular clusters is consistent with current constraints, and requires initial cluster half-mass densities $\gtrsim 10^4 M_\odot \rm pc^{-3}$. Interestingly, such higher density models also result in a better match to the inferred black hole mass distribution.

The realistic modeling of the dynamical evolution of BHs in the core of a GC represents a complex computational challenge requiring an enormous dynamical range in both space and time. For this reason, it is only very recently, thanks to major improvements in computational methods and hardware, that it became possible to make robust predictions about the numbers and physical properties of BH binary (BHB) mergers produced in GCs [15][16][17][18][19]. Thanks to these past efforts it is now clear that a large fraction of the sources detected by LIGO-Virgo could have been dynamically assembled in GCs. However, as discussed below, a full characterisation of the model uncertainties related to the BHB merger rate from the GC channel is still missing.
Several recent studies only considered the contribution from clusters that have survived to the present day [e.g., 16,20]. These studies found that the present-day population of GCs produces BHB mergers at a local rate of ≈ 5 Gpc −3 yr −1 . This represents a lower limit to the actual merger rate as there likely existed a population of clusters which did not survive to the present, but that contributed significantly to the local merger rate [21]. In fact, it is believed that the GC mass function (GCMF) today is the result of an initial GCMF that was shaped by dynamical processes [e.g., [22][23][24][25][26][27]. These processes, e.g., relaxation driven evaporation and tidal shocking, are particularly efficient at destroying low-mass clusters.
A key uncertainty in estimating a merger rate from all GCs is that the amount of such disrupted clusters is not known.
Previous estimates for the BHB merger rate ignored the fact that the fractional mass that has been lost from the GC population by the present time, K (see equation 6 below), is very uncertain as it cannot be tightly constrained from the present-day properties of the GC population. We will show that once this uncertainty is taken properly into account, it becomes one of the dominant factors in setting the error bars on local merger rate estimates from the GC channel. For example, both Fragione and Kocsis [21] and Rodriguez and Loeb [28] used a single value for K which was derived under one assumption for the initial GCMF. Although Rodriguez and Loeb [28] considered the effect of different GCMFs on their results, they neglected that K should be related to the choice of initial GCMF. A different initial GCMF not only changes the mass of the GCs which make the BHBs, but it also sets the amount of mass that is lost from the GC system; and it turns out that the BHB merger rate is quite sensitive to both effects.
The properties and merger rate of BHBs depend on several other physical processes, many of which lack strong observational constraints [29]. For example, the distribution of the natal kicks controls the number of BHs that are ejected from the GC upon formation, as well as the fraction of BHs that retain their binary companion after supernova. Different assumptions about the early stages of BH formation will also reflect on the evolution of the host cluster, affecting its total lifetime and final properties. Moreover, merger rates are expected to be sensitive to the assumed density and the related mass-radius relation of GCs at formation which is also unconstrained observationally [30]. The full implications of these uncertainties is still not fully explored. The main reason for this is that standard numerical techniques such as Nbody and Monte Carlo simulations are still too slow to allow a full parameter space exploration. This is why in this study we employ our new population synthesis code clusterBHBdynamics (hereafter cBHBd) [31] to systematically vary assumptions made for the model parameters and over the full range of initial conditions relevant to real GCs. Thus, we examine the effect of these initial assumptions on the number and properties of merging BHBs using a suite of about 20 million cluster models.
In conclusion, the merger rate of BHBs produced dynamically in GCs has been studied by multiple teams [e.g., 13,16,17,32]. Here we build on former studies in two ways which allow us to place error bars on theoretical estimates for the BHB merger rate density and on its redshift evolution: (i) we constrain the fractional mass that has been lost from GCs over cosmic time by fitting an evolved Schechter mass function to the observed GCMF in the Milky Way today, and using a simple model for cluster evaporation. (ii) we employ our new population synthesis code cBHBd to explore how the BHB merger rate depends on uncertain parameters in the models (e.g., initial cluster densities, BH formation recipes, natal kicks), and exploring the parameter space that is relevant to real GCs and over all mass scales.
The paper is organized as follows. In Section II we compute the GC formation rate density as a function of time using constraints from the present-day GCMF. In Section III we describe our population synthesis model and detailed the modifications we made to it with respect to the version used in [31]. Section IV describes our main results. We discuss the implications of our results and conclude in Section V.

II. CLUSTER FORMATION RATE
In order to compute a BHB merger rate we need the cluster formation rate density (i.e. per unit of volume) as a function of time:ρ GC (t). We do this by imposing that in our model: (i) the present-day GC mass density in the Universe, ρ GC , is consistent with its empirically inferred value and (ii) the present-day GCMF is consistent with the observed mass function of the Milky Way GCs.

A. Globular clusters density in the Universe
To derive ρ GC we use the same approach as [15], who use the empirically established relation between the total mass of a GC population (M GCs ) and the dark matter halo mass of the host galaxy (M h ). The ratio of these two quantities is remarkable constant over a large range of halo masses (10 10 M h /M 10 15 ) and for different galaxy types: η ≡ M GCs /M h (3 − 7) × 10 −5 [33][34][35][36][37]. We can also estimate this ratio for the Milky Way: the total luminosity of all Milky Way GCs from the Harris catalogue [38,39] is 1.75 × 10 7 L V, . Adopting a massto-light ratio in the V -band of Υ V = 2 M /L V, and a virial mass of the Milky Way of M h = 1.3 × 10 12 M [40] we find η = 2.7 × 10 −5 for our Galaxy. Table 1 in [36] summarises 8 results from different studies. We use the 7 studies that include at least 25 galaxies and combine this with the result of η = 2.9 × 10 −5 by [37] that was published after this summary. We also add the Milky Way estimate from above to find a mean value of We determine the dark matter halo mass function from simulations of large scale structure formation by [41] using the HMFcalc tool [42]. The total mass density in dark matter halos with individual masses M h ≥ 10 10 M /h is ρ DM = 3.64 × 10 19 h 2 M Gpc −3 . Combined with our value for η from equation (1) The relation between M GCs and M h may hold down to dwarf galaxy masses of M h 10 9 M [44], and including these low-mass galaxies would increase ρ DM and ρ GC by about 15%, but because of the uncertain GC occupation fraction below M h 10 10 M , we continue with the result of equation (2) 1 .

B. Globular cluster mass function
For our population synthesis model of the next section, we need the initial mass density of GCs in the Universe (ρ GC0 ). To find the relation between ρ GC0 and ρ GC from the previous section, we adopt a simple model for the mass evolution of GCs. We assume that the initial GCMF is a Schechter-type function [45], i.e. a power-law with index −2 at low masses with an exponential highmass truncation at M * , as is found for young massive clusters in the Local Universe [46]. We assume that this initial GCMF is universal throughout the Universe and across cosmic time. There are arguments for a flatter initial GCMF (i.e. fewer low-mass GCs) in dwarf galaxies at high redshift and low metallicity [47], but we proceed with the assumption of a Universal initial GCMF. We will discuss the effect of flatter initial GCMFs on the BHB merger rate in section V.
To find an expression for the GCMF today, resulting from the initial GCMF we follow a similar approach as [24,27] and assume that all GCs have lost an amount of mass ∆ = |Ṁ |t, whereṀ is the mass loss rate from escaping stars and t is the age of the GCs. We do not specify the escape mechanism and let ∆ be constrained by the Milky Way GCMF. Details of the various processes can be found in literature: relaxation driven evaporation [48]; disc and bulge shocks [49,50]; interactions with molecular gas clouds (at young ages) [51][52][53] and combinations of the various effects [22,24,54,55]. From here on we refer to the mechanism responsible forṀ , regardless of what the underlying physical process may be, as 'evaporation'.
We then assume that ∆ is a constant, i.e. independent of GC mass, host galaxy, orbit and formation epoch. This is clearly not realistic, becauseṀ depends on the (timedependent) tidal field and the GC orbit within their galaxy [48]. However, this exercise is merely meant to arrive at an order of magnitude estimate of how much mass GCs lose between formation and now, rather than developing a realistic description of GC evolution. The present-day GCMF, ψ, defined as the number of GCs per unit volume (n GC ) in the mass range [M, M + dM ], is given by the 'evolved Schechter function' [27] At low masses, where the GCMF is affected by mass loss (M ∆ M * ), this function approaches a constant ψ A/∆ 2 . In fact, any initial GCMF evolves towards a uniform ψ at low masses ifṀ is constant [56]. The GCMF is often plotted as the number of GCs in logarithmic mass bins (∝ dN/d log M ), which increases linearly with M at low masses and peaks at M peak ∆ (for ∆ M * ). The simple functional form of equation (3) provides a good description for the Milky Way GCMF and the luminosity function of GCs in external galaxies [27]. The constant of proportionality A is found from the constraint that all GCs must add up to the present-day GC mass density in the Universe: ∞ M lo ψM dM = ρ GC , with ρ GC from equation (2) and M lo = 100 M .
The GC evolution model we use in the next section also considers mass loss by stellar evolution, which mostly happens in the first few 100 Myr. The fraction of mass that clusters lose as a result of stellar evolution is around 40 − 50%, depending on metallicity, the stellar initial mass function, stellar evolution details and on whether BHs are ejected, or not. In the cluster evolution code we use in the next section, the (approximate) stellar evolution prescription is independent of metallicity and when excluding mass loss by BH ejections a cluster loses approximately half of its initial mass by stellar evolution. We therefore assume that clusters lose half their mass by stellar evolution alone. Next, we assume that stellar evolution and escape affect the GCMF sequentially (i.e. first stellar mass loss and then escape). We can then write M 0 = 2(M + ∆) and we can find the initial GCMF from the continuity equation [24] Because , the initial GCMF that corresponds to the present-day GCMF of equation (3) is given by We note that the Schechter mass of the initial GCMF is 2M * , where M * is derived from the present-day GCMF. We then introduce a factor K for the ratio ρ GC0 over ρ GC , i.e.
In the next section we include the contribution from clusters of all masses, and it is therefore important to understand the exact value of K, or better the distribution of K. To find a posterior distribution for K, we fit the evolved Schechter functions from equation (3) to the Milky Way GCs and then derive K using equations (5) and (6). We use the V -band luminosities of the 156 GCs in the 2010 edition of the Harris catalogue [38,39] and then assume a constant mass-to-light ratio of Υ V = 2 M /L V, to convert luminosities to masses. A histogram of the resulting mass function is shown in Fig. 1. The Milky Way values are binned in bins with 15 GCs each, with the highest mass bin containing 6 GCs. The black dots are the average masses of the GCs in each bin, while the horizontal error bars show the bin range and the vertical error bars show the Poisson errors. The GCMF is plotted as dN/d log M ∝ M ψ, hence the uniform mass function at low-masses is a linear increase in this log-log plot.
We then use the normalised evolved Schechter function of equation (3) as a likelihood function to find ∆ and M * . We use the Markov Chain Monte Carlo (MCMC) code emcee [57] to maximise the log-likelihood and vary log ∆ and log M * , assuming flat priors in the range 3 ≤ log(∆/M ) ≤ 7 and 3 ≤ log(M * /M ) ≤ 7. In Fig. 1 we show the result. For 10 4 walker positions of the converged chains we compute the GCMF (equation 3) and the initial GCMF (equation 5) from ∆ and M * and at each mass we determine the [16%, 50%, 84%] percentiles of the initial and present-day GCMF. The full-blue and dashed-green lines show the 50% (i.e. median) values for the GCMF and initial GCMF, respectively, while the shaded regions indicate the [16%, 84%] intervals. The fit results are similar to what was found by [27] for the 1996 Harris catalogue: log ∆ = 5.4 ± 0.1 and M * = 5.9 ± 0.1. Note that Jordán et al. did not consider stellar evolution mass loss, so their initial GCMF was truncated at M * , while ours is truncated at 2M * .
We also compute K with equation (6) for these 10 4 walker positions and find K = 32.5 +29.8 −12.5 . The spread in K provides an estimate of the uncertainty in K, given the 156 Milky Way GC masses. The merger rate will not increase by the same factor of K. This is firstly because half of the value of K is due to stellar mass loss. The decrease in GC population mass by evaporation is K ∆ = K/2 = 16.3 +14.9 −6.25 . Using the approximation for the number of mergers in the observable redshift range from [31], , we can estimate the fractional increase in the merger rate (K merge ) as a function of K ∆ . The relation between K merge and K ∆ depends on the adopted mass-radius relation. If we parameterise this as r h0 ∝ M µ 0 , then we find that for µ = 0 (i.e. a constant initial radius) that then K merge = 3.5 +3.8 −1.1 ; for µ = 1/3 (i.e. a constant initial half-mass density) we find K merge = 5.0 +5.9 −1.7 and for µ = 0.6 (i.e. a Faber-Jacksonlike relation, [30,58]) we find K merge = 7.8 +9.8 −2.8 . The reason that K merge increases with µ, is because for large µ, the low-mass clusters are denser and produce more BHB mergers. Because we do not know the initial massradius relation, the values of K merge is thus in the range 2.4 − 17.6, corresponding the the range in K ∆ mentioned above: 10.1 − 31.2.
Previous studies adopted a constant K = 2.6 to account for evaporation [28,59] and then assumed that K merge = K. This value for K merge is on the lower boundary of our estimated range for for K ∆ for clusters with a constant radius, corresponding to a factor of ∼ 2 below the lower boundary of the distribution of K. For our upper boundary of K ∆ for µ = 0.6 this value of K merge is a factor of 6.8(13.4) lower than our K ∆ (K).

C. The GC formation rate
Next, we compute the cluster formation rateρ GC ≡ ρ GC (τ ), where τ is lookback time, for a set of model assumptions. We do this in terms of a normalised GC formation rate R ≡ R(τ ), such that 0 ∞ Rdτ = 1. In the next section we will derive R from a model for GC The resulting initial GCMF, corrected for mass loss by stellar evolution (factor of 2) and evaporation (∆), is shown as a green dashed line. The shaded regions and uncertainties of quoted values indicate the 16%-84% interval. The inferred K value implies that the total mass of the Milky Way GC population was 32.5 times higher initially. Half of this mass reduction is because of stellar mass loss and the remaining factor of 16 is due to evaporation.
formation across cosmic time. For a given present-day ρ GC (equation 2), and no mass loss, the GC formation is then found fromρ GC = ρ GC R. We now show howρ GC can be easily derived for a population of GCs with a given present-day mass function that have lost mass by stellar evolution and/or escape. Thus, after we determine ψ 0 , we can findρ GC from imposinġ The cluster mass formed per unit volume integrated over all times is The large error bars are because of the uncertainty in K and η , and imply that ρ GC0 is uncertain by a factor of 2. In the next section we include this uncertainty in the predictions for the merger rate.

III. METHODOLOGY
The evolution of the BHBs in our cluster models is computed using the fast code cBHBd. While the details of this method are described in Antonini and Gieles [31], here we give a brief summary of the model philosophy, including the full set of differential equations that are used to compute the secular evolution of the cluster models and the merging BHBs they produce.

A. clusterBH
We assume that the cluster consists of two types of members: BHs and all the other members (i.e. other stellar remnants and stars). Each contribute a total mass of M BH and M , respectively, such that the total cluster mass is M = M + M BH . We assume that after several relaxation time-scales the cluster reaches a state of balanced evolution [60,61], so that the heat generated in the core by the BHBs and the evolution of the cluster global properties are related as [56,61,62]: where E −0.2GM 2 /r h is the total energy of the cluster, with M the total cluster mass and r h the half-mass radius. The constant ζ 0.1 [62], and t rh is the average relaxation time-scale within r h which is given by [e.g., 63] t rh = 0.138 Here m all is the mean mass of the stars (initially m all = 0.638 M ), and ln Λ is the Coulomb logarithm, which varies slowly with N , but we fix it to ln Λ = 10. The quantity δ depends on the mass spectrum within r h , for which we adopt the following form: where f BH = M BH /M is the fraction of mass in BHs to the total cluster mass and a 1 is a constant that was determined from a comparison to N -body models (see below). We define the start of the balanced evolution as Under the above assumptions, the set of coupled ordinary differential equations given below are integrated forward in time to obtain solutions for M BH (t), M(t) and r h (t).
The mass loss rate of BHs is coupled to the energy generation rate, which itself is coupled to the total E and t rh of the cluster (equation 10), such that [61] The cluster mass loss due to stellar evolution iṡ with t sev 2 Myr. We include here an additional (mass independent) term which was not present in Antonini and Gieles [31], and that accounts for cluster evaporatioṅ with t 10 Gyr the averaged cluster formation time. The total mass loss rate of the cluster is theṅ In balanced evolution, the expansion rate of the cluster radius as the result of relaxation iṡ Before balanced evolution, the cluster radius expands adiabatically as the result of stellar mass loss at a ratė The final expression for the half-mass radius evolution iṡ The remaining parameters were obtained in Antonini and Gieles [31] by fitting the results of N -body simulations: N rh = 3.21, β = 2.80 × 10 −3 , ν = 8.23 × 10 −2 and a 1 = 1.47 × 10 2 .

B. BHBdynamics
The initial contraction of the cluster core due to twobody relaxation leads to high central densities of BHs which favor the formation of binaries through three-body processes. The energy produced by the BHBs reverts the contraction process of the core and powers the subsequent expansion of the cluster as described by equation (18). We can then relate the BHB hardening rate to the rate of energy generationĖ whereĖ bin is the hardening rate of all core binaries. Equation (21) allows us to couple in a simple way the evolution of the BHBs to the evolution of the cluster model. It is important to stress that the hardening rate equation (21) depends neither on the number of binaries present in the cluster core nor on the exact mechanism leading to their formation. In order to compute a merger rate and the binary properties from equation (21), we need to further specify the dynamical processes that lead to the hardening and merger of the binaries. We are interested in mergers that occur through (strong) binary-single dynamical encounters in the cluster core [e.g., 64,65]. Thus we consider: (i) mergers that occur in between binary-single encounters while the binary is still bound to its parent cluster (in-cluster inspirals); (ii) mergers that occur during a binary-single (resonant) encounter as two BHs are driven to a short separation such that gravitational wave (GW) radiation will lead to their merger (GW captures); and (iii) mergers that occur after the binary is ejected from its parent cluster. We use BHBdynamics to determine the rate and masses of the BH binary mergers produced by these three dynamical channels.
In balanced evolution, the binary formation rate is equal to the binary ejection rate and it is given therefore by the BH mass ejection rate equation (14) divided by the total mass ejected by each binary where m ej was computed using equation (38) in [31], and it is approximately a fixed number m ej 6m. The number of BHBs that merge after a time t from the formation of the cluster is [31] where P ex (t − t ej ) is the probability that a binary ejected dynamically from the cluster at a time t ej will merge due to GW emission within a time t from the formation of the cluster, and P in is the probability that a binary merges inside the cluster. These probabilities are computed by assuming that the eccentricity of the BHBs follows that of a so-called thermal distribution N (e) ∝ e [e.g., 66], and their full expressions can be found in Antonini and Gieles [31]. Moreover, when evaluating P ex and P in we set E bin = −Gm 1 m 2 /2a, with a the binary semi-major axis and m 1 and m 2 the mass of the BH components, i.e., we have assumed that only one BHB is responsible for all the heating at any given time. However, because the dependence is weak, e.g. P in ∝Ė −2/7 bin [31], and the number of hard binaries is expected to be of order unity in the type of clusters we consider [67], this simplification is reasonable.

C. Black hole mass function and natal kicks
In order to calculate the merger rate through equation (23) we need a physically motivated model for the BH mass function and its time evolution.
We sample the stellar progenitor masses from the initial mass-function φ (m ) ∝ m −2.3 [68] between m ,lo and 100M , with m ,lo 20M the stellar mass above which a BH forms. The resulting masses of the BHs are then obtained using the fast stellar evolution code SSE [69] which we modified to include up to date prescriptions for stellar wind driven mass loss [70], compactobject formation and supernova kicks [71,72], and we also include prescriptions to account for pulsational-pair instabilities and pair-instability supernovae [73]. The initial mass fraction in BHs is set equal to the total mass in BHs divided by the total mass in stars between 0.1 and m ,lo for a Kroupa [68] initial mass function, and ranges from f bh 0.04 to 0.07 depending on the metallicity and the adopted prescription for compact-object formation. These mass fractions first increase as the result of stellar evolution mass loss and then they reduce due to the ejection of BHs caused by natal kicks. The BH natal kicks are computed using a standard fallback model in which the BHs receive a kick drawn from a Maxwellian distribution with dispersion σ = 265 km s −1 [74], lowered by the fraction of the ejected supernova mass that falls back into the compact-object. The fallback fraction and remnant masses are determined according to the chosen remnantmass prescription. We adopt here the rapid supernova prescription described in Section 4 of Fryer et al. [72], in which the explosion is assumed to occur within the first 250ms after bounce. But later in Section IV B we also explore other choices for the compact-object formation recipe.
The cluster processes its BH population such that the mass of the merging BHs progressively decreases with time because the most massive BHs are the first to reach the cluster core, form hard binaries and merge [e.g., 75]. Simulations of dense star clusters also show that the merging BHs have a distribution of mass-ratio that is strongly peaked around one [e.g., 17]. Thus, we assume that the BHs taking part to the dynamical interactions have the same mass, and that at any given time this mass is equal to that of the largest BH in the cluster, i.e., m 1 = m 2 = m 3 = m max . The value of m max at a given time, can be easily linked to the time evolution of the total mass in BHs given by clusterBH. For a generic BH mass function φ • , we use the fact that where the integral in the left hand side is simply the cumulative distribution of BH masses computed with SSE. We then invert numerically this relation to find m max (M BH ).

D. Cluster formation and initial properties
We sample the initial cluster masses from the Schechter mass function equation (5), i.e., we assume that both evaporation and mass loss are important. The values [∆, M * ] needed to compute A are sampled from their posterior distributions derived in Section II B.
An important property of balanced evolution is that the value of a cluster half-mass radius today is largely independent of its initial value [30,56]. It is therefore not possible to infer the initial density of GCs from their properties today. We therefore consider three choices for the initial stellar mass density within the half-mass radius, ρ 0 = 3M 0 /8πr 3 h,0 , which we set equal to ρ 0 = 10 3 , 10 4 or 10 5 M pc −3 .
For the cluster metallicity, we fit a quadratic polynomial to the observed age-metallicity relation for the Milky Way GCs [76], to obtain the mean metallicity (25) Given the cluster age, t, we then assume a log-normal distribution of metallicity around the mean, with standard deviation σ = 0.4 dex. This takes into account the large spread found in the observed age-metallicity relation. In order to determine the effect of metallicity on our results we will later consider additional models where the cluster metallicity is set to a fixed value.
We obtain the distribution of cluster formation times from the semi-analytical galaxy formation model of El-Badry et al. [77]. The same model has also been used in recent work [e.g., 28,78], allowing a direct comparison of our results to literature. El-Badry et al. [77] describe the process of GC formation as resulting from star formation activity in the high-density disks of gas-rich galaxies. Motivated by the results of simulations of molecular cloud collapse, they assume that massive bound clusters form preferentially when the gas surface density exceeds a certain threshold. Applying this Ansatz to a semianalytic gas model built on dark matter merger trees, they make predictions for the cosmological formation rate of GCs. We sample the formation redshift of our cluster models from the total cosmic cluster formation rate given by the fiducial model of El-Badry et al. [77] and integrated over all halo masses. This corresponds to the formation rate per comoving volume of their Figure 8 with their parameters β Γ = 1 and β η = 1/3, where β Γ sets the dependence of the cluster formation efficiency on surface density, and β η the dependence of the star formation rate on the halo virial mass. We then normalize the GC formation to unit total number to obtain R (equation 7). Thus, we only sample the cluster formation redshift from the El-Badry et al. [77] model, while the total cluster formation rate is given by our equation (7). Later, in order to determine the importance of our assumption about the cluster formation history, we will consider another class of models with different values for β Γ and β η .
Given the initial cluster mass, radius, metallicity, and formation time, the BHB merger rate,Ṅ , is obtained from cBHBd.

IV. BINARY BLACK HOLE MERGER RATE
Our equation for the merger rate density of BHBs at a lookback time τ is whereṄ (t; M 0 , r h0 , Z) is the BHB merger rate corresponding to a cluster with an initial mass M 0 , half-mass radius r h0 and metallicity Z at a time t after its formation; R(t) is the normalized cluster formation rate; s(Z) is the normalized formation rate of clusters with a metallicity Z at a time t, s(Z; t)dZ = 1, which can be calculated given a model for the time evolution of metallicity, e.g., equation (25).
Because our results turn out to be more sensitive to the cluster initial density than to other parameters, we first focus on Mod1, Mod2 and Mod3 in Table I. In these models we vary ρ 0 in a range that is relevant to real globular clusters, while keeping fixed all the other parameters as given in the table. This allows us to bracket a plausible range of values for the local merger rate density and its redshift evolution.
For each model assumption we sample 100 values over the posterior distributions of the parameters M * and ∆ obtained from the MCMC fit to the observed Milky Way GCMF. Then, for each pair [M * , ∆], we evolve 50,000 cluster models with initial properties sampled from the distributions given in Table I. The normalisation constant, A, of the GCMF is obtained from the relation ∞ M lo ψM dM = ρ GC , where we assume that the parameter η follows a Gaussian distribution with mean 4.4 × 10 −5 and σ = 1.6 × 10 −5 . We sample 1000 values from this latter distribution and for each of them we use equation (26) to determine a merger rate estimate for each of the [M * , ∆] values, and thus obtain a distribution of merger rate density values. In Fig. 2 we plot the median of the merger rate distribution and credible intervals as a function of redshift, and the primary BH mass distribution of binaries merging at redshifts z < 1 as well as the initial mass distribution of the clusters where these binaries originated. Fig. 2 shows that the difference between our upper and lower bounds on the comoving BH merger rate density is about a factor ∼ 10 for any density assumption. This is due to the fact that R ∝ K at a very good approximation, and, as we discussed above, tight constraints on K cannot be placed from the present day Milky Way GCMF. Moreover, rates are not too sensitive to the initial cluster density -two orders of magnitude difference in the initial density leads to a factor of ∼ 5 variation in the local value of R. From this we conclude that the initial density uncertainty is as important as the unknown initial GCMF.
For each initial GCMF, corresponding to new values of [M * , ∆] and η , we fit the redshift distribution of the   merger rate density at z < 2 using to derive a distribution of values for the parameters R 0 and λ for each of our three density assumptions. In this analysis we neglect the uncertainties associated to each fit because their standard deviations are much smaller than the variation in the inferred parameters across the different models. The initial parameters for each of the three densities and the corresponding median values and uncertainties (5 and 95 percentiles) of R 0 and λ are given in Table I  The middle and lower panels show the differential local merger rate as a function of the the primary BH mass and initial cluster mass, respectively. We compare our results to the median (solid), and the 90% credible intervals (hatched regions) inferred from the O1 and O2 catalogs in LIGO Scientific Collaboration and Virgo Collaboration [5]. In the middle panel, we have used their 'Model B' and the thick-red line gives the BH initial mass function (in arbitrary units). The metallicity model used is given by equation (25), and we have adopted the rapid supernova mechanism from Fryer et al. [72]. Vertical dotted line corresponds to the LIGO Voyager upgrade horizon for (10 + 10) M BHBs [79]. The observable horizons of the Einstein telescope [80] and the Cosmic Explorer [79] extend to the very early Universe and are both to the right of the x-axis range in the figure.
the range R 0 1 to 50 Gpc −3 yr −1 . A comparison to the local merger rate inferred from the GW detections, 58 +58.5 −28.8 Gpc −3 yr −1 [5], shows that BHBs formed dynamically in GCs are likely to explain a significant fraction of the BHB mergers detected by LIGO-Virgo. Note, however, that if GCs are formed with high densities, ∼ 10 5 M pc −3 , then our merger rate estimates are con- sistent with the LIGO-Virgo merger rates within uncertainties. Combining the three models together, i.e., assuming a universe in which one third of the clusters form as in Mod1, one third as in Mod2 and the remaining as in Mod3, we find where uncertainties refer to the 90% credible intervals. The middle panel of Fig. 2 shows the primary BH mass distribution normalized to the BH merger rate density in the local Universe (z < 1). Our models do a reasonable job at reproducing the shape but not the normalization of the (although still very uncertain) BH mass function inferred from the O1 and O2 catalogs [5]. The BH mass distribution appears to be sensitive to the initial cluster density, in the sense that higher densities lead to a higher fraction of lower mass BHs among the merging population. It is interesting to note, that the higher density model provides a better match to the inferred BH mass function and the rates. The additional low-mass BHs in high density GCs is due to the higher retention fraction of lower mass BHs in higher density models after a natal kick due to the high escape velocities from these clusters, and to the fact that denser clusters process their BH populations faster, thereby 'eating' away their BH mass function more.
Given the number of complex features that can be seen in the BH mass distributions we do not attempt here a parametrization over the full range of BH masses. Moreover, as we will show later in Section IV B, these distributions are sensitive to the uncertain prescriptions for BH formation, natal kicks and metallicity. We instead consider the mass range 13 M to 30 M where a simple power law model, dR/dm ∝ m α , does a reasonable job. In this mass range we find α = 0.1 +0.9 , and the uncertainties refer to the 5 and 95 percentiles. As before we neglect the uncertainties associated to each fit because their standard errors are much smaller than the variation of α across the different models. By combining the three density models together, we find Negative values of α are preferred, though positive values are also acceptable. For comparison, the BH initial mass function integrated over all metallicities is shown in the middle panel of Fig. 2. Within the same BH mass range, m = 13 M to 30 M , the best fit power law model to the initial mass function had a spectral index α ≈ −1.8. The most striking feature, however, is that all mass distributions in Fig. 2 are strongly depleted at m 15 M the fraction of BHs below this mass decreased by more than one order of magnitude with respect to the BH initial mass function. Due to this, all our models underpredict the number of BHBs at m ∼ 10M compared to the mass distribution inferred from the LIGO-Virgo detections. Moreover, we note some features that are common to the three models considered here. All three models show peaks at m 13 M , 20 M and 30M . Above m = 30 M the BH mass distribution starts to decline rapidly with mass until a break at m 38 M above which the decline becomes much steeper. All models show essentially no BHs with mass above 40 M or below 5M . The suppression at 40M is a consequence of stellar mass loss prior to the formation of the BHs and it is present even in models that do not include any prescription for pair instability [83]. Note that we do not consider hierarchical mergers, which are expected to represent a small number of the total mergers produced in GCs [84]. Thus, while including hierarchical mergers will somewhat increases the number of BHs above 40 M , it will not significantly affect the BH mass distributions we derived.
The bottom panel of Fig. 2 shows the differential contribution of clusters with different masses to the local merger rate (z < 1). This contribution increases with cluster mass until about 10 6 M , above which the rate starts to rapidly decrease because of the exponential truncation of the initial GCMF above M * . The contribution of clusters with masses larger than 10 7 M or smaller than 10 3 M is negligible. In our models, clusters that have a mass M 0 < 4 × 10 5 M are fully disrupted by the present time. These clusters have been neglected in some previous work [e.g., 15,17,20,75], but we find that they contribute a significant fraction of the local merger rate: ≈ 0.33, 0.48 and 0.30 for Mod1, Mod2 and Mod3 respectively.

A.
In-cluster vs. ejected binaries The merger of a BHB in our models can occur either after the binary has been ejected dynamically from the cluster, or within the cluster itself. In-cluster mergers are relevant because they can lead to the formation of BHs with mass above the pair-instability gap at ≈ 50 M [84][85][86], and the observational implications of this have been discussed in a number of previous papers, e.g. [87][88][89]. Among all in-cluster mergers, GW captures are also particularly important because a fraction of them are expected to have a finite eccentricity at the moment they first chirp within the the LIGO frequency band above 10 Hz [85,90,91]. Thus, they could be identified among other binaries due to their unique eccentric signature.
In Fig. 4 we show separately the rate evolution of BH mergers occurring among the ejected binaries, those forming inside the cluster and GW captures, as well as the mass distributions of mergers at z < 1 and the mass distribution of their parent clusters. While in-cluster mergers dominate the rate density at early times, z 2, most of the BHB mergers in the local Universe are produced among the ejected population. The local rate of in-cluster mergers is 2 Gpc −3 yr −1 and that of GW captures is 0.4 Gpc −3 yr −1 , with little dependence on the initial density assumed. However, the fractional contribution of in-cluster mergers does depend quite strongly on the cluster initial conditions, in the sense that higher densities lead to lower fractions -for ρ 0 = 10 3 M pc −3 , nearly 40% of the local mergers are formed in-cluster, while for the highest initial densities, ρ 0 = 10 5 M pc −3 , in-cluster mergers only contribute 15% of the total. The percentage of all in-cluster mergers that occur through a GW capture is ≈ 20% in the local Universe and increases smoothly with redshift, reaching ≈ 30% near the peak of cluster formation activity. Because an order of unity fraction of gravitational wave captures are expected to have a finite eccentricity ( 0.1) above 10Hz frequency, we conclude that eccentric mergers from globular clusters contribute 0.4 Gpc −3 yr −1 to the merger rate in the local universe. This low rate is consistent with the non detection of eccentric binaries in current searches [92].
Our models Mod1 and Mod2 show a decrement for the local fraction of in-cluster mergers over previous estimates which bracketed this between 30% and 50% of the total rate [e.g., 28]. This difference arises from the fact that this previous work only considered clusters with mass 10 5 M for which about half of the overall merger rate is due to in-cluster binaries. However, as shown in Fig. 4, lower mass clusters contribute significantly to the local rate, although the mergers they produce only occur among the ejected population. The reason for this is that their BHs have all been ejected by z = 1. Thus, including these systems leads to an overall reduction of the contribution of in-cluster mergers and also affects the redshift evolution of the merger rate density. Fig. 4 shows the distributions of R 0 and λ obtained by fitting equation (27) to the merger rate evolution of incluster and ejected binaries separately. The merger rate of in-cluster binaries evolves steeply with redshift, λ ≈ 3, albeit with a large scatter, while for ejected binaries the dependence is much weaker, λ ≈ 1. If, as before, we assign equal probability to each of our density assumptions and fit the total merger rate density of in-cluster binaries using equation (27) we find while for ejected binaries We now use a simplified analytical model to gain some physical insights on this result. From equation (23), the merger rate for in-cluster binaries is while for ejected binaries we have The merger probabilities that enter in the integral equations above can be linked to the evolution of the cluster properties in a simple way under some simplifying assumptions. If we neglect cluster mass loss and that the BHs have a range of masses -both have little effect on the merger rate evolution (see Secion IV B)we can write t rh (t) = t rh,0 1 + 3 2 ζt/t rh,0 , and ρ(t) = ρ 0 1 + 3 2 ζt/t rh,0 −2 [86]. Using that the merger probabilities scale as P in ∝ ρ(t) 10/21 (neglecting captures) and P ej ∝ (t − t ej ) 2/7 ρ(t) 8/21 [31], we can then determine the redshift dependence of the merger rate through equations (32) and (33). At times t t rh,0 /ζ, for in-cluster binaries we have where we used that t(z) ∝ (1 + z) −3/2 in order to convert time into redshift. For ejected binaries we have dN Although we have neglected some important ingredients (e.g., mass loss, BH mass function), the expected value of λ for the two populations is consistent with the ones found above and, as expected, it is much steeper for incluster mergers. This fits in the view that the rate at which the merging BHBs are produced by a cluster is controlled by the relaxation process within the cluster itself, providing a physical interpretation to our results. Another result of our analysis is that the local rate of in-cluster inspirals and GW captures are nearly independent of the initial density assumed. In general, we find that other model variations also have little effect on the merger rate of in-cluster binaries. This result is a consequence of the cluster expansion during balanced evolution.
Due to the expansion powered by the BHBs, all clusters evolve asymptotically to (approximately) approach the same value of half-mass relaxation time. Hence, after some time, the merger rate of in-cluster binaries must also become approximately the same for all clusters. This concept is illustrated in Fig. 6 where we show the evolution of a set of cluster models with the same mass but different ρ 0 . For M 0 = 3 × 10 6 M (left panels), the BHB merger rate at 10 Gyr only varies by a factor of ∼ 2 between the models, although these were started with widely different densities. The middle panel gives the cluster half-mass relaxation time, which also tends to evolve to the same value for all models. This roughly recovers Hénon's result that t/t rh increases until t rh ∼ t rh,0 /ζ, after which t rh ∝ t [30].
In the right panel of Fig. 6 we consider the evolution of clusters with an initial lower mass M 0 = 3 × 10 5 M . In these models all BHs are ejected at t 5 Gyr. Thus, their in-cluster binaries do not contribute to the merging population at late times. This explains why in the population FIG. 5. Distribution of the rate parameter λ, and the local merger rate, R0, for each of the three models of Fig. 2, and for the in-cluster mergers and ejected binary mergers separately. Colors are as in Fig.2.   FIG. 6. Example of the BHB merger rate evolution for two cluster masses, and separately for in-cluster and ejected binaries. The middle panels gives the cluster relaxation time, and the lower panels the total BH mass in units of the initial value. Different lines correspond to different initial half-mass density, ρ0 = 0.03, 0.1, 0.3 and 1 × 10 5 M pc −3 . The initial density increases with line thickness. Here we set ∆ = 3×10 5 M and the simulations are terminated either after 13 Gyr of evolution or after all BHs have been ejected. models above the local merger rate of in-cluster binaries is widely dominated by high mass systems, M 0 10 6 M (e.g., lower panel of Fig. 4). Moreover, the binary merger rate near the end of the simulations shows a larger variation among different models than in the high mass cluster case. This simply reflects the large difference in the cluster density at early times when these binaries were formed and ejected.

B. Dependence on model parameters
In the previous section we consider the merger rate density evolution for three choices of initial cluster density. Here we discuss the results for a larger set of models in which for each of the three density assumptions we vary the prescription for the cluster metallicity, the cosmological cluster formation model, the BH formation mechanism, the BH natal kicks, and the cluster mass loss rate. All the model parameters we considered are listed in Table I (Mod4 to Mod39), together with the corresponding values of merger rate evolution parameters and uncertainties. We stress that not all the models analysed here are realistic representations of a globular cluster population. They are nevertheless useful in order to understand the impact of different model parameters on the merger rate evolution and BH mass distribution. Our main message here is that variations in other model assumptions have little effect on the local value of the BHB merger rate density and its redshift evolution. Thus, we conclude this section by presenting the results from an additional set of models where ρ 0 is varied over a wider range of values than in Section IV to more systematically explore its effect on the BHB merger rate.
Metallicity. In order to explore the dependence of the merger rate on metallicity, we consider models where the clusters all have the same metallicity which we set to Z = 0.01, 0.1 or 1 × Z . Since mass loss due to stellar winds is less effective in metal-poor stars, the forming merger remnant mass increases with decreasing metallicity. At Solar metallicity, the mass distribution of the final merger products spans from a few solar masses up to about 30 M and peaks near 10 M . At lower metallicity, Z = 0.01 and 0.1, the distribution of remnant masses is much wider with its maximum at ∼ 50M . This has an obvious effect on the mass distribution of the merging BHBs as can be seen in Fig. 7, 8 and 9. The value of metallicity affects also what type of clusters make the BH mergers in the local universe, with their mass distribution being skewed towards higher values for Solar metallicities. The important result here, however, is that the evolution of the merger rate density is largely unaffected by the choice of metallicity and its dependence on cluster age. Even in the unrealistic case in which all clusters are formed at Solar metallicity, the merger rate density only starts to deviate significantly from the other models at z > 2. Such lower merger rate at early times is expected and it is a consequence of the longer t rh due  Table I where the initial half-mass density is set to 10 4 M pc −3 . Middle panels give the distribution of primary BH masses for mergers at z < 1. The lower panels show the mass distribution of clusters where these merging binaries were formed.
to the lower initial BH mass fraction.
We conclude that a detailed knowledge of the metallicity distribution of GCs and its dependence on time is not necessary in order to determine a BHB merger rate, although it has an important effect on their mass distribution.
Cluster ages. We implemented two additional choices for the parameters in the cosmological model of El-Badry et al. [77] which determine the distribution of cluster ages: [β Γ = 1, β η = 1/6], and [β Γ = 0, β η = 1/3]. These two models are shown in Fig. 8 of El-Badry et al. [77]. Moreover, we consider an additional case of a burst-like cluster formation history in which all clusters are formed at z = 3.
From Fig. 7, 8, and 9 we can see that, for a given initial density, our results at z 2 are also independent of the exact distribution of cluster formation times. Within this redshift, even the oversimplified case in which all clusters form at z = 3 leads to a merger rate density and BH mass distribution that are consistent with those obtained from the full cosmological models. At z > 2, however, the red-shift evolution of the merger rate is clearly affected with its peak coinciding with the peak of cluster formation activity in each model. BH formation. We consider three more recipes to computing the BH mass distribution based on different corecollapse/supernova models. We use the delayed model in which the supernova explosion is allowed to occur over a much longer timescale than in the previously employed rapid model [72]. We then use the compact-object mass prescriptions from [81] and [82]. These two latter models use slightly different recipes for the proto-compact object masses while adopting the same formulae to determine the amount of fallback material. We note that the effect of the BH formation recipe is two folds as it influences both the mass distribution of the BHs as well as their natal kicks. Apart from the effect on the BH mass function, however, there is very little change of the merger rate evolution among the various prescriptions, with the delayed model leading to a slightly lower merger rate at all redshifts than the others.
Natal kicks. Two additional assumptions about the  Table I. BH natal kicks are explored. In one the BHs are formed with no kick, and in the other the BHs receive the same momentum kick as neutron stars, meaning that their kick velocities are drawn from a Maxwellian distribution with dispersion σ = 265km s −1 [74] and then reduced by the neutron star to BH mass ratio, 1.4M /m. Among the model variations considered in this section, the BH natal kick prescription has the largest (but still mild) impact on our results. The zero kick and the momentum kick prescriptions lead, respectively, to a larger and smaller retention fraction of BHs compared to the fallback prescription [93]. The difference becomes especially important in clusters with initial mass M 0 10 4 M because of their lower escape velocities. In these clusters, virtually no BHs are left after the momentum kicks have been imparted, which is reflected in the mass distribution of useful clusters shown in the bottom-right panels of Fig. 7, 8, and 9.
Cluster evaporation. Our mass-independent and orbitindependent mass loss rate for cluster evaporation is certainly a simplified one. To understand its effect on the cluster and BHB evolution, we computed three additional models with exactly the same initial conditions as in Mod1, Mod2 and Mod3 but withṀ ,ev = 0. Here we still compute the initial GCMF from equation (5) and use the [M * , ∆] values obtained from the MCMC analysis above, but we do not include any prescription for mass loss when evolving the clusters. Thus, this exercise is only meant to determine the importance of the mass loss effect on the secular evolution of the clusters and the BHBs they produce. We find that the local value of the merger rate density as well as λ, the BH mass and progenitor cluster mass distributions are consistent with those found in the models with cluster evaporation included. For the same initial conditions as in Mod1, Mod2 and Mod3 the median values of the local merger rate are R 0 = 6.9Gpc −3 yr −1 , 14.1Gpc −3 yr −1 and 3.5Gpc −3 yr −1 , respectively. This shows that cluster evaporation has a small effect on the dynamics of the BHBs.
In our models, however, tidal mass loss must become important at some point, e.g., for high enough ∆, GCs will evaporate before they can produce BHBs. We now quantify how high ∆ needs to be in order to change the BH dynamics significantly. To do this we compare the tidal mass loss timescale, t ev ≡ M 0 /|Ṁ ,ev |, to the timescale after which the BHs have been nearly depleted by dynamical ejections, which we define to be t BH ≡ M BH,0 /|Ṁ BH |. We should expect that for t BH < t ev most BHBs will have formed already before the cluster mass has changed significantly due to evaporation. This will happen if ∆ is smaller than the critical value For ρ 0 = 10 3 M pc −3 and f BH = 0.05, we find ∆ c ≈ 10 6 M independent of the initial cluster mass; for ρ 0 = 10 5 M pc −3 , we have ∆ c ≈ 10 7 M . These values are larger than any value of ∆ used in our models (see Fig. 1), explaining the small impact of cluster evaporation on the results. While the models discussed above show that the impact of cluster evaporation on the BHB dynamics is small, they do not asses its effect on the merger rate. Thus, we consider three new models withṀ ,ev = 0 but now use an initial GCMF that only accounts for mass loss due to stellar evolution. If only stellar evolution is included, the initial GCMF becomes: and K 2. These new models provide us with a safe lower limit on the BHB merger rate for each density as-sumption; they are Mod15, Mod27 and Mod39 in Table I and Fig. 7, 8 and 9. From these results we see that the merger rate in models without evaporation are about three times smaller than in models where the effect of cluster evaporation is included.
Cluster density. The model variations explored above show that for a given initial GCMF, the initial cluster density is clearly the most important parameter for setting the BHB merger rate density and its redshift evolution. Thus, here we perform a more systematic exploration of such dependence by running an additional set of models where the initial cluster density is varied in the range ρ 0 = 10 2 M pc −3 to 10 7 M pc −3 . All other model parameters are set as in Mod1 of Table I. The results from these additional models are shown in Fig. 10 and Fig. 11.
From Fig. 10 we see that the peak of the merger rate and the redshift at which it occurs in each model increase with ρ 0 , and vary from 3 Gpc −3 yr −1 at z = 3 for ρ 0 = 10 2 M pc −3 , to 10 3 Gpc −3 yr −1 at z = 4.5 for ρ 0 = 10 7 M pc −3 . The situation is different, however, when we look at the merger rate in the local Universe. In Fig. 11 we see that the median value of R 0 has a  Table I. maximum value of 20 Gpc −3 yr −1 at ρ 0 10 5 M pc −3 . This is an important result as it shows that the local BHB merger rate density from the GC channel has a robust upper limit of 50 Gpc −3 yr −1 -the upper error bar estimate for the ρ 0 = 10 5 M pc −3 model.
The reason why R 0 decreases with ρ 0 above a certain density can be understood from the lower panel in Fig. 10. This plot shows the mass distribution of clusters from which the BHBs that merge in the local universe are formed. For initial densities above 10 5 M pc −3 the contribution from clusters in the mass range 10 5 M M 0 10 7 M gradually decreases as the distribution of cluster masses contributing to the mergers becomes bimodal. Such narrowing of the range of cluster masses that can produce local mergers explains the relatively low merger rate in the higher density models. It also affects the distribution of the primary BH masses as shown in the middle panel of Fig. 10.
We looked into the density dependence of the cluster mass distribution shown in the lower panel of Fig. 10 in more details. We find that the lower mass peak seen in the cluster mass distribution for ρ 0 = 10 6 M pc −3 and 10 7 M pc −3 is only due to ejected binaries while the higher mass peak is only due to in-cluster mergers. Thus, we can explain the depletion of BHBs that come from intermediate mass clusters by considering the behaviour of the two merging populations when varying ρ 0 . Above a certain initial cluster mass, v esc becomes large enough ( 100km s −1 ) that all BHBs merge inside the cluster. But, because v esc ∝ M 1/3 ρ 1/6 , the value of initial cluster mass that still allows for dynamical ejections to occur goes down as ρ 0 increases. This explains why the upper end of the mass distribution of clusters that produce mergers from ejected BHBs decreases as ρ 0 increases. Clusters with an initial mass larger than this value only produce in-cluster mergers. Such clusters, however, can only produce mergers in the local universe if they still contain BHs at the present time. Because the BHs are processed at a rate t −1 BH ∝ √ ρ/M , the value of the initial cluster mass above which BHs are still present in the local universe increases with density. This explains why the lower end of the mass distribution of systems that produce in-cluster mergers moves towards larger masses as ρ 0 increases; clusters with a mass smaller than this value get rid of all their BHs by z = 1. Fig. 10 shows that only models in which GCs start with an initially high density ρ 0 10 4 M pc −3 can account for a large fraction of the LIGO-Virgo BHB mergers. Interestingly, these models also give a better fit to the inferred BH mass function above m 13M as seen in Fig 2. Future GW observations will reduce the error bars associated with the merger rate estimates and the BH mass distribution, providing important clues on the the initial densities of GCs.
Finally, we consider two additional model realizations. In one we evolve the same initial conditions as in [28] and [59] where half of the clusters have r h,0 = 0.8 pc and the remaining half have r h,0 =1.6 pc; in the other model, the cluster half-mass radius scales as This latter relation was derived by [30] from the results of Has , egan et al. [58] who fit this Faber-Jackson-like relation to ultra-compact dwarf galaxies (UCDs) and elliptical galaxies. Gieles et al. [30] derived the initial mass-radius relation correcting for mass loss and expansion by stellar evolution and correcting radii for projection. All the other model parameters are the same as in Mod1 of Table I. The results of these two new models are shown in Fig. 12. Interestingly, both give a local merger rate, 10 Gpc −3 yr −1 , which is similar to the maximum merger rate value we obtained before. Moreover, these results show how the choice of initial half-mass radius relation has a significant effect on both the BH mass and initial cluster mass distributions. For r h,0 ∝ M 0.6 0 , the cluster mass distribution becomes nearly flat so that, roughly speaking, all clusters with initial mass in the range 10 4 −10 6 M contribute equally to the local merger rate.

V. DISCUSSION AND CONCLUSIONS
Our models are based on some assumptions and approximations. These are discussed in the following sections, which also present a more detailed comparison to the literature. We end the paper with a brief summary of our main results.

A. Present-day GC mass density
Our value of ρ GC is larger than what was found by [15]: if we adopt their assumption of an average GC mass of 3 × 10 5 M , we find that equation (2) corresponds to a GC number density of n GC = 2.4 Mpc −3 , which is 3.3 times larger than their n GC = 0.72 Mpc −3 , but similar to the value used in [12] and [20]. Part of this difference is because we adopted a larger value of η: if we use their mild M h -dependent η from [36], we find n GC = 1.50 Mpc −3 , which corresponds to our lower error bar η . However, this is still a factor of 2.1 higher than what was found by [15]. We are not sure what causes this remaining difference, but we note that n GC = 1.50 Mpc −3 is about a factor h −2 larger than n GC = 0.72 Mpc −3 (Carl Rodriguez, private communication).

B. Initial GC density in the Universe
To derive ρ GC0 , a different approach was adopted by [28] and [59]. They use the total mass density of GCs forming in the semi-analytical galaxy formation model of El-Badry et al. [77]. They approximate the numerical results with analytical functions and find a total ρ GC0 = 5.8 × 10 14 M Gpc −3 , about 15% higher than El-Badry et al. [77] and 20% lower than our adopted ρ GC (equation 2). They then assume that the initial masses of all GCs were a factor of 2.6 higher (from [94]) because of stellar mass loss and evaporation and find that initial mass density of GCs more massive than 10 5 M is ρ GC0 (M 0 > 10 5 M ) 1.5 × 10 15 M Gpc −3 . This is a factor of ∼ 4 higher than found by [28]. The reason we find a higher value is that their assumption that the present-day GC density in the Universe is made from GCs with M 0 > 10 5 M that lost (only) a factor of 2.6 in mass implies a mass loss rate that is much lower in our models. In our models the present-day ρ GC0 is made of GCs with M 0 4 × 10 5 M .
In addition, we do consider the contribution to the merger rate of lower mass GCs with M 0 < 10 5 M . We Use cBHBd to evolve the same initial conditions as in [28] and [59] where half of the clusters have r h,0 = 0.8 pc and the remaining half have r h,0 =1.6pc but extended the initial GCMF down to M lo = 100M (see Fig. 12). We find that 10% of the local mergers come from GCs with M 0 < 10 5 M , and therefore conclude that for the exact same initial GCMF, our models would lead to a local merger rate that is still 4 times that found in [28] and [59]. Our Schechter mass is M * 2 × 10 6 M , so we compare to the results from [28] for mass functions with M * = (2.5 − 5) × 10 6 M . For these, the rates in [28] are 5 to 10Gpc −3 yr −1 . The median of the merger rate distribution computed from our models is 10Gpc −3 yr −1 . Rodriguez and Loeb [28] use a fit to the results of a set of Monte Carlo simulations to determine the number of mergers produced by a cluster as a function of time. Because these fitting formulae are not public, it is currently difficult to establish the reason why our rates are only 1 to 2 times, and not 4 times, those in [28]. We note in passing that we compared our models to the number of mergers from the two examples shown in Fig. 2B of [28] and found very good agreement.

C. O-star ejections and IMBH formation
Our cBHBd model makes the simplifying assumption that all BHs are in place when the cluster forms. Because the typical timescale of GC evolution (i.e. 100 Myr -Gyr) is much longer than the timescale of BH formation (i.e. 10 Myr), this is fine in most cases. However, for very dense, low-mass clusters, the relaxation time is so short that O-stars are ejected as 'runaway stars' before they form BHs [95,96]. As a result, the initial BH fraction is lower in these clusters than what we assume in our models, possibly affecting the merger rate and properties of the mergers. To quantify this, we use the fact that the dynamical process that ejects O-stars is the same as the one that ejects BHs at a later stage. We therefore adopt clusterBH and replace the BH population by a massive star population between 10 − 100 M , with a logarithmic slope of −2.3, and a mass fraction of 18%, as appropriate for a Kroupa IMF. We then determine for a grid of initial cluster masses and half-mass radii the maximum mass of massive stars that form BHs inside the cluster. In Fig. 13 we show contours for 20 M (the minimum mass of an O-star to produce a BH), 35 M (approximately half of the mass in BHs is produced by stars more massive than this) and 100 M (the upper limit of our IMF). We also overplot the 3 initial cluster densities adopted in the previous section. From this we see that clusters with M 0 10 4 M are affected by O-star ejections, which affects about half of the mass in the initial GCMF. However, these low-mass GCs are only responsible for ∼ 15% of the mergers. The fraction of clusters for which more than half of the BH mass is ejected is only a few per cent. Cluster that produce runaways, will have fewer massive BHs, leading to a slightly higher merger rate of slightly less massive BHs. This effect is small, but would lead to a slightly steeper BH mass function especially for the densest models. However, we conclude that runaway stars do not significantly affect our results and that the effect is probably smaller than other uncertainties in our model.
Another process that is not included in cBHBd is repeated mergers of BHs. After a merger, the BH merger remnant receives a general relativistic momentum kick of several 100 km/s, and if this is smaller than the escape velocity from the center of the cluster, it can be involved in subsequent mergers [85] possibly forming an intermediate-mass BH (IMBH) [86]. This can only occur for an initial escape velocity 300 km/s and in Fig. 13 we show that only in our densest ( 10 5 M /pc 3 ), most massive ( 10 7 M ) models this could happen. Ignoring the effect of IMBH formation via dry BH mergers is therefore not affecting our results.

D. Cluster mass loss and initial GCMF
Our models adopt a constant mass loss for all clusters of ∆ 2 × 10 5 M . This is what is required to evolve the initial GCMF with a power-law slope of −2 at low-masses to the peaked GCMF of old GCs, but it is inconsistent with some studies of GC evolution. Firstly, N -body simulations of tidally limited clusters show thaṫ M ∝ M 1/3 [48], rather thanṀ ∝ M 0 . Including this mass dependence inṀ and maintaining the constraint that all GCs formed with the same Universal initial mass function, implies that clusters need to lose more mass for the turn-over in the GCMF to move to 2 × The magenta, full line shows the initial mass-radius relation that describes the most massive GCs ( 10 6 M ) and ultracompact dwarf galaxies ( 10 7 M ) [30,58]. Full lines show the maximum mass of O-stars that form BHs inside the cluster. Clusters with M 10 4 M will eject some O-stars before they become BHs, and these clusters will therefore have slightly lower BHB mergers than in our model. The red, full line shows an initial escape velocity of 300 km/s, which is the minimum escape velocity required for repeat mergers to occur [86]. This process is not playing a role in our adopted initial conditions. [97], resulting in a twice as large value of K 64 [98] as we found for a mass independentṀ . Secondly, the models of [48] show that ∆ for a typical Milky Way GC is smaller and depends on the apocenter and eccentricity of the Galactic orbit. For the median Galactocentric distance of Miky Way GCs (∼ 5 kpc) and an age of 10 Gyr, these models find ∆ 4 × 10 4 M , i.e. a factor of 5 smaller than what we assumed. These simulations considered the secular evolution of clusters in a static tidal field and therefore underestimate mass loss of clusters if additional disruption processes are important. For example, interactions with massive gas clouds in the early evolution (first Gyr) can be disruptive [51,52], have a similar mass dependence as relaxation driven evaporation in a static tidal field [55] and lead to a turn over in the GCMF [53,99]. If this is the cause for the value of ∆, than |Ṁ | is much higher in the early evolution, which would affect the resulting merger rate. Because the relaxation time decreases if the mass reduces, including this type of mass evolution will lead to a higher merger rate than in our models with anṀ that is constant in time.
The models of [48] also do not contain BHs and it has been shown that retaining a BH population significantly increases the escape rate of stars [100,101]. The BH population can increase |Ṁ | by an order magnitude (Gieles et al., in prep), especially towards the end of the evolution. This implies a relatively low(high) |Ṁ |(t rh ) in the early evolution compared to our models, leading to a reduction of the merger rate. In addition, forṀ ∝ M γ , with γ < 0, the required K to get the turn-over at the right mass is lower than for γ = 0. If BHs are responsible for the value of ∆, our merger rates could therefore also be slightly overestimated for this reason. However, we do not expect this effect to be important for dense clusters ( 10 4 M /pc 3 ) because their BHB mergers are produced when the clusters are still unaffected by the Galactic tides. We plan to include the effect of relaxation driven escape in a tidal field in a future version of cBHBd to address this issue.

E. Primordial binaries
The effect of binaries that form in the star formation process and undergo stellar evolution in the first stages of cluster evolution has not been discussed so far. We argue here that primordial binaries have a negligible effect on the merger rate and the distribution of the BH masses we derived. Because of Hénon principle, the energy generation rate by binaries is determined by the relaxation process in the cluster as a whole. Whether dynamically active binaries form in three-body encounters from single BHs, or in encounters involving BHBs that formed from primordial binaries will therefore result in a central binary with the same properties. However, primordial binaries might affect the initial BH mass function due to binary evolution processes. But, because BHB mergers from primordial binaries in GCs are a subdominant population at low redshifts [see Fig. 2 in 15], the effect on the local BHB mass distribution is also expected to be small.

F. Conclusions
In this paper we have considered the dynamical formation of BHB mergers in GCs. Using our new population synthesis code cBHBd we have evolved a large number of models covering a much wider set of initial conditions than explored in the literature. This allowed us to place robust error bars on the merger rate and mass distributions of the merging BHBs. We find that the GC channel produces BHB mergers in the local universe at a rate of 7.2 +21.5 −5.5 Gpc −3 yr −1 , where the error bars are mostly set by the unknown initial GC mass function and initial cluster density. By comparing to the merger rate inferred by LIGO-Virgo, our results imply that a model in which most of the detected mergers come from GCs is consistent with current constraints. This would require, however, that GCs form with half-mass densities larger than 10 4 M pc −3 , and suppression of other formation mechanisms. All our models show a drop in the merger rate of binary with primary BH mass below 15M , for which there is no evidence in the gravitational wave data. This might suggest that another mechanism is responsible for the production of these lower mass sources.
Our results have a number of implications for the formation of BHB mergers and GCs. The dependence of the merger rate and BHB properties (e.g., eccentricity, mass) on the model parameters suggests that a direct comparison to the gravitational wave data will allow us to place constraints on the initial conditions of GCs and their evolution. Our models will also help to understand other uncertain parameters that control the formation of BHs and their natal kicks. While these latter parameters have little effect on the merger rate, they have a significantly impact on the masses of the merging BHBs. Thus, useful constraints could be placed once the number of gravitational wave detections will be large enough to to allow for a statistically significant comparison to the inferred BH mass function.
In the future, we plan to consider other type of clusters such as open and nuclear star clusters which are also believed to be efficient factories of gravitational wave sources [84,86,[102][103][104]. The study of these systems will require us to add additional physics to cBHBd.