IMRPhenomXHM: A multi-mode frequency-domain model for the gravitational wave signal from non-precessing black-hole binaries

We present the IMRPhenomXHM frequency domain phenomenological waveform model for the inspiral, merger and ringdown of quasi-circular non-precessing black hole binaries. The model extends the IMRPhenomXAS waveform model, which describes the dominant quadrupole modes $\ell = |m| = 2$, to the harmonics $(\ell, |m|)=(2,1), (3,3), (3,2), (4,4)$, and includes mode mixing effects for the $(3,2)$ spherical harmonic. IMRPhenomXHM is calibrated against hybrid waveforms, which match an inspiral phase described by the effective-one-body model and post-Newtonian amplitudes for the subdominant harmonics to numerical relativity waveforms and numerical solutions to the perturbative Teukolsky equation for large mass ratios up to 1000. A computationally efficient implementation of the model is available as part of the LSC Algorithm Library Suite.


I. INTRODUCTION
Frequency domain phenomenological waveform models for compact binary coalescence, such as [3][4][5][6] have become a standard tool for gravitational wave data analysis [7,8]. These models describe the amplitude and phase of spherical harmonic modes in terms of piecewise closed form expressions. The low computational cost to evaluate these models makes them particularly valuable for applications in Bayesian inference [9,10], which typically requires millions of waveform evaluations to accurately determine the posterior distribution of the source properties measured in observations, such as the mass, arrival time, or sky location.
Until recently the modelling of the gravitational wave signal from such systems, and consequently gravitational wave data analysis, have focused on the dominant = |m| = 2 harmonics. For high masses or high mass ratios this leads however to a significant loss of detection rate [11][12][13], systematic bias in the source parameters (see e.g. [11,[14][15][16][17][18]), and implies a degeneracy between distance and inclination of the binary system. As the sensitivity of gravitational wave detectors increases, accurate and computationally efficient waveform models that include subdominant harmonics are required in order to not limit the scientific scope of gravitational wave astronomy.
Recently both time domain and frequency domain inspiralmerger-ringdown (IMR) models have been extended to subdominant spherical harmonics, i.e. modes other than the (2, |2|) modes: In the time domain this has been done in the context of the effective-one-body (EOB) approach [19], however EOB models are computationally expensive and usually a reduced order model (ROM) is constructed to accelerate evaluation [20,21] (see however [22] for an analytical method to accelerate the inspiral). Furthermore, the NRHyb-Sur3dq8 surrogate model [23] has been directly built from hybrid waveforms, but is restricted to mass ratios up to eight.
For a precessing surrogate model, calibrated to numerical relativity waveforms, see [24]. Fast frequency domain models have previously been developed for the non-spinning subspace [25,26], and for spinning black holes through an approximate map from the (2, 2) harmonic to general harmonics as described in [6], which presented the IMRPhenomHM model, which is publicly available as part of the LIGO Algorithm Library Suite (LALSuite) [2]. This approximate map is based on the approximate scaling behaviour of the subdominant harmonics with respect to the (2, 2) mode, the IMRPhe-nomHM model is thus only calibrated to numerical data for the (2,2) mode. This information from numerical waveforms enters through the IMRPhenomD model, which is calibrated to numerical relativity (NR) waveforms up to mass ratio q = 18.
Here we present the first frequency domain model for the inspiral, merger and ringdown of spinning black hole binaries, which calibrates subdominant harmonics to a set of numerical waveforms for spinning black holes, instead of using an approximate map as IMRPhenomHM. For the (2, |2|) modes the model is identical to IMRPhenomXAS [1], which presents a thorough update of the IMRPhenomD model, extends it to extreme mass ratios, drops the approximation of reducing the two spin parameters of the black holes to effective spin parameters, and replaces ad-hoc fitting procedures by the hierarchical method presented in [27].
Our modelling approach largely follows our work on IMR-PhenomXAS, with some adaptions to the phenomenology of subdominant modes, as first summarized in Sec. II. As for IMRPhenomXAS we construct closed form expressions for the amplitude and phase of each spherical harmonic mode in three frequency regimes, which correspond to the inspiral, ringdown, and an intermediate regime. In the inspiral and ringdown the model can be based on the perturbative frameworks of post-Newtonian theory [28] and black hole perturbation theory [29]. The intermediate regime, which models the highly dynamical and strong field transition between the physics of the inspiral and ringdown still eludes a perturbative treatment. An essential goal of frequency domain phenomenological waveform models is computational efficiency. To this end, an accompanying paper [30] presents techniques to further accelerate the model evaluation following [31].
We model the complete observable signal, from the inspiral phase to the merger and ringdown to the remnant Kerr black hole, but we restrict our work to the quasi-circular (i.e. noneccentric) and non-precessing part of the parameter space of astrophysical black hole binaries in general relativity, which is three-dimensional and given by mass ratio q = m 1 /m 2 ≥ 1 and the dimensionless spin components χ i of the two black holes which are orthogonal to the preserved orbital plane, where S 1,2 are the spins (intrinsic angular momenta) of the two individual black holes, L is the orbital angular momentum, and m 1,2 are the masses of the two black holes. We also define the total mass M = m 1 + m 2 , and the symmetric mass ratio η = m 1 m 2 /M 2 . An approximate map from the non-precessing to the precessing parameter space [4,32,33] can then be used to extend the model to include the leading precession effects. The paper is organized as follows. In Sec. II we collect some preliminaries: our conventions, notes on waveform phenomenology which motivate our modelling approach, and a brief description of our plan of fitting numerical data. In Sec. III we briefly describe our input data set of hybrid waveforms, and the underlying numerical relativity and perturbative Teukolsky waveforms. The construction of our model is discussed in Secs. IV-VI for the inspiral, intermediate region, and ringdown respectively. The accuracy of the model is evaluated in Sec. VII, and we conclude with a summary and discussion in Sec. VIII. Appendix A discusses the conversion from spheroidal to spherical-harmonic modes. In appendix B we describe our method to test tetrad conventions in multimode waveforms. Some technical details of our LALSuite [2] implementation are presented in appendix C. Details regarding the rescaling of the inspiral phase are presented in appendix D, and appendix E summarizes post-Newtonian results on the Fourier domain amplitude.

A. Conventions
Our waveform conventions are consistent with those chosen for the IMRPhenomXAS model in [1] and our catalogue of multi-mode hybrid waveforms [34], which we introduce in Sec. III. We use a standard spherical coordinate system (r, θ, φ) and spherical harmonics Y −2 m of spin-weight −2 (see e.g. [35]). The black holes orbit in the plane θ = π/2. Due to the absence of spin-precession the spacetime geometry exhibits equatorial symmetry, i.e. the northern hemisphere θ < π/2 is isometric to the southern hemisphere θ > π/2, and in consequence the same holds for the gravitational-wave signal.
The gravitational-wave strain h depends on an inertial time coordinate t, the angles θ, φ in the sky of the source, and the source parameters (η, χ 1 , χ 2 ). We can write the strain in terms of the polarizations as h = h + (t, θ, ϕ) − i h × (t, θ, ϕ), or decompose it into spherical harmonic modes h m as (2.1) The split into polarizations (i.e. into the real and imaginary parts of the time domain complex gravitational wave strain) is ambiguous due to the freedom to perform tetrad rotations, which corresponds to the freedom to choose an arbitrary overall phase factor. As discussed in detail in [34,36] and in appendix B, only two inequivalent choices are consistent with equatorial symmetry, and for simplicity we adopt the convention that for large separations (i.e. at low frequency) the time domain phases satisfy This differs from the convention of Blanchet et al. [37] by overall factors of (−1)(−ι) m in front of the h m modes. In appendix B we discuss how to test a given waveform model for the tetrad convention that is used. The equatorial symmetry of non-precessing binaries implies h m (t) = (−) h * −m (t), (2.3) it is thus sufficient to model just one spherical harmonic for each value of |m|.
We adopt the conventions of the LIGO Algorithms Library [2] for the Fourier transform, (2.4) With these conventions the time domain relations between modes (2.3) that express equatorial symmetry can be converted to the Fourier domain, where they read The definitions above then also imply thath m ( f ) (with m > 0) is concentrated in the negative frequency domain andh −m ( f ) in the positive frequency domain. For the inspiral, this can be checked against the stationary phase approximation (SPA), see e.g. [38].
As we construct our model in the frequency domain, it is convenient to modelh −m , which is non-zero for positive frequencies. The modeh m , defined for negative frequencies, can then be computed from (2.5). We model the Fourier amplitudes A m ( f > 0), which are non-negative functions for positive frequencies, and zero otherwise, and the Fourier domain (2.6) The contribution to the gravitational wave polarizations of both positive and negative modes and for positive frequencies is then given bỹ If we are only interested in the contribution of just one mode for positive frequencies then the polarizations read as: For the = 2, |m| = 2 modes these equations correspond to our IMRPhenomXAS model [1].
In appendix C we discuss conventions which are specific to our LALSuite implementation, in particular how to specify a global rotation, and the time of coalescence.

B. Perturbative waveform phenomenology: inspiral and ringdown
The phenomenology of the oscillating subdominant modes |m| > 0 is largely similar to the dominant modes = |m| = 2, which has been discussed in detail in [1, 39] -with some important exceptions that lead to both simplifications and complications when modelling these modes, as opposed to modelling = |m| = 2.
The main simplification is that at low frequencies post-Newtonian theory, combined with the stationary phase approximation, predicts an approximate relation between the phases Φ m of different harmonics, which with our choice of tetrad takes the simple form of eq. (2.2). This approximation is not exact, and becomes less accurate for higher frequencies. We have studied this in detail in [34,36], and in [34] we find that for comparable mass binaries we can neglect the error of the approximation (2.2) before a binary system reaches its minimal energy circular orbit (MECO) as defined in [40]. As in our IMRPhenomXAS model, we will use the MECO to guide the choice of transition frequency between the inspiral and intermediate frequency regions. In the mass ratio range where we have numerical relativity data (q ≤ 18) it is thus not necessary to model the inspiral phase, but we can use the scaling relation (2.2), as has been done in [6].
For the time domain amplitude, approximate scaling relations have been discussed in [41,42], and in the frequency domain they have been used in the IMRPhenomHM model [6]. Unlike for the phase, however, even in the inspiral the errors are too large for our purposes, and we will need to model the amplitude for each spherical harmonic in a similar way as for IMRPhenomXAS, including in the inspiral.
Rotations in the orbital plane by an angle ϕ transform the spherical harmonic modes as (2.13) Interchange of the two black holes thus corresponds to a rotation by ϕ = π, and modes with odd m vanish for equal black hole systems. A problem can arise in regions of the parameter space where the amplitude is close to zero, even in the inspiral, as discussed in [19,34]: For black holes with very similar masses, the amplitude can be very small, with the sign depending on the mass ratio, spins, and the frequency -which can lead to sign changes with frequency. In such cases the amplitude does become oscillatory, and the approximate relation (2.2) can not be expected to be satisfied. This happens in particular for the (2, |1|) modes. We do not currently model the amplitude oscillations, and thus for a certain region of parameter space our model does not properly capture the correct waveform phenomenology. This is region does depend on the frequency, but roughly corresponds to very similar masses and anti-aligned spins, for a particular example for the (2, |1|) modes see Fig.1. However, as phenomenon happens precisely when the amplitude of the (2, |1|) modes is very small, this is not expected to be a significant effect for the current generation of detectors. During the inspiral and merger, gravitational wave emission is dominated by the direct emission due to the binary dynamics. As the final black hole relaxes toward a stationary Kerr black hole, the gravitational wave emission is eventually dominated by a superposition of quasinormal modes before the late time polynomial tail falloff sets in (for an overview see e.g. [29,43]). As is common in waveform modelling targeting applications in GW data analysis, we neglect the tail falloff, and focus our late time description on the ringdown by means of quasinormal mode (QNM) emission, where the strain can be written as a sum of exponentially damped oscillations, h(t, θ, ϕ) ≈ mn a mn e i(ω mn t)+φ mn −2Y S m (θ, φ), (2.14) where the complex frequencies ω mn are known in terms of the black hole spin and mass, and the functions −2Y S m (θ, φ) are the spheroidal harmonics of spin weight −2 [44,45]. The amplitude parameters a mn and phase offsets φ mn will in general have to be fitted to numerical relativity data.
It has long been known that representing the spheroidal harmonic ringdown modes can result in mode mixing for modes with approximately the same real part of the ringdown frequency ω lmn . This happens in particular for modes with the same value of m, where then values with larger are much weaker, and do not show the usual exponential amplitude drop, but a more complicated phenomenology. In our case this happens for the (3,2) mode. In this work we will model mixing with just one mode, specifically the (3, 2) with the (2, 2), and we neglect the mixing with the (4, 2) and weaker modes. While for the modes that do not show mixing it is sufficient to model their spherical harmonics, for the (3, 2) we will model the spheroidal harmonic, and then transform to the spherical FIG. 1: Two top panels: Amplitude of the (2, 1) mode relative to the (2, 2 at a fixed frequency of M f = 0.001 as a function of the spins for mass ratios 1.2 (top) and 1. 35 (bottom). We can see the regions where the (2, 1) amplitude tends to zero. The minimum amplitude region consists in a diagonal that is moving towards one of the corners as we increase the mass ratio. For q ∼ 1.7 it has already disappeared. Bottom panel: A similar behaviour can be observed for higher frequencies (M f = 0.02).
harmonic basis, as discussed in Sec. VI. For a recent nonspinning model of mode-mixing see [26].
A key challenge of accurately modeling multi-mode waveforms is to preserve the relative time and phase difference between the individual modes, say as measured at the peaks of the modes. In the frequency domain time shifts are encoded in a phase term that is linear in frequency: the Fourier transformation of a time shifted function h τ = h(t − τ) will be given byh τ =he −i2π f τ . In GW data analysis the quality of a model is typically evaluated in terms of how well two waveforms match, up to time shifts and global rotations, e.g. in terms of match integrals. Adding a linear term in the phase leaves such match integrals invariant. In order to improve the conditioning of the model calibration it has thus been common for phenomenological frequency domain models to subtract the linear part in frequency before calibrating the model to improve the conditioning of numerical fits, and then add back a linear in frequency term at the end, which approximately aligns the waveforms in time, e.g. by approximately aligning the am-plitude peak at a certain time. This strategy has also been followed in our construction of the IMRPhenomXAS model, i.e. for the = |m| = 2 modes, whereas for the other modes we directly model a given alignment in time.
More specifically, our strategy of aligning the different spherical harmonic modes in time and phase has been the following: our hybrid waveforms are aligned in time and phase such that the peak of the = |m| = 2 modes of ψ 4 in the time domain is located at t = 0, and the corresponding phase Φ 22 (ψ 4 , t = 0) = 0, which corresponds to a time ∆t = 500M before the end of the waveform. For the odd modes this leaves an ambiguity of a phase shift by multiples of π. We do not use the odd modes of the numerical waveforms to resolve this ambiguity in order to not depend on the poor quality of many odd mode data sets, and rather resolve the ambiguity in our model by making a smooth choice of the inspiral phase across our parameter space. Our subdominant modes are calibrated to agree with this alignment of our hybrid waveforms. The = |m| = 2 mode is aligned a posteriori to the same alignment, similar to what has been done in previous phenomenological frequency domain models. A difference here is that this a posteriori time alignment is achieved via an additional parameter space fit.
C. Strategy for fitting our model to numerical data As discussed above, following IMRPhenomXAS our model is constructed in terms of closed form expressions for the frequency domain amplitude and phase of spherical or spheroidal harmonic modes, which are each split into three frequency regimes. We will refer to a model for the amplitude or phase for one of the frequency regimes as a partial model. We thus need to construct a total of six partial models for each mode. For the inspiral phase, we can use the scaling relation (2.2) for comparable masses, and only need to model the extreme mass ratio case. For each of the six partial models the ansatz will be formulated in terms of some coefficients, which for example in the inspiral will be the pseudo-PN coefficients that correspond to yet unknown higher post-Newtonian orders.
We thus employ two levels of fits to our input numerical data: First, for each waveform in our hybrid data set we perform fits of the six partial models for each mode to the data. This yields a set of coefficients for each mode, quantity (amplitude or phase), and frequency interval. We call this first level the direct fit of the model to our data. Second, we fit each coefficient across the three-dimensional parameter space of mass ratio and component spins. We call this second level the parameter space fit. In the direct fit we usually collect redundant information: such as the model coefficients, values and derivatives at certain frequencies, and other quantities. This redundancy provides for some freedom when reconstructing the model waveform after the parameter space fits. We make extensive use of this freedom when tuning our model, while the final model uses a particular reconstruction, which is what will be described below.
Fitting the coefficients of a particular partial model across the parameter space may not turn out to be a well-conditioned procedure, e.g. for the inspiral the pseudo PN coefficents have alternating signs, the PN series converges slowly, and different sets of PN coefficients can yield very similar functions. We will thus sometimes transform the set of coefficents we need to model to an alternative representation, in particular collocation points, following [1, 3,39]. We thus fit the values or derivatives at certain frequencies, and use the freedom in reconstructing the final phase or amplitude in tuning the model as mentioned above.
In order to perform the three-dimensional parameter space fits in symmetric mass ratio η and the two black-hole spins χ 1,2 we use the hierarchical fitting procedure described in [27], which we have also used for the underlying IMRPhenomXAS model. The goal of this procedure is to avoid both underfitting and overfitting our data set. In order to simplify the problem we split the three-dimensional problem into a hierarchy of lower dimensional fits to some particular subsets of all data points. For each lower dimensional problem it is significantly easier to choose an ansatz that avoids underfitting and overfitting, and finally we combine the lower-dimensional fits into the full three-dimensional fit and check the global quality of the fit. In order to check fit quality we compute residuals and compute the RMS error, and we employ different information criteria to penalize models with more parameters as discussed in [27] as an approximation to a full Bayesian analysis.
As a first step of our hierarchical procedure we perform a 1dimensional fit for non-spinning subspace, choosing the symmetric mass ratio η as the independent variable. We can then identify two further natural one-dimensional problems: First, for the extreme mass ratio case we can neglect the spin of the smaller black hole, and consider the mass ratio as a scaling parameter, and we thus consider a one-dimensional problem in terms of the spin of the larger hole. Second, at fixed mass ratio we can fix a relation between the spins. For quantities such as the final spin and mass, or the coefficients of the IMR-PhenomXAS model for the (2, 2) mode, it has been natural to consider equal black holes, e.g. equal mass and equal spin for this one-dimensional problem.
It is then useful to express the results for the onedimensional spin fits in terms of a suitably chosen effective spin, such as which is typically measured in parameter estimation (see e.g. [8]), and which has also been the choice in the early phenomenological waveform models IMRPhenomB [46] and IMRPhenomC [47]. A judicious choice of effective spin parameter can minimize the errors when approximating functions of the three-dimensional parameter space by functions of η and effective spin, and can be sufficient for many applications, since spin-differences are a sub-dominant effect. For IMRPhenomD [3,39] two effective spin parameters have been used: For the inspiral calibration to hybrid waveforms (2.15) has been augmented by extra terms motivated by post-Newtonian theory [48] χ PN = χ eff − 38η 113 (χ 1 + χ 2 ) (2.16) The final spin and mass, and thus the ringdown frequency have been fit to numerical data in terms of the rescaled total angular momentum of the two black holeŝ (2.17) We choose χ PN as the effective spin for the inspiral amplitude collocation points, andŜ for other amplitude coefficients. For the phase inspiral we use the scaling relation (2.2) for comparable masses, such that a phase inspiral calibration is only necessary for large mass ratios, which we treat in the same way as other phase coefficients, where we again useŜ . We will denote a generic effective spin parameter by χ in general equations involving the effective spin.
We thus perform three one-dimensional fits, one for the non-spinning sub-space (depending on η), one for equal black holes (depending on χ), and one for extreme mass ratios (again depending on χ). Then a 2D ansatz depending on η and χ is built such that it reduces to the 1D fits for those particular cases. The 2D fit is then performed for all data points and from it we get the best η-dependence for the spin-effective terms.
In order to extend the hierarchical method to the full threedimensional parameter space, a second spin parameter needs to be chosen, which incorporates spin difference effects. For small spin difference effects, we can simply choose without worrying about a particular mass-weighting of the spins, since differences in mass-weighting could be absorbed into higher order terms. The spin difference effects are then modelled with a function f ∆χ (η) as a term f ∆χ (η)∆χ, (2.19) and the modelling of small spin difference effects is reduced to the 1-dimensional problem of fitting a function of η. For larger spin difference effects, we will however need higher order terms, and in [27] a term quadratic in ∆χ and a term proportional to χ ∆χ were included, and again these terms can be modelled as one-dimensional problems in terms of functions of η.
Extensions to this procedure are needed to model the behaviour of sub-dominant modes, in particular for the (3,2) mode and for odd m modes. For the (3,2), we need to model mode mixing in the ringdown, as discussed briefly above, and in detail in Sec. VI. While this requires a rotation from spheroidal to spherical harmonics, it does not directly affect our strategy for carrying out the direct fits and the parameter space fits. For odd m modes, changes are required due to the change of sign in the amplitude when rotating by an angle of π, see eq. (2.13), corresponding to interchanging the two black holes. For even m modes, rotations by π correspond to the identity, and as for IMRPhenomXAS it is natural to work with positive amplitudes. For odd m modes however, restricting the amplitude to positive values will make it a non-smooth function in the two-dimensional spin parameter space, where the amplitude corresponds to the absolute value of a function that can change sign. Specifically, at fixed mass ratio, the two-dimensional data in the spin parameter space exhibit a crease along a line which corresponds to a vanishing amplitude. For equal masses, this line appears for equal spin systems, as shown in Fig. 2. For sufficiently large mass ratios, the spin dependence of the sign of the phase can be neglected, we thus choose which part of the crease we flip in sign to be consistent with the behaviour for higher mass ratios data.
FIG. 2: Example of how the amplitude data set is modified for the parameter space fit. The data shown correspond to the first intermediate collocation point of the (3, 3) mode. One of the leaves of the original q = 1 data (top plot) is flipped in sign so we get a flat surface that is easier to fit. The leaf to be flipped is chosen such that the final behaviour is consistent with the data for other mass ratios, e.g. mass ratio 2 (intermediate plot). After the fit we take the absolute value of the final fit to return back positive amplitude.
When modelling the amplitude of odd m modes we can not use equal masses as one of our one-dimensional fitting problems, since the amplitude vanishes there, and instead we use a different mass ratio, typically q = 3. Applying appropriate boundary conditions for equal black hole systems is then essentially straightforward -we simply demand that the amplitude vanishes. Setting appropriate boundary values for odd mode phases for equal black hole is however a complicated problem, since the phase will in general not vanish as one takes the limit toward the boundary. The numerical data typically become very noisy and inaccurate for modes with very small amplitude, and thus one can not in general expect to model odd m modes for close-to-equal black holes with small relative errors. When building a parameter space ansatz for odd mode amplitudes we are adding a minus sign when exchanging the spins, the non-spinning and effective-spin parts of the ansatz must be manifestly set to zero because they pick up a minus sign when exchanging the BHs, while at the same time they are invariant by symmetry. We implement this by adding a multiplicating factor 1 − 4η that cancels these parts for equal mass systems. The even modes behave in the opposite way: since when exchanging the two spins they remain the same, it is the spin-difference part which has to vanish because it would introduce a minus sign.

III. CALIBRATION DATA SET
Our input data set coincides with the data we have used for the IMRPhenomXAS model [1], and comprises a total of 504 waveforms: 466 for comparable masses (with 1 ≤ q ≤ 18) and 38 for extreme mass-ratios (with q = {200, 1000}). These waveforms are "hybrids", constructed by appropriately gluing a computationally inexpensive inspiral waveform to a computationally expensive waveform, which covers the late inspiral, merger and ringdown (IMR). For comparable masses, the = |m| inspiral is taken from the SEOBNRv4 (EOB) model [49], and the subdominant modes are constructed from the phase of the = |m| mode and post-Newtonian amplitudes as described in [34] along with other details of our hybridization procedure. The IMR part of the waveform is taken from numerical relativity simulations summarized below.
For extreme mass ratios the IMR part is taken from numerical solutions of the perturbative Teukolsky equation, and the inspiral part is taken from a consistent EOB description, as discussed below in Sec. III B.
Due to the poor quality of many of the numerical relativity waveforms, and the fact that our extreme mass ratio waveforms are only approximate perturbative solutions, we do not use all of the waveforms of our input data set for the calibration of all the quantities we need to model across the parameter space. Already for IMRPhenomXAS (see [1]) we had to carefully select outliers, which lacked sufficient quality for model calibration. Higher-modes waveforms are typically even noisier and more prone to exhibit pathological features than the dominant quadrupolar ones. This can result in a large number of outliers in the parameter-space fits, which can introduce unphysical oscillations in the fit surfaces. To attenuate this problem, we developed a system of annotations that stores relevant information about the quality of all the waveforms in the calibration set. A careful analysis of data quality is needed, separately for each quantity that we fit, such as the value of a collocation point at particular frequency for a particular mode. We will not document these procedures in detail, instead, we will discuss outliers in Sec. VII, where we evaluate the quality of our model by comparing to the original hybrid data. We will see that this comparison has less stringent quality criteria: pathologies which prohibit the use of a some waveform accurate fit for a particular coefficient may in the end not significantly contribute to the waveform mismatch. We will thus only discuss those waveforms which we excluded from the model evaluation, because of doubts in their quality.

A. Numerical relativity waveforms
The NR simulations used in this work were produced using three different codes to solve the Einstein equations: for the amplitude calibration we used 186 waveforms [50] from the public SXS collaboration catalog, as of 2018 [51] obtained with the SpEC code [52], 95 waveforms [27,39] obtained with the BAM code [53,54], and 16 waveforms from simulations we have performed with the Einstein-Toolkit [55] code. After the release of the latest SXS collaboration catalog [56], we extended the data set to include 355 SpEC simulations, and updated the parameter-space fits for the phase accordingly. We chose not to update the amplitude fits, as their recalibration was expected to have a smaller effect on the overall quality of the model waveforms. The parameters of our waveform catalogue are visualized in Fig. 3, note that some points in the parameter-space 3 map to multiple waveforms, which allows for important cross-checks among different codes and/or different resolutions. A detailed list of the waveforms we have used can be found in our paper on the hybrid data set [34].

B. Extrapolation to the test-mass limit
Due to the computational cost of high mass-ratio simulations, our catalogue of NR waveforms extends only up to q = 18, which would leave the test-particle limit of our model poorly constrained. Here, following [57], we chose to pin down the large-q boundary of the parameter-space using Extreme Mass-Ratio Inspiral (EMRI) waveforms. We produced two sets of such waveforms, one with q = 200 and the other with q = 10 3 . The spin on the primary spans the interval [−0.9, 0.9] in uniform steps of 0.1, while the secondary is assumed to be spin-less. The waveforms in the calibration set were generated by hybridizing a longer inspiral EOB waveform with a shorter numerical waveform computed using the code of Ref. [58]. The code solves the 2 + 1 Teukolsky equations for perturbations that can be freely specified by the user. In our case the gravitational perturbation was sourced by a particle governed by an effective-one-body dynamics, with radiation-reaction effects included up to 6PN order for all the multipoles modelled in this work [59]. The EOB and Teukolsky waveforms, being both extracted at future null infinity, can be consistently hybridized, following the same hybridization routine used for comparable-mass cases.

IV. INSPIRAL MODEL
In the inspiral region we work under the assumption that the SPA approximation (see e.g. [38] and our discussion in the context of IMRPhenomXAS [1]) is valid. The frequency domain strain of each mode will therefore take the form where t c is a time shift parameter, ψ 0 is an overall phase factor that depends on the choice of tetrad convention and Notice that, in our tetrad-convention, ψ 0 = π (see Eq. (2.2) and discussion therein). Furthermore, we will assume that we can work within the post-Newtonian framework, and that we can model currently unknown higher-order terms in the PN-expansions with NRcalibrated coefficients. Let us stress that in IMRPhenomXHM the phase and amplitude are treated in different ways: while the latter is fully calibrated to NR, the former is built with a reduced amount of calibration, as we will explain below.
Following the approach taken in IMRPhenomXAS, we set the end of the inspiral region around the frequency of the MECO (minimum energy circular orbit), as defined in Ref. [40]. For the amplitude, the default end-frequency of the inspiral is taken to be: While the fully NR-calibrated amplitude requires careful tuning of the above transition frequencies, we find that for the phase we can simply set f m Ins = m 2 f 22 MECO . The start frequencies of our hybrid waveforms [34] are set up such that M f ≥ 0.001453 m/2 for comparable masses, and M f ≥ 0.001872 m/2 for extreme-mass-ratio waveforms (depending on the spherical harmonic index m). We start the amplitude calibration at a higher minimum frequency of f min = 0.002 m/2 to avoid contamination from Fourier transform artefacts. Note that a higher cutoff frequency was chosen for IMRPhenomXAS due to the higher accuracy requirements in that case. For the phase, only a small and simple (linear) correction term (4.10) is calibrated, and the same minimum frequency cutoff is applied in this case.

A. Amplitude
In the inspiral region, the amplitude ansatz of IMRPhe-nomXHM augments a post-Newtonian expansion with terms up to 3PN oder with three NR-calibrated coefficients, which correspond to higher-order PN terms. A 3PN-order expansion for the Fourier domain amplitudes is computed in [60]. We found however significant discrepancies with our numerical data, which we resolved by re-deriving the Fourier domain amplitudes from the time domain expressions given in [61], see appendix E, which lists the expressions we use in this work.
Using the notation of Eq. (4.1), at low-frequency one has and similar expressions for other modes. Such a divergent behaviour is expected to negatively impact the conditioning of our amplitude fits. Therefore, we do not model the SPA amplitudes directly, but similar to IMRPhenomXAS we rather work with the quantities which are non-negative by construction and non-divergent in the limit f → 0. Note that the leading power law in A SPA m for a given ( , m) depends on the spin, it is for this reason that we normalize with the amplitude of the 22-mode.

Default reconstruction
Currently the highest known PN-term in the expansion of the H m is proportional to f 2 . In order to model currently unknown higher-order effects, we introduce up to three pseudo-PN terms {α, β, γ} : Following [3], we do not calibrate {α, β, γ} directly: instead, we compute parameter-space fits of the hybrids' amplitudes evaluated at three equispaced frequencies 0.5 f Ins m , 0.75 f Ins m , f Ins m . By default, the inspiral amplitude is reconstructed by requiring that the ansatz of Eq. (4.7) passes through the three collocation points given by the parameter space fits. We observe however, that in some regions of parameter space this leads to oscillatory behaviour of the reconstructed amplitude for the (2, 1) and (3, 2) modes, and a lower order polynomial, calibrated to a smaller number of collocation points, gives more robust results. This problem arises in regions of the parameter space where the model is poorly constrained due to the lack of NR simulations, such as in cases with very high positive spins (where in addition the correct functional form is rather simple and a higher order polynomial is not required), and where the amplitude of the waveform is very small (see e.g. Fig. 1). For this reason we apply a series of vetoes that remove collocation points and allow for a smooth reconstruction, as discussed next.

Vetoes and non-default reconstruction
For the (2, 1) mode, when q < 8 we remove the collocation points with an amplitude below a threshold of 0.2 (in geometric units), which is a typical value for the ringdown for comparable masses. Furthermore, we check whether the amplitude values at the three collocation points form a monotonic sequence, otherwise we remove the middle point to avoid oscillatory behaviour.
For the (3, 2) mode we drop the middle collocation point if it is not consistent with monotonic behaviour. In addition, for this mode we have isolated two particular regions of the parameter space where we drop collocation points from our reconstruction due to the poor quality of the reconstruction. The first one is given by q > 2.5, χ 1 < −0.9, χ 2 < −0.9, where we do not use any of the collocation points and just reconstruct with the PN ansatz. The second is given by q > 2.5, χ 1 < −0.6, χ 2 > 0 where we just remove the highest frequency collocation point. For the 33 mode we remove the last collocation point in the region q ∈ (1, 1.2), χ 1 < −0.1, In the future we will revisit this problem when recalibrating the amplitude against the recently released new SXS catalogue of NR simulations [56], which we expect to mitigate some of the issues we observe.

B. Phase
For the inspiral phase, we start from the consideration that, with good accuracy, the NR data satisfy the relation [36], As discussed above, our amplitude fits return a real quantity, but one must be mindful that the PN expansions of the A SPA m are, in general, complex (see, for instance, [28]). For each mode, we re-expand to linear order in the frequency the quantities and add them to the rescaled IMRPhenomXAS ansatz. We evaluated the functions Λ PN m for all the hybrids in our catalogue and found that In Fig. 4 we show the behaviour of this approximation for an example case of comparable mass ratio compared to the result obtained from the hybrids.
The accuracy of the above approximation tends to degrade for high-mass ratios, high-spins configurations and the PN relative phases are recovered only at sufficiently low frequencies, as illustrated in Fig. 5. Therefore, we compute parameterspace fits to capture the leading order behaviour of each ∆φ Ins m , and use them to build our final inspiral ansatz in the extrememass ratio regime.
Based on the above discussion, we express the inspiral phase of each multipole as where φ X 22 is the quadrupole phase reconstructed with IMR-PhenomXAS, whose coefficients need to be rescaled as detailed in appendix D, and The constant dφ Ins m in Eq. (4.11) is determined by continuity with the intermediate-region ansatz.
Once a smooth phase-derivative has been reconstructed, the remaining constant, φ Ins m , is fixed by requiring that, at low frequencies, one has which follows from Eq. (4.3) and from our choice of tetrad convention (see also (2.2).

V. INTERMEDIATE REGION
The intermediate region connects the inspiral regime to the ringdown. It is the only region where IMRPhenomXHM is fully calibrated, both in amplitude and phase. While for the amplitude this is the last region to be attached to the rest of the reconstruction, for the phase this is the central piece of the model, to which inspiral and ringdown phase derivatives will be smoothly attached. Physically, this implies that, in IMR-PhenomXHM, the relative time-shifts among different modes are entirely calibrated around merger. We have found that a good practical way of testing whether time shifts are consistent between modes is to compute the recoil of the merger remnant, which is also of astrophysical interest, and which we discuss in Sec. VII C.
The intermediate region covers the range of frequencies where the inspiral cutting frequencies f m Ins are those of Eq. (4.4) and the ringdown cutting frequencies are defined as: In the above equation, f m ring is the fundamental quasi-normal mode (QNM) frequency of the ( , m) mode and the default coefficients δ m RD , m RD are given in Tab. II. Notice that, in the (3, 2) reconstruction, high-mass, high-spin cases require an adjustment of the default cutting frequencies, as we explain in Subsecs. V A and V B below.
The rationale behind our choice of cutting frequencies is simple: in general, the QNM frequencies f m ring mark the onset of the ringdown region and it is therefore natural to terminate our intermediate region slightly before those. This does not apply to the (3, 2)-mode, where, due to mode-mixing, new features appear in the waveforms already around f ≈ f 22 ring < f 32 ring .
In the following subsections, we will describe in more detail our choice of ansätze and collocation points, based on the specific features of our numerical waveforms in the intermediate region.  A. Amplitude

Default reconstruction
In the intermediate frequency regime we model the amplitude with the inverse of a fifth-order polynomial as A Inter where A 0 = π 2η 3 . This function has six free parameters, which are determined by imposing the value of the amplitude at two collocation points, together with four boundary conditions (two on the amplitude itself, and two on its first derivative), so that the final IMR amplitude is a C 1 function. We use two collocation points at the equally spaced frequencies f m

Extreme-mass-ratio reconstruction
For the EMR regime we refine the model to adapt to the steep amplitude drop at the end of the inspiral part, which is

Collocation Points
Value Derivative associated with the sharp transition from inspiral to plunge for extreme mass ratios. As one would expect, this drop is deeper for very negative spins. We find that the ansatz of Eq.

Collocation Points
Value Derivative We then apply the default reconstruction procedure in the region f ∈ f m Int 0 , f m RD , imposing the conditions listed in Tab. III, with the replacement f m Ins → f m Int 0 . The two calibration regions are shown in Fig. 6, where have shaded the preintermediate region in red and the intermediate one in blue. We find it convenient to terminate the inspiral region just before the amplitude drop. Therefore, we replace the cutting frequency of Eq. (4.4) with that of a local maximum in the amplitude of ψ 4 , which always precedes the drop (see Fig. 6).
We carried out a fit over the EMR parameter space of the frequency at which this maximum occurs and used it to replace the default inspiral cutting frequency when q > 70.

Vetoes and non-default reconstruction
While for comparable masses the ansatz of Eq. (5.3) is well suited to model the intermediate amplitude, in other regions of the parameter space modelling errors can result in a zerocrossing of the fifth order polynomial. We resolve this problem using a strategy akin to that of Subsec. IV A 2 above, i.e. by switching to a lower order polynomial, the minimum order being one. The regions of the parameter space affected are typically those where the amplitude is very small, or the high-spin regime.
We have also isolated some regions where, due to the poor quality of the reconstruction, we drop both intermediate collocation points. This system of vetoes is summarized in Tab. VI.
We describe now in more detail the adjustments made to the default reconstruction procedure mode-by-mode. A summary of the rules applied can be found in Tab. V.
For the (2, 1) mode, we remove the intermediate colloca- tion points at which the amplitude is below a threshold of 0.2. This happens when the (2, 1) amplitude is very small and in consequence the current accuracy of the parameter space fits is not sufficient. The advantage is that for those cases the (2, 1) mode does not contribute significantly to the total waveform and we can afford to simplify the reconstruction. It can be seen from Fig. 1 that the ratio between the (2, 1) and (2, 2) amplitude is in some cases well below 1%. If the amplitude at the ringdown cutting frequency is below a threshold of 0.01, we remove the two intermediate collocation points. If the intermediate collocation points have passed these preliminary tests, we check whether they form a monotonic sequence, and if not we remove f m Int 2 . Finally, we apply the parameter-space vetoes indicated in Tab. VI.
For the (3, 2) mode, we require that the amplitude at the ringdown cutting frequency is above the same threshold applied to the (2, 1) mode. If this condition is not satisfied, we remove the two intermediate collocation points. If it is, we check whether the collocation points form a monotonic sequence. If not, we drop f m Int 2 . Finally, we apply our set of parameter-space vetoes.
The (3,3) and (4, 4) modes are typically less problematic. However, we find that in the high-spin, high-mass-ratio region (q > 7, χ 1 > 0.95) the inspiral is very long and there is a sharp transition to the ringdown, without a specific merger signature. For that reason we remove the two intermediate collocation points and connect inspiral and ringdown with a third-order polynomial. Once again, we apply the vetoes of Tab. VI.
After applying all the mode-specific vetoes, we check whether the denominator of our polynomial ansatz ever crosses zero in the frequency range of the intermediate region.
If so, we lower the order of the polynomial by iteratively re-laxing the boundary conditions until we obtain a well-defined ansatz.

B. Phase
In the intermediate region our most general ansatz for the phase-derivative of each mode reads: For the modes that do not show significant mode-mixing, namely (2, 1), (3, 3), (4, 4), we set a m 3 = 0 in the above equation. For these modes, we do not impose any boundary condition, which leaves us with a total of five free coefficients, which we determine by solving the linear system In the reconstruction of the (3, 2)-mode, we allow a 32 3 to be non-zero: this extra degree of freedom allows to have better control on the effects caused by mode-mixing (see Fig. 7). In this case only, we impose two boundary conditions coming from the ringdown-region, where the (3, 2)-phase is also fully calibrated 1 . We determine the six free coefficients of Eq. (5.4) by solving the system dφ Int and β(η) is a monotonically decreasing function of η that shifts forward the frequency of the first collocation points for high Check that the denominator of the resulting ansatz does not cross zero, if so remove derivatives at boundaries  mass-ratios, thus reducing the steepness of the parameterspace fit surfaces in this limit, chosen here as β(η) := (1. + 0.001(0.25/η − 1)).
We find it convenient to model one more collocation point than what is strictly needed, in order to add some flexibility to the calibration. Our standard choice of collocation points Collocation point frequencies can result in a badly-behaved reconstruction in regions of the parameter space where we have fewer calibration waveforms, such as the high spin and/or high mass-ratio regime. In such cases, we drop one of the collocation points close to inspiral, where the phase derivative has a steeper slope and is harder to model accurately, and replace it with a point in the flatter near-ringdown region, as we illustrate in Fig. 8.

VI. RINGDOWN MODEL
The ringdown region covers the frequency range where f m RD was defined in Eq. (5.2). In IMRPhenomXHM, the modes (2, 1), (3, 3), (4, 4) have a fully calibrated amplitude, while their phase is built by appropriately rescaling the quadrupole's phase, along the lines of IMRPhenomHM.
When mode-mixing visibly affects the ringdown waveform (i.e. in the (3, 2)-mode reconstruction), the model is instead fully calibrated to NR. In this case, the key observation is that the signal is much simpler when expressed in terms of spinweighted spheroidal harmonics, as we illustrate in Figs. 9 and 10. This can be traced back to the fact that the Teukolsky equation is fully separable only in a basis of spheroidal harmonics, and not in a spherical-harmonic one [62]. Under the simplifying assumption that the (3, 2)-mode interacts only with the (2, 2)-mode, the spherical-harmonic strains can be projected onto a spheroidal-harmonic basis by means of a simple linear transformation, which we describe in App. A. We reconstruct amplitude and phase of the signal in a spheroidal-harmonic basis and then rotate the full waveform back into the original basis. After doing so, the ringdown reconstruction can be smoothly connected to the inspiral-merger waveform.
In the following subsections, we provide further details about the ansätze used in this region.

A. Amplitude
The ansatz we adopt is similar to the one used in IMRPhe-nomXAS: except for the factor f −1/12 used here for historical reasons and for the replacement of the (2, 2) ringdown and damping frequencies with those of the corresponding ( , m) mode. This ansatz is used for all the modes calibrated in the model. Notice, however, that for the (3,2) mode the ansatz is fitted to the data expressed in a spheroidal-harmonic basis, see Fig. 10.
We first fit the free coefficients {a λ , λ, σ} to NR data in a "primary" direct fit and then perform a parameter space fit of each coefficient for every mode. We find that the coefficient σ shows a very small dynamic range for the modes (3,3), (3,2) and (4,4), and we thus take it as a constant. We then perform a "secondary" direct fit where we redo the direct fits using the constant values for σ shown in Tab. VIII. Finally we repeat the parameter space fits for a λ and λ.
The final reconstruction of the amplitude through inspiral, merger and ringdown for the modes without mixing can be seen in Fig. 11.  B. Phase
Our ansatz for the phase-derivatives in this case reads: which, integrated, gives: We first compute parameter-space fits of the quantities α 22 λ and α 22 2 and rescale them to obtain their higher-modes counterparts. We set where w lm are some constants, which only depend on , m and not on the intrinsic parameters of the binary. We verified that the above equalities hold, albeit approximately, for the parameters of the direct fits to each mode's phase derivative. The shifts dφ m RD and φ m RD are fixed by requiring a smooth connection to the intermediate-region reconstruction.

Modes with mode-mixing
As we outlined at the beginning of this section, the morphology of the (3, 2)-mode ringdown signal is significantly affected by mode-mixing. In this case, we first build a reconstruction of the phase derivative in a spheroidal-harmonics base, using the ansatz below (6.6) Integrating the above equation, one obtains where the subscript S is a reminder that we are now working in a spheroidal-harmonic basis. The four free coefficients of Eq. (6.6) are determined by solving the linear system where G i 32 are some parameter-space fits of the value of the phase derivative, evaluated at four collocation points f i RD , i ∈ [1, 4], given in Tab. IX.
One must ensure that φ 32,S has the correct relative time and phase shift with respect to the (2, 2) mode that is being used, or else the rotation back to the original spherical-harmonic basis will produce an incorrect result. Therefore, we compute two extra fits  at some suitable reference frequencies f 0 , f 1 in the ringdown region, and use them to correctly align our spheroidalharmonic reconstruction to the quadrupole's phase given by IMRPhenomXAS.
In Fig. 12 we plot the (3, 2) mode of a hybrid waveform built from the SXS simulation SXS:BBH:0271 and show the corresponding time-domain reconstructions resulting from IMRPhenomXHM and IMRPhenomHM. The inset shows that our model can better capture the effects of mode-mixing on the ringdown waveform.

A. Single Mode Matches
To quantify the agreement between two single-mode waveforms (reals in time domain) we use the standard definition of the inner product (see e.g. [63]), where S n ( f ) is the one-sided power spectral-density of the detector. The match is defined as this inner product divided by the norm of the two waveforms and maximised over relative time and phase shifts between both of them, Accordingly, we define the mismatch between two waveforms as For our match calculations we use the Advanced-LIGO design sensitivity Zero-Detuned-High-Power Power Spectral Density (PSD) [64,65] with a lower cutoff frequency for the integrations of 20 Hz. In Fig. 13 we first show single-mode mismatches against a validation set consisting of 387 of our hybrid waveforms built from the latest SXS collaboration catalog, where we have discarded 152 hybrids, which show up as outliers in our calibration procedure for at least one of the modes, which we suspect to be due to quality problems with the waveforms. The list of SXS waveforms we have used is provided as supplementary material. We also show mismatches for masses between 3M and 150M (individual masses not smaller than 1M ) among IM-RPhenomXHM, the previous IMRPhenomHM model and the independent NRHybSur3dq8 surrogate model. In Fig. 14 we show the mismatches for the calibration region of NRHyb-Sur3dq8, for mass ratios below 9.09 and dimensionless spin magnitudes up to 0.8, and up to 0.5 in the neutron star sector of total masses up to 3M . We carry out three set of comparisons, in red we have the mismatches between IMRPhe-nomXHM and IMRPhenomHM, in blue IMRPhenomHM versus NRHybSur3dq8 and finally in green IMRPhenomXHM versus NRHybSur3dq8. The results show that IMRPhe-nomXHM is in a much better agreement with the surrogate model that the previous version IMRPhenomHM, the improvement is particularly remarkable for the (3, 2) mode due to the modelling of the mode-mixing. In Fig. 15 we show matches for cases outside of the spin region defined before to assess the effects of extrapolation beyond the calibration region.

B. Multi-Mode Matches
When having a multi-mode waveform, not only it is important to model accurately each individual mode but also the relative phases and time shifts between them. To test this we compute the mismatch for the h + and h × polarizations between our hybrids and the model for three inclination values: 0, π/3 and π/2 (rad). The polarizations for the hybrids and the model are built with the same inclination, however the azimuthal angle entering the spherical harmonics can be different. We denote by φ S and φ T the azimuthal angle of the hybrids (source) and model (template) respectively. φ S takes the values of an equally spaced grid of five points between 0 and 2π. Then for each value of φ S we numerically optimize the value of φ T that gives the best mismatch. For each configuration of inclination and φ S the mismatch is computed for an array of 8 total masses from 20M to 300M logarithmically spaced and then we take the minimum, mean and maximum values over all these configurations. Similarly to the single-mode matches we used the Advanced-LIGO design sensitivity Zero-Detuned-High-Power noise curve and a lower cutoff of 20 Hz. The results are shown in Fig. 16. It can be observed that the mismatches degrade for higher inclinations due to the weaker contribution of the dominant 22 mode which is the best modelled mode, although events that are seen close to edge-on are much less likely to be detected due to the reduced signal-tonoise-ratio. For edge-on systems we only show results for the h + polarization, since the h × vanishes. Alternative ways of quantifying multi-mode mismatches have been used in the literature, see e.g. [66] or [67], with different advantages and and drawbacks. The quantities we show here are chosen for simplicity.

C. Recoil
Asymmetric black-hole binaries will radiate gravitational waves anisotropically. This will result in a net emission of linear momentum, at a rate (in geometric units) where n k is the radial unit-vector pointing away from the source, leading the final remnant to recoil in the opposite direction. The precise value of the final recoil velocity will depend on the interactions among different GW multipoles. This quantity is extremely sensitive to the relative time and phase shifts among different modes and thus provides an excellent and physically meaningful test-bed for our model. We computed the final recoil velocity predicted by IM-RPhenomXHM for different binary configurations and compared the results with those obtained directly from our hybrid waveforms. As new numerical simulations became available, several works presented increasingly improved NR-based fits for the final recoil velocity (see for instance [68][69][70][71] and the latter reference for further works and comparisons). Below we compare our results to the fit of Ref. [69], for two test configurations: a black-hole binary where both bodies are nonspinning (Fig. 17), and one where both are spinning with Kerr parameters χ 1 = χ 2 = 0.5 (Fig. 18). For comparison, we also show the recoil velocities obtained with IMRPhenomHM.

D. Time-domain behaviour
We have checked that the model has a reasonable behaviour in regions of the parameter-space where no simulations are available (e.g. 18 ≤ q ≤ 200) and for extreme spins. We show here example-waveforms to test both these regimes. Figs. 19 and 20 show single-mode waveforms for binaries with parameters (q, χ 1 , χ 2 ) = (4, 1., 1.) and (q, χ 1 , χ 2 ) = (100, 0.7, 0.7), respectively. We can see that in both cases the model returns well-behaved waveforms.  [69] (black, note that here we also shade in gray the fit's error margins, using the error estimates provided by the authors in Tab. IV of the aforementioned reference). In blue we show the recoil velocity for all the corresponding configurations in our calibration dataset. Note that, despite the loss of accuracy due to having fewer waveforms than in the non-spinning case, our model returns a value of |v f | much closer to NR than the uncalibrated version.

E. Parameter estimation: GW170729
In our companion paper to present IMRPhenomXAS for the (2, 2) mode we re-analyzed the data for the first gravitational wave event, GW150914, as an example for an application to parameter estimation. Here we present a re-analysis There are no NR simulations in our calibration dataset with q > 18, and the extrapolation to high mass-ratios is done by placing Teukolsky waveforms at the large-q boundary of the parameter space, as explained in Subsec. III B. Here we can see that the model achieves a smooth transition between NR and point-particle physics.
of GW170729, where the effect of models with subdominant higher harmonics has been discussed in the literature [16], and we will demonstrate broad agreement between IM-RPhenomXHM, IMRPhenomHM and SEOBNRv4HM for this event. Again we use coherent Bayesian inference methods to determine the posterior distribution p( θ| d) to derive expected values and error estimates for the parameters of the binary. Following [16], we use the public data for this event from the Gravitational Wave Open Science Center (GWOSC) [72][73][74] calibrated by a cubic spline and the PSDs used in [8]. We ana-lyze four seconds of the strain data set with a lower cutoff frequency of 20Hz. For our analysis we use the LALInference [9] implementation of the nested sampling algorithm. We perform the runs using 2048 'live points' for five different seeds, then merge into a single posterior result. We choose the same priors used in [16], taking into account that IMRPhenomXHM is a non-precessing model and we have to use aligned spin priors.
In Fig. 21 we compare our results with the higher mode models (IMRPhenomHM and SEOBNRv4HM) and the (2,2) mode model results (IMRPhenomD) published in [16]. We find that the posteriors derived from IMRPhenomXHM are consistent with the two other models that include higher harmonics, which can be distinguished from the results obtained for models that only include the (2,2) mode.

F. Computational cost
We will now compare the computational cost of the evaluation of different models available in LALSimulation compared to the IMRPhenomX family. Since the different models include a different number of modes we also show the evaluation time per mode. Using the GenerateSimulation executable within LALSimulation we compute the average evaluation time over 100 repetitions for a non-spinning case (q, χ 1 , χ 2 ) = (1.5, 0, 0) for a frequency range of 10 to 2048 Hz. We vary the total mass of the system from 3 to 300 solar masses and the frequency spacing d f is automatically chosen by the function SimInspiralFD to take into account the length of the waveform in the time domain for the given parameters. All the timing calculations were carried out in the LIGO cluster CIT to allow comparison with the benchmarks we have shown in [30] to compare different accuracy thresholds of multi-banding which is a technique that accelerate the evaluation of the model by evaluating it in a coarser non-uniform frequency grid and using interpolation to get the waveform in the final fine uniform grid, reducing considerably the computational cost.
In Fig. 22 the dashed lines represent models for only the 22 mode (IMRPhenomD, SEOBNRv4 ROM and IMRPhe-nomXAS). We see that the three models show very similar performance for low masses, while for higher masses the IMR-Phenom models are faster. Models that include higher modes are shown with solid lines: NRHybSur3dq8 (11 modes), IM-RPhenomHM (6 modes) and IMRPhenomXHM (5 modes), the latter is shown with and without the acceleration technique of multibanding [30]. Since NRHybSur3dq8 is a time domain model, for many applications the actual evaluation time would also include the time for the Fourier transformation to the frequency domain, which can also lead to requirements for a lower start frequency and windowing to avoid artefacts from the Fourier transforms, which again would increase evaluation time. We also see that the new model IMRPhenomXHM without multibanding is already significantly faster than the previous IMRPhenomHM model. Comparing the new model to the surrogate, IMRPhenomXHM without multibanding is significantly faster when considering all the modes, but the evaluation cost per mode is only lower for high masses. However, using the multibanding technique [30] with the threshold value of 10 −3 , which is the default setting when calling the model in LALSuite (IMRPhenomXHM MB3 in the plot), IMRPhenomXHM is significantly faster also for the evaluation time per mode. The threshold value can be adjusted to control the speed and accuracy of the algorithm as explained in Appendix C and in [30], where we have shown that for an example injection of a relatively high signal-to-noise ratio 28, even at a threshold of 10 −1 , which evaluates significantly faster than the conservative default setting, differences in posteriors are hardly visible.

VIII. CONCLUSIONS
Phenomenological waveform models in the frequency domain have become a standard tool for gravitational wave parameter estimation [8] due to their computational efficiency, accuracy [75], and simplicity. The current generation of such models has been built on the IMRPhenomD model for the (2,2) mode of the gravitational wave signal of non-precessing and non-eccentric coalescing black holes, which has been extended to precession by the IMRPhenomP [4,76] and IMRPhenomPv3 [33] models, to sub-dominant harmonics by the IMRPhenomHM model, and to tidal deformations by the IMRPhenomPv2 NRTidal model [77,78].
The present paper is the second in a series to provide a thorough update of the family of phenomenological frequency domain models: In a parallel paper [1] we have presented IMRPhenomXAS, which extends IMRPhenomD to a genuine double spin model, includes a calibration to extreme mass ratios, and improves the general accuracy of the model. In the present work we extend IMRPhenomXAS to subdominant modes. Contrary to IMRPhenomHM, the IMRPhenomXHM model we present here is calibrated to numerical hybrid waveforms, and we have tested in Sec. VII that the new model is indeed significantly more accurate than IMRPhenomHM. Calibration of the model to comparable mass numerical data has proceeded in two steps: we have started with a data set based on numerical relativity simulations we have performed with the BAM and Einstein Toolkit codes, and the data set corresponding to the 2013 edition of the SXS waveform catalog [50] (including updates up to 2018). The quality and number of waveforms available at the time has determined the number of modes we model in this paper, i.e. the (2, 1), (3,2), (3,3) and (4, 4) spherical harmonics. During the implementation of the model in LALSuite [2], the 2019 edition of the SXS waveform catalog [56] became available, and we have subsequently upgraded the calibration of IMRPhe-nomXAS and of the subdominant mode phases to the 2019 SXS catalog. We have not updated the amplitude calibrations, which are more involved but contribute less to the accuracy of the model. Instead, we plan to update the amplitude model in future work, where we will include further harmonics, in particular the (4, 3) and (5, 5) modes, which we can now calibrate to numerical date thanks to the increased number of waveforms and improved waveform quality of the latest SXS cata-FIG. 21: Comparison between IMRPhenomXHM, IMRPhenomHM, SEOBNRv4HM and IMRPhenomD [16] for the posterior distributions of mass ratio, total mass, effective aligned spin and luminosity distance respectively. The dashed vertical lines mark the 90% confidence limits. log.
The computational performance and a method to accelerate the waveform evaluation by means of evaluation on appropriately chosen unequally spaced grids and interpolation is presented in a companion paper [30].
While IMRPhenomXHM resolves various shortcomings of IMRPhenomHM, further improvements are called for by the continuous improvement of gravitational wave detectors: We will need to address the complex phenomenology of the (2,1) harmonic for close to equal masses, add further modes as indicated above, and include non-oscillatory m = 0 modes.
IMRPhenomXHM can be extended to precession following [4,33,76]. Regarding mode mixing in the context of precession one will however take into account that mixing then occurs between precessing modes [79], which will require modifications to our handling of mode mixing, which take precession into account. Spin-weighted spheroidal harmonics can be written as a linear combination of spherical ones: In the sum above, each spherical harmonic is weighted by a mixing coefficient α l l m measuring its overlap with the corre-sponding spherical harmonic: Note that the mixing coefficients are functions of the final spin only. Although in theory the (3,2) couples to all the modes with m = 2, in practice we find that the strongest source of mode-mixing comes from the mixing with the (2, 2). Therefore, we choose to neglect the coupling to modes with l > 3. Under this assumption the coefficients of the strain in the two bases of harmonics are related via the following simple linear transformation: The mixing coefficients have been computed in [80] for black holes spinning up to χ f = 0.9999. To improve accuracy for extreme spins, we perform a quadratic-in-spin fit of all the data-points with χ f ∈ [0.999, 0.9999] and use it to obtain the values of the mixing coefficients extrapolated at |χ f | = 1. independent version numbers, and are implemented in different files of the source code. Note that the XLAL standard implies that all the source code files are included via the C preprocessor into the main driver file, LALSimIMRPhenomXHM.c.
The model can be called both in the native Fourier domain, and in the time domain, where an inverse fast Fourier transformation is applied by the LALSuite code. The SWIG [82] software development tool is used to automatically create Python interfaces to all XLAL functions of our code, which can be used alternatively to the C interfaces.
Special attention is due for the time and phase alignment of our LALSuite implementation. As mentioned in Sec. II B, our hybrid waveforms are aligned in time such that the Newman-Penrose scalar for the = |m| = 2 modes peaks 500M before the end of the waveform. In the LALSuite implementation, we first apply a global time-shift of 500M to our reconstructed waveforms, and then a parametric fit that accounts for the time-difference between the peak-time of ψ 4 and that of strain. An inverse Fourier transformation of the Fourier domain waveform, as produced by LALSuite, will then return a strain peaking around the end of the waveform. When calling the model in the time domain through LALSimulation's ChooseTDWaveform interface, the time coordinate is chosen such that t = 0 for the peak of the sum of the square of the polarizations: These polarizations include all the modes used to generate the model and also depend on the line of sight from the detector to the source through the inclination and azimuthal angle. This choice is consistent with the choice made for the IMR-PhenomHM model. In LALSimulation the model is called through the function ChooseFDWaveform, whose input parameters f ref and Note that when talking about positive frequencies we have to refer to the negative mode, although we usually skip the minus sign for economy of the language.
Since our model is built in the Fourier domain we can not compute the quantity t f ref without a Fourier transformation to the time domain plus a numerical root finding, and we currently set it to t f ref = 0. Furthermore, the expression C3 would not be valid if f ref is situated in the merger-ringdown part of the waveform, because the SPA approximation is only reliable for the inspiral. This means that when comparing a time-domain model with our model with the exact same parameters we can only expect them to agree up to rotations, and would thus have to optimize over phiRef to achieve agreement.
In LALSimulation the azimuthal angle that enters in the spin-weighted spherical harmonics is defined as β = π 2 − phiRef, this means that changing the parameter phiRef is equivalent to rotate the system: For example, by increasing the phiRef by a quantity δφ, we would rotate the system an angle −δφ. When the 22 mode only is considered, phiRef acts just as a global phase factor for the waveform (e i 2phiRef ) and the match is not affected since it maximizes over phase and time shifts. However, when higher modes are included this is not satisfied anymore since the term e i mphiRef is different for every mode and can not be factored out. Note that in our LALSuite code, this rotation is applied to every individual mode, such that individual modes and the mode sum are consistent with respect to rotations.
The user is free to specify the spherical harmonic modes that should be used to construct the waveform. The default behaviour is to use all the modes available: {22, 2-2, 21, 2-1, 33, 3-3, 32, 3-2, 44, 4-4}, below we describe how the modes can be chosen through the different interfaces available for LALSuite waveforms. Furthermore, the model implemented in LALSuite supports acceleration of waveform evaluation by interpolation of an unequispaced frequency grid broadly following the "multibanding" of [31]. Our version of the algorithm is described in [30] to do the evaluation faster and can also use a custom list of modes specified by the user. The multibanding algorithm is parameterized by a threshold, which describes the permitted local interpolation error for the phase in radians, lower values thus correspond to higher accuracy. The default value is set to a value of 10 −3 .
Extensive debugging information can be enabled at compile time with the C preprocessor flag -D PHENOMXHMDEBUG.

a. Python
Interface. To call the model with the default behaviour we use the function SimInspiralChooseFDWaveform from lalsimulation with the argument lalparams being an empty LALSuite dictionary lalparams=lal.CreateDict(). The threshold of the multibanding and the mode array can be changed by adding their values to the LALSuite dictionary in the following way: If threshold=0 then multibanding is switched off. By calling ChooseFDWaveform with this LALSuite dictionary we would get the hp and hc polarizations from the contribution of the 22, 2-2, 21, and 2-1 modes without using multibanding.
c. LALInference and Bilby. We also included the options in the two standard codes to perform Bayesian inference in gravitational wave data analysis: LALInference [83] and Bilby [10]. LALInference uses the same syntax than GenerateSimulation when called through the command line. You can also add these options to the config file and the example we have employed so far can be called as: [engine] ... approx = IMRPhenomXHMpseudoFourPN modesList = '2,2, 2,-2, 2,1, 2,-1' phenomXHMMband = 0 ...
Note that in the current version of LALInference the string pseudoFourPN has to be added to the name of the approximant. For Bilby these options are specified in the waveform argument dictionary defined in the configuration file. The equivalent example would be called as: The released version of Bilby does not support the multibanding option yet, however a private branch that support this option can be downloaded with git clone -b imrphenomx https://git.ligo.org/cecilio. garcia-quiros/bilby.git.

(E7)
We can now compute the complex non-polynomial Fourier Domain PN amplitudes A m , that we re-expand up to 3PN order, as we have done for the = |m| = 2 modes in IMRPhe-nomXAS. We list the resulting complex Fourier Domain PN amplitudes following this method. We write them as a func- and factor out a common term to simplify the comparison with [60] (note that there ν = η, V m = v and mΨ SPA + π/4 = Ψ lm ), finally obtaining The Post-Newtonian expressions that we use to calibrate the inspiral part of the amplitude are given by the expanded expressions below, as used in eq. (E3), except for the (2, 1) mode. We observed that for some cases with q < 40 the reexpansion of the (2, 1) mode breaks down before reaching the cutting frequency of the inspiral. For those cases the (2, 1) amplitude is very small (see Section II B) and the spinning contribution of higher PN terms is more important due to the different competing effects which can lead to cancellations in the waveform which are not captured by our 3PN accurate qusi-circular expressions. For the (2, 1) mode we therefore do not re-expand in a power series but keep the form of expression (E3) when q < 40. The expression (E3) is not used for q > 40 because it shows a divergence that appears before the inspiral cutting frequency for high spins.