Fully self-consistent $GW$ and quasi-particle self-consistent $GW$ for molecules

Two self-consistent schemes involving Hedin's $GW$ approximation are studied for a set of sixteen different atoms and small molecules. We compare results from the fully self-consistent $GW$ approximation (SC$GW$) and the quasi-particle self-consistent $GW$ approximation (QS$GW$) within the same numerical framework. Core and valence electrons are treated on an equal footing in all the steps of the calculation. We use basis sets of localized functions to handle the space dependence of quantities and spectral functions to deal with their frequency dependence. We compare SC$GW$ and QS$GW$ on a qualitative level by comparing the computed densities of states (DOS). To judge their relative merit on a quantitative level, we compare their vertical ionization potentials (IPs) with those obtained from coupled-cluster calculations CCSD(T). Our results are futher compared with"one-shot"$G_0W_0$ calculations starting from Hartree-Fock solutions ($G_0W_0$-HF). Both self-consistent $GW$ approaches behave quite similarly. Averaging over all the studied molecules, both methods show only a small improvement (somewhat larger for SC$GW$) of the calculated IPs with respect to $G_0W_0$-HF results. Interestingly, SC$GW$ and QS$GW$ calculations tend to deviate in opposite directions with respect to CCSD(T) results. SC$GW$ systematically underestimates the IPs, while QS$GW$ tends to overestimate them. $G_0W_0$-HF produces results which are surprisingly close to QS$GW$ calculations both for the DOS and for the numerical values of the IPs.

Two self-consistent schemes involving Hedin's GW approximation are studied for a set of sixteen different atoms and small molecules. We compare results from the fully self-consistent GW approximation (SCGW ) and the quasi-particle self-consistent GW approximation (QSGW ) within the same numerical framework. Core and valence electrons are treated on an equal footing in all the steps of the calculation. We use basis sets of localized functions to handle the space dependence of quantities and spectral functions to deal with their frequency dependence. We compare SCGW and QSGW on a qualitative level by comparing the computed densities of states (DOS). To judge their relative merit on a quantitative level, we compare their vertical ionization potentials (IPs) with those obtained from coupled-cluster calculations CCSD(T). Our results are futher compared with "one-shot" G0W0 calculations starting from Hartree-Fock solutions (G0W0-HF). Both self-consistent GW approaches behave quite similarly. Averaging over all the studied molecules, both methods show only a small improvement (somewhat larger for SCGW ) of the calculated IPs with respect to G0W0-HF results. Interestingly, SCGW and QSGW calculations tend to deviate in opposite directions with respect to CCSD(T) results. SCGW systematically underestimates the IPs, while QSGW tends to overestimate them. G0W0-HF produces results which are surprisingly close to QSGW calculations both for the DOS and for the numerical values of the IPs.

I. INTRODUCTION
Self-consistent methods are commonly used to solve the non-linear equations appearing in electronic structure theory. For instance, in the Hartree-Fock (HF) method, 1,2 one iteratively determines the best single-determinant wave function, starting from a reasonable initial guess, until the energy is minimized. In the Kohn-Sham framework of density-functional theory (DFT) one uses self-consistency to find, for a given exchange-correlation functional, a set of single-particle orbitals that are used to determine the electron density 2-4 . Self-consistency is, in principle, also an essential ingredient to solve Hedin's coupled equations to compute the interacting single-particle Green's function 5,6 . Unfortunately, the full system of Hedin's equations contains unknown functional derivatives that prevent an exact solution. However, Hedin also proposed a simpler approximation, the so-called GW approximation, which is numerically tractable and has proven to be a useful tool to study the electronic properties of real materials [5][6][7][8][9][10][11][12][13][14] .
In the GW approximation, the self energy Σ is obtained from the product of the electron Green's functions (G) and the screened interaction (W ) as Σ = iGW . However, in spite of their apparent simplicity, GW calculations can be numerically quite involved and demanding for real materials. For this reason, a popular approach has been the socalled "one-shot" GW , 7,8,15 where one computes the electron self energy directly from the Green's function G obtained from DFT or HF results and the corresponding screened interaction W . As an alternative, one can iterate the process and feed back the electron self energy into the computation of G and try to achieve self consistency in the relation Σ = iGW . This seems a good idea for several reasons. For example, it eliminates the undesired dependence of the results on the arbitrary starting point that is inherent in the one-shot GW scheme and is often quite large [16][17][18][19] . Even more importantly, it has been shown that self-consistent GW (SCGW ) is a conserving approximation, respecting the conservation of the number of particles, momentum and energy, among others. 20 Unfortunately, it was demonstrated for the homogeneous electron gas 21 that SCGW tends to worsen the agreement of the band structure with respect to experimental results for nearly-free-electron metals, as compared to the simpler one-shot GW scheme. This has been a widely accepted conclusion for years. However, recent work on small molecules and atoms 19,[22][23][24][25][26] has reported some improvements, although moderate, with the use of SCGW .
There is an alternative self-consistent GW procedure, the so-called "quasi-particle self-consistent approximation" (QSGW ), that has been shown to be more accurate than the one-shot GW approximation for several solids and molecules. 12,27 Surprisingly, in spite of the conflicting claims of accuracy for the self-consistent SCGW and QSGW , there are few direct comparisons of their respective performances. Indeed, to the best of our knowledge, a comparison in which these two approaches are treated using the same numerical approach and where their comparative merits can be compared unambiguously, is still lacking. The purpose of this article is to provide such a consistent comparison between SCGW and QSGW using the same numerical implementation.
Our results do not indicate that any of the two self-consistent GW approaches is clearly superior to the other, at least for the description of the small molecules considered here. Indeed, averaging over the set of studied molecules, they give results quite close and only slightly better than those of one-shot G 0 W 0 calculations using HF as a starting point, and SCGW gives results only marginally closer to our reference CCSD(T) calculations than QSGW . During the self-consistent iteration QSGW only requires the evaluation of the self energy at the quasiparticle energies obtained in the previous step. This is computationally much less demanding than SCGW , which needs the self energy at all frequencies. For this reason, QSGW could be a more suitable method for calculations in large systems.
The rest of the article is organized as follows. We briefly describe Hedin's GW approximation in Section II. In Section III, the two self-consistent GW approaches are presented. In Section IV and V, we elaborate our numerical methods and their particular usage for the present all-electron SCGW and QSGW calculations. Section VI contains our results and discussion. We present our main conclusions in Section VII.

II. HEDIN'S GW APPROXIMATION
Green's functions have been a method of choice in solid state physics where electron correlations play an important role. In particular the interacting single-particle Green's function G(r, r , ω) depends only on two spatial variables and frequency, but it directly accounts for the electron density, electron removal and addition energies, and it also allows the computation of the total energy. 28,29 The interacting single-particle Green's function can be found by solving Dyson's equation 28 G(r, r , ω) = G 0 (r, r , ω) + G 0 (r, r , ω)∆Σ(r , r , ω)G(r , r , ω).
Please, notice that here we adopt the convention that an integral over spatial variables is implied in any equation unless these variables appear on its left-hand side. In Eq. (1), G 0 (r, r , ω) is the single-particle Green's function of a reference, artificial, system of non-interacting electrons G 0 (r, r , ω) = [ωδ(r − r ) − H eff (r, r )] −1 , described by an effective one-electron Hamiltonian H eff =T +V ext +V H +V xc ≡Ĥ 0 +V H +V xc .
Here,Ĥ 0 includes the one-electron terms, i.e., the kinetic energy operatorT and the external potentialV ext (electrostatic field of the nuclei). The Hartree term (electrostatic field of the electron density) isV H , and the exchange and correlation operator is denoted byV xc . Finally, ∆Σ(r, r , ω) = Σ(r, r , ω) −V xc (r, r ), where Σ(r, r , ω) is the self energy that describes the effects of electron correlations. In order to avoid double counting, it is necessary to subtract the approximate description of those effects already included in the effective one-electron Hamiltonian (V xc ). Standard choices for the reference non-interacting system are given by the Kohn-Sham and HF methods. The interacting Green's function is then obtained by solving Dyson's equation A closed set of exact equations for the Green's functions, the self energy (and a vertex) was written down by Hedin. 5 However, these equations have been solved so far only for model systems 30,31 . Fortunately, Hedin 5 also proposed an expansion of the self energy in powers of the screened interaction W (r, r , ω). To the lowest order he obtained a simple expression for the self energy, the so-called GW approximation, where the self energy is given by the product of the Green's function and the screened Coulomb interaction 5 with η being a positive infinitesimal. The screened Coulomb interaction W (r, r , ω) takes into account that an electron repels other electrons and thereby effectively creates a cloud of positive charge around it that weakens or screens the bare Coulomb potential. The screened interaction can be found as a solution of an integral equation where, to the lowest order in the electron-electron interaction, the polarization operator can be evaluated as 5 Equations (1), (6), (7) and (8) constitute a closed set of equations that can be iteratively solved in order to find an approximation to the interacting one-electron Green's function G(r, r , ω). This is usually known as the self-consistent GW approximation (SCGW ). The corresponding cycle is schematically depicted in Fig. 1. It is important to stress that, as already noted above, SCGW is just an approximation to the exact set of Hedin's equations. The exact set of equations involves the vertex function Γ(r, r , ω; r , ω ), which requires computing the functional derivative of the exact self energy. The GW approximation replaces the vertex function by δ(r − r )δ(r − r ), which is the zeroth order expression for the expansion of the vertex function in terms of the screened interaction W . Thus, the GW approximation transforms Hedin's equations into a numerically tractable set of equations.
In spite of their apparent simplicity, GW calculations are still numerically demanding. This is one of the reasons why most studies of real materials to date do not use the SCGW approach, i.e. do not iterate GW equations until self-consistency, but rather use the so-called G 0 W 0 approximation. In this "one-shot" calculation, the non-interacting Green's function G 0 (r, r , ω) is used instead of the interacting one in Eqs. (6), (7) and (8). The screened Coulomb interaction obtained in this way is referred to as W 0 in the following. A clear drawback of the G 0 W 0 calculation is the dependence of the results on the approximation used to compute the non-interacting Green's function G 0 . [16][17][18][19]32,33 This dependence gives rise to sizable differences, for example, starting from HF or DFT effective Hamiltonians. The SCGW scheme can correct this undesired feature of G 0 W 0 . Furthermore, it can be shown 20 that the self-consistent version of GW is a conserving approximation, i.e., respects electron number, momentum and energy conservation. Exact equations involve the vertex function Γ for which, unfortunately, there is not an explicit formula available. Instead, the SCGW method approximates Γ by its zeroth order term in an expansion as function of W , Γ(r, r , ω; r , ω ) ≈ δ(r−r )δ(r−r ), giving rise to equations (6), (7) and (8) in the text. These equations, together with Dyson's equation (1) define a self-consistent procedure to compute the interacting Green's function G.

III. SELF-CONSISTENT APPROACHES INVOLVING HEDIN'S GW
The formally simplest self-consistent GW approximation is illustrated in Fig. 1. In this procedure, the self energy at a given iteration is computed with the Green's function from the previous iteration using the equations (6), (7) and (8) presented above. This new self energy is then used to calculate a new Green's function, and the process is iterated until a stable solution is found.
In the first iteration, to start the self-consistent loop, we need an initial approximation to the Green's function. This is typically obtained from the non-interacting Green's function G 0 (r, r , ω) according to equation (2) using some suitable one-electron effective theory. The non-interacting electron density response χ 0 (r, r , ω) and the screened interaction W 0 (r, r , ω) are then obtained using equations (8) and (7). With the screened interaction, we can already calculate the self energy Σ(r, r , ω) according to equation (6). So far the calculation is equivalent to a "one-shot" G 0 W 0 calculation. However, inserting the calculated self energy into equation (5) we can obtain our first approximation to the interacting Green's function G(r, r , ω).
We can now start the GW calculation again, using the obtained interacting Green's function G(r, r , ω) (instead of the non-interacting one G 0 (r, r , ω)), to compute χ(r, r , ω) and repeat the cycle until reaching self-consistency. In such cycle, the Green's function in step n, G (n) , is computed from the self energy Σ (n−1) obtained using the information from the previous step The electron density n(r) has to be recalculated at the end of each iteration according to the relation and, therefore, the Hartree potential V H (r) must be also updated after each iteration. E F in Eq. 10 is the Fermi energy of the system, which is determined by the number of electrons. The most detailed studies on the performance of the SCGW scheme have been carried out for the homogeneous electron gas. 9,21,34,35 For this system it has been shown that SCGW does not improve or even worsens the description of the band structure, overestimating the bandwidth. 9 Furthermore, the weight of the plasmon satellite is reduced with respect to G 0 W 0 and it almost disappears in some cases. Part of these deficiencies seem to be related to the use of the interacting Green's function in the definition of the polarizability function χ (Eq. 8). Due to the renormalization of the quasiparticle weight and the transfer of spectral weight to higher energies (plasmon satellite), χ looses its clear physical meaning as a response function and it no longer satisfies the f -sum rule. 9 As a consequence, the description of the screened interaction W is also affected and the plasmon resonance becomes very broad and ill-defined. For systems other than the homogeneous electron gas, the situation is not so clear. Recent studies for atoms and small molecules seem to reach conflicting conclusions about whether SCGW improves the ionization energies given by G 0 W 0 with suitable starting points, and whether these improvements are sufficiently systematic to justify the use of the computationally more demanding SCGW . 19,[22][23][24][25][26]32,33 In general, the improvements, when present, seem to be small. In spite of these deficiencies, the total energies obtained from SCGW Green's functions, using either the Galitskii-Migdal formula 29 or the Luttiger-Ward functional 36 , are quite accurate. 9,[21][22][23]25,26 The good behavior of the total energy is probably related to the energy conserving character of the SCGW approximation. 9,20 Furthermore, the conserving character of SCGW is an interesting property that becomes useful in transport calculations. 37 An alternative to this straightforward, self-consistent GW approach is given by the so-called "quasi-particle selfconsistent GW " (QSGW ) approximation recently proposed by Kotani,Schilfgaarde and Faleev. 12,38 The rationale behind this approach is based on the perturbative character of the GW approximation, where the electron self energy is treated as a small perturbation. Therefore, GW should become a more accurate approximation if applied in conjunction with a suitable effective one-electron HamiltonianĤ eff that already provides a fair description of the one-electron-like excitations of the many-electron system or quasiparticles (QP). The quasiparticles can be obtained as solutions of the equation where Re extracts the Hermitian part of the self-energy operator. In QSGW ,Ĥ eff is optimized such that its eigenfunctions (Ψ i ) and eigenvalues (E i ) are good approximations to the QP wavefunctions (ψ i ) and energies ( i ) obtained using Eq. 11 and a G 0 W 0 self energy. This is done by defining a suitable mapping Σ G0W0 (ω) →Ĥ eff . Of course, as already described above, in order to compute the self energy Σ G0W0 it is necessary to use a one-electron Hamiltonian as a starting point. Thus, in each iteration n we obtain a new self energy Σ (n) G0W0 , and a new effective Hamiltonian from itĤ (n) eff , that is then used to start the next iteration. The procedure finishes when Ψ i (r) and E i do not change anymore and, therefore, we have reached a self-consistent result for the "optimum"Ĥ eff (of course, the quality of these results is determined by the quality of the Σ G0W0 (ω) →Ĥ eff mapping). Self-consistency in QSGW is therefore not sought within the GW calculation, but rather generating an optimal (in the sense that minimizes the ∆Σ G0W0 ( i )= Σ G0W0 ( i ) − V xc evaluated at the quasiparticle energies i 38 ) non-interacting Green's function G 0 to perform a G 0 W 0 calculation. The principle of this QSGW approach is illustrated in Fig. 2.
Principle of the quasi-particle self-consistent GW approximation (QSGW ). The calculated self energy at the G0W0 level in one iteration is used to define a new one-electron effective Hamiltonian. This newĤ eff provides the starting point for the next G0W0-like iteration. The procedure is repeated until we get a stableĤ eff . The method is based on a heuristic mapping ΣG 0 W 0 (ω) →Ĥ eff as defined in Eq. 12.
So far we have not specified the procedure to perform the mapping Σ G0W0 →Ĥ eff . This mapping is not unique and Kotani et al. have actually proposed several ways to perform it. Here we have adopted the procedures called "mode A" and "mode B" in Ref. 38, which we recast in a single expression: where the operatorV sfe is given byV The frequency ω ij is different for "mode A" and "mode B". For "mode A" ω ij = E j , while for "mode B" ω ij = E j , if i = j, and ω ij = E F otherwise. For the closed-shell molecules considered here we take E F in the middle of the gap between the highest occupied (HOMO) and lowest unoccupied (LUMO) molecular orbitals.
Here Re[Σ ij (ω)] denotes the Hermitian part of the matrix elements of the self energy between the QP wavefunctions Ψ i (r), and they are evaluated at the QP energies E i . These QP wavefunctions Ψ i (r) and energies E i correspond to the solutions of the QSGW effective Hamiltonian at a given iteration and must be updated during the self-consistent loop. Equation (12) is derived from the fact that {Ψ i } forms a complete set and the requirement of having an HermitianV xc operator. 38 In Ref 38 it was also shown that Eq. (12) can be obtained from minimizing the norm of However, the ultimate justification of the use of expression (12) comes from the fact that it has been shown to provide accurate results for the band structure of a large variety of semiconductors and transition metal oxides. 12,38 It is worth noting that in the present calculations we do not observe any evidence of a starting-point dependence of the QSGW results, as recently suggested by calculations in oxides. 39,40 In the case of the small molecules studied here, HF and local density approximation DFT starting points converged always to the same IPs and the same density of states.

IV. IMPLEMENTATION OF SCGW AND QSGW SCHEMES
In the present work we compare the results of G 0 W 0 , SCGW and QSGW calculations performed using the same numerical framework. Our numerical procedure is based on the use of a basis set of atomic orbitals and a basis set of dominant products to express the products among those orbitals, as well as the use of spectral functions to treat the frequency dependence of the functions involved in GW calculations. 41 In this Section, we focus on the main technical differences and describe the additional procedures necessary to perform the present all-electron self-consistent GW calculations.
First, in our previous work 41 we presented G 0 W 0 results for several aromatic molecules starting from DFT pseudopotential 2 calculations. In contrast, here we perform all-electron calculations. This eliminates the important uncertainties associated with the use of pseudopotentials, as discussed by several authors. [24][25][26][42][43][44] The basis of dominant products had to be improved to adapt the basis for core-valence orbital products. The construction of the basis and the necessary improvements are described in subsection IV A.
Second, in previous works we have used numerical orbitals with a finite spatial support. 45 However, here we use Gaussian basis sets to be able to carry out consistent comparisons with coupled-cluster calculations performed using the NWChem package. 46 Third, for small molecules, HF solutions seem to be a better starting point for GW calculations than local or semilocal DFT functionals. 33 For this reason, most of our calculations were initiated from a HF solution of the system. The final results in the self-consistent schemes are independent of the starting point as we will show explicitly. For our HF calculations we have used a modified version of a code originally due to James Talman. 47 In the present work, the Hartree and exchange operators are computed using the dominant products basis.
Fourth, some modifications are necessary in our non-local compression scheme 41 of the dominant product basis to perform SCGW calculations as explained in some detail in the subsection IV D.
Fifth, both self-consistent methods, SCGW and QSGW , need some mixing procedure to achive convergence. The mixing procedures are explained in the subsection IV F.
Finally, we use spectral functions to deal with the frequency dependence of Green's function, response function, screened interaction and self energy. Although the method had not changed substantially since our publication 41 , we briefly describe our method in subsection IV B for the sake of the readability of the manuscript.

A. Expansions using orbital and dominant-products basis sets
We use linear combination of atomic orbitals (LCAO) approach 48 and expand the eigenfunctions Ψ E (r) of the one-electron Hamiltonian in terms of atom-centered localized functions f a (r) The atomic orbitals f a (r) have a predefined angular momentum and radial shape, while the coefficients X E a must be determined by solving the corresponding eigenvalue equation. In this work we have used a basis set of atomic orbitals expanded in terms of Gaussian functions. 49,50 These basis sets are the same used by most of the Quantum Chemistry codes. We have used NWChem code 46 to perform the ∆SCF coupled-cluster calculations that will be compared with our GW results. In particular, for most calculations we have used two different sets of basis for all our calculations: a correlation-consistent double-ζ (cc-pVDZ) and a triple-ζ (cc-pVTZ) basis. This choice represents a trade off between the computational cost of our all-electron GW calculations, their accuracy and our intent to perform calculations for a relatively large set of molecules. Having results with two different basis sets allows estimating the dependence of the observed behaviors on the size of the basis set. Furthermore, the smaller cc-pVDZ basis also allowed us to perform calculations with a higher frequency resolution, which is instrumental to study the convergence with respect to this computational parameter. As commented in more detail in Section VI B, several recent studies of the convergence of GW calculations with respect to the size of the basis set indicate that, for several small molecules and atoms, the cc-pVTZ basis provides results for the IPs within few tenths of eV of the converged values. 18,27,33 This is further confirmed by a systematic convergence study as a function of the basis set size that we have performed for two small systems, He and H 2 . For these two species we could explore the convergence of the results using basis sets up to cc-pV5Z. As described in detail in Subsection V E and Section VI B, these highly converged results seem to confirm that the main conclusions of our comparison among different self-consistent GW schemes remain valid in the limit of saturated basis sets.
In the case of the initial HF calculations, we must self-consistently solve the equation where Hartree and exchange operators depend on the eigenfunctions Ψ E (r), with (we assume here a closed-shell system and the factor of two stands for the two orientations of the spin), and Introducing (14) in equations (15) and (17), we obtain the Hartree-Fock equations in a basis of atomic orbitals with H ab ≡ T ab + V ab ext + V ab H + Σ ab x and S ab , respectively, the matrix elements of the Fock operator and the overlap. The exchange operator Σ ab x is given by The appearance of products of atomic orbitals f a (r)f a (r) in this expression gives rise, in principle, to the need of computing cumbersome four-center integrals. In practice, this can be avoided using an auxiliary basis set that spans the space of orbital products and largely simplifies the calculations. 51,52 . Furthermore, the set of products of atomic orbitals usually comprise strong collinearities. Therefore, if properly defined, the number of elements in this auxiliary basis can be much smaller than the total number of orbital products, making the calculations more efficient. In Ref. 53, one of us presented a well-defined method to obtain such an auxiliary basis for an arbitrary set of atomic orbitals. In this work we use this set of dominant products in all the operations involving products of atomic orbitals.
The dominant products F µ (r) are independently defined for each atom pair and provide an optimal, orthogonal (with respect to the Coulomb metric) basis to expand the products of orbitals within that pair of atoms, i.e., Therefore, the dominant products preserve the local character of the original atomic orbitals and V ab µ is a sparse table by construction.
The dominant products F µ (r) are expanded in terms of spherical harmonics about a center. In the case of valencevalence and core-core bilocal products (i.e., involving two atoms at different locations and valence or core orbitals in both atoms), the midpoint along the vector that joins both nuclei is chosen as the expansion center. However, for pairs of orbitals involving core orbitals in one atom and valence orbitals in the other atom, we use an expansion center that is much closer to the nucleus of the first atom. The center of expansion for such core-valence products is determined using information about the spatial extension of the core and valence shells. As a measure of the spatial extension of a given shell, we take an average of the square-root of the expectation values of r 2 among all the radial orbitals belonging to that shell, , where 2l s + 1 is the multiplicity of a given orbital with angular momentum l s . The coordinate of this core-valence bilocal dominant product is then calculated as a weighted sum of the positions of the two shells (atoms) involved, C core and C val , . This adjustment of the expansion center significantly increased the accuracy of the expansion (Eq. 20). For instance, the precision of the computed overlaps and dipoles improved by an order of magnitude.
The product expansion in Eq. (20) allows reducing substantially the dimension of the space of orbital products. For example, using a cc-pVDZ basis we have 38 orbitals to describe acetylene (C 2 H 2 ), leading to 703 products. However, they can be expressed in terms of 491 dominant products with high precision (throwing away eigenfunctions of the local Coulomb metric with eigenvalues lower than 10 −6 ). 53 In general, we typically found a reduction in the number of products by at least 30% with this local compression scheme in these accurate calculations. Still, as we will see in subsection IV D it is generally possible to reduce further the dimension of the product basis using a non-local compression scheme. We can now rewrite the exchange operator (19) as where D ab = E<EF X E a X E b is a density matrix, and v µν are matrix elements Therefore, the exchange operator (21) is efficiently calculated in terms of two-center integrals (22). The matrix elements of Hartree potential V H (r) are also calculated in this basis of dominant products V ab As shown in Ref. 41, the GW equations (5), (6), (7) and (8) can also be conveniently rewritten within the basis sets of atomic orbitals {f a (r)} and dominant products {F µ (r)}. We state these equations without derivation for the sake of completeness The treatment of convolutions in the latter equations is done with spectral function technique as explained below.

B. Spectral functions technique
As customary, the screened interaction W (r, r , ω) in our calculation is separated into the bare Coulomb interaction v(r, r ) and a frequency-dependent component W c (r, r , ω) = W (r, r , ω) − v(r, r ). The bare Coulomb interaction v(r, r ) gives rise to the HF exchange operator. 13 It can be computed with the space of dominant products without much computational effort according to Eq. (21). The GW correlation operator Σ c = iGW c is more demanding due to the frequency dependence combined with the rather large dimension of the space of products.
Because of the discontinuities of the electronic Green's functions, a straightforward convolution to obtain either response function (26) or the self-energy operator (24) is practically impossible both in the time domain and in the frequency domain. However, one can use an imaginary time technique 54 or spectral function representations 41,55,56 to recover a computationally feasible approach. In this work, we continue to use the spectral function technique and rewrite the time-ordered operators as follows where "positive" and "negative" spectral functions define the whole spectral function by means of Heaviside functions θ(t). For instance, the spectral function of the electronic Green's function reads ρ ab (s) = θ(s)ρ + ab (s) + θ(−s)ρ − ab (s). Transforming the first of equations (27) to the frequency domain, we obtain the familiar expression for the spectral representation of a Green's function Here ε is a small line-broadening constant. In practice, the choice of ε is related to the spectral resolution ∆ω of the numerical treatment and will be discussed below in section V. One can derive expression for spectral function of response a µν (s) using equations (26) and (27) Here, the convolution can be computed with fast Fourier methods and the (time-ordered) response function χ µν (ω) can be obtained with a Kramers-Kronig transformation The calculation of the screened interaction W µν c (ω) must be done with the response function, rather than with its spectral representation, because of the inversion in equation (25). The spectral function of the screened interaction γ µν (ω) can be easily recovered from the screened interaction itself 13 . Deriving the spectral function σ(ω) of the self energy, we arrive at These expressions show that the spectral function of a convolution is given by a convolution of the corresponding spectral functions. As in the response functions, we compute these convolutions employing fast Fourier transforms.

C. Frequency-dependent functions on the equidistant grid
The spectral functions of the non-interacting Green's function (2) are merely a set of poles at the eigenenergies E The use of fast Fourier techniques for convolution, for instance in equation (29), requires that the spectral functions ρ + bc (ω), ρ − da (ω) be known at equidistant grid points ω j = j∆ω, j = −N ω . . . N ω , rather than at a set of energies resulting from a diagonalization procedure. The solution to this problem (discretization of spike-like functions) is known and well tested. 41,55,56 We define a grid of points that covers the whole range of eigenenergies E. Going through the poles E, we assign their spectral weight X E a X E b to the neighboring grid points n and n + 1 such that ω n ≤ E < ω n+1 according to the distance between the pole and the grid points p n, ab = ω n+1 − E ∆ω X E a X E b , p n+1, ab = 1 − p n, ab . Such a discretization keeps both the spectral weight and the center of mass of a pole. Convergence of discretization parameters is discussed below, in section V.
As a result of our calculation, we obtain the density of states (DOS) directly from the imaginary part of the converged Green's function where G ab (ω) is obtained by solving Dyson's equation (23). In our approach, the ionization potential IP is found directly from the density of states DOS(ω) on a uniform frequency grid. We find the IP by fitting the density of states locally by a third order polynomial and by finding the maximum of this fit. The convergence of both SCGW and QSGW loops is determined by the DOS(ω) where N orbs is total number of orbitals in the molecule -the DOS i (ω) is normalized to this number and i is the iteration number. We have chosen a small threshold on this convergence parameter Conv < 10 −5 in order to stop GW the iteration of both self-consistency schemes. In general we observe that this criterium translates to an even larger accuracy in the convergence of IP (better than 10 −5 relative error).

D. Non-local compression of the dominant-products basis
The calculation of screened interaction W c (r, r , ω) should have been performed in the space of orbital products, thus requiring the inversion of matrices of large dimensions. The basis of dominant products partially alleviates this problem by eliminating the collinearities between products of orbitals corresponding to the same pair of atoms. However, there are still strong linear dependencies between products of orbitals corresponding to neighboring pairs of atoms. Thus, the number of elements in the auxiliary basis set for the orbital product expansion can be further reduced with important savings in the required memory and run time. In order to address this problem, we perform an additional non-local compression: the new product basis is formed by linear combinations of the dominant products of all the pairs of atoms in the molecule. As described in detail in Ref 41, these linear combinations are obtained by first constructing the Coulomb metric projected into a relevant function manifold, and second keeping only the eigenfunctions of this projected metric with eigenvalues larger than a threshold value λ thrs . Thus, the elements of this new basis are orthogonal to each other with respect to Coulomb metric. The relevant manifold is determined by low-energy electron-hole pair excitations according to: where X E a are the eigenvectors of the effective Hamiltonian (18), and V ab µ is the product "vertex" (20). In the construction of the metric only low-energy excitations are included according to the criterium: Using Eq. (35) to select the relevant electron-hole pair excitations to describe the dynamics provides good results for one-shot G 0 W 0 calculations if E thrs is sufficiently large. However, for SCGW one has to reconsider this point more carefully. During the iteration process, the restriction that the relevant subspace to represent the polarization function χ µν (ω) necessarily corresponds to pairs of occupied-unoccupied eigenstates of the initial one-electron Hamiltonian H eff is relaxed. With each iteration we are loosing the information about the initialĤ eff and its sharp division of the Hilbert space into one occupied and one unoccupied manifolds. Therefore, we have used a more general subset of vectors V EF µ = X E a V ab µ X F b in which more general low-energy pairs EF were included according to So we consider products of occupied/occupied, unoccupied/unoccupied and occupied/unoccupied pairs of eigenfunctions ofĤ eff , provided that their energies are sufficiently close.
In our calculations E thrs and λ thrs are treated as convergence parameters, which are refined until convergence is reached in the self energy for the range of frequencies under exploration. Here we consider small molecules with a relatively small basis set. Therefore it was actually possible to include all possible pairs of eigenvectors in the compression procedure, while λ thrs was taken 10 −3 for all molecules. With this choice, we could get a significant reduction in the size of the product basis. For example, for the acetylene molecule with a cc-pVDZ basis, from the 703 initial products of orbitals, we made a first local compression to 491 dominant products and, with the non-local compression, this was reduced to 128 basis elements.

E.
Σ(ω) →Vxc mapping in a basis of atomic orbitals The map of the self energy to an exchange-correlation operator (12) is made separately for the frequency-independent (exchange) self energy Σ x = iGv = Σ HF x , and for the frequency-dependent correlation self energy Σ c (ω) = iGW c . Obviously, the exchange operatorV x is identical to the exchange part of the self energy V ab x = Σ ab x (i.e. to the HF exchange operator 21).
The correlation operatorV c is found by using equation (12) and inserting the LCAO expansion (14) into equation Because we use real-valued basis functions f a (r), the Hermitian part of operator reduces to the real part. In our approach, we obtain the self energy Σ ab c (ω) on an equidistant frequency grid, which allows the calculation of convolutions by means of fast Fourier transforms. The eigenvalues E of the QP equation do not necessarily fit with any equidistant grid, but we have found that a linear interpolation procedure provides a reliably converging approximation to the self energy in an arbitrary energy Σ ab c (E).

F. Mixing schemes for SCGW and QSGW
Mixing of successive iterations is often necessary to achieve convergence in iterative approaches to nonlinear equations. Mixing is needed to solve the Hartree-Fock equations and the same is true for the self-consistent equations of SCGW and QSGW .
In the SCGW scheme ( Fig. 1) we have to mix frequency-dependent operators, which unfortunately leads to large memory requirements. Therefore, we resorted to the simplest linear mixing scheme. Initially, we tried to mix the Green's functions calculated in sucessive steps as suggested in Ref. 37. However, we found examples where the convergence was unstable and the results unreliable. By contrast, a linear mixing of the self energy always worked in the case of SCGW and it was possible to use a mixing weight as large as α = 0.35.
In the case of QSGW calculations (Fig. 2) the self energy mixing sometimes failed to achieve convergence. A convenient solution was to mix the correlation operator (37) rather than the self energy. This mixing of correlation operator has been also used in the MOLGW code by Bruneval. 18 For the molecules considered here, the linear mixing of the correlation operator has been used with α = 0.25.

G. Independence of SCGW and QSGW on their starting points
In both methods, SCGW and QSGW , the Hartree potential V H , as well as the exchange Σ x and correlation Σ c (ω) components of the self energy are recomputed in every iteration. Only the matrix elements of the kinetic energyT and the nuclear attraction V ext are kept fixed. In such self-consistent loop, we expect that any reasonable starting Green's function will converge to the same interacting Green's function, but this expectation has to be confirmed by actual calculations 25 . Such a test also provides a measure of the achievable accuracy in the numerical procedure. We present such test in Fig. 3 for the methane molecule, where the convergence of the IP is accomplished using HF and the local density approximation (LDA) to DFT as starting points. For these calculations we have used a frequency resolution ∆ω = 0.05 eV and a broadening constant ε = 0.1 eV for both SCGW and QSGW . This choice of frequency resolution and broadening constant will be justified in section V. The frequency grid covers a range of [−1228.8 eV, 1228.8 eV] for both starting points: HF and LDA, which is sufficient to obtain converged SCGW calculations. The non-local compression was done with all possible pairs of molecular orbitals (i.e. E thrs is chosen higher than maximal difference of eigenvalues) and threshold for eigenvalues λ thrs is set to λ thrs = 10 −5 . A Hartree-Fock calculation and a density functional theory calculation using the local density approximation were used as starting points. The first iteration corresponds to a G0W0 calculation.
We can see that the convergence behavior of SCGW is monotonic and, in this case, almost symmetric with respect to the LDA/HF starting points. After 25 iterations, both starting points converge to the same IP within 3 meV for the SCGW calculation, which is well within the used frequency resolution of 50 meV.
QSGW converges rather fast at the beginning of the self-consistent loop, but the convergence behavior is not monotonic in general. However, the "mode B" converges somewhat more reliably because a monotonic convergence sets in earlier than for the "mode A", as shown in Fig. 3 Moreover, QSGW "mode B" can achieve a better and faster convergence of the DOS (Eq. 34) than with "mode A". For instance, in the present case, we reached Conv ∼ 2 · 10 −3 for "mode A" after 150 iterations both with HF and LDA starting points, while for "mode B" we found Conv ∼ 10 −6 after 31 iterations for HF and 40 iterations for LDA starting points. In both cases we used mixing parameter α = 0.25. These indications of better convergence properties of "mode B" comparing to "mode A" will be further discussed below, in subsection V E, in relation to the convergence with respect to the basis set size.
The negligible starting point dependence of the IP seems to indicate that we are indeed reaching the same selfconsistent solution either starting from HF or LDA, both for SCGW and QSGW self-consistent schemes. This is further confirmed by the direct comparison of the iterated DOSs. For all the cases examined we have found that LDA and HF starting points always arrive to indistinguishable DOSs.

V. CONVERGENCE STUDIES
Here we discuss the dependence of our results on different technical parameters. The set of convergence parameters is rather large. Namely, we should explore the convergence with respect to the extension of the frequency grid [ω min , ω max ], the frequency resolution of the grid ∆ω, the broadening constant ε and the parameters defining the nonlocal compression (E thrs , λ thrs ), for the three self-consistent schemes SCGW , QSGW "mode A" and QSGW "mode B". We have chosen to study these parameters for two systems: helium and methane with cc-pVDZ basis set. A full range-covering convergence study is practically impossible with such a large set of convergence parameters. However, it is possible to show the convergence with respect to each parameter separately, keeping the other parameters fixed. Additionally we explore the convergence with respect to the basis set size for two small systems, He and H 2 , using basis sets up to cc-pV5Z basis. As we will see, this study will unveil the poor convergence properties of QSGW "mode A" with respect to the size of the basis.
Notice that in our previous publication, 41 we proposed the use of two grids with different resolution: a finer grid covering the low energies of interest, and a coarser grid with larger extension. However, here we do not use this so-called second window technique. We prefer to converge the results with respect to a single frequency grid and, thus, eliminate this additional source of uncertainties.

A. Frequency grid extension
Here we consider the convergence with respect to frequency grid extension. Analyzing the changes in the DOS as a function of the self-consistency iteration, we have clearly seen the appearance of satellite structures besides the main peaks. The satellites at the G 0 W 0 level can reach approximately twice ∆E, where ∆E = |E 1 − E N | and E 1 and E N are, respectively, the lowest and highest eigenvalues of the starting point Hamiltonian. The subsequent iterations in the SCGW loop lead to the appearance of even larger frequencies in the self energy and, consequently, in the DOS. However, the higher-order satellites are weak and do not significantly contribute to the numerical value of the ionization potential. We discuss the satellite structure of SCGW in more detail in the Supplementary Material. 57 We take into account the above mentioned facts and parametrize the range of the frequency grid as a function of ∆E, defining a new parameter f ω , [−f ω ∆E, f ω ∆E]. The other parameters were chosen as following: ε = 0.2 eV, ∆ω = 0.1 eV, E thrs = ∆E, λ thrs = 10 −3 ; this choice will be justified later in this section. data shows that results converge for large enough grid extensions. Incidentally, the convergence is much faster for CH 4 than for He. According to these data, f ω = 2 seems to set the smallest frequency grid extension after which the results become reliable. In the rest of the calculations presented here, we will use f ω = 2.5 to ensure a good convergence of the obtained IP (now within a few meV).

B. Frequency grid resolution
We turn now to the role of the frequency resolution. In this study, we fixed the extension of the grid to [−2.5∆E, 2.5∆E] as discussed above, varied the frequency resolution ∆ω, and compared the calculated IPs. The broadening constant is ε = 2∆ω. The parameters of non-local compression are chosen as in the previous subsection. The results for helium and methane are presented in Fig. 4.
Both QSGW "modes" give results largely independent on the frequency resolution ∆ω. This is a welcome feature because a relatively coarse frequency grid can be used with QSGW . It is interesting to note that a similar behavior is generally found for one-shot G 0 W 0 calculations. In contrast, the SCGW procedure exhibits a stronger dependence on the frequency resolution. We observe an almost linear dependence of the calculated IP on ∆ω. This (less welcome) feature has its roots in the computation of the density matrix from the Green's function (Eq. 10). The spectral function treatment using a coarse grid results in rather broad resonances of Lorentzian shape, and their width deteriorates the quality of the density matrix. This convergence behavior can be seen already in a self-consistent loop without any correlation self energy Σ c (ω), i. e. performing the Hartree-Fock calculation with Green's functions. Regarding this point it is interesting to note that, although the deviations of the electron number are usually rather small in the present GW calculations, typically not larger than 1%, we renormalize the density matrix to right number of electrons after each iteration to avoid uncontrolled variations of the Hartree potential. Notice that this consequence of the spectral function representation does not affect the QSGW calculations, because the density matrix in QSGW is obtained directly from the eigenvectors of the QSGW effective HamiltonianĤ eff .
The approximate linear dependence of the SCGW IP (Fig. 4) for small values of ∆ω is seen in all the examples we have considered. For most atoms and molecules the calculated IP increases as ∆ω decreases, with the sole exception of LiF that shows the opposite behavior. Therefore, we will estimate the results in the limit of infinite resolution (∆ω → 0) from two calculations with different frequency resolutions. The SCGW results presented in subsection VI B have been obtained using this linear extrapolation to infinite resolution.

C. Broadening constant
The choice of broadening constant ε in our calculations with equidistant frequency grid is rather intuitive. If the broadening constant is smaller than frequency resolution ∆ω, then a resonance may "squeeze" unnoticed between two neighboring frequency points and become missed. Therefore, the broadening constant ε must be necessarily larger than the frequency spacing ∆ω.
In this work, we will parametrize the broadening constant as ε = f ε ∆ω, where f ε > 1 is a new parameter. We are interested to keep the number of frequencies in the grid as small as possible to minimize the computational cost connected to the size of the frequency grid. Here the frequency grid extension is set using f ω = 2.5. The frequency resolution is chosen to be ∆ω = 0.1 eV for QSGW , while for SCGW the data presented correspond to a linear extrapolation of the IPs from the data computed for ∆ω = 0.1 and ∆ω = 0.05 eV as described in subsection V B. The parameters of non-local compression are chosen as in subsection V A. In Table II we show the IPs computed with different broadening constants f ε ∆ω. One can see that the IPs change steadily with decreasing of parameter f ε from 3.0 to 2.0 in all calculations, while between f ε = 2.0 to f ε = 1.0 there is no clear trend. Moreover, the SCGW calculation for methane failed to converge to our target Conv accuracy with f ε = 1.0. Therefore, we regard f ε = 2.0 as an optimal parametrization for broadening constant ε.

D. Non-local compression
The choice of non-local compression parameters was studied in Ref. 41 for pseudo-potential based, LDA-G 0 W 0 calculations. In the present work, we found the behavior of non-local compression to be similar to that found in our previous study. However, here we prefer not to limit the number of molecular orbitals by the energy criterium E thrs (see section IV D). This decision does not significantly contributes to the runtime of any of our examples, while it removes one technical parameter to converge our calculations with respect to. Table III shows the dependence of the IPs on the threshold eigenvalue λ thrs of the Coulomb metric. The other calculation parameters has been chosen as in the previous subsection. From the table one can see that a large threshold for the eigenvalues of the Coulomb metric λ thrs = 0.1 leads to sizable changes of the computed IPs. However, the non-local compression becomes reliable with thresholds λ thrs < 10 −3 . The values of the IP with λ thrs = 10 −3 and λ thrs = 10 −4 vary less than 6 meV. Because a stronger reduction of the number of products positively impacts the computational performance, we have chosen λ thrs = 10 −3 for the main calculations in section VI.

E. Size of the cc-pVζZ basis sets and failure of QSGW "mode A" to converge
The correlation consistent basis sets cc-pVζZ are supposed to provide increasingly better results in terms of the convergence to the complete basis set (CBS) limit as the cardinal number ζ of the basis set is increased. We intent to study this convergence for SCGW and QSGW schemes. The computational cost of using high-ζ basis grows very steeply. Therefore, we are limited in this test to small systems and, as already mentioned, for larger molecules we restrict to cc-pVDZ and cc-pVTZ basis. The covergence test as a function of the size of the basis is important to determine whether a meaningful comparison between SCGW and QSGW can be done using those smaller basis set. The results presented here seem to indicate that this is the case because, although the convergence of the IPs is quite slow with the size of the basis set, both GW schemes show a rather similar convergence behavior.
We focus in the helium atom and the hydrogen dimer. The frequency grid extension is fixed by f ω = 2.5. The frequency resolution is ∆ω = 0.1 eV for both QSGW "modes". For SCGW , we report linearly extrapolated IPs from data calculated using ∆ω = 0.1 and ∆ω = 0.05 eV, following our discussion in subsection V B. The broadening constant is set to ε = 2∆ω, and the non-local compression is performed with λ thrs = 10 −3 . These choices are justified by the tests presented in the subsections V A, V B, V C and V D. The data for the IPs as a function of the basis size are collected in the Table IV. We present results obtained with our code for "mode A" and"mode B" of QSGW (henceforth QSGW A and QSGW B), and SCGW . Table IV also presents the data computed with the MOLGW code developed by F. Bruneval 58 as well as our reference ionization energies from the CCSD calculations with the NWChem code 46 . Notice that for systems containing two electrons CCSD and CCSD(T) are identical, due to the absence of triple excitations, and become equivalent to full-CI. 59 MOLGW implements (among other methods) the QSGW A algorithm. 18 It is important to stress here that the MOLGW code employs other algorithms than used in this work and its implementation is independent on our implementation. Therefore, the close agreement (maximal deviation of 0.03 eV) between the QSGW IPs computed with our code and MOLGW is an important cross-check. In agreement with previous studies, 18,27 the data in Table IV illustrate the very slow convergence of the GW results with the basis set size. A more noticeable and unexpected finding is the non-monotonous convergence of the QSGW A method for the two systems considered here. This is in clear contrast with the behavior observed for both SCGW and QSGW B and, to the best of our knowledge, it had not been reported previously. Notice that the same irregular behavior is produced by our code and by MOLGW. According to our analysis, this poor convergence can be traced back to the combination of two issues, one inherent to the QSGW A scheme, and the other related to the use of atomic orbitals as a basis set. The difficulties arise from the fact that in QSGW A the non-diagonal elements (in the basis set of QP wavefunctions) of the correlation operator (Eq. 12) contain contributions from the self energy evaluated at two different QP energies. Therefore, e.g., the calculation of the HOMO is influenced by the self energy calculated at all other energies, including energies corresponding to the highest molecular states. In spite of the lack of justification for having this mixing of information evaluated at different energies (other than defining an Hermitian operator in Eq. 12), this should not necessarily cause difficulties for the convergence if those self-energy cross-terms would be small or would have a smooth dependence on frequency. Unfortunately this is not always the case. In particular, using a basis set of atomic orbitals (even a quite complete one), the self energy is very spiky even at high energies. This reflects the fact that the continuum of states, that one should find above the vacuum level, is replaced by a discrete collection of states. Therefore, when one of the eigenvalues of the QSGW QP equation lies in a region where the self energy is large, this might have a large influence on the results at low energies through the self-energy cross-terms. In this situation, self-consistency might be difficult to achieve (due to changes in the sign of the self-energy contribution during the self-consistent process), and even if self-consistency is reached the results do not show a steady trend with the basis set size (since increasing the basis set modifies strongly the structure of the self energy at high energies).
The bad convergence properties of QSGW A in combination with basis set of atomic orbitals is a serious draw back for the applicability of this scheme in our case. Fortunately, this property is not shared by QSGW B, that shows a slow but steady convergence with the basis set size for both He and H 2 . The reason is that, in "mode B", all the non-diagonal components of the correlation operator are computed at the Fermi energy, and the difficulties described above disappear. Therefore, in the rest of the paper we will concentrate in the QSGW B method.
The steady convergence of the QSGW B and SCGW methods with respect to the basis set allows extrapolating to the CBS limit. This extrapolation is performed using an inverse cubic function on the cardinal number ζ of the cc-pVζZ basis, IP=IP CBS + Aζ −3 , with ζ = 4 and 5. This formula is frequently used to extrapolate the correlation energy 60,61 and we have found that perfectly fits the dependence of our IPs calculated with ζ ≥ 3. It is interesting to note that our CBS-limit IPs using SCGW 24.56 and 16.25 eV, respectively for He and H 2 , are in excellent agreement with the values, 24.56 and 16.22, given by Stan et al. using large bases of Slater orbitals. 22,23 Interestingly, if we use our CCSD results as a reference in the CBS limit, in the case of He we find that the SCGW IP is much closer to the reference value than the QSGW B one, while for H 2 we have the opposite behavior and QSGW B performs somewhat better than SCGW . The slow convergence of the self-consistent GW schemes with the basis set is certainly an undesirable feature. The IPs calculated with a cc-pVTZ basis are still 0.1-0.2 eV from the CBS limit. However, a very interesting feature is that the covergence behavior is very similar for both methods, and the differences between the calculated IPs converges much faster with the basis set size. In particular, we observed that the IPs obtained with the QSGW scheme are always higher than those obtained with SCGW . For example, the IPs calculated with QSGW and SCGW for He (H 2 ) using a TZ basis differ by 0.15 (0.29) eV, while the CBS-limit difference is 0.19 (0.33) eV. So, at least for these two systems, the qualitative differences between QSGW and SCGW IPs obtained with a cc-pVTZ basis seem to be maintained all the way to the CBS limit.
Table IV also shows that CCSD results converge somewhat faster with the basis set than the GW ones. The IPs of He and H 2 calculated with a cc-pVTZ basis are within 0.07 eV of our CBS limits. This different rate of convergence makes difficult the comparison of the performance of the self-consistent GW schemes against CCSD results using nonsaturated basis sets. Still for basis sets larger than DZ we see than the CCSD IPs always lie somewhere in between the SCGW lower bound and the QSGW upper bound. One should keep in mind the different rate of convergence between the GW schemes and the CCSD when examining the results in Table V. In particular, since the IPs tend to increase with the quality of the basis set, using basis sets which are not fully converged QSGW could tend to outperform SCGW . However, as we will see below, we find the opposite trend and SCGW is, on the average, marginally better than QSGW B at the cc-pVTZ level. This is probably a robust result which holds for larger basis sets.

VI. RESULTS
The methods presented above allow realizing both SCGW and QSGW calculations within the same numerical framework. In subsection VI A we present the densities of states (DOS) obtained with different GW schemes. The quantitative merit of the GW methods is studied in subsection VI B, using the calculated IPs as a measure of such performance.

A. Densities of states for CH4 and N2
Information about the effect of different self-consistent procedures can be obtained from the DOS they provide. Figure 5 compares the DOS of the methane molecule and the nitrogen dimer using different schemes. Panels (a) and (b) demonstrate that SCGW and QSGW B behave quite similarly although the positions of the peaks are slightly shifted. Panels (c), (d), (e) and (f) illustrate the dependence of one-shot G 0 W 0 on the starting point and its comparison with QSGW And SCGW B results. The Hartree-Fock starting point (G 0 W 0 -HF) produces a DOS very close to that of the self-consistent QSGW solution (panels (e) and (f)). In contrast, calculations using the Perdew-Zunger 62 local density exchange-correlation functional as a starting point (G 0 W 0 -LDA) produce DOSs that depart more from those of both (SCGW and QSGW ) self-consistent approaches. In particular, several satellite peaks can be seen in the frequency range below −20 eV for both, CH 4 and N 2 . Self-consistency tends to eliminate these features (see panels (c) and (d)). However, weak satellite peaks also appear in both SCGW and QSGW approaches. For example, for methane we can find satellite peaks around ±35 eV, although they are barely visible in Fig. 5. To clearly visualize these structures it is necessary to plot the DOS in logarithmic scale. This kind of analysis is presented in the Supplementary information. 57 In agreement with previous observations, 63 we find that the Hartree-Fock starting point in combination with the one-shot G 0 W 0 approach tends to provide excellent estimations of one-electron excitation energies in small molecules, see the example of methane in Fig. 5 (e) and Table V. For this reason we use HF as a starting point in our calculations of ionization potentials in the next subsection.

B. Ionization potential of atoms and small molecules
In order to assess the quality of the self-consistent GW method for atoms and small molecules at a quantitative level, we compare the performance of SCGW and QSGW "mode B" with that of quantum chemistry methods, in particular with coupled-cluster (CC) calculations. Here we focus in the first vertical IP. Although we further compare our results against experimental data, a reliable study would require considering effects due to structural relaxations in the final state and corrections related to the finite nuclear masses for light elements, among others. These effects are not taken into account in the present GW calculations. Moreover, a comparison with other well-established theoretical methods using the same basis set also eliminates, at least partially, the ambiguities related to the use of a finite, necessarily incomplete, basis set of atomic orbitals (see the comments Sec. V E). This is an important point since, due to the use of all-electron calculations in the self-consistent GW calculations (therefore requiring the evaluation of the self energy in a very extended frequency grid), even with the small molecules considered here, we are limited to relatively modest basis sets that might not provide fully converged results.
We have chosen the coupled-cluster method with single, double and perturbative triple excitations (CCSD(T)) as a reference theory to compare our GW results with. This choice is motivated by the usefulness of CCSD(T) in many other applications requiring to estimate the contribution of electron correlations in quantum chemical calculations. 64 We performed our CC calculations using the open-source NWChem package, 46 and two different Gaussian basis sets 49,50 that we also adopted in our GW calculations for consistency. We have used both, correlation-consistent double-ζ polarized (cc-pVDZ), and triple-ζ polarized (cc-pVTZ) basis sets for all of our calculations. Comparison of these two sets of results provides a rough estimation of the effect of the basis set incompleteness. A systematic study of the convergence with respect to the basis set size was presented in Sec. V E for two small systems, He and H 2 . The basic conclusions obtained from these two systems are: i) The convergence of the GW results is rather slow; ii) Fortunately the convergence of SCGW B and QSGW is very similar and differences between IPs calculated with these two schemes are converged within 0.05 eV already for cc-pVTZ basis sets; iii) The convergence of CCSD(T) is somewhat faster than that of GW , which should be taken into account when analyzing the data presented here.
The molecular geometries were optimized at the level of CCSD(T) using the cc-pVTZ basis set 57 . These geometries were later used in all the other calculations, including the self-consistent GW . In addition to the CCSD(T) calculations, we have also performed calculations without perturbative triples (CCSD) with the cc-pVTZ basis as a way to estimate the convergence of the description of correlations as provided by CCSD(T). Due to the use of relatively small basis sets in our calculations, we limit our study to the IPs. An accurate calculation of electron affinities would require more complete augmented basis sets.
At the level of CC calculations, the vertical IPs were obtained from ∆SCF-CC calculations, i.e., the IP is taken as the difference between the total energy calculated for the neutral molecule and a singly-charged positive ion keeping the ground-state CCSD(T)/cc-pVTZ geometry. For the positive ions, unrestricted Hartree-Fock was used to produce the starting point for the CC calculations. 65 Our calculations compare well with the literature. We checked our CCSD(T)/cc-pVTZ calculations against the data from NIST database CCCBDB. 66 Ionization potential of atoms is the same as provided by NIST. Unfortunately, there are only adiabatic IPs available from NIST for the small molecules we consider. However, we compared the total energies of neutral molecules with the corresponding NIST values and found a good agreement within a few meV. Moreover, our ionization energies of N 2 , CO, F 2 C 2 H 2 and H 2 CO agree well with some recent quantum chemical calculations. [67][68][69][70] In the GW calculations, the IPs were obtained from the position of the first peak below Fermi level in the DOS of each molecule. The frequency grid resolution ∆ω used with the QSGW approach was 0.05 eV for cc-pVDZ and 0.1 eV for cc-pVTZ basis sets. In the case of SCGW , a linear extrapolation to the limit of infinite frequency resolution was applied as discussed in subsection V B. Therefore, ∆ω = 0.05 and 0.025 eV were used in the calculations with cc-pVDZ basis set, and ∆ω = 0.1 and 0.05 eV for those using a cc-pVTZ basis set.
The convergence with the number of dominant products, used here to express the products of basis functions, was monitored comparing the energies of the HOMO of the different molecules calculated at the Hartree-Fock level with our code and with NWChem. Our code uses the basis of dominant products to compute Hartree and exchange contributions to the energy and Hamiltonian. We found maximal differences of at most 6 meV (for nitrogen containing molecules), while the mean absolute error (MAE) of the HF-HOMO position is only 1.6 meV for our set of sixteen atoms and molecules.
The results for the IPs of all the studied systems are presented in Table V. Before analyzing the GW results, it will be instructive to make some comments about our CC reference calculations. Comparison between CCSD(T) and CCSD results (both using the cc-pVTZ basis) indicates that the inclusion of triple excitations does not substantially modify the calculated IPs on the average: 69 meV MAE and a maximal difference of 0.22 eV for the F 2 molecule. These differences are significantly smaller than those obtained when comparing the CCSD(T) results with those of the different GW methods. This confirms that, at least for the systems considered here, CCSD(T) is a reasonable choice as a reference theory.
The convergence of the results with respect to the basis set is slower as we could anticipate from our systematic study for He and H 2 . Comparing CCSD(T) results calculated with cc-pVDZ and cc-pVTZ bases, we find a MAE of 0.27 eV and a maximal difference of 0.54 eV for the IP of the water molecule. These larger variations are a clear indication of the rather slow convergence of correlation effects with respect to the basis size. The present results also confirm the observation, made in Sec. V E for He and H 2 , that the IPs increase with the use of the more complete basis, with the exception of beryllium atom whose IP is unchanged when moving from a cc-pVDZ basis to a cc-pVTZ basis.
The observed dependence of the IP on the basis set size also agrees with the results of two recent convergence studies of G 0 W 0 -HF IPs for light atoms as a function of the basis set size. 18,27 According to these studies, G 0 W 0 -HF calculations using a cc-pVTZ basis set already produce IPs converged within ∼0.15 eV for He and Be as compared with calculations using much larger bases. This agrees well with our observation for He and H 2 IPs of a convergence with respect to the CBS limit within ∼0.2 eV using the TZ basis. However, for Ne, Bruneval 18 has shown that this error can grow considerably (∼0.4 eV) and it is necessary to use a much larger basis, up to cc-pV5Z, in order to converge the results within a range of ∼0.1 eV. Another convergence study at the G 0 W 0 level was performed by Ren et al. 71 . It also shows the increase and slow convergence of the IPs of atomic and molecular systems with the basis set size. Unfortunately, the use of aug-cc-pV6Z bases, proposed in Ref. 71 as an appropriate reference basis set, is prohibitively expensive for the molecular study of self-consistent GW schemes presented here. Thus, following Ke 27 , we use cc-pVTZ basis in our calculations. We stress here that the main purpose of the present paper is not to provide fully converged IPs, but to study how different self-consistent GW schemes perform for several representative molecules while keeping all other technical details identical. As shown in detail below, the cc-pVTZ basis seems to be sufficient for this purpose. This is indicated by the fact that the qualitative and quantitative deviations of the different GW IPs with respect to the CCSD(T) results, and among them, are rather similar with the two basis sets used in this study (cc-pVDZ and cc-pVTZ). In any case, Table V provides a consistent comparison, using the same basis sets and the same numerical implementation, between different schemes to include correlation.
Comparing our CCSD(T)/cc-pVTZ results with the experimental data in Table V we can find some significant deviations. The larger deviation (0.96 eV) takes place for H 2 . This is probably related to the lack of corrections due to the finite mass of nuclei and the structural relaxations in the final state in our calculations. The second largest difference (0.78 eV) happens for CH 4 . Relaxations in the final state are known to play a crucial role for methane 72 (the adiabatic IP is 12.61 eV 66 ), and this might be behind the poor comparison with the nominal experimental vertical IP (13.60 eV 66 ). In spite of the uncertainties about the comparison of our calculated vertical IPs with available experimental data, the overall agreement is good and the MAE of the CCSD(T)/cc-pVTZ calculations with respect to the experimental results in Table V is 0.19 eV, smaller than those of most of the self-consistent GW methods.
We now turn to the analysis of our GW results. Both self-consistent GW approaches, SCGW and QSGW B, give results that are relatively close to the CC numbers obtained using the same basis. Figure 6 depicts the differences between GW and CC IPs. We can see that the overall behavior of SCGW and QSGW IPs is quite similar. However, QSGW tends to overestimate the IPs as compared to CC results, whereas SCGW underestimates the IP in most cases. In the case of He and H 2 such behavior is also observed for IPs calculated using more complete basis sets. The G 0 W 0 results starting from HF solutions are closer to those of QSGW B. Indeed the MAE with respect to CCSD(T) results using the cc-pVTZ basis is very similar for both methods.
QSGW and SCGW deviate from CC results in different directions. However, the mean absolute value of such deviation is quite similar in both cases. The MAEs with respect to the CCSD reference can be found in Table V: 0.21 and 0.25 eV, respectively for SCGW and QSGW B calculations using the cc-pVDZ basis, which increase to 0.22 and 0.27 eV when the larger cc-pVTZ basis is used. It is interesting to note, following our discussion Sec. V E, that the MAE of QSGW B IPs with respect to the CCSD(T) data is slightly larger than that of SCGW . If the observed differences were solely determined by the faster convergence of CCSD(T) results with respect to the basis set size, we would expect the opposite behavior. Therefore, we can speculate that, for the set of sixteen molecules considered here, it is likely that SCGW will provide better IPs (in average) than those given by QSGW B. However, coming back to Table V, we can say that using cc-pVTZ basis sets on average QSGW and SCGW perform very similarly. The maximal discrepancies are somewhat larger for SCGW : 0.76 eV for the Be atom using the cc-pVTZ basis, to be compared with the 0.67 eV deviation for F 2 in the case of QSGW . The G 0 W 0 -HF is on average only slightly worse than SCGW and quite comparable to QSGW B, with a MAE of 0.28 (0.22) eV and a maximal error of 0.86 (0.77) eV for the N 2 (CO) molecule using the cc-pVTZ (cc-pVDZ) basis.
We can now compare our results with previously published data for the IPs of small molecules computed with self-consistent GW schemes. For this purpose we will use the results obtained with the more complete cc-pVTZ basis. Most of the existing data for molecules correspond to the SCGW method. 19,[22][23][24][25][26]43 We are only aware of three very recent studies using the QSGW method for small molecules: one dealing with small sodium clusters up to five atoms 73 , one studying small conjugated molecules 27 and one for first row atoms. 18 We start with the SCGW results. Stan et al. 22,23 performed all-electron SCGW calculations using large bases of Slater orbitals. They presented results for the IPs of the same atoms that we have considered (He, Be and Ne), as well as for H 2 and LiH. In general we find good agreement with their data. Our IPs are always somewhat smaller, although differences stay within 0.15 eV, except for Ne, for which the difference grows up to 0.39 eV. Most of the differences are probably due to the basis set. As mentioned above, in the cases of He and H 2 in which we could use larger basis sets, our IPs extrapolated to the complete basis set limit and those reported by Stan et al. agree within 0.03 eV. The large deviation for Ne seems to indicate some particular difficulty of the cc-pVTZ basis set to describe the IP of this element. 18 The MAE, over the five species mentioned above, of our SCGW IPs with respect to those of Stan et al. is 0.15 eV (which grows up to 0.19 eV when we compare the G 0 W 0 -HF results). Delaney et al. 43 reported an all-electron SCGW IP for Be of 8.47 eV. Our SCGW /cc-pVTZ IP for Be (8.53 eV) lies in between this value and that given by Stan et al. (8.66 eV).
More extensive sets of molecules have been studied by Rostgaard et al. 24 and Caruso et al. 25 . Rostgard et al. presented data for the all-electron SCGW IPs of 34 different molecules, including all the molecules considered here except H 2 . Their calculations used a double-ζ polarized basis set of augmented Wannier functions (Wannier functions obtained from projector augmented wave calculations of the molecules, supplemented with suitably chosen numerical atomic orbitals). Core states were taken into account in the calculation of the matrix elements of the exchange self energy. However, the contribution of core states to the correlation self energy of valence electrons was disregarded, since it was assumed to be small due to the large energy difference and small spatial overlap between valence and core states. We find that the SCGW IPs in Table V are larger (except for LiF and LiH) than those reported by Rostgard et al.. The maximal differences take place for F 2 and LiF, where our calculated IPs are 0.54 eV larger and 0.67 eV smaller, respectively. The average deviation between our SCGW results and those of Rostgard et al. (MAE=0.32 eV, which grows up to 0.57 eV for the G 0 W 0 -HF results) is somewhat larger, although comparable, to that between our SCGW and CCSD(T) results. This seems to indicate that numerical and methodological aspects behind each implementation still hinder the comparison of results obtained with different codes using, formally, the same selfconsistent GW scheme. The use of different basis is probably one of the most important causes of discrepancies, as recently pointed out by Bruneval and Marques for G 0 W 0 calculations. 33 However, part of the discrepancies might be related to two factors: i) the use of MP2/6-31G(d) geometries by Rostgard et al., while we use CCSD(T)/cc-pVTZ relaxed geometries and, ii) the lack of core-valence correlations in their calculations. The better agreement of our results with the full all-electron SCGW calculations in Ref. 25 could support this last conclusion on the influence of core-valence correlations.
Caruso et al. 25 report the values of the SCGW IPs for the same set of molecules used by Rostgard et al.. Their all-electron calculations use a basis set of numerical atomic orbitals and the resolution of the identity technique to express the products of those orbitals. Their IPs are systematically larger than those reported here, although the differences are relatively small, lower than 0.19 eV for all the molecules except for LiF, for which the difference grows up to 0.46 eV. The MAE over the 12 molecules is only 0.14 eV for SCGW and 0.15 eV for G 0 W 0 -HF calculations. Therefore, the overall agreement between our SCGW /cc-pVTZ results and those of Caruso et al. is rather good. Now we compare our QSGW with the very scarce data available in the literature. Ke has recently studied the IPs and electron affinities of a number of conjugated molecules using the QSGW "mode A" method. 27 Ke uses a cc-pVTZ basis, similar to that utilized here, and reports 11.31 eV and 11.44 eV for the IP of C 2 H 2 calculated at the level of QSGW A and G 0 W 0 -HF, respectively. This is in excellent agreement with our corresponding results of 11.43 eV and 11.54 eV and indicates that, at least for this molecule and the cc-pVTZ basis set, the calculated IP is rather stable against the use either QSGW A or B schemes. Bruneval 18 reported 24.46 (24.72), 9.11 (9.16) and 21.62 (21.79) eV, respectively, for the IPs of He, Be and Ne calculated using the QSGW A (G 0 W 0 -HF) approach and a very complete cc-pV5Z basis (of Cartesian kind). These values are in good agreement with our results although they are always somewhat larger. This is due to the use of a smaller cc-pVTZ basis set in our case, as clearly demonstrated by the excellent agreement between data calculated using the MOLGW program 18 and our code when the same basis set are used (Table IV). Furthermore, focusing on the results published by Bruneval in Ref. 18, comparing our G 0 W 0 -HF with those reported in Figure 1 of that paper, we find that the results reported there for the cc-pVTZ basis are almost identical to those presented here. This again indicates a very welcome consistency between both sets of calculations.
Finally, we can compare our GW vertical IPs with the experimental data in Table V. This comparison should be taken with some caution: as commented above, the comparison might be affected by other factors different from the ability of the GW schemes to capture electron correlations. In any case, it is interesting to obtain a quantitative measure of the deviation. The MAE with respect to the experimental data are similar for the SCGW and QSGW B results obtained using the cc-pVTZ basis, 0.26 and 0.35 eV, respectively. It increases to 0.5 eV for the G 0 W 0 -HF approach. These deviations of the GW results with respect to the experiments are somewhat larger than those with respect to the CCSD(T)/cc-pVTZ theoretical reference. They seem to confirm a very similar degree of accuracy for the QSGW and SCGW methods, as well as their moderate improvement as compared to the G 0 W 0 -HF approach.

VII. CONCLUSIONS AND OUTLOOK
In this article we studied two self-consistent GW approaches, the self-consistent GW (SCGW ) and the quasiparticle self-consistent GW (QSGW ), within a single numerical framework. We explored two possible realizations of the QSGW algorithm, the so-called "mode A" and "mode B". A systematic study for He and H 2 indicated that, for QSGW A, the IPs do not show a monotonic convergence as a function of the basis set size. This unexpected results was traced back to the peculiar dependence on two different reference energies of the cross-terms of the correlation operator in QSGW A, in combination with the use of basis sets of atomic orbitals that confers the self energy a complex and abrupt frequency dependence in the high frequency limit. Motivated by this observation, we concentrate our study of different molecules in a comparison between standard self-consistent SCGW and QSGW "mode B" .
We focused on light atoms and small molecules as examples of finite electronic systems and performed all-electron GW calculations for them. We have studied the density of states (or spectral function) given by both approaches and, from a qualitative point of view and at low and moderate energies, we did not find significant differences between both approaches. In both cases the number and intensity of satellite structures is reduced with respect to one-shot G 0 W 0 calculations. This is in agreement with previous observations, for example, for the homogeneous electron gas. 21 We have also compared both approaches quantitatively by calculating the ionization potentials and comparing them against coupled-cluster calculations. The comparison shows similar qualities for both self-consistent GW approaches, which are only slightly better that one-shot G 0 W 0 calculations starting from Hartree-Fock. Interestingly, SCGW and QSGW calculations tend to deviate in opposite directions with respect to CCSD(T) results. SCGW systematically produces too low IPs, while QSGW tends to overestimate the IPs. We do not have a clear explanation for this different behavior of SCGW and QSGW . It is interesting to note, however, that the behavior observed for QSGW here seems to be consistent with the known tendency of QSGW to overestimate the band gaps of solids. 38,74 For the small molecules considered here, G 0 W 0 -HF produces results which are surprisingly close to QSGW calculations both for the DOS and for the numerical values of the IPs. In spite of the similarities, SCGW produces results somewhat closer to the CCSD(T) reference.
We chose to compare our results against CCSD(T) calculations, rather than against experimental results for several reasons. One of them is the difficulty to converge the self-consistent GW results with respect to the basis set in our all-electron calculations. Performing converged calculations with respect to the frequency grid and size of the auxiliary basis of dominant products proved to be computationally intensive and, therefore, we are limited to cc-pVTZ basis sets in most cases. However, comparison between CCSD(T) and GW results obtained with both the cc-pVDZ or cc-pVTZ bases, leads to very similar observations. Furthermore, a systematic convergence test as a function of the basis set size performed for He and H 2 indicates that our observation that QSGW tends to overestimate, while SCGW tends to underestimate, the ionization potential of CCSD(T) is very likely to remain valid using more complete basis sets. Regarding the observation that SCGW is marginally closer to the CCSD(T) results than QSGW , we also believe that it will remain valid with more complete basis sets. The reason for this suspicion is the steeper increase of the GW IPs with the basis size as compared to those calculated using CCSD(T) (that show a faster convergence). We argue that this will tend to improve the agreement between SCGW and CCSD(T), and degrade that of QSGW , as the basis set size increases. Another interesting point is that the exclusion of triple excitations in the CC calculations, i. e. performing CCSD calculation, produced only minor differences for most systems. With all these ingredients, we expect that the comparison presented here among different self-consistent GW methods, and of those with CCSD(T), reflects the ability of these schemes to deal with the effects of correlations in small molecules.
Regarding the applicability of self-consistent GW methods: On the one hand, our results could not prove that any of the explored self-consistent GW approaches is clearly superior to one-shot G 0 W 0 calculations using an appropriate starting point (e.g., Hartree-Fock and certain hybrid functionals have been shown to provide an excellent starting point for one-shot GW calculations 17,19,32,33,75,76 ); On the other hand, at least for the IPs of the set of atoms and molecules considered here, the self-consistent results seems to improve, although slightly, the G 0 W 0 -HF and we did not observe any clear signature that the self-consistent GW results were pathological. This is interesting because there are situation where one would like to improve the one-particle DFT spectra using a charge or energy conserving scheme. Transport calculations in molecular junctions are a clear example. 37 In this context, it is also worth noting that our calculations indicate that SCGW shows a more stable convergence pattern of the self-consistent loop. The QSGW method can be advantageous in many applications because it generates an effective one-electron Hamiltonian with an improved spectrum.