Ab initio calculation of the shift photocurrent by Wannier interpolation

We describe and implement a first-principles algorithm based on maximally-localized Wannier functions for calculating the shift-current response of piezoelectric crystals in the independent-particle approximation. The proposed algorithm presents several advantages over existing ones, including full gauge invariance, low computational cost, and a correct treatment of the optical matrix elements with nonlocal pseudopotentials. Band-truncation errors are avoided by a careful formulation of $k\cdot p$ perturbation theory within the subspace of wannierized bands. The needed ingredients are the matrix elements of the Hamiltonian and of the position operator in the Wannier basis, which are readily available at the end of the wannierization step. If the off-diagonal matrix elements of the position operator are discarded, our expressions reduce to the ones that have been used in recent tight-binding calculations of the shift current. We find that this `diagonal' approximation can introduce sizeable errors, highlighting the importance of carefully embedding the tight-binding model in real space for an accurate description of the charge transfer that gives rise to the shift current.


I. INTRODUCTION
Under homogeneous illumination, noncentrosymmetric crystals exhibit the bulk photovoltaic effect (BPVE), a nonlinear optical response that consists in the generation of a photovoltage (open circuit) or photocurrent (closed circuit) when light is absorbed via intrinsic or extrinsic processes [1][2][3]. Contrary to the conventional photovoltaic effect in p-n junctions, the BPVE occurs in homogeneous systems, and the attained photovoltage is not limited by the band gap of the material. The BPVE comprises a "circular" part that changes sign with the helicity of light, and a "linear" part that also occurs with linearly-polarized or unpolarized light. The former is symmetry-allowed in the gyrotropic crystal classes, and the latter in the piezoelectric ones [1][2][3].
The present work deals with the intrinsic contribution to the linear BPVE due to interband absorption, known as "shift current." This phenomenom was intensively studied in the 60s and 70s, particularly in ferroelectric oxides such as BaTiO 3 [4]. In recent years it has attracted renewed interest in view of potential applications in novel solar-cell designs [5][6][7], and in connection with topological insulators [8][9][10] and Weyl semimetals [11][12][13].
In a simplified picture, the shift current arises from a coordinate shift accompanying the photoexcitation of electrons from one band to another. Like the intrinsic anomalous Hall effect [14], the shift current originates from interband velocity matrix elements, depending not only on their magnitudes but also on their phases [15][16][17][18].
Over the years, the understanding of the shift current has greatly benefited from model calculations [6,7,19,20]. Tight-binding models have been used to analyze various aspects of the problem, including the possible correlation with electric polarization, the role of virtual transitions, and the sensitivity to the wave functions. Recently, density-functional theory methods started being employed to calculate the shift-current responsivity in specific materials [6,[21][22][23]. The results are generally in good agreement with experimental measurements, proving the predictive power of the ab initio approach.
The first-principles evaluation of the shift current (and of other nonlinear optical responses) is technically challenging, due to the intricate form of the matrix elements involved [15][16][17][18]. Two basic approaches have been devised. One is to express those matrix elements as an infinite sum over intermediate virtual states [15,16,18]. In practice this requires calculating a large number of unoccupied bands, to minimize truncation errors [21,23]. Alternatively, the matrix elements can be recast in terms of derivatives with respect to the crystal momentum k of the initial and final band states [15][16][17][18]. This strategy circumvents the summation over intermediate states, but its practical implementation requires a careful treatment of the derivatives on a finite k-point grid in order to retain gauge invariance and handle degeneracies [22]. Finally, it has been found that the shift current tends to converge slowly with respect to the number of k points used for the Brillouin zone (BZ) integration [23]. All these factors render the shift current more challenging and expensive to calculate than the ordinary linear optical conductivity.
In this work, we develop an accurate and efficient ab initio scheme for calculating the shift current and related nonlinear optical responses in the independent-particle approximation. The proposed methodology, based on localized Wannier functions [24], is closely related to the Wannier interpolation method of calculating to the Berry curvature and the intrinsic anomalous Hall conductivity [25]. In essence, it consists in evaluating the matrix elements by k ·p perturbation theory within the subspace of wannierized bands. This strategy inherits the practical advantages of the sum-over-states approach in the complete space of Bloch eigenstates, but without introducing truncations errors. In addition, it has a very low computational cost thanks to the compact basis set. We will comment on the relation between our methodology and a recent proposal with similar characteristics [26].
Our Wannier-interpolation scheme distinguishes itself in two aspects. First, it provides a physically transparent connection to tight-binding approaches [7]. This is achieved by adopting a phase convention for the Bloch sums that includes the Wannier centers in the phase factors, such that the resulting expressions cleanly separate into two parts: an "internal" part that only depends on the Hamiltonian matrix elements and Wannier centers (the only ingredients in a typical tight-binding calculation), and an "external" part containing the off-diagonal position matrix elements. We find that the latter can give a sizeable contribution to the shift current; moreover, its inclusion removes an artificial symmetry of the shift-current matrix elements in two-band tight-binding models [7]. These findings highlight the importance of carefully embedding the tight-binding model in real space -via the position matrix elements -when calculating the shift current. The other salient feature of our formulation is that it is fully gauge invariant. This is in contrast to previous Wannier-based schemes, where a parallel-transport gauge was assumed when calculating the interband matrix elements [25,26].
The manuscript is organized as follows. In Sec. II we provide some background on the microscopic theory of the shift current. In Sec. III we first review the Wannierinterpolation scheme for calculating the energy bands and the interband dipole matrix elements; the same interpolation approach is then applied to the generalized derivative of the interband dipole matrix, completing the list of ingredients needed for evaluating the shift current. The technical details of our electronic-structure and Wannier-function calculations are described in Sec. IV, and the resulting shift-current spectra of GaAs and monolayer GeS are presented and discussed in Sec. V. We provide some concluding remarks in Sec. VI, and leave additional technical discussions to the appendices.

A. Definitions and background
Our starting point is the formalism of Sipe and Shkrebtii for calculating second-order interband optical responses of bulk crystals within the independent-particle approximation [18]. The basic ingredients are the interband dipole matrix, and its "generalized derivative" with respect to the crystal momentum k. They are given by and respectively, where is the Berry connection matrix, where |u km denotes the cell-periodic part of a Bloch eigenstate and ∂ a stands for ∂/∂k a . The three equations above define Hermitean matrices in the band indices n and m. Importantly, the first two transform covariantly under band-diagonal gauge transformations, where the subscript k has been dropped for brevity. As a result, the combination appearing in Eq. (8) below is gauge invariant. Consider a monochromatic electric field of the form with E(−ω) = E * (ω). Phenomenologically, the dc photocurrent density from the linear BPVE reads [1][2][3] The third-rank response tensor is symmetric under b ↔ c, and transforms like the piezoelectric tensor. According to Eqs. (38) and (41) in Ref. 18, the interband (shiftcurrent) part of the response is given by n,m Here f nm = f n − f m and ω nm = E m − E n are differences between occupation factors and band energies, respectively, and the integral is over the first BZ, with Because I abc mn is Hermitean, the right-hand-side of Eq. (8) is real. Its transformation properties under inversion and time-reversal symmetry are summarized in Appendix A.
For comparison, we also calculate the joint density of states (JDOS) per crystal cell, n,m (v c is the cell volume), and the interband contribution to the absorptive (abs) part of the dielectric function [18], ab n,m f nm r a nm r b mn δ(ω mn − ω). (10) In nonmagnetic crystals ab abs is purely imaginary and symmetric, and we report values for Im ab r = Im ab abs / 0 , the imaginary part of the relative permittivity. The matrix elements r a nm and r a;b nm appearing in Eq. (8) satisfy the identities and where v a nm = 1 u n |∂ aĤ |u m , Equation (11) can be obtained by differentiating the identity u n |Ĥ|u m = E n δ nm with respect to k a for m = n. Differentiating once more with respect to k b and inserting a complete set of states yields the sum rule in Eq. (12) [7,18]. For Hamiltonians of the form H k = (p + k) 2 /2m e + V (r), the term w ab nm therein has no off-diagonal components and does not contribute to the sum rule. That term should however be included in tight-binding calculations [7], and in first-principles calculations with nonlocal pseudopotentials [26].
Equation (12) has been used in ab initio calculations of the shift current [21,23], with a truncated summation over intermediate states p = n, m. An exact (truncationfree) expression for r a;b nm that only requires summing over a finite number of wannierized bands, Eq. (36) below, constitutes a central result of the present work.

III. WANNIER INTERPOLATION SCHEME
The needed quantities for calculating the shift-current response from Eq. (8) are the energy eigenvalues, and the matrix elements r a nm and r a;b nm defined by Eqs. (1) and (2). In this section we describe how to evaluate each of them in a Wannier-function basis.
Consider a set of M well-localized Wannier functions per cell w j (r − R) = r|Rj spanning the initial and final states involved in interband absorption processes up to some desired frequency ω. (In practice we shall construct them by post-processing a first-principles calculation, using the method of maximally-localized Wannier functions [27,28].) Starting from these orbitals, we define a set of Blochlike basis states as where the superscript (W) stands for "Wannier gauge" [25]. Note that at variance with Ref. 25, we have chosen to include the Wannier center τ j = 0j|r|0j (15) in the phase factor of Eq. (14). This phase convention, often used in tight-binding calculations, is the most natural one for expressing the Berry connection and related geometric quantities in reciprocal space [29].

A. Energy eigenvalues
The matrix elements of the first-principles Hamiltonian H k = e −ik·rĤ e ik·r between the Blochlike states (14) read Diagonalization of this M ×M matrix yields the Wannierinterpolated energy eigenvalues, where U k is the unitary matrix taking from the Wannier gauge to the Hamiltonian gauge. This Slater-Koster type of interpolation, with the Wannier functions acting as an orthogonal tight-binding basis, has been shown in practice to provide a smooth k-space interpolation of the ab initio eigenvalues. (With disentangled Wannier functions, the interpolation is faithful only within the socalled "inner" or "frozen" energy window [28].)

B. Berry connection and interband dipole
The same interpolation strategy can be applied to other k-dependent quantities. In particular, the Hamiltonian-gauge Bloch states interpolate the ab initio Bloch eigenstates, allowing to treat wavefunction-derived quantities. As a first example, consider the Berry connection matrix defined by Eq. (3). Inserting the above expression for |u kn in that equation yields [25] A a where A (W) a in Eq. (19c) denotes a Cartesian component of the Berry connection matrix in the Wannier gauge, The term A a nm in Eq. (19) carries the interpretation of a Berry connection for the eigenvectors of H (W) (the column vectors of U ). Introducing the notation ||u n for those vectors, 1 Eq. (19b) becomes A a nm = i u n ||∂ a u m . This is the "internal" Berry connection for the tightbinding model defined by Eq. (16) in terms of the Hamiltonian matrix elements and Wannier centers.
The extra term a a nm in Eq. (19) arises from off-diagonal matrix elements of the position operator in the Wannier basis, as can be seen by inspecting the matrix element in Eq. (20) together with Eq. (15). In tight-binding formulations, it is customary to postulate a diagonal representation forr [29][30][31][32][33], where we have introduced the symbol " . =" to denote equalities that only hold only within this "diagonal tightbinding approximation" (diagonal TBA). Thus, a a nm is the part of the Berry connection matrix A a nm that is discarded when making the diagonal TBA, and we will refer to it as the "external" part.
For the interband dipole matrix of Eq. (1) we get where r a with ∂ a H (W) obtained by differenting the right-hand-side of Eq. (16). Equation (23a) is the "internal" counterpart of Eq. (11) for r a nm . It can be derived in a similar manner, by differentiating Eq. (17) with m = n.

C. Generalized derivative of the interband dipole
The energy eigenvalues and interband dipole matrix elements r a nm are the only ingredients entering Eq. (10) for the dielectric function, which has been previously evaluated by Wannier interpolation [34]. Equation (8) for the shift current contains in addition the generalized derivative r a;b nm , and in the following we describe how to evaluate it within the same framework.

Useful definitions and identities
Our strategy will be to evaluate Eq. (2) for r a;b nm starting from Eqs. (19) and (22) for A a nm and r a nm , respectively. Inspection of those equations reveals that we need to differentiate with respect to k b the matrices v a nm and a a nm . Noting that both of them are of the form and using the identity we find Writing A b nm in the commutator as δ nm A b nn + r b nm and then expanding r b , O as a sum over states yields 1 When the Wannier centers are included in the phase factors of the Bloch sums as in Eq. (14), the eigenvectors of H (W) can be thought of as tight-binding analogues of the cell-periodic Bloch states, hence the notation ||un . The fact that Berry-phase-type where the contribution from intermediate states p = n, m has been separated out. We find it convenient to define an "internal generalized derivative" of the matrix O in analogy with Eq. (2), Note that this is equal to the sum of the first three terms in Eq. (27). Before proceeding, let us also define the following internal quantities in analogy with Eq. (29c)

Derivation
We begin by differentiating the term r a nm with the help of Eq. (27) and expressing the result in the form of Eq. (28), we find This is the internal counterpart of the sum rule (12), written in terms of the tight-binding eigenvectors, eigenvalues, and Hamiltonian, instead of the ab initio ones. The same procedure can be used to differentiate the term a a nm in Eq. (22), given by Eq. (19c). The result is where quantities are defined in terms of the cell-periodic Bloch states is the reason why that phase convention is the most natural one for dealing with such quantities in tight-binding [29].
Adding ∂ b r a nm and ∂ b a a nm from Eqs. (31) and (33) to form ∂ b r a nm , and then subtracting the amount to obtain r a;b nm as per Eq. (2), we arrive at This expression for the generalized derivative in the Wannier representation is a central result of the present work. An alternative expression that is equally valid was obtained in Ref. 26, and the precise relation between the two formulations is established in Appendix B.
D. Discussion

Summary of the interpolation algorithm
To summarize, the response tensor σ abc (0; ω, −ω) is given by Eq. (8) in terms of the energy eigenvalues and of the matrix elements I abc mn defined by Eq. (5). At each k, the former are interpolated using Eq. (17), and the latter using Eqs. (22) and (36)  It is well known that the tight-binding expression for an operator depends on the phase convention used for the Bloch sums [35]. Let us discuss how this plays out for the Berry connection matrix (similar remarks apply to the interband dipole matrix and its generalized derivative).
The phase convention we have adopted in this work is that of Eq. (14). The other commonly used convention is to drop τ j from that equation [29,35], in which case the Berry connection matrix is still given by Eq. (19) but τ i and τ j should be removed from Eqs. (16) and (20).
As a result, the term A a nm in Eq. (19) becomes a function of the Hamiltonian matrix elements only and not of the Wannier centers, whose contributions to the Berry connection are absorbed by a a nm . The total Berry connection A a nm remains the same as before, but the term a a nm is now nonzero under the diagonal TBA of Eq. (21).

Gauge covariance of the generalized derivative
Although Eq. (2) for r a;b nm is gauge covariant in the sense of Eq. (4), its individual terms are not, leading to numerical difficulties. Instead, the individual terms in the Wannier-based expression (36) for r a;b nm transform covariantly under band-diagonal gauge transformations. As a result, its numerical implementation is very robust.

4.
Generalized derivative versus the effective-mass sum rule: The role of position matrix elements As remarked in Sec. II B, Eq. (12) for r a;b nm follows from differentiating the identity u n |Ĥ|u m = E n δ nm once with respect to k a and once with respect to k b , for m = n. Doing so for m = n yields the effective-mass sum rule.
For tight-binding models with a finite number of bands, the effective-mass sum rule can be formulated exactly. The modified sum-rule expression, which only depends on the Hamiltonian matrix elements, includes an intraband term w ab nn given by Eq. (29c) [30,34,36]. The effect of the basis truncation on the calculation of nonlinear optical responses has been the subject of several recent investigations [7,37,38]. In particular, it was suggested in Ref. 7 that Eq. (32) for r a;b nm , which includes an interband term w ab nm , is the correct expression for r a;b nm in tight-binding models. In fact, that expression only accounts for part of the wavefunction dependence of r a;b nm , via the diagonal position matrix elements. The full expression, Eq. (36), has additional terms that depend on the off-diagonal position matrix elements. Those should be included in order to completely describe the wavefunction dependence, and to render the result independent of the choice of Wannier basis orbitals [32].
In the diagonal TBA of Eq. nm . In this approximation the shift current only depends on the Hamiltonian matrix elements and on the Wannier centers, and a strong dependence on the latter was found in Ref. 7. As we will see in Sec. V (and also noted in Ref. 26), the additional contributions from off-diagonal position matrix elements can modify appreciably the calculated shift-current spectrum.

The two-band limit
The shift-current response of two-band tight-binding models has been considered in Refs. 7 and 19. In that limit the three-band terms in Eq. (36) (those containing intermediate states) vanish identically, and r a;b nm is completely specified by the two-band terms, which pick up the missing contributions (the importance of the w ab nm term in this regard was emphasized in Ref. 7). It appears to have gone unnoticed that the diagonal TBA introduces a qualitative error for two-band models, as we now discuss.
In the diagonal TBA, Eq. (36) for a two-band model reduces to the first two terms in Eq. (32), This expression is symmetric under a ↔ b, and when used in Eq. (5) for I abc mn it renders Eq. (8) for σ abc (0; ω, −ω) totally symmetric, irrespective of crystal symmetry. This unphysical behavior is not an artifact of two-band models, but of the diagonal TBA applied to such models. The shift current arising from the photoexcitation of carriers between the two bands can be calculated exactly, without adding more bands to the model, by including the additional two-band terms in Eq. (36) associated with off-diagonal position matrix elements. These considerations appear relevant to the ongoing discussion on the shift-current response of Weyl semimetals [11][12][13].

IV. COMPUTATIONAL DETAILS
In this section we describe the various steps of the calculations that we have carried out for two test systems, bulk GaAs and single-layer GeS. In a first step, we performed density-functional theory calculations using the Quantum ESPRESSO code package [39]. The core-valence interaction was treated by means of fully-relativistic projector augmented-wave pseudopotentials (taken from the Quantum ESPRESSO website) that had been generated with the Perdew-Burke-Ernzerhof exchange-correlation functional [40], and the energy cutoff for the planewave basis expansion was set at 60 Ry. Maximallylocalized Wannier functions were then constructed in a post-processing step, using the Wannier90 code package [41]. Finally, the shift-current spectrum [Eq. (8)], the JDOS [Eq. (9)], and the dielectric function [Eq. (10)] were calculated in the Wannier basis as described in Sec. III.
In the case of zincblende GaAs, the self-consistent calculation was carried out on a 10 × 10 × 10 k -point mesh, using the experimental lattice constant of a = 10.68 a 0 . Starting from the converged self-consistent Kohn-Sham potential, the 24 lowest bands and Bloch wavefunctions were then calculated on the same mesh. Finally, a set of 16 disentangled Wannier functions spanning the eight valence bands and the eight low-lying conduction bands were constructed using s and p atom-centered orbitals as trial orbitals. The Wannier-interpolated energy bands are shown in Fig. 1 together with the ab initio bands (including in both cases a "scissors correction"). The agreement between the two is excellent inside the inner energy window [28], which spans the energy range from the bottom of the figure up to the dashed horizontal line.
The calculations for monolayer GeS were done in a slab geometry, with a supercell of length 15Å along the nonperiodic direction and a 1 × 12 × 12 k -point mesh for both the self-consistent and for the band structure calculation. The parameters for the structure with an in-plane polar distortion were taken from Table II in  To obtain well-converged shift-current spectra, we used dense k -point interpolation meshes of 100 × 100 × 100 for GaAs and 1×1000×1000 for GeS. In the case of GaAs, we employed an adaptive scheme [34] for choosing the width of the broadened delta functions in Eq. (8). For GeS we used a fixed width of 0.02 eV, as it was found to handle better the strong van-Hove singularities characteristic of two-dimensional (2D) systems.
In the sum-over-states expression for σ abc (0; ω, −ω), the energy denominators involving intermediate states should be interpreted as principal values [15]. In our formalism such denominators appear in Eqs. (32) and (34) and in practice we make the replacement and similarly for 1/ω pm . Such a regularization procedure is needed to avoid numerical problems caused by near degeneracies. Following Ref. 21, we choose η in a range where the calculated spectrum remains stable. In the calculations reported below, we have used η = 0.04 eV for both GaAs and GeS.
As mentioned earlier, a scissors correction was applied to the calculated band structure of GaAs in Fig. 1, in order to cure the underestimation of the gap. The conduction bands were rigidly shifted by 1.15 eV and the spectral quantities plotted in Fig. 3 were modified accordingly as described below, facilitating comparison with Ref. 21 where a scissors correction was also applied.
It is clear from Eq. (9) that the scissors correction leads to a rigid shift of the JDOS. Although less obvious, the shift-current spectrum [Eq. (8)] and the dielectric function [Eq. (10)] also undergo rigid shifts. The reason is that Eqs. (8) and (10) do not contain any frequency prefactors, and the matrix elements therein are intrinsic properties of the Bloch eigenstates [see Eqs. (1) and (2)], which are unaffected by the scissors correction (only the eigenvalues change). The eigenvalues do appear in Eqs. (11) and (12) that are used in practice to evalute the optical matrix elements, but a careful analysis reveals that those equations remain invariant under a scissors correction [42].

A. Bulk GaAs
The zincblende semiconductor GaAs was the first piezoelectric crystal whose shift-current spectrum was evaluated using modern band structure methods. The original calculation [18] suffered from a computational error, and a corrected spectrum was reported later [21]. Given the existence of this benchmark calculation, we have chosen GaAs as the first test case for our implementation. Figure 3(a) shows the calculated σ xyz (0; ω, −ω), which is equal to σ abc (0; ω, −ω) for any permutation abc of xyz, and all other components vanish by symmetry [18].  Fig. 3(a) into "internal" (solid lines) and "external" (dashed lines) terms on one hand, and into "three-band" (black lines) and "two-band" (gray lines) terms on the other. the spectra calculated in Ref. 21.
The dielectric function and the shift-current spectrum share similar peak structures, inherited from the JDOS. The level of agreement with Ref. 21 is excellent for Im xx r (ω) and also very good for σ xyz (0; ω, −ω), with only minor deviations. The presence of small discrepancies is not surprising, given that the shift current is rather sensitive to the wavefunctions [15][16][17] and that the two calculations differ on several technical aspects. For example, we use pseudopotentials instead of an allelectron method, and a generalized gradient approximation for the exchange-correlation potential instead of the local-density approximation. The BZ integration methods are also different, and the spin-orbit contribution to the velocity matrix elements was not included in Ref. 21.
The dash-dotted gray lines in panels (a) and (b) of Fig. 3 show the spectra calculated in the diagonal TBA of Eq. (21). While in the case of Im xx r (ω) the changes are quite small, they are more significant for σ abc (0; ω, −ω). This reflects the strong wave-function dependence of the shift current, encoded not only in the Wannier centers [7] but also in the off-diagonal position matrix elements 0n|r|Rm . Those matrix elements are usually discarded in tight-binding calculations, but they should be included to fully embed the tight-binding model in real space. The sensitivity of the shift current to those matrix elements can be understood from the charge-transfer nature of the photoexcitation process in piezoelectric crystals [15,16,21].
It is instructive to break down the shift-current spectrum calculated by Wannier interpolation into different types of contribution. Inserting Eqs. (22) and (36) for r a nm and r a;b nm into Eq. (5) for I abc nm generates a number of terms. Each can be classified as "external" or "internal" depending on whether or not it contains off-diagonal position matrix elements: the term r b mn r c;a nm is internal, and all others are external. In addition, we classify each term as "two-band" or "three-band" depending on whether it only involves states n and m, or intermediate states p as well. This gives a total of four types of terms, whose contributions to the shift current are shown in Fig. 4. The dominant contribution comes from internal threeband terms, which by themselves provide a reasonable approximation to the full spectrum shown in Fig. 3(a). They are followed by the internal two-band terms, while the two external terms are somewhat smaller. Over most of the spectral range, the external terms have the opposite sign compared to the internal ones. Since the diagonal TBA amounts to discarding the external terms, that explains why the dash-dotted gray line in Fig. 3(a) overestimates the magnitude of the full spectrum given by the solid black line. We emphasize that the decomposition of the shift-current spectrum in Fig. 4 depends on the choice of Wannier functions.

B. Monolayer GeS
GeS is a member of the group-IV monochalcogenides, which in bulk form are centrosymmetric, but become polar -and hence piezoelectric -when synthesized as a single layer. The point group of monolayer GeS is mm2, which allows for seven tensorial components of σ abc (0; ω, −ω) to be nonzero [23]. With the same choice of coordinate axis as in Fig. 1 of Ref. 23 (the in-plane directions areŷ andẑ, with the spontaneous polarization alongẑ), the nonzero components are zxx, zyy, zzz, yyz = yzy, and xxz = xzx.
The zzz component of the shift-current spectrum is displayed in Fig. 5(a). Following Ref. 23, we report a 3D-like response obtained assuming an active single-layer thickness of 2.56Å. This is achieved by rescaling the calculated response of the slab of thickness 15Å as follows, In Figs Our calculated spectra in Fig. 5 are in reasonable agreement with those reported in Ref. 23 (dashed red lines), including on the positions of the main peaks and on the sign change of the shift current taking place at around 2 eV. However, the agreement is not as good as that seen in Fig. 3 for GaAs. This may be due in part to some differences in computational details between the two calculations, namely the use of different k -point meshes and BZ integration methods: we have sampled the BZ on a uniform mesh of 10 6 k points, while in Ref. 23 a more sophisticated tetrahedron method was used for the integration, but with far fewer k points (4900). There is however another source of disagreement, which was not present in Fig. 3: the approximate treatement in Ref. 23 of the optical matrix elements within the nonlocal pseudopotential framework. This source of error is discussed further in Appendix D.

C. Analysis of computational time
Here we compare the computational requirements of our numerical scheme with a direct calculation of the shift-current spectrum without Wannier interpolation (e.g., using the method outlined in Appendix D). The spectrum is evaluated by discretizing the BZ integral in Eq. (8) over a mesh containing N k points, and we wish to see how the computational times of the two approaches scale with N .
For that purpose, let us define the following time scales per k point: t w and t d are the times to evaluate the integrand in Eq. (8) by Wannier interpolation and using the direct method, respectively, and t nscf is the time to carry out a non-self-consistent calculation to obtain the ab initio Bloch eigenfunctions and energy eigenvalues. Further, we define T scf as the total time needed to carry out the self-consistent ground-state calculation, and T wf as the total time needed to construct the Wannier functions on a grid of M k points. The total time of a Wannier-based calculation of the shift current is then while the total time of a direct calculation is where we used t d t nscf . Let us take as a concrete example a calculation for monolayer GeS done on a single Intel Xeon E5-2680 processor with 24 cores running at 2.5 GHz. For the choice of parameters indicated in Sec. IV we find t w 21 ms, t nscf 46 s, T scf 0.5 hours, and T wf 1 hour. In Fig. 6 we plot as a function of N the total times obtained from Eqs. (40) and (41), for M = 12 2 . The use of Wannier interpolation is already quite advantageous for N ∼ 500, and the speedup increases very rapidly with N . If a dense k-point sampling with N ∼ 10 6 is required, the speedup reaches three orders of magnitude. (The absolute times reported in Fig. 6 can be reduced by parallelizing the loop over the N k points, which is trivial to do both with and without Wannier interpolation.)

VI. SUMMARY
In summary, we have described and validated a Wannier-interpolation scheme for calculating the shiftcurrent spectrum of piezoelectric crystals, starting from the output of a conventional electronic-structure calculation. The method is both accurate and efficient; this is achieved by using a truncated Wannier-function basis, but without incurring in truncation errors when evaluating the optical matrix elements. The same approach can be applied to other nonlinear optical responses, such as second-harmonic generation, that involve the same matrix elements [18,26].
Our work was motivated in part by the growing interest in the calculation of nonlinear optical properties of novel materials such as Weyl semimetals and 2D materials. We hope that the proposed methodology, and its implementation in the Wannier90 code package, will help turn such calculations into a fairly routine task.
When describing the formalism, we tried to emphasize the notion that Wannier functions provide an essentially exact (in some chosen energy range) tight-binding parametrization of the ab initio electronic structure. Thus, we chose our notation and conventions so as to facilitate comparison with the expressions for nonlinear optical responses found in the tight-binding literature. Our numerical results suggest that it should be possible to systematically improve the tight-binding description of such responses by including off-diagonal position matrix elements as additional model parameters. In Ref. 31, an attempt was made along those lines to improve the tight-binding parametrization of semiconductors for the calculation of Born effective charges, but with limited success. Clearly more work is needed in this direction, and the shift current, with its strong sensitivity to the wavefunctions, is particularly well-suited for such investigations.
The shift current has been mostly studied in acentric crystals without magnetic order. The presence of timereversal symmetry in such systems implies The points k and −k now give equal contributions to the BZ integral, and Eq. (8) reduces to n,m In Ref. 26, a similar Wannier-interpolation scheme for calculating the shift current was proposed independently. The expression given in that work for the generalized derivative in the Wannier basis is however different from Eq. (36). In this Appendix, we show that the two formulations are in fact consistent with one another.
Below their Eq. (7), the authors of Ref. 26 write ab U,(B1) which follows from differentiating Eq. (19). The last term can be expressed in terms of D a = U † ∂ a U = −iA a as The non-Hermitean term iD b D a cancels the fourth term in Eq. (B1), leaving an expression for ∂ b A a that is correctly Hermitean, term by term. Let us now evaluate the term ∂ b D a assuming D a nn = 0 (parallel-transport) [26]. The off-diagonal matrix elements of the matrix D a read where v a mm was defined in Eq. (23b). Invoking Eq. (26) we find nm is eventually recovered (after using A b = r b , which holds in a parallel-transport gauge).
We can now proceed to compare with Ref. 26. Combining Eqs. (B2)-(B4) we obtain The first two terms in this equation agree with those in Eq. (8) of Ref. 26, and in the following we show that the remaining terms in both equations can also be brought into agreement. Dropping the first two terms of Eq. (B5) and using ω nm /(ω nl ω lm ) = 1/ω nl − 1/ω lm in the last term, we find 2  (24), a parallel-transport gauge was imposed on the U matrices while evaluating the Berry curvature in a Wannier basis. Should one then enforce the parallel-transport condition when choosing those matrices at neighboring k points? This is in fact not necessary, as we now show.
The Berry curvature of band n is given by the m = n element of the matrix Using which follows from Eqs. (18) and (25), we find Since the second term vanishes for m = n, we conclude that the Berry curvature, given by the band-diagonal entries in Eq. (C3), is insensitive to the value of the gaugedependent quantity A a nn . This is consistent with the fact that the Berry curvature is gauge invariant. In some previous ab initio calculations of the shift current [21,23], the velocity operator was approximated aŝ The interband velocity matrix elements v nm in the Bloch basis were then inserted into Eqs. (11) and (12) (dropping the term w ab nm in the latter) to obtain the interband dipole matrix r a nm and its generalized derivative r a;b nm . When using either an all-electron method (as in the GaAs calculation of Ref. 21) or local pseudopotentials, the above procedure is exact, at least when spin-orbit coupling is neglected. 3 However, modern pseudopotential calculations employ nonlocal pseudopotentials, for which that procedure introduces some errors: the velocity operator is not simply given by Eq. (D1) [44,45], and as a result the term w ab nm in Eq. (12) for r a;b nm becomes nonzero (see Appendix B in Ref. 26).
In this Appendix we perform additional calculations for single-layer GeS employing the same computational setup as used in Ref. 23 (ABINIT code [46] with Hartwigsen-Goedecker-Hutter pseudopotentials [47]), in order to estimate the errors arising from the use of the approximate procedure outlined above.
As a first step, we switched off by hand the nonlocal terms in the pseudopotentials. For a given k-point sampling and delta-function smearing, the resulting spectra Im zz r (ω) and σ zzz (0; ω, −ω) (not shown) were found to be in perfect agreement with those calculated by Wannier interpolated using the same local pseudopotentials. This provided a strong numerical check of our Wannier interpolation scheme, which does not depend on whether an all-electron or a pseudopotential method has been used, or on whether the pseudopotentials are local or nonlocal.
We then redid both calculations using the full nonlocal pseudopotentials. The results obtained by sampling the Im r, FIG. 7. (a) Shift-current spectrum, and (b) dielectric function of single-layer GeS calculated using an exact (red) and an approximate (blue) treatment of the optical matrix elements within the nonlocal-pseudopotential approach. The red curve was obtained with Wannier interpolation, while for the blue curve the optical matrix elements were calculated directly in the plane-wave basis using Eq. (D1).
2D BZ on a relatively coarse 70 × 70 grid with a fairly large delta-function broadening of 0.1 eV are shown in Fig. 7 (as a result of the coarse k-point sampling and of the large broadening, the spectral features are broadened compared to Fig. 5). There are clear differences between the spectra calculated in the manner of Ref. 23, and those obtained using the Wannier interpolation scheme: the positions of the peaks are the same, but their heights are somewhat different, as expected from a small change in the matrix elements. Given the perfect agreement that had been found with local pseudopotentials, these differences must arise exclusively from the approximate treatment of the optical matrix elements in the approach of Ref. 23 combined with nonlocal pseudopotentials. Since the level of disagreement seen in Fig. 7 is comparable to that seen in Figs. 5(a,b), it seems plausible that there the discrepancies may also arise in part from these small errors in the matrix elements.