A hierarchical Bayesian Beta regression approach to study the effects of geographical genetic structure and spatial autocorrelation on species distribution range shifts

Global climate change (GCC) may be causing distribution range shifts in many organisms worldwide. Multiple efforts are currently focused on the development of models to better predict distribution range shifts due to GCC. We addressed this issue by including intraspecific genetic structure and spatial autocorrelation (SAC) of data in distribution range models. Both factors reflect the joint effect of ecoevolutionary processes on the geographical heterogeneity of populations. We used a collection of 301 georeferenced accessions of the annual plant Arabidopsis thaliana in its Iberian Peninsula range, where the species shows strong geographical genetic structure. We developed spatial and nonspatial hierarchical Bayesian models (HBMs) to depict current and future distribution ranges for the four genetic clusters detected. We also compared the performance of HBMs with Maxent (a presence‐only model). Maxent and nonspatial HBMs presented some shortcomings, such as the loss of accessions with high genetic admixture in the case of Maxent and the presence of residual SAC for both. As spatial HBMs removed residual SAC, these models showed higher accuracy than nonspatial HBMs and handled the spatial effect on model outcomes. The ease of modelling and the consistency among model outputs for each genetic cluster was conditioned by the sparseness of the populations across the distribution range. Our HBMs enrich the toolbox of software available to evaluate GCC‐induced distribution range shifts by considering both genetic heterogeneity and SAC, two inherent properties of any organism that should not be overlooked.

Such improvements are aimed at considering all the possible organism responses to GCC, such as shifts in dispersal ability, phenology and physiology of life-history traits (Bellard, Bertelsmeier, Leadley, Thuiller, & Courchamp, 2012;Lenoir & Svenning, 2015). However, precise data on these responses are lacking for many organisms because of the intensive amount of labour and data needed to estimate them properly for multiple populations across large areas of the distribution range. Nonetheless, modelling approaches clearly have to go beyond presence-background models and related approaches (e.g., presence-absence models, random pseudo-absence point models) using current and future climatic conditions to increase their accuracy and reliability (Guisan, Thuiller, & Zimmermann, 2017). In this study, we address this issue by considering two important biological aspects that should be considered when modelling current distribution ranges of terrestrial organisms and their GCC-induced shifts.
First, demographic processes (i.e., extinction/colonization dynamics and dispersal ability) and adaptation to local environmental conditions determine the extent of population stratification, namely geographically distributed allele frequencies depicting subpopulations or clusters at different spatial scales (Anderson et al., 2010).
Molecular data are commonly used to infer the number of genetically differentiated clusters and their degree of admixture (Falush, Stephens, & Pritchard, 2003;Pritchard, Stephens, & Donnelly, 2000). In addition, it is very informative to determine whether such genetic clusters are geographically distributed across the species distribution range (Novembre et al., 2008;Rosenberg et al., 2005). This is because we can interpret the number of genetic clusters, their geographical distribution and their degree of admixture as the result of all demographic processes and adaptive forces acting on populations. This paradigm has steadily gained ground in studies estimating future distribution range shifts due to GCC by means of species distribution models (SDMs; Bálint et al., 2011;D'Amen, Zimmermann, & Pearman, 2013;Diniz-Filho et al., 2016;Gotelli & Stanton-Geddes, 2015;Ikeda et al., 2017;Jay et al., 2012;Lima, Ballesteros-Mejia, Lima-Ribeiro, & Collevatti, 2017;Marcer, Méndez-Vigo, Alonso-Blanco, & Picó, 2016;Milanesi et al., 2018;Yannic et al., 2014), stressing the need to consider the inherent genetic heterogeneity of organisms.
From a methodological viewpoint, working with intraspecific patterns of genetic diversity implies the combination of presenceonly data, which commonly feed SDMs such as Maxent, with genetic structure data, which are generally expressed as genetic cluster membership proportions (ranging between 0 and 1) that inform on the degree of genetic admixture (Serra-Varela et al., 2017). The problem arises when admixture information is lost because individuals have to be assigned to a single cluster, generally the one with the highest membership proportion according to some arbitrary threshold to run presence-only SDMs (Gotelli & Stanton-Geddes, 2015;Ikeda et al., 2017;Marcer et al., 2016). In doing that, the amount of data and information lost depends on the genetic structure of the study organism. For instance, species with a pronounced genetic structure will probably have individuals with high genetic cluster membership proportions, which facilitates the assignment of individuals to single genetic clusters. In contrast, individual assignment to single genetic clusters will exhibit higher uncertainty for weakly genetically structured organisms (e.g., high levels of individual genetic admixture), posing problems for the development of SDMs to study the effects of GCC on their patterns of geographic genetic structure. Either way, we lose valuable information that may reduce the value and impact of GCC model outcomes and therefore our understanding of the GCC effects on biodiversity.
Here, we use hierarchical Bayesian models (HBMs), which account for the geographical distribution of intraspecific genetic diversity and the presence of SAC, to analyse current distribution range as well as the effect of GCC on its shifts. To that end, we use a collection of 301 natural populations of the annual plant Arabidopsis thaliana occurring in the Iberian Peninsula, the region of the distribution range with the highest genomic diversity (The 1001 Genomes Consortium, 2016). Genome-wide markers are used to infer Iberian A. thaliana's geographical genetic structure by estimating genetic cluster membership proportions. To better understand the potential of our model, we compare three approaches, Maxent and two Bayesian, representing a gradient of complexity in the treatment of intraspecific genetic data and SAC, in particular (a) presence-only SDMs (Maxent) that do not take SAC into account and are based on binary data for genetic cluster membership proportions; (b) nonspatial HBMs not accounting explicitly for SAC and based on continuous data for genetic cluster membership proportions; and (c) spatially explicit HBMs accounting for SAC and based on continuous data for genetic cluster membership proportions. Although HBMs represent well-established methods for statistical inference in several research fields, the application of Beta regression to climate-driven shifts in species distribution range is not common (see Martínez-Minaya, Cameletti, Conesa, & Pennino, 2018).
In particular, we promote the use of Beta regressions where data fitting can be achieved using integrated nested Laplace approximation (INLA) rather than Markov chain Monte Carlo (MCMC) methods. We discuss our results in terms of the relevance of intraspecific genetic variation and SAC to better interpret and contextualize the implications of GCC on species distribution range shifts, but also identifying the limitations and caveats of our approach.
Populations included in this study came from field surveys that spanned 12 years (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011). We sampled seed from several individuals from each population. Every year and a few months after every survey, field-collected seed was multiplied by the single seed descent method in a glasshouse at the Centro Nacional de Biotecnología (CNB-CSIC) in Madrid. Seeds that had multiplied were stored in dry, dark conditions in cellophane bags at room temperature, storing conditions that can preserve A. thaliana seeds for years.
In this study, we used one representative individual (hereafter accession) per population to analyse the genetic structure of A. thaliana in the Iberian Peninsula. Importantly, accessions exhibited common phenotypes within their populations based on flowering time and/ or the vernalization requirement for flowering, which are traits under strong selection in Iberian A. thaliana (Méndez-Vigo, Gomaa, Alonso-Blanco, & Picó, 2013) that appear to be mediated by variation in temperature (Méndez-Vigo et al., 2011;Vidigal et al., 2016).
This procedure increased the odds of using accessions best suited to their local environments and, therefore, common genotypes in the populations.
In short, SNPs were selected based on their polymorphism shown in Central Europe, the Iberian Peninsula and in a worldwide collection of accessions, and were genotyped using the SNPlex technique  . In brief, model settings included haploid multilocus genotypes, correlated allele frequencies between populations and a linkage model. Each run consisted of 50,000 burn-in MCMC iterations and 100,000 MCMC after-burning repetitions for parameter estimation. To determine the K number of ancestral populations and the ancestry membership proportions of each accession in each population, the algorithm was run 20 times for each defined number of groups (K value) from 1 to 10. The number of distinct genetic groups was determined by testing the differences between the data likelihood for successive K values using Wilcoxon tests for two related samples. The final K number was estimated as the largest K value with significantly higher likelihood than that of K −1 runs (two-sided p < 0.005). This was supported by a high similarity among the ancestry membership matrices from different runs of the same K value (H′ = 0.99). The average symmetric similarity coefficient H′ among runs and the average matrix of ancestry membership proportions, derived from the 20 runs, were calculated with clumpp version 1 (Jakobsson & Rosenberg, 2007). This analysis inferred four genetic clusters in the Iberian Peninsula (Figure 1), in agreement with previous studies on A. thaliana's genetic structure (Gomaa et al., 2011;Manzano-Piedras et al., 2014;Marcer et al., 2016;Méndez-Vigo et al., 2011;Picó et al., 2008).

| Climatic variables and GCC scenarios
We selected a total of eight bioclimatic predictors to define the climatic space: annual mean temperature (BIO1), mean diurnal range (BIO2), isothermality (BIO3), temperature seasonality (BIO4), mean temperature of the wettest quarter (BIO8), annual precipitation (BIO12), precipitation seasonality (BIO15) and precipitation of the warmest quarter (BIO18). These bioclimatic predictors were selected because their pairwise correlation coefficients were <0.7, a threshold value commonly used to avoid unacceptable colinearity among independent variables (Pino, Font, Carbó, Jové, & Pallarès, 2005). We used the dismo R package (Hijmans, Phillips, Leathwick, & Elith, 2017) to retrieve these climate layers from the Digital Atlas of the Iberian Peninsula (http://www.openg is.uab.cat/wms/iberi a/), which provides interpolated surface layers of mean monthly data obtained from 2,285 weather stations for the period 1951-1999. We refer to this time period as year 2000.
We chose the year 2070 as the scenario to evaluate A. thaliana's distribution range shifts due to GCC. In order to use the most and least conservative GCC scenarios, we selected the representative concen-

| Species distribution models
Species distribution models link information on the presence/absence or abundance of a species to environmental variables to predict where it is likely to be present in unsampled locations or time periods (Elith & Leathwick, 2009;Guisan & Thuiller, 2005). In recent years, the quantity and the quality of the data sets have increased substantially, resulting in a higher complexity of the statistical issues that have to be addressed when an SDM is created. As a result of this increasing complexity, the performance of the SDM inferential and predictive processes are becoming more challenging, forcing researchers to develop new sophisticated statistical techniques (reviewed by Martínez-Minaya et al., 2018). Given the flood of methodologies developed to address this issue, we compared SDMs obtained with two contrasting approaches: a presence-only model (Maxent) and the hierarchical Bayesian Beta regressions (with and without a spatial term).

| Maxent
We used Maxent version 3.3.3k (Elith et al., 2011;Phillips, Anderson, & Schapire, 2006) to model the current and future distribution of A. thaliana's genetic clusters. As Maxent uses presence-only data, we assigned each of the 301 accessions to its predominant genetic cluster using a cut-off value of 0.5 to each genetic cluster membership proportion given by structure (as in Marcer et al., 2016). As a result,  (Table S1). We then reran the chosen model with all data points to obtain the final model. Maxent was used with default parameters with the exception of features, which were limited to the hinge type, making it similar to a Generalized Additive Model (Elith et al., 2011).

| Hierarchical Bayesian Beta regression
Spatial and nonspatial HBMs were also used to model the current and future distribution of A. thaliana's genetic clusters. In particular, spatial and nonspatial Beta regressions were conducted to estimate the probability of genetic cluster membership, which in this particular context can be thought of as the habitat suitability for each genetic cluster. In contrast to Maxent, Beta regressions allowed us to model each genetic cluster separately using all genetic information available, that is, the genetic cluster membership proportions of all 301 accessions. In other words, no data were excluded.
The class of Beta regression models is commonly used to model variables that assume values in the unit interval (between 0 and 1; Ferrari & Cribari-Neto, 2004), such as the case of membership probabilities of genetic clusters. A Beta distribution depends on two scaling parameters, Beta (a,b), which can be parameterized in terms of its mean, , a dispersion parameter, = a + b, and the variance, 2 = (1− ) 1+ . This parameterization better supports the truncated nature of the Beta distribution because the variance depends on the mean, which translates into maximum variance at the centre of the distribution and minimum variance at the edges. In addition, the dispersion of the distribution for a fixed decreases as increases. We did not transform the data to avoid the problems posed by extreme values, as proposed elsewhere (Cribari-Neto & Zeileis, 2010), because data fell far enough from the extremes of the Beta distribution.
As we were interested in depicting the relationship between the probabilities of genetic cluster membership and the bioclimatic predictors, we linked the mean and the precision of the response variable to the linear bioclimatic predictors via suitable link functions. In particular, if Y i represents the genetic cluster membership probability where i and i are the Beta distribution parameters at location i . We used the logit and log links for i and i , respectively. The mean was linked to climatic covariates (nonspatial term) and, in the case of spatial models, to a stochastic spatial effect (spatial term). Precision was assumed to be not dependent on any effect. The resulting model with a spatial term is known as a point-referenced spatial Beta regression (Paradinas et al., 2016(Paradinas et al., , 2018. It is highly suitable for situations in which data are observed at continuous locations occurring within a defined spatial domain: where is the vector of regression coefficients ( 0 , 1 , … , c ), X i is the vector corresponding to the ith row of the design matrix whose first element is 1 (the one multiplying the intercept 0 ), the covariate values at location i being the remaining elements, and W i is the spatially structured random effect at each location i. W is assumed to be a multivariate Gaussian distribution whose covariance matrix, 2 W H ( ), depends on the distance between locations, and its parameters, 2 W and , represent the variance and range of the spatial effect, respectively.
In the context of HBMs, parameters were treated as random variables and prior knowledge was incorporated using the corresponding prior distributions. These priors were specified in the second stage jointly with random effects. In the third and final level of the hierarchy, prior knowledge about the hyperparameters was expressed. This hierarchical structure can also be considered as a latent Gaussian model (Rue & Held, 2005).
As posterior distributions for the parameters and hyperparameters do not have an analytical expression, numerical approximations are usually needed. In the case of latent Gaussian models, INLA (Rue, Martino, & Chopin, 2009) is a computationally efficient alternative to the MCMC method. However, to fit and predict the particular case of continuously indexed Gaussian fields with INLA, W in our case, an additional module is required. Lindgren, Rue, and Lindström (2011) proposed an approach using an approximate stochastic weak solution to a stochastic partial differential equation (SPDE) as a Gaussian Markov random field (GMRF) approximation to continuous Gaussian fields with Matérn covariance structure, a highly flexible and general family of functions in spatial statistics (Rue & Held, 2005). The Markov property allowed the use of a precision sparse matrix, enabling efficient numerical algorithms. Under this approximation, the spatial effect is reparameterized as follows: Here, W depends on two different parameters, and , which determine the range of the effect and the total variance, respectively.
More precisely, the range is approximately = √ 8 and the variance is 2 W = 1 4 2 2 (Lindgren et al., 2011). We specified prior distributions for the parameters and hyperparameters. In particular, normal vague priors with mean 0 and precision 10 −4 were used for the vector of regression coefficients.
Although internally INLA works with parameters and , we specified the spatial effect in terms of and W using the reparameterizations log ( ) and log W as independent normal vague distributions (Lindgren & Rue, 2015).
Overall, the full model was stated as follows: where m was automatically chosen so that the prior mean of φ was about 50% the diameter of the study geographical region, while m W was chosen in a way that the corresponding variance of the field was 1. For our analysis, this resulted in m = 13.517 and m W = 0. Finally, the default a priori precisions for log ( ) and log w distributions were q = 0.25 and q W = 0.25, respectively. These latter values, q and q W , express the large uncertainty about the parameters before the analysis, resulting in quite noninformative hyperpriors. This is important because it allows the range to take values between 0 and the total diameter of the Iberian Peninsula. In contrast to Maxent, HBMs can take space into account when modelling distribution ranges, which provides the possibility to evaluate its mean effects and uncertainty. As mentioned above, once the inference is made, the main interest becomes to predict the response in unsampled locations. To do that, we applied the SPDE by constructing a Delaunay triangulation (Hjelle & Daehlen, 2006) covering the whole Iberian Peninsula ( Figure S1).

| Model selection, distribution range shifts and residual SAC
Hierarchical Bayesian models were run with and without the spatial component with the r-inla package (Lindgren & Rue, 2015) in order to quantify its effects on distribution range shifts with GCC and to be compared with Maxent outcomes. We fitted all possible models given by the set of combinations among the eight climatic predictors without interactions. To select the best model, we used as a summary statistic of the conditional predictive ordinate (CPO; Geisser, 1993), which gives an overall measure of predictive performance (Hooten & Hobbs, 2015). Conditional predictive ordinate is defined as the cross-validated predictive density at a given observation. Conditional predictive ordinate can be used to compute predictive measures, such as the logarithmic score (Gneiting & Raftery, 2007) or the cross-validated mean Brier score (Schmid & Griffith, 2005). Among the best five models for each ge- All models mentioned above were checked for residual SAC. We calculated the residuals for model predictions between observed and predicted values and tested for residual SAC using the spdep R package (Bivand, Hauke, & Kossowski, 2013;Bivand & Piras, 2015 considered as SAC-free. As expected, spatial HBM did not show residual SAC, while nonspatial HBM did. All Maxent models, except for genetic cluster C4, also retained residual SAC (Table S2).

| Current distribution range
We compared the performance of three modelling approaches, Maxent and HBMs with and without the spatial component, to depict the current distribution range of four genetic clusters of A. thaliana using eight selected bioclimatic predictors. Maxent models included between four and seven bioclimatic predictors per genetic cluster (Table 1). With these predictors, Maxent yielded a clear geographical distribution of genetic cluster ranges in the Iberian Peninsula (Figure 2a), as found in earlier studies using the same approach but with different data (Marcer et al., 2016).
In the case of Bayesian models, spatial and nonspatial HBMs produced broadly similar geographic distributions of genetic clusters to those generated by Maxent models, particularly for genetic clusters C1 and C4 (Figure 2a). In the case of genetic cluster C2, the spatial HBM depicted a rather continuous distribution in northeast Spain, which clearly differed from those given by Maxent and nonspatial HBM, showing the truncated distribution that this genetic cluster actually has in the wild (Figure 2a). In general, spatial HBMs and Maxent models showed more compact distribution ranges than nonspatial HBMs, the former with more dramatic transitions between low and high probability values in all genetic clusters (Figure 2a; Figure S2).
The exception was genetic cluster C3, whose predicted distribution range with nonspatial HBMs was rather blurred in comparison with that obtained with Maxent ( Figure 2a). In fact, it was not possible to fit spatial HBMs for genetic cluster C3 because the results were inconsistent. When using vague hyperpriors for the range of the spatial effect, the resulting mean of the posterior distribution of the range was larger than the whole study area. By contrast, when using more informative priors, results were different and very much conditioned by prior selections. Thus, the spatial effect for genetic cluster C3 did not provide explanation further to what can already be explained by the bioclimatic predictors.
For all genetic clusters, the uncertainty of the predictive mean for nonspatial HBMs was lower and more evenly distributed across space than that for spatial HBMs (Figure 2b). The main reason for this apparent reduction in uncertainty is that spatial models reflect the intrinsic variability of the Beta-distributed data, variability that is not reflected by nonspatial models. As a result, the distribution of means and standard deviations for spatial HBMs was more pronounced than those for nonspatial HBMs (Figure 2; Table   S3). Nonetheless, mean values of the posterior distribution of the precision parameter, which are inversely proportional to the variance of the data, were larger in the spatial HBMs, reflecting their acceptable behaviour (Table S4). In the case of spatial HBMs, these models allowed the spatial effects to be visualized, which clearly were more intense at the centre of the genetic cluster distribution ranges ( Figure 3). Uncertainty of the mean of the spatial effect was greater for genetic cluster C4 than for the other two genetic clusters ( Figure 3). Overall, spatial HBMs selected fewer bioclimatic predictors than nonspatial HBMs and Maxent models to define the distribution range of the four genetic clusters (Table 1; Table S5), particularly for genetic cluster C4. Finally, the combination of bioclimatic predictors used by Maxent models and spatial and nonspatial HBMs was quite different (Table 1).
Mean absolute error and RMSE were lower for spatial than for nonspatial HBMs for all genetic clusters in which the comparison was possible (Table 2). This indicated that spatial HBMs had lower average model prediction errors in the response variable.

| Distribution range shifts with GCC
Maxent models and HBMs were also used to quantify distribution range shifts of A. thaliana's genetic clusters under different GCC models and scenarios. The three modelling approaches yielded different GCC predictions for each genetic cluster based on suitability shifts in distribution range projections (Table 3, Figure 4; Figure S3). Overall, Maxent showed a trend of predicting more dramatic changes in distribution range due to GCC for all genetic clusters compared to spatial and nonspatial HBMs (Table 3, Figure 4). For genetic cluster C1, important reductions in distribution range were predicted for the two GCC scenarios with Maxent and nonspatial HBMs, whereas spatial HBMs predicted slight increases (Table 3, Figure 4). For genetic cluster C2, Maxent predicted increasing and decreasing distribution ranges with RCPs 2.6 and 8.5, respectively, whereas both HBMs predicted small fluctuations in distribution range in both GCC scenarios (Table 3, Figure 4). For genetic cluster C3, Maxent showed very large increases in distribution range, particularly for the RCP 8.5 scenario, whilst nonspatial HBMs predicted slight fluctuations in distribution range in both GCC scenarios (Table 3, Figure 4).
Finally, for genetic cluster C4, all approaches predicted increases in distribution range in both GCC scenarios. Maxent gave higher increases in RCP 2.6 than in RCP 8.5 and vice versa for both HBMs (Table 3, Figure 4). inherent to practically all organisms: the geographical distribution of intraspecific genetic variation and the SAC in the data. Importantly, both geographical genetic structure and SAC can be considered as indicators of eco-evolutionary forces shaping species' distribution ranges, such as colonization/extinction dynamics, dispersal ability, local adaptation and historical factors.

| Current distribution range
The selection of the modelling approach may have significant repercussions when considering a species as a genetically heterogeneous organism, whose geographical distribution of its genetic variation may have major implications for understanding the effects of GCC affecting the distribution range may take place, such as the balance between selection and dispersal against hybrids (Barton & Hewitt, 1985). For this reason, HBMs represent a better choice to model distribution ranges of intraspecific genetic lineages if it is undesirable to discard accessions with too much genetic admixture.
Broadly speaking, Maxent and the Bayesian modelling approaches were consistent in depicting the current geographical distribution of the four genetic clusters of Iberian A. thaliana (Figures 1 and 2). The exception was genetic cluster C3, in which nonspatial HBM blurred the distribution range and spatial HBM was not able to produce results due to unacceptable outcomes in a Bayesian framework. Interestingly, genetic cluster C3 is strongly differentiated from the other clusters found in the Iberian Peninsula, as well as across the whole species' distribution range.
The relict nature of genetic cluster C3 is also supported by its F I G U R E 3 Mean and standard deviation of the spatial effects included in the spatial hierarchical Bayesian models (HBMs). Darker and lighter intensities (logit scale) indicate higher and lower spatial effects, respectively. Grey maps for genetic cluster C3 indicate that the spatial HBMs were not acceptable  Maxent predicts its distribution best. For the remaining genetic clusters with marked geographical distributions (NW, NE and SW Iberian Peninsula for genetic clusters C1, C2 and C4, respectively), Maxent and Bayesian approaches were able to model their current distribution ranges. However, they exhibited some differences among genetic clusters. For example, genetic cluster C2 exhibits a disjunct distribution due to a major geographical barrier (i.e., the Ebro river valley in NE Spain), which was clearly depicted by Maxent (Figure 2). Note that such a disjunct distribution is not a sampling problem, but is the result of the low occurrence of the species confirmed after repeated field campaigns in the region.
Bayesian Beta regression approaches, particularly the spatial HBM, blurred (but did not totally erase) the disjunct distribution of this genetic cluster. In contrast, Maxent and Bayesian Beta regression approaches were more consistent for genetic clusters C1 and C4, which exhibited more compact distributions. Overall, we conclude that the continuity of a cluster's distribution range increases its suitability to be modelled by alternative means.
Spatial HBMs, along with Maxent for genetic cluster C4, did not show residual SAC, which is a desirable property to avoid inaccurate parameter estimates and inadequate quantification of uncertainty (Beguin et al., 2012;Crase et al., 2014;Latimer et al., 2006;Record et al., 2013). In addition, spatial HBMs exhibited lower average model prediction errors than nonspatial HBMs. Hence, and from a purely statistical viewpoint, the higher rigour of spatial HBMs, in terms of higher accuracy and efficient removal of residual SAC, gives them a clear advantage (Crase et al., 2014;Swanson et al., 2013). Spatial HBMs also allowed us to assess the spatial effects on distribution range, which were quite compact and with high intensities at the distribution range centre (Figure 3). Such patterns may account for the lower number of bioclimatic predictors selected by spatial HBMs in comparison with nonspatial HBMs and Maxent. In fact, the reduction of predictors represents a common shortcoming of SDMs that, in some cases, may jeopardize the biological interpretation of the environmental factors underlying current distribution ranges (Beale et al., 2010;Swanson et al., 2013). We want to emphasize, however, that the five best spatial HBMs for each genetic cluster included additional variables compared to the best model, and all models were quite similar in terms of DIC, WAIC and LCPO values (Table S5).
Thus, we have different options to identify environmental drivers of current distribution ranges. Furthermore, the reduction of predictors in spatial models may not reduce the models' interpretability.

| Distribution range shifts with GCC
Taking spatial effects into account had a profound effect on the predictions of distribution range shifts due to GCC for A. thaliana's genetic clusters in the Iberian Peninsula. In general, spatial HBMs exhibited more conservative patterns of change compared to Maxent and nonspatial HBMs (Figure 4). This result is in agreement with other research suggesting that environment-only models forecast substantially greater range shifts compared to models incorporating spatial effects (Crase et al., 2014;Swanson et al., 2013). The rationale is that organisms exhibiting a high SAC in the environmental drivers accounting for their distribution ranges will have larger areas with similar climates, which will also make GCC effects more predictable and homogeneous across space (Nadeau, Urban, & Bridle, 2017 indicating that genetic cluster C1 will maintain and even increase its current distribution range (Figure 4). The strong spatial effects detected by spatial HBMs for genetic cluster C1 may account for this result, as the response of A. thaliana to GCC is also expected to be more homogeneous. Recent experimental data from transplant experiments using accessions from this genetic cluster indicate that this scenario may be plausible, as A. thaliana performed well in warmer environments, highlighting the potential of this genetic cluster to cope with warming (Exposito-Alonso, Brennan, Alonso-Blanco, . The same applies to genetic cluster C4, which is also distributed continuously in the typically Mediterranean SW Iberian Peninsula. In this case, however, all modelling approaches predicted its expansions with GCC, although some discrepancies among modelling approaches were recorded (Figure 4). Although we lack experimental evidence of the effects of warming on the performance of C4 A. thaliana accessions, we believe that such expansion with GCC is highly probable, as genetic cluster C4 mostly occupies the warmest Iberian region.

| CON CLUS IONS
We developed hierarchical Bayesian Beta regression models to explore the current distribution range and its GCC-induced shifts for an organism with a marked geographical genetic structure, which represents the outcome of historical, ecological and evolutionary forces probably acting in concert. For this reason, the effects of GCC have to be understood as a mosaic of responses varying in extent and intensity determined by the complexity of the geographical genetic structure exhibited by study organisms. Rather than predicting mere contractions or expansions for a single organism, we should expect a reshuffling of the genetic diversity and its geographical structure with GCC, which is obviously more difficult to predict. The HBMs developed here enrich the toolbox of software available to deal with such expectation.
From a statistical viewpoint, our HBMs allow the modelling of each genetic cluster and avoid the binarization of genetic cluster membership proportions, required by Maxent, which may imply an important data loss. This has the advantage that populations with high admixture can be included in HBMs. In addition, our HBMs can take the SAC of data into account, which not only improves the statistical properties of the model (i.e., removal of residual SAC) but also adds realism, as SAC may represent the result of eco-evolutionary processes shaping distribution ranges. Despite such desirable properties, our simulations of current and future distribution ranges of the four genetic clusters of Iberian A. thaliana indicated that the ease of modelling is strongly related to the continuity of their distributions. Furthermore, biological knowledge of the study organism (e.g., the identification of relict genetic lineages based on whole-genome markers, the detection of void areas after years of extensive field sampling, and the experimental quantification of plant performance with warming) emerges as an essential element in the understanding of model outputs. Finally, we believe that further work should also be conducted to validate model outputs by independent means (i.e., the assignment of new A. thaliana populations to genetic clusters based on model predictions).
We conclude by stressing the importance of developing better models to forecast the effects of GCC on organisms' distribution ranges worldwide. Such predictive tools, and the comparison thereof, may lead to mitigation of the inevitable impact of GCC on biodiversity.
However, we need to increase our understanding of the evolutionary (e.g., physiological adaptive responses) and demographic (e.g., extinction/colonization dynamics and dispersal ability) factors accounting for the response of organisms to environmental changes imposed by GCC. F.X.P. led the writing and all authors made significant contributions to its final version through several revisions.

DATA ACCE SS I B I LIT Y
The data that support the findings of this study are openly available