Conservation of aquatic insects in Neotropical regions: A gap analysis using potential distributions of diving beetles in Cuba

1. Human activities are an increasing threat to Neotropical freshwater ecosystems, with the potential extinction of thousands of aquatic species. Despite this, knowledge about the effectiveness of protected area networks in protecting aquatic insects in this biogeographical region is very limited. 2. Cuba supports the highest diversity of aquatic insects in the Antilles, with a large number of endemics. 3. A gap analysis was conducted to assess the effectiveness of the National System of Protected Areas of Cuba (NSPAC) in the conservation of Cuban diving beetles (family Dytiscidae). This involved considering the areas with the highest potential species richness, estimated by using species distribution models with three different approaches (MaxEnt, Random Forest and Support Vector Machine), and the known localities of endemic species.

1. Human activities are an increasing threat to Neotropical freshwater ecosystems, with the potential extinction of thousands of aquatic species.Despite this, knowledge about the effectiveness of protected area networks in protecting aquatic insects in this biogeographical region is very limited.
2. Cuba supports the highest diversity of aquatic insects in the Antilles, with a large number of endemics.
3. A gap analysis was conducted to assess the effectiveness of the National System of Protected Areas of Cuba (NSPAC) in the conservation of Cuban diving beetles (family Dytiscidae).This involved considering the areas with the highest potential species richness, estimated by using species distribution models with three different approaches (MaxEnt, Random Forest and Support Vector Machine), and the known localities of endemic species.
4. The highest potential species richness of Dytiscidae in Cuba is predicted to occur in the low-medium altitude of the eastern mountain areas.Although most of these areas occur inside the NSPAC, several areas of potential high species richness are currently unprotected.It is recommended that sampling programmes are carried out in areas with high predicted species richness to validate the species distribution models.5.The distribution of three Cuban endemic species (Copelatus barbouri, Desmopachria glabella and Celina cubensis) lies completely outside of the NSPAC.
Despite their conservation interest as threatened endemic species, they are currently unprotected.6.To improve the conservation of freshwater biodiversity in Cuba it is recommended that (i) the NSPAC network is extended to protect areas supporting endemic species and those with the highest potential species richness that are currently unprotected, and (ii) a whole-catchment management approach, specifically to maintain natural flows, should be adopted, especially in the mountainous areas of eastern Cuba.

| INTRODUCTION
Biodiversity conservation is critical to human sustainability because it drives the ecosystem services that provide the core of human lifesupport systems.However, owing to increasing human impacts worldwide, species extinction rates at present may be similar to those of past mass extinction events (Ceballos et al., 2015).Several authors have concluded that freshwater biodiversity is at greater risk than biodiversity in any other ecosystem (Allan & Flecker, 1993;Master, Flack & Stein, 1998;Ricciardi & Rasmussen, 1999;Albert et al., 2021).
For example, approximately 83% of freshwater species have declined globally since 1970, and the greatest biodiversity loss has been observed in the Neotropics, Indo-Pacific, and Afrotropics (World Wildlife Fund, 2018).These findings are based on evaluations conducted using only vertebrate species (freshwater fishes and amphibians), but similar declines are likely to be experienced by aquatic macroinvertebrates, which constitute the overwhelming majority of aquatic biodiversity.Indeed, the International Union for Conservation of Nature (IUCN) Red List assessment found that 15% of dragonflies and damselflies (Odonata) were under threat of extinction (Collen et al., 2012).Despite these clear trends, freshwater biodiversity continues to receive less attention than its terrestrial and marine counterparts, particularly in tropical regions (Boyero et al., 2009;Godet & Devictor, 2018).A recent study (Sundar et al., 2020) revealed a lack of adequate taxonomic, phylogenetic and ecological information for most aquatic macroinvertebrate groups in tropical regions, and consequently large-scale knowledge gaps regarding the response of macroinvertebrate diversity to human impacts in these regions.
A common approach to minimizing the loss of biodiversity is the establishment of protected areas.These play a key role in the protection of biodiversity and are the mainstay of most conservation polices (Chape et al., 2005).One of the main challenges for conservation biologists is assessing the effectiveness of protected areas.Several authors have concluded that the conservation of biodiversity by existing protected areas is inadequate in several regions throughout the world and at different geographical scales (Rodrigues et al., 2004;Araújo, Lobo & Moreno, 2007;de Carvalho et al., 2017;Gonçalves et al., 2018;Sundar et al., 2020).Although species loss in freshwater habitats is alarmingly high, very few assessments have been carried out of protected area networks based specifically on their capacity to protect aquatic insects (Abell an et al., 2007;S anchez-Fern andez et al., 2008), especially in Neotropical regions (but see N obrega & de Marco, 2011).This is probably because knowledge of the spatio-temporal distribution of aquatic insects in the Neotropics is still very incompletethe so-called Wallacean shortfall (Whittaker et al., 2005).
One way to overcome the Wallacean shortfall is to use species distribution models (SDMs) (Guisan & Zimmermann, 2000;Elith & Leathwick, 2009).When applied correctly, SDMs offer a useful approach to identify the areas with the highest environmental suitability as well as the key environmental factors determining spatial patterns of occurrence (Guisan & Zimmermann, 2000).In addition to exploring basic ecological questions, SDMs also have practical applications for nature conservation, such as the potential to identify areas with a high probability of finding new populations of threatened species (Rebelo & Jones, 2010), providing support to species conservation or reserve planning (Carvalho et al., 2010;Doko et al., 2011), identifying gaps in geographical distribution and assessing the degree of protection coverage provided by nature reserves (Doko et al., 2011;Domíguez-Vega et al., 2012).Therefore, predicting the species distribution using ecological niche modelling offers a powerful approach to identify under-sampled areas that are potentially very important from a conservation perspective.
This study focused on one of the few aquatic insect groups with relatively good taxonomic and faunistic knowledge in Neotropical areas: the diving beetles (family Dytiscidae) in Cuba (see Megna & S anchez-Fern andez, 2016).The Dytiscidae (predaceous diving beetles) is a large cosmopolitan family of Coleoptera, with an estimated 4,400 species worldwide (Nilsson & H ajek, 2019), and more than 1,200 in Neotropical areas (Jäch & Balke, 2008).With a few exceptions, all members of the family are well adapted to aquatic life, both adults and larvae, inhabiting a wide range of lentic and lotic habitats (Balke & Hendrich, 2016).The highest diversity of Dytiscidae in the Antilles occurs in Cuba with 53 species (7% of the species of the family in the Neotropical area), followed by the Bahamas (28) and Hispaniola (23) (Megna & S anchez-Fern andez, 2016).Cuba supports a high rate of endemism, with almost a quarter (23%) of the 53 species of diving beetles being found exclusively in that country.Almost half of the Cuban species have been identified as highly or moderately vulnerable (Megna et al., 2018).These taxa represent the majority of water beetle species in most regions of the world and could serve as indicators of diversity for other macroinvertebrate groups (S anchez-Fern andez et al., 2006;Slimani et al., 2019).
A gap analysis was conducted to assess whether the National System of Protected Areas of Cuba (NSPAC) is effective in the conservation of Cuban diving beetles, based on the coverage of the highest potential species richness derived by species distribution models.However, mapping areas of high species richness is only one proxy used for the establishment of priority areas for conservation, with the premise that conserving only the richer areas does not necessarily imply the conservation of rare or endangered species (Ceballos & Ehrlich, 2006;Jenkins, Pimm & Joppa, 2013).This was also investigated by assessing the proportion of the observed distribution of Cuban endemic species that occur within the NSPAC.

| Study area and biological dataset
Cuba is an archipelago composed of one long and narrow island (the largest of the Greater Antilles, at >106,757 km 2 ), Isla de la Juventud (2,419 km 2 ) and a set of more than 4,000 cays and islets (with a total area of around 3,126 km 2 ) (Figure 1).Nearly 70% of Cuba consists largely of flatlands, with only a few significant mountain ranges: the western Sierra de Guaniguanico, the central range of Guamuahaya and Sierra Maestra, and the eastern Nipe-Sagua-Baracoa Mountains (Figure 1).
Distributional data for the species of the family Dytiscidae were obtained from the DICU (Ditíscidos de Cuba) database (Megna & S anchez-Fern andez, 2016).This database represents the most complete information available for a major group of freshwater invertebrates in the study area, including all available geographical and biological data for water beetles from the literature, museum, private collections, PhD theses and other unpublished sources from 1856 to 2014 (Megna & S anchez-Fern andez, 2016).The database contains a total of 708 records for the 53 species that currently occur in Cuba.

| Environmental variables
Climatic data were obtained from Worldclim, version 1.4 (http:// www.worldclim.org;Hijmans et al., 2005).This database contains 19 bioclimatic variables that were obtained by interpolation of climate station records from 1950 to 2000 (Table S1).A variable related to vegetation type, divided into 32 categories or types of vegetation according to Capote et al. (1989) was also used (see Supporting Information Table S2).All the variables were generated at a resolution of 2.5 min ($4.4Â 4.4 km = 19.4km 2 at the equator).A principal component analysis was performed to generate orthogonal independent variables, which were used as inputs in the SDMs to decrease the number of predictors and to avoid collinearity in the models (Dormann et al., 2013).

| Species distribution modelling
Only those species with at least five recorded localities were modelled.As one of the sources of uncertainty in SDMs is the type of algorithm used (Pearson et al., 2006;Watling et al., 2015), the models were generated using three different algorithms: MaxEnt, Random Forest and Support Vector Machine.They are commonly used machine learning algorithms, applicable to presence-only data, and have been shown to perform well when compared with other methods, especially to predict the spatial distribution of species whose available presence data are limited and where false absence in surveys is a significant risk (Hernandez et al., 2006;Phillips & Dudik, 2008;Wisz et al., 2008;Elith & Leathwick, 2009;Mi et al., 2017;Velazco et al., 2019).Kadmon, 2006).The TSS measures the difference between the rates of sensitivity (true positive predictions) and specificity (true negative predictions), and between the rates of commission (false positive predictions) and omission (false negative predictions).The TSS scores range from 0 to 1 with 0 describing a model that is no better than random (high omission/commission and low sensitivity/specificity), and 1 describing a perfect agreement with the observed data (low omission/commission and high sensitivity/specificity).Models with TSS > 0.4 were considered to have acceptable performance (Engler et al., 2011;Domisch et al., 2013).Other evaluation metrics [area under the receiver operating characteristic curve (AUC), Sørensen similarity index (F-score) and Jaccard similarity index] were also calculated.The models derived from the three different algorithms were used to construct maps of the mean environmental suitability of each species on a scale from 0 to 1. Model fitting, evaluation and ensemble were done with the modular R package modleR v.0.0.1 (S anchez-Tapia et al., 2020).In modleR, the MaxEnt, Random Forest and Support Vector Machine algorithms were used as implemented in the packages dismo (Hijmans et al., 2017), randomForest (Liaw & Wiener, 2002) and e1071 (Meyer et al., 2017), respectively.

| Potential species richness pattern and hotspots
The maps of potential distribution of species that resulted from models giving a good predictive capacity (TSS > 0.4) were converted into binary maps of presence/absence for each one of the three algorithms in the modelling procedure using the maximum training sensitivity together with the specificity threshold approach (Liu et al., 2005).These binary maps were then overlapped to obtain the value of the potential species richness at the same spatial resolution (2.5 min squares); one map of potential richness was estimated by each one of the three algorithms used.
The hotspots of potential species richness were selected as the areas within the first decile (top 10%) of the obtained values of potential richness by each of the three algorithms.The three resultant maps were combined to obtain a map of congruence of the hotspots of potential species richness, showing whether each pixel has been selected as a hotspot by one, two or three of the algorithms used.To assess whether the NSPAC is effective in protecting diving beetles, this map was superimposed over the areas of potential richness derived by each of the three algorithms described above.

| Gap analysis
Given that it was impossible to model the environmental suitability of all endemic species, as most of them occur in fewer than five localities, the known present distribution of endemic taxa was also superimposed over the map of the NSPAC to identify additional areas that would benefit the conservation of these species.In order to apply quantitative conservation targets, a species was considered to be a 'covered species' if any protected area overlapped any extent of its mapped distribution; otherwise it was considered to be a 'gap species' (Rodrigues et al., 2004).As this is a rather relaxed and subjective criterion, we also explored how the number of gap species would vary considering other qualitative targets (percentage of populations covered by protected areas), for comparison.

| SDMs, potential species richness pattern and hotspots
The first three principal components were used to run the models, which accounted for $81% of the total variation in the original variables (Table S3 and Figure S1 in Supporting Information).The first factor was positively related to the precipitation of the driest quarter and negatively to the annual mean temperature and maximum temperature of the warmest month.The second factor was positively related to the minimum temperature of the coldest month and negatively to thermal seasonality.The third factor was positively related to isothermality and negatively to temperature seasonality (Table S3 and Figure S1 in Supporting Information).Potential distribution models were run for 32 species (out of the 53 currently recognized Cuban species of the family Dytiscidae), i.e. those species with at least five recorded occurrences.For 20 of the 32 species modelled (62.5%), the results derived from the three different algorithms showed a good prediction capacity (TSS values ≥ 0.4; Table 1; Figure S2 in Supplementary Material; see also other evaluation metrics in Table S4) and were used to predict species richness.
There was generally an increasing gradient of potential species richness from western to eastern areas with the highest predicted species richness mainly located in the mountainous areas (up to 1,200 m a.s.l.) in eastern Cuba (mainly Sierra Maestra and Nipe-Sagua-Baracoa), with some exceptions in the central part of the country (Macizo de Guamuhaya; Figure 2).However, in these areas species richness decreased towards the high-altitude mountains (1,500-2,000 m a.s.l.; Figure 2).This pattern was generally similar for the three algorithms used to run the models.However, some areas predicted to have high species richness by MaxEnt and Random Forest models were predicted to have a low number of species

| DISCUSSION
Most of the hotspots of potential species richness for Dytiscidae in Cuba are located at the low to medium altitudes of the mountain areas of the Macizo de Guamuhaya, Sierra Maestra and Nipe-Sagua-Baracoa in the eastern region of Cuba (Figure 2).In general, there was a high congruence in the results produced by the Random Forest and Maxent models, whereas the results produced by Support Vector Machine showed some differences.For example, some areas in the Guantanamo province in eastern Cuba were predicted to have high species richness by MaxEnt and Random Forest but a low number of species according to Support Vector Machine.The absolute species richness predicted by Support Vector Machine models was in general lower than that predicted by MaxEnt and Random Forest across the entire study area.A plausible explanation could be that the performance of Support Vector Machine is more dependent on the sample size of the training dataset.Indeed, Drake, Randin & Guisan (2006) concluded that at least 40 observations were necessary to obtain consistent models using Support Vector Machine-based methods.
The pattern of potential species richness detected for diving beetles might be similar for other aquatic macroinvertebrates, as water beetles have been shown to be good surrogates of biodiversity in fresh waters (S anchez-Fern andez et al., 2006).Indeed, the results obtained here are broadly consistent with inventories of other aquatic macroinvertebrates in Cuba, particularly Odonata and Ephemeroptera (Trapero & Naranjo, 2003; Gonz alez-Lazo, Salles & Naranjo, 2008).Naranjo et al. (2005) found that medium-altitude areas show a relatively good conservation status, as well as a great environmental heterogeneity, producing a diverse variety of aquatic macroinvertebrates.These areas are characterized by relatively high temperature seasonality, and lower temperatures compared with the rest of Cuba.They are characterized by abundant precipitation throughout the year and the occurrence of permanently flooded tropical or subtropical hydromorphic vegetation and secondary successional herbaceous and woody vegetation (Capote et al., 1989).
Previous studies (Larson, 1997;Balke & Hendrich, 2016) also highlight the importance of the structure of the vegetation in the surrounding areas of water bodies for the occurrence and the community structure of diving beetles.In the same way, environmental temperatures have long been seen as critical in determining species' distributions, particularly for water beetles (Calosi et al., 2008;Arribas et al., 2012;S anchez-Fern andez et al., 2012).
It is important to note that modelling procedures have limitations (see Jiménez-Valverde, Lobo & Hortal, 2008).For example, biotic interactions and dispersal capacities are not taken into account and these factors might prevent the occurrence of the species in environmentally suitable areas, such as the mountainous areas in eastern Cuba.In addition, the present distribution of Cuban dytiscids might also be influenced by various contingent, unique and unrepeatable factors (Ricklefs, 2004).The distributional range of some species extends beyond Cuba, but this study considered only Cuban records.Consequently, partial distribution information may bias the  Manrique, 2011b;Raes, 2012), although such problems may be minimized for taxa occurring on isolated islands.Furthermore, including all populations in the models would add another source of inaccuracy, as potential phenotypic plasticity and genetic variations of species across their whole range would be ignored (Scheiner & Goodnight, 1984;Addo-Bediako, Chown & Gaston, 2000;Terblanche et al., 2006), assuming that some genetically and phenotypically distinct populations (such as those typically found on isolated islands) are able to inhabit other apparently similar climatic regions.Other limitations and uncertainties are those derived from the identification of accessible areas (Barve et al., 2011) or the procedure to select background points (Barbet-Massin et al., 2012), but these would also be minimized when modelling species or populations from islands with no important geographical barriers across their territory.For example, in this study the differences in potentially accessible areas within Cuba among the species studied are likely to be small.Keeping these considerations in mind, the approach taken here should provide a much more realistic picture of the species richness of diving beetles than that obtained from observed richness alone.The results provide the basis for further studies to verify the causal factors behind the patterns detected.However, sampling effort has been scarce in some of the areas with high predicted species richness (e.g.Nipe-Sagua-Baracoa), where more sampling needs to be carried out to improve the effectiveness of biodiversity monitoring, expand species inventories and help to validate, verify and refine the predictive models (S anchez-Fern andez et al., 2011a).
This study demonstrated that the observed distributional range of the endemic species is smaller than for the non-endemic species.
Indeed, it was only possible to carry out distribution models for four (33%) of the 12 endemic species (Laccophilus alariei, Copelatus cordovai, Desmopachria andreae and Desmopachria tarda), as the remaining eight species occur in fewer than five cells.However, modelling was possible for 27 of 41 non-endemic species (Table 1).
Several of these species (Copelatus caelatipennis, Hydaticus bimarginatus, Hydaticus rimosus, Thermonectus basillaris basillaris, Thermonectus circumscriptus and Thermonectus margineguttatus) can be found both in lotic and lentic habitats (Megna & S anchez-Fern andez, 2016).These diving beetles are included in the 12 species for which models presented an insufficient predictive capacity (i.e.TSS < 0.4; Table 1).This could be explained by the low discriminatory capacity of the models for species that have a wide distribution in Cuba across a range of environmental conditions.It is likely that the inclusion of fine-scale hydrographic variables could improve the model predictions; however, at present there is no digital cartographic information on small streams, springs and temporary ponds in Cuba, that could be the most representative habitats for water beetles.
Most of the occurrence records for Dytiscidae (85%) are located in the NSPAC.However, despite the high degree of overlap among the areas of highest predicted species richness and NSPAC sites, 25% of the endemic species (C.barbouri, Desmopachria glabella and C. cubensis) were considered gap species, as their distributions lie completely outside the network (Table 2, Figure 3b).This result could be affected by the specific quantitative conservation target used to consider gap species; for example, 58.3% of the endemic species would be classified as gap species in the extreme case that all known populations needed to be within the NSPAC to consider a species as 'covered'.Thus, range size is an important criterion for evaluating the gap species as it is positively related to the number of populations of each species covered by protected areas, and thus to the number of gap species (Figure S4 in Supplementary Material).Megna et al. (2018) showed that the most threatened Cuban endemic species were characterized by highly localized populations and low abundance, as well as high habitat specificity.None of the gap species identified here have specific legal protection, so the only protection mechanism currently available for these species would be their occurrence within the NSPAC; these important species are thus totally unprotected.
This study has identified some endemic species and hotspots of potential richness outside the NSPAC network.To improve the conservation of freshwater biodiversity in Cuba, it is recommended that the network is extended to include those areas currently unprotected and with high conservation interest identified in this study.This would require more sampling programmes to obtain comprehensive and updated inventories of aquatic biodiversity, allowing more robust conclusions to be made about insect population declines (Didham et al., 2020).Similarly, integrated management plans for these protected areas in Cuba need to consider adequately the conservation of freshwater biodiversity.Human activities that occur outside the park boundaries, such as dam building, water abstraction and irrigation for agriculture, land-use disturbance and the introduction of non-native species, could have major adverse consequences for freshwater habitats within the park (Megna et al., 2018).The recommendations derived from the present study concur with previous studies that have highlighted the suitability of using hydrological units (sub-catchments) as units of planning and management for freshwater biodiversity (Hermoso et al., 2015).It is widely accepted that whole-catchment management, including the Eighty per cent of the records were used for model training and 20% for testing.Data partition was made by bootstrapping.One thousand pseudo absences/background points were selected at random, but excluding cells with species occurrences, as recommended for studies that wish to maximize specificity over sensitivity (e.g.reserve planning; Barbet-Massin et al., 2012).Model evaluation was conducted by means of the true skill statistic (TSS), which is based on the confusion matrix (Allouche, Tsoar & F I G U R E 1 Study area.Location of Cuba in the Antilles of protected areas.Digital information on the NSPAC was obtained from the World Database on Protected Areas (IUCN and UNEP-WCMC, 2020).Planning and integrated management of the NSPAC is coordinated by the National Centre for Protected Areas, an institution created in 1995 and belonging to the Ministry of Science, Technology and Environment of Cuba.Law 201/99 applying to the NSPAC established the legal regime for the management and control of the system, management categories and processes for the proposal and declaration of protected areas in Cuba.

F
I G U R E 3 Effectiveness of protected areas covering diving beetles.(a) Overlap between the hotspots of potential species richness of Dytiscidae in Cuba (i.e. the areas with the potential richness values within the first decile (top 10%), shaded as grey-scale areas) and the protected area networks from Cuba (red areas).Grey-scale indicates that each pixel has been selected by only one (light grey); two (dark grey) or all three modelling procedures used (black).(b) Localities of endemic Cuban species of Dytiscidae falling within (green dots) and outside (black dots) the National System of Protected Areas of Cuba results for some taxa because it may not describe the entire 'climatic niche' of a particular species (S anchez-Fern andez, Lobo & Hern andez- maintenance of natural flows, is indispensable for freshwater biodiversity conservation(Abell an et al., 2007).Management plans developed by the National Centre for Protected Areas (and approved by the Ministry of Science, Technology and Environment of Cuba) should include effective measures to secure the conservation of aquatic species in Cuba.The methodological approach used in this study, together with the recommended actions described above, could have wider applications and be implemented for the conservation of other aquatic species as well as in other Neotropical regions.
Summary of the results of the species distribution models of the family Dytiscidae in Cuba T A B L E 1 T A B L E 2 Endemic species of Dytiscidae from Cuba within the protected area networks Abbreviations: No., number of occurrences (at a spatial resolution of 2.5 0 squares); Protected, number and percentage of occurrences overlapping with the National System of Protected Areas of Cuba (NSPAC).