Methods to detect spatial biases in tracking studies caused by differential representativeness of individuals, populations and time

Over the last decades, the study of movement through tracking data has grown exceeding the expectations of movement ecologists. This has posed new challenges, specifically when using individual tracking data to infer higher‐level distributions (e.g. population and species). Sources of variability such as individual site fidelity (ISF), environmental stochasticity over time, and space‐use variability across species ranges must be considered, and their effects identified and corrected, to produce accurate estimates of spatial distribution using tracking data.


| INTRODUC TI ON
Although movement is a widespread characteristic in the animal kingdom, the study of movement ecology has been hampered by the scarcity of, and difficulty in obtaining, individual tracking data (Dingle, 1996). Since the 90s, the technological advances in animal tracking have facilitated the acquisition of large amounts of individual movement data with increasing precision (Wilmers et al., 2015).
This boosted the development of long-term (Klaassen et al., 2014), metapopulation (Ferreras, 2001) and even ecosystem-wide (Courbin et al., 2018) tracking studies. The miniaturization and increasing precision of tracking devices has overtaken the development of tools to prepare and analyse tracking data (Gupte et al., 2022) and has revealed sources of variability in the estimation of higher-level ecological groups' spatial distributions (e.g. colony, population and species) from individual movement data that are yet to be properly addressed (Gutowsky et al., 2015).
Among the most important sources of variability when using methods to scale up from individual tracks to the space use of higher-level ecological groups, we can find the following: the presence of individual site fidelity (ISF; Spiegel et al., 2017), the variation in space use according to changing environmental conditions at a range of temporal scales (Paiva et al., 2013) and the heterogeneous use of space by distant populations (Frederiksen et al., 2012;Matthiopoulos, 2003).
Firstly, fidelity to a geographic area is a well-documented phenomenon, present in a wide range of animal species (Switzer, 1993), and can be related to breeding or foraging behaviour, and to social interactions (Giuggioli & Bartumeus, 2012;Torney et al., 2018). An individual shows fidelity to a site when, based on previous experience, it returns to the same area where it had bred successfully or found a favourable environment for foraging (Schmidt, 2004).
Although ISF usually refers to the individuals returning to the same breeding area (Schumm et al., 2021;Shimada et al., 2021), fidelity also occurs independently of the breeding period, to foraging areas or nonbreeding areas (Léandri-Breton et al., 2021;Robillard et al., 2018). Regardless, its effect must be considered when individual tracking data are used to infer the distribution of a population, as it can bias results towards areas preferred by the animals with better data coverage (Giuggioli & Bartumeus, 2012;Lascelles et al., 2012).
Secondly, temporal changes in environmental conditions also play an important role in populations' distributions. Despite being based on individual decisions, an animal's use of space is linked to the optimal use of resources, and therefore, affected by environmental dynamism (Wolf & Trillmich, 2007). For instance, herds of nomadic herbivores move following peaks in productivity of grasslands (Aikens et al., 2017), migratory bird species change their distributions along the year or specific route characteristics and nonbreeding areas of migratory birds are linked to changing environmental conditions (Dias et al., 2011).
Lastly, failing to consider the entire distribution of a species when it spans a heterogeneous environment leads to underestimating its space use. Individuals from different parts of this distribution will be exposed to different environmental conditions and ecological pressures and will thus exhibit a differential use of space (Jeltsch et al., 2013). In fact, it has been shown that the size of the home ranges calculated using tracking data can be correlated with the size of the study area (Nekolny et al., 2017). Thus, maximizing not only the number of animals tracked, but also the geographical area where they are tracked from, seems necessary to obtain precise space-use estimates (Börger et al., 2006). In addition, in species with spatially structured distributions, or in migratory species with nonoverlapping nonbreeding distributions among populations, tracking individuals from only a few breeding populations could lead to an erroneous estimation of their nonbreeding areas (Webster et al., 2002).
Behind all these sources of variability there are specific ecological drivers, mentioned in previous paragraphs which, although of great interest, do not represent the goal of the current study. Here, we focus on presenting a method to detect (1) how individuals having different spatial preferences and uneven sampling intensities can affect the higher-level distribution when pooling all individual data without weighting or correcting; (2) how different sampling efforts across time can bias the results, if spatial preferences of the study species vary along time; (3) how different sampling efforts across space can bias the results, if there is a spatial structure in the space use of the target species. The set of functions we developed here, once applied to the tracking data prior to their use in any analyses answering ecological questions, can help to detect the presence of these three sources of variability. Correcting for the detected sources of variability, and then using the corrected data in the ensuing analyses, will guarantee more accurate distribution maps, protected area delimitations and space-use studies.
Marine top predators are a particularly useful group to study the effect of these sources of variability in tracking data since they show geographically widespread distributions and diverse movement and migratory patterns (Yurkowski et al., 2018). In addition, (1) they are often faithful to foraging and wintering areas (Chapman et al., 2009;Léandri-Breton et al., 2021;Wege et al., 2016); (2) their movements and distributions are affected by the dynamism of the marine environment (Albert et al., 2021;Patterson et al., 2021); and (3) they may display space-partitioning among populations (Wakefield et al., 2013). Lastly, they have, in recent years, been subject to a plethora of tracking studies, and although some start to acknowledge the need to account for these and other biases O'Toole et al., 2020), a unified framework to detect these three main sources of bias would be beneficial particularly for multiyear and multicolony studies.
Among marine top predators, Calonectris shearwaters are medium-sized Procellariforms that engage on year-round, longdistance and often trans-equatorial migrations (González-Solís et al., 2007) and show remarkable philopatry to the natal colony (Thibault, 1994). Three of the four extant species breed on the Mediterranean and Atlantic coasts, and their nonbreeding distributions are composed of discrete pelagic areas over the Atlantic and Indian Oceans (Dias et al., 2011;González-Solís et al., 2007). All these characteristics make them a suitable study group to investigate the effects of the sources of variability listed above.
In this work, we aim to provide a set of tools to understand the effects that (a) ISF, (b) environmental stochasticity over time and (c) the geographic extent of sampling effort, have on the distributions of mobile species tracked using animal-borne devices, regardless of tracking method, or habitat and characteristics of the study species, and to demonstrate their applicability to a real set of tracking data. To do so, we first tested the performance of the method with artificial datasets simulating different degrees of ISF, as well as different sampling regimes and efforts, and then demonstrated its applicability to real data by collating, for the first time, a dataset of 1346 year-round trips from 805 individuals of three Calonectris shearwater species breeding in 34 colonies. This constitutes a robust and diverse dataset and provides a relevant example to demonstrate the applicability of the method. Finally, we provide examples of the direct application of these methods by (1) calculating the nonbreeding distribution of each of the species and (2) evaluating the representativeness of each colony on the entire species based on their geographic location and centrality to the species breeding distribution.

| ME THODS
We describe here the procedures developed to detect instances in which the methodology used to obtain population-or species-level space-use measures from individual tracking data can introduce biases. In particular, we aim to test the effects of ISF and temporal and spatial variation in a tracking dataset, applicable to most sets of animal tracking data, as long as it can be split into discrete bouts such as foraging trips, migratory cycles, or even days, weeks or years (bouts, hereafter). We first demonstrate the efficiency of the procedures in simulated datasets and then illustrate the use of these methods using the nonbreeding distributions of Calonectris shearwaters tracked with Global Location Sensors (GLS, Wilson et al., 1992). In our worked example, the bouts were clearly defined as a migratory trip. In other types of data and species, bouts might be less evident and need to be carefully decided based on the research question and spatial ecology of the study species.

| Testing for ISF
To detect the bias caused by individuals with larger sample sizes preferring certain areas, we propose a method that consists of obtaining an estimate of space use for each bout and then comparing the similarities of bouts from the same individual to those of bouts from different individuals. If ISF exists within the sample being evaluated, space use of bouts from the same individual should be, on average, more similar among them than that of bouts of different individuals of the same sample. We estimate the space use of each individual with the autocorrelated kernel density estimate method (AKDE; Fleming & Calabrese, 2017) provided by the package "ctmm" (Fleming & Calabrese, 2021). This method models the movement of the animal, from the relocations obtained with the tracking device, as a continuous-time stochastic process, which separates the discrete observation process (a relocation every few minutes or hours depending on the device) from the continuous-time process which is the movement of the animal (the animal has continued to move between relocations, even if our observation process does not have information on where the animal has been in that time). From these models, an unbiased measure of space use can be estimated that considers the existing autocorrelation in tracking data with high spatiotemporal resolution (Fleming et al., 2015).
We developed a function, indEffectTest, in the R programming language, that computes, for every possible pair of bouts regardless of the individual that performs them, the similarity (i.e. spatial overlap) between their utilization distributions (UD). To assess this similarity, we used the Bhattacharyya affinity (BA), which is based on the comparison of two UD (p 1 and p 2 , evaluated at locations determined by x and y) to their joint distribution, and ranges from 0 (completely disjunct UD) to 1 (identical UD): In the BA version provided in the overlap function from the "ctmm" package, the calculation of the BA is corrected for small sample bias (Winner et al., 2018) which, in addition to the modelling of autocorrelation in the AKDE, provides a fairly unbiased measure of overlap, and a distribution to draw confidence intervals (CI) from, To demonstrate the function's ability to detect the effect of ISF, we performed practical examples by generating datasets that simulated 10 different degrees of fidelity (one dataset for each scenario). To simulate ISF, we generated the bouts over a resistance surface with a "hole" of null resistance inside. We set high resistance outside the "hole" to simulate maximum levels of fidelity and to simulate sequentially decreasing levels of fidelity, we sequentially decreased the value of resistance.
We performed additional tests on simulated datasets to analyse the performance of the function under different sample sizes for each individual, as well as for different sampling rates. The code for simulating the trips and running the different tests can be found at the GitHub repository specified in Appendix S1.

| Testing for temporal variability
The aim of this test was to explore the existence of changes in distribution over time and not to understand why or how these changes occur. In the example with empirical data, we tested variability among years, but the user may select any other temporal unit. When testing differences in nonbreeding distributions of migratory species, for instance, or differences in foraging distributions of central place foragers such as colonial animals, the temporal unit is obvious: nonbreeding season and foraging trip, respectively. For species with nomadic movement or with less evident temporal units, the selection of this unit is not as straightforward. The user is advised to carefully consider the biological meaning of defining a unit, the aim of the analysis and the hypotheses being tested to select an appropriate time unit to split the data. In general, larger temporal units will allow larger within-group variabilities, and thus less possibility of finding significant differences when compared to between-group variability.
The existence of temporal variability can be detected with an approach that mimics the method used to test the effect of ISF (see Section 2.1): the aim is to check whether variability in space use within a unit of time is the same as between units. The function is the same as above, but instead of the individual, the temporal unit is the grouping variable. Thus, we ignore the individual identity for this and compare overlaps of trips within and between temporal units.

| Testing for spatial variability
We propose a method to test the representativeness of each spatial unit with respect to the entire species (i.e. how well the distribution of a unit represents the distribution of the entire species) in spatially structured populations with large spatial ranges. In our example, since the study species is colonial, we used breeding colonies (from now on colonies) as spatial units. However, depending on the species studied, this choice might not be obvious. The user should decide then, based on the spatial ecology and distribution of the study species, and the spatial scope of their research question, if spatial variability is an issue and at which scale it should be tested.
This method works in two steps schematically described in  Table S1). Empirical tracking data collected at different colonies have biases related to different sampling efforts and population sizes in all sampled colonies (i.e. the larger colonies will not necessarily be the ones with a larger sample). To correct these biases, we first simulate distributions for each colony by pooling all trips of each colony together and calculating colony-level space use (Figure 1a,b). In this instance, we are going to estimate space use using independent and identically distributed (IID) kernel density estimation (KDE) provided by the functions in package "ade-habitatHR" (Calenge, 2006), since the AKDE method uses individual movement models that are not applicable to population-level spaceuse estimates. The user should be aware that, for tracking data with high temporal resolution and for small sample sizes, this method, which does not consider autocorrelation of tracking data, can underestimate space use (Fleming et al., 2015). The effects that the choice of smoothing parameter has on the resulting distribution should also be considered, a parameter that can be controlled by the scale argument in the simulateDistribution function (Lascelles et al., 2016).
Colony-level space use is represented by the obtained KDE surface, and a number of locations that is proportional to the colony size can be generated with a distribution proportional to the density surface ( Figure 2c). Once we have locations simulating each population's distribution, in a number proportional to each population's size, we pool the simulated locations from all populations to generate a simulated distribution that mimics that of the entire species, but unbiased by different sample sizes of each population, and faithful to the real population sizes (Figure 2c,d).
The drawback of this process is that it requires a good knowledge of the population distribution to generate a simulated location dataset against which researchers can compare their real location data. In our worked example, we have used our own data, since the dataset includes a good representation of the entire breeding distribution of the species. If such a comprehensive dataset is not available, users can skip the simulateDistribution step, and simply generate random points within a known distribution of the species (e.g. those provided by the BirdLife database for birds, or the International Union for Conservation of Nature, IUCN, website for species with a red list assessment), thus starting the process at step II in Figure 2. Although this second procedure simplifies the process, it has the drawback that the relative abundances within different regions of the distribution will not be correctly represented, and thus, the representativeness of each subpopulation will not be accurately represented.
Once we have a set of spatial locations representing the dis- where a and b are given start values of 1 and 0.1, respectively, and Inclusion mean is the mean inclusion value at each sample size. Through this model, we calculate the value for the Inclusion mean at which continuing to increase sample size will not increase inclusion further (i.e. the asymptote; Figure 2g). We regard this value as the maximum inclusion value (MIV) possible for that colony. Because MIV is calculated based on 50% UD, they range between 0 and 0.5 (i.e. even if a sample represents the entire species perfectly, its 50% UD will only include around 50% of the simulated locations). However, values higher than the UD percentage (in this case 50%) can occasionally occur, as the UD is calculated based on the locations of the colony, but the inclusion is calculated from the simulated locations of the entire species. In addition, users must exercise caution when comparing the representativity of colonies with vastly different sample sizes, as the biases introduced by the violation of the IID assumption and the bootstrapping of the KDE have a larger effect in small sample sizes. The function then multiplies the MIV obtained by 100, producing the value of "species inclusiveness," defined as the percentage of points of the entire species that falls within the 50% UD of the sampled colony.

| Empirical application
To demonstrate the use of these functions, we put together a dataset containing year-round trips from Cory's (Calonectris borealis), Scopoli's (C. diomedea) and Cape Verde (C. edwardsii) shearwaters obtained between the summers of 2007 and 2016, from 34 breeding colonies (Table 1). Locations were obtained from GLS (Wilson et al., 1992), which register ambient light and derive longitude from the time of twilight and latitude from the length of the light period, obtaining low spatial accuracy positions with a temporal resolution of one or two positions per day (Halpin et al., 2021). The dataset contains individuals tracked for up to eight consecutive years, which allows us to also test the effect of ISF. Details of GLS deployment for each of the colonies can be found in Appendix S2: Table S2.

| Data preparation
The twilight events were calculated from the raw light measurements (obtained from the GLS) and visually inspected and adjusted when necessary. The locations were obtained using either Intiproc® (Migrate Technology, 2012) or Biotrack® (BiotrackLtd) software, or the "GeoLight" package in R (Lisovski & Hahn, 2012). We discarded position data from 20 days before and after each equinox, as latitudes cannot be correctly inferred from day length during these periods (Ekstrom, 2004), and applied a quadratic speed filter following McConnell et al. (1992) to remove other highly inaccurate locations.
Phenological states (migrating, breeding, wintering or staging) were assigned using custom-made R routines and confirmed by visual inspection: the first two consecutive days of directional commuting movement were assigned as the beginning of migration, and the first two consecutive days without a fixed direction were considered as the beginning of a stationary period (either breeding, wintering or staging). Only the nonbreeding stationary periods (i.e. wintering and staging) were selected for the analyses.

| Individual site fidelity
To test the effect of ISF on nonbreeding distributions, we selected, for each species, all the colonies where individuals had been tracked more than once (Appendix S2: Table S3). We ran the indEffectTest function for each colony, using individual as the grouping variable, to obtain a WBR value based on debiased estimates of within-and between-individual overlaps. High WBR values are indicative of an effect of the ISF in the studied colony.

| Temporal differences
Since in our dataset each of the trips corresponded to a year, and to avoid confounding interannual differences with ISF, we used a subset of the original data containing only one trip per individual, randomly selected but ensuring a similar sample size for each year (through a stepwise custom-made R function that randomly selected trips per year, without selecting two trips from the same individual, but weighted the selection at every step to ensure all years had the same number of trips, whenever possible). With the resulting dataset (Appendix S2: Table S4), we ran the indEffectTest test for each colony, using year as the grouping variable.

| Spatial differences
Before proceeding with the analysis, we calculated the representativeness of our sample at each of the colonies, using the repAssess function from the package "track2kba" (Beal et al., 2020;Beal, Oppel, et al., 2021). This function performs a representativity test to estimate how many trips would be necessary to accurately represent the distribution of the entire colony, and how far away the current sample size is from that number. This approach is akin to that of species discovery curves and calculates a value of representativity that ranges between 0 and 100.
After calculating the colonies representativeness, we selected only those with a representativeness >80% (Table 2). From those colonies, we simulated the distribution of each species using the simulatedDistribution function, with a multiplication factor of 1 (i.e. one location at sea was generated for every individual in the colony).
We obtained a set of simulated locations for each species (Table 2), which was used in the following step. Finally, we ran the bootstrap-Colony function (see Section 2.3 of this section and Appendix S2: Table S1), which calculated, for each colony, a value of species inclusiveness, a measure of how well space use by individuals of that colony represents space use by the entire species.

| Examples of applicability of the developed methods
To further demonstrate the applicability of these methods, we used them in two different types of analyses, which exemplify possible uses. First, we calculated the nonbreeding distribution of each species, and to ensure these distributions were unbiased by different population and sample sizes from each colony, and due to different colony sizes, we used the following procedure: we first selected the colonies with a sample size larger than five individuals and a local representativeness >80%. Then, for each of those colonies, we selected the full dataset (if ISF has not been detected) or one track per individual (if ISF has been detected), and with the resulting dataset we followed the workflow of the package "track2kba" (Beal et al., 2020) to obtain rasterized abundance maps proportional to the colony sizes. We obtained 21 raster layers, seven for Cory's shearwater colonies, 12 for Scopoli's shearwater colonies and two for Cape Verde shearwater colonies.
Finally, for each species, we added the colony-level raster layers to produce a more accurate, unbiased representation of the distribution of the species.
Second, we used generalized linear models to model the obtained colony inclusiveness value, that is, how well they represent species nonbreeding distribution in relation to several variables that could be of interest to disentangle ecological from methodological drivers of differences in inclusiveness: geographic location, centrality in the breeding distribution and colony and sample size. We ran these models for Cory's and Scopoli's shearwaters (data from Cape Verde were excluded since we had data from only two colonies). The sampling unit for these models were the sampled colonies, and to use as response variables, we calculated two values: (1) the distance of each colony to the geographic centroid of the species' spatial distribution and (2) Table S1). The code for all the functions and performance tests can be found at https://github.com/Virgi niaMo rera/ Track ing_data_analysis

| Results of ISF test on the simulated dataset
We ran the ISF test in data simulating a situation where the tracked animals were not colonial (e.g. a residential species where all trips from the same individual had a common origin but trips from different individuals had different origins). Within that paradigm, we tested the function in 10 simulated datasets of decreasing site fidelity (Figure 3). At null resistance (i.e. simulating the absence of ISF), WBR value was 4.8%, increasing up to 85.8% at maximum resistance (Table 3).

| Interannual differences in distribution
Values of WBR were much lower in testing for interannual differences (evidencing no differences among years) than in the test for ISF and again higher for Cory's shearwaters than for the other two species. Values ranged between 6.2% (Berlenga) and 21.5% (Veneguera) for Cory's shearwaters, and the only nonzero value for the other two species was for the Scopoli's shearwaters of Chafarinas (2.5%).

| Spatial differences in species representativeness
Before the analyses, we excluded three of the 34 sampled colonies, which had less than five trips. From the 31 remaining colonies, we selected one trip per individual for Flifla (Scopoli's shearwater) since a WBR higher than 50% suggested an effect of ISF. With the resulting dataset, we obtained values of local representativeness >80% in 22 of them (Table 2). With these, we proceeded to test their species inclusiveness.

F I G U R E 3
Simulated datasets created to test the ISF function. The datasets consist of 10 trips of 100 positions each for each of the 10 animals simulated (in different colours). From left to right and top to bottom, ISF has been increased by increasing the resistance of the surface outside the allowed area (see Section 2) from 0 to 1.
We simulated nonbreeding distributions from each of those breeding colonies using the simulateDistribution function and obtained a dataset that contained more than 78,000 locations for Cory's shearwaters, more than 250,000 for Scopoli's shearwaters and more than 10,000 for Cape Verde shearwaters (  (Table 4; Appendix S2: Figure S1).

| Nonbreeding distributions of three species of Calonectris shearwaters
After selecting one track per individual for the colony of Filfla, for which we found an effect of ISF, and since tests for temporal variability were not able to detect any effect in our data, we used data from all years together to plot the nonbreeding distribution of each species. We also plotted, for each species, the nonbreeding distri- Note: For resistances outside the allowed polygon from 0 to 1, simulating ISF from low to high (see Section 2), we calculated how many within-individual overlap values fell below, inside and above the 95% CI of the between-individual overlap, for datasets simulating both a noncolonial and a colonial species. Note: Continuous predictors are mean-centred and scaled by 1 standard deviation. Effects with a p value <.05 and/or a CI not overlapping 0 are considered significant and highlighted in bold. A parameter of the goodness of fit (adjusted R 2 ) is also shown.

TA B L E 4
Main parameters from the models obtained from the selected linear models testing representativeness for Cory's and Scopoli's shearwaters.
and Benguela currents ( Figure 4c); and for animals from the Canary and Savages Islands, the main nonbreeding area was the southern Benguela current, with the Canary current, off the coast of Western Sahara and Senegal, as a secondary area (Figure 4d).
For Scopoli's shearwaters, the main nonbreeding area was the Canary current (Figure 5a), but the distribution of nonbreeding birds changed if we considered only animals from the eastern, central or western colonies, with animals from the westernmost colonies using F I G U R E 4 Nonbreeding distributions of Cory's shearwaters corrected for sample effort and weighted by population size. Using data from all representative colonies pooled together to represent the species' nonbreeding distribution (a), and using data only from the representative colonies in (

| DISCUSS ION
In this study, we provide a workflow to test the effect of three major sources of variability in spatial studies at the species level (ISF, temporal variability and spatial variability), potentially applicable to a wide variety of tracking datasets. In a context of increasingly sophisticated, miniaturized, and inexpensive animal tracking devices, studies using individual tracking data are proliferating, with ever larger extents both spatially and temporally. This emphasizes the need to account for the biases described here when studying space use at a population or species level, that is, before putting all tracks together towards generating a single measure of space use that does not consider potential differences in individual behaviour.
The ISF test is useful for datasets with uneven amounts of data for each individual. It detects instances when trips from the same individual are more similar than trips from different individuals, which causes the areas preferred by the more represented individual to be overrepresented in the distribution at a population level, and it can help designing future tracking studies to avoid this effect. The solution will largely depend on the type of data, size of the dataset, the type of analysis to develop and the aims of each particular study. In general, researchers should aim to (a) subset the data so the sampling effort is evenly distributed among all individuals, for example, sequentially removing one random track from the repeatedly sampled individuals and testing for the effect of ISF again until the effect is no longer detected or (b) apply different weights to each track, with larger weights applied to trips of underrepresented individuals. or using a spatiotemporal model that will account for temporal as well as spatial autocorrelation (e.g. Martínez-Minaya et al., 2018).
Conversely, if no temporal variability has been detected, a possible F I G U R E 6 Nonbreeding distributions of Cape Verde shearwaters corrected for sample effort and weighted by population size. Using data from all representative colonies pooled together to represent the species' nonbreeding distribution (a), and using data only from the representative colonies in ( distribution is proportional to its size (Beal, Dias, et al., 2021;Beal, Oppel, et al., 2021;Carneiro et al., 2020;Davies et al., 2021).
In our simulated datasets, the indEffectTest was able to detect the effect of ISF as intended, as the WBR value increased with increasing simulated ISF. These datasets were simulated as noncolo- For the rest of the colonies, no effect of the ISF was found, and we did not take any action on them.
The general lack of ISF found in our shearwater dataset agrees with published evidence suggesting a lack of individual consistency in the use of foraging areas during breeding and in several migratory and nonbreeding parameters for both Cory's and Scopoli's shearwaters (Courbin et al., 2018;Dias et al., 2011;Müller et al., 2014;Zango et al., 2019). At a population level, there are studies reporting interannual consistency in nonbreeding distribution and migration phenology of seabirds (Bogdanova et al., 2014;Legrand et al., 2016), and our results agreed with these findings, showing that withinindividual overlaps were mainly inside the 95% CI of betweenindividual overlaps for the three species.
In our example with these three phylogenetically close species that share nonbreeding areas, we found that the Cape Verde shearwater, the species with the most restricted breeding and non- Thus, they were not good in representing the entire nonbreeding distribution of the species, although necessary for the complete knowledge of the wintering grounds of the species. As with the results detailed above, the representativeness of each colony or population can offer insights into their migratory connectivity: in species with low migratory connectivity, all populations will present higher values of representativeness, while in species with high connectivity, each population will have a low level of representativeness.
The findings obtained from our shearwater dataset are a clear example of the implications of this work for conservation studies.
When the aim is to identify areas where the study species is more abundant, it would be more convenient to concentrate sampling efforts in colonies or breeding areas with larger populations. However, when the aim is to define the entire range of a given species, that is, any region in which an individual from the study species can be found, a sampling strategy including spread colonies or areas throughout the range of the species would be more appropriate. In addition, these methods could help decide when the data already collected are representative enough to extract conclusions relevant for conservation, thus avoiding the collection of more superfluous data and making the most of the data already collected. This work will serve as an example of the importance of taking these issues into consideration, as well as providing the tools to do so.
Collaborative datasets, metapopulation studies and their applications to predicting distributions of animals from unsampled areas Wakefield et al., 2017) and to informing conservation and management policies (Hays et al., 2019) are becoming common-place in the current scientific context, although they do not always account for the sources of variability explored here (but see Beal, Dias, et al., 2021). The proposed method provides an objective protocol for the detection of three of the main sources of variability (ISF, interannual variability and spatial variability). This is a first step in the building of more accurate spatial and spatiotemporal models of movement and species distribution that can be used by movement ecologists working with a wide diversity of tracking data types.

ACK N OWLED G EM ENTS
We would like to thank all students, volunteers and technicians who helped us at several stages of fieldwork. We are grateful to the following institutions for permission and logistical support in conducting fieldwork in the respective countries: Direção Nacional

CO N FLI C T O F I NTE R E S T
The authors declare no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The novel code developed for this manuscript, and to recreate the simulated datasets used to demonstrate the functions, can be found