Species Distribution Models (SDMs) have become increasingly useful for conservation issues. Initially designed to predict distributions of species from incomplete datasets, SDMs may also identify environmental conditions associated with higher occurrences and abundances of widely distributed taxa. Using sighting records of 15 widely distributed mammals from French Guiana, including primates, carnivores, rodents and ungulates, we used three SDMs, based on (i) entropy, (ii) a genetic algorithm, and (iii) Mahalanobis distance, to investigate relationships between species occurrence and predictive variables such as vegetation, biogeographic units, climate, and disturbance index. Maximal entropy procedures resulted in more accurate projected conditions: the accuracy of the predicted distributions was higher than 90% in nine species among the 15 tested, and predicted occurrences were correlated to field-measured abundances for nine species. The genetic algorithm implemented with GARP had lower accuracy, with predicted occurrences correlated to abundances for three species only. Finally, Mahalanobis distance had a much lower performance and failed to find any correlation between occurrences and abundances. In the case of MaxEnt modelling, since map projection summarized more appropriate environmental conditions and identified areas likely to act as sources and/or corridors, we propose to use those appropriate environmental conditions as a proxy of conductance for landscape connectivity planning. We provide evidence here that SDMs can identify not only more suitable environmental conditions, but also areas hosting higher abundances for a large set of species with key ecological roles. Further management applications of this environmental suitability index could help in designing corridors between protected areas.
Introduction
Understanding patterns in the spatial distribution of biodiversity and ecological connectivity is key to the development of effective conservation strategies [1–3]. Ranking the priority of areas that may provide sources and corridors is nevertheless challenging. Beyond powerful modelling processes [4], the selection of adequate field information is crucial. Input data may rely on expert knowledge [5], endemism [6], richness, distribution, abundances, and population trends of focus species [7,8]. Biological indicators might also involve sets of more common species, which may identify not only areas hosting rare, flagship, umbrella, and/or Red List classified taxa [9,10], but also those with various ecological roles [11,12]. Therefore, identifying areas of higher abundances could help to ensure the maintenance of ecological processes, which can also provide sources for neighbouring depleted regions. Abundances of species are nevertheless difficult to assess on wide geographical scales, due to unevenly distributed ecological conditions and local variations in response to specific niche requirements [13].
Species Distribution Models (SDMs) are increasingly useful in a wide set of disciplines, including ecology, evolution science, conservation, and the management of species [14]. SDMs assume that species' occurrence is determined by an immediate response of individuals to geographic and/or temporal environmental variations [15]. SDMs can therefore identify more appropriate environmental conditions (sensu [16]), in which species not yet surveyed or recorded are more likely to be present than in other areas. Although initially designed to assess predicted distribution maps from incomplete datasets (e.g., [17]), SDMs have shown their usefulness for other issues, such as the prediction of exotic species invasions [18], the monitoring of declining species [19], the prediction of range expansions of recovering species [20], the assessment of the impact of climate change [21] and likelihood estimates of species' long-term persistence [22]. However, the occurrence of species may be of insufficient value for conservation planning [23], because overall abundances have much greater value for macroecological approaches [24], and as indicators of population trends [25].
A surrogate for assessing species densities could be based on the expected relationships between the probability of species occurrence and their abundances [26,27]. SDMs produce geographic projections that indicate where species are more likely to occur, on the basis of relationships between records and ecological conditions [28]: the models are expected to reflect the niche requirements of the species, and consequently the environmental conditions associated with higher species performance [29]. Habitats with higher numbers of collected observations are likely associated with higher densities [30–32].
Relationships between predicted habitat suitability and species dynamic traits have not been largely investigated, but thanks to the recent developments of SDMs [33], this approach has received increasing interest. Occurrence models have been tested as indicators of the variation of abundances of several plants [29], vertebrate species [34] and top predators [35,36]. Our general objective was to evaluate how SDMs could help to identify areas of importance for the maintenance of large-scale ecological dynamics, including both areas of higher richness, and corridors facilitating movements among them. Landscape permeability to species migration is not only explained by physical constraints, but also strongly influenced by bioecological characters of species [37]: habitat suitability resulting from SDMs may reflect some of the species' response to habitat variation, as well as the efficiency of migration areas.
Our study used the predicted occurrences of a large set of common and widely distributed large forest mammals in French Guiana, acting as a surrogate for key ecological processes [12] with three specific objectives:
Testing the accuracy of Species Distribution Models in identifying more appropriate environmental conditions for a set of widely distributed forest mammals;
Testing how those environmental conditions correlate to field-measured abundances;
Proposing those environmental conditions to identify key areas for long-term conservation of species and for modelling habitat connectivity, the gradient of predicted occurrence being considered a proxy for the permeability of the habitats to movements by the species of interest.
Methods
Study area
The work took place in French Guiana, a French administrative unit of 84,000 km² in the northern part of South America, on the Guiana shield (Fig. 1). The Guiana shield is one of the largest pristine Neotropical rainforest blocks and a floristically distinctive province compared to the Amazonian basin [38]. Eighty percent of French Guiana is covered by moist upland forests on well-drained lateritic and oligotrophic soils at altitudes between 0 and 600 m. The alluvial coastal plain, covered by marsh forests, savannahs, transition forests, and herbaceous swamps, is rather narrow on this part of the Guiana shield [39]. Compared to other Neotropical countries, the forest conservation status of eastern Venezuela, Guyana, Suriname, French Guiana, and the Brazilian states of Amapá and Pará is still rather favourable. French Guiana benefits from an extensive network of protected areas, including five Nature Reserves located in patches in the northern half of the country, and a National Park in the south, for a total protected area of 23,000 km² (>25% of the country).
Model of occurrence
A set of 15 mammalian species from the orders of Primates, Carnivora, Rodentia, Perissodactyla and Artiodactyla was studied (Table 2). Determining the absence of large species in tropical forests is almost impossible because of low densities, cryptic behaviours, and dense, closed habitats. Consequently, only presence records were included. Data considered in this work have two origins. First, we conducted surveys in the country for 15 years, both to assess impacts of anthropogenic activities such as logging, hunting, and fragmentation, and to assess variations of densities in non-disturbed areas [12,40]. As well as estimating richness and abundances of target species (see below), our surveys recorded the presence of a large set of species. Second, bona fide sighting records reported by naturalist volunteers, managed in a database by one of us (FC), were included. We only considered records from experienced observers, with associated geographic coordinates and/or site description allowing an estimate of the location with an error < 1 km². Total numbers of records per species ranged from n=63 (Red acouchy, Myoprocta acouchy) to n=358 (Brazilian tapir, Tapirus terrestris) (Appendix 1a). Fig. 1 shows the distribution of all records, and the areas where field samplings were implemented.
Considering the discrepancies among different SDMs (e.g., [36]), three models were used to investigate the distribution of occurrences. First, maximum entropy analysis has a recognized efficiency in processing presence-only data and small data sets [41,42]. MaxEnt 3.3.3k [43] estimates the probability distribution of maximum entropy across the study area. This occurrence distribution is calculated with the constraint that the expected value of each environmental variable under this estimated distribution matches the empirical average generated from environmental values associated with species occurrence data [43]. When MaxEnt is applied to presence-only species distribution modelling, the pixels of the study area make up the space on which the MaxEnt probability distribution is defined, pixels with known species occurrence records constitute the sample points, and the features are environmental variables. To control the likely geographic bias of sightings distribution (i.e., trapping effort), the model was forced to use environmental layers restricted to the areas of sampling during the learning stage [44]. Predicted areas of occurrence were then projected at the country scale. The model was run using 75% of the records for training and the remaining 25% for testing, and 5,000 iterations with a bootstrap replicate strategy; other parameters were a convergence threshold of 1.0 × 10⁻⁵, logistic output format, and linear/quadratic regularization values.
For each species, the model was run with 15 replicates and interpreted with the AUC test [45]. However, because the AUC test could lead to misinterpretation of model accuracy [46], the null model hypothesis [47] was also used to test the performance of the predictions. We generated 99 random distributions, and considered the 95th ranked AUC value as the upper limit of the 95% confidence interval of AUC. When the AUC value of a species was higher than this 95th ranked AUC, the accuracy of the SDM was considered significantly higher than expected by chance alone (p<0.05). Once projected at a country-wide scale, predicted appropriate environmental conditions are expressed as an index of suitability, a value ranging from 0 (less favourable ecological conditions) to 1 (ideal conditions). The contribution of the environmental variables to the model was investigated with both a permutation heuristic test and a jackknife test, implemented with MaxEnt.
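The null-model comparison described above can be illustrated with a minimal Python sketch. It assumes that the 99 null AUC values have already been obtained by fitting the model to random point sets (that fitting step is not shown), and the function names `auc`, `null_threshold` and `is_significant` are hypothetical helpers introduced for illustration, not part of MaxEnt.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: probability that a presence pixel scores higher
    than a background pixel (ties count for one half)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def null_threshold(null_aucs, rank=95):
    """Return the AUC ranked `rank`-th (ascending) among the null models."""
    ordered = sorted(null_aucs)
    return ordered[rank - 1]


def is_significant(observed_auc, null_aucs, rank=95):
    """True when the observed AUC exceeds the 95th ranked null AUC."""
    return observed_auc > null_threshold(null_aucs, rank)


# Illustrative values only (assumption): 99 evenly spaced null AUCs.
example_null_aucs = [i / 100 for i in range(1, 100)]
example_result = is_significant(0.97, example_null_aucs)
```

With 99 null models, taking the 95th ranked value as the cut-off corresponds to a one-sided test at roughly p<0.05, which is how the threshold is used in the text.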
Second, we used the Genetic Algorithm for Rule-Set Prediction (GARP, [48]). GARP searches for non-random associations between environmental characteristics of sites of known occurrence versus those of the overall study region. Basically, GARP works in an iterative process of rule selection, evaluation, testing, and incorporation or rejection to produce a heterogeneous rule-set characterizing the species' ecological requirements. As implemented here, the algorithm runs either for 1,000 iterations or until the addition of new rules has no appreciable effect on the intrinsic accuracy measure (convergence limit of 0.01). The final rule-set, or ecological-niche model, is then projected onto the country-wide scale. Localities of records were randomly divided into training (75%) and test (25%) data sets. For each species, 10 independent models were run; the ecological niche was obtained by superimposing the runs, resulting in a pixel-scale index ranging from 0 (none of the 10 models identified the pixel as an area of predicted occurrence) to 10 (all the models identified the pixel as an area of predicted occurrence). For each species, the performance of the model was evaluated with both extrinsic and intrinsic values. First, a chi-square test was run on the mean (among the 10 models) of extrinsic values. Second, we observed the mean of the intrinsic omission error, expected to be < 5% in high quality modelling, and the mean of the intrinsic commission index, expected to be > 67% [49].
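The superimposition of the 10 GARP runs into a 0 to 10 index can be sketched as follows. This is only an illustration under the assumption that each run is available as a binary grid (1 = predicted occurrence, 0 = not predicted) stored as nested lists of equal shape; `superimpose` is a hypothetical helper and not a GARP function.

```python
def superimpose(binary_maps):
    """Sum binary prediction grids cell by cell.

    With 10 input grids, each output cell ranges from 0 (no model
    predicts occurrence) to 10 (all models predict occurrence).
    """
    rows = len(binary_maps[0])
    cols = len(binary_maps[0][0])
    return [
        [sum(grid[r][c] for grid in binary_maps) for c in range(cols)]
        for r in range(rows)
    ]


# Illustrative 2 x 2 grids (assumption): 7 runs agree on one pattern,
# 3 runs on another.
runs = [[[1, 0], [1, 1]]] * 7 + [[[0, 0], [1, 0]]] * 3
garp_index = superimpose(runs)
```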
Third, we used Mahalanobis Distance [50], a geometrical method based on set theory and distribution profiles of presence data in environmental dimensions that also showed relevant results on the occurrence of large vertebrates [51,52]. Mahalanobis distance ($D^2$) is a multivariate dissimilarity statistic, expressed as the standardized difference between the values of a set of environmental variables and the mean values for those same variables calculated from all points at which a species was detected [53,54]. A map of habitat suitability is created by calculating the $D^2$ values for each landscape cell, with $D^2 = (x - m)^{T} C^{-1} (x - m)$, where $x$ is a vector of landscape data associated with each landscape square, $m$ is a vector of means for the landscape data at all set squares, and $C$ is a covariance matrix of the landscape data at all squares. The procedure was implemented on ArcGIS 9.3 with Land Facet tools [55].
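As a worked illustration of the formula above, the following pure-Python sketch computes $D^2$ for one landscape cell. It is not the Land Facet implementation: the helper names are hypothetical, the sample covariance (n − 1 denominator) is an assumption, and in the usage example both the mean vector and the covariance matrix are computed from the same small set of points for simplicity.

```python
def mean_vector(rows):
    """Column means of a list of equal-length observation tuples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows, means):
    """Sample covariance matrix (n - 1 denominator, an assumption)."""
    n, k = len(rows), len(means)
    cov = [[0.0] * k for _ in range(k)]
    for row in rows:
        d = [v - m for v, m in zip(row, means)]
        for i in range(k):
            for j in range(k):
                cov[i][j] += d[i] * d[j]
    return [[c / (n - 1) for c in line] for line in cov]


def solve(a, b):
    """Solve a * y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (aug[i][n] - s) / aug[i][i]
    return y


def mahalanobis_d2(x, means, cov):
    """D^2 = (x - m)^T C^-1 (x - m), computed without forming C^-1."""
    d = [v - m for v, m in zip(x, means)]
    y = solve(cov, d)  # y = C^-1 (x - m)
    return sum(di * yi for di, yi in zip(d, y))


# Illustrative values (assumption): two environmental variables.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
m = mean_vector(points)
C = covariance_matrix(points, m)
d2_cell = mahalanobis_d2((3.0, 1.0), m, C)
```

Smaller $D^2$ values indicate cells whose environmental conditions are closer to the mean conditions at presence points, so a suitability map is typically obtained by ranking or rescaling $D^2$ in reverse order.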
The predictive environmental data were the same for all three models and were selected according to the assumed pertinence of their interactions with the ecology of species [56,57] and to their availability and homogeneous definition at the country scale: rainfall [58], mean altitude; range of altitude [59]; vegetation types defined with high definition remote-sensing data and organized along a reflectance gradient (dense forests have higher reflectance, while open (savannahs) and disturbed areas show an opposite signature) [60]; biogeographic units [61]; and the human footprint representing the distribution and strength of pressures on natural habitats [12].
Relationships between suitability derived from SDMs and field-measured abundances, and environmental variables
To determine how projected environmental conditions reflect ecological conditions, the suitability index was tested in relation to the abundance of species measured in the field, abundance being considered a proxy for the quality of habitats [62]. Those abundances were measured for 14 species (monkeys, large rodents, deer and peccaries) in 36 sites with a line-transect method, with a standardized effort [63]. The abundance of tapirs was measured using a track index along a river-transect (18 sites, mean survey distance = 20 km). The relationships between predicted environmental suitability (for GARP, the suitability was set as the sum of the 10 models) and abundances were examined using both ordinary least squares (OLS) linear regression and 90th quantile linear regression, in order to consider the complex non-functional relationship between density and suitability [34,36].
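The two regressions mentioned above can be sketched in plain Python for a single predictor. This is an illustration, not the software used in the study: the quantile fit below relies on the known property that, for one predictor, an optimal quantile regression line passes through at least two data points, so it simply tries every pair of sites and keeps the line with the lowest pinball (check) loss. Function names and the example data are hypothetical.

```python
from itertools import combinations


def ols_fit(x, y):
    """Ordinary least squares line; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def pinball_loss(x, y, intercept, slope, tau):
    """Check loss minimised by quantile regression at quantile tau."""
    total = 0.0
    for xi, yi in zip(x, y):
        residual = yi - (intercept + slope * xi)
        total += tau * residual if residual >= 0 else (tau - 1) * residual
    return total


def quantile_fit(x, y, tau=0.9):
    """Brute-force quantile regression line; returns (intercept, slope).

    Tries every line through two observations with distinct x values.
    Practical only for small samples such as a few dozen sites.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = pinball_loss(x, y, intercept, slope, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Illustrative suitability / abundance values (assumption, not study data).
suitability = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9]
abundance = [0.0, 1.0, 0.5, 3.0, 1.0, 6.0]
ols_line = ols_fit(suitability, abundance)
q90_line = quantile_fit(suitability, abundance, tau=0.9)
```

The 90th quantile line describes the upper edge of the abundance values at a given suitability, which matches the idea that suitability may limit the maximum abundance rather than determine its mean.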
Projected environmental conditions and landscape permeability
Landscape permeability was investigated at the country scale with CircuitScape 3.5 [64], which describes every movement of an animal as a random choice, equally probable in all directions, using circuit theory [65]. CircuitScape considers the study area as networks of nodes (e.g., habitat patches, populations) connected by edges. The weight of each edge is related to the strength of the connection (e.g., number of dispersers) between the connected nodes. Predicting connectivity requires first assigning resistances to different habitat types in the grid. Outputs from SDMs were used as a measure of conductance (the reciprocal of resistance) in the study area: higher values of conductance indicate more suitable habitats, likely associated with more movements of target species, i.e. with higher permeability. The nodes, supposed to act as source areas, were defined as all areas located in protected areas and exhibiting index values above the natural break of index distribution, defined with the optimized method [66] implemented in ArcGIS 9.3. We used pairwise comparison to assess connectivity, calculated between all pairs of focal nodes. In order to allow the movement of animals in every direction, cells were connected to their eight neighbours with the resistance between a pair of first-order neighbours set to the mean of the two cells' resistances, and the resistance between a pair of second-order neighbours (diagonal) set to the mean resistance multiplied by the square root of 2 in order to reflect the greater distance between cell centres.
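The eight-neighbour rule described above can be made concrete with a short Python sketch. It does not call CircuitScape; it only shows how edge resistances could be derived from a conductance grid under the stated rule. The function name is hypothetical, and all conductance values are assumed to be strictly positive so that the reciprocal is defined.

```python
import math


def neighbour_resistances(conductance):
    """Edge resistances for an 8-neighbour grid graph.

    conductance: nested lists of strictly positive values, e.g. SDM
    suitability. Returns a dict mapping ((r1, c1), (r2, c2)) to the
    edge resistance; each undirected edge appears once.
    """
    rows, cols = len(conductance), len(conductance[0])
    resistance = [[1.0 / value for value in row] for row in conductance]
    # Offsets chosen so that each pair of neighbours is visited once.
    offsets = [(0, 1), (1, 0), (1, 1), (1, -1)]
    edges = {}
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    mean_res = (resistance[r][c] + resistance[rr][cc]) / 2
                    if dr != 0 and dc != 0:
                        # Diagonal neighbours: longer distance between centres.
                        mean_res *= math.sqrt(2)
                    edges[((r, c), (rr, cc))] = mean_res
    return edges


# Illustrative 2 x 2 conductance grid (assumption).
grid = [[1.0, 0.5], [0.25, 1.0]]
grid_edges = neighbour_resistances(grid)
```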
Results
Spatial modelling of the distribution of target species
The AUC values of the MaxEnt model range from 0.745 to 0.853 (Appendix 1a); the null model test indicated that the accuracy of the SDM was significantly higher than expected by chance at p<0.05 for seven of the 15 species, and at p<0.1 for two other species (Appendix 1a). As examples, Fig. 2 shows the distribution of more suitable environmental conditions for the black spider monkey (Ateles paniscus) and for the Brazilian tapir (Tapirus terrestris). Fig. 3 shows the superimposition of habitat suitability for the nine species with a prediction relevance > 90% (null hypothesis test).
The two other models had much lower performances. With GARP, the extrinsic measure of overall model performance (chi2 test) was significant in all species but one. But among the more stringent estimators, no intrinsic omission error was below 15%, and the intrinsic commission index was above the expected value (67%) in three species only (Appendix 1b). The Mahalanobis models also show weak performances, with very low AUC values and no significant null hypothesis test (Appendix 1c).
Predicted suitability, measured abundances and the role of environmental variables
With MaxEnt, ordinary least squares (OLS) and quantile regressions show that, for nine of the 15 species, abundances were correlated to predicted environmental suitabilities derived from MaxEnt modelling (Appendix 1a). Among the five environmental layers, the biogeography explained most of the geographic variation of the occurrences (Appendix 2, Fig. 4). The important contribution of the human index has to be considered, at least partially, as a spurious effect resulting from the distribution of sightings significantly influenced by overall disturbance. Due to the inaccessibility of large parts of the country, sighting records were often located in, or close to, areas with potential sources of disturbance (i.e., forest tracks and large rivers which also allow easy access to hunters), with consequently a positive, although low, disturbance index (see [12] for details). Hopefully those effects were mitigated by the projected distribution maps (see Material and Methods). Among other layers, mean altitude explained most of the remaining variation, whereas vegetation, rain and altitude range have a restricted contribution (Appendix 2). Detailed contributions of the three main variables (biogeography, disturbance index, and mean altitude) are shown in Appendix 3 for the nine above-mentioned species.
With GARP, regressions show positive and significant trends between index and abundance in three species only, although the models of those species were not associated with relevant omission and commission values (Appendix 1b). Regressions between abundances and habitat index defined with Mahalanobis modelling failed to detect any significant relationships (Appendix 1c).
Predicted environmental conditions and landscape permeability
Considering the lower performances of both GARP and Mahalanobis models, projected habitat conditions of MaxEnt only were used to map landscape permeability. The gradient of landscape permeability among the nodes of interest is shown in Fig. 5 for the nine species with a prediction relevance > 90%. A large single latitudinal corridor is shown, which will likely facilitate the flow of target species from the large refuge areas of the National Park toward the northern nature reserves.
Discussion
In this study, we explored how the Species Distribution Model could predict a gradient of environmental conditions for a set of forest-dwelling Neotropical mammals, and how those predicted environmental conditions are correlated to field-measured abundances. The objective was to contribute to the development of those models in understanding geographic variation of the densities of species [31,35,36], and in supporting conservation planning in a dynamic perspective [30].
Performance of the models
Few studies have evaluated relationships between abundances and habitat suitability derived from SDMs. On the basis of ca. 2,000 ad libitum records from 15 species and 36 surveys implemented to measure abundances, we show that even with geographically-biased sampling, Entropy-based SDMs can predict environmental conditions correlated to abundances of a large set of species, including four primates, three ungulates, one rodent and one carnivore.
Of the models, MaxEnt clearly led to better accuracy than the Mahalanobis distance-based model and the Genetic algorithm for rule set predictions, on the basis of both statistical tests on modelled accuracies, and on the relation between field recorded abundances and predicted environmental conditions. Maximum entropy models have already provided relevant predictions of most appropriate conditions for species occurrence [34,67,68], but were of less value in predicting variations in the abundance of jaguars, although GARP resulted in intermediate accuracies [36], as in our study.
The performance of the MaxEnt model and strength of the adequacy between predicted conditions and field validations nevertheless vary among species (Appendix 1a and 2). Fig. 6 shows the respective variations of abundances (expressed as Standard Deviation of abundance / Mean of abundance, in order to control for discrepancies among abundances of species) and suitability recorded from the 36 survey sites. As previously shown [34,36], although many species show both low variations of abundances and suitability, reflecting rather low ecological requirements, others show either important variation of abundance, but low variation of suitability (the south American coati Nasua nasua, the red-rumped agouti Dasyprocta leporina, the collared peccary Pecari tajacu) or the opposite pattern (the wedge-capped capuchin Cebus olivaceus). We also found that the black spider monkey Ateles paniscus has both important variation of abundances and predicted habitat suitability, likely indicating a higher level of ecological specialization and clear patterns of habitat preferences.
The roles of input variables, models used, target species, and specific traits can be evoked. Considering environmental variables used in the model processes, we detected a Wallacean shortfall, and for some species the human footprint is a more suitable positive explicative variable. Species sightings are spatially biased by the human footprint index: places where people are more abundant will also be the places where more sightings will be recorded [68]. On the other hand, abundances of large species recorded with a standardized protocol show a decline when the footprint increases, since many primates and ungulates are game species [12,40]. This contradictory role of human footprint complicates the assessment of the relation between abundances and occurrences for a set of species. Other environmental variables of importance are landscape units and mean altitude (Table 2, Figure 3) which clearly (AUC>0.84) indicate some marked ecological preferences by black spider monkeys (Ateles paniscus) and wedge-capped capuchins (Cebus olivaceus) for hilly forests, and key roles of riparian and low altitude forest for the squirrel monkeys (Saimiri sciureus) (data not shown). In contrast, the distribution of sightings of some other species (e.g., the red howler monkey Alouatta macconnelli, the white-faced saki Pithecia pithecia, the golden-handed tamarin Saguinus midas, which all have a non-significant null hypothesis test, and/or low AUC values) was not explained by SDM analysis. This suggests the need for considering other environmental variables, as soon as those become available at the scale of the study area.
Specific traits may influence the performance of the modelling. In birds, the shape of the relation between measured abundances and expected occurrences is explained by the breeding system [35]. The importance of traits has also been raised in plants [29,69]. We failed to explain the difference in model performance with bioecological patterns of species such as density, lifespan, diet, rank on trophic chains, and plasticity (data not shown). Also, the dispersal of species responses (Fig.6) was not related to the relationship between abundance and suitability, highlighting the complexity of biotic and abiotic interactions and the importance of cryptic factors not considered, such as local climatic conditions, phenology, or local threats with short-time impacts on populations: environmental suitability has to be understood as an upper, or optimal proxy of density, rather than an indicator of immediate abundance [34].
Although for some species no relation was found between the index of suitability and abundances, this may not weaken the relevance of predicted areas. Those results raise questions about the relevance of abundance as assessed in a single survey, the bias in the detection of some species by a naturalist network, and the biological significance of the environmental variables tested. At least for species with high AUC values (i.e., > 0.8, collared peccary Tayassu pecari, south American coati Nasua nasua, and red acouchy Myoprocta acouchy, see Table 1), the absence of a positive relationship between suitability and abundance may highlight the difficulty of assessing densities in tropical forests, which often requires a larger survey effort than the one deployed for our own assessments of abundances [63]. Moreover, cryptic and elusive species may bias sighting databases [70]. In our own field experience, the red acouchy Myoprocta acouchy and the white-faced saki Pithecia pithecia are difficult to detect in the field and seldom reported by volunteer naturalists, likely influencing performance of the models. Last, the roles of environmental variables on recorded abundances, which show some inconsistencies with variables explaining predicted suitability (Appendix 2), suggest the limits of our procedure when considering density as an indicator of habitat quality [71].
Implications for conservation
Despite a relevant protected area network and a voluntary environmental policy, the continuous increase of demographic pressures combined with a gold rush [72] are serious threats to terrestrial and aquatic biodiversity in French Guiana [12,73,74], raising urgent questions about the adequacy and optimization of conservation strategies. SDMs are assumed to predict environmental conditions associated with higher occurrences of the species as well as with their performance [29]. Pinpointing areas where more appropriate environmental conditions exist is vital for the planning of conservation projects, in order to maintain the diversity of ecological processes, to preserve high densities of species, and to continue ecological connectivity.
The species studied here have various ecological roles, such as seed disseminators and predators. Consequently, they provide a relevant view of the large mammal community and are a proxy for main ecological processes. The superimposition of the nine more relevant models can be a valuable tool for identifying key areas and help to design new protected areas [75]. Also, identification of functional corridors joining those protected areas requires mapping the permeability of the landscape in relation to the movements of the species. Landscape connectivity is understood as the “degree to which the landscape facilitates or impedes movement among resource patches” [76], and as the “functional relationship among habitat patches, owing to the spatial contagion of habitat and the movement responses of organisms to landscape structure” [77]. On the basis of predicted occurrences, one can assume that areas identified as less adequate by the SDMs may provide resistance values to assess the ecological costs of movements between patches [4]. This uses predicted environmental suitability in a dynamic way, as an input to assess landscape permeability. Together with ecological benefits, policy issues are important. This tool can help land planning management, when strategic information (e.g., threats, projections of infrastructures) can easily be superimposed with ecological constraints to determine conservation costs and social and political acceptability.
Some limits have nevertheless to be considered. First, we did not include all protected areas as nodes, but only areas above a threshold that needs to be better confirmed, since we saw that higher suitability values may not be associated with higher observed densities and may consequently bias the identification of nodes. Second, in such a dynamic approach, a regional perspective is needed, and surrounding areas of corridors, sources and sinks would definitively need to be determined.
Together with continuous theoretical improvements, Species Distribution Models are increasingly used for conservation science [33]. Although for some species our results may raise the question of measured abundance or predicted suitability as the most relevant proxy for “species performance” (sensu [29]), our application of SDMs has conservation interest, contributing to more relevant networks among areas of importance. Outcomes of projected and validated ecological conditions can identify areas acting as sources and corridors, and provide original conductance and/or resistance values for landscape connectivity modelling. Also, we have shown that even very rough information (i.e., opportunistic records of species presence) from networks of local naturalists and citizen participants [78,79] may be properly analysed through SDMs, and therefore be useful for conservation planning.
Acknowledgments
We acknowledge the contribution of several dozen people, professional and amateur naturalists, who have provided their field observations since 1990. The study directly benefited from DEAL Guyane. Kwata surveys (line-transects and tapirs surveys) were funded by the Office National des Forêts (ONF) Guyane, DEAL Guyane, French Zoological Parks of La Barben and Lille, the program SPECIES (funded by WWF Network, European Funds (FEDER, FFEM), the DGIS and the French Ministry of Higher Education and Research). ONCFS transects data were partly funded by Ministry of Environment (Programme ECOTROP). We acknowledge Monique Pool (Green Heritage Fund Suriname, Paramaribo, Suriname) for language editing.
References
Appendices
Appendix 1a.
Accuracy of MaxEnt modelling to explore environmental suitability for 15 large mammals in French Guiana: test AUC values, significance of the null model test, and significance of the relation between predicted environmental suitability and abundances measured in the field (36 sites). Taxonomy of mammals follows [80]. n.s. = not significant.
Appendix 1b.
Accuracy of GARP modelling (resulting from the sum of 10 independent models) to explore environmental suitability for 15 large mammals in French Guiana: omission error (expected to be < 0.05, [49]), commission index (expected to be > 0.67, [49]), and significance of the relation between predicted environmental suitability and abundances measured in the field (36 sites). Taxonomy of mammals follows [80].
Appendix 1c.
Accuracy of Mahalanobis distance modelling to explore environmental suitability for 15 large mammals in French Guiana: test AUC values, significance of the null model test, and significance of the relation between predicted environmental suitability and abundances measured in the field (36 sites). Taxonomy of mammals follows [80].
Appendix 2.
Contribution of each of the five predictive environmental variables of the best predictive model (MaxEnt) to (i) predicted environmental suitability (first value: heuristic permutation test / second value: jackknife test); and (ii) variation of field-measured abundances, based on surveys of 15 mammals on 36 sites (* = significant contribution, p<0.05, of the variable).