Caveats for correlative species distribution modeling


Ecological Informatics 29 (2015) 6–15

Contents lists available at ScienceDirect

Ecological Informatics

journal homepage: www.elsevier.com/locate/ecolinf


Catherine S. Jarnevich a,⁎, Thomas J. Stohlgren b, Sunil Kumar b, Jeffery T. Morisette c, Tracy R. Holcombe a

a U.S. Geological Survey, Fort Collins Science Center, 2150 Center Ave Bldg. C, Fort Collins, CO 80526, USA
b Natural Resource Ecology Laboratory, Colorado State University, Fort Collins, CO 80523-1499, USA
c Department of Interior, North Central Climate Science Center, Colorado State University, Fort Collins, CO 80523, USA

⁎ Corresponding author. Tel.: +1 970 226 9439; fax: +1 970 226 9230. E-mail address: [email protected] (C.S. Jarnevich).

http://dx.doi.org/10.1016/j.ecoinf.2015.06.007
1574-9541/Published by Elsevier B.V.

Article info

Article history:
Received 5 March 2015
Received in revised form 18 June 2015
Accepted 21 June 2015
Available online 24 June 2015

Keywords:
Correlative models
Species distribution modeling
Niche modeling
Presence–absence methods
Mapping
Species-environmental matching models

Abstract

Correlative species distribution models are becoming commonplace in the scientific literature and in public outreach products. They display locations, abundance, or suitable environmental conditions for harmful invasive species, threatened and endangered species, or species of special concern. Accurate species distribution models are useful for efficient and adaptive management and conservation, research, and ecological forecasting. Yet these models are often presented without fully examining or explaining the caveats for their proper use and interpretation, and they are often implemented without an understanding of the limitations and assumptions of the model being used. We describe common pitfalls, assumptions, and caveats of correlative species distribution models to help novice users and end users better interpret these models. Four primary caveats correspond to different phases of the modeling process, and each is supported with documentation and examples: (1) all sampling data are incomplete and potentially biased; (2) predictor variables must capture distribution constraints; (3) no single model works best for all species, in all areas, at all spatial scales, and over time; and (4) the results of species distribution models should be treated like a hypothesis to be tested and validated with additional sampling and modeling in an iterative process.

Published by Elsevier B.V.

Contents

1. Introduction
2. Primary caveat #1: all sampling data are incomplete and potentially biased
  2.1. Location data errors
  2.2. Sampling prevalence and sample size
  2.3. Spatial extent and background selection
3. Primary caveat #2: predictor variables must capture distribution constraints
4. Primary caveat #3: no single model works best for all species in all areas at all spatial scales and over time
  4.1. No two models are the same
  4.2. Species characteristics matter
  4.3. Evaluation and calibration
5. Primary caveat #4: the results of species distribution models should be treated like a hypothesis to be tested and validated with additional sampling and modeling in an iterative process
  5.1. Model interpretation
  5.2. Creating “yes/no” maps with thresholds
6. Conclusions
Acknowledgments
References


1. Introduction

Examples of species distribution models abound in the current literature. They are being used for many purposes:

- exploring potential impacts of climate change (Kearney et al., 2010);
- establishing conservation priority areas and selecting reserve designs (Fuller et al., 2008; Kremen et al., 2008; Pawar et al., 2007);
- guiding restoration efforts (Fei et al., 2012);
- identifying new populations of rare and endangered species (de Siqueira et al., 2009; Guisan et al., 2006);
- predicting distributions of known and unknown species (Pearson et al., 2007; Raxworthy et al., 2003). For example, Raxworthy et al. visited high-diversity locations identified from combined distribution models and found previously undescribed species;
- aiding invasive plant management (Crall et al., 2013);
- serving as a component of risk analysis (Ficetola et al., 2007);
- supporting phylogeographic studies (Miller and Knouft, 2006; Moritz et al., 2009).

To focus the scope of this paper, our discussion is limited to correlative species distribution models rather than process-based models. Correlative models develop mathematical relationships between observed locations of species occurrence (presence or absence) and environmental variables. These correlative species–environment relationships are often used to develop, rather than test, hypotheses about the drivers of species distributions. They are also applied to unsampled locations to predict occurrence of the species or its suitable environmental conditions. The same models are frequently combined with climate change scenarios in ecological forecasting.

Our review was prompted by the fact that many investigators overlook or leave unstated several important caveats. Even when the caveats are stated explicitly, they are not always respected in applications. The purpose of this paper is therefore to provide a summary document to which the modeling community, resource managers, and other users of model output can refer.
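The sketch below makes the idea of a correlative model concrete. It fits a presence–absence logistic regression (one of the presence–absence methods listed in Table 1) to a single environmental variable, then predicts the probability of occurrence at unsampled values. It is a minimal illustration, not the authors' method:

- the data are invented;
- the predictor (a standardized precipitation index) is hypothetical;
- plain gradient ascent stands in for the maximum-likelihood fitting a statistics package would perform.

```python
import math

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(presence | x) = 1 / (1 + exp(-(b0 + b1 * x))) by batch gradient
    ascent on the mean log-likelihood. ys: 1 = presence, 0 = absence."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # log-likelihood gradient w.r.t. the intercept
            g1 += (y - p) * x    # log-likelihood gradient w.r.t. the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def predict(b0, b1, x):
    """Predicted probability of occurrence at a (possibly unsampled) predictor value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical survey: standardized precipitation index at nine sites,
# with observed presence (1) or absence (0).
precip_z = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
presence = [0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(precip_z, presence)
for x in (-1.8, 0.25, 1.8):  # predictor values not present in the survey
    print(f"x = {x:+.2f}  p(presence) = {predict(b0, b1, x):.2f}")
```

The fitted curve only summarizes a pattern in nine observations. It says nothing about why the species occurs where it does, so the caveats that follow (biased samples, missing predictors, transferability) apply directly to anything predicted from it.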

Easy access to species distribution modeling software and greater computing power have increased the use of these models. Many techniques are freely available:

- in stand-alone programs (e.g., see examples in Elith et al. (2006); EcoMod by Guo and Liu (2010); SAHM by Morisette et al. (2013); Maxent by Phillips et al. (2006); and BIOMOD by Thuiller et al. (2009));
- in R packages (e.g., biomod2 and dismo);
- as R code provided in many papers (e.g., Broennimann et al., 2012).

Some implementations are very user friendly, which makes it easy to obtain output. This does not, however, mean that users understand the assumptions and limitations of a given statistical technique or of species distribution modeling in general (see Joppa et al., 2013). Each model development step involves important considerations: preparing data; selecting and implementing a model; applying a model to a novel location (e.g., native range to invaded range) or time (e.g., climate change); and evaluating and interpreting model results. In recent years, many studies have examined individual aspects of modeling in great depth. Given the plethora of papers and the great need for and interest in applications of these techniques, a review of caveats and modeling considerations is warranted.

Several review papers address some caveats of species distribution modeling. They provide a good foundation for improved practice and should be read. Our review builds on three kinds of earlier work:

- older papers that do not cover topics debated in more recent literature (e.g., Guisan and Zimmermann, 2000);
- papers that cover more specific topics (e.g., Heikkinen et al. (2006) and Thuiller et al. (2007) examine only climate change applications);
- papers that deal with the more theoretical, rather than applied, aspects of model development (e.g., Araújo and Guisan, 2006; Austin, 2007).

We also build on recent reviews examining appropriate use of bioclimatic models (e.g., Araújo and Peterson, 2012) by briefly reviewing caveats and considerations for the cautious implementation of correlative species distribution modeling techniques.

We focus on techniques that characterize habitat suitability. These techniques use presence-only or presence/absence data, whereas abundance models require a continuous value such as count data for animals or foliar cover for plant species. Although we mention some aspects of species abundance, we primarily focus on species occurrences (i.e., presence locations). The locations of occurrence or abundance are related to environmental characteristics. The patterns found between occurrence or abundance and the environment then help predict suitability or abundance at other, unsampled locations within a confined area (i.e., the extent). Such efforts can be fraught with uncertainty. Understanding four primary caveats of the habitat suitability modeling process may help communicate the utility of species distribution models, despite areas of high uncertainty.

Fig. 1 outlines the species distribution modeling workflow, and the caveats offered in this paper follow the same outline:

(1) all sampling data are incomplete and potentially biased;
(2) predictor variables must capture distribution constraints;
(3) no single model works best for all species in all areas at all spatial scales and over time; and
(4) the results of species distribution models should be treated like a hypothesis to be tested and validated with additional sampling and modeling in an iterative process.

2. Primary caveat #1: all sampling data are incomplete and potentially biased

Biological field sampling typically represents a biased snapshot of species distribution and abundance (Stohlgren, 2007). In general, field sampling is restricted in time and space to a small subset of all potential species occurrences, abundances, persistence, fecundity, and dispersal potential. A sampling design that poorly captures the entire range of the species or population of concern (e.g., Fig. 2a) may lead to sample biases, which in turn will affect model results. Locations should capture not only the geographical range of a species but also the environmental gradients over which the species is found.

A given “presence location” may occur in moderately suitable habitat or may represent temporary use of unsuitable habitat. Yet species distribution models generally treat all locations as equally suitable, on the assumption that they represent viable populations. An occurrence “point” may also represent a microsite whose environmental conditions differ from those recorded at the model's environmental resolution. For example, a plant species may be restricted to tiny springs or seeps (mesic microsites) within a large grid cell characterized as a xeric environment. To create an accurate and useful model, the model's resolution therefore needs to match the spatial context of the field sample and, relatedly, the biological phenomenon being modeled, rather than the resolution of readily available environmental predictors.
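The mesic-microsite example can be illustrated numerically. The sketch below averages a hypothetical fine-resolution moisture index (values 0 to 1) up to a single coarse cell, as happens when predictors are resampled to a coarser grid. The 0.5 moisture requirement and all grid values are invented for illustration. The point is only that block averaging can hide the one suitable fine cell.

```python
def aggregate_mean(fine, factor):
    """Average non-overlapping factor x factor blocks of a fine grid (list of rows).
    Assumes both grid dimensions are multiples of `factor`."""
    coarse = []
    for r0 in range(0, len(fine), factor):
        row = []
        for c0 in range(0, len(fine[0]), factor):
            block = [fine[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse

# Hypothetical 4 x 4 moisture index: a dry (xeric) landscape with one seep.
fine = [[0.1] * 4 for _ in range(4)]
fine[2][1] = 0.9                      # mesic microsite where the plant was recorded

coarse = aggregate_mean(fine, 4)      # one coarse cell covering all 16 fine cells
REQUIRED_MOISTURE = 0.5               # hypothetical requirement of the species

print(max(max(row) for row in fine) >= REQUIRED_MOISTURE)  # True: suitable at fine resolution
print(coarse[0][0] >= REQUIRED_MOISTURE)                   # False: coarse mean is about 0.15
```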

2.1. Location data errors

True “presence data” must consist of observations that are concurrent with the environmental predictors and must be taxonomically accurate. Lozier et al. (2009) highlighted the issue of data quality by generating a fictitious distribution of Sasquatch from data that represented misidentifications of the black bear, Ursus americanus. While this is a humorous example, it underscores the importance of using high-quality data in model generation. Dormann et al. (2008) identified uncertainty in species breeding status at locations as a primary source of uncertainty in model results (27.4% of identified uncertainty), compared with uncertainty from model type, variable selection within model method, and collinearity treatment.

Other errors relate to the quality of the georeferenced information associated with the data. Some modeling methods may be less sensitive than others to moderate location errors, and useful predictions may still be generated when errors are present (Graham et al., 2008). If a model is well specified and only a small proportion of samples have errors (e.g., misidentification or inaccurate coordinates), model results should not be greatly affected (see Rodda et al., 2011). The spatial resolution of the environmental predictors can be adjusted to accommodate locational uncertainties in species occurrences. For example, if occurrence data have locational uncertainties of 1–2 km, using 4–5 km resolution data layers may yield more accurate model predictions (e.g., Kumar et al., 2014). The location of the samples also matters, because incomplete sampling across important environmental gradients affects interpolation and extrapolation.
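One way to act on this advice is to check how many occurrence records, at a candidate predictor resolution, have their whole uncertainty radius inside a single grid cell. The helper below is our own illustrative sketch, not a method from Kumar et al. (2014). It assumes projected coordinates in kilometres, square cells aligned at the origin, and invented records.

```python
import math

def within_one_cell(x, y, radius, res):
    """True if the circle of `radius` around (x, y) lies entirely inside the
    res x res grid cell that contains (x, y). Units: projected km."""
    x0 = math.floor(x / res) * res   # lower-left corner of the containing cell
    y0 = math.floor(y / res) * res
    return (x - radius >= x0 and x + radius <= x0 + res and
            y - radius >= y0 and y + radius <= y0 + res)

def fraction_within_cells(records, res):
    """Share of (x, y, uncertainty_radius) records contained in one cell at resolution res."""
    return sum(within_one_cell(x, y, r, res) for x, y, r in records) / len(records)

# Hypothetical occurrences with 1-2 km locational uncertainty.
records = [(2.5, 2.5, 1.5), (0.5, 0.5, 1.0), (7.4, 2.5, 2.0), (12.0, 12.0, 1.0)]

for res in (1.0, 5.0):
    print(res, fraction_within_cells(records, res))  # 1.0 -> 0.0, 5.0 -> 0.75
```

The second record fails at every resolution on this grid because it sits next to a cell corner. Coarsening helps on average, but it cannot guarantee containment for records near cell edges.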

Fig. 1. Phases of the modeling process. These phases correspond to the primary caveats and include (1) sampling data; (2) predictor variables; (3) model algorithm; and (4) the results of species distribution models.


True “absence” may be impossible to model over large areas through time. An observed “absence” of a species during field sampling cannot distinguish among three cases (e.g., Fig. 2b): undetected presence, suitable environmental conditions not yet colonized by the species, and unsuitable environmental conditions. For example, seeds may be buried in soil and go unreported. Small organisms, or certain life stages of organisms, may be cryptic or microscopic (e.g., insect eggs buried in the soil, or insect larvae developing inside a tree or fruit before emerging as adults).

Imperfect detection can also produce false absences (Boulinier et al., 1998; MacKenzie, 2005). Detection depends on the survey method used, on species characteristics and abundance across the landscape (cryptic species have low detectability), and on the heterogeneity of environmental conditions (which makes it harder to sample all of them). True absence data at appropriate spatial scales also may not be stable over time as climate and other conditions change (Table 1). This is particularly true for invasive species, which are likely to be expanding, and for threatened and endangered species, which are of concern because they may be shrinking.

Still, where absence data are available they should be used, because they generally produce better models (Brotons et al., 2004). Research using artificial species also indicates that randomly generated pseudo-absence locations may be better than no absence data at all (Wisz and Guisan, 2009). The best that species distribution modeling tools can do is provide a snapshot of a species' distribution or abundance over a defined period of time, with estimated levels of uncertainty.
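The effect of imperfect detection on false absences can be quantified under simple occupancy-style assumptions. Each survey of an occupied site detects the species independently with the same probability p, and the site's occupancy does not change between surveys. Under these assumptions, which real surveys often violate, the chance of recording a false absence after k surveys is (1 - p)^k. The sketch below computes that quantity and the number of surveys needed to reach a chosen confidence. The detection probabilities are hypothetical.

```python
import math

def prob_false_absence(p_detect, n_surveys):
    """Probability that an occupied site yields no detection in n independent
    surveys, each with per-survey detection probability p_detect."""
    return (1.0 - p_detect) ** n_surveys

def surveys_needed(p_detect, confidence=0.95):
    """Smallest number of surveys for which an occupied site is detected at
    least once with probability >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_detect))

# Hypothetical cryptic species detected on 30% of visits to occupied sites.
print(prob_false_absence(0.3, 3))   # ~0.343: three visits still miss it about a third of the time
print(surveys_needed(0.3, 0.95))    # 9
```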

It is important to remember that all species distribution models are affected in some way by the quality, completeness, and potential biases of the data (Stohlgren, 2007). The best models cannot make up for poor-quality data. Ideal location data would consist of two parts: presence locations covering the suitable environmental conditions accessible to a species, and absence data covering the unsuitable environmental conditions accessible to that species. In any study, the description of the field sampling method should be explicit. It should state the spatial consistency between observations and the model grid resolution, the temporal consistency between samples and predictor layers, and how absence observations or presence-only considerations are handled.

Fig. 2. Issues from species distribution modeling. Illustrated examples of different issues that may arise when developing species distribution models:
(a) spatial sampling bias: riparian zones may be important, and here an entire branch of a river was missed, although there were several absence locations (open circles, absence; closed circles, presence) in the vicinity;
(b) temporal sampling, which could bias predictions if the species had an unstable distribution: if an invader newly arrived here at Time 1, there may be several false absence locations in the area overlapped by the future Time 2;
(c) skewed prevalence and distribution of samples: ideally, samples should cover the entire area being predicted, unlike this picture, where there are many more absence locations and the northern part of the environmental gradient is missed;
(d) non-environmental factors affecting a species' distribution, such as biotic interactions (here, biological control for a species) or genetic diversity;
(e) missing important predictors, such as a stream layer for a riparian species, may decrease model performance (species A modeled with the stream layer has a more restricted distribution than species A modeled without it);
(f) models for the same species using the same predictors but trained on different geographic areas will produce different results;
(g) generalist species (distribution encompassed by the circle) may be harder to predict than specialist species (limited here to the riparian zone);
(h) the threshold selected to define locations as suitable or unsuitable will affect the extent predicted as suitable.

Table 1
Major underlying assumptions of presence-only models, presence–absence models, and abundance models.

Model type: assumptions or idiosyncrasies

Presence-background (e.g., Maxent and GARP)
1. Background locations adequately sample areas accessible to the species (see Fig. 2a).
2. Samples cover important environmental gradients.
3. Produces a model of relative habitat suitability (e.g., one location is more suitable than another), but values do not reflect occupancy.

Presence–absence models (e.g., logistic regression)
1. Absence recorded at a specific location is assumed to denote absence at the resolution of the cell in the model.
2. Absence locations are assumed to represent unsuitable environmental conditions.
3. Samples cover important environmental gradients (see Fig. 2c).
4. Produces a model of occupancy.

Abundance models (e.g., multiple linear regression)
1. Abundance measurements at a point are assumed to denote abundance at the resolution of the cell in the model.
2. Abundance is not affected by time since colonization (see Fig. 2b).
3. Samples cover important environmental gradients (see Fig. 2c).

2.2. Sampling prevalence and sample size

The influence of prevalence in species presence/absence datasets has received a lot of attention. Prevalence is the proportion of sampled locations that are presences. A 50% sampling prevalence is often preferred (Fig. 2c), but prevalences ranging from below 20% to above 75% may provide decent results (McPherson et al., 2004). Note, however, that McPherson et al. (2004) tested only linear regression and non-linear discriminant analysis, and other techniques may perform differently.

In contrast, Jimenez-Valverde and Lobo (2006) suggest that prevalence is not a factor if two conditions hold:
(1) models are developed with good (representative) predictors and accurate training data (few false absences and adequate sample size); and
(2) the data are not spatially or environmentally biased (e.g., Fig. 2a, where one branch of a riparian area was missed, or Fig. 2c, where the northern part of the area of interest was not surveyed).

Franklin et al. (2009) manipulated prevalence for each of a suite of species ranging from common to rare across the landscape. They found that prevalence had no effect. Instead, more accurate models could be developed for habitat-specialist species than for habitat-generalist species (also see Evangelista et al., 2008; Fig. 2g). This result may, however, be an artifact of the evaluation metric (Lobo et al., 2008; see the ‘Species characteristics matter’ section below). The ideal conditions proposed by Jimenez-Valverde and Lobo (2006) are difficult to meet, so checking prevalence in your data may still be a good idea.
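Checking prevalence is straightforward. The sketch below computes overall prevalence (the proportion of sampled locations that are presences) and prevalence within strata such as survey regions. The per-stratum view can also reveal the kind of uneven coverage shown in Fig. 2c. The stratum labels and counts are hypothetical.

```python
from collections import defaultdict

def prevalence(labels):
    """Proportion of sampled locations that are presences (1 = presence, 0 = absence)."""
    return sum(labels) / len(labels)

def prevalence_by_stratum(records):
    """records: iterable of (stratum, label). Returns {stratum: (n_samples, prevalence)}."""
    groups = defaultdict(list)
    for stratum, label in records:
        groups[stratum].append(label)
    return {s: (len(v), prevalence(v)) for s, v in groups.items()}

# Hypothetical survey: most effort in the south, little in the north.
records = [("south", 1)] * 4 + [("south", 0)] * 6 + [("north", 0)] * 2

print(prevalence([label for _, label in records]))  # 0.333...
print(prevalence_by_stratum(records))  # {'south': (10, 0.4), 'north': (2, 0.0)}
```

Two northern samples are far too few to say anything about the species there. Part of a low overall prevalence can therefore reflect where sampling happened rather than where the species occurs.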

The minimum sample size required to create a useful model is another issue. In one study, none of the algorithms tested predicted consistently well at small sample sizes (n < 30; Wisz et al., 2008). However, small samples may still provide useful information to guide further survey efforts and to locate unknown populations more efficiently. For example, Pearson et al. (2007) used jackknife techniques with Maxent and as few as five presence locations to successfully identify unknown populations. Their models highlighted locations with similar environmental conditions rather than range boundaries. Sample size also limits the number of predictors that should be included in a model, with a recommendation of at least 10 data points per variable (Hosmer and Lemeshow, 2000). Categorical variables require further attention to sample size: each class of a categorical predictor should have sufficient points (e.g., at least 10 data points per class).
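The rules of thumb in this paragraph translate into two quick checks. The first is the maximum number of predictors supported by n data points at 10 points per variable. The second lists any classes of a categorical predictor that have fewer than 10 records. This is a sketch of those heuristics only, and the function names and land-cover classes are hypothetical.

```python
from collections import Counter

def max_predictors(n_points, per_variable=10):
    """Largest number of predictors allowed by a points-per-variable rule of thumb."""
    return n_points // per_variable

def sparse_classes(class_values, min_per_class=10):
    """Classes of a categorical predictor represented by fewer than min_per_class records."""
    counts = Counter(class_values)
    return sorted(c for c, k in counts.items() if k < min_per_class)

# Hypothetical dataset: 57 occurrence records, each with a land-cover class.
land_cover = ["forest"] * 30 + ["grassland"] * 24 + ["wetland"] * 3

print(max_predictors(len(land_cover)))  # 5
print(sparse_classes(land_cover))       # ['wetland']
```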


2.3. Spatial extent and background selection

Ideally, the spatial extent of the modeling exercise (and thus of the location data) should cover the area that has been accessible to the species of interest over the relevant time period. Background or absence data should be limited to this same extent (Barve et al., 2011). However, if presence data do not cover the entire range accessible to a species, the geographic extent of these data should still define the extent for model development. If background or absence data are drawn from areas outside the sampling range of the presence locations, results will be altered and assessment metrics inflated.

VanDerWal et al. (2009) varied the spatial extent from which they drew background points, using 10 distance bands around presence points from 10 km to 500 km. For 12 species in northwestern Australia, background points randomly extracted from within 200 km of presence locations produced the best Area Under the Curve results (AUC; discussed in more detail in the ‘Evaluation and calibration’ section). Smaller areas produced misleading models, while larger areas inflated test statistics, over-predicted distributions, and altered response variable selection.

The number of background points also affects model predictions. A large number of points, usually 10,000 or more, is recommended to represent the background environment (Barbet-Massin et al., 2012; Northrup et al., 2013). This number will vary with the extent of the study area and the heterogeneity of the environment, because the background data need to adequately capture the environmental gradients available to the species.
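A background sample restricted to a distance band around the presences can be drawn by rejection sampling. Uniform points are proposed in a bounding box, and only those within the chosen distance of at least one presence are kept. The sketch below makes several assumptions:

- coordinates are planar kilometres (geographic coordinates would need geodesic distances);
- the 200 km band and 10,000 points mentioned above are used purely as example settings;
- it draws continuous points rather than the raster cells that many tools sample instead;
- the presence coordinates are invented.

```python
import math
import random

def sample_background(presences, radius_km, n, seed=0):
    """Draw n points uniformly from the union of circles of radius_km around the
    presence points (planar km coordinates), by rejection sampling."""
    rng = random.Random(seed)
    xs = [x for x, _ in presences]
    ys = [y for _, y in presences]
    xmin, xmax = min(xs) - radius_km, max(xs) + radius_km
    ymin, ymax = min(ys) - radius_km, max(ys) + radius_km
    points = []
    while len(points) < n:
        candidate = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        # keep the candidate only if it falls within radius_km of some presence
        if any(math.dist(candidate, p) <= radius_km for p in presences):
            points.append(candidate)
    return points

# Hypothetical presence locations (km, projected coordinates).
presences = [(0.0, 0.0), (150.0, 40.0), (300.0, -20.0)]
background = sample_background(presences, radius_km=200.0, n=10_000)
print(len(background))  # 10000
```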

Projecting predictions to large extents from a model created for a small area may also lead to problems. If the goal is to capture the potential distribution, the location data used to develop the model should ideally come from the broadest geographic and environmental range possible. These modeling techniques rely on correlative relationships between a species and the environment. A species' relationships to environmental parameters, and the relationships among those parameters, may not be constant in space. Factors that control a species' distribution in one region may therefore differ from those in another region (Fig. 2f). For example, Duncan et al. (2009) examined the transferability of climate envelope models built for dung beetles native to Africa and introduced to Australia. For three of five species, models created with data from the native range (Africa) did not accurately predict distribution in the introduced range (Australia). This may be due to a shift in the species' climatic niche during the invasion process (Broennimann et al., 2007), or factors other than climate may be controlling the distribution.
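A simple first check before transferring a model is to flag projection locations whose predictor values fall outside the range seen in the training data. The sketch below does this one variable at a time. It is a deliberately crude illustration (more complete multivariate similarity measures exist), and the temperature and precipitation values are hypothetical.

```python
def outside_training_range(train_rows, new_rows):
    """For each new row, return the indices of predictors whose values fall
    outside the [min, max] range observed in the training rows."""
    n_vars = len(train_rows[0])
    lows = [min(row[j] for row in train_rows) for j in range(n_vars)]
    highs = [max(row[j] for row in train_rows) for j in range(n_vars)]
    return [[j for j in range(n_vars) if not lows[j] <= row[j] <= highs[j]]
            for row in new_rows]

# Hypothetical predictors: (mean annual temperature in deg C, annual precipitation in mm).
native = [(18, 600), (22, 900), (25, 1200)]      # training data (native range)
introduced = [(20, 700), (28, 800), (15, 1500)]  # projection sites (introduced range)

print(outside_training_range(native, introduced))  # [[], [0], [0, 1]]
```

A location can pass this check and still be environmentally novel if its combination of values never occurs in the training data. This is one reason why a changing correlation structure among predictors matters when models are transferred.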

3. Primary caveat #2: predictor variables must capture distribution constraints

Predictor variables considered in the modeling should be biologically meaningful, selected on the basis of the species' eco-physiological tolerances or habitat requirements (Austin, 2007; Guisan and Zimmermann, 2000). The choice of the set of independent or predictor variables can affect model predictions and performance (Johnson and Gillingham, 2005). For example, one study used three different sets of climate predictors to model the potential distribution of a vulnerable butterfly species (Harris et al., 2013). Depending on the predictor set, the species was projected either to experience little change in suitable environmental conditions or to undergo a large range expansion under climate change.

Given these potential effects, the combination of variables retained during screening processes, such as the removal of correlated variables, is very important (Braunisch et al., 2013). Highly correlated variables should be removed because predictors should be “independent” of one another, that is, not highly collinear. A common criterion is a correlation coefficient of |r| > 0.70, assessed using Pearson, Spearman rank, or Kendall correlation metrics (Dormann et al., 2013). Collinearity is especially problematic when a model is applied to a novel geographic location or time, particularly if the correlation structure among variables is not constant. This constitutes extrapolation rather than interpolation. High collinearity among variables can also make the interpretation of their relative importance very difficult.
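Correlation screening can be sketched as a greedy pass. Candidate predictors are ordered by biological priority, following the advice above to choose meaningful variables. Each one is kept only if its |r| with every already-kept predictor is at most 0.70. This sketch uses Pearson's r only (Spearman's rank correlation would apply the same formula to ranks), and the variable names and values are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def screen_collinear(variables, threshold=0.70):
    """Keep predictors in priority (insertion) order, dropping any whose |r|
    with an already-kept predictor exceeds threshold."""
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical predictor values at six sample locations, listed in priority order.
predictors = {
    "mean_temp": [10, 12, 14, 16, 18, 20],
    "max_temp": [20, 23, 24, 27, 28, 31],       # r ~ 0.99 with mean_temp
    "precip": [800, 500, 900, 600, 850, 550],   # r ~ -0.16 with mean_temp
}
print(screen_collinear(predictors))  # ['mean_temp', 'precip']
```

Because the pass is greedy, the order of the dictionary decides which member of a correlated pair survives. This is where ecological judgment, rather than the correlation matrix alone, should drive the choice.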

Variables should also not be greatly affected by spatial autocorrelation. Spatial autocorrelation in response and predictor variables can make model residuals non-independent, which violates the spatial independence assumption of some techniques (e.g., ordinary least squares regression). It can also shift parameter coefficients, cause standard errors to be underestimated, and lead to overestimated statistical significance (Hawkins, 2012; Mauricio Bini et al., 2009). A lack of independence among residuals could also be due to a missing important environmental predictor (e.g., Fig. 2e), or to clustered or geographically and environmentally biased occurrence data.

Model residuals should therefore be examined for spatial autocorrelation (Legendre and Legendre, 1998) using metrics such as Moran's I coefficient of spatial autocorrelation, variograms, and spatial correlograms (see Dormann et al., 2007 for details). Several remedies exist:

- including spatial structure in the model error term (e.g., Lichstein et al., 2002);
- adding predictor variable(s) that are ecologically meaningful and also match the spatial pattern of the residuals;
- the spatial leave-one-out procedure suggested by Le Rest et al. (2014), a special case of spatial cross-validation in which each point left out is spatially independent of the others.

High spatial autocorrelation in species occurrence data can also produce highly inflated test evaluation statistics and erroneous model projections (Boria et al., 2014). A spatial filtering approach can reduce spatial autocorrelation in occurrence data (Boria et al., 2014), and spatial cross-validation can reduce inflation in test evaluation statistics.
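Moran's I for model residuals can be computed directly from its definition:

I = (n / W) Σ_i Σ_j w_ij (e_i - ē)(e_j - ē) / Σ_i (e_i - ē)²

Here e are the residuals, w_ij are spatial weights, and W is the sum of the weights. Positive values indicate that similar residuals cluster together; negative values indicate that they alternate. The sketch below makes three simplifying choices:

- binary distance-band weights (w_ij = 1 for distinct points within max_dist), one simple choice among many;
- no significance test (this is often done by permutation);
- toy residuals at sites along a line.

```python
import math

def morans_i(coords, values, max_dist):
    """Global Moran's I with binary distance-band weights:
    w_ij = 1 if i != j and distance(i, j) <= max_dist, else 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0    # sum over pairs of w_ij * dev_i * dev_j
    w_sum = 0.0  # W, the sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_dist:
                num += dev[i] * dev[j]
                w_sum += 1.0
    if w_sum == 0:
        raise ValueError("no neighbour pairs within max_dist")
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

# Toy residuals at four sites spaced 1 km apart along a transect.
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(morans_i(sites, [1.0, 1.0, -1.0, -1.0], max_dist=1.0))  # ~0.333 (clustered)
print(morans_i(sites, [1.0, -1.0, 1.0, -1.0], max_dist=1.0))  # ~-1.0 (alternating)
```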

Variables that have a direct influence on a species' distribution are more transferable than indirect predictors (Guisan and Zimmermann, 2000). If you want to project a model to a new area or to a future time, such as using an invasive species' native range to predict its invaded range, use direct predictors (Fig. 2f). An indirect predictor such as elevation, which is a surrogate for other predictors like temperature, may not mean the same thing in different locations: an elevation of 2000 m at 20° latitude will probably be warmer on average than an elevation of 2000 m at 40° latitude.

Models generally only include abiotic factors, and these are often limited to climatic information. Many factors other than abiotic conditions can affect species distributions, including genetic diversity, dispersal ability (including anthropogenic spread pathways, which are often ignored), the presence of competing or predatory species, and natural succession, adaptations, and evolution (Fig. 2d). These factors could be included in a model. For example, models could be created for different genetic sub-populations (e.g., Jarnevich et al. (2014) created models for Africanized honey bees, which are a hybrid of several subspecies of Apis mellifera). For an invasive species, one model could be created for dispersal, where predictors would include anthropogenic factors associated with spread, such as distance to transportation corridors or movements of human populations, and another for establishment, where predictors would include climatic factors that may limit habitat suitability. Many species-environmental matching models inadequately represent complex interactions in the real world. Missing a key environmental layer can drastically affect model results (Fig. 2e), and poor model performance can alert a modeler to this issue. For example, missing a stream layer (for distance-from-stream calculations) for a phreatophyte species may weaken the models substantially, resulting in poor evaluation metrics. If a model performs poorly, re-evaluating the predictor layers incorporated into the model may be helpful.

11C.S. Jarnevich et al. / Ecological Informatics 29 (2015) 6–15

4. Primary caveat #3: no single model works best for all species in all areas at all spatial scales and over time

As evidenced by the results from different model comparison studies and multiple model runs within the studies themselves (Elith et al., 2006; Tsoar et al., 2007), no single model works best for all species in all areas at all spatial scales and over time. Likewise, all models contain sources of uncertainty and share some common assumptions. Below, we describe several related caveats and considerations when selecting, applying, or interpreting species distribution models.

All species-environmental matching models have underlying assumptions that are rarely fully met. For example, the models generally assume equilibrium or semi-equilibrium conditions, whereby species distributions are largely unaffected by the time since establishment (i.e., the time needed to fill most available niches), dispersal limitations, or other major ecosystem processes such as competition or disturbance (Austin, 2002; Guisan and Thuiller, 2005; Guisan and Zimmermann, 2000; Soberon and Peterson, 2005). As highlighted in caveats one and two, all models assume adequate sampling across important environmental gradients and the area of interest (model extent), with predictor variables capturing the constraints of the species' distribution (Table 1). Within the statistical model, the functional forms of the relationships between occurrence data and predictors that the model generates are assumed to be correct. Modelers may find it difficult to meet these assumptions, especially when dealing with changing conditions (climate, land use) or a new or highly invasive species, but they can still produce useful models.

There are additional assumptions and idiosyncrasies of most presence-only models, presence–absence models, and species-abundance models (Table 1). True presence-only models develop relationships between a species' known locations and the environment at those locations, constructing some sort of box around those conditions (see Hirzel et al., 2002). Other methods, often referred to as presence-only, actually require background locations that reflect the available environmental conditions in the region, which are then compared to the environmental conditions at known presence locations. Presence–absence techniques require both presence data and absence data, although the absence data could be pseudo-absences (generated locations to represent absences) rather than measured absences. Models using either background locations or pseudo-absence locations can suffer from sampling bias (geographic and/or environmental) if the background or pseudo-absence locations do not mimic the sampling bias in the presence data. If possible, bias in presence locations should be mimicked in the selection of background or pseudo-absence locations. Many different methods have been developed to try to mimic the sampling bias found in presence data. See Phillips et al. (2009) for an example using target-group occurrence data as background points in Maxent to reduce the bias and improve models. Background and pseudo-absence points should also only be drawn from regions accessible to the species (Barve et al., 2011). Any of the parametric techniques, such as logistic regression, assume that locations are randomly collected, which is often not the case.
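One way to mimic sampling bias when drawing background points, as recommended above, is to sample cells with probability proportional to a bias surface (for example, a density of target-group records) while restricting draws to the accessible area. The Python sketch below does this without replacement using Efraimidis–Spirakis weighted-sampling keys. The grid, the bias weights, and the accessibility mask are hypothetical, and this is not the specific procedure used by Phillips et al. (2009) or by any particular software package.

```python
import random


def sample_background(cells, bias, accessible, k, seed=None):
    """Draw k distinct background cells, favoring cells with larger bias weights.

    cells: iterable of cell identifiers (e.g., (row, col) tuples)
    bias: dict cell -> non-negative weight (e.g., hypothetical sampling effort)
    accessible: dict cell -> bool, True if the species could reach the cell
    """
    rng = random.Random(seed)
    keyed = []
    for cell in cells:
        w = bias[cell]
        if accessible[cell] and w > 0:
            # Efraimidis-Spirakis: keeping the k largest keys u ** (1 / w)
            # gives a weighted random sample without replacement.
            keyed.append((rng.random() ** (1.0 / w), cell))
    if len(keyed) < k:
        raise ValueError("fewer eligible cells than requested background points")
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [cell for _, cell in keyed[:k]]


# Hypothetical 5 x 5 grid: effort increases with row; only columns 0-2 are accessible.
cells = [(r, c) for r in range(5) for c in range(5)]
bias = {cell: 1.0 + cell[0] for cell in cells}
accessible = {cell: cell[1] < 3 for cell in cells}
background = sample_background(cells, bias, accessible, k=10, seed=1)
print(len(background), all(accessible[c] for c in background))  # 10 True
```

Cells outside the accessible area, or with zero weight, can never be drawn, which mirrors the two recommendations in the paragraph above.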

4.1. No two models are the same

No two models will produce identical results, whether holding inputs constant and switching modeling methods or using the same method with different inputs (Fig. 2f). Sensitivity of the model to variable choice was discussed above in caveat two. The choice of modeling method depends on the type of dataset (e.g., presence-only, presence–absence, or abundance) and the specific study objectives. You can either choose the best performing modeling method(s) based on previous model comparison studies or test multiple methods on your own. There are several comparison studies in the literature (e.g., Elith et al., 2006). Some compare different methods for specific aspects or applications, such as presence-only modeling (Ortega-Huerta and Peterson, 2008), presence–absence versus presence-only methods (Brotons et al., 2004), habitat use (i.e., habitat specialist versus generalist; Evangelista et al., 2008), grain size (Guisan et al., 2007a), sample size (Wisz et al., 2008), species range sizes (McPherson et al., 2004), species' ecology (Guisan et al., 2007b; McPherson and Jetz, 2007), prevalence, latitudinal range, and clumping (spatial autocorrelation; Marmion et al., 2009), and many others. However, most papers that have tried to quantify different contributions to uncertainty have identified model type as the top contributor (e.g., Diniz et al., 2009; Dormann et al., 2008).

Ensemble modeling may help with these differences among models (Araujo and New, 2007; Marmion et al., 2009; Roura-Pascual et al., 2009) and can at least help identify uncertainty resulting from different choices made during the modeling process. Ensemble modeling can be used to highlight where different models agree and where they disagree (Stohlgren et al., 2010). The ensemble model will only be as good as the models used, and it will rely on the accuracy of the individual models along with the sampling and predictive layers mentioned earlier (e.g., sample size and distribution relative to major environmental gradients, and the resolution and accuracy of the predictive layers). All models also contain other uncertainties associated with species characteristics, sampling design, and the completeness, accuracy, and resolution of predictive layers (Stohlgren et al., 2010).
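A simple way to show where ensemble members agree and disagree is to summarize, for each cell, the mean prediction, the spread among models, and the number of models predicting suitability at or above a threshold. The sketch below assumes that each model's output is a list of suitability values over the same cells in the same order; the 0.5 threshold and the example values are arbitrary placeholders, and weighted ensembles or other consensus rules are equally possible.

```python
from statistics import mean, pstdev


def ensemble_summary(predictions, threshold=0.5):
    """Per-cell summary of several models' suitability predictions.

    predictions: list of models, each a list of values (one per cell, same order).
    Returns one dict per cell with the ensemble mean, the population standard
    deviation among models (disagreement), and the count of models at or above
    the threshold.
    """
    summary = []
    for cell_values in zip(*predictions):
        summary.append({
            "mean": mean(cell_values),
            "sd": pstdev(cell_values),
            "n_above": sum(v >= threshold for v in cell_values),
        })
    return summary


# Hypothetical outputs of three models for the same three cells.
model_outputs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.1, 0.3],
    [0.7, 0.3, 0.9],
]
for cell in ensemble_summary(model_outputs):
    print(round(cell["mean"], 2), round(cell["sd"], 3), cell["n_above"])
```

In this toy example the third cell has an intermediate mean but the largest spread, and only two of three models call it suitable, which is the kind of disagreement an ensemble map can flag.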

4.2. Species characteristics matter

Many researchers have suggested that generalist species are harder to model than habitat-specialist species (Brotons et al., 2004; Evangelista et al., 2008; Tsoar et al., 2007) in species–environment matching models (Fig. 2g). Species also may be generalists over some geographic extents and specialists over other portions of their distributions. In general, habitat-generalist and habitat-specialist species differ in the ratio of the extent they occupy to the extent of the area of interest. The ratio will be much smaller for habitat-specialist species (small patches scattered across the landscape of interest, compared to large swaths of that landscape for generalists), and, for specialists, absence locations drawn from the entire area of interest will be more likely to differ from the presence locations when quantifying species–environment relationships with the models. This ability to discriminate presence from absence locations is what most evaluation metrics measure. It should be kept in mind that AUC values are not directly comparable between models unless the exact same location data (presence and absence/pseudo-absence/background data) are used to compute them.

4.3. Evaluation and calibration

As a first step, model results should be examined to determine if they match what is known about the biology of a species and its distribution. If the results contradict what is known, the model implementation and input data should be reviewed. An independent data set is ideal for statistical model evaluation and calibration but often is unavailable. Alternatives include cross-validation, where the data set is divided into n equal parts (or folds) and n models are run, withholding a different subset for testing each time (spatially or not; see the caveat two section on autocorrelation). This option has advantages over the single split-sample approach because it is less sensitive to spurious results due to a single random selection.
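The n-fold cross-validation described above can be made spatial by assigning whole grid blocks, rather than individual points, to folds, so that nearby and possibly autocorrelated points are withheld together. The Python sketch below shows one simple blocking scheme under assumed inputs (planar coordinates and a square block size chosen by the analyst); it is illustrative and not the specific procedure of any study cited here. Choosing a block size small enough that every point falls in its own block reduces it to ordinary random fold assignment.

```python
import random


def block_id(x, y, block_size):
    """Index of the square grid block containing point (x, y)."""
    return (int(x // block_size), int(y // block_size))


def spatial_block_folds(coords, block_size, k, seed=None):
    """Assign each point a fold number in 0..k-1, keeping blocks intact."""
    blocks = sorted({block_id(x, y, block_size) for x, y in coords})
    rng = random.Random(seed)
    rng.shuffle(blocks)
    fold_of_block = {b: i % k for i, b in enumerate(blocks)}
    return [fold_of_block[block_id(x, y, block_size)] for x, y in coords]


def cv_splits(folds, k):
    """Yield (train_indices, test_indices), withholding one fold at a time."""
    for f in range(k):
        test = [i for i, fold in enumerate(folds) if fold == f]
        train = [i for i, fold in enumerate(folds) if fold != f]
        yield train, test


# Hypothetical occurrence coordinates (e.g., km) falling in four 1 x 1 blocks.
coords = [(0.1, 0.1), (0.2, 0.3), (5.1, 0.2), (5.5, 0.9), (0.3, 5.2), (5.2, 5.3)]
folds = spatial_block_folds(coords, block_size=1.0, k=2, seed=42)
for train, test in cv_splits(folds, k=2):
    print("train:", train, "test:", test)
```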

It is wise to calculate multiple evaluation statistics to assess model performance rather than relying on a single statistic. For example, Cohen's Kappa may be sensitive to prevalence, while AUC (Fielding and Bell, 1997) or the True Skill Statistic (TSS; Allouche et al., 2006) may be less affected. Comparing models using multiple evaluation statistics will allow for better overall evaluation (Lobo et al., 2008), and several packages, including most of those mentioned earlier in this article, make this easy to do.
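To make this comparison concrete, the sketch below computes several threshold-dependent statistics from a confusion matrix: sensitivity, specificity, correct classification rate, Cohen's kappa, and TSS (sensitivity + specificity - 1). The counts are hypothetical. With the same sensitivity (0.8) and specificity (0.6), lowering prevalence from 50% to 10% lowers kappa from 0.40 to about 0.16 while TSS stays at 0.40, illustrating the prevalence sensitivity noted above.

```python
def confusion_stats(tp, fp, fn, tn):
    """Threshold-dependent evaluation statistics from a 2 x 2 confusion matrix.

    tp: presences predicted present   fp: absences predicted present
    fn: presences predicted absent    tn: absences predicted absent
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ccr = (tp + tn) / n  # correct classification rate (observed agreement)
    # Agreement expected by chance given the row and column totals.
    chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (ccr - chance) / (1 - chance)
    tss = sensitivity + specificity - 1
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ccr": ccr, "kappa": kappa, "tss": tss}


# Hypothetical confusion matrices with identical sensitivity and specificity.
balanced = confusion_stats(tp=40, fp=20, fn=10, tn=30)  # prevalence 0.5
rare = confusion_stats(tp=8, fp=36, fn=2, tn=54)  # prevalence 0.1
for label, s in (("balanced", balanced), ("rare", rare)):
    print(label, round(s["kappa"], 2), round(s["tss"], 2))
```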


AUC is perhaps the most commonly used evaluation metric. It is a discrimination statistic representing the probability that a randomly chosen presence location has a higher predicted value than a randomly chosen absence location (Pearce and Ferrier, 2000). It is important to note that an AUC value of 0.5 represents a model that is no better than random. So, while AUC can range from 0 to 1, the practical range is from 0.5 to 1.0. Generally, AUC values less than 0.7 imply a questionable model fit (Swets, 1988). However, AUC can be sensitive to the spatial extent from which absences or background points are drawn (Lobo et al., 2008). For presence-background or presence and pseudo-absence data, AUC values should be interpreted cautiously because they give equal weight to errors of commission and omission (see Jiménez-Valverde, 2012 for more discussion).
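The probabilistic definition of AUC given above can be computed directly by comparing every presence score with every absence (or background) score, counting ties as one half; this equals the Mann–Whitney U statistic divided by the number of pairs. The sketch below uses hypothetical scores and favors clarity over speed (it makes one comparison per pair), so rank-based implementations are preferable for large data sets.

```python
def auc(presence_scores, absence_scores):
    """Probability that a random presence outscores a random absence (ties = 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical predicted suitability at presence and absence locations.
print(auc([0.9, 0.8], [0.2, 0.1]))  # 1.0: perfect discrimination
print(auc([0.8, 0.4], [0.6, 0.2]))  # 0.75: 3 of 4 pairs ranked correctly
print(auc([0.5, 0.5], [0.5, 0.5]))  # 0.5: no better than random
```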

Often the purpose of creating a distribution model is to predict suitable environmental conditions in new locations (e.g., model a native range and project to an invaded range) or times (e.g., model current climate conditions and project to future climate conditions). However, if your model performs poorly, it would be questionable to extrapolate that model to other spatial or temporal situations.

Model calibration, which assesses the ability of a model to correctly predict the conditional probability of presence of a species (i.e., agreement between predicted probabilities of occurrence and the observed proportion of locations occupied by the species), is generally not calculated for models. Discrimination is only useful to evaluate how well a model performs for a given species in a specific geographic location and point in time, while model calibration is more broadly applicable (through space and time; Jiménez-Valverde et al., 2013). Calibration is important if model results are intended to be used as an absolute measure of probability of occurrence; if the relative suitability of locations is the goal, then discrimination measures may be sufficient.
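A basic calibration check groups locations into bins of predicted probability and compares the mean prediction in each bin with the observed proportion of occupied locations; for a well-calibrated model the two are roughly equal across bins. The sketch below assumes predictions in [0, 1] and observed presence (1) or absence (0) at the same locations; the bin count and example data are hypothetical, and it shows only the underlying comparison rather than any formal calibration statistic.

```python
def calibration_table(predicted, observed, n_bins=10):
    """Rows of (bin, n, mean predicted probability, observed proportion occupied)."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(predicted, observed):
        idx = min(int(p * n_bins), n_bins - 1)  # put p == 1.0 in the top bin
        bins[idx].append((p, o))
    rows = []
    for i, members in enumerate(bins):
        if members:  # skip empty bins
            mean_pred = sum(p for p, _ in members) / len(members)
            observed_frac = sum(o for _, o in members) / len(members)
            rows.append((i, len(members), mean_pred, observed_frac))
    return rows


# Hypothetical predictions and field observations (1 = occupied, 0 = unoccupied).
predicted = [0.05, 0.15, 0.10, 0.95, 0.85, 0.90]
observed = [0, 0, 0, 1, 1, 0]
for b, n, mean_pred, frac in calibration_table(predicted, observed, n_bins=2):
    print(b, n, round(mean_pred, 2), round(frac, 2))
```

Here the upper bin predicts 0.90 on average while only two of its three locations are occupied, a sign of over-prediction if the values were read as absolute probabilities.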

5. Primary caveat #4: the results of species distribution models should be treated like a hypothesis to be tested and validated with additional sampling and modeling in an iterative process

5.1. Model interpretation

Given all that is reviewed above, it seems clear that there is an art and a science to model interpretation. Interpretation of results, whether the model aimed to explain patterns of distribution or to predict actual or potential geographic distributions of the target species, depends heavily on the type of modeling algorithm used (Jimenez-Valverde et al., 2008). In general, the simpler methods (e.g., BIOCLIM and environmental envelopes) may better estimate the potential distribution of the species (where the species could survive), whereas more complex techniques that use presence–absence data (in this case real or true absences) may better predict the actual realized geographic distribution of the species (where a species actually survives; e.g., Neural Networks; Jimenez-Valverde et al., 2008). Complex presence-only methods like Maxent (Phillips et al., 2006) and GARP (Stockwell and Noble, 1992) will predict all or parts of the realized and potential geographic distributions of a species (Jimenez-Valverde et al., 2008).
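The simplest envelope methods mentioned above can be illustrated with a rectilinear box: a site is classed as within the potential distribution if every predictor falls between bounds taken from the presence locations. The sketch below is a simplified, BIOCLIM-style box, not the full BIOCLIM algorithm. Using the full range (0th to 100th percentile) by default and the example variables are assumptions; narrower percentiles (e.g., 5th to 95th) are a common way to reduce the influence of outlying records.

```python
def percentile(sorted_values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def fit_envelope(presence_env, lower_q=0.0, upper_q=100.0):
    """Per-variable (lower, upper) bounds from environments at presence sites.

    presence_env: list of dicts mapping variable name -> value.
    """
    envelope = {}
    for var in presence_env[0]:
        values = sorted(row[var] for row in presence_env)
        envelope[var] = (percentile(values, lower_q), percentile(values, upper_q))
    return envelope


def inside(envelope, site):
    """True if every variable at the site lies within the envelope bounds."""
    return all(lo <= site[var] <= hi for var, (lo, hi) in envelope.items())


# Hypothetical environmental values at three presence locations.
presences = [
    {"temp": 10.0, "precip": 400.0},
    {"temp": 14.0, "precip": 600.0},
    {"temp": 12.0, "precip": 500.0},
]
env = fit_envelope(presences)
print(env)  # {'temp': (10.0, 14.0), 'precip': (400.0, 600.0)}
print(inside(env, {"temp": 11.0, "precip": 450.0}))  # True
print(inside(env, {"temp": 16.0, "precip": 450.0}))  # False
```

Because the box ignores absences entirely, it describes where conditions resemble those at known presences, which is why such methods tend toward the potential rather than the realized distribution.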

It seems prudent for users to remain open to alternative explanations of the same results. Given the caveats described here, models should be considered to create (rather than test) hypotheses regarding species' realized and potential distributions and the factors that may be driving them. When more than one explanation of model results seems plausible, this creates an opportunity for the modelers to consult iteratively with taxon experts in the process of exploring the existing data in more detail, collecting more data, or otherwise obtaining more information. It is important to keep in mind that the models represent (more or less complex) statistical relationships between predictors and field observations. Simply put, these models represent correlation more than causation. The key point is that, when interpreting the models, results should not be viewed as a definitive explanation of the way things are, but rather as hypotheses about species distributions and their relationship with environmental conditions that can be further tested. Therein lies the “art” of modeling.

Accurately modeling species distributions in space is difficult, and ecological forecasting to new geographic areas or in time is even more challenging. The caveats above are compounded when extrapolating to new areas or times. For example, if the predictors used are not what is actually controlling the species distribution, you end up with a spurious prediction. Sax et al. (2013) illustrate the difference between the realized niche for a species and the ‘tolerance’ niche (the climatic space under which the species can survive) by comparing the climate space encompassed by the species' native range to the expanded space when naturalized or non-native populations are included. In their example, the native range obviously is not limited by climatic constraints but is controlled by other factors that could be explored (e.g., other abiotic conditions such as soil type, or biotic factors such as competition). In this example, a distribution model based on native-range localities and climate would capture the current realized range adequately but would yield poor results for the potential range globally or under future climate conditions.
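Table 2 suggests Multivariate Environmental Similarity Surface (MESS) maps for flagging where a projection extrapolates beyond the conditions used to fit the model. The Python below is a sketch of the commonly used MESS calculation, written from our understanding of the method rather than from any specific implementation: for each variable, f is the percentage of reference (training) values below the new value p; similarity is 2f when 0 < f ≤ 50, 2(100 - f) when 50 ≤ f < 100, and, outside the reference range, the distance beyond the nearest limit expressed as a (negative) percentage of the range. The MESS value is the minimum similarity across variables, so negative values mark novel conditions. The variables and values are hypothetical, and established implementations should be preferred in practice.

```python
def similarity(reference_values, p):
    """MESS similarity of value p to the reference values of one variable.

    Assumes the reference values are not all identical (nonzero range).
    """
    lo, hi = min(reference_values), max(reference_values)
    # f: percentage of reference values smaller than p
    f = 100.0 * sum(v < p for v in reference_values) / len(reference_values)
    if f == 0:  # p at or below the reference minimum (negative if below)
        return 100.0 * (p - lo) / (hi - lo)
    if f <= 50:
        return 2.0 * f
    if f < 100:
        return 2.0 * (100.0 - f)
    return 100.0 * (hi - p) / (hi - lo)  # p above the reference maximum


def mess(reference, site):
    """Return (MESS value, most dissimilar variable) for one site.

    reference: dict variable -> list of values at training locations
    site: dict variable -> value at the location being projected to
    """
    scores = {var: similarity(vals, site[var]) for var, vals in reference.items()}
    worst = min(scores, key=scores.get)
    return scores[worst], worst


# Hypothetical training conditions and two projection sites.
reference = {"temp": [10, 12, 14, 16, 18], "precip": [300, 400, 500, 600, 700]}
print(mess(reference, {"temp": 14, "precip": 500}))  # (80.0, 'temp')
print(mess(reference, {"temp": 22, "precip": 500}))  # (-50.0, 'temp')
```

Reporting the most dissimilar variable alongside the score helps identify which predictor drives the extrapolation at each site.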

These challenges have not stopped investigators from integrating future climate change scenarios into correlative species distribution models (Franklin et al., 2013; Jarnevich and Stohlgren, 2009). It is not possible to provide full disclosure of the caveats related to climate modeling and projections in association with species distribution models here. Fortunately, the primary caveats for the uncertainties associated with climate modeling have been covered by Tebaldi et al. (2005) and Pielke and Wilby (2012). Often-cited caveats include non-analogous climate, a sparse and short climate record, novel processes and interactions as species shift distributions independently of one another, and extremely limited means of evaluating uncertainty for projections in space and time, which can lead to questionable species distribution predictions (Franklin et al., 2013). While such models begin to elucidate a few fundamental species–environment relationships, reliable projections of range changes will likely only result from monitoring species abundance, persistence, dispersal, and migration over time. Other variables, including soils, genetics, species interactions, and changes in disturbance regimes, land use, and trade and transportation (which continually bring in new invaders), are all likely to be important and are unaccounted for in most extant distribution models (Stohlgren et al., 2013). Still, when combined with a careful long-term monitoring program, preliminary species distribution models are essential for identifying potential leading and trailing edges of distributions (invasion or change due to changing climate or land use) and the dominant environmental factors associated with species presence, abundance, and persistence. If the diagnostics indicate a strong model fit and the assumptions of the modeling technique are met by the given application, projecting future distributions is reasonable to consider, but this should be done with the understanding that the uncertainty inherent in these types of models is compounded by the unknowns of projecting the predictor layers and the predictor–species relationships into the future (Wilby and Dessai, 2010).

In short, we treat all species distribution model outputs as hypotheses to be tested in an iterative process of gathering new data, validating, and improving models over time (Crall et al., 2013; Stohlgren and Schnase, 2006). Thus, we recognize that our “predictions” should be viewed as “possible outcomes”, which are impossible to fully assess in terms of accuracy, precision, and uncertainty, except by waiting to see if the predictions hold true with further data collection and assessment.

5.2. Creating “yes/no” maps with thresholds

Models created using presence-only or presence–absence data generally produce results with predicted probability values ranging between 0 and 1. For many model applications, diagnostics, and further analyses, the model predictions are converted into binary maps representing suitable and unsuitable locations for the species of interest. Selection of a threshold, above which locations are classified as suitable and below which they are classified as unsuitable, is important, as different thresholds can produce drastically different estimates of suitable range (e.g., Fig. 2h).

Thresholds are generally based on criteria related to weighting false presences or false absences. Weighting one error over another may be more desirable for certain applications, such as minimizing errors of commission to assist in reserve design, to ensure that the species of interest has suitable environmental conditions within the reserve. Jimenez-Valverde and Lobo (2007) found that the sensitivity–specificity difference minimizer and sensitivity–specificity sum maximizer threshold criteria performed best across a wide range of species, prevalences, and sample sizes. Liu et al. (2005) also found these two to be good, along with three others of the twelve they tested. These two methods were also highly correlated with each other and with prevalence (Jimenez-Valverde and Lobo, 2006). A recent examination of thresholds for presence-only models found that some thresholding techniques did not appear to be sensitive to the lack of absence data (Liu et al., 2013). Ultimately, the application of model output should guide the selection of a decision threshold. For example, if the model is to be used to send crews to control an invasive species, it is more important that the species actually be present (higher threshold), whereas if the purpose is early detection, it is more important not to miss any locations (lower threshold).

Table 2
List of guidelines we follow when developing a species distribution model.

Issue: Resolution

Sample size: Minimum of 30 points, but still warn about instability with small sample sizes; smaller samples only used to guide further sampling; always remove duplicates found in the same pixel; fit simpler models with fewer predictors (e.g., roughly 10 locations for each predictor).

Background/pseudo-absence and absence data: If absences are available, use them (except potentially for species with very unstable distributions, such as a new invader); draw points from areas accessible to the species; mimic sampling bias in presence data using a bias surface or target-group sampling for background/pseudo-absence.

Spatial extent: Ensure that the model is trained on the area sampled; evaluate bias in samples; limit background or absence data to the area accessible to the species.

Variable selection: Predictors should match direct effects on the species; collinearity should be handled (e.g., remove one from each pair of variables with correlations |r| > 0.7).

Thresholds: Continuous surfaces are best, but if needed, select a threshold with care; keep objectives in mind; generally, thresholds related to sensitivity and specificity perform well.

Evaluation statistics: Examine discrimination using multiple statistics such as AUC, TSS, sensitivity, specificity, kappa, and correct classification rate; question models with low performance (e.g., AUC < 0.7); if occupancy is important, examine calibration; independent data are best, but if unavailable, use a cross-validation approach with a large sample size or a jackknife approach with a small sample size.

Model transferal in space or time: Ensure that the model performs well on the training area before bothering with transferal; examine the stability of the correlation structure among predictors (models assume a similar structure); highlight areas with novel conditions as having high uncertainty; use Multivariate Environmental Similarity Surface (MESS) maps or some other mechanism to explore interpolation versus extrapolation.

Response curves: Examine the shapes of species response curves: do they make biological sense, and are they overly complex? Very complex curves can often result in nonsensical projections and indicate overfitting; always apply an eco-plausibility filter (e.g., do response curves and distribution maps make biological sense?) rather than relying solely on evaluation metrics.
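The two threshold criteria singled out above, the sensitivity–specificity sum maximizer and the sensitivity–specificity difference minimizer, can be sketched by evaluating every observed prediction value as a candidate threshold. The Python below uses hypothetical presence and absence scores and classifies a location as suitable when its score is at or above the threshold. The candidate set and the tie-breaking rule (the lowest qualifying threshold wins) are our assumptions, and software packages may implement these criteria differently.

```python
def sens_spec(threshold, presence_scores, absence_scores):
    """Sensitivity and specificity when scores >= threshold are called suitable."""
    sens = sum(s >= threshold for s in presence_scores) / len(presence_scores)
    spec = sum(s < threshold for s in absence_scores) / len(absence_scores)
    return sens, spec


def choose_thresholds(presence_scores, absence_scores):
    """Return (max sensitivity+specificity threshold, min |sens-spec| threshold)."""
    candidates = sorted(set(presence_scores) | set(absence_scores))

    def stats(t):
        return sens_spec(t, presence_scores, absence_scores)

    # max() and min() return the first (lowest) candidate among ties.
    max_sum = max(candidates, key=lambda t: sum(stats(t)))
    min_diff = min(candidates, key=lambda t: abs(stats(t)[0] - stats(t)[1]))
    return max_sum, min_diff


def to_binary(scores, threshold):
    """Convert continuous suitability values to a yes/no (1/0) map."""
    return [int(s >= threshold) for s in scores]


# Hypothetical predictions at surveyed presence and absence locations.
presence = [0.9, 0.8, 0.7, 0.4]
absence = [0.6, 0.3, 0.2, 0.1]
t_sum, t_diff = choose_thresholds(presence, absence)
print(t_sum, t_diff)  # 0.4 0.6
cells = [0.95, 0.65, 0.5, 0.45, 0.05]  # hypothetical map cells
print(to_binary(cells, t_sum), to_binary(cells, t_diff))
```

In this toy example the lower threshold classes four of the five cells as suitable and the higher one only two, echoing the point that threshold choice can change the estimated suitable range markedly.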

6. Conclusions

George E.P. Box said, “Essentially, all models are wrong, but some are useful” (Box and Draper, 1987). We highlighted several caveats related to species distribution modeling, but do not mean to imply that these models are not useful. Rather, we hope to highlight the importance of considering the limitations and sources of uncertainty in models. We strongly recommend including a caveat section in papers based on these models. Pay attention to most or all of the above considerations for better modeling and mapping of species' geographic distributions. Develop a set of rules or guidelines (e.g., Table 2) to consult and follow when you develop a model. Identifying and keeping the purpose of the model in mind may help guide the decisions made in model development. If the principles outlined here are considered and followed, species distribution models can produce results useful to resource managers.

Four take-home messages bear repeating: (1) all sampling data are incomplete and potentially biased; (2) it is important to understand the assumptions and limitations of the model being used; (3) no single model works best for all species in all areas at all spatial scales; and (4) the results of species distribution models should be treated like a hypothesis to be tested and validated with additional sampling and modeling in an iterative process.

Acknowledgments

We would like to thank the U.S. Geological Survey Invasive Species Program for financial support for this work. We thank Colorado State University and the U.S. Geological Survey Fort Collins Science Center for logistical support. We thank Tim Kern and Alan Swanson for providing comments on an early version of our manuscript. This research was partially supported by USDA CSREES/NRI 2008-35615-04666. Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

References

Allouche, O., Tsoar, A., Kadmon, R., 2006. Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). J. Appl. Ecol. 43, 1223–1232.

Araujo, M.B., Guisan, A., 2006. Five (or so) challenges for species distribution modelling. J. Biogeogr. 33, 1677–1688.

Araujo, M.B., New, M., 2007. Ensemble forecasting of species distributions. Trends Ecol. Evol. 22, 42–47.

Araújo, M.B., Peterson, A.T., 2012. Uses and misuses of bioclimatic envelope modeling. Ecology 93, 1527–1539.

Austin, M.P., 2002. Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–118.

Austin, M., 2007. Species distribution models and ecological theory: a critical assessment and some possible new approaches. Ecol. Model. 200, 1–19.

Barbet-Massin, M., Jiguet, F., Albert, C.H., Thuiller, W., 2012. Selecting pseudo-absences for species distribution models: how, where and how many? Methods Ecol. Evol. 3, 327–338.

Barve, N., Barve, V., Jiménez-Valverde, A., Lira-Noriega, A., Maher, S.P., Peterson, A.T., Soberón, J., Villalobos, F., 2011. The crucial role of the accessible area in ecological niche modeling and species distribution modeling. Ecol. Model. 222, 1810–1819.

Boria, R.A., Olson, L.E., Goodman, S.M., Anderson, R.P., 2014. Spatial filtering to reduce sampling bias can improve the performance of ecological niche models. Ecol. Model. 275, 73–77.

Boulinier, T., Nichols, J.D., Sauer, J.R., Hines, J.E., Pollock, K.H., 1998. Estimating species richness: the importance of heterogeneity in species detectability. Ecology 79, 1018–1028.

Box, G.E.P., Draper, N.R., 1987. Empirical Model Building and Response Surfaces. John Wiley & Sons, New York, NY.

Braunisch, V., Coppes, J., Arlettaz, R., Suchant, R., Schmid, H., Bollmann, K., 2013. Selecting from correlated climate variables: a major source of uncertainty for predicting species distributions under climate change. Ecography 36, 971–983.


Broennimann, O., Treier, U.A., Muller-Scharer, H., Thuiller, W., Peterson, A.T., Guisan, A., 2007. Evidence of climatic niche shift during biological invasion. Ecol. Lett. 10, 701–709.

Broennimann, O., Fitzpatrick, M.C., Pearman, P.B., Petitpierre, B., Pellissier, L., Yoccoz, N.G., Thuiller, W., Fortin, M.-J., Randin, C., Zimmermann, N.E., Graham, C.H., Guisan, A., 2012. Measuring ecological niche overlap from occurrence and spatial environmental data. Glob. Ecol. Biogeogr. 21, 481–497.

Brotons, L., Thuiller, W., Araujo, M.B., Hirzel, A.H., 2004. Presence–absence versus presence-only modelling methods for predicting bird habitat suitability. Ecography 27, 437–448.

Crall, A.W., Jarnevich, C.S., Panke, B., Young, N., Renz, M., Morisette, J., 2013. Using habitat suitability models to target invasive plant species surveys. Ecol. Appl. 23, 60–72.

de Siqueira, M.F., Durigan, G., Junior, P.M., Peterson, A.T., 2009. Something from nothing: using landscape similarity and ecological niche modeling to find rare plant species. J. Nat. Conserv. 17, 25–32.

Diniz, J.A.F., Bini, L.M., Rangel, T.F., Loyola, R.D., Hof, C., Nogues-Bravo, D., Araujo, M.B., 2009. Partitioning and mapping uncertainties in ensembles of forecasts of species turnover under climate change. Ecography 32, 897–906.

Dormann, C.F., McPherson, J.M., Araujo, M.B., Bivand, R., Bolliger, J., Carl, G., Davies, R.G., Hirzel, A., Jetz, W., Kissling, W.D., Kuhn, I., Ohlemuller, R., Peres-Neto, P.R., Reineking, B., Schroder, B., Schurr, F.M., Wilson, R., 2007. Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography 30, 609–628.

Dormann, C.F., Purschke, O., Marquez, J.R.G., Lautenbach, S., Schroder, B., 2008. Components of uncertainty in species distribution analysis: a case study of the great grey shrike. Ecology 89, 3371–3386.

Dormann, C.F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J.R.G., Gruber, B., Lafourcade, B., Leitão, P.J., Münkemüller, T., McClean, C., Osborne, P.E., Reineking, B., Schröder, B., Skidmore, A.K., Zurell, D., Lautenbach, S., 2013. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography 36, 27–46.

Duncan, R.P., Cassey, P., Blackburn, T.M., 2009. Do climate envelope models transfer? A manipulative test using dung beetle introductions. Proc. R. Soc. B Biol. Sci. 276, 1449–1457.

Elith, J., Graham, C.H., Anderson, R.P., Dudik, M., Ferrier, S., Guisan, A., Hijmans, R.J., Huettmann, F., Leathwick, J.R., Lehmann, A., Li, J., Lohmann, L.G., Loiselle, B.A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J.M., Peterson, A.T., Phillips, S.J., Richardson, K., Scachetti-Pereira, R., Schapire, R.E., Soberon, J., Williams, S., Wisz, M.S., Zimmermann, N.E., 2006. Novel methods improve prediction of species' distributions from occurrence data. Ecography 29, 129–151.

Evangelista, P., Kumar, S., Stohlgren, T.J., Jarnevich, C.S., Crall, A.W., Norman III, J.B., Barnett, D., 2008. Modelling invasion for a habitat generalist and a specialist plant species. Divers. Distrib. 14, 808–817.

Fei, S.L., Liang, L., Paillet, F.L., Steiner, K.C., Fang, J.Y., Shen, Z.H., Wang, Z.H., Hebard, F.V., 2012. Modelling chestnut biogeography for American chestnut restoration. Divers. Distrib. 18, 754–768.

Ficetola, G.F., Thuiller, W., Miaud, C., 2007. Prediction and validation of the potential global distribution of a problematic alien invasive species – the American bullfrog. Divers. Distrib. 13, 476–485.

Fielding, A.H., Bell, J.F., 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Franklin, J., Wejnert, K.E., Hathaway, S.A., Rochester, C.J., Fisher, R.N., 2009. Effect of species rarity on the accuracy of species distribution models for reptiles and amphibians in southern California. Divers. Distrib. 15, 167–177.

Franklin, J., Davis, F.W., Ikegami, M., Syphard, A.D., Flint, L.E., Flint, A.L., Hannah, L., 2013. Modeling plant species distributions under future climates: how fine scale do climate projections need to be? Glob. Chang. Biol. 19, 473–483.

Fuller, T., Morton, D.P., Sarkar, S., 2008. Incorporating uncertainty about species' potential distributions under climate change into the selection of conservation areas with a case study from the Arctic Coastal Plain of Alaska. Biol. Conserv. 141, 1547–1559.

Graham, C.H., Elith, J., Hijmans, R.J., Guisan, A., Peterson, A.T., Loiselle, B.A., 2008. The influence of spatial errors in species occurrence data used in distribution models. J. Appl. Ecol. 45, 239–247.

Guisan, A., Thuiller, W., 2005. Predicting species distribution: offering more than simplehabitat models. Ecol. Lett. 8, 993–1009.

Guisan, A., Zimmermann, N.E., 2000. Predictive habitat distribution models in ecology.Ecol. Model. 135, 147–186.

Guisan, A., Broennimann, O., Engler, R., Vust, M., Yoccoz, N.G., Lehmann, A., Zimmermann,N.E., 2006. Using niche-based models to improve the sampling of rare species.Conserv. Biol. 20, 501–511.

Guisan, A., Graham, C.H., Elith, J., Huettmann, F., 2007a. Sensitivity of predictive speciesdistribution models to change in grain size. Divers. Distrib. 13, 332–340.

Guisan, A., Zimmermann, N.E., Elith, J., Graham, C.H., Phillips, S., Peterson, A.T., 2007b.What matters for predicting the occurrences of trees: techniques, data, or species'characteristics? Ecol. Monogr. 77, 615–630.

Guo, Q., Liu, Y., 2010. ModEco: an integrated software package for ecological niche modeling. Ecography 33, 637–642.

Harris, R.M.B., Porfirio, L.L., Hugh, S., Lee, G., Bindoff, N.L., Mackey, B., Beeton, N.J., 2013. To be or not to be? Variable selection can change the projected fate of a threatened species under future climate. Ecol. Manag. Restor. 14, 230–234.

Hawkins, B.A., 2012. Eight (and a half) deadly sins of spatial analysis. J. Biogeogr. 39, 1–9.

Heikkinen, R.K., Luoto, M., Araújo, M.B., Virkkala, R., Thuiller, W., Sykes, M.T., 2006. Methods and uncertainties in bioclimatic envelope modelling under climate change. Prog. Phys. Geogr. 30, 751–777.

Hirzel, A.H., Hausser, J., Chessel, D., Perrin, N., 2002. Ecological-niche factor analysis: how to compute habitat-suitability maps without absence data? Ecology 83, 2027–2036.

Hosmer, D.W., Lemeshow, S., 2000. Applied Logistic Regression. 2nd ed. Wiley, New York.

Jarnevich, C.S., Stohlgren, T.J., 2009. Near term climate projections for invasive species distributions. Biol. Invasions 11, 1373–1379.

Jarnevich, C.S., Esaias, W.E., Ma, P.L.A., Morisette, J.T., Nickeson, J.E., Stohlgren, T.J., Holcombe, T.R., Nightingale, J.M., Wolfe, R.E., Tan, B., 2014. Regional distribution models with lack of proximate predictors: Africanized honeybees expanding north. Divers. Distrib. 20, 193–201.

Jiménez-Valverde, A., 2012. Insights into the area under the receiver operating characteristic curve (AUC) as a discrimination measure in species distribution modelling. Glob. Ecol. Biogeogr. 21, 498–507.

Jiménez-Valverde, A., Lobo, J.M., 2006. The ghost of unbalanced species distribution data in geographical model predictions. Divers. Distrib. 12, 521–524.

Jiménez-Valverde, A., Lobo, J.M., 2007. Threshold criteria for conversion of probability of species presence to either–or presence–absence. Acta Oecol. Int. J. Ecol. 31, 361–369.

Jiménez-Valverde, A., Lobo, J.M., Hortal, J., 2008. Not as good as they seem: the importance of concepts in species distribution modelling. Divers. Distrib. 14, 885–890.

Jiménez-Valverde, A., Acevedo, P., Barbosa, A.M., Lobo, J.M., Real, R., 2013. Discrimination capacity in species distribution models depends on the representativeness of the environmental domain. Glob. Ecol. Biogeogr. 22, 508–516.

Johnson, C.J., Gillingham, M.P., 2005. An evaluation of mapped species distribution models used for conservation planning. Environ. Conserv. 32, 117–128.

Joppa, L.N., McInerny, G., Harper, R., Salido, L., Takeda, K., O'Hara, K., Gavaghan, D., Emmott, S., 2013. Troubling trends in scientific software use. Science 340, 814–815.

Kearney, M.R., Wintle, B.A., Porter, W.P., 2010. Correlative and mechanistic models of species distribution provide congruent forecasts under climate change. Conserv. Lett. 3, 203–213.

Kremen, C., Cameron, A., Moilanen, A., Phillips, S.J., Thomas, C.D., Beentje, H., Dransfield, J., Fisher, B.L., Glaw, F., Good, T.C., Harper, G.J., Hijmans, R.J., Lees, D.C., Louis, E., Nussbaum, R.A., Raxworthy, C.J., Razafimpahanana, A., Schatz, G.E., Vences, M., Vieites, D.R., Wright, P.C., Zjhra, M.L., 2008. Aligning conservation priorities across taxa in Madagascar with high-resolution planning tools. Science 320, 222–226.

Kumar, S., Neven, L.G., Yee, W.L., 2014. Assessing the potential for establishment of western cherry fruit fly using ecological niche modeling. J. Econ. Entomol. 107, 1032–1044.

Le Rest, K., Pinaud, D., Monestiez, P., Chadoeuf, J., Bretagnolle, V., 2014. Spatial leave-one-out cross-validation for variable selection in the presence of spatial autocorrelation. Glob. Ecol. Biogeogr. 23, 811–820.

Legendre, P., Legendre, L., 1998. Numerical Ecology. 2nd English ed. Elsevier, Amsterdam; New York.

Lichstein, J.W., Simons, T.R., Shriner, S.A., Franzreb, K.E., 2002. Spatial autocorrelation and autoregressive models in ecology. Ecol. Monogr. 72, 445–463.

Liu, C.R., Berry, P.M., Dawson, T.P., Pearson, R.G., 2005. Selecting thresholds of occurrence in the prediction of species distributions. Ecography 28, 385–393.

Liu, C., White, M., Newell, G., 2013. Selecting thresholds for the prediction of species occurrence with presence-only data. J. Biogeogr. 40, 778–789.

Lobo, J.M., Jiménez-Valverde, A., Real, R., 2008. AUC: a misleading measure of the performance of predictive distribution models. Glob. Ecol. Biogeogr. 17, 145–151.

Lozier, J.D., Aniello, P., Hickerson, M.J., 2009. Predicting the distribution of Sasquatch in western North America: anything goes with ecological niche modelling. J. Biogeogr. 36, 1623–1627.

MacKenzie, D.I., 2005. What are the issues with presence–absence data for wildlife managers? J. Wildl. Manag. 69, 849–860.

Marmion, M., Parviainen, M., Luoto, M., Heikkinen, R.K., Thuiller, W., 2009. Evaluation of consensus methods in predictive species distribution modelling. Divers. Distrib. 15, 59–69.

Mauricio Bini, L., Diniz-Filho, J.A.F., Rangel, T.F.L.V.B., Akre, T.S.B., Albaladejo, R.G., Albuquerque, F.S., Aparicio, A., Araújo, M.B., Baselga, A., Beck, J., Isabel Bellocq, M., Böhning-Gaese, K., Borges, P.A.V., Castro-Parga, I., Khen Chey, V., Chown, S.L., De Marco, J.P., Dobkin, D.S., Ferrer-Castán, D., Field, R., Filloy, J., Fleishman, E., Gómez, J.F., Hortal, J., Iverson, J.B., Kerr, J.T., Daniel Kissling, W., Kitching, I.J., León-Cortés, J.L., Lobo, J.M., Montoya, D., Morales-Castilla, I., Moreno, J.C., Oberdorff, T., Olalla-Tárraga, M.Á., Pausas, J.G., Qian, H., Rahbek, C., Rodríguez, M.Á., Rueda, M., Ruggiero, A., Sackmann, P., Sanders, N.J., Carina Terribile, L., Vetaas, O.R., Hawkins, B.A., 2009. Coefficient shifts in geographical ecology: an empirical evaluation of spatial and non-spatial regression. Ecography 32, 193–204.

McPherson, J.M., Jetz, W., 2007. Effects of species' ecology on the accuracy of distribution models. Ecography 30, 135–151.

McPherson, J.M., Jetz, W., Rogers, D.J., 2004. The effects of species' range sizes on the accuracy of distribution models: ecological phenomenon or statistical artefact? J. Appl. Ecol. 41, 811–823.

Miller, A.J., Knouft, J.H., 2006. GIS-based characterization of the geographic distributions of wild and cultivated populations of the Mesoamerican fruit tree Spondias purpurea (Anacardiaceae). Am. J. Bot. 93, 1757–1767.

Morisette, J.T., Jarnevich, C.S., Holcombe, T.R., Talbert, C.B., Ignizio, D., Talbert, M.K., Silva, C., Koop, D., Swanson, A., Young, N.E., 2013. VisTrails SAHM: visualization and workflow management for species habitat modeling. Ecography 36, 129–135.

Moritz, C., Hoskin, C.J., MacKenzie, J.B., Phillips, B.L., Tonione, M., Silva, N., VanDerWal, J., Williams, S.E., Graham, C.H., 2009. Identification and dynamics of a cryptic suture zone in tropical rainforest. Proc. R. Soc. B Biol. Sci. 276, 1235–1244.

Northrup, J.M., Hooten, M.B., Anderson, C.R., Wittemyer, G., 2013. Practical guidance on characterizing availability in resource selection functions under a use–availability design. Ecology 94, 1456–1463.

Ortega-Huerta, M.A., Peterson, A.T., 2008. Modeling ecological niches and predicting geographic distributions: a test of six presence-only methods. Rev. Mex. Biodivers. 79, 205–216.

C.S. Jarnevich et al. / Ecological Informatics 29 (2015) 6–15

Pawar, S., Koo, M.S., Kelley, C., Ahmed, M.F., Chaudhuri, S., Sarkar, S., 2007. Conservation assessment and prioritization of areas in Northeast India: priorities for amphibians and reptiles. Biol. Conserv. 136, 346–361.

Pearce, J., Ferrier, S., 2000. An evaluation of alternative algorithms for fitting species distribution models using logistic regression. Ecol. Model. 128, 127–147.

Pearson, R.G., Raxworthy, C.J., Nakamura, M., Peterson, A.T., 2007. Predicting species distributions from small numbers of occurrence records: a test case using cryptic geckos in Madagascar. J. Biogeogr. 34, 102–117.

Phillips, S.J., Anderson, R.P., Schapire, R.E., 2006. Maximum entropy modeling of species geographic distributions. Ecol. Model. 190, 231–259.

Phillips, S.J., Dudík, M., Elith, J., Graham, C.H., Lehmann, A., Leathwick, J., Ferrier, S., 2009. Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecol. Appl. 19, 181–197.

Pielke, R.A., Wilby, R.L., 2012. Regional climate downscaling: what's the point? Eos Trans. AGU 93, 52–53.

Raxworthy, C.J., Martinez-Meyer, E., Horning, N., Nussbaum, R.A., Schneider, G.E., Ortega-Huerta, M.A., Peterson, A.T., 2003. Predicting distributions of known and unknown reptile species in Madagascar. Nature 426, 837–841.

Rodda, G.H., Jarnevich, C.S., Reed, R.N., 2011. Challenges in identifying sites climatically matched to the native ranges of animal invaders. PLoS ONE 6 (e14670).

Roura-Pascual, N., Brotons, L., Peterson, A.T., Thuiller, W., 2009. Consensual predictions of potential distributional areas for invasive species: a case study of Argentine ants in the Iberian Peninsula. Biol. Invasions 11, 1017–1031.

Sax, D.F., Early, R., Bellemare, J., 2013. Niche syndromes, species extinction risks, and management under climate change. Trends Ecol. Evol. 28, 517–523.

Soberón, J., Peterson, A.T., 2005. Interpretation of models of fundamental ecological niches and species' distributional areas. Biodivers. Inform. 2, 1–10.

Stockwell, D.R.B., Noble, I.R., 1992. Induction of sets of rules from animal distribution data: a robust and informative method of data analysis. Math. Comput. Simul. 33, 385–390.

Stohlgren, T.J., 2007. Measuring Plant Diversity: Lessons From the Field. Oxford University Press, New York.

Stohlgren, T.J., Schnase, J.L., 2006. Risk analysis for biological hazards: what we need to know about invasive species. Risk Anal. 26, 163–173.

Stohlgren, T.J., Ma, P., Kumar, S., Rocca, M., Morisette, J.T., Jarnevich, C.S., Benson, N., 2010. Ensemble habitat mapping of invasive plant species. Risk Anal. 30, 224–235.

Stohlgren, T., Kartesz, J., Nishino, M., Pauchard, A., Winter, M., Pino, J., Richardson, D., Wilson, J., Murray, B., Phillips, M., 2013. Globalization Effects on Common Plant Species.

Swets, J.A., 1988. Measuring the accuracy of diagnostic systems. Science 240, 1285–1293.

Tebaldi, C., Smith, R.L., Nychka, D., Mearns, L.O., 2005. Quantifying uncertainty in projections of regional climate change: a Bayesian approach to the analysis of multimodel ensembles. J. Clim. 18, 1524–1540.

Thuiller, W., 2007. Biodiversity — climate change and the ecologist. Nature 448, 550–552.

Thuiller, W., Lafourcade, B., Engler, R., Araújo, M.B., 2009. BIOMOD — a platform for ensemble forecasting of species distributions. Ecography 32.

Tsoar, A., Allouche, O., Steinitz, O., Rotem, D., Kadmon, R., 2007. A comparative evaluation of presence-only methods for modelling species distribution. Divers. Distrib. 13, 397–405.

VanDerWal, J., Shoo, L.P., Graham, C., Williams, S.E., 2009. Selecting pseudo-absence data for presence-only distribution modeling: how far should you stray from what you know? Ecol. Model. 220, 589–594.

Wilby, R.L., Dessai, S., 2010. Robust adaptation to climate change. Weather 65, 180–185.

Wisz, M.S., Guisan, A., 2009. Do pseudo-absence selection strategies influence species distribution models and their predictions? An information-theoretic approach based on simulated data. BMC Ecol. 9. http://dx.doi.org/10.1186/1472-6785-1189-1188.

Wisz, M.S., Hijmans, R.J., Li, J., Peterson, A.T., Graham, C.H., Guisan, A., NCEAS Predicting Species Distributions Working Group, 2008. Effects of sample size on the performance of species distribution models. Divers. Distrib. 14, 763–773.