Marshalling existing biodiversity data to evaluate biodiversity status and trends in planning...

11
SPECIAL FEATURE From SATOYAMA to managing global biodiversity Alberto Jime´ nez-Valverde Andre´ s Lira-Noriega A. Townsend Peterson Jorge Sobero´n Marshalling existing biodiversity data to evaluate biodiversity status and trends in planning exercises Received: 3 February 2010 / Accepted: 16 July 2010 / Published online: 11 August 2010 ȑ The Ecological Society of Japan 2010 Abstract A thorough understanding of biodiversity sta- tus and trends through time is necessary for decision- making at regional, national, and subnational levels. Information readily available in databases allows for development of scenarios of species distribution in relation to habitat changes. Existing species occurrence data are biased towards some taxonomic groups (espe- cially vertebrates), and are more complete for Europe and North America than for the rest of the world. We outline a procedure for development of such biodiversity scenarios using available data on species distribution derived from primary biodiversity data and habitat conditions, and analytical software, which allows esti- mation of species’ distributions, and forecasting of likely effects of various agents of change on the distribution and status of the same species. Such approaches can translate into improved knowledge for countries regarding the 2010 Biodiversity Target of reducing sig- nificantly the rate of biodiversity loss—indeed, using methodologies such as those illustrated herein, many countries should be capable of analyzing trends of change for at least part of their biodiversity. Sources of errors that are present in primary biodiversity data and that can affect projections are discussed. Keywords Geographic range Niche modelling Habitat suitability Climate change Conservation Introduction Many important components of biological diversity are disappearing at unprecedented rates, with grave conse- quences for ecosystem services that are crucial for human societies (Millennium Assessment 2005). This biodiver- sity crisis has led many countries to decide, at the sixth meeting of the Conference of the Parties (COP) to the Convention on Biological Diversity (CBD) in April 2002, to ‘‘achieve by 2010 a significant reduction of the current rate of biodiversity loss at the global, regional and na- tional level’’ (Decision VI/26, http://www.biodiv.org). This agreement is known as the ‘‘2010 Biodiversity Tar- get’’ (hereafter ‘‘2010 target’’) which, although having attracted a significant amount of attention and activity (Millennium Assessment 2005), is now regarded as unattainable (Mooney and Mace 2009; Secretariat of the Convention on Biological Diversity 2010). A curious feature of the 2010 target and most bio- diversity indicator targets that the CBD COP has adopted is that data required to calculate the indicators are largely unavailable or patchy at best, at national levels (Balmford et al. 2005; Walpolem et al. 2009). For instance, the CBD COP (the framework mentioned in annex II of decision VII/30 of the CBD on future eval- uation and UNEP/CBD/COP/7/INF/33) demands, among 18 other indicators, information on ‘‘trends in extent of selected biomes, ecosystems and habitats’’ and ‘‘trends in abundance and distribution of selected spe- cies.’’ In most developing countries, monitoring such elements is simply not done. Without relevant data, the proposed indicators remain academic exercises not di- rectly useful for policy development and decision-mak- ing (Sobero´n and Peterson 2009). For the developing world, it has been argued that no substitute exists for developing national institutions capable of monitoring biodiversity components at fine scales (Sobero´n and Sarukha´n 2010). The emerging field of biodiversity informatics is creating opportunities for making much better use of already available data, pro- viding capacity for assessing some of these biodiversity indicators. This contribution outlines a procedure for assessing the distribution, abundance, and aspects of conservation status of selected species through time. Major steps forward in enabling access to and analysis of biodiver- sity information have been taken (Edwards 2005; A. Jime´nez-Valverde (&) A. Lira-Noriega A. T. Peterson J. Sobero´n Biodiversity Institute, The University of Kansas, Lawrence, KS 66045, USA E-mail: [email protected]; [email protected] Tel.: +1-785-8642383 Ecol Res (2010) 25: 947–957 DOI 10.1007/s11284-010-0753-8

Transcript of Marshalling existing biodiversity data to evaluate biodiversity status and trends in planning...

SPECIAL FEATURE From SATOYAMA to managing global biodiversity

Alberto Jimenez-Valverde • Andres Lira-Noriega

A. Townsend Peterson • Jorge Soberon

Marshalling existing biodiversity data to evaluate biodiversitystatus and trends in planning exercises

Received: 3 February 2010 / Accepted: 16 July 2010 / Published online: 11 August 2010� The Ecological Society of Japan 2010

Abstract A thorough understanding of biodiversity sta-tus and trends through time is necessary for decision-making at regional, national, and subnational levels.Information readily available in databases allows fordevelopment of scenarios of species distribution inrelation to habitat changes. Existing species occurrencedata are biased towards some taxonomic groups (espe-cially vertebrates), and are more complete for Europeand North America than for the rest of the world. Weoutline a procedure for development of such biodiversityscenarios using available data on species distributionderived from primary biodiversity data and habitatconditions, and analytical software, which allows esti-mation of species’ distributions, and forecasting of likelyeffects of various agents of change on the distributionand status of the same species. Such approaches cantranslate into improved knowledge for countriesregarding the 2010 Biodiversity Target of reducing sig-nificantly the rate of biodiversity loss—indeed, usingmethodologies such as those illustrated herein, manycountries should be capable of analyzing trends ofchange for at least part of their biodiversity. Sources oferrors that are present in primary biodiversity data andthat can affect projections are discussed.

Keywords Geographic range Æ Niche modelling ÆHabitat suitability Æ Climate change Æ Conservation

Introduction

Many important components of biological diversity aredisappearing at unprecedented rates, with grave conse-quences for ecosystem services that are crucial for humansocieties (Millennium Assessment 2005). This biodiver-

sity crisis has led many countries to decide, at the sixthmeeting of the Conference of the Parties (COP) to theConvention on Biological Diversity (CBD) in April 2002,to ‘‘achieve by 2010 a significant reduction of the currentrate of biodiversity loss at the global, regional and na-tional level’’ (Decision VI/26, http://www.biodiv.org).This agreement is known as the ‘‘2010 Biodiversity Tar-get’’ (hereafter ‘‘2010 target’’) which, although havingattracted a significant amount of attention and activity(Millennium Assessment 2005), is now regarded asunattainable (Mooney and Mace 2009; Secretariat of theConvention on Biological Diversity 2010).

A curious feature of the 2010 target and most bio-diversity indicator targets that the CBD COP hasadopted is that data required to calculate the indicatorsare largely unavailable or patchy at best, at nationallevels (Balmford et al. 2005; Walpolem et al. 2009). Forinstance, the CBD COP (the framework mentioned inannex II of decision VII/30 of the CBD on future eval-uation and UNEP/CBD/COP/7/INF/33) demands,among 18 other indicators, information on ‘‘trends inextent of selected biomes, ecosystems and habitats’’ and‘‘trends in abundance and distribution of selected spe-cies.’’ In most developing countries, monitoring suchelements is simply not done. Without relevant data, theproposed indicators remain academic exercises not di-rectly useful for policy development and decision-mak-ing (Soberon and Peterson 2009).

For the developing world, it has been argued that nosubstitute exists for developing national institutionscapable of monitoring biodiversity components at finescales (Soberon and Sarukhan 2010). The emerging fieldof biodiversity informatics is creating opportunities formaking much better use of already available data, pro-viding capacity for assessing some of these biodiversityindicators.

This contribution outlines a procedure for assessingthe distribution, abundance, and aspects of conservationstatus of selected species through time. Major stepsforward in enabling access to and analysis of biodiver-sity information have been taken (Edwards 2005;

A. Jimenez-Valverde (&) Æ A. Lira-Noriega ÆA. T. Peterson Æ J. SoberonBiodiversity Institute, The University of Kansas,Lawrence, KS 66045, USAE-mail: [email protected]; [email protected].: +1-785-8642383

Ecol Res (2010) 25: 947–957DOI 10.1007/s11284-010-0753-8

Soberon and Peterson 2004; Peterson et al. 2010), butthis improved information access has not been trans-lated generally into improved knowledge for countriesregarding the 2010 target or other aspects of biodiver-sity. Rather, the great bulk of 2010 target evaluationspublished to date have been global trends assessmentspublished by researchers based in Europe or NorthAmerica (Butchart et al. 2005; Collen et al. 2008), withlittle regional or national content. We provide a frame-work for one dimension in which existing informationcan be marshalled to this task, and illustrate this po-tential by means of analyses across Europe based onexisting and easily accessible information. The conclu-sion is that considerable knowledge can be harvestedfrom existing and available information.

Biodiversity informatics: current status

Considerable progress has been made in biodiversityinformatics over the past couple of decades. To beginwith, since about 1980, biodiversity data have beencaptured in digital formats with increasing precision andease, to the point that now most vertebrate data (andincreasing proportions of invertebrate and plant data)exist in databases. Once such digital capture steps aretaken—and indeed the digital capture step represents aserious bottleneck in the entire process of enabling bio-diversity data for analysis—many additional advancesbecome possible. We focus our attention on these ‘pri-mary biodiversity data’ (i.e., data records that locate aparticular species in a place at a particular point in time).We do not consider secondary biodiversity data sources(e.g., polygon-based range maps, grid-based rangesummaries) further, because, although they are derivedfrom primary sources, they have limited utility owing toproblems of uncertainty derivation, spatial resolution,and update frequency.

Beginning in the 1990s, access to these data began tobe facilitated and made much more practical. The earlyFishGopher (Wiley and Peterson 2004) and then themuch-improved Species Analyst (Kaiser 1999) providedproofs of concept. The development of the Darwin Core(http://rs.tdwg.org/dwc/) and the Distributed GenericInformation Retrieval protocol (DiGIR, http://www.digir.net/) then opened the floodgates, and subsequentdevelopments (e.g., Access to Biological CollectionsData—ABCD—Schema; Taxonomic Database Work-ing Group Access Protocol for Information Retrieval,TAPIR) have improved the situation still more. Now,just 10 years later, >200 M biodiversity records areonline and available to researchers worldwide, particu-larly via large-scale biodiversity information networksproviding direct access to primary, research-grade data,such as the Global Biodiversity Information Network(GBIF; http://www.gbif.org), VertNet (http://vertnet.org/index.php), SpeciesLink (http://splink.cria.org.br),

Red Mundial para Informacion de la Biodiversidad(REMIB; http://www.conabio.gob.mx), and others.

Although large quantities of primary species occur-rence data are now online, much work remains (Yessonet al. 2007). In particular, because the data are veryEurope- and North America-centered, large spatial gapsremain to be filled. The data are also very vertebrate-focused, so many taxa remain underrepresented. Inaddition, the data frequently lack ‘‘value-added’’ fea-tures, such as georeferences—major efforts are under-way to automate and make the georeferencing processfaster and more efficient, interpreting textual localitydescriptors intelligently and converting them into usablelatitude-longitude coordinates with known degrees oferror or uncertainty. Finally, a critical step is that ofquality control, in which records likely holding errorsare flagged, and errors potentially cleaned or corrected.

In a parallel fashion, analytical tools for biodiversityinformation have matured considerably in recentyears. Approaches for characterizing and interpretingavailable biodiversity inventory data are now muchimproved—beginning with early steps (e.g., Soberon andLlorente 1993), these tools have now blossomed intorobust software platforms (e.g., EstimateS, available athttp://viceroy.eeb.uconn.edu/estimates). Similarly, toolsfor estimating ecological niches and potential geo-graphic distributions of species have seen considerableexploration and advance, now including a broad meth-odology for data preparation, niche estimation, modelvalidation, and exploration of the implications of results(Araujo and Guisan 2006; Guisan and Thuiller 2005;Peterson 2006; Phillips et al. 2006). Finally, tools forspatial interpretation of information regarding species’distributions into explicit and objective strategies formanagement and conservation have improved consid-erably, and now are able to optimize multiple prioritiesand constraints simultaneously (Sarkar et al. 2006).

In sum, a broad infrastructure of primary biodiver-sity data and analysis tools is now at hand and verymuch available to the biodiversity community, albeitonly recently. Although researchers have explored theseopportunities eagerly, the policy community has beenrelatively slow in assimilating them. Despite well-foun-ded concerns about gaps and errors in the data (Yessonet al. 2007), many of these complications can be atten-uated via more detailed procedures for data analysis. Allbiodiversity data sets hold errors; the question rather iswhether the existence of those errors can be incorpo-rated into analyses and their effects taken into account.Also evident is the need to work hand-in-hand withpeople of different training and expertise, so that theresulting analyses do not lack the appropriate interpre-tation (e.g. that they are biologically and geographicallymeaningful). The remainder of this contribution is, ineffect, an exploration of these considerations for biodi-versity data across Europe, based on a diverse sample of20 species from across the biota.

948

Methods and approaches in ecological niche modeling

Occurrence data acquisition and quality control

Species occurrence data are available in freely accessiblebiodiversity databases, such as GBIF, VertNet, Specie-sLink, REMIB. However, data quality control is anessential first step when using such information (Chap-man 2005), because errors like mis-georeferencing andsampling bias can be more the rule than the exception.Figure 1 in Hortal et al. (2007) presents an example ofsampling bias. Records falling outside of the region ofanalysis or in incorrect environments (e.g., terrestrialspecies in marine environments) can be flagged asproblematic. In addition, when georeferencing, precisionestimates are available, and records can be filtered toretain only those that are sufficiently precise for theanalysis at hand. Records for which any taxonomicdoubts exist should also be evaluated with some measureof caution.

Sampling bias is well known for its potential tohamper correct parameterization of ecological nichemodels (ENM; Vaughan and Ormerod 2003). As aconsequence, it is important to assess a priori the dis-tribution of sampling events (i.e., sites at which thespecies could have been recorded, had it been present)with respect to the approximate known distribution ofthe species; otherwise, the concentrations of occurrencedata that drive the ecological niche modeling processmay reflect concentrations in the distribution of sam-pling effort, rather than niche dimensions. Data densityin any areas that are overrepresented owing to samplingbias needs to be reduced to match the density across thebroader distributional area. Finally, after data cleaningand quality control, modeling can proceed only if sam-ple sizes are sufficient to permit effective training ofmodels (Jimenez-Valverde et al. 2009a, b; Wisz et al.2008). These initial steps of data compilation and qualitycontrol are without doubt the most time-consuming, butalso the most important; otherwise, the well-known‘‘garbage in, garbage out’’ rule will dominate.

Assembling environmental data

Ideally, ecological niche models are based on relation-ships between occurrence information and geospatialdatasets summarizing environmental variables withrecognized influence on the population biology anddistribution of the species under consideration (Austin2002). However, for most species, such knowledge doesnot exist; rather, distributions of most species are va-guely characterized, and effects of abiotic, biotic, andgeographic factors are only poorly understood (Soberonand Peterson 2005). This information gap at broad ex-tents is more or less to be expected, given the difficultiesof experimentation at geographic scales. Frequently,geospatial summaries of key variables are not available,

so modelers usually must rely on general environmentalvariables that are available, in the hopes that they will berelated at least indirectly to the species’ distribution inecological and geographic dimensions. The WorldClimdatabase (http://www.worldclim.org) sees extensive useat present thanks to its worldwide coverage, standardand easy-to-handle data format, variety of spatial reso-lutions, and availability of future climate scenarios(Hijmans et al. 2005), although other sources may beavailable on regional scales (e.g., the European Prudenceproject, http://prudence.dmi.dk/).

Niche modelers should also note that the spatial andtemporal range and resolution of environmental vari-ables must match that of the occurrence data. That is,the temporal range of the environmental data shouldcoincide with that of the biological data, or occurrencerecords may fall at sites with environmental conditionsnot representative of those under which the speciesactually occurred. Final considerations are dimension-ality and model complexity—to avoid producing modelsthat are overfit, and particularly given high degrees ofintercorrelation of environmental variables, the numberof environmental variables considered should be keptrelatively small. For example, just because 19 ‘‘biocli-matic’’ variables are available in WorldClim does notmean that all 19 should be included in development ofmodels (Peterson and Nakazawa 2008). Ideally thisquestion should be discussed in terms of the relevance ofthe environmental information for the focus species(e.g., Rodder et al. 2009).

Finally, depending on the characteristics of theoccurrence data available, information regarding landuse is an essential element in describing overall presentdistributions. Global-extent land use data sets have beendeveloped (e.g., Global Land Cover Facility, http://glcf.umiacs.umd.edu/index.shtml), but more detailed infor-mation may be available, and indeed necessary, to per-mit assessment, for example, of changes through time.For example, the European CORINE land cover project(http://www.eea.europa.eu/) provides detailed geospatialsummaries of land cover change from 1990 on, openingopportunities to assess impacts of aspects of globalchange on species’ distributions. In a parallel fashion,but on longer time scales, impacts of climate change(global warming) can be evaluated thanks to availabilityof model scenarios characterizing future climate condi-tions (e.g., Hadley Climate Centre’s HadCM3, for var-ious emissions scenarios for 2020 and 2050) and ofmodels of marine intrusion owing to increasing sea levelwith warming climates (Li et al. 2009; Menon et al.2010).

Niche model development

Ecological niche models (ENM) is an approach that at-tempts to estimate environmental requirements of spe-cies by means of associations between observed patternsof occurrence of species and environmental variation

949

across broad landscapes; once the ecological niche isestimated, it can be projected onto real landscapes toidentify areas that are environmentally suitable for thespecies (Peterson 2001). Numerous methods have beendeveloped for estimation of these environmental spacesor associated geographic distributions (Elith et al. 2006);however, most appropriate for ENM are methods thatrequire only input of data documenting the presence ofspecies, owing to uncertain meanings of data purportingto document absences (Jimenez-Valverde et al. 2008a, b).Probably the two most widely used approaches are theGenetic Algorithm for Rule-set Prediction (GARP;Stockwell and Peters 1999; http://www.nhm.ku.edu/desktopgarp/) and Maxent (Phillips et al. 2006; http://www.cs.princeton.edu/�schapire/maxent/). Both areevolutionary–computing approaches to estimating eco-logical niches in complex and highly dimensional envi-ronmental spaces: GARP is a genetic algorithm, whichuses the biological analogy of chromosomal evolution to‘‘evolve’’ solutions; while Maxent fits the probabilitydistribution of maximum entropy (i.e., the most spread-out distribution), constrained by the values of the pixelswhere the species has been found. Both methods havebeen used extensively for niche modeling. When used andevaluated properly, they provide results with similarpredictive power (Peterson et al. 2008). Because theirspecific configurations and applications have been de-scribed in detail elsewhere (Phillips et al. 2006; Phillipsand Dudık 2008; Stockwell and Peters 1999), and be-cause numerous examples are available in a burgeoningliterature, we have not included detailed descriptions ofour specific implementations herein: suffice it to say thatwe emphasized independent testing data for modelrefinement—full details are available upon request fromthe authors.

Processing raw model outputs into realized distributions

When ENM results are projected into geographicdimensions, a surface of continuous values ranking areasby their suitability (or, more formally, by their similarityin some general sense to the places where the species isknown to occur) is produced. However, species rarely ornever occupy all suitable sites owing to effects of bioticinteractions and historical factors such as dispersalbarriers (Pulliam 2000; Soberon 2007; Soberon andPeterson 2005). As a consequence, an important nextstep is to convert the map of overall potential into onerepresenting a best hypothesis of the actual distributionof the species.

First, the continuous maps should be thresholded tofocus on areas of presence versus areas of likely absence(Peterson et al. 2007). The selection of the threshold isnot straightforward, and depends on the relative weightsaccorded to omission (i.e., predicting a known site ofpresence as absent) and commission (i.e., predicting anabsence site as present) errors—that is, the thresholdthat is ideal for a situation will depend on the intended

use of the map (Jimenez-Valverde and Lobo 2007b). Inan ENM context, when no information about genuineabsence of the species (or more formally the absence ofsuitable conditions for the species) is available, omissionerrors are of much more concern than commission er-rors (Lobo et al. 2008; Peterson et al. 2008), so thresh-olds that maximize the sensitivity (i.e., well-predictedpresences) seem most appropriate.

After thresholding, with a map of likely presencesand absences of suitable conditions in hand, an addi-tional set of considerations becomes necessary. That is,because of the effects of dispersal barriers and otherfactors that prevent species from occupying the entirespatial footprint of the environmental conditions suit-able for their populations, it is necessary to state explicitassumptions regarding those barriers (i.e., we must makeassumptions about what areas have been available to thespecies for potential colonization, equivalent to ‘‘M’’ orthe mobility component of species’ distributions in theBAM diagram of Soberon and Peterson 2005). Oncethese assumptions are established, the thresholded suit-ability maps can be trimmed to eliminate suitable areasthat lie outside of the areas that the species has been ableto colonize (Soberon 2010).

Finally, because ENM is often carried out based onclimatic variables, land use and land cover should alsobe considered in developing and refining distributionalestimates for purposes such as natural resources man-agement. Here, again, explicit assumptions are necessaryregarding which land cover types will be suitable for thespecies; because such information is not reflected in cli-mate data, these assumptions cannot generally berecovered from the occurrence data. As a result, infor-mation from the scientific literature must be reviewedand synthesized to identify appropriate land cover types.Based on these assumptions, land cover types not con-sidered appropriate for the species in question can beremoved from the distributional estimate. Because thesetrimming and filtering processes are the most subjectivesteps in the protocol we present here, expert knowledgeis quite valuable in developing genuinely robust esti-mates.

Evaluation of model predictions

The predictive power of any models such as thosedeveloped above must be assessed before the modelpredictions can be considered useful (Peterson 2005;Vaughan and Ormerod 2005). As a consequence, weincorporated two distinct phases of model evaluationprior to any use and interpretation of model predictions.These two phases are (1) testing the accuracy of the rawENM models (i.e., the continuous predictions before thethresholding, trimming, and filtering processes) using thereceiver operating characteristic (ROC) curve, and (2)evaluating the match between final distributional pre-dictions (i.e., after thresholding, trimming, and filtering)using a hierarchical fuzzy pattern-matching approach.

950

The area under the ROC curve (AUC) is a discrimi-nation measure that, despite recent important criticisms(Lobo et al. 2008; Peterson et al. 2008), is used widely inevaluating ENM predictions. The ROC curve is a plot ofsensitivity, i.e., 1-omission error rate when tested withindependent testing data against 1-specificity (i.e., thecommission error rate) across all possible thresholdvalues (Fielding and Bell 1997). When no informationabout absences is available, the proportion of area pre-dicted present is substituted for 1-specificity (Phillipset al. 2006). Because the predictions are continuous, theROC plots are developed over all possible thresholds ofpredictions, producing a curve; the AUC is then com-pared with the area under a curve of null expectations toevaluate whether the prediction is better than random ornot (Swets 1988). This first set of tests will indicatewhether model predictions are better than chanceexpectations; however, it should be emphasized thatvalidation of such potential distributional hypothesis iscomplex (Jimenez-Valverde et al. 2008a, b) and modifi-cations to the ‘‘typical’’ ROC approach become neces-sary (Peterson et al. 2008).

Assessing the accuracy of the final predictions (i.e.,the hypotheses of actual distributions) is not easy,mainly because no information on true absences isavailable (Jimenez-Valverde et al. 2008a, b; Ward et al.2009). As a consequence, we use a hierarchical fuzzypattern-matching approach (Power et al. 2001). If dis-tributional maps from some independent source areavailable (e.g., in the worked example developed below,from expert opinion), then we can compare the predicteddistributions with them via these approaches. However,several sources of uncertainty make ‘‘true’’ accuracystatistics difficult to obtain; we mention three. (1)Available range maps do not (or rarely can) present thetruth at fine resolutions, or the modeling exercise wouldbe entirely unnecessary. (2) The filtering processregarding land use is imprecise, in the sense that thiskind of information usually includes significant classifi-cation errors (Jimenez-Valverde et al. 2008a, b) andresolution effects. (3) Despite efforts to reduce potentialdistributions via assumptions regarding M, predictionswill likely include some uninhabited areas. As a conse-quence, instead of pixel-by-pixel comparisons, we as-sessed similarity of patterns using a fuzzy pattern-matching global similarity measure (Power et al. 2001).This methodology is designed to mimic the human visualassessment process when comparing maps, assessingsimilarity at both local and global resolutions, i.e.,assessing global similarity in the pattern, but recognizing

local discrepancies (see Power et al. 2001 for a detailedexplanation of the method). A fuzzy global matchingvalue of ‡0.70 indicates close similarity in patternsrecovered (Power et al. 2001).

A worked example

Here, we present the results of a series of explorations ofthe geographic distributions and conservation status ofvarious species of the European biota. All of the speciesconsidered are accorded some conservation threat statusby the European Environmental Agency Article 17status assessment (http://biodiversity.eionet.europa.eu/article17). As a consequence, these results are directlyrelevant to biodiversity policy management in theregion. More specifically, we evaluate whether theapproaches explored—all based on biodiversity infor-mation that is directly and easily available online—provide novel information useful to status assessmentand conservation prioritization activities under theArticle 17 effort of the European Environmental Agency(EEA).

Species were selected for inclusion in this examplebased on four criteria: (1) occurrence in EuropeanUnion countries that vary in their technical capacitiesfor biodiversity analysis, (2) degree of biogeographicrestriction and habitat specialization, (3) taxonomiccoverage, and (4) data availability. We chose 20 speciesat random from among the species fitting these criteria(Table 1). Most occurrence records of species wereobtained from the Global Biodiversity InformationFacility (http://www.gbif.org/), supplemented with datafrom the Mammal Networked Information System(http://www.manisnet.org/) and HerpNet (http://www.herpnet.org/) information portals. In one case (the spi-der,Macrothele calpeiana), more precise occurrence datawere available thanks to a prior study (Jimenez-Valverdeand Lobo 2007a). As expected, the occurrence datashowed numerous problems (mainly georeferencing er-rors and spatial biases; see extreme example in Fig. 1).For most species, after data cleaning and filtering,sample sizes of occurrence records were reduced some-what (Fig. 2). Some species, previously selected, weresubstituted with others for which data were moreabundant.

We selected a set of environmental variables fromthe WorldClim climatic data archive (http://www.worldclim.org/) that we deemed most appropriate forthe analysis of the different sets of selected species, based

Table 1 Summary of speciesincluded in analyses Mammals Reptiles and amphibians Plants Invertebrates

Alopex lagopus Bombina bombina Primula scandinavica Euphydryas auriniaSpermophilus citellus Vipera seoanei Apium repens Macrothele calpeianaLutra lutra Bufo calamita Narcissus nevadensis Helicopsis striataNyctalus lasiopterus Emys orbicularis Cypripedium calceolus Maculinea arionGalemys pyrenaicus Triturus cristatus Eryngium viviparum Lycaena dispar

951

on relative independence of variables in detailed corre-lation analyses (Jimenez-Valverde et al. 2009a, b; Pet-erson and Nakazawa 2008). Specifically, we used annualmean temperature, mean diurnal temperature range,maximum temperature of warmest month, minimumtemperature of coldest month, annual precipitation, andprecipitation of wettest and driest months. To provide aview of likely future potential distributions of eachspecies, we projected niche models developed for thepresent onto modeled climatic conditions for 2020 and2050, from the Canadian Center for Climate Modelingand Analysis (CCCMA) A2 projection (drawn from theWorldClim download site, http://www.worldclim.org).To summarize likely effects of sea level rise and conse-quent marine intrusion into areas that are presentlyterrestrial, we used recent global projections at 1 kmresolution (Li et al. 2009). Finally, to incorporatedimensions of land cover and its change through time,we used data sets from 1990 and 2000 from the CO-RINE land cover project (CLC1990 and CLC2000;http://dataservice.eea.europa.eu/dataservice).

GARP (with best subsets option; Anderson et al.2003) and Maxent (default options) were used to gen-erate initial distributional predictions for each speciesbased on ecological niche estimates. Models were cali-brated based on a random 50% of available occurrencedata, and tested using the other 50% using the AUC(Fig. 3). All AUC values were higher than 0.7, the greatmajority being above 0.8, in theory indicating that theinitial potential distributional hypotheses were better aspredictors than random models. Finally, these nichemodels were projected onto the future climate scenarios,and areas projected as falling in zones of likely marineintrusion (6 m sea level rise scenario) were identified andexcluded.

To estimate actual distributions, raw niche modelswere converted into presence–absence maps by means ofthresholds that included 95% of the presence records onwhich the model was based (to accommodate 5% errorrates that may have escaped the data cleaning process)and we took the intersection of the thresholded Max-ent and GARP models for each species to identify a

Fig. 1 Map of known occurrences of Lycaena dispar, based on datafrom the Global Biodiversity Information Facility (GBIF;http://www.gbif.org). The high density of points in Poland, com-pared with lower (or null) densities in surrounding countries illus-

trates the sampling bias. The grid-like arrangement of most pointsoutside Poland reflecting different precision of georeferencing maycompromise the results of any modeling using such data

952

consensus predicted area. The resultant binary mapswere trimmed according to the present biogeographicknowledge of the species, based on likely dispersal bar-riers that limit distributional possibilities of each species.For example, the spider Macrothele calpeiana is endemicto the Iberian Peninsula; however, areas exist across

Europe that are suitable for the species from a climaticpoint of view. We eliminated all suitable areas fallingoutside of the accessible region for the species based onassumptions regarding dispersal (i.e., that species candisperse through continuous suitable areas, but not jumpover areas of unsuitable conditions). Finally, usingknown habitat requirements of each species, all locationsholding unsuitable land cover types were also eliminated;we filtered distributional areas using CLC1990, and theresulting map again filtered using CLC2000 data.

We compared the initially trimmed (i.e., to eliminateareas outside the dispersal ‘‘reach’’ of the species) mapsand the filtered ones (i.e., also excluding land covercategories not suitable for the species) with the distri-butional estimates developed via expert opinion underArticle 17 of the Habitat Initiative, provided by theEEA. Here, we used the fuzzy global pattern matchingapproach. The filtered maps showed close matching ofpatterns to the Article 17 maps of each species’ distri-bution, whereas the unfiltered maps, while still betterthan random (fuzzy global matching >0.0), did notmatch closely (Fig. 4). This reduced matching clearlyresulted from inclusion of many areas not holdingappropriate land cover types in the latter datasets.Overall, though, results indicate that our ENM ap-proach and subsequent filtering and trimming produceduseful estimates of species’ distributional potential,corresponding closely to expert-opinion-based estimatesthat took considerably greater effort and resources.

Finally, we concatenated all of the above steps into asingle, combined, raster data set to summarize the initial(trimmed) distributional estimate, projections onto fu-ture climates for 2020 and 2050, filtering to reflect suit-able land use categories in 1990 and 2000, and marineintrusion effects. In this way, queries could be performed

Maculinea arion

Helicopsis striata

Eryngium viviparum

Euphydryas aurinia

Maculinea arion

Lycaena dispar Maculinea

Primula scandinavica

Apium repens

Narcissus nevadensis Cypripedium calceolus

Triturus cristatus

Vipera seoanei Bufo calamita

Emys orbicularis

Spermophilus citellus Lutra lutra

Nyctalus lasiopterus

Bombina bombina

1 10 100 1000 10000 100000

Alopex lagopus

Galemys pyrenaicus

p p

Number of records (log scale)

Fig. 2 Numbers of recordsavailable from GBIF(http://www.gbif.org), supple-mented with records from theMaNIS (http://manisnet.org/)and HerpNet (http://www.herpnet.org/). Black bars Dataportals for each species, whitebars actual numbers of recordsused in the ecological nichemodels (ENM) procedure afterdata cleaning

GOOD EXCELLENT POOR

Fre

quen

cy

0.0 0.2 0.4 0.6 0.8 1.0

75

63

41

20

AUC value

Fig. 3 Results of the validation exercise presented as a frequencyhistogram of different values of received operating characteristicareas under the curve (AUC), indicating excellent predictive ability(i.e., AUC > 0.8) for 19 of the 20 species analyzed. We usedrandom samples of 50% of available occurrence data to performquantitative validations of model performance, based on relativelyindependent occurrence data. Models were calibrated based on halfof the occurrence data, and tested using the other half

953

easily to assess impacts of different processes on thespecies’ distribution. These summaries also offered aneasy view of the relative contributions of different fac-tors and their combinations to loss of distributional areafor the species included in this study (see example inFig. 5).

Discussion and conclusions

In this study, we outline and provide a worked exampleof practical protocols for obtaining distributional esti-mates for a diverse set of species that can be incorpo-rated into quantitative evaluations of regional statusassessment, natural resource planning, and conservationprioritization exercises. We have used several ap-proaches, but all of the steps are based on primarybiodiversity data, geospatial environmental data, andsoftware tools that are broadly and freely available. Theprocedures described herein can be used by investigatorsof diverse backgrounds with a modicum of basic training(e.g., GIS skills), and with varying levels of resources oftime, data, and funding (evident from our ability toassemble this exercise with little in the way of existingdata in hand).

That our analyses are feasible for implementation bymany or all researchers is clear from the relative easewith which the products we have described wereassembled. In fact, all of the data we used were assem-bled in a few hours of work, given some basic expertisewith GIS tools, and similar data resources exist now formany terrestrial components of biodiversity. The dis-tributional hypothesis we developed were all validatedby means of data-splitting approaches, and 95% (i.e., 19

out of 20) showed what is generally considered ‘excel-lent’ predictivity, although considerable doubt exists asto the full utility of the AUC approach (Lobo et al.2008; Peterson et al. 2008). What is more, our finaldistributional summaries showed close correspondenceto the Article 17 distributional information, so theENM approach and the Article 17 information convergeclosely on the same geographic patterns. The differ-ence, however, is in the time and expense involved in

0.8

000.

70

ng0.

70Good

Match

n0.

6al

mat

chi

0.6

0.5

uzzy

glo

b0.

5

Poor

Match

0.3

0.F0.

2

Trimmed and filtered Trimmed only

4

Fig. 4 Comparison of trimmed niche model-based distributionalestimates and trimmed and filtered estimates with the expert-derived maps from Article 17, across the 20 species analyzed,presented as box plots (horizontal line median; box 25–75%percentiles; whiskers 1.5 interquartile range, dots outliers) of fuzzymatching statistics

100

60

70

80

90

Country limitsCorine 2000Inundation 6 mClimate change 50 yearsRemaining distribution

10

20

30

40

50R

emai

nin

g a

rea

(%)

0

Origina

l dist

ribut

ion

Inun

datio

n 1m

Inun

datio

n 6

m

Land

cove

r 199

0

Land

cove

r 200

0

Climat

e ch

ange

202

0

Climat

e ch

ange

205

0

Loan

d co

ver 2

000

X inun

datio

n 1

m

Land

covo

ver 2

000

X inun

datio

n 6

m

Land

cove

r 200

0 X cl

imat

e ch

ange

202

0

Land

cove

r 200

0 X cl

imat

e ch

ange

205

0

Land

cove

r 200

0 X

clim

ate

chan

g 20

50 X

inun

datio

n 6

m

Fig. 5 Combined summary of effects of land use, climate change,and marine intrusion on the geographic distribution of the butterflyLycaena dispar. Top Map of distribution, showing the area lost bythe different factors (black marine intrusion, dark gray land usechanges, light gray climate change) and the remaining distribution(striped); bottom histogram showing area lost as a function of eachindividual factor and of combinations of factors

954

assembling the information: the ENM work is quiteefficient, and these estimates can be developed at leastinitially without involvement of taxonomic experts, sothe involvement of experts can then be focused onrefinement of initial estimates. In addition, the ENMestimates offer considerably improved quantitative de-tail over what is generally produced by expert opinionefforts (Seoane et al. 2005). The Article 17 summaries, incontrast, required extensive (and expensive) input fromexperts.

These methodologies also permit incorporation oflikely effects of future phenomena affecting species’distributions, such as climate change, marine intrusion,and land use change. Despite some well-founded criti-cisms (Dormann 2007), these approaches presently arethe only option available for such future-scenarioexplorations, and have considerable potential to provideproactive perspective on the species’ distributional po-tential (Pearson and Dawson 2003). Because validationof such predictions is not easy, given that the phenom-enon being forecast takes place in the future, investiga-tors have had to depend on special opportunities formodel validation: e.g., retropredictions to the LastGlacial Maximum for species for which distributionalinformation exists for the Pleistocene (Martınez-Meyeret al. 2004). A few opportunities have permitted vali-dation of these projections over shorter time periods(Araujo et al. 2005; Foden et al. 2007). Hence, althoughour preference would be for more validation, it is clearthat useful information results from these future pro-jections of ENM hypotheses, information that is notavailable with other approaches.

A central challenge in biodiversity and natural re-sources planning is that of assessing where progress isbeing made, or where negatives outweigh positives. Inparticular, regarding biodiversity, the 2010 target pri-oritized reducing rates of biodiversity loss by 2010, butassessing progress toward this goal has not been easy.Although several indicators have been proposed (But-chart et al. 2004, 2005; Loh et al. 2005), they have beenhighly dependent on global status lists or complexindices—as such, they have not been broadly applicable,scalable, or accessible to countries outside of WesternEurope and North America. Our approach offers a moreflexible and broadly applicable alternative (Soberon andPeterson 2009): ENM approaches are integrated withmultitemporal land cover and climate estimates, andrange loss or gain is tracked via integration throughtime. The result is a simple and highly accessible ap-proach that can track single species or customized sets ofspecies, within particular regions or globally, and thus ismuch more adaptable to goals such as the species-basedportions of the 2010 target.

Our positive result in this exploratory suite of anal-yses is qualified only in that the knowledge that ourapproaches yields is dependent on the quality andquantity of information that is available. That is,information products will give better results as the input

information gets denser, richer, and more detailed (Wiszet al. 2008). In this sense, the success of our effortsshould be a call for even-broader participation in effortsaimed at sharing primary, research-grade biodiversitydata, as has been the focus of the Global BiodiversityInformation Facility and other distributed biodiversityinformation networks. More generally, however, theapproaches explored and illustrated in this paper canprovide detailed information on the distributional po-tential of large suites of species in the face of changinglandscapes, at diverse scales of time and space—thisinformation will be quite useful in measuring success ofthe 2010 target and other such agreements aimed atreducing biodiversity loss.

Acknowledgments This work was stimulated by discussions withDr. Rania Spyropoulou of the European Environmental Agency,and was supported by a contract from that organization. A.J.-V.was further supported by a postdoctoral fellowship from theMinisterio de Educacion y Ciencia, Spain (Ref.: EX-2007-0381).A.L.-N. received graduate student support from the ConsejoNacional de Ciencia y Tecnologıa, Mexico (189216). A.T.P. andJ.S. were supported by a grant from Microsoft Research.

References

Anderson RP, Lew D, Peterson A (2003) Evaluating predictivemodels of species’ distributions: criteria for selecting optimalmodels. Ecol Model 162:211–232

Araujo MB, Guisan A (2006) Five (or so) challenges for speciesdistribution modelling. J Biogeogr 33:1677–1688

Araujo MB, Pearson RG, Thuiller W, Erhard M (2005) Validationof species-climate impact models under climate change. GlobChange Biol 11:1504–1513

Austin MP (2002) Spatial prediction of species distribution: aninterface between ecological theory and statistical modelling.Ecol Model 157:101–118

Balmford A, Crane AP, Dobson A, Green R, Mace G (2005) The2010 challenge: data availability, information needs and extra-terrestrial insights. Phil Trans R Soc Lond B 360:221–228

Butchart SHM, Stattersfield AJ, Bennun LA, Shutes SM, Akca-kaya H, Baillie JEM, Stuart SN, Hilton-Taylor C, Mace GM(2004) Measuring global trends in the status of biodiversity:Red List indices for birds. PLoS Biol 2:e383

Butchart SHM, Stattersfield A, Baillie J, Bennun L, Stuart S,Akcakaya H, Hilton-Taylor C, Mace G (2005) Using Red ListIndices to measure progress towards the 2010 target and be-yond. Phil Trans R Soc Lond B 360:255–268

Chapman AD (2005) Principles of data quality. Report for theGlobal Biodiversity Information Facility, Copenhagen

Collen B, Loh J, Whitmee S, McRae L, Amin R, Baillie JM (2008)Monitoring change in vertebrate abundance: the Living PlanetIndex. Conserv Biol 23:317–327

Dormann CF (2007) Promising the future? Global change projec-tions of species distributions. Basic Appl Ecol 8:387–397

Edwards J (2005) Research and societal benefits of the globalbiodiversity information facility. Bioscience 54:485–486

Elith J, Graham CH, Anderson RP, Dudık M, Ferrier S, Guisan A,Hijmans RJ, Huettmann F, Leathwick JR, Lehmann A, Li J,Lohmann LG, Loiselle BA, Manion G, Moritz C, NakamuraM, Nakazawa Y, Overton JM, Peterson AT, Phillips SJ,Richards K, Scachetti-Pereira R, Schapire RE, Soberon J,Williams S, Wisz MS, Zimmermann NE (2006) Novel methodsimprove prediction of species¢ distributions from occurrencedata. Ecography 29:129–151

955

Fielding AH, Bell JF (1997) A review of methods for the assess-ment of prediction errors in conservation presence/absencemodels. Environ Conserv 24:38–49

Foden W, Midgley GF, Hughes G, Bond WJ, Thuiller W, HoffmanMT, Kaleme P, Underhill LG, Rebelo A, Hannah L (2007) Achanging climate is eroding the geographical range of the Na-mib Desert tree Aloe through population declines and dispersallags. Divers Distrib 13:645–653

Guisan A, Thuiller W (2005) Predicting species distribution:offering more than simple habitat models. Ecol Lett 8:993–1009

Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A (2005)Very high resolution interpolated climate surfaces for globalland areas. Int J Climatol 25:1965–1978

Hortal J, Lobo JM, Jimenez-Valverde A (2007) Limitations ofbiodiversity databases: case study on seed-plant diversity inTenerife, Canary Islands. Conserv Biol 21:853–863

Jimenez-Valverde A, Lobo JM (2007a) Potential distribution of theendangered spider Macrothele calpeiana (Walckenaer, 1805)(Araneae, Hexathelidae) and the impact of climate warming.Acta Zool Sin 53:865–876

Jimenez-Valverde A, Lobo JM (2007b) Threshold criteria forconversion of probability of species presence to either-or pres-ence-absence. Acta Oecol 31:361–369

Jimenez-Valverde A, Gomez JF, Lobo JM, Baselga A, Hortal J(2008a) Challenging species distribution models: the case ofMaculinea nausithous in the Iberian Peninsula. Ann Zool Fenn45:200–210

Jimenez-Valverde A, Lobo JM, Hortal J (2008b) Not as good asthey seem: the importance of concepts in species distributionmodelling. Divers Distrib 14:885–890

Jimenez-Valverde A, Lobo JM, Hortal J (2009a) The effect ofprevalence and its interaction with sample size on the reliabilityof species distribution models. Community Ecol 10:196–205

Jimenez-Valverde A, Nakazawa Y, Lira-Noriega A, Peterson AT(2009b) Environmental correlation structure and ecologicalniche model projections. Biodivers Inform 6:28–35

Kaiser J (1999) Searching museums from your desktop. Science284:888

Li X, Rowley RJ, Kostelnick JC, Braaten D, Meisel J (2009) GISanalysis of global inundation impacts from sea level rise. Pho-togramm Eng Rem S 75:807–818

Lobo JM, Jimenez-Valverde A, Real R (2008) AUC: a misleadingmeasure of the performance of predictive distribution models.Glob Ecol Biogeogr 17:145–151

Loh J, Green R, Ricketts T, Lamoreux J, Jenkins M, Kapos V,Randers J (2005) The Living Planet Index: using species pop-ulation time series to track trends in biodiversity. Phil Trans RSoc Lond B 360:289–295

Martınez-Meyer E, Peterson AT, Hargrove WW (2004) Ecologicalniches as stable distributional constraints on mammal species,with implications for Pleistocene extinctions and climate changeprojections for biodiversity. Glob Ecol Biogeogr 13:305–314

Menon S, Soberon J, Li X, Peterson AT (2010) Preliminaryglobal assessment of terrestrial biodiversity consequences ofsea level rise mediated by climate change. Biodivers Conserv19:1599–1609

Millennium Assessment (2005) Ecosystems and human well-being:biodiversity synthesis. World Resources Institute, Washington,DC

Mooney H, Mace G (2009) Biodiversity policy challenges. Science325:1474

Pearson RG, Dawson TP (2003) Predicting the impacts of climatechange on the distribution of species: are bioclimate envelopemodels useful? Glob Ecol Biogeogr 12:361–371

Peterson AT (2001) Predicting species¢ geographic distributionsbased on ecological niche modeling. Condor 103:599–605

Peterson AT (2005) Kansas GAP analysis: the importance of val-idating distributional models before using them. Southwest Nat50:230–236

Peterson AT (2006) Uses and requirements of ecological nichesmodels and related distributional models. Biodivers Inform3:59–72

Peterson AT, Nakazawa Y (2008) Environmental data sets matterin ecological niche modelling: an example with Solenopsis in-victa and Solenopsis richteri. Glob Ecol Biogeogr 17:135–144

Peterson AT, Papes M, Eaton M (2007) Transferability and modelevaluation in ecological niche modeling: a comparison ofGARP and Maxent. Ecography 30:550–560

Peterson AT, Papes M, Soberon J (2008) Rethinking receiveroperating characteristic analysis applications in ecological nichemodelling. Ecol Model 213:63–72

Peterson AT, Knapp S, Guralnick R, Soberon J, Holder MT (2010)The big questions for biodiversity informatics. Syst Biodivers8:159–168

Phillips SJ, Dudık M (2008) Modeling of species distributions withMaxent: new extensions and a comprehensive evaluation.Ecography 31:161–175

Phillips SJ, Anderson RP, Schapire RE (2006) Maximum entropymodeling of species geographic distributions. Ecol Model190:231–259

Power C, Simms A, White R (2001) Hierarchical fuzzy patternmatching for the regional comparison of land use maps. Int JGeogr Inf Sci 15:77–100

Pulliam HR (2000) On the relationship between niche and distri-bution. Ecol Lett 3:349–361

Rodder D, Schmidtlein S, Veith M, Lotters S (2009) Alien invasiveslider turtle in unpredicted habitat: a matter of niche shift or ofpredictors studied? PLoS One 4:e7843

Sarkar S, Pressey RL, Faith DP, Margules CR, Fuller T, StomsDM, Moffett A, Wilson KA, Williams KJ, Williams PH,Andelman S (2006) Biodiversity conservation planning tools:present status and challenges for the future. Annu Rev EnvironResour 31:123–159

Secretariat of the Convention on Biological Diversity (2010) Glo-bal biodiversity outlook 3. Secretariat of the Convention onBiological Diversity, Montreal

Seoane J, Bustamante J, Dıaz-Delgado R (2005) Effect of expertopinion on the predictive ability of environmental models ofbird distribution. Conserv Biol 19:512–522

Soberon J (2007) Grinnellian and Eltonian niches and geographicdistribution of species. Ecol Lett 10:1115–1123

Soberon J (2010) Niche and area of distribution modelling: apopulation ecology perspective. Ecography 33:159–167

Soberon J, Llorente J (1993) The use of species accumulationfunctions for the prediction of species richness. Conserv Biol7:480–488

Soberon J, Peterson AT (2004) Biodiversity informatics: managingand applying primary biodiversity data. Phil Trans R Soc LondB 359:689–698

Soberon J, Peterson AT (2005) Interpretation of models of fun-damental ecological niches and species’ distributional areas.Biodivers Inform 2:1–10

Soberon J, Peterson AT (2009) Monitoring biodiversity loss withprimary species-occurrence data: toward national-level indica-tors for the 2010 target of the convention on biological diver-sity. Ambio 38:29–34

Soberon J, Sarukhan J (2010) Comments on a new mechanism forscience-policy transfer and biodiversity governance. EnvironConserv 36:1–3

Stockwell DRB, Peters DP (1999) The GARP modelling system:problems and solutions to automated spatial prediction. Int JGeogr Inf Sci 13:143–158

Swets JA (1988) Measuring the accuracy of diagnostic systems.Science 240:1285–1293

Vaughan IP, Ormerod SJ (2003) Improving the quality of distri-bution models for conservation by addressing shortcomings inthe field collection of training data. Conserv Biol 17:1601–1611

Vaughan IP, Ormerod SJ (2005) The continuing challenges oftesting species distribution models. J Appl Ecol 42:720–730

Walpolem M, Almond REA, Besancon C, Butchart SHM,Campbell-Lendrum D, Carr GM, Collen B, Collette L,Davidson NC, Dulloo E, Fazel AM, Galloway JN, Gill M,Goverse T, Hockings M, Leaman DJ, Morgan DHW, RevengaC, Rickwood CJ, Schutyser F, Simons S, Stattersfield AJ,

956

Tyrrell TD, Vie J-C, Zimsky M (2009) Tracking progress to-wards the 2010 biodiversity target and beyond. Science 325:1503–1504

Ward G, Hastie T, Barry S, Elith J, Leathwick J (2009) Presence-only data and the EM algorithm. Biometrics 65:554–563

Wiley EO, Peterson AT (2004) Biodiversity and the Internet:building and using the virtual world museum. In: Scharl A (ed)Environmental online communication. Springer, London,pp 91–99

Wisz MS, Hijmans RJ, Li J, Peterson AT, Graham CH, Guisan A,NPSDW Group (2008) Effects of sample size on the perfor-mance of species distribution models. Divers Distrib 14:763–773

Yesson C, Brewer PW, Sutton T, Caithness N, Pahwa JS, BurgessM, Gray WA, White RJ, Jones AC, Bisby FA, Culham A(2007) How global is the global biodiversity information facil-ity? PLoS One 11:e1124

957