On Valuing Information in Adaptive-Management Models


Contributed Paper

ALANA L. MOORE∗†‡ AND MICHAEL A. MCCARTHY∗

∗School of Botany, University of Melbourne, Parkville, Victoria 3010, Australia
†Department of Mathematics and Statistics, University of Melbourne, Parkville, Victoria 3010, Australia

Abstract: Active adaptive management looks at the benefit of using strategies that may be suboptimal in the near term but may provide additional information that will facilitate better management in the future. In many adaptive-management problems that have been studied, the optimal active and passive policies (accounting for learning when designing policies and designing policy on the basis of current best information, respectively) are very similar. This seems paradoxical; when faced with uncertainty about the best course of action, managers should spend very little effort on actively designing programs to learn about the system they are managing. We considered two possible reasons why active and passive adaptive solutions are often similar. First, the benefits of learning are often confined to the particular case study in the modeled scenario, whereas in reality information gained from local studies is often applied more broadly. Second, management objectives that incorporate the variance of an estimate may place greater emphasis on learning than more commonly used objectives that aim to maximize an expected value. We explored these issues in a case study of Merri Creek, Melbourne, Australia, in which the aim was to choose between two options for revegetation. We explicitly incorporated monitoring costs in the model. The value of the terminal rewards and the choice of objective both influenced the difference between active and passive adaptive solutions. Explicitly considering the cost of monitoring provided a different perspective on how the terminal reward and management objective affected learning. The states for which it was optimal to monitor did not always coincide with the states in which active and passive adaptive management differed. Our results emphasize that spending resources on monitoring is only optimal when the expected benefits of the options being considered are similar and when the pay-off for learning about their benefits is large.

Keywords: active adaptive management, Bayesian statistics, decision theory, optimal monitoring, terminal rewards, revegetation

Valoración de la Información en Modelos de Manejo Adaptativo

Resumen: El manejo adaptativo activo considera los beneficios de la utilización de estrategias que pueden ser subóptimas en el corto plazo pero pueden proporcionar información adicional que facilitará un mejor manejo en el futuro. En muchos problemas de manejo adaptativo que han sido estudiados, las políticas óptimas activas y pasivas (considerar el aprendizaje cuando se diseñan políticas y diseño de políticas con base en la mejor información disponible, respectivamente) son muy similares. Esto parece paradójico; cuando hay incertidumbre sobre la mejor acción, los manejadores deben gastar poca energía en el diseño de programas activos para aprender sobre el sistema que están manejando. Consideramos dos posibles razones por las que las soluciones adaptativas activas y pasivas a menudo son similares. Primero, los beneficios del aprendizaje a menudo están confinados al estudio de caso particular en el escenario modelado, mientras que en la realidad la información obtenida de estudios locales a menudo es aplicada más ampliamente. Segundo, los objetivos de manejo que incorporan la varianza de una estimación pueden poner mayor énfasis en el aprendizaje que los objetivos utilizados más comúnmente que intentan maximizar un valor esperado. Exploramos estos temas en un estudio de caso de Merri Creek, Melbourne, Australia, en el que el objetivo fue elegir entre dos opciones para revegetación. Explícitamente incorporamos los costos de monitoreo en el modelo. El valor de las recompensas terminales y la elección del objetivo influyeron en la diferencia entre las soluciones adaptativas activas y pasivas. La consideración explícita del costo de monitoreo proporcionó una perspectiva diferente de cómo afectaron al aprendizaje la recompensa terminal y el objetivo de manejo. Los estados para los que era óptimo monitorear no coincidieron siempre con los estados en los que hubo diferencias en el manejo adaptativo activo y pasivo. Nuestros resultados enfatizan que el gasto de recursos en monitoreo sólo es óptimo cuando los beneficios esperados de las opciones consideradas son similares y cuando la amortización del aprendizaje de sus beneficios es grande.

Palabras Clave: estadística Bayesiana, manejo adaptativo activo, monitoreo óptimo, recompensas terminales, revegetación, teoría de decisiones

‡email [email protected]
Paper submitted March 18, 2009; revised manuscript accepted September 30, 2009.

Conservation Biology, Volume 24, No. 4, 984–993
©2010 Society for Conservation Biology
DOI: 10.1111/j.1523-1739.2009.01443.x

Introduction

Environmental managers must make decisions in the face of uncertainty. To do this, managers may seek to make decisions that are robust to uncertainty (Burgman & Ferson 2005; Moilanen et al. 2006). A complementary option is to invest effort in learning about the environment they are managing. Adaptive management addresses these two issues by making decisions that account for uncertainty, but with a view to monitoring the outcomes so that better decisions can be made in the future (Holling 1978; Walters 1986). Under passive adaptive management (AM), managers aim to maximize the management objective in the next time step, monitor the outcome, and then modify their subsequent actions on the basis of their observations. The benefits of learning are incorporated explicitly into active adaptive management, with actions modified to enhance learning, and desired short-term outcomes sacrificed to achieve better results in the long term.

A technical challenge of active AM is ensuring that the effort spent on learning is worth the information that is gained. This information is usually garnered by modeling the management scenario as a dynamic allocation problem and solving it, commonly with stochastic dynamic programming (e.g., Silvert 1978; Williams 2001; Hauser & Possingham 2008). Active AM incorporates learning by modifying parameter estimates when new data are observed. Bayes' rule is used to update the estimates with each new round of management and monitoring.

The influence of learning on the optimal management decision can be represented by the difference between the passive and active AM solutions. When both solutions are similar, explicitly modifying management actions to account for learning is predicted to have little benefit. In many AM problems that have been studied, the optimal active and passive policies are very similar even when monitoring costs are negligible and future rewards are not discounted (e.g., Johnson et al. 2002; McCarthy & Possingham 2007; Moore et al. 2008). This seems paradoxical; it implies that, when faced with uncertainty about the best course of action, managers should spend very little effort on actively designing programs to learn about the system they are managing.

We explored this apparent paradox by examining two possible reasons why active and passive AM solutions are often similar. First, the benefits of learning are often confined to the particular case study in the modeled scenario, whereas in reality local management experiences might often be applied more broadly. Second, learning perhaps has its greatest influence on the variance of parameter estimates rather than on the expected value. In contrast, management objectives are often expressed in AM models in terms of expected values, which are usually insensitive to the variance (uncertainty) of estimates. Using management objectives that incorporate the variance of an estimate (e.g., the probability of achieving a minimally acceptable outcome) may place greater emphasis on learning. We explored these issues in a case study of McCarthy and Possingham (2007) in which we sought to choose between two options for revegetation of the Merri Creek corridor in Melbourne, Australia. We extended the case study by explicitly incorporating costs of monitoring. We asked whether extending the results of learning beyond the particular case study, and the choice of management objective, influenced the apparent value of learning when monitoring costs are negligible. We also determined the circumstances under which it is optimal to monitor when costs of monitoring are considered explicitly.

Methods

Problem Formulation

We investigated the effects of increased terminal rewards and monitoring costs by extending the active AM problem considered by McCarthy and Possingham (2007). McCarthy and Possingham (2007) considered a situation in which it is necessary to decide how to allocate resources between two possible management options. The outcome of each management action is measured as a success or failure. Which management option is most cost-efficient depends on the cost and probability of success. The cost is assumed to be known, but there is uncertainty about the probability of success of each option.


Because of this uncertainty, it is unclear how to allocate resources between the two options. Each time option i is used, one learns about its success probability. Nevertheless, there is a trade-off between learning about the success probability of each option and employing the option one believes is most efficient. The optimal allocation of resources is found by determining whether the possible benefits of learning compensate for the costs. We extended this model to include nonzero monitoring costs and an increase in the reward in the final time period.

Following McCarthy and Possingham (2007), we considered two management objectives: maximize the expected number of successes over a specified number of time periods, and maximize the expected number of time periods in which the number of successes is considered acceptable. As well as both being reasonable management goals, they illustrate the two different types of objectives discussed above. Other objectives could readily be incorporated, such as maximizing the expected number of successes in the final time period.

McCarthy and Possingham (2007) assumed monitoring was free. In this case the management decision is how to allocate resources between the two management options. A particular number of trials were allocated to each option, and the outcome of each trial was assumed to be measured in terms of success or failure. Examples of success could include the density of plants at a site being greater than a prescribed minimum, the density of a feral predator being less than an upper threshold, or successful breeding in the wild by an animal bred in captivity. The number of trials able to be performed was limited by assuming a fixed budget for each time period. If the budget in each time period is B, then the maximum number of trials for management option i is equal to B/ci, where ci is the cost of each trial of the management option.

We extended the model to include nonzero monitoring costs. In this case, the management decisions are what proportion of the budget to spend on monitoring each option and what proportion of resources to allocate to planting under each management option. That is, if the manager specifies that ki trials be planted under option i, the manager must also specify how many trials should be monitored. Consequently, for nonzero monitoring costs, the budget constraint is expressed as

B ≥ c1u1 + c2u2 + c1n1 + c2n2 + m(n1 + n2), (1)

where ui is the number of trials of option i that are unmonitored, ni is the number of trials of option i that are monitored, and m is the cost of monitoring the outcome of each trial. Learning about the success rate of each option only occurs when the outcomes are monitored.
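As an illustration, the budget constraint in Eq. 1 can be checked for a proposed allocation. The function below is a sketch, not part of the original model code; the default values B = 10, c1 = 2, and c2 = 1 are the case-study values used later in the paper, and m = 0.1 is one of the monitoring costs considered in the Results.

```python
# Feasibility check for the budget constraint (Eq. 1):
# B >= c1*u1 + c2*u2 + c1*n1 + c2*n2 + m*(n1 + n2)
def within_budget(u1, u2, n1, n2, B=10.0, c1=2.0, c2=1.0, m=0.1):
    """u_i: unmonitored trials of option i; n_i: monitored trials of
    option i; c_i: cost per trial; m: monitoring cost per trial."""
    spend = c1 * (u1 + n1) + c2 * (u2 + n2) + m * (n1 + n2)
    return spend <= B
```

For example, 10 unmonitored low-density trials exactly exhaust the budget, whereas monitoring all 10 (an extra m per trial) exceeds it.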

For both objective functions, McCarthy and Possingham (2007) maximized returns over a relatively short time horizon (20 years, or four decision periods). Although this is a reasonable time frame for management to consider, it means there is only a short window in which management can recuperate missed-opportunity losses incurred by initially using experimental management policies. Nevertheless, if the results of the local study are used to inform management of a larger project (or if one considers a longer time frame), then it will be worth forfeiting more successes in the local study to gain better estimates of the success probabilities.

To explore how extending the results of learning beyond the particular case study influences the apparent value of learning, we increased the number of trials managed in the final time period. That is, after a T-year time period during which experimentation can occur, the single-year policy that is estimated to be the most efficient is used across S study sites rather than only one. This was modeled by multiplying the expected reward in the final time step by S. We refer to the reward in the final time step as the terminal reward and hence to S as the terminal reward factor.

In the model presented by McCarthy and Possingham (2007), the state of the system is the current estimate of the probability of success under each management option, denoted p1 and p2. The two probabilities are assumed to be independent and constant over time. Working in a Bayesian framework, the success probability, pi, is modeled as a beta random variable, and the governing probability density function (pdf) is updated with Bayes' theorem each time new data are obtained. If one assumes the probability of success is the same in all time periods, the posterior at the end of the time step is the prior distribution for the next time period. Consequently, a beta distribution is used because it is conjugate to binomial sampling (McCarthy & Possingham 2007). This simply means that if the prior is a beta distribution and one uses binomial sampling, then the updated posterior distribution will still be a beta distribution. The pdf of a beta distribution can be written as

gi(pi) = pi^ai (1 − pi)^bi / B(ai + 1, bi + 1), (2)

where B(x, y) is a beta function with parameters x and y. The mean of a beta distribution is p̄i = (ai + 1)/(ai + bi + 2).

When the prior has a beta distribution with parameters ai and bi and there are si monitored successes and fi monitored failures from ni = si + fi independent trials, the posterior will be a beta distribution with parameters ai + si and bi + fi (Hilborn & Mangel 1997; McCarthy 2007). Observe that the uniform distribution is a special case of the beta distribution, with ai = bi = 0. Hence, if one assumes that initially all success probabilities are equally likely (i.e., the initial prior is a uniform distribution), then ai records the number of successes that were monitored previously, and bi the number of failures that were monitored previously. Consequently, the numbers of previously monitored successes and failures can be used to define the state of the system because these parameters fully describe the pdf of the parameter estimates and hence current knowledge about the system (assuming the initial prior is uniform).
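The conjugate update described above is simple to express in code. A minimal sketch, using the paper's parameterization in which a uniform initial prior corresponds to ai = bi = 0 (the function names are illustrative, not from the original model):

```python
# Conjugate beta-binomial update: a_i counts previously monitored
# successes and b_i previously monitored failures (uniform initial prior).
def update(a, b, successes, failures):
    """Posterior beta parameters after monitoring new outcomes."""
    return a + successes, b + failures

def beta_mean(a, b):
    """Mean of the beta distribution in Eq. 2: (a + 1) / (a + b + 2)."""
    return (a + 1) / (a + b + 2)

# Merri Creek high-density data: 8 successes and 1 failure from 9 trials.
a1, b1 = update(0, 0, 8, 1)
print(beta_mean(a1, b1))  # 9/11 ≈ 0.818
```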

Nevertheless, it is worth emphasizing that using the family of beta distributions does not require that a uniform prior distribution be assumed. Considering the case when the initial prior distribution is uniform helps lend a physical interpretation to the parameters ai and bi; however, it is not necessary to assume a uniform distribution as an initial condition of the problem. The family of beta distributions is versatile and can approximate a large range of realistic priors. If the prior information is not well represented by a smooth, unimodal probability distribution, then another approach needs to be taken.

To obtain a solution to this problem, it is necessary to define the probability of transition from any system state to any other system state. Because transitions are defined by increments in the number of monitored successes and failures, the transition probabilities are defined by the probability of obtaining a particular number of successes and failures from a given number of monitored trials.

Given ni monitored trials of option i, the probability of getting si monitored successes is given by a binomial distribution. Nevertheless, given that pi is itself uncertain and defined by a beta distribution gi(pi), the probability of obtaining si successes from ni trials is obtained from a beta-binomial distribution (Kendall 1998):

Pr(Si = si) = ni! / [si! (ni − si)!] × B(si + ai + 1, ni − si + bi + 1) / B(ai + 1, bi + 1),

where the numbers of previous successes and failures are equal to ai and bi, respectively (assuming the initial prior is uniform) and B(x, y) is the beta function.
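This transition probability can be sketched as follows, computing the beta function via log-gamma functions for numerical stability (the helper names are illustrative):

```python
import math

def beta_fn(x, y):
    """Beta function B(x, y) = Γ(x)Γ(y)/Γ(x + y), via log-gamma."""
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))

def pr_successes(s, n, a, b):
    """Beta-binomial probability of s successes in n monitored trials,
    given a previously monitored successes and b previously monitored
    failures (uniform initial prior)."""
    return (math.comb(n, s)
            * beta_fn(s + a + 1, n - s + b + 1)
            / beta_fn(a + 1, b + 1))
```

As a sanity check, under a uniform prior (a = b = 0) every outcome count is equally likely, so Pr(Si = si) = 1/(ni + 1).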

By using each of the different options and monitoring outcomes, the precision of estimates of the success probability of each option will improve, and the chance of making a good decision will increase. Nevertheless, this benefit will only outweigh the risk of allocating resources to the option that is currently perceived to be inferior when the uncertainty is sufficiently large and the effectiveness of management will be learned with sufficient speed.

This trade-off can be evaluated with stochastic dynamic programming, which determines the state-dependent optimal allocation of resources in each time period. Details of stochastic dynamic programming can be found in Dreyfus and Law (1977) and Puterman (1994), and ecological examples are provided, for example, by Stocker (1981), Richards et al. (1999), and McCarthy et al. (2001).

For this model, the active AM policy over T 5-year time periods corresponds to the stochastic dynamic programming solution over the same time horizon. The passive AM policy corresponds to the single time-period optimal solution (McCarthy & Possingham 2007).
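The backward induction underlying the active AM value function can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the expected-successes objective, free monitoring of every trial, full-budget allocations, and the case-study budget and costs used later in the paper (B = 10, c1 = 2, c2 = 1), with terminal reward factor S.

```python
import math
from functools import lru_cache

B, c1, c2 = 10, 2, 1  # budget and per-trial costs (case-study values)
T, S = 4, 1           # decision periods and terminal reward factor

def beta_fn(x, y):
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))

def pr(s, n, a, b):
    # Beta-binomial transition probability (uniform initial prior).
    return math.comb(n, s) * beta_fn(s + a + 1, n - s + b + 1) / beta_fn(a + 1, b + 1)

@lru_cache(maxsize=None)
def V(t, a1, b1, a2, b2):
    """Expected future successes from period t onward, free monitoring."""
    mean1 = (a1 + 1) / (a1 + b1 + 2)
    mean2 = (a2 + 1) / (a2 + b2 + 2)
    if t == T - 1:  # terminal period: best single-period policy, scaled by S
        return S * max((B // c1) * mean1, (B // c2) * mean2)
    best = -1.0
    for n1 in range(B // c1 + 1):     # trials of the expensive option
        n2 = (B - c1 * n1) // c2      # spend the remainder on the cheap option
        immediate = n1 * mean1 + n2 * mean2
        future = sum(pr(s1, n1, a1, b1) * pr(s2, n2, a2, b2)
                     * V(t + 1, a1 + s1, b1 + n1 - s1, a2 + s2, b2 + n2 - s2)
                     for s1 in range(n1 + 1) for s2 in range(n2 + 1))
        best = max(best, immediate + future)
    return best
```

The passive AM policy corresponds to maximizing only the immediate term in each period; comparing the maximizing allocations of the two recursions state by state reproduces the active-versus-passive comparison discussed in the text.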

Case Study: Planting Density for Revegetation

We investigated the effects of increased terminal rewards and monitoring costs by applying the above model to the case study analyzed by McCarthy and Possingham (2007). The case study addresses the problem of revegetating previously cleared and degraded land and is based specifically on revegetation of the Merri Creek corridor in Melbourne, Australia (Hynes 1999; McCarthy & Possingham 2007).

Large areas of previously cleared or degraded land are being revegetated all over the world (over 150,000 ha/year in Australia; Environment Australia 2001). The costs involved are considerable (Schirmer & Field 2000); hence, it is important that money be allocated efficiently by using the most appropriate options for revegetation. Nevertheless, there is usually at least some uncertainty about which option is most cost-effective for achieving successful revegetation (Schirmer & Field 2000).

McCarthy and Possingham (2007) considered a situation in which a manager is deciding whether to plant at high or low density for revegetation (another common management dilemma is whether to plant seedlings or sow seed directly). High-density planting is more costly, but is more likely to succeed in reaching the desired density of plants in a specified time frame. Consequently, for a given budget, less area can be planted under high-density planting, but successful revegetation is more likely. Which management option is most cost-efficient depends on the probability of success of each option. In the presence of uncertainty about the success probability, it is unclear how to allocate resources between high- and low-density planting. Each time option i is used, we learn about its success probability. There is a trade-off, however, between learning about the success probability of each option and using the option one believes is most efficient. The optimal allocation of resources is found by determining whether the possible benefits of learning compensate for the costs.

When monitoring costs are incorporated, there is an additional cost associated with learning about the success probability of each option. The difference between passive and active AM policies represents the value, or influence, of learning. Considering when it is beneficial to invest in monitoring provides an alternative perspective on when, and to what extent, learning is beneficial.

McCarthy and Possingham (2007) examined results for two objectives: maximize the expected area of successful planting over the next 20 years, with success defined as a vegetation density of >1500 plants (trees and shrubs)/ha after 5 years; and maximize, over the next 20 years, the expected number of 5-year periods in which there was at least 3 ha of successful planting. The two management options were to plant at a high (approximately 4000 plants/ha) or low (approximately 2000 plants/ha) density. The high-density option was twice as expensive as the low-density option, and the budget was assumed sufficient to plant 5 ha at high density or 10 ha at low density (or various combinations of the two) within a 5-year period. Consequently, each trial was assumed to occupy a 1-ha area.

Figure 1. The optimal area to plant at high density, ranging from 0 (white) to 5 (black) ha, versus the number of previous successes and failures of low-density planting, assuming eight successes from nine previous trials of high-density planting. The objective in (a), (b), and (c) is to maximize the expected area of successful revegetation, and the objective in (d), (e), and (f) is to maximize, over the next 20 years, the expected number of 5-year time periods in which at least 3 ha of revegetation is successful. The reward in the final year is S times the expected reward in 1 year: (a) and (d) S = 1; (b) and (e) S = 5; (c) and (f) S = 50. In (a), (b), and (c), the diagonal line is the optimal passive adaptive strategy. For parameter values below the line, it is optimal to use only low-density planting; for parameter values above the line, it is optimal to use only high-density planting.
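For the second objective, the probability that a 5-year period is acceptable (at least 3 ha of successful planting) follows directly from the beta-binomial distribution given in Methods. A sketch, assuming all 10 ha are planted at low density with three previously monitored low-density successes and three failures (the helper names are illustrative):

```python
import math

def beta_fn(x, y):
    # Beta function via log-gamma for numerical stability.
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))

def pr_successes(s, n, a, b):
    # Beta-binomial probability of s successes in n monitored trials.
    return math.comb(n, s) * beta_fn(s + a + 1, n - s + b + 1) / beta_fn(a + 1, b + 1)

# Probability that at least 3 of 10 low-density hectares succeed,
# given a2 = 3 previous successes and b2 = 3 previous failures.
p_acceptable = sum(pr_successes(s, 10, 3, 3) for s in range(3, 11))
print(round(p_acceptable, 3))  # ≈ 0.855
```

Unlike the expected-area objective, this quantity depends on the full shape of the beta distribution, not just its mean, which is why objectives of this type can place greater weight on reducing uncertainty.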

We varied the area revegetated in the final 5-year period by multiplying the terminal reward by a factor, S, which we varied between 1 and 100. We also considered a range of possible monitoring costs (0 ≤ m ≤ 0.5).

The area planted is limited by the budget available according to Eq. 1, with B = 10, c1 = 2, and c2 = 1. The expected area of successful planting in each time period is equal to (u1 + n1)p̄1 + (u2 + n2)p̄2, where p̄i is the mean success rate of option i, as defined previously. Previous data from revegetation in the Merri Creek corridor show that three of six low-density plantings were successful and eight of nine high-density plantings were successful (Hynes 1999). As more planting is conducted, these estimates of planting success will be updated. As previously outlined, uncertainty about the probability of success of each option can be described by a beta probability distribution. Given the data, we assumed high-density planting had initial parameters a1 = 8 and b1 = 1 (p̄1 = 9/11 ≈ 0.818), and low-density planting had initial parameters a2 = 3 and b2 = 3 (p̄2 = 4/8 = 0.5). In this case low-density planting has a higher expected return on investment (0.5 × 10 ha = 5.0 ha/period) than the high-density option (0.818 × 5 ha ≈ 4.09 ha/period).
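The expected-return comparison can be verified directly from the beta means, p̄i = (ai + 1)/(ai + bi + 2):

```python
# Expected area of successful planting per period under the initial estimates.
mean_high = (8 + 1) / (8 + 1 + 2)   # a1 = 8, b1 = 1 -> 9/11 ≈ 0.818
mean_low = (3 + 1) / (3 + 3 + 2)    # a2 = 3, b2 = 3 -> 0.5
area_high = 5 * mean_high           # 5 ha plantable at high density
area_low = 10 * mean_low            # 10 ha plantable at low density
print(round(area_high, 2), area_low)  # 4.09 5.0
```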

Results

Maximize Expected Number of Successful Trials

Consider first the case when monitoring is free and when the management objective is to maximize the expected number of successes over the next 20 years (active AM). With the current number of recorded successes and failures of high-density planting, there was only a small number of states in which it was optimal to test both management options (Fig. 1a). When only the next 5 years was considered (passive AM), it was optimal to use the management option with the highest estimated efficiency (McCarthy & Possingham 2007). The diagonal line in Fig. 1(a–c) is the boundary dividing the parameter values for which the expected area of successful planting in the next 5 years was maximized by either low-density planting (below the line) or high-density planting (above the line). Learning occurred primarily on the side of the boundary where high-density planting had the greatest expected efficiency. This occurred because it is cheaper to learn about low-density planting than high-density planting: a single high-density data point costs as much as two low-density points. Consequently, it is less of a sacrifice to learn about low-density planting when one believes it is less efficient than to learn about high-density planting when the situation is reversed.

The Passive Adaptive Strategy Does Not Change

When after 15 years the best strategy was used across five revegetation study areas instead of only one (S = 5), there were more states in which both management options were used (Fig. 1b). Sites were reallocated from high-density planting to low-density planting, and experimental management only occurred in the region in which high-density planting was estimated to be most efficient.

As the terminal reward (Fig. 1) increased, there was a small increase in the number of states in which it was optimal to try both management options. Nevertheless, the number of states in which it was optimal to use only low-density planting also increased. Hence, although the number of states in which both planting options were used remained reasonably constant, increasing the terminal reward considerably changed the optimal policy (Fig. 2).

When monitoring was free, it was always optimal to monitor all trials. When monitoring was not free, however, it was possible to have both monitored and unmonitored trials during the same time step. That is, there were four possible management strategies to choose from. When the cost of monitoring was one-tenth of the cost of low-density planting (m = 0.1), it was optimal to monitor patches that were planted with the low-density option if the state was close to the threshold above which high-density planting was expected to be more efficient. Monitoring only occurred in three of the states in which both high- and low-density planting were used when monitoring was free (see Supporting Information).

Because the management policy was constrained by a budget, using monitoring in one site resulted in a noninteger portion of the budget remaining, which could not be spent on revegetation. This was an additional cost associated with monitoring. The sacrifice was greater for high-density planting than for low-density planting because monitoring resulted in a higher proportion of the budget being unable to be spent on (high-density) revegetation. Consequently, it was not beneficial to monitor when it was optimal to use high-density planting in most sites (because a relatively large portion of the budget remained unspent).

Figure 2. The number of states in which the optimal planting policy for revegetation is different from S = 1 (x-axis, terminal reward factor [S]; y-axis, number of states in which the optimal policy is different; crosses, objective is to maximize the expected number of successes; circles, objective is to maximize the number of study areas that have at least 3 ha of successful revegetation).
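The budget arithmetic behind this effect can be sketched with hypothetical unit costs (the values below are illustrative, not the costs used in the model): with a fixed budget spent in whole-hectare units, paying for monitoring leaves a fractional remainder, and the remainder that cannot buy another hectare tends to be larger when each hectare is more expensive.

```python
import math

def allocation(budget, cost_per_ha, n_monitored, monitor_cost):
    """Plant whole hectares with whatever budget remains after monitoring.

    Returns (hectares planted, budget left unspent).
    """
    remaining = budget - n_monitored * monitor_cost
    hectares = math.floor(remaining / cost_per_ha)
    return hectares, remaining - hectares * cost_per_ha

# Illustrative units: low density = 1 unit/ha, high density = 2 units/ha,
# monitoring one site = 0.1 units, total budget = 10 units.
low_ha, low_unspent = allocation(10, 1, 1, 0.1)    # 9 ha, ~0.9 unspent
high_ha, high_unspent = allocation(10, 2, 1, 0.1)  # 4 ha, ~1.9 unspent
```

Under these hypothetical costs, monitoring one site strands roughly twice as much budget under the more expensive high-density option, which is consistent with monitoring being least attractive when most sites are to be planted at high density.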

When the cost of monitoring was increased to one-fifth of the cost of low-density planting (m = 0.2), monitoring was prescribed in largely the same states, but fewer of the planted sites were monitored (Supporting Information). When the cost of monitoring was increased further, the number of sites monitored continued to decrease. Correspondingly, the number of states in which at least some monitoring was optimal also declined.

When the terminal reward was increased in addition to including nonzero monitoring costs, it was optimal to monitor in more states (see Supporting Information). Nevertheless, it was still never optimal to monitor in states in which all of the budget should be spent on high-density planting, and it was optimal to monitor in only one state in which 80% of the budget was allocated to the high-density option under negligible monitoring costs.

Maximize Expected Number of Time Periods with Three Successful Trials

The second management objective was to maximize the probability of at least three successes in each 5-year period over the next 20 years. For this management objective, it was optimal to use both management options for a higher proportion of the initial states (Fig. 1). McCarthy and Possingham (2007) compared this policy with that obtained when the objective was to maximize the probability of at least three successes in a single time period (passive AM). They found that the optimal policies were very similar, which indicated that including the value of learning did little to alter the optimal policy.

Conservation Biology

Volume 24, No. 4, 2010

Figure 3. The optimal area to plant under each of the four management strategies: high density with and without monitoring and low density with and without monitoring. The number of trials planted ranges from 0 (white) to 10 (black) ha and depends on the number of previous successes and failures of low-density plantings (x-axis, number of previous low-density successes; y-axis, number of previous low-density failures). These results are for the case in which there have been eight successes from nine previous trials of high-density planting. The cost of monitoring is one-tenth the cost of low-density planting (m = 0.1), and the terminal reward factor is S = 1. The objective is to maximize the expected number of 5-year time periods in which at least 3 ha of revegetation is successful.
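If each 5-year period comprised n independent trials, each succeeding with probability p (a simplifying assumption for illustration; the trials in the model are not identical), the per-period term of this objective is a binomial tail probability:

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more successes."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# E.g., 10 trials each with a 50% success rate:
# prob_at_least(3, 10, 0.5) == 0.9453125
```

Unlike the expected number of successes, this tail probability rewards reducing the variance of outcomes as well as raising their mean, which is one reason the two objectives value learning differently.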

When the area revegetated in the last 5-year period was increased, there was a moderate increase in the number of states in which it was optimal to use both high- and low-density planting (Fig. 1). In contrast to the previous objective, the width of the region in which it was optimal to employ both options grew in both directions.

For small changes in the terminal reward, the number of states in which the optimal policy changed was less than when maximizing the expected number of successes (Fig. 2). Nevertheless, for larger terminal rewards (S > 10), more changes were observed for this management objective.

When the cost of monitoring was one-tenth of the cost of low-density planting (m = 0.1), it was optimal to monitor when low-density planting was perceived to be highly efficient but the total number of previous observations was low (Fig. 3). When the cost of monitoring was increased to m = 0.2 (Supporting Information), there were only four states in which it was optimal to monitor management outcomes.

When the terminal reward was increased in addition to incorporating monitoring costs, there was again a large increase in the number of states for which at least some monitoring was optimal (Fig. 4).

Discussion

The value of information for making a one-off decision is reasonably well understood. Nevertheless, understanding the value of information when making a sequence of decisions is a much more difficult problem. Comparing active and passive AM solutions is one way to gain insight into the value of learning. Most AM examples to date suggest that knowledge and learning are not as important as scientists might like to believe. They suggest that, when faced with uncertainty about the best course of action, managers should mostly implement management policies using the best-available information, rather than planning on obtaining new information. Although it might be true that sometimes learning is not as valuable as one might intuitively believe, we propose that another reason learning is not highly valued in many models is a disparity in how information is valued in the model and in the real world. We explored two ways in which information might be valued differently and found they both influenced learning.
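For the one-off case, the expected value of perfect information can be written down directly. The sketch below uses a hypothetical two-state, two-action payoff table; it is not drawn from the Merri Creek model.

```python
def value_of_information(prior, payoffs):
    """Expected value of perfect information for a single decision.

    prior: probability of each state of nature.
    payoffs[action][state]: reward for taking an action in a state.
    """
    # Best action chosen on prior expectations alone (passive stance).
    without_info = max(sum(pr * pay for pr, pay in zip(prior, row))
                       for row in payoffs)
    # Learn the true state first, then pick the best action for it.
    with_info = sum(pr * max(row[state] for row in payoffs)
                    for state, pr in enumerate(prior))
    return with_info - without_info

# Two equally likely states; action 0 pays (4, 0), action 1 pays (1, 3).
# The actions have similar prior expectations, so information is valuable:
# value_of_information([0.5, 0.5], [[4, 0], [1, 3]]) == 1.5
```

When one action dominates in every state, the two quantities coincide and the value of information collapses to zero, mirroring the result that monitoring pays only where the options' expected benefits are similar.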

The value of terminal rewards and the nature of the objective influenced the difference between active and passive AM policies. Increasing the terminal reward only slightly increased the number of states for which planting at both high and low density concurrently was optimal (i.e., there are more shades of gray in Fig. 1). Nevertheless, there was a more noticeable influence of terminal rewards on the difference between passive and active management policies, with less high-density planting as the terminal reward increased. This is presumably because the success of high-density planting is believed to be high, so the greatest long-term benefit will be realized if low-density planting proves more successful than expected. In contrast, there is a more even change in the optimal policy with increases in the terminal reward when the objective is based on maximizing the probability of achieving at least 3 ha of successful revegetation per period.

Figure 4. The optimal area to plant under each of the four management strategies: high density with and without monitoring and low density with and without monitoring. The number of trials planted ranges from 0 (white) to 10 (black) ha and depends on the number of previous successes and failures of low-density plantings (x-axis, number of previous low-density successes; y-axis, number of previous low-density failures). These results are for the case in which there have been eight successes from nine previous trials of high-density planting. The cost of monitoring is one-tenth the cost of low-density planting (m = 0.1), and the terminal reward factor is S = 5. The objective is to maximize the expected number of 5-year time periods in which at least 3 ha of revegetation is successful.

Explicitly considering the cost of monitoring provided a different perspective on the influence of the terminal rewards and the objective. When the management objective was to maximize the expected number of successes, the passive and active policies differed when the expected values under high- and low-density planting were similar (Fig. 1). These are similar to the states in which it is optimal to spend resources on monitoring (Supporting Information). In contrast, when the objective was based on the probability of achieving at least 3 ha of successful revegetation, it was optimal to spend resources on monitoring over a broad range of parameters, but only when the results were applicable beyond the case study (i.e., when the terminal rewards were larger) (Figs. 3 & 4). For this objective, the states for which it was optimal to spend resources on monitoring were different from those in which the passive and active adaptive-management policies differed. This emphasizes the intuitive result that spending resources on monitoring is only optimal when the expected benefits of the options being considered are similar and when the payoff of learning about their benefits is large.

We considered the case in which monitoring provides perfect information regarding the success or failure of each trial. The cost of monitoring each site was fixed, so increased monitoring costs were only incurred when more sites were monitored. In the real world, however, perfect observations of population abundance, be it flora or fauna, are impossible. Obtaining good population estimates may be difficult and expensive (e.g., Engeman 2005); hence, there is usually a trade-off between the accuracy of the information obtained and monitoring costs. Consequently, another interesting question is what type or level of monitoring is optimal. Some researchers have examined the question of imperfect observations (Chades et al. 2008; Moore 2009; Hauser & McCarthy 2009). Nevertheless, in many cases the type or level of monitoring is fixed (e.g., Hauser et al. 2006; Moore 2009). In our case study, estimating the abundance of the plants was relatively simple, so considering fixed-effort monitoring is realistic in this case.

Our formulation of AM, like those of others, updates parameter estimates using Bayes' rule. Such formulations require that the range of possible outcomes, and their a priori probabilities of occurrence, be specified and that these a priori specifications remain valid throughout the analysis. In short, this means managers cannot change their minds; unexpected results need to be incorporated into existing paradigms rather than leading to new paradigms. This approach may underestimate the value of monitoring. Field observations may, in reality, lead to novel insights that could lead to new management options. For example, Armstrong et al. (2007) used AM to assess impacts of supplementary food and mite control on the dynamics of a reintroduced population of Hihi (Notiomystis cincta), a native passerine of New Zealand. Supplementary feeding did not increase the low survival rate of the bird, which suggests that a soil-borne fungus was responsible for the population's decline. Assessing effects of the fungus was beyond the scope of the AM program, but the study nevertheless contributed to the management decision to remove the birds from the site. Additionally, the study contributed to successful reintroductions of Hihi at two other locations (Armstrong et al. 2007).
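As a concrete sketch of this kind of Bayesian updating (a conjugate Beta-binomial form is assumed here for illustration, consistent with tracking states by counts of previous successes and failures; the priors are hypothetical), beliefs about a trial's success probability move only by adding observed counts to fixed prior parameters. This also makes the rigidity plain: the belief can never leave the family specified a priori.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BetaBelief:
    """Beta(a, b) belief about a planting trial's success probability."""
    a: float = 1.0  # prior pseudo-successes (Beta(1, 1) is uniform)
    b: float = 1.0  # prior pseudo-failures

    def update(self, successes, failures):
        """Conjugate update after monitoring reveals trial outcomes."""
        return BetaBelief(self.a + successes, self.b + failures)

    @property
    def mean(self):
        return self.a / (self.a + self.b)

# Eight successes from nine monitored trials shift a uniform prior
# to Beta(9, 2), with posterior mean 9/11.
posterior = BetaBelief().update(8, 1)
```

However surprising the data, the posterior remains a Beta distribution over the same fixed set of outcomes, which is why such formulations cannot represent the genuinely novel insights discussed above.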

The example of the Hihi illustrates that AM programs can sometimes provide information that goes beyond the immediate research question and beyond the particular case study. In the case of Hihi, the survival rate of the reintroduced population was less than the value expected for this species (Armstrong et al. 2007). The expected value was based in part on knowledge of survival rates elsewhere, so data from other sites contributed to management at the location in question. Furthermore, information on survival rates of other bird species could also help establish expected survival rates, so data from particular case studies can inform others, even those of different species (McCarthy et al. 2008). Incorporating unexpected results into analyses of AM is difficult because the range of possible outcomes must be specified a priori. Therefore, analyses such as ours that rely on Bayesian updating will tend to underestimate the benefit of learning. The expected value of obtaining unexpected results from monitoring can be derived from the probability of achieving an unexpected outcome from monitoring and the management benefit of that outcome.

When knowledge gained was applied beyond the scope of the local study, there was a much greater difference between active and passive AM solutions. Furthermore, for a given state of knowledge, the optimal management strategy depended heavily on the choice of management goal. Consequently, incorporating the variance of the estimate into the objective function also affected when and how often active learning was expected to be beneficial. These results shed light on some of the unknowns that influence when managers should invest effort in actively designing programs to learn about the system they are managing. We hope these insights will assist managers to make more informed decisions in the face of uncertainty.

Acknowledgments

We gratefully acknowledge funding support from the Centre of Excellence for Mathematics and Statistics of Complex Systems, Australian Research Council, and the Applied Environmental Decision Analysis Research Facility. We also thank two anonymous reviewers for helpful comments.

Supporting Information

Additional figures showing the optimal planting strategy for nonzero monitoring costs (Appendix S1) are available as part of the online article. The authors are responsible for the content and functionality of these materials. Queries (other than absence of the material) should be directed to the corresponding author.

Literature Cited

Armstrong, D. P., I. Castro, and R. Griffiths. 2007. Using adaptive management to determine requirements of re-introduced populations: the case of the New Zealand Hihi. Journal of Applied Ecology 44:953–962.

Burgman, M., and S. Ferson. 2005. Risks and decisions for conservation and environmental management. Cambridge University Press, London.

Chades, I., E. McDonald-Madden, M. McCarthy, B. Wintle, and H. Possingham. 2008. Save, survey or surrender: optimal management of a cryptic threatened species. Proceedings of the National Academy of Sciences of the United States of America 105:13936–13940.

Dreyfus, S. E., and A. M. Law. 1977. The art and theory of dynamic programming. Academic Press, New York.

Engeman, R. M. 2005. Indexing principles and a widely applicable paradigm for indexing animal populations. Wildlife Research 32:203–210.

Environment Australia. 2001. Work in progress: Australia's commitment to the environment. Commonwealth of Australia, Canberra.

Hauser, C., and H. P. Possingham. 2008. Experimental or precautionary? Adaptive harvest management over a range of time horizons. Journal of Applied Ecology 45:72–81.

Hauser, C. E., and M. A. McCarthy. 2009. Streamlining search and destroy: cost-effective surveillance for invasive species management. Ecology Letters 12:683–692.

Hauser, C. E., A. R. Pople, and H. P. Possingham. 2006. Should managed populations be monitored every year? Ecological Applications 16:807–819.

Hilborn, R., and M. Mangel. 1997. The ecological detective: confronting models with data. Princeton University Press, Princeton, New Jersey.

Holling, C. S. 1978. Adaptive environmental assessment and management. John Wiley and Sons, New York.

Hynes, L. 1999. Measuring the success of riparian revegetation projects on the Merri Creek: assessment of plant survivorship and community structure. Honours thesis. University of Melbourne, Melbourne, Victoria.

Johnson, F. A., W. L. Kendall, and J. A. Dubovsky. 2002. Conditions and limitations on learning in the adaptive management of mallard harvests. Wildlife Society Bulletin 30:176–185.

Kendall, B. E. 1998. Estimating the magnitude of environmental stochasticity in survivorship data. Ecological Applications 8:184–193.

McCarthy, M., H. Possingham, and A. Gill. 2001. Using stochastic dynamic programming to determine optimal fire management for Banksia ornata. Journal of Applied Ecology 38:585–591.

McCarthy, M. A. 2007. Bayesian methods for ecology. Cambridge University Press, Cambridge, United Kingdom.

McCarthy, M. A., R. Citroen, and S. C. McCall. 2008. Allometric scaling and Bayesian priors for annual survival of birds and mammals. The American Naturalist 172:216–222.


McCarthy, M. A., and H. P. Possingham. 2007. Active adaptive management for conservation. Conservation Biology 21:956–963.

Moilanen, A., M. C. Runge, J. Elith, A. Tyre, Y. Carmel, E. Fegraus, B. A. Wintle, M. Burgman, and Y. Ben-Haim. 2006. Planning for robust reserve networks using uncertainty analysis. Ecological Modelling 199:115–124.

Moore, A. 2009. Managing populations in the face of uncertainty: adaptive management, partial observability and the dynamic value of information. PhD thesis. University of Melbourne, Melbourne, Victoria.

Moore, A. L., C. E. Hauser, and M. A. McCarthy. 2008. How we value the future affects our desire to learn. Ecological Applications 18:1061–1069.

Puterman, M. L. 1994. Markov decision processes. John Wiley and Sons, New York.

Richards, S., H. Possingham, and J. Tizard. 1999. Optimal fire management for maintaining community diversity. Ecological Applications 9:880–892.

Schirmer, J., and J. Field. 2000. The cost of revegetation. Final report. Australian National University Forestry, Greening Australia, and Environment Australia, Canberra.

Silvert, W. 1978. The price of knowledge: fisheries management as a research tool. Journal of the Fisheries Research Board of Canada 35:208–212.

Stocker, M. 1981. Optimization model for a wolf-ungulate system. Ecological Modelling 12:151–172.

Walters, C. J. 1986. Adaptive management of renewable resources. Macmillan, New York.

Williams, B. K. 2001. Uncertainty, learning, and the optimal management of wildlife. Environmental and Ecological Statistics 8:269–288.
