Expectations and outcomes: decision-making in the primate brain



REVIEW

Allison N. McCoy · Michael L. Platt

Expectations and outcomes: decision-making in the primate brain

Received: 2 March 2004 / Revised: 28 July 2004 / Accepted: 12 August 2004 / Published online: 12 October 2004
© Springer-Verlag 2004

Abstract Success in a constantly changing environment requires that decision-making strategies be updated as reward contingencies change. How this is accomplished by the nervous system has, until recently, remained a profound mystery. New studies coupling economic theory with neurophysiological techniques have revealed the explicit representation of behavioral value. Specifically, when fluid reinforcement is paired with visually-guided eye movements, neurons in parietal cortex, prefrontal cortex, the basal ganglia, and superior colliculus—all nodes in a network linking visual stimulation with the generation of oculomotor behavior—encode the expected value of targets lying within their response fields. Other brain areas have been implicated in the processing of reward-related information in the abstract: midbrain dopaminergic neurons, for instance, signal an error in reward prediction. Still other brain areas link information about reward to the selection and performance of specific actions in order for behavior to adapt to changing environmental exigencies. Neurons in posterior cingulate cortex have been shown to carry signals related to both reward outcomes and oculomotor behavior, suggesting that they participate in updating estimates of orienting value.

Introduction

For even the simplest of organisms, adaptive behavior requires, to paraphrase Darwin, the preservation of favorable variants and the rejection of injurious ones. In other words, nervous systems must weigh the potential rewards and punishments associated with available options and select the behavior likely to yield the best possible outcome. This process of adaptive decision-making is well illustrated by the example of a financial advisor choosing whether to buy, sell or hold portions of stock based on calculations of the most likely monetary gain. In the face of sudden change, such as a stock market crash, the value associated with available options is rapidly updated. In this way, the decision process allows behavior to adjust to environmental flux, resulting in a net benefit to the organism.

While the analogy of the financial advisor captures the iterative, adaptive nature of the decision process, how does the brain actually select an action from the behavioral repertoire? This is a question that has long puzzled philosophers, economists, behavioral ecologists, and neuroscientists. In fact, significant advancement in our understanding of the neural basis of decision-making has been made through a combination of economic theory and neurophysiological techniques (see Glimcher 2003, for exposition). This review describes current understanding of the neural mechanisms underlying decision-making, focusing on evidence for the neural representation, computation and revision of economic decision variables used to guide visual orienting in primates.

Behavioral economics: expected value and decision-making

In neurobiology, as in psychology, the sensory-motor reflex has been widely viewed as a useful model for directly linking sensation to action (Pavlov 1927; Sherrington 1906). As instructive as it has been, however, the simple sensory-motor reflex can only begin to approximate the rich ethology of real-world behavior (Glimcher 2003; Herrnstein 1997; Platt 2002). Only recently have neurobiologists begun to investigate the idea that internal decision variables dynamically link purely sensory and purely motor processes and are explicitly represented in the nervous system (Basso and Wurtz 1997; Dorris and Munoz 1998; Kim and Shadlen 1999; Platt and Glimcher 1999; Shadlen and Newsome 1996, 2001).

A. N. McCoy · M. L. Platt (✉)
Department of Neurobiology, Duke University Medical Center,
325 Bryan Research Building, Box 3209, Durham, NC 27710, USA
E-mail: [email protected]

M. L. Platt
Center for Cognitive Neuroscience,
Duke University, Durham, NC 27710, USA

M. L. Platt
Department of Biological Anthropology and Anatomy,
Duke University, Durham, NC 27710, USA

J Comp Physiol A (2005) 191: 201–211
DOI 10.1007/s00359-004-0565-9

In contrast, over the past 400 years, economists have developed simple normative models to describe what rational agents should do when confronted with a choice between two options [Arnauld and Nichole 1662; Bernouilli (1758) in Speiser 1982]. The simplest economic model of decision-making, known as Expected Value Theory, posits that rational agents compute the likelihood that a particular action will yield a gain or loss, as well as the amount of gain or loss that can be expected from that choice (Arnauld and Nichole 1662). These values are then multiplied to arrive at an estimate of expected value for each possible course of action, and the option with the highest expected value is chosen (Arnauld and Nichole 1662). Maximizing expected value is the optimal algorithm for a rational chooser with complete information about the costs and benefits of different options, as well as their likelihood of occurrence. Expected value models, and variations on them such as Expected Utility Theory [Bernouilli (1758) in Speiser 1982], have been shown to be very good descriptors of the choices both people and animals make in a variety of simple situations (Stephens and Krebs 1986; Herrnstein 1997; Camerer 2003; Glimcher 2003).
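The multiply-and-maximize rule described above can be made concrete in a few lines. The following sketch is purely illustrative (the option names and payoffs are hypothetical, not from any experiment discussed here): each option is a (probability, amount) pair, and the rational chooser picks the option whose product is largest.

```python
# Illustrative sketch of Expected Value Theory; option names and
# payoffs below are hypothetical examples, not experimental values.

def expected_value(probability: float, amount: float) -> float:
    """Expected value = likelihood of the outcome times its magnitude."""
    return probability * amount

def choose(options: dict) -> str:
    """Return the name of the option with the highest expected value."""
    return max(options, key=lambda name: expected_value(*options[name]))

# A sure small reward vs. a risky large one.
options = {
    "sure_thing": (1.0, 2.0),   # 100% chance of 2 units
    "gamble":     (0.4, 6.0),   # 40% chance of 6 units
}
print(choose(options))  # the gamble (EV 2.4) beats the sure thing (EV 2.0)
```

Note that this rule assumes complete information about probabilities and amounts, exactly the idealization the text attributes to a rational chooser.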

Expected value models of decision-making assume that rational choosers have access to information about the probability and amount of gain that can be expected from each action. Behavioral observations have repeatedly demonstrated that humans and animals are exquisitely sensitive to the expected value of available options (Herrnstein 1961; Stephens and Krebs 1986; Glimcher 2003). This suggests that nervous systems somehow represent information about the estimated costs and benefits of potential behaviors and use that information to dynamically link sensation to action.

New studies have begun to shed light on this process. These studies suggest that neurons representing different eye movements compete to reach a firing rate threshold for movement initiation (Hanes and Schall 1996; Roitman and Shadlen 2002). The activity of these neurons is systematically enhanced by both sensory information favoring a particular eye movement (Shadlen and Newsome 1996, 2001; Roitman and Shadlen 2002) and increases in the expected value of that eye movement (Kawagoe et al. 1998; Leon and Shadlen 1999; Platt and Glimcher 1999; Coe et al. 2002). These observations intimate that the eye movement decision process may be instantiated rather simply by scaling the activity of movement-related neurons by the expected value of the eye movements they encode. Some of the important evidence supporting this model is described further below.
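The race-to-threshold idea above can be caricatured in a deterministic toy simulation (ours, not a model fit from any of the cited studies): each candidate movement's accumulator ramps at a rate set by its sensory drive, scaled by its expected value, and the first to reach threshold triggers that movement. All parameter values below are arbitrary illustrations.

```python
# Toy race-to-threshold sketch (illustrative; all parameters arbitrary).
# Each movement's activity ramps at drive * value; the first accumulator
# to cross threshold determines which movement is made, and when.

def race_to_threshold(drive: dict, value: dict,
                      threshold: float = 100.0, dt: float = 1.0):
    """Return (winning movement, time at which threshold was reached)."""
    activity = {m: 0.0 for m in drive}
    t = 0.0
    while True:
        t += dt
        for m in activity:
            activity[m] += drive[m] * value[m] * dt  # value scales ramp rate
        winners = [m for m in activity if activity[m] >= threshold]
        if winners:
            return max(winners, key=activity.get), t

# With equal sensory evidence, the higher-valued target wins, and sooner.
movement, latency = race_to_threshold(
    drive={"left": 1.0, "right": 1.0},
    value={"left": 2.0, "right": 1.0})
print(movement, latency)  # left reaches threshold first, at t = 50.0
```

The design choice to multiply drive by value mirrors the gain-scaling interpretation in the text: value does not add a new signal, it amplifies the existing movement-related one.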

Expected value and primate parietal cortex

In an explicit application of economic theory to experimental neurophysiology, Platt and Glimcher (1999) manipulated the expected value of saccadic eye movements while monitoring the activity of neurons in the lateral intraparietal area (LIP), a subregion of primate posterior parietal cortex thought to intervene between visual sensation and eye movements (Fig. 1; Gnadt and Andersen 1988; Goldberg et al. 1990, 1998). Prior studies had suggested that the responses of LIP neurons were sensitive to the behavioral relevance of visual targets for guiding attention or gaze shifts (Gnadt and Andersen 1988; Colby et al. 1996; Platt and Glimcher 1997) as well as the strength of perceptual evidence favoring the production of a particular eye movement (Shadlen and Newsome 1996, 2001). These observations suggested that LIP neurons play a role in the oculomotor decision-making process (Platt and Glimcher 1999; Shadlen and Newsome 1996).

In the first set of experiments designed to test this hypothesis (Platt and Glimcher 1999), monkeys performed cued saccade trials, in which a change in the color of a central fixation light instructed subjects to

Fig. 1 A network for oculomotor decision-making. Medial and lateral views of the macaque monkey brain illustrating major pathways carrying signals related to saccadic eye movements (green) and reward (blue). Note: many of these pathways are bidirectional but have been simplified for presentation. AMYG amygdala, CGp posterior cingulate cortex, CGa anterior cingulate cortex, LIP lateral intraparietal cortex, SEF supplementary eye fields, FEF frontal eye fields, PFC prefrontal cortex, OFC orbitofrontal cortex, SC superior colliculus, NAC nucleus accumbens, SNr substantia nigra pars reticulata, SNc substantia nigra pars compacta, VTA ventral tegmental area. After Platt et al. (2004)


shift gaze to one of two possible target locations in order to receive a fruit juice reward. A change from yellow to red indicated that looking to the upper target would be rewarded, while a change from yellow to green indicated that the lower target would be rewarded. In successive blocks of trials, either the volume of juice delivered for each eye movement response or the probability that each possible movement would be cued and therefore reinforced was varied. The investigators found that the activity of LIP neurons was systematically modulated by both the size of the reward a monkey could expect to receive and the probability that a particular gaze shift would be reinforced—the two variables needed to compute the expected value of the movement. Intriguingly, the effects of expected value on neuronal firing rates were maximal early in each trial but diminished after the rewarded target had been cued. The information encoded by LIP neurons thus appeared to track the monkey's running estimate of the expected value of a particular movement, which was maximal once that movement was cued (Glimcher 2002).

These experiments demonstrated that LIP neurons carry information about the economic decision variables thought to drive rational decision-making. In those studies, however, monkeys did not have a choice about which eye movement to make, and thus the link between decision-making and LIP activity could not be directly observed. In a second experiment (Platt and Glimcher 1999), the investigators studied the activity of LIP neurons while monkeys were permitted to freely choose between two movements differing in expected value. This allowed the investigators to derive a behavioral estimate of the monkey's subjective valuation of each response, which could be compared directly with LIP neuronal activity.

Under these conditions, monkeys behaved like rational consumers: their pattern of target choices was sensitive to the amount of fruit juice associated with each target (Fig. 2b). LIP neuronal activity was also sensitive to expected value in the free choice task (Fig. 2a, c). Activity prior to movement increased systematically with its expected value, in concert with the monkey's valuation of the same movement. Based on these data, Platt and Glimcher (1999) concluded that LIP neurons encode the expected value of potential eye movements. More specifically, Gold and Shadlen (2002) have argued that these data are consistent with the hypothesis that LIP neurons signal the instantaneous likelihood that the monkey will choose and execute a particular eye movement, and that this likelihood is computed, at least in part, based on the expected value of that movement.

Neurons in several other brain areas have been shown to carry signals related to the expected value of eye movements. For example, neurons in the caudate nucleus were found to respond in anticipation of reward-predicting stimuli on a memory saccade task (Kawagoe et al. 1998), and the strength of this signal was correlated with the latency of the behavioral response (Takikawa et al. 2002). On a reaction time eye movement task, anticipatory activity in the caudate was maximal when the contralateral target was reliably rewarded, and tracked changes in the monkeys' response time as the rewarded location was reversed (Lauwereyns et al. 2002). Anticipatory activity was absent, however, during a task in which the rewarded target unpredictably changed from trial to trial (Lauwereyns et al. 2002). Signals correlated with the reward value associated with eye movements have also been found in dorsolateral prefrontal cortex (Leon and Shadlen 1999), supplementary eye fields (Amador et al. 2000; Stuphorn et al. 2000a), substantia nigra pars reticulata (Sato and Hikosaka 2002), and even the superior colliculus (Ikeda and Hikosaka 2003).

Expected value signals thus appear to penetrate the oculomotor system from the cortex through the final common pathway in the superior colliculus, presumably serving to bias eye movement selection towards maximization of reward. Behavioral measures indicate that expected value not only systematically influences target selection by eye movements, but also modulates saccade metrics, including amplitude, velocity, and reaction time

Fig. 2a–c Monkeys and posterior parietal neurons are sensitive to eye movement value. a Firing rate as a function of time for a single LIP neuron when the monkey subject chose to shift gaze to the target in the neuronal response field. Black curve, high value trials; grey curve, low value trials. Tick marks indicate action potentials recorded on successive trials for the first ten trials of each block. b Proportion of trials on which a single monkey subject chose target 1 as a function of its expected value [reward size target 1/(reward size target 1 + reward size target 2)]. c Average normalized pre-movement firing rate (±SE) for a single posterior parietal neuron in the same monkey as a function of the expected value of the same target. After Platt and Glimcher (1999). Reprinted by permission from Nature


(Leon and Shadlen 1999; Takikawa et al. 2002; McCoy et al. 2003). These behavioral observations are consistent with a model for oculomotor decision-making in which expected value systematically biases neuronal activity

throughout the cortical and subcortical oculomotor afferents to the superior colliculus. Such modulations in saccade-associated activity presumably result in the differential activation of pools of motor neurons in the oculomotor brainstem that ultimately generate the patterns of muscle contraction responsible for shifting gaze. This model, however, has yet to be tested functionally using either microstimulation or inactivation techniques.

Learning from mistakes: dopamine and prediction error

Recent studies by Platt and colleagues (McCoy et al. 2003) have demonstrated that the expected value of eye movements is rapidly updated in the primate brain when reward contingencies change. Figure 3a shows the pattern of choice behavior in the target choice task for one well-trained monkey as a function of time following the introduction of a novel set of reward contingencies. Each graph plots the frequency of choosing target 1 as a function of the relative value of that target (reward for target 1 divided by the sum of the reward values offered for targets 1 and 2) for data gathered with multiple different target values over several weeks of experimental sessions. The choice functions were well fit by cumulative normal functions, which became steeper over time and nearly reached asymptote after only about ten trials. Figure 3b plots reward sensitivity (Weber fraction) against time for two monkeys. These data indicate that the monkeys rapidly learned to choose the target associated with the greatest amount of reward.
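The analysis just described pairs two ingredients: a cumulative normal choice function over relative value, and a Weber fraction (standard deviation divided by mean of the fit) summarizing its steepness. The sketch below illustrates that arithmetic only; the parameter values are made up and are not the paper's fitted estimates.

```python
# Illustration of the cumulative-normal choice function and Weber
# fraction described in the text. The (mu, sigma) values are invented
# for demonstration, not fitted parameters from McCoy et al. (2003).

import math

def choice_probability(relative_value: float, mu: float, sigma: float) -> float:
    """Cumulative normal: P(choose target 1) given its relative value."""
    return 0.5 * (1.0 + math.erf((relative_value - mu) / (sigma * math.sqrt(2))))

def weber_fraction(mu: float, sigma: float) -> float:
    """Steepness summary of the psychometric function (sd / mean)."""
    return sigma / mu

# Early vs. late in a block: sigma shrinks, so the function steepens
# and the Weber fraction falls, i.e., reward sensitivity improves.
early = (0.5, 0.20)  # (mu, sigma) shortly after contingencies change
late = (0.5, 0.05)   # after roughly ten trials of experience
print(weber_fraction(*early), weber_fraction(*late))  # 0.4 0.1
```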

These data suggest that the brain continuously evaluates the reward (or punishment) outcome associated with movements and uses that information to update the expected value of those movements. Learning theorists have long argued that a comparison of reward expectation and reward outcome directly drives learning (Pearce and Hall 1980; Rescorla and Wagner 1972) and should therefore be encoded in the nervous system. The comparison of expected and actual reward is known as a reward prediction error (Sutton and Barto 1981), and is defined formally as the instantaneous discrepancy between the maximal associative strength sustained by the reinforcer and the current strength of the predictive stimulus (Schultz 2004).
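In its simplest delta-rule form, the comparison above is: error = reward received minus reward predicted, with the prediction nudged toward the outcome in proportion to the error. This is a generic textbook sketch of that rule, not code from any study cited here; the learning rate is an arbitrary choice.

```python
# Generic delta-rule sketch of a reward prediction error update.
# The learning rate of 0.1 is an arbitrary illustrative value.

def update_value(predicted: float, received: float,
                 learning_rate: float = 0.1):
    """Return (prediction error, updated prediction)."""
    delta = received - predicted          # reward prediction error
    return delta, predicted + learning_rate * delta

# Unpredicted reward -> positive error; omitted reward -> negative error.
print(update_value(0.0, 1.0))  # (1.0, 0.1)
print(update_value(1.0, 0.0))  # (-1.0, 0.9)
```

The sign and size of delta carry exactly the information the text attributes to a prediction error signal: its sign gives the direction of the update, its magnitude the amount of learning.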

Reward prediction error models of learning have been invoked to explain previously elusive behavioral phenomena such as blocking. In traditional associative

Fig. 3a, b Monkeys rapidly learn to choose the high value target. a Proportion of trials on which a single monkey chose target 1 as a function of its expected value. Individual graphs plot choice functions for sequential five-trial blocks following a change in reward contingencies favoring target 1. Red line indicates a cumulative normal function fit to the choice data. b Reward sensitivity over time. Graph plots the steepness of the psychometric function (Weber fraction), computed by dividing the standard deviation by the mean of the cumulative normal function fit to the choice data in each five-trial block, for two monkey subjects. After McCoy et al. (2003). Reprinted with permission from Neuron



models of learning (e.g., Skinner 1981; Thorndike 1898), any cue paired with reinforcement should be learned, but in blocking paradigms redundant cues paired with rewards fail to be learned (the cues are "blocked" from acquiring associative value). The typical blocking experiment introduces four stimuli, A, B, X, and Y, which are either paired with a reward or with nothing. In the first stage of the experiment, a subject learns that stimulus A is paired with a reward while stimulus B is not. Once this is learned, the same stimuli are subsequently paired with two novel stimuli (X and Y), and, in this second stage of the experiment, the joint stimuli AX and BY are both paired with rewards. If learning were merely associative, the subject would respond to both novel stimuli X and Y as if they predicted the reward. In fact, under these conditions subjects learn to associate Y, but not X, with reward. The explanation for this effect lies in the fact that because stimulus A fully predicted reward in stage 1, stimulus X was redundant and was therefore not highlighted for learning (Rescorla and Wagner 1972). Reward prediction error models, such as the Rescorla-Wagner model or temporal difference learning models (Sutton and Barto 1981), nicely account for these results because once the relationship between cue A and reward has been learned, the prediction error term goes to 0 and remains unchanged when the combined stimulus AX is followed by a reward in stage 2; hence, no new learning occurs for the redundant cue.
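The blocking account above can be reproduced with a minimal Rescorla-Wagner simulation (our illustration; the stimulus names follow the text, the learning rate and trial counts are arbitrary). The prediction for a compound is the sum of its cues' associative strengths, and every present cue is updated by the shared prediction error. Note that under this rule B, the nonpredictive cue, shares the stage 2 learning with Y, so Y asymptotes at half the reward value; the key contrast is that X acquires essentially nothing.

```python
# Minimal Rescorla-Wagner simulation of the blocking paradigm in the
# text (illustrative; alpha and trial counts are arbitrary choices).

def rescorla_wagner(trials, alpha=0.3):
    """trials: list of (present_cues, reward) pairs, cues as a string.
    Returns the associative strength V of each stimulus."""
    V = {s: 0.0 for s in "ABXY"}
    for cues, reward in trials:
        prediction = sum(V[c] for c in cues)  # compound = sum of cues
        delta = reward - prediction           # shared prediction error
        for c in cues:
            V[c] += alpha * delta             # every present cue updates
    return V

# Stage 1: A -> reward, B -> nothing. Stage 2: AX -> reward, BY -> reward.
stage1 = [("A", 1.0), ("B", 0.0)] * 50
stage2 = [("AX", 1.0), ("BY", 1.0)] * 50
V = rescorla_wagner(stage1 + stage2)
# A already predicts the reward in stage 2, so delta ~ 0 on AX trials
# and X is blocked; on BY trials delta is large, so Y gains strength.
print(round(V["X"], 2), round(V["Y"], 2))  # 0.0 0.5
```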

Over the past decade, Wolfram Schultz and colleagues have demonstrated that the activation of dopamine neurons in the midbrain represents, at least in part, an instantiation of a reward prediction error in neural circuitry (Schultz and Dickinson 2000). These conclusions rest on the observation that the activity of dopamine neurons in the substantia nigra and ventral tegmental area is elevated by the delivery of unpredicted rewards, unchanged following the delivery of predicted rewards, and depressed when expected rewards are withheld (Schultz 1998). Moreover, in blocking paradigms these dopamine neurons fail to respond to redundant cues (Waelti et al. 2001). The activity of these neurons has therefore been proposed to encode reward prediction error and thus to determine the direction and rate of learning: learning occurs when the error is positive and the reward is not predicted by the stimulus; no learning occurs when the error is 0 and the reward is fully predicted; forgetting or extinction occurs when the error is negative and the reward is less than predicted by the stimulus; and the rate of learning is determined by the absolute size of the error, whether positive or negative (Schultz and Dickinson 2000).

Recent evidence has called into question whether the signals carried by dopaminergic midbrain neurons are specific to rewards or can be generalized to other salient stimuli (Horvitz 2002). Horvitz and others have shown that putative dopamine neurons respond to a variety of novel or arousing events, including aversive as well as rewarding stimuli (Salamone 1994; Horvitz 2000; but see Mirenowicz and Schultz 1996). These observations

suggest that the responses of dopamine neurons might therefore signal surprising events in the environment regardless of whether they are rewarding or aversive (Horvitz 2002). Rather than encoding a reward prediction error per se, dopamine neurons might instead gate cortical and limbic inputs to the striatum following unpredicted events and thereby facilitate learning. The role of dopamine neurons in learning thus appears to be more complex than previously thought, and a deeper understanding will likely lie in further investigations of the response of dopamine neurons to all salient stimuli—good or bad.

Updating expectations: the role of posterior cingulate cortex

Up to this point, we have discussed the evidence that neurons in parietal cortex, prefrontal cortex, the basal ganglia, and the superior colliculus carry information about the expected value of the decision about where to look. Midbrain dopamine neurons, on the other hand, appear to play a role in processing reward-related information in the abstract—highlighting when expected and actual reward outcomes are in conflict. Such a signal would be useful for initiating the assignment of new estimates of the expected value of potential behaviors, and evidence increasingly suggests this to be the case (Montague and Berns 2002). But how are abstract prediction error signals generated by midbrain dopamine neurons linked to the selection and performance of specific actions, such as eye movements, that maximize reward?

With this question in mind, Platt and colleagues (McCoy et al. 2003) investigated the properties of neurons in posterior cingulate cortex (CGp), a poorly understood part of the limbic system thought to be related to eye movements (Olson et al. 1996), spatial localization (Sutherland et al. 1988; Harker and Whishaw 2002), and learning (Bussey et al. 1997; Gabriel et al. 1980; Gabriel and Sparenborg 1987). Many CGp neurons are activated following saccades, in contrast with the activation of saccade-related neurons in parietal and prefrontal cortex prior to movement (Olson et al. 1996). Such a pattern of activation is suggestive of an evaluative, rather than generative, role in oculomotor behavior, an idea first suggested by Vogt and colleagues (Vogt et al. 1992). Anatomically, CGp is interconnected with reward-related areas of the brain (see Fig. 1), including the anterior cingulate cortex (Morecraft et al. 1993), orbitofrontal cortex (Cavada et al. 2000), and the caudate nucleus (Baleydier and Mauguiere 1980), as well as oculomotor areas such as parietal cortex (Cavada and Goldman-Rakic 1989), prefrontal cortex (Vogt and Pandya 1987) and the supplementary eye fields. These connections provide potential sources for a functional linkage of motivational and oculomotor information within CGp.

To test the idea that CGp links motivational outcomes with eye movements, Platt and colleagues


recorded the activity of CGp neurons while monkeys shifted gaze to a single visual target for fruit juice rewards. In the first experiment, the size of the reward associated with visually-guided saccades was held constant for a block of 50 trials, and then varied between blocks, while the oculomotor behavior of the monkeys and the activity of CGp neurons were examined. The authors found that saccade metrics were sensitive to the size of reward associated with gaze shifts. Specifically, monkeys made faster, higher velocity saccades when expecting smaller rewards, a strategy consistent with reducing the delay to reinforcement and thereby increasing fluid intake rate in low reward blocks of trials. This observation is consistent with the idea that, even in the absence of an overt decision task, monkeys are sensitive to the expected value of gaze shifts.

In this study, the activity of CGp neurons was also systematically modulated by reward value, as illustrated by data from two example neurons in Fig. 4. Single neurons were sensitive to reward size following movement (Fig. 4a) as well as following reward delivery (Fig. 4b). These represented largely separate modulations, since the two events were separated in time by at least 500 ms. Across the population of studied neurons, approximately one-third were sensitive to reward size following movement and another third following the receipt of reward (Fig. 4c). These modulations by reward size were independent of any effects of saccade metrics, as demonstrated by the inclusion of saccade amplitude, latency, and peak velocity as independent factors in a multiple linear regression analysis of firing rate as a function of reward size. These results thus demonstrate that information about both the predicted and experienced reward value of a particular eye movement is carried by the activity of CGp neurons. Such modulation of neuronal activity by reward size cannot be accounted for solely by reafferent input from motor areas.
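The logic of the regression control just described, separating a reward-size effect from covarying saccade metrics, can be sketched on synthetic data (entirely fabricated for illustration; this is not the paper's dataset or analysis code). Firing rate is regressed on reward size alongside amplitude, latency, and peak velocity; a reward coefficient that survives the covariates is the analogue of the reported effect.

```python
# Sketch of the multiple-regression logic described in the text, run on
# synthetic data (all numbers invented for illustration).

import numpy as np

rng = np.random.default_rng(0)
n = 200
reward_size = rng.uniform(50, 200, n)   # solenoid open time, ms
amplitude = rng.normal(10, 1, n)        # deg
latency = rng.normal(180, 20, n)        # ms
velocity = rng.normal(400, 40, n)       # deg/s

# Simulated firing rate: depends on reward size but not on the metrics.
firing = 5.0 + 0.05 * reward_size + rng.normal(0, 1, n)

# Design matrix: intercept + reward size + the three saccade metrics.
X = np.column_stack([np.ones(n), reward_size, amplitude, latency, velocity])
beta, *_ = np.linalg.lstsq(X, firing, rcond=None)
print(beta[1])  # reward-size coefficient, recovered near the true 0.05
```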

In this experiment, the net effect of changes in reward value was to scale the gain of spatially-selective neuronal responses following movement. Sixty-two percent of studied neurons in CGp responded in a spatially selective manner following eye movement onset. Among these, neurons excited following a particular movement were further excited when the expected value of that movement was increased; similarly, neurons suppressed following a particular movement were further suppressed when the expected value of that movement was increased (Fig. 4d). Across the population, reward value thus tuned the spatial sensitivity of the CGp neuronal population to saccade direction. Improved spatial sensitivity under high reward conditions may be associated with the slower, more deliberate saccades made by monkeys under high reward conditions in this experiment.
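The gain-scaling pattern above, excited responses pushed higher and suppressed responses pushed lower as value rises, amounts to multiplying a signed directional response by a reward-dependent gain. The toy function below is our schematic of that relationship, with invented numbers; it is not a model fit from the study.

```python
# Schematic of reward gain-scaling of a signed directional response
# (illustrative parameters only, not fitted values).

def post_movement_rate(baseline: float, tuning: float, gain: float) -> float:
    """tuning > 0 for excited directions, < 0 for suppressed ones;
    gain grows with the expected value of the movement."""
    return baseline + gain * tuning

for gain in (1.0, 2.0):  # low vs. high expected value
    excited = post_movement_rate(20.0, 10.0, gain)
    suppressed = post_movement_rate(20.0, -10.0, gain)
    print(gain, excited, suppressed)  # the spread widens as gain rises
```

Because higher gain widens the gap between excited and suppressed responses, the same scaling that encodes value also sharpens the population's spatial sensitivity, matching the interpretation in the text.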

The timing of modulations in CGp activity by reward value suggests that this area encodes both the predicted and experienced value of a particular movement. If so, CGp might also be expected to carry signals related to an error in reward prediction, arising from a discrepancy between predicted and experienced value. This hypothesis was addressed in a second experiment in which reward size was held constant but delivered probabilistically while monkeys shifted gaze to a target fixed in the area of maximal response for the neuron under study. Correct trials were reinforced on a variable reward schedule of 0.8, meaning that, on average, 80% of correct trials were followed by an auditory noise burst and reward ("rewarded trials"), while the remaining 20% of correct trials were followed by an auditory noise burst only but no juice ("unrewarded trials"). The relatively low frequency of unrewarded trials allowed them to serve as catch trials for which a predicted reward was unexpectedly withheld.

Once again, the behavior of monkeys and the activity of CGp neurons were both sensitive to this manipulation of reward contingencies. After predicted rewards were withheld, monkeys made higher velocity saccades, similar to the effects of low reward on saccade metrics found in the previous experiment. Likewise, the activity of single CGp neurons was significantly different on rewarded and unrewarded trials (Fig. 5a, b). Across the population of cells studied in this experiment, firing rate following the usual time of reward delivery was significantly greater when expected rewards were omitted (Fig. 5c). These data demonstrate that CGp neurons faithfully report the omission of expected reward, suggestive of a prediction error-like signal for eye movements (Schultz and Dickinson 2000).

Intriguingly, many CGp neurons responded equivalently to the delivery of larger than average rewards and

Fig. 4a–d Representation of saccade value in posterior cingulate cortex. a Left panel, average firing rate (±SE) for a single CGp neuron plotted as a function of time on high reward (black curve) and low reward (grey curve) blocks of trials. Rasters indicate time of action potentials on individual trials. All trials are aligned on movement onset. Right panel, average firing rate (±SE) following movement onset (grey shaded region in left panel) plotted as a function of reward size, for the same neuron. Reward size is measured as the open time (ms) of a computer-driven solenoid and is linearly related to juice volume. b Left panel, average firing rate (±SE) for a second CGp neuron plotted as a function of time on high reward (black curve) and low reward (grey curve) blocks of trials. Rasters indicate time of action potentials on individual trials. All trials are aligned on reward offset. Right panel, average firing rate (±SE) following reward offset (grey shaded region in left panel) plotted as a function of reward size, for the same neuron. c Population data. Proportion of CGp neurons with significant modulation by reward size, peak saccade velocity, saccade amplitude, and saccade latency plotted as a function of time. Significant modulations: P < 0.05 for each individual factor in a multiple regression analysis of firing rate in each of ten sequential 200-ms epochs. The size of liquid reward delivered on correct trials was controlled linearly by the open time of a computer-driven solenoid (volume = 0.0026 + 0.001 × open time in ms). d Reward value scales the gain of CGp responses. The correlation coefficient between firing rate and reward size for each neuron was plotted as a function of movement response index, a logarithmically scaled measure of the degree to which each neuron was excited or inhibited after movement relative to fixation-level activity on mapping trials. Each dot represents data for one cell (n=67) and the best-fit line is shown in grey. Note: two outliers were removed from this analysis. After McCoy et al. (2003). Reprinted with permission from Neuron

c

206

the omission of predicted rewards (McCoy et al. 2003),unlike dopamine neurons which respond with a burst ofaction potentials following unpredicted rewards but aresuppressed following the omission of predicted rewards(Schultz et al. 1997). The reward modulation of neuro-

nal activity in CGp is therefore consistent with atten-tional theories of learning, which posit that rewardprediction errors highlight unpredicted stimuli asimportant for learning (Mackintosh 1975; Pearce andHall 1980). According to this idea, the absolute value of

207

the neuronal response correlates with the extent to whichthe reward event differed from expectation, whether in apositive or negative direction. While such a signal wouldnot contain information about ‘‘what’’ needs to belearned about the relationship between a stimulus andreward, such a signal would be useful for instructing‘‘when’’ and ‘‘how rapidly’’ learning should occur. Someof the saccade-related reward signals uncovered in CGpmay best be understood in terms of such an attentionallearning theory.
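The contrast between the signed dopamine-like signal and the unsigned, CGp-like signal described above can be sketched in a few lines. This is a toy illustration of the cited learning-theory ideas (Rescorla-Wagner/temporal-difference errors versus Pearce-Hall-style unsigned surprise), not the authors' analysis; the function names and numerical values are assumptions chosen for clarity.

```python
def signed_prediction_error(reward, prediction):
    """Dopamine-like signal: positive after an unpredicted reward,
    negative after omission of a predicted reward (Schultz et al. 1997)."""
    return reward - prediction

def unsigned_prediction_error(reward, prediction):
    """CGp-like signal under an attentional account: responds equivalently
    to surprisingly large rewards and omitted rewards (Pearce and Hall 1980)."""
    return abs(reward - prediction)

prediction = 0.5       # expected reward on this trial (illustrative)
large_reward = 1.0     # larger-than-average reward delivered
omitted_reward = 0.0   # predicted reward omitted

# The signed code distinguishes the two events by sign...
assert signed_prediction_error(large_reward, prediction) == 0.5
assert signed_prediction_error(omitted_reward, prediction) == -0.5

# ...while the unsigned code treats them identically, making it a candidate
# signal for setting "when" and "how rapidly" learning should occur.
assert unsigned_prediction_error(large_reward, prediction) == \
       unsigned_prediction_error(omitted_reward, prediction) == 0.5
```

On this view, the unsigned quantity would modulate learning rate (associability) rather than the direction of learning, which the signed error supplies.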

Taken together, the results of these experiments suggest that neurons in CGp link motivational outcomes with gaze shifts. Recent studies have reported that neurons in the supplementary eye fields, with which CGp is reciprocally connected, also respond to reward outcomes associated with eye movements (Amador et al. 2000; Stuphorn et al. 2000b). Neurons in other areas of cingulate cortex have also been implicated in guiding actions based on reward value. While studying timing behavior in primates, Niki and Watanabe (1979) first uncovered two classes of reward-error units in anterior cingulate cortex (CGa): one class responding after juice delivery, and the other responding following incorrect trials as well as following the omission of reward on correct trials. Single cells in CGa have also been found to carry signals related to reward expectation in a cued multi-trial color discrimination task (Shidara and Richmond 2002). In the rostral cingulate motor area (CMAr), Shima and Tanji (1998) found neurons that responded when a change in reward contingencies, but not a neutral cue, prompted monkeys to change their behavioral strategy in order to maximize their receipt of reward. Inactivation of CMAr by muscimol injection left monkeys insensitive to a reward decrement and apparently unable to modify their behavioral strategy to maximize reward. Taken together, these data are consistent with a direct role for cingulate cortex and the supplementary eye fields in linking motivational outcomes to action.

Neurons in the primate ventral striatum (Cromwell and Schultz 2003; Hassani et al. 2001; Schultz 2004) and orbitofrontal cortex (Rolls 2000; Tremblay and Schultz 1999) have also been shown to carry information about predicted and experienced rewards. These neurons do not appear to be directly related to the generation of action, but their responses otherwise bear striking resemblance to those in cingulate cortex. In particular, neurons in ventral striatum and orbitofrontal cortex respond during delay periods preceding reinforcement, as well as following reinforcement, and these responses reflect the subjective preferences of subjects for particular rewards (Tremblay and Schultz 1999). Thus, orbitofrontal cortex and ventral striatum appear to convert information about rewards, punishments, and their predictors into a common internal currency (Montague and Berns 2002). Neurons in the dorsal striatum and dorsolateral prefrontal cortex, on the other hand, show similar response properties but are activated in association with particular movements, much like neurons in CGp and the supplementary eye fields. In summary, the response properties of neurons in ventral striatum and orbitofrontal cortex suggest they provide a valuation scale for a broad range of stimuli (Montague and Berns 2002), while those in the dorsal striatum, dorsolateral prefrontal cortex, supplementary eye fields, and cingulate cortex may serve to associate this information with specific actions (Schultz 2004).
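The division of labor described above, a common-currency valuation stage feeding an action-linked stage, can be caricatured in code. This is a hypothetical sketch, not a model from the reviewed papers: the value numbers, action names, and the softmax choice rule are illustrative assumptions standing in for whatever read-out the brain actually uses.

```python
import math

# Stage 1 (ventral striatum / orbitofrontal cortex, schematically):
# heterogeneous outcomes mapped onto one common-currency scale.
subjective_value = {"juice": 1.0, "water": 0.4}   # illustrative values

# Stage 2 (dorsal striatum / DLPFC / SEF / CGp, schematically):
# those values attached to specific movements.
action_outcomes = {"look_left": "juice", "look_right": "water"}

def action_values(outcomes, values):
    """Attach common-currency values to specific actions."""
    return {action: values[outcome] for action, outcome in outcomes.items()}

def softmax_choice_probs(q, temperature=0.25):
    """Probability of each action under a softmax over action values."""
    exps = {a: math.exp(v / temperature) for a, v in q.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

q = action_values(action_outcomes, subjective_value)
probs = softmax_choice_probs(q)

# The higher-valued movement is chosen more often, but not exclusively.
assert probs["look_left"] > probs["look_right"]
assert abs(sum(probs.values()) - 1.0) < 1e-9
```

The softmax temperature here is a free parameter governing how deterministically value differences translate into choice; nothing in the reviewed physiology pins down its value.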

Fig. 5a, b Representation of saccade reward prediction error in posterior cingulate cortex. a Firing rate as a function of time for a single CGp neuron on rewarded (black curve) and unrewarded (grey curve) delayed saccade trials. On average, eight out of ten randomly selected correct trials were rewarded. Conventions as in Fig. 4. b Population response to reward omission: average (+SE) normalized firing rate measured after the normal time of reward delivery for the population of CGp neurons on rewarded and unrewarded trials. After McCoy et al. (2003). Reprinted with permission from Neuron


Decision-making in the human brain

Recent neuroimaging studies suggest that some of the same neurophysiological processes guiding oculomotor decision-making characterize more abstract representations of reward, punishment, and decision-making in humans. Specifically, the striatum, amygdala, orbitofrontal cortex, prefrontal cortex, anterior cingulate cortex, and parietal cortex are activated by rewards and reward-associated stimuli (Elliott et al. 2003). Moreover, hemodynamic responses are modulated by reward uncertainty in orbitofrontal cortex, ventral striatum (Berns et al. 2001; Critchley et al. 2001), and CGp (Smith et al. 2002), and are linearly correlated with expected value in orbitofrontal cortex (O'Doherty et al. 2001), ventral and dorsal striatum (Delgado et al. 2000, 2003), amygdala (Breiter et al. 2001), and premotor cortex (Elliott et al. 2003). Further, errors in predicting rewards or punishments evoke hemodynamic responses in anterior cingulate cortex (Holroyd et al. 2004), ventral striatum (Pagnoni et al. 2002; Seymour et al. 2004), CGp (Smith et al. 2002), and insula (Seymour et al. 2004), and error-related electrophysiological responses have been recorded from the medial frontal and anterior cingulate cortices with scalp electrodes in humans as well (Holroyd et al. 2003). These observations indicate that brain regions carrying value-related information are activated in a similar fashion in monkeys and humans performing disparate types of learning and decision-making tasks. Human neuroimaging studies, however, have not yet tested the hypothesis that pools of neurons coding different movements are activated in proportion to movement value.
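The two quantities these imaging studies manipulate, expected value and reward uncertainty, are easy to conflate, so a minimal worked example may help. This sketch uses the standard definitions (expected value as probability times magnitude; uncertainty as the variance of a Bernoulli reward); all numbers are illustrative assumptions, not values from the cited studies.

```python
def expected_value(p, magnitude):
    """Expected value of a reward of a given magnitude delivered with probability p."""
    return p * magnitude

def reward_variance(p, magnitude):
    """Variance of a Bernoulli (all-or-none) reward: maximal at p = 0.5."""
    return p * (1.0 - p) * magnitude ** 2

# A certain small reward and an unlikely large one can share the same EV...
assert expected_value(1.0, 0.2) == expected_value(0.2, 1.0)

# ...while uncertainty dissociates from EV, peaking at p = 0.5.
assert reward_variance(0.5, 1.0) > reward_variance(0.9, 1.0)
assert reward_variance(0.5, 1.0) > reward_variance(0.1, 1.0)
```

This dissociation is what lets imaging studies attribute some responses (e.g., in orbitofrontal cortex and CGp) to uncertainty and others to expected value per se.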

Conclusions

In summary, brain regions have been identified that appear to participate in several stages of oculomotor decision-making, from sensation, to reward expectation, to action, to outcome evaluation. Neurons in parietal cortex, prefrontal cortex, the basal ganglia, and the superior colliculus have been shown to encode in their firing rates the expected value of eye movements. Dopamine neurons in the midbrain, on the other hand, appear to signal an abstract reward prediction error, encoding in their firing rates the discrepancy between predicted and actual rewards. Because dopamine neurons terminate on glutamatergic inputs to the striatum from orbitofrontal cortex and amygdala (among other cortical and limbic structures), they are well situated to gate the flow of motivational information through the striatum and other basal ganglia structures, whose purpose is ultimately to produce adaptive behaviors. Neurons in the ventral striatum, orbitofrontal cortex, and cingulate cortices, in turn, appear to signal the relative value or salience of features in the environment that may be important for learning and controlling behavior. In particular, posterior cingulate and supplementary eye field neurons have recently been shown to carry signals related to both reward outcomes and oculomotor behavior, and may contribute to updating orienting value signals in parietal and prefrontal cortex. Activation of this complex network thus appears to underlie visually-guided behavior, in particular how animals choose where to look, and serves as a model for understanding behavioral decision-making more generally.

References

Amador N, Schlag-Rey M, Schlag J (2000) Reward-predicting and reward-detecting neuronal activity in the primate supplementary eye field. J Neurophysiol 84:2166–2170

Arnauld A, Nicole P (1662) The art of thinking: Port-Royal logic. Translated by Dickoff J, James P. Bobbs-Merrill, Indianapolis

Baleydier C, Mauguiere F (1980) The duality of the cingulate gyrus in monkey. Neuroanatomical study and functional hypothesis. Brain 103(3):525–554

Basso MA, Wurtz RH (1997) Modulation of neuronal activity by target uncertainty. Nature 389:66–69

Berns GS, McClure SM, Pagnoni G, Montague PR (2001) Predictability modulates human brain response to reward. J Neurosci 21:2793–2798

Breiter HC, Aharon I, Kahneman D, Dale A, Shizgal P (2001) Functional imaging of neural responses to expectancy and experience of monetary gains and losses. Neuron 30:619–639

Bussey TJ, Everitt BJ, Robbins TW (1997) Dissociable effects of cingulate and medial frontal cortex lesions on stimulus-reward learning using a novel Pavlovian autoshaping procedure for the rat: implications for the neurobiology of emotion. Behav Neurosci 111:908–919

Camerer CF (2003) Behavioral game theory: experiments in strategic interaction (Roundtable Series in Behavioral Economics). Princeton University Press, Princeton

Cavada C, Goldman-Rakic PS (1989) Posterior parietal cortex in rhesus monkey: I. Parcellation of areas based on distinctive limbic and sensory corticocortical connections. J Comp Neurol 287:393–421

Cavada C, Company T, Tejedor J, Cruz-Rizzolo RJ, Reinoso-Suarez F (2000) The anatomical connections of the macaque monkey orbitofrontal cortex. A review. Cereb Cortex 10:220–242

Coe B, Tomihara K, Matsuzawa M, Hikosaka O (2002) Visual and anticipatory bias in three cortical eye fields of the monkey during an adaptive decision-making task. J Neurosci 22(12):5081–5090

Colby CL, Duhamel JR, Goldberg ME (1996) Visual, presaccadic, and cognitive activation of single neurons in monkey lateral intraparietal area. J Neurophysiol 76:2841–2852

Critchley HD, Mathias CJ, Dolan RJ (2001) Neural activity in the human brain relating to uncertainty and arousal during anticipation. Neuron 29:537–545

Cromwell HC, Schultz W (2003) Effects of expectations for different reward magnitudes on neuronal activity in primate striatum. J Neurophysiol 29:29

Delgado MR, Nystrom LE, Fissell C, Noll DC, Fiez JA (2000) Tracking the hemodynamic responses to reward and punishment in the striatum. J Neurophysiol 84:3072–3077

Delgado MR, Locke HM, Stenger VA, Fiez JA (2003) Dorsal striatum responses to reward and punishment: effects of valence and magnitude manipulations. Cogn Affect Behav Neurosci 3:27–38

Dorris MC, Munoz DP (1998) Saccadic probability influences motor preparation signals and time to saccadic initiation. J Neurosci 18:7015–7026

Elliott R, Newman JL, Longe OA, Deakin JF (2003) Differential response patterns in the striatum and orbitofrontal cortex to financial reward in humans: a parametric functional magnetic resonance imaging study. J Neurosci 23(1):303–307

Gabriel M, Sparenborg S (1987) Posterior cingulate cortical lesions eliminate learning-related unit activity in the anterior cingulate cortex. Brain Res 409:151–157

Gabriel M, Orona E, Foster K, Lambert RW (1980) Cingulate cortical and anterior thalamic neuronal correlates of reversal learning in rabbits. J Comp Physiol Psychol 94:1087–1100

Glimcher P (2002) Decisions, decisions, decisions: choosing a biological science of choice. Neuron 36:323–332

Glimcher P (2003) Decisions, uncertainty, and the brain: the science of neuroeconomics. MIT Press, Cambridge

Gnadt JW, Andersen RA (1988) Memory related motor planning activity in posterior parietal cortex of macaque. Exp Brain Res 70:216–220

Gold JI, Shadlen MN (2002) Banburismus and the brain: decoding the relationship between sensory stimuli, decisions, and reward. Neuron 36:299–308

Goldberg ME, Colby CL, Duhamel JR (1990) Representation of visuomotor space in the parietal lobe of the monkey. Cold Spring Harb Symp Quant Biol 55:729–739

Gottlieb JP, Kusunoki M, Goldberg ME (1998) The representation of visual salience in monkey parietal cortex. Nature 391:481–484

Hanes DP, Schall JD (1996) Neural control of voluntary movement initiation. Science 274:427–430

Harker KT, Whishaw IQ (2002) Impaired spatial performance in rats with retrosplenial lesions: importance of the spatial problem and the rat strain in identifying lesion effects in a swimming pool. J Neurosci 22:1155–1164

Hassani OK, Cromwell HC, Schultz W (2001) Influence of expectation of different rewards on behavior-related neuronal activity in the striatum. J Neurophysiol 85:2477–2489

Herrnstein RJ (1961) Relative and absolute strength of response as a function of frequency of reinforcement. J Exp Anal Behav 4:267–272

Herrnstein RJ (1997) The matching law: papers in psychology and economics. Harvard University Press, Cambridge

Holroyd CB, Nieuwenhuis S, Yeung N, Cohen JD (2003) Errors in reward prediction are reflected in the event-related brain potential. Neuroreport 14:2481–2484

Holroyd CB, Nieuwenhuis S, Yeung N, Nystrom L, Mars RB, Coles MG, Cohen JD (2004) Dorsal anterior cingulate cortex shows fMRI response to internal and external error signals. Nat Neurosci 7:497–498

Horvitz JC (2000) Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience 96:651–656

Horvitz JC (2002) Dopamine gating of glutamatergic sensorimotor and incentive motivational input signals to the striatum. Behav Brain Res 137:65–74

Ikeda T, Hikosaka O (2003) Reward-dependent gain and bias of visual responses in primate superior colliculus. Neuron 39:693–700

Kawagoe R, Takikawa Y, Hikosaka O (1998) Expectation of reward modulates cognitive signals in the basal ganglia. Nat Neurosci 1:411–416

Kim JN, Shadlen MN (1999) Neural correlates of a decision in the dorsolateral prefrontal cortex of the macaque. Nat Neurosci 2:176–185

Lauwereyns J, Watanabe K, Coe B, Hikosaka O (2002) A neural correlate of response bias in monkey caudate nucleus. Nature 418:413–417

Leon MI, Shadlen MN (1999) Effect of expected reward magnitude on the response of neurons in the dorsolateral prefrontal cortex of the macaque. Neuron 24:415–425

Mackintosh NJ (1975) Blocking of conditioned suppression: role of the first compound trial. J Exp Psychol Anim Behav Process 1:335–345

McCoy AN, Crowley JC, Haghighian G, Dean HL, Platt ML (2003) Saccade reward signals in posterior cingulate cortex. Neuron 40:1031–1040

Mirenowicz J, Schultz W (1996) Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature 379:449–451

Montague PR, Berns GS (2002) Neural economics and the biological substrates of valuation. Neuron 36:265–284

Morecraft RJ, Geula C, Mesulam MM (1993) Architecture of connectivity within a cingulo-fronto-parietal neurocognitive network for directed attention. Arch Neurol 50:279–284

Niki H, Watanabe M (1979) Prefrontal and cingulate unit activity during timing behavior in the monkey. Brain Res 171:213–224

O'Doherty J, Kringelbach ML, Rolls ET, Hornak J, Andrews C (2001) Abstract reward and punishment representations in the human orbitofrontal cortex. Nat Neurosci 4:95–102

Olson CR, Musil SY, Goldberg ME (1996) Single neurons in posterior cingulate cortex of behaving macaque: eye movement signals. J Neurophysiol 76:3285–3300

Pagnoni G, Zink CF, Montague PR, Berns GS (2002) Activity in human ventral striatum locked to errors of reward prediction. Nat Neurosci 5:97–98

Pavlov IP (1927) Conditioned reflexes. Oxford University Press, London

Pearce JM, Hall G (1980) A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol Rev 87:532–552

Platt ML (2002) Neural correlates of decisions. Curr Opin Neurobiol 12(2):141–148

Platt ML, Glimcher PW (1997) Responses of intraparietal neurons to saccadic targets and visual distractors. J Neurophysiol 78:1574–1589

Platt ML, Glimcher PW (1999) Neural correlates of decision variables in parietal cortex. Nature 400:233–238

Platt ML, Lau B, Glimcher PW (2004) Situating the superior colliculus within the gaze control network. In: Hall WC, Moschovakis A (eds) The superior colliculus: new approaches for studying sensorimotor integration. CRC Press, New York

Rescorla RA, Wagner AR (1972) A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF (eds) Classical conditioning II: current research and theory. Appleton-Century-Crofts, New York

Roitman JD, Shadlen MN (2002) Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J Neurosci 22:9475–9489

Rolls ET (2000) The orbitofrontal cortex and reward. Cereb Cortex 10:284–294

Salamone JD (1994) The involvement of nucleus accumbens dopamine in appetitive and aversive motivation. Behav Brain Res 61:117–133

Sato M, Hikosaka O (2002) Role of primate substantia nigra pars reticulata in reward-oriented saccadic eye movement. J Neurosci 22:2363–2373

Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophysiol 80:1–27

Schultz W (2004) Neural coding of basic reward terms of animal learning theory, game theory, microeconomics and behavioural ecology. Curr Opin Neurobiol 14:139–147

Schultz W, Dickinson A (2000) Neuronal coding of prediction errors. Annu Rev Neurosci 23:473–500

Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599

Seymour B, O'Doherty JP, Dayan P, Koltzenburg M, Jones AK, Dolan RJ, Friston KJ, Frackowiak RS (2004) Temporal difference models describe higher-order learning in humans. Nature 429:664–667

Shadlen MN, Newsome WT (1996) Motion perception: seeing and deciding. Proc Natl Acad Sci U S A 93:628–633

Shadlen MN, Newsome WT (2001) Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. J Neurophysiol 86:1916–1936

Sherrington CS (1906) The integrative action of the nervous system. Scribner, New York

Shidara M, Richmond BJ (2002) Anterior cingulate: single neuronal signals related to degree of reward expectancy. Science 296:1709–1711

Shima K, Tanji J (1998) Role for cingulate motor area cells in voluntary movement selection based on reward. Science 282:1335–1338

Skinner BF (1981) Selection by consequences. Science 213:501–504

Smith K, Dickhaut J, McCabe K, Pardo JV (2002) Neuronal substrates for choice under ambiguity, risk, gains, and losses. Management Science 48:711–718

Speiser D (ed) (1982) The works of Daniel Bernoulli. Birkhauser, Boston

Stephens DW, Krebs JR (1986) Foraging theory. Princeton University Press, Princeton

Stuphorn V, Taylor TL, Schall JD (2000a) Performance monitoring by the supplementary eye field. Nature 408:857–860

Stuphorn V, Taylor TL, Schall JD (2000b) Performance monitoring by the supplementary eye field. Nature 408:857–860

Sutherland RJ, Whishaw IQ, Kolb B (1988) Contributions of cingulate cortex to two forms of spatial learning and memory. J Neurosci 8:1863–1872

Sutton RS, Barto AG (1981) Toward a modern theory of adaptive networks: expectation and prediction. Psychol Rev 88:135–170

Takikawa Y, Kawagoe R, Itoh H, Nakahara H, Hikosaka O (2002) Modulation of saccadic eye movements by predicted reward outcome. Exp Brain Res 142:284–291

Thorndike EL (1898) Animal intelligence: an experimental study of the associative processes in animals. Psychol Rev Monogr [Suppl 11]

Tremblay L, Schultz W (1999) Relative reward preference in primate orbitofrontal cortex. Nature 398:704–708

Vogt BA, Pandya DN (1987) Cingulate cortex of the rhesus monkey. II. Cortical afferents. J Comp Neurol 262:271–289

Vogt BA, Finch DM, Olson CR (1992) Functional heterogeneity in cingulate cortex: the anterior executive and posterior evaluative regions. Cereb Cortex 2:435–443