Supplemental information for "Risk-responsive orbitofrontal neurons track acquired salience"

Neuron, Volume 77 Supplemental Information Risk-Responsive Orbitofrontal Neurons Track Acquired Salience Masaaki Ogawa, Matthijs A.A. van der Meer, Guillem R. Esber, Domenic H. Cerri, Thomas A. Stalnaker, and Geoffrey Schoenbaum


Figure S1 (related to Figure 4)

Time course of variance in firing rates explained by the CS-US and CS-noUS regressors

(magenta line, the neurons in which variance in firing was explained significantly by

both regressors and the correlation coefficients for the regressors were the same (89

neurons); blue line, all task-responsive neurons). Additional variance explained was

plotted using a sliding window of 250 ms that was advanced in 50 ms time steps. Shading

represents s.e.m.

Figure S2 (related to Figure 4)

Numbers of the neurons exhibiting specific overall patterns of activity (left) or additional

variance explained by both the CS-noUS and CS-US or by both the risk and CS-US

regressors (light red or light green bar, respectively) averaged across all task-responsive

neurons in each group (right). Neurons were further categorized into three groups

depending on the results in the regression analysis shown in Fig 3b (left). Color code is

the same as that used in Fig 3b (left). “33-67-100-0”, for example, denotes a group of

neurons exhibiting ordering of average firing rate 33%>67%>100%>0% or

0%>100%>67%>33% pattern during the outcome anticipation period. If the average

firing rate was the same in anticipation of different probabilities of reward, then the count

was divided equally among all of the possible orderings. For example, if a cell exhibited a

33%>67%>100%=0% ordering pattern, then a count of 0.5 was added to each of 33-67-100-0

and 33-67-0-100, reflecting the two possible orderings (33%>67%>100%>0% and

33%>67%>0%>100%, respectively). Numbers of the neurons were significantly higher

(or lower) than expected by chance (shown with broken line; 23.5) (***p <0.001, * p <

0.05, chi-square test).
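
The tie-splitting rule described in this legend can be sketched as follows. This is an illustrative Python sketch, not the authors' analysis code: the names `group_label` and `ordering_counts`, the labeling convention, and the example firing rates are hypothetical. It assumes that an ordering and its exact reverse are pooled into one group, as in the “33-67-100-0” example, which yields 12 groups; with 282 task-responsive neurons this is consistent with the stated chance level of 23.5.

```python
from itertools import permutations
from collections import defaultdict

CUES = ("100", "67", "33", "0")   # reward probabilities (%)
PREF = ("33", "67", "100", "0")   # hypothetical convention for choosing a group label

def group_label(order):
    """Pool an ordering with its reverse and return one label for the pair."""
    rev = tuple(reversed(order))
    canonical = min(order, rev, key=lambda t: [PREF.index(c) for c in t])
    return "-".join(canonical)

def ordering_counts(rates):
    """Split one neuron's count equally among all orderings consistent with ties.

    rates: dict mapping cue label -> average firing rate.
    Returns a dict mapping group label -> fractional count (the values sum to 1).
    """
    # An ordering (highest rate first) is consistent if rates never increase along it.
    consistent = [p for p in permutations(CUES)
                  if all(rates[a] >= rates[b] for a, b in zip(p, p[1:]))]
    counts = defaultdict(float)
    for p in consistent:
        counts[group_label(p)] += 1.0 / len(consistent)
    return dict(counts)

# Hypothetical neuron with 33% > 67% > 100% = 0%
example = ordering_counts({"33": 10.0, "67": 8.0, "100": 5.0, "0": 5.0})
n_groups = len({group_label(p) for p in permutations(CUES)})
chance_level = 282 / n_groups
```

Here `example` assigns 0.5 to each of the two groups named in the legend, and `chance_level` evaluates to 23.5.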

Table S1. Behaviour was stable during recording (related to Figure 1)

Number of licks (average ± SEM)

                       100%           67%            33%            0%
First 1/3 sessions     23.3 ± 3.4     24.3 ± 3.3     27.7 ± 4.4     16.1 ± 3.5
Second 1/3 sessions    25.8 ± 2.2     30.1 ± 2.5     33.2 ± 3.8     19.8 ± 3.7
Last 1/3 sessions      24.3 ± 2.2     26.0 ± 2.3     27.4 ± 2.1     17.3 ± 3.9

Movement latency (sec, average ± SEM)

                       100%           67%            33%            0%
First 1/3 sessions     0.76 ± 0.05    0.75 ± 0.05    0.71 ± 0.05    1.65 ± 0.64
Second 1/3 sessions    0.68 ± 0.02    0.66 ± 0.04    0.65 ± 0.04    3.11 ± 2.08
Last 1/3 sessions      0.66 ± 0.03    0.63 ± 0.04    0.61 ± 0.04    0.88 ± 0.10

Two-factor ANOVAs revealed significant main effects of trial type

(number of licks, p < 0.001; movement latency, p < 0.05), but no significant

effects of session block (number of licks, p = 0.129; movement latency, p = 0.430), nor

any interactions between trial type and block (number of licks, p = 0.995; movement

latency, p = 0.569).

Table S2. Comparison of the acquired salience and the risk model using standard

model comparison metrics (related to Figure 3)

Metric                             Acquired salience   Risk      Common    A-S vs Risk (P value)   A-S vs Common (P value)
R-squared                          0.111               0.104     0.0457    3.53 × 10^-10           1.65 × 10^-47
Adjusted R-squared                 0.0912              0.0832    0.0325    3.63 × 10^-10           3.30 × 10^-33
AIC                                682                 685       697       2.06 × 10^-10           3.68 × 10^-17
Leave-one-out cross-validation     3.33                3.38      4.52      0.0126                  3.91 × 10^-8
Partial correlation coefficient    0.0439              0.0341              2.92 × 10^-10

Shown in the left three columns are mean values for each metric across all 282

task-responsive neurons. “A-S” and “Risk” denote the acquired salience model and the

risk model, respectively. “Common” denotes a model that includes only the common

regressors (i.e. response latencies and number of licks) between the two models. For

partial correlation coefficient, the effect of the addition of the CS-noUS or the risk for the

acquired salience or the risk model, respectively, is shown. Shown in the right two

columns are p values for each model comparison (Wilcoxon signed-rank test). “AIC”

denotes Akaike's Information Criterion. Note that a more accurate model has smaller values for

AIC and leave-one-out cross-validation, but larger values for the other metrics.

Supplemental Experimental Procedures

Behavioral task

Recording was conducted in aluminum chambers approximately 18” on each side with

sloping walls narrowing to an area of 12” x 12” at the bottom. A central odor port was

located above two adjacent fluid wells on a panel in the right wall of each chamber, one

of which was blocked and inoperative for the current experiment. Two lights were

located above the panel. The odor port was connected to an air flow dilution olfactometer

to allow the rapid delivery of olfactory cues. Task control was implemented via

computer. Port entry and licking were monitored by disruption of photobeams. Odors were

chosen from compounds obtained from International Flavors and Fragrances.

The basic design of a trial is illustrated in Figure 1. After illumination of the house light,

the rat could initiate a trial by nosepoking into the odor port, which resulted in delivery of

an odor cue for 0.5 s to a small hemicylinder located behind this opening. One of four

different odors (Auralva, Para-isopropyl hydratropic aldehyde, Camekol DH, or Verbena

oliffac) was delivered to the port on each trial, in a pseudorandom order. After odor

offset, rats were required to make a response at the fluid well within 100 s. In other

words, rats had to wait 100 s for the task to proceed if they did not respond at the fluid

well. As a result, rats learned to respond on nearly every trial.

Odors were associated with different probabilities of reward: 100%, 67%, 33%, and 0%.

In a rewarded trial, a sucrose reward (0.1 ml bolus of 10% sucrose solution) was

delivered 1 sec after the well entry. The house light was turned off when rats left the

fluid well after reward consumption. In a non-rewarded trial, the house light was turned

off one sec after the entry, in order to signal the end of the trial without reward

presentation. Importantly, rats did not necessarily need to wait for 1 s in the well; all that

was required for the task to proceed was a response at the well. Inter-trial interval (ITI)

was 3 seconds. Rats were well-trained (~4 weeks) prior to the start of recording, such that

they responded on all rewarded trials and > 99.0% of the trials in which the odor

associated with 0% reward was presented. In addition, these rats’ behavior was highly

stable during recording (see Table S1). During recording sessions, the rats waited for

1 sec in the fluid well on 41.2% ± 3.0% of the trials after presentation of the 0% odor. There

was no difference in movement latency between trials without early well exit (2.12 ± 0.74 sec)

and trials with early well exit (1.97 ± 0.60 sec; p = 0.123, Wilcoxon signed-rank test).

Single-unit recording

Drivable bundles of stereotrodes were screened for activity daily; if no activity was

detected, the rat was removed from the recording chamber, and the electrode assembly

was advanced 40 or 80 µm. Otherwise, active wires were selected to be recorded, a

session was conducted, and the electrode was advanced at the end of the session. Neural

activity was recorded using Multichannel Acquisition Processor systems (Plexon),

interfaced with the recording chamber. Signals from the electrode wires were amplified

20X by an op-amp headstage (Plexon, HST/8o50-G20-GR), located on the electrode

array. Immediately outside the training chamber, the signals were passed through a

differential pre-amplifier (Plexon, PBX2/16sp-r-G50/16fp-G50), where the single unit

signals were amplified 50X and filtered at 150-9000 Hz. The single unit signals were

then sent to the Multichannel Acquisition Processor box, where they were further filtered

at 250-8000 Hz, digitized at 40 kHz and amplified at 1-32 X. Waveforms (>2.5:1 signal-

to-noise) were extracted from active channels and recorded to disk by an associated

workstation with event timestamps from the behavior computer.

Details of the multiple regression analysis

To systematically explore the factors contributing to OFC firing during the outcome

anticipation period, we fit different regression models to the spike counts of each task-

responsive neuron (282 neurons), defined as those neurons that fired significantly more

or less during any task event (odor sampling [0.5 s after odor onset], reward-waiting [outcome

anticipation period, 1 s], or reward presentation or omission [1 s after reward or light

off]) than during the baseline period (1 s immediately preceding house light on; Wilcoxon

signed-rank test, p < 0.001). Models were then compared using standard model

comparison metrics averaged across neurons. The main analyses employed linear

regression models (Matlab “LinearModel.fit” function) fitted to raw spike counts. We

also fit linear regression models to log- or square-root transformed spike counts, and

found that the pattern of results was not affected by these choices.

The regression analysis addresses the question: does a model based on acquired salience,

or one based on risk, account for OFC activity better? We define acquired salience as the

sum of the CS-US and CS-noUS associative strengths. Without an a priori estimate of the

relative weighting of these two components, the regression model finds the weighting

that best fits the data, in a model of the form

y = β0 + β1x1 + β2x2 + β3x3 + β4(CS-US) + β5(CS-noUS) (1)

where y is a vector with the trial-by-trial spike counts of an individual neuron, x1-x3 are

“common” variables included in both the acquired salience and risk models (latency from

odor offset to unpoke from the odor port, latency from odor unpoke to well entry, number

of licks during the outcome anticipation period, respectively), and β1-β5 are found by the

model fitting procedure. Note that only the relative values used in the CS-US and CS-noUS

regressors matter. In other words, [0 0.33 0.67 0] and [0 1 2 0] for CS-noUS would

be equivalent in accounting for a neuron with average firing rates of [0 5 10 0] to the four

cues; β5 would simply be rescaled to account for the difference in the definition of the

regressor.
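
The rescaling argument can be checked numerically. The sketch below is a simplified illustration, not the analysis used in the paper: it drops the common regressors and the CS-US term and fits only an intercept and a single CS-noUS regressor. The helper `fit_line` is hypothetical, and the exact fractions 1/3 and 2/3 are assumed in place of the rounded 0.33 and 0.67.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

rates = [0.0, 5.0, 10.0, 0.0]           # average firing rates to the four cues
cs_nous_a = [0.0, 1 / 3, 2 / 3, 0.0]    # assumed exact form of [0 0.33 0.67 0]
cs_nous_b = [0.0, 1.0, 2.0, 0.0]        # same relative values, different scale

b0_a, b1_a = fit_line(cs_nous_a, rates)
b0_b, b1_b = fit_line(cs_nous_b, rates)

# Fitted values are identical; only the coefficient on the regressor changes scale.
fit_a = [b0_a + b1_a * x for x in cs_nous_a]
fit_b = [b0_b + b1_b * x for x in cs_nous_b]
```

Under these assumptions the two fits coincide, and the slope for the first definition is three times the slope for the second, which is the rescaling the text describes.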

The “acquired salience” model (1) above is pitted against the equivalent model for risk:

y = β0 + β1x1 + β2x2 + β3x3 + β4(CS-US) + β5σ² (2)

where σ² is risk: [0 1 1 0], y and x are as above, and the coefficients β are estimated by

the regression procedure.

Note that the “risk” model (2) contains a CS-US term, included in the model to prevent a

severe decrease in overall model performance relative to the acquired salience model,

which also has this term (1). This means that the “risk” model considers all possible

linear combinations of value and risk in accounting for neural activity. For completeness,

we also fit a “common” model that includes only the common regressors (i.e. x1, x2, and

x3) shared by the two models. Finally, we verified that the conclusions were unchanged when

these common terms were excluded.

We employed a range of model comparison metrics to determine whether the “acquired

salience” or “risk” model was more effective in accounting for the data. The most

straightforward measure is the coefficient of determination, R², which is the proportion of

variance in the data explained by a particular model. Because we are interested

specifically in the contribution of acquired salience versus risk plus CS-US, the main

analysis (Figure 3) reports a coefficient of partial determination, i.e. the additional

variance explained by adding either acquired salience or risk plus CS-US to a common

model that contains the three behavioral variables (x1- x3), adjusted for the number of

parameters (adjusted R²). However, the pattern of results was unchanged when the

adjusted R² of the full models was used.
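
As a rough illustration of these quantities, the sketch below computes R², the standard adjusted R², and the gain in adjusted R² of a full model over the common model. Treating "additional variance explained" as a difference of adjusted R² values is one plausible reading of the description above, not a confirmed detail of the authors' procedure; the function names and the numbers in the usage line are hypothetical.

```python
def r_squared(y, y_hat):
    """Proportion of variance in y explained by the predictions y_hat."""
    mean_y = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - sse / sst

def adjusted_r_squared(r2, n_obs, n_predictors):
    """Standard adjusted R-squared; n_predictors excludes the intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def additional_variance_explained(r2_full, p_full, r2_common, p_common, n_obs):
    """Assumed definition: adjusted R-squared of the full model minus the common model."""
    return (adjusted_r_squared(r2_full, n_obs, p_full)
            - adjusted_r_squared(r2_common, n_obs, p_common))

# Hypothetical neuron: 200 trials, full model with 5 predictors, common model with 3.
gain = additional_variance_explained(0.12, 5, 0.05, 3, 200)
```

The predictor counts in the usage line follow models (1) and (2), which each add two regressors to the three common ones.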

As described, R² and partial correlation measures are subject to overfitting, i.e. the tuning

of regression parameters to capture idiosyncrasies or noise in the data. Overfitted models

fail to capture regularities in the data that enable generalization to novel cases. Thus, a

more rigorous criterion for model comparison is cross-validation, which tests explicitly

how well the model performs on testing trials not used to fit the model. Specifically, we

performed leave-one-out cross-validation: for each neuron, we fit models on N-1 trials

and tested the fit on the trial left out, averaging the sum of squares error (SSE) across

iterations in which each trial 1…N was left out in turn. We also report Akaike’s Information

Criterion (AIC), which under certain assumptions is equivalent to leave-one-out cross-

validation. Thus, for each neuron, we obtained a number of model comparison metrics to

determine whether acquired salience or risk is more effective in accounting for OFC

neural activity. Distributions of these metrics across the population of neurons were

compared using the Wilcoxon signed-rank test.
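
The leave-one-out procedure can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the authors' Matlab code: it uses a single regressor plus intercept rather than the full models, the data are hypothetical, and `gaussian_aic` is the textbook least-squares form of AIC up to an additive constant, which may differ by a constant from the value a statistics package reports.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def loocv_error(x, y):
    """Mean squared prediction error over N folds, each leaving out one trial."""
    n = len(x)
    total = 0.0
    for i in range(n):
        # Fit on the N-1 remaining trials, then predict the held-out trial.
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total / n

def gaussian_aic(sse, n_obs, n_params):
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant."""
    return n_obs * math.log(sse / n_obs) + 2 * n_params

# Hypothetical spike counts and two candidate regressors.
counts = [1.0, 3.2, 4.9, 7.1, 9.0]
informative = [0.0, 1.0, 2.0, 3.0, 4.0]
uninformative = [1.0, 0.0, 1.0, 0.0, 1.0]
err_informative = loocv_error(informative, counts)
err_uninformative = loocv_error(uninformative, counts)
```

A smaller held-out error indicates the better model, matching the convention noted for Table S2.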

Formal description of the full Esber-Haselgrove model

Following the Pearce-Hall model (Pearce and Hall, 1980; Pearce et al., 1982), Esber and

Haselgrove (2011) assume that when a cue is probabilistically reinforced, occasional

pairing of the cue and the outcome will lead to the formation of an association between

the representations of these two events. Once the outcome becomes expected on the basis

of the cue, however, occasional omission of the outcome on intermixed non-reinforced

trials will encourage the formation of a second association, between the cue and a ‘no-outcome’ representation.

The net expectation of reinforcement is obtained from subtracting the strength of the CS-

noUS association from that of the CS-US association. If this difference is positive, the

cue will be a conditioned excitor; conversely, a negative difference will define a

conditioned inhibitor. Formally, increments in CS-US associative strength are calculated

using a modified version of the delta rule:

ΔCS-US+ = αβ(λ − (ΣCS-US − ΣCS-noUS))

where α and β are learning-rate parameters representing the salience of the cue and the

outcome, respectively, and the subindex “+” indicates an increment in associative strength.

λ is the maximum associative strength supported by the outcome. The difference ΣCS-US −

ΣCS-noUS designates the net expectation of reinforcement based on all presented cues.

When this difference surpasses the value of the outcome, λ, indicating that the outcome is

overexpected, the CS-noUS association will grow according to:

ΔCS-noUS+ = αβ((ΣCS-US − ΣCS-noUS) − λ)

Importantly, for the purpose of determining acquired salience (ε), both associative

strengths will combine to produce additive effects:

ε = f(CS-US + CS-noUS)

where f is a monotonically increasing function, and CS-US + CS-noUS is combined

associative strength. In this manner, the model explains why the acquired salience or

combined associative strength of a partially reinforced cue is greater than that of a

continuously reinforced one (ε = f(CS-US)), and even greater than that of a cue that has

not been paired with any relevant outcome (ε = 0).

A further consequence of probabilistic reinforcement according to the model is some

extinction of the CS-US and CS-noUS associations. Thus, nonreinforced trials will also

result in some extinction of the CS-US association according to:

ΔCS-US- = αβ(λ − (ΣCS-US − ΣCS-noUS))

whereas reinforced trials will extinguish to some extent the CS-noUS association

according to:

ΔCS-noUS- = αβ((ΣCS-US − ΣCS-noUS) − λ)

In both equations, the subindex “-” indicates a decrement in associative strength. The

relative contribution of incremental and decremental changes in associative strength, and

therefore the rates of acquisition and extinction of the two associations, is determined in

the model by the value assigned to the β parameter in each case. Esber and Haselgrove

noted that a good fit for predictiveness- and uncertainty-related attentional phenomena in

learning, such as blocking, overshadowing, and conditioned inhibition, was provided

when the β parameter for acquisition of at least the CS-US association was greater than

its corresponding β for extinction. This is consonant with behavioral evidence showing

that the rate of acquisition of an association is greater than the rate of extinction

(Rescorla, 2002).
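
The direction of these four associative changes on a single trial can be sketched as below. This is a simplified, hypothetical reading of the update rules for one cue presented alone, not an implementation of the full Esber-Haselgrove model. It assumes a fixed cue salience `alpha` (in the full model salience itself changes with learning), an outcome value of 0 on non-reinforced trials, strengths clipped at zero, and a choice of increment versus decrement based on the sign of the prediction error. The names `trial_update`, `BETAS`, and `lam` (the maximum associative strength supported by the outcome) are illustrative, and the outcome learning rates are the four values the text reports Esber and Haselgrove used. The sketch does not attempt to reproduce the asymptotic acquired-salience values quoted in this section.

```python
# Outcome learning rates for the four associative changes (values quoted in the text).
BETAS = {"us+": 0.05, "us-": 0.03, "nous+": 0.01, "nous-": 0.01}

def trial_update(v_us, v_nous, reinforced, alpha, betas=BETAS, lam=1.0):
    """Update CS-US and CS-noUS strengths for a single cue after one trial."""
    outcome = lam if reinforced else 0.0   # assumption: outcome value is 0 without reward
    net = v_us - v_nous                    # net expectation of reinforcement
    error = outcome - net
    if error > 0:
        # Outcome under-expected: CS-US grows and CS-noUS extinguishes.
        d_us = alpha * betas["us+"] * error
        d_nous = alpha * betas["nous-"] * (-error)
    else:
        # Outcome over-expected: CS-noUS grows and CS-US extinguishes.
        d_us = alpha * betas["us-"] * error
        d_nous = alpha * betas["nous+"] * (-error)
    # Assumption: associative strengths cannot fall below zero.
    return max(v_us + d_us, 0.0), max(v_nous + d_nous, 0.0)

# A reinforced trial followed by a non-reinforced trial, starting from zero strength.
after_reward = trial_update(0.0, 0.0, reinforced=True, alpha=0.5)
after_omission = trial_update(*after_reward, reinforced=False, alpha=0.5)
```

In this example the rewarded trial strengthens only the CS-US association, and the following omission slightly weakens it while starting to build the CS-noUS association.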

An exploration of the parameter space led Esber and Haselgrove to use the β values 0.05,

0.03, 0.01, and 0.01 for, respectively, associative changes ΔCS-US+, ΔCS-US-,

ΔCS-noUS+, and ΔCS-noUS-. These values were selected to provide the best possible fit for

benchmark psychological tasks, not for our task. Using the same values, the full Esber

and Haselgrove model predicts that, at asymptote, acquired salience is [1 1.3 1.6 0] for

[100% 67% 33% 0%] probability of reward. These values can also be obtained when w =

1.9 is applied in the reduced model (i.e. ε = f(CS-US + w(CS-noUS))). Indeed, in the

reduced model, the specific ordering (i.e. 33%>67%>100%>0%) is obtained if w >1,

which we actually observed across all task-responsive neurons (Fig 4b). A critical

prediction that falls out of such parameter selection is that at asymptote the combined

associative strength, and therefore the acquired salience, of the 33% rewarded cue should

be higher than that of 67% rewarded cue. This asymmetry results from the greater

asymptotic acquisition of CS-noUS association by the former cue.
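
The arithmetic behind the reduced model is easy to verify. The sketch below is illustrative only: it takes f to be the identity, assumes CS-US values of [1 0.67 0.33 0] proportional to reward probability, and uses the CS-noUS values [0 0.33 0.67 0] from the regression section. The function names are hypothetical.

```python
def acquired_salience(cs_us, cs_nous, w):
    """Reduced model with f as the identity: CS-US + w * CS-noUS for each cue."""
    return [u + w * n for u, n in zip(cs_us, cs_nous)]

def has_observed_ordering(s):
    """True if 33% > 67% > 100% > 0%, for input ordered as 100%, 67%, 33%, 0%."""
    return s[2] > s[1] > s[0] > s[3]

cs_us = [1.0, 0.67, 0.33, 0.0]     # assumed proportional to reward probability
cs_nous = [0.0, 0.33, 0.67, 0.0]   # zero for the 100% and 0% cues

salience = acquired_salience(cs_us, cs_nous, w=1.9)
rounded = [round(s, 1) for s in salience]   # [1.0, 1.3, 1.6, 0.0]

# The ordering holds for w > 1 and fails for w <= 1.
ordering_by_w = {w: has_observed_ordering(acquired_salience(cs_us, cs_nous, w))
                 for w in (0.5, 1.0, 1.1, 1.9)}
```

For a cue rewarded with probability p between 0 and 1, the combined strength is p + w(1 − p) = w + p(1 − w), which decreases with p exactly when w > 1; this is why the 33% cue exceeds the 67% cue, which in turn exceeds the 100% cue.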

Thus, the precise values of the CS-US and CS-noUS associations in the full Esber-

Haselgrove model arise from a complex set of interactions of parameters. The full model

was designed to account for dynamic phenomena on a wide variety of psychological

tasks. The reduced model (i.e. ε = CS-US + w(CS-noUS)) employed in the main

analysis reflects the fact that data were taken long after subjects reached stable

performance, and captures the pattern of acquired salience or combined associative

strength values at asymptote predicted by the full model.

Supplemental References

Rescorla, R.A. (2002). Comparison of the rates of associative change during acquisition

and extinction. Journal of Experimental Psychology: Animal Behavior Processes 28, 406-415.