Neuron, Volume 77
Supplemental Information
Risk-Responsive Orbitofrontal Neurons
Track Acquired Salience
Masaaki Ogawa, Matthijs A.A. van der Meer, Guillem R. Esber, Domenic H. Cerri, Thomas A. Stalnaker, and Geoffrey Schoenbaum
Figure S1 (related to Figure 4)
Time course of the variance in firing rates explained by the CS-US and CS-noUS regressors
(magenta line, the neurons in which variance in firing was explained significantly by
both regressors and the correlation coefficients for the regressors were the same (89
neurons); blue line, all task-responsive neurons). Additional variance explained was
plotted using a sliding window of 250 ms that was advanced in 50 ms time steps. Shading
represents s.e.m.
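The sliding-window scheme in this legend (250 ms windows advanced in 50 ms steps) can be sketched as follows. This is only an illustration of the windowing: the function names and the spike times are hypothetical, and the per-window regression that would produce the plotted values is not shown.

```python
def sliding_windows(t_start_ms, t_end_ms, width_ms=250, step_ms=50):
    """Return (start, end) pairs for windows lying fully inside the interval."""
    windows = []
    start = t_start_ms
    while start + width_ms <= t_end_ms:
        windows.append((start, start + width_ms))
        start += step_ms
    return windows


def windowed_counts(spike_times_ms, windows):
    """Count spikes falling in each half-open window [start, end)."""
    return [sum(1 for t in spike_times_ms if lo <= t < hi) for lo, hi in windows]


# Hypothetical spike times (ms) on a single trial, for illustration only.
spikes = [12, 180, 260, 420, 640, 910]
wins = sliding_windows(0, 1000)
counts = windowed_counts(spikes, wins)
```

Each entry of `counts` would then serve as the dependent variable for one time point of the curve, under the assumption that the regression is simply refit window by window.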
Figure S2 (related to Figure 4)
Numbers of neurons exhibiting specific overall patterns of activity (left) or additional
variance explained by both the CS-noUS and CS-US or by both the risk and CS-US
regressors (light red or light green bar, respectively) averaged across all task-responsive
neurons in each group (right). Neurons were further categorized into three groups
depending on the results of the regression analysis shown in Fig 3b (left). The color code is
the same as that used in Fig 3b (left). “33-67-100-0”, for example, denotes a group of
neurons exhibiting an average firing rate ordering of 33%>67%>100%>0% or
0%>100%>67%>33% during the outcome anticipation period. If the average
firing rate was the same in anticipation of different probabilities of reward, then the count
was divided equally among all of the possible orderings. For example, if a cell exhibits a
33%>67%>100%=0% ordering pattern, then a count of 0.5 was added to each of
33-67-100-0 and 33-67-0-100, considering the two possible orderings (33%>67%>100%>0% and
33%>67%>0%>100%, respectively). Asterisks indicate numbers of neurons significantly higher
(or lower) than expected by chance (shown with broken line; 23.5) (***p < 0.001, *p <
0.05, chi-square test).
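The counting rule described in this legend (ties split equally among the possible orderings, with an ordering and its mirror image pooled into one group) can be sketched as below. The function names and the convention for choosing which member of a mirror pair names the group are assumptions made for illustration, and the firing rates are made up.

```python
from collections import defaultdict
from itertools import permutations

CUES = ("100", "67", "33", "0")


def canonical(order):
    """Fold an ordering and its mirror image into one group label.

    Assumed convention: name the pair by the member in which the 0% cue
    sits in the second half, so 0>100>67>33 is filed under 33-67-100-0.
    """
    if order.index("0") < 2:
        order = tuple(reversed(order))
    return "-".join(order)


def ordering_counts(rates):
    """Split one neuron's count equally over all orderings it is consistent with."""
    consistent = [
        p for p in permutations(CUES)
        if all(rates[a] >= rates[b] for a, b in zip(p, p[1:]))
    ]
    share = 1.0 / len(consistent)
    counts = defaultdict(float)
    for p in consistent:
        counts[canonical(p)] += share
    return dict(counts)


# Made-up mean firing rates: 33% > 67% > 100% = 0%, then a mirror-image case.
tied = ordering_counts({"100": 5.0, "67": 8.0, "33": 10.0, "0": 5.0})
mirror = ordering_counts({"100": 8.0, "67": 5.0, "33": 2.0, "0": 10.0})
n_groups = len({canonical(p) for p in permutations(CUES)})
```

Pooling mirror images leaves 12 groups out of the 24 possible orderings. If the 282 task-responsive neurons were spread evenly over these groups, each would hold 282 / 12 = 23.5 neurons, which matches the chance level quoted in the legend; that this is how the chance level was derived is an inference, not something the legend states.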
Table S1. Behaviour was stable during recording (related to Figure 1)

Number of licks (average ± SEM)
                        100%           67%            33%            0%
First 1/3 sessions      23.3 ± 3.4     24.3 ± 3.3     27.7 ± 4.4     16.1 ± 3.5
Second 1/3 sessions     25.8 ± 2.2     30.1 ± 2.5     33.2 ± 3.8     19.8 ± 3.7
Last 1/3 sessions       24.3 ± 2.2     26.0 ± 2.3     27.4 ± 2.1     17.3 ± 3.9

Movement latency (sec, average ± SEM)
                        100%           67%            33%            0%
First 1/3 sessions      0.76 ± 0.05    0.75 ± 0.05    0.71 ± 0.05    1.65 ± 0.64
Second 1/3 sessions     0.68 ± 0.02    0.66 ± 0.04    0.65 ± 0.04    3.11 ± 2.08
Last 1/3 sessions       0.66 ± 0.03    0.63 ± 0.04    0.61 ± 0.04    0.88 ± 0.10
Two-factor ANOVAs revealed a significant main effect of trial type
(number of licks, p < 0.001; movement latency, p < 0.05), but no significant
effect of session block (number of licks, p = 0.129; movement latency, p = 0.430) and
no interaction between trial type and block (number of licks, p = 0.995; movement
latency, p = 0.569).
Table S2. Comparison of the acquired salience and the risk model using standard
model comparison metrics (related to Figure 3)

Metric                            Acquired salience   Risk      Common    A-S vs Risk (P value)   A-S vs Common (P value)
R-squared                         0.111               0.104     0.0457    3.53 × 10^-10           1.65 × 10^-47
Adjusted R-squared                0.0912              0.0832    0.0325    3.63 × 10^-10           3.30 × 10^-33
AIC                               682                 685       697       2.06 × 10^-10           3.68 × 10^-17
Leave-one-out cross-validation    3.33                3.38      4.52      0.0126                  3.91 × 10^-8
Partial correlation coefficient   0.0439              0.0341    -         2.92 × 10^-10           -
Shown in the left three columns are mean values of each metric across all 282
task-responsive neurons. “A-S” and “Risk” denote the acquired salience model and the
risk model, respectively. “Common” denotes a model that includes only the regressors
common to the two models (i.e. response latencies and number of licks). For the
partial correlation coefficient, the effect of adding the CS-noUS regressor to the
acquired salience model, or the risk regressor to the risk model, is shown. Shown in the right two
columns are p values for each model comparison (Wilcoxon signed-rank test). “AIC”
denotes Akaike's Information Criterion. Note that a more accurate model has smaller values for
AIC and leave-one-out cross-validation, but larger values for the other metrics.
Supplemental Experimental Procedures
Behavioral task
Recording was conducted in aluminum chambers approximately 18” on each side with
sloping walls narrowing to an area of 12” x 12” at the bottom. A central odor port was
located above two adjacent fluid wells on a panel in the right wall of each chamber, one
of which was blocked and inoperative for the current experiment. Two lights were
located above the panel. The odor port was connected to an air flow dilution olfactometer
to allow the rapid delivery of olfactory cues. Task control was implemented via
computer. Port entry and licking were monitored by disruption of photobeams. Odors were
chosen from compounds obtained from International Flavors and Fragrances.
The basic design of a trial is illustrated in Figure 1. After illumination of the house light,
the rat could initiate a trial by nosepoking into the odor port, which resulted in delivery of
an odor cue for 0.5 s to a small hemicylinder located behind this opening. One of four
different odors (Auralva, Para-isopropyl hydratropic aldehyde, Camekol DH, or Verbena
oliffac) was delivered to the port on each trial, in a pseudorandom order. After odor
offset, rats were required to make a response at the fluid well within 100 s. In other
words, rats had to wait 100 s for the task to proceed if they did not respond at the fluid
well. As a result, rats learned to respond on nearly every trial.
Odors were associated with different probabilities of reward: 100%, 67%, 33%, and 0%.
In a rewarded trial, a sucrose reward (0.1 ml bolus of 10% sucrose solution) was
delivered 1 sec after the well entry. The house light was turned off when rats left the
fluid well after reward consumption. In a non-rewarded trial, the house light was turned
off one sec after the entry, in order to signal the end of the trial without reward
presentation. Importantly, rats did not necessarily need to wait for 1 s in the well; all that
was required for the task to proceed was a response at the well. Inter-trial interval (ITI)
was 3 seconds. Rats were well-trained (~4 weeks) prior to the start of recording, such that
they responded on all rewarded trials and > 99.0% of the trials in which the odor
associated with 0% reward was presented. In addition, these rats’ behaviors were highly
stable during recording (see Table S1). During recording sessions, the rats waited for
1 sec in the fluid well on 41.2% ± 3.0% of the trials after presentation of the 0% odor. There
was no difference between movement latency on trials without early well exit (2.12 ± 0.74 sec) and that
on trials with early well exit (1.97 ± 0.60 sec; p = 0.123, Wilcoxon signed-rank test).
Single-unit recording
Drivable bundles of stereotrodes were screened for activity daily; if no activity was
detected, the rat was removed from the recording chamber, and the electrode assembly
was advanced 40 or 80 µm. Otherwise, active wires were selected to be recorded, a
session was conducted, and the electrode was advanced at the end of the session. Neural
activity was recorded using Multichannel Acquisition Processor systems (Plexon),
interfaced with the recording chamber. Signals from the electrode wires were amplified
20X by an op-amp headstage (Plexon, HST/8o50-G20-GR), located on the electrode
array. Immediately outside the training chamber, the signals were passed through a
differential pre-amplifier (Plexon, PBX2/16sp-r-G50/16fp-G50), where the single unit
signals were amplified 50X and filtered at 150-9000 Hz. The single unit signals were
then sent to the Multichannel Acquisition Processor box, where they were further filtered
at 250-8000 Hz, digitized at 40 kHz and amplified at 1-32 X. Waveforms (>2.5:1 signal-
to-noise) were extracted from active channels and recorded to disk by an associated
workstation with event timestamps from the behavior computer.
Details of the multiple regression analysis
To systematically explore the factors contributing to OFC firing during the outcome
anticipation period, we fit different regression models to the spike counts of each
task-responsive neuron (282 neurons), defined as those neurons that fired significantly more
or less in any task event (odor sampling (0.5 s after odor onset), reward-waiting (outcome
anticipation period, 1 s), or reward presentation (or omission) (1 s after reward or light
off)) than during the baseline period (1 s immediately preceding house light on) (Wilcoxon
signed-rank test, p < 0.001). Models were then compared using standard model
comparison metrics averaged across neurons. The main analyses employed linear
regression models (Matlab “LinearModel.fit” function) fitted to raw spike counts. We
also fit linear regression models to log- or square-root transformed spike counts, and
found that the pattern of results was not affected by these choices.
The regression analysis addresses the question: does a model based on acquired salience,
or one based on risk, account for OFC activity better? We define acquired salience as the
sum of the CS-US and CS-noUS associative strengths. Without an a priori estimate of the
relative weighting of these two components, the regression model finds the weighting
that best fits the data, in a model of the form
y = β0 + β1x1 + β2x2 + β3x3 + β4(CS-US) + β5(CS-noUS) (1)
where y is a vector with the trial-by-trial spike counts of an individual neuron, x1-x3 are
“common” variables included in both the acquired salience and risk models (latency from
odor offset to unpoke from the odor port, latency from odor unpoke to well entry, number
of licks during the outcome anticipation period, respectively), and β1-β5 are found by the
model fitting procedure. Note that only the relative values used in the CS-US and
CS-noUS regressors matter. In other words, [0 0.33 0.67 0] and [0 1 2 0] for CS-noUS would
be equivalent in accounting for a neuron with average firing rates of [0 5 10 0] to the four
cues; β5 would simply be rescaled to account for the difference in the definition of the
regressor.
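The point about regressor scaling can be checked numerically. The sketch below is a simplified illustration, not the fitting code used in the study: it uses NumPy least squares rather than Matlab, keeps only an intercept and the CS-noUS regressor, and uses exact thirds because the quoted 0.33 and 0.67 are rounded.

```python
import numpy as np

# Mean firing rates to the four cues, as in the example in the text.
y = np.array([0.0, 5.0, 10.0, 0.0])

# Two codings of the CS-noUS regressor that differ only by a scale factor.
cs_nous_a = np.array([0.0, 1.0, 2.0, 0.0]) / 3.0
cs_nous_b = np.array([0.0, 1.0, 2.0, 0.0])


def fit(regressor, y):
    """Ordinary least squares with an intercept; returns (coefficients, fitted)."""
    X = np.column_stack([np.ones_like(regressor), regressor])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs, X @ coefs


coef_a, fitted_a = fit(cs_nous_a, y)
coef_b, fitted_b = fit(cs_nous_b, y)
```

Both codings give identical fitted values; only the slope changes (15 for the first coding, 5 for the second), so any measure of variance explained is unaffected by the choice.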
The “acquired salience” model (1) above is pitted against the equivalent model for risk:
y = β0 + β1x1 + β2x2 + β3x3 + β4(CS-US) + β5σ² (2)
where σ² is risk: [0 1 1 0], y and x are as above, and the coefficients β are estimated by
the regression procedure.
Note that the “risk” model (2) contains a CS-US term, included in the model to prevent a
severe decrease in overall model performance relative to the acquired salience model,
which also has this term (1). This means that the “risk” model considers all possible
linear combinations of value and risk in accounting for neural activity. For completeness,
we also fit a “common” model that includes only the common regressors (i.e. x1, x2, and
x3) between the two models. Finally, we verified that the conclusions were the same by
the exclusion of these common terms.
We employed a range of model comparison metrics to determine whether the “acquired
salience” or “risk” model was more effective in accounting for the data. The most
straightforward measure is the coefficient of determination, R², which is the proportion of
variance in the data explained by a particular model. Because we are interested
specifically in the contribution of acquired salience versus risk plus CS-US, the main
analysis (Figure 3) reports a coefficient of partial determination, i.e. the additional
variance explained by adding either acquired salience or risk plus CS-US to a common
model that contains the three behavioral variables (x1-x3), adjusted for the number of
parameters (adjusted R²). However, the pattern of results was unchanged when the
adjusted R² of the full models was used.
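As a rough illustration of the comparison described above, the sketch below computes the gain in adjusted R-squared obtained by adding the model-specific regressors to the common model. Several things here are assumptions: the data are synthetic, the CS-US coding is taken to be the reward probability, the exact formula behind the reported coefficient of partial determination may differ, and NumPy least squares stands in for the Matlab routine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-cue regressor values for the [100% 67% 33% 0%] cues.
CS_US = np.array([1.0, 0.67, 0.33, 0.0])    # assumed: reward probability
CS_NOUS = np.array([0.0, 0.33, 0.67, 0.0])  # as quoted in the text
RISK = np.array([0.0, 1.0, 1.0, 0.0])       # as quoted in the text

# Synthetic trials for one hypothetical neuron (not real data).
n = 240
cue = rng.integers(0, 4, size=n)
common = rng.normal(size=(n, 3))  # stand-ins for x1, x2, x3
rate = 3.0 + 2.0 * CS_US[cue] + 4.0 * CS_NOUS[cue]
y = rng.poisson(rate).astype(float)


def design(*columns):
    """Stack an intercept, the common regressors, and any extra columns."""
    return np.column_stack([np.ones(n), common, *columns])


def adjusted_r2(X, y):
    """Adjusted R-squared of an OLS fit; X already contains the intercept."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    n_obs, n_par = X.shape
    return 1.0 - (sse / (n_obs - n_par)) / (sst / (n_obs - 1))


adj_common = adjusted_r2(design(), y)
gain_salience = adjusted_r2(design(CS_US[cue], CS_NOUS[cue]), y) - adj_common
gain_risk = adjusted_r2(design(CS_US[cue], RISK[cue]), y) - adj_common
```

Because both full models add the same number of regressors to the common model, `gain_salience` and `gain_risk` are directly comparable for a given neuron.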
As described, R² and partial correlation measures are subject to overfitting, i.e. the tuning
of regression parameters to capture idiosyncrasies or noise in the data. Overfitted models
fail to capture regularities in the data that enable generalization to novel cases. Thus, a
more rigorous criterion for model comparison is cross-validation, which tests explicitly
how well the model performs on testing trials not used to fit the model. Specifically, we
performed leave-one-out cross-validation: for each neuron, we fit models on N-1 trials
and tested the fit on the trial left out, averaging the sum of squares error (SSE) through
iterations over leaving out each trial 1…N in turn. We also report Akaike’s Information
Criterion (AIC), which under certain assumptions is equivalent to leave-one-out cross-
validation. Thus, for each neuron, we obtained a number of model comparison metrics to
determine whether acquired salience or risk is more effective in accounting for OFC
neural activity. Distributions of these metrics across the population of neurons were
compared using the Wilcoxon signed-rank test.
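The leave-one-out procedure and AIC can be sketched as follows. This is a minimal illustration on synthetic data, not the study's code: the error is taken as the mean squared prediction error over left-out trials, and the AIC expression is the standard Gaussian least-squares form up to an additive constant, which may differ by a constant from the value Matlab reports.

```python
import numpy as np


def loo_cv_error(X, y):
    """Mean squared prediction error over leave-one-out folds."""
    n = len(y)
    sq_err = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        sq_err[i] = (y[i] - X[i] @ beta) ** 2
    return sq_err.mean()


def loo_cv_error_hat(X, y):
    """Same quantity via the OLS identity e_i / (1 - h_ii), without refitting."""
    H = X @ np.linalg.pinv(X)
    resid = y - H @ y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)


def aic_gaussian(X, y):
    """AIC for a least-squares fit with Gaussian errors, up to a constant."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)
    return n * np.log(sse / n) + 2 * k


# Synthetic example (not real data): 60 trials, intercept plus two regressors.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(60), rng.normal(size=(60, 2))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=60)

cv_loop = loo_cv_error(X, y)
cv_hat = loo_cv_error_hat(X, y)
aic = aic_gaussian(X, y)
```

The explicit loop mirrors the description in the text; the hat-matrix shortcut gives the same number for ordinary least squares and is included only as a cross-check. For both metrics, smaller values indicate the better model.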
Formal description of the full Esber-Haselgrove model
Following the Pearce-Hall model (Pearce and Hall, 1980; Pearce et al., 1982), Esber and
Haselgrove (2011) assume that when a cue is probabilistically reinforced, occasional
pairing of the cue and the outcome will lead to the formation of an association between
the representations of these two events. Once the outcome becomes expected on the basis
of the cue, however, occasional omission of the outcome on intermixed non-reinforced
trials will encourage the formation of a second association, between the cue and a ‘no-outcome’ representation.
The net expectation of reinforcement is obtained from subtracting the strength of the CS-
noUS association from that of the CS-US association. If this difference is positive, the
cue will be a conditioned excitor; conversely, a negative difference will define a
conditioned inhibitor. Formally, increments in CS-US associative strength are calculated
using a modified version of the delta rule:
ΔCS-US+ = αβ(λ - (ΣCS-US – ΣCS-noUS))
where α and β are learning-rate parameters representing the salience of the cue and the
outcome, respectively, and the subindex “+” indicates an increment in associative strength.
λ is the maximum associative strength supported by the outcome. The difference ΣCS-US –
ΣCS-noUS designates the net expectation of reinforcement based on all presented cues.
When this difference surpasses the value of the outcome, λ, indicating that the outcome is
overexpected, the CS-noUS association will grow according to:
ΔCS-noUS+ = αβ((ΣCS-US – ΣCS-noUS) - λ)
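The two increment rules can be transcribed into a short function. This is a sketch under stated assumptions, not the full Esber-Haselgrove model: the function and argument names are hypothetical, the cue salience (alpha) is set to 1 purely for illustration, the maximum associative strength (lambda) is assumed to be 0 when the outcome is omitted, and the CS-US increment is applied only when the outcome is under-expected. The decrement terms are left out.

```python
def increments(v_us, v_nous, lam, alpha, beta_us_inc, beta_nous_inc):
    """Return (delta_cs_us, delta_cs_nous) increments for one trial.

    v_us, v_nous: summed CS-US and CS-noUS strengths of the presented cues.
    lam: maximum associative strength supported by the outcome on this trial
         (assumed to be 0 when the outcome is omitted).
    """
    net = v_us - v_nous  # net expectation of reinforcement
    d_us = 0.0
    d_nous = 0.0
    if net < lam:
        # Outcome under-expected: the CS-US association grows.
        d_us = alpha * beta_us_inc * (lam - net)
    elif net > lam:
        # Outcome over-expected: the CS-noUS association grows.
        d_nous = alpha * beta_nous_inc * (net - lam)
    return d_us, d_nous


# First reinforced trial with a novel cue (no prior associations).
first_reward = increments(0.0, 0.0, lam=1.0, alpha=1.0,
                          beta_us_inc=0.05, beta_nous_inc=0.01)

# Omission trial once the cue has acquired some CS-US strength.
omission = increments(0.5, 0.0, lam=0.0, alpha=1.0,
                      beta_us_inc=0.05, beta_nous_inc=0.01)
```

The outcome learning rates 0.05 and 0.01 are the increment values quoted later in this section for the CS-US and CS-noUS associations.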
Importantly, for the purpose of determining acquired salience, both associative
strengths will combine to produce additive effects:
acquired salience = f(CS-US + CS-noUS)
where f is a monotonically increasing function, and CS-US + CS-noUS is combined
associative strength. In this manner, the model explains why the acquired salience or
combined associative strength of a partially reinforced cue is greater than that of a
continuously reinforced one (acquired salience = f(CS-US)), and even greater than that of a cue that has
not been paired with any relevant outcome (acquired salience = 0).
A further consequence of probabilistic reinforcement according to the model is some
extinction of the CS-US and CS-noUS associations. Thus, nonreinforced trials will also
result in some extinction of the CS-US association according to:
ΔCS-US- = αβ(λ - (ΣCS-US – ΣCS-noUS))
whereas reinforced trials will extinguish to some extent the CS-noUS association
according to:
ΔCS-noUS- = αβ((ΣCS-US – ΣCS-noUS) - λ)
In both equations, the subindex “-” indicates a decrement in associative strength. The
relative contribution of incremental and decremental changes in associative strength, and
therefore the rates of acquisition and extinction of the two associations, is determined in
the model by the value assigned to the β parameter in each case. Esber and Haselgrove
noted that a good fit for predictiveness- and uncertainty-related attentional phenomena in
learning, such as blocking, overshadowing, and conditioned inhibition, was provided
when the β parameter for acquisition of at least the CS-US association was greater than
its corresponding β for extinction. This is consonant with behavioral evidence showing
that the rate of acquisition of an association is greater than the rate of extinction
(Rescorla, 2002).
An exploration of the parameter space led Esber and Haselgrove to use the values 0.05,
0.03, 0.01, and 0.01 for, respectively, associative changes ΔCS-US+, ΔCS-US-, ΔCS-
noUS+, and ΔCS-noUS-. These values were selected to provide the best possible fit for
benchmark psychological tasks, not for our task. Using the same values, the full Esber
and Haselgrove model predicts that, at asymptote, acquired salience is [1 1.3 1.6 0] for
[100% 67% 33% 0%] probability of reward. These values can also be obtained when w =
1.9 is applied in the reduced model (i.e. acquired salience = f(CS-US + w(CS-noUS))). Indeed, in the
reduced model, the specific ordering (i.e. 33%>67%>100%>0%) is obtained if w >1,
which we actually observed across all task-responsive neurons (Fig 4b). A critical
prediction that falls out of such parameter selection is that at asymptote the combined
associative strength, and therefore the acquired salience, of the 33% rewarded cue should
be higher than that of 67% rewarded cue. This asymmetry results from the greater
asymptotic acquisition of CS-noUS association by the former cue.
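The reduced model's predictions can be reproduced with a few lines of arithmetic. The sketch below assumes that CS-US equals the reward probability, that CS-noUS equals the omission probability for rewarded cues and is zero for the never-rewarded cue, and that f is the identity; these codings are inferred from the regressor values quoted earlier rather than stated as such.

```python
P_REWARD = {"100": 1.0, "67": 0.67, "33": 0.33, "0": 0.0}


def reduced_salience(w):
    """Acquired salience per cue as CS-US + w * CS-noUS (f taken as identity)."""
    out = {}
    for cue, p in P_REWARD.items():
        cs_us = p
        # Assumed: no CS-noUS strength for a cue that never predicts reward.
        cs_nous = 1.0 - p if p > 0 else 0.0
        out[cue] = cs_us + w * cs_nous
    return out


def ordering(salience):
    """Cue labels sorted from highest to lowest acquired salience."""
    return sorted(salience, key=salience.get, reverse=True)


s = reduced_salience(1.9)
rounded = {cue: round(v, 1) for cue, v in s.items()}
```

With w = 1.9 the rounded values are 1.0, 1.3, 1.6 and 0.0 for the 100%, 67%, 33% and 0% cues. For a rewarded cue the expression reduces to w + p(1 - w), which decreases with reward probability p exactly when w > 1, giving the 33%>67%>100%>0% ordering; with w < 1 (for example 0.5) the ordering of the rewarded cues reverses.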
Thus, the precise values of the CS-US and CS-noUS associations in the full Esber-
Haselgrove model arise from a complex set of interactions of parameters. The full model
was designed to account for dynamic phenomena on a wide variety of psychological
tasks. The reduced model (i.e. acquired salience = CS-US + w(CS-noUS)) employed in the main
analysis reflects the fact that data were taken long after subjects reached stable
performance, and captures the pattern of acquired salience or combined associative
strength values at asymptote predicted by the full model.
Supplemental References
Rescorla, R.A. (2002). Comparison of the rates of associative change during acquisition
and extinction. Journal of Experimental Psychology Animal Behavior Processes 28, 406-
415.