Testing the role of reward and punishment sensitivity in avoidance behavior: a computational...

18
Behavioural Brain Research 283 (2015) 121–138 Contents lists available at ScienceDirect Behavioural Brain Research jou rn al hom epage: www.elsevier.com/locate/bbr Research report Testing the role of reward and punishment sensitivity in avoidance behavior: A computational modeling approach Jony Sheynin a,b,c,, Ahmed A. Moustafa a,d , Kevin D. Beck a,b,c , Richard J. Servatius b,c,e , Catherine E. Myers a,b,c a Department of Veterans Affairs, New Jersey Health Care System, East Orange, NJ, USA b Joint Biomedical Engineering Program, New Jersey Institute of Technology and Graduate School of Biomedical Sciences, Rutgers, The State University of New Jersey, Newark, NJ, USA c Stress & Motivated Behavior Institute, New Jersey Medical School, Rutgers, The State University of New Jersey, Newark, NJ, USA d Marcs Institute for Brain and Behaviour & School of Social Sciences and Psychology, University of Western Sydney, Sydney, Australia e Department of Veterans Affairs, Veterans Affairs Medical Center, Syracuse, NY, USA h i g h l i g h t s A reinforcement-learning model successfully simulated human avoidance behavior. Distinct reward and punishment sensitivity ratio might underlie sex differences. Distinct punishment sensitivity might underlie inhibited temperament differences. Attenuating effect of safety-signals is due to the competing approach response. Safety-signals might be used in cognitive-behavior therapies to reduce avoidance. a r t i c l e i n f o Article history: Received 9 September 2014 Received in revised form 12 January 2015 Accepted 20 January 2015 Available online 29 January 2015 Keywords: Avoidance Computational model Reinforcement learning Anxiety vulnerability Individual difference Safety-signal a b s t r a c t Exaggerated avoidance behavior is a predominant symptom in all anxiety disorders and its degree often parallels the development and persistence of these conditions. Both human and non-human animal stud- ies suggest that individual differences as well as various contextual cues may impact avoidance behavior. Specifically, we have recently shown that female sex and inhibited temperament, two anxiety vulnera- bility factors, are associated with greater duration and rate of the avoidance behavior, as demonstrated on a computer-based task closely related to common rodent avoidance paradigms. We have also demon- strated that avoidance is attenuated by the administration of explicit visual signals during “non-threat” periods (i.e., safety signals). Here, we use a reinforcement-learning network model to investigate the underlying mechanisms of these empirical findings, with a special focus on distinct reward and punish- ment sensitivities. Model simulations suggest that sex and inhibited temperament are associated with specific aspects of these sensitivities. Specifically, differences in relative sensitivity to reward and punish- ment might underlie the longer avoidance duration demonstrated by females, whereas higher sensitivity to punishment might underlie the higher avoidance rate demonstrated by inhibited individuals. Simu- lations also suggest that safety signals attenuate avoidance behavior by strengthening the competing approach response. Lastly, several predictions generated by the model suggest that extinction-based cognitive-behavioral therapies might benefit from the use of safety signals, especially if given to individ- uals with high reward sensitivity and during longer safe periods. Overall, this study is the first to suggest cognitive mechanisms underlying the greater avoidance behavior observed in healthy individuals with different anxiety vulnerabilities. © 2015 Elsevier B.V. All rights reserved. Corresponding author. Current address: Department of Psychiatry, University of Michigan, 4250 Plymouth Rd, Ann Arbor, MI 48109, USA. E-mail addresses: [email protected], [email protected] (J. Sheynin), [email protected] (A.A. Moustafa), [email protected] (K.D. Beck), [email protected] (R.J. Servatius), [email protected] (C.E. Myers). http://dx.doi.org/10.1016/j.bbr.2015.01.033 0166-4328/© 2015 Elsevier B.V. All rights reserved.

Transcript of Testing the role of reward and punishment sensitivity in avoidance behavior: a computational...

R

Tb

JCa

b

Tc

d

e

h

•••••

a

ARRAA

KACRAIS

r

h0

Behavioural Brain Research 283 (2015) 121–138

Contents lists available at ScienceDirect

Behavioural Brain Research

jou rn al hom epage: www.elsev ier .com/ locate /bbr

esearch report

esting the role of reward and punishment sensitivity in avoidanceehavior: A computational modeling approach

ony Sheynina,b,c,∗, Ahmed A. Moustafaa,d, Kevin D. Becka,b,c, Richard J. Servatiusb,c,e,atherine E. Myersa,b,c

Department of Veterans Affairs, New Jersey Health Care System, East Orange, NJ, USAJoint Biomedical Engineering Program, New Jersey Institute of Technology and Graduate School of Biomedical Sciences, Rutgers,he State University of New Jersey, Newark, NJ, USAStress & Motivated Behavior Institute, New Jersey Medical School, Rutgers, The State University of New Jersey, Newark, NJ, USAMarcs Institute for Brain and Behaviour & School of Social Sciences and Psychology, University of Western Sydney, Sydney, AustraliaDepartment of Veterans Affairs, Veterans Affairs Medical Center, Syracuse, NY, USA

i g h l i g h t s

A reinforcement-learning model successfully simulated human avoidance behavior.Distinct reward and punishment sensitivity ratio might underlie sex differences.Distinct punishment sensitivity might underlie inhibited temperament differences.Attenuating effect of safety-signals is due to the competing approach response.Safety-signals might be used in cognitive-behavior therapies to reduce avoidance.

r t i c l e i n f o

rticle history:eceived 9 September 2014eceived in revised form 12 January 2015ccepted 20 January 2015vailable online 29 January 2015

eywords:voidanceomputational modeleinforcement learningnxiety vulnerability

ndividual differenceafety-signal

a b s t r a c t

Exaggerated avoidance behavior is a predominant symptom in all anxiety disorders and its degree oftenparallels the development and persistence of these conditions. Both human and non-human animal stud-ies suggest that individual differences as well as various contextual cues may impact avoidance behavior.Specifically, we have recently shown that female sex and inhibited temperament, two anxiety vulnera-bility factors, are associated with greater duration and rate of the avoidance behavior, as demonstratedon a computer-based task closely related to common rodent avoidance paradigms. We have also demon-strated that avoidance is attenuated by the administration of explicit visual signals during “non-threat”periods (i.e., safety signals). Here, we use a reinforcement-learning network model to investigate theunderlying mechanisms of these empirical findings, with a special focus on distinct reward and punish-ment sensitivities. Model simulations suggest that sex and inhibited temperament are associated withspecific aspects of these sensitivities. Specifically, differences in relative sensitivity to reward and punish-ment might underlie the longer avoidance duration demonstrated by females, whereas higher sensitivityto punishment might underlie the higher avoidance rate demonstrated by inhibited individuals. Simu-lations also suggest that safety signals attenuate avoidance behavior by strengthening the competing

approach response. Lastly, several predictions generated by the model suggest that extinction-basedcognitive-behavioral therapies might benefit from the use of safety signals, especially if given to individ-uals with high reward sensitivity and during longer safe periods. Overall, this study is the first to suggestcognitive mechanisms underlying the greater avoidance behavior observed in healthy individuals withdifferent anxiety vulnerabilities.

© 2015 Elsevier B.V. All rights reserved.

∗ Corresponding author. Current address: Department of Psychiatry, University of MichE-mail addresses: [email protected], [email protected] (J. Sheynin), A.M

[email protected] (R.J. Servatius), [email protected] (C.E. Myers).

ttp://dx.doi.org/10.1016/j.bbr.2015.01.033166-4328/© 2015 Elsevier B.V. All rights reserved.

igan, 4250 Plymouth Rd, Ann Arbor, MI 48109, [email protected] (A.A. Moustafa), [email protected] (K.D. Beck),

1 Brain R

1

aapppTrpatrde

1

hsasfi(pathroacieaafd

a(lmctacbaasObvWimitstwt

22 J. Sheynin et al. / Behavioural

. Introduction

Avoidance is defined as a behavior that causes the omission ofversive events. Avoidance behavior in response to a cue signalingn upcoming aversive event is usually adaptive and serves torotect one from harm, but exaggerated avoidance behavior is aredominant symptom in all anxiety disorders [1], and its severityarallels the development and persistence of these disorders [2–6].o date, the literature on avoidance behavior is based mainly onodent studies where neutral signals (warning signals; e.g., tones)redict the occurrence of aversive events (e.g., electric shocks),nd the animal learns a predetermined response (e.g., lever-press)o overcome these events. Responding during the aversive eventesults in its termination (escape response; ER), while respondinguring the warning signal prevents the occurrence of the aversivevent (avoidance response; AR).

.1. Empirical work in human subjects

Some attempts to operationalize human avoidance behaviorave used an operant fear-conditioning framework in which theubject makes responses to avoid mild aversive events (“unpleas-nt but bearable” electric shocks; e.g., [7,8]). However, since suchtimuli are by definition not highly aversive, the generality of thendings is limited; on the other hand, the use of highly aversivee.g., painful and distressing) stimuli would have serious ethical andractical constraints. Studies that attempt to address how humansvoid truly painful and/or distressing stimuli have instead tendedo rely on self-report questionnaires, which ask subjects to reportow often they manifest different types of avoidance behaviors inesponse to real-world stimuli and events (e.g., [9,10]). Another linef research employs computer-based tasks to examine avoidance ofversive feedback (e.g., point loss). In these paradigms, the subjectontrols a spaceship, attempts to gain reward (point gain) by shoot-ng at enemy spaceships, and learns to avoid aversive on-screenvents (point loss). These tasks have been successfully shown tossess different aspects of avoidance behavior, such as passive andctive avoidance ([11,12], respectively), effects of different rein-orcement contingencies and contextual variables [13], as well asiscrimination learning and latent inhibition [14].

We have recently extended one of these tasks (by [12]) to test thecquisition of escape-avoidance behavior in healthy young adultsFig. 1; [15]). Briefly, in this task participants controlled a spaceshipocated at the bottom of the screen and were instructed to maxi-

ize their score. Participants could learn that a reward (one point)ould be obtained by shooting and destroying an enemy spaceshiphat was moving on the screen. Every 20 s, a signal (a colored rect-ngle at the top of the screen) appeared for 5 s. Depending on itsolor, the signal could be a warning signal (W+) that was followedy an aversive event, or a control signal (W−) that was not associ-ted with any event. The aversive event was a bomb that appearedt the center of the screen for 5 s, during which the participant’spaceship was exploded and a maximum of 30 points could be lost.n warning trials, W+ appeared (warning period), followed by theomb period, which in turn was followed by a 10-s intertrial inter-al (ITI) during which no signal/bomb occurred. On control trials,− appeared (control period), followed by a longer 15-s ITI. Partic-

pants could learn to protect themselves from the aversive event byoving their spaceship to a specific “safe area” on the screen (“hid-

ng”). However, while in the safe area, it was impossible to shoothe enemy spaceship and obtain reward. Subjects who entered the

afe area during the warning period and remained there throughouthe bomb period avoided all point loss on that trial (AR); subjectsho entered the safe area after the bomb period began were able

o escape that point loss (ER). At the beginning of the experiment,

esearch 283 (2015) 121–138

participants were given 1 min of practice time, during which theycould shoot the enemy spaceship but no signal or bomb appeared.

Sheynin et al. [15] used several variables to describe theescape-avoidance behavior on the computer-based task. First, hid-ing duration indicated the percentage of time spent hiding duringthe warning period, the control period and the bomb period. Hid-ing during the bomb period represented an ER and terminatedpoint loss. Hiding during the warning period represented avoidancebehavior and could completely prevent any point loss; if the partic-ipant emerged from hiding before the end of the bomb period, pointloss resumed and response was not recorded as an AR. In addition,Sheynin et al. defined two variables to describe specific aspects ofavoidance: AR rate – percentage of acquisition trials on which an ARwas made and AR duration – percentage of the warning period dur-ing which the participant’s spaceship was hidden, averaged acrosstrials where an AR was made. Longer AR duration indicated thata participant made a response earlier during the warning periodand remained hiding longer overall on that trial. In Sheynin et al.’s[15] initial study with the spaceship task, the vast majority of theparticipants learned the ER, while most of them also learned tocompletely avoid point loss by performing an AR. This pattern isconsistent with what is generally reported in the rodent literatureon avoidance learning (e.g., [16]).

In addition to providing a framework to operationalize humanavoidance behavior, Sheynin et al. [15] tested associations of avoid-ance behavior with individual differences and specifically, thosethat confer anxiety vulnerability. A large animal literature hasdemonstrated the effect of strain and sex on active avoidancebehavior in rodents. Specifically, female sex and inhibited temper-ament (i.e., behavioral inhibition in response to novel or aversivestimuli) have been associated with greater avoidance behavior inrodents (e.g., [16,17]). Since both female sex and inhibited tem-perament are vulnerability factors for anxiety disorders ([18,19],respectively), these observations suggested that greater avoid-ance behavior might mediate vulnerability to anxiety disorders inhumans. Indeed, by using the described spaceship avoidance task,Sheynin et al. have found the same facilitated AR pattern in vulner-able young adults. Interestingly, Sheynin et al. [15] also reported adouble dissociation of sex and temperament. Specifically, althoughmales and females showed similar AR rate, females had longer ARduration, meaning they tended to spent more of the warning periodhiding in the safe areas. On the other hand, inhibited participantshad higher AR rate than uninhibited participants, with no differ-ence in AR duration. Together, these findings suggested differentialvulnerability pathways associated with sex and temperament.

As a follow-up study, Sheynin et al. [20] extended the space-ship task to eliminate control trials and to include an extinctionphase, where W+ was not followed by an aversive event (bomband point loss). Importantly, impaired extinction learning charac-terizes anxiety disorders, as well as post-traumatic stress disorder,and is reflected in patients’ tendency to keep emitting ARs, althoughaversive outcomes no longer occur [21]. Results from the acquisi-tion phase on the spaceship task were similar to those of the priorstudy [15], in that females showed longer AR duration than males;females were also slower to extinguish the avoidance behavior thanmales (shown by longer hiding duration during the warning periodon extinction trials), an effect parallel to the delayed avoidanceextinction in animal models of anxiety vulnerability [17].

Sheynin et al. [20] also used the spaceship task to explore theeffect of safety signals (SSs; signals associated with non-threatperiods), which were shown to modulate avoidance behavior inrodents (e.g., [22–24]; for review, see [20]). Participants were

divided into two groups given different versions of the spaceshiptask – “with-SS” and “without-SS”. Participants in the “with-SS”group were administered an SS during the ITI on acquisition tri-als; the SS took the form of two background lights at the two

J. Sheynin et al. / Behavioural Brain Research 283 (2015) 121–138 123

Fig. 1. Computer-based escape-avoidance task (adapted from [15] (A–D); [20] (E)). (A) Participants controlled a spaceship located at the bottom of the screen and wereinstructed to maximize their score. They could learn that a reward (one point) could be obtained by shooting and destroying an enemy spaceship that was moving on thescreen. (B) In the original study [15], every 20 s, a signal (a colored rectangle at the top of the screen) appeared for 5 s. Depending on its color, the signal could be a warningsignal (W+) or a control signal (W−). (C) W+ was always followed by appearance of a bomb, which remained onscreen for another 5 s (bomb period), during which a maximumof 30 points could be lost. (D) Participants could learn to protect themselves from this aversive event by moving their spaceship to a specific “safe area” on the screen; movingthere during the bomb period terminated the point loss (ER), while moving during the warning period could completely prevent any point loss (AR). Importantly, while inthe safe area, it was impossible to shoot the enemy spaceship and obtain reward. (E) Sheynin et al. [20] modified this task by eliminating control trials and adding a safetys n in

( rred to

usieo

1

uBbawiii

ignal (yellow lights) which appeared during the ITI on acquisition trials. Labels showFor interpretation of the references to color in this figure legend, the reader is refe

pper corners of the screen (Fig. 1E). Results showed that suchignal administration facilitated the extinction of avoidance behav-or (shown by decreased hiding during the warning period onxtinction trials); this was a main effect that occurred regardlessf participant’s sex or inhibited temperament.

.2. Computational modeling approach and the current work

One approach for investigating the mechanisms that mightnderlie these behavioral outcomes is computational modeling.y developing computational models that simulate the observedehavior, researchers can shed light on previous findings, as wells generate new hypotheses that would drive future empirical

ork. Traditionally, computational models of avoidance learn-

ng have used reinforcement learning algorithms (RL; [25–30]),n which an “agent” (model) learns by trial and error to max-mize reward and minimize punishment [31]. Specifically, if a

white text are for illustration only and did not appear on the screen during the task. the web version of the article.)

response to a stimulus results in a reward (or better-than-expectedoutcome), the probability of repeating this response in the futureis increased. Alternatively, if a response results in a punishment(or worse-than-expected outcome), the probability is decreased. Acommon computational architecture that describes RL is the actor-critic model [31]. In this model, the “critic” assesses the valueof a given state and computes a prediction error (PE), based onthe difference between actual and expected outcomes. The “actor”uses this PE to optimize the agent’s behavior and maximize thetotal reward. Normally, RL models include different free parame-ters, which correspond to specific computational processes in themodel and might describe how learning patterns vary across sub-jects. Parameters include learning rates, exploitation/exploration

bias (the tendency to repeat previously reinforced responses versusexplore the effect of new ones; sometimes called “inverse temper-ature”) and sensitivities to the different possible outcomes of themodel (e.g., reward and punishment).

124 J. Sheynin et al. / Behavioural Brain Research 283 (2015) 121–138

Fig. 2. Decision process representation of a trial during the acquisition phase of the computer-based avoidance paradigm. Each simulated trial was divided into 60 timesteps,where each timestep represented 0.33 s. On each timestep, a total of four inputs were provided to the model: presence or absence of a warning signal (W+), a control signal(W−), an SS and an aversive event (bomb). On each timestep, the agent chose between two actions – “fight” (remain/move to the central area; dashed red arrows) or “hide”(remain/move into a “safe area”; solid green arrows). When located in the central area during the bomb period, external reinforcement was Rpunish (point loss; depicted byR−). When located in the central area during the warning period or the ITI, reinforcement was Rreward (point gain; depicted by R+). When located in a “safe area”, reinforcementwas always zero (no points gained or lost). (A) Warning trials included 15 timesteps with W+, followed by 15 timesteps with a bomb, followed by 30 timesteps of ITI. (B)Control trials included 15 timesteps with W−, followed by 45 timesteps of ITI (no bomb period). (For interpretation of the references to color in this figure legend, the readeri

bawfit(eeae

ottImtfatswtlateussss

s referred to the web version of the article.)

Here, we adapt a computational modeling approach that haseen previously used to simulate common rodent conditionedvoidance paradigms (e.g., [27,30]). Specifically, we use a RL net-ork model with an actor-critic architecture and apply it to thendings from recent studies with the computer-based spaceshipask [15,20]. We have organized the results section into three parts2–4); in each part, we list the key findings in the correspondingmpirical work, describe the computational methods that weremployed, report the results obtained from the model simulationsnd discuss the meaning of the results in the context of the existingmpirical literature.

Specifically, we first manipulate the different free parametersf the model, with the goal of revealing possible mechanismshat could underlie the associations between female sex, inhibitedemperament and facilitated acquisition of avoidance behavior.n light of reports of distinct sensitivities to reward and punish-

ent in females [32] and inhibited individuals [33], we hypothesizehat different outcome sensitivities may underlie individual dif-erences in avoidance behavior, in which reward and punishmentre often competing features [34]. Secondly, we use the modelo describe the extinction learning demonstrated on the space-hip task [20]. We hypothesize that the same mechanisms thatould be proposed to underlie females’ increased avoidance during

he acquisition phase could also underlie their slower extinctionearning. Third, we use the model to shed light on the attenu-ting effect of SSs on avoidance behavior [20]. We hypothesizehat since SSs are thought to have rewarding qualities [35], theirffect on avoidance behavior could be mediated by an individ-al’s sensitivity to reward. We then employ the model to generate

everal predictions that could potentially increase the under-tanding of the involvement of SSs in avoidance behavior, andpecifically, as a tool in cognitive-behavioral therapy for anxietyymptoms.

2. Acquisition of avoidance behavior and associations withanxiety vulnerabilities

In the initial study with the spaceship task, Sheynin et al. [15]demonstrated that this task could be used to assess the acquisi-tion of escape-avoidance behavior in human subjects. Specifically,although no explicit instructions to use the safe areas were given,healthy young adults successfully learned to discriminate betweenW− and W+ and to protect themselves from the on-screen aver-sive event. While the vast majority of the participants learned theER, some also learned the AR. Importantly, individuals with anxietyvulnerabilities exhibited increased avoidance responding: femalesex was associated with longer AR duration and inhibited temper-ament was associated with higher AR rate.

2.1. Methods

To simulate the spaceship task, we divided each trial of the task(which lasted 20 s) into 60 timesteps, where each timestep repre-sented 0.33 s (Fig. 2). On each timestep, a total of four inputs definedthe current state: presence or absence of a warning signal (W+), acontrol signal (W−), an SS and an aversive event (bomb). To sim-ulate the 1 min of practice time that was given at the beginningof the task, each simulation started with 180 “practice” timestepsduring which all inputs were set to zero and only reward was avail-able (i.e., no punishment). Following the methods of Sheynin et al.[15], the task then included an acquisition phase that consistedof 24 trials; each trial was preceded by one “pre-trial” timestepwhere all inputs were set to zero. Then, 15 timesteps with a signal

followed; signal type (W+/W−) was determined by the trial type,which followed a pseudorandom but fixed trial order. On warningtrials, the signal (W+) was followed by another 15 timesteps witha bomb and 30 ITI timesteps where neither a signal nor a bomb

rain R

wlrwmslpejapfiacRslaoaaF

stwdstaaatltao

2

tcwtdw

P

wttc

V

abfs

wop

T = 0.00133, Rpunish = 45 and Rreward = −0.9. When values of a specificparameter were manipulated, all other values remained constant.All simulations were run using the Xcode version 4.6.2 program-ming environment (Apple Inc., Cupertino, CA), using C-source code.

Fig. 3. The network model (“agent”) architecture. On each timestep, six binaryinputs signaled the presence or absence of the (1) warning signal (W+) (2) controlsignal (W−) (3) safety signal (4) aversive event (bomb), as well as the location of thesubject’s spaceship – (5) inside a “safe area” or (6) in the central area of the screen.On each timestep, the critic computed prediction error (the change in expectedfuture reinforcement; Eq. (1)). This prediction error was used to update (thick green

J. Sheynin et al. / Behavioural B

ere present (Fig. 2A). On control trials, the signal (W−) was fol-owed by 45 timesteps of ITI (Fig. 2B). On each timestep, externaleinforcement could be provided: when the subject’s spaceshipas located in the central area during a bomb period, a punish-ent (Rpunish) was provided, corresponding to point loss when the

ubject’s spaceship is bombed. When the subject’s spaceship wasocated in the central area during any other period, a reward wasrovided (Rreward), corresponding to an opportunity to shoot at thenemy spaceship, which could lead to point gain. When the sub-ect’s spaceship was located in a “safe area”, reinforcement waslways zero (no points gained or lost). The subject’s spaceship waslaced in the central area on the first timestep of each session (i.e.,rst timestep of the practice period). Rpunish and Rreward were sets free parameters in the model; since reinforcement values in theurrent model were arbitrarily set to represent aversive outcomes,punish and Rreward were represented by positive and negativecalars, respectively. The sensitivity ratio is defined as the abso-ute value of the ratio between the values of these outcomes [i.e.,bs(Rpunish/Rreward)]. On each timestep, the agent could select onef two actions: either to “fight” (remain/move to the central area)nd attempt to obtain reward, or “hide” (remain/move to a “saferea”) and avoid possible punishment (red and green arrows inig. 2).

In parallel with analyses of the human data from this task,everal dependent variables were analyzed. First, on each trial,he percentage of timesteps on which the agent was “hiding”as recorded for the warning, control and bomb periods (hiding

uration), with hiding during the warning and bomb period repre-enting avoidance and escape behavior, respectively. Similarly tohe human study, we further defined two specific variables: AR ratend AR duration, to assess the frequency and latency of the avoid-nce responses. The model simulation recorded an “AR” when thegent entered a safe area during the warning period and remainedhere through the remaining warning period and the majority (ateast 14 timesteps) of the subsequent bomb period. In analogy tohe human study, longer AR duration indicated that the agent made

response earlier after onset of W+, and remained in hiding longerverall on that trial.

.1.1. The critic moduleThe critic module (see Fig. 3) received four inputs each coding

he presence or absence of W+, W−, SS and bomb, and two inputsoding spaceship location (whether it was inside a “safe area” orhether it was in the central area of the screen). At each timestep,

he critic computed prediction error (PE), which represented theifference between expected values across adjacent timesteps andas calculated as:

E = R + �∗V − V ′ (1)

here R was the reinforcement (Rpunish, Rreward or zero), and � washe discounting factor that made distant reinforcements count lesshan more proximate reinforcements; � was fixed to 0.9 in theurrent study. V was the predicted future value, calculated as:

=∑

iv[i] ∗ Ii (2)

nd V′ was the value of V from the prior timestep. Ii was the currentinary (1/0) value of input i; and v[i] was the strength of connectionrom input i to V. All v[i] were initialized to zero at the start of theimulation run and updated as:

v[i] = ˛+/− ∗ PE ∗ I (3)

i

here ̨ represented the learning rate (LR), which dictated ratef weight change in the critic; ˛+/− were the LRs associated withositive/negative PEs (see [36,37]). The value of v[i] was clipped at

esearch 283 (2015) 121–138 125

a maximum of Rpunish and a minimum of zero, to prevent v fromgrowing out of bounds.

2.1.2. The actor moduleOn each timestep, the actor (see Fig. 3) chose between two pos-

sible responses – “fight” or “hide”. The probability of selecting aparticular response was calculated using a softmax function [31]:

P(fight) = f (fight)/[f (fight) + f (hide)] and P(hide) = 1 − P(fight) (4)

where f(a) = exp(Ma/T), with T being the exploitation/explorationparameter (“inverse temperature”) and Ma being the value associ-ated with action a, which was computed as:

Ma =∑

im[a][i] ∗ Ii (5)

As before, Ii was the current value of the input i; m[a][i] wasthe strength of the connection from input i to action a. To capturethe feature that participants were not explicitly informed about theoption of hiding in the safe areas, but had to learn via exploration inthe game [15], all weights for the “hide” response m[hide][i] wereinitialized to 0.85; since participants were explicitly informed thatthey should try to gain points by shooting enemy spaceships, theweights for the “fight” response were initialized to a higher valueof 0.95. On each timestep, weights for the chosen action r wereupdated based on PE calculated by the critic:

�m[r][i] = ε+/− ∗ (−PE) ∗ Ii (6)

where ε represented the LR, which dictated rate of weight changein the actor; ε+/− were the LRs associated with positive/negativePEs. The values of m were restricted to remain positive.

Unless mentioned otherwise, all simulation results representthe average of 100 simulations, which is comparable to number ofparticipants reported in recent studies using this task [15,20]. Forsimplicity, we first set (˛+) = (˛−) and (ε+) = (ε−), as often done withsimilar models (e.g., [26,30]). Then, we show how specific manip-ulations of these LRs affect model behavior. Except as otherwisenoted, parameter values were set to ˛+/− = 0.0001, ε+/− = 0.00021,

arrows) the connections from the inputs to the critic (Eq. (3); dashed blue arrows),as well as the connections from the inputs to the actor (Eq. (6); solid red arrows).These updated connections in the actor were used to generate a response, accordingto a softmax probabilistic rule (Eq. (4)). (For interpretation of the references to colorin this figure legend, the reader is referred to the web version of the article.)

1 Brain R

2

snsiwfioHRpreSRwalme[3“aa

rtplc(todimi(

Fsrpio

26 J. Sheynin et al. / Behavioural

.2. Results

To test the hypothesis that distinct reward and punishmentensitivities might underlie the associations between anxiety vul-erabilities and AR [15], we manipulated the values of theseensitivities in the model and recorded the corresponding changen AR rate and AR duration (Fig. 4). When sensitivity to punishment

as increased (Rpunish range [25,65]) and sensitivity to reward wasxed (Rreward = −0.9), both AR rate and AR duration increased – anutcome that did not match any of the empirical findings (Fig. 4A).owever, if Rpunish was increased but the ratio between Rpunish andreward was held fixed (sensitivity ratio = 50; i.e., Rreward was alsoroportionally increased; Fig. 4B), we observed an increase in ARate but a minimal change in AR duration – similar to the differ-nces between inhibited and uninhibited individuals observed inheynin et al. [15]. Finally, when the ratio between Rpunish andreward was increased (sensitivity ratio range [30,70]) but Rpunishas fixed (Rpunish = 45; Fig. 4C), we observed the opposite pattern –

n increase in AR duration but a minimal change in AR rate, simi-ar to the differences Sheynin et al. observed between female and

ale individuals. Based on these observations, we chose param-ter values that could represent the different vulnerability groups“females” versus “males” represented by sensitivity ratio of 65 and5 (respectively) and Rpunish of 45 (for both); “inhibited” versusuninhibited” represented by Rpunish of 55 and 35 (respectively)nd sensitivity ratio of 50 (for both); see vertical lines in Fig. 4Bnd C].

In the empirical data, inhibited participants had higher ARate than uninhibited participants (Fig. 5B); however, AR dura-ion did not differ significantly between inhibited and uninhibitedarticipants (Fig. 5D). When the value of Rpunish was set to simu-

ate “inhibited” and “uninhibited” as described above, the modelorrectly showed increased AR rate in “inhibited” simulationsFig. 5A) with little effect on AR duration (Fig. 5C). Further, inhe empirical data, males and females did not differ significantlyn AR rate (Fig. 5F), but females had significantly longer ARuration than males (Fig. 5H). When the sensitivity ratio was var-

ed to simulate “males” and “females” as described above, theodel again approximated the human data: longer AR duration

n “female” simulations (Fig. 5G) with smaller effect on AR rateFig. 5E).

ig. 4. AR as a function of parameter manipulations in the model. (A) AR rate and AR duensitivity to punishment but fixed sensitivity to reward (Rreward = −0.9). (B) AR rate and ARatio (ratio = 50; i.e., sensitivity to reward was proportionally increased). (C) AR rate anunishment (Rpunish = 45). Based on these results, values were set to define “females” vers

n (B)] simulations, as described in Fig. 5. Error bars indicate SEM. (For interpretation of thf the article.)

esearch 283 (2015) 121–138

Lastly, we considered the interaction of sex and temperament.The human participants in Sheynin et al. [15] included 22 unin-hibited males, 24 uninhibited females, 14 inhibited males and 35inhibited females. We ran corresponding simulations of “inhibited”and “uninhibited” “males” and “females”, using the parametersillustrated in Fig. 4, and recorded responding for each simulatedsubject. Data for all 95 simulations were then averaged, to createlearning curves similar to those presented for the empirical data(Fig. 6). Just as in the empirical data, the model quickly learned tohide during the bomb period (ER), gradually learned to hide dur-ing the warning period (avoidance behavior), with relatively littlehiding during control period.

2.2.1. Testing the effect of specific LR manipulationsDistinct punishment/reward sensitivity ratios in males and

females (Fig. 4C) could be mediated by striatal dopamine signaling,which is known to play an important role in sensitivity andresponse to aversive as well as appetitive stimuli [66,67]. Specif-ically, we hypothesized that dopamine D2 receptor binding, whichwas shown to differ between sexes [38], could be responsible for thedifferent AR duration in males and females. Interestingly, D2 recep-tor binding has been also associated with the pathway that supportsavoidance (“NoGo”) learning, which is triggered by negative PEs inRL models (e.g., [36,37]). We thus predicted that manipulating theLR associated with negative PEs in the current model (˛−, ε−) wouldaffect mainly the simulated AR duration and parallel the reportedsex-related difference in avoidance behavior.

Fig. 7 shows AR rate and AR duration as a function of the dif-ferent LRs. As predicted, increasing ε− (Fig. 7D) produced a rapiddecrease in AR duration (dashed green line) with a much milderdecease in AR rate (solid red line). However, increasing ˛− pro-duced a similar decrease in both AR duration and AR rate (Fig. 7B),whereas increasing ˛+ and ε+ produced a similar increase in thesevariables (Fig. 7A and C, respectively). Thus, while partially meet-ing our prediction, these specific manipulations did not adequatelyaddress the full range of empirical results.

2.3. Discussion

Here, we demonstrated that a simple actor-critic model cansuccessfully simulate the acquisition of human escape-avoidance

ration (solid red and dashed green lines, respectively) as a function of increasing duration as a function of increasing sensitivity to punishment but fixed sensitivity

d AR duration as a function of increasing sensitivity ratio but fixed sensitivity tous “males” [vertical lines in (C)] and “inhibited” versus “uninhibited” [vertical linese references to color in this figure legend, the reader is referred to the web version

J. Sheynin et al. / Behavioural Brain Research 283 (2015) 121–138 127

Fig. 5. AR rate and AR duration values (solid red and striped green, respectively) of inhibited versus uninhibited subjects (A–D) and of males versus females (E–H), in themodel (left) and empirical data (adapted from [15]). Both the model and the empirical data suggest that inhibited temperament is a stronger predictor of AR rate (A–D),whereas sex is a stronger predictor of AR duration (E–H). See the main text for a detailed explanation of these analyses, as well as a discussion on the discrepancy betweensimulated and empirical findings on AR rate in males and females (Fig. 5E and F). Error bars indicate SEM. (For interpretation of the references to color in this figure legend,the reader is referred to the web version of the article.)

128 J. Sheynin et al. / Behavioural Brain Research 283 (2015) 121–138

Fig. 6. Total hiding during the different task periods (dotted = control period; dashed = warning period; solid = bomb period), in (A) the model and (B) empirical data (adaptedfrom [15]). Both the model and the empirical data show that subjects quickly learn the ER, gradually learn the avoidance behavior and show little hiding during the controlperiod. Model simulations were run the same number of times as the different group sizes in the empirical data [i.e., uninhibited males (n = 22), uninhibited females (n = 24),i

brpptcAsbeiitd

btitccistwgeitidssb

nhibited males (n = 14) and inhibited females (n = 35)]. Error bars indicate SEM.

ehavior, as demonstrated on a computer-based task closelyelated to common rodent avoidance paradigms. The behavioralaradigm that was simulated here includes two motivational com-onents, namely reward (point gain) and punishment (point loss);he tendency to obtain the rewarding outcome may conflict andompete with the tendency to prevent the punishing outcome [34].s hypothesized, model simulations suggest that differential sen-itivities to these outcomes result in distinct patterns of avoidanceehavior and may shed light on the association between anxi-ty vulnerabilities and increased AR. Specifically, we show thatncreased punishment sensitivity might underlie the higher AR raten inhibited individuals, whereas increased sensitivity ratio (sensi-ivity to punishment versus reward) might underlie the longer ARuration in females [15].

Indeed, the latter finding supports the idea that the balanceetween punishment and reward sensitivities is as important ashese sensitivities themselves [39]. This may be especially true dur-ng periods when reward and punishment coincide and tendencieso approach or avoid these outcomes conflict (approach-avoidanceonflict; [34]). In the spaceship task, the warning period producesompetition between the incentive to acquire points by shoot-ng the enemy spaceship (which requires that the participant’spaceship remain in the central area) and the incentive to avoidhe upcoming point loss (which requires a hiding response duringhich the ability to shoot is suspended). Model simulations sug-

est that the sex difference on AR duration reported by Sheynint al. [15] is the result of different ratios between these compet-ng incentives. Specifically, females’ longer AR duration might behe result of higher punishment/reward sensitivity ratio. This isn agreement with prior work showing that female college stu-

ents reported similar punishment sensitivity but lower rewardensitivity scores than male counterparts [32,33]. Such a propo-ition might also provide an explanation for recent reports fromoth human and non-human animal literature, where females

exhibited less approach behavior than males on an approach-avoidance paradigm ([34,40], respectively), as well as on thespaceship task (i.e., less total points gained and less shootingattempts [20]).

It is also interesting to note that male participants exhib-ited higher numerical values of AR rate than female participants(Fig. 5F); while this further supports the idea that AR rate and ARduration are distinct and independent types of responding (seeopposite patterns in Fig. 5F and H), this trend was supported byneither the model simulation (Fig. 5E), nor by empirical findingsin the later spaceship paper ([20] data not shown) and should befurther tested in future work.

While speculation as to the biological causes of the observedsex differences remains beyond the scope of this paper, obviouscandidates that might influence reward/punishment sensitivityratios in males and females might be the different levels of sexhormones (e.g., testosterone [41]), as well as other forms of sexualdimorphisms, such as distinct neural activity in brain areas suchas amygdala, insula, ventral striatum, prefrontal cortex and/orhippocampus [42,43]. Interestingly, our results support the ideathat distinct dopamine signaling characteristics might also beinvolved in the different approach/avoidance biases ([38] Fig. 7).Specifically, manipulating one LR in the actor (ε−) provides aclose parallel to the sensitivity ratio manipulations (compareFigs. 7D to Fig. 4C), and raises the possibility that D2 receptors inthe dorsal striatum (associated with the actor; [44,45]) play a pre-dominant role in the sex-related differences in avoidance behavior.Moreover, the association between lower LR and greater AR dura-tion in Fig. 7D is consistent with females’ lower D2 receptor affinity[38], further supporting the importance of the dorsal striatum in

the sex differences observed in the current study. However, reportson sex differences in dopaminergic transmission are not consistentand caution should be used when interpreting these results (seealso [46–48]). Future empirical work should specifically study D2

J. Sheynin et al. / Behavioural Brain Research 283 (2015) 121–138 129

Fig. 7. AR rate and AR duration (solid red and dashed green lines, respectively) as a function of specific LR manipulations in the model. (A) Manipulating the LR associatedwith positive PEs in the critic; ˛+ . (B) Manipulating the LR associated with negative PEs in the critic; ˛− . (C) Manipulating the LR associated with positive PEs in the actor;ε resultsp rs indr

rwtL

omsuttGipTv

+ . (D) Manipulating the LR associated with negative PEs in the actor; ε− . Overall,

arallel the sex differences in AR duration described in the current study. Error baeader is referred to the web version of the article.)

eceptor characteristics in the dorsal striatum, test associationith avoidance behavior, and use the current computational model

o understand the relation to specific cognitive variables such asRs.

In addition to the association with sex, the personality traitf inhibition affected performance in the spaceship task [15]. Theodel suggests that this might be due to higher punishment sen-

itivity in those with inhibited temperament than in those withninhibited temperament. This suggestion echoes prior sugges-ions that approach and avoidance tendencies might be linkedo specific personality dimensions [49], and is consistent withray’s biopsychological theory of personality, where a behavioral

nhibition system and a behavioral activation system were pro-osed as the two systems that control behavioral activity [50].he behavioral inhibition system was thought to become acti-ated by signals of novelty and punishment, and higher activity of

support the idea that changes in LRs that are associated with negative PEs mighticate SEM. (For interpretation of the references to color in this figure legend, the

this system was shown (by self-report questionnaires) to magnifyreactions to negative events [51]. Further, the specific relation ofthe behavioral inhibition system to punishment sensitivity is sup-ported by the positive correlation between self-reported scores thatassessed these two constructs in both healthy and clinical popula-tions ([33,52], respectively). Serotonin, a neuromodulator that wasproposed to regulate the behavioral inhibition system [53], could beinvolved in punishment sensitivity and should be a target of futurework. Overall, the proposition that links inhibited temperament toincreased punishment sensitivity provides a simple explanation forthe observation that inhibited individuals show facilitation in bothoperant [15,54] and classical [55] conditioning.

Importantly, while the model simulations presented in this partof the work offer plausible mechanisms that are consistent withempirical literature, they address the association between anxietyvulnerability and avoidance behavior only during the acquisition

130 J. Sheynin et al. / Behavioural Brain Research 283 (2015) 121–138

F compW desig

paprat(w(omctld

3

eceRidAsaat

3

tetwc(

3

s(ssrf

ig. 8. Decision process representation of a trial during the extinction phase in the+ signal, followed by 45 timesteps of ITI. The acquisition phase followed a similar

hase. However, greater responding during extinction, when noversive events occur, is another predominant feature of manysychopathologies [21]. In the following section, we adapt the cur-ent model to simulate both acquisition and extinction of humanvoidance behavior. Due to the failure of specific LR manipulationso parallel differences between inhibited and uninhibited subjectsFig. 7) and in order to maintain a simple framework, we proceedith the model configuration with fewer free parameters, where

˛+) = (˛−) and (ε+) = (ε −). Such configuration is consistent withther related models (e.g., [26,30]) and proposes a comprehensiveechanism that is based solely on individual differences in out-

ome sensitivities. We are specifically interested to test whetherhe same computational mechanism that was proposed to under-ie differences in avoidance acquisition could be applied to addressifferences in extinction learning.

. Extending the study to extinction learning

In the second study with the spaceship task, Sheynin et al. [20]xtended their initial study [15] and demonstrated that this taskan be successfully used to assess both the acquisition and thextinction of the escape-avoidance behavior in human subjects.esults from the acquisition phase generally replicated the results

n Sheynin et al. [15]: female sex was associated with longer ARuration and inhibited individuals tended to demonstrate higherR rate, although the latter relationship did not reach statisticalignificance in Sheynin et al. [20]. In addition, Sheynin et al. [20]lso reported that females were slower to extinguish the avoid-nce behavior than males (shown by longer hiding duration duringhe warning period on extinction trials).

.1. Methods

We used the model described earlier to include an extinc-ion phase. Here, we followed the methodology used in Sheynint al. [20]. First, all control trials (with W−) were removed fromhe acquisition phase and only the 12 warning trials (with W+)ere left. These were followed by twelve extinction trials, which

onsisted of 15 timesteps with W+, followed by 45 ITI timestepswith no signals/bombs; Fig. 8).

.2. Results

Fig. 9 shows avoidance and escape behavior over the 12 acqui-ition trials, as well as avoidance behavior during extinction trialstrials 13–24). In the current, as well as in the next part of this

tudy, in all figures that describe responding across both the acqui-ition and extinction phases (i.e., Figs. 9–14), a gray vertical lineepresents the end of the acquisition phase. The model success-ully accounted for both acquisition and extinction of the avoidance

uter-based avoidance paradigm. All extinction trials included 15 timesteps with an as described earlier, but only including warning trials (Fig. 2A).

behavior, as reported in Sheynin et al. [20]. Specifically, simulatedfemales (simulated by increased sensitivity ratio) showed facili-tated avoidance acquisition, slower extinction and similar rates ofER, compared to male counterparts (Fig. 9). In addition, in spite ofthe change in the task design (omission of the control trials), theassociations between inhibited temperament, female sex, AR rateand AR duration were replicated (similar to Fig. 5; simulation datanot shown).

3.3. Discussion

Impaired extinction learning is a key feature of anxiety disor-ders, where individuals continue to avoid fear-provoking situationseven in the absence of actual threat [21]. Interestingly, recentempirical work suggests that, similarly to animal models for anxietyvulnerability [17], healthy humans with anxiety vulnerability dueto female sex exhibit slower extinction learning than males [20].Model simulations presented here provide a possible interpretationfor these empirical findings, and suggest that the same mecha-nism that successfully accounted for females’ facilitated acquisitionlearning (higher sensitivity ratio, see Fig. 5) can also account fortheir slower extinction learning (Fig. 9).

These model simulations continue to replicate the finding oflonger AR duration in females than males [15], as shown in Fig. 5.As in Fig. 5 and in Sheynin et al. [15], the model also continues topredict higher AR rate in inhibited than uninhibited participants.This suggests that, although the association between inhibition andAR rate did not reach significance in Sheynin et al. [20], this maybe merely due to sampling error, rather than reflecting differencesin task design (the omission of control trials); further empiricalstudies will be needed to clarify this issue.

Further, it is possible that more salient variations in the taskdesign would affect avoidance behavior. For instance, a largeliterature discusses the involvement of cues that predict the nonoc-currence of an aversive event (i.e., SSs) in aversive learning inrodents, nonhuman primates and humans (for review, see [35]).Using the current avoidance task, Sheynin et al. [20] have recentlydemonstrated that the administration of an explicit visual SS duringthe acquisition phase of the task attenuated avoidance behavior. Inthe next section, we use the computational model to simulate theadministration of SSs with the goal of revealing the mechanism thatunderlies their effect on behavior.

4. Testing the effect of safety signals

In the second study with the spaceship task, Sheynin et al. [20]

also tested the effect of SSs on avoidance behavior. Results showedthat administering SSs during the ITI on acquisition trials facili-tated the extinction of the avoidance behavior (shown by decreasedhiding during the warning period on extinction trials).

J. Sheynin et al. / Behavioural Brain Research 283 (2015) 121–138 131

Fig. 9. Hiding during the warning period [(A and B) avoidance behavior] and during the bomb period [(C and D) ER] during acquisition (trials 1–12) and extinction (trials1 paredr [20]o indica

4

iTlwb

3–24). Note that there is no bomb period during extinction trials. Analyses comespectively). In both the model (A and C) and empirical data (B and D; adapted fromf the avoidance behavior compared to males, with no difference on ER. Error bars

.1. Methods

We used the model described in part 3.1 to simulate the admin-stration of a signal associated with non-threat periods (i.e., SS).

o parallel the procedure used by Sheynin et al. [20], we simu-ated SS administration during the ITI on all acquisition trials, as

ell as during the initial “practice” period. The SS was simulatedy switching the value of the “safety signal” input in the model

hiding performance in male versus female individuals (dotted versus solid lines, “without-SS” group), females showed facilitated acquisition and slower extinctionte SEM.

(input #3; see Fig. 3) to “1” during all corresponding timesteps.We first showed that the model correctly simulated behavioral dif-ferences between the two experimental groups (“with-SS” versus“without-SS”). We then investigated how connection strengths in

the critic module (dashed blue arrows in Fig. 3) changed as a resultof the SS administration, to better understand the computationalprocesses that underlie the simulated behavior. These connectionstrengths determine the predicted future value V and are associated

1 Brain R

wtcttpft

F(vw

32 J. Sheynin et al. / Behavioural

ith each input to the agent (Eq. (2)). Based on this expected value,he critic calculates a PE (Eq. (1)), which is then used to update theonnections in the actor (solid red arrows in Fig. 3). Similarly tohe connections in the critic, these connections are associated with

he different inputs to the agent and their strengths determine therobability of choosing each action (Eq. (5)). Weights were recordedor the 12 acquisition and 12 extinction trials, averaged across allimesteps on each trial. By performing such detailed analyses, we

ig. 10. Hiding during the warning period [(A and B) avoidance behavior] and during thenote that there is no bomb period during extinction trials). Analyses compared the behaversus solid black lines, respectively). In both the model (A and C) and empirical data (B anithout affecting ER. Error bars indicate SEM. (For interpretation of the references to colo

esearch 283 (2015) 121–138

expected to reveal the specific inputs and actions that drove themodel behavior.

4.2. Results

In agreement with empirical data, the administration of the SSin the model facilitated the extinction of the avoidance behav-ior (i.e., decreased hiding during the warning period on extinction

bomb period [(C and D) ER] for acquisition trials 1–12 and extinction trials 13–24ior of subjects performing an avoidance task with versus without SS (dotted greend D; adapted from [20]), the SS facilitated the extinction of the avoidance behavior,r in this figure legend, the reader is referred to the web version of the article.)

rain R

todetam

cafi(awtiesialwaBa(t(dpa

F(pcft“

J. Sheynin et al. / Behavioural B

rials) without affecting the ER (Fig. 10). This was a main effect thatccurred independent of sex or temperament in both the empiricalata and the model (data not shown). There was also an attenuatingffect of the SS on avoidance behavior during acquisition; however,his relationship did not reach significance in the empirical datand was weaker than the effect of the SS during extinction in theodel.Interestingly, analyses of the change in the weight of the model

onnections suggested that the attenuating effect of SS on avoid-nce behavior is due to the competing approach response (i.e.,ghting). Fig. 11A–E shows the connection strengths in the critictop panel; corresponds to the dashed blue arrows in Fig. 3), acrosscquisition trials 1–12 and extinction trials 13–24. Recall that theseeights are used to compute the predicted value (V) – i.e., expec-

ation of future reward or punishment, with positive weight valuesndicating expected punishment and negative values representingxpected reward. Considering first the black lines, which repre-ent the “without-SS” condition, Fig. 11A shows that there is a mildncrease in weights to the critic from the warning signal duringcquisition when the warning signal predicts upcoming threat, fol-owed by a mild decrease in the weight during extinction when

arning signal no longer predicts threat (compare to the gradualcquisition and extinction of the avoidance behavior; Fig. 10A and). Further, Fig. 11C shows that the acquisition is associated with

strong increase in weights to the critic from the aversive eventcompare to the dramatic increase in ER; Fig. 10C and D). Notehat there is no change in the weight encoding the safety signalFig. 11B), since the SS is never presented in the without-SS con-

ition. The final two inputs (Fig. 11D and E) encode location of thearticipant’s spaceship, and both show gradual increases over thecquisition phase, with a mild decrease during extinction.

ig. 11. Changes in model weights across 12 acquisition and 12 extinction trials, with and

dotted green versus solid black lines, respectively). Overall, data suggest that simulatedunishment) of the connections associated with the presence of the SS and the state of behoose the “fight” response when these inputs were activated (G and J). For a detailed exprom figure: input #2 was omitted as it represents the presence of the control signal, whiche “hide” response were omitted due to a minimal change in their values for all the inputfight” response. Error bars indicate SEM. (For interpretation of the references to color in

esearch 283 (2015) 121–138 133

Turning to the “with-SS” group, there is a similar pattern changein weights to the critic from the warning signal (Fig. 11A), from theaversive event (Fig. 11C) and from the safe area (Fig. 11D) as in thewithout-SS group. However, since this condition does include pre-sentation of the SS during periods of non-threat, the weights fromthe SS decrease across acquisition (Fig. 11B). During extinction, theweights from the safety signal do not change further, because theSS is not presented during this phase, and so input #3 is alwayszero during this phase. More interesting is the difference betweenwith-SS and without-SS conditions in the weight from the centralarea (Fig. 11E); this weight shows much lower values in the with-SSgroup, suggesting that the SS decreased the expected punishmentassociated with the central area.

In addition to changing weights in the critic, weights in theactor (solid red arrows in Fig. 3) are also updated based on pre-diction error (Eq. (6)). There is one set of weights m[“fight”][i] fromeach input i to the “fight” response, and a second set m[“hide”][i]from each input i to the “hide” response; for simplicity, only the“fight” weights are shown in Fig. 11. Unsurprisingly, the weightsto the actor from the various inputs are approximate inverses ofthose in the critic, since a prediction of upcoming punishmentshould reduce the likelihood of selecting a “fight” response; thus,for example, after the first few acquisition trials, the aversive eventdecreases the weight for “fight” (corresponding to a tendency tohide from the bomb; Fig. 11F). Importantly, the SS increases theweight from the central area to the “fight” response, compared tothe without-SS condition (Fig. 11J).

The understanding of the mechanisms that result in such weight

change is crucial. First, only weights of inputs that are active dur-ing a specific period can be updated (see Eqs. (3) and (6); whenIi = 0, �v and �m would also be zero). Second, only weights of

without the presence of an SS during ITI (and “practice” period) on acquisition trials behavioral changes were driven mainly by decreased values (decreased predicteding at the central area (B and E), which in turn, resulted in increased probability tolanation of these analyses, see main text. For clarity, several features were omittedh was always set to zero in this simulation; the actor’s connections associated withs (range: 0.8493–0.8559) – suggesting that learning affected mainly the competingthis figure legend, the reader is referred to the web version of the article.)

1 Brain Research 283 (2015) 121–138

ctcobtttbaastiwamSrrpasio(w

4

lptcpioai

Fig. 12. Simulating the effect of an SS administered during different phases ofthe task. An SS administered during the acquisition phase (dotted green line) orextinction phase (dashed green line) facilitated extinction learning, compared to

FmSr

34 J. Sheynin et al. / Behavioural

hosen actions r are updated (Eq. (6)). Third, as different periods inhe task often share inputs [e.g., input #6 (being at the central area)an be activated during either the warning period, the bomb periodr the ITI], weight alterations during one period could affect modelehavior during other periods. In the simulation of the spaceshipask, as the subject’s spaceship is in the central area during most ofhe SS appearance (around 92% of the ITI period; data not shown),he corresponding weight from the central area is the main one toe updated (Fig. 11E and J). Such a dramatic increase in the prob-bility to choose the “fight” response when located in the centralrea during the ITI also affects the warning period, in which subjectspend more than 50% of the time in the central area (see Fig. 10A) –hus, resulting in a decrease of avoidance behavior (decreased hid-ng during the warning period). Importantly, Fig. 11 presents actor

eights that are associated with the “fight” response only; weightsssociated with the “hide” response were omitted due to a mini-al change in their values across the trials (range: 0.8493–0.8559).

uch minimal change in probability to hide is due to the fact that noeinforcement is provided when inside a safe area, which in turnesults in a small PE (Eq. (1)) and accordingly, a small change inrobability to hide (Eq. (6)). Overall, these analyses suggest that thedministration of SS during the ITI reinforced the inputs (SS and thetate of being in the central area) and responses (“fight”, i.e., stayn the central area) that occurred during that period; since somef these inputs and responses also occurred during other periodse.g., during the warning period) – behavior during those periodsas accordingly affected by the SS.

.3. Predicting manipulations to the safety signal

We used the model to examine the effect of several manipu-ations to the SS. First, since common therapeutic approaches forathological avoidance are based on extinction training [56], weested whether the attenuating effect of SS on avoidance behaviorould be obtained if the SS was administered during the extinctionhase, instead of the acquisition phase. Fig. 12 shows how admin-

stering SS during the ITI on acquisition trials (dotted green line)r on extinction trials (dashed green line) affected the acquisitionnd extinction of the avoidance behavior (assessed by hiding dur-ng the warning period), compared to performance when SS was

ig. 13. Simulating the effect of SS administration in models with different reward (butedium Rreward = −0.9 (C) high Rreward = −1.29]; “with-SS” (green dotted line) versus “wit

S on avoidance, this effect was stronger in simulations with higher Rreward . Error bars ineader is referred to the web version of the article.)

simulations without an SS on either phase (solid black). Error bars indicate SEM. (Forinterpretation of the references to color in this figure legend, the reader is referredto the web version of the article.)

absent (black solid line). Simulations suggested that administeringthe SS during the extinction phase did in fact facilitate extinction

(Fig. 12).

Second, in light of the important role of the competing appetitivecomponent in modulating the attenuating effect of SS on avoid-ance behavior (as depicted in Fig. 11), we hypothesized that the

fixed punishment) sensitivity values [(A–C) Rpunish = 45 (A) low Rreward = −0.45 (B)hout-SS” (black solid line). While all simulations show an attenuating effect of thedicate SEM. (For interpretation of the references to color in this figure legend, the

J. Sheynin et al. / Behavioural Brain Research 283 (2015) 121–138 135

Fig. 14. Simulating the effect of SS administration during the ITI on acquisition trials, in models with different ITI durations [(A) short ITI = 10 timesteps (B) medium ITI = 30t SS” (bt Error

t

stR(mwS(

o[atitsawtSwwe

4

tmbmotrtssiatt

imesteps (C) long ITI = 50 timesteps]; “with-SS” (green dotted line) versus “without-his effect was dependent on the ITI duration and was stronger when ITI was longer.

he reader is referred to the web version of the article.)

ensitivity to reward (Rreward) could mediate the SS effect. Simula-ions showed that administering the SS in a model with differentreward (but fixed Rpunish) values altered the attenuating effect of SSFig. 13): When a low Rreward value was used, the effect was mini-

al, during both the acquisition and extinction phases (Fig. 13A);hen medium and high Rreward values were used, the effect of the

S was stronger, especially during the extinction phase of the taskFig. 13B and C).

Third, although a few animal studies have shown that the effectf an SS on avoidance behavior is independent of the SS duration57–59], the paradigms that were used did not include an explicitppetitive component. Here, we tested the possibility that SS dura-ion could mediate the effect of SS in the spaceship task, whichncludes an appetitive component (stay in the central area to shoothe enemy spaceship for point gain) that competes with the aver-ive component (hide and avoid point loss). Fig. 14 shows howdministering the SS during the ITI on acquisition trials, in modelsith different ITI durations (10, 30 and 50 timesteps) modulated

he attenuating effect of the SS on avoidance behavior (Fig. 14):imilarly to the effect of different Rreward values (Fig. 13), the effectas minimal when short ITI was used (Fig. 14A) and was strongerhen medium and long durations were used, especially during the

xtinction phase of the task (Fig. 14B and C).

.4. Discussion

Periods free from aversive events (i.e., safety periods) arehought to represent an appetitive component that is capable of

odulating avoidance behavior in rodents [60,61]. Moreover, it haseen argued that signals that are associated with these periods (SSs)ay provide positive reinforcement and may become inhibitors

f fear [35]. Although a rich rodent literature exists on this topic,he lack of a standardized methodology together with inconsistentesults in the rodent literature (for review, see [20]) limit interpre-ation and translation of work to a human population. In addition, inpite of the importance of extinction learning as therapy for anxietyymptomatology, reports on the role of SSs in extinction are few and

nconsistent. To bridge the gap between human and non-humannimal literature, we have recently used the spaceship task to testhe role of SSs in human avoidance behavior [20]. We found thathe administration of a visual SS during ITI (and “practice” period)

lack solid line). While all simulations show an attenuating effect of SS on avoidance,bars indicate SEM. (For interpretation of the references to color in this figure legend,

of the acquisition phase attenuated the demonstrated avoidancebehavior. Consistent with these empirical findings, model simu-lations presented in the current work similarly demonstrate thedecreased avoidance in response to SS administration.

While the model successfully replicates prior empirical findingson avoidance behavior, its core value lies in its ability to revealmechanisms that could underlie the observed outcomes. Specif-ically, the ITI and the “practice” time are periods when reward isavailable and no punishment can occur. By administering an SS dur-ing these “safe” periods, the critic increased the values associatedwith the inputs that were active during these periods, correspond-ing to an expectation of reward. As the critic learned this prediction,weights were adjusted in the actor to favor the actions that werechosen (during these periods). Specifically, the actor increased theprobability of “fighting” (remaining in the central area) when theSS was present. It is important to note that since different periodsin the task often share inputs [e.g., input #6 (location at the centralarea) can be activated during either the warning period, the bombperiod or the ITI], such weight alteration also affected model behav-ior during the warning period – the probability of remaining inthe central area (“fighting”) was increased, and accordingly, avoid-ance behavior was decreased. Such a link between SSs and theappetitive component of an avoidance paradigm is consistent withearlier rodent reports, where safe states (e.g., “non-shock areas”)induced “relaxation”, reduced avoidance, and were generalized toother states with similar cues [60].

Lastly, we used the model to generate several novel predictions,which would motivate new investigations to better exploit SSs as avital component in anxiety therapy. First, while the effect of an SSadministered during the acquisition phase is consistent with a largerodent literature and is important for promoting basic understand-ing of avoidance behavior, obtaining control over individual stimuliduring real-life situations is not always possible. Thus, administer-ing or removing potential SSs during acquisition of avoidance maynot be practical. Cognitive-behavioral therapies, however, oftenrely on extinction learning, where individuals are exposed to thefeared stimulus or outcome in the absence of actual danger [56].

The model predicts that, similar to the effects of SS administra-tion during the acquisition phase, an SS administered during theextinction phase would similarly lead to facilitation of extinctionlearning. Second, the model predicts that when reward sensitivity is

1 Brain R

htmiaptp

5

asovtbrAsaimubiftba

icdthctih

atimwSaba

tArowcttwr

to

36 J. Sheynin et al. / Behavioural

igh, the attenuating effect of the SS is also increased. Interestingly,he model also suggests that when reward sensitivity is low, the SS

ight have no effect – a finding that might explain some of thenconsistency in the rodent literature, where effect of SSs on avoid-nce extinction is not always observed [24,58,62]. Third, the modelredicts that using longer safety (ITI) periods could help magnifyhe attenuating effect of SSs on avoidance behavior, especially inaradigms that include an explicit appetitive component.

. Overall summary and conclusions

In this work we first demonstrated that increased sensitivity ton aversive outcome could account for the higher AR rate demon-trated by inhibited individuals. This idea is consistent with Gray’sriginal definition of a behavioral inhibition system [50] and pro-ides a simple explanation for the association between inhibitedemperament and facilitated avoidance learning on computer-ased tasks [15,54]. We then used the model to show that higheratio of punishment/reward sensitivity could underlie the longerR duration, as well as the slower extinction learning, demon-trated by females [15,20]. Imbalance between the approach andvoidance systems has previously been argued to characterize anx-ety disorders [39], and might partially explain why females are

ore vulnerable than males to anxiety disorders. Specific manip-lations of the LRs also suggest that these sex differences mighte mediated by dopaminergic signaling in the dorsal striatum – an

dea that is consistent with previous literature and should driveuture empirical studies. The model further suggested an interpre-ation for the finding that SSs are capable of attenuating avoidanceehavior [20], by increasing the probability of choosing a competingpproach response.

The current work has important implications for future behav-oral studies. First, the model made several novel predictions thatould be tested empirically. We predicted that SSs administereduring extinction learning would also retard avoidance behavior,hat SSs might have a larger attenuating effect in individuals withigh reward sensitivity, and that the effect of SSs on avoidanceould be increased by using longer safety periods. The spaceshipask and common rodent avoidance paradigms could be employedn future studies to test these predictions in both human and non-uman subjects, respectively.

The model also predicted that females may have different rel-tive sensitivity to reward versus punishment than males, whilehose with inhibited temperament should have greater sensitiv-ty to punishment than those with uninhibited temperament. This

ight be explored in humans by self-report questionnaires thatould specifically assess participants’ approach-avoidance biases.

uch questionnaires might include the Sensitivity to Punishmentnd Sensitivity to Reward Questionnaire [33], the Behavioral inhi-ition/activation scale [50], or specific ratings of the motivation topproach reward and avoid punishment during the task [34].

Importantly, the model described here specifically addresseshe approach-avoidance conflict introduced by the spaceship task.s such, it uses a simplified environment where two competingesponses are available: an approach response, which provides anpportunity to obtain reward (“fight”) and an avoidance response,hich prevents possible punishment (“hide”). Future empirical and

omputational work could further examine performance on theask by assessing the degree that such approach behavior translateso an actual reward, i.e., whether the subject chooses to “shoot”hile being in the central area, and whether such shooting is accu-

ate and hits the enemy spaceship.It should also be noted that the current work used a parameter-

uning approach to test specific hypotheses regarding the rolef distinct reward and punishment sensitivities in avoidance

esearch 283 (2015) 121–138

behavior. While a few models of avoidance behavior have beenpreviously reported [25–29], the current model together withanother parameter-tuning model of rodent avoidance behavior[30] represent the first attempts to use computational tech-niques to understand individual differences in this behavior. Theparameter-tuning approach is often used to test the effect of spe-cific parameters on the model behavior (e.g., Tables 1 and 2 in [63]),to simulate group differences (e.g., in Parkinson’s disease [64]) andis especially advantageous when addressing a priori hypothesesconcerning possible abnormal processes in the studied subjects (forreview, see Fig. 1 in [65]). Moreover, we predicted that the proposedmechanisms would successfully describe group differences acrossseveral experiments; utilizing a model with a fixed set of tunedparameters provides a simplified and unified framework that canpotentially explain such a complex empirical dataset. Future com-putational efforts could further extend the current findings anduse a fitting approach to simulate data at the individual level andtest whether other specific parametric configurations could offer analternative explanation (e.g., see [36,37]); a synergistic use of differ-ent approaches or models at different levels of abstraction shouldalso be considered [65]. Overall, it should be clear that rather thanproviding the best explanation, this work merely proposes one sim-ple and comprehensive mechanism that should be tested by futureempirical analyses.

Lastly, future work should build on the empirical and computa-tional grounds presented here to address other important featuresof avoidance learning, e.g., the role of response-stimulus contin-gency in the effect of SSs on avoidance acquisition: while priorrodent studies demonstrated that the presence of an SS facilitatedacquisition of AR, Sheynin et al.’s [20] study showed the oppositepattern; Sheynin et al. raised the possibility that such discrepancyis due to the contingency of the SS on the rat’s ER/AR, whereasin the spaceship task the SS always appeared at the end of thebomb period, regardless of the subject’s behavior. Another phe-nomenon that could be investigated is “warm-up” (decreased ARat the beginning of a new session); lack of warm-up was shownto characterize vulnerable subjects in both empirical rodent workand computational modeling ([17,30], respectively). Further, thespaceship task and the computational model could be used to studyactive versus passive avoidance mechanisms, and specifically, test-ing the idea that female sex and inhibited temperament might beassociated with different types of avoidance [20]. Lastly, RL modelshave been linked to brain substrates; specifically, fMRI and electro-physiological studies provided evidence for the idea that the critic(value prediction) is related to the ventral striatum and the actor(action selection) is related to the dorsal striatum in both humansand rodents ([44,45], respectively). Possible future directions couldinclude functional neuroimaging (e.g., fMRI) studies of humansperforming the spaceship task, to reveal the brain areas that areinvolved in different patterns of the avoidance behavior (e.g., higherAR rate or longer AR duration) in individuals with different anxietyvulnerabilities.

In sum, the current work presents for the first time the use ofcomputational modeling techniques to study the processes thatmight result in greater avoidance in vulnerable individuals, whichin turn might subsequently increase the development of patho-logical avoidance behaviors. Importantly, by simulating humanbehavior on a task that was developed to parallel common animalparadigms, this work helps to bridge the gap between human andnon-human avoidance literature. The model simulations proposea simple and unified explanation for a series of empirical reportsobtained from healthy college students with anxiety vulnerabili-

ties; this explanation is based solely on individual differences insensitivity to rewarding and punishing outcomes, and as such, maygeneralize from experimental to real-world settings. Finally, weused the model to generate several predictions regarding how SSs

rain R

catsp

C

c

A

tOrbSro

R

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

J. Sheynin et al. / Behavioural B

ould be used to promote or retard avoidance; if these predictionsre upheld, this would provide further support for the computa-ional model and could also provide valuable insight to clinicianseeking to optimize extinction-based therapies for individuals withathological avoidance.

onflict of interest

The authors affirm that they have no relationships that couldonstitute potential conflict of interest.

cknowledgments

This work was supported by Award Number I01CX000771 fromhe Clinical Science Research and Development Service of the VAffice of Research and Development, by the NSF/NIH Collabo-

ative Research in Computational Neuroscience (CRCNS) Program,y NIAAA (R01 AA018737), and by additional support from theMBI. The views in this paper are those of the authors and do notepresent the official views of the Department of Veterans Affairsr the U.S. government.

eferences

[1] American Psychiatric Association. Diagnostic and statistical manual of mentaldisorders: DSM-IV-TR. 4th edn. Washington, DC: APA; 2000.

[2] North CS, Nixon SJ, Shariat S, Mallonee S, McMillen JC, Spitznagel EL, et al.Psychiatric disorders among survivors of the Oklahoma City bombing. J AmMed Assoc 1999;282:755–62.

[3] North CS, Pfefferbaum B, Tivis L, Kawasaki A, Reddy C, Spitznagel EL. The courseof posttraumatic stress disorder in a follow-up study of survivors of the Okla-homa City bombing. Ann Clin Psychiatry 2004;16:209–15.

[4] Foa EB, Stein DJ, McFarlane AC. Symptomatology and psychopathology of men-tal health problems after disaster. J Clin Psychiatry 2006;67(Suppl. 2):15–25.

[5] Karamustafalioglu OK, Zohar J, Guveli M, Gal G, Bakim B, Fostick L, et al. Nat-ural course of posttraumatic stress disorder: a 20-month prospective study ofTurkish earthquake survivors. J Clin Psychiatry 2006;67:882–9.

[6] O’Donnell ML, Elliott P, Lau W, Creamer M. PTSD symptom trajectories: fromearly to chronic response. Behav Res Ther 2007;45:601–6.

[7] Delgado MR, Jou RL, Ledoux JE, Phelps EA. Avoiding negative outcomes: trackingthe mechanisms of avoidance learning in humans during fear conditioning.Front Behav Neurosci 2009;3:33.

[8] Lovibond PF, Saunders JC, Weidemann G, Mitchell CJ. Evidence for expectancyas a mediator of avoidance and anxiety in a laboratory model of human avoid-ance learning. Q J Exp Psychol 2008;61:1199–216.

[9] Cloninger CR. A unified biosocial theory of personality and its role in the devel-opment of anxiety states. Psychiatr Dev 1986;4:167–226.

10] Taylor JE, Sullman MJ. What does the driving and riding avoidance scale (DRAS)measure? J Anx Dis 2009;23:504–10.

11] Arcediano F, Ortega N, Matute H. A behavioural preparation for the study ofhuman Pavlovian conditioning. Q J Exp Psychol B 1996;49:270–83.

12] Molet M, Leconte C, Rosas JM. Acquisition, extinction and temporal discrimi-nation in human conditioned avoidance. Behav Process 2006;73:199–208.

13] Raia CP, Shillingford SW, Miller Jr HL, Baier PS. Interaction of procedural factorsin human performance on yoked schedules. J Exp Anal Behav 2000;74:265–81.

14] Byron Nelson J, del Carmen Sanjuan M. A context-specific latent inhi-bition effect in a human conditioned suppression task. Q J Exp Psychol2006;59:1003–20.

15] Sheynin J, Beck KD, Pang KC, Servatius RJ, Shikari S, Ostovich J, et al.Behaviourally inhibited temperament and female sex, two vulnerability fac-tors for anxiety disorders, facilitate conditioned avoidance (also) in humans.Behav Process 2014;103:228–35.

16] Beck KD, Jiao X, Pang KC, Servatius RJ. Vulnerability factors in anxietydetermined through differences in active-avoidance behavior. Prog Neuropsy-chopharmacol Biol Psychiatry 2010;34:852–60.

17] Servatius RJ, Jiao X, Beck KD, Pang KC, Minor TR. Rapid avoidance acquisitionin Wistar-Kyoto rats. Behav Brain Res 2008;192:191–7.

18] Pigott TA. Gender differences in the epidemiology and treatment of anxietydisorders. J Clin Psychiatry 1999;60(Suppl. 18):4–15.

19] Gladstone GL, Parker GB, Mitchell PB, Wilhelm KA, Malhi GS. Relationshipbetween self-reported childhood behavioral inhibition and lifetime anxietydisorders in a clinical sample. Depress Anxiety 2005;22:103–13.

20] Sheynin J, Beck KD, Servatius RJ, Myers CE. Acquisition and extinction of humanavoidance behavior: attenuating effect of safety signals and associations with

anxiety vulnerabilities. Front Behav Neurosci 2014;15(8):323.

21] Graham BM, Milad MR. The study of fear extinction: implications for anxietydisorders. Am J Psychiatry 2011;168:1255–65.

22] Beck KD, Jiao X, Ricart TM, Myers CE, Minor TR, Pang KC, et al. Vulnerabilityfactors in anxiety: strain and sex differences in the use of signals associated with

[

[

esearch 283 (2015) 121–138 137

non-threat during the acquisition and extinction of active-avoidance behavior.Prog Neuropsychopharmacol Biol Psychiatry 2011;35:1659–70.

23] Grossen NE, Bolles RC. Effects of a classical conditioned ‘fear signal’ and ‘safetysignal’ on nondiscriminated avoidance behavior. Psychon Sci 1968;11:321–2.

24] Dillow PV, Myerson J, Slaughter L, Hurwitz HMB. Safety signals and the acquisi-tion and extinction of lever-press discriminated avoidance in rats. Br J Psychol1972;63:583–91.

25] Johnson JD, Li W, Li J, Klopf AH. A computational model of learned avoidancebehavior in a one-way avoidance experiment. Adapt Behav 2001;9:91–104.

26] Moutoussis M, Bentall RP, Williams J, Dayan P. A temporal difference accountof avoidance learning. Network 2008;19:137–60.

27] Maia TV. Two-factor theory, the actor-critic model, and conditioned avoidance.Learn Behav 2010;38:50–67.

28] Smith A, Li M, Becker S, Kapur S. A model of antipsychotic action inconditioned avoidance: a computational approach. Neuropsychopharmacol2004;29:1040–9.

29] Schmajuk NA, Zanutto BS. Escape, avoidance, and imitation: a neural networkapproach. Adapt Behav 1997;6:63–129.

30] Myers CE, Smith IM, Servatius RJ, Beck KD. Absence of “warm-up” during activeavoidance learning in a rat model of anxiety vulnerability: insights from com-putational modeling. Front Behav Neurosci 2014;18(8):283.

31] Sutton RS, Barto AG. Reinforcement learning: an introduction. Cambridge, MA:MIT Press; 1998.

32] Li CR, Huang CY, Lin W, Sun CWV. Gender differences in punishment and rewardsensitivity in a sample of Taiwanese college students. Personal Individ Differ2007;43:475–83.

33] Torrubia R, Avila C, Moltó J, Caseras X. The Sensitivity to Punishment and Sen-sitivity to Reward Questionnaire (SPSRQ) as a measure of Gray’s anxiety andimpulsivity dimensions. Personal Individ Differ 2001;31:837–62.

34] Aupperle RL, Sullivan S, Melrose AJ, Paulus MP, Stein MB. A reverse translationalapproach to quantify approach-avoidance conflict in humans. Behav Brain Res2011;225:455–63.

35] Christianson JP, Fernando AB, Kazama AM, Jovanovic T, Ostroff LE, Sangha S.Inhibition of fear by learned safety signals: a mini-symposium review. J Neu-rosci 2012;32:14118–24.

36] Frank MJ, Moustafa AA, Haughey HM, Curran T, Hutchison KE. Genetic triple dis-sociation reveals multiple roles for dopamine in reinforcement learning. ProcNatl Acad Sci U S A 2007;104:16311–6.

37] Frank MJ, Doll BB, Oas-Terpstra J, Moreno F. Prefrontal and striatal dopaminer-gic genes predict individual differences in exploration and exploitation. NatNeurosci 2009;12:1062–8.

38] Pohjalainen T, Rinne JO, Nagren K, Syvalahti E, Hietala J. Sex differences in thestriatal dopamine D2 receptor binding characteristics in vivo. Am J Psychiatry1998;155:768–73.

39] Stein MB, Paulus MP. Imbalance of approach and avoidance: the yin and yangof anxiety disorders. Biol Psychiatry 2009;66:1072–4.

40] Basso AM, Gallagher KB, Mikusa JP, Rueter LE. Vogel conflict test: sex dif-ferences and pharmacological validation of the model. Behav Brain Res2011;218:174–83.

41] van Honk J, Schutter DJ, Hermans EJ, Putman P, Tuiten A, Koppeschaar H. Testos-terone shifts the balance between sensitivity for punishment and reward inhealthy young women. Psychoneuroendocrinology 2004;29:937–43.

42] Aupperle RL, Paulus MP. Neural systems underlying approach and avoidancein anxiety disorders. Dialogues Clin Neurosci 2010;12:517–31.

43] Bach DR, Guitart-Masip M, Packard PA, Miro J, Falip M, Fuentemilla L,et al. Human hippocampus arbitrates approach-avoidance conflict. Curr Biol2014;24:541–7.

44] O’Doherty J, Dayan P, Schultz J, Deichmann R, Friston K, Dolan RJ. Dissocia-ble roles of ventral and dorsal striatum in instrumental conditioning. Science2004;304:452–4.

45] Daw ND [Unpublished doctoral dissertation] Reinforcement learning models ofthe dopamine system and their behavioral implications. Pittsburgh: CarnegieMellon University; 2003.

46] Farde L, Hall H, Pauli S, Halldin C. Variability in D2-dopamine receptor den-sity and affinity: a PET study with [11C]raclopride in man. Synapse 1995;20:200–8.

47] Parellada E, Lomena F, Catafau AM, Bernardo M, Font M, Fernandez-Egea E,et al. Lack of sex differences in striatal dopamine D2 receptor binding in drug-naive schizophrenic patients: an IBZM-SPECT study. Psychiatry Res 2004;130:79–84.

48] Munro CA, McCaul ME, Wong DF, Oswald LM, Zhou Y, Brasic J, et al. Sexdifferences in striatal dopamine release in healthy adults. Biol Psychiatry2006;59:966–74.

49] Elliot AJ, Thrash TM. Approach-avoidance motivation in personality:approach and avoidance temperaments and goals. J Personal Soc Psychol2002;82:804–18.

50] Carver CS, White TL. Behavioral inhibition, behavioral activation, and affectiveresponses to impending reward and punishment: the BIS/BAS scales. J PersonalSoc Psychol 1994;67:319–33.

51] Gable SL, Reis HT, Elliot AJ. Behavioral activation and inhibition in everyday life.J Personal Soc Psychol 2000;78:1135–49.

52] Jappe LM, Frank GK, Shott ME, Rollin MD, Pryor T, Hagman JO, et al. Height-ened sensitivity to reward and punishment in anorexia nervosa. Int J Eat Disord2011;44:317–24.

53] Cloninger CR. A systematic method for clinical description and classification ofpersonality variants. A proposal. Arch Gen Psychiatry 1987;44:573–88.

1 Brain R

[

[

[

[

[

[

[

[

[

[

[

[

[

to reward versus punishment. 2014.

38 J. Sheynin et al. / Behavioural

54] Sheynin J, Shikari S, Gluck MA, Moustafa AA, Servatius RJ, Myers CE. Enhancedavoidance learning in behaviorally inhibited young men and women. Stress2013;16:289–99.

55] Myers CE, Vanmeenen KM, McAuley JD, Beck KD, Pang KC, Servatius RJ. Behav-iorally inhibited temperament is associated with severity of post-traumaticstress disorder symptoms and faster eyeblink conditioning in veterans. Stress2012;15:31–44.

56] Balooch SB, Neumann DL, Boschen MJ. Extinction treatment in multiple con-texts attenuates ABC renewal in humans. Behav Res Ther 2012;50:604–9.

57] Galvani PF, Twitty MT. Effects of intertrial interval and exteroceptive feed-back duration on discriminative avoidance acquisition in the gerbil. Anim LearnBehav 1978;6:166–73.

58] Candido A, Maldonado A, Vila J. Effects of duration of feedback on signaledavoidance. Anim Learn Behav 1991;19:81–7.

59] Brennan FX, Beck KD, Servatius RJ. Leverpress escape/avoidance conditioningin rats: safety signal length and avoidance performance. Integr Physiol Behav

Sci 2003;38:36–44.

60] Denny MR, Weisman RG. Avoidance behavior as a function of length of non-shock confinement. J Comp Physiol Psychol 1964;58:252–7.

61] Berger DF, Brush FR. Rapid acquisition of discrete-trial lever-press avoidance:effects of signal-shock interval. J Exp Anal Behav 1975;24:227–39.

[

esearch 283 (2015) 121–138

62] Fernando AB, Urcelay GP, Mar AC, Dickinson TA, Robbins TW. The roleof the nucleus accumbens shell in the mediation of the reinforcing prop-erties of a safety signal in free-operant avoidance: dopamine-dependentinhibitory effects of d-amphetamine. Neuropsychopharmacol 2014;39:1420–30.

63] O’Reilly RC, Frank MJ. Making working memory work: a computationalmodel of learning in the prefrontal cortex and basal ganglia. Neural Comput2006;18:283–328.

64] Moustafa AA, Gluck MA. A neurocomputational model of dopamine andprefrontal-striatal interactions during multicue category learning by Parkinsonpatients. J Cogn Neurosci 2011;23:151–67.

65] Maia TV, Frank MJ. From reinforcement learning models to psychiatric andneurological disorders. Nat Neurosci 2011;14:154–62.

66] Tomer R, Slagter HA, Christian BT, Fox AS, King CR, Murali D, et al. Love to win orhate to Lose? Asymmetry of dopamine D2 receptor binding predicts sensitivity

67] van der Schaaf ME, van Schouwenburg MR, Geurts DE, Schellekens AF, BuitelaarJK, Verkes RJ, et al. Establishing the dopamine dependency of human stri-atal signals during reward and punishment reversal learning. Cereb Cortex2014;24(3):633–42.