This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright


Author's personal copy

Behavioural Processes 87 (2011) 231–236

Contents lists available at ScienceDirect

Behavioural Processes

journal homepage: www.elsevier.com/locate/behavproc

Short report

Training a new response using conditioned reinforcement

Rodrigo Sosa ∗, Cristiano Valerio dos Santos ∗, Carlos Flores
Universidad de Guadalajara, Mexico

ARTICLE INFO

Article history:
Received 3 December 2010
Received in revised form 14 March 2011
Accepted 21 March 2011

Keywords:
Conditioned reinforcement
Function transfer
Association
New response technique
Interdimensional compound stimulus

ABSTRACT

Some of the most frequently used methods in the study of conditioned reinforcement seem to be insufficient to demonstrate the effect. The clearest way to assess this phenomenon is the training of a new response. In the present study, rats were exposed to a situation in which a primary reinforcer and an arbitrary stimulus were paired, and subsequently the effect of this arbitrary event was assessed by presenting it following a new response. Subjects under these conditions emitted more responses compared to their own responding before the pairing and to their responding on a similar operandum, available concurrently, that had no programmed consequences. Response rates were also higher compared to responding by subjects in similar conditions in which there was no contingency (a) between the arbitrary stimulus and the reinforcer, (b) between the response and the arbitrary stimulus, or (c) both. Results are discussed in terms of the necessary and sufficient conditions to study conditioned reinforcement.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Learning theorists have agreed that the concept of conditioned reinforcement refers to a neutral event that acquires reinforcing properties through its relation to a primary reinforcer (e.g., Schlinger and Blakely, 1994). Some authors state that this process is necessary for the development of complex behavioural patterns both in humans and in other animals, since primary reinforcement contingencies are not easily identified in most complex behavioural patterns (e.g., Williams, 1994; Donahoe and Palmer, 1994). The most frequently used procedures to study conditioned reinforcement are extinction (e.g., Skinner, 1938), chained schedules (e.g., Kendall, 1967), second-order schedules (e.g., Findley and Brady, 1965), and observing responses (e.g., Wyckoff, 1969). Nevertheless, these procedures have weaknesses that prevent an unequivocal demonstration of the phenomenon (see Williams, 1994, and Fantino, 1977, for an extensive critique of the procedures used to demonstrate the effect).

The new response technique using conditioned reinforcement is arguably the most adequate way to demonstrate the effect (Williams, 1994; Wyckoff, 1959). However, this procedure requires specific control conditions to make sure that (1) the supposedly neutral event is not reinforcing by itself, (2) performance is not caused by an increase in general activity caused by the mere presentation of a stimulus previously paired with a reinforcer, and (3) performance does not reflect an increase in general activity caused by the withdrawal of the primary reinforcer. Some of these controls were implemented in a number of studies (Bersh, 1951; Crowder et al., 1959; Fox and King, 1961; Keehn, 1962; Knott and Clayton, 1966; Saltzman, 1949; Skinner, 1938; Stein, 1958; Zimmerman, 1959; Snycerski et al., 2005; Sosa and Pulido, submitted for publication). Yet, none of these implemented all the necessary control conditions at the same time. The aim of the present study is to evaluate the new response procedure taking into account all the control conditions that are necessary for an unequivocal demonstration of the phenomenon.

∗ Corresponding authors at: Centro de Estudios e Investigaciones en Comportamiento, Francisco de Quevedo 180, C.P. 44130, Arcos Vallarta, Guadalajara, Mexico. Tel.: +5233 38 18 07 33304; fax: +5233 38 18 07 30x4.

E-mail addresses: [email protected] (R. Sosa), [email protected] (C.V. dos Santos).

If subjects that were exposed to a pairing between an arbitrary stimulus and a primary reinforcer, but whose responses in the test had no relation to the presentation of the arbitrary stimulus (Control 1), responded at high rates, this effect could be attributed to the presentations of the stimulus previously paired with a primary reinforcer causing an increase in general activity. If there were an increase in response rate for subjects exposed neither to a pairing between an arbitrary stimulus and a primary reinforcer nor to a test condition in which their responses led to reinforcement (Control 3), this could be interpreted as an increase in general activity caused by the withdrawal of free reinforcer presentations. If subjects exposed to non-paired stimulus presentations in the training phase and to a contingency between the response and the arbitrary stimulus in the test phase (Control 2) responded at higher rates, the arbitrary stimulus may be assumed to have reinforcing properties by itself. If only the subjects exposed to a pairing between an arbitrary stimulus and a primary reinforcer during training responded more frequently in the test, during which the arbitrary stimulus is contingent on a response (Experimental Group), it could be claimed that the arbitrary stimulus actually had acquired reinforcing properties.

0376-6357/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.beproc.2011.03.001

2. Materials and method

2.1. Subjects

Sixteen naïve, female Wistar Lewis rats, aged three months at the start of the experiment and weighing between 250 g and 300 g, were used. Rats were housed individually and maintained on a 12-h light–dark cycle with lights on at 7:00 am. Subjects had access to water 30 min after the experimental sessions. All rats had continuous access to food in their home cages.

2.2. Apparatus

Two standard rat conditioning chambers (MED Associates, Inc., Model ENV-008) were used. Chambers were housed in sound-attenuating cubicles (MED Associates, Inc., Model ENV-022M). Each chamber was equipped with two water dispensers (MED Associates, Inc., Model ENV-201A); one was installed in the central panel of the right wall and the other was outside the chamber but inside the sound-attenuating cubicle. Each chamber was also equipped with two retractable levers situated on the front wall of the box at the lateral channel adjacent to the water dispenser. Levers were 2.5 cm above the grid floor, and a force of 0.15 N was required to close the microswitch. The arbitrary stimulus was the following set of events, presented simultaneously: a 1-s, 60-dB white noise produced by a sonalert situated on the upper left corner of the back wall, a 1-s white light situated above the water dispenser, and a click produced by the outside water dispenser. Event programming and data recording were conducted using MED-PC IV computer equipment, interface, and software for the Windows environment.

3. Procedure

3.1. Experimental design

A 2 × 2 factorial design was used, with the training contingency and the test contingency as factors. Thus, four groups (n = 4) were used. All subjects were exposed to two phases, training and test, each of which consisted of two conditions. The first condition of training consisted of either paired (Experimental and Control 1) or unpaired (Control 2 and Control 3) presentations of the arbitrary stimulus and the primary reinforcer; the second condition consisted of a reduction in the proportion of primary reinforcer presentations with respect to arbitrary stimulus presentations, to prevent a rapid extinction of the reinforcing value of the putative conditioned reinforcer. In the first condition of the test, the arbitrary stimulus was either contingent (Experimental and Control 2) or non-contingent (Control 1 and Control 3) upon a response; in the second condition of the test, responding had no programmed consequences (Extinction). Two response levers were present throughout the experiment to assess baseline levels of responding.
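As a reading aid, the 2 × 2 arrangement above can be enumerated in a short sketch. The group labels follow the paper; the flag names and the function itself are our own illustration, not the authors' code:

```python
from itertools import product

# The two factors of the 2 x 2 design: whether the arbitrary stimulus was
# paired with water in training, and whether it was response-contingent in test.
PAIRED = (True, False)
CONTINGENT = (True, False)

# Map each factor combination to the group label used in the paper.
LABELS = {
    (True, True): "Experimental",
    (True, False): "Control 1",
    (False, True): "Control 2",
    (False, False): "Control 3",
}

def design(n_per_group=4):
    """Enumerate the four groups (n = 4 rats each) with their factor levels."""
    groups = []
    for paired, contingent in product(PAIRED, CONTINGENT):
        groups.append({
            "group": LABELS[(paired, contingent)],
            "paired_in_training": paired,
            "stimulus_contingent_in_test": contingent,
            "n": n_per_group,
        })
    return groups
```

Only the Experimental Group has both contingencies in place; each control removes exactly one (or both), which is what licenses the inferences spelled out in the Introduction.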

3.2. Training

Both conditions of the training phase consisted of 24 presentations of the arbitrary stimulus according to a random time 120 s schedule. These arbitrary stimuli were either paired with the presentation of water (i.e., presented immediately before it; groups Experimental and Control 1) or non-paired (the arbitrary stimulus was arranged according to an independent random time schedule; groups Control 2 and Control 3). The non-paired condition was similar to a truly random procedure (Rescorla, 1967). In Condition 1, the proportion of water presentations given the arbitrary stimulus was 1.0 and, in Condition 2, this proportion was 0.5. Both conditions were conducted for four sessions. If a water presentation was scheduled, pressing either lever delayed it by 6 s; this contingency was implemented to prevent potential adventitious reinforcement.
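The training schedule can be sketched as a simplified simulation under our own assumptions (this is not the MED-PC program): we model the random time 120 s schedule with exponential inter-stimulus intervals (RT schedules are often programmed as constant-probability time bins instead), we omit the 6-s response-produced delay of water, and `p_water` stands for the proportion of 1.0 (Condition 1) or 0.5 (Condition 2):

```python
import random

def training_session(n_stimuli=24, mean_interval=120.0, p_water=1.0, seed=0):
    """Sketch of one training session for the paired groups: arbitrary-stimulus
    times are drawn from a random-time schedule (exponential inter-event times,
    mean 120 s), and water is delivered immediately after a stimulus with
    probability p_water (1.0 in Condition 1, 0.5 in Condition 2)."""
    rng = random.Random(seed)
    t = 0.0
    events = []
    for _ in range(n_stimuli):
        t += rng.expovariate(1.0 / mean_interval)  # random-time interval
        paired = rng.random() < p_water
        events.append((t, "stimulus+water" if paired else "stimulus"))
    return events
```

For the non-paired groups, one would instead draw the water times from a second, independent schedule of the same kind, as in a truly random control.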

3.3. Test

The test consisted of two conditions (Condition 3 and Condition 4). During Condition 3, responses on one of the levers (the reinforcement operandum) produced the arbitrary stimulus according to a random interval 60 s schedule for groups Experimental and Control 2; responses on the other lever (the inoperative operandum) had no programmed consequences. During this condition, subjects of groups Control 1 and Control 3 were yoked to subjects of groups Experimental and Control 2, respectively, with regard to the presentations of the arbitrary stimulus: each time one of the subjects of groups Experimental and Control 2 fulfilled the requirement imposed by the RI schedule, it produced an arbitrary stimulus for itself and for its yoked counterpart. One session was conducted each day and lasted for a maximum of 48 min, or until subjects of the Experimental Group produced the arbitrary stimulus 20 times. If experimental subjects did not produce the arbitrary stimulus 20 times within one session, subsequent test sessions were conducted until the number of stimuli summed over all sessions was 20. During Condition 4, responses on either lever had no scheduled consequences for any of the groups. This condition consisted of two sessions.
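The random-interval contingency and the yoking arrangement of Condition 3 can be sketched as below. Again this is our own minimal model, not the original control program: the RI 60 s schedule is approximated by exponential "setup" intervals, and the returned times are exactly the times at which a yoked partner would also receive the stimulus:

```python
import random

def ri_test_session(response_times, mean_interval=60.0, seed=0):
    """Sketch of the random-interval 60-s test: a stimulus 'sets up' after an
    exponentially distributed interval, and the next response on the
    reinforcement operandum produces it. In the yoked groups, the partner
    receives the stimulus at these same delivery times, response-independently."""
    rng = random.Random(seed)
    setup = rng.expovariate(1.0 / mean_interval)   # time at which the RI arms
    deliveries = []                                # times the stimulus is produced
    for t in sorted(response_times):
        if t >= setup:                             # schedule armed: deliver now
            deliveries.append(t)
            setup = t + rng.expovariate(1.0 / mean_interval)  # re-arm
    return deliveries
```

The design point the sketch makes concrete is that yoked subjects (Control 1 and Control 3) receive the same temporal distribution of stimuli as their masters while having no response–stimulus contingency of their own.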

4. Results

Fig. 1 shows response rates during training. Subjects of all groups showed low rates of responding (less than one response per minute) in the first phase of the experiment (i.e., training), with no apparent differences between response rates on either lever. Response rates for some subjects (C3, C4, C5, C7, C8, and C16) were somewhat higher in the first sessions of the training phase and decreased in the remaining sessions, which can be interpreted as an effect of the novelty of the initial exposure to the operant chamber and experimental contingencies, probably inducing exploratory behavior.

Fig. 2 depicts response rates during the test phase. During Condition 3, response rate on the reinforcement operandum for the Experimental Group was notably higher than response rates for the subjects of the remaining groups in this condition, and higher than response rates during training for the same subjects. Numbers near the continuous plots show that subjects of the Experimental Group produced the arbitrary stimulus 20 times in two to five sessions, while subjects of Control 2 produced it 12 times at most during the same period. Response rate on the inoperative operandum increased for two subjects of the Experimental Group (C1 and C5) and remained relatively low for the other subjects of that group (C9 and C13). For the animals in the other groups, response rate on the inoperative operandum remained low during this condition. During Condition 4 (Extinction), response rate on the reinforcement operandum decreased for all subjects of the Experimental Group. For the remaining subjects, response rate on both operanda remained low during this condition, except for subject C4, which showed a mild increase in response rate in the first session.

We compared the response rate on the reinforcement operandum in the last session of each of the four conditions with a two-way repeated measures ANOVA, which yielded a significant main effect of condition [F(1.41, 16.91) = 9.49, P = 0.004]¹, a significant main effect of group [F(3, 12) = 7.44, P = 0.004], and an interaction between condition and group [F(3.68, 16.91) = 6.67, P = 0.002]. Bonferroni tests, with the significance corrected for multiple comparisons, revealed differences between the Experimental Group and the control groups, which did not differ among themselves. No within- or between-group effect was observed for responding on the inoperative operandum.

¹ Given that sphericity of the variance–covariance matrix could not be assumed, the degrees of freedom were corrected by the Greenhouse–Geisser factor.

Fig. 1. Response rate on the reinforcement operandum and on the inoperative operandum for all subjects during Conditions 1 and 2 (Training). Missing data in session 7 for subjects C9 and C10, and in session 5 for subjects C13 and C14, were due to a recording error.

Fig. 2. Response rate on the reinforcement operandum and on the inoperative operandum for all subjects during Conditions 3 and 4 (Test). Numbers in Phase 3 indicate how many times the arbitrary stimulus was presented.
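The Greenhouse–Geisser correction behind the fractional degrees of freedom can be illustrated with a short, generic sketch of the classical epsilon formula (the function names and example matrices are ours, not taken from the authors' analysis). The corrected dfs are the uncorrected ones multiplied by ε, which is how 3 and 36 become approximately 1.41 and 16.91 (implying ε ≈ 0.47):

```python
def gg_epsilon(S):
    """Greenhouse-Geisser epsilon from a k x k sample covariance matrix S of
    the k repeated measures (classical Box formula). epsilon = 1 under exact
    sphericity; its lower bound is 1 / (k - 1)."""
    k = len(S)
    grand = sum(sum(row) for row in S) / k**2      # mean of all entries
    diag = sum(S[i][i] for i in range(k)) / k      # mean of the diagonal
    row_means = [sum(row) / k for row in S]
    num = (k * (diag - grand)) ** 2
    den = (k - 1) * (
        sum(s**2 for row in S for s in row)
        - 2 * k * sum(m**2 for m in row_means)
        + k**2 * grand**2
    )
    return num / den

def corrected_df(k, df_error, S):
    """Scale the uncorrected numerator and error dfs by epsilon."""
    eps = gg_epsilon(S)
    return (k - 1) * eps, df_error * eps
```

With k = 4 conditions and 36 uncorrected error df, a perfectly spherical covariance matrix leaves the dfs at (3, 36); departures from sphericity shrink them toward (1, 12), as in the corrected values reported here.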

5. Discussion

It has been stated that the new response procedure produces results too ephemeral to sustain its validity, given the rapid extinction of conditioned value (Snycerski et al., 2005). Contrary to these claims, the present study provides evidence of the transfer of the reinforcement function from a primary reinforcer to an arbitrary stimulus without noticeable extinction effects.

Other studies have obtained evidence of conditioned reinforcement using this preparation, but they all lacked the proper control conditions to rule out other effects. The performance of subjects exposed to the control conditions rules out possible confounding factors such as water-withdrawal effects, inherent reinforcing value of the arbitrary stimulus, and an increase in general activity caused by the presentation of a stimulus that had been previously paired with primary reinforcement. During the test phase, two out of four subjects of the Experimental Group responded both on the reinforcement and on the inoperative operanda, even though the latter lever had no programmed consequences. This result could be at odds with a conditioned reinforcement account, since pressing the inoperative operandum cannot be explained by the presentation of a conditioned stimulus. One possible explanation that is compatible with a conditioned reinforcement account is that, given the spatial proximity of the operanda, this procedure allowed lever presses on both operanda to be temporally contiguous, thus making adventitious reinforcement more likely. This issue could be confronted by imposing a resetting contingency between responses on the inoperative operandum and reinforcement due to responses on the reinforcement operandum (Snycerski et al., 2005). In addition, if we consider that conditioned reinforcement is the result of a Pavlovian association, then second-order conditioning also might explain responding on the inoperative lever (Rescorla, 1980). The lever-pressing feedback itself could have acquired reinforcing properties through its relation to the conditioned reinforcer. Hence, by a generalization process, responding on both levers could become more likely.

However, the necessary and sufficient conditions to produce a reliable demonstration of conditioned reinforcement are not yet clear. Pilot studies in our laboratory failed to observe the effect using different parameters. We could find at least one other failure, in a study by Wyckoff et al. (1959), who did not find significant differences between the performance of experimental and control groups. However, both studies used a fixed-time (FT) schedule in the conditioning phase, and this schedule could promote timing behavior, so that the passage of time, because it was a reliable predictor of the primary reinforcer, could prevent the putative conditioned stimulus from acquiring reinforcing properties. Moreover, in the test phase, both studies exposed experimental subjects to a continuous reinforcement (CRF) schedule, in which each response produced a conditioned reinforcer. A rapid decrease of conditioned reinforcement value could have fostered similar performance in the experimental and control groups. While reinforcement value decreased, response rate declined, making conditioned reinforcement effects virtually invisible. Wyckoff et al. (1959) admitted that, although their study failed in its objective of finding the conditions for observing secondary (i.e., conditioned) reinforcement, they could not argue that conditioned reinforcement was entirely absent.

In contrast, the present study used a random-interval (RI) requirement that promoted high rates of responding in experimental subjects, differentiating their performance from that of subjects in the control groups. Interestingly, experimental subjects in Wyckoff et al.'s (1959) study produced more conditioned reinforcers than experimental subjects in the present study: considering only the first minutes of their test phase, subjects responded at a rate of about 7 responses per minute, which implies the production of 35 conditioned reinforcers, while ours could produce 20 conditioned reinforcers at most. It could be concluded that intermittent reinforcement in the test phase (also used by Zimmerman, 1959) was a potent tool to assess differences between groups or conditions when reinforcer value might decrease rapidly.

Additionally, the fact that we used an interdimensional (tone + light + click) compound as the conditioned stimulus may have affected our results. Given that interdimensional compound stimuli have been shown to result in greater Pavlovian conditioning (Kehoe, 1982), the compound we used may have been more salient compared to other stimuli used in equivalent studies on conditioned reinforcement with the new response technique, which could have brought about a more evident effect of conditioned reinforcement.

Acknowledgements

This research was supported by grant 230501/209354 from the Consejo Nacional de Ciencia y Tecnología (CONACyT).

References

Bersh, P.J., 1951. The influence of two variables upon the establishment of a secondary reinforcer for operant responses. J. Exp. Psychol. 41, 62–73.

Crowder, W.F., Gay, B.R., Fleming, W.C., Hurst, R.W., 1959. Secondary reinforcement or response facilitation? IV. The retention method. J. Psychol. 48, 311–314.

Donahoe, J.W., Palmer, D.C., 1994. Learning and Complex Behavior. Allyn and Bacon, Boston.

Fantino, E., 1977. Conditioned reinforcement: choice and information. In: Honig, W.K., Staddon, J.E.R. (Eds.), Handbook of Operant Behavior. Prentice-Hall, Englewood Cliffs, NJ, pp. 313–339.

Findley, J.D., Brady, J.V., 1965. Facilitation of large ratio performance by use of conditioned reinforcement. J. Exp. Anal. Behav. 8, 125–129.

Fox, R.E., King, R.A., 1961. The effects of reinforcement scheduling on the strength of a secondary reinforcer. J. Comp. Physiol. Psychol. 54, 266–269.

Keehn, J.D., 1962. The effect of post-stimulus conditions on the secondary reinforcing power of a stimulus. J. Comp. Physiol. Psychol. 55, 22–26.

Kehoe, E.J., 1982. Overshadowing and summation in compound stimulus conditioning of the rabbit's nictitating membrane response. J. Exp. Psychol.: Anim. Behav. Proc. 8 (4), 313–328.

Kendall, S.B., 1967. Some effects of fixed-interval duration on response rate in a two-component chain schedule. J. Exp. Anal. Behav. 10, 341–347.

Knott, P.D., Clayton, K.N., 1966. Durable secondary reinforcement using brain stimulation as primary reinforcer. J. Comp. Physiol. Psychol. 61 (1), 151–153.

Rescorla, R.A., 1967. Pavlovian conditioning and its proper control procedures. Psychol. Rev. 74, 71–80.

Rescorla, R.A., 1980. Pavlovian Second-Order Conditioning: Studies in Associative Learning. Erlbaum Associates, New Jersey.

Saltzman, I.J., 1949. Maze learning in the absence of primary reward: a study of secondary reinforcement. J. Comp. Physiol. Psychol. 42, 161–173.

Schlinger, H.D., Blakely, E., 1994. A descriptive taxonomy of environmental operations and its implications for behavior analysis. Behav. Anal. 17, 43–57.

Skinner, B.F., 1938. The Behavior of Organisms. Appleton-Century-Crofts, New York.

Snycerski, S., Laraway, S., Poling, A., 2005. Response acquisition with immediate and delayed conditioned reinforcement. Behav. Proc. 68, 1–11.

Sosa, R., Pulido, M.A., submitted for publication. Response acquisition with delayed conditioned reinforcement. Mexican J. Behav. Anal.

Stein, L., 1958. Secondary reinforcement established with subcortical stimulation. Science 127, 466–467.

Williams, B.A., 1994. Conditioned reinforcement: experimental and theoretical issues. Behav. Anal. 17, 261–285.

Wyckoff, L.B., 1959. Toward a quantitative theory of secondary reinforcement. Psychol. Rev. 66 (1), 68–78.

Wyckoff, L.B., 1969. The role of observing responses in discrimination learning. In: Hendry, D.P. (Ed.), Conditioned Reinforcement. The Dorsey Press, Illinois.

Wyckoff, L.B., Sidowski, J., Chambliss, D.J., 1959. An experimental study of the relationship between secondary reinforcing and cue effects of a stimulus. J. Comp. Physiol. Psychol. 51, 103–109.

Zimmerman, D.W., 1959. Sustained performance in rats based on secondary reinforcement. J. Comp. Physiol. Psychol. 52, 353–358.