
JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR    1985, 43, 383-405    NUMBER 3 (MAY)

CHOICE: A LOCAL ANALYSIS

WILLIAM VAUGHAN, JR.

HARVARD UNIVERSITY

Analyses of free-operant choice usually employ one of two general procedures: the simple concurrent procedure (i.e., a concurrent variable-interval variable-interval schedule) or the concurrent-chains procedure (i.e., concurrently available initial links, each leading to an exclusively available terminal link). Theories about choice usually focus on only one of the two procedures. For example, maximization theories, which assert that behavior is distributed between two alternatives in such a way that overall rate of reinforcement is maximized, have been applied only to the simple concurrent procedure. In the present paper, a form of the pairing hypothesis (according to which pairings between one stimulus and another affect the value of the first, and pairings between responses and reinforcers affect the value of the former) is developed in a way that allows it to make qualitative predictions with regard to choice in a variety of simple concurrent and concurrent-chains procedures. The predictions include matching on concurrent variable-interval variable-interval schedules, preference reversal in the self-control paradigm, and preference for tandem over chained terminal links.

Key words: choice, reinforcement, conditioned reinforcement, matching, maximizing, melioration, concurrent schedules, concurrent-chains schedules, feedback functions

INTRODUCTION

Choice procedures involve studying how an organism distributes its behavior across two or more simultaneously available alternatives. A result that has proved central was reported by Herrnstein (1961): When exposed to two concurrently available variable-interval (VI) schedules, pigeons distributed their responses between the two keys in approximately the same proportion as reinforcers were received. That is, if twice as many reinforcements were received on the left as on the right, pigeons tended to respond about twice as often on the left as on the right. This may be expressed as:

\frac{P_L}{P_R} = \frac{R_L}{R_R} \qquad (1)

Preparation of this manuscript was supported by NSF Grants IST-81-00404 and IST-8409824 to Harvard University. I thank the following for comments and encouragement: W. M. Baum, M. L. Commons, M. C. Davison, E. Fantino, R. J. Herrnstein, S. E. G. Lea, and J. A. Nevin. Reprints may be obtained from William Vaughan, Jr., Department of Psychology and Social Relations, Harvard University, Cambridge, Massachusetts 02138.

Here, PL and PR represent responses on the left and right, and RL and RR represent reinforcers received on the left and right. Subsequently it was found that the distribution of time also tends to equal, or match, the distribution of obtained reinforcers (Brownstein & Pliskoff, 1968; Catania, 1966):

\frac{T_L}{T_R} = \frac{R_L}{R_R} \qquad (2)

TL and TR represent times spent on the left and right sides.
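As a quick numerical illustration of Equations 1 and 2, the following sketch uses hypothetical session totals (not data from Herrnstein, 1961):

```python
# Hypothetical session totals illustrating the matching relation (Equations 1 and 2).
P_L, P_R = 2400, 1200   # responses emitted on the left and right keys
R_L, R_R = 40, 20       # reinforcers obtained on the left and right
T_L, T_R = 30.0, 15.0   # minutes spent on the left and right

# All three ratios equal 2.0: responses and time match obtained reinforcement.
print(P_L / P_R, T_L / T_R, R_L / R_R)
```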

Because matching is such an orderly phenomenon, a good deal of effort has gone into both generalizing it to new situations (such as single-key schedules, multiple schedules, and concurrent-chains schedules) and explaining why it occurs in the first place. For example, it has been said that matching occurs because that is the distribution of behavior which maximizes overall rate of reinforcement (Rachlin, Green, Kagel, & Battalio, 1976; Staddon & Motheral, 1978), a process that may be called global maximization. Others (e.g., Hinson & Staddon, 1983; Shimp, 1969) have suggested that matching results from a


process, termed momentary maximizing, of always emitting that choice response with the highest momentary probability of reinforcement.

There are two procedures commonly used in the study of choice, the simple concurrent procedure and the concurrent-chains procedure. In the case of the simple concurrent procedure, two or more alternatives are simultaneously available. Responses on each give rise, according to some schedule, to primary reinforcement. In the case of the concurrent-chains procedure, responses give rise, according to some schedule, to some further stimulus in whose presence primary reinforcement is delivered, according to some different schedule (the "terminal link"). During that terminal link the other alternatives are unavailable (i.e., the alternative keys are darkened). The primary difference between the two procedures, then, is that whereas choice responses are directly reinforced in the case of the simple concurrent procedure, in the case of the concurrent-chains procedure choice responses produce other stimuli, in whose presence primary reinforcement is delivered. The matching relation discussed above was found using the simple concurrent procedure, with VI schedules on each of the alternatives. On the concurrent-chains procedure, it becomes necessary to incorporate the effect of delayed reinforcement into one's analysis of choice. The usual tactic has been to assume that matching of some form is occurring on such schedules, and to hypothesize just what the variables are to which the organism is matching.

However, there has occurred a certain division in the study of choice: In general, those who study choice in the concurrent-chains situation tend to assume some form of matching as an axiom, while that very assumption is being challenged by some others (e.g., advocates of global maximization or momentary maximization) studying choice using the simple concurrent procedure. The tactic pursued in the present paper involves taking a particular form of what is sometimes called the pairing hypothesis, and developing it in such a way that both the process of melioration (Herrnstein & Vaughan, 1980) and the conditioned values of stimuli present during the terminal links of concurrent chains may be deduced. Because, on simple concurrent schedules, melioration leads to matching at equilibrium, the present approach simultaneously addresses two domains of data, and so provides one possible link between them.

MATCHING ON CONCURRENT CHAINS, MAXIMIZATION, AND MELIORATION

Four models of choice on concurrent-chains schedules will be reviewed, primarily for purposes of orientation. In addition, predictions of the model to be developed here will be compared with predictions of these other models in terms of certain known data.

Baum and Rachlin (1969) suggested a particularly straightforward model: The ratio of times in two situations equals the ratio of the products of reinforcer rate, amount (Ai), and immediacy (Ii):

\frac{T_L}{T_R} = \frac{R_L A_L I_L}{R_R A_R I_R} \qquad (3)

Although this was suggested primarily with respect to simple concurrent schedules, the inclusion of immediacy (the reciprocal of delay) logically implies the concurrent-chains procedure. At the time it was proposed, this formulation provided an integration of data from a number of experiments (e.g., Catania, 1963; Chung & Herrnstein, 1967). For Baum and Rachlin, Equation 3 constitutes an axiom, in the sense that no attempt is made to deduce it from other, more primitive assumptions.

A second model, the delay-reduction hypothesis, was introduced by Fantino (1969a), who pointed out that certain earlier formulations had relied largely on data from concurrent-chains experiments in which the initial links were concurrent VI 60-s schedules. Fantino reasoned that with unequal terminal links (so that one was better than the other), as equal initial links became richer, behavior should shift toward the better side, an outcome inconsistent with these earlier models. He reported data in accord with this intuition, and suggested an equation that would handle these data. As formulated by Squires and Fantino (1971), this reads:

\frac{T_L}{T_R} = \frac{R_L (T - t_{2L})}{R_R (T - t_{2R})} \qquad (4)

Here, T is the average time to primary reinforcement from the onset of the initial links, t2L and t2R are the left and right terminal-link times, and RL and RR are the overall rates of reinforcement on the left and right keys. That is, RL equals the number of reinforcers during a left terminal link, divided by the sum of the left initial-link schedule and the left terminal link. This formulation has been applied to a wide range of experiments over more than 10 years.

A third approach was taken by Davison and Temple (1973), who reported the results of a large number of concurrent-chains experiments with fixed-interval (FI) terminal links. They proposed the following formulation as an alternative to that of Squires and Fantino:

\frac{T_L}{T_R} = E \cdot \frac{R_L t_{2R}}{R_R t_{2L}} \qquad (5)

Here, RL and RR have the same meanings as for Squires and Fantino. Times in the terminal links, inclusive of reinforcer times, are represented by t2L and t2R. E is a factor reflecting the ratio between obtained and programmed terminal-link entries. For example, if the initial links were VI 30 s on the left and VI 60 s on the right (so that the programmed terminal-link entries were in a ratio of 2 to 1), but the left terminal link was entered three times as often as the right, E would equal 1.5.

The fourth approach to be discussed is that of Killeen (1982), whose model of behavior on concurrent chains is based primarily on theoretical considerations (incentive theory). The equation that Killeen suggested is:

\frac{T_L}{T_R} = \frac{R_L[\exp(-q t_{2L}) + 1/t_{2L}]}{R_R[\exp(-q t_{2R}) + 1/t_{2R}]} \qquad (6)

In this equation Ri and t2i have the same meanings as for Squires and Fantino. Although Killeen usually left the parameter q at 0.125, if changing q and/or introducing a bias term increased the amount of variance accounted for by more than 2%, he made such changes when addressing existing data.

Notice that the four models discussed above all assume some form of matching: The ratio of behavior on two schedules equals the ratio of some function of the schedule parameters. This is the sense in which these approaches take matching as an axiom. Maximization theory, whether global or momentary, assumes that no form of matching is basic; rather, it is assumed that animals behave in ways that maximize overall value or the probability of reinforcement at each response. Matching, by this account, just happens to coincide with maximization in a number of situations. On simple concurrent variable-ratio (VR) schedules, nearly exclusive preference is exhibited for the better side (Herrnstein & Loveland, 1975), a strategy consistent with both forms of maximizing (as well as with matching). Using a computer simulation of concurrent VI schedules, Rachlin et al. (1976) reported that, at matching, global rate of reinforcement is maximized on such schedules. The same result was shown analytically (making certain assumptions regarding the form of the VI feedback function) by Rachlin (1978) and by Staddon and Motheral (1978).

Herrnstein and Vaughan (1980) suggested an alternative interpretation of these data, which they termed melioration. They proposed that rather than maximizing overall rate of reinforcement (or the probability of reinforcement for each response), organisms are sensitive to the local rates of reinforcement on each of two alternatives (i.e., number of reinforcers produced by responding on a side divided by time on that side). If the local rate on one side is higher than on the other, melioration specifies that more time will be spent on the better side. This process can account for maximization on concurrent VR schedules and for matching on concurrent VI schedules. In addition, it is consistent with a number of procedures explicitly designed to discriminate between melioration and global maximization (Boelens, 1984; Herrnstein & Vaughan, 1980; Vaughan, 1981; Vaughan, Kardish, & Wilson, 1982; also see Herrnstein & Heyman, 1979; Lea, 1976; Mazur, 1981).
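A minimal simulation of melioration on concurrent linear VI schedules, written as a sketch rather than as the authors' own procedure: relative time is shifted toward whichever side currently has the higher local rate of reinforcement, and the allocation settles where the local rates are equal, which is the matching distribution.

```python
# Melioration sketch: shift relative time toward the side with the higher local
# reinforcement rate (reinforcers per unit of time spent on that side).
R_L, R_R = 90.0, 60.0    # programmed rates per hour (linear VI 40 s and VI 60 s)
t_L = 0.2                # initial relative time on the left
step = 0.0005            # size of each adjustment

for _ in range(5000):
    local_L = R_L / t_L            # local rate on the left
    local_R = R_R / (1.0 - t_L)    # local rate on the right
    if local_L > local_R:
        t_L = min(t_L + step, 0.999)
    elif local_L < local_R:
        t_L = max(t_L - step, 0.001)

# Both values are approximately 0.6: time allocation matches relative reinforcement.
print(round(t_L, 3), R_L / (R_L + R_R))
```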

A GENERALIZATION OF MELIORATION

The particular generalization of melioration pursued here is based in part on the attempt (Vaughan, 1982) to derive melioration from a strengthening process similar to that of Rescorla and Wagner (1972). Vaughan suggested that the value of a key (i.e., its strength as a conditioned reinforcer) was a function of the rate at which transitions occurred from that key to food. Let V1L represent the value of the left key in a concurrent key condition, and V2L represent the value of food produced by responding on that key. The first assumption to be made here is that V1L is a negatively accelerated function of r1L (short for RL/TL, reinforcers on the left divided by time on the left), approaching asymptote at V2L for high rates of food delivery. If reinforcement rate were to fall to zero, it is further assumed that the value of the key would approach its unconditioned value, represented by V'1L. For a key this unconditioned value is assumed to be zero. Figure 1 shows the case in which (local) rate of reinforcement on the left (r1L) is greater than on the right (r1R): The value of the left key (V1L) will be greater than that of the right (V1R). A bent arrow (at V'1) indicates the unconditioned value of the stimulus under consideration. Arrows leading away (V1L and V1R) signify dependent variables (in terms of the specific function under consideration), and arrows leading to the function signify independent variables. As will be seen below, in some cases independent variables for some functions are dependent on other functions.

The function shown in Figure 1 is a simple hyperbola:

V_1 = \frac{r V_2}{r + a} \qquad (7)

Here, V1 is the value of the stimulus under consideration, V2 is the value of the stimulus (at the moment it is presented) to which transitions are made, r is the rate of transition (in transitions per second), and a is a parameter (in this case 0.1) that controls the rate at which the asymptote is approached (see Appendix 1 also).

Fig. 1. Value of the left key (V1L) and right key (V1R) as a function of rate of transition on the left (r1L) and right (r1R) to a stimulus whose onset has value V2. V'1, assumed to be zero for a keylight, is the unconditioned value of Stimulus 1. Dependent variables for this function are shown as arrows leading away, and independent variables are shown leading to the function. A bent arrow signifies the unconditioned value of a stimulus. In the specific case shown, the rate of reinforcement on the left is 225 per hour and on the right is 100 per hour, leading to a higher value on the left than on the right.

On a simple concurrent VI VI schedule, the

number of reinforcers delivered on one side is relatively independent of the amount of time spent there, given that changeovers occur moderately often. Consider, for example, a concurrent VI 40 s on the left and VI 60 s on the right. On the latter, a reinforcer will be received approximately once every minute, or 60 times per hour. If, in an hour's session, only one third of that hour is spent on the right, the local rate on the right will be 60 reinforcers divided by one third, or 180 reinforcers per hour. As this suggests, local rates of reinforcement are a function both of the VI schedules in effect and of the distribution of behavior. As usually programmed, the overall rate of reinforcement obtained on a VI schedule also depends somewhat on the distribution of behavior. However, in the case of what have been called linear VI schedules (Vaughan, 1976, 1982), local rate of reinforcement on a side does in fact equal the overall programmed rate divided by relative time on that side. Figure 2 shows local rates of reinforcement on the left and right, given a linear VI 40 s on the left and a linear VI 60 s on the right, as a function of relative time on the left. In this Figure, r1L and r1R represent what the local rates of reinforcement would be if relative time on the left (tL) were to equal 0.4, and RL and RR represent overall programmed rates of reinforcement on the left and right of 90 and 60 per hour, respectively.

Fig. 2. Rate of transition (from keylight to food) on the left key (r1L) and right key (r1R) as a function of the overall programmed rates of reinforcement on the left (RL: 90 per hour, or a VI 40 s) and right (RR: 60 per hour, or a VI 60 s) as well as relative time on the left (tL, 0.4 in this case). With linear VI schedules, r1L = RL/tL and r1R = RR/(1 - tL), or RR/tR.

The dependent variable of Figure 2 is local

rate of reinforcement (transitions per hour), the independent variable of Figure 1. This allows the two figures to be conjoined, as in Figure 3, which shows the value of the left and right keys as a function of both the VI schedules in effect and the distribution of time on them. Here it can be seen that at 0.4 relative time on the left, the value of the left key will be greater than that of the right key.
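The numbers in Figures 1 through 3 can be checked directly. The sketch below assumes the parameter values given in the text (a = 0.1, linear VI 40-s and VI 60-s schedules, relative time on the left of 0.4); the value assigned to food (V2) is arbitrary and does not affect the comparison.

```python
# Local transition rates and key values at relative time t_L = 0.4,
# using the linear VI feedback rule and Equation 7 with a = 0.1.
R_L, R_R = 90.0 / 3600, 60.0 / 3600   # programmed rates, transitions per second
t_L = 0.4
a, V2 = 0.1, 4.0                      # V2: nominal value of food at its onset

r1L = R_L / t_L            # 0.0625 per second, i.e., 225 per hour (Figure 1)
r1R = R_R / (1.0 - t_L)    # about 0.0278 per second, i.e., 100 per hour

V1L = r1L * V2 / (r1L + a)            # Equation 7
V1R = r1R * V2 / (r1R + a)
print(round(r1L * 3600), round(r1R * 3600), V1L > V1R)   # 225 100 True
```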

In Vaughan (1982), changeovers from a key lower in value to one higher in value were assumed to strengthen the tendency to make that changeover response, whereas changeovers in the opposite direction were assumed to weaken the tendency to make that latter response. It was further assumed that if the strengths of changeover responses were modified as just mentioned, more time would come to be distributed to the key with greater value, because of those changes in strengths. One formalization of these assumptions may be expressed as:

\frac{d}{dt}\left[\frac{T_L}{T_L + T_R}\right] = f_x(V_{1L} - V_{1R}) \qquad (8)

Fig. 3. Conjunction of Figures 1 and 2, showing values on the left (V1L) and right (V1R) as a function of RL, RR, and tL. Note that r1R and r1L are dependent variables with respect to the lower part of the figure and are independent variables with respect to the upper part.

The left side of this equation represents the rate at which relative time on the left changes. The function f_x is assumed to be differentiable, monotonically increasing, and to have the property that f_x(0) = 0. According to this equation, if the value on the left exceeds that on the right, the rate of change of relative time on the left will be positive (or under certain conditions, zero), and vice versa. If the value on the left equals that on the right, the rate of change of relative time on the left will be zero. The subscript x represents a vector parameter, consisting of a number of components, possibly including relative times on the alternatives and values of those alternatives. Its only role is to imply that the function f may not be the same under all conditions, though it must always satisfy the above constraints. Myerson and Hale (1984), for example, have shown for some conditions that the rate of change on a side approaches zero as exclusive preference is approached. Equation 8 is represented in two ways in Figure 4, for the case of two keys, of values V1L and V1R. Panel A shows the derivative of relative time on the left, with respect to time, as a function of V1L - V1R.


Fig. 4. A: Derivative of relative time on the left, with respect to time, as a function of difference between value on the left and right. B: An alternative representation, with derivative of relative time on the left shown as an upward arrow, indicating rate of change of tL. Although it is assumed the function must satisfy certain constraints, the function itself may change as a function of certain quantities (e.g., V1L or V1R).

A slightly different representation is shown in Panel B. Here the x axis again represents value, but the origin is shifted to the value on the right, V1R. Rate of change in relative time on the left can then be read off as the value of the function corresponding to V1L. The upward arrow below V1L corresponds to the rate (and direction) at which relative time on the left changes. Scale values are omitted from this figure because few relevant data are available.

Figures 3 and 4 may now be conjoined, as shown in Figure 5. The abscissa of the melioration function has been lined up with relative time on the left (tL). Note that there are now two functions for value (on the left and right keys), shown separately for simplicity. These two values constitute the input for the process of melioration, whose output constitutes a rate of change of relative time on the left. In the example illustrated, value on the left exceeds that on the right, so relative time on the left will increase (the arrow for change of tL is nonzero and points up). Equilibrium is reached when the two local rates of reinforcement are equal, because at that point the values of the two keys are equal. The equilibrium is stable because deviations from that distribution of behavior change the local rates in such a way that the above distribution is again approached. Equality of the two local rates of reinforcement is equivalent to Equation 2, matching of time ratios to reinforcer ratios. The model shown in Figure 5, then, constitutes a possible explanation of why such matching occurs.
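The equilibrium claim can also be checked numerically: under linear VI feedback and Equation 7, the allocation at which the two key values are equal is the matching allocation. This is a sketch, assuming a = 0.1 and an arbitrary food value; bisection is used only for convenience.

```python
# Find the relative time t_L at which the two key values (Equation 7) are equal,
# given linear VI feedback: r1L = R_L / t_L and r1R = R_R / (1 - t_L).
R_L, R_R = 90.0 / 3600, 60.0 / 3600   # VI 40 s and VI 60 s, transitions per second
a, V2 = 0.1, 4.0

def value(r):
    return r * V2 / (r + a)           # Equation 7 (unconditioned key value of zero)

lo, hi = 0.001, 0.999
for _ in range(60):                   # bisection on V1L - V1R
    t_L = (lo + hi) / 2.0
    if value(R_L / t_L) > value(R_R / (1.0 - t_L)):
        lo = t_L                      # left key still more valuable: equilibrium lies at larger t_L
    else:
        hi = t_L

# Both values are 0.6: equal key values imply equal local rates, i.e., matching.
print(round(t_L, 4), round(R_L / (R_L + R_R), 4))
```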

Before attempting to extend this model to

Fig. 5. Conjunction of Figures 3 and 4(B). With VI 40 s on the left key and VI 60 s on the right key, if relative time on the left equals 0.4, the value on the left will be greater than on the right, so relative time on the left should increase. See text for details.

concurrent-chains experiments, it is worth pointing out a number of its properties. The use of linear VI schedules is not integral to the model, but simply serves to make the mathematics easier. In principle, any feedback function could be employed. Second, according to this model the direction in which the distribution of behavior shifts does not depend on overall rate of reinforcement, but only on local rates, as specified in Equation 8. The model is thus to some extent confirmed by experiments that favor melioration over global maximization (e.g., Boelens, 1984; Herrnstein & Heyman, 1979; Lea, 1976; Mazur, 1981; Vaughan et al., 1982). Third, the transition to reinforcement or to a conditioned stimulus is assumed to strengthen immediately preceding behavior (e.g., Ferster & Skinner, 1957; Morse, 1966; Skinner, 1938) or to change the value of contiguous stimuli. The model is thus an example of two-factor theory. Finally, rather than


specifying that the distribution of behavior will, say, match the distribution of obtained reinforcers, the model specifies whether or not behavior will tend to shift (and if so, in what direction). This allows the model to make predictions regarding situations that involve unstable, neutral, and stable equilibrium (e.g., Vaughan, 1981; Vaughan & Herrnstein, in press), a property that is usually only implicit in most other models.

As mentioned above, the only difference be-

tween simple concurrent VI schedules and concurrent-chain schedules is that in simple concurrents, reinforcers are produced immediately by certain responses during the choice phase, but in concurrent chains a signaled delay of some sort is interposed between those choice responses and the availability or delivery of reinforcement. Figure 6 shows a schematic representation of the concurrent-chains procedure. The two initial links are concurrently available, and a certain proportion of time will be spent on each, as determined by the choice responses. Time spent on each initial link gives rise, with some local rate, to the terminal link for that side (usually signaled by a change in key color), while the other side is darkened. On some schedule (e.g., VI), reinforcement is then made available. Following this, the initial links are usually reinstated. In the model discussed above, during the choice phase each side has a value that is a function of the rate of transition to reinforcement, as well as of the value of reinforcement itself. A straightforward generalization of this model would involve the interposition of another such function between the choice phase and reinforcement, to parallel the terminal link schedule shown in Figure 6. In this way, the value of a terminal link would be a function of both the rate at which food was presented while it is in force and the value of food. The value of an initial link, in turn, would be a function of both the rate at which the terminal link was presented and the value of that terminal link.

Fig. 6. Schematic representation of the concurrent-chains procedure. The left and right initial links are concurrently available; terminal links are mutually exclusive. Responses on each initial link produce transitions to the respective terminal link with some frequency; each terminal link will then give rise to reinforcement with some frequency. See Appendix 4 for a discussion of the effect of events following reinforcement.


Recall that a hyperbola (Equation 7) was used to model the value of a stimulus as a function of both the rate of transition to a second stimulus and the value of that second stimulus. If it is assumed that the unconditioned value of stimulus 1 (V'1) need not be zero, Equation 9 is used:

V_1 = V'_1 + \frac{r(V_2 - V'_1)}{r + a} \qquad (9)

(See also Appendix 2.) According to this equation, V1 (the conditioned value of Stimulus 1) is bounded by V'1 (the unconditioned value of Stimulus 1) as r approaches zero, and by V2 (the conditioned value of Stimulus 2) as r grows large. (V2, in turn, is calculated in an analogous manner, being bounded by V'2 and V3.)

Figure 7 shows, for two values of a, a number of graphs of Equation 9. On the left the equation is plotted in terms of r, rate of transition to the next stimulus; on the right it is plotted in terms of 1/r, or duration of the stimulus (in seconds). For the upper two functions, V2 is greater than V'1; for the lower two functions, V2 is less than V'1. In the case of the upper two functions, a value of a equal to 0.2 generates a curve below that for a equal to 0.1. In what follows, the former will be used for FI terminal links, and the latter for VI terminal links. This is one of a number of ways to generate a preference for VI over FI terminal links, given equal rates of reinforcement (Herrnstein, 1964).

Fig. 7. Plots of Equation 9. Top: V2 > V'1. Bottom: V2 < V'1. Left: as a function of r (rate of transition). Right: as a function of 1/r (duration). In each case two values of a are shown, 0.1 for the case of VI schedules and 0.2 for the case of FI schedules.

Figure 8 makes explicit the way in which the value of an initial-link stimulus is assumed to be functionally related to the value of a terminal-link stimulus, which in turn is assumed to depend on the value of reinforcement (V3) as well as on the rate at which reinforcement occurs (top: r2), or, equivalently, on the duration of the terminal-link stimulus (bottom: 1/r2). V3, the value of reinforcement at the moment it is presented, constitutes the asymptote (i.e., the maximum possible value) for the value of the terminal link. With r2 equal to 1800 transitions per hour (or 0.5 transitions per second), V2 is the value of entering the terminal link. This in turn constitutes the asymptote for the value of the initial link. It is immaterial whether rate (top) or duration (bottom) is employed.

Fig. 8. Representation of how the value (V2) of a terminal-link stimulus (right functions) varies as a function of the value of reinforcement (V3) and the rate of transition to reinforcement (top: r2), or, equivalently, the duration of the terminal link (bottom: 1/r2). The function determining value for the initial link is shown on the left.

The value of reinforcement as a function of

its duration will also fit within this framework. Let the unconditioned value of food (the value of food at the moment it is presented, given that it will be on arbitrarily long) be positive; for convenience, a value of 10 will be employed here. Further, let the value of the keylight following food be zero. Then the

lower right portion of Figure 7, with V'1 equal to 10, V2 equal to zero, and a equal to 0.2 represents the value of food as a function of its duration (a will equal 0.2 if food is to be kept on for a fixed duration). For example, if food is on for a duration of 4 s (1/r3), it will have a value of 4.44 (V3), as shown in Figure 9. Note that nothing is being said about the value of terminal-link stimuli, or reinforcers, after their onset. Halfway through 4 s of food, for example, it is not being said that the momentary value of food is equivalent to the onset of 2 s of food. Only the values of these stimuli at their onset are being addressed. With regard to initial-link stimuli, however, the assumption is made that value is approximately constant over time, because reinforcement (or the transition to a terminal link) is approximately equiprobable at each moment of time. These considerations imply that the model applies only to concurrent VI VI initial links, and not to VI FI or FI FI initial links.
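The 4.44 figure follows directly from Equation 9; the sketch below simply evaluates it (the function and argument names are mine).

```python
def value(r, V_next, V_uncond, a):
    """Equation 9: conditioned value of a stimulus whose onset leads, at rate r,
    to a stimulus of value V_next; bounded by V_uncond at r = 0 and by V_next
    as r grows large."""
    return V_uncond + r * (V_next - V_uncond) / (r + a)

# 4 s of food: unconditioned value 10, followed by a keylight of value 0,
# with a = 0.2 because food stays on for a fixed duration.  1/r3 = 4 s.
V3 = value(r=1 / 4, V_next=0.0, V_uncond=10.0, a=0.2)
print(round(V3, 2))   # 4.44
```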

Fig. 9. The value of reinforcement (V3) as a function of its unconditioned value (V'3), its duration (1/r3), and the value of the following stimulus (V4), here assumed zero.

Figure 10 shows the complete model, for the case of concurrent VI 60-s initial-link sched-

ules, leading to an FI 2 s (left) and an FI 4 s (right). Function A shows the value of food on the left (V3L) as a function of its duration (1/r3L), in this case 4 s. V3L is then the asymptote of the left terminal link (Function B) plotted in terms of duration. With a left terminal link consisting of an FI 2 s (1/r2L), V2L represents the asymptote of the left initial link. Function C represents the value of the left initial link as a function of its rate of transition (r1L) to the left terminal link. That value (V1L) is then one of two inputs to the process of melioration. The rate of transition to the left terminal link, r1L, depends on the left initial-link schedule (Function D) in conjunction with the distribution of time on that schedule (0.4 in this case). On the right, reinforcement is 4 s in duration (1/r3R). G shows the value of the right initial link, which depends on the rate of transition (r1R) from that stimulus to the right terminal link. The right initial-link schedule (H) and relative time on the right (0.6) in turn determine r1R.

The values of the left and right initial links feed into the process of melioration (I), whose output is an arrow showing the rate and direction of change in relative time on the left (tL). Under the conditions shown, relative time on the left will increase. Equilibrium will be reached when relative time on the left equals 0.58. At this point the local rate of reinforcement on the left (rate of transition from the initial link to the terminal link) will be less than that on the right (60/0.58 = 103 versus 60/0.42 = 143 per hour). These unequal rates of reinforcement are such that they just balance the unequal values of the terminal links, so that the two initial links are equal in value (V1L = V1R). At this point the two criteria that are necessary and sufficient for stable equilibrium are met: The initial links are equal in value, and deviations from that point set up conditions tending to drive behavior back to that point. Note that this abstract form of matching is a theorem, rather than an axiom, within the present model.

Fig. 10. Analysis of a concurrent-chains procedure, assuming on the left key (top functions) 4 s of reinforcement (A), an FI 2-s terminal link (B), and a VI 60-s initial link (D), giving rise to a value for the left initial link (C). On the right key (bottom functions) is shown 4 s of reinforcement (E), an FI 4-s terminal link (F), and a VI 60-s initial link (H), giving rise to a value on the right initial link (G). Given 0.4 relative time on the left, the present model specifies that relative time on the left will increase (I). See text for details.

Given concurrent linear VI schedules, it is

possible to represent analytically relative time on the left at the point where behavior is stable, as a function of overall transitions per second programmed on the left (RL) and right (RR) concurrent schedules, and the values on the left (V2L) and right (V2R) to which transitions are made:

\frac{T_L}{T_L + T_R} = \frac{R_L R_R (V_{2L} - V_{2R}) + a R_L V_{2L}}{a (R_L V_{2L} + R_R V_{2R})} \qquad (10)

(This equation is derived in Appendix 3.) In the present case, with concurrent VI 60-s initial links, RL = RR = 0.0167 (60/3600) transitions per second, V2L = 3.17, and V2R = 2.47. Substituting these values into Equation 10 gives TL/(TL + TR) = 0.58.
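The chain of values in Figure 10 and the equilibrium given by Equation 10 can be reproduced in a few lines. This is a sketch under the parameter values stated in the text (a = 0.1 for VI initial links, a = 0.2 for FI terminal links and for food of fixed duration); the function name is mine.

```python
def value(r, V_next, V_uncond=0.0, a=0.1):
    """Equation 9."""
    return V_uncond + r * (V_next - V_uncond) / (r + a)

# Food: 4 s on both sides (unconditioned value 10, followed by a stimulus of value 0).
V3 = value(1 / 4, 0.0, V_uncond=10.0, a=0.2)      # about 4.44

# Terminal links: FI 2 s on the left, FI 4 s on the right (a = 0.2 for FI schedules).
V2L = value(1 / 2, V3, a=0.2)                      # about 3.17
V2R = value(1 / 4, V3, a=0.2)                      # about 2.47

# Equation 10: equilibrium relative time on the left with VI 60-s initial links.
a = 0.1
R_L = R_R = 60 / 3600                              # 0.0167 transitions per second
t_L = (R_L * R_R * (V2L - V2R) + a * R_L * V2L) / (a * (R_L * V2L + R_R * V2R))
print(round(V2L, 2), round(V2R, 2), round(t_L, 2))   # 3.17 2.47 0.58
```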

It can be seen from Figure 10 that the present model is both very general (in the sense of being unambiguously applicable to a wide range of experiments) and conceptually very simple. Suppose, for example, that one terminal link consisted of two signaled FI schedules leading to food (chained FI FI). It would be necessary only to insert another function into Figure 10. Similarly, a simple concurrent experiment would involve removing Functions B and F. The same result is approached continuously if 1/r2L and 1/r2R are made to approach zero continuously: There is no discontinuity upon going from a nonzero to a zero terminal link. Functions A, B, C, E, F, and G are all plots of Equation 9.

Notice that the values of the stimuli following food (V4L and V4R) have been left at zero. Technically, these should be set to the values of the keylights during the initial link. At the level of resolution attempted here, however, doing so would serve only to complicate analyses without significantly affecting the qualitative predictions (see Appendix 4).

To summarize, the model is composed of only two equations (Equations 8 and 9), along


with a set of subsidiary assumptions (e.g., that the unconditioned value of a keylight is zero). The use of linear VI schedules is not necessary, but serves to simplify the mathematics. Various experimental situations may be modeled by means of the appropriate concatenation of Equation 9, followed by the application of Equation 8 (or, equivalently, Equation 10). The simplicity of the model derives directly from the fact that only local events affect behavior, or the values of stimuli; the effect of delayed reinforcement is assumed to be mediated by conditioned reinforcers. The primary question remaining, then, is how the model compares with others in addressing actual data.

IMPLICATIONS

For several areas of research, experimental findings will be discussed, the workings of the present model will be explicated graphically, and predictions derived from the present model along with those derived from the previous models mentioned in the introduction will be examined. Finally, three as yet untested implications of this model will be mentioned.

In making predictions from Davison and Temple's (1973) model, their parameter E was left at one. For the Squires and Fantino (1971) model, the suggestion of Navarick and Fantino (1976) was incorporated: If reinforcer durations in the two terminal links differed, and those terminal links were nonzero in duration, the terminal link leading to the longer reinforcer was multiplied by the ratio of the smaller to the larger reinforcer. For Killeen's (1982) model, his Equation 13 was incorporated to account for duration of reinforcement. In addition, his parameter q was left constant at 0.125, and no bias term was employed. No attempt has been made to apply any model to situations for which it was obviously not designed.

Although leaving all parameters fixed may be judged inappropriate in some cases, doing so is equivalent to asking whether a given model could make accurate qualitative predictions for an as yet untested situation. This is consistent with the traditional focus on prediction and control that one finds within the experimental analysis of behavior. An alternative exercise, not undertaken here, would involve incorporating analogous free parameters into each of the models and fitting them to some set of data.

FI Terminal Links Kept at a Constant Ratio

The first area to be considered is that of concurrent-chains experiments with FI terminal links kept at a constant ratio. Given an FI terminal link, the reciprocal of the FI value is equivalent to immediacy of reinforcement, which is one of the variables to which Chung and Herrnstein (1967) suggested that birds match. As mentioned earlier, Baum and Rachlin (1969) built this matching assumption into their model. One implication of this assumption is that, given a concurrent-chains experiment with FI terminal links kept at a constant ratio (e.g., 1 to 2), preference should remain constant as the absolute size of the terminal links is varied.

This approach faces two problems, one logical and one empirical. Consider a concurrent-chains experiment with VI 60-s initial links, an FI 5-s left terminal link and an FI 10-s right terminal link, each leading to 4 s of reinforcement. Matching to immediacy of reinforcement implies a 2 to 1 preference for the shorter terminal link, even if each terminal link is cut in half (inasmuch as their ratio remains constant). But the limit of cutting the terminal links in half is a simple concurrent VI 60-s experiment, which should give rise to indifference. Logically, then, matching to immediacy must address the transition from nonzero to zero terminal links kept at a constant ratio.

On the empirical side, two experiments (MacEwen, 1972; Williams & Fantino, 1978) report an increase in preference for the shorter of two FI terminal links as both increase, given that they are kept at a ratio of 1 to 2. In both cases, the increase was negatively accelerated. Thus, as applied to a concurrent-chains experiment with equal initial links and FI terminal links kept at a constant ratio but varied in absolute durations, a model should predict indifference at zero terminal-link durations and some increase in preference for the shorter FI terminal link as both increase.


In the discussion of Figure 10, it was shown that, with concurrent VI 60-s initial links, an FI 2-s terminal link leading to 4 s of reinforcement on the left, and an FI 4-s terminal link leading to 4 s of reinforcement on the right, the present model predicts that relative time on the left should be 0.58. Figure 11 shows the results obtained if the FI values are systematically varied with their ratio at 1 to 2, for the present model and for the four others under consideration. Three models (Davison & Temple, Squires & Fantino, and the current one) all begin at indifference given zero terminal links and show a negatively accelerated increase in preference for the shorter terminal link as both terminal links increase. The Baum and Rachlin model, as mentioned above, shows a constant preference for the shorter terminal link (the discontinuity at zero is not shown). Killeen's model does not converge on indifference as the terminal links approach zero; furthermore, over part of its range it predicts a decrease in preference for the shorter terminal link as the terminal links increase in duration. On the face of it, then, both the Baum and Rachlin and the Killeen models are lacking with regard to both logic and data.
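The curve attributed to the present model in Figure 11 can be approximated with the same two equations. The sketch below sweeps the shorter FI (the longer is always twice as long), holding the parameters fixed as above; the exact values plotted in the article may differ slightly.

```python
def value(r, V_next, V_uncond=0.0, a=0.1):
    return V_uncond + r * (V_next - V_uncond) / (r + a)   # Equation 9

a, R = 0.1, 60 / 3600                  # VI 60-s initial links on both keys
V3 = value(1 / 4, 0.0, 10.0, a=0.2)    # 4 s of food on both sides

for x in [0.001, 2, 4, 8, 16, 24]:     # shorter FI terminal link; the longer is 2x
    V2L = value(1 / x, V3, a=0.2)            # shorter terminal link (left)
    V2R = value(1 / (2 * x), V3, a=0.2)      # longer terminal link (right)
    t_L = (R * R * (V2L - V2R) + a * R * V2L) / (a * R * (V2L + V2R))
    print(x, round(t_L, 3))
# Output starts at indifference (0.5) and rises toward the shorter terminal link in a
# negatively accelerated way; at x = 2 it reproduces the 0.58 of Figure 10.
```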

Killeen (1982) was able to account for MacEwen's data by assuming, in part, that his birds had exhibited bias. There is, however, no independent evidence for such bias (it would be straightforward to rerun the experiment incorporating such a test). Rather, as Figure 11 suggests, Killeen's model requires the use of free parameters under the present circumstance to avoid making predictions that are both counterintuitive and do not accord with known data.

Fig. 11. Predictions of five models for a concurrent-chains procedure in which concurrent VI 60-s initial links lead to FI terminal links that are varied but kept at a ratio of 1 to 2. Relative choice for the shorter FI terminal link is shown as a function of the duration of that terminal link. The arrow indicates the predicted equilibrium distribution for the special case shown in Figure 10. B&R = Baum & Rachlin; D&T = Davison & Temple; K = Killeen; S&F = Squires & Fantino; V = Vaughan.

Self-Control

The second area of research to be discussed is that related to self-control, a topic addressed by a number of models of choice. If an organism is presented with a choice between a small reinforcer delivered immediately and a larger reinforcer delayed for some period, preference for the smaller reinforcer ("impulsiveness") may be exhibited. As equal delays are added to both reinforcers, preference may shift from the smaller to the larger reinforcer ("self-control"), the shift being termed preference reversal. In animals, preference reversal has been exhibited using both direct choice procedures (Ainslie & Herrnstein, 1981; Green, Fisher, Perlow, & Sherman, 1981; Navarick & Fantino, 1976) and "commitment" procedures, which allow the subject to commit itself to one option before the choice is due to be presented (Ainslie, 1974; Rachlin & Green, 1972).

It is sometimes said that preference reversal requires gradients of reinforcement "more concave" than an exponential function (Ainslie, 1975, for example, uses this term). This description, however, is neither mathematically rigorous nor focused on what is essential about gradients that will produce preference reversal. At the point at which preference is reversed, the reinforcement gradients must have the same height but unequal derivatives, with the function closer to reinforcement having the greater derivative. Thus the derivative must not be a function solely of the height of the function. It is sufficient, for example, for the derivative to be an increasing function of the height of the function and a decreasing function of the intercept (when plotted in terms of 1/r). Exponential functions are such that the derivative is a function solely of the height of the function, but hyperbolic functions have derivatives that are an increasing function of the height and a decreasing function of the intercept.

Fig. 12. Top pair: Representation of an FI 5-s terminal link on the left key leading to 4 s of reinforcement. V2L is the value of entering that terminal link, and V3L is the value of reinforcement. Middle pair: An FI 1 s on the right key leading to 2 s of reinforcement. V2R is the value of entering that terminal link, and V3R is the value of reinforcement. Bottom pair: Superimposition of the upper two pairs of functions, showing that the value of the terminal links will reverse as a constant amount is added to each FI terminal link.

The analysis of preference reversal in terms of the present model is shown in Figure 12. The top pair of panels represents the left terminal link, including reinforcement. Reinforcement duration of 4 s (1/r3L) produces value V3L, the asymptote (or intercept, as plotted here) for the left terminal link. If an FI 5-s terminal link is in effect (1/r2L), the value upon entering that terminal link will be V2L, which in turn is the asymptote for the left initial link.

The middle pair of panels represents the right terminal link. Reinforcement duration is 2 s (1/r3R), producing value V3R. The right terminal link shown is an FI 1 s (1/r2R), so that the value upon entering the right terminal link is V2R. By shifting the right terminal-link function 4 s to the left in the figure, it is possible to show what happens in general with an FI x s on the right and an FI x + 4 s on the left. This is clearer if the two functions are superimposed, as shown in the bottom lefthand panel. With an FI 5 s leading to 4 s of reinforcement versus an FI 1 s leading to 2 s of reinforcement, the FI 1 s will be preferred (i.e., V2R is greater than V2L). As 1/r2 increases, however, the two functions cross (at about FI 2 versus FI 6), so that further increases produce a preference for the 4-s reinforcer over the 2-s reinforcer. In other words, the model predicts a shift from impulsiveness to self-control under these conditions. Where the functions cross, their heights are of course the same. The higher derivative for the right side then depends solely on the lower intercept (V3R) for that side as compared with the other side (V3L).

The predictions of the present and four

other models under discussion are shown in Figure 13. Here, the initial links are concurrent VI 60-s schedules, and the terminal links are signaled FI schedules. The smaller reinforcer is 2 s and the larger is 4 s. Preference for the shorter reinforcer is shown as a function of the delay to that reinforcer, given that the delay to the longer reinforcer is 4 s greater. The only model not giving rise to preference reversal is that of Davison and Temple. Of the others, both Baum and Rachlin, and Killeen, predict exclusive preference for the 2-s reinforcer at a zero delay; the current model and that of Squires and Fantino predict nonexclusive preference. Experimental data do not yet exist to decide between these two possibilities. Although the preferences predicted by the current model are not great, in theory they could be increased by making the initial links richer (see below).

Fig. 13. Predictions of five models for a concurrent-chains procedure in which concurrent VI 60-s initial links lead to an FI x (with 2 s of reinforcement) and an FI x + 4 (with 4 s of reinforcement), where x is the delay to the shorter reinforcer. Relative choice for the shorter reinforcer is shown as a function of the value of x. The arrow shows the predicted distribution for the special case shown in Figure 12, FI 1 s versus FI 5 s.
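The reversal in Figure 12, and the curve labeled V in Figure 13, follow from the same value function. The sketch below locates the reversal point for a 2-s reinforcer delayed x s versus a 4-s reinforcer delayed x + 4 s (fixed parameters as above; names are mine).

```python
def value(r, V_next, V_uncond=0.0, a=0.2):
    return V_uncond + r * (V_next - V_uncond) / (r + a)   # Equation 9, fixed durations

V_small = value(1 / 2, 0.0, 10.0)   # 2 s of food: about 2.86
V_large = value(1 / 4, 0.0, 10.0)   # 4 s of food: about 4.44

for x in [0.5, 1, 2, 3, 4, 6, 8]:
    V2_small = value(1 / x, V_small)          # FI x leading to the small reinforcer
    V2_large = value(1 / (x + 4), V_large)    # FI x + 4 leading to the large reinforcer
    print(x, "small" if V2_small > V2_large else "large")
# The small, more immediate reinforcer is preferred at short delays ("impulsiveness");
# preference reverses between x = 2 and x = 3 (about FI 2 versus FI 6, as in the text).
```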

Tandem Versus Chained Terminal Links

The third area to be addressed is that of choice between tandem and chained FI FI terminal links. Partly on the basis of work of Gollub (1958), Fantino (1969b) hypothesized that, in a concurrent-chains experiment with terminal links of equal duration, preference should be shown for a terminal link consisting of a single color over one that changed color midway through (e.g., tandem versus chained terminal links). Duncan and Fantino (1972) reported a negatively accelerated increase in


preference for FI x over chained FI x/2 FI x/2 as x increased in duration. Fantino (1977) noted that Wallace (1973) found a preference for fixed-time (FT) x over chained FT x/2 FT x/2, although the effect was smaller than that reported by Duncan and Fantino. Fantino (1983) determined that the effect depended upon stimulus characteristics, rather than response requirements, of tandem versus chained schedules.

Figure 14 (top two pairs of functions) shows the analysis of tandem versus chained terminal links. The top pair represents a tandem FI 4 FI 4-s terminal link (which is taken to be equivalent to an FI 8-s terminal link, inasmuch as there is no stimulus change; see Fantino, 1983). V4L represents the value of reinforcement on the left, and constitutes the asymptote (or intercept) for the second FI 4-s schedule (which begins at 1/r3L). The height of the function at this point (V3L) is the final height for the first FI 4, but the mathematical asymptote for the first FI is the same as for the second, V4L. It is this latter property that leads to the equivalence between tandem FI x/2 FI x/2 and FI x.

The middle pair of functions shows a chained FI 4 FI 4. Again, V4R represents the value of food, this time on the right. Here, however,


V3R constitutes the final height of the first link, as well as the asymptote of that function. The way in which this generates a preference for tandem over chained can be seen more clearly in the bottom pair, where the functions have been moved together and superimposed. V2L is the value of entering the tandem FI FI, and V2R the value of entering the chained FI FI. Note that at 1/r3 the heights of the two functions are the same, but the function for the chained has a greater derivative to the left than the function for the tandem schedule. This is the same property, discussed earlier, that allows for preference reversal: the intercept for the initial link of the chained schedule (V3R) is lower than for the tandem schedule (V3L or V4L). In terms of the present model, preference reversal and preference for tandem over chained schedules each derive from the same properties of this function.

Fig. 14. Top pair: Representation of an FI 8-s terminal link (or a tandem FI 4 s FI 4 s) on the left key. V4L is the value of presenting 4 s of food. Middle pair: A chained FI 4 s FI 4 s on the right key, likewise leading to 4 s of food. Bottom pair: Superimposition of the upper two pairs of functions, showing that the value of the tandem terminal link will be greater than that of the chained terminal link (i.e., V2L lies above V2R).

The predictions of the two models under

consideration designed to deal with tandem versus chained terminal links are shown in Figure 15, assuming concurrent VI 60-s initial links and 4 s of reinforcement. (No provision for tandem versus chained schedules is made


by the models of Baum and Rachlin, Davison and Temple, or Squires and Fantino.) Killeen's model predicts an increase in preference for the tandem side as both terminal links increase in duration, but over part of the range his model predicts a preference for chained over tandem; further, the preference for tandem over chain, when present, is very slight. The present model predicts an at first positively and later negatively accelerated increase in preference for tandem over chained, with indifference given zero terminal links (simple concurrent VI 60-s schedule). If we assume that Duncan and Fantino's birds were not exhibiting bias (so that at zero terminal links their choice responses would be at 0.5), they conform to this pattern of positive and negative acceleration of the change in preference.

With regard to Killeen's analysis, Figure 15 here differs from his Figure 9 in which he attempted to account for the data of Duncan and Fantino. In that figure his variable q was changed from the usual value of 0.125 to 0.24; in Figure 15 here, q is left at 0.125. This is another case, then, in which Killeen's model appears to require the use of free parameters in order to conform to data.

Fig. 15. Predictions of two models for a concurrent-chains procedure in which concurrent VI 60-s initial links lead to tandem FI FI versus chained FI FI. Relative choice for the tandem side is shown as a function of the total terminal-link duration. The arrow shows the predicted distribution for the special case shown in Figure 14, tandem FI 4 s FI 4 s versus chained FI 4 s FI 4 s.
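The tandem-versus-chained comparison of Figure 14 reduces to two evaluations of Equation 9; feeding the resulting values into Equation 10 gives the curve labeled V in Figure 15. A sketch with the fixed parameters used above:

```python
def value(r, V_next, V_uncond=0.0, a=0.2):
    return V_uncond + r * (V_next - V_uncond) / (r + a)   # Equation 9, FI components

V_food = value(1 / 4, 0.0, 10.0)       # 4 s of food: about 4.44

# Tandem FI 4 FI 4: no stimulus change, so it is treated as a single FI 8-s link.
V_tandem = value(1 / 8, V_food)        # about 1.71

# Chained FI 4 FI 4: the first link's asymptote is the value of the second link.
V_second = value(1 / 4, V_food)        # about 2.47
V_chained = value(1 / 4, V_second)     # about 1.37

# The tandem terminal link has the greater value, so it should be preferred.
print(round(V_tandem, 2), round(V_chained, 2))
```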

Fantino and Duncan (1972) reported results from an experiment related to the above analysis. Birds were exposed to schedules in which entering one terminal link did not turn off the other, concurrent, schedule. It was thus possible to encounter concurrent initial links, concurrent terminal links, or an initial link on one side and a terminal link on the other. One of the most striking results was that birds did not appear to average reinforcement rate on a key across components (which would produce matching of responses to reinforcers at all times). Rather, when facing an initial link on one side and a terminal link on the other, almost all responses were directed at the terminal link.

An analysis of this result from the present point of view is straightforward. The value of an initial-link stimulus is maintained by transitions to its terminal link, but the value of a terminal link is maintained by transitions to food. Given equal initial links and equal terminal links (as in Fantino and Duncan's experiment), the value of a terminal link will be substantially greater than that of an initial link, which should produce the trend exhibited by the data.

The analysis of chained terminal links can

be related to several experiments in which more than one reinforcer was delivered per terminal-link entry. Fantino and Herrnstein (1968), for example, ran a concurrent-chains experiment with VI 15-s terminal links. On one side a single reinforcer was delivered before returning to the initial link; on the other side the number of reinforcers was varied from 1 to 10. An increase in the number of reinforcers produced a negatively accelerated increase in preference for that side.

Moore (1979) studied a condition in which

an FI 30-s terminal link was run in conjunc-tion with an FI terminal link that, followingfood, led to four more reinforcers on a secondFI schedule. If the added Fl schedules wereshort (e.g., 3 to 9 s each), an increase inpreference was shown for the side with thosereinforcers. If the FI schedules were longer,there was little or no effect.

Shull, Spear, and Bryson (1981) studied asituation in which birds could switch from aVI 120-s schedule to a 240-s component dur-ing which four reinforcers were delivered. The

398

CHOICE: A LOCAL ANALYSIS

first occurred 30 s into the interval, and thelast 210 s into the interval. The location of theother two was varied. In general, as the othertwo occurred closer to the beginning of the in-terval, rate of entering the 240-s componentincreased.Moore suggested that reinforcers following

the first enhance its value; Shull et al. sug-gested that each reinforcer contributed to re-sponse strength, although to a decreasing ex-tent as it was further delayed. In terms of thepresent model, if one terminal-link componentis followed by another, differently signaledterminal-link component, the first will havegreater value than if it led back to the initiallink (see Appendix 4). This analysis is similarto that of Moore, although some additionalassumption would be required before the effectof adding components signaled by the samestimulus as the first could be accounted for.
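A sketch of this point in the notation of Appendix 4 (with a2 the terminal-link rate parameter): let the first terminal-link component make transitions at local rate r to whatever follows it. If what follows is a second, differently signaled component of value Vb rather than the initial link of value V1, then

$$V_{\text{first}} = \frac{r\,V_b}{r + a_2} > \frac{r\,V_1}{r + a_2}$$

whenever Vb exceeds V1, which is the ordering of terminal-link over initial-link values argued for above.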

UNTESTED PREDICTIONS

Three as yet untested predictions of the current model might also be examined. Consider the case of equal simple concurrent VI schedules, one leading to 2 s of reinforcement and one to 4 s of reinforcement. As the two VI schedules are varied (keeping them equal to each other), Baum and Rachlin predict a constant preference, matching to relative durations of reinforcement. The present model predicts such matching for a particular pair of VI schedules, and a shift toward exclusive preference for the longer reinforcer as both schedules grow rich (the other models under consideration do not address this procedure). These predictions are shown in Figure 16, which shows preference for the longer reinforcer as a function of the (equal) concurrent VI schedules.
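The shape of this prediction can be sketched directly from Equation 10 (derived in Appendix 3). The script below is only an illustration, not a reproduction of Figure 16: the values V2L and V2R are hypothetical placeholders for the bounded values of the 4-s and 2-s reinforcers, the obtained rates are approximated by the programmed VI rates, and the clipping of the allocation to [0, 1] to represent exclusive preference is an added assumption.

```python
# Sketch of the Figure 16 prediction from Equation 10 (Appendix 3).
# V2L, V2R are hypothetical bounded reinforcer values (4 s vs. 2 s);
# a = 0.1 as stated in the text for VI initial links.

def time_allocation(r_left, r_right, v2_left, v2_right, a=0.1):
    """Equation 10: relative time on the left, clipped to [0, 1]."""
    t_left = (r_left * r_right * (v2_left - v2_right) + a * r_left * v2_left) / (
        a * (r_left * v2_left + r_right * v2_right)
    )
    return min(max(t_left, 0.0), 1.0)

V2L, V2R = 1.5, 1.0                    # hypothetical bounded values
for vi in (120, 60, 30, 10, 5):        # equal concurrent VI schedules, in s
    r = 1.0 / vi                       # programmed reinforcement rate per s
    print(vi, round(time_allocation(r, r, V2L, V2R), 3))
    # preference for the 4-s side climbs toward exclusive as the schedules grow rich
```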

This prediction is similar to one made in a slightly different context by Fantino (1969a). As mentioned earlier, he argued that, given a concurrent-chains schedule with unequal terminal links (e.g., VI 30-s and VI 90-s schedules), as equal initial links became richer, preference should shift toward the better terminal-link side; with sufficiently rich schedules, exclusive preference should be exhibited.

Fig. 16. Predictions of two models for a simple concurrent procedure in which equal VI schedules, which are varied, lead to reinforcer durations of 2 and 4 s. Relative time for the longer reinforcer is shown as a function of the VI schedules (in s).

A similar argument may be made here: Given unequal durations of reinforcement, as equal VI schedules become richer, preference should shift toward the longer reinforcer. As shown in Figure 16 (which is similar in form to Fantino's Figure 1), the present model implies such a shift, but the relevant experiment remains to be done. In terms of the present model, unequal durations of reinforcement are equivalent to unequal terminal links. (This may be seen by noting that Equation 10 relies on V2L and V2R; those quantities may be derived from the presentation of immediate reinforcement or from the operation of a terminal link.) The present model is thus consistent with Fantino's position.

The property of the present model shown in Figure 16 also relates to part of an experiment reported by Fantino and Dunn (1983). Birds were presented with two concurrently available VI 90-s schedules, one leading to a VI 60-s terminal link and one to a VI 20-s terminal link. A third VI 90-s initial link, leading to a VI 9-s terminal link, was either present or absent, in alternate conditions. The main finding was that preference for the VI 20-s terminal link, relative to the VI 60-s terminal link, was greater in the presence of the third schedule than in its absence. This outcome is consistent with the delay-reduction hypothesis, but inconsistent with Luce's (1959) choice axiom.

The same qualitative results may be deduced from the present model: As above, unequal terminal links are treated as equivalent to unequal durations of reinforcement. Consider two concurrent VI 90-s schedules leading to different durations of reinforcement. Some preference will be exhibited for the longer reinforcer. Suppose, now, that less time is available for responding on those schedules. If, for example, only half of the time previously available is now available, the local rates of reinforcement on the concurrent schedules will rise to those normally produced by schedules twice as rich (i.e., concurrent VI 45-s schedules). As shown above, making equal concurrent schedules leading to different durations of reinforcement richer produces a greater preference for the longer reinforcer. The effect of adding the third schedule, given that it is sufficiently valuable to sustain responding, is simply that of removing some of the time available for the other two schedules. The results of Fantino and Dunn are thus consistent with the present model.
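The halving argument is simple local-rate bookkeeping; a one-line worked version, with RL the obtained reinforcement rate and tL the time available to the pair of schedules:

$$\text{local rate} = \frac{R_L}{t_L}, \qquad t_L \to \tfrac{1}{2}\,t_L \;\Rightarrow\; \frac{R_L}{t_L/2} = 2\,\frac{R_L}{t_L},$$

which is the local rate that a VI schedule of half the interval (here, VI 45 s in place of VI 90 s) would have produced with the original allocation of time.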

Baum and Rachlin assumed that the value of a reinforcer was proportional to its duration (and hence unbounded); the present analysis assumes that the value of a reinforcer is bounded (by its unconditioned value). One implication of boundedness is that, given concurrent VI 60-s schedules leading to reinforcer durations in a ratio of 2 to 1, as both durations increase preference should shift toward indifference. Figure 17 shows preference for the longer reinforcer under these conditions as both reinforcers increase in duration. Although the Baum and Rachlin model predicts matching to relative duration at all durations, the present model predicts such matching at left and right durations of 4 and 2 s, respectively.

Fig. 17. Predictions of two models for a simple concurrent procedure in which concurrent VI 60-s schedules lead to reinforcer durations that are varied but kept at a ratio of 2 to 1. Relative time for the longer reinforcer is shown as a function of the duration of that reinforcer (in s).
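The drift toward indifference can be sketched numerically. The saturating value function below, V(d) = d/(d + c), is a hypothetical stand-in for "value bounded by its unconditioned value"; neither it nor the constant c appears in the text, and the sketch only illustrates the direction of the Figure 17 prediction, not its exact form.

```python
# Sketch of the Figure 17 prediction: bounded reinforcer value pushes
# preference toward indifference as both durations grow.

def reinforcer_value(duration_s, c=10.0):
    # Hypothetical saturating (bounded) value of a reinforcer of given duration.
    return duration_s / (duration_s + c)

def time_allocation(r_left, r_right, v2_left, v2_right, a=0.1):
    """Equation 10 of the text (derived in Appendix 3), clipped to [0, 1]."""
    t_left = (r_left * r_right * (v2_left - v2_right) + a * r_left * v2_left) / (
        a * (r_left * v2_left + r_right * v2_right)
    )
    return min(max(t_left, 0.0), 1.0)

r = 1.0 / 60.0                         # concurrent VI 60-s schedules
for d in (4, 8, 16, 32):               # duration of the longer reinforcer, in s
    t = time_allocation(r, r, reinforcer_value(d), reinforcer_value(d / 2))
    print(d, round(t, 3))              # preference drifts toward 0.5 as d grows
```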

From what has been said so far, it can be seen that in a number of cases in which both the current model and that of Squires and Fantino make predictions, the two models are qualitatively similar. One implication of this similarity is that many of the experiments originally reported as confirming the delay-reduction hypothesis also serve to confirm the present model. It would, however, also be useful to find an experimental situation in which the two approaches made qualitatively different predictions.

Consider an experiment in which the two terminal links are VI 60-s schedules, and the right initial link is VI 60 s as well. Figure 18 shows, for both the current model and that of Squires and Fantino, what happens as the left initial-link requirement is decreased from VI 60 s toward VI 0 s. Both models predict indifference given concurrent VI 60-s VI 60-s initial links. The current model predicts that choice responses will shift toward exclusive preference for the rich initial link; the Squires and Fantino model predicts an asymptotic distribution of behavior at 0.67. That is, two thirds of the initial-link responses would be directed to the side that provided immediate access to one terminal link, and one third would be directed to the side that provided terminal-link access only once a minute, overall. A more recent version of that model (Fantino & Davison, 1983) predicts that choice responses will shift only to 0.59. The relevant experiment remains to be done.

Fig. 18. Predictions of two models for a concurrent-chains procedure in which the right initial link is VI 60 s, the left initial link is varied, and both lead to VI 60-s terminal links. Relative time for the varied initial link is shown as a function of its value in seconds.

SUMMARY

The generalization of melioration discussed here is in qualitative agreement with trends exhibited in a number of simple concurrent, as well as concurrent-chains, experiments. It is also a straightforward exercise to generate new predictions that discriminate between the present model and others. The presentation of reinforcement has no special status: the same framework applies to both neutral stimuli and positive reinforcers (as well as punishers, though this aspect has not been elaborated here; see Vaughan, in press). Finally, the values of initial-link stimuli are continuous functions of the schedule parameters, a presumably desirable mathematical constraint that is sometimes ignored.

The present model is not so much one model as it is one of the simplest of a class of models (see Appendix 2), each consisting of two equations, one relating to conditioned reinforcement (e.g., Equation 9) and one relating to the distribution of behavior (e.g., Equation 8). This class of models, in turn, is perhaps the simplest of a larger class employing more than two equations. For example, time in the presence of food, shock, and a keylight might require somewhat different equations. Such an eventuality would not necessarily challenge the core of the present model, but only the more specific assumptions. As these considerations suggest, the present paper is not meant to advocate a particular model of choice, but rather a particular framework. The formalism of that framework derives from the model of Rescorla and Wagner (1972); the basic tenets regarding pairing are consistent with the approach taken by Skinner in The Behavior of Organisms.

REFERENCES
Ainslie, G. W. (1974). Impulse control in pigeons. Journal of the Experimental Analysis of Behavior, 21, 485-489.
Ainslie, G. (1975). Specious reward: A behavioral theory of impulsiveness and impulse control. Psychological Bulletin, 82, 463-496.
Ainslie, G., & Herrnstein, R. J. (1981). Preference reversal and delayed reinforcement. Animal Learning & Behavior, 9, 476-482.
Baum, W. M., & Rachlin, H. C. (1969). Choice as time allocation. Journal of the Experimental Analysis of Behavior, 12, 861-874.
Boelens, H. (1984). Melioration and maximization of reinforcement minus costs of behavior. Journal of the Experimental Analysis of Behavior, 42, 113-126.
Brownstein, A. J., & Pliskoff, S. S. (1968). Some effects of relative reinforcement rate and changeover delay in response-independent concurrent schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 11, 683-688.
Catania, A. C. (1963). Concurrent performances: A baseline for the study of reinforcement magnitude. Journal of the Experimental Analysis of Behavior, 6, 299-300.
Catania, A. C. (1966). Concurrent operants. In W. K. Honig (Ed.), Operant behavior: Areas of research and application (pp. 213-270). New York: Appleton-Century-Crofts.
Chung, S., & Herrnstein, R. J. (1967). Choice and delay of reinforcement. Journal of the Experimental Analysis of Behavior, 10, 67-74.
Davison, M. C., & Temple, W. (1973). Preference for fixed-interval schedules: An alternative model. Journal of the Experimental Analysis of Behavior, 20, 393-403.
Duncan, B., & Fantino, E. (1972). The psychological distance to reward. Journal of the Experimental Analysis of Behavior, 18, 23-34.
Fantino, E. (1969a). Choice and rate of reinforcement. Journal of the Experimental Analysis of Behavior, 12, 723-730.
Fantino, E. (1969b). Conditioned reinforcement, choice, and the psychological distance to reward. In D. P. Hendry (Ed.), Conditioned reinforcement (pp. 163-191). Homewood, IL: Dorsey Press.
Fantino, E. (1977). Conditioned reinforcement: Choice and information. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 313-339). Englewood Cliffs, NJ: Prentice-Hall.
Fantino, E. (1983). On the cause of preference for unsegmented over segmented reinforcement schedules. Behaviour Analysis Letters, 3, 27-33.
Fantino, E., & Davison, M. (1983). Choice: Some quantitative relations. Journal of the Experimental Analysis of Behavior, 40, 1-13.
Fantino, E., & Duncan, B. (1972). Some effects of interreinforcement time upon choice. Journal of the Experimental Analysis of Behavior, 17, 3-14.
Fantino, E., & Dunn, R. (1983). The delay-reduction hypothesis: Extension to three-alternative choice. Journal of Experimental Psychology: Animal Behavior Processes, 9, 132-146.
Fantino, E., & Herrnstein, R. J. (1968). Secondary reinforcement and number of primary reinforcements. Journal of the Experimental Analysis of Behavior, 11, 9-14.
Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. New York: Appleton-Century-Crofts.
Gollub, L. R. (1958). The chaining of fixed-interval schedules. Unpublished doctoral thesis, Harvard University.
Green, L., Fisher, E. B., Jr., Perlow, S., & Sherman, L. (1981). Preference reversal and self control: Choice as a function of reward amount and delay. Behaviour Analysis Letters, 1, 43-51.
Green, L., & Snyderman, M. (1980). Choice between rewards differing in amount and delay: Toward a choice model of self control. Journal of the Experimental Analysis of Behavior, 34, 135-147.
Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267-272.
Herrnstein, R. J. (1964). Aperiodicity as a factor in choice. Journal of the Experimental Analysis of Behavior, 7, 179-182.
Herrnstein, R. J., & Heyman, G. M. (1979). Is matching compatible with reinforcement maximization on concurrent variable interval, variable ratio? Journal of the Experimental Analysis of Behavior, 31, 209-223.
Herrnstein, R. J., & Loveland, D. H. (1975). Maximizing and matching on concurrent ratio schedules. Journal of the Experimental Analysis of Behavior, 24, 107-116.
Herrnstein, R. J., & Vaughan, W., Jr. (1980). Melioration and behavioral allocation. In J. E. R. Staddon (Ed.), Limits to action: The allocation of individual behavior (pp. 143-176). New York: Academic Press.
Hinson, J. M., & Staddon, J. E. R. (1983). Matching, maximizing, and hill-climbing. Journal of the Experimental Analysis of Behavior, 40, 321-331.
Killeen, P. R. (1982). Incentive theory: II. Models for choice. Journal of the Experimental Analysis of Behavior, 38, 217-232.
Lea, S. E. G. (1976). Titration of schedule parameters by pigeons. Journal of the Experimental Analysis of Behavior, 25, 43-54.
Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York: Wiley.
MacEwen, D. (1972). The effects of terminal-link fixed-interval and variable-interval schedules on responding under concurrent chained schedules. Journal of the Experimental Analysis of Behavior, 18, 253-261.
Mazur, J. E. (1981). Optimization theory fails to predict performance of pigeons in a two-response situation. Science, 214, 823-825.
Moore, J. (1979). Choice and number of reinforcers. Journal of the Experimental Analysis of Behavior, 32, 51-63.
Morse, W. H. (1966). Intermittent reinforcement. In W. K. Honig (Ed.), Operant behavior: Areas of research and application (pp. 52-108). New York: Appleton-Century-Crofts.
Myerson, J., & Hale, S. (1984, May). A transition-state comparison of melioration and the kinetic model. Poster presented at the meeting of the Association for Behavior Analysis, Nashville, TN.
Navarick, D. J., & Fantino, E. (1976). Self-control and general models of choice. Journal of Experimental Psychology: Animal Behavior Processes, 2, 75-87.
Rachlin, H. (1978). A molar theory of reinforcement schedules. Journal of the Experimental Analysis of Behavior, 30, 345-360.
Rachlin, H., & Green, L. (1972). Commitment, choice and self-control. Journal of the Experimental Analysis of Behavior, 17, 15-22.
Rachlin, H., Green, L., Kagel, J. H., & Battalio, R. C. (1976). Economic demand theory and psychological studies of choice. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 10, pp. 129-154). New York: Academic Press.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64-99). New York: Appleton-Century-Crofts.
Shimp, C. P. (1969). Optimal behavior in free-operant experiments. Psychological Review, 76, 97-112.
Shull, R. L., Spear, D. J., & Bryson, A. E. (1981). Delay or rate of food delivery as a determiner of response rate. Journal of the Experimental Analysis of Behavior, 35, 129-143.
Skinner, B. F. (1938). The behavior of organisms. New York: Appleton-Century-Crofts.
Squires, N., & Fantino, E. (1971). A model for choice in simple concurrent and concurrent-chains schedules. Journal of the Experimental Analysis of Behavior, 15, 27-38.
Staddon, J. E. R., & Motheral, S. (1978). On matching and maximizing in operant choice experiments. Psychological Review, 85, 436-444.
Vaughan, W., Jr. (1976). Optimization and reinforcement. Unpublished doctoral thesis, Harvard University.
Vaughan, W., Jr. (1981). Melioration, matching, and maximization. Journal of the Experimental Analysis of Behavior, 36, 141-149.
Vaughan, W., Jr. (1982). Choice and the Rescorla-Wagner model. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts (pp. 263-279). Cambridge, MA: Ballinger.
Vaughan, W., Jr. (in press). Choice and punishment: A local analysis. In M. L. Commons, J. E. Mazur, J. A. Nevin, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 5. Reinforcement value: The effect of delay and intervening events. Hillsdale, NJ: Erlbaum.
Vaughan, W., Jr., & Herrnstein, R. J. (in press). Stability, melioration, and natural selection. In L. Green & J. H. Kagel (Eds.), Advances in behavioral economics (Vol. 1). Norwood, NJ: Ablex.
Vaughan, W., Jr., Kardish, T. A., & Wilson, M. (1982). Correlation versus contiguity in choice. Behaviour Analysis Letters, 2, 153-160.
Wallace, R. F. (1973). Conditioned reinforcement and choice. Unpublished doctoral thesis, University of California, San Diego.
Williams, B. A., & Fantino, E. (1978). Effects on choice of reinforcement delay and conditioned reinforcement. Journal of the Experimental Analysis of Behavior, 29, 77-86.

Received November 9, 1982
Final acceptance February 19, 1985


Appendix 1

In Vaughan (1982) the presentation of reinforcement (a strengthening process) was assumed to change the value of the current stimulus by an amount proportional to the difference between the value of reinforcement and the value of the current stimulus, along the lines of Rescorla and Wagner (1972). In the absence of reinforcement, the value of the current stimulus (i.e., a key currently being pecked by a pigeon) was assumed to change toward its unconditioned value at a rate proportional to the difference between its current value and its unconditioned value (a weakening process). From these two assumptions it could be inferred that at equilibrium the value of a stimulus was a strictly monotonically increasing function of the local rate of reinforcement (see Vaughan, 1982, Figure 12-2). The specific mathematical form of that function, however, was not discussed.

Here, a specific mathematical form (hyperbolic) is taken as the starting point. This is done on the assumption that some strengthening and weakening processes (though not necessarily strict proportionalities) could be found that would produce such a form at equilibrium.
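A minimal simulation of the two processes just described, with illustrative rate parameters (beta, gamma, v_reinf, and v0 are placeholders, not values from the paper); it only checks the qualitative claim that the equilibrium value increases with the local rate of reinforcement:

```python
import random

# Strengthening: V moves toward the value of reinforcement by a fraction
# beta of the difference; weakening: V drifts toward the unconditioned
# value v0 by a fraction gamma per time step.

def equilibrium_value(rate_per_s, steps=200_000, beta=0.05, gamma=0.005,
                      v_reinf=1.0, v0=0.0, seed=0):
    rng = random.Random(seed)
    v = v0
    for _ in range(steps):                 # one step = one second
        if rng.random() < rate_per_s:      # reinforcement this second: strengthen
            v += beta * (v_reinf - v)
        else:                              # no reinforcement: weaken toward v0
            v += gamma * (v0 - v)
    return v

for vi in (120, 60, 30, 15):
    print(vi, round(equilibrium_value(1.0 / vi), 3))   # value rises as the VI shortens
```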

Appendix 2

Although the hyperbolic form is used here, other functional forms should not be dismissed. In fact, the actual function could well prove not to be analytic. The hyperbola is one member of the following class: assuming V2 > V'1 and 0 < r < infinity, the function f(r) is strictly monotonically increasing; f(0) = V'1; f(r) approaches V2 as r approaches infinity; the first derivative is not a function solely of the height of the function; and the second derivative does not change sign.

In fact, there are two experiments (Green & Snyderman, 1980; Navarick & Fantino, 1976) that are inconsistent with Equation 9. Navarick and Fantino (Experiment 2) ran a concurrent-chains procedure with equal FI terminal links, one ending in 1.5 s of reinforcement and one in 4.5 s of reinforcement. As the two terminal links increased (from 5 to 40 s), preference for the larger reinforcement increased. Equation 9 predicts a constant preference. Green and Snyderman ran a concurrent-chains experiment with FI terminal links kept at a fixed ratio (6:1, 3:1, or 3:2), with the longer FI ending in 6 s of reinforcement and the shorter ending in 2 s of reinforcement. As the terminal links were varied (keeping their ratio constant), preference for the longer reinforcement increased in some cases and decreased in others. Equation 9 predicts that preference for the longer reinforcement will always decrease.

It turns out that Equation 9 embodies a certain invariance of scale that leads to the erroneous predictions. Let d = V2 - V'1, and suppose d is some positive value. Then for some value of r, V1 will lie halfway between V'1 and V2. Now double the value of d. The same value of r will result in V1 lying halfway between V'1 and V2. What is necessary in order to account for the above two experiments is that the value of r at which V1 lies halfway between V'1 and V2 (i.e., V1 = V'1 + d/2) increase as d decreases. Because larger values of a lead to lower rates of increase of V1, one solution would be for a to increase as d decreases. This may be accomplished by substituting for a in Equation 9 the quantity a/[1 - exp(-jd)]. This change complicates the mathematics without changing the qualitative trends discussed below; therefore, it is not incorporated in what follows.
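The invariance can be made concrete by writing the hyperbola so that it satisfies the boundary conditions listed in this appendix (a sketch; this shifted parameterization of Equation 9 is an assumption consistent with, but not spelled out in, the text):

$$V_1(r) = V'_1 + \frac{d\,r}{r + a}, \qquad d = V_2 - V'_1 .$$

Setting V1 = V'1 + d/2 gives r = a for every value of d, which is the scale invariance described above. With the substitution a -> a/[1 - exp(-jd)], the half-rise point becomes r = a/[1 - exp(-jd)], which grows as d shrinks, as the Navarick and Fantino and the Green and Snyderman results require.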

Appendix 3

Let V1L and V1R represent the values of the left and right initial-link stimuli, tL and 1 - tL the relative times spent on the left and right initial links, and V2L and V2R the values upon entering the left and right terminal links. From Equation 9:

$$V_{1L} = \frac{(R_L/t_L)\,V_{2L}}{(R_L/t_L) + a}, \qquad V_{1R} = \frac{[R_R/(1 - t_L)]\,V_{2R}}{[R_R/(1 - t_L)] + a}.$$

At equilibrium the two sides of the initial links must be equal in value (V1L = V1R), so we have:

$$\frac{(R_L/t_L)\,V_{2L}}{(R_L/t_L) + a} = \frac{[R_R/(1 - t_L)]\,V_{2R}}{[R_R/(1 - t_L)] + a}.$$

By cross multiplying:

$$\frac{R_L}{t_L}\,V_{2L}\left[\frac{R_R}{1 - t_L} + a\right] = \frac{R_R}{1 - t_L}\,V_{2R}\left[\frac{R_L}{t_L} + a\right].$$

This gives:

$$\frac{R_L R_R V_{2L}}{t_L(1 - t_L)} + \frac{a R_L V_{2L}}{t_L} = \frac{R_L R_R V_{2R}}{t_L(1 - t_L)} + \frac{a R_R V_{2R}}{1 - t_L}.$$

Multiplying all terms by t_L(1 - t_L):

$$R_L R_R V_{2L} + a R_L V_{2L}(1 - t_L) = R_L R_R V_{2R} + a R_R V_{2R}\,t_L.$$

Or:

$$R_L R_R V_{2L} + a R_L V_{2L} - a R_L V_{2L}\,t_L = R_L R_R V_{2R} + a R_R V_{2R}\,t_L.$$

Collecting all terms with t_L on the right, and exchanging left and right:

$$a R_R V_{2R}\,t_L + a R_L V_{2L}\,t_L = R_L R_R V_{2L} - R_L R_R V_{2R} + a R_L V_{2L}.$$

Or:

$$t_L\,a\,(R_L V_{2L} + R_R V_{2R}) = R_L R_R(V_{2L} - V_{2R}) + a R_L V_{2L}.$$

So:

$$t_L = \frac{R_L R_R(V_{2L} - V_{2R}) + a R_L V_{2L}}{a\,(R_L V_{2L} + R_R V_{2R})},$$

which is equivalent to Equation 10. Because the initial links are VI schedules, a will equal 0.1 in this equation.
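For readers who want to compute predictions, the result transcribes directly into code. This is only a sketch: the variable names are mine, and the clipping of t_L to [0, 1] to represent exclusive preference is an added assumption rather than part of the derivation.

```python
def equation_10(R_L, R_R, V2_L, V2_R, a=0.1):
    """Relative time on the left, t_L, from Equation 10 as derived above.

    R_L, R_R:   obtained rates of entering the left and right terminal links
    V2_L, V2_R: values upon entering the left and right terminal links
    a:          0.1 for VI initial links, as stated in the text
    """
    t_L = (R_L * R_R * (V2_L - V2_R) + a * R_L * V2_L) / (
        a * (R_L * V2_L + R_R * V2_R)
    )
    return min(max(t_L, 0.0), 1.0)   # clip to [0, 1]: exclusive preference (added assumption)

# Example: equal entry rates and equal terminal-link values give indifference (0.5).
print(equation_10(1.0, 1.0, 1.0, 1.0))
```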

Appendix 4

Before it is possible to take account of the values of stimuli following food, it is necessary to make an assumption regarding how the value of a stimulus is affected when followed by two or more concurrent stimuli with different values. The assumption that will be made here is that it is the stimulus with maximum value that affects the values of earlier stimuli. At equilibrium, given that the keys are equal in value, that maximum value is then the value of any of the keys.

It turns out there are two distinct cases that must be considered: that in which the durations of food are the same and that in which they are not the same. To begin with the first case, consider a concurrent-chains experiment with equal durations of food, whose values may be represented by V3. Their values will always be equal, because their durations are equal, and the same stimulus (the stimulus with maximum value) is employed in calculating how their values change given transitions to the initial link. Thus, the value of the left terminal link (V2L) is given by r2L V3/(r2L + a2), and the value of the right terminal link (V2R) is given by r2R V3/(r2R + a2). Substituting these quantities into Equation 10 gives the following equation:

$$\frac{T_L}{T_L + T_R} = \frac{R_L R_R\!\left(\dfrac{r_{2L}V_3}{r_{2L}+a_2} - \dfrac{r_{2R}V_3}{r_{2R}+a_2}\right) + a R_L\!\left(\dfrac{r_{2L}V_3}{r_{2L}+a_2}\right)}{a\!\left[R_L\!\left(\dfrac{r_{2L}V_3}{r_{2L}+a_2}\right) + R_R\!\left(\dfrac{r_{2R}V_3}{r_{2R}+a_2}\right)\right]}.$$

This equation, in turn, is equivalent to the following:

$$\frac{T_L}{T_L + T_R} = \frac{R_L R_R V_3\!\left(\dfrac{r_{2L}}{r_{2L}+a_2} - \dfrac{r_{2R}}{r_{2R}+a_2}\right) + a R_L V_3\,\dfrac{r_{2L}}{r_{2L}+a_2}}{a V_3\!\left[R_L\,\dfrac{r_{2L}}{r_{2L}+a_2} + R_R\,\dfrac{r_{2R}}{r_{2R}+a_2}\right]}.$$

In this latter equation, the quantity V3 can be eliminated altogether, provided it is nonzero, which implies that under these conditions tL is independent of V3. The general conclusion, then, is that given equal durations of reinforcement, neglecting the values of stimuli following food has no effect on the predicted equilibrium distribution of time.
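A quick numerical check of this cancellation (a sketch; the rates are arbitrary illustrative numbers, and setting a2 = 0.1 is only a placeholder, since the text gives the terminal-link parameter its own symbol):

```python
def t_left(R_L, R_R, r2L, r2R, V3, a=0.1, a2=0.1):
    """Equation 10 with V2 = r2 * V3 / (r2 + a2) substituted for each side."""
    V2L = r2L * V3 / (r2L + a2)
    V2R = r2R * V3 / (r2R + a2)
    return (R_L * R_R * (V2L - V2R) + a * R_L * V2L) / (
        a * (R_L * V2L + R_R * V2R)
    )

# Same schedules, two very different food values V3: identical allocation.
print(t_left(0.02, 0.02, 0.05, 0.10, V3=1.0))
print(t_left(0.02, 0.02, 0.05, 0.10, V3=25.0))
```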

In the case of unequal durations of reinforcement, attempts at an analytical expression for the equilibrium distribution of time have so far failed. The following approach appears to converge quickly (within about five iterations) at a satisfactory approximation. Using Equations 9 and 10, determine tL on the assumption that the values of the stimuli following reinforcement are zero. Calculate the values of the initial-link stimuli at that distribution, and let that value be the values of the stimuli following food. Again use Equations 9 and 10 to redetermine the distribution of behavior, and so on.
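The procedure is a fixed-point iteration. The skeleton below sketches only its control flow: the helper terminal_link_values is a hypothetical placeholder, because the text does not spell out, for unequal food durations, exactly how the value of the stimulus following food enters the value of each food period; that step would have to be filled in from Equation 9 and the model's pairing assumptions.

```python
def iterate_allocation(compute_t_left, terminal_link_values, n_iter=5):
    """Fixed-point iteration described above (control flow only).

    compute_t_left(V2_L, V2_R)         -> (t_L, initial_link_value)  via Equations 9 and 10
    terminal_link_values(v_after_food) -> (V2_L, V2_R)               hypothetical stub
    """
    v_after_food = 0.0                 # start: stimuli following food valued at zero
    t_L = None
    for _ in range(n_iter):            # about five iterations suffice, per the text
        V2_L, V2_R = terminal_link_values(v_after_food)
        t_L, v_initial = compute_t_left(V2_L, V2_R)
        v_after_food = v_initial       # feed the initial-link value back in
    return t_L
```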

In general, it appears that the effect of taking into account the stimuli following food, given unequal durations of food, is to increase the preference for the shorter of the two durations of food (i.e., mitigating the preference produced directly by the longer duration of food), with the effect being greater with shorter terminal links (and hence greatest with zero terminal links, or simple concurrent schedules). The general shapes of the predicted functions are not affected; where they approach asymptote or where they cross indifference does tend to be affected to some extent.
