How Do People Learn to Allocate Resources? Comparing Two Learning Theories


Jörg Rieskamp, Jerome R. Busemeyer, and Tei Laine
Indiana University Bloomington

How do people learn to allocate resources? To answer this question, 2 major learning models are compared, each incorporating different learning principles. One is a global search model, which assumes that allocations are made probabilistically on the basis of expectations formed through the entire history of past decisions. The 2nd is a local adaptation model, which assumes that allocations are made by comparing the present decision with the most successful decision up to that point, ignoring all other past decisions. In 2 studies, participants repeatedly allocated a capital resource to 3 financial assets. Substantial learning effects occurred, although the optimal allocation was often not found. From the calibrated models of Study 1, a priori predictions were derived and tested in Study 2. This generalization test shows that the local adaptation model provides a better account of learning in resource allocations than the global search model.

How do people learn to improve their decision-making behavior through past experience? The purpose of this article is to compare two fundamentally different learning approaches introduced in the decision-making literature that address this issue. One approach, called global search models, assumes that individuals form expectancies for every feasible choice alternative by keeping track of the history of all previous decisions and searching for the strongest of all these expectancies. Prominent recent examples that belong to this approach are the reinforcement-learning models of Erev and Roth (1998; see also Erev, 1998; Roth & Erev, 1995). These models follow a long tradition of stochastic learning models (Bush & Mosteller, 1955; Estes, 1950; Luce, 1959). The second approach, called local adaptation models, does not assume that people acquire a representation for every choice alternative and keep track of the history of all past decisions. Instead, this approach assumes that people compare the consequences of a current decision with a reference point (i.e., the previous decision or the most successful decision up to that point) and adjust their decision in the direction of successful decisions. Prominent learning models that belong to this approach are the learning direction theory by Selten and Stocker (1986), the error-correction learning models (e.g., Dorfman, Saslow, & Simpson, 1975; Thomas, 1973), and the hill-climbing learning model by Busemeyer and Myung (1987; see also Busemeyer & Myung, 1992).

Global search models have recently accumulated a large amount of support in decision making, particularly in the case of constant-sum games, in which there are a small number of actions available on each trial (Erev & Roth, 1998). It is not clear that the success of this model extends to decision problems involving a large continuous set of choice options. Global search models may be applicable to situations in which the decision alternatives form a small set of qualitatively different strategies, whereas local adaptation models may be applicable in situations in which the decision alternatives form a continuous metric space of strategies (see Busemeyer & Myung, 1992). Constant-sum games are representative of the former decision problems, and resource allocation tasks are representative of the latter decision problems. In the latter domain, local adaptation models may work better. In this study, our main goal consisted of comparing the ability of the two learning models to describe a learning process for a particular decision problem, called the resource allocation problem, which provides a large continuous set of choice options to the decision maker on each trial. In the following sections, we compare two versions of learning models that best represent the two approaches for the resource allocation task. Direct comparisons of learning models have rarely been done, particularly not for the type of task we are considering.

Resource Allocation Decision Making

Allocating resources to different assets is a decision problem people often face. A few examples of resource allocation decision making are dividing work time between different activities, dividing attention between different tasks, allocating a portfolio to different financial assets, or devoting land to different types of farming. Despite its ubiquity in real life, resource allocation decision making has not been thoroughly acknowledged in the psychological literature.

How good are people at making resource allocation decisions? Only a handful of studies have tried to address this question.

Jörg Rieskamp and Jerome R. Busemeyer, Department of Psychology, Indiana University Bloomington; Tei Laine, Computer Science Department, Indiana University Bloomington.

This study was supported in part by Grants SBR9521918 and SES0083511 from the Center for the Study of Institutions, Population, and Environmental Change through the National Science Foundation.

We acknowledge helpful comments by Jim Walker and Hugh Kelley, with whom we worked on a similar research project on which the present study is based. In addition, we thank Ido Erev, Scott Fisher, Wieland Müller, Elinor Ostrom, Reinhard Selten, the members of the biocomplexity project at Indiana University Bloomington, and two anonymous reviewers for helpful comments.

Correspondence concerning this article should be addressed to Jörg Rieskamp, who is now at the Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin, Germany. E-mail: rieskamp@mpib-berlin.mpg.de

Journal of Experimental Psychology: Learning, Memory, and Cognition, 2003, Vol. 29, No. 6, 1066-1081. Copyright 2003 by the American Psychological Association, Inc. 0278-7393/03/$12.00 DOI: 10.1037/0278-7393.29.6.1066

In one of the earliest studies by Gingrich and Soli (1984), participants were asked to evaluate all of the potential assets before making their allocations. Although the assets were evaluated accurately, the majority of participants failed to find the optimal allocation. Northcraft and Neale (1986) also demonstrated individuals' difficulties with allocation decisions when attention had to be paid to financial setbacks and opportunity costs.

Benartzi and Thaler (2001) studied retirement asset allocations. For this allocation problem, the strategy of diversifying one's investment among assets (i.e., bonds and stocks) appears to be reasonable (Brennan, Schwartz, & Lagnado, 1997). Benartzi and Thaler showed that many people follow a "1/n strategy" by equally dividing the resource among the suggested investment assets. Although such a strategy leads to sufficient diversification, the final allocation depends on the number of assets and, thus, can lead to inconsistent decisions.

The above studies show that individuals often do not allocate their resources in an optimal way, which is not surprising given the complexity of most allocation problems. Furthermore, in the above studies, little opportunity was provided for learning because the allocation decisions were made only once or infrequently. In contrast, Langholtz, Gettys, and Foote (1993) required participants to make allocations repeatedly (eight times). Two resources, fuel and personnel hours for helicopters, were allocated across a working week to maximize the operating hours of the helicopters. Participants improved their performance substantially through learning and almost reached the optimal allocation. However, under conditions of risk or uncertainty, in which the amount of the resource fluctuated over time, the improvement was less substantial. For similar allocation problems, Langholtz, Gettys, and Foote (1994, 1995) and Langholtz, Ball, Sopchak, and Auble (1997) showed that learning leads to substantially improved allocations. Interestingly, a tendency to allocate the resource equally among the assets was found here also. Ball, Langholtz, Auble, and Sopchak (1998) investigated participants' verbal protocols as they solved an allocation problem. According to these protocols, participants seemed to use simplifying heuristics, which brought them surprisingly close to the optimal allocation (i.e., they reached on average 94% efficiency).

In a study by Busemeyer, Swenson, and Lazarte (1986), an extensive learning opportunity was provided, as participants made 30 resource allocations. Participants quickly found the optimal allocation for a simple allocation problem that had a single global maximum. However, when the payoff function had several maxima, the optimal allocation was frequently not found. Busemeyer and Myung (1987) studied the effect of the range of payoffs between the best and worst allocation and the variability of return rates for the assets. For the majority of conditions, participants reached good allocations through substantial learning. However, under a condition with widely varying return rates and a low range of payoffs, participants got lost and did not exhibit much of a learning effect. Furthermore, Busemeyer and Myung showed that a hill-climbing learning model, which assumes that individuals improve their allocations step-by-step, provided a good description of the learning process. However, this model was not compared against alternative models, and therefore it remains unclear whether this is the best way to characterize learning in these tasks.

It can be concluded that when individuals make only a single resource allocation decision, with no opportunity to learn from experience, they generally do not find good allocations at once. In such situations, individuals have a tendency to allocate an equal share of the resource to the different assets, which, depending on the situation, can lead to bad outcomes. Alternatively, when individuals are given the opportunity to improve their allocations through feedback, substantial learning effects are found and individuals often approach optimal allocations. However, if local maxima are present or the payoffs are highly variable, then suboptimal allocations can result even after extensive training.

In the following two studies, repeated allocation decisions with outcome feedback were made, providing sufficient opportunity for learning. The allocation problem of both studies can be defined as follows: The decision maker is provided with a financial resource that can be invested in three financial assets. A particular allocation, i (allocation alternative), can be represented by a three-dimensional vector, X, where each dimension represents the proportion of the resource invested in one of the three assets. For repeated decisions, the symbol X_t represents the allocation made at Trial t. The distance between two allocations can be measured by the Euclidean distance, which is the length of the vector that leads from one allocation to the other.1 The restriction of proportions to integer percentages implies a finite number (N = 5,151) of possible allocations.
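As a concrete illustration of this representation (our own sketch, not the authors' materials; the variable names are illustrative), the following Python fragment enumerates the integer-percentage allocation grid and computes the Euclidean distance used throughout the article:

```python
import numpy as np

# Every allocation (a, b, c) in whole percentage points that sums to 100.
grid = np.array([(100 - b - c, b, c)
                 for b in range(101) for c in range(101 - b)], dtype=float)
print(len(grid))   # 5151 possible allocations

def distance(x, y):
    """Euclidean distance between two allocation vectors (see Footnote 1)."""
    return float(np.sqrt(((x - y) ** 2).sum()))

# The largest possible distance: everything in Asset B vs. everything in Asset C.
print(round(distance(grid[grid[:, 1] == 100][0], grid[grid[:, 2] == 100][0])))  # 141
```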

Learning Models

The first learning model proposed to describe learning effects in repeated resource allocations represents the global search (GLOS) model approach. The GLOS model presented here is a modified version of the reinforcement-learning model proposed by Erev (1998). The second learning model represents the local adaptation (LOCAD) model approach. The LOCAD model presented here is a modified version of the hill-climbing learning model proposed by Busemeyer and Myung (1987).

As already pointed out above, previous research has demonstrated that the two learning model approaches have been successfully applied to describe people's learning processes for various decision problems. For example, in Busemeyer and Myung's (1987) research, a hill-climbing learning model was appropriate to describe people's learning process for a resource allocation task; Busemeyer and Myung (1992) also applied a hill-climbing model successfully to describe criterion learning for a probabilistic categorization task. In contrast, Erev (1998) has shown that a reinforcement-learning model is also appropriate to describe the learning process for a categorization task. Furthermore, Erev explicitly proposed the reinforcement-learning model as an alternative model to the hill-climbing model. Moreover, Erev and Gopher (1999) suggested the reinforcement-learning model for a resource allocation task in which attention was the resource to be allocated and showed, by simulations, that the model's predictions are consistent with experimental findings. In summary, direct comparisons of the two approaches appear necessary to decide in which domain each learning model approach works best. To extend the generality of our comparison of the GLOS and LOCAD models, we first compare the models with respect to how well they predict a learning process for repeated resource allocations and, second, we test whether the models are also capable of predicting individual characteristics of the learning process.

1 The Euclidean distance between two allocations X_i and X_j, with k indexing the three possible assets, is defined as $D_{ij} = \sqrt{\sum_{k=1}^{3} (X_{ki} - X_{kj})^{2}}$.


Finally, although it is true that we can only test special cases for each approach, we currently do not know of any other examples within either approach that can outperform the versions that we are testing.

Global Search Model

Erev (1998), Roth and Erev (1995), and Erev and Roth (1998) have proposed, in varying forms, a reinforcement-learning model for learning in different decision problems. The GLOS model was particularly designed for the resource allocation problem. The basic idea of the model is that decisions are made probabilistically in proportion to expectancies (called propensities by Erev and colleagues). The expectancy for a particular option increases whenever a positive payoff or reward is provided after it is chosen. This general reinforcement idea can be traced back to early work by Bush and Mosteller (1955), Estes (1950), and Luce (1959); for more recent learning models see Börgers and Sarin (1997), Camerer and Ho (1999a, 1999b), Harley (1981), Stahl (1996), and Sutton and Barto (1998).

The GLOS learning model for the resource allocation problem is based on the following assumptions: Each allocation alternative is assigned a particular expectancy. First, an allocation is selected probabilistically in proportion to the expectancies. Second, the received payoff is used to determine the reinforcement for all allocation alternatives, in such a way that the chosen allocation alternative receives reinforcement equal to the obtained payoff; allocation alternatives close to the chosen one receive slightly less reinforcement; and allocation alternatives that are far away from the chosen allocation alternative receive very little reinforcement. Finally, the reinforcement is used to update the expectancies of each allocation alternative, and the process returns to the first step.

In more detail, GLOS is defined as follows: The preferences for the different allocation alternatives are expressed by expectancies q_it, where i is an index over the finite number of possible allocations. The probability p_it that a particular allocation, i, is chosen at Trial t is defined by (cf. Erev & Roth, 1998)

$$p_{it} = q_{it} \Big/ \sum_{i=1}^{N} q_{it}. \qquad (1)$$

For the first trial, all expectancies are assumed to be equal and determined by the average payoff that can be expected from random choice, multiplied by w, which is a free, so-called "initial strength" parameter and is restricted to w > 0. After a choice of allocation alternative j on Trial t is made, the expectancies are updated by the reinforcement received from the decision, which is defined as the received payoff, r_jt. For a large grid of allocation alternatives, it is reasonable to assume that not only the chosen allocation is reinforced but also similar allocations. Therefore, to update the expectancies of any given allocation alternative, i, the reinforcement r_it is determined by the following generalization function (cf. Erev, 1998):

$$r_{it} = r_{jt} \cdot g(x_{ij}) = r_{jt} \cdot \exp\left(-\,x_{ij}^{2} / 2\sigma_R^{2}\right), \qquad (2)$$

where x_ij is the Euclidean distance of a particular allocation, i, to the chosen allocation, j, and the standard deviation σ_R is the second free parameter. This function was chosen so that the reinforcement r_it for the chosen allocation j equals the received payoff r_jt.2

In the case of a negative payoff, r_jt < 0, Equation 2 was modified as follows: r_it = r_jt · g(x_ij) − r_jt. By using this modification, if the current payoff is negative, then the chosen allocation receives a reinforcement of zero, whereas all other allocation alternatives receive positive reinforcements. Finally, the determined reinforcement is used to update the expectancies by the following updating rule (see Erev & Roth, 1998):

$$q_{it} = (1 - \phi)\, q_{i(t-1)} + r_{it}, \qquad (3)$$

where φ ∈ [0, 1] is the third free parameter, the forgetting rate. The forgetting rate determines how strongly previous expectancies affect new expectancies. If the forgetting rate is large, the obtained reinforcement has a strong effect on the new expectancies. To ensure that all possible allocation alternatives are chosen at least with a small probability, the minimum expectancy for all options is restricted to v = 0.0001 (according to Erev, 1998). After the updating process, the probability of selecting any particular allocation alternative is determined again.

In summary, the GLOS learning model has three free parameters: (a) the initial strength parameter, w, which determines the impact of the initial expectancies; (b) the standard deviation, σ_R, of the generalization function, which determines how similar (close) allocations have to be to the chosen allocation to receive substantial reinforcement; and (c) the forgetting rate, φ, which determines the impact of past experience compared with present experience. It is important to limit the number of parameters to a relatively small number, because models built on the basis of too many parameters will fail to generalize to new experimental conditions.
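To make these mechanics concrete, here is a minimal Python sketch of one GLOS trial. It is our own illustration under the definitions above; the allocation grid, the payoff function, and all variable names are assumptions, not the authors' code.

```python
import numpy as np

# Integer-percentage allocation grid: rows are (a, b, c) summing to 100.
grid = np.array([(100 - b - c, b, c)
                 for b in range(101) for c in range(101 - b)], dtype=float)
N = len(grid)   # 5,151 alternatives

def init_expectancies(w, mean_random_payoff):
    """Initial expectancies: w times the payoff expected from random choice."""
    return np.full(N, w * mean_random_payoff)

def glos_trial(q, sigma_r, phi, payoff, rng):
    """One GLOS trial: choose (Eq. 1), generalize reinforcement (Eq. 2), update (Eq. 3)."""
    p = q / q.sum()                              # Equation 1
    j = rng.choice(N, p=p)                       # probabilistic choice
    r_j = payoff(grid[j])                        # payoff of the chosen allocation
    d = np.linalg.norm(grid - grid[j], axis=1)   # distances to the chosen allocation
    g = np.exp(-d ** 2 / (2 * sigma_r ** 2))     # generalization function, Equation 2
    r = r_j * g if r_j >= 0 else r_j * g - r_j   # negative-payoff modification of Eq. 2
    q = (1 - phi) * q + r                        # Equation 3
    return np.maximum(q, 0.0001), j              # minimum expectancy v = 0.0001
```

Starting from init_expectancies and calling glos_trial repeatedly traces out one simulated agent's learning curve.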

Local Adaptation Learning Model

The LOCAD learning model incorporates the idea of a hill-climbing learning mechanism. In general, hill-climbing mechanisms are widely used heuristics for optimization problems whose analytic solutions are too complex (Russel & Norvig, 1995). The basic idea is to start with a randomly chosen decision as a temporary solution and to change the decision slightly in the next trial. If the present decision leads to a better outcome than a reference outcome (i.e., the best previous outcome), this decision is taken as a new temporary solution. Starting from this solution, a slightly different decision is made in the same direction as the present one. If the present decision leads to an inferior outcome, the temporary solution is kept, and starting from this solution a new decision is made in the opposite direction from the present decision. The step size, that is, the distance between successive decisions, usually declines during search. The search stops when no further changes using this method yield substantial improvement.

2 This is one aspect in which the GLOS model varies from Erev's (1998) reinforcement model, where the density of the generalization function is set equal to the received payoff. This constraint, set by Erev, has the disadvantage that the standard deviation of the generalization function, which is supposed to be a free parameter, interacts with the received payoff used as a reinforcement, such that a large reinforcement for a chosen allocation is, for example, only possible if a small standard deviation of the generalization function is chosen. We are confident that this difference and all other differences of the GLOS model from the reinforcement model by Erev represent improvements, in particular for the allocation task, which makes it a strong competitor to the LOCAD model.


This process requires that the available decision alternatives have an underlying causal structure, such that they can be ordered by some criterion and a direction of change exists. Consequently, for decision problems that do not fulfill this requirement, the LOCAD learning model cannot be applied. It is well known that hill-climbing heuristics are efficient, because they often require little search, but their disadvantage can be suboptimal convergence (Russel & Norvig, 1995), in other words, "getting stuck" in a local maximum.

LOCAD is defined as follows: It is assumed that decisions are made probabilistically, as in the GLOS learning model. In the first trial, identical to GLOS, an initial allocation is selected with equal probability from among all possible allocations. For the second allocation, the probability of selecting any particular allocation is defined by the following distribution function:

$$p_{it} = f_S(x_{ij})/K = \exp\left(-\,(x_{ij} - s_t)^{2} / 2\sigma_S^{2}\right)\big/K, \qquad (4)$$

where x_ij is the Euclidean distance of any allocation, i, to the first chosen allocation, j; the standard deviation σ_S is the first free parameter; and K is simply a constant that normalizes the probabilities so that they sum to one. The step size, s_t, changes across trials as follows:

$$s_t = \frac{s_1}{2} \cdot \frac{v_{t-1} - v_{t-2}}{v_b} + \frac{s_1}{t}, \qquad (5)$$

where s_1 is the initial step size, the second free parameter; v_t is the payoff received on Trial t (with v_0 = 0); and v_b is the payoff of the reference allocation. The reference allocation is the allocation alternative that produced the highest payoff in the past and is represented by the index b, for the best allocation so far. Accordingly, the step size is defined by two components. The first component depends on the payoffs of the preceding allocations and the maximum payoff received so far. The second component depends on time, so that the step size automatically declines over the course of learning. Note that for Trial t = 2, the step size, s_2, equals the initial step size, s_1.
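As a small illustration of Equation 5 (our own sketch; the payoff numbers are arbitrary), the following fragment also reproduces the observation that the step size on Trial 2 equals s_1:

```python
def step_size(s1, t, v_prev, v_prev2, v_best):
    """Step size on Trial t (Equation 5): v_prev and v_prev2 are the payoffs of
    the two preceding trials, and v_best is the payoff of the reference allocation."""
    return (s1 / 2) * (v_prev - v_prev2) / v_best + s1 / t

# On Trial 2, v_prev2 = v_0 = 0 and the reference payoff is v_1 itself,
# so the step size equals the initial step size s1:
print(step_size(s1=23, t=2, v_prev=28.0, v_prev2=0.0, v_best=28.0))   # 23.0
```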

For the third and all following trials, the probability of selecting any particular allocation is determined by the product of two operations, one that selects the step size and the other that selects the direction of change. More formally, the probability of selecting an allocation alternative i on Trial t > 2 is given by

$$p_{it} = f_S(x_{ib})\, f_A(y_{ij})/K. \qquad (6)$$

In the above equation, the probability of selecting a step size is determined by the function f_S(x_ib), which is the same function previously defined in Equation 4, with the distance x_ib defined as the Euclidean distance from any allocation i to the reference allocation b. The second function is represented by f_A(y_ij) = exp[−(y_ij − a_t)² / 2σ_A²], where y_ij is the angle between the direction vector of any allocation i and the direction vector of the preceding allocation, j, and a_t equals 0° if the preceding allocation led to a payoff higher than or equal to that of the reference allocation; otherwise a_t equals 180°. The direction vector of any allocation i is defined as the vector from the preceding allocation j to the allocation i (defined as X_i − X_j). The angle between the two direction vectors ranges from 0° to 180° (mathematically, the angle is determined by the arccosine of the dot product of the two direction vectors after each is normalized to a length of one). The function f_A(y_ij) has a standard deviation σ_A as the third free parameter.

In summary, the LOCAD learning model has the following steps. In the first trial, an allocation alternative is chosen with equal probability, and in the second trial a slightly different allocation alternative is selected. For selecting an allocation alternative in the third and all following trials, the payoff received in the preceding trial is compared with the payoff of the reference allocation, that is, the maximum payoff received so far (this is an important difference from the model proposed by Busemeyer & Myung, 1987, where the reference allocation was the previous allocation). If the payoff increased (or stayed the same), allocations in the same direction as the preceding allocation are likely to be selected. On the other hand, if the payoff decreased, allocations in the opposing direction are more likely to be selected. The LOCAD learning model has three free parameters: (a) the initial step size, s_1, which is used to determine the most likely distance between the first and second allocations, and on which the succeeding step sizes depend; (b) the standard deviation, σ_S, of the distribution function f_S, which determines how much the distance between new allocations and the reference allocation is likely to differ from the distance defined by the step size, s_t; and (c) the standard deviation, σ_A, of the distribution function f_A, which determines how much the direction of new allocations is likely to differ from the direction (or opposing direction) of the preceding allocation.
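The following Python sketch puts these pieces together for a trial t > 2. It is our own illustration under the equations above; the bookkeeping of when the reference allocation is updated, the grid construction, and all names are assumptions rather than the authors' code.

```python
import numpy as np

grid = np.array([(100 - b - c, b, c)
                 for b in range(101) for c in range(101 - b)], dtype=float)

def angle_deg(u, v):
    """Angle between two direction vectors in degrees (0-180), via the arccosine
    of the dot product of the vectors normalized to length one."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    return float(np.degrees(np.arccos(np.clip(np.dot(u, v) / (nu * nv), -1.0, 1.0))))

def locad_probs(x_prev, x_prev2, x_best, payoff_prev, payoff_best,
                s_t, sigma_s, sigma_a):
    """Choice probabilities over all allocations for Trial t > 2 (Equation 6)."""
    a_t = 0.0 if payoff_prev >= payoff_best else 180.0     # preferred direction of change
    prev_dir = x_prev - x_prev2                            # direction of the preceding move
    dist_to_best = np.linalg.norm(grid - x_best, axis=1)   # x_ib in Equation 6
    f_s = np.exp(-(dist_to_best - s_t) ** 2 / (2 * sigma_s ** 2))
    y = np.array([angle_deg(x - x_prev, prev_dir) for x in grid])
    f_a = np.exp(-(y - a_t) ** 2 / (2 * sigma_a ** 2))
    p = f_s * f_a
    return p / p.sum()                                     # K normalizes the probabilities
```

Drawing an allocation from these probabilities, and updating the reference allocation whenever a new payoff exceeds the best payoff so far, yields one simulated LOCAD agent.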

The LOCAD learning model has similarities to the learning direction theory proposed by Selten and Stocker (1986) and to the hill-climbing learning model proposed by Busemeyer and Myung (1987). Learning direction theory also assumes that decisions are slightly adjusted on the basis of feedback, by comparing the outcome of a decision with hypothetical outcomes of alternative decisions. The LOCAD model represents a simple learning model with only three free parameters, compared with the hill-climbing model proposed by Busemeyer and Myung (1987) with eight free parameters.

The LOCAD model is to some extent also related to so-called belief-based learning models (Brown, 1951; Cheung & Friedman, 1997; Fudenberg & Levine, 1995; see also Camerer & Ho's, 1999a, 1999b, experience-weighted attraction learning model, which includes belief-based models as a special case). These models, which have been prevalently applied to learning in games, assume that people form beliefs about others' future behavior on the basis of past experience. Therefore, decision alternatives that had not been chosen in the past and did not obtain any reinforcement, but would have resulted in good payoffs, are likely to be chosen in the future. Similar to belief-based models, the LOCAD model predicts that people form beliefs about which decision alternatives might produce higher payoffs compared with the present decision. However, in contrast to belief-based models, these beliefs are not based on foregone payoffs determined by the total history of past decisions but on an assumption about the underlying causal structure of the decision problem.

The Relationship of the Two Learning Models

The two models presented are, in our view, appropriate implementations of the two approaches of learning models we consider. Any empirical test of the two models, strictly speaking, only allows conclusions about the empirical accuracy of the particular learning models implemented. However, keeping this restriction in mind, both learning models are provided with a flexibility (expressed in the three free parameters of each model) that allows them to predict various learning processes. Variations of our implementations (e.g., using an exponential choice rule for determining choice probabilities instead of the implemented linear choice rule) might increase the empirical fit of the models but will not abolish the substantially different predictions made by the two learning models for the allocation decision problem we consider.3

What different predictions can be derived from the two learning models? In general, the GLOS model predicts that the probabilities with which decision alternatives are selected depend on the total stock of previous reinforcements for these alternatives. This implies a global search process within the entire set of alternatives, which should frequently find the optimal alternative. In contrast, the LOCAD model only compares the outcome of the present decision with the best outcome so far and ignores all other experienced outcomes, which are not integrated into an expectancy score for each alternative. Instead, which alternatives will be chosen depends on the success and the direction of the present decision; thereby, an alternative similar to the present alternative will most likely be selected. This implies a strong path dependency, so that, depending on the starting point of the learning process, the model will often not converge to the optimal outcome if several payoff maxima exist.

However, the specific predictions of the models depend on the parameters, so that particular parameter values could lead to similar behavior for both models. For example, if the GLOS model has a high forgetting rate, the present allocation strongly influences the succeeding allocation, resulting in a local search similar to the LOCAD model, so that it could also explain convergence to local payoff maxima. Likewise, if the LOCAD model incorporates a large initial step size, it implies a more global, random search process, and therefore it could explain convergence to a global maximum. Because of this flexibility of both models, one can expect a relatively good fit of both models when the parameter values are fitted to the data.

Therefore, we used the generalization method (Busemeyer & Wang, 2000) to compare the models, which entails a two-stage procedure. In the first stage, in Study 1, each model was fit to the individual learning data, and the fits of the two models were compared. These fits provided estimates of the distribution of the parameters over individuals for each model. In the second stage, the parameter distributions estimated from Study 1 were used to generate model predictions for a new learning condition presented in Study 2. The accuracy of the a priori predictions of the two models for the new condition in Study 2 provides the basis for a rigorous comparison of the two models.

Study 1

In this experiment, the decision problem consisted of repeatedly allocating a resource among three financial assets. The rates of return were initially unknown, but they could be learned by feedback from past decisions. To add a level of difficulty to the decision problem, the rate of return for each asset varied depending on the amount invested in that asset and on the amount invested in the other assets. One could imagine a real-life analogue in which financial assets have varying returns because of fixed costs, economies of scale, or efficiency, depending on investments in other assets. The purpose of the first study was to explore how people learn to improve their allocation decisions and whether they are able to find the optimal allocation that leads to the maximum payoff. Study 1 was also used to compare the fits of the two models with the individual data and to estimate the distribution of parameters for each model.

Method

Participants. Twenty persons (14 women and 6 men), with an average age of 22 years, participated in the experiment. The computerized task lasted approximately 1 hr. Most participants (95%) were students in various departments of Indiana University. For their participation, they received an initial payment of $2. All additional payments depended on the participants' performance; the average payment was $18.

Procedure. The total payoff from an allocation is defined as the sum of payoffs obtained from the three assets. The selection of the particular payoff function was motivated by the two learning models' predictions. As can be seen in Figure 1, the allocation problem was constructed such that it produced both a local and a global maximum with respect to the possible payoffs. In general, one would expect that people will get stuck at the local payoff maximum if their learning process is consistent with the LOCAD model. In contrast, the GLOS model predicts a learning process that should frequently converge at the global payoff maximum.

Figure 1 shows only the proportions invested in Asset B and Asset C, with the rest being invested in Asset A. High investments in Asset C lead to low payoffs (in the worst case, a payoff of $3.28), whereas low investments in Asset C result in higher payoffs. The difficult part is to find out that there are two payoff maxima: first, the local maximum with a payoff of $32.82 when investing 28% in Asset B and 19% in Asset C and, second, the global maximum with a payoff of $34.46 when investing 88% in Asset B and 12% in Asset C, yielding a difference of $1.64 between the two maxima. Note that there is no variability in the payoffs at each allocation alternative, so that if a person compares the local and the global maximum, it is perfectly obvious that there is a payoff difference between them favoring the latter. The main difficulty for the person is finding the global maximum, not detecting a difference between the local and global maxima. The Euclidean distance between the corresponding allocations of the local and global maximum is 80 (the maximum possible distance between two allocations is 141). From random choice, an average payoff of $24.39 can be expected. The payoff functions for each asset are provided in the Appendix.

The participants received the following instructions: They were to make repeated allocation decisions in two phases of 100 trials. On each trial, they would receive a loan of $100 that had to be allocated among three "financial assets" from which they could earn profit. The loan had to be repaid after each round, so that the profit from the investment decisions equaled the participant's gains. The three assets were described as follows: Investments in Asset A "pay a guaranteed return equal to 10% of your investment," whereas the returns from Asset B and Asset C depended on how much of the loan was invested in the asset. Participants were informed that there existed an allocation among Asset A, Asset B, and Asset C that would maximize the total payoffs and that the return rates for the three assets were fixed for the whole experiment. It was explained that they would receive 0.25% of their total gains as payment for their participation. After the first phase of 100 trials, participants took a small break.

3 In fact, when we constructed the learning models for Study 1, various modifications of both learning models were tested. For example, for the GLOS model, among other things, we used different generalization functions to determine reinforcements (see also Footnote 2) or different methods to determine the reinforcement in the case of negative payoffs. For the LOCAD model, among other things, we used different reference outcomes with which the outcome of a present decision was compared to determine the success of a decision, or different methods for determining the step size of the current trial. In summary, the specified LOCAD and GLOS learning models were the best models (according to the goodness-of-fit criterion in Study 1) representing the two approaches of learning models. Therefore, the conclusions we draw from the results of our model comparisons are robust to variations of the present definition of the two learning models.


Thereafter, they received the information that the payoff functions for Asset B and Asset C had been changed but that everything else was identical to the first phase.

In fact, the payoff functions for Asset B and Asset C were interchanged for the second phase. To control for any order effects of which payoff function was assigned to Asset B and which to Asset C, for half of the participants the payoff function of Asset B in the first phase was assigned to Asset C and the payoff function for Asset C was assigned to Asset B. For the second phase, the reverse order was used.

Results

First, a potential learning effect is analyzed before the two learning models are compared and more specific characteristics of the learning process are considered.

Learning effects. The average investment in Asset B increased from 26% (SD = 15%) in the 1st trial to an average investment of 36% (SD = 21%) in the 100th trial, whereas the average investment in Asset C decreased from an average of 34% (SD = 21%) in the 1st trial to an average of 19% (SD = 8%) in the 100th trial. This difference represents a substantial change in allocations, corresponding to an average Euclidean distance of 28 (SD = 29), t(19) = 4.21, p < .001, d = 0.94. Furthermore, this change led to an improvement in payoffs, which is discussed in more detail below. Figure 2 shows the learning curve for the first phase of the experiment. The percentages invested in Asset B and Asset C are plotted as a function of training (with a moving average of 9 trials).

To investigate the potential learning effect, the 100 trials of each phase were aggregated into blocks of 10 trials (trial blocks). A repeated measures analysis of variance (ANOVA) was conducted, with the average obtained payoff as the dependent variable, the trial blocks and the two phases of 100 trials as two within-subject factors, and the order in which the payoff functions were assigned to the assets as a between-subjects factor. A strong learning effect was documented, as the average obtained payoff of $28 in the first block (SD = 2.2) increased substantially across the 100 trials to an average payoff of $32 (SD = 1.6) in the last block, F(9, 10) = 5.18, p = .008, η² = 0.82. In addition, there was a learning effect between the two phases, as participants on average did better in the second phase (M = $30 for the first phase, SD = 2.1, vs. M = $31 for the second phase, SD = 1.9), F(1, 18) = 12.69, p = .002, η² = 0.41. However, this effect was moderated by an interaction between trial blocks and the two phases, F(9, 10) = 3.30, p = .038, η² = 0.75. This interaction can be attributed to a more rapid learning process in the second phase compared with the first phase: The average obtained payoff was higher in the second phase from the 2nd to 5th trial blocks, whereas for the 1st trial block and the last 5 trial blocks, the payoffs did not differ. The order in which the payoff functions were assigned to Asset B and Asset C had no effect on the average payoffs (therefore, for simplicity, in the following text and in the presented figures, the investments in Assets B and C are interchanged for half of the participants). No other interactions were observed.

Model comparison. How well do the two learning models fit the observed learning data? We wished to compare the models under conditions in which participants had no prior knowledge, and so we used only the data from the first phase to test the models. Each model was fit separately to each individual's learning data as follows.

First, a set of parameter values was selected for a model separately for each individual. Using the model and parameters, we generated a prediction for each new trial, conditioned on the participant's past allocations and received payoffs before that trial. The model's predictions are represented by a probability distribution across all 5,151 possible allocation alternatives, whereas in the observed data the selected allocation alternative of a participant received a value of 1 and all other allocation alternatives received values of 0.

Figure 1. The payoff function for the total payoff of the allocation problem in Study 1. The figure shows the investment in Asset B and Asset C (which determines the investment in Asset A) and the corresponding payoff.

Figure 2. Average participants' allocations and average predictions of the two learning models fitted to each individual. The figure shows a moving average of nine trials, such that for each trial the average of the present allocation and the preceding and succeeding four allocations is presented. (Note that for the first 4 trials the moving average is determined by five to eight trials.) GLOS = global search model; LOCAD = local adaptation model. Solid diamonds represent Real Asset B; solid triangles represent Real Asset C; solid lines represent GLOS Asset B; hatched lines represent GLOS Asset C; open squares represent LOCAD Asset B; and open triangles represent LOCAD Asset C.


The accuracy of the prediction for each trial was evaluated using the sum of squared error. That is, we computed the squared error of the observed (0 or 1) response and the predicted probability for each of the 5,151 allocation alternatives and summed these squared errors across all the alternatives for each trial to obtain the sum of squared error for that trial (this score ranged from 0 to 2). To assess the overall fit for a given individual, model, and set of parameters, we determined the average of the sum of squared error (SSE) across all 100 trials.4
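Put as code (a sketch under our reading of this scoring rule; the array names are ours):

```python
import numpy as np

def trial_sse(pred_probs, chosen_index):
    """Sum of squared error for one trial: the observed vector is 1 for the
    chosen allocation and 0 for the other 5,150 alternatives (range 0 to 2)."""
    observed = np.zeros_like(pred_probs)
    observed[chosen_index] = 1.0
    return float(((observed - pred_probs) ** 2).sum())

def mean_sse(trial_pred_probs, trial_choices):
    """Average of the per-trial scores across a participant's 100 trials."""
    return float(np.mean([trial_sse(p, c)
                          for p, c in zip(trial_pred_probs, trial_choices)]))
```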

To compare the fits of the two learning models for Study 1, we searched for the parameter values that minimized the SSE for each model and individual. To optimize the parameters for each participant and model, reasonable parameter values were first selected by a grid-search technique, and thereafter the best fitting grid values were used as a starting point for a subsequent optimization using the Nelder-Mead simplex method (Nelder & Mead, 1965). For the optimization process, the parameter values for the GLOS model were restricted to initial strength values w between 0 and 10, standard deviations σ_R of the generalization function between 1 and 141, and forgetting rates φ between 0 and 1. The parameter values for the LOCAD model were restricted to initial step sizes between 1 and 141, a standard deviation σ_S of the distribution function f_S between 1 and 141, and a standard deviation σ_A of the distribution function f_A between 0° and 360°.
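A rough sketch of that two-step search, assuming a function model_sse(params, data) that returns the mean SSE just described (SciPy's Nelder-Mead implementation is used for the simplex stage; the grid values below are illustrative, not the ones actually used):

```python
import itertools
import numpy as np
from scipy.optimize import minimize

def fit_participant(model_sse, data, param_grids):
    """Grid search over candidate parameter values, then Nelder-Mead refinement
    starting from the best grid point; returns the estimates and their SSE."""
    candidates = list(itertools.product(*param_grids))
    start = min(candidates, key=lambda p: model_sse(np.array(p), data))
    result = minimize(lambda p: model_sse(p, data),
                      x0=np.array(start), method="Nelder-Mead")
    return result.x, result.fun

# Illustrative grids for the GLOS parameters (w, sigma_R, phi) within the stated ranges:
glos_grids = [np.linspace(0.5, 9.5, 4),
              np.linspace(1, 141, 6),
              np.linspace(0.1, 0.9, 5)]
```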

The above procedure was applied to each of the 20 participants to obtain 20 sets of optimal parameter estimates. For the GLOS model, this produced the following means and standard deviations for the three parameters: initial strength mean, w = 3.4 (SD = 4.4); forgetting rate mean, φ = 0.24 (SD = 0.18); and standard deviation mean of the generalization function, σ_R = 1.8 (SD = 1.5). The mean SSE for the GLOS model was 0.94 (SD = 0.10).

For the LOCAD learning model, this estimation procedure produced the following means and standard deviations: initial step size mean, s_1 = 23 (SD = 30); standard deviation mean for the distribution function f_S, σ_S = 22 (SD = 40); and standard deviation mean for the distribution function f_A, σ_A = 119° (SD = 72°). The mean and standard deviation of the SSE for the LOCAD model were 0.91 and 0.18, respectively. In summary, for Study 1 the LOCAD model was slightly more appropriate than the GLOS model, according to the SSE, for predicting participants' allocations (Z = 1.5, p = .135; Wilcoxon signed rank test).

Figure 2 shows the average allocation of the participants across the first 100 trials. Additionally, Figure 2 shows the average allocation predicted by both learning models when fitted to each participant. Both models adequately describe the last two thirds of the learning process. However, for the first third, GLOS predicts an excessively large proportion invested in Asset C, whereas LOCAD overestimates the proportion invested in Asset B and underestimates the proportion invested in Asset C.5

Individual characteristics of the learning process. In addition to analyzing the allocations of the participants, one can ask whether the learning models are also capable of predicting individual characteristics of the learning process. One characteristic is whether a participant eventually found the global maximum, only came close to the local maximum, or was not close to either maximum. Figure 3A shows the percentage of participants who were close (within 5%) to the allocations that produced the global or local maximum across the 100 trials. In the first trial, no participant made an allocation corresponding to the local or global maximum. At the end of training, only 10% of participants were able to find the optimal allocation producing the maximum payoff, whereas 50% of the participants ended up choosing allocations close to the local maximum. Figure 3A also shows the predictions of the models. Both models accurately describe the proportion of participants who make allocations according to the local or global maximum.

As noted earlier, participants were able to increase their payoffs over the 100 trials through learning (see Figure 3B). Both learning models also accurately describe this increase in payoffs.

As a third criterion for comparing the two learning models, the effect of training on the magnitude with which individuals changed their decisions was considered. To describe these changes during learning, the Euclidean distances between successive trials were determined. Figure 4A shows that at the beginning of the learning process, succeeding allocations differed substantially, with an average Euclidean distance of 30 units, whereas at the end of the task, small changes in allocations were observed (M distance = 9 units). LOCAD more accurately predicts the magnitude with which participants change their allocations than GLOS does. On average, GLOS predicts too small a magnitude of change between successive trials.

A fourth characteristic to examine is the direction of change in allocations that individuals made following different types of outcome feedback. The LOCAD model predicts that the outcome of a decision is compared with the outcome of the most successful decision to that point, and, if the present decision leads to a greater payoff, the direction of the succeeding decision is likely to be the same as the direction of the present decision. If a decision leads to a smaller payoff, the succeeding decision is likely to be in the opposite direction. In contrast, the GLOS model predicts that a decision is based on the aggregated success and failure of all past decisions, so that no strong correlation between the success of a present decision and the direction of the succeeding decision is expected. To test this prediction, the angles between the direction of an allocation and the direction of the preceding allocation were determined for all allocations. Figure 4B shows the proportion of preceding allocations that were successful (i.e., led to a greater payoff than the allocation before), categorized with respect to the angle between the direction of an allocation and the direction of the preceding allocation.

4 As an alternative method for parameter estimation, maximum likelihood estimation has the drawback, compared with least-squares estimation, that it is sensitive to very small predicted probabilities, which frequently occurred for the present task with its large number of possible allocations; for advantages of least-squares estimation, see Selten (1998). Furthermore, the optimal properties of maximum likelihood hold only when the model is the true model, which is almost never correct. In addition, these properties hold only if the parameters fall inside the convex boundary of the parameter space, which is not guaranteed in our models. In summary, under conditions of possible model misspecification, least-squares estimation is more robust than maximum likelihood estimation, so the statistical justifications for maximum likelihood do not hold up under these conditions.

5 Note that the models' parameters were not fitted by optimizing the predicted average allocations against the observed average allocations but by optimizing the predicted probabilities of the allocations that were actually selected; otherwise a closer fit would result.


Consistent with the LOCAD model, we observed an association between the participants' allocation directions and their success: For 70% of all allocations made in the same direction as the preceding allocation, the preceding allocation was successful, compared with only 35% of all allocations made in an opposite direction.
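The association reported here (and plotted in Figure 4B) can be computed roughly as follows; this is our own sketch, with allocs and payoffs standing for one participant's 100 allocation vectors and payoffs:

```python
import numpy as np

def angle_deg(u, v):
    """Angle (degrees) between two direction vectors; undefined for zero vectors."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0 or nv == 0:
        return np.nan
    return float(np.degrees(np.arccos(np.clip(np.dot(u, v) / (nu * nv), -1.0, 1.0))))

def success_by_angle(allocs, payoffs, edges=(0, 30, 60, 90, 120, 150, 180)):
    """For each angle category, the share of preceding allocations that were
    successful, i.e., earned more than the allocation before them."""
    angles, success = [], []
    for t in range(2, len(allocs)):
        a = angle_deg(allocs[t] - allocs[t - 1], allocs[t - 1] - allocs[t - 2])
        if not np.isnan(a):
            angles.append(a)
            success.append(payoffs[t - 1] > payoffs[t - 2])
    angles = np.array(angles)
    success = np.array(success, dtype=float)
    category = np.digitize(angles, edges[1:-1])   # 0 = 0-30 degrees, ..., 5 = 150-180
    return [success[category == k].mean() if np.any(category == k) else np.nan
            for k in range(len(edges) - 1)]
```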

This association was predicted, although to different extents, by both models. As expected for LOCAD, a preceding allocation was likely to have been successful (in 67% of all cases) when the direction of an allocation was the same as the direction of the preceding allocation (angles between 0° and 30°), whereas the preceding allocation was unlikely to have been successful (in only 41% of all cases) when the direction of an allocation was opposite to the preceding direction.

Figure 3. Individual characteristics of the decision process in Study 1. A: Percentage of allocations corresponding to the local or global payoff maximum across all trials (with a tolerated deviation of 5% from the allocations that lead to the global or local maximum), presented with a moving average of nine trials. B: Average payoff across all trials, presented with a moving average of nine trials. GLOS = global search model; LOCAD = local adaptation model.


Surprisingly, this association was also observed for GLOS: For 73% of all allocations made in a similar direction to the preceding allocation, the preceding allocation was successful, compared with 39% of all allocations made in an opposite direction. However, the proportions of successful preceding allocations for the different angles were more strongly correlated with LOCAD's predictions (r = .95) than with GLOS's predictions (r = .84).

Summary of Study 1

In Study 1, we showed that people are able to improve their decisions in an allocation situation substantially when provided with feedback. However, only a few participants were able to find the allocation that produced the maximum possible payoff.

Figure 4. Individual characteristics of the decision process in Study 1. A: Average magnitude of changes (step size), measured by the Euclidean distance between the allocations of successive trials (with possible values ranging from 0 to 141), presented with a moving average of nine trials. B: The angles between allocations' directions and the directions of preceding allocations were determined and categorized into six intervals. For each category, the percentage of successful preceding allocations (i.e., those leading to a higher payoff than the allocations before) is presented. GLOS = global search model; LOCAD = local adaptation model.


This result can be explained by the LOCAD learning model, which described the empirical results slightly better than the GLOS learning model on the basis of the goodness-of-fit criterion. If people start with a particular allocation and try to improve their situation by slightly adapting their decisions, as predicted by LOCAD, then, depending on their starting position, they will often not find the global payoff maximum.

However, because both models were fitted to each individual separately, it is difficult to decide which model is more appropriate, as the two models make similar predictions. When focusing on the individual learning characteristics, only one out of four characteristics supports the LOCAD model: the magnitude with which allocations are changed in successive trials. The other three process characteristics are appropriately described by both learning models. This result is not very surprising if one considers that both models were fitted for each individual and only predicted each new trial on the basis of the information from previous trials. In contrast, in Study 2 both models made a priori predictions for independent data, enabling a rigorous comparison of the two models.

Study 2

In light of the results of Study 1, which showed that people often end up with suboptimal outcomes even when provided with substantial learning opportunity, one might object that the total payoff function used in Study 1 produced only a relatively small payoff difference between the two maxima, providing small incentives for participants to search for the global maximum. In addition, if one takes the opportunity costs of search into account, it might be reasonable to stay at the local maximum. One could argue that the small difference between the payoffs does not satisfy the criterion of payoff dominance (Smith, 1982); that is, the additional payoff does not dominate any (subjective) costs of finding the optimal outcome, so that participants are not sufficiently motivated to find the global payoff maximum. In Study 2, we addressed this critique by increasing the payoff difference between the local and global payoff maximum while keeping the shape of the total payoff function similar to that in Study 1.

Increasing the payoff difference between the local and global payoff maximum has direct implications for the predictions of the GLOS learning model: If the reinforcement for the global payoff maximum increases relative to the local payoff maximum, the probability of selecting the allocation alternative corresponding to the global maximum should increase according to the GLOS model. Therefore, one would expect the GLOS model to predict that more people will find the global maximum. In contrast, a larger payoff difference between the local and global payoff maximum does not affect the predictions of the LOCAD model.

Study 2 also provides an opportunity to test the two learning models on new, independent data by simulating 50,000 agents using model parameter values randomly selected from normal distributions with the means and standard deviations of the parameter values derived from the individual fitting process of Study 1. Given that the models' parameter values were not fitted to the data of Study 2, the models' predictions provide a stronger empirical generalization test of the models, which has often been asked for but seldom done (Busemeyer & Wang, 2000).
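In code, this generalization test might look roughly as follows (our own sketch: simulate_agent stands in for either learning model, and the handling of out-of-range parameter draws is an assumption, because the article does not specify it):

```python
import numpy as np

rng = np.random.default_rng(2003)

# Study 1 estimates (mean, SD) for the LOCAD parameters s1, sigma_S, sigma_A.
locad_means = np.array([23.0, 22.0, 119.0])
locad_sds = np.array([30.0, 40.0, 72.0])

def draw_parameters(means, sds, n_agents=50_000, lower=1e-3):
    """One parameter vector per simulated agent, clipped to stay positive."""
    params = rng.normal(means, sds, size=(n_agents, len(means)))
    return np.clip(params, lower, None)

def predicted_mean_allocations(simulate_agent, params, n_trials=100):
    """Average predicted allocation on each trial across all simulated agents;
    simulate_agent(p, n_trials) must return an (n_trials, 3) array."""
    runs = np.array([simulate_agent(p, n_trials) for p in params])
    return runs.mean(axis=0)
```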

Method

Participants. Twenty persons (13 women and 7 men), with an average age of 21 years, participated in the experiment. The duration of the computerized task was approximately 1 hr. Most participants (90%) were students in various departments of Indiana University. For their participation, they received an initial payment of $2. Additional payment was contingent on the participants' performance; the average payment was $20.

Procedure. The allocation problem was identical to the one used in Study 1, with the only difference being the modified payoff functions. The payoff functions differed by an increase in the payoff difference between the local and global payoff maximum (see Figure 5). Again, high investments in Asset C led to low payoffs, in the worst case to a payoff of −$34.55, whereas small investments in Asset C resulted in higher payoffs. The local maximum with a payoff of $32.48 was obtained when investing 29% in Asset B and 21% in Asset C (cf. 28% and 19%, respectively, with a payoff of $32.82 in Study 1), whereas the global maximum with a payoff of $41.15 was reached when investing 12% in Asset B and 88% in Asset C (the same allocation led to the global payoff maximum of $34.46 in Study 1). From random choice, an average payoff of $17.44 could be expected. The payoff functions yielded a difference of $8.67 and a Euclidean distance of 79 between the allocations corresponding to the local and global payoff maximum.

The instructions for the task in Study 2 were identical to those used in Study 1.

Results

As in Study 1, we first analyze a potential learning effect before the two learning models are compared and more specific characteristics of the learning process are considered.

Learning effects. In the 1st trial, the average allocation consisted of an investment of 26% in Asset B (SD = 12%), which increased to an average investment of 48% in Asset B (SD = 27%) in the 100th trial. The investment in Asset C decreased from 27% (SD = 13%) in the 1st trial to 22% (SD = 12%) in the 100th trial. As in Study 1, participants in Study 2 had a tendency in the first trial to invest slightly more in Asset A, which guaranteed a fixed return.

Figure 5. The payoff function for the total payoff of the allocation problem in Study 2.


The allocation in the first trial differed substantially from that in the 100th trial, with a Euclidean distance of 41, t(19) = 6.93, p < .001, d = 1.55.

To investigate any learning effect, the 100 trials of both phases were aggregated into blocks of 10 trials (trial blocks). A repeated measures ANOVA was conducted, with the obtained payoff as the dependent variable, the trial blocks and the two phases of 100 trials as two within-subject factors, and the order in which the payoff functions were assigned to the assets as a between-subjects factor.

A strong learning effect was documented, as the average obtained payoff of $25 in the first block (SD = 3.6) increased substantially across the 100 trials to an average payoff of $34 (SD = 4.7) in the last block, F(9, 10) = 4.09, p = .019, η² = 0.79. In addition, there was a learning effect between the two phases, as participants on average did better in the second phase (M = $30, SD = 3.7, vs. M = $33, SD = 4.5), F(1, 18) = 8.59, p = .009, η² = 0.32. In contrast to Study 1, the interaction between trial blocks and the two phases was not significant, F(9, 10) = 2.24, p = .112, η² = 0.67. The order in which the payoff functions were assigned to Asset B and Asset C had no effect on the average payoffs (therefore, for simplicity, in the following, the investments in Assets B and C are interchanged for half of the participants). No other interactions were observed.

Model comparison. How well did the two learning models predict participants' allocations across the first 100 trials? For Study 2, no parameter values were estimated. Instead, our testing approach consisted of simulating a large number of agents with the models' parameter values randomly selected from normal distributions, with the means and standard deviations of the parameter values derived from the fitting process of Study 1. Finally, the models' fits were assessed by calculating the mean squared error (MSE) of the average observed and average predicted allocations (the deviation between two allocations is defined by the Euclidean distance).
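A minimal sketch of this generalization procedure is given below. The simulate_agent function is a hypothetical placeholder for either learning model, and reading the MSE as the squared Euclidean distance between the mean observed and mean predicted allocations, averaged over trials, is our interpretation of the text.

```python
import numpy as np

def generalization_mse(observed, simulate_agent, param_means, param_sds,
                       n_agents=50_000, seed=0):
    """observed: (n_participants, n_trials, 3) allocations in percent.
    simulate_agent(params, n_trials): hypothetical simulator for one model,
    returning an (n_trials, 3) allocation path for one agent.
    Each agent's parameters are drawn from normal distributions whose means
    and standard deviations come from the Study 1 fits."""
    rng = np.random.default_rng(seed)
    n_trials, n_assets = observed.shape[1], observed.shape[2]
    predicted_sum = np.zeros((n_trials, n_assets))
    for _ in range(n_agents):
        params = rng.normal(param_means, param_sds)
        predicted_sum += simulate_agent(params, n_trials)
    predicted_mean = predicted_sum / n_agents      # average predicted allocation
    observed_mean = observed.mean(axis=0)          # average observed allocation
    squared_distances = np.sum((observed_mean - predicted_mean) ** 2, axis=1)
    return squared_distances.mean()                # mean squared Euclidean deviation
```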

Figure 6 shows the development of the average allocation of the participants across all 100 trials. In addition, the figure shows the predicted average allocation of both learning models. The LOCAD learning model describes the development of the allocations across the 100 trials better, with an MSE of 39. In contrast, the GLOS learning model describes the learning process less accurately, with an MSE of 117 between the predicted and observed average allocations. GLOS underestimates the magnitude of the learning effect for the allocation task.

Characteristics of the learning process. Does the LOCAD learning model predict individual characteristics of the learning process more suitably than the GLOS model? Figure 7A shows, again for Study 2, the proportion of allocations across all trials that correspond to the allocations that led to the local or global payoff maximum (with a tolerated deviation of 5%). Similar to Study 1, the proportion of participants who made allocations according to the local or global maximum increased substantially through learning across the 100 trials. However, again only a small number of participants (20%) finally found the allocation corresponding to the global payoff maximum, whereas a larger proportion (40%) got stuck at the allocation corresponding to the local payoff maximum. This result was again predicted by the LOCAD learning model. Although both models underestimate the proportion of allocations according to the local or global payoff maximum, the proportions predicted by LOCAD were closer to the observed data.
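A sketch of how such proportions can be computed follows; reading the 5% tolerance as a maximum deviation of 5 percentage points on each asset is our assumption (a Euclidean tolerance would be an alternative reading).

```python
import numpy as np

def proportion_near_maximum(allocations, target, tol=5.0):
    """allocations: (n_participants, n_trials, 3) in percentage points.
    target: length-3 allocation at the local or global payoff maximum.
    Returns, per trial, the proportion of participants whose allocation
    deviates from the target by at most tol points on every asset."""
    close = np.all(np.abs(allocations - np.asarray(target)) <= tol, axis=2)
    return close.mean(axis=0)
```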

Through learning, participants were able to increase their payoff over the 100 trials (see Figure 7B). Both models underestimated the payoff increase, but LOCAD's prediction was closer to the observed payoff increase than GLOS's prediction.

The effect of training on the magnitude with which the participants changed their decisions was similar to Study 1 (see Figure 8A), starting with an average magnitude of a Euclidean distance of 29 for the first 10 trials and ending with an average magnitude of 5 for the last 10 trials. Although both models underestimated the decline in the magnitude with which decisions were adapted, the predictions of LOCAD came closer to the observed development.
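The step-size measure and the nine-trial moving average used for presenting these results can be sketched as follows; the exact handling of the first and last trials is our assumption, guided by the note in the Figure 6 caption.

```python
import numpy as np

def step_sizes(allocations):
    """Euclidean distance between the allocations of successive trials;
    allocations: (n_participants, n_trials, 3) in percentage points."""
    return np.linalg.norm(np.diff(allocations, axis=1), axis=2)

def moving_average_nine(series):
    """Centered moving average over nine trials (the current trial plus the
    four preceding and the four succeeding trials); near the ends the window
    simply shrinks to the trials that are available."""
    series = np.asarray(series)
    n = len(series)
    return np.array([series[max(0, t - 4):min(n, t + 5)].mean() for t in range(n)])
```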

Similar to Study 1, an association between allocations' directions and their success was observed for the participants' decisions: For 74% of all allocations in the same direction as the preceding allocation (angles between 0° and 30°), the preceding allocation was successful, compared with only 35% of all allocations made in an opposite direction (angles between 150° and 180°; see Figure 8B).

An even stronger association was predicted by the LOCAD model: For 92% of all allocations made in the same direction as the preceding allocation, the preceding allocation was successful, compared with 20% of all allocations made in an opposite direction. In contrast, the GLOS model predicted a weak association: For 61% of all allocations made in the same direction as the preceding allocation, the preceding allocation was successful, compared with 46% of all allocations made in an opposite direction. The proportions of successful preceding allocations for the different angles were strongly correlated with both models' predictions (r = .93 for LOCAD and r = .92 for GLOS).
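The directional measure used here can be sketched as follows; treating successive allocation changes as vectors and binning the angle between the current and the preceding change into 30-degree intervals is our reading of the analysis.

```python
import numpy as np

def success_by_angle(allocations, payoffs, n_bins=6):
    """allocations: (n_trials, 3) for one participant; payoffs: (n_trials,).
    For each trial t >= 2, the angle between the allocation change on trial t
    and the change on trial t - 1 is computed and assigned to one of six
    30-degree bins; the function returns, per bin, the proportion of cases
    in which the preceding change was successful (raised the payoff)."""
    changes = np.diff(allocations, axis=0)
    successful = np.diff(payoffs) > 0
    hits = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    for t in range(1, len(changes)):
        prev, curr = changes[t - 1], changes[t]
        norm = np.linalg.norm(prev) * np.linalg.norm(curr)
        if norm == 0:
            continue  # no change on one of the two trials
        cosine = np.clip(np.dot(prev, curr) / norm, -1.0, 1.0)
        angle = np.degrees(np.arccos(cosine))          # between 0 and 180 degrees
        b = min(int(angle // 30), n_bins - 1)
        counts[b] += 1
        hits[b] += successful[t - 1]
    return np.divide(hits, counts, out=np.full(n_bins, np.nan), where=counts > 0)
```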

Figure 6. Average participants' allocations and average predictions of the two learning models when simulating 50,000 agents. The figure shows a moving average of nine trials, such that for each trial the average of the present allocation and the preceding and succeeding four allocations are presented. (Note that for the first 4 trials, the moving average is determined by five to eight trials.) GLOS = global search model; LOCAD = local adaptation model. Solid diamonds represent Real Asset B; solid triangles represent Real Asset C; solid lines represent GLOS Asset B; hatched lines represent GLOS Asset C; open squares represent LOCAD Asset B; and open triangles represent LOCAD Asset C.


Summary of Study 2

Study 2 illustrates the robustness of the findings from Study 1. Although the payoff difference between the local and global payoff maximum was substantially increased, only a small proportion of participants were able to find the global maximum, whereas many participants got stuck at the local maximum. Such a result is consistent with the main learning mechanism of the LOCAD learning model, which better predicted the observed learning process for the allocation problem compared with the GLOS learning model.

Of course, one aspect of the payoff function that influences the difficulty with which the local or global payoff maxima can be detected is their location in the search space of possible allocations. The allocation corresponding to the local payoff maximum was located near the center of the search space, that is, near an allocation with an equal share invested in all three assets. In contrast, the allocation producing the global payoff maximum was located at the border of the search space, that is, an allocation with disproportional investments in the different assets. If people tend to start with evenly distributed investments in all three assets and if they follow a learning process as predicted by the LOCAD model, they should frequently get stuck at the local payoff maximum. In contrast, one could imagine a payoff function for which the positions of the allocations corresponding to the local and global payoff maxima were interchanged. For such a payoff function, the majority of participants would presumably find the global payoff maximum. However, such a function would not allow discrimination between the predictions of the two learning models and was therefore not used.

Figure 7. Individual characteristics of the decision process in Study 2. A: Percentage of allocations corresponding to the local or global payoff maximum across all trials (with a tolerated deviation of 5% from the allocations that lead to the global or local maximum), presented with a moving average of nine trials. B: Average payoff across all trials, presented with a moving average of nine trials. GLOS = global search model; LOCAD = local adaptation model.

In summary, the result that many participants got stuck at the local payoff maximum in both of our studies is a consequence of the payoff function used and can be predicted with the proposed LOCAD learning model. The generalization test of the learning models in Study 2 was more substantial than that in Study 1, because no parameter values were fitted to the data; instead, the models predicted behavior for an independent, different decision problem a priori.

Figure 8. Characteristics of the decision process in Study 2. A: Average magnitude of changes (step size) measured with the Euclidean distance between the allocations of successive trials (with possible values ranging from 0 to 141), presented with a moving average of nine trials. B: The angles between allocations' directions compared with the direction of preceding allocations were determined and categorized in six intervals. For each category, the percentage of successful preceding allocations (i.e., those leading to a higher payoff than the allocations before) is presented. GLOS = global search model; LOCAD = local adaptation model.

Discussion

Recently, several learning theories for decision-making problems have been proposed (e.g., Borgers & Sarin, 1997; Busemeyer & Myung, 1992; Camerer & Ho, 1999a, 1999b; Erev & Roth, 1998; Selten & Stocker, 1986; Stahl, 1996). Most of these learning theories build on the basic idea that people do not solve a problem from scratch but adapt their behavior on the basis of experience. The theories differ in the learning mechanism that people are assumed to apply, that is, in their assumptions about cognitive processes.

The reinforcement-learning model proposed by Erev and Roth (1998) and the experience-weighted attraction learning model proposed by Camerer and Ho (1999a, 1999b) in general belong to the class of global search models. These models assume that all possible decision alternatives can be assigned an overall evaluation. Whereas the evaluation for the reinforcement-learning model only depends on the experienced consequences of past decisions, the experience-weighted attraction model additionally can take hypothetical consequences and foregone payoffs into account. Both models make the assumption that people integrate their experience into an overall evaluation, and alternatives that are evaluated positively are more likely to be selected.
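To make the global search idea concrete, here is a minimal propensity-based reinforcement step in the spirit of Erev and Roth (1998). It illustrates the general principle for a small discrete set of alternatives and is not the GLOS model as specified for the allocation task; the decay parameter phi and the nonnegativity assumption are ours.

```python
import numpy as np

def reinforcement_step(propensities, choice, payoff, phi=0.1, rng=None):
    """One cycle of a simple propensity-based reinforcement learner over a
    small discrete set of alternatives: all propensities decay a little
    (forgetting), the chosen alternative is strengthened by the payoff it
    produced, and the next choice is drawn with probability proportional to
    the propensities, so that the entire choice history matters.
    Initial propensities are assumed to be positive."""
    rng = rng or np.random.default_rng()
    propensities = (1.0 - phi) * np.asarray(propensities, dtype=float)
    propensities[choice] += max(payoff, 0.0)   # assume nonnegative reinforcement
    probabilities = propensities / propensities.sum()
    next_choice = rng.choice(len(propensities), p=probabilities)
    return propensities, next_choice
```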

The other approach, local adaptation models, does not assume that people necessarily acquire a global representation of the consequences of the available decision alternatives through learning. Instead, the hill-climbing model by Busemeyer and Myung (1987) and the learning direction theory of Selten and Stocker (1986) assume that decisions are adapted locally, so that a preceding decision might be slightly modified according to its success or failure.

Busemeyer and Myung (1992) suggested that models in the global search class may be applicable to situations in which the decision alternatives form a small set of qualitatively different strategies, whereas models in the local adaptation class may be applicable in situations in which the decision alternatives form a continuous metric space of strategies. Global search models have been successfully applied to constant-sum games, in which there are only a small number of options. The purpose of this research was to examine learning processes in a resource allocation task, which provides a continuous metric space of strategies.

A new version of the global search model, called the GLOS model, and a new version of the local adaptation model, called the LOCAD model, were developed for this task. These two models were the best representations of the two classes that we constructed for the resource allocation task. The models were compared in two different studies. In the first study, the model parameters were estimated separately for each participant, and the model fits were compared with the individual data. In the second study, we used the estimated parameters from the first study to generate a priori predictions for a new payoff condition, and the predictions of the models were compared with the mean learning curves.

In both studies, the resource allocation task consisted of repeatedly allocating a capital resource to different financial assets. The task was difficult because the rates of return were unknown for two assets, the rates of return depended in a nonlinear manner on the amount invested in the assets, and the number of allocation alternatives was quite large. However, because any investment led to a deterministic return, it was always obvious which of two allocations performed better after the payoffs for these allocation alternatives were presented. Therefore, the essence of the task that the participants faced in both studies consisted of a search problem for a good allocation alternative. Given that the participants were provided with a large number of trials, finding the best possible allocation alternative was possible. However, it turned out that the majority of participants did not find the best possible allocation corresponding to the global payoff maximum but became distracted by the local payoff maximum. Nevertheless, a substantial learning process was observed: At the beginning of the task there was a tendency to allocate an equal proportion of the resource to all three assets with a slightly larger proportion invested in the asset that guaranteed a fixed return. These allocations led to relatively low average payoffs, which then increased substantially over the 100 trials through learning. This learning process can be characterized by substantial changes of allocations at the beginning of the task, which then declined substantially over time. The direction in which the allocations were changed depended strongly on the success of previous changes, characterizing a directional learning process.

These central findings correspond to the learning principles of the local adaptation model. Therefore, it is not surprising that the local adaptation model reached a better fit compared with the global search model in predicting individuals' allocations in both studies. In Study 1, when fitting both models to each individual separately, LOCAD reached a slightly better fit in describing the learning process. In Study 2, the a priori predicted average allocations by the LOCAD model (see Figure 6) properly described the observed average allocation across 100 trials, corresponding to a smaller MSE for LOCAD compared with GLOS. Given that in Study 2 the payoff function differed substantially from the payoff function of Study 1, these results provide strong empirical support for LOCAD.

The appropriateness of LOCAD to describe the learning process is also supported by individual characteristics of the process. In Study 1, the LOCAD model, compared with GLOS, more accurately predicted the magnitude with which successive allocations were changed. In contrast, the other three individual characteristics of the learning process are equally well described by the two models in Study 1. This result changes substantially when turning to Study 2; here the LOCAD model also more suitably described the development of payoffs and the development of the number of allocations corresponding to the local and global payoff maximum. Unexpectedly, in both studies, the association between the direction of allocations and the success of previous allocations was appropriately described by the LOCAD model as well as the GLOS model.

Why is it that the LOCAD model, compared with the GLOS model, better describes the learning process in the resource allocation task? Although the predictions of the two models can be similar with respect to specific aspects, the learning principles of the models are quite different. The learning principles of LOCAD seem to correspond more accurately to individuals' behavior for this task. According to LOCAD, starting with a specific allocation, new allocations are made in the same direction as the direction of the preceding successful allocation. Although this learning principle is very effective at improving allocations, it can lead to the result of missing the global maximum, as decisions have to be changed substantially to find the global maximum. Yet this result is exactly what was found in both studies. In contrast, the GLOS model eventually found the global payoff maximum, especially when experience gained at the beginning of a learning process was not given too strong a weight. In this case the GLOS model selected all different kinds of allocations and eventually at some point also selected allocations corresponding to the global payoff maximum, for which it then developed a preference. However, given that most participants did not find the global payoff maximum, when fitting the GLOS model to the data, parameter values were selected so that the model would not converge to the global payoff maximum. Yet, with these parameter values, the model also does not converge frequently to any allocation, so that it still does not predict the convergence to the local payoff maximum, which was found for most participants.
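The following sketch illustrates this local adaptation principle (continue in a direction that paid off, otherwise try a new direction with a smaller step); it is a simplified hill climber of our own construction, not the LOCAD model as parameterized in the article.

```python
import numpy as np

def local_adaptation_step(allocation, prev_direction, prev_improved,
                          step, shrink=0.9, rng=None):
    """allocation: current allocation in percentage points (summing to 100).
    If the preceding change improved the payoff, the same direction is used
    again; otherwise a fresh random direction is tried and the step size is
    reduced, so that changes become smaller over time."""
    rng = rng or np.random.default_rng()
    if prev_improved:
        direction = prev_direction
    else:
        direction = rng.normal(size=allocation.shape)
        direction -= direction.mean()            # keep the total resource constant
        direction /= np.linalg.norm(direction)
        step *= shrink
    new_allocation = np.clip(allocation + step * direction, 0.0, 100.0)
    new_allocation *= 100.0 / new_allocation.sum()   # renormalize to 100%
    return new_allocation, direction, step
```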

To what extent can the results of the present studies be generalized to different learning models? The two models that we implemented are the best examples of the two approaches of learning models we found. Both were supported by past research, and both were directly compared in previous theoretical analyses (see, e.g., Erev, 1998). More important, we also compared many variations of each model, although because of space limitations, we only present the results for the best version of each model. Nevertheless, our conclusions are supported in the sense that no variation of the GLOS model outperformed the LOCAD model that we present here, and instead, all the variations did worse than the GLOS model that we present here. Furthermore, because of the flexibility of the implemented models, that is, their free parameters, we doubt that slight modifications of the presented models would lead to substantially different results that would challenge our claim that the LOCAD learning model is better able to predict the learning process for the resource allocation problem.

To what extent did Study 2 provide a fair test of the two models? The answer, we argue, is more than fair. First, both types of learning models have been applied in previous theoretical analyses to resource allocation tasks similar to the one used in Study 1 (Erev & Gopher, 1999). Thus, there is no reason to claim that Study 1 does not provide a suitable test ground. In the second study, we simply increased the difference between the local and global maxima, which encouraged more participants to find the global maximum. This manipulation actually favors the GLOS model because the a priori tendency for the LOCAD model is to be attracted to the local maximum. Thus, the second study provided the best possible a priori chance for the GLOS model to outperform the LOCAD model in the generalization test.

To what extent can the results of the present studies be generalized to other decision problems? It should be emphasized that the current conclusions are restricted to the decision problem we considered. We expect that in similar decision problems that provide a large number of strategies that form a natural order, the LOCAD model would better describe the learning process. In such situations, people can form a hypothesis about the underlying causal structure of the decision process that enables a directed learning process. For example, when deciding how much to invest in a repeated public-good game, a local adaptation learning process might occur.

However, there are many situations for which global search learning models describe learning processes better. For example, there is a large amount of empirical evidence that global search models appropriately describe the learning process for constant-sum games with a small number of actions (Erev & Roth, 1998). In a constant-sum game, no possibility exists for the players to increase the mutual payoff by "cooperation." The prediction from game theory asserts that the different decision strategies (options) should be selected with a particular probability. In such a situation, there are only a small number of categorically different alternatives, making it difficult to apply a local adaptation model, because the set of alternatives provides no natural order to define directions for changes in strategies.

The present article demonstrates a rigorous test of two learning models representing two approaches in the recent learning literature. It also provides an illustration that learning often does not lead to optimal outcomes, as claimed, for example, by Simon (1990) or Selten (1991). Yet, people improve their decisions substantially through learning: For example, even when individuals start with a suboptimal decision of allocating an equal share to the different assets, they quickly change their decision by making allocations that produce higher payoffs. This learning process can be described by the local adaptation learning model, which is commonly characterized by high efficiency but can lead to suboptimal outcomes. For other domains, other learning mechanisms might govern behavior, and each learning model might have its own domain in which it works well. Identifying these domains is a promising enterprise.

References

Ball, C. T., Langholtz, H. J., Auble, J., & Sopchak, B. (1998). Resource-allocation strategies: A verbal protocol analysis. Organizational Behavior & Human Decision Processes, 76, 70–88.

Benartzi, S., & Thaler, R. H. (2001). Naive diversification strategies in defined contribution saving plans. American Economic Review, 91, 79–98.

Borgers, T., & Sarin, R. (1997). Learning through reinforcement and replicator dynamics. Journal of Economic Theory, 77, 1–14.

Brennan, M. J., Schwartz, E. S., & Lagnado, R. (1997). Strategic asset allocation. Journal of Economic Dynamics & Control, 21, 1377–1403.

Brown, G. W. (1951). Iterative solution of games by fictitious play. In T. C. Koopmans (Ed.), Activity analysis of production and allocation (pp. 374–376). New York: Wiley.

Busemeyer, J. R., & Myung, I. J. (1987). Resource allocation decision-making in an uncertain environment. Acta Psychologica, 66, 1–19.

Busemeyer, J. R., & Myung, I. J. (1992). An adaptive approach to human decision-making: Learning theory, decision theory, and human performance. Journal of Experimental Psychology: General, 121, 177–194.

Busemeyer, J. R., Swenson, K., & Lazarte, A. (1986). An adaptive approach to resource allocation. Organizational Behavior & Human Decision Processes, 38, 318–341.

Busemeyer, J. R., & Wang, Y.-M. (2000). Model comparisons and model selections based on generalization criterion methodology. Journal of Mathematical Psychology, 44, 171–189.

Bush, R. R., & Mosteller, F. (1955). Stochastic models for learning. New York: Wiley.

Camerer, C., & Ho, T.-H. (1999a). Experience-weighted attraction learning in games: Estimates from weak-link games. In D. V. Budescu & I. Erev (Eds.), Games and human behavior: Essays in honor of Amnon Rapoport (pp. 31–51). Mahwah, NJ: Erlbaum.

Camerer, C., & Ho, T.-H. (1999b). Experience-weighted attraction learning in normal form games. Econometrica, 67, 827–874.


Cheung, Y.-W., & Friedman, D. (1997). Individual learning in normal form games: Some laboratory results. Games & Economic Behavior, 19, 46–76.

Dorfman, D. D., Saslow, C. F., & Simpson, J. C. (1975). Learning models for a continuum of sensory states reexamined. Journal of Mathematical Psychology, 12, 178–211.

Erev, I. (1998). Signal detection by human observers: A cutoff reinforcement-learning model of categorization decisions under uncertainty. Psychological Review, 105, 280–298.

Erev, I., & Gopher, D. (1999). A cognitive game-theoretic analysis of attention strategies, ability, and incentives. In D. Gopher & A. Koriat (Eds.), Attention and performance XVII: Cognitive regulation of performance. Interaction of theory and application (pp. 343–371). Cambridge, MA: MIT Press.

Erev, I., & Roth, A. E. (1998). Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review, 88, 848–881.

Estes, W. K. (1950). Toward a statistical theory of learning. Psychological Review, 57, 94–107.

Fudenberg, D., & Levine, D. K. (1995). Consistency and cautious fictitious play. Journal of Economic Dynamics & Control, 19, 1065–1089.

Gingrich, G., & Soli, S. D. (1984). Subjective evaluation and allocation of resources in routine decision-making. Organizational Behavior & Human Decision Processes, 33, 187–203.

Harley, C. B. (1981). Learning the evolutionary stable strategy. Journal of Theoretical Biology, 89, 611–633.

Langholtz, H. J., Ball, C., Sopchak, B., & Auble, J. (1997). Resource-allocation behavior in complex but commonplace tasks. Organizational Behavior & Human Decision Processes, 70, 249–266.

Langholtz, H., Gettys, C., & Foote, B. (1993). Resource-allocation behavior under certainty, risk, and uncertainty. Organizational Behavior & Human Decision Processes, 54, 203–224.

Langholtz, H., Gettys, C., & Foote, B. (1994). Allocating resources over time in benign and harsh environments. Organizational Behavior & Human Decision Processes, 58, 28–50.

Langholtz, H., Gettys, C., & Foote, B. (1995). Are resource fluctuations anticipated in resource allocation tasks? Organizational Behavior & Human Decision Processes, 64, 274–282.

Luce, R. D. (1959). Individual choice behavior. New York: Wiley.

Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. Computer Journal, 7, 308–313.

Northcraft, G. B., & Neale, M. A. (1986). Opportunity costs and the framing of resource allocation decisions. Organizational Behavior & Human Decision Processes, 37, 348–356.

Roth, A. E., & Erev, I. (1995). Learning in extensive-form games: Experimental data and simple dynamic models in the intermediate term. Games & Economic Behavior, 8, 164–212.

Russel, S. J., & Norvig, P. (1995). Artificial intelligence. Englewood Cliffs, NJ: Prentice Hall.

Selten, R. (1991). Evolution, learning, and economic behavior. Games & Economic Behavior, 3, 3–24.

Selten, R. (1998). Axiomatic characterization of the quadratic scoring rules. Experimental Economics, 1, 43–62.

Selten, R., & Stocker, R. (1986). End behavior in sequences of finite prisoner's dilemma supergames: A learning theory approach. Journal of Economic Behavior & Organization, 7, 47–70.

Simon, H. A. (1990). Invariants of human behavior. Annual Review of Psychology, 41, 1–19.

Smith, V. L. (1982). Microeconomic systems as an experimental science. American Economic Review, 72, 923–955.

Stahl, D. O. (1996). Boundedly rational rule learning in a guessing game. Games & Economic Behavior, 16, 303–330.

Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

Thomas, E. A. C. (1973). On a class of additive learning models: Error correcting and probability matching. Journal of Mathematical Psychology, 10, 241–264.

Appendix

Payoff Functions Used in Study 1 and Study 2

In Study 1 the payoff functions were defined as follows: The first allocation asset produced a fixed rate of return of 10%, with the payoff function $u_A(p_A) = 0.1\,p_A R$, where $p_A \in [0, 1]$ is the proportion of the resource $R$ invested in Asset A. For the other two allocation assets, the rate of return varied with the amount invested in the asset. For Asset B, the payoff function was defined as $u_B(p_B, p_A) = 10 - 0.1\,p_A R + 40\,[\sin(3.2\pi(p_B - 0.781) - 9)/(3.2\pi(p_B - 0.781) - 9)]$, with $p_B, p_A \in [0, 1]$. For Asset C, the payoff function was defined as $u_C(p_C) = 5 + 4R\,[\sin(1.1\pi(p_C - 0.781) - 24.6)/(1.1\pi(p_C - 0.781) - 24.6)]$, with $p_C \in [0, 1]$.

In Study 2 the payoff functions were defined as follows: The payoff function for Asset A was identical to the one used in Study 1. For Asset B the payoff function was defined as $u_B(p_B, p_A) = 6 - 0.2\,p_A R + 80\,[\sin(3.2\pi(p_B - 0.781) - 9)/(3.2\pi(p_B - 0.781) - 9)]$, with $p_B, p_A \in [0, 1]$, and for the third asset, Asset C, the payoff function was defined as $u_C(p_C) = -4 + 8R\,[\sin(1.1\pi(p_C - 0.781) - 24.6)/(1.1\pi(p_C - 0.781) - 24.6)]$, with $p_C \in [0, 1]$.
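As a check on the reconstruction above, the following sketch implements the Study 2 payoff functions in Python, assuming the total resource R equals 100 (the value is not stated in this excerpt but is consistent with the payoffs reported in the text). A coarse grid search recovers payoffs close to the reported best ($41.15) and worst (–$34.55) values, and the reported local-maximum allocation evaluates to about $32.48; note that the asset label (B or C) each function carried was counterbalanced across participants.

```python
import numpy as np

R = 100.0  # assumed total resource; consistent with the payoffs reported in the text

def sinc_term(x):
    return np.sin(x) / x

def u_A(p_A):
    return 0.1 * p_A * R

def u_B(p_B, p_A):   # Study 2 payoff function for Asset B
    return 6 - 0.2 * p_A * R + 80 * sinc_term(3.2 * np.pi * (p_B - 0.781) - 9)

def u_C(p_C):        # Study 2 payoff function for Asset C
    return -4 + 8 * R * sinc_term(1.1 * np.pi * (p_C - 0.781) - 24.6)

def total_payoff(p_A, p_B, p_C):
    return u_A(p_A) + u_B(p_B, p_A) + u_C(p_C)

# Grid search over all feasible allocations in 1% steps.
best, worst = -np.inf, np.inf
for b in range(101):
    for c in range(101 - b):
        a = 100 - b - c
        payoff = total_payoff(a / 100, b / 100, c / 100)
        best, worst = max(best, payoff), min(worst, payoff)

print(f"best payoff on the grid:  {best:.2f}")    # close to the reported $41.15
print(f"worst payoff on the grid: {worst:.2f}")   # close to the reported -$34.55
print(f"reported local maximum:   {total_payoff(0.50, 0.29, 0.21):.2f}")  # about $32.48
```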

Received October 22, 2002
Revision received March 26, 2003
Accepted March 30, 2003
