Plasticity and language: an example of the Baldwin eect

21
Plasticity and language: an example of the Baldwin effect? Kevin Zollman Department of Philosophy Baker Hall 135 Carnegie Mellon University Pittsburgh, PA 15213-3890 Rory Smead * Department of Logic and Philosophy 3151 Social Science Plaza A University of California Irvine, CA 92697 October 20, 2009 Abstract In recent years, many scholars have suggested that the Baldwin effect may play an important role in the evolution of language. How- ever, the Baldwin effect is a multifaceted and controversial process and the assessment of its connection with language is difficult without a formal model. This paper provides a first step in this direction. We examine a game-theoretic model of the interaction between plastic- ity (represented by Herrnstein reinforcement learning) and evolution in the context of a simple language game. Additionally, we describe three distinct aspects of the Baldwin effect: the Simpson-Baldwin ef- fect, the Baldwin expediting effect and the Baldwin optimizing effect. We find that a simple model of the evolution of language lends the- oretical plausibility to the existence of the Simpson-Baldwin and the Baldwin optimizing effects in this arena, but not the Baldwin expe- diting effect. Humans and many other animals are extremely adaptive. Individuals are capable of changing their behavior in response to a wide variety of different * The authors would like to thank Brian Skyrms, Bill Harms, Kyle Stanford, Brad Armendt, and Santiago Amaya for helpful suggestions. 1

Transcript of Plasticity and language: an example of the Baldwin eect

Plasticity and language: an example of theBaldwin effect?

Kevin ZollmanDepartment of Philosophy

Baker Hall 135

Carnegie Mellon University

Pittsburgh, PA 15213-3890

Rory Smead∗

Department of Logic and Philosophy

3151 Social Science Plaza A

University of California

Irvine, CA 92697

October 20, 2009

Abstract

In recent years, many scholars have suggested that the Baldwineffect may play an important role in the evolution of language. How-ever, the Baldwin effect is a multifaceted and controversial processand the assessment of its connection with language is difficult withouta formal model. This paper provides a first step in this direction. Weexamine a game-theoretic model of the interaction between plastic-ity (represented by Herrnstein reinforcement learning) and evolutionin the context of a simple language game. Additionally, we describethree distinct aspects of the Baldwin effect: the Simpson-Baldwin ef-fect, the Baldwin expediting effect and the Baldwin optimizing effect.We find that a simple model of the evolution of language lends the-oretical plausibility to the existence of the Simpson-Baldwin and theBaldwin optimizing effects in this arena, but not the Baldwin expe-diting effect.

Humans and many other animals are extremely adaptive. Individuals arecapable of changing their behavior in response to a wide variety of different

∗The authors would like to thank Brian Skyrms, Bill Harms, Kyle Stanford, BradArmendt, and Santiago Amaya for helpful suggestions.

1

environmental circumstances. This phenotypic plasticity has undoubtedlybeen shaped by natural selection, and its presence has potentially influencedthe evolution of other less plastic traits. The relationship between natu-ral selection, adaptation, and phenotypic plasticity have been the subjectof significant discussion prompted by a multifaceted process proposed inde-pendently in the late 19th century by Lloyd Morgan, J. Mark Baldwin andFairfield Osborne called the “Baldwin effect.” The Baldwin effect is an in-teraction between phenotypic plasticity and selection by which the presenceof plasticity can alter the course of evolution and acquired characters canbecome genetically assimilated.

The discussion of the different aspects of the Baldwin effect has primarilyfocused on the theoretical plausibility of these components. Although farfrom decided, there has been significant negative criticism which regardsthe effect as unimportant, unlikely, or irrelevant. Some recent models havesuggested that at least parts of the Baldwin effect may be more commonthan previously presumed (Ancel, 1999; Smead and Zollman, 2008).

Many studies of the Baldwin effect have focused on individual creaturesadapting to a changing external environment. The changing environmentcreates a circumstance where there is some advantage to plasticity, sinceflexible individuals can respond to a new environment better than inflexibleones. One type of environmental variability is frequency dependent selection:where a phenotype’s fitness depends on the composition of other phenotypesin the population. Situations where an animal’s fitness is influenced by theoutcomes of interactions with other animals exemplify frequency dependentselection, and these situations are often modeled using techniques in gametheory. In another paper (Smead and Zollman, 2008), we model the effect ofnatural selection on plastic individuals using game theoretic techniques.

Because of its axiomatic approach, our earlier model considered gamesand phenotypes at a coarse level of detail. We here take the other tack byconsidering one particular game and plastic phenotype. This allows us toinvestigate different aspects of the Baldwin effect in detail.1 The game weanalyze, the Lewis signaling game, has been used extensively to model theinitial emergence of language (Barrett, 2008; Huttegger, 2007; Skyrms, 1996,2008; Zollman, 2005). This coincides with one of the most common situationswhere the Baldwin effect is thought to have occurred (e.g. Deacon, 1997;

1This investigation is also useful because the learning rule we discuss here falls outsidethe scope of those considered in (Smead and Zollman, 2008).

2

Dennett, 1991, 1995; Pinker, 2000) and where it is most plausible (Godfrey-Smith, 2003).

Along these lines, Dor and Jablonka (2000) offer a framework for explain-ing the evolution of language. They describe a complicated and dynamicinteraction between learning, genetic assimilation, changing environments,and culture. This framework involves, among other things, “culturally-drivengenetic assimilation” by means of an underlying Baldwin effect. Exploringthe Baldwin effect in the context of the simple Lewis signaling game willallow us to form an initial assessment of the possible role(s) of the Baldwineffect in the evolution of signaling and language. Along the way, we find thatthis simple model allows us to separate three distinct aspects of the Baldwineffect.

The first of these distinct aspects is the Simpson-Baldwin effect, namedfor Simpson (1953) by Ancel (1999). The Simpson-Baldwin effect focuses onthe effect of evolution on plasticity and predicts that, under certain condi-tions, initially acquired traits will become genetically assimilated as plasticityis first selected for and then later selected against.2 The second aspect, theBaldwin expediting effect (also named by Ancel) focuses on the effect of plas-ticity on the speed of evolution. Here it is predicted that the presence ofplastic individuals will result in the emergence of optimal behavior soonerthan would be expected without plasticity. Discussion of both of these twoaspects of the effect often have focused on single peaked fitness landscapeswhere there is a unique globally optimal phenotype and no other locally op-timal ones. When the possibility of locally-optimal, but globally sub-optimalphenotypes is introduced, another aspect of the Baldwin effect comes into fo-cus. Mills and Watson (2006) demonstrate that the Baldwin effect can allowevolution to “cross fitness valleys” and that the canalization of the Simpson-Baldwin effect is not required to do this. This suggests that the presence ofplastic individuals alters the trajectory of evolution by directing the popula-tion away from sub-optimal equilibria and toward the global optimum. Thislast aspect we name the Baldwin optimizing effect.

We find that in the model of language we consider, both the Simpson-Baldwin and Baldwin optimizing effect occur, but the Baldwin expeditingdoes not. This separation of the different aspects coincides with a verydifferent set of models (Ancel, 1999, 2000) in which the Simpson-Baldwin

2This aspect has also been explored by Mayley (1996) and genetic assimilation ofacquired characters is the focus of Waddington (1953).

3

effect is found more plausible than the Baldwin expediting effect.

1 The model

While human language has many complicated components, one that hasreceived much recent attention is the arbitrary assignment of signs to ob-jects. This conventionality of language presents an interesting evolutionarycircumstance, since there are multiple solutions which appear equally goodand many other solutions which are less than optimal. This central featureof animal (and human) signaling is captured in a game described first byDavid Lewis (1969).

1.1 The Lewis signaling game

In the Lewis signaling game one person (the sender) is given some informationabout the world (the state). He has at his disposal a set of signals that hecan send to another player: the receiver. The receiver observes the signal,but not the state, and chooses from a set of actions. There is a single actionwhich is appropriate for a given state. If the receiver takes the appropriateaction, both the sender and receiver obtain a reward, otherwise they receivenothing.

In such games it is obviously in both the sender’s and receiver’s interestto coordinate on a conventional association between states, signals, and acts.In games where there are an equal number of each, any one-to-one map ofstates to signals and signals to acts which results in the correct act in eachstate is a Nash equilibrium of the underlying game.

These optimal strategies, dubbed signaling systems, represent one poten-tial end point for the process of evolution. However, there are other Nashequilibria that are less than optimal. For instance, if the sender always sendsthe same signal in every state, the receiver’s best response is to take the ac-tion which is most often correct regardless of signal. If the receiver is doingthis, all potential sender strategies are equally good. This represents anotherpossible Nash equilibrium known as a pooling equilibrium.

When there are more than two states, signals, and acts, there are alsoequilibria which lie in between total pooling equilibria and signaling systems.So-called partial pooling equilibria involve the successful communication of

4

some states, and the pooling of others. These equilibria too are potentialend points of the evolutionary process.

Much recent work has been done to determine when these suboptimalstates will arise in models of evolution and learning. It is clear that theywill sometimes arise and that they represent a potential roadblock to thesuccessful evolution of optimal languages (see Huttegger and Zollman 2009for an overview).

1.2 Replicator dynamics

One of the simplest models of the evolutionary process is the replicator dy-namics (Taylor and Jonker, 1978). This set of differential equations has beenused extensively in game theory to model the evolution of strategies undernatural selection or similar circumstances and has been used to analyze theevolution of populations playing the Lewis signaling game (Skyrms, 1996;Huttegger, 2007; Huttegger et al., 2007; Pawlowitsch, 2008).

The continuous time replicator dynamics captures the idea that a strategywhich does better than other alternatives will grow while those that do worseshrink.3 The dynamics assumes that the population is effectively infinite andthat individuals interact with others in the population at random.

In the Lewis signaling game it has been shown that from a random initialstarting point, the replicator dynamics can converge to sub-optimal signal-ing. In fact, only in the case of two states, two signals, and two acts, wherethe states are equiprobable, is optimal signaling guaranteed to evolve (Hut-tegger, 2007). All other cases involve some possibility (perhaps small) thatpopulations will become stuck in sub-optimal equilibria (Huttegger, 2007;Huttegger et al., 2007; Pawlowitsch, 2008).

1.3 Herrnstein reinforcement learning

The models using the replicator dynamics suppose a collection of phenotypeswhich represent strategies in the signaling game. These strategies are notadaptive; they do not change in response to the strategy they are interactingwith nor do they change in response to states of the population. However,some recent attention has been given to a particular adaptive strategy, Her-rnstein reinforcement learning.

3The replicator dynamics is just one of a class of dynamics that captures this idea. Theclass is known as payoff-monotone dynamics.

5

Implementing Herrnstein’s (1970) matching law, Herrnstein reinforce-ment learning postulates that the probability an individual acts in a certainway is proportional to the past rewards for that action. At each stage anindividual has weights representing the past rewards for a given action. Anindividual determines the probability for each action by dividing the rewardsfrom that action by the total rewards for all actions. The individual thenchooses an action at random using these probabilities.

It has recently been proven that, like replicator dynamics, Herrnsteinreinforcement learning will converge to optimal behavior in two state, twosignal, two act signaling games with equiprobable states (Argiento et al.,2009). Simulation results suggest, however, that it can converge to sub-optimal (partial) pooling equilibria in other circumstances (Barrett, 2006;Skyrms, 2008).4

These models have primarily considered two individuals repeatedly inter-acting with one another, where both individuals are updating behavior viaHerrnstein reinforcement learning. They do not consider populations of play-ers, nor the possibility that some individuals might be implementing fixedstrategies.

Our model will do just this. We will begin with a population composed ofindividuals who play one of the available pure strategies (contingency plansin the signaling game) and one type that implements Herrnstein reinforce-ment learning. Each generation, individuals will be paired to play the gameagainst one another for one thousand iterations of the game. During this timethe pure strategy types will unreflectively implement their strategy, and theHerrnstein reinforcement type will attempt to adapt to her opponent. Afterthe iterations of game play, the population will evolve according to a discrete-time version of the replicator dynamic with respect to their type (either usinga pure strategy or Herrnstein reinforcement learning).5 The dynamics dictatethat if the frequency of reinforcement learners in the population will change

4These models differ slightly from ours in that they consider so-called “act-based”learning where individuals treat each state or signal as a learning situation distinct fromother states or signals. In that way individuals are not learning total contingency plansfor every possible state or every possible signal, but only particular moves. Our use ofHerrnstein reinforcement presumes that individuals are learning on entire contingencyplans.

5The discrete time replicator dynamics is more conducive to simulations than the re-lated continuous time dynamics. It should be noted, however, that there are some differ-ences between the two dynamics which may be significant.

6

proportionally to the ratio of their performance to the population average.Likewise for each non-plastic strategy.

This model will provide us an opportunity to consider the various aspectsof the Baldwin effect using a concrete model of language evolution. Wewill begin with the Simpson-Baldwin effect before turning to the Baldwinexpediting and optimizing effects.

2 The Simpson-Baldwin effect

The Simpson-Baldwin effect focuses primarily on natural selection’s influenceon the evolution of plastic behavior. This version of the effect claims that oneshould expect, after environmental change, that plastic individuals will firstarise and then later be reduced in a population as an acquired trait becomesgenetically assimilated. When the environment changes in a relevant way,plastic individuals who are capable of responding to the new environment inbetter ways can out-compete non-plastic individuals adapted to the previousenvironment. As a result, plastic individuals are superior to the ancestraltypes and will be selected for. However, if a type arises later who determin-istically engages in the appropriate behavior it will likely be superior, andthus out-compete the (now ancestral) plastic types. By means of a processlike this, a trait that is initially learned or acquired in a population becomesgenetically canalized.

There are a number of concerns about this story; we will discuss twohere. The first focuses on the comparison between plastic types and thebehaviorally equivalent non-plastic types. Why should it be the case thatnon-plastic types are ceteris paribus superior to plastic types? It is oftensupposed that plasticity carries with it a cost, either an exogenous cost ofmaintaining or developing the mechanism required for plasticity or an endoge-nous cost generated by delay or failure to implement the optimal strategy.Certainly this is the case with humans; our plasticity comes at great cost.

The second, and perhaps more serious concern focuses on the order of in-vasions. Godfrey-Smith (2003) points out that if the deterministic behavioroccurs before the plastic behavior, it will invade and one will see no growthof plasticity. Why should it be, he asks, that we ought to expect to findplasticity arising first? After considering a few answers, he suggests thatthe only plausible situation is one where plasticity creates the circumstancewhere the later deterministic types are superior. He cites the evolution of

7

language as an example. In the ancestral population no one is communi-cating, thus those who have a genetic predilection for language will do nobetter. Plastic individuals capable of acquiring linguistic abilities, however,maybe able to arise, learn to communicate with each other, and then takeover the population. Once they have done so, their presence makes thosewith canalized linguistic abilities superior, and thus a population of plasticindividuals can be invaded.

While the evolution of language represents one circumstance where thepresence of plastic individuals can change the fitness of other types, it is onlyone example of a more general phenomenon known as frequency dependentselection. Godfrey-Smith’s suggestion applies equally well to all these cases,and so we might be interested in inquiring about frequency dependent selec-tion. We will begin by summarizing the results of our earlier paper (Smeadand Zollman, 2008) which reveal Simpson-Baldwin-like phenomena, and thenturn to Lewis signaling games (as a model of language). In all these cases,it appears that such effects may be common.

2.1 Plasticity and Evolution

In another paper (Smead and Zollman, 2008), we model the effect of selectionon behavioral plasticity (or learning) in the context of social interaction. Inthese models, a large population of agents are randomly paired to play arepeated game with one another. The agents are either hard-wired to playa fixed strategy of the game or use an adaptive strategy. An agent using theadaptive strategy learns to play a best response to the behavior of her co-player in the repeated game. The specific method by which these individualslearn is not specified in these models, only the outcome or aim of the learningprocess. Two different ways of imposing costs are considered. The first fitswith the energetic or developmental aspects of learning where the costs areoutside of the interactions (exogenous cost). The other captures the idea thatcosts arise as a result of errors in the learning process (endogenous cost).

When costs are imposed only on the learners exogenously, the prospectsfor evolving adaptive individuals in this context are dim. Learning is rarelyevolutionarily stable, and only in a restricted subset of games can learnersco-exist with non-learners in stable populations. In Lewis signaling games,these results entail that when there is exogenous cost (no matter how small),learning will always be eliminated by evolution.

This demonstrates the second half of the Simpson-Baldwin effect, that

8

Cooperate DefectCooperate 2, 2 0, 3

Defect 3, 0 1, 1

Figure 1: Prisoner’s Dilemma

Figure 2: The Prisoner’s Dilemma with Plasticity

9

plasticity is invaded by a canalized type. We also find the first component,that plasticity invades other populations. Consider the prisoner’s dilemma(see figure 1). In this game there are two canalized types, cooperators anddefectors. We also add a plastic type: learners. Figure 2 shows the replicatordynamics this situation. The evolutionary paths that begin in populationslargely composed of cooperators are initially invaded by learners (who learnto defect in the game) before the learners are eliminated in favor of non-learning defectors.

The connection between the Simpson-Baldwin effect and this result mustbe explained in more detail. The Simpson-Baldwin effect is usually describedas the genetic canalization of novel traits and the trait that is canalized inthis model was already present in some non-plastic individuals. Modelinggenuine novelty is difficult in the framework of evolutionary game theory.However, there is a sense in which the phenomena seen these models are ageneric version of the Simpson-Baldwin effect, where the emergence of noveltraits is a special case. The Simpson-Baldwin effect with respect to noveltycan be understood as a population which is evolving on or near the edgeof the simplex. Using the Prisoner’s Dilemma example, we can imaginea population composed of all Cooperate and a few plastic individuals. Theplastic individuals will learn to play Defect, which is a novel behavior for thisparticular population. Once learners take over the population, a mutationthe direction of Defect will cause the population to proceed to be takenover by individuals who are non-plastic and always play Defect.6 Thus, theSimpson-Baldwin effect with respect to a novel behavior is a special case ofthe phenomena described in our previous work.7

What if there are no exogenous costs and the only costs come from errorsin learning? In this case, we modeled a “cost” as a deviation from the bestresponse by the learners.8

In a highly restricted setting (focusing only on 2 × 2 games), we find

6Here, it is not specified how the trait becomes canalized. This may be important fordetermining the plausibility of some particular evolutionary path, but is beyond the scopeof this paper.

7The fact that there are only two possible behaviors in this particular examples canmake the interpretation of “novelty” seem artificial. However, similar interpretations andexamples could be given for much larger games with many more possible actions.

8In this case, a “cost” could actually be beneficial, if deviations somehow resulted inhigher payoffs accidentally. The cost is quantified here by looking at the proportion of playsthat deviate from the “normal” best-response behavior of the learners and calculating thepayoff difference between the normal behavior and the cases where deviation occurs.

10

that it is unlikely for plastic individuals to evolve. To make analytic resultstractable in this case, it was assumed that the probability of deviation fromthe normal behavior is fixed regardless of opponent. Neither this assumptionnor the restriction to 2×2 games applies to Herrnstein reinforcement learningin signaling games. A signaling game cannot be represented as a 2× 2 game,and in the short run, the rate of exploration or deviation by Herrnsteinreinforcement learners depends heavily on the behavior of the co-player. Insome games, this means that a population of reinforcement learners can resistinvasion from pure strategy players.

To highlight this second point, consider a population of individuals pairedto repeatedly play the Prisoner’s Dilemma. The types of individuals are co-operators, defectors and Herrnstein reinforcement learners. Consider thebehavior of learners in the short run against each of the three types. Even-tually, learners will be driven to play Defect regardless of opponent, since itis strictly dominant. But in the short run learners will play Cooperate withone another and achieve a higher payoff than Defect against Defect.9 Withonly endogenous costs (conceived as generated by exploration), reinforce-ment learning can be evolutionarily stable. Figure 3 shows the evolutionarydynamics in this case.

With endogenous costs to learners, there are settings (such as the Pris-oner’s Dilemma) where we should not expect a reduction in strategic plastic-ity, but rather that Herrnstein reinforcement learning can evolve and domi-nate the population. This consideration leaves open the possibility that wemay not see a Simpson-Baldwin effect in signaling with respect to reinforce-ment learners.

In a very different model Suzuki and Arita (2004) consider a repeatedPrisoner’s Dilemma which includes plastic strategies. They also observe aSimpson-Baldwin effect, where plasticity is selected for and then later re-duced. In their model, however, plasticity is not eliminated and most in-dividuals are partially plastic. This later possibility is not included in ourmodels, individuals are either plastic or not and so as a result represents arather different way of modeling plasticity.

9We should expect Herrnstein reinforcement learners to take longer to play Defectregularly against cooperators than against defectors, and likewise longer against otherreinforcement learners than defectors. Hence, the “error rate” of reinforcement learnersis dependent on the strategy being used by the co-player, and thus this example is notsubject to the theorems regarding endogenous cost in (Smead and Zollman, 2008).

11

Figure 3: The Prisoner’s Dilemma with Plasticity

12

Figure 4: Simpson effect

2.2 Signaling games

Here we consider Herrnstein reinforcement learning as a type in a popula-tion of other types playing a signaling game. Two individuals are paired torepeatedly play a signaling game with one another and they will receive theaverage payoff for their interaction.10 There is no exogenous cost to learning,but Herrnstein reinforcement learning will make errors. Since this is a part-nership game (both player receive the same payoff) errors are costly–learnerscannot gain from them.

Figure 4 shows the average proportion of learners over time for 10,000separate starting points of a two state, two signal, two act signaling game withequiprobable states. Here we see a distinct Simpson-Baldwin effect. Initiallythe Herrnstein reinforcement learners do well, increasing in the population.

10This average payoff was determined by running 10,000 independent runs of Herrnsteinreinforcement against each pure type and constructing an expanded game using theseestimated payoffs.

13

Learners do well against all types, they learn to signaling effectively withthose that signal and they do as well as they can against those that donot. However, once there is a significant number signalers (both learnersand pure strategy signalers) in the population, those that deterministicallyplay one signaling system will eventually take over. The reason is that,when interacting with a learner, it is better for an individual to have afixed signaling strategy and allow the learner to adapt than it is for bothindividuals to try and adapt simultaneously. Two learners take longer tofind successful signaling than one learner and one pure-strategy signaler.

At a behavioral level, a population with a significant proportion of learn-ers will acquire the ability to communicate by learning. Once pure strategysignaling types begin to invade, the ability to effectively communicate be-comes innate. The end result of the evolutionary process is a populationthat uses a single signaling system without aid of learning. Which signalingsystem invades is a matter of chance and it is determined by the initial pro-portions which are determined at random. But eventually one or the otherwins out.

Our caution about novelty occurs here again. There are signaling systemtypes present in most of these populations, which means that the signalersare not using a novel strategy. But, as in the case of the Prisoner’s Dilemma,we find a more general Simpson-Baldwin effect where learners are selectedfor and then selected against even in populations with both deterministicand plastic types.

3 Baldwin expediting and optimizing effect

Another aspect of the Baldwin effect focuses on how the presence of plasticphenotypes may alter the course of evolution by “speeding up” or “expedit-ing” the evolutionary process. Here it is conjectured that the presence ofplastic individuals will result in the attainment of optimal phenotypes morequickly than would have occurred without the presence of plastic types in thepopulation (the expediting effect). More radically, some have suggested thatthe presence of plastic individuals helps populations to find optimal behav-iors that would not have been found without their presence (the optimizingeffect).

Most studies have focused on the expediting rather than the more-radicaloptimizing effect. Mills and Watson (2006) provide one discussion on the

14

ability of the Baldwin effect to cross fitness valleys. Daniel Dennett alsoseems to come close to discussion of the optimizing effect, but he appearsto vacillate between the two.11 Other examples discussed in the literaturetend to focus on fitness landscapes with single peaks, that is situations wherethere is a unique globally optimal behavior and no locally optimal behaviorsthat are not globally optimal (Ancel, 2000; Dennett, 1991, 1995; Hinton andNowlan, 1987).12

But, this is not the case in generalized signaling games. Most signalinggames have stable local optima which are not globally optimal. So, we mustbe careful to distinguish these effects. The Baldwin expediting effect (inisolation) would require that the ultimate basins of attraction of differentend states would remain the same irrespective of the presence of a learningphenotype, but the system would converge to these states more quickly withlearning than without. The Baldwin optimizing effect on the other handwould entail that the size of the basins of attraction would be altered by thepresence of learning as well.

First, we will consider the Baldwin expediting effect. To remaining consis-tent with the previous literature on this subject we will consider a two state,two signal, two act signaling game with equiprobable states. This game hastwo global optima and no stable local optima (Huttegger, 2007). Figure 5shows the results of simulations comparing the evolution of populations withand without learning respectively. The y-axis represents the average payoffof the population. While the population with learners starts out closer tooptimal behavior (by mere fact that there are more types who behave opti-mally) we see that the slope of the line for learners is more shallow whichrepresents slower evolution to the optimum.

This suggests that, in the context of signaling games, the Baldwin expe-diting effect is not present. Learners do not assist the population in converg-ing to optimal behavior more quickly. Here, however, concerns about nov-elty again appear. Those who describe the Baldwin expediting effect suggestthat the presence of plastic individuals help to find phenotypes which are not

11For instance, in his (1991) he says, “If it weren’t for the plasticity, however, theeffect wouldn’t be there” (186). But in his (1995) he weakens this claim, “...their specieswill evolve faster because of its greater capacity to discover design improvements in theneighborhood” (79, emphasis in original).

12Suzuki and Arita (2004) are an exception. They consider strategies in a repeatedPrisoner’s Dilemma which include plasticity. Although they do not single it out, theyobserve a Baldwin optimizing effect in their model as well.

15

Figure 5: Expediting effect

16

Figure 6: Optimizing Effect

present in the ancestral population. In these simulations, we choose an initialpopulation state at random which creates a population with some optimaltypes, so no novel behavior is generated. As a result, the negative result hereshould not be generalized to conclusions about the Baldwin expediting effectelsewhere.13

The Baldwin optimizing effect does occur in signaling games with sub-optimal equilibria. Figure 6 shows an example of a two state, two signal,two act signaling game with non-equiprobable states (the probabilities are0.8 and 0.2 respectively). This game has local optima which are not globallyoptimal as described in section 1.1. The y-axis in this figure represents

13Our informal analysis of novelty presented above does not answer the question here,since we would need to measure directly the convergence speed in different models. Thisis beyond the ability of the replicator or replicator-mutator dynamics, and would insteadrequire a finite population model. Models such as the Moran process (Moran, 1962) mightbe appropriate for this task. Such a model would conform to our informal discussion ofnovelty, but space prevents us from discussing it here.

17

the average proportion of the population which contains optimal strategies.Optimal strategies are present more often after a few generations and poolingequilibria are avoided. The learners are eventually eliminated, but their earlypresence has altered the evolutionary trajectories of populations that wouldhave otherwise evolved toward the pooling equilibria.

Perhaps the most surprising feature of this result is that learners them-selves are not guaranteed to converge to optimal behavior. Two individualsboth employing Herrnstein reinforcement learning in this game will oftenconverge to the sub-optimal total pooling equilibria. However, in the limit,a Herrnstein reinforcement learner playing against a fixed signaling systemstrategy will learn the appropriate signaling system. This latter feature ap-pears to be sufficient to move populations inside the basin of attraction ofsignaling systems before the learners are eliminated.

4 Conclusion

Several previous mathematical and simulation models of the Baldwin effecthave provided a variety of answers regarding the plausibility of the effectin different circumstances (Ancel, 1999, 2000; Hinton and Nowlan, 1987).These models, along with much of the discussion of the Baldwin effect, haveprimarily focused on a changing external environment, on adaptive problemswhich only require adaptation to the external environment, and on a fitnesslandscape without sub-optimal equilibria. We have analyzed a circumstancethat differs in these three respects. Our model has a stable game, but thefitness of different types changes according to changes in the population;there is no strategy which is optimal regardless of the strategies used byothers in the population. In addition, there are equilibria which are notglobally optimal.

We have focused on a model of the evolution of language already wellstudied by philosophers. Our aim was to evaluate the plausibility of thedifferent aspects of the Baldwin effect in this domain. Ultimately we concludethat some aspects are very likely.

The Simpson-Baldwin effect appears to be present in many models of fre-quency dependent selection, including our model of the evolution of language.While many have focused on the emergence of novel traits via plasticity, ourmodel shows that the Simpson-Baldwin effect may occur even in situationswhere the trait being implemented by plastic individuals is not novel. In the

18

early stages of evolution, effective signaling is often generated by learning andas evolution continues, learners are driven out in favor of individuals that ef-fectively signal without learning. The growth and later reduction of plasticityby frequency dependent selection may be a very widespread phenomena.

We find little support for the Baldwin expediting effect, but because ourmodel cannot account for novelty in the way discussed by many authors, thisnegative conclusion is not particularly telling. Instead, we are left with littleto say about this aspect of the Baldwin effect.

Finally, we distinguished a third effect, the Baldwin optimizing effectand found significant support for that effect in our model of the evolution oflanguage. Whether this effect is present in other games is an open question,but its presence in even one is surprising, and it confirms a more general moral– the presence of plastic individuals can effect the trajectory of evolution,even if those plastic individuals are later eliminated.

References

Ancel, L. W. (1999). A quantitative model of the Simpson-Baldwin effect.Journal of Theoretical Biology 196, 197–209.

Ancel, L. W. (2000). Undermining the Baldwin expediting effect: Doesphenotypuic plasticity accelerate evolution? Theoretical Population Biol-ogy 58, 307–319.

Argiento, R., R. Pemantle, B. Skyrms, and S. Volkov (2009, February).Learning to signal: Analysis of a micro-level reinforcement model. Stochas-tic Processes and their Applications 119 (2), 373–390.

Barrett, J. A. (2006). Numerical simulations of the Lewis signaling game:Learning strategies, pooling equilibria, and the evolution of grammar.Technical Report MBS 06-09, University of California, Irvine: Institutefor Mathematical Behavioral Sciences.

Barrett, J. A. (2008). Dynamic paritioning and the conventionality of kinds.Philosophy of Science 74, 527–546.

Deacon, T. W. (1997). The Symbolic Species: The Co-evolution of Languageand the Brain. Norton.

19

Dennett, D. (1991). Consciousness Explained. Little, Brown, and Company.

Dennett, D. (1995). Darwin’s Dangerous Idea: Evolution and the Meaningsof Life. New York: Simon and Shuster.

Dor, D. and E. Jablonka (2000). From cultural selection to genetic selection:A framework for the evolution of language. Selection 1 (1-3), 33–56.

Godfrey-Smith, P. (2003). Between baldwin skepticisim and baldwin boost-erism. In B. H. Weber and D. J. Depew (Eds.), Evolution and Learning:The Baldwin Effect Reconsidered. MIT Press.

Herrnstein, R. J. (1970). On the law of effect. Journal of the ExperimentalAnalysis of Behavior 15, 245–266.

Hinton, G. E. and S. J. Nowlan (1987). How learning can guide evolution.Complex Systems 1, 495–502.

Huttegger, S. (2007, January). Evolution and explanation of meaning. Phi-losophy of Science 74 (1), 1–27.

Huttegger, S., B. Skyrms, R. Smead, and K. Zollman (2007). Evolutionarydynamics of lewis signaling games: Signaling systems vs. partial pooling.Forthcomming in Synthese.

Huttegger, S. and K. J. Zollman (2009). Signaling games: The dynamics ofevolution and learning. In A. Benz (Ed.), Language, Games, and Evolution.

Lewis, D. (1969). Convention: A Philosophical Study. Cambridge: HarvardUniversity Press.

Mayley, G. (1996). Landscapes, learning costs, and genetic assimilation.Evolutionary Computation 4 (213-234).

Mills, R. and R. Watson (2006). On crossing fitness valleys with the Bald-win effect. In Proceedings of the Tenth International Conference on theSimulation and Synethsis of Living Systems, pp. 493–499.

Moran, P. (1962). The Stastical Process of Evolutionary Theory. Oxford:Clarendon Press.

20

Pawlowitsch, C. (2008). Why evolution does not always lead to an optimalsignaling system. Games and Economic Behavior (63), 203–226.

Pinker, S. (2000). The Language Instinct. HarperCollins.

Simpson, G. G. (1953, June). The Baldwin effect. Evolution 7 (2), 110–117.

Skyrms, B. (1996). Evolution of the Social Contract. Cambridge: CambridgeUniversity Press.

Skyrms, B. (2008). Signals: Evolution, Learning, and the Flow of Informa-tion (book manuscript).

Smead, R. and K. Zollman (2008). The stability of strategic plasticity.Carnegie Mellon University Department of Philosophy Technical ReportCMU-PHIL-182 .

Suzuki, R. and T. Arita (2004). Interactions between learning and evolution:The outstanding strategy generated by the Baldwin effect. BioSystems 77,57–71.

Taylor, P. and L. Jonker (1978). Evolutionarily stable strategies and gamedynamics. Mathematical Biosciences 40, 145–156.

Waddington, C. H. (1953, June). Genetic assimilation of an acquired char-acter. Evolution 7 (2), 213–234.

Zollman, K. J. (2005). Talking to neighbors: The evolution of regional mean-ing. Philosophy of Science 72, 69–85.

21