Exploring the Effect of Power Law Social Popularity on Language Evolution


Tao Gong*,**
University of Hong Kong

Lan Shuai†
Johns Hopkins University

* Contact author.
** Department of Linguistics, University of Hong Kong, Hong Kong. E-mail: gtojty@gmail.com
† Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD. E-mail: [email protected]

© 2014 Massachusetts Institute of Technology. Artificial Life 20: 385–408 (2014). doi:10.1162/ARTL_a_00138

Keywords: Social scaling, mutual understanding, self-organization, computer simulation

A version of this paper with color figures is available online at http://dx.doi.org/10.1162/artl_a_00138. Subscription required.

Abstract

We evaluate the effect of a power-law-distributed social popularity on the origin and change of language, based on three artificial life models meticulously tracing the evolution of linguistic conventions including lexical items, categories, and simple syntax. A cross-model analysis reveals an optimal social popularity, in which the E value of the power law distribution is around 1.0. Under this scaling, linguistic conventions can efficiently emerge and widely diffuse among individuals, thus maintaining a useful level of mutual understandability even in a big population. From an evolutionary perspective, we regard this social optimality as a tradeoff among social scaling, mutual understandability, and population growth. Empirical evidence confirms that such optimal power laws exist in many large-scale social systems that are constructed primarily via language-related interactions. This study contributes to the empirical explorations and theoretical discussions of the evolutionary relations between ubiquitous power laws in social systems and relevant individual behaviors.

1 Introduction

Power laws (f(x) ∼ x^(−E), E > 0.0, f(x) a density function), as one type of probability distribution [29], have been repeatedly identified in biological, ecological, psychological, social, and linguistic systems [3, 14, 33]. For example, the relations between organism mass and metabolic rates across species [41], between the numbers of recalled items and recalling periods in human memory systems [34], between the popularities of scholars or actors in academia or the film industry and the frequencies of collaborations among them [36], and between the ranks of words and their frequencies of occurrence in scripts of different languages [48] all follow power laws. These power laws can be classified by their scaling components (E). For example, the E values of the power laws between the ranks of words and their frequencies of occurrence in scripts of different languages are around 1.0 (such power laws are also called Zipf's laws [48]), and those of the power laws between the ranks of language families and the numbers of members in those families are around 2.0 (based on the data from Ethnologue [24]) [42, 47]. The ubiquitous occurrence of power laws in various systems prompts many scholars to regard power laws, rather than normal distributions, as one of the most striking signatures of complex adaptive systems (CASs) [7, 25] that incorporate multiple dependent items and intricate connections among these items [3, 10, 14, 17, 18, 29, 40]. The macroscopic outcome of a CAS usually results from the microscopic interactions of its components [4], and during such a self-organization [8] process, power laws, as well as other characteristics, may emerge at a global level. Previous work has shown that preferential attachment [5], kinship relation [11], and geographical constraint [45] can give rise to power laws in different social systems.

In many human social systems, linguistic communication and language-related information exchange (e.g., social collaboration and exchange via telephone or e-mail) are the most prominent behaviors among individuals. As revealed in some surveys [36, 37, 44], social systems constructed primarily via language-related behaviors tend to exhibit power law degree distributions with similar E values. (In network terms, if one treats individuals as nodes and interactive behaviors among individuals as edges linking nodes, then the degree of a node is the number of edges it has, and the degree distribution gives the probability that a chosen node has a particular degree, that is, a probability distribution of degrees over the whole network.) For example, the E value of the degree distribution is 2.3 in the movie star collaboration network (449,913 nodes (movie stars) and 25,516,482 edges (collaborations in movies)) [46], 2.1 in the telephone call network (47,000,000 nodes (individuals) and 80,000,000 edges (phone calls among those individuals)) [2], and 1.8 in the e-mail exchange network (59,912 nodes (e-mail addresses) and 86,300 edges (outgoing e-mail exchanges from these addresses)) [15]. This cross-system similarity leads us to ask why social systems involving language or language-related interactive behaviors exhibit such similar power laws, and what the relation is between these particular power laws and the language behaviors in those systems.

Apart from small-scale empirical studies, computer simulation offers an efficient way to explore issues concerning power laws in large-scale social systems. Previous work in this line usually adopts a network approach, treating individuals in a community as nodes, and interactions among them as edges linking nodes [32]. Extracting actual connections among individuals helps reveal the structural features of these networks, and analyzing simulation results in networks exhibiting different degrees of such features helps reveal the general effect of relevant social factors. For example, by simulating various networks (e.g., row, lattice, ring, small-world, or scale-free networks), previous studies (e.g., [11, 12, 22, 30]) have shown that the more social connections an individual has (the higher its degree), the more influential it is in a community [27, 31], and that the bigger the social distance (the number of intermediate nodes) between individuals, the weaker the influence they have on each other [28, 35].

On the one hand, social connections, as a local indicator, can explicitly denote individual relations in large-scale societies. In small-scale societies, however, such connections are usually hard to retrieve and less informative, since individuals therein often connect intensively and interact frequently with each other, which may blur the effect of particular social connections. Although weighted networks (using connection weights to denote intensity or frequency) may partially relieve this difficulty, estimating connection weights from empirical data, usually obtained at the population level, is not straightforward. Given these facts, we need global indicators, in addition to local connections, to understand the general effect of social factors.

Social popularity (the distribution of probabilities for individuals to participate in social activities) could be one such global indicator. Compared with social connection, social popularity is less dependent on actual connections among individuals, which makes it applicable to both small- and large-scale communities. In addition, social popularity can be estimated directly from empirical data at the population level. Furthermore, since social popularity is inherently similar to probability distributions defined at the population level, we can use power laws (as well as other distributions) to manipulate social popularity and examine the relation between social popularity and individual behaviors.

On the other hand, previous simulation studies each tend to adopt a single language model to study the effect of social factors on particular aspect(s) of language evolution. As a CAS, language contains many hierarchically organized and frequently interacting components [16]. A particular model touching on some of these components and their interactions would be insufficient to summarize the general relation between social popularity and individual language behaviors. Therefore, we need to consider multiple models covering various aspects of language.

386 Artificial Life Volume 20, Number 3

In our study, we define a power-law-distributed social popularity. By adjusting the E value of this power law, we examine the effect of such a social popularity on language evolution, based on three language models touching upon the semantic, lexical, and syntactic aspects of language evolution. These models include: (a) the naming game [6], which examines the origin of consensus on lexicon-like meaning-utterance associations in a population of individuals; (b) the category game [38], which studies the origin and diffusion of linguistic categories; and (c) the lexicon-syntax coevolution model [19, 21], which traces the origin and change of lexical items and simple word orders. The cross-model analysis of the simulation results reveals: (a) a correlation between the scaling components (E) of power laws and the understandability of evolving language; and (b) an optimal scaling component, with which linguistic conventions can diffuse widely enough in the population to maintain a sufficiently high level of mutual understandability. The simulation results under different population sizes indicate that such optimal scaling helps balance social scaling, population growth, and linguistic understandability.

The rest of the article is organized as follows: Section 2 defines power law social popularity, and points out its relation with power law degree distributions; Section 3 reports and analyzes the simulation results based on the adopted language models; Section 4 interprets these results and evaluates the cross-model analysis in our study; and finally, Section 5 concludes the article.

2 Power Law Social Popularity

In this study, social popularity refers to the distribution of probabilities for individuals to participate in language communications. The participating probability of each individual is a function of its rank (denoting an individual's popularity in the community). We use a power law distribution to manipulate social popularity:

$$p(r) = c\,r^{-E} \tag{1}$$

Here, r denotes the rank of an individual, p(r) is the probability that this individual participates in communications, and c (= 1/∑_{i=1}^{N} r_i^(−E)) is a normalizing factor ensuring that all participation probabilities sum to 1.0. For the sake of simplicity, we assign each individual a distinct rank from 1 to N, where N is the population size, and E classifies power laws. If E is 0.0, all individuals have the same probability of communicating with each other, which resembles the case of random communications. When E is greater than 0.0, the smaller the rank of an individual, the more popular that individual is in the community.
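As a minimal sketch of Equation 1, the participation probabilities can be computed directly from the ranks. The function name `participation_probabilities` is our own label for illustration, not something defined in the article.

```python
def participation_probabilities(n, e):
    """Return [p(1), ..., p(n)] with p(r) = c * r**(-e), as in Equation 1."""
    if n < 1:
        raise ValueError("population size must be at least 1")
    # Unnormalized power law weights for ranks 1..n.
    weights = [r ** (-e) for r in range(1, n + 1)]
    # Normalizing factor c, so that the probabilities sum to 1.0.
    c = 1.0 / sum(weights)
    return [c * w for w in weights]


# Illustrative call: a 50-agent population under three E values.
for e in (0.0, 1.0, 2.0):
    p = participation_probabilities(50, e)
    print(f"E={e}: p(1)={p[0]:.4f}, p(50)={p[-1]:.6f}")
```

With E = 0.0 every individual receives the same probability 1/N, and for any E the most popular individual is N^E times as likely to participate as the least popular one, since c cancels in the ratio p(1)/p(N).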

Assuming that the rank and participation probability of an individual are correlated with the number of social connections it can have, we can unify the global indicator of social popularity with the local indicator of social connection. This assumption may not necessarily hold in all cases, but it often does, especially in societies where social connections reflect opportunities of interactions. Let us consider a scale-free network [5] formed by individual social connections; the degree distribution of this network follows a power law. If the rank of a node having a degree k is defined accumulatively according to the probability for this node to have at least k connections with others, then the E of the power law social popularity will be correlated with the E0 of the power law degree distribution. This correlation is given by:

$$E_0 = 1 + \frac{1}{E} \tag{2}$$

and proved as follows:

$$r(k) \sim \int_k^{N-1\,\to\,\infty} p_0(x)\,dx = \int_k^{\infty} x^{-E_0}\,dx \sim k^{1-E_0}$$

$$p(r(k)) = r(k)^{-E} \sim k^{-E(1-E_0)} \sim k \;\Rightarrow\; -E(1-E_0) = 1 \;\Rightarrow\; E_0 = 1 + \frac{1}{E} \tag{3}$$

Here, r(k) is the rank of an individual with degree k, p(r) (= r^(−E)) is a power law social popularity, p0(k) (= k^(−E0)) is a power law degree distribution, and normalizing factors are omitted. The step p(r(k)) ∼ k uses the assumption stated above, taken here as proportionality between an individual's participation probability and its number of connections. This correlation holds when N is sufficiently large. It links the simulation results obtained under power law social popularities with the empirical data obtained in real-world systems having power law degree distributions.
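As a quick check of Equation 2, the derivation can be worked through with numbers substituted. The choice of E = 1 below is ours, for illustration; the steps simply repeat the proof, with E_0 written for the degree exponent E0.

```latex
\begin{align*}
E = 1 \;&\Rightarrow\; E_0 = 1 + \tfrac{1}{1} = 2, \\
r(k) &\sim \int_k^{\infty} x^{-2}\,dx = k^{-1}, \\
p(r(k)) &= r(k)^{-1} \sim \left(k^{-1}\right)^{-1} = k .
\end{align*}
```

Applying the same relation to other values of E gives E_0 = 3 for E = 0.5, E_0 = 5/3 for E = 1.5, E_0 = 1.5 for E = 2.0, E_0 = 1.4 for E = 2.5, and E_0 = 4/3 for E = 3.0. For E = 0.0 the relation is undefined, which matches the uniform case in which rank carries no information about degree.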

In our study, we select seven E values (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, and 3.0) to analyze the effects of different power laws on language evolution. Figure 1 shows the participation probabilities in a 50-agent population under power laws with these E values. In a log-log plot, these curves become straight lines with slope −E, so the lines become steeper as E increases. In addition, we set up seven population sizes N (50, 100, 150, 200, 300, 400, and 500) to study the effect of power laws on language evolution in both small and large communities. In each population, we run 140 simulations (20 under each of the seven E values).
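The straight-line behavior in a log-log plot can be checked numerically. The sketch below fits a least-squares line to log p(r) against log r for the seven E values in a 50-agent population; the helper names are hypothetical, and only the Python standard library is used.

```python
import math


def participation_probabilities(n, e):
    """p(r) = c * r**(-e) for ranks 1..n, normalized to sum to 1.0."""
    weights = [r ** (-e) for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def loglog_slope(n, e):
    """Least-squares slope of log p(r) against log r."""
    probs = participation_probabilities(n, e)
    xs = [math.log(r) for r in range(1, n + 1)]
    ys = [math.log(p) for p in probs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# The seven E values used in the simulations.
E_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0)
for e in E_VALUES:
    print(f"E={e}: fitted slope={loglog_slope(50, e):.6f}")
```

Because log p(r) = log c − E log r exactly, the fitted slope equals −E up to floating-point error, and the normalizing factor c only shifts each line vertically.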

3 Simulation Results

The three adopted language models are briefly reviewed in Appendices 1, 2, and 3, respectively. Due to the various language behaviors involved, the language evolution dynamics in these models manifests itself on distinct time scales, and can be traced by various indices. Our cross-model analysis is based primarily on the indices tracing linguistic mutual understandability at the population level, and it proceeds in two steps. First, we analyze the effects of power law social popularity and population size on linguistic understandability in each of these models (Sections 3.1 to 3.3). Then, we summarize the general effect of power laws on language evolution and discuss the relation between power laws and language evolution across these models (Section 3.4). Finally, we compare the effects of power-law-distributed social popularity with those of normally distributed social popularity (Section 3.5).

Figure 1. Participation probabilities under power law social popularities. Each line traces the probabilities under a power law with a particular E value.



3.1 Naming Game

This model traces the origin and spread of a common lexical name in a population of individuals. Linguistic convention refers to the lexical name. Due to the simple behaviors involved (i.e., hearers acquire new names in failed games, and both speakers and hearers delete competing names in successful games), the evolution dynamics manifests itself on a short time scale. Accordingly, we set the number of games per agent (individual) at 50 (the actual number of games depends on N, and due to social popularity, not all agents participate in exactly the same number of games). The evolution dynamics can be traced by the number of distinct names in the population (Nd) and the rate of successful games in which speakers and hearers agree on the same name (S). Our analysis focuses on S, which reflects mutual understandability in the population.
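The behaviors described above can be sketched in a few lines. This is a hedged illustration rather than the model reviewed in Appendix 1: we assume that a speaker with no name invents one, that speaker and hearer are both drawn according to p(r) and must differ, that the total number of games is games_per_agent × N, and that S is measured over consecutive windows of N games. All function names are hypothetical.

```python
import random


def participation_probabilities(n, e):
    """p(r) = c * r**(-e) for ranks 1..n, normalized to sum to 1.0."""
    weights = [r ** (-e) for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def pick_pair(rng, probs):
    """Draw a speaker and a distinct hearer, each weighted by probs (assumption)."""
    agents = range(len(probs))
    speaker = rng.choices(agents, weights=probs)[0]
    hearer = speaker
    while hearer == speaker:
        hearer = rng.choices(agents, weights=probs)[0]
    return speaker, hearer


def naming_game(n, e, games_per_agent=50, seed=0):
    """Return the success rate S in consecutive windows of n games."""
    if n < 2:
        raise ValueError("at least two agents are needed")
    rng = random.Random(seed)
    probs = participation_probabilities(n, e)
    inventories = [set() for _ in range(n)]
    next_name = 0
    successes = 0
    s_curve = []
    for game in range(1, games_per_agent * n + 1):
        speaker, hearer = pick_pair(rng, probs)
        if not inventories[speaker]:
            # Assumption: a speaker without any name invents a new one.
            inventories[speaker].add(next_name)
            next_name += 1
        name = rng.choice(sorted(inventories[speaker]))
        if name in inventories[hearer]:
            # Successful game: both parties delete competing names.
            inventories[speaker] = {name}
            inventories[hearer] = {name}
            successes += 1
        else:
            # Failed game: the hearer acquires the new name.
            inventories[hearer].add(name)
        if game % n == 0:
            s_curve.append(successes / n)
            successes = 0
    return s_curve


# Illustrative run: final-window S in a 50-agent population.
for e in (0.0, 1.0, 3.0):
    print(f"E={e}: final S={naming_game(50, e)[-1]:.2f}")
```

A single seeded run such as this one is noisy; comparing E values in the way the article does would require averaging over repeated runs.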

Figure 2a traces the dynamics of the naming game in a 50-agent population. The dynamics is shown by the transition of S from 0.0 (no understanding) to 1.0 (mutual understanding). As shown in Figure 2a, if E is smaller than 1.0, the transition becomes faster as E increases; if E is greater than 1.0, the transition becomes slower as E increases; and if E is greater than 1.5, the transition does not complete within 50 games per agent. To sum up, among all power laws, the best performance occurs when E equals 1.0. Similar observations can be obtained in simulations with bigger populations (see Figure 2b). If E does not equal 1.0, the transition becomes slower as population size increases, but if E equals 1.0, the transition remains the fastest among all power laws, and does not change much across populations. These findings are also confirmed by statistical analysis (see Appendix 4).

Figure 2. S of the naming game under different power laws in (a) a 50-agent population and (b) other populations. Each line is averaged over 20 simulations. Error bars denote standard errors. For reasons of space, error bars in (b) are omitted.

3.2 Category Game

This model traces the origin and diffusion of a set of linguistic categories among individuals. Linguistic conventions refer to linguistic categories having similar perceptual boundaries and common lexical names across individuals. Because the incorporated language behaviors process not only lexical names but also categories, the evolution dynamics of linguistic categories manifests itself on a much bigger time scale. Accordingly, we set the number of games per agent at 10^6. This dynamics can be traced by the degree of boundary alignment across individuals' linguistic categories, the number of shared lexical names among individuals' linguistic categories, and the rate (S) of successful games in which speakers and hearers correctly discriminate presented stimuli based on their categorical knowledge and use identical lexical names to call those stimuli. In our analysis, we focus on S.

Figure 3a traces the dynamics of the category game in a 50-agent population, indicated by S. It is shown that when E equals 1.0, the transition of S is the fastest among all power laws. Similar observations can be obtained in simulations under bigger populations (see Figure 3b). When E equals 1.0, the transition of S remains the fastest among all power laws, and does not change much across populations. These findings are also confirmed by statistical analysis (see Appendix 4).

3.3 Lexicon-Syntax Coevolution Model

The evolving language in this model can encode semantic expressions with simple predicate-argument structures into sentences with basic word orders. Linguistic conventions include common lexical items, syntactic categories, and word orders regulating lexical items in sentences. Language behaviors for processing lexicon, syntax, and relevant linguistic categories are simulated for individuals to learn, update, and use different types of linguistic knowledge during communications. The evolution of language proceeds on a time scale that is distinct from those in the other models. Apart from origin, this model can also study language change. The evolution dynamics of this model can be traced by the expressivity of individual linguistic knowledge and the linguistic mutual understandability among individuals (UR). In our study, we set the number of communications per agent at 600, focus on UR for analysis, and conduct both the origin and change simulations. In the origin simulations, individuals initially share limited linguistic knowledge that can only encode a small number of semantic expressions; in the change ones, individuals initially share a complete set of linguistic knowledge capable of expressing all semantic expressions.

Figure 4a,c traces the dynamics of this model in a 50-agent population, indicated by UR. In the origin simulations, when E is smaller than 1.0, UR can reach a high level after 600 communications, and the increase in UR starts earliest when E equals 1.0. However, when E is greater than 1.0, the increase in UR occurs later and the maximum UR achieved within 600 communications becomes smaller. In the change simulations, when E is smaller than 1.0, a high UR is kept throughout the simulation; when E is greater than 1.0, UR starts to drop as E increases. By tracing the shared linguistic knowledge, we find that even in cases where a high UR is maintained, some shared linguistic knowledge gradually changes during the evolution. For example, Table 1 records the shared lexical knowledge in a change simulation with E equal to 1.0. It shows that the utterances of some initially shared lexical items become different after 600 communications. This indicates the inevitable change of language during cultural transmission [20]. In this situation, what the power law social popularity helps preserve is the mutual understandability based on such constantly changing knowledge.



Similar observations can be obtained in simulations under bigger populations (see Figures 4b and 4d). In the origin simulations, only when E equals 0.0, 0.5, or 1.0 can UR reach a high value across all populations. When E equals 1.0, the transition of UR remains the fastest among all power laws, and does not change much across populations. In the change simulations, as N increases, only when E is smaller than 1.0 can a high UR be preserved; in other cases, UR drops as N increases. These conclusions are also confirmed by statistical analysis (see Appendix 4).

3.4 Cross-Model Analysis

Due to the various aspects of language evolution involved (e.g., lexical and syntactic evolution, and the origin, diffusion, and change of linguistic conventions) and the relevant language behaviors processing lexical and syntactic information, the evolution of language in these three models proceeds on different time scales. Nonetheless, we can observe some similar tendencies across different power law social popularities and population sizes in these models. On the one hand, in the situation where E is smaller than 1.0, the transition of S or UR starts earlier and proceeds faster as E increases; in other words, the origin and diffusion of linguistic conventions become accelerated. In addition, a relatively high level of linguistic understandability can be achieved and maintained across different populations; S or UR can reach a high value and be preserved throughout the simulations. On the other hand, when E is bigger than 1.0, the diffusion of linguistic conventions becomes slower as E increases, or impossible within the simulations (especially with very big E), and a high level of linguistic mutual understandability fails to be achieved or maintained.

Figure 3. S of the category game under different power laws in (a) a 50-agent population and (b) other populations. Discriminative constraint dmin = 0.01.

Both aspects indicate a watershed, optimal scaling component (E = 1.0) in power law social popularity: Under this optimal scaling component, emergent linguistic conventions can efficiently diffuse and a relatively high level of linguistic mutual understandability can be largely preserved, even in bigger populations; whereas under a scaling component below or above this optimal value, the evolution (especially the origin) of language becomes less efficient, especially in bigger populations. The change simulations in the lexicon-syntax coevolution model are partially exceptional to these general tendencies. In those simulations, the best performance, in the sense of a high level of mutual understandability, exists when E is smaller than 1.0. This is due to the distinct settings in the change simulations compared with the origin simulations based on the other models. In the change simulations, all individuals initially share a common set of linguistic knowledge. When E is smaller than 1.0, every individual has many chances to communicate with others, so that their shared linguistic knowledge can be frequently used and enhanced. Therefore, a sufficiently high level of mutual understandability can be maintained. However, in the origin simulations and other models, individuals initially have no or limited linguistic knowledge, and they have to develop their common linguistic knowledge from scratch. In those simulations, although a power law social popularity with E equal to 0.0, 0.5, or 1.0 may have similar effects in terms of maintaining common knowledge, only a power law social popularity with E equal to 1.0 can trigger the best performance in terms of developing common knowledge. More interpretation of these results is given in Section 4 below.

Table 1. Shared lexical rules in a change simulation under a power law with E = 1.0. Numbers within ( ) are average strengths of these rules among individuals; those within / / are utterance syllables. “#” denotes unspecified semantic constituents. During the simulation, the utterances of the lexical rules marked with “*” become different.

Initially shared lexical rules (UR = 0.86) | Shared lexical rules after 600 games (UR = 0.83)
------------------------------------------ | ------------------------------------------------
(1.0): ‘lion’ ↔ /25 17/ | *(0.91): ‘lion’ ↔ /17/
(1.0): ‘wolf’ ↔ /29 11/ | (0.91): ‘wolf’ ↔ /29 11/
(1.0): ‘fox’ ↔ /19 9/ | (0.91): ‘fox’ ↔ /19 9/
(1.0): ‘tiger’ ↔ /25/ | *(0.91): ‘tiger’ ↔ /24/
(1.0): ‘run〈#〉’ ↔ /29/ | *(0.90): ‘run〈#〉’ ↔ /17 29/
(1.0): ‘hop〈#〉’ ↔ /18/ | (0.91): ‘hop〈#〉’ ↔ /18/
(1.0): ‘cry〈#〉’ ↔ /5/ | (0.91): ‘cry〈#〉’ ↔ /5/
(1.0): ‘fall〈#〉’ ↔ /0/ | (0.91): ‘fall〈#〉’ ↔ /0/
(1.0): ‘chase〈#,#〉’ ↔ /26/ | (0.90): ‘chase〈#,#〉’ ↔ /26/
(1.0): ‘fight〈#,#〉’ ↔ /24/ | *(0.91): ‘fight〈#,#〉’ ↔ /20/
(1.0): ‘stalk〈#,#〉’ ↔ /21 16/ | (0.91): ‘stalk〈#,#〉’ ↔ /21 16/
(1.0): ‘beat〈#,#〉’ ↔ /22 8/ | (0.91): ‘beat〈#,#〉’ ↔ /22 8/

Figure 4. UR in the origin simulations under different power laws in (a) a 50-agent population and (b) other populations, and UR in the change simulations under different power laws in (c) a 50-agent population and (d) other populations.

3.5 Comparison with Other Types of Social Popularity

In these simulations, we focus on the power-law-distributed social popularity and summarize its general effect on language evolution. What about the effect of other types of social popularity following other probability distributions? To answer this question without loss of generality, we take the example of normally distributed social popularity, and compare the simulation results under the power law social popularity with those under the normally distributed social popularity. For brevity, we put the comparison in Appendix 5. This comparison confirms that the normally distributed social popularity does not show the general effect of the power law social popularity on language evolution.

4 Discussion

4.1 Optimal Scaling Component in Power Law Social Popularity

Our simulations based on three artificial life models consistently show that the power law social popularity with the optimal scaling component (1.0) helps efficiently spread linguistic conventions and preserve a high level of mutual understandability in the population, whereas social popularities with other values of the scaling component tend to delay the diffusion process and destroy mutual understandability, especially in big populations and when E is greater than 1.0.

These results are due to the combined effect of two factors. On the one hand, apart from the particular behaviors processing different types of linguistic knowledge, there are similar behaviors in these models (e.g., deleting or weakening competing names or linguistic rules in successful games). These behaviors contribute to linguistic conventionalization through local games (or communications) among individuals; frequent games (or communications) among individuals can trigger the creation and sharing of knowledge among these individuals. Meanwhile, these behaviors also have a certain degree of randomness (e.g., randomly creating lexical names or expressions when speakers fail to discriminate or encode certain meanings). Without sufficient shared knowledge, this randomness will influence linguistic conventionalization.

On the other hand, the scaling component of the power law helps adjust the ratio among three types of games: (i) those between popular individuals (whose rank values are smaller than or equal to N/2 (if N is an even number) or (N + 1)/2 (if N is an odd number)); (ii) those between popular and unpopular individuals (whose rank values are greater than N/2 or (N + 1)/2); and (iii) those between unpopular individuals. Let us illustrate the influence of this ratio using a thought experiment. Assume a is the probability of choosing a popular individual in a game, and 1 − a the probability of choosing an unpopular one; then, the probability of type (i) games can be roughly estimated as a^2, that of type (ii) as 2a(1 − a), and that of type (iii) as (1 − a)^2 (for the sake of simplicity, we omit the normalizing factors and allow choosing identical individuals in a game). When E is 0.0, a is 0.5. When E increases, a also increases; then, the probability of type (i) games will increase, but those of the other two types of games will decrease.
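The thought experiment can be made concrete with a small calculation. The sketch below computes a as the total participation probability of the popular half of the population and then the three rough estimates a^2, 2a(1 − a), and (1 − a)^2; the function names are ours, and, as in the text, the same individual may be chosen twice.

```python
def participation_probabilities(n, e):
    """p(r) = c * r**(-e) for ranks 1..n, normalized to sum to 1.0."""
    weights = [r ** (-e) for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def game_type_probabilities(n, e):
    """Return (a, P(type i), P(type ii), P(type iii)) for population size n."""
    probs = participation_probabilities(n, e)
    # Popular individuals have ranks up to N/2 (N even) or (N + 1)/2 (N odd).
    cutoff = n // 2 if n % 2 == 0 else (n + 1) // 2
    a = sum(probs[:cutoff])
    return a, a * a, 2 * a * (1 - a), (1 - a) ** 2


# Illustrative table for a 50-agent population.
for e in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    a, p1, p2, p3 = game_type_probabilities(50, e)
    print(f"E={e}: a={a:.3f}, (i)={p1:.3f}, (ii)={p2:.3f}, (iii)={p3:.3f}")
```

For N = 50, a is 0.5 at E = 0.0 and roughly 0.85 at E = 1.0, so under this estimate type (iii) games between unpopular individuals already fall from a quarter of all games to a few percent.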

Combining these factors, the existence of an optimal scaling component can be explained as follows. In the case of random games (E = 0.0), popular and unpopular individuals have an equal chance to communicate with each other, and linguistic conventionalization proceeds in the whole group. Then, as population size increases, the degree of randomness increases, which will delay linguistic conventionalization in a big population.

When E slightly increases, type (i) games become more frequent, but the other two, especially type (iii) games, become less so. In this case, conventionalization can be quickly achieved among popular individuals, due to frequent games among them. Sufficient type (ii) games also allow unpopular ones to interact with popular ones and to learn their shared knowledge. Such a “popular individuals first, unpopular ones later” process of conventionalization is faster than that in the case of random games, because learning from common knowledge already developed in a small group of popular individuals is more efficient than learning from scratch or from limited knowledge in the whole group. In addition, sufficient type (ii) games make sure the common knowledge in a small group of popular individuals can efficiently diffuse to other individuals, so the increase in population size will not greatly affect the mutual understandability of the group.

When E increases further, both type (ii) and type (iii) games become insufficient, so that unpopular individuals cannot efficiently learn from popular ones or develop their own shared knowledge, and the shared knowledge among popular individuals cannot efficiently diffuse to unpopular ones. Therefore, the linguistic conventionalization in the whole group is affected. For the naming and category games, without forgetting mechanisms, additional games will give unpopular individuals more chances to communicate with popular ones, which will eventually lead to mutual understanding in the whole group. For the lexicon-syntax coevolution model, however, due to rule competition and forgetting, even if more communications are given, unpopular individuals may not grasp sufficient common knowledge in time to maintain mutual understandability, and UR will remain low in the origin simulations. In the change simulations, if unpopular individuals do not have enough chances to use their initially shared knowledge and enhance its strength, their shared knowledge will be gradually forgotten and UR will drop as well. Therefore, similarly to the origin simulations, given more communications, UR may not rise again.

This discussion suggests that the optimal scaling component (E = 1.0, shown in the simulations) emerges as a tradeoff among social scaling, linguistic mutual understandability, and population size. In the optimal situation, both a certain degree of social scaling and a relatively high level of linguistic mutual understandability are maintained, and such situations can withstand the influence of population growth. In addition, seen from Equation 2, the optimal E around 1.0 in the power law social popularity corresponds to the critical E0 around 2.0 in the power law degree distribution in a scale-free network. Following this correlation, we find that many large-scale, real-world social systems constructed via language-related interactions do stay in such optimal situations. For example, as shown in the introduction, the movie star collaboration network, the telephone call network, and the e-mail exchange network all have their E0 around 2.0. Apart from language behaviors, other scale-free natural or technical systems involving some information exchange and conventionalization behaviors also have their E0 around 2.0 (e.g., the metabolic network (765 nodes, 3,686 edges, E0 = 2.2) [26], the peer-to-peer network (880 nodes, 1,296 edges, E0 = 2.1) [39], and the World Wide Web (203,549,046 nodes, 2,130,000,000 edges, E0 = 2.1) [1]) [36, 44].
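To see how close these systems sit to the optimum, Equation 2 can be inverted to E = 1/(E0 − 1) and applied to the degree exponents quoted in this article. The sketch below does only that arithmetic; the function name `popularity_exponent` is our own, and the inversion is valid only for E0 greater than 1.

```python
def popularity_exponent(e0):
    """Invert Equation 2: E = 1 / (E0 - 1), defined for E0 > 1."""
    if e0 <= 1.0:
        raise ValueError("Equation 2 can only be inverted for E0 > 1")
    return 1.0 / (e0 - 1.0)


# Degree exponents E0 as quoted in the introduction and in this section.
NETWORKS = {
    "movie star collaboration": 2.3,
    "telephone call": 2.1,
    "e-mail exchange": 1.8,
    "metabolic": 2.2,
    "peer-to-peer": 2.1,
    "World Wide Web": 2.1,
}

for name, e0 in NETWORKS.items():
    print(f"{name}: E0={e0}, implied E={popularity_exponent(e0):.2f}")
```

Under this mapping the quoted networks correspond to E values between roughly 0.77 and 1.25, that is, within about a quarter of a unit of the optimal value 1.0.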

4.2 Within- and Cross-Model Comparison

Apart from the above findings, the cross-model comparison approach in our study also deserves further evaluation. Previous simulations often adopt within-model comparison, which designs a particular model of certain aspects of language evolution and compares simulation results obtained in distinct conditions to gather understanding of the target question (e.g., [12, 27, 28, 30]). However, the conclusions drawn from a single model covering particular aspect(s) of language evolution may not hold in other aspects of language evolution. For example, the social settings helping lexical evolution may not necessarily help syntactic evolution. One way to overcome this limitation is to extend the model to incorporate other aspects of language evolution, but this is not an easy task; so far, the most sophisticated models still fail to address all aspects of language or come close to the level of complexity in language [9, 13, 23, 43].

Instead of a narrow angle around particular model(s), cross-model comparison offers another way to overcome this limitation, especially when the research goal is to generalize “universal” effects of certain factor(s) on different aspects of language evolution. The huge repertoire of available models of language processing and evolution provides rich resources for cross-model comparison. The difficulty of such comparison lies in how to quantitatively compare and reasonably summarize the results appearing on different time scales or obtained from different models; in our study, our comparisons are still limited to a conceptual or qualitative level. Nonetheless, unifying within- and cross-model comparisons is very promising for gathering both qualitative and quantitative understanding of the evolutionary relation between social characteristics and individual behaviors, and it is reasonable to expect that such an approach will be widely adopted in future work in this line of research.

5 Conclusion

We conduct a simulation study analyzing the correlation between power law social popularity and the evolution of individual language behaviors. We focus on power laws because of their ubiquity in social systems, and on language behaviors because they are the most prominent phenomenon in human social systems. A cross-model comparison based on three language models covering different aspects of language evolution reveals an optimal power law social popularity, which results from a compromise between social scaling, linguistic mutual understanding, and population growth. This finding reflects an evolutionary correlation between individual behaviors and social characteristics, and the approach of cross-model comparison serves as an efficient way to explore their mutual influence from an evolutionary perspective.

Acknowledgments

This work was funded by the Seed Fund for Basic Research of the University of Hong Kong. The preliminary results of this article were reported in the 8th International Conference on the Evolution of Language (Evolang8) in Utrecht, the Netherlands. We thank Yicheng Wu from Zhejiang University for valuable comments on this work.

References

1. Albert, R., Jeong, H., & Barabási, A.-L. (1999). Diameter of the World Wide Web. Nature, 401, 130–131.

2. Aiello, W., Chung, F., & Lu, L. (2002). Random evolution of massive graphs. In J. Abello, P. M. Pardalos, & M. G. C. Resende (Eds.), Handbook of massive data sets (pp. 97–122). Dordrecht, The Netherlands: Kluwer.

3. Bak, P. (1996). How nature works: The science of self-organized criticality. New York: Copernicus.

4. Ball, P. (2001). The self-made tapestry: Pattern formation in nature. Oxford, UK: Oxford University Press.

5. Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.

6. Baronchelli, A., Felici, M., Loreto, V., Caglioti, E., & Steels, L. (2006). Sharp transition towards shared vocabularies in multi-agent systems. Journal of Statistical Mechanics, P06014. Available: http://arxiv.org/abs/physics/0509075v2

7. Beckner, C., Blythe, R., Bybee, J., Christiansen, M. H., Croft, W., Ellis, N. C., Holland, J., Ke, J.-Y., Larsen-Freeman, D., & Schoenemann, T. (2009). Language is a complex adaptive system: Position paper. Language Learning, 59(Suppl. 1), 1–26.

8. Camazine, S., Deneubourg, J.-L., Franks, N. R., Sneyd, J., Theraulaz, G., & Bonabeau, E. (2001). Self-organization in biological systems. Princeton, NJ: Princeton University Press.

9. Cangelosi, A., & Parisi, D. (2002). Computer simulation: A new scientific approach to the study of language evolution. In A. Cangelosi & D. Parisi (Eds.), Simulating the evolution of language (pp. 3–28). Berlin: Springer-Verlag.

10. Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). Power law distributions in empirical data. SIAM Review, 51, 661–703.

11. Coelho, R., Néda, Z., Ramasco, J. J., & Santos, M. A. (2005). A family network model for wealth distribution in societies. Physica A, 353, 515–528.

12. DallʼAsta, L., Baronchelli, A., Barrat, A., & Loreto, V. (2006). Nonequilibrium dynamics of language games on complex networks. Physical Review E, 74(3), 036105. Available: http://arxiv.org/abs/physics/0607054v1



13. De Boer, B., & Zuidema, W. (2010). Multi-agent simulations of the evolution of combinatorial phonology. Adaptive Behavior, 18(2), 141–154.

14. Dubrulle, B., Graner, F., & Sornette, D. (Eds.). (1997). Scale invariance and beyond. Berlin: Springer.

15. Ebel, H., Mielsch, L.-I., & Bornholdt, S. (2002). Scale-free topology of e-mail networks. Physical Review E, 66, 035103. Available: http://arxiv.org/abs/cond-mat/0201476

16. Fitch, T. W. (2010). The evolution of language. Cambridge, UK: Cambridge University Press.

17. Gell-Mann, M. (1994). The quark and the jaguar: Adventures in the simple and the complex. New York: W. H. Freeman.

18. Gisiger, T. (2001). Scale invariance in biology: Coincidence or footprint of a universal mechanism? Biological Reviews of the Cambridge Philosophical Society, 76(2), 161–209.

19. Gong, T. (2009). Computational simulation in evolutionary linguistics: A study on language emergence. Taipei: Institute of Linguistics, Academia Sinica.

20. Gong, T. (2010). Exploring the roles of horizontal, vertical, and oblique transmissions in language evolution. Adaptive Behavior, 18(3–4), 356–376.

21. Gong, T. (2011). Simulating the coevolution of compositionality and word order regularity. Interaction Studies, 12(1), 63–106.

22. Gong, T., Baronchelli, A., Puglisi, A., & Loreto, V. (2012). Exploring the roles of complex networks in linguistic categorization. Artificial Life, 18(1), 107–121.

23. Gong, T., & Shuai, L. (2013). Computer simulation as a scientific approach in evolutionary linguistics. Language Sciences, 40, 12–23.

24. Grimes, B. F. (Ed.). (2000). Ethnologue: Languages of the world (14th ed.). Dallas: Summer Institute of Linguistics.

25. Holland, J. H. (2012). Signals and boundaries: Building blocks for complex adaptive systems. Cambridge, MA: MIT Press.

26. Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N., & Barabási, A.-L. (2000). The large-scale organization of metabolic networks. Nature, 407, 651–654.

27. Kalampokis, A., Kosmidis, K., & Argyrakis, P. (2007). Evolution of vocabulary on scale-free and random networks. Physica A, 379, 665–671.

28. Ke, J.-Y., Gong, T., & Wang, W. S.-Y. (2008). Language change and social networks. Communications in Computational Physics, 3(4), 935–949.

29. Kello, C. T., Brown, G. D. A., Ferrer-i-Cancho, R., Holden, J. G., Linkenkaer-Hansen, K., Rhodes, T., & van Orden, G. C. (2010). Scaling laws in cognitive sciences. Trends in Cognitive Sciences, 14, 223–232.

30. Kirby, S. (2000). Syntax without natural selection: How compositionality emerges from vocabulary in a population of learners. In C. Knight (Ed.), The evolutionary emergence of language: Social function and the origins of linguistic form (pp. 303–323). Cambridge, UK: Cambridge University Press.

31. Livingstone, D. (2002). The evolution of dialect diversity. In A. Cangelosi & D. Parisi (Eds.), Simulating the evolution of language (pp. 99–117). Berlin: Springer-Verlag.

32. Malsch, T., & Schulz-Schaeffer, I. (2007). Socionics: Sociological concepts for social systems of artificial (and human) agents. Journal of Artificial Societies and Social Simulation, 10. Available: http://jasss.soc.surrey.ac.uk/10/1/11.html.

33. Mandelbrot, B. (1967). How long is the coast of Britain? Statistical self-similarity and fractional dimension. Science, 156, 636–638.

34. Maylor, E. A., Chater, N., & Brown, G. D. A. (2001). Scale invariance in the retrieval of retrospective and prospective memories. Psychonomic Bulletin & Review, 8, 162–167.

35. Nettle, D. (1999). Linguistic diversity. Oxford, UK: Oxford University Press.

36. Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review, 45, 167–256.

37. Newman, M. E. J. (2006). Power laws, Pareto distributions and Zipfʼs law. Contemporary Physics, 46, 323–351.



38. Puglisi, A., Baronchelli, A., & Loreto, V. (2008). Cultural route to the emergence of linguistic categories. Proceedings of the National Academy of Sciences of the USA, 105(23), 7936–7940.

39. Ripeanu, M., Foster, I., & Iamnitchi, A. (2002). Mapping the Gnutella network: Properties of large-scale peer-to-peer systems and implications for system design. IEEE Internet Computing, 6, 50–57.

40. Sims, D. W., Southall, E. J., Humphries, N. E., Hays, G. C., Bradshaw, C. J. A., Pitchford, J. W., James, A., Ahmed, M. Z., Brierley, A. S., Hindell, M. A., Morritt, D., Musy, M. K., Righton, D., Shepard, E. L. C., Wearmouth, V. J., Wilson, R. P., Witt, M. J., & Metcalfe, J. D. (2008). Scaling laws of marine predator search behavior. Nature, 451, 1098–1102.

41. Spence, A. J. (2009). Scaling in biology. Current Biology, 19(2), R57–R61.

42. Stauffer, D., Schulze, C., Lima, F. W. S., Wichmann, S., & Solomon, S. (2006). Non-equilibrium and irreversible simulation of competition among languages. Physica A, 371, 719–724.

43. Vogt, P., & Lieven, E. (2010). Verifying theories of language acquisition using computer models of language evolution. Adaptive Behavior, 18(1), 21–35.

44. Wang, X., & Chen, G. (2003). Complex networks: Small-world, scale-free and beyond. IEEE Circuits and Systems, 3(1), 6–20.

45. Warren, C. P., Sander, L. M., & Sokolov, I. M. (2002). Geography in a scale-free network model. Physical Review E, 66, 056105. Available: http://arXiv:cond-mat/0207324v3

46. Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of “small-world” networks. Nature, 393, 440–442.

47. Wichmann, S. (2005). On the power law distribution of language family sizes. Journal of Linguistics, 41(1), 117–131.

48. Zipf, G. K. (1949). Human behavior and the principle of least effort: An introduction to human ecology. Reading, MA: Addison-Wesley.

Appendix 1: The Naming Game

In this model, N individuals (agents) name an object during naming games. Each agent has an initially empty inventory to store candidate names. A game involves two agents (a speaker and a hearer). First, the speaker utters a name to the hearer. If its inventory is empty, the speaker randomly invents a name; otherwise, it utters one of the available names at random. If the hearer has the uttered name in its inventory, the game succeeds, and both agents delete all their names except the uttered one; otherwise, the game fails, and the hearer adds the uttered name to its inventory. Figure 5a shows two examples of the naming game.

Based on the number of distinct names in the population (Nd) and the rate of successful games among agents (S), Figure 5b traces the dynamics of this game in a population with random games (resembling the case E = 0.0). The dynamics has two phases: (a) Nd increases but S remains low, indicating that agents keep inventing new names, but many games fail; and (b) Nd drops to 1 and S reaches 1.0, indicating that agents end up sharing a common name and most games succeed. Statistical analysis helps reveal the correlations among N, maximum Nd, number of games for Nd to reach its maximum, and S [6].
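The rules of a single naming game can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the simulations: the function names, the use of sets as inventories, the `w0`, `w1`, ... name generator, and the assumption that an invented name is stored in the speaker's inventory are all choices made here. Pairs are drawn uniformly at random, which corresponds to the random-games case (E = 0.0).

```python
import itertools
import random


def play_game(speaker, hearer, rng, new_name):
    """Play one naming game on two inventories (sets); return True on success."""
    if not speaker:
        # Empty inventory: invent a name (assumed to be stored by the speaker).
        speaker.add(new_name())
    name = rng.choice(sorted(speaker))  # utter one available name at random
    if name in hearer:
        # Success: both agents delete every name except the uttered one.
        speaker.intersection_update({name})
        hearer.intersection_update({name})
        return True
    # Failure: the hearer adds the uttered name to its inventory.
    hearer.add(name)
    return False


def simulate(n_agents=50, n_games=2500, seed=0):
    """Run random pairwise games; return the number of distinct names (Nd)."""
    rng = random.Random(seed)
    counter = itertools.count()

    def new_name():
        return f"w{next(counter)}"  # hypothetical name generator

    inventories = [set() for _ in range(n_agents)]
    for _ in range(n_games):
        s, h = rng.sample(range(n_agents), 2)  # distinct speaker and hearer
        play_game(inventories[s], inventories[h], rng, new_name)
    return len(set().union(*inventories))
```

Tracking the return value of `play_game` over a window of games gives an estimate of S, and the value returned by `simulate` corresponds to Nd at the end of the run.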

Appendix 2: The Category Game

Agents in this model perceive stimuli from a continuous perceptual space. Each stimulus is denoted by a real number within [0, 1]. A categorization pattern corresponds to a partition of this space into subintervals called perceptual categories. Lexical names are used to describe stimuli from different perceptual categories. In an agent, if some perceptual categories having adjacent boundaries share a common lexical name, they will join together as a linguistic category. All N agents initially conceive the whole perceptual space as one perceptual category with no lexical names. Each agent has an inventory to store perceptual categories and their lexical names. Categorization patterns evolve during category games. In one game, M (≥2) stimuli randomly chosen from the perceptual space are presented to the two agents (a speaker and a hearer). One of the stimuli is the topic of this game. Note that the perceptual difference between any two of the stimuli must be greater than a discriminative constraint, dmin. The speaker first tries to discriminate the stimuli, and utters the name of the perceptual category in which the topic lies. Failing to do so, the speaker will create new perceptual categories and new lexical names to distinguish the topic from other stimuli, and utter the name of the newly created category that contains the topic. Then, the hearer tries to guess the topic based on the heard name and its own categories. If the hearerʼs guess matches the topic, the game succeeds, and both agents remove all competing names except the heard one in their perceptual categories referred to in this game, just as in the naming game; otherwise, the hearer adds the heard name to the perceptual category that can discriminate the topic, and if no such category exists, the hearer will create a new category to discriminate the topic and assign the heard name to it. Figure 6a shows two examples of the category game.
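The discrimination step of the category game can be sketched as follows for the simplest case of M = 2 stimuli. The sketch assumes, following the example in the caption of Figure 6, that a category containing both stimuli is split at their midpoint, that both new categories inherit the parent's names, and that each receives one newly invented name. The class, its attributes, and the name generator are hypothetical and only illustrate the bookkeeping; name exchange and alignment between agents are not shown.

```python
import bisect
import itertools


def make_namer():
    """Return a function producing fresh names w0, w1, ... (hypothetical)."""
    counter = itertools.count()
    return lambda: f"w{next(counter)}"


class Agent:
    def __init__(self):
        # One perceptual category covering [0, 1], with no lexical names.
        self.bounds = [0.0, 1.0]  # sorted category boundaries
        self.names = [[]]         # names[i] belongs to [bounds[i], bounds[i+1])

    def category_of(self, x):
        """Index of the perceptual category containing stimulus x."""
        i = bisect.bisect_right(self.bounds, x) - 1
        return min(i, len(self.names) - 1)  # x == 1.0 falls in the last category

    def discriminate(self, topic, other, new_name):
        """Ensure topic and other lie in different categories; return topic's index."""
        i = self.category_of(topic)
        if i != self.category_of(other):
            return i  # already discriminated
        # Split the shared category at the midpoint of the two stimuli.
        self.bounds.insert(i + 1, (topic + other) / 2)
        parent = self.names[i]
        # Both halves inherit the parent's names and get one new name each.
        self.names[i:i + 1] = [parent + [new_name()], parent + [new_name()]]
        return self.category_of(topic)
```

For example, starting from a fresh agent, `discriminate(0.25, 0.75, namer)` splits the single initial category at 0.5 and leaves the agent with two categories named `w0` and `w1`.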

The dynamics of this game can be traced by three indices: (a) overlap (O), which calculates the degree of boundary alignment among linguistic categories across agents; (b) number of shared lexical names (NL), which reflects the number of linguistic categories sharing similar boundaries and lexical names across agents; and (c) success rate (S), which calculates the percentage of successful games between all agents. To measure S, we let agents play virtual games without updating their inventories, and calculate the percentage of successful games in these virtual games. S echoes O and NL; if agents share many linguistic categories having similar boundaries and common lexical names, S will be high. Figure 6b traces the dynamics of this game in a population with random games. The dynamics has two phases: (a) new perceptual categories with different boundaries and lexical names are created for the purpose of discrimination, but O, NL, and S remain low; and (b) new perceptual categories keep emerging, but due to boundary mismatch, adjacent categories in agents start to share lexical names and merge into linguistic categories (see [38] for examples). Then, although the boundaries of perceptual categories are still mismatched, those of linguistic categories can become roughly aligned. At this stage, O and NL increase and become stable, and S increases and reaches a high value. From now on, the system remains stable for a long time; after a much longer time (say, 10^5–10^6 games per agent), one may observe a slight drop of NL and S [38].

Figure 5. (a) Examples of the naming game (adapted from [6]). Rectangles are individual inventories. Uttered names are in italic. In game 1, the speaker utters “gong”; since the hearer does not have this name in its inventory, the game fails, and the hearer adds “gong” to its inventory. In game 2, the speaker utters “loreto”; since the hearer has this name, the game succeeds, and both agents delete other names than “loreto” from their inventories. (b) Dynamics of the naming game in a 50-agent population with random games. Each line is averaged over 20 simulations.



Appendix 3: The Lexicon-Syntax Coevolution Model

This model examines the origin of a communal language formed by lexical items and simple word order(s). Language is represented by meaning-utterance mappings (M-U mappings). Individuals share a semantic space containing a fixed number of integrated meanings, each having a simple predicate-argument structure, such as “predicate〈agent〉” or “predicate〈agent, patient〉,” where predicate, agent, and patient are thematic notations. These meanings are encoded by utterances, each comprising a string of syllables chosen from a signaling space. An utterance encoding an integrated meaning can be segmented into subparts, each mapping one or two semantic constituents; and subparts can combine to encode an integrated meaning. During communications, based on equipped mechanisms, individuals can acquire linguistic knowledge from exchanged M-U mappings in previous communications, produce utterances encoding integrated meanings, and comprehend heard utterances.

Linguistic knowledge is characterized by lexicon, syntax, and syntactic categories. An individualʼs lexicon consists of a number of lexical rules (see Figure 7), some of which are holistic, each mapping an integrated meaning onto an utterance, for example, “run〈tiger〉”↔/abcd/; others are compositional, each mapping semantic constituent(s) onto a subpart of an utterance, for example “fox”↔/ef/.

Using compositional rules requires these rules to be regulated in order. A syntactic rule (see Figure 7) specifies an order between two lexical items, for example, “tiger” << “fox” means that the constituent “tiger” lies in an utterance before (but not necessarily immediately before) “fox”. One local order helps express “predicate〈agent〉” meanings, and two or three help express “predicate〈agent, patient〉” meanings.

Syntactic categories allow syntactic rules acquired from some lexical items to be applied productively to others sharing the same thematic notation. A syntactic category (see Figure 7) comprises a set of lexical rules and a set of syntactic rules that regulate the orders between these lexical rules and those from other categories. For the sake of simplicity, we simulate a nominative-accusative language and exclude passive voice. A category associating lexical rules having the thematic notation of agent is marked as a subject (S) category, since the notation of agent corresponds to the syntactic role of S in this language. Similarly, patient corresponds to object (O), and predicate to verb (V). A local order between two categories can be denoted by their syntactic roles; for example, an order before between an S and a V category can be denoted by S << V, or simply SV.

Figure 6. (a) Examples of the category game (adapted from [38]). Circles denote presented stimuli, among which topics are indicated by arrows. Banners denote the perceptual space, and agents use different bars to partition this space into perceptual categories, whose lexical names are listed above or below. In game 1, the two stimuli fall into the same perceptual category in the speaker. Then, the speaker discriminates the topic (a) by creating a new boundary in this category at the position (a + b)/2. This gives rise to two new categories, both inheriting the names (“green” and “olive”) of their parent category. A new name is invented in each new category (“brown” and “blue”). After that, the speaker sends the newly created name (“brown”) to the hearer. Since the hearer does not have this name in its inventory, the game fails. Then, the speaker clarifies the topic, and the hearer discriminates the topic, and adds “brown” to the name list of the corresponding category. If necessary, the hearer may create some new categories. In game 2, since the topic is discriminated by the perceptual category whose name is “green,” the speaker sends “green” to the hearer. The hearer knows “green,” and the perceptual category having this name can also discriminate the topic. Therefore, the game succeeds. Then, both agents delete all competing names in their corresponding categories and leave “green” only. This alignment strategy adjusts the name lists of categories, not their boundaries. (b) Dynamics of the category game in a 50-agent population (dmin = 0.01) with random games. Each line is averaged over 20 simulations.

Lexical and syntactic knowledge jointly encode integrated meanings. As in Figure 7, based on the three lexical rules respectively from the S, V, and O categories, and the two orders SV and SO among these categories, the semantic expression “fight〈wolf, fox〉” can be encoded into an utterance /bcea/ or /bcae/, following SVO or SOV. In addition, each lexical or syntactic rule has a strength, indicating the probability of successfully applying its M-U mapping or local order. A lexical rule also has an association weight to the category that contains it, indicating the probability of successfully applying the syntactic rules of this category to the utterance of that lexical rule. Both strengths and association weights lie in [0.0, 1.0]. These numerical parameters enable strength-based rule competition in communications and gradual forgetting of linguistic knowledge, that is, regularly (according to a forgetting frequency) deducting a fixed value (forgetting rate) from strengths and association weights of rules in each individual, and then, removing lexical rules from categories to which their association weights are 0.0, and discarding rules with negative strengths, categories with no lexical members, and syntactic rules of these categories.
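One forgetting step, as described above, can be sketched as follows. The dictionary layout, the function name, and the rule identifiers are hypothetical; the default rate of 0.01 is the forgetting rate listed in Table 2. The sketch additionally assumes that a lexical rule discarded for negative strength is also dropped from any category that associates it, which the description implies but does not state.

```python
FORGETTING_RATE = 0.01  # value listed in Table 2


def forget(lexicon, categories, rate=FORGETTING_RATE):
    """Apply one forgetting step and return the pruned (lexicon, categories).

    lexicon:    {lexical rule id: strength}
    categories: {role: {"members": {rule id: association weight},
                        "orders":  {syntactic rule: strength}}}
    """
    # Deduct the rate from lexical strengths; discard rules that turn negative.
    kept_rules = {r: s - rate for r, s in lexicon.items() if s - rate >= 0.0}

    kept_categories = {}
    for role, cat in categories.items():
        # Deduct the rate from association weights. A weight that reaches 0.0
        # removes the rule from the category; rules already discarded from the
        # lexicon are dropped as well (an assumption of this sketch).
        members = {r: w - rate for r, w in cat["members"].items()
                   if w - rate > 0.0 and r in kept_rules}
        if not members:
            continue  # an empty category is discarded with its syntactic rules
        # Syntactic rules decay in the same way as lexical rules.
        orders = {o: s - rate for o, s in cat["orders"].items()
                  if s - rate >= 0.0}
        kept_categories[role] = {"members": members, "orders": orders}
    return kept_rules, kept_categories
```

In a full simulation this function would be called once per forgetting period for each individual; rewards from successful communications are what keep frequently used rules above the pruning thresholds.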

Lexical rules are acquired by detecting recurrent patterns (meanings and syllables appearing recurrently in at least two M-U mappings). Each individual has a buffer storing M-U mappings obtained in its previous communications. New mappings, before being inserted into the buffer, are compared with those in the buffer. As in Figure 8, by comparing “hop〈fox〉”↔/ab/ with “run〈fox〉”↔/acd/, an individual can note the recurrent patterns “fox” and /a/, and map them as a lexical rule “fox”↔/a/.
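The comparison in this example can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from the model description: a meaning is a tuple of semantic constituents, an utterance is a string with one character per syllable, the recurrent syllable pattern is taken to be the longest common contiguous substring, and a rule is proposed only when exactly one constituent is shared. The full acquisition mechanism is described in [19, 21].

```python
from difflib import SequenceMatcher


def recurrent_rule(mapping1, mapping2):
    """Compare two M-U mappings; return (constituent, syllables) or None."""
    (meaning1, utterance1), (meaning2, utterance2) = mapping1, mapping2
    # Semantic constituents appearing in both meanings.
    shared = set(meaning1) & set(meaning2)
    # Longest run of syllables appearing in both utterances.
    matcher = SequenceMatcher(None, utterance1, utterance2, autojunk=False)
    match = matcher.find_longest_match(0, len(utterance1), 0, len(utterance2))
    if len(shared) != 1 or match.size == 0:
        return None  # no unambiguous recurrent pattern in this simplified case
    return shared.pop(), utterance1[match.a:match.a + match.size]


# The example from the text: "hop<fox>" <-> /ab/ and "run<fox>" <-> /acd/.
rule = recurrent_rule((("hop", "fox"), "ab"), (("run", "fox"), "acd"))
print(rule)  # ('fox', 'a')
```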

Figure 7. Examples of lexical rules, syntactic rules, and categories. “#” denotes unspecified semantic constituents, and “*” unspecified syllable(s). S, V, O are syntactic roles of categories. Numbers enclosed by ( ) denote strengths, and those by [ ] association weights. “<<” denotes the local order before, and “>>” after.


Syntactic rules and categories are acquired based on thematic notations of lexical rules and order relations of their utterances in M-U mappings. As in Figure 8, evident in M-U mappings (1) and (2), syllables /d/ of rule (i) and /ac/ of rule (iii) precede /m/ of rule (ii). Since both “wolf” and “fox” have the thematic notation agent in these meanings, rules (i) and (iii) are associated into an S category (category 1), and the order before between these rules and rule (ii) is acquired as a syntactic rule. Similarly, according to M-U mappings (1) and (3), a V category (category 2) associating rules (ii) and (iv) and a syntactic rule after are acquired. Now, since categories 1 and 2 respectively associate rules (i) and (iii) and rules (ii) and (iv), the syntactic rules in these categories are updated as “category 1 (S) << category 2 (V),” or SV.

A communication between two individuals (a speaker and a hearer) consists of many rounds of utterance exchange. In one round, based on its linguistic rules, the speaker produces an utterance to encode a randomly chosen integrated meaning in the semantic space (see Figure 9, left panel). If the available rules offer more than one form of utterance, rule competition takes place, based on the strengths and association weights of related rules, and the speaker selects the set of rules having the highest combined strength for production. If the speaker lacks rules to encode the meaning, it may (under a random creation rate) randomly create a holistic rule to encode the whole meaning; otherwise, it produces nothing. The hearer receives the produced utterance, and tries to comprehend it based on the hearerʼs linguistic rules (see Figure 9, right panel). If multiple choices are available, rule competition takes place, and the hearer selects the set of rules having the highest combined strength for comprehension. Calculation of combined strength can be found in [19, 21].

Apart from linguistic materials, nonlinguistic cues also assist comprehension, especially when linguistic knowledge is insufficient. A cue contains an integrated meaning and a fixed strength (cue strength). The probability with which the cueʼs meaning matches the speakerʼs intended one is manipulated by reliability of cue. In comprehension, if the cueʼs meaning matches the one offered by some linguistic rules, the cue strength is added to the combined strength of those rules; otherwise, the cue itself forms a candidate set for comprehension. Such unreliable cues can trigger preliminary linguistic knowledge at the early stage of language origin.

Figure 8. (a) Example of acquisition of lexical rules and (b) acquisition of syntactic rules and categories.

After comprehension, if the combined strength of the set of rules used for comprehension exceeds a confidence threshold, the hearer adds the comprehended M-U mapping in its buffer, and sends a positive feedback to the speaker. Then, both individuals reward their rules used in this utterance exchange (by adding a fixed value (adjustment rate) to their strengths and association weights) and penalize competing ones (by deducting the same value from their strengths and association weights); otherwise, without adding the M-U mapping, the hearer sends a negative feedback to the speaker, and then, both individuals penalize their used rules.
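The feedback-driven update can be sketched for one individual as follows. Only rule strengths are shown; the description applies the same adjustment to association weights. The function name and data layout are hypothetical, the default values are those listed in Table 2, and capping strengths at 1.0 is an assumption based on the stated [0.0, 1.0] range. No lower cap is applied here, since rules with negative strengths are discarded by the forgetting mechanism.

```python
ADJUSTMENT_RATE = 0.1        # value listed in Table 2
CONFIDENCE_THRESHOLD = 0.75  # value listed in Table 2


def apply_feedback(strengths, used, competing, combined_strength,
                   rate=ADJUSTMENT_RATE, threshold=CONFIDENCE_THRESHOLD):
    """Update rule strengths in place after one utterance exchange.

    strengths:         {rule id: strength} of one individual
    used:              rules used in this exchange
    competing:         rules that competed with the used ones
    combined_strength: combined strength of the hearer's winning rule set
    Returns True for positive feedback, False for negative feedback.
    """
    positive = combined_strength > threshold
    if positive:
        # Reward used rules (capped at 1.0) and penalize competing ones.
        for rule in used:
            strengths[rule] = min(1.0, strengths[rule] + rate)
        for rule in competing:
            strengths[rule] -= rate
    else:
        # Negative feedback: penalize the used rules only.
        for rule in used:
            strengths[rule] -= rate
    return positive
```

In the model both the speaker and the hearer perform this update on their own rules after the hearer's feedback, so the function would be called once per individual per exchange.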

Table 2 lists the parameter values used in the simulations of this article. The effects of these parameters on language evolution are discussed in [19]. This model can simulate both language origin and change. In the origin simulations, individuals initially share eight holistic rules to encode 8 out of 64 integrated meanings. In the change simulations, individuals initially share 12 lexical rules associated into three categories (S, V, and O) having SV, VO, and SO local orders. These rules can encode all 64 integrated meanings, and the produced utterances follow SV (“predicate〈agent〉” meanings) and SVO (“predicate〈agent, patient〉” meanings) orders.

The dynamics of language origin and change in this model can be evaluated by: (a) the rule expressivity (RE), the percentage of integrated meanings that individuals can express using their linguistic rules; and (b) the understanding rate (UR), the percentage of integrated meanings that individuals can accurately comprehend using their linguistic rules, without referring to cues. To measure RE and UR, we let each pair of individuals talk to each other about each integrated meaning in the semantic space, and calculate: (a) the percentage of utterance exchanges where speakers produce utterances to encode meanings; and (b) the percentage of utterance exchanges where speakersʼ intended meanings match hearersʼ comprehended ones.

Figure 10 traces the dynamics of this model in a population with random communications. The dynamics of origin has two phases. First, based on their learning mechanisms, individuals begin to acquire linguistic rules to express many integrated meanings, so there is an increase in RE, from 0.125 (8/64) to 1.0, but since newly acquired rules are not yet widely shared and some may compete with original holistic rules, UR remains low and may even drop. Second, when competition causes some rules to be shared among individuals, mutual understanding becomes frequent, and UR starts to increase and nearly reaches 1.0. The dynamics of change is relatively simple: RE and UR remain stable and high (over 0.8) throughout the simulation, but some lexical and/or syntactic rules may change.

Figure 9. Examples of production and comprehension (adapted from [19]). CatS, CatV, and CatO are categories with syntactic roles S, V, and O. “<<” denotes the local order before, and “>>” after. Syllables within / / are utterance syllables, and “#” denotes unspecified semantic constituents. Rule strengths and association weights are omitted. In production, to encode “chase〈lion, wolf〉”, the speaker selects lexical rules (e.g., “chase〈#, #〉”↔/a b c/) that can encode all or some semantic constituents in this meaning, and the syntactic categories (e.g., CatV) that associate these lexical rules and have corresponding syntactic roles (e.g., V). Then, following the syntactic rules in these categories, the speaker regulates the lexical rules (e.g., /d/ << /e f/) into an utterance (/a b c d e f/). If this set of rules wins the competition against others (if any), this utterance is sent to the hearer. In comprehension, the hearer selects lexical rules (e.g., “fox”↔/d/) whose utterances partially or fully match the heard utterance. Then, the hearer detects the orders of these lexical rules in the heard utterance (e.g., /d/ << /e f/). If these orders match the syntactic rules (e.g., OS) in some categories that also associate those lexical rules, those categories are selected. After that, based on the syntactic roles of those categories, the semantic roles of those lexical rules are specified (e.g., “fox” is from an O category; then it is patient), and “fight〈lion, fox〉” is comprehended. In this example, the comprehended meaning does not match the speakerʼs intended one, but if the combined strength of the rules used by the hearer exceeds the confidence threshold, the hearer updates the comprehended M-U mapping into its buffer, and sends a positive feedback to the speaker. Then, both individuals reward their rules used in this utterance exchange and penalize competing ones.

Appendix 4: Statistical Analyses of the Simulation Results

The conclusions based on the naming game can be confirmed by a two-way analysis of covariance (ANCOVA) (dependent variable: S in 20 simulations; fixed factor: 7 E values; random factor: 7 N values; covariate: 20 sampling points throughout 50 games per agent). The purpose of using ANCOVA, instead of ANOVA, and treating the number of games as a covariate, is to partial out the influence of the covariate. Noting that population size is not limited to these values, we treat N as a random factor, not a fixed one like E.

Table 2. Parameter setting of the lexicon-syntax coevolution model.

| Parameter | Value |
| --- | --- |
| Size of semantic space | 64 |
| Size of signaling space | 30 |
| Size of buffer | 40 |
| Random creation rate | 0.25 |
| Adjustment rate | 0.1 |
| Forgetting rate | 0.01 |
| Reliability of cue | 0.6 |
| Confidence threshold (= cue strength) | 0.75 |
| Utterance exchange per communication | 20 |

Figure 10. Dynamics of (a) language origin and (b) change in a 50-agent population with random communications. Each line is averaged over 20 simulations.

The ANCOVA reveals that both E (F(6, 36) = 229.932, p < 0.001, ηp² = 0.975) and N (F(6, 36) = 10.517, p < 0.001, ηp² = 0.637) have significant main effects on S, and they interact significantly (F(36, 19550) = 54.428, p < 0.001, ηp² = 0.091). The covariate is also significantly correlated with S (F(1, 19550) = 14830.896, p < 0.001, ηp² = 0.431). These results are shown in Figure 11. The marginal mean S across all populations peaks when E = 1.0 (see Figure 11a). The marginal mean S across all power laws drops with increase in population size (see Figure 11b). And the marginal mean S under different power laws and population sizes is similarly high when E = 1.0, but drops in other cases (see Figure 11c).
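The effect sizes reported with each F statistic in this appendix are numerically consistent with partial eta squared, which can be recovered from an F value and its two degrees of freedom as F·df_effect / (F·df_effect + df_error). The short check below applies this standard identity to the naming game values; the function name is a label chosen here, and the check is offered only as a way to read the reported numbers, not as part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# Naming game values: (label, F, df_effect, df_error).
reported = [
    ("E", 229.932, 6, 36),
    ("N", 10.517, 6, 36),
    ("E x N", 54.428, 36, 19550),
    ("covariate", 14830.896, 1, 19550),
]
for label, f_value, df1, df2 in reported:
    print(label, round(partial_eta_squared(f_value, df1, df2), 3))
# E 0.975
# N 0.637
# E x N 0.091
# covariate 0.431
```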

The conclusions based on the category game are also confirmed by the ANCOVA. It reveals that both E (F(6, 36) = 93.552, p < 0.001, ηp² = 0.940) and N (F(6, 36) = 6.471, p < 0.001, ηp² = 0.519) have significant main effects on S, and they interact significantly (F(36, 19550) = 59.769, p < 0.001, ηp² = 0.099). The covariate is also significantly correlated with S (F(1, 19550) = 10467.806, p < 0.001, ηp² = 0.349). These results are shown in Figure 12.

Finally, the conclusions based on the lexicon-syntax coevolution model are also confirmed by the ANCOVA. As for the origin simulations, the ANCOVA reveals that both E (F(6, 36) = 29.828, p < 0.001, ηp² = 0.833) and N (F(6, 36) = 4.649, p < 0.001, ηp² = 0.437) have significant main effects on UR, and they interact significantly (F(36, 19550) = 67.538, p < 0.001, ηp² = 0.111). The covariate is also significantly correlated with UR (F(1, 19550) = 6176.276, p < 0.001, ηp² = 0.240). These results are shown in Figure 13a–c. As for the change simulations, the ANCOVA reveals that both E (F(6, 36) = 787.092, p < 0.001, ηp² = 0.992) and N (F(6, 36) = 4.075, p < 0.001, ηp² = 0.404) have significant main effects on UR, and they interact significantly (F(36, 19550) = 75.226, p < 0.001, ηp² = 0.122). The covariate is also significantly correlated with UR (F(1, 19550) = 146.196, p < 0.001, ηp² = 0.007). These results are shown in Figure 13d–f.

Figure 11. Marginal mean S (average over all sampling points in 20 simulations under the same condition) of the naming game (a) under different power law social popularities and (b, c) in different populations.

Figure 12. Marginal mean S of the category game (a) under different power law social popularities and (b, c) in different populations.

Appendix 5: Comparison between Power Law and Normally Distributed Social Popularities

The normally distributed social popularity is defined by

$$
f(x) = c \cdot g(x), \quad \text{where } g(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2} \tag{4}
$$

Here, μ is the mean, σ is the standard deviation, and c is the normalizing factor making sure the sum of all participating probabilities is 1.0. Individual rank does not affect this distribution. To calculate individuals' probabilities, we randomly select N values from [μ − 2σ, μ + 2σ] as x to calculate g(x), and then obtain f(x) after normalization of all g(x). For comparison, we first set up seven normally distributed social popularities, whose means and standard deviations respectively equal those of the seven power law social popularities. Then, based on the three language models, we analyze the transitions of S or UR under these normally distributed social popularities and in different population sizes to see if the effect generalized in these simulations is similar to that in the simulations under power law social popularities.
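The procedure for obtaining individuals' participating probabilities can be sketched in Python as follows. This is a minimal illustration rather than the original implementation: it assumes that "randomly select" means uniform sampling within two standard deviations of the mean, it treats a zero standard deviation as equal popularity for all individuals (an assumption made here to avoid division by zero), and the function name and the example values of `mu` and `sigma` are hypothetical.

```python
import math
import random


def normal_popularity(n, mu, sigma, rng=None):
    """Return n participating probabilities that sum to 1.0."""
    if rng is None:
        rng = random.Random()
    if sigma == 0:
        # Assumption: with no spread, every individual is equally popular.
        return [1.0 / n] * n
    # Sample n values of x (assumed uniform) within two standard deviations of the mean.
    xs = [rng.uniform(mu - 2 * sigma, mu + 2 * sigma) for _ in range(n)]
    # Evaluate the normal density g(x) at each sampled x.
    g = [
        math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        for x in xs
    ]
    # c is the normalizing factor that makes the probabilities sum to 1.0.
    c = 1.0 / sum(g)
    return [c * value for value in g]


# Hypothetical example: 50 agents, mean 1/50, standard deviation 0.01
probs = normal_popularity(50, 1 / 50, 0.01, random.Random(0))
print(sum(probs), min(probs), max(probs))
```

Because every sampled x lies within two standard deviations of the mean, the largest probability can exceed the smallest by at most a factor of e², so popularity under this scheme is far less skewed than under a power law with a large exponent.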

As for the naming game, we conduct a similar two-way analysis of covariance (ANCOVA) (dependent variable: S or UR in 20 simulations; fixed factor: seven types of normal distributions determined by the seven E values; random factor: seven N values; covariate: 20 sampling points throughout 50 games per agent), as in Appendix 4, for statistical analysis. The ANCOVA shows that only N (F(6, 36) = 157.804, p < 0.001, ηp² = 0.963) has a significant main effect on S, but E (F(6, 36) = 0.943, p = 0.477, ηp² = 0.136) does not, and there is no significant interaction between E and N (F(36, 19550) = 0.825, p = 0.761, ηp² = 0.002). These results are shown in Figure 14. We can see that different types of normally distributed social popularity cannot greatly affect the evolution of common lexical names; and with the increase in N, the maximum S drops, under all types of normally distributed social popularity.

Figure 13. Marginal mean UR of the lexicon-syntax coevolution model under different power law social popularities and in different populations in the origin (a–c) and change (d–f) simulations. Error bars denote standard errors.

As regards the category game, the ANCOVA reveals that both E (F(6, 36) = 7.685, p < 0.001, ηp² = 0.562) and N (F(6, 36) = 188.458, p < 0.001, ηp² = 0.969) have significant main effects on S, but there is no significant interaction between E and N (F(36, 19550) = 0.288, p = 1.000, ηp² = 0.001). These results are shown in Figure 15. We can see that with increase in N, the transition of S becomes slower and the maximum S drops, under all types of normally distributed social popularity. Although the effect of normally distributed social popularity reaches a significant level, as shown in Figure 15a, once the normally distributed social popularities have nonzero standard deviations (that is, when E is not 0.0), S increases only slightly, and the effects of these social popularities on S are similar to each other. These effects are quite distinct from those of power law social popularities.

Finally, as for the lexicon-syntax coevolution model, in the origin simulations the ANCOVA shows that both E (F(6, 36) = 3.976, p = 0.004, ηp² = 0.399) and N (F(6, 36) = 464.513, p < 0.001, ηp² = 0.987) have significant main effects on UR, and there is a significant interaction between E and N (F(36, 19550) = 6.052, p < 0.001, ηp² = 0.011). These results are shown in Figure 16a,b. We can see that under all types of normally distributed social popularities, the emergent language has a low UR; with increase in N, the transition of UR becomes slower and the maximum UR drops, under all types of normally distributed social popularity; and although the effect of normally distributed social popularities reaches a significant level, introducing nonzero standard deviations can only increase UR slightly, in contrast with the effects of the power law social popularities. In the change simulations, the ANCOVA shows that N (F(6, 36) = 7.880, p < 0.001, ηp² = 0.568) has a significant main effect on UR, whereas the main effect of E (F(6, 36) = 2.336, p = 0.052, ηp² = 0.280) is only marginally significant, and there is a significant interaction between E and N (F(36, 19550) = 372.403, p < 0.001, ηp² = 0.407). These results are shown in Figure 16c,d. We can see that under all types of normally distributed social popularities and across all N, UR is preserved at a medium level around 0.6; although the effect of E reaches a marginally significant level, UR drops only slightly with increase in E, and it also drops only slightly with increase in N. All these results are different from those under power law social popularities.

Figure 14. Marginal mean S (average over all sampling points in 20 simulations under the same condition) of the naming game (a) under different normally distributed social popularities and (b) in different populations. Error bars denote standard errors.

Figure 15. Marginal mean S of the category game (a) under different normally distributed social popularities and (b) in different populations. Error bars denote standard errors.

Figure 16. Marginal mean UR of the lexicon-syntax coevolution model under different normally distributed social popularities and in different populations in the (a, b) origin and (c, d) change simulations. Error bars denote standard errors.

