A fuzzy approach to the prisoner's dilemma

BioSystems 41 (1997) 127–137

Technical note


Paulo S.S. Borges a,*, Roberto C.S. Pacheco b, Ricardo M. Barcia b, Suresh K. Khator c

a Federal University of Santa Catarina, Department of Statistics, CNPq, Florianopolis, Santa Catarina, Brazil

b Federal University of Santa Catarina, Department of Production Engineering, CNPq, Florianopolis, Santa Catarina, Brazil

c University of South Florida, Department of Industrial and Management Systems, Tampa, FL, USA

Received 12 June 1995; revised 8 March 1996; accepted 28 May 1996

Abstract

The traditional iterated prisoner’s dilemma (IPD) admits only two possible moves, cooperate (C) and defect (D), with no gradations involved. Nevertheless, when a rational agent perceives or implements a specific behavioral decision, it usually employs qualitative measures of the variables involved. In this paper, we propose an approach to the IPD where the possible moves are still confined to C and D, but these are no longer considered as two dichotomous choices, but as different attitudes that can have variable emphasis. The variables accounted for in the game are modeled as fuzzy sets, and the players’ decisions are taken with the guidance of fuzzy expert systems. A computational tournament is performed, where in addition to the fuzzy players, the well-known successful strategists tit for tat (TFT) and Pavlov are also present. Some results are presented and briefly discussed. The main purpose of this paper is to investigate a model of the IPD in which the players’ decisions are taken by means of a qualitative reasoning system. Copyright © 1997 Elsevier Science Ireland Ltd.

Keywords: Decision theory; Fuzzy expert systems; Game theory; Prisoner’s dilemma; Simulation

1. Introduction

The traditional iterated prisoner’s dilemma (IPD) assumes a binary choice for the players, either cooperation (C) or defection (D). The implementation of the strategies generates sequences of these two kinds of moves. The resulting payoffs are numerical values specified in the cells of a payoff matrix according to a particular pair of decisions made by the participants. Some researchers have considered the departure from the binary choice. For example, Hardin (1982) takes

* Corresponding author, Dept. Informatica e Estatistica (INE), Universidade Federal de Santa Catarina, Caixa Postal 476 - 88040-900, Florianopolis, SC, Brazil. Fax: 55 (048) 231-9770. E-mail: [email protected].


gradual cooperation into account mainly in the pursuit of divisible public goods, and Harrald and Fogel (1996) employ a planar approximation of the payoff matrix offered in Axelrod (1984) as a basis for an evolutionary simulation of the IPD, but use neural networks to represent the players’ strategies.

Much of the present interest in the IPD derives from the computer tournaments presented in Axelrod (1984), where diversified strategies were confronted with each other and the winner, tit for tat (TFT), proved robust in several other simulations of the IPD. Axelrod (1987), in subsequent work, used a genetic algorithm to determine if TFT could evolve from a random set of strategies. The current effort presents an alternative approach to the IPD, which will be called the fuzzy iterated prisoner’s dilemma (FIPD) (Borges, 1996). Departing from the usual classic C or D binary form, the moves selected by the players are mapped onto a scale ranging from 0 to 1 corresponding, respectively, to full defection and full cooperation. Since the inputs of the game are no longer discrete, a payoff function consisting of two intersecting planes has been used as a substitute for the payoff matrix.

Allowing the moves to vary continuously in an interval brings a wider range of possibilities to the IPD. Still, the main subject to be explored in this article is the representation of every player by three fuzzy expert systems (FES). The participants perceive and implement the actions pertaining to the game only qualitatively, according to individual rules. The expert systems then transform the qualitative strategies into numerical values, which are the inputs of the payoff function. The underlying idea loosely resembles the manner in which a person drives a car. Depending on the vehicle trajectory, the driver assesses the situation and reacts by steering to the right or left, with different emphasis. In this way, although the driver succeeds in maintaining the car under control, he does not know the exact angle and other numerical characteristics of his action conveyed to the wheels by the mechanical system of the car.

Here, likewise, the players are represented by their respective FES, which will sense, measure and implement the actions that belong to the game, where the moves, even if gradual, are still

Table 1. Classical payoff matrix of the PD game

| Player 1’s decision | Player 2’s decision: Cooperate | Player 2’s decision: Defect |
|---|---|---|
| Cooperate | R, R | S, T |
| Defect | T, S | P, P |

labeled as cooperation or defection. Considering the subjectivity of these two concepts, they have been replaced by the game payoffs, taken in three different forms, as the players’ inputs to generate their decisions. Three sets of rules are assigned to every participant, each one having as input a distinct deterministic factor based on the payoffs that take place during the game. The first ($f_1$) concerns the relation between a player’s accumulated gains and its opponent’s. The second ($f_2$) relates to the payoffs obtained in the sequence of last moves, and the third ($f_3$) regards a comparison between the performance of the population and a player. The output in every move is still either C or D, but with variable emphasis.

The universe of discourse of each input factor is divided into three fuzzy sets, and this schema yields $2^3 = 8$ different strategies for any particular FES. The total number of strategies for all three FES is then $8^3 = 512$, which corresponds to the population of fuzzy strategists or fuzzy players. In addition, both the Pavlov (Beardsley, 1993) and

Fig. 1. Fuzzy sets describing possible actions in the FIPD.


Fig. 2. The planar function used to depict player 1’s payoff in the FIPD.

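instance, a fuzzy set expressing indifference, centered on $a = 0.5$, could have been added, as shown in Fig. 1 (see footnote 1). The degree of membership of an action in either fuzzy set C or D is given by:

$$\mu_C(a) = \begin{cases} 2a - 1 & \text{when } a \ge 0.5 \\ 0 & \text{otherwise} \end{cases} \tag{2a}$$

$$\mu_D(a) = \begin{cases} -2a + 1 & \text{when } a \le 0.5 \\ 0 & \text{otherwise} \end{cases} \tag{2b}$$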
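As a minimal sketch of Eqs. (2a) and (2b), the two action membership functions can be written as follows. The function names `mu_c` and `mu_d` are illustrative choices made here and do not come from the paper.

```python
def mu_c(a: float) -> float:
    """Membership of action a in the fuzzy set C (Eq. 2a)."""
    return 2.0 * a - 1.0 if a >= 0.5 else 0.0


def mu_d(a: float) -> float:
    """Membership of action a in the fuzzy set D (Eq. 2b)."""
    return -2.0 * a + 1.0 if a <= 0.5 else 0.0


# Tabulate both grades over a few actions in [0, 1].
for a in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"a={a:.2f}  mu_C={mu_c(a):.2f}  mu_D={mu_d(a):.2f}")
```

Under these definitions an action of 0.5 belongs to neither set, which is the point where an additional "indifference" set could be placed.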

Since the players can make moves in the interval [0, 1] (defect, cooperate), the typical payoff matrix is replaced by a piecewise linear function of two independent variables that represent each player’s action. The output, as shown in Fig. 2, consists of the resulting payoff for player 1, depicted by two intersecting planes, one of them built with the points ($a_1 = 1$, $a_2 = 0$, $p_1 = 0$), ($a_1 = 0$, $a_2 = 0$, $p_1 = 1$) and ($a_1 = 1$, $a_2 = 1$, $p_1 = 3$). For the other, the points ($a_1 = 0$, $a_2 = 0$, $p_1 = 1$), ($a_1 = 1$, $a_2 = 1$, $p_1 = 3$) and ($a_1 = 0$, $a_2 = 1$, $p_1 = 5$) were used.

The payoffs corresponding to the combinations of the extreme actions C and D ($\mu_C(a) = 1$ and

the TFT strategies are present in the tournament. These well-known successful strategies were included to determine how well they would fare when confronted by the standard players considered in this paper, that is, those whose actions are guided by the respective FES.

2. A payoff function

The payoff matrix of the classical IPD game is shown in Table 1 (Axelrod, 1984). The pairs of values in the cells of the matrix represent the payoffs of the row and the column players, respectively. The meanings of the payoffs are: R, reward for mutual cooperation; S, the ‘sucker’s’ payoff; T, temptation to defect; and P, punishment for mutual defection.

To represent an IPD, the values T, R, P and S in Table 1 must satisfy Eqs. (1a) and (1b).

$$T > R > P > S \tag{1a}$$

$$\frac{T + S}{2} < R \tag{1b}$$

In the classic IPD game, a commonly adopted set of payoffs is T=5, R=3, P=1 and S=0 (Axelrod, 1984).

Instead of crisp, point-like choices, here cooperation and defection are considered as two non-intersecting fuzzy sets and a particular move or action ($a$) is defined by the degree of membership $\mu_C(a)$ or $\mu_D(a)$ in one of these sets. The universe of discourse of the actions ($a$) has been divided into only two mutually exclusive triangular fuzzy sets C and D, but other shapes could be adopted. For

Fig. 3. Fuzzy sets for the qualitative description of $f_1$. The extreme points of the fuzzy set SL (0.44, 0.57) and the leftmost point of the fuzzy set GT (0.55) were determined by making $w_1 = 0.8w_2$, $w_1 = 1.3w_2$ and $w_1 = 1.2w_2$, respectively, in Eq. (5).

1 Using two fuzzy sets to characterize the actions generates eight rules for each decision factor (see Section 3). Since there are three decision factors, the total number of strategies will be 512 ($2^3 \times 2^3 \times 2^3$). If a third fuzzy set were added, this number would increase to 19 683 ($3^3 \times 3^3 \times 3^3$), each one corresponding to a different player in the game.


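Table 2. Fuzzy production rules involving the wealth relation $f_1$ and an action $a$

| Wealth relation $f_1$ | $S_1^{f_1}$ | $S_2^{f_1}$ | $S_3^{f_1}$ | $S_4^{f_1}$ | $S_5^{f_1}$ | $S_6^{f_1}$ | $S_7^{f_1}$ | $S_8^{f_1}$ |
|---|---|---|---|---|---|---|---|---|
| Much lower | C | C | C | C | D | D | D | D |
| Similar | C | C | D | D | D | D | C | C |
| Much greater | C | D | C | D | D | C | D | C |

$S_i^{f_1}$: Strategy $S_i$ in relation to the factor $f_1$.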

$\mu_D(a) = 1$, respectively) were chosen as T=5, R=3, P=1 and S=0, which conform to values commonly employed. The players’ gains ($p_1$, $p_2$) are given by:

Player 1’s payoff ($p_1$):

$$p_1 = 1 - 2a_1 + 4a_2 \quad \text{when } a_1 < a_2 \tag{3a}$$

$$p_1 = 1 - a_1 + 3a_2 \quad \text{when } a_1 \ge a_2 \tag{3b}$$

Player 2’s payoff ($p_2$):

$$p_2 = 1 - 2a_2 + 4a_1 \quad \text{when } a_2 < a_1 \tag{4a}$$

$$p_2 = 1 - a_2 + 3a_1 \quad \text{when } a_2 \ge a_1 \tag{4b}$$

where $a_1$ and $a_2$ are the actions of player 1 and player 2, respectively, in a move of the FIPD.
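The two-plane payoff function can be checked against the classical matrix with a short sketch. The single function `payoff(own, other)` is an assumption made here for brevity: it exploits the symmetry between Eqs. (3a)-(3b) and (4a)-(4b), so the same code serves both players.

```python
def payoff(own: float, other: float) -> float:
    """Payoff of the player choosing `own` against `other`.

    Eqs. (3a)-(3b) for player 1; Eqs. (4a)-(4b) follow by swapping the roles.
    """
    if own < other:
        return 1.0 - 2.0 * own + 4.0 * other  # plane used when own < other
    return 1.0 - own + 3.0 * other            # plane used when own >= other


# The four corners of the unit square should reproduce T, R, P and S.
corners = {
    "T": payoff(0.0, 1.0),  # full defection against full cooperation
    "R": payoff(1.0, 1.0),  # mutual full cooperation
    "P": payoff(0.0, 0.0),  # mutual full defection
    "S": payoff(1.0, 0.0),  # full cooperation against full defection
}
print(corners)
```

The two planes meet along the line where both actions are equal, so the function is continuous across the switch between the two cases.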

3. Fuzzy decision rules

In addition to the common criterion of modeling the players’ strategies, acting with memory of the sequence of the opponent’s last moves, two other variables that seem to be related to the player’s choices have been included in the present model. The variables are represented by $f_1$, $f_2$ and $f_3$, henceforth called decision factors, and defined as:

$f_1$: Relation between the accumulated gains of the player and its opponent;

$f_2$: Sequence of the last iterations between the parties, limited to three moves;

$f_3$: Relation between the overall trends regarding the accumulation of payoffs, expressed by the player’s average gain per iteration divided by the mean payoff of the whole group.

3.1. Relation between the accumulated gains of a player and those of its opponent: f1

One of the criteria that might affect a player’s decision is the relation between its own wealth and that possessed by the opponent. The idea that underlies the adoption of this factor comes from the Hawk–Dove game (Maynard Smith, 1982). An element called resource holding power (RHP), a measure of size, strength, weapons, or other kinds of power symbols, would enable its owner to win a dispute. The RHP plays an important role in the formulation of strategies. For instance, if a player knows that its opponent has a much higher wealth, it will be aware that the other has a greater flexibility to execute strategies that, in the long run, might force it into acting in accordance with the opponent’s desire, to avoid even worse penalties. On the other hand, a favorable relation can also be used by the player as a hint that it is performing fairly well with its strategy if compared with a current competitor. Also, a player with a high $f_1$ can implement strategies with reduced concern about temporary losses. Eq. (5) considers the relation between a player whose wealth is $w_1$ and an opponent with $w_2$:

$$f_1 = \frac{w_1}{w_1 + w_2}, \quad f_1 \in [0, 1], \ \forall\, w_1, w_2 \tag{5}$$

To qualitatively characterize $f_1$, three fuzzy sets have been designed: much lower (LW), similar (SL) and much greater (GT), which are the same for all players (Fig. 3). In the fuzzy set SL, $\mu_{SL}(f_1 = 0.5) = 1$, and the leftmost and rightmost points ($\mu_{SL}(f_1') = \mu_{SL}(f_1'') = 0$) were arbitrarily chosen as $f_1' = 0.44$ and $f_1'' = 0.57$, trying to mirror an asymmetry usually found in human reasoning. This means that a player would limit its concept of similarity to an interval where its wealth is between 20% lower and 30% greater than that of the opponent (i.e. points $w_1 = 0.8w_2$ and $w_1 = 1.3w_2$). For a comparable reason, the fuzzy set GT starts at $f_1 = 0.55$, which results from $w_1 = 1.2w_2$, approximately. The degree of membership of $f_1$ in each of those fuzzy sets is given by:

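$$\mu_{LW}(f_1) = \begin{cases} -2f_1 + 1 & \text{for } 0 \le f_1 \le 0.50 \\ 0 & \text{otherwise} \end{cases} \tag{6a}$$

$$\mu_{SL}(f_1) = \begin{cases} (50/3)f_1 - 22/3 & \text{for } 0.44 \le f_1 \le 0.50 \\ -(100/7)f_1 + 57/7 & \text{for } 0.50 < f_1 \le 0.57 \\ 0 & \text{otherwise} \end{cases} \tag{6b}$$

$$\mu_{GT}(f_1) = \begin{cases} (20/9)f_1 - 11/9 & \text{for } 0.55 \le f_1 \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{6c}$$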
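A minimal sketch of Eq. (5) and Eqs. (6a)-(6c) follows. The function names are illustrative, and the sketch assumes that the two wealths are not both zero. The sample wealths 36 and 96 are the values the paper uses later in its worked example (Section 5).

```python
def f1(w_own: float, w_opp: float) -> float:
    """Wealth relation of Eq. (5); assumes w_own + w_opp > 0."""
    return w_own / (w_own + w_opp)


def mu_lw(f: float) -> float:
    """Membership in 'much lower' (Eq. 6a)."""
    return -2.0 * f + 1.0 if 0.0 <= f <= 0.50 else 0.0


def mu_sl(f: float) -> float:
    """Membership in 'similar' (Eq. 6b), peaking at f = 0.5."""
    if 0.44 <= f <= 0.50:
        return (50.0 / 3.0) * f - 22.0 / 3.0
    if 0.50 < f <= 0.57:
        return -(100.0 / 7.0) * f + 57.0 / 7.0
    return 0.0


def mu_gt(f: float) -> float:
    """Membership in 'much greater' (Eq. 6c)."""
    return (20.0 / 9.0) * f - 11.0 / 9.0 if 0.55 <= f <= 1.0 else 0.0


f = f1(36.0, 96.0)
print(f, mu_lw(f), mu_sl(f), mu_gt(f))
```

With these sample wealths only the "much lower" set has a non-zero grade, so a single rule would fire for this factor.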

The factor $f_1$ is used as an antecedent in a set of fuzzy production rules whose consequents are either gradual cooperation or defection. The same mechanism will be employed for the other two decision factors. The conclusion yielded separately by each expert system relative to $f_1$, $f_2$ and $f_3$ is partial, and the final decision is made conjointly, as explained in Section 4. In a game, a strategy is a contingency plan that is adopted by a player, pointing out which action should be taken

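Table 3. Strategies used by the 512 different players of the FIPD

| $f_1$ | S1 | S2 | S3 | S4 | S5 | S6 | **S7** | S8 |
|---|---|---|---|---|---|---|---|---|
| Much lower | C | C | C | C | D | D | **D** | D |
| Similar | C | C | D | D | D | D | **C** | C |
| Much greater | C | D | C | D | D | C | **D** | C |

| $f_2$ | S1 | S2 | S3 | S4 | S5 | **S6** | S7 | S8 |
|---|---|---|---|---|---|---|---|---|
| Poor | C | C | C | C | D | **D** | D | D |
| Fair | C | C | D | D | D | **D** | C | C |
| High | C | D | C | D | D | **C** | D | C |

| $f_3$ | S1 | S2 | S3 | **S4** | S5 | S6 | S7 | S8 |
|---|---|---|---|---|---|---|---|---|
| Much lower | C | C | C | **C** | D | D | D | D |
| Similar | C | C | D | **D** | D | D | C | C |
| Much greater | C | D | C | **D** | D | C | D | C |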

under specific circumstances. For example, the strategy $S_2^{f_1}$ means ‘if $f_1$ is much lower or similar, then cooperate; if $f_1$ is much greater, then defect’. Regarding just $f_1$, the eight possible strategies are listed in Table 2.
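The rule sets can be encoded compactly, as sketched below. The three consequent strings are an assumption of this sketch: they are the row patterns as read from the strategy tables (one character per strategy S1 to S8), and the keys `low`, `mid` and `high` are hypothetical labels standing for the three fuzzy sets of any factor.

```python
from itertools import product

# Consequents of strategies S1..S8 for each of the three fuzzy sets of a factor.
# Row patterns as read from the strategy tables (an assumption of this sketch).
RULES = {
    "low": "CCCCDDDD",   # much lower / poor
    "mid": "CCDDDDCC",   # similar / fair
    "high": "CDCDDCDC",  # much greater / high
}


def strategy(i: int) -> dict[str, str]:
    """Return the three consequents of strategy S_i, with i in 1..8."""
    return {level: row[i - 1] for level, row in RULES.items()}


# A player is a triple of strategy indices, one per decision factor (f1, f2, f3).
players = list(product(range(1, 9), repeat=3))

print(strategy(2))   # cooperate when low or mid, defect when high
print(len(players))  # size of the fuzzy population
```

Under this encoding the player labelled 555 defects on every rule, which matches the paper's description of it as the fuzzy "all D".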

3.2. Last iterations between the parties: f2

The influence of the recent history of a contender’s moves is taken into account by the factor $f_2$. It consists of a relation between a weighted average of the payoffs obtained in the last three iterations and the maximum achievable payoff, i.e. 5. When it is the case that two specific players have not met three times, the default values specified by Eqs. (7a), (7b) and (7c) are used.

No previous mutual iterations:

$$f_2 = 0.4 \tag{7a}$$

One previous mutual iteration:

$$f_2 = \frac{p_i^{t-1}}{5} \tag{7b}$$

Two previous mutual iterations:

$$f_2 = 0.4\left(\frac{p_i^{t-2}}{5}\right) + 0.6\left(\frac{p_i^{t-1}}{5}\right) \tag{7c}$$

Fig. 4. Fuzzy sets for the qualitative description of $f_2$.


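Table 4. Winners and payoffs achieved in the first phase of the simulations (groups 1–32)

| Group | Winner | Winner’s avg. payoff | Group’s avg. payoff | Group’s smallest payoff |
|---|---|---|---|---|
| 1 | 455 | 2.54 | 2.17 | 1.93 |
| 2 | Pavlov | 2.54 | 2.16 | 1.94 |
| 3 | 453 | 2.54 | 2.14 | 1.88 |
| 4 | 456 | 2.61 | 2.18 | 1.94 |
| 5 | Pavlov | 2.61 | 2.12 | 1.83 |
| 6 | 566 | 2.60 | 2.20 | 1.73 |
| 7 | Pavlov | 2.43 | 2.12 | 1.87 |
| 8 | 664 | 2.53 | 2.17 | 1.85 |
| 9 | 353 | 2.58 | 2.16 | 1.95 |
| 10 | Pavlov | 2.53 | 2.11 | 1.85 |
| 11 | 355 | 2.55 | 2.19 | 1.90 |
| 12 | 436 | 2.42 | 2.13 | 1.93 |
| 13 | 554 | 2.53 | 2.14 | 1.81 |
| 14 | Pavlov | 2.69 | 2.16 | 1.77 |
| 15 | Pavlov | 2.49 | 2.10 | 1.75 |
| 16 | 555 | 2.71 | 2.16 | 1.91 |
| 17 | Pavlov | 2.48 | 2.11 | 1.75 |
| 18 | 546 | 2.66 | 2.18 | 1.90 |
| 19 | 454 | 2.59 | 2.18 | 1.95 |
| 20 | 515 | 2.33 | 2.14 | 1.96 |
| 21 | Pavlov | 2.51 | 2.10 | 1.65 |
| 22 | 745 | 2.61 | 2.16 | 1.69 |
| 23 | Pavlov | 2.51 | 2.14 | 1.90 |
| 24 | 476 | 2.58 | 2.16 | 1.54 |
| 25 | 356 | 2.56 | 2.15 | 1.89 |
| 26 | Pavlov | 2.50 | 2.10 | 1.84 |
| 27 | 354 | 2.54 | 2.17 | 1.94 |
| 28 | 446 | 2.69 | 2.17 | 1.98 |
| 29 | Pavlov | 2.63 | 2.08 | 1.79 |
| 30 | 556 | 2.60 | 2.15 | 1.71 |
| 31 | 553 | 2.54 | 2.15 | 1.84 |
| 32 | Pavlov | 2.83 | 2.14 | 1.92 |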

Three previous mutual iterations:

$$f_2 = 0.1\left(\frac{p_i^{t-3}}{5}\right) + 0.3\left(\frac{p_i^{t-2}}{5}\right) + 0.6\left(\frac{p_i^{t-1}}{5}\right) \tag{7d}$$

The terms $p_i^{t-v}/5$ ($v = 1, 2, 3$) stand for the relations between the gain that a player $i$ obtained and the maximum admissible in previous iterations.
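The four cases of Eqs. (7a)-(7d) can be folded into one function, as in the following sketch. The name `f2`, the `WEIGHTS` table and the convention that the history list is ordered oldest first are choices made here for illustration.

```python
MAX_PAYOFF = 5.0

# Weights per number of remembered iterations, oldest first (Eqs. 7b-7d).
WEIGHTS = {1: (1.0,), 2: (0.4, 0.6), 3: (0.1, 0.3, 0.6)}


def f2(history: list[float]) -> float:
    """Recent-history factor from a player's payoffs against one opponent.

    `history` holds the payoffs obtained against this opponent, oldest first;
    only the last three entries are used.
    """
    recent = history[-3:]
    if not recent:
        return 0.4  # default when the two players have never met (Eq. 7a)
    weights = WEIGHTS[len(recent)]
    return sum(w * p / MAX_PAYOFF for w, p in zip(weights, recent))


print(f2([]))          # no previous mutual iterations
print(f2([3.2, 2.6]))  # two previous mutual iterations (Eq. 7c)
```

The payoffs 3.2 and 2.6 are the ones used in the paper's worked example, where Eq. (7c) is reported to give 0.568.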

The factor $f_2$ belongs to [0, 1] and accounts for the relative gain achieved in an iteration. It is described by the attributes poor, fair and high, which correspond to the respective fuzzy sets shown in Fig. 4 and membership grades given by Eqs. (8a), (8b) and (8c). The points $f_2 = 0.2$ and 0.6 were determined with a fixed payoff $p_i = 1$ and $p_i = 3$, respectively, which conform to the pairs of dichotomous moves DD and CC. As to $f_2 = 0.4$, it was calculated with $p_i = 2$, which originated from $\mu_C(a = 0.5) = \mu_D(a = 0.5) = 0$ for both players (see footnote 2).

2 From this point on, with any combination of actions, a player will receive a greater payoff than its opponent. For this reason, $f_2 = 0.4$ was chosen as the limit between the poor and the high fuzzy sets. With regard to the fuzzy set fair, its limiting points have been determined under two assumptions: first, an $f_2$ below 0.2 means that the player is being exploited by the other party and, thus, cannot see the outcome as fair. Second, in an opposite sense, if it obtains more than three points ($f_2 > 0.6$), this reveals an advantageous situation over the opponent, rather than just a fair outcome.


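Table 5. Winners and payoffs achieved in the second phase of the simulations

| Group | Winner | Winner’s avg. payoff | Group’s avg. payoff | Group’s smallest payoff |
|---|---|---|---|---|
| 1 | TFT | 2.24 | 1.76 | 1.42 |
| 2 | 556 | 2.21 | 1.92 | 1.31 |
| 3 | TFT | 2.10 | 1.83 | 1.19 |
| 4 | 554 | 2.22 | 1.86 | 1.29 |
| 5 | TFT | 2.18 | 1.80 | 1.47 |
| 6 | TFT | 2.16 | 1.86 | 1.35 |
| 7 | TFT | 1.94 | 1.70 | 1.14 |
| 8 | TFT | 2.10 | 1.75 | 1.25 |
| 9 | TFT | 2.15 | 1.82 | 1.26 |
| 10 | TFT | 2.01 | 1.72 | 1.34 |
| 11 | 445 | 2.39 | 2.05 | 1.41 |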

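$$\mu_{\mathrm{poor}}(f_2) = \begin{cases} -2.5f_2 + 1 & \text{for } 0 \le f_2 \le 0.40 \\ 0 & \text{otherwise} \end{cases} \tag{8a}$$

$$\mu_{\mathrm{fair}}(f_2) = \begin{cases} 5f_2 - 1 & \text{for } 0.20 \le f_2 \le 0.40 \\ -5f_2 + 3 & \text{for } 0.40 < f_2 \le 0.60 \\ 0 & \text{otherwise} \end{cases} \tag{8b}$$

$$\mu_{\mathrm{high}}(f_2) = \begin{cases} (5/3)f_2 - 2/3 & \text{for } 0.40 \le f_2 \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{8c}$$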

The fuzzy expert systems which employ $f_2$ follow the same pattern adopted for $f_1$, only substituting poor, fair and high for LW, SL and GT in Table 3.
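The membership grades of $f_2$ can be sketched in the same way as those of $f_1$. The line coefficients come from Eqs. (8a)-(8c); the supports of the poor and high sets, [0, 0.4] and [0.4, 1], are an assumption made here, inferred from the points where those lines reach zero and from the statement that 0.4 is the limit between the two sets.

```python
def mu_poor(f: float) -> float:
    """Membership in 'poor' (Eq. 8a); support [0, 0.4] is assumed."""
    return -2.5 * f + 1.0 if 0.0 <= f <= 0.40 else 0.0


def mu_fair(f: float) -> float:
    """Membership in 'fair' (Eq. 8b), peaking at f = 0.4."""
    if 0.20 <= f <= 0.40:
        return 5.0 * f - 1.0
    if 0.40 < f <= 0.60:
        return -5.0 * f + 3.0
    return 0.0


def mu_high(f: float) -> float:
    """Membership in 'high' (Eq. 8c); support [0.4, 1] is assumed."""
    return (5.0 / 3.0) * f - 2.0 / 3.0 if 0.40 <= f <= 1.0 else 0.0


value = 0.568  # the f2 value used in the paper's worked example
print(mu_poor(value), mu_fair(value), mu_high(value))
```

For this value two sets overlap, so two rules of the $f_2$ expert system fire at once, one for fair and one for high.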

3.3. Relation between the overall trends of wealth: f3

The third factor $f_3$ is a linear relation between a player’s ($\bar{w}_k^r$) and the population’s ($\bar{W}_n$) average payoffs per iteration. This relation is intended to portray the current trend in the accumulation of gains.

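$$\bar{w}_k^r = \frac{w_k^r}{k} \tag{9a}$$

$$\bar{W}_n = \frac{\sum_i w_k^i}{2n}, \quad i \in P \tag{9b}$$

$$f_3 = \frac{\bar{w}_k^r}{\bar{w}_k^r + \bar{W}_n} \tag{9c}$$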

where: $w_k^r$ is the wealth of the player $r$ at the $k$th iteration; $\bar{w}_k^r$ is the average wealth per iteration that the player has received until iteration $k$; $k$ is the number of iterations already performed by the player $r$; $\bar{W}_n$ is the average wealth per iteration that the whole population has received after $n$ iterations; $n$ is the total number of iterations until the present instant; and $i \in P$, where $P$ = {players who have already played}.
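Eqs. (9a)-(9c) amount to three short computations, sketched below with illustrative function names. The division by 2n in `population_average` is read here as reflecting that each iteration pays two players; that interpretation is an assumption of the sketch.

```python
def average_wealth(wealth: float, k: int) -> float:
    """Average wealth per iteration of one player after k iterations (Eq. 9a)."""
    return wealth / k


def population_average(wealths: list[float], n: int) -> float:
    """Population's average wealth per iteration after n iterations (Eq. 9b).

    `wealths` lists the accumulated wealth of every player who has played.
    """
    return sum(wealths) / (2 * n)


def f3(avg_player: float, avg_population: float) -> float:
    """Trend factor comparing a player with the population (Eq. 9c)."""
    return avg_player / (avg_player + avg_population)


# Values from the paper's worked example: wealth 36 after 24 iterations,
# population average 2.5.
print(f3(average_wealth(36.0, 24), 2.5))
```

A player whose average equals the population's gets exactly 0.5, so values below 0.5 indicate a below-average trend.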

The fuzzy sets designed to represent the qualitative concepts associated with $f_3$ are similar to those shown in Fig. 3, and eight fuzzy production rules can be generated in the same manner as depicted in Table 3.

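$$\mu_{LW}(f_3) = \begin{cases} -2f_3 + 1 & \text{for } 0 \le f_3 \le 0.50 \\ 0 & \text{otherwise} \end{cases} \tag{10a}$$

$$\mu_{SL}(f_3) = \begin{cases} (50/3)f_3 - 22/3 & \text{for } 0.44 \le f_3 \le 0.50 \\ -(100/7)f_3 + 57/7 & \text{for } 0.50 < f_3 \le 0.57 \\ 0 & \text{otherwise} \end{cases} \tag{10b}$$

$$\mu_{GT}(f_3) = \begin{cases} (20/9)f_3 - 11/9 & \text{for } 0.55 \le f_3 \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{10c}$$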


4. Determination of a player’s action in a move

The evaluation of $f_1$, $f_2$ and $f_3$ yields three separate partial conclusions about the intensity of cooperation/defection to be adopted. To accomplish the aggregation of the three partial results, a procedure which takes the maximum of the partial conclusions derived from each factor is employed. The general steps to be followed by a player $r = 1$ to $m$, which are valid for every factor $f_{q,r}^{j}$, are the following (where $q$ corresponds to the type of the factor, $q = 1$, 2 or 3, and $j$ to the iteration, $j = 1$ to $n$):

(1) Determination of $f_{q,r}^{j}$.

(2) Fuzzification procedure of $f_{q,r}^{j}$.

(3) Activation of the fuzzy rule(s) matched by the fact (according to the player’s strategy $S_{q,r}$, where $q = 1, 2, 3$ designates the decision factor and $r$ refers to a specific player).

(4) Join operation of the partial conclusions yielding the final conclusions ($\mu_C^r$, $\mu_D^r$). Every player is characterized by a set of nine fuzzy rules (three for each $S_{q,r}$). The inference process fires the rules matched by the fuzzified decision factors. Each factor can originate either one or any pair of the possible partial conclusions C or D, with its associated strength (membership grade $\mu$). The composition of the partial conclusions is made through the fuzzy union, using the join operator $\vee$ (fuzzy maximum), for the same class of conclusion. Hence, the final result to be derived in terms of cooperation and defection will be a sole pair of conclusions, taken as $\max(\mu_C^{f_q})$ and $\max(\mu_D^{f_q})$, $q = 1, 2, 3$.

(5) Defuzzification of the final conclusions in order to determine the final player’s action $a_r^j$. The final conclusions will consist of two membership grades, for either one of the fuzzy sets C and D. However, in order to find the payoffs in an iteration, the players’ actions cannot be fuzzy. A defuzzification procedure using the singleton method (see footnote 3) translates the qualitative conclusions into a crisp value for each player.
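Steps (4) and (5) can be sketched as two small functions. The moment-balance expression in `defuzzify` follows the formula the paper applies in its worked example (Section 5); the function names, the tuple format of the partial conclusions, and the neutral fallback of 0.5 when no rule fires are assumptions of this sketch.

```python
def aggregate(partials: list[tuple[str, float]]) -> tuple[float, float]:
    """Fuzzy union (maximum) of partial conclusions, taken per class.

    `partials` holds (label, grade) pairs with label "C" or "D".
    Returns the pair (mu_C, mu_D).
    """
    mu_c = max((g for label, g in partials if label == "C"), default=0.0)
    mu_d = max((g for label, g in partials if label == "D"), default=0.0)
    return mu_c, mu_d


def defuzzify(mu_c: float, mu_d: float) -> float:
    """Crisp action obtained by balancing the moments of the two singletons."""
    if mu_c + mu_d == 0.0:
        return 0.5  # assumed neutral action when nothing fired
    a_c = (1.0 + mu_c) / 2.0  # action whose grade in C is mu_c (inverse of Eq. 2a)
    a_d = (1.0 - mu_d) / 2.0  # action whose grade in D is mu_d (inverse of Eq. 2b)
    return (a_c * mu_c + a_d * mu_d) / (mu_c + mu_d)


# Partial conclusions taken from the paper's worked example for player P764.
partials = [("D", 0.454), ("D", 0.160), ("C", 0.280), ("C", 0.250)]
mu_c, mu_d = aggregate(partials)
action = defuzzify(mu_c, mu_d)
print(mu_c, mu_d, round(action, 3))
```

The result is a weighted average of the two singleton positions, so a stronger defection grade pulls the action toward 0 and a stronger cooperation grade pulls it toward 1.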

5. Example of an iteration of the FIPD

The diagram depicted in Table 3 shows how a player is identified by the set of strategies it uses. Each player has nine rules defining what it will do according to the value of each decision factor. Let us assume that one of the players randomly chosen is P764. This means that it uses the strategy $S_7^{f_1}$ with respect to $f_1$, $S_6^{f_2}$ for $f_2$, and $S_4^{f_3}$ for $f_3$ (see columns in boldface type, and footnote 4).

Assume that the other player, also randomly selected, was P177. The iteration process is as follows: first, each player determines its decision factors. Every $f$ is qualitatively described by its corresponding fuzzy sets, and the rules that have been activated are added to the knowledge base. After the three factors have been considered, the fuzzy inference takes place using the fuzzy rules previously fired, yielding a player’s final action. With both final actions as inputs, the payoffs are calculated using Eqs. (3) and (4). The entire decision process for player P764 is described below (P177 implements an analogous process).

The decision process for player P764: the data assumed for the player P764 are $w_{24}^{764} = 36$ (wealth reached after 24 iterations); $w_{30}^{177} = 96$ (wealth reached by the current opponent after 30 iterations); $\bar{w}_{24}^{764} = 1.5$ (average wealth per iteration after 24 iterations, Eq. (9a)); and $\bar{W}_n = 2.5$ (current population’s average gain per iteration, Eq. (9b)).

Previous iterations with the player P177:

| Players’ actions | Iteration 1 | Iteration 2 |
|---|---|---|
| P764 | 0.5 | 0.2 |
| P177 | 0.8 | 0.5 |

(a) Determination of the partial conclusions based on each factor $f_1$, $f_2$ and $f_3$

3 A singleton is a membership grade value represented by a single vertical line that intercepts the horizontal axis in only one point (Viot, 1993). Here, there will be two singletons, one for each fuzzy conclusion, C and D. The final action is determined by the point where the resultant of the two singletons crosses the horizontal axis. This point is calculated by balancing the moments of the two singletons and of the resultant.

4 A player is a strategist that adopts one of eight possible strategies for each of the three decision factors. The total number of players is thus 8×8×8=512.


(1) Factor $f_1$

Calculation: Eq. (5): $f_1 = 36/(36 + 96) = 0.273$

Qualification: Eq. (6a): $\mu_{LW}(0.273) = 0.454$

Rule fired: Table 3 with strategy $S_7^{f_1}$: If $f_1$ is much lower then defect: defect (0.454)

(2) Factor $f_2$

Calculation: Eq. (7c): $f_2 = 0.4 \times 3.2/5 + 0.6 \times 2.6/5 = 0.568$

Qualification: Eq. (8b): $\mu_{\mathrm{fair}}(0.568) = 0.160$; Eq. (8c): $\mu_{\mathrm{high}}(0.568) = 0.280$

Rule fired: Table 3 with strategy $S_6^{f_2}$: If $f_2$ is fair then defect: defect (0.160). If $f_2$ is high then cooperate: cooperate (0.280)

(3) Factor $f_3$

Calculation: Eq. (9c): $f_3 = 1.5/(1.5 + 2.5) = 0.375$

Qualification: Eq. (10a): $\mu_{LW}(0.375) = 0.250$

Rule fired: Table 3 with strategy $S_4^{f_3}$: If $f_3$ is much lower then cooperate: cooperate (0.250)

(b) Composition of the partial conclusions of each decision (cooperation and defection):

$$\mu_{\mathrm{cooperate}} = \max\{\text{cooperate } (0.280),\ \text{cooperate } (0.250)\} = \text{cooperate } (0.280)$$

$$\mu_{\mathrm{defection}} = \max\{\text{defect } (0.454),\ \text{defect } (0.160)\} = \text{defect } (0.454)$$

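(c) Determination of the final action of the player P764:

$$\mu_C(a) = 0.280 \Rightarrow a = 0.640 \quad \text{(Eq. (2a))}$$

$$\mu_D(a) = 0.454 \Rightarrow a = 0.273 \quad \text{(Eq. (2b))}$$

$$a_{764}^{n+1} = \frac{(1 - \mu_D(a))\,\mu_D(a) + (1 + \mu_C(a))\,\mu_C(a)}{2(\mu_C + \mu_D)}$$

$$a_{764}^{n+1} = 0.413$$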

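Determination of the payoffs:

(a) Player P764 ($a_{764}^{n+1} < a_{177}^{n+1}$), using Eq. (3a):

$$p_{764} = 1 - 2a_{764}^{n+1} + 4a_{177}^{n+1} = 1 - 2 \times 0.473 + 4 \times 0.532 = 2.182$$

(b) Player P177 ($a_{177}^{n+1} > a_{764}^{n+1}$), using Eq. (4b):

$$p_{177} = 1 - a_{177}^{n+1} + 3a_{764}^{n+1} = 1 - 0.532 + 3 \times 0.473 = 1.887$$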

(c) Update of the population’s and players’ parameters:

(1) Accumulated wealth (P764): $w_{25}^{764} = 36 + 2.182 = 38.182$

(2) Accumulated wealth (P177): $w_{31}^{177} = 96 + 1.887 = 97.887$

(3) Average payoff per iteration (P764): $\bar{w}_{25}^{764} = 38.182/25 = 1.527$ (Eq. (9a))

(4) Average payoff per iteration (P177): $\bar{w}_{31}^{177} = 97.887/31 = 3.157$ (Eq. (9a))

(5) Population’s average payoff per iteration: $\bar{W}_n = 2.5$ (Eq. (9b))

6. Simulations

The tournament consisted of four rounds, or phases. For the first phase, the 512 fuzzy players were divided into groups of 16 participants each, randomly chosen from the population. Additionally, every group also included the three non-fuzzy strategists, that is, the traditional TFT, the generalized TFT (TFT-g; see footnote 5) and Pavlov, which

5 The generalized TFT strategist starts cooperating and, in the next iterations, repeats the punctual action of the opponent. Another possibility could be to consider the opponent’s action as an α-cut in the fuzzy action (C or D) and use a defuzzification procedure to generate the final action of the generalized TFT.


Table 6. Winners and payoffs achieved in the third phase of the simulations

| Group | Winner | Winner’s avg. payoff | Group’s avg. payoff | Group’s smallest payoff |
|---|---|---|---|---|
| 1 | TFT | 2.09 | 1.68 | 1.35 |
| 2 | TFT | 2.23 | 1.66 | 1.17 |
| 3 | TFT | 2.18 | 1.63 | 1.28 |
| 4 | TFT | 2.08 | 1.62 | 1.17 |
| 5 | TFT | 1.76 | 1.56 | 1.05 |

were always present in the subsequent phases, independently of their previous performance. In this manner, the first round had 32 groups, each with 19 contenders. After running 30 000 iterations per group, the competitors were ranked by their average payoff. The eight best fuzzy players from each group were then selected, and 11 new groups were formed, mixing the participants by chance. The next phase involved five groups, and the 16 final best fuzzy strategists, along with TFT, TFT-g and Pavlov, played the last dispute. No player performed fewer than 3000 iterations per round, which means that, on average, every player encountered the same adversary on about 166 occasions.

Tables 4–7 summarize the results obtained in the four phases of the simulations, always referring to groups of 19 players each (16 fuzzy and three non-fuzzy).
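One way to picture a single group run is the sketch below: pairs are drawn at random, both players act, payoffs accumulate, and the group is ranked by average payoff. This is not the authors' simulator. The `Strategy` signature, the per-pair memory of the opponent's last action, and the constant defector `all_d` are hypothetical simplifications; only `tft_g` follows the paper's description of the generalized TFT.

```python
import random
from typing import Callable, Optional

# A strategy maps the opponent's last action against it (None on first meeting)
# to its own action in [0, 1]. Hypothetical interface for this sketch.
Strategy = Callable[[Optional[float]], float]


def payoff(own: float, other: float) -> float:
    """Two-plane payoff of Eqs. (3a)-(3b), valid for either player by symmetry."""
    if own < other:
        return 1.0 - 2.0 * own + 4.0 * other
    return 1.0 - own + 3.0 * other


def run_group(players: dict[str, Strategy], iterations: int, seed: int = 0) -> list[tuple[str, float]]:
    """Play randomly paired iterations and rank players by average payoff."""
    rng = random.Random(seed)
    names = list(players)
    wealth = {name: 0.0 for name in names}
    games = {name: 0 for name in names}
    last: dict[tuple[str, str], float] = {}  # (observer, opponent) -> opponent's last action
    for _ in range(iterations):
        a, b = rng.sample(names, 2)
        act_a = players[a](last.get((a, b)))
        act_b = players[b](last.get((b, a)))
        wealth[a] += payoff(act_a, act_b)
        wealth[b] += payoff(act_b, act_a)
        games[a] += 1
        games[b] += 1
        last[(a, b)] = act_b
        last[(b, a)] = act_a
    ranking = [(name, wealth[name] / games[name]) for name in names if games[name]]
    return sorted(ranking, key=lambda item: item[1], reverse=True)


def tft_g(last_opponent_action: Optional[float]) -> float:
    """Generalized TFT: cooperate first, then repeat the opponent's last action."""
    return 1.0 if last_opponent_action is None else last_opponent_action


def all_d(_: Optional[float]) -> float:
    """Hypothetical constant defector, used only to exercise the loop."""
    return 0.0


group = {"TFT-g": tft_g, "TFT-g-2": tft_g, "all-D": all_d}
print(run_group(group, iterations=3000))
```

In the paper each group has 19 contenders and runs 30 000 iterations; the three-player group above is only meant to show the bookkeeping.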

7. Results and discussion

Except in the final phase, all groups played two simulation rounds; as expected, the ranking order for each group showed variations, because the payoffs accumulated during the iterations rely on the random selection of pairs. The tables refer to the round in which the greatest average payoff was obtained.

An extra round involving all the 512 fuzzy and the three non-fuzzy players in a sole group was also performed. Even though the group played one million iterations, the results obtained were not conclusive because the participants played only about 7.5 times with each particular opponent, on average. This small number of iterations between the same pair is insufficient for a player to reach a stabilized pattern of behavior with its opponents. The winner of this round was the fuzzy player 555 (the fuzzy ‘all D’). TFT ranked 237th; TFT-g and Pavlov finished in 317th and 374th places, respectively. An interesting feature observed was that the first nine contestants had ‘5’ (which means defection in every circumstance) as the middle digit. This position corresponds to the factor $f_2$ and refers to the decision rule regarding the payoffs previously obtained. In the limited experiments performed, the ‘all D’ characteristic had an edge over the other players, possibly because it exploits their ‘nicer’ decision rules.

7.1. The first phase

In the 64 (2×32) runs, the digit 5 appeared at least once in 36 of the 64 winners. In 28 cases, it was the middle digit. Also, in 15 groups, the same fuzzy player won both runs, and 13 of them contained the digit ‘5’ at least once in some position. Excluding the fuzzy players, Pavlov performed quite well in this phase, winning 14 times. TFT arrived first three times.

7.2. The second phase

From the 32 groups, 176 fuzzy players were picked, forming 11 newly arranged sets. The criterion was to select the individuals which appeared simultaneously among the eight best-ranked fuzzy players in both rounds, along with some other players that, although not having this feature, also performed well. This time, TFT won 17 of the 22 rounds. Surprisingly, the Pavlov strategist scored poorly. Its best positions were a second


Table 7. Winner and payoffs achieved in the fourth (final) phase of the simulations

| Winner (player’s no.) | Winner’s avg. payoff | Group’s avg. payoff | Group’s smallest payoff |
|---|---|---|---|
| TFT | 1.51 | 1.36 | 0.78 |

and fourth place, ranking in the lower half the majority of the time. Pavlov adopts the cognitive rule that interprets an opponent’s action as only totally cooperative or defective, even though the fuzzy players seldom, if ever, come out with the extreme actions 0 or 1. This caused Pavlov to play many Cs against less cooperative decisions from its adversaries, and it consequently earned lower payoffs.

Why, then, was Pavlov so successful in the first phase? The answer may be that the ‘defection-biased’ strategists were scattered among all the groups and in the end became present in a larger concentration, since they emerged as winners from the previous level of simulations. On the other hand, TFT has a tradition of being robust in handling defectors, radically playing zeros whenever it met a defector. As the players with a defective mood outnumbered those more cooperative, TFT thrived.

7.3. The third and final phases

As can be seen from the respective tables, the third and final phases turned out to be only a confirmation of the trend that had already been initiated by the increasing elimination of the ‘nicer’ rules. Nevertheless, a fiercer dispute must have taken place in the last phases. This can be confirmed by the average payoffs achieved by the players, which steadily dropped as the tournament evolved. This observation is valid for both winners and losers. Again, TFT ended first, though closely followed by the 555 fuzzy strategist.

8. Summary

The purpose of this research was essentially investigative. A model of the IPD was presented, where the players’ decisions were guided by fuzzy expert systems. Also, the players took into consideration other variables besides those related to the history of previous iterations. In this framework, there was also a departure from the dichotomous and mutually exclusive actions cooperate and defect, allowing the moves to vary continuously in the interval [0, 1]. A computer tournament was performed and its results were briefly discussed. The authors suppose that this approach can evoke a great deal of investigation and refinement. For instance, alternative models of behavior, differently shaped fuzzy sets, membership learning, other defuzzification methods, and incorporation of ecological and evolutionary mechanisms are some topics that could be considered in further research.

References

Axelrod, R., 1984, The Evolution of Cooperation (Basic Books, New York).

Axelrod, R., 1987, The evolution of strategies in the iterated prisoner’s dilemma, in: Genetic Algorithms and Simulated Annealing, L. Davis (ed.) (Pitman, London) pp. 32–41.

Beardsley, T., Oct. 1993, Never give a sucker an even break. Sci. Am., Vol. 269, No. 4, 12.

Borges, P.S.S., 1996, Ph.D. Dissertation (Federal University of Santa Catarina, Florianopolis, Brazil/University of South Florida, Tampa, FL, USA).

Hardin, R., 1982, Collective Action (Johns Hopkins University Press, Baltimore, MD).

Harrald, P.G. and Fogel, D.B., 1996, Evolving continuous behaviors in the iterated prisoner’s dilemma. BioSystems 37 (1–2), 135–145.

Kosko, B. and Isaka, S., 1993, Fuzzy logic. Sci. Am., 269, 76–81.

Maynard Smith, J., 1982, Evolution and the Theory of Games (Cambridge University Press, Cambridge, UK).

Munakata, T. and Jani, Y., 1994, Fuzzy systems: an overview. Commun. ACM 37 (3), 69–76.

Nowak, M. and Sigmund, K., 1992, Tit for tat in heterogeneous populations. Nature 355, 250–253.

Poundstone, W., 1992, Prisoner’s Dilemma (Doubleday, New York).

Sigmund, K., 1993, Games of Life: Explorations in Ecology, Evolution and Behavior (Oxford University Press, New York).

Viot, G., 1993, Fuzzy logic: concepts to constructs. AI Expert 8 (11), 26–33.

Zadeh, L., 1965, Fuzzy sets. Inf. Control 8, 338–353.
