
ICSI 2011: International Conference on Swarm Intelligence, Cergy, France, June 14-15, 2011

A Swarm Intelligence Method Combined with Evolutionary Game Theory Applied to a Resource Allocation Problem

Cedric Leboucher (1), Rachid Chelouah (1), Patrick Siarry (2), Stephane Le Menec (3)

(1) L@ris, EISTI, Avenue du Parc, 95011 Cergy-Pontoise, France
[email protected], [email protected]

(2) Universite Paris-Est Creteil (UPEC), LISSI (EA 3956), 61 av. du General de Gaulle, 94010 Creteil, France
[email protected]

(3) MBDA France, 1 rue Reaumur, 92350 Le Plessis-Robinson, France
[email protected]

Abstract

This paper presents a method to solve an allocation problem with a swarm intelligence technique. The application of swarm intelligence has to be discrete. This allocation problem can be modelled as a multi-objective optimization problem in which we want to minimize the total travel time and distance in a logistic context. To treat such a problem, we present a hybrid Discrete Particle Swarm Optimization (DPSO) method combined with Evolutionary Game Theory (EGT), in which we adapt the movement of the particles according to the constraints of our application. One of the difficulties in the implementation of DPSO is the choice of the inertial, individual and social coefficients. That is why we decided to optimize those coefficients with a dynamical approach based on EGT. The strategies are to move using only the inertial, only the individual, or only the social coefficient. The optimal strategy is usually a mix of those three behaviours. Thus, the fitness of the swarm is maximized when we obtain an optimal rate for each strategy. Evolutionary game theory studies the behaviour of large populations of agents who repeatedly engage in strategic interactions. Changes in behaviour in these populations are driven by natural selection via differences in birth and death rates. We focus on replicator dynamics, a fundamental deterministic evolutionary dynamics for games. To test this algorithm, we create a problem whose solution is already known. The aim of this study is to check whether this adapted DPSO method succeeds in providing an optimal solution for general allocation problems and to evaluate the efficiency of convergence towards the solution.

Keywords

Resource Allocation, Discrete Particle Swarm Optimization, hard-constrained environment, Evolutionary Game Theory, replicator dynamics

1 Introduction

In this paper, we propose the use of swarm intelligence to solve an allocation problem. We explore a well-known swarm intelligence method: DPSO. For the last decade, the study of animal behaviour has been the subject of active research. Numerous scientists have been inspired by the modelling of social interactions to solve NP-hard optimization problems. Even though the communication among the different agents is limited to an exchange of basic information, it results in very effective team work that is definitely worth modelling. For example, let us consider a fish school. Each fish shares information with a small number of neighbours: those at its left and right sides and ahead of it. Knowing the movement of its neighbours is the simple information upon which a fish school bases its strategy to avoid predators. As an isolated individual, a fish may, or may not, have the ability to escape a predator, depending on the context. As a member of a group, the fish takes part in a collective intelligence that reduces its vulnerability. The aim of the creators of the PSO method (Kennedy and Eberhart) was to reproduce this social interaction among agents in order to solve combinatorial problems. Even though meta-heuristic methods prove efficient and give satisfactory solutions, PSO makes it possible to solve many problems with more accurate results than traditional methods such as Genetic Algorithms (GA) [1][4][11]. Initially, PSO worked in a continuous solution space. It was later adapted to solve problems in a discrete solution space [1][5][11][12][16][22]. In many industrial contexts, allocation problems arouse great interest in the operational research community. How should available resources be shared among a set of agents? Although the resource allocation problem can be solved with a variety of techniques, it is still a challenging problem that requires further study, especially when it takes unusual forms. Here we consider a resource allocation problem in a hard-constrained environment.

Before looking into the details, let us consider the possible application fields of resource allocation. As mentioned before, allocation problems are widely explored by the research community, and there are many real applications that use this modelling: air traffic management [7], enterprise contracting [8], water usage [9], economic applications [13], scheduling problems [17], job shop problems [20], tugboat allocation [21], and so on. The variety of this list shows that resource allocation problems have many application fields. In addition, the specific problem that we propose to solve matches a real industrial requirement: a supply chain context. Since the problem has been studied extensively by the scientific community and in a large field of applications, in this paper we focus on a side of the problem worth looking into: we use a Discrete Particle Swarm Optimization (DPSO) method combined with Evolutionary Game Theory (EGT) to solve this hard-constrained combinatorial problem. DPSO is derived from the continuous PSO [2] and is nowadays increasingly used to solve combinatorial problems. The basic principles of DPSO are well described in [10]. In the literature, DPSO is encountered in a variety of areas, such as system identification [1], budget allocation [5], scheduling problems [6][11][16][22], and the travelling salesman problem (TSP) [18]. At the moment, one of the main problems of PSO and DPSO is the choice of the coefficients. Those coefficients usually vary depending on the specific problem to treat. It is possible to determine the coefficients in supervised contexts; on the other hand, the task proves much trickier in unsupervised contexts. Not only does it take more time, but the coefficients are not necessarily optimal. Our hybrid approach makes this choice automatically and in an optimised way. Besides, this new approach is fully dynamic and self-adapts to the current situation to improve the global efficiency of the whole swarm.

To our knowledge, only a few studies deal with resource allocation problems in hard-constrained environments. This adaptation of the classic DPSO makes it possible to solve many resource allocation problems while adapting only the evaluation functions and the constraints.

This paper is organized as follows: in Section 2, we introduce the PSO adapted to a discrete context. In Section 3, we describe the problem that we propose to solve. We give a mathematical modelling of that problem in Section 4. The proposed method is described in Section 5, and Section 6 presents an application and the results that we obtain. Finally, we conclude and outline a few perspectives.

2 Background of DPSO and principles of evolutionary game theory

2.1 DPSO principles

As in the PSO method, we have to define the particles, their velocities and the moving rules. In this description we use the same notations as [11]. It is important to mention that, in most studies, the particles are built as binary matrices for easier manipulation. As we face high-dimensional problems, our approach differs in the modelling, but the principle is the same.

Let $X_i^t$ denote the $i$th particle at the $t$th iteration. Let $X_i^t = [x_{i1}^t, x_{i2}^t, \dots, x_{iD}^t]$, with $x_{id}^t \in \{0, 1\}$, be a particle in a population of $P$ particles, composed of $D$ bits. We denote the velocity by $V_i^t = [v_{i1}^t, v_{i2}^t, \dots, v_{iD}^t]$, with $v_{id}^t \in \mathbb{R}$. Then, as in the PSO method described in [2], the next step is to define the best position $P_i^t = [P_{i1}^t, P_{i2}^t, \dots, P_{iD}^t]$ found by the $i$th particle, and the best position $P_g^t = [P_{g1}^t, P_{g2}^t, \dots, P_{gD}^t]$ of the entire population at iteration $t$.

As in continuous PSO, each particle adjusts its velocity according to social and selfish criteria. We define the velocity of particle $i$ in direction $d$ by

$$v_{id}^{t+1} = c_1 r_1 v_{id}^t + c_2 r_2 (p_{id}^t - x_{id}^t) + c_3 r_3 (p_{gd}^t - x_{id}^t),$$

where $c_1$ weights the inertia term, $c_2$ is the cognition (individual) learning factor and $c_3$ is the social learning factor. $r_1$, $r_2$ and $r_3$ are random numbers uniformly distributed in $[0, 1]$. The coefficients $c_1$, $c_2$ and $c_3$ are fixed and represent the weights of the three components. In order to determine the probability that a bit changes, the next step is the conversion of the velocity into a probability through a sigmoid function:

$$s(v_{id}^t) = \frac{1}{1 + e^{-v_{id}^t}}, \qquad \forall v_{id}^t \in \mathbb{R},\ s(v_{id}^t) \in [0, 1].$$

Here, $s(v_{id}^t)$ represents the probability that the bit $x_{id}^t$ becomes 1 at the next iteration. The process is well described in [11] by a pseudo-code; this pseudo-code is the one developed by Kennedy, Eberhart and Shi for DPSO.
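To make the binary update concrete, here is a minimal sketch (Python with NumPy, not taken from the paper) of one velocity-and-sigmoid update step of the binary DPSO described above; the swarm size, dimension and coefficient values are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dpso_step(X, V, P_best, G_best, c1=0.4, c2=0.3, c3=0.3):
    """One binary DPSO update: velocity, sigmoid, then stochastic bit flip.

    X      : (n_particles, D) current binary positions
    V      : (n_particles, D) real-valued velocities
    P_best : (n_particles, D) best position found by each particle
    G_best : (D,)             best position found by the whole swarm
    """
    n, D = X.shape
    r1, r2, r3 = rng.random((3, n, D))           # uniform draws in [0, 1]
    V = c1 * r1 * V + c2 * r2 * (P_best - X) + c3 * r3 * (G_best - X)
    prob_one = 1.0 / (1.0 + np.exp(-V))          # s(v): probability a bit becomes 1
    X = (rng.random((n, D)) < prob_one).astype(int)
    return X, V

# Tiny usage example with 4 particles of 6 bits each (arbitrary sizes).
X = rng.integers(0, 2, (4, 6))
V = np.zeros((4, 6))
X, V = dpso_step(X, V, P_best=X.copy(), G_best=X[0].copy())
```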

2.2 Evolutionary game theory

2.2.1 Evolutionary stable strategies

An Evolutionary Stable Strategy (ESS) is a strategy such that, if all members of a population adopt it, no mutant strategy can invade the population under the influence of natural selection [3, 15]. Assume we have a mixed population consisting mostly of $p^*$ individuals (agents playing the optimal strategy $p^*$) with a few individuals using strategy $p$. That is, the strategy distribution in the population is

$$(1 - \varepsilon) p^* + \varepsilon p,$$

where $\varepsilon > 0$ is the small frequency of $p$ users in the population. Let the fitness, i.e. the payoff, of an individual using strategy $q$ in this mixed population be

$$\pi\bigl(q, (1 - \varepsilon) p^* + \varepsilon p\bigr).$$


Then, an interpretation of Maynard Smith's requirement [14] for $p^*$ to be an ESS is that, for all $p \neq p^*$,

$$\pi\bigl(p, (1 - \varepsilon) p^* + \varepsilon p\bigr) > \pi\bigl(p^*, (1 - \varepsilon) p^* + \varepsilon p\bigr)$$

for all $\varepsilon > 0$ sufficiently small, for agents minimizing their fitness.
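As an illustration (not part of the paper), the sketch below numerically checks this ESS inequality for a candidate mixed strategy of a matrix game with minimizing agents; the cost matrix, the candidate strategy and the function names are assumptions made for the example.

```python
import numpy as np

def payoff(p, q, A):
    """pi(p, q) = p . A q^T, the cost of strategy p against population state q."""
    return p @ A @ q

def looks_like_ess(p_star, A, eps=1e-3, n_trials=1000, rng=np.random.default_rng(1)):
    """Check Maynard Smith's condition for minimizers on random mutants p != p*:
    pi(p, (1-eps)p* + eps p) > pi(p*, (1-eps)p* + eps p) for small eps."""
    for _ in range(n_trials):
        p = rng.dirichlet(np.ones(len(p_star)))      # random mixed mutant strategy
        if np.allclose(p, p_star):
            continue
        mix = (1 - eps) * p_star + eps * p
        if payoff(p, mix, A) <= payoff(p_star, mix, A):
            return False
    return True

# Arbitrary 2x2 cost matrix (agents minimize); (0.5, 0.5) is a candidate to test.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(looks_like_ess(np.array([0.5, 0.5]), A))
```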

2.2.2 Replicator dynamics

A common way to describe strategy interactions is through matrix games, using the following notation: $e_i$ is the $i$th unit row vector, $i = 1, \dots, m$; $A_{ij} = \pi(e_i, e_j)$ is the $m \times m$ payoff matrix; and

$$\Delta^m \equiv \{p = (p_1, \dots, p_m) \mid p_1 + \dots + p_m = 1,\ 0 \le p_i \le 1\}$$

is the set of mixed strategies (probability distributions over the pure strategies $e_i$). Then $\pi(p, q) = p \cdot A q^T$ is the payoff of agents playing strategy $p$ facing agents playing strategy $q$. Another interpretation is that $\pi(p, q)$ is the fitness of a large population of agents playing pure strategies ($p$ describing the proportion of agents with each behaviour inside the population) with respect to a large $q$ population.

The replicator equation (RE) is an ordinary differential equation expressing the difference between the fitness of a strategy and the average fitness in the population. Lower payoffs (agents are minimizers) bring faster reproduction, in accordance with the Darwinian natural selection process:

$$\dot{p}_i = -p_i \left( e_i \cdot A p^T - p \cdot A p^T \right), \qquad i = 1, \dots, m.$$

The RE describes the evolution of the strategy frequencies $p_i$. Moreover, for every initial strategy distribution $p(0) \in \Delta^m$, there is a unique solution $p(t) \in \Delta^m$ for all $t \ge 0$ that satisfies the replicator equation. The replicator equation is the most widely used evolutionary dynamics; it was introduced for matrix games by Taylor and Jonker [19].
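For illustration, the following sketch (Python, not part of the original paper) integrates the minimizing-agents replicator equation above with simple Euler steps; the cost matrix, step size and initial distribution are arbitrary assumptions.

```python
import numpy as np

def replicator_step(p, A, dt=0.01):
    """Euler step of the replicator equation for minimizing agents:
    dp_i/dt = -p_i * (e_i . A p^T - p . A p^T)."""
    strategy_cost = A @ p            # e_i . A p^T for every pure strategy i
    average_cost = p @ strategy_cost # p . A p^T
    return p + dt * (-p * (strategy_cost - average_cost))

def replicator_dynamics(p0, A, steps=5000, dt=0.01):
    """Iterate the replicator equation from p0 and return the final distribution."""
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        p = replicator_step(p, A, dt)
        p = np.clip(p, 0.0, None)
        p /= p.sum()                 # keep p on the simplex despite numerical drift
    return p

# Arbitrary 3x3 cost matrix: the distribution should drift toward low-cost strategies.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.5, 2.5],
              [3.0, 2.5, 2.0]])
print(replicator_dynamics([1/3, 1/3, 1/3], A))
```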

3 Description of the problem

In this section we extend the method proposed by Kennedy and Eberhart in [10] to solve a resource allocation problem with hard constraints and real-time components. For example, we consider a supply chain context in which five delivery resources are available for three customers. The aim is to determine the best resource allocation with respect to several objectives.

Figure 1: Example of a scenario including 3 customers and 5 available resources in 3 different storage centres


The available resources: we have delivery resources at our disposal, defined by several properties. First, a resource is associated with its own stock centre. Second, each resource can reach the customers at its own average speed.

The customers: the customers have requirements about the quantity of resources. We also suppose that we know their geographical positions. Besides, customers are not considered equal, and we have ranked them by importance of treatment. We define the priority vector P to express this importance. For example, if the priority vector is P = [2, 3, 1], it means that customer 1 is ranked second, customer 2 is ranked third, and customer 3 is ranked first. In this vector, the kth element corresponds to the kth customer and the value P(k) is the priority rank of the kth customer.

The objective functions: we consider a multi-objective problem in which we want to minimize the global distance travelled by our delivery resources and minimize the delivery time. The objective function is built as follows: we add the time and the distance from the stock centre to the customer, multiply each of them by a coefficient, then divide that cost by the priority rank of the customer.

Figure 1 shows an example of the problem that we want to solve with this DPSO method. There are several methods to solve a multi-objective problem; the three main ones are aggregation, non-aggregation and the Pareto approach. In our case, we choose an aggregation approach because it models the problem particularly well.

The constraints: a single resource cannot be assigned to several customers, and not all the resources need to be used to solve this resource allocation problem.

Stop criterion: we consider that the algorithm has converged as soon as all the particles have reached an "agreement" and this agreement cannot be changed by the appearance of a new particle.

4 Mathematical modelling

In this section, we describe the mathematical modelling used to solve the problem. First, we have a combinatorial problem in which we want to assign resources to tasks; this first point of analysis leads us to model our problem as a task allocation problem, which defines the starting point of our method. Before looking into the details, let $C$ be the set of customers and $R$ the set of resources. All the elements of $R$ are assigned to a stock centre $S_i$, where $i \in \mathbb{N}$. We consider that we know the geographical position of each customer and of each stock centre in a two-dimensional plane.

A particle is defined by a vector composed of $N$ elements, where $N$ represents the total number of resources. Each element of the vector is the id of the task treated by resource $n$ ($n \in \{1, 2, \dots, N\}$). So the particles take the following form: $X_i^t = (C_k)_{k=1..P}$, where $k$ represents the id of the customer and $C$ is the set of customers. All the feasible solutions are permutations of an initial particle built around the customer requests. If we define the permutation function $\sigma(v)$, where $v$ is a vector, we obtain the set of feasible solutions $\Sigma = \sigma(v)$, where $v$ represents one feasible solution.

Let us consider $D$ and $T$, two $N \times P$ matrices. $D$ gives the distance between the $n$th resource and the $p$th customer, and $T$ gives the travel time of the $n$th resource to the $p$th customer. The distance matrix is built as follows:

$$D = \bigl(d(r_i, p_j)\bigr)_{i=1..N,\ j=1..P},$$

where $d(A, B)$ represents the Euclidean distance from point $A$ to point $B$.
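For illustration, here is a minimal sketch (Python, not from the paper) of how the distance matrix D and a travel-time matrix T could be built from resource and customer coordinates; the helper name and the vectorised form are assumptions, while the sample coordinates and speeds are taken from the example of Section 6.1.

```python
import numpy as np

def build_distance_time_matrices(resource_pos, customer_pos, speeds):
    """Return D (Euclidean distances) and T (travel times) as N x P matrices.

    resource_pos : (N, 2) coordinates of the resources (their stock centres)
    customer_pos : (P, 2) coordinates of the customers
    speeds       : (N,)   average speed of each resource
    """
    diff = resource_pos[:, None, :] - customer_pos[None, :, :]   # (N, P, 2)
    D = np.linalg.norm(diff, axis=2)                              # d(r_i, p_j)
    T = D / np.asarray(speeds)[:, None]                           # time = distance / speed
    return D, T

# Small usage example with a subset of the Section 6.1 data.
resources = np.array([[1.1, 0.5], [1.1, 0.5], [10.0, 2.2]])
customers = np.array([[2.5, 12.0], [4.1, 8.3]])
D, T = build_distance_time_matrices(resources, customers, speeds=[5, 7, 9])
```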

4.1 Objective function

Now, let us describe the objective function: we want to minimize the cost of a solution, evaluated by

$$F(f_1, f_2) = \alpha_1 f_1 + \alpha_2 f_2, \qquad (\alpha_1, \alpha_2) \in \mathbb{R}^2.$$

Thus, we have a multi-objective problem, and the aim is to minimize $F$. If we look closely at the components of this function, the two coefficients $\alpha_1$ and $\alpha_2$ are real numbers chosen by an expert, depending on the industrial requirements. The functions $f_1$ and $f_2$ are defined as follows:

$$f_1 = \sum_{i=1}^{N} \sum_{j=1}^{P} \frac{T(i,j)}{U(j)}\, X_i^t(i,j)$$

and

$$f_2 = \sum_{i=1}^{N} \sum_{j=1}^{P} \frac{D(i,j)}{U(j)}\, X_i^t(i,j),$$

where $U(j)$ denotes the priority rank of the $j$th customer and the binary coefficient $X_i^t(i,j)$ indicates whether the particle assigns the $i$th resource to the $j$th customer. The first function, $f_1$, represents the travel time of the chosen resources; the second one, $f_2$, represents the travelled distance. In both functions we divide the time or distance by the priority rank, which means that we accept travelling a greater distance or taking more time when the customer is considered important.

In this objective function, the proposed solution is unique, which is worth mentioning because that is not the case in the Pareto approach, where a set of solutions is proposed and an expert has to select one of them. In our case, we decided that the expert would intervene before computation, by tuning the parameters of this objective function.
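To make the cost evaluation concrete, here is a small sketch (Python, illustrative only) of one way to evaluate the aggregate objective F for a candidate assignment. The vector encoding (element n gives the customer served by resource n, with 0 meaning unassigned) follows the particle representation of Section 5; the function name and default weights are assumptions.

```python
def aggregate_cost(assignment, D, T, priority, alpha1=0.4, alpha2=0.6):
    """Evaluate F = alpha1*f1 + alpha2*f2 for one candidate assignment.

    assignment : length-N vector, assignment[n] = customer id served by resource n
                 (1-based), or 0 if the resource is unused
    D, T       : N x P distance and travel-time matrices
    priority   : length-P vector, priority[j] = priority rank U(j+1) of customer j+1
    """
    f1 = f2 = 0.0
    for n, c in enumerate(assignment):
        if c == 0:                      # an unassigned resource contributes nothing
            continue
        j = c - 1
        f1 += T[n, j] / priority[j]
        f2 += D[n, j] / priority[j]
    return alpha1 * f1 + alpha2 * f2
```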

5 Proposed method

We choose to use a DPSO method combined with EGT to solve this problem. First, it is important to state clearly the definition of a particle: in our case, as mentioned in the introduction of the DPSO method, we consider that a particle is a feasible solution of the problem. Each particle $X_i^t$ takes the following form: $X_i^t = (C_k)_{k=1..P}$, where $k$ represents each of the customers and $C$ the set of customers. We prefer such a representation because, in the case of high-dimensional problems, we do not have to deal with sparse matrices and the heavy computations that they require. Moreover, we can obtain all the possible solutions by permuting the elements of $X_i$.

A particle is a combinatorial solution respecting the system constraints. The aim is to define a set of particles "moving" on the solution space and interacting about the quality of the found solutions. The proposed algorithm is developed on the same basis as the original work of Kennedy and Eberhart for DPSO. The adaptation to our application lies in the particle form, the choice of the coefficients, and the probability of the movements. In fact, due to the context, we must respect hard constraints in order to define a move on the solution space. The constraints that we need to respect are the following: first, a resource can be either unassigned or assigned to one customer at most; second, we consider that, in the solution space, all the customer requests are covered. To enforce these constraints, we cannot use the common DPSO directly.

5.1 Swarm organisation

Another key point of PSO is the establishment of networks in the swarm: who communicates with whom, and how should the swarm be organised to improve the efficiency of the search and the convergence towards the optimal solution? In our algorithm we propose to use four different swarms, each with its own specificity: one adopts only the inertial behaviour ($c_1 \neq 0$ and $c_2 = c_3 = 0$); one adopts only the selfish behaviour ($c_2 \neq 0$ and $c_1 = c_3 = 0$); one adopts only the social behaviour ($c_3 \neq 0$ and $c_1 = c_2 = 0$). The last one follows the common behaviour of PSO, with the coefficients determined by the result of the EGT process applied to the three previous swarms. Interestingly, the coefficients are chosen dynamically according to the current state of all the swarms.
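As a small illustration (not from the paper), the four swarms can be represented by their coefficient triples; the neutral swarm's coefficients are later replaced by the replicator-dynamics output described in Section 5.3. The names are hypothetical.

```python
# Coefficient triples (c1, c2, c3) for the four swarms: three "pure" swarms used to
# estimate the payoff of each behaviour, and one neutral swarm whose coefficients
# are updated from the EGT result at every iteration.
swarm_coefficients = {
    "inertial": (1.0, 0.0, 0.0),   # only inertia
    "selfish":  (0.0, 1.0, 0.0),   # only the individual (cognitive) component
    "social":   (0.0, 0.0, 1.0),   # only the social component
    "neutral":  (1/3, 1/3, 1/3),   # initial values, later set to (p(e1), p(e2), p(e3))
}
```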

5.2 Particle movement

Using the common process of PSO, we introduce the probability of moving towards another solution. As mentioned earlier in this paper, the specific form of the particles leads us to adapt the computation of this probability to match our requirements.

First, we consider each particle as a vector, as described in Section 5. We can compute the probability of change of each coefficient of the particle using the common computation of the velocity, then applying an adapted sigmoid function. Let us describe precisely how those probabilities are computed. The first step is to determine all the differences between the current position and the objective particles (the individual best performance or the global best). Let $X_i^t$ be particle $i$ at iteration $t$. We have $X_i^t = [c_{i1}\, c_{i2} \dots c_{iP}]$, $i \in \{1, \dots, N\}$ ($N$ being the number of particles), where $c_{ij} \in \{0, 1, \dots, P\}$. If $c_{ij} = 0$, we consider that the resource $j$ is unassigned. In order to compute the best path that a particle will use, we compute the difference between the current position and the desired position. Since we only want to know where the positions differ, if two corresponding elements of the two vectors differ we assign the value 1 to the difference vector, and 0 otherwise. We thus obtain two binary vectors: the first states the differences between the current position and the individual best, the second states the differences between the current position and the global best. According to the principle of DPSO, we can compute the velocity of each particle as $V_i^t = [v_{i1}^t, v_{i2}^t, \dots, v_{iD}^t]$, $v_{id}^t \in \mathbb{R}$, where each coefficient is computed by

$$v_{id}^{t+1} = c_1 r_1 v_{id}^t + c_2 r_2 (p_{id}^t - x_{id}^t) + c_3 r_3 (p_{gd}^t - x_{id}^t).$$

In this formula, $r_1 = r_2 = r_3 = 1$. The coefficients $c_1$, $c_2$ and $c_3$ depend on the swarm: for the neutral swarm (in which none of $c_1$, $c_2$, $c_3$ is equal to 0), they are first initialized to $(1/3, 1/3, 1/3)$, then the EGT process determines their values; for the other types of swarm, refer to Section 5.1. As the velocity coefficients are in $\mathbb{R}^+$, to convert those values into probabilities we apply an adapted sigmoid function:

$$s(v_{id}^t) = 1 - \frac{2}{1 + e^{v_{id}^t}}, \qquad \forall v_{id}^t \in \mathbb{R}^+,\ s(v_{id}^t) \in [0, 1].$$

When we have all the probabilities of change for a particle, the final step is to draw a random number and compare it to each probability coefficient. All the coefficients greater than the random number are selected as potential candidates for a permutation; we then randomly draw two of these candidate coefficients and permute them. (If fewer than two coefficients are selected, we consider that the particle will not move.)
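The following sketch (Python, illustrative, not from the paper) shows one possible reading of this permutation-based move: velocities built from the binary difference vectors, the adapted sigmoid s(v) = 1 - 2/(1 + e^v), and a swap of two randomly chosen selected positions. Helper names and tie-breaking details are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def adapted_sigmoid(v):
    """Map a non-negative velocity to a change probability in [0, 1)."""
    return 1.0 - 2.0 / (1.0 + np.exp(v))

def move_particle(x, v, p_best, g_best, c1, c2, c3):
    """One adapted DPSO move; x, p_best, g_best are vectors of customer ids (0 = unassigned)."""
    diff_best = (x != p_best).astype(float)       # 1 where current differs from individual best
    diff_glob = (x != g_best).astype(float)       # 1 where current differs from global best
    v = c1 * v + c2 * diff_best + c3 * diff_glob  # r1 = r2 = r3 = 1, as in the paper
    probs = adapted_sigmoid(v)
    threshold = rng.random()
    candidates = np.flatnonzero(probs > threshold)
    if candidates.size >= 2:                      # two positions are needed to permute
        i, j = rng.choice(candidates, size=2, replace=False)
        x = x.copy()
        x[i], x[j] = x[j], x[i]
    return x, v

# Usage with a 5-resource, 3-customer particle (sizes follow the Section 6 example).
x = np.array([0, 2, 3, 1, 1])
x, v = move_particle(x, np.zeros(5), p_best=x.copy(), g_best=np.array([1, 2, 3, 0, 1]),
                     c1=1/3, c2=1/3, c3=1/3)
```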

5.3 Principle of EGT in the determination of the social coefficients

Section 2.2.2 describes the EGT process. As mentioned in that section, in order to set up a game we have to define the possible strategies and the payoff of each strategy. In this application, the strategies are the three possible behaviours that the particles can follow: inertial, selfish and social. Let us now describe how we build the payoff matrix $\Pi^t$, where $t$ denotes the $t$th iteration. In order to build this matrix, we use the first three swarms.

For each iteration we compute the movement of the three particular swarms. After this iteration, we can easily compute the mean of the solutions found by each swarm. We consider the mean of each swarm as the result of using the corresponding strategy. Thus we easily obtain the $3 \times 3$ payoff matrix:

$$\Pi^t = \begin{pmatrix}
\pi(e_1) & \dfrac{\pi(e_1)+\pi(e_2)}{2} & \dfrac{\pi(e_1)+\pi(e_3)}{2} \\[4pt]
\dfrac{\pi(e_2)+\pi(e_1)}{2} & \pi(e_2) & \dfrac{\pi(e_2)+\pi(e_3)}{2} \\[4pt]
\dfrac{\pi(e_3)+\pi(e_1)}{2} & \dfrac{\pi(e_3)+\pi(e_2)}{2} & \pi(e_3)
\end{pmatrix}$$

As soon as we obtain this matrix, we can apply the replicator dynamics to find the optimal strategy to adopt. If we consider mixed strategies, the optimal behaviour $p$ is composed of a percentage of each strategy. Thus we obtain the repartition vector $p^t = [p(e_1)\ p(e_2)\ p(e_3)]$, where $\sum_{i=1}^{3} p(e_i) = 1$. This is the result that we consider as the optimal strategy; the fourth swarm optimizes its behaviour when it follows it. We can then run the classical DPSO algorithm for the fourth swarm using the optimal coefficients $c_1 = p(e_1)$, $c_2 = p(e_2)$ and $c_3 = p(e_3)$.
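A compact sketch (Python, illustrative) of this step: building the symmetric payoff matrix from the mean costs of the three pure swarms, running the replicator dynamics on it, and reading the result off as the neutral swarm's coefficients. The function replicator_dynamics is the sketch given in Section 2.2.2; all names here are assumptions.

```python
import numpy as np

def payoff_matrix(pi_e1, pi_e2, pi_e3):
    """Pi[i, j] = (pi(e_i) + pi(e_j)) / 2, with pi(e_i) on the diagonal."""
    pi = np.array([pi_e1, pi_e2, pi_e3])
    return (pi[:, None] + pi[None, :]) / 2.0

def egt_coefficients(mean_cost_inertial, mean_cost_selfish, mean_cost_social):
    """Return (c1, c2, c3) for the neutral swarm from the three swarms' mean costs."""
    Pi = payoff_matrix(mean_cost_inertial, mean_cost_selfish, mean_cost_social)
    p = replicator_dynamics([1/3, 1/3, 1/3], Pi)   # sketch from Section 2.2.2
    return tuple(p)                                 # c1 = p(e1), c2 = p(e2), c3 = p(e3)
```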

5.4 Escaping local optima

Even if the PSO algorithm is quite good at avoiding being trapped in local optima, when we add a behaviour based on EGT and the auto-adaptation of the coefficients, we sometimes observe a premature agreement of the swarm because an equilibrium is reached. To avoid this problem, we use a common PSO process, which consists in killing a particle and regenerating it randomly. Thus, one of the particles escapes the local optimum. We compute again the mean value of the swarm and, if the new result is better than the current one, we keep this particle in the swarm and all particles continue their exploration by recomputing the best strategy under those new conditions. After a fixed number of iterations, if no new particle enables a better result, we consider that the global optimum has been reached.
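As an illustration (not the paper's code), a disturbance step could look like the following: one randomly chosen particle is replaced by a fresh random permutation of its assignment vector, and the change is kept only if the swarm's mean cost improves. The regeneration rule is an assumption consistent with the description above; feasibility is preserved because feasible solutions are permutations of one another (Section 4).

```python
import numpy as np

rng = np.random.default_rng(3)

def disturb_swarm(swarm, cost_fn):
    """Kill one particle, regenerate it randomly, keep the change if the mean cost improves.

    swarm   : list of assignment vectors (feasible solutions)
    cost_fn : function mapping an assignment vector to its cost F
    """
    current_mean = np.mean([cost_fn(x) for x in swarm])
    k = rng.integers(len(swarm))
    candidate = rng.permutation(swarm[k])          # random permutation keeps feasibility
    new_swarm = list(swarm)
    new_swarm[k] = candidate
    new_mean = np.mean([cost_fn(x) for x in new_swarm])
    return (new_swarm, True) if new_mean < current_mean else (swarm, False)
```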

5.5 Algorithm proposal

In this section, we describe the pseudo-code used to solve the problem. The method is very similar to the one elaborated by C.J. Liao, C.T. Tseng and P. Luarn in [11], but, given our problem requirements, some constraints have been added and the particle sequence construction is quite different, as described in Algorithm 1.

6 Application

6.1 Assumptions

Using the same notations as those introduced in the mathematical modelling section, let us describe an example in which the initial conditions are the following: there are P = 3 customers located at C1(2.5, 12), C2(4.1, 8.3) and C3(18.3, 4.9). The stock centres are located at S1(1.1, 0.5), S2(10, 2.2) and S3(8.9, 15) (all coordinates are given in the (x, y) plane). There are N = 5 delivery resources: (r1, r2) ∈ S1, (r3) ∈ S2, and (r4, r5) ∈ S3.

Thus we can compute the distance matrix D:

$$D = \begin{pmatrix}
11.5849 & 8.3570 & 17.7539 \\
11.5849 & 8.3570 & 17.7539 \\
12.3406 & 8.4865 & 8.7281 \\
7.0682 & 8.2420 & 13.7975 \\
7.0682 & 8.2420 & 13.7975
\end{pmatrix}$$

Then, if we consider the average speed of each delivery resource, $w = [5, 7, 9, 5, 9]$, we obtain the following time matrix $T$:

$$T = \begin{pmatrix}
2.3170 & 1.6714 & 3.5508 \\
1.6550 & 1.1939 & 2.5363 \\
1.3712 & 0.9429 & 0.9698 \\
1.4136 & 1.6484 & 2.7595 \\
0.7854 & 0.9158 & 1.5331
\end{pmatrix}$$

Algorithm 1 summarises the whole DPSO-EGT procedure described in Section 5.

Algorithm 1 [solution] = DPSO_EGT()
 1: Problem initialisation
 2: for 1 to NbMax_iteration do
 3:   if Stop_Criterion then
 4:     break
 5:   else
 6:     for the three first swarms do
 7:       // COMPUTATION OF DPSO FOR THE 3 SWARMS
 8:       Swarm1 ← Compute_DPSO(c1, c2, c3)
 9:       π(e1) ← Compute_Cost(Swarm1)
10:       Swarm2 ← Compute_DPSO(c1, c2, c3)
11:       π(e2) ← Compute_Cost(Swarm2)
12:       Swarm3 ← Compute_DPSO(c1, c2, c3)
13:       π(e3) ← Compute_Cost(Swarm3)
14:     end for
15:     // EVALUATION OF THE SWARMS
16:     Π ← Compute_Payoff_Matrix(π(e1), π(e2), π(e3))
17:     // COMPUTATION OF THE OPTIMIZED COEFFICIENTS
18:     p ← Compute_Optimal_Strategy(Π)
19:     Swarm ← Compute_DPSO(p(e1), p(e2), p(e3))
20:     // ALL SWARMS ARE REINITIALIZED ON THE NEW POSITION
21:     Swarm1 ← Swarm
22:     Swarm2 ← Swarm
23:     Swarm3 ← Swarm
24:     // IF AN AGREEMENT IS FOUND
25:     if isAtEquilibrium(Π) then
26:       // THE SWARM IS DISTURBED
27:       TMP_Swarm ← Disturb_Swarm(Swarm)
28:       // IF, AFTER A FIXED NUMBER OF DISTURBANCES, THERE IS NO IMPROVEMENT,
          //   WE CONSIDER THAT THE OPTIMUM HAS BEEN REACHED
29:       for N_Disturbance_Max do
30:         π(e_tmp) ← Compute_Cost(TMP_Swarm)
31:         if π(e_tmp) < Compute_Cost(Swarm) then
32:           Swarm ← TMP_Swarm
33:           break
34:         end if
35:       end for
36:       return
37:     end if
38:   end if
39: end for

After initializing the parameters, we can define the cost function as:

$$F(X_i^t) = \alpha_1 \sum_{i=1}^{5} \sum_{j=1}^{3} \frac{T(i,j)}{U(j)}\, X_i^t(i,j) + \alpha_2 \sum_{i=1}^{5} \sum_{j=1}^{3} \frac{D(i,j)}{U(j)}\, X_i^t(i,j), \qquad \alpha_1 = 0.4,\ \alpha_2 = 0.6.$$

6.2 Results and comments

Figure 2: Evolution of the cost of the solution for all the possible permutations.

We want to make sure that the results we obtain match the expected solution. If we compute the whole solution space and evaluate each solution, we obtain the graph of Figure 2. Thus, we find that the optimal particle is X = [0, 2, 3, 1, 1], with F([0, 2, 3, 1, 1]) = 13.05141. Now, let us use our algorithm to obtain the result of the application.

As we can see on the graph of Figure 3, all the particles effectively converge towards the optimal solution and stabilise in those positions. In order to study the global convergence of the swarm, we can compute the mean of the solutions found by the particles; we obtain the graph of Figure 4. This graph reveals a fast convergence, but also the good decisions of the swarm in its movements. In fact, the stagnation at the beginning of the plot in Figure 4 shows that the swarm would rather keep its position than make a wrong decision. If this process has the drawback of limiting exploration, the regeneration of new particles makes it possible to avoid local traps. Thus, we obtain a winning combination of exploration and convergence. In order to illustrate the positive effect of the EGT process, Figure 5 shows its fast rate of convergence and the evolution of the payoff matrix. Finally, we focus on the evolution of the coefficients c1 (inertial), c2 (individual) and c3 (social) and monitor their relative importance along the algorithm process. Figure 6 shows the proportion of each strategy in the swarm movement.

A general overview of the process can be illustrated by a plot in which all the graphs are superposed on top of each other for comparison (Figure 7). The dotted lines indicate critical points during the evolution of the swarm. They represent a set of transitions between which the swarm has different behaviours. In the first phase, the swarm conducts an exploration during which it converges quickly towards the global best. In the second phase, the motion of the particles is very limited, but the agreement point remains the same. As mentioned before, we use a particle regeneration process to escape local optima. As a result of this process, the swarm escapes the first agreement point after twenty-five iterations. Then, it converges towards the optimal allocation plan.

Now, let us check the stability and the robustness of the method by reproducing the process a great number of times and observing the convergence towards the optimal solution. Figure 8 describes the simulation launched two hundred times. This graph shows a convergence in each and every case. The speed of convergence is very satisfactory, since it takes fewer than 35 iterations in all but nine simulations.

Now, let us look further into the speed of convergence towards the optimal plan. To obtain significant results, the simulation has been launched four thousand times. In each case, the number of iterations required for convergence has been measured so as to conduct a statistical analysis. In this experiment, the number of iterations was limited to a maximum of seventy-five, a choice that eventually proved reasonable.


Figure 3: Evolution of the cost of the particles. Each curve represents a particle. After 30 iterations, the swarm is stabilised.

Figure 4: Evolution of the mean for each iteration and evolution of the average move


Figure 5: Evolution of the coefficients in the payoff matrix

Figure 6: Proportion of each strategy in the movement of the swarm


Figure 7: Links among all the axes of study

Figure 8: The simulation has been launched two hundred times and appears very robust and reliable. The optimal allocation plan is always reached.


If we analyse those results, we obtain a mean of twenty-three iterations before convergence. If we compute the median value, we obtain fifteen iterations. Finally, the standard deviation of the sample is one hundred and eighty.

Figure 9: Number of iterations required for the swarm to find an agreement. The simulation has been launched ten thousand times.

To check that the algorithm matches the industrial real-time requirements, let us use the results of the simulation to determine the number of iterations required to converge in P% of the applications. For example, how many iterations are necessary so that the probability that the swarm finds a fit solution is greater than 95%? This kind of statistical data is valuable because it makes it possible to limit the computation time. By computing cumulative sums, we can derive the probability of convergence as a function of the number of iterations, and we obtain the plot of Figure 10. We can conclude, for instance, that forty iterations produce a convergence towards the optimal allocation plan with a probability of 98.64%. Thanks to those results, the number of iterations can be closely monitored and the computation time reduced to a minimum. Of course, the number of iterations can be decreased even more, but the risk of the swarm being trapped in local optima could be damaging. Taking that risk in exchange for a gain of time is a matter of requirements.

Figure 10: Curve of the probability of reaching the optimal allocation plan within n iterations.
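A small sketch (Python, illustrative) of the cumulative-sum computation described above: given the number of iterations each run needed to converge, it estimates the probability of having converged within n iterations. The variable names and the sample data are assumptions.

```python
import numpy as np

def convergence_probability(iterations_to_converge, max_iter=75):
    """Empirical probability of having reached the agreement within n iterations.

    iterations_to_converge : array of per-run iteration counts at convergence
    Returns an array prob where prob[n-1] = P(converged within n iterations).
    """
    runs = np.asarray(iterations_to_converge)
    counts = np.bincount(runs, minlength=max_iter + 1)[1:max_iter + 1]
    return np.cumsum(counts) / runs.size

# Example with made-up iteration counts for a handful of runs.
prob = convergence_probability([12, 15, 15, 22, 40, 18, 9, 33])
print(prob[39])   # estimated probability of convergence within 40 iterations
```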


7 Conclusion and perspectives

In this study we addressed the resource allocation problem through a swarm intelligence method. The DPSO adapted to a hard-constrained environment shows very promising results. The numerous successful trials show that the method is reliable and provides valuable solutions to resource allocation problems. The difference between the new algorithm and those developed before lies not only in the movement of the particles through the solution space, but also in the dynamical choice of the coefficients based on evolutionary game theory. The choice of the coefficients was arguably the main weakness of PSO algorithms; thanks to our study, the performance of the method is significantly enhanced. Furthermore, we take into account a hard-constrained environment, which is closer to a real industrial application. To check the efficiency of the method, we tested the algorithm in a supervised environment. The analysis of the results shows that this method is very efficient for solving resource allocation problems.

In the future, it will be interesting to test this algorithm on more sophisticated problems. For example, we could add a time component: if we envision the possibility that the customers have time requirements, we will combine this adapted DPSO with a scheduling problem. Since we are in a real-time context, it could also be more interesting to use another EGT method than replicator dynamics; for example, we could explore the Nash equilibrium or population games in order to decrease the computation time of the algorithm. It will also be interesting to add a Pareto approach, so that an expert can choose among a set of solutions.

References

[1] M.A. Badamchizadeh and K. Madani. Applying modified discrete particle swarm optimization algorithm and genetic algorithm for system identification. In Computer and Automation Engineering (ICCAE), 2010, volume 5, pages 354–358, Singapore (Republic of Singapore), February 2010.

[2] M. Clerc and P. Siarry. Une nouvelle metaheuristique pour l'optimisation difficile : la methode des essaims particulaires. J3eA, 3-7:1–13, 2004.

[3] R. Cressman. Evolutionary Dynamics and Extensive Form Games. MIT Press, 2003.

[4] R. Eberhart and Y. Shi. Comparison between genetic algorithms and particle swarm optimization. In Proceedings of the 7th International Conference on Evolutionary Programming, pages 611–616, San Diego (USA), March 1998.

[5] T.-F. Ho, S.J. Jian, and Y.-L.W. Lin. Discrete particle swarm optimization for materials budget allocation in academic libraries. In Computational Science and Engineering (CSE), 2010, pages 196–203, Hong Kong (China), December 2010.

[6] W.B. Hu, J.X. Song, and W.J. Li. A new PSO scheduling simulation algorithm based on an intelligent compensation particle position rounding off. In ICNC '08: Proceedings of the 2008 Fourth International Conference on Natural Computation, volume 1, pages 145–149, Jinan (China), October 2008.

[7] U. Junker. Air traffic flow management with heuristic repair. The Knowledge Engineering Review, 0:1–24, 2004.

[8] T. Kaihara and S. Fujii. A study on multi-agent based resource allocation mechanism for automated enterprise contracting. In Emerging Technologies and Factory Automation, 2006. ETFA '06, pages 567–573, Prague (Czech Republic), September 2006.

[9] D. Karimanzira and M. Jacobi. A feasible and adaptive water-usage prediction and allocation based on a machine learning method. In Proceedings of the Tenth International Conference on Computer Modeling and Simulation, pages 324–329, Cambridge (UK), April 2008.

[10] J. Kennedy and R.C. Eberhart. A discrete binary version of the particle swarm algorithm. In The 1997 IEEE International Conference on Systems, Man, and Cybernetics, volume 5, pages 4104–4108, Orlando (USA), October 1997.

[11] C.J. Liao, C.T. Tseng, and P. Luarn. A discrete version of particle swarm optimization for flowshop scheduling problems. Computers & Operations Research, 34:3099–3111, 2007.

[12] P. Liu, L. Wang, X. Ding, and X. Gao. Scheduling of dispatching ready mixed concrete trucks through discrete particle swarm optimization. In The 2010 IEEE International Conference on Systems, Man, and Cybernetics, pages 4086–4090, Istanbul (Turkey), October 2010.

[13] B. Lu and J. Ma. A strategy for resource allocation and pricing in grid environment based on economic model. In Computational Intelligence and Natural Computing, 2009. CINC '09, volume 1, pages 221–224, Wuhan (China), June 2009.

[14] J. Maynard-Smith. Evolution and the Theory of Games. Cambridge University Press, 1982.

[15] W. Sandholm. Population Games and Evolutionary Dynamics. MIT Press, 2010.

[16] D.Y. Sha and H.H. Lin. A multi-objective PSO for job-shop scheduling problems. Expert Systems with Applications, 37:1065–1070, 2010.

[17] F. Shaheen and M.B. Bader-El-Den. Co-evolutionary hyper-heuristic method for auction based scheduling. In Congress on Evolutionary Computation (CEC), 2010, pages 1–8, Barcelona (Spain), July 2010.

[18] X.H. Shi, Y.C. Liang, H.P. Lee, C. Lu, and Q.X. Wang. Particle swarm optimization-based algorithms for TSP and generalized TSP. Information Processing Letters, 103:169–176, 2007.

[19] P. Taylor and L. Jonker. Evolutionary stable strategies and game dynamics. Mathematical Biosciences, 40:145–156, 1978.

[20] V.V. Toporkov. Optimization of resource allocation in hard-real-time environment. Journal of Computer Systems Sciences, 43-3:383–393, 2004.

[21] S. Wang and B. Meng. Chaos particle swarm optimization for resource allocation problem. In Automation and Logistics, 2007, pages 464–467, Jinan (China), August 2007.

[22] Y. Yare and G.K. Venayagamoorthy. Optimal scheduling of generator maintenance using modified discrete particle swarm optimization. In Bulk Power System Dynamics and Control - VII: Revitalizing Operational Reliability, 2007 iREP Symposium, pages 1–7, 2007.
