Swarm-based metaheuristics in automatic programming: a survey



Overview

Juan L. Olmo,1 José R. Romero1 and Sebastián Ventura1,2∗

On the one hand, swarm intelligence (SI) is an emerging field of artificial intelligence that takes inspiration from the collective and social behavior of different groups of simple agents. On the other hand, the automatic evolution of programs is an active research area that has attracted a lot of interest and has been mostly promoted by the genetic programming paradigm. The main objective is to find computer programs from a high-level problem statement of what needs to be done, without needing to know the structure of the solution beforehand. This paper looks at the intersection between SI and automatic programming, providing a survey on the state of the art of the automatic programming algorithms that use an SI metaheuristic as the search technique. The expression swarm programming (SP) has been coined to cover swarm-based automatic programming proposals, since they have been published to date in a disorganized manner. Open issues for future research are listed. Although it is a very recent area, we hope that this work will stimulate the interest of the research community in the development of new SP metaheuristics, algorithms, and applications. © 2014 John Wiley & Sons, Ltd.

How to cite this article: WIREs Data Mining Knowl Discov 2014, 4:445–469. doi: 10.1002/widm.1138

INTRODUCTION

Bio-inspired algorithms1 are a kind of algorithm based on biological systems that mimic the properties of these systems in nature. They are attractive from a computational point of view due to their broad application areas and their simplicity and random components, inherited from natural systems. Most bio-inspired algorithms are easy to implement and their complexity is relatively low; though simple, they can search multimodal landscapes with sufficient diversity and the ability to escape local optima.2 Bio-inspired computing is an active and promising research field in algorithm design, and includes paradigms such as artificial neural networks (ANNs),3 evolutionary algorithms (EAs),4 artificial immune systems (AIS),5 and swarm intelligence (SI),6 among others.

∗Correspondence to: [email protected]
1Department of Computer Science and Numerical Analysis, University of Córdoba, Córdoba, Spain
2Department of Computer Science, King Abdulaziz University, Jeddah, Saudi Arabia
Conflict of interest: The authors have declared no conflicts of interest for this article.

In particular, SI focuses on the development of multi-agent systems inspired by the collective behavior of simple agents. The general objectives of the swarm are pursued by means of individuals' independent actions, which can interact locally both with one another and with the environment. A global, intelligent, and coordinated behavior emerges from these independent actions.7 Representative types of SI are particle swarm optimization (PSO),8 which deals with the movement of flocks of birds or schools of fish; ant colony optimization (ACO),9 which takes inspiration from the behavior and self-organization capabilities of ant colonies; and bee swarm intelligence (BSI),10 which models some of the features of honey bees. SI algorithms are expanding and becoming increasingly popular in many disciplines and applications, mainly because of their flexibility and efficiency in solving a wide range of highly complex problems.11

On the other hand, automatic programming is an active research field with applications in many domains that has become very popular, mainly due to the widespread use of the genetic programming (GP) paradigm.12 Automatic programming is a method that uses a search technique to automatically construct a computer program that solves a given problem, without requiring the user

Volume 4, November/December 2014 © 2014 John Wiley & Sons, Ltd. 445

Overview wires.wiley.com/widm


FIGURE 1 | Swarm-based automatic programming publications per year.

to know the structure of the solution in advance. In fact, the problem is solved by specifying the goals to be reached and the basic blocks that comprise any program or individual.

While research on GP continues, other automatic programming techniques are arising from the SI paradigm, showing competitive or even better performance than GP for certain problems. In this work, we bring these techniques together under the name of swarm programming (SP). Examples of SP techniques are ant programming (AP), which has been applied successfully to symbolic regression,13 optimal control,14 and data mining (DM) problems,15 or bee swarm programming (BSP), with application areas such as symbolic regression,16 boolean functions,17 and planning.17 In addition, the SI field is in continuous development, and new optimization algorithms18 are being devised from the biological behaviors of bats, mosquitoes, cuckoos, and other organisms, which suggests that their application to the automatic programming field is imminent. With the advent of more techniques, encoding schemes, and applications of SP, we believe it is worthwhile to present this survey to the research community, defining a framework where future SP algorithms could be brought together.

Figure 1 shows the frequency of SP publicationsbetween 2000 and 2013 as reviewed in this work.

As can be observed, the number of publications until 2006 presents some fluctuations. From 2007 to 2011, it remains steady, except for 2009, when fewer papers were published. Nevertheless, an increasing trend in the number of SP publications, spiking to new levels since 2012, has been observed. The percentages of SP publications grouped by metaheuristic are also shown in Figure 2. As shown, the SP technique that has attracted most attention is AP, which accounts for nearly three quarters (69%) of the publications. Particle swarm programming (PSP) is the next SP technique in percentage of publications, with 21%, while other techniques such as BSP, herd programming (HP), artificial fish swarm programming (AFSP), and firefly programming (FP) are still emerging, sharing the rest of the literature references.

To the best of our knowledge, this is the first survey paper reviewing the swarm-based algorithms in automatic programming and their applications, and it pursues a threefold objective. First, it aims to give an overview of the SP metaheuristics that have been presented so far. Second, it classifies all known SP algorithms, unifying the use of key terminologies. And third, it intends to serve as a starting point for other researchers in this field, providing some open issues and ideas for future work.

This paper is organized as follows. The section Automatic Programming as a Search Problem briefly


WIREs Data Mining and Knowledge Discovery Swarm-based metaheuristics in automatic programming

FIGURE 2 | Percentage of swarm-based automatic programming publications by metaheuristic: AP 69%, PSP 21%, BSP 3%, HP 3%, FP 2%, AFSP 2%.

presents classical bio-inspired automatic programming techniques, giving special emphasis to GP. In the subsequent sections, the different automatic programming metaheuristics placed among SI are further examined. Specifically, the section The AP Metaheuristic describes AP, the section The PSP Metaheuristic gives an overview of the PSP metaheuristic, and the section The BSP Metaheuristic focuses on BSP. Other recent and emerging SP approaches such as AFSP, HP, and FP are analyzed in the section Other Swarm Programming Metaheuristics. Open issues and future directions of research are given in the section Open Issues. Finally, the last section presents some concluding remarks.

SI METAHEURISTICS

Stochastic search algorithms are those that consider random movements through the space of solutions, not in a blind way, but guided by an intelligent mechanism. This kind of algorithm is also known as a metaheuristic. Almost all metaheuristic algorithms are biologically inspired, so that the actions that guide the search for solutions mimic biological and physical systems in nature, such as natural selection, the immune system, and collective intelligence.11 These algorithms make use of stochastic components, and they have several parameters that need to be fitted to the problem at hand.

In particular, the aggregation of animals such as fish, birds, mammals, or insects behaving in a collective way provides a rich set of metaphors for designing SI algorithms. SI is defined as the collective behavior of decentralized and self-organized swarms because multiple individual agents somehow work together without supervision or any central control. In addition, all agents present a stochastic behavior owing to their perception of the neighborhood.19 These agents, as a population, can exchange information by chemical messenger (pheromone in ants), by dance (the waggle dance in bees), or by broadcasting ability (such as the global best in flocks of birds, fireflies, and herds). Therefore, all swarm-based algorithms are also population-based algorithms.11

On the other hand, not all swarms can be considered intelligent, and their level of intelligence can vary from one to another. Millonas20 determined five principles that any swarm should satisfy to be considered intelligent:

• Proximity. The swarm should be capable of performing spatial and temporal calculations.

• Quality. The population should be able to respond to certain quality factors, evaluating the importance of foodstuffs, the security of the swarm, and its location.

• Diverse response. Resources should be distributed across many nodes, instead of being allocated along narrow channels, so that they remain safer against fluctuations of the environment.

• Stability and adaptability. These principles are two sides of the same coin. The former states that the swarm should not behave differently each time the environment changes, since it may not be capable of compensating for the energetic cost entailed. Nevertheless, the latter indicates that, when changing the behavior is worth the effort (the energetic cost), the population should change it.

The key point in any SI algorithm is the employment of a high number of individual agents. The main objective of the group never depends on any single individual; indeed, individuals are dispensable, which provides robustness to the process. This is important because it permits the system to perform the search without failure under a wide range of conditions: if one interaction fails or one of the individuals misses its task, this failure is quickly compensated by the other individuals.21 In general, each individual just has a short view of its neighbors and a limited memory, which leads it to adopt decisions based only on local information by using simple rules. Actually, the decision process of a given individual can be described with just a few rules. The aforementioned features make SI systems



scalable with respect to the number of individuals, robust with respect to individual misbehavior or loss of group members, and adaptive to environments subject to changing circumstances.

AUTOMATIC PROGRAMMING AS A SEARCH PROBLEM

Automatic programming can be addressed as a search problem if it is possible to represent the problem domain as a program space comprising all possible computer programs for solving the problem.12 However, using exhaustive search techniques to find an optimal solution over such a program space is not always possible, on account of the difficulty of some problems and the huge size of the space of solutions. At this point, bio-inspired algorithms appear to be a good option to employ as the search technique in automatic programming, as they tolerate a certain degree of imprecision and uncertainty, modeling natural phenomena in an approximate way. The process of problem solving then becomes an exploration of the program space in search of a highly fit individual. To this end, individuals in any bio-inspired automatic programming technique are made of primitives and encode computer programs, and operators should be adapted to the encoding scheme employed.

Koza22 suggested that there are five preparatory steps to be completed in any automatic programming approach: selection of the terminal set, selection of the function set, specification of the fitness function, configuration of parameters to control the execution, and definition of the termination criteria. Notice that, even though GP was the main exponent of automatic programming by the time this assumption was made, this scenario can nowadays be applied in a general sense to most automatic programming techniques. To ensure that a given bio-inspired automatic programming technique can be applied to a specific problem, it is also necessary to guarantee that the set of terminals and functions selected for the problem fulfills the closure and sufficiency properties. Closure establishes that each function should be capable of handling any value that can be received as input, including outputs of other functions or operators. If this property is not satisfied, invalid individuals can be generated, having a negative effect on the efficiency of the evolutionary process. Sufficiency implies that the expressive power of the set of terminals and functions should be enough to represent a solution to the problem.
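As an illustration, closure is usually enforced by replacing unsafe primitives with protected versions. The Python sketch below shows a hypothetical primitive set for symbolic regression; returning 1.0 on division by zero follows a common GP convention, though the exact safe value varies between implementations.

```python
def protected_div(a, b):
    """Division that satisfies closure: any numeric pair is a legal input,
    so no individual containing this primitive can raise an error."""
    return a / b if b != 0 else 1.0

# A hypothetical primitive set for symbolic regression: each function is
# paired with its arity; terminals are the variable x and a constant.
FUNCTIONS = [(lambda a, b: a + b, 2),
             (lambda a, b: a - b, 2),
             (lambda a, b: a * b, 2),
             (protected_div, 2)]
TERMINALS = ["x", 1.0]
```

Sufficiency, by contrast, cannot be checked mechanically: it depends on whether these primitives can express a solution to the target problem at all.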

GP was the first bio-inspired metaheuristic for evolving computer programs,12 and many other bio-inspired automatic programming techniques have been proposed since then. In GP, individuals are encoded with sophisticated structures, usually expression trees. The typical lifecycle of GP is to initialize a random population of individuals and compute their fitness using an appropriate objective function. Then, new solutions are generated based on the combination or modification of existing individuals, mainly by means of the crossover and mutation operators, although several other operators have been proposed in the literature. This new population of individuals is evaluated, and then again passed to the genetic operators. The process is repeated until reaching a stopping criterion, such as reaching the specified number of generations or generating an optimal individual. As the construction of solutions in GP is based on the combination or modification of existing individuals, a certain loss of information can occur,23 regarding the search space distribution, the difficulty of adapting to a continuously changing environment, and the genotype–phenotype mapping, where a small modification in the genotype can have a great impact on the phenotype. One of the major drawbacks of GP is known as bloat,24 which means that the tree structures become increasingly deep and unbalanced. This phenomenon entails more memory consumption, reducing the efficiency of genetic operators drastically, and it also expands the search space of the problem, increasing the difficulty of finding an optimal solution.
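The lifecycle described above can be condensed into a toy Python sketch. This is not Koza's formulation: the function set, target, and parameters are illustrative, and crossover is replaced by subtree mutation for brevity; it evolves expression trees toward the target x² + x.

```python
import random

FUNCS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
TERMS = ["x", 1.0]

def grow(depth):
    """Random expression tree: a function node with two subtrees, or a terminal."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMS)
    op = random.choice(list(FUNCS))
    return [op, grow(depth - 1), grow(depth - 1)]

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    return FUNCS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def fitness(tree):
    """Negated squared error against the target x**2 + x on sample points."""
    return -sum((evaluate(tree, x) - (x * x + x)) ** 2 for x in range(-3, 4))

def mutate(tree):
    """Subtree mutation: replace a randomly chosen node with a fresh subtree."""
    if not isinstance(tree, list) or random.random() < 0.3:
        return grow(2)
    child = tree[:]
    i = random.randint(1, 2)
    child[i] = mutate(tree[i])
    return child

random.seed(1)
pop = [grow(3) for _ in range(60)]
for _ in range(40):                      # evaluate, select, vary, repeat
    pop.sort(key=fitness, reverse=True)
    pop = pop[:20] + [mutate(random.choice(pop[:20])) for _ in range(40)]
best = max(pop, key=fitness)
```

Because selection here is elitist, the best fitness never decreases across generations; a full GP would add crossover, tournament selection, and depth limits to control bloat.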

Grammatical evolution (GE) was presented by Ryan et al.25,26 as another EA to evolve computer programs. It is considered a subtype of GP where the syntax of the language is defined by a context-free grammar (CFG), and individuals are encoded by a variable-length linear structure instead of a parse tree. The genotype is mapped onto terminals using the defined grammar. One of the reasons for using a grammatical approach lies in the fact that it allows programs to contain multiple data types, in contrast to tree-based GP, where an individual is restricted to a single type because of the closure property. Another type of GP that can use either a CFG or a tree-adjoining grammar (TAG)27 to give more control over individuals' structure and operators is grammar-guided GP (G3P).28 In G3P, the genotype of individuals is a syntax tree defined upon the grammar, which restricts the search space so that only syntactically valid individuals can be generated.
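The GE genotype-to-phenotype mapping can be sketched briefly. The grammar below is a made-up toy, but the decoding rule (production index = codon value modulo the number of productions for the leftmost nonterminal, with genotype wrapping) is the standard GE scheme:

```python
# Toy context-free grammar: production lists for each nonterminal.
GRAMMAR = {
    "<expr>": [["(", "<expr>", "<op>", "<expr>", ")"], ["x"], ["1"]],
    "<op>": [["+"], ["*"]],
}

def ge_map(genotype, start="<expr>", max_wraps=2):
    """GE mapping: expand the leftmost nonterminal using production index
    codon % (number of productions), wrapping around the genotype if needed."""
    symbols, i = [start], 0
    budget = len(genotype) * (max_wraps + 1)
    while i < budget:
        nt = next((s for s in symbols if s in GRAMMAR), None)
        if nt is None:                      # only terminals left: fully mapped
            return "".join(symbols)
        rules = GRAMMAR[nt]
        choice = rules[genotype[i % len(genotype)] % len(rules)]
        pos = symbols.index(nt)
        symbols[pos:pos + 1] = choice
        i += 1
    return None                             # codon budget exhausted: invalid

phenotype = ge_map([0, 1, 0, 2])            # expands to "(x+1)"
```

Note how the last case captures a known GE pitfall: a recursive genotype such as [0] keeps expanding forever, so mappings are cut off after a fixed number of wraps and the individual is marked invalid.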

Ferreira29 later proposed an evolutionary computing technique for solving problems automatically, called gene expression programming (GEP), inspired by the gene expression process in nature. In this process, proteins are synthesized from the nucleotide sequence of a gene, making up the structure of cells. An individual or gene in GEP is similar to an



expression tree in GP, but it is encoded instead by using a linear chromosome structure with a fixed length. The process of decoding information from linear chromosomes to expression trees is called translation. What varies is not the length of genes, but the length of the open reading frames (ORFs). In GEP, the length of the ORF of a given individual can be equal to or less than the length of the gene. In the latter case, the remaining region between the ORF termination point and the end of the gene is called the noncoding region. Noncoding regions allow modification of the genome by means of any genetic operator without restrictions, always producing syntactically valid programs. Actually, noncoding regions are the main difference between a kind of GP with linear representation and GEP.
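The translation step can be sketched as a breadth-first tree-filling pass over the chromosome. This is a minimal illustration with a made-up arity table; in Ferreira's scheme the head/tail structure of the gene guarantees the chromosome is always long enough to complete the tree:

```python
ARITY = {"+": 2, "-": 2, "*": 2}        # functions; any other symbol is a terminal

def gep_translate(gene):
    """Translate a linear GEP chromosome into an expression tree, breadth-first.
    Returns the tree (as [symbol, children] lists) and the noncoding region:
    the symbols left over once every node has received its children."""
    root = [gene[0], []]
    frontier, i = [root], 1
    while frontier:
        nxt = []
        for node in frontier:
            for _ in range(ARITY.get(node[0], 0)):
                child = [gene[i], []]
                node[1].append(child)
                nxt.append(child)
                i += 1
        frontier = nxt
    return root, gene[i:]

tree, noncoding = gep_translate("+*xyab")   # ORF is "+*xya"; "b" is noncoding
```

Here the ORF stops as soon as the breadth-first fill needs no more symbols, which is exactly why operators can rearrange the trailing region freely without producing an invalid program.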

Bacterial programming (BP)30–32 is another automatic programming technique where individuals are encoded using expression trees, and the bacterial evolutionary algorithm (BEA)33 is employed as the search technique. The BEA does not present a swarming behavior, as it mimics a process occurring at the bacterial genetics level, and it cannot be considered an SP technique. By contrast, there is a technique called bacterial foraging optimization (BFO)34 that does belong to SI, due to the fact that it presents a group foraging behavior called chemotaxis, which bacteria such as E. coli and M. xanthus have in nature. However, at the time this survey was written, there were no automatic programming publications where BFO was used as the search technique.

AIS have been used as the search technique in automatic programming under the name of immune programming (IP),35,36 and proposals where individuals are represented using either a linear or a tree-based encoding scheme can be found in the literature.37,38 Recently, some researchers have started to suggest that many aspects of AIS have direct parallels with SI,39,40 as some mechanisms within the immune system exhibit self-organizing properties. However, they have traditionally been considered different disciplines.5,41 We do not include the automatic construction of programs by using AIS in this survey because the agents that take part in the process of solution generation are not considered to exchange useful information to improve future proposals, except for clonal generation or the elimination of agents according to their affinity, which we do not consider relevant enough to label AIS as a swarm technique.

THE AP METAHEURISTIC

Ant Colony Optimization

Dorigo et al.9,42 proposed the ACO metaheuristic as an SI optimization method that bases the design of intelligent multi-agent systems on the foraging behavior and organization of ant colonies in their search for food. In nature, ants communicate with each other through the environment, in an indirect way, by means of a chemical substance (a pheromone) that they spray over the path they follow. The pheromone concentration in a given path increases as more ants follow this path, and it decreases more quickly as ants fail to travel it, as the evaporation in this path becomes greater than the reinforcement. The higher the pheromone level in a path, the higher the probability that a given ant will follow this path. To build a solution for a given problem, ACO uses a constructive method in which a given individual follows a sequence of transitions that are guided by two components. The first one, called heuristic information, is specific to the problem domain, while the second, called pheromone concentration, indicates the pheromone amount spread over the environment, and it allows the indirect communication between ants along generations.
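The interplay between these two components is commonly formalized through a probabilistic transition rule. In a standard formulation (ACO variants differ in the details), the probability that ant k moves from state i to state j is

```latex
p_{ij}^{k} \;=\; \frac{[\tau_{ij}]^{\alpha}\,[\eta_{ij}]^{\beta}}
                      {\sum_{l \in N_i^{k}} [\tau_{il}]^{\alpha}\,[\eta_{il}]^{\beta}},
\qquad j \in N_i^{k},
```

where τij is the pheromone concentration on transition (i, j), ηij its heuristic desirability, α and β weight the relative influence of the two components, and N_i^k is the set of feasible moves for ant k from state i.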

The general steps of the ACO metaheuristic are shown in Algorithm 1. At the beginning of the algorithm, all pheromone trails are initialized to a value 𝜏0, which is a parameter of the algorithm, and all the parameters are set. After initialization, in the ant generation phase, a number of ants specified as a parameter are created by following the aforementioned constructive method, where artificial ants move through adjacent states of a problem, selecting their next movement probabilistically by applying a transition rule. The next step is optional, and consists of improving the solutions obtained by performing a local search. Then, an evaporation process is performed to decrease all the pheromone levels in the environment. Finally, a pheromone update phase is carried out in order to reinforce the pheromone levels in the transitions followed by those ants considered good solutions.
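These phases can be condensed into a short Python skeleton. The start/end states, the toy two-route problem, and the quality function below are invented for illustration, and heuristic information and local search are omitted:

```python
import random

def aco(transitions, quality, n_ants=10, n_iters=30, tau0=1.0, rho=0.1):
    """Skeleton of the ACO loop: initialize trails to tau0, build solutions
    constructively with a pheromone-proportional transition rule, evaporate,
    then reinforce the transitions of the iteration-best ant."""
    tau = {}                                   # pheromone trails, lazily set to tau0
    best, best_q = None, float("-inf")
    for _ in range(n_iters):
        paths = []
        for _ in range(n_ants):                # ant generation phase
            state, path = "start", []
            while state != "end":
                options = transitions(state)
                weights = [tau.setdefault((state, s), tau0) for s in options]
                state = random.choices(options, weights=weights)[0]
                path.append(state)
            paths.append(path)
        for key in tau:                        # evaporation phase
            tau[key] *= 1.0 - rho
        it_best = max(paths, key=quality)      # pheromone update phase
        for prev, s in zip(["start"] + it_best, it_best):
            tau[(prev, s)] += quality(it_best)
        if quality(it_best) > best_q:
            best, best_q = it_best, quality(it_best)
    return best

# Toy usage: two routes from start to end; the one through "short" scores higher,
# so its trail accumulates pheromone and attracts subsequent ants.
random.seed(42)
routes = {"start": ["short", "long"], "short": ["end"], "long": ["end"]}
best = aco(lambda s: routes[s], lambda p: 1.0 if p[0] == "short" else 0.5)
```

Real ACO variants differ mainly in which ants deposit pheromone (iteration-best, global-best, or all) and in how the transition rule mixes pheromone with heuristic information.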

Ant Programming

AP was the first swarm-based technique for evolving computer programs,43 according to the principles of automatic programming. It presents certain similarities with GP, but rather than using a genetic algorithm as the search technique, it uses ACO to look for programs. On the other hand, AP uses a global memory (pheromone




FIGURE 3 | Prototype tree with a pheromone table associated with each node (it is shown only for the first two nodes).

matrix), while GP uses local memory. The application of ACO to automatic programming does not present the problems mentioned for GP in the section Automatic Programming as a Search Problem, which are inherent to methods that generate solutions by combining or modifying other solutions.23

As in GP, any program in AP is made of primitives, which can be either terminals or functions. An AP algorithm should satisfy the closure and sufficiency properties as well.

Although several automatic programming algorithms using ACO have been presented so far, the use of the term AP to refer to this paradigm is relatively recent. Actually, until Salehi-Abari and White44 presented a work comparing GP against AP for symbolic regression, AP was just considered a kind of ACO variant of GP or a GP hybridization, as in Ref 45, but not an independent metaheuristic. Only the work by Rojas and Bentley46 considered the possibility of a new trend of research regarding natural computation approaches for automatic programming. Since then, the term AP has been widely employed to refer to the automatic construction of programs by means of ACO.

Roux and Fonlupt43 presented the first AP proposal. In their algorithm, individuals are represented by means of a prototype tree, i.e., a parse tree with a pheromone table associated with each node. The well-known ramped half-and-half4 GP initialization method is used to generate the initial population of individuals. Each node of a given tree stores a pheromone table that keeps track of the pheromone level associated at that point with all possible functions and terminals, as illustrated in Figure 3. Notice that in the first generation of the algorithm, the pheromone table at each node is initialized with the 0.5 neutral probability value, so that functions and terminals have the same probability of being chosen. Once an individual is evaluated, the pheromone table at every node is updated with evaporation and reinforcement processes, the latter related to the fitness of the tree. These steps are repeated until reaching a stop criterion, but the new generations of programs are not created like the initial one; instead, they are created according to the tree's pheromone tables (the higher the rate, the higher the probability of being chosen). This algorithm was applied over two symbolic regression problems and a multiplexer problem, providing slightly better results than GP.

A similar individual encoding scheme was employed in Ref 47. Each individual builds and modifies trees taking into account the amount of pheromone existing at each node, where a pheromone table is kept. In this work, trees represented neural trees, and the goal was the evolution of flexible ANNs. To this end, the authors combined AP with PSO, AP being responsible for evolving the architecture of the flexible ANNs, and PSO being in charge of optimizing the values of the parameters encoded in the neural tree. The evolved flexible ANNs were applied to temporal series prediction problems, showing the efficiency of the algorithm.

The most recent AP proposal having individuals encoded by prototype trees was presented by Hara et al.48 In this work, the authors introduced genetic operators to change the structure of the trees during the run of the algorithm with the aim of avoiding premature convergence. They compared the performance of their technique against that obtained by Roux and Fonlupt's43 original algorithm, obtaining




FIGURE 4 | Tree structure generated from a graph in the ant colony programming (ACP) expression approach.

better results in two symbolic regression problems and the even-5-parity problem.

Further works started to use a space of states with the shape of a tree or a graph, and the transition rule of the ACO metaheuristic to traverse this graph and find a path for a given individual. This allowed the heuristic information of the problem domain to be used, in addition to the global pheromone information.

Boryczka et al.49,50 presented a group of methods founded on the use of the classic Ant Colony System algorithm,9 which were together referred to as ant colony programming (ACP). ACP was applied initially to solve symbolic regression problems, and two different versions were introduced: the expression and the program approaches.

In the ACP expression approach, the search space consists of a graph defined as G = (N, E), where N stands for the set of nodes, which can represent either a variable or an operator, and E is the set of edges, each one with an associated pheromone value. Ants move through the graph generating programs with a hierarchical structure. Thus, the objective is to build an approximating function expressed as an arithmetic expression in prefix notation. Concurrently, Green et al.51 also presented an AP technique similar to the ACP expression approach, where the graph is generated by following a random process. A sample graph is shown in Figure 4; starting at an initial node, it is possible to observe the movement of ants throughout the graph. The process is as follows. When an ant reaches a node, it determines whether it is a terminal or a function node. If the ant has reached a terminal node, this ant has finished its tour. Otherwise, the ant determines how many parameters the function node needs. In case the function needs more than one parameter, the original ant will reproduce, so that the number of ants starting out from the function node is equal to the number of parameters. Notice that the graph is not directed, but for a better understanding the direction of movement has been pointed out.

In the ACP program approach, graph nodes represent assignment instructions, and the solution (the approximating function) consists of a sequence of assignments that evaluate the function.

Boryczka also extended these works with the aim of improving the evaluation performance of ACP. First, a modification eliminating the so-called introns was presented,52 achieving simplified solutions and better effectiveness. Introns are sequences of symbols and instructions that do not affect the quality of the approximation; therefore, they obscure the structure of the solution and increase evaluation time. The second improvement53 brought a reduction in the computational time for evaluating transition rules and also reported a better performance, by including the employment of a candidate list. A summary of the results reported in the previous ACP works by Boryczka was presented in Ref 54. In this paper, the author reported a problem of the ACP approaches concerning the tuning of parameters, identifying two kinds of parameters: those related to the ACS, which are quantitative, and those specific to the ACP method, which are qualitative.

A proposal related to the ACP expression approach was presented by Rojas and Bentley, who called their algorithm Grid Ant Colony Programming (GACP).46 GACP considers a space of states with a grid shape, instead of a graph. The main advantage lies in the fact that ants can store temporal information about the steps followed. Thus, in case a given ant visits the same node several times, the quantity of pheromone to be deposited at each visit can be determined. The authors demonstrated that GACP was able to obtain perfect solutions when solving Boolean functions (6-multiplexer and 11-multiplexer problems).

Volume 4, November/December 2014 © 2014 John Wiley & Sons, Ltd. 451

Overview wires.wiley.com/widm

Shirakawa et al.13,55 proposed an AP method based on ACP, called dynamic ant programming (DAP), whose main difference lies in the employment of a dynamically changing pheromone table and a variable number of nodes, which leads to a more compact space of states. However, the authors tested the performance of DAP only on symbolic regression problems, where they found benefits with respect to GP in terms of performance and compactness, as the size of the pheromone table of DAP remained stable, while the tree size of GP bloated.

Kumaresan applied the ACP expression approach to optimal control and modeling.14,56,57,58 To obtain optimal control, AP is in charge of solving differential algebraic equations to compute the matrix Riccati differential equation, which is the central issue in optimal control theory. The solution obtained by using the proposed method was very close to the exact solution of the problem.

Another attempt to evolve computer programs by using ACO was AntTAG, proposed by Abbass et al.23 This method is based on the use of a TAG as representation scheme and the ACS algorithm as the search strategy. The initial grammar of AntTAG presents a uniform distribution, as there is no prior knowledge. A two-dimensional table is used to record the amount of pheromone deposited by ants while constructing their solutions. Some versions consider the inclusion of a crossover operator as a local search operator. The authors tested the performance of AntTAG on symbolic regression problems, achieving better results than those obtained by G3P using both a CFG28 and a TAG.59 A further study of AntTAG is presented in Ref 60, extending the experimentation of the previous paper to more complicated symbolic regression problems. The main drawbacks of AntTAG are due to the complexity of the fixed grammar structure, which may exclude information that is crucial to find a good solution. Actually, according to Ref 60, the set of elementary trees needs to be adapted even for other symbolic regression problems.

There are other grammar-based AP variants. Keber and Schuster61 proposed the generalized ant programming (GAP) algorithm, combining a CFG and ACO to synthesize programs. In contrast to AntTAG, the model does not have a fixed structure, and individuals generate a program or expression by following a path over the space of states, depositing their pheromone on the derivation steps. To this end, the complete path visited by each ant is stored. Figure 5 shows part of the derivation tree that ants explore to generate an expression or program, given a CFG expressed in Backus–Naur form (BNF) defined by G = (ΣN, ΣT, P, S), where ΣN is the set of non-terminal symbols, ΣT is the set of terminal symbols, P is the set of production rules, and S stands for the start symbol:

G = (ΣN = {S, T, F},
     ΣT = {a, +, ∗, (, )},
     P = {S → S + T | T, T → F ∗ T | F, F → (S) | a},
     S).

FIGURE 5 | Part of the derivation tree explored by ants to generate an expression in generalized ant programming (GAP).
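A minimal sketch of how a GAP ant might derive an expression from this grammar follows. The per-rule pheromone table and the depth-bound fallback are illustrative assumptions of ours; GAP actually keys pheromone on the derivation steps of the complete paths stored for each ant:

```python
import random

# The CFG from the text: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["S", "+", "T"], ["T"]],
    "T": [["F", "*", "T"], ["F"]],
    "F": [["(", "S", ")"], ["a"]],
}

# Illustrative pheromone per (nonterminal, rule index), initialised uniformly.
tau = {(nt, i): 1.0 for nt in GRAMMAR for i in range(len(GRAMMAR[nt]))}

def derive(symbol, depth=0, max_depth=8):
    """Expand `symbol` recursively, choosing each production with
    probability proportional to its pheromone value."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol
    rules = GRAMMAR[symbol]
    if depth >= max_depth:
        # force the shortest rule near the depth bound so the walk terminates
        idx = min(range(len(rules)), key=lambda i: len(rules[i]))
    else:
        weights = [tau[(symbol, i)] for i in range(len(rules))]
        idx = random.choices(range(len(rules)), weights)[0]
    return "".join(derive(s, depth + 1, max_depth) for s in rules[idx])

random.seed(7)
expr = derive("S")
print(expr)  # a valid expression over {a, +, *, (, )}
```

After evaluating the expression, pheromone would be reinforced on the (nonterminal, rule) steps of the ant's path.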

GAP was used first in option valuation to derive accurate analytical approximations of the value of American put options on non-dividend-paying stocks. Later, in Ref 62, the authors used GAP in option pricing, specifically to derive approximations for computing implied volatilities based on American put options.

Salehi-Abari and White63 built on previous work on GAP, proposing a new algorithm called enhanced generalized ant programming (EGAP), also based on the use of a CFG. More specifically, EGAP modifies the pheromone placing method so that the amount of pheromone placed on a derivation step is proportional to the depth of the path followed. In addition, it makes use of a heuristic function to control path termination. The algorithm was tested over several benchmark problems such as quartic symbolic regression, multiplexer, and Santa Fe ant trail, comparing the results with its precursor, GAP. Both algorithms were run with the same parameter configuration. The results obtained showed a significantly better performance of EGAP with respect to GAP. The reasons were studied, in an attempt to determine whether the improvement was due to the inclusion of the heuristic function or to the modified pheromone reinforcement method. To this end, modified versions of the original GAP algorithm including just the heuristic function or just the new pheromone placement approach were run separately. As a result, the strength of EGAP was found to stem from the interaction of both methods.


Then, the same authors in a subsequent work44 compared the performance of their EGAP algorithm versus GP over the same three benchmark problems, where EGAP behaved better for symbolic regression. Nevertheless, it was not capable of generating as many distinct solutions as GP for the multiplexer and the Santa Fe trail problems, which the authors attribute to a better exploration ability of GP, suggesting that the EGAP algorithm should be improved with a diversification mechanism that resets the pheromone matrix in case no improved solutions are achieved in a given number of generations.

Olmo et al. proposed an AP model called grammar-based ant programming (GBAP).15 It is founded on the use of a CFG that restricts the search space, which adopts the shape of a derivation tree, as in GAP and EGAP. The grammar controls ants' movements, forcing them to adopt valid transitions so that they are able to find a feasible solution to the problem. This model was applied to the classification task of DM, in such a way that individuals represent a path over the derivation tree, encoding simple rules of the form IF antecedent THEN consequent. Therefore, GBAP follows the individual = rule approach, a.k.a. the Michigan approach.64,65 Some of the individuals obtained in the final generation of the algorithm are selected using a niching approach to make up the final classifier, which adopts the form of a decision list where rules are sorted in descending order by their fitness. An extensive experimental study was carried out, comparing the results of GBAP against those obtained by other state-of-the-art algorithms such as Ant-Miner,66 Ant-Miner+,67 cAntMiner2-MDL,68 or PSO/ACO2,69 and other paradigms such as GP,70 decision trees,71 and reduced error pruning,72 demonstrating competitive or even better accuracy results.

Later, the same authors proposed a multi-objective version of this algorithm called multi-objective grammar-based ant programming (MOGBAP),73 which was capable of reporting statistically better results than its precursor. The most significant contribution of MOGBAP was its new Pareto strategy, where a separate Pareto front containing the non-dominated individuals of each class is found at the end of each generation. This implements a kind of non-overlapping elitism because, if a classic Pareto approach were employed, the best individuals of a given class could not pass through different generations, as they may be dominated by others belonging to a different class. A parallel version of this multi-objective algorithm using multithreading and graphic processing units (GPUs)74 was published later, called MOGBAP-GPU.75 In this algorithm, the evaluation of the solution encoded by each ant was performed on the GPU using the NVIDIA CUDA programming model, while the creation of the ants, the multi-objective strategy, and the niching procedure were parallelized by using multithreading. It demonstrated the ability to scale efficiently to larger data sets, and achieved a speedup of up to 834× versus the CPU version.

The extraction of classification rules in imbalanced domains has also been tackled by using the AP metaheuristic. AP for imbalanced classification (APIC)76 is a multi-objective algorithm where multiple colonies of ants are evolved, each one for predicting a different class of the data set. It was presented as an alternative to traditional imbalanced approaches, most of which are devoted to binary data sets and require modifications or extensions in order to be applicable to multi-class problems. As most binary algorithms do not have a multi-class extension, when dealing with multi-class data sets it is necessary to carry out a one-versus-one or one-versus-all decomposition77 to reduce the original data set to several binary data sets, building a classifier for each combination. APIC is able to cope directly with both binary and multi-class data sets, simplifying the complexity of the model. APIC reported better results than the other algorithms in boundary situations, due to the fact that the classifier is trained knowing the existence of all the classes.

Following the DM applications, the association rule mining task was also tackled by using AP in Ref 78, where two different algorithms founded on the use of a CFG were presented, showing some benefits with respect to classic exhaustive methods and being competitive with other GP algorithms. An extension for discovering infrequent association rules was also proposed.79

Just as several variants have appeared for GP, the same happens for AP. One of them, cartesian GP,80 has its homonym in AP, called cartesian AP (CAP). In CAP, individuals are represented as a graph addressed on the Cartesian co-ordinate system, and can be executed as a computer program. It distinguishes between genotype and phenotype, the genotype being a string of integers of fixed size that maps to the phenotype, which is an executable graph.

The first CAP algorithm was presented by Hara et al.,81 where each individual encodes a graph representation. The computer program built by a given ant is generated by its movements through the graph, the route selection being based on the pheromone intensity. The authors explored the performance of CAP over symbolic regression problems and also on the spiral problem, a classification problem where two spirals of points are considered and the main goal is to identify to which spiral each point belongs. The second CAP algorithm was proposed by Luis and dos Santos,82 the main difference being the use of the ASrank83 ACO algorithm instead of the MMAS.84 The authors compared their algorithm against the original CAP version in dynamic symbolic regression, showing a better performance of their proposal.

THE PSP METAHEURISTIC

Particle Swarm Optimization
PSO was introduced by Kennedy and Eberhart85 as a population-based stochastic global optimization method inspired by the sociological behavior of bird flocks when flying. PSO requires the search space to be continuous, and particles or individuals are encoded by vectors of real numbers that represent the position of the particle and its velocity. Each particle also has a memory to store its previous best position. The velocity of the particle in a given iteration of the algorithm is based upon its velocity in the previous one, its best known position, and the global best position found by the population. Thus, the swarm is a collection of particles where each one exhibits two behaviors: moving toward the best particle in the swarm and moving back toward its own best known position.

The standard PSO pseudocode is shown in Algorithm 2. It begins by initializing a population of particles with random positions and velocities. Then, while the termination condition is not reached, the fitness of each particle in the population is evaluated and compared against the particle's best historical fitness and the best global fitness in the population. In case the new fitness improves either of them, they are replaced by the current solution. Then, each particle moves to a new position, adapting its velocity.
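This loop can be sketched compactly as follows; the inertia and acceleration constants used here are common defaults, not values prescribed by the survey:

```python
import random

def pso(fitness, dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Minimal global-best PSO sketch (minimisation)."""
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]               # each particle's best position
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]  # best position in the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # velocity: inertia + pull toward own best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f < pbest_f[i]:                # update personal best
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:               # update global best
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

random.seed(0)
best, best_f = pso(lambda p: sum(x * x for x in p))  # minimise the sphere function
print(best_f)
```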

Particle Swarm Programming
The automated construction of programs by using PSO was explored initially by O'Neill and Brabazon,86 who called their approach grammatical swarm (GS). GS is based on GE, in which a grammar is used to define a language and decode candidate solutions to a valid representation (program). Actually, in this first version of GS, each particle or individual was encoded by a fixed-length string structure, and a CFG expressed in BNF is employed to guide the construction of syntactically valid programs. Specifically, each individual holds the sequence of rule numbers applied to construct a program from the starting symbol of the grammar. An example of genotype–phenotype mapping in GS is shown in Figure 6, with the CFG adopted for the quartic symbolic regression problem. The main difference is that GE uses a GA in the search process, while GS uses a PSO algorithm. The experiments compared the performance of GS against GE over four standard automatic programming problems: Santa Fe ant trail, quartic symbolic regression, 3-multiplexer, and mastermind, showing the ability of GS to generate computer programs, outperforming GE in two of the problems studied. In addition, GS presents the advantage of its simplicity, as it lacks operators that are needed in GE and GP, such as crossover, selection, or replacement. An extended version of this work was also published, analyzing several parameter configurations and update constraints.87

In Ref 88, the CFG was adapted for two classification problems, mushroom and eukaryotic promoter sequence detection. A comparison of the results obtained by GS and GE was carried out, and in both problems GS was on a par with GE in terms of the quality of the classifiers induced. Later, Ramstein et al.89,90 designed a variant of the GS algorithm where a probability was used to select the production rule to be applied, instead of computing it directly by applying the modulo operator used in GE and GS. This GS variant was applied to the identification of a particular protein family, obtaining compact and understandable rules. It was compared against GE, obtaining slightly better AUC results and proving to be computationally less expensive than GE.

Two recent applications of the GS algorithm can be found in the literature. One of them was devoted to evolving a different velocity update equation for each particle using GS, in order to avoid premature convergence (in PSO, the same velocity update equation is used for all particles). The other application of GS focused on designing neural network topologies. The authors defined a specific grammar to be used in the GS algorithm so that feed-forward connections with one or more consecutive layers could be generated. Then, they used a PSO algorithm to train the networks generated. The method proposed was tested on nine classification and nine regression problems and compared against the state-of-the-art methods RPROP, BFGS, and MinFinder. The results obtained showed that the combination of GS to construct the neural network and PSO to train it outperformed the others, having also the advantage of being computationally less expensive.

FIGURE 6 | An example of the genotype–phenotype mapping in grammatical swarm (GS) from a linear chromosome. The integer values are used to select production rules of the context-free grammar (CFG), producing a derivation sequence that can be kept as a derivation tree, which is further decoded to an expression tree. In the example, the grammar is <expr> ::= <expr><op><expr> | <var>; <op> ::= + | − | ∗ | /; <var> ::= a, and the genotype 14 5 21 6 7 18 yields the derivation <expr> ⇒ <expr><op><expr> ⇒ <var><op><expr> ⇒ a<op><expr> ⇒ a∗<expr> ⇒ a∗<var> ⇒ a∗a, since 14 mod 2 = 0, 5 mod 2 = 1, 21 mod 1 = 0, 6 mod 4 = 2, 7 mod 2 = 1, and 18 mod 1 = 0.

O'Neill et al. extended the previous works on GS by introducing a new encoding scheme using variable-length string structures to encode individuals.91 The same four benchmark problems used in the earlier work were used to test the performance of this variable-length GS against the original fixed-length GS and GE, showing that the simpler fixed-length version was superior for the experiments carried out.

The aforementioned PSP algorithms use a linear structure, either fixed or variable length, to encode individuals. There are also proposals that address the extraction of computer programs represented as parse trees. The first PSP algorithm using this encoding was presented by Veenhuis et al.92 under the name of tree swarm optimization (TSO). This algorithm replaces the position vectors that particles use in PSO by expression trees. The TSO algorithm was tested on four symbolic regression functions, showing that fewer evaluations were needed compared with AntTAG and GP. It was also tested on the Santa Fe ant trail problem, although poor results were obtained. Finally, the TSO algorithm was tested over the Iris classification data set, where the individual directly encoded the decision tree to be used as classifier. The fitness function used was the number of misclassified instances. In this classification problem, the algorithm showed results very close to those obtained by the C4.5 decision tree algorithm and a simple GP approach. The advantage of using TSO in this latter case was that it needed significantly fewer evaluations on average than the GP algorithm.

The second PSP tree-based proposal was presented by Togelius et al.,93 using a generic extension of PSO called geometric PSO (GPSO) as the search technique. This variant allows PSO to be applied to almost any search space, provided that it is possible to measure the distance between two points, that there is a mutation operator that alters a point randomly, and that there is a weighted crossover operator capable of generating an offspring point placed between its two parent points. The authors presented three different weighted crossover operators for expression trees, and their experiments proved that their method performed as well as GA on Santa Fe ant trail and symbolic regression benchmarks. However, the authors pointed out that an initial proposal of this nature is unlikely to outperform the majority of highly specialized algorithms within parametric and combinatorial optimization.

Qi et al.94 recently proposed another PSP algorithm called HGPPSO, where they also changed the linear encoding of PSO into a tree encoding, thus redefining the evolving rules of PSO. The performance of the proposed model was tested on two symbolic regression problems, showing that it achieved better results than traditional GP regarding convergence time and average convergence generations.

THE BSP METAHEURISTIC

Artificial Bee Colony Algorithm
BSI is a computational intelligence paradigm inspired by the intelligent behaviors of honey bee swarms. Several BSI algorithms have been developed so far, each one incorporating different behaviors that bee colonies present in nature, such as dance and communication,95 mating,96 marriage,97 reproduction,98 task allocation,99 foraging,100 floral and pheromone laying,101 navigation,102 or collective decision and nest site selection.103

Among all, the most representative algorithm is artificial bee colony (ABC),19 which accounts for almost 60% of BSI publications. ABC is based on bees' decentralized foraging behavior, where bees balance the exploitation of known food sources with the exploration of new ones. ABC considers three types of artificial bees, which can be seen as procedures for managing computational resources by exploring new solutions to the problem: employed bees (foragers), onlooker bees (dance observers), and scouts (random explorers). The latter two kinds of bees are also called unemployed bees. Note that in this algorithm, the position of a food source encodes a solution to the problem, while the nectar amount indicates the fitness of the solution.

The main steps of the ABC algorithm are shown in Algorithm 3. The population is initialized first with scout bees. Then, the main loop of the algorithm starts, where the first phase involves the search for new food sources by the employed bees. They search for food sources that have more nectar within the neighborhood of the food source in their memory; when they find one, they evaluate its fitness, and a greedy selection process between both food sources is applied. The new information is shared with onlooker bees with a certain probability by dancing. In the onlookers phase, after observing the employed bees dance, the onlookers choose a food source by using a probability-based selection process. The higher the nectar content of a food source, the higher the probability that the onlookers will prefer it. After a food source for an onlooker is chosen, a neighbor source is determined, and a greedy selection is applied between them, as in the employed bees phase. Finally, in the scouts phase, those employed bees whose food source has been exhausted (i.e., whose solutions cannot be improved after a number of trials) become scouts searching for new food sources randomly.
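A minimal sketch of these three phases on a continuous minimization problem follows; the neighborhood operator and parameter values are illustrative assumptions, not the exact formulas of Ref 19:

```python
import random

def abc(fitness, dim=2, n_sources=10, limit=20, iters=100):
    """Minimal ABC sketch: employed, onlooker, and scout phases (minimisation)."""
    def rand_source():
        return [random.uniform(-5, 5) for _ in range(dim)]

    def neighbour(i):
        # perturb one dimension toward/away from another random source
        k = random.randrange(n_sources)
        d = random.randrange(dim)
        cand = sources[i][:]
        cand[d] += random.uniform(-1, 1) * (sources[i][d] - sources[k][d])
        return cand

    def greedy(i, cand):
        # keep the better of the memorised source and the neighbour candidate
        f = fitness(cand)
        if f < fits[i]:
            sources[i], fits[i], trials[i] = cand, f, 0
        else:
            trials[i] += 1

    sources = [rand_source() for _ in range(n_sources)]
    fits = [fitness(s) for s in sources]
    trials = [0] * n_sources
    for _ in range(iters):
        for i in range(n_sources):          # employed bee phase
            greedy(i, neighbour(i))
        quality = [1.0 / (1.0 + f) for f in fits]
        for _ in range(n_sources):          # onlooker phase: nectar-proportional choice
            i = random.choices(range(n_sources), quality)[0]
            greedy(i, neighbour(i))
        for i in range(n_sources):          # scout phase: abandon exhausted sources
            if trials[i] > limit:
                sources[i] = rand_source()
                fits[i] = fitness(sources[i])
                trials[i] = 0
    best = min(range(n_sources), key=lambda i: fits[i])
    return sources[best], fits[best]

random.seed(3)
pos, val = abc(lambda p: sum(x * x for x in p))  # sphere function
print(val)
```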

Bee Swarm Programming
BSP refers to those automatic programming techniques where a BSI algorithm is used as the search technique. To the best of our knowledge, just one proposal of this type has been published, under the name of artificial bee colony programming (ABCP).16 It is based on the use of ABC as the search technique. Specifically, ABCP is an extended version of the ABC algorithm for program induction, where more complex structures are used for problem representation. The position of a given food source in ABCP corresponds to a computer program or individual, encoded by an expression tree, which consists of terminals and functions. In the work by Karaboga et al., ABCP was applied to symbolic regression, considering the arithmetic operations (+, −, ×, ÷) and the mathematical functions (sin, cos, exp, log) as functions, while the set of terminals was made of the x and y real variables and the constant value 1. To generate the initial population of individuals, the well-known ramped-half-and-half4 GP initialization method was used. The quality of each individual or program is measured by using a raw fitness, which is computed as the sum of the absolute errors between the results of the obtained and the target functions, taken over several test cases.

A significant adaptation of candidate solution generation with respect to the original ABC algorithm has been carried out in ABCP, as individuals are encoded in the latter by using expression trees and, therefore, do not have a fixed length. To build a candidate solution, ABCP uses a sharing mechanism that consists of randomly selecting two nodes, one from a neighbor solution and another one from the current solution. The candidate solution is then produced by replacing the node of the current solution with the subtree of the neighbor solution rooted at the node previously chosen. Figure 7 illustrates the sharing operation.

FIGURE 7 | Artificial bee colony programming (ABCP) sharing mechanism.
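The sharing mechanism can be sketched as a subtree graft between two expression trees. The list-based tree representation and the example trees below are illustrative assumptions, not the encoding used in Ref 16:

```python
import random

# Expression trees as nested lists: [op, child1, ...] or a leaf string.
def nodes(tree, path=()):
    """Enumerate (path, subtree) pairs; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def replace_at(tree, path, subtree):
    """Return a copy of `tree` with the node at `path` replaced by `subtree`."""
    if not path:
        return subtree
    new = list(tree)
    new[path[0]] = replace_at(tree[path[0]], path[1:], subtree)
    return new

def share(current, neighbour):
    """ABCP-style candidate generation (a sketch): graft a random subtree
    of `neighbour` onto a randomly chosen node of `current`."""
    cut, _ = random.choice(list(nodes(current)))
    _, graft = random.choice(list(nodes(neighbour)))
    return replace_at(current, cut, graft)

random.seed(5)
current = ["+", ["cos", "x"], "1"]
neighbour = ["*", ["sin", "x"], "x"]
print(share(current, neighbour))
```

A greedy selection between the current and candidate trees, as in standard ABC, would then decide which one is kept.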

The ABCP algorithm was tested over 10 symbolic regression problems including polynomial, trigonometric, logarithmic, square root, and bivariate functions, comparing its results against GP using several crossover operators. ABCP outperformed GP in all problems except for the logarithmic function, where GP behaved better when using two variants of the semantic similarity-based crossover operator. The performance of ABCP was also compared with that obtained by the DAP algorithm (introduced in the section The AP Metaheuristic) over three functions, obtaining better results for two of them and performing equally well for the third one.

More recently, Si et al. have presented another BSP algorithm called grammatical bee colony (GBC).17 In this work, the original ABC algorithm is also used as the search technique to generate computer programs through genotype-to-phenotype mapping by using a CFG expressed in BNF, similar to that used in GS and shown in Figure 6. In GBC, a variable-length linear structure is used instead of parse trees to encode individuals. As the original ABC algorithm represents solutions as positions of food sources, in the GBC algorithm a food source's position represents a genotype. Using the CFG, programs (phenotypes) are generated from the food source's position.

GBC was tested on four standard GP benchmarks, namely Santa Fe ant trail, symbolic regression, even-3 parity, and multiplexer. The mean and standard deviation of the best-run-errors of each problem were reported, showing the ability of GBC to generate programs successfully. These results were compared with those obtained by GS and GDE, showing poorer performance except for the even-3 parity problem.

OTHER SWARM PROGRAMMING METAHEURISTICS

Artificial Fish Swarm Programming
Initially proposed by Li et al.,104 artificial fish swarm optimization (AFSO) is an SI metaheuristic inspired by the activities of fishes in water for locating and discovering nutritious areas, such as preying, following, and swarming.

In AFSO, the population consists of a number of artificial fish individuals that can search the solution space via several behaviors for reaching the global optimum. The next behavior adopted by a given artificial fish depends on its current status and its local environment, as each artificial fish cooperates with others in its own neighborhood, sharing information via social behavior. The basic functions of artificial fishes are preying, following, swarming, randomly moving, and leaping. Preying is a behavior that tends toward the food, as a given fish generally perceives the concentration of food in water to determine the movement by vision or sense and then chooses the tendency.

Swarming is the behavior that fishes adopt in response to a threat, grouping naturally in the moving process. Following is the behavior that the neighbors of a given fish perform when that fish locates food. Randomly moving is what fishes do when they swim randomly while seeking food or companions in larger ranges. And leaping is a function introduced in artificial fish to avoid stagnation, increasing the probability of leaping out of a local optimum and reaching the global optimum. A recent review on AFSO can be found in Ref 105.

AFSP was first introduced by Liu et al.,106 who reported the first study on the application of AFSO to solving symbolic regression problems, where individuals were encoded by using a GEP scheme. In order to determine whether two artificial fish individuals have the same structure, all that is required is to check whether their feature codes, i.e., the number of arguments taken by each operator in their gene expression, are the same. Figure 8 shows two sample artificial fish individuals, and Table 1 shows that these individuals have the same feature codes but different gene expressions. The shaded areas represent the noncoding region, which is redundant to represent the parse tree encoded by the individuals, while non-shaded areas are the ORFs.

FIGURE 8 | Two sample artificial fish individuals with the same structure.

The authors defined a new penalty-based fitness function to measure the quality of individuals, which considers the number of nodes in the parse tree encoded by the individual as a constraint. The behaviors considered in the AFSP proposal were randomly moving, preying, following, and avoiding. Regarding the latter, artificial fish individuals avoid letting the fish swarm get over-crowded. For a given individual, the number of partners having the same parse tree structure is counted, and then a degree of congestion is computed. If it is less than an acceptable threshold established as a parameter, the individual remains the same; if not, its parse structure is changed.

The pseudocode of the AFSP proposal is shown in Algorithm 4. It starts by initializing the parameters, the input data for symbolic regression, and the population of AF individuals. Then, while the termination criteria are not met, the following actions are performed. First, individuals are evaluated and the best artificial fish is selected as AFbest. Then, each AF in the population performs the avoiding and preying behaviors. If the AF does not improve, it performs the following behavior. If this does not lead to an improvement either, it performs the randomly moving behavior. During the evolutionary process, the best individual of each iteration is kept as AFbest.

The performance of the proposed AFSP algorithm was tested over four symbolic regression problems, comparing the results obtained against two GEP algorithms. Results indicated that the proposed method can reach higher-precision solutions with quicker convergence than GEP. The importance of the behavior operators included in AFSP was also investigated in this paper, finding that the algorithm's performance relied largely on the preying and following behaviors, and depended on the avoiding and randomly moving behaviors to a lesser degree.

TABLE 1 | Gene Expression and Feature Code of Two Artificial Fishes

AF1   Gene expression:  exp − sin / x x p x x
      Feature code:     1 2 1 2 0 0 0 NULL

AF2   Gene expression:  sin + ln ∗ x p x x x
      Feature code:     1 2 1 2 0 0 0 NULL

Herd Programming
Like a flock for birds or a school for fishes, a herd is a large group of mammals that feed, travel, or are kept together. Some of the most important herd algorithms are the bat algorithm, inspired by the echolocation and hunting behavior of bats107; the wolf pack search algorithm,108 based on the social behavior of wolves; and the dolphin herd algorithm, a bio-inspired algorithm based on the social behavior of dolphin herds. The employment of any of the aforementioned algorithms as the search technique in automatic programming could be considered a kind of HP. However, the first automatic programming algorithm based on herd movements, introduced recently by Headleand and Teahan,109 was inspired by horses. The authors called their method grammatical herding (GH), as it uses an encoding scheme similar to that employed in GE or GS, represented in Figure 6. GH was proposed as a new fitness-based automatic programming technique that follows a simple set of rules that horse herds exhibit. The aim of the algorithm is to guide candidate solutions toward areas of the environment that are known to produce solutions of high quality. The whole population of horses or individuals, similar to that in any other population-based algorithm, is known as the herd. Three kinds of agents are considered in the herd: alphas, betas, and the rest of the individuals. Each individual stores the personal best location found, also keeping the fitness value associated with that position. An arbitrary number of individuals in the population who have the highest personal best fitness are known as the betas, and their position and fitness guide the movements of the herd. Alphas are the elitist betas, and they are used as targets toward which the weakest individuals are driven. This has the effect of herding those individuals with bad fitness toward areas of the search space with potentially higher quality. This is inspired by direct observations of herds of horses in nature, where weaker and younger members of the herd move toward the alpha mares, guided by the stallions.

The main pseudocode of the GH algorithm is shown in Algorithm 5. First, the size of the herd, the betas, and the alphas is set, and a herd of candidate individuals with random positions is generated. Then the main loop of the algorithm runs while the termination condition is not reached. Once individuals are evaluated, those with the highest personal best fitness are selected as betas. Equally, within the betas, the subset with the highest personal best fitness is selected to make up the alphas. Then, for each agent in the herd, if its fitness is lower than the average fitness of the herd, its position is updated with the location of one of the alphas, selected at random. Next, each agent moves to a new position, which is computed between its personal best location and that of a randomly selected member of the set of betas. This position is determined by weighting the attractions of the two points in a similar fashion to the way gravity works. The equilibrium point between the pull of the personal best location and the pull of the target beta's personal best location is set as the new position.
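The herd-update loop just described can be sketched as follows. This is only an illustrative reconstruction: the agent layout (a dict with `pos`, `best_pos`, `best_fit`), the real-valued positions, and the fixed `pull` weight are our assumptions, since GH itself operates on GE-like codon arrays.

```python
import random

def gh_step(herd, fitness_fn, n_betas=3, n_alphas=1, pull=0.5):
    """One iteration of the GH update described above (illustrative sketch)."""
    # Evaluate agents and update personal bests.
    for a in herd:
        f = fitness_fn(a["pos"])
        if f > a["best_fit"]:
            a["best_fit"], a["best_pos"] = f, list(a["pos"])

    # Betas: agents with the highest personal-best fitness;
    # alphas: the elite subset of the betas.
    ranked = sorted(herd, key=lambda a: a["best_fit"], reverse=True)
    betas, alphas = ranked[:n_betas], ranked[:n_alphas]

    avg = sum(a["best_fit"] for a in herd) / len(herd)
    for a in herd:
        if a["best_fit"] < avg:
            # Herd below-average agents onto a randomly chosen alpha.
            a["pos"] = list(random.choice(alphas)["best_pos"])
        else:
            # Move to the equilibrium point between the agent's own best
            # location and a random beta's best location (gravity-like pull).
            target = random.choice(betas)["best_pos"]
            a["pos"] = [(1 - pull) * p + pull * t
                        for p, t in zip(a["best_pos"], target)]
    return herd
```

With `pull = 0.5` the equilibrium point is simply the midpoint of the two attractors; weighting it by the two fitness values would mimic the gravity analogy more closely.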

The performance of GH was tested on the Santa Fe ant trail standard benchmark. The authors found that most experiments created a successful program capable of reaching the optimal solution, although others failed to reach this goal. The most significant advantage with respect to GE was that GH was able to obtain moderate- to high-fitness solutions in less computational time than GE.

After this initial work, the same authors110 proposed another algorithm in which GH was used to quickly search a solution space to seed the initial population of a GE algorithm, with the aim of reducing the computational time of the latter, as well as obtaining a fitter solution than that obtained by GH alone.

Volume 4, November/December 2014 © 2014 John Wiley & Sons, Ltd. 459

Overview wires.wiley.com/widm

Firefly Programming

The firefly algorithm (FA), proposed by Yang,111 is a relatively new SI paradigm mainly based on the flashing patterns and behavior of fireflies. It shares several similarities with the standard PSO algorithm; in fact, in the original article, Yang states that PSO is simply a special case of the FA. In nature, the display of flashing lights by fireflies is associated with mating habits. Yang idealized the biological phenomenon with the following rules. First, the algorithm has unisex fireflies, each attracted to any other firefly regardless of sex. Second, attractiveness is proportional to brightness, so for any two flashing fireflies, the less bright one will move toward the brighter one. If no firefly is brighter than a particular one, it will move randomly in the search space. In addition, the brightness of a firefly is affected by the landscape of the objective function.
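The second rule above corresponds to the canonical FA position update, which can be sketched as follows; the parameter values are illustrative, not taken from any specific paper.

```python
import math
import random

def fa_move(xi, xj, beta0=1.0, gamma=1.0, alpha=0.0):
    """Move firefly xi toward a brighter firefly xj (canonical FA step).

    The attractiveness beta0 * exp(-gamma * r^2) decays with the squared
    distance r^2 between the two fireflies, and alpha scales a small
    random walk; all parameter values here are illustrative.
    """
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    beta = beta0 * math.exp(-gamma * r2)
    return [a + beta * (b - a) + alpha * (random.random() - 0.5)
            for a, b in zip(xi, xj)]
```

Setting `gamma = 0` makes every firefly fully visible to every other one, which is the sense in which PSO-like behavior appears as a special case.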

The automatic construction of programs using the FA as the search technique is referred to as FP. The first FP algorithm was published very recently by Husselmann and Hawick in Ref 112, where the authors presented a parallel GPU implementation of the FA for expression trees called the geometric firefly algorithm (GFA). This algorithm follows a GEP scheme. As a firefly-based algorithm, it operates by having all candidates aware of one another. Under these circumstances, implementing a crossover operator with k-expressions involves multi-parent recombination. The metric used to judge differences between candidates was the Hamming distance, which makes it possible to quantify the search space and to introduce exponentially degraded fitness values.

The pseudocode of the GFA algorithm is shown in Algorithm 6. The typical lifecycle of a GPU-based algorithm involves copying some data onto the device, executing a sequence of GPU-specific code named a kernel, and copying the resulting data back to the host. In Algorithm 6, the specific parts executed on the GPU are preceded by the word 'CUDA', as this is the computing architecture that makes it possible to use NVIDIA GPUs in a general-purpose manner. The algorithm begins by initializing the parameters, allocating memory for the population of individuals, and initializing them randomly. The firefly search process comprises the inside of the while loop and is composed of the following steps. First, as k-expressions are the genotypes of their respective phenotypes, expressions are interpreted to obtain an executable tree. Then, the fitness of the encoded solution is evaluated. Before the recombination by roulette selection occurs, the observed fitness values, degraded by exponential decay, are computed. Finally, firefly positions are moved in the search space, applying the roulette wheel to select the firefly to be recombined with, and performing the crossover. Note that fireflies are more likely to be moved toward the more attractive individuals.
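The roulette-based choice of a recombination partner under exponentially decayed fitness can be sketched as follows. This is our reading of the mechanism, not the authors' code: `gamma` and the helper names are assumptions.

```python
import math
import random

def select_partner(i, population, fitnesses, gamma=0.5):
    """Roulette-wheel choice of a recombination partner for candidate i.

    The fitness another candidate presents to i is degraded
    exponentially with the Hamming distance between their
    k-expressions, so bright and nearby candidates are the most
    likely mates (an illustrative sketch, not the paper's code).
    """
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    observed = [f * math.exp(-gamma * hamming(population[i], g))
                for f, g in zip(fitnesses, population)]
    observed[i] = 0.0                       # never recombine with itself
    r = random.uniform(0.0, sum(observed))
    acc = 0.0
    for j, w in enumerate(observed):
        acc += w
        if acc >= r and j != i:
            return j
    return max(range(len(observed)), key=observed.__getitem__)
```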

The GFA was tested only against one symbolic regression problem (the sextic polynomial), comparing the results to a GPU-based GEP implementation with tournament selection. The authors state that the results are promising and justify further analysis, investigating the behavior of the algorithm in other application areas.

OPEN ISSUES

SI is growing steadily, and other intelligent swarming behaviors observed in nature are inspiring researchers to develop new algorithms, such as those exhibited by bats,107 krill herds,113 penguins,114 or social spiders,115 among others. Therefore, it would be interesting to explore the employment of these new optimization algorithms as the search technique in automatic programming. This conclusion is supported by the fact that four of the SP techniques presented in this work have appeared since 2012: BSP, AFSP, FP, and HP. Moreover, there are classic SI algorithms, such as BFO, whose application to the automatic evolution of programs has not been explored yet.

AP is the SP technique that has attracted the most attention and has been applied to the most domains, but we can still find limitations inherent to the discrete search space that ACO uses, which hinders its application to continuous domains. For instance, AP proposals for DM require a previous preprocessing step in order to transform numerical attributes into categorical ones. A proposal able to cope with continuous attributes 'on-the-fly' would be very interesting and might improve the performance of these algorithms.
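The preprocessing step in question is typically a simple discretization, such as equal-width binning; the sketch below (bin count chosen arbitrarily) illustrates exactly the kind of fixed, up-front decision that an 'on-the-fly' approach would avoid.

```python
def discretize(values, bins=3):
    """Equal-width binning of a numeric attribute into categorical labels.

    A generic example of the preprocessing current AP proposals for DM
    rely on; the bin count is an arbitrary choice.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0          # guard against a constant column
    return [min(int((v - lo) / width), bins - 1) for v in values]
```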


Other open issues in SP include the exploration of other encoding representations. Different representations will lead to different degrees of prowess depending on the application and the problem addressed. Moreover, this is a crucial issue if we bear in mind that four of the SP paradigms reviewed in this work have appeared very recently, and thus few representation schemes (two or less) have been considered. In addition, concerning BSP, the existing proposals use the ABC algorithm as the search technique. Many other BSI algorithms have been published, and they could be used instead of ABC as well. Regarding AFSP and FP, only one proposal each has been presented. Finally, there are just two articles published with regard to HP, both using a fixed-linear structure. New proposals using tree-based encoding schemes may appear in the future.

The employment of a grammar to replace the terminal and function sets and guide the search for valid individuals has been scarcely explored. AP seems to be the SP metaheuristic that pays the greatest attention to this issue, although the use of a GE-like approach in AP has not been explored yet. The other SP paradigms that have grammar-based proposals are PSP, BSP, and HP, which have presented GE-like variants. Thus, the use of grammars in SP warrants further research, because one of its benefits is the possibility of adapting the algorithms to insert problem-dependent knowledge in order to address particular problems.
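The GE-like mapping that several of these grammar-based variants share can be sketched as follows: each integer codon of a linear array, taken modulo the number of productions of the leftmost nonterminal, selects the rule used to expand it. The toy grammar below is ours, not taken from any of the surveyed papers.

```python
import itertools

# A toy BNF grammar; grammar-based proposals replace the function and
# terminal sets with productions like these (illustrative only).
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["x"], ["1"]],
    "<op>": [["+"], ["*"]],
}

def decode(codons, start="<expr>", max_depth=8):
    """Map a linear array of integer codons onto a program string.

    The array is reused (wrapped) when exhausted, and deep recursion is
    steered onto terminating rules to guarantee a finite derivation.
    """
    it = itertools.cycle(codons)

    def expand(sym, depth):
        if sym not in GRAMMAR:
            return sym                       # terminal symbol
        rules = GRAMMAR[sym]
        if depth >= max_depth:               # favor terminating rules
            rules = [r for r in rules
                     if all(s not in GRAMMAR for s in r)] or rules
        choice = rules[next(it) % len(rules)]
        return "".join(expand(s, depth + 1) for s in choice)

    return expand(start, 0)
```

Because the search algorithm only manipulates the integer array, the same swarm dynamics can be reused for any target language simply by swapping the grammar, which is the problem-dependent knowledge referred to above.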

The parallelization of SP techniques is also an unexplored issue, although this kind of algorithm is particularly suitable for parallel implementation, as swarming agents typically work in parallel. Actually, only two recently published references that address this issue have been found in the literature. The first one is an AP algorithm in which the fitness evaluation stage was parallelized by using GPUs. The other is the only algorithm presented that belongs to the FP metaheuristic, where GPUs were used for the parallelization as well. As shown in the aforementioned references, the computational time needed to execute the algorithms, the dimensionality of the problem addressed, and the quality of the solutions can be improved by using parallel computing. On the other hand, concerning the computational time of existing algorithms, few references report the speeds achieved, and algorithm complexity is also a research area that requires more in-depth analysis.
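Fitness evaluation is the natural first stage to parallelize, since each candidate is scored independently. The GPU-based references cited above do this with device kernels; a minimal host-side analogue using a standard thread pool looks like this (for CPU-bound fitness functions, a process pool or GPU offloading would be the realistic choice):

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_population(population, fitness_fn, workers=4):
    """Score every candidate concurrently; results keep population order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness_fn, population))
```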

As for any metaheuristic algorithm, another important area of research in SP is parameter tuning. All SP algorithms have algorithm-dependent parameters, and the values selected for these parameters can have a great influence on results and performance. Studies devoted to identifying optimal settings for algorithm-dependent parameters are needed. In addition, self-adaptive proposals can help non-expert users to use these techniques with no prior knowledge about the problem domain or the technique itself. Moreover, the automated design of algorithms by using SP metaheuristics should be explored, owing to their representation power. Algorithm selection and generation are crucial for all types of domains having many available methods with many parameters to be set up, but with no clear criteria for choosing them.116

Although we have seen diverse applications of SP in this paper, it is easy to notice that most SP techniques are at a very incipient stage and still limited to small-scale problems. Table 2 summarizes the existing SP algorithms proposed in the literature, grouped by application domain, SP metaheuristic, and encoding scheme, also showing the benchmarked techniques. Some common benchmark problems addressed, such as quartic and low-order polynomial symbolic regression, the multiplexer, or the Santa Fe ant trail, are outdated toy problems that can lead to misleading or meaningless results. There is no doubt that more applications of these metaheuristics will emerge in the near future. Among the SP metaheuristics that have received more attention, such as AP, it would be interesting to explore their application to new areas such as telecommunications, bioinformatics, large-scale real-world applications, and other DM tasks. Actually, the literature table reveals that AP initiatives within DM are limited to the classification and association rule mining tasks, while PSP has been applied only to regression and classification. Given the success of existing swarm algorithms for DM,128 similarly promising results could be expected in other tasks.

From this table, it is easy to realize at a glance that several SP algorithms of the same metaheuristic have been presented in a specific application area. However, there are no experimental studies comparing the existing proposals among themselves, which could be useful to determine their behavior and performance depending on several factors, such as the encoding, the characteristics of the SI algorithm that guides the search, the configuration of parameters, and the specific problem addressed. Thus, it would be valuable to carry out such comparative studies, although it is important to bear in mind that the implementation code of most of the proposed SP algorithms is not publicly available, making comparisons among benchmarks difficult.

Finally, many hybridizations of SI with other automatic programming techniques can be found in the literature, serving many goals: self-adapting the mutation rate in linear GP,129 improving the power of the crossover operator in GP,130 optimizing GP-evolved arithmetic classifier expressions,131 optimizing parameters and


TABLE 2 | Literature Overview Table of SP Algorithms Grouped by Application Domain, Metaheuristic, and Encoding Scheme Used

Application Domain | Metaheuristic | Encoding | Algorithm | Benchmarked Techniques | Publications

Symbolic regression | AP | Prototype tree | AP | GP | 111
Symbolic regression | AP | Prototype tree | Parallel AP genops | AP | 48
Symbolic regression | AP | Derivation tree using TAG | AntTAG | GGGP, TAG3P | 23,60
Symbolic regression | AP | Expression tree after traversing a graph | ACP | GP | 49,117,50,52–54,51
Symbolic regression | AP | Expression tree after traversing a graph | Grid ACP | — | 46
Symbolic regression | AP | Expression tree after traversing a graph | DAP | GP | 13,55
Symbolic regression | AP | Path over derivation tree using CFG | GAP | EGAP | 63
Symbolic regression | AP | Path over derivation tree using CFG | EGAP | GP, GAP | 63,44
Symbolic regression | AP | Linear array | CAP | — | 81
Symbolic regression | AP | Linear array | CGP-ACO | CAP | 82
Symbolic regression | PSP | Linear array | GS | GE | 86,87,118,119
Symbolic regression | PSP | Variable-length | GS | GS, GE | 91
Symbolic regression | PSP | Expression tree | TSO | GP, AntTAG | 92
Symbolic regression | PSP | Expression tree | HGPPSO | GP | 94
Symbolic regression | BSP | Expression tree | ABCP | GP | 19
Symbolic regression | BSP | Linear array | GBC | GS, GDE | 17
Symbolic regression | AFSP | Gene expression tree | AFSP | GEP, P-GEP | 106
Symbolic regression | FP | Gene expression tree | GFA | GEP | 112
Boolean functions | AP | Prototype tree | AP | GP | 43
Boolean functions | AP | Prototype tree | Parallel AP genops | AP | 48
Boolean functions | AP | Expression tree after traversing a graph | ACP | — | 46
Boolean functions | AP | Path over derivation tree using CFG | GAP | EGAP | 63
Boolean functions | AP | Path over derivation tree using CFG | EGAP | GP, GAP | 63,44
Boolean functions | PSP | Linear array | GS | GE | 86,87,118
Boolean functions | PSP | Variable-length | GS | GS, GE | 91
Boolean functions | BSP | Linear array | GBC | GS, GDE | 17
Planning | AP | Path over derivation tree using CFG | EGAP | GP, GAP | 63,44
Planning | PSP | Linear array | GS | GE | 86,87,118
Planning | PSP | Variable-length | GS | GS, GE | 91
Planning | PSP | Expression tree | GPSO | GP, Random search | 93
Planning | PSP | Expression tree | TSO | — | 92
Planning | BSP | Linear array | GBC | GS, GDE | 17
Planning | HP | Linear array | GH | GE | 109,110
Optimal control | AP | Expression tree after traversing a graph | ACP | RK | 56,58,14,57
Time series prediction | AP | Prototype tree | AP | ARMA, FuNN, ANFIS | 47,120
Finance | AP | Path over derivation tree using CFG | GAP | — | 61,62,121
Regression | PSP | Linear array | GS+PSO | RPROP, BFGS, MINFINDER | 122
Classification | AP | Path over derivation tree using CFG | GBAP | Ant-Miner, Ant-Miner+, PSO/ACO2, Bojarczuk GP, JRIP, PART | 123,124,15,125
Classification | AP | Path over derivation tree using CFG | MOGBAP | GBAP, Ant-Miner, cAnt-Miner2-MDL, Ant-Miner+, PSO/ACO2, Bojarczuk GP, JRIP, PART | 126,73,125
Classification | AP | Path over derivation tree using CFG | APIC | AdaC2, NN-CS, C-SVM-CS, C4.5-CS, RUS-C4.5, SBC-C4.5, SMOTE-C4.5, SMOTE+TL-C4.5, OVO-C4.5-CS, OVA-C4.5-CS | 76,125
Classification | AP | Path over derivation tree using CFG | GPU-MOGBAP | MOGBAP | 75
Classification | PSP | Linear array | GS | GE | 88–90
Classification | PSP | Linear array | GS+PSO | RPROP, BFGS, MINFINDER | 122
Classification | PSP | Expression tree | TSO | GP, C4.5 | 92
Association rules | AP | Path over derivation tree using CFG | GBAP-ARM | MOGBAP-ARM, G3PARM, NSGA-G3PARM, SPEA-G3PARM, ARMGA, Apriori, FP-Growth | 127,78
Association rules | AP | Path over derivation tree using CFG | MOGBAP-ARM | GBAP-ARM, G3PARM, NSGA-G3PARM, SPEA-G3PARM, ARMGA, Apriori, FP-Growth | 78
Association rules | AP | Path over derivation tree using CFG | GBAP-RARM | Apriori-Inverse, Apriori-Infrequent, ARIMA, MRG-Exp, Rare-G3PARM, MOGBAP-RARM | 79
Association rules | AP | Path over derivation tree using CFG | MOGBAP-RARM | Apriori-Inverse, Apriori-Infrequent, ARIMA, MRG-Exp, Rare-G3PARM, GBAP-RARM | 79
Biology | PSP | Linear array | GS | GE | 89,90

AP, ant programming; ACP, ant colony programming; PSP, particle swarm programming; BSP, bee swarm programming; AFSP, artificial fish swarm programming; FP, firefly programming; HP, herd programming; GAP, generalized ant programming; EGAP, enhanced generalized ant programming; CAP, cartesian AP; GS, grammatical swarm; TSO, tree swarm optimization; ABCP, artificial bee colony programming; GBC, grammatical bee colony; GFA, geometric firefly algorithm; GE, grammatical evolution; GBAP, grammar-based ant programming; MOGBAP, multi-objective grammar-based ant programming; APIC, AP for imbalanced classification; TAG, tree-adjoining grammar; ACO, ant colony optimization; PSO, particle swarm optimization; GPSO, geometric PSO; G3P, grammar-guided GP; GEP, gene expression programming; GP, genetic programming; DAP, dynamic ant programming; GDE, grammatical differential evolution; ARM, association rule mining; RARM, rare association rule mining.


connected weights of ANNs,132 etc. However, to our knowledge, only one proposal exists in the other direction, presented by Hara et al.,48 where genetic operators were used to change the structure of individuals in AP. It would be interesting to explore the combination of SP with other techniques to improve their performance in specific domains, as combining concepts from several metaheuristics makes it possible to take advantage of the strengths of each one, as has been demonstrated with applications in several fields.

CONCLUDING REMARKS

Since its inception, SI has increasingly attracted the attention of the artificial intelligence research community, having applications in many domains. In this paper, the state-of-the-art of those automatic programming methods that use an SI algorithm as the search technique has been reviewed, introducing a unified presentation under the name of swarm programming (SP). Although SP makes up a small field inside SI, the number of publications is growing significantly, and more attention is being devoted to the development of automatic programming algorithms based on SI. The SP metaheuristics presented to date include applications in many domains, such as symbolic regression, finance, medicine, industry, and DM. In addition, new SI techniques are emerging, which leads us to think that, in the near future, other SP metaheuristics related to these new SI techniques will appear.

This survey has presented and introduced the existing SP proposals, classifying them into six metaheuristics depending on their base SI algorithm. The encoding schemes explored by each SP metaheuristic have been summarized, presenting the particularities of each SP algorithm. A taxonomy of the application areas, grouped by metaheuristic and encoding scheme used, has been presented, which can be useful for researchers to position their new proposals in the existing literature for comparison purposes. The survey has shown that SP metaheuristics, although incipient, are promising candidates for application in many different domains.

The field is still growing rapidly, with many important directions currently being explored. Thus, we have outlined some open questions and challenges that can inspire further research in these areas in the near future.

ACKNOWLEDGMENTS

The authors thank Dr Carlos García-Martínez for his valuable comments and suggestions to improve this manuscript. This work was supported by the Spanish Ministry of Science and Technology projects, TIN-2011-22408, and FEDER funds.

REFERENCES

1. Floreano D, Mattiussi C. Bio-Inspired Artificial Intelligence: Theories, Methods, and Technologies. The MIT Press, Cambridge, MA, USA; 2008.

2. Yang XS, Cui Z, Xiao R, Gandomi AH, Karamanoglu M, eds. Swarm Intelligence and Bio-Inspired Computation. Theory and Applications. Elsevier; 2013. doi: 10.1016/B978-0-12-405163-8.00020-X.

3. Zhang G. Neural networks for classification: a survey. IEEE Trans Syst Man Cybern C Appl Rev 2000, 30:451–462. doi: 10.1109/5326.897072.

4. Eiben AE, Smith JE. Introduction to Evolutionary Computing. Natural Computing Series. 2nd ed. Springer; 2007.

5. Zheng J, Chen Y, Zhang W. A survey of artificial immune applications. Artif Intell Rev 2010, 34:19–34. doi: 10.1007/s10462-010-9159-9.

6. Bonabeau E, Theraulaz G, Dorigo M. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, New York, NY, USA; 1999.

7. Dorigo M, Birattari M. Swarm intelligence. Scholarpedia 2007, 2:1462.

8. Kennedy J. Particle swarm optimization. In: Sammut C, Webb G, eds. Encyclopedia of Machine Learning. Springer; 2010, 760–766. doi: 10.1007/978-0-387-30164-8_630.

9. Dorigo M, Gambardella L. Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Trans Evol Comput 1997, 1:53–66.

10. Karaboga D, Akay B. A survey: algorithms simulating bee swarm intelligence. Artif Intell Rev 2009, 31:61–85. doi: 10.1007/s10462-009-9127-4.

11. Yang XS. Swarm-based metaheuristic algorithms and no-free-lunch theorems. In: Theory and New Applications of Swarm Intelligence. InTech; 2012, 1–16. doi: 10.5772/30852.

12. Koza JR. Genetic Programming: On the Programming of Computers by Means of Natural Selection. The MIT Press, Cambridge, MA, USA; 1992.


13. Shirakawa S, Ogino S, Nagao T. Dynamic ant programming for automatic construction of programs. IEEJ Trans Electr Electron Eng 2008, 3:540–548. doi: 10.1002/tee.20311.

14. Kumaresan N. Optimal control for stochastic linear quadratic singular periodic neuro Takagi-Sugeno (T-S) fuzzy system with singular cost using ant colony programming. Appl Math Modell 2011, 35:3797–3808. doi: 10.1016/j.apm.2011.02.017.

15. Olmo JL, Romero JR, Ventura S. Using ant programming guided by grammar for building rule-based classifiers. IEEE Trans Syst Man Cybern B Cybern 2011, 41:1585–1599. doi: 10.1109/TSMCB.2011.2157681.

16. Karaboga D, Ozturk C, Karaboga N, Gorkemli B. Artificial bee colony programming for symbolic regression. Inform Sci 2012, 209:1–15. doi: 10.1016/j.ins.2012.05.002.

17. Si T, De A, Bhattacharjee A. Grammatical bee colony. In: Panigrahi B, Suganthan P, Das S, Dash S, eds. Swarm, Evolutionary, and Memetic Computing. LNCS, vol. 8297. Springer; 2013, 436–445. doi: 10.1007/978-3-319-03753-0_39.

18. Parpinelli RS, Lopes HS. New inspirations in swarm intelligence: a survey. Int J Bio-Inspired Comput 2011, 3:1–16. doi: 10.1504/IJBIC.2011.038700.

19. Karaboga D, Gorkemli B, Ozturk C, Karaboga N. A comprehensive survey: artificial bee colony (ABC) algorithm and applications. Artif Intell Rev 2014, 42:21–57. doi: 10.1007/s10462-012-9328-0.

20. Millonas MM. Swarms, phase transitions and collective intelligence. In: Langton C, ed. Artificial Life III. Addison-Wesley, Reading, MA, USA; 1994, 417–445.

21. Garnier S, Gautrais J, Theraulaz G. The biological principles of swarm intelligence. Swarm Intell 2007, 1:3–31. doi: 10.1007/s11721-007-0004-y.

22. Koza JR, Andre D, Bennett FH, Keane MA. Genetic Programming III: Darwinian Invention & Problem Solving. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA; 1999.

23. Abbass HA, Hoai X, Mckay RI. AntTAG: a new method to compose computer programs using colonies of ants. In: IEEE Congress on Evolutionary Computation (IEEE CEC); 2002, 1654–1659. doi: 10.1109/CEC.2002.1004490.

24. Vanneschi L, Castelli M, Silva S. Measuring bloat, overfitting and functional complexity in genetic programming. In: Genetic and Evolutionary Computation Conference (GECCO). ACM; 2010, 877–884. doi: 10.1145/1830483.1830643.

25. O'Neill M, Ryan C. Grammatical Evolution: Evolutionary Automatic Programming in an Arbitrary Language. Kluwer Academic Publishers, Norwell, MA, USA; 2003.

26. Ryan C, Collins J, Neill M. Grammatical evolution: evolving programs for an arbitrary language. In: Genetic Programming. LNCS, vol. 1391. Springer; 1998, 83–96. doi: 10.1007/BFb0055930.

27. Hoai NX, Mckay RI, Abbass HA. Tree adjoining grammars, language bias, and genetic programming. In: European Conference on Genetic Programming (EuroGP). LNCS, vol. 2610. Springer, Berlin/Heidelberg; 2003, 340–349. doi: 10.1007/3-540-36599-0_31.

28. Whigham P. Grammatically biased genetic programming. In: Workshop on Genetic Programming: From Theory to Real-World Applications; 1995, 33–41.

29. Ferreira C. Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence. Springer; 2006.

30. Balázs K, Kóczy LT. Hierarchical-interpolative fuzzy system construction by genetic and bacterial memetic programming approaches. Int J Uncertain Fuzziness Knowl-Based Syst 2012, 20:105–131. doi: 10.1142/S021848851240017X.

31. Botzheim J, Cabrita C, Koczy LT, Ruano AE. Genetic and bacterial programming for B-spline neural networks design. J Adv Comput Intell Intell Inform 2007, 11:220–231.

32. Cabrita C, Botzheim J, Ruano AE, Koczy LT. Design of B-spline neural networks using a bacterial programming approach. In: International Joint Conference on Neural Networks (IJCNN); 2004, 2313–2318. doi: 10.1109/IJCNN.2004.1380987.

33. Nawa N, Furuhashi T. Fuzzy system parameters discovery by bacterial evolutionary algorithm. IEEE Trans Fuzzy Syst 1999, 7:608–616.

34. Passino K. Biomimicry of bacterial foraging for distributed optimization and control. IEEE Control Syst 2002, 22:52–67. doi: 10.1109/MCS.2002.1004010.

35. Johnson CG. Artificial immune system programming for symbolic regression. In: Genetic Programming. LNCS, vol. 2610. Berlin/Heidelberg: Springer; 2003, 345–353. doi: 10.1007/3-540-36599-0_32.

36. Musilek P, Lau A, Reformat M, Wyard-Scott L. Immune programming. Inform Sci 2006, 176:972–1002. doi: 10.1016/j.ins.2005.03.009.

37. Bernardino H, Barbosa H. Grammar-based immune programming. Nat Comput 2011, 10:209–241. doi: 10.1007/s11047-010-9217-x.

38. Wang S, Ma J, He Q. An immune programming-based ranking function discovery approach for effective information retrieval. Expert Syst Appl 2010, 37:5863–5871. doi: 10.1016/j.eswa.2010.02.019.

39. Boussaïd I, Lepagnot J, Siarry P. A survey on optimization metaheuristics. Inform Sci 2013, 237:82–117. doi: 10.1016/j.ins.2013.02.041.

40. Timmis J, Andrews P, Hart E. On artificial immune systems and swarm intelligence. Swarm Intell 2010, 4:247–273. doi: 10.1007/s11721-010-0045-5.

41. Castro LN, Timmis J. Artificial Immune Systems: A New Computational Intelligence Approach. Springer; 2002.


42. Dorigo M, Maniezzo V, Colorni A. Ant system: optimization by a colony of cooperating agents. IEEE Trans Syst Man Cybern B Cybern 1996, 26:29–41. doi: 10.1109/3477.484436.

43. Roux O, Fonlupt C. Ant programming: or how to use ants for automatic programming. In: International Conference on Swarm Intelligence (ANTS); 2000, 121–129.

44. Salehi-Abari A, White T. The uphill battle of ant programming vs. genetic programming. In: International Joint Conference on Computational Intelligence (IJCCI); 2009, 171–176.

45. Kouchakpour P, Zaknich A, Bräuni T. A survey and taxonomy of performance improvement of canonical genetic programming. Knowl Inform Syst 2009, 21:1–39. doi: 10.1007/s10115-008-0184-9.

46. Rojas SA, Bentley PJ. A grid-based ant colony system for automatic program synthesis. In: Deb K, Poli R, Banzhaf W, Beyer H-G, Burke EK, Darwen PJ, Dasgupta D, Floreano D, Foster JA, Harman M, Holland O, Lanzi PL, Spector L, Tettamanzi A, Thierens D, Tyrrell AM, eds. Late Breaking Papers at the Genetic and Evolutionary Computation Conference (GECCO-2004); 2004, 1–12.

47. Chen Y, Yang B, Dong J. Evolving flexible neural networks using ant programming and PSO algorithms. In: Yin F-L, Wang J, Guo C, eds. Advances in Neural Networks. LNCS, vol. 3173. Berlin/Heidelberg: Springer; 2004. doi: 10.1007/978-3-540-28647-9_36.

48. Hara A, Kushida JI, Tanabe S, Takahama T. Parallel ant programming using genetic operators. In: 2013 IEEE Sixth International Workshop on Computational Intelligence Applications (IWCIA); 2013, 75–80. doi: 10.1109/IWCIA.2013.6624788.

49. Boryczka M, Czech ZJ. Solving approximation problems by ant colony programming. In: Genetic and Evolutionary Computation Conference (GECCO); 2002, 39–46.

50. Boryczka M, Czech ZJ, Wieczorek W. Ant colony programming for approximation problems. In: Genetic and Evolutionary Computation Conference (GECCO); 2003, 142–143.

51. Green J, Whalley JL, Johnson CG, et al. Automatic programming with ant colony optimization. In: UK Workshop on Computational Intelligence. Loughborough University; 2004, 70–77.

52. Boryczka M. Eliminating introns in ant colony programming. Fundam Inform 2005, 68:1–19.

53. Boryczka M. Ant colony programming with the candidate list. In: Nguyen N, Jo G, Howlett R, Jain L, eds. Agent and Multi-Agent Systems: Technologies and Applications. LNCS, vol. 4953. Berlin/Heidelberg: Springer; 2008, 302–311. doi: 10.1007/978-3-540-78582-8_31.

54. Boryczka M. Ant colony programming: application of ant colony system to function approximation. In: Chiong R, ed. Intelligent Systems for Automated Learning and Adaptation: Emerging Trends and Applications. Hershey, PA: Information Science Reference; 2010, 248–272. doi: 10.4018/978-1-60566-798-0.ch011.

55. Shirakawa S, Ogino S, Nagao T. Automatic construction of programs using dynamic ant programming. In: Ant Colony Optimization Methods and Applications. InTech; 2011, 75. doi: 10.5772/13786.

56. Kumaresan N. Optimal control for stochastic linear quadratic singular Takagi-Sugeno fuzzy system using ant colony programming. Neural Parallel Sci Comput 2010, 18:89–108.

57. Kumaresan N. Optimal control for stochastic singular integro-differential Takagi-Sugeno fuzzy system using ant colony programming. Filomat 2012, 26:415–426. doi: 10.2298/FIL1203415K.

58. Kumaresan N, Balasubramaniam P. Singular optimal control for stochastic linear quadratic singular system using ant colony programming. Int J Comput Math 2010, 87:3311–3327. doi: 10.1080/00207160903026634.

59. Hoai N, McKay R. A framework for tree adjunct grammar guided genetic programming. In: Post-Graduate ADFA Conference on Computer Science (PACCS); 2001, 93–99.

60. Shan Y, Abbass H, Mckay RI, Essam D. AntTAG: a further study. In: Australia-Japan Joint Workshop on Intelligent and Evolutionary Systems; 2002, 1–8.

61. Keber C, Schuster MG. Option valuation with generalized ant programming. In: Genetic and Evolutionary Computation Conference (GECCO); 2002, 74–81.

62. Keber C, Schuster M. Generalized ant programming in option pricing: determining implied volatilities based on American put options. In: Proceedings of the IEEE International Conference on Computational Intelligence for Financial Engineering (IEEE CIFER); 2003, 123–130. doi: 10.1109/CIFER.2003.1196251.

63. Salehi-Abari A, White T. Enhanced generalized ant programming (EGAP). In: Genetic and Evolutionary Computation Conference (GECCO). ACM; 2008, 111–118. doi: 10.1145/1389095.1389111.

64. Espejo P, Ventura S, Herrera F. A survey on the application of genetic programming to classification. IEEE Trans Syst Man Cybern C Appl Rev 2010, 40:121–144. doi: 10.1109/TSMCC.2009.2033566.

65. Freitas AA. A review of evolutionary algorithms for data mining. In: Maimon O, Rokach L, eds. Soft Computing for Knowledge Discovery and Data Mining. Springer; 2008, 79–111. doi: 10.1007/978-0-387-69935-6_4.

66. Parpinelli R, Lopes H, Freitas A. Data mining with an ant colony optimization algorithm. IEEE Trans Evol Comput 2002, 6:321–332. doi: 10.1109/TEVC.2002.802452.


67. Martens D, De Backer M, Vanthienen J, Snoeck M, Baesens B. Classification with ant colony optimization. IEEE Trans Evol Comput 2007, 11:651–665. doi: 10.1109/TEVC.2006.890229.

68. Otero FEB, Freitas AA, Johnson CG. Handling continuous attributes in ant colony classification algorithms. In: IEEE Symposium on Computational Intelligence and Data Mining (IEEE CIDM); 2009, 225–231.

69. Holden N, Freitas AA. A hybrid PSO/ACO algorithm for discovering classification rules in data mining. J Artif Evol Appl 2008, 2008:2:1–2:11. doi: 10.1155/2008/316145.

70. Bojarczuk CC, Lopes HS, Freitas AA, Michalkiewicz EL. A constrained-syntax genetic programming system for discovering classification rules: application to medical data sets. Artif Intell Med 2004, 30:27–48.

71. Frank E, Witten IH. Generating accurate rule sets without global optimization. In: International Conference on Machine Learning (ICML); 1998, 144–151.

72. Cohen W. Fast effective rule induction. In: International Conference on Machine Learning (ICML); 1995, 115–123.

73. Olmo JL, Romero JR, Ventura S. Classification rule mining using ant programming guided by grammar with multiple Pareto fronts. Soft Comput 2012, 16:2143–2163. doi: 10.1007/s00500-012-0883-8.

74. Cecilia JM, García JM, Nisbet A, Amos M, Ujaldón M. Enhancing data parallelism for ant colony optimization on GPUs. J Parallel Distrib Comput 2013, 73:42–51. doi: 10.1016/j.jpdc.2012.01.002.

75. Cano A, Olmo JL, Ventura S. Parallel multi-objective ant programming for classification using GPUs. J Parallel Distrib Comput 2013, 73:713–728. doi: 10.1016/j.jpdc.2013.01.017.

76. Olmo JL, Cano A, Romero JR, Ventura S. Binary and multiclass imbalanced classification using multi-objective ant programming. In: 2012 12th International Conference on Intelligent Systems Design and Applications (ISDA); 2012, 70–76. doi: 10.1109/ISDA.2012.6416515.

77. Galar M, Fernández A, Barrenechea E, Bustince H, Herrera F. An overview of ensemble methods for binary classifiers in multi-class problems: experimental study on one-vs-one and one-vs-all schemes. Pattern Recog 2011, 44:1761–1776. doi: 10.1016/j.patcog.2011.01.017.

78. Olmo JL, Luna JM, Romero JR, Ventura S. Mining association rules with single and multi-objective grammar guided ant programming. Integr Comput-Aided Eng 2013, 20:217–234. doi: 10.3233/ICA-130430.

79. Olmo JL, Romero JR, Ventura S. Single and multi-objective ant programming for mining interesting rare association rules. Int J Hybrid Intell Syst 2014, 11:197–209. doi: 10.3233/HIS-140195.

80. Miller J, Thomson P. Cartesian genetic programming. In: Poli R, Banzhaf W, Langdon W, Miller J, Nordin P, Fogarty T, eds. Genetic Programming. LNCS, vol. 1802. Berlin/Heidelberg: Springer; 2000, 121–132. doi: 10.1007/978-3-540-46239-2_9.

81. Hara A, Watanabe M, Takahama T. Cartesian ant programming. In: 2011 IEEE International Conference on Systems, Man, and Cybernetics (SMC); 2011, 3161–3166. doi: 10.1109/ICSMC.2011.6084146.

82. Luis S, dos Santos MV. On the evolvability of a hybrid ant colony-cartesian genetic programming methodology. In: EuroGP; 2013, 109–120. doi: 10.1007/978-3-642-37207-0_10.

83. Bullnheimer B, Hartl RF, Strauss C. A new rank based version of the ant system—a computational study. Cent Eur J Oper Res Econ 1997, 7:25–38.

84. Stützle T, Hoos HH. MAX-MIN ant system. Future Gener Comput Syst 2000, 16:889–914.

85. Kennedy J, Eberhart R. Particle swarm optimization. In: IEEE International Conference on Neural Networks (ICNN), vol. 4; 1995, 1942–1948. doi: 10.1109/ICNN.1995.488968.

86. O'Neill M, Brabazon A. Grammatical swarm. In: Genetic and Evolutionary Computation Conference (GECCO); 2004, 163–174.

87. O'Neill M, Brabazon A. Grammatical swarm: the generation of programs by social programming. Nat Comput 2006, 5:443–462. doi: 10.1007/s11047-006-9007-7.

88. O'Neill M, Brabazon A, Adley C. The automatic generation of programs for classification problems with grammatical swarm. In: IEEE Congress on Evolutionary Computation, 2004 (CEC2004), vol. 1; 2004, 104–110. doi: 10.1109/CEC.2004.1330844.

89. Ramstein G, Beaume N, Jacques Y. A grammatical swarm for protein classification. In: 2008 IEEE Congress on Evolutionary Computation (CEC); 2008, 2561–2568. doi: 10.1109/CEC.2008.4631142.

90. Ramstein G, Beaume N, Jacques Y. Detection of remote protein homologs using social programming. In: Abraham A, Hassanien AE, de Carvalho A, eds. Foundations of Computational Intelligence Volume 4. Studies in Computational Intelligence, vol. 204. Springer; 2009, 277–296. doi: 10.1007/978-3-642-01088-0_12.

91. O'Neill M, Leahy F, Brabazon A. Grammatical swarm: a variable-length particle swarm algorithm. In: Swarm Intelligent Systems, Studies in Computational Intelligence. Springer; 2006, 59–74.

92. Veenhuis C, Koppen M, Kruger J, Nickolay B. Tree swarm optimization: an approach to PSO-based tree discovery. In: 2005 IEEE Congress on Evolutionary Computation (CEC), vol. 2; 2005, 1238–1245. doi: 10.1109/CEC.2005.1554832.

93. Togelius J, Nardi RD, Moraglio A. Geometric PSO + GP = particle swarm programming. In: IEEE Congress on Evolutionary Computation; 2008, 3594–3600.



94. Qi F, Ma Y, Liu X, Ji G. A hybrid genetic programming with particle swarm optimization. In: Advances in Swarm Intelligence. LNCS, vol. 7929. Springer; 2013, 11–18. doi: 10.1007/978-3-642-38715-9_2.

95. Wedde H, Farooq M, Zhang Y. BeeHive: an efficient fault-tolerant routing algorithm inspired by honey bee behavior. In: Dorigo M, Birattari M, Blum C, Gambardella L, Mondada F, Stützle T, eds. Ant Colony Optimization and Swarm Intelligence. LNCS, vol. 3172. Springer; 2004, 83–94. doi: 10.1007/978-3-540-28646-2_8.

96. Fathian M, Amiri B, Maroosi A. Application of honey-bee mating optimization algorithm on clustering. Appl Math Comput 2007, 190:1502–1513. doi: 10.1016/j.amc.2007.02.029.

97. Abbass HA, Teo J. A true annealing approach to the marriage in honey-bees optimization algorithm. Int J Comput Intell Appl 2003, 3:199–211.

98. Jung SH. Queen-bee evolution for genetic algorithms. Electron Lett 2003, 39:575–576. doi: 10.1049/el:20030383.

99. Nakrani S, Tovey C. From honeybees to internet servers: biomimicry for distributed management of internet hosting centers. Bioinspir Biomim 2007, 2:182–197.

100. Rao RS, Narasimham SVL, Ramalingaraju M. Optimization of distribution network configuration for loss reduction using artificial bee colony algorithm. Int J Electr Power Energy Syst Eng 2008, 1:116–122.

101. Purnamadjaja AH, Russell RA. Pheromone communication in a robot swarm: necrophoric bee behaviour and its replication. Robotica 2005, 23:731–742. doi: 10.1017/S0263574704001225.

102. Lemmens N, de Jong S, Tuyls K, Nowé A. Bee behaviour in multi-agent systems. In: Tuyls K, Nowé A, Guessoum Z, Kudenko D, eds. Adaptive Agents and Multi-Agent Systems III. Adaptation and Multi-Agent Learning. LNCS, vol. 4865. Springer; 2008, 145–156. doi: 10.1007/978-3-540-77949-0_11.

103. Gutierrez R, Huhns M. Multiagent-based fault tolerance management for robustness. In: Schuster A, ed. Robust Intelligent Systems. LNCS. Springer; 2008, 23–41. doi: 10.1007/978-1-84800-261-6_2.

104. Li X, Shao Z, Qian J. An optimizing method based on autonomous animals: fish-swarm algorithm. Syst Eng Theory Pract 2002, 22:32–38.

105. Neshat M, Sepidnam G, Sargolzaei M, Toosi A. Artificial fish swarm algorithm: a survey of the state-of-the-art, hybridization, combinatorial and indicative applications. Artif Intell Rev 2012:1–33. doi: 10.1007/s10462-012-9342-2.

106. Liu Q, Odaka T, Kuroiwa J, Ogura H. Application of an artificial fish swarm algorithm in symbolic regression. IEICE Trans Inf Syst 2013, E96-D:872–895.

107. Gandomi A, Yang XS, Alavi A, Talatahari S. Bat algorithm for constrained optimization tasks. Neural Comput Appl 2013, 22:1239–1255. doi: 10.1007/s00521-012-1028-9.

108. Yang C, Tu X, Chen J. Algorithm of marriage in honey bees optimization based on the wolf pack search. In: International Conference on Intelligent Pervasive Computing (IPC); 2007, 462–467. doi: 10.1109/IPC.2007.104.

109. Headleand C, Teahan W. Grammatical herding. J Comput Sci Syst Biol 2013, 6:43–47. doi: 10.4172/jcsb.1000099.

110. Headleand C. Swarm based population seeding of grammatical evolution. J Comput Sci Syst Biol 2013, 6:132–135. doi: 10.4172/jcsb.1000110.

111. Yang XS. Firefly algorithms for multimodal optimization. In: Watanabe O, Zeugmann T, eds. Stochastic Algorithms: Foundations and Applications. LNCS, vol. 5792. Springer; 2009, 169–178. doi: 10.1007/978-3-642-04944-6_14.

112. Husselmann A, Hawick K. Geometric firefly algorithms on graphical processing units. In: Yang XS, ed. Cuckoo Search and Firefly Algorithm. Studies in Computational Intelligence, vol. 516. Springer; 2014, 245–269. doi: 10.1007/978-3-319-02141-6_12.

113. Gandomi AH, Alavi AH. Krill herd: a new bio-inspired optimization algorithm. Commun Nonlinear Sci Numer Simul 2012, 17:4831–4845. doi: 10.1016/j.cnsns.2012.05.010.

114. Gheraibia Y, Moussaoui A. Penguins search optimization algorithm (PeSOA). In: Recent Trends in Applied Artificial Intelligence. LNCS, vol. 7906. Springer; 2013, 222–231. doi: 10.1007/978-3-642-38577-3_23.

115. Cuevas E, Cienfuegos M. A new algorithm inspired in the behavior of the social-spider for constrained optimization. Expert Syst Appl 2014, 41:412–425. doi: 10.1016/j.eswa.2013.07.067.

116. Pappa G, Ochoa G, Hyde M, Freitas A, Woodward J, Swan J. Contrasting meta-learning and hyper-heuristic research: the role of evolutionary algorithms. Genet Program Evolvable Mach 2014, 15:3–35. doi: 10.1007/s10710-013-9186-9.

117. Boryczka M. Ant colony programming for approximation problems. In: Intelligent Information Systems. Advances in Soft Computing, vol. 17. Springer; 2002, 147–156. doi: 10.1007/978-3-7908-1777-5_15.

118. Leahy F. Social programming: investigations in grammatical swarm. MSc Thesis, University of Limerick, Ireland, 2005.

119. Si T, De A, Bhattacharjee A. Grammatical swarm based-adaptable velocity update equations in particle swarm optimizer. In: Satapathy SC, Udgata SK, Biswal BN, eds. Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2013. Advances in Intelligent Systems and Computing, vol. 247. Springer; 2014, 197–206. doi: 10.1007/978-3-319-02931-3_24.


120. Chen Y, Dong J, Yang B. Automatic design of hierarchical TS-FS model using ant programming and PSO algorithm. In: Bussler C, Fensel D, eds. Artificial Intelligence: Methodology, Systems, and Applications. LNCS, vol. 3192. Springer; 2004, 285–294. doi: 10.1007/978-3-540-30106-6_29.

121. Keber C, Schuster M. Collective intelligence in option pricing: determining Black-Scholes implied volatilities with generalized ant programming. In: World Automation Congress, vol. 17; 2004, 465–470.

122. de Mingo López LF, Blas NG, Arteta A. The optimal combination: grammatical swarm, particle swarm optimization and neural networks. J Comput Sci 2012, 3:46–55. doi: 10.1016/j.jocs.2011.12.005.

123. Olmo JL, Luna JM, Romero JR, Ventura S. An automatic programming ACO-based algorithm for classification rule mining. In: Trends in Practical Applications of Agents and Multiagent Systems. LNAI. Springer; 2010, 649–656. doi: 10.1007/978-3-642-12433-4_76.

124. Olmo JL, Romero JR, Ventura S. A grammar based ant programming algorithm for mining classification rules. In: 2010 IEEE Congress on Evolutionary Computation (CEC); 2010, 1–8. doi: 10.1109/CEC.2010.5586492.

125. Olmo JL, Romero JR, Ventura S. Ant programming algorithms for classification. In: Alam S, Dobbie G, Koh YS, ur Rehman S, eds. Biologically-Inspired Techniques for Knowledge Discovery and Data Mining. IGI Global, Hershey, PA, USA; 2014, 107–128. doi: 10.4018/978-1-4666-6078-6.ch005.

126. Olmo JL, Romero JR, Ventura S. Multi-objective ant programming for mining classification rules. In: Moraglio A, Silva S, Krawiec K, Machado P, Cotta C, eds. Genetic Programming. LNCS, vol. 7244. Berlin/Heidelberg: Springer; 2012, 146–157. doi: 10.1007/978-3-642-29139-5_13.

127. Olmo JL, Luna JM, Romero JR, Ventura S. Association rule mining using a multi-objective grammar-based ant programming algorithm. In: 2011 11th International Conference on Intelligent Systems Design and Applications (ISDA); 2011, 971–977. doi: 10.1109/ISDA.2011.6121784.

128. Martens D, Baesens B, Fawcett T. Editorial survey: swarm intelligence for data mining. Mach Learn 2011, 82:1–42. doi: 10.1007/s10994-010-5216-5.

129. Darabos C, Giacobini M, Hu T, Moore J. Lévy-flight genetic programming: towards a new mutation paradigm. In: Giacobini M, Vanneschi L, Bush W, eds. Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. LNCS, vol. 7246. Springer; 2012, 38–49. doi: 10.1007/978-3-642-29066-4_4.

130. White T, Salehi-Abari A. A swarm-based crossover operator for genetic programming. In: Genetic and Evolutionary Computation Conference (GECCO). ACM; 2008, 1345–1346. doi: 10.1145/1389095.1389356.

131. Jabeen H, Baig AR. GPSO: a framework for optimization of genetic programming classifier expressions for binary classification using particle swarm optimization. Int J Innovative Comput Inf Control 2012, 8:233–242.

132. Bouaziz S, Dhahri H, Alimi AM, Abraham A. A hybrid learning algorithm for evolving flexible beta basis function neural tree model. Neurocomputing 2013, 117:107–117. doi: 10.1016/j.neucom.2013.01.024.
