
Transcript of Customizable FPGA IP core implementation of a general-purpose genetic algorithm engine

IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 14, NO. 1, FEBRUARY 2010

Customizable FPGA IP Core Implementation of a General-Purpose Genetic Algorithm Engine

Pradeep R. Fernando, Student Member, IEEE, Srinivas Katkoori, Senior Member, IEEE, Didier Keymeulen, Ricardo Zebulum, and Adrian Stoica, Senior Member, IEEE

Abstract— Hardware implementation of genetic algorithms (GAs) is gaining importance because of their proven effectiveness as optimization engines for real-time applications (e.g., evolvable hardware). Earlier hardware implementations suffer from major drawbacks such as absence of GA parameter programmability, rigid predefined system architecture, and lack of support for multiple fitness functions. In this paper, we report the design of an IP core that implements a general-purpose GA engine that addresses these problems. Specifically, the proposed GA IP core can be customized in terms of the population size, number of generations, crossover and mutation rates, random number generator seed, and the fitness function. It has been successfully synthesized and verified on a Xilinx Virtex II Pro field programmable gate array device (xc2vp30-7ff896) with only 13% logic slice utilization, 1% block memory utilization for GA memory, and a clock speed of 50 MHz. The GA core has been used as a search engine for real-time adaptive healing but can be tailored to any given application by interfacing with the appropriate application-specific fitness evaluation module as well as the required storage memory and by programming the values of the desired GA parameters. The core is soft in nature, i.e., a gate-level netlist is provided which can be readily integrated with the user's system. The performance of the GA core was tested using standard optimization test functions. In the hardware experiments, the proposed core either found the globally optimal solution or found a solution that was within 3.7% of the value of the globally optimal solution. The experimental test setup including the GA core achieved a speedup of around 5.16× over an analogous software implementation.

Index Terms— Evolvable hardware, field programmable gate arrays, genetic algorithm, IP core.

Manuscript received August 8, 2008; revised January 4, 2009 and March 26, 2009; accepted April 27, 2009. First version published October 30, 2009; current version published January 29, 2010. This work was partially supported by NASA/JPL, funded Project No. 21081010, entitled Self-Reconfigurable Electronics for Extreme Environments (SRE-EE), performed during 2006–2007. S. Katkoori's research sponsors include the National Science Foundation, Honeywell, NASA JPL, and the Florida I4 High Tech Corridor Initiative.

P. R. Fernando and S. Katkoori are with the Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620 USA (e-mail: [email protected]; [email protected]).

D. Keymeulen, R. Zebulum, and A. Stoica are with the Jet Propulsion Laboratory, Pasadena, CA 91109 USA (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TEVC.2009.2025032

I. INTRODUCTION

A GENETIC algorithm (GA) [1], [2] is a stochastic optimization algorithm modeled on the theory of natural evolution. Since originally proposed by Holland [1], GAs have been successfully used for handling a wide variety of problems, including NP-complete problems such as the traveling salesman problem [3], real-time problems such as reconfiguration of evolvable hardware (EHW) [4], and other optimization problems that have complex constraints [2]. GAs have been shown to be a robust search mechanism that can be used to explore multiple regions of the problem's solution space concurrently, at the expense of significant runtimes. A natural solution to speed up the runtime of a GA is to implement the GA in hardware.

Another advantage of a hardware implementation of a GA is the elimination of the need for complex time- and resource-consuming communication protocols needed by an equivalent software implementation to interface with the main application. This is particularly advantageous to real-time applications such as reconfiguration of EHW. Moreover, with the rapidly increasing field programmable gate array (FPGA) logic density, efficiently designed hardware GAs can be implemented on a single FPGA in addition to the target application, resulting in a less bulky apparatus.

In this paper, a robust parameterized GA IP core is proposed that is readily synthesizable using standard FPGA design tools at both the register transfer (RT) level and gate level. The purpose of this programmable IP core is to be integrated with any application that requires a search engine; it can also be implemented in a system-on-a-chip configuration. The architecture of the IP core makes it extremely flexible and easy to integrate with target applications, allowing seamless integration of user-defined blocks such as fitness function modules. The core has been implemented on a Xilinx Virtex2Pro FPGA device (xc2vp30-7ff896). It has a very small footprint (only 13% slice utilization) and runs at a high speed (50 MHz). This IP core is highly suitable for EHW [24] applications. It was one of the building blocks of the self-reconfigurable analog array architecture and was used to compensate extreme temperature effects on VLSI electronics [34], [35].

More specifically, the novel contributions of this paper are that the proposed GA IP core:

1) is available at different design levels, RT-level and gate-level, to provide the end-user flexibility in choosing the level at which to include the GA core;

2) can be used as a hard IP that can switch between eight different user-defined fitness functions without the need for re-synthesis of the entire design;

3) allows programming of the initial seed for the random number generator (RNG), enabling different convergence characteristics to be obtained for the same GA parameter settings;



4) provides PRESET modes, allowing the user to experiment readily with a varied set of predefined GA parameter settings; and

5) can be directly implemented as a digital ASIC using standard ASIC design tools, with simple scan chain testability built into the core and with basic fault tolerance in the form of PRESET modes to bypass parameter initialization failure in an ASIC implementation.

In addition, the proposed GA IP core has several highly desirable features, as given below.

1) Programmability—values of important GA parameters, including population size, number of generations, crossover rate, and mutation rate, can be programmed to accommodate the requirements of a wide variety of applications.

2) High Probability of Convergence to Optimal Solution—an elitist GA model is used, which has been shown to have the ability to converge to the global optimum [17].

3) Easy Interfacing—simple two-way handshaking protocols for interfacing with the user-defined fitness evaluation module and the target application ensure ease of integration and ease of use.

The rest of the paper is organized as follows. Section II gives a brief introduction to GAs, reviews the various architectures and hardware implementations of GAs proposed in the literature, and gives a brief introduction to EHW and its applications in space systems. Section III describes in detail the design methodology and the hardware implementation details of the GA core. Section IV reports the simulations and experiments conducted at various design levels to verify the functionality of the GA and demonstrate its effectiveness. Section V concludes with a brief summary of the features and advantages of the proposed GA IP core and discusses future directions such as ASIC development.

II. BACKGROUND AND RELATED WORK

A. GAs

A GA [1], [2] is a robust stochastic optimization technique that is based on the principle of survival of the fittest in nature. GAs are attractive because, unlike simulated annealing, which considers a single solution at a time, GAs maintain a population of solutions or individuals throughout the optimization process.

A GA begins its optimization cycle with an initial population of randomly generated individuals. Each individual is an encoding of a solution to the problem under consideration. Each individual is assigned a fitness measure that denotes the quality of the solution that it encodes. GAs evolve the population over generations with the use of operators such as selection, crossover, and mutation. In each generation, highly fit individuals from the current generation are selected as parents to produce new individuals for the next generation. Genetic information encoded in the parents' representations is mixed to form one or more offspring using an operator called crossover. The offspring produced by the crossover operator replace the individuals with the worst fitness in the current population. This ensures that the average fitness of the population increases over generations, as the fitter individuals will survive through the generations. To prevent premature convergence to the best individual in the initial population, an operator called mutation is applied that introduces random changes into the population. After a prespecified number of generations, the best individual found in the entire genetic optimization cycle is output as the solution to the problem.
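As a concrete illustration of this cycle, the following minimal Python sketch implements the same loop (random initial population, fitness-proportionate selection, single-point crossover, single-bit mutation, and elitism). All parameter values and the placeholder fitness function are illustrative assumptions, not details of the proposed core.

```python
import random

POP_SIZE, N_GENS, CHROM_BITS = 32, 64, 16      # illustrative settings
XOVER_RATE, MUT_RATE = 0.75, 0.0625

def fitness(c):
    return c                                   # placeholder: maximize the raw value

def select(pop):
    # fitness-proportionate (roulette wheel) selection
    total = sum(fitness(c) for c in pop)
    r = random.uniform(0, total)
    acc = 0
    for c in pop:
        acc += fitness(c)
        if acc >= r:
            return c
    return pop[-1]

pop = [random.getrandbits(CHROM_BITS) for _ in range(POP_SIZE)]
for gen in range(N_GENS):
    elite = max(pop, key=fitness)              # elitism: best individual survives
    new_pop = [elite]
    while len(new_pop) < POP_SIZE:
        p1, p2 = select(pop), select(pop)
        child = p1
        if random.random() < XOVER_RATE:       # single-point crossover
            cut = random.randrange(1, CHROM_BITS)
            mask = (1 << cut) - 1
            child = (p1 & mask) | (p2 & ~mask)
        if random.random() < MUT_RATE:         # single-bit mutation
            child ^= 1 << random.randrange(CHROM_BITS)
        new_pop.append(child & 0xFFFF)
    pop = new_pop
print(max(pop, key=fitness))                   # best individual found
```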

B. Previous Hardware Implementations of GAs

There have been many hardware implementations of both general-purpose [5]–[10] and application-specific [9] GAs. This section reviews previous FPGA implementations of general-purpose GAs. Table I summarizes the existing works on FPGA implementations of general-purpose GAs.

Several application-specific hardware implementations of GAs [16] exist in the literature. The application-specific GA implementations are tailored to the particular application in terms of chromosome encoding, crossover, and mutation operation. These implementations will not be discussed here, as they cannot be reused with other applications even for prototyping purposes.

The first FPGA implementation of a general-purpose GA engine was proposed by Scott et al. [5], who described a modular hardware implementation of a simple GA that used roulette wheel selection and one-point crossover and had a fixed population size of 16 and member width of 3 bits. The GA was broken into simpler modules and each module was described using behavioral Very High Speed Integrated Circuits Hardware Description Language (VHDL). The overall GA design was implemented on multiple Xilinx FPGAs on a BORG board. The main goal of that work was to illustrate the issues of hardware implementation of a general-purpose GA.

Tommiska and Vuori [6] implemented a general-purpose GA system with round-robin parent selection and one-point crossover and used a fixed population size of 32. The GA was implemented on Altera FPGAs mounted on Peripheral Component Interconnect (PCI) cards and connected to the host computer's CPU using high-performance PCI buses. Experimentation on various fitness functions involved rewriting the Altera Hardware Description Language (AHDL) code and reprogramming the FPGAs.

Shackleford et al. [7] implemented a survival-based, steady-state GA in VHDL to achieve higher performance and tested it on set-covering and protein-folding problems. The prototype GA machine used for the set-cover problem was designed using the Tsutsuji logic synthesis system and implemented on an Aptix AXB-MP3 field programmable circuit board populated with six FPGAs.

Yoshida et al. [8] implemented a GA processor with a steady-state architecture that supports efficient pipelining and a simplified tournament selection.

Tang and Yip [9] implemented a PCI-based hardware GA system using two Altera FPGAs mounted on a PCI board. The PCI-based GA system has multiple crossover and mutation operators implemented with programmable crossover and mutation thresholds. Tang and Yip also discussed different parallel implementations of the PCI-based GA system.

In contrast to the simple GA, Aportewan et al. [10] implemented a compact GA in Verilog HDL, as it is more amenable toward a hardware implementation. But compact GAs suffer from a severe limitation: their convergence to the optimal solution is guaranteed only for the class of applications that possess tightly coded nonoverlapping building blocks [10].

TABLE I
REVIEW OF EXISTING LITERATURE ON FPGA IMPLEMENTATIONS OF GENERAL-PURPOSE GAS
("N/A": not applicable, "—": unknown)

| Work | Elitist | Pop. size | No. gens | Selection | Crossover/mutation rates | Crossover operators | RNG/seed | Preset modes | Initialize mode | FPGA platform |
|------|---------|-----------|----------|-----------|--------------------------|---------------------|----------|--------------|-----------------|---------------|
| [5] | N | Fixed (16) | Fixed | Roulette | Fixed | 1-point | CA/fixed | None | None | BORG board |
| [6] | N | Fixed (32) | Fixed | Round robin | Fixed | 1-point | LSHR/fixed | None | None | Altera |
| [7] | N | Fixed | Fixed | Survival | Fixed | 1-point | CA/fixed | None | None | Aptix |
| [8] | N | 64 or 128 | Fixed | Simplified tourney | — | 1-point | CA/fixed | None | None | SFL (HDL) |
| [9] | — | Prog. | Prog. | Roulette | Prog. | 1-point, 4-point, uniform | Fixed | None | — | PCI card based system |
| [10] | N/A | Fixed (256) | N/A | N/A | N/A | N/A | CA/fixed | None | None | Xilinx Virtex1000 |
| Proposed | Y | Prog. (8-bit) | Prog. (32-bit) | Roulette | Prog. (4-bit) | 1-point | CA/prog. | 3 diff. modes | Separate init. mode (two-way handshake) | Xilinx Virtex2Pro FPGA |

Hardware acceleration techniques such as pipelining and parallel architectures have also been applied to the design of hardware GAs [11]–[13].

The motivations of all previous FPGA implementations fall under one or more of the following categories.

1) Basic Hardware Acceleration—to obtain speedup over the corresponding software implementation [5]–[8]; or

2) Novel GA Templates—to propose a novel GA template or architecture [10] that is more suited for a hardware implementation; or

3) Advanced Hardware Acceleration Techniques—to accelerate a GA using pipelined and/or parallel implementations of GA architectures [11]–[13].

The primary goal of the above efforts is to demonstrate the speedup achievable by a hardware GA implementation. As a result, the prototypes developed in the above implementations suffer from one or more of the following limitations.

1) Lack of Programmability for GA Parameters—Some or all of the GA parameters are fixed. To the best of the authors' knowledge, the only FPGA implementation that has customizable parameters is the GA machine proposed by Tang and Yip [9].

2) Lack of Scalability in Terms of Number of Fitness Functions Supported—A general-purpose GA engine can be used to address many issues that arise within the same application. This requires the ability to switch between different fitness functions. All the previous hardware GA implementations support only a single fitness function: the entire design has to be re-synthesized after including a new fitness function, and a lot of time and design effort has to be spent again to make sure that the new design meets all the design specifications, including timing.

3) Predefined System Architecture/Organization—The architectures of many previous hardware GA implementations are defined on the basis of a specific development environment, imposing serious restrictions on the application using it. For example, in [9], the GA machine can only be implemented on a PCI card that contains two FPGAs, with a local PCI bus providing the communication between the different modules.

The proposed GA IP core overcomes all the above limitations using the following features.

1) Independent Parameter Initialization Phase—The proposed GA core has a separate initialization phase that enables the user to program the desired GA parameters online, including population size, number of generations, crossover threshold, mutation threshold, and the initial seed used by the RNG.

2) Compact IP Core with Simple Communication Interfaces—The proposed GA core does not impose any hardware requirements or system architecture on the user. It can be instantiated within the main application as a drop-in IP module and synthesized along with the application.

3) Support for Multiple Fitness Functions—The proposed core supports up to eight different fitness functions. Some of these fitness functions can be synthesized along with the GA engine and implemented on the same FPGA device. The proposed core also has additional I/O ports that allow the user to add more fitness functions that have been implemented on a second FPGA device or some other external device. The user can then select between the existing internal fitness functions and external fitness functions. This eliminates the need to resynthesize the entire design just to accommodate a new fitness function.

Besides the FPGA implementations, ASIC implementations of GAs [14], [15] have also been proposed to improve the performance of the GA using hardware acceleration. Wakabayashi et al. [14] proposed a GA accelerator (GAA) chip that implements an elitist generational GA with on-the-fly adaptive selection between the two-point and uniform crossover operators. The GAA chip was fabricated using 0.5 µm standard CMOS technology.

Chen et al. [15] developed a GA chip using a 0.18 µm Taiwan Semiconductor Manufacturing Company cell library. They developed a software tool called Smart GA to tailor the GA module to the user specifications. Smart GA is a software application that accepts user values for the programmable GA parameters through a front-end graphical user interface and synthesizes a custom GA netlist using these fixed GA parameter values. A major disadvantage of such customization is that once an ASIC is obtained from a custom netlist, the GA parameters cannot be changed. Any reprogramming of the GA parameters will involve resynthesizing the GA netlist from scratch using the Smart GA application and repeating the physical design process to obtain the ASIC. This is a very critical problem, as, in most cases, the user cannot predict the best GA parameter settings for his/her application. If the current user settings do not offer the best performance, the user then has to resynthesize the entire GA netlist with a new combination of the GA parameter settings and re-design the entire ASIC. The GA IP core proposed in this paper eliminates the need for such resynthesis.

C. Pseudo-Random Number Generation and GA Performance

A GA needs an RNG for many purposes: generating the initial population, deciding whether crossover and mutation should be performed, and determining the cut points used by the crossover and mutation operators.

“True” random numbers can be generated using specialized hardware that extracts the random numbers from a nondeterministic source such as clock jitter in digital circuits [18]. Pseudo-random number generators (PRNGs) use a deterministic algorithm to generate the random numbers. Hence, the sequence of random numbers can be predicted if the initial seed is known. The choice of RNG depends upon the application at hand. Applications that require high security will use true RNGs. Applications that require a quick response and cannot afford the high area overhead of true RNGs will use PRNGs.

The effect of the quality of the RNG on the performance of GAs has been previously studied [19]–[21]. A high-quality RNG is generally characterized by a long period (before repetition of the random numbers), uniformly distributed random numbers, absence of correlations between consecutive numbers, and structural properties such as organization of the numbers in lattices. Meysenburg [19] and Meysenburg and Foster [20] reported little or no improvement in the performance of GAs using good PRNGs over those using poor PRNGs. But later, Cantu-Paz [21] found significant improvements in the performance of a simple binary GA when using a good PRNG. Cantu-Paz found that the quality of the random numbers used to generate the initial population had a major impact on the performance of the GA, while it did not affect the performance of the GA significantly for all the other operations such as crossover and mutation.

The seed used by an RNG influences the sequence of numbers generated. Although RNG characteristics such as cycle length and uniform distribution will remain the same with a different seed, the sequence of random numbers generated will differ. A poorly chosen seed can lead to poor quality of results even for random numbers generated by a good RNG. Guidelines for choosing a good seed can be found in [22]. The performance of algorithms using random numbers has been shown to vary with the RNG seed. Elsner [23] studied the influence of RNG seeds on the performance of four different graph partitioning algorithms. In one particular instance, Elsner observed that the performance worsened by five times on changing the RNG seed.

Theoretically, a good RNG will produce random numbers that are uniformly distributed for a large enough sample size. But due to the time constraints of real-time applications, the distribution of the random numbers actually generated might be nonuniform. This might lead to poor results for a resource-constrained hardware GA. It has been observed by Meysenburg and Foster [20] and Cantu-Paz [21] that poor RNGs can sometimes outperform good RNGs for particular seeds. Thus, a user will have to experimentally determine the RNG seed value for his/her particular application. This is particularly necessary for hardware implementations, where simple RNG implementations are preferred due to tight resource and response time requirements. The proposed IP core is the only hardware GA IP core that allows the user to program the RNG seed, in addition to providing three in-built seeds to select from.

D. Basics of EHW

Space applications are increasingly employing EHW [24] to adapt on-board electronics according to changing environmental conditions. EHW is a class of hardware that adapts its functionality/characteristics using evolutionary algorithms (EAs). There are two major divisions of EHW, namely, extrinsic EHW and intrinsic EHW. Extrinsic EHW refers to hardware that is evolved using software simulations (and behavioral models of the hardware). The best configuration found in the simulations is then downloaded onto the actual hardware. Intrinsic EHW refers to the adaptation and reconfiguration of previously configured hardware because of changes observed or required in the actual hardware.

The growing number of remote space applications has increased the demand for intelligent and adaptive space systems [32]. Thus, intrinsic EHW is becoming popular in space applications and has been targeted for different platforms including ASICs [25]–[28] and specialized platforms [29]. Due to the flexibility and scalability requirements of space applications, most of the existing work on intrinsic EHW has been implemented on FPGAs [37], [38].

Intrinsic EHW can be classified into four different classes based on the location of the reconfigurable hardware and the evolutionary algorithm, as proposed by Lambert et al. [30].

1) PC-Based Intrinsic EHW—In this class, the reconfigurable hardware application is located on an FPGA and the monitoring system is located in the PC. The reconfiguration of the EHW is done from the PC. This system suffers from slow speeds because communications with the PC are typically very slow.

2) Complete Intrinsic EHW—In this class of EHW, both the reconfigurable hardware and the EA are situated on the same FPGA chip. This system yields the best performance, as the communication delays are due to intra-chip wires only. But due to the finite logic resources available on a single FPGA chip, the complexity and number of fitness functions supported are limited.

3) Multichip Intrinsic EHW—The reconfigurable hardware and the EA are located on different FPGA chips. The performance of this system is worse than that of the complete intrinsic EHW, as the communication delays are due to inter-chip wires.

4) Multiboard Intrinsic EHW—The reconfigurable hardware and the EA are located in FPGA chips on different boards. The performance of this system is generally slower than the complete intrinsic and multichip intrinsic EHW classes, as the communication delays are due to inter-board wires.

Although the multichip and multiboard intrinsic EHW implementations suffer from high communication delays, they are useful in applications where the fitness evaluation time dominates the communication time between the FPGA chips. Moreover, these implementations can take advantage of the dynamic reconfiguration features available in FPGAs to provide more flexibility (see [30]).

Although the complete intrinsic EHW implementation (especially on an ASIC) yields the best performance and the smallest form factor, it is not widely adopted, as it suffers from low scalability. When building a complete intrinsic EHW on an ASIC, all the design details have to be fixed at a very early stage, including the different fitness functions that need to be supported. These details cannot be changed later.

The proposed core alleviates this problem by supporting the interfacing of fitness functions housed on other chips (or boards) to the existing system. It can thus realize a hybrid system, as shown in Fig. 5, combining the complete and multichip (or multiboard) intrinsic EHW implementations, and allows the user to select between internal and external fitness functions. Hence, even if the existing system is implemented on an ASIC, new fitness functions can be added externally. It is to be noted that the proposed IP core can realize all classes of intrinsic EHW systems (except PC-based) both on an FPGA and on an ASIC without losing the flexibility of supporting multiple fitness functions.

III. PROPOSED GA FPGA IP CORE

This section describes in detail the design methodology as well as the implementation and interfacing details of the proposed core.

A. Design Methodology

The proposed GA IP core was synthesized using a high-level circuit synthesis design flow, as shown in Fig. 1. High-level synthesis (HLS) is the process of automatically synthesizing an RT-level description of a system from its behavioral description. This process consists of extracting the dataflow graph (DFG) of the system from its behavioral description, scheduling all the operations in the DFG, allocating functional unit resources to handle the various types of DFG operations, binding each operation in the DFG to an allocated resource, and generating control signals to enable correct operation of the synthesized datapath.

Fig. 1. Design flow used by the AUDI HLS tool.

The RT-level description of the GA core was synthesized using the automatic design instantiation (AUDI) HLS tool [39]. The AUDI HLS tool takes as input a behavioral (VHDL) description of the system. The output of the AUDI circuit synthesis tool consists of the following.

1) Datapath—Structural VHDL description of the circuit's datapath in terms of simple components.

2) Controller—Behavioral VHDL description of a finite state machine that provides the control signals to all the datapath components for proper operation.

3) Top-Level Design—Structural VHDL description instantiating the datapath and controller with appropriate port mappings.

The advantages of using an HLS tool such as AUDI are the following.

1) Easy Synthesis Using Different Tools—The synthesized circuit descriptions are structural descriptions that use simple components such as adders, multiplexers, etc. This ensures that these netlists will synthesize easily using tools from many vendors such as Altera and Xilinx.

2) Easy Addition of New Features to Existing Design—The entire circuit can be resynthesized within a few minutes if any changes are made to the behavioral description to include new features in the existing design. Although features such as pipelining cannot be automatically generated by the current version of the AUDI HLS tool, there exist HLS tools such as SEHWA [36] that can automatically generate a pipelined datapath from the behavioral description of a system.

3) Ease of ASIC Implementation—The AUDI HLS tool also contains circuit flattening scripts that can generate a gate-level netlist from the RT-level netlist. This gate-level netlist can be directly used by industry-standard place-and-route tools to generate an ASIC, as shown in Fig. 1.

The proposed GA core implements the GA optimization cycle shown in Fig. 2. The behavior of the GA optimizer was modeled in VHDL and simulated to test its correctness. An RT-level VHDL model of the GA core was synthesized from the behavioral model using the AUDI HLS tool. The RT-level VHDL model was simulated thoroughly to test the correctness of the synthesized netlist. A gate-level Verilog model was then synthesized from the RT-level model using in-house flattening scripts and the Berkeley SIS tool [31], as shown in Fig. 1. The gate-level Verilog model uses simple Boolean gates such as NAND, NOR, AND, OR, XOR, and SCAN_REGISTER. The gate-level Verilog model was also simulated using Cadence NC-Verilog to verify the functionality and timing of the design. The design methodology adopted ensures that the RT-level and the gate-level netlists are completely synthesizable by standard synthesis tools such as the Xilinx ISE tool. Thus, a synthesizable GA FPGA IP core is available to the user at two levels of design abstraction, namely RT-level and gate-level.

Fig. 2. High-level view of the GA optimization cycle.

B. Behavioral Modeling and Implementation Details

In this section, the implementation of the various features of the proposed GA core in the behavioral model is discussed in detail.

1) Initial Population: An initial population of randomly generated individuals is formed. A 16-bit cellular automaton-based PRNG, similar to the implementation in [5], is used to generate all the random numbers required. In each generation, a new population of candidate solutions is generated using the crossover and mutation operators. Elitism is provided by copying the best individual in the current population into the next population.

2) Parent Selection: The parent individuals required for crossover are selected from the current population of individuals using the proportionate selection scheme [2]. In proportionate selection, a threshold fitness value is computed by scaling down the sum of the fitness values of all the individuals in the current population using a random number. A cumulative sum of the fitness values of the individuals in the current population is computed and compared to the threshold fitness value. The individual whose fitness causes the cumulative fitness sum to exceed the scaled fitness threshold is selected as the parent. This ensures that highly fit individuals have a selection probability that is proportional to their fitness. To speed up the computations, the sum of the fitness values of the new population is accumulated as the fitness of each offspring is computed.
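A software sketch of this selection step might look as follows. The 16-bit scaling of the fitness total is an assumption chosen to match the core's 16-bit random numbers, not a documented detail.

```python
def select_parent(fitnesses, rand16):
    """Proportionate selection as described above.

    fitnesses: per-individual fitness values of the current population
    rand16: a 16-bit random number from the RNG module
    """
    total = sum(fitnesses)
    threshold = (rand16 * total) >> 16   # scale the total down by rand16/65536
    acc = 0
    for i, f in enumerate(fitnesses):
        acc += f
        if acc > threshold:              # cumulative sum exceeds the threshold
            return i                     # index of the selected parent
    return len(fitnesses) - 1
```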

3) Crossover: The GA core implements the single-point binary crossover technique [2], illustrated in Fig. 3, to combine parents from the current generation and produce offspring for the next generation. The GA core performs crossover only if the random number generated is less than the specified crossover threshold. Since the 4-bit crossover threshold is user-programmable, the user can control the crossover rate in the GA core. The single-point crossover is implemented by using a bit mask vector that generates the first portion of the offspring from the first parent and the other portion of the offspring from the second parent. A random number n is generated to denote the random cut point on the parents' chromosomes. A mask is then generated with 1s from position 0 to n − 1 and 0s after n. This mask is then used to perform a logical AND operation with the first parent's chromosome to obtain the first part of the offspring chromosome. The mask is then logically inverted, and the inverted mask is used to perform a logical AND operation with the second parent's chromosome to obtain the second part of the offspring chromosome. The crossover operation produces two offspring, as illustrated in Fig. 3.

Fig. 3. Example illustrating single-point crossover.
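The mask-based construction described above can be captured in a few lines of Python; both offspring are shown. The 16-bit width matches the core's chromosome length, while the function itself is only an illustrative model.

```python
def crossover(parent1, parent2, cut):
    """Single-point crossover via a bit mask, as described above.

    cut: random cut point n in 1..15; the mask has 1s at positions 0..n-1.
    """
    mask = (1 << cut) - 1
    child1 = (parent1 & mask) | (parent2 & ~mask & 0xFFFF)
    child2 = (parent2 & mask) | (parent1 & ~mask & 0xFFFF)
    return child1, child2
```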

4) Mutation: The mutation operator in the proposed GA core consists of inverting a single bit in the offspring obtained after crossover. The GA core generates a 4-bit random number and compares it with the programmed mutation threshold to decide if mutation should be performed. If the random number is smaller than the mutation threshold, a random bit mutation is performed. A randomly chosen mutation point dictates the appropriate bit mask to be used in an XOR operation with the candidate solution. This XOR operation essentially flips the bit at the mutation point.
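In code, the mutation decision and the XOR-based bit flip reduce to the following sketch (all names are illustrative):

```python
def mutate(chrom, rand4, mut_threshold, mut_point):
    """Flip one bit if the 4-bit random number falls below the threshold.

    rand4: 4-bit random number; mut_threshold: programmed 4-bit threshold;
    mut_point: randomly chosen bit position in 0..15.
    """
    if rand4 < mut_threshold:
        chrom ^= 1 << mut_point      # XOR with the bit mask flips that bit
    return chrom
```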

5) Fitness Evaluation: The fitness of the resultant offspring is computed using simple two-way handshaking communication between the GA core and the fitness evaluation module. The GA core can handle up to eight different fitness evaluation modules, and the user can select the required fitness evaluation module by providing a 3-bit fitness selection value. The candidate and its fitness are stored in the GA memory as part of the new population. The above cycle of parent selection, crossover, and mutation is repeated in every generation until the new population is completely filled. The GA optimization ends when the generation index is equal to the user-programmed number of generations. Then, the GA core exits the optimization cycle and outputs the best individual found.

TABLE II
PORT INTERFACE OF THE PROPOSED GA CORE

| No. | Port | Input/output | Width in bits | Functionality |
| 1 | reset | I | 1 | System reset |
| 2 | sys_clock | I | 1 | System clock |
| 3 | ga_load | I | 1 | Load GA parameters |
| 4 | index | I | 3 | Index of GA parameter |
| 5 | value | I | 16 | Init. value bus |
| 6 | data_valid | I | 1 | Initialization handshake signal |
| 7 | data_ack | O | 1 | Initialization handshake signal |
| 8 | fit_value | I | 16 | Fitness value |
| 9 | fit_request | O | 1 | Fitness request signal |
| 10 | fit_valid | I | 1 | Fitness value validity signal |
| 11 | candidate | O | 16 | Candidate solution bus |
| 12 | mem_address | O | 8 | GA memory address |
| 13 | mem_data_out | O | 32 | Data to GA memory |
| 14 | mem_wr | O | 1 | GA memory write signal |
| 15 | mem_data_in | I | 32 | Data from GA memory |
| 16 | start_GA | I | 1 | GA start signal |
| 17 | GA_done | O | 1 | GA completion signal |
| 18 | test | I | 1 | Scan chain test enable |
| 19 | scanin | I | 1 | Scan chain input |
| 20 | scanout | O | 1 | Scan chain output |
| 21 | preset | I | 2 | Preset mode selector |
| 22 | rn | I | 16 | Random number |
| 23 | fitfunc_Select | I | 3 | Fitness module select signal |
| 24 | fit_value_ext | I | 16 | Fitness value from external FEM |
| 25 | fit_valid_ext | I | 1 | fit_valid signal from external FEM |

6) Programmable GA Parameters: The proposed GA core has the port interface shown in Table II. The performance and runtime of a GA depend on the GA parameters, namely, population size, number of generations, crossover rate, and mutation rate. Large values for population size and number of generations generally yield the best results at the expense of long runtimes. But if the target application is simple, a few generations and a small population size may suffice to find the best solution. An efficient GA core implementation must have the ability to cater to the needs of individual applications by allowing the user to change these parameters. The crossover and mutation rates that produce the best results in the shortest amount of time also vary with the application. Hence, the proposed GA core also has the capability to program both the crossover and mutation rates. The quality of the random numbers generated for the execution of the genetic operators also has an impact on the performance of the GA. The proposed core therefore allows the user to program the initial seed of the RNG, which enables the user to obtain different sequences of random numbers using the same RNG module.

TABLE III
INDEX VALUES OF THE GA CORE'S PROGRAMMABLE PARAMETERS

| Index | Programmable parameter |
| 0 | Number of generations [15:0] |
| 1 | Number of generations [31:16] |
| 2 | Population size |
| 3 | Crossover rate |
| 4 | Mutation rate |
| 5 | RNG seed |

All the programmable parameters of the GA core must be initialized before it can be used. Initialization of the GA core is done using a simple two-way handshake. The user first asserts the init_GA signal to put the GA core in the initialization mode. Then, all the programmable parameters can be initialized using a simple handshaking process.

Each programmable parameter has an index associated with it, as shown in Table III. The user places the value of the programmable parameter on the fit_value bus and the corresponding index value on the index bus and asserts the data_valid signal. The GA core reads the fit_value bus, decodes the index, and stores the value in the appropriate register. It then asserts the data_ack signal and waits for data_valid to be de-asserted before de-asserting data_ack.
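From the host side, this initialization handshake can be modeled as below. The bus object and its wait_clock method are assumptions standing in for the user's clocked interface logic; the port and index names follow Tables II and III.

```python
PARAM_INDEX = {"n_gens_lo": 0, "n_gens_hi": 1, "pop_size": 2,
               "xover_rate": 3, "mut_rate": 4, "rng_seed": 5}

def program_parameter(bus, name, value):
    """Two-way handshake to program one GA parameter (host-side sketch)."""
    bus.index = PARAM_INDEX[name]          # index of the parameter (Table III)
    bus.fit_value = value & 0xFFFF         # parameter value on the fit_value bus
    bus.data_valid = 1                     # assert data_valid
    while not bus.data_ack:                # core latches value, asserts data_ack
        bus.wait_clock()
    bus.data_valid = 0                     # de-assert data_valid ...
    while bus.data_ack:                    # ... and wait for data_ack to drop
        bus.wait_clock()
```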

7) Interfacing Details of the GA Core: The overall GA optimizer consists of three modules, namely, the GA core, the GA memory, and the RNG. Additionally, the GA core communicates with a fitness evaluation module and the actual application using simple two-way handshaking operations. A typical system setup with the communication between all the modules is shown in Fig. 4.

The GA memory module is a single-port memory module that stores both the individuals and their fitness values. To store an individual and its fitness, the GA core places them on the memory bus and asserts the memory write signal. To read an individual and its fitness, the GA core places the memory address on the address bus and reads the memory contents in the next clock cycle. The GA memory module is synthesized using the dedicated block RAM modules present in the Xilinx Virtex-II Pro device.

The RNG module implements a cellular automaton-based RNG. It is to be noted that the operation of the GA core is independent of the RNG implementation. The initial seed of the RNG module can either be provided by the user or selected from one of three different preset initial seeds. The GA core reads the output register of the RNG module whenever it needs a random number. Based on the number of random bits needed, the GA selects the bits from predefined positions.
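For illustration, a 16-bit cellular automaton PRNG in the spirit of [5] can be modeled as below. The specific rule-30 update is an assumption made for this sketch; the paper only states that a CA-based RNG is used.

```python
def ca_step(state, width=16):
    """One update of a circular rule-30 cellular automaton (illustrative)."""
    nxt = 0
    for i in range(width):
        left = (state >> ((i + 1) % width)) & 1
        mid = (state >> i) & 1
        right = (state >> ((i - 1) % width)) & 1
        nxt |= (left ^ (mid | right)) << i   # rule 30: left XOR (mid OR right)
    return nxt

state = 45890                                # a programmable initial seed
for _ in range(4):
    state = ca_step(state)
    print(f"{state:016b}")
```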

Fig. 4. Typical system showing the communication between the different modules and the GA core (signal numbers are with reference to Table II).

The GA core uses a simple two-way handshaking protocol for its communication with the fitness evaluation module. When the GA core requires the fitness of a candidate solution, it places the individual on the candidate bus and then asserts the fit_request signal. The fitness evaluation module (FEM) of the target application should then read the candidate port and compute the fitness of the individual. The computed fitness is then placed on the fit_value port of the GA core and the fit_valid signal is asserted by the target application. On assertion of the fit_valid signal, the GA core reads the fitness value and de-asserts the fit_request signal. The simplicity of all the interfacing protocols is a major advantage to the user, as it reduces timing issues during implementation of the entire application.
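The FEM side of this protocol can be sketched as a simple polling loop; as before, the bus object and wait_clock are stand-ins for the user's clocked logic.

```python
def fem_loop(bus, fitness_fn):
    """Fitness evaluation module side of the handshake (sketch)."""
    while True:
        while not bus.fit_request:           # wait for a request from the GA core
            bus.wait_clock()
        candidate = bus.candidate            # read the 16-bit candidate solution
        bus.fit_value = fitness_fn(candidate) & 0xFFFF
        bus.fit_valid = 1                    # assert fit_valid with the result
        while bus.fit_request:               # core reads value, drops fit_request
            bus.wait_clock()
        bus.fit_valid = 0
```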

8) Usage Details of the GA Core: The GA core starts its optimization cycle when it receives the start_GA pulse from the target application. If the programmable parameters of the GA have been initialized, it uses these values. Otherwise, the GA core can use one of three available preset modes. During its optimization cycle, the GA core requests fitness computations for candidate individuals from the fitness evaluation module. Once the GA core has computed the best candidate, the candidate is placed on the candidate bus and the GA_done signal is asserted.

C. Design Considerations for ASIC Implementation and Space Applications

The proposed GA core is well suited for ASIC development, as it is available as a Verilog gate-level netlist. A digital ASIC can be implemented from the proposed GA core using industry-standard place-and-route tools, given a standard cell layout library, as shown in Fig. 1.

1) Preset Modes: The proposed GA core has three preset modes, as shown in Table IV. The values of the programmable GA parameters in the preset modes have been set so that the GA core can be used for a varied set of applications without compromising on performance or runtime. The user can select any one of the three different preset modes based on the target application. When the 2-bit preset signal value is set to “00,” the GA core uses the user-programmed values for all the programmable parameters.

TABLE IV
PRESET MODES AVAILABLE IN THE PROPOSED GA CORE

| Mode | Pop. size | No. of gens. | Xover threshold | Mutn. threshold |
| User (00) | < 256 | < 2^32 | 0–15 | 0–15 |
| Preset 01 | 32 | 512 | 12 | 1 |
| Preset 10 | 64 | 1024 | 13 | 2 |
| Preset 11 | 128 | 4096 | 14 | 3 |

The preset modes are useful for the following purposes.

a) Limited fault tolerance in digital ASIC applications—When building a digital ASIC application using the proposed GA core, failure of the GA parameter initialization logic can be tolerated by running the GA core in one of the three preset modes.

b) User experimentation—GAs are used widely to find solutions for problems with unknown characteristics. For such problems and many others, the end-user will have to experimentally determine the best values for all the GA parameter settings. The preset modes in the proposed GA core provide three sets of GA parameter values that can be used as effective starting points for such experimentation, as sketched below.
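The preset decode implied by Table IV can be written out as follows; the dictionary layout is illustrative, with the threshold values taken directly from the table.

```python
PRESETS = {0b01: dict(pop_size=32,  n_gens=512,  xover=12, mutn=1),
           0b10: dict(pop_size=64,  n_gens=1024, xover=13, mutn=2),
           0b11: dict(pop_size=128, n_gens=4096, xover=14, mutn=3)}

def effective_params(preset, programmed):
    """Preset '00' uses the user-programmed values; others use Table IV."""
    return programmed if preset == 0b00 else PRESETS[preset]
```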

2) Scan Chain Testing: The AUDI HLS system can also automatically insert scan chain testability when synthesizing the IP core. The proposed GA core has a scan chain connecting all the registers in the design. A scan chain test can be run on the core by asserting the test signal and feeding the user test pattern into the scanin port. The output of the scan chain can be observed on the scanout port. This scan chain can also be connected to a top-level scan chain in a system-level design.

3) Design Considerations for Space Applications: The increasing number of remote space missions has necessitated autonomous spacecraft that are capable of handling unexpected situations and adapting to new environments [32]. This requires deployment of intrinsically evolvable hardware whose adaptation and reconfiguration are controlled by on-board evolutionary algorithms. In [32], Stoica et al. identify the following characteristics as the most critical in space-oriented EHW.

a) Systems approach to EHW design—The EHW system should help to reconfigure and adapt the higher level application to the changing environment.

b) Flexibility in fitness calculation—The means of computing a candidate solution's fitness value, and the context in which it is computed, need to be considered.

c) Response time—Most space applications are time-critical and must adapt to the changing environment quickly, before serious damage is done to the system and/or the mission itself.

d) Safety—Space systems are very expensive systems that are highly sensitive to even small errors and/or environmental changes. Hence, safety of the space systems is the most critical characteristic, as they can be permanently lost or damaged by the slightest of problems.

The proposed GA core addresses these issues in the following ways.

a) The design of the GA core as a compact drop-in IP module and its capability to be integrated at various design levels enable a systems approach to the design of the EHW application.

b) The proposed GA core supports external fitness functions, as described earlier.

c) By allowing the user to program the number of generations, the runtime of the GA can be controlled. Moreover, the best candidate of every generation is always output to the application, to be used in case of an emergency.

4) Implementation of a Hybrid Intrinsic EHW System: The proposed GA core can be used to implement a hybrid intrinsic EHW system, as shown in Fig. 5. The GA core allows the user to multiplex between internal and external fitness functions. The fitness value (shown in bold in Fig. 5) and the handshaking signals are available to the external fitness function module. The external fitness function, which may be on a different chip or board, can be added later to expand the system functionality. It is possible to support up to seven external fitness functions in addition to the internal fitness function, if the required multiplexers are added in the external hardware or if the external fitness function is housed on a reconfigurable fabric such as an FPGA.

Fig. 5. Implementation of a hybrid EHW system (with internal and external fitness modules) using the proposed GA core.
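The internal/external multiplexing can be pictured with the sketch below. The assignment of fitfunc_Select values to internal versus external modules is an assumption for illustration; the paper only states that the 3-bit select chooses among up to eight modules.

```python
def select_fitness(sel, internal_fems, ext_fit_value, ext_fit_valid):
    """Route the fitness source according to the 3-bit select value (sketch)."""
    if sel in internal_fems:                     # on-chip fitness module
        value, valid = internal_fems[sel]
        return value, valid
    return ext_fit_value, ext_fit_valid          # external FEM on another chip/board
```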

D. Scalability of the GA Core

The synthesized RT-level and gate-level netlists of the proposed GA core support chromosome encodings of length up to 16 bits. In this section, two methods are discussed that allow the proposed GA core to support chromosome lengths of more than 16 bits.

a) Re-synthesis of the GA core netlists—The most efficient method of obtaining a GA core that supports chromosome lengths of more than 16 bits is to modify the behavioral description of the GA core, available in VHDL, and resynthesize the entire netlist using an HLS tool. The entire process of resynthesis using the AUDI HLS tool takes only a few minutes. The only disadvantage of such resynthesis is that the functionality and timing details of the newly synthesized GA core need to be verified using extensive simulations.

b) Supporting longer chromosomes using multiple GA cores—If resynthesis of the entire netlist is undesirable, two or more GA cores, each supporting chromosome lengths of 16 bits, can be used together to support chromosome lengths of more than 16 bits. The following section describes how to obtain a GA core that supports chromosome lengths of up to 32 bits using two instances of the proposed GA core.

1) Implementation of a 32-Bit GA Core Using Two 16-Bit GA Cores: A GA core that supports chromosome lengths of up to 32 bits can be built using two of the proposed GA cores in the setup shown in Fig. 6. Below is a discussion of how each operation of the 32-bit GA core is affected by this setup.

Fig. 6. Implementation of a 32-bit GA core using two 16-bit cores.

a) Initial population—The 32-bit individuals of the initial population are generated by concatenating the 16-bit individuals produced by each of the 16-bit GA cores. As shown in Fig. 6, each of the GA cores has its own RNG that produces random 16-bit vectors, which form the LSB and MSB halves of the 32-bit individuals.

b) Parent selection—If both GA cores were allowed to perform independent parent selection, the resulting 32-bit individual might be the concatenation of 16 least significant bits (LSB) and 16 most significant bits (MSB) that belong to two different individuals! Hence, only GA_Core1 is allowed to perform parent selection, while GA_Core2 is forced to complete its parent selection when a valid parent is chosen by GA_Core1. This is accomplished by using external logic, called scalingLogic_parSel in Fig. 6, to interact with GA_Core2 when it requests the current individual's fitness from the GA memory. Instead of supplying the stored fitness of the individual, the scalingLogic_parSel module supplies a fitness of zero until GA_Core1 has found the parent individual. This ensures that both GA_Core1 and GA_Core2 select the same individual as the parent. Hence, the MSB and LSB passed on to the crossover phase will belong to the same individual, selected as the parent by GA_Core1.

c) Crossover—Both GA cores are allowed to perform independent one-point crossover operations on their 16-bit halves. This corresponds to a three-point crossover operation on the entire 32-bit individual. Since the two crossover operations are independent of each other, the crossover probability of the 32-bit GA core, xovProb32, is given by

xovProb32 = xovProb16(MSB) + xovProb16(LSB) − xovProb16(MSB) · xovProb16(LSB).

The individual crossover probabilities of the two GA cores, xovProb16(MSB) and xovProb16(LSB), should be programmed according to the equation above (see the sketch after this list). It should be noted that since three-point crossover can be more disruptive than one-point crossover, lower crossover probabilities should be used to prevent loss of good schemata.

d) Mutation—Both the GA cores are allowed to performindependent mutation operations. This implies that atmost 2-bits can be flipped in the 32-bit chromosomedue to mutation. Since both mutation operations areindependent of each other, the mutation probability forthe 32-bit GA core, mutProb32, is given by the equationbelow

mutProb32 = mutProb16(MSB) + mutProb16(LSB) − mutProb16(MSB) ∗ mutProb16(LSB).

The individual mutation probabilities of the two GA cores, mutProb16(MSB) and mutProb16(LSB), should be programmed according to the equation above; a sketch of this rate composition also follows the list.

e) Fitness evaluation—The MSB and LSB from the two GA cores are concatenated and sent to the fitness evaluation module. The "fit_request" signal is generated by GA_Core1. The "fit_valid" signal is sent to both GA cores, while the "fitness" value is sent only to GA_Core1.

f) Candidate and fitness storage—The MSB and LSB from the two GA cores are concatenated to form the 32-bit candidate that is stored in the GA memory. The write signal for the GA memory is generated by GA_Core1, and the fitness value is written only by GA_Core1.
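The masking behavior described in item (b) and the rate composition used in items (c) and (d) are compact enough to model in software. First, a minimal C sketch of the scalingLogic_parSel masking, written for illustration only (the function and argument names here are ours, not the core's):

    /* Model of scalingLogic_parSel: GA_Core2 sees a fitness of zero until
       GA_Core1 has committed to a parent, so both cores lock onto the same
       individual. */
    #include <stdbool.h>
    #include <stdint.h>

    uint16_t scaling_logic_parsel(uint16_t stored_fitness,
                                  bool core1_parent_found)
    {
        return core1_parent_found ? stored_fitness : 0;
    }

Second, a C sketch of the crossover/mutation rate composition, including the inverse relation for programming both 16-bit cores with equal rates (the numeric target below is ours for illustration):

    /* Two independent operations fire with probability
       p32 = p_msb + p_lsb - p_msb * p_lsb; with equal per-core rates,
       the inverse is p16 = 1 - sqrt(1 - p32). */
    #include <math.h>
    #include <stdio.h>

    double rate32_from_rate16(double p_msb, double p_lsb)
    {
        return p_msb + p_lsb - p_msb * p_lsb;
    }

    double equal_rate16_for_rate32(double p32)
    {
        return 1.0 - sqrt(1.0 - p32);
    }

    int main(void)
    {
        double target = 0.75;                 /* desired 32-bit rate */
        double p16 = equal_rate16_for_rate32(target);
        printf("p16 = %.4f -> p32 = %.4f\n",  /* prints 0.5000 -> 0.7500 */
               p16, rate32_from_rate16(p16, p16));
        return 0;
    }

For example, programming both 16-bit cores with a rate of 0.5 yields an effective 32-bit rate of 0.75.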


TABLE V
RT-LEVEL SIMULATION RESULTS OBTAINED FOR THE THREE TEST FUNCTIONS (BF6, F2, AND F3) UNDER VARIOUS GA PARAMETER SETTINGS

Test function  Run number  RNG seed  Population size  Crossover threshold  Best fitness value  Best fitness generation  Convergence (gen. num.)
BF6            1           45890     32               10                   4047                1                        8
BF6            2           45890     64               10                   4271                14                       30
BF6            3           10593     32               10                   4271                3                        16
BF6            4           1567      32               10                   4146                2                        26
BF6            5           1567      32               12                   4047                2                        10
F2             6           45890     32               10                   3060                15                       18
F2             7           45890     64               10                   2096                1                        10
F3             8           10593     64               10                   3060                10                       26
F3             9           10593     32               12                   3060                5                        12
F3             10          1567      32               10                   3060                16                       20

IV. EXPERIMENTS AND RESULTS

The GA core was simulated and tested thoroughly at each level of design abstraction (behavioral, RT-level, and gate-level). This section discusses in detail the various experiments conducted at each design level and the simulation and hardware results obtained.

A. RT-Level Simulations

At the RT-level, the GA core was simulated using Cadence NC-Launch to verify the functionality. The effectiveness of the GA core was tested by optimizing the three maximization test functions shown below using various parameter settings. All the experiments use a chromosome length of 16. Hence, all the single-variable experiments have an x-variable range of 0 to 65535 and the two-variable experiments have equal ranges (0 to 255). All the experiments used a mutation rate of 0.0625 and ran for 32 generations. The values of the other GA parameters were varied to obtain different convergence characteristics and are listed in Table V along with the results of the corresponding experiments.

1) Test Function #1 (Binary F6): BF6(x) = ((x^2 + x) ∗ cos(x)/4000000) + 3200.
This is a very difficult test function that has numerous local maxima, as can be seen in Fig. 7, and exactly one global maximum, with a value of 4271 at x = 65522. This is a standard test function used to test the effectiveness of GAs and other optimization algorithms [33].

2) Test Function #2: F2(x, y) = 8 ∗ x − 4 ∗ y + 1020.
This is a simple mini-max test function that has to maximize one variable (x) and minimize the other variable (y) to obtain the optimal objective function value of 3060.

3) Test Function #3: F3(x, y) = 8 ∗ x + 4 ∗ y.
This is a simple maxi-max test function that has to maximize both variables (x and y) to obtain the optimal objective function value of 3060.
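For reference, the three test functions and the chromosome decoding are straightforward to model in software. Below is a C sketch under the stated ranges; the packing order (y in the upper byte) is inferred from the mBF7_2 candidate reported later in Section IV-B (65516 = (FFEC)16 with y = (FF)16 and x = (EC)16), so treat it as our reading rather than a documented interface:

    /* Reference C models of the three RT-level test functions. The
       two-variable functions split the 16-bit chromosome into two
       8-bit genes. */
    #include <math.h>
    #include <stdint.h>

    void decode2(uint16_t chrom, uint8_t *x, uint8_t *y)
    {
        *y = (uint8_t)(chrom >> 8);    /* upper byte: y (inferred packing) */
        *x = (uint8_t)(chrom & 0xFF);  /* lower byte: x                    */
    }

    double bf6(uint16_t x)             /* global maximum 4271 at x = 65522 */
    {
        double xd = (double)x;
        return ((xd * xd + xd) * cos(xd)) / 4000000.0 + 3200.0;
    }

    int f2(uint8_t x, uint8_t y)       /* mini-max, optimum 3060 at (255, 0) */
    {
        return 8 * x - 4 * y + 1020;
    }

    int f3(uint8_t x, uint8_t y)       /* maxi-max, optimum 3060 at (255, 255) */
    {
        return 8 * x + 4 * y;
    }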

The convergence plots for the three test functions under different parameter settings are shown in Figs. 8–12. In the plots, the x-axis denotes the generation number and the y-axis denotes an individual's fitness. Each point P(i, j) is a population member in generation i with a fitness value of j.

Fig. 7. (Zoomed in) Plot of the modified Binary F6 [33] test function. [Figure: BF6(X) versus X.]

For the sake of clarity, the plots show only one of multiple members with the same fitness in any generation. Hence, as the population converges to the best few candidates in the later generations, the number of points decreases. Although many inferior members are present in the initial random population, the final generations contain highly fit individuals and very few inferior individuals (due to mutation).

Table V summarizes the best results obtained for the three test functions under various parameter settings. The "convergence" column indicates the generation number at which the difference in average fitness between the current generation and the next generation falls below 5%. It can be clearly seen that the proposed GA core finds the optimal values for all the test functions. However, the optimum is found only for certain parameter settings, underlining the need for programmability of the GA core's parameters. It can also be seen that the random numbers used by the GA play a vital role in determining the performance. For instance, in run #1 of Table V, the GA core converges prematurely to a local optimum. But when the RNG seed is changed from 45890 to 10593 (run #3), the convergence of the GA is better and the global optimum is found under exactly the same settings for the other parameters.
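The convergence criterion used in Table V can be expressed compactly. Below is a C sketch of our reading of that criterion (the helper name is ours, not the paper's):

    /* First generation g whose average fitness differs from that of
       generation g+1 by less than 5%. */
    #include <math.h>

    int convergence_generation(const double *avg_fitness, int num_gens)
    {
        for (int g = 0; g + 1 < num_gens; g++) {
            double diff = fabs(avg_fitness[g + 1] - avg_fitness[g]);
            if (diff < 0.05 * avg_fitness[g])
                return g;             /* converged at generation g */
        }
        return -1;                    /* no convergence within the run */
    }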


Fig. 8. Convergence plot for the BF6 test function using number of generations = 32 (Run #3 of Table V). [Figure: fitness versus generation.]

Fig. 9. Convergence plot for the BF6 test function with initial seed for RNG = 1567 (Run #4 of Table V). [Figure: fitness versus generation.]

Fig. 11 shows the convergence plot for test function #2, the mini-max objective function. Fig. 12 shows the convergence plot for test function #3, the maxi-max objective function. Figs. 11 and 12 show that small population sizes and few generations are sufficient to solve simple problems. The convergence characteristics for the "hard" Binary F6 test function are shown in Figs. 8–10. The experimental runs show that finding the optimal parameter settings for a difficult problem can be nontrivial and that the initial seed for the RNG module is an important factor in GA convergence. For instance, changing the initial seed for the RNG module from 45890 in run #1 to 10593 in run #3 (while using the same values for all the other programmable parameters) improved the best solution found for the BF6 test function by about 5.5%, while for test function F2 the optimal result was found quickly with the initial RNG seed set at 45890. Thus, it is clear that the optimal GA parameter settings differ widely from function to function, reiterating the need for a customizable GA core.

B. FPGA Implementation Results

The gate-level Verilog netlist of the GA core was synthesized using the Xilinx ISE 10.1i tool and mapped to a Virtex-II Pro (xc2vp30-ff896, speed grade −7) device.

Fig. 10. Convergence plot for the BF6 test function with crossover rate = 0.75 (Run #5 of Table V). [Figure: fitness versus generation.]

Table VI shows the area utilization, clock speed, and block memory utilization for the placed-and-routed GA core on this Xilinx device. Block memory utilization is reported separately, as it is not included in the logic utilization computation. Note that the dedicated block memory in the Xilinx Virtex-II Pro device implements both the GA memory module and the lookup-based fitness evaluation module. The post place-and-route simulation model of the designed GA IP core was extracted and simulated using ModelSim to verify the functionality and timing. The design was then downloaded onto the FPGA device and its functionality was verified using the ChipScope Pro 10.1i tools.

The effectiveness of the GA core was then tested by optimizing the three difficult maximization test functions shown below. For the FPGA experiments, complex test functions have been used to test the effectiveness of the GA core. These functions have been modified to enable easy hardware implementation (floating-point coefficients have been changed so that they can be realized using shifts and adds, etc.).

1) Modified and Scaled Binary F6:

mBF6_2(x) = 4096 + ([(x^2 + x) ∗ cos(x)]/2^20), 0 ≤ x ≤ 65535.

This function is a modified and scaled version of the maximization test function from Haupt and Haupt [33]. It has a single globally optimal solution at x = 65521 with a value of 8183.

2) Modified Binary F7:

mBF7_2(x, y) = 32768 + (56 ∗ [x ∗ sin(4x) + 1.25 ∗ y ∗ sin(2y)]), 0 ≤ x, y ≤ 255.

This function is a modified version of the minimization function BF7 in [33]. It has been modified into a maximization function that has a single optimal solution with a value of 63904 at x = 247 and y = 249.


Fig. 11. Convergence plot for the test function F2 with population size = 32 (Run #6 of Table V). [Figure: fitness versus generation.]

Fig. 12. Convergence plot for the test function F3 with initial seed for RNG = 1567 (Run #10 of Table V). [Figure: fitness versus generation.]

3) Modified 2-D Shubert Function:

mShubert2D(x1, x2) = 65535 − 174 ∗ [150 + ∏(k=1 to 2) Σ(i=1 to 5) i ∗ cos((i + 1) ∗ xk + i)], 0 ≤ x1, x2 ≤ 255.

This function is a minimization function (derived from [15]) modified into a maximization function with a global optimal value of 65535. The function has 48 globally optimal solutions and numerous local maxima.
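A direct C transcription of the formula above can serve as a software reference model (the FPGA experiments evaluate this function through a lookup table rather than in hardware, and any scaling or saturation applied when quantizing the result to 16 bits is not specified in the paper):

    /* Reference model of the modified 2-D Shubert function. */
    #include <math.h>
    #include <stdint.h>

    double mshubert2d(uint8_t x1, uint8_t x2)
    {
        double xs[2] = { (double)x1, (double)x2 };
        double prod = 1.0;
        for (int k = 0; k < 2; k++) {       /* product over the two vars */
            double sum = 0.0;
            for (int i = 1; i <= 5; i++)    /* five-term Shubert sum     */
                sum += i * cos((i + 1) * xs[k] + i);
            prod *= sum;
        }
        return 65535.0 - 174.0 * (150.0 + prod);
    }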

The experimental setup is similar to the one shown in Fig. 4. The Xilinx ISE 10.1i tool achieved a clock speed of 50 MHz for the GA module (GA core, RNG module, and the GA memory). The initialization module and the application (fitness) module are separate entities that communicate with the GA module using handshaking. The Xilinx ISE tool was able to achieve a clock speed of 200 MHz for these modules. A digital clock manager core is used to generate the two clocks from the on-board 100 MHz clock.

The initialization module consists of a simple finite state machine (FSM) that performs the two-way handshaking operation using the "data valid" and "data ack" signals to initialize the various GA parameters one by one.
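The handshake can be modeled cycle by cycle in software. Below is a toy C model of such a two-way valid/ack exchange, written to illustrate the protocol only; the state names and structure are ours, not the RTL's:

    /* Toy cycle-level model of a "data valid"/"data ack" handshake that
       programs n parameters one at a time. */
    #include <stdbool.h>
    #include <stdint.h>

    enum hs_state { HS_IDLE, HS_ASSERT_VALID, HS_WAIT_ACK_LOW };

    struct init_fsm {
        enum hs_state state;   /* current handshake state        */
        unsigned idx;          /* index of the parameter to send */
    };

    /* One clock tick: returns true once all n parameters have been sent. */
    bool init_fsm_tick(struct init_fsm *f, const uint16_t *params, unsigned n,
                       bool data_ack, uint16_t *data_out, bool *data_valid)
    {
        *data_valid = false;                 /* default for this cycle    */
        switch (f->state) {
        case HS_IDLE:
            if (f->idx == n)
                return true;                 /* all parameters programmed */
            f->state = HS_ASSERT_VALID;
            /* fall through: start asserting valid in this cycle */
        case HS_ASSERT_VALID:
            *data_out = params[f->idx];
            *data_valid = true;              /* hold valid until ack seen */
            if (data_ack) {
                *data_valid = false;         /* drop valid after the ack  */
                f->state = HS_WAIT_ACK_LOW;
            }
            break;
        case HS_WAIT_ACK_LOW:
            if (!data_ack) {                 /* ack released: next param  */
                f->idx++;
                f->state = HS_IDLE;
            }
            break;
        }
        return false;
    }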

TABLE VI
POST PLACE-AND-ROUTE STATISTICS FOR THE PROPOSED GA CORE AND FITNESS MODULE ON THE VIRTEX-II PRO DEVICE (XC2VP30-7FF896)

Design attribute                                    Value
Logic utilization (% slices used)                   13%
Clock speed                                         50 MHz
Block memory utilization (GA memory)                1%
Block memory utilization (fitness lookup module)    48%

TABLE VII
BEST FITNESS VALUES OBTAINED BY THE GA FOR THE MBF6_2 FUNCTION FOR DIFFERENT PARAMETER SETTINGS (XR = CROSSOVER RATE)

RNG seed       PopSize = 32         PopSize = 64
(hexadecimal)  XR = 10   XR = 12    XR = 10   XR = 12
2961           7999      7813       7824      7819
061F           6175      7578       8134      8129
B342           7612      7497       7612      7719
AAAA           7534      7534       7578      7864
A0A0           8104      7406       8135      8039
FFFF           7291      7623       7847      7669

The application module contains a simple FSM to perform the two-way handshaking with the GA core and the hardware implementation of the fitness function. A lookup-based implementation has been used for the fitness functions, as this resulted in better operational speed than a combinational implementation. In the lookup-based fitness computation method, block ROMs within the FPGA device are populated with the fitness values corresponding to each solution encoding. This approach is used only to demonstrate the effectiveness of the proposed GA IP core in optimizing difficult maximization test functions without having to implement the actual test functions in hardware.
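One way to produce such a ROM image offline is to sweep all 2^16 encodings through a software model of the fitness function and dump the results. Below is a C sketch for the mBF6_2 function; the output file name, the one-word-per-line format, and the truncation to an integer fitness are our assumptions, as the paper only states that block ROMs hold the fitness of each encoding:

    /* Offline generation of a fitness lookup image for mBF6_2. */
    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    static uint16_t mbf6_2(uint16_t x)  /* optimum 8183 at x = 65521 */
    {
        double xd = (double)x;
        double f = 4096.0 + ((xd * xd + xd) * cos(xd)) / 1048576.0; /* 2^20 */
        return (uint16_t)f;             /* truncation is our assumption */
    }

    int main(void)
    {
        FILE *out = fopen("mbf6_2_rom.mem", "w");  /* hypothetical file name */
        if (!out)
            return 1;
        for (uint32_t code = 0; code <= 0xFFFF; code++)
            fprintf(out, "%04X\n", (unsigned)mbf6_2((uint16_t)code));
        fclose(out);
        return 0;
    }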

The entire experimental setup was implemented on the Xilinx Virtex-II Pro (xc2vp30-7ff896) FPGA device. ChipScope Pro 10.1 tools were used to build cores to observe and record the "best fitness" and "sum of fitness" values for each generation on the FPGA.

The proposed GA core was run with 12 different parameter settings, as shown in Tables VII–IX. Note that the mutation rate and the number of generations were set to 0.0625 and 64, respectively, for all the experiments. The number of generations was set to 64 because the population converged within 64 generations for all three fitness functions.

Table VII shows the results obtained by the GA core for the mBF6_2 test function using different settings for the programmable GA parameters. In the experiments conducted, the best solution found by the proposed GA core for the mBF6_2 test function was 65345. This solution evaluates to a fitness of 8135, which is approximately 0.59% less than the globally optimal fitness value of 8183. Note that the best solution found by the proposed GA core lies within approximately 0.27% of the globally optimal solution in the solution space.


TABLE VIII
BEST FITNESS VALUES OBTAINED BY THE GA FOR THE MBF7_2 FUNCTION FOR DIFFERENT PARAMETER SETTINGS (XR = CROSSOVER RATE)

RNG seed       PopSize = 32         PopSize = 64
(hexadecimal)  XR = 10   XR = 12    XR = 10   XR = 12
2961           56835     56835      48135     56456
061F           59648     53432      59648     60656
B342           55000     59928      59480     57184
AAAA           55560     52704      55000     61496
A0A0           58136     53040      58024     56624
FFFF           60880     61384      56344     60768

TABLE IX
BEST FITNESS VALUES OBTAINED BY THE GA FOR THE SHUBERT FUNCTION FOR DIFFERENT PARAMETER SETTINGS (XR = CROSSOVER RATE)

RNG seed       PopSize = 32         PopSize = 64
(hexadecimal)  XR = 10   XR = 12    XR = 10   XR = 12
2961           56835     56835      48135     56835
061F           56835     55095      65535     58227
B342           56487     56487      54051     63795
AAAA           63795     56487      65535     65535
A0A0           56835     63795      65535     53355
FFFF           53355     65535      48135     56835

Table VIII shows the results obtained by the GA core for the mBF7_2 test function using different settings for the programmable GA parameters. The best candidate found by the proposed GA core for the mBF7_2 test function was 65516, i.e., y = (FF)16 and x = (EC)16, with a fitness of 61496. This is approximately 3.7% less than the globally optimal fitness value of 63904. The best solution found by the proposed GA core lies within 4.3% and 2.35% of the globally optimal solution along the x-direction and y-direction, respectively, of the solution space.

Table IX shows the results obtained by the GA core for the mShubert2D test function using different settings for the programmable GA parameters. The proposed GA core found globally optimal solutions for many different parameter settings, as shown by the entries of 65535 in Table IX. The GA core found two different globally optimal solutions, (x1 = C2, y1 = 4A) and (x2 = DB, y2 = 4A), during the experimental run with RNG seed = (AAAA)16, population size = 64, and crossover threshold = 10.

Figs. 13–16 plot the data collected from the hardware measurements and illustrate the convergence of the GA optimizer for the three test functions. Both the best fitness and the average fitness values are plotted for every generation. It can be seen that the GA core finds the best solution within the first ten generations for all three test functions. From Figs. 13 and 14, it can be seen that the GA core evaluates at most ({10 generations + 1 initial population = 11} × {population size = 64}) = 704 candidate solutions before finding the best solution.

Fig. 13. Convergence plot for the test function mBF6_2(x) with initial RNG seed = (061F)16, crossover threshold = 10, and popSize = 64 (data collected from hardware execution). [Figure: best and average fitness versus generation.]

Fig. 14. Convergence plot for the test function mBF6_2(x) with initial RNG seed = (A0A0)16, crossover threshold = 10, and popSize = 64 (data collected from hardware execution). [Figure: best and average fitness versus generation.]

Although the size of the entire solution space is only 65536, the GA core evaluates less than 1.1% of it before finding the best solution. This is a major speedup over an exhaustive search and is very important for real-time applications and other applications that have time-consuming fitness evaluation procedures, such as EHW.

From Fig. 15, it can be seen that the GA core evaluates at most ({18 generations + 1 initial population = 19} × {population size = 64}) = 1216 candidate solutions before finding the best solution for the mBF7_2 test function.

From Fig. 16, it can be seen that the GA core evaluates at most ({12 generations + 1 initial population = 13} × {population size = 64}) = 832 candidate solutions before finding the best solution for the mShubert2D test function.

Thus, it can be seen that the GA core quickly converges to a good solution after evaluating just a small fraction of the solution space, even for very difficult test functions. The GA is expected to find good solutions quickly due to its population-based search mechanism; it then converges toward the good solutions and tries to find better solutions in their vicinity. However, the GA core finds the optimal solutions only for certain settings of the GA parameters. This reiterates the need for the ability to change the values of these parameters according to the application.


Fig. 15. Convergence plot for the test function mBF7_2(x, y) with initial RNG seed = (AAAA)16, crossover threshold = 12, and popSize = 64 (data collected from hardware execution). [Figure: best and average fitness versus generation.]

Fig. 16. Convergence plot for the test function mShubert2D(x1, x2) with initial RNG seed = (AAAA)16, crossover threshold = 10, and popSize = 64 (data collected from hardware execution). [Figure: best and average fitness versus generation.]

C. Runtime Comparison With Software Implementation

A software implementation of a GA optimizer, similar to the GA optimization algorithm in the IP core, was developed in the C programming language. GAs, when used for hardware applications such as EHW, need to communicate with the application to evaluate the fitness of the candidate solutions. This communication overhead can be effectively modeled using the Xilinx Virtex-II Pro board, as it contains PowerPC processor IP cores that can execute software programs. The experimental setup consists of the GA software running on the PowerPC processor in the Xilinx Virtex-II Pro board and the fitness evaluation module (implemented as the same lookup table using block RAM) on the Xilinx Virtex-II Pro FPGA. This setup gives a fair comparison between the software and hardware implementations, as both are implemented using the same technology node. The runtime, averaged over six runs, of the GA program with a population size of 32, a crossover rate of 0.625, a mutation rate of 0.0625, and running for 32 generations to optimize the modified Binary F6 (mBF6_2) function was 37.615 ms.

In hardware, a 32-bit counter was implemented that is clocked using the 50 MHz clock used for the GA IP core. The GA execution time in hardware was computed as the product of the counter value and the clock period. The hardware GA implementation achieved a speedup of approximately 5.16× over the software implementation.
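For reference, these two figures imply a hardware runtime of roughly 37.615 ms / 5.16 ≈ 7.3 ms for the same GA configuration (our arithmetic from the reported numbers).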

V. CONCLUSION

A readily synthesizable, robust GA core for FPGAs has been designed that is easy to interface and use with a wide range of applications. The efficiency of the designed core is illustrated by the low area utilization and the high clock speed. The effectiveness of the designed GA core is evident from the convergence characteristics obtained for the standard test functions. The gate-level Verilog implementation of the GA core is advantageous in that it can be directly used by commercial layout tools for chip layout generation. The availability of preset modes and scan-chain testability provides some basic fault tolerance to an ASIC designed using the proposed GA core. The proposed core has been used as part of a self-reconfigurable analog array architecture to compensate for extreme temperature effects on VLSI electronics [34], [35]. The GA module, an internal (slew-rate) fitness function, and the other monitoring and controlling modules were implemented as a digital ASIC. The digital ASIC passed all the design rule check, electrical rule check, antenna, and soft-power tests and has been successfully fabricated using a radiation-hardened SOI technology. The fabricated chips have yet to be tested and incorporated into the self-reconfigurable analog array architecture.

REFERENCES

[1] J. H. Holland, Adaptation in Natural and Artificial Systems. Cambridge, MA: MIT Press, 1992.

[2] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Boston, MA: Addison-Wesley, 1989.

[3] J. J. Grefenstette, R. Gopal, B. J. Rosmaita, and D. Van Gucht, "Genetic algorithms for the traveling salesman problem," in Proc. 1st Int. Conf. Genetic Algorithms, 1985, pp. 160–168.

[4] M. Vellasco, R. Zebulum, and M. Pacheco, Evolutionary Electronics: Automatic Design of Electronic Circuits and Systems by Genetic Algorithms. Boca Raton, FL: CRC Press, Dec. 2001.

[5] S. D. Scott, A. Samal, and S. Seth, "HGA: A hardware-based genetic algorithm," in Proc. ACM/SIGDA 3rd Int. Symp. Field Programmable Gate Arrays, 1995, pp. 53–59.

[6] M. Tommiska and J. Vuori, "Implementation of genetic algorithms with programmable logic devices," in Proc. 2nd Nordic Workshop Genetic Algorithms, 1996, pp. 71–78.

[7] B. Shackleford, G. Snider, R. Carter, E. Okushi, M. Yasuda, K. Seo, and H. Yasuura, "A high-performance, pipelined, FPGA-based genetic algorithm machine," Genetic Algorithms Evolvable Mach., vol. 2, no. 1, pp. 33–60, Mar. 2001.

[8] N. Yoshida and T. Yasuoka, "Multi-GAP: Parallel and genetic algorithms in VLSI," in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC '99), 1999, pp. 571–576.

[9] W. Tang and L. Yip, "Hardware implementation of genetic algorithms using FPGA," in Proc. 47th Midwest Symp. Circuits Syst., 2004, pp. 549–552.

[10] C. Aporntewan and P. Chongstitvatana, "A hardware implementation of the compact genetic algorithm," in Proc. Congr. Evol. Comput., 2001, pp. 624–629.

[11] P. Graham and B. E. Nelson, "Genetic algorithms in software and hardware: A performance analysis of workstation and custom machine implementations," in Proc. IEEE Symp. FPGAs Custom Comput. Mach., 1996, pp. 216–225.

[12] M. S. Jelodar, M. Kamal, S. M. Fakhraie, and M. N. Ahmadabadi, "SOPC-based parallel genetic algorithm in evolutionary computation," in Proc. IEEE Congr. Evol. Comput., Vancouver, BC, 2006, pp. 2800–2806.


[13] N. Nedjah and L. de Macedo Mourelle, "Massively parallel hardware architecture for genetic algorithms," in Proc. 8th Euromicro Conf. Digital Syst. Design, 2005, pp. 231–234.

[14] S. Wakabayashi, T. Koide, K. Hatta, Y. Nakayama, M. Goto, and N. Toshine, "GAA: A VLSI genetic algorithm accelerator with on-the-fly adaptation of crossover operators," in Proc. IEEE Int. Symp. Circuits Syst., 1998, pp. 268–271.

[15] P.-Y. Chen, R.-D. Chen, Y.-P. Chang, L.-S. Shieh, and H. Maliki, "Hardware implementation for a genetic algorithm," IEEE Trans. Instrum. Meas., vol. 57, no. 4, pp. 699–705, Apr. 2008.

[16] K. Sai Mohan and B. Nowrouzian, "A diversity controlled genetic algorithm for optimization of FRM digital filters over DBNS multiplier coefficient space," in Proc. IEEE Int. Symp. Circuits Syst., 2007, pp. 2331–2334.

[17] G. Rudolph, "Evolutionary search for minimal elements in partially ordered finite sets," in Proc. 7th Int. Conf. Evol. Programming, 1998, pp. 345–353.

[18] J. Holleman, B. Otis, S. Bridges, A. Mitros, and C. Diorio, "A 2.92 µW hardware random number generator," in Proc. IEEE ESSCIRC, Montreux, Switzerland, 2006, pp. 134–137.

[19] M. Meysenburg, "The effect of pseudo-random number generation quality on the performance of a simple genetic algorithm," M.S. thesis, Univ. Idaho, Moscow, ID, 1997.

[20] M. Meysenburg and J. A. Foster, "Randomness and GA performance, revisited," in Proc. Genetic Evol. Comput. Conf., vol. 1, 1999, pp. 425–432.

[21] E. Cantú-Paz, "On random numbers and the performance of genetic algorithms," in Proc. Genetic Evol. Comput. Conf., Jul. 2002, pp. 311–318.

[22] S. Garfinkel and G. Spafford, Practical UNIX and Internet Security, 2nd ed. Sebastopol, CA: O'Reilly, Apr. 1996.

[23] U. Elsner, "Influence of random number generators on graph partitioning algorithms," Electron. Trans. Numer. Anal., vol. 21, pp. 125–133, 2005.

[24] H. de Garis, "Evolvable hardware: Genetic programming of a Darwin machine," in Proc. Int. Conf. Artificial Neural Networks Genetic Algorithms, New York: Springer-Verlag, 1993, pp. 441–449.

[25] I. Kajitani, T. Hoshino, D. Nishikawa, H. Yokoi, S. Nakaya, T. Yamauchi, T. Inuo, N. Kajihara, M. Iwata, D. Keymeulen, and T. Higuchi, "A gate-level EHW chip: Implementing GA operations and reconfigurable hardware on a single LSI," in Proc. Int. Conf. Evolvable Syst.: From Biology to Hardware, LNCS vol. 1478, 1998, pp. 1–12.

[26] N. J. Macias, "The PIG paradigm: The design and use of a massively parallel fine grained self-reconfigurable infinitely scalable architecture," in Proc. 1st NASA/DoD Workshop Evolvable Hardware, 1999, pp. 175–180.

[27] A. Stoica, D. Keymeulen, D. Vu, R. Zebulum, I. Ferguson, T. Daud, T. Arslan, and G. Xin, "Evolutionary recovery of electronic circuits from radiation induced faults," in Proc. Congr. Evol. Comput., vol. 2, 2004, pp. 1786–1793.

[28] J. Langeheine, K. Meier, J. Schemmel, and M. Trefzer, "Intrinsic evolution of digital-to-analog converters using a CMOS FPTA chip," in Proc. NASA/DoD Conf. Evolvable Hardware, 2004, pp. 18–25.

[29] V. Baumgarte, G. Ehlers, F. May, A. Nückel, M. Vorbach, and M. Weinhardt, "PACT XPP: A self-reconfigurable data processing architecture," J. Supercomputing, vol. 26, no. 2, pp. 167–184, 2003.

[30] C. Lambert, T. Kalganova, and E. Stomeo, "FPGA-based systems for evolvable hardware," in Proc. World Academy Sci., Eng. Technol., vol. 12, Mar. 2006, pp. 123–129.

[31] E. M. Sentovich, K. J. Singh, C. Moon, H. Savoj, R. K. Brayton, and A. Sangiovanni-Vincentelli, "Sequential circuit design using synthesis and optimization," in Proc. IEEE Int. Conf. Comput. Design: VLSI Comput. Processors, Oct. 1992, pp. 328–333.

[32] A. Stoica, A. Fukunaga, K. Hayworth, and C. Salazar-Lazaro, "Evolvable hardware for space applications," in Proc. ICES, LNCS vol. 1478, 1998, pp. 166–173.

[33] R. Haupt and S. E. Haupt, Practical Genetic Algorithms, 2nd ed. Hoboken, NJ: Wiley, 2004.

[34] A. Stoica, D. Keymeulen, M. Mojarradi, R. Zebulum, and T. Daud, "Progress in the development of field programmable analog arrays for space applications," in Proc. IEEE Aerospace Conf., Mar. 2008, pp. 1–9.

[35] A. Stoica, D. Keymeulen, R. Zebulum, S. Katkoori, P. Fernando, H. Sankaran, M. Mojarradi, and T. Daud, "Self-reconfigurable analog array integrated circuit architecture for space applications," in Proc. NASA/ESA Conf. Adaptive Hardware Syst., Noordwijk, The Netherlands, Jun. 2008, pp. 83–90.

[36] N. Park and A. C. Parker, "Sehwa: A software package for synthesis of pipelines from behavioral specifications," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 7, no. 3, pp. 356–370, Mar. 1988.

[37] A. Thompson, "Exploring beyond the scope of human design: Automatic generation of FPGA configurations through artificial evolution," in Proc. 8th Annu. Advanced PLD FPGA Conf., 1998, pp. 5–8.

[38] L. Sekanina and S. Friedl, "On routine implementation of virtual evolvable devices using COMBO6," in Proc. NASA/DoD Conf. Evolvable Hardware, 2004, pp. 63–70.

[39] C. Gopalakrishnan, "High level techniques for leakage power estimation and optimization in VLSI ASICs," Ph.D. dissertation, Dept. Comput. Sci. Eng., Univ. South Florida, Tampa, FL, 2003.

Pradeep Fernando (S'02–M'09) received the B.E. degree in computer science and engineering from Sri Venkateswara College of Engineering, University of Madras, India, in 2002 and the M.S. degree in computer engineering from the University of South Florida, Tampa, FL, in 2006, where he is currently working toward the Ph.D. degree in computer science and engineering.

From 2002 to 2008, he served as a Graduate Teaching Associate and a Graduate Research Associate at the University of South Florida. He has assisted in teaching many hardware courses and labs, has instructed the undergraduate VLSI Design Automation course many times, and was awarded a certificate of recognition for outstanding teaching by the USF Provost in 2006. He has co-authored several technical papers in refereed conferences and has served as a reviewer for many technical conferences and journals, including the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS. His research interests include VLSI circuit design and design automation, genetic algorithms, multiobjective optimization algorithms, and evolvable hardware.

Srinivas Katkoori (S'92–M'97–SM'03) received the B.E. degree in electronics and communication engineering from Osmania University, Hyderabad, India, in 1992 and the Ph.D. degree in computer engineering from the University of Cincinnati, Cincinnati, OH, in 1998.

In 1997, he joined the Department of Computer Science and Engineering, University of South Florida, Tampa, as an Assistant Professor. In 2004, he was tenured and promoted to Associate Professor. His research interests are in the general area of VLSI CAD algorithms and design methodologies. Specific research areas include high-level synthesis, low-power synthesis, FPGA synthesis, and radiation-tolerant CAD for FPGAs. To date, he has published over 50 peer-reviewed journal and conference papers. He holds one U.S. patent.

Dr. Katkoori is a recipient of the 2001 NSF CAREER award. He is a Senior Member of the ACM. He serves on the technical committees of several VLSI conferences and as a peer reviewer for several VLSI journals. Since 2006, he has been serving as an Associate Editor of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS. He is the recipient of the inaugural 2002–2003 University of South Florida Outstanding Faculty Research Achievement Award. He is also the recipient of the 2005 Outstanding Engineering Educator Award from the IEEE Florida Council (Region 3).

Didier Keymeulen received the B.S.E.E., M.S.E.E., and Ph.D. degrees in electrical engineering and computer science from the Free University of Brussels, Brussels, Belgium.

In 1996, he joined the computer science division of the Japanese National Electrotechnical Laboratory as a Senior Researcher. Currently, he is a Principal Member of the technical staff of the NASA Jet Propulsion Laboratory (JPL), Pasadena, CA, in the Bio-Inspired Technologies Group. At JPL, he is responsible for DoD and NASA applications of evolvable hardware for adaptive computing that lead to the development of fault-tolerant electronics and autonomous and adaptive sensor technology. He also participated as Test Electronics Lead for the Tunable Laser Spectrometer instrument on the Mars Science Laboratory. He has served as Chair, Co-Chair, and Program Chair of the NASA/ESA Conferences on Adaptive Hardware and Systems.


Ricardo Zebulum received the Bachelor's, Master's, and Ph.D. degrees in electrical engineering from the Catholic University of Rio, Brazil, in 1992, 1995, and 1999, respectively.

He spent two years at Sussex University, U.K., from 1997 to 1999. He is currently a member of the Jet Propulsion Laboratory (JPL), California Institute of Technology, Pasadena. He has been involved in research on evolvable hardware and genetic algorithms since 1996 and has authored more than 50 papers in this area. Besides evolvable hardware research, he also works as a Reliability Engineer at JPL.

Adrian Stoica (SM'06) received the B.S. and M.S.E.E. degrees from the Technical University of Iasi, Iasi, Romania, in 1986, and the Ph.D. degree from Victoria University of Technology, Melbourne, Australia, in 1996.

He is currently a Senior Research Scientist at the NASA Jet Propulsion Laboratory, Pasadena, CA. His main research interests are in learning and adaptation in autonomous systems. He has six patents, has published over 100 papers, and has been a keynote speaker at several conferences. He has started four conferences, one of which, the NASA/ESA Conference on Adaptive Hardware and Systems (previously the NASA/DoD Conference on Evolvable Hardware), has its 10th anniversary in 2009.