Volterra-System Identification Using Adaptive Real-Coded Genetic Algorithm



IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS 1

Volterra-System Identification Using Adaptive Real-Coded Genetic Algorithm

Hazem M. Abbas, Member, IEEE, and Mohamed M. Bayoumi, Senior Member, IEEE

Abstract—In this paper, a floating-point genetic algorithm (GA) for Volterra-system identification is presented. The adaptive GA method suggested here addresses the problem of determining the proper Volterra candidates, which leads to the smallest error between the identified nonlinear system and the Volterra model. This is achieved by using variable-length GA chromosomes, which encode the coefficients of the selected candidates. The algorithm relies on sorting all candidates according to their correlation with the output. A certain number of candidates with the highest correlation with the output are selected to undergo the first evolution “era.” During the process of evolution, the candidates with the least significant contribution to the error-reduction process are removed. Then, the next set of candidates is applied in the next era. The process continues until a solution is found. The proposed GA method handles the issues of detecting the proper Volterra candidates and calculating the associated coefficients as a nonseparable process. The fitness function employed by the algorithm prevents irrelevant candidates from taking part in the final solution. Genetic operators are chosen to suit the floating-point representation of the genetic data. As the evolution process improves and the method reaches a near-global solution, a local search is implicitly applied by zooming in on the search interval of each gene by adaptively changing the boundaries of those intervals. The proposed algorithms have produced excellent results in modeling different nonlinear systems with white and colored Gaussian inputs, with and without white Gaussian measurement noise.

Index Terms—Evolutionary algorithms, intelligent computation, neural networks, real-coded genetic algorithms, system identification, Volterra systems.

I. INTRODUCTION

IDENTIFICATION of nonlinear systems is of considerable interest in many engineering and science applications.

Volterra series provide a general representation of nonlinear systems and have been applied in system identification [1], [2]. They have also been applied in other engineering applications [3]–[5]. The complexity of the identified system is normally determined by the order and the time delays of the Volterra system used for identification. Naturally, the number of possible Volterra candidates that can model the system increases excessively with the system order and/or time delays. There have been many attempts to detect and estimate the proper Volterra candidates and their corresponding coefficients

Manuscript received December 9, 2003; revised June 6, 2004. This paper was recommended by Associate Editor Y. Pan.

H. M. Abbas is with Mentor Graphics Egypt, Heliopolis, Cairo 11341, Egypt, on sabbatical leave from the Department of Computer and Systems Engineering, Ain Shams University, Cairo, Egypt.

M. M. Bayoumi is with the Department of Electrical and Computer Engineering, Queen’s University, Kingston, ON K7L 3N6, Canada.

Digital Object Identifier 10.1109/TSMCA.2005.853495

or kernels. Neural networks [6] have been used to identify second-order time-invariant nonlinear systems using a combination of quadratic polynomials and linear layers. Korenberg [7] proposed a fast orthogonal approach to select the exact candidates using a modified Gram–Schmidt orthogonalization and Cholesky decomposition.

Evolutionary algorithms have also been applied to find the minimal structure of nonlinear systems. In [8], a genetic algorithm (GA) searches the space of subsets of polynomial neural units and applies an evolutionary operator to detect a near-minimal linear architecture to construct a functional link neural network (FLN). The use of polynomial neural units was achieved earlier through parallel linear–nonlinear (LN) cascades, where each N represents a polynomial and is used for kernel estimation [9]. Yao [10] has proposed another GA method to model the sparse Volterra filter so that the number of candidates to be estimated is greatly reduced.

The Volterra-system-modeling process requires a candidate-selection method and associated kernel estimation. Yao [10] used binary-coded chromosomes to encode the candidate locations in each chromosome and then used a least-squares approach to find the kernel values. In the evolutionary FLN [8], a similar approach was adopted, where a binary chromosome was used to indicate whether an input polynomial is dropped or included in the representation. The weight associated with each polynomial is adjusted using a learning rule, which decreases the overall representation error for the network described by each chromosome. Xiong [11] proposed a hybrid approach of case-based reasoning and binary-coded GA to identify significant inputs. The GA served as a mechanism to search for optimal hypotheses about feature relevance, and case-based reasoning was employed for hypothesis evaluation.

In this paper, we address the problem of selecting the reduced Volterra candidates and calculating the kernel of each candidate in one single step. To do that, we employ a GA with floating-point representation of its parameters. The main idea of the proposed evolutionary algorithm is to exploit the fact that candidates with the highest correlation with the output are more likely to be principal system terms. The algorithm starts by sorting all candidates in descending order according to their correlation with the output. In the first evolution era, the candidates with the largest correlation are tested to see if they represent principal kernels of the identified system. In subsequent evolutionary eras, more significant candidates are examined through the evolutionary process. During each era, irrelevant candidates are dropped off and the fittest survive to the next era. The parameters that have the least significant

1083-4427/$20.00 © 2005 IEEE


effect on the error-reduction process are continuously removed from the chromosome. The process ends when a solution is found and the identified system is successfully reproduced by the Volterra model. The GA population-selection process and the genetic operators, mutation and crossover, have been chosen both to accelerate the evolution process and to find the correct solution. Also, the fitness function chosen in this work does not allow insignificant candidates to participate in the final solution or to “hitchhike.” When there is no measurement noise, the fitness function is based on producing the least mean-squared error. However, for noisy measurements, the fitness function employs the maximum-likelihood principle in recovering the noise from the identified system.

The paper is organized as follows. Volterra-system representation of nonlinear systems is presented in Section II. The evolutionary approach for candidate detection and kernel estimation is then discussed in Section III. Section IV presents the proposed evolutionary algorithm. The adaptation of the chromosome length and parameter search area is outlined in Section V. Section VI describes the experiments of applying the proposed algorithm in identifying several nonlinear systems and the analysis of those results.

II. DISCRETE VOLTERRA SYSTEMS

The Volterra approach can be considered as a mapping between the input and output spaces of a system. The Volterra-series representation of continuous-time dynamic systems takes the form

y(t) = h0 + H1[u(t)] + H2[u(t)] + · · · + Hn[u(t)] + · · ·   (1)

where y(t) is the system output at time t, u(t) is the input at time t,

Hn[u(t)] = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} hn(τ1, . . . , τn) u(t − τ1) · · · u(t − τn) dτ1 · · · dτn

is the nth-order Volterra operator, and hn(τ1, . . . , τn) is the nth-order Volterra kernel. There are some problems with the kernel calculation and the use of the series in system modeling in the continuous-time domain [12]. Alternatively, the discrete-time Volterra series will be used. The nth-order discrete-time Volterra-series expansion of a discrete-time causal, nonlinear, and time-invariant system is

y(k) = h0 + ∑_{τ1=0}^{N−1} h1(τ1) u(k − τ1)
     + ∑_{τ1=0}^{N−1} ∑_{τ2=0}^{N−1} h2(τ1, τ2) u(k − τ1) u(k − τ2)
     + · · · + ∑_{τ1=0}^{N−1} · · · ∑_{τn=0}^{N−1} hn(τ1, . . . , τn) u(k − τ1) · · · u(k − τn)   (2)

where u(k) and y(k) denote the input and output data sequences, respectively, and N is the number of samples needed to describe the dynamics of the system. The kernels {hi} are assumed to be symmetric functions, i.e., hn(τ1, . . . , τn) is unchanged under any of the possible n! permutations of the indices τ1, . . . , τn. The number of kernel values to be identified in (2), K = ∑_{i=1}^{n} C(N+i−1, i), where C(·, ·) denotes the binomial coefficient, becomes excessively large if either the system order n or the system memory N increases. The main objective of the identification process is to detect a number, Ke < K, of Volterra kernel values that contribute significantly to the output.

There are many methods in the literature that describe how to estimate the Volterra kernels. However, there has always been the need to find the most concise and accurate representation of the identified system in the least possible processing time.
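To make the scale of the search concrete, the count K and the model output in (2) can be sketched in code. This is an illustrative sketch only; the helper names `num_kernel_values` and `volterra_output_2nd` are ours, and a second-order model is assumed, with each symmetric kernel value stored once per unordered delay pair.

```python
from math import comb

def num_kernel_values(n, N):
    # K = sum_{i=1}^{n} C(N+i-1, i): distinct symmetric kernel values in (2)
    return sum(comb(N + i - 1, i) for i in range(1, n + 1))

def volterra_output_2nd(u, k, h0, h1, h2, N):
    """Evaluate y(k) for a second-order instance of (2); h2 maps an
    ordered delay pair (t1, t2) with t1 <= t2 to the kernel value."""
    y = h0
    for t1 in range(N):
        y += h1[t1] * u[k - t1]
        for t2 in range(N):
            # symmetric kernel: h2(t1, t2) = h2(t2, t1)
            y += h2[(min(t1, t2), max(t1, t2))] * u[k - t1] * u[k - t2]
    return y
```

Even for modest sizes the count grows quickly, e.g., num_kernel_values(3, 10) = 10 + 55 + 220 = 285, which is what motivates detecting only the Ke principal terms.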

III. THE EVOLUTIONARY VOLTERRA DETECTION APPROACH

Identification of Volterra terms and their kernels can be considered as an optimization problem, where a set of parameters needs to be estimated in such a way that the error between the outputs of the Volterra representation and the actual system is minimized. Classical optimization methods such as gradient-based techniques can easily get trapped in local minima. They also rely on local information and need a smooth search space. The possibility that these methods find a good solution diminishes further when the order of the Volterra representation and the number of input time delays increase.

Evolutionary methods [13], [14], on the other hand, are population-based algorithms, which are modeled loosely on the principles of evolution via natural selection. They employ a population of chromosomes or individuals that undergo selection in the presence of variation-inducing operators such as mutation and recombination (crossover). A fitness function is used to evaluate individuals, and reproductive success varies with fitness. They can reach a diverse set of solutions, including the global one. They do not require the calculation of any local information or knowledge about specific system properties. The algorithm works by generating a random initial population. Then, the fitness of each individual in the current population is computed. The fitness of an individual controls the probability of this individual being selected into the next evolution generation. By probabilistically applying genetic operators such as reproduction, crossover, and mutation, a new population is generated for the next generation. The process continues until a required fitness value is reached or a prespecified number of generations has elapsed. By properly tuning the evolutionary-algorithm operators, a global solution can be found even when the problem dimension gets very large. These properties have resulted in many evolutionary-based system-identification techniques. Nissinen and Koivisto [15] handled the problem of a large number of candidates using a combination of a GA-based search and the orthogonal least-squares (OLS) method [16] (it should be noted that OLS is a somewhat less efficient variation of fast orthogonal search (FOS) [7]). The GA part is used as the search engine to create a small set of candidates to which the OLS is applied to find the final solution. The problem of identifying the most significant nonlinear


candidates has also been addressed by Sierra et al. [8]. They employed an evolutionary approach to find the lowest possible polynomial degree combining the inputs of an FLN. A binary chromosome representing all possible combinations of the input variables is used to indicate whether each of these combinations will participate in the final solution. The evolution starts using the original attributes, i.e., only a linear solution is tried first. If the representation error reached by the best linear solution is not satisfactory, a second-order evolution is carried out. The best linear solution will be the starting point around which a second-order one is sought. The process is repeated by increasing the order of the solution until a solution is reached. The output of the FLN network with chromosome c for a binary classification problem in d dimensions up to a degree r is

y = ∑_{i=1}^{N} ci wi φi(x)

where φi(x) is the polynomial term associated with bit ci and N is the number of terms up to degree r. Quadratic error minimization is used to update the weights wi.

Yao [10] also applied a binary-coded GA to find the best set of kernel locations in a sparse Volterra filter. The GA was applied to locate the positions of the best kernels, whereas the least-square-error method was used to estimate the associated Volterra kernels. This GA algorithm used modified genetic operators to overcome the problems of duplicate kernel locations or out-of-range elements.

Clearly, those last two methods applied an evolutionary algorithm to determine the most accurate candidates in the nonlinear-system representation. However, the coefficient (or kernel value) associated with each candidate was determined in a separate step aside from the evolutionary process. In this work, we will be describing another evolutionary approach, which combines the two steps into a single operation, as will be explained in the next section.

In a third application, Duong and Stubberud [17] assumed that the structure of the model is known a priori and applied a binary GA to find the parameters of the model. They compared their GA-based approach with the recursive least-mean-squares (LMS) method. Although the error of the LMS method was smaller than the GA error, the convergence process of the former was not stable. When the number of parameters increases, the computational complexity of the LMS approach increases rapidly (due to matrix multiplication and inversion), while the complexity of the GA method is only slightly affected. This is because the added parameters affect only the encoded string, which has very little influence on the computational process. The approach has been shown to work efficiently for both linear and nonlinear cases. However, convergence of the GA method was highly sensitive to the GA parameters such as population size, mutation and crossover probability, and the selection and recombination process.

IV. THE PROPOSED EVOLUTIONARY APPROACH

In the following, another GA-based approach to Volterra-system identification is proposed. The approach differs from previous GA-based methods in the following aspects.

1) A floating-point chromosome representation of the Volterra kernels is used. The advantages of using floating-point encoding instead of the binary one have been demonstrated in [13].

2) Neither the system order n nor the number of time delays N needs to be prespecified a priori. The algorithm starts with relatively large values for both parameters and, through evolution, will end up finding the correct settings.

3) The GA method used here converges to the proper Volterra terms very rapidly. By continuously examining the terms with the highest correlation with the output and removing the Volterra terms that do not contribute significantly to the system output, the approach has been shown to detect the correct terms and the associated kernels using a small number of generations.

4) The algorithm combines the identification of the terms and kernels in one single step rather than in two separate identification processes.

The algorithm goes through a set of evolution phases or eras. During each era, a population of individuals is initialized and a number of generations are executed. Evolutionary operations such as selection, crossover, and mutation are conducted at the end of each generation. An era is considered done when the fitness of the system stops improving. Individuals that are considered insignificant are then removed, while principal terms are left intact. The next era begins by appending to the current population a new one, which corresponds to the individuals of the next era, and the process continues. A solution is considered to be reached when a prespecified fitness value is achieved.
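The era-based control flow described above can be summarized in a short sketch. This is a schematic under stated assumptions, not the paper's implementation: `evolve_era` stands in for running GA generations until fitness stops improving (returning the estimated kernels and the best fitness), and `is_significant` stands in for the insignificance test applied before pruning.

```python
def run_eras(sorted_candidates, era_size, target_fitness, evolve_era, is_significant):
    """Feed correlation-sorted candidates to the GA era by era; prune
    insignificant survivors between eras; stop at the target fitness."""
    active, fitness = [], 0.0
    for start in range(0, len(sorted_candidates), era_size):
        # append the next era's candidates to the surviving terms
        active += sorted_candidates[start:start + era_size]
        kernels, fitness = evolve_era(active)
        # drop terms whose kernels contribute insignificantly
        active = [t for t in active if is_significant(t, kernels)]
        if fitness >= target_fitness:
            break
    return active, fitness
```

The point of the structure is that a principal term found in an early era stays in the chromosome across later eras, while irrelevant terms never accumulate.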

A. Genetic Representation

The chromosome in the proposed GA method is composed of a sequence of real-valued numbers (genes) that represent the Volterra kernels of the identified system. One would choose the genes to be ordered in such a way that first-order terms with all possible time delays come first, then the second-order kernel values follow, and so on. Sierra et al. [8] used this method of term ordering in their evolutionary design of the FLN, as their solution starts with the candidates of the least polynomial degree, while higher degree terms are searched only when a solution is yet to be found. For example, the chromosome for a second-order system with N delays will take the form

{h0, h1(0), . . . , h1(N − 1), h2(0, 0), . . . , h2(0, N − 1), h2(1, 1), . . . , h2(1, N − 1), . . . , h2(N − 1, N − 1)}   (3)

for a total number of genes K = ∑_{i=1}^{2} C(N+i−1, i). Generally,

for an nth-order system with N time delays, the value of K will be changing during evolution, as both the values of n and N will be detected through evolution. In this work, we are exploiting the correlational properties between the candidate terms and the system output. It is highly likely that terms with the highest correlation coefficients with the output will be principal Volterra terms. Here, we are using the statistical properties of the identified system instead of counting on the possibility that lower degree polynomials could be sufficient to identify the system before higher ones can be tested. Therefore, the first


step in our algorithm is to compute the correlation coefficient for each candidate term

φ_{ui} = cov(ui, y) / √(var(ui) var(y))

where ui stands for a Volterra candidate, i = 1, 2, . . . , K, y is the given system output, and var and cov are the conventional variance and covariance functions, respectively. The K candidate terms [in (3)] are sorted in descending order according to φ_{ui}. Rating the candidates according to the largest cross-correlation with the output is equivalent to rating them according to which have the smallest mean-square error of fit to the output. So, at this early stage, the strategy is similar to that of FOS [7].

The evolution process will go through a set of evolution phases or eras. The sorted candidates are distributed equally over all eras. In the first era, the candidates with the highest φ_{ui} are tested to see whether any of them is a principal Volterra term. There is a very high possibility that all principal Volterra terms can be identified during this era. When this is not the case, the next era is then executed to search for more potential candidates. It should be noted that at the end of each era, there may be insignificant candidates, which can be removed. Hence, a decision has to be made whether to remove any. This will result in the corresponding kernel (gene) being removed from the chromosome. Also, the data column of the corresponding candidate should be eliminated from the data matrix X, where the matrix X_{D×K} = [u1 u2 · · · uK] will be used to hold a number of D data points for each of the K sorted candidates. Another binary vector, x ∈ {0, 1}^K, is maintained throughout the process to indicate which terms have survived. Dropped-off terms are marked by zero values in their respective locations.
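As a concrete illustration of this ranking step, the sketch below computes φ_{ui} for each column of a candidate data matrix, reorders the columns in descending order of correlation, and initializes the survivor vector x. The helper name is ours, not the authors'; sorting by the magnitude of φ is our reading of "largest crosscorrelation," since a strongly negative correlation is equally informative.

```python
import numpy as np

def rank_candidates(U, y):
    """U is D x K: one column of D data points per candidate term.
    Returns the sorted data matrix X, the survivor mask x, and the
    correlation coefficients phi in sorted order."""
    K = U.shape[1]
    # phi_i = cov(u_i, y) / sqrt(var(u_i) var(y))
    phi = np.array([np.corrcoef(U[:, i], y)[0, 1] for i in range(K)])
    order = np.argsort(-np.abs(phi))     # descending correlation magnitude
    X = U[:, order]                      # sorted data matrix X_{D x K}
    x = np.ones(K, dtype=int)            # survivor mask, 1 = still active
    return X, x, phi[order]
```

Dropping a term during an era then amounts to zeroing its entry in x and deleting the corresponding column of X.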

B. Population

Generally, a large population size will result in faster convergence. However, this incurs large costs both in storage and in computation time. Thus, finding an optimal population size can result in faster convergence [18]. We have made no attempt to find the optimal population size, and a fixed number of chromosomes has been used throughout the process. The individuals of the initial population are sampled from a uniform distribution that has lower and upper bounds. Initially, the two bounds for all genes are chosen to be the same. These two bounds will be adaptively changed during the process of evolution according to the contribution of each gene relative to the overall performance, as will be described later. At the start of the first evolution era, the length of each chromosome will be equal to the number of candidates allotted to this era. At the end of the era, the chromosome length could be reduced to be equal to the number of terms that are considered principal parts of the system. This end population will be appended with an initial population, which represents the candidates of the next era, and the new evolution phase begins. Eventually, when a solution is found, the end population will represent only the significant Volterra terms.

C. Fitness Function

The fitness function of the proposed GA algorithm depends on whether there is any measurement noise added to the system output.

1) Noise-Free Measurement: The fitness function adopted

in this work serves two main objectives. First, it should measure how good the selected Volterra model is when compared with the actual system. This is normally achieved by using the mean-squared error between the two systems as an objective function J to be minimized, where

J = (1/D) ∑_{i=1}^{D} (yi − ŷi)²

and D is the number of sampled data points, and yi and ŷi are the actual and model outputs for the ith sample, respectively. Second, the fitness function should be able to prevent insignificant (or noisy) terms from participating in the final solution, or we will be feeding noise into the model. It should be noted that J would need to be minimized to an optimal zero value; this would allow noisy terms to participate in the final solution. One drawback of using the mean-squared-error criterion is that it treats all terms in the error formula equally and has no way of filtering out the effect of those noisy terms. Therefore, an exponential filtering process is applied to the objective function J to give the fitness function F

F = exp(−J)   (4)

which is to be maximized to an optimal unit value. Small contributions made by noisy terms will have little effect on the value of the fitness function, and thus can be eliminated, as will be demonstrated in this work.

2) White Gaussian Measurement Noise: If the system is

contaminated with zero-mean white Gaussian measurement noise µ, which is independent of the input u(k), i.e.,

yt(k) = y(k) + µ(k)   (5)

where y(k) is the noise-free Volterra-system output [(2)], then the fitness function defined in (4) cannot be used, since achieving a unit value (or zero mean-squared error) in the presence of noise will be impossible. Therefore, another fitness function has to be adopted. Instead of using the least-squared-error approach, we will test the modeling-error sequence to see if it really represents white Gaussian noise. Here, we will be assuming that the noise has a zero mean and known variance σµ². Dealing with the representation error

e(k) = yt(k) − y(k)

as a random variable ideally drawn from N(0, σµ²), maximizing the likelihood function

L = ∏_{i=1}^{D} p(e(i)),   p(e(i)) = (1/√(2πσµ²)) exp(−e²(i)/(2σµ²))


can serve our modeling objective. Since the likelihood L is monotonic, one can use the logarithmic likelihood function instead. Hence, the objective function to be maximized is

Fl = ∑_{i=1}^{D} log p(e(i))
   = ∑_{i=1}^{D} log(1/√(2πσµ²)) − ∑_{i=1}^{D} e²(i)/(2σµ²)
   = C − (1/(2σµ²)) ∑_{i=1}^{D} e²(i)
   = C − (D/(2σµ²)) σe²   (6)

where C is a constant term and σe² = (1/D) ∑_{i=1}^{D} e²(i) denotes the variance of the representation error e(k). At early generations, the value of σe² will be large but will be reduced to an optimal value of σµ² as the evolution process grows older. This is where the fitness function Fl will reach its maximum value. Clearly, we cannot force σe² to reach a zero value, or we will be adding noisy terms to the model. The fitness function, which will be used in the simulation, will measure whether σe² is equal to σµ². Hence, the fitness function

Fµ = exp(−|σe² − σµ²|)   (7)

will be employed when noisy measurements are observed.
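Both fitness functions are simple to state in code. The sketch below is a transcription of (4) and (7) under the paper's definitions, with σe² taken as the mean of the squared errors for zero-mean noise; the function names are ours.

```python
import numpy as np

def fitness_noise_free(y, y_model):
    # F = exp(-J), J = (1/D) sum (y_i - yhat_i)^2, from (4)
    J = np.mean((np.asarray(y, float) - np.asarray(y_model, float)) ** 2)
    return float(np.exp(-J))

def fitness_noisy(y_t, y_model, sigma_mu2):
    # F_mu = exp(-|sigma_e^2 - sigma_mu^2|), from (7)
    e = np.asarray(y_t, float) - np.asarray(y_model, float)
    sigma_e2 = float(np.mean(e ** 2))    # (1/D) sum e^2(i), zero-mean noise
    return float(np.exp(-abs(sigma_e2 - sigma_mu2)))
```

Both are maximized toward 1: the first when the model error vanishes, the second when the residual variance matches the known noise variance σµ², so a model that absorbs the noise is penalized rather than rewarded.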

D. Selection

At the end of a generation, the current population undergoes the process of selecting some individuals who will continue to the next generation. Other genetic operators are applied to the selected members to create a new population. The selection process aims at improving the quality of the population by giving the fittest chromosomes a better chance of being copied into the next generation. In this work, we have used a rank-based selection scheme instead of a fitness-based one. The latter scheme often causes the population to converge to a suboptimal solution, thus sacrificing genetic diversity. Rank-based methods order the chromosomes according to their fitness value. Since selection here depends on the degree to which the fitter chromosomes are favored, the GA will produce improved population fitness over succeeding populations. The normalized geometric ranking scheme is used in this work. Individuals in the population are ranked in decreasing order according to their fitness value. Then, each individual is assigned a probability of selection based upon a triangular or geometric distribution. Michalewicz [13] and Joines and Houck [19] have shown that GAs incorporating ranking methods based upon the geometric distribution outperform those based on the triangular distribution. Thus, for a finite population size, the probability assigned to the ith individual or chromosome will be

Pi = q(1 − q)^(r−1)

where q is the probability of selecting the best individual and r is the rank of the individual (1 is the best rank). Joines and Houck [19] showed that a pure geometric distribution is not appropriate since its range is defined on the interval from one to infinity. To alleviate this problem, they developed a normalized distribution such that the probability becomes

Pi = q′(1 − q)^(r−1),    q′ = q / (1 − (1 − q)^P)

where P is the population size.
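The normalized geometric ranking scheme can be sketched as below (an illustrative implementation, not the authors' code; function names are ours). The normalization factor q′ guarantees that the P probabilities sum to one:

```python
import numpy as np

def geometric_ranking_probs(P, q):
    """Normalized geometric ranking of Joines and Houck:
    P_i = q' (1 - q)^(r - 1), q' = q / (1 - (1 - q)^P),
    for ranks r = 1 (best) .. P."""
    r = np.arange(1, P + 1)
    q_norm = q / (1.0 - (1.0 - q) ** P)
    return q_norm * (1.0 - q) ** (r - 1)

def select(fitness, q, rng):
    """Pick one parent index: rank individuals by fitness (descending),
    then sample a rank from the normalized geometric distribution."""
    order = np.argsort(fitness)[::-1]            # order[0] = fittest
    probs = geometric_ranking_probs(len(fitness), q)
    return int(rng.choice(order, p=probs))
```

With the paper's settings (P = 150, q = 0.09), the best individual receives probability q′ ≈ 0.09 while low ranks receive geometrically decaying probabilities, preserving diversity.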

E. Genetic Operators

Reproduction of new individuals is carried out by applying two genetic operators, mutation and crossover, to the selected parents. Mutation alters a parent by changing one or more genes in some way to form an offspring. Crossover operators combine information from two parents in order to produce two children resembling the parents. Since real-valued gene representation is used in this work, the two operators are given in floating-point form. Michalewicz [13] has described a number of floating-point operators; we have selected one each for the mutation and crossover operations.
1) Mutation: The nonuniform mutation operator is applied in this study. It selects one of the parent chromosome genes gi and adds to it a random displacement. The operator uses two uniform random numbers r1 and r2 drawn from the interval [0, 1]. The first (r1) determines the direction of the displacement, while the other (r2) generates its magnitude. Assuming that gi ∈ [ai, bi], where ai and bi are the gene's lower and upper bounds, respectively, the new variable becomes

qi = { gi + (bi − gi)f(G),  if r1 < 0.5
       gi − (gi − ai)f(G),  otherwise

where f(G) = [r2(1 − (G/Gmax))]^p, G is the current generation, Gmax is the maximum number of generations, and p is a shape parameter. In early generations, the operator provides a global search mechanism and behaves like a uniform mutation over the interval [ai, bi]. However, as the number of generations increases, the distribution narrows and the operator behaves like a local search. When the operator is applied to all genes, it is called multi-nonuniform mutation.
2) Crossover: The arithmetic crossover operator is employed in the proposed GA method. The operator produces two complementary linear combinations (A′, B′) of the parents (A, B). It is defined as

A′ = aA + bB
B′ = bA + aB

where a is a uniform random number drawn from the interval [0, 1], and b = 1 − a. If, for example, each chromosome is composed of two genes and A = (x1, x2), B = (y1, y2), then the new offspring will be A′ = (ax1 + by1, ax2 + by2) and B′ = (bx1 + ay1, bx2 + ay2).
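Both operators can be sketched in a few lines (illustrative Python under the definitions above; function and parameter names are ours, and `rng` is a NumPy random generator):

```python
import numpy as np

def nonuniform_mutation(g, a, b, G, Gmax, p, rng):
    """Displace one gene g in [a, b]; the displacement magnitude
    f(G) = [r2 (1 - G/Gmax)]^p shrinks as G approaches Gmax, so the
    operator moves from global search to local search."""
    r1, r2 = rng.random(), rng.random()
    f = (r2 * (1.0 - G / Gmax)) ** p
    return g + (b - g) * f if r1 < 0.5 else g - (g - a) * f

def arithmetic_crossover(A, B, rng):
    """Two complementary linear combinations of parent vectors A, B:
    A' = aA + bB, B' = bA + aB with b = 1 - a."""
    a = rng.random()
    b = 1.0 - a
    return a * A + b * B, b * A + a * B
```

Both operators are closed over the gene intervals, which is why, as noted below, no feasibility checking or repair of the offspring is needed.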

6 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS

It is worth mentioning here that the genes of the chromosomes reproduced by mutation or crossover always lie within the gene bounds ai and bi. Therefore, there is no need to perform any feasibility checking or to repair the newly produced offspring.

V. ADAPTATION OF THE GA ALGORITHM

Since the number of possible Volterra kernels can become too large, it is always desirable to provide a mechanism that can detect the significant ones. Korenberg used the orthogonal search method [7] and introduced the FOS method [7], [20] for building difference-equation models of nonlinear systems and for modeling time-series data. A modified Gram–Schmidt orthogonalization is applied to the truncated Volterra model represented by (2) in order to generate mutually orthogonal terms over the sampling period. The coefficients of those new terms are selected to minimize the mean-square error over this period. The candidate with the largest contribution to decreasing the representation error is selected to be added to the model. By continuously choosing candidates in this manner, a concise and accurate model can be constructed. Model development may be halted when no remaining candidate can reduce the mean-square error by more than a prespecified threshold.

In the GA-based algorithms described in Section III, the evolved binary chromosome indicates the selected Volterra candidates by having the location bits of these candidates set to 1 while the remaining bits are set to 0. Obviously, when the number of candidates is large, the GA method will require an extended number of generations before it reaches a good set of candidates. Additionally, another step is required to calculate the kernel values of the selected terms. The intent here is to combine the two steps into one evolutionary process and to adapt the algorithm in such a way that the correct solution is reached in the smallest number of generations.

Adaptation of GA methods can be carried out in many different ways. Hinterding et al. [21] provided a survey of the adaptation of parameters and operators within evolutionary algorithms. In this work, we perform the adaptation on the number of genes in the chromosome and on the boundaries of each gene by exploiting some properties of the identification problem. Inclusion of domain-specific heuristics is not a prerequisite, although it may improve the performance of a GA, as will be demonstrated in the experiments.

A. Adapting the Number of Genes

In the proposed algorithm, the chromosome length varies at the beginning of and during each era. As mentioned earlier, the evolution process goes through a set of eras. Each era examines a number of candidates (genes) and detects the principal ones. Candidates that are considered unimportant are removed. Assuming that the K candidates have been sorted according to their correlation with the output, and that the algorithm will pass through E eras, a number of genes e = K/E is added to the current chromosome, and the evolution operators are applied at the end of each generation within an era. After a certain number of generations, the fitness value reaches a constant value. At this point, the chromosome with the best fitness is tested to decide whether any gene can be removed. This is accomplished by calculating the individual influence of each candidate on the final solution. To do so, the correlation coefficient between the output of each candidate and the modeling error (the difference between the actual system output y and the model output yc when a particular candidate c is removed) is computed. Assuming that there are D sampled points, the correlation coefficient ρ(c) for candidate c is given by

ρ(c) = cov(oc, Jc) / √(var(oc) var(Jc))    (8)

where oc = gc·X(c) is a D-dimensional vector that denotes the output contributed by candidate c, gc is the gene value, X(c) is the cth column of the candidate matrix X, representing the data points of that candidate, and Jc = y − yc is the modeling error when candidate c is removed, in which y is the actual system output, yc = h0 + ∑_{i=1, i≠c}^{KG} gi·X(i) is the model output when candidate c is removed, KG is the number of candidates at the current generation G, and var and cov are the conventional variance and covariance functions taken over all data points. When there is no measurement noise added to the output y, and at large fitness values, the value of |ρ(c)| for significant terms will approach unity, indicating total correlation between the error-reduction process and any significant term. Conversely, insignificant terms that are uncorrelated with the modeling error will yield values of |ρ(c)| near zero. When the output is noisy, |ρ(c)| for significant terms cannot reach unity due to the existence of noise; however, the value will still be considerably higher than that of insignificant terms.
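The significance test of (8) can be sketched as follows (illustrative Python, not the authors' code; the interface, with one column of the candidate matrix X per candidate, is our assumption):

```python
import numpy as np

def candidate_significance(y, X, g, h0=0.0):
    """|rho(c)| of Eq. (8) for every candidate.

    y  -- system output, shape (D,)
    X  -- candidate matrix, shape (D, K), one column per candidate
    g  -- current gene (kernel) values, shape (K,)

    In the noise-free case, principal terms give |rho| near 1 and
    removable terms give |rho| near 0.
    """
    D, K = X.shape
    full = h0 + X @ g                     # model output with all terms
    rho = np.empty(K)
    for c in range(K):
        o_c = g[c] * X[:, c]              # output contributed by c
        J_c = y - (full - o_c)            # modeling error with c removed
        cov = np.cov(o_c, J_c)            # 2x2 covariance matrix
        rho[c] = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    return np.abs(rho)
```

Candidates whose |ρ(c)| falls below the threshold ρt would then be pruned from the chromosome, as described next.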

The above process is executed only when the fitness of the best chromosome reaches a prespecified value. This is to allow the evolution to stabilize before the adaptation process starts. The correlation coefficient ρ(i) for all i = 1, . . . , KG is checked against a certain threshold ρt; candidates that do not exceed the threshold are declared insignificant and removed from the current chromosome. The population reached at the current generation is adjusted accordingly. Also, the candidate vector x is modified by setting the bit(s) corresponding to the removed candidate(s) to zero.

B. Adapting the Gene Search Area

The second level of adaptation is performed on the lower and upper boundaries, ai, bi, i = 1, . . . , KG, of each gene. At the beginning, the two values are selected to be large enough to cover the expected range of the genes. During evolution, the bounds of each gene are adjusted according to the contribution of that gene to the output. Genes with a relatively large correlation coefficient ρ(i) are believed to be very close to the optimal kernel values. Hence, one can shorten the search interval of such a gene and localize it around the latest reached value. Let us assume that I = b0 − a0 is the initial range at the first generation and Q is the final range to be reached when evolution stops (fitness F ≅ 1.0). Also, the fitness of the best chromosome after generation G is denoted

ABBAS AND BAYOUMI: VOLTERRA-SYSTEM IDENTIFICATION USING ADAPTIVE REAL-CODED GENETIC ALGORITHM 7

by Fbest(G). For each gene, the new boundaries will be calculated using the following equations:

rc = I − Fbest(G)(I − Q)
bci = gc + (1/2)rc
aci = gc − (1/2)rc.

Here, the candidate range is resized around the current value of the gene gc. It should be noted that this range is affected by the value of the best fitness in the current generation and, consequently, by the correlation coefficient of each gene. A large fitness value means that the algorithm is approaching the optimal solution, which allows a local search to start by shortening the search interval. However, genes with very small ρ(c) indicate that they either have not reached a good global solution or can be removed entirely. Hence, the amount of interval shortening is modulated further by ρ(c): the higher the value of ρ(c), the smaller the new range becomes. The main objective is to provide a finely tuned local search. Conversely, a small correlation coefficient leaves the range wide enough to allow the needed global search. If the required fitness threshold is reached and there are still some genes with ρ(c) < ρt, then a gene-removal process is triggered.
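The boundary update can be sketched as below (illustrative Python; the further modulation by ρ(c) mentioned above is omitted for brevity, and the function name is ours):

```python
def shrink_bounds(g_c, I, Q, F_best):
    """Resize a gene's search interval around its current value g_c.

    I      -- initial range b0 - a0 at the first generation
    Q      -- final range reached when evolution stops (fitness ~ 1.0)
    F_best -- best fitness in the current generation, in [0, 1]

    The width interpolates linearly from I (F_best = 0) down to Q
    (F_best = 1), centered on g_c.
    """
    r_c = I - F_best * (I - Q)
    return g_c - 0.5 * r_c, g_c + 0.5 * r_c
```

As the best fitness approaches 1, the interval collapses toward the width Q, so mutation and crossover implicitly switch from global to local search.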

VI. SIMULATION RESULTS AND ANALYSIS

The proposed Volterra-system-identification algorithm has been tested using a set of experiments. We describe three examples to demonstrate the performance of the algorithm in identifying a third-order system. The locations of the principal candidates are chosen randomly. The corresponding kernel values of the chosen terms are also generated from a uniform distribution bounded by the upper and lower values of each kernel ([−2.5, 2.5]). The first example demonstrates the ability of the algorithm to correctly identify the system when no linear terms exist, with white Gaussian input and noise-free measurements. The second example describes the algorithm's performance in the presence of white Gaussian measurement noise. The third and last experiment shows the results when the system is driven by colored Gaussian input and there is white Gaussian output noise. For the three examples, the input sequence u is drawn from a zero-mean unit-variance white Gaussian distribution. The number of data points generated was 4000. The evolution process is halted when the final fitness of the system reaches a value of 0.99 or when a maximum number of generations has elapsed. In each generation, a population of 150 chromosomes is used. The probability of selecting the best individual for the ranking method was set at 0.09. An arithmetic crossover with probability 0.025 is applied. The parameter values of the multi-nonuniform mutation used in this work are 0.05, 2000, and 8 for the probability, Gmax (the maximum number of generations), and p (the shape), respectively. A pseudocode of the algorithm could be as follows.

1. Given: input u, output y, the number of sampled points D, system delays N, order n, and the SNR if there is measurement noise.
2. Calculate all K candidates and their correlations {φ(c)}, c = 1, . . . , K, with y.
3. Sort the candidates in descending order according to the value of φ; construct the candidate matrix X and the vector of selected candidates x.
4. Assign a number e = K/E of candidates to each of the E evolution eras.
5. Initialize a population of P individuals of length e with uniform random numbers from [a0, b0].
6. While fitness f < 0.999 do
7.   Evaluate the fitness of the current population; perform selection, crossover, and mutation.
8.   Repeat step 7 until f has stopped improving and ft < f < 0.999.
9.   Calculate ρ(c) for all genes and remove the insignificant ones.
10.  Adjust the chromosome length, X and x, and the boundaries ai, bi of each gene.
11.  Add the next e genes from the candidate pool.
12. end while
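Steps 2 and 3 of the pseudocode can be sketched as follows. This is an illustrative reading, not the authors' code: it enumerates all monomials of degree 1 to n over the delayed inputs u(k), . . . , u(k−N), whereas the paper's candidate counts may correspond to a different subset of terms; function names are ours.

```python
import numpy as np
from itertools import combinations_with_replacement

def build_candidates(u, N, n):
    """Form every Volterra monomial u(k-i1)...u(k-id), 1 <= d <= n,
    over delays 0..N. Returns the candidate matrix X (one column per
    monomial) and the list of delay tuples identifying each column."""
    lags = np.column_stack([u[N - i: len(u) - i] for i in range(N + 1)])
    cols, terms = [], []
    for d in range(1, n + 1):
        for combo in combinations_with_replacement(range(N + 1), d):
            cols.append(np.prod(lags[:, list(combo)], axis=1))
            terms.append(combo)
    return np.column_stack(cols), terms

def sort_by_correlation(X, y):
    """Indices of candidates in descending order of |corr(X[:, c], y)|."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(corr)[::-1]
```

The sorted indices then feed step 4: the first e columns are assigned to the first era, the next e to the second, and so on.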

A. Example 1: A Third-Order Volterra System With No Measurement Noise

The first example is a third-order Volterra system (n = 3) with 15 time delays (N = 15). This amounts to a total of 816 candidates. Only 25 candidates have been used to generate the output. Table I shows the kernel values, their ordering before and after sorting, their correlation coefficients φ with the output, and the extracted values produced by the evolutionary algorithm. It is obvious that none of the selected terms is a linear one (all term indices are greater than 16), so this is a completely nonlinear problem. We have assigned 30 candidates to each evolution era. From the table, it is clear that the algorithm requires 11 eras to reach a solution. For the same number of eras used with nonsorted candidates, the number of generations needed for the evolution merely to detect the 17 significant candidates would be far greater. This can be explained by examining the distribution of sorted candidates over the different eras. Twenty-two candidates are detected during the first seven eras, which produced a fitness value greater than 0.7. Since this value is far greater than the threshold fitness (ft = 0.01) at which insignificant candidates can be removed, it was easy for the algorithm to retain 24 candidates after the fourth era and to optimally detect the 22 candidates after the seventh era, as shown in Fig. 1, which depicts the change in the genome length (number of extracted candidates). Fig. 2 shows the evolution of the fitness value on a logarithmic scale. From the two figures, it is clear that there was no removal of insignificant terms during the first two eras since the fitness value was too low to trigger the removal process. Candidates started to be removed during the third era as the fitness value began to improve and the confidence in the insignificant terms had become too low [in terms of the values of ρ(i)]. Since there were no significant candidates to pick up in eras 4–6, there was no improvement in the fitness value, and all candidates added during those three


TABLE I
EXAMPLE 1: ORIGINAL AND EXTRACTED TERMS AND THEIR KERNELS

Fig. 1. Example 1: Number of extracted candidates during evolution.


Fig. 2. Example 1: Evolution of fitness value.

eras have been removed. It is worth mentioning that these three eras consumed only 400 generations. Two more candidates were added in the seventh era, with a considerable increase in the fitness value from 0.22 to 0.7. Understandably, eras 8–10 did not add any fitness improvement. The final sought value was reached in the 11th era, when the last candidate was added.

It is very interesting to investigate the behavior of the algorithm with nonsorted candidates. The first three eras would contain only three significant terms, compared to 22 for the sorted case. Naturally, with this small number of significant terms, it would take a much greater number of generations for the fitness to stabilize within an era. Moreover, the algorithm would accumulate many candidates (300 candidates by the 11th era) before it started to reach the fitness threshold at which insignificant candidates can be removed. This would incur huge computational costs in terms of the fitness calculation and the crossover and mutation operations.

There are many issues pertaining to the proposed algorithm that need to be addressed.

1) The number of candidates assigned to each era should be large enough to allow significant candidates to be selected in early eras. On the other hand, a small number of candidates per era makes candidate removal fast and keeps each era short, which eventually results in fast convergence. A compromise therefore has to be made.

2) The evolutionary algorithm managed to extract the correct number of candidates after 1700 generations with a fitness value equal to 0.99996. Naturally, at convergence, the value of ρ(c) of the remaining candidates should be equal to unity. It should also be noted that the fast convergence to the unit fitness value can be attributed to the shrinkage of the kernel search areas, as we have made the boundaries of each gene's value decrease linearly with the fitness value. This has allowed the crossover and mutation processes to perform a local search and thus to converge rapidly to the correct kernel values.

3) The decision to end an era and start a new one is based on the fitness having stopped improving. The test for no improvement was implemented in the proposed algorithm by averaging the logarithm of the fitness value over the past ten generations:

flavg = (1/10) ∑_{j=G−9}^{G} log(fj)

where G is the current generation and fj is the best fitness value at generation j. If the difference between two consecutive values of flavg is less than a certain threshold, a new era starts.
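A minimal sketch of this stopping test (illustrative Python; the tolerance value and function name are assumptions):

```python
import numpy as np

def era_stalled(best_fitness_history, tol=1e-4):
    """End-of-era test: flavg(G) averages log(f_j) over the last ten
    generations, and the era ends when |flavg(G) - flavg(G-1)| < tol.
    Requires at least 11 recorded generations."""
    if len(best_fitness_history) < 11:
        return False
    logf = np.log(np.asarray(best_fitness_history[-11:], dtype=float))
    flavg_now = logf[1:].mean()       # generations G-9 .. G
    flavg_prev = logf[:-1].mean()     # generations G-10 .. G-1
    return abs(flavg_now - flavg_prev) < tol
```

Averaging the logarithm rather than the raw fitness makes the test sensitive to relative improvement, which matters at the very small fitness values seen early in an era.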

B. Example 2: White Gaussian Input and Measurement Noise

In this experiment, the identification of a third-order Volterra system with a large number of system delays is carried out when white Gaussian noise is added to the output with a signal-to-noise ratio (SNR) equal to 5 dB. As in the first example, the input is 4000 data points drawn from a white Gaussian noise with zero mean and unit variance. The measurement noise is independent of the input signal. Here, we have 17 time delays and hence 1140 candidates to be examined to extract 18 kernel values for the participating candidates. We assigned 20 candidates to each era. Table II lists the original terms together with the evolved ones. Obviously, the proposed approach was successful in extracting the correct terms, with extracted kernel values close to the original ones. Unlike the previous example, where there was no measurement noise, it is almost impossible to find the correct kernel values, as we are seeking a maximum likelihood of


TABLE II
EXAMPLE 2: ORIGINAL AND EXTRACTED TERMS AND THEIR KERNELS

Fig. 3. Example 2: Evolution of fitness value.

the noise function rather than the exact noise sequence. The evolution of the fitness value and of the number of candidates is shown in Figs. 3 and 4, respectively. It took the algorithm 135 generations to end the first era with a fitness value exceeding 0.01. The length of the chromosome at the end of this era was 14, with three unimportant terms. The second era started by adding the next 20 candidates, with the end result of actually detecting 12 principal terms and 8 removable ones. After 200 generations in this era, the fitness value was only up to 0.065 (Fig. 3), resulting in a very small correlation threshold ρt and triggering the removal of the irrelevant candidates with very small ρ(c). In the third era, there were no principal terms to detect, but the algorithm ended up collecting more unimportant terms due to the high vigilance associated with ρt. Things changed in the fourth era, where the four extra principal terms picked up there resulted in a large increase in the fitness value (0.61) and thus produced a higher value of ρt. This resulted in the removal of all unimportant terms, and the era ended with the maximum number of principal terms (16) that could be detected at that point of the evolution. The last


Fig. 4. Example 2: Number of extracted candidates during evolution.

TABLE III
EXAMPLE 3: ORIGINAL AND EXTRACTED TERMS AND THEIR KERNELS

two candidates were collected in the fifth era, and the algorithm terminated after 780 generations. The algorithm could have converged in far fewer generations had the number of generations required to elapse before starting a new era been reduced. It should be noted that, had the evolution been performed on nonsorted candidates, it would have taken at least 40 eras (to detect candidate 805) to reach a final solution. The adaptation of the gene boundaries also resulted in a very small search area for the 16 candidates detected in the first four eras. Starting the last era with these tight search areas produced a very fast convergence to the solution.

C. Example 3: Colored Gaussian Input and Measurement Noise

Since least-squares identification methods such as Korenberg's FOS [20] and the evolutionary algorithm suggested in this study do not require a white Gaussian input [22], a different input is used here to demonstrate the proposed algorithm's performance on nonwhite Gaussian inputs. A colored Gaussian input is generated by passing the white Gaussian sequence through a Butterworth low-pass filter of order 4 with normalized cutoff frequency equal to 0.8. The intent is to demonstrate the algorithm's efficiency with different types of inputs. White Gaussian measurement noise at an SNR of 10 dB was added to the output. Here, we have a third-order system with 16 time delays, making up 969 candidates that need to be tested to find ten kernel values. We assigned 15 candidates to each era. Table III lists the same information as Table II. As was the case in the previous example, the effect of the measurement noise on the accuracy of the extracted kernels is evident.
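The colored input of this example can be generated as follows (an illustrative sketch using SciPy's `butter`/`lfilter`; the seed, the function name, and the use of SciPy itself are our assumptions, not the authors' code):

```python
import numpy as np
from scipy.signal import butter, lfilter

def colored_gaussian(D, order=4, cutoff=0.8, seed=0):
    """Colored Gaussian input: a zero-mean unit-variance white Gaussian
    sequence passed through a digital low-pass Butterworth filter
    (order 4, normalized cutoff 0.8, where 1.0 is the Nyquist rate)."""
    rng = np.random.default_rng(seed)
    white = rng.normal(0.0, 1.0, D)
    b, a = butter(order, cutoff)      # low-pass filter coefficients
    return lfilter(b, a, white)
```

The low-pass filtering introduces correlation between neighboring samples, which is precisely what distinguishes this input from the white sequences of Examples 1 and 2.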

Figs. 5 and 6 show the evolution of the number of detected candidates and the fitness value, respectively. Expectedly, the algorithm took eight eras to find the solution, in 567 generations. The inaccuracies in evaluating the kernel values due to the measurement noise and the colored input are quite evident.


Fig. 5. Example 3: Evolution of fitness value.

Fig. 6. Example 3: Number of extracted candidates during evolution.

However, this inaccuracy is less noticeable than in the previous example since the SNR is higher (10 versus 5 dB). Another interesting observation in this experiment is the fast convergence. This is due to the fact that, at the end of the first era, the six principal terms had already been detected, in addition to only one removable term. This produced a fitness value of 0.285 at the end of the era. In the second era, the needed eight principal terms were collected and there were no unimportant kernels. The fitness increased to close to 0.7. At this value, the correlation threshold ρt is set to its maximum (0.1), resulting in the removal of all unimportant terms in all subsequent eras within only a few generations. The ninth term was then collected in the fourth era, and the fitness jumped to 0.88. The last term was detected in the eighth era at a fitness value equal to 0.99. The fast convergence is also attributed to the shrinking search area for the kernel values as the fitness improves. As mentioned earlier, the algorithm then switches to a local search rather than the global one normally performed in early generations.

A very interesting feature of the proposed algorithm is that it can work for large values of system order and time delays. Although this has an impact on the storage needed to hold a large number of candidates in memory, storage can be reclaimed at the end of each era, freeing a large amount of memory.

It is worth mentioning that the merit of any evolutionary algorithm is normally enhanced if a mathematical analysis of its performance is provided. Unfortunately, a rigorous mathematical analysis of even the simple binary-coded GA is difficult and still incomplete [23]. Holland [24] introduced the Schemata Theory in an attempt to provide some theoretical foundations explaining the convergence of GAs. The theory offers an interpretation of chromosome evolution from one generation to the next; however, it fails to describe the long-term behavior of the algorithm. There have been some attempts to analyze the performance of real-coded GAs. Neubauer [25] studied the nonuniform mutation operator; however, that work did not show how the analysis can be extended to multiple genes or whether the results generalize to different mutation operators. Greenhalgh and Marshall [26] discussed the convergence properties of GAs with nonbinary encoding whose gene values are drawn from a set of K elements. By studying the mutation operator, they showed that there is an upper bound on the number of iterations necessary to guarantee convergence to a global optimum with any specified level of confidence. However, the analysis did not consider the effect of the crossover or selection operators, which apply to our case. Given the complex level of adaptation adopted in this work, it would be extremely difficult to extend those results to the proposed algorithm. A reasonable approach to providing a theoretical analysis of the real-coded GA is to further develop the Schemata Theory in such a way that it can handle variable-length real-coded chromosomes operated on by the genetic operators introduced in this work.

VII. CONCLUSION

In this paper, an evolutionary algorithm for Volterra-system identification is presented. The proposed approach is based on real-valued encoding of the individuals of the GA. The genes comprising an individual, or chromosome, represent the values of the Volterra kernels of the identified system. The proposed evolutionary algorithm exploits the fact that candidates with the highest correlation with the output are more likely to be principal system terms. The algorithm sorts all available candidates in descending order according to their correlation with the output. The evolution process goes through different eras. In the first era, the candidates with the largest correlation are detected. In subsequent eras, further candidates are examined through the evolutionary process. During each era, irrelevant candidates are dropped and the fittest survive to the next era. The parameters that have the least significant effect on the error-reduction process are continuously removed from the chromosome. The removal criterion is based on a second correlational relationship, between each remaining term and the modeling error obtained when that term is removed. Genes with the least correlation are tested against a certain threshold to decide which genes to remove. The threshold is expressed as a function of the evolution fitness, which improves as the evolution process matures. The process ends when a solution is found and the identified system is successfully reproduced by the Volterra model. Convergence to the needed solution is enhanced by adaptively removing the unimportant genes and by decreasing the search space of each kernel value. Whether the input is white or colored Gaussian, in the absence of any measurement noise the algorithm has produced exact results when applied to second- and third-order systems with relatively large system memory in order to extract a small number of kernel values. For noisy outputs, the algorithm managed to detect the correct Volterra terms with a small error in the kernel values. It was clear that the level of noise added to the output has more influence on the accuracy of the detected kernels than the type of input used. In all cases, the algorithm was able to find the correct terms. The added noise also has a great effect on the correlation between the candidates and the system output, which in turn produced an interesting ordering of the principal candidates. In most of the noise-free systems we experimented with in this study, all principal candidates tended to be packed into the first few eras, resulting in very fast convergence. With noise added, however, as was evident in the last example, some principal candidates can be pushed back to late eras; this was characteristic of the noisy systems and is mainly due to computing the correlation of a candidate with an output that is corrupted by noise.

The GA population selection process and the genetic operators, mutation and crossover, were chosen both to accelerate the evolution process and to find the correct solution. The fitness function employed in this study depends on whether or not there is measurement noise on the output. For the noise-free output, a fitness function based on reducing the mean-squared error was used. For noisy measurements, the objective was to maximize the likelihood that the representation error is drawn from the same probability density function (pdf) as the measurement noise. Both fitness functions chosen in this work prevent insignificant candidates from participating in the final solution, i.e., from "hitchhiking."

ACKNOWLEDGMENT

The authors would like to thank the anonymous referees for their very useful suggestions, which helped to considerably improve the quality of the paper.

REFERENCES

[1] J. Amorocho and A. Brandstetter, “Determination of nonlinear func-tional response functions in rainfall runoff processes,” Water Resour.Res., vol. 7, no. 5, pp. 1087–1101, 1971.

[2] V. Z. Marmarelis, “Identification of nonlinear systems using laguerreexpansion of kernels,” Ann. Biomed. Eng., vol. 21, no. 6, pp. 573–589,1993.

[3] S. Benedetto and E. Biglieri, “Nonlinear equalization of digital satel-lite channels,” IEEE J. Sel. Areas Commun., vol. 1, no. 1, pp. 57–62,Jan. 1983.

[4] O. Agazzi and D. G. Messerschmitt, “Nonlinear echo cancellation ofdata signals,” IEEE Trans. Commun., vol. 30, no. 11, pp. 2421–2433,Nov. 1982.

[5] J. D. Taft and N. K. Bose, “Quadratic linear filters for signal detection,”IEEE Trans. Signal Process., vol. 39, no. 11, pp. 2557–2559, Nov. 1991.

[6] R. Parker and M. Tummala, “Identification of volterra systems with apolynomial neural network,” in Proc. IEEE Int. Conf. Acoustics, Speechand Signal Processing, San Francisco, CA, 1992, vol. 4, pp. 561–564.

[7] M. J. Korenberg, “A robust orthogonal algorithm for system identifica-tion and time-series analysis,” Biol. Cybern., vol. 60, no. 4, pp. 267–276,1989.

[8] A. Sierra, J. A. Macias, and F. Corbacho, “Evolution of functional linknetworks,” IEEE Trans. Evol. Comput., vol. 5, no. 1, pp. 54–65, Feb. 2001.

[9] M. J. Korenberg, “Parallel cascade identification and kernel estimationfor nonlinear systems,” Ann. Biomed. Eng., vol. 19, no. 1, pp. 429–455,1991.

14 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS

[10] L. Yao, “Genetic algorithm based identification of nonlinear systems bysparse Volterra filters,” IEEE Trans. Signal Process., vol. 47, no. 12,pp. 3433–3435, Dec. 1999.

[11] N. Xiong, “A hybrid approach to input selection for complex pro-cesses,” IEEE Trans. Syst., Man, Cybern. A, vol. 32, no. 4, pp. 532–536,Jul. 2002.

[12] J. Wray and G. Green, “Calculation of the volterra kernels of nonlineardynamic systems using an artificial neural network,” Biol. Cybern.,vol. 71, no. 3, pp. 187–195, 1994.

[13] Z. Michalewicz, Genetic Algorithms + Data Structures = EvolutionPrograms. Berlin, Germany: Springer-Verlag, 1996.

[14] D. E. Goldberg, Genetic Algorithms in Search, Optimization, andMachine Learning. Reading, MA: Addison-Wesley, 1989.

[15] A. Nissinen and H. Koivisto, “Identification of multivariate Volterra series using genetic algorithms,” in Proc. 2nd Nordic Workshop Genetic Algorithms (2NWGA), Vaasa, Finland, 1996, pp. 151–161.

[16] S. Chen, C. F. N. Cowan, and P. M. Grant, “Orthogonal least squares learning algorithm for radial basis function networks,” IEEE Trans. Neural Netw., vol. 2, no. 2, pp. 302–309, Mar. 1991.

[17] V. Duong and A. R. Stubberud, “System identification by genetic algorithm,” in Proc. IEEE Aerospace Conf., Big Sky, MT, 2002, vol. 5, pp. 2331–2338.

[18] G. Harik, E. Cantu-Paz, D. E. Goldberg, and B. L. Miller, “The gambler’s ruin problem, genetic algorithms, and the sizing of populations,” Evol. Comput., vol. 7, no. 3, pp. 231–253, 1999.

[19] J. A. Joines and C. R. Houck, “On the use of non-stationary penalty functions to solve nonlinear constrained optimization problems with genetic algorithms,” in Proc. 1st IEEE Conf. Evolutionary Computation, Orlando, FL, 1994, pp. 579–584.

[20] M. J. Korenberg and L. D. Paarmann, “Orthogonal approaches to time-series analysis and system identification,” IEEE Signal Process. Mag., vol. 8, no. 3, pp. 29–43, Jul. 1991.

[21] R. Hinterding, Z. Michalewicz, and A. S. Eiben, “Adaptation in evolutionary computation: A survey,” in Proc. 4th IEEE Conf. Evolutionary Computation, Indianapolis, IN, 1997, pp. 65–69.

[22] D. T. Westwick, B. Suki, and K. R. Lutchen, “Sensitivity analysis of kernel estimates: Implications in nonlinear physiological system identification,” Ann. Biomed. Eng., vol. 26, no. 3, pp. 488–501, 1998.

[23] P. Fleming and C. Purshouse. (2001). Genetic Algorithms in Control Systems Engineering, IFAC Professional Brief. [Online]. Available: http://www.oeaw.ac.at/ifac/publications/pbriefs/btxdoc.pdf

[24] J. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor, MI: Univ. Michigan Press, 1975.

[25] A. Neubauer, “A theoretical analysis of the non-uniform mutation operator for the modified genetic algorithm,” in Proc. IEEE Conf. Evolutionary Computation, Indianapolis, IN, 1997, pp. 93–97.

[26] D. Greenhalgh and S. Marshall, “Convergence criteria for genetic algorithms,” SIAM J. Comput., vol. 30, no. 1, pp. 269–282, 2001.

Hazem M. Abbas (SM’92–M’94) received the B.Sc. and M.Sc. degrees in electrical and computer engineering in 1983 and 1988, respectively, from Ain Shams University, Cairo, Egypt, and the Ph.D. degree in electrical and computer engineering in 1993 from Queen’s University, Kingston, ON, Canada.

He held a postdoctoral position at Queen’s University in 1993. In 1995, he worked as a Research Fellow at the Royal Military College at Kingston and then joined the IBM Toronto Lab as a Research Associate. He joined the Department of Electrical and Computer Engineering at Queen’s University as an Adjunct Assistant Professor in 1997–1998. He is on sabbatical leave from the Department of Computers and Systems Engineering at Ain Shams University, where he works as an Associate Professor. He is currently working for Mentor Graphics Inc., Egypt, as an Engineering Manager. His research interests are in the areas of neural networks, pattern recognition, evolutionary computations, and image processing.

Dr. Abbas currently serves as the Acting President of the IEEE Signal Processing Chapter in Cairo.

Mohamed M. Bayoumi (S’61–M’73–SM’83) received the B.Sc. degree in electrical engineering from the University of Alexandria, Egypt, in 1956, and the Ph.D. degree in electrical engineering and the “Diplom Mathematiker” degree in applied mathematics from the Swiss Federal Institute of Technology, Zurich, in 1963 and 1966, respectively.

From 1963 to 1969, he worked in the Research and Development Laboratory for Control System Design at Landis and Gyr, Zug, Switzerland. In 1969, he joined the faculty at Queen’s University, Kingston, Ontario, Canada, and has since been associated with the Electrical and Computer Engineering Department. His research interests lie in control systems, robotics, signal processing, image processing, and computer vision.