Transcript of Multivariable Gaussian Evolving Fuzzy Modeling System

Copyright (c) 2010 IEEE. Personal use is permitted. For any other purposes, Permission must be obtained from the IEEE by emailing [email protected].

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


Multivariable Gaussian Evolving Fuzzy Modeling System

Andre Lemos, Student Member, IEEE, Walmir Caminhas, Fernando Gomide, Senior Member, IEEE

Abstract—This paper introduces a class of evolving fuzzy rule-based systems as an approach for multivariable Gaussian adaptive fuzzy modeling. The system is an evolving Takagi-Sugeno functional fuzzy model whose rule-base can be continuously updated using a new recursive clustering algorithm based on participatory learning. The fuzzy sets of the rule antecedents are multivariable Gaussian membership functions, adopted to preserve information about interactions between the input variables. The parameters of the membership functions are estimated by the clustering algorithm. A weighted recursive least squares algorithm updates the parameters of the rule consequents. Experiments considering time series forecasting and nonlinear system identification are performed to evaluate the performance of the proposed approach. The multivariable Gaussian evolving fuzzy models are compared with alternative evolving fuzzy models and classic models with fixed structures. The results suggest that multivariable Gaussian evolving fuzzy modeling is a promising approach for adaptive system modeling.

Index Terms—Evolving Fuzzy Systems, Participatory Learning, Adaptive Fuzzy Rule-Based Modeling

I. INTRODUCTION

DURING the last decades, fuzzy rule based systems and their hybrid derivations have played an important role in modeling complex systems. Initially, the design of such systems was based on expert knowledge [1]. During the 1990s, a new trend emerged, based on data-driven rule/knowledge extraction methods [2]–[8], where the system structure is identified from data and expert knowledge plays a complementary role [9]. The techniques used in this period are mainly clustering, linear least squares, and/or nonlinear optimization for fine-tuning of model parameters.

Nowadays, increasing interest in adaptive system modeling has motivated the development of highly adaptive fuzzy systems, denominated evolving fuzzy systems, whose models are self-developed from a stream of data [10].

Evolving fuzzy systems (eFS) can be seen as a synergy between fuzzy systems, as a mechanism for evolvable information compaction and representation, and recursive methods of machine learning [11].

Several eFS can be found in the literature, some of them proposing evolvable functional rule-based systems in which the structure (number of rules and antecedent/consequent parameters) continuously evolves based on clusters created or excluded by recursive clustering algorithms [12]. The parameters of the consequents are updated using recursive least squares or its variations [13], [14]. In [15] an evolving Takagi-Sugeno model (eTS) was proposed using an incremental version of the subtractive clustering algorithm [3] with recursive evaluation of the information potential of new data samples to create new clusters or revise the existing ones. The rule consequent parameters are updated with the recursive least squares algorithm. A simplified version of the eTS was suggested in [16] to reduce the complexity of information potential calculations. The Simpl_eTS replaces the notion of information potential by the concept of scatter to provide a similar, but computationally more effective algorithm. The DENFIS (Dynamic Evolving Neural-Fuzzy System) [17] is another evolvable TS system derived from a distance-based recursive evolving clustering method (ECM) to adapt the rule base structure, and a weighted recursive least squares algorithm with forgetting factor to update the rule consequent parameters. SONFIN (Self-constructing Neural Fuzzy Inference Network) [18] uses an input space partition approach with an aligned clustering-based method to define and eliminate redundant fuzzy sets for each input variable in an incremental manner. FLEXFIS (Flexible Fuzzy Inference System), detailed in [19], uses a recursive clustering algorithm derived from a modification of the vector quantization technique [20] called eVQ (Evolving Vector Quantization) [21]. Consequent parameter estimation uses the weighted recursive least squares algorithm. More recently, an evolving neuro-fuzzy type-2 model, SOFMLS (Self-organizing Fuzzy Modified Least-squares Network), was developed [22]. The SOFMLS employs an evolving nearest-neighborhood clustering algorithm. The literature also reports alternative methods to adapt model structure. The SAFIS (Sequential Adaptive Fuzzy Inference System) [23] uses a distance criterion in conjunction with an influence measure of the new rules created. The SOFNN (Self-organising Fuzzy Neural Network) [24] adopts an error criterion considering the generalisation performance of the network. Although most of the evolving system approaches develop functional fuzzy models, the literature also mentions linguistic models. For instance, in [25] an evolving fuzzy classifier is developed using the clustering algorithm of [16]. In [26] an evolvable linguistic modeling methodology for fault detection is proposed. The methodology, denominated FSPC, uses a distance-based clustering inspired by statistical process control [27] to learn the parameters of the antecedent and consequent fuzzy sets.

The recursive clustering algorithms used in most of the eFS found in the literature can be summarized by a two-step procedure. First, a distance measure is defined and an initial number of clusters (and respective cluster centers) is estimated from a priori knowledge about the problem; alternatively, a single cluster with its center at the first data sample is created. Second, for each new data sample, the distance between the existing clusters and the sample is computed; if the distance exceeds a threshold, then a new cluster is created, otherwise the parameters of the closest cluster are updated using recursive algorithms [16], [17], [19], [22], [25].


Despite their effectiveness, the clustering algorithms constructed upon this two-step procedure have a major drawback in the sense that they lack robustness. Whenever noisy data or outliers exceed a threshold, the two-step algorithms create new clusters instead of rejecting or smoothing the data.

A robust evolving TS system was developed in [28] using a recursive clustering algorithm inspired by the idea of participatory learning [29]. Participatory learning is a learning paradigm which assumes that learning and beliefs about the system to be modeled depend on what the learning process has already learned. An essential characteristic of this learning process is that the impact of a new observation in causing learning or belief revision depends on its compatibility with the current system belief. Therefore, clustering algorithms based on this learning process tend to be robust to noisy data, because single outliers are likely to be incompatible with the current system belief and consequently can be either discarded or have their effect smoothed.

This paper introduces a new evolving fuzzy modeling approach called Multivariable Gaussian Evolving Fuzzy Modeling System, eMG for short. The eMG uses an evolving Gaussian clustering algorithm rooted in the concept of participatory learning. The evolving clustering procedure developed here can be seen as an extension of the one addressed in [30]. Differently from the batch algorithm of [30], here each cluster is represented by a multivariable Gaussian membership function, the cluster structure (number, center, and shape of the clusters) is recursively updated at each step of the algorithm, and thresholds are set automatically.

More specifically, the evolving clustering algorithm developed in this paper considers the possibility that input variables may interact with each other. Clusters are estimated using a normalized distance measure (similar to the Mahalanobis distance) and trigger ellipsoidal clusters whose axes are not necessarily parallel to the input variable axes, as would be the case if the Euclidean distance were used [15], [17], [19]. The idea is to preserve information about interactions between input variables. The fuzzy sets of the rule antecedents are multivariable Gaussian membership functions characterized by a center vector and a dispersion matrix representing the dispersion of each variable and the interactions between variables. Similarly to other evolving system modeling approaches [15], [19], the parameters of the fuzzy rule consequents are updated using weighted recursive least squares.

As discussed later, the evolving clustering algorithms used by some eFS create a new cluster whenever a distance measure exceeds a given threshold value [17], [19], [22]. As pointed out in [19], to avoid the curse of dimensionality, the threshold value must be chosen considering the input or input-output data space dimension. This is because the higher the space dimension, the greater the distance between two adjacent data points [31]. Thus, if the threshold does not account for the dimension, then as the dimensionality increases more observations will have distances exceeding this threshold. Therefore more clusters are created, the model becomes more complex, and over-fitting may occur. The recursive clustering algorithm introduced here avoids the curse of dimensionality through an automatic mechanism to adjust the threshold value based on the input space dimension.

The eMG developed in this paper differs from existing evolving modeling approaches because:

• it uses multivariable membership functions for the fuzzy sets of the rule antecedents to prevent information loss about input variable interactions (section II);
• the recursive clustering algorithm is robust to noisy data and outliers because it derives from the concept of participatory learning, a concept which provides a mechanism to smooth incompatible data (section III);
• creation of new rules is governed by an automatic mechanism that adjusts the threshold value considering the input space dimension and, therefore, does not suffer from the curse of dimensionality (section III);
• there is no need to normalize input and output data.

The remainder of the paper is organized as follows. Section II reviews why multivariable Gaussian membership functions of fuzzy rule-based systems created by clustering procedures help to capture interactions between the input variables and improve modeling. Next, section III details the evolving clustering algorithm within the framework of participatory learning. The complete eMG algorithm is presented in section IV. Section V addresses applications of eMG in time series forecasting and nonlinear system identification. Finally, conclusions and further developments are summarized in section VI.

II. MULTIVARIABLE GAUSSIAN MEMBERSHIP FUNCTIONS

All sections and experiments reported in this paper assume multivariable Gaussian membership functions of the form:

H(x) = \exp\left[-\tfrac{1}{2}(x - v)\,\Sigma^{-1}(x - v)^T\right]    (1)

where x is a 1 × m input vector, v is the 1 × m center vector, and Σ is an m × m symmetric, positive definite matrix. The center vector v is the modal value and represents the typical element of H(x). The matrix Σ denotes the dispersion and represents the spread of H(x) [32]. Both v and Σ are parameters of the membership function, associated with the cluster center and cluster spread, respectively.
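For concreteness, the sketch below evaluates (1) for a given center and dispersion matrix; the function and variable names are ours, not part of the eMG formulation.

import numpy as np

def gaussian_membership(x, v, sigma):
    """Evaluate the multivariable Gaussian membership function (1)."""
    d = x - v
    # (x - v) Sigma^{-1} (x - v)^T, computed without an explicit inverse
    return np.exp(-0.5 * d @ np.linalg.solve(sigma, d))

# A non-diagonal dispersion matrix captures interaction between x1 and x2
v = np.array([0.5, 0.5])
sigma = np.array([[0.04, 0.02],
                  [0.02, 0.04]])
print(gaussian_membership(np.array([0.6, 0.6]), v, sigma))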

There are several motivations for using Gaussian membership functions. They include:

• infinite support: these functions do not omit inputs;
• parameters are easy to define and only two are required;
• interactions between input variables can be easily captured by the dispersion matrix.

It is interesting to note that, because the modeling framework is adaptive and runs recursively, the infinite support property is convenient since the bounds of the input variables may not be known a priori. It is also known in the neural network literature that Gaussian radial basis functions are well suited to characterize local properties [33]. Moreover, Gaussians provide a way to construct basis functions, and fuzzy systems can be represented as linear combinations of fuzzy basis functions. Linear combinations of fuzzy basis functions are capable of uniformly approximating any real continuous function on a compact set to arbitrary accuracy, i.e., they are universal approximators [34]. Functional fuzzy models with Gaussians are also universal approximators [35].

Fig. 1. Cluster structure in the input space

Most eFS perform clustering in the input or input-output data space, and create rules using one-dimensional, single variable fuzzy sets which are projections of the clusters on each input variable space. During fuzzy inference, the fuzzy relation induced by the antecedent of each fuzzy rule is computed using an aggregation operator (e.g. a t-norm) and the input fuzzy sets. This approach is commonly used, but it may cause information loss if input variables interact [36], [37]. For instance, system identification and time series forecasting usually use lagged values of the input and/or output as inputs, and these lagged values tend to be highly related.

To avoid information loss, the algorithm introduced herein uses multivariable, instead of single variable, Gaussian membership functions to represent each cluster developed by the recursive clustering algorithm. The parameters of the membership functions are extracted directly from the corresponding clusters. These multivariable membership functions use the information about the dispersion matrix of each cluster (estimated by the clustering algorithm proposed in this paper) and thus provide information about input variable interactions.

Fig. 1 illustrates a typical scenario where the use of the projection approach may result in information loss. While Fig. 1 illustrates the cluster structure of the input space, Fig. 2 shows the single variable Gaussian membership functions created by projecting the clusters on the input spaces. There are three clusters in the example and each cluster represents the input relation of a fuzzy rule antecedent. The three input relations are

x1 is A1 AND x2 is B1
x1 is A2 AND x2 is B2
x1 is A3 AND x2 is B3    (2)

Fig. 3 shows the fuzzy relations induced by the single variable membership functions, assuming the product t-norm as the aggregation (AND) operator for each rule antecedent.

Fig. 2. Single variable Gaussian membership functions: projections of the clusters of Fig. 1 on the input spaces

Fig. 3. Fuzzy relations induced by the rule antecedents (product t-norm for AND)

Fig. 4 depicts the multivariable Gaussian membership functions constructed using the cluster center vector and dispersion matrix estimated for each cluster. The multivariable membership functions suggest three rules whose antecedents are:

x is H1
x is H2
x is H3    (3)

Comparing the fuzzy relations shown in Fig. 3 and those of Fig. 4 with the original clusters in Fig. 1, one can note the information loss due to projection and reconstruction with the product t-norm (the issue of finding a t-norm that recovers the original relation precisely is still, generally speaking, open). Cluster shapes and their orientation differ from the original ones. In Fig. 3 clusters have diagonal dispersion matrices and input variables x1 and x2 do not interact. In contrast, dispersion matrices of the clusters in Fig. 4 are nondiagonal and variables x1 and x2 do interact. Clearly, the cluster structure of Fig. 4 is closer to the original in Fig. 1.

Fig. 4. Multivariable Gaussian membership functions

III. GAUSSIAN PARTICIPATORY EVOLVING CLUSTERING

The evolving clustering algorithm introduced in this paper is based on the concept of participatory learning (PL) [29]. Participatory learning assumes that learning and beliefs about the system to be modeled depend on what the model has already learned. In other words, the current knowledge about the system is part of the learning process itself and influences the way in which new observations are used in learning.

An essential characteristic of this learning process is that the impact of a new observation in causing learning or belief revision depends on its compatibility with the current system belief. Therefore, clustering algorithms based on this learning process tend to be robust to noise and outliers, because such observations are likely to be incompatible with the current system belief and consequently can be discarded or smoothed. The participatory learning clustering algorithm provides an automatic mechanism to decide whether a new observation lying far outside the current cluster structure denotes a new cluster to be included in the model, or an outlier that should be discarded or smoothed.

The outlier smoothing mechanism provided by participatory learning also introduces a sort of stability during learning. Models based on participatory learning have the characteristic of protecting the current system belief from wide swings of the inputs caused by erroneous or anomalous observations, while still allowing learning of new knowledge [38].

The proposed clustering algorithm assumes that the knowledge about the system to be modeled is the cluster structure, i.e., the number of clusters, the corresponding cluster centers v_i^k for i = 1, ..., c^k, where c^k is the number of clusters at step k, and the shape of the clusters encoded in Σ_i^k. At each step, the learning process may create a new cluster, modify the parameters of an existing one, or merge two similar clusters.

The cluster structure is updated using a compatibility measure ρ_i^k ∈ [0, 1] and an arousal index a_i^k ∈ [0, 1]. The compatibility measure computes how compatible an observation is with the current cluster structure, while the arousal index is the output of an arousal mechanism that acts as a critic to indicate when the current structure should be revised in view of new information contained in the data.

Thresholds are defined for the compatibility measure (T_ρ) and the arousal index (T_a). If at a given step the compatibility measure of the current observation is less than the threshold for all clusters, i.e., ρ_i^k < T_ρ ∀ i = 1, ..., c^k, and the arousal index of the cluster with the greatest compatibility is greater than the threshold, i.e., a_i^k > T_a for i = argmax_i ρ_i^k, then a new cluster is created. Otherwise the cluster center with the highest compatibility is adjusted as follows:

v_i^{k+1} = v_i^k + G_i^k (x^k - v_i^k)    (4)

where G_i^k is defined as:

G_i^k = \alpha\, (\rho_i^k)^{1 - a_i^k}    (5)

and α ∈ [0, 1] is the basic learning rate. According to [29], the compatibility ρ_i^k is a function that measures the compatibility between the current belief of the model, represented by each cluster center, and the current observation:

\rho_i^k = F(x^k, v_i^k)    (6)

The function F(x^k, v_i^k) ∈ [0, 1] should approach zero as observations become contradictory with the current belief, i.e., the cluster centers, and approach one as observations come into complete agreement with the current belief. For example, if x^k is equal to a cluster center, then F(x^k, v_i^k) = 1.

If a_i^k = 0, then G_i^k = α ρ_i^k and the PL procedure has no arousal; the learning rate is modulated by the compatibility measure only. An observation with ρ_i^k = 0 provides no new information, because v_i^{k+1} = v_i^k, while an observation with ρ_i^k = 1 brings the most new information. For example, if α = 1 and ρ_i^k = 1, then v_i^{k+1} = x^k.

Notice that the basic learning rate α is modulated by the compatibility ρ. When there are no participatory considerations, α is often set to a small value to preclude large swings due to spurious input values far from the current cluster structure. Small values of α protect against the influence of bad inputs, but may slow down the learning process. The introduction of the participatory term ρ allows the use of higher values of α. The learning rate of the participatory learning model is dynamic: ρ lowers the learning rate when large deviations occur, but speeds up learning when the compatibility is large.

The arousal index is the output of an arousal mechanism used to measure the confidence about the current knowledge of the system. For example, while a single low value of the compatibility measure causes aversion to learning, a sequence of low values of the compatibility measure should imply a revision of the current knowledge about the system.

The arousal mechanism is defined as a monitoring mechanism of the dynamics of the compatibility measure. It monitors the values of the compatibility level, and its output is interpreted as the complement of the confidence about the current knowledge. A low value of a_i^k implies high confidence in the system belief, while a high value indicates the need to revise the current belief. Analysis of expression (5) shows that, as the arousal index increases, the compatibility measure has a reduced effect, indicating that if a sequence of observations presents low compatibility values, then it is likely that the current knowledge is incorrect and must be revised. When this happens, the effect of the compatibility measure is reduced and the current observation provides more information about the system than it would without the arousal mechanism. As explained later in this section, the extreme case is when the arousal index exceeds a threshold and a new cluster is generated.

Fig. 5. Participatory learning procedure

Fig. 5 illustrates the participatory learning procedure, including the basic idea of using the current beliefs in the learning process and the arousal index monitoring mechanism.

The compatibility measure ρ_i^k suggested in this paper uses the squared value of the normalized distance between the new observation and the cluster centers (M-distance):

M(x^k, v_i^k) = (x^k - v_i^k)(\Sigma_i^k)^{-1}(x^k - v_i^k)^T    (7)

To compute the M-distance, the dispersion matrix Σ_i^k of each cluster must be estimated at each step. The recursive estimation of the dispersion matrix proceeds as follows:

\Sigma_i^{k+1} = (1 - G_i^k)\left(\Sigma_i^k - G_i^k (x^k - v_i^k)(x^k - v_i^k)^T\right)    (8)

The compatibility measure at each step k is given by:

\rho_i^k = F(x^k, v_i^k) = \exp\left[-\tfrac{1}{2} M(x^k, v_i^k)\right]    (9)

To find a threshold value for the compatibility measure, we assume that the values M(x^k, v_i^k) can be modeled by a chi-square distribution. Thus, given a significance level λ, the threshold can be computed as follows:

T_\rho = \exp\left[-\tfrac{1}{2}\chi^2_{m,\lambda}\right]    (10)

where χ²_{m,λ} is the λ upper unilateral confidence limit of a chi-square distribution with m degrees of freedom, m being the number of inputs.

The compatibility measure is based on a normalized distance measure (7). As discussed below, the corresponding threshold (10) must be adjusted considering the input space dimension to avoid the curse of dimensionality. This is because, as the input space dimension increases, the distance between two adjacent points also increases. If a fixed threshold value is used that does not depend on the input space dimension, then the number of threshold violations grows, which may lead to an excessive generation of clusters. Looking at expression (10), one can note that the compatibility measure threshold includes information about the data space dimensionality because χ²_{m,λ} is a function of the number m of inputs. Therefore no manual adjustment is needed and the curse of dimensionality is automatically avoided. In other words, the clustering method has an automatic mechanism to adjust the compatibility measure threshold according to the input space dimension. As the data dimension increases, the distance between two adjacent points also increases, and the respective compatibility measure decreases. However, the compatibility measure threshold also decreases, avoiding excessive threshold violations.

To further demonstrate the effectiveness of the automatic adjustment of the threshold T_ρ, Table I summarizes the results of a numerical experiment performed to compute the average number of threshold violations as the input data dimension increases. Data sets were generated randomly using Gaussian distributions for different data dimensions m ∈ [1, 100]. For each data set, a cluster centered at the mean vector with dispersion matrix equal to the covariance was constructed. The compatibility measure was computed for each data sample and the cluster center, and the number of compatibility threshold violations was counted. Since the compatibility measure threshold derives from the unilateral confidence interval built using confidence level λ, the expected number of violations is nλ. For each dimension m, 100 data sets with n = 1000 samples were generated. The t-test [39] was used to check whether differences between the computed and expected numbers of threshold violations are statistically significant. The t-test has as null hypothesis that the differences between the expected and computed threshold violations are random samples from a normal distribution with zero mean and unknown variance, i.e., that there is no statistically significant difference between the expected and computed numbers of threshold violations. The expected number of threshold violations for all data sets is 50 because λ = 0.05 and n = 1000 in all experiments. Table I displays the average number of compatibility threshold violations for each value of the data dimension m, together with the p-value of the statistical test. Low p-values mean significant differences between the expected and computed numbers of violations. Table I shows that, as the dimension increases, there is no statistical difference between the computed and expected numbers of compatibility threshold violations for a t-test significance level of 0.01, i.e., all p-values in the table are greater than 0.01. These results demonstrate experimentally the effectiveness of the compatibility measure threshold suggested here in preventing the curse of dimensionality.

This paper adopts an arousal mechanism that monitors the compatibility index using a sliding window assembled from the last w observations. More specifically, we define the arousal index as the probability of observing less than nv violations of the compatibility threshold in a sequence of w observations. Low values of the arousal index are associated with no or few violations of the compatibility threshold, implying high confidence about the system knowledge. High values of the arousal index are associated with several threshold violations, meaning that the current cluster structure must be revised.


TABLE I
AVERAGE COMPATIBILITY MEASURE THRESHOLD VIOLATIONS AS DATA DIMENSION INCREASES

Data dimension | Average # of observed threshold violations | p-value
  1            | 48.96                                      | 0.13
  5            | 49.00                                      | 0.19
 10            | 49.92                                      | 0.91
 15            | 49.40                                      | 0.38
 25            | 51.09                                      | 0.09
 50            | 49.55                                      | 0.55
100            | 50.00                                      | 1.00
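The experiment behind Table I can be approximated in a few lines; the sketch below assumes standard Gaussian data with identity dispersion, so that M(x, v) is exactly chi-square with m degrees of freedom (sample sizes and seed are our choices):

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
lam, n = 0.05, 1000
for m in (1, 5, 10, 25, 100):
    # Standard Gaussian data with identity dispersion: M(x, v) ~ chi-square(m)
    m_dist = np.sum(rng.standard_normal((n, m)) ** 2, axis=1)
    violations = int(np.sum(m_dist > chi2.ppf(1.0 - lam, df=m)))
    print(f"m = {m:3d}: {violations} violations (expected {lam * n:.0f})")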


To compute the arousal index for each observation, a related occurrence value o_i^k is found using the following expression:

o_i^k = \begin{cases} 0 & \text{if } M(x^k, v_i^k) < \chi^2_{m,\lambda} \\ 1 & \text{otherwise} \end{cases}    (11)

Notice that the occurrence value o_i^k = 1 indicates a threshold violation.

The occurrence value o_i^k can also be viewed as the output of a statistical test that evaluates whether the values of M(x^k, v_i^k) are normal. The null hypothesis of the corresponding test is that M(x^k, v_i^k) can be modeled by a chi-square distribution with m degrees of freedom. Under the null hypothesis, the probability of observing o_i^k = 1 is λ, because λ defines χ²_{m,λ} and is the probability of observing a false positive, i.e., M(x^k, v_i^k) > χ²_{m,λ}. Since o_i^k is binary and the probability of observing o_i^k = 1 is known, the random variable associated with o_i^k can be described by a Bernoulli distribution [39] with probability of success λ.

Given a sequence assembled from the last w observations, the number of threshold violations nv_i^k is:

nv_i^k = \begin{cases} \sum_{j=0}^{w-1} o_i^{k-j} & k > w \\ 0 & \text{otherwise} \end{cases}    (12)

Notice that nv_i^k is not computed during the first w steps. This means that the algorithm has an initial latency of w steps. However, this causes no problem because w is usually much smaller than the number of steps in which learning occurs. For instance, in real-time applications learning can happen continuously.

The discrete probability distribution of observing nv threshold violations in a window of size w is p(NV_i^k = nv), with NV_i^k assuming the values nv = 0, 1, ..., w. Thus, because NV_i^k is the sum of a sequence of i.i.d. random variables drawn from a Bernoulli distribution with the same probability of success λ, p(NV_i^k = nv) can be characterized by the binomial distribution [39]:

p(NV_i^k = nv) = \begin{cases} \binom{w}{nv} \lambda^{nv} (1-\lambda)^{w-nv} & nv = 0, \ldots, w \\ 0 & \text{otherwise} \end{cases}    (13)

The binomial distribution gives the probability of observing nv threshold violations in a sequence of w observations. High probability values reinforce the assumption that the observations fit the current cluster structure, while low probability values suggest that the observations should be described by a new cluster.

For example, Fig. 6 shows p(NV_i^k = nv), the probability of observing 0 to 50 threshold violations in a sequence of 50 observations, given T_ρ = exp[−χ²_{2,0.1}/2], λ = 0.1, and w = 50. One can note that there is a significant probability of observing 10 or fewer violations, and a very low probability of observing 20 or more violations.


Fig. 6. Probability of observing nv threshold violations in a sequence of 50 observations, λ = 0.1

The arousal index is defined as the value of the cumulative probability of NV_i^k, i.e., a_i^k = p(NV_i^k < nv). Fig. 7 illustrates the arousal index as a function of the number nv of violations observed when λ = 0.1 and w = 50. As Fig. 7 indicates, a_i^k ≈ 1 for about 12 violations and above.


Fig. 7. Arousal index as the probability of observing less than nv threshold violations in a sequence of w = 50 observations, given λ = 0.1
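Since (13) is a standard binomial model, the arousal index is simply a binomial cumulative distribution; a sketch, where the strict inequality NV < nv maps to the CDF evaluated at nv − 1:

from scipy.stats import binom

def arousal_index(nv, w, lam):
    """Arousal index: probability of observing fewer than nv threshold
    violations in a window of w observations, using the binomial model (13)."""
    return binom.cdf(nv - 1, w, lam)  # P(NV < nv) = P(NV <= nv - 1)

# Consistent with Fig. 7: the index saturates near 1 from about 12 violations
for nv in (5, 10, 12, 15):
    print(nv, arousal_index(nv, w=50, lam=0.1))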

The threshold value T_a of the arousal index is 1 − λ, where λ is the same value that defines the threshold for the compatibility measure. The minimum number of compatibility threshold violations in a window of size w necessary to exceed T_a can be computed numerically by looking for the first value of nv for which the discrete cumulative distribution is equal to or greater than 1 − λ. More formally:


nv^* = \min \left\{ nv : \sum_{k=1}^{nv} \binom{w}{k} \lambda^k (1-\lambda)^{w-k} \geq 1 - \lambda \right\}    (14)
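Reading (14) as the first nv at which the binomial cumulative distribution reaches 1 − λ, nv* can be found by a direct scan; a sketch under that reading:

from scipy.stats import binom

def min_violations(w, lam):
    """Smallest nv whose binomial cumulative probability reaches 1 - lambda,
    i.e., the minimum number of violations needed to exceed T_a, per (14)."""
    for nv in range(1, w + 1):
        if binom.cdf(nv, w, lam) >= 1.0 - lam:
            return nv
    return w

print(min_violations(w=50, lam=0.1))  # -> 8 for these values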

As discussed later, the proposed clustering algorithm continually revises the current cluster structure and eventually merges similar clusters. The compatibility between the updated or created cluster and all remaining cluster centers is computed at each step. If, for a given pair, the compatibility exceeds the threshold T_ρ, then the two clusters are merged, i.e., if ρ_i^k(v_j^k, v_i^k) > T_ρ or ρ_j^k(v_i^k, v_j^k) > T_ρ, then clusters j and i are merged.

The compatibility between two clusters i and j is computed as follows:

\rho_i^k(v_j^k, v_i^k) = \exp\left[-\tfrac{1}{2} M(v_j^k, v_i^k)\right]    (15)

where M(v_j^k, v_i^k) is the M-distance between cluster centers i and j, that is:

M(v_j^k, v_i^k) = (v_j^k - v_i^k)(\Sigma_i^k)^{-1}(v_j^k - v_i^k)^T    (16)

To check whether two clusters are similar, it is necessary to compute both ρ_i^k(v_j^k, v_i^k) and ρ_j^k(v_i^k, v_j^k), because usually Σ_i^k ≠ Σ_j^k.

Notice that the clustering algorithm has only three parameters:

• the basic learning rate α;
• the window size w used by the arousal mechanism;
• the confidence level λ used to compute the thresholds T_ρ and T_a.

The basic learning rate is usually set to a small value, typically α ∈ [10⁻⁵, 10⁻¹].

it defines how many consecutive observations must be consid-ered to compute the arousal index. In other words, consideringthe current system knowledge,w defines the length of theanomaly pattern needed to classify data either as a new clusteror as a noise or outlier.

The value of the significance level λ depends on w. It must be set such that the arousal threshold T_a corresponds to more than one compatibility threshold violation, i.e., nv > 1 when a_i^k > T_a. Suggested ranges for values of λ, given w, are:

\lambda \geq \begin{cases} 0.01 & \text{if } w \geq 100 \\ 0.05 & \text{if } 20 \leq w < 100 \\ 0.1 & \text{if } 10 \leq w < 20 \end{cases}    (17)

The clustering process can be started using either a single observation or an initial data set. If an initial data set is available, then an off-line clustering algorithm can be used to estimate the initial number of clusters and their respective parameters. The off-line algorithm should be capable of providing both the cluster centers and the respective dispersion matrices. If the clustering process starts with a single observation, then an initial dispersion matrix Σinit must be chosen, possibly using a priori information about the problem.

Whenever a new cluster is created during the clustering process, the new cluster center is set to the current observation, and the new dispersion matrix is the initial value Σinit.

If two clusters are merged, then the center of the resulting cluster is the average of the corresponding cluster centers, and the dispersion matrix is Σinit.

Fig. 8 summarizes the recursive multivariable participatory fuzzy clustering algorithm, assuming that it starts with a single observation.

IV. GAUSSIAN EVOLVING FUZZY SYSTEM MODELING

This section details the procedure to estimate the remaining parameters of the eMG model. For simplicity, we assume that an eMG model is a set of first-order Takagi-Sugeno (TS) fuzzy rules whose consequent parameters are estimated using the recursive weighted least squares algorithm.

The number of eMG rules is the same as the number of clusters found by the clustering algorithm at each step. At each iteration a new cluster can be created, an existing cluster removed, or existing clusters updated. In other words, rules can be created, merged, or adapted at each step of the algorithm. Rule antecedents are of the form:

x^k is H_i    (18)

where x^k is a 1 × m input vector and H_i is a fuzzy set with multivariable Gaussian membership function (1), with parameters extracted from the corresponding cluster center and dispersion.

The model is formed by a set of functional fuzzy rules:

R_i: \text{IF } x^k \text{ is } H_i \text{ THEN } y_i^k = \gamma_{i0}^k + \sum_{j=1}^{m} \gamma_{ij}^k x_j^k    (19)

where R_i is the i-th fuzzy rule, for i = 1, ..., c^k, c^k is the number of rules, and γ_{i0}^k and γ_{ij}^k are the parameters of the consequent at step k.

The model output is the weighted average of the outputs of each rule, that is:

y^k = \sum_{i=1}^{c^k} \Psi_i(x^k)\, y_i^k    (20)

with normalized membership functions:

\Psi_i(x^k) = \frac{\exp\left[-\tfrac{1}{2}(x^k - v_i^k)(\Sigma_i^k)^{-1}(x^k - v_i^k)^T\right]}{\sum_{j=1}^{c^k} \exp\left[-\tfrac{1}{2}(x^k - v_j^k)(\Sigma_j^k)^{-1}(x^k - v_j^k)^T\right]}    (21)

where v_i^k and Σ_i^k are the center and dispersion matrix of the i-th cluster membership function at step k.

The parameters of the consequent are updated using the weighted recursive least squares algorithm [14], [40], similarly to other TS evolving fuzzy models [15], [19]. Hence, the consequent parameters and the matrix Q_i of the update formulas for rule i at each iteration k are:

\gamma_i^{k+1} = \gamma_i^k + Q_i^{k+1} x^k \Psi_i(x^k)\left[y_d^k - (x^k)^T \gamma_i^k\right]

Q_i^{k+1} = Q_i^k - \frac{\Psi_i(x^k)\, Q_i^k x^k (x^k)^T Q_i^k}{1 + (x^k)^T Q_i^k x^k}    (22)

where y_d^k is the desired output at step k.


Procedure MGEC
Input:  k, x^k, v, Σ, λ, w, Σinit, c
Output: v, Σ, cluster_created, cluster_merged, c, idx^k

if k == 1 then
    Initialize first cluster;
    v_1 = x^1; Σ_1 = Σinit; c = 1;
end
Compute ρ_i and a_i for all clusters;
for i = 1, ..., c do
    M(x^k, v_i) = (x^k − v_i)(Σ_i)^{−1}(x^k − v_i)^T;
    ρ_i = exp[−(1/2) M(x^k, v_i)];
    if ρ_i < T_ρ then o_i^k = 1; else o_i^k = 0; end
    if k > w then
        nv_i = Σ_{l=0}^{w−1} o_i^{k−l};
        a_i = p(NV_i^k < nv_i);
    else
        a_i = 0;
    end
end
idx = argmax_i ρ_i;
if ρ_i < T_ρ ∀i and a_idx > T_a then
    Create new cluster;
    c = c + 1; v_c = x^k; Σ_c = Σinit; idx = c;
    cluster_created = c;
else
    Update the existing cluster with highest compatibility;
    G_idx = α(ρ_idx)^{1−a_idx};
    v_idx = v_idx + G_idx (x^k − v_idx);
    Σ_idx = (1 − G_idx)(Σ_idx − G_idx (x^k − v_idx)(x^k − v_idx)^T);
end
Check for redundant clusters;
for j = 1, ..., c, j ≠ idx do
    if ρ_idx(v_idx, v_j) > T_ρ or ρ_j(v_j, v_idx) > T_ρ then
        Merge the two redundant clusters;
        v_idx = mean(v_j, v_idx); Σ_idx = Σinit;
        c = c − 1; cluster_merged = [idx j];
    end
end

Fig. 8. Multivariable Gaussian evolving clustering

As discussed previously, the eMG algorithm can be initialized either with an existing data set or with a single observation.

If the eMG starts with an existing data set, then an offline clustering algorithm can be used to estimate the number and parameters of the initial set of rules. Clustering can be done in the input space and a rule created for each cluster. The antecedent parameters of each rule are extracted from the clusters, and the consequent parameters estimated by the weighted least squares algorithm.

If the eMG starts with a single observation, then one rule is created with the antecedent membership function centered at the observation and the respective dispersion matrix set to the pre-defined initial value. The consequent parameters are initialized as γ^0 = [y^0 0 ··· 0] and Q^0 = ωI_{m+1}, where I_{m+1} is an (m+1) × (m+1) identity matrix and ω is a large real value, for example, ω ∈ [10², 10⁴] [40].

As new data is input, the eMG algorithm may create, update, or merge clusters. Thus the set of rules, the rule-base, must be updated as well. This is done as follows.

If a new cluster is created, then a corresponding rule is also created, with antecedent parameters extracted from the cluster and consequent parameters computed as the weighted average of the parameters of the existing rules:

\gamma_{new}^k = \frac{\sum_{i=1}^{c^k} \gamma_i^k \rho_i^k}{\sum_{i=1}^{c^k} \rho_i^k}    (23)

The matrix Q is set to Q_{new}^k = ωI_{m+1}.

If an existing cluster is updated, then the antecedent parameters of the corresponding rule are updated accordingly.

Finally, if two clusters i and j are merged, then the consequent parameters of the resulting rule are computed as follows:

\gamma_{new}^k = \frac{\gamma_i^k \rho_i^k + \gamma_j^k \rho_j^k}{\rho_i^k + \rho_j^k}    (24)

The matrix Q is set to Q_{new}^k = ωI_{m+1}.
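The consequent bookkeeping of (23) and (24) amounts to compatibility-weighted averages; a sketch with names of our choosing:

import numpy as np

def new_rule_consequent(gammas, rhos):
    """Consequent of a newly created rule: compatibility-weighted average
    of the existing rules' parameters, per (23)."""
    gammas, rhos = np.asarray(gammas), np.asarray(rhos)
    return (rhos[:, None] * gammas).sum(axis=0) / rhos.sum()

def merged_rule_consequent(gamma_i, gamma_j, rho_i, rho_j):
    """Consequent of the rule resulting from merging rules i and j, per (24)."""
    return (gamma_i * rho_i + gamma_j * rho_j) / (rho_i + rho_j)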

Fig. 9 summarizes the multivariable Gaussian evolving fuzzy system modeling algorithm.

V. EXPERIMENTS

The eMG model was tested using short term load forecasting, long term time series forecasting, classic nonlinear system identification, and high dimensional system identification problems. The results obtained were compared with alternative evolving and fixed structure modeling approaches.

All data sets selected for the experiments, except the short term load forecasting problem, which uses actual data, are the same as the ones used to evaluate the evolving models reported in the literature [15]–[17], [19], [22], [23], [41].

In all experiments, modeling performance was evaluated using the root mean squared error (RMSE) and/or the non-dimensional error index (NDEI). The NDEI is the ratio of the root mean squared error to the standard deviation of the target data. The error measures are computed as follows:


Procedure eMG
Input:  x, yd, λ, w, Σinit
Output: ys

γ_1 = [yd^1 0 ··· 0]; Q_1 = ωI_{m+1};
for k = 1, ..., length(x) do
    Compute the output;
    for i = 1, ..., c do
        ρ_i = exp[−(1/2)(x^k − v_i)(Σ_i)^{−1}(x^k − v_i)^T];
        y_i = x^k γ_i;
    end
    ys = (Σ_{i=1}^c ρ_i y_i) / (Σ_{i=1}^c ρ_i);
    [v, Σ, cluster_created, cluster_merged, c, idx^k] =
        MGEC(k, x^k, v, Σ, λ, w, Σinit, c);
    if cluster_created then
        Create a new rule;
        γ_c = (Σ_{i=1}^{c−1} γ_i ρ_i) / (Σ_{i=1}^{c−1} ρ_i);
        Q_c = ωI_{m+1};
    end
    for i = 1, ..., c do
        Update consequent parameters;
        γ_i = γ_i + Q_i x^k Ψ_i(x^k)[yd^k − ((x^k)^T γ_i)];
        Q_i = Q_i − Ψ_i(x^k) Q_i x^k (x^k)^T Q_i / (1 + (x^k)^T Q_i x^k);
    end
    if cluster_merged then
        Merge two rules;
        [i j] = cluster_merged;
        γ_i = (γ_i ρ_i + γ_j ρ_j) / (ρ_i + ρ_j);
        Q_i = ωI_{m+1};
    end
end

Fig. 9. Evolving multivariable Gaussian fuzzy system modeling

RMSE = \left(\frac{1}{N}\sum_{k=1}^{N}\left(y^k - \hat{y}^k\right)^2\right)^{1/2}    (25)

NDEI = \frac{RMSE}{\text{std}(y^k)}    (26)

where N is the size of the test data set, y^k is the target output, ŷ^k is the model output, and std(·) is the standard deviation function.
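Both measures are one-liners; a sketch:

import numpy as np

def rmse(y, y_hat):
    """Root mean squared error (25)."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))

def ndei(y, y_hat):
    """Non-dimensional error index (26): RMSE over the target's standard deviation."""
    return rmse(y, y_hat) / float(np.std(y))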

A. Short Term Load Forecasting

Forecasts of load demand are very important in the operation of electric energy systems because several decision making processes, such as system operation planning, security analysis, and market decisions, are strongly influenced by the future values of the load. In this context, a significant error in the load forecast may result in economic losses, security constraint violations, and system operation drawbacks. Accurate and reliable load forecasting models are essential for suitable system operation. The load forecasting problem can be classified as long, medium, or short-term, depending on the situation. Long-term forecasting is important for capacity expansion of the electric system. Medium-term forecasting is important to organize the fuel supply, maintenance operations, and interchange scheduling. Short-term forecasting is generally used for daily planning and operation of the electrical system, energy transfer, and demand management [42].


Fig. 10. Normalized load data for the first 3 days of August, 2000


In particular, in the context of short-term hydrothermal scheduling, load forecasting is important for elaborating the next-day operation schedule, because errors in load forecasting can have serious consequences, affecting the efficiency and safety of the system (cost increases, insufficient electrical energy supply for the existing demand).

The goal of short-term load forecasting is to accurately predict, one step ahead, the 24 hourly loads of the next operation day. The effectiveness of the eMG approach is illustrated using load data of a major electrical utility located in the southeast region of Brazil.

The load data set used in this experiment is expressed in kilowatts per hour (kW/hour) and corresponds to the 31 days of August, 2000. Fig. 10 depicts the normalized load values of the first 3 days of August, 2000. The eMG approach is scale-invariant, but for this experiment the data was normalized between 0 and 1 to preserve privacy.

The eMG model is a one-step-ahead forecaster whose purpose is to predict the current load value using lagged load values of the series. The sample partial autocorrelation function [43] for the first 36 observations of the series suggests the use of the last 2 load values as inputs of the models, that is, the forecast model is of the form:

y^k = f(y^{k-1}, y^{k-2})    (27)

The experiment has been conducted as follows. The hourly load for the first 30 days was input to the eMG algorithm (718 observations) and the evolved model performance was evaluated using data of the last day (24 observations), keeping the model structure and parameters fixed at the values found after evolving during the 30-day period. The eMG started clustering with the first observation, and the parameters were chosen as λ = 0.05, w = 20, Σinit = 10⁻²I₂ and α = 0.01.

Fig. 11 shows the forecasting result and Fig. 12 sketches the final 5 clusters found after learning. Looking at Fig. 12, one can note that the resulting clusters have distinct orientations and most of them are not parallel to the input axes.



Fig. 11. Load forecasting


Fig. 12. Input space cluster structure for load forecasting

Fig. 13 shows the evolution of the number of fuzzy rules during the learning stage. Rules were merged at t ≈ 320 and t ≈ 700.

Fig. 13. Number of rules for load forecasting during learning

Table II shows how eMG performs against evolving and fixed structure modeling methods using the RMSE and NDEI error measures. The MLP has one hidden layer with five neurons trained with the backpropagation algorithm; the ANFIS has three fuzzy sets for each input variable and seven fuzzy rules. The MLP adopted the following initialization scheme: small weight values randomly assigned; α = 0.9 as momentum parameter; 1000 as the maximum number of epochs; and an adaptive learning rate starting from η = 0.01 as initial step size. The ANFIS has 1000 as the maximum number of epochs, η_i = 0.01 as initial step size, s_d = 0.9 as step size decrease rate, and s_i = 1.1 as step size increase rate. The parameters of the eTS model were set to r = 0.4 and Ω = 750. The xTS [41] has Ω = 750. The parameters of the ePL model were τ = 0.3, r = 0.25, λ = 0.35.

Table II suggests that eMG performs best among all the models. If the window size is set to w = 25, then eMG also outperforms xTS using the same number of rules.

TABLE II
PERFORMANCE OF THE ELECTRICITY LOAD FORECASTING METHODS

Model name and source | Number of rules (or nodes) | RMSE   | NDEI
xTS [41]              | 4                          | 0.0574 | 0.2135
eMG (w = 25)          | 4                          | 0.0524 | 0.1948
MLP [44]              | 5                          | 0.0552 | 0.2053
ANFIS [2]             | 9                          | 0.0541 | 0.2012
eTS [15]              | 6                          | 0.0527 | 0.1960
ePL [28]              | 5                          | 0.0508 | 0.1889
eMG (w = 20)          | 5                          | 0.0459 | 0.1707

B. Mackey-Glass Time Series Forecasting

In this experiment the eMG model is used for long term forecasting of the Mackey-Glass time series [45]. The time series is created using the following time-delay differential equation:

\frac{dx(t)}{dt} = \frac{0.2\, x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\, x(t)    (28)

where x(0) = 1.2 and τ = 17.

The aim is to forecast the value x^{k+85} from the input vector [x^{k-18} x^{k-12} x^{k-6} x^k] for any value of k.

The experiment has been done as follows. Initially, 3000 data samples were collected for k ∈ [201, 3200] and used as inputs of the evolving learning procedure of all models used for comparison. Next, 500 data samples for k ∈ [5001, 5500] were collected to verify the performance after evolving. The performance was evaluated using the NDEI and compared with evolving fuzzy approaches. Fig. 14 shows the forecast result and Fig. 15 depicts the evolution of the number of fuzzy rules during the learning stage. During learning, several rule merges occurred. The eMG started clustering with the first observation and the parameters were set as λ = 0.05, w = 30, Σinit = 10⁻²I₄ and α = 0.01.
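For reproduction purposes, the series can be generated by numerically integrating (28); the sketch below uses simple Euler integration with step dt and zero history before t = 0, both choices of ours since the paper does not state its integration scheme:

import numpy as np

def mackey_glass(n, tau=17, x0=1.2, dt=0.1):
    """Generate n samples of the Mackey-Glass series (28) at integer times,
    by Euler integration with step dt (dt is our choice, not the paper's)."""
    steps, delay = int(n / dt), int(tau / dt)
    x = np.empty(steps + 1)
    x[0] = x0
    for t in range(steps):
        x_tau = x[t - delay] if t >= delay else 0.0  # zero history before t = 0
        x[t + 1] = x[t] + dt * (0.2 * x_tau / (1.0 + x_tau ** 10) - 0.1 * x[t])
    return x[:: int(1 / dt)]

series = mackey_glass(5600)  # covers both the learning and evaluation ranges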

Table III summarizes the performance of eMG against evolving fuzzy models. The parameters of the eTS model were set to r = 0.4 and Ω = 750. The xTS [41] has Ω = 750. The results for DENFIS [17] and FLEXFIS [19] were taken from the respective references.

From Table III one can note that eMG, with Σinit = 10⁻²I₄, achieves better performance than eTS and xTS with approximately the same number of fuzzy rules. With the initial dispersion matrix set to Σinit = 10⁻³I₄ and with w = 20, eMG outperforms the remaining three models with a similar or simpler model.



Fig. 14. Mackey-Glass time series forecasting


Fig. 15. Number of rules for the Mackey-Glass time series forecast during learning

TABLE III
PERFORMANCE OF MACKEY-GLASS TIME SERIES FORECASTING METHODS

Model name and source        | Number of rules | NDEI
eTS [15]                     | 9               | 0.372
xTS [41]                     | 10              | 0.331
eMG (Σinit = 10⁻²I₄, w = 30) | 9               | 0.321
DENFIS [17]                  | 58              | 0.276
FLEXFIS Var A [19]           | 69              | 0.206
FLEXFIS Var B [19]           | 89              | 0.157
eMG (Σinit = 10⁻³I₄, w = 20) | 58              | 0.139

C. Nonlinear System Identification

In this section, the eMG model is compared with alternative evolving fuzzy modeling approaches using a classic nonlinear system identification problem. The nonlinear system to be identified is the following:

Fig. 16. Nonlinear system identification

Fig. 17. Number of rules for the nonlinear system identification during learning

y_k = \frac{y_{k-1}\, y_{k-2}\, (y_{k-1} - 0.5)}{1 + (y_{k-1})^2 + (y_{k-2})^2} + u_{k-1}    (29)

where u_k = sin(2πk/25) and y_0 = y_1 = 0. The aim is to predict the current output based on past inputs and outputs. The model for this data set is of the form:

y_k = f(y_{k-1}, y_{k-2}, u_{k-1})    (30)

where y_k is the output.

The experiment was performed as follows. First, 5000 samples were created for evolving learning and an additional 200 samples were created to evaluate the performance after evolving, keeping the model structure and parameters fixed. Performance was evaluated using the RMSE and compared with other evolving fuzzy approaches. Fig. 16 shows the result and Fig. 17 the evolution of the number of fuzzy rules. The model structure was defined after k = 175 and no rule merging was performed. The eMG starts clustering with the first observation. The parameters are λ = 0.05, w = 40, Σ_init = 10^{-1} I_3 and α = 0.01.
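For concreteness, this data set can be generated as in the following sketch, which simulates (29) and builds the regressors of (30) (the helper names are illustrative, not from the paper):

import numpy as np

def simulate_system(n):
    # Simulate (29) with u_k = sin(2*pi*k/25) and y_0 = y_1 = 0.
    y = np.zeros(n)
    for k in range(2, n):
        u = np.sin(2 * np.pi * (k - 1) / 25)      # u_{k-1}
        y[k] = (y[k - 1] * y[k - 2] * (y[k - 1] - 0.5)
                / (1 + y[k - 1] ** 2 + y[k - 2] ** 2) + u)
    return y

def regressors(y):
    # Build inputs [y_{k-1}, y_{k-2}, u_{k-1}] and targets y_k, as in (30).
    X, t = [], []
    for k in range(2, len(y)):
        u = np.sin(2 * np.pi * (k - 1) / 25)
        X.append([y[k - 1], y[k - 2], u])
        t.append(y[k])
    return np.array(X), np.array(t)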


TABLE IV
PERFORMANCE OF NONLINEAR SYSTEM IDENTIFICATION METHODS

Model name and source    Number of rules    RMSE
SAFIS [23]               17                 0.0221
SOFMLS [22]              5                  0.0201
FLEXFIS Var A [19]       5                  0.0176
FLEXFIS Var B [19]       8                  0.0171
xTS [41]                 5                  0.0063
eMG                      5                  0.0058

Table IV summarizes the performance of eMG against representative evolving models. The xTS [41] parameter was set to Ω = 750. The results for SAFIS [23], FLEXFIS [19] and SOFMLS [22] were taken from the respective references. Table IV suggests that eMG outperforms the first four models and performs similarly to xTS [41].

D. High Dimensional System Identification Problem

The performance of the proposed model was also evaluated using a high dimensional system identification problem. This experiment aims to evaluate how the proposed model behaves when dealing with higher dimensional input spaces.

The nonlinear system to be identified is:

y_k = \frac{\sum_{i=1}^{m} y_{k-i}}{1 + \sum_{i=1}^{m} (y_{k-i})^2} + u_{k-1}    (31)

where u_k = sin(2πk/20), y_j = 0 for j = 1, ..., m, and m = 10.

The purpose is to predict the current output using past inputs and outputs. The model for this data set is of the form:

y_k = f(y_{k-1}, y_{k-2}, ..., y_{k-10}, u_{k-1})    (32)

where y_k is the model output.

The experiment was done as follows. The first 3000 samples were created for learning and the next 300 samples were used to evaluate performance, keeping the model structure and parameters fixed after evolving. Performance was evaluated using the RMSE and compared with representative evolving fuzzy approaches. Fig. 18 shows the result and Fig. 19 shows how the fuzzy rules evolve during learning. The model structure was learned during the first 1000 steps. Three rule merge instances occurred during learning. As before, the eMG started clustering with the first observation. The parameters adopted were λ = 0.05, w = 25, Σ_init = 10^{-1} I_{11} and α = 0.01.
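The corresponding data set can be generated analogously to the previous experiment; a sketch following (31) and (32) is shown below (the helper names are illustrative):

import numpy as np

def simulate_high_dim(n, m=10):
    # Simulate (31) with u_k = sin(2*pi*k/20) and zero initial conditions.
    y = np.zeros(n)
    for k in range(m, n):
        past = y[k - m:k]                         # y_{k-m}, ..., y_{k-1}
        u = np.sin(2 * np.pi * (k - 1) / 20)      # u_{k-1}
        y[k] = past.sum() / (1 + (past ** 2).sum()) + u
    return y

def regressors_high_dim(y, m=10):
    # Inputs [y_{k-1}, ..., y_{k-m}, u_{k-1}] and targets y_k, as in (32).
    X, t = [], []
    for k in range(m, len(y)):
        u = np.sin(2 * np.pi * (k - 1) / 20)
        X.append([y[k - i] for i in range(1, m + 1)] + [u])
        t.append(y[k])
    return np.array(X), np.array(t)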

Table V summarizes the performance of eMG, eTS [15] (r = 0.6, Ω = 750), xTS [41] (Ω = 750), and FLEXFIS [19]. Table V suggests that eMG performs best among all the models with fewer rules or a similar number of rules. With the initial dispersion matrix set to Σ_init = 2 × 10^{-1} I_{11}, eMG outperforms xTS with the same number of fuzzy rules.

TABLE V
PERFORMANCE OF HIGH DIMENSIONAL NONLINEAR SYSTEM IDENTIFICATION METHODS

Model name and source                Number of rules    RMSE
xTS [41]                             9                  0.0331
eMG (Σ_init = 2 × 10^{-1} I_{11})    9                  0.0288
FLEXFIS Var A [19]                   15                 0.0085
eTS [15]                             14                 0.0075
eMG (Σ_init = 10^{-1} I_{11})        13                 0.0050

Fig. 18. High dimensional nonlinear system identification

Fig. 19. Number of rules for the high dimensional nonlinear system identification during learning

VI. CONCLUSION

This paper has introduced a new method to develop evolving fuzzy models, emphasizing functional Takagi-Sugeno models within the framework of multivariable Gaussian membership functions. The method uses a recursive clustering algorithm and is capable of continually modifying the cluster structure whenever new data samples become available. It can be used in both off-line and on-line (real-time) environments. The clustering algorithm uses the concept of participatory learning and is robust to noisy data and outliers because it smooths their effect during clustering. The clustering algorithm recursively estimates the cluster centers and the respective cluster dispersion matrices.




Rule antecedents use multivariable Gaussian membership functions whose parameters are extracted directly from the clusters. Multivariable Gaussian membership functions were adopted because they preserve information about input variable interactions, estimated by the clustering algorithm through the dispersion matrix.



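As a concrete illustration of this point, a multivariable Gaussian membership value can be evaluated from a cluster center and its dispersion matrix as sketched below; this is a minimal sketch that assumes the usual unnormalized Gaussian form, and the function name is illustrative.

import numpy as np

def mv_gaussian_membership(x, center, dispersion):
    # Membership degree of input vector x in a cluster with the given
    # center and dispersion matrix; the off-diagonal entries of the
    # dispersion matrix encode interactions between input variables.
    d = np.asarray(x) - np.asarray(center)
    return float(np.exp(-0.5 * d @ np.linalg.solve(dispersion, d)))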

The method was evaluated using time series forecasting and nonlinear system identification problems. The experiments performed and their results suggest that the method introduced here is a promising alternative to build adaptive functional fuzzy models. Future work shall address the use of neural-like structures and evolving procedures using fuzzy neuron models based on generalizations of triangular norms and conorms.

ACKNOWLEDGEMENT

The authors thank the Brazilian National Research Council, CNPq, for grants 141323/2009-4, 309666/2007-4 and 304857/2006-8, respectively. The second author also acknowledges the support of FAPEMIG, the Research Foundation of the State of Minas Gerais, for grant PPM-00252-09. Comments of the anonymous referees are also kindly acknowledged.

REFERENCES

[1] E. H. Mamdani and S. Assilian, "An experiment in linguistic synthesis with a fuzzy logic controller," Int. J. Man-Machine Studies, vol. 7, no. 1, pp. 1–13, 1975.

[2] J.-S. Jang, "ANFIS: adaptive-network-based fuzzy inference system," IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, no. 3, pp. 665–685, 1993. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=256541

[3] S. L. Chiu, "Fuzzy model identification based on cluster estimation," J. Intell. Fuzzy Syst., vol. 2, pp. 267–278, 1994.

[4] R. Yager and D. Filev, "Approximate clustering via the mountain method," IEEE Transactions on Systems, Man, and Cybernetics, vol. 24, no. 8, pp. 1279–1284, 1994. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=299710

[5] N. K. Kasabov, Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering. Cambridge, MA, USA: MIT Press, 1996.

[6] J.-S. R. Jang, C.-T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence, 1st ed. Prentice-Hall, 1997.

[7] P. Angelov and R. Guthke, "A genetic-algorithm-based approach to optimization of bioprocesses described by fuzzy rules," Bioprocess Engineering, vol. 16, no. 5, p. 299, 1997. [Online]. Available: http://www.springerlink.com/index/10.1007/s004490050326

[8] R. Babuska, Fuzzy Modeling for Control. Kluwer, 1998.

[9] L.-X. Wang and J. Mendel, "Generating fuzzy rules by learning from examples," IEEE Transactions on Systems, Man, and Cybernetics, vol. 22, no. 6, pp. 1414–1427, 1992. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=199466

[10] P. Angelov, D. Filev, and N. Kasabov, "Guest editorial: evolving fuzzy systems - preface to the special section," IEEE Transactions on Fuzzy Systems, vol. 16, no. 6, pp. 1390–1392, Dec 2008. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4712536

[11] N. Kasabov and D. Filev, "Evolving intelligent systems: Methods, learning, applications." IEEE, Sep 2006, pp. 8–18. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4016749

[12] J. V. d. Oliveira and W. Pedrycz, Advances in Fuzzy Clustering and its Applications. New York, NY, USA: John Wiley & Sons, Inc., 2007.

[13] P. C. Young, Recursive Estimation and Time-Series Analysis: An Introduction. Springer-Verlag, 1984.

[14] L. Ljung, System Identification. Prentice-Hall, 1999.

[15] P. Angelov and D. Filev, "An approach to online identification of Takagi-Sugeno fuzzy models," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 34, no. 1, pp. 484–498, Feb 2004. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1262519

[16] ——, "Simpl_eTS: a simplified method for learning evolving Takagi-Sugeno fuzzy models." IEEE, 2005, pp. 1068–1073. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1452543

[17] N. K. Kasabov and Q. Song, "DENFIS: Dynamic evolving neural-fuzzy inference system and its application for time-series prediction," IEEE Trans. Fuzzy Syst., vol. 10, no. 2, pp. 144–154, Apr 2002.

[18] C. F. Juang and C. T. Lin, "An online self-constructing neural fuzzy inference network and its applications," IEEE Trans. Fuzzy Syst., vol. 6, no. 1, pp. 12–32, Feb 1999.

[19] E. D. Lughofer, "FLEXFIS: A robust incremental learning approach for evolving Takagi-Sugeno fuzzy models," IEEE Transactions on Fuzzy Systems, vol. 16, no. 6, pp. 1393–1410, Dec 2008. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4529084

[20] R. M. Gray, "Vector quantization," IEEE ASSP Mag., pp. 4–29, Apr 1984.

[21] E. D. Lughofer, "Extensions of vector quantization for incremental clustering," Pattern Recognition, vol. 41, no. 3, pp. 995–1011, Mar 2008. [Online]. Available: http://linkinghub.elsevier.com/retrieve/pii/S0031320307003354

[22] J. d. J. Rubio, "SOFMLS: Online self-organizing fuzzy modified least-squares network," IEEE Transactions on Fuzzy Systems, vol. 17, no. 6, pp. 1296–1309, Dec 2009. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5196829

[23] H. Rong, N. Sundararajan, G. Huang, and P. Saratchandran, "Sequential adaptive fuzzy inference system (SAFIS) for nonlinear system identification and prediction," Fuzzy Sets and Systems, vol. 157, no. 9, pp. 1260–1275, May 2006. [Online]. Available: http://linkinghub.elsevier.com/retrieve/pii/S0165011405006020

[24] G. Leng, T. M. McGinnity, and G. Prasad, "An approach for on-line extraction of fuzzy rules using a self-organising fuzzy neural network," Fuzzy Sets Syst., vol. 150, no. 2, pp. 211–243, 2005.

[25] P. P. Angelov and X. Zhou, "Evolving fuzzy-rule-based classifiers from data streams," IEEE Transactions on Fuzzy Systems, vol. 16, no. 6, pp. 1462–1475, Dec 2008. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4529082

[26] D. Filev and P. Angelov, "Algorithms for real-time clustering and generation of rules from data," in Advances in Fuzzy Clustering and its Applications, J. V. d. Oliveira and W. Pedrycz, Eds. New York, NY, USA: John Wiley & Sons, Inc., 2007.

[27] D. C. Montgomery, Introduction to Statistical Quality Control. Wiley, 2001.

[28] E. Lima, M. Hell, R. Ballini, and F. Gomide, "Evolving fuzzy modeling using participatory learning," in Evolving Intelligent Systems: Methodology and Applications, P. Angelov, D. Filev, and N. Kasabov, Eds. Wiley-Interscience/IEEE Press, 2010.

[29] R. Yager, "A model of participatory learning," IEEE Transactions on Systems, Man, and Cybernetics, vol. 20, no. 5, pp. 1229–1234, 1990. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=59986

[30] L. Silva, F. Gomide, and R. Yager, "Participatory learning in fuzzy clustering." IEEE, 2005, pp. 857–861. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1452506

[31] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2001.

[32] W. Pedrycz and F. Gomide, Fuzzy Systems Engineering: Toward Human-Centric Computing. NJ, USA: Wiley Interscience, 2007.

[33] R. Lippmann, "A critical overview of neural network pattern classifiers," in Proceedings of the 1991 IEEE Workshop Neural Networks for Signal Processing, 1991, pp. 266–275. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=239515

[34] L.-X. Wang and J. Mendel, "Fuzzy basis functions, universal approximation, and orthogonal least-squares learning," IEEE Transactions on Neural Networks, vol. 3, no. 5, pp. 807–814, 1992. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=159070

[35] V. Kreinovich, M. G., and H. Nguyen, "Fuzzy rule-based modeling as universal approximation tool," in Fuzzy Systems: Modeling and Control, H. Nguyen and M. Sugeno, Eds. Boston, MA: Kluwer Academic, 1998, pp. 135–195.

[36] E. Kim, M. Park, S. Kim, and M. Park, "A transformed input-domain approach to fuzzy modeling," IEEE Trans. Fuzzy Syst., vol. 6, no. 4, pp. 596–604, Nov 1998.

[37] J. Abonyi, R. Babuska, and F. Szeifert, "Modified Gath-Geva fuzzy clustering for identification of Takagi-Sugeno fuzzy models," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 32, no. 5, pp. 612–621, Oct 2002. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1033180

[38] R. R. Yager, "Extending the participatory learning paradigm to include source credibility," Fuzzy Optimization and Decision Making, vol. 6, no. 2, pp. 85–97, Jun 2007. [Online]. Available: http://www.springerlink.com/index/10.1007/s10700-007-9007-9


[39] A. Papoulis, Probability, Random Variables, and Stochastic Processes. McGraw-Hill, 1984.

[40] K. Astrom and B. Wittenmark, Adaptive Systems, 1st ed. USA: Addison-Wesley, 1988.

[41] P. Angelov and X. Zhou, "Evolving fuzzy systems from data streams in real-time." IEEE, Sep 2006, pp. 29–35. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4016721

[42] S. Rahman and O. Hazim, "A generalized knowledge-based short-term load-forecasting technique," IEEE Transactions on Power Systems, vol. 8, no. 2, pp. 508–514, May 1993. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=260833

[43] G. E. P. Box and G. Jenkins, Time Series Analysis, Forecasting and Control. Holden-Day, Incorporated, 1990.

[44] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. Wiley-Interscience, 2000.

[45] M. C. Mackey and L. Glass, "Oscillation and chaos in physiological control systems," Science, vol. 197, pp. 287–289, 1977.

Andre Lemos received the B.Sc. degree in computer science in 2002 and the M.Sc. degree in electrical engineering in 2007, both from the Federal University of Minas Gerais, Belo Horizonte, Brazil. He is now a PhD student in the Department of Electrical Engineering of the Federal University of Minas Gerais. His current research interests include adaptive and evolving intelligent systems, machine learning, and applications in time series forecasting, fault detection and diagnosis, and nonlinear systems modeling.

Walmir Caminhas graduated in electrical engineering from the Federal University of Minas Gerais in 1987, received the master's degree in electrical engineering from the Federal University of Minas Gerais in 1989, and the PhD degree in electrical engineering from the University of Campinas in 1997. He is currently an associate professor at the Department of Electronics Engineering, Federal University of Minas Gerais, Belo Horizonte, Brazil. His research interests include computational intelligence and fault detection in dynamic systems.

Fernando Gomide received the B.Sc. degree in electrical engineering from the Polytechnic Institute of the Pontifical Catholic University of Minas Gerais (IPUC, PUC-MG), Belo Horizonte, Brazil, the M.Sc. degree in electrical engineering from the University of Campinas (Unicamp), Campinas, Brazil, and the PhD degree in systems engineering from Case Western Reserve University (CWRU), Cleveland, Ohio, USA. He has been a professor of the Department of Computer Engineering and Automation (DCA), Faculty of Electrical and Computer Engineering (FEEC), University of Campinas, since 1983. His interest areas include fuzzy systems, neural and evolutionary computation, modeling, control and optimization, logistics, multiagent systems, decision-making, and applications. He is a past vice-president of IFSA (International Fuzzy Systems Association), past IFSA Secretary, and a past member of the editorial boards of the IEEE Transactions on SMC-B and Fuzzy Sets and Systems. Currently he serves on the board of NAFIPS (North American Fuzzy Information Processing Society) and the editorial boards of the IEEE Transactions on SMC-A, Fuzzy Optimization and Decision Making, International Journal of Fuzzy Systems, and Mathware and Soft Computing. He is a past editor of Controle & Automação, the journal of the Brazilian Society for Automatics (SBA), the Brazilian National Member Organization of IFAC and IFSA. He is on the Advisory Board of the International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Journal of Advanced Computational Intelligence, and Intelligent Automation and Soft Computing. He is a senior member of the IEEE, a member of NAFIPS and EUSFLAT, and an IFSA Fellow. He also serves on the IEEE Task Force on Adaptive Fuzzy Systems and the IEEE Emergent Technology Technical Committee.