
Soft Comput (2008) 12:789–808, DOI 10.1007/s00500-007-0238-z

ORIGINAL PAPER

Artificial wavelet neuro-fuzzy model based on parallel wavelet network and neural network

Ahmad Banakar · Mohammad Fazle Azeem

Published online: 25 September 2007
© Springer-Verlag 2007

Abstract Motivated by the well-known advantages and valuable features of wavelets when used in neural networks, two types of network (SWNN and MWNN) are proposed. Both are single-hidden-layer networks in which each hidden neuron comprises a wavelet and a sigmoidal activation function. In the first model the outputs of the wavelet and sigmoidal activation functions are added, while in the second model they are multiplied together. Using these networks in the consequent part of a neuro-fuzzy model yields the summation wavelet neuro-fuzzy and multiplication wavelet neuro-fuzzy models, which are also proposed. Different types of wavelet function are tested with the proposed networks and fuzzy models on four different examples. Convergence of the learning process is guaranteed by a stability analysis based on a Lyapunov function.

Keywords Wavelet · Wavelet network · Neural network · Neuro-fuzzy model

1 Introduction

The nonlinear approximation ability of wavelets (Hossen 2004; Krishnamachari and Chellappa 1992), neural networks (NN) (Rumelhart et al. 1986; Narendra and Parthasarathy 1990) and neuro-fuzzy models (Takagi and Sugeno 1985; Quang et al. 2005; Azeem et al. 2003; Sugeno and Yasukawa 1993) has been demonstrated by many researchers. Combining the ability of the wavelet transform to reveal the properties of a function in localized regions with the learning ability and general approximation properties of NN, different types of wavelet neural network (WNN) have recently been proposed (Boubez and Peskin 1993; Yamakawa 1994; Wang 1999, 2000; Wang and Sugai 2000; Chen et al. 2006a, b; Zhang 1992, 1995). Boubez and Peskin (1993) used an orthonormal set of wavelet functions as basis functions. Yamakawa (1994) and Wang (1999, 2000) applied non-orthogonal wavelet functions as activation functions in a single-layer feedforward NN, using a simple cosine wavelet activation function. NN with sigmoidal activation functions have already been shown to handle large-dimensional problems very well (Zhang 1997). WNN yield superior system models for complex and seismic applications in comparison to NN with sigmoidal activation functions. The application of wavelets is limited to small dimensions (Benveniste et al. 1994), though WNN can handle large-dimensional problems (Zhang 1997).

A. Banakar (B) · M. F. Azeem
Department of Electrical Engineering, A.M.U. Aligarh University, Aligarh, India
e-mail: [email protected]

In this paper two types of WNN, namely the summation wavelet neural network (SWNN) and the multiplication wavelet neural network (MWNN), are proposed. These two WNN in turn lead to two types of wavelet neuro-fuzzy (WNF) model, namely the summation wavelet neuro-fuzzy (SWNF) and the multiplication wavelet neuro-fuzzy (MWNF) model. A literature survey indicates that all studies show the efficacy of wavelets when used in wavelet networks and/or in WNF models, but none of the reported work offers a comparative study of different types of wavelet. The presented work attempts to provide such a comparative study for three types of wavelet used in WNN and/or WNF, namely the Mexican hat, Morlet and Sinc wavelet functions.

The idea of this work is to approximate the inputs by sigmoidal and wavelet functions separately and then to combine them. The sigmoidal activation function in a NN can model the low-frequency part of a signal, and the wavelet activation function in a WNN can model the high-frequency part, especially sharp sections of the signal.

The idea of proposing SWNN and MWNN is to combine the localized approximation property of wavelets with the functional approximation properties of NN. Temporal changes in a dynamic system, particularly sharp changes, can be captured by the wavelets. The output of every neuron in SWNN is the sum of the sigmoidal and wavelet activation functions, and the output of each neuron in MWNN is their product.

With the proven capability of ANFIS as a powerful approximation method (Jang 1993), combining the parameter-learning ability of NNs with the localized approximation of the TSK fuzzy model (Takagi and Sugeno 1985), different types of networks based on neuro-fuzzy models have been proposed. In the TSK fuzzy model the consequent part of each rule is approximated by a linear function of the inputs. Neuro-fuzzy models based on the TSK model are essentially nonlinear, but conceptually they are aggregations of linear models. If the system under consideration is chaotic, some forecastable information in the system may not be predicted well by such an aggregation of linear models. In nonlinear dynamics and time series applications, linear local models in TSK are adequate to predict the behavior of the system, but nonlinear local models are better suited to predicting the nonlinear dynamic behavior of the system under consideration. For example, WNF models can be used as good general approximators (Ho et al. 2001; Lin et al. 2003). In these models, the premise part of each fuzzy rule represents a localized region of the input space in which a wavelet network is used as the local model in the consequent part of the rule. In the present paper, the proposed WNNs, i.e., SWNN and MWNN, are used as local models in the consequent part of the fuzzy rules, which leads to the SWNF and MWNF models, respectively. By joining the localized transformation of the wavelet activation function with the localized approximation of each fuzzy rule, an increase in model precision has been observed.

In both the wavelet networks and the WNF models, three types of non-orthogonal wavelet function, namely Mexican hat, Morlet and Sinc, are used. The ability of the proposed models is examined on four time series examples.

The paper is organized as follows. In Sect. 2, a brief discussion of wavelet functions is presented. Section 3 proposes the WNN models and describes their convergence analysis. The WNF models are proposed in Sect. 4, which also deals with the convergence analysis of SWNF and MWNF. Experimental results are presented in Sect. 5 and, finally, conclusions are given in Sect. 6.

2 Wavelet function

The wavelet transform (WT) in its continuous form provides a flexible time–frequency window, which narrows when observing high-frequency phenomena and widens when analyzing low-frequency behavior. Thus, time resolution becomes arbitrarily good at high frequencies, while frequency resolution becomes arbitrarily good at low frequencies. This kind of analysis is suitable for signals composed of high-frequency components with short duration and low-frequency components with long duration, which is often the case in practical situations. Here, a brief review of the theory of wavelets is given to convey the basic ideas about wavelets and the related work. Wavelet transforms are divided into two classes: the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT) (Rao and Bopardikar 2004; Daubechies 1992; Burrus et al. 1997). Historically, the CWT was the first studied wavelet transform.

Let f(t) be any square-integrable function. The signal or function f(t) can be expressed as (1).

f(t) = \iint W(a,b)\,\psi\!\left(\frac{t-b}{a}\right) db\,da   (1)

W(a, b) is the CWT of f(t) with respect to a wavelet function ψ(t) and is defined as:

W(a,b) = \int f(t)\,\psi^{*}_{a,b}(t)\,dt   (2)

where

\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right)   (3)

ψ(t) is the mother wavelet, 'a' is a scaling factor, 'b' is a shifting parameter and * denotes complex conjugation. The family of functions can be obtained by scaling and shifting of ψ(t). The mother wavelet has the property that the set \{\psi_{a,b}(t)\}_{a,b\in Z} forms an orthogonal basis in L^{2}(\mathbb{R}). This implies that the mother wavelet can, in turn, generate any function in L^{2}(\mathbb{R}). The mother wavelet has to satisfy the following admissibility condition:

C_{\psi} = \int_{-\infty}^{\infty} \frac{|\Psi(\omega)|^{2}}{\omega}\,d\omega < \infty   (4)

where Ψ(ω) is the Fourier transform of ψ(t). In practice Ψ(ω) will have sufficient decay, so that the admissibility condition reduces to:

\int_{-\infty}^{\infty} \psi(t)\,dt = \Psi(0) = 0   (5)

The CWT has the drawbacks of redundancy and impracticability on digital computers. As the parameters (a, b) take continuous values, the resulting CWT is a very redundant representation, and impracticable as well; this impracticability is a result of the redundancy. Therefore, the scale and shift parameters are evaluated on a discrete time–scale grid, leading to a discrete set of continuous basis functions.

Fig. 1 Different wavelet functions: a Mexican hat, b Morlet, and c Sinc

The continuous inverse wavelet transform (1) is discretized as:

f(t) = \sum_{i} W_{i}\,a_{i}^{-1/2}\,\psi\!\left(\frac{t-b_{i}}{a_{i}}\right)   (6)

To analyze discrete-time signals, it is convenient to take integer values for 'a' and 'b' in defining this basis: if a = 2^{j} and b = n \cdot 2^{j} (where j and n are integers) then, via translations and dilations:

\{\psi_{j,n}(t)\}_{j,n\in Z} = \left\{2^{-j/2}\,\psi\!\left(\frac{t-n\cdot 2^{j}}{2^{j}}\right)\right\}   (7)

Equation (7) forms a sparse orthonormal basis of L^{2}(\mathbb{R}). This means that the wavelet basis induces an orthogonal decomposition of any function in L^{2}(\mathbb{R}).

The applications of orthonormal wavelet bases and wavelet frames are usually limited to problems of small dimension (Zhang 1997). The main reason is that they are composed of regularly dilated and translated wavelets. For practical implementations, infinite bases and frames are always truncated. The number of wavelets in a truncated basis or frame increases drastically with the dimension; therefore, constructing and storing wavelet bases or frames of large dimension carries prohibitive cost.

In most practical situations of large dimension, the available data are sparse (Zhang 1997). If the inverse wavelet transform is discretized according to the distribution of the data, the number of wavelets needed in the reconstruction can be expected to decrease. It is thus possible to handle problems of large dimension with such an adaptive discretization of the inverse wavelet transform.

The adaptive discretization consists of determining the parameters w_{i}, a_{i} and b_{i} in (6) according to the data samples (x, y). This problem is very similar to NN training. In fact, formula (6) can be viewed as a one-hidden-layer NN with ψ as the activation function of the hidden neurons and with a linear neuron in the output layer. For this reason, we refer to the adaptively discretized inverse wavelet transform as a wavelet network. In this paper three types of wavelet function, namely Mexican hat, Morlet and Sinc, are used.

2.1 Mexican hat wavelet

This wavelet is derived from a function which is proportional to the second derivative of the Gaussian probability density function. It is non-orthogonal, has infinite support, and its maximum energy is concentrated in a narrow band around the origin. The expression for the Mexican hat wavelet is given by (8) and it is shown in Fig. 1a.

\psi(x) = \left(1 - 2x^{2}\right)\cdot\exp\!\left(-x^{2}\right)   (8)

2.2 Morlet wavelet

This wavelet is derived from a function that is proportional to the product of a cosine function and the Gaussian probability density function. It is non-orthogonal, has infinite support, and its maximum energy lies in a narrow band around the origin. The Morlet wavelet is expressed as (9) and shown in Fig. 1b.

\psi(x) = \exp\!\left(-x^{2}\right)\cdot\cos(5x)   (9)


2.3 Sinc (Shannon) wavelet

This wavelet is derived from a function that is proportional to the cosine function. It is also non-orthogonal with infinite support, and its maximum energy occupies a wider band around the origin compared to the two wavelets above. The Sinc wavelet is specified as (10) and shown in Fig. 1c.

\psi(x) = \frac{\sin(\pi x)}{\pi x}   (10)
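For reference, a minimal Python sketch of the three wavelet functions (8)–(10) and of the dilated/translated form (3) is given below; the function names and the helper dilate_translate are chosen here for illustration only and are not part of the paper.

import numpy as np

def mexican_hat(x):
    # Eq. (8): second-derivative-of-Gaussian type wavelet
    return (1.0 - 2.0 * x**2) * np.exp(-x**2)

def morlet(x):
    # Eq. (9): Gaussian-windowed cosine
    return np.exp(-x**2) * np.cos(5.0 * x)

def sinc_wavelet(x):
    # Eq. (10): sin(pi x)/(pi x); np.sinc handles the removable singularity at x = 0
    return np.sinc(x)

def dilate_translate(psi, y, a, b):
    # Eq. (3): psi_{a,b}(y) = |a|^(-1/2) * psi((y - b) / a)
    return psi((y - b) / a) / np.sqrt(abs(a))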

3 Wavelet NN model

In this section two types of feedforward single-hidden-layer network, namely SWNN and MWNN, are introduced. Each neuron in the hidden layer uses a wavelet activation function in conjunction with a sigmoidal activation function, as used in NN. In SWNN the output of each hidden-layer neuron is the sum of the sigmoidal and wavelet activation function outputs, whereas in MWNN it is their product. Discussions about the structure determination of the proposed models and the learning procedure are also presented in this section.

3.1 Proposed SWNN and MWNN

Figure 2a shows the architecture of the proposed SWNN and MWNN feedforward WNN models, and Fig. 2b shows the neuron structure used in them. The architecture of the feedforward WNN model consists of three layers, i.e., input, hidden and output layers. The neurons in the input layer transmit the inputs with weights CW and CN to the wavelet and sigmoidal activation functions, respectively, of each hidden-layer neuron. Each neuron in the hidden layer thus holds two sections: one is a sigmoidal activation function and the other a wavelet activation function. The output of each hidden-layer neuron in the SWNN network is the sum of the outputs of these two sections, as given in (11). In the MWNN network the output of each hidden-layer neuron is obtained by multiplying the outputs of these two sections, as expressed in (12).

Fig. 2 a Proposed SWNN and MWNN models and b hidden neuron

y_{l} = \theta\!\left(\sum_{i=1}^{n} CN_{i}\cdot x_{i}\right) + \psi\!\left(\sum_{i=1}^{n} CW_{i}\cdot x_{i}\right)   (11)

y_{l} = \theta\!\left(\sum_{i=1}^{n} CN_{i}\cdot x_{i}\right) \times \psi\!\left(\sum_{i=1}^{n} CW_{i}\cdot x_{i}\right)   (12)

ψ in (11) and (12) is a wavelet function [any one among (8)–(10)] and θ is the sigmoidal function given by (13).

\theta(y) = \frac{1}{1 + e^{-y}}   (13)

The output of the feedforward network is the weighted sum of the outputs of the hidden-layer neurons with weights W, as shown in Fig. 2a and expressed in (14).

Y_{WNN} = \sum_{l=1}^{L} W_{l}\cdot y_{l}   (14)
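A minimal sketch of the SWNN/MWNN forward pass (11)–(14) is shown below, assuming a single-output network and the Morlet wavelet as hidden activation; the class name WaveletNN and the random initialization are assumptions of this sketch, not the paper's implementation.

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))          # Eq. (13)

def morlet(v):
    return np.exp(-v**2) * np.cos(5.0 * v)   # Eq. (9)

class WaveletNN:
    """Single-hidden-layer WNN; mode='sum' gives SWNN (11), mode='prod' gives MWNN (12)."""
    def __init__(self, n_inputs, n_hidden, mode="sum", rng=np.random.default_rng(0)):
        self.CN = rng.normal(scale=0.1, size=(n_hidden, n_inputs))  # weights to the sigmoidal sections
        self.CW = rng.normal(scale=0.1, size=(n_hidden, n_inputs))  # weights to the wavelet sections
        self.W = rng.normal(scale=0.1, size=n_hidden)               # hidden-to-output weights
        self.mode = mode

    def forward(self, x):
        s = sigmoid(self.CN @ x)    # sigmoidal section of every hidden neuron
        w = morlet(self.CW @ x)     # wavelet section of every hidden neuron
        y = s + w if self.mode == "sum" else s * w   # Eq. (11) or Eq. (12)
        return self.W @ y           # Eq. (14)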

Structure determination In the proposed SWNN and MWNN models, each neuron is a parallel combination of a wavelet and a sigmoidal activation function, so the numbers of sigmoidal and wavelet functions are the same. Since the wavelet parameters depend strongly on the nature of the input–output signal, the scaling parameter is initially selected with the minimum possible value (a = 1 for a normalized I/O signal) and the shifting parameter is chosen by appropriate positioning of the wavelet (i.e., b = 0 to a − 1). This results in a single hidden-layer neuron. Later, by gradually increasing the scaling factor and appropriately positioning the wavelets, the number of neurons in the hidden layer increases, resulting in the growth of the network. A criterion is specified to stop this growth. We select the minimum number of wavelet functions by using scaling factor a = 1 and shift b = 0, so the number of sigmoidal activation functions is one; in total, for a = 1 there is only one neuron, consisting of one sigmoidal and one wavelet function. In the next step, wavelet functions with scaling factor a = 2 are added to the previous network. For a = 2 the shifting parameter 'b' changes from 0 to 1, so at this stage there are three wavelet activation functions along with three sigmoidal activation functions. In this manner the network grows until the specified stopping criterion is satisfied. We have used a performance-based criterion to stop the growth of hidden-layer neurons, which states: "Increasing the number of hidden neurons, by increasing the scaling factor, continues as long as the performance index (15) of the network is improving. Terminate the growth of hidden-layer neurons if the performance index deteriorates on further increasing the scaling factor."

Various methods for more effective selection of these parameters have been proposed in Zhang et al. (1995) and Oussar and Dreyfus (2000). Since the data are normalized, the number of wavelet functions with scale 'a' needed to cover the normalized range is no more than 'a'. If 'a' is the value of the scaling factor, the shifting parameter 'b' changes from 0 to a − 1. In the proposed model, the number of wavelet functions is fixed first and then the same number of sigmoidal functions is added. To simplify wavelet selection, the scaling factor is increased from one to higher values, in steps of one, until the desired accuracy is obtained. For a scaling factor value 'a', the number of hidden neurons in the network is equal to a(a + 1)/2, as sketched below.
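The growth schedule can be summarized as follows; the enumeration of wavelet "slots" (one per shift b = 0, …, s − 1 at each scale s = 1, …, a) and the performance-based stopping test follow the text above, while the train_and_evaluate routine is assumed to exist and to return the performance index (15).

def wavelet_slots(a):
    # For a maximum scaling factor a, list the (scale, shift) pairs of all hidden neurons.
    # Each slot pairs one wavelet with one sigmoidal unit; the total count is a*(a+1)/2.
    return [(s, b) for s in range(1, a + 1) for b in range(s)]

def grow_network(train_and_evaluate, a_max=10):
    # train_and_evaluate(slots) -> performance index J of Eq. (15); assumed given.
    best_slots, best_J = None, float("inf")
    for a in range(1, a_max + 1):
        slots = wavelet_slots(a)            # a*(a+1)/2 hidden neurons
        J = train_and_evaluate(slots)
        if J >= best_J:                     # performance deteriorates: stop growing
            break
        best_slots, best_J = slots, J
    return best_slots, best_J

assert len(wavelet_slots(3)) == 3 * 4 // 2  # 6 hidden neurons for a = 3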

3.2 Learning algorithm

In SWNN and MWNN (11), (12), CN, CW and W are the learning parameters. They are learned with the gradient descent (GD) algorithm and adjusted so that the performance index J defined in (15) is minimized; the corresponding update rules are (16)–(18).

J = \frac{1}{2\cdot P\cdot y_{r}^{2}}\cdot\sum_{p=1}^{P}\left(Y(p) - \hat{Y}(p)\right)^{2}   (15)

where y_{r} = \max_{p=1}^{P} Y(p) - \min_{p=1}^{P} Y(p), \hat{Y} is the output of the network, Y is the actual data and P is the number of data points. The GD updates for the parameters are as follows, where 'q' indicates the epoch number.
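As a concrete check, the normalized performance index (15) can be computed as in the sketch below; y_actual and y_model are assumed to be arrays of the measured and predicted outputs.

import numpy as np

def performance_index(y_actual, y_model):
    # Eq. (15): squared error normalized by the number of points P and the output range y_r
    P = len(y_actual)
    y_r = np.max(y_actual) - np.min(y_actual)
    return np.sum((y_actual - y_model) ** 2) / (2.0 * P * y_r ** 2)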

W_{l}(q+1) = W_{l}(q) - \eta_{W}\cdot\frac{\partial J}{\partial W_{l}} = W_{l}(q) - \eta_{W}\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot\sum_{p=1}^{P}\left[\left(Y - Y_{WNN}\right)\cdot\frac{\partial Y_{WNN}}{\partial W_{l}}\right] = W_{l}(q) - \eta_{W}\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot\sum_{p=1}^{P}\left[\left(Y - Y_{WNN}\right)\cdot y_{l}\right]   (16)

CN_{i}(q+1) = CN_{i}(q) - \eta_{CN}\cdot\frac{\partial J}{\partial CN_{i}} = CN_{i}(q) - \eta_{CN}\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot\sum_{p=1}^{P}\left[\left(Y - Y_{WNN}\right)\cdot\frac{\partial Y_{WNN}}{\partial CN_{i}}\right]   (17)

CW_{i}(q+1) = CW_{i}(q) - \eta_{CW}\cdot\frac{\partial J}{\partial CW_{i}} = CW_{i}(q) - \eta_{CW}\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot\sum_{p=1}^{P}\left[\left(Y - Y_{WNN}\right)\cdot\frac{\partial Y_{WNN}}{\partial CW_{i}}\right]   (18)

\partial Y_{WNN}/\partial CN_{i} and \partial Y_{WNN}/\partial CW_{i} for the SWNN model are calculated by (19) and (20), respectively.

\left.\frac{\partial Y_{WNN}}{\partial CN_{i}}\right|_{SWNN} = W_{l}\cdot x_{i}\cdot\theta'\!\left(\sum_{i=1}^{n} CN_{i}\cdot x_{i}\right)   (19)

\left.\frac{\partial Y_{WNN}}{\partial CW_{i}}\right|_{SWNN} = W_{l}\cdot x_{i}\cdot\psi'\!\left(\sum_{i=1}^{n} CW_{i}\cdot x_{i}\right)   (20)

\partial Y_{WNN}/\partial CN_{i} and \partial Y_{WNN}/\partial CW_{i} for the MWNN model are calculated by (21) and (22), respectively.

\left.\frac{\partial Y_{WNN}}{\partial CN_{i}}\right|_{MWNN} = W_{l}\cdot x_{i}\cdot\theta'\!\left(\sum_{i=1}^{n} CN_{i}\cdot x_{i}\right)\times\psi\!\left(\sum_{i=1}^{n} CW_{i}\cdot x_{i}\right)   (21)

\left.\frac{\partial Y_{WNN}}{\partial CW_{i}}\right|_{MWNN} = W_{l}\cdot x_{i}\cdot\theta\!\left(\sum_{i=1}^{n} CN_{i}\cdot x_{i}\right)\times\psi'\!\left(\sum_{i=1}^{n} CW_{i}\cdot x_{i}\right)   (22)

In these equations, W_{l} is the weight between hidden neuron l and the output layer. θ' is the derivative of the sigmoidal function and is given as:

\theta'(y) = \theta(y)\cdot(1 - \theta(y))   (23)

ψ' is the derivative of the wavelet function; for the Mexican hat, Morlet and Sinc functions it is given by (24), (25) and (26), respectively.

\psi'_{a,b}(y) = \frac{-2}{a^{1.5}}\cdot z\cdot e^{-z^{2}}\cdot\left[2 + \left(1 - 2z^{2}\right)\right]   (24)

\psi'_{a,b}(y) = \frac{-2}{a}\cdot z\cdot e^{-z^{2}}\cdot\cos(5z) - e^{-z^{2}}\cdot\frac{5}{a}\cdot\sin(5z)   (25)

\psi'_{a,b}(y) = \frac{1}{a}\cdot\frac{\pi z\cos(\pi z) - \sin(\pi z)}{\pi z^{2}}   (26)

where z = (y − b)/a.
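The sigmoid derivative (23) and the wavelet derivatives (24)–(26) can be written directly from the formulas above, with z = (y − b)/a; the helper names below are illustrative and not taken from the paper.

import numpy as np

def d_sigmoid(theta_y):
    # Eq. (23), in terms of the already-computed sigmoid value
    return theta_y * (1.0 - theta_y)

def d_mexican_hat(y, a, b):
    z = (y - b) / a
    return (-2.0 / a**1.5) * z * np.exp(-z**2) * (2.0 + (1.0 - 2.0 * z**2))    # Eq. (24)

def d_morlet(y, a, b):
    z = (y - b) / a
    return (-2.0 / a) * z * np.exp(-z**2) * np.cos(5.0 * z) \
           - np.exp(-z**2) * (5.0 / a) * np.sin(5.0 * z)                       # Eq. (25)

def d_sinc(y, a, b):
    # Eq. (26); the point z = 0 must be handled separately (the derivative there is 0)
    z = (y - b) / a
    return (1.0 / a) * (np.pi * z * np.cos(np.pi * z) - np.sin(np.pi * z)) / (np.pi * z**2)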

In the above equations η is an adaptive learning rate, which increases or decreases depending on the change in the performance index J. A two-phase adaptive scheme is used to make the learning rate η adaptive in the GD technique. The initial value of the learning rate is kept at 0.1 for all applications. In the first phase it either increases or decreases by a factor of 10; when it reaches an appropriate value, within a very few epochs (i.e., <10), the second phase starts. The increase or decrease depends on the acceptance or rejection, respectively, of the parameter update. In the second phase, involving the operation η ← λη, we choose λ = 1.05 for the acceptance of a parameter update and λ = 0.7 for its rejection.
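The two-phase learning-rate adaptation can be sketched as follows; the accept/reject decision is assumed to be based on whether the performance index J improved after the tentative update, as described above.

def adapt_learning_rate(eta, accepted, phase):
    # Phase 1: coarse search by factors of 10; phase 2: fine adjustment with lambda = 1.05 / 0.7
    if phase == 1:
        return eta * 10.0 if accepted else eta / 10.0
    return eta * 1.05 if accepted else eta * 0.7

eta = 0.1                                              # initial learning rate used for all applications
eta = adapt_learning_rate(eta, accepted=True, phase=2) # example: an accepted update in phase 2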

Convergence analysis The Lyapunov stability theorem is applied to guarantee the convergence and stability of the learning algorithm for the proposed networks. Theorem 1, as discussed in Lee and Teng (2000), is applied to the proposed networks to establish their convergence; the proof of Theorem 1 is given in Appendix A.

Theorem 1 Suppose υ is a learning parameter that is learned by applying the GD algorithm as given in (27).

υ(q+1) = υ(q) + \Delta υ   (27)

where \Delta υ = -\eta\cdot\partial J/\partial υ. Using the performance index J (15) and defining the discrete Lyapunov function (28), the maximum value of η which guarantees convergence of the learning is expressed by (29).

V(k) = E(k) = \frac{1}{2}\cdot[e(k)]^{2}   (28)

\eta_{\max} < \left(2\cdot P\cdot y_{r}^{2}\right)\Big/\max\left(\frac{\partial\hat{y}(k)}{\partial υ}\right)^{2}   (29)

Theorem 2 Applying the condition in Theorem 1, the range of the learning rate for the different parameters of the proposed SWNN and MWNN (16)–(18) is:

0 < \eta_{W} < \frac{2\cdot P\cdot y_{r}^{2}}{\left|\frac{\partial Y_{WNN}}{\partial W}\right|^{2}_{\max}}   (30)

0 < \eta_{CN} < \frac{2\cdot P\cdot y_{r}^{2}}{\left|\frac{\partial Y_{WNN}}{\partial CN}\right|^{2}_{\max}}   (31)

0 < \eta_{CW} < \frac{2\cdot P\cdot y_{r}^{2}}{\left|\frac{\partial Y_{WNN}}{\partial CW}\right|^{2}_{\max}}   (32)
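For illustration, the upper bounds (30)–(32) can be evaluated numerically once the relevant partial derivatives have been computed over the training set; the gradient array passed in is an assumption of this sketch.

import numpy as np

def eta_upper_bound(P, y_r, dY_dparam):
    # Bounds (30)-(32): eta_max = 2 * P * y_r^2 / max |dY/dparam|^2
    return 2.0 * P * y_r ** 2 / np.max(np.abs(dY_dparam)) ** 2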

4 WNF model

In this section, two types of neuro-fuzzy model based on the proposed WNN networks are proposed. The antecedent part of each fuzzy rule in the proposed neuro-fuzzy models represents a region of the input space in which a local model operates. These local models are estimated by the proposed SWNN or MWNN network, resulting in the SWNF or MWNF model, respectively. This section also deals with the learning procedure and convergence analysis of the proposed SWNF and MWNF models.

4.1 Proposed SWNF and MWNF

Figure 3 shows a WNF model. This model can be described by a set of fuzzy rules of the following form:

R_{m}: \text{IF } x_{1} \text{ is } A^{m}_{1} \text{ and } \cdots \text{ and } x_{n} \text{ is } A^{m}_{n} \text{ THEN } Y_{WNF_{m}} = Y_{WNN_{m}}   (33)

where R_{m} is the mth rule, x_{i} is the ith input variable, Y_{WNN_{m}} is the output of the mth local model for rule R_{m}, and A^{m}_{i} is the linguistic term of the premise part with Gaussian membership function given by (34).

Fig. 3 Proposed wavelet neuro-fuzzy (WNF) model

\mu_{A^{m}_{i}}(x_{i}) = \exp\!\left(-\frac{(x_{i} - \bar{x}_{mi})^{2}}{\sigma^{2}_{mi}}\right)   (34)

where \bar{x}_{mi} and \sigma_{mi} are the mean and standard deviation, respectively, of the mth fuzzy set associated with the ith input variable x_{i}. From Fig. 3, the structure of the WNF model is described as follows:

Layer 1 Nodes in layer 1 represent the input variables. Every node accepts an input value and transmits it to the next layer.

Layer 2 Nodes in this layer represent the terms of the respective linguistic variables. Every node operates on the incoming signal with the Gaussian membership function expressed by (34). The parameters to be learned in this layer are \bar{x}_{mi} and \sigma_{mi}. Corresponding to each rule, the learning parameters are expressed in vector form as \bar{x}_{m} = \{\bar{x}_{m1}, \bar{x}_{m2}, \ldots, \bar{x}_{mn}\} and \sigma_{m} = \{\sigma_{m1}, \sigma_{m2}, \ldots, \sigma_{mn}\}.

Layer 3 Each node in layer 3 represents a fuzzy rule. The outputs of the layer-2 nodes specified for a fuzzy rule are fed to the node specified for that rule in layer 3. The output of each node in layer 3 is the product of all its inputs and represents the firing strength of that rule. Thus the firing strength of the mth rule is specified as (35).

\mu_{A^{m}} = \prod_{i=1}^{n}\mu_{A^{m}_{i}} = \prod_{i=1}^{n}\exp\!\left(-\left(\frac{x_{i} - \bar{x}_{mi}}{\sigma_{mi}}\right)^{2}\right)   (35)

Layer 4 Nodes in layer 4 are called consequent nodes. Two inputs are applied to each node in this layer, namely the output of the corresponding layer-3 node and the output of its corresponding local model, approximated by either SWNN or MWNN. The output of each node is the product of these two inputs, given by (36).

Y^{*}_{WNF_{m}} = \mu_{A^{m}}\cdot Y_{WNN_{m}}   (36)

Layer 5 There are three nodes in this layer, which constitute the aggregation and defuzzification of the fuzzy rules. The outputs of all layer-4 nodes are the inputs to the first node, whose output is their sum, expressed as (37). The outputs of all layer-3 nodes are the inputs to the second node, whose output is their sum, expressed as (38). The inputs to the third node are the outputs of the first and second nodes, and its output is the ratio of these two inputs, given in (39). The output of the third node in this layer is the output of either SWNF or MWNF, depending on whether SWNN or MWNN is used as the local model in the consequent part of each rule of the fuzzy system.

a = \sum_{m=1}^{M}\left(\mu_{A^{m}}\cdot Y_{WNN_{m}}\right)   (37)

b = \sum_{m=1}^{M}\mu_{A^{m}}   (38)

Y_{WNF} = \frac{a}{b} = \frac{\sum_{m=1}^{M}\left(\mu_{A^{m}}\cdot Y_{WNN_{m}}\right)}{\sum_{m=1}^{M}\mu_{A^{m}}}   (39)
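A compact sketch of the five-layer WNF forward pass (34)–(39) is shown below; it reuses the WaveletNN sketch given earlier for the local models, and the rule centers and widths are assumed to come from the clustering step described next.

import numpy as np

class WNF:
    def __init__(self, centers, sigmas, local_models):
        # centers, sigmas: arrays of shape (M, n) holding x_bar_mi and sigma_mi (Eq. 34)
        # local_models: list of M SWNN/MWNN local models (one per rule)
        self.centers, self.sigmas, self.local_models = centers, sigmas, local_models

    def forward(self, x):
        # Layers 2-3: Gaussian memberships and rule firing strengths, Eq. (35)
        mu = np.exp(-((x - self.centers) / self.sigmas) ** 2).prod(axis=1)
        # Layer 4: local model output weighted by firing strength, Eq. (36)
        y_local = np.array([m.forward(x) for m in self.local_models])
        # Layer 5: normalized aggregation, Eqs. (37)-(39)
        return np.sum(mu * y_local) / np.sum(mu)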

In these equations n is the number of inputs and M is the number of rules; the number of fuzzy sets for each input is taken equal to the number of rules. To determine the number of necessary rules, modified mountain clustering (MMC) is applied (Chung and Lee 1998; Yager and Filev 1994). The purpose of clustering is to perform a natural grouping of a large set of data, producing a concise representation of the system's behavior. A description of MMC is given in Appendix B.

4.2 Learning algorithm

In (33)–(39), \bar{x}, σ, W, CN and CW are the learning parameters of the WNF model; they are adjusted using the GD algorithm (15), (40)–(44). The first two parameters belong to the (Gaussian) fuzzy memberships and the remaining three are associated with the WNN.

\sigma_{mi}(q+1) = \sigma_{mi}(q) - \eta_{\sigma}\cdot\frac{\partial J}{\partial\sigma_{mi}} = \sigma_{mi}(q) - \eta_{\sigma}\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot\sum_{p=1}^{P}\left[\left(Y - Y_{WNF}\right)\cdot\frac{\partial Y_{WNF}}{\partial\sigma_{mi}}\right] = \sigma_{mi}(q) - \eta_{\sigma}\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot\sum_{p=1}^{P}\left[\left(Y - Y_{WNF}\right)\cdot Y_{WNN_{m}}\cdot\frac{\beta_{m}}{\mu_{A^{m}}}\cdot(1-\beta_{m})\cdot\frac{2\cdot(x_{i}-\bar{x}_{mi})^{2}}{\sigma^{3}_{mi}}\right]   (40)

\bar{x}_{mi}(q+1) = \bar{x}_{mi}(q) - \eta_{\bar{x}}\cdot\frac{\partial J}{\partial\bar{x}_{mi}} = \bar{x}_{mi}(q) - \eta_{\bar{x}}\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot\sum_{p=1}^{P}\left[\left(Y - Y_{WNF}\right)\cdot\frac{\partial Y_{WNF}}{\partial\bar{x}_{mi}}\right] = \bar{x}_{mi}(q) - \eta_{\bar{x}}\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot\sum_{p=1}^{P}\left[\left(Y - Y_{WNF}\right)\cdot Y_{WNN_{m}}\cdot\frac{\beta_{m}}{\mu_{A^{m}}}\cdot(1-\beta_{m})\cdot\frac{2\cdot(x_{i}-\bar{x}_{mi})}{\sigma^{2}_{mi}}\right]   (41)

W^{m}_{l}(q+1) = W^{m}_{l}(q) - \eta_{W}\cdot\frac{\partial J}{\partial W^{m}_{l}} = W^{m}_{l}(q) - \eta_{W}\cdot\sum_{p=1}^{P}\left[\left(Y - Y_{WNF}\right)\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot\frac{\partial Y_{WNF}}{\partial W^{m}_{l}}\right] = W^{m}_{l}(q) - \eta_{W}\cdot\sum_{p=1}^{P}\left[\left(Y - Y_{WNF}\right)\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot\beta_{m}\cdot y^{m}_{l}\right]   (42)

CN^{m}_{i}(q+1) = CN^{m}_{i}(q) - \eta_{CN}\cdot\frac{\partial J}{\partial CN^{m}_{i}} = CN^{m}_{i}(q) - \eta_{CN}\cdot\sum_{p=1}^{P}\left[\left(Y - Y_{WNF}\right)\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot\frac{\partial Y_{WNF}}{\partial CN^{m}_{i}}\right] = CN^{m}_{i}(q) - \eta_{CN}\cdot\sum_{p=1}^{P}\left[\left(Y - Y_{WNF}\right)\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot\beta_{m}\cdot\frac{\partial Y_{WNN_{m}}}{\partial CN^{m}_{i}}\right]   (43)

CW^{m}_{i}(q+1) = CW^{m}_{i}(q) - \eta_{CW}\cdot\frac{\partial J}{\partial CW^{m}_{i}} = CW^{m}_{i}(q) - \eta_{CW}\cdot\sum_{p=1}^{P}\left[\left(Y - Y_{WNF}\right)\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot\frac{\partial Y_{WNF}}{\partial CW^{m}_{i}}\right] = CW^{m}_{i}(q) - \eta_{CW}\cdot\sum_{p=1}^{P}\left[\left(Y - Y_{WNF}\right)\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot\beta_{m}\cdot\frac{\partial Y_{WNN_{m}}}{\partial CW^{m}_{i}}\right]   (44)

where \beta_{m} = \mu_{A^{m}}\big/\sum_{m=1}^{M}\mu_{A^{m}}. In the above equations η is an adaptive learning rate. \partial Y_{WNN_{m}}/\partial CN^{m}_{i} and \partial Y_{WNN_{m}}/\partial CW^{m}_{i} for SWNF or MWNF are expressed by (19), (20) or (21), (22), respectively.

Convergence analysis To examine the stability of the learning procedure, the Lyapunov stability theorem, as discussed in Theorem 1, is applied. Theorem 3 gives the convergence condition for the proposed neuro-fuzzy models.

Theorem 3 Applying Theorem 1 to the proposed SWNF and MWNF models (40)–(44), convergence is guaranteed if the learning rate η is chosen as:

0 < \eta_{w} < \frac{2\cdot P\cdot y_{r}^{2}}{\left|\frac{\partial Y_{WNN}}{\partial w}\right|^{2}_{\max}}   (45)

0 < \eta_{CN} < \frac{2\cdot P\cdot y_{r}^{2}}{\left|\frac{\partial Y_{WNN}}{\partial CN}\right|^{2}_{\max}}   (46)

0 < \eta_{CW} < \frac{2\cdot P\cdot y_{r}^{2}}{\left|\frac{\partial Y_{WNN}}{\partial CW}\right|^{2}_{\max}}   (47)

0 < \eta_{\sigma} < \frac{2\cdot P\cdot y_{r}^{2}}{\left|Y_{WNN}\right|^{2}_{\max}\cdot\left(\frac{2}{\sigma^{3}_{\min}}\right)^{2}}   (48)

0 < \eta_{\bar{x}} < \frac{2\cdot P\cdot y_{r}^{2}}{\left|Y_{WNN}\right|^{2}_{\max}\cdot\left(\frac{2}{\sigma^{2}_{\min}}\right)^{2}}   (49)

where M denotes the number of rules in the neuro-fuzzy model. The proof of Theorem 3 is given in Appendix A.

5 Examples and results

In this section, different examples are applied to the proposed models. Structure determination of the neuro-fuzzy models and of the proposed wavelet networks is done separately. The proposed wavelet networks that show better performance are used in the consequent part of the fuzzy rules. In both the WNN and WNF models, three types of wavelet function, namely Mexican hat, Morlet and Sinc, are applied; in other words, the quality of the proposed models is tested with these wavelet functions. Each model has been trained for up to one thousand epochs with an adaptive learning rate bounded by the convergence limit, to learn the W, CN and CW parameters in WNN and the \bar{x}, σ, W, CN and CW parameters in WNF. For each example the results are presented in two parts: first for WNN (SWNN and MWNN) and then for WNF (SWNF and MWNF).

Example 1 Gas furnace data
This is a benchmark system identification problem. The process is a gas furnace with a single input u(t), the gas flow rate, and a single output y(t), the CO2 concentration. In Sugeno and Yasukawa (1993), u(t − 3), u(t − 4) and y(t − 1) were considered as inputs to the model on the basis of their proposed input selection scheme; we have also considered these variables as inputs to our proposed models. Two hundred fifty data points are used to train the models and the remaining 40 data points are used for prediction.
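A small sketch of how the regression vectors for this example can be assembled from the measured series; the arrays u and y are assumed to hold the gas flow rate and CO2 concentration, and the lag choice follows the input selection above.

import numpy as np

def gas_furnace_regressors(u, y):
    # Inputs: [u(t-3), u(t-4), y(t-1)]; target: y(t)
    t = np.arange(4, len(y))
    X = np.column_stack([u[t - 3], u[t - 4], y[t - 1]])
    return X, y[t]

# X, target = gas_furnace_regressors(u, y)  # then split 250 / 40 as in the text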

Figure 4 shows the learning patterns of SWNN with different wavelet functions; there are twelve hidden-layer neurons for SWNN. The number of hidden-layer neurons is determined by checking the performance of the network while increasing them according to the proposed criterion. The learning patterns of MWNN with different types of wavelet activation function are shown in Fig. 5; for MWNN, six hidden-layer neurons are selected. In both SWNN and MWNN the network with the Morlet activation function yields the better result. Figure 6 shows the comparative results for the proposed SWNN and MWNN networks with the Morlet function, WNN with the Mexican hat wavelet function and 15 hidden neurons, and NN with 16 hidden neurons. In this example the MWNN model with the Morlet wavelet is the best, followed by SWNN with Morlet and WNN with Mexican hat. The actual output, the output of the MWNN (Morlet) network and the error between them are shown in Fig. 7.

Fig. 4 Learning pattern of SWNN network with all wavelet functions for Example 1

Fig. 5 Learning pattern of MWNN network with all wavelet functions for Example 1

Fig. 6 Learning pattern of different networks (with best wavelet function) for Example 1

Fig. 7 Actual output and output of MWNN (Morlet) network and the error for Example 1

Figures 8 and 9 show the learning patterns of the SWNF and MWNF models with different wavelet functions, respectively. The number of rules selected by mountain clustering is five. The consequent part of each rule in the SWNF and MWNF models is a SWNN and MWNN network, respectively, and the network size of SWNN and MWNN for each rule is determined independently. SWNN and MWNN networks with the Morlet activation function, when used in the consequent part of SWNF and MWNF, lead to the better performance. The learning patterns of SWNF and MWNF with the Morlet function, WNF with Mexican hat and the TSK fuzzy model are shown in Fig. 10. The MWNF model with the Morlet wavelet function yields the best result. The actual output, the output of the MWNF model with the Morlet function and the error between them are shown in Fig. 11. Table 1 illustrates the comparative performance of the proposed WNN networks and WNF models with different wavelets. The performance index of the MWNF model with the Morlet activation function is J = 1.61 × 10^-7, which is better than that of other methods, e.g., J = 2.18 × 10^-6 in Quang et al. (2005).

Fig. 8 Learning pattern of SWNF model with all wavelet functions for Example 1

Fig. 9 Learning pattern of MWNF model with all wavelet functions for Example 1

Example 2 Mackey–Glass time series
The time series used in this example is generated by the chaotic Mackey–Glass differential delay equation defined below:

\dot{x}(t) = \frac{0.2\, x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\, x(t)   (50)

The data for the above equation are available in MATLAB (mgdata.dat) and are produced with x(0) = 1.2, τ = 17 and x(t) = 0 for t < 0. The inputs to the model are x(t − 18), x(t − 12), x(t − 6) and x(t), used to predict x(t + 6) as the model output.
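A minimal sketch that integrates (50) with a simple Euler step and builds the lagged input/target pairs; the step size and the Euler integration are assumptions of this sketch (the paper uses the precomputed MATLAB series mgdata.dat).

import numpy as np

def mackey_glass(n_steps, tau=17, dt=1.0, x0=1.2):
    # Euler integration of Eq. (50); x(t) = 0 for t < 0 and x(0) = x0
    hist = int(tau / dt)
    x = np.zeros(n_steps + hist)
    x[hist] = x0
    for k in range(hist, n_steps + hist - 1):
        x_tau = x[k - hist]
        x[k + 1] = x[k] + dt * (0.2 * x_tau / (1.0 + x_tau ** 10) - 0.1 * x[k])
    return x[hist:]

def mg_regressors(x):
    # Inputs x(t-18), x(t-12), x(t-6), x(t); target x(t+6)
    t = np.arange(18, len(x) - 6)
    X = np.column_stack([x[t - 18], x[t - 12], x[t - 6], x[t]])
    return X, x[t + 6]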

Fig. 10 Learning pattern of different neuro-fuzzy models (with best wavelet function) for Example 1

Fig. 11 Actual output and output of MWNF (Morlet) model and the error for Example 1

Table 1 Performance index (J) of proposed WNN networks and WNF models with different wavelet functions for Example 1

          SWNN          MWNN          SWNF          MWNF
Mexican   4.29 × 10^-6  2.38 × 10^-6  9.22 × 10^-7  3.49 × 10^-6
Morlet    2.52 × 10^-7  1.68 × 10^-7  2.82 × 10^-7  1.61 × 10^-7
Sinc      7.71 × 10^-6  6.64 × 10^-6  2.26 × 10^-6  2.85 × 10^-7

Six hundred data points are produced; 300 of them are used for training and the remaining 300 for testing of the networks.

Figures 12 and 13 show the performance index for the SWNN and MWNN networks, respectively. In SWNN the Sinc wavelet function and in MWNN the Morlet wavelet function yield the better result. The comparative learning patterns of the performance index for the different networks are shown in Fig. 14.

Fig. 12 Learning pattern of SWNN network with all wavelet functions for Example 2

Fig. 13 Learning pattern of MWNN network with all wavelet functions for Example 2

In this example, both SWNN and MWNN are better than WNN, and all the wavelet networks achieve better results than NN. The actual output and the predicted output of the MWNN network with the Morlet function are shown in Fig. 15.

The number of rules for the neuro-fuzzy models is selected using modified mountain clustering, which yields five rules. The same network structures of WNN, as determined above, are used in the consequent part of each rule. Figures 16 and 17 show that SWNF with the Morlet wavelet function and MWNF with the Mexican hat wavelet function yield the better results. Comparative results of the SWNF, MWNF, WNF and TSK fuzzy models are presented in Fig. 18. For this example, the SWNF model with the Mexican hat function is better; the actual output and the predicted output of this model are shown in Fig. 19. Table 2 illustrates the comparative performance of the proposed WNN networks and WNF models with different wavelets. In Chung and Lee (1998) the performance index under the same conditions is J = 2.62 × 10^-5, whereas all the proposed wavelet networks yield better results.

Fig. 14 Learning pattern of different networks (with best wavelet function) for Example 2

Fig. 15 Actual output and predicted output of MWNN (Morlet) network and the error for Example 2

Example 3 The system is a nonlinear second-order dynamical model (Narendra and Parthasarathy 1990). The function f is a polynomial of degree three in the current input u(k). The input u(k) is the sum of two sinusoids, given in (53).

y(k+1) = 0.3\, y(k) + 0.6\, y(k-1) + f[u(k)]   (51)

where

f[u(k)] = [u(k)]^{3} + 0.3\,[u(k)]^{2} - 0.4\, u(k)   (52)

u(k) = \sin(2\pi k/250) + \sin(2\pi k/25)   (53)
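The plant (51)–(53) can be simulated directly; the sketch below generates the input–output pairs mentioned next, with zero initial conditions assumed.

import numpy as np

def simulate_example3(n=500):
    k = np.arange(n)
    u = np.sin(2 * np.pi * k / 250) + np.sin(2 * np.pi * k / 25)  # Eq. (53)
    y = np.zeros(n + 1)
    for i in range(1, n):
        f_u = u[i] ** 3 + 0.3 * u[i] ** 2 - 0.4 * u[i]            # Eq. (52)
        y[i + 1] = 0.3 * y[i] + 0.6 * y[i - 1] + f_u              # Eq. (51)
    return u, y[:n]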

Fig. 16 Learning pattern of SWNF model with all wavelet functions for Example 2

Fig. 17 Learning pattern of MWNF model with all wavelet functions for Example 2

In this example, 500 input–output data points are produced; 300 are used in the learning procedure and the remaining 200 for prediction.

Figures 20 and 21 show the learning patterns of the proposed WNN networks with different types of wavelet. The number of hidden neurons for the proposed SWNN and MWNN networks in this example is six. Both networks with the Morlet activation function yield the better performance. Figure 22 compares the proposed SWNN and MWNN with WNN (Mexican hat) and NN; SWNN and MWNN give better performance. The actual output, the network output of MWNN with the Morlet wavelet function and the error between them are shown in Fig. 23.

The learning patterns of the SWNF and MWNF models with different types of wavelet function are illustrated in Figs. 24 and 25. The number of rules, computed by MMC, is five. The SWNF and MWNF models with the Morlet wavelet function yield the better results. The learning patterns of the proposed SWNF and MWNF models along with the TSK and WNF models are shown in Fig. 26. The performance of the MWNF model with the Morlet wavelet function is the best; the actual output, the output of this model and the error are shown in Fig. 27. Table 3 illustrates the comparative performance of the proposed WNN networks and WNF models with different wavelet functions.

Fig. 18 Learning pattern of different neuro-fuzzy models (with best wavelet function) for Example 2

Fig. 19 Actual output and predicted output of MWNF (Morlet) model and the error for Example 2

Table 2 Performance index (J) of proposed WNN networks and WNF models with different wavelet functions for Example 2

          SWNN          MWNN          SWNF          MWNF
Mexican   4.06 × 10^-6  2.62 × 10^-6  1.82 × 10^-6  1.48 × 10^-6
Morlet    2.11 × 10^-6  1.44 × 10^-6  1.54 × 10^-6  1.61 × 10^-6
Sinc      1.85 × 10^-6  2.17 × 10^-6  3.50 × 10^-6  2.07 × 10^-6

Fig. 20 Learning pattern of SWNN network with all wavelet functions for Example 3

Fig. 21 Learning pattern of MWNN network with all wavelet functions for Example 3

Example 4 The model is taken from Narendra and Parthasarathy (1990) and is described in (54), (55) as follows:

y(k+1) = f[y(k), y(k-1)] + u(k)   (54)

where the function f is:

f[y(k), y(k-1)] = \frac{y(k)\, y(k-1)\,[y(k)+2.5]}{1 + y^{2}(k) + y^{2}(k-1)}   (55)

The input signal u(k) is assumed to be a random variable in the interval [−2, 1]. A total of 500 data points are produced; the first 300 are used for learning of the network and the remaining 200 are used for prediction.
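A sketch of the data generation for this plant; the uniform random input on [−2, 1] and the total of 500 samples follow the text, while zero initial conditions and the particular random seed are assumptions of the sketch.

import numpy as np

def simulate_example4(n=500, rng=np.random.default_rng(0)):
    u = rng.uniform(-2.0, 1.0, size=n)   # random input in [-2, 1]
    y = np.zeros(n + 1)
    for k in range(1, n):
        f = y[k] * y[k - 1] * (y[k] + 2.5) / (1.0 + y[k] ** 2 + y[k - 1] ** 2)  # Eq. (55)
        y[k + 1] = f + u[k]                                                     # Eq. (54)
    return u, y[:n]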

Figures 28 and 29 show the learning patterns for SWNN and MWNN with scaling factor equal to 3 and therefore 12 hidden neurons, respectively. We obtain better performance for SWNN and MWNN with the Morlet wavelet function. Figure 30 shows the learning patterns of the SWNN, MWNN, WNN and NN networks; the proposed SWNN and MWNN networks have the better performance index values. The actual output, the predicted output of the MWNN network with the Morlet function and the error are shown in Fig. 31.

Fig. 22 Learning pattern of different networks (with best wavelet function) for Example 3

Fig. 23 Actual output and network output of MWNN (Morlet) network and the error for Example 3

Figures 32 and 33 show the learning patterns for the SWNF and MWNF models with different wavelet functions, respectively. The SWNF and MWNF models with the Morlet wavelet function yield the better performance index values. The learning patterns of the SWNF and MWNF models together with the WNF and TSK models are shown in Fig. 34; the proposed SWNF and MWNF models with the Morlet wavelet function yield the better performances. Figure 35 shows the actual output and the predicted output of the MWNF model with the Morlet wavelet function. The comparative performance of the proposed WNN networks and WNF models with different wavelets is illustrated in Table 4.

Fig. 24 Learning pattern of SWNF model with all wavelet functions for Example 3

Fig. 25 Learning pattern of MWNF model with all wavelet functions for Example 3

Discussion
Tables 1 to 4 show that the proposed SWNN and MWNN networks and the SWNF and MWNF models with the Morlet activation function have the better performance index. The Morlet wavelet has both Gaussian and cosine behavior, whereas the Mexican hat wavelet has only Gaussian behavior and the Sinc wavelet has only cosine behavior.

Comparative results of the WNN networks with different wavelet activation functions are shown in Table 5. The proposed SWNN and MWNN networks achieve better performance with fewer hidden neurons; in addition, MWNN is better than SWNN. Using a wavelet function in conjunction with a sigmoidal activation function makes the network better suited to dealing with dynamic systems. The sigmoidal activation function can model the low-frequency part of the signal. Wavelet activation functions with a small scaling factor act as low-frequency filters, whereas with increasing scaling factor the wavelets behave as high-frequency filters. When summation or multiplication is applied to the outputs of the sigmoidal and wavelet activation functions, they can capture the low-frequency and high-frequency parts of the signal simultaneously, and the number of hidden neurons decreases.

Fig. 26 Learning pattern of different neuro-fuzzy models (with best wavelet function) for Example 3

Fig. 27 Actual output and output of MWNF (Morlet) model and the error for Example 3

Table 3 Performance index (J) of proposed WNN networks and WNF models with different wavelet functions for Example 3

          SWNN           MWNN           SWNF          MWNF
Mexican   2.76 × 10^-6   1.50 × 10^-6   1.24 × 10^-6  2.37 × 10^-6
Morlet    1.734 × 10^-6  7.585 × 10^-7  6.23 × 10^-7  8.41 × 10^-7
Sinc      5.56 × 10^-6   1.71 × 10^-6   6.56 × 10^-7  1.36 × 10^-6

Fig. 28 Learning pattern of SWNN network with all wavelet functions for Example 4

Fig. 29 Learning pattern of MWNN network with all wavelet functions for Example 4

Table 6 shows the performance of the learning patterns for the different fuzzy models. The proposed SWNF and MWNF models, and especially the multiplication WNF model, give better results than both the WNF and TSK models. Each fuzzy rule approximates a region of the input space onto a region of the output space. The fuzzy inference system tries to find the best response to the "IF" section of the rules: the better the input space is partitioned, the better the result for the corresponding region of the output space. Clustering is used to find the best input-space partition with respect to the local output. If the local output space in the consequent part of each rule can be analyzed, the final response will be better. The proposed WNN networks can capture the local output space much better than the linear functions in the TSK model or a WNN with only wavelet activation functions.

Fig. 30 Learning pattern of different networks (with best wavelet function) for Example 4

Fig. 31 Actual output and predicted output of MWNN (Morlet) network and the error for Example 4

Comparing the performance values given in Tables 5 and 6 shows that in most of the examples the WNF models perform better than the WNN networks. This confirms that when the system under consideration is vague, or the information about the plant is not sufficient, the fuzzy models are more effective because of the uncertainty. The better functioning of the fuzzy models is due to the aggregation of localized models defined in different local regions. These local regions are characterized by the membership functions, which in turn handle the vagueness of the system and/or the insufficient information about the inputs.

Fig. 32 Learning pattern of SWNF model with all wavelet functions for Example 4

Fig. 33 Learning pattern of MWNF model with all wavelet functions for Example 4

6 Conclusions

In this work two types of WNN network (SWNN and MWNN) and two types of WNF model (SWNF and MWNF) are proposed, together with a comparative study of three types of wavelet with reference to the proposed networks. The proposed SWNN, MWNN, SWNF and MWNF are tested on four different examples, and a comparative study of the different wavelet functions with reference to SWNN and MWNN is carried out on these examples.

The SWNN and MWNN are single-hidden-layer networks. Each neuron in the hidden layer comprises a wavelet activation function and a sigmoidal activation function. When the summation operator is used to combine them, the result is the SWNN network, whereas the product operator results in MWNN. The SWNF and MWNF models result from using the SWNN and MWNN networks, respectively, in the consequent part of each rule of the neuro-fuzzy model.

Fig. 34 Learning pattern of different neuro-fuzzy models (with best wavelet function) for Example 4

Fig. 35 Actual output and predicted output of MWNF (Morlet) model and the error for Example 4

Table 4 Performance index (J) of proposed WNN networks and WNF models with different wavelet functions for Example 4

          SWNN          MWNN          SWNF          MWNF
Mexican   2.83 × 10^-5  2.86 × 10^-5  1.00 × 10^-5  9.94 × 10^-6
Morlet    1.64 × 10^-5  1.14 × 10^-5  3.92 × 10^-6  2.56 × 10^-6
Sinc      3.45 × 10^-5  3.13 × 10^-5  8.51 × 10^-6  7.40 × 10^-6

The comparative results for the different wavelets show that the Morlet wavelet activation function yields the best performance of SWNN and MWNN for all considered examples.


Table 5 Performance index (J) of different WNN networks (with best wavelet function)

            NN                         WNN                        SWNN                          MWNN
            Hidden    Performance      Hidden    Performance      Hidden       Performance      Hidden       Performance
            neurons   index (J)        neurons   index (J)        neurons      index (J)        neurons      index (J)
Example 1   16        1.04 × 10^-5     15        2.632 × 10^-7    12 (6 + 6)   2.525 × 10^-7    6 (3 + 3)    1.685 × 10^-7
Example 2   13        5.81 × 10^-5     28        6.21 × 10^-6     12 (6 + 6)   1.85 × 10^-6     12 (6 + 6)   1.44 × 10^-6
Example 3   20        9.01 × 10^-6     15        1.993 × 10^-6    6 (3 + 3)    1.734 × 10^-6    6 (3 + 3)    7.585 × 10^-7
Example 4   13        4.01 × 10^-5     15        8.468 × 10^-5    12 (6 + 6)   1.648 × 10^-5    12 (6 + 6)   1.145 × 10^-5

Table 6 Performance index (J) of different WNF models (with best wavelet function)

            NF             WNF            SWNF           MWNF
Example 1   2.72 × 10^-7   2.00 × 10^-7   2.82 × 10^-7   1.61 × 10^-7
Example 2   1.83 × 10^-6   3.47 × 10^-6   1.54 × 10^-6   1.48 × 10^-6
Example 3   1.72 × 10^-6   1.11 × 10^-6   6.23 × 10^-7   8.41 × 10^-7
Example 4   3.33 × 10^-5   3.31 × 10^-5   3.92 × 10^-6   2.56 × 10^-6

The proposed SWNN and MWNN networks have better performance than a WNN with wavelet activation functions only and a NN with sigmoidal activation functions only, even with fewer hidden-layer neurons. The MWNN network yields better performance than the SWNN network. The SWNF and MWNF models with the Morlet activation function give the best performance, and both have better performance than the WNF and TSK fuzzy models. In most cases the MWNF model performs better than the SWNF model. In general, it is found that the neuro-fuzzy models achieve better results than the networks.

Appendix A

To guarantee stability during the learning procedure, the appropriate range of the learning rate is determined by applying the Lyapunov stability theorem. A small value of the learning rate η gives a low speed of convergence, while a large value of η makes the learning procedure unstable. Therefore, the learning rate should be chosen such that stability and convergence are guaranteed:

0 < \eta < \eta_{Max}   (A-1)

Proof of Theorem 1 The discrete Lyapunov function can be defined as:

V_{I}(k) = E_{I}(k) = \frac{1}{2}\cdot[e(k)]^{2}   (A-2)

where e(k) = y(k) - \hat{y}(k) is the error between the actual output y and the estimated output \hat{y} at time k. The GD algorithm for parameter υ_{I} may be written as:

υ_{I}(k+1) = υ_{I}(k) + \eta\left(-\frac{\partial E_{I}(k)}{\partial υ_{I}}\right)   (A-3)

where k and η represent the epoch number and the learning rate, respectively. The change of the Lyapunov function due to the learning procedure is:

\Delta V_{I}(k) = V_{I}(k+1) - V_{I}(k) = \frac{1}{2}\cdot\left[e^{2}(k+1) - e^{2}(k)\right]   (A-4)

and

e(k+1) = e(k) + \Delta e(k) \;\Rightarrow\; e^{2}(k+1) = e^{2}(k) + \Delta e(k)^{2} + 2\cdot e(k)\cdot\Delta e(k)   (A-5)

Using (A-5) in (A-4) gives:

\Delta V_{I}(k) = \Delta e(k)\cdot\left[e(k) + \frac{1}{2}\cdot\Delta e(k)\right]   (A-6)

The difference of the error due to an incremental change in the learning parameter υ_{I} can be expressed as:

\Delta e(k) = e(k+1) - e(k) \approx \left[\frac{\partial e(k)}{\partial υ_{I}}\right]^{T}\cdot\Delta υ_{I}(k)   (A-7)

\Delta υ_{I} is obtained from (A-3) as:

\Delta υ_{I}(k) = -\eta\cdot\frac{\partial E(k)}{\partial υ_{I}}   (A-8)

Substituting (A-7) and (A-8) into (A-6) gives:

\Delta V_{I}(k) = \left[\frac{\partial e(k)}{\partial υ_{I}}\right]^{T}\cdot\left(-\eta\cdot\frac{\partial E(k)}{\partial υ_{I}}\right)\cdot\left\{e(k) + \frac{1}{2}\cdot\left[\frac{\partial e(k)}{\partial υ_{I}}\right]^{T}\cdot\left(-\eta\cdot\frac{\partial E(k)}{\partial υ_{I}}\right)\right\}   (A-9)

or

\Delta V(k) = \left[\frac{\partial e(k)}{\partial υ}\right]^{T}\cdot(-\eta)\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot e(k)\cdot\frac{\partial\hat{y}(k)}{\partial υ}\cdot\left\{e(k) + \frac{1}{2}\cdot\left[\frac{\partial e(k)}{\partial υ}\right]^{T}\cdot(-\eta)\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot e(k)\cdot\frac{\partial\hat{y}(k)}{\partial υ}\right\}

or

\Delta V(k) = e^{2}(k)\cdot\left\{-\left[\frac{\partial\hat{y}(k)}{\partial υ}\right]^{T}\cdot\eta\cdot\frac{1}{P\cdot y_{r}^{2}}\cdot\frac{\partial\hat{y}(k)}{\partial υ} + \frac{1}{2}\cdot\left[\frac{\partial\hat{y}(k)}{\partial υ}\right]^{T}\cdot\left[\frac{\partial\hat{y}(k)}{\partial υ}\right]^{T}\cdot\eta^{2}\cdot\frac{1}{\left(P\cdot y_{r}^{2}\right)^{2}}\cdot\left(\frac{\partial\hat{y}(k)}{\partial υ}\right)^{2}\right\}

Finally, the change of the Lyapunov function can be written as:

\Delta V_{I}(k) = -e^{2}(k)\cdot\frac{1}{2}\cdot\frac{\eta}{P\cdot y_{r}^{2}}\cdot\left(\frac{\partial\hat{y}(k)}{\partial υ_{I}}\right)^{2}\cdot\left\{2 - \frac{\eta}{P\cdot y_{r}^{2}}\cdot\left(\frac{\partial\hat{y}(k)}{\partial υ_{I}}\right)^{2}\right\}   (A-10)

Therefore

\Delta V_{I}(k) = -\lambda\cdot e^{2}(k)   (A-11)

where \lambda = \frac{1}{2}\cdot\frac{\eta}{P\cdot y_{r}^{2}}\cdot\left(\frac{\partial\hat{y}(k)}{\partial υ_{I}}\right)^{2}\cdot\left\{2 - \frac{\eta}{P\cdot y_{r}^{2}}\cdot\left(\frac{\partial\hat{y}(k)}{\partial υ_{I}}\right)^{2}\right\}.

From the Lyapunov stability theorem, stability is guaranteed if V_{I}(k) is positive and \Delta V_{I}(k) is negative. From (A-2), V_{I}(k) is already positive, so the stability condition depends on \Delta V_{I}(k) being negative; therefore, we require λ > 0 for all models.

Because \frac{1}{2}\cdot\frac{\eta}{P\cdot y_{r}^{2}}\cdot\left(\frac{\partial\hat{y}(k)}{\partial υ_{I}}\right)^{2} > 0, the convergence condition reduces to

2 - \frac{\eta}{P\cdot y_{r}^{2}}\cdot\left(\frac{\partial\hat{y}(k)}{\partial υ_{I}}\right)^{2} > 0 \;\Rightarrow\; \frac{\eta}{P\cdot y_{r}^{2}}\cdot\left(\frac{\partial\hat{y}(k)}{\partial υ_{I}}\right)^{2} < 2 \;\Rightarrow\; \eta < \left(2\cdot P\cdot y_{r}^{2}\right)\Big/\left(\frac{\partial\hat{y}(k)}{\partial υ_{I}}\right)^{2}   (A-12)

The maximum learning rate η varies within a fixed range. Since 2 · P · y_{r}^{2} does not depend on the model, the value of η_{Max} that guarantees convergence can be found from the maximum of \left|\partial\hat{y}(k)/\partial υ_{I}\right| over the data.

Proof of Theorem 3 For the WNF models, (A-12) can be written as:

0 < \eta_{υ} < \frac{2\cdot P\cdot y_{r}^{2}}{\left|\frac{\partial Y_{WNF}}{\partial υ}\right|^{2}_{\max}}   (A-13)

where υ in the consequent part of the rules is w, CN or CW, and in the premise part of the rules is σ or \bar{x}. For the parameters w, CN and CW the partial derivatives, from (42)–(44), are:

\frac{\partial Y_{WNF}}{\partial w} = \beta_{m}\cdot\frac{\partial Y_{WNN_{m}}}{\partial w}   (A-14)

\frac{\partial Y_{WNF}}{\partial CN} = \beta_{m}\cdot\frac{\partial Y_{WNN_{m}}}{\partial CN}   (A-15)

\frac{\partial Y_{WNF}}{\partial CW} = \beta_{m}\cdot\frac{\partial Y_{WNN_{m}}}{\partial CW}   (A-16)

Because \beta_{m} \le 1 for all m, therefore

0 < \eta_{w} < \frac{2\cdot P\cdot y_{r}^{2}}{\left|\frac{\partial Y_{WNN}}{\partial w}\right|^{2}_{\max}}   (A-17)

0 < \eta_{CN} < \frac{2\cdot P\cdot y_{r}^{2}}{\left|\frac{\partial Y_{WNN}}{\partial CN}\right|^{2}_{\max}}   (A-18)

0 < \eta_{CW} < \frac{2\cdot P\cdot y_{r}^{2}}{\left|\frac{\partial Y_{WNN}}{\partial CW}\right|^{2}_{\max}}   (A-19)

From (40), (41), for the parameters σ and \bar{x} we have:

\frac{\partial Y_{WNF}}{\partial\sigma} = Y_{WNN_{m}}\cdot\frac{\beta_{m}}{\mu_{A^{m}}}\cdot(1-\beta_{m})\cdot\frac{2\cdot(x_{i}-\bar{x}_{mi})^{2}}{\sigma^{3}_{mi}} = Y_{WNN_{m}}\cdot\frac{(1-\beta_{m})}{\sum_{m=1}^{M}\mu_{A^{m}}}\cdot\frac{2\cdot(x_{i}-\bar{x}_{mi})^{2}}{\sigma^{3}_{mi}}   (A-20)

\frac{\partial Y_{WNF}}{\partial\bar{x}} = Y_{WNN_{m}}\cdot\frac{\beta_{m}}{\mu_{A^{m}}}\cdot(1-\beta_{m})\cdot\frac{2\cdot(x_{i}-\bar{x}_{mi})}{\sigma^{2}_{mi}} = Y_{WNN_{m}}\cdot\frac{(1-\beta_{m})}{\sum_{m=1}^{M}\mu_{A^{m}}}\cdot\frac{2\cdot(x_{i}-\bar{x}_{mi})}{\sigma^{2}_{mi}}   (A-21)

0 < \eta_{\sigma} < \frac{2\cdot P\cdot y_{r}^{2}}{\left|Y_{WNN}\right|^{2}_{\max}\cdot\left(\frac{2}{\sigma^{3}_{\min}}\right)^{2}}   (A-22)

0 < \eta_{\bar{x}} < \frac{2\cdot P\cdot y_{r}^{2}}{\left|Y_{WNN}\right|^{2}_{\max}\cdot\left(\frac{2}{\sigma^{2}_{\min}}\right)^{2}}   (A-23)

Appendix B (modified mountain clustering)

The purpose of clustering is to perform a natural grouping of a large set of data, producing a concise representation of the system's behavior. Yager and Filev (1994) have proposed a simple and easy-to-implement mountain clustering algorithm for estimating the number and location of cluster centers. The proposed modified mountain clustering in the unit hypercube (normalized data space) is as follows:

We assume that each data point is a potential cluster center and calculate its potential by:

P^{1}_{i} = \sum_{j=1}^{n}\exp\!\left(-\alpha\left\|x_{i}-x_{j}\right\|^{2}\right), \quad i = 1, 2, \ldots, n   (B-1)

where \alpha = 4/r_{a}^{2} and n is the number of data points. ‖·‖ denotes the Euclidean distance and r_{a} is a positive constant which defines the neighborhood of a datum. A data point with many neighboring data points will have a high potential value, and data points beyond the radial distance r_{a} have little influence on the potential. After the potential of every data point has been evaluated, the datum with the highest potential is selected as the first center:

x^{*}_{1} \Leftarrow P^{*}_{1} = \max_{i=1}^{n}\left(P^{1}_{i}\right)   (B-2)

For the selection of the second cluster center, the potential value of each datum is revised in order to deduct the effect of the mountain function around the first cluster center, as follows:

P^{2}_{i} = P^{1}_{i} - P^{*}_{1}\exp\!\left(-\beta\left\|x_{i}-x^{*}_{1}\right\|^{2}\right)   (B-3)

where \beta = 4/r_{b}^{2}, and the second cluster center will be:

x^{*}_{2} \Leftarrow P^{*}_{2} = \max_{i=1}^{n-1}\left(P^{2}_{i}\right)   (B-4)

r_{b} is a positive constant which defines the neighborhood of a cluster center. Thus, an amount of potential is subtracted from each data point as a function of its distance from the first cluster center. It is evident from the above equation that data points near the first cluster center have a greatly reduced potential value and are unlikely to be selected as the next cluster center. After revision of the potential value of each datum, the second cluster center is selected as the datum with the highest remaining potential. Similarly, for the selection of the kth cluster center, the potential value of each datum is revised by:

P^{k}_{i} = P^{k-1}_{i} - P^{*}_{k-1}\exp\!\left(-\beta\left\|x_{i}-x^{*}_{k-1}\right\|^{2}\right)   (B-5)

where x^{*}_{k-1} is the location of the (k−1)th cluster center and P^{*}_{k-1} is its potential value; the kth cluster center will be:

x^{*}_{k} \Leftarrow P^{*}_{k} = \max_{i=1}^{n-k+1}\left(P^{k}_{i}\right)   (B-6)

To stop this procedure we use the criterion P^{*}_{k}/P^{*}_{1} < \delta, where δ is a small fraction.
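A minimal sketch of the modified mountain clustering steps (B-1)–(B-6); the data are assumed to be normalized to the unit hypercube, the function name is illustrative, and the default parameter values follow the guidelines given below.

import numpy as np

def mountain_clustering(X, r_a, r_b, delta=0.15, max_clusters=20):
    # X: (n, d) data in the unit hypercube
    alpha, beta = 4.0 / r_a**2, 4.0 / r_b**2
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)   # pairwise squared distances
    P = np.exp(-alpha * d2).sum(axis=1)                         # Eq. (B-1)
    centers, P1 = [], None
    for _ in range(max_clusters):
        i_star = int(np.argmax(P))
        P_star = P[i_star]
        if P1 is None:
            P1 = P_star                                         # potential of the first center, Eq. (B-2)
        elif P_star / P1 < delta:                               # stopping criterion P*_k / P*_1 < delta
            break
        centers.append(X[i_star])
        # Eqs. (B-3)/(B-5): deduct the mountain function around the newly selected center
        P = P - P_star * np.exp(-beta * np.sum((X - X[i_star]) ** 2, axis=1))
    return np.array(centers)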

The number of resulting cluster centers and the distance between them depend strongly on the mountain clustering parameters, i.e., the neighborhood of a datum (radius of influence) r_a, the neighborhood of clusters r_b and the gray-region parameter δ (Azeem et al. 2003). A brief analysis of their choice is presented as follows:

Neighborhood of datum r_a

The smaller the value of r_a, the smaller the potential value of the first center (Azeem et al. 2003) and of r_b, which results in a large number of cluster centers, and vice versa; the number of cluster centers then approaches the number of data points, thereby defeating the purpose of clustering. The maximum value for r_a is half of the principal diagonal of the unit hypercube, i.e., r_{a,max} = \sqrt{n+1}/2, and the minimum value may be taken as r_{a,min} = 0.2 r_{a,max}.

Neighborhood of cluster centers

The spacing among the resulting clusters depends strongly on the value of r_b. To avoid obtaining closely spaced cluster centers, r_b is set somewhat greater than r_a; here it is taken as 1.25 r_a.

Gray region parameters δ_u and δ_l

The number of cluster centers also depends upon the position and range of the gray region. A small value of δ results in a large number of cluster centers and vice versa. It is difficult to establish a single value of δ that works well for all data; a good choice of the upper and lower limits is δ_u = 0.15 and δ_l = 0.0.

References

Hossen A (2004) Power spectral density estimation via wavelet decomposition. Electron Lett 40(17):1055–1056

Krishnamachari S, Chellappa R (1992) GMRF models and wavelet decomposition for texture segmentation. Int Conf Image Process 3(6):889–898

Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error propagation. In: McClelland JL, Rumelhart DE (eds) Parallel distributed processing. MIT Press, Cambridge

Narendra KS, Parthasarathy K (1990) Identification and control of dynamical systems using neural networks. IEEE Trans Neural Netw 1(1):4–27

Takagi T, Sugeno M (1985) Fuzzy identification of systems and its applications to modeling and control. IEEE Trans Syst Man Cybern 15:116–132

Quang CS, Lee WJ, Lee SJ (2005) A TSK-type neuro-fuzzy network approach to system modeling problems. IEEE Trans Syst Man Cybern B Cybern 35(4):751–767

Azeem MF, Hanmandlu M, Ahmad N (2003) Structure identification of generalized adaptive neuro-fuzzy inference systems. IEEE Trans Fuzzy Syst 11(5):666–681

Sugeno M, Yasukawa T (1993) A fuzzy-logic-based approach to qualitative modeling. IEEE Trans Fuzzy Syst 1(1):7–31

Boubez TI, Peskin RL (1993) Wavelet neural networks and receptive field partitioning. IEEE Int Conf Neural Netw 3:1544–1549

Yamakawa T, Uchino E, Samatsu T (1994) Wavelet neural networks employing over-complete number of compactly supported non-orthogonal wavelets and their applications. IEEE World Congr Comput Intell 3:1391–1396

Wang T, Sugai Y (2000) A wavelet neural network for the approximation of nonlinear multivariable function. Trans Inst Electric Eng Jpn C 102-C:185–193

Ting W, Sugai Y (1999) A wavelet neural network for the approximation of nonlinear multivariable function. IEEE Int Conf Syst Man Cybern 3:378–383

Wang T, Sugai Y (2000) The local linear adaptive wavelet neural network with hybrid EP/gradient algorithm and its application to nonlinear dynamic system identification. Trans Inst Electric Eng Jpn C 102-C:185–193

Chen Y, Yang B, Dong J (2006a) Time-series prediction using a local linear wavelet neural network. Trans Inst Electric Eng Jpn 69(4–6):449–465

Chen Y, Yang B, Dong J (2006b) A local linear adaptive wavelet neural network. Trans Inst Electric Eng Jpn 69(4–6):449–465

Zhang Q (1992) Wavelet networks. IEEE Trans Neural Netw 3(6):889–898

Zhang J, Walter GG, Miao Y, Lee W (1995) Wavelet neural networks for function learning. IEEE Trans Signal Process 43(6):1485–1497

Zhang Q (1997) Using wavelet network in nonparametric estimation. IEEE Trans Neural Netw 8(2):227–236

Benveniste A, Juditsky A, Delyon B, Zhang Q, Glorennec PY (1994) Wavelets in identification. In: Proceedings of SYSID'94, 10th IFAC symposium on system identification, Copenhagen, Denmark

Jang JSR (1993) ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern 23(3):665–685

Ho DWC, Zhang PA, Xu J (2001) Fuzzy wavelet networks for function learning. IEEE Trans Fuzzy Syst 9(1):200–211

Lin CJ, Chin CC, Lee CL (2003) A wavelet-based neuro-fuzzy system and its applications. Proc Int Joint Conf Neural Netw 3:1921–1926

Rao RM, Bopardikar AS (2004) Wavelet transforms: introduction to theory and applications. Pearson Education, India

Daubechies I (1992) Ten lectures on wavelets. CBMS series. SIAM, Philadelphia

Burrus CS, Gopinath RA, Guo H (1997) Introduction to wavelets and wavelet transforms. Prentice Hall, Englewood Cliffs

Oussar Y, Dreyfus G (2000) Initialization by selection for wavelet network training. Neurocomputing 34:131–143

Lee CH, Teng CC (2000) Identification and control of dynamic systems using recurrent fuzzy neural networks. IEEE Trans Fuzzy Syst 8(4):349–366

Chung FL, Lee T (1998) Analytical resolution and numerical identification of fuzzy relational systems. IEEE Trans Syst Man Cybern B Cybern 28(6):919–924

Yager RR, Filev DP (1994) Generation of fuzzy rules by mountain clustering. J Intell Fuzzy Syst 2:209–219

Yager RR, Filev DP (1994) Generation of fuzzy rules by mountain clus-tering. J Int Fuzzy Syst 2:209–219

123