General Type-2 Fuzzy Neural Network with Hybrid Learning for Function Approximation

Wen-Hau Roger Jeng, Chi-Yuan Yeh, and Shie-Jue Lee, Member, IEEE

Abstract— A novel Takagi-Sugeno-Kang (TSK) type fuzzy neural network which uses general type-2 fuzzy sets in a type-2 fuzzy logic system, called the general type-2 fuzzy neural network (GT2FNN), is proposed for function approximation. Constructing a GT2FNN involves three problems: type reduction, structure identification, and parameter identification. An efficient strategy based on α-cuts is proposed to decompose a general type-2 fuzzy set into several interval type-2 fuzzy sets and thereby solve the type reduction problem. Incremental similarity-based fuzzy clustering and linear least squares regression are combined to solve the structure identification problem. For parameter identification, a hybrid learning algorithm (HLA) which combines particle swarm optimization (PSO) and a recursive least squares (RLS) estimator is proposed for refining the antecedent and consequent parameters, respectively, of the fuzzy rules. Simulation results show that the resulting networks are robust against outliers.

I. INTRODUCTION

During the past decades, fuzzy logic systems (FLS) based on traditional fuzzy sets, called type-1 fuzzy sets (T1FS), which represent uncertainties by numbers in the range [0, 1], have been successfully applied to many areas, such as automatic control, function approximation, and data classification [1]. Sometimes T1FS are not enough to handle uncertainty that is difficult to represent as a single real value [2]. Type-2 fuzzy logic systems (T2FLS), based on type-2 fuzzy sets (T2FS), have been used to solve this problem, and may perform better than type-1 fuzzy logic systems (T1FLS) due to the flexibility that the membership degrees of a T2FS can themselves be fuzzy sets [3], [4].

To date, most T2FLSs use interval type-2 fuzzy sets (IT2FS), a special case of general type-2 fuzzy sets (GT2FS). The main reasons are: (1) the inference procedure in a general T2FLS (GT2FLS) is much more complicated than in an interval T2FLS (IT2FLS), and appropriate inference mechanisms for GT2FLS have been lacking; (2) the amount of time required by a GT2FLS is demanding due to the complexity of the type reduction procedure. Recently, Liu proposed an efficient centroid type reduction strategy for GT2FLS [5]. The main idea is to use α-cuts to decompose a GT2FS into several IT2FS, and then apply the KM algorithm [6] to convert each IT2FS into a T1FS.

In this work, the idea of α-cuts is exploited to solve the type reduction problem for GT2FLS and to design a general type-2 fuzzy neural network (GT2FNN). Two more stages, structure identification and parameter identification, are required. For structure identification, an incremental similarity-based fuzzy clustering method [7] is used to partition the dataset into several clusters, a local regression model is built for each cluster, and then a general type-2 fuzzy rule is extracted from each cluster and its regressor. For parameter identification, a hybrid learning algorithm (HLA) which combines particle swarm optimization (PSO) [8] and a recursive least squares (RLS) [9] estimator is proposed for refining the antecedent and consequent parameters, respectively, of the fuzzy rules. Simulation results show that the resulting networks are robust against outliers.

Wen-Hau Roger Jeng, Chi-Yuan Yeh, and Shie-Jue Lee are with the Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung 804, Taiwan (corresponding author: [email protected]).

The rest of this paper is organized as follows. Section II presents basic concepts about general type-2 fuzzy inference systems. Section III describes incremental similarity-based fuzzy clustering, multiple linear regression, and rule extraction for structure identification. Section IV describes the hybrid learning algorithm for parameter identification. Experimental results are presented in Section V. Finally, a conclusion is given in Section VI.

II. GENERAL TYPE-2 FUZZY NEURAL NETWORK

In this section, basic concepts about general type-2 fuzzy sets and fuzzy logic systems are introduced. The general type-2 TSK fuzzy neural network is also briefly described.

A. General type-2 fuzzy sets

When we cannot determine the membership degree of an element in a set as 0 or 1, we can apply fuzzy sets to fuzzify it. Similarly, when we cannot determine the membership degree of an element in a fuzzy set as a crisp number in [0, 1], we can fuzzify it again, i.e., the membership function itself is fuzzy. This concept, called type-2 fuzzy logic, was first introduced by Zadeh [2] in 1975. A type-2 fuzzy set (T2FS), denoted as $\tilde{A}$, can be defined on a universe $X$ as

\tilde{A} = \int_{x \in X} \mu_{\tilde{A}}(x)/x = \int_{x \in X} \Big[ \int_{\mu \in J_x} f_x(\mu)/\mu \Big] \Big/ x \qquad (1)

where $\mu_{\tilde{A}}(x)$ is a secondary membership function (MF), $J_x \subseteq [0, 1]$ is the set of primary membership degrees of $x \in X$, with $\mu \in J_x$, $\forall x \in X$, and $f_x(\mu) \in [0, 1]$ is a secondary membership degree. Fig. 1 shows, from left to right, a Gaussian primary membership function and a Gaussian secondary membership function. Note that when $f_x(\mu) = 1$, $\forall \mu \in J_x \subseteq [0, 1]$, the secondary MFs are interval sets and the fuzzy set is called an interval type-2 fuzzy set (IT2FS), a special case of T2FS.

Fig. 1. General type-2 Gaussian membership function: a Gaussian primary MF (mean $m_{11} = 0$, deviation $\sigma_{11}$) on the left, and the Gaussian secondary MF at $x = 1$ (mean $m_{12} = 0.45$, deviation $\sigma_{12}$) on the right.

In order to distinguish IT2FS from general T2FS, a T2FS will hereafter be called a general type-2 fuzzy set (GT2FS). Besides, a T1FS is also a special case of a GT2FS, namely one in which $J_x$ contains a single element whose secondary degree is 1.

B. General type-2 fuzzy logic system

Similar to a T1FLS, a GT2FLS includes a fuzzifier, a fuzzy rule base, a fuzzy inference engine, and output processing, where output processing contains a type-reducer and a defuzzifier. A block diagram of a GT2FLS is depicted in Fig. 2.

Fig. 2. General type-2 fuzzy logic system.

We briefly describe the functionality of each component and the operation of the whole system as follows:

1. Fuzzifier. For each crisp input value, the fuzzifier transfers it into a GT2FS to express the associated measurement uncertainty.

2. Fuzzy reasoning. Fuzzy reasoning is performed by the GT2 fuzzy inference engine based on the GT2FS obtained in step 1 and the fuzzy rule base, which is composed of a set of fuzzy IF-THEN rules. After reasoning, we have a GT2FS for each output variable. To solve the type reduction problem, an efficient strategy is proposed which uses α-cuts to decompose a general type-2 fuzzy set into several interval type-2 fuzzy sets, so that the firing strength of the antecedent of each rule can be computed at each α-cut. Note that $\alpha \wedge (\mu_{\tilde{A}_{i,j}(x_i)})_\alpha = [l_{i,j}, r_{i,j}]_\alpha$ denotes the α-cut plane of $\mu_{\tilde{A}_{i,j}(x_i)}$ in rule $j$. In this paper, the product operation is used for inference, and the firing strength of the antecedent can be defined as follows:

[\underline{f}^j, \overline{f}^j]_\alpha = \prod_{i=1}^{n} [l_{i,j}, r_{i,j}]_\alpha = \Big[ \prod_{i=1}^{n} l_{i,j,\alpha}, \; \prod_{i=1}^{n} r_{i,j,\alpha} \Big] \qquad (2)

where $n$ is the number of input dimensions, $\underline{f}^j$ is a lower bound of the firing strength of the $j$-th rule, and $\overline{f}^j$ is an upper bound of the firing strength of the $j$-th rule.

3. Type-reducer. The output sets of the GT2FIS are type-2 fuzzy sets. To obtain embedded type-1 fuzzy sets for each rule, a type-reduction method, centroid type reduction, is used in the type-reducer. An efficient algorithm, the Karnik-Mendel (KM) algorithm [6], has been developed for centroid type reduction. The type-reduced set is

\bigvee_\alpha [\underline{y}, \overline{y}]_\alpha = \left[ \frac{\sum_{j=1}^{L} b_j \overline{f}^j_\alpha + \sum_{j=L+1}^{J} b_j \underline{f}^j_\alpha}{\sum_{j=1}^{L} \overline{f}^j_\alpha + \sum_{j=L+1}^{J} \underline{f}^j_\alpha}, \; \frac{\sum_{j=1}^{R} b_j \underline{f}^j_\alpha + \sum_{j=R+1}^{J} b_j \overline{f}^j_\alpha}{\sum_{j=1}^{R} \underline{f}^j_\alpha + \sum_{j=R+1}^{J} \overline{f}^j_\alpha} \right] \qquad (3)

where $\underline{y}$ is the left-most point of $y$, $\overline{y}$ is the right-most point of $y$, and the switch points $L$ and $R$ are obtained from the KM algorithm.

4. Defuzzifier. To obtain a crisp output value for each output variable, a defuzzification method, the weighted average, is used in the defuzzifier to convert the fuzzy conclusion obtained in the previous step into a single real number:

y = \frac{\sum_\alpha \alpha \, \frac{\underline{y}_\alpha + \overline{y}_\alpha}{2}}{\sum_\alpha \alpha} \qquad (4)

where α ranges over the α-cut levels {1, 0.8, 0.6, 0.4, 0.2, 0.01}.
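To make steps 3 and 4 concrete, the following Python/NumPy sketch (our illustration, not the authors' code; the function names, the convergence test, and the interval-midpoint defuzzification are assumptions based on Eqs. (2)-(4)) computes the type-reduced interval for one α-plane with the KM iteration and then aggregates the planes by the weighted average of Eq. (4):

    import numpy as np

    def km_endpoint(b, f_lo, f_hi, left=True):
        # Karnik-Mendel iteration for one endpoint of the type-reduced
        # interval in Eq. (3): b holds the rule consequents, [f_lo, f_hi]
        # the rule firing intervals at the current alpha-plane.
        order = np.argsort(b)
        b, f_lo, f_hi = b[order], f_lo[order], f_hi[order]
        f = (f_lo + f_hi) / 2.0                 # start from interval midpoints
        y = f @ b / f.sum()
        while True:
            k = int(np.clip(np.searchsorted(b, y) - 1, 0, len(b) - 2))
            if left:                            # y_l: switch point L
                f = np.concatenate([f_hi[:k + 1], f_lo[k + 1:]])
            else:                               # y_r: switch point R
                f = np.concatenate([f_lo[:k + 1], f_hi[k + 1:]])
            y_new = f @ b / f.sum()
            if np.isclose(y_new, y):
                return y_new
            y = y_new

    def defuzzify(alphas, intervals):
        # Weighted average over alpha-planes as in Eq. (4): each plane
        # contributes the midpoint of its type-reduced interval [y_l, y_r].
        mids = np.array([(yl + yr) / 2.0 for yl, yr in intervals])
        return float(np.dot(alphas, mids) / np.sum(alphas))

For each α in {1, 0.8, 0.6, 0.4, 0.2, 0.01}, the firing intervals come from Eq. (2) and the consequents $b_j$ from the rule outputs; calling km_endpoint with left=True and left=False then yields $\underline{y}_\alpha$ and $\overline{y}_\alpha$, respectively.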

C. General type-2 TSK fuzzy neural network

The general type-2 TSK fuzzy neural network has a four-layer network structure, as shown in Fig. 3.

Fig. 3. General type-2 fuzzy neural network.

The four layers are called the fuzzification layer (layer 1), the conjunction layer (layer 2), the normalization layer (layer 3), and the output layer (layer 4), respectively. The operation of the fuzzy neural network is described as follows.

Layer 1: Layer 1 contains $J$ rules and each rule contains $n$ nodes. Node $(i, j)$ of this layer produces its output, $o^{(1)}_{i,j}$, by computing the value of the corresponding general type-2 membership function $\tilde{A}_{i,j}$, i.e.,

o^{(1)}_{i,j} = \tilde{A}_{i,j}(x_i) = gt2(x_i;\, m^1_{i,j}, \sigma^1_{i,j}, \sigma^2_{i,j}) = \bigvee_\alpha \big( \alpha \wedge (\mu_{\tilde{A}_{i,j}(x_i)})_\alpha \big) \qquad (5)

where $x_i$ is the $i$-th dimension of the input, $m^1_{i,j}$ and $\sigma^1_{i,j}$ are the mean and standard deviation, respectively, of the primary membership function of the $i$-th feature in the $j$-th fuzzy rule, and $\sigma^2_{i,j}$ is the deviation of the secondary membership function of the $i$-th feature in the $j$-th fuzzy rule, which is defined as

\mu_{\tilde{A}_{i,j}(x_i)}(a) = \exp\left[ -\left( \frac{a - \exp\!\big(-\big(\frac{x_i - m^1_{i,j}}{\sigma^1_{i,j}}\big)^2\big)}{\sigma^2_{i,j}} \right)^{\!2} \right] \qquad (6)

where $a \in [0, 1]$, $i = 1, \ldots, n$, $n$ is the number of input dimensions, $j = 1, \ldots, J$, $J$ is the number of fuzzy rules, and $\alpha \wedge (\mu_{\tilde{A}_{i,j}(x_i)})_\alpha$ is the input of Layer 2.

Layer 2: Layer 2 contains $J$ rules and each rule contains $t$ nodes, where $t$ is the number of α-cut planes. The output $(o^{(2)}_j)_\alpha$ is

(o^{(2)}_j)_\alpha = \prod_{i=1}^{n} (o^{(1)}_{i,j})_\alpha \qquad (7)

where $j = 1, \ldots, J$.

Layer 3: Layer 3 contains $t$ nodes, where $t$ is the number of α-cut planes. The output $(o^{(3)})_\alpha$ of Layer 3 is the result obtained from the KM algorithm:

(o^{(3)})_\alpha = KM\big( (o^{(2)}_1)_\alpha, (o^{(2)}_2)_\alpha, \ldots, (o^{(2)}_J)_\alpha, \mathit{cons} \big) \qquad (8)

where

\mathit{cons} = \begin{bmatrix} \beta_{0,1} + \beta_{1,1}x_1 + \cdots + \beta_{i,1}x_i + \cdots + \beta_{n,1}x_n \\ \beta_{0,2} + \beta_{1,2}x_1 + \cdots + \beta_{i,2}x_i + \cdots + \beta_{n,2}x_n \\ \vdots \\ \beta_{0,j} + \beta_{1,j}x_1 + \cdots + \beta_{i,j}x_i + \cdots + \beta_{n,j}x_n \\ \vdots \\ \beta_{0,J} + \beta_{1,J}x_1 + \cdots + \beta_{i,J}x_i + \cdots + \beta_{n,J}x_n \end{bmatrix} \qquad (9)

is the consequent vector generated from all rules.

Layer 4: Layer 4 contains one node and its output, $o^{(4)}$, represents the result of the centroid defuzzification, i.e.,

o^{(4)} = \frac{\sum_\alpha \alpha \, (o^{(3)})_\alpha}{\sum_\alpha \alpha}. \qquad (10)

III. STRUCTURE IDENTIFICATION FOR GT2FNN

To date, there are no general guidelines for specifying the optimal number of fuzzy rules and their corresponding initial values for a first-order TSK type FNN. In this study, we propose a self-constructing method which consists of incremental similarity-based fuzzy clustering, multiple linear regression, and fuzzy rule extraction to solve this problem. The flowchart of the structure identification for the first-order TSK type FNN is depicted in Fig. 4, and the detailed process is described as follows.

Fig. 4. Structure identification for the first-order TSK type GT2 fuzzy rules.

A. Incremental similarity-based clustering

The basic concept of the incremental similarity-based clustering algorithm is that one training pattern is considered at a time; the input-similarity and output-similarity between this pattern and the existing fuzzy clusters are calculated to determine whether to assign it to the most similar cluster, updating that cluster's statistical mean and standard deviation, or to create a new cluster for it and set the new cluster's initial values. Suppose we are given a set of input-output training patterns $(x_1, y_1), \ldots, (x_l, y_l)$, with input $x_i \in R^n$, $i = 1, \ldots, l$, where $l$ is the number of training patterns, and output $y_i \in R$. Let $J$ be the number of existing fuzzy clusters. The input-similarity between the $i$-th training pattern $x_i$ and the $j$-th fuzzy cluster $C_j$ is calculated by

G_{i,j} = \prod_{k=1}^{n} \exp\left[ -\left( \frac{x_{k,i} - m_{k,j}}{\sigma_{k,j}} \right)^2 \right] \qquad (11)

where $m_{k,j}$ and $\sigma_{k,j}$ denote the mean and standard deviation of cluster $C_j$, respectively. The output-similarity between the $i$-th training pattern and the $j$-th cluster $C_j$ is calculated by

e_{i,j} = |y_i - y^c_j| \qquad (12)

where $y^c_j$ denotes the representative output of cluster $C_j$. If $G_{i,j} \geq \rho$ and $e_{i,j} \leq \tau$, where $0 \leq \rho \leq 1$ and $\tau$ are predefined thresholds, then $x_i$ has passed the input-similarity test and the output-similarity test, i.e., $x_i$ is similar to $C_j$. In this case, $x_i$ is assigned to the most similar cluster $C_a$, the one with the largest input-similarity, and the modification to this cluster is defined as follows:

m_{k,a} = \frac{|C_a| m_{k,a} + x_{k,i}}{|C_a| + 1}, \quad k = 1, \ldots, n \qquad (13)

where $|C_a|$ is the cardinality of $C_a$,

\sigma_{k,a} = \sigma_{k,0} + \sqrt{A - B}, \quad k = 1, \ldots, n \qquad (14)

where $\sigma_{k,0}$ is a user-defined initial standard deviation, $A = \frac{(|C_a| - 1)(\sigma_{k,a} - \sigma_{k,0})^2 + |C_a| m^2_{k,a} + x^2_{k,i}}{|C_a|}$, $B = \frac{(|C_a| m_{k,a} + x_{k,i})^2}{|C_a|(|C_a| + 1)}$, and

y^c_a = \frac{|C_a| y^c_a + y_i}{|C_a| + 1}. \qquad (15)

If $x_i$ does not pass the input-similarity test or the output-similarity test with any existing cluster, a new fuzzy cluster $C_{J+1}$ is created with $m_{J+1} = x_i$, $\sigma_{J+1} = \sigma_0$, and $y^c_{J+1} = y_i$.

B. Linear least squares regressor

After the $J$ clusters are obtained, we can build a local regression model for each cluster by applying the linear least squares approach. The multiple linear regression model for the $j$-th cluster is

y_j = X_j \beta_j + \varepsilon \qquad (16)

where $y_j \in R^{|C_j|}$ is the output of the $j$-th cluster, $X_j \in R^{|C_j| \times (n+1)}$ is the input of the $j$-th cluster, $\beta_j \in R^{n+1}$ is the vector of regression parameters of the $j$-th cluster, and $\varepsilon \in R^{|C_j|}$ is the vector of random errors. We wish to find the $\beta_j$ that minimizes

E_j = (y_j - X_j \beta_j)^T (y_j - X_j \beta_j). \qquad (17)

Setting the partial derivatives of $E_j$ with respect to the regression parameters $\beta_j$ to zero, we obtain the least squares normal equation

X_j^T X_j \beta_j = X_j^T y_j \qquad (18)

which implies that

\beta_j = (X_j^T X_j)^{-1} X_j^T y_j. \qquad (19)

Note that if $(X_j^T X_j)^{-1}$ does not exist, we can use $(X_j^T X_j + \lambda I)^{-1}$ in Eq. (19), where $\lambda$ is an arbitrarily small positive real number and $I$ is the $(n+1) \times (n+1)$ identity matrix.
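As a concrete illustration of Eqs. (18)-(19) (a minimal sketch; the helper name and the default value of λ are ours), the regressor for one cluster can be obtained with a single linear solve, with the ridge term λI acting as the fallback mentioned above:

    import numpy as np

    def local_regressor(Xc, yc, lam=1e-8):
        # Solve the normal equation (18) for one cluster; Xc is the
        # |Cj| x n matrix of cluster inputs and yc the |Cj| outputs.
        # A leading column of ones absorbs the intercept beta_0, and
        # lam * I regularizes a possibly singular X^T X.
        Z = np.hstack([np.ones((len(Xc), 1)), Xc])
        G = Z.T @ Z + lam * np.eye(Z.shape[1])
        return np.linalg.solve(G, Z.T @ yc)          # beta_j of Eq. (19)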

C. Fuzzy rule extraction

After the $J$ clusters and the linear local regressors are obtained, we can extract a first-order TSK type GT2 IF-THEN fuzzy rule from each cluster and its regressor. The parameters of the 'IF' part of the $j$-th rule are obtained from the $j$-th fuzzy cluster, while the parameters of the 'THEN' part of the $j$-th rule are obtained from the $j$-th linear local regressor. The $j$-th first-order TSK type GT2 IF-THEN fuzzy rule is:

IF $x_1$ is $\tilde{A}_{1,j}$ and ... and $x_i$ is $\tilde{A}_{i,j}$ and ... and $x_n$ is $\tilde{A}_{n,j}$
THEN $y_j = \beta_{0,j} + \beta_{1,j}x_1 + \cdots + \beta_{i,j}x_i + \cdots + \beta_{n,j}x_n$ \qquad (20)

where $x = [x_1, \ldots, x_i, \ldots, x_n]^T$ is the input vector, $y_j$ is the output of the $j$-th rule, $\beta_j = [\beta_{0,j}, \beta_{1,j}, \ldots, \beta_{i,j}, \ldots, \beta_{n,j}]^T$ is the vector of consequent parameters of the $j$-th rule, obtained from the $j$-th linear local regressor, and $\tilde{A}_{i,j}$ is the GT2 fuzzy set of the antecedent part of the $i$-th feature in the $j$-th rule. The primary membership function $\mu_{i,j}$ can be defined as follows:

\mu_{i,j} = \exp\left[ -\left( \frac{x_i - m^1_{i,j}}{\sigma^1_{i,j}} \right)^2 \right] \qquad (21)

where $m^1_{i,j}$ is obtained from the statistical mean of the $i$-th feature in the $j$-th cluster and $\sigma^1_{i,j}$ is obtained from the standard deviation of the $i$-th feature in the $j$-th cluster. The secondary membership function $\mu_{\tilde{A}_{i,j}}$ can be defined as follows:

\mu_{\tilde{A}_{i,j}} = gt2(\mu_{i,j}, \sigma^2_{i,j}) \qquad (22)

where $0 \leq \mu_{i,j} \leq 1$ is a membership degree and $\sigma^2_{i,j} = \sigma^1_{i,j}$.

Now the first-order TSK type GT2 IF-THEN fuzzy rules have been extracted by the above definitions, and the initial values of $m^1_{i,j}$, $\sigma^1_{i,j}$, and $\sigma^2_{i,j}$ have been determined. These parameter values are refined by a hybrid learning algorithm in the parameter identification phase, described below.
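Putting Sections III-A to III-C together, the initial rule parameters can be read directly off the clusters and regressors. A short sketch, assuming clusters comes from the clustering sketch above and that the patterns of each cluster have been collected into a list cluster_patterns of (Xj, yj) pairs (both names are hypothetical, for illustration only):

    import numpy as np

    # Antecedent parameters of Eqs. (21)-(22): one row per rule.
    m1     = np.stack([c['m'] for c in clusters])       # J x n primary means
    sigma1 = np.stack([c['sigma'] for c in clusters])   # J x n primary deviations
    sigma2 = sigma1.copy()                              # sigma^2 = sigma^1 (Sec. III-C)

    # Consequent parameters of Eq. (20): one local regressor per cluster.
    beta = np.stack([local_regressor(Xj, yj) for Xj, yj in cluster_patterns])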

IV. PARAMETER IDENTIFICATION FOR GT2FNN

In order to improve the convergence speed, a hybrid learning algorithm (HLA) which combines particle swarm optimization (PSO) and the recursive least squares (RLS) estimator is used to train the network. In each iteration, PSO and RLS are applied to refine the antecedent and consequent parameters, respectively, of the first-order TSK type GT2 fuzzy rules. The flowchart of the HLA is depicted in Fig. 5 and the detailed process is described as follows.

Fig. 5. Parameter identification for the first-order TSK type GT2 fuzzy rules.

A. Particle Swarm Optimization

PSO is a population-based global search algorithm proposed by Kennedy and Eberhart in 1995 [8]. Each particle is a candidate solution; it moves with an adaptable velocity within the search space and remembers the best position it has ever encountered. Assume a $d$-dimensional search space $S$. The $i$-th particle is a $d$-dimensional vector $P_i = [p_{i,1}, \ldots, p_{i,d}]^T \in S$. The corresponding current velocity of this particle is $V_i(t) = [v_{i,1}, \ldots, v_{i,d}]^T$. The new velocity $V_i(t+1)$ is updated by

V_i(t+1) = w \times V_i(t) + c_1 \times rand() \times (Pbest_i - P_i(t)) + c_2 \times rand() \times (Gbest - P_i(t)) \qquad (23)

where $w$, $c_1$, and $c_2$ are the inertia, cognitive, and social coefficients, respectively, $rand()$ denotes a uniformly distributed random number in [0, 1], $Pbest_i$ is the best previous position of this particle (the cognitive effect), and $Gbest$ is the overall best particle (the social effect). Each particle then updates its position using this new velocity. When all particles in a swarm have updated their positions, the swarm migrates to the next generation. If a new position jumps out of the search space, it is reset to a proper value. The structure of a particle in this study is as follows:

P_i(t) = [m^1_{1,1}, \sigma^1_{1,1}, \sigma^2_{1,1}, \ldots, m^1_{n,1}, \sigma^1_{n,1}, \sigma^2_{n,1}, \ldots, m^1_{1,j}, \sigma^1_{1,j}, \sigma^2_{1,j}, \ldots, m^1_{n,j}, \sigma^1_{n,j}, \sigma^2_{n,j}, \ldots, m^1_{1,J}, \sigma^1_{1,J}, \sigma^2_{1,J}, \ldots, m^1_{n,J}, \sigma^1_{n,J}, \sigma^2_{n,J}] \qquad (24)

where $n$ is the dimension of a training pattern and $J$ is the number of GT2 fuzzy rules; superscript 1 refers to the primary membership function for the crisp input, and superscript 2 to the secondary membership function for the membership degree. The dimension of a particle is $d = 3 \times n \times J$.
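A single HLA iteration applies Eq. (23) to all particles at once. Below is a minimal vectorized sketch (ours, not the authors' code); the default coefficients are those reported in Section V, while the per-dimension random numbers and the clamping to bounds lo, hi are our assumptions — the paper does not specify how escaped particles are reset:

    import numpy as np

    def pso_step(P, V, pbest, gbest, lo, hi, w=0.5, c1=1.5, c2=1.5):
        # One velocity/position update per Eq. (23) for a whole swarm.
        # P, V, pbest: (num_particles, d) arrays with d = 3*n*J; gbest: (d,).
        r1 = np.random.rand(*P.shape)
        r2 = np.random.rand(*P.shape)
        V = w * V + c1 * r1 * (pbest - P) + c2 * r2 * (gbest - P)
        P = np.clip(P + V, lo, hi)   # pull escaped particles back into S
        return P, V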

B. Recursive Least Squares Estimator

Once the type-reduced type-1 fuzzy sets are obtained using the KM algorithm [6], solving for the consequent parameters can be considered a multiple linear regression problem. From Eq. (19), the optimal consequent parameters of the fuzzy rules, $\beta \in R^{(n+1) \times J}$, can be solved by

\beta = (W^T W)^{-1} W^T y \qquad (25)

where the input $W = [w_1^T, \ldots, w_l^T]^T \in R^{l \times ((n+1) \times J)}$ is a non-linear transformation of $x$ and $y \in R^l$ is the desired output. Clearly, the size of $(W^T W)$ is larger than that of $(X^T X)$ in Eq. (19), and it is unavoidable to calculate the inverse of a large matrix for each particle in each iteration.

To avoid this problem, another approach, recursive singular value decomposition (RSVD), which considers one training pattern $(w_i^T, y_i)$ at a time, was proposed to replace the linear least squares approach. In RSVD, a singular value decomposition of $W'$ is needed at each step; although $W'$ is smaller than $W$, the amount of time required by RSVD is still demanding. A more effective approach, recursive least squares (RLS), minimizes the sum of squared errors over all training patterns seen up to the present iteration $t$. The updating formulation for $\beta$ is

\beta_{t+1} = \beta_t + \lambda^{-1} H_{t+1} z_{t+1} e_{t+1} \qquad (26)

where $\lambda > 0$ is a scaling factor, $z_{t+1} = [1, w_{t+1}^T]^T \in R^{n+1}$, $e_{t+1} = y_{t+1} - z_{t+1}^T \beta_t$ is the prediction error of the $(t+1)$-th training pattern, and

H_{t+1} = H_t - \frac{H_t z_{t+1} (H_t z_{t+1})^T}{\lambda + z_{t+1}^T H_t z_{t+1}} \qquad (27)

where $\lambda + z_{t+1}^T H_t z_{t+1}$ is a scalar and $H_t z_{t+1} (H_t z_{t+1})^T$ is a rank-one matrix. RLS therefore runs more efficiently than the other approaches, and we adopt it here.
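One RLS step per training pattern then looks as follows (a sketch of Eqs. (26)-(27); the initialization $H_0 = \delta I$ with a large $\delta$ is a standard choice and our assumption, not a detail stated in the paper):

    import numpy as np

    def rls_update(beta, H, z, y, lam=1.0):
        # One recursive least squares step, Eqs. (26)-(27).
        Hz = H @ z
        H = H - np.outer(Hz, Hz) / (lam + z @ Hz)   # Eq. (27), rank-one update
        e = y - z @ beta                            # prediction error e_{t+1}
        beta = beta + (H @ z) / lam * e             # Eq. (26), uses updated H
        return beta, H

    # Typical initialization: beta = np.zeros(dim); H = delta * np.eye(dim)
    # with a large delta (our assumption).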

V. EXPERIMENTAL RESULTS

In order to test the approximating capability of the proposed method, we have conducted two simulation experiments using two non-linear functions. For a fair comparison, the same set of parameters is used in both simulation experiments. For instance, in PSO, the population size is set to 5, the maximum number of iterations is set to 30, and the parameters $w$, $c_1$, and $c_2$ are set to 0.5, 1.5, and 1.5, respectively.

A. Experiment I

In this simulation, the true function is given by

y = 1.1 \times (1 - x + 2x^2) \times e^{-x^2/2}, \quad x \in [-5, 5]. \qquad (28)

The uncorrupted training dataset consists of 200 randomly generated patterns, with input $x$ and corresponding output $y$. The testing dataset consists of 50 uncorrupted testing patterns generated in the same way. A corrupted training pattern has the same output as the corresponding uncorrupted one, but its input is corrupted by adding a random value drawn from a normal distribution with zero mean and standard deviation $\sigma = 0.1$. Three corrupted datasets, in which 20%, 30%, and 40% of the patterns are randomly corrupted, are used. The parameters $\rho$ and $\tau$ of the incremental similarity-based clustering are set to 0.01 and 0.2, respectively. A sketch of the data-generation procedure is given below.
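The corrupted training data can be generated as follows (our sketch for reproducibility; uniform sampling of x over [-5, 5] and the fixed seed are assumptions — the paper only says the patterns are randomly generated):

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: 1.1 * (1 - x + 2 * x**2) * np.exp(-x**2 / 2)   # Eq. (28)

    x = rng.uniform(-5.0, 5.0, 200)                  # 200 training inputs
    y = f(x)                                         # uncorrupted outputs
    idx = rng.choice(200, size=40, replace=False)    # corrupt 20% of the patterns
    x20 = x.copy()
    x20[idx] += rng.normal(0.0, 0.1, size=40)        # x-space noise, sigma = 0.1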

The simulation results are shown in Fig. 6 and Table I, where the training and testing errors are root mean square errors (RMSE).

Fig. 6. Simulation results for the four datasets of Experiment I (uncorrupted, 20%, 30%, and 40% corrupted), comparing the training data with the T1FNN, IT2FNN, and GT2FNN estimates.

For the uncorrupted data shown in Fig. 6, the T1FNN, IT2FNN, and GT2FNN estimates are almost indistinguishable from the true function. For the corrupted data, as the corruption level increases, the GT2FNN estimates are more robust to x-space outliers and outperform the T1FNN and IT2FNN estimates. T1FNN may overfit the training data due to the high percentage of outliers.

TABLE I
SIMULATION RESULTS FOR THE FOUR DATASETS OF EXPERIMENT I.

Uncorrupted data:
  method  | training error | testing error | refining time | number of rules
  T1FNN   | 0.0022         | 0.0022        | 3.76          | 6
  IT2FNN  | 0.0047         | 0.0048        | 9.49          | 6
  GT2FNN  | 0.0036         | 0.0035        | 31.81         | 6

20% corrupted data:
  method  | training error | testing error | refining time | number of rules
  T1FNN   | 0.0901         | 0.023         | 3.75          | 5
  IT2FNN  | 0.0920         | 0.026         | 8.47          | 5
  GT2FNN  | 0.0900         | 0.023         | 30.44         | 5

30% corrupted data:
  method  | training error | testing error | refining time | number of rules
  T1FNN   | 0.1372         | 0.0872        | 5.4           | 6
  IT2FNN  | 0.1412         | 0.0598        | 9.27          | 6
  GT2FNN  | 0.1406         | 0.0435        | 31.38         | 6

40% corrupted data:
  method  | training error | testing error | refining time | number of rules
  T1FNN   | 0.1470         | 0.1112        | 3.75          | 5
  IT2FNN  | 0.1616         | 0.0761        | 8.70          | 5
  GT2FNN  | 0.1635         | 0.0647        | 30.08         | 5

B. Experiment II

In this simulation, the true function is given by

y = x_1^2 \times \sin(x_2 \pi). \qquad (29)

The uncorrupted training dataset consists of 225 randomly generated patterns, with input $x = [x_1, x_2]^T$ and corresponding output $y$. The testing dataset consists of 50 uncorrupted testing patterns generated in the same way. A corrupted training pattern has the same output as the corresponding uncorrupted one, but its input is corrupted by adding a random value drawn from a normal distribution with zero mean and standard deviation $\sigma = 0.2$. Two corrupted datasets, in which 20% and 40% of the patterns are randomly corrupted, are used. The parameters $\rho$ and $\tau$ of the incremental similarity-based clustering are set to 0.0001 and 0.4, respectively. The simulation results are shown in Table II.

TABLE II
SIMULATION RESULTS FOR THE THREE DATASETS OF EXPERIMENT II.

Uncorrupted data:
  method  | training error | testing error | refining time | number of rules
  T1FNN   | 0.0280         | 0.0280        | 3.81          | 6
  IT2FNN  | 0.0178         | 0.0186        | 9.85          | 6
  GT2FNN  | 0.0282         | 0.0297        | 36.20         | 6

20% corrupted data:
  method  | training error | testing error | refining time | number of rules
  T1FNN   | 0.0492         | 0.0403        | 4.20          | 5
  IT2FNN  | 0.0506         | 0.0379        | 9.10          | 5
  GT2FNN  | 0.0427         | 0.0298        | 34.25         | 5

40% corrupted data:
  method  | training error | testing error | refining time | number of rules
  T1FNN   | 0.0685         | 0.0384        | 3.69          | 7
  IT2FNN  | 0.0689         | 0.0380        | 9.44          | 7
  GT2FNN  | 0.0664         | 0.0353        | 34.44         | 7

Again, for the uncorrupted data, the performances of the three fuzzy neural networks on the testing set are about the same. However, GT2FNN clearly performs better on the corrupted data. The approximation results obtained by GT2FNN are shown in Fig. 7.

Fig. 7. Simulation results of GT2FNN for Experiment II: the target function and the approximations for the uncorrupted, 20% corrupted, and 40% corrupted datasets.

VI. CONCLUSION

We have presented an efficient approach for constructing general type-2 fuzzy neural networks (GT2FNN). The idea of α-cuts is exploited to solve the type reduction problem. For structure identification, an incremental similarity-based fuzzy clustering method is used to partition the dataset into several clusters, a local regression model is built for each cluster, and then a general type-2 fuzzy rule is extracted from each cluster and its regressor. For parameter identification, a hybrid learning algorithm which combines particle swarm optimization and a recursive least squares estimator is proposed for refining the antecedent and consequent parameters, respectively, of the fuzzy rules. Simulation results have shown that the resulting networks are robust against outliers.

REFERENCES

[1] G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic. Prentice Hall PTR, May 1995.

[2] L. A. Zadeh, "The concept of a linguistic variable and its application to approximate reasoning—I," Information Sciences, vol. 8, pp. 199–249, January 1975.

[3] J. M. Mendel, Uncertain Rule-Based Fuzzy Logic Systems. Prentice Hall PTR, January 2001.

[4] J. M. Mendel, "Type-2 fuzzy sets and systems: An overview," IEEE Computational Intelligence Magazine, vol. 2, no. 1, pp. 20–29, February 2007.

[5] F. Liu, "An efficient centroid type-reduction strategy for general type-2 fuzzy logic system," Information Sciences, vol. 179, no. 9, pp. 2224–2236, April 2008.

[6] N. N. Karnik and J. M. Mendel, "Centroid of a type-2 fuzzy set," Information Sciences, vol. 132, no. 1–4, pp. 195–220, February 2001.

[7] S. J. Lee and C. S. Ouyang, "A neuro-fuzzy system modeling with self-constructing rule generation and hybrid SVD-based learning," IEEE Transactions on Fuzzy Systems, vol. 11, no. 3, pp. 341–353, June 2003.

[8] J. Kennedy and R. C. Eberhart, "Particle swarm optimization," in Proceedings of the IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.

[9] V. Kecman, Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models. The MIT Press, March 2001.
