Kernel shapes of fuzzy sets in fuzzy systems for function approximation


Available online at www.sciencedirect.com

Information Sciences 178 (2008) 836–857

www.elsevier.com/locate/ins

Kernel shapes of fuzzy sets in fuzzy systems for function approximation

Qiang Luo *, Wenqiang Yang, Dongyun Yi

Department of Mathematics, National University of Defense Technology, Changsha, Hunan 410073, PR China

Received 31 October 2006; received in revised form 23 September 2007; accepted 24 September 2007

Abstract

The shapes of if-part fuzzy sets affect the approximating capability of fuzzy systems. In this paper, the fuzzy systems with the kernel-shaped if-part fuzzy sets are built directly from the training data. It is proved that these fuzzy systems are universal approximators and their uniform approximation rates can be estimated in the single-input–single-output (SISO) case. On the basis of these rates, the relationships between the approximating capability and the shapes of if-part fuzzy sets are developed for the fuzzy systems. Furthermore, the sinc functions that serve as input membership functions are proved to have the almost best approximation property in a particular class of membership functions. The theoretical results are confirmed from the simulation data. In addition, the estimations of the uniform approximation rates are extended to the multi-input–single-output (MISO) case. © 2007 Elsevier Inc. All rights reserved.

Keywords: Fuzzy systems; Function approximation; Universal approximation; Uniform approximation rates; Almost best approximation

1. Introduction

Fuzzy systems are successfully used in many real areas [17,31,42] such as knowledge engineering, automatic control, and pattern recognition. In order to improve their effectiveness in practical applications, the theoretical guidance for designing them must be further investigated [18,19]. Since Kosko and Wang independently proved in 1992 that some fuzzy systems are universal approximators [15,41], research on the approximation accuracy theory of fuzzy systems has attracted much attention [9,38]. The researchers deal mainly with three questions: (1) universal approximation; (2) constructive approximation; (3) approximation rates.

Recent studies show that various fuzzy systems have the universal approximation property [1,2,4–6,15,16,21–23,25,32,41,48,49,54]. For the fuzzy systems with the and/or operators, which can be depicted by t-norms and t-conorms, respectively, Castro [5] proved that these fuzzy systems are capable of approximating any real continuous function on a compact set to arbitrary accuracy. Yager and Kreinovich [48] proposed to use more general operators based on uninorms within the fuzzy systems modeling paradigm, and they also proved that these fuzzy systems are universal approximators. Some scholars also discussed the universal approximation properties of the hierarchical fuzzy systems [23,43]. Furthermore, some negative results have been obtained [13,24,36]: the fuzzy systems are nowhere dense in the space of continuous functions with respect to the supremum norm, if the number of rules is restricted on each input space. In short, the question of universal approximation has been well studied for fuzzy systems.

0020-0255/$ - see front matter © 2007 Elsevier Inc. All rights reserved.

doi:10.1016/j.ins.2007.09.020

This work was partially supported by grants from the National Basic Research Program of China (No. 2005CB321800) and the Graduate Innovation Foundation of National University of Defense Technology.

* Corresponding author. Tel.: +86 731 4573245. E-mail address: [email protected] (Q. Luo).

Many scholars have studied the constructive approximation for various classes of fuzzy systems [10,14,40,44–47,50,51,53,54]. Ying [44] has presented the first result on this aspect; subsequently, Ying et al. [45–47] have obtained some sufficient conditions for fuzzy systems to approximate some continuous functions with the desired accuracy. Ding [10] has given some necessary conditions for the MISO Mamdani fuzzy systems to be universal approximators. In addition, Zeng has established the approximation error bounds for the fuzzy systems generated by a center-average defuzzifier and also for the ones generated by an MoM defuzzifier with pseudo-trapezoid-shaped (PTS) membership functions [50,51]. That is, some guidelines have been proposed for the design of the aforementioned types of fuzzy systems.

However, there is still a lack of theoretical estimations of the approximation rates of fuzzy systems [38]. The approximation rates play a key role in answering the question: what, if any, are the advantages of fuzzy systems as function approximators over other methods such as polynomials, splines, trigonometric functions, wavelets, and neural networks? Function approximation with a fuzzy system is a kind of nonlinear approximation, in fact, a highly nonlinear approximation [7]. Given a target function f, the highly nonlinear approximation is supposed to choose both the fittest basis from a class of bases and the best n-term approximation to f from this basis. In fuzzy systems, the basis functions are the if-part fuzzy sets. The systematic understanding of the highly nonlinear approximation is generally a big challenge [7]; specifically, the approximation rates of fuzzy systems have not been estimated so far, except for some particular types of fuzzy systems (e.g., the fuzzy KH interpolators [37]).

Some researchers explored the relationships between the shapes of the fuzzy sets (i.e., the membership functions) and the approximating capability of the fuzzy systems [18,19,52]. Zeng and Singh [52] have developed a relationship between the PTS membership functions and the accuracies of function approximation for fuzzy systems under certain conditions. Mitaim and Kosko [18,19] put forward the open question: what is the best shape of a fuzzy set in fuzzy systems for function approximation? By exploring a wide range of candidate if-part sets, they found that no shape of the set emerges as the best shape; however, the numerical results show that the sinc function (sin x/x) often converges fastest and with greatest accuracy among the candidates [19]. Unfortunately, they could not find any theoretical reason for the good performance of the sinc function as a nonlinear interpolator in a fuzzy system; however, they still suggest that an engineer should check whether choosing the sinc function to be the input membership function can improve a given fuzzy system.

Although the sinc-shaped fuzzy sets have many successful applications in both the numerical experiments [18,19,39] and the practical areas [20,26], some fundamental issues still need to be well addressed. For instance, the semantics of the sinc functions in the context of fuzzy systems and the approximation property of the sinc membership functions are of theoretical importance. For the semantics issue, the semantic integrity of a sinc membership function is discussed in the following paragraphs.

In comparison with the commonly used membership functions (e.g., the triangular membership function [28]) whose semantics are very clear in the context of fuzzy systems, the negative values and oscillatory nature of the sinc membership functions do not admit easy linguistic interpretation. Pedrycz et al. [27–31] have well discussed the requirements of the membership functions' semantic integrity that are deemed crucial to the development of the entire fuzzy model. The general concepts of the semantic integrity are briefly reviewed below [29]:

• Distinguishability: the membership functions of fuzzy sets should be defined on a certain range in the universe of discourse and be clearly distinguished from each other to represent some transparent semantic meanings.

• A justifiable number of elements: the number of the linguistic terms should not exceed the well-known limit of 7 ± 2 distinct terms to assure that a human being can efficiently store and utilize them in his/her inference activities.

• Natural zero positioning: if required by the nature of the problem, one of the membership functions should be unimodal, convex, and centered at zero to represent the "around zero" conceptual entity.

• Coverage: the entire discourse should be covered by the membership functions of fuzzy sets.

• Normalization: since each fuzzy set has a clear semantic meaning, at least one point in the universe of discourse should acquire grade "1" from a membership function.

The number of elements can be specified in advance for fuzzy modeling, and the properties of distinguishability and coverage can be achieved by properly constructing the fuzzy partition of the universe of discourse and be enforced during the optimization process of the overall system [27,29]. The semantic integrity of the sinc membership functions in the context of fuzzy systems can be elaborated on as follows:

First, a sinc membership function needs to be normalized, since it maps the discourse into a totally ordered interval that includes negative values, $\mu : \mathbb{R} \to [-0.217, 1]$, instead of the usual unit interval [0, 1]. However, we can view a sinc-shaped fuzzy set as a generalized fuzzy set [17]. Furthermore, we can linearly transform the grade from the interval [−0.217, 1] to the unit interval, if required.

Second, the oscillatory nature of a sinc function does not admit easy linguistic interpretation [19], since the sinc function has many local minima. However, from the practical viewpoint, the oscillatory nature of the sinc function appears to be sound for the following two reasons:

(1) The membership functions formulated from data usually have oscillatory nature. For example, in both the numerical and the practical results of the fuzzy clustering methods, the membership functions of the clusters tend to have more local minima with the increasing values of the fuzzification factor [30,39].

(2) The sinc membership function $\sin((x-m)/d) \,/\, ((x-m)/d)$ can be parameterized by its location parameter m and dispersion or variance-like parameter d: m specifies the center of the sinc-shaped fuzzy set and d determines the size of this fuzzy set. The users can pick sinc-shaped fuzzy sets and tune them by only considering the m/d parameters. The same is true for the commonly used fuzzy sets such as the Gaussian bell curves, symmetric triangles, and symmetric trapezoids. Fig. 1a shows the sinc membership functions with m = 0, d = 1/15; the Gaussian membership functions are presented in Fig. 1b with m = 0, d = 0.1. It is observed that the sinc membership functions are similar to the Gaussian membership functions, except for the undulating side lobes.

In short, we could simply consider the smooth bell-shaped envelope of the sinc function and treat it as a Gaussian curve (see Fig. 2), i.e., a domain expert's fuzzy concepts could be safely interpreted as appropriately centered and scaled sinc-shaped fuzzy sets [19].
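As a minimal sketch of the two points above, the following Python snippet evaluates a sinc membership function with location m and dispersion d and applies the linear rescaling of grades onto the unit interval. The function names, the grid resolution, and the use of the rounded lower bound −0.217 are illustrative assumptions and are not taken from the paper.

```python
import math

def sinc_membership(x, m=0.0, d=1.0):
    """sin((x - m) / d) / ((x - m) / d), with the limit value 1 at x = m."""
    t = (x - m) / d
    return 1.0 if t == 0.0 else math.sin(t) / t

SINC_MIN = -0.217  # rounded lower end of the range quoted in the text

def to_unit_interval(grade, lo=SINC_MIN, hi=1.0):
    """Linearly map a grade from [lo, hi] onto [0, 1]."""
    return (grade - lo) / (hi - lo)

# Scan a grid on [-1, 1] to locate the smallest grade for m = 0, d = 1/15.
grid = [i / 10000.0 for i in range(-10000, 10001)]
grades = [sinc_membership(x, m=0.0, d=1.0 / 15.0) for x in grid]
smallest = min(grades)
```

Because −0.217 is a rounded value, the grade found on the grid is marginally below it; when an exact [0, 1] range matters, the measured minimum can be passed as `lo` instead.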

With regard to the approximation problem, the sinc functions that serve as input membership functions are proved to have the almost best approximation property in some classes of membership functions by investigating the relationships between the shapes of fuzzy sets and the approximating capability of fuzzy systems. This relationship could be developed by estimating the uniform approximation rates of the fuzzy systems. Kim and Mendel [12] compared the fuzzy basis functions with other basis functions, including probabilistic neural networks (PNN) [34] and general regression neural networks (GRNN) [35]. Hence, if we assume that the training data are sampled from a distribution function and fuzzy systems are constructed directly and only from those data, the approximation rates of the fuzzy systems could be estimated according to the theories that are similar to those obtained for PNN and GRNN. Most studies on PNN and GRNN only deal with Gaussian kernels; however, there are many other choices of the membership functions for the fuzzy sets in fuzzy systems.

Fig. 1. Nine fuzzy sets for input x. All membership functions have the same width but different centers. (a) Nine sinc set functions and (b) nine Gaussian set functions.

Fig. 2. The graph of the smooth envelope of the sinc function with m = 0, d = 1/15. Curves shown: sinc(15x), lower envelope, upper envelope.

In this paper, we assume that the analytical form of a continuous function f(x) defined on the closed interval $U \subset \mathbb{R}$ is unknown, while the input–output behavior of f(x) for any $x \in U$ is known. In this case, f(x) is similar to a black box [42]. Without loss of generality, we also specify that $f : U = [-1, 1] \to V \subset \mathbb{R}$, where V is a bounded subset of $\mathbb{R}$. Further, the input–output data of f on U are $\{(x_i, y_i)\}_{i=1}^{n}$, where the $\{x_i\}_{i=1}^{n}$ are considered as a random sample of size n from the absolutely continuous distribution function H(x) with the density function h(x), and $y_i = f(x_i)$. Generally, the training data $\{x_i\}_{i=1}^{n}$ could be a random sample from any distribution function; however, h(x) is determined in this paper by the denominator of a fuzzy system and is only required to be bounded away from 0 in U by some $\varepsilon_0 > 0$, i.e., $h(x) > \varepsilon_0$ for any $x \in U$. Theoretically, the size of a random sample could be arbitrary. We prove that the fuzzy systems have the universal approximation properties for any continuous function defined on U. Furthermore, we estimate the uniform approximation rates of the fuzzy systems, where the uniform approximation rates are defined as follows:

$$\|F_n(x) - f(x)\|_\infty = \sup_{x \in U} |F_n(x) - f(x)|.$$

Then, we compare the uniform approximation rates among various shapes of input membership functions for the fuzzy systems. In particular, the sinc functions that serve as input membership functions are proved to have the almost best approximation property in a particular class of membership functions.

The remainder of this paper is organized as follows: Section 2 provides some preliminaries. Section 3 presents the construction of fuzzy systems from the training data. In Section 4, some properties pertaining to a wide range of input membership functions are provided. The universal approximation properties are proved for the fuzzy systems and their uniform approximation rates are estimated in Section 5. On the basis of the approximation rates, Section 6 compares the performances among different input membership functions in the fuzzy systems for function approximation. The numerical simulations are reported in Section 7 in order to validate the theoretical results obtained in this paper. Furthermore, the function approximation rates of the fuzzy systems in a higher dimension are discussed in Section 8. The final section contains some conclusions and a forward look at our subsequent research.


2. Preliminaries

In this paper, let C(U) be the set of the continuous functions defined on U with the supremum norm

$$\|f\|_{C(U)} \triangleq \|f\|_\infty \triangleq \sup_{x \in U} |f(x)|,$$

and for any $x \in U^c$, f(x) is defined to be zero. Since f is not continuous at the extremal points of U, in the remainder of this paper, let U be the $\delta_0$-subinterval $[-1+\delta_0, 1-\delta_0]$ with a given $\delta_0 > 0$, for simplicity. For $1 \le p < \infty$, $f \in L^p$ implies that

$$\|f\|_p = \left( \int_{\mathbb{R}} |f(u)|^p \, du \right)^{1/p} < +\infty.$$

Further, $f \in NL^1$ implies $f \in L^1$ and $\int_{\mathbb{R}} f(u)\,du = 1$. Assume that X(U) could represent C(U) or $L^p(U)$. For any $f \in X(U)$ and $\delta \ge 0$, the modulus of continuity of f is given by

$$\omega(X(U); f; \delta) = \sup_{|h| \le \delta} \|f(\cdot + h) - f(\cdot)\|_{X(U)},$$

and the generalized modulus of continuity of f is defined as follows:

$$\omega^*(X(U); f; \delta) = \sup_{|h| \le \delta} \|f(\cdot + h) + f(\cdot - h) - 2f(\cdot)\|_{X(U)}.$$

Furthermore, if $\omega(X(U); f; \delta) = O(\delta^\alpha)$, then $f \in \mathrm{Lip}(X(U); \alpha)$, i.e., f satisfies the αth-order Lipschitz condition. Similarly, $f \in \mathrm{Lip}^*(X(U); \alpha)$ implies that f satisfies the generalized Lipschitz condition. $\hat{f}$ denotes the Fourier transform of f, and $\check{f}$ denotes the inverse Fourier transform of f.
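The modulus of continuity in the supremum norm can be estimated numerically on a grid. The sketch below is only illustrative: the function name and grid sizes are assumptions, and it restricts both x and x + h to the interval instead of extending f by zero outside U as the text does.

```python
import math

def modulus_of_continuity(f, delta, a=-1.0, b=1.0, grid=400, shifts=40):
    """Grid estimate of sup_{|h| <= delta} sup_x |f(x + h) - f(x)| on [a, b].

    Only h >= 0 is scanned: a negative shift gives the same pairs of points.
    """
    worst = 0.0
    for k in range(shifts + 1):
        h = delta * k / shifts
        for i in range(grid + 1):
            x = a + (b - a) * i / grid
            if x + h <= b:
                worst = max(worst, abs(f(x + h) - f(x)))
    return worst

# |x| is Lipschitz of order 1, sqrt(|x|) only of order 1/2.
omega_abs = modulus_of_continuity(abs, 0.1)
omega_sqrt = modulus_of_continuity(lambda x: math.sqrt(abs(x)), 0.04)
```

For f(x) = |x| the estimate grows like δ, while for f(x) = √|x| it grows like √δ, which matches membership in Lip(C(U); 1) and Lip(C(U); 1/2), respectively.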

3. Fuzzy systems

In this section, we introduce the mathematical formula of the fuzzy system in the single-input–single-output (SISO) case. With n rules, the fuzzy system is denoted by $F_n : \mathbb{R} \to \mathbb{R}$, which comprises four principal components: singleton fuzzifier, product inference engine, center-average defuzzifier (see [51] for more details), and the rule base. The rule base stores n rules

$$R_i : \text{IF } x \text{ is } A_i, \text{ THEN } y \text{ is } B_i, \quad i = 1, 2, \ldots, n, \tag{1}$$

where $A_i$ (i = 1, 2, ..., n) are the if-part sets and $B_i$ (i = 1, 2, ..., n) are the then-part sets. Assuming that the then-part sets are singleton fuzzy sets, the rules can be rewritten in the following form

$$R_i : \text{IF } x \text{ is } A_i, \text{ THEN } y = y_i, \quad i = 1, 2, \ldots, n, \tag{2}$$

where $x \in U$ is the input variable, $A_i$ is the if-part set, and $y_i$ is the point in V at which $B_i(y)$ achieves its maximum value (when $B_i$ is a singleton fuzzy set, $B_i(y_i) = 1$). Let $\mu_i$ be the fuzzy membership function corresponding to the if-part set $A_i$. Clearly, in the SISO case, the fuzzy system can be expressed as follows:

$$F_n(x) = \sum_{i=1}^{n} \left[ \frac{\mu_i(x)}{\sum_{j=1}^{n} \mu_j(x)} \right] y_i. \tag{3}$$

In order to construct a fuzzy system from the random sample $\{x_i\}_{i=1}^{n}$, we define $\varepsilon_0$-completeness for $A_i$ (i = 1, 2, ..., n).

Definition 1. Fuzzy sets $A_i$ (i = 1, 2, ..., n) as a partition on U are said to be $\varepsilon_0$-complete if there exists $\varepsilon_0 > 0$ such that $\frac{1}{n} \sum_{i=1}^{n} \mu_i(x) \ge \varepsilon_0$ for any $x \in U$.

As in [51], we assume that an if-part partition must be an $\varepsilon_0$-complete partition on U. This implies that at least one of the fuzzy IF–THEN rules should be fired for every $x \in U$. Therefore, when a partition is $\varepsilon_0$-complete, the denominator of (3) is bounded away from zero for all $x \in U$. Hence, the fuzzy systems are well defined. Fig. 3 illustrates an example of an $\varepsilon_0$-complete partition.

Fig. 3. Graph of the $\varepsilon_0$-complete partition.
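The fuzzy system (3) with singleton then-part sets is a normalized weighted average of the rule outputs, and the completeness condition of Definition 1 is what keeps the denominator away from zero. A minimal sketch, assuming three hypothetical triangular if-part sets and hypothetical outputs (none of these values come from the paper):

```python
def triangular(center, width=1.0):
    """Hypothetical triangular membership function with peak 1 at `center`."""
    return lambda x: max(0.0, 1.0 - abs(x - center) / width)

def fuzzy_system(memberships, outputs, eps0=1e-9):
    """Return F_n(x) = sum_i [mu_i(x) / sum_j mu_j(x)] * y_i, as in Eq. (3)."""
    n = len(memberships)

    def F(x):
        grades = [mu(x) for mu in memberships]
        total = sum(grades)
        # Completeness check of Definition 1: (1/n) * sum_i mu_i(x) >= eps0.
        if total / n < eps0:
            raise ValueError("no rule fires at x = %r" % (x,))
        return sum(g * y for g, y in zip(grades, outputs)) / total

    return F

centers = [-1.0, 0.0, 1.0]
F = fuzzy_system([triangular(c) for c in centers], [c * c for c in centers])
value = F(0.5)  # rules centered at 0 and 1 fire with grade 0.5 each
```

At x = 0.5 the two active rules contribute equally, so the output is the midpoint of their consequents. Outside the covered range the check raises an error instead of dividing by zero.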


4. Input membership functions

The input membership functions $\mu_i(\cdot)$ (i = 1, ..., n) can have many shapes, and each shape affects the approximating capability of fuzzy systems [19]. Many classes of membership functions have been proposed in the literature [11], including triangular functions, normal peak functions, pseudo-trapezoidal functions, beta functions, etc. In this paper, the input membership functions of a fuzzy system are generated by the translations and scale transformations of a kernel function. By using Fourier analysis techniques, we explain how the shape of the input membership functions determines the approximating capability of the fuzzy system. Some related notations, concepts, and properties are introduced below.

Definition 2. The class of functions $\{\mu(\cdot; \sigma), \sigma > 0\}$ is the kernel on $\mathbb{R}$ if for any $\sigma > 0$, $\mu(\cdot; \sigma) \in NL^1$. $\mu(\cdot; \sigma)$ is symmetric or positive if for any $\sigma > 0$, $\mu(x; \sigma) = \mu(-x; \sigma)$ or $\mu(x; \sigma) \ge 0$, respectively.

Let $I(f; x; \sigma)$ denote the singular integral generated by the kernel function $\mu(x; \sigma)$ as follows:

$$I(f; x; \sigma) = \int_{\mathbb{R}} f(u)\, \mu(x - u; \sigma)\, du. \tag{4}$$

Definition 3. The kernel $\{\mu(\cdot; \sigma), \sigma > 0\}$ is an approximation identity kernel if there exist some constants M > 0 and $\delta > 0$ such that

$$\|\mu(\cdot; \sigma)\|_1 \le M \quad \forall \sigma > 0, \tag{5}$$

$$\lim_{\sigma \to 0} \int_{\delta \le |u|} |\mu(u; \sigma)|\, du = 0. \tag{6}$$

Lemma 1 [3]. If the kernel $\{\mu(\cdot; \sigma), \sigma > 0\}$ is an approximation identity kernel, then for any $f \in C(U)$,

$$\lim_{\sigma \to 0} \|I(f; \cdot; \sigma) - f(\cdot)\|_\infty = 0.$$
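To see Lemma 1 at work, the singular integral (4) can be evaluated numerically for a shrinking scale. The sketch below assumes a normalized Gaussian kernel and the smooth test function cos, which is defined on all of R rather than extended by zero outside U; the function names and quadrature settings are likewise assumptions. For this particular pair the integral has the closed form exp(−σ²/2) cos(x), which gives an independent check.

```python
import math

def gaussian_kernel(u, sigma):
    """Normalized Gaussian kernel (1/sigma) * mu(u/sigma) with unit integral."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def singular_integral(f, x, sigma, half_width=8.0, steps=4000):
    """Midpoint-rule approximation of I(f; x; sigma) = int f(u) mu(x - u; sigma) du."""
    a = x - half_width * sigma          # the kernel is negligible beyond 8 sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for k in range(steps):
        u = a + (k + 0.5) * h
        total += f(u) * gaussian_kernel(x - u, sigma)
    return total * h

x0 = 0.3
errors = [abs(singular_integral(math.cos, x0, s) - math.cos(x0))
          for s in (0.4, 0.2, 0.1)]
```

The pointwise error shrinks as σ decreases, in line with the limit stated in the lemma.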

Definition 4. $\{\sigma_n\}$ are regularizing scale factors if $\sigma_n > 0$ for any $n \in \mathbb{N}$ and the series $\sum_{n=1}^{\infty} \exp(-\eta n \sigma_n^2)$ converges for every $\eta > 0$.

The formulation of the kernel-type estimator is

$$f_n(x) = \frac{1}{n \sigma_n} \sum_{i=1}^{n} f(x_i)\, \mu\!\left( \frac{x - x_i}{\sigma_n} \right),$$

where $\{x_i\}_{i=1}^{n}$ is the random sample with the density function h(x).


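Lemma 2 [33]. Let $f(x) = E[f_n(x)]$ be the expectation of the kernel-type estimator. Let $\{\sigma_n\}$ be regularizing scale factors; then,

$$\lim_{n \to \infty} \|f_n(x) - f(x)\|_\infty = 0$$

with probability one if and only if f is continuous on U.

Definition 5. Let $\mu$ be a given function defined on U. The translations and scale transformations of $\mu$ are defined as $\frac{1}{\sigma} \mu\!\left( \frac{x - a}{\sigma} \right)$, where $x \in U$, a is the translation factor, and $\sigma$ is the scale factor.

Lemma 3. Suppose that $\mu \in L^1$ and $\int_{\mathbb{R}} \mu(x)\,dx \ne 0$; then the kernel of the form $\left\{ \frac{1}{\sigma} \mu\!\left( \frac{x}{\sigma} \right), \sigma > 0 \right\}$ forms an approximation identity kernel.

Proof of Lemma 3. Without loss of generality, we can further suppose that $\mu \in NL^1$; otherwise, we can replace $\mu(x)$ by $\mu(x)/c$, where $c = \int_{\mathbb{R}} \mu(x)\,dx \ne 0$. Then, we have

$$\int_{\mathbb{R}} \frac{1}{\sigma} \mu\!\left( \frac{u}{\sigma} \right) du = \int_{\mathbb{R}} \mu\!\left( \frac{u}{\sigma} \right) d\!\left( \frac{u}{\sigma} \right) = \int_{\mathbb{R}} \mu(s)\,ds = 1.$$

Trivially, $\mu(x; \sigma)$ is bounded and $\lim_{\sigma \to 0} \int_{\delta \le |u|} \left| \frac{1}{\sigma} \mu\!\left( \frac{u}{\sigma} \right) \right| du = 0$. □

By Definition 5 and Lemma 3, the scale transformations of $\mu \in L^1$ form an approximation identity kernel $\{\mu(x; \sigma_n), \sigma_n > 0\}$ with the regularizing scale factors $\{\sigma_n\}$ in a particular form

$$\mu(x; \sigma_n) = \frac{1}{\sigma_n} \mu\!\left( \frac{x}{\sigma_n} \right). \tag{7}$$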

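Furthermore, $\{\mu(x - x_i; \sigma_n)\}_{i=1}^{n}$ construct an $\varepsilon_0$-complete partition on U; thereby, for the SISO case, a fuzzy system can be rewritten as follows:

$$F_n(x) = \sum_{i=1}^{n} \left[ \frac{\frac{1}{\sigma_n} \mu\!\left( \frac{x - x_i}{\sigma_n} \right)}{\sum_{j=1}^{n} \frac{1}{\sigma_n} \mu\!\left( \frac{x - x_j}{\sigma_n} \right)} \right] y_i \triangleq \sum_{i=1}^{n} \left[ \frac{\mu_n(x - x_i)}{\sum_{j=1}^{n} \mu_n(x - x_j)} \right] y_i, \tag{8}$$

where $x \in U$ and $\{y_i\}_{i=1}^{n}$ are the corresponding outputs of $\{x_i\}_{i=1}^{n}$.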
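The construction in (8) can be sketched directly from a random sample. The example below is an illustration under stated assumptions, not the simulation setup of the paper: the target function sin, the sample size, the scale σ = 0.1, the Gaussian kernel, and the evaluation grid on the subinterval [−0.9, 0.9] are all hypothetical choices.

```python
import math
import random

def gaussian(t):
    """Gaussian set function exp(-t^2 / 2) used as the kernel shape."""
    return math.exp(-0.5 * t * t)

def kernel_fuzzy_system(xs, ys, sigma, mu=gaussian):
    """Build F_n of Eq. (8): rule i is centered at x_i with output y_i."""
    def F(x):
        weights = [mu((x - xi) / sigma) for xi in xs]
        return sum(w * y for w, y in zip(weights, ys)) / sum(weights)
    return F

def sup_error(F, f, a, b, grid=400):
    """Grid estimate of the uniform error sup_x |F(x) - f(x)| on [a, b]."""
    points = [a + (b - a) * i / grid for i in range(grid + 1)]
    return max(abs(F(x) - f(x)) for x in points)

random.seed(0)
f = math.sin                                   # hypothetical target function
n = 500
xs = [random.uniform(-1.0, 1.0) for _ in range(n)]   # uniform density h on [-1, 1]
ys = [f(x) for x in xs]
F = kernel_fuzzy_system(xs, ys, sigma=0.1)
err = sup_error(F, f, -0.9, 0.9)               # delta_0 = 0.1 subinterval
```

The common factor 1/σ cancels between numerator and denominator, so only the kernel shape and the scale enter the computation.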

5. Approximation properties

In this section, the fuzzy systems (8) are proved to be universal approximators, and their uniform approximation rates are estimated.

Theorem 1. Suppose that $f \in C(U)$; then,

$$\lim_{n \to \infty} \|F_n(x) - f(x)\|_\infty = 0, \quad \text{almost surely (a.s.)}.$$

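For simplicity, we introduce the following notations:

$$H_n(x) = \frac{1}{n} \sum_{k=1}^{n} \chi_{(-\infty, x]}(x_k), \tag{9}$$

$$a_n(x) = \frac{1}{n \sigma_n} \sum_{i=1}^{n} f(x_i)\, \mu\!\left( \frac{x - x_i}{\sigma_n} \right) = \int_{\mathbb{R}} f(u)\, \mu_n(x - u)\, dH_n(u), \tag{10}$$

$$E[a_n(x)] = \int_{\mathbb{R}} f(u)\, \mu_n(x - u)\, dH(u), \tag{11}$$

$$g_n(x) = \frac{1}{n \sigma_n} \sum_{i=1}^{n} \mu\!\left( \frac{x - x_i}{\sigma_n} \right) = \int_{\mathbb{R}} \mu_n(x - u)\, dH_n(u), \tag{12}$$

$$E[g_n(x)] = \int_{\mathbb{R}} \mu_n(x - u)\, dH(u), \tag{13}$$

$$F_n(x) = \frac{a_n(x)}{g_n(x)}, \tag{14}$$

$$F(x) = \frac{E[a_n(x)]}{E[g_n(x)]}; \tag{15}$$

then, we have

$$\|F_n(x) - f(x)\|_\infty \le \|F_n(x) - F(x)\|_\infty + \|F(x) - f(x)\|_\infty \triangleq \|I_1\|_\infty + \|I_2\|_\infty. \tag{16}$$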

Proof of Theorem 1. In view of Lemma 2,

$$\lim_{n \to \infty} \|a_n(x) - E[a_n(x)]\|_\infty = 0, \quad \text{a.s.},$$

$$\lim_{n \to \infty} \|g_n(x) - E[g_n(x)]\|_\infty = 0, \quad \text{a.s.},$$

and $g_n(x)$, $a_n(x)$, $E[a_n(x)]$, $E[g_n(x)]$ are all bounded on U. Hence,

$$\lim_{n \to \infty} \|I_1\|_\infty = 0, \quad \text{a.s.} \tag{17}$$

On the other hand, since the kernel in (8) is an approximation identity kernel, by Lemma 1 we have

$$\lim_{\sigma \to 0} \|E[a_n(x)] - f(x) h(x)\|_\infty = 0,$$

$$\lim_{\sigma \to 0} \|E[g_n(x)] - h(x)\|_\infty = 0.$$

Observe that

$$\|I_2\|_\infty \le \sup_x |E[g_n(x)]|^{-1}\, |E[a_n(x)] - f(x) h(x)| + \sup_x \left| \frac{f(x)}{E[g_n(x)]} \right| |E[g_n(x)] - h(x)|.$$

Notice that U is a compact subset of $\mathbb{R}$, $f \in C(U)$, and $h(x) > \varepsilon_0$; hence,

$$\lim_{n \to \infty} \|I_2\|_\infty = 0. \tag{18}$$

Therefore, the theorem has been proved. □

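Theorem 2. Let the kernel $\mu(\cdot; \sigma_n)$ in (8) be positive. If there exists some $\xi \in \mathbb{R}$ such that

$$I(u; \xi; \sigma_n) = \int_{\mathbb{R}} u\, \mu_n(\xi - u)\, du = \xi + \beta(\xi; \sigma_n), \quad \lim_{n \to \infty} \beta(\xi; \sigma_n) = 0, \tag{19}$$

$$I(u^2; \xi; \sigma_n) = \int_{\mathbb{R}} u^2 \mu_n(\xi - u)\, du = \xi^2 + \gamma(\xi; \sigma_n), \quad \lim_{n \to \infty} \gamma(\xi; \sigma_n) = 0, \tag{20}$$

then, for any $f \in C(U)$,

$$\|F_n(x) - f(x)\|_\infty \le \frac{C_1}{\varepsilon_0 \sigma_n} (\log n / n)^{1/2} + \frac{1}{\varepsilon_0} \cdot O(\omega(C(U); fh; \delta(\gamma, \beta))) + \frac{C_2}{\varepsilon_0}\, O(\omega(C(U); h; \delta(\gamma, \beta))), \quad \text{a.s.}, \tag{21}$$

where $\delta(\gamma, \beta) = \sqrt{\gamma(\xi; \sigma_n) - 2 \xi \beta(\xi; \sigma_n)}$.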

Lemma 4 [3]. Let the kernel $\{\mu(x; \rho)\}$ of the singular integral $I(f; x; \rho)$ be positive. If there exists some $\xi \in \mathbb{R}$ such that

$$I(u; \xi; \rho) = \xi + \beta(\xi; \rho), \quad \lim_{\rho \to 0} \beta(\xi; \rho) = 0,$$

$$I(u^2; \xi; \rho) = \xi^2 + \gamma(\xi; \rho), \quad \lim_{\rho \to 0} \gamma(\xi; \rho) = 0,$$

then, for any $f \in C(\mathbb{R})$,

$$\|I(f; \cdot; \rho) - f(\cdot)\|_{C(\mathbb{R})} \le O\!\left( \omega\!\left( C(\mathbb{R}); f; \sqrt{\gamma(\xi; \rho) - 2 \xi \beta(\xi; \rho)} \right) \right) \quad (\rho \to 0).$$

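Proof of Theorem 2. Observe that

$$\begin{aligned}
\sup_x |a_n(x) - E[a_n(x)]| &= \sup_x \left| \int_{\mathbb{R}} f(u)\, \mu_n(x - u)\, d(H_n(u) - H(u)) \right| \\
&= \sup_x \left| \Big\{ f(u)\, \mu_n(x - u)\, (H_n(u) - H(u)) \Big\}_{-\infty}^{\infty} - \int_{\mathbb{R}} [H_n(u) - H(u)]\, d(f(u)\, \mu_n(x - u)) \right| \\
&= \sup_x \frac{1}{\sigma_n} \left| \int_{\mathbb{R}} [H_n(u) - H(u)]\, d\!\left( f(u)\, \mu\!\left( \frac{x - u}{\sigma_n} \right) \right) \right| \le \frac{M_0}{\sigma_n} \sup_x |H_n(x) - H(x)|,
\end{aligned} \tag{22}$$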

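where $M_0$ is the total variation of $f \cdot \mu$ on U. From the results of [8], we can obtain

$$P\!\left( \sup_x |H_n(x) - H(x)| > \frac{\sigma_n \eta_n}{M_0} \right) \le M \exp\!\left( -\frac{2 n \sigma_n^2 \eta_n^2}{M_0^2} \right) \quad \forall \eta_n > 0,$$

where M is an absolute constant.

As in [33], let $\eta_n = (M_0 / \sigma_n)(\log n / n)^{1/2}$. Since

$$\sum_{n=1}^{\infty} \exp(-2 n \sigma_n^2 \eta_n^2 / M_0^2) < \infty,$$

by the Borel–Cantelli lemma,

$$\sup_x |H_n(x) - H(x)| \le (\log n / n)^{1/2}, \quad \text{a.s.} \tag{23}$$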
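The bound (23) can be checked empirically for a single sample. The following sketch assumes a uniform distribution on [−1, 1], a hypothetical sample size, and a fixed random seed; it computes the exact supremum distance between the empirical distribution function and H and compares it with the threshold (log n / n)^{1/2}.

```python
import math
import random

def sup_cdf_deviation(sample, cdf):
    """Exact sup_x |H_n(x) - H(x)| for a continuous cdf H.

    The empirical distribution jumps from k/n to (k+1)/n at the k-th order
    statistic, so the supremum is attained at one side of a jump.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max(abs((k + 1) / n - cdf(x)), abs(k / n - cdf(x)))
               for k, x in enumerate(xs))

random.seed(1)
n = 1000
sample = [random.uniform(-1.0, 1.0) for _ in range(n)]
deviation = sup_cdf_deviation(sample, lambda x: (x + 1.0) / 2.0)
threshold = math.sqrt(math.log(n) / n)
```

For this sample size the observed deviation sits well below the threshold, as the exponential tail bound from [8] suggests.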

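From (22) and (23) we get

$$\sup_x |a_n(x) - E[a_n(x)]| \le \frac{M_0}{\sigma_n} (\log n / n)^{1/2}, \quad \text{a.s.} \tag{24}$$

By the same reasoning we also have

$$\sup_x |g_n(x) - E[g_n(x)]| \le \frac{M_1}{\sigma_n} (\log n / n)^{1/2}, \quad \text{a.s.}, \tag{25}$$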

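where $M_1$ is the total variation of $\mu$.

As a consequence of (24) and (25), the following holds:

$$\begin{aligned}
\|I_1\|_\infty &= \sup_x \left| \frac{a_n(x)}{g_n(x)} - \frac{E[a_n(x)]}{E[g_n(x)]} \right| \\
&\le \sup_x |g_n(x)|^{-1}\, |a_n(x) - E[a_n(x)]| + \sup_x \left| \frac{E[a_n(x)]}{g_n(x)\, E[g_n(x)]} \right| |g_n(x) - E[g_n(x)]| \\
&\le \frac{1}{\varepsilon_0} \cdot \frac{M_0}{\sigma_n} (\log n / n)^{1/2} + \frac{M_2}{\varepsilon_0} \cdot \frac{M_1}{\sigma_n} (\log n / n)^{1/2} = \frac{C_1}{\varepsilon_0 \sigma_n} (\log n / n)^{1/2}, \quad \text{a.s.},
\end{aligned} \tag{26}$$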

where $C_1 = M_0 + M_1 M_2$.

On the other hand, for $I_2$, the kernel $\mu(\cdot; \sigma_n)$ of the singular integral

$$I(\Phi(u); x; \sigma_n) = \int_{\mathbb{R}} \Phi(u)\, \mu_n(x - u)\, du \quad (\forall \Phi \in C(U))$$

is positive and satisfies (19) and (20). Then, by Lemma 4,

$$\|I(\Phi; \cdot; \sigma_n) - \Phi(\cdot)\|_\infty \le O\!\left( \omega\!\left( C(U); \Phi; \sqrt{\gamma(\xi; \sigma_n) - 2 \xi \beta(\xi; \sigma_n)} \right) \right). \tag{27}$$

Therefore,

$$\|E[a_n(x)] - f(x) h(x)\|_\infty \le O(\omega(C(U); fh; \delta(\gamma, \beta))), \tag{28}$$

$$\|E[g_n(x)] - h(x)\|_\infty \le O(\omega(C(U); h; \delta(\gamma, \beta))), \tag{29}$$

where $\delta(\gamma, \beta) = \sqrt{\gamma(\xi; \sigma_n) - 2 \xi \beta(\xi; \sigma_n)}$. Considering (28) and (29) we have

$$\begin{aligned}
\|I_2\|_\infty &\le \sup_x |E[g_n(x)]|^{-1}\, |E[a_n(x)] - f(x) h(x)| + \sup_x \left| \frac{f(x)}{E[g_n(x)]} \right| |E[g_n(x)] - h(x)| \\
&\le \frac{1}{\varepsilon_0}\, O(\omega(C(U); fh; \delta(\gamma, \beta))) + \frac{C_2}{\varepsilon_0}\, O(\omega(C(U); h; \delta(\gamma, \beta))),
\end{aligned} \tag{30}$$

where $C_2$ is the bound of $|f(x)|$ on U.

Combining inequalities (16), (26), and (30), we conclude

$$\begin{aligned}
\|F_n(x) - f(x)\|_\infty &\le \|F_n(x) - F(x)\|_\infty + \|F(x) - f(x)\|_\infty \\
&\le \frac{C_1}{\varepsilon_0 \sigma_n} (\log n / n)^{1/2} + \frac{1}{\varepsilon_0} \cdot O(\omega(C(U); fh; \delta(\gamma, \beta))) + \frac{C_2}{\varepsilon_0}\, O(\omega(C(U); h; \delta(\gamma, \beta))), \quad \text{a.s.} \quad \square
\end{aligned} \tag{31}$$

Corollary 1. Suppose that h(x) is a uniform density function on U and the conditions of Theorem 2 hold; then, for any $f \in C(U)$,

$$\|F_n(x) - f(x)\|_\infty \le \frac{C_1}{\varepsilon_0 \sigma_n} (\log n / n)^{1/2} + \frac{1}{\varepsilon_0} \cdot O(\omega(C(U); f; \delta(\gamma, \beta))), \quad \text{a.s.}, \tag{32}$$

where $\delta(\gamma, \beta) = \sqrt{\gamma(\xi; \sigma_n) - 2 \xi \beta(\xi; \sigma_n)}$.

In view of the above theorem, if $\varepsilon_0$ increases, the rates decrease while the number of if-part sets increases. The kernels that serve as input membership functions are usually positive; however, from the viewpoint of function approximation, the kernels that are not positive are also worth discussing. Before stating the theorem, we first give a definition and review a lemma from [3].

Definition 6. If $\alpha$ is a positive number, the αth-order absolute moment of $\mu$ is defined as follows:

$$m(\mu; \alpha) = \int_{\mathbb{R}} |u|^{\alpha}\, |\mu(u)|\, du.$$
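Whether an absolute moment exists can be probed numerically by truncating the integral to [−T, T] and watching how the value behaves as T grows. The sketch below is illustrative only: the helper name, the midpoint rule, and the truncation points are assumptions. It uses the Gaussian shape exp(−x²/2), whose first absolute moment equals 2, and the Cauchy shape 1/(1 + x²), whose truncated first moment equals log(1 + T²) and therefore grows without bound.

```python
import math

def truncated_abs_moment(mu, alpha, T, steps=100000):
    """Midpoint-rule value of the integral of |u|^alpha * |mu(u)| over [-T, T]."""
    h = 2.0 * T / steps
    total = 0.0
    for i in range(steps):
        u = -T + (i + 0.5) * h
        total += abs(u) ** alpha * abs(mu(u))
    return total * h

def gaussian_shape(x):
    return math.exp(-0.5 * x * x)

def cauchy_shape(x):
    return 1.0 / (1.0 + x * x)

gauss_m1 = truncated_abs_moment(gaussian_shape, 1.0, 12.0)   # settles at 2
cauchy_m1_short = truncated_abs_moment(cauchy_shape, 1.0, 10.0)
cauchy_m1_long = truncated_abs_moment(cauchy_shape, 1.0, 100.0)
```

The Gaussian value is already stable at T = 12, whereas the Cauchy value keeps growing logarithmically with T, so its first-order absolute moment does not exist.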

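Lemma 5 [3]. Let $\mu \in NL^1$ be a symmetric function. If for some $0 < \alpha \le 2$, the αth-order absolute moment of $\mu$ exists and is finite, then for any $f \in \mathrm{Lip}^*(X(U); \alpha)$,

$$\|I(f; \cdot; \rho) - f(\cdot)\|_{X(U)} = O(\rho^{-\alpha}) \quad (\rho \to \infty),$$

where $I(f; x; \rho) = \int_{\mathbb{R}} f(u)\, \rho\, \mu(\rho (x - u))\, du$.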

Similar to Theorem 2, we may easily obtain Theorem 3.

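Theorem 3. Suppose that the kernel $\mu(\cdot; \sigma_n)$ in (8) is symmetric and the distribution function H(x) is the uniform distribution on U. If for some $0 < \alpha \le 2$ the αth-order absolute moment of the kernel exists and is finite, then for any $f \in \mathrm{Lip}^*(C(U); \alpha)$,

$$\|F_n(x) - f(x)\|_\infty \le \frac{C_1}{\varepsilon_0 \sigma_n} (\log n / n)^{1/2} + \frac{1}{\varepsilon_0} \cdot O(\sigma_n^{\alpha}), \quad \text{a.s.}$$

Under the conditions of Theorem 3, it is obvious that

$$\|F_n(x) - f(x)\|_\infty \le O\!\left[ (\log n / n)^{\alpha / 2(1 + \alpha)} \right], \quad \text{a.s.},$$

when $\sigma_n \simeq (\log n / n)^{1 / 2(1 + \alpha)}$.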

Corollary 2. Suppose that h(x) is a uniform density function on U and the conditions of Theorem 3 hold. Let $\sigma_n \simeq (\log n / n)^{1 / 2(1 + \alpha)}$; then, for any $f \in \mathrm{Lip}^*(C(U); \alpha)$,

$$\|F_n(x) - f(x)\|_\infty \le O\!\left[ (\log n / n)^{\alpha / 2(1 + \alpha)} \right], \quad \text{a.s.}$$
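The choice of the scale factor in Corollary 2 balances the two terms of the bound in Theorem 3. The short sketch below makes this explicit by evaluating both terms with the constants $C_1/\varepsilon_0$ and $1/\varepsilon_0$ dropped; the function names and the sample values of n and α are assumptions for illustration.

```python
import math

def balanced_scale(n, alpha):
    """sigma_n = (log n / n)^(1 / (2 (1 + alpha)))."""
    return (math.log(n) / n) ** (1.0 / (2.0 * (1.0 + alpha)))

def rate_terms(n, alpha):
    """Return the two terms of the Theorem 3 bound, constants omitted."""
    sigma = balanced_scale(n, alpha)
    stochastic = math.sqrt(math.log(n) / n) / sigma   # (log n / n)^(1/2) / sigma_n
    bias = sigma ** alpha                             # sigma_n^alpha
    return stochastic, bias

n, alpha = 10 ** 6, 1.0
stochastic, bias = rate_terms(n, alpha)
rate = (math.log(n) / n) ** (alpha / (2.0 * (1.0 + alpha)))
```

With this scale both terms reduce to the same power of log n / n, which is the rate stated in the corollary; a smaller scale inflates the first term and a larger scale inflates the second.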

6. Comparison among various shapes of fuzzy sets

In this section, we test some input membership functions, including the triangular, trapezoidal, Gaussian, quadratic, Laplace, π, H4, Fejér, Cauchy, and sinc set functions, in order to show the relationships between the approximating capabilities of fuzzy systems and the shapes of the input membership functions. The definitions of these candidate functions are listed as follows:

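(1) Triangular set function

$$\mu_1(x) = \begin{cases} x + 1, & -1 \le x < 0, \\ -x + 1, & 0 \le x \le 1, \\ 0, & |x| > 1. \end{cases} \tag{33}$$

(2) Trapezoidal set function

$$\mu_2(x) = \begin{cases} 2x + 2, & -1 \le x \le -0.5, \\ 1, & -0.5 < x < 0.5, \\ -2x + 2, & 0.5 \le x \le 1, \\ 0, & |x| > 1. \end{cases} \tag{34}$$

(3) Gaussian set function

$$\mu_3(x) = \exp\!\left( -\frac{1}{2} x^2 \right). \tag{35}$$

(4) Quadratic set function

$$\mu_4(x) = \begin{cases} 1 - x^2, & -1 \le x \le 1, \\ 0, & |x| > 1. \end{cases} \tag{36}$$

(5) Laplace set function

$$\mu_5(x) = \exp(-|x|). \tag{37}$$

(6) π set function

$$\mu_6(x) = \begin{cases} 0, & x < -\frac{3}{2}, \\ \frac{8}{9} \left( x + \frac{3}{2} \right)^2, & -\frac{3}{2} \le x \le -\frac{3}{4}, \\ 1 - \frac{8}{9} x^2, & -\frac{3}{4} < x \le \frac{3}{4}, \\ \frac{8}{9} \left( x - \frac{3}{2} \right)^2, & \frac{3}{4} < x \le \frac{3}{2}, \\ 0, & x > \frac{3}{2}. \end{cases} \tag{38}$$

(7) H4 set function

$$\mu_7(x) = \begin{cases} \frac{9}{8} \left( 1 - \frac{5}{3} x^2 \right), & |x| \le 1, \\ 0, & |x| > 1. \end{cases} \tag{39}$$

(8) Fejér set function

$$\mu_8(x) = \begin{cases} \left[ \dfrac{\sin(x/2)}{x/2} \right]^2, & x \ne 0, \\ 1, & x = 0. \end{cases} \tag{40}$$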

(9) Cauchy set function

$$\mu_9(x) = \frac{1}{1 + x^2}. \tag{41}$$

(10) Sinc set function

$$\mu_{10}(x) = \begin{cases} \dfrac{\sin(x)}{x}, & x \ne 0,\\ 1, & x = 0. \end{cases} \tag{42}$$

Fig. 4. The candidate input membership functions (panels a–j).

Fig. 4 shows the graphs of the candidate functions. It is easy to see that these membership functions all satisfy the following conditions:

(1) $0 \le \mu(x) \le 1$ for all $x \in \mathbb{R}$;
(2) $\mu \in L^1$ and $\int_{\mathbb{R}} \mu(x)\,dx \ne 0$;
(3) $\mu$ is symmetric.
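The ten candidates (33)–(42) can be written down directly as plain functions. The sketch below is illustrative only: the function names and the helper `is_symmetric` are introduced here and are not part of the original paper, and only the symmetry condition (3) is checked numerically on a finite grid.

```python
import math

def triangular(x):          # (33)
    return max(0.0, 1.0 - abs(x))

def trapezoidal(x):         # (34)
    a = abs(x)
    if a < 0.5:
        return 1.0
    return max(0.0, 2.0 - 2.0 * a)

def gaussian(x):            # (35)
    return math.exp(-0.5 * x * x)

def quadratic(x):           # (36)
    return 1.0 - x * x if abs(x) <= 1.0 else 0.0

def laplace(x):             # (37)
    return math.exp(-abs(x))

def pi_shaped(x):           # (38), written in terms of |x| since the shape is even
    a = abs(x)
    if a <= 0.75:
        return 1.0 - (8.0 / 9.0) * a * a
    if a <= 1.5:
        return (8.0 / 9.0) * (a - 1.5) ** 2
    return 0.0

def h4(x):                  # (39)
    return (9.0 / 8.0) * (1.0 - (5.0 / 3.0) * x * x) if abs(x) <= 1.0 else 0.0

def fejer(x):               # (40)
    return 1.0 if x == 0.0 else (math.sin(x / 2.0) / (x / 2.0)) ** 2

def cauchy(x):              # (41)
    return 1.0 / (1.0 + x * x)

def sinc(x):                # (42)
    return 1.0 if x == 0.0 else math.sin(x) / x

CANDIDATES = [triangular, trapezoidal, gaussian, quadratic, laplace,
              pi_shaped, h4, fejer, cauchy, sinc]

def is_symmetric(mu, grid, tol=1e-12):
    """Check mu(x) == mu(-x) on a finite grid (a numerical check, not a proof)."""
    return all(abs(mu(x) - mu(-x)) <= tol for x in grid)

grid = [k / 100.0 for k in range(0, 501)]
symmetry = {mu.__name__: is_symmetric(mu, grid) for mu in CANDIDATES}
```

Writing the piecewise shapes in terms of `abs(x)` makes the symmetry explicit and keeps each definition to a few lines.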

Table 1 lists the validation results of the above candidates for the conditions in Theorems 2 and 3. Since all the candidates are symmetric functions, $I(u;x;\sigma_n) = x$ and $\beta(n;\sigma_n) = 0$. By Theorem 2, the uniform approximation rates for the fuzzy systems with $\mu_1$–$\mu_6$ as the input membership functions can be estimated. These six candidates do not differ in the orders of their uniform approximation rates. Given the same target function $f$ and the same distribution function $H(x)$, the uniform approximation rates of all candidates are determined by $O(\omega(C(U); fh; \delta(\gamma,\beta)))$, where $\delta(\gamma,\beta) \simeq O(\sigma_n)$.

Although both the first- and second-order absolute moments of $\mu_7$ exist and are finite, it is not a positive kernel, i.e., the conditions of Theorem 2 do not hold for $\mu_7$. However, $\mu_7$ is symmetric and $m(\mu_7;\alpha)$ $(0 < \alpha \le 2)$ exists and is finite, and Table 1 shows that both the first- and second-order absolute moments exist and are finite for $\mu_1$–$\mu_6$. Therefore, for any $f \in \mathrm{Lip}^*(C(U);\alpha)$, by Corollary 2, the uniform approximation rates of the fuzzy systems with input membership functions having any one of the shapes $\mu_1$–$\mu_7$ are $O[(\log n/n)^{\alpha/(2(1+\alpha))}]$.

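Table 1
The candidates in this table are all normalized to $NL^1$, i.e., $\frac{1}{\sigma_n}\mu\left(\frac{x}{\sigma_n}\right)$

| $\mu$ | $I(u^2;x;\sigma_n)$ | $\gamma(n;\sigma_n)$ | $\delta(\gamma,\beta)$ | $m(\mu;1)$ | $m(\mu;2)$ |
|---|---|---|---|---|---|
| $\mu_1$ | $\sigma_n^2/6 + x^2$ | $\sigma_n^2/6$ | $\sigma_n/\sqrt{6}$ | $\sigma_n/3$ | $\sigma_n^2/6$ |
| $\mu_2$ | $5\sigma_n^2/24 + x^2$ | $5\sigma_n^2/24$ | $\sqrt{5/24}\,\sigma_n$ | $7\sigma_n/18$ | $5\sigma_n^2/24$ |
| $\mu_3$ | $\sigma_n^2 + x^2$ | $\sigma_n^2$ | $\sigma_n$ | $\sigma_n\sqrt{2/\pi}$ | $\sigma_n^2$ |
| $\mu_4$ | $\sigma_n^2/5 + x^2$ | $\sigma_n^2/5$ | $\sigma_n/\sqrt{5}$ | $3\sigma_n/8$ | $\sigma_n^2/5$ |
| $\mu_5$ | $2\sigma_n^2 + x^2$ | $2\sigma_n^2$ | $\sqrt{2}\,\sigma_n$ | $\sigma_n$ | $2\sigma_n^2$ |
| $\mu_6$ | $9\sigma_n^2/32 + x^2$ | $9\sigma_n^2/32$ | $\sqrt{9/32}\,\sigma_n$ | $7\sigma_n/16$ | $9\sigma_n^2/32$ |
| $\mu_7$ | $x^2$ | $0$ | $0$ | $39\sigma_n/80$ | $9\sqrt{3/5}\,\sigma_n^2/25$ |
| $\mu_8$ | – | – | – | – | – |
| $\mu_9$ | – | – | – | – | – |
| $\mu_{10}$ | – | – | – | – | – |

"–" denotes that the integral does not exist.

Table 1 also shows that even the first-order absolute moment of $\mu_8$ or $\mu_9$ does not exist; however, for any $0 < \alpha < 1$, the $\alpha$th-order absolute moments of $\mu_8$ and $\mu_9$ exist, and

$$m(\mu_8;\alpha) = -4\sigma_n^{\alpha}\,\Gamma(-1+\alpha)\sin\left(\frac{\alpha\pi}{2}\right),$$

$$m(\mu_9;\alpha) = \sigma_n^{\alpha}\sec\left(\frac{\alpha\pi}{2}\right),$$

where the Gamma function is $\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t}\,dt$. The moments $m(\mu;\alpha)$ $(0 < \alpha < 1)$ of the candidates $\mu_1$–$\mu_7$ are listed below:

$$m(\mu_1;\alpha) = \frac{2\sigma_n^{\alpha}}{2 + 3\alpha + \alpha^2},$$

$$m(\mu_2;\alpha) = \frac{2^{-\alpha}(-1 + 2^{2+\alpha})\sigma_n^{\alpha}}{2 + 3\alpha + \alpha^2},$$

$$m(\mu_3;\alpha) = 2^{\frac{1}{2}+\frac{\alpha}{2}}\,\Gamma\left(\frac{1+\alpha}{2}\right)\sigma_n^{\alpha},$$

$$m(\mu_4;\alpha) = \frac{4\sigma_n^{\alpha}}{3 + 4\alpha + \alpha^2},$$

$$m(\mu_5;\alpha) = 2\,\Gamma(1+\alpha)\,\sigma_n^{\alpha},$$

$$m(\mu_6;\alpha) = \frac{3^{1+\alpha}4^{-\alpha}(-1 + 2^{2+\alpha})\sigma_n^{\alpha}}{6 + 11\alpha + 6\alpha^2 + \alpha^3},$$

$$m(\mu_7;\alpha) = \frac{\left(-6 + 2\cdot 3^{\frac{5+\alpha}{2}}\,5^{\frac{-1-\alpha}{2}} + 3\alpha\right)\sigma_n^{\alpha}}{2(3 + 4\alpha + \alpha^2)}.$$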
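A quick numerical cross-check of some of these closed forms is sketched below. It assumes $\sigma_n = 1$ and uses the triangular, quadratic, and Laplace shapes exactly as defined in (33), (36), and (37), without any additional normalization; that reading of the formulas is an assumption made here. The helper `abs_moment` and the truncation limits are illustrative choices, not taken from the paper.

```python
import math

def abs_moment(mu, alpha, upper, steps=200_000):
    """Midpoint-rule estimate of the integral of |x|^alpha * mu(x) over
    [-upper, upper] for a symmetric mu (twice the integral over [0, upper])."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += (x ** alpha) * mu(x)
    return 2.0 * total * h

def triangular(x):
    return max(0.0, 1.0 - abs(x))

def quadratic(x):
    return 1.0 - x * x if abs(x) <= 1.0 else 0.0

def laplace(x):
    return math.exp(-abs(x))

alpha = 0.5  # any value in (0, 1) can be tried here

# Closed forms from the text, evaluated at sigma_n = 1.
closed_triangular = 2.0 / (2.0 + 3.0 * alpha + alpha ** 2)
closed_quadratic = 4.0 / (3.0 + 4.0 * alpha + alpha ** 2)
closed_laplace = 2.0 * math.gamma(1.0 + alpha)

# Numerical estimates; the Laplace tail beyond 40 is negligible.
num_triangular = abs_moment(triangular, alpha, upper=1.0)
num_quadratic = abs_moment(quadratic, alpha, upper=1.0)
num_laplace = abs_moment(laplace, alpha, upper=40.0)

for name, closed, num in [("triangular", closed_triangular, num_triangular),
                          ("quadratic", closed_quadratic, num_quadratic),
                          ("laplace", closed_laplace, num_laplace)]:
    print(f"{name}: closed form {closed:.6f}, numerical {num:.6f}")
```

The midpoint rule is used because it avoids evaluating $|x|^{\alpha}$ at the origin, where the integrand has a cusp.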

It is clear that the only differences in the $\alpha$th-order absolute moments of the candidates are the coefficients of $\sigma_n^{\alpha}$. For any $f \in \mathrm{Lip}^*(C(U);\alpha)$, by Corollary 2, the uniform approximation rate is $O[(\log n/n)^{\alpha/(2(1+\alpha))}]$ for all $0 < \alpha < 1$.

However, the sinc function is very different from the others: the $\alpha$th absolute moment of $\mu_{10}$ does not exist for any $\alpha > 0$; therefore, the above theorems cannot be applied. Nevertheless, as presented by Mitaim and Kosko in [19], this kind of set function appears to have some advantages in function approximation. Next, we prove that the sinc functions that serve as input membership functions have the almost best approximation property in a certain class of kernels.

Let $X$ denote the set of kernels satisfying the following conditions:

(1) $\mu$ is continuous and $\mathrm{supp}(\mu) \subset U$;
(2) for some $0 < \alpha \le 2$, $m(\mu;\alpha)$ exists and is finite;
(3) $\mu$ is symmetric;
(4) the inverse Fourier transform of $\mu$ exists and is denoted by $\mu^{\vee}$.

Choose $\mu \in X$ to be the input membership function and let the fuzzy system $F_n(x;\mu)$ be constructed as in (8). For any $f \in C(U)$, Theorem 1 holds, i.e., $F_n(x;\mu)$ is a universal approximator on $C(U)$. Let $h(x)$ be the uniform density function; for any $f \in \mathrm{Lip}^*(C(U);\alpha)$, we can obtain

$$\|F_n(x;\mu) - f(x)\|_\infty \le \frac{C_1}{\varepsilon_0\sigma_n}(\log n/n)^{1/2} + \frac{1}{\varepsilon_0}\cdot O(\sigma_n^{\alpha}), \quad \text{a.s.} \tag{43}$$

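Let

$$I_\mu(f;x;\sigma_n) \equiv \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x-u)\,\frac{1}{\sigma_n}\,\mu\left(\frac{u}{\sigma_n}\right)du = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f^{\wedge}(v)\,e^{ixv}\,\mu^{\vee}(\sigma_n v)\,dv, \tag{44}$$

$$S(f;x;\sigma_n) \equiv \frac{1}{\pi}\int_{-\infty}^{\infty} f(x-u)\,\frac{\sin(u/\sigma_n)}{u}\,du = \frac{1}{\sqrt{2\pi}}\int_{-\sigma_n^{-1}}^{\sigma_n^{-1}} f^{\wedge}(v)\,e^{ixv}\,dv, \tag{45}$$

$$E_n(C(U);f) = \inf_{\mu \in X}\|I_\mu(f;x;\rho) - f(x)\|_\infty. \tag{46}$$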

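Theorem 4. For any $f \in \mathrm{Lip}^*(C(U);\alpha)$, there exist constants $C_1$ and $C_2$ such that

$$\|F_n(x;\mu_{10}) - f(x)\|_\infty \le \frac{C_1}{\varepsilon_0\sigma_n}(\log n/n)^{1/2} + \lambda(\sigma_n)\,E_n(C(U);f),$$

where $\lambda(\sigma_n) = |\mathrm{IntSinc}(\sigma_n)| + C_2\sigma_n + 1$.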

Before proving this theorem, we first introduce some lemmas.

Lemma 6. For any $\mu \in X$,

$$S(I_\mu(f;x;\sigma_n);x;\sigma_n) = I_\mu(f;x;\sigma_n).$$


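Proof 4 (Proof of Lemma 6). For any $a > 0$,

$$\left[\sqrt{\frac{2}{\pi}}\,\frac{\sin ax}{x}\right]^{\wedge}(v) = \begin{cases} 1, & |v| < a,\\ \frac{1}{2}, & |v| = a,\\ 0, & |v| > a. \end{cases}$$

Observe that

$$\begin{aligned}
S(I_\mu(f;x;\sigma_n);x;\sigma_n) &= \frac{1}{\pi}\int_{-\infty}^{\infty}\left\{\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{1}{\sigma_n}\,\mu(\sigma_n v)\,f^{\wedge}(v)\,e^{i(x-u)v}\,dv\right\}\frac{\sin(u/\sigma_n)}{u/\sigma_n}\,du\\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left\{\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\sqrt{\frac{2}{\pi}}\,\frac{\sin(u/\sigma_n)}{u}\,e^{-iuv}\,du\right\}\mu(v\sigma_n)\,f^{\wedge}(v)\,e^{ixv}\,dv\\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\chi_{[-\sigma_n^{-1},\,\sigma_n^{-1}]}(v)\,\mu(v\sigma_n)\,f^{\wedge}(v)\,e^{ixv}\,dv\\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\chi_{[-1,1]}(v\sigma_n)\,\mu(v\sigma_n)\,f^{\wedge}(v)\,e^{ixv}\,dv\\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x-u)\,\big[\chi_{[-1,1]}\mu\big]^{\wedge}\left(\frac{u}{\sigma_n}\right)du.
\end{aligned}$$

Since $\mu \in X$, condition (1) holds and $\chi_{[-1,1]}\mu(\cdot) = \mu(\cdot)$; this implies the conclusion. $\square$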

Lemma 7. For any $f \in \mathrm{Lip}^*(C(U);\alpha)$,

$$\|S(f;x;\sigma_n) - f(x)\|_\infty \le \lambda(\sigma_n)\,E_n(C(U);f).$$

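Proof 5 (Proof of Lemma 7). Let $I_{\mu^*}(f;x;\sigma_n)$ be the best singular integral approximation of $f$ with kernels in $X$. In view of Lemma 6,

$$S(I_{\mu^*}(f;x;\sigma_n);x;\sigma_n) = I_{\mu^*}(f;x;\sigma_n).$$

Noting that $U = [-1,1]$, the following inequalities hold:

$$\begin{aligned}
\|S(f;x;\sigma_n) - f(x)\|_\infty &\le \|S(f;x;\sigma_n) - I_{\mu^*}(f;x;\sigma_n)\|_\infty + \|I_{\mu^*}(f;x;\sigma_n) - f(x)\|_\infty\\
&= \|S(f - I_{\mu^*}(f;x;\sigma_n);x;\sigma_n)\|_\infty + \|I_{\mu^*}(f;x;\sigma_n) - f(x)\|_\infty\\
&\le \left\|\frac{1}{\pi\sigma_n}\int_{|u| \le \sigma_n^{-1}}[f - I_{\mu^*}](x-u)\,\frac{\sin(u/\sigma_n)}{u/\sigma_n}\,du\right\|_\infty\\
&\quad + \left\|\frac{1}{\pi\sigma_n}\int_{|u| > \sigma_n^{-1}}[f - I_{\mu^*}](x-u)\,\frac{\sin(u/\sigma_n)}{u/\sigma_n}\,du\right\|_\infty + E_n(C(U);f)\\
&\le E_n(C(U);f)\left|\frac{2}{\pi}\int_0^{\sigma_n^{-1}}\frac{\sin(u/\sigma_n)}{u}\,du\right| + \sup_{x \in U}\left\{\int_{|u| > \sigma_n^{-1}}\big|[f - I_{\mu^*}](x-u)\big|\,\sigma_n\,du\right\} + E_n(C(U);f)\\
&\le E_n(C(U);f)\,|\mathrm{IntSinc}(\sigma_n)| + \|f - I_{\mu^*}\|_1\,\sigma_n + E_n(C(U);f)\\
&\le E_n(C(U);f)\,|\mathrm{IntSinc}(\sigma_n)| + C_2\sigma_n E_n(C(U);f) + E_n(C(U);f)\\
&= \big(|\mathrm{IntSinc}(\sigma_n)| + C_2\sigma_n + 1\big)E_n(C(U);f),
\end{aligned}$$

where $\mathrm{IntSinc}(\sigma_n)$ is bounded on $\mathbb{R}$ and tends to $\pi/2$. Let $\lambda(\sigma_n) = |\mathrm{IntSinc}(\sigma_n)| + C_2\sigma_n + 1$; then, this lemma is proved. $\square$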
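The boundedness and the limit $\pi/2$ quoted for IntSinc match the behaviour of the classical sine integral $\mathrm{Si}(t) = \int_0^t \frac{\sin s}{s}\,ds$. The sketch below assumes that IntSinc is, up to the scaling used in the proof, this sine integral; the function names and the step size are illustrative choices made here.

```python
import math

def sinc(s):
    """sin(s)/s with the removable singularity at s = 0 filled in."""
    return 1.0 if s == 0.0 else math.sin(s) / s

def sine_integral(t, steps_per_unit=200):
    """Composite Simpson estimate of Si(t) = integral of sin(s)/s over [0, t], t > 0."""
    n = max(2, int(math.ceil(t * steps_per_unit)))
    if n % 2:
        n += 1                      # Simpson's rule needs an even number of panels
    h = t / n
    total = sinc(0.0) + sinc(t)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * sinc(k * h)
    return total * h / 3.0

# Si attains its global maximum at t = pi and then oscillates towards pi/2.
si_at_pi = sine_integral(math.pi)
si_far = sine_integral(200.0)
print(si_at_pi, si_far, math.pi / 2.0)
```

Under this reading, the factor $|\mathrm{IntSinc}(\sigma_n)|$ in $\lambda(\sigma_n)$ stays bounded as $\sigma_n \to 0$, so $\lambda(\sigma_n)$ does not blow up.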

Proof 6 (Proof of Theorem 4). Recall the definitions of $I_1$ and $I_2$ in (16). In view of Lemma 7,

$$\|I_2\|_\infty \le \big(|\mathrm{IntSinc}(\sigma_n)| + C_2\sigma_n + 1\big)E_n(C(U);f).$$

Therefore, by (26), we can prove this theorem:

$$\|F_n(x;\mu_{10}) - f(x)\|_\infty \le \|I_1\|_\infty + \|I_2\|_\infty \le \frac{C_1}{\varepsilon_0\sigma_n}(\log n/n)^{1/2} + \lambda(\sigma_n)\,E_n(C(U);f). \qquad \square$$

The above discussion shows that there is no general answer to the following question: What is the best shape of the if-part fuzzy sets in a fuzzy system for function approximation? The relationship between the approximating capability of a fuzzy system and the shape of fuzzy sets involves both the continuity condition of a target function and the existence condition of the absolute moment of input membership functions. However, the sinc functions that serve as input membership functions have some advantages in a certain class of membership functions for function approximation.

7. Simulation results

This section constructs the fuzzy systems with different set functions to approximate different target functions (see Fig. 5). Set $\delta_0 = 0.1$; then, the $\delta_0$ subinterval is $[-0.9, 0.9]$. We uniformly sampled 101 points of the target function $f$ on $U$ to give the training data $\{x_i, y_i\}_{i=1}^{101}$, where $y_i = f(x_i)$. We then finely sampled 1001 points to obtain the test data for each target function. The approximating capabilities of the fuzzy systems were evaluated in terms of the maximum approximation error on the test data. The fuzzy systems were constructed as in (8) directly from the training data, and no training process was needed. The target functions are listed below:

$$f_1(x) = 5x + 1,$$

$$f_2(x) = 8x\sin(10x^2 + 5x + 1) + 9,$$

$$f_3(x) = 10e^{-(|x|/0.2)} + e^{-|x-0.8|/0.3} + e^{-|x+0.6|/0.1},$$

$$f_4(x) = 3x(x-1)(x-1.9)(x-0.7)(x+1.8) + 1,$$

$$f_5(x) = 1 + 10e^{-100(x-0.7)^2}\,\frac{\sin(125/(x+1.5))}{x + 0.1}.$$

Fig. 5. The target functions (panels a–e).
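The experiment for $f_1$ can be sketched in a few lines. The code below assumes that (8) has the same normalized-kernel form as the MISO system (51), i.e. a weighted average of the training outputs with weights $\mu((x - x_i)/\sigma_n)$; the function names are introduced here for illustration, and the triangular shape and the bandwidth $\sigma_n = 0.1(\log n/n)^{1/4}$ follow the description in this section.

```python
import math

def triangular(x):
    return max(0.0, 1.0 - abs(x))

def fuzzy_system(x, xs, ys, sigma, mu=triangular):
    """Normalized kernel average built directly from the training data
    (assumed form of (8)); no training step is involved."""
    weights = [mu((x - xi) / sigma) for xi in xs]
    denom = sum(weights)
    if denom == 0.0:
        raise ValueError("no rule fires at x; sigma is too small for this data")
    return sum(w * y for w, y in zip(weights, ys)) / denom

def f1(x):
    return 5.0 * x + 1.0

# 101 uniformly spaced training points on U = [-1, 1].
n = 101
xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
ys = [f1(x) for x in xs]

# Bandwidth used for f1 in the text (alpha = 1).
sigma = 0.1 * (math.log(n) / n) ** (1.0 / (2.0 * (1.0 + 1.0)))

# 1001 test points, scored on the delta_0 subinterval [-0.9, 0.9].
test_points = [-1.0 + 2.0 * i / 1000 for i in range(1001)]
interior = [x for x in test_points if abs(x) <= 0.9]
max_err = max(abs(fuzzy_system(x, xs, ys, sigma) - f1(x)) for x in interior)
print(f"sigma_n = {sigma:.4f}, max error on [-0.9, 0.9] = {max_err:.4f}")
```

Swapping `triangular` for another shape, or `f1` for another target, reproduces the structure of the comparison; the exact error values depend on details of the sampling that this sketch only approximates.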

Fig. 6. Approximation results of the target function $f_1$ by the kernel-type estimator and the fuzzy system with triangular membership functions (curves: target function, fuzzy system, kernel-type estimator, density function). The density function $h(x)$, which is determined by the denominator of the fuzzy system, is indicated with "–x–".

Since any $f \in C(U)$ can be considered as a piecewise linear function if the pieces of the intervals are sufficiently small, the fuzzy systems were first used to approximate a linear function. For the given target function $f_1$, let $\sigma_n = 0.1(\log n/n)^{1/(2(1+1))}$. Fig. 6 shows that the fuzzy system with triangular membership functions approximates $f_1$ very well on $[-0.9, 0.9]$, and the maximum error on the test data is 0.0044. Meanwhile, the kernel-type estimator is almost equal to $f_1(x)/2$ for any $x \in [-0.9, 0.9]$, because, by Theorem 1, the kernel-type estimator approximates not the target function but the target function multiplied by the density function. Here, the density function $h(x)$, which is determined by the denominator of a fuzzy system, approximates the uniform density function on $[-1,1]$, i.e., $h(x) \approx 1/2$. This explains why the denominators of fuzzy systems are necessary and important for function approximation. Fig. 7 illustrates that the error level of $f_1$ approximated by the fuzzy system with triangular membership functions is stable. In view of Theorems 2 and 3, this is determined by the modulus of continuity of $f_1$. If $U$ is partitioned into pieces, the moduli of continuity of $f_1$ are the same in all the pieces; hence, the error level is stable. Fig. 8 plots the maximum approximation error of $f_1$, approximated by the fuzzy systems with triangular membership functions, against the size $n$ of the training data. By Theorem 2, the rate at which the maximum error tends to zero has the order of $\sigma_n/\sqrt{6}$ (up to some constant factor). Comparing Fig. 8 with Fig. 9, we can see that these two curves tend to zero at the same order of speed in terms of $n$.

Fig. 7. Approximation errors of $f_1$ by the fuzzy system with triangular membership functions.

Fig. 8. The maximum approximation errors of $f_1$ against the size ($n$) of the random sample by the fuzzy system with triangular membership functions.

Fig. 9. The graph of $\sigma_n/\sqrt{6}$.
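The role of the denominator can be illustrated numerically. The sketch below assumes that the kernel-type estimator is the unnormalized sum $a_n(x) = \frac{1}{n\sigma_n}\sum_i y_i\,\mu((x - x_i)/\sigma_n)$ and that the denominator is $g_n(x) = \frac{1}{n\sigma_n}\sum_i \mu((x - x_i)/\sigma_n)$; these definitions and all names are assumptions introduced here, chosen to be consistent with the ratio form $F_n = a_n/g_n$.

```python
import math

def triangular(x):
    return max(0.0, 1.0 - abs(x))

def kernel_sums(x, xs, ys, sigma, mu=triangular):
    """Return (a_n(x), g_n(x)): the assumed kernel-type estimator and the
    assumed density estimate formed by the denominator of the fuzzy system."""
    n = len(xs)
    weights = [mu((x - xi) / sigma) for xi in xs]
    a_n = sum(w * y for w, y in zip(weights, ys)) / (n * sigma)
    g_n = sum(weights) / (n * sigma)
    return a_n, g_n

def f1(x):
    return 5.0 * x + 1.0

n = 101
xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
ys = [f1(x) for x in xs]
sigma = 0.1 * (math.log(n) / n) ** 0.25

for x in (-0.9, -0.5, 0.0, 0.5, 0.9):
    a_n, g_n = kernel_sums(x, xs, ys, sigma)
    # a_n tracks f1(x) * h(x) with h(x) close to 1/2; the ratio recovers f1(x).
    print(f"x={x:+.1f}  a_n={a_n:+.4f}  f1/2={f1(x) / 2:+.4f}  "
          f"g_n={g_n:.4f}  a_n/g_n={a_n / g_n:+.4f}")
```

Dividing by `g_n` is what removes the factor $h(x)$, which is the point made in the paragraph above.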

For target function $f_2$, Fig. 10 illustrates that the fuzzy system approximates $f_2$ very well, while the kernel-type estimator takes half the value of $f_2(x)$ for any $x \in U$, since $h(x)$ approximates the uniform density function on $U$. Furthermore, Fig. 11 makes the role of the modulus of continuity clearer: the approximation error level is low when the modulus of continuity of $f_2$ in the piece is small; conversely, the approximation error level is high when the modulus is large.

Fig. 10. Approximation results of the target function $f_2$ by the kernel-type estimator and the fuzzy system with triangular membership functions (curves: target function, fuzzy system, kernel-type estimator, density function).

Fig. 11. Approximation errors of $f_2$ by the fuzzy system with triangular membership functions.

Table 2
The maximum error on the test data for each target function, which was approximated by the fuzzy systems with different input membership functions

| | $\mu_1$ | $\mu_2$ | $\mu_3$ | $\mu_4$ | $\mu_5$ | $\mu_6$ | $\mu_7$ | $\mu_8$ | $\mu_9$ | $\mu_{10}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| $f_1$ | 0.0044 | 0.0044 | 0.0053 | 0.0072 | 0.0372 | 0.0006 | 0.0807 | 0.1105 | 5.0700 | 0.0088 |
| $f_2$ | 0.5355 | 0.6824 | 2.4853 | 0.6745 | 3.0327 | 0.8585 | 1.9314 | 3.1681 | 2.3945 | 0.6376 |
| $f_3$ | 0.6522 | 0.8137 | 1.5768 | 0.7857 | 1.8115 | 0.8890 | 0.5920 | 2.2725 | 1.8536 | 0.5412 |
| $f_4$ | 0.0307 | 0.0314 | 0.0532 | 0.0480 | 0.2567 | 0.0148 | 0.5192 | 0.6020 | 0.6465 | 0.1949 |
| $f_5$ | 1.7340 | 2.1578 | 5.9294 | 2.1220 | 6.2744 | 2.6597 | 2.9575 | 7.2126 | 5.0700 | 1.5251 |

In view of the theorems we have already obtained, $\alpha$ is determined by the target function and the absolute moment of $\mu$. Here, to compare the different candidates, we choose $\alpha = 1$ for $\mu_1$–$\mu_7$, $\alpha = 0.1$ for $\mu_8$ and $\mu_9$, and $\alpha = 0.05$ for $\mu_{10}$. Table 2 lists the simulation results. For most target functions, $\mu_1$ and $\mu_{10}$ have some advantages over the other candidates; however, it is difficult to say which one is the best shape of the input membership functions for function approximation by the fuzzy systems.

8. High-dimensional result

In this section, the main result obtained above is extended to a higher dimension. Since the result of Lemma 4 is only for $f$ defined on $\mathbb{R}$, Theorem 2 is extended to dimension $p$ by introducing the definition of the Lipschitz norm instead of employing Lemma 4.

Definition 7. For any real-valued function $h(x)$ on $\mathbb{R}^p$, define the bounded Lipschitz norm of $h$ as follows:

$$\|h\|_{BL} = \sup_{x \in \mathbb{R}^p}|h(x)| + \sup_{x \ne y}\frac{|h(x) - h(y)|}{\|x - y\|}. \tag{47}$$

Suppose that the kernels $\mu(u;\sigma)$ satisfy the following conditions:

(1) $\mu(u;\sigma)$ is of bounded variation on $\mathbb{R}^p$, and for any $\sigma > 0$,

$$\int_{\mathbb{R}^p}\mu(u;\sigma)\,du = 1; \tag{48}$$

(2) $\mu(u;\sigma)$ $(\sigma > 0)$ is an approximation identity kernel, i.e., there exists some constant $M > 0$ such that

$$\int_{\mathbb{R}^p}\|u\|\,\mu(u;\sigma)\,du \le M, \tag{49}$$

$$\lim_{\|u\| \to \infty}\|u\|\,\mu(u;\sigma) = 0. \tag{50}$$

For the multi-input–single-output (MISO) case, define the fuzzy system as follows:

$$F_n(x) = \sum_{i=1}^{n}\left[\frac{\mu_n(x - x_i)}{\sum_{j=1}^{n}\mu_n(x - x_j)}\right]y_i, \tag{51}$$

where $x \in U \subset \mathbb{R}^p$ and $\{y_i\}_{i=1}^{n}$ are the corresponding outputs of $\{x_i\}_{i=1}^{n}$.
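The MISO system (51) can be sketched as a weighted average over training points in $\mathbb{R}^p$. The example below makes several assumptions that are not in the original: a radial Gaussian shape for $\mu_n$ (its normalizing constant cancels in the ratio), a hypothetical linear target on $[-1,1]^2$, and a regular 21 by 21 training grid. The bandwidth follows the choice $\sigma_n = n^{-1/(2(p+1))}$ of Theorem 5.

```python
import math

def gaussian_kernel(u, sigma):
    """Radial Gaussian shape on R^p (an assumed choice of kernel)."""
    sq_norm = sum(c * c for c in u)
    return math.exp(-0.5 * sq_norm / (sigma * sigma))

def miso_fuzzy_system(x, xs, ys, sigma, kernel=gaussian_kernel):
    """Evaluate (51): a normalized kernel-weighted average of the outputs."""
    weights = [kernel([a - b for a, b in zip(x, xi)], sigma) for xi in xs]
    denom = sum(weights)
    if denom == 0.0:
        raise ValueError("all weights vanished; increase sigma")
    return sum(w * y for w, y in zip(weights, ys)) / denom

def target(x):
    """Hypothetical target function, used only for illustration."""
    return x[0] + 2.0 * x[1]

p = 2
axis = [-1.0 + 0.1 * i for i in range(21)]
xs = [(a, b) for a in axis for b in axis]       # n = 441 training inputs
ys = [target(x) for x in xs]
n = len(xs)
sigma = n ** (-1.0 / (2.0 * (p + 1)))           # bandwidth from Theorem 5

center_value = miso_fuzzy_system((0.0, 0.0), xs, ys, sigma)
query = (0.1, -0.2)
query_value = miso_fuzzy_system(query, xs, ys, sigma)
print(center_value, query_value, target(query))
```

With so few samples the theorem's bandwidth is fairly wide, so a small bias near the boundary of $U$ is expected; the sketch is meant to show the structure of (51), not to reproduce any reported number.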

Theorem 5. Suppose that $f$ is a given function with $\|f\|_{BL} < \infty$ and the fuzzy system $F_n$ is constructed as in (51). If $\sigma_n = n^{-1/(2(p+1))}$, then

$$\|F_n(x) - f(x)\|_\infty \le O\big[(\log\log n)^{1/2}\,n^{-1/(2(p+1))}\big], \quad \text{a.s.} \tag{52}$$


Proof 7 (Proof of Theorem 5). Recalling the proof of the theorems for the SISO case, we can split the question into two sub-questions by simply extending Definitions (9)–(13) to their high-dimensional versions. Define

$$H_n(x) = n^{-1}\sum_{k=1}^{n} I_{(-\infty,x]}(x_k), \tag{53}$$

where $(-\infty,x] = \{u \in U \mid u_1 \le x_1, \ldots, u_p \le x_p\}$. Then, the fuzzy system $F_n(x)$ can be rewritten in the form $F_n(x) = \frac{a_n(x)}{g_n(x)}$, and $F(x) = \frac{E[a_n(x)]}{E[g_n(x)]}$. Therefore,

$$\|F_n(x) - f(x)\|_\infty \le \|F_n(x) - F(x)\|_\infty + \|F(x) - f(x)\|_\infty. \tag{54}$$

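Observe that

$$\begin{aligned}
\sup_x |a_n(x) - E[a_n(x)]| &= \sup_{x \in U}\sigma_n^{-p}\left|\int_{\mathbb{R}^p} f(u)\,\mu_n(x - u)\,d\big(H_n(u) - H(u)\big)\right| \le M_{0p}\,\sigma_n^{-p}\sup_x |H_n(x) - H(x)|\\
&\le M_{0p}\,n^{-1/2}(\log\log n)^{1/2}\sigma_n^{-p}, \quad \text{a.s.},
\end{aligned} \tag{55}$$

where $M_{0p}$ is the total variation of $f \cdot \mu$ on $U$. The first inequality follows after integration by parts, and the last inequality can be found in [33, Chapter 3].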

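Similarly, we have

$$\sup_x |g_n(x) - E[g_n(x)]| \le M_{1p}\,n^{-1/2}(\log\log n)^{1/2}\sigma_n^{-p}, \quad \text{a.s.}, \tag{56}$$

where $M_{1p}$ is the total variation of $\mu$. Together with (55) and (56), on the closed subset $U \subset \mathbb{R}^p$, the following inequality holds:

$$\|F_n(x) - F(x)\|_\infty \le \frac{C_{1p}}{\varepsilon_0^{p}\,\sigma_n^{p}\,n^{1/2}}(\log\log n)^{1/2} \ \ \text{a.s.} = O\big[n^{-1/2}(\log\log n)^{1/2}\sigma_n^{-p}\big]. \tag{57}$$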

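On the other hand,

$$\begin{aligned}
\sup_x |E[a_n(x)] - fh(x)| &= \sup_{x \in U}\left|\int_{\mathbb{R}^p}\mu(u)\,[fh(x - \sigma_n u) - fh(x)]\,du\right|\\
&\le |\sigma_n|\left|\int_{\mathbb{R}^p}\|u\|\,\mu(u)\left\{\frac{fh(x - \sigma_n u) - fh(x)}{\sigma_n\|u\|}\right\}du\right| \le \sigma_n M\,\|fh\|_{BL}.
\end{aligned} \tag{58}$$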

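For the same reason, we also have

$$\sup_x |E[g_n(x)] - h(x)| \le \sigma_n M\,\|h\|_{BL}. \tag{59}$$

In consequence of (58) and (59), we obtain

$$\|F(x) - f(x)\|_\infty \le \frac{\sigma_n M}{\varepsilon_0^{p}}\big\{\|fh\|_{BL} + C_{2p}\|h\|_{BL}\big\} = O(\sigma_n), \tag{60}$$

where $C_{2p}$ is the bound of $|f(x)|$ on $U$. Since $\sigma_n = n^{-1/(2(p+1))}$, it follows from (54), (57), and (60) that

$$\|F_n(x) - f(x)\|_\infty \le O\big[n^{-1/2}(\log\log n)^{1/2}\sigma_n^{-p}\big] + O(\sigma_n) = O\big[(\log\log n)^{1/2}\,n^{-1/(2(p+1))}\big], \quad \text{a.s.} \qquad \square \tag{61}$$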
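The last equality in (61) is a matter of exponent bookkeeping, which the following sketch spells out (constants suppressed):

```latex
\begin{align*}
\sigma_n = n^{-\frac{1}{2(p+1)}}
\;&\Longrightarrow\; \sigma_n^{-p} = n^{\frac{p}{2(p+1)}},\\
n^{-1/2}\,\sigma_n^{-p} &= n^{-\frac{1}{2}+\frac{p}{2(p+1)}} = n^{-\frac{1}{2(p+1)}},\\
n^{-1/2}(\log\log n)^{1/2}\sigma_n^{-p} + \sigma_n
&= \big[(\log\log n)^{1/2} + 1\big]\,n^{-\frac{1}{2(p+1)}}.
\end{align*}
```

For $p = 1$ this gives the order $(\log\log n)^{1/2}n^{-1/4}$, and the exponent $1/(2(p+1))$ shrinks as the input dimension $p$ grows.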

9. Conclusions

In this paper, we have systematically discussed the kernel shapes of the fuzzy sets in the fuzzy systems for function approximation. For a fixed kernel shape, we have proved that the fuzzy systems generated by this kernel are universal approximators and have estimated the uniform approximation rates of these fuzzy systems. Then, we have developed the relationship between the approximating capability of the fuzzy systems and the kernel shape of the fuzzy sets. These results indicate that the approximating capability of the fuzzy systems involves both the continuity of a target function and the existence of the $\alpha$th absolute moment of an input membership function.


Furthermore, we have obtained the almost best approximation property of a sinc function as an input membership function, which suggests that the input membership functions of the fuzzy systems are not necessarily positive for function approximation. Hence, we should consider more shapes of fuzzy sets rather than only the traditional ones in order to investigate whether a given fuzzy system can be improved. Since only the kernel shapes have been considered in this paper, we will continue the search for the best shape of the fuzzy sets in fuzzy systems for function approximation.

In addition, the input membership functions of a given fuzzy system such as (8) are generated by a fixed kernel; hence, the relationship between the shape of the input membership functions and the approximating capability of the fuzzy system can be clearly analyzed. However, for an adaptive fuzzy system, the input membership functions can adapt their shapes according to the training data, i.e., the input membership functions in an adaptive fuzzy system can have various shapes instead of a fixed one. A further topic is how to estimate the function approximation rates for adaptive fuzzy systems.

Acknowledgements

We are very grateful to Professor Witold Pedrycz, Professor Bart Kosko, and the reviewers for their helpful comments. We would also like to thank Dr. Hui Li for her particular help with our English writing. Finally, we especially thank Professor Puyin Liu for his kind help and valuable advice on this study before he passed away in September 2005.

References

[1] J.J. Buckley, Sugeno type controllers are universal controllers, Fuzzy Sets and Systems 53 (1993) 299–304.
[2] P. Bauer, E.P. Klement, A. Leikermoser, B. Moser, Modeling of control functions by fuzzy controllers, in: H. Nguyen, M. Sugeno, R. Tong, R.R. Yager (Eds.), Theoretical Aspects of Fuzzy Control, Wiley, New York, 1995, pp. 91–116.
[3] P.L. Butzer, R.J. Nessel, Fourier Analysis and Approximation, Vol. 1, Birkhäuser Press, West Germany, 1971.
[4] R.J.G.B. Campello, W. Caradori do Amaral, Hierarchical fuzzy relational models: linguistic interpretation and universal approximation, IEEE Transactions on Fuzzy Systems 14 (3) (2006) 446–453.
[5] J.L. Castro, Fuzzy logic controllers are universal approximators, IEEE Transactions on SMC 25 (1995) 629–635.
[6] S.G. Cao, N.W. Rees, G. Feng, Universal fuzzy controllers for a class of nonlinear systems, Fuzzy Sets and Systems 122 (2001) 117–123.
[7] R.A. DeVore, Nonlinear approximation, Acta Numerica 7 (1998) 51–150.
[8] A. Dvoretzky, J. Kiefer, J. Wolfowitz, Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator, Annals of Mathematical Statistics 27 (1956) 642–669.
[9] H.S. Ding, J.Q. Mao, Development of approximation theory of fuzzy systems, Journal of System Simulation 18 (8) (2006) 2061–2066.
[10] Y.S. Ding, H. Ying, S.H. Shao, Necessary conditions on minimal system configuration for general MISO Mamdani fuzzy systems as universal approximators, IEEE Transactions on SMC, Part B 30 (6) (2000) 857–864.
[11] R. Hassine, F. Karray, A.M. Alimi, M. Selmi, Approximation properties of fuzzy systems for smooth functions and their first-order derivative, IEEE Transactions on SMC, Part A 33 (2) (2003) 160–168.
[12] H.M. Kim, J.M. Mendel, Fuzzy basis functions: comparisons with other basis functions, IEEE Transactions on Fuzzy Systems 3 (2) (1995) 158–168.
[13] E.P. Klement, L.T. Koczy, B. Moser, Are fuzzy systems universal approximators, International Journal of General Systems 28 (2–3) (1999) 259–282.
[14] L.T. Koczy, A. Zorat, Fuzzy systems and approximation, Fuzzy Sets and Systems 85 (1997) 203–222.
[15] B. Kosko, Fuzzy systems as universal approximators, in: Proceedings of the IEEE International Conference on Fuzzy Systems, San Diego, 1992, pp. 1153–1162.
[16] B. Kosko, Fuzzy systems as universal approximators, IEEE Transactions on Computers 43 (11) (1994) 1329–1333.
[17] B. Kosko, Fuzzy Engineering, Prentice-Hall, Englewood Cliffs, NJ, 1996.
[18] S. Mitaim, B. Kosko, What is the best shape for a fuzzy set in function approximation? in: Proceedings of the 5th IEEE International Conference on Fuzzy Systems (FUZZ-96), vol. 2, 1996, pp. 1237–1243.
[19] S. Mitaim, B. Kosko, The shape of fuzzy sets in adaptive function approximation, IEEE Transactions on Fuzzy Systems 9 (4) (2001) 637–656.
[20] I. Lee, B. Kosko, W.F. Anderson, Modeling gunshot bruises in soft body armor with an adaptive fuzzy system, IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics 35 (6) (2005) 1374–1390.
[21] Y.M. Li, Z.K. Shi, Z.H. Li, Approximation theory of fuzzy systems based upon genuine many-valued implications – SISO cases, Fuzzy Sets and Systems 130 (2002) 147–157.
[22] P.Y. Liu, Analysis of approximation of continuous fuzzy functions by multivariate fuzzy polynomials, Fuzzy Sets and Systems 127 (2002) 299–313.


[23] P.Y. Liu, H.X. Li, Hierarchical TS fuzzy system and its universal approximation, Information Sciences 169 (2005) 279–303.
[24] B. Moser, Sugeno controllers with a bounded number of rules are nowhere dense, Fuzzy Sets and Systems 104 (2) (1999) 269–277.
[25] Z.H. Miao, H.X. Li, Approximation problem of a class of fuzzy systems, Journal of Beijing Normal University (Natural Science) 36 (1) (2000) 14–20.
[26] M.E. Moghaddam, M. Jamzad, Linear motion blur parameter estimation in noisy images using fuzzy sets and power spectrum, EURASIP Journal on Advances in Signal Processing, vol. 2007, Article ID 68985, 2007, 8 p. doi:10.1155/2007/68985.
[27] J.V.d. Oliveira, Semantics constraints for membership function optimization, IEEE Transactions on Systems, Man, and Cybernetics – Part A 29 (1) (1999) 128–138.
[28] W. Pedrycz, Why triangular membership functions? Fuzzy Sets and Systems 64 (1994) 21–30.
[29] W. Pedrycz, J.V.d. Oliveira, Optimization of fuzzy models, IEEE Transactions on Systems, Man, and Cybernetics – Part B 26 (4) (1996) 627–636.
[30] W. Pedrycz, A.V. Vasilakos, Linguistic models and linguistic modeling, IEEE Transactions on Systems, Man, and Cybernetics – Part B 29 (6) (1999) 745–757.
[31] W. Pedrycz, F. Gomide, Fuzzy Systems Engineering: Toward Human-Centric Computing, J. Wiley, Hoboken, NJ, 2007.
[32] J.Y. Peng, H.X. Li, J. Hou, F. You, J.Y. Wang, Fuzzy controllers based on pointwise optimization fuzzy inference and its interpolation mechanism, Journal of Systems Science and Mathematics 25 (3) (2005) 311–322.
[33] B.L.S. Prakasa Rao, Nonparametric Functional Estimation, Academic Press, Orlando, FL, 1983.
[34] D.F. Specht, Probabilistic neural networks, Neural Networks 3 (1990) 109–118.
[35] D.F. Specht, A general regression neural network, IEEE Transactions on Neural Networks 2 (6) (1991) 568–576.
[36] D. Tikk, On Nowhere Denseness of Certain Fuzzy Controllers Containing Prerestricted Number of Rules, vol. 16, Tatra Mountains Math. Publ., 1999, pp. 369–377.
[37] D. Tikk, Notes on the approximation rate of fuzzy KH interpolators, Fuzzy Sets and Systems 138 (2003) 441–453.
[38] D. Tikk, L.T. Koczy, T.D. Gedeon, A survey on the universal approximation and its limits in soft computing techniques, International Journal of Approximate Reasoning 33 (2) (2003) 185–202.
[39] P. Vuorimaa, Fuzzy self-organizing map, Fuzzy Sets and Systems 66 (2) (1994) 223–231.
[40] H. Wang, J. Xiao, T-S fuzzy system based on multi-resolution analysis and its function approximation, in: Proceedings of the 5th World Congress on Intelligent Control and Automation, June 15–19, Hangzhou, PR China, 2004, pp. 244–249.
[41] L.X. Wang, J.M. Mendel, Fuzzy basis functions, universal approximation, and orthogonal least squares learning, IEEE Transactions on Neural Networks 3 (5) (1992) 807–814.
[42] L.X. Wang, A Course in Fuzzy Systems and Control, Prentice-Hall, 1997.
[43] C. Wei, L.X. Wang, A note on universal approximation by hierarchical fuzzy systems, Information Sciences 123 (2000) 241–248.
[44] H. Ying, Sufficient conditions on general fuzzy systems as function approximations, Automatica 30 (3) (1994) 521–525.
[45] H. Ying, General SISO Takagi–Sugeno fuzzy systems with linear rule consequents are universal approximators, IEEE Transactions on Fuzzy Systems 6 (4) (1998) 582–587.
[46] H. Ying, Sufficient conditions on uniform approximation of multivariate functions by general Takagi–Sugeno fuzzy systems with linear rule consequents, IEEE Transactions on SMC, Part A 28 (4) (1998) 515–520.
[47] H. Ying, General Takagi–Sugeno fuzzy systems with simplified linear rule consequents are universal controllers, models and filters, Information Sciences 108 (1998) 91–107.
[48] R.R. Yager, V. Kreinovich, Universal approximation theorem for uninorm-based fuzzy systems modeling, Fuzzy Sets and Systems 140 (2003) 331–339.
[49] X.J. Zeng, M.G. Singh, Approximation theory of fuzzy systems-SISO case, IEEE Transactions on Fuzzy Systems 2 (2) (1994) 162–176.
[50] X.J. Zeng, M.G. Singh, Approximation theory of fuzzy systems-MIMO case, IEEE Transactions on Fuzzy Systems 3 (2) (1995) 219–235.
[51] X.J. Zeng, M.G. Singh, Approximation accuracy analysis of fuzzy systems as function approximators, IEEE Transactions on Fuzzy Systems 4 (1) (1996) 44–63.
[52] X.J. Zeng, M.G. Singh, A relationship between membership functions and approximation accuracy in fuzzy systems, IEEE Transactions on SMC, Part B 26 (1) (1996) 176–180.
[53] K. Zeng, N.Y. Zhang, W.L. Xu, A comparative study on sufficient conditions for Takagi–Sugeno fuzzy systems as universal approximators, IEEE Transactions on Fuzzy Systems 8 (6) (2000) 773–780.
[54] Y.Z. Zhang, H.X. Li, Generalized hierarchical Mamdani fuzzy systems and their universal approximation, Control Theory and Applications 23 (3) (2006) 439–454.