Global and Local Modelling in Radial Basis Functions Networks

L.J. Herrera, H. Pomares, I. Rojas, A. Guillén, G. Rubio, and J. Urquiza

Departamento de Arquitectura y Tecnología de los Computadores, Universidad de Granada, 18071 Granada, Spain

[email protected]

Abstract. In the problem of modelling input/output data using neuro-fuzzy systems, the performance of the global model is normally the only objective optimized, and this might cause a misleading performance of the local models. This work presents a modified radial basis function network that preserves the optimization properties of the local sub-models while the model is globally optimized, thanks to a special partitioning of the input space performed in the hidden layer to carry out those objectives. The advantage of the proposed methodology is that, due to those properties, the global and the local models are both directly optimized. A learning methodology adapted to the proposed model is used in the simulations, consisting of a clustering algorithm for the initialization of the centres and a local search technique.

1 Introduction

When modelling input/output data, the aim is to obtain a system that faithfully represents the real phenomenon being addressed, that is, a system that is accurate, and that at the same time expresses its behaviour in a comprehensible way, that is, a system that is interpretable. Neuro-fuzzy systems therefore have to combine these two contradictory requirements in the obtained model, the first more inherent to neural networks and the second more inherent to fuzzy systems [1], [2]. These systems can approximate a nonlinear mapping using a relatively low number of neurons, which we may call sub-models. Each of these sub-models, which together constitute the neuro-fuzzy system, approximates a part of the input space, and jointly they collaborate to globally approximate the underlying model from the set of I/O data.

Radial Basis Function (RBF) networks [3], [4] are a widely used neuro-fuzzy paradigm whose main characteristic is their ability to provide a highly accurate model. An additional advantage of this type of model is the possibility of performing a rule extraction procedure on the RBF network, transforming it into a Takagi-Sugeno-Kang (TSK) set of interpretable fuzzy rules that can keep the accuracy of the original system [5], [6].

This work will consider RBF networks using linear weights, so-called RBF networks with regression weights [7]. This type of network has the advantage that the flexibility of the local models is increased, leading to networks with a much lower number of neurons [8]. The output of an RBF network with regression weights can be expressed as:

J. Cabestany et al. (Eds.): IWANN 2009, Part I, LNCS 5517, pp. 49–56, 2009. © Springer-Verlag Berlin Heidelberg 2009


F(x) = \frac{\sum_{k=1}^{K} \mu^k(x) \, Y^k(x)}{\sum_{k=1}^{K} \mu^k(x)} \qquad (1)

where K is the number of neurons in the hidden layer, Y^k(x) = a^k + \sum_{i=1}^{n} b_i^k x_i are the linear output weights of the neurons, and \mu^k(x) = \prod_{i=1}^{n} \mu_i^k(x_i) are the activations of the hidden neurons, which, when using Gaussian kernels, can be expressed as:

\mu_i^k(x_i) = e^{-\frac{(x_i - c_i^k)^2}{2 (\sigma_i^k)^2}}, \qquad (2)

where c_i^k and \sigma_i^k are the centre and radius of the Gaussian function of neuron k at dimension i.
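To make the notation concrete, the following is a minimal sketch of this output computation in Python/NumPy. The function and variable names (rbf_output, centers, radii, a, b) are our own illustration, not from the original paper:

```python
import numpy as np

def rbf_output(x, centers, radii, a, b):
    """Normalized RBF output with regression weights, Eqs. (1)-(2).

    x       : (n,) input vector
    centers : (K, n) Gaussian centres c_i^k
    radii   : (K, n) Gaussian radii sigma_i^k
    a, b    : (K,) and (K, n) coefficients of Y^k(x) = a^k + sum_i b_i^k x_i
    """
    # Per-dimension Gaussian activations, multiplied over dimensions: mu^k(x) (Eq. 2)
    mu = np.exp(-(x - centers) ** 2 / (2.0 * radii ** 2)).prod(axis=1)
    # Local linear models Y^k(x)
    Y = a + b @ x
    # Weighted-average aggregation (Eq. 1)
    return np.dot(mu, Y) / mu.sum()
```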

When dealing with the optimization of RBF networks, global accuracy is usually the single objective to optimize, and the problem of local model optimization is barely addressed [9], [10], [11]. Nevertheless, some works have studied multiobjective optimization formulations of neuro-fuzzy systems that deal with both local and global modelling [12]. Multi-modelling is another important area of research dealing with local and global optimization [13].

This work presents a modified RBF model whose main characteristic is its ability to perform a global model approximation while preserving the interpretation of the local sub-models. These characteristics are possible thanks to a special partitioning of the input space by means of a modified calculation of the final normalized activation of the neurons, which controls their overlapping and thus allows us to perform an unconstrained training. The learning methodology for this model used in the simulations section consists of a clustering algorithm especially suited to function approximation problems and a local search technique.

2 Modified Radial Basis Function Network for Local and Global Modelling

The way to obtain the desired characteristic is adapted from previous approaches for fuzzy systems [14], [15]. The local modelling will be achieved through the use of the Taylor series expansion of a function around a point. The intention is that the polynomial weights of the RBF network represent the Taylor series expansion of the system output around the neuron centres, thus verifying the local modelling properties of the neurons. Two conditions are required of the RBF network with regression weights so that these properties are obtained:

1. The weights have the general form of a polynomial centred around a point, which resembles a truncated Taylor series expansion. The Taylor Theorem states that if a continuous function f(x) defined in an interval (c − r, c + r) is l times differentiable, it can be approximated around the point x = c by its Taylor series expansion

f(x) \simeq f(c) + (x - c)^T [\nabla f(c)] + \frac{1}{2} (x - c)^T W (x - c) + \ldots \qquad (3)


Given the general RBF output function formulation, the linear weights of the RBF network can be easily modified to resemble a truncated form of equation 3 (the constant and linear terms in the summation).

2. The input partitioning caused by the radial basis functions in the hidden layer follows a structure that allows such local modelling. This implies a twofold condition. First, the model output should be continuous and differentiable, so that the Taylor Theorem can be applied. Second, the overlapping degree of all the RBF node activations should vanish at each neuron centre. This way, every point c^k of the n-dimensional space identified by the centre of a neuron k is only affected by its respective radial basis function in the global system output function. Moreover, the linear weights Y^k(x) will contain the information on the function f(x) at c^k, as well as information about its behaviour around that point (its derivatives).

In [14] a special membership function type was presented for grid-partitioned fuzzy systems that fulfils those conditions about the overlapping of the input space partitioning and the continuity and differentiability of the global model output. In RBF networks, however, the overlapping degree among the different local models in the neurons cannot be directly controlled through the use of a specific radial basis function type: the activation functions are not organized along each dimension separately, but rather they are "randomly" distributed (scattered) according to the distribution of the neuron centres along the n-dimensional input space.

Traditionally the output of a normalized RBF network is obtained through weighted-average aggregation of the hidden neuron outputs, in which the final normalized activation of a neuron depends on the relative activation of the rest of the neurons, as seen in equation 1. The proposal in this work is to modify this final neuron activation so that it is adapted to the overlapping and continuity requirements imposed on the RBF network. For the sake of simplicity, we first explain how to perform this modification for a one-dimensional case; later it will be extrapolated to the general case.

Let us assume the simple case of a one-dimensional space with domain [0, 1] with two Gaussian neurons centred, for example, at c^1 = 0.2 and c^2 = 0.8 with σ = 0.3 (see figure 1(a)). In this case, there is a moderate overlapping between the two neurons' activations. In order to comply with the aforementioned overlapping conditions we will allow the domain of the first neuron activation μ^1(x) to be limited by the function 1 − μ^2(x). That is, when the activation value of the opposite neuron is 1, the first neuron activation will be forced to take the value 0. The final activation values at any point, using normalization, would then be

\mu^{1*}(x) = \mu^1(x)\left(1 - \mu^2(x)\right), \qquad \bar{\mu}^{1*}(x) = \frac{\mu^{1*}(x)}{\mu^{1*}(x) + \mu^{2*}(x)} \qquad (4)

\mu^{2*}(x) = \mu^2(x)\left(1 - \mu^1(x)\right), \qquad \bar{\mu}^{2*}(x) = \frac{\mu^{2*}(x)}{\mu^{1*}(x) + \mu^{2*}(x)} \qquad (5)

Generalizing to the n-dimensional case, the activation value of the k-th neuron is obtained by the following equation

\mu^{k*}(x) = \mu^k(x) \prod_{j=1;\, j \neq k}^{K} \left(1 - \mu^j(x)\right) \qquad (6)


Fig. 1. a) Original μ^1 and μ^2 MFs for the one-dimensional example. b) Normalized final neuron activations \bar{μ}^{1*} and \bar{μ}^{2*} using the modified calculation.

Therefore, for any given number of neurons K, the general expression for the RBF network output, using normalization (which forces the activation values of the neurons to sum to one at every point), can be calculated as follows

F(x) = \sum_{k=1}^{K} \bar{\mu}^{k*}(x) \, Y^k(x) = \frac{\sum_{k=1}^{K} \mu^{k*}(x) \, Y^k(x)}{\sum_{k=1}^{K} \mu^{k*}(x)} = \frac{\sum_{k=1}^{K} \left( \mu^k(x) \prod_{j=1;\, j \neq k}^{K} \left(1 - \mu^j(x)\right) \right) Y^k(x)}{\sum_{k=1}^{K} \mu^k(x) \prod_{j=1;\, j \neq k}^{K} \left(1 - \mu^j(x)\right)} \qquad (7)

where \bar{\mu}^{k*}(x) = \mu^{k*}(x) / \sum_{j=1}^{K} \mu^{j*}(x) is the normalized activation value for neuron k.

With this new formulation of the system's output given in equation 7, the final normalized neuron activations are modified so that they consider the relative positioning of each neuron with respect to the others. Moreover, the output function of the proposed model is continuous and differentiable, since it is a linear composition of continuous and differentiable functions.

It is immediate to deduce that the following properties hold, due to the continuity of the Gaussian function:

\bar{\mu}^{2*}(c^1) = 0, \quad \bar{\mu}^{1*}(c^1) = 1 \;\Rightarrow\; F(c^1) = Y^1(c^1) = a^1

\bar{\mu}^{1*}(c^2) = 0, \quad \bar{\mu}^{2*}(c^2) = 1 \;\Rightarrow\; F(c^2) = Y^2(c^2) = a^2 \qquad (8)

moreover, due to the differentiability properties of those functions, it holds that

\frac{\partial \bar{\mu}^{1*}}{\partial x}(c^1) = 0 \;\wedge\; \frac{\partial \bar{\mu}^{2*}}{\partial x}(c^1) = 0 \;\Rightarrow\; \frac{\partial F}{\partial x}(c^1) = \frac{\partial Y^1}{\partial x}(c^1) = b^1

\frac{\partial \bar{\mu}^{2*}}{\partial x}(c^2) = 0 \;\wedge\; \frac{\partial \bar{\mu}^{1*}}{\partial x}(c^2) = 0 \;\Rightarrow\; \frac{\partial F}{\partial x}(c^2) = \frac{\partial Y^2}{\partial x}(c^2) = b^2 \qquad (9)

Note that those properties hold thanks to the use of the designed partitioning and to the use of truncated Taylor series-shaped polynomial consequents. Thus, according to equations 8 and 9, the weights Y^k(x) of the neurons can be interpreted


as the first-order truncated Taylor series expansion of the model output around the respective neuron centre. Those results can be directly extrapolated to the n-dimensional case, thanks to the continuity and differentiability of the composed functions.
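These properties can be checked numerically. The sketch below builds on the modified_activations helper above; following property 1, it writes the consequents in centred (Taylor-shaped) form Y^k(x) = a^k + b^k · (x − c^k), which is what makes F(c^k) = a^k hold exactly:

```python
import numpy as np

def modified_rbf_output(x, centers, radii, a, b):
    """Modified RBF output of Eq. (7) with centred (Taylor-shaped) consequents."""
    _, mu_bar = modified_activations(x, centers, radii)   # sketch from Sect. 2
    Y = a + ((x - centers) * b).sum(axis=1)               # Y^k(x) = a^k + b^k . (x - c^k)
    return np.dot(mu_bar, Y)

# Toy check of Eq. (8): at each neuron centre the model output equals a^k,
# since the normalized activation vector is one-hot there.
rng = np.random.default_rng(0)
centers = rng.random((5, 2))
radii = 0.3 * np.ones((5, 2))
a, b = rng.random(5), rng.random((5, 2))
for k in range(5):
    assert np.isclose(modified_rbf_output(centers[k], centers, radii, a, b), a[k])
```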

3 Learning Methodology

A typical learning procedure for RBF networks can be summarized as follows [16]:

– initialization of the neuron centres and radii
– calculation of the optimal weights of every neuron using LSE
– local search minimization to optimize centres and other MF parameters

The learning algorithm we propose is based on this scheme due to the lower computational cost that it entails and to its effectiveness in finding good solutions. In the simulations, the learning methodology will make use of the ICFA algorithm [16] (Improved Clustering for Function Approximation) for an appropriate initialization of the centres in order to obtain accurate models. This algorithm uses the information provided by the output values of the function to approximate, so that it places more centres where the output variability is higher, instead of where there is a higher number of input vectors.

Since the model output function (Eq. 1) is linear with respect to all the neuron weight parameters, given a set of M input/output data D = {(x_1, y_1), ..., (x_M, y_M)} of a modelling problem, it is possible to obtain these parameters optimally through a wide range of mathematical methods. In this work we will use the Least Square Error (LSE) approach for the optimization of the neuro-fuzzy linear coefficients, and Singular Value Decomposition (SVD) to solve the linear equation system constructed. The objective is to minimize the squared error function J = \sum_{m \in D} (F(x_m) - y_m)^2.
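Since the output is linear in the consequent coefficients, this LSE step can be sketched with NumPy's SVD-based lstsq solver; the design-matrix construction below assumes the centred consequents and helper functions introduced earlier, and the name solve_weights is ours:

```python
import numpy as np

def solve_weights(X, y, centers, radii):
    """Optimal consequents (a^k, b^k) by linear least squares (SVD-based lstsq).

    X : (M, n) inputs, y : (M,) targets. Returns a (K,) and b (K, n).
    """
    M, n = X.shape
    K = centers.shape[0]
    Phi = np.zeros((M, K * (n + 1)))
    for m in range(M):
        _, mu_bar = modified_activations(X[m], centers, radii)
        # Row m: normalized activation of each neuron times its features [1, x - c^k]
        feats = np.hstack([np.ones((K, 1)), X[m] - centers])
        Phi[m] = (mu_bar[:, None] * feats).ravel()
    # numpy.linalg.lstsq solves min ||Phi w - y||^2 via SVD
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    w = w.reshape(K, n + 1)
    return w[:, 0], w[:, 1:]    # a^k, b_i^k
```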

Once an initial configuration of the centres and radii is obtained, thanks to the availability of a direct method to obtain the optimal consequent coefficients and to the continuity and differentiability properties of the proposed model with respect to any of its parameters, any local search procedure could be used in order to find an optimal configuration of the parameters defining every weight. Using a local search procedure, both centres and radii can be optimized to obtain a more accurate system according to the error function J. In this work we have optimized the neuron centres using the Levenberg-Marquardt algorithm, estimating the radii, for the sake of simplicity, according to the minimum distance among the centres [17].
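A possible sketch of this refinement step, using SciPy's least_squares with method='lm' as a stand-in for the authors' Levenberg-Marquardt implementation; the radius re-estimation rule and the inner LSE re-solve follow the description above, while details such as convergence settings are our assumptions:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.distance import cdist

def refine_centers(X, y, centers0):
    """Levenberg-Marquardt refinement of the neuron centres.

    Radii are tied to the minimum inter-centre distance [17], and the
    linear consequents are re-solved by LSE at each residual evaluation.
    """
    K, n = centers0.shape

    def residuals(flat):
        centers = flat.reshape(K, n)
        d = cdist(centers, centers)
        np.fill_diagonal(d, np.inf)
        radii = np.repeat(d.min(axis=1)[:, None], n, axis=1)  # per-neuron radius
        a, b = solve_weights(X, y, centers, radii)            # inner LSE step
        pred = np.array([modified_rbf_output(xm, centers, radii, a, b) for xm in X])
        return pred - y

    res = least_squares(residuals, centers0.ravel(), method='lm')
    return res.x.reshape(K, n)
```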

4 Simulations

This section presents a significant example of the application of the modified RBF network methodology to function approximation problems. Consider the function y expressed as (see reference [18])

y(x_1, x_2) = 0.5 \left(1 + \sin(2 \pi x_1) \cos(2 \pi x_2)\right) \qquad (10)

400 randomly distributed samples were generated as the training dataset, and another set of 400 samples was generated as the test set.
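A sketch of this experimental setup follows; the input domain [0, 1]^2, the uniform sampling, and the random seed are our assumptions, since the paper only states that the samples are randomly distributed:

```python
import numpy as np

# Target function of Eq. (10)
def target(x1, x2):
    return 0.5 * (1.0 + np.sin(2 * np.pi * x1) * np.cos(2 * np.pi * x2))

rng = np.random.default_rng(42)                 # seed is arbitrary
X_train = rng.random((400, 2))                  # 400 random training samples in [0, 1]^2
y_train = target(X_train[:, 0], X_train[:, 1])
X_test = rng.random((400, 2))                   # 400 test samples
y_test = target(X_test[:, 0], X_test[:, 1])
```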


Fig. 2. a) Original function y. b) Activations of the 10 neurons of the modified RBF network obtained after the complete learning algorithm for the function example y.


Fig. 3. Joint representation of the model output, together with the linear weights of the 10 network neurons in the area surrounding the corresponding centre.

As an example of execution, 10 centres were initialized using the ICFA algorithm [16]. The radius of every node of the model was automatically initialized according to the minimum distance among the centres [17]. Figure 2(a) shows the shape of the original function y.

The local search procedure readjusts the radial basis function parameters (centres and radii) according to the performance obtained in the optimization of the linear consequents using LSE, so that the error function J is minimized. The training Normalized Root Mean Square Error (NRMSE [14]) of the modified RBF network is 0.095, whereas the test NRMSE obtained is 0.1.
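For reference, a common way to compute the NRMSE is the RMSE divided by the standard deviation of the target; the exact definition used here is the one given in [14], so the snippet below is only an assumed approximation:

```python
import numpy as np

def nrmse(y_true, y_pred):
    # RMSE normalized by the target's standard deviation (one common
    # definition; the definition in [14] may differ slightly).
    return np.sqrt(np.mean((y_pred - y_true) ** 2)) / np.std(y_true)
```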


Figure 2(b) shows the activations of the 10 neurons of the RBF network, according to the modified final normalized neuron activation designed. This figure shows the distribution of the centres obtained after the execution of the algorithm; the resulting distribution of the neuron activations is intuitive, with no neuron having an effect at the centres of the rest of the neurons, each neuron activation being limited by the location of the rest of the centres in the n-dimensional input space.

The effect of this partitioning on the model can be observed in figure 3, which shows a representation of the local and global modelling performed by the proposed modified RBF model with ten neurons. It can be observed that the weights carry the information about the exact function value at each neuron centre, as well as about the behaviour of the model output function in the area around that centre. Each polynomial defines a plane that indicates the model output slope around the neuron centre. Moreover, the global model output approximates the underlying function given in the training data set for this example. Given this modified RBF model, the extraction of the set of rules from the network would be immediate, according to the well-known equivalence of this model with a TSK fuzzy system [4], [19], [14].

5 Conclusions

In this paper we have presented a modified Radial Basis Function model that, due to a special partitioning of the input space, provides a simultaneous optimization of the global model and of the local sub-models. In order to show the properties of the proposed modified RBF model, we have used a learning methodology consisting of a clustering algorithm especially suited to function approximation problems and a local search technique. The example performed has shown that the model is able to obtain a good global approximation; moreover, the obtained local models provide information about the behaviour of the model in the areas of the input space around the neuron centres, and therefore about the behaviour of the underlying system being approximated. Furthermore, due to the equivalence between TSK fuzzy models and RBF networks, there is the possibility of performing a rule extraction procedure to obtain an equivalent TSK fuzzy system with rules having the same global-local approximation capabilities.

Acknowledgment

This work has been partially supported by the Spanish CICYT Project TIN2007-60587 and Junta de Andalucía Project P07-TIC-02768.

References

1. Casillas, J., Cordón, O., del Jesus, M.J., Herrera, F.: Tuning of fuzzy rule deep structures preserving interpretability and its interaction with fuzzy rule set reduction. IEEE Trans. Fuzzy Syst. 13(1), 12–29 (2005)
2. Herrera, L., Pomares, H., Rojas, I., Valenzuela, O., Awad, M.: MultiGrid-based fuzzy systems for function approximation. In: Monroy, R., Arroyo-Figueroa, G., Sucar, L.E., Sossa, H. (eds.) MICAI 2004. LNCS, vol. 2972, pp. 252–261. Springer, Heidelberg (2004)
3. Haykin, S.: Neural Networks. Prentice-Hall, Englewood Cliffs (1998)
4. Jang, J.S.R., Sun, C.T.: Functional equivalence between radial basis function networks and fuzzy inference systems. IEEE Transactions on Neural Networks 4(1), 156–159 (1993)
5. Jin, Y., Sendhoff, B.: Extracting interpretable fuzzy rules from RBF networks. Neural Processing Letters 17, 149–164 (2003)
6. Vachkov, G.L., Kiyota, Y., Komatsu, K.: Learning of RBF network models for prediction of unmeasured parameters by use of rules extraction algorithm. In: Proc. of NAFIPS 2005 (2005)
7. Langari, R., Wang, L., Yen, J.: RBF networks, regression weights, and the expectation-maximization algorithm. IEEE Trans. Syst., Man, Cybern. A 27, 613–623 (1997)
8. Rojas, I., Pomares, H., Bernier, J., Ortega, J., Pino, B., Pelayo, F., Prieto, A.: Time series analysis using normalized PG-RBF network with regression weights. Neurocomputing 42, 267–285 (2002)
9. Leng, G., McGinnity, T.M., Prasad, G.: Design for self-organizing fuzzy neural networks based on genetic algorithms. IEEE Trans. Fuzzy Systems 14(6), 755–766 (2006)
10. Liu, P., Li, H.: Efficient learning algorithms for three-layer regular feedforward fuzzy neural networks. IEEE Trans. Neural Networks 15(3), 545–558 (2004)
11. González, J., Rojas, I., Ortega, J., Pomares, H., Fernandez, F., Diaz, A.: Multiobjective evolutionary optimization of the size, shape, and position parameters of radial basis function networks for function approximation. IEEE Trans. Neural Networks 14(6), 1478–1495 (2003)
12. Johansen, T., Babuska, R.: Multiobjective identification of Takagi-Sugeno fuzzy models. IEEE Trans. Fuzzy Syst. 11(6), 847–860 (2003)
13. Madani, K., Thiaw, L.: Self-organizing multi-modeling: A different way to design intelligent predictors. Neurocomputing 70(16-18), 2836–2852 (2007)
14. Herrera, L., Pomares, H., Rojas, I., Valenzuela, O., Prieto, A.: TaSe, a Taylor series-based fuzzy system model that combines interpretability and accuracy. Fuzzy Sets and Systems 153, 403–427 (2005)
15. Herrera, L., Pomares, H., Rojas, I., Guillén, A., Awad, M., González, J.: Interpretable rule extraction and function approximation from numerical input/output data using the modified fuzzy TSK model: TaSe model. In: Slezak, D., Wang, G., Szczuka, M.S., Düntsch, I., Yao, Y. (eds.) RSFDGrC 2005. LNCS, vol. 3641, pp. 402–411. Springer, Heidelberg (2005)
16. Guillén, A., González, J., Rojas, I., Pomares, H., Herrera, L.J., Valenzuela, O., Prieto, A.: Using fuzzy logic to improve a clustering technique for function approximation. Neurocomputing 70(16-18), 2853–2860 (2007)
17. Moody, J., Darken, C.: Fast learning in networks of locally-tuned processing units. Neural Computation 1(2), 281–294 (1989)
18. Rovatti, R., Guerrieri, R.: Fuzzy sets of rules for system identification. IEEE Trans. Fuzzy Syst. 4(2), 89–102 (1996)
19. Reyneri, L.M.: Unification of neural and wavelet networks and fuzzy systems. IEEE Transactions on Neural Networks 10(4), 801–814 (1999)