Inferring descriptive and approximate fuzzy rules for credit scoring using evolutionary algorithms

European Journal of Operational Research 177 (2007) 540–555

www.elsevier.com/locate/ejor

Computing, Artificial Intelligence and Information Management


F. Hoffmann a,*, B. Baesens b,c, C. Mues b,c, T. Van Gestel d, J. Vanthienen c

a Fakultät für Elektrotechnik und Informationstechnik, Universität Dortmund, Otto-Hahn-Straße 4, 44221 Dortmund, Germany
b School of Management, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom

c K.U. Leuven, Department of Applied Economic Sciences, Naamsestraat 69, B-3000 Leuven, Belgium
d Group Risk Management, Dexia Group, Square Meeus 1, B-1000 Brussel, Belgium

Received 29 July 2004; accepted 11 September 2005; available online 7 July 2006

Abstract

Generating accurate as well as explanatory classification rules is becoming increasingly important in a knowledge discovery context. In this paper, we investigate the power and usefulness of fuzzy classification rules for data mining purposes. We propose two evolutionary fuzzy rule learners: an evolution strategy that generates approximate fuzzy rules, whereby each rule has its own specific definition of membership functions, and a genetic algorithm that extracts descriptive fuzzy rules, where all fuzzy rules share a common, linguistically interpretable definition of membership functions in disjunctive normal form. The performance of the evolutionary fuzzy rule learners is compared with that of Nefclass, a neurofuzzy classifier, and a selection of other well-known classification algorithms on a number of publicly available data sets and two real-life Benelux financial credit scoring data sets. It is shown that the genetic fuzzy classifiers compare favourably with the other classifiers in terms of classification accuracy. Furthermore, the approximate and descriptive fuzzy rules yield about the same classification accuracy across the different data sets.
© 2006 Published by Elsevier B.V.

Keywords: Fuzzy sets; Credit scoring; Data mining; Classification

0377-2217/$ - see front matter © 2006 Published by Elsevier B.V.
doi:10.1016/j.ejor.2005.09.044

* Corresponding author. Tel.: +49 231 755 2760; fax: +49 231 755 2752.

E-mail addresses: [email protected] (F. Hoffmann), [email protected], [email protected] (B. Baesens), [email protected], [email protected] (C. Mues), [email protected] (T. Van Gestel), [email protected] (J. Vanthienen).

1. Introduction

In the field of knowledge discovery and data mining, many techniques have been suggested to perform classification. However, most of this work primarily focuses on developing models with high predictive accuracy without trying to explain how the classifications are being made. Clearly, the latter can play a pivotal role in knowledge discovery and data mining, as the end-user may be required to give a justification why a certain classification is made. In the context of credit scoring, for example, it is very important to have explanatory models that provide the expert with an explanation facility that allows justifying why a certain credit applicant is rejected or approved [2,3].

Recent developments in rule extraction algorithms make it possible to infer classification rules that, besides being accurate, also provide the expert with an explanation for the classification decision. Both crisp and fuzzy rule extraction methods have been proposed in the literature. It is often advocated that fuzzy rules are more comprehensible and user-friendly than crisp rules because they are expressed in terms of linguistic concepts which are usually easier to interpret. Clearly, this will depend on the type of fuzzy rules that are generated. We distinguish between descriptive fuzzy systems, in which the fuzzy rules share membership functions that are associated with linguistic concepts, and approximate fuzzy systems, in which each rule uses its individual, locally defined membership functions. The main advantage of descriptive fuzzy systems is that they are believed to be more comprehensible and user-friendly than crisp or approximate rules, as the user has an intuitive understanding of the linguistic concepts to which the rules refer.

In this paper, we investigate the appropriateness of evolutionary algorithms for generating fuzzy classification rules. We propose two evolutionary fuzzy rule learners: an evolution strategy that generates approximate fuzzy rules and a genetic algorithm that extracts descriptive fuzzy rules. A boosting algorithm invokes the evolutionary algorithm to identify the fuzzy rule that best matches the training examples. Those instances correctly classified by the new rule are down-weighted, and the rule generation scheme is repeatedly invoked as the boosting scheme adapts the distribution of training instances. In an extension to previous work restricted to approximate fuzzy rules [14], we compare the comprehensibility and classification accuracy of approximate and descriptive fuzzy rules in the context of boosted fuzzy classifiers. The experiments are carried out on a number of publicly available data sets and two large-scale Benelux financial credit scoring data sets. We will also compare the performance of the evolutionary fuzzy classifiers with the performance of Nefclass, a well-known neurofuzzy classifier that infers descriptive fuzzy classification rules. Furthermore, we will also include a selection of other well-known classification algorithms in our study.

This paper is organised as follows. Section 2 presents the boosted genetic fuzzy classification algorithm. The Nefclass neurofuzzy classification algorithm is presented in Section 3. Section 4 discusses the empirical evaluation, including the experimental setup and the results. Conclusions are drawn in Section 5.

2. Boosted genetic fuzzy classification

The past decade witnessed an increasing interest in learning algorithms that automate the knowledge acquisition step in fuzzy system design. The general term genetic fuzzy rule based systems (GFRBS) encompasses methods in which an evolutionary algorithm adapts the entire knowledge base or its individual components [6]. The majority of GFRBS are concerned with the design and optimization of fuzzy logic controllers, but recently evolutionary algorithms have been successfully applied to learn fuzzy classification rules [5,16,18]. The iterative rule learning (IRL) approach to GFRBS builds the fuzzy rule base in an incremental fashion [9]. The evolutionary algorithm extracts a fuzzy rule that describes a subset of the current training examples. The instances that match the new rule are removed from the training set and the rule generation process is repeatedly invoked until all training examples are sufficiently covered. In the original approach, the rule generation stage is succeeded by a second post-processing stage which refines the rule base in order to improve the overall classification accuracy.

In the approach presented in this paper, the post-processing stage becomes obsolete as the boosting mechanism reweights the training instances and thereby implicitly promotes cooperation among fuzzy rules in that it assigns a larger weight to currently misclassified or uncovered instances. In other words, the generation of rules at later stages is influenced via the weighting mechanism by the previously identified rule set. As a result, the post-processing stage of the conventional IRL approach to establish rule cooperation is no longer needed, even though it might still be applied to further improve rule cooperation. Hence, the boosting mechanism creates a bias in the genetic rule generation step towards those fuzzy rules that complement the current set of fuzzy rules and correct their deficiencies.

The key idea of boosting is to aggregate multiple hypotheses generated by the same learning algorithm invoked over different distributions of training data into a single composite classifier [8]. After each invocation of the underlying weak learning algorithm, the distribution of training instances is changed based on the error the newly generated hypothesis exhibits on the training set. As boosting combines multiple hypotheses, the aggregated classifier reduces the error on the training data even though each individual hypothesis might have a relatively large error. Boosting requires unstable classifiers, namely a learning algorithm for which the generated hypotheses are sensitive to changes in the training examples.

The approach taken in this paper (see Fig. 1) combines the concepts of boosting and iterative fuzzy rule learning into a coherent framework.

Fig. 1. Architecture of the boosted genetic fuzzy classifier.

Modifications to the boosting algorithm take into account that fuzzy rules are only partial classifiers, as they are only valid in the local region defined by the rule antecedent. The iterative rule learning approach is extended in that training examples covered by a new rule are down-weighted, rather than being entirely removed from the training set. That way the evolutionary rule generation method focuses on the previously misclassified or uncovered instances. Boosting algorithms for fuzzy classification systems have been previously proposed in [12,18].

The boosting scheme imposes no constraint on the type of fuzzy rules used for classification. In an approximate fuzzy knowledge base, each rule maintains its own definition of membership functions. In a descriptive fuzzy knowledge base, the rules share a commonly defined set of fuzzy sets to which they refer by linguistic labels. Approximate fuzzy rules usually achieve a better accuracy for modelling an underlying function, but are more difficult to interpret. Descriptive fuzzy rules are restricted to a resolution determined by the fixed partition of the input space into fuzzy sets. Descriptive fuzzy rules are more comprehensible as their linguistic representation captures the human notion of the underlying attributes, e.g., small, medium, large.

The boosting scheme equally applies to approximate as well as descriptive fuzzy rules. However, the choice between approximate and descriptive rules has a significant impact on the genetic representation of the rules and the type of evolutionary algorithm suitable for rule optimization.


As the number of possible descriptive fuzzy rules is finite, identifying the optimal rule can be considered as a discrete optimization problem, which can be handled by a genetic algorithm. The continuous nature of approximate fuzzy rules motivates the use of evolution strategies for their optimization. The next two subsections describe the genetic representations suitable for descriptive and approximate fuzzy rules.

2.1. Descriptive fuzzy classification rules

An elemental fuzzy classification rule is of the form:

$$R_i: \ \text{If } x_1 \text{ is } A_{1i} \text{ And } \ldots \ x_n \text{ is } A_{ni} \text{ Then Class} = c, \qquad (1)$$

whereby xj denotes the jth input variable, Aji a fuzzy set associated to input variable xj and c the class label of the rule. Note that, in an elemental rule, each label refers to a singular fuzzy set in the partition. These standard Mamdani fuzzy rules can be extended to so-called disjunctive normal form (DNF) fuzzy rules, where each input variable xj can be related to several linguistic terms Ajk which are joined by a disjunctive operator [19]. With that extension, a possible DNF rule in the case of three input variables x1, x2, x3, each partitioned into five fuzzy sets Aij, might look like:

$$\text{If } x_1 \text{ is } \{A_{12} \text{ Or } A_{14}\} \text{ And } x_2 \text{ is } \{A_{25}\} \text{ And } x_3 \text{ is } \{A_{31} \text{ Or } A_{32}\} \text{ Then Class} = c. \qquad (2)$$

The DNF rule structure does not affect the linguistic interpretability, as it is always possible to replace the DNF rule by a set of equivalent elemental rules.

The genetic algorithm represents DNF fuzzy rules as bit strings, in which each bit denotes the presence or absence of a linguistic term Aij in the rule antecedent [19]. The rule in Eq. (2) would be encoded by the bit string 01010|00001|11000, where the 1's correspond to the terms A12, A14, A25, A31, A32.

The number of elemental fuzzy rules required to cover the entire input space grows rapidly with the number of input variables. In order to avoid an uncontrolled increase of the number of rules, the coding scheme provides for wildcard variables that are omitted from the rule antecedent. The chromosome contains an additional bit string S = {s1, ..., sn} in which the bit si indicates the presence or absence of the input variable xi in the rule antecedent, irrespective of the bits referring to the linguistic labels. Adaptation of the bit string S enables the genetic algorithm to identify fuzzy rules with those features that best discriminate among the different classes.

In addition to the antecedent genes, the chromosome also contains a gene that defines the class label in the rule consequent. The antecedent part of chromosomes in the first generation is initialized by randomly picking a training instance. The bits corresponding to labels that best match the instance are set to 1, neighboring bits are chosen randomly and the remaining bits are set to 0.

One uses the t-norm to measure the rule activation for a particular instance x:

$$\mu_{R_i}(x) = \mu_{R_i}(\{x_1, \ldots, x_n\}) = \min_{j=1,\ldots,n} \mu_{A_{ji}}(x_j), \qquad (3)$$

whereby $\mu_{A_{ji}}(x_j)$ is the membership function of xj in Aji. Note that we hereby use a min operator instead of a product operator. The problem with the product operator is that for antecedents with many inputs, the degree of rule activation can become very low. The instance x is then classified in the class cmax:

$$c_{max} = \arg\max_{c_m} \sum_{R_i \mid c_i = c_m} \mu_{R_i}(x). \qquad (4)$$
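To make Eqs. (3) and (4) concrete, the following sketch (not code from the paper; the membership functions, rules and the test instance are illustrative assumptions) evaluates two hand-crafted elemental rules with the min t-norm and aggregates their per-class activations:

```python
def triangular(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# A rule holds one membership function per input plus a class label (illustrative).
rules = [
    {"mfs": [lambda x: triangular(x, 0.0, 0.25, 0.5),   # x1 is "low"
             lambda x: triangular(x, 0.5, 0.75, 1.0)],  # x2 is "high"
     "cls": 1},
    {"mfs": [lambda x: triangular(x, 0.5, 0.75, 1.0),
             lambda x: triangular(x, 0.0, 0.25, 0.5)],
     "cls": 0},
]

def rule_activation(rule, x):
    # Eq. (3): min over the memberships of the individual antecedent terms
    return min(mf(xj) for mf, xj in zip(rule["mfs"], x))

def classify(rules, x):
    # Eq. (4): sum the activations of all rules voting for the same class
    votes = {}
    for rule in rules:
        votes[rule["cls"]] = votes.get(rule["cls"], 0.0) + rule_activation(rule, x)
    return max(votes, key=votes.get)

print(classify(rules, [0.2, 0.8]))  # -> 1
```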

In order to illustrate the coding scheme, consider a two-input fuzzy system with three membership functions per input variable as depicted in Fig. 2. The linear chromosome has the structure |s1 s2|a11 a12 a13|a21 a22 a23|c|. An example of a chromosome that encodes a compact rule is |1 0|1 1 0|0 1 0|1|, which corresponds to the rule If x1 is {A11 or A12} then Y = c1. Note that the bits |a21 a22 a23| are not expressed as the regulator bit s2 is off. For the training instance (x, c) marked by a cross in Fig. 2, the chromosome initialization proceeds in the following steps. For this training instance, A12 and A23 possess the highest degree of membership, therefore a12 = 1 and a23 = 1.

Fig. 2. Genetic coding for the descriptive fuzzy rules.

Fig. 3. Genetic coding for the approximate fuzzy rules.

For the adjacent fuzzy sets A11, A13 and A22 the bit values a11, a13 and a22 are picked at random with p = 0.5 for zeros and ones. The bit a21 is set to zero. The values of s1 and s2 are also picked at random with p = 0.5. Finally, the class c is chosen according to the majority class of the training instances covered by the rule antecedent. In the case that x is the only example covered by the rule, the rule class is identical to that of the training instance.
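As a small illustration of this encoding, the sketch below (an assumed helper, not taken from the paper) decodes the example chromosome |1 0|1 1 0|0 1 0|1| from Fig. 2 into its DNF rule; the label names A11, ..., A23 follow the figure:

```python
LABELS = [["A11", "A12", "A13"], ["A21", "A22", "A23"]]

def decode(chromosome):
    s = chromosome["s"]          # regulator bits: is input j used in the antecedent?
    a = chromosome["a"]          # one bit per linguistic term of each input
    clauses = []
    for j, used in enumerate(s):
        if not used:
            continue             # input j is a wildcard and is omitted
        terms = [lab for lab, bit in zip(LABELS[j], a[j]) if bit]
        clauses.append("x%d is {%s}" % (j + 1, " or ".join(terms)))
    return "If " + " and ".join(clauses) + " then Y = c%d" % chromosome["c"]

# The compact rule |1 0|1 1 0|0 1 0|1| from the text:
print(decode({"s": [1, 0], "a": [[1, 1, 0], [0, 1, 0]], "c": 1}))
# -> If x1 is {A11 or A12} then Y = c1
```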

2.2. Approximate fuzzy classification rules

Approximate fuzzy rules contain their own definition of membership functions rather than referring to a commonly defined set of membership functions. In our case, the approximate fuzzy rules are elemental fuzzy rules described by Eq. (1). The key difference is that the fuzzy sets Aji no longer refer to linguistic labels, but instead refer to trapezoidal membership functions parameterized by their characteristic points. The chromosome is a real-valued vector that for each fuzzy set Aji encodes the left-most characteristic point $a_j$ and the distances between the remaining points $d_j^1 = b_j - a_j$, $d_j^2 = c_j - b_j$, $d_j^3 = d_j - c_j$, as shown in Fig. 3. Thereby, it is ensured that the points $a_j, b_j, c_j, d_j$ do not change their order as long as the $d_j^i$ remain positive. The entire rule chromosome concatenates the characteristic points of the fuzzy sets $A_{1i}, \ldots, A_{ni}$ into one real-valued vector $(a_1, d_1^1, d_1^2, d_1^3, \ldots, a_n, d_n^1, d_n^2, d_n^3)$ of dimension $n \times 4$.

Fig. 3 shows the chromosome structure for an approximate two-input fuzzy system. Again the discrete class c is selected according to the majority class of instances covered by the rule.
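A minimal sketch of this parameterisation (assumed for illustration, not the authors' implementation): the four characteristic points of each trapezoid are recovered from the genes (a, d1, d2, d3) and used to evaluate a membership degree.

```python
def trapezoid_from_genes(a, d1, d2, d3):
    # characteristic points: a, b = a + d1, c = b + d2, d = c + d3;
    # their order is preserved as long as the distances stay positive
    b, c, d = a + d1, a + d1 + d2, a + d1 + d2 + d3
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if x < b:                      # rising edge
            return (x - a) / (b - a)
        if x <= c:                     # plateau
            return 1.0
        return (d - x) / (d - c)       # falling edge
    return mu

mu = trapezoid_from_genes(a=1.0, d1=0.5, d2=1.0, d3=0.5)
print(mu(1.25), mu(2.0), mu(2.8))      # 0.5, 1.0, 0.4 (approximately)
```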

Due to the continuous nature of the optimization problem, an evolution strategy rather than a genetic algorithm is used for optimizing the fuzzy classification rules. In addition to the real-valued genetic representation of individuals, evolution strategies are distinguished by self-adaptation of additional strategy parameters to improve the optimization steps [1]. For each of the $n \times 4$ object variables $x_i$, the evolution strategy maintains a strategy parameter $\sigma_i$ that denotes the mutation step size. Details on the evolution strategy, the genetic representation, initialization of chromosomes and genetic operators for approximate fuzzy rules are reported in [13].
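For illustration, the following sketch shows a textbook log-normal self-adaptive mutation step of such an evolution strategy; it is a generic, assumed variant and not necessarily the exact operator used in [13].

```python
import numpy as np

def es_mutate(x, sigma, rng):
    """Mutate object variables x and their per-coordinate step sizes sigma."""
    n = len(x)
    tau = 1.0 / np.sqrt(2.0 * np.sqrt(n))        # per-coordinate learning rate
    tau_prime = 1.0 / np.sqrt(2.0 * n)           # global learning rate
    global_step = tau_prime * rng.standard_normal()
    new_sigma = sigma * np.exp(global_step + tau * rng.standard_normal(n))
    new_x = x + new_sigma * rng.standard_normal(n)
    return new_x, new_sigma

rng = np.random.default_rng(0)
x, sigma = np.zeros(8), np.full(8, 0.1)          # e.g. a two-input rule: n * 4 = 8 genes
print(es_mutate(x, sigma, rng)[0])
```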


2.3. Fitness function

Both the descriptive and the approximate variants apply the same fitness function for rule generation. Typical optimization criteria for fuzzy classification rules are a high frequency value, a high covering degree over positive examples and the k-consistency property to penalize negative examples covered by a rule. The boosting algorithm weights the training examples by a factor wk that reflects the frequency of the instance xk in the training set. Therefore, the fitness is computed according to the weighted distribution of training instances such that instances with large weights contribute more to the fitness than those with small weights. The fitness function considers three criteria which are aggregated into a single scalar fitness value.

The first component of the fitness function is the class coverage, defined as the ratio between the number of training instances covered by the rule Ri and the overall number of training instances carrying the same class label ci:

$$f_1 = \frac{\sum_{k \mid c_k = c_i} w_k \, \mu_{R_i}(x_k)}{\sum_{k \mid c_k = c_i} w_k}. \qquad (5)$$

The second criterion is the rule coverage and is related to the overall fraction of instances covered by the rule, irrespective of the class label:

$$n = \frac{\sum_{k \mid c_k = c_i} w_k \, \mu_{R_i}(x_k)}{\sum_{k} w_k}. \qquad (6)$$

The idea is that a rule should cover a significant portion kcov of the training examples instead of representing outliers in the training set:

$$f_2 = \begin{cases} 1, & n > k_{cov}, \\ \dfrac{n}{k_{cov}}, & \text{otherwise}, \end{cases} \qquad (7)$$

with $k_{cov} \approx 0.2, \ldots, 0.5$ decreasing with an increasing number of classes. The reason is that with more classes it is harder to find rules that cover a significant fraction of all instances. For example, if you have a classification problem with M classes, each having the same number of instances, then a rule that perfectly classifies all instances of one class and none of the other M − 1 classes has a coverage of 1/M. A reasonable choice would then be kcov = 1/M, as no rule can cover more than that fraction of instances without covering other (false) instances.

The number of correctly ($n_c^+$) and incorrectly ($n_c^-$) classified weighted instances covered by the rule Ri is approximated as

$$n_c^+ = \sum_{k \mid c_k = c_i} w_k \, \mu_{R_i}(x_k), \qquad n_c^- = \sum_{k \mid c_k \neq c_i} w_k \, \mu_{R_i}(x_k). \qquad (8)$$

Rule consistency demands that a rule covers a large number of correct instances and a small number of incorrect examples. Therefore, the fitness for the k-consistency of a rule is defined by the ratio of these numbers:

$$f_3 = \begin{cases} 0, & n_c^+ \cdot k < n_c^-, \\ \dfrac{n_c^+ - n_c^-/k}{n_c^+}, & \text{otherwise}, \end{cases} \qquad (9)$$

where the parameter k ∈ [0, 1] determines the maximal tolerance for the error made by an individual rule [10]. Typically, k decreases with the number of classes and the overlap between classes. The optimal value of k is problem specific as it depends on the number of classes and the class overlap. For a multi-class problem with substantial overlap, a too small value of k will give no or only very few rules for which f3 is larger than zero. In such a case, k is increased in order for the evolutionary algorithm to distinguish between correct and incorrect rules. On the other hand, if the value of k is unnecessarily large, the evolutionary algorithm might generate global rules that cover a large number of instances according to the criteria f1 and f2, but poorly discriminate between the classes. For binary classification problems, a value of k = 1 was assumed.

The individual fitness criteria fi are normalized to [0, 1] and the overall fitness of a rule is given by the product

$$f = \prod_i f_i. \qquad (10)$$
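The sketch below combines the three criteria of Eqs. (5)-(10) into a single fitness value; it is an assumed implementation written for illustration, with k_cov and k_tol standing for the parameters kcov and k of the text.

```python
import numpy as np

def rule_fitness(rule, X, y, w, activation, k_cov=0.5, k_tol=1.0):
    """Weighted fitness of a single fuzzy rule; X, y, w are numpy arrays."""
    mu = np.array([activation(rule, x) for x in X])      # mu_Ri(x_k)
    same = (y == rule["cls"])                             # instances with the rule's class
    f1 = np.sum(w[same] * mu[same]) / np.sum(w[same])     # class coverage, Eq. (5)
    n = np.sum(w[same] * mu[same]) / np.sum(w)            # rule coverage, Eq. (6)
    f2 = 1.0 if n > k_cov else n / k_cov                  # Eq. (7)
    n_pos = np.sum(w[same] * mu[same])                    # Eq. (8)
    n_neg = np.sum(w[~same] * mu[~same])
    if n_pos == 0.0 or n_pos * k_tol < n_neg:             # Eq. (9), k-consistency
        f3 = 0.0
    else:
        f3 = (n_pos - n_neg / k_tol) / n_pos
    return f1 * f2 * f3                                   # Eq. (10)
```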

Note that these fitness criteria do not consider the complexity of either individual rules or the entire rule base, even though it would be possible to penalize more complex rules in the fitness function. For descriptive fuzzy rules, the rule complexity is easily defined by the number of membership functions in the antecedent. In our representation of approximate fuzzy rules, all rules possess the same complexity as they have the same number of parameters, namely $n \times 4$. The complexity of the entire rule base is given by the total number of rules. The boosting mechanism (cf. infra) generates rules in decreasing order of relevance for the overall classification. Therefore, the user can determine the final number of rules a posteriori based on his/her preference for the trade-off between accuracy and complexity.

2.4. Boosting algorithm

The boosting scheme follows the idea of the iterative rule learning scheme and combines it with a fuzzy variant of the AdaBoost algorithm originally proposed in [8]. The basic idea of boosting is to repeatedly train a weak classifier on various distributions of the training data. After a fuzzy rule has been generated, one changes the distribution of training instances according to the observed error of the last generated classifier on the training set. The overall classification results from the aggregated votes of the individual classifiers.

If the learning algorithm can handle fractional training instances, it is not necessary to generate a new training set from a (modified) distribution. Instead, the instances obtain a weight wk that specifies the relative importance of the kth training instance. This weight can be interpreted as if the training set would contain wk identical copies of the training example (xk, ck). Correctly classified instances (xk, ck) are down-weighted, such that the next iterations of the learning algorithm focus on the seemingly more difficult instances.

Initially, all training examples obtain the weight wk = 1. The boosting algorithm repeatedly invokes the genetic fuzzy rule generation method on the current distribution of training examples. Notice that instances with large weights wk contribute more strongly to the fitness of a fuzzy rule in Eqs. (5), (7), and (9).

The boosting algorithm computes the error E(Rt) of the fuzzy rule Rt generated in iteration t. Each descriptive or approximate fuzzy classification rule constitutes an incomplete, weak classifier: incomplete in the sense that the rule only classifies instances covered by its antecedent but provides no classification for training examples that do not match the antecedent. Therefore, the classification error E(Rt) of a fuzzy rule Rt is weighted not only by the weight wk but also by the degree of matching $\mu_{R_t}(x_k)$ between the kth training instance (xk, ck) and the rule antecedent:

$$E(R_t) = \frac{\sum_{k \mid c_k \neq c_t} w_k \, \mu_{R_t}(x_k)}{\sum_{k} w_k \, \mu_{R_t}(x_k)}. \qquad (11)$$

In other words, the objective of the fuzzy rule generation method is to find classification rules that best describe the current distribution of training examples.

Assume that the rule generation algorithm identified Rt as the best rule for the distribution wk(t) at iteration t. For instances (xk, ck) correctly classified by Rt the weight is reduced by a factor $\beta_k$, such that incorrectly classified instances gain relatively in importance in the next invocation of the genetic rule learner:

$$w_k(t+1) = \begin{cases} w_k(t), & \text{if } c_t \neq c_k, \\ w_k(t) \cdot \beta_k^{\,\mu_{R_t}(x_k)}, & \text{if } c_t = c_k. \end{cases} \qquad (12)$$

The factor

$$\beta_k = \frac{E(R_t)}{1 - E(R_t)} \qquad (13)$$

depends on the error E(Rt) of the fuzzy rule. Effectively, examples that are classified correctly and match the rule antecedent are down-weighted, and misclassified or uncovered examples keep their original weights. In other words, instances that are correctly classified get lower weights and the weight reduction will be higher when the rule activation is higher (for $\mu_{R_t}(x_k) = 0$, the weight remains unchanged). Thereby, the boosting algorithm increases the relative weight of those examples which are hard to learn for the genetic fuzzy system.
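A compact sketch of this update for one newly generated rule (an assumed, array-based implementation; mu holds the rule activations on the training instances):

```python
import numpy as np

def boosting_update(w, mu, y, c_t):
    """Update instance weights after generating rule Rt with class label c_t."""
    wrong = (y != c_t)
    error = np.sum(w[wrong] * mu[wrong]) / np.sum(w * mu)     # E(Rt), Eq. (11)
    beta = error / (1.0 - error)                              # Eq. (13)
    w_new = w.copy()
    correct = ~wrong
    # Eq. (12): down-weight correctly classified instances by beta ** mu_Rt(x_k);
    # misclassified or uncovered instances (mu = 0) keep their weight
    w_new[correct] = w[correct] * beta ** mu[correct]
    return w_new, error, beta
```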

In order to classify unknown examples, the votes of the fuzzy rules Rt on a new instance x are aggregated into an overall classification [5,15]. The classification proposed by a single rule is weighted according to the rule's classification inaccuracy expressed by

$$\beta_t = \frac{E(R_t)}{1 - E(R_t)}. \qquad (14)$$

Rather than pursuing a winner-takes-all approach in which the rule with the highest matching degree dictates the ultimate classification, all rules contribute to the overall classification. The vote of rule Rt on x is weighted by the rule activation $\mu_{R_t}(x)$ and the factor $\log(1/\beta_t)$. The weight $\log(1/\beta_t)$ can be interpreted as the confidence in the fuzzy rule Rt. The boosting classifier then outputs the class label cmax that maximizes the sum

$$c_{max} = \arg\max_{c_m} \sum_{R_t \mid c_t = c_m} \log(1/\beta_t) \, \mu_{R_t}(x). \qquad (15)$$
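A short sketch of this confidence-weighted vote, with assumed data structures for the rule base (each rule stores its class, its error E(Rt) and an activation function):

```python
import numpy as np

def boosted_classify(rules, x):
    votes = {}
    for r in rules:
        beta = r["error"] / (1.0 - r["error"])                   # Eq. (14)
        confidence = np.log(1.0 / beta)                          # weight of the rule's vote
        votes[r["cls"]] = votes.get(r["cls"], 0.0) + confidence * r["activation"](x)
    return max(votes, key=votes.get)                             # Eq. (15)
```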

Intuitively, the most accurate and most active rules have the largest influence on the classification. Unfortunately, confidence-rated fuzzy rules are less intuitive, in particular if several rules with possibly conflicting classes trigger for the same input. In that case, the outcome of the aggregated classification not only depends on how well the rules match the instance, but also on their relative confidence. The issue of rule weights within fuzzy classification systems has been widely discussed in the literature [7]. It turns out that for some types of fuzzy classification systems (e.g., NEFCLASS), rule weights can be equivalently substituted by proper modifications of the membership functions. However, such a simplification for the sake of improved interpretability is beyond the scope of this paper.

Fig. 7 contains a list of descriptive fuzzy rules for a credit data set. Some of these rules have very similar antecedents, but nevertheless propose opposite classifications. Which of these classifications ultimately dominates can only be understood if one also takes the relative confidence levels of the rules into account.

3. Neurofuzzy classification

Neurofuzzy systems encompass methods that use learning algorithms from neural networks to tune the parameters of a fuzzy system. Some of the most well-known neurofuzzy pattern recognition systems include FuNe [11], Fuzzy RuleNet [23], Anfis [17], Fuzzy Artmap [4] and Nefclass [20,21]. In this section, we will further elaborate on Nefclass since it is a neurofuzzy classifier that generates comprehensible descriptive fuzzy rules.

Nefclass has the architecture of a three-layer fuzzy perceptron whereby the first layer U1 consists of input neurons, the second layer U2 of hidden neurons and the third layer U3 of output neurons. The difference with a classical multilayer perceptron is that the weights now represent fuzzy sets and that the activation functions are now fuzzy set operators. The hidden layer neurons represent the fuzzy rules and the output layer neurons the different classes of the classification problem, with one output neuron per class. When a training pattern is propagated through the perceptron, the activation value of the hidden unit R and the output unit c are typically computed as follows:

$$a_R = \min_{x_j \in U_1} \{ W(x_j, R)(x_j) \},$$

$$a_c = \frac{\sum_{R \in U_2} W(R, c)\, a_R}{\sum_{R \in U_2} W(R, c)} \quad \text{or alternatively} \quad a_c = \max_{R \in U_2} \{ a_R \}, \qquad (16)$$

whereby W(xj, R) is the fuzzy weight between input unit xj and hidden rule unit R and W(R, c) is the weight between hidden rule unit R and output class unit c. Nefclass sets all weights W(R, c) to 1 for semantical reasons. Depending on the classification problem at hand, the output activation function can be just a simple maximum operator. After an observation has been propagated through the network, its predicted class is assigned according to the output neuron with the highest activation value (winner-takes-all). Fig. 4 depicts an example of a Nefclass network. The fuzzy rule corresponding to rule unit R1 can then, e.g., be expressed as follows:

$$\text{If } x_1 \text{ is small And } x_2 \text{ is large Then Class} = C_1, \qquad (17)$$

where the fuzzy sets 'small' and 'large' have membership functions $\mu_1^{(1)}$ and $\mu_1^{(2)}$, respectively.
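A minimal sketch of this forward pass, assuming the rule-to-class weights are fixed to 1 and using the winner-takes-all output variant of Eq. (16); the data structures are illustrative:

```python
def nefclass_forward(rule_units, x, num_classes):
    """rule_units: list of {'mfs': [membership function per input], 'cls': class index}."""
    # activation of a hidden rule unit: min over its fuzzy input weights W(x_j, R)
    a_R = [min(mf(xj) for mf, xj in zip(unit["mfs"], x)) for unit in rule_units]
    # output activation per class: maximum over the activations of its rule units
    a_c = [max((a for a, unit in zip(a_R, rule_units) if unit["cls"] == c), default=0.0)
           for c in range(num_classes)]
    return a_c.index(max(a_c))          # predicted class = most active output neuron
```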

Fig. 4. Example Nefclass network.


In order to generate descriptive fuzzy rules, rather than approximate fuzzy rules, Nefclass enforces all connections representing the same linguistic label (e.g., x1 is small) to have the same fuzzy set associated with them. For example, in Fig. 4, the fuzzy set having membership function $\mu_1^{(1)}$ is shared by the rule units R1 and R2 and thus has the same definition in both fuzzy rules. If this constraint were not imposed, the same linguistic label could be represented by different fuzzy sets, which would lead to the generation of approximate fuzzy rules and thus decrease the interpretability and comprehensibility of the classifier.

Nefclass starts by determining the appropriate number of rule units in the hidden layer. Suppose we have a data set D of N data points $\{(x_i, y_i)\}_{i=1}^N$, with input data $x_i \in \mathbb{R}^n$ and target vectors $y_i \in \{0, 1\}^m$ for an m-class classification problem. For each input $x_j \in U_1$, $q_j$ fuzzy sets $\mu_1^{(j)}, \ldots, \mu_{q_j}^{(j)}$ are defined. The rule learning algorithm then proceeds as follows.

1. Select the next pattern (xi, yi) from D.
2. For each input unit xj ∈ U1, j = 1, ..., n, find the membership function $\mu_{l_j}^{(j)}$ such that

$$\mu_{l_j}^{(j)}(x_j) = \max_{l \in \{1, \ldots, q_j\}} \{ \mu_l^{(j)}(x_j) \}. \qquad (18)$$

3. If there is no rule node R with

$$W(x_1, R) = \mu_{l_1}^{(1)}, \ldots, W(x_n, R) = \mu_{l_n}^{(n)}, \qquad (19)$$

then create such a node and connect it to output class node p if yi(p) = 1.
4. Go to step 1 until all patterns in D have been processed.

Obviously, the above procedure will result in a large number of hidden neurons and fuzzy rules. This can be remedied by specifying a maximum number of hidden neurons and keeping only the first k rule units created (Simple rule learning). Alternatively, one could also keep the best k rules (Best rule learning) or the best ⌊k/m⌋ rules for each class (Best per Class rule learning).
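The sketch below illustrates steps 1-4 of this rule-learning loop; it assumes fuzzy_sets[j] holds the qj membership functions of input j and that plain class labels (rather than one-hot target vectors) are supplied, and it is not the Nefclass implementation itself.

```python
def learn_rule_nodes(X, Y, fuzzy_sets, max_rules=None):
    """Create rule nodes from training data: antecedent -> class label."""
    rules = {}
    for x, y in zip(X, Y):                                       # step 1: next pattern
        antecedent = tuple(
            max(range(len(mfs)), key=lambda l: mfs[l](xj))       # step 2: best label, Eq. (18)
            for xj, mfs in zip(x, fuzzy_sets)
        )
        if antecedent not in rules:                              # step 3: create node if missing
            rules[antecedent] = y
        if max_rules is not None and len(rules) >= max_rules:
            break                                                # "simple" rule learning: keep first k
    return rules                                                 # step 4: all patterns processed
```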

Once the number of hidden units has been determined, the fuzzy sets between the input and hidden layer are tuned to improve the classification accuracy of the network. Hereto, Nefclass applies a fuzzy variant of the well-known backpropagation algorithm to tune the characteristic parameters of the membership functions (see [20,21] for more details).

In a third step, Nefclass offers the possibility to prune the rule base by removing rules and variables based on a simple greedy algorithm using several heuristics.

4. Empirical evaluation

4.1. Data sets and experimental setup

Table 1 presents the characteristics of the data sets that will be used to evaluate the different classifiers. The Breast Cancer, Pima, Australian (Austr) and German (Germ) credit data sets are publicly available at the UCI repository (http://kdd.ics.uci.edu/). The Gauss data set is an artificial data set generated from two two-dimensional Gaussian distributions centered at (0, 0) and (2, 0) with covariance matrices I and 4I.

Table 1
Characteristics of data sets

Data set          | Size | Inputs (total) | Continuous | Nominal
Breast cancer     | 698  | 9              | 9          | 0
Pima              | 768  | 8              | 8          | 0
Australian credit | 690  | 14             | 6          | 8
German credit     | 1000 | 20             | 7          | 13
Gauss             | 4000 | 2              | 2          | 0
Bene1             | 3123 | 27             | 13         | 14
Bene2             | 7190 | 28             | 18         | 10

The Bene1 and Bene2 data sets are two real-life credit scoring data sets that have been obtained from major Benelux (Belgium, The Netherlands and Luxembourg) financial companies.

These seven data sets will be used to train the genetic and neurofuzzy classifiers and compare their classification accuracy with a selection of well-known classification algorithms such as Fisher discriminant analysis, Bayes' classification rule, artificial neural networks and C4.5 decision trees. The first two classification algorithms did not require any specific parameter settings. We considered feedforward neural networks with one hidden layer, trained with error gradient descent using conjugate gradients. The C4.5 decision trees were retrospectively pruned. These algorithms were run using the PRTools Matlab Toolbox (http://www.prtools.org/). In order to compute the classification accuracy for the Breast Cancer, Pima, Austr, Germ, Gauss and Bene1 data sets, we will generate 10 randomisations and split each of them into 2/3 training set and 1/3 test set. Each classifier will then be trained and evaluated 10 times and its accuracy will be averaged. The averaged performances can then be compared using paired t-tests. For the Bene2 data set, we will use a single training set/test set split since it has a large number of observations. We will then use McNemar's test to test the performance differences [22]. We will also use this data set to illustrate some of the extracted fuzzy rule sets. We have included the attributes of this data set in Appendix A.
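For illustration only, the following sketch shows one way to organise such a repeated 2/3-1/3 hold-out evaluation with a paired t-test; it assumes scikit-learn style classifiers and uses scipy, neither of which is mentioned in the paper.

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.model_selection import train_test_split

def repeated_holdout(clf_a, clf_b, X, y, n_repeats=10, test_size=1/3):
    """Compare two classifiers over repeated random 2/3 - 1/3 splits."""
    acc_a, acc_b = [], []
    for seed in range(n_repeats):                      # 10 randomisations
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size,
                                                  random_state=seed)
        acc_a.append(clf_a.fit(X_tr, y_tr).score(X_te, y_te))
        acc_b.append(clf_b.fit(X_tr, y_tr).score(X_te, y_te))
    t, p_two_sided = ttest_rel(acc_a, acc_b)           # paired t-test on the 10 accuracies
    return np.mean(acc_a), np.mean(acc_b), p_two_sided
```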

For the descriptive genetic and neurofuzzy classifiers, we experimented with five and seven fuzzy sets for the continuous attributes. The fuzzy sets are of triangular shape and uniformly partition the universe of discourse such that the membership degrees sum up to one. The boosting scheme invoked the evolutionary algorithm for rule generation 20 times, generating an equivalent number of rules. In general, the classification error for the test set converged after about 15 rules, at which time usually each training example was covered by at least one fuzzy rule. The train and test set classification rates are reported as the average performance of the aggregated classifiers composed of 16 up to 20 rules. The evolutionary rule generation scheme evolved a population of 200 individuals over 50 generations, of which the overall best solution was added to the aggregated classifier. The chromosome length depends on the number of attributes and in the descriptive case also on the number of fuzzy sets. The genetic algorithm used fitness proportionate selection and fitness scaling, whereas the evolution strategy operated with (μ, λ) selection in which the 20 best individuals served as parents for the next generation of offspring. The number of 50 generations was sufficient, as the evolution strategy only identified individual rules that match some part of the training data rather than an entire rule base. Usually, the best fitness of the rule population stagnated after about 10–20 generations. The comparatively large population size guarantees an initial diversity of rules to sufficiently cover alternative sets of training instances. The suitable parameters for population size and number of generations were determined beforehand. However, the accuracy of the boosted GFS based on all rules showed very little dependence on these parameters. This observation is explained by the robustness of the boosting scheme, which still achieves high classification accuracy even if the individual classifiers perform poorly.
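A sketch of such a uniform triangular partition (a standard strong fuzzy partition in which the membership degrees sum to one, written here purely for illustration):

```python
import numpy as np

def uniform_partition(lo, hi, n_sets):
    """Uniform triangular partition of [lo, hi]; degrees sum to one inside the range."""
    centers = np.linspace(lo, hi, n_sets)
    step = centers[1] - centers[0]
    def membership(x):
        # triangular degree of x in each of the n_sets fuzzy sets
        return np.clip(1.0 - np.abs(x - centers) / step, 0.0, 1.0)
    return membership

mu = uniform_partition(0.0, 1.0, 5)
print(mu(0.3), mu(0.3).sum())          # five membership degrees of x = 0.3; their sum is 1
```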

For the Nefclass classifier, we used triangular fuzzy sets, best per class rule learning and set the maximum number of fuzzy rules generated to 50. We report both an optimistic and a pessimistic accuracy estimate. The optimistic estimate classifies all unclassified observations into the majority class whereas the pessimistic estimate considers them as misclassified.

Table 2
Classification accuracy of the evolutionary and neurofuzzy classifiers versus a selection of well-known classification algorithms (each cell reports train / test accuracy, standard deviations between parentheses)

Technique          | Cancer                  | Gauss                   | Pima                    | Austr                   | Germ                    | Bene1                   | Bene2
Fisher             | 96.6 (0.4) / 95.9 (0.9) | 76.1 (0.7) / 76.1 (0.5) | 77.8 (0.9) / 77.3 (0.8) | 86.0 (0.8) / 86.7 (1.5) | 77.1 (0.8) / 76.3 (1.6) | 72.9 (0.2) / 71.3 (0.9) | 73.9 / 72.7
Bayes              | 96.6 (0.4) / 95.8 (0.9) | 76.1 (0.7) / 76.1 (0.5) | 77.9 (1.1) / 77.3 (0.7) | 86.0 (0.8) / 86.7 (1.6) | 77.5 (0.8) / 76.0 (1.4) | 72.7 (0.5) / 71.2 (0.8) | 74.4 / 73.2
ANN                | 97.9 (0.8) / 96.2 (0.8) | 78.9 (1.5) / 78.4 (1.4) | 77.4 (3.0) / 75.0 (3.0) | 90.5 (1.2) / 86.0 (2.1) | 81.9 (1.3) / 74.1 (1.9) | 75.9 (1.0) / 71.5 (1.0) | 73.5 / 72.8
C4.5               | 98.6 (0.8) / 93.5 (1.0) | 81.1 (0.6) / 79.0 (0.8) | 86.1 (4.5) / 71.6 (4.2) | 90.8 (1.5) / 85.9 (2.1) | 84.9 (2.5) / 72.4 (1.3) | 85.8 (1.4) / 70.9 (1.2) | 89.2 / 70.2
GFS Approx         | 94.9 (1.9) / 92.7 (2.1) | 81.1 (0.6) / 79.4 (0.8) | 84.0 (0.8) / 75.9 (1.6) | 87.1 (1.4) / 84.6 (2.6) | 80.4 (1.0) / 73.3 (2.2) | 74.6 (0.4) / 72.9 (1.2) | 75.2 / 73.3
GFS Desc 5 FS      | 91.0 (0.6) / 89.3 (0.8) | 79.6 (0.6) / 79.6 (0.5) | 80.6 (1.3) / 76.0 (1.9) | 88.9 (0.9) / 85.9 (2.2) | 80.3 (1.0) / 73.2 (2.0) | 73.6 (0.4) / 72.1 (1.2) | 75.1 / 72.3
GFS Desc 7 FS      | 93.5 (0.6) / 91.0 (1.2) | 80.4 (0.4) / 80.3 (0.4) | 81.4 (1.9) / 76.6 (1.7) | 89.4 (1.1) / 85.8 (2.1) | 80.2 (0.8) / 73.4 (1.6) | 73.8 (0.6) / 72.9 (0.9) | 74.8 / 73.3
Nefclass pess 5 FS | 94.1 (1.4) / 93.2 (1.3) | 76.9 (1.5) / 76.6 (1.3) | 73.5 (2.4) / 72.7 (3.8) | 85.0 (0.7) / 86.5 (1.4) | 70.0 (2.6) / 70.5 (3.1) | 49.9 (9.7) / 51.0 (11)  | 70.6 / 70.1
Nefclass opt 5 FS  | 94.2 (1.4) / 93.3 (1.4) | 77.6 (1.9) / 77.3 (1.8) | 74.6 (0.8) / 73.8 (2.2) | 85.0 (0.7) / 86.5 (1.4) | 70.2 (2.6) / 70.6 (3.3) | 67.3 (7.7) / 68.6 (2.1) | 71.2 / 70.6
Nefclass pess 7 FS | 92.4 (1.4) / 91.4 (1.9) | 74.9 (0.7) / 74.9 (1.1) | 73.4 (2.1) / 72.6 (3.1) | 85.0 (0.7) / 86.5 (1.4) | 70.7 (2.2) / 70.0 (3.0) | 50.8 (16) / 49.7 (16)   | 68.7 / 69.4
Nefclass opt 7 FS  | 92.4 (1.3) / 91.5 (1.9) | 75.6 (0.5) / 75.3 (1.4) | 74.9 (1.1) / 74.2 (1.7) | 85.0 (0.7) / 86.5 (1.4) | 71.5 (1.5) / 70.7 (2.3) | 69.1 (3.9) / 68.6 (3.6) | 69.4 / 70.3

4.2. Results

Table 2 reports the classification accuracy and standard deviation (between parentheses) of all classifiers on the seven data sets. The best average test set classification accuracy is underlined and denoted in bold face for each data set. Test set performances that are not significantly different at the 5% level from the best performance with respect to a one-tailed paired t-test are tabulated in bold face. Statistically significant underperformances at the 1% level are put in italic. Performances significantly different at the 5% level but not at the 1% level are reported in normal script.

It can be observed from Table 2 that the Fisher, artificial neural network (ANN), and genetic fuzzy system inferring descriptive fuzzy rules with seven fuzzy sets (GFS Desc 7 FS) each achieved the best performance once, whereas the Bayes and the genetic fuzzy system inferring approximate fuzzy rules (GFS Approx) did so twice. Generally speaking, the GFS Desc 7 FS classifier performed better than the GFS Desc 5 FS classifier. The performance of the GFS Approx classifier was in line with that of its descriptive variants. Except for the Breast Cancer and Germ data sets, the GFS classifiers did not perform significantly worse than the standard classification schemes. The reason for the relatively poor performance of the GFS classifiers on the Breast Cancer and Germ data sets is related to the fitness criterion rule consistency in Eq. (9), for which throughout the experiments the default parameter value k = 1 was assumed. The parameter k should be adjusted to the difficulty of the classification problem (i.e., a small value of k if classes are easily separable, as in the Breast Cancer data set, and a larger value of k if there is a significant overlap among classes, as for example in the credit data sets). For the Breast Cancer data set, a lower value of k would put a stronger emphasis on rule consistency relative to the weight given to the two other criteria, rule and class coverage. Except for the Australian credit data set, both the C4.5 and Nefclass classifiers always achieved a statistically inferior classification performance at the 1% level. For the Nefclass classifier, we also experimented with other parameter settings (e.g., trapezoidal fuzzy sets, maximum 100 fuzzy rules, best rule learning) but found that its performance was highly dependent upon the specific parameter setting.

Fig. 5 shows the evolution of the training and test set classification accuracy with the number of fuzzy rules generated for the evolutionary fuzzy classifiers. The low classification rate for a small number of rules is due to the fact that a substantial number of instances is initially not covered by any rule and these are therefore counted as misclassified. Note that in the beginning, the classification accuracy on both the training and test set improves significantly, but after a sufficient number of rules have been added, the performance increases only marginally, hereby clearly illustrating the accuracy versus interpretability trade-off. The figure clearly illustrates that for all three evolutionary classifiers, the classification accuracy on both the training set and test set stagnates after approximately 15 rules have been generated, and that no significant overfitting occurs when more fuzzy rules are added to the knowledge base.

Fig. 5. Evolution of the training and test set classification accuracy with the number of fuzzy rules generated for the evolutionary fuzzy classifiers: (a) approximate fuzzy, (b) descriptive fuzzy 5 FS and (c) descriptive fuzzy 7 FS.

One might expect that the approximate fuzzy classifiers perform better on data sets with dominantly continuous attributes (in particular if the values are not uniformly distributed across the universe of discourse, as assumed by the uniform descriptive fuzzy partitions), while descriptive rules might be better able to cope with those having a larger number of nominal attributes which do not obey an ordering relation (as they can form a disjunction of multiple allowed nominal values). However, the results in Table 2 do not provide evidence to support this assumption. Even though single approximate and descriptive rules differ in their expressive power, it seems that the boosting scheme compensates for these limitations. The same applies to the resolution of rules, as the descriptive GFS with five fuzzy sets does not perform significantly worse than the GFS with seven fuzzy sets.

Fig. 6 depicts the approximate fuzzy rules that were generated by the genetic fuzzy classifier for the Bene2 data set. Fig. 7 presents the descriptive fuzzy rules extracted by the evolutionary fuzzy classifier using five fuzzy sets. One can observe that for data sets with a large number of attributes, such as the Bene2 credit data, the approximate GFS scheme generates more compact rule antecedents that only refer to two or three attributes, whereas the descriptive rule antecedents utilize more attributes. Note that this observation is only valid for the particular fitness functions used in this paper, which do not consider complexity measures. Therefore, rule or rule base simplicity and complexity are not explicitly rewarded or penalized. In case the interpretability of the generated fuzzy system is of primary concern, the fitness function needs to be augmented by additional complexity criteria.

Fig. 6. Approximate fuzzy rules for the Bene2 data set.

Fig. 7. Descriptive fuzzy rules for the Bene2 data set using five fuzzy sets.

5. Conclusions

In this paper, we have proposed two evolutionary fuzzy rule learners for generating approximate and descriptive fuzzy rules, respectively. Approximate fuzzy rules each have their own specific definition of membership functions, whereas descriptive fuzzy rules share a common, linguistically interpretable definition of membership functions in disjunctive normal form. A boosting scheme is used to extract the rules one by one and adapt the training set distribution. The experiments are carried out on a number of publicly available data sets and two real-life credit scoring data sets obtained from major Benelux financial institutions. It is shown that the evolutionary fuzzy rule learners compare favourably to the other classification techniques in terms of classification accuracy. Furthermore, the approximate and descriptive fuzzy rules yield about the same classification accuracy across the different data sets. The boosting scheme seems to compensate for the expressive weaknesses of the individual classification rules. As a result, no representation performs significantly better in terms of classification accuracy than the other. If the individual rules are more expressive, the same classification accuracy can be achieved with a smaller rule base. The designer of a genetic fuzzy classifier faces a trade-off between a smaller number of more complex rules and a larger rule base composed of more intuitive linguistic rules. Which design alternative is better depends on the requirements of the application and whether compactness or interpretability of the rule base is more important.

We also wish to add that the voting based classification scheme has the drawback that the overall classification result is often only interpretable in the context of multiple rules that are compatible with the input pattern and their aggregated votes. In particular, the class assignment of rules generated in the later stages might be misleading, as it does not reflect the best class on the entire data set, but rather on the weighted and thereby biased data set.


A fuzzy rule base that potentially contains contradicting classification rules for the same input is much more difficult to interpret for a human user than a set of mutually exclusive decision rules. In principle, there are two remedies to avoid rule bases with conflicting class decisions. The first one is to include an additional criterion of rule accuracy based on the original data set, rather than the weighted data set. The second solution involves a post-processing stage in which conflicting rules are removed, similar to the IRL method. Tackling this issue in order to arrive at a more comprehensible fuzzy rule base, with mutually exclusive fuzzy rules and a 'winner-takes-all' classification scheme, is beyond the scope of this paper but is considered to be an interesting topic for future research.

Appendix A

Table 3.

Table 3
Inputs for the Bene2 credit scoring data set

Name                                    | Type
Prepayment                              | Continuous
Monthly payment                         | Continuous
Amount on savings account               | Continuous
Stock ownership                         | Continuous
Percentage of mortgage burden           | Continuous
Other loan expenses                     | Continuous
Age of savings account                  | Nominal
Number of dependents                    | Continuous
Partner signed                          | Nominal
Loan amount                             | Continuous
Purpose of loan                         | Nominal
Term                                    | Continuous
Non-professional income                 | Continuous
Professional income                     | Continuous
Profession                              | Nominal
Marital status                          | Nominal
Gender                                  | Nominal
Number of years at current address      | Continuous
Age                                     | Continuous
Number of years in Belgium              | Continuous
Number of years at current job          | Continuous
Telephone yes or no                     | Nominal
Property                                | Nominal
Home bank                               | Nominal
Works in foreign country                | Nominal
Percentage of financial burden          | Continuous
Total income                            | Continuous
Available income                        | Continuous

References

[1] T. Bäck, Evolutionary Algorithms in Theory and Practice, Oxford University Press, 1996.

[2] B. Baesens, R. Setiono, C. Mues, J. Vanthienen, Using neural network rule extraction and decision tables for credit-risk evaluation, Management Science 49 (3) (2003) 312–329.

[3] B. Baesens, T. Van Gestel, S. Viaene, M. Stepanova, J. Suykens, J. Vanthienen, Benchmarking state of the art classification algorithms for credit scoring, Journal of the Operational Research Society 54 (6) (2003) 627–635.

[4] G.A. Carpenter, S. Grossberg, N. Markuzon, J.H. Reynolds, D.B. Rosen, Fuzzy Artmap: A neural network architecture for incremental supervised learning of analog multidimensional maps, IEEE Transactions on Neural Networks 3 (1992) 698–713.

[5] O. Cordón, M.J. del Jesus, F. Herrera, Genetic learning of fuzzy rule-based classification systems cooperating with fuzzy reasoning methods, International Journal of Intelligent Systems 13 (10–11) (1998) 1025–1053.

[6] O. Cordón, F. Herrera, F. Hoffmann, L. Magdalena, Genetic Fuzzy Systems: Evolutionary Tuning and Learning of Fuzzy Knowledge Bases, Advances in Fuzzy Systems, World Scientific, Singapore, 2001.

[7] D. Nauck, R. Kruse, How the learning of rule weights affects the interpretability of fuzzy systems, in: Proceedings of the IEEE International Conference on Fuzzy Systems 1998 (FUZZ-IEEE'98), Anchorage, AK, May 1998, pp. 1235–1240.

[8] Y. Freund, R.E. Schapire, Experiments with a new boosting algorithm, in: Proc. of the 13th Int. Conf. on Machine Learning ML-96, 1996.

[9] A. González, F. Herrera, Multi-stage genetic fuzzy systems based on the iterative rule learning approach, Mathware & Soft Computing 4 (1997) 233–249.

[10] A. González, R. Pérez, Completeness and consistency conditions for learning fuzzy rules, Fuzzy Sets and Systems 96 (1998) 37–51.

[11] S.K. Halgamuge, M. Glesner, Neural networks in designing fuzzy systems for real world applications, Fuzzy Sets and Systems 65 (1994) 1–12.

[12] F. Hoffmann, Boosting a genetic fuzzy classifier, in: Proc. Joint 9th IFSA World Congress and 20th NAFIPS International Conference, Vancouver, Canada, July 2001, pp. 1564–1569.

[13] F. Hoffmann, Combining boosting and evolutionary algorithms for learning of fuzzy classification rules, Fuzzy Sets and Systems 141 (1) (2004) 47–58.

[14] F. Hoffmann, B. Baesens, J. Martens, F. Put, J. Vanthienen, Comparing a genetic fuzzy and a neurofuzzy classifier for credit scoring, International Journal of Intelligent Systems 17 (11) (2002) 1067–1083.

[15] H. Ishibuchi, T. Nakashima, T. Morisawa, Voting in fuzzy rule-based systems for pattern classification problems, Fuzzy Sets and Systems 103 (1999) 223–238.

[16] H. Ishibuchi, K. Nozaki, N. Yamamoto, H. Tanaka, Selecting fuzzy if–then rules for classification problems using genetic algorithms, IEEE Transactions on Fuzzy Systems 3 (3) (1995) 260–270.

[17] J.S.R. Jang, Anfis: Adaptive network based fuzzy inference systems, IEEE Transactions on Systems, Man and Cybernetics 23 (1993) 665–685.

[18] L. Junco, L. Sánchez, Using the AdaBoost algorithm to induce fuzzy rules in classification problems, in: Proc. Spanish Conference for Fuzzy Logic and Technologies (ESTYLF 2000), Sevilla, Spain, September 2000, pp. 297–301.

[19] L. Magdalena, F. Monasterio, Evolutionary-based learning applied to fuzzy controllers, in: Proceedings 4th IEEE International Conference on Fuzzy Systems and the Second International Fuzzy Engineering Symposium, FUZZ-IEEE/IFES'95, vol. 3, March 1995, pp. 1111–1118.

[20] D. Nauck, Data analysis with neuro-fuzzy methods, Habilitation Thesis, University of Magdeburg, 2000.

[21] D. Nauck, R. Kruse, A neuro-fuzzy method to learn fuzzy classification rules from data, Fuzzy Sets and Systems 89 (1997) 277–288.

[22] D.J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, second ed., Chapman and Hall, 2000.

[23] N. Tschichold-Gürman, Generation and improvement of fuzzy classifiers with incremental learning using fuzzy rulenet, in: K. George, J.H. Carrol, E. Deaton, D. Oppenheim, J. Hightower (Eds.), Proceedings of the ACM Symposium on Applied Computing, ACM Press, Nashville, NY, USA, 1995, pp. 466–470.