Combined numerical and linguistic knowledge representation and its application to medical diagnosis


206 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 33, NO. 2, MARCH 2003

Combined Numerical and Linguistic Knowledge Representation and Its Application to Medical Diagnosis

Phayung Meesad, Student Member, IEEE, and Gary G. Yen, Senior Member, IEEE

Abstract—In this paper, we propose a novel hybrid intelligent system (HIS) which provides a unified integration of numerical and linguistic knowledge representations. The proposed HIS is a hierarchical integration of an incremental learning fuzzy neural network (ILFN) and a linguistic model, i.e., a fuzzy expert system (FES), optimized via the genetic algorithm (GA). The ILFN is a self-organizing network with the capability of fast, one-pass, online, and incremental learning. The linguistic model is constructed based on knowledge embedded in the trained ILFN or provided by the domain expert. The knowledge captured from the low-level ILFN can be mapped to the higher level linguistic model and vice versa. The GA is applied to optimize the linguistic model to maintain high accuracy, comprehensibility, completeness, compactness, and consistency. The resulting HIS is capable of dealing with low-level numerical computation and higher level linguistic computation. After the system is completely constructed, it can incrementally learn new information in both numerical and linguistic forms. To evaluate the system's performance, the well-known benchmark Wisconsin breast cancer data set was studied as an application to medical diagnosis. The simulation results show that the proposed HIS performs better than the individual standalone systems. The comparison results show that the linguistic rules extracted are competitive with or even superior to some well-known methods. Our interest is not only in improving the accuracy of the system, but also in enhancing the comprehensibility of the resulting knowledge representation.

Index Terms—Decision support system, fuzzy expert system, genetic algorithms, hybrid intelligent system, ILFN, medical diagnosis, pattern classification, Wisconsin breast cancer database.

I. INTRODUCTION

CONVENTIONAL medical diagnosis in clinical examinations relies heavily upon the physicians' experience. Physicians intuitively exercise knowledge obtained from previous patients' symptoms. In everyday practice, the amount of medical knowledge grows steadily, such that it may become difficult for physicians to keep up with all the essential information gained. For physicians to diagnose a patient quickly and accurately, there is a critical need to employ computerized technologies that assist in medical diagnosis and in accessing the related information. Computer-assisted technology is certainly helpful for

Manuscript received May 23, 2001; revised April 5, 2002 and September 13, 2002. This paper was recommended by Associate Editor E. Santos.

P. Meesad is with the Power and Control Division, Department of Electrical Engineering, King Mongkut's Institute of Technology, Bangkok 10800, Thailand.

G. G. Yen is with the Intelligent Systems and Control Laboratory, School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, OK 74078 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TSMCA.2003.811290

inexperienced physicians in making medical diagnoses as well as for experienced physicians in supporting complex decisions. Computer-assisted technology has become an attractive tool to help physicians retrieve medical information as well as make decisions in the face of today's medical complications [1]–[7].

A number of medical diagnostic decision support systems (MDSS) based on computational intelligence methods have been developed to assist physicians and medical professionals. Some medical diagnosis systems based on computational intelligence methods use expert systems (ESs) [8]–[11], fuzzy expert systems (FESs) [12]–[15], artificial neural networks (ANNs) [16]–[19], and genetic algorithms (GAs) [20]–[22]. ESs and FESs, which use symbolic and linguistic knowledge, respectively, are well recognized in medical diagnosis applications because their decisions are easy for physicians and medical professionals to understand. The development of an ES or a FES for medical diagnosis is not a trivial task, however. It demands a labor-intensive and iterative process involving medical experts who may not be readily available. ANNs have been employed to learn numerical data recorded from sensory measurements or images. After being trained, ANNs store knowledge in numerical weights and biases that are often regarded as a black-box scheme; from this knowledge it is difficult for physicians or medical professionals to understand the underlying rationale. Recently, rule extraction algorithms have made it possible to translate the numerical weights of ANNs into symbolic/linguistic rules [23]–[29]. The extracted symbolic/linguistic rules are then used as a knowledge base for an ES or a FES to support physicians in making decisions for medical diagnosis [23], [26], [28]–[30]. The resulting knowledge base is often incomplete and inefficient, and it may perform poorly on unseen data. To improve the accuracy of a decision making system, such as an application in medical diagnosis, the integration of symbolic/linguistic processing and numerical computation has motivated a research area known as hybrid intelligent architectures [26], [31]–[34]. Hybrid intelligent architectures tend to be more appropriate in applications that require both numerical computation for higher generalization and symbolic/linguistic reasoning for explanation. It has been found that hybridization between symbolic/linguistic and numerical representations can achieve higher accuracy than either one alone [24], [26], [34].

Most hybrid intelligent systems have focused on accuracy and interpretability. In a learning system, an incremental learning capability is considered an important attribute



Fig. 1. Architecture of the proposed hybrid intelligent system.

aside from accuracy and interpretability. In medical diagnosis, patient data grows every day as new symptoms are discovered. Novel medical knowledge should be quickly incorporated into the medical diagnosis system without spending tremendous time in the learning process. In a hybrid system, multilayer perceptron (MLP) neural networks trained by backpropagation algorithms are usually used as the numerical model [26]. One well-known problem of backpropagation is that it is difficult to employ an incremental learning feature. The standard backpropagation algorithm lacks the ability to dynamically generate extra nodes or connections during learning [24], [35]. In medical problems, new knowledge of symptoms may be found after the diagnosis system has been constructed. Using backpropagation algorithms, all old and new data have to be retrained in order to update the knowledge to cover the new symptoms. With an incremental learning algorithm, the new knowledge can be learned and added to the system without retraining previously learned data [35]–[38].

The contribution of this study is to develop a pattern classifier system (i.e., a decision support system) that is concerned not only with accuracy and interpretability, but also with an incremental learning concept. We propose a hybrid intelligent system (HIS) that is composed of a numerical model at the low level and a linguistic model at the higher level and is equipped with an incremental learning algorithm. The proposed system is a hierarchical integration of an incremental learning fuzzy neural network (ILFN) [37], [38] and a FES based on the Takagi–Sugeno (TS) fuzzy model [39]. The ILFN is a self-organizing network with the capability of fast online learning. The ILFN can learn incrementally without retraining old information. The linguistic model, the FES, is constructed based on knowledge embedded in the trained ILFN. The knowledge captured from the low-level ILFN can be mapped to the higher level FES and vice versa. The system is equipped with a conflict resolution scheme to maintain consistency in decision making. The low-level ILFN contributes fast, incremental learning, while the higher level FES offers the advantage of dealing with fuzzy data and provides easy interpretation and explanation of the decision made. The GA [40] is applied to optimize the linguistic model to maintain high accuracy and comprehensibility. The resulting HIS is capable of dealing with low-level numerical data and higher level linguistic information. After being completely constructed, the system can incrementally learn new information in both numerical and linguistic forms.

The remainder of this paper is organized as follows. Section II discusses the proposed HIS architecture. To demonstrate the effectiveness and efficiency of the proposed system, numerical simulations and benchmark comparisons are studied in Section III. Section IV provides some concluding remarks and future research directions.

II. ARCHITECTURE OF THE PROPOSED HYBRID INTELLIGENT SYSTEM

The proposed HIS, shown in Fig. 1, is constituted of the following five components:

1) ILFN;
2) FES;
3) network-to-rule module;
4) rule-to-network module;
5) decision-explanation module.

Input data is brought into the system through both the ILFN and the FES. The ILFN and the FES are linked together by a network-to-rule module, which is a rule extraction algorithm for mapping the ILFN to the FES, and a rule-to-network module, which is an algorithm for mapping the FES to the ILFN. The outputs of the ILFN and the FES connect to the decision-explanation module, which makes decisions and explanations based on the information received from both the ILFN and the FES. The human operators can interact with the system through a user interaction module. The details of each module are discussed in the following sections.

A. Incremental Learning Fuzzy Neural Network

The ILFN is a self-organizing network that is equipped with an online, incremental learning algorithm capable of learning all training patterns within only one pass. Gaussian radial basis functions are used to form the distribution of the pattern space. The ILFN can learn in a supervised or an unsupervised fashion [37], [38]. The ILFN has four layers: one input layer, one hidden layer, one output layer, and one decision layer, as shown in Fig. 2. Alternatively, the system can be viewed as two subsystems: an input subsystem and a target subsystem. Each subsystem has three layers: one input layer, one hidden layer, and one output layer. The hidden layers of the input subsystem and the target subsystem are linked together via a controller module which is used to control the growing of neurons in the hidden layer. Each output layer of both subsystems consists of two modules. The output layer of the input subsystem consists of a pruning module and a membership module, while the output layer of the target subsystem consists of a pruning module and a target module. The membership module of the input subsystem and the target module of the target subsystem are simultaneously updated, with their numbers of neurons controlled by the pruning modules. The outputs of the classifier are linked together via a decision layer.

The ILFN network uses four weighting parameters, W_P, W_T, sigma, and count, as well as one threshold parameter. W_P is the hidden weight of the input subsystem; each row of W_P (i.e., a node in the hidden layer) represents the mean or centroid of a cluster. Each node of W_T stores the corresponding target of the input prototype patterns. sigma and count are the standard deviation and the number, respectively, of patterns that belong to each node. The threshold parameter, selected within the range [0, 1], controls the number of clusters created: the system generates many clusters if the threshold is large and few clusters if it is small. However, clusters that belong to the same class are grouped together via the pruning module. The details of the learning algorithm can be found in [37] and [38].
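To make the one-pass, incremental behavior described above concrete, the following minimal sketch (in Python) grows prototypes from a single presentation of the data; it is an illustration under our own assumptions (Euclidean distance, a vigilance-style threshold test, and a running-mean update), not the authors' ILFN algorithm from [37], [38].

import numpy as np

def ilfn_one_pass(patterns, targets, threshold=0.5):
    """Sketch of one-pass incremental clustering: each pattern either joins
    the nearest prototype of its own class or spawns a new prototype."""
    W_P, W_T, count = [], [], []          # prototype centers, class targets, pattern counts
    for x, t in zip(patterns, targets):
        candidates = [i for i, c in enumerate(W_T) if c == t]
        if candidates:
            dists = [np.linalg.norm(x - W_P[i]) for i in candidates]
            best = candidates[int(np.argmin(dists))]
        if not candidates or min(dists) > threshold:
            W_P.append(np.array(x, dtype=float))   # new knowledge: create a prototype
            W_T.append(t)
            count.append(1)
        else:
            count[best] += 1                       # update the winner as a running mean
            W_P[best] += (x - W_P[best]) / count[best]
    return np.array(W_P), np.array(W_T), np.array(count)

# toy usage: two well-separated classes produce two prototypes
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y = np.array([1, 1, 2, 2])
print(ilfn_one_pass(X, y, threshold=0.5))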

A trained ILFN does not exhibit a clear meaning of the knowledge embedded inside its structure. Linguistic knowledge is preferable if an explanation of the decision is needed. Thus it is desirable to transform the knowledge of the trained ILFN into a more comprehensible linguistic form usable in a FES. A FES has a close relationship with an ILFN network in that one can be mapped to the other. In order to employ both numerical calculation from an ILFN and linguistic processing from a FES, we combine both a trained ILFN and the mapped FES into the same hybrid system. The output decision of the hybrid system is based on both the ILFN and the FES. The resulting hybrid system provides complementary features from both the ILFN and the FES and the ability to deal with more complex problems that need an explanation capability.

The following sections describe the details of the FES used in this study, as well as how to map knowledge from a trained ILFN to a FES and vice versa.

Fig. 2. ILFN classifier architecture.

B. Fuzzy Expert System (FES)

A FES can be thought of as a special kind of expert system (ES). In fact, a FES is an ES that incorporates fuzzy sets [41]. Thus, a FES exhibits transparency to users. Users can easily understand the decision made by a FES due to the fact that the rule base is in the "if–then" form used in natural languages. From a knowledge representation viewpoint, a fuzzy if–then rule is a scheme for capturing knowledge that is imprecise by nature.

Fig. 3 illustrates a schematic diagram of a FES. A FES is composed of four main modules: a fuzzifier, an inference engine, a defuzzifier, and a knowledge base. The function of the fuzzifier is to determine the degree of membership of a crisp input in a fuzzy set. The fuzzy knowledge base is used to represent the fuzzy relationships between input and output fuzzy variables. The output of the fuzzy knowledge base is determined by the degrees of membership specified by the fuzzifier. The inference engine utilizes the information from the knowledge base as well as from the fuzzifier to infer additional information. The output of a FES can be fuzzy values from the inference engine process. Output in fuzzy value format is advantageous in pattern classification problems, since the fuzzy values indicate the degree to which a given pattern belongs to the class prototypes. Optionally, the defuzzifier is used to convert the fuzzy output of the system into crisp values.

In a FES, a knowledge base is used for explanation purposes as well as in the decision making process. The knowledge structure used in the proposed FES is comprised of the following:

1) input features’ names;2) variables’ ranges;3) number of linguistic labels;4) linguistic labels;5) membership function;6) membership functions’ parameters;7) fuzzy if-then rules.

The information about the knowledge structure of the FES can be provided by experts or automatically generated from data. For an n-dimensional pattern space, the components of the knowledge base used in the FES are detailed as follows:


Fig. 3. Fuzzy expert system.

Knowledge Base (1): the collection of all of the components listed below.

Input Features' Names (2): one name for each of the n input dimensions.

Variables' Ranges (3): the minimum and maximum values of each input variable.

Number of Linguistic Labels (4): the number of linguistic labels used in each dimension.

To maintain comprehensibility of the linguistic model, the number of linguistic variables should be as small as possible. It is suggested that it should not be larger than nine [42].

Linguistic Labels (5): the set of linguistic labels for each dimension, where each label is a linguistic term in that dimension together with an index to it.

Membership Functions (6): the membership function assigned to each linguistic label.

Membership Functions' Parameters (7), (8): the parameters of each membership function.

Assume that a Gaussian membership function is used; it has two parameters, a standard deviation sigma and a mean m. Thus, we have

mu(x) = exp( -(x - m)^2 / (2 sigma^2) ).    (9)

Fuzzy If–Then Rules (10): the rule base R, consisting of an antecedent part A and a consequent part {B, CF}. A represents the antecedent part of the if–then rules; B and CF constitute the consequent part; M and n are the number of fuzzy rules and the dimension of the pattern space, respectively. A_ij, i = 1, ..., M, j = 1, ..., n, is the antecedent of the ith rule for the jth dimension; its value is the index of a linguistic label among the linguistic labels of the jth dimension. If A_ij is "0", then the system uses a don't care label whose activation function always returns a unity membership grade. B_i is a constant value that is the class consequent of the ith rule. CF_i, a value in [0, 1], is the confidence factor of the ith rule. In a FES for a finite-class pattern classification problem with an n-dimensional pattern space, linguistic knowledge can be written as a set of fuzzy if–then rules in natural language as follows:

IF x_1 is A_i1 AND x_2 is A_i2 AND ... AND x_n is A_in, THEN x belongs to Class B_i with confidence factor CF_i,    (11)

where i = 1, ..., M is the label of the ith rule and each A_ij indicates a linguistic label such as small, medium, or large.

Assume that Gaussian membership functions are employed in the FES. The firing strength of the ith rule can be computed by the following equation:

alpha_i = T_{j=1..n} exp( -(x_j - m_ij)^2 / (2 sigma_ij^2) ),  for i = 1, ..., M,    (12)

where m_ij and sigma_ij are the mean and the standard deviation, respectively, of the linguistic label indexed by A_ij, and T is a t-norm operator, which can be replaced by product.

210 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 33, NO. 2, MARCH 2003


Fig. 4. (a) Projection of ILFN to 1-D fuzzy sets. (b) FES grid partition with its parameter projected from trained ILFN.

After computing the firing strength of each rule, the class output is calculated by the inference mechanism in (13), which aggregates the rule firing strengths with the class consequents of the rules.
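As a concrete illustration of (12) and the inference step, the sketch below fires a small rule base with Gaussian membership functions and a product t-norm; the max-wins class selection and the use of the confidence factor as a multiplier are our assumptions, since (13) is not reproduced in the text.

import numpy as np

def gauss_mf(x, mean, sigma):
    # Gaussian membership grade, as in (9)
    return np.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))

def fes_classify(x, rules, mf_params):
    """Fire every fuzzy if-then rule and return the winning class.
    rules: list of (antecedent label indices, class, confidence factor);
    a label index of 0 means don't care (unity membership grade).
    mf_params[j][k] = (mean, sigma) of label k in dimension j."""
    best_class, best_strength = None, -1.0
    for antecedents, cls, cf in rules:
        strength = 1.0
        for j, label in enumerate(antecedents):
            if label == 0:                      # don't care
                continue
            mean, sigma = mf_params[j][label]
            strength *= gauss_mf(x[j], mean, sigma)   # product t-norm, (12)
        strength *= cf                          # assumed weighting by the confidence factor
        if strength > best_strength:
            best_class, best_strength = cls, strength
    return best_class, best_strength

# toy usage: two features, labels 1 = low, 2 = high
mf_params = {0: {1: (0.0, 0.5), 2: (1.0, 0.5)},
             1: {1: (0.0, 0.5), 2: (1.0, 0.5)}}
rules = [((1, 2), 1, 1.0),    # if x1 is low and x2 is high, then class 1
         ((2, 0), 2, 0.9)]    # if x1 is high (x2 don't care), then class 2
print(fes_classify(np.array([0.9, 0.2]), rules, mf_params))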

In developing a FES, developers must pay attention to several issues such as accuracy, comprehensibility, compactness, completeness, and consistency. Accuracy is a quantitative measure that indicates the performance of a FES in classifying both training and testing data. Comprehensibility indicates how easily a FES can be accessed by human beings. Generally, comprehensibility of a FES depends on the following aspects: the distinguishability of the shapes of membership functions, the number of fuzzy if–then rules, and the number of antecedent conditions of fuzzy if–then rules. Compactness involves the size of the fuzzy if–then rule set and the number of antecedents of fuzzy if–then rules. More compactness of a fuzzy system usually yields higher comprehensibility. Completeness assures that a FES will provide a nonzero output for any given input in the input space. Consistency makes sure that fuzzy if–then rules do not conflict with each other or with human sense. Fuzzy if–then rules are inconsistent if they have very similar antecedents but distinct consequents, or if they conflict with expert knowledge. If there are fuzzy if–then rules that conflict with each other, the rules become unclear [43]–[46]. The conflicting rules need to be resolved.

For the completeness of the rule structure in a FES, grid partition methods are widely used for partitioning the input space into grid cells. Fuzzy if–then rules can be obtained by using fuzzy grid partitions [47]. Despite its advantage of providing completeness of the rule structure, the grid partition method has the disadvantage that the number of fuzzy rules increases exponentially as the dimension of the input space increases. Since each cell represents a fuzzy if–then rule, the number of fuzzy if–then rules is usually very large. The system becomes a black-box scheme that is not comprehensible to human users. Pattern classification problems in the real world often have large dimensions. It is undesirable to directly use grid-type partitioning for constructing fuzzy if–then rules.

To obtain a smaller number of fuzzy rules, projection from clusters [43], [45] is called for. Clustering algorithms can be used for partitioning data points into a small number of clusters. Each cluster then represents a fuzzy relation and corresponds to a rule. The fuzzy sets in the antecedent parts of the rules are projected from the clusters onto the corresponding axes of the data space. The number of rules from the projection method is smaller than from grid-type partitioning, and a more compact linguistic model is obtained. However, the fuzzy sets that are directly projected from clustering methods may not be transparent enough. The number of fuzzy sets from the projection may be very large and redundant, since the projected fuzzy sets may be very similar, resulting in a fuzzy system that is not optimal. The problem mentioned above can be solved using rule simplification methods [45], [48].

The ILFN groups the patterns in the input space into a small number of clusters. Based on grid partition methods, the clusters and their parameters in the trained ILFN can be mapped to fuzzy if–then rules. The number of fuzzy if–then rules is equal to the number of clusters in the trained ILFN. The number of fuzzy sets in each dimension depends on the number of grid partitions chosen. The parameters of the fuzzy sets are projected from the cluster parameters of the trained ILFN. Fig. 4(a) shows the projection of the ILFN to 1-D fuzzy sets. We can see that the fuzzy sets in Fig. 4(a) have some similarity. The combination of the grid partitioning method and the projection method is shown in Fig. 4(b). The parameters of the fuzzy sets in Fig. 4(b) are projected and adapted from the clusters of the trained ILFN.

Using a grid-based projection method, the fuzzy if–then rules of the FES are extracted from a trained ILFN. The hidden numerical weights of the ILFN are mapped into initial fuzzy if–then rules. A genetic algorithm is then used to select only discriminatory features, resulting in a more compact rule set with highly transparent linguistic terms. The following section describes the mechanism used to map the ILFN to the FES.

C. Network-to-Rule Module

Since the knowledge embedded in the ILFN is not in linguistic form, the ILFN lacks an explanation capability. The ILFN weights can be extracted by using a rule extraction algorithm to obtain linguistic rules. A meaningful explanation of the reasoning process can then be generated from the linguistic model, i.e., the FES. The mechanism used for mapping a trained ILFN to a linguistic knowledge base operates inside the network-to-rule module. The mechanism is called the ilfn2rule algorithm.

1) ILFN2RULE Algorithm: Using a grid-based projection method, the ilfn2rule algorithm is used to map a trained ILFN to fuzzy if–then linguistic rules.

The user specifies the membership function types; any type of fuzzy membership function can be used. In this study, Gaussian membership functions are used. The rule extraction algorithm is given in the four steps described below.

Step 1: Retrieve the trained ILFN parameters (W_P, W_T, count) as well as the numbers of linguistic labels (NL). (The numbers of linguistic labels are determined during the genetic optimization process that will be discussed later.)

Step 2: Calculate the membership functions' parameters, i.e., a mean (center) and a standard deviation for each linguistic label in the case that Gaussian membership functions are used. The centers of the Gaussian functions can be determined from the variables' ranges, i.e., the minimum and maximum values of the numerical weights of the ILFN network:

Vmin_j = min_i (w_Pij)    (14)
Vmax_j = max_i (w_Pij)    (15)
res_j = (Vmax_j - Vmin_j) / (N_j - 1)    (16)

where i = 1, ..., P; j = 1, ..., n; P is the number of hidden nodes, i.e., prototypes created by the ILFN network; n is the dimension of the pattern space; res_j represents the numerical resolution between linguistic variables in the jth dimension; Vmax_j and Vmin_j are the maximum and the minimum values of the weights in the jth dimension; and N_j is the number of linguistic variables in the jth dimension.

m_j1 = Vmin_j    (17)
m_jk = m_j(k-1) + res_j,  for k = 2, ..., N_j    (18)
sigma_j = sqrt( -res_j^2 / (2 ln lambda) )    (19)

where m_jk, k = 1, ..., N_j, represents the mean of the kth Gaussian membership function in the jth dimension; sigma_j represents the standard deviation of the Gaussian membership functions in the jth dimension; and lambda, selected in [0, 1], represents the overlap parameter between membership functions.

Step 3: Map the numerical weights into linguistic label form using the following equation:

A_ij = argmin_k | w_Pij - m_jk |    (20)

where A_ij represents the index of the linguistic label mapped from w_Pij; w_Pij, for i = 1, ..., P, j = 1, ..., n, is an element of the hidden weight W_P of the ILFN network; n is the dimension of the pattern space; and P is the number of prototypes created by the ILFN network.

Step 4: Generate the if–then rule table: use the linguistic antecedent parts obtained from A and the consequent parts from W_T. The number of fuzzy if–then rules is equal to the number of hidden neurons of the trained ILFN. Optionally, calculate a confidence factor CF_i, i = 1, ..., P, for each rule from the count parameter, as in (21)–(23), where count_i, i = 1, ..., P, is the count parameter of the ith rule (i.e., the ith prototype), accumulated whenever a pattern is included into the prototype.

The knowledge base in Fig. 5 can be described in fuzzy linguistic form that is similar to natural language as follows:

Rule 1: If feature 1 is high and feature 2 is low and feature 3 is medium, then class is 1;

Rule 2: If feature 1 is medium and feature 2 is low and feature 3 is low, then class is 3;

Rule 3: If feature 1 is high and feature 2 is medium and feature 3 is medium, then class is 2.

For clarity of the ilfn2rule algorithm, we shall illustrate an example. Suppose we derive a trained ILFN network with hidden weights W_P = [9.1 3.8; 1.6 3.2; 1.4 8.8] (three prototypes in a two-dimensional pattern space) and corresponding targets W_T = [3, 2, 1]. Note that we will use Gaussian membership functions, which require two parameters: the mean and the standard deviation. The ilfn2rule algorithm performs the following steps.


Fig. 5. Mapping from ILFN to linguistic rules.

Step 1: Select NL = 3 and the overlap parameter lambda = 0.01. The parameters W_P, W_T, count, and NL are acquired.

Step 2: Calculate the mean and standard deviation for each membership function.

Vmin1 = min(wP11, wP21, wP31) = min([9.1, 1.6, 1.4]) = 1.4
Vmax1 = max(wP11, wP21, wP31) = max([9.1, 1.6, 1.4]) = 9.1
res1 = (Vmax1 - Vmin1)/(N1 - 1) = (9.1 - 1.4)/(3 - 1) = 7.7/2 = 3.85
Vmin2 = min(wP12, wP22, wP32) = min([3.8, 3.2, 8.8]) = 3.2
Vmax2 = max(wP12, wP22, wP32) = max([3.8, 3.2, 8.8]) = 8.8
res2 = (Vmax2 - Vmin2)/(N2 - 1) = (8.8 - 3.2)/(3 - 1) = 5.6/2 = 2.8
m11 = Vmin1 = 1.4
m12 = m11 + res1 = 1.4 + 3.85 = 5.25
m13 = m12 + res1 = 5.25 + 3.85 = 9.1
m1 = [m11, m12, m13] = [1.4, 5.25, 9.1]
sigma1 = sqrt(-res1^2/(2 ln lambda)) = sqrt(-3.85^2/(2 ln 0.01)) = 1.5
m21 = Vmin2 = 3.2
m22 = m21 + res2 = 3.2 + 2.8 = 6
m23 = m22 + res2 = 6 + 2.8 = 8.8
m2 = [m21, m22, m23] = [3.2, 6, 8.8]
sigma2 = sqrt(-res2^2/(2 ln lambda)) = sqrt(-2.8^2/(2 ln 0.01)) = 1.1

Step 3: Map the numerical weight W_P into linguistic labels.

A_ij = argmin_k | wPij - mjk |
A11 = argmin([|9.1 - 1.4|, |9.1 - 5.25|, |9.1 - 9.1|]) = argmin([7.7, 3.85, 0]) = 3
A12 = argmin([|3.8 - 3.2|, |3.8 - 6|, |3.8 - 8.8|]) = argmin([0.6, 2.2, 5]) = 1
A21 = argmin([|1.6 - 1.4|, |1.6 - 5.25|, |1.6 - 9.1|]) = 1
A22 = argmin([|3.2 - 3.2|, |3.2 - 6|, |3.2 - 8.8|]) = 1
A31 = argmin([|1.4 - 1.4|, |1.4 - 5.25|, |1.4 - 9.1|]) = 1
A32 = argmin([|8.8 - 3.2|, |8.8 - 6|, |8.8 - 8.8|]) = 3

A = [3 1; 1 1; 1 3],  B = [3; 2; 1]

Step 4: Generate the if–then rule table:

R = {A, B} = [3 1 | 3; 1 1 | 2; 1 3 | 1].
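The four ilfn2rule steps applied to this example can be condensed into the following sketch; the helper names are ours, and the standard-deviation expression follows the reading of Step 2 used above, so it should be treated as an assumption rather than the authors' exact formula.

import numpy as np

def ilfn2rule(W_P, W_T, NL=3, lam=0.01):
    """Map ILFN prototype weights to fuzzy if-then rules (grid-based projection).
    Returns label means, standard deviations, and the rule table {A, B}."""
    P, n = W_P.shape
    means, sigmas = [], []
    for j in range(n):                                    # Step 2: label centers per dimension
        v_min, v_max = W_P[:, j].min(), W_P[:, j].max()
        res = (v_max - v_min) / (NL - 1)
        means.append(v_min + res * np.arange(NL))
        sigmas.append(np.sqrt(-res ** 2 / (2.0 * np.log(lam))))  # assumed form of (19)
    A = np.zeros((P, n), dtype=int)
    for i in range(P):                                    # Step 3: nearest label index, (20)
        for j in range(n):
            A[i, j] = int(np.argmin(np.abs(W_P[i, j] - means[j]))) + 1
    B = np.asarray(W_T)                                   # Step 4: consequents from the targets
    return means, sigmas, A, B

# the worked example: three prototypes in a 2-D pattern space
W_P = np.array([[9.1, 3.8], [1.6, 3.2], [1.4, 8.8]])
W_T = np.array([3, 2, 1])
means, sigmas, A, B = ilfn2rule(W_P, W_T, NL=3, lam=0.01)
print(means)    # [1.4, 5.25, 9.1] and [3.2, 6.0, 8.8]
print(A, B)     # A = [[3 1], [1 1], [1 3]], B = [3 2 1]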

After linguistic rules are extracted from the ILFN network, they can be used as a rule base for a fuzzy expert system. A fuzzy expert system is considered a higher level knowledge representation since it uses if–then rules similar to natural languages. Using linguistic form makes the system transparent, allowing human users to easily comprehend the rationale of how the decision was made. Explanations and answers can be provided if needed.

In pattern classification problems, the dimension of the pattern space may be very large. For very large dimensions, it is too cumbersome to use all of the available features in a knowledge base. Even though it is described in linguistic form, a system that uses all available features is no longer transparent to users. It is possible to select only the feature subset that provides the most discriminatory power in classifying patterns. To do this, the genetic algorithm is very useful and suitable for selecting the important features. We adapt the GA [40] to search for an optimal number of features used in each rule while maintaining a high percentage of correct classification. This also results in reducing the number of rules: some rules become redundant after many features have been eliminated, and these duplicate rules can be pruned out.

2) Genetic Algorithm for Rule Optimization: The linguistic rule base extracted from the ILFN is suboptimal. In order to obtain a near-optimal rule set, the GA is used to operate on the initial fuzzy rules. An integer chromosome representation is used instead of a binary chromosome representation to reduce the size of the chromosome and improve the speed of the evolutionary operations. The fuzzy if–then rules are encoded into integer chromosomes to be evolved by the GA. After convergence, the best chromosomes are decoded back into the FES with a compact rule set. In order to apply the genetic optimization, the if–then rule base is encoded in a chromosome representation. Only the antecedents are coded and operated on by the evolutionary process. The original rule set is used as a reference rule set in decoding the final population into the final linguistic rule base.

3) Fuzzy If–Then Rule Encoding: In our procedure, only the antecedents of the if–then rules are used in the genetic encoding. If–then rules are encoded into an integer chromosome. A chromosome is sometimes referred to as an individual of the population. The elements of each chromosome are called genes and are integer numbers. Each gene in a chromosome can be decoded to a fuzzy if–then rule. Let C be a chromosome that is a set of genes c_i, where

C = {c_1, c_2, ..., c_M}    (24)
c_i = sum_{j=1..n} d_ij * 2^(j-1)    (25)
d_ij = 1 if the jth dimension of the ith rule is used; 0 otherwise    (26)

where M is the number of fuzzy if–then rules and n is the dimension of the pattern space. The antecedent A_ij is the linguistic label in the jth dimension of the ith rule. d_ij equal to "1" means that the jth dimension of the ith rule is being used, and d_ij equal to "0" means that the jth dimension of the ith rule is not being used, i.e., don't care.

For example, suppose a FES is used in a three-class, two-dimensional problem space. The fuzzy expert system has two linguistic labels {1: low, 2: high} in each dimension. Suppose that there are four rules extracted from a trained ILFN, as follows:

Rule 1: if x_1 is low and x_2 is high, then class 1;
Rule 2: if x_1 is high and x_2 is high, then class 2;
Rule 3: if x_1 is low and x_2 is low, then class 3;
Rule 4: if x_1 is high and x_2 is low, then class 3.

That is, we have the antecedent set A = {(1, 2), (2, 2), (1, 1), (2, 1)} and the consequent set B = {1, 2, 3, 3}.

Let A' be a set of antecedents in which some features are composed of a don't care linguistic label. Let D be a binary set of 1's and 0's indicating whether or not an element of A is used. Suppose the antecedent set A is reduced to A'; then we have the corresponding binary set D, and from the antecedent set we obtain an encoded chromosome C.

4) Fuzzy If–Then Rule Decoding: Integer chromosomes are used in the genetic optimization process. After convergence of the solution, the encoded integer chromosomes are decoded back into fuzzy if–then rule bases. Given an integer chromosome, each gene is decomposed into binary format. The decoding process is the inverse of the encoding process mentioned above.

The binary decomposition is similar to the transformation from a decimal number to a binary number, but we express the resulting binary code in the opposite direction. We can directly perform the transformation by decomposing the decimal code c_i into the binary code d_i1, d_i2, ..., d_in, where each d_ij is "0" or "1" and n is the dimension. The resulting binary code is the set [d_i1, d_i2, ..., d_in]. For example, a decimal code of 10 corresponds to our binary code 0101. In our earlier example, a solution chromosome C can be decomposed to binary format in the same way; so we obtain the binary set D, and knowing the original antecedent set A, we have the reduced antecedent set A'.

Note that if a rule comprises all don't care linguistic labels in the antecedent part, then that rule can be eliminated.
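The encoding and decoding just described can be sketched as follows, under the assumption that gene c_i is the reversed-binary value of the usage bits d_i1 ... d_in; the function names are illustrative only.

def encode_rule(d_bits):
    # Encode the usage bits d_i1..d_in of one rule into an integer gene;
    # the binary code is written in the opposite direction (d_i1 is the
    # least significant bit), as in (25).
    return sum(bit << j for j, bit in enumerate(d_bits))

def decode_rule(gene, n):
    # Decompose an integer gene back into n usage bits.
    return [(gene >> j) & 1 for j in range(n)]

def reduce_antecedent(antecedent, d_bits):
    # Apply the usage bits: unused dimensions become 0 (don't care).
    return tuple(a if bit else 0 for a, bit in zip(antecedent, d_bits))

# toy usage in a 2-D problem: keep only the second feature of a rule
antecedent = (1, 2)                 # x1 is low, x2 is high
gene = encode_rule([0, 1])          # first feature unused
print(gene)                                                   # 2
print(decode_rule(gene, n=2))                                 # [0, 1]
print(reduce_antecedent(antecedent, decode_rule(gene, n=2)))  # (0, 2)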

5) Genetic Selection for the Number of Linguistic Variables: The number of linguistic variables can be varied depending on a given problem. Some problems may need more linguistic variables than others. Using more linguistic variables results in finer fuzzy partitions and better classification performance. However, to maintain the interpretability of the system, the number of linguistic variables should be as small as possible. Selecting the number of linguistic variables thus becomes a trade-off between the accuracy of the system and the interpretability of the system. Finding the point that balances accuracy and interpretability is not an easy task when selecting the number of linguistic variables. To avoid this difficulty, the numbers of linguistic labels can be selected by using the GA. The genetic selection of the numbers of linguistic labels can be processed simultaneously with the rule optimization.


The chromosome for the genetic optimization of the numbers of linguistic variables can be written as

N = {N_1, N_2, ..., N_n}    (27)

where N_j, j = 1, ..., n, is the number of linguistic variables for the jth dimension. The chromosome for optimizing the numbers of linguistic variables (N) can be combined with the chromosome for optimizing the fuzzy if–then rules (C). The combined chromosome from (24) and (27) can be written as

{N_1, ..., N_n, c_1, ..., c_M}.    (28)

6) Genetic Algorithm: When the genetic algorithm is implemented, it usually proceeds in a manner that involves the following steps:

Step 1: Initialization of the population.
Step 2: Fitness evaluation.
Step 3: Mate selection.
Step 4: Crossover.
Step 5: Mutation.
Step 6: Check the stopping criteria; if the solution meets the criteria, stop the algorithm and obtain the final if–then rules; otherwise, repeat Steps 2–6.

Initialization of the Population: A chromosome has two different groups of genes: the numbers of linguistic variables and the fuzzy if–then rules. The initial population of chromosomes is randomly selected as integer numbers in both groups. These initial individuals are reproduced into the next generation via the genetic operations: fitness evaluation, mate selection, crossover, and mutation.

Fitness Evaluation: The fitness function is based on the performance of the rules decoded from a chromosome and the compactness of the rule set. A fuzzy expert system with the decoded rules is used to evaluate the performance of the resulting rules. The fitness of a chromosome C is determined, as in (29), from the percent correct classification P_c, the structure complexity S_c, and the total number of linguistic variables S_l, weighted by w_p, w_s, and w_l, respectively, where

P_c = (Total Patterns - Wrongly Classified Patterns) / Total Patterns    (30)
S_c = sum_i sum_j d_ij    (31)
S_l = sum_j N_j    (32)

w_p is the weight of the percent correct classification by the fuzzy expert system; P_c is the percent correct classification; w_s is the weight of the number of features used by the rule set; d_ij is calculated from (26); S_c is the structure complexity of the fuzzy system, i.e., the number of features used by the rule set, i.e., the number of 1's in D; w_l is the weight of the number of linguistic variables used by the rule set; and S_l is the summation of the numbers of linguistic variables used. Preferring fewer linguistic variables, fewer rules, and fewer features with higher correct classification performance, the weight of the percentage of correctly classified patterns (w_p) is usually set to be relatively larger than the weight of the structure complexity (w_s) and the weight of the linguistic variables (w_l). w_p, w_s, and w_l are all positive numbers predefined by the user.
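A minimal sketch of this fitness computation is given below; the additive combination w_p*P_c - w_s*S_c - w_l*S_l is our assumption, since the exact form of (29) is not legible in the source.

def fitness(percent_correct, usage_bits, labels_per_dim,
            w_p=1.0, w_s=0.01, w_l=0.01):
    """Reward accuracy, penalize structural complexity.
    percent_correct: P_c, fraction of correctly classified patterns, (30);
    usage_bits: matrix of d_ij values, whose number of 1's gives S_c, (31);
    labels_per_dim: the N_j values, whose sum gives S_l, (32)."""
    s_c = sum(sum(row) for row in usage_bits)    # features actually used by the rule set
    s_l = sum(labels_per_dim)                    # total linguistic labels in use
    return w_p * percent_correct - w_s * s_c - w_l * s_l

# toy usage: 97% accuracy, three rules using five features in total, 3 + 3 labels
print(fitness(0.97, [[1, 1, 0], [1, 0, 1], [0, 0, 1]], [3, 3]))   # 0.86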

Mate Selection: There are many ways of selecting individuals for mating. One of the well-known methods is roulette wheel selection [49]. The fittest individuals usually have a higher chance to mate than the ill-fitted ones. In roulette wheel selection, the individuals are randomly selected based on probabilities derived from their fitness. The reproduction probability of the ith chromosome can be defined from the fitness function as

p_i = fitness(C_i) / sum_{k=1..N_pop} fitness(C_k)    (33)

where N_pop is the number of individuals in the population.

Crossover: After the mate selection operation, the crossover operation is performed. Crossover is a mechanism for exchanging information between two chromosomes, called parents, to reproduce two new individuals, called offspring. A crossover point is selected randomly with the crossover probability. In our problem, since a chromosome is separated into two groups, the crossover process is also separated into two parts: the crossover of the numbers of linguistic variables and the crossover of the fuzzy if–then rules. The crossovers of the two parts are independent of each other.
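Roulette-wheel mate selection with the reproduction probabilities of (33) can be realized by cumulative-sum sampling, as in the sketch below; the function name is illustrative.

import random

def roulette_select(population, fitnesses):
    # Pick one individual with probability proportional to its fitness, (33).
    total = sum(fitnesses)
    r = random.uniform(0.0, total)
    running = 0.0
    for individual, f in zip(population, fitnesses):
        running += f
        if r <= running:
            return individual
    return population[-1]    # guard against floating-point round-off

# toy usage: the fitter chromosome is returned far more often
pop = ["chromosome_A", "chromosome_B"]
fits = [0.9, 0.1]
picks = [roulette_select(pop, fits) for _ in range(1000)]
print(picks.count("chromosome_A") / 1000.0)    # roughly 0.9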

Mutation: Mutation is applied to offspring to prevent the solution from being trapped in a local minimum area. The mutation operation allows the genetic algorithm to explore new possible solutions and increases the chance of getting near the global minimum. An offspring chromosome mutates with the mutation probability applied to each gene. In the integer-coded genetic algorithm, the mutation process operates according to (34), where g is a gene selected for mutation; g' is the gene resulting from the mutation; r is a random number in [0, 1] produced by the Gaussian random generator; and h is the highest possible integer value a gene is allowed to take. The two parts of the chromosome have different h values. The highest possible value for the number of linguistic variables is set to 9 or smaller. The highest integer value for a gene of the if–then rule part is 2^n for an n-dimensional space.
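Since (34) is not legible in the source, the sketch below shows one plausible realization of integer-coded mutation consistent with the description: a selected gene is replaced by round(r * h), with r drawn uniformly here rather than from the Gaussian generator mentioned in the text, and with the rule-gene bound taken as 2^n - 1 (the largest usable bit pattern).

import random

def mutate(chromosome, gene_max, p_mutation=0.05):
    """Mutate each integer gene with probability p_mutation by drawing a new
    value in [0, gene_max[k]]; an assumed reading of (34)."""
    child = list(chromosome)
    for k in range(len(child)):
        if random.random() < p_mutation:
            r = random.random()                     # random number in [0, 1]
            child[k] = int(round(r * gene_max[k]))  # bounded by the highest value h
    return child

# toy usage: the first genes hold N_j (at most 9 labels); the rest hold rule genes
n = 2
chrom = [3, 3, 2, 1, 3, 1]
limits = [9, 9] + [2 ** n - 1] * 4
print(mutate(chrom, limits, p_mutation=0.5))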

D. Rule-to-Network Module

The rule-to-network module is used for transferring linguistic knowledge into the ILFN structure. The rule-to-network module allows an expert to incorporate his or her knowledge into the system. The rule-to-network module consists of the rule2ilfn algorithm, which is used for mapping the FES to the ILFN.

1) RULE2ILFN Algorithm: There are two phases in the rule2ilfn algorithm. Phase 1 is used when the fuzzy rules are compact, i.e., when they have don't care linguistic variables. The don't care linguistic variables need to be transformed into intermediate rules. In the transformed intermediate rules, every feature or component of the rules has at least a linguistic variable attached; otherwise, we cannot map the rules to an ILFN network. Phase 2 operates after Phase 1 has ended. In Phase 2, the parameters of the fuzzy rules are correspondingly mapped to the parameters of an ILFN network. The details of the two phases are as follows.

Phase 1: Mapping a compact rule set to an intermediate rule set.

1) Retrieve a rule.
2) Check for a feature that has a don't care linguistic label (i.e., A_ij = 0). Within the present rule, if there is a feature having a don't care linguistic label, expand every possible rule to cover the combinations of available linguistic labels.
3) Repeat 1) and 2) until there are no more rules.
4) Output the intermediate rule set.

Phase 2: Mapping an intermediate rule set to an initial ILFN network.

Set W_P, W_T, sigma, and count to be empty sets.
For rule i = 1 to M do
    For feature j = 1 to n do
        Set w_Pij = m_jk, where k = A_ij
        Set sigma_ij = sigma_jk, where k = A_ij
    Set w_Ti = B_i
    Set count_i = 1

M is the number of rules and n is the dimension of the pattern space. The parameters m_jk and sigma_jk are the mean and the standard deviation of the linguistic label indexed by A_ij; more specifically, they are taken from the membership functions' parameters in (7). After obtaining the initial ILFN network, available training data is used to refine the ILFN network. Network pruning is also needed to eliminate hidden nodes that do not have any belonging pattern. This can be done by checking the count parameter: if the count of a node is equal to one, then that node is eliminated.
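Phase 1, the expansion of don't care antecedents into fully specified intermediate rules, can be sketched as follows; the representation of a rule as an (antecedent tuple, class) pair is an assumption made for illustration.

from itertools import product

def expand_dont_cares(rules, labels_per_dim):
    """Replace every don't care entry (index 0) by all available linguistic
    labels of that dimension, producing an intermediate rule set (Phase 1)."""
    intermediate = []
    for antecedent, cls in rules:
        options = [range(1, labels_per_dim[j] + 1) if a == 0 else [a]
                   for j, a in enumerate(antecedent)]
        for combo in product(*options):
            intermediate.append((combo, cls))
    return intermediate

# toy usage: one compact rule with a don't care in the second dimension (3 labels there)
compact = [((2, 0), 1)]
print(expand_dont_cares(compact, labels_per_dim=[3, 3]))
# [((2, 1), 1), ((2, 2), 1), ((2, 3), 1)]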

E. Decision–Explanation Module

The last module in the HIS is the decision-explanation module. The decision-explanation module performs two functions: making a decision and explaining the decision. For the first function, making a decision, the decision-explanation module receives two inputs, the outputs of the low-level ILFN and of the higher level FES. The other function of the decision-explanation module is to generate a natural language explanation and conclusion for the decision made, using the knowledge base from the FES module.

Fig. 6. Hybrid system combined from ILFN and FES.

Fig. 6 shows an equivalent representation of the HIS. The decision for the class output is calculated as in (35)–(40): the class-membership values produced by the ILFN and by the FES are combined through real-valued weights, and the class with the largest combined membership value is taken as the decision. Here, C is the class vector; y_N contains the membership values from the ILFN with respect to C; y_L contains the membership values from the FES with respect to C; y_H contains the membership values from the HIS with respect to C; alpha and beta are the real-valued weights linking the ILFN and the FES, respectively, to the decision-explanation module; K is the number of classes; and class is the class decision output from the HIS. Please note that alpha and beta can be specified by the user or determined by an optimization algorithm such as the GA. In our study, we used a real-coded GA to search for possibly optimal values of alpha and beta.

For simplicity, we may set the weights to be equal across the K classes. Then we have

y_H = alpha * y_N + beta * y_L.    (41)
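The weighted combination of the two modules' class memberships and the final class decision can be sketched as below; the elementwise weighting and the argmax decision follow the reading of (35)–(41) given above and are therefore assumptions rather than the exact published formulas.

import numpy as np

def hybrid_decision(y_ilfn, y_fes, alpha, beta):
    """Combine ILFN and FES class-membership vectors with weights alpha and beta,
    and return (winning class index, combined membership vector)."""
    y_hybrid = alpha * np.asarray(y_ilfn) + beta * np.asarray(y_fes)
    return int(np.argmax(y_hybrid)) + 1, y_hybrid

# toy usage with two classes: the modules disagree and the weights break the tie
y_ilfn = np.array([0.8, 0.2])     # ILFN favors class 1
y_fes = np.array([0.3, 0.7])      # FES favors class 2
alpha, beta = np.array([0.6, 0.6]), np.array([0.4, 0.4])
print(hybrid_decision(y_ilfn, y_fes, alpha, beta))    # class 1 wins: [0.6, 0.4]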

1) Decision Boundaries: The decision boundaries of the HIS come from the weighted average of the boundaries of the ILFN and the FES. The decision boundaries of the ILFN and the decision boundaries of the FES contribute in different manners. The decision boundaries of the ILFN emphasize local areas to achieve better generalization, while the decision boundaries of the FES are kept simple for human interpretability. Since the numerical information in the ILFN and the linguistic information in the FES have complementary benefits, it is preferable to incorporate both structures into the same system. The boundaries of the hybrid system provide both accuracy and interpretability. In the HIS, the ILFN serves as a low-level numerical computation, while


Fig. 7. Hybrid decision boundaries.

the FES operates as a higher level linguistic computation. The hybrid weights (alpha and beta) play an important role in adjusting the hybrid decision boundaries. If alpha is larger than beta, the hybrid boundaries tend toward the ILFN boundaries. If alpha is smaller than beta, the hybrid boundaries tend toward the FES boundaries. Fig. 7 shows the hybrid decision boundaries of the combined ILFN and FES.

2) Conflict and Conflict Resolution Between the Low Level and the Higher Level: Ideally, there should be no conflict between low-level and high-level decisions in the HIS if there is a one-to-one mapping between them. Since a combination of grid-based partitioning and projection is used in the mapping process, decisions from the ILFN and the FES may conflict with each other. The diagram in Fig. 8 shows the possible conflicting decisions between the ILFN and the FES. The system is in conflict if the decision from the ILFN is correct but the decision from the FES is wrong, or if the decision from the ILFN is wrong but the decision from the FES is correct. The system is not in conflict if the two systems make the same decision. It is preferable that the two systems do not conflict and both make correct decisions.

It is feasible to resolve the conflict between the two systems by forcing their decision boundaries to be as close to the HIS decision boundaries as possible. Conflict resolution for the HIS is then the determination of the elements of alpha and beta. As a matter of fact, this becomes an ordinary optimization problem, which can be solved by using any optimization method. Due to the advantages of not requiring a derivative calculation and being unlikely to be trapped in local minima, the GA can be adopted for this purpose. The GA searches for weights alpha and beta that adapt the decision boundaries of the two modules to be closer together. The hybrid decision finally will be forced toward the diagonal path indicated as a dashed line in Fig. 8, which ideally shows that the ILFN and the FES are not conflicting and both make correct decisions.

F. Incremental Learning Characteristic of the Proposed HIS

An incremental learning system updates its knowledge without retraining on old data. Only new data is needed in the learning process. This concept has been studied by many researchers (see [35]–[38]). In a hybrid structure between a low level and a higher level, such as in numerical and symbolic or numerical and linguistic systems, it is preferable to incorporate the incremental feature into the system. It is important in applications such as control and monitoring processes, as well as in medical diagnosis, to employ an incremental learning aspect. New knowledge needs to be captured in real time without spending tremendous time to learn all the old data along with the new data.

Fig. 8. Conflict decision between the ILFN and the FES.

Since the proposed HIS incorporates the ILFN, which is an incremental learning architecture, it is easy to employ its incremental learning capability. The ILFN can learn all patterns within only one pass. While operating, the ILFN detects new, unseen class prototypes. If new knowledge is found, the new knowledge is added in the hidden units without destroying the old knowledge. Extending the incremental learning feature to the higher level linguistic model is straightforward. New linguistic rules can be directly extracted from the new hidden nodes of the ILFN by using the ilfn2rule algorithm in the network-to-rule module. An algorithm for checking conflicts is run to maintain consistency between the two levels. Similarly, if the higher level acquires new knowledge, i.e., linguistic rules that may come from an expert or experienced users, the new knowledge needs to be mapped to the ILFN structure as well. This can be done by using the rule2ilfn algorithm in the rule-to-network module.

III. SIMULATION RESULTS

To demonstrate the performance of the HIS, computer simulations were used in our study. Simulations and analysis of the HIS were performed using a well-known benchmark data set, namely the Wisconsin breast cancer database (WBCD) [50], as an example application in medical diagnosis. The WBCD is a real application that has been used by many researchers [12], [23], [26], [29].

The WBCD contains a collection of 699 patterns, each described by nine features. Each feature is a real number in the interval one to ten based on a fine needle aspirate taken directly from human breasts: clump thickness, size uniformity, shape uniformity, marginal adhesion, cell size, bare nuclei, bland chromatin, normal nucleoli, and mitosis. The larger the values of these attributes are, the greater the likelihood of malignancy will be. There are 458 patterns for benign (labeled as "2" in the database) and 241 patterns for malignant (labeled as "4"). There are 16 patterns with incomplete feature descriptions marked as "?" [50], [51]. We replaced the missing values with "0."

TABLE I
SIMULATION RESULTS FOR THE WBCD
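The preprocessing just described (nine integer-valued features, class labels 2/4, and missing entries marked "?" replaced by 0) could be realized as in the sketch below; the file name and column layout are assumptions about the standard UCI distribution of the data, not details given in the paper.

import csv
import numpy as np

def load_wbcd(path="breast-cancer-wisconsin.data"):
    """Load the Wisconsin breast cancer data: sample id, nine features, class label.
    Missing feature values marked '?' are replaced by 0, as in the paper."""
    features, labels = [], []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) != 11:
                continue                        # skip malformed lines
            values = [0 if v == "?" else int(v) for v in row[1:10]]
            features.append(values)
            labels.append(int(row[10]))         # 2 = benign, 4 = malignant
    return np.array(features), np.array(labels)

# usage sketch (assumes the UCI file is available locally):
# X, y = load_wbcd()
# print(X.shape)    # expected (699, 9), with 458 benign and 241 malignant patterns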

A. Simulation Results for the WBCD

Ten simulations were performed to evaluate the proposed method. In every simulation, the ILFN learning parameters, the threshold and the standard deviation, were set to their default values. The numerical weights of the ILFN network were extracted into initial fuzzy linguistic rules. In order to optimize the linguistic rules, the GA with the integer chromosome representation was used, with its learning parameters (population size, number of generations, mutation probability, and crossover probability) set heuristically, together with the weights w_p, w_s, and w_l in the fitness evaluation. The number of linguistic labels was constrained to at most three for each dimension. The GA with a real chromosome representation was also used to find the weighting parameters alpha and beta, with its population size, number of generations, mutation probability, and crossover probability set similarly. The results from the ten simulations are shown in Table I.

From Table I, the ILFN achieved an average correct classification of 96.17% on the training set and 97.37% on the test set. The fuzzy rules extracted from the trained ILFN achieved an average correct classification of 97.43% on the training set and 96.72% on the test set. It is worth noting that the fuzzy rules extracted from the trained ILFN achieved a higher classification rate on the training set but a lower percentage of correctly classified patterns on the test set. When we combined the ILFN and the extracted fuzzy rules to construct a HIS, the results show that the HIS achieved a higher classification rate than both


TABLE II
ILFN PARAMETERS FOR THE WBCD

TABLE III
RESULTING LINGUISTIC LABELS AND THEIR PARAMETERS FOR THE WBCD

the ILFN and the extracted fuzzy rules alone. The proposed HIS had an average of 97.61% and 97.48% correct classification on the training set and the test set, respectively.

Due to space limitations, we show the details of the numerical weights of the ILFN and the extracted linguistic rules from one example based on the best classification performance of the HIS, i.e., from run number 3 in Table I. Based on run number 3, the details of the ILFN and the linguistic rules extracted from it are shown in Tables II–IV.

In run number 3, we used 100 patterns for training the ILFN, 341 patterns for training the FES and the HIS, and 342 patterns for testing all three systems. The ILFN constructed three hidden nodes with the parameters shown in Table II. The ILFN network achieved 98% and 97.95% correct classification on the training set and the test set, respectively.

From Table II, the knowledge embedded in the trained ILFN is in numerical form. Linguistic rules are preferably extracted from the trained ILFN for reasoning purposes. The fuzzy linguistic rules are mapped from the ILFN parameters, and the GA is used to select only discriminatory features. This results in a more compact rule set.

After running for 100 generations, the resulting linguistic labels and their parameters are shown in Table III, and the fuzzy linguistic rules are shown in Table IV. Using the linguistic knowledge from Tables III and IV as the rule set for a fuzzy expert system, the final fuzzy linguistic rules achieved 97.95% and 96.49% correct classification for the training and testing data, respectively. The hybrid intelligent system, combining the decisions from both the ILFN and the FES, achieved a 98% correct classification rate for the training set and a 98.25% correct classification rate for the test set. The HIS achieved 98.13% on all 683 patterns of the WBCD. The fuzzy expert rules in natural language for the WBCD can be interpreted as

Rule 1: If Clump Thickness is low and Size Uniformity is low and Marginal Adhesion is low and Cell Size is low and Normal Nucleoli is low and Mitosis is low, then Malignant, with an associated confidence;

Rule 2: If Clump Thickness is high and Shape Uniformity is high and Mitosis is high, then Benign, with an associated confidence;

Rule 3: If Clump Thickness is high and Bland Chromatin is medium and Mitosis is low, then Benign, with an associated confidence.
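To make the use of such rules concrete, the following sketch shows one simple way a rule of this form could be fired with min-composition. The triangular label definitions and the confidence factor are placeholders of our own; the actual membership parameters are those listed in Table III, and the confidence values are not reproduced here.

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Placeholder label definitions on the 1-10 WBCD feature scale; the actual
# parameters are those listed in Table III.
LABELS = {
    "low":    (0.0, 1.0, 5.0),
    "medium": (3.0, 5.5, 8.0),
    "high":   (6.0, 10.0, 11.0),
}

def fire_rule(antecedents, pattern, confidence=1.0):
    """Degree of firing of one rule: the minimum antecedent membership,
    scaled by the rule's confidence factor (an assumed value here)."""
    degree = min(tri(pattern[feature], *LABELS[label])
                 for feature, label in antecedents)
    return degree * confidence

# Example: firing a Rule-2-like antecedent set on one hypothetical pattern.
rule2 = [("clump_thickness", "high"), ("shape_uniformity", "high"), ("mitosis", "high")]
pattern = {"clump_thickness": 8, "shape_uniformity": 7, "mitosis": 9}
print(fire_rule(rule2, pattern))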

B. Comparison Results Among Other Methods

Several groups of researchers have studied and developed knowledge-based systems for the WBCD. Peña-Reyes and Sipper used a fuzzy if–then system as a classifier. They developed a fuzzy-GA algorithm to extract rules from the WBCD; the fuzzy-GA algorithm uses the GA to search for two parameters of their fuzzy rules [12]. The number of rules has to be predetermined in an ad hoc manner. In [23], Setiono developed a rule extraction algorithm called NeuroRule. NeuroRule uses a pruning procedure after the training phase to decrease the number of network connections; the pruning process runs until the network performance drops to a 95% correct classification rate. In [23], 100 MLP networks were used in the training phase, and the network with the highest performance out of the 100 pruned networks was used in the rule-extraction phase. NeuroRule extracts rules by clustering the hidden-node activation values; the input combinations are then checked to determine which inputs make the hidden nodes and the output node active. An improvement of NeuroRule on the WBCD was studied by the same author in [29] by preprocessing the data before the training step. Another group was Taha and Ghosh [26]. In [26], three rule extraction algorithms were developed: BIO-RE, Partial-RE, and Full-RE. BIO-RE is a black-box rule extraction technique that does not require information about the internal network structure to generate rules. Partial-RE searches for a set of incoming connections that will cause a unit to be active. Full-RE decomposes the rule extraction process into two steps: rules between hidden and output units, and rules between input units and hidden units. This is similar to NeuroRule [29], but the difference is that Full-RE employs linear programming and an input discretization method to find a combination of input values that will cause a hidden unit to be active. Comparison results on the WBCD among several rule-based systems from [12], [23], [26], [29] are shown in Table V.

TABLE IV: FUZZY EXPERT RULES FOR THE WISCONSIN BREAST CANCER DATA

TABLE V: COMPARISON RESULTS FOR THE WBCD AMONG WELL-KNOWN METHODS (* features per rule and features per node)

From Table V, the best performance was from NeuroRule [29], with five rules plus a default rule extracted from one of the 100 pruned networks with two hidden units and nine connections. The accuracy rate was 98.24% on the 683 patterns. The rule set extracted in [29] is as follows:


If … and … and …, then benign;
Else if … and … and … and …, then benign;
Else if … and … and … and …, then benign;
Else if … and … and … and …, then benign;
Else if … and … and … and … and …, then benign;
Else malignant.

NeuroRule does not produce any rule for malignancy; it needs a default rule for malignancy. Fuzzy-GA [12] extracted rules based on a predetermined number of rules in the range of one to five. A total of 120 evolutionary runs were performed. The highest performing system achieved a 97.80% correct classification rate using three fuzzy if–then rules with 4.7 conditions per rule, plus a default rule. In [26], using the BIO-RE algorithm, the best performing system achieved 96.96% using 11 Boolean rules with 2.7 conditions per rule. Using the Full-RE algorithm, the best performance was 96.19% with five rules and 1.8 conditions per rule (no default rule). NeuroRule [23], [29], fuzzy-GA [12], BIO-RE [26], and Partial-RE [26] all rely on a default rule, which means they lack completeness. Default rules do not provide a symbolic interpretation of the decision other than "because none of the above occurred" [26].

Based on ten runs, our proposed HIS achieved 98.13% correct classification on all 683 patterns. The HIS used an ILFN with three hidden nodes and a FES with three fuzzy if–then rules and 2.7 conditions per rule. An advantage of the proposed HIS is that it incorporates an incremental learning characteristic in the system. Since new data can become available on a daily basis, using the proposed HIS the novel data can be added into the system quickly without spending much time retraining on all the old information.

C. Study on Detecting Novel Data

To provide some simulation results on how the proposed HIS performs when novel data are added into the system, we set up an experiment using the Fisher iris flower data [51] as a test benchmark. Fisher's iris flower data set consists of 150 patterns with four features: sepal length, sepal width, petal length, and petal width. These four features describe the shape and size of the iris flowers. Each pattern in the data set falls into one of three classes, Setosa, Versicolor, and Virginica, with a total of 50 patterns per class.

First, we trained the system by using only two classes of flowers, i.e., Setosa and Versicolor. The Virginica class was then used to test the system. The objective was to see whether the system could detect the novel data.

The ILFN parameters were obtained by training with the Setosa class (numbered as "1") and the Versicolor class (numbered as "2").

TABLE VI: CLASS 2, VERSICOLOR

TABLE VII: CLASS 3, VIRGINICA

The corresponding fuzzy rules were obtained in matrix form.

An alternative representation can be written in natural language form:

Rule 1: If sepal length is medium and sepal width is high and petal length is low and petal width is low, then the flower is Setosa.

Rule 2: If petal width is medium, then the flower is Versicolor.

We presented the following data (listed in Table VI) to the system.

Since the data were all of the same class, i.e., Versicolor, the system simply classified them as Versicolor.

Next, we presented another set of unseen data, the Virginica class, to the system. The data are listed in Table VII.

Because the data were novel, the ILFN and the FES generated additional information as follows.

Since the data were all of the same class, i.e., Virginica, as expected, only one new node was added to the weights of the ILFN. In addition, a new fuzzy rule was added to the fuzzy rule base, i.e.,

Rule 3: If sepal length is high and sepal width is high and petal length is high and petal width is high, then the flower is Virginica.

The results have confirmed that the hybrid system is able to detect new information.
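The behavior just described can be pictured with the short sketch below. It is not the ILFN update law itself: the Euclidean distance test, the vigilance threshold, and the default spread are assumptions used only to illustrate how an unfamiliar pattern triggers the creation of a new prototype node (and, in the full system, a matching fuzzy rule).

import math

def present_pattern(x, label, centers, sigmas, classes,
                    vigilance=2.0, default_sigma=0.5):
    """Illustrative incremental step: if x lies farther than `vigilance`
    from every stored prototype, a new node is created; otherwise the
    nearest existing node answers.  Returns (predicted_class, is_novel)."""
    if centers:
        distances = [math.dist(x, c) for c in centers]
        nearest = min(range(len(distances)), key=distances.__getitem__)
        if distances[nearest] <= vigilance:
            return classes[nearest], False     # familiar data: reuse the node
    # Novel data: add a prototype node (a new fuzzy rule would follow).
    centers.append(list(x))
    sigmas.append([default_sigma] * len(x))
    classes.append(label)
    return label, True

# Example: a network holding Setosa-like and Versicolor-like prototypes
# meets a Virginica-like measurement and allocates a third node.
centers = [[5.0, 3.4, 1.5, 0.2], [5.9, 2.8, 4.3, 1.3]]
sigmas = [[0.4] * 4, [0.4] * 4]
classes = ["Setosa", "Versicolor"]
print(present_pattern([7.9, 3.8, 6.4, 2.0], "Virginica", centers, sigmas, classes))

Presenting familiar Versicolor measurements instead would fall within the vigilance of the second prototype and simply be classified as Versicolor, mirroring the behavior reported above.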

IV. CONCLUSION

The combination of a trained ILFN network and a FES into a unified structure results in a HIS useful for decision-making frameworks. The proposed HIS offers the mutually complementary advantages of an ILFN network and a FES. This system can be useful for complex real-world applications, in particular medical diagnosis, where different processing strategies have to be supported.

A mapping mechanism from high-level linguistic knowledge to the low-level ILFN network is provided to continuously maintain consistency between the low-level and higher level modules. This allows an expert to add or revise linguistic rules in the system. New knowledge is then mapped back to the ILFN structure, allowing the ILFN network to update its parameters.
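A rough sketch of that reverse direction is given below; it assumes each linguistic label carries a center and a width, and simply turns an expert-supplied rule into one prototype node. The paper's actual mapping mechanism is not reproduced here, so the representation chosen is an assumption.

def rule2ilfn_sketch(rule, label_params):
    """Illustrative reverse mapping: an expert rule with linguistic
    antecedents becomes one ILFN prototype whose mean collects the label
    centers and whose spread collects the label widths (assumed form)."""
    center, sigma = [], []
    for _feature, label in rule["if"]:
        c, width = label_params[label]
        center.append(c)
        sigma.append(width)
    return {"center": center, "sigma": sigma, "class": rule["then"]}

# Hypothetical label parameters (center, width) and an expert-added rule.
label_params = {"low": (2.0, 1.5), "medium": (5.5, 1.5), "high": (9.0, 1.5)}
expert_rule = {"if": [("petal_length", "high"), ("petal_width", "high")],
               "then": "Virginica"}
print(rule2ilfn_sketch(expert_rule, label_params))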

Fuzzy rules in the FES are generated directly from the ILFN network using the "ilfn2rule" algorithm. After using the ilfn2rule algorithm to map the ILFN numerical variables to linguistic variables, the genetic algorithm is used to improve the rule set. Based on the initial rules extracted, the number of rules and the features of the FES are optimized by the GA. A compact FES with only the essential discriminatory features is obtained.

The trained ILFN and the optimized FES are combined into a hybrid system. It is found that combining the optimized rule base with the trained ILFN achieves better classification results on both training and testing patterns. It is also found that some rules in the original rule set extracted from the trained ILFN may conflict with each other; however, after using the genetic algorithm to refine the rules and features, the rule conflicts can be resolved.

The resulting knowledge from the proposed rule extraction procedure is represented in "if-then" linguistic form that is easily comprehensible to human users. By integrating the FES and the ILFN, explanations and answers can be easily generated when needed while numerical accuracy is maintained.

Computer simulations using the well-known Wisconsin breast cancer database were performed. The low-level ILFN has only a few hidden nodes, and the extracted higher level linguistic model has a very small number of rules. The HIS, which combines the trained ILFN and the fuzzy linguistic rules, achieved very good classification performance compared with the standalone systems as well as with other rule-based methods.

REFERENCES

[1] L. P. Seka, A. Fresnel, D. Delamarre, C. Riou, A. Burgun, B. Pouliquen, R. Duvauferrier, and P. Le Beux, "Computer assisted medical diagnosis using the web," Int. J. Med. Inform., vol. 47, no. 1-2, pp. 51–56, 1997.
[2] F. M. H. M. Dupuits, A. Hasman, and P. Pop, "Computer-based assistance in family medicine," Comput. Methods Programs Biomed., vol. 55, no. 1, pp. 39–50, 1998.
[3] L. Makris, I. Kamilatos, E. V. Kopsacheilis, and M. G. Strintzis, "Teleworks: A CSCW application for remote medical diagnosis support and teleconsultation," IEEE Trans. Inform. Technol. Biomed., vol. 2, pp. 62–73, June 1998.
[4] D. Conforti and L. D. Luca, "Computer implementation of a medical diagnosis problem by patterns classification," Future Gener. Comput. Syst., vol. 15, no. 2, pp. 287–292, 1999.
[5] D. J. Foran, D. Comaniciu, P. Meer, and L. A. Goodell, "Computer-assisted discrimination among malignant lymphomas and leukemia using immunophenotyping, intelligent image repositories, and telemicroscopy," IEEE Trans. Inform. Technol. Biomed., vol. 4, pp. 265–273, Dec. 2000.
[6] K. P. Adlassnig, "The section on medical expert and knowledge-based systems at the department of medical computer sciences of the University of Vienna medical school," Artif. Intell. Med., vol. 21, no. 1-3, pp. 139–146, 2001.
[7] G. P. K. Economou, D. Lymberopoulos, E. Karvatselou, and C. Chassomeris, "A new concept toward computer-aided medical diagnosis–A prototype implementation addressing pulmonary diseases," IEEE Trans. Inform. Technol. Biomed., vol. 5, pp. 55–66, Mar. 2001.
[8] B. G. Buchanan and E. H. Shortliffe, Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Reading, MA: Addison-Wesley, 1984.
[9] W. A. J. J. Wiegerinck, H. J. Kappen, E. W. M. T. ter Braak, W. J. P. P. ter Burg, M. J. Nijman, Y. L. O., and J. P. Neijt, "Approximate inference for medical diagnosis," Pattern Recognit. Lett., vol. 20, no. 11-13, pp. 1231–1239, 1999.
[10] B. Kovalerchuk, E. Vityaev, and J. F. Ruiz, "Consistent knowledge discovery in medical diagnosis," IEEE Eng. Med. Biol., vol. 19, pp. 26–37, July/Aug. 2000.
[11] L. Yan and D. J. Miller, "General statistical inference for discrete and mixed spaces by an approximate application of the maximum entropy principle," IEEE Trans. Neural Networks, vol. 11, pp. 558–573, May 2000.
[12] C. A. Peña-Reyes and M. Sipper, "Designing breast cancer diagnostic via a hybrid fuzzy-genetic methodology," Proc. IEEE Int. Fuzzy Systems Conf., pp. 135–139, 1999.
[13] D. Walter and C. K. Mohan, "ClaDia: A fuzzy classifier system for disease diagnosis," in Proc. Congress Evolutionary Computation, La Jolla, CA, 2000, pp. 1429–1435.
[14] S. Zahan, "A fuzzy approach to computer-assisted myocardial ischemia diagnosis," Artif. Intell. Med., vol. 21, no. 1-3, pp. 271–275, 2001.
[15] N. Belacel, P. Vincke, J. M. Scheiff, and M. R. Boulassel, "Acute leukemia diagnosis aid using multicriteria fuzzy assignment methodology," Comput. Methods Programs Biomed., vol. 64, no. 2, pp. 145–151, 2001.
[16] A. Durg, W. V. Stoecker, J. P. Cookson, S. E. Umbaugh, and R. H. Moss, "Identification of variegated coloring in skin tumors: Neural network vs. rule-based induction methods," IEEE Eng. Med. Biol. Mag., vol. 12, pp. 71–74, 98, Sept. 1993.
[17] C. S. Pattichis, C. N. Schizas, and L. T. Middleton, "Neural network models in EMG diagnosis," IEEE Trans. Biomed. Eng., vol. 42, pp. 486–496, May 1995.
[18] X. Yao and Y. Liu, "A new evolutionary system for evolving artificial neural networks," IEEE Trans. Neural Networks, vol. 8, pp. 694–713, May 1997.
[19] D. West and V. West, "Model selection for a medical diagnostic decision support system: A breast cancer detection case," Artif. Intell. Med., vol. 20, no. 3, pp. 183–204, 2000.
[20] V. Podgorelec, P. Kokol, and J. Zavrsnik, "Medical diagnosis prediction using genetic programming," Proc. 12th IEEE Symp. Computer-Based Medical Systems, pp. 202–207, 1999.
[21] L. W. Man, L. Wai, S. L. Kwong, S. N. Po, and J. C. Y. Cheng, "Discovering knowledge from medical databases using evolutionary algorithms," IEEE Eng. Med. Biol., vol. 19, pp. 45–55, July/Aug. 2000.
[22] C. A. Peña-Reyes and M. Sipper, "Evolutionary computation in medicine: An overview," Artif. Intell. Med., vol. 19, no. 1, pp. 1–23, 2000.
[23] R. Setiono and H. Liu, "Symbolic representation of neural networks," IEEE Comput., vol. 29, pp. 71–77, Mar. 1996.
[24] A. H. Tan, "Cascade ARTMAP: Integrating neural computation and symbolic knowledge processing," IEEE Trans. Neural Networks, vol. 8, pp. 237–250, Mar. 1997.
[25] A. B. Tickle, R. Andrews, M. Golea, and J. Diederich, "The truth will come to light: Directions and challenges in extracting the knowledge embedded within trained artificial neural networks," IEEE Trans. Neural Networks, vol. 9, pp. 1057–1068, Nov. 1998.
[26] I. A. Taha and J. Ghosh, "Symbolic interpretation of artificial neural networks," IEEE Trans. Knowl. Data Eng., vol. 11, pp. 448–463, May 1999.
[27] P. Tino and M. Koteles, "Extracting finite-state representations from recurrent neural networks trained on chaotic symbolic sequences," IEEE Trans. Neural Networks, vol. 10, pp. 284–302, Mar. 1999.
[28] S. Mitra and Y. Hayashi, "Neuro-fuzzy rule generation: Survey in soft computing framework," IEEE Trans. Neural Networks, vol. 11, pp. 748–768, May 2000.
[29] R. Setiono, "Generating concise and accurate classification rules for breast cancer diagnosis," Artif. Intell. Med., vol. 18, no. 3, pp. 205–219, 2000.
[30] Y. Hayashi, "A comparison between two neural network rule extraction techniques for the diagnosis of hepatobiliary disorders," Artif. Intell. Med., vol. 20, no. 3, pp. 205–216, 2000.
[31] S. I. Gallant, Neural Network Learning and Expert Systems. Cambridge, MA: MIT Press, 1993.
[32] L. R. Medsker, Hybrid Neural Network and Expert Systems. Norwell, MA: Kluwer, 1994.
[33] S. Goonatilake and S. Khebbal, Intelligent Hybrid Systems. New York: Wiley, 1995.
[34] S. Wermter and R. Sun, "An overview of hybrid neural systems," in Hybrid Neural Systems, S. Wermter and R. Sun, Eds. Berlin, Germany: Springer-Verlag, 2000, pp. 1–13.
[35] L. M. Fu, H. H. Hsu, and J. C. Principe, "Incremental backpropagation learning networks," IEEE Trans. Neural Networks, vol. 7, pp. 757–761, May 1996.
[36] G. A. Carpenter, S. Grossberg, N. Markuzon, J. H. Reynolds, and D. B. Rosen, "Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps," IEEE Trans. Neural Networks, vol. 3, pp. 698–713, Sept. 1992.
[37] G. Yen and P. Meesad, "Pattern classification by an incremental learning fuzzy neural network," in Proc. Int. Joint Conf. Neural Networks, Washington, DC, 1999, pp. 3230–3235.
[38] G. Yen and P. Meesad, "An effective neuro-fuzzy paradigm for machinery condition health monitoring," IEEE Trans. Syst., Man, Cybern. B, vol. 31, pp. 523–536, Aug. 2001.
[39] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its application to modeling and control," IEEE Trans. Syst., Man, Cybern., vol. SMC-15, pp. 116–132, 1985.
[40] J. H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis With Applications to Biology, Control, and Artificial Intelligence. Ann Arbor, MI: Univ. Michigan Press, 1975.
[41] L. A. Zadeh, "Fuzzy sets," Inf. Control, vol. 8, pp. 338–353, 1965.
[42] Y. Jin, "Fuzzy modeling of high-dimensional systems: Complexity reduction and interpretability improvement," IEEE Trans. Fuzzy Syst., vol. 8, pp. 212–221, Apr. 2000.
[43] M. Setnes, R. Babuska, and H. B. Verbruggen, "Rule-based modeling: Precision and transparency," IEEE Trans. Syst., Man, Cybern. C, vol. 28, pp. 165–169, Feb. 1998.
[44] H. Ishibuchi, T. Nakashima, and T. Murata, "Performance evaluation of fuzzy classifier systems for multidimensional pattern classification problems," IEEE Trans. Syst., Man, Cybern. B, vol. 29, pp. 601–618, Oct. 1999.
[45] Y. Jin, W. V. Seelen, and B. Sendhoff, "On generating FC3 fuzzy rule systems from data using evolution strategies," IEEE Trans. Syst., Man, Cybern. B, vol. 29, pp. 829–845, Dec. 1999.
[46] O. Cordon and F. Herrera, "A proposal for improving the accuracy of linguistic modeling," IEEE Trans. Fuzzy Syst., vol. 8, pp. 335–344, 2000.
[47] L. X. Wang and J. M. Mendel, "Generating fuzzy rules by learning from examples," IEEE Trans. Syst., Man, Cybern., vol. 22, pp. 1414–1427, Dec. 1992.
[48] M. Setnes, R. Babuska, U. Kaymak, and H. R. van Nauta Lemke, "Similarity measures in fuzzy rule base simplification," IEEE Trans. Syst., Man, Cybern. B, vol. 28, pp. 376–386, 1998.
[49] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley, 1989.
[50] W. H. Wolberg and O. L. Mangasarian, "Multisurface method of pattern separation for medical diagnosis applied to breast cytology," Proc. Nat. Acad. Sci., vol. 87, pp. 9193–9196, 1990.
[51] C. L. Blake and C. J. Merz. (1998) UCI Repository of Machine Learning Databases. Univ. California, Irvine, CA. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html

Phayung Meesad (S'97) received the B.S. degree in electrical engineering from King Mongkut's Institute of Technology North Bangkok, Thailand, in 1994, and the M.S. and Ph.D. degrees in electrical engineering from Oklahoma State University, Stillwater, in 1998 and 2002, respectively.

He is currently the Associate Dean at the King Mongkut's Institute of Technology. His research interests include neural networks, fuzzy systems, fuzzy neural networks, genetic algorithms, pattern classification, control systems, signal processing, and machine health monitoring.

Gary G. Yen (S'87–M'88–SM'97) received the Ph.D. degree in electrical and computer engineering from the University of Notre Dame, Notre Dame, IN, in 1992.

He is currently an Associate Professor in the School of Electrical and Computer Engineering, Oklahoma State University (OSU), Stillwater, OK. Before he joined OSU in 1997, he was with the Structure Control Division, U.S. Air Force Research Laboratory, Albuquerque, NM. His research is supported by the DoD, DoE, EPA, NASA, NSF, and the process industry. His research interests include intelligent control, computational intelligence, conditional health monitoring, signal processing, and their industrial/defense applications.

Dr. Yen was an Associate Editor of the IEEE TRANSACTIONS ON NEURAL NETWORKS and IEEE CONTROL SYSTEMS MAGAZINE from 1994 to 1999. He is currently serving as an Associate Editor for the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, the IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, and the IFAC journal Automatica. He served as Program Chair for the 2000 IEEE Conference on Control Applications held in Anchorage, AK, and is the General Chair for the 2003 IEEE International Symposium on Intelligent Control to be held in Houston, TX. He is currently serving as Chair of the IEEE Neural Networks Society Neural Network Technical Committee.