Sixth International Symposium on Methodologies for Intelligent Systems, Charlotte, NC, October 16-19, 1991

USING GENETIC ALGORITHMS TO IMPROVE THE PERFORMANCE OF CLASSIFICATION RULES PRODUCED BY SYMBOLIC INDUCTIVE METHODS
Jerzy Bala, Kenneth De Jong, Peter Pachowicz
Center for Artificial Intelligence, George Mason University
4400 University Drive, Fairfax, VA 22030
Abstract: In this paper we present a novel way of combining symbolic inductive methods and genetic algorithms (GAs) to produce high-performance classification rules. The presented method consists of two phases. In the first, the algorithm inductively learns a set of classification rules from noisy input examples. In the second phase, the worst performing rule is optimized by GA techniques. Experimental results are presented for twelve classes of noisy data obtained from textured images.
1 Introduction
One fundamental weakness of inductive learning (except for special cases) is the fact that the acquired knowledge cannot be validated. Traditional inquiries into inductive inference have therefore dealt with questions of what are the best criteria for guiding the selection of inductive assertions, and how these assertions can be confirmed. The goal of inference is to formulate plausible general assertions that explain the given facts and are able to predict new facts. For a given set of facts, a potentially infinite number of hypotheses can be generated that imply these facts. A preference criterion is therefore necessary to provide constraints and reduce the infinite choice to one hypothesis or a few more preferable ones. A typical way of defining such a criterion is to specify the preferable properties of the hypothesis, for example to require that the hypothesis be the shortest or the most economical description consistent with all the facts. Even if the preference criterion is defined, other problems can arise, e.g., performance on future data. This is especially important if the initial data contains some distribution of noisy and irrelevant examples.
We propose a novel hybrid approach in which first a classic rule induction system (AQ) produces short, simple, complete, and consistent rules, and then GAs are used to improve the performance of the rules via an evolutionary mechanism. By dropping the requirement that a rule be complete and consistent with respect to the initial learning data, we reduce the coverage of noisy and irrelevant components in the rule structure. Since the AQ learning algorithm is the inductive component of our method, the next section presents a short overview of it. Section 3 presents the GA phase of our method, and the experimental results are presented in section 4.
2 AQ learning algorithm
The AQ algorithm [Michalski et al 1986] learns attributional descriptions from examples. When building a decision rule, AQ performs a heuristic search through a space of logical expressions to determine those that account for all positive examples and no negative examples. Because there are usually many such complete and consistent expressions, the goal of AQ is to find the most preferred one according to some criterion. Learning examples are given in the form of events, which are vectors of attribute values. Events represent different decision classes. For each class a decision rule is produced that covers all examples of that class and none of the other classes. The concept descriptions learned by the AQ algorithm are represented in VL1, the Variable-Valued Logic System 1 [Michalski 1972], which is used to express attributional concept descriptions. A description of a concept is a disjunctive normal form, which is called a cover. A cover is a disjunction of complexes. A complex is a conjunction of selectors. A selector is of the form

[L rel R]   (1)

where L, called the referee, is an attribute; R, called the referent, is a set of values in the domain of the attribute; and rel is one of the following relational symbols: =, <, >, >=, <=, <>.
The following is an example of an AQ complex with five selectors (equality is used as the relational symbol):

[x1=1..3][x2=1][x4=0][x6=1..7][x8=1]
Dots in the description represent a range of possible values of a given attribute. An example of a simple AQ description with four complexes is presented below:

1. [x1=7..8] [x2=8..19] [x3=8..13] [x5=4..54] [x6=0..3]
2. [x1=15..54] [x3=11..14] [x4=14..17] [x6=0..9] [x7=0..11]
3. [x1=9..18] [x3=16..21] [x4=9..10]
4. [x1=10..14] [x3=13..16] [x4=14..54] [x7=4..5]
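As an illustration, a cover in this representation can be matched strictly against an event in a few lines of code. This is a hypothetical sketch: the data layout and function names are ours, not part of the AQ implementation.

```python
# Hypothetical layout: a selector maps an attribute index to an inclusive
# value range; a complex is a dict of selectors (a conjunction); a cover is
# a list of complexes (a disjunction). Attributes absent from a complex are
# unconstrained.

def complex_matches(event, cpx):
    """True only if every selector of the complex covers the event's value."""
    return all(lo <= event[attr] <= hi for attr, (lo, hi) in cpx.items())

def cover_matches(event, cover):
    """True if any complex of the cover matches the event."""
    return any(complex_matches(event, cpx) for cpx in cover)

# Third complex of the example above: [x1=9..18] [x3=16..21] [x4=9..10]
# (attribute indices are zero-based: x1 -> 0, x3 -> 2, x4 -> 3).
cpx3 = {0: (9, 18), 2: (16, 21), 3: (9, 10)}
event = [12, 7, 18, 9, 0, 0, 0, 0]   # an 8-attribute event vector
print(cover_matches(event, [cpx3]))  # True: all three selectors cover it
```

Strict matching like this is what the completeness and consistency conditions of section 3 refer to; the flexible matching used for recognition is described in section 3.2.1.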
3 Genetic Algorithms Phase
The rules generated by the AQ algorithm have the property of being complete and consistent with respect to the learning examples. These properties mean that each class description covers its learning examples (positive examples) and does not cover any negative examples (negative examples are all examples belonging to other classes). When the AQ algorithm is applied to finer-grained and noisy symbolic representations, the consistency and completeness conditions overconstrain the generated class description. This in turn leads to poorer predictive accuracy on future data. Conversely, by relaxing the requirements of consistency and completeness, predictive accuracy in noisy domains can be improved. In the proposed method we use GAs to improve the performance of initially consistent and complete rules via an evolutionary mechanism.
A genetic algorithm [De Jong 1988] maintains a constant-sized population of candidate solutions, known as individuals. At each iteration, each individual is evaluated and recombined with others on the basis of its overall quality, or fitness. The expected number of times an individual is selected for recombination is proportional to its fitness relative to the rest of the population. The power of a genetic algorithm lies in its ability to exploit, in a highly efficient manner, information about a large number of individuals. By allocating more reproductive occurrences to above-average individuals, the overall effect is an increase in the population's average fitness. New individuals are created using two main genetic recombination operators, known as crossover and mutation. Crossover operates by selecting a random location in the genetic string of the parents (the crossover point) and concatenating the initial segment of one parent with the final segment of the other to create a new child. A second child is simultaneously generated using the remaining segments of the two parents. Mutation provides for occasional disturbances in the crossover operation to ensure diversity in the genetic strings over long periods of time and to prevent stagnation in the convergence of the optimization technique. In order to apply GAs, one needs to choose a representation and define the operators and the performance measure. In traditional GAs, individuals in the population are typically represented using a binary string notation to promote efficiency and application independence in the genetic operations. The mathematical analysis of GAs shows that they work best when the internal representation encourages the emergence of useful building blocks that can be subsequently combined with others to produce improved performance; string representations are just one of many ways of achieving this. In our application of GAs we do not use string representations. Each individual of the population is represented in VL1 (see the previous section) as a different version of the given rule that is the subject of optimization. The VL1 representation is very natural for the optimization problem defined in this paper (search for a better performing rule) and can easily be manipulated by GA operators. The next section describes the GA operators chosen for our method.
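To make the evaluate-select-recombine cycle concrete, here is a minimal generic GA loop with fitness-proportional (roulette-wheel) selection on a toy integer problem. This is our illustrative sketch of the general mechanism, not the rule-optimization setup of the paper.

```python
import random

def evolve(pop, fitness, mutate, crossover, generations=30):
    """Generic GA: select parents in proportion to fitness, recombine, mutate."""
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        total = sum(scores)

        def select():
            # Roulette-wheel selection: expected picks proportional to fitness.
            r = random.uniform(0, total)
            acc = 0.0
            for ind, s in zip(pop, scores):
                acc += s
                if acc >= r:
                    return ind
            return pop[-1]

        offspring = []
        while len(offspring) < len(pop):
            c1, c2 = crossover(select(), select())
            offspring += [mutate(c1), mutate(c2)]
        pop = offspring[:len(pop)]
    return max(pop, key=fitness)

# Toy problem: maximize f(x) = x * (10 - x) over integers 0..10 (optimum x = 5).
random.seed(0)
f = lambda x: x * (10 - x) + 1                         # +1 keeps fitness positive
cross = lambda a, b: ((a + b) // 2, (a + b + 1) // 2)  # averaging crossover
mut = lambda x: min(10, max(0, x + random.choice((-1, 0, 1))))
best = evolve([random.randint(0, 10) for _ in range(20)], f, mut, cross)
```

Note that roulette-wheel selection requires strictly positive fitness values, which is why the toy fitness adds a constant; the CC/MC ratio used later in the paper is positive by construction.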
3.1 GA operators
An initial population of rules is created by introducing small variations to an existing rule. Then a population of different variations of this rule is evolved using GA operators. Each rule in the population is evaluated based on its performance within the set of initial rules. To evaluate a rule we use the tuning data, which are part of the set of learning examples. The performance of the best rule in the population is monitored by matching this rule with the testing examples.
Mutation is performed by introducing small changes to the condition parts (selectors) of a selected disjunct (rule complex) of a given rule. The selectors to be changed are chosen by randomly generating two pointers: the first one points to a rule disjunct (complex), and the second one points to a selector of this disjunct. Random changes are introduced to the left-most or right-most (this is also chosen randomly) value of this selector. For example, the selector [x5=3,10..23] can be changed to one of the following: [x5=3,10..20], [x5=3,10..24], [x5=5,10..23], or [x5=2,10..23]. Such a mutation process samples the space of possible boundaries between rules to minimize the coverage of noisy and irrelevant components of a given rule. The crossover operation is performed by splitting rule disjuncts into two parts: upper disjuncts and lower disjuncts. These parts are exchanged between parent rules to produce new child rules. Since the degree of match of a given instance depends on the degree of match of this instance to each disjunct of the rule, this exchange process enables inheritance of information about strong (strongly matching) disjuncts in the individuals of the next evolved population. An example of crossover applied to short four-disjunct rules is depicted below.
Parent rule 1:
1. [x1=7..8] [x2=8..19] [x3=8..13] [x5=4..54] [x6=0..3]
2. [x1=15..54] [x3=11..14] [x4=14..17] [x6=0..9] [x7=0..11]
------------------ crossover position ------------------
3. [x1=9..18] [x3=16..21] [x4=9..10]
4. [x1=10..14] [x3=13..16] [x4=14..54] [x7=4..5]

Parent rule 2:
1. [x3=18..54] [x4=16..54] [x5=0..6] [x7=5..12]
2. [x1=8..25] [x3=8..13] [x4=9..11] [x5=0..3]
------------------ crossover position ------------------
3. [x4=0..22] [x5=8..9] [x6=0..7] [x7=11..48]
4. [x2=5..8] [x3=7..8] [x4=8..11] [x5=0..3]

Result of the crossover operation (one of two child rules):
1. [x1=7..8] [x2=8..19] [x3=8..13] [x5=4..54] [x6=0..3]
2. [x1=15..54] [x3=11..14] [x4=14..17] [x6=0..9] [x7=0..11]
3. [x4=0..22] [x5=8..9] [x6=0..7] [x7=11..48]
4. [x2=5..8] [x3=7..8] [x4=8..11] [x5=0..3]
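The two operators might be sketched as follows. The data layout is our hypothetical choice (a rule is a list of disjuncts, each disjunct a list of (attribute, lo, hi) selector triples); it mirrors the examples above but is not the authors' code.

```python
import random

def mutate(rule, max_value=54):
    """Randomly pick one disjunct, then one selector, and nudge its
    left-most or right-most bound by a small random amount."""
    rule = [[list(sel) for sel in disjunct] for disjunct in rule]  # deep copy
    d = random.randrange(len(rule))      # pointer to a disjunct (complex)
    s = random.randrange(len(rule[d]))   # pointer to a selector
    attr, lo, hi = rule[d][s]
    delta = random.choice((-2, -1, 1, 2))
    if random.random() < 0.5:
        lo = max(0, min(hi, lo + delta))          # move left bound, keep lo <= hi
    else:
        hi = min(max_value, max(lo, hi + delta))  # move right bound
    rule[d][s] = [attr, lo, hi]
    return rule

def crossover(r1, r2):
    """Split both rules into upper and lower disjunct groups and swap them."""
    cut = min(len(r1), len(r2)) // 2
    return r1[:cut] + r2[cut:], r2[:cut] + r1[cut:]

# Abbreviated parents (first selector of each disjunct from the example above):
p1 = [[(1, 7, 8)], [(1, 15, 54)], [(1, 9, 18)], [(1, 10, 14)]]
p2 = [[(3, 18, 54)], [(1, 8, 25)], [(4, 0, 22)], [(2, 5, 8)]]
child1, child2 = crossover(p1, p2)  # child1 keeps p1's upper half, p2's lower
```

Here the crossover point is fixed at the middle for simplicity; the paper's operator chooses where to split the disjunct list, and mutation perturbs only one boundary per application, as in the [x5=...] example above.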
The GA operators as described above are very effective in situations where no a priori information is given on the distribution of attribute values. Non-normal distributions of attribute values complicate the AQ-generated descriptions, especially if the learned rules consistently and completely cover input instances of the data. Graph 1 presents the most representative samples of non-normal attribute distributions obtained from the experiment with textured image classes described in section 4.
[Graph 1 consists of four distribution plots, with attribute value (0..50) on the horizontal axis: Class C9, attribute x5; Class C8, attribute x1; Class C8, attribute x3; Class C11, attribute x7.]

Graph 1. Examples of non-normal attribute distributions.
The solid line corresponds to the smoothed distribution of an attribute, and the dotted line corresponds to the approximated normal distribution. The left-hand diagrams of Graph 1 present relatively simple attribute distributions. However, the right-hand diagrams of Graph 1 present two cases of more complex attribute distributions. The upper right diagram indicates the possible formation of more than one cluster of training data. The lower right diagram presents a very complex distribution of an attribute without distinct regular clusters. One can notice that attribute x7 does not carry significant information about class C11. Considering the above complexity of the attributes, we postulate that this domain (data in a representation space expressed by such attributes) is a good candidate for GA-type searches, where random changes (mutation) and some directional sampling (crossover) can yield high-performance solutions (optimized rules).
3.2 Rule performance evaluation
Each rule candidate in the population is evaluated by plugging it into the set of other rules and calculating the confusion matrix for the tuning examples. The confusion matrix represents information on correct and incorrect classification results. This matrix is obtained by calculating a degree of flexible match between a tuning instance and a given class description. The row entries in the matrix represent the percentage of instances from a given class (row index) matched to all class descriptions (column index). The following is an example of the confusion matrix for 12 classes.
      C1  C2  C3  C4  C5  C6  C7  C8  C9 C10 C11 C12
C1    45   5  31   0   0   9   0  28  16   0   6   2
C2     5  51   3  12   0  16   1  17   7   0  17   2
C3    31   0  80   0   7   1   0   2  10   0   6   5
C4     0  25   0  72   0  14   0  12   0   0   1  10
C5     0   0   2   0  99   0   0   0   0   0   0   7
C6     8  11   3  25   0  62   0  28   0   0  12   1
C7     0   2   0   0   0   0  98   0   0   0   0   0
C8    19  14   8   2   1  59   0  59   3   0   5   0
C9    16   0   8   0   0   5   0   5  87   0   1   2
C10    0   0   1   0   0   0   0   0   2  95   0   0
C11   16   4  32  10   0   4   0   4   2   0  44   7
C12    0   0   0   4   0   0   1   0   0   0   2  98

Average recognition = 74.67%. Average mis-classification = 4.83%.
The above confusion matrix represents classification results on data extracted from textured images. The method of extraction and texture classification by inductive learning is not the subject of the research presented in this paper; the reader can find information on the method of learning texture descriptions in [Bala and Pachowicz 1991]. Let us suppose that class C11 is the one chosen to be evolved by the GA mechanism. First, we mutate this class representation multiple times to produce an initial population of different individuals of that class. To test the performance of a given individual we calculate the confusion matrix. The confusion matrix shows us how this specific variation of class C11 performs when tested against the other classes using the set of tuning examples. For example, we can see from the above confusion matrix that class C11 has the following performance:
      C1  C2  C3  C4  C5  C6  C7  C8  C9 C10 C11 C12
C11   16   4  32  10   0   4   0   4   2   0  44   7
The above values represent the percentage of instances of class C11 (the set of tuning instances) matched to the learned descriptions of all 12 classes. If class C11 is to be optimized by GAs, we have to evaluate the performance of each individual in the population of that class by calculating the performance measure of different candidates of that class with respect to the other classes. Thus, as the performance evaluation measure of each individual of the population, we use the ratio of correct recognition rates to incorrect recognition rates, CC/MC, where CC is the average recognition for correctly recognized classes (the average of the entries on the diagonal of the confusion matrix) and MC is the average mis-classification (the average of the entries outside the diagonal of the confusion matrix). For the above confusion matrix, CC/MC = 74.67/4.83 = 15.45.
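This fitness computation can be sketched directly over the matrix. We store the matrix as a plain list of rows transcribed from the table above; the function name is ours, and the ratio computed from these transcribed entries comes out to about 15.4, close to the printed 15.45 (small transcription differences in the scanned table are possible).

```python
def cc_mc_fitness(matrix):
    """CC/MC ratio: average of diagonal entries (correct recognition)
    divided by average of off-diagonal entries (mis-classification)."""
    n = len(matrix)
    cc = sum(matrix[i][i] for i in range(n)) / n
    off_diagonal = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    mc = sum(off_diagonal) / len(off_diagonal)
    return cc / mc

# Confusion matrix from section 3.2 (rows C1..C12, entries in percent).
confusion = [
    [45,  5, 31,  0,  0,  9,  0, 28, 16,  0,  6,  2],
    [ 5, 51,  3, 12,  0, 16,  1, 17,  7,  0, 17,  2],
    [31,  0, 80,  0,  7,  1,  0,  2, 10,  0,  6,  5],
    [ 0, 25,  0, 72,  0, 14,  0, 12,  0,  0,  1, 10],
    [ 0,  0,  2,  0, 99,  0,  0,  0,  0,  0,  0,  7],
    [ 8, 11,  3, 25,  0, 62,  0, 28,  0,  0, 12,  1],
    [ 0,  2,  0,  0,  0,  0, 98,  0,  0,  0,  0,  0],
    [19, 14,  8,  2,  1, 59,  0, 59,  3,  0,  5,  0],
    [16,  0,  8,  0,  0,  5,  0,  5, 87,  0,  1,  2],
    [ 0,  0,  1,  0,  0,  0,  0,  0,  2, 95,  0,  0],
    [16,  4, 32, 10,  0,  4,  0,  4,  2,  0, 44,  7],
    [ 0,  0,  0,  4,  0,  0,  1,  0,  0,  0,  2, 98],
]
fitness = cc_mc_fitness(confusion)  # roughly 15, on the scale used in the paper
```

Because both CC and MC are averages over the whole matrix, improving one class's rule is rewarded only when it does not degrade the recognition of the other eleven classes.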
The next section describes how the distance measure between an example (tuning or testing) and
a rule is calculated
3.2.1 Recognizing class membership through flexible matching
Flexible matching measures the degree of closeness between an instance and the conditional part of a rule. Such closeness is computed as a distance value between an instance and a rule complex within the attribute space. This distance measure takes any value in the range from 0 (i.e., does not match) to 1 (i.e., matches). Calculation of this distance measure for a single test instance is executed according to the following schema. For a given condition of a rule [xn = valj] and an instance where xn = valk, the normalized distance measure is

1 - ( |valj - valk| / levels )   (2)

where levels is the total number of attribute values. The confidence level of a rule is computed by multiplying the evaluation values of each condition of the rule. The total evaluation of class membership of a given test instance is equal to the confidence level of the best matching rule, i.e., the rule with the highest confidence value. For example, the confidence value c for matching the following complex of a rule

[x1=0] [x7=10] [x8=10..20]

with a given instance x = <4, 5, 24, 34, 0, 12, 6, 25> and the number of attribute values levels = 55 is computed as follows:

cx1 = 1 - ( |0 - 4| / 55 ) = .928
cx7 = 1 - ( |10 - 6| / 55 ) = .928
cx8 = 1 - ( |20 - 25| / 55 ) = .91
cx2 = cx3 = cx4 = cx5 = cx6 = 1

c = cx1 · cx2 · cx3 · cx4 · cx5 · cx6 · cx7 · cx8 = 0.78
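The flexible-matching computation can be sketched directly from formula (2), extending the distance to range selectors by measuring to the nearest bound, as the worked example does. Function names and the data layout are ours.

```python
def selector_confidence(value, lo, hi, levels=55):
    """Degree of match of one value to a selector [x = lo..hi]:
    1 inside the range, otherwise reduced by the normalized
    distance to the nearest bound, per formula (2)."""
    if lo <= value <= hi:
        return 1.0
    dist = (lo - value) if value < lo else (value - hi)
    return 1.0 - dist / levels

def complex_confidence(instance, cpx, levels=55):
    """Confidence of a complex: product of per-selector confidences.
    Unconstrained attributes contribute a factor of 1."""
    c = 1.0
    for attr, (lo, hi) in cpx.items():
        c *= selector_confidence(instance[attr], lo, hi, levels)
    return c

# Worked example above: [x1=0] [x7=10] [x8=10..20] (zero-based indices).
cpx = {0: (0, 0), 6: (10, 10), 7: (10, 20)}
x = [4, 5, 24, 34, 0, 12, 6, 25]
print(round(complex_confidence(x, cpx), 2))  # 0.78, as in the example
```

A rule's confidence would then be the maximum of its complexes' confidences, and the predicted class the one whose best rule scores highest.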
The recognition process yields the class membership for which the confidence level is the highest among the matched rules. The calculated confidence level, however, is not a probability measure, and it can yield more than one class membership. This means that for a given preclassified test dataset, the system's recognition effectiveness is calculated as the ratio of correctly classified instances to the total number of instances in the dataset. The above-described method of rule evaluation by flexible matching and confidence level calculation has important consequences for the crossover operation. Since the degree of match of an instance to a rule depends on the confidence level of this instance with respect to each disjunct of that rule, the swapping process introduced by the crossover operator enables inheritance of information represented in the strong disjuncts (those with high confidence levels) of rule candidates through generations of a population.
4 Experimental results
In the experiment we used 12 classes of texture data. The initial descriptions were generated by the AQ module using 200 examples per class. Another set of 100 examples was used to calculate the performance measure for the GA cycle. The testing set had 200 examples extracted from image areas different from those of the training and tuning data. Examples of a given class were constructed using 8 attributes and an extraction process based on Laws' masks [Bala and Pachowicz 1991]. The weakest class (with the lowest matching results on testing data) chosen for the experiment was class C11 (as in the confusion matrix example in section 3.2), with 40 disjuncts. Graph 2 represents the results of the experiment. White circles in the diagrams represent characteristics obtained for the tuning data used to guide the genetic search. Characteristics marked by black circles were obtained for the testing data. The upper diagram monitors the performance of the genetically evolved description of the C11 texture. When all 300 examples were used to generate rules, the average classification rate for this class (when tested with 200 examples) was below 45%. When the set of 300 examples was split into two parts, 200 for the initial inductive learning and 100 as the tuning data for the GA cycle, the correct classification rate obtained in the 30th generation was above 60%. That is a significant increase in comparison with the 45% obtained from inductive learning only. The bottom diagrams represent the evaluation function used by the genetic algorithm (CC/MC, section 3.2) in order to guide the genetic search. The evaluation function was calculated as the ratio of correct classifications to mis-classifications for all twelve texture classes. These diagrams are depicted for both testing and tuning data. The increases of CC/MC in both diagrams represent an overall improvement of system recognition performance. The system performance was investigated for a larger number of GA generation steps. However, it appears that the substantial increase was reached, both for the C11 class description and for the overall system performance, within very few generation steps (i.e., about 10 steps).
[Graph 2 consists of three diagrams plotted over 30 GA generations: the upper diagram shows the classification performance of the evolved C11 description, and the bottom diagrams show the CC/MC evaluation function; white circles mark results for tuning data and black circles mark results for testing data.]

Graph 2. Results of the experiment with 12 classes.
5 Conclusion
This paper presents a novel approach to the optimization of classification rule descriptions through GA techniques. We recognize rule optimization as an emerging research area in the creation of autonomous intelligent systems. Two issues appear to be important for this research. First, rule optimization is absolutely necessary in problems where the system must be protected against the accumulation of noisy components and where the attributes used to describe the initial data have complex non-normal distributions. The method can be augmented by optimizing input data formation (searching for the best subset of attributes) prior to rule generation [Vafaie and De Jong 1991]. Secondly, for inductively generated rules (falsity-preserving rules) there should be some method of validating these rules (or choosing the most preferred inductive rule) before applying them to future data. The method presented in this paper tries to offer some solutions to these problems.
The presented experiments on applying this hybrid method to finer-grained symbolic representations of vision data are encouraging. Highly disjunctive descriptions obtained by inductive learning from this data were easily manipulated by the chosen GA operators, and a substantial increase in the overall system performance was observed within a few generations. Although substantial differences exist between GAs and symbolic induction methods (learning algorithm, performance elements, knowledge representation), this method serves as a promising example that a better understanding of the abilities of each approach (identifying common building blocks) can lead to novel and useful ways of combining them. This paper concentrates primarily on the proposed methodology. More experiments are needed to draw concrete conclusions. Future research will attempt to thoroughly test this approach on numerous data sets. We will also examine the mutation and crossover operators more thoroughly.
Acknowledgement: This research was supported in part by the Office of Naval Research under grants No. N00014-88-K-0397 and No. N00014-88-K-0226, and in part by the Defense Advanced Research Projects Agency under the grant administered by the Office of Naval Research, No. N00014-K-85-0878. The authors wish to thank Janet Holmes and Haleh Vafaie for editing suggestions.
References
Bala, J.W. and Pachowicz, P.W., "Application of Symbolic Machine Learning to the Recognition of Texture Concepts," 7th IEEE Conference on Artificial Intelligence Applications, Miami Beach, FL, February 1991.

De Jong, K., "Learning with Genetic Algorithms: An Overview," Machine Learning, vol. 3, pp. 123-138, Kluwer Academic Publishers, 1988.

Vafaie, H. and De Jong, K., "Improving the Performance of a Rule Induction System Using Genetic Algorithms," in preparation (Center for Artificial Intelligence, GMU, 1991).

Fitzpatrick, J.M. and Grefenstette, J., "Genetic Algorithms in Noisy Environments," Machine Learning, vol. 3, pp. 101-120, 1988.

Michalski, R.S., "AQVAL/1--Computer Implementation of a Variable-Valued Logic System VL1 and Examples of Its Application to Pattern Recognition," First International Joint Conference on Pattern Recognition, Washington, DC, October 30, 1973.

Michalski, R.S., Mozetic, I., Hong, J.R., Lavrac, N., "The AQ15 Inductive Learning System," Report No. UIUCDCS-R-86-1260, Department of Computer Science, University of Illinois at Urbana-Champaign, July 1986.
to be complete and consistent with respect to the initial learning dara we reduce the coverage of
noisy and irrelevant components in the rule structure Since the AQ learning algorithm is the
induction learning method in our method the next section presents a short overview of this
learning method Section 3 presents the GA phase of our method and the experimental results are
presented in section 4
2 AQ learning algorithm
The AQ algorithm [Michalski et al 1986] learns atuibutional description from examples
When building a decision rule AQ performs a heuristic search through a space of logical
expressions to determine those that account for all positive examples and no negative examples
Because there are usually many such complete and consistent expressions the goal of AQ is to
find the most preferred one according to some criterion Learning examples are given in the
form of events which are vectors of attribute values Events represent different decision classes
For each class a decision rule is produced that covers all examples of that class and none of other
classes The concept descriptions learned by AQ algorithm are represented in VL 1 the Variableshy
Valued Logic System 1 [Michalski 1972] and are used to represent attributional concept
descriptions A description of a concept is a disjunctive normal form which is called a cover A
cover is a disjunction of complexes A complex is a conjunction of selectors A selector is a
form [LR] (1)
where L is called the referee which is an attribute R is called the referent which is a set of values in the domain of the attribute L is one of the following relational symbols = lt gt gt= lt= ltgt
The following is an example of the five selectors AQ complex (equality is used as a relational
symbol)
[xl=1 3][x2=1][x4=O][x6=17][x8=1]
Dots in the description represent a range of possible values of a given attribute An example of
simple four complexes AQ description is presented below
1 [xl=78] [x2=8 19] [x3-8 13] [x5=4 54] [x6=O3] 2 [xl=15 54] [x3=1114] [x4=14 17] [x6=O 9] [x7=O111 3 [xl=9 18] [x3=16 21] [x4=9 10] 4 [xl=1O 14] [x3=13 16] [x4=14 54] [x7=4 5]
2
3 Genetic Algorithms Phase
The rules generated by the AQ algorithm have the propeny of being complete and
consistent with respect to the learning examples These properties mean that each class covers its
learning examples (positive examples) and does not cover any negative examples (negative
examples are all examples belonging to other classes) hen the AQ algorithm is applied to finershy
grained and noisy symbolic representations the consistency and completeness conditions
overconstrain the generated class description This in turn leads to poorer predictive accuracy
on future data Conversely by relaxing the requirement of consistency and completeness
predictive accuracy in noisy domains can be improved In the proposed method we use GAs to
improve the performance of initially consistent and complete rules via the evolutionary
mechanism
A genetic algorithm [De Jong 1988] maintains a constant-sized population of candidate
solutions known as individuals At each iteration each individual is evaluated and recombined
with others on the basis of its overall qUality or fitness The expected number of times an
individual is selected for recombination is proportional to its fitness relative to the rest of
population The power of a genetic algorithm lies in its ability to exploit in a highly efficient
manner information about a large number of individuals By allocating more reproductive
occurrences to above average individuals the overall affect is an increase of the populations
average fitness New individuals are created using two main genetic recombination operators
known as crossover and mutation Crossover operates by selecting a random location in the
genetic saing of the parents (crossover point) and concatenating the initial segments of one parent
with the final segment parent to create a new child A second child is simultaneously generated
using the remaining segments of the two parents Mutation provides for occasional disturbances
in the crossover operation to insure diversity in the genetic strings over long periods of rime and
to prevent stagnation in the convergence of the optimizations technique In order to apply GAs
one needs to choose a representation defme the operators and the performance measure In
traditional GAs individuals in the population are typically represented using a binary string
notation to promote efficiency and application independence in the genetic operations The
mathematical analysis of GAs shows that they work best when the internal representation
encourages the emergence of useful building blocks that can be subsequently combined with
others to produce improved performance and string representations are just one of many ways of
achieving this In olD application of GAs we do not use string representations Each individual of
the population is represented in VLl (previous section) as the different version of a given rule
that is a subject for the optimization The VL 1 representation is very natural for the
optimization problem defined in this paper (search for a better performing rule) and can easily be
manipulated by GA operators The next section describes GA operators chosen for our method
3
31 GA operators
An initial population of rules is created by causing small variations to an existing rule
Then a population of different variations of this rule is evolved by using GAs operators Each
rule in the population is evaluated based on its peIiormance within the set of initial rules To
evaluate the rule we use the tuning data which are pan of the set of learning examples The
perfonnance of the best rule in the population is monitored by matching this rule with the testing
examples
Mutation is performed by introducing small changes to the condition pans (selectors) of a
selected disjunct (rule complex) of a given rule The selectors to be changed are chosen by
randomly generating two pointers the first one that points to the rule disjunct complex and the
second one points to the selector of this disjunct Random changes are introduced to the left-most
or right-most (this is also chosen randomly) value of this selector For example the selector
[x5=310bullbull23] can be changed to one of the following [x5=3IO 201 [x5=3IO241
[x5=510 23] or [x5=210 23] Such a mutation process samples the space of possible
boundaries between rules to minimize the coverage of noisy and irrelevant components of a given
rule The crossover operation is performed by splitting rule disjuncts into two parts upper
disjuncts and lower disjuncts These parts are exchanged between parent rules to produce new
child rules Since degree of match of a given instance depends on the degree of match of this
instance to each disjunct of that rule this exchange process enables inheritance of information
about strong disjuncts (suongly matching) in the individuals of the next evolved population An
example ofcrossover applied to short four disjuncts rules is depicted below
Parent rule I I [xl=78] [x2=8 19] [x3=8 13] [xS=454] [x6=O3] 2 [xl=IS 54] [x3=1114] [x4=14 17] [x6=O 9] [x7=O11] ------------------ crossover position ---------------- shy3 [xl=918] [x3=16 21] [x4=9 10] 4 [xl=IO 14] [x3=13 16] [x4=1454] [x7=4 5]
Parent rule 2 I [x3=18 54] [x4=16 54] [xS=O 6] [x7=5 12] 2 [xl=8 25] [x3=8 13] [x4=9 11] [x5=O 3] ------------------ crossover position --------------------shy3 [x4=O bull 22] [x5=8 9] [x6=O 7] [x7=1148] 4 [x2=S bull 8] [x3=7 8] [x4=8 11] [x5=O 3]
Result of the crossover operation (one of two child rules) I [xl=7bull8] [x2=8 19] [x3=8 13] [xS=454] [x6=O 3] 2 [xl=IS 54] [x3=1114] [x4=14 17] [x6=O9] [x7=O 11] 3 [x4=O 22] [x5=8 9] [x6=O 7] [x7=1148] 4 [x2=5 8] [x3=7 8] [x4=8 11] [xS=O bull 3]
4
The GA operators as described above are very effective in situations where no a priori
infonnation is given on the distribution of attribute values Son-nonnal distribution of attribute
values complicates the AQ generated descriptions especially if the learned rules consistently and
completely cover input instances of the data Graph 1 presents the most representative samples of
non-normal attribute distribution obtained from the experiment with textured image classes
described in section 4
[Four panels plot frequency against attribute value (0..50): Class C9, attribute x5; Class C8, attribute x1; Class C8, attribute x3; Class C11, attribute x7.]
Graph 1. Examples of non-normal attribute distribution.
The solid line corresponds to the smoothed distribution of an attribute, and the dotted line
corresponds to the approximated normal distribution. The left-hand diagrams of Graph 1 present
relatively simple attribute distributions. However, the right-hand diagrams of Graph 1 present
two cases of more complex attribute distributions. The upper right diagram indicates the possible
formation of more than one cluster of training data. The lower right diagram presents a very
complex distribution of an attribute without distinct regular clusters. One can notice that the x7
attribute does not carry significant information about class C11. Considering the above
complexity of the attributes, we postulate that this domain (data in a representation space
expressed by such attributes) is a good candidate for GA-type searches, where random
changes (mutation) and some directional sampling (crossover) can yield high-performance
solutions (optimized rules).
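The random endpoint changes applied by mutation can be sketched as follows. The paper gives no pseudocode, so the helper name, the clipping policy, and the step sizes here are our assumptions; a rule is again a list of disjuncts mapping attribute names to (low, high) intervals over a discrete 0..54 domain.

```python
import random

def mutate(rule, levels=55, rng=random):
    """Return a copy of a rule with one selector boundary nudged.

    Picks a random disjunct and a random selector in it, then shifts either
    the left-most or the right-most value of its interval by a small random
    amount, clipped to the attribute domain 0..levels-1.
    """
    new_rule = [dict(disjunct) for disjunct in rule]   # copy each disjunct
    disjunct = rng.choice(new_rule)
    attr = rng.choice(sorted(disjunct))
    low, high = disjunct[attr]
    delta = rng.choice([-2, -1, 1, 2])
    if rng.random() < 0.5:
        low = max(0, min(low + delta, high))           # move left boundary
    else:
        high = min(levels - 1, max(high + delta, low)) # move right boundary
    disjunct[attr] = (low, high)
    return new_rule

# e.g. [x5=3..23] may become [x5=2..23], [x5=5..23], [x5=3..21], ...
mutant = mutate([{"x5": (3, 23)}], rng=random.Random(1))
```

Such small boundary shifts sample the space of possible rule boundaries, which is how the method minimizes the coverage of noisy and irrelevant components of a rule.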
3.2 Rule performance evaluation
Each rule candidate in the population is evaluated by plugging it into the set of other rules
and calculating the confusion matrix for the tuning examples. The confusion matrix represents
information on correct and incorrect classification results. This matrix is obtained by calculating a
degree of flexible match between a tuning instance and a given class description. The row entries
in the matrix represent the percentage of instances from a given class (row index) matched to all
class descriptions (column index). The following is an example of the confusion matrix for 12
classes:
      C1  C2  C3  C4  C5  C6  C7  C8  C9  C10 C11 C12
C1    45   5  31   0   0   9   0  28  16   0   6   2
C2     5  51   3  12   0  16   1  17   7   0  17   2
C3    31   0  80   0   7   1   0   2  10   0   6   5
C4     0  25   0  72   0  14   0  12   0   0   1  10
C5     0   0   2   0  99   0   0   0   0   0   0   7
C6     8  11   3  25   0  62   0  28   0   0  12   1
C7     0   2   0   0   0   0  98   0   0   0   0   0
C8    19  14   8   2   1  59   0  59   3   0   5   0
C9    16   0   8   0   0   5   0   5  87   0   1   2
C10    0   0   1   0   0   0   0   0   2  95   0   0
C11   16   4  32  10   0   4   0   4   2   0  44   7
C12    0   0   0   4   0   0   1   0   0   0   2  98
Average recognition = 74.67%; average mis-classification = 4.83%.
The above confusion matrix represents classification results on data extracted from
textured images. The method of extraction and texture classification by inductive learning is not
the subject of the research presented in this paper. The reader can find further information on the
method of learning texture descriptions in [Bala and Pachowicz 1991].

Let us suppose that class C11 is the one chosen to be evolved by the GA mechanism.
First, we mutate this class representation multiple times to produce an initial population of
different individuals of that class. To test the performance of a given individual, we calculate the
confusion matrix. The confusion matrix shows us how this specific variation of class C11
performs when tested against the other classes using the set of tuning examples. For example, we can
see from the above confusion matrix that class C11 has the following performance:
      C1  C2  C3  C4  C5  C6  C7  C8  C9  C10 C11 C12
C11   16   4  32  10   0   4   0   4   2   0  44   7
The above values represent the percentage of instances of class C11 (the set of tuning instances)
matched to the learned descriptions of all 12 classes. If class C11 is to be optimized by GAs, we have
to evaluate the performance of each individual from the population of that class by calculating the
performance measure of different candidates of that class with respect to the other classes. Thus, as
the performance evaluation measure of each individual of the population, we use the ratio of the
correct recognition rate to the incorrect recognition rate, CC/MC, where CC is the average
recognition for correctly recognized classes (the average of entries on the diagonal of the
confusion matrix) and MC is the average mis-classification (the average of entries outside the
diagonal of the confusion matrix). For the above confusion matrix, CC/MC = 74.67/4.83 = 15.45.
The next section describes how the distance measure between an example (tuning or testing) and
a rule is calculated.
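The CC/MC fitness computation can be sketched as follows; this is a minimal version (the function name is ours), assuming the confusion matrix is given as a list of rows of percentages.

```python
def cc_mc_fitness(confusion):
    """CC/MC: average of diagonal entries (correct recognition) divided by
    the average of off-diagonal entries (mis-classification)."""
    n = len(confusion)
    cc = sum(confusion[i][i] for i in range(n)) / n
    mc = sum(confusion[i][j]
             for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    return cc / mc

# Toy 3-class matrix with a strong diagonal:
m = [[90, 5, 5],
     [10, 80, 10],
     [0, 0, 100]]
print(cc_mc_fitness(m))  # 90.0 / 5.0 = 18.0
```

Because the whole matrix enters the score, a candidate rule is rewarded not only for covering its own class but also for reducing false matches against every other class.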
3.2.1 Recognizing class membership through flexible matching
Flexible matching measures the degree of closeness between an instance and the
conditional part of a rule. Such closeness is computed as a distance value between the instance and a
rule complex within the attribute space. This distance measure takes any value in the range from
0 (i.e., does not match) to 1 (i.e., matches). Calculation of this distance measure for a single
test instance is executed according to the following schema. For a given condition of a rule
[xn = valj] and an instance where xn = valk, the normalized distance measure is

    1 - ( | valj - valk | / levels )        (2)

where levels is the total number of attribute values. The confidence level of a rule is computed
by multiplying the evaluation values of each condition of the rule. The total evaluation of class
membership of a given test instance is equal to the confidence level of the best matching rule, i.e.,
the rule with the highest confidence value. For example, the confidence value c for matching the
following complex of a rule,

    [x1=0] [x7=10] [x8=10..20],

with a given instance x = <4, 5, 24, 34, 0, 12, 6, 25> and the number of attribute values
levels = 55, is computed as follows:

    c_x1 = 1 - ( |0 - 4| / 55 )   = .928
    c_x7 = 1 - ( |10 - 6| / 55 )  = .928
    c_x8 = 1 - ( |20 - 25| / 55 ) = .91
    c_x2 = c_x3 = c_x4 = c_x5 = c_x6 = 1
c = c_x1 * c_x2 * c_x3 * c_x4 * c_x5 * c_x6 * c_x7 * c_x8 = 0.78
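The worked example above can be reproduced in code. This is a sketch under our own conventions (helper names and the dict-based complex representation are assumptions); the product comes to approximately 0.78, as in the text.

```python
def condition_confidence(value, low, high, levels=55):
    """Closeness of an attribute value to a selector interval: 1 inside,
    decreasing linearly with distance to the nearest boundary outside."""
    if low <= value <= high:
        return 1.0
    return 1.0 - min(abs(value - low), abs(value - high)) / levels

def complex_confidence(cpx, instance, levels=55):
    """Product of condition confidences; unconstrained attributes count as 1."""
    c = 1.0
    for attr, (low, high) in cpx.items():
        c *= condition_confidence(instance[attr], low, high, levels)
    return c

# The worked example: [x1=0] [x7=10] [x8=10..20] against
# x = <4, 5, 24, 34, 0, 12, 6, 25> with levels = 55.
cpx = {0: (0, 0), 6: (10, 10), 7: (10, 20)}   # 0-based attribute indices
x = [4, 5, 24, 34, 0, 12, 6, 25]
print(round(complex_confidence(cpx, x), 2))   # 0.78
```

(The per-condition values printed this way differ from the paper's in the third decimal because of rounding; the final confidence agrees.)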
The recognition process yields the class membership for which the confidence level is the highest
among the matched rules. The calculated confidence level, however, is not a probability measure, and it
can yield more than one class membership. This means that for a given preclassified test dataset, the
system recognition effectiveness is calculated as the ratio of correctly classified instances to
the total number of instances in the dataset. The above-described method of rule evaluation by
flexible matching and confidence level calculation has important consequences for the crossover
operation. Since the degree of match of an instance to a rule depends on the confidence level of
this instance with respect to each disjunct of that rule, the swapping process introduced by the crossover
operator enables inheritance of the information represented in the strong disjuncts (those with high
confidence levels) of rule candidates through the generations of a population.
4 Experimental results
In the experiment we used 12 classes of texture data. The initial descriptions were
generated by the AQ module using 200 examples per class. Another set of 100 examples was
used to calculate the performance measure for the GA cycle. The testing set had 200 examples
extracted from image areas different from those of the training and tuning data. Examples of a given
class were constructed using 8 attributes and an extraction process based on Laws' masks [Bala
and Pachowicz 1991]. The weakest class (with the lowest matching results on the testing data)
chosen for the experiment was class C11 (as in the confusion matrix example in Section 3.2),
with 40 disjuncts. Graph 2 represents the results of the experiment. White circles in the diagrams
represent characteristics obtained for the tuning data used to guide the genetic search. Characteristics
mapped by black circles were obtained for the testing data. The upper diagram monitors the performance
of the genetically evolved description of the C11 texture. When all 300 examples were used to generate
rules, the average classification rate for this class (when tested with 200 examples) was below
45%. When the set of 300 examples was split into two parts, 200 for the initial inductive learning
and 100 as the tuning data for the GA cycle, the correct classification rate obtained in the 30th
generation was above 60%. That is a significant increase in comparison with the 45% obtained from
inductive learning only. The bottom diagrams represent the evaluation function used by the genetic
algorithm (CC/MC, Section 3.2) in order to guide the genetic search. The evaluation function was
calculated as the ratio of correct classifications to mis-classifications for all twelve texture
classes. These diagrams are depicted for both testing and tuning data. The increases of CC/MC
in both diagrams represent an overall improvement of the system recognition performance. The
system performance was investigated for a larger number of GA generation steps. However, it
appears that the substantial increase was reached, both for the C11 class description and for the
overall system performance, within very few generation steps (i.e., in about 10 steps).
[The upper diagram plots the classification rate of the evolved C11 description against generations (0..30); the bottom diagrams plot the CC/MC evaluation function against generations for tuning and testing data. White circles: results for tuning data; black circles: results for testing data.]
Graph 2. Results of the experiment with 12 classes.
5 Conclusion
This paper presents a novel approach to the optimization of classification rule
descriptions through GA techniques. We recognize rule optimization as an emerging research
area in the creation of autonomous intelligent systems. Two issues appear to be important for this
research. First, rule optimization is absolutely necessary in problems where the system
must be protected against the accumulation of noisy components and where the attributes used to
describe the initial data have complex, non-normal distributions. The method can be augmented by
optimizing the input data formation (searching for the best subset of attributes) prior to rule
generation [Vafaie and De Jong 1991]. Secondly, for inductively generated rules (falsity-preserving
rules), there should be some method of validating these rules (or choosing the most preferred
inductive rule) before applying them to future data. The method presented in this paper tries to
offer some solutions to these problems.
The presented experiments on applying this hybrid method to finer-grained symbolic
representations of vision data are encouraging. Highly disjunctive descriptions obtained by
inductive learning from these data were easily manipulated by the chosen GA operators, and a
substantial increase in the overall system performance was observed within a few generations.
Although there exist substantial differences between GAs and symbolic induction methods
(learning algorithm, performance elements, knowledge representation), this method serves as a
promising example that a better understanding of the abilities of each approach (identifying common
building blocks) can lead to novel and useful ways of combining them. This paper concentrates
primarily on the proposed methodology. More experiments are needed to draw concrete
conclusions. Future research will attempt to thoroughly test this approach on numerous data sets.
We will also examine the mutation and crossover operators more thoroughly.
Acknowledgement: This research was supported in part by the Office of Naval Research under grants No. N00014-88-K-0397 and No. N00014-88-K-0226, and in part by the Defense Advanced Research Projects Agency under the grant administered by the Office of Naval Research, No. N00014-K-85-0878. The authors wish to thank Janet Holmes and Haleh Vafaie for editing suggestions.
References
Bala, J.W. and Pachowicz, P.W., "Application of Symbolic Machine Learning to the Recognition of Texture Concepts," 7th IEEE Conference on Artificial Intelligence Applications, Miami Beach, FL, February 1991.

De Jong, K., "Learning with Genetic Algorithms: An Overview," Machine Learning, vol. 3, pp. 123-138, Kluwer Academic Publishers, 1988.

Vafaie, H. and De Jong, K., "Improving the Performance of a Rule Induction System Using Genetic Algorithms," in preparation (Center for AI, GMU, 1991).

Fitzpatrick, J.M. and Grefenstette, J., "Genetic Algorithms in Noisy Environments," Machine Learning, 3, pp. 101-120, 1988.

Michalski, R.S., "AQVAL/1--Computer Implementation of a Variable-Valued Logic System VL1 and Examples of Its Application to Pattern Recognition," First International Joint Conference on Pattern Recognition, Washington, DC, October 30, 1973.

Michalski, R.S., Mozetic, I., Hong, J.R., Lavrac, N., "The AQ15 Inductive Learning System," Report No. UIUCDCS-R-86-1260, Department of Computer Science, University of Illinois at Urbana-Champaign, July 1986.
31 GA operators
An initial population of rules is created by causing small variations to an existing rule
Then a population of different variations of this rule is evolved by using GAs operators Each
rule in the population is evaluated based on its peIiormance within the set of initial rules To
evaluate the rule we use the tuning data which are pan of the set of learning examples The
perfonnance of the best rule in the population is monitored by matching this rule with the testing
examples
Mutation is performed by introducing small changes to the condition parts (selectors) of a
selected disjunct (rule complex) of a given rule. The selectors to be changed are chosen by
randomly generating two pointers: the first points to the rule disjunct (complex), and the
second points to a selector of this disjunct. Random changes are introduced to the left-most
or right-most (also chosen randomly) value of this selector. For example, the selector
[x5 = 3..10, 23] can be changed to one of the following: [x5 = 3..10, 20], [x5 = 3..10, 24],
[x5 = 5..10, 23], or [x5 = 2..10, 23]. Such a mutation process samples the space of possible
boundaries between rules to minimize the coverage of noisy and irrelevant components of a given
rule. The crossover operation is performed by splitting rule disjuncts into two parts: upper
disjuncts and lower disjuncts. These parts are exchanged between parent rules to produce new
child rules. Since the degree of match of a given instance depends on the degree of match of this
instance to each disjunct of the rule, this exchange process enables inheritance of information
about strong (strongly matching) disjuncts in the individuals of the next evolved population. An
example of crossover applied to short, four-disjunct rules is depicted below.
Parent rule 1:
1. [x1=7..8] [x2=8..19] [x3=8..13] [x5=4..54] [x6=0..3]
2. [x1=15..54] [x3=11..14] [x4=14..17] [x6=0..9] [x7=0..11]
------------------ crossover position ------------------
3. [x1=9..18] [x3=16..21] [x4=9..10]
4. [x1=10..14] [x3=13..16] [x4=14..54] [x7=4..5]

Parent rule 2:
1. [x3=18..54] [x4=16..54] [x5=0..6] [x7=5..12]
2. [x1=8..25] [x3=8..13] [x4=9..11] [x5=0..3]
------------------ crossover position ------------------
3. [x4=0..22] [x5=8..9] [x6=0..7] [x7=11..48]
4. [x2=5..8] [x3=7..8] [x4=8..11] [x5=0..3]

Result of the crossover operation (one of two child rules):
1. [x1=7..8] [x2=8..19] [x3=8..13] [x5=4..54] [x6=0..3]
2. [x1=15..54] [x3=11..14] [x4=14..17] [x6=0..9] [x7=0..11]
3. [x4=0..22] [x5=8..9] [x6=0..7] [x7=11..48]
4. [x2=5..8] [x3=7..8] [x4=8..11] [x5=0..3]
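The two operators above can be sketched as follows. This is an illustrative reconstruction only: the encoding of a rule as a list of disjunct dictionaries mapping attributes to lists of (lo, hi) ranges, the `MAX_VAL` domain bound, and the boundary clamping are our assumptions, not the authors' representation.

```python
import random

MAX_VAL = 54  # assumed upper bound of the attribute domain

# A rule is a list of disjuncts; each disjunct maps an attribute
# to a list of (lo, hi) ranges of allowed values.
RULE = [
    {"x1": [(7, 8)], "x2": [(8, 19)]},
    {"x1": [(15, 54)], "x3": [(11, 14)]},
    {"x1": [(9, 18)], "x4": [(9, 10)]},
    {"x1": [(10, 14)], "x7": [(4, 5)]},
]

def mutate(rule, rng=random):
    """Nudge the left-most or right-most boundary of one randomly
    chosen selector in one randomly chosen disjunct (cf. section 3.1)."""
    new_rule = [dict(d) for d in rule]      # copy each disjunct
    disjunct = rng.choice(new_rule)         # pointer 1: the disjunct
    attr = rng.choice(list(disjunct))       # pointer 2: the selector
    values = list(disjunct[attr])
    side = rng.choice(["left", "right"])    # which boundary to move
    delta = rng.choice([-1, 1])
    idx = 0 if side == "left" else -1
    lo, hi = values[idx]
    if side == "left":
        lo = min(max(0, lo + delta), hi)    # clamp so lo stays in [0, hi]
    else:
        hi = max(min(MAX_VAL, hi + delta), lo)
    values[idx] = (lo, hi)
    disjunct[attr] = values
    return new_rule

def crossover(parent1, parent2, cut):
    """Swap 'upper' and 'lower' disjunct blocks between two parent rules."""
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2
```

With `cut=2`, `crossover` reproduces the four-disjunct example above: the child keeps disjuncts 1-2 of parent 1 and receives disjuncts 3-4 of parent 2.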
The GA operators as described above are very effective in situations where no a priori
information is given on the distribution of attribute values. Non-normal distribution of attribute
values complicates the AQ-generated descriptions, especially if the learned rules consistently and
completely cover the input instances of the data. Graph 1 presents the most representative samples of
non-normal attribute distributions obtained from the experiment with textured image classes
described in section 4.
[Graph 1 (figure): four panels showing attribute-value distributions over the range 0 to 50 — Class C9, attribute x5; Class C8, attribute x1; Class C8, attribute x3; Class C11, attribute x7.]

Graph 1. Examples of non-normal attribute distributions
The solid line corresponds to the smoothed distribution of an attribute, and the dotted line
corresponds to the approximated normal distribution. The left-hand diagrams of Graph 1 present
relatively simple attribute distributions. However, the right-hand diagrams of Graph 1 present
two cases of more complex attribute distributions. The upper right diagram indicates the possible
formation of more than one cluster of training data. The lower right diagram presents a very
complex distribution of an attribute without distinct regular clusters. One can notice that the x7
attribute does not carry significant information about class C11. Considering the above
complexity of the attributes, we postulate that this domain (data in a representation space
expressed by such attributes) is a good candidate for GA-type searches, where random
changes (mutation) and some directional sampling (crossover) can yield high-performance
solutions (optimized rules).
3.2 Rule performance evaluation
Each rule candidate in the population is evaluated by plugging it into the set of other rules
and calculating the confusion matrix for the tuning examples. The confusion matrix represents
information on correct and incorrect classification results. This matrix is obtained by calculating a
degree of flexible match between a tuning instance and a given class description. The row entries
in the matrix represent the percentage of instances from a given class (row index) matched to all
class descriptions (column index). The following is an example of the confusion matrix for 12
classes:
      C1  C2  C3  C4  C5  C6  C7  C8  C9  C10 C11 C12
C1    45   5  31   0   0   9   0  28  16   0   6   2
C2     5  51   3  12   0  16   1  17   7   0  17   2
C3    31   0  80   0   7   1   0   2  10   0   6   5
C4     0  25   0  72   0  14   0  12   0   0   1  10
C5     0   0   2   0  99   0   0   0   0   0   0   7
C6     8  11   3  25   0  62   0  28   0   0  12   1
C7     0   2   0   0   0   0  98   0   0   0   0   0
C8    19  14   8   2   1  59   0  59   3   0   5   0
C9    16   0   8   0   0   5   0   5  87   0   1   2
C10    0   0   1   0   0   0   0   0   2  95   0   0
C11   16   4  32  10   0   4   0   4   2   0  44   7
C12    0   0   0   4   0   0   1   0   0   0   2  98

Average recognition = 74.67%; average mis-classification = 4.83%
The above confusion matrix represents classification results on data extracted from
textured images. The method of extraction and texture classification by inductive learning is not
the subject of the research presented in this paper; the reader can find further information on the
method of learning texture descriptions in [Bala and Pachowicz, 1991].
Let us suppose that class C11 is the one chosen to be evolved by the GA mechanism.
First, we mutate this class representation multiple times to produce an initial population of
different individuals of that class. To test the performance of a given individual we calculate the
confusion matrix. The confusion matrix shows us how this specific variation of class C11
performs when tested with the other classes using the set of tuning examples. For example, we can
see from the above confusion matrix that class C11 has the following performance:

      C1  C2  C3  C4  C5  C6  C7  C8  C9  C10 C11 C12
C11   16   4  32  10   0   4   0   4   2   0  44   7
The above values represent the percentage of instances of class C11 (the set of tuning instances)
matched to the learned descriptions of all 12 classes. If class C11 is to be optimized by GAs, we have
to evaluate the performance of each individual from the population of that class by calculating the
performance measure of different candidates of that class with respect to the other classes. Thus, as
the performance evaluation measure of each individual of the population, we use the ratio of
correct to incorrect recognition rates, CC/MC, where CC is the average
recognition for correctly recognized classes (the average of entries on the diagonal of the
confusion matrix) and MC is the average mis-classification (the average of entries outside the
diagonal of the confusion matrix). For the above confusion matrix, CC/MC = 74.67/4.83 = 15.45.
The next section describes how the distance measure between an example (tuning or testing) and
a rule is calculated.
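A minimal sketch of this fitness computation, assuming the confusion matrix is given as a list of rows of percentages (the 3-class matrix below is a toy example for illustration, not the experimental data):

```python
def cc_mc_ratio(confusion):
    """Fitness of a candidate rule set: average diagonal entry (correct
    recognition, CC) divided by average off-diagonal entry (MC)."""
    n = len(confusion)
    diag = [confusion[i][i] for i in range(n)]
    off = [confusion[i][j] for i in range(n) for j in range(n) if i != j]
    cc = sum(diag) / n                 # average correct recognition
    mc = sum(off) / (n * (n - 1))      # average mis-classification
    return cc / mc

# A toy 3-class matrix: strong diagonal, light confusion elsewhere.
toy = [
    [90,  5,  5],
    [10, 80, 10],
    [ 0,  5, 95],
]
print(cc_mc_ratio(toy))  # CC = 88.33, MC = 5.83, ratio ≈ 15.14
```

A larger ratio means the candidate rule both recognizes its own class well and draws fewer instances away from the other classes, which is exactly what the evolution is asked to maximize.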
3.2.1 Recognizing class membership through flexible matching
Flexible matching measures the degree of closeness between an instance and the
conditional part of a rule. Such closeness is computed as a distance value between the instance and
a rule complex within the attribute space. This distance measure takes any value in the range from
0 (i.e., does not match) to 1 (i.e., matches). Calculation of this distance measure for a single
test instance is executed according to the following schema. For a given condition of a rule
[xn = valj] and an instance where xn = valk, the normalized distance measure is

    1 - ( |valj - valk| / levels )          (2)

where levels is the total number of attribute values. The confidence level of a rule is computed
by multiplying the evaluation values of each condition of the rule. The total evaluation of class
membership of a given test instance is equal to the confidence level of the best matching rule, i.e.,
the rule with the highest confidence value. For example, the confidence value c for matching the
following complex of a rule

    [x1 = 0] [x7 = 10] [x8 = 10..20]

with a given instance x = <4, 5, 24, 34, 0, 12, 6, 25> and the number of attribute values
levels = 55 is computed as follows:

    cx1 = 1 - ( |0 - 4| / 55 ) = .928
    cx7 = 1 - ( |10 - 6| / 55 ) = .928
    cx8 = 1 - ( |20 - 25| / 55 ) = .91
    cx2 = cx3 = cx4 = cx5 = cx6 = 1
c = cx1 · cx2 · cx3 · cx4 · cx5 · cx6 · cx7 · cx8 = 0.78
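The worked example can be reproduced with a short sketch of formula (2). The treatment of interval conditions (distance measured to the nearest interval boundary, full match inside the interval) is our reading of the example above, and attributes without a condition implicitly contribute a factor of 1.

```python
def condition_match(lo, hi, val, levels=55):
    """Degree of match of one selector [x = lo..hi] per formula (2):
    1 - |nearest_boundary - val| / levels; full match inside the interval."""
    if lo <= val <= hi:
        return 1.0
    nearest = lo if val < lo else hi
    return 1.0 - abs(nearest - val) / levels

def confidence(conditions, instance, levels=55):
    """Confidence of a rule complex: the product of its condition matches."""
    c = 1.0
    for attr, (lo, hi) in conditions.items():
        c *= condition_match(lo, hi, instance[attr], levels)
    return c

# The worked example: complex [x1=0][x7=10][x8=10..20],
# instance x = <4, 5, 24, 34, 0, 12, 6, 25>, levels = 55.
complex_ = {"x1": (0, 0), "x7": (10, 10), "x8": (10, 20)}
x = {"x1": 4, "x2": 5, "x3": 24, "x4": 34,
     "x5": 0, "x6": 12, "x7": 6, "x8": 25}
print(round(confidence(complex_, x), 2))  # -> 0.78
```

Point conditions such as [x1 = 0] are represented here as degenerate intervals (0, 0), which reduces formula (2) to the plain value-to-value distance used in the text.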
The recognition process yields the class membership for which the confidence level is the highest
among the matched rules. The calculated confidence level, however, is not a probability measure, and it
can yield more than one class membership. This means that for a given preclassified test dataset, the
system recognition effectiveness is calculated as the ratio of correctly classified instances to
the total number of instances in the dataset. The above-described method of rule evaluation by
flexible matching and confidence-level calculation has important consequences for the crossover
operation. Since the degree of match of an instance to a rule depends on the confidence level of
this instance with respect to each disjunct of that rule, the swapping process introduced by the crossover operator
enables inheritance of the information represented in the strong disjuncts (those with high confidence levels)
of rule candidates through generations of a population.
4 Experimental results
In the experiment we used 12 classes of texture data. The initial descriptions were
generated by the AQ module using 200 examples per class. Another set of 100 examples was
used to calculate the performance measure for the GA cycle. The testing set had 200 examples
extracted from image areas different from those of the training and tuning data. Examples of a given
class were constructed using 8 attributes and an extraction process based on Laws' masks [Bala
and Pachowicz, 1991]. The weakest class (with the lowest matching results on the testing data)
chosen for the experiment was class C11 (as in the confusion matrix example in section 3.2),
with 40 disjuncts. Graph 2 presents the results of the experiment. White circles in the diagrams
represent characteristics obtained for the tuning data used to guide the genetic search; characteristics
marked by black circles were obtained for the testing data. The upper diagram monitors the performance
of the genetically evolved description of the C11 texture. When all 300 examples were used to generate
rules, the average classification rate for this class (when tested with 200 examples) was below
45%. When the set of 300 examples was split into two parts, 200 for the initial inductive learning
and 100 as the tuning data for the GA cycle, the correct classification rate obtained in the 30th
generation was above 60%. That is a significant increase in comparison with the 45% obtained from
inductive learning only. The bottom diagrams present the evaluation function used by the genetic
algorithm (CC/MC, section 3.2) to guide the genetic search. The evaluation function was
calculated as the ratio of correct classifications to mis-classifications over all twelve texture
classes. These diagrams are depicted for both testing and tuning data. The increases of CC/MC
in both diagrams represent an overall improvement of system recognition performance. The
system performance was also investigated for a larger number of GA generation steps; however, it
appears that the substantial increase was reached, both for the C11 class description and for the
overall system performance, within very few generation steps (i.e., within 10 steps).
[Graph 2 (figure): the upper diagram plots the classification performance of the evolved C11 description against generation number (0 to 30); the lower diagrams plot the CC/MC evaluation function against generation number for tuning data (white circles) and testing data (black circles).]

Graph 2. Results of the experiment with 12 classes
5 Conclusion
This paper presents a novel approach to the optimization of classification rule
descriptions through GA techniques. We recognize rule optimization as an emerging research
area in the creation of autonomous intelligent systems. Two issues appear to be important for this
research. First, rule optimization is absolutely necessary in problems where the system
must be protected against the accumulation of noisy components, and where the attributes used to
describe the initial data have complex, non-normal distributions. The method can be augmented by
optimizing the input data formation (searching for the best subset of attributes) prior to rule
generation [Vafaie and De Jong, 1991]. Secondly, for inductively generated rules (falsity-preserving
rules), there should be some method of validating these rules (or choosing the most preferred
inductive rule) before applying them to future data. The method presented in this paper tries to
offer some solutions to these problems.
The presented experiments on applying this hybrid method to finer-grained symbolic
representations of vision data are encouraging. Highly disjunctive descriptions obtained by
inductive learning from these data were easily manipulated by the chosen GA operators, and a
substantial increase in the overall system performance was observed within a few generations.
Although there exist substantial differences between GAs and symbolic induction methods
(learning algorithm, performance elements, knowledge representation), this method serves as a
promising example that a better understanding of the abilities of each approach (identifying common
building blocks) can lead to novel and useful ways of combining them. This paper concentrates
primarily on the proposed methodology; more experiments are needed to draw concrete
conclusions. Future research will attempt to thoroughly test this approach on numerous data sets.
We will also examine the mutation and crossover operators more thoroughly.
Acknowledgement: This research was supported in part by the Office of Naval Research under grants No. N00014-88-K-0397 and No. N00014-88-K-0226, and in part by the Defense Advanced Research Projects Agency under a grant administered by the Office of Naval Research, No. N00014-K-85-0878. The authors wish to thank Janet Holmes and Haleh Vafaie for editing suggestions.
References
Bala, J.W. and Pachowicz, P.W., "Application of Symbolic Machine Learning to the Recognition of Texture Concepts," 7th IEEE Conference on Artificial Intelligence Applications, Miami Beach, FL, February 1991.

De Jong, K., "Learning with Genetic Algorithms: An Overview," Machine Learning, vol. 3, pp. 123-138, Kluwer Academic Publishers, 1988.

Vafaie, H. and De Jong, K., "Improving the Performance of a Rule Induction System Using Genetic Algorithms," in preparation (Center for AI, GMU, 1991).

Fitzpatrick, J.M. and Grefenstette, J.J., "Genetic Algorithms in Noisy Environments," Machine Learning 3, pp. 101-120, 1988.

Michalski, R.S., "AQVAL/1 -- Computer Implementation of a Variable-Valued Logic System VL1 and Examples of Its Application to Pattern Recognition," First International Joint Conference on Pattern Recognition, Washington, DC, October 30, 1973.

Michalski, R.S., Mozetic, I., Hong, J., Lavrac, N., "The AQ15 Inductive Learning System," Report No. UIUCDCS-R-86-1260, Department of Computer Science, University of Illinois at Urbana-Champaign, July 1986.
45 When the set of 300 examples was split into two parts 200 for the initial inductive learning
and 100 as the tuning data for GAs cycle the correct classification rates obtainedtin the 30th
evolution was above 60 That is a significant increase in comparison with 45 obtained from
inductive learning only The bottom diagrams represent the evaluation function used by genetic
algorithm (COMe section 32) in order to guide the genetic seanh The evaluation function was
calculated as a rate of the correct classifications to mis-classifications for all twelve texture
classes These diagrams are depicted for both testing and tuning data The increases of CCMC
on both diagrams represent an overall improvement of system recognition performance The
system performance was investigated for a larger number of GA generation steps However it
appears that the substantial increase was reached both for the CII class description and for the
overall system performance in a very few generation steps (Le in 10 steps)
8
o
bull Results for tuning data Results for testing data
30~----~A+--------+-------~
20---~--~----~--~~--~--~
o 10 Generations 30
208
-~ 0
r 1
~
MM II be
roo J
248
207
247206
205
246 204
203 shy245
202
244201 o 10 Generations 30
Graph 2 Results of experiment with 12 classes
u
~
bullJ
0 10 Generations 30
f
- ~
s Conclusion
This paper presents a novel approach to the optimization of the classification rule
descriptions through GAs techniques We recognize rule optimization as an emerging research
area in the creation of autonomous intelligent systems Two issues appear to be important for this
research First rule optimization is absolutely necessary in such problems where the system
must be protected against the accumulation of noisy components and where attributes used to
describe initial data have complex non-nonna distributions The method can be augmented by
9
optimizing input data formation (searching for the best subset of attributes) prior to rule
generation [Vafaie De Jong 1991] Secondly for inducti ely generated rules (falsity preserving
rules) there should be some method of validating these rules (or choosing the most preferred
inductive rule) before applying them to future data The methoo presented in this paper tries to
offer some solutions to these problems
The presented experiments on applying this hybrid methoo to finer-grained symbolic
representations of vision data are encouraging Highly disjunctive descriptions obtained by
inductive learning from this data were easily manipulated by chosen GA operators and a
substantial increase in the overall system perfonnance was observed in a few generations
Although there exists substantial differences between GAs and symbolic induction methods
(learning algorithm performance elements knowledge representation) this method serves as a
promising example that a better understanding of abilities of each approach (identifying common
building blocks) can lead to novel and useful ways of combining them This paper concentrates
primarily on the proposed methodology More experiments are needed to draw concrete
conclusions Future research will attempt to thoroughly test this approach on numerous data sets
We will also examine more thoroughly mutation and crossover operators
Acknowledgement This research was supponed in pan by the Office of Naval Research under grants No NOOOI4-88-K-0397 and No NOOO14-88-K-0226 and in pan by the Defense Advan~ed Research Projects Agency under the grant administered by the Office of Naval Research No NOOOI4-K-85-0878 The authors wish to thank Janet Holmes and Haleh Vafaie for editing suggestions
References
Bala IW and Pachowicz PW Application of Symbolic Machine Learning to the Recognition ofTexture Concepts 7th IEEE Conference on Artificial Intelligence Applications Miami Beach FL February 1991
De Jong K bullbull Learning with Genetic Algorithms An Overview Machine Learning vol 3 123shy138 Kluwer Academic Publishers 1988
Vafaie Hand De Jong K Improving the Performance of a Rule Induction System Using Genetic Algorithms in preparation(Center for AI GMU 1991)
Fitzpatrick JM and Grefenstette I Genetic Algorithms in Noisy Environments Machine Leaming 3 pp 101-12-(1988)
MichalskiR S AQVAUI--ComputerImplementation ofa Variable-Valued Logic System VLl and Examples of Its Application to Pattern Recognition First International Joint Conference on Pattern Recognition October 30 1973 Washington DC
Michalski RS bull Mozetic I Hong IR Lavrac N bull The AQ15 Inductive Learning System Report No UIUCDCS-R-86-1260 Department of Computer Science University of Illinois at Urbane-Champaign July 1986
10
performs when tested against the other classes, using the set of tuning examples. For example, we can see from the above confusion matrix that class C11 has the following performance:

        C1   C2   C3   C4   C5   C6   C7   C8   C9   C10  C11  C12
C11     16    4   32   10    0    4    0    4    2    0    44    7
The above values represent the percentage of instances of class C11 (the set of tuning instances) matched to the learned descriptions of all 12 classes. If class C11 is to be optimized by GAs, we have to evaluate the performance of each individual in the population of that class by calculating the performance measure of different candidates of that class with respect to the other classes. Thus, as the performance evaluation measure of each individual of the population we use the ratio of the correct recognition rate to the incorrect recognition rate, CC/MC, where CC is the average recognition for correctly recognized classes (the average of the entries on the diagonal of the confusion matrix) and MC is the average mis-classification (the average of the entries outside the diagonal of the confusion matrix). For the above confusion matrix, CC/MC = 74.67/4.83 = 15.45. The next section describes how the distance measure between an example (tuning or testing) and a rule is calculated.
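As a concrete illustration, the CC/MC fitness can be computed from a confusion matrix as sketched below. The 3-class matrix and the function name are ours, for illustration only; the paper uses the full 12-class matrix of tuning-set recognition percentages.

```python
# Hypothetical 3-class confusion matrix (rows: true class, columns: matched
# class); entries are recognition percentages, as on the paper's tuning set.
cm = [
    [80.0,  5.0, 15.0],
    [10.0, 70.0, 20.0],
    [ 5.0, 12.0, 83.0],
]

def cc_mc_ratio(cm):
    """CC/MC fitness: average diagonal entry over average off-diagonal entry."""
    n = len(cm)
    diag = sum(cm[i][i] for i in range(n))       # total correct recognition
    total = sum(sum(row) for row in cm)
    cc = diag / n                                # average correct recognition
    mc = (total - diag) / (n * n - n)            # average mis-classification
    return cc / mc

print(round(cc_mc_ratio(cm), 2))  # → 6.96
```

A higher ratio rewards individuals that both recognize their own class well and avoid matching instances of the other classes.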
3.2.1 Recognizing class membership through flexible matching
Flexible matching measures the degree of closeness between an instance and the conditional part of a rule. Such closeness is computed as a distance value between the instance and a rule complex within the attribute space. This distance measure takes any value in the range from 0 (i.e., does not match) to 1 (i.e., matches). Calculation of this distance measure for a single test instance is executed according to the following schema. For a given condition of a rule [xn = valj] and an instance where xn = valk, the normalized distance measure is

    1 - ( |valj - valk| / levels )                                        (2)
where levels is the total number of attribute values. The confidence level of a rule is computed by multiplying the evaluation values of each condition of the rule. The total evaluation of class membership of a given test instance is equal to the confidence level of the best matching rule, i.e., the rule with the highest confidence value. For example, the confidence value c for matching the following complex of a rule

    [x1 = 0] [x7 = 10] [x8 = 10..20]

with a given instance x = <4, 5, 24, 34, 0, 12, 6, 25> and the number of attribute values levels = 55 is computed as follows:

    cx1 = 1 - ( |0 - 4| / 55 ) = .928
    cx7 = 1 - ( |10 - 6| / 55 ) = .928
    cx8 = 1 - ( |20 - 25| / 55 ) = .91
    cx2 = cx3 = cx4 = cx5 = cx6 = 1

    c = cx1 · cx2 · cx3 · cx4 · cx5 · cx6 · cx7 · cx8 = .78
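The flexible-matching computation can be sketched as follows. The function names, the dictionary representation of a rule complex, and the handling of interval conditions are our assumptions, not the paper's implementation; the sketch reproduces the worked example above.

```python
def condition_match(value, lo, hi, levels=55):
    """Degree of match of one attribute value to a rule condition [x = lo..hi],
    using the normalized distance measure (2)."""
    if lo <= value <= hi:
        return 1.0
    dist = min(abs(value - lo), abs(value - hi))  # distance to nearest bound
    return 1.0 - dist / levels

def rule_confidence(instance, conditions, levels=55):
    """Confidence level of a rule: product of per-condition match degrees.
    Attributes not mentioned in the rule match with degree 1."""
    c = 1.0
    for attr, (lo, hi) in conditions.items():
        c *= condition_match(instance[attr], lo, hi, levels)
    return c

# The paper's example: [x1 = 0][x7 = 10][x8 = 10..20], x = <4,5,24,34,0,12,6,25>
x = {1: 4, 2: 5, 3: 24, 4: 34, 5: 0, 6: 12, 7: 6, 8: 25}
rule = {1: (0, 0), 7: (10, 10), 8: (10, 20)}
print(round(rule_confidence(x, rule), 2))  # → 0.78
```

Class membership is then assigned by evaluating every rule of every class this way and taking the rule with the highest confidence.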
The recognition process yields the class membership for which the confidence level is the highest among the matched rules. The calculated confidence level, however, is not a probability measure, and it can yield more than one class membership. This means that, for a given preclassified test dataset, the system recognition effectiveness is calculated as the ratio of the number of correctly classified instances to the total number of instances in the dataset. The above-described method of rule evaluation by flexible matching and confidence-level calculation has important consequences for the crossover operation. Since the degree of match of an instance to a rule depends on the confidence level of this instance with respect to each disjunct of that rule, the swapping process introduced by the crossover operator enables inheritance of the information represented in the strong disjuncts (those with high confidence levels) of rule candidates through generations of a population.
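The disjunct-swapping idea can be sketched as a one-point crossover over rule descriptions represented as lists of disjuncts. The paper does not spell out its exact operator in this section, so this minimal version is our assumption; the point it illustrates is that whole disjuncts survive the swap intact and can thus be inherited.

```python
import random

def crossover(parent_a, parent_b, rng=random):
    """One-point crossover on two rule-set individuals (lists of disjuncts).
    Cutting between disjuncts, never inside one, lets strong (high-confidence)
    disjuncts pass to offspring unchanged."""
    cut_a = rng.randrange(1, len(parent_a))
    cut_b = rng.randrange(1, len(parent_b))
    child1 = parent_a[:cut_a] + parent_b[cut_b:]
    child2 = parent_b[:cut_b] + parent_a[cut_a:]
    return child1, child2

rng = random.Random(0)                     # seeded for reproducibility
a = ["d1", "d2", "d3"]                     # disjuncts of candidate A
b = ["e1", "e2", "e3", "e4"]               # disjuncts of candidate B
c1, c2 = crossover(a, b, rng)
print(c1, c2)
```

Note that the two children together contain exactly the disjuncts of the two parents, so no disjunct is ever partially destroyed by the operator.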
4 Experimental results
In the experiment we used 12 classes of texture data. The initial descriptions were generated by the AQ module using 200 examples per class. Another set of 100 examples was used to calculate the performance measure for the GA cycle. The testing set had 200 examples, extracted from image areas different from those of the training and tuning data. Examples of a given class were constructed using 8 attributes and an extraction process based on Laws' masks [Bala and Pachowicz, 1991]. The weakest class (the one with the lowest matching results on the testing data), chosen for the experiment, was class C11 (as in the confusion matrix example in section 3.2), with 40 disjuncts. Graph 2 presents the results of the experiment. White circles in the diagrams represent characteristics obtained for the tuning data used to guide the genetic search; characteristics plotted as black circles were obtained for the testing data. The upper diagram monitors the performance of the genetically evolved description of the C11 texture. When all 300 examples were used to generate rules, the average classification rate for this class (when tested with 200 examples) was below 45%. When the set of 300 examples was split into two parts, 200 for the initial inductive learning and 100 as the tuning data for the GA cycle, the correct classification rate obtained in the 30th generation was above 60%. That is a significant increase in comparison with the 45% obtained from inductive learning only. The bottom diagrams represent the evaluation function used by the genetic algorithm (CC/MC, section 3.2) to guide the genetic search. The evaluation function was calculated as the ratio of correct classifications to mis-classifications over all twelve texture classes. These diagrams are depicted for both the testing and the tuning data. The increases of CC/MC in both diagrams represent an overall improvement of the system's recognition performance. The system performance was also investigated for a larger number of GA generation steps. However, it appears that the substantial increase was reached, both for the C11 class description and for the overall system performance, within very few generation steps (i.e., within 10 steps).
[Graph 2: Results of the experiment with 12 classes. Upper diagram: classification performance of the evolved C11 description; bottom diagrams: the CC/MC evaluation function. Each is plotted against generations 0-30, for tuning data (white circles) and testing data (black circles).]
5 Conclusion
This paper presents a novel approach to the optimization of classification rule descriptions through GA techniques. We recognize rule optimization as an emerging research area in the creation of autonomous intelligent systems. Two issues appear to be important for this research. First, rule optimization is absolutely necessary in problems where the system must be protected against the accumulation of noisy components, and where the attributes used to describe the initial data have complex, non-normal distributions. The method can be augmented by optimizing input data formation (searching for the best subset of attributes) prior to rule generation [Vafaie and De Jong, 1991]. Secondly, for inductively generated rules (falsity-preserving rules), there should be some method of validating these rules (or choosing the most preferred inductive rule) before applying them to future data. The method presented in this paper tries to offer some solutions to these problems.
The presented experiments on applying this hybrid method to finer-grained symbolic representations of vision data are encouraging. Highly disjunctive descriptions obtained by inductive learning from these data were easily manipulated by the chosen GA operators, and a substantial increase in the overall system performance was observed within a few generations. Although there exist substantial differences between GAs and symbolic induction methods (learning algorithm, performance elements, knowledge representation), this method serves as a promising example that a better understanding of the abilities of each approach (identifying common building blocks) can lead to novel and useful ways of combining them. This paper concentrates primarily on the proposed methodology. More experiments are needed to draw concrete conclusions. Future research will attempt to thoroughly test this approach on numerous data sets. We will also examine the mutation and crossover operators more thoroughly.
Acknowledgement: This research was supported in part by the Office of Naval Research under grants No. N00014-88-K-0397 and No. N00014-88-K-0226, and in part by the Defense Advanced Research Projects Agency under a grant administered by the Office of Naval Research, No. N00014-K-85-0878. The authors wish to thank Janet Holmes and Haleh Vafaie for editing suggestions.
References
Bala, J.W. and Pachowicz, P.W., "Application of Symbolic Machine Learning to the Recognition of Texture Concepts," 7th IEEE Conference on Artificial Intelligence Applications, Miami Beach, FL, February 1991.

De Jong, K., "Learning with Genetic Algorithms: An Overview," Machine Learning, vol. 3, 123-138, Kluwer Academic Publishers, 1988.

Vafaie, H. and De Jong, K., "Improving the Performance of a Rule Induction System Using Genetic Algorithms," in preparation (Center for AI, GMU, 1991).

Fitzpatrick, J.M. and Grefenstette, J.J., "Genetic Algorithms in Noisy Environments," Machine Learning 3, pp. 101-120, 1988.

Michalski, R.S., "AQVAL/1--Computer Implementation of a Variable-Valued Logic System VL1 and Examples of Its Application to Pattern Recognition," First International Joint Conference on Pattern Recognition, Washington, DC, October 30, 1973.

Michalski, R.S., Mozetic, I., Hong, J.R., Lavrac, N., "The AQ15 Inductive Learning System," Report No. UIUCDCS-R-86-1260, Department of Computer Science, University of Illinois at Urbana-Champaign, July 1986.